ComStock! #65

Merged · 80 commits · May 20, 2020
Changes from 17 commits

Commits
fc4c041
Add todo noting use of unit specifications in commercial results.
rHorsey Jun 24, 2019
65e6329
Enable setting of OS_VERSION and OS_SHA through the yml.
rHorsey Jun 24, 2019
eda0f5a
Enable specification of the singularity image and path from the yml f…
rHorsey Jun 24, 2019
fbdbd76
Check singularity container download status code in case of 404, 501,…
rHorsey Jun 24, 2019
2990efe
Adding catch in run_batch to ensure that results are only ever overwr…
rHorsey Jun 24, 2019
5bcd369
Add support for custom (non-linked) gems in the singularity container…
rHorsey Jun 24, 2019
37cb2f9
Adding default commercial workflow generator.
rHorsey Jun 24, 2019
48b1873
Updates to the commercial sobol sampler.
rHorsey Jun 24, 2019
44f59cf
Additional updates to the commercial sobol sampler.
rHorsey Jun 24, 2019
0bc880c
Add support for precomputed buildstock.csv files as a sampling algori…
rHorsey Jun 24, 2019
c0a9bfe
merging master.
rHorsey Aug 26, 2019
b708915
Ensuring standard n_datapoint default interface of None.
rHorsey Aug 26, 2019
d4e48c4
Fix sampling inheritance preemption.
rHorsey Aug 27, 2019
39f7a29
Updates enabling through simulation.
Aug 27, 2019
f1fafb1
Updates to enable commercial apply-upgrade measure usage.
rHorsey Sep 27, 2019
850303d
Merging enable-com
rHorsey Sep 27, 2019
fa7f068
Merge remote-tracking branch 'origin/master' into enable-com
nmerket Oct 10, 2019
810a638
Adding in option of additional QAQC measure to Commercial workflow.
rHorsey Oct 21, 2019
95a0e06
Removing debugging sleep statement.
rHorsey Oct 21, 2019
fa16cef
Fix a botched merge in postprocessing.py.
rHorsey Oct 21, 2019
23a6efc
Update to qaqc enablement.
rHorsey Oct 22, 2019
c76b578
Adding simulation settings check into QAQC block.
rHorsey Oct 23, 2019
878fa75
making a return true explicit.
rHorsey Oct 23, 2019
721f4bb
Add preliminary ComStock postprocessing functions
Oct 30, 2019
0189471
Merge branch 'fix_ci' into enable-com
nmerket Nov 1, 2019
54cd807
Change os_version and os_sha from class constants to instance variables
asparke2 Jan 10, 2020
0b0c144
Provide clearer error message if docker daemon not running on Windows
asparke2 Jan 10, 2020
1dd308e
Avoid attempts to copy buildstock.csv files to current location
asparke2 Jan 10, 2020
9d88003
Modifies tests to use LocalDocker and HPCBatch instances to test inst…
asparke2 Jan 10, 2020
8c7c7f2
Modifies test again to avoid unrelated docker hub connection error
asparke2 Jan 10, 2020
4ddcf13
Fix style errors
asparke2 Jan 10, 2020
a79b2ea
Fix style and missing import references in postprocessing code; still…
asparke2 Jan 10, 2020
5142566
Small change to make default behavior more obvious.
rHorsey Jan 10, 2020
f3e88d0
Merge pull request #124 from NREL/fix_com_docker_4
rHorsey Jan 10, 2020
6d77611
Merge branch 'master' into enable-com
nmerket Feb 11, 2020
b034f82
fixing merge conflict goof
nmerket Feb 11, 2020
bd14ecd
add new sensitivity reporting measure
Mar 19, 2020
aafc455
forgot to change measure name
Mar 19, 2020
f27017f
Adding in support for configuration settings required for com.
rHorsey Mar 20, 2020
76bef02
Merge branch 'rHorsey/enable-com' into comstock-sensitivity
rHorsey Mar 20, 2020
120d159
Making sure weather_dir method is hit in sampling - removes a rare ra…
rHorsey Mar 30, 2020
d1db85e
add qoi measure to workflow_generator
Mar 30, 2020
a0c5628
Merge pull request #142 from NREL/comstock-sensitivity
rHorsey Mar 30, 2020
d2cd915
Fixing datetime specification error.
rHorsey Mar 31, 2020
517c59b
Initial merge of output refactor complete.
rHorsey Apr 14, 2020
5fb5dc7
Fixed several tests - sampler mock still needs fixing, will inquire w…
rHorsey Apr 14, 2020
1788a4e
Fixed test and made additional changes to support close-to-previous p…
rHorsey Apr 15, 2020
c061c26
Forgot to flake8
rHorsey Apr 15, 2020
b7b404c
Merge branch 'master' into enable-com
nmerket Apr 22, 2020
72bbd66
using docker_image property instead of function
nmerket Apr 22, 2020
c96a397
Merge branch 'master' into rHorsey/enable-com
rHorsey Apr 24, 2020
a0780fa
Fixing postprocessing bug.
rHorsey Apr 24, 2020
adf302e
Merge branch 'rHorsey/enable-com' of http://github.com/nrel/buildstoc…
rHorsey Apr 24, 2020
ed602bd
Adding seeds folder mount.
rHorsey Apr 24, 2020
ae62171
Try two updating to include seeds directory in mounts.
rHorsey Apr 24, 2020
34cb059
Fixing tests for local docker and upgrading schema to v0.2
rHorsey Apr 28, 2020
c5510d6
Forgot to add the v0.2 schema...
rHorsey Apr 28, 2020
8a6b4bc
Adding docker support to circle - maybe a bad idea...
rHorsey Apr 28, 2020
9abb0b7
Remove empty.osm seed dependency -- we do not need it.
May 6, 2020
da480c7
Forcing sampling_algorithm key to exist.
rHorsey May 6, 2020
d9e03b8
Removing the seeds mount from eagle.
rHorsey May 6, 2020
5f653eb
Updated documentation for 0.18 release - added changelog and migratio…
rHorsey May 8, 2020
4d6d5d7
Final doc updates.
rHorsey May 8, 2020
1fb08dd
Merge branch 'master' into rHorsey/enable-com
nmerket May 12, 2020
50e9f82
Merge branch 'master' into enable-com
nmerket May 12, 2020
36eb73c
Adding logic and tests to validate downselect schema requirements.
rHorsey May 14, 2020
3aba168
updating version
nmerket May 15, 2020
6ea0b75
precomputed sampling overhaul
nmerket May 18, 2020
ccbb7d3
fixing testing
nmerket May 18, 2020
49d4b88
style fixes
nmerket May 18, 2020
afbb416
more validation
nmerket May 18, 2020
1679fb4
moving precomputed sample validation
nmerket May 19, 2020
f8d3d67
fixing testing
nmerket May 19, 2020
69f7b51
Fixing style.
rHorsey May 19, 2020
588e590
fixing docker image for aws
nmerket May 19, 2020
bbc0cc6
Fixing weather files doc conflict.
rHorsey May 20, 2020
6928d6a
Final documentation updates for release 0.18
rHorsey May 20, 2020
d7702b5
Fix for comm update docs.
rHorsey May 20, 2020
744bae5
One last rst fix.
rHorsey May 20, 2020
37b56fd
Merge remote-tracking branch 'origin/master' into enable-com
nmerket May 20, 2020
46 changes: 8 additions & 38 deletions buildstockbatch/base.py
@@ -61,30 +61,15 @@ def __init__(self, project_filename):
elif (self.stock_type != 'residential') & (self.stock_type != 'commercial'):
raise KeyError('Key `{}` for value `stock_type` not recognized in `{}`'.format(self.cfg['stock_type'],
project_filename))
self.sampler = None
self._weather_dir = None
# Call property to create directory and copy weather files there
_ = self.weather_dir # noqa: F841

if 'buildstock_csv' in self.cfg['baseline']:
buildstock_csv = self.path_rel_to_projectfile(self.cfg['baseline']['buildstock_csv'])
if not os.path.exists(buildstock_csv):
raise FileNotFoundError('The buildstock.csv file does not exist at {}'.format(buildstock_csv))
df = pd.read_csv(buildstock_csv)
n_datapoints = self.cfg['baseline'].get('n_datapoints', df.shape[0])
self.cfg['baseline']['n_datapoints'] = n_datapoints
if n_datapoints != df.shape[0]:
raise RuntimeError(
'A buildstock_csv was provided, so n_datapoints for sampling should not be provided or should be '
'equal to the number of rows in the buildstock.csv file. Remove or comment out '
'baseline->n_datapoints from your project file.'
)
if 'downselect' in self.cfg:
raise RuntimeError(
'A buildstock_csv was provided, which isn\'t compatible with downselecting.'
'Remove or comment out the downselect key from your project file.'
)

self.sampler = None
# Load in overriding OS_VERSION and OS_SHA arguments if they exist in the YAML
if 'os_version' in self.cfg.keys():
self.OS_VERSION = self.cfg['os_version']
if 'os_sha' in self.cfg.keys():
self.OS_SHA = self.cfg['os_sha']

def path_rel_to_projectfile(self, x):
if os.path.isabs(x):
@@ -155,23 +140,7 @@ def skip_baseline_sims(self):
return baseline_skip

def run_sampling(self, n_datapoints=None):
if n_datapoints is None:
n_datapoints = self.cfg['baseline']['n_datapoints']
if 'buildstock_csv' in self.cfg['baseline']:
buildstock_csv = self.path_rel_to_projectfile(self.cfg['baseline']['buildstock_csv'])
destination_filename = self.sampler.csv_path
if destination_filename != buildstock_csv:
if os.path.exists(destination_filename):
logger.info("Removing {!r} before copying {!r} to that location."
.format(destination_filename, buildstock_csv))
os.remove(destination_filename)
shutil.copy(
buildstock_csv,
destination_filename
)
return destination_filename
else:
return self.sampler.run_sampling(n_datapoints)
return self.sampler.run_sampling(n_datapoints)

def run_batch(self):
raise NotImplementedError
@@ -292,6 +261,7 @@ def validate_project(project_file):
assert(BuildStockBatchBase.validate_options_lookup(project_file))
assert(BuildStockBatchBase.validate_measure_references(project_file))
assert(BuildStockBatchBase.validate_reference_scenario(project_file))
#assert(BuildStockBatchBase.validate_options_lookup(project_file))
logger.info('Base Validation Successful')
return True

56 changes: 49 additions & 7 deletions buildstockbatch/hpc.py
@@ -26,7 +26,7 @@
import time

from .base import BuildStockBatchBase, SimulationExists
from .sampler import ResidentialSingularitySampler, CommercialSobolSampler
from .sampler import ResidentialSingularitySampler, CommercialSobolSingularitySampler, PrecomputedSingularitySampler

logger = logging_.getLogger(__name__)

@@ -59,7 +59,15 @@ def __init__(self, project_filename):
elif self.stock_type == 'commercial':
sampling_algorithm = self.cfg['baseline'].get('sampling_algorithm', 'sobol')
if sampling_algorithm == 'sobol':
self.sampler = CommercialSobolSampler(
self.sampler = CommercialSobolSingularitySampler(
self.output_dir,
self.cfg,
self.buildstock_dir,
self.project_dir
)
elif sampling_algorithm == 'precomputed':
print('calling precomputed sampler')
nmerket marked this conversation as resolved.
self.sampler = PrecomputedSingularitySampler(
self.output_dir,
self.cfg,
self.buildstock_dir,
@@ -83,17 +91,33 @@ def singularity_image_url(cls):

@property
def singularity_image(self):
# Check the project yaml specification - if the file does not exist do not silently allow for non-specified simg
if 'sys_image_dir' in self.cfg.keys():
sys_image_dir = self.cfg['sys_image_dir']
sys_image = os.path.join(sys_image_dir, 'OpenStudio-{ver}.{sha}-Singularity.simg'.format(
ver=self.OS_VERSION,
sha=self.OS_SHA
))
if os.path.isfile(sys_image):
return sys_image
else:
raise RuntimeError('Unable to find singularity image specified in project file: `{}`'.format(sys_image))
# Use the expected HPC environment default if not explicitly defined in the YAML
Review comment: A lot of this is redundant with the next several lines. The only thing that is different is sys_image_dir. What if instead of doing this you created an @property getter for sys_image_dir on this class that looked in the config file and did this check. You could then change the current class attribute sys_image_dir to default_sys_image_dir and use it if sys_image_dir is not specified in the config file.
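
Sketched out, that suggestion might look like the following (a hypothetical refactor, not code from this PR; default_sys_image_dir stands in for the renamed class attribute):

# Hypothetical sketch of the reviewer's suggestion; not the merged implementation.
@property
def sys_image_dir(self):
    # Prefer the directory given in the project YAML, failing loudly if it is missing.
    if 'sys_image_dir' in self.cfg:
        sys_image_dir = self.cfg['sys_image_dir']
        if not os.path.isdir(sys_image_dir):
            raise RuntimeError('Unable to find sys_image_dir specified in project file: `{}`'.format(sys_image_dir))
        return sys_image_dir
    # Otherwise fall back to the class default (renamed from sys_image_dir).
    return self.default_sys_image_dir

singularity_image could then build and check the OpenStudio-{ver}.{sha} path from self.sys_image_dir once, instead of repeating the join-and-check logic in both branches.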

sys_image = os.path.join(self.sys_image_dir, 'OpenStudio-{ver}.{sha}-Singularity.simg'.format(
ver=self.OS_VERSION,
sha=self.OS_SHA
))
if os.path.isfile(sys_image):
return sys_image
# Download the appropriate singularity image for the defined OS_VERSION and OS_SHA
else:
singularity_image_path = os.path.join(self.output_dir, 'openstudio.simg')
if not os.path.isfile(singularity_image_path):
logger.debug('Downloading singularity image')
r = requests.get(self.singularity_image_url(), stream=True)
if r.status_code != requests.codes.ok:
logger.error('Unable to download simg file from OpenStudio releases S3 bucket.')
r.raise_for_status()
Review comment: Good check.

with open(singularity_image_path, 'wb') as f:
for chunk in r.iter_content(chunk_size=1024):
if chunk:
@@ -139,8 +163,19 @@ def run_batch(self):
else:
# otherwise just the plain sampling process needs to be run
buildstock_csv_filename = self.run_sampling()

# read the results
# If the results directory already exists, implying the existence of results, require a user defined override
# in the YAML file to allow for those results to be overwritten. Note that this will not impact the
# postprocessonly or uploadonly flags as they do not ever invoke the run_batch function, instead skipping to the
# queue_post_processing and then process_results functions
if 'output_directory' in self.cfg:
    if os.path.isdir(os.path.join(self.cfg['output_directory'], 'results')):
        if not self.cfg.get('override_existing', False):
            raise RuntimeError('results directory exists in {} - please address'.format(
                self.cfg['output_directory']))
        else:
            logger.warning('Overriding results in results directory in {}'.format(self.cfg['output_directory']))
Review comment: I'm not sure I like this. We have a validation that checks for the results directory and errors out if it exists. Then the user has to delete it themselves or choose another location. Things get really weird when you start overwriting results.

Review comment: Feel free to get rid of "override existing".
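
A minimal sketch of the fail-fast validation described here (hypothetical helper name; assumes the same self.cfg layout used above):

# Hypothetical sketch of erroring out instead of overwriting; not the code in this PR.
def validate_results_dir_absent(self):
    results_dir = os.path.join(self.cfg['output_directory'], 'results')
    if os.path.isdir(results_dir):
        raise RuntimeError(
            'Results directory {} already exists. Delete it or choose another '
            'output_directory before rerunning.'.format(results_dir)
        )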


# Determine the number of simulations expected to be executed
df = pd.read_csv(buildstock_csv_filename, index_col=0)

# find out how many buildings there are to simulate
@@ -229,6 +264,13 @@ def run_building(cls, project_dir, buildstock_dir, weather_dir, output_dir, sing
weather_dir,
]

# If custom gems are to be used in the singularity container add extra bundle arguments to the cli command
cli_cmd = 'openstudio run -w in.osw'
if cfg.get('baseline', dict()).get('custom_gems', False):
cli_cmd = 'openstudio --bundle /var/oscli/Gemfile --bundle_path /var/oscli/gems run -w in.osw --debug'
Review comment: I'm not 100% sure how these gems are getting into the container. Is it through your custom singularity image?

Anyway, I don't see something comparable in localdocker.py. There probably should be something there too.
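
For comparison, a matching hook in localdocker.py might look like this (hypothetical; assumes the Docker image ships a bundle at /var/oscli the same way the custom singularity image does):

# Hypothetical sketch of a custom_gems hook for the Docker runner; not part of this PR.
osw_cmd = 'openstudio run -w in.osw'
if cfg.get('baseline', {}).get('custom_gems', False):
    # Point the CLI at the pre-installed bundle baked into the image.
    osw_cmd = 'openstudio --bundle /var/oscli/Gemfile --bundle_path /var/oscli/gems run -w in.osw --debug'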

if get_bool_env_var('MEASURESONLY'):
cli_cmd += ' --measures_only'

# Call singularity to run the simulation
args = [
'singularity', 'exec',
@@ -247,14 +289,13 @@ def run_building(cls, project_dir, buildstock_dir, weather_dir, output_dir, sing
args.extend(['-B', '{}:{}:ro'.format(src, container_mount)])
container_symlink = os.path.join('/var/simdata/openstudio', os.path.basename(src))
runscript.append('ln -s {} {}'.format(*map(shlex.quote, (container_mount, container_symlink))))
runscript.append('openstudio run -w in.osw')
if get_bool_env_var('MEASURESONLY'):
runscript[-1] += ' --measures_only'
runscript.append(cli_cmd)
args.extend([
singularity_image,
'bash', '-x'
])
logger.debug(' '.join(args))
logger.debug('\n'.join(runscript))
with open(os.path.join(sim_dir, 'singularity_output.log'), 'w') as f_out:
try:
subprocess.run(
Expand All @@ -268,6 +309,7 @@ def run_building(cls, project_dir, buildstock_dir, weather_dir, output_dir, sing
except subprocess.CalledProcessError:
pass
finally:
time.sleep(600)
# Clean up the symbolic links we created in the container
for mount_dir in dirs_to_mount + [os.path.join(sim_dir, 'lib')]:
try:
12 changes: 10 additions & 2 deletions buildstockbatch/localdocker.py
@@ -22,7 +22,7 @@
import shutil

from buildstockbatch.base import BuildStockBatchBase, SimulationExists
from buildstockbatch.sampler import ResidentialDockerSampler, CommercialSobolSampler
from buildstockbatch.sampler import ResidentialDockerSampler, CommercialSobolDockerSampler, PrecomputedDockerSampler

logger = logging.getLogger(__name__)

@@ -45,12 +45,20 @@ def __init__(self, project_filename):
elif self.stock_type == 'commercial':
sampling_algorithm = self.cfg['baseline'].get('sampling_algorithm', 'sobol')
if sampling_algorithm == 'sobol':
self.sampler = CommercialSobolSampler(
self.sampler = CommercialSobolDockerSampler(
nmerket marked this conversation as resolved.
self.project_dir,
self.cfg,
self.buildstock_dir,
self.project_dir
)
elif sampling_algorithm == 'precomputed':
print('calling precomputed sampler')
self.sampler = PrecomputedDockerSampler(
self.output_dir,
self.cfg,
self.buildstock_dir,
self.project_dir
)
else:
raise NotImplementedError('Sampling algorithm "{}" is not implemented.'.format(sampling_algorithm))
else:
9 changes: 9 additions & 0 deletions buildstockbatch/postprocessing.py
@@ -73,11 +73,20 @@ def flatten_datapoint_json(reporting_measures, d):
new_d[f'{col1}.{k}'] = v

# if there is no units_represented key, default to 1
# TODO @nmerket @rajeee is there a way to not apply this to Commercial jobs? It doesn't hurt, but it is weird for us
nmerket marked this conversation as resolved.
units = int(new_d.get(f'{col1}.units_represented', 1))
new_d[f'{col1}.units_represented'] = units
<<<<<<< HEAD

# copy over all the keys and values in SimulationOutputReport
col3 = 'SimulationOutputReport'
for k, v in d.get(col3, {}).items():
new_d[f'{col3}.{k}'] = v
=======
col2 = 'SimulationOutputReport'
for k, v in d.get(col2, {}).items():
new_d[f'{col2}.{k}'] = v
>>>>>>> origin/master
nmerket marked this conversation as resolved.

# additional reporting measures
for col in reporting_measures:
3 changes: 2 additions & 1 deletion buildstockbatch/sampler/__init__.py
@@ -2,4 +2,5 @@

from .residential_docker import ResidentialDockerSampler # noqa F041
from .residential_singularity import ResidentialSingularitySampler # noqa F041
from .commercial_sobol import CommercialSobolSampler # noqa F041
from .commercial_sobol import CommercialSobolSingularitySampler, CommercialSobolDockerSampler # noqa F041
from .precomputed import PrecomputedDockerSampler, PrecomputedSingularitySampler # noqa F041
62 changes: 57 additions & 5 deletions buildstockbatch/sampler/commercial_sobol.py
@@ -25,7 +25,7 @@
logger = logging.getLogger(__name__)


class CommercialSobolSampler(BuildStockSampler):
class CommercialBaseSobolSampler(BuildStockSampler):

def __init__(self, output_dir, *args, **kwargs):
"""
@@ -43,7 +43,17 @@ def __init__(self, output_dir, *args, **kwargs):
def csv_path(self):
return os.path.join(self.project_dir, 'buildstock.csv')

def run_sampling(self, n_datapoints):
def run_sampling(self, n_datapoints=None):
"""
Execute the sampling generating the specified number of datapoints.

This is a stub. It needs to be implemented in the child classes for each deployment environment.

:param n_datapoints: Number of datapoints to sample from the distributions.
"""
raise NotImplementedError

def run_sobol_sampling(self, n_datapoints=None, csv_path=None):
"""
Run the commercial sampling.

@@ -54,7 +64,10 @@ def run_sampling(self, n_datapoints):
:param n_datapoints: Number of datapoints to sample from the distributions.
:param csv_path: Optional destination path for the output buildstock.csv file; defaults to self.csv_path.
:return: Absolute path to the output buildstock.csv file
"""
logging.debug('Sampling, n_datapoints={}'.format(n_datapoints))
sample_number = self.cfg['baseline'].get('n_datapoints', 350000)
if isinstance(n_datapoints, int):
sample_number = n_datapoints
logging.debug(f'Sampling, number of data points is {sample_number}')
tsv_hash = {}
for tsv_file in os.listdir(self.buildstock_dir):
if '.tsv' in tsv_file:
@@ -63,7 +76,7 @@
tsv_df[dependency_columns] = tsv_df[dependency_columns].astype('str')
tsv_hash[tsv_file.replace('.tsv', '')] = tsv_df
dependency_hash, attr_order = self._com_order_tsvs(tsv_hash)
sample_matrix = self._com_execute_sobol_sampling(attr_order.__len__(), n_datapoints)
sample_matrix = self._com_execute_sobol_sampling(attr_order.__len__(), sample_number)
csv_path = csv_path or self.csv_path
header = 'Building,'
for item in attr_order:
@@ -78,7 +91,7 @@
Parallel(n_jobs=n_jobs, verbose=5)(
delayed(self._com_execute_sample)(tsv_hash, dependency_hash, attr_order, sample_matrix, index, csv_path,
lock)
for index in range(n_datapoints)
for index in range(sample_number)
)
return csv_path

@@ -175,3 +188,42 @@ def _com_execute_sample(tsv_hash, dependency_hash, attr_order, sample_matrix, sa
fd.write(csv_row)
finally:
lock.release()


class CommercialSobolSingularitySampler(CommercialBaseSobolSampler):
Review comment: Do we need to have separate singularity and docker sampler classes? These look exactly the same. I know you're probably doing this because we have separate classes for residential because our sampling runs in a container. Someday I'd like that not to be the case.

Review comment: I'm going to deal with this in #147, so let's not worry about it for now.
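
For illustration, folding the two subclasses into one might look like this (a hypothetical sketch of that idea, not what #147 ultimately implemented):

# Hypothetical sketch of a single deployment-agnostic sampler; not this PR's code.
class CommercialSobolSampler(CommercialBaseSobolSampler):

    def __init__(self, output_dir, csv_dir, *args, **kwargs):
        # csv_dir replaces the per-deployment subclasses: the singularity runner
        # would pass output_dir, the docker runner project_dir/housing_characteristics.
        super().__init__(output_dir, *args, **kwargs)
        self.csv_dir = csv_dir

    @property
    def csv_path(self):
        return os.path.join(self.csv_dir, 'buildstock.csv')

    def run_sampling(self, n_datapoints=None):
        return self.run_sobol_sampling(n_datapoints)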


def __init__(self, output_dir, *args, **kwargs):
"""
This class uses the Commercial Sobol Sampler to execute samples for Peregrine Singularity deployments
"""
self.output_dir = output_dir
super().__init__(*args, **kwargs)

def run_sampling(self, n_datapoints=None):
"""
Execute the sampling for use in Peregrine Singularity deployments

:param n_datapoints: Number of datapoints to sample from the distributions.
:return: Path to the sample CSV file
"""
csv_path = os.path.join(self.output_dir, 'buildstock.csv')
return self.run_sobol_sampling(n_datapoints, csv_path)


class CommercialSobolDockerSampler(CommercialBaseSobolSampler):

def __init__(self, *args, **kwargs):
"""
This class uses the Commercial Sobol Sampler to execute samples for local Docker deployments
"""
super().__init__(*args, **kwargs)

def run_sampling(self, n_datapoints=None):
"""
Execute the sampling for use in local Docker deployments

:param n_datapoints: Number of datapoints to sample from the distributions.
:return: Path to the sample CSV file
"""
csv_path = os.path.join(self.project_dir, 'housing_characteristics', 'buildstock.csv')
return self.run_sobol_sampling(n_datapoints, csv_path)