Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Sampler and Workflow Refactor #187

Merged
merged 27 commits into from Feb 10, 2021
Merged

Sampler and Workflow Refactor #187

merged 27 commits into from Feb 10, 2021

Conversation

nmerket
Copy link
Member

@nmerket nmerket commented Oct 8, 2020

Fixes #147.

Pull Request Description

  • Sampler class refactor. Simplifies things quite a it.

Checklist

Not all may apply

  • Code changes (must work)
  • Tests exercising your feature/bug fix (check coverage report on CircleCI build -> Artifacts)
  • All other unit tests passing
  • Update validation for project config yaml file changes
  • Update existing documentation
  • Run a small batch run to make sure it all works (local is fine, unless an Eagle specific feature)
  • Add to the changelog_dev.rst file and propose migration text in the pull request

@nmerket nmerket requested a review from rHorsey October 8, 2020 23:16
Copy link
Member Author

@nmerket nmerket left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@rHorsey It's still kind of rough and doesn't work, but what are your thoughts on the direction of this?

Comment on lines -39 to -48
class SimulationExists(Exception):

def __init__(self, msg, sim_id, sim_dir):
super().__init__(msg)
self.sim_id = sim_id
self.sim_dir = sim_dir


class ValidationError(Exception):
pass
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These had to be moved to avoid a circular import.

else:
return os.path.abspath(os.path.join(os.path.dirname(startfile), x))
def get_sampler_class(sampler_name):
sampler_class_name = ''.join(x.capitalize() for x in sampler_name.strip().split('_')) + 'Sampler'
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

residential_quota ➡️ ResidentialQuotaSampler

Comment on lines 76 to 77
Sampler = self.get_sampler_class(self.cfg['sampler']['type'])
self.sampler = Sampler(self, **self.cfg['sampler'].get('args', {}))
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Much simpler than the logic we used to have. Each sampler will then define its own arguments and have validation for it.

key, value = logic.split('|')
return df[key] == value

def downselect(self):
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Moved this into its own sampler class.

Comment on lines 242 to 250
def validate_sampler(project_file):
cfg = BuildStockBatchBase.get_project_configuration(project_file)
sampler_name = cfg['sampler']['type']
try:
Sampler = BuildStockBatchBase.get_sampler_class(sampler_name)
except AttributeError:
raise ValidationError(f'Sampler class `{sampler_name}` is not available.')
args = cfg['sampler']['args']
return Sampler.validate_args(project_file, **args)
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Each sampler defines its own validation, which gets called as part of the validation step.

@@ -47,6 +46,8 @@ def get_bool_env_var(varname):

class EagleBatch(BuildStockBatchBase):

CONTAINER_RUNTIME = ContainerRuntime.SINGULARITY
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Each Batch class tells has this enum on it to identify which container runtime it uses. The samplers then read this and call the appropriate method. This allows us to not have separate docker and singularity classes.

self.cfg = cfg
self.buildstock_dir = buildstock_dir
self.project_dir = project_dir
self.parent = weakref.ref(parent) # This removes circular references and allows garbage collection to work.
Copy link
Member Author

@nmerket nmerket Oct 12, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I pass the parent object to here and use python's weakref to avoid problems with circular references and garbage collection. This allows the child object to read stuff off the parent, such as which container runtime it uses.

logger = logging.getLogger(__name__)


class DownselectSamplerBase(BuildStockSampler):
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Subclass from this class to do downselecting for a particular sampler, as long as it takes the n_datapoints arg. Then you specify the downselect version of the sampler if that's what you want.

Comment on lines +110 to +111
class ResidentialQuotaDownselectSampler(DownselectSamplerBase):
SUB_SAMPLER_CLASS = ResidentialQuotaSampler
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here's one of the downselect versions of a sampler class.

@nmerket nmerket mentioned this pull request Jan 14, 2021
7 tasks
@nmerket nmerket requested a review from ejhw January 22, 2021 16:58
@nmerket nmerket marked this pull request as ready for review January 22, 2021 16:58
@nmerket
Copy link
Member Author

nmerket commented Jan 22, 2021

@ejhw @rHorsey @joseph-robertson This represents a pretty major change to the configuration files set up. I'm pretty satisfied that's it's working at this point. I'd appreciate it if you guys read over the docs changes and let me know if it communicates it well enough. You can see the generated docs from CicleCI by clicking on the CircleCI check, Artifacts tab, then docs/index.html.

@rHorsey I have not tried this with ComStock. I think it should work. There are gaps in the commercial sobol sampler documentation and tests. If you don't have time to review now we can just merge it and do bugfixes later to get ComStock working again.

@afontani
Copy link
Collaborator

@nmerket: It makes sense to put the n_datapoints argument in with the sampling arguments. I have been out of the loop on this PR, but if we wanted to add another sampling method does this restructure make that easier?

@nmerket
Copy link
Member Author

nmerket commented Jan 25, 2021

@afontani, yes it does make it easier to add another sampler. It separates the sampler code into a class that provides a specific interface. If you want to make a new sampler, you just need to create a new class that does sampling according to whatever method you want as long as it matches that interface.

The same is true of workflow generators. This will help with allowing for workflow generators for HPXML and eventually Home Energy Score.

@nmerket
Copy link
Member Author

nmerket commented Jan 25, 2021

Update yamls:

  • Local
  • Eagle
  • AWS.

@nmerket
Copy link
Member Author

nmerket commented Jan 28, 2021

@ejhw Are we still maintaining project_resstock_national_upgrades.yml?

@nmerket
Copy link
Member Author

nmerket commented Jan 28, 2021

@ejhw Heads up, I updated project_resstock_national_upgrades.yml to use the new schema version, but it doesn't work because the upgrade options don't match what's in resstock anymore.

@ejhw
Copy link
Contributor

ejhw commented Jan 29, 2021

@nmerket Thanks for the heads up. Yes, we should still maintain it; though I'd like to move it to /resstock. Do you think that makes sense?

buildstockbatch/schemas/v0.3.yaml Show resolved Hide resolved
project_resstock_national.yml Outdated Show resolved Hide resolved
aws_demo_project.yml Outdated Show resolved Hide resolved
project_resstock_national.yml Outdated Show resolved Hide resolved
project_resstock_national_upgrades.yml Outdated Show resolved Hide resolved
aws_demo_project.yml Outdated Show resolved Hide resolved
@ejhw ejhw self-requested a review February 9, 2021 00:53
@ejhw
Copy link
Contributor

ejhw commented Feb 9, 2021

@joseph-robertson Can you approve? Looks like @nmerket resolved your requested changes.

@nmerket nmerket merged commit 86067e7 into develop Feb 10, 2021
@nmerket nmerket deleted the sampler_workflow_refactor branch February 10, 2021 15:58
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Refactor of YAML config file
4 participants