TrainingContainer should keep track of parameters used to generate stored quantities #550

Open
pmrv opened this issue Dec 21, 2022 · 0 comments
pmrv commented Dec 21, 2022

For FAIR data provenance it would be nice if exported TrainingContainers knew (roughly) how they were generated. This would include both generic DFT parameters (k-point sampling, convergence criteria, etc.) and code-specific inputs (ISYM, SYMPREC, ALGO, etc.). Ideally these could be taken from jobs directly once we have some generic setup in place, but that will probably take a while, and TrainingContainer would need to store them in an efficient, sparsified way anyway. Therefore I will start with a simple implementation first. Maybe just something like

from collections import namedtuple

DftInput = namedtuple('DftInput', ['kpoints', 'electronic_convergence', 'code_specific'])

class TrainingContainer:
  ...
  def add_provenance(self, input):
    # remember the DFT input parameters that produced the stored quantities
    self._input_provenance = input
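
For one container, usage might then look roughly like below; the k-point mesh, convergence value and code-specific keys are placeholder values, and container stands for some existing TrainingContainer instance:

inp = DftInput(
    kpoints=[4, 4, 4],
    electronic_convergence=1.0e-6,
    code_specific={'ISYM': 0, 'SYMPREC': 1e-5, 'ALGO': 'Fast'},
)
container.add_provenance(inp)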

Then in a few steps it could be extended to:

  1. store the provenance per structure in a sparse list
  2. extract it automatically on include_job
  3. export it into the pandas dataframe somehow as well
  4. add optional checks that the input parameters within one container are matching/compatible/consistent (points 1, 3 and 4 are sketched below)
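
A rough sketch of what points 1, 3 and 4 could look like. This is not existing pyiron API; the method names add_structure, to_provenance_frame and check_provenance_consistent, as well as the None-padded list, are only illustrative assumptions:

from collections import namedtuple

import pandas as pd

DftInput = namedtuple('DftInput', ['kpoints', 'electronic_convergence', 'code_specific'])

class TrainingContainer:
    def __init__(self):
        # one entry per stored structure; None where the provenance is unknown (sparse)
        self._input_provenance = []

    def add_structure(self, structure, provenance=None):
        # structure bookkeeping omitted, only the provenance side is sketched here (point 1)
        self._input_provenance.append(provenance)

    def to_provenance_frame(self):
        # flatten the per-structure provenance into a pandas dataframe (point 3)
        return pd.DataFrame(
            [p._asdict() if p is not None else {} for p in self._input_provenance]
        )

    def check_provenance_consistent(self):
        # optional check that all known inputs in the container agree (point 4)
        known = [p for p in self._input_provenance if p is not None]
        return all(p == known[0] for p in known[1:]) if known else True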
@pmrv pmrv added the enhancement New feature or request label Dec 21, 2022
@pmrv pmrv self-assigned this Dec 21, 2022