TrainingContainer should keep track of parameters used to generate stored quantities #550

Open
pmrv opened this issue Dec 21, 2022 · 0 comments
pmrv commented Dec 21, 2022

For FAIR data provenance it would be nice if exported TrainingContainers knew (roughly) how they were generated. This would include both generic DFT parameters (k-point sampling, convergence criteria, etc.) and code-specific inputs (ISYM, SYMPREC, ALGO, etc.). Ideally these could be taken from jobs directly once we have some generic setup in place, but that will probably take a while, and TrainingContainer would need to store them in an efficient, sparsified way anyway. Therefore I will start with a simple implementation first. Maybe just something like

from collections import namedtuple

DftInput = namedtuple('DftInput', ['kpoints', 'electronic_convergence', 'code_specific'])

class TrainingContainer:
  ...
  def add_provenance(self, input):
    # remember the DFT input parameters that produced the stored quantities
    self._input_provenance = input
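
For one container, usage might then look roughly like below; the k-point mesh, convergence value and code-specific keys are placeholder values, and container stands for some existing TrainingContainer instance:

inp = DftInput(
    kpoints=[4, 4, 4],
    electronic_convergence=1.0e-6,
    code_specific={'ISYM': 0, 'SYMPREC': 1e-5, 'ALGO': 'Fast'},
)
container.add_provenance(inp)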

Then in a few steps it could be extended to:

  1. store the provenance per structure in a sparse list
  2. extract it automatically on include_job
  3. export it into the pandas dataframe somehow as well
  4. add optional checks that the input parameters within one container are matching/compatible/consistent (points 1, 3 and 4 are sketched below)
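
A rough sketch of what points 1, 3 and 4 could look like. This is not existing pyiron API; the method names add_structure, to_provenance_frame and check_provenance_consistent, as well as the None-padded list, are only illustrative assumptions:

from collections import namedtuple

import pandas as pd

DftInput = namedtuple('DftInput', ['kpoints', 'electronic_convergence', 'code_specific'])

class TrainingContainer:
    def __init__(self):
        # one entry per stored structure; None where the provenance is unknown (sparse)
        self._input_provenance = []

    def add_structure(self, structure, provenance=None):
        # structure bookkeeping omitted, only the provenance side is sketched here (point 1)
        self._input_provenance.append(provenance)

    def to_provenance_frame(self):
        # flatten the per-structure provenance into a pandas dataframe (point 3)
        return pd.DataFrame(
            [p._asdict() if p is not None else {} for p in self._input_provenance]
        )

    def check_provenance_consistent(self):
        # optional check that all known inputs in the container agree (point 4)
        known = [p for p in self._input_provenance if p is not None]
        return all(p == known[0] for p in known[1:]) if known else True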
@pmrv pmrv added the enhancement New feature or request label Dec 21, 2022
@pmrv pmrv self-assigned this Dec 21, 2022