
Memo on parser

Espen edited this page Jan 7, 2021 · 98 revisions

Note: Memo on parser before refactoring was moved to https://github.com/aiida-vasp/aiida-vasp/wiki/Memo-on-parser-before-refactoring.

Overall parsing workflow:

(AiiDA core responsibility)

  1. CalcJob is finished.
  2. The parse method of the selected parser is called. The parser is selected by setting the metadata.options.parser_name option of the CalcJob to the parser's entry point name (a plain string).

(Responsibility is now handed over to the plugin)

  1. Before the parse method can be executed on the plugin side, the class that houses it (VaspParser in this case) needs to be initialized. This involves the following steps:
    1. We first map a function get_quantity to a Delegate() class. The goal when constructing the parser was to be as general as possible, so that the parser could be configured to compose a quantity from, for instance, different file parsers. The alternative would be to add all combinations manually, but since this would introduce code duplication, the Delegate() approach was chosen. In addition, we wanted these quantities to be easy to extend and configure, the idea being that users could add custom file parsers without touching the core parser code. The composition is then configured with the parser settings. For the isolated VASP case this is certainly overkill, but it makes it possible to reuse this parser for other plugins and, perhaps more importantly, to parse results of auxiliary codes such as Wannier90 with the same core engine. In fact, the parser was initially constructed with the aim of making a general parser for AiiDA. Notice that even though the Delegate() approach was chosen, there are other ways to achieve similar functionality.
    2. The settings are initialized using the ParserSettings class. It contains all relevant settings, for instance which file parsers are associated with which physical files and whether some files are critical. In addition, it stores which quantities end up on which output nodes, and under which keys.
    3. Then the quantities are initialized using the ParsableQuantities class. It records which quantities can be parsed, whether files needed for the requested quantities are missing and, importantly, alternative parsers. For example, if a parameter is typically fetched from fileA, one can specify that it can alternatively be parsed from fileB. If fileA is then not present, or its file parsing fails for some other reason, the parameter is parsed from fileB, and so on. At this point the quantities are only initialized; when parse is executed, calls to quantities.somemethods handle and set these properties.
    4. The file parsers are initialized using the ParserManager class. It sets up the mapping from file parser classes to physical files and checks that the files are present, etc. Again, it is only initialized here; the calls to parsers.somemethods that actually perform these tasks happen after parse is executed.
  2. The parse method of VaspParser is executed and the actual parsing starts.
  3. First, the retrieved files are checked for missing critical files. If a critical file is not found, an exit code is returned.
  4. Then quantities.setup is executed, which
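The Delegate() idea from the initialization steps can be illustrated with a small, self-contained sketch. The Delegate class below is a hypothetical stand-in, not the plugin's actual implementation: it is a list of callables that is itself callable, calls every registered listener with the same arguments and returns their results in registration order. The two file-parser functions and their values are made up for illustration.

```python
"""Minimal sketch of a Delegate-style composition (hypothetical, for illustration)."""
from typing import Any, List


class Delegate(list):
    """A list of callables that is itself callable.

    Calling the delegate calls every registered listener with the same
    arguments and returns their results in registration order.
    """

    def __call__(self, *args: Any, **kwargs: Any) -> List[Any]:
        return [listener(*args, **kwargs) for listener in self]


# Hypothetical file-parser functions; each returns None if it cannot provide the quantity.
def quantity_from_vasprun(name: str) -> Any:
    return {'total_energies': -10.5}.get(name)


def quantity_from_outcar(name: str) -> Any:
    return {'total_energies': -10.4, 'symmetries': 'C1'}.get(name)


get_quantity = Delegate()
get_quantity.append(quantity_from_vasprun)
get_quantity.append(quantity_from_outcar)

# New sources can be registered without touching the code that calls get_quantity.
results = get_quantity('symmetries')
```

Here get_quantity('symmetries') gives [None, 'C1']: the caller only sees one function, while the set of contributing file parsers is configured separately, which is the decoupling the Delegate() approach aims for.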

Parsing starts with VaspParser. Parsing is triggered by AiiDA calling the parse method of VaspParser; this is intrinsic AiiDA functionality. When parse completes, the parsing should be completed and

Refactoring

Data structures

vasp.py

DEFAULT_OPTIONS = {
    'add_trajectory': False,
    'add_bands': False,
    'add_chgcar': False,
    'add_dos': False,
    'add_kpoints': False,
    'add_energies': False,
    'add_misc': True,
    'add_structure': False,
    'add_projectors': False,
    'add_born_charges': False,
    'add_dielectrics': False,
    'add_hessian': False,
    'add_dynmat': False,
    'add_wavecar': False,
    'add_forces': False,
    'add_stress': False,
    'add_site_magnetization': False,
    'store_energies_sc': False,
}

settings.py

FILE_PARSER_SETS = {
    'default': {
        'DOSCAR': {
            'parser_class': DosParser,
            'is_critical': False,
            'status': 'Unknown'
        },
...

The keys of FILE_PARSER_SETS['default'] are file names, and they are looked up with the names of the files in parser.retrieved, i.e., each element of [retrieved_file.name for retrieved_file in parser.retrieved.list_objects()].
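A minimal sketch of how a FILE_PARSER_SETS-like dict could be matched against retrieved file names, including the critical-file check done at the start of parse. Assumptions: parser classes are replaced by strings, the retrieved names are passed as a plain list instead of coming from parser.retrieved.list_objects(), the is_critical values are illustrative, and the exit status 1002 and the function select_file_parsers are hypothetical.

```python
"""Sketch: match retrieved files against a FILE_PARSER_SETS-like dict (hypothetical values)."""
from typing import Dict, List, Optional, Tuple

# Simplified stand-in for FILE_PARSER_SETS['default']; parser classes are replaced by names.
FILE_PARSER_SET = {
    'DOSCAR': {'parser_class': 'DosParser', 'is_critical': False, 'status': 'Unknown'},
    'vasprun.xml': {'parser_class': 'VasprunParser', 'is_critical': True, 'status': 'Unknown'},
    'OUTCAR': {'parser_class': 'OutcarParser', 'is_critical': False, 'status': 'Unknown'},
}

EXIT_CODE_MISSING_CRITICAL_FILE = 1002  # hypothetical exit status


def select_file_parsers(retrieved_names: List[str]) -> Tuple[Dict[str, str], Optional[int]]:
    """Return {file name: parser class name} for the retrieved files.

    The second element is an exit code if a critical file is missing, otherwise None.
    """
    selected = {}
    for file_name, entry in FILE_PARSER_SET.items():
        if file_name in retrieved_names:
            selected[file_name] = entry['parser_class']
        elif entry['is_critical']:
            # A missing critical file aborts parsing immediately.
            return {}, EXIT_CODE_MISSING_CRITICAL_FILE
    return selected, None
```

Non-critical files that were not retrieved are simply skipped, so the quantities they would provide must come from alternatives or be reported as missing.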

NODES = {
    'misc': {
        'link_name': 'misc',
        'type': 'dict',
        'quantities': ['total_energies', 'maximum_stress', 'maximum_force', 'symmetries', 'magnetization', 'notifications']
    },
    'kpoints': {
        'link_name': 'kpoints',
        'type': 'array.kpoints',
        'quantities': ['kpoints'],
    },
    'structure': {
        'link_name': 'structure',
        'type': 'structure',
        'quantities': ['structure'],
    },
    'poscar-structure': {
        'link_name': 'structure',
        'type': 'structure',
        'quantities': ['poscar-structure'],
    },
...

The keys of NODES (or Settings.output_nodes_dict) are identifiers used only locally; here we call them node_name, and node_name does not appear to be stored in the AiiDA database. NODES[node_name]['link_name'] is the AiiDA link label. Each element of NODES[node_name]['quantities'] corresponds to one of the entries given by 'alternatives' in PARSABLE_ITEMS and also to one of ParsableQuantities()._parsable_quantities.keys().
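To make the distinction between node_name and link_name concrete, here is a minimal sketch that turns add_<node_name> options into the link labels and quantities to compose. It assumes the correspondence between add_xxxx in DEFAULT_OPTIONS and node_key listed under Variable names, uses a trimmed copy of NODES, and the function requested_nodes is hypothetical.

```python
"""Sketch: from add_<node_name> options to link names and quantities (simplified)."""
from typing import Dict, List

NODES = {
    'misc': {'link_name': 'misc', 'type': 'dict',
             'quantities': ['total_energies', 'maximum_stress', 'maximum_force',
                            'symmetries', 'magnetization', 'notifications']},
    'kpoints': {'link_name': 'kpoints', 'type': 'array.kpoints', 'quantities': ['kpoints']},
    'structure': {'link_name': 'structure', 'type': 'structure', 'quantities': ['structure']},
}


def requested_nodes(options: Dict[str, bool]) -> Dict[str, List[str]]:
    """Map the AiiDA link label of every enabled node to the quantities it needs."""
    result = {}
    for node_name, node_dict in NODES.items():
        # Assumption: node_name 'xxx' is enabled by the option 'add_xxx'.
        if options.get('add_' + node_name, False):
            result[node_dict['link_name']] = list(node_dict['quantities'])
    return result
```

The node_name only selects the entry; what ends up in the database is the link_name, which is why two entries such as 'structure' and 'poscar-structure' can share the link label 'structure'.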

node_composer.py

NODES_TYPES = {
    'dict': ['total_energies', 'maximum_force', 'maximum_stress', 'symmetries', 'magnetization', 'site_magnetization', 'notifications'],
    'array.kpoints': ['kpoints'],
    'structure': ['structure'],
    'array.trajectory': ['trajectory'],
    'array.bands': ['eigenvalues', 'kpoints', 'occupancies'],
    'vasp.chargedensity': ['chgcar'],
    'vasp.wavefun': ['wavecar'],
    'array': [],
}
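One plausible reading of NODES_TYPES is a table of which quantities a node of each AiiDA data type can be composed from. Under that assumption (not stated explicitly here), a node composer could filter the parsed quantities as sketched below; the function inputs_for_node_type and the sample values are hypothetical, and NODES_TYPES is trimmed.

```python
"""Sketch: filter parsed quantities by node type (assumes NODES_TYPES lists allowed quantities)."""
from typing import Any, Dict

NODES_TYPES = {
    'dict': ['total_energies', 'maximum_force', 'maximum_stress', 'symmetries',
             'magnetization', 'site_magnetization', 'notifications'],
    'array.kpoints': ['kpoints'],
    'array.bands': ['eigenvalues', 'kpoints', 'occupancies'],
}


def inputs_for_node_type(node_type: str, parsed: Dict[str, Any]) -> Dict[str, Any]:
    """Keep only the parsed quantities that a node of ``node_type`` can be composed from."""
    allowed = NODES_TYPES.get(node_type, [])
    return {name: value for name, value in parsed.items() if name in allowed}


parsed = {'eigenvalues': [[-1.0, 2.0]], 'kpoints': [[0.0, 0.0, 0.0]], 'symmetries': 'C1'}
bands_inputs = inputs_for_node_type('array.bands', parsed)
```

With this reading, 'kpoints' legitimately appears under both 'array.kpoints' and 'array.bands', since a bands node also needs the k-points it is defined on.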

vasprun.py

DEFAULT_OPTIONS = {
    'quantities_to_parse': [
        'structure', 'eigenvalues', 'dos', 'bands', 'kpoints', 'occupancies', 'trajectory', 'energies', 'projectors', 'dielectrics',
        'born_charges', 'hessian', 'dynmat', 'forces', 'stress', 'total_energies', 'maximum_force', 'maximum_stress'
    ],
    'energy_type': ['energy_no_entropy']
}

The items of 'quantities_to_parse' are used as keys into PARSABLE_ITEMS.

    PARSABLE_ITEMS = {
        'structure': {
            'inputs': [],
            'name': 'structure',
            'prerequisites': [],
            'alternatives': ['poscar-structure']
        },
    ...

The file parser sets self._parsable_items = self.PARSABLE_ITEMS, which is exposed as the parsable_items property (@property) of the file parser instance. 'name' corresponds to the elements of the 'quantities' lists in NODES. The elements of 'prerequisites' and 'alternatives' are keys of parsable_items.
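A minimal sketch of a file parser that exposes PARSABLE_ITEMS through a parsable_items property and selects entries with quantities_to_parse. The class VasprunLikeParser and its method items_to_parse are hypothetical; PARSABLE_ITEMS is trimmed so that it lacks 'dos', which is why the sketch skips keys the parser does not declare.

```python
"""Sketch of a file parser exposing PARSABLE_ITEMS via a property (simplified, hypothetical)."""
from typing import Any, Dict, List


class VasprunLikeParser:
    """Hypothetical stand-in for a file parser such as the one in vasprun.py."""

    PARSABLE_ITEMS: Dict[str, Dict[str, Any]] = {
        'structure': {'inputs': [], 'name': 'structure', 'prerequisites': [],
                      'alternatives': ['poscar-structure']},
        'kpoints': {'inputs': [], 'name': 'kpoints', 'prerequisites': []},
    }
    DEFAULT_OPTIONS = {'quantities_to_parse': ['structure', 'kpoints', 'dos']}

    def __init__(self) -> None:
        # Instance copy of the class-level table, as in self._parsable_items = self.PARSABLE_ITEMS.
        self._parsable_items = self.PARSABLE_ITEMS

    @property
    def parsable_items(self) -> Dict[str, Dict[str, Any]]:
        return self._parsable_items

    def items_to_parse(self) -> List[str]:
        """Return the keys from quantities_to_parse that this parser declares."""
        wanted = self.DEFAULT_OPTIONS['quantities_to_parse']
        return [key for key in wanted if key in self.parsable_items]
```

VasprunLikeParser().items_to_parse() returns ['structure', 'kpoints'], and the 'alternatives' entry of 'structure' points at a key ('poscar-structure') declared by a different file parser.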

poscar.py

    PARSABLE_ITEMS = {
        'poscar-structure': {
            'inputs': [],
            'name': 'structure',
            'prerequisites': [],
        },
    }

Variable names

  • node_key (node_name) : NODES.keys(), add_xxxx in DEFAULT_OPTIONS
  • quantity_items : for quantity_key, quantity_dict in quantity_items.items()
  • quantity_name : quantity_dict['name'], NODES[node_key]['quantities']
  • quantity_key : PARSABLE_ITEMS.keys(), quantity_dict['alternatives'], quantity_dict['prerequisites']
  • quantity_dict : PARSABLE_ITEMS[quantity_key]
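A sketch of how these names could fit together: for each node_key, each quantity_name in NODES[node_key]['quantities'] is resolved to a quantity_key, trying the key of the same name first and then its 'alternatives'. The fallback order, the available_keys set (standing in for "its file is present and parsable") and the functions resolve_quantity_key and plan are assumptions for illustration, not the actual ParsableQuantities logic.

```python
"""Sketch: resolve NODES quantities to parsable quantity keys, with alternatives (hypothetical)."""
from typing import Dict, Optional, Set

# quantity_items: quantity_key -> quantity_dict, merged from the file parsers' PARSABLE_ITEMS.
quantity_items = {
    'structure': {'inputs': [], 'name': 'structure', 'prerequisites': [],
                  'alternatives': ['poscar-structure']},
    'poscar-structure': {'inputs': [], 'name': 'structure', 'prerequisites': []},
    'kpoints': {'inputs': [], 'name': 'kpoints', 'prerequisites': []},
}

NODES = {
    'structure': {'link_name': 'structure', 'quantities': ['structure']},
    'kpoints': {'link_name': 'kpoints', 'quantities': ['kpoints']},
}


def resolve_quantity_key(quantity_name: str, available_keys: Set[str]) -> Optional[str]:
    """Return the first available quantity_key for quantity_name: itself, then its alternatives."""
    candidates = [quantity_name]
    quantity_dict = quantity_items.get(quantity_name, {})
    candidates.extend(quantity_dict.get('alternatives', []))
    for quantity_key in candidates:
        if quantity_key in available_keys:
            return quantity_key
    return None


def plan(available_keys: Set[str]) -> Dict[str, Dict[str, Optional[str]]]:
    """For every node_key, map each quantity_name to the quantity_key that will provide it."""
    return {
        node_key: {quantity_name: resolve_quantity_key(quantity_name, available_keys)
                   for quantity_name in node_dict['quantities']}
        for node_key, node_dict in NODES.items()
    }
```

If only 'poscar-structure' and 'kpoints' are available, plan maps the structure node to 'poscar-structure', whose quantity_dict['name'] is still 'structure', so the composed node is the same regardless of which file provided it.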

Future extensions

  • In the original implementation, OUTCAR, vasprun.xml, etc. were preloaded. Now the file parser is loaded for every quantity. We should consider preloading.
  • In addition, we should consider releasing memory when a file parser is no longer needed (while the parser continues with other file parsers).
  • show_screening_steps should possibly be integrated with the AiiDA debug settings etc.
  • Try to make the node composer concept more general, so that it is possible to have several dict nodes, etc.
  • Is get_node_inputs_from_file_parser needed only for tests?
  • Consider simplifying the file parsers, e.g. their initialization. In addition, see if we can move more into BaseFileParser and make each file parser simpler.
  • Consider removing BaseParser from the file parser module.
  • Consider not carrying the exit codes along during parsing, and instead use another container for errors (say the notifications, or a more general one). parse could then return a single exit code defined on the calculation, where we could for instance update the message with text from the notifications.
  • Consider reviewing get_quantity_from_input etc. to check whether there is any use for it, and change it to comply with the new standard.
  • Decide how to handle composed calculations that use different folders (say, when VASP expects or writes results in different folders).
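The first two items (preloading and releasing memory) could be combined in a small cache that creates each file parser once, reuses it for every quantity, and drops it explicitly when it is no longer needed. This is a hypothetical design sketch; FileParserCache, its factories argument and load_count do not exist in the plugin.

```python
"""Sketch: load each file parser once and release it explicitly (hypothetical design)."""
from typing import Any, Callable, Dict


class FileParserCache:
    """Create a file parser on first use, reuse it for later quantities, and drop it on request."""

    def __init__(self, factories: Dict[str, Callable[[], Any]]) -> None:
        self._factories = factories  # file name -> zero-argument constructor
        self._loaded: Dict[str, Any] = {}
        self.load_count: Dict[str, int] = {name: 0 for name in factories}

    def get(self, file_name: str) -> Any:
        """Return the cached parser for file_name, constructing it on first access."""
        if file_name not in self._loaded:
            self._loaded[file_name] = self._factories[file_name]()
            self.load_count[file_name] += 1
        return self._loaded[file_name]

    def release(self, file_name: str) -> None:
        """Forget the parser so its memory can be reclaimed once no other reference exists."""
        self._loaded.pop(file_name, None)
```

Releasing only removes the cache's reference; memory is actually freed when no other object still holds the parser, so the core parser would need to avoid keeping its own references.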