Default Parser Spec

General

Purpose of this document

This document describes the class which inspects a finished VaspCalculation and aims to provide digested, end-user-level information about it, specifically about its results. It does this by parsing the files found in the retrieved folder (output link name: retrieved).

This document is meant as a base for discussing design changes, and is supposed to be updated along with changes on the implementation level.

Terminology

parser: An aiida.parsers.Parser instance, which is intended to parse a finished Calculation node
file parser: Any code entity designed to be called by the VaspParser through a common interface in order to parse quantities from an output file
output file: Any file written or modified by VASP during the course of a simulation run
quantity: A physical quantity which describes the system simulated using VASP. The goal of running VASP is always to calculate one or more quantities. VASP typically records the quantities it calculates in one or more output files

Terminology (proposed change)

file parser: Any code entity designed to be called by the VaspParser through a common interface, or called directly, in order to parse quantities from an output or input file
output file: Any file written or modified by VASP during the course of a simulation run
input file: Any file used as an input to VASP
quantity: A quantity which describes a physical property, or a collection of physical properties, of a system simulated using VASP.

Working principle of the parser

After obtaining the output files of VASP, the parser creates database records of the user-requested quantities and attaches them (through the existing aiida.parsers.Parser machinery) as output links to the calculation.
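As an illustration only, the working principle can be sketched in plain Python without AiiDA. In this sketch the retrieved folder is modelled as a dict mapping file names to their contents, each file parser as a function returning a dict of quantities, and the returned dict stands in for the output links. The name parse_retrieved and the toy file format are hypothetical, not part of aiida_vasp.

```python
from typing import Callable, Dict, List

# Hypothetical model: a file parser takes the file content and returns
# a dict of the quantities it found in it.
FileParser = Callable[[str], Dict[str, object]]


def parse_retrieved(retrieved: Dict[str, str],
                    file_parsers: Dict[str, FileParser],
                    requested: List[str]) -> Dict[str, object]:
    """Return the requested quantities found in the retrieved files.

    ``retrieved`` maps file names to their content (a stand-in for the
    retrieved folder); the result stands in for the output links.
    """
    outputs: Dict[str, object] = {}
    for file_name, parser in file_parsers.items():
        if file_name not in retrieved:
            continue  # file was not retrieved, nothing to parse from it
        for name, value in parser(retrieved[file_name]).items():
            if name in requested:  # only keep what the user asked for
                outputs[name] = value
    return outputs
```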

Names

The default parser is called VaspParser. It must be loadable by the AiiDA ParserFactory as follows: ParserFactory('vasp.vasp').

Inheritance

The VaspParser must inherit from aiida.parsers.Parser. It does so via aiida_vasp.parsers.base.BaseParser, which abstracts away some of the common tasks parsers perform.

Requirements

Correctness

In no way shall the default parser falsify or fabricate results
The default parser will always strive to present results in an unambiguous fashion

Correctness (proposed change)

In no way shall the default parser falsify or fabricate results
The default parser will always strive to present results in an unambiguous fashion
Metadata should be preserved if present.
- TODO: give an example of metadata (for new developers unfamiliar with EMMC) and/or define 'metadata' in the Terminology section

Extendability

It must be possible to add parsing capability for output files or quantities without touching the source code of the VaspParser class; the file in which it is defined should not need to be modified. This ensures a low entry barrier for contributors and avoids common VCS conflicts.

Currently this is achieved by splitting off file parsers into separate classes and registering the file parsers, along with the quantities they parse, outside of the VaspParser class.

Customizability

It must be possible to pass details to the VaspCalculation, which will control the parser in the following ways:

Instruct the parser which output files are required and optional
Record which quantities are required and optional. From this the parser can conclude which output files are needed. For required quantities the parser shall fail if the output files are missing.
Instruct the parser to parse a quantity from an output file other than the default one, if need be or if the user insists
The details shall be given as a parser_settings dictionary in the settings input of the VaspCalculation.

It must be easy to write custom parsers (by subclassing or by adhering to a method interface) that benefit from the existing file parsers.

Compatibility

With other plugins

Wherever appropriate, the default parser should stick to the conventions set by the AiiDA-Quantumespresso plugin in terms of names, content and type of output nodes. Where it does not, a coordinated effort should be pursued in order to synchronize further development.

With BaseRestartWorkChain

The default parser should record warnings about known types of VASP failures in a special output node, analogous to the QE plugin, such that (future) error handlers can decide on a restart strategy.

With BaseRestartWorkChain (proposed change)

The default parser should record warnings about known types of VASP failures in a special output node, analogous to the QE plugin, such that (future) error handlers can decide on a restart or exit strategy. Constant monitoring of warnings and errors is also necessary to avoid excessive and unnecessary use of computational resources.

With BaseRestartWorkChain (counter proposal by DropD)

... To that end capability to read output from an additional monitoring job which can be launched alongside VASP runs may be added. Such a monitoring script is planned to become part of AiiDA-VASP in the future.

Reasoning for the counter proposal:

The monitoring script should be separate from the parser, though it might reuse file parser code. The parser cannot launch it alongside VASP at all (this must be done in VaspCalculation). Therefore, the need for such a script is not part of the VaspParser spec, whereas potentially interfacing with it is.

Efficiency

When a quantity can be parsed from multiple output files, the default parser should always choose the most robust path by default. If several files are equally robust, the smallest in size should be prioritized, then the most efficient to parse (a measure is pending). The default must be overridable on a per-VaspCalculation basis or in a custom parser.
Within the constraints given by the user, the default parser must always try to avoid storing large output files.
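A minimal sketch of the selection rule above, assuming each candidate file can be given an integer robustness score and a known size (both hypothetical; the spec defines no such scores). Efficiency is left out because its measure is still pending. The override argument models the per-VaspCalculation override; Candidate and choose_source are illustrative names.

```python
from typing import List, NamedTuple, Optional


class Candidate(NamedTuple):
    """An output file from which a quantity could be parsed (hypothetical)."""
    file_name: str
    robustness: int  # higher is more robust; this scale is an assumption
    size: int        # file size in bytes


def choose_source(candidates: List[Candidate],
                  override: Optional[str] = None) -> str:
    """Pick the output file to parse a quantity from.

    A user-supplied override always wins. Otherwise the most robust file is
    chosen, and ties are broken by picking the smallest file.
    """
    if override is not None:
        return override
    best = min(candidates, key=lambda c: (-c.robustness, c.size))
    return best.file_name
```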

Current implementation

The current implementation meets the requirements as defined above in the following ways:

Extendability

Separate file parsers

The VaspParser does not contain any file parsing capabilities directly. Instead that functionality is contained in separate classes, one per output file type. These classes reside in the aiida_vasp.io module.

File parser interface

In order to keep the VaspParser and the file parsers separated, the following items are required as interface for parsing quantities:

FileParser.PARSABLE_ITEMS: a static dictionary with definitions for the quantities this file parser can parse. It will be gathered by the VaspParser.
FileParser.get_quantity(quantity, inputs): a method that can be subscribed to VaspParser.get_quantity.
VaspParser.get_quantity(quantity, inputs): a method with the @delegate decorator, to which the file parsers can subscribe. The VaspParser calls this method when it wants to parse a quantity. Each subscribed file parser checks whether it can (and should, i.e. based on user priority) parse this quantity and, if so, returns it. If none of the subscribed file parsers returns anything, the body of this method is called.
VaspParser.get_inputs(quantity): a method that can be called by the file parsers in order to obtain another quantity that would be required for parsing without interacting with the other file parser directly.

The file parser side of the interface can be inherited from io.parser.BaseFileParser, which provides a get_quantity method and takes care of subscribing it to the VaspParser's delegate.
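The delegate mechanism can be sketched as follows. This is a hypothetical, simplified @delegate for methods written for illustration, not the decorator shipped with aiida_vasp: subscribers are stored per parser instance and asked in order of subscription, the first non-None result wins, and the decorated method body is the fallback. ToyVaspParser and ToyFileParser are illustrative stand-ins.

```python
import functools


def delegate(method):
    """Hypothetical sketch of a @delegate decorator for methods."""
    attr = '_subscribers_' + method.__name__

    @functools.wraps(method)
    def wrapper(self, *args, **kwargs):
        # Ask every subscriber of this instance; first non-None result wins.
        for subscriber in getattr(self, attr, []):
            result = subscriber(*args, **kwargs)
            if result is not None:
                return result
        # Nobody could answer: fall back to the decorated method body.
        return method(self, *args, **kwargs)

    def subscribe(instance, subscriber):
        instance.__dict__.setdefault(attr, []).append(subscriber)

    wrapper.subscribe = subscribe
    return wrapper


class ToyVaspParser:
    @delegate
    def get_quantity(self, quantity, inputs):
        return None  # fallback body: the quantity could not be parsed


class ToyFileParser:
    def __init__(self, vasp_parser, quantities):
        self._quantities = quantities
        # Subscribe to the parser's delegate, as BaseFileParser is said to do.
        ToyVaspParser.get_quantity.subscribe(vasp_parser, self.get_quantity)

    def get_quantity(self, quantity, inputs):
        return self._quantities.get(quantity)
```

Storing subscribers on the instance rather than on the decorator keeps two VaspParser instances from sharing each other's file parsers.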

Example file parser

The BaseFileParser class is intended to let new file parsers be implemented with as little work as possible. Below is an example of a minimal setup. Two things have to be overridden: which quantities can be parsed (ExampleFileParser.PARSABLE_ITEMS) and how those quantities are parsed from the file (ExampleFileParser._parse_file()).

import re

import py

from aiida_vasp.utils.aiida_utils import get_data_class
from aiida_vasp.io.parser import BaseFileParser


class ExampleFileParser(BaseFileParser):

    PARSABLE_ITEMS = {
        # The name of a quantity as key. It should be unique among all of the FileParsers.
        'item1': {
            # This quantity will be parsed first and made available in time if possible.
            'inputs': ['required_quantity'],
            # During setup the VaspParser will check whether ExampleFile has been
            # retrieved and initialise the corresponding parser if this quantity is
            # requested, i.e. if parser_settings['add_<nodeName>'] is True for its node.
            'parsers': ['ExampleFile'],
            # The quantity will be added to the 'output_examples' output node.
            'nodeName': ['examples'],
            # This prohibits the parser from trying to parse item1 without
            # ``required_quantity``.
            'prerequisites': ['required_quantity'],
            # (Optional) If a quantity can be parsed from more than one file, a list of
            # alternative quantities can be provided here.
            'alternatives': ['alternative_quantity1', ...],
            # (Optional) If this quantity is an alternative to another_quantity, set this
            # flag. The VaspParser will automatically add this quantity to the
            # alternatives of ``another_quantity``.
            'is_alternative': 'another_quantity',
        },
        'item2': {
            'inputs': [],
            'parsers': ['ExampleFile'],
            'nodeName': ['examples'],
            'prerequisites': [],
        },
        # An example of a quantity representing an output node that should be
        # attached to the VaspCalculation. At the moment, quantities whose name
        # equals their nodeName are considered to represent output nodes.
        'examples': {
            'inputs': ['item1', 'item2'],
            'parsers': ['ExampleFile'],
            'nodeName': ['examples'],
            'prerequisites': [],
        },
    }

    def __init__(self, *args, **kwargs):
        super(ExampleFileParser, self).__init__(*args, **kwargs)
        self.init_with_kwargs(**kwargs)

    def _parse_file(self, inputs):
        # self._data_obj will be set during init.
        example_file = py.path.local(self._data_obj.path)
        data = example_file.read()

        # Extract item1 and scale it by the required input quantity.
        item1 = int(re.findall(r'item1 is: (\d+)', data)[0]) * inputs['required_quantity']
        # Extract the list of item2 values.
        item2 = [int(i) for i in re.findall(r'item2: (\d+)', data)]

        # Construct the ParameterData node.
        output_node = get_data_class('parameter')(dict={
            'item1': item1,
            'item2': item2,
        })

        # Each of the PARSABLE_ITEMS from above must be a key in the returned dict.
        return {'item1': item1, 'item2': item2, 'examples': output_node}

If the write() method of the ExampleFileParser is to be used and the ExampleFileParser has been initialized with AiiDA data other than SingleFileData, _init_with_data() and _parsed_obj have to be overridden as well.

Adding and registering a file parser

File parsers can be added or replaced at run-time by using the interface VaspParser.add_file_parser(file_name, parser_definition), where file_name is the name of the output file this parser is supposed to be operating on and parser_definition must contain the following keys:

parser_cls: the file parser class that the VaspParser should instantiate.
is_critical: a bool that controls whether parsing should be aborted with an error message if the file corresponding to the file parser has not been retrieved.
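A toy sketch of such a registry, showing how the two keys could be validated and how is_critical could be used to find missing files that should abort parsing. ToyVaspParser and missing_critical_files are hypothetical; only add_file_parser, parser_cls and is_critical come from this spec.

```python
from typing import Dict, List


class ToyVaspParser:
    """Hypothetical stand-in showing only the file parser registry."""

    REQUIRED_KEYS = ('parser_cls', 'is_critical')

    def __init__(self):
        self.file_parser_definitions: Dict[str, dict] = {}

    def add_file_parser(self, file_name: str, parser_definition: dict) -> None:
        """Add a file parser, or replace the one registered for file_name."""
        missing = [key for key in self.REQUIRED_KEYS if key not in parser_definition]
        if missing:
            raise ValueError(f'parser_definition for {file_name!r} lacks {missing}')
        self.file_parser_definitions[file_name] = parser_definition

    def missing_critical_files(self, retrieved: List[str]) -> List[str]:
        """Return critical files that were not retrieved; parsing should then abort."""
        return [name for name, definition in self.file_parser_definitions.items()
                if definition['is_critical'] and name not in retrieved]
```

Because registration is keyed by file name, calling add_file_parser again for the same file replaces the earlier definition.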

Dealing with equivalent quantities

Some quantities can be parsed from more than one output file, and the VaspParser has to decide which quantity to parse based on user input and priorities. To keep adding and replacing file parsers and quantities simple, the system must be flexible. This is achieved by:

Requiring that quantity names are unique. The VaspParser can then decide which of all available quantities to parse, which in turn determines the specific file that is needed. Names should follow the pattern fileName_quantity.
Defining one of the equivalent quantities as the main quantity by setting the 'alternatives' list in the definition of that quantity. The VaspParser then checks which of all of the alternative quantities can be parsed based on the available files. This is intended as a flexible way of assigning a priority to each individual quantity.
Quantities that are an alternative to another quantity can also be marked as such by setting 'is_alternative': another_quantity in their definition. The quantity will then automatically be added to the 'alternatives' list of another_quantity. If another_quantity does not exist, a dummy quantity that cannot be parsed will be created. This is intended as a way to update the priority order of a certain quantity without modifying the source code of the file parser that originally defines another_quantity.
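The resolution of 'is_alternative' flags and the resulting priority order can be sketched on plain dictionaries. This assumes, as in the example file parser, that 'parsers' lists the files a quantity needs and that a quantity is parseable when all of them were retrieved; link_alternatives and pick_quantity are hypothetical helpers, and the quantity names follow the fileName_quantity pattern for illustration only.

```python
from typing import Dict, List, Optional


def link_alternatives(definitions: Dict[str, dict]) -> Dict[str, dict]:
    """Merge 'is_alternative' flags into the main quantities' 'alternatives'.

    Works on copies, so the original definitions are not modified. A main
    quantity that does not exist is created as a dummy without parsers, so
    it can never be parsed itself.
    """
    resolved = {name: dict(d, alternatives=list(d.get('alternatives', [])))
                for name, d in definitions.items()}
    for name, definition in definitions.items():
        main = definition.get('is_alternative')
        if main is None:
            continue
        main_def = resolved.setdefault(main, {'parsers': [], 'alternatives': []})
        if name not in main_def['alternatives']:
            main_def['alternatives'].append(name)
    return resolved


def pick_quantity(main: str, resolved: Dict[str, dict],
                  retrieved: List[str]) -> Optional[str]:
    """Return the first of main and its alternatives whose files were all retrieved."""
    for name in [main] + resolved[main]['alternatives']:
        parsers = resolved.get(name, {}).get('parsers', [])
        if parsers and all(file_name in retrieved for file_name in parsers):
            return name
    return None
```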

Customization

Output nodes

Which output nodes will be added to the VaspCalculation can be controlled by setting 'add_<nodeName>': True in the 'parser_settings' card of VaspCalculation.settings. By default, 'structure' and 'parameters' are added.

Output nodes (proposed change)

Which output nodes will be added to the VaspCalculation can be controlled by setting 'add_<nodeName>': True in the 'parser_settings' card of VaspCalculation.settings. By default 'structure', 'parameters', 'energies' and 'kpoints' will be added.
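A minimal sketch of how the 'add_<nodeName>' flags could be evaluated, with the settings modelled as a plain dict. The function name requested_nodes is illustrative, and treating 'add_<nodeName>': False as switching off a default node is an assumption of this sketch. The proposed defaults can be passed through the defaults argument.

```python
from typing import Dict, Iterable, Set


def requested_nodes(settings: Dict[str, dict],
                    defaults: Iterable[str] = ('structure', 'parameters')) -> Set[str]:
    """Return the names of the output nodes to attach (hypothetical helper).

    Every 'add_<nodeName>' key in settings['parser_settings'] switches the
    corresponding node on (True) or, as an assumption of this sketch,
    off (False), on top of the defaults.
    """
    nodes = set(defaults)
    for key, value in settings.get('parser_settings', {}).items():
        if not key.startswith('add_'):
            continue  # other parser settings, e.g. 'file_parser_set'
        node_name = key[len('add_'):]
        if value:
            nodes.add(node_name)
        else:
            nodes.discard(node_name)
    return nodes
```

For example, {'parser_settings': {'add_structure': False}} leaves only 'parameters' with the current defaults.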

Required output files

Which output files are required or optional is controlled by the 'is_critical' flag in parser.file_parser_definitions. If a file marked as 'is_critical' has not been retrieved, parsing is aborted with success = False.

Set of file parsers

Which set of file parsers will be loaded from 'parsers.file_parser_definitions' can be controlled by 'file_parser_set' in 'parser_settings'.
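Selecting the set can be sketched as a dictionary lookup. The layout of file_parser_definitions (set name mapping to per-file definitions), the helper name select_file_parser_set and the default set name 'default' are assumptions made for this sketch.

```python
from typing import Dict


def select_file_parser_set(file_parser_definitions: Dict[str, Dict[str, dict]],
                           parser_settings: dict,
                           default_set: str = 'default') -> Dict[str, dict]:
    """Return the file parser definitions of the set chosen in parser_settings.

    'default' as the fallback set name is a hypothetical choice.
    """
    set_name = parser_settings.get('file_parser_set', default_set)
    if set_name not in file_parser_definitions:
        raise ValueError(f'unknown file_parser_set: {set_name!r}')
    return file_parser_definitions[set_name]
```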

Output file from which a quantity will be parsed

If a quantity can be parsed from more than one output file, the main quantity (the one holding the 'alternatives' list) is chosen. This implicitly determines the file it will be parsed from. If the main quantity cannot be parsed, the next quantity in the 'alternatives' list is checked and parsed.

To override this default, either the file parser defining the main quantity has to be replaced, or the main quantity has to be overridden using VaspParser.add_parsable_quantity(...).

Compatibility

With other plugins

The VaspParser currently conforms to the names and general content of output nodes as defined in VaspParser.LINKNAME_DICT, with the exception of output_parameters.

Efficiency

Priority of quantities

If a quantity can be parsed from more than one file, all of the equivalent quantities are ordered according to the 'alternatives' list of the main quantity. This list should reflect the criteria mentioned above, such as robustness, size and efficiency of parsing.

ToDo list

The following issues will have to be addressed in order to bring the VaspParser closer to meeting all of the requirements.

Separate output nodes from quantities

In the current implementation, output nodes are quantities for which quantity.name == quantity.nodeName. However, output nodes are in fact similar to output files in the sense that they can contain one or more quantities. Separating output nodes from quantities will improve the VaspParser in the following ways:

The code determining whether a quantity is an output node becomes obsolete, improving readability and extendability.
The quantities assigned to an output node could be customized. Right now this is only possible for the 'output_parameters' node.

A suggested first step to separate output nodes and quantities would be to turn the VaspParser.LINKNAME_DICT into VaspParser.output_node_definitions and then check whether quantity.nodeName is in that dictionary in order to determine whether that quantity should be attached to that output node.
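The suggested check could look roughly like the following sketch, which groups quantities by nodeName and keeps only node names present in a hypothetical output_node_definitions dictionary. Following the example file parser, nodeName is assumed to be a list; group_by_node is an illustrative name.

```python
from typing import Dict, List


def group_by_node(quantities: Dict[str, dict],
                  output_node_definitions: Dict[str, dict]) -> Dict[str, List[str]]:
    """Group quantity names by output node (hypothetical helper).

    A quantity is attached to a node only if one of its nodeNames is a key of
    output_node_definitions, so output nodes no longer need to be quantities.
    """
    nodes: Dict[str, List[str]] = {name: [] for name in output_node_definitions}
    for quantity, definition in quantities.items():
        for node_name in definition.get('nodeName', []):
            if node_name in output_node_definitions:
                nodes[node_name].append(quantity)
    return nodes
```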

Implement a way to control the storage behavior of output files

The requirement of avoiding long term storage of large output files is currently not met.

Add `_init_with_vasp_parser` to the `BaseFileParser`

Registering the file parser with the VaspParser is currently done within BaseFileParser.__init__. This could be moved to a new method, _init_with_vasp_parser, called by init_with_kwargs. The call to init_with_kwargs could then be moved into BaseFileParser.__init__, which would allow inheriting from BaseFileParser without defining an __init__, improving the VaspParser's extensibility.
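A minimal sketch of the proposed refactoring, using simplified stand-in classes rather than the aiida_vasp code. The vasp_parser keyword argument and the register method are hypothetical; the point is that ExampleFileParser no longer needs its own __init__.

```python
class ToyVaspParser:
    """Hypothetical stand-in that only records which file parsers registered."""

    def __init__(self):
        self.registered = []

    def register(self, file_parser):
        self.registered.append(file_parser)


class BaseFileParser:
    """Sketch of the proposed base class (simplified, hypothetical)."""

    def __init__(self, *args, **kwargs):
        # Moved here from the subclasses, so they no longer need an __init__.
        self.init_with_kwargs(**kwargs)

    def init_with_kwargs(self, **kwargs):
        vasp_parser = kwargs.get('vasp_parser')  # hypothetical keyword
        if vasp_parser is not None:
            self._init_with_vasp_parser(vasp_parser)

    def _init_with_vasp_parser(self, vasp_parser):
        # Registration with the VaspParser, previously done in __init__.
        self._vasp_parser = vasp_parser
        vasp_parser.register(self)


class ExampleFileParser(BaseFileParser):
    # Only the class-specific parts are defined; no __init__ is required.
    PARSABLE_ITEMS = {}
```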

Check compatibility with other plugins

In order to make the VaspParser compatible with other plugins, the content of 'output_parameters' should be checked and adjusted accordingly.

Check and align more complex data structures in the link nodes.

It has not yet been decided how to store multidimensional data in the AiiDA data structures, or how the parsers should agree on a common approach.

Implement a parser for stdout and stderr that triggers on known errors and warnings related to VASP

The next two points depend on this.

Implement a special output node for errors and warnings regarding known VASP failures

Implement constant monitoring of the errors and warnings from VASP failures

Improve the interface for what quantities should be parsed

The current implementation only allows controlling which quantities are parsed by means of VaspParser.add_parsable_quantity, VaspParser.add_quantity_to_parse and VaspParser.add_file_parser. To fulfill the requirements, there should be a way to do this by, e.g., providing a list of quantities in the 'parser_settings'.

From the `spec.output` definitions in workchain topography, configure the parser accordingly.

Collect `output_parameters` from different parsers

It should be possible to collect all output_parameters from whichever parsers the user requests. Currently, if we want to add results from both vasprun.xml and OUTCAR, we need to write both inside either the OUTCAR or the vasprun.xml parser, which is not clean.
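One possible approach, sketched under the assumption that each file parser returns its share of output_parameters as a plain dict: merge the dicts and refuse conflicting values instead of silently overwriting them, in line with the correctness requirement. collect_output_parameters is a hypothetical name, and raising on conflicts is one design choice among several.

```python
from typing import Dict, Iterable


def collect_output_parameters(parser_results: Iterable[Dict[str, object]]) -> Dict[str, object]:
    """Merge output_parameters contributions from several file parsers.

    Identical values for the same key are accepted; differing values raise,
    so that no result is silently dropped or overwritten.
    """
    merged: Dict[str, object] = {}
    for result in parser_results:
        for key, value in result.items():
            if key in merged and merged[key] != value:
                raise ValueError(f'conflicting values for {key!r}')
            merged[key] = value
    return merged
```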

Add `output_incar`

We need this in order to check calculations etc.

Consider changing the name of `output_parameters` to something like `output_properties`

This container should hold all kinds of trivial strings, scalars etc. for which a separate container does not make sense. The name 'parameters' is perhaps a bit misleading. This needs to be coordinated with aiida_core. Open a ticket on this.
