Default Parser Spec
This document describes the class which inspects a finished VaspCalculation and aims to provide end-user-level, digested information about it, specifically about its results. It does so by parsing the files found in the retrieved folder (output link name: retrieved).
This document is meant as a base for discussing design changes, and is supposed to be updated along with changes on the implementation level.
- parser: An aiida.parsers.Parser instance, which is intended to parse a finished Calculation node
- file parser: Any code entity designed to be called by the VaspParser through a common interface, or directly, in order to parse quantities from an output or input file
- output file: Any file written or modified by VASP during the course of a simulation run
- input file: Any file used as an input to VASP
- quantity: A quantity which describes a physical property, or a collection of physical properties, of a system simulated using VASP. The goal of running VASP is always to calculate one or more quantities. VASP typically records the quantities it calculates in one or more output files
After obtaining the output files of VASP it will proceed to create database records of user requested quantities and attach them (through the existing aiida.parsers.Parser machinery) as output links to the calculation.
The default parser is called VaspParser. It must be loadable by the AiiDA ParserFactory as follows: ParserFactory('vasp.vasp').
The VaspParser must inherit from aiida.parsers.Parser. It does so via aiida_vasp.parsers.base.BaseParser, which abstracts away some of the common tasks parsers perform.
- In no way shall the default parser falsify or fabricate results
- The default parser will always strive to present results in an unambiguous fashion
- Metadata should be preserved if present.
- TODO: give an example for metadata (For new devs who don't know EMMC) and / or define 'metadata' in the Terminology section
- It must be possible to add parsing capability for output files or quantities without touching the VaspParser class's source code. The file in which it is defined should not be part of the modification. This ensures a low entry barrier for contributors and avoids common VCS conflicts.
  - Currently this is achieved by splitting off file parsers into separate classes and registering the file parsers, along with the quantities they parse, outside of the VaspParser class.
- It must be possible to pass details to the VaspCalculation, which will control the parser in the following ways:
  - Instruct the parser which output files are required and which are optional
  - Record which quantities are required and which are optional. From this the parser can conclude which output files are needed. For required quantities the parser shall fail if the output files are missing.
  - Instruct the parser to parse quantities from an output file other than the default one, if need be or if the user insists
  - The details shall be given as a parser_settings dictionary in the settings input of the VaspCalculation
- It must be easy to write custom parsers (by subclassing or by adhering to a method interface) which benefit from the existing file parsers.
- The default parser should wherever appropriate stick to the conventions set by the AiiDA-Quantumespresso plugin in terms of names, content and type of output nodes. If not, a coordinated effort should be pursued in order to synchronize further development.
- The default parser should record warnings about known types of VASP failures in a special output node, analogous to the QE plugin, such that (future) error handlers can decide on a restart or exit strategy. Constant monitoring of warnings and errors is also necessary in order to avoid excessive and unnecessary usage of computational resources.
- ... To that end capability to read output from an additional monitoring job which can be launched alongside VASP runs may be added. Such a monitoring script is planned to become part of AiiDA-VASP in the future.
The monitoring script should be separate from the parser, though it might reuse file parser code. Launching it alongside VASP is not something the parser can do at all (it must be done in VaspCalculation). Therefore the necessity to have such a script is not part of the VaspParser spec, whereas potentially interfacing with it is.
- When a quantity can be parsed from multiple output files, the default parser should always choose the most robust path by default. If several files are possible, the smallest in size should be prioritized, then the most efficient to parse (a measure is pending). The default must be overridable on a per VaspCalculation basis or in a custom parser
- Within the constraints given by the user the default parser must always try to avoid storage of large output files.
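The requirement that required quantities determine the required output files can be illustrated with a small sketch. It is not the VaspParser implementation: the settings keys required_quantities and optional_quantities, the QUANTITY_FILES table and the function files_needed are hypothetical names chosen for this illustration, and the quantity-to-file mapping is a placeholder.

```python
# Hypothetical mapping: quantity name -> output file that provides it.
QUANTITY_FILES = {
    'energies': 'OUTCAR',
    'structure': 'CONTCAR',
    'kpoints': 'IBZKPT',
}


def files_needed(parser_settings, retrieved_files):
    """Return the files to parse; fail if the file of a required quantity is missing."""
    needed = []
    for quantity in parser_settings.get('required_quantities', []):
        file_name = QUANTITY_FILES[quantity]
        if file_name not in retrieved_files:
            raise IOError('{} is required for {} but was not retrieved'.format(
                file_name, quantity))
        if file_name not in needed:
            needed.append(file_name)
    for quantity in parser_settings.get('optional_quantities', []):
        file_name = QUANTITY_FILES[quantity]
        # Optional quantities are skipped silently when their file is absent.
        if file_name in retrieved_files and file_name not in needed:
            needed.append(file_name)
    return needed


# Assumed layout: a 'parser_settings' dictionary inside the calculation settings.
settings = {'parser_settings': {'required_quantities': ['energies'],
                                'optional_quantities': ['kpoints']}}
needed_files = files_needed(settings['parser_settings'], ['OUTCAR', 'CONTCAR'])
```

Here only OUTCAR ends up in needed_files, because the optional kpoints quantity has no retrieved file to be parsed from.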
The current implementation meets the requirements as defined above in the following ways:
The VaspParser does not contain any file parsing capabilities directly. Instead that functionality is contained in separate classes, one per output file type. These classes reside in the aiida_vasp.io module.
In order to keep the VaspParser and the file parsers separated, the following items are required as the interface for parsing quantities:
- FileParser.PARSABLE_ITEMS: a static dictionary with definitions for the quantities this file parser can parse. It will be gathered by the VaspParser.
- FileParser.get_quantity(quantity, inputs): a method that can be subscribed to VaspParser.get_quantity.
- VaspParser.get_quantity(quantity, inputs): a method with the @delegate decorator to which the file parsers can subscribe. The VaspParser will call this method when it wants to parse a quantity. All file parsers check whether they can (and should, i.e. based on user priority) parse this quantity and if so return it. If none of the subscribed file parsers returns anything, the body of this method is called.
- VaspParser.get_inputs(quantity): a method that can be called by the file parsers in order to obtain another quantity that is required for parsing, without interacting with the other file parser directly.
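The subscription mechanism described in this list can be sketched in plain Python. This is only an illustration of the delegate idea under simplifying assumptions, not the decorator shipped with aiida_vasp: the delegate function, the subscribers attribute and both Sketch classes are hypothetical.

```python
import functools


def delegate(method):
    """Ask every subscriber first; run the method body only if none answers."""

    @functools.wraps(method)
    def wrapper(self, *args, **kwargs):
        for subscriber in self.subscribers[method.__name__]:
            result = subscriber(*args, **kwargs)
            if result is not None:
                return result
        # No subscribed file parser returned anything: fall back to the body.
        return method(self, *args, **kwargs)

    return wrapper


class SketchVaspParser(object):
    """Stand-in for the VaspParser side of the interface."""

    def __init__(self):
        self.subscribers = {'get_quantity': []}

    @delegate
    def get_quantity(self, quantity, inputs):
        # Reached only when no file parser could provide the quantity.
        return None


class SketchFileParser(object):
    """Stand-in for a file parser that subscribes itself on construction."""

    PARSABLE_ITEMS = {'item1': {}}

    def __init__(self, vasp_parser):
        vasp_parser.subscribers['get_quantity'].append(self.get_quantity)

    def get_quantity(self, quantity, inputs):
        if quantity not in self.PARSABLE_ITEMS:
            return None
        return {quantity: 42}  # placeholder value instead of real file parsing


parser = SketchVaspParser()
file_parser = SketchFileParser(parser)
found = parser.get_quantity('item1', {})
missing = parser.get_quantity('unknown', {})
```

The point of the pattern is that SketchVaspParser never references a concrete file parser class; it only sees whatever callables have subscribed.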
The file parser side of the interface can be inherited from io.parser.BaseFileParser, which provides a get_quantity method and takes care of adding it to the VaspParser's delegate.
The intention of the BaseFileParser base class is to allow implementing a new file parser with as little work as possible. Below is an example of a minimal setup. Two important things have to be overridden: which quantities can be parsed, in ExampleFileParser.PARSABLE_ITEMS, and how those quantities are parsed from the file, in ExampleFileParser._parse_file().
import re

import py

from aiida_vasp.utils.aiida_utils import get_data_class
from aiida_vasp.io.parser import BaseFileParser


class ExampleFileParser(BaseFileParser):

    PARSABLE_ITEMS = {
        # The name of a quantity as key. It should be unique among all of the FileParsers.
        'item1': {
            # This quantity will be parsed first and made available in time if possible
            'inputs': ['required_quantity'],
            # During setup the VaspParser will check, whether ExampleFile has been
            # retrieved and initialise the corresponding parser, if this quantity is
            # requested by setting any of the 'parser_settings['add_OutputNode'] = True'.
            'parsers': ['ExampleFile'],
            # The quantity will be added to the 'output_examples' output node
            'nodeName': ['examples'],
            # This prohibits the parser from trying to parse item1 without the
            # ``required_quantity``.
            'prerequisites': ['required_quantity'],
            # (Optional) If a quantity can be parsed from more than one file, a list of
            # alternative quantities can be provided here.
            'alternatives': ['alternative_quantity1', ...],
            # (Optional) If this quantity is an alternative to another_quantity set this
            # flag. The VaspParser will automatically add this quantity to the
            # alternatives of ``another_quantity``.
            'is_alternative': 'another_quantity',
        },
        'item2': {
            'inputs': [],
            'parsers': ['ExampleFile'],
            'nodeName': ['examples'],
            'prerequisites': [],
        },
        # An example for a quantity representing an ``output_node``, that should be
        # attached to the VaspCalculation. At the moment quantities with ``name`` ==
        # ``nodeName`` are considered as representing output_nodes.
        'examples': {
            'inputs': ['item1', 'item2'],
            'parsers': ['ExampleFile'],
            'nodeName': ['examples'],
            'prerequisites': [],
        },
    }

    def __init__(self, *args, **kwargs):
        super(ExampleFileParser, self).__init__(*args, **kwargs)
        self.init_with_kwargs(**kwargs)

    def _parse_file(self, inputs):
        # self._data_obj will be set during init.
        example_file = py.path.local(self._data_obj.path)
        data = example_file.read()
        # extract item 1
        item1 = int(re.findall(r'item1 is: (\d+)', data)[0]) * inputs['required_quantity']
        # extract list of item2
        item2 = [int(i) for i in re.findall(r'item2: (\d+)', data)]
        # construct ParameterData node
        output_node = get_data_class('parameter')(dict={
            'item1': item1,
            'item2': item2,
        })
        # each of the ``PARSABLE_ITEMS``s from above must be a key in the returned dict
        return {'examples': output_node}
If the write() method of the ExampleFileParser is to be used and the ExampleFileParser has been initialized with AiiDA data other than SinglefileData, _init_with_data() and _parsed_obj have to be overridden as well.
File parsers can be added or replaced at run-time by using the interface VaspParser.add_file_parser(file_name, parser_definition), where file_name is the name of the output file this parser is supposed to operate on and parser_definition must contain the following keys:
- parser_cls: the reference to the file parser class that the VaspParser should instantiate.
- is_critical: a bool that controls whether the parsing should be aborted with an error message if the file corresponding to the file parser has not been retrieved.
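As a rough illustration of the shape of parser_definition and of what is_critical implies, consider the following sketch. It uses a module-level registry and free functions instead of the real VaspParser.add_file_parser method; SketchOutcarParser, file_parser_definitions as a plain dictionary, missing_critical_files and the file name EXAMPLE are all hypothetical.

```python
class SketchOutcarParser(object):
    """Stand-in for a real file parser class."""


# Hypothetical registry: output file name -> parser definition.
file_parser_definitions = {}


def add_file_parser(file_name, parser_definition):
    """Register or replace the definition for one output file."""
    for key in ('parser_cls', 'is_critical'):
        if key not in parser_definition:
            raise KeyError('parser_definition is missing the key ' + key)
    file_parser_definitions[file_name] = parser_definition


def missing_critical_files(retrieved_files):
    """List the critical files that were not retrieved."""
    return sorted(
        name for name, definition in file_parser_definitions.items()
        if definition['is_critical'] and name not in retrieved_files)


add_file_parser('OUTCAR', {'parser_cls': SketchOutcarParser, 'is_critical': True})
add_file_parser('EXAMPLE', {'parser_cls': SketchOutcarParser, 'is_critical': False})
missing = missing_critical_files(['vasprun.xml'])
```

A non-empty result would be the condition under which parsing is aborted with an error message; a missing non-critical file such as EXAMPLE is simply ignored.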
Some quantities can be parsed from more than one output file, and the VaspParser has to decide which quantity to parse based on user input and priorities. In order to keep the process of adding and replacing file parsers as well as quantities simple, the system must be flexible. This is achieved by:
- Requiring that the quantity names are unique. The VaspParser can then decide which of the available quantities to parse and which specific file is needed. The naming should follow fileName_quantity.
- Defining one of the equivalent quantities as the main quantity by setting the 'alternatives' list in the definition of that quantity. The VaspParser then checks which of the alternative quantities can be parsed based on the available files. This is intended as a flexible way of assigning a priority to each individual quantity.
- Quantities that are an alternative to another quantity can also be marked as such by setting 'is_alternative': another_quantity in the definition of a quantity. The quantity will then be automatically added to the 'alternatives' list of another_quantity. If another_quantity does not exist, a dummy quantity that cannot be parsed will be created. This is intended as a way to update the priority order of a certain quantity without modifying the source code of the file parser that originally defines another_quantity.
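The handling of 'alternatives' and 'is_alternative' can be sketched as two small functions. This is an assumption-laden illustration rather than the VaspParser code: link_alternatives, choose_quantity and all quantity names are hypothetical, and a quantity is treated as parsable as soon as one of the files in its 'parsers' list has been retrieved.

```python
def link_alternatives(definitions):
    """Add quantities flagged with 'is_alternative' to the main quantity's list."""
    for name in sorted(definitions):
        main = definitions[name].get('is_alternative')
        if main is None:
            continue
        if main not in definitions:
            # Dummy main quantity: it has no file parser, so it can never be parsed.
            definitions[main] = {'parsers': [], 'alternatives': []}
        alternatives = definitions[main].setdefault('alternatives', [])
        if name not in alternatives:
            alternatives.append(name)
    return definitions


def choose_quantity(main, definitions, retrieved_files):
    """Return the main quantity or the first alternative whose file is available."""
    candidates = [main] + definitions[main].get('alternatives', [])
    for candidate in candidates:
        parsers = definitions[candidate]['parsers']
        if any(file_name in retrieved_files for file_name in parsers):
            return candidate
    return None


definitions = link_alternatives({
    'vasprun_energies': {'parsers': ['vasprun.xml'], 'alternatives': []},
    'outcar_energies': {'parsers': ['OUTCAR'], 'is_alternative': 'vasprun_energies'},
    'example_volume': {'parsers': ['EXAMPLE'], 'is_alternative': 'volume'},
})
chosen = choose_quantity('vasprun_energies', definitions, ['OUTCAR'])
```

With only OUTCAR retrieved, the main quantity vasprun_energies cannot be parsed and the sketch falls back to outcar_energies. The entry volume is created as a dummy because no file parser defines it.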
Which output nodes will be added to the VaspCalculation can be controlled by setting 'add_<nodeName>': True in the 'parser_settings' card of VaspCalculation.settings. By default 'structure', 'parameters', 'energies' and 'kpoints' will be added.
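A minimal sketch of how such 'add_<nodeName>' flags could be resolved against the defaults is given here. The function nodes_to_add, the node name bands and the value given for file_parser_set are hypothetical, and treating a False flag as switching off a default node is an assumption of this sketch, not something the spec states.

```python
# Defaults as listed in the text above.
DEFAULT_NODES = ('structure', 'parameters', 'energies', 'kpoints')


def nodes_to_add(parser_settings):
    """Return the names of the output nodes to attach, sorted alphabetically."""
    selected = {name: True for name in DEFAULT_NODES}
    for key, value in parser_settings.items():
        if key.startswith('add_'):
            # Assumption: an explicit False disables a node that is on by default.
            selected[key[len('add_'):]] = bool(value)
    return sorted(name for name, wanted in selected.items() if wanted)


nodes = nodes_to_add({
    'add_bands': True,
    'add_kpoints': False,
    'file_parser_set': 'default',  # unrelated keys are ignored by this sketch
})
```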
Which output files are required or optional can be controlled by the 'is_critical' flag in parser.file_parser_definitions. If a file marked as 'is_critical' has not been retrieved, parsing will be aborted with success = False.
Which set of file parsers will be loaded from 'parsers.file_parser_definitions' can be controlled by 'file_parser_set' in 'parser_settings'.
In case a quantity can be parsed from more than one output file, the main quantity, i.e. the one that carries the 'alternatives' list, will be chosen. This implicitly determines from which file it will be parsed. If the main quantity cannot be parsed, the entries of the 'alternatives' list will be checked in order and the first one that can be parsed will be used.
To override this default, either the file parser containing the main quantity has to be replaced or the main quantity has to be overridden by VaspParser.add_parsable_quantity(...).
The VaspParser currently conforms with the names and general content of output nodes as defined in VaspParser.LINKNAME_DICT, with the exception of output_parameters.
If a quantity can be parsed from more than one file, all of the equivalent quantities will be ordered by the 'alternatives' list on the main quantity. This list should reflect the above-mentioned criteria: robustness, size and efficiency of parsing.
The following issues will have to be addressed in order to bring the VaspParser closer to meeting all of the requirements.
In the current implementation output nodes are quantities for which quantity.name == quantity.nodeName. But output nodes are in fact similar to output files in the sense that they can contain one or more quantities. Separating output nodes from quantities will improve the VaspParser in the following ways:
- the code determining whether a quantity is an output node becomes obsolete, improving readability and extensibility.
- the quantities that will be assigned to an output node could be customized. Right now this is only possible for the 'output_parameters' node.
A suggested first step to separate output nodes and quantities would be to turn the VaspParser.LINKNAME_DICT into VaspParser.output_node_definitions and then check whether quantity.nodeName is in that dictionary in order to determine whether that quantity should be attached to that output node.
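One way to picture this suggested step is the sketch below, which groups quantities by the output node they name. Since output_node_definitions does not exist yet, its contents here are hypothetical, as are group_by_output_node and the quantity names; nodeName is taken to be a list, as in the file parser example.

```python
def group_by_output_node(quantity_definitions, output_node_definitions):
    """Collect, for every defined output node, the quantities that name it."""
    grouped = {node: [] for node in output_node_definitions}
    for name in sorted(quantity_definitions):
        for node in quantity_definitions[name].get('nodeName', []):
            # Quantities naming an undefined node are not attached anywhere.
            if node in grouped:
                grouped[node].append(name)
    return grouped


# Hypothetical successor of LINKNAME_DICT: node name -> output link name.
output_node_definitions = {
    'parameters': 'output_parameters',
    'kpoints': 'output_kpoints',
}
quantities = {
    'fermi_level': {'nodeName': ['parameters']},
    'total_energy': {'nodeName': ['parameters']},
    'kpoint_mesh': {'nodeName': ['kpoints']},
    'scratch_value': {'nodeName': ['unknown']},
}
grouped = group_by_output_node(quantities, output_node_definitions)
```

With a lookup like this, no quantity needs to share its name with an output node in order to be recognised as belonging to one.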
The requirement of avoiding long term storage of large output files is currently not met.
Registering the file parser with the VaspParser is currently done within BaseFileParser.__init__. This could be moved to a new method _init_with_vasp_parser that will be called by init_with_kwargs. init_with_kwargs could then be moved to BaseFileParser.__init__, which will allow inheriting from BaseFileParser without specifying an __init__, improving the VaspParser's extensibility.
In order to make the VaspParser compatible with other plugins, the content of 'output_parameters' should be checked and adjusted accordingly.
It has not yet been decided how to store multidimensional data in the AiiDA data structures and how the parsers should agree on this.
The two next points need this.
The current implementation only allows controlling which quantities will be parsed by means of VaspParser.add_parsable_quantity, VaspParser.add_quantity_to_parse and VaspParser.add_file_parser. In order to fulfill the requirements, there should be a way to do this by e.g. providing a list of quantities in the 'parser_settings'.
It should be possible to collect all output_parameters from whatever parsers the user requests. Currently, if we want to add results from both vasprun.xml and OUTCAR, we need to write both inside either the OUTCAR or the vasprun.xml parser. This is not so clean.
We need this in order to check calculations etc.
This container should contain all kinds of trivial strings, scalars etc. that do not make sense to put into their own container. The name parameters is maybe a bit misleading. This needs to be coordinated with aiida_core. Open a ticket on this.