Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[WIP] Allow alternative file name of vasp_output #487

Open
wants to merge 1 commit into
base: develop
Choose a base branch
from
Open

[WIP] Allow alternative file name of vasp_output #487

wants to merge 1 commit into from

Conversation

atztogo
Copy link
Collaborator

@atztogo atztogo commented May 16, 2021

Please check the applicable boxes, thank you:

I, the author consider this PR

  • ready to be reviewed and merged (as soon as tests have passed)
  • work in progress (early feedback welcome)

Interactions with issues / other PRs

Related to issues #482 and PR #485.

Description

This can be independent PR from PR #485 to satisfy the issue #482.
Currently I shallow-copy _ALWAYS_RETRIEVE_LIST to a method local list and replace vasp_output in the list by metadata.options.output_filename, then use this list instead of _ALWAYS_RETRIEVE_LIST. In this case, _ALWAYS_RETRIEVE_LIST is not the list of files that are always retrieved, unfortunately. However, if this option is used only for immigration, since immigration is a special case, so this breaking of the context may be acceptable. But this will be widely used, then we need refactoring.

Questions

@espenfl, @zhubonan, could you tell me how I should write the test of the following update.

I updated aiida_vasp/calcs/vasp.py to accept metadata.options.output_filename and set this string as codes_info[0].stdout_name instead of the string vasp_output. I confirmed that my stdout_name file was retrieved after this update. What I don't understand are:

  1. which code handles parsing stdout_name.
  2. what values are parsed.
  3. where and how I should write the test.

@codecov
Copy link

codecov bot commented May 16, 2021

Codecov Report

Merging #487 (9ec7b8f) into develop (285152e) will decrease coverage by 0.04%.
The diff coverage is 63.64%.

Impacted file tree graph

@@             Coverage Diff             @@
##           develop     #487      +/-   ##
===========================================
- Coverage    74.06%   74.03%   -0.03%     
===========================================
  Files           55       55              
  Lines         5805     5813       +8     
===========================================
+ Hits          4299     4303       +4     
- Misses        1506     1510       +4     
Impacted Files Coverage Δ
aiida_vasp/calcs/vasp.py 91.81% <63.64%> (-1.91%) ⬇️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 285152e...9ec7b8f. Read the comment docs.

@atztogo
Copy link
Collaborator Author

atztogo commented May 21, 2021

@zhubonan and @espenfl, I have questions.
This PR aims to accept filename instead of vasp_output.
If I understand correctly, the stream parser parses vasp_output.

  1. Which data is attached by the stream parser? misc['notification']?
  2. When vasp_output doesn't exist, how does the stream parser behaves?
  3. How can I get any value of data (i.e., non emply list of misc['nitifications']) attached to calculation or workchain node?

@espenfl
Copy link
Collaborator

espenfl commented May 21, 2021

Before answering your questions I will just make a general comment. Currently we do this:

calcinfo.codes_info[0].stdout_name = self._VASP_OUTPUT
calcinfo.codes_info[0].join_files = True

Obviously, the self._VASP_OUTPUT is easy to replace, but if there are two output files, e.g. if VASP output have not been redirected into its own output file we need to utilize the scheduler stderror and stdout files. And we probably need to concatenate them before parsing.

If we could somehow prepare this before the immigration is done, that would maybe be ideal as then we could leave the current framework intact. E.g. the parser does not know if it is an immigrant calculation or not, which I believe is maybe a behaviour that is useful.

@espenfl
Copy link
Collaborator

espenfl commented May 21, 2021

1. which code handles parsing `stdout_name`.

In fact, this is now hardcoded in the parser config, which we should change. Or we need to make sure that particular file is present (can be a file object as we will move to this anyway, which is something to consider). See FILE_PARSER_SETS, which is iterated. In there vasp_output is listed.

2. what values are parsed.

This is handled by the Stream in parsevasp. Basically we look for anything that is defined here: https://github.com/aiida-vasp/parsevasp/blob/develop/parsevasp/stream.yml

3. where and how I should write the test.

We could just make another of the test_vasp calculation tests to check this or implicitly do it under immigrant. But I think it would be nice to have a dedicated test for this.

@espenfl
Copy link
Collaborator

espenfl commented May 21, 2021

Thanks a lot @atztogo for addressing this. I will try to give you some answers below.

  • Which data is attached by the stream parser? misc['notification']?

In the Stream parser of parsevasp we look for these entries: https://github.com/aiida-vasp/parsevasp/blob/develop/parsevasp/stream.yml (not complete). Multiple entries can also be tracked. This content of this is then stored in notifications. And that content is set in the StreamParser on the plugin side as: notifications.append({'name': item.shortname, 'kind': item.kind, 'message': item.message, 'regex': regex}).

  • When vasp_output doesn't exist, how does the stream parser behaves?

That file is on the always retrieve list currently, so that should fail if it is not there. We of course need the immigrator to work without such files being present as well.

  • How can I get any value of data (i.e., non emply list of misc['nitifications']) attached to calculation or workchain node?

The attachment is done in the parser like for the other nodes. That should sit in outputs.misc.notifications if it passed parsing and the node compose.

@atztogo
Copy link
Collaborator Author

atztogo commented May 21, 2021

Maybe the current behaviour is different from what you might think. I was thinking what I have to do is little, but it seems we have large degrees of freedom in the design. Probably we should discuss in meeting.

My last question is about how I can make the real calculation where the error message can be attached to notifications. I had a look at stream.yml and tried something easy. The last one (MAGMOM) didn't show error in my VASP calculation (v5.4.4). The second last about NSW, aiida-vasp crashed :). We may need more integrated tests for stream parser.

@espenfl
Copy link
Collaborator

espenfl commented May 21, 2021

Maybe the current behaviour is different from what you might think. I was thinking what I have to do is little, but it seems we have large degrees of freedom in the design. Probably we should discuss in meeting.

That can certainly be. Also, there might be something that still does not work as it should :)

My last question is about how I can make the real calculation where the error message can be attached to notifications. I had a look at stream.yml and tried something easy. The last one (MAGMOM) didn't show error in my VASP calculation (v5.4.4). The second last about NSW, aiida-vasp crashed :). We may need more integrated tests for stream parser.

That is interesting. Let us have a look at that. Do you want to raise an issue?

@atztogo
Copy link
Collaborator Author

atztogo commented May 21, 2021

Do you want to raise an issue?

We may be able to fix this one point. But I feel I should enlarge my scope to tackle this issue around stream. I has underestimated this issue. I will investigate more to make a starting point for the discussion (for which I will need @espenfl and @zhubonan's helps.) If you agree, I will once close this PR. How do you think?

@zhubonan
Copy link
Member

zhubonan commented May 21, 2021

Hmm, this is slightly more complicated than I think. The VaspCalculation._VASP_OUTPUT is not the only thing that needs to be made updatable at the run time so the stdout/stderr are diverted and the file is retrieved in the end, but also those defined for the parser

'vasp_output': {
'parser_class': StreamParser,
'is_critical': False,
'status': 'Unkonwn'
}

If we decide to use the output_filename as the name for the STDOUT file we need to make the parser dynamically change the FILE_PARSER_SETS or a least a copy of it by inspecting the output_filename stored in the attributed of the calculation node.

I personally actually set output_filename to OUTCAR so when I do verdi calcjob outputcat <pk> the OUTCAR is printed as the default, so I can grep some keywords easily. I probably did this initially coming from used to CASTEP which prints nothing to stdout.

When vasp_output doesn't exist, how does the stream parser behaves?

With the current develop branch the calculation is likely to not fail and the parsed misc['notifications'] should be an empty list. This is due to the Stream parser in parservasp will not parse the file is the filename is None (since the file does not exist), and by default list of parsed item it empty, giving an empty notifications.

This probably explains @atztogo 's finding that the stream parser does not catch the error, as it is still looking for the vasp_output which does not exists?

@espenfl
Copy link
Collaborator

espenfl commented May 21, 2021

With the current develop branch the calculation is likely to not fail and the parsed misc['notifications'] should be an empty list. This is due to the Stream parser in parservasp will not parse the file is the filename is None (since the file does not exist), and by default list of parsed item it empty, giving an empty notifications.

Yes, this should be fixed. It is in the always retrieved list by default. And if it is not there it should be a hard fail, or at least one should enter fallback mode. Of course, if the file, while other VASP files are for some strange reason, we might be able to salvage. I think we already did check that the files was in place (but maybe not an actual check after the copy). If vasp_output is empty is an entirely different manner of course and then notifications should be an empty list for sure.

@atztogo
Copy link
Collaborator Author

atztogo commented May 22, 2021

@zhubonan,

If we decide to use the output_filename as the name for the STDOUT file we need to make the parser dynamically change the FILE_PARSER_SETS...

Filenames are used only as the keys of parser_definitions. So for example we can write like below

    def _init_parser_definitions(self, file_parser_set, vasp_output_filename=None):
        """Load a set of parser definitions."""
        if file_parser_set not in FILE_PARSER_SETS:
            return
        for filename, parser_dict in FILE_PARSER_SETS.get(file_parser_set).items():
            _filename = filename
            if filename == 'vasp_output' and vasp_output_filename is not None:
                _filename = vasp_output_filename
            self._parser_definitions[_filename] = deepcopy(parser_dict)

I treat vasp_output_filename as a special one, though in principle, we can provide arbitrary file_parser_set to modify filenames.

@atztogo
Copy link
Collaborator Author

atztogo commented May 22, 2021

I personally actually set output_filename to OUTCAR so when I do verdi calcjob outputcat the OUTCAR is printed as the default, so I can grep some keywords easily. I probably did this initially coming from used to CASTEP which prints nothing to stdout.

This is a good idea and an independent issue. Currently the default output_filename is empty and alternative file name of vasp_output can be given not through it.

@atztogo
Copy link
Collaborator Author

atztogo commented May 22, 2021

Yes, this should be fixed. It is in the always retrieved list by default. And if it is not there it should be a hard fail, or at least one should enter fallback mode. Of course, if the file, while other VASP files are for some strange reason, we might be able to salvage. I think we already did check that the files was in place (but maybe not an actual check after the copy). If vasp_output is empty is an entirely different manner of course and then notifications should be an empty list for sure.

Do vasp_output have to be in ALWAYS_RETRIEVED_LIST?
If possible, I want to have a specification of the file and data flow as a diagram if possible, because I lost my way. Fallbacks or exceptions are important concepts, but if they are chained, it will cost us. Software design may ease it.

@zhubonan
Copy link
Member

Do vasp_output have to be in ALWAYS_RETRIEVED_LIST?

Yes, I mean the actual file that the stdout is diverted to should be in the ALWAYS_RETRIEVED_LIST.
So for it to work, the VaspCalculation must be able to set the name of the STDOUT name, and add it to the retrieve list and save both in the calcinfo. The parser also needs to be able to identify which file the STDOUT is diverted toat the time of parsing. These are the only two places that the name is (which defaults to vasp_output is used).

@atztogo
Copy link
Collaborator Author

atztogo commented May 23, 2021

Yes, I mean the actual file that the stdout is diverted to should be in the ALWAYS_RETRIEVED_LIST.

Is the term ALWAYS_RETRIEVED_LIST well defined? For example, I add an element fakefilename in ALWAYS_RETRIEVED_LIST, I get no error. So probably this class variable name ALWAYS_RETRIEVED_LIST is misleading.

@atztogo
Copy link
Collaborator Author

atztogo commented May 23, 2021

I think this PR to be merged, my propose of being done will be minimized to those:

  1. Create new Str input ports named stdout_name in VaspCalculation and VaspWorkChain (maybe the default is vasp_output).
  2. Set calcinfo.codes_info[0].stdout_name = self.inputs.stdout_name.value
  3. FILE_PARSER_SET['default'] is changed to an instance variable of ParserDefinitions and take care of stdout_name as a special case among other file name keys, and stdout_name is stored as a key of parser_definitions.
  4. vasp_output is removed from the list _ALWAYS_RETRIEVE_LIST, and stdout_name is added to calcinfo.retrieve_list in prepare_for_submission if store is True, otherwise to calcinfo.retrieve_temporary_list.

In this way, I don't need to consider the file & process handler chains and this PR to be well independent. If we don't like this, I will simply close this PR.

@zhubonan
Copy link
Member

Is the term ALWAYS_RETRIEVED_LIST well defined? For example, I add an element fakefilename in ALWAYS_RETRIEVED_LIST, I get no error. So probably this class variable name ALWAYS_RETRIEVED_LIST is misleading.

I think so. When calculation is prepared and passed to aiida, we attach a list of files to be retrieved. The ALWAYS_RETRIEVE_LIST contains the files that will always be passed, regardless of other settings. The user can also supply additional files to be retrieved (ADDITIONAL_RETRIEVE_LIST in settings). AiiDA will get the file if it exists, and if it does not, it will not raise any error - it would be the job of the parser plugin to do this. I guess that means that if a code does generate different files depending on the inputs, one can oversupply a large list so the files will be "relieved if they do exist". The behaviour you had is the expected one.

@zhubonan
Copy link
Member

I think this PR to be merged, my propose of being done will be minimized to those:

1. Create new `Str` input ports named `stdout_name` in `VaspCalculation` and `VaspWorkChain` (maybe the default is `vasp_output`).

2. Set `calcinfo.codes_info[0].stdout_name = self.inputs.stdout_name.value`

3. `FILE_PARSER_SET['default']` is changed to an instance variable of `ParserDefinitions` and take care of `stdout_name` as a special case among other file name keys, and `stdout_name` is stored as a key of `parser_definitions`.

4. `vasp_output` is removed from the list `_ALWAYS_RETRIEVE_LIST`, and `stdout_name` is added to `calcinfo.retrieve_list` in `prepare_for_submission` if `store` is True, otherwise to `calcinfo.retrieve_temporary_list`.

Sounds good to me!

One thing I would like to suggest is that you can just add the stdout_name as one of the metadata.options of the VaspCalculation (same as the output_filename currently) and make the default to be vasp_output. This way, the field is stored just like other options inside the attributes of the CalculationNode, and not directly on the provenance graph as a single node. Otherwise, all calculation/workchains will have this input node. For workchains, one just has to pass the desired value in the options input port if needed, similar to other fields such as account etc.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants