Unify metadata #1274

Open
shivupa opened this issue Sep 4, 2023 · 5 comments

@shivupa
Contributor

shivupa commented Sep 4, 2023

I'm more than happy to improve the spec and/or the [`avogadro-cclib` plugin](https://github.com/OpenChemistry/avogadro-cclib).

One thing I note is that the metadata is really inconsistent across parsers:

  • Gaussian gives functional but Orca and GAMESS do not
  • Orca gives me a list of all keywords
  • NWChem gives me 'functional': 'Becke', for dvb_dispersion_bp86_d3zero
  • GAMESS gives me 'methods': ['RHF'] for dvb_dispersion_bp86_d3zero
  • etc.

I'd really like to be able to read an output file and then re-use those methods / keywords in the input generators.

Originally posted by @ghutchis in #1269 (comment)

@ghutchis
Contributor

ghutchis commented Sep 4, 2023

For example, it might be nice to separate dispersion from the functional:

'functional' : 'bp86',
'dispersion' : 'd3zero',
'methods': 'dft',

I like the idea of a string repeating all the keywords, since that may be useful to other programs / validation, etc.
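
A minimal sketch of that split, for illustration only. The helper name `split_dispersion`, the suffix list, and the `keywords` key are assumptions made here, not cclib API or an agreed spec:

```python
# Hypothetical helper, not part of cclib: split a combined method label
# such as "bp86-d3zero" into separate functional and dispersion entries,
# while keeping the untouched original string under "keywords".
KNOWN_DISPERSION = ("d3zero", "d3bj", "d4", "d3", "d2")  # illustrative, not exhaustive


def split_dispersion(label: str) -> dict:
    lowered = label.lower()
    for suffix in KNOWN_DISPERSION:
        for sep in ("-", "_"):
            tail = sep + suffix
            if lowered.endswith(tail):
                return {
                    "functional": lowered[: -len(tail)],
                    "dispersion": suffix,
                    "methods": "dft",
                    "keywords": label,
                }
    # No recognised dispersion suffix: report the whole label as the functional.
    return {
        "functional": lowered,
        "dispersion": None,
        "methods": "dft",
        "keywords": label,
    }
```

With this sketch, `split_dispersion("bp86-d3zero")` yields the three fields from the example above plus the original string, and a label without a known suffix comes back with `dispersion` set to `None`.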

@berquist berquist added this to the v1.8.1 milestone Sep 5, 2023
@berquist berquist self-assigned this Sep 5, 2023
@oliver-s-lee
Contributor

+1 for this whole endeavour, I think metadata is long overdue some loving attention :)

I like the idea of the keywords line, particularly because it doesn't require any strenuous parsing on cclib's side. Is there any desire to do anything more with keywords (splitting into separate keywords with their associated options, standardising short and long names for Gaussian etc)?

Dispersion should definitely be separate IMHO. We may also want to look at standardising functional names like we do for symmetry labels, e.g. to deal with PBE0 vs PBE1PBE. Relatedly, and because it came up before: we don't currently parse functionals for ORCA because, weirdly, it doesn't report the functional using its common name anywhere.

This is ORCA's way of reporting PBE0:

------------
SCF SETTINGS
------------
Hamiltonian:
 Density Functional     Method          .... DFT(GTOs)
 Exchange Functional    Exchange        .... PBE
   PBE kappa parameter   XKappa         ....  0.804000
   PBE mue parameter    XMuePBE         ....  0.219520
 Correlation Functional Correlation     .... PBE
   PBE beta parameter  CBetaPBE         ....  0.066725
 LDA part of GGA corr.  LDAOpt          .... PW91-LDA
 Gradients option       PostSCFGGA      .... off
 Hybrid DFT is turned on
   Fraction HF Exchange ScalHFX         ....  0.250000
   Scaling of DF-GGA-X  ScalDFX         ....  0.750000
   Scaling of DF-GGA-C  ScalDFC         ....  1.000000
   Scaling of DF-LDA-C  ScalLDAC        ....  1.000000
   Perturbative correction              ....  0.000000
   Density functional embedding theory  .... OFF
   NL short-range parameter             ....  6.900000

And B3LYP:

------------
SCF SETTINGS
------------
Hamiltonian:
 Density Functional     Method          .... DFT(GTOs)
 Exchange Functional    Exchange        .... B88
   X-Alpha parameter    XAlpha          ....  0.666667
   Becke's b parameter  XBeta           ....  0.004200
 Correlation Functional Correlation     .... LYP
 LDA part of GGA corr.  LDAOpt          .... VWN-5
 Gradients option       PostSCFGGA      .... off
 Hybrid DFT is turned on
   Fraction HF Exchange ScalHFX         ....  0.200000
   Scaling of DF-GGA-X  ScalDFX         ....  0.720000
   Scaling of DF-GGA-C  ScalDFC         ....  0.810000
   Scaling of DF-LDA-C  ScalLDAC        ....  1.000000
   Perturbative correction              ....  0.000000
   Density functional embedding theory  .... OFF
   NL short-range parameter             ....  4.800000
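
To make the difficulty concrete, here is a rough sketch of how a common name could be recovered from such a block. It is not how cclib parses ORCA output: `guess_functional` and the `COMMON_NAMES` table are hypothetical, and the table covers only the two examples above. Other functionals may well share the same exchange/correlation/HF-fraction triple, so a real lookup would probably need more fields (such as the LDA part).

```python
import re

# Matches lines such as " Exchange Functional    Exchange        .... PBE"
_SETTING = re.compile(r"^\s*(?P<label>.+?)\s+\.{4}\s+(?P<value>\S+)\s*$")

# Illustrative lookup only: (exchange, correlation, HF fraction) -> common name.
COMMON_NAMES = {
    ("PBE", "PBE", 0.25): "PBE0",
    ("B88", "LYP", 0.20): "B3LYP",
}


def guess_functional(block: str):
    """Return a common functional name for an SCF SETTINGS block, or None."""
    exchange = correlation = None
    hf_fraction = 0.0  # pure functionals print no "Fraction HF Exchange" line
    for line in block.splitlines():
        match = _SETTING.match(line)
        if match is None:
            continue
        label, value = match.group("label"), match.group("value")
        if label.startswith("Exchange Functional"):
            exchange = value
        elif label.startswith("Correlation Functional"):
            correlation = value
        elif label.startswith("Fraction HF Exchange"):
            hf_fraction = float(value)
    return COMMON_NAMES.get((exchange, correlation, round(hf_fraction, 2)))
```

Anything not in the table returns `None`, which is the main weakness of this approach: every functional has to be fingerprinted by hand.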

@ghutchis
Contributor

ghutchis commented Sep 5, 2023

As far as parsing functionals, I think it's easiest to do from the keywords line in Orca. I'm doing that now anyway.

I personally wouldn't attempt to standardize keywords. Honestly, a common use-case is "I want to re-run this calculation."

@oliver-s-lee
Contributor

Yeah I think parsing from the keywords is probably easier, but the downside is there's no way of automatically determining what's a functional name and what's a normal keyword. Do you just compare against a whitelist to extract the functional name?

Restarting calculations is a cool use-case, and yes, in that case there's no need to transform keywords. One thing worth considering when we look to implement this is 'keywords' that appear in weird places. E.g. Gaussian has a few options that have to appear after the geometry section, such as the ModRedun and gen/genECP sections.

@ghutchis
Contributor

ghutchis commented Sep 7, 2023

> whitelist to extract the functional name?

For now, yes. I'm open to better suggestions, but IMHO it's possible to cover a large percentage of cases with this, since a few functionals are the most popular.
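
A rough sketch of what such a whitelist could look like, assuming a hypothetical `find_functional` helper and a deliberately tiny alias table (neither is existing cclib or Avogadro code). The alias entries also cover the PBE0 vs PBE1PBE naming mentioned earlier in the thread:

```python
# Illustrative whitelist: lower-case keyword or alias -> one canonical name.
FUNCTIONAL_ALIASES = {
    "b3lyp": "B3LYP",
    "pbe0": "PBE0",
    "pbe1pbe": "PBE0",  # Gaussian's keyword for PBE0
    "bp86": "BP86",
    "pbe": "PBE",
}


def find_functional(keyword_line: str):
    """Return the first whitelisted functional on a keyword line, or None."""
    # Drop the leading "!" (ORCA) or "#" (Gaussian) and split "method/basis" pairs.
    tokens = keyword_line.lstrip("!#").replace("/", " ").split()
    for token in tokens:
        name = FUNCTIONAL_ALIASES.get(token.lower())
        if name is not None:
            return name
    return None
```

For example, `find_functional("! PBE0 def2-SVP D3BJ")` and `find_functional("#p PBE1PBE/6-31G(d) opt")` both return `"PBE0"`, while a line with no whitelisted token returns `None`. The raw keyword string can still be stored unchanged for the "re-run this calculation" use-case.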

@berquist berquist modified the milestones: v1.8.1, v2.0 Dec 19, 2023