Some datasets have placeholder spec names after migration #739

Open
j-wags opened this issue Aug 31, 2023 · 7 comments

j-wags commented Aug 31, 2023

Describe the bug

Some datasets have had their spec names replaced by strings like spec_1, spec_2, etc. on the new QCA server.

To Reproduce

In QCFractal 0.15:

from qcportal import FractalClient
client = FractalClient()
ds = client.get_collection("Dataset", "OpenFF Theory Benchmarking Single Point Energies v1.0")
spec_names = [h[4] for h in ds.data.history]
print(spec_names)

outputs

['b3lyp-d3bj/def2-qzvp', 'b3lyp-d3bj/6-311+g**', 'wb97m-d3bj/dzvp', 'm05-2x-d3/dzvp', 'default', 'b97-d3bj/def2-tzvp', 'm08-hx-d3/dzvp', 'dsd-blyp-d3bj/heavy-aug-cc-pvtz', 'pw6b95-d3/dzvp', 'b3lyp-d3bj/def2-tzvpd', 'gfn1xtb', 'wb97m-v/dzvp', 'pw6b95-d3bj/dzvp', 'wb97m-d3bj/dzvp', 'm06-2x-d3/dzvp', 'pw6b95-d3bj/dzvp', 'mp2/heavy-aug-cc-pv(t+d)z', 'b3lyp-d3bj/def2-tzvp', 'wb97x-d3bj/dzvp', 'b3lyp-d3bj/6-31+g**', 'mp2/aug-cc-pvtz', 'b3lyp-nl/dzvp', 'gfn2xtb', 'b3lyp-d3bj/6-311+g**', 'b3lyp-d3bj/def2-tzvpp', 'b3lyp-d3bj/def2-tzvppd', 'm05-2x-d3/dzvp', 'm08-hx-d3/dzvp', 'wb97m-d3bj/dzvp', 'b97-d3bj/def2-tzvp', 'b3lyp-d3bj/def2-tzvppd', 'pw6b95-d3/dzvp', 'b3lyp-d3bj/def2-tzvpp', 'gfnff', 'b97-d3bj/def2-tzvp', 'b3lyp-d3mbj/dzvp', 'default', 'm06-2x-d3/dzvp', 'b3lyp-d3bj/def2-qzvp', 'wb97x-d3bj/dzvp', 'ani2x', 'df-ccsd(t)/cbs', 'b3lyp-d3bj/def2-tzvp', 'b3lyp-d3bj/6-31+g**', 'b3lyp-d3mbj/dzvp', 'b3lyp-d3bj/def2-tzvpd', 'dsd-blyp-d3bj/heavy-aug-cc-pvtz']

Using QCPortal 0.50, I believe the equivalent code is:

from qcportal import PortalClient
client = PortalClient()
ds = client.get_dataset("singlepoint", "OpenFF Theory Benchmarking Single Point Energies v1.0")
print(ds.specification_names)

which outputs

['spec_5', 'spec_26', 'spec_46', 'spec_47', 'spec_6', 'spec_14', 'spec_37', 'spec_41', 'spec_24', 'spec_19', 'spec_28', 'spec_11', 'spec_39', 'spec_38', 'spec_9', 'spec_45', 'spec_40', 'spec_44', 'spec_36', 'spec_8', 'spec_2', 'spec_34', 'spec_22', 'spec_27', 'spec_30', 'spec_33', 'spec_35', 'spec_7', 'spec_4', 'spec_43', 'spec_18', 'spec_13', 'spec_3', 'spec_10', 'spec_31', 'spec_15', 'spec_29', 'spec_21', 'spec_42', 'spec_16', 'spec_20', 'spec_12', 'spec_25', 'spec_23', 'spec_1', 'spec_17', 'spec_32']
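
To check quickly which names in a list like the one above are placeholders, a small filter along these lines may help. This is only a sketch: it assumes placeholders always have the form spec_ followed by digits, as in the output above, and `find_placeholder_names` is a hypothetical helper, not part of QCPortal.

```python
import re
from typing import Iterable, List

# Assumption: placeholder names always look like "spec_" followed by digits,
# matching the names printed by QCPortal 0.50 in this report.
PLACEHOLDER_RE = re.compile(r"spec_\d+")


def find_placeholder_names(names: Iterable[str]) -> List[str]:
    """Return the names that match the placeholder pattern, ordered by number."""
    hits = [n for n in names if PLACEHOLDER_RE.fullmatch(n)]
    return sorted(hits, key=lambda n: int(n.split("_")[1]))


example = ["spec_5", "b3lyp-d3bj/dzvp", "spec_26", "default", "spec_1"]
print(find_placeholder_names(example))  # ['spec_1', 'spec_5', 'spec_26']
```

With a real dataset, the input would be `ds.specification_names` from the snippet above.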

Additional notes
It seems like the spec lookup logic continued to work fine for the optimization and torsiondrive datasets in our testing, so this may just be a problem with migrating single point datasets.

bennybp commented Aug 31, 2023

IIRC, in the previous version the singlepoint datasets had 'aliases', but these referred only to sets of keywords rather than to a whole specification, so I had to use placeholders for the whole specification name.

Specifications can be renamed (ds.rename_specification()), so for formulaic specification names you can write a script that does this automatically. I can help with this if you would like.
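
As a rough illustration of what a "formulaic" name could look like, most of the 0.15 names in the report follow a method/basis pattern, with method-only names such as gfn2xtb where there is no basis. The sketch below builds a name in that style. `formulaic_name` is a hypothetical helper, and how the method and basis are read from each specification in QCPortal is not shown in this thread, so they are passed in as plain strings here.

```python
from typing import Optional


def formulaic_name(method: str, basis: Optional[str] = None) -> str:
    """Build a lowercase 'method/basis' name, or just 'method' if there is no basis."""
    method = method.strip().lower()
    if basis is None or not basis.strip():
        return method
    return f"{method}/{basis.strip().lower()}"


print(formulaic_name("B3LYP-D3BJ", "DZVP"))  # b3lyp-d3bj/dzvp
print(formulaic_name("gfn2xtb"))             # gfn2xtb
```

Note that the 0.15 list in the report repeats some names (for example, wb97m-d3bj/dzvp appears three times), so a name derived from method and basis alone may not be unique within a dataset.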

j-wags commented Aug 31, 2023

Thanks. Would you be open to a script that replaces the placeholder specifications for relevant OpenFF datasets on the central QCArchive, or should I do it client-side?

bennybp commented Aug 31, 2023

Client-side would be sufficient. Renaming a specification is fast, and it can handle doing a bunch of them.
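
A client-side batch rename might be sketched as below. It assumes the dataset object exposes `specification_names` (used earlier in this thread) and that `rename_specification` takes the old name followed by the new name; the exact signature is an assumption and should be checked against the installed QCPortal version. `apply_spec_renames` is a hypothetical helper name.

```python
from typing import Dict


def apply_spec_renames(ds, name_map: Dict[str, str]) -> Dict[str, str]:
    """Rename specifications on `ds` per `name_map`; return the renames applied.

    Assumes `ds.specification_names` lists the current names and that
    `ds.rename_specification(old_name, new_name)` accepts old and new names.
    """
    taken = set(ds.specification_names)
    applied: Dict[str, str] = {}
    for old, new in name_map.items():
        # Skip names that are not present, and targets that would collide
        # with a name already in use on the dataset.
        if old not in taken or new in taken:
            continue
        ds.rename_specification(old, new)
        taken.remove(old)
        taken.add(new)
        applied[old] = new
    return applied
```

Any object with those two members works, so the function can be tried on a stand-in object before it is pointed at a dataset returned by `client.get_dataset(...)`, where the renames would be written to the server.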

j-wags commented Sep 1, 2023

Ah, I may not have asked clearly - Are you willing to have these spec names changed in QCArchive itself, or do you plan to stick with the placeholder names? If it's the former I'll put together a script to do the renaming on QCA. If it's the latter I'll need to provide utilities within OpenFF to make our workflows continue working with the placeholder names.

bennybp commented Sep 1, 2023

Oh, feel free to rename them on the server itself. They are your datasets after all :)

j-wags commented Sep 1, 2023

Excellent - Thanks!

j-wags commented Sep 6, 2023

Update: I'll still take this on, but after I get QCSubmit updated!

@bennybp bennybp changed the title [next] Some datasets have placeholder spec names after migration Some datasets have placeholder spec names after migration Sep 14, 2023