Wrong typecasting in records #766

Open
chrisiacovella opened this issue Oct 5, 2023 · 1 comment
Describe the bug

As I mentioned in the meeting the other day, I came across what I think are a few bugs in the records for the SPICE single point datasets on the ml server. They seem to specifically affect "spec_6" data, for the following properties:

current energy <class 'str'>
dispersion correction energy <class 'str'>
2-body dispersion correction energy <class 'str'>
b3lyp-d3(bj) dispersion correction energy <class 'str'>

For this dataset, it appears that those 4 properties all store the same energy (identical to 'return_energy', which is properly typed as a float). I'll note that the list-valued fields (e.g., the fields related to gradients) are correctly constructed of floats.
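
A minimal sketch of the pattern described above, using a hypothetical properties dict rather than a real record. The helper name `string_typed_properties` and all numeric values are made up for illustration and are not taken from the server.

```python
def string_typed_properties(properties):
    """Return the keys whose values are stored as str instead of a number."""
    return [key for key, value in properties.items() if isinstance(value, str)]


# Hypothetical properties dict mimicking the reported pattern;
# the numeric values are invented for illustration only.
example_properties = {
    "return_energy": -40.5,
    "current energy": "-40.5",
    "dispersion correction energy": "-40.5",
    "current gradient": [0.0, 0.1, -0.1],
}

bad_keys = string_typed_properties(example_properties)
print(bad_keys)

# Check that every string value equals return_energy once parsed as a float.
matches = all(
    float(example_properties[key]) == example_properties["return_energy"]
    for key in bad_keys
)
print(matches)
```

The list-valued gradient field is skipped because only top-level `str` values are flagged, which matches what was observed.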

The following datasets have this issue for spec_6:

SPICE Solvated Amino Acids Single Points Dataset v1.0 spec_6
SPICE DES Monomers Single Points Dataset v1.0 spec_6
SPICE PubChem Set 1 Single Points Dataset v1.0 spec_6
SPICE Dipeptides Single Points Dataset v1.0 spec_6
SPICE PubChem Set 2 Single Points Dataset v1.0 spec_6
SPICE PubChem Set 3 Single Points Dataset v1.0 spec_6
SPICE PubChem Set 5 Single Points Dataset v1.0 spec_6
SPICE PubChem Set 6 Single Points Dataset v1.0 spec_6
SPICE PubChem Set 1 Single Points Dataset v1.1 spec_6
SPICE DES Monomers Single Points Dataset v1.1 spec_6
SPICE Dipeptides Single Points Dataset v1.1 spec_6
SPICE Pubchem Set 4 Single Points Dataset v1.0 spec_6
SPICE Solvated Amino Acids Single Points Dataset v1.1 spec_6
SPICE DES370K Single Points Dataset v1.0 spec_6
SPICE PubChem Set 1 Single Points Dataset v1.2 spec_6
SPICE Dipeptides Single Points Dataset v1.2 spec_6
SPICE DES370K Single Points Dataset Supplement v1.0 spec_6
SPICE PubChem Set 2 Single Points Dataset v1.2 spec_6
SPICE PubChem Set 3 Single Points Dataset v1.2 spec_6
SPICE Pubchem Set 4 Single Points Dataset v1.2 spec_6
SPICE PubChem Set 5 Single Points Dataset v1.2 spec_6
SPICE Ion Pairs Single Points Dataset v1.0 spec_6
SPICE PubChem Set 6 Single Points Dataset v1.2 spec_6
SPICE Ion Pairs Single Points Dataset v1.1 spec_6

To Reproduce

A quick script that loops over everything:

from qcportal import PortalClient

client = PortalClient("ml.qcarchive.molssi.org")
dataset_type = "singlepoint"

# Collect the names of all SPICE single point datasets on the server.
datasets = client.list_datasets()

datasets_to_consider = []
for dataset in datasets:
    if dataset['dataset_type'] == dataset_type:
        if 'SPICE' in dataset['dataset_name']:
            datasets_to_consider.append(dataset['dataset_name'])

spec = 'spec_6'
for dataset_name in datasets_to_consider:
    # Fetch the dataset for the current loop iteration.
    ds = client.get_dataset(
        dataset_type=dataset_type, dataset_name=dataset_name
    )

    entry_names = ds.entry_names

    # Only the first entry of each dataset is checked.
    max_val = 1

    # The record object is the third element of each yielded item.
    for record in ds.iterate_records(entry_names[0:max_val], specification_names=[spec]):
        has_strings = False
        properties = record[2].dict()['properties']
        for k in properties.keys():
            if isinstance(properties[k], str):
                has_strings = True
                #print(k, type(properties[k]))
        if has_strings:
            print(f'{dataset_name} {spec}')


bennybp commented Oct 6, 2023

This seems to apply only to DFTD3 calculations, where the values are converted to strings: https://github.com/MolSSI/QCEngine/blob/1b27a14255817f13092ae846593b0fb7c975625b/qcengine/programs/dftd3.py#L273C41-L273C41

@loriab is looking to clean that up in qcengine soon. I can convert the existing values in the database next week.

(The DFTD3 calculations come from specifying b3lyp-d3 calculations. In the legacy version, this caused two separate records/specifications to be created - one for b3lyp and one for the d3 correction. The new version makes these existing records explicit, but no longer does the splitting for new calculations. It's a bit complicated...)
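
Until the stored values are converted, a client-side workaround could look like the sketch below. It is only an illustration and not how the database conversion or the QCEngine cleanup would be done; `coerce_numeric_strings` is a hypothetical helper, and the example dict and its values are invented.

```python
def coerce_numeric_strings(properties):
    """Return a copy of properties with numeric strings converted to float.

    Values that are not strings, and strings that do not parse as a number,
    are left unchanged.
    """
    fixed = {}
    for key, value in properties.items():
        if isinstance(value, str):
            try:
                fixed[key] = float(value)
            except ValueError:
                # Not a numeric string; keep the original value.
                fixed[key] = value
        else:
            fixed[key] = value
    return fixed


# Hypothetical input; "note" is a made-up key holding a non-numeric string.
props = {
    "return_energy": -1.25,
    "dispersion correction energy": "-1.25",
    "note": "n/a",
}
fixed = coerce_numeric_strings(props)
print(fixed)
```

The input dict is not modified, so the original record contents stay available for comparison.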
