Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

QChem Adapter & Molpro Updates #712

Open
wants to merge 31 commits into
base: main
Choose a base branch
from
Open

QChem Adapter & Molpro Updates #712

wants to merge 31 commits into from

Conversation

calvinp0
Copy link
Member

This PR includes QChem as an adapter and also provides updates to the Molpro adapter

Copy link

@github-advanced-security github-advanced-security bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

CodeQL found more than 10 potential problems in the proposed changes. Check the Files changed tab for more details.

arc/scheduler.py Fixed Show fixed Hide fixed
Copy link

codecov bot commented Nov 14, 2023

Codecov Report

Attention: 224 lines in your changes are missing coverage. Please review.

Comparison is base (1f561ed) 73.81% compared to head (f486220) 73.52%.
Report is 3 commits behind head on main.

❗ Current head f486220 differs from pull request most recent head 5e4b433. Consider uploading reports for the commit 5e4b433 to get more accurate results

Files Patch % Lines
arc/job/trsh.py 37.79% 69 Missing and 10 partials ⚠️
arc/job/adapters/qchem.py 42.15% 48 Missing and 11 partials ⚠️
arc/parser.py 63.15% 43 Missing and 6 partials ⚠️
arc/scheduler.py 0.00% 13 Missing ⚠️
arc/species/converter.py 13.33% 12 Missing and 1 partial ⚠️
arc/species/vectors.py 14.28% 5 Missing and 1 partial ⚠️
arc/job/adapters/molpro.py 80.00% 0 Missing and 5 partials ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main     #712      +/-   ##
==========================================
- Coverage   73.81%   73.52%   -0.29%     
==========================================
  Files          99      100       +1     
  Lines       27346    27650     +304     
  Branches     5717     5790      +73     
==========================================
+ Hits        20185    20331     +146     
- Misses       5734     5895     +161     
+ Partials     1427     1424       -3     
Flag Coverage Δ
unittests 73.52% <60.77%> (-0.29%) ⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

Copy link
Member

@alongd alongd left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Began reviewing, but cannot finish today, left a few comments

@@ -0,0 +1,152 @@
software,basis_set,atoms_supported,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Possibly missing a b in the commit message

@@ -326,7 +326,7 @@ def test_set_cpu_and_mem(self):
self.job_4.cpu_cores = None
self.job_4.set_cpu_and_mem()
self.assertEqual(self.job_4.cpu_cores, 8)
expected_memory = math.ceil((14 * 1024 * 1.1) / self.job_4.cpu_cores)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The commit message here should be something like "Minor: Style modification in job adapter test"
(I understand that it is probably a result of a rebase)

@@ -145,7 +145,7 @@ def test_set_files(self):
'source': 'path',
'make_x': False},
]
job_1_files_to_download = [{'file_name': 'input.out',
job_1_files_to_download = [{'file_name':'output.out',
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Member Author

@calvinp0 calvinp0 Nov 28, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe it was changed in this PR


if memory_values:
min_memory_value = min(memory_values)
required_cores = math.floor(max_memory / (min_memory_value * 7.45e-3))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you explain 7.45e-3? If it's the product of some factors, can we explicitly write them for maintenance?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

mega word; 1000 MW = 7.45 GB on a 64-bit machine

It is in the doc string of the function

if self.core_change is None:
self.core_change = required_cores
elif self.core_change == required_cores:
# We have already done this
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does self.core_change pass on between a failed job to a new job that fixes it?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not 100% certain, but as long as the adapter class is not re-instantiated

self.input_file_memory = math.ceil(self.job_memory_gb / (7.45e-3 * self.cpu_cores)) if 'zeus' not in socket.gethostname() else math.ceil(self.job_memory_gb / (7.45e-3))



Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

minor: please remove the extra line breaks

"""
Generates the angles for a Q-Chem scan. The scan is split into two parts, one from start_angle to 180, and one from -180 to end_angle.

Parameters
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please change to the Google docsting style used throughout ARC (can also probably set your IDE to automatically do adopt that style from now on)

@@ -362,6 +562,63 @@ def execute_queue(self):
Execute a job to the server's queue.
"""
self.legacy_queue_execution()

def software_input_matching(self, basis):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems important. Although only implemented for Q-Chem currently, it feels like this function would be more at home in job/adapters/common.py

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I disagree. This function is only for QChem and if we are to commit to the idea that anyone can create an adapter for ARC, then any function that is only relevant for that adapter should stay with that adapter. I also believe this should be true in the future for troubleshooting

# If method ends with D3, then we need to remove it and add the D3 as a keyword. Need to account for -D3
if input_dict['method'].endswith('d3') or input_dict['method'].endswith('-d3'):
input_dict['method'] = input_dict['method'][:-2]
# Remove the - if it exists
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Generally, let's use as few comments as possible, especially if the code is self-explanatory

if input_dict['method'].endswith('-'):
input_dict['method'] = input_dict['method'][:-1]
# DFT_D - FALSE, EMPIRICAL_GRIMME, EMPIRICAL_CHG, D3_ZERO, D3_BJ, D3_CSO, D3_ZEROM, D3_BJM, D3_OP,D3 [Default: None]
# TODO: Add support for other D3 options. Check if the user has specified a D3 option in the level of theory
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have this weakness as well, so it's an unfair request: Let's not check-in TODO comments into the production code, because then they practically become to-not-do's. Would we like to ask developers in our ecosystem to create issues instead of TODOs? (comment inspired by "Uncle Bob")

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Depends what you mean but creating issues. As in the Issues tab? I don't think that is a good idea for ARC since the Issues section is not monitored and only you, me and @kfir4444 deal with issues that we come across.

@@ -18,7 +18,7 @@ class TestSubmit(unittest.TestCase):
def test_servers(self):
"""Test server keys in submit_scripts"""
for server in submit_scripts.keys():
self.assertTrue(server in ['local', 'atlas', 'txe1', 'pbs_sample', 'server1', 'server2'])
self.assertTrue(server in ['local', 'atlas', 'txe1', 'pbs_sample', 'server1', 'server2', 'azure'])
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As CodeQL suggests above, we should modify assertTrue here to assertIn to get a more meaningful error message if this doesn't pass

arc/common.py Fixed Show fixed Hide fixed
@@ -274,6 +275,30 @@
job_status = 'errored'
else:
job_id = _determine_job_id(stdout=stdout, cluster_soft=cluster_soft)
# Check if job was ACTUALLY submitted
temp = check_running_jobs_ids()

Check notice

Code scanning / CodeQL

Unused local variable Note

Variable temp is not used.
Comment on lines +1530 to +1542
# if isinstance(object_, Species):
# # Check if object_.molecule is a list and then iterate over it
# for i in range(len(object_.molecule)):
# if isinstance(object_.molecule[i], Molecule):
# # Analyse the molecule to determine whether it is an aryl radical and aromatic
# features_molecule = analyze_molecule(mol=object_.molecule[i])
# # If the molecule is an aryl radical and aromatic, remove the aryl radical flag
# if features_molecule['is_aryl_radical'] and features_molecule['is_aromatic']:
# features_molecule['is_aryl_radical'] = False
# # Generate optimal aromatic resonance structures using the adjusted features
# mol_list = generate_optimal_aromatic_resonance_structures(mol=object_.molecule[i], features = features_molecule, save_order=True)
# # If the list of resonance structures is not empty, update the object
# object_.molecule[i]= mol_list[0]

Check notice

Code scanning / CodeQL

Commented-out code Note

This comment appears to contain commented-out code.
the adapter now can recognise through additional information if the job was killed due to exceeding the wall time
it also now has a troubleshoot queue function
when writing the submit file, the function will now attempt to see if the user provided a dictionary of queue(s) with wall times and then place them in the submit file template

Test:
Written a test for the recognition of ServerTimeLimit keyword
…t attempts to consider all the OPT job but does not determine whether the OPT job has a status of done
Built a dataframe of possible combinations of basis sets that QChem requires and the correct format for the input file. Additionally, ensured that it is uploaded in the git push. This is a WIP as there may be more basis sets to add or even fix!
Molpro Adapter: Molpro needs a different touch to troubleshooting its memory. Here in setting the input file memory we determine if the MWords was enough per process but the total memory was too high. If that's the case, we reduce the processes req. while maintaining the memory per process
Update
Adjusted the file name to download from input.out to output.out
 QChem Adapter

0. QChem Adapter: Import - Pandas, ARC_PATH, rapidfuzz
1. QChem Adapter: Input Template now supports IRC and {trsh} args and ensures IQMOL_FCHK is false (This can be set to true BUT be aware this .fchk file can be rather large)
2. QChem Adapter: write_input_file - basis set is now matched via the software_input_matching function
3. QChem Adapter: write_input_file - QChem now supports D3 method. We should look at working with other DFT_D methods in the future. More specifically there are other types of D3 methods
4. QChem Adapter: write_input_file - Correctly pass in troubleshooting arguments into the input file
5. QChem Adapter: write_input_file - Capitalised TRUE/FALSE in UNRESTRICTED parameter
6. QChem Adapter: write_input_file - Removed the scan job type and moved it to another section of the input file writing
7. QChem Adapter: write_input_file - If scan is set, the job_type is PES_SCAN. We also set the fine to be XC_GRID 3. However, we may need to look into changing the tolerances later
8. QChem Adapter: write_input_file - We now write correctly the torsional scans for the input file for a scan
9. QChem Adapter: write_input_file - IRC is now supported, however this input file means we run two jobs from the one input file - A  FREQ job and then IRC. This currently works but will need improvements when used more by users
10. QChem Adapter: write_input_file - Ensuring that the SCF CONVERGENCE is 10^-8 for certain job types
11. QChem Adapter: [NEWFUNCTION] generate_scan_angles - to support PES SCAN jobs, we have a function that will look at what the required angle we want to scan, and the step size, and then return a start and end angle between -180, 180 that will ensure we scan the require angle during the stepping
12. QChem Adapter: [NEWFUNCTION] software_input_matching - Since QCHEM has different formatting for basis sets, this function will try take the users format of the basis set and match it against a dataframe (which should always be updated if its missing a format). This uses the new package in the ARC environment called rapidfuzz

Additionally - Updated QChem Tests
TrshQChem: Added in new trsh keywrods

TrshQChem: 'Error within run_minimization with minimization method' - Not certain what this error requires, and also if we should troubleshoot it if the job type is a 'conformer'. For now, we will re-run the job under the same conditions and if it fails again, we will declare it not possible to troubleshoot
remove 'break'

TrshMolpro: Parse new array length memory

Updated Trsh Tests
For future reference
Added QChem for the Normal Mode Displacement parsing and also change the default bool for raise_error
1. Scheduler: Now imports JobError
2. Scheduler: Fixed adding trsh to the args
3. Scheduler: Added JobError exception for determining job status
4. Scheduler: Now removing remote jobs at the end of the scheduler - !!!MAY NEED TO SEE IF THIS HAS AN EFFECT ON NON SSH JOBS!!!
5. Scheduler: Getting the recent opt job name via properly checking if the opt job was considered done (This was not done before)
6. Scheduler: TODO - We attempt to trouble shoot a frequency we deem not okay. Yet, there is no specific troubleshoot method, so why do we do this?
7. Scheduler: Properly troubleshoot an job
8. Scheduler: Fix conformer troubleshoot if it was a TS conformer
For future reference and good to know for SLURM servers
…onal

Needs further development and understanding
The original code did not functional correctly - nor was never used hence why it was passed into production. It has now been changed to properly return the actual number of heavy atoms
Added in rapidfuxx
This function will be required for later use when SSH is updated. For the time being, better to remove. It also causes an error in xTBAdapater (has no attribute remove_remote_files)
Molpro Memory Set Update

Will now limit the memory of Molpro to what is req. by Molpro if the CPU is set 1 core.
If user has both software available for IRC,  it will use go with preferred order

level tests update
Fixed local var min_mem_value

Update core reduction logic test for MP
@@ -431,6 +446,13 @@
'remote': os.path.join(self.job_1.remote_path, 'm.x'),
'source': 'input_files',
'make_x': True})

def test_determine_job_status(self):

Check warning

Code scanning / CodeQL

Variable defined multiple times Warning

This assignment to 'test_determine_job_status' is unnecessary as it is
redefined
before this value is used.
@@ -12,10 +12,9 @@
from arkane.encorr.corr import assign_frequency_scale_factor
from arkane.modelchem import METHODS_THAT_REQUIRE_SOFTWARE, LevelOfTheory, standardize_name

from arc.common import ARC_PATH, get_logger, get_ordered_intersection_of_two_lists, read_yaml_file
from arc.common import ARC_PATH, get_logger, get_ordered_intersection_of_two_lists, read_yaml_file, normalize_method_name

Check notice

Code scanning / CodeQL

Unused import Note

Import of 'normalize_method_name' is not used.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants