gather_dict on local error is big bottleneck for large datasets #527

Merged
daurer merged 3 commits into dev from fix-to-avoid-gathering-large-dicts
Mar 5, 2024

@daurer daurer commented Jan 31, 2024

By default, the errors for each view are saved into a dictionary at the end of each block of iterations. These dictionaries are then gathered across all MPI ranks into a global dictionary, which may be saved into the .ptyr file if record_local_error is true in the engine params.

For the high-performance engines, gathering the error dictionaries over MPI can be a major bottleneck. In this PR I have made the collection of per-view error metrics optional (controlled by the existing record_local_error parameter). If the per-view errors are not needed, the errors are first reduced on each rank and then combined with an MPI allreduce across all ranks. This completely removes the bottleneck while still allowing the per-view errors to be collected if required. By default, record_local_error is false.
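
To illustrate the difference between the two paths, here is a minimal sketch. It is not the code from this PR. Ranks are simulated as a plain list of per-rank dictionaries so that it runs without MPI, and all function names are hypothetical. It also assumes that each view's error is a short tuple of numbers and that the quantity of interest is the mean over all views. With mpi4py, the stand-in `allreduce_sum` would roughly correspond to `comm.allreduce(..., op=MPI.SUM)`.

```python
# Hypothetical sketch: MPI ranks are simulated as a list of dicts,
# one dict {view_id: (err_a, err_b, err_c)} per rank.

def gather_dicts(per_rank_errors):
    """Slow path: merge every rank's per-view dict into one global dict."""
    merged = {}
    for rank_errors in per_rank_errors:
        merged.update(rank_errors)
    return merged


def local_reduce(rank_errors):
    """Sum the error tuples over the views held by one rank.

    Returns the element-wise sums and the number of views on that rank.
    """
    sums = [0.0, 0.0, 0.0]
    for err in rank_errors.values():
        for i, value in enumerate(err):
            sums[i] += value
    return sums, len(rank_errors)


def allreduce_sum(per_rank_values):
    """Stand-in for an MPI allreduce with SUM: element-wise sum over ranks."""
    return [sum(column) for column in zip(*per_rank_values)]


def mean_error_gathered(per_rank_errors):
    """Mean error via the gathered global dict (cost grows with the view count)."""
    sums, count = local_reduce(gather_dicts(per_rank_errors))
    return [s / count for s in sums]


def mean_error_reduced(per_rank_errors):
    """Mean error via local reduction plus one small allreduce.

    Each rank contributes only four numbers (three sums and a view count),
    regardless of how many views it holds.
    """
    partial = [local_reduce(r) for r in per_rank_errors]
    totals = allreduce_sum([sums + [count] for sums, count in partial])
    n_views = totals[-1]
    return [t / n_views for t in totals[:-1]]


if __name__ == "__main__":
    per_rank = [
        {"V0": (1.0, 2.0, 3.0), "V1": (3.0, 4.0, 5.0)},  # rank 0
        {"V2": (5.0, 6.0, 7.0)},                          # rank 1
    ]
    print(mean_error_gathered(per_rank))
    print(mean_error_reduced(per_rank))
```

Both functions give the same mean, but the reduced path exchanges a fixed-size message per rank instead of a dictionary with one entry per view, which is why it avoids the bottleneck for large datasets.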

@daurer daurer force-pushed the fix-to-avoid-gathering-large-dicts branch from 4280d4f to 9c4176c Compare February 6, 2024 16:42
@daurer daurer added the 0.8.1 path release label Feb 29, 2024
@daurer daurer merged commit 9155c5d into dev Mar 5, 2024
4 checks passed
@daurer daurer deleted the fix-to-avoid-gathering-large-dicts branch March 5, 2024 14:01