parallelize field_transform #38

Open
martinvandriel opened this issue Sep 29, 2015 · 9 comments

Comments

@martinvandriel
Contributor

Once we go to very large databases (~10 TB), field_transform becomes a serious bottleneck. While the computation itself is done in a few hours, the serial field transform takes multiple days to weeks (serial read/write rates on parallel file systems are really bad, about 30 MB/s on the CSCS machines).

The workload should be limited, because this boils down to a single loop with very few lines:

https://github.com/geodynamics/axisem/blob/master/SOLVER/UTILS/field_transform.F90#L851-L909
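
Something along these lines might already do it: decompose the point dimension over MPI ranks and let every rank read and write only its own slab through NetCDF-4 parallel I/O. Just a sketch, not the actual field_transform loop; the file names (ordered_output.nc4, transformed_output.nc4), the variable name displacement and the sizes are made up, and it assumes a NetCDF-4/HDF5 stack built with MPI-IO support.

```fortran
! Sketch of an MPI-parallel field transform (not the code from
! field_transform.F90; names and sizes are made up).
program ft_parallel_sketch
  use mpi
  use netcdf
  implicit none
  integer :: ierr, rank, nproc
  integer :: ncid_in, ncid_out, varid_in, varid_out
  integer :: npoints, nsnap, n_local, offset
  real, allocatable :: buf(:,:)

  call MPI_Init(ierr)
  call MPI_Comm_rank(MPI_COMM_WORLD, rank, ierr)
  call MPI_Comm_size(MPI_COMM_WORLD, nproc, ierr)

  ! open input and output for parallel access (needs parallel NetCDF-4)
  call check(nf90_open('ordered_output.nc4', NF90_NOWRITE, ncid_in, &
                       comm=MPI_COMM_WORLD, info=MPI_INFO_NULL))
  call check(nf90_open('transformed_output.nc4', NF90_WRITE, ncid_out, &
                       comm=MPI_COMM_WORLD, info=MPI_INFO_NULL))
  call check(nf90_inq_varid(ncid_in,  'displacement', varid_in))
  call check(nf90_inq_varid(ncid_out, 'displacement', varid_out))

  ! example sizes; in reality these come from the file dimensions
  npoints = 1000000
  nsnap   = 2000

  ! contiguous block decomposition over the point dimension
  n_local = npoints / nproc
  offset  = rank * n_local + 1
  if (rank == nproc - 1) n_local = npoints - offset + 1
  allocate(buf(n_local, nsnap))

  call check(nf90_get_var(ncid_in, varid_in, buf, &
                          start=[offset, 1], count=[n_local, nsnap]))
  ! ... reorder / transform the slab here ...
  call check(nf90_put_var(ncid_out, varid_out, buf, &
                          start=[offset, 1], count=[n_local, nsnap]))

  call check(nf90_close(ncid_in))
  call check(nf90_close(ncid_out))
  call MPI_Finalize(ierr)

contains
  subroutine check(status)
    integer, intent(in) :: status
    if (status /= NF90_NOERR) then
      print *, trim(nf90_strerror(status))
      call MPI_Abort(MPI_COMM_WORLD, 1, ierr)
    end if
  end subroutine check
end program ft_parallel_sketch
```

Since each rank touches a disjoint hyperslab, no communication is needed for the I/O itself; the actual transform would go where the comment is.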

@martinvandriel
Contributor Author

Parallelization might not work together with compression.
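
For reference, this is roughly where the conflict sits (as far as I remember, the HDF5/NetCDF-4 versions available to us refuse parallel writes to deflated variables): once deflate is enabled on a variable, requesting MPI-IO access to it fails, so one of the two marked calls has to go. Names and sizes below are made up.

```fortran
! Minimal sketch of the compression vs. parallel-IO conflict
! (not from field_transform; names and sizes are made up).
program compression_vs_parallel
  use mpi
  use netcdf
  implicit none
  integer :: ierr, ncid, dimids(2), varid

  call MPI_Init(ierr)
  ierr = nf90_create('test_par.nc4', NF90_NETCDF4, ncid, &
                     comm=MPI_COMM_WORLD, info=MPI_INFO_NULL)
  ierr = nf90_def_dim(ncid, 'points',    1000, dimids(1))
  ierr = nf90_def_dim(ncid, 'snapshots',  100, dimids(2))
  ierr = nf90_def_var(ncid, 'displacement', NF90_FLOAT, dimids, varid)

  ! (1) compression: shuffle on, deflate on, level 5
  ierr = nf90_def_var_deflate(ncid, varid, 1, 1, 5)

  ! (2) collective parallel access -- rejected for deflated variables
  !     by the library versions we have on the clusters
  ierr = nf90_var_par_access(ncid, varid, NF90_COLLECTIVE)
  if (ierr /= NF90_NOERR) print *, trim(nf90_strerror(ierr))

  ierr = nf90_close(ncid)
  call MPI_Finalize(ierr)
end program compression_vs_parallel
```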

@sstaehler
Contributor

Right, that was one of the reasons not to try parallel NetCDF some years ago. Switching compression off is probably not a problem for kernel applications, where the wave fields are never moved, but for all applications where databases are sent to IRIS or whoever, it is a problem.

@martinvandriel
Contributor Author

The only way I can think of to avoid this: use the old round-robin IO and write the correct chunking, already compressed, in the SOLVER :/
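
Roughly like this, I guess (sketch only, not the existing SOLVER dump routine; file and variable names, sizes and the chunk shape are made up): one rank defines the variable with the chunking the kernel code later wants to read, with deflate on, and then the ranks write one after the other so the compressed file is only ever touched serially.

```fortran
! Sketch of round-robin dumping with the final chunking and deflate
! already applied in the SOLVER (not the existing dump code; names,
! sizes and the chunk shape are made up).
program solver_rr_dump_sketch
  use mpi
  use netcdf
  implicit none
  integer, parameter :: npts_local = 50000, nsteps = 1000
  integer :: ierr, rank, nproc, iproc, ncid, varid, dimids(2)
  integer :: npts_total, offset
  real, allocatable :: buf(:,:)

  call MPI_Init(ierr)
  call MPI_Comm_rank(MPI_COMM_WORLD, rank, ierr)
  call MPI_Comm_size(MPI_COMM_WORLD, nproc, ierr)

  npts_total = npts_local * nproc
  offset     = rank * npts_local + 1
  allocate(buf(npts_local, nsteps))
  buf = real(rank)                 ! stands in for the buffered wave field

  ! rank 0 creates the file with the chunking the readers want and
  ! with deflate switched on right away
  if (rank == 0) then
     ierr = nf90_create('ordered_output.nc4', NF90_NETCDF4, ncid)
     ierr = nf90_def_dim(ncid, 'points',    npts_total, dimids(1))
     ierr = nf90_def_dim(ncid, 'snapshots', nsteps,     dimids(2))
     ierr = nf90_def_var(ncid, 'displacement', NF90_FLOAT, dimids, varid)
     ! chunk shape matched to the later read pattern (full traces for
     ! blocks of 1000 points -- made up, pick whatever the kernels need)
     ierr = nf90_def_var_chunking(ncid, varid, NF90_CHUNKED, [1000, nsteps])
     ierr = nf90_def_var_deflate(ncid, varid, 1, 1, 5)
     ierr = nf90_close(ncid)
  end if
  call MPI_Barrier(MPI_COMM_WORLD, ierr)

  ! round robin: only one rank has the file open at any time
  do iproc = 0, nproc - 1
     if (rank == iproc) then
        ierr = nf90_open('ordered_output.nc4', NF90_WRITE, ncid)
        ierr = nf90_inq_varid(ncid, 'displacement', varid)
        ierr = nf90_put_var(ncid, varid, buf, &
                            start=[offset, 1], count=[npts_local, nsteps])
        ierr = nf90_close(ncid)
     end if
     call MPI_Barrier(MPI_COMM_WORLD, ierr)
  end do

  call MPI_Finalize(ierr)
end program solver_rr_dump_sketch
```

The barrier loop keeps the HDF5 writes strictly serial, so compression keeps working; how slow that ends up being is the open question.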

@sstaehler
Contributor

but didn't we test that writing the correct chunking in the SOLVER is a bazillion times slower?

@martinvandriel
Contributor Author

Yes, but there might be room to optimize it: only include those processors that actually have to write, use threading, and reduce the number of dumps by buffering as many time steps as possible in memory (rough sketch below).

I have been waiting for a week now for field_transform on a 10 TB database that I computed in a few hours, and it's only 30% done.
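
The buffering part could look roughly like this (per-rank, serial sketch; names, sizes and the memory budget are made up):

```fortran
! Sketch of buffered dumping (per rank, serial; names and the memory
! budget are made up).
program buffered_dump_sketch
  use netcdf
  implicit none
  integer, parameter :: npts = 100000, nsteps = 1000
  integer :: ierr, ncid, varid, dimids(2)
  integer :: nbuf, ibuf, istep
  real, allocatable :: buf(:,:)

  ! allow e.g. ~1 GB of buffer per rank (4-byte reals)
  nbuf = min(nsteps, int(1.0e9 / (4.0 * npts)))
  allocate(buf(npts, nbuf))

  ierr = nf90_create('dump.nc4', NF90_NETCDF4, ncid)
  ierr = nf90_def_dim(ncid, 'points',    npts,   dimids(1))
  ierr = nf90_def_dim(ncid, 'snapshots', nsteps, dimids(2))
  ierr = nf90_def_var(ncid, 'displacement', NF90_FLOAT, dimids, varid)
  ierr = nf90_enddef(ncid)

  ibuf = 0
  do istep = 1, nsteps
     ibuf = ibuf + 1
     buf(:, ibuf) = real(istep)        ! stands in for the solver field
     ! flush when the buffer is full or at the last time step
     if (ibuf == nbuf .or. istep == nsteps) then
        ierr = nf90_put_var(ncid, varid, buf(:, 1:ibuf), &
                            start=[1, istep - ibuf + 1], count=[npts, ibuf])
        ibuf = 0
     end if
  end do

  ierr = nf90_close(ncid)
end program buffered_dump_sketch
```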

@sstaehler
Contributor

Well, that is annoying. When trying to increase the dump buffer, keep in mind the low memory on most HPC machines. But I'm curious...

@martinvandriel
Contributor Author

I guess we would need to control which part of the mesh goes where on the cluster: if each node has only one processor that holds crust, larger time chunks might fit in memory.

@martinvandriel
Contributor Author

So here we go: system maintenance, and the field transform was killed. We should at least have a restart capability. This should be really easy to implement.
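
Simplest thing I can think of (sketch only; the attribute name last_snapshot_done and the file name are made up): write a progress marker into the output file after each finished snapshot and resume from there on startup.

```fortran
! Sketch of a restart marker for field_transform (names are made up).
program restart_sketch
  use netcdf
  implicit none
  integer :: ierr, ncid, last_done, isnap
  integer, parameter :: nsnap = 2000

  ierr = nf90_open('transformed_output.nc4', NF90_WRITE, ncid)

  ! resume point: 0 if the attribute does not exist yet (fresh run)
  ierr = nf90_get_att(ncid, NF90_GLOBAL, 'last_snapshot_done', last_done)
  if (ierr /= NF90_NOERR) last_done = 0

  do isnap = last_done + 1, nsnap
     ! ... transform and write snapshot isnap ...

     ! update the marker and flush, so a kill leaves a usable state
     ierr = nf90_redef(ncid)
     ierr = nf90_put_att(ncid, NF90_GLOBAL, 'last_snapshot_done', isnap)
     ierr = nf90_enddef(ncid)
     ierr = nf90_sync(ncid)
  end do

  ierr = nf90_close(ncid)
end program restart_sketch
```

Updating the attribute after every single snapshot might be too chatty; doing it every N snapshots would work just as well.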
