VASP Drone and onsite_density_matrix causing large document sizes #577

Open
mkhorton opened this issue Jan 27, 2021 · 2 comments
Labels: bug, improvement (reported issues that considered further improvement to atomate)

Comments

@mkhorton
Contributor

I have seen an example of a calculation (~200 atoms, ~50 SCF steps) where the task document size exceeds 16 MB, with the vast majority of that due to the onsite_density_matrix parsed from the OUTCAR.

Creating this issue to keep an eye on it. The possibilities are (1) a bug in parsing the matrix, (2) a sub-optimal representation of the matrix, or (3) that we shouldn't be storing this regardless, except for the last SCF step. I have not had an opportunity to investigate further yet; if anyone wants a test file, let me know.
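To check which part of a document dominates its size, a rough per-key breakdown can help. The following is only a sketch: `size_by_key` and `toy_doc` are hypothetical names, and the JSON-encoded length is used as a stand-in for the stored size, which will differ somewhat in the database.

```python
import json


def size_by_key(doc):
    """Return {top-level key: approximate serialized size in bytes}, largest first."""
    sizes = {
        # JSON length is only a proxy for the stored size (assumption).
        key: len(json.dumps(value, default=str).encode("utf-8"))
        for key, value in doc.items()
    }
    return dict(sorted(sizes.items(), key=lambda item: item[1], reverse=True))


# Toy stand-in for a task document; real documents are far larger.
toy_doc = {
    "task_label": "static",
    "onsite_density_matrices": [{"1": [[0.1] * 5] * 5, "-1": [[0.1] * 5] * 5}] * 100,
}
report = size_by_key(toy_doc)
for key, n_bytes in report.items():
    print(f"{key}: {n_bytes} bytes")
```

Running the same function on a nested sub-document (for example, one calculation's output) narrows the search down level by level.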

@utf
Member

utf commented Jan 27, 2021

Thanks @mkhorton.

This is actually something we had to deal with in emmet-cli recently: https://github.com/materialsproject/emmet/blob/e30cbf2d6856d51dd7149ee253c4eb1ea969ddc9/emmet-cli/emmet/cli/utils.py#L394

I agree it would be better to handle this in the drone directly. Do you know of any potential uses for the onsite_density_matrix data? As in, is there any downside to always removing it?
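As a rough illustration of what always removing the data could look like, here is a minimal sketch that strips a named key from a nested document before it is stored. `drop_key` is a hypothetical helper, not the emmet-cli or atomate code, and the document layout and exact key name below are assumptions.

```python
def drop_key(obj, key):
    """Return a copy of a nested dict/list structure with every `key` entry removed."""
    if isinstance(obj, dict):
        return {k: drop_key(v, key) for k, v in obj.items() if k != key}
    if isinstance(obj, list):
        return [drop_key(item, key) for item in obj]
    return obj


# Hypothetical, heavily trimmed task document layout (assumption for illustration).
task_doc = {
    "calcs_reversed": [
        {
            "output": {
                "outcar": {
                    "onsite_density_matrix": [[0.5, 0.0], [0.0, 0.5]],
                    "efermi": 1.23,
                }
            }
        }
    ]
}
slim_doc = drop_key(task_doc, "onsite_density_matrix")
print(slim_doc)
```

The helper builds a new structure instead of mutating the input, so the parsed data stays available in memory if something else still needs it.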

@mkhorton
Contributor Author

@acrutt brought this to my attention; we can share the example file privately if it's helpful.

I don't think this is data we'd commonly need... I think I'm actually to blame for this: I added the parsing to the Outcar two years ago, though I can't recall the context now.

In the example file, it ends up being a list of dicts (15504 elements) keyed by spin (+1, -1).

I think we could probably safely remove the key from the drone. The way this data is represented could probably be improved at a later date in pymatgen: I think the current representation is basically a 1-to-1 equivalent of how it's stored in the OUTCAR, except as a list of dicts, and I don't think this is very sensible.
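One possible direction for a denser representation, purely as an illustration and not what pymatgen does today, is to store each matrix as a shape plus packed double-precision bytes instead of nested lists of floats. `pack_matrix` and `unpack_matrix` are hypothetical names, and the example matrix is made up.

```python
from array import array


def pack_matrix(matrix):
    """Flatten a rectangular list-of-lists of floats into (shape, raw bytes)."""
    rows, cols = len(matrix), len(matrix[0]) if matrix else 0
    flat = array("d", (value for row in matrix for value in row))
    return (rows, cols), flat.tobytes()


def unpack_matrix(shape, raw):
    """Inverse of pack_matrix: rebuild the list-of-lists from shape and bytes."""
    rows, cols = shape
    flat = array("d")
    flat.frombytes(raw)
    return [list(flat[r * cols:(r + 1) * cols]) for r in range(rows)]


# Made-up 3x3 matrix standing in for one spin channel of one site.
matrix = [[0.1234, -0.0021, 0.0], [-0.0021, 0.9876, 0.0], [0.0, 0.0, 0.5]]
shape, raw = pack_matrix(matrix)
restored = unpack_matrix(shape, raw)
print(shape, len(raw), restored == matrix)
```

The trade-off is that packed bytes are no longer human-readable or directly queryable in the database, so this would only make sense if the data is kept for archival purposes.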

@itsduowang added the bug, enhancement, and improvement labels and removed the enhancement label on Feb 8, 2022.