Thank you for developing this package!

We're using this package to load the Open Problems h5ad file in a tutorial from the best-practices book (70k cells, 130k features).

While loading the processed NeurIPS dataset from GEO (~3.0 GB) with zellkonverter 1.8.0 in R, we see a peak RAM usage of more than 30 GB. This is exceptional and would be hard to accommodate on most local hardware. In comparison, using Python's anndata 0.8.0, the total memory increase after loading stays below 10 GB.

I suspect that sparse-to-dense matrix conversions are generating this memory increase, but it could be something else. In general, I am asking whether there are flags that can be applied, or proposed, at the readH5AD() step when loading such objects, to avoid this peak memory usage.
Here are some suggestions/comments, but I'm not sure how much they will help (sketches of the readH5AD() options and the AnnData2SCE() route follow the list):

- Because we first read the data into Python and then convert it to R, there are two copies of the dataset in memory during the process. So if using normal Python requires 10 GB, then using {zellkonverter} should realistically use at least 20 GB.
- The native R reader should be more efficient, but the version in {zellkonverter} v1.8.0 doesn't work well with anndata v0.8.0 files. Worth a try though.
- You could try using backed mode. That should be more memory efficient, but sometimes the conversion is not as reliable.
- If you only need some parts of the object, you can set various parameters (obs, var, obsm etc.) in readH5AD() so that only those things are converted to R (the whole dataset is still read into Python though).
- If you are already using Python in the document anyway, you can use AnnData2SCE(), which avoids reading the file into Python a second time.
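A minimal sketch of the readH5AD() options mentioned above, assuming a local copy of the file at a hypothetical path `neurips_processed.h5ad`; exact behaviour may differ depending on the zellkonverter and anndata versions involved:

```r
library(zellkonverter)

## Hypothetical path to the processed NeurIPS file
file <- "neurips_processed.h5ad"

## 1. Native R reader: skips Python entirely, so only one copy is held in R
##    (may not handle anndata v0.8.0 files reliably in zellkonverter 1.8.0)
sce_r <- readH5AD(file, reader = "R")

## 2. Backed mode: assay matrices are returned as HDF5-backed DelayedArrays
##    instead of being loaded fully into memory
sce_backed <- readH5AD(file, use_hdf5 = TRUE)

## 3. Convert only the slots you need; the file is still read into Python,
##    but less of it is converted to (and copied into) R
sce_slim <- readH5AD(
    file,
    layers = FALSE,  # skip additional layers
    uns    = FALSE,  # skip unstructured metadata
    obsm   = FALSE,  # skip embeddings
    varm   = FALSE,
    obsp   = FALSE,
    varp   = FALSE
)
```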
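And a sketch of the AnnData2SCE() route for when Python is already in use in the document, assuming the AnnData object is loaded through reticulate in the same session (file path again a placeholder):

```r
library(reticulate)
library(zellkonverter)

## Load the file once through Python's anndata via reticulate
ad <- import("anndata")
adata <- ad$read_h5ad("neurips_processed.h5ad")

## Convert the in-memory Python object directly, avoiding a second read
sce <- AnnData2SCE(adata)
```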