
High RAM peak while loading h5ad. #92

Open
ilibarra opened this issue Apr 3, 2023 · 1 comment
Labels
enhancement New feature or request

Comments

@ilibarra
Member

ilibarra commented Apr 3, 2023

Thank you for developing this package!

We're using this package to load the Open Problems h5ad file in a tutorial from the best practices book (70k cells, 130k features).

While loading the processed NeurIPS dataset (~3.0 GB) with zellkonverter 1.8.0 in R, peak RAM usage exceeds 30 GB. This is exceptional and would be hard to accommodate on most local hardware. In comparison, using Python's anndata 0.8.0, the total memory increase after loading is no more than 10 GB.

library(zellkonverter)
sce <- readH5AD("oproblems_bmmc_multiome_genes_filtered.h5ad")

I suspect that sparse-to-dense matrix conversions are generating this memory increase, but it could be something else. In general, I am asking whether there are flags that can be applied, or could be proposed, when loading such objects with readH5AD() to avoid this peak memory usage.

GEO - Dataset Processed

R version 4.2.2 (2022-10-31)
Platform: x86_64-conda-linux-gnu (64-bit)
Running under: Ubuntu 20.04 LTS

[21] SingleCellExperiment_1.20.0       zellkonverter_1.8.0     

Thank you,

@ilibarra ilibarra added the enhancement New feature or request label Apr 3, 2023
@lazappi
Member

lazappi commented Apr 4, 2023

Here are some suggestions/comments, but I'm not sure how much they will help:

  • Because we first read the data into Python and then convert it to R, there are two copies of the dataset in memory during the process. So if plain Python requires 10 GB, then using {zellkonverter} should realistically use at least 20 GB.
  • The native R reader should be more efficient, but the version in {zellkonverter} v1.8.0 doesn't work well with anndata v0.8.0 files. Worth a try though.
  • You could try using backed mode. That should be more memory efficient, but sometimes the conversion is not as reliable.
  • If you only need some parts of the object, you can set various parameters (obs, var, obsm etc.) in readH5AD() so that only those parts are converted to R (the whole dataset is still read into Python though).
  • If you are already using Python in the document anyway, you can use AnnData2SCE(), which avoids reading the data into Python a second time. See the sketch after this list for what these options might look like.
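
A rough sketch of what those options might look like. This is untested on the file in question; the use_hdf5 and reader arguments of readH5AD(), and the slot flags passed through to AnnData2SCE(), are taken from the zellkonverter documentation and may behave slightly differently across versions. The reticulate::py$adata line is hypothetical and assumes an AnnData object already created in a Python chunk of the same document.

library(zellkonverter)

# Backed mode: keep the matrices on disk as HDF5-backed objects
# instead of realising them fully in memory
sce_backed <- readH5AD(
    "oproblems_bmmc_multiome_genes_filtered.h5ad",
    use_hdf5 = TRUE
)

# Experimental native R reader, which skips the Python step entirely
# (may not handle all anndata v0.8.0 files correctly)
sce_r <- readH5AD(
    "oproblems_bmmc_multiome_genes_filtered.h5ad",
    reader = "R"
)

# Convert only selected slots to R; extra arguments are passed through
# to AnnData2SCE() (the whole file is still read into Python first)
sce_slim <- readH5AD(
    "oproblems_bmmc_multiome_genes_filtered.h5ad",
    obsm = FALSE,
    varm = FALSE,
    uns  = FALSE
)

# If a Python AnnData object already exists in the session, convert it
# directly and avoid reading the file a second time
# adata <- reticulate::py$adata   # hypothetical: AnnData from a Python chunk
# sce <- AnnData2SCE(adata)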

Let me know if any of that is helpful
