
High RAM peak while loading h5ad. #92

Open
ilibarra opened this issue Apr 3, 2023 · 1 comment
Labels
enhancement New feature or request

Comments

@ilibarra
Member

ilibarra commented Apr 3, 2023

Thank you for developing this package!

We're using this package to load the Open Problems h5ad file in a tutorial from the best practices book (70k cells, 130k features).

While loading the processed NeurIPS dataset (~3.0 GB) with zellkonverter 1.8.0 in R, peak RAM usage exceeds 30 GB. This is exceptional and would be hard to accommodate on most local hardware. In comparison, using Python's anndata 0.8.0, the total memory increase after loading is no more than 10 GB.

library(zellkonverter)
sce <- readH5AD("oproblems_bmmc_multiome_genes_filtered.h5ad")

I suspect that sparse-to-dense matrix conversions are generating this memory increase, but it could be something else. In general, I am asking whether there are flags that can be applied, or could be proposed, when loading such objects with readH5AD() to avoid this peak memory usage.

GEO - Dataset Processed

R version 4.2.2 (2022-10-31)
Platform: x86_64-conda-linux-gnu (64-bit)
Running under: Ubuntu 20.04 LTS

[21] SingleCellExperiment_1.20.0       zellkonverter_1.8.0     

Thank you,

@ilibarra ilibarra added the enhancement New feature or request label Apr 3, 2023
@lazappi
Member

lazappi commented Apr 4, 2023

Here are some suggestions/comments, but I'm not sure how much they will help:

  • Because we first read the data into Python and then convert it to R, there are two copies of the dataset in memory during the process. So if plain Python requires 10 GB, then using {zellkonverter} should realistically use at least 20 GB.
  • The native R reader should be more efficient, but the version in {zellkonverter} v1.8.0 doesn't work well with anndata v0.8.0 files. Worth a try though.
  • You could try using backed mode. That should be more memory efficient, but sometimes the conversion is not as reliable.
  • If you only need some parts of the object, you can set various parameters (obs, var, obsm etc.) in readH5AD() so that only those parts are converted to R (the whole dataset is still read into Python though).
  • If you are already using Python in the document anyway, you can use AnnData2SCE(), which avoids reading the data into Python a second time. See the sketch after this list for what these options might look like.
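
A rough sketch of what those options might look like. This is untested on the file in question; the use_hdf5 and reader arguments of readH5AD(), and the slot flags passed through to AnnData2SCE(), are taken from the zellkonverter documentation and may behave slightly differently across versions. The reticulate::py$adata line is hypothetical and assumes an AnnData object already created in a Python chunk of the same document.

library(zellkonverter)

# Backed mode: keep the matrices on disk as HDF5-backed objects
# instead of realising them fully in memory
sce_backed <- readH5AD(
    "oproblems_bmmc_multiome_genes_filtered.h5ad",
    use_hdf5 = TRUE
)

# Experimental native R reader, which skips the Python step entirely
# (may not handle all anndata v0.8.0 files correctly)
sce_r <- readH5AD(
    "oproblems_bmmc_multiome_genes_filtered.h5ad",
    reader = "R"
)

# Convert only selected slots to R; extra arguments are passed through
# to AnnData2SCE() (the whole file is still read into Python first)
sce_slim <- readH5AD(
    "oproblems_bmmc_multiome_genes_filtered.h5ad",
    obsm = FALSE,
    varm = FALSE,
    uns  = FALSE
)

# If a Python AnnData object already exists in the session, convert it
# directly and avoid reading the file a second time
# adata <- reticulate::py$adata   # hypothetical: AnnData from a Python chunk
# sce <- AnnData2SCE(adata)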

Let me know if any of that is helpful
