

writeH5AD fails for very large datasets (> 1.5 million cells) #73

Open
GabrielHoffman opened this issue Sep 21, 2022 · 1 comment
Labels
bug Something isn't working

Comments

@GabrielHoffman

GabrielHoffman commented Sep 21, 2022

Hi Luke,
Thanks again for the package, I use it every day!

I have a huge H5AD file with 40k genes and 3.7M cells. I load it into R with `readH5AD(..., use_hdf5 = TRUE)`. After QC and filtering, I want to write 1.5M cells to another H5AD file. When I run `writeH5AD(sce[, include], outfile)`, I get a segfault after ~20 minutes. Memory shouldn't be an issue, since I requested 576 GB of RAM on my compute node. I worked around this by 1) writing the SingleCellExperiment as 4 chunks to separate H5AD files, and 2) using AnnData in Python to concatenate the 4 files into a single H5AD.
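For reference, here is a minimal sketch of that chunked workaround, assuming every chunk was written by `writeH5AD()` from the same `sce` (so all chunks share the same genes in the same order) and that cell names are unique across chunks. The helper names `chunk_bounds` and `concat_h5ad` and the file names are hypothetical; only `anndata.read_h5ad`, `anndata.concat` and `AnnData.write_h5ad` are real AnnData calls.

```python
from typing import List, Sequence, Tuple


def chunk_bounds(n_cells: int, n_chunks: int) -> List[Tuple[int, int]]:
    """Hypothetical helper: split n_cells into n_chunks contiguous, half-open
    [start, stop) ranges whose sizes differ by at most one cell."""
    if n_chunks < 1:
        raise ValueError("n_chunks must be >= 1")
    base, extra = divmod(n_cells, n_chunks)
    bounds = []
    start = 0
    for i in range(n_chunks):
        # The first `extra` chunks each take one additional cell.
        stop = start + base + (1 if i < extra else 0)
        bounds.append((start, stop))
        start = stop
    return bounds


def concat_h5ad(paths: Sequence[str], outfile: str):
    """Hypothetical helper: concatenate chunked H5AD files along the cell
    (obs) axis and write the result to a single H5AD file."""
    # Imported here so chunk_bounds() can be used without anndata installed.
    import anndata as ad

    adatas = [ad.read_h5ad(p) for p in paths]
    # axis=0 stacks cells; join="inner" keeps the genes shared by all chunks;
    # merge="same" keeps var annotations that are identical across chunks.
    combined = ad.concat(adatas, axis=0, join="inner", merge="same")
    combined.write_h5ad(outfile)
    return combined
```

For 1.5M cells, `chunk_bounds(1_500_000, 4)` gives four ranges of 375,000 cells; in R, the range `(start, stop)` corresponds to `sce[, (start + 1):stop]` because R indexing is 1-based. The chunks written that way could then be combined with something like `concat_h5ad(["chunk1.h5ad", "chunk2.h5ad", "chunk3.h5ad", "chunk4.h5ad"], "combined.h5ad")`.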

I am using R 4.2.0 and zellkonverter v1.6.5.

Have you encountered this issue with large datasets? I wanted to check with you first, since creating a reproducible example that I can share will take a substantial amount of work.

Best,
Gabriel

@lazappi changed the title from "writeH5AD fails for large dataset" to "writeH5AD fails for very large datasets (> 1.5 million cells)" on Sep 29, 2022
@lazappi added the bug (Something isn't working) label on Sep 29, 2022
@lazappi
Member

lazappi commented Sep 29, 2022

Hi @GabrielHoffman

That is indeed a large dataset! I think the largest I have ever tried is a few hundred thousand cells. I'm actually fairly impressed that you managed to work with it in both R and Python and that only the conversion seems to be the issue.

Have you tried running it with `verbose = TRUE`? That would be helpful for figuring out which part is failing.
