segfault when converting h5ad to SCE #95

Open · joseph-siefert opened this issue May 22, 2023 · 3 comments

Labels: bug Something isn't working

@joseph-siefert
Thanks for the great tool. Unfortunately, I am getting a segmentation fault when converting a large dataset; a smaller subset converts without issue. Here is the error from the large dataset:

```
 *** caught segfault ***
address 0x2ae45ba3f000, cause 'memory not mapped'

Traceback:
 1: py_ref_to_r(x)
 2: py_to_r.default(x)
 3: NextMethod()
 4: py_to_r.numpy.ndarray(x)
 5: py_to_r(x)
 6: as_r_value(x$indices)
 7: .nextMethod(.Object = .Object, ... = ...)
 8: callNextMethod()
 9: initialize(value, ...)
10: initialize(value, ...)
11: new("dgRMatrix", j = as.integer(as_r_value(x$indices)), p = as.integer(as_r_value(x$indptr)),     x = as.vector(as_r_value(x$data)), Dim = as.integer(dim(x)))
12: py_to_r.scipy.sparse.csr.csr_matrix(mat)
13: py_to_r(mat)
14: t(py_to_r(mat))
15: doTryCatch(return(expr), name, parentenv, handler)
16: tryCatchOne(expr, names, parentenv, handlers[[1L]])
17: tryCatchList(expr, classes, parentenv, handlers)
18: tryCatch(expr, error = function(e) {    call <- conditionCall(e)    if (!is.null(call)) {        if (identical(call[[1L]], quote(doTryCatch)))             call <- sys.call(-4L)        dcall <- deparse(call, nlines = 1L)        prefix <- paste("Error in", dcall, ": ")        LONG <- 75L        sm <- strsplit(conditionMessage(e), "\n")[[1L]]        w <- 14L + nchar(dcall, type = "w") + nchar(sm[1L], type = "w")        if (is.na(w))             w <- 14L + nchar(dcall, type = "b") + nchar(sm[1L],                 type = "b")        if (w > LONG)             prefix <- paste0(prefix, "\n  ")    }    else prefix <- "Error : "    msg <- paste0(prefix, conditionMessage(e), "\n")    .Internal(seterrmessage(msg[1L]))    if (!silent && isTRUE(getOption("show.error.messages"))) {        cat(msg, file = outFile)        .Internal(printDeferredWarnings())    }    invisible(structure(msg, class = "try-error", condition = e))})
19: try(t(py_to_r(mat)), silent = TRUE)
20: .extract_or_skip_assay(skip_assays = skip_assays, hdf5_backed = hdf5_backed,     dims = dims, mat = adata$X, name = "'X' matrix")
21: AnnData2SCE(adata, X_name = X_name, hdf5_backed = backed, verbose = verbose,     ...)
22: fun(...)
23: basiliskRun(env = env, fun = .H5ADreader, file = file, X_name = X_name,     backed = use_hdf5, verbose = verbose, ...)
24: readH5AD("full_dataset.h5ad",     verbose = TRUE)
An irrecoverable exception occurred. R is aborting now ...
/cm/local/apps/uge/var/spool/chbscl-50-10/job_scripts/10091596: line 23: 79746 Segmentation fault      (core dumped)
```
lazappi added the bug label on May 23, 2023

@lazappi (Member) commented May 23, 2023

Hi @joseph-siefert

Thanks for giving {zellkonverter} a go! Are you able to share the dataset at all? It's a bit hard to say if this is a dataset issue or something to do with your setup. I assume you don't have any issues reading the file in Python?
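For reference, a check along those lines might look like the sketch below, done from R via {reticulate} (this assumes an environment where the anndata Python package is available; the path is your file):

```r
## Minimal sketch: test whether the file loads cleanly on the Python side
library(reticulate)

ad <- import("anndata")
adata <- ad$read_h5ad("full_dataset.h5ad")  # does the read itself succeed?
print(adata$shape)                          # (n_obs, n_vars)
```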

@joseph-siefert (Author)

Unfortunately I can't share the dataset. I removed the layers and was able to get past the above error; however, a new error arose:
`'X' matrix does not support transposition and has been skipped`
I think both errors are due to memory limitations, as I was able to circumvent them by subsampling the matrix. I have plenty of available memory on the machine, so it seems related to the memory available to R during the matrix conversion. Is there another, more memory-efficient way to make the conversion and avoid R's memory limits?
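For context, the subsampling workaround was along the lines of the sketch below (the cell count is a placeholder; `AnnData2SCE()` is the zellkonverter converter visible in the traceback, and this assumes a Python environment with anndata that {reticulate} can see):

```r
## Rough sketch of the subsampling workaround (50000 cells is a placeholder)
library(zellkonverter)
library(reticulate)

ad <- import("anndata")
adata <- ad$read_h5ad("full_dataset.h5ad")

## Take a random subset of cells so the sparse-matrix conversion to R
## fits in memory; AnnData indices are 0-based on the Python side
set.seed(42)
idx <- sample(seq_len(adata$n_obs), 50000L) - 1L
sce <- AnnData2SCE(adata[idx])
```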

@lazappi (Member) commented Jun 2, 2023

Sorry for the slow response, I thought I had replied to this but obviously not. Because of the way the conversion works, there are two copies of the data in memory at once (one in Python and one in R), so for large datasets the memory requirement can be substantial. One approach to try is the HDF5-backed mode, which should help with this. The other thing is that I recently made some fixes for this specific message, so it might be worth trying the most recent version (see #96).
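A minimal sketch of the backed call, using the `use_hdf5` argument visible in your traceback:

```r
## Keep 'X' on disk as a HDF5-backed DelayedArray instead of converting
## it to an in-memory sparse matrix all at once
library(zellkonverter)
sce <- readH5AD("full_dataset.h5ad", use_hdf5 = TRUE, verbose = TRUE)
```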
