pml() segfault: memory not mapped #144

Open
iferres opened this issue Jan 18, 2023 · 2 comments

Comments


iferres commented Jan 18, 2023

Hi, I'm having this issue with pml():

library(magrittr)
library(phangorn)

tree <- readRDS("tree.RDS")
dat <- readRDS("phydat.RDS") 

pml <- phangorn::pml(tree, data = dat, k = 8)    

 *** caught segfault ***  
address 0x8cc5f80, cause 'memory not mapped'

Traceback:
 1: pml.fit(tree, data, bf, shape = shape, k = k, Q = Q, levels = attr(data,     "levels"), inv = inv, rate = rate, g = g, w = w, eig = eig,     INV = INV, ll.0 = ll.0, llMix = llMix, wMix = wMix, site = TRUE,     ASC = ASC) 
 2: phangorn::pml(tree, data = dat, k = 8)

I'm sending you the files via WeTransfer so you can reproduce the error (https://we.tl/t-SrejR8GFvG). The phydat.RDS file is quite big (about 41 MB). If I subset it before calling pml(), the error disappears.

sessionInfo()            
R version 4.2.2 Patched (2022-11-10 r83330)           
Platform: x86_64-pc-linux-gnu (64-bit)                           
Running under: Debian GNU/Linux bookworm/sid                               
                                      
Matrix products: default                 
BLAS:   /usr/lib/x86_64-linux-gnu/openblas-pthread/libblas.so.3 
LAPACK: /usr/lib/x86_64-linux-gnu/openblas-pthread/libopenblasp-r0.3.21.so   
    
locale:                                                                            
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C            
 [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8
 [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C    
 [9] LC_ADDRESS=C               LC_TELEPHONE=C   
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C 

attached base packages:   
[1] stats     graphics  grDevices utils     datasets  methods   base 
   
other attached packages:        
[1] phangorn_2.10.0 ape_5.6-2       magrittr_2.0.3   
 
loaded via a namespace (and not attached):   
 [1] Rcpp_1.0.9       quadprog_1.5-8   lattice_0.20-45  codetools_0.2-18  
 [5] grid_4.2.2       nlme_3.1-160     rlang_1.0.6      cli_3.4.1 
 [9] Matrix_1.5-1     generics_0.1.3   fastmatch_1.1-3  igraph_1.3.5  
[13] parallel_4.2.2   compiler_4.2.2   pkgconfig_2.0.3 
@KlausVigo
Owner

Dear @iferres,

with this data set, pml() runs out of memory. More precisely, it cannot allocate a large enough block of memory.
Your data are pretty big: dat alone takes about 400 MB in memory.

> dat
664 sequences with 381968 character and 153762 different site patterns.
The states are a c g t
> object.size(dat)
410636904 bytes
> 153762 * 664 * 4
[1] 408391872

Here 153762 is the number of site patterns, 664 is the number of sequences, and 4 is the number of bytes for an integer.
However, to run pml() you need to allocate much more memory (~26 GB in your case):

153762 * 664 * 8 * 4  *  8
26137079808

Here 153762 is the number of site patterns, 664 is the number of sequences, 8 is the number of rate classes, 4 is the number of states, and 8 is the number of bytes for a double. So maybe IQ-TREE or RAxML can handle your data set.
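
The estimate above can be wrapped in a small helper. This is only a minimal sketch in base R: `pml_memory_bytes` is a hypothetical name, not a phangorn function, and it counts a single array of doubles with the dimensions described above. Any other working memory pml() may need is ignored.

```r
# Hypothetical helper (not part of phangorn): bytes needed for one dense array
# with an entry per site pattern, sequence, rate class and state.
pml_memory_bytes <- function(n_patterns, n_seq, k = 1, n_states = 4,
                             bytes_per_value = 8) {
  # as.numeric() avoids integer overflow if integer inputs are passed
  as.numeric(n_patterns) * n_seq * k * n_states * bytes_per_value
}

# Figures from this thread: 153762 site patterns, 664 sequences, k = 8
bytes <- pml_memory_bytes(n_patterns = 153762, n_seq = 664, k = 8)
format(bytes, big.mark = ",")  # total bytes, human readable
bytes / 1e9                    # size in GB (decimal)
```

Lowering `k` or the number of site patterns shrinks the estimate proportionally, which is consistent with the error disappearing on a subset of the data.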

Maybe we can discuss offline and brainstorm how to handle such data sets.
While I should one day allow longer vectors, this is not a trivial change.
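
The remark about longer vectors may be relevant to why more RAM does not help. The array from the estimate above would hold about 3.27 billion elements, which is more than 2^31 - 1, the maximum length of a standard (non-long) R vector. The check below uses base R only. Whether this limit is the actual cause of the segfault is an assumption, not something confirmed in this thread.

```r
# Number of elements in the array from the estimate above
# (site patterns x sequences x rate classes x states).
n_elements <- 153762 * 664 * 8 * 4

# Maximum length of a non-long vector in R: 2^31 - 1
max_len <- .Machine$integer.max

exceeds_limit <- n_elements > max_len
exceeds_limit
n_elements / max_len  # how far over the limit the array would be

# Largest number of rate classes that keeps this one array under the limit
# (assumption: only this single array matters, which may not hold in pml()).
k_max <- floor(max_len / (153762 * 664 * 4))
k_max
```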

Kind regards,
Klaus

Author

iferres commented Jan 18, 2023

Thank you very much for your quick response, Klaus!

I see. However, I ran it on my desktop, which has 64 GB of RAM, and then on a server with 1 TB of RAM. It fails on both of them. (The funny thing is that this is actually a 1/5 subset of my real dataset, which is a core genome concatenated alignment of 664 organisms and about 1000 genes 😅 ).

I'm a user-level phylogeneticist, so I'm not sure I could help with this, but I'm open to trying to improve it.

Regards,
Ignacio
