pml segfault: memory not mapped #149

Open
lemmonquiche opened this issue Jun 22, 2023 · 3 comments

@lemmonquiche

Hello,

I am having the same issue as the one addressed in #144 while running on a server with 1 TB of RAM. I have a tree with 50,211 leaf nodes and an MSA length of 3,828 bases. I have multiple other large trees and alignments that are also failing with the "memory not mapped" error, and I foresee more in the future, so I would be very interested in seeing phangorn handle trees of this size and even larger.

I am specifically using your package instead of RAxML or iqtree because I need to keep my tree ultrametric. While BEAST could be an option, it does not seem particularly friendly to automated pipelines, and there is limited documentation online.

Below is how I am using pml() for my task. I can send you the sample data if you would like.

library(phangorn)
library(phytools)

#Load MSA and tree into R:
nt_seqs <- read.phyDat("filteredAlignment.fasta", format = "fasta", type = "DNA")
tree <- read.newick("filteredTree.nwk")

##pare alignment to only use the sequences included in the tree
nt_seqs_pared <- nt_seqs[which(names(nt_seqs) %in% tree$tip.label)]

##coerce tree to be ultrametric
tree_ultra <- phangorn:::minEdge(tree, tau = 1e-5, enforce_ultrametric = TRUE)
fit_ultra <- pml(tree_ultra, data = nt_seqs_pared, k = 4, bf = baseFreq(nt_seqs_pared))
fitGTR_ultra <- optim.pml(fit_ultra, model = "GTR", optRooted = TRUE, optQ = TRUE, optGamma = TRUE, optBf = TRUE, rearrangement = "none", control = pml.control(trace = 1))

At the pml() call I get:

 *** caught segfault ***
address 0x14dcfabe0, cause 'memory not mapped'

Traceback:
 1: pml.fit(tree, data, bf, shape = shape, k = k, Q = Q, levels = attr(data,     "levels"), inv = inv, rate = rate, g = g, w = w, eig = eig,     INV = INV, ll.0 = ll.0, llMix = llMix, wMix = wMix, site = TRUE,     ASC = ASC)
 2: pml(tree_ultra, data = nt_seqs_pared, k = 4, bf = baseFreq(nt_seqs_pared))
An irrecoverable exception occurred. R is aborting now ...

Information about version:

R version 4.3.0 (2023-04-21)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 20.04.6 LTS

Matrix products: default
BLAS:   /usr/lib/x86_64-linux-gnu/atlas/libblas.so.3.10.3
LAPACK: /usr/lib/x86_64-linux-gnu/atlas/liblapack.so.3.10.3;  LAPACK version 3.9.0

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C
 [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8
 [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C
 [9] LC_ADDRESS=C               LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C

time zone: America/Los_Angeles
tzcode source: system (glibc)

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] seqmagick_0.1.5 phytools_1.5-1  maps_3.4.1      phangorn_2.11.1
[5] ape_5.7-1 

Thanks

@KlausVigo
Owner

Hi @lemmonquiche,
the problem, as mentioned in #144, is that too much space is allocated. Unfortunately, there is no quick fix.
I am working on fixing this, but it will take some time. All the underlying C code needs to be rewritten. I have started using RcppArmadillo, which should make later improvements easier, and I plan to integrate partitioned and mixture models better.
The ultrametric and tipdated phylogeny optimisation is a bit simpler than unrooted trees, so I might get a testing version out earlier.
Kind regards,
Klaus
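
For a feel of the scale involved, here is a rough back-of-envelope sketch in base R using the numbers from the report above. It rests on an assumption that is not confirmed anywhere in this thread: that the likelihood computation keeps one double per (node, site pattern, state, rate category). The function name estimate_pml_cells is made up for illustration, and the alignment length of 3,828 is used as an upper bound on the number of unique site patterns.

```r
# Back-of-envelope size of a partial-likelihood array.
# ASSUMPTION (not confirmed in this thread): one double is stored per
# (node, site pattern, state, rate category).
estimate_pml_cells <- function(n_tips, n_patterns, n_states = 4, k = 4) {
  n_nodes <- 2 * n_tips - 1  # tips plus internal nodes of a rooted binary tree
  # as.numeric() keeps the product in double precision, so the estimate
  # itself cannot overflow R's 32-bit integers
  cells <- as.numeric(n_nodes) * n_patterns * n_states * k
  list(
    cells = cells,                                  # number of doubles
    gib = cells * 8 / 2^30,                         # 8 bytes per double
    exceeds_int32 = cells > .Machine$integer.max    # larger than 2^31 - 1?
  )
}

# Numbers from the report: 50,211 tips, 3,828 sites, DNA (4 states), k = 4
est <- estimate_pml_cells(n_tips = 50211, n_patterns = 3828)
est$exceeds_int32
```

Under these assumptions the array would hold roughly 6.2e9 doubles, which is more than a signed 32-bit integer can count. Whether an index overflow of that kind is what actually triggers the segfault is a guess, not something stated in this thread.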

@Phylloxera

I, too, am having an out-of-memory issue. Mine occurs in NJ: treeNJ <- NJ(dm)

#Error in nj(x) : cannot allocate memory block of size 134217728 Tb
#Calls: source -> withVisible -> eval -> eval -> NJ -> reorder -> nj

I'm willing to test and have access to some high performance computing, so I'll keep an eye on this.

@KlausVigo
Owner

Hi @Phylloxera,
this problem should be fixed by commit emmanuelparadis/ape@20332d8; see also the discussion in emmanuelparadis/ape#97. NJ should therefore work after updating ape to the development version, since NJ is just a wrapper around the ape function nj.
pml and pml_bb will likely complain afterwards.
Regards,
Klaus
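
One possible way to update ape to the development version is sketched below. It assumes the remotes package (not mentioned in this thread) and a working compiler toolchain, because ape is built from source. The wrapper name install_dev_ape is hypothetical; the repository name is the one referenced above.

```r
# Sketch: install the development version of ape from GitHub.
# ASSUMPTIONS: the 'remotes' package is acceptable in your setup and a
# compiler toolchain is available. install_dev_ape is a made-up helper name.
install_dev_ape <- function() {
  if (!requireNamespace("remotes", quietly = TRUE)) {
    install.packages("remotes")
  }
  remotes::install_github("emmanuelparadis/ape")
  packageVersion("ape")  # report the version that is now installed
}

# install_dev_ape()  # uncomment to run; needs network access
```

Restart R after installing so that phangorn picks up the updated ape before NJ(dm) is run again.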
