simulate_new crash with big unstructured matrix #999

Closed
bbolker opened this issue Mar 12, 2024 · 8 comments

@bbolker
Contributor

bbolker commented Mar 12, 2024

This crashes at the MakeADFun() step. Obviously this is not a model we should actually try to fit, but a segfault seems bad. Is there a limit we can detect to throw an error before this happens?

library(glmmTMB)
form <- ~ 1 + (bigf + 0 | dummy)

sim <- function(n) {
    dd <- data.frame(dummy  = 1, bigf = factor(1:n))
    simulate_new(form,
                 seed = 101,
                 family = poisson,
                 newdata = dd,
                 newparams = list(beta = 0, theta = rep(0, n*(n+1)/2)))
}

sim(10)
sim(46)
if (FALSE) sim(47) ## boom, segfault
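
For reference, the length of theta in the reprex is the parameter count of an unstructured n × n covariance matrix, which is what a bare (bigf + 0 | dummy) term gets in glmmTMB: n standard deviations plus one correlation for each pair of levels. The count grows smoothly with n, so on its own it does not single out n = 47.

```latex
\underbrace{n}_{\text{SDs}} + \underbrace{\frac{n(n-1)}{2}}_{\text{correlations}}
  = \frac{n(n+1)}{2},
\qquad n = 46:\ 1081,
\qquad n = 47:\ 1128.
```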
@kaskr
Contributor

kaskr commented Mar 13, 2024

I can't reproduce this. Are you on a special branch?

@bbolker
Contributor Author

bbolker commented Mar 13, 2024

No, this happens for me at the HEAD of the master branch.

 sessionInfo()
R Under development (unstable) (2024-03-07 r86058)
Platform: x86_64-pc-linux-gnu
Running under: Pop!_OS 22.04 LTS

Matrix products: default
BLAS:   /usr/local/lib/R/lib/libRblas.so 
LAPACK: /usr/local/lib/R/lib/libRlapack.so;  LAPACK version 3.12.0

[locale/time zone redacted]

attached base packages:
[1] stats     graphics  grDevices datasets  utils     methods   base     

other attached packages:
[1] glmmTMB_1.1.8-9000

loaded via a namespace (and not attached):
 [1] lme4_1.1-35.1       codetools_0.2-19    numDeriv_2016.8-1.1
 [4] multcomp_1.4-25     mgcv_1.9-1          Matrix_1.6-5       
 [7] lattice_0.22-5      TH.data_1.1-2       coda_0.19-4.1      
[10] estimability_1.5    splines_4.4.0       zoo_1.8-12         
[13] bspm_0.5.3          emmeans_1.10.0      mvtnorm_1.2-4      
[16] TMB_1.9.10          xtable_1.8-4        nloptr_2.0.3       
[19] sandwich_3.1-0      grid_4.4.0          compiler_4.4.0     
[22] boot_1.3-30         nlme_3.1-163        minqa_1.2.6        
[25] survival_3.5-8      Rcpp_1.0.12         MASS_7.3-60.0.1    

@kaskr
Contributor

kaskr commented Mar 13, 2024

I'd be curious to see a gdb backtrace. My first guess would have been an out-of-bounds error, but no 'luck': I've built glmmTMB with TMB_SAFEBOUNDS and your example worked for all values of n and far beyond.

@bbolker
Contributor Author

bbolker commented Mar 13, 2024

Here you go. (I could also try to replicate this in a rocker container, if that would be useful.)

sim(46)
[New Thread 0x7fffebdaf640 (LWP 24247)]
[New Thread 0x7fffe35ae640 (LWP 24248)]
[New Thread 0x7fffdadad640 (LWP 24249)]
[[1]]
 [1] 1 4 0 2 3 5 0 0 2 1

[[1]]
 [1] 1 1 0 2 1 2 0 0 1 0 1 1 5 0 0 0 1 4 0 0 1 4 2 2 4 0 1 1 1 0 0 1 2 0 3 0 1 1
[39] 0 1 1 1 0 0 0 4

> sim(47)
[New Thread 0x7fffd17ff640 (LWP 24250)]
[New Thread 0x7fffd0ffe640 (LWP 24251)]

Thread 6 "R" received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x7fffd0ffe640 (LWP 24251)]
TMBad::global::add_to_stack<TMBad::global::RefOp> (this=this@entry=0x0, pOp=pOp@entry=0x7fffcc000b70, x=std::vector of length 0, capacity 0) at /usr/local/lib/R/site-library/TMB/include/TMBad/global.hpp:2550
2550	    IndexPair ptr((Index)inputs.size(), (Index)values.size());
(gdb) bt
#0  TMBad::global::add_to_stack<TMBad::global::RefOp> (this=this@entry=0x0, 
    pOp=pOp@entry=0x7fffcc000b70, x=std::vector of length 0, capacity 0)
    at /usr/local/lib/R/site-library/TMB/include/TMBad/global.hpp:2550
#1  0x00007fffd21c788a in TMBad::global::ad_aug::addToTape (
    this=0x7fffd0ff9110)
    at /usr/local/lib/R/site-library/TMB/include/TMBad/TMBad.cpp:2208
#2  0x00007fffd21c794d in TMBad::global::ad_plain::ad_plain (
    this=0x7fffd0ff9134, x=...)
    at /usr/local/lib/R/site-library/TMB/include/TMBad/TMBad.cpp:1964
#3  0x00007fffd21cc623 in TMBad::global::ad_aug::operator* (other=..., 
    this=0x7fffd0ff93c0)
    at /usr/local/lib/R/site-library/TMB/include/TMBad/TMBad.cpp:2298
#4  TMBad::global::ad_aug::operator* (this=0x7fffd0ff93c0, other=...)
    at /usr/local/lib/R/site-library/TMB/include/TMBad/TMBad.cpp:2290
#5  0x00007fffd21dd3cd in Eigen::internal::pmul<TMBad::global::ad_aug> (b=..., 
    a=...)
    at /usr/lib/R/site-library/RcppEigen/include/Eigen/src/Core/GenericPacketMath.h:237
#6  Eigen::internal::conj_helper<TMBad::global::ad_aug, TMBad::global::ad_aug, false, false>::pmul (y=..., x=..., this=<synthetic pointer>)
    at /usr/lib/R/site-library/RcppEigen/include/Eigen/src/Core/arch/Default/ConjHelper.h:99
#7  Eigen::internal::gebp_traits<TMBad::global::ad_aug, TMBad::global::ad_aug, false, false, 1, 0>::madd<TMBad::global::ad_aug, TMBad::global::ad_aug, TMBad::global::ad_aug, Eigen::internal::FixedInt<0> > (b=..., b=..., 
    this=<synthetic pointer>, tmp=..., c=..., a=...)
    at /usr/lib/R/site-library/RcppEigen/include/Eigen/src/Core/products/GeneralBlockPanelKernel.h:523
#8  Eigen::internal::gebp_traits<TMBad::global::ad_aug, TMBad::global::ad_aug, false, false, 1, 0>::madd<TMBad::global::ad_aug, TMBad::global::ad_aug, Eigen::internal::FixedInt<0> > (lane=..., tmp=..., c=..., b=..., a=..., 
    this=<synthetic pointer>)
    at /usr/lib/R/site-library/RcppEigen/include/Eigen/src/Core/products/GeneralBlockPanelKernel.h:530
#9  Eigen::internal::gebp_kernel<TMBad::global::ad_aug, TMBad::global::ad_aug, long, Eigen::internal::blas_data_mapper<TMBad::global::ad_aug, long, 0, 0, 1>, 2, 4, false, false>::operator() (res=..., blockA=0x555562279bc0, 
    blockB=blockB@entry=0x7fffd0ff94f0, rows=25, depth=depth@entry=47, 
    cols=cols@entry=24, alpha=..., strideA=47, strideB=47, offsetA=0, 
    offsetB=0, this=<optimized out>)
    at /usr/lib/R/site-library/RcppEigen/include/Eigen/src/Core/products/GeneralBlockPanelKernel.h:1754
#10 0x00007fffd22d3dfc in Eigen::internal::general_matrix_matrix_product<long, TMBad::global::ad_aug, 0, false, TMBad::global::ad_aug, 1, false, 0, 1>::run (
    rows=<optimized out>, cols=27, depth=depth@entry=47, _lhs=<optimized out>, 
    lhsStride=<optimized out>, _rhs=<optimized out>, 
    rhsStride=<optimized out>, _res=<optimized out>, 
    resStride=<optimized out>, alpha=..., blocking=..., info=<optimized out>, 
    resIncr=1)
    at /usr/lib/R/site-library/RcppEigen/include/Eigen/src/Core/products/GeneralMatrixMatrix.h:133
#11 0x00007fffd22d42a7 in Eigen::internal::gemm_functor<TMBad::global::ad_aug, long, Eigen::internal::general_matrix_matrix_product<long, TMBad::global::ad_aug, 0, false, TMBad::global::ad_aug, 1, false, 0, 1>, Eigen::Matrix<TMBad::global::ad_aug, -1, -1, 0, -1, -1>, Eigen::Transpose<Eigen::Matrix<TMBad::global::ad_aug, -1, -1, 0, -1, -1> const>, Eigen::Matrix<TMBad::global::ad_aug, -1, -1, 0, -1, -1>, Eigen::internal::gemm_blocking_space<0, TMBad::global::ad_aug, TMBad::global::ad_aug, -1, -1, -1, 1, false> >::operator() (info=<optimized out>, 
    cols=<optimized out>, col=<optimized out>, rows=<optimized out>, 
    row=<optimized out>, this=<optimized out>)
    at /usr/lib/R/site-library/RcppEigen/include/Eigen/src/Core/products/GeneralMatrixMatrix.h:230
#12 _ZN5Eigen8internal16parallelize_gemmILb1ENS0_12gemm_functorIN5TMBad6global6ad_augElNS0_29general_matrix_matrix_productIlS5_Li0ELb0ES5_Li1ELb0ELi0ELi1EEENS_6MatrixIS5_Lin1ELin1ELi0ELin1ELin1EEENS_9TransposeIKS9_EES9_NS0_19gemm_blocking_spaceILi0ES5_S5_Lin1ELin1ELin1ELi1ELb0EEEEElEEvRKT0_T1_SJ_SJ_b._omp_fn.0(void)
    ()
    at /usr/lib/R/site-library/RcppEigen/include/Eigen/src/Core/products/Parallelizer.h:171
#13 0x00007ffff7858b77 in ?? () from /usr/lib/x86_64-linux-gnu/libgomp.so.1
#14 0x00007ffff7694ac3 in start_thread (arg=<optimized out>)
    at ./nptl/pthread_create.c:442
#15 0x00007ffff7726850 in clone3 ()
    at ../sysdeps/unix/sysv/linux/x86_64/clone3.S:81

@kaskr
Contributor

kaskr commented Mar 13, 2024

OK, I guess it could be the same as kaskr/adcomp#390.
Could you try the solution proposed there?

@bbolker
Contributor Author

bbolker commented Mar 13, 2024

🎉 That appears to solve the problem (sim(300) was slow [18 seconds] but worked; I didn't bother testing anything bigger). I guess we should add #define EIGEN_DONT_PARALLELIZE to src/glmmTMB.cpp?

bbolker added a commit that referenced this issue Mar 13, 2024
bbolker added a commit that referenced this issue Mar 13, 2024
@kaskr
Contributor

kaskr commented Mar 13, 2024

Yeah, better set that flag for now.

@kaskr kaskr closed this as completed Mar 13, 2024
@mmaechler
Contributor

Instructive indeed, also from this point of view:
We have another case where (semi-)automatic parallelization leads to problems that are partly irreproducible, or at least not easily reproducible...
