REPRODUCIBILITY: .Random.seed is updated when 'later' is loaded #167

Open
HenrikBengtsson opened this issue Oct 25, 2022 · 1 comment

Issue

Loading later causes the RNG state to be updated, e.g.

$ R --quiet --vanilla
> str(globalenv()$.Random.seed)
 NULL
> loadNamespace("later")
<environment: namespace:later>
> str(globalenv()$.Random.seed)
  int [1:626] 10403 624 607059698 1535537611 ...

Why is this a problem? This makes it nearly impossible to get numerically reproducible results in parallel processing when persistent workers are used (e.g. parallel::makeCluster()), because the results will depend on whether a package dependency has already been loaded or not. This affects all packages that import later, directly or indirectly.

The only workaround for this is to (i) know up-front which packages might be loaded, and (ii) pre-load them all on the parallel workers before performing the actual tasks. In practice, that's not feasible.
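
For reference, the pre-loading workaround might look roughly like the sketch below. The package list pkgs is a placeholder (base packages are used so the snippet runs anywhere); in a real setup it would have to name later and every other package that could get loaded during the tasks, which is exactly the part that is hard to know.

```r
library(parallel)

# Placeholder list (assumption): in practice this would name every package
# that might be loaded during the tasks, e.g. "later".
pkgs <- c("stats", "utils")

cl <- makeCluster(2L)

# Load the namespaces on every worker before any task runs.
invisible(clusterCall(cl, function(pkgs) {
  for (pkg in pkgs) loadNamespace(pkg)
  NULL
}, pkgs))

# Record whether each worker already has an RNG state at this point.
seed_is_set <- clusterEvalQ(
  cl, exists(".Random.seed", envir = globalenv(), inherits = FALSE)
)

stopCluster(cl)
```

Inspecting seed_is_set shows, per worker, whether pre-loading has already created .Random.seed before any task has run.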

Suggestion

I don't know what the RNG is used for during .onLoad(), but could one solution be to draw the random number in stealth mode, i.e. make sure to restore .Random.seed afterward?
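
A minimal sketch of what such a "stealth mode" could look like at the R level. The helper name with_preserved_seed() is hypothetical and not part of later or Rcpp; it only illustrates the idea of restoring .Random.seed, including the "not set" state, after some code has run.

```r
# Hypothetical helper (not part of 'later'): evaluate 'expr', then restore
# .Random.seed to what it was before, including the case where it was unset.
with_preserved_seed <- function(expr) {
  genv <- globalenv()
  had_seed <- exists(".Random.seed", envir = genv, inherits = FALSE)
  old_seed <- if (had_seed) get(".Random.seed", envir = genv, inherits = FALSE)
  on.exit({
    if (had_seed) {
      assign(".Random.seed", old_seed, envir = genv)
    } else if (exists(".Random.seed", envir = genv, inherits = FALSE)) {
      rm(".Random.seed", envir = genv)
    }
  }, add = TRUE)
  expr  # lazily evaluated here, after the restore handler is registered
}
```

For example, with_preserved_seed(runif(1)) in a fresh session returns a random number but leaves globalenv()$.Random.seed as NULL. Whether something like this could wrap the calls made in .onLoad() is an open question.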

Details

It's not one of later's package dependencies that advances the RNG state:

$ R --quiet --vanilla
> str(globalenv()$.Random.seed)
 NULL
> loadNamespace("Rcpp")
> str(globalenv()$.Random.seed)
 NULL
> loadNamespace("rlang")
<environment: namespace:rlang>
> str(globalenv()$.Random.seed)
 NULL
> loadNamespace("later")
<environment: namespace:later>
> str(globalenv()$.Random.seed)
  int [1:626] 10403 624 1853159584 1558919201 ...
> sessionInfo()
R version 4.2.1 (2022-06-23)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 20.04.5 LTS

Matrix products: default
BLAS:   /home/hb/shared/software/CBI/R-4.2.1-gcc9/lib/R/lib/libRblas.so
LAPACK: /home/hb/shared/software/CBI/R-4.2.1-gcc9/lib/R/lib/libRlapack.so

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
 [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8    
 [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
 [9] LC_ADDRESS=C               LC_TELEPHONE=C            
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

loaded via a namespace (and not attached):
[1] compiler_4.2.1 cli_3.4.1      later_1.3.0    Rcpp_1.0.9     rlang_1.0.6

wch (Member) commented Oct 25, 2022

My guess is that this is caused by the use of Rcpp::RNGScope in the auto-generated file RcppExports.cpp:
https://github.com/r-lib/later/blob/ba70887d77527e5647e375013e2e1ad9dc2c3646/src/RcppExports.cpp

There's one more use of RNGScope in later.cpp:

Rcpp::RNGScope rngscope;

I believe those are the only places where later interacts with R's random number generator.

Here's a simple example with just Rcpp:

str(globalenv()$.Random.seed)
#>  NULL

Rcpp::cppFunction('int go() { return 1; }')
str(globalenv()$.Random.seed)
#>  NULL

go()
#> [1] 1
str(globalenv()$.Random.seed)
#>  int [1:626] 10403 624 -1688476694 -149789597 758540872 1648411561 -1149942954 1420315103 -1604158828 -374756219 ...

It looks like running the Rcpp-wrapped C++ function is what causes the random seed to be set. The later package runs some of its C++ functions on load, and that's probably what's causing the seed to be set.

Note that if the random seed is already set (to a non-NULL value) before loading later, then it does not alter the seed:

rnorm(1)
#> [1] -0.484127
str(globalenv()$.Random.seed)
#>  int [1:626] 10403 2 -1539884961 -1138419659 1420516300 1450512162 -1641721475 806492791 -2033368162 541053528 ...

Rcpp::cppFunction('int go() { return 1; }')
str(globalenv()$.Random.seed)
#>  int [1:626] 10403 2 -1539884961 -1138419659 1420516300 1450512162 -1641721475 806492791 -2033368162 541053528 ...

go()
#> [1] 1
str(globalenv()$.Random.seed)
#>  int [1:626] 10403 2 -1539884961 -1138419659 1420516300 1450512162 -1641721475 806492791 -2033368162 541053528 ...
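
To check this kind of thing more systematically, a small helper along these lines could be used. The name changes_seed() is made up for illustration; it just compares globalenv()$.Random.seed before and after evaluating an expression. My understanding (an assumption, not verified against the Rcpp sources here) is that RNGScope calls GetRNGstate() when constructed and PutRNGstate() when destroyed, and that GetRNGstate() seeds the generator if .Random.seed does not exist yet, which would be consistent with both examples above.

```r
# Hypothetical helper: TRUE if evaluating 'expr' changes (or creates)
# .Random.seed in the global environment, FALSE otherwise.
changes_seed <- function(expr) {
  before <- globalenv()$.Random.seed  # NULL when no seed has been set
  force(expr)
  after <- globalenv()$.Random.seed
  !identical(before, after)
}
```

In a fresh session, changes_seed(loadNamespace("later")) could then be used to test for the behavior reported in this issue, and changes_seed(go()) for the plain Rcpp case.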
