Can we control the seed of a Monte Carlo simulation to recreate the same sequence of results? #62

Open
msrocka opened this issue Jan 10, 2019 · 2 comments

@msrocka
Member

msrocka commented Jan 10, 2019

tl;dr

We currently do not have a single seed for the simulation but a separate seed for each uncertainty distribution. It would be possible to switch to a single starting seed, but we would also need to implement rules (especially for the parameter distributions) so that the numbers of the distributions are always generated in the same order. The impact on performance still has to be tested, and the change requires some implementation effort.

details

When running a Monte Carlo simulation in openLCA, each uncertainty distribution of a model gets its own number generator. Uncertainty distributions can be defined for inputs, outputs, characterization factors, and parameters, and each of these elements gets its own number generator. For an ecoinvent database, for example, this results in hundreds of thousands of number generators. In each simulation run, a value is generated from every number generator and the matrices for the calculation are updated with these values. Generated values of inputs, outputs, and characterization factors are mapped directly to the respective matrix cells. Then all formulas are evaluated and the respective matrix cells are updated as well. Finally, unit and flow property conversion factors and allocation factors are applied and the simulation result is calculated.
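
To make the current design concrete, here is a minimal Java sketch of one simulation run in which every uncertain matrix entry owns its own generator. The classes `UncertainValue` and `PerDistributionSimulation`, the dense `double[][]` matrix, and the restriction to normal distributions are assumptions for illustration, not openLCA's actual API; formula evaluation, conversion factors, and allocation are left out.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

// Hypothetical model: one uncertain matrix entry with its own generator.
final class UncertainValue {
    final int row;
    final int col;
    final double mean;
    final double sd;
    // Each distribution owns an unseeded Random, so every program run
    // draws a different sequence (the behaviour described above).
    private final Random rand = new Random();

    UncertainValue(int row, int col, double mean, double sd) {
        this.row = row;
        this.col = col;
        this.mean = mean;
        this.sd = sd;
    }

    // Draws one value from a normal distribution N(mean, sd^2).
    double next() {
        return mean + sd * rand.nextGaussian();
    }
}

final class PerDistributionSimulation {
    // One simulation run: draw a value from every generator and write it
    // directly into its matrix cell. Formula evaluation, conversion and
    // allocation factors would follow here and are omitted.
    static void run(double[][] matrix, List<UncertainValue> values) {
        for (UncertainValue v : values) {
            matrix[v.row][v.col] = v.next();
        }
    }

    public static void main(String[] args) {
        double[][] matrix = new double[2][2];
        List<UncertainValue> values = new ArrayList<>();
        values.add(new UncertainValue(0, 0, 1.0, 0.1));
        values.add(new UncertainValue(1, 1, 5.0, 0.5));
        for (int i = 0; i < 3; i++) {
            run(matrix, values);
            System.out.println(matrix[0][0] + " " + matrix[1][1]);
        }
    }
}
```

Because every `new Random()` picks its own seed internally, there is no single number that could be stored to replay these runs.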

Currently, each uncertainty distribution holds its own random number generator with its own seed. We use the Random class from the Java SDK as the underlying generator for each distribution. It is possible to control the seed of an instance of Random. To recreate the same sequence of results, we would need to either control the seed of each number generator (which is absurd) or use a shared instance of Random for all (possibly hundreds of thousands of) distributions.
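
A shared, seeded instance of `java.util.Random` makes the draws reproducible as long as the distributions are always visited in the same order. The sketch below assumes normal distributions given as hypothetical `means`/`sds` arrays; it is single-threaded and only meant to show the principle.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Random;

final class SharedSeedSimulation {
    // Runs `runs` simulation iterations for normal distributions given by
    // means[i] and sds[i]. All values come from ONE Random seeded with
    // `seed`, and the distributions are always visited in index order.
    static List<double[]> simulate(double[] means, double[] sds, long seed, int runs) {
        Random shared = new Random(seed);
        List<double[]> results = new ArrayList<>();
        for (int run = 0; run < runs; run++) {
            double[] values = new double[means.length];
            for (int i = 0; i < means.length; i++) {
                values[i] = means[i] + sds[i] * shared.nextGaussian();
            }
            results.add(values);
        }
        return results;
    }

    public static void main(String[] args) {
        double[] means = {1.0, 5.0};
        double[] sds = {0.1, 0.5};
        List<double[]> a = simulate(means, sds, 42L, 3);
        List<double[]> b = simulate(means, sds, 42L, 3);
        // Same seed and same visiting order: identical runs.
        for (int r = 0; r < a.size(); r++) {
            System.out.println(Arrays.equals(a.get(r), b.get(r))); // true
        }
    }
}
```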

In the current implementation the numbers are not generated in parallel, because the calculation time still massively outweighs the time for generating the numbers. However, if we want to parallelize this process, we have to consider the following note from the documentation of java.util.Random:

"Instances of java.util.Random are threadsafe. However, the concurrent use of the same java.util.Random instance across threads may encounter contention and consequent poor performance. Consider instead using ThreadLocalRandom in multithreaded designs."

Thus, we need to run some tests before we can say how such a shared Random instance affects performance when the numbers are generated in multiple threads.
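
Note that `ThreadLocalRandom`, which the Javadoc recommends, cannot be seeded (its `setSeed` throws `UnsupportedOperationException`), so it does not help with reproducibility. One alternative that could be included in such tests, not something proposed in this issue, is `java.util.SplittableRandom`: split one child generator per fixed-size block of distributions from a seeded root, sequentially, and then fill the blocks in parallel. The block size, the class name, and the use of uniform draws are assumptions of this sketch, and the values differ from those a `java.util.Random`-based scheme would produce.

```java
import java.util.Arrays;
import java.util.SplittableRandom;
import java.util.stream.IntStream;

final class ParallelReproducibleDraws {
    // Fixed block size (an assumption): it must NOT depend on the number
    // of threads, otherwise the mapping of generators to distributions
    // would change between machines.
    static final int BLOCK_SIZE = 1024;

    // Draws one uniform value in [0, 1) per distribution, in parallel.
    // Child generators are split from the seeded root sequentially, so
    // block k always receives the same child generator.
    static double[] draw(int distributionCount, long seed) {
        int blocks = (distributionCount + BLOCK_SIZE - 1) / BLOCK_SIZE;
        SplittableRandom root = new SplittableRandom(seed);
        SplittableRandom[] children = new SplittableRandom[blocks];
        for (int k = 0; k < blocks; k++) {
            children[k] = root.split();
        }
        double[] values = new double[distributionCount];
        // Each block is handled by exactly one task with its own
        // generator, so no generator is shared between threads.
        IntStream.range(0, blocks).parallel().forEach(k -> {
            int from = k * BLOCK_SIZE;
            int to = Math.min(from + BLOCK_SIZE, distributionCount);
            for (int i = from; i < to; i++) {
                values[i] = children[k].nextDouble();
            }
        });
        return values;
    }

    public static void main(String[] args) {
        double[] a = draw(100_000, 42L);
        double[] b = draw(100_000, 42L);
        System.out.println(Arrays.equals(a, b)); // true
    }
}
```

Because the block size is fixed, the result does not depend on the number of threads or on how the tasks are scheduled; it still depends on every distribution keeping its index.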

But … even with such a shared Random instance and a controlled seed, there is no guarantee of getting the same sequence of results, because we also have to ensure that the numbers for all of the (possibly hundreds of thousands of) uncertainty distributions are generated in exactly the same order. Such an order is easy to ensure for the uncertainty distributions of inputs, outputs, and characterization factors, which are mapped directly to matrix cells, but it is not that simple for the parameters. For the formula evaluation we build a nested environment for the formula interpreter in which parameter values are bound and updated during the simulation runs (very similar to what is described here). To replicate the order of the number generation, we also have to implement additional rules that define the order in which the parameter distributions are handled.
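
One possible ordering rule for the parameters, sketched under assumptions: collect the parameter distributions, sort them by a stable key (here scope, then name), and only then draw from the shared generator. `ParamDist`, the scope strings, and the uniform distributions are hypothetical; the sketch also assumes that scope/name pairs are unique, since ties would fall back to the collection order.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Random;

final class OrderedParameterDraws {
    // Hypothetical parameter distribution: a scope (e.g. "global" or a
    // process ID), a parameter name, and a uniform range [min, max).
    static final class ParamDist {
        final String scope;
        final String name;
        final double min;
        final double max;

        ParamDist(String scope, String name, double min, double max) {
            this.scope = scope;
            this.name = name;
            this.min = min;
            this.max = max;
        }

        String key() {
            return scope + "/" + name;
        }
    }

    // Example rule: sort by scope, then by name, before drawing. The order
    // in which the parameters were collected (e.g. HashMap iteration) then
    // has no influence on which random number each parameter receives.
    static Map<String, Double> draw(List<ParamDist> params, long seed) {
        List<ParamDist> sorted = new ArrayList<>(params);
        sorted.sort(Comparator.comparing((ParamDist p) -> p.scope)
                .thenComparing(p -> p.name));
        Random shared = new Random(seed);
        Map<String, Double> values = new HashMap<>();
        for (ParamDist p : sorted) {
            values.put(p.key(), p.min + (p.max - p.min) * shared.nextDouble());
        }
        return values;
    }

    public static void main(String[] args) {
        ParamDist a = new ParamDist("global", "yield", 0.8, 1.0);
        ParamDist b = new ParamDist("process-1", "loss", 0.0, 0.1);
        Map<String, Double> v1 = draw(List.of(a, b), 42L);
        Map<String, Double> v2 = draw(List.of(b, a), 42L);
        System.out.println(v1.equals(v2)); // true
    }
}
```

The draws happen before the formulas are evaluated, so this rule only fixes which number each parameter gets; the nested interpreter environment can still evaluate the formulas in whatever order it needs.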

@m-jamieson

Would this be simplified at all if replicated results were only guaranteed to repeat exactly for a given seed when the openLCA version and the product system are the same? When developing a Monte Carlo tool for Excel here, we ended up with results that were guaranteed to repeat until the worksheet calculation chain changed, for example when adding a new equation to a cell shifted the order of all equations. I find this behavior generally acceptable for our needs.
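
The effect of a changed calculation chain can be seen in a small sketch with one shared, seeded `java.util.Random`. The distribution names `A`, `B`, `C`, and `X` are hypothetical placeholders.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Random;

final class ChainShiftDemo {
    // Draws one value per named distribution, in list order, from a single
    // Random seeded with `seed`. Names are assumed to be unique.
    static Map<String, Double> draw(List<String> distributions, long seed) {
        Random shared = new Random(seed);
        Map<String, Double> values = new LinkedHashMap<>();
        for (String name : distributions) {
            values.put(name, shared.nextDouble());
        }
        return values;
    }

    public static void main(String[] args) {
        Map<String, Double> before = draw(List.of("A", "B", "C"), 42L);
        // A new distribution "X" is inserted after "A", e.g. a new
        // uncertain exchange in the product system.
        Map<String, Double> after = draw(List.of("A", "X", "B", "C"), 42L);
        System.out.println(before.get("A").equals(after.get("A"))); // true
        System.out.println(before.get("C").equals(after.get("B"))); // true
    }
}
```

Inserting `X` leaves `A` unchanged but moves every later distribution one position along the random sequence, so `B` now receives the number that `C` received before.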

@msrocka
Member Author

msrocka commented Jan 11, 2019

@jump2conclusionsmatt exactly this. A new openLCA version could result in a different order for the number generation. So yes, it makes it easier when we restrict the guarantee to recreate a result sequence to these conditions:

  • same product system (with all related processes, (global) parameters, LCIA factors etc.),
  • same openLCA version,
  • same seed

For the parameter uncertainties I still need to find and implement rules so that the order of number generation is deterministic. I will run some tests in the next few days and post the results in this thread.
