We've been experiencing reproducibility issues with OrthogonalMatchingPursuit across different computing environments. Searching the project's issues, it looks like there were some OMP reproducibility issues in the CI environment around 2014, which led to the OMP tests being disabled. Looking at the code, there aren't many potential culprits that would explain this behavior. Parallel/delayed is used in OrthogonalMatchingPursuitCV, but it can be controlled with n_jobs, and changing it doesn't change the outcome. Digging deeper brings us to scipy's use of OpenBLAS. So now to the questions:
Replies: 1 comment 3 replies
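With multithreading enabled (that is, `OPENBLAS_NUM_THREADS > 1`), the ordering of sub-operations (e.g. blocks of computations performed during a matrix-matrix multiplication) is not deterministic: it depends on the thread scheduling decisions made by the operating system. From a mathematical point of view, this should not matter, because arithmetic operations such as the additions performed on the intermediate results are associative and commutative. However, computers work with finite-precision floating-point numbers instead of true real numbers, and floating-point addition is not associative. As a consequence, the results can differ by non-deterministic rounding errors.
If your data is correlated, those small changes can dramatically impact the sparsity pattern of the solution OMP converges to.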
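The rounding effect can be reproduced without any threads at all. The sketch below is only an illustration, not what OpenBLAS does internally: it treats four hand-picked numbers as hypothetical "partial results" from different blocks and adds them up in every possible order, the way a scheduler might deliver them.

```python
from itertools import permutations


def combine(partials, order):
    """Accumulate the partial results in the given order, left to right."""
    total = 0.0
    for index in order:
        total += partials[index]
    return total


# Hypothetical partial sums; in exact arithmetic they add up to 2.0.
partials = [1e16, 1.0, -1e16, 1.0]

# Each permutation stands in for one possible thread completion order.
results = {combine(partials, order) for order in permutations(range(len(partials)))}

print(sorted(results))  # more than one distinct value appears
```

The values are deliberately extreme so the differences are easy to see. In a real matrix product the discrepancies are usually in the last few bits, which is still enough to flip a near-tie between two correlated features.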
None of the results is more correct than the others: the rounding errors you get with sequential code have no particular reason to be more correct than those you get with a non-deterministic order of execution. The sparsity pattern of the solution of the OMP optimization problem is fundamentally unstable when dealing with correlated features. To quantify this (lack of) stability, it is a good idea to fit the same model many times on random resamplings of the dataset. An example in the documentation illustrates this; it considers Ridge and Lasso regression, but you get similar (or even stronger) variability with OMP.
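As a rough sketch of the resampling idea, the snippet below refits OMP on bootstrap resamples of a small synthetic dataset and counts how often each feature is selected. Everything about the data is an assumption made for illustration (the sizes, the noise levels, the nearly duplicated feature), and `sklearn.utils.resample` is just one convenient way to draw the resamples, not necessarily the one the reply had in mind.

```python
import numpy as np
from sklearn.linear_model import OrthogonalMatchingPursuit
from sklearn.utils import resample

rng = np.random.RandomState(0)
n_samples, n_features = 100, 20

# Synthetic data (illustrative assumption): feature 1 is almost a copy of
# feature 0, so the two are strongly correlated.
X = rng.randn(n_samples, n_features)
X[:, 1] = X[:, 0] + 0.01 * rng.randn(n_samples)
y = X[:, 0] + X[:, 5] + 0.1 * rng.randn(n_samples)

n_rounds = 50
selected = np.zeros(n_features, dtype=int)
for seed in range(n_rounds):
    # Bootstrap resample: draw n_samples rows with replacement.
    X_boot, y_boot = resample(X, y, random_state=seed)
    omp = OrthogonalMatchingPursuit(n_nonzero_coefs=2).fit(X_boot, y_boot)
    selected += omp.coef_ != 0

# Fraction of resamples in which each feature got a non-zero coefficient.
selection_frequency = selected / n_rounds
print(np.round(selection_frequency, 2))
```

If the selection frequency is split between the correlated features rather than concentrated on one of them, the sparsity pattern depends on small perturbations of the data, and rounding differences between environments can be expected to have the same kind of effect.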
On multicore machines, if you do not set `OPENBLAS_NUM_THREADS`, OpenBLAS runs multithreaded by default. You can introspect the number of threads OpenBLAS uses by default on your machine.
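One possible way to do both, sketched under assumptions: pin OpenBLAS to a single thread through the environment variable, then list the thread counts of the BLAS libraries that are loaded. The introspection part relies on the third-party `threadpoolctl` package, which is an assumption here and is not named in the text above. The helper name `blas_thread_report` is made up for this example, and it returns an empty list when NumPy or `threadpoolctl` is not installed.

```python
import os

# Assumption: this runs before NumPy/SciPy are first imported, since OpenBLAS
# reads the variable when the library is initialized. setdefault keeps any
# value that is already set in the environment.
os.environ.setdefault("OPENBLAS_NUM_THREADS", "1")


def blas_thread_report():
    """Return (library, num_threads) pairs for the BLAS libraries in use."""
    try:
        import numpy  # noqa: F401  (importing NumPy loads its BLAS library)
        from threadpoolctl import threadpool_info
    except ImportError:
        return []
    return [
        (info["internal_api"], info["num_threads"])
        for info in threadpool_info()
        if info["user_api"] == "blas"
    ]


report = blas_thread_report()
print(report)
```

Running with a single BLAS thread removes the scheduling-dependent ordering on one machine. It does not guarantee identical results across machines, because different CPUs and OpenBLAS builds can still take different code paths.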