Add MISFIT_PREPROCESSOR to ERT template #217

wouterjdb · 2020-10-14T09:25:07Z

This PR implements scaling of correlated observations using the ERT built-in PCA scaling method.

Contributor checklist

wouterjdb · 2021-02-09T10:11:18Z

✔️ kmeans clustering has now been added equinor/semeio#286

🚫 currently still blocked by equinor/ert#1316

wouterjdb · 2021-02-18T08:21:02Z

✔️ kmeans clustering has now been added equinor/semeio#286

✔️ speed improvement for many observations equinor/ert#1316

wouterjdb · 2021-02-18T08:23:59Z

🚫 Currently blocked by the new commits not yet being in pypi.

wouterjdb · 2021-02-22T09:30:43Z

Both packages are now updated on pypi (2.21.b0 and 1.0.b0)

✔️ Ready for testing.

edubarrosTNO · 2021-03-26T12:38:30Z

I have tested the MISFIT_PREPROCESSOR option in the Norne case by using the code in the branch of this PR.
With this workflow job enabled, ERT writes some files to a subfolder inside the FlowNet output folder (<FLOWNET_OUTPUT_FOLDER>/reports/default_0):

1. Inside subfolder CorrelatedObservationsScalingJob, 3 files are created:
   a. scale_factor.json: [34.63, 14.76]
   b. svd.json: a 2D array of size (33, 2) containing what appear to be two lists of 33 singular values in decreasing order.
   c. workflow-log.txt: a text file with some information about the calculation of the scaling factors stored in scale_factor.json - in this case two blocks of information, each giving the number of primary components, the number of observations and the list of observation keys used to calculate the scaling factor.
2. Inside subfolder MisfitPreprocessorJob, 4 files are created:
   a. clusters.json: a Python dictionary of dictionaries associating the observation keys with their numbering.
   b. correlation_matrix.csv: a rather large CSV file (950 MB) which was hard to inspect given its size (but I believe it is a square Nobs x Nobs matrix).
   c. svd.json: a 2D array of size (33, 1) containing what appears to be a list of 33 singular values in decreasing order (the same as one of the lists stored in 1.b).
   d. workflow-log.txt: a text file with some information about the obtained clusters of observations - in this case two clusters as stored in clusters.json, cluster 0 and cluster 1, with their respective lists of observation keys and numbering (cluster 1 appears to contain many more observation keys than cluster 0).
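
For anyone wanting to repeat this inspection, here is a minimal sketch using only the Python standard library. It assumes the folder layout described above; `load_json`, `csv_shape` and `summarize` are hypothetical helper names, not part of ERT or semeio. The CSV is streamed row by row so the 950 MB correlation matrix never has to fit in memory, and the reported shape includes any header row or index column the file may contain.

```python
import csv
import json
from pathlib import Path


def load_json(path):
    """Return the parsed content of a JSON file."""
    with open(path, encoding="utf-8") as handle:
        return json.load(handle)


def csv_shape(path):
    """Count rows and columns of a CSV file by streaming it row by row."""
    n_rows = 0
    n_cols = 0
    with open(path, newline="", encoding="utf-8") as handle:
        for row in csv.reader(handle):
            n_rows += 1
            n_cols = max(n_cols, len(row))
    return n_rows, n_cols


def summarize(report_dir):
    """Summarize the report files; the folder layout is assumed from the comment above."""
    report_dir = Path(report_dir)
    scaling = report_dir / "CorrelatedObservationsScalingJob"
    misfit = report_dir / "MisfitPreprocessorJob"
    return {
        "scale_factor": load_json(scaling / "scale_factor.json"),
        # Only the number of top-level entries is reported, because the
        # exact nesting of clusters.json is not known here.
        "clusters_top_level_entries": len(load_json(misfit / "clusters.json")),
        "correlation_matrix_shape": csv_shape(misfit / "correlation_matrix.csv"),
    }
```

Calling `summarize("<FLOWNET_OUTPUT_FOLDER>/reports/default_0")` after each run would give a compact record that is easy to compare between runs.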

edubarrosTNO · 2021-03-26T12:44:37Z

All in all, the only thing that I could infer from these output files is that 2 clusters of observations seem to be formed, each with a scaling factor calculated from some singular value decomposition or PCA (with 33 non-zero singular values). But it remains unclear why 2 clusters were formed and how the singular values are used to determine the scaling factors.
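
One possible reading, which should be checked against the semeio source before relying on it: the scaling factor of a cluster is the square root of the number of observations divided by the number of principal components needed to explain a given fraction of the variance. The sketch below implements that assumed rule; the function names and the 0.95 threshold are assumptions of mine and are not taken from the output files. If this reading is right, a factor of 34.63 would correspond to roughly 1199 observations per retained component.

```python
import math


def n_components(singular_values, threshold=0.95):
    """Smallest number of leading components whose squared singular
    values reach `threshold` of the total (assumed selection rule)."""
    total = sum(s * s for s in singular_values)
    if total == 0:
        raise ValueError("all singular values are zero")
    cumulative = 0.0
    for count, s in enumerate(singular_values, start=1):
        cumulative += s * s
        if cumulative / total >= threshold:
            return count
    return len(singular_values)


def scale_factor(n_observations, singular_values, threshold=0.95):
    """Assumed rule: sqrt(number of observations / number of components)."""
    return math.sqrt(n_observations / n_components(singular_values, threshold))
```

Under this rule, a cluster with many strongly correlated observations keeps few components and therefore gets a large factor, which would inflate the observation errors of that cluster accordingly.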

Another observation is that, when I ran it a second time, the output of MISFIT_PREPROCESSOR differed from the first attempt. In the second run, 3 clusters seem to have been formed: the scaling factor of cluster 0 remained close to the factor calculated in the first attempt, and the scaling factors of clusters 1 and 2 add up approximately to the scaling factor of cluster 1 in the first attempt (suggesting that, in this second run, the old cluster 1 was split into two clusters). In summary, there seems to be some randomness associated with this MISFIT_PREPROCESSOR process despite the fact that the RANDOM_SEED fixed in the ERT config file is the same in both runs. This should be reported in the ERT repository.
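
To back up such a report, the cluster assignments of the two runs could be compared directly. K-means normally starts from randomly chosen initial centers, so different partitions between runs are plausible unless the clustering itself is seeded; whether RANDOM_SEED reaches that step is not established here. The sketch below assumes each run has been converted to a dictionary mapping a cluster label to its observation keys (a hypothetical structure, not necessarily the layout of clusters.json), and the helper names are my own.

```python
def split_map(first_run, second_run):
    """Map each first-run cluster to the second-run clusters it overlaps."""
    result = {}
    for label_a, keys_a in first_run.items():
        set_a = set(keys_a)
        result[label_a] = sorted(
            label_b for label_b, keys_b in second_run.items() if set_a & set(keys_b)
        )
    return result


def is_refinement(first_run, second_run):
    """True if every second-run cluster lies entirely inside one first-run cluster."""
    first_sets = [set(keys) for keys in first_run.values()]
    return all(
        any(set(keys_b) <= set_a for set_a in first_sets)
        for keys_b in second_run.values()
    )


# Illustrative placeholder keys, not real Norne observation keys.
run_1 = {0: ["OBS_A", "OBS_B"], 1: ["OBS_C", "OBS_D", "OBS_E"]}
run_2 = {0: ["OBS_A", "OBS_B"], 1: ["OBS_C"], 2: ["OBS_D", "OBS_E"]}
print(split_map(run_1, run_2))
print(is_refinement(run_1, run_2))
```

If `is_refinement` returns True and `split_map` shows the old cluster 1 mapping to the new clusters 1 and 2, that would confirm the split suspected above rather than a completely different partition.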

To conclude: based on my tests in the Norne example, I would not recommend merging this PR branch into master before we better understand what this option does exactly and ensure that we can control any randomness associated with this process. If we do proceed with merging, my advice would be to expose this as an optional setting in the FlowNet config file and make sure it is disabled by default. The large number of failing FlowNet simulations when this option was enabled prevented me from determining whether or not it would help mitigate the problem of having a very large number of observations in our FlowNet runs.

wouterjdb added the enhancement New feature or request label Oct 14, 2020

wouterjdb self-assigned this Oct 14, 2020

wouterjdb added this to In progress in FlowNet via automation Oct 14, 2020

wouterjdb changed the title ~~Add STD_SCALE_CORRELATED_OBS TRUE to ERT template~~ Add MISFIT_PREPROCESSOR to ERT template Oct 14, 2020

wouterjdb added the blocked label Oct 16, 2020

oyvindeide mentioned this pull request Oct 21, 2020

[MPP] Running slow with large number of observations equinor/semeio#241

Closed

2 tasks

wouterjdb moved this from In progress to On hold in FlowNet Nov 13, 2020

wouterjdb added 3 commits February 12, 2021 08:57

Add STD_SCALE_CORRELATED_OBS TRUE to ERT template

2250877

Add MISFIT_PREPROCESSOR workflow

85473c2

Add semeio 0.5.6 to test speed

9ca3e53

wouterjdb force-pushed the i206-manyobs branch from 5bab7e1 to 9ca3e53 Compare February 12, 2021 08:28

wouterjdb added 2 commits February 12, 2021 09:29

Remove STD_SCALE_CORRELATED_OBS

84c1e17

Bump semeio

839596a

Merge branch 'master' into i206-manyobs

97f477a

wouterjdb removed the blocked label Feb 22, 2021

wouterjdb and others added 10 commits February 22, 2021 11:28

Bump ert and semeio versions

e58020f

Merge branch 'i206-manyobs' of github.com:wouterjdb/flownet into i206…

7c1fc4e

…-manyobs

pylint fixes

eeecca0

Add misfit preprocessor config

3056d08

black

49bfe6c

Add the config file

f0dad00

Test larger timeout = 3600

dd4593d

Merge branch 'master' into i206-manyobs

5eb48b9

Merge branch 'master' into i206-manyobs

7106e6d

Merge branch 'master' into i206-manyobs

a546e47

wouterjdb moved this from On hold ✋ to In progress 🚧 in FlowNet Mar 5, 2021

wouterjdb assigned olelod Mar 12, 2021

Merge branch 'master' into i206-manyobs

7575c4d

edubarrosTNO mentioned this pull request Mar 20, 2021

Problems with ert=2.21.0 when running large ensembles #367

Closed

wouterjdb added this to the HM open Norne Model milestone Apr 26, 2021

wouterjdb moved this from In progress 🚧 to On hold ✋ in FlowNet May 19, 2021

wouterjdb removed their assignment Jun 17, 2021
