Generate a Gaussian noise dataset #52

jrs65 · 2019-10-08T23:32:43Z

This task should:

Use the noise estimates of an existing dataset to replace the data section (use the weight property).
Support multiple container types. I'm not sure duck typing will work, because the data section has different names, and sometimes is real and sometimes complex, so you might need to specify the properties for different types of data
Support both draco and CHIME timestream types without explicitly importing CHIME packages (this you may need to duck type)
Efficiently generate random numbers (use RandomGen which is already a dependency)

@tristpinsm feel free to add items to this one.

The text was updated successfully, but these errors were encountered:

anjakefala · 2019-10-09T23:46:55Z

Edited based on Tristan's comment
My understanding:

The weight property is the estimate of the variance of the noise in each datasample.
vis_weight is the inverse variance
The task should zero all data and replace it with a realisation of a Gaussian noise distribution with a weight variance

Is the above correct?

tristpinsm · 2019-10-09T23:48:53Z

that's right, except that vis_weight is actually the inverse variance

anjakefala · 2019-10-09T23:50:40Z

@tristpinsm Thanks!

anjakefala · 2019-10-15T23:29:43Z

Notes from meeting:

For a random generator, the following key things are needed:

you want it to be reproducable
you want it to be generating different arrays for each node and for each realisation

for mpi_random_seed, the extra param takes care of generating a different array for each node. we also may want to have multiple reproducible realisations. Proposal is to either add a counter to tasks that employ it or the mpi_random_seed context manager keeps track of how often it was run.

…nerator Closes #52.

anjakefala · 2019-10-22T23:27:37Z

Do we want to update the other tasks to use RandomGen instead of NumPy.random?

tristpinsm · 2019-10-22T23:45:14Z

Would the reason to do so be performance? Do we think using numpy is currently a limitation?

anjakefala · 2019-10-22T23:46:12Z

I will look into the difference between them! I just noticed that @jrs65 recommended using RandomGen and elsewhere in synthesis/noise.py uses NumPy.random.

tristpinsm · 2019-10-22T23:50:16Z

yeah I had never heard of RandomGen until now and I'm still not sure what its advantages are. From what I read in their docs it is supposed to be quite a bit faster in certain situations. They also note that compatibility (presumably between versions) is not guaranteed.

tristpinsm · 2019-10-22T23:58:50Z

I think this has been integrated in numpy already: https://docs.scipy.org/doc/numpy/reference/random/generator.html#numpy.random.Generator. As long as we are running a recent version this functionality should already be available (?)

jrs65 · 2019-10-23T04:08:47Z

I don't think there's any reason to go back and change things, but as RandomGen is already a requirement we should use it when we can.

As @tristpinsm says the advantage is performance, and there are certain tasks (of which this will be one) where the speed of the RNG is the bottleneck. I introduced it for the delay power spectrum estimator (which in some sense internally does what your doing here hundreds of times), and it took it down from 40 mins per power spectrum to more like 10 mins.

jrs65 · 2019-10-23T04:12:38Z

I agree that it looks like it has been merged into numpy in 1.17 (or at least the core). I'd like to check that it still has the OpenMP parallel mode though before dropping the dependency.

anjakefala · 2019-10-23T17:23:58Z

Understood!

The reason behind changing the original ones was to simplify the seed state setting (to avoid having one for setting the NumPy seed and one for setting the RandomGen seed). But if it is integrated into NumPy (I will check if it still has the OpenMP parallel mode), perhaps they share a state.

anjakefala · 2019-10-23T18:51:29Z

So, seeding seems to work in different ways for the "legacy random" and the "new generators".

RandomState provides access to legacy random https://numpy.org/devdocs/reference/random/legacy.html. get_state/set_state/seed specifically work with the legacy randoms https://numpy.org/devdocs/reference/random/legacy.html?highlight=seed.

The new RandomGenerator works by initialising a generator with a seed https://numpy.org/devdocs/reference/random/generator.html#numpy.random.Generator. SeedSequence https://numpy.org/devdocs/reference/random/bit_generators/generated/numpy.random.SeedSequence.html#numpy.random.SeedSequence is the main class that determines the sequence of seeds.

So if we bump to NumPy 1.17, it will be a bit of a refactor, and the two random generators do not intersect with their seed states.

anjakefala · 2019-10-23T19:52:26Z

I cannot confirm whether it still has OpenMP parallel mode, but I do not see anything in here that would indicate support being removed: numpy/numpy#13163.

tristpinsm · 2019-10-23T20:02:00Z

It seems like the changes are pretty significant between 1.16 and 1.17. Should we create an issue to migrate whatever uses the old RandomState methods to Generators?

anjakefala · 2019-10-23T21:19:02Z

@tristpinsm Agreed, they are big and out of scope for this issue/pr, and I think it would be good to do the migration.

Closes #52.

jrs65 assigned anjakefala Oct 8, 2019

tristpinsm self-assigned this Oct 9, 2019

anjakefala added a commit that referenced this issue Oct 22, 2019

feat(gaussiannoise): add a first-pass add a gaussian noise dataset ge…

b974338

…nerator Closes #52.

anjakefala mentioned this issue Oct 22, 2019

feat(gaussiannoise): add a gaussian noise dataset generator #55

Merged

anjakefala added a commit that referenced this issue Oct 23, 2019

feat(gaussiannoise): add local seed states for RandomGen

4bec93a

Closes #52.

anjakefala closed this as completed in #55 Feb 20, 2020

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Generate a Gaussian noise dataset #52

Generate a Gaussian noise dataset #52

jrs65 commented Oct 8, 2019 •

edited by anjakefala

anjakefala commented Oct 9, 2019 •

edited

tristpinsm commented Oct 9, 2019

anjakefala commented Oct 9, 2019

anjakefala commented Oct 15, 2019

anjakefala commented Oct 22, 2019

tristpinsm commented Oct 22, 2019

anjakefala commented Oct 22, 2019

tristpinsm commented Oct 22, 2019

tristpinsm commented Oct 22, 2019

jrs65 commented Oct 23, 2019

jrs65 commented Oct 23, 2019

anjakefala commented Oct 23, 2019 •

edited

anjakefala commented Oct 23, 2019 •

edited

anjakefala commented Oct 23, 2019 •

edited

tristpinsm commented Oct 23, 2019

anjakefala commented Oct 23, 2019

Generate a Gaussian noise dataset #52

Generate a Gaussian noise dataset #52

Comments

jrs65 commented Oct 8, 2019 • edited by anjakefala

anjakefala commented Oct 9, 2019 • edited

tristpinsm commented Oct 9, 2019

anjakefala commented Oct 9, 2019

anjakefala commented Oct 15, 2019

anjakefala commented Oct 22, 2019

tristpinsm commented Oct 22, 2019

anjakefala commented Oct 22, 2019

tristpinsm commented Oct 22, 2019

tristpinsm commented Oct 22, 2019

jrs65 commented Oct 23, 2019

jrs65 commented Oct 23, 2019

anjakefala commented Oct 23, 2019 • edited

anjakefala commented Oct 23, 2019 • edited

anjakefala commented Oct 23, 2019 • edited

tristpinsm commented Oct 23, 2019

anjakefala commented Oct 23, 2019

jrs65 commented Oct 8, 2019 •

edited by anjakefala

anjakefala commented Oct 9, 2019 •

edited

anjakefala commented Oct 23, 2019 •

edited

anjakefala commented Oct 23, 2019 •

edited

anjakefala commented Oct 23, 2019 •

edited