Rework random generator to use LCG #928

zvirja · 2017-11-19T22:41:14Z

Fixes #789.

Current random generator suffers from a few issues which I tried to address in this PR:

it uses the !generated.Contains(newRnd) statement to guarantee the uniqueness of the produced values. This approach could lead to significant delays when latest elements in a sequence are generated - we should find those "remaining" numbers using random 😕
it leads to OOM exceptions when you create large number of items, because underlying collection grows too much.

To address those issues I used the Linear Congruential Generator which generates the unique values if constants match the specific criteria. It has very small memory footprint and offers a good generation performance. I used constants from this document, where they mentioned that values are pretty good. Also I've tested all the constants till 2^33 and proven that they indeed produce a unique sequence (there were a few wrong constants, so I fixed them). I haven't tested the constants above because my PC has no enough power for that (RAM and CPU cores).

The issue with LCG is that it always produces the same sequence for the given constants. To overcome that I used a few techniques:

I use the system random to generate a seed - the starting number;
we have a few generator constants for the same value range and we randomly pick the constant to use.

Those approaches allows to make the produced sequence even more random and non-repeating.

A few things we should to be aware of:

Under the hood generator uses the hardcoded constants, meaning that random value range is fixed (this.m-1). To adapt to the required range I ignore random values above the range (see a cycle in the GenerateNextValue() method). In average we ignore a half of the generated numbers as we have constants for each power of 2. So far I haven't noticed any issues with that during testing.
I haven't tested the distribution and randomless quality of the hardcoded constants (using SmallCrush, Diehard and or tests). I believe that it shouldn't be an issue, as we are not a crypt tool and the previous approach didn't guarantee any distribution neither. I reviewed values visually and they look pretty random 😅
I used Debug.Assert() a few times - that condition is never expected to happen logically, however left them for documentation purpose and as a sanity check.

No tests were modified (you might see a refactoring only).

Additionally I've referenced the System.ValueTuple package in the AutoFixture package, so we could use the tuples from C# 7. That shouldn't be an issue as we are still building v4. Also that could enable us to support tuples generation by AutoFixture if needed.

@moodmosaic Please take a look and let me know your opinion. I've spent a few weeks working on this, but it could happen that I overlooked something 😉

moodmosaic

The issue with LCG is that it always produces the same sequence for the given constants.

This is NOT an issue; that's how PRNGs are supposed to work 😉

given the same seed
you get back the same result
— so how do you have PRNGs generate random numbers, then?
you make sure to supply a totally random seed
and one way of implementing this is by splitting the seed each time you need to generate a number

I have A LOT feedback to provide here, as my experience from years with AutoFixture, and the issues I've seen by using FsCheck for years, are more or less the result of finding myself working in Hedgehog.

It really depends on you, @zvirja, which direction you'd like to take:

Keep around the existing random number generation (as well as the existing way of dealing with ranges (Add support for string based properties decorated with the RangeAttribute #919)), which is fine—after all, these are already there, from version 3—though, we'd have to be conservative with bugfixes, as we can easily end up putting a lipstick on a pig...
Fix the random number generation by decoupling the 'range' concept (constant, linear, exponential) from the PRNG itself, and use some reasonable seed, preferably SplitMix.

It's exciting that you took the time to tackle this very issue 👍 — I look forward to hearing your decision.

zvirja · 2017-11-21T12:18:36Z

we'd have to be conservative with bugfixes, as we can easily end up putting a lipstick on a pig...

Well, this is approximately what I'm trying to do in this PR 😄 I don't change the overall architecture and
simply change the implementation to require much less memory and provide results with less delay.

It really depends on you, @zvirja, which direction you'd like to take:

Well, actually it's pretty hard for me to decide 😕 The idea of total rework looks attractive, however, I feel that it's probably better to keep the existing approach and simply improve it internally for now. That would decrease the amount of changes we have in v4, as we already applied a set of them. Also that will allow us to save some time and finally release v4 RTM. In v5 when we have more time, we could rework that and probably a few other things.

I feel a high level of uncertainty with this decision, however it looks like a good plan for now.

Does that make sense to you?

moodmosaic · 2017-11-23T14:41:01Z

I understand that you threw a lot of effort on this, and it is highly appriciated 🎉

However,

Is it a simple change?
Is this set of changes warranted?

Why choose LCG over others, battle-tested ones, like

I understand that the current implementation suffers from mediocre performance, however,

we haven't run statistical tests (e.g. diehard) between the current and the proposed implementation.

That being said, it might be just as easy as

trying to 'fix/optimize' the current one, without throwing in a whole bunch of changes.
—or—by moving LCG to its own class, and then change the Fixture instance to use that one instead.

zvirja · 2017-11-23T23:39:39Z

Why choose LCG over others, battle-tested ones, like

I'm not very strong in this particular area, therefore I relied on the Google. I didn't want to change the behavior of the existing generator, and the current one has a full cycle (no repetitions within the range). From what I got we have either LCG or LFSR which could produce such sequences. I ended up with the first one as it was documented to provide a bit better quality. Could the "TFGen" or "SplitMix" also produce non-repeating values withing the specified range? Otherwise, that change is breaking and it was not my intention.

we haven't run statistical tests (e.g. diehard) between the current and the proposed implementation.

Yep, it's the whole point. While we haven't tested, I 99% sure that the previous generator implementation suffered from the distribution issues due to the nature of its implementation. Therefore, when I decided to replace it with LCG, I assumed that quality is not the strong point here. Rather, we expect have something a bit random and that's enough. The LCG provided me with such values, so ti looked like a good replacement even if I haven't tested that function has good enough quality. The more important thing is that it preserves the full-cycle semantics as that is asserted in our unit tests.

That being said, it might be just as easy as

I don't see too much value in persisting the previous implementation as a new one provides the exactly same output. The observer will not see any difference after this change. The only visible difference might be the changed distribution of numbers, but our tool is not designed to generate random of the perfect quality, so I'm keeping the status quo.

That is how I see the situation and that's why I suggested to proceed with current non-optimal architecture and re-design it later in v5 in a more architectural way if we will.

Probably, from your experience my changes look like a vandalism or something totally amateurish and that might be so. However I didn't pursue a goal to build a generator with a good distribution. Rather I simply wanted to have:

full-cycle generator
which doesn't suffer from current issues
and generates pseudo-random sequence of "non-obvious" quality 😄.

From my perspective this PR achieved that goal.

If you see that my goal is not achieved or you see the issues I've introduced - please share your feedback, so we are aligned. I'd be really happy to learn this area and see why my proposal is not acceptable (if that is so), despite the fact that I've already invested a lot of resources into that.

moodmosaic · 2017-11-30T06:28:26Z

You did a massive work here, and I think we can use this in place of the RandomNumericSequenceGenerator in Fixture.

Would you mind putting this in a new file named as LinearCongruentialNumberGenerator or LinearCongruentialNumericGenerator if we want to be consistent?

It's just that it feels like we're putting a lipstick on a pig trying to fix the existing one is all 😅

That's the same strategy we used with all sorts of new generators, we never deleted or modified the old ones, and instead we've added new ones. (SOLID can be append-only, too.)

zvirja · 2017-11-30T22:48:10Z

The more I think about this PR and review the stuff, the more internal pain I start to feel 😅

My initial intent was to fix the current generator issues, without changing the observable behavior. I assumed that our clients will not notice any difference after the change, while internally we win performance. In fact, the public API wasn't changed, none of the test failed - I simply adjusted the implementation detail, which is safe to do even in bug fix releases.

Previously, you mentioned and now I that clearly see, that current implementation is.. not the best one. I also don't like the idea that RandomNumericSequenceGenerator has too much responsibilities both in implementation and usage. Because of that class contract is very fuzzy. For instance, if you request sbyte elements, you'll quickly start to see negative numbers due to overflow. As for the int type you would probably never get the negative numbers. This builder blows my mind each time I see it, because it pretends that it offers full cycle, however that will be only if you request "types" large enough to hold those values. I also don't see any reason in that contract when we offer the full cycle for sporadic member value requests. And so on 😧😟😢

I was fine to keep that architecture as-is and simply change the implementation. However, if you decline that idea and the only option is to add a new builder, that's another story. I don't want to add a shiny new builder which will be poisoned by the bad architecture from the beginning. We'll need to keep it in our code for a few major versions even if we have a good replacement. In that light, I'd vote to close this PR (keeping the implementation in branch to re-use it later) and do nothing for now. Later we'll design the better architecture (as you initially suggested) where we'll fix all the drawbacks and split responsibilities, so this issue will be automatically fixed. We can develop that in minor updates, as that will not lead to the breaking changes.

Therefore, let's see your final voice. If you see reasons why we cannot alter implementation of the RandomNumericSequenceGenerator, then let's close the PR - I don't want to spread this chaos across the project 😄

moodmosaic · 2017-12-01T10:58:43Z

Yes, you probably didn't see any tests failing as we haven't done any statistical testing on what we currently have.

zvirja · 2017-12-01T15:15:45Z

Yep, that is probably the only thing that could be affected. I don't know the project background well to know whether we actually promised the distribution quality to our clients. I was under the impression, that we didn't (as we don't have tests for that) so it's safe to change it.

Therefore, I keep it up to you to decide whether current change is acceptable or not, as you are more experienced on this area than me 😉

moodmosaic · 2017-12-01T18:29:31Z

I'd rather keep your implementation in a new file as LcgNumericSequenceGenerator, and then change the Fixture class to use this one instead.

moodmosaic · 2017-12-01T18:30:13Z

If you do that, I have a few comments for your implementation, but you'd have to move it to a new file first.

zvirja · 2017-12-12T08:57:29Z

@moodmosaic In this case let's follow the other option you suggested - decouple random generator (and the range concept) from the RandomNumericSequenceGenerator, that is used to serve number requests.

I suggest to create the following interface in the Random namespace:

interface IRandomGenerator
{
    long GetNextValue();
}

and also probably:

interface IRangedRandomGenerator: IRandomGenerator
{
    (long value, (long min, long max) range) GetNextValueAndRange();
}

The latter one could be used by RandomRangedNumberGenerator, so we don't hardcode the range and use the default one.

With this approach we could:

implement different generators (i.e. Random based, LCG, exponential, etc);
easily customize fixture to use different random generator, as we simply need to change the strategy for the existing builder;
use generators directly when we need random values (e.g. by ElementsBuilder), rather than RandomNumericSequenceGenerator that provides less obvious API.

Is that design fine to you, or you see some drawbacks with it? Is that what you previously meant?

moodmosaic · 2017-12-13T11:16:44Z

@zvirja, I'm not sure we need to expose (and maintain) interfaces right now. After all,

"clients [...] own the abstract interfaces"
-- Agile Principles, Patterns, and Practices, ch. 11

I think, all we really need is two functions, like those in here and a seed like this one.

The previous implementation sufferred from a few major issues. The first issue is that it failed with OOM if large number of numeric requests are issued. Another issue is that generation time of the latest elements in a range could take large amount of time, as it works in a loop. The new implementation uses the Linear Congruential Generator approach where the properties of prime numbers are used to generate non-repeating sequence on a particular range. While we are still generating numbers in a loop ignoring out-of-range values, the predictability of this approach is much better because of the values distribution. If distribution were uniform, we'd ignore at most half of the generated numbers. In reality the distibution is worse, however still good enough to rely on it.

zvirja added bug enhancement labels Nov 19, 2017

zvirja added this to the 4.0 RTM milestone Nov 19, 2017

zvirja requested a review from moodmosaic November 19, 2017 22:41

moodmosaic reviewed Nov 20, 2017

View reviewed changes

zvirja removed this from the 4.0 RTM milestone Dec 15, 2017

jzabroski mentioned this pull request Jun 21, 2020

What's next for AutoFixture? #1167

Closed

moodmosaic mentioned this pull request Jul 3, 2020

Proposed fix for #1183 - multiple regex within short time might give identical strings #1187

Merged

zvirja added 2 commits February 21, 2021 23:04

Improve test readability without changing its logic

3abf85b

aivascu force-pushed the improve-random-generator branch from 3a33289 to 305c4ad Compare February 21, 2021 21:20

aivascu added 2 commits February 21, 2021 23:39

Implements code style changes

3a4dd51

Renames random numeric sequence generator

d566f61

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Rework random generator to use LCG #928

Rework random generator to use LCG #928

zvirja commented Nov 19, 2017

moodmosaic left a comment

zvirja commented Nov 21, 2017

moodmosaic commented Nov 23, 2017 •

edited

zvirja commented Nov 23, 2017

moodmosaic commented Nov 30, 2017

zvirja commented Nov 30, 2017 •

edited

moodmosaic commented Dec 1, 2017

zvirja commented Dec 1, 2017

moodmosaic commented Dec 1, 2017

moodmosaic commented Dec 1, 2017

zvirja commented Dec 12, 2017

moodmosaic commented Dec 13, 2017

Rework random generator to use LCG #928

Are you sure you want to change the base?

Rework random generator to use LCG #928

Conversation

zvirja commented Nov 19, 2017

moodmosaic left a comment

Choose a reason for hiding this comment

zvirja commented Nov 21, 2017

moodmosaic commented Nov 23, 2017 • edited

zvirja commented Nov 23, 2017

moodmosaic commented Nov 30, 2017

zvirja commented Nov 30, 2017 • edited

moodmosaic commented Dec 1, 2017

zvirja commented Dec 1, 2017

moodmosaic commented Dec 1, 2017

moodmosaic commented Dec 1, 2017

zvirja commented Dec 12, 2017

moodmosaic commented Dec 13, 2017

moodmosaic commented Nov 23, 2017 •

edited

zvirja commented Nov 30, 2017 •

edited