Use existing value to "seed" another. #456

Open
MelGrubb opened this issue Jan 12, 2023 · 3 comments

Comments

MelGrubb commented Jan 12, 2023

Please describe why you are requesting a feature

Apologies if this exists already, but I'm looking for a way to use Bogus to anonymize data. I would like to anonymize production data for use in a QA environment, and it would be nice if the data came out the same way each time. In other words, I would like to use "Bob Smith" as input and have it come out as "Fred Jones" every time. The names are only an example, but stable anonymized data would help QA: when we refresh the data, the example person they were looking at last week would still have the same name.

tl;dr - I would like a way to pass a "seed" value to individual rules to ensure that the same random value is generated each time, based on an input value so that, for example, using "Bob" as the seed value always results in "Fred" being generated.

Please provide a code example of what you are trying to achieve

Something like this:

// Proposed syntax; the seed parameter does not exist in Bogus today.
var testUsers = new Faker<User>()
    .RuleFor(u => u.FirstName, (f, u) => f.Name.FirstName(u.Gender, seed: someStringValue));

Ideally, someStringValue would be derived automatically from the real-world input data, such as the existing record's FirstName property.

Please answer any or all of the questions below

  • Is the feature something that currently cannot be done?
    Not that I have found in the examples, but I could be simply missing something.

  • What alternatives have you considered?
    Seeding the generator with the .UseSeed method on each pass through the anonymizer, based on a hash of the record's Id. As pointed out in "Sequence Determinism When Adding New Property" (#104), though, any structural change, such as adding a new field, would throw off every value generated after it.

  • Is this feature request any issues or current problems?

  • Has the feature been requested in the past?
    Not that I could find in a cursory search of other requests. The closest I've found is Sequence Determinism When Adding New Property #104

If the feature request is approved, would you be willing to submit a PR?

No
I wish I had the time, but I don't. Maybe if I get to retire from the day job someday.

Crossbow78 commented Jan 13, 2023

Other use-cases could be:

  • Pick a random "StartDate" in the past, and an optional "EndDate" which must lie after the chosen StartDate
  • Pick a random "IsCancelled" value, and only populate the "CancellationReason" if the chosen 'IsCancelled' value was true

Generally speaking, how would we teach Bogus about (basic) dependencies between properties in our data models?

I could imagine a syntax like this:

.RuleFor(x => x.StartDate, f => f.Date.Past())
.RuleFor(x => x.EndDate, f => f.Date.Future(relativeToProperty: x => x.StartDate).OrNull(f))  // Suggested syntax, not working

or:

.RuleFor(x => x.IsCancelled, f => f.Random.Bool())
.RuleFor(x => x.CancellationReason, f => f.Random.Words().OrNullWhen(x => !x.IsCancelled, f))  // Suggested syntax, not working
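
For comparison, the two use-cases above can probably be approximated today with the existing two-argument RuleFor overload, (f, x) => ..., which hands the rule the object being built. This is only a sketch: the Booking class and DependentRules helper are made up for illustration, and it assumes rules run in the order they are declared, so StartDate and IsCancelled are already set when the later rules read them.

```csharp
using System;
using Bogus;

// Hypothetical model, invented for this example.
public class Booking
{
    public DateTime StartDate { get; set; }
    public DateTime? EndDate { get; set; }
    public bool IsCancelled { get; set; }
    public string? CancellationReason { get; set; }
}

public static class DependentRules
{
    // Builds a Faker<Booking> whose later rules read values set by earlier ones.
    public static Faker<Booking> Build(int seed) =>
        new Faker<Booking>()
            .UseSeed(seed)
            .RuleFor(x => x.StartDate, f => f.Date.Past())
            // Optional EndDate: roughly half the time null, otherwise a date
            // generated forward from the StartDate chosen above.
            .RuleFor(x => x.EndDate, (f, x) =>
                f.Random.Bool() ? f.Date.Future(1, x.StartDate) : (DateTime?)null)
            .RuleFor(x => x.IsCancelled, f => f.Random.Bool())
            // Only populate the reason when the booking was cancelled.
            .RuleFor(x => x.CancellationReason, (f, x) =>
                x.IsCancelled ? f.Random.Words() : null);
}
```

Calling DependentRules.Build(42).Generate() should yield a Booking whose EndDate, when present, is not earlier than its StartDate. It does not give the terser OrNullWhen-style syntax suggested above; it only shows that the dependency itself can be expressed.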

Pigna commented Jun 20, 2023

I am already using Bogus to do something similar to what you describe.

I added the following code and used the ID of the record from the database as the seed.

var faker = new Faker
{
    Random = new Randomizer(seed)
};

MelGrubb (author) commented

That was addressed in my second bullet point. Seeding the randomizer based on the Id, or a hash of the Id works great until you add new properties to the object. To ensure that each output property is stable, you'd have to re-seed the randomizer for each and every individual field, which would be very cumbersome. I'm specifically looking for per-field seeding based on an input value so that the output is random, but stable for each input value.
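
A minimal sketch of what that per-field re-seeding might look like with the current API, mostly to show how cumbersome it is. StableFake, SeedFrom, For, and Anonymizer are hypothetical names, not part of Bogus. The sketch assumes that assigning Faker.Random (as in the snippet above) re-seeds the datasets hanging off that Faker, and that a SHA-256 hash of the field name plus the input value is an acceptable way to derive a 32-bit seed.

```csharp
using System;
using System.Buffers.Binary;
using System.Security.Cryptography;
using System.Text;
using Bogus;

public static class StableFake
{
    // Derives a stable 32-bit seed from a field name and an input value.
    // string.GetHashCode() is avoided because it can differ between runs.
    public static int SeedFrom(string fieldName, string input)
    {
        using var sha = SHA256.Create();
        // "\u001F" (unit separator) keeps ("ab", "c") distinct from ("a", "bc").
        byte[] hash = sha.ComputeHash(Encoding.UTF8.GetBytes(fieldName + "\u001F" + input));
        return BinaryPrimitives.ReadInt32LittleEndian(hash);
    }

    // Returns a fresh Faker whose randomizer depends only on (fieldName, input).
    public static Faker For(string fieldName, string input) =>
        new Faker { Random = new Randomizer(SeedFrom(fieldName, input)) };
}

// Hypothetical model, invented for this example.
public class User
{
    public string FirstName { get; set; } = "";
    public string LastName { get; set; } = "";
}

public static class Anonymizer
{
    // Each field gets its own seed, so adding or reordering fields
    // does not shift the values generated for the others.
    public static User Anonymize(User real) => new User
    {
        FirstName = StableFake.For(nameof(User.FirstName), real.FirstName).Name.FirstName(),
        LastName = StableFake.For(nameof(User.LastName), real.LastName).Name.LastName(),
    };
}
```

With this, anonymizing a user named "Bob" should produce the same replacement first name on every run, provided the Bogus version and locale data stay the same. Different inputs can still map to the same output, and one Faker is allocated per field, which is the overhead a built-in per-rule seed would avoid.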
