Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fixed weights of probabilities for various xeger options #58

Open
wants to merge 6 commits into
base: master
Choose a base branch
from

Conversation

Licho1
Copy link

@Licho1 Licho1 commented Mar 8, 2021

Current solution gives every continuous set of characters same weight. For example if you have
"[abcdefghijklmnopqrstuvxyzč]" 50% chars were from a-z and 50% were the č

Licho1 and others added 5 commits March 8, 2021 16:31
updated xeger (definition of that charset)
updated regexp - use that set of chars instead of "printable ascii" used before + create function for print usedcharset for any regexp
define charset for  .*, \w, \W, \d, \D, \s, \S
Copy link
Owner

@moodmosaic moodmosaic left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for the PR 👍 I left a comment.


namespace Fare
{
public static class RandomGenerator
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What's the advantage of this PRNG compared to others? E.g. SplitMix, PCG, etc.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh no idea,didn't give it much thought and this was just the natural implementation :-)

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What do you mean by natural?

Copy link
Owner

@moodmosaic moodmosaic left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Left some comments. (I hope they won't scare you from doing further work on this PR.)


// Assert
Assert.All(result, regex => Assert.Matches(pattern, regex));
Assert.All(result, regex => Assert.Matches(pattern, regex));
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
Assert.All(result, regex => Assert.Matches(pattern, regex));
Assert.All(result, regex => Assert.Matches(pattern, regex));

{
public static class RandomGenerator
{
private static Random RandomInstance = new Random();
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not pass a seed here?


namespace Fare
{
public static class RandomGenerator
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What do you mean by natural?

var index = RandomInstance.Next(sum);
return items[borders.IndexOf(x => x > index)];
}

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change

@@ -87,7 +127,7 @@ public void GeneratedTextIsCorrectWithRexEngine(string pattern)
"x[0-9A-Z]",
"[^A-M]in",
".gr",
@"\(.*l",
//@"\(.*l", - on "classic" went sometimes overflow (even before my changes)
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you put it back? (Uncomment) We can always tackle the erratic tests in a separate PR.

@@ -121,9 +161,72 @@ public void GeneratedTextIsCorrectWithRexEngine(string pattern)
@"\Sabc\S{3}111",
@"^\S\S (\S)+$",
@"\\abc\\d",
@"\w+1\w{4}",
//@"\w+1\w{4}",
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you put it back? (Uncomment) We can always tackle the erratic tests in a separate PR.

{"x[0-9A-Z]","x0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ"},
{"[^A-M]in",null},
{".gr",null},
//@"\(.*l", - on "classic" went sometimes overflow (even before my changes)
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Based on this commented-out case, it looks like this data theory list is a duplicate of the existing one(?) If so, why?

@moodmosaic
Copy link
Owner

@Licho1, I can see that you're updating the PR, which is great 👍 What is the current state of the PR?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants