
fairer enumerations? #14

Open
jwaldmann opened this issue Jun 13, 2019 · 4 comments

Comments

jwaldmann commented Jun 13, 2019

currently, leancheck enumerates lexicographically (within each tier):

tiers !! 2 :: [(Nat,Nat,Nat,Nat)]
[(0,0,0,2),(0,0,1,1),(0,0,2,0),(0,1,0,1),(0,1,1,0),(0,2,0,0),(1,0,0,1),(1,0,1,0),(1,1,0,0),(2,0,0,0)]

That means the frequency of changes in the first component is much lower than in the last
(first is moving slowly, last one is busy).
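To make this concrete, here's a small self-contained sketch (helper names are made up, and the tier is written out literally so it doesn't depend on leancheck) that counts, per component, how often consecutive tuples in the tier differ:

```haskell
import Data.List (transpose)

-- tier 2 of (Nat,Nat,Nat,Nat), copied from the output above
tier2 :: [(Int,Int,Int,Int)]
tier2 = [(0,0,0,2),(0,0,1,1),(0,0,2,0),(0,1,0,1),(0,1,1,0)
        ,(0,2,0,0),(1,0,0,1),(1,0,1,0),(1,1,0,0),(2,0,0,0)]

-- how often consecutive tuples differ, per component
changes :: [(Int,Int,Int,Int)] -> [Int]
changes ts = map countFlips (transpose [ [a,b,c,d] | (a,b,c,d) <- ts ])
  where countFlips xs = length (filter id (zipWith (/=) xs (tail xs)))

main :: IO ()
main = print (changes tier2)  -- [2,5,7,6]
```

The first component changes only 2 times across the tier while the later ones change 7 and 6 times.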

This lexicographic property is not part of the published API? But some tests depend on it.
My expectation (for the API) would be "enumerates each tier in some (unspecified) order",

and I think I prefer a style of enumeration where these changes are distributed more evenly.
I have a version (see recent commits in my fork https://github.com/jwaldmann/leancheck) that gives

tiers !! 2 :: [(Nat,Nat,Nat,Nat)]
[(0,0,0,2),(0,0,1,1),(0,0,2,0),(0,1,0,1),(0,1,1,0),(1,0,0,1),(1,0,1,0),(0,2,0,0),(1,1,0,0),(2,0,0,0)]

There's a nice paper waiting to be written here: what's the "most even" enumeration
(I'm sure there's previous work on codes), and can we obtain it via a nice program?

I think the concept only makes sense if the things to be enumerated have a fixed shape.
So it's not applicable to lists or trees, so overall effect (on my typical leancheck usage)
will probably be small.

@rudymatela (Owner)

@jwaldmann I'm sorry for the delay in replying.

> currently, leancheck enumerates lexicographically (within each tier):
>
>     tiers !! 2 :: [(Nat,Nat,Nat,Nat)]
>     [(0,0,0,2),(0,0,1,1),(0,0,2,0),(0,1,0,1),(0,1,1,0),(0,2,0,0),(1,0,0,1),(1,0,1,0),(1,1,0,0),(2,0,0,0)]
>
> That means the frequency of changes in the first component is much lower than in the last (first is moving slowly, last one is busy).

You are right. That is usually what happens to tuple values within a tier.

As you may have noticed, LeanCheck tries to be fair between tiers, but not within them. The idea is that values on the same tier have the same "priority" when tested, none is more important than others. One way to make it so that tests are always fair could be to use size, instead of the number of tests as a parameter to holds and check. And that's exactly what SmallCheck and Feat do -- on them you choose the number of tests by configuring the maximum depth/size.

On LeanCheck, I choose to use just the number of tests for two reasons:

  1. it just seems simpler from the user's perspective
  2. it allows for a more precise control of runtime if you have expensive tests

The drawback is that LeanCheck may end up being "unfair" to the last tier being tested, which is what you've just shown in your example.

One way to still keep it simple, while losing fine-grained control of runtime, is to define a variant of holds as follows:

-- | Does a property __hold__ for at least the given number of test values?
--
-- > holdsForAtLeast 1000 $ \xs -> length (sort xs) == length xs
--
-- The actual number of tests may be higher than the given number of test
-- values to account for fairness: if we start testing a tier of values, we do
-- so until the end.
holdsForAtLeast :: Testable a => Int -> a -> Bool
holdsForAtLeast n = and
                  . map snd
                  . concatMap snd
                  . takeWhile ((< n) . fst)  -- keep tiers that start before the n-th test
                  . accumulate 0
                  . resultiers
  where
  -- pair each tier with the number of tests that precede it
  accumulate c []       = []
  accumulate c (xs:xss) = (c,xs) : accumulate (c + length xs) xss

That way, it will be fair to the last tier being tested -- which is where the unfairness may arise.

> This lexicographic property is not part of the published API? But some tests depend on it. My expectation (for the API) would be "enumerates each tier in some (unspecified) order",

Explicitly it is not. Implicitly, somewhat yes. It will be lexicographic only when the underlying enumeration is also lexicographic. I find that property interesting.

> and I think I prefer a style of enumeration where these changes are distributed more evenly. I have a version (see recent commits in my fork https://github.com/jwaldmann/leancheck) that gives
>
>     tiers !! 2 :: [(Nat,Nat,Nat,Nat)]
>     [(0,0,0,2),(0,0,1,1),(0,0,2,0),(0,1,0,1),(0,1,1,0),(1,0,0,1),(1,0,1,0),(0,2,0,0),(1,1,0,0),(2,0,0,0)]

In this specific case (0,2,0,0) has been promoted to appear 2 positions earlier. Perhaps the difference is more significant on further tiers? I took a look at the code changes and I see that your version works like enumerating a pair of pairs in this case.
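If so, a minimal tier-product sketch seems to reproduce that order for this tier. The machinery below is assumed for illustration, not LeanCheck's actual internals: the product of two tiered enumerations combines tier i of one with tier (s-i) of the other.

```haskell
-- tiers of Nats: tier n contains just n
natTiers :: [[Int]]
natTiers = [ [n] | n <- [0..] ]

-- product of two tiered enumerations
prodTiers :: [[a]] -> [[b]] -> [[(a,b)]]
prodTiers xss yss =
  [ concat [ [ (x,y) | x <- xss !! i, y <- yss !! (s - i) ] | i <- [0..s] ]
  | s <- [0..] ]

pairTiers :: [[(Int,Int)]]
pairTiers = prodTiers natTiers natTiers

-- 4-tuples enumerated as a pair of pairs, then flattened
quadTiers :: [[(Int,Int,Int,Int)]]
quadTiers = [ [ (a,b,c,d) | ((a,b),(c,d)) <- t ]
            | t <- prodTiers pairTiers pairTiers ]

main :: IO ()
main = print (quadTiers !! 2)
```

Tier 2 of this pair-of-pairs enumeration comes out in the same order as the fork's output quoted above, with (0,2,0,0) after (1,0,1,0).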

> There's a nice paper waiting to be written here: what's the "most even" enumeration (I'm sure there's previous work on codes), and can we obtain it via a nice program?

This always bugged me a bit, but I never actually went down and tried to solve fairness within tiers. This reminds me of an earlier unreleased version of LeanCheck I had which did not have the concept of tiers: it was just lists. But enumerations turned out to be very unfair. I think this is quite a hard problem, aggravated by the fact that fairness may be a bit subjective.

> I think the concept only makes sense if the things to be enumerated have a fixed shape. So it's not applicable to lists or trees, so overall effect (on my typical leancheck usage) will probably be small.

I think you are right. As we start to have more complex structures, the effect that the way we combine tiers has on the enumerations becomes harder to predict intuitively.

@jwaldmann (Author)

On fairness between tiers and inside tiers: I think in practice most of the time is spent in the "last" tier
(since the lengths of tiers increase drastically), so the order of earlier tiers does not matter much. But of course we don't know which tier is "last" (i.e., has too many elements for using them all), and that's a reason to be fair everywhere (inside each tier).
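For illustration, assuming 4-tuples of Nats: the number of tuples with sum s is C(s+3,3) by stars and bars, so tier sizes grow cubically and the last tier reached dominates any test budget. (This counting formula is mine, not leancheck code.)

```haskell
-- number of 4-tuples of Nats with sum s: C(s+3,3)
tierSize :: Int -> Int
tierSize s = (s + 1) * (s + 2) * (s + 3) `div` 6

main :: IO ()
main = do
  print (map tierSize [0..5])                           -- [1,4,10,20,35,56]
  print (map (\s -> sum (map tierSize [0..s])) [0..5])  -- [1,5,15,35,70,126]
```

Already at s = 5, tier 5 alone contributes 56 of the first 126 values.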

Anyway, I came up with this "objective" specification: when enumerating values of one tier of a fixed shape (e.g., tuples), for each set of components (indices), the projection of the enumeration to that subset of components should approximate the limit distribution as fast as possible. (It's still inexact - need to explain
how to compare series of distributions).

E.g., 3-tuples of N, tier s contains tuples t with sum t == s. We can do

(0,0,8) (0,1,7) .. (0,8,0) (1,0,7) .. (1,7,0) ..

the projection to first component is

0         0       ..   0         1     ..     1     ..      

For any short prefix of the enumeration, the projection to the first component will have a very uneven distribution. There must be a theorem that says that one could do better.
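Concretely (hypothetical helper name; lexicographic order as in the example above):

```haskell
-- tier s of 3-tuples of Nats with sum s, enumerated lexicographically
tier3 :: Int -> [(Int,Int,Int)]
tier3 s = [ (a, b, s - a - b) | a <- [0..s], b <- [0 .. s - a] ]

-- the first components of a short prefix of tier 8:
-- nine 0s before the first 1, far from the limit distribution
main :: IO ()
main = print [ a | (a,_,_) <- take 10 (tier3 8) ]  -- [0,0,0,0,0,0,0,0,0,1]
```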

E.g., we can (hypothetically) enumerate the tier completely (or represent it in some other way, e.g., via BDDs) and then take a random permutation (equivalently, draw elements one after another, randomly).
The challenge is to come up with a deterministic algorithm that is "online" (i.e., it does not need complete enumeration beforehand). Paper waiting to be written ...

@jwaldmann (Author)

on fairness - see https://github.com/jwaldmann/fair-enum

@rudymatela (Owner)

> on fairness - see https://github.com/jwaldmann/fair-enum

@jwaldmann Interesting... I'll do a careful read soon and I'll be back to this thread with my thoughts.

@JonasDuregard already mentioned this in the other thread: FEAT may be more suitable for "shuffling" tiers of values, as the enumeration is more efficient and permits efficient sampling of arbitrary values. The following example extracts the 10^10000th (gazillionth?) value in the enumeration of lists of booleans in less than a second:

> import Test.Feat
> index (10^10000) :: [Bool]
[False,False,True,True,False,True,True,True,False,False,...]
(0.71 secs, 590625624 bytes)

... and since LeanCheck wasn't designed for sampling like that, it'll run out of memory before you can say gazillionth, haha!

Besides FEAT, it may also be interesting to take a look at:

  • Duregard's lazy-search;
  • GenCheck -- this one seems to be in a more experimental state, but its goal is tangentially related to what you are proposing, so it seemed worth a mention.
