Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Optimisation of subsequence #490

Open
Kleidukos opened this issue May 31, 2023 · 7 comments
Open

Optimisation of subsequence #490

Kleidukos opened this issue May 31, 2023 · 7 comments

Comments

@Kleidukos
Copy link

Kleidukos commented May 31, 2023

Hi @moodmosaic! I would like to upstream an optimisation to subsequence.

At present time, subsequence is implement as such:

filterM (const bool_) xs

Which we can interpret as "Traverse the whole list and at each element, perform a random boolean generation". This is quite expensive as random generation happens for every element, and is thus unguarded.

Here is an alternative:

We can generate a random Word64 that will act as a bitfield, and iterate on it with our list. Each time we encounter a 1, then we keep the element at this index in our list. If we encounter one or multiple 0, we drop them (which is quite efficient, as for instance five successive 0s are dropped once and not five times).

Then we advance either one step (if we found a 1) or as many 0s we had found, and keep going until there are no bits left in our list.

If the Word64 was not enough, we generate one when to retrieve another random list of 1s and 0s.
Considering the average size of random subsequences, this becomes very cheap because we get 1 random operation, perhaps 2, in most cases.

Here are exerpts of the benchmark:

All
  Subsequence
    Hedgehog subsequence, native, 10 items:       OK (6.13s)
      5.87 μs ±  21 ns
    Hedgehog subsequence64, optimised, 10 items:  OK (1.41s)
      1.32 μs ±  21 ns, 0.22x

    Hedgehog subsequence, native, 20 items:       OK (100.85s)
      12.1 μs ± 1.2 μs
    Hedgehog subsequence64, optimised, 20 items:  OK (58.55s)
      1.74 μs ±  17 ns, 0.14x

    Hedgehog subsequence, native, 30 items:       OK (2.34s)
      17.8 μs ± 299 ns
    Hedgehog subsequence64, optimised, 30 items:  OK (17.73s)
      2.11 μs ±  18 ns, 0.12x

	[…]

    Hedgehog subsequence, native, 90 items:       OK (3.59s)
      54.4 μs ± 471 ns
    Hedgehog subsequence64, optimised, 90 items:  OK (1.37s)
      5.11 μs ±  64 ns, 0.09x

    Hedgehog subsequence, native, 100 items:      OK (7.90s)
      60.6 μs ± 1.0 μs
    Hedgehog subsequence64, optimised, 100 items: OK (1.47s)
      5.61 μs ± 103 ns, 0.09x

You can run the benchmarks manually at: https://github.com/Kleidukos/subsequence-bench/. They also contain comparisons with QuickCheck, both baseline and improved.

@ChickenProp
Copy link
Contributor

ChickenProp commented Jun 1, 2023

(As a note - your implementation has Gen.word64 (Range.constant 0 10000) in a couple places. Should that be Range.constant minBound maxBound?)

@Kleidukos
Copy link
Author

Good catch, I've updated the repo and the description of the ticket

@moodmosaic
Copy link
Member

This looks impressive! Shall we keep both or replace the existing one? /cc @ChickenProp, @ocharles (and any other who wants to vote, feel free!)

@ocharles
Copy link
Contributor

ocharles commented Jun 5, 2023

It certainly makes sense to me!

@ChickenProp
Copy link
Contributor

ChickenProp commented Jun 5, 2023

Replace makes sense to me, I can't immediately think of a downside.

Oh, but I'd suggest double-checking the shrink behavior. Right now I suspect it can shrink to include things that weren't in the original sublist (e.g. if the original Gen.word64 gives us 8 = 0b1000, then shrinks to 7 = 0b0111, we're including things we previously weren't), but that it could easily be fixed with src -> prune $ Gen.word64 .... But I haven't checked on either count. I'm not sure if hedgehog currently has tests for how subsequence shrinks.

(Quick edit: oh, and shrink Shrink.list $ Gen.element [[], [x]] might be off too, I think if it generates [x] it might try [] as a shrink twice.)

@Kleidukos
Copy link
Author

@ChickenProp Thank you very much for these suggestions, I'll implement them over the week-end. If you have more ideas regarding how to test the behaviour of subsequence I am all ears.

@ChickenProp
Copy link
Contributor

So the test I think I'd write would probably be to take both a generated value and its shrink tree from Gen.subsequence someList (I think I saw a way to do that but don't remember how). Then check its shrink tree is equal to that of Gen.shrink Shrink.list $ pure generatedValue.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants