Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix up frequency #338

Open
wants to merge 3 commits into
base: master
Choose a base branch
from
Open

Fix up frequency #338

wants to merge 3 commits into from

Conversation

treeowl
Copy link
Contributor

@treeowl treeowl commented Nov 11, 2019

  • Make frequency detect negative frequencies, frequency
    total overflow, and zero total frequency.

  • Rewrite frequency to build an IntMap to represent the
    frequency list, which greatly improves efficiency for long
    lists.

Fixes #337.

@treeowl
Copy link
Contributor Author

treeowl commented Nov 11, 2019

I could use another pair of eyes to make sure this produces precisely the right distribution. I believe it does, but I don't trust myself around fence-posts.

* Make `frequency` detect negative frequencies, frequency
  total overflow, and zero total frequency.

* Rewrite `frequency` to build an `IntMap` to represent the
  frequency list, which greatly improves efficiency for long
  lists.

* Bump `containers` lower bound to 0.5.11.

Fixes hedgehogqa#337.
@treeowl
Copy link
Contributor Author

treeowl commented Nov 11, 2019

Question: if no frequencies are positive, should we error out (as presently) or should we just discard in that case?

@treeowl
Copy link
Contributor Author

treeowl commented Nov 11, 2019

I added a commit that discards when all the frequencies are zero. That feels more correct to me, but feel free to go with whichever you prefer.

| nk < 0 = error "Hedgehog.Gen.frequency: Frequency sum above maxBound :: Int"
| k > 0 = Just ((nk, x), (nk, xs))
| otherwise = go (n, xs)
where !nk = n + fromIntegral k
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The fromIntegral isn't needed. Let's clean that up in merging.

@treeowl
Copy link
Contributor Author

treeowl commented Nov 12, 2019

Ah, I just realized we can do a tiny bit better! We can make the IntMap one entry smaller than the frequency table, holding one element in reserve to handle failed lookup. That removes the "something went wrong" error that's been bothering me.

* Make the `IntMap` one smaller and hold the last element
  as a fall-through.

* Use a lazy `IntMap` instead of a strict one; there's no
  sense in forcing such polymorphic things.
@treeowl
Copy link
Contributor Author

treeowl commented Nov 12, 2019

While the IntMap is far better for very long lists, it's definitely not so great for the more typical short lists. What would be good for short lists? Sort the list by frequency in descending order, store the accumulated frequencies in an unboxed vector, store the values in a boxed vector, and do a linear scan.

In some cases, a hybrid mechanism may be best. If a few entries dominate, then it would make sense to store them in vectors, checked first, and the rest in an IntMap, as a fall-back. Sounds fussy.

@treeowl
Copy link
Contributor Author

treeowl commented Dec 24, 2019

Ping? I haven't seen any comments on this yet.

@HuwCampbell
Copy link
Member

HuwCampbell commented Dec 29, 2019

Personally I'm not convinced about this one.

Almost always the input to frequency is hand written with a small number of choices and the time will be quite acceptable. It'll probably actually be faster as is for idiomatic use.

@jacobstanley jacobstanley force-pushed the master branch 2 times, most recently from 4139585 to c228279 Compare May 22, 2022 14:42
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

frequency is inefficient and wrong
2 participants