Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Sum up a function over primes #97

Open
Bodigrim opened this issue Mar 13, 2018 · 10 comments
Open

Sum up a function over primes #97

Bodigrim opened this issue Mar 13, 2018 · 10 comments

Comments

@Bodigrim
Copy link
Owner

Math.NumberTheory.Primes.Counting.Impl.primeCount implements an algorithm to compute π(n), which is number of primes below n, in O(n^0.7) time and O(n^1/2) space. One can view π(n) as a sum of f p = 1 over prime p below n.

As it was pointed by @CarlEdman in #60 (primeLucy), this approach allows a generalization for any function f with the same time and space requirements. Moreover, there is a (difficult) way to improve its time complexity to O(n^2/3).

That said, it seems desirable to productionalize primeLucy, provide a nice API, write decent tests and benchmarks, and then express primeCount as a special case.

Additional resources:
https://en.wikipedia.org/wiki/Meissel–Lehmer_algorithm
http://sweet.ua.pt/tos/bib/5.4.pdf
https://projecteuler.net/thread=10;page=5#111677

@jhenahan
Copy link

I've been messing around with this on a branch, and there are some really nice implementations that fall out of this approach.

  • primeCount falls out immediately as primeLucy (const 1)
  • sumOfPrimesUpTo is simply primeLucy id

It's still quite messy, but it's extremely fast once it has warmed up.

primeCount 100000 is slowish on my machine, every time I run it. primeCount' 100000 on my machine is slow the first time, but the result is cached and instantaneous for every n <= 100000 thereafter. And that shared allocation of primes carries over to every other function that uses primeLucy under the hood, so I can instantly get the sum of primes up to 100000.

I'll probably meditate on this and see if I can reimplement in terms of Vector (since that's another open task). I'll likely also rework the API so there's an option to provide your own summatory function for the case where summatory f isn't what you actually want, as in the original formulation.

@CarlEdman
Copy link
Contributor

CarlEdman commented Mar 18, 2018

@jhenahan
If we are talking about the primeLucy I originally submitted, I don't entirely understand.

  1. My primeLucy not only allows you to supply, but requires, a summatory function. While calculating the summatory function from the principal function is possible, that would take O(n) and hence defeat the purpose.

  2. As for switching from the Array interface to the Vector interface, far be it from me to defend the former's elegance. That said, the code I submitted was highly optimized using explicit offsets and the unsafe functions. That ought to compile down to close to optimal x86-64 assembly, so I doubt that using the nicer Vector interface will buy you much. But if you want to do it for elegance's sake and achieve the same (or better!) performance, please do!

@jhenahan
Copy link

@CarlEdman Thanks very much for the fairly critical point about the summatory function. :) That would absolutely defeat the purpose. I'll have to think on it more, in that case.

@CarlEdman
Copy link
Contributor

@jhenahan My pleasure. Let me know if I can help.

@Bodigrim
Copy link
Owner Author

With regards to array vs. vector: there are unboxed mutable vectors with a low-level interface, providing decent performance: unsafeRead, unsafeWrite, etc. I feel like nowadays Haskell developers (or at least myself :) are more acquainted with Vector and it is a default choice, but in course of this issue we should try to benchmark both data structures.

@CarlEdman
Copy link
Contributor

@Bodigrim I think that unboxing could be a big win. For starters, it ought to reduce GC time to just about 0. The reason I didn't use unboxed arrays was that for my problem I needed more than 64-bit int precision, so I had to go with Integers which can't really be unboxed.

Given how large some of the summatory function results can be, I think keeping Integers available for primeLucy would be useful. But an alternative version that uses unboxed ints (perhaps with modular arithmetic support at every stage) could be a big win for certain users.

@jhenahan
Copy link

jhenahan commented Mar 19, 2018

Would you have any suggested literature on “useful” summatory functions? While I’m thinking on it, it couldn’t hurt to build in more special cases like primeCount.

@CarlEdman
Copy link
Contributor

CarlEdman commented Mar 19, 2018

@jhenahan

I don't really have a literature, but a few pointers:

  1. The Faulhaber polynomials (for which I submitted code at Issue #70) give the summatory functions for all series of the form $x^n$. Hence, by trivial extension, they also give the summatory functions for any polynomial.

  2. I have not tried this, but I believe primeLucy can be used to generate summatory functions which can be fed back to itself. This should allow you to efficiently calculate, e.g., the sum of all composite numbers up to n which are the product of exactly two primes.

@Bodigrim
Copy link
Owner Author

Bodigrim commented Apr 8, 2018

@jhenahan any news? Feel free to ping me, if you need to discuss anything.

@jhenahan
Copy link

jhenahan commented Apr 8, 2018

Nothing yet, but I’ll be setting aside some time to look into it further next week. Work’s been eating my OSS time.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

3 participants