Skip to content

calyau/preiistmmo

Repository files navigation

preiistmmo

Pondering scalability via primes ...

Build Status Dependencies Status

Clojars Project Tag Clojure version

Contents

Usage

CLI

There are two entry points for the CLI:

  • primes - this just does raw computation of prime numbers using an algorithm of choice
  • table - this demos a UI (terminal) utilizing the prime-number generator

From the shell script

The following examples are for using the CLI to generate primes.

$ ./bin/preiistmmo primes --help
Options:

  -c, --count INTEGER   10        The number of primes to return
  -s, --start INTEGER   2         The starting integer (only valid for the 'divisors' algorithm)
  -a, --algorithm NAME  divisors  The prime number generating algorithm to use (one of ["apache" "sundaram" "eratosthenes" "divisors"])
  -h, --help                      This usage text

Default behaviour:

$ ./bin/preiistmmo primes
 (2 3 5 7 11 13 17 19 23 29)

Getting more primes:

$ ./bin/preiistmmo primes --count 20
 (2 3 5 7 11 13 17 19 23 29 31 37 41 43 47 53 59 61 67 71)

If you're using the default algorithm, you may also set the starting integer:

$ ./bin/preiistmmo primes --start 100 --count 20
 (101 103 107 109 113 127 131 137 139 149 151 157 163 167 173 179 181 191 193 197)

The following examples are for using the CLI to generate prime number arithmatic tables (by default, these are multiplication tables; however addition, subtraction, division, and modulus are also supported).

$ ./bin/preiistmmo table --help
Options:

  -c, --columns INTEGER   10        The number of primes to to use for columns
  -r, --rows INTEGER      10        The number of primes to to use for rows
  -o, --operation SYMBOL  *         The arithmatic operation to perform (one of ["*" "%" "/" "-" "+"])
  -s, --start INTEGER     2         The starting integer (only valid for the 'divisors' algorithm)
  -a, --algorithm NAME    divisors  The prime number generating algorithm to use (one of ["apache" "sundaram" "eratosthenes" "divisors"])
  -h, --help                        This usage text

Default behaviour (a 10x10 grid):

$ ./bin/preiistmmo table
|  * |  2 |  3 |   5 |   7 |  11 |  13 |  17 |  19 |  23 |  29 |
|----+----+----+-----+-----+-----+-----+-----+-----+-----+-----|
|  2 |  4 |  6 |  10 |  14 |  22 |  26 |  34 |  38 |  46 |  58 |
|  3 |  6 |  9 |  15 |  21 |  33 |  39 |  51 |  57 |  69 |  87 |
|  5 | 10 | 15 |  25 |  35 |  55 |  65 |  85 |  95 | 115 | 145 |
|  7 | 14 | 21 |  35 |  49 |  77 |  91 | 119 | 133 | 161 | 203 |
| 11 | 22 | 33 |  55 |  77 | 121 | 143 | 187 | 209 | 253 | 319 |
| 13 | 26 | 39 |  65 |  91 | 143 | 169 | 221 | 247 | 299 | 377 |
| 17 | 34 | 51 |  85 | 119 | 187 | 221 | 289 | 323 | 391 | 493 |
| 19 | 38 | 57 |  95 | 133 | 209 | 247 | 323 | 361 | 437 | 551 |
| 23 | 46 | 69 | 115 | 161 | 253 | 299 | 391 | 437 | 529 | 667 |
| 29 | 58 | 87 | 145 | 203 | 319 | 377 | 493 | 551 | 667 | 841 |
$ ./bin/preiistmmo table --columns 4 --rows 4
| * |  2 |  3 |  5 |  7 |
|---+----+----+----+----|
| 2 |  4 |  6 | 10 | 14 |
| 3 |  6 |  9 | 15 | 21 |
| 5 | 10 | 15 | 25 | 35 |
| 7 | 14 | 21 | 35 | 49 |
$ ./bin/preiistmmo table --rows 4
| * |  2 |  3 |  5 |  7 | 11 | 13 |  17 |  19 |  23 |  29 |
|---+----+----+----+----+----+----+-----+-----+-----+-----|
| 2 |  4 |  6 | 10 | 14 | 22 | 26 |  34 |  38 |  46 |  58 |
| 3 |  6 |  9 | 15 | 21 | 33 | 39 |  51 |  57 |  69 |  87 |
| 5 | 10 | 15 | 25 | 35 | 55 | 65 |  85 |  95 | 115 | 145 |
| 7 | 14 | 21 | 35 | 49 | 77 | 91 | 119 | 133 | 161 | 203 |
$ ./bin/preiistmmo table --columns 18 --rows 36 --operation %
|   % | 2 | 3 | 5 | 7 | 11 | 13 | 17 | 19 | 23 | 29 | 31 | 37 | 41 | 43 | 47 | 53 | 59 | 61 |
|-----+---+---+---+---+----+----+----+----+----+----+----+----+----+----+----+----+----+----|
|   2 | 0 | 2 | 2 | 2 |  2 |  2 |  2 |  2 |  2 |  2 |  2 |  2 |  2 |  2 |  2 |  2 |  2 |  2 |
|   3 | 1 | 0 | 3 | 3 |  3 |  3 |  3 |  3 |  3 |  3 |  3 |  3 |  3 |  3 |  3 |  3 |  3 |  3 |
|   5 | 1 | 2 | 0 | 5 |  5 |  5 |  5 |  5 |  5 |  5 |  5 |  5 |  5 |  5 |  5 |  5 |  5 |  5 |
|   7 | 1 | 1 | 2 | 0 |  7 |  7 |  7 |  7 |  7 |  7 |  7 |  7 |  7 |  7 |  7 |  7 |  7 |  7 |
|  11 | 1 | 2 | 1 | 4 |  0 | 11 | 11 | 11 | 11 | 11 | 11 | 11 | 11 | 11 | 11 | 11 | 11 | 11 |
|  13 | 1 | 1 | 3 | 6 |  2 |  0 | 13 | 13 | 13 | 13 | 13 | 13 | 13 | 13 | 13 | 13 | 13 | 13 |
|  17 | 1 | 2 | 2 | 3 |  6 |  4 |  0 | 17 | 17 | 17 | 17 | 17 | 17 | 17 | 17 | 17 | 17 | 17 |
|  19 | 1 | 1 | 4 | 5 |  8 |  6 |  2 |  0 | 19 | 19 | 19 | 19 | 19 | 19 | 19 | 19 | 19 | 19 |
|  23 | 1 | 2 | 3 | 2 |  1 | 10 |  6 |  4 |  0 | 23 | 23 | 23 | 23 | 23 | 23 | 23 | 23 | 23 |
|  29 | 1 | 2 | 4 | 1 |  7 |  3 | 12 | 10 |  6 |  0 | 29 | 29 | 29 | 29 | 29 | 29 | 29 | 29 |
|  31 | 1 | 1 | 1 | 3 |  9 |  5 | 14 | 12 |  8 |  2 |  0 | 31 | 31 | 31 | 31 | 31 | 31 | 31 |
|  37 | 1 | 1 | 2 | 2 |  4 | 11 |  3 | 18 | 14 |  8 |  6 |  0 | 37 | 37 | 37 | 37 | 37 | 37 |
|  41 | 1 | 2 | 1 | 6 |  8 |  2 |  7 |  3 | 18 | 12 | 10 |  4 |  0 | 41 | 41 | 41 | 41 | 41 |
|  43 | 1 | 1 | 3 | 1 | 10 |  4 |  9 |  5 | 20 | 14 | 12 |  6 |  2 |  0 | 43 | 43 | 43 | 43 |
|  47 | 1 | 2 | 2 | 5 |  3 |  8 | 13 |  9 |  1 | 18 | 16 | 10 |  6 |  4 |  0 | 47 | 47 | 47 |
|  53 | 1 | 2 | 3 | 4 |  9 |  1 |  2 | 15 |  7 | 24 | 22 | 16 | 12 | 10 |  6 |  0 | 53 | 53 |
|  59 | 1 | 2 | 4 | 3 |  4 |  7 |  8 |  2 | 13 |  1 | 28 | 22 | 18 | 16 | 12 |  6 |  0 | 59 |
|  61 | 1 | 1 | 1 | 5 |  6 |  9 | 10 |  4 | 15 |  3 | 30 | 24 | 20 | 18 | 14 |  8 |  2 |  0 |
|  67 | 1 | 1 | 2 | 4 |  1 |  2 | 16 | 10 | 21 |  9 |  5 | 30 | 26 | 24 | 20 | 14 |  8 |  6 |
|  71 | 1 | 2 | 1 | 1 |  5 |  6 |  3 | 14 |  2 | 13 |  9 | 34 | 30 | 28 | 24 | 18 | 12 | 10 |
|  73 | 1 | 1 | 3 | 3 |  7 |  8 |  5 | 16 |  4 | 15 | 11 | 36 | 32 | 30 | 26 | 20 | 14 | 12 |
|  79 | 1 | 1 | 4 | 2 |  2 |  1 | 11 |  3 | 10 | 21 | 17 |  5 | 38 | 36 | 32 | 26 | 20 | 18 |
|  83 | 1 | 2 | 3 | 6 |  6 |  5 | 15 |  7 | 14 | 25 | 21 |  9 |  1 | 40 | 36 | 30 | 24 | 22 |
|  89 | 1 | 2 | 4 | 5 |  1 | 11 |  4 | 13 | 20 |  2 | 27 | 15 |  7 |  3 | 42 | 36 | 30 | 28 |
|  97 | 1 | 1 | 2 | 6 |  9 |  6 | 12 |  2 |  5 | 10 |  4 | 23 | 15 | 11 |  3 | 44 | 38 | 36 |
| 101 | 1 | 2 | 1 | 3 |  2 | 10 | 16 |  6 |  9 | 14 |  8 | 27 | 19 | 15 |  7 | 48 | 42 | 40 |
| 103 | 1 | 1 | 3 | 5 |  4 | 12 |  1 |  8 | 11 | 16 | 10 | 29 | 21 | 17 |  9 | 50 | 44 | 42 |
| 107 | 1 | 2 | 2 | 2 |  8 |  3 |  5 | 12 | 15 | 20 | 14 | 33 | 25 | 21 | 13 |  1 | 48 | 46 |
| 109 | 1 | 1 | 4 | 4 | 10 |  5 |  7 | 14 | 17 | 22 | 16 | 35 | 27 | 23 | 15 |  3 | 50 | 48 |
| 113 | 1 | 2 | 3 | 1 |  3 |  9 | 11 | 18 | 21 | 26 | 20 |  2 | 31 | 27 | 19 |  7 | 54 | 52 |
| 127 | 1 | 1 | 2 | 1 |  6 | 10 |  8 | 13 | 12 | 11 |  3 | 16 |  4 | 41 | 33 | 21 |  9 |  5 |
| 131 | 1 | 2 | 1 | 5 | 10 |  1 | 12 | 17 | 16 | 15 |  7 | 20 |  8 |  2 | 37 | 25 | 13 |  9 |
| 137 | 1 | 2 | 2 | 4 |  5 |  7 |  1 |  4 | 22 | 21 | 13 | 26 | 14 |  8 | 43 | 31 | 19 | 15 |
| 139 | 1 | 1 | 4 | 6 |  7 |  9 |  3 |  6 |  1 | 23 | 15 | 28 | 16 | 10 | 45 | 33 | 21 | 17 |
| 149 | 1 | 2 | 4 | 2 |  6 |  6 | 13 | 16 | 11 |  4 | 25 |  1 | 26 | 20 |  8 | 43 | 31 | 27 |
| 151 | 1 | 1 | 1 | 4 |  8 |  8 | 15 | 18 | 13 |  6 | 27 |  3 | 28 | 22 | 10 | 45 | 33 | 29 |

From the .jar file:

This is the faster way to run the CLI; options and usage are identical to the above, but instead of using lein primes run, you will use the following:

$ lein uberjar
$ java -jar target/preiistmmo-0.2.0-SNAPSHOT-standalone.jar -m preiistmmo.cli.core \
       primes

For example, to view the help:

$ java -jar target/preiistmmo-0.2.0-SNAPSHOT-standalone.jar -m preiistmmo.cli.core \
       primes --help

or, for table:

$ java -jar target/preiistmmo-0.2.0-SNAPSHOT-standalone.jar -m preiistmmo.cli.core \
       table --help

or, to run the benchmarks:

$ java -jar target/preiistmmo-0.2.0-SNAPSHOT-standalone.jar -m preiistmmo.cli.core \
       benchmarks

From lein

If, for whatever reason, you prefer running these from Lein, you can do that too :-) There are aliases setup in the project.clj file that utilize lein trampoline so that the JVM for lein is replaced by the one for preiistmmo, thus allowing preiistmmo to parse options and execute accordingly.

The aliases support two ways of calling, depending upon your CLI specificity needs:

$ lein primes run
$ lein primes run -h
$ lein primes run --start 100 --count 20

or

$ lein preiistmmo run primes
$ lein preiistmmo run primes -h
$ lein preiistmmo run primes --start 100 --count 20

The same applies to table and benchmarks CLIs.

API

Generate the first-n primes using a primality trial by divisors implementation:

(require '[preiistmmo.core :as preiistmmo])
(def algo (preiistmmo/select-algo :divisors))
(preiistmmo/n-primes algo 29)
(2 3 5 7 11 13 17 19 23 29 31 37 41 43 47 53 59 61 67 71 73 79 83 89 97 101 103 107 109)

Note that of the three, the :divisors algorithm has the ability to deliver sequences of prime numbers at any starting point. This makes it very convenient for spliting work across different compute resources.

Changing the alorithm, the same can be done for the other implementations.

Sieve of Eratosthenes:

(def algo (preiistmmo/select-algo :eratosthenes))
(preiistmmo/n-primes algo 29)
(2 3 5 7 11 13 17 19 23 29 31 37 41 43 47 53 59 61 67 71 73 79 83 89 97 101 103 107 109)

Sieve of Sundaram:

(def algo (preiistmmo/select-algo :sundaram))
(preiistmmo/n-primes algo 29)
(2 3 5 7 11 13 17 19 23 29 31 37 41 43 47 53 59 61 67 71 73 79 83 89 97 101 103 107 109)

Using isPrime static method from Apache Commons Math:

(def algo (preiistmmo/select-algo :apache))
(preiistmmo/n-primes algo 29)
(2 3 5 7 11 13 17 19 23 29 31 37 41 43 47 53 59 61 67 71 73 79 83 89 97 101 103 107 109)

Support has been added for grids (e.g., in termainal-based demos):

(require '[clojure.pprint :as pprint])
(pprint (prime-grid algo 5 7))
{:header ("*" 2 3 5 7 11),
 :body
 ({"*" 2, 2 4, 3 6, 5 10, 7 14, 11 22}
  {"*" 3, 2 6, 3 9, 5 15, 7 21, 11 33}
  {"*" 5, 2 10, 3 15, 5 25, 7 35, 11 55}
  {"*" 7, 2 14, 3 21, 5 35, 7 49, 11 77}
  {"*" 11, 2 22, 3 33, 5 55, 7 77, 11 121})}
(preiistmmo/print-prime-grid algo 5 7)
|  * |  2 |  3 |  5 |  7 |  11 |
|----+----+----+----+----+-----|
|  2 |  4 |  6 | 10 | 14 |  22 |
|  3 |  6 |  9 | 15 | 21 |  33 |
|  5 | 10 | 15 | 25 | 35 |  55 |
|  7 | 14 | 21 | 35 | 49 |  77 |
| 11 | 22 | 33 | 55 | 77 | 121 |
:ok

Thoughts on Scaling

Algorithm implementation:

  • This is the domain of "old-school scaling": code-level optimization of algorithms for performance.
  • The :sundaram implementation is in sad need of this! The use of set and intersection is a performance-killer: it would be better to do something like what the copied "Sieve of Eratosthenes" implementation does, using a factors caching table, or some such.
  • There are other implementations that could be used that would be faster, etc. DJB has a great web page that summarizies state of the art in 2004 in calcualtion of primes, complexity, etc,

Single machine:

  • Moving out of the code and into hardware for performance, one could take better advantage of multiple cores.
  • Possible refactoring of code applies here, in order to take advantage of parallelizations, e.g., using pmap or pvalues.
  • The :divisors implementation is nicely primed for this (ugh, sorry about the pun), as you can start the prime generation from any point. This allows for the possibility of some nice batching.
  • Using infrastructure that can treat vast computational/memory/storage resources as a single machine is also an option (e.g., Mesos). It may be possible to scale well in such scenarious by simply containerizing a fast algorithm and running it in an executor that has access to copious distributed resources.

Multiple machines:

  • This is where things get really interesting ;-)
  • Using various messaging and/or streaming mechanisms, prime number generation could be batched and then assembled (a la Hadoop/MapReduce). Storm, Flink, Kafka, Onyx, etc., offer interesting possibilities here. This can be done quite easily with the :divisors implementation, since it is trivial to partition computation for that algorithm by starting integer.
  • While the above works quite well in HPC scenarios designed for embarassingly parallelizable functions, an interesting option becomes available when we think about factorization cache tables and distributed hash tables: this opens up a whole new area for optimizations where compute nodes could perform lookups (and updates) on a distributed hash table (lots of implementations to choose from). Depending upon operational setup, database clusters (Cassandra, Redis, Datomic, etc.) could also be used to similar effect.
  • For the copied "Sieve of Eratosthenes" implementation, a local in-JVM cache table is already being used, and a move to a DHT should be a fairly straight-forward (potentially minor) refactoring (the effort and complexity, of course, then moves from code to operations).

Benchmarks

Of a sort, anyway:

$ lein benchmarks run
Running n-primes for :divisors implementation ...
10 primes: "Elapsed time: 1.41252 msecs"
100 primes: "Elapsed time: 1.922843 msecs"
1000 primes: "Elapsed time: 8.307988 msecs"
10000 primes: "Elapsed time: 233.71016 msecs"
100000 primes: "Elapsed time: 6136.338203 msecs"

Running n-primes for :apache implementation ...
10 primes: "Elapsed time: 0.026569 msecs"
100 primes: "Elapsed time: 0.277534 msecs"
1000 primes: "Elapsed time: 6.589782 msecs"
10000 primes: "Elapsed time: 190.079645 msecs"
100000 primes: "Elapsed time: 6467.238386 msecs"

Running n-primes for :eratosthenes implementation ...
10 primes: "Elapsed time: 1.661216 msecs"
100 primes: "Elapsed time: 1.822298 msecs"
1000 primes: "Elapsed time: 14.686621 msecs"
10000 primes: "Elapsed time: 240.921639 msecs"
100000 primes: "Elapsed time: 4363.927017 msecs"

Running n-primes for :sundaram implementation ...
10 primes: "Elapsed time: 4.086518 msecs"
100 primes: "Elapsed time: 3.433624 msecs"
1000 primes: "Elapsed time: 25.422443 msecs"
10000 primes: "Elapsed time: 1165.294828 msecs"
100000 primes:"Elapsed time: 166559.860863 msecs"

License

Copyright © 2018, Duncan McGreggor

Apache License, Version 2.0