The seventh euler problem requires us to compute the 10001st prime number. To make this problem a bit harder, I decided that the function which will generate prime numbers must be lazy, or at least appear so to the consumer.

My first try involved using the well-known Sieve of Eratosthenes in an unusually clumsy fashion: with a lazy wrapper around it to manage caching of values. It not only kept a list of primes (which takes storage space), but because I hadn’t yet thought much about how many prime numbers to generate, it conservatively generates up to way more primes than it really needs.

Ugly and inefficient code may be good enough for one-off hacks, but part of the reason I am working through Project Euler is to improve my own ability to program, so I will iterate my solution to this a few times until I get a blend of efficiency and cleanliness that I can be satisfied with.

As the programmer’s adage goes, “Make it work, make it right, make it fast.” Of those three steps, here was my step one:

(defn zeroify-multiples [n offset v]
  "Sets every nth element in sequence v to zero, starting at
  the index indicated by offset."
  (loop [i   (if (nil? offset) 0 offset)
         tr  (transient (vec v))
         len (count v)]
    (if (< i len)
      (recur (+ i n) (assoc! tr i 0) len)
      (persistent! tr))))

(defn calc-primes [primes]
  "Returns a list of new primes found between p and 2p, where
  p is the last prime in the (assumed sorted <) list of known primes."
  (let [highest    (last primes)
        primecount (count primes)]
    (loop [i  0
           ns (vec (range highest (* highest 2)))]
      (if (< i primecount)
        (let [prime (nth primes i)]
          (recur (inc i)
                 (zeroify-multiples prime (- prime (mod highest prime)) ns)))
        (rest (remove zero? ns))))))

(defn prime-cacher [[prime new old]]
  "Caches prime numbers that were computed already using the
  non-lazy process."
  (let [nf (first new)
        nr (rest new)
        o  (cons nf old)
        nw (if (< 0 (count nr))
             nr
             (calc-primes (reverse o)))]
    [nf nw o]))

(def lazy-primes-1 (map first (iterate prime-cacher [2 [3 5 7] [2]])))

(nth lazy-primes-1 10000) ;; nth uses 0, not 1 for first element index

Not too pretty, is it? After I finished writing that, I realized I had missed another obvious improvement: I don’t need to keep the full list of all numbers in the sieve of Eratosthenes, I just needed to keep a list of the highest multiple of each prime factor in the search so far. This is a much smaller list of numbers to maintain.

But before I rewrote anything, I decided to stop and think about what I had learned from the first implementation. Well, two properties of primes had become obvious to me:

  1. All primes are odd except 2.
  2. If we haven’t found any prime factors less than or equal to sqrt(n), then the number n must be prime.

I wondered if perhaps abandoning the sieve algorithm all together and just trying to write a tight loop that exploited that second property would be fast enough for my purposes. This was my second attempt, which uses a completely different (naive) algorithm for computing primes.

(defn lazy-primes-2 []
  "A vector-based lazy primes generator."
  (letfn [(next-prime [primes n]
            (let [sqrt-n (int (Math/sqrt n))]
              (if (some #(= 0 (rem n %)) (take-while #(<= % sqrt-n) primes))
                (recur primes (+ n 2))
                (cons n (lazy-seq (next-prime (conj primes n) (+ n 2)))))))]
    (concat [2] (lazy-seq (next-prime [] 3)))))

(nth (lazy-primes-2) 10000) 

That’s much cleaner and easier to understand, and I can see that performance is much better even without any benchmarking yet, but one thing still bugs me. Once a prime number (say, 19) is in the vector of smaller primes, all odd numbers greater than the square of this prime (19^2) will have to be checked against it. That is, say we are checking 403, 405, 407, etc for primality. We will still divide each of these by 19, even though it is obvious that we should only be checking every 19th number. That’s why the Sieve of Eratosthenes could still have better performance here; it wouldn’t perform such obviously unneeded tests.

Then I had an insight into how to implement the Sieve of Eratosthenes in a lazy manner: I could queue up the prime factors under each composite number in a hash map! My third attempt I was thus:

(defn lazy-primes-3 []
  (letfn [(append-under-key [h key val]
             (assoc h key (conj (h key) val)))
          (run-iterators [h n]
             (dissoc (reduce #(append-under-key %1 (+ n %2) %2) h (h n)) n))
          (next-prime [h n]
             (if (h n) ;; If h has any items, it is not prime
               (recur (run-iterators h n) (+ n 2))
               (cons n (lazy-seq
                        (next-prime (append-under-key h (* n n) (+ n n)) (+ n 2))))))]
    (cons 2 (lazy-seq (next-prime {} 3)))))

(time (nth (lazy-primes-3) 10000))

Isn’t that better?

Now it’s time to see how well my 2nd and 3rd attempts fare against other people’s code. Nothing at clojure-euler really caught my eye for this particular problem, but the one in the ‘optimal toolkit‘ by puzzler was rather long and sophisticated; I surmised that its length must result from some optimization. We’ll hold off on that for a moment, but just look at the length of it:

;; From the optimal toolkit at:
;; http://clojure-euler.wikispaces.com/The+Optimal+Toolkit

(defn- wheel2357 [] (cycle [2 4 2 4 6 2 6 4 2 4 6 6 2 6 4 2 6 4 6 8 4 2 4 2 4 8
 6 4 6 2 4 6 2 6 6 4 2 4 6 2 6 4 2 4 2 10 2 10]))
 
(defn- spin [l n] (lazy-seq (cons n (spin (rest l) (+ n (first l))))))
 
(defn- insert-prime [p xs table]
  (update-in table [(* p p)] #(conj % (map (fn [n] (* n p)) xs))))
 
(defn- reinsert [table x table-x]
  (loop [m (dissoc table x), elems table-x]
    (if-let [elems (seq elems)]
      (let [elem (first elems)]
    (recur (update-in m [(first elem)] #(conj % (rest elem))) (rest elems)))
      m)))
 
(defn- adjust [x table]
  (let [nextTable (first table),
    n (nextTable 0)]
    (if (<= n x) (recur x (reinsert table n (nextTable 1)))
    table)))
 
(defn- sieve-helper [s table]
  (when-let [ss (seq s)]
    (let [x (first ss), xs (rest ss),
      nextTable (first table),
      nextComposite (nextTable 0)]
      (if (> nextComposite x)
    (lazy-seq (cons x (sieve-helper xs (insert-prime x xs table))))
    (recur xs (adjust x table))))))
 
(defn- sieve [s]
  (when-let [ss (seq s)]
    (let [x (first ss), xs (rest ss)]
      (cons x (sieve-helper xs (insert-prime x xs
                                             (sorted-map)))))))

(defn lazy-primes-puzzler  []
  (concat [2 3 5 7] (sieve (spin (wheel2357) 11))))

Interestingly, the code for clojure.contrib.lazy-seq/primes does not use the sieve of Eratosthenes, and uses the same algorithm that I used in lazy-primes-2. As near as I can tell, it exploits clojure’s memoization on lazy lists by carefully using the same var, rather than using an explicit vector as in lazy-primes-2. The automatic memoization makes it a little hard to benchmark directly, so I defined two new versions (the memoryless ‘func‘) version, and the memoized ‘var‘ version. By running ‘anti-memoization-hack’ between tests, the var containing the lazy seq will be reset between benchmarks so that memozation only occurs within a single benchmark run. Let’s compare their code:

(use 'clojure.contrib.def)

(defn anti-memoization-hack []
  (defvar lazy-primes-contrib
    (concat
     [2 3 5 7]
     (lazy-seq
      (let [primes-from
            (fn primes-from [n [f & r]]
              (if (some #(zero? (rem n %))
                        (take-while #(<= (* % %) n) lazy-primes-contrib))
                (recur (+ n f) r)
                (lazy-seq (cons n (primes-from (+ n f) r)))))
            wheel (cycle [2 4 2 4 6 2 6 4 2 4 6 6 2 6 4 2
                          6 4 6 8 4 2 4 2 4 8 6 4 6 2 4 6
                          2 6 6 4 2 4 6 2 6 4 2 4 2 10 2 10])]
        (primes-from 11 wheel)))))
  (def lazy-primes-contrib-var (fn [] lazy-primes-contrib)))


(defn lazy-primes-contrib-func []
  (concat
   [2 3 5 7]
   (lazy-seq
    (let [primes-from
          (fn primes-from [n [f & r]]
            (if (some #(zero? (rem n %))
                      (take-while #(<= (* % %) n) (lazy-primes-contrib-func)))
              (recur (+ n f) r)
              (lazy-seq (cons n (primes-from (+ n f) r)))))
          wheel (cycle [2 4 2 4 6 2 6 4 2 4 6 6 2 6 4 2
                        6 4 6 8 4 2 4 2 4 8 6 4 6 2 4 6
                        2 6 6 4 2 4 6 2 6 4 2 4 2 10 2 10])]
      (primes-from 11 wheel)))))

As you can see, lazy-primes-contrib-var will still exploit memoization, while lazy-primes-contrib-func will not have any memory at all, and act similar to those one-line haskell snippets for making primes that people throw around on the web.

My favorite solution was by Cristophe Grande, not just because he followed a similar approach to me, but also because he was very clever in his use of the hash map. He realized that there is no need to ‘queue’ the entries in the map as I had done; once a number has been marked ‘composite’ in the map, we don’t need to mark it again with each new factor we find. Instead of appending a vector of the prime factors under each composite number key, we could just continue searching for higher, as-yet-unmarked composite that we would have to find later anyway.

I reprint his solution here:

(defn lazy-primes-cgrande []
  (letfn [(enqueue [sieve n step]
            (let [m (+ n step)]
              (if (sieve m)
                (recur sieve m step)
                (assoc sieve m step))))
          (next-sieve [sieve n]            
            (if-let [step (sieve n)]
              (-> sieve
                  (dissoc n)    
                  (enqueue n step))
              (enqueue sieve n (+ n n))))
          (next-primes [sieve n]
            (if (sieve n) 
              (recur (next-sieve sieve n) (+ n 2))
              (cons n (lazy-seq (next-primes (next-sieve sieve n) (+ n 2))))))]
    (cons 2 (lazy-seq (next-primes {} 3)))))

(time (nth (lazy-primes-cgrande) 10000))

It’s tempting to try to play with Christophe’s code an try to improve it further, but I don’t see any obvious ways to parallelize this without creating some significant redundancy, and overhead of coordinating multiple threads probably won’t net us any benefits unless we have really a lot of cores available.

On to the benchmarking…

Let’s see how well everyone did on step three, “Make it fast”, by measuring how fast each of these functions are in an absolutely meaningless benchmark: how fast it runs on my computer! Of course, there will be no controls or attempts to make this scientific.

I used the following code to perform my little benchmark:

(defn avg [v] (double (/ (reduce + v) (count v))))

(defn as-tabbed-line [v] (apply str (apply str (interpose "\t" v)) "\n"))

(defmacro bench [n expr]
  "Runs expression n times and returns the execution times."
  `(for [_# (range ~n)]
     (let [start# (System/nanoTime)
           ret# ~expr
           end# (System/nanoTime)]
       (anti-memoization-hack)
       (/ (double (- end# start#)) 1000000.0))))

(defn test-prime-generators [max generators]
  "Average the result of 50 timed trials to generate X primes
  between 100 and max, spaced logarithmically so X grows 5% each time"
  (let [trials 50
        num-primes  (take-while #(> max %) (iterate #(int (* 1.05 %)) 100))
        take-trials (fn [tr gen pri]
                      (avg (bench tr (nth (gen) pri))))]
    (println
     (for [num num-primes]
       (as-tabbed-line
        (concat [num] (for [g generators]
                        (take-trials trials g num))))))))

(test-prime-generators 1000 [lazy-primes-1
                             lazy-primes-2
                             lazy-primes-3
                             lazy-primes-cgrande
                             lazy-primes-puzzler
                             lazy-primes-contrib-var
                             lazy-primes-contrib-func])

(test-prime-generators 10000 [lazy-primes-2
                              lazy-primes-3
                              lazy-primes-cgrande
                              lazy-primes-puzzler
                              lazy-primes-contrib-var])

Pushing the results of those last two expressions into files, I got the following pretty graphs using Gnuplot:

Average performance of various lazy-prime generators. Notice the stair-stepping performance of the caching system of lazy-primes-1.

We can see that cgrande's code seems to work remarkably well, followed closely by lazy-primes-3.

These trends continue out to at least the first 1,000,000 primes on my machine:

user> (time (nth (lazy-primes-3) 1000000))
"Elapsed time: 19697.895423 msecs"
user> (time (nth (lazy-primes-cgrande) 1000000))
"Elapsed time: 19107.089807 msecs"
user> (time (nth (lazy-primes-puzzler) 1000000))
"Elapsed time: 44241.588567 msecs"
(time (nth (lazy-primes-contrib-var) 1000000))
"Elapsed time: 120645.257555 msecs"

Neat!

My takeaway from this exercise was:

  • Implementing the sieve properly (unlike lazy-primes-1) is essential.
  • A better algorithm (the Sieve vs Naive algorithm) wins.
  • A simpler datastructure (lazy-primes-cgrande) wins.
  • More sophisticated code may be mildly counterproductive. (lazy-primes-puzzler)
  • Hashmaps can be really fast! (lazy-primes-cgrande,lazy-primes-3)
  • Managing the list of primes manually (lazy-primes-2) is slightly faster than letting clojure memoize automatically (clojure.contrib.lazy-seqs/primes), but then again memoization will probably be better for most applications.