Fun with Prime Numbers (C++ console)

A good few years back I created a program to quickly generate prime numbers. I remember being pretty impressed with the speed I achieved. Alas, a hard drive failure caused this nice little bit of code to meet its end. I've now recreated it (and developed it further), and I'm fairly sure that the essence is pretty much the same as before, but it isn't as fast as I remember. Maybe I had some almighty inspiration all those years ago, leading to some incredibly clever and resourceful trick, or maybe it's just wishful thinking. Or maybe it was just a dream, and it never existed.

In any case, I present the new prime number generator, and successive improvements. It uses the sieve of Eratosthenes, and it's fairly fast, generating the first million primes in up to 90ms.

The basic algorithm — Version #1

You're probably well aware of the most basic primality test that exists: to test if a number $n$ is prime, divide it in turn by each of the numbers from 2 up to and including $\sqrt{n}$. If any of the results is a whole number then $n$ has a divisor and is not prime.

Why only up to the square root of $n$? Divisors come in pairs; if $n$ has a divisor larger than $\sqrt{n}$, then it must also have a divisor smaller than $\sqrt{n}$.

The largest the "smaller" of a pair of divisors can be is $\sqrt{n}$ — equal to the "larger" of the pair, i.e. when $n$ is a square!

You'll also be well aware that this is an incredibly ineffficient method of generating prime numbers. A much better method is using a prime sieve, and here we use the simple and elegant sieve of Eratosthenes.

Here is the algorithm:

Start with a list of all integers from $2$ to $N$, the upper limit of your prime search.
Let $p=2$. Go through the list removing all multiples of, but not including, $p$.
Let $p=3$. Go through the list removing all multiples of, but not including, $p$.
Let $p=5$. Go through the list removing all multiples of, but not including, $p$.
Let $p=7$. Go through the list removing all multiples of, but not including, $p$.
Let $p=11$. Go through the list removing all multiples of, but not including, $p$.
... you get the gist. Stop when $p>\sqrt{N}$. (For the same reason as in the basic primality test.)
Congratulations, the surviving numbers in your list are all the primes up to $N$!

Here are some notes on the algorithm:

You'll notice that we let $p$ equal each sucessive prime number, starting with 2. In actual fact, we let $p$ be "the next number in the list", but it so happens that "the next number in the list" is the next prime number, as all numbers in between will already have been removed.
In fact, after we've removed all multiples of $p$, all of the numbers in our list up to the square of the next prime will also be prime! For example, after we've sieved through with $p=7$, all numbers in our list up to (but not including) $11^2$ are prime. Hence, when we start sieving with $p=11$, the first multiple of $p$ we'll need to remove is $p^2=11^2=121$.

If this briefest of explanations doesn't make much sense, please take a trip to Wikipedia. (And even if it does, said small excursion still might not be a bad idea!)

Without further ado, here's my code. VS2010 project file included. Some notes:

We use a char array. Each char holds either a 1 (prime) or 0 (not prime), and the position of each char is the number it corresponds to. For example, the nth char in the array corresponds to the number n. So we use 1 byte of RAM for every number, so we can certainly get into the billions before we start running out of RAM, but that we do.

On my (fairly old) laptop I can find:

- 1 million primes (15,485,863) in 520ms.
- 10 million primes (179,424,673) in 7.5s.
- 50 million primes (982,451,653) in 44s.

SieveOfEratosthenes_v1.zip

Faster, faster! — Version #2

So, we're getting some not-too-shabby runtimes for our sieve. But we can do better that that!

The most glaringly obvious optimisation is with the list of candidates we start with: it's full of even numbers! Now, it's fairly obvious that no even number (except 2) can be prime, but lets analyse it in a slightly more complicated way (to a useful end, I promise!), with modular arithmetic. All natural numbers can be written in one of the following two ways: $$\left\{\begin{matrix} 2k\\ 2k+1 \end{matrix}\right.$$ However. Any number of the form $2k$ is obviously divisible by $2$, so the only form a prime number could take is $2k+1$.

So, by filling our initial array only with numbers of this form, we're not only halving the memory we need, but we're removing the need for the $p=2$ pass (which takes the longest).

So here's the code. VS2010 project file included. You can see that this version is more than twice as fast as version #1!

- 1 million primes (15,485,863) in 240ms.
- 10 million primes (179,424,673) in 3.5s.
- 50 million primes (982,451,653) in 21.1s.
- 100 million primes (2,038,074,743) in 46.2s.

SieveOfEratosthenes_v2.1.zip

And we don't have to stop there ...

Why not do the same for $3$, and $5$, and $7$? Well, one quite convincing answer is that it starts getting very complicated very quickly! Let's including $3$. First we need a number divisble by $2$ and $3$. Since they're primes, the lowest common multiple of $2$ and $3$ is $2\times{}3=6$. Now, all natural numbers can be written in one of the following $6$ ways: $$\left\{\begin{matrix} 6k\\ 6k+1\\ 6k+2\\ 6k+3\\ 6k+4\\ 6k+5 \end{matrix}\right.$$ However, any number of the form $6k$ is divisible by both $2$ and $3$ (because we built it so). Then we can see that $6k+2$ and $6k+4$ are divisble by $2$, and $6k+3$ is divisible by $3$, so the only forms that a prime number could take are: $$\left\{\begin{matrix} 6k+1\\ 6k+5 \end{matrix}\right.$$ So now we only need to even consider $2/6$ of the natural numbers for primality!

Like I said, you could include $5$ too, but then you'll have $lcm(2,3,5)=30$ different cases, $8$ of which could be prime. Include $7$ as well and you'll have $lcm(2,3,5,7)=210$ different cases, $48$ of which could be prime. It gets ugly very quickly, and the benefit of including each additional prime decays exponentially. Here's a graph.

But let's not count our chickens just yet

It's all very well reducing the set of numbers we need to check for primality, but we haven't considered how our code must change to take this into account; and change it must.

Luckily, implementing the first improvement (ignoring all even numbers) was fairly easily. We let each successive char in our array correspond to each successive odd number. In cycling through the multiples of $p$ we simply skip every even multiple, i.e. increase by $2p$ each time. In this way, we can guarantee that we'll never see an even number, so we can directly map every number we come accross to the array — that is, the number $n$ will be represented by the $\frac{(n-1)}{2}$th char in the array — without having to waste precious operations first checking that $n$ is odd.

But we're not quite so lucky when we try to implement our next improvement — only accounting for numbers of the form $6k+1$ or $6k+5$.

Assume we fill our char array with pairs of numbers of the above form, i.e. $\left\{1,5,7,11,13,17,19,23,...\right\}$. Now, let's say we want to remove the $5th$ multiple of $13$, from our array. Which element in our array represents $117$? Well in this case the answer is none of them, since it's of the form $6k+3$. The point is, before we can remove any multiple from our array, we need to spend operations finding out if it's of the form $6k+1$, $6k+3$, or $6k+5$, and taking the modulus of every multiple would surely compromise any benefit.

Naturally I coded it anyway, in the name of tinkering, and it turns out that it's actually $\sim 25\%$ quicker! Here's the code, VS2010 project file included:

- 1 million primes (15,485,863) in 170ms.
- 10 million primes (179,424,673) in 2.6s.
- 50 million primes (982,451,653) in 15.6s.
- 100 million primes (2,038,074,743) in 33.7s.

SieveOfEratosthenes_v2.2.1.zip

One last thing (I don't give up easily)

Taking the modulus of every multiple is obviously is less than ideal. The whole point of a sieve method is so we could do away with divisibility checks!

It occured to me that we should easily be able to keep track of each multiple's remainder mod 6 as we go along, since in cycling through the multiples we add a contant number each time. After some scribbling, I came up with the following:

Some prime number results

For any prime $p>3$, the odd multiples of $p$ follow a repeating pattern$\pmod 6$ of: $$ \begin{cases} p=6k+1 & \hspace{10mm} 1,\,3,\,5,\,...\\ p=6k+5 & \hspace{10mm} 5,\,3,\,1,\,... \end{cases} $$
For any prime $p>3$, $p^2 = 1 \pmod 6$

Proof:
- If $p=6k+1$ then $p^2=36k^2+12k+1\quad=1 \pmod 6$
- If $p=6k+5$ then $p^2=36k^2+60k+25\,\,\,=1 \pmod 6$

So instead of taking the modulus of every multiple of $p$, we simply need to take the modulus of $p$ once, and then cycle through either $\left\{1,\,3,\,5\right\}$ or $\left\{5,\,3,\,1\right\}$ to obtain the remainder$\pmod 6$ of each multiple, depending on whether $p$ takes the form $6k+1$ or $6k+5$.

The second result is useful since the first multiple of $p$ we need to remove is always $p^2$, so the first remainder will always be $1$.

Now for the moment of truth; did this make any difference? Here's the code, VS2010 project file included.

- 1 million primes (15,485,863) in 150ms.
- 10 million primes (179,424,673) in 2.3s.
- 50 million primes (982,451,653) in 14.3s.
- 100 million primes (2,038,074,743) in 30.8s.

SieveOfEratosthenes_v2.2.2.zip

Well, last time I checked 30.8 was less than 33.7! Success! Albeit only an $8\%$ or so increase, but it's better than a kick in the foot.

More, more! — Version #3

So, we've spent some time trying to make our prime generator as fast as possible, and I think we've done pretty well. We can find all the prime numbers up to 2,038,074,743 (yes, just over 2 billion!) in just over 30 seconds! That's fairly ridiculous if you think about it.

The problem is, there is a strict upper limit to the number of primes we can find in this way — determined purely by how much RAM we have available (ignoring virtual memory). On my old machine the 100 millionth prime is pretty much the limit. So what can we do about this?

From bytes to bits

Up till now we've been using a char array or 1's and 0's to keep track of whether a number is prime or not. In other words, we're using 4 bits of memory to hold a binary variable! This is silly, and requires fixing. Unfortunately, this necessitates more complicated, and therefore slower, code. But there we go.

Unhappily, there are no data types in C++ which take just one bit of memory (even a bool takes a whole byte) so we have to improvise: we use an int array, consider each int in its binary representation, and let each sucessive bit correspond to a number. For example, let the first int in he array represent the numbers from 1 to 32. Then, after removing all composite numbers, the int would end up as the following in its binary form (or the reverse, depending on which end you start from — here the leftmost bit represents 1, and the rightmost represents 32): $$ 0\,1\,1\,0\,1\,0\,1\,0\,0\,0\,1\,0\,1\,0\,0\,0\,1\,0\,1\,0\,0\,0\,1\,0\,0\,0\,0\,0\,1\,0\,1\,0 $$ Which, in its decimal form, is $6957218$.

So now we need a way to manipulate individual bits in any given int. Luckily, bitwise operations provide a fairly easy (and quite efficient) way to do this; see this tutorial for a good explanation of bitwise operators.

So, here's the code. VS2010 project file included. Just one note: I originally coded functions to perform the operations on the bit array, but as suggested in Steve Litt's excellent article, I thought I'd see if using macros instead made any significant difference. The result? Only a threefold speed increase!

Version #3.1, extending v2.1 (considering even numbers only) to use a bit array:

- 1 million primes (15,485,863) in 90ms.
- 10 million primes (179,424,673) in 2.35s.
- 50 million primes (982,451,653) in 15.8s.
- 100 million primes (2,038,074,743) in 35s.

SieveOfEratosthenes_v3.1.zip

Version #3.2, extending v2.2.2 (considering numbers of form $6k+1$ or $6k+5$ only) to use a bit array:

- 1 million primes (15,485,863) in 135ms.
- 10 million primes (179,424,673) in 2.32s.
- 50 million primes (982,451,653) in 15.5s.
- 100 million primes (2,038,074,743) in 34s.
- 1 billion primes (22,801,763,489) in 30m.

SieveOfEratosthenes_v3.2.zip

Even more?

A billion prime numbers is a lot. But as a percentage of how many primes exist it's ... well, zero. We can of course continue with this sieve method, but since we've run out of RAM our only option is to get the hard drive involved; sieving all numbers up to N, writing these results to a file, then moving on to the next N integers, and so on.

Storing the primes: save the bit array, or just the prime numbers?

As the primes get larger, the distance between them increases. By the time we reach the millionth prime an average of about 1 in 15 integers are prime. When we reach the billionth, an average of only 1 in 23 are prime. We could think of this as "using" 23 bits to store each prime. Sounds a lot, so maybe it'd be more efficient to just store each prime number explicitly?

To store an integer $x$, we require $\lfloor \log_2{x}\rfloor +1$ bits, so storing the billionth prime would require 35 bits. It seems that storing the bit array would be more efficient! But hang on – not all of the numbers up to the billionth prime would require 35 bits. How many bits would we need to store all the prime numbers up to the millionth? A quick Mathematica calculation reveals that we'd need 22,809,363 bits, as opposed to the 15,485,863 required for the bit array.

One last check: what about when we get to the really big numbers? To answer this we look at the asymptotics: in the limit of large $x$, we'll find one prime in an average of every $\ln{x}$ integers (see the prime number theorem), and to explicitly store $x$ we'll need $\log_2{x}$ bits. Again, storing the bit array itself would be more efficient.

Of course, if we don't store the primes explicitly we'll have to do some work searching through the bit array and then converting to the number each bit represents, but it's a fairly quick operation, and it should be negligible compared to the main sieving process. We also didn't take into account that we're only storing odd numbers in the bit array – a two-fold efficiency boost.

The answer is conclusive: store the bit array!

Alas, for the moment at least, this is where my own quest for the primes ends. But if you've got this far, you'll certainly want to take a look at Steve Litt's article, where he doesn't give up until he's found all primes up to 40 billion, and also discusses how one could get up to the really big numbers.

Useful Links

http://en.wikipedia.org/wiki/Sieve_of_Eratosthenes
The main ideas in this post came from this most excellent article by Steve Litt, which I strongly urge you to check out: http://www.troubleshooters.com/codecorn/primenumbers/primenumbers.htm
Tutorial on bitwise operators: http://www.cprogramming.com/tutorial/bitwise_operators.html

Tactical Tinkering