Optimizing the Sieve of Eratosthenes
by
, July 9th, 2012 at 06:29 PM (17163 Views)
The Sieve of Eratosthenes is one of the fastest methods for generating a list of prime numbers less than a given number n. A general description of the algorithm is given below:
1. Initialize a list representing the numbers [2,n). This denotes whether a given number is prime or composite.
2. Start from the beginning of the list. If the number is denoted as prime, save it. Otherwise, move on.
3. For every multiple of the current prime, mark it as a composite number.
Below is code implementing the Sieve of Eratosthenes as naively as possible:
public static List<Integer> eratosthenes_naive(int n) { boolean[] is_composite = new boolean[n]; List<Integer> results = new ArrayList<Integer>(); for (int i = 2; i < is_composite.length; ++i) { if (!is_composite[i]) { results.add(i); for (int j = i; j < is_composite.length; j += i) { is_composite[j] = true; } } } return results; }
While fairly quick, there are several optimizations which can be made. The first such optimization is to reduce the inner loop's span. It only needs to start from j = i*i because all values smaller have been marked already.
public static List<Integer> eratosthenes_optimized1(int n) { boolean[] is_composite = new boolean[n]; List<Integer> results = new ArrayList<Integer>(); for (int i = 2; i < is_composite.length; ++i) { if (!is_composite[i]) { results.add(i); for (long j = (long) i * i; j < is_composite.length; j += i) { is_composite[(int) j] = true; } } } return results; }
Another optimization that can be made is that the external loop only needs to run for every odd number.
public static List<Integer> eratosthenes_optimized2(int n) { boolean[] is_composite = new boolean[n]; List<Integer> results = new ArrayList<Integer>(); results.add(2); for (int i = 3; i < is_composite.length; i += 2) { if (!is_composite[i]) { results.add(i); for (long j = (long) i * i; j < is_composite.length; j += i) { is_composite[(int) j] = true; } } } return results; }
A third optimization is to provide an upper bound for the list of prime numbers. An approximation of the upper bound of the prime counting function (the number of primes below a given number n) is 1.25506 * n / ln(n).
public static List<Integer> eratosthenes_optimized3(int n) { boolean[] is_composite = new boolean[n]; List<Integer> results = new ArrayList<Integer>((int) Math.ceil(1.25506 * n / Math.log(n))); results.add(2); for (int i = 3; i < is_composite.length; i += 2) { if (!is_composite[i]) { results.add(i); for (long j = i * (long) i; j < is_composite.length; j += i) { is_composite[(int) j] = true; } } } return results; }
The last optimization is to completely remove all even numbers from the is_composite array. This requires a transpose of all indices.
public static List<Integer> eratosthenes_optimized4(int n) { boolean[] is_composite = new boolean[n - 2 >> 1]; List<Integer> results = new ArrayList<>((int) Math.ceil(1.25506 * n / Math.log(n))); results.add(2); for (int i = 0; i < is_composite.length; ++i) { if (!is_composite[i]) { results.add(2 * i + 3); for (long j = 4L * i * i + 12L * i + 9; j < 2 * is_composite.length + 3; j += 4 * i + 6) { is_composite[(int) (j - 3L >> 1)] = true; } } } return results; }
Results
As with any attempts at optimizing code, it's important to benchmark the code. For this, I created a simple test which runs each method 10 times for n = 0x4000000 (~67 million). The results are given below. All times are in milliseconds.
naive
min: 2626
avg: 2928
max: 3121
optimized 1 (inner loop optimization)
min: 1607
avg: 1808
max: 1994
optimized 2 (outer loop optimization)
min: 1532
avg: 1658
max: 1835
optimized 3 (memory optimization)
min: 1516
avg: 1618
max: 1738
optimized 4 (structure optimization)
min: 845
avg: 930
max: 1040
I also tracked ideal memory use. Method 4 uses by far the least memory ~50 MB, method 3 is next at ~82 MB. The other methods are difficult to track due to the expanding nature of ArrayLists, but it's sufficient to say they will at best use at least ~82 MB of memory.
The test was also run with a larger data set with n = 0x3FFFFFFF (~1 billion).
naive
min: 57899
avg: 62960
max: 73454
optimized 1
min: 34069
avg: 35855
max: 38507
optimized 2
min: 26480
avg: 27870
max: 30047
optimized 3
min: 25290
avg: 27920
max: 30100
optimized 4
min: 17201
avg: 17324
max: 17914
Summary
Through careful testing and analysis it's possible to achieve ~2-3 times speedup over the naive Sieve of Eratosthenes. There are other programs which implement various features such as segmented sieves, wheel factorization, and multi-threading to achieve some very impressive results. The current leader as far as I know is primesieve. It can compute primes up 1 billion in ~0.08 seconds utilizing multi-threading, or in ~0.4 seconds with just a single thread.
Update
Rather than have a boolean (which takes ~ 1 byte) represent each number, I made a recent change which represents each number as a bit in a char. This drastically cuts memory use by ~50%. Performance is not drastically affected, possibly slightly faster.
public static List<Integer> eratosthenes_optimized5(int n) { if (n < 2) { return new ArrayList<Integer>(); } char[] is_composite = new char[(n - 2 >> 5) + 1]; final int limit_i = n - 2 >> 1, limit_j = 2 * limit_i + 3; // boolean[] is_composite = new boolean[n - 2 >> 1]; List<Integer> results = new ArrayList<>((int) Math.ceil(1.25506 * n / Math.log(n))); results.add(2); for (int i = 0; i < limit_i; ++i) { if ((is_composite[i >> 4] & 1 << (i & 0xF)) == 0) { results.add(2 * i + 3); for (long j = 4L * i * i + 12L * i + 9; j < limit_j; j += 4 * i + 6) { is_composite[(int) (j - 3L >> 5)] |= 1 << (j - 3L >> 1 & 0xF); } } } return results; }