Sieve Of Eratosthenes

: Sift the twos and sift the threes:
: The Sieve of Eratosthenes.
: When the multiples sublime,
: The numbers that remain are prime.

A reasonably efficient way to enumerate the PrimeNumbers (although there are more complex but faster ways), but if you want to test whether a specific given number is prime there are far more efficient methods.

http://en.wikipedia.org/wiki/Sieve_of_Eratosthenes
"The Genuine Sieve of Eratosthenes," Melissa E. O’Neill: http://www.cs.hmc.edu/~oneill/papers/Sieve-JFP.pdf (PDF)
http://www.utm.edu/research/primes/prove/prove2_1.html
http://www.fpx.de/fp/Software/Sieve.html

Compare

QuadraticSieve? for factorization: http://planetmath.org/encyclopedia/QuadraticSieve.html
StanislawUlam?'s "LuckyNumbers?": http://www.maa.org/mathland/mathtrek1.html
Sieving to produce lists of squares, cubes, etc.: http://go.hrw.com/resources/go_mt/a1/c8/ASIEVES.PDF (PDF)

There is a lot of confusion about this problem. It is *not* about finding any old algorithm to find a series of primes; SieveOfEratosthenes is a very *specific* algorithm for finding all the prime numbers up to a given number, n. Anything that sharply deviates from it is not the S of E, no matter how good that new algorithm is at finding primes.

Here's the SieveOfEratosthenes:

On paper, write the numbers 2...n
Circle the first number, 2. Call it P
Strike out all the P's multiples (for 2 it's 4, 6, 8, ...) by counting up from P in increments of P
Circle the smallest number not yet struck out. Call it P, and repeat from the previous step
When there's no more numbers to circle, stop. You've now circled all the prime numbers up to n

There can be no significant deviation from those steps, else it is not Sieve of E. Here is a quickie CeeLanguage implementation of same. #include <stdio.h> #include <stdlib.h>

 int main(int argc, char **argv) {
        int     top_value = 100;
        int     count     = top_value - 1;
        int     *array    = calloc( top_value + 1, sizeof(int));
        int     i, prime, multiple;
        /* mark each int as potentially prime                                    */
        for (i=2; i <= top_value; ++i) 
                array[i] = 1;
        /* for each starting prime, mark its every multiple as non-prime         */
        for (prime = 2; prime <= top_value; ++prime)
        {
                if (array[prime])
                        for (multiple = 2*prime; multiple <= top_value; multiple += prime)
                                if (array[multiple]) {
                                        array[multiple] = 0;
                                        --count;
                                }
        }
        /* Now that we have marked all multiples of primes as non-prime, print   */
        /* the remaining numbers that fell through the sieve, and are thus prime */
        for (i=2; i <= top_value; ++i)
        {
                if (array[i])
                        printf("%d ", i);
        }
        printf("\n\n %d primes up to %d found.\n", count, top_value);
        exit(0);
 }

-- DougMerritt (I took a liberty to edit the description and the code above somewhat -- WillNess )

An obvious refinement to the algorithm is to strike out the multiples of prime P starting from P*P instead of from 2*P, because all the previous ones will have been already stricken out on previous steps. That means that it's OK to stop when the prime exceeds the square root of the top value. This will obviously happen automatically if you start from the square of P.

A common improvement is to work with odds only, thus saving about half of all the work, i.e. of removing the multiples of two, when they are not considered in the first place (cf. WheelFactorization optimization). Another is to work with bit-array instead of an array of ints; also, to use a fixed-size array small enough to fit into your cache memory and keep the multiples-generating primes info separately. -- WillNess

A comment, prompted by the SieveOfEratosthenesInManyProgrammingLanguages page ...

Some of the implementations over there use this algorithm:

Get a list of all integers (possibly implicit, possibly delayed)
For as long as necessary
- Remember the first element, P
- Print P - it's prime
- Remove P from the front
- For each remaining element
  - Compute (n mod P), and if the result is 0 ...
    - Remove it from the list

This is not, in fact, the sieve of Eratosthenes. This is exhaustive trial division. For each number so far identified as prime, each number in the list is checked by a trial division (assuming "n mod P" is calculated using division). In effect, each number in our original range is divided by each prime until either it's known to be prime, or it's identified as composite.

For the real sieve of Eratosthenes, each prime has its multiples marked for removal from the remaining numbers, these multiples being found by successive addition (counting up in steps of size P). This would normally be implemented by holding the numbers in a vector with flags, so that multiples are "removed" by flagging them, and all the entries in the vector that are left unflagged in the end, signify the prime numbers.

Another possibility is to keep an actual list of remaining numbers. In that case you remove composite numbers from the list, and then subsequent tests must be made by performing a division, rather than simply counting up in the vector. Or you could generate the (additional) lists of multiples and then actually remove them from the intial candidates list, but then you have to take care to do that in an efficient manner approaching the efficiency of a direct-access O(1) mutable "canvass" of an array that holds and combines the results very efficiently. The sieve thus becomes more incremental, immediate, generating its results one-by-one. Or you can use smaller, fixed sized array for that effect, preferably one that will fit into the cache size of your computer which is supposed to also give you a considerable speedup.

Of course, a vector with some bits set can be considered to be a list, and, similarly, a list can be represented as a vector with some bits set. The difference lies in how you determine if a remaining element is a multiple of a prime. If you perform a division, you've not implemented the "proper" sieve algorithm. The difference being, when testing you have to test each candidate number, and for primes the tests are many (no early bailout as for the composites); but when removing multiples of primes by counting up from them, you get the remaining numbers as primes for free. Of course if all the divisons were to be performed at once on some massively parallel hardware turning all the tests for each candidate number into an O(1) operation, trial division would be just fine.

The problem with trial division is it tests each number in isolation. But all non-composite numbers can not be but prime; so all the tests on primes were for nothing, they added no new information into the system. Moreover for the composites, trial division tests them against all the preceding primes, but they are only multiples of their prime factors, so those additional tests were for nothing too.

Hence the worse complexity than that of the "genuine" sieve, which works "upwards" from the known factors towards their multiples, each operation being meaningful and thus necessary (almost; the remaining redundancy is in double hits on composites which are multiples of several prime factors, but this seems to be unavoidable but through additional computations with prohibitive cost, like Euler's sieve or its equivalent, the prime wheels spirals - both demanding for too much context to be retained during computation, making it much less localized, so the operational overhead would seem to outgrow the algorithmic benefit - plus, both are inherently linear, with each step depedent on results of previous one, whereas the streams of primes' multiples in sieve of Eratosthenes are produced each independently of another).

If the above explanation is clear to you then you don't need to read any further. However, for the sake of being concrete here are two PythonLanguage scripts to illustrate the point. They are not intended to be idiomatic Python, they are not intended to be general. They are for teaching about the algorithm, not about Python. Feel free to improve them, but please, keep in mind the purpose.

    candidates = range(2,n+1)
    limit = int(sqrt(n))
    prime = candidates[0]
    result = []
    while prime<=limit:
        result += [prime]
        c0 = []
        for i in candidates:
            if 0 != i%prime:
                c0.append(i)
        candidates = c0
        prime = candidates[0]
    result += candidates

Notice how elements are actually removed from the list of candidates, and explicit divisions are performed to see which remaining candidates should be kept. This is *not* a sieve of E.

Now here's the second version, setting composite numbers to None. This is a proper sieve of E.:

 def sieve(n):
    candidates = range(n+1)
    fin = int(n**0.5)
    for i in xrange(2, fin+1):
        if not candidates[i]:
            continue
        candidates[2*i::i] = [None] * (n//i - 1)
    return [i for i in candidates[2:] if i]

 sieve(19)            # returns [2, 3, 5, 7, 11, 13, 17, 19]

CategoryMath