Tuesday, November 3, 2015

Small Prime Sieve in C

You can find primes without using multiplication or division. All you need is a large place to write a large table of numbers.
I showed how to do the brute force method and the easy sieve of Eratosthenes by hand in my Math and English blog, here.

The easy version of the sieve of Eratosthenes is pretty simple. It's practically de rigueur. If you are a computer professional and you haven't yet written one out from scratch, you should look at yourself in the mirror and then sit down and do it now.

If you are not a computer professional, you can copy the source codes I'm going to present here into a source code editor and try them. Maybe it will get you interested in math and programming, or maybe you'll just say, "huh?" But it will be a better way to waste an hour or two than getting stoned.

My motivation in writing these is that I need to re-learn FORTH, and I think I should learn some Ada, and this is one of those problems that, when done the interesting way, exposes a lot of the basic features of a language.

This post is not going to expose much, however. It's just going to get us started.

The algorithm, to a fair degree of detail, will be as follows:

Set up a sieve array. Note that, in the representation below, the numbers are not in the array itself. They are indexes, or addresses into the array. The contents are the lower half.

In C:

    char sieve[ 24 ];

gives us an array called sieve with indexes from 0 to 23:

01234567 89101112131415 1617181920212223
________ ________ ________

By the underscores ("_") here, I mean that we don't know what is in the array. So the first step is to set the sieve array to a state that we can use.

We'll flag non-primes with "0".

Zero and one are by definition, non-prime. (They are integers, and they are not evenly divisible by any other integer, but they are not greater than 1.)

All the rest, we will say, may be prime. We don't know yet. Or, at least, the computer doesn't know yet. (Computers really aren't smart until we program some smarts into them.) So we will set their sieve entries to "1" so we can program some smarts into your computer.

(And it will forget, as soon as the program ends, unless we tell it to write it down somewhere on a disk drive or in flash memory or such. That's the way computers are.)

In C:

    sieve[ 0 ] = 0;
    sieve[ 1 ] = 0;
    for ( index = 2; index < 24; index = index + 1 )
    {   sieve[ index ] = 1;
    }

01234567 89101112131415 1617181920212223
00111111 11111111 11111111

The second step is to decide we are going to start at two.

    prime = 2;

The third step is to start at the first multiple of two and count up through the array, clearing the array at every multiple:

    4, 6, 8, 10, 12, 14, 16, 18, 20, 22

In C:

    for ( index = prime + prime; index < 24; index = index + prime )
    {   sieve[ index ] = 0;
    }

The result of this step when prime = 2 looks like this:

01234567 89101112131415 1617181920212223
00110101 01010101 01010101

We start at 2 times 2, then repeatedly add 2 to the index and clear the sieve array at the resulting multiple of 2.

It will be the same at 3:

    prime = prime + 1;
    for ( index = prime + prime; index < 24; index = index + prime )
    {   sieve[ index ] = 0;
    }

After the pass for 3, the array will look like this:

01234567 89101112131415 1617181920212223
00110101 00010100 01010001

By using prime to count through the potential primes, and index to clear the multiples, we can use the same loop for 2, 3, and all the primes we try.

Really, really simplifying, we will find that every multiple of a non-prime is also a multiple of a prime, so we could just count the variable we call "prime" through the whole list.

    for( prime = 2; prime < 24; prime = prime + 1)
    {
        for ( index = prime + prime; index < 24; index = index + prime )
        {   sieve[ index ] = 0;
        }
    }

When we are testing any prime that is greater than or equal to 12, the first multiple will be greater than or equal to 12 times 2, or 24, which is beyond the end of the array, outside our list.

Fortunately, the inner loop that actually clears the array will check whether index less than 24 before it enters the body of the loop the first time. So the loop won't even start once unless prime is less than 12.

We don't have to in this version, and it doesn't really save much time (relatively speaking) but we can go ahead and stop the outer loop at that point:

    for( prime = 2; prime < 12; prime = prime + 1)
    {
        for ( index = prime + prime; index < 24; index = index + prime )
        {   sieve[ index ] = 0;
        }
    }

If we knew that the flag in the sieve were always valid for the value of prime, then we could skip the inner loop when the flag is "0". This could potentially save a lot more time than just stopping the outer loop early.

Let's walk through the loops a little more carefully.

When prime is 2, the inner loop will clear all multiples of 2, In the process, it will also clear all multiples of multiples of 2:
  • 4 = 2 × 2
  • 6 = 2 × 3
  • 8 = 2 × 4 = (2 × 2) × 2
  • 10 = 2 × 5
  • 12 = 2 × 6 = (2 × 3) × 2
  • 14 = 2 × 7
  • 16 = 2 × 8 = (2 × 4) × 2
  • 18 = 2 × 9 = (3 × 3) × 2
  • 20 = 2 × 10 = (2 × 5) × 2
  • 22 = 2 × 11
So, when the pass for 2 is finished, effectively, so are the passes for 4, 6, 8, and 10. None of those passes have to actually be run.

Likewise, when the pass for 3 has finished, the passes for 6 and 9 don't have to be run. And when the pass for 5 has finished, the pass for 10 doesn't have to be run. How can we take advantage of this information?

After initially setting up the sieve, the flags for 2 and 3 are valid. So, when prime is 2, the flag is valid. The inner loop won't touch 3 when prime is 2, so when prime is 3, the flag is valid.

The inner loop clears the flag at 4 when prime is 2, so, by the time prime is 4, the flag is valid.

Looking carefully, we can see that, for every value of prime, the flags between prime and prime times 2 are always valid. The computer doesn't know, but we do, so we can tell the computer to use that information, by preventing the inner loop when the flag says it isn't really prime:

    for( prime = 2; prime < 12; prime = prime + 1)
    {
        if ( sieve[ prime ] == 1 )
        {   for ( index = prime + prime; index < 24; index = index + prime )
            {   sieve[ index ] = 0;
            }
        }
    }

So we now have a nice framework to work within.

Here is the source for a C version able to find all primes less than 256, with all the variables declared, some convenient manifest constants to make the size of the list easier to change, and other stuff I usually do when I write in C:



/* Archetypical implementation of the sieve of eratosthenes in C
** By Joel Rees, Amagasaki, Japan, 2015
** All rights reserved.
** Permission granted by the author to use this code
** for any purpose, 
** on condition that substantial use 
** shall retain this copyright and permission notice.
*/

#include <stdio.h>
#include <stdlib.h>


#define MAXSIEVE 256

#define FINALPASS ( ( MAXSIEVE + 1 ) / 2 )


int main( int argc, char * argv[] )
{   char sieve[ MAXSIEVE ];
    int index, prime;

    sieve[ 0 ] = 0;
    sieve[ 1 ] = 0;
    for ( index = 2; index < MAXSIEVE; index = index + 1 )
    {   sieve[ index ] = 1;
    }

    for ( prime = 2; prime < FINALPASS; prime = prime + 1)
    {
        if ( sieve[ prime ] == 1 )
        {   for ( index = prime + prime; index < MAXSIEVE; index = index + prime )
            {   sieve[ index ] = 0;
            }
        }
    }

    for ( index = 0; index < MAXSIEVE; index = index + 1 )
    {   printf( "%d: %s prime.\n", index, ( sieve[ index ] == 0 ) ? "is not" : "is" );
    }
    return EXIT_SUCCESS;
}



This is a long post, so my Ada and FORTH implementations will be posted separately:
  • Three versions of the sieve in Ada
    Showing three minor variations of the above algorithm,
  • The sieve in FORTH, but not natural FORTH. Includes code that will compile in 16 kilobytes of RAM on old 8-bit microcomputers, but, of course, won't give you large sieves.
  • Since I have a list, here's the same sieve in an 8-bit BASIC. Its really slow.
  • [24 Feb 2019 -- If you want to crib a way to do many, many more primes, see this post. And it leaves the sieve array available for querying after the program ends. But it doesn't print the primes. That is an exercise left to the reader.;-)]
  • [2 Jan 2020 -- If you want to query the sieve array in C, you can use the OS command line to avoid writing program code to read the queries. But Forth makes querying the sieve natural. See this post for code in C and Forth.]

[You might also be interested in the C language sieve entry in Rosetta Code.] 


No comments:

Post a Comment