RSS FEED

Project Euler: Problem 7

Lucky seven, here we come, problem No. 7 in all its glory:

By listing the first six prime numbers: 2, 3, 5, 7, 11, and 13, we can see that the 6th prime is 13.

What is the 10001st prime number?

You might remember my complaints from the last time concerning the relative easiness of the last problem. Now I feel fully compensated, primes and a whole lot of them, yay!

If you do a little research on prime numbers, you'll soon find the Sieve of Eratosthenes, an easy to explain way of finding primes. Basically it does something similar to what we did with trial division when identifying a prime, only the other way round. It just crosses out all the multiples of each prime (a number it reaches during the iteration phase and it haven't been crossed out yet). What remains are the primes. Let's look at what wikipedia says about the process:

To find all the prime numbers less than or equal to a given integer n by Eratosthenes' method:

   1. Create a list of consecutive integers from two to n: (2, 3, 4, ..., n).
   2. Initially, let p equal 2, the first prime number.
   3. Starting from p, count up by p and cross out thus found numbers in the list (which will be 2p, 3p, 4p, etc.).
   4. Find the first number not yet crossed out after p; let p now equal this number (which is the next prime).
   5. Repeat steps 3 and 4 until p2 is greater than n.
   6. All the numbers in the list which are not crossed out are prime.

As a refinement, it is sufficient to cross out the numbers in step 3 starting from p2 because all the lesser ones will be already crossed out at that point. That's why it's OK to stop there in step 5. This does not appear in the original algorithm.

A common improvement is to work with odd numbers only (3, 5, ..., n), and count up using 2p as a step (thus excluding all even numbers in advance). This actually appears in the original description.

Creating a list is easy enough, bitarray is my choice here as it allows for easy switching of the true false values of its individual indices and iterating over it is a breeze. As for the second step, as we'll be iterating over the bitarray, this is already taken care of for us so we can skip to next point. The step will be equal to p, crossing out is just setting the index of our bitarray to false. Number four tells us to find a number that haven't been crossed out yet; as on each new iteration the crossed-out numbers will be excluded from being iterated over, only such numbers that number four talks about will remain in it so this is already accomplished; to ensure we won't the cross out the number itself, let's start with crossing out its next multiple, i.e. (p+p). Number five then tells us to test whether the square of our index is bigger than the input number and if so, exit the loop; easy enough. Let's try to code it:

fn sieveEratosthenes nr =
(
    local sieve = #{2..nr}

    for p in sieve while p^2 < nr do
        for i = (p+p) to nr by p do
            sieve[i] = false

    sieve
)

Okay, this works, now we only need to improve it a little bit, let's look at the hints: the first one first, start from p2. We already calculate p2 when we check if the square of p is not bigger than the input number, thus we can easily assign this to a variable and use it later. As for the other part, counting up using 2p as a step and excluding all even numbers. We can do it by crossing out all multiples of two at the start and then setting 2*p as a step for the nested loop.

To shed some light on the reasoning beind this, all the numbers we are dealing with are primes which means they are odd (we skipped two as the only odd prime right at the beginning and will add it later on). If we square and odd number it just means we multiply it by another odd number, and if we then add 2p, i.e. two times the number to it, we have another odd multiple. Let's take 7 as an example, 7 × 7 = 49; 49 + (2 × 7) = 7 × 9; (49 + (2 × 7)) + (2 × 7) = 7 × 11 etc.

Okay, now for the function itself:

fn sieveEratosthenes nr =
(
    local sieve = #{3..nr}
    for i = 4 to nr by 2 do
        sieve[i] = false

    local pSquared

    for p in sieve while (pSquared = p^2) < nr do
        for i = pSquared to nr by 2*p do
            sieve[i] = false

    sieve[2] = true
    sieve
)

Is there anything else that can be done to speed it up? There sure is, even though it won't be all big speed-up. In the line no. 9 we use while (pSquared = p^2) < nr to check if we are still in the range. That means we do the calculation each and every time, even the last time when we are out of range. What we can actually do is to compare our p to a square root of the target number instead of comparing squares of p to the number itself. Just compute the square root once and join the fun:

fn sieveEratosthenes nr =
(
    local sieve = #{3..nr}
    for i = 4 to nr by 2 do
        sieve[i] = false

    local rootNr = nr^0.5
    local pSquared

    for p in sieve while p < rootNr do
    (
        pSquared = p^2
        for i = pSquared to nr by 2*p do
            sieve[i] = false
    )
    sieve[2] = true
    sieve
)

And a finishing touch to take care of really big numbers: we might want to cache 2*p to save some calculation cycles; also let's make it local to the for-loop, the higher in the scope it'd be, the longer the access time.

fn sieveEratosthenes nr =
(
    local sieve = #{3..nr}
    for i = 4 to nr by 2 do
        sieve[i] = false

    local rootNr = nr^0.5

    for p in sieve while p < rootNr do
    (
        local pSquared = p^2, twoP = 2*p
        for i = pSquared to nr by twoP do
            sieve[i] = false
    )
    sieve[2] = true
    sieve
)

That all looks like a lot of effort, are we done yet? Actually not, this is just the first part of the problem, this way we are able to get primes up to an arbitrary number but what we need to do is get at least a certain number of primes. Not knowing the upper boundary we could just guess; if our guess would be too small we can add to it, if it's too big it doesn't really matter as we get our prime in the resulting array anyway. But wait, there's a way to get an approximate number of primes for the given upper boundary, that should be much more effective than really sieving each and every time.

It's called the offset logarithmic integral, Li(x), and it can be expressed as Li(x) = li(x) - li(2) where li(x) is the logarithmic integral function and as we see further on in the description it can be also described in the form of series expansions. The first series expansion obtained via the exponential integral, Ei(ln x), is of a particular interest for us (well, because it's not so hard to implement), we only substitute the eu for x and u for ln x (or log x in maxscript syntax) and we basically get γ + ln (ln x) + (ln x)^1/(1 × 1!) + .. + (ln x)^∞/(∞ × ∞!) where γ is the Euler-Mascheroni gamma constant.

We can separate the start of the series with the γ, ln (ln x) and the first term which evaluates to x and then iterate over it term by term. The terms in this exponential integral function get get quite small after reaching a peak quite soon. This is due to the fact that (ln x)^n continues to grow steadily in contrast to its divisor, (n × n!). As we can't go all the way to the infinity, we need to limit the execution of the function and here we can actually make use of the imprecision of a single precision float – numbers small enough not to matter in our case can be caught by testing if they are > 0. As we iterate with a step of one, we can easily get the factorial part of the divisor by multiplying the previous value of nFact (for the first step that we've skipped we need to assign the value of 1.

fn exponentialIntegral x =
(
    local term
    local nFact = 1
    local inf = 1.0/0.
    local result = 0.577216 + log x + x

    for n = 2 to inf
        while (term = (x^n)/(n*(nFact *= n))) > 0 do
            result += term
    result
)

Now we are able to calculate Ei(x). Getting the offset logarithmic integral is a breeze. Starting with the equation Li(x) = li(x) - li(2), we can substitute li(x) for Ei(ln x) and the value of li(2) we can either calculate ourselves as well or be lazy and take it from the wikipedia article. As we want an integer in the end, we add .5 to the result and convert it to integer. To combine these two steps we can just insert a value of 0.545164, which is a rounded-off value of 0.5 - li(2).

fn offsetLogarithmicIntegral x =
    int((exponentialIntegral(log x)) - 0.545164)

This enables us to get an approximate number of primes less than a given number. To get the 10001st prime, we can start with 10001 as the value of this number, it doesn't make sense to start at one here, and then multiply this limit by 1,5 until we reach a sufficient number of supposed primes. Then we pass this number to our sieve and get the value at the 10001st index and that's the desired result.

(
    fn sieveEratosthenes nr =
    (
        local sieve = #{3..nr}
        for i = 4 to nr by 2 do
            sieve[i] = false

        local rootNr = nr^0.5

        for p in sieve while p < rootNr do
        (
            local pSquared = p^2, twoP = 2*p
            for i = pSquared to nr by twoP do
                sieve[i] = false
        )
        sieve[2] = true
        sieve
    )

    fn exponentialIntegral x =
    (
        local term
        local nFact = 1
        local inf = 1.0/0.
        local result = 0.577216 + log x + x

        for n = 2 to inf
            while (term = (x^n)/(n*(nFact *= n))) > 0 do
                result += term
        result
    )

    fn offsetLogarithmicIntegral x =
        int((exponentialIntegral(log x)) - 0.545164)

    local limit = 10001.
    while (offsetLogarithmicIntegral limit) < 10001 do
        limit *= 1.5

    ((sieveEratosthenes limit) as array)[10001]
)

While this exercise took much more time than the last one, it was certainly worth it. I'm pretty sure it can be optimized even more so if you have any suggestions let me know in the comments.

DISCLAIMER: All scripts and snippets are provided as is under Creative Commons Zero (public domain, no restrictions) license. The author and this blog cannot be held liable for any loss caused as a result of inaccuracy or error within these web pages. Use at your own risk.

This Post needs Your Comment!

Return to top