Large examples of strong pseudoprimes to several bases

A strong pseudoprime to base a is a composite number that passes the strong probable prime test (i.e. the Miller-Rabin test) in the base a. There are indications that strong pseudoprimes are rare. According to [6], there are only 4842 numbers below 25 \cdot 10^9 that are strong pseudoprimes to base 2. Strong pseudoprimes to several bases are even rarer. According to [6], there are only 13 numbers below 25 \cdot 10^9 that are strong pseudoprimes to bases 2, 3 and 5. The paper [4] tabulates the numbers below 10^{12} that are strong pseudoprimes to bases 2, 3 and 5. That listing contains only 101 numbers. These examples show that strong pseudoprimes to several bases may be rare but they do exist. In this post we discuss two rather striking examples of large numbers that are strong pseudoprimes to the first 11 prime bases for the first one and to the first 46 prime bases for the second one. One important lesson that can be drawn from these examples is that the implementation of the strong probable prime test must be randomized.

___________________________________________________________________

The strong probable prime test

Given an odd number n whose “prime versus composite” status is not known, set n-1=2^k \cdot Q where k \ge 1 and Q is odd. Then calculate the following sequence of k+1 numbers:

    \displaystyle a^Q, \ a^{2Q}, \ a^{2^2 Q}, \ \cdots, \ a^{2^{k-1} Q}, \ a^{2^{k} Q} \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ (1)

where a is some integer relatively prime to n. The first term a^Q can be efficiently calculated using the fast powering algorithm. Each subsequent term is the square of the preceding term. Each term is reduced modulo n.

If a^Q \equiv 1 \ (\text{mod} \ n) (i.e. the first term in sequence (1) is a 1) or a^{2^j \cdot Q} \equiv -1 \ (\text{mod} \ n) for some j=0,1,2,\cdots,k-1 (i.e. the term preceding the first 1 in the sequence is a -1), then n is said to be a strong probable prime to the base a. A strong probable prime to base a could be a prime or could be composite. If the latter, it is said to be a strong pseudoprime to base a. In fact, most strong probable primes to one base are prime.

The strong probable prime test consists of checking whether n is a strong probable prime to several bases. If n is not a strong probable prime to one of the bases, then n is composite for sure. If n is a strong probable prime to all of the bases being used, then n is likely a prime number in that the probability that it is composite is at most 4^{-u} if u is the number of bases.

The last probability of 4^{-u} is what makes the 337-digit number N defined below striking. Here we have a number that is a strong probable prime to 46 bases. What could go wrong in declaring it a prime number? The problem is that using a pre-determined set of bases is not a randomized implementation of the strong probable prime test.

___________________________________________________________________

Examples

The following is a 46-digit found in [3] that is a strong pseudoprime to the first 11 prime bases (the prime numbers from 2 to 31).

    M=
    1195068768795265792518361315725116351898245581

The following is a 337-digit number found in [3] that is a strong pseudoprime to all 46 prime bases up to 200 (the prime numbers from 2 to 199).

    N=
    80383745745363949125707961434194210813883768828755
    81458374889175222974273765333652186502336163960045
    45791504202360320876656996676098728404396540823292
    87387918508691668573282677617710293896977394701670
    82304286871099974399765441448453411558724506334092
    79022275296229414984230688168540432645753401832978
    6111298960644845216191652872597534901

The following sets up the calculation for the Miller-Rabin test (strong probable prime test).

    M-1=2^2 \cdot Q

    N-1=2^2 \cdot R

where Q and R are odd and

    Q=
    298767192198816448129590328931279087974561395

    R=
    20095936436340987281426990358548552703470942207188
    95364593722293805743568441333413046625584040990011
    36447876050590080219164249169024682101099135205823
    21846979627172917143320669404427573474244348675417
    70576071717774993599941360362113352889681126583523
    19755568824057353746057672042135108161438350458244
    6527824740161211304047913218149383725

___________________________________________________________________

Random bases

Though the second number N is very striking, the author of [3] has an even larger example in [2], a 397-digit Carmichael number that is a strong pseudoprime to all the 62 prime bases under 300! One lesson from these examples is that the implementation of the strong probable prime test should be randomized, or at least should include some randomly chosen bases in the testing. Any algorithm that implements the strong probable prime test in any “fixed” way (say, only checking the prime bases up to a certain limit) may incorrectly identify these rare numbers as prime.

Let’s apply the strong probable prime test on the above numbers N and M using some random bases. Consider the following randomly chosen bases b and c where 1<b<M and 1<c<N.

    b=
    932423153687800383671087185189848318498417236

    c=
    23876349986768815408041169070899917334655923628776
    58344592618224528502905948639172375368742187714892
    08287654410018942444630244906406410549094447554821
    42803219639200974486541191341820595453041950723891
    75748815383568979859763861820607576949539863746244
    98291058421101888215044176056586791235119485393994
    789287924346696785645273545040760136

___________________________________________________________________

The calculation using random bases

Here’s the calculation for the 46-digit number M.

    b^Q \equiv t_1 \ (\text{mod} \ M)

    b^{2 \cdot Q} \equiv t_2 \ (\text{mod} \ M)

    b^{M-1}=b^{4 \cdot Q} \equiv t_3 \ (\text{mod} \ M)

where t_1, t_2 and t_3 are:

    t_1=
    1042866890841899880275968177787239559549445173

    t_2=
    826876320842405260107866407865475914456579581

    t_3=
    1195068768795265792518263537659322626328455738

Note that the last number t_3 is not a 1. So the number M is not a strong probable prime to base b. This means that it is composite.

Here’s the calculation for the 337-digit number N.

    c^R \equiv g_1 \ (\text{mod} \ N)

    c^{2 \cdot R} \equiv g_2 \ (\text{mod} \ N)

    c^{N-1}=c^{4 \cdot R} \equiv g_3 \ (\text{mod} \ N)

where g_1, g_2 and g_3 are:

    g_1=
    43050290968654965761881145696359381339174664947842
    93659429396009893693594328847691223585119425166890
    43134041173054778367051375333950357876719375530986
    40705386242996844394887879855798166233504226845778
    76290707027478869178569806270616567220414388766208
    75314254126730991658967210391794715621886266557484
    525788655243561737981785859480518172

    g_2=
    69330060289060873891683879069908453474713549543545
    79381982871766126161928753307995063953670075390874
    26152164526384520359363221849834543643026694082818
    11219470237408138833421506246436132564652734265549
    18153992550152009526926092009346342470917684117296
    93655167027805943247124949639747970357553704408257
    9853042920364675222045446207932076084

    g_3=
    80383745745363949125707961434194210813883768828755
    81458374889175222974273765333652186502336163960045
    45791504202360320876656996676098728404396540823292
    87387918508691668493091034289810372813316104284761
    45244183107779151749589951324358651494383103941607
    35777636976785468267798055151468799251924355070195
    1772723904683953622590747688533861698

Interestingly, the last number g_3 agrees with the modulus N from the first digit (the largest) to digit 167. It is clear that g_3 is clearly not a 1. So the 337-digit number N is not a strong probable prime to base c, meaning it is composite. Even though the number M is a strong pseudoprime to all of the first 46 prime bases, it is easy to tell that it is composite by using just one random base. This calculation demonstrates that it is not likely to make the mistake of identifying large pseudoprimes as prime if randomized bases are used in the strong probable prime test.

___________________________________________________________________

Some verification

We also verify that the 46-digit M is a strong pseudoprime to the first 11 prime bases.

    \left[\begin{array}{rrrrrrr}      a & \text{ } & a^Q \ (\text{mod} \ M) & \text{ }  & a^{2Q} \ (\text{mod} \ M) & \text{ } & a^{4Q} \ (\text{mod} \ M) \\           \text{ } & \text{ } & \text{ }  \\            2 & \text{ } & * & \text{ } & -1 & \text{ } & 1  \\      3 & \text{ } & 1 & \text{ } & 1 & \text{ } & 1  \\      5 & \text{ } & -1 & \text{ } & 1 & \text{ } & 1  \\      7 & \text{ } & * & \text{ } & -1 & \text{ } & 1  \\      11 & \text{ } & * & \text{ } & -1 & \text{ } & 1  \\        13 & \text{ } & * & \text{ } & -1 & \text{ } & 1  \\      17 & \text{ } & * & \text{ } & -1 & \text{ } & 1  \\      19 & \text{ } & * & \text{ } & -1 & \text{ } & 1  \\      23 & \text{ } & * & \text{ } & -1 & \text{ } & 1  \\      29 & \text{ } & * & \text{ } & -1 & \text{ } & 1  \\        31 & \text{ } & * & \text{ } & -1 & \text{ } & 1             \end{array}\right]

The asterisks in the above table mean that those cells have numbers that are \not \equiv \pm 1 modulo M. Clearly the table shows that the number M passes the strong probable prime test for these 11 bases. We also verify that M is not a strong probable prime to base 37, the next prime base.

    37^Q \equiv w_1 \ (\text{mod} \ M)

    37^{2 \cdot Q} \equiv w_2 \ (\text{mod} \ M)

    37^{M-1}=b^{4 \cdot Q} \equiv w_3 \ (\text{mod} \ M)

where

    w_1=
    977597583337476418144488003654858986215112009

    w_2=
    368192447952860532410592685925434163011455842

    w_3=
    1195068768795265792518263537659322626328455738

Note that the last number w_3 agrees with the modulus M for the first 22 digits and they differ in the subsequent digits. It is clear that w_3 is not 1. So the strong pseudoprimality of M stops at the base 37.

We do not verify the number N for all the 46 prime bases. We only show partial verification. The following calculation shows the pseudoprimality of N to bases 197 and 199, and that the strong pseudoprimality of N fails at 211, the next prime base.

    197^R \equiv * \ (\text{mod} \ N)

    197^{2 \cdot R} \equiv -1 \ (\text{mod} \ N)

    197^{N-1}=197^{4 \cdot R} \equiv 1 \ (\text{mod} \ N)

    __________________

    199^R \equiv * \ (\text{mod} \ N)

    199^{2 \cdot R} \equiv -1 \ (\text{mod} \ N)

    199^{N-1}=199^{4 \cdot R} \equiv 1 \ (\text{mod} \ N)

    __________________

    211^R \equiv v_1 \ (\text{mod} \ N)

    211^{2 \cdot R} \equiv v_2 \ (\text{mod} \ N)

    211^{N-1}=211^{4 \cdot R} \equiv 1 \ (\text{mod} \ N)

where

    v_1=
    48799236892584399744277334997653638759429800759183
    35229821626086043683022014304157526246067279138485
    99367014952239377476103536546259094903793421522217
    99291447356172114871135567262925519534270746139465
    22832800077663455346130103616087329184090071367607
    57478119260722231575606816699461642864577323331271
    2974554583992816678859279700007174348

    v_2=
    80191643327899921083661290416909370601037633208226
    50175490124094760064341402392485432446383194439467
    16432633017071633393829046762783433857505596089159
    3600905184063673203

Note that v_2 is clearly not congruent to -1. Thus the number N is not a strong pseudoprime to base 211 (though it is a pseudoprime to base 211). Clearly, the above calculation indicates that the number N is a strong pseudoprime to bases 197 and 199.

___________________________________________________________________

Remark

Because strong pseudoprimes are rare (especially those that are strong pseudoprimes to several bases), they can be used as primality tests. One idea is to use the least pseudoprimes to the first n prime bases [5]. This method is discussed in this previous post.

Another idea is to use the listing of strong pseudoprimes to several bases that are below a bound. Any number below the bound that is strong probable primes to these bases and that is not on the list must be a prime. For example, Table 1 of [4] lists 101 strong pseudoprimes to bases 2, 3 and 5 that are below 10^{12}. This test is fast and easy to use; it requires only three modular exponentiations to determine the primality of numbers less than 10^{12}.

The numbers M and N are not Carmichael numbers since b^{M-1} \not \equiv 1 \ (\text{mod} \ M) and c^{N-1} \not \equiv 1 \ (\text{mod} \ N), making the random numbers b and c Fermat witnesses for these two numbers respectively. Another way to see that they are not Carmichael is that each of the two number M and N is also a product of two distinct primes according to [3].

An even more striking result than the examples of M and N is that it follows from a theorem in [1] that for any finite set of bases, there exist infinitely many Carmichael numbers which are strong pseudoprimes to all the bases in the given finite set. This is discussed in the next post.

___________________________________________________________________

Reference

  1. Alford W., Granville A., and Pomerance C., On the difficulty
    of finding reliable witnesses
    , L. Adleman and M.-D. Huang, editors, Algorithmic Number Theory: Proc. ANTS-I, Ithaca, NY, volume 877 of
    Lecture Notes in Computer Science, pages 1–16. Springer–Verlag, 1994.
  2. Arnault F., Constructing Carmichael numbers which are strong pseudoprimes to several bases, J. Symbolic Computation, 20, 151-161, 1995.
  3. Arnault F., Rabin-Miller primality test: composite numbers that pass it, Math. Comp., Volume 64, No. 209, 355-361, 1995.
  4. Jaeschke G., On strong pseudoprimes to several bases, Math. Comp., Volume 61, No. 204, 915-926, 1993.
  5. Jiang Yupeng, Deng Yingpu, Strong pseudoprimes to the first 9 prime bases, arXiv:1207.0063v1 [math.NT], June 30, 2012.
  6. Pomerance C., Selfridge J. L., Wagstaff, S. S., The pseudoprimes to 25 \cdot 10^9, Math. Comp., Volume 35, No. 151, 1003-1026, 1980.

___________________________________________________________________
\copyright \ \ 2014 \ \text{Dan Ma}

Looking for a good witness

For some primality tests, the approach of proving a number as composite is to produce an auxiliary number that shows that a property of prime numbers is violated. Such an auxiliary number is called a “witness” for compositeness. When using such a primality test on a composite number, for the test to be correct, at least one “witness” must be found. In this post, we compare the Fermat primality test and the Miller-Rabin test from the standpoint of producing “witnesses”. In this regard, the Fermat test works correctly most of the time. But there is a rare but infinite class of composite numbers for which no Fermat test witness can be found. On the other hand, using the Miller-Rabin test on a composite number, at least one witness can be found (in fact, there are lots of them).

___________________________________________________________________

Fermat witnesses

According to Fermat’s little theorem, if n is a prime number, then the following congruence holds for all numbers a that are coprime to n.

    \displaystyle a^{n-1} \equiv 1 \ (\text{mod} \ n) \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ (1)

If there exists a number a such that a^{n-1} \not \equiv 1 \ (\text{mod} \ n), then n is proved to be composite, in which case the number a is said to be a Fermat witness for the compositeness of n. If the congruence (1) holds for several values of a, then n is a likely a prime.

Suppose n is a large odd number whose “prime versus composite” status is not known. If the congruence (1) holds for some positive integer a, then n is said to be a probable prime to base a. A probable prime to the base a could be prime or could be composite. If the latter, the number n is said to be a pseudoprime to base a.

This is how the Fermat test (or probable prime test) works. Given that n is a large odd number whose “prime versus composite” status is not known, if n is not a probable prime to one base a, then n is proven to be composite and a is a Fermat witness for n. On the other hand, if n is a probable prime to several randomly selected bases, then n is a probable prime.

Suppose that you apply the Fermat test on the number n and find that n is a probable prime to several bases. There are two possibilities. Either n is prime or is a pseudoprime to the several bases that are being used. Pseudoprimess to one base are rare. Pseudoprimes to several bases at once are even rarer. For example, according to [3], there are only 21853 pseudoprimes to the base 2 that are less than 25 \cdot 10^9. The number of primes below 25 \cdot 10^9 is roughly 10^9. So most probable primes to base 2 are primes. Of the 21853 pseudoprimes to the base 2 that are below 25 \cdot 10^9, 4709 of them are pseudoprimes to the bases 2 and 3, 2522 of them are pseudoprimes to the bases 2, 3 and 5, and 1770 of them are pseudoprimes to the bases 2, 3, 5 and 7. Thus there is indication that the numbers that are pseudoprimes to several bases are very rare. Therefore, the number n being a simultaneous pseudoprime to several bases is very strong evidence that n is prime. But this is not the end of the story for the Fermat test.

The Fermat test can detect the compositeness of most composite numbers. These are the numbers n that are probable prime to one base but are not probable prime to another base. For example, 341 is a probable prime to base 2 since 2^{340} \equiv 1 \ (\text{mod} \ 341). On the other hand, 3^{340} \equiv 56 \ (\text{mod} \ 341). Thus 341 is not a probable prime to base 3 (i.e. 3 is a Fermat witness for the compositeness of 341). For integers like 341, they may be probable primes to one base, but they are not probable primes to another base. If there exists at least one Fermat witness for a composite number, there is a good chance that the Fermat test will detect the compositeness.

A odd composite number n is said to be a Carmichael number if the congruence (1) holds for all numbers a coprime to n. The smallest Carmichael number is 561. There is no danger of mistaking a small Carmichael number such as 561 as prime because factoring is possible. Carmichael numbers are rare but there are infinitely of them. So there are large Carmichael numbers (e.g. with hundreds of digits or thousands of digits). The Fermat test fails completely with these large Carmichael numbers. In other words, no matter how many bases are used, it is impossible to find a number a that will witness the compositeness of a Carmichael number. Unlike the number 341, the Fermat test will fail to detect the compositeness of a Carmichael number.

___________________________________________________________________

Looking for a better type of witness

Let n be an odd number with n-1=2^k \cdot Q where k \ge 1 and Q is odd. Then compute the following sequence of k+1 numbers modulo n:

    \displaystyle a^Q, \ a^{2Q}, \ a^{2^2 Q}, \ \cdots, \ a^{2^{k-1} Q}, \ a^{2^{k} Q} \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ (2)

where a is some number coprime to n. The first term in the sequence can be calculated by using the fast powering algorithm. Each subsequent term is the square of the preceding one. Of course, the last term is the same as a^{n-1}, the calculation for (1). Then if this calculation is done on a prime number n, the last term in (2) is always a 1. But more can be said about the sequence (2).

The better witnesses and primality test we discuss here are based on this theorem about prime numbers.

    If n is a prime number, then either one of the following conditions holds:

    • \displaystyle a^{Q} \equiv 1 \ (\text{mod} \ n) \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ (2a)
    • \displaystyle a^{2^j \cdot Q} \equiv -1 \ (\text{mod} \ n) for some j=0,1,2,\cdots, k-1 \ \ \ \ \ \ \ \ (2b)

    \text{ }

    for each number a that is coprime to n.

The theorem just mentioned is the basis of the Miller-Rabin test. If there exists some a such that both (2a) and (2b) are not true, then the number n is proved to be composite, in which case the number a is said to be a witness for the compositeness of n. So a witness from the Fermat test standpoint is called Fermat witness. The term witness here refers to the calculation of (2). If the conditions (2a) or (2b) holds for several values of a, then n is likely a prime number.

Suppose n is a large odd number whose “prime versus composite” status is not known. If the conditions (2a) or (2b) holds for some positive integer a, then n is said to be a strong probable prime to base a. A strong probable prime to the base a could be prime or could be composite. If the latter, the number n is said to be a strong pseudoprime to base a.

This is how the Miller-Rabin test (or strong probable prime test) works. Given that n is a large odd number whose “prime versus composite” status is not known, if n is not a strong probable prime to one base a, then n is proved to be composite and the number a is a witness for a. If n is a strong probable prime to several randomly chosen bases, then the probability that n is composite is smaller than 4^{-k} (if k bases are used). In other words, the error probability can be made arbitrarily small by using more random bases in the calculation for sequence (2).

The last point about the small error probability is the most important advantage of the Miller-Rabin test over the Fermat test. Let’s look at a small example first. Recall that 561 is the smallest Carmichael number. Note that 560=2^4 \cdot 35. The following is the calculation for (2).

    2^{35} \equiv 263 \ (\text{mod} \ 561)

    2^{2 \cdot 35} \equiv 166 \ (\text{mod} \ 561)

    2^{4 \cdot 35} \equiv 67 \ (\text{mod} \ 561)

    2^{8 \cdot 35} \equiv 1 \ (\text{mod} \ 561)

    2^{16 \cdot 35} \equiv 1 \ (\text{mod} \ 561)

The number 561 is a probable prime to base 2 (from a Fermat perspective). Notice that the condition (2a) is not met. The condition (2b) is also not met since the term preceding the first 1 is 67, which is not congruent to -1. So the number 2 is a witness for the compositeness of 561. The Fermat test will have virtually no chance of detecting the compositeness of 561. Yet the strong probable prime test takes only one calculation for 561. In this case at least, it is pretty easy to find a witness.

___________________________________________________________________

At least one witness can be found

One characteristic of the strong probable prime test is that there are lots of witnesses when testing composite numbers. In fact, if n is a composite number, over 75% of the possible choices of bases a are witnesses for n. We prove a simpler theorem that if n is composite, at least one witness for the compositeness of n can be found. Even this much weaker theorem is a big improvement over the Fermat test.

Theorem 1
Let n be a composite positive odd number. Then there a number a that is coprime to n such that n is not a strong pseudoprime to base a.

Proof of Theorem 1
Let n=u \cdot v such that u,v are odd and are coprime. Consider the following two congruence equations.

    x \equiv 1 \ (\text{mod} \ u)

    x \equiv -1 \ (\text{mod} \ v)

By the Chinese remainder theorem (CRT), there is an a such that

    a \equiv 1 \ (\text{mod} \ u)

    a \equiv -1 \ (\text{mod} \ v)

We can assume that 1<a<n. If not, reduce a modulo n. Then a \not \equiv 1 \ (\text{mod} \ n). If a \equiv 1 \ (\text{mod} \ n), then a=1+n \cdot t=1+u \cdot v \cdot t for some t. This would mean that a \equiv 1 \ (\text{mod} \ v), contradicting a \equiv -1 \ (\text{mod} \ v). Similarly a \not \equiv -1 \ (\text{mod} \ n). So we have a \not \equiv \pm 1 \ (\text{mod} \ n). On the other hand, we have

    a^2 \equiv 1 \ (\text{mod} \ u)

    a^2 \equiv 1 \ (\text{mod} \ v)

By CRT, a^2 \equiv 1 \ (\text{mod} \ n). Let n-1=2^k \cdot Q where k \ge 1 and Q odd. Let Q=2w+1 for some integer w. The following calculates a^Q and a^{2 \cdot Q}.

    a^Q=a^{2w} \cdot a = (a^2)^w \cdot a \equiv 1^w \cdot a \equiv a \not \equiv \pm 1 \ (\text{mod} \ n)

    a^{2 \cdot Q} \equiv (a^2)^Q \equiv (1)^Q \equiv 1 \ (\text{mod} \ n)

The first term in (2) that is a 1 is a^{2 \cdot Q}. The preceding term is not a -1. This shows that the composite number n is not a strong pseudoprime to base a. Thus for any composite number n, there always exists a witness for the compositeness of n. \blacksquare

The proof of Theorem 1 is essentially that if n is composite, there is a square root other than \pm 1. This nontrivial square root is a witness to the compositeness of n.

___________________________________________________________________

The least witness

If n is a composite number, there exists a witness for n according to Theorem 1. Of all the witnesses for a given composite number n, there must be the smallest one. Let w(n) be the least witness for the compositeness for the composite number n. The number w(n) is an interesting one.

In general, the larger w(n) is, the harder to find the number n. This stems from the fact that strong pseudoprimes are very rare. Most composite numbers are not strong pseudoprimes to base 2. For these composite numbers, w(n)=2. For the strong pseudoprimes to base 2, w(n)>2. For w(n)>3, the composite numbers must be strong pseudoprimes to base 2 and base 3. For w(n)>5, the composite numbers must be strong pseudoprimes to the prime bases 2, 3 and 5. There are only 13 such numbers below 25 \cdot 10^9. Thus it is not easy to find composite numbers that have large least witnesses.

Two more comments to make. One is that knowing the least integer n such that w(n) is greater than the first k prime numbers can be as a deterministic primality test [2]. This is discussed in the next post. The other is that even though a large w(n) means that the composite numbers n are rare (under a specific bound). But there are infinitely many of them (over the entire number line). It follows from a theorem in [1] that for any finite set of integers, there are infinitely many Carmichael numbers whose w(n) is not found in the finite set.

___________________________________________________________________

Reference

  1. Alford W., Granville A., and Pomerance C., On the difficulty
    of finding reliable witnesses
    , L. Adleman and M.-D. Huang, editors, Algorithmic Number Theory: Proc. ANTS-I, Ithaca, NY, volume 877 of
    Lecture Notes in Computer Science, pages 1–16. Springer–Verlag, 1994.
  2. Jiang Yupeng, Deng Yingpu, Strong pseudoprimes to the first 9 prime bases, arXiv:1207.0063v1 [math.NT], June 30, 2012.
  3. Pomerance C., Selfridge J. L., Wagstaff, S. S., The pseudoprimes to 25 \cdot 10^9, Math. Comp., Volume 35, 1003-1026, 1980.

___________________________________________________________________
\copyright \ \ 2014 \ \text{Dan Ma}

The first prime number after the 8th Fermat number

In this post, we discuss a primality testing exercise involving the eighth Fermat number. A Fermat number is of the form F_n=2^{2^n}+1 where n is any nonnegative integer. We search for the first prime number that is greater than F_8. The basic idea is to search for the first probable prime base 2 among the odd numbers after F_8. Once the first probable prime base 2 is identified, we apply the Miller-Rabin primality test to confirm that it is a prime number. At the outset of this exercise, we did not know how many numbers we had to check before reaching the first prime number.

The first five Fermat numbers F_0, F_1, F_2, F_3 and F_4 are the only Fermat numbers that are known to be prime (it was conjectured by Fermat that all Fermat numbers are prime). It is unknown whether there exists prime Fermat number beyond F_4. What is clear, however, is that all the higher Fermat numbers that were studied turn out to be composite. The 8th Fermat number F_8 has 78 decimal digits with two factors with 16 and 62 digits (it was factored in 1961). The largest Fermat number that has been completely factored (as of the writing of this post) is F_{11} which has 617 decimal digits. Many Fermat numbers larger than F_{11} have been partially factored.

___________________________________________________________________

The basic approach

The following is the number 2^{256}, which has 78 decimal digits.

    2^{256}=
    11579208923731619542357098500868790785326998466564
    0564039457584007913129639936

Define P_j=2^{256}+j where j is an odd positive integer, i.e., j=1,3,5,7,\cdots. The exercise is to find the smallest j such that P_j is a prime number. According to Euclid’s proof that there are infinitely many prime numbers, such a P_j is sure to exist. Just that we do not know at the outset how far we have to go to find it. Of course, P_1 is the 8th Fermat number, which is a composite number with two prime factors with 16 and 62 decimal digits. So the search starts with j=3.

The key is to do the following two quick checks to eliminate composite numbers so that we can reach a probable prime as quickly as possible.

  • For any given P_j, the first step is to look for small prime factors, i.e., to factor P_j using prime numbers less than a bound B. If a small prime factor is found, then we increase j by 2 and start over. Note that we skip any P_j where the sum of digits is divisible by 3. We also skip any P_j that ends with the digit 5.
  • If no small factors are found, then compute the congruence 2^{P_j-1} \ (\text{mod} \ P_j). If the answer is not congruent to 1, then we know P_j is composite and work on the next number. If 2^{P_j-1} \equiv 1 \ (\text{mod} \ P_j), then P_j is said to be a probable prime base 2. Once we know that a particular P_j is a probable prime base 2, it is likely a prime number. To further confirm, we apply the Miller-Rabin primality test on that P_j.

In the first check, we check for prime factors among the first 100 odd prime numbers (i.e. all odd primes up to and including 547).

___________________________________________________________________

Searching the first probable prime

At the outset, we did not know how many numbers we will have to check. Since there can be a long gap between two successive prime numbers, the worse fear is that the number range we are dealing with is situated in such a long gap, in which case we may have to check thousands of numbers (or even tens of thousands). Luckily the search settles on a probable prime rather quickly. The magic number is 297. In other words, for the number

    P_{297}=2^{256}+297

, we find that 2^{P_{297}-1} \equiv 1 \ (\text{mod} \ P_{297}). Thus P_{297} is a probable prime in base 2. The following shows the decimal digits of P_{297}.

    P_{297}=
    11579208923731619542357098500868790785326998466564
    0564039457584007913129640233

To further give a sense of how the magic number P_{297} is reached, the following table lists the 25 calculations leading to the magic number.

    \left[\begin{array}{rrrrrrr}      j & \text{ } & \text{last 5 digits of } P_j & \text{ }  & \text{least factor of } P_j & \text{ } & 2^{P_j-1} \ \text{mod} \ P_j \\           \text{ } & \text{ } & \text{ }  \\                      259 & \text{ } & 40195 & \text{ } & 5 & \text{ } & \text{ }  \\      261 & \text{ } & 40197 & \text{ } & * & \text{ } & \not \equiv 1  \\      263 & \text{ } & 40199 & \text{ } & 3 & \text{ } & \text{ }  \\      265 & \text{ } & 40201 & \text{ } & * & \text{ } & \not \equiv 1  \\      267 & \text{ } & 40203 & \text{ } & * & \text{ } & \not \equiv 1  \\        269 & \text{ } & 40205 & \text{ } & 3 & \text{ } & \text{ }  \\      271 & \text{ } & 40207 & \text{ } & 7 & \text{ } & \text{ }  \\      273 & \text{ } & 40209 & \text{ } & * & \text{ } & \not \equiv 1  \\      275 & \text{ } & 40211 & \text{ } & 3 & \text{ } & \text{ }  \\      277 & \text{ } & 40213 & \text{ } & 11 & \text{ } & \text{ }  \\        279 & \text{ } & 40215 & \text{ } & 5 & \text{ } & \text{ }  \\      281 & \text{ } & 40217 & \text{ } & 3 & \text{ } & \text{ }  \\      283 & \text{ } & 40219 & \text{ } & 13 & \text{ } & \text{ }  \\      285 & \text{ } & 40221 & \text{ } & 7 & \text{ } & \text{ }  \\      287 & \text{ } & 40223 & \text{ } & 3 & \text{ } & \text{ }  \\        289 & \text{ } & 40225 & \text{ } & 5 & \text{ } & \text{ }  \\      291 & \text{ } & 40227 & \text{ } & 23 & \text{ } & \text{ }  \\      293 & \text{ } & 40229 & \text{ } & 3 & \text{ } & \text{ }  \\      295 & \text{ } & 40231 & \text{ } & 71 & \text{ } & \text{ }  \\      297 & \text{ } & 40233 & \text{ } & * & \text{ } & \equiv 1      \end{array}\right]

The first number is in the table P_{259} ends in a 5 and is thus composite. The third number P_{263} is composite since the sum of the digits of its is divisible by 3. The third column of the above table shows the least prime factor below 547 (if one is found). An asterisk in the third column means that none of the prime numbers below 547 is a factor. For such numbers, we compute the modular exponentiation 2^{P_j-1} \ (\text{mod} \ P_j).

In the above table, 4 of the asterisks lead to the result 2^{P_j-1} \not \equiv 1 \ (\text{mod} \ P_j). These numbers P_j are thus composite. For example, for P_{273}, the following is the result:

    2^{P_{273}-1} \ (\text{mod} \ P_{273}) \equiv
    55365573520609500639906523255562025480037454102798
    631593548187358338340281435

The last number P_{297} in the table is a probable prime base 2 since our calculation shows that 2^{P_{297}-1} \equiv 1 \ (\text{mod} \ P_{297}). Being a probable prime to base 2 is actually very strong evidence that the number is a prime number. We want even stronger evidence that P_{297} is a prime. For example, we can carry out the Miller-Rabin test in such a way that the probability of mistaking a composite number as prime is at most one in a septillion! A septillion is the square of a trillion. A trillion is 10^{12}. Thus a septillion is 10^{24}. One in a septillion is for all practical purposes zero. But if one wants more reassurance, one can always run the Miller-Rabin test with more bases.

___________________________________________________________________

The Miller-Rabin primality test

The Miller-Rabin test is a variant of the Fermat test because Miller-Rabin still relies on Fermat’s little theorem. But Miller-Rabin uses Fermat’s little theorem in such a way that it eliminates the issue of the Fermat test mistakenly identifying Carmichael numbers as prime.

Given an odd positive integer whose “prime or composite” status is not known, the Miller-Rabin test will output “composite” or “probable prime”. Like the Fermat test, the Miller-Rabin test calculates a^{n-1} \ (\text{mod} \ n) for several values of a. But the test organizes the congruence a^{n-1} \ (\text{mod} \ n) a little differently to capture additional information about prime numbers.

Here’s how to set up the calculation for Miller-Rabin. Because n is odd, n-1 is even. We can factor n-1 as a product of a power of 2 and an odd number. So we have n-1=2^k \cdot q where k \ge 1 and q is odd (q may not be prime). Then we calculate the following sequence:

    a^q, \ a^{2 \cdot q}, \ a^{2^2 \cdot q}, \cdots, a^{2^{k-1} \cdot q}, \ a^{2^{k} \cdot q} \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ (1)

The first term in (1) can be calculated using the fast powering algorithm (using the binary expansion of q to convert the calculation of a^q into a series of squarings and multiplications). Each subsequent term is then the square of the preceding term. The last term is of course a^{n-1}. Each squaring or multiplication is reduced modulo n. The Miller-Rabin test is based on the following property of prime numbers:

    Theorem 1
    Let n be an odd prime number such that n-1=2^k \cdot q where k \ge 1 and q is odd. Let a be a positive integer not divisible by n. Then the following two conditions are true about the sequence (1).

    • At least one term in the sequence (1) is congruent to 1 modulo n.
    • Either the first term in (1) is congruent to 1 modulo n or the term preceding the first 1 is congruent to -1 modulo n.

How the Miller-Rabin test works
Suppose that the “prime or composite” status of an odd integer n is not known. If both conditions in the above theorem are satisfied with respect to the number a, then n is said to be a strong probable prime in base a. If a strong probable prime in base a happens to be composite, then it is said to be a strong pseudoprime in base a. In other words, a strong pseudoprime is a composite number that possesses a prime-like property, namely it satisfies the two conditions in Theorem 1 with respect to one base a.

The test procedure of Miller-Rabin is to check whether n is a strong probable prime to several bases that are randomly chosen. The following determines the outcome of the test:

  • If n is not a strong probable prime in one of the chosen bases, then n is proved to be composite.
  • If n is shown to be a strong probable prime in all the chosen bases (say there are k of them), then n is “probably prime” with an error probability of at most 0.25^k.

To prove the integer n is composite, we look for a base a for which n is not a strong probable prime. Such a value of a is also called a Miller-Rabin witness for the compositeness of n. For primality, the Miller-Rabin test does not give a mathematical proof that a number is prime. The Miller-Rabin test is a probable prime test. It gives strong evidence that n is a prime number, with an error probability that can be made arbitrarily small by using a large random sample of values of a.

Take the prime candidate P_{297} that is discussed above. We plan to run the Miller-Rabin test on P_{297} using 40 random values of a where 1<a<P_{297}. If P_{297} is shown to be a strong probable prime in all 40 bases, then the prime candidate P_{297} is likely a prime number with an error probability of at most 0.25^{40}. This probability works out to be less than 1 in 10 raised to 24 (hence the one in a septillion that is mentioned earlier). If one wants stronger evidence, we can compute for more values of a. Thus if P_{297} is in actuality a composite number, there is at most a one in septillion chance that the Miller-Rabin test will declare P_{297} is a prime number.

How can the Miller-Rabin test make the claim of having such a small error probability? The fact the the error probability of Miller-Rabin can be made arbitrarily small stems from the following fact.

    Theorem 2
    Suppose that n is a composite odd number. At most 25% of the numbers in the interval 1<a<n are bases in which n is a strong pseudoprime. Putting it in another way, at least 75% of the numbers in 1<a<n are bases in which n is not a strong pseudoprime.

To paraphrase Theorem 2, if n is composite to begin with, at least 75% of the numbers in 1<a<n will prove its compositeness. That means that at most 25% of the numbers a will exhibit the prime-like property described in Theorem 1. The power of Miller-Rabin comes from the fact that for composite numbers there are more values of a that will give a correct result (in fact, at least 3 times more).

Thus if you apply the Miller-Rabin test on a composite number n, you will bound to stumble on a base a that will prove its compositeness, especially if the bases are randomly chosen. Any random choice of a where 1<a<n has at least a 75% chance of being correct on the composite number n. In a series of 100 random choices of a, it will be hard to miss such values of a. The only way that Miller-Rabin can make a mistake by declaring a composite number as prime is to pick all the values of a from the (at most) 25% of the pool of values of a that are strong pseudoprime prime. This probability is bounded by 0.25^k (if k is the number of selections of a).

___________________________________________________________________

Applying Miller-Rabin on the prime candidate

The first task is to factor P_{297}-1. We find that P_{297}-1=2^3 \times q where q is the following odd number:

    q=
    14474011154664524427946373126085988481658748083205
    070504932198000989141205029

For each randomly selected a, we calculate the following sequence:

    a^q, \ a^{2q}, \ a^{4q}, \ a^{8q} \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ (2)

The first term is calculated using the fast powering algorithm (a series of squarings and multiplications). Each subsequent term is the square of the preceding term. Each term in the sequence is reduced modulo P_{297}. The goal is to see if the two conditions in Theorem 1 are satisfied. One is that one of the 4 values in (2) is a 1. The other is that the term preceding the first 1 in (2) has to be a -1.

The following shows the 40 numbers that are randomly chosen in the interval 1<a<P_{297}.

    a_1=
    03006957708701194503849170682264647623506815369915
    7798209693214442533348380872
    a_2=
    02223067440101780765895379553626469438082041828085
    0568523022714143509352911267
    a_3=
    04531895131849504635258523281146698909008537921009
    6337435091877410129499153591
    a_4=
    05434508269932993379745836263818598804800824522102
    0278113825689716192402178622
    a_5=
    08799241442673378780142202326330306495270149563840
    3866810486309815815031353521
    a_6=
    02638607393577034288802880492058261281940769238659
    8928068666401909247319838064
    a_7=
    04283430251977183138176255955338099404217762991191
    9192783003754562986178981473
    a_8=
    09773398144692973692102006868849010147546139698798
    3443958657834362269077067224
    a_9=
    05504666974469005713839308880951115507992521746498
    7157086751623602877205126361
    a_{10}=
    11369425784373951812019794994427515082375862595853
    6524984616385315102874812557
    a_{11}=
    11280428157869817083329641054154150272024966029283
    2165114734540900026838117128
    a_{12}=
    11208322317253928483879618989535357346499197200982
    7728283667193655956607063861
    a_{13}=
    05585951853297694372636067012444311272073854408338
    4421611399136081624631900538
    a_{14}=
    06831924581003106427566658433259804779354874917795
    9811865334330929987281859876
    a_{15}=
    07339174229323952008915772840377019251465052264221
    1294344032116313026124007734
    a_{16}=
    05117387267263929174559713098463717229625661656017
    7194611080485470890280573816
    a_{17}=
    06599941646668915168578091934085890873056463577356
    8090503454939353325803291530
    a_{18}=
    07545265152740184887140788322673806569482388835389
    5577110370797470603035554930
    a_{19}=
    02591621894664804222839429868664505564743756550515
    2520842332602724614579447809
    a_{20}=
    04791002227899384351266879075743764807247161403811
    8767378458621521760044966007
    a_{21}=
    03251071871924939761772100645669847224066002842238
    6690935371046248267119967874
    a_{22}=
    07211128555514235391448579740428274673170438137060
    9390617781010839144521896079
    a_{23}=
    02839820419745979344283855308465698534375525126267
    1701870835230228506944995955
    a_{24}=
    06304631891686637702274634195264042846471748931602
    4893381338158934204519928855
    a_{25}=
    06492095235781034422561843267711627481401158404402
    2978856782776323231230432687
    a_{26}=
    11078868891712009912929762366314190797941038596568
    5459274315695355251764942151
    a_{27}=
    05795069944009506186885816367149671702413127414386
    2708093175566185349033983346
    a_{28}=
    01712922833914010148104423892201355622294341143990
    7524285008693345292476544524
    a_{29}=
    09743541325262594740093734822046739122734773994479
    9814337973200740861495044676
    a_{30}=
    02503872375817370838455279068302037475992008315394
    2976462871038003917493744995
    a_{31}=
    06980677383898331402575574511880992071872803011356
    6498794763450065008785347168
    a_{32}=
    01507075889390134242331585173319278262699562685820
    7121480322563439665642035394
    a_{33}=
    02471785068822350832987019936892052187736451275830
    5372059292781558599916131031
    a_{34}=
    10950891460180297156465120507537244257810396062906
    9207306297501015755045004254
    a_{35}=
    11052976297188507170707306917942099264941855478856
    2965936913589165233381674539
    a_{36}=
    03911878231948499128291863266472008604449261315172
    1053813631612297577166335941
    a_{37}=
    06903294587603383022211116535092146484651980588002
    9291840261276683214113088012
    a_{38}=
    03942020579038616658412018517396703874933208670283
    3087287933190554281896471934
    a_{39}=
    04338728160253711124705740270085271024911573570055
    1690460857511205663297661796
    a_{40}=
    06707597137792150532106913489524457238449067437061
    7211249957355483821516113140

For each random number a_j, we calculated the 4 numbers indicated in sequence (2). The following 3 tables show the results of the calculation.

    \left[\begin{array}{rrrrrrrrr}      j & \text{ } & a_j^q & \text{ }  & a_j^{2q} & \text{ } & a_j^{4q} & \text{ } & a_j^{8q} \\           \text{ } & \text{ } & \text{ }  \\                  1 & \text{ } & \text{ } & \text{ } & -1 & \text{ } & 1 & \text{ } & 1  \\      2 & \text{ } & \text{ } & \text{ } & \text{ } & \text{ } & -1 & \text{ } & 1  \\      3 & \text{ } & 1 & \text{ } & 1 & \text{ } & 1 & \text{ } & 1  \\      4 & \text{ } & \text{ } & \text{ } & \text{ } & \text{ } & -1 & \text{ } & 1  \\      5 & \text{ } & -1 & \text{ } & 1 & \text{ } & 1 & \text{ } & 1  \\        6 & \text{ } & \text{ } & \text{ } & -1 & \text{ } & 1 & \text{ } & 1  \\      7 & \text{ } & \text{ } & \text{ } & -1 & \text{ } & 1 & \text{ } & 1  \\      8 & \text{ } & \text{ } & \text{ } & \text{ } & \text{ } & -1 & \text{ } & 1  \\      9 & \text{ } & \text{ } & \text{ } & -1 & \text{ } & 1 & \text{ } & 1  \\      10 & \text{ } & \text{ } & \text{ } & \text{ } & \text{ } & -1 & \text{ } & 1  \\        11 & \text{ } & \text{ } & \text{ } & \text{ } & \text{ } & -1 & \text{ } & 1  \\      12 & \text{ } & \text{ } & \text{ } & \text{ } & \text{ } & -1 & \text{ } & 1  \\      13 & \text{ } & -1 & \text{ } & 1 & \text{ } & 1 & \text{ } & 1  \\      14 & \text{ } & \text{ } & \text{ } & \text{ } & \text{ } & -1 & \text{ } & 1  \\      15 & \text{ } & \text{ } & \text{ } & -1 & \text{ } & 1 & \text{ } & 1        \end{array}\right]

    \left[\begin{array}{rrrrrrrrr}      j & \text{ } & a_j^q & \text{ }  & a_j^{2q} & \text{ } & a_j^{4q} & \text{ } & a_j^{8q} \\           \text{ } & \text{ } & \text{ }  \\                  16 & \text{ } & 1 & \text{ } & 1 & \text{ } & 1 & \text{ } & 1  \\      17 & \text{ } & \text{ } & \text{ } & \text{ } & \text{ } & -1 & \text{ } & 1  \\      18 & \text{ } & \text{ } & \text{ } & \text{ } & \text{ } & -1 & \text{ } & 1  \\      19 & \text{ } & \text{ } & \text{ } & -1 & \text{ } & 1 & \text{ } & 1  \\      20 & \text{ } & 1 & \text{ } & 1 & \text{ } & 1 & \text{ } & 1  \\        21 & \text{ } & \text{ } & \text{ } & \text{ } & \text{ } & -1 & \text{ } & 1  \\      22 & \text{ } & \text{ } & \text{ } & \text{ } & \text{ } & -1 & \text{ } & 1  \\      23 & \text{ } & \text{ } & \text{ } & \text{ } & \text{ } & -1 & \text{ } & 1  \\      24 & \text{ } & 1 & \text{ } & 1 & \text{ } & 1 & \text{ } & 1  \\      25 & \text{ } & \text{ } & \text{ } & -1 & \text{ } & 1 & \text{ } & 1  \\        26 & \text{ } & \text{ } & \text{ } & \text{ } & \text{ } & -1 & \text{ } & 1  \\      27 & \text{ } & \text{ } & \text{ } & \text{ } & \text{ } & -1 & \text{ } & 1  \\      28 & \text{ } & 1 & \text{ } & 1 & \text{ } & 1 & \text{ } & 1  \\      29 & \text{ } & 1 & \text{ } & 1 & \text{ } & 1 & \text{ } & 1  \\      30 & \text{ } & \text{ } & \text{ } & \text{ } & \text{ } & -1 & \text{ } & 1       \end{array}\right]

    \left[\begin{array}{rrrrrrrrr}      j & \text{ } & a_j^q & \text{ }  & a_j^{2q} & \text{ } & a_j^{4q} & \text{ } & a_j^{8q} \\           \text{ } & \text{ } & \text{ }  \\                  31 & \text{ } & \text{ } & \text{ } & \text{ } & \text{ } & -1 & \text{ } & 1  \\      32 & \text{ } & \text{ } & \text{ } & \text{ } & \text{ } & -1 & \text{ } & 1  \\      33 & \text{ } & \text{ } & \text{ } & \text{ } & \text{ } & -1 & \text{ } & 1  \\      34 & \text{ } & \text{ } & \text{ } & \text{ } & \text{ } & -1 & \text{ } & 1  \\      35 & \text{ } & 1 & \text{ } & 1 & \text{ } & 1 & \text{ } & 1  \\        36 & \text{ } & \text{ } & \text{ } & \text{ } & \text{ } & -1 & \text{ } & 1  \\      37 & \text{ } & \text{ } & \text{ } & \text{ } & \text{ } & -1 & \text{ } & 1  \\      38 & \text{ } & \text{ } & \text{ } & \text{ } & \text{ } & -1 & \text{ } & 1  \\      39 & \text{ } & 1 & \text{ } & 1 & \text{ } & 1 & \text{ } & 1  \\      40 & \text{ } & \text{ } & \text{ } & \text{ } & \text{ } & -1 & \text{ } & 1             \end{array}\right]

There are 4 columns of calculation results, one for each term in sequence (2). If a calculation result is a blank in the above tables, it means that the result is a number that is not 1 or -1 modulo P_{297}. For example, a_1^q \ (\text{mod} \ P_{297}) and a_2^q \ (\text{mod} \ P_{297}) are congruent to the following two numbers:

    a_1^q \equiv
    86168678768024029811437552745042076645410792873480
    629834883948094184848812907

    a_2^q \equiv
    10235477176842589582260882228891913141693105976929
    7597880545619812030150151760

In the above 3 tables, all results match the conditions of Theorem 1. For each number a_j, the calculated results are eventually 1. On some of the rows, the first result is a 1. In all the other rows, the term right before the first 1 is a -1. For example, in the first row where j=1, the first 1 a_1^{4q} and the term preceding that is a -1.

The results in the above 3 tables show that the number P_{297} is a strong probable prime in all 40 of the randomly chosen bases. We have very strong evidence that the number P_{297} is a prime number. The probability that it is a composite number but we mistakenly identify it as prime is at most one in a septillion!

___________________________________________________________________

Exercise

In our search for probable primes larger than the 8th Fermat number, we also find that the number P_{301}=2^{301}+301 is also a probable prime base 2. The following shows the decimal digits:

    P_{301}=
    11579208923731619542357098500868790785326998466564
    0564039457584007913129640237

Is it a prime number? Perform the Miller-Rabin test on this number.

___________________________________________________________________
\copyright \ \ 2014 \ \text{Dan Ma}

Is factorization a hard problem?

Is factorization a hard problem? There is plenty of empirical evidence that it is so. Take the following 309-digit number that is known as RSA-1024, an example of an RSA number.

    RSA-1024
    13506641086599522334960321627880596993888147560566
    70275244851438515265106048595338339402871505719094
    41798207282164471551373680419703964191743046496589
    27425623934102086438320211037295872576235850964311
    05640735015081875106765946292055636855294752135008
    52879416377328533906109750544334999811150056977236
    890927563

RSA-1024, a 1024-bit number, is a product of two prime numbers p and q. No one has been able to factor this number, despite the advances in factoring algorithms and computing technology in recent decades. RSA-1024 is part of the RSA Factoring Challenge that was created in 1991. Even though the challenge was withdrawn in 2007, it is believed that people are still taking up the challenge to factor this and other unfactored RSA numbers. In fact, the successful factoring of RSA-1024 or similarly sized numbers would have huge security implication for the RSA algorithm. The RSA cryptosystem is built on the difficulty (if not the impossibility) of factoring large numbers such as RSA-1024.

Yet it is very easy to demonstrate that RSA-1024 is not a prime number. The fact that it is composite can be settled by performing one modular exponentiation. Denote RSA-1024 by N. We compute 2^{N-1} \ (\text{mod} \ N).

We find that 2^{N-1} \equiv T \ (\text{mod} \ N) where T is the following 309-digit number.

    T=
    12093909443203361586765059535295699686754009846358
    89512389028083675567339322020593385334853414711666
    28419681241072885123739040710771394053528488357104
    98409193003137847878952260296151232848795137981274
    06300472693925500331497519103479951096634123177725
    21248297950196643140069546889855131459759160570963
    857373851

Obviously T is not 1. This fact is enough to prove that the modulus N is not a prime number. This is because the number N lacks a property possessed by all prime numbers. According to Fermat’s little theorem, if N were prime, then that a^{N-1} \equiv 1 \ (\text{mod} \ N) for all integers a that are relatively prime to N. In particular, if N were prime, then we would have 2^{N-1} \equiv 1 \ (\text{mod} \ N), the opposite of our result.

The modular exponentiation a^{N-1} \ (\text{mod} \ N) discussed here can be performed using the fast powering algorithm, which runs in polynomial time. In the fast powering algorithm, the binary expansion of the exponent is used to convert the modular exponentiation into a series of squarings and multiplications. If the exponent N-1 is a k-bit number, then it takes k-1 squarings and at most k-1 multiplications. For RSA-1024, it takes 1023 squarings and at most 1023 multiplications (in this instance exactly 507 multiplications). This calculation, implemented in a modern computer, can be done in seconds.

The above calculation is a vivid demonstration that factoring is hard while detecting the primality or compositeness of a number is a much simpler problem.

The minimum RSA key length prior to the end of 2013 is 1024. After 2013, The minimum RSA key length is 2048. In fact, the largest RSA number is RSA-2048 (has 2048 bits and 617 decimal digits), which is expected to stay unfactored for years to come barring dramatic advances in factoring algorithms or computing capabilities.

___________________________________________________________________

Exercise

Using a software package that can handle modular exponentiation involving large numbers, it is easy to check for “prime versus composite” status of a large number. Find a number n whose prime factorization is not known. Either use known numbers such as RSA numbers or randomly generate a large number. Then calculate the modular exponentiation a^{n-1} \ (\text{mod} \ a) for several values of a (it is a good practice to start with a=2). If the answer is not congruent to 1 for one value of a, then we know n is composite. If the exponentiation is all congruent to 1 for the several values of a, then n is a likely a prime number.

___________________________________________________________________
\copyright \ \ 2014 \ \text{Dan Ma}

The magic words are squeamish ossifrage

The title of the blog post is the answer to a decryption challenge problem in connection to a factoring problem of a 129-digit number that is known as RSA-129. The challenge problem was posed in 1977. Our goal here is to use this challenge problem to demonstrate how to encrypt and decrypt in the RSA algorithm. The author of this blog recently built a calculator that can handle modular exponentiation involving large numbers. The challenge problem of RSA-129 provides an opportunity to verify the calculator.

In the August 1977, Martin Gardner wrote an article called “A new kind of cipher that would take millions of years to break” in his Mathematical Games column in Scientific American. This article posed a challenge problem created by the inventors of the RSA cryptosystem. Here’s the gist of the problem.

    An English sentence is converted into a number according to the rule that maps A to 01, B to 02, C to 03, . . . ,Y to 25, Z to 26, with 00 representing a space between words. Call the resulting number m. Then raise m to e= 9007 and then reduce the exponentiation result modulo the following 129-digit number N.

      N=
      11438162575788886766923577997614661201021829672124
      23625625618429357069352457338978305971235639587050
      58989075147599290026879543541

    In modular arithmetic, the notation for this calculation is m^{9007} \equiv c \ (\text{mod} \ N), where c is the following number:

      c=
      96869613754622061477140922254355882905759991124574
      31987469512093081629822514570835693147662288398962
      8013391990551829945157815154

    The number c is the ciphertext. What is the plaintext m? What is English phrase that is behind m?

During the 17 years after the publication of the challenge problem, no one other than the creators of the problem knew the English phrase. In 1994, a team led by Atkins, Graff, Lenstra and Leyland [1] cracked the code and found that the answer is the title of this blog post. The effort that led to the answer was a massive computing project that was distributed over the Internet. Their goal was to factor the 129-digit number N (the modulus) into the two prime factors p and q. A more detailed description of the problem can also be found in a piece from MAA (see here).

Once the factors of the modulus are known, the decryption exponent d was calculated using the extended Euclidean algorithm. Then raise the ciphertext c to d and reduce modulo N. The result is the plaintext m. Using the predetermined rule described earlier, the plaintext m is translated back to the original English phrase. In the remainder of this post, we demonstrate how this process works with the prime factors p and q as a given.

The modulus N is a 129-digit number that is known as RSA-129. It is the product of two prime numbers with 64 and 65 digits. Instead of taking millions of years to factor this modulus as suggested in the title of Martin Gardner’s article, it took a mere 17 years! In 1977, Rivest, one of the creators of the RSA algorithm, estimated that factoring a 125-digit number that is a product of two 63-digit prime numbers would require at least 40 quadrillion years using the best factoring algorithm available. This goes to show that what was considered hard or impossible at one point in time may one day become doable or even easy due to advances in computer power and factoring algorithms. What is considered a secure RSA modulus today (say 1024-bit) may be breakable in the future.

___________________________________________________________________

Decrypting the challenge message

The modulus N as indicated above is a 129-digit number that is known as RSA-129. The number pair N and e= 9007 is an RSA public key. With only the knowledge of public key represented by N and e and the ciphertext c, how hard is it to recover the plaintext m? An answer emerged only after 17 years of the publication of the challenge. In this post, we demonstrate how to retrieve the plaintext from the ciphertext by using the prime factors p and q of the modulus N. Here’s are the factors p and q (found here):

    p=
    3490529510847650949147849619903898133417764638493387843990820577

    q=
    32769132993266709549961988190834461413177642967992942539798288533

As mentioned earlier, we take the prime factors p and q of N as a given. Then compute the product of p-1 and q-1:

    \phi(N)=(p-1) \times (q-1)

The function \phi is important one in number theory and goes by a special name, Euler’s phi function. Then the decryption exponent d is the number satisfying the following congruence relationship:

    ed \equiv 1 \ (\text{mod} \ \phi(N))

where e=9007. In other words, the number d is the multiplicative inverse of e modulo \phi(N). The typical approach in finding d is to use the extended Euclidean algorithm. Once d is found, the following modular exponentiation is computed:

    c^d \equiv m \ (\text{mod} \ N)

We use the predetermined rule to recover the original English phrase. The steps are summarized as follows:

  1. Using the Euclidean algorithm to compute \text{GCD}(e,\phi(N)), the greatest common divisor (GCD) of E and \phi(N)=(p-1) \times (q-1).
  2. The GCD in the previous step should be 1. Performing Euclidean algorithm is so that we can work backward (the extended Euclidean algorithm) to compute d=e^{-1} modulo \phi(N).
  3. Use the fast powering algorithm to compute c^d \equiv m \ (\text{mod} \ N).
  4. Use the predetermined mapping scheme to obtain the plaintext.

___________________________________________________________________

Step 1. Euclidean algorithm

The following is the product of p-1 and q-1.

    \phi(N)=
    11438162575788886766923577997614661201021829672124
    23625625618428994472727416195373314872857532203455
    12393667541112959643090434432

Next, use Euclidean algorithm to find \text{GCD}(9007,N), which should be 1. The point is that we will work backward to find the multiplicative inverse of e= 9007. The Euclidean algorithm consists of a series of divisions until reaching the GCD. The divisions are shown in the following matrix:

    \left[\begin{array}{rrrrrrrr}      \phi(N) & = & 9007 & \times & T & + & 5166 & \text{ }\\      9007 & = & 5166 & \times & 1 & + & 3841 & \text{ } \\      5166 & = & 3841 & \times & 1 & + & 1325 & \text{ } \\      3841 & = & 1325 & \times & 2 & + & 1191 & \text{ } \\      1325 & = & 1191 & \times & 1 & + & 134 & \text{ } \\      1191 & = & 134 & \times & 8 & + & 119  & \text{ } \\      134 & = & 119 & \times & 1 & + & 15 & \text{ } \\      119 & = & 15 & \times & 7 & + & 14 & \text{ } \\      15 & = & 14 & \times & 1 & + & 1 & \text{*} \\      14 & = & 1 & \times & 14 & + & 0 & \text{ }    \end{array}\right]

where T is the following number:

    T=
    12699192379026187151019849003680094594228743945957
    85084518284033523340432348390555473379435474856728
    2379667762974681874441038

The last column in the above table shows the remainders in the divisions. The last non-zero remainder (marked by *) is the GCD of 9007 and \phi(N), in this case 1. Next, we apply extended Euclidean algorithm to solve the following equation:

    9007 \times x + \phi(N) \times y = 1

The solution x will be the decryption exponent d.
___________________________________________________________________

Step 2. Extended Euclidean algorithm

The extended Euclidean algorithm is to take the above table of divisions and perform back substitutions. The process of doing back substitutions is logically clear but can be tedious. We use a tabular formulation of this process as shown below.

    \left[\begin{array}{rrrrrrrrr}      \text{divisor} & \text{ } & \text{quotient} & \text{ } & \text{back substitution} & \text{ } & \text{sign} & \text{ }\\      \text{ } & \text{ } & \text{ } & \text{ } & \text{ } & \text{ } & \text{ } & \text{ }\\      \phi(N) & \text{ } & \text{ } & \text{ } & W & \text{ } & - & \text{ } & \text{Step 9}\\      9007 & \text{ } & T & \text{ } & 605 & \text{ } & + & \text{ } & \text{Step 8}\\      5166 & \text{ } & 1 & \text{ } & 347 & \text{ } & - & \text{ } & \text{Step 7}\\      3841 & \text{ } & 1 & \text{ } & 258 & \text{ } & + & \text{ } & \text{Step 6}\\      1325 & \text{ } & 2 & \text{ } & 89 & \text{ } & - & \text{ } & \text{Step 5}\\      1191 & \text{ } & 1 & \text{ } & 80 & \text{ } & + & \text{ } & \text{Step 4}\\      134 & \text{ } & 8 & \text{ } & 9 & \text{ } & - & \text{ } & \text{Step 3}\\      119 & \text{ } & 1 & \text{ } & 8 & \text{ } & + & \text{ } & \text{Step 2}\\      15 & \text{ } & 7 & \text{ } & 1 & \text{ } & - & \text{ } & \text{Step 1} \\      \text{ } & \text{ } & \text{ } & \text{ } & \text{ } & \text{ } & \text{ } & \text{ }\\      14 & \text{ } & 1 & \text{ } & \bold 1 & \text{ } & + & \text{ }\\      1 & \text{ } & 14 & \text{ } & \bold 0 & \text{ } & - & \text{ }          \end{array}\right]

where W is the following number:

    W=
    76830113893108432263670086472264572295083900873044
    99761335618402816209615707762860613945584622883205
    839698996599682534036828337

Let’s explain the table for the back substitution. The first number in the first column is \phi(N). The rest of the numbers in the first column are the divisors in the Euclidean algorithm. Note that the bottom number is always the GCD. The second column consists of the quotients in the Euclidean algorithm. The third column is where the back substitution takes place. It works from bottom up. At the bottom of the third column, the numbers are always 0 and 1 (in bold). To make it clear to see, there is blank row separating the bottom two rows from the rows above. The fourth column has the signs of the back substitution numbers. Starting with minus at the bottom, the signs are alternating: minus, plus, minus, plus etc (going from bottom to top).

We now demonstrate the calculation for the back substitution. The following shows the calculation for the rows labeled Step 1, Step 2, Step 3 and Step 9:

    1=1 \times 1 + 14 \times 0 (Step 1)

    1=7 \times 1 + 1 \times 1 (Step 2)

    1=1 \times 8 + 7 \times 1 (Step 3)

    W=T \times 605 + 1 \times 347 (Step 9)

The idea is that each number is the sum of the cross multiplications of the two rows below. This recurrence relation is a more efficient way of performing extended Euclidean algorithm.

The first two rows of the above table give the solution to the equation 9007 \times x + \phi(N) \times y = 1. In the first two rows, cross multiply the divisor and back substitution columns (along with the sign) as follows:

    9007 \times (-W) + \phi(N) \times 605 = 1

Then -W is the solution to ed \equiv 1 \ (\text{mod} \ \phi(N)). We need d to be positive. To find d, simply add \phi(N) to -W. Thus d is the least positive integer such that -W \equiv d \ (\text{mod} \ \phi(N)) where

    d=
    10669861436857802444286877132892015478070990663393
    78628012262244966310631259117744708733401685974623
    06553968544513277109053606095

One additional comment about the above table. When you cross multiply divisor column and back substitution column for any two rows, the result is the GCD. In particular, when this is done on the first two rows, we solve the equation 9007 \times x + \phi(N) \times y = 1 and find the multiplicative inverse of e as a result.

___________________________________________________________________

Step 3 and Step 4. Decipher

With the decryption key d solved in the above step, we are now ready to calculate the following modular exponentiation:

    c^d \equiv m \ (\text{mod} \ N)

The standard approach is to use the fast powering algorithm. The idea is to use the binary expansion of the exponent d to convert the exponentiation c^d into a series of squarings and multiplications. The exponent d is a 426-bit numbers, which means that the fast powering algorithm only requires 425 squarings and at most 425 multiplications. The algorithm is fast and efficient since it runs in polynomial time. Our calculator produces the following result:

    m=
    20080500130107090300231518041900011805001917210501
    1309190800151919090618010705

Using the predetermined mapping of A = 01, B = 02 (and so on), the numeric plaintext is converted to the following English phrase:

    the magic words are squeamish ossifrage

which is the title of this blog post.

___________________________________________________________________

Remark

The above demonstration shows how to derive the decryption key d from the knowledge of the two prime factors p and q of the modulus N. Thus the prime factors p and q should be kept secret. Of course, the decryption key d should also be kept secret so that the originator/creator of the public key is the only one who can read the encrypted messages sent to him or her.

___________________________________________________________________

Exercise

Suppose the public key N and e is as above. Verify that encrypting the plaintext m will derive the ciphertext c given in the challenge problem. The plaintext m is repeated below.

    m=
    20080500130107090300231518041900011805001917210501
    1309190800151919090618010705

___________________________________________________________________

Reference

  1. Atkins D., Graff M., Lenstra, A. K. Lenstra, Leyland P. C., The magic words are squeamish ossifrage, ASIACRYPT-94, Lecture Notes Comput. Sci. Volume 917, Springer-Verlag, Berlin, 1995.

___________________________________________________________________
\copyright \ \ 2014 \ \text{Dan Ma}

A primality testing exercise from the 9th Fermat number

In this post, we discuss a primality testing exercise on a number derived from a prime factor of the 9th Fermat number. The idea of the exercise is that we test the primality of a number generated from a known large prime number by tagging or inserting a digit into the known prime number. This is an excellent way to generate exercises for primality testing. This approach of altering a known prime number is not meant to be a prime number generation process.

A Fermat number is a positive integer of the form F_n=2^{2^n}+1 where n=0,1,2,\cdots. The 9th Fermat number is F_9=2^{2^9}+1=2^{512}+1. This is a 155-digit number that was completely factored in 1993 [1]. Its three prime factors have 7 digits, 49 digits and 99 digits. The following is the 49-digit factor of F_9:

    7,455,602,825,647,884,208,337,395,736,200,454,918,783,366,342,657

The above 49-digit number is a prime number that was resulted from a published effort of factorizing F_9. The other two factors of F_9 can be found in [1] or here. If we tag a digit in front of this 49-digit number, say the digit 6, is the resulting number a prime number? In this post, we discuss the primality testing of the following 50-digit number:

    A= 67,455,602,825,647,884,208,337,395,736,200,454,918,783,366,342,657

Primality testing is an important aspect of public key cryptography. For example, the modulus in the public key in RSA cryptosystem is a product of two distinct large prime numbers. To generate the key, it is crucial to generate large numbers at random and to be able to efficiently test whether the numbers are prime.

___________________________________________________________________

Pre-checks before primality testing

Primality tests such as Fermat test and Miller-Rabin test are popular and are easy to use. These tests check for the primality of a given integer n by randomly choosing a set of auxiliary numbers and then checking whether any one of these auxiliary values is a witness to the compositeness of the number n. We would like to emphasize the point that before applying a randomized primality testing procedure, it will be advantageous to perform some easy checks to quickly rule out primality. If these quick checks show that the number being tested is composite, we can then continue the search by moving on to another candidate prime number. With the above number A as example, we perform the following pre-checks.

  • Check small prime numbers just in case the number being tested has small prime factors.
  • If the previous step produces no factors, look for small Fermat witnesses.

The idea of the first check is clear. A Fermat witness for an integer n is an integer a that is relatively prime to n such that a^{n-1} \not \equiv 1 \ (\text{mod} \ n). If we can find such a number a, we can say conclusively that n is not a prime number. According to Fermat’s little theorem, if n is a prime number, then the congruence would be a one, i.e., a^{n-1} \equiv 1 \ (\text{mod} \ n). This idea is the basis for the Fermat primality test, which is performed on a random set of values of a. Here in the pre-check, we calculate a^{n-1} \ (\text{mod} \ n) for some small values of a such as 2 or 3. The idea is to quickly rule out the candidate prime numbers that happen to have a small Fermat witness.

If either one of the two checks indicate the number being tested is composite, then we pick another number to test. If both of the above steps fail to indicate compositeness, we can then proceed to use the “probabilistic” part of the Fermat test or another probabilistic primality test.

We like to point out that at the outset, the author of the blog did not know the primality status of the 50-digit number A that is obtained by tagging a 6 in front of the 49-digit factor of the 9th Fermat number. Though this number is far too small to work in the RSA algorithm, it serves as an excellent exercise for primality testing.

___________________________________________________________________

Pre-checking the number A

Very quickly, 7 is found to be a factor of A. We have A=7 \times B where B is:

    B= 9636514689378269172619627962314350702683338048951

We then try to find small factors of B. The number 283 is found to be a factor of B. Then we have A=7 \times 283 \times C where C is the following 47-digit number:

    C= 34051288655046887535758402693690285168492360597

By tagging a 6 in front of the 49-digit factor of the 9th Fermat number, we obtain a composite number. The prime or composite status for the number A is settled. Now the next question: is the remaining factor C prime or composite?

___________________________________________________________________

Pre-checking the number C

We perform the two pre-checks as described above. In searching for small factors, we find that C is not divisible by all prime numbers less than 200,000. So if C has a factor, it will have to be a large one. Or it could be that C is prime. Instead of continuing to search for factors, we can do the second check. We calculate 2^{C-1} \ (\text{mod} \ C) and find that:

    2^{C-1} \equiv t \ (\text{mod} \ C)

    where t= 7426390354305013563302537374271875139618265902

Since t clearly is not 1, this tells us that C is composite.

___________________________________________________________________

Remarks

Instead of doing the randomized primality testing right away, it makes sense to do the pre-checks as described above so that we do not waste time and effort on composite numbers that have small factors or have small Fermat witnesses. As soon as these two checks indicate the compositeness of a number, we then move on to another candidate prime number. If a number passes these two checks, then we can apply a primality test such as the Fermat test or the Miller-Rabin test.

Another remark is that primality testing is easy while factoring is hard. The compositeness of the 47-digit number C can be determined by just one modular exponentiation calculation, which can be computed using the fast powering algorithm. This algorithm uses the binary expansion of the exponent to convert the modular exponentiation into a series of squarings and multiplications. Because the exponent C-1 is a 155-bit number, computing 2^{C-1} \ (\text{mod} \ C) only requires 154 squarings and at most 154 multiplications (the actual count of multiplications is 76). Implemented in a modern computer, this calculation takes less than one second.

On the other hand, factoring the above 47-digit number C by the brute force approach (checking every possible prime factor) would require checking approximately 2 \times 10^{20} many prime numbers as potential factors. The age of the universe is believed to be about 13 billions years, which is about 4 \times 10^{17} seconds. That’s why we stop checking for prime factors of C at some point. The authors of [1] factored a 155-digit number in 1993. So the current computing technology is certainly capable of factoring a 47-digit number. But the effort will take time and will require using more advanced factoring algorithm.

___________________________________________________________________

Exercises

The idea of the exercise is that we test the primality of a number generated from a known large prime number by tagging or inserting a digit into the known prime number. Using the same 49-digit factor of F_9, we can tag another digit in front or insert a digit somewhere in the middle. Another possibility is to try the other two factors of F_9.

There are many other large primes to be found on the Internet that can be used for this exercise. For example, factors of other Fermat numbers and RSA numbers.

___________________________________________________________________

Reference

  1. Lenstra A. K., Lenstra H. W., Manasse M. S., Pollard J. M. The Factorization of the Ninth Fermat Number, Mathematiics of Computation, Volume 61, Number 203, July 1993, Pages 319-349.

___________________________________________________________________
\copyright \ \ 2014 \ \text{Dan Ma}