Is factorization a hard problem?

Is factorization a hard problem? There is plenty of empirical evidence that it is so. Take the following 309-digit number that is known as RSA-1024, an example of an RSA number.

    RSA-1024
    13506641086599522334960321627880596993888147560566
    70275244851438515265106048595338339402871505719094
    41798207282164471551373680419703964191743046496589
    27425623934102086438320211037295872576235850964311
    05640735015081875106765946292055636855294752135008
    52879416377328533906109750544334999811150056977236
    890927563

RSA-1024, a 1024-bit number, is a product of two prime numbers p and q. No one has been able to factor this number, despite the advances in factoring algorithms and computing technology in recent decades. RSA-1024 is part of the RSA Factoring Challenge that was created in 1991. Even though the challenge was withdrawn in 2007, it is believed that people are still taking up the challenge to factor this and other unfactored RSA numbers. In fact, the successful factoring of RSA-1024 or similarly sized numbers would have huge security implication for the RSA algorithm. The RSA cryptosystem is built on the difficulty (if not the impossibility) of factoring large numbers such as RSA-1024.

Yet it is very easy to demonstrate that RSA-1024 is not a prime number. The fact that it is composite can be settled by performing one modular exponentiation. Denote RSA-1024 by N. We compute 2^{N-1} \ (\text{mod} \ N).

We find that 2^{N-1} \equiv T \ (\text{mod} \ N) where T is the following 309-digit number.

    T=
    12093909443203361586765059535295699686754009846358
    89512389028083675567339322020593385334853414711666
    28419681241072885123739040710771394053528488357104
    98409193003137847878952260296151232848795137981274
    06300472693925500331497519103479951096634123177725
    21248297950196643140069546889855131459759160570963
    857373851

Obviously T is not 1. This fact is enough to prove that the modulus N is not a prime number. This is because the number N lacks a property possessed by all prime numbers. According to Fermat’s little theorem, if N were prime, then that a^{N-1} \equiv 1 \ (\text{mod} \ N) for all integers a that are relatively prime to N. In particular, if N were prime, then we would have 2^{N-1} \equiv 1 \ (\text{mod} \ N), the opposite of our result.

The modular exponentiation a^{N-1} \ (\text{mod} \ N) discussed here can be performed using the fast powering algorithm, which runs in polynomial time. In the fast powering algorithm, the binary expansion of the exponent is used to convert the modular exponentiation into a series of squarings and multiplications. If the exponent N-1 is a k-bit number, then it takes k-1 squarings and at most k-1 multiplications. For RSA-1024, it takes 1023 squarings and at most 1023 multiplications (in this instance exactly 507 multiplications). This calculation, implemented in a modern computer, can be done in seconds.

The above calculation is a vivid demonstration that factoring is hard while detecting the primality or compositeness of a number is a much simpler problem.

The minimum RSA key length prior to the end of 2013 is 1024. After 2013, The minimum RSA key length is 2048. In fact, the largest RSA number is RSA-2048 (has 2048 bits and 617 decimal digits), which is expected to stay unfactored for years to come barring dramatic advances in factoring algorithms or computing capabilities.

___________________________________________________________________

Exercise

Using a software package that can handle modular exponentiation involving large numbers, it is easy to check for “prime versus composite” status of a large number. Find a number n whose prime factorization is not known. Either use known numbers such as RSA numbers or randomly generate a large number. Then calculate the modular exponentiation a^{n-1} \ (\text{mod} \ a) for several values of a (it is a good practice to start with a=2). If the answer is not congruent to 1 for one value of a, then we know n is composite. If the exponentiation is all congruent to 1 for the several values of a, then n is a likely a prime number.

___________________________________________________________________
\copyright \ \ 2014 \ \text{Dan Ma}

The magic words are squeamish ossifrage

The title of the blog post is the answer to a decryption challenge problem in connection to a factoring problem of a 129-digit number that is known as RSA-129. The challenge problem was posed in 1977. Our goal here is to use this challenge problem to demonstrate how to encrypt and decrypt in the RSA algorithm. The author of this blog recently built a calculator that can handle modular exponentiation involving large numbers. The challenge problem of RSA-129 provides an opportunity to verify the calculator.

In the August 1977, Martin Gardner wrote an article called “A new kind of cipher that would take millions of years to break” in his Mathematical Games column in Scientific American. This article posed a challenge problem created by the inventors of the RSA cryptosystem. Here’s the gist of the problem.

    An English sentence is converted into a number according to the rule that maps A to 01, B to 02, C to 03, . . . ,Y to 25, Z to 26, with 00 representing a space between words. Call the resulting number m. Then raise m to e= 9007 and then reduce the exponentiation result modulo the following 129-digit number N.

      N=
      11438162575788886766923577997614661201021829672124
      23625625618429357069352457338978305971235639587050
      58989075147599290026879543541

    In modular arithmetic, the notation for this calculation is m^{9007} \equiv c \ (\text{mod} \ N), where c is the following number:

      c=
      96869613754622061477140922254355882905759991124574
      31987469512093081629822514570835693147662288398962
      8013391990551829945157815154

    The number c is the ciphertext. What is the plaintext m? What is English phrase that is behind m?

During the 17 years after the publication of the challenge problem, no one other than the creators of the problem knew the English phrase. In 1994, a team led by Atkins, Graff, Lenstra and Leyland [1] cracked the code and found that the answer is the title of this blog post. The effort that led to the answer was a massive computing project that was distributed over the Internet. Their goal was to factor the 129-digit number N (the modulus) into the two prime factors p and q. A more detailed description of the problem can also be found in a piece from MAA (see here).

Once the factors of the modulus are known, the decryption exponent d was calculated using the extended Euclidean algorithm. Then raise the ciphertext c to d and reduce modulo N. The result is the plaintext m. Using the predetermined rule described earlier, the plaintext m is translated back to the original English phrase. In the remainder of this post, we demonstrate how this process works with the prime factors p and q as a given.

The modulus N is a 129-digit number that is known as RSA-129. It is the product of two prime numbers with 64 and 65 digits. Instead of taking millions of years to factor this modulus as suggested in the title of Martin Gardner’s article, it took a mere 17 years! In 1977, Rivest, one of the creators of the RSA algorithm, estimated that factoring a 125-digit number that is a product of two 63-digit prime numbers would require at least 40 quadrillion years using the best factoring algorithm available. This goes to show that what was considered hard or impossible at one point in time may one day become doable or even easy due to advances in computer power and factoring algorithms. What is considered a secure RSA modulus today (say 1024-bit) may be breakable in the future.

___________________________________________________________________

Decrypting the challenge message

The modulus N as indicated above is a 129-digit number that is known as RSA-129. The number pair N and e= 9007 is an RSA public key. With only the knowledge of public key represented by N and e and the ciphertext c, how hard is it to recover the plaintext m? An answer emerged only after 17 years of the publication of the challenge. In this post, we demonstrate how to retrieve the plaintext from the ciphertext by using the prime factors p and q of the modulus N. Here’s are the factors p and q (found here):

    p=
    3490529510847650949147849619903898133417764638493387843990820577

    q=
    32769132993266709549961988190834461413177642967992942539798288533

As mentioned earlier, we take the prime factors p and q of N as a given. Then compute the product of p-1 and q-1:

    \phi(N)=(p-1) \times (q-1)

The function \phi is important one in number theory and goes by a special name, Euler’s phi function. Then the decryption exponent d is the number satisfying the following congruence relationship:

    ed \equiv 1 \ (\text{mod} \ \phi(N))

where e=9007. In other words, the number d is the multiplicative inverse of e modulo \phi(N). The typical approach in finding d is to use the extended Euclidean algorithm. Once d is found, the following modular exponentiation is computed:

    c^d \equiv m \ (\text{mod} \ N)

We use the predetermined rule to recover the original English phrase. The steps are summarized as follows:

  1. Using the Euclidean algorithm to compute \text{GCD}(e,\phi(N)), the greatest common divisor (GCD) of E and \phi(N)=(p-1) \times (q-1).
  2. The GCD in the previous step should be 1. Performing Euclidean algorithm is so that we can work backward (the extended Euclidean algorithm) to compute d=e^{-1} modulo \phi(N).
  3. Use the fast powering algorithm to compute c^d \equiv m \ (\text{mod} \ N).
  4. Use the predetermined mapping scheme to obtain the plaintext.

___________________________________________________________________

Step 1. Euclidean algorithm

The following is the product of p-1 and q-1.

    \phi(N)=
    11438162575788886766923577997614661201021829672124
    23625625618428994472727416195373314872857532203455
    12393667541112959643090434432

Next, use Euclidean algorithm to find \text{GCD}(9007,N), which should be 1. The point is that we will work backward to find the multiplicative inverse of e= 9007. The Euclidean algorithm consists of a series of divisions until reaching the GCD. The divisions are shown in the following matrix:

    \left[\begin{array}{rrrrrrrr}      \phi(N) & = & 9007 & \times & T & + & 5166 & \text{ }\\      9007 & = & 5166 & \times & 1 & + & 3841 & \text{ } \\      5166 & = & 3841 & \times & 1 & + & 1325 & \text{ } \\      3841 & = & 1325 & \times & 2 & + & 1191 & \text{ } \\      1325 & = & 1191 & \times & 1 & + & 134 & \text{ } \\      1191 & = & 134 & \times & 8 & + & 119  & \text{ } \\      134 & = & 119 & \times & 1 & + & 15 & \text{ } \\      119 & = & 15 & \times & 7 & + & 14 & \text{ } \\      15 & = & 14 & \times & 1 & + & 1 & \text{*} \\      14 & = & 1 & \times & 14 & + & 0 & \text{ }    \end{array}\right]

where T is the following number:

    T=
    12699192379026187151019849003680094594228743945957
    85084518284033523340432348390555473379435474856728
    2379667762974681874441038

The last column in the above table shows the remainders in the divisions. The last non-zero remainder (marked by *) is the GCD of 9007 and \phi(N), in this case 1. Next, we apply extended Euclidean algorithm to solve the following equation:

    9007 \times x + \phi(N) \times y = 1

The solution x will be the decryption exponent d.
___________________________________________________________________

Step 2. Extended Euclidean algorithm

The extended Euclidean algorithm is to take the above table of divisions and perform back substitutions. The process of doing back substitutions is logically clear but can be tedious. We use a tabular formulation of this process as shown below.

    \left[\begin{array}{rrrrrrrrr}      \text{divisor} & \text{ } & \text{quotient} & \text{ } & \text{back substitution} & \text{ } & \text{sign} & \text{ }\\      \text{ } & \text{ } & \text{ } & \text{ } & \text{ } & \text{ } & \text{ } & \text{ }\\      \phi(N) & \text{ } & \text{ } & \text{ } & W & \text{ } & - & \text{ } & \text{Step 9}\\      9007 & \text{ } & T & \text{ } & 605 & \text{ } & + & \text{ } & \text{Step 8}\\      5166 & \text{ } & 1 & \text{ } & 347 & \text{ } & - & \text{ } & \text{Step 7}\\      3841 & \text{ } & 1 & \text{ } & 258 & \text{ } & + & \text{ } & \text{Step 6}\\      1325 & \text{ } & 2 & \text{ } & 89 & \text{ } & - & \text{ } & \text{Step 5}\\      1191 & \text{ } & 1 & \text{ } & 80 & \text{ } & + & \text{ } & \text{Step 4}\\      134 & \text{ } & 8 & \text{ } & 9 & \text{ } & - & \text{ } & \text{Step 3}\\      119 & \text{ } & 1 & \text{ } & 8 & \text{ } & + & \text{ } & \text{Step 2}\\      15 & \text{ } & 7 & \text{ } & 1 & \text{ } & - & \text{ } & \text{Step 1} \\      \text{ } & \text{ } & \text{ } & \text{ } & \text{ } & \text{ } & \text{ } & \text{ }\\      14 & \text{ } & 1 & \text{ } & \bold 1 & \text{ } & + & \text{ }\\      1 & \text{ } & 14 & \text{ } & \bold 0 & \text{ } & - & \text{ }          \end{array}\right]

where W is the following number:

    W=
    76830113893108432263670086472264572295083900873044
    99761335618402816209615707762860613945584622883205
    839698996599682534036828337

Let’s explain the table for the back substitution. The first number in the first column is \phi(N). The rest of the numbers in the first column are the divisors in the Euclidean algorithm. Note that the bottom number is always the GCD. The second column consists of the quotients in the Euclidean algorithm. The third column is where the back substitution takes place. It works from bottom up. At the bottom of the third column, the numbers are always 0 and 1 (in bold). To make it clear to see, there is blank row separating the bottom two rows from the rows above. The fourth column has the signs of the back substitution numbers. Starting with minus at the bottom, the signs are alternating: minus, plus, minus, plus etc (going from bottom to top).

We now demonstrate the calculation for the back substitution. The following shows the calculation for the rows labeled Step 1, Step 2, Step 3 and Step 9:

    1=1 \times 1 + 14 \times 0 (Step 1)

    1=7 \times 1 + 1 \times 1 (Step 2)

    1=1 \times 8 + 7 \times 1 (Step 3)

    W=T \times 605 + 1 \times 347 (Step 9)

The idea is that each number is the sum of the cross multiplications of the two rows below. This recurrence relation is a more efficient way of performing extended Euclidean algorithm.

The first two rows of the above table give the solution to the equation 9007 \times x + \phi(N) \times y = 1. In the first two rows, cross multiply the divisor and back substitution columns (along with the sign) as follows:

    9007 \times (-W) + \phi(N) \times 605 = 1

Then -W is the solution to ed \equiv 1 \ (\text{mod} \ \phi(N)). We need d to be positive. To find d, simply add \phi(N) to -W. Thus d is the least positive integer such that -W \equiv d \ (\text{mod} \ \phi(N)) where

    d=
    10669861436857802444286877132892015478070990663393
    78628012262244966310631259117744708733401685974623
    06553968544513277109053606095

One additional comment about the above table. When you cross multiply divisor column and back substitution column for any two rows, the result is the GCD. In particular, when this is done on the first two rows, we solve the equation 9007 \times x + \phi(N) \times y = 1 and find the multiplicative inverse of e as a result.

___________________________________________________________________

Step 3 and Step 4. Decipher

With the decryption key d solved in the above step, we are now ready to calculate the following modular exponentiation:

    c^d \equiv m \ (\text{mod} \ N)

The standard approach is to use the fast powering algorithm. The idea is to use the binary expansion of the exponent d to convert the exponentiation c^d into a series of squarings and multiplications. The exponent d is a 426-bit numbers, which means that the fast powering algorithm only requires 425 squarings and at most 425 multiplications. The algorithm is fast and efficient since it runs in polynomial time. Our calculator produces the following result:

    m=
    20080500130107090300231518041900011805001917210501
    1309190800151919090618010705

Using the predetermined mapping of A = 01, B = 02 (and so on), the numeric plaintext is converted to the following English phrase:

    the magic words are squeamish ossifrage

which is the title of this blog post.

___________________________________________________________________

Remark

The above demonstration shows how to derive the decryption key d from the knowledge of the two prime factors p and q of the modulus N. Thus the prime factors p and q should be kept secret. Of course, the decryption key d should also be kept secret so that the originator/creator of the public key is the only one who can read the encrypted messages sent to him or her.

___________________________________________________________________

Exercise

Suppose the public key N and e is as above. Verify that encrypting the plaintext m will derive the ciphertext c given in the challenge problem. The plaintext m is repeated below.

    m=
    20080500130107090300231518041900011805001917210501
    1309190800151919090618010705

___________________________________________________________________

Reference

  1. Atkins D., Graff M., Lenstra, A. K. Lenstra, Leyland P. C., The magic words are squeamish ossifrage, ASIACRYPT-94, Lecture Notes Comput. Sci. Volume 917, Springer-Verlag, Berlin, 1995.

___________________________________________________________________
\copyright \ \ 2014 \ \text{Dan Ma}