The magic words are squeamish ossifrage

The title of the blog post is the answer to a decryption challenge problem in connection to a factoring problem of a 129-digit number that is known as RSA-129. The challenge problem was posed in 1977. Our goal here is to use this challenge problem to demonstrate how to encrypt and decrypt in the RSA algorithm. The author of this blog recently built a calculator that can handle modular exponentiation involving large numbers. The challenge problem of RSA-129 provides an opportunity to verify the calculator.

In the August 1977, Martin Gardner wrote an article called “A new kind of cipher that would take millions of years to break” in his Mathematical Games column in Scientific American. This article posed a challenge problem created by the inventors of the RSA cryptosystem. Here’s the gist of the problem.

    An English sentence is converted into a number according to the rule that maps A to 01, B to 02, C to 03, . . . ,Y to 25, Z to 26, with 00 representing a space between words. Call the resulting number m. Then raise m to e= 9007 and then reduce the exponentiation result modulo the following 129-digit number N.

      N=
      11438162575788886766923577997614661201021829672124
      23625625618429357069352457338978305971235639587050
      58989075147599290026879543541

    In modular arithmetic, the notation for this calculation is m^{9007} \equiv c \ (\text{mod} \ N), where c is the following number:

      c=
      96869613754622061477140922254355882905759991124574
      31987469512093081629822514570835693147662288398962
      8013391990551829945157815154

    The number c is the ciphertext. What is the plaintext m? What is English phrase that is behind m?

During the 17 years after the publication of the challenge problem, no one other than the creators of the problem knew the English phrase. In 1994, a team led by Atkins, Graff, Lenstra and Leyland [1] cracked the code and found that the answer is the title of this blog post. The effort that led to the answer was a massive computing project that was distributed over the Internet. Their goal was to factor the 129-digit number N (the modulus) into the two prime factors p and q. A more detailed description of the problem can also be found in a piece from MAA (see here).

Once the factors of the modulus are known, the decryption exponent d was calculated using the extended Euclidean algorithm. Then raise the ciphertext c to d and reduce modulo N. The result is the plaintext m. Using the predetermined rule described earlier, the plaintext m is translated back to the original English phrase. In the remainder of this post, we demonstrate how this process works with the prime factors p and q as a given.

The modulus N is a 129-digit number that is known as RSA-129. It is the product of two prime numbers with 64 and 65 digits. Instead of taking millions of years to factor this modulus as suggested in the title of Martin Gardner’s article, it took a mere 17 years! In 1977, Rivest, one of the creators of the RSA algorithm, estimated that factoring a 125-digit number that is a product of two 63-digit prime numbers would require at least 40 quadrillion years using the best factoring algorithm available. This goes to show that what was considered hard or impossible at one point in time may one day become doable or even easy due to advances in computer power and factoring algorithms. What is considered a secure RSA modulus today (say 1024-bit) may be breakable in the future.

___________________________________________________________________

Decrypting the challenge message

The modulus N as indicated above is a 129-digit number that is known as RSA-129. The number pair N and e= 9007 is an RSA public key. With only the knowledge of public key represented by N and e and the ciphertext c, how hard is it to recover the plaintext m? An answer emerged only after 17 years of the publication of the challenge. In this post, we demonstrate how to retrieve the plaintext from the ciphertext by using the prime factors p and q of the modulus N. Here’s are the factors p and q (found here):

    p=
    3490529510847650949147849619903898133417764638493387843990820577

    q=
    32769132993266709549961988190834461413177642967992942539798288533

As mentioned earlier, we take the prime factors p and q of N as a given. Then compute the product of p-1 and q-1:

    \phi(N)=(p-1) \times (q-1)

The function \phi is important one in number theory and goes by a special name, Euler’s phi function. Then the decryption exponent d is the number satisfying the following congruence relationship:

    ed \equiv 1 \ (\text{mod} \ \phi(N))

where e=9007. In other words, the number d is the multiplicative inverse of e modulo \phi(N). The typical approach in finding d is to use the extended Euclidean algorithm. Once d is found, the following modular exponentiation is computed:

    c^d \equiv m \ (\text{mod} \ N)

We use the predetermined rule to recover the original English phrase. The steps are summarized as follows:

  1. Using the Euclidean algorithm to compute \text{GCD}(e,\phi(N)), the greatest common divisor (GCD) of E and \phi(N)=(p-1) \times (q-1).
  2. The GCD in the previous step should be 1. Performing Euclidean algorithm is so that we can work backward (the extended Euclidean algorithm) to compute d=e^{-1} modulo \phi(N).
  3. Use the fast powering algorithm to compute c^d \equiv m \ (\text{mod} \ N).
  4. Use the predetermined mapping scheme to obtain the plaintext.

___________________________________________________________________

Step 1. Euclidean algorithm

The following is the product of p-1 and q-1.

    \phi(N)=
    11438162575788886766923577997614661201021829672124
    23625625618428994472727416195373314872857532203455
    12393667541112959643090434432

Next, use Euclidean algorithm to find \text{GCD}(9007,N), which should be 1. The point is that we will work backward to find the multiplicative inverse of e= 9007. The Euclidean algorithm consists of a series of divisions until reaching the GCD. The divisions are shown in the following matrix:

    \left[\begin{array}{rrrrrrrr}      \phi(N) & = & 9007 & \times & T & + & 5166 & \text{ }\\      9007 & = & 5166 & \times & 1 & + & 3841 & \text{ } \\      5166 & = & 3841 & \times & 1 & + & 1325 & \text{ } \\      3841 & = & 1325 & \times & 2 & + & 1191 & \text{ } \\      1325 & = & 1191 & \times & 1 & + & 134 & \text{ } \\      1191 & = & 134 & \times & 8 & + & 119  & \text{ } \\      134 & = & 119 & \times & 1 & + & 15 & \text{ } \\      119 & = & 15 & \times & 7 & + & 14 & \text{ } \\      15 & = & 14 & \times & 1 & + & 1 & \text{*} \\      14 & = & 1 & \times & 14 & + & 0 & \text{ }    \end{array}\right]

where T is the following number:

    T=
    12699192379026187151019849003680094594228743945957
    85084518284033523340432348390555473379435474856728
    2379667762974681874441038

The last column in the above table shows the remainders in the divisions. The last non-zero remainder (marked by *) is the GCD of 9007 and \phi(N), in this case 1. Next, we apply extended Euclidean algorithm to solve the following equation:

    9007 \times x + \phi(N) \times y = 1

The solution x will be the decryption exponent d.
___________________________________________________________________

Step 2. Extended Euclidean algorithm

The extended Euclidean algorithm is to take the above table of divisions and perform back substitutions. The process of doing back substitutions is logically clear but can be tedious. We use a tabular formulation of this process as shown below.

    \left[\begin{array}{rrrrrrrrr}      \text{divisor} & \text{ } & \text{quotient} & \text{ } & \text{back substitution} & \text{ } & \text{sign} & \text{ }\\      \text{ } & \text{ } & \text{ } & \text{ } & \text{ } & \text{ } & \text{ } & \text{ }\\      \phi(N) & \text{ } & \text{ } & \text{ } & W & \text{ } & - & \text{ } & \text{Step 9}\\      9007 & \text{ } & T & \text{ } & 605 & \text{ } & + & \text{ } & \text{Step 8}\\      5166 & \text{ } & 1 & \text{ } & 347 & \text{ } & - & \text{ } & \text{Step 7}\\      3841 & \text{ } & 1 & \text{ } & 258 & \text{ } & + & \text{ } & \text{Step 6}\\      1325 & \text{ } & 2 & \text{ } & 89 & \text{ } & - & \text{ } & \text{Step 5}\\      1191 & \text{ } & 1 & \text{ } & 80 & \text{ } & + & \text{ } & \text{Step 4}\\      134 & \text{ } & 8 & \text{ } & 9 & \text{ } & - & \text{ } & \text{Step 3}\\      119 & \text{ } & 1 & \text{ } & 8 & \text{ } & + & \text{ } & \text{Step 2}\\      15 & \text{ } & 7 & \text{ } & 1 & \text{ } & - & \text{ } & \text{Step 1} \\      \text{ } & \text{ } & \text{ } & \text{ } & \text{ } & \text{ } & \text{ } & \text{ }\\      14 & \text{ } & 1 & \text{ } & \bold 1 & \text{ } & + & \text{ }\\      1 & \text{ } & 14 & \text{ } & \bold 0 & \text{ } & - & \text{ }          \end{array}\right]

where W is the following number:

    W=
    76830113893108432263670086472264572295083900873044
    99761335618402816209615707762860613945584622883205
    839698996599682534036828337

Let’s explain the table for the back substitution. The first number in the first column is \phi(N). The rest of the numbers in the first column are the divisors in the Euclidean algorithm. Note that the bottom number is always the GCD. The second column consists of the quotients in the Euclidean algorithm. The third column is where the back substitution takes place. It works from bottom up. At the bottom of the third column, the numbers are always 0 and 1 (in bold). To make it clear to see, there is blank row separating the bottom two rows from the rows above. The fourth column has the signs of the back substitution numbers. Starting with minus at the bottom, the signs are alternating: minus, plus, minus, plus etc (going from bottom to top).

We now demonstrate the calculation for the back substitution. The following shows the calculation for the rows labeled Step 1, Step 2, Step 3 and Step 9:

    1=1 \times 1 + 14 \times 0 (Step 1)

    1=7 \times 1 + 1 \times 1 (Step 2)

    1=1 \times 8 + 7 \times 1 (Step 3)

    W=T \times 605 + 1 \times 347 (Step 9)

The idea is that each number is the sum of the cross multiplications of the two rows below. This recurrence relation is a more efficient way of performing extended Euclidean algorithm.

The first two rows of the above table give the solution to the equation 9007 \times x + \phi(N) \times y = 1. In the first two rows, cross multiply the divisor and back substitution columns (along with the sign) as follows:

    9007 \times (-W) + \phi(N) \times 605 = 1

Then -W is the solution to ed \equiv 1 \ (\text{mod} \ \phi(N)). We need d to be positive. To find d, simply add \phi(N) to -W. Thus d is the least positive integer such that -W \equiv d \ (\text{mod} \ \phi(N)) where

    d=
    10669861436857802444286877132892015478070990663393
    78628012262244966310631259117744708733401685974623
    06553968544513277109053606095

One additional comment about the above table. When you cross multiply divisor column and back substitution column for any two rows, the result is the GCD. In particular, when this is done on the first two rows, we solve the equation 9007 \times x + \phi(N) \times y = 1 and find the multiplicative inverse of e as a result.

___________________________________________________________________

Step 3 and Step 4. Decipher

With the decryption key d solved in the above step, we are now ready to calculate the following modular exponentiation:

    c^d \equiv m \ (\text{mod} \ N)

The standard approach is to use the fast powering algorithm. The idea is to use the binary expansion of the exponent d to convert the exponentiation c^d into a series of squarings and multiplications. The exponent d is a 426-bit numbers, which means that the fast powering algorithm only requires 425 squarings and at most 425 multiplications. The algorithm is fast and efficient since it runs in polynomial time. Our calculator produces the following result:

    m=
    20080500130107090300231518041900011805001917210501
    1309190800151919090618010705

Using the predetermined mapping of A = 01, B = 02 (and so on), the numeric plaintext is converted to the following English phrase:

    the magic words are squeamish ossifrage

which is the title of this blog post.

___________________________________________________________________

Remark

The above demonstration shows how to derive the decryption key d from the knowledge of the two prime factors p and q of the modulus N. Thus the prime factors p and q should be kept secret. Of course, the decryption key d should also be kept secret so that the originator/creator of the public key is the only one who can read the encrypted messages sent to him or her.

___________________________________________________________________

Exercise

Suppose the public key N and e is as above. Verify that encrypting the plaintext m will derive the ciphertext c given in the challenge problem. The plaintext m is repeated below.

    m=
    20080500130107090300231518041900011805001917210501
    1309190800151919090618010705

___________________________________________________________________

Reference

  1. Atkins D., Graff M., Lenstra, A. K. Lenstra, Leyland P. C., The magic words are squeamish ossifrage, ASIACRYPT-94, Lecture Notes Comput. Sci. Volume 917, Springer-Verlag, Berlin, 1995.

___________________________________________________________________
\copyright \ \ 2014 \ \text{Dan Ma}