Foundations of Coding: Compression, Encryption, Error Correction (2015)

Chapter 3. Cryptology

3.4 Public Key Cryptography

3.4.1 Motivations and Main Principles

We have seen that symmetric cryptosystems can be almost secure and efficient in terms of computation time. Nevertheless, from the middle of the seventies, new questions were raised:

·     Before using a symmetric cryptosystem, how is a key agreed upon?

·     How is a secure communication between two entities with no preliminary key exchange established ?

In order to answer these questions, Diffie and Hellman laid the foundations of public key cryptosystems in 1976, using the analogy with a mail-box and assuming that Bob is the only person having the key:

·     any person is able to send a message to Bob;

·     only Bob is able to read the messages stored in his mail- box.

On the other hand, a symmetric cryptosystem can be seen as a safe whose key is shared between Alice and Bob.

Keeping in mind such a system and in order to keep the notation introduced in Equation (3.1 ), one has c3-math-0424. To be more precise, c3-math-0425 is a public key that is published in some kind of directory (actually in the form of a certificate: see Section 3.6.2) in such way that anyone is able to obtain this key, check its origin, and encrypt a message with it. c3-math-0426 is a private and secret key that is kept and used by Bob in order to decrypt the ciphertexts. As the two keys are distinct, public key encryption is also called asymmetric encryption(Figure 3.16).

nfg016

Figure 3.16 Principle of public key encryption

Is it possible to give the characteristics of such a cryptosystem using the notions of information theory? In such a context, the measure of the secrecy consists in computing the remaining information in the private key, assuming that the public key is known: c3-math-0427. Besides, an asymmetric algorithm must satisfy c3-math-0428 with the functions D and E publicly known. This point implies that the key c3-math-0429 is completely determined once the public key c3-math-0430 is known: one only has to invert the function c3-math-0431. In theory, at least, it implies that c3-math-0432 and, thus, that an asymmetric system has no secret at all. In practice, the two keys c3-math-0433 and c3-math-0434 are linked but they are chosen in such way that it is far too difficult to compute the value of the key c3-math-0435 from c3-math-0436.

For this purpose, one uses the principle of one-way functions, which was introduced in Section 1.3.3.4. Therefore, the secret of a public key cryptosystem cannot be characterized by Shannon's information theory; this secret does not come from the uncertainty of the private key c3-math-0437 but rather from the inner difficulty of computing c3-math-0438 from the only c3-math-0439 and the ciphertext C. The mathematical tool that enables one to characterize this difficulty is the theory of algorithmic complexity.

The most well-known and most widely used public key encryption algorithm—mainly because of its ease of use—is the RSA system, whose name comes from its designers, Rivest, Shamir, and Adleman. The difficulty of RSA is based on the difficulty of factoring an integer.

3.4.2 Rivest-Shamir-Adleman (RSA) Encryption

When combining the Euler theorem and the Chinese Remainder theorem (see page 38), one is able to obtain a common congruence relation for all elements in c3-math-0440 when n is the product of two prime numbers. This result, which was presented by Rivest, Shamir, and Adleman in 1978, is at the origin of RSA public key cryptography.

Theorem 24 (RSA Theorem)

Let c3-math-0441 be the product of two prime numbers.Let a be any element in c3-math-0442. Then, for any positive integer k, one has c3-math-0443.

Proof

There are two cases: if a is invertible in c3-math-0444, then one directly applies the Euler theorem and thus c3-math-0445. Computing the c3-math-0446 power of this relation and multiplying it by a, one obtains the desired relation.

If a is zero, the relation modulo n is immediate. Otherwise, one uses the Fermat theorem twice modulo p and modulo q.

Indeed, if a is nonzero modulo p prime, then c3-math-0447. Computing the c3-math-0448 power and multiplying by a, one obtains c3-math-0449.

On the other hand, if c3-math-0450, then one has necessarily c3-math-0451. In the same way, one obtains a similar relation modulo qc3-math-0452.

As p and q are coprime, it is enough to apply the uniqueness of the Chinese Remainder theorem to have c3-math-0453.

From this statement, one deduces the RSA cryptosystem. In such system, a user generates a pair (public key and private key) using the following procedure:

1.  Choose two large primes p and qp and q must contain at least 150 digits each.

2.  Compute c3-math-0454

3.  Choose a small integer e coprime with c3-math-0455

4.  Compute d, the inverse of e modulo c3-math-0456.

5.  Release the pair c3-math-0457 as the RSA public key.

6.  Keep the pair c3-math-0458 secret: this pair is the RSA private key.

Therefore, one has

1.  RSA encryption: c3-math-0459

2.  RSA decryption: c3-math-0460

One easily checks that, for c3-math-0461, one has c3-math-0462. Indeed, c3-math-0463, according to RSA theorem.

Exercise 3.16 (RSA Encryption)

1.  Determine the public key and the private key for c3-math-0464 and c3-math-0465 with c3-math-0466 (and justify the validity of this choice).

2.  Encrypt the letter B in ASCII code (66) using the public key and check that the private key actually enables one to recover the original message.

Solution on page 310.

Exercise 3.17 (RSA Encryption/Decryption)

We now consider the toy RSA public key c3-math-0467.

1.  What is the ciphertext associated with the message c3-math-0468 using this key?

2.  Compute the private key c3-math-0469 corresponding to the public key c3-math-0470.

3.  Decrypt the message c3-math-0471. One may use the following result: c3-math-0472.

4.  Is it possible to obtain a ciphertext equal to 625 using this public key? Same question with the private key.

Solution on page 311.

Exercise 3.18 (Vigenère Encryption and RSA)

We consider a cryptosystem over the alphabet c3-math-0473 where each symbol in the alphabet is represented with a number between 0 and 26 as follows: ngr003

Given a key c3-math-0474 and a plaintext c3-math-0475, the ciphertext c3-math-0476 is given by

equation

This is what we call Vigenère encryption. Here is a French ciphertext that Oscar intercepted for you:

HWQIO QVPIF TDIHT Y-WAF NGY-F COMVI CGEVZ CVIAF JDFZK

YLYHG YGEHR SHMMX CVHBF AJYKN ZIXHP ZHEQY YJRHT YWMUK

   YKPBY YGEHA G-DY- YWDTF MHFZK YZYHX CISVI CHIVZ}

1.  Propose a method for the cryptanalysis of the Vigenère encryption starting by determining the length of the key.

2.  Oscar and Eve also work on the cryptanalysis of this text they have intercepted. Oscar sends to Eve the length of the key that he has managed to determine. For this, he uses the RSA public key of Eve c3-math-0478. Ivan intercepts the RSA ciphertext containing the length of the key: he obtains 10. What is the length of the key?

3.           Eve has managed to decrypt the second and the third encryption key. She sends them to Oscar using the RSA public key of the latter c3-math-0479. In the same way, Ivan intercepts the ciphertexts of c3-math-0480 (he obtains 48) and c3-math-0481 (4). What are the values of c3-math-0482 and c3-math-0483?

Note: one may use the following results:

c3-math-0484 and c3-math-0485.

4.  Now that you have the length of the key, decrypt this text. Specify the key used for the encryption and the decryption (in the form of a string of characters). The repartition (in %) of the symbols used in a text written in French is summarized in Table 3.11.

Solution on page 311.

Table 3.11 Frequential Repartition of Symbols in a Text Written in French

18.35%

E

14.86%

S

6.97%

A

6.40%

N

6.23%

3.4.2.1 Efficiency and Robustness of RSA

Thanks to the algorithms that we have seen in the previous sections:

·     It is easy to generate large prime numbers, at least when accepting an error margin (see Miller–Rabin test, Section 1.3.4.1, on page 44). In the case of RSA, the error is not very dangerous. Indeed, if one commits an error and believes that p and q are prime numbers, one will soon realize that they are not: either the key d is not invertible or some parts of the decrypted message are incomprehensible. In this case, one is still able to change the RSA keys (by recomputing p and q);

·     Computing the pair c3-math-0486 is very easy: it is enough to apply the extended Euclidean algorithm;

·     Finally, encryption and decryption are performed using modular exponentiation. We have seen that such exponentiation can be performed quite efficiently.

The security provided by RSA mainly relies on the difficulty of factoring large integers. Indeed, if an attacker is able to factorize the number c3-math-0487 of the public key, then he is able to deduce directly c3-math-0488 and thus to compute the private key c3-math-0489 from the public key c3-math-0490 using the extended Euclidean algorithm. Therefore, if there exists a fast algorithm for factoring large numbers, breaking RSA also becomes easy.

After 20 years of research, no more efficient method than factoring n has been published to break RSA. However, the reciprocal proposition : “ if factoring large numbers is hard then breaking an RSA ciphertext is hard” has not been proved. In other words, today, one does not know if breaking RSA is as difficult as factoring n.

However, one is able to demonstrate the equivalence between “determining d from c3-math-0491” and “factoring n.” One way is trivial: knowing p and q, one is able to easily compute d for this is what is done during the key generation. Reciprocally, computing p and q from c3-math-0492 can be obtained:

·     by a deterministic algorithm if e is small (typically c3-math-0493—see Exercise 3.19);

·     by a probabilistic algorithm (a variant of Miller–Rabin algorithm) in general (Exercise 3.20).

Exercise 3.19 (Factoring n from ne, and d for e Small)

We consider the RSA cryptosystem, with c3-math-0494 as keys, where e is “small.”

1.  Prove that there exists c3-math-0495, such that c3-math-0496.

2.  One assumes that p and q are different from 2 and 3. Prove that c3-math-0497.

3.  Then propose an algorithm for factoring n.

Solution on page 313.

Exercise 3.20 (Factoring n from ne, and d)

One considers an RSA cryptosystem c3-math-0498. Let c3-math-0499, such that c3-math-0500 divides c3-math-0501 (In other words: c3-math-0502).

1.  Show that there exists a variant of the Miller–Rabin algorithm that enables one to factorize nthat is, to determine p and q. The algorithm should perform only a single attempt and should return either the factors of n or an error.

2.  What is the average number of calls of this algorithm that one should perform in order to find out the factorization of n?

Solution on page 313.

Hence, breaking RSA by computing the private key d from the public key c3-math-0503 is as difficult as factoring n (this problem is known to be difficult if p and q are very large). The confidence in RSA relies on this statement. Current factoring limits deal with numbers of approximately 232 decimal digits (so far, the current record6 is held by Kleinjung, Aoki, Franke, Lenstra, Thomé, Bos, Gaudry, Kruppa, Montgomery, Osvik, te Riele, Timofeev, and Zimmermann, obtained in December 2009, during the RSA-768 challenge). Today, one considers that a modulo n of 2048 bits is secure.

3.4.2.2 Attacks on RSA

The key generation which was presented previously is a simplified version, and in practice, one should be careful when generating the keys. These precautions are the results of attacks exploiting some relations between the RSA encryption parameters. Some attacks will be presented in the sequel in the form of exercises. Recommendations concerning the implementation of RSA-based cryptosystems (key generation, etc.) are summarized in the PKCS#1 standard. This standard is regularly updated (depending on the breakthroughs in the cryptanalysis of RSA).

Exercise 3.21 (Common Exponent Attack on RSA)

The RSA public keys of William, Jack, and Averell are, respectively, equal to c3-math-0504, and c3-math-0505. Joe sends the same message x to each one of them, with c3-math-0506, using their respective public keys.

Prove that Lucky Luke, who is able to obtain c3-math-0507, and c3-math-0508 on the network, can easily compute x.

Hint. One recalls (or assumes !) that for a and k integers, the method of Newton–Raphson (convergence of the sequence c3-math-0509) enables one to compute very quickly the value of c3-math-0510, with an arithmetic complexity bound of the order of c3-math-0511.

Solution on page 315.

Exercise 3.22 (Common Modulus Attack on RSA)

An implementation of RSA gives the same modulus n (product of two prime numbers) to Alice and Bob but with two different pairs of keys c3-math-0512 and c3-math-0513. Moreover, one assumes that c3-math-0514 and c3-math-0515 are coprime (it is true in general). Then, one assumes that Alice and Bob receive the same message M encrypted using their respective public key. Oscar intercepts both messages c3-math-0516 and c3-math-0517, which are known to be two ciphertexts of the same plaintext. Show that Oscar is then able to easily recover the message M.

Solution on page 316.

Exercise 3.23 (Well Chosen Ciphertext Attack on RSA)

Eve intercepts the ciphertext c sent by Bob to Alice: c3-math-0518. In order to decrypt c, Eve proceeds in the following way:

1.  Eve chooses an integer c3-math-0519 randomly and computes x:= c3-math-0520;

2.  Eve computes c3-math-0521;

3.  Eve asks Alice to sign y using her private key; Alice sends c3-math-0522 to Eve.

Show that Eve is now able to recover the message m sent by Bob. Morality?

Solution on page 316.

Exercise 3.24 (Factorial Attack or Pollard p-1 Attack)

Let p and q be the two prime numbers that are used to build the modulus c3-math-0523 of an RSA public key. The integer c3-math-0524 is even; let B be the smallest integer such that c3-math-0525 (factorial of B) is a multiple of c3-math-0526. In other words, there exists an integer c3-math-0527such that: c3-math-0528

1.   Let c3-math-0529 be a prime factor of c3-math-0530. Show that c3-math-0531.

2.  Let c3-math-0532 be an integer. Show that c3-math-0533.

3.  Let c3-math-0534; deduce that c3-math-0535 is a multiple of p.

4.  Let k be an integer; what is the cost of computing c3-math-0536?

5.  Deduce a bound on the cost of computing c3-math-0537 with respect to B and c3-math-0538.

6.  If c3-math-0539 is a divisor of c3-math-0540, with B small, then an attacker who knows n but neither p nor q is able to perform the following attack: assuming that C is a small upper bound of B (c3-math-0541), he computes

equation

7.  Prove that, if G is neither equal to 1 nor n, then an attacker has necessarily broken RSA with the modulo n.

8.           One assumes that c3-math-0543 is a divisor of c3-math-0544 with c3-math-0545; moreover, one assumes that c3-math-0546 is small (for instance c3-math-0547).

Show that this attack enables one to break RSA. Give an upper bound on the cost of this attack.

9.  What is the counter-measure ?

Solution on page 316.

3.4.3 El Gamal Encryption

While RSA relies on the problem of factoring integers, El Gamal encryption exploits the Discrete Logarithm Problem (DLP), which was studied in Section 1.3.3.3. The key generation is performed in the following way:

·     Let p be a prime number for which the DLP is difficult in c3-math-0548.

·     Let c3-math-0549 be a primitive root.

·     Let s be an integer and c3-math-0550.

·     Then, the public key is the triplet c3-math-0551.

·     The corresponding private key is c3-math-0552.

1.  El Gamal encryption: Let M be the plaintext one wishes to encrypt and let c3-math-0553 be a secret random number. One has

equation

2.  El Gamal decryption: For c3-math-0555, one defines

equation

Indeed c3-math-0557.

Therefore, it is possible to easily encrypt a message using modular exponentiation. Decryption without the private key is equivalent to solving the discrete logarithm.

However, when one has the private key, it is enough to perform a modular exponentiation, followed by the extended Euclidean algorithm to find the inverse of an element in c3-math-0558. The main interest of this kind of encryption is that, for a given level of security, it can work with smaller numbers – hence quicker –than RSA. Indeed, the complexity of the attacks performed on RSA and DLP make it that the size of the RSA parameters must be roughly twice as large as those required for an El Gamal encryption.

Besides, as DLP can be applied in any group, one is able to extend El Gamal to other groups in which the Discrete Logarithm Problem is much more difficult. For instance, this is the case for the group of points on an elliptic curve, see Section 1.3.6, in which index calculus attacks do not succeed. On the basis of the conjectured complexity bound of the currently known attacks on RSA and Elliptic Curve Cryptography (ECC), the NESSIE7 project as well as the U.S. NIST thus proposed the equivalence guide of Table 3.12.

Table 3.12 Conjectured Compared Security of Block Ciphers

 

80

112

128

196

256

bits

Skipjack

3-DES

AES-small

AES-medium

AES-large

RSA

1024

2048

3072

7168

13312

ECC

192

224

256

384

521

Indeed, the only known attacks for the discrete logarithm in elliptic curve groups are of baby-step/giant-step or Pollard's rho type, with complexity bound c3-math-0559, where n is the number of points in the curve. Thus, the recommended number of points in the elliptic curve (and therefore the cardinal number of the underlying finite field) should be around c3-math-0560, where t is the required security.