## Foundations of Coding: Compression, Encryption, Error Correction (2015)

### Chapter 3. Cryptology

Now, let us focus on extending the encoding scheme to cases in which the transmitted information contains a secret. We must assume that roads are not controlled and that the transmitted message can be read by anyone. However, only the recipient must be able to recover the information it contains, that is, the original message. The applications are obviously numerous for both military and commercial uses.

Cryptology—literally “ science of secret” - consists of two main components, namely *cryptography*, which deals with studying and building processes for information encryption, and *cryptanalysis*, which deals with analyzing (or attacking) ciphertexts in order to recover the hidden information. In this chapter, we present how to build secure systems, and this requires the knowledge of defense as well as attack techniques, in order to develop a global strategy of security.

First, we model the possible threats and derive general cryptography principles. Then, two types of ciphers are developped: secret key cryptography that mimics perfect secrecy in realistic settings and public key cryptography that eases key exchanges. Authentication of a sender and electronic signature are developped in Section 3.5, and we end this chapter with the study of several usual protocols for secure information exchanges.

### 3.1 General Principles

**3.1.1 Terminology**

The fundamental objective of cryptography is to enable two people, traditionally called *Alice* and *Bob*, to communicate through an insecure channel in such way that any opponent, *Oscar*, having access to the information circulating onthe communication channel is not able to understand what is exchanged. For instance, the channel can be a phone line or any other communication network.

The information Alice wishes to transmit to Bob is called *plaintext* (or *clear text*). It can be a text in English, numerical data or any other arbitrary information. Encoding, that is to say the transformation processing of a message *M* in order to make it incomprehensible, is called encryption or ciphering. Hence, one generates a *ciphertext C* from an *encryption function E* by . The process of recovering the plaintext from the ciphertext is called *decryption* or *deciphering* and it uses a *decryption function D*. Therefore, one must have . Thus, *E* is injective (a ciphertext is associated with at most one source message) and *D* is surjective or onto.

A *cryptographic algorithm* is an algorithm that computes the value of the mathematical functions related to encryption and decryption. In practice, security of messages is always provided by what we call the *keys*. These are parameters for functions *E* and *D*, which are denoted and , whose value belongs to a set called *key space*. Hence, one reaches the fundamental relation of cryptography (illustrated in __Figure 3.1__):

__3.1__

** Figure 3.1** Fundamental relation of cryptography

This kind of relation between the keys and —which are used in encryption and decryption—enables one to highlight two main classes of cryptographic systems:

** Figure 3.2** Principle of an integrity checking algorithm

· On the one hand, symmetric cryptosystems: the key is a shared secret between the sender and the recipient; such systems are described in Section 3.2.

· On the other hand, public key (or asymmetric) cryptosystems: no secret information is shared *a priori* among several protagonists; such systems are described in Section 3.2.

**3.1.2 What is the Use of Cryptography?**

Communications exchanged between Alice and Bob suffer from several threats (these threats are described a little further on, see Section 3.1.3). Cryptography provides several functionalities or services to overcome these threats, which are summarized in the acronym CAIN:

1. **Confidentiality** of the information stored or manipulated: this is done using encryption algorithms. Confidentiality consists of *preventing unauthorized access* to information for nonrecipients. They are able to read ciphertexts transmitted on the channel, but they are not able to decrypt them.

2. **Authentication** of the protagonists in a communication (or more generally in a resource) exchange. One must be able to detect identity theft. For instance, Alice may identify herself by proving to Bob that she knows a secret *S*, and that she is the only one who knows it.

3. **Integrity** of information stored or manipulated. The question here is to check that the message was not *falsified* during transmission (__Figure 3.2__). This integrity checking is not put on the same level as what we will see in Chapter 4. It rather deals with a voluntary and malicious modification of the information performed by a third party during the transfer on the channel. Generally, these modifications are hidden by the third party so that they are hardly noticeable. In __Figure 3.2__, for example, integrity checking of a message*M* is performed using a function *f* such that it is very difficult to find two messages and having the same image *A* through *f*.

4. **Nonrepudiation** of information. This is not a protection against a third party, but rather a protection for protagonists against each other. If Alice sends a message *M*, she must not be able afterward to pretend to Bob that she did not or that she has sent a message and that it was actually misunderstood. For this, one associates signature-based algorithms.This aspect is discussed in more detail in Section 3.5.

Up to a recent period, encryption only took confidentiality into account, and only symmetric methods were developed. During recent 40 years, new trends have emerged:

· authentication has become at least as important as secrecy. It is especially true in e-business: one must be able to prove that the order actually came from the person to whom the shipment is sent, in order to avoid disputes.

· some part of the key have to be public, so that the number of keys needed to communicate with a large number of interlocutors does not explode.

Possibly, another criterion is fundamental: one has to consider the computation speed for encryption and decryption and the size of the ciphertexts. As operations are performed on a large amount of data, the efficiency criterion is very important when encrypting—for example—audio or video streams “ on-the-fly” while using a minimum part of the bandwidth.

An ideal cryptosystem would solve all these issues simultaneously: it would use public keys, ensure secrecy, authentication, and integrity, and it would be fast enough. Unfortunately, up to now, there does not exist a single technique satisfying all these criteria. Common systems such as AES (Section 3.2.4) are efficient but they use only private keys; public key cryptosystems are able to ensure authentication but their computation is often costlier. They are thus usually inefficient for ciphering a large amount of data. This trade-off has induced the development of hybrid cryptographic protocols, such as Pretty Good Privacy (PGP) (Section 3.6.3), which are based on both symmetric and asymmetric cryptosystems.

In order to build good cryptosystems, one has to study the several types of attacks that Oscar might perform in order to recover the meaning of the exchanged messages. A good knowledge of all types of threats bearing upon the secret enables one to actually ensure secrecy.

**3.1.3 Main Types of Threats**

**3.1.3.1 Passive/Active Attacks**

First of all, one distinguishes *passive attacks*, in which Oscar only listens to messages exchanged between Alice and Bob, and *active attacks*, in which Oscar is able to modify the message during transmission. The first type of attack threatens confidentiality of information, whereas the second one might induce modifications of information and identity thefts.

**3.1.3.2 Cryptanalysis and Attacks on Encryption Schemes**

Generally, one assumes that Oscar knows which cryptosystem is used (according to Kerckhoffs' principles). One distinguishes the possible attack levels:

1. **Ciphertext only attack (COA).** Oscar only knows a ciphertext *C*.

2. Known **plaintext attack (KPA).** Oscar disposes of both a plaintext *M* and its ciphertext *C*.

3. Chosen **plaintext attack (CPA).** Oscar is able to choose the plaintext *M* and to get its ciphertext *C*.

4. Chosen **ciphertext attack (CCA).** Oscar is able to choose a ciphertext *C* and to get its plaintext *M*.

In all cases, ensuring confidentiality of the communications between Alice and Bob means that Oscar is not able to

· find *M* from ; the encryption scheme must resist attacks on ciphertexts.

· find the decryption method *D* from a sequence derived from a plaintext sequence ; the system must be resistant against attacks using plaintexts.

Encryption functions use keys as parameters. In order to “ break” an encryption algorithm, most of the time one tries to find the value of these keys. For instance, one may consider that a key is good if its entropy (see the definition in Chapter 1) is highbecause it means that it does not contain recurring patterns providing information on its form.

In addition to the attack levels we have just defined, Oscar may also use several general algorithms:

1. Brute **force attack.** This consists in trying all possible values for the key, that is, the exploration of the whole key space. The complexity bound of this method appears immediately: for a key on 64 bits, one has to try distinct combinations. On a computer trying one billion keys per second, it would take 584 years to be sure to find the key__ ^{1}__.

2. Known **sequence attack.** This kind of attack consists in assuming that part of the plaintext is known (for instance, it can be standard headers in the case of e-mails) and in starting from this knowledge in order to guess the key. This attack may be successful if the encryption algorithm preserves the regularities of the initial message.

3. Forced **sequence attack.** This method, which is based on the previous one, consists in leading the victim to ciphering a block of data whose content is known by the attacker.

4. Differential **analysis attack.** This attack uses the slight differences between successive messages (server ID (*logs*) for instance) to guess the key.