## Foundations of Coding: Compression, Encryption, Error Correction (2015)

### Chapter 2. Information Theory and Compression

### 2.5 Lossy Compression

**2.5.1 Deterioration of Information**

We have seen that lossless compression does not enable one to compress beyond the threshold of entropy, and that heuristics of entropy reduction enable one to lower this threshold by changing the model of the source. But none of these methods takes the type of file to be compressed, or its purpose, into account. Let us take the example of a digital picture. Lossless compressors will transmit the file without using the fact that it is actually an image, and the file will still be fully recoverable. But there might exist far more efficient formats for which, even if part of the information is lost, the difference with the original picture will never be *visible*.

Therefore, one will not be able to recover the complete initial files. However, it will be possible to visualize images with almost the same quality. This means that the quality that was formerly encoded in the initial file is superfluous, because it does not induce any visible difference. Thus lossy compressors are specific to the type of data one wishes to transmit or store (mostly pictures, sounds, and videos), and they use this information in order to encode the visual result of the file rather than the file itself. This type of compression is necessary for video formats, for example, in which the size of the data and the related computations are very large.

Lossy compressors will enable one to overcome the threshold of entropy and to obtain very high compression rates. Not only do they take the pure information into account but they also consider the way it is perceived by human beings, by modeling retinal or audio persistence, for example. One is able to measure the efficiency of these compressors with respect to the compression rate, but the quality of the resulting pictures can only be evaluated by the experience of the users. Therefore, we present here the standard formats that have been well tried. Besides, for audiovisual information, another important criterion will be coupled with the compression rate: decompression speed. In particular, for sounds or videos, hearing and visualization require very fast computations to read the compressed files. Thus the complexity of the decompression algorithms is a constraint as important as the size of the compressed messages when dealing with audiovisual information.

**2.5.2 Transformation of Audiovisual Information**

Compact formats for storing pictures, sounds, or videos always use extended sources. They do not encode the file pixel by pixel; rather, they take each pixel and its neighborhood into account in order to capture global parameters such as contours, regularities, or intensity variations. These data are treated in the form of pseudo-periodic signals. For instance, if one part of an image is a soft gradation, the frequency of the signal is low, whereas the frequency is high in another part of the picture where colors and textures vary often. This enables one to encode the low frequency zones very efficiently. Moreover, simply deleting the very high frequency zones is often unnoticeable and, hence, acceptable.

**2.5.3 JPEG Format**

The Joint Photographic Experts Group (JPEG) encoding format compresses static images with a loss of information. The encoding algorithm is complex and is performed in several steps. The main idea is that the colors of contiguous pixels in an image usually differ very little. Moreover, an image is a signal: instead of encoding the value of each pixel, one encodes frequencies (by a Discrete Fourier Transform, or a Discrete Cosine Transform in the case of JPEG).

**2.5.3.1 DCT-JPEG**

Indeed, one can consider an image as a matrix containing the color of each pixel. It appears that digital pictures are mostly composed of zones having low DCT frequencies when one considers the luma (or *luminance*, i.e., the brightness of the image, in a Luma/Chroma encoding) rather than the Red/Green/Blue (RGB) mode. There exist several formulas for switching from the RGB mode to the Luminance/Chrominance (YUV) color space (Y for the luma, U and V for the color information), depending on the physical model of light.

A classic formula for finding the luma sums weighted values of R, G, and B: $Y = 0.299\,R + 0.587\,G + 0.114\,B$. Then U and V are computed as scaled differences between Y and the blue and red values, using the scales $0.492$ and $0.877$, following the linear transformation $U = 0.492\,(B - Y)$ and $V = 0.877\,(R - Y)$.
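As an illustration, the switch from RGB to Luma/Chroma can be sketched in a few lines. The weights below are the common ITU-R BT.601 values, one instance of the "classic formula"; the exact coefficients used in the book may differ:

```python
# Sketch of an RGB -> YUV conversion with ITU-R BT.601 weights
# (an assumption: one common instance of the classic formula).
WR, WG, WB = 0.299, 0.587, 0.114  # luma weights, summing to 1

def rgb_to_yuv(r, g, b):
    """Convert one pixel from RGB to Luma/Chroma (YUV)."""
    y = WR * r + WG * g + WB * b   # luma: weighted sum of R, G, B
    u = 0.492 * (b - y)            # chroma: scaled blue difference
    v = 0.877 * (r - y)            # chroma: scaled red difference
    return y, u, v

# A pure gray pixel carries no chrominance information:
print(rgb_to_yuv(128, 128, 128))
```

As expected, a gray pixel yields $U = V = 0$: in Luma/Chroma mode, all of its information sits in the luma channel.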

Therefore, the idea is to apply the DCT (see Section 1.4.2.4) to the luma matrix.

However, the DCT transforms this matrix into another matrix containing the high frequencies in the bottom right corner and the low frequencies in the top left corner. Hence, the luma of __Figure 2.7__ is transformed accordingly by the DCT.

Moreover, the human eye gives more importance to brightness than to colors. When keeping only the first terms of the luma DCT decomposition (the most significant ones), one loses some information, but the picture remains recognizable.
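The transform itself can be sketched as follows; this is a naive (unoptimized) 2-D DCT on an 8×8 luma block with the usual JPEG normalization, written for clarity rather than speed:

```python
# Naive 2-D DCT-II on an 8x8 luma block, JPEG normalization.
import math

N = 8

def dct2(block):
    """Return the 2-D DCT of an 8x8 block of luma values."""
    def c(k):
        return 1 / math.sqrt(2) if k == 0 else 1.0
    out = [[0.0] * N for _ in range(N)]
    for u in range(N):
        for v in range(N):
            s = 0.0
            for x in range(N):
                for y in range(N):
                    s += (block[x][y]
                          * math.cos((2 * x + 1) * u * math.pi / (2 * N))
                          * math.cos((2 * y + 1) * v * math.pi / (2 * N)))
            out[u][v] = c(u) * c(v) * s / 4
    return out

# A uniform block concentrates all its energy in the top left
# (DC) coefficient; every other coefficient is essentially zero.
flat = [[128] * N for _ in range(N)]
coeffs = dct2(flat)
print(round(coeffs[0][0]))  # prints 1024
```

A uniform block, like a soft gradation, concentrates its energy in the top left corner, which is precisely why low frequency zones encode so efficiently.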

**Figure 2.7** The RGB image with white background (here in gray-levels)

This operation is called *quantization*, and it is the only step of the whole JPEG encoding scheme that loses information. The operation consists of rounding each DCT coefficient to $c_{ij} = \mathrm{round}(d_{ij}/q_{ij})$, where the $d_{ij}$ are the DCT values and *Q* $= (q_{ij})$ is the following quantization matrix:

We then obtain the above matrix *C*. In the example, one can notice that after quantization, many values are identical and close to zero. The choice of a quantization matrix with values close to 1 (respectively far from 1) enables one to increase (respectively to decrease) the detail level.
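The quantization step can be sketched as follows. The matrix below is the example luminance table given in Annex K of the JPEG specification; it may differ from the matrix *Q* shown in the book:

```python
# Quantization sketch: each DCT coefficient d[i][j] is divided by
# q[i][j] and rounded. Q is the Annex K example luminance table of the
# JPEG specification (an assumption: the book's matrix may differ).
Q = [
    [16, 11, 10, 16, 24, 40, 51, 61],
    [12, 12, 14, 19, 26, 58, 60, 55],
    [14, 13, 16, 24, 40, 57, 69, 56],
    [14, 17, 22, 29, 51, 87, 80, 62],
    [18, 22, 37, 56, 68, 109, 103, 77],
    [24, 35, 55, 64, 81, 104, 113, 92],
    [49, 64, 78, 87, 103, 121, 120, 101],
    [72, 92, 95, 98, 112, 100, 103, 99],
]

def quantize(d):
    """Round d[i][j] / Q[i][j]: the only lossy step of JPEG."""
    return [[round(d[i][j] / Q[i][j]) for j in range(8)] for i in range(8)]

def dequantize(c):
    """Approximate reconstruction performed by the decoder."""
    return [[c[i][j] * Q[i][j] for j in range(8)] for i in range(8)]

d = [[0] * 8 for _ in range(8)]
d[0][0], d[7][7] = 1024, 40        # large DC term, small high frequency
c = quantize(d)
print(c[0][0], c[7][7])  # prints 64 0
```

Note how the small high frequency coefficient is rounded away to zero: this is exactly where the loss of information, and the many identical values, come from.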

Finally, the picture is encoded as a sequence of numbers (a DCT quantization value followed by the number of consecutive pixels having the same value, according to the zigzag scan presented in __Figure 2.8__). Thanks to the many small values already noticed, one can hope that the entropy after this Run-Length Encoding (RLE) has been considerably reduced.
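The zigzag scan and the run-length encoding can be sketched as follows (the helper names are illustrative):

```python
# Zigzag scan of an 8x8 block followed by run-length encoding.
def zigzag_order(n=8):
    """Return the (i, j) visiting order of the zigzag scan:
    diagonals i + j in turn, alternating the traversal direction."""
    return sorted(((i, j) for i in range(n) for j in range(n)),
                  key=lambda p: (p[0] + p[1],
                                 p[0] if (p[0] + p[1]) % 2 else -p[0]))

def rle(values):
    """Run-length encode a sequence as (value, repeat count) pairs."""
    out = []
    for v in values:
        if out and out[-1][0] == v:
            out[-1][1] += 1
        else:
            out.append([v, 1])
    return [(v, n) for v, n in out]

# A quantized block whose only nonzero coefficient is the DC term
# collapses into just two pairs:
block = [[0] * 8 for _ in range(8)]
block[0][0] = 64
scan = [block[i][j] for i, j in zigzag_order()]
print(rle(scan))  # prints [(64, 1), (0, 63)]
```

Because quantization pushes the high frequency coefficients toward zero, the zigzag scan groups them into long runs, which is what makes this RLE so effective.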

**Figure 2.8** JPEG zigzag scan

Then, the JPEG algorithm ends with a statistical encoding, such as the Huffman or the arithmetic codings (__Figure 2.9__).

**Figure 2.9** JPEG compression

**Exercise 2.16 (JPEG)**

Suppose that we apply the JPEG algorithm on smaller blocks and that the luma value is computed on a given block.

1. Compute the DCT matrix of this block.

2. Apply the quantization , for .

*Solution on page 301*.

This is for the luma. For the chrominance, one can also use subsampling, as the human eye is less sensitive to the position and motion of color than to its luminance. The idea is to use the same *U* and *V* values for several consecutive pixels. A J:a:b subsampling represents a block of J×2 pixels in which all the luma values are stored, but only *a* chroma values for the first row of *J* pixels and *b* chroma values for the second row of *J* pixels. For instance, 4:4:4 represents the full resolution without any subsampling.
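Under this description, the storage gain of a J:a:b format over the full resolution can be computed with a small helper (a sketch; the function name is an assumption):

```python
# Storage gain of J:a:b chroma subsampling relative to full resolution:
# a J x 2 block stores 2*J luma samples plus (a + b) chroma (U, V) pairs.
def subsampling_gain(J, a, b):
    """Ratio of full-resolution size to subsampled size for a J x 2 block."""
    full = 2 * J * 3               # Y, U, and V for every pixel
    sub = 2 * J + 2 * (a + b)      # all lumas + (a + b) (U, V) pairs
    return full / sub

print(subsampling_gain(4, 4, 4))  # prints 1.0 (full resolution, no gain)
print(subsampling_gain(4, 2, 0))  # prints 2.0
```

So sharing chroma values across four pixels, as in 4:2:0, halves the raw size before any other compression step.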

**Exercise 2.17 (Subsampling Formats)**

For the following subsampling formats, give the compression gain with respect to the full resolution and their horizontal and vertical resolutions: . *Solution on page 301*.

Notice that in the JPEG-2000 standard^{1}, which replaced the DCT-JPEG standard, the cosine transform and quantization are replaced by a wavelet transform in which only some levels are preserved. The number of levels or bands one chooses to keep directly determines the compression rate one wishes to reach.

**2.5.3.2 Watermarking JPEG Images**

A classical way of embedding some information into JPEG images is to use the lower bits of the quantized coefficients. Indeed, by construction, JPEG quantization uses a modification of the luma that induces little visual distortion. Therefore, a modification of the lower bits of the large coefficients in the *C* matrices should remain invisible.

For instance, suppose that *r* is an acceptable distortion rate for a luma coefficient (i.e., coefficients modified by at most that rate will remain visually close to the original). As modifying the *b* lower bits of a coefficient changes its value by at most $2^b - 1$, any coefficient larger than $2^b/r$ can embed up to *b* bits of additional information.

The watermarking algorithm can be introduced within the JPEG compression:

1. Transform the original image into its quantized matrices. Use the rate *r* to identify the coefficients capable of storing some additional information.

2. The lower bits of the large enough coefficients of every *C* matrix are set to zero. Then additional bits can be embedded at the positions identified in step 1, taken over all the *C* matrices comprising the image. These bits can be the result of a compression, potentially followed by an encryption and an error-correction coding, depending on the required property.

3. Terminate the JPEG compression with the lossless compressions (the RLE zigzag and the statistical encoding).

The extraction of the watermark is then performed during JPEG decompression: after the lossless decoding, the watermark is recovered by identifying the large enough coefficients with respect to *r* and extracting their lower bits.
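The embedding and extraction steps above can be sketched as follows on a flat list of quantized coefficients. The parameters `k` (number of lower bits used) and `threshold` (what counts as a "large enough" coefficient for the rate *r*) are assumptions of this illustration:

```python
# Illustrative sketch (not the standard's exact procedure): hide bits in
# the k lower bits of coefficients that stay above a threshold once
# those bits are cleared, so embedder and extractor agree on positions.
def large_enough(c, k, threshold):
    """True if c keeps capacity after its k lower bits are cleared."""
    return (abs(c) >> k) << k > threshold

def embed(coeffs, bits, k=1, threshold=8):
    """Hide `bits` in the k lower bits of the large coefficients."""
    out, it = [], iter(bits)
    for c in coeffs:
        if large_enough(c, k, threshold):
            sign = -1 if c < 0 else 1
            mag = (abs(c) >> k) << k            # step 2: clear lower bits
            payload = 0
            for _ in range(k):                  # next k watermark bits
                payload = (payload << 1) | next(it, 0)
            out.append(sign * (mag | payload))
        else:
            out.append(c)                       # too small: left untouched
    return out

def extract(coeffs, k=1, threshold=8):
    """Recover the embedded bits from the large coefficients."""
    bits = []
    for c in coeffs:
        if large_enough(c, k, threshold):
            for shift in range(k - 1, -1, -1):
                bits.append((abs(c) >> shift) & 1)
    return bits

marked = embed([64, 3, -27, 10], [1, 0, 1])
print(extract(marked))  # prints [1, 0, 1]
```

The test on the *cleared* magnitude matters: it guarantees that the extractor selects exactly the same coefficients as the embedder, even though their lower bits have changed.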

The classical steganalysis consists of comparing the entropy of the suspected image to that of the average entropy of similar images. Another approach uses the statistical tests of Section 1.3.7.4.

**Exercise 2.18 (Steganalysis of Watermarked JPEG Images)**

1. An image without any hidden data should have similar luma values for adjacent DCT coefficients. Design a way to build an expected distribution of the luma using this fact.

2. Which statistical test would you then use to discriminate an image with an embedded content?

*Solution on page 302*.

**2.5.3.3 Secure JPEG-2000 (JPSEC)**

Part 8 of the JPEG-2000 standard provides tools to generate, consume, and exchange secure JPEG-2000 codestreams. This part is referred to as JPSEC; it defines a framework that includes confidentiality, integrity, authentication, access control, and Digital Rights Management (DRM).

This is achieved using the various techniques presented in this book such as encryption, electronic signature, or watermarking.

A first approach could be to treat the JPEG-2000 bitstream as an input to any of the security algorithms. The idea of JPSEC is instead that these security techniques can be performed during JPEG compression, as was shown, for example, with the watermarking of Section 2.5.3.2. In particular, the JPSEC bitstream still retains the rich structures of the JPEG-2000 standard, such as tiles, layers, and resolutions.

The JPSEC syntax uses three types of security tools defining which protection methods are used: template tools, registration authority tools, and user-defined tools. It defines a Zone of Influence (ZOI) to describe the part of the image to which a security tool is applied. The standard already proposes the following tools:

· A flexible access control scheme;

· A unified authentication framework;

· A simple packet-based encryption method;

· Wavelet and bitstream domain scrambling for conditional access control;

· Progressive access;

· Scalable authenticity;

· Secure scalable streaming.

For instance, multilevel access control could use several keys to decipher selected levels of detail in the JPEG image using the JPEG layers; partial encryption could reveal only parts of the image; and so on.

**2.5.4 Motion Picture Experts Group (MPEG) Format**

The MPEG format defines the compression of animated images. The encoding algorithm uses JPEG in order to encode each image, and it also takes into account the fact that two consecutive images in a video stream are very close. One of the singularities of the MPEG standards is motion compensation (the movement can be a zoom, for instance) from one image to the next. An MPEG-1 stream contains four types of images: images in format I (JPEG), images P encoded using differential encoding (one only encodes the differences between the image and the previous one: motion compensation makes consecutive images very close), bidirectional images B encoded using differential encoding with respect to both the previous image and the next one (in case the next image is closer than the previous one), and finally low resolution images used for fast forward in videotape recorders. In practice, with P and B images, a new full I image every 10–15 images is enough.

As video streams consist of images and sounds, one must also be able to compress sounds. The MPEG group defined three formats: MPEG-1 Audio Layer I, II, and III, the latter being the well-known MP3 format. MPEG-1 is the combination of image and sound compressions, coupled with a time-stamp synchronization and a system reference clock as presented in __Figure 2.10__.

**Figure 2.10** MPEG-1 compression

Since 1992, the MPEG format has evolved from MPEG-1 (352 × 288 pixels and a throughput of 1.5 Mbits/s) to MPEG-2 in 1996 (with four different resolutions, from 352 × 288 up to 1920 × 1152 pixels, plus an additional subtitle and language multiplexed stream: this format is currently used in DVDs) and then to MPEG-4 (a format for videoconferencing and DivX, in which video becomes an object-oriented scene), started in 1999.

**Exercise 2.19 (PAL Video Format)**

PAL, Phase Alternating Line, is an analog color television coding system using images of *720* pixels per line and *576* lines. Its frame rate (number of images per second) is usually *25* images per second.

1. If each pixel uses a color encoding of *1* byte per color with a subsampling (see Exercise 2.17), what would be the size of an uncompressed, *90*-min long movie?

2. Suppose the same movie is compressed in a DivX format and fits onto a CD-ROM. What is the compression gain?

*Solution on page 302*.

Sound is decomposed into three layers: MP1, MP2, and MP3, depending on the desired compression rate (i.e., the loss of information). In practice, compression is performed with a factor of *4* for MP1, *6*–*8* for MP2 (this format is used in Video CDs, for instance), and an even higher factor for MP3, with respect to the analog signal.

In __Figure 2.11__, we give an idea of the MP3 transform: first of all, the analog sound is filtered and modified by the Fourier transform (with loss) and simultaneously by a Discrete Cosine Transform.

**Figure 2.11** MP3 sound compression

Then, the combination of these two transforms enables one to perform a quantization and control the result in order to preserve only the main audible part. At the end of the algorithm, statistical compression is obviously performed in order to reduce the file size and reach its final entropy.

^{1}__http://www.jpeg.org/jpeg2000__.