gut-aha. s, .h 5;? in} A. o r-oaucma‘nv i. ..., .2 .. This is to certify that the thesis entitled RATE DISTORTION BASED RATE CONTROL METHODS FOR SCALABLE VIDEO presented by SHARADHA PARTHASARATHY has been accepted towards fulfillment of the requirements for ”-5- degree in ELECTRICAL ENGR. /‘/// / / // 1:, 1’ / 1377/ K: 1:11/4 ; / . z 1 V L / f, (2" / Major professor ’/ f y. Date 7/23/02 0-7639 MS U is an Affirmative Action/Equal Opportunity Institution _ LIBRARY Michigan State University PMCE IN RETURN Box to remove this checkout from your record. To AVOID FINES return on or before date due. MAY BE RECALLED with earlier due date if requested. DATE DUE DATE DUE DATE DUE 6/01 c-JCIRCJDateDuopGS-uts RATE-DISTORTION BASED RATE-CONTROL METHODS FOR SCALABLE VIDEO By Sharadha Parthasarathy A THESIS Submitted to Michigan State University in partial fulfillment of the requirements for the degree of MASTER OF SCIENCE Department of Electrical and Computer Engineering 2002 ABSTRACT RATE-DISTORTION BASED RATE-CONTROL METHODS FOR SCALABLE VIDEO By Sharadha Parthasarathy Scalable video coding techniques are used to facilitate the delivery of streamed video content over networks with varying channel characteristics and unpredictable quality-of-service ((208) such as the Internet and wireless Local-Area-Networks (LANs). Rate control methods are typically used with scalability solutions to adapt the video to variations in bandwidth availability. In this thesis, we develop two rate- distortion (RD) based rate-control algorithms for scalable video coded using the new fine-grained scalability (F GS) technique incorporated in the MPEG-4 video-coding standard. First, we show that the distortion measures, which are based on a square error criterion, can be computed using a very simple approach in the transform (DCT) domain. Second, we describe the two RD algorithms and their implementations within the transform domain of the FGS MPEG-4 framework. Finally, we present the results of experiments conducted to test the performance of the RD-based algorithms against the existing rate-control method. We find, upon analysis of these results in the transform domain, that the proposed rate-control algorithms provide optimum (convex) RD curves for a wide range of bitrates and video sequences. Improvements shown in the RD curves are particularly salient over bitrate ranges that can be supported over emerging Internet access technologies (e. g., DSL, cable modems, and wireless LANs). To My Parents and Meera, For all your love and support iii ACKNOWLEDGEMENTS I would like to express my sincere gratitude to my advisor, Dr. Hayder Radha, for providing me with the opportunity to work with him. Without his guidance and support over the last two years, this thesis would not be possible. I would also like to thank the members of my defense committee, Dr. John R. Deller and Dr. Percy Pierre, for their valuable comments and advice. Thanks are also due to all the members of the WAVES lab at Michigan State University for their constructive reviews of my work and more importantly for their fi’iendship and support. I would specially like to thank Apama Gurijala for always being there for me. Finally, I would like to thank my parents and my sister for teaching me the importance of hard work and patience, for their perpetual belief in my abilities and for their irreplaceable support through all these years. iv TABLE OF CONTENTS List of Figures 1 Introduction 1.1 Objective Of Thesis Work 1.2 Streaming Video Over The Internet 1.3 Scalability Techniques 1.3.1 Fine Granularity Scalability (FGS) 1.4 Summary And Thesis Organization 2 Background 2.1 Rate Distortion Theory 2.2 Image And Video Compression 2.2.1 Compression Techniques 2.2.2 Video Coding Schemes 2.2.3 Distortion Measures 2.3 MPEG Video Coding Standard 2.3.1 Compression 2.3.2 Frame Types 2.3.3 Transform Coding 2.4 MPEG-4 Scalability Structure 2.5 MPEG-4 Video Coding Framework 2.5.1 The Encoder 2.5.2 The Streaming Server 2.5.3 The Decoder vii \ILIIUJNH 10 12 13 14 15 l6 16 18 19 20 22 22 27 28 2.6 Summary 3 Bitplane Rate-Distortion Based Rate-Control Algorithm 3.1 Rate And Distortion Measures 3.2 The Rate Control Algorithm 3.2.1 Mathematical Formulation 3.2.2 Description 3.3 Experimental Analysis 3.4 Conclusions 4 Cross-Bitplane Rate-Distortion Based Rate-Control Algorithm 4.1 Developing The Algorithm 4.1.1 Description 4.2 Experimental Results 4.3 Conclusions 5 Summary Of Conclusions 5.1 Future Work References vi 29 31 34 37 38 41 44 65 69 72 73 81 83 84 88 LIST OF FIGURES Figure 2-1 Typical (optimal) RD tradeoff curve 11 Figure 2-2 4:2:0 composite color standard 17 Figure 2-3 I, P and B Frames in MPEG [30] 19 Figure 2-4 FGS scalability structure [30]. 21 Figure 2-5 System configuration for streaming video over the Internet [42] 22 Figure 2-6 Basic (SNR) encoder for the FGS base and enhancement layers. It is clear that the added complexity of the FGS enhancement-layer encoder is relatively small [18]. 23 Figure 2-7 An example of different number of bitplane levels for the different color components. 25 Figure 2-8 an example of the number of passes required to code a block. This number depends on the arrangement of Is and Os in the block. A continuous, uninterrupted string of zeros is called a run. Here, number of runs =4. 26 Figure 2-9 Examples of the FGS scalability structure at the encoder (left), streaming server (center), and decoder (right) for a typical unicast Internet streaming application. The top and bottom rows of the figure represent base-layers without and with Bi-directional (B) frames, respectively. Note that only part of the enhancement layer is transmitted in real time. [18] 28 Figure 2-10 Basic (SNR) FGS decoder for the base and enhancement layers [18]. 29 Figure 3-1 Pruning of macroblocks- width of the macroblocks represents the number of bits needed to encode that macroblock (a) Raster-scan rate—control (b) R—D based rate-control 33 vii Figure 3-2 Different rate-distortion curves Figure 3-3 Akiyo I frame Qp=8 Figure 3-4 Akiyo P frame Qp=8 Figure 3-5 Akiyo B frame Qp=8 Figure 3-6 Akiyo I Frame Base layer=100k Figure 3-7 Akiyo P Frame Base layer=100k Figure 3-8 Akiyo B Frame Base layer=100k Figure 3-9 Foreman I frame for Qp=28 Figure 3-10 Foreman P frame for Qp=28 Figure 3-11 Foreman B frame for Qp=28 Figure 3-12 Foreman 1 frame Base layer=100k Figure 3-13 Foreman P frame Base layer=100k Figure 3-14 Foreman B frame Base layer=100k Figure 3-15 Mobile I frame for Qp=28 Figure 3-16 Mobile P Frame Qp=28 Figure 3-17 Mobile B Frame Qp=28 Figure 3-18 Mobile I frame Base Layer=100k Figure 3-19 Mobile P Frame Base Layer=100k Figure 3-20 Mobile B frame Base layer=100k 35 47 48 49 50 51 52 53 54 55 56 57 58 59 6O 61 62 63 64 Figure 4-1 Pruning of macroblocks- width of the macroblocks represents the number of bits needed to encode that macroblock (a) BPRD 4020,, is selected from bitplane level=j (b) CBPRD- 2m macroblocks on different bitplane levels viii is selected from 68 Figure 4-2 Akiyo I, P and B frames for Qp=8 Figure 4-3 Akiyo I, P and B frames for Base layer=100k Figure 4-4 Foreman I, P and B frames for Qp=28 Figure 4-5Mobi1e I, P and B frames for Qp=28 Figure 4-6 Mobile I, P and B frames with Base layer=100k ix 75 76 77 78 8O 1 Introduction The advent of the World Wide Web has essentially transformed the Internet into the world’s largest public network. The popular applications have evolved from being file transfer and e-mail to video conferencing, Internet telephony, Web searching and multimedia streaming. For streaming applications in particular, a primary consideration is the necessary ability to transmit data at sustained and relatively high rates. Recent technological advances enable the network to handle these data rates. Some examples of the new tools proposed in this regard include scalability and rate control. In this chapter, we first state the objective of the work presented in this thesis. We then provide an introduction to streaming video over the Internet and briefly examine certain constraints posed by this network on such an application. We also discuss the scalability techniques used to work around these constraints. We conclude this chapter with an overview of the organization of the rest of this thesis. 1.1 Objective Of Thesis Work One of the scalability techniques proposed for the MPEG-4 video-coding standard is fine-granularity scalability (FGS)[17]. This technique involves coding the video into two layers- a base layer and a single enhancement layer. The base-layer is the non-scalable video component. It determines the minimum quality of the reconstructed video at the receiving l end. The enhancement layer is the scalable video component. In MPEG-4 FGS, the enhancement layer is partially transmitted and its size depends on the available bandwidth for transmitting the video. Rate-control is essentially carried out in the FGS enhancement layer by means of a brute-force raster-scan bit allocation method. Such a rate-control method has very low complexity but it is not optimal. In this thesis, we develop and analyze the performance of two rate—control algorithms that result in optimal rate-distortion performance for the F GS enhancement layer. The algorithms are operational rate-distortion algorithms, in the sense that they are designed specifically for the MPEG-4 FGS encoder-decoder system. The analysis and development presented in this thesis is carried out in the transform domain. The distortion measures selected for the two algorithms are designed to minimally increase the computational complexity. The presented RD-based algorithms are easy to implement and have low complexity. They are also independent of the rate-control scheme used in the base layer. 1.2 Streaming Video Over The Internet The Internet has its roots in the ARPANET[26]. It was initially designed for networking research and for scientists to exchange information amongst themselves. The primary applications were download-based, wherein the transfer and viewing are temporally sequential. Real-time multimedia applications differ from the traditional Internet applications in that viewing takes place at the same time as downloading. For example, in the case of streaming applications, such as video streaming, the user views the video as the transfer occurs. In other words, transfer and viewing are concurrent. The best effort model on which 2 the Internet is based[41], does not offer data delivery reliability. Hence, there is a variation in the probability of packet loss. This could lead to an unacceptable quality in a video streaming application. Another concern in the streaming of video over the Internet is high bandwidth variation. The heterogeneous nature of the access-technologies of the receivers (analog modems, cable modems, DSL etc.) and congestion over the core network are two of the primary reasons for non-constant bandwidth availability. It is necessary to overcome the problems posed by unreliable packet transmission and bandwidth availability for dependable video streaming quality. The objective of many contemporary research efforts in this field has been to find Internet video coding solutions so as to make Internet video quality comparable to television telecasts. One generic framework that addresses the primary concerns of video coding and networking is scalability. A video application that is to be streamed over the Internet must “adapt” to changes in the network conditions- specifically to the variations in the available bandwidth. This is made possible by using scalable video. Scalable video coding primarily aims to provide minimum real-time processing of the video, high adaptability of the coded video to network conditions, low complexity decoding and resilience to packet-losses. 1.3 Scalability Techniques Scalability is one of the most desirable properties in a flexible video encoder-decoder module. The basic idea here is to use layers of video. The layers comprise of a single non— scalable base layer and one or more enhancement layers. If a user is incapable or unwilling to reconstruct the video at its complete resolution, scalability allows that user to decode subsets of the layered bitstream. Therefore, he or she can display the video at a lower spatial or temporal resolution or with lower quality. Another important purpose for layering of video is that it makes the bitstream suitable for prioritized transmission. Some common tools used in recent video coding standards (such as MPEG-2) are: data partitioning, SNR scalability, spatial scalability and temporal scalability[34][28]. Hybrid scalability is also supported. Briefly, the salient features of these scalable techniques are: Data Partitioning: Here, the bitstream is partitioned into important error-prone data (e. g., header information and motion vectors) and less-important data (e.g., transform coefficients). This method, which allows for different protection levels during transmission for the data types of data in the compressed video bitstream, has low implementation complexity. It helps in concealment of transmission errors and channel errors. SNR Scalability: In this method, video layers at the same spatial resolution are generated. These layers, however, have different video qualities. The lower layer (usually known as the base layer) provides a basic video quality and the enhancement layers, when added to the base layer, augment the video quality. The layers are generated from the same source. Spatial Scalability: This scalability technique involves creating different video layers having different spatial resolution. The layers are again, generated from a single video source. The base layer provides the basic spatial resolution. The enhancement layer(s) encode the interpolated video from the base layer. Temporal Scalability: Here, the base layer has the basic temporal resolution. The enhancement layer(s) are coded using temporal prediction with respect to the base layer. At the decoder, the layers are temporally multiplexed to yield the full temporal resolution of the video source sequence. 1.3.1 Fine Granularity Scalability (FGS) In the layered coding schemes mentioned above, the enhancement layer is similar to the base layer in the sense that, by itself, it is non-scalable. In other words, if the enhancement layer is not entirely received, it cannot be used to enhance the video quality at all. The desired objective is to have a video coding scheme that provides scalability progressively through the enhancement layer. A new scalability technique called fine-granularity-scalability (FGS) was proposed in [17] that addressed the said objective. This technique has been included in the MPEG-4 standard for video coding[21][18][42]. The major difference between this technique and the other scalable coding techniques is that the enhancement layer can be truncated into any specified number of bits within every frame. This allows for partial enhancement proportional to the number of bits decoded for each frame. The FGS base and enhancement layers employ a video compression algorithm wherein the original video frames are first transform coded using the discrete cosine transform (DCT). The DCT coefficients are then quantized and run-length encoded. The base layer is coded at a constant bitrate by means of a rate-control algorithm. A rate control algorithm is fundamentally a bit allocation algorithm, which guarantees that the average bitrate of the 5 compressed video is a coded at a desired bitrate. This enables any video transmitter (e. g., an Internet video server) to send the compressed bitstream at that constant rate. One rate—control mechanism that is used for the F GS base layer is the MPEGZ-TMS (test model 5) algorithm[39]. In the base layer, the DCT coefficients are first subject to quantization. The quantized values are subsequently run-length coded to form the base layer bitstream. For every frame of the video sequence, the difference between the original frame and the frame reconstructed afier inverse quantization gives what is called the residual frame. The residual frame is also DCT coded. The residual DCT coefficient for every pixel of every frame is entropy encoded to form the FGS enhancement layer. The entropy coding method used for the FGS enhancement layer is bitplane coding. In this method, each residual DCT coefficient value is considered to be an integer that consists of several bits. Therefore, the residual DCT coefficients, which belong to a given FGS enhancement-layer frame, form multiple bitplanes. These DCT bitplanes, which range from the most-significant-bit (MSB) plane to the least-significant one, give the desired SNR (quality) scalability of the FGS structure. Details of this coding method are explained later in Chapter 3. F GS enables the separation between the encoding process and the transmission process. In other words, the encoder compresses the bitstream prior to the time when the stream is sent to the receiver(s). This separation is a required feature for applications that rely on streaming pre-compressed video that is already stored in a web-server. When the video has to be streamed, the base layer is sent through completely using the constant bitrate that it has been coded at. An FGS-compatible streaming server sends the appropriate portion of the 6 enhancement layer in a raster-scan order, starting from the most-significant-bit (MSB) plane of that layer. When the server reaches the bandwidth limit (that is available between the transmitter and the receiver), it discards the remainder of the enhancement layer. This process represents a rate-control procedure that is very simple to implement, but is not optimal. In this thesis, we develop and analyze the performance of an optimal rate-distortion (RD) based rate-control mechanism for the FGS enhancement layer. Based on this framework we present two RD-based algorithms. These algorithms are easy to implement and have low complexity. They are also independent of the rate-control scheme used in the base layer. 1.4 Summary And Thesis Organization Internet content developers have increasingly used multimedia to make information more interactive and accessible. There are a growing number of web sites that make available the option of streaming applications, particularly streaming of video content. Considerable research has been done towards improving the quality of streamed video [2][3][l7][37][41][34]. Scalable video coding has been adopted to improve the overall performance and quality of the video received by the end user. Fine-Granularity Scalability (F GS) is used with the MPEG-4 video-coding standard for this purpose. Simple rate control algorithms have been defined and developed for the FGS base layer. The enhancement layer is cut using a coarse raster-scan approach. In this thesis, we introduce a rate-distortion based solution for optimal rate-control of the FGS enhancement layer. The remainder of this document is organized as follows. Chapter 2 provides a short discussion of classical rate—distortion theory. It also explicates the need for image data compression and lists some common encoding techniques. This is followed by deliberations on the issue of determining a suitable measure of distortion. A few important aspects of the MPEG video-coding framework are emphasized. Also, an overview of the F ine-Granularity- Scalability technique used in the MPEG-4 video-coding standard is presented in this chapter. This includes a discussion of the bitplane-coding method used in FGS and some important features of the F GS encoder and decoder. In Chapter 3, the mathematics and logistics for the bitplane rate-distortion based rate-control (BPRD) algorithm are described in detail. These are accompanied by a comprehensive discussion of experimental results. In Chapter 4, the mathematical formulation, the algorithm and the results of experimental analysis of an extension to the BPRD, called the cross-bitplane rate-distortion rate-control (CBPRD) algorithm, are elucidated. The final chapter provides a summary of the conclusions made from the analysis of the two algorithms proposed in this thesis and states the possible future work in this area. 2 Background The need for effective and standardized image compression procedures has been amplified by the rapid growth in digital imaging applications such as multimedia, teleconferencing and high-definition television (HDTV). In the past few decades, standards have emerged for still images (JPEG) and motion video (MPEG). In the case of transmitting video over a network such as the Internet, a primary concern is compressing the video to fit within the available bandwidth. A compression scheme typically introduces some error into the video. It is essential that an optimal balance be struck between the rate to which the video is restricted and the distortion introduced by such a restriction. A branch of classical information theory, called rate-distortion theory addresses this optimization problem. The first part of this chapter contains a simple explanation of rate-distortion theory and its relation to the compression of image data. Some of the techniques used for still image compression and video coding are discussed. Also, the problem of determining a suitable distortion measure for different types of image data is studied. In the second part of this chapter, a few important aspects of the MPEG video-coding framework in general and the MPEG-4 standard in particular are emphasized. MPEG-4 video is intended to provide standardized technology for efficient storage, transmission and manipulation of video data in the broad spectrum of multimedia applications. Instead of being application specific, the solutions are presented in the form of tools and algorithms that have functionalities common to a set of applications. Efficient compression, object scalability, spatial and temporal scalability and error resilience are examples of such functionalities. Our focus in the later sections of this chapter is specifically the fine- granularity scalability (FGS) scheme included in MPEG-4. 2.1 Rate Distortion Theory Rate-Distortion theory can be traced back to Shannon [5][6]. In his Coding Theorem, Shannon states that a source with an entropy rate of H can be transmitted reliably over any channel of capacity C subject to the condition that C>H. The converse of this theorem states that not all the source data can be recovered reliably when H>C. It follows from this that in order to recover all data correctly, we can either decrease H or increase C. We typically assume that the channels are such that they have the minimum possible H and the maximum possible C but the condition of greater entropy (H>C) persists. In such a situation, we attempt to preserve those aspects of the source data that are most essential. In other words, we attempt to minimize the distortion subject to the condition that the rate of transmission cannot exceed the channel capacity. This leads to a trade-off between the rate at which information is provided at the source output and the fidelity with which this output can be reconstructed at the receiving end, on the basis of the information provided. This trade-off is mathematically modeled in the context of information theory by a discipline known as Rate- Distortion theory [35]. Thus, rate-distortion theory is concerned with maximally reducing the redundancy from a source (or the number of bits necessary to represent the source IO information) for a given reproduction quality. A graphical representation of a typical rate- distortion trade-off is shown in Figure 2-1 Typical Rate-Distortion Curve 0.4 I V j cask - l 0.3 ~ \ -‘ \ 0.25 ~ \\ 5 E3 \\ 'O \ "’ 0.2 ~ \ .. 35’ “x E ‘x 015 " \ -i 0 1 \\“\ —1 .\\ D 05 ’- \\\‘\ T “K 0 I 1 l 1 I l 1 1 1.5 2 2.5 3 3.5 4 4.5 5 5.5 Bitrale bps Figure 2-1 Typical (optimal) RD tradeofl' curve If the bitrate R at which information is provided exceeds the entropy H, then no information loss occurs and hence no distortion. The minimum distortion that can be obtained at any rate between R=H and R=0 increases from D=O to some value D=Dmax. If no information is transmitted (i.e., when R=0), the receiver can estimate the source by using its mean value. In this case, the mean-square distortion D=Dmax is the variance of the source. This distortion is the minimum possible value that can be made in the total absence of information about the output from the source. For a given available rate Rmax, the objective ll of an RD-based algorithm is to identify a subset X ' of the source data X that can be represented with R (or fewer) bits while minimizing some distortion measure D: X‘ =arg{ min 00292)} (2.1) X’ RSRmax We now proceed to look at the concept of rate-distortion in the context of coding image and video signals. 2.2 Image And Video Compression The volume of data associated with visual information is extremely high. The very conversion of an analog video signal into a digital signal calls for a tremendous increase in required transmission bandwidth. For instance, a 4 MHz television signal sampled at Nyquist rate with 8 bits per sample would require a bandwidth of 32 MHz when transmitted using a digital modulation scheme like phase-shift keying (PSK). Thus, the analog to digital conversion results in almost an eightfold increase in bandwidth [1]. Also, video image sources generate data at very high rates. Typical motion data images like television images are generated at rates greater than 10 million bytes per second. Other sources produce images at even higher bit rates. Evidently, transmission of such data requires large channel capacity, which can be very expensive and not available to all users in many applications like streaming video over the Internet. There is, therefore, a need to reduce the data volume and accommodate the data within the available bandwidth. Image data compression [2 techniques are applied to compensate for the increase in bandwidth caused by digital conversion of the video signal. They do so by reducing the number of bits required to transmit such data with tolerable loss of information. 2.2.1 Compression Techniques Compression can be achieved by the use of “lossless” techniques. Here, the data obtained after decompression are an exact copy of the original (e.g. staple software tools like zip or compress). Lossless compression schemes can be classified as spatial coherence based encoding (e.g. run-length encoding and statistical encoding), entropy encoding (e.g. Huffman encoding, arithmetic encoding) and codebook encoding (e.g. LZW coding). Details on these compression schemes are in [27][24]. Such techniques are independent of the type of information. They are only concerned with how the information is represented. They are applicable in cases where a perfect reconstruction of the source is an essential requirement. Lossless compression techniques are mainly used for compressing digital manuscripts etc. When image or video data are being handled, the large volume of the data, the bandwidth and storage requirements limit the performance of a lossless compression method. In such a scenario, “lossy” compression techniques are a preferred alternative [43]. Compression is said to be lossy if the decoded image is not an exact replica but is instead an almost indistinguishable reconstruction of the original image (we can recreate an image that is “indistinguishable” from the original if the properties of the human visual system are correctly exploited). Here, source fidelity is compromised for a better and more feasible coding rate. This compromise or trade—off is exactly the rate-distortion trade-off [3]. The I3 trade-off is between the number of bits used to represent the image (the rate) and the fidelity of the reconstructed representation (the distortion). Lossy encoding is defined as a data compression algorithm that loses some of the input data during encoding in order to achieve a better compression ratio. Some common lossy compression schemes are predictive coding and transform coding [32]. 2.2.2 Video Coding Schemes Digital video coding technology has diversified and it is targeted for a plethora of emerging applications, such as video on demand and digital TV/HDTV broadcasting. The rising commercial interest in video communications posed the need for international image and video coding standards. Today, there are various video coding standards available. These standards were developed in conjunction with the development in coding, transmission, video storage and VLSI design technologies. The Study Group XV of the Community Colleges for Innovative Technology Transfer (CCITT)l proposed the first international standard for video coding, H.120, in l984[8]. With advances in image compression techniques and implementation technology, other standards have been proposed and been more commercially successful. These include H.261, H.263, Motion JPEG, MPEG-1, MPEG-2 and MPEG-4 [33],[31]-[10],[16]. The JPEG and MPEG ' The CCITI‘ was re-named as the International Telecommunications Union — Telecommunications (ITU- T) sector in 1993. I4 visual coding standards were developed under the auspices of the International Standard Organization (ISO). In section 2.3, we review some of the characteristics of the MPEG video coding standards. 2.2.3 Distortion Measures Once an image has been converted into a digital format and is subjected to lossy compression, the question of how to measure the trade-off in the fidelity of the decompressed representation arises. Extensive and continuing studies have been performed for a suitable measure of the distortion for images and video. An objective distortion measure that is widely used and accepted is the mean-squared error (MSE). It has been shown that systems that are optimized for MSE performance also give fairly good results in terms of the perceptive quality of the image [12][40]. The Mean-Squared Error is the sample average of the total squared error between the compressed and the original image. It is mathematically represented as follows: _1_ MN MSE = fi[1(x,y)-1'(x,y)]2 (2.2) l x=l [V]: ‘C ll where, I (x, y) is the original image, I '(x, y) is the decompressed image and M, N are the dimensions of the image. Another metric that is used for evaluating the performance of an image compression technique is the peak signal-to-noise ratio (PSNR). The PSNR is closely related to the MSE and is expressed in terms of the MSE as: 15 PSNR = 20* 16g10 (255/JMSE) (2.3) A lower value of MSE implies a lesser error in the decompressed image. Inversely, a high PSNR is desirable because it implies that the ratio of signal to noise is higher where the ‘signal’ is the original image and the ‘noise’ is the error in reconstruction. 2.3 MPEG Video Coding Standard The Moving Picture Experts Group (MPEG) was founded in 1988 to standardize video coding algorithms for digital storage media at bit rates up to about 1.5 Mbits/s [19]. However, beginning with MPEG-2, the standards have been more generic. This means that the standard is not application specific and mainly consists of a set of tools that can be tailored to meet the needs of the user. In this section, we review some important characteristics of the MPEG framework. 2.3.1 Compression Video consists of sequences of still pictures or frames that, if uncompressed, have data volumes that are extremely high. In the MPEG standard, each frame is divided into an array of 16x16 pixels, called a macroblock. Each macroblock comprises of four 8x8 luminance (intensity) blocks (or Y-blocks) and two 8x8 chrominance (color) blocks (Figure 2-2). The chroma blocks are each one U-block (blue) and one V-block (red). The color information, therefore, has half the horizontal and vertical resolution as the luminance Y information (ITU—R 601). This component video standard is known as the 422:0 standard. A picture with [6 a luminance resolution of 360 lines and 288 pixels per line is known as the common interchange format (CIF). 1W Figure 2-2 4:2:0 composite color standard Popular video coding standards (including MPEG) employ a hybrid Motion- Compensation (MC) prediction and transform-domain DCT-based compression method. First, a macroblock at a given flame is predicted flom one or two adjacent flames using Motion Estimation (ME). The prediction error is then transformed (using DCT), quantized, and entropy-coded. The type of prediction that is used for a particular flame defines different picture types as explained below. 2.3.2 Frame Types MPEG video sequences consists of flames of three types: 17 Intro-coded Frames: (1 Frames): The I flames use DCT encoding to compress one single flame, without being dependent on any other flame. I flames are inserted once every 12-15 flames and are used to start a sequence. Decoding of video can only start at an I frame. Predicted Frames: (P Frames): P flames use motion compensation and DCT encoding to “predict” the values of each pixel. These frames are coded as the difference flom the last I or P flame. They can offer a better compression ratio than I flames, depending on the amount of motion present. Bi-directional Frames: (B Frames): B flames also use motion compensation and DCT encoding to estimate the value of each pixel. They differ flom the P flame in the sense that they can use either the last I or P frame or the next I or P flame to get the value of each pixel. Quantization further improves the compression ratio in all the flames. It is also useful to note, at this juncture, that since B Frames use both previous and subsequent flames for correct decoding, the temporal sequence of the flames in MPEG video differs flom their spatial sequence. Figure 2-3 illustrates the types of flames and their sequential orders in space and time. BIBBPBBPBBPBBI Figure 2-3 I, P and B Frames in MPEG [30] I8 2.3.3 Transform Coding In MPEG video coding, the Discrete Cosine Transform (DCT) is used as part of the video compression algorithm. The DCT is a mathematical transformation of a 2D matrix of pixel values into an equivalent matrix of spatial flequency components. It is an invertible, discrete, separable orthonormal transformation. The DCT can be mathematically represented as: B(k,,k,)= {—13— In] (— 2%) iii/\(iymjnosB—NL"(2i+1)]cos[2;:(2j+1)].f(i,j) i-O j-O where, A(§)= 21’“ 0 l,otherwise (2.4) and A is the input image, B is the output image, A(i, j) is the intensity of the pixel in row i and column j within an Nl x N2 image block, B(k1,k2) is the DCT coefficient in row kl and column k2 of the DCT matrix. The DCT, by itself, is a lossless transform. However, when used as part of an image compression algorithm, it is subjected to quantization and subsequent run-length encoding. In MPEG, the DCT is used to transform blocks of pixel values into blocks of spatial frequency values. These blocks are organized in such a way that the lower-flequency 19 components lie in the upper-left comer and the higher-flequency blocks lie in the lower-right corner. By this, the high flequency components are first dropped or discarded. Once DCT has been applied to an image, no spatial localization can be recovered. Also, canceling out sets of flequencies can lead to high distortion in image regions where the low frequency components have low amplitude. In order to overcome these shortcomings, the compression algorithm in MPEG applies the DCT to 8x8 blocks. This speeds up the algorithm and also, localizes the flequency content. 2.4 MPEG-4 Scalability Structure Scalable video coding solutions have been used to facilitate delivery of multimedia content over networks with varying channel characteristics and unpredictable quality-of- service measures (e.g. the Internet)[7]. All types of scalable video consist of a base layer and one or more enhancement layers. The base layer represents the minimum data needed for decoding the video, so as to obtain a minimum video quality. The enhancement layer represents additional information that, if sent to the decoder, enhances the video quality. Scalable video solutions are usually based on some scalability structure. This structure defines the relationship between the base layer and the enhancement layer flames of the video sequence. One scalability structure supported by MPEG-4 is fine-granular scalability. This structure allows for progressive decoding. This means that the decoder can start decoding once it has the base layer and any or no part of the enhancement layer. As more data is received, the quality of the decoded image improves. This process continues until the entire enhancement layer is received. 20 The F GS scheme in MPEG-4 is based on the scalability structure proposed in [17]. FGS is essentially a hybrid video-scalability solution. It has a prediction-based, motion- compensated base layer and a fine granular scalable enhancement layer[l7]. This is illustrated in Figure 2-4. These characteristics allow it to cater to the dual requirement of efficient coding and continuous fine-grained scalability. FGS Enhancement layer Figure 2-4 FGS scalability structure [30]. 2.5 MPEG-4 Video Coding Framework A typical system configured for streaming video over the Internet has the basic blocks of an encoder, streaming server, channel and decoder. This basic block diagram is shown in Figure 2-5. 21 —-> Encoder l' t/ Server 4 Channel .1 C ten —§ Decoder Figure 2-5 System configuration for streaming video over the Internet [42] In this section, we shall look at the structure of these blocks in the MPEG—4 FGS framework. We proceed in logical order, starting flom the encoder and ending with the decoder. 2.5.1 The Encoder The detailed block diagram for the MPEG-4 F GS encoder is shown in Figure 2-6. The FGS framework requires the encoder to have two encoders- one to generate the base layer 22 "Adaptive Quantization" (optional) FGS EL Frequency Weighling Bitplane DCT scanning and (bit-shifting selected coeff.) entropy coding “ream Selective Enhancement (bit-shifting selected MBs) DCT (bitplanes) Residual lma_e FGS Enhancement Layer Encoder Original Entropy Encoder Inverse DCT In verse Quanlt. Q . Frame Memory M no 'I Estimation I Base Layer Encoder V Figure 2-6 Basic (SNR) encoder for the FGS base and enhancement layers. It is clear that the added complexity of the FGS enhancement-layer encoder is relatively small [18]. bitstream and the other to generate the enhancement layer bitstream. The base layer is coded to a minimal acceptable video quality. In other words, the minimum available bandwidth must always be at least equal to the bitrate chosen for the base layer. This ensures that the reconstructed video has a quality equal to that provided by the base layer bitstream. 23 Thus, the base layer is coded to the minimum available bandwidth (RBL ~R ). The enhancement layer fitlly utilizes the additional available bandwidth at any instance of time. This can be estimated in real-time. The upper limit on the size of the enhancement layer (Ru = Rm — REL) is determined by the maximum bandwidth Rmax that is available at time of transmission. An important observation here is that the range [R Rm] can be min ’ determined off-line. The base-layer encoder contains motion compensation and motion estimation tools. Hence, the base layer bitstream is organized into I, P and B flames. The base layer is compressed using the DCT-based video encoding method. This method demonstrates good coding efficiency, particularly at low bitrates. The DCT coded video is quantized and sent to an entropy encoder to form the base layer bitstream. In the base layer, the DCT coefficients are run-length coded. The video coding tools used for coding the base layer are described in length in [38], [371, [201, [221. Encoding The Enhancement Layer The F GS enhancement layer takes two inputs- the original flame and the reconstructed base-layer flame. The difference in the intensity value of every pixel in the original flame and the corresponding pixel in the reconstructed flame is DCT coded. The DCT coding follows the traditional 8x8 blocks described above. Also, each set of four 8x8 luminance (Y) blocks and two color (U and V) blocks form the typical 16x16 macroblock structure. Once all the DCT coefficients in a flame have been obtained, the one with the highest absolute value is determined. The number of bits required to represent this maximum value 24 gives the maximum number of bit levels or bitplanes required to code the enhancement-layer (residual) flame. The absolute values of the coefficients in every 8x8 DCT block are zigzag ordered into an array. Consider an array of 64 bits representing each of these coefficients at a particular significant position say p. This array is said to be a block bitplane at level p. Every frame has all the three-color components (Y, U and V) coded into it. The maximum number of bitplane levels required to code each of these components may differ. Mathematically [18], N,,, = [16g, (1 C [Wu +1 (2.5) where, N 3,, is the number of bitplanes and IC |mam is the maximum DCT magnitude of the given residual flame. Y U V Bitplane 0 Bitplane 0 Bitplane O Bitplane l Bitplane l Bitplane l Bitplane 2 Bitplane 2 Bitplane 2 Bitplane 3 Bitplane 3 Bitplane 3 Bitplane 4 Bitplane 4 Bitplane 4 Bitplane 5 Figure 2-7 An example of different number of bitplane levels for the different color components. The maximum number of bitplanes required to code the flame under consideration is that corresponding to the greatest of the maximum levels of Y, U and V. Hence, not all the blocks have the same number of bitplane levels. This is illustrated in Figure 2-7. 25 The maximum number of bitplanes required to code the flame under consideration is that corresponding to the greatest of the maximum levels of Y, U and V. Hence, not all the blocks have the same number of bitplane levels. This is illustrated in Figure 2-7. Each such block is now subject to variable-length coding (VLC). The code length depends on the significant position of the bitplane level under consideration. It must be noted here that this is the absolute significant position of the level in the binary representation of the largest DCT coefficient in the given block. The number of passes required to code a particular block depends on the number and order of Is and OS in each block.Figure 28 illustrates an example that helps understand this better. 0 O O 0 O O l 0 O O 0 O O 0 O 0 0 O O 0 O 1 l l 1 O O 0 O O O 0 0 O O 0 0 O O O 0 0 0 O O O O O O 0 0 0 0 0 O O 0 O 0 0 1 O O 0 Figure 2-8 an example of the number of passes required to code a block. This number depends on the arrangement of Is and OS in the block. A continuous, uninterrupted string of zeros is called a run. Here, number of runs =4. More information on the bitplane coding method used for the FGS enhancement layer can be found in [l4],[18],[42],[21]. 26 2.5.2 The Streaming Server The second block in the Internet streaming video system structure is the streaming server. In the case of the MPEG-4 FGS system, the streaming server determines the size of the portion of the enhancement layer that gets transmitted. Figure 2-9 shows the total, lossless enhancement layer being input to the server. The shaded region indicates the part of the enhancement layer of every flame that fits within the available bandwidth. As shown in the figure, only this part of the enhancement layer reaches the decoder. In the current implementation of the MPEG-4 F GS standard, the streaming server allocates bitrates for the enhancement layer of every flame by a brute-force method. The basic equation defining this allocation is: BFR(EL)=BTOT(E%FR (2.6) where, BFR(EL) is the bitrate allocated for a given flame in the enhancement layer, er (EL) is the total bitrate available for the enhancement layer of the video sequence and N FR is the number of flames in the video sequence. Within a frame, the streaming server performs a crude, raster-scan driven pruning of the enhancement layer to fit the bitrate allocated for that flame. This is a low-complexity, non- optimal bit allocation method. The objective of this thesis is to develop and analyze the performance of optimum (in a rate-distortion sense) algorithms for FGS rate control. 27 Portion of the FGS ,. PCB enhancerrent layer C] enhancemert laya trammittod i inreal-time Attheeneoder Atthestrmningserver Atthedeeoder l S l 1 i 1 1 i Eselayer FJihamerImIayer I “-1 i t r": l Baselayer j Figure 2-9 Examples of the FGS scalability structure at the encoder (left), streaming server (center), and decoder (right) for a typical unicast Internet streaming application. The top and bottom rows of the figure represent base-layers without and with Bi-directional (B) frames, respectively. Note that only part of the enhancement layer is transmitted in real time. [18] 2.5.3 The Decoder The last block in the Streaming server system is the decoder. It is the client-side module. The block diagram for an MPEG-4 FGS decoder is shown in Figure 2-10 below. The 28 structure of the decoder is basically the reverse of that of the encoder. The decoder reconstructs the video flames using the base layer bitstream and the truncated enhancement layer bitstream. It is useful to note that the two bitstreams are decoded independently and added to form the reconstructed flame. The quality of the video is proportional to the number of bits decoded by the decoder for the enhancement layer of each flame. Enhancement Layer FGS . '5 Enhanced Bit-plane Bit-plane VLD de—shifling » Video Motion Compensation ’ H —-BLMVS Base Layer BL L BL Frame Memory V Video stream (optional) Inverse Base Layer ’ Quantization Inverse VLD Q" DCT Figure 2-10 Basic (SNR) FGS decoder for the base and enhancement layers [18]. 2.6 Summary To summarize, rate-distortion theory indicates that there exists a minimum distortion level that can be associated with any given bitrate lesser than the total lossless bitrate. In the case of image data compression, the losses incurred due to applying an image compression algorithm lead to a similar trade off between rate and distortion. A widely used measure of distortion for image data is the Mean Squared Error. The compression algorithm in the MPEG-4 video-coding standard comprises of transform coding using the DCT, quantization and run-length encoding. The FGS scalability scheme allows for progressive scalability in the enhancement layer. This is possible because the enhancement layer is coded using the bitplane coding technique at the encoder. The streaming server roughly determines the bitrates for every flame. It employs a brute-force approach to allocate bits to meet this bitrate constraint. At the decoder, the base layer and enhancement layer bitstreams are decoded separately and added together to form the reconstructed flames of the video sequence. 30 3 Bitplane Rate-Distortion Based Rate- Control Algorithm In this chapter, we describe and develop a rate-distortion based rate-control algorithm for the specific encoder-decoder system of the MPEG-4 F GS standard. This makes it an operational rate-distortion based algorithm. The advantages of using an operational curve is that the data points used are directly achievable for the implementation of the system and for the given test data. FGS codes a video sequence into a base layer and a single enhancement layer. The base layer is non-scalable and is equal in size to the lower bound of the available bandwidth. The residual DCT coefficients representing the difference between the original picture and the reconstructed picture are bitplane coded and comprise the enhancement layer. The enhancement layer can be truncated within each flame to allow partial enhancement proportional to the number of bits decoded for every flame. The limiting of the base layer bitrate to the minimum available bandwidth is a typical rate-control problem. Rate control for non-scalable coders is essentially based on a buffering technique. The quantization step size is based on the fullness of the output buffer and the characteristics of the picture to be coded [44], [4], [l3]. MPEG-4 uses the rate control scheme proposed for MPEG-2 and which is commonly known as the Test Model 5 (TM5). This rate control algorithm consists of three steps- target bit allocation for a flame, rate control via buffer monitoring and adaptive quantization based on local activity [39]. The 3] TM5 algorithm is used to restrict the base layer to the lower bound of the available bandwidth. The enhancement layer bitrate depends on the bandwidth that is available for the video sequence streaming above the base layer bitrate. The encoded enhancement layer bitstream is sent to the streaming server. Effectively, the streaming server performs a raster scan of the bitplanes- starting flom the first macroblock through to the last on a particular bitplane. The first bitplane scanned is the one that corresponds to the most significant position of the largest DCT value in that flame. Once all the macroblocks in a particular bitplane have been sent, the server starts sending the macroblocks flom the beginning of the next highest significant position bitplane. This is illustrated in Figure 3-1. The server stops sending more enhancement layer information whenever the target available bitrate is reached for that flame. This is a coarse raster-scan bit allocation method that has very low complexity. However, such a method is not optimal since it performs brute-force bit-allocation. When the streaming server prunes macroblocks flom the enhancement layer bitstream, it adds to the distortion in the reconstructed enhancement layer image. The rest of this chapter is devoted to investigating a low computation, low complexity rate-control algorithm for the F GS enhancement layer that minimizes this distortion. It is also intended that this algorithm be compatible with the rate-control mechanism employed for the base layer. 32 Frame F I evel =8- 1 (L33) level =.i 53w pm," ' I: LrIrumittedmncroblocka E pruned macroblock- (a) FrameP level =B-I (LSB) level =i x- W‘— "a . 4" - .1 .57 . . [: trnnmniucdmncroblocka pruned mna-oblocka Cb) Figure 3—1 Pruning of macroblocks- width of the macroblocks represents the number of bits needed to encode that macroblock (a) Raster-scan rate-control (b) R-D based rate-control 33 3.1 Rate And Distortion Measures Let M be the set of all macroblocks in the enhancement layer bitstream. The distortion added to the enhancement layer image by pruning any macroblock m is represented by AD," . Let 9.71 c M be the subset of macroblocks that is pruned by the streaming server when it implements the existing raster-scan rate-control method. Our objective here is to determine the Optimal subset of macroblocks, am that minimizes the total distortion in the apt ’ reconstructed image i.e., 9310p, =arg{r%iln Z ADm} (3.1) 016931 We make a rate-distortion interpretation of equation (3.1). In terms of the available bitrate for video transmission and the associated distortion, we can state that our objective is to determine the set of optimal macroblocks shop, that will minimize the total distortion in going from the lossless bitrate Ru of the enhancement layer to a target bitrate R, such that R, $12”. From Figure 3-2, we can see that in order to get to any bitrate that is lesser than the lossless bitrate, there is more than one path that can be followed. To find the path with the least associated distortion at all points or the optimal performance curve, we have to find the one that will have minimum slope at every point. Hence, in a rate-distortion sense, our objective is to minimize the ratio between the increase in the distortion associated with a given reduction in the rate. In terms of the macroblocks to be pruned, our objective is to 34 determine the set of macroblocks that, when pruned, result in the minimum slope of the rate- distortion curve. This implies that we have to determine the rate-distortion ratio associated with the pruning of each macroblock in the flame. u.‘ I 7 I U I I I I 136 '- 4 11.3 ‘ ‘-—-—.__ 1 \‘H 125 ‘, a MSE K“ X‘ x 0-2 " -\ - . q -, X“ \ \ 3.15 - \ .. I ~ ‘_ “-1 \K 011 '- “~\‘:\~\ - \‘.\"".-\ a ‘3‘. -._\ 1E5 1- \- '\ - \\‘-‘\ as! o J l J J I I l I 1.5 2 2.5 3 3.5 4 4.5 5 5.: Bitrate (bps) Figure 3-2 Different rate-distortion curves In order to find the subset 9310p, , we have to find the distortion ADM associated with the pruning of each macroblock in the flame. One method of doing this is to completely decode the enhancement layer bitstream to get the reconstructed image. This image is then compared with the original image and the mean-squared error (MSE) between the two is determined. This gives us the distortion in the pixel domain. In the CIF format, the size of a flame is 352 x 288 pixels. Each flame can hence be divided into 396 macroblocks (16x16 35 pixels). If B is the number of bitplanes in the enhancement layer flame, then it contains a total of 396*B macroblocks. Calculating the distortion associated with each of these macroblocks by reconstructing the image completely, results in very high computational complexity. This is neither desirable nor acceptable. Hence, we look for an alternative method of determining ADM . One possible alternative is to determine the MSE in the transform domain. Let [X08] be the transform domain representation of the original image [x(fi)] and [X (16] be the transform domain representation of the reconstructed image [58(5)]. Here, I? = (k,,k2) and F = (n,,n2) are two-dimensional vectors in the transform and pixel domains, respectively. It can be easily shown that the squared error in the transform domain is equal to the squared error in the pixel domain, that is, the distortion, 1),, = Zip/1AA.) — ,1",.,,,(/?)]2 =Z[x,‘m(fi) —;t,,,,(fi)]2 (3.2) I where, D,” is the distortion associated with block I of macroblock m, X 1.». (I? ) and 21",” (I?) are the DCT coefficients in position I: = (k,,k2) in the same block in the transform domain representations of the original and reconstructed images respectively. x,'m(fi) and 5.1mm) are the corresponding intensity values of the pixel at location fi=(n,,n2) in block I of macroblock m. Therefore; we use the DCT coefficients in the enhancement layer to determine the distortion. 36 If I is the number of blocks in the macroblock m, the distortion for the entire macroblock is given by, Q=qu (m) Our objective is to determine the set of optimal macroblocks map, that will minimize the total distortion in going flom the lossless bitrate Ru of the enhancement layer to a target bitrate R, such that R, S Ru. The macroblocks that we select for pruning have to satisfy the condition, 2 Rm 2 (Ru —R,) (3.4) meDJl Therefore, the optimization RD-based problem addressed in this thesis can be expressed as follows: r i Y 9310,” = arg< mmiin [2 AD") (3.5) "1er ngmkmzlkLL_RTl J 3.2 The Rate Control Algorithm In this section, we introduce the bitplane rate-distortion based rate-control (BPRD) algorithm for the FGS enhancement layer. 37 3.2.1 Mathematical Formulation Let the target bitrate for the FGS enhancement layer be R, . Let R,.,i = 0,1,...B —-1 be the number of bits required to code the ith bitplane. Here i=0 corresponds to the MSB bitplane and i=(B-l) corresponds to the plane representing the least-significant bit (LSB). In terms of R, , the target bitrate can be expressed as, (3.6) where the target bitrate RT lies between the number of bits required to encode j bitplanes of the image and the number of bits required to encode j-l bitplanes of the image. The existing raster-scan rate-control method completely discards the (j+l)st to the (B-l)st bitplanes. Macroblocks are pruned flom the bitplane level j in order to reach the target bitrate R,. In the bitplane rate-distortion based rate-control (BPRD) algorithm described in this section, we therefore resolve to find the optimal set of macroblocks that must be pruned flom bitplane level j. This is given by 931 c: M. (3.7) OPIU) I where the global set of macroblocks is considered to be M j , the set of all macroblocks at bitplane level j. Given a target bitrate RT , the rate-control algorithm begins at a bitplane level j. The input to the rate-control algorithm is hence considered to be the video sequence at the bitrate 38 13-] Z R, . In other words, the input video that the distortion is measured against is considered to 1:) have only the j most-significant bitplane levels if the algorithm is operating on the bitplane level j. In the FGS enhancement layer, the DCT coefficients are bitplane coded. In these terms, the DCT coefficient in block I of macroblock m can be represented as, _ B“ , — XL," (k) = Z 2-'b,m.(k) (3.8) i=0 where, b,‘m.,(/E) is the bit in the ith significant position of the coefficient X 1.»: (I? ) and B is the total number of bitplane levels needed to represent the block I of macroblock m. In the case of the proposed BPRD, there is always only one bitplane in the binary representation of the DCT coefficient, that is, B =1 always. This means that the index i=0 always. For this reason, we shall henceforth drop the index i. Equation (3.8) thus reduces to, X,” (76) = b“, (I?) (3.9) The distortion D,” (mean-squared error) between the block I in the macroblock m in the transform domains of the original and the reconstructed images is given by, _ A _ 2 _ A _ 2 21X“, (k ) — X,‘m(k )] = Z[b,.m(k ) —b,,m(k)] (3.10) It I: Since there is a distortion already associated with the input image, this distortion can be considered to be the increase from the original distortion of the input image or ADM. If j= B , then the input distortion is 0. 39 Using equation (3.3), the total change in distortion due to the pruning of the macroblock m (ADM ) is given by, ADM £2,1er (I?) — It”, (13]2 = :zbflui) 42,06]2 (3.11) Here, bjm (I?) and bl," (I?) can take values of O or 1. This implies that b,” (I?) —b,'m (I?) is also always 0 or 1. Hence, we can rewrite equation (3.11) as, AD... = XXIX.,.(I?)—X.,.(IF)]2 = 2204.416]: = 224.41?) (3.12) l k I k l k Only the cases in which b,‘m(lc_)=l and 13,.”‘(1—6 )=O contribute to this sum. Hence, the distortion added by pruning a macroblock m at bitplane level j is equal to the number of ones in the macroblock at level j that get converted to zeros. This is the measure of distortion for the BPRD algorithm. Thus, a simple indication of the distortion is the number of ones in a bitplane macroblock. Every block in a macroblock comprises of ones and zeros. Each such block is separately VLC coded based on the number of consecutive zeros (run of zeros) and the length of these runs. The number of bits required to encode a particular block depends on the bitplane level and the length and number of runs of zeros. From the VLC tables developed for the FGS enhancement layer in [21], the number of bits required to code a block that consists entirely of zeros (or an all-zero block) is less than that required to code a block that contains one or more runs. Hence, converting a block to the all-zero state reduces the bitrate. This difference in the bitrate is the change in the rate ARM due to converting (to zeros) the 40 bitplane of block I of macroblock m. Therefore, the total number of bits reduced by pruning macroblock m: ARm=ZAR l.m l (3.13) 3.2.2 Description The BPRD algorithm is bitplane-level independent. It is also flame specific. It operates only on the bitplanes in the current flame and not across flames. The steps involved are enumerated below: Step 1: Step 2: Step 3: Step 4: Step 5: Step 6: Start algorithm for flame f, bitplane level i such that 0 S i S B —1 ; Recover the DCT coefficients flom the compressed enhancement layer bitstream by performing variable-length decoding. Determine the change in rate (ARM) and change in distortion (ADm) for every macroblock m as described the previous section. Calculate the rate- distortion ratio (ADA ) for each macroblock. Store references to all the macroblocks in an array and sort them in ascending order of the rate-distortion ratio. Convert the macroblock referenced at the bottom of the sorted array to all- zero. Move reference one step at a time up the sorted order. Check if the target bitrate has been reached by this conversion. 41 Step 7: If the target bitrate has not been reached, go to Step 4. Step 8: If the target bitrate has been reached, re-encode the modified DCT coefficients using variable-length coding. Step 9: Repeat for all frames f. Before proceeding to the experimental results, we highlight two important aspects of the proposed BPRD algorithm The All-Zero Code: When the enhancement layer is VLC coded based on the MPEG-4 FGS standard; the code for every block depends on the significant position of the bitplane level under consideration. This level is the absolute significant position of the bitplane level in the binary representation of the largest DCT coefficient in the given block. The most- significant-bit (MSB) plane for every individual block is defined as the first non-all-zero bitplane. If an all-zero block is encountered afier the MSB level has been reached for that block, it is coded using a special symbol called the all-zero symbol. The length of the all- zero symbol once again depends on the absolute significant position of its bitplane level, for that block. More importantly, the length for the all-zero symbol used by the MPEG-4 FGS standard could be very high, and consequently not very efficient to use in our proposed BPRD algorithm. Consequently, we introduce into the enhancement layer bitstream, a flag to mark every macroblock in the bitplane under consideration. This flag is set to 1 if the macroblock has been chosen flom the sorted order for conversion to all-zero. If not, the flag is set to 42 zero. The change in rate measure (ARM) that is used above in our algorithm takes into consideration this extra (though very minor) overhead. Blocks versus Macroblocks: Not every 8x8 DCT block has the same number of bitplanes. Each of the blocks is an all-zero case at bitplane levels higher than its MSB plane. At the encoder, if a block is all zero above its MSB level, a l—bit flag is used to indicate that the MSB for that block has not been reached. It is possible that there are a lot of 8x8 DCT blocks that have fewer bitplanes than the maximum number of bitplanes in a flame [14]. In other words, the first couple of bitplanes of every flame are very likely to have many all- zero cases. In such a scenario, using 1 bit to signal the all-zero status of every block is not very efficient. The encoder instead groups the blocks in each macroblock together and uses a single bit to indicate an all-zero macroblock [14], [21]. Let us assume that our granularity resolution was a block instead of a macroblock. In the raster-scan bit allocation method, the blocks that lie beyond the target bitrate would be discarded. In other words, no overhead is added to the enhancement layer bitstream to indicate to the decoder that these blocks are to be rebuilt as all-zero cases. In the case of bitplane rate-distortion rate-control (BPRD), we would start flom the bottom of the sorted array containing references to the blocks in the ascending order of their rate-distortion ratios and convert the blocks to all-zeros. We are not concerned with the position of the block within the flame. Our primary interest is to minimize the slope of the rate-distortion curve. However, it now becomes imperative that we signal the decoder that the block has been converted to an all-zero block and should hence be rebuilt as one. The relative overhead incurred in indicating this conversion on a block-by—block basis is high for every bitplane 43 level, but especially so for the first two-bitplane levels. This is because in these bitplane levels, the encoder itself has been optimized so that it adds the minimum number of bits into the enhancement layer bitstream. In order to restrict the overhead closer to the lowest possible level, we select a macroblock as the granular resolution of our rate-control algorithm. 3.3 Experimental Analysis We begin the experimental analysis of the bitplane rate-distortion based rate-control algorithm by defining a suitable comprehensive test-bed. Video Format The file format used here is the common—interchange format (CIF). The flame size is therefore, 352 x 288 and the input source flame rate is 30Hz. Test Sequences The test sequences library is described in the MPEG-4 standard [21]. The library is divided into six classes flom class A through to class F. The sequences are classified based on characteristics such as spatial detail, amount of movement, number of bits used for coding. The test sequences used in this experimental analysis belong to Class A through to Class C. Class A consists of sequences that have low spatial detail and low amount of movement (e.g. Akiyo). Class B sequences are characterized by medium spatial detail and low amount of movement or vice versa (e. g. Foreman). Class C sequences have high spatial 44 detail and medium amount of movement or vice versa (e.g. Mobile Calendar). The example sequences are the ones that are used here. Base Layer Rate Control We consider the following two cases for rate control in the base layer. First, we assume that there is no rate control algorithm applied to the base layer. We fix the quantization parameter Q, and run the BPRD algorithm on the enhancement layer bitstream. The values we use for the test sequences are Qp= 4, 6, 8 for the Akiyo sequence and Q1): 10, 15, 28 for the Foreman and Mobile sequences. These Q, values are chosen because they approximate to reasonable base layer sizes of 100 kbits, 200 kbits and 300 kbits respectively. Second, we consider the sequences with TM-S rate-control applied to the base layer. Again, we restrict the base layer to the sizes of 100 kbits, 200 kbits and 300 kbits. We consider the following two cases for rate control in the base layer. First, we assume that there is no rate control algorithm applied to the base layer. We fix the quantization parameter Qp and run the BPRD algorithm on the enhancement layer bitstream. The values we use for the test sequences are Q): 4, 6, 8 for the Akiyo sequence and Q): 10, 15, 28 for the Foreman and Mobile sequences. These Q, values are chosen because they approximate to reasonable base layer sizes of 100 kbits, 200 kbits and 300 kbits respectively. Second, we consider the sequences with TM-5 rate-control applied to the base layer. Again, we restrict the base layer to the sizes of 100 kbits, 200 kbits and 300 kbits. 45 We present below the results obtained by running the BPRD algorithm on the enhancement layer sequences obtained for all the different cases derived flom the test-bed described above. All figures represent the mean-squared error plotted against the bitrate (in bits per second). The bitplane level=0 corresponds to the MSB bitplane. In this section, four performance curves are presented for every case. The curves in the top-left comer represent the performance at all the bitplanes for both the current raster-scan and the proposed BPRD-based algorithms. As explained above, these results are shown for a given flame with the BPRD algorithm applied independently to each bitplane. The results are organized sequence-wise. The first sequence that we analyze is the Akiyo sequence. This is a Class A sequence i.e. it has low spatial detail and low amount of motion. Figure 3—3-Figure 3-5 show the results obtained for the I, P and B flames for Q”: 8 respectively. For the Akiyo sequence, a Q, value of 8 corresponds to a base layer bitrate of about 100 kbits/s. Figure 3-6-Figure 3-8 show the result of applying the BPRD algorithm on the individual bitplanes of the I, P and B flames for the Akiyo sequence. The base layer in these cases is coded at 100 kbits/s using the TM—S rate-control algorithm. It can be seen flom the top-left curve in Figure 3-3 that the performance curve of the existing raster-scan rate-control method closely follows that of the BPRD algorithm for bitrates higher than about 200 kbits/s per flame. 46 - 1&4? Raster-scanbialoc. _ L — BPRD ' l Figure 3-3 Akiyo I frame Qp=8 The improvement in performance is more significant for bitrates in the range of 50 kbits/s-ZOO kbits/s for each of the flames. At lOHz, this corresponds to a total enhancement layer bitrate in the range of 500 kbits/s to 2 Mbits/s. This is a desirable and practical bandwidth range. The rest of the figure shows the performance in this specific bitrate range. 47 Raster-seahi lee. — m Rana-sell It in. no Figure 3-4 Akiyo P frame Qp-s Raster-scarbialoe. ' Naturalistic. Figure 3-5 Akiyo B frame Qp-Q 49 Trmwfilgis: ;: t ' ,4," _ 40.. “ mammflAlB‘tplanss B 'I 2 Bitrale- bps g . ' 2 . Bitrate - bps t"? ’3... Raster-scan bit allot. _ — BPRD ‘. of. x ‘ T’. ._ .. . \'. 1].. [J \ -. \x“-‘ . r ‘\j ti“ Figure 3-6 Akiyo I Frame Base layer=100k 11:11:: ' - ~15; 1,... ' In lml=2 A Raster-semblance, "1;” -— BPRD 5. ' Rastn—scanbitalloc. . j — BPRD A ‘" i v: t’? l ’1. I mss P Frama. Overall Btplanes Raster-scan hr allot — BPRD 0.5 1’ l l 5 2 2.5 3 3.5 4 P Frame Overall Bilplanes Raster-scan bit allot — BPRD 0.2 or as 118 1 1,2 1.4 1.5 1.8 “2 Bitrate - bps x105 Bmaia - bps r 106 . '23 ~ 22 ludil V _ Raster-scan bit allot. —— BPRD Figure 3-7 Akiyo P Frame Base layer=100k Based on the above figure, the BPRD based algorithm provides its highest gain in performance at bandwidth ranges that can be available over current and evolving access networks (e.g., Internet access over Local-Area-Networks, cable modems, DSL, etc.). K I V . 1" o‘ t 8 Frame Overall Btplanes ‘ lml=l 1 r. 4 »P .I .‘ Raster-scan bfl allot '; “SW56?" 5“ alloc — BPRD 29' 1“ —— BPRD 26» . 2.85 i ‘ .. r 2.3 7* 2.75 :15 3 21* i E E2116- 5 Ii 25* ‘* 255’ 0‘5 2.5r ., 2.45 0.5 1 1.5 2 2.5 a 3.5-: 4 ‘ 1 . 1.5‘ ’ 2 ., 25 ’ 3 ' Duals-bps “9’ .m’m" . 3‘ , {8102‘ Bane OvarallBtplanes lml=2 V. a» 24-3 " 1 — epso “; _ BPRD 2.3 ‘_ 25> 2.21 ‘ .fl 21* 2 ... \ s “a 2+ ‘ E E 1.9 1.5~ " 18 1 l] 4 ‘5' EEK; 0.2 04 [1.5 08 1 1.2 14 16 1.8 2 15 4 4.5 5 5.5 5 5.5 7 7.5 B 8.5 Bitrate - bps [10‘ B‘mlo- bps x 10‘ Figure 3-8 Akiyo B Frame Base layer=100k From the figures shown in the following pages, it can be seen that this observation is applicable for all the presented cases for the Akiyo sequence. Hence, all the figures show specific results for the bandwidth range of 500 kbits/s to 2 Mbits/s. In all cases, this bitrate range corresponds to bitplane levels 1 and 2 for the Akiyo sequence. 52 Ralerscaaoi doc. -— BPRD Figure 3-9 Foreman I frame for Qp-28 The second test-sequence used in this experimental analysis of the BPRD is the Foreman sequence. The Foreman is a Class B sequence. It has medium spatial detail and low amount of motion. 53 Figure 3-10 Foreman P frame for 01248 ya41."‘“’“".s4"‘ pf :11 '.....“.._............. ........... . ' _ -Raster- 4m bi aloe .~ —BPRD :1" \‘KFT'. )4... 9'). ~- 'e‘nr y... ”:1.“ .1 kw awuflfi‘ Figure 3-11 Foreman B frame for Qp-28 Figure 3-9 to Figure 3-11 show the results obtained by applying the BPRD algorithm to the individual bitplane levels coded at a constant Q, value of 28 for the I, P and B flames respectively. 55 Figure 3-12-Figure 3-14 show the results for the I, P and B flames with a base layer bitrate of 100 kbits/s. In these cases, the base layer is coded to the specific bitrate using the TM-S rate-control algorithm. Raster-scan bl alloc. - Roster-mold doc. — BPRD . — BPRD Raster-sen bl doc. — BPRD Figure 3-12 Foreman 1 frame Base layer=ka 56 Hutu-m fl ioc. — m Figure 3-13 Foreman P frame Base layer-100k From the top-left curve in all these figures, we can see that for bitrates higher than about 200 kbits/s per flame, the BPRD is closely approximated in performance by the raster-scan rate-control method. That is, for a 10Hz flame rate, the improvement in performance is more 57 see c".- *3 ..- . -1» an 3.2-: .~. 4». .41.. p 1:. .......................................... mRasler-uuibialloc. ,1". ‘ J" ._-‘ ""RiflI-muu. . :f 13 — BPRD —amo .-r at? "a ‘1 ‘ 9.." “at"? .2 W ,_ ._ .2“.me 51:4“, - . ‘ I“ ‘ I ‘ #%,fie,flw fl ffh‘sr.‘~; 9'" ’ ”Eryn“; . ., 19:1 .. a; “in AW‘ ”’75 ‘i Figure 3-14 Foreman B frame Base layer=100k apparent if the entire enhancement layer lies within the bandwidth range of 200 kbits/s to 2 Mbits/s. The other three curves in each figure hence, show the performance for these bitrates. It can be seen that the bitplanes that correspond to the specified enhancement layer bitrates vary between bitplane levels 0, I, 2 and 3 for the Foreman sequence. 58 Figure 3—15 Mobile I frame for Qp=28 59 nevi. 19‘] H2 ' Raster-scan bl allot. — BPRD Figure 3-16 Mobile P Frame Qp=28 The last sequence that we analyze for the performance of the BPRD rate-control algorithm is the Mobile Calendar sequence. The Mobile sequence is a Class C sequence. It has high spatial detail and low amount of motion. Figure 3—17 Mobile B Frame Or” 61 n» J, ,1“! . . uh“ v-,,.,,, ._~-. Racial-nu H let. —— BPRD ‘1" ~7- r. {in t- ' . ‘ 1,.) “N; “3.2)" “”4 1... __ . -.1-. saw . -«. raga”... .5312 - i L __-Mm~§“‘eum 139:5" I Figure 3-18 Mobile I frame Base Layer-100k Figure 3-15-Figure 3-17 represent the results obtained for the I, P and B flames of the Mobile sequence coded using a constant Q, = 28. This value of the quantization parameter Qp corresponds approximately to a base layer coded at bitrate of 200 kbits/s. Figure 3-18-Figure 3-20 show the performance of the BPRD algorithm applied to the I, P and B frames of the sequence coded using the TM-S rate-control algorithm to have a base layer of 100 kbits/s. a; '34‘: M; a - '. 1' v.1 ‘~!‘W"'w 1.397: "figy‘.4:9§-'; u ”j, «m .... ‘1 M35. '5‘. " -_ Va. » _ f -, 3‘ "“"i '- .' ,. . In“; ~3f§ :. “(3.3?" Jygwfirl ;‘“.”‘_.~v . L)... ,‘A “u v '1‘ Figure 3-19 Mobile P Frame Base Layer=100k 63 BFrame Overall Baplanes ‘ INFO < 4.45“. “ . Raslsr-scan butane: _ Raster-scan bnahoc : ._ 4115 - 35- ‘ 4.44 - 3, 4.6 - 25» 3 ~ 3 E E U3 2r . l 15- 1.425 ‘r . ‘fi?’ .. 0.5 ‘--\ 4.415- \‘ . l l J %L | l ' I 2 4 5 5 7 . IN .3“, m . an «my am am agitate-bps “05‘ "t . again-fly! B F tame: Omall Biplane: Imbii - ' - u- ” y‘ Raslenscan bnalloc ‘ Raster-scan bflalloc ‘2' V — BPRD . _ BPRD 4 ,. L3- 3.8- 35 “2; 0 3.1 o E in» 3.2 3 " 5 2.8 ~ \ 2.5. ‘ ~ "3.9 2". l l m_ l ‘1‘ i l l I n n a; \ 0,5 1 13 ‘ 2 ' 2.5 (.5 1 1.5 T 25 3 3:5 4 45 Bilfll. ’ bps ‘ 10° Biiiil- bp‘ .10. Figure 3-20 Mobile B frame Base layer=100k The top-lefi curve indicates in the case of the Mobile sequence that the performance of the BPRD algorithm improves in the bandwidth range of about 100 kbits/s to 200 kbits/s per frame. This is equivalent to a total enhancement layer bandwidth of i Mbits/s to 2 Mbits/s for a frame rate of 10Hz. The bitplane levels that correspond to this bitrate range are levels 1 64 and 2 for the Mobile sequence. Hence, the performance for these levels is shown in detail in these figures. 3.4 Conclusions The performance of the bitplane rate-distortion based rate-control (BPRD) algorithm has been presented thus far. We proceed to state certain conclusions on the performance characteristics of the BPRD algorithm based on the analysis of the presented results. In general, it can be seen that the performance of the BPRD algorithm is consistent with the theoretical expectation in that it generated convex RD curves. Application of the BPRD to an individual bitplane level produces a more optimal rate-distortion curve than the one obtained by applying the existing raster-scan rate-control method. The improvement in the mean-squared error is different for different bitplane levels. It is observed that this decrease in the error is more notable within a bandwidth range of approximately 50 kbits to 200 kbits per frame. Given that the encoded sequence typically has a frame rate of 10Hz, this corresponds to a total enhancement layer bitrate of 500 kbits/s to 2 Mbits/s. The performance improvement is noteworthy in the bitplanes that correspond to this target bitrate range. The magnitude of the improvement in the mean-squared error is also differs from sequence to sequence. In the same target bitrate of 500 kbits/s to 2 Mbits/s, the BPRD algorithm gives a more significant lowering of the MSE in the Foreman sequence (A Class B sequence). The next best performance is for the Akiyo sequence (A Class A sequence). The 65 BPRD performs with only a fair amount of improvement for the Mobile sequence (A Class C sequence). This indicates that the performance of the algorithm depends to a large extent on the characteristics of the video sequence being encoded and transmitted. The figures showing the performance of the BPRD algorithm on each individual bitplane, plotted together indicates that if the target bitrate should occur in the region of transition from one bitplane level to the next, the rate-distortion curve is locally non-optimal. Hence, in the next chapter, we propose an extension to the BPRD algorithm in order to find a rate—distortion performance curve that is optimal for all target bitrates. 66 4 Cross-Bitplane Rate-Distortion Based Rate-Control Algorithm In this chapter, we extend the BPRD algorithm in an effort to optimize the rate-distortion curve at the region of transition from one bitplane to the next. Our objective in the BPRD was to find the optimal subset 9)? from the set of all oth') macroblocks M j at bitplane level j in the transform (DCT) domain of a given frame. The cross-bitplane rate-distortion based rate-control (CBPRD) algorithm differs from the BPRD algorithm as follows. In this case, we search for the optimum subset map, within the total set M . Here, M is the set of all bitplane macroblocks in the DCT domain of the frame under consideration. Consequently, the optimum subset an could include bitplane macroblocks opt that belong to different bitplane levels Figure 4-1. Hence, under the CBPRD algorithm, we determine the optimal subset that minimizes the increase in distortion as measured across all possible macroblocks belonging to all possible bitplanes subject to a bitrate constraint: (4.1) ‘3 m g \___/ v map, = arg< mwitn [ Z ADM ngmkmfiku'kr) i 67 Frame F L: n “:25 f 55:71: I”: i ‘ : "‘ ‘ "1 ' .1; ": fl, pruned macroblocks Franc P f r! prunedmncroblockl (b) Figure 4-1 Pruning of macroblocks- width of the macroblocks represents the number of bits needed to encode that macroblock (a) BPRD — map, is selected from bitplane level=j (b) CBPRD- map, is selected from macroblocks on different bitplane levels 68 where ml. is bitplane i of macroblock m. In other words, ml. is an index to a bitplane macroblock. Therefore, ARM and ADM. are the changes in rate and distortion that are associated with pruning bitplane macroblock "7,-- Finally, 9110p, c M , RT is the target bitrate for coding the frame under consideration, and Ru is the lossless bitrate of that frame. 4.1 Developing The Algorithm Based on the above formulation of the CBPRD optimization problem, we need to identify an efficient method for measuring the changes in rate and distortion measures for every bitplane macroblock mi. As stated above, we desire to restrict the bitrate to a target bitrate RT. The change in rate ARM is basically identical to the change-in-rate measure that we have used for the simpler BPRD algorithm. Now we focus our attention on the distortion measure ADM . l The DCT coefficient in block I of macroblock m can be represented as, X.,..(I'€)=Ez‘B“"‘b...,.(F) (4.2) i=0 where, b, (I?) is the bit in the ith significant position of the coefficient X MU? ) and B is ,m.i the total number of bitplane levels needed to represent the block I of macroblock m. The distortion associated with a block I of macroblock m is ADM and the total distortion of macroblock m is AD," . If the macroblock is considered at a bitplane level j such 69 that Os j 58—] and B is the total number of bitplanes in that frame, the block and macroblock distortions are represented by AD, ,,,, and ADM. respectively. These are denoted . , 1 as follows, 2 ADM,=;[X,..j(i€)—X,,.j(i€)] kz[2‘”""j(b,..ju?)—13,,,.j(1?))]2 (4.3) and AD," =2AD (4.4) Therefore, [3ij = ;Z[X (I?) — X...,.(E>]2 = ZZ[2““"" (14.1.0?) 43...]. (13)] (4.5) Here again, b,m.(l?) and 51m.(i‘—) can take values of O or 1. This implies that ' .l ' J bun}. (I?) ‘51.».j (I?) is also always 0 or 1. Therefore, 2 2 _ _ _ “ _ = (B-l)—i — Aij —;;[X,'mj(k) X,‘mj(k)] 2,2;[2 b,.mj(k)] (4.6) Since ADM. is the accumulated distortion over certain specific bitplane levels and the J distortion at every bitplane level is really the squared error, we shall refer to ADM. as the J accumulated squared error. In the CBPRD, we shall use this accumulated squared error as the measure of distortion. In equation (4.6), only the cases where b,'m,(/7)=l and bA,'m_(l:)=0 contribute J I towards the total ADM. Hence, operationally, the computation of ADmcan be achieved by J J 70 counting the number of ones in bitplane macroblock m j, weighted by 2‘34"". The (B-l)‘j weighting by 2 indicates that the ones in the MSB bitplane (level j=0) are given the highest weight and those in the LSB bitplane (level j=B—1) are weighted the lowest. Consequently, if N,(mj.)is the number of ones in bitplane macroblock m j. , then ADW can I be expressed as follows: 2 _ (3—1)-~ — _ (B—I)—' ADM} -2212 ’b,.mj(k):l -4 IN,(m,) (4.7) In the VLC tables developed for the FGS enhancement layer in [21], the number of bits required to code a block that consists entirely of zeros (or an all-zero block) is less than that required to code a block that contains one or more runs. Hence, converting a block to the all- zero state reduces the bitrate. This difference in the bitrate is the change in the rate AR, m_ due ' J to converting (to zeros) the bitplane of block I of macroblock m at a given bitplane level=j. Therefore, the total number of bits reduced by pruning bitplane macroblock m I. : Aij = :13ij (4.8) 4.1.1 Description Having defined the measures of rate and distortion, we proceed to outline the steps involved in the implementation of the CBPRD algorithm. Once again, it must be remembered that this algorithm operates on a frame-by-frame basis in an independent manner, and therefore does not extend across multiple frames. 7/ Step 1: Step 2: Step 3: Step 4: Step 5: Step 6: Step 7: Step 8: Step 9: Start algorithm for frame f. Start from the bitplane with the lowest significance level, i = B -1. Determine the change in rate (ARM) and change in distortion (ADm ) a J for every bitplane macroblock m j as described earlier in this section. m. I AD Calculate the rate-distortion ratio [ MR ] for each macroblock. Repeat for all bitplane levels i until i = 0. Store references to all the macroblocks in an array and sort them in ascending order of the rate-distortion ratio. Convert the macroblock referenced at the bottom of the sorted array to all-zero. Move reference one step at a time up the sorted order. Check if the target bitrate has been reached by this conversion. If the target bitrate has not been reached, go to Step 6. If the target bitrate has been reached, re-encode the modified DCT coefficients using variable-length coding. Step 10: Repeat for all frames f A single flag indicates the all-zero conversion in the CBPRD. The choice of macroblocks is made to reduce the overhead in recreating the compressed bitstream. We now proceed to look at the experimental analysis of the CBPRD. 72 4.2 Experimental Results For the experimental analysis of the cross-bitplane rate-distortion based rate-control (CBPRD) algorithm, we define the same test bed as was done for the BPRD. The video format used is the CIF format. The test sequences used are the Akiyo (Class A sequence), Foreman (Class B sequence) and Mobile (Class C) sequence. The performance of the CBPRD algorithm on these test-sequences is analyzed for the cases when the sequences are coded with and without application of rate-control to the base layer bitstream. When rate-control is not applied, the base layer is encoded using a constant quantization parameter Qp. The Qp values used are 4, 6 and 8 for the Akiyo sequence and 10, 15 and 28 for the Foreman and Mobile sequences. The other case considered is application of the TM-S rate control algorithm to the base layer. In this case, the three sequences were encoded to constrain the base layer bitrate to 100 kbits/s, 200 kbits/s and 300 kbits/s. The results are presented below in order of the test-sequences. As before, we start with the results of applying the CBPRD algorithm to the I, P and B frames of the Akiyo sequence followed by the results for the I, P and B frames of each of the Foreman and Mobile sequences. For all these cases, the figure showing the results consists of three performance CUI'VCS. 73 1 Frame IFrama _? A? 55 raster-scan bit allot, \,\‘\ , Egggscan bn allot I 1 — CBPRD \ — 3.1 so, -- BPRDusmgASE 9* ‘\ 1\ ~ —- BPRD-usmgASE *4 ' Mi JG" “ \ a: ‘ :3 '1 1 A.“ r - '3 n 1:: 22>- g; .115 i: l ‘41. ‘3 - ' 4‘ g I 5.215 g 1: ~ ,, £220 1', £15- 3 - v ,e . ‘l E \ g. 3‘15 . \‘ p 2 1s _ 10" . i '5 \Q, 1 "' 5 \‘V‘Ka. .94 1" 1.». A 1 HM ' l" ’ ‘-fis.w_1‘r 157mg”: :fi'fg 3.5, .41?” X; "$.11”. 4' £7 _. . PFrame :- -- raster-scan bialloc sf _ 36' ’ PRD ' .70 — - BPRDusmg ASE ‘ . 60 .- '. , L if . - 5” a 93 i rag-”‘91 . E "'1 i i {:4 . E50 1 E10": # . g '2. u a. l r .. 'lO-‘f a . i ' M “3 ' V". 3 3). , ‘ .. '5 ii - 1,... 1.. i' a. 3 I] . \; 3 \‘1 1 E ‘1. § ’ \1 :3 u \. ‘3 20 ' " 1.5 firs- ~‘\ 2 ‘1. .4’.’ 10- it. ' ‘°' \1» 1" '7 -,.\‘\-‘.‘-- \JF-VE ‘ b 115 1.5 2 25 3 3.5 0.5 1 15 . 2 25 3 35 44- -. Btralo- bps x105 Birau- by “L“ 10‘ Figure 4-2 Akiyo I, P and B frames for Qp=8 74 Figure 4-3 Akiyo I, P and B frames for Base layer=100k The first curve is the performance of the existing raster-scan rate-control method. The second curve indicates the performance of the CBPRD algorithm. The third curve that has 75 been plotted gives an insight into the performance of the BPRD algorithm, if the distortion measure of (AD W) was used for it. The second part of each figure shows in detail one I region of transition from one bitplane to another. I Frame [Frame 55 raster-scan 111 alloc raster-scan 1111 alloc W BPRD 311 » BPRD 50 _ — - BPPUusmg ASE — - BPRDusIng ASE 3 . g -. ‘ \ ‘ E .; E] D @- \ . 3 1 .1! “’1 ‘ ~-. .2 '~ g 1 . 3 1 g \ g ‘ ‘ 9 17 )- \~. P, 2.5 v1 . 2 \' ~_ 1r- 5 1. '5 K ~. 3 Z] r. 5 . .. E \ ,5 1s \ . ° 15 ‘5 ° \ 2 1, 2 .\ 111- \ g. 15- \ ‘\ \ . 1 5 \. . \ ages“ . 1 \ 1 1 1 a.“ ’ A 1 1‘ L 1 1 1 1 1 1 1 4L '4 2x1 [15 l 15 2 2.5 3 ‘35 4 2.8 2.9 3 3.1 3.21 33 -. 3.4 3.5 3.8 3.7 3.8 Barala - bps x 105 Birate ~ bps I 10‘ P Frame 8 Frame raster-scan brt alloc 7U _ raster-scan brt allot m , BPRD .- BPRD — - BPPDusmg ASE — - BPPDusmg ASE 60 .m! .mi g 1 E l. o . ' ll. ‘5 m V - r e aw 1, 3 3 )1 3 10 1 3 - u 9 1'. 1; 1, £311 .\, gar» 1‘:- g it D \ o m 1 " 3 20» '\ 3 \p 1. ‘ . 10 R m \i‘. ‘ “ .1 w \ ““Pkfi. + % L ‘~x —— 1 1 1 1 1 1 V.. 0.5 1 1.5 2 2.5 3 3.5 4 Bunte- bps x 106 0.5 1 1.5 2 2.5 3 35 4 Hints - bps r 10‘ Figure 4-4 Foreman I, P and B frames for Qp=28 76 u-scan b‘I aloe. — CBPRD -- BPRDusingASE Figure 4-5Mobile I, P and B frames for Qp=28 77 It can be seen from the figures, that the CBPRD algorithm performs more optimally in these regions than the BPRD would, if the same measure of distortion were to be used for both the algorithms. The Akiyo sequence is a Class A sequence. Figure 4-2 shows the results of applying the CBPRD algorithm to the Akiyo I, P and B framesz. Figure 4-3 depicts the performance of the CBPRD algorithm for the sequence coded at a base layer of 100 kbits using the TM-S rate-control algorithm. The Foreman sequence is Class B sequence. It has medium spatial detail and low amount of motion. Figure 4-4 shows the results of running the CBPRD algorithm on the I, P and B frames for the sequence coded using a constant Qp value of 28 in the base layer. This corresponds to a base layer bitrate of approximately 100 kbits/s. The Mobile sequence is a Class C sequence. It has high spatial detail and low amount of motion. Figure 4-5 shows the results of applying the CBPRD algorithm to the enhancement 2 The square error numbers shown in these plots are based on averaging the accumulated square error over the size of the picture under consideration. In this case, the picture size is 352x288 pixels. 78 The Mobile sequence is a Class C sequence. It has high spatial detail and low amount of motion. Figure 4-5 shows the results of applying the CBPRD algorithm to the enhancement .. hm. 'H { -&-" ,fr’j ”arse-"F ff . . ' raster-scuba aloe. \ tutu-ambush; — CBPRD j ‘ -, — cam —. BPRDuangASE _ 1. —. 39110me a raster-scan bl aloe. 0M0 —- BPRDuaingASE Figure 4-6 Mobile I, P and B frames with Base layer-100k 4.3 Conclusions The performance of the cross-bitplane rate-distortion rate-control (CBPRD) algorithm has been presented. The results have been tested on three different classes of test-sequences. In general, the CBPRD algorithm also behaves according to theoretical expectations and produces a convex rate-distortion curve. The improvement in the performance, in terms of the accumulated squared error is more pronounced in the bitrate range of 20 kbits to 100 kbits per flame. Considering sequences coded to a flame rate of 10Hz, this corresponds to a total enhancement layer bandwidth range of 200 kbits/s to l Mbits/s. The CBPRD, like the BPRD presented earlier, produces its highest gain in performance at bandwidth ranges that can be available over current and evolving access networks (e. g., Internet access over Local- Area-Networks, cable modems, DSL etc.). Again, as in the case of the BPRD, the performance of the CBPRD is found to be dependent on the sequence characteristics. It was also seen that in most cases, the performance of the BPRD using the distortion measure ADM performs just as well as the J CBPRD itself. This indicates that in view of the simpler implementation requirements posed by the BPRD, calculating the distortion measure AD,” and using this measure in the J implementation of the BPRD algorithm, the computational overhead incurred in using the CBPRD can be reduced. Consequently, for scalable video rate-control systems that desire to achieve optimum rate-distortion performance while maintaining the lowest possible computational 80 complexity, the BPRD algorithm provides a good option. As shown above, the BPRD algorithm virtually performs as well as the more optimum and more complex CBPRD. 81 5 Summary Of Conclusions In this thesis, we have developed and analyzed the performance of two rate-distortion based algorithms for scalable video rate control. These algorithms are tailored for the recently developed F ine-Granularity—Scalable (FGS) video coding methods such as the one adopted by the MPEG-4 video standard. The first rate-distortion based rate-control algorithm operates on a bitplane-by-bitplane basis within an FGS compressed bitstream. We refer to this algorithm as the bitplane RD algorithm (BPRD). A more general RD algorithm that operates across all possible bitplanes of an FGS coded stream is subsequently presented. We refer to the second algorithm as the cross-bitplane RD (CBPRD) algorithm. The distortion measures used for these two algorithms are different. The distortion measures selected in both cases are based on the popular square-error criterion. One of our primary objectives has been the ease of computation of the distortion measure. We have shown that a simple count of ones found in the bitplanes of the lossless (original) video in the transform domain corresponds directly to a square-error measure. We have shown that the RD algorithms developed using the specified distortion measures gave a better rate-distortion performance than the existing raster-scan rate-control method that is currently implemented in the MPEG-4 FGS verification model. It was also 82 observed that the region of greatest improvement corresponds to a bandwidth range that can be available over current and evolving Internet access networks such as LAN 5 and DSL. The calculation of the rate and distortion can be done off-line and the rate-distortion ratio for every macroblock can be stored prior to transmission. This implies that implementing these RD based algorithms in real-time calls for very little computational overhead during the real-time streaming process of the desired video. 5.1 Future Work One extension of the work done in this thesis is to incorporate the algorithm into existing decoder systems for the MPEG-4 FGS standard coded video. This will verify that the theoretical improvements observed in the transform (DCT) domain presented in this thesis can also be observed in the pixel domain. In the case of image and video systems, determining the best measure of distortion is a challenge. The distortion measures used in this thesis were designed to reduce the computational complexity, while remaining as close as possible to the actual mean-squared error in the transform domain. Thus, another area of extending the work done in this thesis is to explore the possibility of finding better measures of distortion that can correlate better with subjective improvements as perceived by the human visual system. This could include incorporating different weights for the distortion associated with the different DCT coefficients in the transform domain. 83 Finally, we are currently investigating the correlation among the three key measures used in the development of the proposed algorithms: change-in-rate AR , change-in-distortion AD , and their ratio AD/ AR. This correlation study could lead to an efficient and optimum estimate of the RD ratio (AD/AR) based on measuring only one of the two parameters (AD, AR). This approach could be useful for systems that can easily measure (or have access to) one of the parameters but not the other. 84 [2] [6] [7] [8] [9] References A.K. Jain, Fundamentals of Digital Image Processing, Prentice Hall Information and System Sciences Series, pp. 476-560, 1989. AK. Jain, “Image data compression: A review,” Proc. IEEE, vol. 69, pp. 349-384, 1981. A. Ortega and K. Ramachandran, “Rate distortion methods for image and video compression,” IEEE Signal Proc. Magazine, pp. 23- 50, Nov. 1998. A. Puri and R. Aravind, “Motion-compensated video coding with adaptive perceptual quantization,” IEEE Trans. Circuits Syst. Video T ech., vol. 1, pp. 351-361, 1991. C. E. Shannon, “A mathematical theory of communication,” Bell Syst. Tech. Journal, vol.27, pp.379-423, 1948. C. E. Shannon, “Coding theorems for a discrete source with a fidelity criterion,” in IRE National Convention Record, Part 4, pp. 142-163, 1959. C. Gonzales and E. Viscito, “Flexibly scalable digital video coding,” Signal Processing: Image Communication, vol. 5, Feb. 1993. CCITT Recommendation H.120, “Codecs for videoconferencing using primary digital group transmission,” Geneva, 1989. CCITT Recommendation H.261, “Video codec for audiovisual services at p x 64 kbit/s,” Geneva, 1990. 85 [10] [11] [12] [14] [15] [16] [17] [18] CCITT SG XV, “Drafi revision of recommendation H.261 ,” Document 572, Mar. 1990. D. J. Le Gall, “The MPEG video compression algorithm,” Signal Processing: Image Communication, vol. 4, pp. 129-140, 1992. D. Lee, “New work item proposal: J PEG 2000 image coding system,” ISO/IEC JTCl/SC29/WG1 N390, 1996. E. Viscito and C. Gonzales, “A video compression algorithm with adaptive bit allocation and quantization,” Proc. SPIE, 1605, pp. 58—72, 1991. F.Ling, W.Li, and H.Sun, “Bitplane coding of DCT coefficients for image and video compression,” in Proc. SPIE Visual Communications and Image Processing (V CIP), San Jose, CA, Jan.25-27, 1999. G. K. Wallace, “The JPEG still picture compression standard,” IEEE Trans. Consumer Electronics, vol. 38, pp. 18-34, 1992. H. G. Musmann, M. Hotter and J. Ostennann, “Obj ect-oriented analysis-synthesis coding of moving images,” in Signal Processing: Image Communication, vol. 1, pp. 117-138,1989. H. Radha and Y. Chen, “F ine-granular-scalable video for packet networks” in Packet Video '99, New York, Apr. 1999. H. Radha, M. van der Schaar, Y. Chen, “The MPEG-4 fine-grained scalable video coding method for multimedia streaming over IP,” IEEE Trans. Multimedia, vol. 3, Mar. 2001. 86 [19] [20] [21] [22] [24] [25] [26] [27] [28] ISO/IEC 11172-2, “Information technology- coding of moving pictures and associated audio for digital storage media at up to about 1. Mbits/s— video,” Geneva, 1993. ISO/IEC 14496-2, “Information technology- coding of audio-visual objects: visual,” Committee Draft, ISO/IEC JTCl/SC29/WG11, N2202, Mar. 1998. ISO/IEC J TC/SC 29/WG ll N3515, “MPEG-4 video verification model version 17.0”. ISO/IEC JTCl/SC29/WG11, “MPEG-4 version 2 visual working draft rev 3.0,” N2212, Mar. 1998. J. De Lameillieure and R. Schafer, “MPEG2 image coding for digital TV,” F ernseh und Kino T echnik, 48. Jahrgang, pp. 99-107, Mar. 1994 (in German). K. Sayood, Introduction to Data Compression, Morgan Kaufrnann, 1996. L. Stenger, “Digital coding of television signals—~CCIR activities for standardization,” Signal Processing: Image Communication, vol.1, pp. 29-43, 1989. M. Hauben, “Behind the net- the untold history of the ARPANET,” http://www.dci.isepipp.pt/docs.l"'ar‘pa.html M. Nelson, The Data Compression Book, M&T Books, 1995. MPEG-4 FAQ, ISO/IEC JTCl/SC29/WG11, July 1997. http:h’drogo.cscltstct.it/mpcg/fagl aq_mpcg-4.htm 87 [29] [30] [31] [34] [35] [36] [37] N. J ayant, J. Johnston, and R. Saflanek, “Signal compression based on models of human perception,” Proc. IEEE, Oct. 1993. N. Wakamiya, M. Miyabayashi, M. Murata and H. Miyahara, “MPEG-4 video transfer with TCP-friendly rate control," Proc. 4"' IEEE Intl. Conf on MMNS, 2216, pp. 29-42, 2001. R. C. Nicol and N. Mukawa, “Motion video coding in CCITT SG XV — the coded picture format,” in Conference Record of IEEE Global Telecommunications Conference and Exhibition—Communications for the Information Age, Hollywood, vol. 2, pp. 992-996, Dec. 1988. R. J. Clark, Transform Coding of Images, Academic Press, Orlando, Florida, 1985. R. Schifer and T. Sikora, “Digital video coding standards and their role in video communications,” in Proc. IEEE, pp. 907-924, vol. 83, Jun. 1995. Scalability for stereo video coding, http://trkjist.ac.kr/~3Dz"Research/Stcrco/scalc_vitico.html T. Berger and L. D. Davidson, “Advances in source coding,” in CISM- Courses and Lectures, no. 166, Udine 1975. T. Ebrahimi, “MPEG-4 video verification model 8.0,” ISO/IEC JTC 1/SC29/WG11 MPEG-97 Doc No. W1796, July 1997. T. Sikora and L. Chiariglione, “The MPEG-4 video standard and its potential for future multimedia applications,” Proc. IEEE ISCAS Conf, Hong Kong, June 1997. 88 [38] [39] [40] [41] [42] [43] [44] [45] T. Sikora, “ The MPEG-4 video standard verification model,” IEEE Trans. CS V T, vol. 7, Feb. 1997. “Test model 5,” ISO/IEC JTCl/SC29/WG11/No. 400, MPEG93/457, Apr. 1993. Testing Ad Hoc Group, “JPEGZOOO testing results.” ISO/IEC JTClJ/SC29/WG1/N705, Nov. 1997. V. Paxson, “End-to-end intemet packet dynamics,” in Proc. ACM SIGCOMM, vol. 27, pp.13-52, Oct. 1997. W. Li, “Overview of fine granularity scalability in MPEG-4 video standard,” IEEE Trans. CSVT, vol. 11, Mar. 2001. W. Pennebaker and J. Mitchell, JPEG Still Image Data Compression Standard, Van Nostrand Reinhold, 1994. W.H. Chen and WK. Pratt, “Scene adaptive coder,” IEEE Trans. Communications, pp. 225-232, 1984. W.Li, F.Ling and H.Sun, “BitPlane coding of DCT coefficients,” ISO/IEC JTCl/SC29/WG11, MPEG97/M269l. Oct. 1997. 89