gut-aha. s,

.h 5;?

in} A.

 

o r-oaucma‘nv

 

 

i. ..., .2 ..

 

 

This is to certify that the
thesis entitled

RATE DISTORTION BASED RATE CONTROL
METHODS FOR SCALABLE VIDEO

presented by

SHARADHA PARTHASARATHY

has been accepted towards fulﬁllment
of the requirements for

”-5- degree in ELECTRICAL ENGR.

 

 

 

/‘/// / / // 1:, 1’ /
1377/ K: 1:11/4 ;
/ . z 1 V L /

f, (2" / Major professor

’/
f y.

Date 7/23/02

 

0-7639 MS U is an Afﬁrmative Action/Equal Opportunity Institution

 

 

 

_ LIBRARY
Michigan State
University

 

 

 

PMCE IN RETURN Box to remove this checkout from your record.
To AVOID FINES return on or before date due.
MAY BE RECALLED with earlier due date if requested.

 

DATE DUE DATE DUE DATE DUE

 

 

 

 

 

 

 

 

 

 

 

 

 

 

6/01 c-JCIRCJDateDuopGS-uts

 

RATE-DISTORTION BASED RATE-CONTROL
METHODS FOR SCALABLE VIDEO

By

Sharadha Parthasarathy

A THESIS

Submitted to
Michigan State University
in partial fulﬁllment of the requirements
for the degree of

MASTER OF SCIENCE
Department of Electrical and Computer Engineering

2002

ABSTRACT

RATE-DISTORTION BASED RATE-CONTROL METHODS
FOR SCALABLE VIDEO

By

Sharadha Parthasarathy

Scalable video coding techniques are used to facilitate the delivery of streamed
video content over networks with varying channel characteristics and unpredictable
quality-of-service ((208) such as the Internet and wireless Local-Area-Networks (LANs).
Rate control methods are typically used with scalability solutions to adapt the video to
variations in bandwidth availability. In this thesis, we develop two rate- distortion (RD)
based rate-control algorithms for scalable video coded using the new ﬁne-grained
scalability (F GS) technique incorporated in the MPEG-4 video-coding standard. First, we
show that the distortion measures, which are based on a square error criterion, can be
computed using a very simple approach in the transform (DCT) domain. Second, we
describe the two RD algorithms and their implementations within the transform domain
of the FGS MPEG-4 framework. Finally, we present the results of experiments conducted
to test the performance of the RD-based algorithms against the existing rate-control
method. We ﬁnd, upon analysis of these results in the transform domain, that the
proposed rate-control algorithms provide optimum (convex) RD curves for a wide range
of bitrates and video sequences. Improvements shown in the RD curves are particularly
salient over bitrate ranges that can be supported over emerging Internet access

technologies (e. g., DSL, cable modems, and wireless LANs).

To My Parents and Meera,

For all your love and support

iii

ACKNOWLEDGEMENTS

I would like to express my sincere gratitude to my advisor, Dr. Hayder Radha, for
providing me with the opportunity to work with him. Without his guidance and support
over the last two years, this thesis would not be possible. I would also like to thank the
members of my defense committee, Dr. John R. Deller and Dr. Percy Pierre, for their

valuable comments and advice.

Thanks are also due to all the members of the WAVES lab at Michigan State
University for their constructive reviews of my work and more importantly for their
ﬁ’iendship and support. I would specially like to thank Apama Gurijala for always being

there for me.

Finally, I would like to thank my parents and my sister for teaching me the
importance of hard work and patience, for their perpetual belief in my abilities and for

their irreplaceable support through all these years.

iv

TABLE OF CONTENTS

List of Figures

1 Introduction

1.1 Objective Of Thesis Work
1.2 Streaming Video Over The Internet
1.3 Scalability Techniques
1.3.1 Fine Granularity Scalability (FGS)
1.4 Summary And Thesis Organization

2 Background

2.1 Rate Distortion Theory

2.2 Image And Video Compression
2.2.1 Compression Techniques
2.2.2 Video Coding Schemes
2.2.3 Distortion Measures

2.3 MPEG Video Coding Standard
2.3.1 Compression
2.3.2 Frame Types
2.3.3 Transform Coding

2.4 MPEG-4 Scalability Structure

2.5 MPEG-4 Video Coding Framework
2.5.1 The Encoder
2.5.2 The Streaming Server
2.5.3 The Decoder

vii

\ILIIUJNH

10
12
13
14
15
l6
16
18
19
20
22
22
27
28

2.6 Summary

3 Bitplane Rate-Distortion Based Rate-Control Algorithm

3.1 Rate And Distortion Measures

3.2 The Rate Control Algorithm
3.2.1 Mathematical Formulation
3.2.2 Description

3.3 Experimental Analysis

3.4 Conclusions

4 Cross-Bitplane Rate-Distortion Based Rate-Control Algorithm

4.1 Developing The Algorithm
4.1.1 Description
4.2 Experimental Results

4.3 Conclusions

5 Summary Of Conclusions

5.1 Future Work

References

vi

29

31

34
37
38
41
44
65

69
72
73
81
83

84

88

LIST OF FIGURES

Figure 2-1 Typical (optimal) RD tradeoff curve 11
Figure 2-2 4:2:0 composite color standard 17
Figure 2-3 I, P and B Frames in MPEG [30] 19
Figure 2-4 FGS scalability structure [30]. 21

Figure 2-5 System conﬁguration for streaming video over the Internet [42] 22

Figure 2-6 Basic (SNR) encoder for the FGS base and enhancement layers.
It is clear that the added complexity of the FGS enhancement-layer
encoder is relatively small [18]. 23

Figure 2-7 An example of different number of bitplane levels for the
different color components. 25

Figure 2-8 an example of the number of passes required to code a block.
This number depends on the arrangement of Is and Os in the block. A
continuous, uninterrupted string of zeros is called a run. Here, number
of runs =4. 26

Figure 2-9 Examples of the FGS scalability structure at the encoder (left),
streaming server (center), and decoder (right) for a typical unicast
Internet streaming application. The top and bottom rows of the ﬁgure
represent base-layers without and with Bi-directional (B) frames,
respectively. Note that only part of the enhancement layer is
transmitted in real time. [18] 28

Figure 2-10 Basic (SNR) FGS decoder for the base and enhancement layers
[18]. 29

Figure 3-1 Pruning of macroblocks- width of the macroblocks represents the
number of bits needed to encode that macroblock (a) Raster-scan
rate—control (b) R—D based rate-control 33

vii

Figure 3-2 Different rate-distortion curves
Figure 3-3 Akiyo I frame Qp=8

Figure 3-4 Akiyo P frame Qp=8

Figure 3-5 Akiyo B frame Qp=8

Figure 3-6 Akiyo I Frame Base layer=100k
Figure 3-7 Akiyo P Frame Base layer=100k
Figure 3-8 Akiyo B Frame Base layer=100k
Figure 3-9 Foreman I frame for Qp=28

Figure 3-10 Foreman P frame for Qp=28
Figure 3-11 Foreman B frame for Qp=28
Figure 3-12 Foreman 1 frame Base layer=100k
Figure 3-13 Foreman P frame Base layer=100k
Figure 3-14 Foreman B frame Base layer=100k
Figure 3-15 Mobile I frame for Qp=28

Figure 3-16 Mobile P Frame Qp=28

Figure 3-17 Mobile B Frame Qp=28

Figure 3-18 Mobile I frame Base Layer=100k
Figure 3-19 Mobile P Frame Base Layer=100k
Figure 3-20 Mobile B frame Base layer=100k

35
47
48
49
50
51
52
53
54
55
56
57
58
59
6O
61
62
63
64

Figure 4-1 Pruning of macroblocks- width of the macroblocks represents the
number of bits needed to encode that macroblock (a) BPRD 4020,, is

selected from bitplane level=j (b) CBPRD- 2m

macroblocks on different bitplane levels

viii

is selected from

68

Figure 4-2 Akiyo I, P and B frames for Qp=8

Figure 4-3 Akiyo I, P and B frames for Base layer=100k
Figure 4-4 Foreman I, P and B frames for Qp=28

Figure 4-5Mobi1e I, P and B frames for Qp=28

Figure 4-6 Mobile I, P and B frames with Base layer=100k

ix

75
76
77
78
8O

1 Introduction

The advent of the World Wide Web has essentially transformed the Internet into the
world’s largest public network. The popular applications have evolved from being ﬁle
transfer and e-mail to video conferencing, Internet telephony, Web searching and
multimedia streaming. For streaming applications in particular, a primary consideration is
the necessary ability to transmit data at sustained and relatively high rates. Recent
technological advances enable the network to handle these data rates. Some examples of the

new tools proposed in this regard include scalability and rate control.

In this chapter, we ﬁrst state the objective of the work presented in this thesis. We then
provide an introduction to streaming video over the Internet and brieﬂy examine certain
constraints posed by this network on such an application. We also discuss the scalability
techniques used to work around these constraints. We conclude this chapter with an

overview of the organization of the rest of this thesis.

1.1 Objective Of Thesis Work

One of the scalability techniques proposed for the MPEG-4 video-coding standard is
ﬁne-granularity scalability (FGS)[17]. This technique involves coding the video into two
layers- a base layer and a single enhancement layer. The base-layer is the non-scalable video
component. It determines the minimum quality of the reconstructed video at the receiving

l

end. The enhancement layer is the scalable video component. In MPEG-4 FGS, the
enhancement layer is partially transmitted and its size depends on the available bandwidth
for transmitting the video. Rate-control is essentially carried out in the FGS enhancement
layer by means of a brute-force raster-scan bit allocation method. Such a rate-control method
has very low complexity but it is not optimal. In this thesis, we develop and analyze the
performance of two rate—control algorithms that result in optimal rate-distortion performance
for the F GS enhancement layer. The algorithms are operational rate-distortion algorithms, in
the sense that they are designed speciﬁcally for the MPEG-4 FGS encoder-decoder system.
The analysis and development presented in this thesis is carried out in the transform domain.
The distortion measures selected for the two algorithms are designed to minimally increase
the computational complexity. The presented RD-based algorithms are easy to implement
and have low complexity. They are also independent of the rate-control scheme used in the

base layer.

1.2 Streaming Video Over The Internet

The Internet has its roots in the ARPANET[26]. It was initially designed for networking
research and for scientists to exchange information amongst themselves. The primary
applications were download-based, wherein the transfer and viewing are temporally
sequential. Real-time multimedia applications differ from the traditional Internet applications
in that viewing takes place at the same time as downloading. For example, in the case of
streaming applications, such as video streaming, the user views the video as the transfer

occurs. In other words, transfer and viewing are concurrent. The best effort model on which

2

the Internet is based[41], does not offer data delivery reliability. Hence, there is a variation
in the probability of packet loss. This could lead to an unacceptable quality in a video
streaming application. Another concern in the streaming of video over the Internet is high
bandwidth variation. The heterogeneous nature of the access-technologies of the receivers
(analog modems, cable modems, DSL etc.) and congestion over the core network are two of

the primary reasons for non-constant bandwidth availability.

It is necessary to overcome the problems posed by unreliable packet transmission and
bandwidth availability for dependable video streaming quality. The objective of many
contemporary research efforts in this ﬁeld has been to ﬁnd Internet video coding solutions so

as to make Internet video quality comparable to television telecasts.

One generic framework that addresses the primary concerns of video coding and
networking is scalability. A video application that is to be streamed over the Internet must
“adapt” to changes in the network conditions- speciﬁcally to the variations in the available
bandwidth. This is made possible by using scalable video. Scalable video coding primarily
aims to provide minimum real-time processing of the video, high adaptability of the coded

video to network conditions, low complexity decoding and resilience to packet-losses.

1.3 Scalability Techniques

Scalability is one of the most desirable properties in a ﬂexible video encoder-decoder
module. The basic idea here is to use layers of video. The layers comprise of a single non—

scalable base layer and one or more enhancement layers. If a user is incapable or unwilling

to reconstruct the video at its complete resolution, scalability allows that user to decode
subsets of the layered bitstream. Therefore, he or she can display the video at a lower spatial
or temporal resolution or with lower quality. Another important purpose for layering of

video is that it makes the bitstream suitable for prioritized transmission.

Some common tools used in recent video coding standards (such as MPEG-2) are: data
partitioning, SNR scalability, spatial scalability and temporal scalability[34][28]. Hybrid

scalability is also supported. Brieﬂy, the salient features of these scalable techniques are:

Data Partitioning: Here, the bitstream is partitioned into important error-prone data (e. g.,
header information and motion vectors) and less-important data (e.g., transform
coefﬁcients). This method, which allows for different protection levels during transmission
for the data types of data in the compressed video bitstream, has low implementation

complexity. It helps in concealment of transmission errors and channel errors.

SNR Scalability: In this method, video layers at the same spatial resolution are generated.
These layers, however, have different video qualities. The lower layer (usually known as the
base layer) provides a basic video quality and the enhancement layers, when added to the

base layer, augment the video quality. The layers are generated from the same source.

Spatial Scalability: This scalability technique involves creating different video layers
having different spatial resolution. The layers are again, generated from a single video
source. The base layer provides the basic spatial resolution. The enhancement layer(s)

encode the interpolated video from the base layer.

Temporal Scalability: Here, the base layer has the basic temporal resolution. The
enhancement layer(s) are coded using temporal prediction with respect to the base layer. At
the decoder, the layers are temporally multiplexed to yield the full temporal resolution of the

video source sequence.
1.3.1 Fine Granularity Scalability (FGS)

In the layered coding schemes mentioned above, the enhancement layer is similar to the
base layer in the sense that, by itself, it is non-scalable. In other words, if the enhancement
layer is not entirely received, it cannot be used to enhance the video quality at all. The
desired objective is to have a video coding scheme that provides scalability progressively

through the enhancement layer.

A new scalability technique called ﬁne-granularity-scalability (FGS) was proposed in
[17] that addressed the said objective. This technique has been included in the MPEG-4
standard for video coding[21][18][42]. The major difference between this technique and the
other scalable coding techniques is that the enhancement layer can be truncated into any
speciﬁed number of bits within every frame. This allows for partial enhancement

proportional to the number of bits decoded for each frame.

The FGS base and enhancement layers employ a video compression algorithm wherein
the original video frames are ﬁrst transform coded using the discrete cosine transform
(DCT). The DCT coefﬁcients are then quantized and run-length encoded. The base layer is
coded at a constant bitrate by means of a rate-control algorithm. A rate control algorithm is

fundamentally a bit allocation algorithm, which guarantees that the average bitrate of the

5

compressed video is a coded at a desired bitrate. This enables any video transmitter (e. g., an
Internet video server) to send the compressed bitstream at that constant rate. One rate—control
mechanism that is used for the F GS base layer is the MPEGZ-TMS (test model 5)

algorithm[39].

In the base layer, the DCT coefﬁcients are ﬁrst subject to quantization. The quantized
values are subsequently run-length coded to form the base layer bitstream. For every frame
of the video sequence, the difference between the original frame and the frame reconstructed
aﬁer inverse quantization gives what is called the residual frame. The residual frame is also
DCT coded. The residual DCT coefﬁcient for every pixel of every frame is entropy encoded
to form the FGS enhancement layer. The entropy coding method used for the FGS
enhancement layer is bitplane coding. In this method, each residual DCT coefﬁcient value is
considered to be an integer that consists of several bits. Therefore, the residual DCT
coefﬁcients, which belong to a given FGS enhancement-layer frame, form multiple
bitplanes. These DCT bitplanes, which range from the most-signiﬁcant-bit (MSB) plane to
the least-signiﬁcant one, give the desired SNR (quality) scalability of the FGS structure.

Details of this coding method are explained later in Chapter 3.

F GS enables the separation between the encoding process and the transmission process.
In other words, the encoder compresses the bitstream prior to the time when the stream is
sent to the receiver(s). This separation is a required feature for applications that rely on
streaming pre-compressed video that is already stored in a web-server. When the video has
to be streamed, the base layer is sent through completely using the constant bitrate that it has
been coded at. An FGS-compatible streaming server sends the appropriate portion of the

6

enhancement layer in a raster-scan order, starting from the most-signiﬁcant-bit (MSB) plane
of that layer. When the server reaches the bandwidth limit (that is available between the
transmitter and the receiver), it discards the remainder of the enhancement layer. This
process represents a rate-control procedure that is very simple to implement, but is not
optimal. In this thesis, we develop and analyze the performance of an optimal rate-distortion
(RD) based rate-control mechanism for the FGS enhancement layer. Based on this
framework we present two RD-based algorithms. These algorithms are easy to implement
and have low complexity. They are also independent of the rate-control scheme used in the

base layer.

1.4 Summary And Thesis Organization

Internet content developers have increasingly used multimedia to make information more
interactive and accessible. There are a growing number of web sites that make available the
option of streaming applications, particularly streaming of video content. Considerable
research has been done towards improving the quality of streamed video
[2][3][l7][37][41][34]. Scalable video coding has been adopted to improve the overall
performance and quality of the video received by the end user. Fine-Granularity Scalability

(F GS) is used with the MPEG-4 video-coding standard for this purpose.

Simple rate control algorithms have been deﬁned and developed for the FGS base layer.
The enhancement layer is cut using a coarse raster-scan approach. In this thesis, we
introduce a rate-distortion based solution for optimal rate-control of the FGS enhancement

layer.

The remainder of this document is organized as follows. Chapter 2 provides a short
discussion of classical rate—distortion theory. It also explicates the need for image data
compression and lists some common encoding techniques. This is followed by deliberations
on the issue of determining a suitable measure of distortion. A few important aspects of the
MPEG video-coding framework are emphasized. Also, an overview of the F ine-Granularity-
Scalability technique used in the MPEG-4 video-coding standard is presented in this chapter.
This includes a discussion of the bitplane-coding method used in FGS and some important
features of the F GS encoder and decoder. In Chapter 3, the mathematics and logistics for the
bitplane rate-distortion based rate-control (BPRD) algorithm are described in detail. These
are accompanied by a comprehensive discussion of experimental results. In Chapter 4, the
mathematical formulation, the algorithm and the results of experimental analysis of an
extension to the BPRD, called the cross-bitplane rate-distortion rate-control (CBPRD)
algorithm, are elucidated. The ﬁnal chapter provides a summary of the conclusions made
from the analysis of the two algorithms proposed in this thesis and states the possible future

work in this area.

2 Background

The need for effective and standardized image compression procedures has been
ampliﬁed by the rapid growth in digital imaging applications such as multimedia,
teleconferencing and high-deﬁnition television (HDTV). In the past few decades, standards
have emerged for still images (JPEG) and motion video (MPEG). In the case of transmitting
video over a network such as the Internet, a primary concern is compressing the video to ﬁt
within the available bandwidth. A compression scheme typically introduces some error into
the video. It is essential that an optimal balance be struck between the rate to which the
video is restricted and the distortion introduced by such a restriction. A branch of classical

information theory, called rate-distortion theory addresses this optimization problem.

The ﬁrst part of this chapter contains a simple explanation of rate-distortion theory and
its relation to the compression of image data. Some of the techniques used for still image
compression and video coding are discussed. Also, the problem of determining a suitable

distortion measure for different types of image data is studied.

In the second part of this chapter, a few important aspects of the MPEG video-coding
framework in general and the MPEG-4 standard in particular are emphasized. MPEG-4
video is intended to provide standardized technology for efﬁcient storage, transmission and
manipulation of video data in the broad spectrum of multimedia applications. Instead of

being application speciﬁc, the solutions are presented in the form of tools and algorithms

that have functionalities common to a set of applications. Efﬁcient compression, object
scalability, spatial and temporal scalability and error resilience are examples of such
functionalities. Our focus in the later sections of this chapter is speciﬁcally the ﬁne-

granularity scalability (FGS) scheme included in MPEG-4.

2.1 Rate Distortion Theory

Rate-Distortion theory can be traced back to Shannon [5][6]. In his Coding Theorem,
Shannon states that a source with an entropy rate of H can be transmitted reliably over any
channel of capacity C subject to the condition that C>H. The converse of this theorem states
that not all the source data can be recovered reliably when H>C. It follows from this that in
order to recover all data correctly, we can either decrease H or increase C. We typically
assume that the channels are such that they have the minimum possible H and the maximum
possible C but the condition of greater entropy (H>C) persists. In such a situation, we
attempt to preserve those aspects of the source data that are most essential. In other words,
we attempt to minimize the distortion subject to the condition that the rate of transmission
cannot exceed the channel capacity. This leads to a trade-off between the rate at which
information is provided at the source output and the ﬁdelity with which this output can be
reconstructed at the receiving end, on the basis of the information provided. This trade-off is
mathematically modeled in the context of information theory by a discipline known as Rate-
Distortion theory [35]. Thus, rate-distortion theory is concerned with maximally reducing

the redundancy from a source (or the number of bits necessary to represent the source

IO

information) for a given reproduction quality. A graphical representation of a typical rate-

distortion trade-off is shown in Figure 2-1

Typical Rate-Distortion Curve

 

 

 

 

0.4 I V j
cask -
l
0.3 ~ \ -‘
\
0.25 ~ \\ 5
E3 \\
'O \
"’ 0.2 ~ \ ..
35’ “x
E ‘x
015 " \ -i
0 1 \\“\ —1
.\\
D 05 ’- \\\‘\ T
“K
0 I 1 l 1 I l 1
1 1.5 2 2.5 3 3.5 4 4.5 5 5.5
Bitrale bps

Figure 2-1 Typical (optimal) RD tradeofl' curve

If the bitrate R at which information is provided exceeds the entropy H, then no
information loss occurs and hence no distortion. The minimum distortion that can be
obtained at any rate between R=H and R=0 increases from D=O to some value D=Dmax. If no
information is transmitted (i.e., when R=0), the receiver can estimate the source by using its
mean value. In this case, the mean-square distortion D=Dmax is the variance of the source.
This distortion is the minimum possible value that can be made in the total absence of

information about the output from the source. For a given available rate Rmax, the objective

ll

of an RD-based algorithm is to identify a subset X ' of the source data X that can be

represented with R (or fewer) bits while minimizing some distortion measure D:

X‘ =arg{ min 00292)} (2.1)
X’ RSRmax

We now proceed to look at the concept of rate-distortion in the context of coding image and

video signals.

2.2 Image And Video Compression

The volume of data associated with visual information is extremely high. The very
conversion of an analog video signal into a digital signal calls for a tremendous increase in
required transmission bandwidth. For instance, a 4 MHz television signal sampled at
Nyquist rate with 8 bits per sample would require a bandwidth of 32 MHz when transmitted
using a digital modulation scheme like phase-shift keying (PSK). Thus, the analog to digital
conversion results in almost an eightfold increase in bandwidth [1]. Also, video image
sources generate data at very high rates. Typical motion data images like television images
are generated at rates greater than 10 million bytes per second. Other sources produce
images at even higher bit rates. Evidently, transmission of such data requires large channel
capacity, which can be very expensive and not available to all users in many applications
like streaming video over the Internet. There is, therefore, a need to reduce the data volume

and accommodate the data within the available bandwidth. Image data compression

[2

techniques are applied to compensate for the increase in bandwidth caused by digital
conversion of the video signal. They do so by reducing the number of bits required to

transmit such data with tolerable loss of information.
2.2.1 Compression Techniques

Compression can be achieved by the use of “lossless” techniques. Here, the data obtained
after decompression are an exact copy of the original (e.g. staple software tools like zip or
compress). Lossless compression schemes can be classiﬁed as spatial coherence based
encoding (e.g. run-length encoding and statistical encoding), entropy encoding (e.g.
Huffman encoding, arithmetic encoding) and codebook encoding (e.g. LZW coding). Details
on these compression schemes are in [27][24]. Such techniques are independent of the type
of information. They are only concerned with how the information is represented. They are
applicable in cases where a perfect reconstruction of the source is an essential requirement.

Lossless compression techniques are mainly used for compressing digital manuscripts etc.

When image or video data are being handled, the large volume of the data, the bandwidth
and storage requirements limit the performance of a lossless compression method. In such a

scenario, “lossy” compression techniques are a preferred alternative [43].

Compression is said to be lossy if the decoded image is not an exact replica but is instead
an almost indistinguishable reconstruction of the original image (we can recreate an image
that is “indistinguishable” from the original if the properties of the human visual system are
correctly exploited). Here, source ﬁdelity is compromised for a better and more feasible

coding rate. This compromise or trade—off is exactly the rate-distortion trade-off [3]. The

I3

trade-off is between the number of bits used to represent the image (the rate) and the ﬁdelity

of the reconstructed representation (the distortion).

Lossy encoding is deﬁned as a data compression algorithm that loses some of the input
data during encoding in order to achieve a better compression ratio. Some common lossy

compression schemes are predictive coding and transform coding [32].

2.2.2 Video Coding Schemes

Digital video coding technology has diversiﬁed and it is targeted for a plethora of
emerging applications, such as video on demand and digital TV/HDTV broadcasting. The
rising commercial interest in video communications posed the need for international image
and video coding standards. Today, there are various video coding standards available.
These standards were developed in conjunction with the development in coding,

transmission, video storage and VLSI design technologies.

The Study Group XV of the Community Colleges for Innovative Technology Transfer
(CCITT)l proposed the ﬁrst international standard for video coding, H.120, in l984[8]. With
advances in image compression techniques and implementation technology, other standards
have been proposed and been more commercially successful. These include H.261, H.263,

Motion JPEG, MPEG-1, MPEG-2 and MPEG-4 [33],[31]-[10],[16]. The JPEG and MPEG

 

' The CCITI‘ was re-named as the International Telecommunications Union — Telecommunications (ITU-

T) sector in 1993.

I4

visual coding standards were developed under the auspices of the International Standard
Organization (ISO). In section 2.3, we review some of the characteristics of the MPEG video

coding standards.

2.2.3 Distortion Measures

Once an image has been converted into a digital format and is subjected to lossy
compression, the question of how to measure the trade-off in the ﬁdelity of the
decompressed representation arises. Extensive and continuing studies have been performed
for a suitable measure of the distortion for images and video. An objective distortion
measure that is widely used and accepted is the mean-squared error (MSE). It has been
shown that systems that are optimized for MSE performance also give fairly good results in
terms of the perceptive quality of the image [12][40]. The Mean-Squared Error is the sample
average of the total squared error between the compressed and the original image. It is
mathematically represented as follows:

_1_
MN

MSE =

ﬁ[1(x,y)-1'(x,y)]2 (2.2)

l x=l

[V]:

‘C
ll

where, I (x, y) is the original image, I '(x, y) is the decompressed image and M, N are the

dimensions of the image.

Another metric that is used for evaluating the performance of an image compression
technique is the peak signal-to-noise ratio (PSNR). The PSNR is closely related to the MSE

and is expressed in terms of the MSE as:

15

PSNR = 20* 16g10 (255/JMSE) (2.3)

A lower value of MSE implies a lesser error in the decompressed image. Inversely, a
high PSNR is desirable because it implies that the ratio of signal to noise is higher where the

‘signal’ is the original image and the ‘noise’ is the error in reconstruction.

2.3 MPEG Video Coding Standard

The Moving Picture Experts Group (MPEG) was founded in 1988 to standardize video
coding algorithms for digital storage media at bit rates up to about 1.5 Mbits/s [19].
However, beginning with MPEG-2, the standards have been more generic. This means that
the standard is not application speciﬁc and mainly consists of a set of tools that can be
tailored to meet the needs of the user. In this section, we review some important

characteristics of the MPEG framework.

2.3.1 Compression

Video consists of sequences of still pictures or frames that, if uncompressed, have data
volumes that are extremely high. In the MPEG standard, each frame is divided into an array
of 16x16 pixels, called a macroblock. Each macroblock comprises of four 8x8 luminance
(intensity) blocks (or Y-blocks) and two 8x8 chrominance (color) blocks (Figure 2-2). The
chroma blocks are each one U-block (blue) and one V-block (red). The color information,
therefore, has half the horizontal and vertical resolution as the luminance Y information

(ITU—R 601). This component video standard is known as the 422:0 standard. A picture with

[6

a luminance resolution of 360 lines and 288 pixels per line is known as the common

interchange format (CIF).

1W

 

 

Figure 2-2 4:2:0 composite color standard

Popular video coding standards (including MPEG) employ a hybrid Motion-
Compensation (MC) prediction and transform-domain DCT-based compression method.
First, a macroblock at a given ﬂame is predicted ﬂom one or two adjacent ﬂames using
Motion Estimation (ME). The prediction error is then transformed (using DCT), quantized,
and entropy-coded. The type of prediction that is used for a particular ﬂame deﬁnes different

picture types as explained below.

2.3.2 Frame Types

MPEG video sequences consists of ﬂames of three types:

17

Intro-coded Frames: (1 Frames): The I ﬂames use DCT encoding to compress one single
ﬂame, without being dependent on any other ﬂame. I ﬂames are inserted once every 12-15

ﬂames and are used to start a sequence. Decoding of video can only start at an I frame.

Predicted Frames: (P Frames): P ﬂames use motion compensation and DCT encoding to
“predict” the values of each pixel. These frames are coded as the difference ﬂom the last I or
P ﬂame. They can offer a better compression ratio than I ﬂames, depending on the amount

of motion present.

Bi-directional Frames: (B Frames): B ﬂames also use motion compensation and DCT
encoding to estimate the value of each pixel. They differ ﬂom the P ﬂame in the sense that

they can use either the last I or P frame or the next I or P ﬂame to get the value of each pixel.

Quantization further improves the compression ratio in all the ﬂames. It is also useful to
note, at this juncture, that since B Frames use both previous and subsequent ﬂames for
correct decoding, the temporal sequence of the ﬂames in MPEG video differs ﬂom their
spatial sequence. Figure 2-3 illustrates the types of ﬂames and their sequential orders in

space and time.

 

BIBBPBBPBBPBBI

Figure 2-3 I, P and B Frames in MPEG [30]
I8

2.3.3 Transform Coding

In MPEG video coding, the Discrete Cosine Transform (DCT) is used as part of the

video compression algorithm.

The DCT is a mathematical transformation of a 2D matrix of pixel values into an
equivalent matrix of spatial ﬂequency components. It is an invertible, discrete, separable

orthonormal transformation. The DCT can be mathematically represented as:

B(k,,k,)= {—13— In] (— 2%) iii/\(iymjnosB—NL"(2i+1)]cos[2;:(2j+1)].f(i,j)

i-O j-O

 

where,

A(§)= 21’“ 0

l,otherwise
(2.4)
and A is the input image, B is the output image, A(i, j) is the intensity of the pixel in row i
and column j within an Nl x N2 image block, B(k1,k2) is the DCT coefﬁcient in row kl and
column k2 of the DCT matrix.
The DCT, by itself, is a lossless transform. However, when used as part of an image
compression algorithm, it is subjected to quantization and subsequent run-length encoding.

In MPEG, the DCT is used to transform blocks of pixel values into blocks of spatial

frequency values. These blocks are organized in such a way that the lower-ﬂequency

19

components lie in the upper-left comer and the higher-ﬂequency blocks lie in the lower-right
corner. By this, the high ﬂequency components are ﬁrst dropped or discarded. Once DCT
has been applied to an image, no spatial localization can be recovered. Also, canceling out
sets of ﬂequencies can lead to high distortion in image regions where the low frequency
components have low amplitude. In order to overcome these shortcomings, the compression
algorithm in MPEG applies the DCT to 8x8 blocks. This speeds up the algorithm and also,

localizes the ﬂequency content.

2.4 MPEG-4 Scalability Structure

Scalable video coding solutions have been used to facilitate delivery of multimedia
content over networks with varying channel characteristics and unpredictable quality-of-
service measures (e.g. the Internet)[7]. All types of scalable video consist of a base layer and
one or more enhancement layers. The base layer represents the minimum data needed for
decoding the video, so as to obtain a minimum video quality. The enhancement layer

represents additional information that, if sent to the decoder, enhances the video quality.

Scalable video solutions are usually based on some scalability structure. This structure
deﬁnes the relationship between the base layer and the enhancement layer ﬂames of the
video sequence. One scalability structure supported by MPEG-4 is ﬁne-granular scalability.
This structure allows for progressive decoding. This means that the decoder can start
decoding once it has the base layer and any or no part of the enhancement layer. As more
data is received, the quality of the decoded image improves. This process continues until the

entire enhancement layer is received.

20

The F GS scheme in MPEG-4 is based on the scalability structure proposed in [17]. FGS
is essentially a hybrid video-scalability solution. It has a prediction-based, motion-
compensated base layer and a ﬁne granular scalable enhancement layer[l7]. This is
illustrated in Figure 2-4. These characteristics allow it to cater to the dual requirement of

efﬁcient coding and continuous ﬁne-grained scalability.

 

FGS Enhancement layer

 

 

 

 

Figure 2-4 FGS scalability structure [30].

2.5 MPEG-4 Video Coding Framework

A typical system conﬁgured for streaming video over the Internet has the basic blocks of
an encoder, streaming server, channel and decoder. This basic block diagram is shown in

Figure 2-5.

21

 

 

—-> Encoder

 

 

 

 

 

 

l' t/
Server 4 Channel .1 C ten —§
Decoder

 

 

 

 

 

 

 

 

 

 

 

 

Figure 2-5 System configuration for streaming video over the Internet [42]

In this section, we shall look at the structure of these blocks in the MPEG—4 FGS
framework. We proceed in logical order, starting ﬂom the encoder and ending with the

decoder.

2.5.1 The Encoder

The detailed block diagram for the MPEG-4 F GS encoder is shown in Figure 2-6. The

FGS framework requires the encoder to have two encoders- one to generate the base layer

22

    
 

 

 

"Adaptive Quantization"
(optional)

    
  
    
  
  
  

 

  
 

FGS EL
Frequency Weighling Bitplane DCT scanning and
(bit-shifting selected coeff.) entropy coding “ream

 

 

 

   
    
   
 

Selective Enhancement
(bit-shifting selected MBs)

DCT (bitplanes)
Residual
lma_e

    

 

 

FGS Enhancement Layer
Encoder

 

 

Original

     
   
     
 

Entropy
Encoder

  

Inverse

DCT

 

In verse
Quanlt.
Q .

   
   

 

Frame
Memory

M no
'I Estimation I

Base Layer
Encoder

 

 

 

V

 

 

Figure 2-6 Basic (SNR) encoder for the FGS base and enhancement layers. It is clear that the added

complexity of the FGS enhancement-layer encoder is relatively small [18].

bitstream and the other to generate the enhancement layer bitstream. The base layer is

coded to a minimal acceptable video quality. In other words, the minimum available

bandwidth must always be at least equal to the bitrate chosen for the base layer. This ensures

that the reconstructed video has a quality equal to that provided by the base layer bitstream.

23

Thus, the base layer is coded to the minimum available bandwidth (RBL ~R ). The

enhancement layer ﬁtlly utilizes the additional available bandwidth at any instance of time.
This can be estimated in real-time. The upper limit on the size of the enhancement layer

(Ru = Rm — REL) is determined by the maximum bandwidth Rmax that is available at time

of transmission. An important observation here is that the range [R Rm] can be

min ’

determined off-line.

The base-layer encoder contains motion compensation and motion estimation tools.
Hence, the base layer bitstream is organized into I, P and B ﬂames. The base layer is
compressed using the DCT-based video encoding method. This method demonstrates good
coding efﬁciency, particularly at low bitrates. The DCT coded video is quantized and sent to
an entropy encoder to form the base layer bitstream. In the base layer, the DCT coefﬁcients
are run-length coded. The video coding tools used for coding the base layer are described in

length in [38], [371, [201, [221.
Encoding The Enhancement Layer

The F GS enhancement layer takes two inputs- the original ﬂame and the reconstructed
base-layer ﬂame. The difference in the intensity value of every pixel in the original ﬂame
and the corresponding pixel in the reconstructed ﬂame is DCT coded. The DCT coding
follows the traditional 8x8 blocks described above. Also, each set of four 8x8 luminance (Y)

blocks and two color (U and V) blocks form the typical 16x16 macroblock structure.

Once all the DCT coefﬁcients in a ﬂame have been obtained, the one with the highest

absolute value is determined. The number of bits required to represent this maximum value

24

gives the maximum number of bit levels or bitplanes required to code the enhancement-layer
(residual) ﬂame. The absolute values of the coefﬁcients in every 8x8 DCT block are zigzag
ordered into an array. Consider an array of 64 bits representing each of these coefﬁcients at a

particular signiﬁcant position say p. This array is said to be a block bitplane at level p.

Every frame has all the three-color components (Y, U and V) coded into it. The

maximum number of bitplane levels required to code each of these components may differ.

Mathematically [18],

N,,, = [16g, (1 C [Wu +1 (2.5)

where, N 3,, is the number of bitplanes and IC |mam is the maximum DCT magnitude of

the given residual ﬂame.

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

Y U V
Bitplane 0 Bitplane 0 Bitplane O
Bitplane l Bitplane l Bitplane l
Bitplane 2 Bitplane 2 Bitplane 2
Bitplane 3 Bitplane 3 Bitplane 3
Bitplane 4 Bitplane 4 Bitplane 4
Bitplane 5

 

 

 

 

Figure 2-7 An example of different number of bitplane levels for the different color components.

The maximum number of bitplanes required to code the ﬂame under consideration is that
corresponding to the greatest of the maximum levels of Y, U and V. Hence, not all the

blocks have the same number of bitplane levels. This is illustrated in Figure 2-7.

25

The maximum number of bitplanes required to code the ﬂame under consideration is that
corresponding to the greatest of the maximum levels of Y, U and V. Hence, not all the

blocks have the same number of bitplane levels. This is illustrated in Figure 2-7.

Each such block is now subject to variable-length coding (VLC). The code length
depends on the signiﬁcant position of the bitplane level under consideration. It must be
noted here that this is the absolute signiﬁcant position of the level in the binary

representation of the largest DCT coefﬁcient in the given block.

The number of passes required to code a particular block depends on the number and
order of Is and OS in each block.Figure 28 illustrates an example that helps understand this

better.

 

 

 

 

 

 

 

 

0 O O 0 O O l 0
O O 0 O O 0 O 0
0 O O 0 O 1 l l
1 O O 0 O O O 0
0 O O 0 0 O O O
0 0 0 O O O O O
O 0 0 0 0 0 O O
0 O 0 0 1 O O 0

 

 

 

 

 

 

 

 

 

 

Figure 2-8 an example of the number of passes required to code a block. This number depends on the
arrangement of Is and OS in the block. A continuous, uninterrupted string of zeros is called a run. Here,
number of runs =4.

More information on the bitplane coding method used for the FGS enhancement layer

can be found in [l4],[18],[42],[21].

26

2.5.2 The Streaming Server

The second block in the Internet streaming video system structure is the streaming
server. In the case of the MPEG-4 FGS system, the streaming server determines the size of
the portion of the enhancement layer that gets transmitted. Figure 2-9 shows the total,
lossless enhancement layer being input to the server. The shaded region indicates the part of
the enhancement layer of every ﬂame that ﬁts within the available bandwidth. As shown in

the ﬁgure, only this part of the enhancement layer reaches the decoder.

In the current implementation of the MPEG-4 F GS standard, the streaming server
allocates bitrates for the enhancement layer of every ﬂame by a brute-force method. The

basic equation deﬁning this allocation is:

BFR(EL)=BTOT(E%FR (2.6)

where, BFR(EL) is the bitrate allocated for a given ﬂame in the enhancement layer,
er (EL) is the total bitrate available for the enhancement layer of the video sequence and

N FR is the number of ﬂames in the video sequence.
Within a frame, the streaming server performs a crude, raster-scan driven pruning of the
enhancement layer to ﬁt the bitrate allocated for that ﬂame. This is a low-complexity, non-

optimal bit allocation method. The objective of this thesis is to develop and analyze the

performance of optimum (in a rate-distortion sense) algorithms for FGS rate control.

27

Portion of the FGS ,.
PCB enhancerrent layer C] enhancemert laya trammittod i

 

 

 

 

inreal-time
Attheeneoder Atthestrmningserver Atthedeeoder

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

l

S
l
1
i
1
1
i

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

Eselayer
FJihamerImIayer I
“-1 i t r":
l
Baselayer j

 

 

 

 

 

 

 

 

 

 

Figure 2-9 Examples of the FGS scalability structure at the encoder (left), streaming server (center), and
decoder (right) for a typical unicast Internet streaming application. The top and bottom rows of the
ﬁgure represent base-layers without and with Bi-directional (B) frames, respectively. Note that only part
of the enhancement layer is transmitted in real time. [18]

2.5.3 The Decoder

The last block in the Streaming server system is the decoder. It is the client-side module.

The block diagram for an MPEG-4 FGS decoder is shown in Figure 2-10 below. The

28

structure of the decoder is basically the reverse of that of the encoder. The decoder
reconstructs the video ﬂames using the base layer bitstream and the truncated enhancement
layer bitstream. It is useful to note that the two bitstreams are decoded independently and
added to form the reconstructed ﬂame. The quality of the video is proportional to the

number of bits decoded by the decoder for the enhancement layer of each ﬂame.

Enhancement Layer

FGS . '5 Enhanced
Bit-plane Bit-plane
VLD de—shiﬂing » Video

 

 

   

    

Motion
Compensation

  
 
 

  

’ H —-BLMVS

 

 

 

 

 

 

Base Layer BL L
BL Frame Memory V
Video
stream (optional)
Inverse
Base Layer ’ Quantization Inverse
VLD Q" DCT

 

 

 

 

 

 

 

 

 

 

 

Figure 2-10 Basic (SNR) FGS decoder for the base and enhancement layers [18].

2.6 Summary

To summarize, rate-distortion theory indicates that there exists a minimum distortion

level that can be associated with any given bitrate lesser than the total lossless bitrate. In the

case of image data compression, the losses incurred due to applying an image compression
algorithm lead to a similar trade off between rate and distortion. A widely used measure of

distortion for image data is the Mean Squared Error.

The compression algorithm in the MPEG-4 video-coding standard comprises of
transform coding using the DCT, quantization and run-length encoding. The FGS scalability
scheme allows for progressive scalability in the enhancement layer. This is possible because
the enhancement layer is coded using the bitplane coding technique at the encoder. The
streaming server roughly determines the bitrates for every ﬂame. It employs a brute-force
approach to allocate bits to meet this bitrate constraint. At the decoder, the base layer and
enhancement layer bitstreams are decoded separately and added together to form the

reconstructed ﬂames of the video sequence.

30

3 Bitplane Rate-Distortion Based Rate-
Control Algorithm

In this chapter, we describe and develop a rate-distortion based rate-control algorithm for
the specific encoder-decoder system of the MPEG-4 F GS standard. This makes it an
operational rate-distortion based algorithm. The advantages of using an operational curve is
that the data points used are directly achievable for the implementation of the system and for

the given test data.

FGS codes a video sequence into a base layer and a single enhancement layer. The base
layer is non-scalable and is equal in size to the lower bound of the available bandwidth. The
residual DCT coefﬁcients representing the difference between the original picture and the
reconstructed picture are bitplane coded and comprise the enhancement layer. The
enhancement layer can be truncated within each ﬂame to allow partial enhancement

proportional to the number of bits decoded for every ﬂame.

The limiting of the base layer bitrate to the minimum available bandwidth is a typical
rate-control problem. Rate control for non-scalable coders is essentially based on a buffering
technique. The quantization step size is based on the fullness of the output buffer and the
characteristics of the picture to be coded [44], [4], [l3]. MPEG-4 uses the rate control
scheme proposed for MPEG-2 and which is commonly known as the Test Model 5 (TM5).
This rate control algorithm consists of three steps- target bit allocation for a ﬂame, rate

control via buffer monitoring and adaptive quantization based on local activity [39]. The

3]

TM5 algorithm is used to restrict the base layer to the lower bound of the available

bandwidth.

The enhancement layer bitrate depends on the bandwidth that is available for the video
sequence streaming above the base layer bitrate. The encoded enhancement layer bitstream
is sent to the streaming server. Effectively, the streaming server performs a raster scan of the
bitplanes- starting ﬂom the ﬁrst macroblock through to the last on a particular bitplane. The
ﬁrst bitplane scanned is the one that corresponds to the most signiﬁcant position of the
largest DCT value in that ﬂame. Once all the macroblocks in a particular bitplane have been
sent, the server starts sending the macroblocks ﬂom the beginning of the next highest
signiﬁcant position bitplane. This is illustrated in Figure 3-1. The server stops sending more
enhancement layer information whenever the target available bitrate is reached for that
ﬂame. This is a coarse raster-scan bit allocation method that has very low complexity.

However, such a method is not optimal since it performs brute-force bit-allocation.

When the streaming server prunes macroblocks ﬂom the enhancement layer bitstream, it
adds to the distortion in the reconstructed enhancement layer image. The rest of this chapter
is devoted to investigating a low computation, low complexity rate-control algorithm for the
F GS enhancement layer that minimizes this distortion. It is also intended that this algorithm

be compatible with the rate-control mechanism employed for the base layer.

32

Frame F

I evel
=8- 1

(L33)

    

 

 

 

level
=.i
53w pm," '
I: LrIrumittedmncroblocka E pruned macroblock-
(a)
FrameP

level

=B-I
(LSB)
level
=i

 

 

 

 

x- W‘— "a
. 4" - .1 .57
. .

[: trnnmniucdmncroblocka pruned mna-oblocka

Cb)

 

Figure 3—1 Pruning of macroblocks- width of the macroblocks represents the number of bits needed to
encode that macroblock (a) Raster-scan rate-control (b) R-D based rate-control

33

3.1 Rate And Distortion Measures

Let M be the set of all macroblocks in the enhancement layer bitstream. The distortion

added to the enhancement layer image by pruning any macroblock m is represented by AD," .

Let 9.71 c M be the subset of macroblocks that is pruned by the streaming server when it
implements the existing raster-scan rate-control method. Our objective here is to determine

the Optimal subset of macroblocks, am that minimizes the total distortion in the

apt ’
reconstructed image i.e.,

9310p, =arg{r%iln Z ADm} (3.1)

016931

We make a rate-distortion interpretation of equation (3.1). In terms of the available

bitrate for video transmission and the associated distortion, we can state that our objective is

to determine the set of optimal macroblocks shop, that will minimize the total distortion in

going from the lossless bitrate Ru of the enhancement layer to a target bitrate R, such that

R, $12”.

From Figure 3-2, we can see that in order to get to any bitrate that is lesser than the
lossless bitrate, there is more than one path that can be followed. To ﬁnd the path with the
least associated distortion at all points or the optimal performance curve, we have to ﬁnd the
one that will have minimum slope at every point. Hence, in a rate-distortion sense, our
objective is to minimize the ratio between the increase in the distortion associated with a
given reduction in the rate. In terms of the macroblocks to be pruned, our objective is to

34

determine the set of macroblocks that, when pruned, result in the minimum slope of the rate-
distortion curve. This implies that we have to determine the rate-distortion ratio associated

with the pruning of each macroblock in the ﬂame.

 

 

 

 

 

u.‘ I 7 I U I I I I
136 '- 4
11.3 ‘ ‘-—-—.__ 1

\‘H
125 ‘, a
MSE K“ X‘ x
0-2 " -\ - . q
-, X“
\ \
3.15 - \ ..
I ~ ‘_ “-1
\K

011 '- “~\‘:\~\ -

\‘.\"".-\ a

‘3‘. -._\
1E5 1- \- '\ -
\\‘-‘\
as!
o J l J J I I l
I 1.5 2 2.5 3 3.5 4 4.5 5 5.:
Bitrate (bps)

Figure 3-2 Different rate-distortion curves

In order to ﬁnd the subset 9310p, , we have to ﬁnd the distortion ADM associated with the

pruning of each macroblock in the ﬂame. One method of doing this is to completely decode
the enhancement layer bitstream to get the reconstructed image. This image is then
compared with the original image and the mean-squared error (MSE) between the two is
determined. This gives us the distortion in the pixel domain. In the CIF format, the size of a

ﬂame is 352 x 288 pixels. Each ﬂame can hence be divided into 396 macroblocks (16x16

35

pixels). If B is the number of bitplanes in the enhancement layer ﬂame, then it contains a
total of 396*B macroblocks. Calculating the distortion associated with each of these
macroblocks by reconstructing the image completely, results in very high computational
complexity. This is neither desirable nor acceptable. Hence, we look for an alternative

method of determining ADM .

One possible alternative is to determine the MSE in the transform domain. Let [X08]
be the transform domain representation of the original image [x(ﬁ)] and [X (16] be the

transform domain representation of the reconstructed image [58(5)]. Here, I? = (k,,k2) and

F = (n,,n2) are two-dimensional vectors in the transform and pixel domains, respectively. It

can be easily shown that the squared error in the transform domain is equal to the squared

error in the pixel domain, that is, the distortion,

1),, = Zip/1AA.) — ,1",.,,,(/?)]2 =Z[x,‘m(ﬁ) —;t,,,,(ﬁ)]2 (3.2)

I
where, D,” is the distortion associated with block I of macroblock m, X 1.». (I? ) and 21",” (I?)
are the DCT coefﬁcients in position I: = (k,,k2) in the same block in the transform domain

representations of the original and reconstructed images respectively. x,'m(ﬁ) and 5.1mm)

are the corresponding intensity values of the pixel at location ﬁ=(n,,n2) in block I of

macroblock m. Therefore; we use the DCT coefﬁcients in the enhancement layer to

determine the distortion.

36

If I is the number of blocks in the macroblock m, the distortion for the entire macroblock

is given by,

Q=qu (m)

Our objective is to determine the set of optimal macroblocks map, that will minimize the
total distortion in going ﬂom the lossless bitrate Ru of the enhancement layer to a target

bitrate R, such that R, S Ru. The macroblocks that we select for pruning have to satisfy the

condition,

2 Rm 2 (Ru —R,) (3.4)

meDJl

Therefore, the optimization RD-based problem addressed in this thesis can be expressed as

follows:

r i

Y

9310,” = arg< mmiin [2 AD") (3.5)

"1er
ngmkmzlkLL_RTl J

 

 

3.2 The Rate Control Algorithm

In this section, we introduce the bitplane rate-distortion based rate-control (BPRD)

algorithm for the FGS enhancement layer.

37

3.2.1 Mathematical Formulation

Let the target bitrate for the FGS enhancement layer be R, . Let R,.,i = 0,1,...B —-1 be the

number of bits required to code the ith bitplane. Here i=0 corresponds to the MSB bitplane
and i=(B-l) corresponds to the plane representing the least-signiﬁcant bit (LSB). In terms of

R, , the target bitrate can be expressed as,

(3.6)

where the target bitrate RT lies between the number of bits required to encode j bitplanes of

the image and the number of bits required to encode j-l bitplanes of the image. The existing
raster-scan rate-control method completely discards the (j+l)st to the (B-l)st bitplanes.

Macroblocks are pruned ﬂom the bitplane level j in order to reach the target bitrate R,. In

the bitplane rate-distortion based rate-control (BPRD) algorithm described in this section,
we therefore resolve to ﬁnd the optimal set of macroblocks that must be pruned ﬂom

bitplane level j. This is given by

931 c: M. (3.7)

OPIU) I

where the global set of macroblocks is considered to be M j , the set of all macroblocks at

bitplane level j.

Given a target bitrate RT , the rate-control algorithm begins at a bitplane level j. The

input to the rate-control algorithm is hence considered to be the video sequence at the bitrate

38

13-]
Z R, . In other words, the input video that the distortion is measured against is considered to
1:)

have only the j most-signiﬁcant bitplane levels if the algorithm is operating on the bitplane

level j.

In the FGS enhancement layer, the DCT coefﬁcients are bitplane coded. In these terms,

the DCT coefﬁcient in block I of macroblock m can be represented as,
_ B“ , —
XL," (k) = Z 2<B*'>-'b,m.(k) (3.8)
i=0

where, b,‘m.,(/E) is the bit in the ith signiﬁcant position of the coefﬁcient X 1.»: (I? ) and B is

the total number of bitplane levels needed to represent the block I of macroblock m.

In the case of the proposed BPRD, there is always only one bitplane in the binary
representation of the DCT coefﬁcient, that is, B =1 always. This means that the index i=0

always. For this reason, we shall henceforth drop the index i. Equation (3.8) thus reduces to,
X,” (76) = b“, (I?) (3.9)

The distortion D,” (mean-squared error) between the block I in the macroblock m in the

transform domains of the original and the reconstructed images is given by,
_ A _ 2 _ A _ 2
21X“, (k ) — X,‘m(k )] = Z[b,.m(k ) —b,,m(k)] (3.10)
It I:

Since there is a distortion already associated with the input image, this distortion can be

considered to be the increase from the original distortion of the input image or ADM. If

j= B , then the input distortion is 0.

39

Using equation (3.3), the total change in distortion due to the pruning of the macroblock

m (ADM ) is given by,

ADM

£2,1er (I?) — It”, (13]2 = :zbﬂui) 42,06]2 (3.11)

Here, bjm (I?) and bl," (I?) can take values of O or 1. This implies that b,” (I?) —b,'m (I?) is also

always 0 or 1. Hence, we can rewrite equation (3.11) as,
AD... = XXIX.,.(I?)—X.,.(IF)]2 = 2204.416]: = 224.41?) (3.12)
l k I k l k

Only the cases in which b,‘m(lc_)=l and 13,.”‘(1—6 )=O contribute to this sum. Hence, the

distortion added by pruning a macroblock m at bitplane level j is equal to the number of ones
in the macroblock at level j that get converted to zeros. This is the measure of distortion for

the BPRD algorithm.

Thus, a simple indication of the distortion is the number of ones in a bitplane
macroblock. Every block in a macroblock comprises of ones and zeros. Each such block is
separately VLC coded based on the number of consecutive zeros (run of zeros) and the
length of these runs. The number of bits required to encode a particular block depends on the
bitplane level and the length and number of runs of zeros. From the VLC tables developed
for the FGS enhancement layer in [21], the number of bits required to code a block that
consists entirely of zeros (or an all-zero block) is less than that required to code a block that
contains one or more runs. Hence, converting a block to the all-zero state reduces the bitrate.

This difference in the bitrate is the change in the rate ARM due to converting (to zeros) the

40

bitplane of block I of macroblock m. Therefore, the total number of bits reduced by pruning

macroblock m:

ARm=ZAR

l.m
l

(3.13)

3.2.2 Description

The BPRD algorithm is bitplane-level independent. It is also ﬂame speciﬁc. It operates

only on the bitplanes in the current ﬂame and not across ﬂames. The steps involved are

enumerated below:

Step 1:

Step 2:

Step 3:

Step 4:

Step 5:

Step 6:

Start algorithm for ﬂame f, bitplane level i such that 0 S i S B —1 ;

Recover the DCT coefﬁcients ﬂom the compressed enhancement layer

bitstream by performing variable-length decoding.

Determine the change in rate (ARM) and change in distortion (ADm) for

every macroblock m as described the previous section. Calculate the rate-

distortion ratio (ADA ) for each macroblock.

Store references to all the macroblocks in an array and sort them in

ascending order of the rate-distortion ratio.

Convert the macroblock referenced at the bottom of the sorted array to all-

zero. Move reference one step at a time up the sorted order.

Check if the target bitrate has been reached by this conversion.

41

Step 7: If the target bitrate has not been reached, go to Step 4.

Step 8: If the target bitrate has been reached, re-encode the modiﬁed DCT

coefﬁcients using variable-length coding.
Step 9: Repeat for all frames f.

Before proceeding to the experimental results, we highlight two important aspects of the

proposed BPRD algorithm

The All-Zero Code: When the enhancement layer is VLC coded based on the MPEG-4
FGS standard; the code for every block depends on the signiﬁcant position of the bitplane
level under consideration. This level is the absolute signiﬁcant position of the bitplane level
in the binary representation of the largest DCT coefﬁcient in the given block. The most-
signiﬁcant-bit (MSB) plane for every individual block is deﬁned as the ﬁrst non-all-zero
bitplane. If an all-zero block is encountered aﬁer the MSB level has been reached for that
block, it is coded using a special symbol called the all-zero symbol. The length of the all-
zero symbol once again depends on the absolute signiﬁcant position of its bitplane level, for

that block.

More importantly, the length for the all-zero symbol used by the MPEG-4 FGS standard
could be very high, and consequently not very efﬁcient to use in our proposed BPRD
algorithm. Consequently, we introduce into the enhancement layer bitstream, a ﬂag to mark
every macroblock in the bitplane under consideration. This ﬂag is set to 1 if the macroblock

has been chosen ﬂom the sorted order for conversion to all-zero. If not, the ﬂag is set to

42

zero. The change in rate measure (ARM) that is used above in our algorithm takes into

consideration this extra (though very minor) overhead.

Blocks versus Macroblocks: Not every 8x8 DCT block has the same number of
bitplanes. Each of the blocks is an all-zero case at bitplane levels higher than its MSB plane.
At the encoder, if a block is all zero above its MSB level, a l—bit ﬂag is used to indicate that
the MSB for that block has not been reached. It is possible that there are a lot of 8x8 DCT
blocks that have fewer bitplanes than the maximum number of bitplanes in a ﬂame [14]. In
other words, the ﬁrst couple of bitplanes of every ﬂame are very likely to have many all-
zero cases. In such a scenario, using 1 bit to signal the all-zero status of every block is not
very efﬁcient. The encoder instead groups the blocks in each macroblock together and uses a

single bit to indicate an all-zero macroblock [14], [21].

Let us assume that our granularity resolution was a block instead of a macroblock. In the
raster-scan bit allocation method, the blocks that lie beyond the target bitrate would be
discarded. In other words, no overhead is added to the enhancement layer bitstream to
indicate to the decoder that these blocks are to be rebuilt as all-zero cases. In the case of
bitplane rate-distortion rate-control (BPRD), we would start ﬂom the bottom of the sorted
array containing references to the blocks in the ascending order of their rate-distortion ratios
and convert the blocks to all-zeros. We are not concerned with the position of the block
within the ﬂame. Our primary interest is to minimize the slope of the rate-distortion curve.
However, it now becomes imperative that we signal the decoder that the block has been
converted to an all-zero block and should hence be rebuilt as one. The relative overhead

incurred in indicating this conversion on a block-by—block basis is high for every bitplane

43

level, but especially so for the ﬁrst two-bitplane levels. This is because in these bitplane
levels, the encoder itself has been optimized so that it adds the minimum number of bits into
the enhancement layer bitstream. In order to restrict the overhead closer to the lowest
possible level, we select a macroblock as the granular resolution of our rate-control

algorithm.

3.3 Experimental Analysis

We begin the experimental analysis of the bitplane rate-distortion based rate-control

algorithm by deﬁning a suitable comprehensive test-bed.
Video Format

The ﬁle format used here is the common—interchange format (CIF). The ﬂame size is

therefore, 352 x 288 and the input source ﬂame rate is 30Hz.
Test Sequences

The test sequences library is described in the MPEG-4 standard [21]. The library is
divided into six classes ﬂom class A through to class F. The sequences are classiﬁed based
on characteristics such as spatial detail, amount of movement, number of bits used for
coding. The test sequences used in this experimental analysis belong to Class A through to
Class C. Class A consists of sequences that have low spatial detail and low amount of
movement (e.g. Akiyo). Class B sequences are characterized by medium spatial detail and

low amount of movement or vice versa (e. g. Foreman). Class C sequences have high spatial

44

detail and medium amount of movement or vice versa (e.g. Mobile Calendar). The example

sequences are the ones that are used here.
Base Layer Rate Control

We consider the following two cases for rate control in the base layer. First, we assume
that there is no rate control algorithm applied to the base layer. We ﬁx the quantization

parameter Q, and run the BPRD algorithm on the enhancement layer bitstream. The values
we use for the test sequences are Qp= 4, 6, 8 for the Akiyo sequence and Q1): 10, 15, 28 for

the Foreman and Mobile sequences. These Q, values are chosen because they approximate

to reasonable base layer sizes of 100 kbits, 200 kbits and 300 kbits respectively. Second, we
consider the sequences with TM-S rate-control applied to the base layer. Again, we restrict

the base layer to the sizes of 100 kbits, 200 kbits and 300 kbits.

We consider the following two cases for rate control in the base layer. First, we assume
that there is no rate control algorithm applied to the base layer. We ﬁx the quantization

parameter Qp and run the BPRD algorithm on the enhancement layer bitstream. The values
we use for the test sequences are Q): 4, 6, 8 for the Akiyo sequence and Q): 10, 15, 28 for
the Foreman and Mobile sequences. These Q, values are chosen because they approximate

to reasonable base layer sizes of 100 kbits, 200 kbits and 300 kbits respectively. Second, we
consider the sequences with TM-5 rate-control applied to the base layer. Again, we restrict

the base layer to the sizes of 100 kbits, 200 kbits and 300 kbits.

45

We present below the results obtained by running the BPRD algorithm on the
enhancement layer sequences obtained for all the different cases derived ﬂom the test-bed
described above. All ﬁgures represent the mean-squared error plotted against the bitrate (in

bits per second). The bitplane level=0 corresponds to the MSB bitplane.

In this section, four performance curves are presented for every case. The curves in the
top-left comer represent the performance at all the bitplanes for both the current raster-scan
and the proposed BPRD-based algorithms. As explained above, these results are shown for a
given ﬂame with the BPRD algorithm applied independently to each bitplane. The results

are organized sequence-wise.

The ﬁrst sequence that we analyze is the Akiyo sequence. This is a Class A sequence i.e.
it has low spatial detail and low amount of motion. Figure 3—3-Figure 3-5 show the results

obtained for the I, P and B ﬂames for Q”: 8 respectively. For the Akiyo sequence, a Q,

value of 8 corresponds to a base layer bitrate of about 100 kbits/s. Figure 3-6-Figure 3-8
show the result of applying the BPRD algorithm on the individual bitplanes of the I, P and B
ﬂames for the Akiyo sequence. The base layer in these cases is coded at 100 kbits/s using

the TM—S rate-control algorithm.

It can be seen ﬂom the top-left curve in Figure 3-3 that the performance curve of the
existing raster-scan rate-control method closely follows that of the BPRD algorithm for

bitrates higher than about 200 kbits/s per ﬂame.

46

 

 

   

- 1&4?

Raster-scanbialoc. _ L
— BPRD '

l

 

Figure 3-3 Akiyo I frame Qp=8

The improvement in performance is more signiﬁcant for bitrates in the range of 50
kbits/s-ZOO kbits/s for each of the ﬂames. At lOHz, this corresponds to a total enhancement
layer bitrate in the range of 500 kbits/s to 2 Mbits/s. This is a desirable and practical

bandwidth range. The rest of the ﬁgure shows the performance in this speciﬁc bitrate range.

47

 

Raster-seahi lee.
— m

 

Rana-sell It in.
no

 

 

Figure 3-4 Akiyo P frame Qp-s

 

 

Raster-scarbialoe. ' Naturalistic.

 

 

 

Figure 3-5 Akiyo B frame Qp-Q

49

 

 

Trmwﬁlgis: ;: t '

 

 

 

 

 

 

 

,4," _ 40..
“ mammﬂAlB‘tplanss

B 'I 2
Bitrale- bps
g .

' 2 .
Bitrate - bps

    
   

t"? ’3...

Raster-scan bit allot. _
— BPRD

  
   

‘. of. x
‘ T’.
._ .. .
\'. 1].. [J
\ -.
\x“-‘ . r
‘\j ti“

Figure 3-6 Akiyo I Frame Base layer=100k

11:11:: ' - ~15;

1,... ' In lml=2 A

Raster-semblance, "1;”
-— BPRD 5. '

 

Rastn—scanbitalloc. . j

— BPRD A ‘"
i
v:

t’?

l

’1.
I

 

 

mss

P Frama. Overall Btplanes

 

 

Raster-scan hr allot
— BPRD

 

 

 

 

 

 

0.5 1’ l l 5 2 2.5 3 3.5 4

P Frame Overall Bilplanes

Raster-scan bit allot
— BPRD

0.2 or as 118 1 1,2 1.4 1.5 1.8 “2
Bitrate - bps x105

Bmaia - bps r 106 .

 

'23 ~

22

 

 

ludil

V _

Raster-scan bit allot.
—— BPRD

 

 

Figure 3-7 Akiyo P Frame Base layer=100k

Based on the above ﬁgure, the BPRD based algorithm provides its highest gain in

performance at bandwidth ranges that can be available over current and evolving access

networks (e.g., Internet access over Local-Area-Networks, cable modems, DSL, etc.).

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

K I V . 1" o‘ t
8 Frame Overall Btplanes ‘ lml=l
1 r. 4 »P .I
.‘ Raster-scan bﬂ allot '; “SW56?" 5“ alloc
— BPRD 29' 1“ —— BPRD
26» . 2.85 i ‘ ..
r 2.3
7* 2.75
:15 3 21*
i
E E2116- 5
Ii 25*
‘* 255’
0‘5 2.5r
., 2.45
0.5 1 1.5 2 2.5 a 3.5-: 4 ‘ 1 . 1.5‘ ’ 2 ., 25 ’ 3 '
Duals-bps “9’ .m’m" . 3‘ , {8102‘
Bane OvarallBtplanes lml=2 V.
a» 24-3 "
1 — epso “; _ BPRD
2.3 ‘_
25> 2.21
‘ .ﬂ
21*
2 ... \
s “a 2+ ‘
E E
1.9
1.5~ "
18
1 l]
4 ‘5' EEK;
0.2 04 [1.5 08 1 1.2 14 16 1.8 2 15 4 4.5 5 5.5 5 5.5 7 7.5 B 8.5
Bitrate - bps [10‘ B‘mlo- bps x 10‘

 

Figure 3-8 Akiyo B Frame Base layer=100k

From the ﬁgures shown in the following pages, it can be seen that this observation is
applicable for all the presented cases for the Akiyo sequence. Hence, all the ﬁgures show
speciﬁc results for the bandwidth range of 500 kbits/s to 2 Mbits/s. In all cases, this bitrate

range corresponds to bitplane levels 1 and 2 for the Akiyo sequence.

52

 

 

Ralerscaaoi doc.
-— BPRD

 

Figure 3-9 Foreman I frame for Qp-28

The second test-sequence used in this experimental analysis of the BPRD is the Foreman
sequence. The Foreman is a Class B sequence. It has medium spatial detail and low amount

of motion.

53

 

 

 

 

 

Figure 3-10 Foreman P frame for 01248

 

 

ya41."‘“’“".s4"‘

pf :11 '.....“.._............. ........... . '
_ -Raster- 4m bi aloe
.~ —BPRD

      

 

:1" \‘KFT'. )4... 9'). ~- 'e‘nr

y... ”:1.“ .1 kw awuﬂﬁ‘

Figure 3-11 Foreman B frame for Qp-28

Figure 3-9 to Figure 3-11 show the results obtained by applying the BPRD algorithm to

the individual bitplane levels coded at a constant Q, value of 28 for the I, P and B ﬂames

respectively.

55

Figure 3-12-Figure 3-14 show the results for the I, P and B ﬂames with a base layer
bitrate of 100 kbits/s. In these cases, the base layer is coded to the speciﬁc bitrate using the

TM-S rate-control algorithm.

 

 

 

Raster-scan bl alloc. - Roster-mold doc.
— BPRD . — BPRD

 

 

 

Raster-sen bl doc.
— BPRD

Figure 3-12 Foreman 1 frame Base layer=ka

56

 

 

Hutu-m ﬂ ioc.
— m

 

 

Figure 3-13 Foreman P frame Base layer-100k

From the top-left curve in all these ﬁgures, we can see that for bitrates higher than about
200 kbits/s per ﬂame, the BPRD is closely approximated in performance by the raster-scan

rate-control method. That is, for a 10Hz ﬂame rate, the improvement in performance is more

57

 

 

      
  

see c".- *3 ..- . -1»
an 3.2-: .~. 4». .41..
p 1:. .......................................... mRasler-uuibialloc. ,1". ‘ J" ._-‘ ""RiﬂI-muu.
. :f 13 — BPRD —amo

.-r at? "a

 

   
  
 

‘1 ‘ 9.." “at"? .2 W ,_ ._ .2“.me 51:4“, - .
‘ I“ ‘ I ‘ #%,ﬁe,ﬂw ﬂ ffh‘sr.‘~; 9'" ’ ”Eryn“;
. ., 19:1 .. a; “in AW‘ ”’75 ‘i

Figure 3-14 Foreman B frame Base layer=100k

apparent if the entire enhancement layer lies within the bandwidth range of 200 kbits/s to
2 Mbits/s. The other three curves in each ﬁgure hence, show the performance for these
bitrates. It can be seen that the bitplanes that correspond to the speciﬁed enhancement layer

bitrates vary between bitplane levels 0, I, 2 and 3 for the Foreman sequence.

58

 

 

 

 

 

Figure 3—15 Mobile I frame for Qp=28

59

 

 

nevi. 19‘] H2 '

Raster-scan bl allot.
— BPRD

 

 

Figure 3-16 Mobile P Frame Qp=28

The last sequence that we analyze for the performance of the BPRD rate-control
algorithm is the Mobile Calendar sequence. The Mobile sequence is a Class C sequence. It

has high spatial detail and low amount of motion.

 

 

 

 

 

Figure 3—17 Mobile B Frame Or”

61

 

 

n» J,

,1“! . . uh“ v-,,.,,, ._~-.

Racial-nu H let.
—— BPRD

    

 

   

‘1" ~7- r. {in t- ' .
‘ 1,.) “N; “3.2)" “”4 1... __ . -.1-.
saw . -«. raga”... .5312 -

i L __-Mm~§“‘eum 139:5" I

Figure 3-18 Mobile I frame Base Layer-100k

Figure 3-15-Figure 3-17 represent the results obtained for the I, P and B ﬂames of the

Mobile sequence coded using a constant Q, = 28. This value of the quantization parameter

Qp corresponds approximately to a base layer coded at bitrate of 200 kbits/s.

Figure 3-18-Figure 3-20 show the performance of the BPRD algorithm applied to the I, P

and B frames of the sequence coded using the TM-S rate-control algorithm to have a base

layer of 100 kbits/s.

 

     
 

 

 

a;
'34‘:
M; a -
'. 1' v.1 ‘~!‘W"'w 1.397: "ﬁgy‘.4:9§-'; u ”j, «m .... ‘1
M35. '5‘. " -_ Va. » _ f -, 3‘ "“"i '- .' ,.
. In“; ~3f§ :. “(3.3?" Jygwﬁrl ;‘“.”‘_.~v . L)... ,‘A “u v '1‘

Figure 3-19 Mobile P Frame Base Layer=100k

63

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

BFrame Overall Baplanes ‘ INFO <
4.45“.
“ . Raslsr-scan butane: _ Raster-scan bnahoc :
._ 4115 -
35- ‘
4.44 -
3,
4.6 -
25»
3 ~ 3
E E U3
2r
. l
15- 1.425
‘r . ‘ﬁ?’ ..
0.5 ‘--\ 4.415-
\‘ .
l l J %L | l '
I 2 4 5 5 7 . IN .3“, m . an «my am am
agitate-bps “05‘ "t . again-fly!
B F tame: Omall Biplane: Imbii
- ' - u- ”
y‘ Raslenscan bnalloc ‘ Raster-scan bﬂalloc
‘2' V — BPRD . _ BPRD
4 ,. L3-
3.8-
35 “2;
0 3.1 o
E in»
3.2
3 " 5
2.8 ~ \
2.5. ‘ ~ "3.9
2". l l m_ l ‘1‘ i l l I n n a; \
0,5 1 13 ‘ 2 ' 2.5 (.5 1 1.5 T 25 3 3:5 4 45
Bilﬂl. ’ bps ‘ 10° Biiiil- bp‘ .10.

 

Figure 3-20 Mobile B frame Base layer=100k

The top-leﬁ curve indicates in the case of the Mobile sequence that the performance of
the BPRD algorithm improves in the bandwidth range of about 100 kbits/s to 200 kbits/s per
frame. This is equivalent to a total enhancement layer bandwidth of i Mbits/s to 2 Mbits/s

for a frame rate of 10Hz. The bitplane levels that correspond to this bitrate range are levels 1

64

and 2 for the Mobile sequence. Hence, the performance for these levels is shown in detail in

these figures.

3.4 Conclusions

The performance of the bitplane rate-distortion based rate-control (BPRD) algorithm has
been presented thus far. We proceed to state certain conclusions on the performance

characteristics of the BPRD algorithm based on the analysis of the presented results.

In general, it can be seen that the performance of the BPRD algorithm is consistent with
the theoretical expectation in that it generated convex RD curves. Application of the BPRD
to an individual bitplane level produces a more optimal rate-distortion curve than the one

obtained by applying the existing raster-scan rate-control method.

The improvement in the mean-squared error is different for different bitplane levels. It is
observed that this decrease in the error is more notable within a bandwidth range of
approximately 50 kbits to 200 kbits per frame. Given that the encoded sequence typically
has a frame rate of 10Hz, this corresponds to a total enhancement layer bitrate of 500 kbits/s
to 2 Mbits/s. The performance improvement is noteworthy in the bitplanes that correspond

to this target bitrate range.

The magnitude of the improvement in the mean-squared error is also differs from
sequence to sequence. In the same target bitrate of 500 kbits/s to 2 Mbits/s, the BPRD
algorithm gives a more signiﬁcant lowering of the MSE in the Foreman sequence (A Class B

sequence). The next best performance is for the Akiyo sequence (A Class A sequence). The

65

BPRD performs with only a fair amount of improvement for the Mobile sequence (A Class
C sequence). This indicates that the performance of the algorithm depends to a large extent

on the characteristics of the video sequence being encoded and transmitted.

The ﬁgures showing the performance of the BPRD algorithm on each individual
bitplane, plotted together indicates that if the target bitrate should occur in the region of
transition from one bitplane level to the next, the rate-distortion curve is locally non-optimal.
Hence, in the next chapter, we propose an extension to the BPRD algorithm in order to ﬁnd

a rate—distortion performance curve that is optimal for all target bitrates.

66

4 Cross-Bitplane Rate-Distortion Based
Rate-Control Algorithm

In this chapter, we extend the BPRD algorithm in an effort to optimize the rate-distortion

curve at the region of transition from one bitplane to the next.

Our objective in the BPRD was to find the optimal subset 9)? from the set of all

oth')
macroblocks M j at bitplane level j in the transform (DCT) domain of a given frame. The
cross-bitplane rate-distortion based rate-control (CBPRD) algorithm differs from the BPRD

algorithm as follows. In this case, we search for the optimum subset map, within the total

set M . Here, M is the set of all bitplane macroblocks in the DCT domain of the frame under

consideration. Consequently, the optimum subset an could include bitplane macroblocks

opt

that belong to different bitplane levels Figure 4-1.

Hence, under the CBPRD algorithm, we determine the optimal subset that minimizes the
increase in distortion as measured across all possible macroblocks belonging to all possible

bitplanes subject to a bitrate constraint:

(4.1)

‘3
m
g
\___/
v

map, = arg< mwitn [ Z ADM

ngmkmfiku'kr)
i

 

 

67

Frame F

 

 

L: n “:25 f 55:71: I”: i
‘ : "‘ ‘ "1 ' .1; ":

 

ﬂ, pruned macroblocks

  
  

Franc P

 

 

f r!
prunedmncroblockl

(b)

Figure 4-1 Pruning of macroblocks- width of the macroblocks represents the number of bits needed to
encode that macroblock (a) BPRD — map, is selected from bitplane level=j (b) CBPRD- map, is selected
from macroblocks on different bitplane levels

68

where ml. is bitplane i of macroblock m. In other words, ml. is an index to a bitplane
macroblock. Therefore, ARM and ADM. are the changes in rate and distortion that are

associated with pruning bitplane macroblock "7,-- Finally, 9110p, c M , RT is the target bitrate

for coding the frame under consideration, and Ru is the lossless bitrate of that frame.

4.1 Developing The Algorithm

Based on the above formulation of the CBPRD optimization problem, we need to

identify an efﬁcient method for measuring the changes in rate and distortion measures for

every bitplane macroblock mi. As stated above, we desire to restrict the bitrate to a target

bitrate RT. The change in rate ARM is basically identical to the change-in-rate measure that

we have used for the simpler BPRD algorithm. Now we focus our attention on the distortion

measure ADM .
l

The DCT coefﬁcient in block I of macroblock m can be represented as,

X.,..(I'€)=Ez‘B“"‘b...,.(F) (4.2)

i=0

where, b, (I?) is the bit in the ith signiﬁcant position of the coefﬁcient X MU? ) and B is

,m.i

the total number of bitplane levels needed to represent the block I of macroblock m.

The distortion associated with a block I of macroblock m is ADM and the total

distortion of macroblock m is AD," . If the macroblock is considered at a bitplane level j such

69

that Os j 58—] and B is the total number of bitplanes in that frame, the block and

macroblock distortions are represented by AD, ,,,, and ADM. respectively. These are denoted
. , 1

as follows,

2

ADM,=;[X,..j(i€)—X,,.j(i€)] kz[2‘”""j(b,..ju?)—13,,,.j(1?))]2 (4.3)

and AD," =2AD (4.4)

Therefore,
[3ij = ;Z[X (I?) — X...,.(E>]2 = ZZ[2““"" (14.1.0?) 43...]. (13)] (4.5)

Here again, b,m.(l?) and 51m.(i‘—) can take values of O or 1. This implies that
' .l ' J

bun}. (I?) ‘51.».j (I?) is also always 0 or 1. Therefore,

2 2
_ _ _ “ _ = (B-l)—i —
Aij —;;[X,'mj(k) X,‘mj(k)] 2,2;[2 b,.mj(k)] (4.6)
Since ADM. is the accumulated distortion over certain speciﬁc bitplane levels and the
J

distortion at every bitplane level is really the squared error, we shall refer to ADM. as the
J

accumulated squared error. In the CBPRD, we shall use this accumulated squared error as

the measure of distortion.
In equation (4.6), only the cases where b,'m,(/7)=l and bA,'m_(l:)=0 contribute
J I

towards the total ADM. Hence, operationally, the computation of ADmcan be achieved by
J J

70

counting the number of ones in bitplane macroblock m j, weighted by 2‘34"". The

(B-l)‘j

weighting by 2 indicates that the ones in the MSB bitplane (level j=0) are given the
highest weight and those in the LSB bitplane (level j=B—1) are weighted the lowest.

Consequently, if N,(mj.)is the number of ones in bitplane macroblock m j. , then ADW can
I

be expressed as follows:
2
_ (3—1)-~ — _ (B—I)—'
ADM} -2212 ’b,.mj(k):l -4 IN,(m,) (4.7)

In the VLC tables developed for the FGS enhancement layer in [21], the number of bits
required to code a block that consists entirely of zeros (or an all-zero block) is less than that

required to code a block that contains one or more runs. Hence, converting a block to the all-

zero state reduces the bitrate. This difference in the bitrate is the change in the rate AR, m_ due
' J

to converting (to zeros) the bitplane of block I of macroblock m at a given bitplane level=j.

Therefore, the total number of bits reduced by pruning bitplane macroblock m I. :

Aij = :13ij (4.8)

4.1.1 Description

Having deﬁned the measures of rate and distortion, we proceed to outline the steps
involved in the implementation of the CBPRD algorithm. Once again, it must be
remembered that this algorithm operates on a frame-by-frame basis in an independent

manner, and therefore does not extend across multiple frames.

7/

Step 1:

Step 2:

Step 3:

Step 4:

Step 5:

Step 6:

Step 7:
Step 8:

Step 9:

Start algorithm for frame f.

Start from the bitplane with the lowest signiﬁcance level, i = B -1.

Determine the change in rate (ARM) and change in distortion (ADm )
a J

for every bitplane macroblock m j as described earlier in this section.

m.
I

AD
Calculate the rate-distortion ratio [ MR ] for each macroblock.

Repeat for all bitplane levels i until i = 0.

Store references to all the macroblocks in an array and sort them in

ascending order of the rate-distortion ratio.

Convert the macroblock referenced at the bottom of the sorted array to

all-zero. Move reference one step at a time up the sorted order.
Check if the target bitrate has been reached by this conversion.
If the target bitrate has not been reached, go to Step 6.

If the target bitrate has been reached, re-encode the modiﬁed DCT

coefﬁcients using variable-length coding.

Step 10: Repeat for all frames f

A single ﬂag indicates the all-zero conversion in the CBPRD. The choice of

macroblocks is made to reduce the overhead in recreating the compressed bitstream. We

now proceed to look at the experimental analysis of the CBPRD.

72

4.2 Experimental Results

For the experimental analysis of the cross-bitplane rate-distortion based rate-control
(CBPRD) algorithm, we deﬁne the same test bed as was done for the BPRD. The video
format used is the CIF format. The test sequences used are the Akiyo (Class A sequence),

Foreman (Class B sequence) and Mobile (Class C) sequence.

The performance of the CBPRD algorithm on these test-sequences is analyzed for the
cases when the sequences are coded with and without application of rate-control to the base
layer bitstream. When rate-control is not applied, the base layer is encoded using a constant
quantization parameter Qp. The Qp values used are 4, 6 and 8 for the Akiyo sequence and
10, 15 and 28 for the Foreman and Mobile sequences. The other case considered is
application of the TM-S rate control algorithm to the base layer. In this case, the three
sequences were encoded to constrain the base layer bitrate to 100 kbits/s, 200 kbits/s and

300 kbits/s.

The results are presented below in order of the test-sequences. As before, we start with
the results of applying the CBPRD algorithm to the I, P and B frames of the Akiyo sequence
followed by the results for the I, P and B frames of each of the Foreman and Mobile
sequences. For all these cases, the ﬁgure showing the results consists of three performance

CUI'VCS.

73

 

 

   
 

1 Frame

 

 

 

 

 

 

 

 

 

 

IFrama _? A?
55 raster-scan bit allot, \,\‘\ , Egggscan bn allot I 1
— CBPRD \ — 3.1
so, -- BPRDusmgASE 9* ‘\ 1\ ~ —- BPRD-usmgASE *4
' Mi JG" “ \ a: ‘
:3 '1 1 A.“ r
- '3 n 1:: 22>- g;
.115 i: l ‘41.
‘3 - ' 4‘ g
I 5.215 g
1: ~ ,,
£220 1', £15-
3 - v ,e .
‘l E \ g.
3‘15 . \‘ p 2 1s _
10" . i '5
\Q, 1
"' 5 \‘V‘Ka. .94 1"
1.». A 1 HM ' l"
’ ‘-ﬁs.w_1‘r 157mg”: :ﬁ'fg 3.5, .41?” X;
"$.11”. 4' £7 _. .
PFrame :-
-- raster-scan bialloc sf _
36' ’ PRD ' .70
— - BPRDusmg ASE ‘
. 60 .- '.
, L if
. - 5” a 93 i rag-”‘91
. E "'1 i i {:4
. E50 1 E10": #
. g '2. u a. l r ..
'lO-‘f a . i ' M
“3 ' V". 3 3). , ‘ ..
'5 ii - 1,... 1.. i' a.
3 I] . \; 3 \‘1 1
E ‘1. § ’ \1 :3
u \. ‘3 20 ' " 1.5
ﬁrs- ~‘\ 2 ‘1. .4’.’
10- it. ' ‘°' \1» 1" '7
-,.\‘\-‘.‘-- \JF-VE ‘ b
115 1.5 2 25 3 3.5 0.5 1 15 . 2 25 3 35 44- -.
Btralo- bps x105 Birau- by “L“ 10‘

 

Figure 4-2 Akiyo I, P and B frames for Qp=8

74

 

 

 

 

 

 

 

 

Figure 4-3 Akiyo I, P and B frames for Base layer=100k

The ﬁrst curve is the performance of the existing raster-scan rate-control method. The

second curve indicates the performance of the CBPRD algorithm. The third curve that has

75

 

been plotted gives an insight into the performance of the BPRD algorithm, if the distortion

measure of (AD

W) was used for it. The second part of each ﬁgure shows in detail one
I

region of transition from one bitplane to another.

 

I Frame

 

 

 

 

 

 

 

 

 

[Frame
55 raster-scan 111 alloc raster-scan 1111 alloc
W BPRD 311 » BPRD
50 _ — - BPPUusmg ASE — - BPRDusIng ASE
3 . g -. ‘ \ ‘
E .; E] D @- \ .
3 1 .1! “’1 ‘ ~-.
.2 '~ g 1 .
3 1 g \
g ‘ ‘ 9 17 )- \~.
P, 2.5 v1 . 2 \' ~_ 1r-
5 1. '5 K ~.
3 Z] r. 5 . ..
E \ ,5 1s \ .
° 15 ‘5 ° \
2 1, 2 .\
111- \ g. 15- \ ‘\
\ . 1
5 \. . \
ages“ . 1 \
1 1 1 a.“ ’ A 1 1‘ L 1 1 1 1 1 1 1 4L '4 2x1
[15 l 15 2 2.5 3 ‘35 4 2.8 2.9 3 3.1 3.21 33 -. 3.4 3.5 3.8 3.7 3.8
Barala - bps x 105 Birate ~ bps I 10‘
P Frame 8 Frame
raster-scan brt alloc 7U _ raster-scan brt allot
m , BPRD .- BPRD
— - BPPDusmg ASE — - BPPDusmg ASE
60
.m! .mi
g 1 E l.
o . ' ll.
‘5 m V -
r e aw 1,
3 3 )1
3 10 1 3 -
u 9 1'.
1; 1, £311 .\,
gar» 1‘:- g it
D \ o m 1 "
3 20» '\ 3 \p
1. ‘ .
10 R m \i‘.
‘ “ .1 w \ ““Pkﬁ.
+ % L ‘~x —— 1 1 1 1 1 1 V..
0.5 1 1.5 2 2.5 3 3.5 4

Bunte- bps x 106

 

0.5 1 1.5 2 2.5 3 35 4

 

Hints - bps r 10‘

 

 

Figure 4-4 Foreman I, P and B frames for Qp=28

76

 

u-scan b‘I aloe.
— CBPRD
-- BPRDusingASE

 

 

Figure 4-5Mobile I, P and B frames for Qp=28

77

It can be seen from the ﬁgures, that the CBPRD algorithm performs more optimally in
these regions than the BPRD would, if the same measure of distortion were to be used for

both the algorithms.

The Akiyo sequence is a Class A sequence. Figure 4-2 shows the results of applying the
CBPRD algorithm to the Akiyo I, P and B framesz. Figure 4-3 depicts the performance of
the CBPRD algorithm for the sequence coded at a base layer of 100 kbits using the TM-S

rate-control algorithm.

The Foreman sequence is Class B sequence. It has medium spatial detail and low amount
of motion. Figure 4-4 shows the results of running the CBPRD algorithm on the I, P and B
frames for the sequence coded using a constant Qp value of 28 in the base layer. This

corresponds to a base layer bitrate of approximately 100 kbits/s.

The Mobile sequence is a Class C sequence. It has high spatial detail and low amount of

motion. Figure 4-5 shows the results of applying the CBPRD algorithm to the enhancement

 

2 The square error numbers shown in these plots are based on averaging the accumulated square error over

the size of the picture under consideration. In this case, the picture size is 352x288 pixels.

78

The Mobile sequence is a Class C sequence. It has high spatial detail and low amount of

motion. Figure 4-5 shows the results of applying the CBPRD algorithm to the enhancement

 

.. hm. 'H { -&-" ,fr’j ”arse-"F ff . . '

raster-scuba aloe. \ tutu-ambush;
— CBPRD j ‘ -, — cam
—. BPRDuangASE _ 1. —. 39110me

    

      

a

         
   

 
 

raster-scan bl aloe.
0M0

—- BPRDuaingASE

  

 

 

Figure 4-6 Mobile I, P and B frames with Base layer-100k

4.3 Conclusions

The performance of the cross-bitplane rate-distortion rate-control (CBPRD) algorithm

has been presented. The results have been tested on three different classes of test-sequences.

In general, the CBPRD algorithm also behaves according to theoretical expectations and
produces a convex rate-distortion curve. The improvement in the performance, in terms of
the accumulated squared error is more pronounced in the bitrate range of 20 kbits to 100
kbits per ﬂame. Considering sequences coded to a ﬂame rate of 10Hz, this corresponds to a
total enhancement layer bandwidth range of 200 kbits/s to l Mbits/s. The CBPRD, like the
BPRD presented earlier, produces its highest gain in performance at bandwidth ranges that
can be available over current and evolving access networks (e. g., Internet access over Local-

Area-Networks, cable modems, DSL etc.).

Again, as in the case of the BPRD, the performance of the CBPRD is found to be

dependent on the sequence characteristics. It was also seen that in most cases, the

performance of the BPRD using the distortion measure ADM performs just as well as the
J

CBPRD itself. This indicates that in view of the simpler implementation requirements posed

by the BPRD, calculating the distortion measure AD,” and using this measure in the
J

implementation of the BPRD algorithm, the computational overhead incurred in using the

CBPRD can be reduced.

Consequently, for scalable video rate-control systems that desire to achieve optimum

rate-distortion performance while maintaining the lowest possible computational

80

complexity, the BPRD algorithm provides a good option. As shown above, the BPRD

algorithm virtually performs as well as the more optimum and more complex CBPRD.

81

5 Summary Of Conclusions

In this thesis, we have developed and analyzed the performance of two rate-distortion
based algorithms for scalable video rate control. These algorithms are tailored for the
recently developed F ine-Granularity—Scalable (FGS) video coding methods such as the one

adopted by the MPEG-4 video standard.

The ﬁrst rate-distortion based rate-control algorithm operates on a bitplane-by-bitplane
basis within an FGS compressed bitstream. We refer to this algorithm as the bitplane RD
algorithm (BPRD). A more general RD algorithm that operates across all possible bitplanes
of an FGS coded stream is subsequently presented. We refer to the second algorithm as the

cross-bitplane RD (CBPRD) algorithm.

The distortion measures used for these two algorithms are different. The distortion
measures selected in both cases are based on the popular square-error criterion. One of our
primary objectives has been the ease of computation of the distortion measure. We have
shown that a simple count of ones found in the bitplanes of the lossless (original) video in

the transform domain corresponds directly to a square-error measure.

We have shown that the RD algorithms developed using the speciﬁed distortion
measures gave a better rate-distortion performance than the existing raster-scan rate-control

method that is currently implemented in the MPEG-4 FGS veriﬁcation model. It was also

82

observed that the region of greatest improvement corresponds to a bandwidth range that can

be available over current and evolving Internet access networks such as LAN 5 and DSL.

The calculation of the rate and distortion can be done off-line and the rate-distortion ratio
for every macroblock can be stored prior to transmission. This implies that implementing
these RD based algorithms in real-time calls for very little computational overhead during

the real-time streaming process of the desired video.

5.1 Future Work

One extension of the work done in this thesis is to incorporate the algorithm into existing
decoder systems for the MPEG-4 FGS standard coded video. This will verify that the
theoretical improvements observed in the transform (DCT) domain presented in this thesis

can also be observed in the pixel domain.

In the case of image and video systems, determining the best measure of distortion is a
challenge. The distortion measures used in this thesis were designed to reduce the
computational complexity, while remaining as close as possible to the actual mean-squared
error in the transform domain. Thus, another area of extending the work done in this thesis is
to explore the possibility of ﬁnding better measures of distortion that can correlate better
with subjective improvements as perceived by the human visual system. This could include
incorporating different weights for the distortion associated with the different DCT

coefﬁcients in the transform domain.

83

Finally, we are currently investigating the correlation among the three key measures used
in the development of the proposed algorithms: change-in-rate AR , change-in-distortion
AD , and their ratio AD/ AR. This correlation study could lead to an efﬁcient and optimum
estimate of the RD ratio (AD/AR) based on measuring only one of the two parameters
(AD, AR). This approach could be useful for systems that can easily measure (or have

access to) one of the parameters but not the other.

84

[2]

[6]

[7]

[8]

[9]

References

A.K. Jain, Fundamentals of Digital Image Processing, Prentice Hall Information and

System Sciences Series, pp. 476-560, 1989.

AK. Jain, “Image data compression: A review,” Proc. IEEE, vol. 69, pp. 349-384,

1981.

A. Ortega and K. Ramachandran, “Rate distortion methods for image and video

compression,” IEEE Signal Proc. Magazine, pp. 23- 50, Nov. 1998.

A. Puri and R. Aravind, “Motion-compensated video coding with adaptive perceptual

quantization,” IEEE Trans. Circuits Syst. Video T ech., vol. 1, pp. 351-361, 1991.

C. E. Shannon, “A mathematical theory of communication,” Bell Syst. Tech. Journal,

vol.27, pp.379-423, 1948.

C. E. Shannon, “Coding theorems for a discrete source with a ﬁdelity criterion,” in

IRE National Convention Record, Part 4, pp. 142-163, 1959.

C. Gonzales and E. Viscito, “Flexibly scalable digital video coding,” Signal

Processing: Image Communication, vol. 5, Feb. 1993.

CCITT Recommendation H.120, “Codecs for videoconferencing using primary

digital group transmission,” Geneva, 1989.

CCITT Recommendation H.261, “Video codec for audiovisual services at p x 64

kbit/s,” Geneva, 1990.

85

[10]

[11]

[12]

[14]

[15]

[16]

[17]

[18]

CCITT SG XV, “Draﬁ revision of recommendation H.261 ,” Document 572, Mar.

1990.

D. J. Le Gall, “The MPEG video compression algorithm,” Signal Processing: Image

Communication, vol. 4, pp. 129-140, 1992.

D. Lee, “New work item proposal: J PEG 2000 image coding system,” ISO/IEC

JTCl/SC29/WG1 N390, 1996.

E. Viscito and C. Gonzales, “A video compression algorithm with adaptive bit

allocation and quantization,” Proc. SPIE, 1605, pp. 58—72, 1991.

F.Ling, W.Li, and H.Sun, “Bitplane coding of DCT coefﬁcients for image and video
compression,” in Proc. SPIE Visual Communications and Image Processing (V CIP),

San Jose, CA, Jan.25-27, 1999.

G. K. Wallace, “The JPEG still picture compression standard,” IEEE Trans.

Consumer Electronics, vol. 38, pp. 18-34, 1992.

H. G. Musmann, M. Hotter and J. Ostennann, “Obj ect-oriented analysis-synthesis
coding of moving images,” in Signal Processing: Image Communication, vol. 1, pp.

117-138,1989.

H. Radha and Y. Chen, “F ine-granular-scalable video for packet networks” in Packet

Video '99, New York, Apr. 1999.

H. Radha, M. van der Schaar, Y. Chen, “The MPEG-4 ﬁne-grained scalable video
coding method for multimedia streaming over IP,” IEEE Trans. Multimedia, vol. 3,

Mar. 2001.

86

[19]

[20]

[21]

[22]

[24]

[25]

[26]

[27]

[28]

ISO/IEC 11172-2, “Information technology- coding of moving pictures and
associated audio for digital storage media at up to about 1. Mbits/s— video,” Geneva,

1993.

ISO/IEC 14496-2, “Information technology- coding of audio-visual objects: visual,”

Committee Draft, ISO/IEC JTCl/SC29/WG11, N2202, Mar. 1998.

ISO/IEC J TC/SC 29/WG ll N3515, “MPEG-4 video veriﬁcation model version

17.0”.

ISO/IEC JTCl/SC29/WG11, “MPEG-4 version 2 visual working draft rev 3.0,”

N2212, Mar. 1998.

J. De Lameillieure and R. Schafer, “MPEG2 image coding for digital TV,” F ernseh

und Kino T echnik, 48. Jahrgang, pp. 99-107, Mar. 1994 (in German).
K. Sayood, Introduction to Data Compression, Morgan Kaufrnann, 1996.

L. Stenger, “Digital coding of television signals—~CCIR activities for

standardization,” Signal Processing: Image Communication, vol.1, pp. 29-43, 1989.

M. Hauben, “Behind the net- the untold history of the ARPANET,”

http://www.dci.isepipp.pt/docs.l"'ar‘pa.html

 

M. Nelson, The Data Compression Book, M&T Books, 1995.

MPEG-4 FAQ, ISO/IEC JTCl/SC29/WG11, July 1997.

http:h’drogo.cscltstct.it/mpcg/fagl aq_mpcg-4.htm

 

87

[29]

[30]

[31]

[34]

[35]

[36]

[37]

N. J ayant, J. Johnston, and R. Saﬂanek, “Signal compression based on models of

human perception,” Proc. IEEE, Oct. 1993.

N. Wakamiya, M. Miyabayashi, M. Murata and H. Miyahara, “MPEG-4 video
transfer with TCP-friendly rate control," Proc. 4"' IEEE Intl. Conf on MMNS, 2216,

pp. 29-42, 2001.

R. C. Nicol and N. Mukawa, “Motion video coding in CCITT SG XV — the coded
picture format,” in Conference Record of IEEE Global Telecommunications

Conference and Exhibition—Communications for the Information Age, Hollywood,

vol. 2, pp. 992-996, Dec. 1988.
R. J. Clark, Transform Coding of Images, Academic Press, Orlando, Florida, 1985.

R. Schifer and T. Sikora, “Digital video coding standards and their role in video

communications,” in Proc. IEEE, pp. 907-924, vol. 83, Jun. 1995.

Scalability for stereo video coding,

http://trkjist.ac.kr/~3Dz"Research/Stcrco/scalc_vitico.html

 

T. Berger and L. D. Davidson, “Advances in source coding,” in CISM- Courses and

Lectures, no. 166, Udine 1975.

T. Ebrahimi, “MPEG-4 video veriﬁcation model 8.0,” ISO/IEC JTC 1/SC29/WG11

MPEG-97 Doc No. W1796, July 1997.

T. Sikora and L. Chiariglione, “The MPEG-4 video standard and its potential for

future multimedia applications,” Proc. IEEE ISCAS Conf, Hong Kong, June 1997.

88

[38]

[39]

[40]

[41]

[42]

[43]

[44]

[45]

T. Sikora, “ The MPEG-4 video standard veriﬁcation model,” IEEE Trans. CS V T,

vol. 7, Feb. 1997.
“Test model 5,” ISO/IEC JTCl/SC29/WG11/No. 400, MPEG93/457, Apr. 1993.

Testing Ad Hoc Group, “JPEGZOOO testing results.” ISO/IEC

JTClJ/SC29/WG1/N705, Nov. 1997.

V. Paxson, “End-to-end intemet packet dynamics,” in Proc. ACM SIGCOMM, vol.

27, pp.13-52, Oct. 1997.

W. Li, “Overview of ﬁne granularity scalability in MPEG-4 video standard,” IEEE

Trans. CSVT, vol. 11, Mar. 2001.

W. Pennebaker and J. Mitchell, JPEG Still Image Data Compression Standard, Van

Nostrand Reinhold, 1994.

W.H. Chen and WK. Pratt, “Scene adaptive coder,” IEEE Trans. Communications,

pp. 225-232, 1984.

W.Li, F.Ling and H.Sun, “BitPlane coding of DCT coefﬁcients,” ISO/IEC

JTCl/SC29/WG11, MPEG97/M269l. Oct. 1997.

89