OCTREE-BASED ANIMATED GEOMETRY COMPRESSION

By

Jinghua Zhang

A DISSERTATION

Submitted to
Michigan State University
in partial fulfillment of the requirements
for the degree of

DOCTOR OF PHILOSOPHY

Department of Computer Science and Engineering

2005

ABSTRACT

OCTREE-BASED ANIMATED GEOMETRY COMPRESSION

By

Jinghua Zhang

Geometry compression is the compression of the 3D geometric data that provides a computer graphics system with the scene description necessary to render images. Geometric data is quite large and, therefore, needs effective compression methods to decrease the transmission and storage requirements. A large amount of research has focused on static geometry compression, but only limited research has addressed animated geometry compression, the compression of temporal sequences of geometric data.

The goal of this thesis is to represent 3D animation sequences with a reduced set of motion vectors that take advantage of the large data coherence in space and time. An octree-based motion representation method is proposed. In this approach, a small set of motion vectors is generated for each frame by accessing two consecutive frames at a time. These motion vectors represent the differential motion from the previous frame to the current frame, and they are used to predict the vertex positions for each frame. The octree motion representation process generates a hierarchical octree structure for each frame in the sequence. Based on this approach, a hybrid coding method is proposed that combines the octree-based approach and the delta coding method. Given the same threshold requirement, the hybrid approach performs better than the octree-only approach in terms of compression ratio. Both the octree and hybrid approaches can represent 3D animated sequences with high compression factors while maintaining reasonable quality. These two approaches are easy to implement and have a low-cost encoding process and a fast decoding process, which makes them well suited for real-time applications.

DEDICATION

To my parents, my husband (Jinsheng), my son (Haiming) and my sister (Songhua).

ACKNOWLEDGMENTS

I would like to express my appreciation to my adviser, Dr. Owen, for his generous guidance and encouragement during my Ph.D. study. I also thank him for valuable suggestions and comments on the thesis. I would also like to thank my other committee members, Dr. Ofria, Dr. Hughes and Dr. Stapleton, for their support and comments on the thesis. I would like to thank my entire family for their support and love. I want to thank Andrew Glassner for providing access to the Chicken data and the MIT CSAIL Graphics Lab for the Dance sequence used in this thesis. I also want to thank Professor Craig Gotsman for providing the TG coder.
TABLE OF CONTENTS

LIST OF TABLES
LIST OF FIGURES
1 INTRODUCTION
   1.1 Background
   1.2 Thesis statement
   1.3 Major contributions
   1.4 Thesis structure
2 DATA COMPRESSION TECHNIQUES
   2.1 Text compression and Entropy
      2.1.1 Entropy
      2.1.2 Huffman coding
      2.1.3 Arithmetic Coding
   2.2 Media data compression
      2.2.1 Fourier Transform (FT)
      2.2.2 Wavelets
      2.2.3 Vector Quantization
      2.2.4 JPEG
      2.2.5 MPEG
   2.3 Summary
3 RELATED WORK ON 3D COMPRESSION
   3.1 Static geometry compression
      3.1.1 Introduction
      3.1.2 Geometric data representations
      3.1.3 Vertex compression techniques
      3.1.4 Connectivity compression techniques
   3.2 Animated geometry compression
      3.2.1 Introduction
      3.2.2 Time-dependent geometry compression
      3.2.3 Principal component analysis approach
      3.2.4 Dynapack
      3.2.5 Geometry Video
      3.2.6 Soft-body animation compression
   3.3 Summary
4 OCTREE-BASED ANIMATED GEOMETRY COMPRESSION
   4.1 Data representation
   4.2 Test data sets
   4.3 Octree-based approach
      4.3.1 Encoding process
      4.3.2 Motion-modeling process
      4.3.3 Tri-linear interpolation
      4.3.4 Motion vector computation
      4.3.5 Adaptive arithmetic coding
   4.4 Evaluations
      4.4.1 Octree results
      4.4.2 Octree vs. PCA
      4.4.3 Octree vs. pure static compression mechanism
      4.4.4 Octree vs. other approaches
   4.5 Conclusions
5 DELTA APPROACH
   5.1 Delta encoding process
   5.2 Evaluations
      5.2.1 Delta approach results
      5.2.2 Overall compression ratio comparison: delta vs. octree
      5.2.3 Frame-wise compression ratio comparison: delta vs. octree
   5.3 Conclusions
6 HYBRID APPROACH
   6.1 Hybrid encoding process
   6.2 Evaluations
      6.2.1 Hybrid approach results
      6.2.2 Compression ratio comparison: hybrid, octree and delta
   6.3 Conclusions
7 SYSTEM
   7.1 System design
   7.2 System features
8 CONCLUSIONS AND FUTURE WORK
   8.1 Conclusions
   8.2 Future work
REFERENCES

LIST OF TABLES

Table 2.1: Probability of each symbol in the message.
Table 2.2: Huffman codes.
Table 4.1: Test data sets information.
Table 4.2: Compression ratios of the Chef animation. The Chef animation has 75 frames and each frame has 8162 triangles and 4241 vertices. The object size is 440 units in x dimension, 148 units in y dimension and 455 units in z dimension. Original file size is 3,816,900 bytes. Max Distance threshold = p% * 455 units, where p = 1, 2, 3, ..., 7.
Table 4.3: Compression ratios of the Chicken Crossing animation. The Chicken Crossing animation has 400 frames and each frame has 5664 triangles and 3030 vertices. The object size is 2.556 units in x dimension, 2.23 units in y dimension and 1.07 units in z dimension. The original file size is 14,544,000 bytes. Max Distance threshold = p% * 2.556 units, where p = 2.5, 5, 7.5, 10, 20.
Table 4.4: Compression ratios of the Dance animation. The Dance animation has 201 frames and each frame has 14118 triangles and 7061 vertices. The object size is 0.0758 units in x dimension, 0.172 units in y dimension and 0.074 units in z dimension. Original file size is 17,031,132 bytes. Max Distance threshold = p% * 0.172 units, where p = 1, 2, 3, ..., 8.
Table 5.1: Compression ratios of the Chef animation using the delta approach. The Chef animation has 75 frames and each frame has 8162 triangles and 4241 vertices. The object size is 440 units in x dimension, 148 units in y dimension and 455 units in z dimension. Original file size is 3,816,900 bytes. Max Distance threshold = p% * 455 units, where p = 1, 2, 3, ..., 6.
Table 5.2: Compression ratios of the Chicken animation using the delta approach. The Chicken Crossing animation has 400 frames and each frame has 5664 triangles and 3030 vertices. The object size is 2.556 units in x dimension, 2.23 units in y dimension and 1.07 units in z dimension. The original file size is 14,544,000 bytes. Max Distance threshold = p% * 2.556 units, where p = 2.5, 5, 7.5, 10, 15, 20.
Table 5.3: Compression ratios of the Dance animation using the delta approach. The Dance animation has 201 frames and each frame has 14118 triangles and 7061 vertices. The object size is 0.0758 units in x dimension, 0.172 units in y dimension and 0.074 units in z dimension. Original file size is 17,031,132 bytes. Max Distance threshold = p% * 0.172 units, where p = 1, 2, 3, 4, 5, 6.
Table 6.1: Percentage of the octree encoded and delta encoded frames in the hybrid method.
Table 6.2: Compression ratios of the Chef animation using the hybrid approach. The Chef animation sequence has 75 frames and each frame has 8162 triangles and 4241 vertices. The object size is 440 units in x dimension, 148 units in y dimension and 455 units in z dimension. Original file size is 3,816,900 bytes. Max Distance threshold = p% * 455 units, where p = 1, 2, 3, ..., 6.
Table 6.3: Compression ratios of the Chicken Crossing animation using the hybrid approach. The Chicken Crossing animation has 400 frames and each frame has 5664 triangles and 3030 vertices.
The object size is 2.556 units in x dimension, 2.23 units in y dimension and 1.07 units in z dimension. The original file size is 14,544,000 bytes. Max Distance threshold = p% * 2.556 units, where p = 2.5, 5, 7.5, 10, 15, 20.
Table 6.4: Compression ratios of the Dance animation using the hybrid approach. The Dance animation has 201 frames and each frame has 14118 triangles and 7061 vertices. The object size is 0.0758 units in x dimension, 0.172 units in y dimension and 0.074 units in z dimension. Original file size is 17,031,132 bytes. Max Distance threshold = p% * 0.172 units, where p = 1, 2, 3, 4, 5, 6.

LIST OF FIGURES

Figure 2.1: Huffman tree.
Figure 2.2: Arithmetic coding process.
Figure 2.3: One-dimensional VQ.
Figure 2.4: Two-dimensional VQ.
Figure 2.5: JPEG compression diagram.
Figure 2.6: Illustration of an inter-coded P frame.
Figure 2.7: Illustration of an inter-coded B frame.
Figure 3.1: This head model has 11,703 vertices and 23,402 triangle faces. (a) Wire frames. (b) Shaded illustration. (c) Triangle mesh representation.
Figure 3.2: Parallelogram rule.
Figure 3.3: Triangle strip.
Figure 3.4: Generalized triangle strip.
Figure 3.5: Edge collapse operation.
Figure 3.6: Vertex insert operation.
Figure 4.1: 3D animation representation.
Figure 4.2: Histogram of vector length differences between the motion of the vertex and the average motion of the nearest four neighboring vertices.
Figure 4.3: Histogram of vector angle differences between the motion of the vertex and the average motion of the nearest four neighboring vertices.
Figure 4.4: Sample frames from the original Chef animation sequence.
Figure 4.5: Sample frames from the original Chicken animation sequence.
Figure 4.6: Sample frames from the original Dance animation sequence.
Figure 4.7: Logical flow of the encoding process for compressing 3D animated frames.
Figure 4.8: (a) Eight motion vectors. (b) The splitting process. (c) Corresponding octree representation.
Figure 4.9: Tri-linear interpolation.
Figure 4.10: A selected set of reconstructed Chef animation. The top row is the original animation. The reconstructed animation by using Max Distance threshold = 1%, 3%, 5%, 7% are shown in rows 2, 3, 4, 5 respectively. The compression ratio is 12:1, 33:1, 50:1, and 93:1 in rows 2, 3, 4, 5 respectively.
Figure 4.11: A selected set of reconstructed Chicken Crossing animation. The top row is the original one. The reconstructed animation by using Max Distance threshold = 5%, 10%, 15% and 20% are shown in rows 2, 3, 4, 5 respectively. The compression ratio is 18:1, 40:1, 67:1 and 117:1 in rows 2, 3, 4, 5 respectively.
Figure 4.12: A selected set of reconstructed Dance animation. The top row is the original animation. The reconstructed animation by using Max Distance threshold = 1%, 3%, 5%, 7% are shown in rows 2, 3, 4, 5 respectively. The compression ratio is 15:1, 36:1, 59:1, and 92:1 in rows 2, 3, 4, 5 respectively.
Figure 4.13: L2 distances of some sample frames in the reconstructed Chef animation by using Max Distance threshold = 1%, 3%, 5% and 7% respectively. The compression ratio is 12:1, 33:1, 50:1, and 93:1 respectively.
Figure 4.14: L2 distances of some sample frames in the reconstructed Chicken animation by using Max Distance threshold = 5%, 10%, 15% and 20% respectively. The compression ratio is 18:1, 40:1, 67:1 and 117:1 respectively.
Figure 4.15: L2 distances of some sample frames in the reconstructed Dance animation by using Max Distance threshold = 1%, 3%, 5% and 7% respectively. The compression ratio is 15:1, 36:1, 59:1, and 92:1 respectively.
Figure 4.16: L2 distance comparison between the octree and PCA approaches using the Chef animation. This comparison is based on approximately the same compression ratio (12:1). To get this ratio, the threshold in the octree approach is set to be 1% of the initial cubic bounding box size. It corresponds to PCA with 6 bases.
Figure 4.17: L2 distance comparison between the octree and PCA approaches using the Chef animation. This comparison is based on approximately the same compression ratio. The compression ratio of PCA with 3 bases is 24:1 and the compression ratio of the octree with threshold 2.5% is 27:1.
Figure 4.18: The image in the middle is the original frame 25. The reconstructions using the octree approach with threshold 2.5% and PCA with 3 bases are shown in the first and last images respectively.
Figure 4.19: L2 distance comparison between the octree and PCA approaches using the Chicken animation. This comparison is based on approximately the same compression ratio. The compression ratio of PCA with 19 bases is 19:1 and the compression ratio of the octree with threshold 5% is 18:1.
Figure 4.20: L2 distance comparison between the octree and PCA approaches using the Chicken animation. This comparison is based on approximately the same compression ratio (40:1). To get this ratio, the threshold in the octree approach is 10%. It corresponds to PCA with 9 bases.
Figure 4.21: The image in the middle is the original frame 300. The reconstructions using the octree approach with threshold 5% and PCA with 19 bases are shown in the first and last images respectively.
Figure 4.22: L2 distance comparison between the octree and PCA approaches using the Dance animation. This comparison is based on approximately the same compression ratio. The compression ratio of PCA with 14 bases is 14:1 and the compression ratio of the octree with threshold 1% is 15:1.
Figure 4.23: L2 distance comparison between the octree and PCA approaches using the Dance animation. This comparison is based on approximately the same compression ratio. The compression ratio of PCA with 4 bases is 48:1 and the compression ratio of the octree with threshold 5% is 59:1.
Figure 4.24: The image in the middle is the original frame 150. The reconstructions using the octree approach with threshold 1% and PCA with 14 bases are shown in the first and last images respectively.
Figure 5.1: Logical flow of the encoding process of the delta method.
Figure 5.2: L2 distances of some sample frames in the reconstructed Chef animation using the delta approach given different threshold settings.
Figure 5.3: L2 distances of some sample frames in the reconstructed Chicken animation using the delta approach given different threshold settings.
Figure 5.4: L2 distances of some sample frames in the reconstructed Dance animation using the delta approach given different threshold settings.
Figure 5.5: Compression ratio comparison between the delta and octree approaches using the Chef animation.
Figure 5.6: Compression ratio comparison between the delta and octree approaches using the Chicken animation.
Figure 5.7: Compression ratio comparison between the delta and octree approaches using the Dance animation.
Figure 5.8: Frame-wise compression ratio comparison between the delta and octree approaches using the Chef animation given the same threshold setting (threshold = 2%).
Figure 5.9: Frame-wise compression ratio comparison between the delta and octree approaches using the Chicken animation given the same threshold setting (threshold = 5%).
Figure 5.10: Frame-wise compression ratio comparison between the delta and octree approaches using the Dance animation given the same threshold setting (threshold = 1%).
Figure 6.1: Logical flow of the hybrid encoding process for compressing a 3D animation sequence.
Figure 6.2: A selected set of reconstructed Chef animation using the hybrid approach. The top row is the original animation. The reconstructed animation by using Max Distance threshold = 1% and 5% are shown in rows 2 and 3 respectively. The compression ratio is 22:1 and 78:1 in rows 2 and 3 respectively.
Figure 6.3: A selected set of reconstructed Chicken Crossing animation using the hybrid approach. The top row is the original one. The reconstructed animation by using Max Distance threshold = 5%, 10% and 15% are shown in rows 2, 3, and 4 respectively. The compression ratio is 47:1, 77:1 and 112:1 in rows 2, 3 and 4 respectively.
Figure 6.4: A selected set of reconstructed Dance animation using the hybrid approach. The top row is the original animation. The reconstructed animation by using Max Distance threshold = 1% and 3% are shown in rows 2 and 3 respectively. The compression ratio is 17:1 and 37:1 in rows 2 and 3 respectively.
Figure 6.5: L2 distances of some sample frames in the reconstructed Chef animation using the hybrid approach given different threshold settings.
Figure 6.6: L2 distances of some sample frames in the reconstructed Chicken animation using the hybrid approach given different threshold settings.
Figure 6.7: L2 distances of some sample frames in the reconstructed Dance animation using the hybrid approach given different threshold settings.
Figure 6.8: Compression ratio comparison among the hybrid, octree and delta approaches using the Chef animation.
Figure 6.9: Compression ratio comparison among the hybrid, octree and delta approaches using the Chicken animation.
Figure 6.10: Compression ratio comparison among the hybrid, octree and delta approaches using the Dance animation.
Figure 7.1: System diagram.

Chapter 1

1 Introduction

3D graphics are rapidly achieving mainstream success, both in motion pictures and computer gaming. The confluence of fast real-time hardware and advanced modeling and rendering capabilities is opening the door to a world where the animated films of the future will be rendered and viewed on the desktop with varying viewpoints, at high resolutions, and on output devices ranging from cell phones to theatre-sized projection. But, as these models grow more complex, more detailed, and longer in format, it becomes increasingly difficult to efficiently store and transmit the huge 3D models that form the basic input for these future rendering systems. Geometric data is quite large and, therefore, needs effective compression methods to decrease the transmission and storage requirements. The focus of this thesis is the design of efficient and powerful algorithms for compressing and transmitting animated geometric data.

Geometry compression is the compression of the 3D geometric data that provides a computer graphics system with the scene description necessary to render images.
Animated 3D models require even larger memory and transmission bandwidth, since a sequence consists of a large number of frames and every frame is an already large static 3D object. Therefore, efficient compression techniques need to be applied to a 3D animation sequence before distributing it over the network. Animated geometry compression is the compression of temporal sequences of geometric data.

Compressed animated geometry models make possible the incorporation of rich animations in the interactive applications of the future. As a simple example, an efficient and powerful mechanism for animated geometry compression allows for the delivery of an animated motion picture such as Shrek as a 3D model using the least possible bandwidth, in real time, on demand or as a multicast. Rather than rendering the film to video frames in advance, the rendering can be done at the delivery point. The Symmetric Multi-Rendering with Dynamic Load Balancing mechanism proposed by nVidia may accelerate the rendering process dramatically [1]. Delivering content as a 3D model rather than as video frames opens up a new realm of applications. A user can then view the film from different and arbitrary viewpoints, in stereo, or in an immersive virtual reality environment such as the Improvisational Theater Space. The Improvisational Theater Space is an interactive theatre where the users are put into the characters and perform together with virtual actors [2]. To achieve this data delivery in real time, a method that can achieve high compression ratios while maintaining good quality is required.

Animation systems typically represent animation with a relatively small parameter space. The motion of a limb may be represented by only two discrete motion points modeled on a Bezier path, with the vertex motion being interpolated using algorithms sensitive to the skeletal properties of the animated character. Hence, animation should lend itself to high levels of compression, since this underlying parameterization implies a great deal of correlation among vertex data. Indeed, if the underlying parameterizations were available, an encoder could likely create highly efficient representations. However, this data is generally not available for distribution and is likely to remain unavailable to rendering systems for several reasons. Often the animation systems incorporate modeling techniques that are proprietary in nature, using algorithms subject to intellectual property protection. The mechanisms necessary for the conversion of parameterized graphical models to animated meshes are often complex, particularly in relation to physical modeling systems such as water, fire, or explosions [3, 4], and online rendering is neither practical nor economical. And distribution of structural model data and the associated animation parameterization greatly simplifies unauthorized manipulation of the models, greatly aiding those who would steal the complex model designs in these systems for unauthorized derivative works. Indeed, it is likely that distribution will be limited to simple polygon or triangle meshes, rather than complete hierarchical scene graphs, so as to limit user modification of sequences. Hence, this thesis assumes the most simplistic representation of the geometric data: a temporal sequence of polygonal meshes, one for each discrete frame of the animated sequence.

1.1 Background

The basis for this thesis is the transmission of animated geometric data. Clearly, the genesis for this work is the efficient compression of static data.
The general field of geometry compression is concerned with efficient compression of 3D data, be it animated or static. Most initial work has focused on static geometry. This section provides a short overview of the current state of the art in geometry compression. More detail on compression methods is provided in later chapters.

Deering first proposed geometry compression techniques in 1995 with compression ratios of 6-10 to 1 [5]. Since that time, various strategies for geometry compression have been reported [6-9]. Progressive compression provides for partial data transmission at reduced resolution and is particularly useful when varying levels of detail are required. Progressive compression techniques have been discussed since Hoppe's progressive meshes [10-18]. Boosen provides a comprehensive overview of the reported geometry compression techniques [19]. Another geometry compression survey paper was written by Shikhare [20]. Luebke wrote a survey on polygonal simplification algorithms [21]. All these techniques focus on static geometry.

Less attention has been paid to animated geometry compression [22-26]. Lengyel presented a prediction compression method that splits vertices into sets and uses affine transformations to approximate the vertex paths [22]. His method is effective only when the animation is well represented by the supported transformations. Alexa and Müller proposed an interpolation predictor to represent animations by principal components [23]. Principal Component Analysis (PCA) makes their approach expensive; it requires significant memory usage and processing time. The PCA approach cannot achieve high compression ratios if the number of frames is significantly smaller than the number of vertices. Ibarria and Rossignac proposed a time-space predictor for all the vertices and all the frames [24]. Briceño et al. proposed an algorithm to generate geometry video from 3D animated meshes [25]. They extended the geometry images proposed by Gu et al. [27]. To construct a geometry video, all the frames in the animation sequence are replaced by the corresponding geometry images. Conventional video compression techniques are applied to encode the geometry video.

The TG coder proposed by Touma and Gotsman can compress a typical input to approximately 9 bits per vertex [8]. In the raw data, each vertex has 3 coordinates and each coordinate needs a 32-bit floating-point number to represent it, so the typical compression ratio is about 10 to 1. If this technique is used to compress an animation sequence, the compression ratio it can achieve is still about 10 to 1. The size of an animation sequence depends on the number of vertices in each frame and the number of frames in the sequence. The raw data size of an animation sequence is numframes * numvertices * 96 bits. The PCA approach can achieve 39.8 to 1 with reasonable quality, but it has too many limitations. Therefore a better approach is needed to compress animated geometry.

1.2 Thesis statement

The goal of this thesis is to design new compression algorithms for animated geometry that improve on existing methods, both in compression ratios and in quality. The methods presented herein represent 3D animation sequences with a reduced set of motion vectors that take advantage of the large vertex data coherence in space and time. This thesis focuses on the compression of vertex positions for each frame in the animation sequence; the connectivity is assumed to be the same for all the frames.
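As a concrete illustration of this assumed input format (not of the compression method itself), the sketch below stores an animation as one shared triangle list plus one vertex array per frame and computes the raw data size quoted above. The class and field names are hypothetical and used only for illustration.

    import numpy as np

    class MeshAnimation:
        """Assumed input: fixed connectivity plus per-frame vertex positions."""
        def __init__(self, triangles, frames):
            # triangles: (T, 3) array of vertex indices, shared by every frame
            self.triangles = np.asarray(triangles, dtype=np.int32)
            # frames: list of (V, 3) float32 arrays, one vertex array per frame
            self.frames = [np.asarray(f, dtype=np.float32) for f in frames]

        def raw_size_bits(self):
            # Each vertex stores x, y, z as 32-bit floats: 96 bits per vertex per frame.
            num_frames = len(self.frames)
            num_vertices = self.frames[0].shape[0]
            return num_frames * num_vertices * 96

For instance, the Chef test sequence described in Chapter 4 (75 frames of 4241 vertices) occupies 75 * 4241 * 96 bits = 3,816,900 bytes in this raw form, which matches the original file size reported in Table 4.2.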
An octree-based motion representation method is proposed that hierarchically represents 3D animation [28]. This method only needs to access two consecutive frames at a time to generate the differential motion between frames. It controls the quality of every frame in the reconstructed sequence. It achieves high compression ratios with reasonable quality. This approach is easy to implement and has a low computational cost. An additional delta coding method is also presented that is effective for selected data sequences, but is not generally the best solution. To further increase the performance of the octree-based motion representation method, a hybrid coding method is proposed that combines the octree-based and delta coding approaches [29]. The hybrid coding method achieves higher compression ratios given similar quality requirements.

1.3 Major contributions

The contributions of this thesis are as follows:

• A novel octree-based approach is proposed and implemented that efficiently represents the data coherence in a 3D animation sequence.
• A novel delta coding mechanism is proposed and implemented that represents the data coherence in a 3D animation sequence and is efficient as a compression mechanism on some pair-wise frame sequences.
• A hybrid coding approach is proposed and evaluated that combines the octree-based approach and the delta approach to improve the performance over the octree-only approach.
• All the systems are evaluated in terms of compression ratio and L2 norm.
• The octree-based approach is compared to a pure static compression mechanism and to the existing PCA animated geometry approach. The systems are compared in terms of L2 norm.
• The delta approach is compared to the octree-based approach in terms of overall performance and frame-wise performance.
• The hybrid approach is compared to the octree-based approach and the delta approach in terms of compression ratio.
• A systems layer is proposed for animated geometry compression that supports practical application of these methods in distribution systems, including stream joining, random access in sequences, merging with additional multimedia streams, and playback controls.

1.4 Thesis structure

This thesis is organized as follows. Chapter 2 reviews previous work on text and media compression, providing the basic context for all compression methods. In particular, the relation of the mature field of video compression to animated geometry compression is examined. Chapter 3 reviews existing work on static and animated geometry compression. Chapter 4 presents a new representation of 3D animation using an octree-based motion representation method. The chapter includes evaluation results comparing the octree-based approach and the PCA approach. Brief comparisons with other approaches are also included. Chapter 5 describes a delta coding approach to represent animated geometric data. It presents evaluation results comparing the octree-based approach and the delta coding approach in terms of compression ratio. Chapter 6 presents a hybrid coding approach that combines the octree-based approach and the delta coding approach. By selectively incorporating these two mechanisms, the method is able to achieve better performance than either method individually. Evaluation results for the mechanism are presented in detail. Chapter 7 describes the system implementation of a frame-based geometry system, with particular emphasis on support for practical playback requirements including stream joining, playback controls, and error correction.
The details of such a system are contrasted with the mature MPEG systems layer designs. Chapter 8 includes the conclusions of this thesis and describes possible extensions of the current work.

Chapter 2

2 Data compression techniques

Data compression is the conversion of an input data stream into another data stream that has a smaller size [30]. Data compression is of great importance because it allows users to save storage space and to transmit data efficiently. The use of data compression makes it possible to store more data on a disk and transmit more data in an available bandwidth. This chapter illustrates some basic compression concepts by presenting how they are utilized in text and media data compression. Many elements of these techniques form the basic tools for the implementation of 3D geometry compression.

Compression techniques can be classified in many ways [31]. One such partitioning is to subdivide the various methods into lossless techniques and lossy techniques. Lossless compression refers to data compression methods where the data can be completely recovered after decompression and the recovered data is bit-wise identical to the original data. Lossy compression refers to data compression methods where the data cannot be completely recovered after decompression and the recovered data is not identical to the original data because some information has been lost. Typically, the information that is lost is chosen so as to be imperceptible. The advantage of lossy compression is that it typically achieves much greater compression ratios than the best lossless methods. Lossy techniques take advantage of the fact that, in many applications, exact data representation may not be necessary. If the data after compression is perceptually indistinguishable from the original in the application context, it does not matter whether the data is bit-wise identical. This is particularly the case for media data such as images and audio. Consequently, lossy methods are especially useful in compressing images, movies or sounds, but rarely suitable for text compression.

This chapter is organized as follows. Section 2.1 reviews text compression techniques; the concept of entropy and entropy coding methods such as Huffman coding and arithmetic coding are introduced. Section 2.2 reviews media compression techniques. Transform coding methods, such as the Fourier transform and the wavelet transform, are illustrated by presenting how they are used in media data compression. JPEG and MPEG are also included in this section as an illustration of how these methods are applied in static image compression and moving sequence compression. The goal of this chapter is to introduce tools and concepts that will be further developed in the context of geometry compression.

2.1 Text compression and Entropy

Methods that represent text using fewer bits or bytes are referred to as text compression [32]. Text data is highly sensitive to bit-wise modifications; even a single bit change will change a letter in context and modify the meaning. Hence, text compression methods must be lossless, i.e., able to reconstruct the original text exactly.

Most modern data compression algorithms break the compression process into two parts, a predictive modeler and an entropy encoder [32]. The predictive modeler assigns probabilities to symbols by analyzing the pattern of usage in the input data, and the encoder translates the symbols into a sequence of bits representing a compressed version of the original message.
These two processes are almost completely independent. The encoder does not need to know the prediction process, so different prediction models can be applied in various applications. Encoder and modeler development have proceeded independently.

As an example, assume a text data set consisting only of the symbols a, b, and c. A predictive modeler will decide the symbols actually subject to encoding and their probabilities in the data stream. In this case, assume that the letter a has probability 0.5 and b and c have probabilities 0.25. This is the determination a predictive modeler seeks to make. This determination might have been made by analyzing the input data or known in advance by analysis of a standard corpus of source data with similar distributions. In this example, an encoder may choose to assign the bit sequence 0 to represent a, 10 to represent b, and 11 to represent c. This assignment takes advantage of the fact that the symbol a is more frequent and, therefore, should be represented with fewer bits.

Static modeling assumes that the encoder and decoder agree in advance on a fixed model. In the previous example, drawing probabilities from a standard corpus would be an example of static modeling. In adaptive modeling, the code used for particular data is dependent on the data actually being transmitted [33]. In the above example, drawing the probabilities from the input data would be an example of adaptive modeling. It is common that adaptive modeling will continuously adjust the symbol probabilities as the data is streamed, thereby adapting to changing characteristics of the input streams.

Numerous methods exist for predictive modeling of text data [31]. Descriptions of these methods are beyond the scope of this thesis. However, it is illustrative to discuss the concept of symbol assignment in text compression. There is no reason that a symbol must be a single character. Indeed, methods for text compression have been proposed based on single-character symbols, bi-grams, tri-grams, and complete word units as symbols. An important element in the design of any predictive modeler is the choice of symbols.

2.1.1 Entropy

Entropy is one of the most important terms in data compression. It is a measure of the actual information content of data [32]. It plays a central role in information theory as a measure of information, choice and uncertainty. If there is considerable redundancy in a data set, then the value of entropy is small, because redundancy implies copies of data, not new data. If a data set is highly predictable, the entropy is low, since predictability implies a lesser amount of actual information content.

As an example, the entropy of a stream of heads/tails selections of a coin, where the coin has equal probability of each result, will be n bits, where n is the number of coin tosses. The events are independent, so each new coin toss cannot be predicted; each new coin toss represents a new datum. However, if the coin has a 90% chance of coming up heads, the entropy will be much less, since each coin toss can be predicted with a high degree of certainty. The only real data in this case are the number of tosses and the cases where the coin does not come up heads. The entropy of this stream turns out to be 0.47n bits, as will be demonstrated shortly.

Entropy is intimately related to data compression. In an ideal (theoretically optimum) case, the encoded message length would be equal to its entropy.
Suppose that there is a set of n possible events with known probabilities p_1, p_2, ..., p_n that sum to 1. The entropy of this set measures how much choice is involved in the selection of the event and how uncertain we are of the outcome.

Shannon postulated that the entropy E(p_1, p_2, ..., p_n) should have the following required properties [34]: 1) E should be continuous in the p_i; 2) if each event is equally likely, E should be a steadily increasing function of n; 3) if a choice is broken down into successive choices, the original E should be the weighted sum of the individual values of E. Shannon demonstrated that the only function E that satisfies the above three requirements has the form of Equation 2.1, the equation for entropy.

E(p_1, p_2, ..., p_n) = -k \sum_{i=1}^{n} p_i \log p_i   (2.1)

In Equation 2.1, k is a positive constant and governs the unit in which entropy is measured. In computer applications this unit is typically the "bit", where k = 1 and logs are taken with base 2. In this case, the entropy function has the form of Equation 2.2.

E(p_1, p_2, ..., p_n) = -\sum_{i=1}^{n} p_i \log_2 p_i   (2.2)

The minus sign guarantees that the entropy is a positive value. This means that a message with greater probability contains less "information". The overall entropy is therefore the average of the entropy of the individual decisions involved. In the previous coin toss example, if the coin has a 90% chance of coming up heads and a 10% chance of coming up tails, the entropy of each coin toss, obtained by applying Equation 2.2, is shown in Equation 2.3.

-0.1 \log_2 0.1 - 0.9 \log_2 0.9 \approx 0.47   (2.3)

The minimum number of bits needed to represent the toss result of this biased coin therefore turns out to be 0.47 bits, and the entropy of the coin toss stream is 0.47n bits, where n is the number of tosses.

2.1.2 Huffman coding

D. A. Huffman discovered a way to achieve compression by constructing codes based on a set of message probabilities. This is the well-known coding method called Huffman coding. It is a variable-length coding method. The basic idea is that shorter codes are used to encode symbols that occur frequently and longer codes are used to encode symbols that seldom occur.

Huffman's algorithm takes a list of probabilities and constructs a full binary tree whose leaves are labeled with the probabilities associated with the source messages. Initially there is a forest of single-node trees, one for each probability in the list. In each step, the two trees corresponding to the smallest probabilities, p_i and p_j, are located and combined into a new tree with probability p_i + p_j, with the previous trees becoming subtrees of the new tree. The probabilities p_i and p_j are replaced in the list by p_i + p_j. This process is repeated until only one value exists in the list. The final tree is the Huffman coding tree.

This is best illustrated by an example. Given five symbols with probabilities as shown in Table 2.1, the Huffman tree is shown in Figure 2.1 and the final Huffman code is shown in Table 2.2. Therefore CAD can be encoded using 9 bits (1011 100 11).

Table 2.1: Probability of each symbol in the message.

    Symbol    Probability (p)
    A         0.13
    B         0.40
    C         0.10
    D         0.29
    E         0.08
    Total     1.00

Figure 2.1: Huffman tree.

Table 2.2: Huffman codes.

    Symbol    Binary Code
    A         100
    B         0
    C         1011
    D         11
    E         1010

Huffman coding provides the optimal solution if the optimum number of bits per symbol is integral. In this case, all symbol probabilities are exact powers of 1/2. But this is seldom practical.
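To make the construction above concrete, the sketch below builds a Huffman code for the Table 2.1 probabilities using a standard heap-based merge and compares the resulting average code length with the entropy of Equation 2.2. This is an illustrative sketch rather than an implementation used elsewhere in this thesis; the exact codewords may differ from Table 2.2 by swaps of 0/1 branches, but the code lengths agree.

    import heapq
    from math import log2

    def huffman_code(probs):
        """Build a Huffman code; probs maps symbol -> probability."""
        # Each heap entry: (probability, tie-breaker, {symbol: partial codeword}).
        heap = [(p, i, {s: ""}) for i, (s, p) in enumerate(probs.items())]
        heapq.heapify(heap)
        count = len(heap)
        while len(heap) > 1:
            p1, _, c1 = heapq.heappop(heap)   # smallest probability
            p2, _, c2 = heapq.heappop(heap)   # second smallest
            # Merge the two trees, prepending one more bit to every codeword.
            merged = {s: "0" + w for s, w in c1.items()}
            merged.update({s: "1" + w for s, w in c2.items()})
            count += 1
            heapq.heappush(heap, (p1 + p2, count, merged))
        return heap[0][2]

    probs = {"A": 0.13, "B": 0.40, "C": 0.10, "D": 0.29, "E": 0.08}
    code = huffman_code(probs)
    avg_len = sum(probs[s] * len(code[s]) for s in probs)    # 2.09 bits per symbol
    entropy = -sum(p * log2(p) for p in probs.values())      # about 2.05 bits per symbol

The average code length (2.09 bits per symbol) sits just above the entropy bound (about 2.05 bits per symbol); closing that gap is precisely what arithmetic coding, described next, accomplishes.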
To make Huffman coding more effective, symbols can be blocked into n-grams and these n-grams can be used as the coding units. This need for blocking is the main shortcoming of Huffman coding.

2.1.3 Arithmetic Coding

Arithmetic coding was invented by Rissanen [35]. It is also an entropy-based lossless coding method. However, it is much more flexible than Huffman coding. Arithmetic coding guarantees that any message can be encoded in the number of bits dictated by its entropy, because it effectively encodes symbols using non-integral numbers of bits. This coding method is computationally efficient and computes the code incrementally, one symbol at a time.

In arithmetic coding, a message is represented by an interval of real numbers between 0 and 1. The longer the message, the more precision is required for the real numbers; that is, the number of bits needed to specify the range increases. Before anything is processed, the range for the message is [0, 1). As each symbol is processed, the range is narrowed to the portion allocated to that symbol. The arithmetic coding process is illustrated in Figure 2.2; the symbol probabilities are those shown in Table 2.1. Any value in the final range can be used as an encoding of CAD.

Figure 2.2: Arithmetic coding process. (Encoding CAD narrows the interval from [0, 1) to [0.37, 0.47) for C, then to [0.457, 0.47) for A, and finally to [0.45804, 0.46181) for D.)
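The interval narrowing shown in Figure 2.2 can be expressed directly as code. The sketch below is a simplified illustration that manipulates exact floating-point intervals rather than the incremental, renormalizing integer arithmetic a production arithmetic coder would use; the function name is hypothetical, and the symbol ordering is taken from Figure 2.2.

    def arithmetic_encode_interval(message, probs, order):
        """Return the final [low, high) interval for the message (no renormalization)."""
        # Cumulative sub-intervals of [0, 1) for each symbol, in the given order.
        cum, start = {}, 0.0
        for s in order:
            cum[s] = (start, start + probs[s])
            start += probs[s]

        low, high = 0.0, 1.0
        for s in message:
            width = high - low
            s_low, s_high = cum[s]
            # Narrow the current interval to the sub-interval allocated to s.
            low, high = low + width * s_low, low + width * s_high
        return low, high

    probs = {"A": 0.13, "B": 0.40, "C": 0.10, "D": 0.29, "E": 0.08}
    low, high = arithmetic_encode_interval("CAD", probs, order="EDCBA")
    # low, high are approximately 0.45804 and 0.46181, matching Figure 2.2
    # (up to floating-point rounding); any value in [low, high) encodes "CAD".

Once the final interval is known, the encoder emits just enough bits to identify a binary fraction inside it, which is how the message length approaches the entropy bound.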
2.2 Media data compression

Media data often exhibits high degrees of redundancy. As an example, the value of a pixel in an image can be predicted with high probability based on the values of neighboring pixels. This predictability often extends over large areas of media data. Hence, simple first-order predictions or models are often insufficient. What is desired is a way to decrease the redundancy in the media data as a unit rather than attempt to predict this redundancy. Transform coding methods convert source data to an alternative representation that decorrelates the data, thereby reducing the redundancy. Fourier transforms, wavelet transforms, vector quantization and fractals are important transform techniques used in media data compression.

The main reason for using transform coding methods on media data is to take advantage of correlations within the media data. For example, image compression can be achieved by transforming the pixels to a decorrelated representation. An original image is represented as pixels in a spatial domain. If the image is subject to a Fourier transform, the data is transformed to the Fourier domain. This domain exhibits information packing, meaning the information in the image is packed into fewer actual data points, each of which is orthogonal to all other data points. JPEG and MPEG are discussed in this section as an illustration of how these methods are applied in static image compression and moving sequence compression.

2.2.1 Fourier Transform (FT)

The Fourier transform is used to transform a continuous time signal into the frequency domain [36, 37]. Joseph Fourier showed that any function can be expressed as an infinite sum of sinusoids of different frequencies. Sine waves of different frequencies are orthogonal. The transformed signal contains exactly the same information as the original function; they differ only in the presentation of that information [38]. The Fourier transform is an invertible transform: the transformed frequency-domain signal can be converted back to the original time-domain signal and vice versa.

Continuous Fourier transform

For a continuous function, the Fourier transform pair is given by the following two formulas. Equation 2.4 is the forward transform and Equation 2.5 is the reverse transform.

G(f) = \int_{-\infty}^{\infty} g(t) e^{-j 2 \pi f t} \, dt   (2.4)

g(t) = \int_{-\infty}^{\infty} G(f) e^{j 2 \pi f t} \, df   (2.5)

In the above equations, j is the square root of -1, g(t) is a continuous function in the time domain, and G(f) is the corresponding FT of the function in the frequency domain.

Discrete Fourier transform

In the case of sampled discrete functions that are represented by values at equally spaced points, the discrete Fourier transform is utilized. It is an invertible linear transformation and is widely used for the analysis and design of signals and systems. Although the Fourier transform has been widely used in many areas, it has its limitations. This transform is not a suitable technique for non-stationary signals. A non-stationary signal is a signal that has different characteristics at different times.

2.2.2 Wavelets

Wavelets are functions that meet certain mathematical requirements and are utilized to represent other functions and data [39]. The main idea of the wavelet transform is to select a mother wavelet, a wavelet prototype function, and use it to explore the properties of a nonzero function in a small interval. The mother wavelet is then translated to another interval of t and used in the same way. Different frequency resolutions are explored by scaling the mother wavelet with a scale factor. The original signal can then be represented using coefficients in a linear combination of the wavelet functions. If some coefficients are ignored based on a predefined threshold, the original data is represented by a small number of coefficients. This makes wavelets an excellent tool in the field of data compression.

Wavelet transforms are often classified into two categories, namely the continuous wavelet transform (CWT) and the discrete wavelet transform (DWT). The continuous transform is defined as a continuous, multi-resolution transform built up from a mother wavelet. It decomposes a function into a set of basis functions. More detailed information about the CWT can be found in the book written by Chui [40]. The discrete wavelet transform is applied to discrete data sets and generates discrete outputs. The DWT is more commonly used in practice than the CWT because the information being transformed is usually discrete. The discrete wavelet transform provides sufficient information both for analysis and for synthesis of the original signal. The signal is passed through a series of high-pass filters to analyze the high frequencies and a series of low-pass filters to analyze the low frequencies [41].

Wavelet transforms are especially useful for compressing image data. An image can be represented as sets of real coefficients after applying the wavelet transform. Most of the wavelet coefficients of a typical image are close to zero, so the image is well approximated with a small number of large wavelet coefficients. Wavelet compression is achieved by quantizing and encoding the wavelet coefficients.

The Haar transform is an example of the simplest wavelet transform. An example of applying the Haar wavelet transform to a grayscale image will be demonstrated shortly. The Haar wavelet transform uses a scale function and a wavelet to represent a large number of functions. Calculating averages and differences is the principle of the Haar transform. Each step of the Haar transform computes a set of wavelet coefficients (differences) and a set of averages. For an n-element data set, there will be n/2 averages and n/2 wavelet coefficient values after the first step.
The averages are then used as the inputs for the next wavelet calculation step, until a single average and a single wavelet coefficient are computed. This replaces the original data set with an average followed by a set of differences.

It is easy to explain the Haar transform using matrix multiplication. Suppose there is a simple grayscale 8 by 8 image and the values of the first row of pixels are (8, 7, 6, 5, 4, 3, 2, 1). The matrices A_1, A_2 and A_3 shown in Equation 2.6 are used in the first, second and third steps of the Haar transform respectively. For this data set, only 3 steps are needed to obtain the Haar wavelet transform of the original data items. The result is shown in Equation 2.7.

A_1 = \begin{pmatrix}
1/2 & 1/2 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 1/2 & 1/2 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 1/2 & 1/2 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 1/2 & 1/2 \\
1/2 & -1/2 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 1/2 & -1/2 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 1/2 & -1/2 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 1/2 & -1/2
\end{pmatrix}
\quad
A_2 = \begin{pmatrix}
1/2 & 1/2 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 1/2 & 1/2 & 0 & 0 & 0 & 0 \\
1/2 & -1/2 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 1/2 & -1/2 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 1 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 1 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 1 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 1
\end{pmatrix}
\quad
A_3 = \begin{pmatrix}
1/2 & 1/2 & 0 & 0 & 0 & 0 & 0 & 0 \\
1/2 & -1/2 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 1 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 1 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 1 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 1 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 1 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 1
\end{pmatrix}   (2.6)

A_3 A_2 A_1 (8, 7, 6, 5, 4, 3, 2, 1)^T = (4.5, 2, 1, 1, 0.5, 0.5, 0.5, 0.5)^T   (2.7)

The above example shows that the Haar transform can be achieved by building matrices and multiplying them to get a result matrix W = A_3 A_2 A_1. The matrix W is used to convert the original data set to a set of wavelet coefficients. To compute the Haar transform of a complete two-dimensional image, the standard algorithm starts by applying W to all the rows of the original image and then applies W to all the columns of the transformed image. After applying the Haar wavelet transform, the values of the final image should be smaller than the corresponding values in the original image. The final values can be compressed using some basic coding methods. Lossy wavelet image compression can be achieved by discarding some of the small coefficients.

2.2.3 Vector Quantization

Because adjacent data items in an image or digitized voice are often highly correlated, vector quantization (VQ) is commonly used to compress images and digitized sound [42]. It is a lossy data compression method. A vector quantizer is an approximator. As a simple illustration, a one-dimensional two-bit VQ is shown in Figure 2.3, and a formalized representation of this quantizer is given in Equation 2.8.

Figure 2.3: One-dimensional VQ. (The code vectors -3, -1, 1 and 3 are labeled 00, 01, 10 and 11, with decision boundaries at -2, 0 and 2.)

VQ(x) = \begin{cases} -3 & x < -2 \\ -1 & -2 \le x \le 0 \\ \phantom{-}1 & 0 < x \le 2 \\ \phantom{-}3 & x > 2 \end{cases}   (2.8)

A two-dimensional VQ is illustrated in Figure 2.4. A two-dimensional quantizer assigns any input point (a pair of numbers) in the plane to one of a particular set of 4 black dots in the plane. In this example, there are 4 regions and 4 black dots. This is a two-dimensional, two-bit VQ. In these two examples, the black dots are called code vectors and the region associated with each dot is called an encoding region. The codebook consists of all the code vectors, and the encoding regions together form a partition of the space.

Figure 2.4: Two-dimensional VQ.

When the VQ method is applied to image compression, the image is first divided into small blocks of pixels, typically 2x2 or 4x4. The compression is achieved by compressing the pixel blocks instead of individual pixels. The encoder maintains a codebook. For each image block, the encoder outputs a pointer to the corresponding entry in the codebook and writes it to the compressed stream. Linde et al. proposed an efficient algorithm (LBG) for designing an optimal vector quantizer for a given signal source. The LBG algorithm is the basis of many vector quantization methods used in image and voice compression [43].
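As an illustration of the codebook lookup just described (not of the LBG design procedure), the sketch below quantizes 2x2 image blocks against a small, hand-picked codebook by nearest-neighbor search. The codebook values are made up for the example; a real system would design them with LBG or a similar algorithm.

    import numpy as np

    def vq_encode(image, codebook, block=2):
        """Map each block x block tile of the image to the index of the nearest code vector."""
        h, w = image.shape
        indices = []
        for y in range(0, h, block):
            for x in range(0, w, block):
                tile = image[y:y + block, x:x + block].astype(float).ravel()
                # Nearest code vector in Euclidean distance over the flattened tile.
                dists = np.linalg.norm(codebook - tile, axis=1)
                indices.append(int(np.argmin(dists)))
        return indices

    def vq_decode(indices, codebook, shape, block=2):
        """Rebuild an approximation of the image from the codebook indices."""
        out = np.zeros(shape)
        it = iter(indices)
        for y in range(0, shape[0], block):
            for x in range(0, shape[1], block):
                out[y:y + block, x:x + block] = codebook[next(it)].reshape(block, block)
        return out

    # A hypothetical 4-entry codebook of flattened 2x2 tiles (2 bits per block).
    codebook = np.array([[0, 0, 0, 0], [255, 255, 255, 255],
                         [0, 255, 0, 255], [255, 0, 255, 0]], dtype=float)
    image = np.random.randint(0, 256, (8, 8))
    reconstruction = vq_decode(vq_encode(image, codebook), codebook, image.shape)

Each 2x2 block is then represented by a 2-bit index instead of four 8-bit pixels, a 16:1 reduction, at the cost of the approximation error visible in the reconstruction.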
Fractal compression is a form of vector quantization that uses a virtual codebook [44]. Compression is achieved by locating self-similar sections of an image and then using a fractal algorithm to generate those sections. The technique is lossy; compression is slow but decompression is fast. Fractal techniques are widely used, for example for terrain generation and image compression.

2.2.4 JPEG

JPEG stands for Joint Photographic Experts Group and is a standardized image compression mechanism designed for compressing continuous-tone still images [45]. JPEG exploits the fact that the human eye perceives small chrominance changes less accurately than small luminance changes; JPEG-compressed images are therefore well suited for viewing by humans but not for machine image analysis. JPEG provides several operation modes, so the user can compress an image differently according to different needs. JPEG is a lossy compression mechanism, and it allows the user to reach a desired compression/quality tradeoff by adjusting the compression parameters. Storing full color information (24 bits/pixel) is another advantage of JPEG.

The Discrete Cosine Transform (DCT) is a variant of the discrete Fourier transform. It transforms an image from the spatial domain to the frequency domain. Given an n by n image, the two-dimensional DCT is defined by Equation 2.9; the JPEG standard uses n equal to 8.

D(i, j) = (2/n) C(i) C(j) \sum_{x=0}^{n-1} \sum_{y=0}^{n-1} p(x, y) cos((2x+1) i \pi / 2n) cos((2y+1) j \pi / 2n)    (2.9)

where C(i) = 1/\sqrt{2} for i = 0 and C(i) = 1 otherwise, for 0 <= i, j <= n-1.

In Equation 2.9, p(x, y) is the intensity of the pixel in row x and column y, and D(i, j) is the DCT coefficient in row i and column j of the DCT matrix. For most images, much of the image information lies at low frequencies. The upper-left values of the DCT matrix represent the lower frequencies; the higher frequencies are represented by the lower-right values. The lower-right values are often small, so the DCT makes it possible to achieve image compression by neglecting those values with little visible distortion.

Figure 2.5 shows the JPEG compression diagram. The color image is transformed from RGB color space to luminance/chrominance color space and organized into groups of 8 x 8 pixel blocks. The JPEG standard applies the two-dimensional DCT to each 8 x 8 pixel block to generate an 8 x 8 coefficient matrix for the corresponding frequency components. Each DCT coefficient is divided by its quantization coefficient (QC) and rounded to an integer. The 64 quantized DCT coefficients of each pixel block are encoded with an entropy encoder to generate the compressed bit stream. JPEG is a symmetric compression method because the decoder performs the reverse steps.

Figure 2.5: JPEG compression diagram.

2.2.5 MPEG

MPEG stands for Moving Pictures Experts Group. The MPEG group has devised numerous standards for the compression and representation of digital video and multimedia data. The MPEG standards for compressed video include MPEG-1, MPEG-2 and MPEG-4 [46]. MPEG-1 is intended for intermediate data rates, on the order of 1.5 Mbit/s. MPEG-2 is intended for high data rates of at least 10 Mbit/s. MPEG-4 is intended for very low data rates of less than 64 Kbit/s.

MPEG represents a video sequence using three types of frames: I, P and B-frames. I stands for intra-coded frames. I-frames are coded independently; they are essentially encoded using the JPEG compression algorithm.
A P-frame is inter-coded using its predecessor: it is a differential representation of a frame based on the previously reconstructed I or P-frame, as shown in Figure 2.6. A B-frame is inter-coded based on both past and future I-frames and P-frames, as shown in Figure 2.7. MPEG uses differentially encoded frames to take advantage of the interframe correlation present in digital video.

Figure 2.6: Illustration of an inter-coded P frame.

Figure 2.7: Illustration of an inter-coded B frame.

The macroblock is the basic building block of an MPEG picture. It is composed of a 16 x 16 block of luminance samples and two 8 x 8 chrominance sample blocks, so a macroblock contains six 8 x 8 blocks of samples. To compress a macroblock, the MPEG standard applies a DCT to each 8 x 8 sample block to create decorrelated values, quantizes the DCT coefficients and encodes the quantized results. The main difference from JPEG is that MPEG uses different quantization tables, different code tables and different rounding methods for I-frames and non-I-frames.

Motion compensation is an important element of MPEG inter-frame compression and is used only for non-I-frame coding. In this mode, the pels are predicted from the previous reference frame. (The smallest detail that can be reproduced in a picture is about the size of a picture element, referred to as a pel in video terminology [47].) Before the DCT is applied, the predicted pels are subtracted from the actual pels to obtain the difference, and the results are quantized and encoded. Motion is normally the main cause of differences between the current picture and the previous reference picture. Matching a region in the current picture against displaced regions in the previous reference picture provides a template for the region that is a good prediction of the actual data values. Macroblocks are the elementary regions used in motion compensation. A P-frame uses an earlier I-frame or P-frame as its reference picture, so a P-frame uses forward prediction: a target macroblock to be encoded is matched against a set of displaced macroblocks of the same size in a past reference picture. A B-frame may use forward prediction, backward prediction, or both; backward prediction means that a target macroblock is predicted by a macroblock from a future reference picture.

For decompression, MPEG reconstructs the pels of the entire video sequence. It reads the codes of a block from the compressed data stream, decodes them, recovers the quantized DCT coefficients and takes the inverse DCT. For P and B-frames, the motion-compensated prediction is added to the result of the inverse DCT. The entire video sequence is decoded picture by picture and, within each picture, macroblock by macroblock.

2.3 Summary

This chapter reviewed standard compression techniques for both text and multimedia data. These methods are not directly applicable to 3D geometry. The text compression methods are ineffective because the symbol probabilities in geometric data are not well distributed: there is a much larger set of symbols, each used with low frequency, so predictive methods based on symbol probabilities are generally not effective. Pure transform techniques have been applied to geometry compression [48] and to animated geometry compression [23, 26], but the methods presented in this thesis are more closely related to wavelet compression methods.
The system design features of MPEG take advantage of the temporal redundancy in a video sequence and are well developed in their support of playback controls and stream joining. Many of the general concepts of MPEG will be applicable to 3D geometry compression, though with considerable changes, as illustrated in Chapter 4 and Chapter 7.

Chapter 3

3 Related work on 3D compression

The standard compression techniques discussed in the previous chapter are effective for text and media compression. However, these methods are not well adapted to the compression of 3D geometric data. This chapter reviews existing work on static and animated geometry compression and provides the foundation for new development in animated geometry compression. Section 3.1 reviews the related work on static geometry compression, and animated geometry compression techniques are reviewed in Section 3.2.

Geometry compression can be divided into static and animated methods. Static methods encode a single geometry model for transmission and are typically applied to walkthroughs and applications where the animation is provided locally (such as some gaming applications). They also provide the intraframe coding necessary for any practical animated geometry compression solution, just as the basic concepts of JPEG compression provide the intra-coded frame solution in MPEG compression. Animated geometry compression encodes a sequence of geometry models and is suitable for the delivery of fully animated content such as movies or interactive games.

3.1 Static geometry compression

This section reviews existing static geometry compression techniques. Section 3.1.2 presents different ways to represent 3D geometry. Static geometry compression techniques can be grouped into two categories, vertex compression techniques and connectivity compression techniques; Section 3.1.3 and Section 3.1.4 review them respectively.

3.1.1 Introduction

With the increasing popularity of 3D graphics, large amounts of 3D data are routinely accessed over the Internet, while whole new applications for 3D graphics await more efficient ways to access the data. Due to the increasing complexity of 3D graphics, it becomes more difficult to efficiently store and transmit the massive 3D models that form the input to any rendering system. Geometric data is quite large and, therefore, it is valuable to develop effective compression methods to decrease the transmission and storage requirements.

Geometry compression has become a rapidly growing field since Deering proposed a geometry compression technique [5], and various strategies have been developed since then. Geometry compression initially focused on non-progressive techniques. Progressive geometry compression has become an active topic since Hoppe introduced progressive meshes [10]. Progressive techniques provide for partial data transmission at reduced resolution. Compressing 3D models in a progressive fashion meets the needs of progressive transmission, display and Level of Detail (LOD) control. For progressive compression, information is compressed at different resolutions: the coarsest version of the model is encoded first, followed by finer detailed versions of the model. Most common geometry compression techniques are non-progressive [5-9, 49-51], although researchers have also proposed progressive compression techniques [10-18, 52, 53].

View-independent methods compute LODs for each object in the scene.
At run time the rendering system selects a proper LOD for each object based on its distance; as the object moves further away from the viewer, a coarser LOD is used. View-dependent approaches use a dynamic representation of the model [53-56]. The representation is continuously queried at run time to produce an appropriate version of the model for the user's current viewpoint. View-dependent approaches address the problem of view-dependent LOD control: they find and update a mesh according to the viewing parameters. View-dependent methods have some advantages over view-independent approaches, since a single object can be rendered at multiple LODs, which makes them a very useful tool for real-time rendering of complex 3D environments. Hoppe's work represents an arbitrary triangle mesh as a hierarchy of geometrically optimized refinement transformations [54, 55]. This representation makes it easier to retrieve accurate approximating meshes based on the view parameters. Increased run-time computation is one drawback of view-dependent approaches.

3.1.2 Geometric data representations

Geometric data is used to represent 3D objects in a computer. It provides the foundation for computer graphics, computer-aided geometric design and visualization. 3D models can be represented in many different ways. These representations can be classified into four categories: raw data, surfaces, solids and high-level structures [57]. Raw data is unstructured data acquired directly from 3D modeling devices; it includes point clouds, range images and polygon soups. Point clouds are obtained from range scanners as sets of unstructured 3D point samples. Range images are obtained from a range scanner as a set of points mapped to pixels in a depth image. Unstructured sets of polygons are called polygon soups. Meshes, subdivision surfaces, parametric surfaces and implicit surfaces are in the surface category. Connected sets of polygons are called mesh surfaces; the polygon mesh is the most common modeling method for 3D computer graphics. Surfaces can also be built from coarse meshes together with subdivision rules that specify how resolution is increased on the mesh; this representation is referred to as subdivision surfaces. Parametric surfaces describe surfaces using a small set of parameters such as control points; spline and Bezier patches are examples of parametric surfaces. A surface can also be defined implicitly, as an implicit surface, using a function that describes all the points on the surface; an equation for a sphere defines an implicit surface. The solids category includes voxels and BSP trees: voxels obtained from CAT or MRI scans, for example, are a uniform grid of volumetric samples, and from polygonal representations, binary space partitions with labeled solid cells can be constructed. Scene graphs and skeletons are high-level structures used to represent 3D objects; a scene graph is a hierarchical representation of a graphical scene.

Although many representations have been proposed for 3D models, the polygon mesh (and the more constrained triangle mesh) is the most prevalent representation. A polygon mesh is a very flexible and simple means of describing a 3D surface. 3D graphics adapters optimized for triangles reinforce this trend and, in fact, encourage making all the polygons triangles. Since all other representations can be converted to triangle meshes, the triangle mesh has become a popular format for representing 3D models.

A triangle mesh is represented by vertex data and connectivity information.
Vertex data consists of the coordinates of all the vertices and, optionally, vertex normal vectors and texture coordinates. Connectivity information is simply a triangle list where each triangle is a list of vertices. Figure 3.1 illustrates a 3D model and its representation. The PLY file format is an example of this data structure [58]: a PLY file consists of a header followed by a list of vertices and a list of polygons.

Figure 3.1: This head model has 11,703 vertices and 23,402 triangle faces. (a) Wire frame. (b) Shaded illustration. (c) Triangle mesh representation.

3.1.3 Vertex compression techniques

Vertex data and connectivity information are the main elements of the geometry. The standard geometry representation used in modern graphics hardware is a set of vertices containing xyz coordinates and a set of faces containing indices referring to the vertices. Geometry compression techniques can therefore be classified into two categories: vertex compression and connectivity compression techniques. Vertex data consists of the coordinates of all the vertices and optional normal vectors and texture coordinates; each coordinate component of a vertex is usually a floating-point number. According to the survey by Gotsman et al. [59], there are three kinds of vertex compression techniques in the literature, namely quantization, prediction methods and spectral methods.

3.1.3.1 Quantization

Quantization is a common technique for compressing numerical data by discarding excess resolution. It is a lossy compression method because the recovered data after decoding is not identical to the original data. Uniform quantization is the typical form of quantization, though the vector quantization methods described in Chapter 2 are also applicable. If the input range is divided into levels with equal spacing, the quantizer is uniform; otherwise it is non-uniform. Both have advantages and disadvantages: uniform quantization is easy to implement and fast to quantize/dequantize, but the quality of the reconstructed data is not optimal, while a non-uniform quantizer has a higher computation cost but reconstructs data with better quality. Deering used a uniform quantizer to reduce each coordinate component of each vertex to 16 bits [5]. After quantization, vertex locations and colors were delta coded and then compressed with modified Huffman coding. He used a table-based approach for normal vectors, in which any normal can be represented by an 18-bit index, and many normals can be represented with index deltas of 8 bits or fewer.

3.1.3.2 Prediction methods

Some researchers have improved on Deering's method with prediction schemes that consider the features of the geometry. Predictive encoders work better than pure quantization methods. A predictive encoder takes advantage of the correlation between the position of a vertex and the positions of its neighboring vertices. Predictive encoders are based on vertex estimates derived from the incidence graph and from previously decoded vertex positions. Taubin and Rossignac used a linear predictive encoder [6]. Vertex coordinate components are first quantized to a fixed number of bits (8, 10 or 12) by bounding the interval in which the coordinates lie. A vertex-spanning tree is then used to predict the geometry of each vertex as a linear combination of the geometry of its ancestors in the tree.
The geometry of a new vertex can be expressed as λ1 v1 + λ2 v2 + λ3 v3 + e, where λ1, λ2 and λ3 are estimated by minimizing the prediction errors over the entire mesh, v1, v2 and v3 are the immediately preceding ancestors of this new vertex in the vertex spanning tree, and e represents the prediction error. The geometry is predictively encoded and the prediction errors are encoded with standard lossless entropy coding techniques. Other properties, such as normals, colors and texture coordinates, are encoded in a similar way. This method can compress the geometry to approximately 12 bits per vertex.

Touma and Gotsman proposed a more sophisticated prediction scheme to achieve a more accurate prediction [8]. The "parallelogram" rule is used to predict vertex positions: the geometry of a vertex is predicted from the geometry of the vertices surrounding it on the mesh surface. The rule is based on the assumption that two adjacent triangles in a smooth mesh approximately form a parallelogram, so the geometry of the fourth vertex can be predicted from the other three vertices. The vertex d of the triangle incident on the edge bc is predicted by d' = c + b - a, where a is the third vertex of the other, previously processed triangle incident upon edge bc, as shown in Figure 3.2 [59]. For typical inputs, their method achieves approximately 9 bits per vertex.

Figure 3.2: Parallelogram rule.

Pajarola and Rossignac used a butterfly prediction method to estimate the original non-collapsed vertex positions [11]. The basic idea of their approach is to predict the position of a vertex by computing a weighted sum of the surrounding vertices at topological distance 1 or 2 on the triangulation graph.

3.1.3.3 Spectral methods

Spectral methods are similar to the transform coding methods used for image and signal compression. Karni and Gotsman [48] first employed mesh spectral analysis to compress the vertex coordinates of polygonal meshes. They built a set of geometric data that represents basis functions for the geometry of the entire mesh. Transform coding methods work by transforming a function (in this case the discrete vertex locations) to an alternative domain such that the original data can be reconstructed as a weighted sum of the basis functions of that domain. Their method computes the spectral coefficients by building a mesh Laplacian matrix from the topology information of the mesh. The Laplacian matrix is an n by n matrix, where n is the number of vertices in the mesh, and each element of the matrix is computed using Equation 3.1, where d_i is the degree of vertex i.

L_ij = 1         if i and j are the same vertex
     = -1/d_i    if i and j are neighbors
     = 0         otherwise                         (3.1)

The next step of their approach is to compute the eigenvectors of this matrix to obtain a set of basis functions; the spectrum of the geometry is then obtained from these spectral basis functions. For meshes containing more than 1000 vertices it is not practical to compute the eigenvectors. In that case an efficient partitioning algorithm must be used to generate submeshes, and mesh spectral analysis is applied to each submesh separately [48].

3.1.4 Connectivity compression techniques

A large part of geometry compression research has focused on connectivity information compression. Connectivity compression aims to reduce the repeated references to vertices shared by many triangles. Most of these techniques compress connectivity information losslessly [6, 8, 9, 49].
Vertex positions can be subject to some loss, but the geometry loss must be controlled carefully so as to produce an object that is perceptually identical to the original. Some existing connectivity compression methods assume specific structures in the geometric data such as triangle strips [5] or spanning trees [6, 49]. Rossignac's Edgebreaker is a technique suitable for manifold triangle meshes [9]. In a manifold mesh, the neighborhood of each vertex can be continuously deformed to a disk and each edge is shared by no more than two triangles [60]. In this method, a preprocessing stage renumbers the vertices of the mesh and represents the model in a half-edge data structure; the triangles are then traversed, recording the history in terms of 5 op-codes. Each op-code describes the topological relation between the current triangle and the boundary of the remaining part of the mesh [9]. Connectivity is encoded in 1.5 to 2 bits per triangle. Isenburg and Snoeyink extended the Edgebreaker approach proposed by Rossignac to compress polygonal manifold meshes [50]. King et al. have worked on compressing irregular quadrilateral meshes [51]. Some researchers have also extended earlier efforts to handle non-manifold meshes. By now, connectivity compression techniques are well developed. Connectivity representation and several significant previously published connectivity compression techniques are summarized in the following sections.

3.1.4.1 Connectivity representation

The simplest representation of a triangle mesh stores, for each triangle, the three coordinates of each of its vertices. In this approach each triangle is stored independently and the location of a vertex is repeated 6 times on average, which is very inefficient. To avoid storing the same vertex multiple times, the vertices are stored in an indexed list and the connectivity (each triangle) is stored as a list of indices into the vertex list. Each triangle is then represented by 3 integers, and 3 log2(|V|) bits are needed per triangle, where |V| is the number of vertices in the model [9]. An improved method combines the advantages of the above two methods by storing only the triangles, each represented by 3 vertex descriptors [9]. In this scheme, each vertex descriptor begins with a one-bit switch indicating a new vertex or a previously seen vertex. If it is a new vertex, the switch bit is followed by the three coordinates of the vertex; if it is a known vertex, the switch bit is followed by the ⌈log2(P)⌉ bits that identify one of the P previously seen vertices. Using this method, the connectivity cost is 2.5 log2(|V|) + 0.5 bits per triangle [9]. Later, Bar-Yehuda and Gotsman proposed a method that visits the triangles in an order that guarantees no more than 13(|V|)^0.5 vertices are exposed at any given time [61]. The connectivity cost can be lowered to 1.25 log2(|V|) + 9.75 bits per triangle with this improved method.

3.1.4.2 Triangle-strip based technique

Deering presented the idea of triangle strips. A mesh representation based on this idea is supported by OpenGL [62] and other graphics libraries. The representation aims to reduce the number of times the same vertex is transferred to and processed by the graphics subsystem. In a triangle strip, each triangle is represented by combining a new vertex with two previously seen vertices. To achieve this, the triangles must be ordered so that consecutive triangles are adjacent to each other.
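A minimal Python sketch of how a strip of vertex indices expands back into triangles is given below (added for illustration; the alternating orientation rule matches the four-triangle example of Figure 3.3 discussed next, and the function name is hypothetical):

```python
def strip_to_triangles(strip):
    """Expand a triangle strip into explicit triangles, alternating the
    vertex order so that every triangle keeps a consistent winding."""
    triangles = []
    for i in range(len(strip) - 2):
        a, b, c = strip[i], strip[i + 1], strip[i + 2]
        triangles.append((a, b, c) if i % 2 == 0 else (b, a, c))
    return triangles

print(strip_to_triangles([0, 1, 2, 3, 4, 5]))
# [(0, 1, 2), (2, 1, 3), (2, 3, 4), (4, 3, 5)]
```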
Each new triangle shares an edge with the previous triangle in the strip; a triangle strip is therefore a series of connected triangles. Figure 3.3 is an example of a triangle strip that consists of 4 triangles: (0 1 2), (2 1 3), (2 3 4), (4 3 5).

Figure 3.3: Triangle strip.

Some geometry, such as that in Figure 3.3, can be represented by one triangle strip, but geometry such as that in Figure 3.4 is not suitable for representation by a single triangle strip: the interior vertices would appear twice in the strip, which is undesirable. Deering used a generalized technique to avoid references to old data. Figure 3.4 is an illustration of a generalized triangle mesh [5], showing the same geometry encoded both as a generalized triangle strip and as a generalized triangle mesh that pushes reusable vertices into a mesh buffer.

Figure 3.4: Generalized triangle strip.

Deering's method encodes the geometry using generalized triangle strips and uses lossy techniques to encode vertex coordinates and other properties [5]. In the first step, the triangle data is converted into generalized triangle strips, which are a near-optimal representation of a triangle mesh given fixed storage. After that, quantization is used to quantize the vertex coordinates, colors and vertex normals. The quantized results are delta encoded between neighbors, and the last step encodes the delta values using a modified Huffman encoding method. Compression ratios obtained with this technique are between 6:1 and 10:1, depending on the original representation format and the desired result quality. Deering's compressed format is designed to reduce the bandwidth requirements of the rendering hardware [5]. His work has been incorporated into the Java3D API by Sun Microsystems.

3.1.4.3 Topological surgery approach

Taubin and Rossignac's work on geometry compression through topological surgery [6] improved on Deering's [5] earlier results. Their method is not directly suitable for hardware rendering with limited on-board memory; it aims instead at high compression efficiency for transmission over networks. The approach preserves the connectivity information and is a single-resolution mesh compression scheme. It is derived from the body of work on the encoding of planar graphs: the manifold mesh is traversed in a particular fashion to obtain a vertex spanning tree. To encode the connectivity information, a binary triangle spanning tree of the mesh is constructed by cutting through the edges of the vertex spanning tree. The triangle spanning tree can be encoded in the same way as the vertex spanning tree.

3.1.4.4 Progressive approaches

This section summarizes multi-resolution mesh compression schemes. Compressing polygonal meshes in a progressive fashion meets the needs of progressive transmission, display and LOD control, and is a requirement for progressive transmission. These techniques can further improve the performance of graphics rendering hardware.
A mesh with reduced resolution is used when the object is far away from the camera, and the high-resolution mesh is used when it is closer to the camera. Progressive compression creates a compressed representation of a geometry model in which the compressed file consists of a coarse version of the model followed by progressively more detailed versions. Progressive techniques provide for partial data transmission at reduced resolution. Another advantage of this representation is that the user can view the coarse model before the entire object is decoded.

Hoppe proposed the progressive mesh representation [10]. It can be used to transmit a 3D mesh progressively, starting from a coarse mesh and refining it by inserting new vertices one by one. He applied a vertex insertion operation, which is the inverse of the edge collapse operation used in many mesh simplification methods [63]. The progressive mesh representation is a scheme for storing and transmitting arbitrary triangle meshes; its advantage is that it preserves the geometry of the original mesh and its overall appearance. The first step in generating a progressive mesh is a sequence of edge collapse transformations. An edge collapse merges two adjacent vertices into a single vertex, and two adjacent faces (one if the edge is a boundary edge) disappear in the process, as illustrated in Figure 3.5. The process starts from the original mesh and applies a sequence of edge collapse transformations until it reaches a much coarser mesh. Because edge collapse transformations are invertible, the original mesh can be recovered from the coarser mesh by a sequence of vertex insert operations, as shown in Figure 3.6.

Figure 3.5: Edge collapse operation.

Figure 3.6: Vertex insert operation.

The key point of this approach is deciding which edge to collapse at each step. If the edge is selected randomly, the progressive mesh will not keep its appearance during reconstruction. The author improved the energy function proposed in his earlier paper and used it as the metric [63]: each edge selection attempts to minimize the energy function, so the overall quality of the progressive mesh is determined by that function. Hoppe's progressive mesh representation is naturally a compact encoding. For a mesh with n vertices and approximately 2n faces, the author estimated that the vertex insert records require about (log2(n) + 5)n bits. Using a method similar to Deering's together with the progressive mesh representation, the geometry of the mesh can be encoded in 31n to 50n bits.

Taubin et al. proposed a method to group Hoppe's insertions into refinements [15]. A mesh is split into a forest of spanning trees of vertex runs. The forest split operation can be seen as a grouping of several consecutive edge split operations into a set. This decomposition is invertible, so the complete connectivity can be used to reconstruct the original mesh. A high compression ratio can be achieved with this method when the size grows exponentially with the LOD. Li and Kuo combined progressive transmission of connectivity refinements with progressive transmission of vertex coordinates [12]. A vertex decimation scheme is used to produce a series of operations that decrease the LOD of the model. Their approach leads to 0.5 log2(|V|) + 5 bits per triangle.
Pajarola and Rossignac also compactly encode progressive mesh representations for triangle meshes [11]. Their work extends Hoppe's idea by grouping simplifications and refinements to increase connectivity compression efficiency. Khodakovsky et al. proposed a wavelet-based scheme for progressive geometry compression that works well on regular and semi-regular connectivity meshes [17]. The limitation of their method is that it does not work effectively on arbitrary meshes. They defined wavelet basis functions over manifold surfaces for multi-resolution analysis; the multi-resolution technique provides LOD progressively.

3.2 Animated geometry compression

A large body of geometry compression research has focused on static geometry, while only limited research has addressed animated geometry compression [22-26, 64-66]. This section reviews some notable compression techniques for 3D animation sequences.

3.2.1 Introduction

Animated geometry compression is the compression of temporal sequences of geometric data. It makes possible the delivery of an animated motion picture as a 3D model using the least possible bandwidth, in real time, on demand or as a multicast. Animated 3D geometry models require large amounts of memory and transmission bandwidth, since a sequence consists of a large number of frames and every frame is an already large static 3D object. Therefore, efficient compression techniques need to be applied to a 3D animation sequence before it is distributed over a network. The following sections survey recent notable work in animated geometry compression.

3.2.2 Time-dependent geometry compression

Lengyel's work aims to transmit and play back large time-dependent geometry in real time using the minimum required bandwidth [22]. His method is based on the idea that, with a proper mesh representation, the gross movement of a geometric mesh can be coded with a small set of controls and the residual differences can be quantized and coded at a low bit rate. He presents a new view of time-dependent geometry as a media stream and a new technique for taking advantage of the large amount of coherence in time. The method exploits the coherence across animated frames using a predictive encoder, and compresses a generic animated geometry stream by solving for few-parameter models and encoding the residual. This prediction-based method splits the vertices into sets and uses affine transformations to approximate the vertex paths. It is effective only when the animation is well represented by the supported transformations.

3.2.3 Principal component analysis approach

Alexa and Müller proposed an interpolation predictor that represents animations by their principal components [23]. Their compression scheme works as follows. Given a 3D animated mesh with n vertices and m frames, all shapes are first translated so that the center of mass is at the origin, and an affine transformation from each frame to the first frame is computed using a linear least-squares fit to normalize all the frames. A 3n by m matrix A is then built from the normalized frames, where each column of A is the geometry of one frame. Singular value decomposition is performed on A to obtain the eigenvectors of A·A^T. These eigenvectors are stored in the result matrix U, which is used to define a new basis replacing the original frames.
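A minimal numerical sketch of the basis construction just described is given below (added for illustration; the per-frame affine normalization step is omitted and the function names pca_basis and reconstruct are hypothetical):

```python
import numpy as np

def pca_basis(frames, k):
    """frames : list of m arrays of shape (n, 3), the vertex positions of
                each frame (assumed already normalized).
       k      : number of principal components to keep.

    Returns the 3n x k basis U_k and the k x m coefficient matrix."""
    A = np.column_stack([f.reshape(-1) for f in frames])   # 3n x m matrix
    U, s, Vt = np.linalg.svd(A, full_matrices=False)        # A = U S V^T
    U_k = U[:, :k]                                           # eigenvectors of A A^T
    coeffs = U_k.T @ A                                       # projection onto the new basis
    return U_k, coeffs

def reconstruct(U_k, coeffs, j, n):
    """Approximate the vertex positions of frame j from its k coefficients."""
    return (U_k @ coeffs[:, j]).reshape(n, 3)
```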
This is a transform coding method in which the basis functions are determined directly from the data set.

Principal component analysis makes their approach expensive: it requires significant memory and processing time. It cannot achieve high compression ratios if the number of frames is significantly smaller than the number of vertices, and it is hard to control the quality of individual frames in the animation sequence. Because the method is global over a large range of frames, some frames may exhibit considerably larger errors than others or than the average. It is good in terms of the smoothness of the shape. Among the notable approaches, PCA can achieve relatively high compression ratios. In order to compare the PCA approach to the octree-based approach, the PCA approach was reimplemented in this thesis; the detailed comparison results can be found in Chapter 4.

3.2.4 Dynapack

Ibarria and Rossignac proposed a time-space predictor for all the vertices and all the frames [24]. The geometry of a vertex is predicted from the geometry of its neighbors in the same frame and the geometry of the corresponding vertices in the previous frame. The Dynapack encoder runs very fast [26]. Additionally, it is predominantly lossless: the only lossy element of the algorithm is the quantization, which can be neglected when 12-bit quantization is used. The approach requires re-ordering the vertices of the mesh to meet the needs of a particular configuration of neighboring vertices in the current and previous frames. The compression ratios this approach achieves are approximately 10:1 to 22:1.

3.2.5 Geometry Video

Briceño et al. proposed an algorithm to generate a geometry video from a 3D animation sequence [25]. The approach is based on the geometry image representation proposed by Gu et al. [27]. A geometry image unwraps a polygon mesh surface onto a 2D color image that can then be subjected to conventional image compression methods; the geometry video method therefore exploits conventional image and video compression indirectly by converting geometric data into images. A geometry video consists of a sequence of geometry images. To generate a geometry video from a 3D animation sequence, they extend the three major steps of constructing a geometry image: cut, parameterization and compression. The algorithm works as follows. The mesh is first sliced using the cut algorithm in order to create a surface with the topology of a disk. Given a single cut that is the same for all the frames in the sequence, they seek a single parameterization that minimizes the error across all the frames. Once a cut and a parameterization that work well across all the frames have been determined, each frame in the animation sequence is transformed into a geometry image, and all the frames in the sequence are replaced by the corresponding geometry images. The resulting geometry image sequence is called a geometry video. Standard video compression techniques such as MPEG are then applied to encode the geometry video. The approach supports LOD because a coarse version of the mesh is easily obtained by removing every other pixel in each direction of the geometry images.

3.2.6 Soft-body animation compression

Karni and Gotsman presented a technique to compress soft-body animation sequences [26]. Their approach is based on the principal component analysis idea proposed by Alexa and Müller [23].
Linear prediction coding is applied to the PCA coefficients in order to achieve further code reduction by capturing the temporal behavior present in the sequence. Their compression scheme works as follows. They first apply singular value decomposition to a 3n by m matrix A, where n is the number of vertices and m is the number of frames in the animated 3D mesh. The eigenvectors of the matrix A·A^T are stored in a matrix U. The coefficient matrix U^T·A is obtained by projecting the original animation onto the new basis. Keeping only the low-frequency part yields a new k by m coefficient matrix. Finally, linear prediction coding is applied to the resulting coefficients.

3.3 Summary

This chapter reviewed geometry compression techniques, both static and animated. Static geometry compression techniques can be divided into two categories, vertex compression and connectivity compression techniques, and they are suitable for static 3D meshes. Connectivity compression is not the focus of this dissertation: this dissertation is focused on compressing animated polygonal meshes with fixed connectivity, and compared to the huge amount of geometry information in each frame, the connectivity information is negligible because it needs to be transmitted only once. A straightforward way to compress 3D animated meshes is to apply a static geometry compression algorithm individually to each frame in the animation sequence, but this does not capture the temporal coherence present in the sequence. Animation compression techniques need to exploit not only the spatial coherence but also the temporal coherence in the sequence.

The chapter also reviewed recent efforts to compress animated 3D meshes. All of these approaches assume fixed connectivity, and each has its advantages and disadvantages. The time-dependent geometry compression scheme needs to split the data set to fit different transforms. The PCA approach is computationally expensive for large meshes. Dynapack requires re-ordering of the vertices of the mesh. The soft-body animation compression scheme also suffers from significant memory usage and processing time because it is based on the PCA approach. Geometry videos, while interesting and colorful, are not really competitive as animated geometry compression methods.

In the next chapter, a novel octree-based animated geometry compression approach is presented. This approach can achieve high compression ratios and quality with no restriction on the data set. A simple, low-cost encoding process and a fast decoding process make the approach well suited to real-time applications.

Chapter 4

4 Octree-based animated geometry compression

This chapter presents a new algorithm for animated geometry compression. The algorithm uses an octree-based motion representation with motion vectors associated with the corners of the octree cells. The motion vectors capture the motion coherence within spatial localities, and the method efficiently compresses 3D animated geometric data. Section 4.1 introduces the data representation for 3D animation. In this thesis, three different data sets are used to evaluate the performance of the new algorithms in relation to existing methods; Section 4.2 presents these data sets. Section 4.3 describes the octree-based animated geometry compression in detail.
Section 4.4 presents the compression results, the comparison between the octree and PCA approaches, and the comparison between the octree approach and a simplistic approach that compresses each frame in the sequence using static geometry compression.

4.1 Data representation

This chapter assumes the most simplistic representation of the geometric data: a triangle mesh for each discrete frame in the animated sequence. The connectivity information is assumed to remain unchanged for every frame in the animation sequence. Any polygonal mesh can be converted into a triangle mesh through triangulation [67, 68]. A triangle mesh is represented by its vertex data and connectivity information. Vertex data consists of the coordinates of all the vertices and, optionally, vertex normal vectors and texture coordinates. Connectivity information is simply a triangle list; each triangle contains indices referring to the vertices. Figure 4.1 shows the data representation for 3D animation.

Figure 4.1: 3D animation representation.

This thesis is focused on compressing the vertex positions for all the frames. Polyhedral simplification techniques are not suitable for compressing animated geometric data because these methods cannot maintain the exact connectivity of the model. The straightforward way to compress 3D animated meshes is to apply vertex data compression techniques [5, 6, 8, 11] individually to each frame in the animation sequence, but this approach does not take advantage of the temporal coherence present in the animation sequence. Animation compression techniques need to exploit not only the spatial coherence but also the temporal coherence in the sequence.

To show the data coherence in an animation sequence, the following test was conducted on the Dance sequence. For each vertex, the motion of the vertex is compared to the average motion of its nearest four neighboring vertices, both in terms of vector length and in terms of the angle between the two vectors. Figure 4.2 shows the histogram of vector length differences between the motion of a vertex and the average motion of its nearest four neighboring vertices, and Figure 4.3 shows the histogram of vector angle differences. These tests show that the motion of a vertex is predictable from its neighbors: there is a great deal of coherence among moving vertices.

Figure 4.2: Histogram of vector length differences between the motion of the vertex and the average motion of the nearest four neighboring vertices.

Figure 4.3: Histogram of vector angle differences between the motion of the vertex and the average motion of the nearest four neighboring vertices.
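The test just described can be sketched as follows (added for illustration, assuming the "nearest four neighboring vertices" are taken by Euclidean distance, which may differ from the exact definition used in the thesis; the function name is hypothetical):

```python
import numpy as np
from scipy.spatial import cKDTree

def motion_coherence(frame_a, frame_b, k=4):
    """For every vertex, compare its motion between two consecutive frames
    with the average motion of its k nearest neighboring vertices.

    frame_a, frame_b : (n, 3) arrays of vertex positions.
    Returns per-vertex length differences and angle differences (radians).
    """
    motion = frame_b - frame_a
    # k+1 because the nearest neighbor of a vertex is the vertex itself.
    _, idx = cKDTree(frame_a).query(frame_a, k=k + 1)
    neighbor_motion = motion[idx[:, 1:]].mean(axis=1)

    len_diff = np.linalg.norm(motion, axis=1) - np.linalg.norm(neighbor_motion, axis=1)

    dot = np.einsum('ij,ij->i', motion, neighbor_motion)
    norms = np.linalg.norm(motion, axis=1) * np.linalg.norm(neighbor_motion, axis=1)
    cos_ang = np.clip(dot / np.maximum(norms, 1e-12), -1.0, 1.0)
    angle_diff = np.arccos(cos_ang)
    return len_diff, angle_diff
```

Histograms of the two returned quantities correspond to Figures 4.2 and 4.3.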
4.2 Test data sets

In this thesis, three different animation data sets are used to evaluate the performance of the octree-based approach and to compare it to other approaches. The data sets are "Chef", "Chicken Crossing" and "Dance". The Chef data was generated from a 3D Studio Max animation and was used as an initial test data set. "Chicken Crossing" was created by Andrew Glassner et al. [69]; it was used as a test data set in the Dynapack approach proposed by Ibarria and Rossignac [24] and in the PCA approach proposed by Alexa and Müller [23]. The "Dance" sequence was created by the MIT CSAIL Graphics Lab and was used in the geometry video approach proposed by Briceño et al. [25]. To make it easier to compare the octree-based approach to other approaches in the literature, the "Chicken Crossing" and "Dance" sequences were chosen as standard test data sets. Table 4.1 shows the properties of the "Chef", "Chicken Crossing" and "Dance" animation sequences.

Table 4.1: Test data sets information.

  Property               Chef animation   Chicken animation   Dance animation
  Number of vertices     4,241            3,030               7,061
  Number of triangles    8,162            5,664               14,118
  Number of frames       75               400                 201
  Total size (bytes)     3,816,900        14,544,000          17,031,132

Figure 4.4 shows the first frame and every 5th frame of the original Chef animation sequence. Figure 4.5 shows the first frame and every 20th frame of the original Chicken animation sequence. Figure 4.6 shows the first frame and every 10th frame of the original Dance animation sequence.

Figure 4.4: Sample frames from original Chef animation sequence.

Figure 4.5: Sample frames from original Chicken animation sequence.

Figure 4.6: Sample frames from original Dance animation sequence.

4.3 Octree-based approach

The octree-based approach presented in this chapter takes advantage of the spatial and temporal coherence present in the data. Two consecutive frames are used to generate a reduced set of motion vectors that represent the motion from the previous frame to the current frame. The motion vectors are used to predict the vertex positions on the decoder side for each frame except the first. The process generates a hierarchical octree motion representation for each frame. The method can be classified as a predictive modeler in that it attempts to create a reduced set of data that can be used to predict the actual vertex motion over a localized area. The motion vectors are then encoded using arithmetic coding.

4.3.1 Encoding process

In a raw animated data set, an animated 3D model is represented by the vertex positions for each frame. Connectivity is assumed to remain constant. This is an assumption in all of the current work on animated geometry compression, though future work must eventually address dynamic changes in topology. Chapter 7 discusses system issues, including the interspersing of intra-coded frames, which can support geometry additions or modifications as well as scene cuts.

In the motion-modeling process, each frame except the first is represented by a set of motion vectors, which are stored in an octree. In a client-server scenario, when the server wishes to send an animated 3D model to the client, it sends the encoded first frame (using an intraframe encoding method) together with an encoded octree for each of the subsequent frames, rather than sending vertex positions for each new frame. The encoded octree is significantly smaller than the corresponding original frame. The vertex locations for each frame are approximated from the corresponding octree on the client side.
The logical flow of the encoding process is illustrated in Figure 4.7. To encode the current frame, the previous frame needs to be available to the decoder, so the first frame is encoded using an intraframe coding technique. A complete system will include occasional intra-coded frames so as to provide synchronization points, as in MPEG video and as discussed in Chapter 7. The previous and current frames are used to generate vertex delta values representing the differential motion between the frames. Motion vectors are obtained by using a least-squares error estimation method to estimate the motion of the enclosed vertices. The motion vectors are then encoded using adaptive arithmetic coding. To continue encoding the following frames, a local decoder is called to generate a decoded version of the current frame as known to the receiving system, which is then used to encode the next frame on the encoder side. This process is repeated for each new frame.

Figure 4.7: Logical flow of encoding process for compressing 3D animated frames.

4.3.2 Motion-modeling process

The most important step of the encoding process is the construction of an octree. This step is called the motion-modeling process. Jackins and Tanimoto showed that octrees can be used to represent 3D objects [70]. Ahuja and Nash improved on the idea presented by Jackins and Tanimoto and proposed a method to represent moving 3D objects using octree structures [71]. The octree representation of the occupancy of space by objects is obtained by recursive decomposition of the space into octants. Octrees are a common method for modeling 3D data. Instead of using an octree to represent a 3D object, this thesis uses an octree to represent motion within space.

The initial step of the motion-modeling process is to construct a minimum cube (an axis-parallel bounding box) that contains all the vertices of the 3D object as the starting box or cell. A motion vector is associated with each corner of the cell, as illustrated in Figure 4.8 (a). These eight motion vectors approximate the motion of the enclosed vertices using interpolation of the motion vectors over the space. If the motion of all vertices in the current cell is not well approximated by the motion vectors, the cell is subdivided into eight child octants as shown in Figure 4.8 (b), and the corresponding tree node generates eight children as shown in Figure 4.8 (c). Each octant has new motion vectors that are stored in the corresponding tree nodes. The splitting process continues until the motion of the enclosed vertices is estimated to be below a specified threshold. Therefore, every reconstructed vertex in every frame of the reconstructed animation sequence satisfies the requirement that the error between the reconstructed vertex and the corresponding original vertex is smaller than the threshold, as illustrated in Equation 4.1. In this thesis, the Max Distance threshold measurement is used for the splitting process. The distance for a vertex is defined as the Euclidean error between the original vertex and the reconstructed vertex, and Max Distance is the maximum error over all the enclosed vertices. The threshold is the maximum allowed distance between a reconstructed vertex and its corresponding original vertex. It is expressed as a percentage of the edge length of the initial cubic bounding box of the object, as illustrated in Equation 4.2.
Figure 4.8: (a) Eight motion vectors. (b) The splitting process. (c) Corresponding octree representation.

For all v in the current cell:

sqrt( (vx - v'x)^2 + (vy - v'y)^2 + (vz - v'z)^2 ) < threshold    (4.1)

where v and v' represent the original vertex and the corresponding reconstructed vertex.

Threshold = p% × Length    (4.2)

where Length is the edge length of the initial cubic bounding box. The chosen percentage p influences the quality; as the percentage increases, the quality decreases.

The tree representation of the subdivision process is shown in Figure 4.8 (c). Each node has eight motion vectors, and the motion vectors of all the leaf nodes together represent the motion of the entire object. Black nodes (leaf nodes) hold motion vectors that approximate all the vertex movements enclosed in the cell to within the defined threshold; white nodes indicate that the cell must be split into eight octants, so the corresponding tree node generates eight children.

4.3.3 Tri-linear interpolation

Motion for vertices within a cell is approximated using tri-linear interpolation of the motion vectors. Given a cell of size s with lower-left-rear corner b (the minimum x, y, z), motion vectors m1, ..., m8, and a vertex v = (vx, vy, vz) as shown in Figure 4.9, the ratio ρx in the x direction is computed using Equation 4.3.

ρx = (vx - bx) / s    (4.3)

The ratios ρy in the y direction and ρz in the z direction are computed in the same way. The weight for each motion vector, w1, ..., w8, is computed using Equation 4.4.

w1 = (1 - ρx) ρy ρz              w5 = (1 - ρx)(1 - ρy)(1 - ρz)
w2 = (1 - ρx)(1 - ρy) ρz         w6 = ρx (1 - ρy)(1 - ρz)
w3 = ρx (1 - ρy) ρz              w7 = ρx ρy (1 - ρz)
w4 = ρx ρy ρz                    w8 = (1 - ρx) ρy (1 - ρz)        (4.4)

The vertex motion Δv for the target point is then computed by tri-linear interpolation of the eight motion vectors of the cell [72], as illustrated in Equation 4.5.

Δv = Σ_{i=1}^{8} wi mi    (4.5)

Figure 4.9: Tri-linear interpolation.

Tri-linear interpolation of corner motion vectors for the estimation of vertex motion accommodates a wide variety of motion styles, including affine motion, the most common rigid motion in computer animation. Non-linear motion or discrete (independent) object motion is approximated through the octree subdivisions. Other methods for motion estimation within a cell will be explored in later work, including compact quaternion representations and tri-cubic interpolation.

4.3.4 Motion vector computation

Each node in the octree approximates the motion of its enclosed vertices. The key step of the motion-modeling process is computing the motion vectors. The motion vectors are assumed to approximate the motion of the enclosed vertices through tri-linear interpolation over the cell. Let n be the number of vertices enclosed in a given cell. Matrix A is a 3n by 24 matrix built as illustrated in Equation 4.6, with one block of three rows per enclosed vertex:

A = [ w1  0   0   w2  0   0   ...  w8  0   0
      0   w1  0   0   w2  0   ...  0   w8  0
      0   0   w1  0   0   w2  ...  0   0   w8
      ...                                      ]    (4.6)

Vector b, a 3n-element vector, is computed using Equation 4.7.

b = (Δvx1, Δvy1, Δvz1, Δvx2, Δvy2, Δvz2, ..., Δvzn)^T    (4.7)

Writing the 24 unknown motion vector components as x = (mx1, my1, mz1, mx2, my2, mz2, ..., mx8, my8, mz8)^T, the system to be solved is

A x = b    (4.8)

The motion vectors for a given cell are then computed by least-squares estimation of x in the equation Ax = b of Equation 4.8, where x is a 24-element vector consisting of the components of the eight motion vectors. This process finds the vector x that minimizes the sum of squared residuals, as illustrated in Equation 4.9.
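A minimal numerical sketch of the per-cell computation described in Sections 4.3.3 and 4.3.4 is given below (added for illustration; it uses numpy's generic least-squares solver rather than the QR-based solution described next, and the function names trilinear_weights and fit_cell_motion are hypothetical):

```python
import numpy as np

def trilinear_weights(v, corner, size):
    """Weights w1..w8 of Equation 4.4 for a vertex v inside a cubic cell
    with lower-left-rear corner `corner` and edge length `size`."""
    px, py, pz = (np.asarray(v) - np.asarray(corner)) / size
    return np.array([
        (1 - px) * py * pz,              # w1
        (1 - px) * (1 - py) * pz,        # w2
        px * (1 - py) * pz,              # w3
        px * py * pz,                    # w4
        (1 - px) * (1 - py) * (1 - pz),  # w5
        px * (1 - py) * (1 - pz),        # w6
        px * py * (1 - pz),              # w7
        (1 - px) * py * (1 - pz),        # w8
    ])

def fit_cell_motion(prev_pos, cur_pos, corner, size):
    """Least-squares fit of the eight corner motion vectors of one cell
    (Equations 4.6-4.9) and the per-vertex reconstruction errors.

    prev_pos, cur_pos : (n, 3) arrays of the enclosed vertex positions in
                        the previous (decoded) frame and the current frame.
    """
    n = len(prev_pos)
    A = np.zeros((3 * n, 24))
    b = (cur_pos - prev_pos).reshape(-1)            # vertex deltas (Equation 4.7)
    for i, v in enumerate(prev_pos):
        w = trilinear_weights(v, corner, size)
        for axis in range(3):                        # one x, y, z row per vertex
            A[3 * i + axis, axis::3] = w             # row layout of Equation 4.6
    x, *_ = np.linalg.lstsq(A, b, rcond=None)        # minimizes Equation 4.9
    motion_vectors = x.reshape(8, 3)                 # m1..m8

    predicted = prev_pos + (A @ x).reshape(-1, 3)    # apply Equation 4.5
    errors = np.linalg.norm(predicted - cur_pos, axis=1)
    return motion_vectors, errors

# If errors.max() exceeds the Max Distance threshold of Equations 4.1 and 4.2,
# the cell is split into eight octants and the fit is repeated for each octant.
```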
In the implemented solution, the least-squares estimate is computed using QR decomposition. The matrix A is decomposed into a 3n-by-24 orthogonal matrix Q and a 24-by-24 upper triangular matrix R so that A = QR. The Template Numerical Toolkit (TNT) and the JAMA/C++ linear algebra library are used to compute the vector x [73]. The least-squares estimate of the motion vectors minimizes the Euclidean error of the interpolated motion, which is identical to minimizing the Euclidean distances between the reconstructed and the actual vertex locations after motion.

Residual sum = Σ_{i=1}^{3n} (Ai x - bi)^2    (4.9)

To avoid accumulating error, the previously decoded frame, rather than the previous input frame, is used on the encoder side to compute the next set of motion vectors. For example, to build an octree representation of frame 3, the first step is to construct a minimum bounding box that contains all the vertices of the 3D model. Given decoded frame 2 and original frame 3, the vector b is obtained directly, and the matrix A is built from the tri-linear interpolation factors of all the vertices in the current cell, based on the reconstructed frame 2 locations. Using least-squares estimation, the motion vectors that best estimate the motion of the enclosed vertices in the current cell are computed. After the motion vectors for the current bounding cell have been obtained, the accuracy of the estimate is evaluated using the distance threshold. If the maximum distance is larger than the specified threshold, the current cell is subdivided into eight octants, each with eight new motion vectors to estimate the motion of its enclosed vertices. The splitting process continues until the motion estimated by the motion vectors is within the threshold. The leaf nodes of the octree therefore contain the motion vectors for the entire object.

The octree is stored in two parts: the tree structure and the motion vectors. A '0' indicates an interior node and a '1' indicates a leaf node, and the tree structure is represented by a prefix traversal of the tree, from which it can be restored uniquely. The motion vectors are stored consecutively for the leaf nodes from left to right.

4.3.5 Adaptive arithmetic coding

Adaptive arithmetic coding is used to encode the octree. An adaptive coder has several advantages: it does not need two passes, and there is no need to transmit the symbol probabilities. An adaptive model changes the probabilities of the symbols during the compression process in order to adapt to the changing contexts. Both the encoder and the decoder assume some initial basic model; during compression the model is adapted by the symbols transmitted so far. The point is that the model is adapted only by symbols that have already been transmitted, since the decoder must mirror the encoder's operation during decompression. The decoder can therefore use the previously transmitted symbols to adapt the model in exactly the same way as the encoder did. The implementation by Witten et al. is used for the octree coding [74]; an improved implementation of this arithmetic coding is described by Moffat et al. [75].

4.4 Evaluations

This section presents an evaluation of the performance of the octree-based motion representation method in terms of compression ratio and L2 distance.
4.4 Evaluations

This section presents an evaluation of the performance of the octree-based motion representation method in terms of compression ratio and L2 distance. The PCA approach was reimplemented and compared to the octree-based approach in detail. The section also includes a brief comparison between the octree approach and a purely static compression mechanism, and a brief comparison of the octree approach to other animated geometry compression techniques.

4.4.1 Octree results

Test results for the octree approach using all three test data sets are presented in this section.

4.4.1.1 Compression ratios

The compression ratio results for the Chef, Chicken Crossing and Dance animations are shown in Table 4.2, Table 4.3 and Table 4.4 respectively. Selected frames of the reconstructed Chef, Chicken Crossing and Dance animations are shown in Figure 4.10, Figure 4.11 and Figure 4.12 respectively.

Table 4.2: Compression ratios of the Chef animation. The Chef animation has 75 frames; each frame has 8162 triangles and 4241 vertices. The object size is 440 units in x, 148 units in y and 455 units in z. The original file size is 3,816,900 bytes. Max Distance threshold = p% x 455 units, where p = 1, 2, ..., 7.

    Max Distance      1%       2%       3%       4%      5%      6%      7%
    Encoded Size   309,593  173,853  115,689  90,752  77,101  50,474  41,112
    Ratio           12:1     22:1     33:1     42:1    50:1    77:1    93:1

Table 4.3: Compression ratios of the Chicken Crossing animation. The Chicken Crossing animation has 400 frames; each frame has 5664 triangles and 3030 vertices. The object size is 2.556 units in x, 2.23 units in y and 1.07 units in z. The original file size is 14,544,000 bytes. Max Distance threshold = p% x 2.556 units, where p = 2.5, 5, 7.5, ..., 20.

    Max Distance      2.5%       5%       7.5%      10%      12.5%     15%      17.5%     20%
    Encoded Size   1,540,423  820,417  517,648  367,387  270,287  218,639  165,678  124,254
    Ratio            9.4:1      18:1     28:1     40:1     54:1     67:1     88:1     117:1

Table 4.4: Compression ratios of the Dance animation. The Dance animation has 201 frames; each frame has 14118 triangles and 7061 vertices. The object size is 0.0758 units in x, 0.172 units in y and 0.074 units in z. The original file size is 17,031,132 bytes. Max Distance threshold = p% x 0.172 units, where p = 1, 2, ..., 8.

    Max Distance      1%        2%       3%       4%       5%       6%       7%       8%
    Encoded Size   1,118,420  687,636  478,444  369,675  287,510  227,481  185,108  161,163
    Ratio            15:1      25:1     36:1     46:1     59:1     75:1     92:1     106:1

Figure 4.10: A selected set of frames from the reconstructed Chef animation. The top row is the original animation. The reconstructions using Max Distance thresholds of 1%, 3%, 5% and 7% are shown in rows 2, 3, 4 and 5 respectively, with compression ratios of 12:1, 33:1, 50:1 and 93:1.

Figure 4.11: A selected set of frames from the reconstructed Chicken Crossing animation. The top row is the original animation. The reconstructions using Max Distance thresholds of 5%, 10%, 15% and 20% are shown in rows 2, 3, 4 and 5 respectively, with compression ratios of 18:1, 40:1, 67:1 and 117:1.

Figure 4.12: A selected set of frames from the reconstructed Dance animation. The top row is the original animation. The reconstructions using Max Distance thresholds of 1%, 3%, 5% and 7% are shown in rows 2, 3, 4 and 5 respectively, with compression ratios of 15:1, 36:1, 59:1 and 92:1.

4.4.1.2 Quality

Quantitatively evaluating the visual quality of a reconstructed 3D model is not easy, and assessing the visual quality of a 3D animation is even more difficult.
The L2 distance is widely used in the literature to measure quality, so this thesis uses it to evaluate the quality of the reconstructed frames in the animation sequences, in a manner similar to Briceño et al. [25]. The same L2 metric is also used in the following section to compare the octree-based method to the PCA approach proposed by Alexa and Müller [23]. The L2 distance from surface X to surface Y is defined as follows:

    d(X, Y) = [ (1 / area(X)) * integral over x in X of d(x, Y)^2 dx ]^(1/2)    (4.7)

where d(x, Y) is the minimum Euclidean distance from the point x to the surface Y. The distance computed from Equation 4.7 is not symmetric, so the L2 distance between two surfaces is defined as max(d(X, Y), d(Y, X)). This thesis uses the gtscompare functionality provided in the GTS library [76]. For the integral approximation, 0.5 percent of the length of the initial bounding box diagonal is used as the delta step. For simplified surfaces, the L2 distance can also be obtained with the Metro tool [77].

Figure 4.13 shows the L2 distances of some sample frames in the reconstructed Chef animation for different threshold settings. Similarly, Figure 4.14 and Figure 4.15 show the comparison results for the Chicken and Dance animations respectively.

Figure 4.13: L2 distances of some sample frames in the reconstructed Chef animation using Max Distance thresholds of 1%, 3%, 5% and 7%. The corresponding compression ratios are 12:1, 33:1, 50:1 and 93:1.

Figure 4.14: L2 distances of some sample frames in the reconstructed Chicken animation using Max Distance thresholds of 5%, 10%, 15% and 20%. The corresponding compression ratios are 18:1, 40:1, 67:1 and 117:1.

Figure 4.15: L2 distances of some sample frames in the reconstructed Dance animation using Max Distance thresholds of 1%, 3%, 5% and 7%. The corresponding compression ratios are 15:1, 36:1, 59:1 and 92:1.

4.4.2 Octree vs. PCA

To compare the octree approach with existing approaches in the literature, the PCA approach was reimplemented and evaluated. PCA was chosen for comparison because it can achieve relatively high compression ratios with reasonable quality. The three animation data sets "Chef", "Chicken Crossing" and "Dance" were used for the comparison. The compressed size of the octree approach is the sum of the encoded octree sizes of all frames. The compressed size of the PCA approach includes the affine transformations for each frame, matrix U (number of bases x number of vertices x 12 bytes), matrix S (number of bases x 4 bytes) and matrix V (number of bases x number of frames x 12 bytes). Given approximately the same compression ratio, the L2 distances of the two methods are compared. Comparison results for the Chef animation sequence are shown in Figures 4.16 and 4.17. In Figure 4.17, the average L2 distances for the octree approach and the PCA approach are 8.7 and 19.1 respectively, which means the octree approach consistently achieves better results.
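For reference, the sketch below approximates the symmetric L2 distance of Equation 4.7 by sampling only the mesh vertices and using nearest-vertex distances. It is a simplified stand-in for the surface integral that the GTS gtscompare tool evaluates, not the procedure used to produce the numbers reported here; all names are illustrative.

```cpp
#include <algorithm>
#include <cmath>
#include <vector>

struct Vec3 { double x, y, z; };

static double dist2(const Vec3& a, const Vec3& b) {
    double dx = a.x - b.x, dy = a.y - b.y, dz = a.z - b.z;
    return dx * dx + dy * dy + dz * dz;
}

// Rough stand-in for d(X, Y) of Equation 4.7: the surface integral is replaced
// by an average over the vertices of X, and the point-to-surface distance
// d(x, Y) by the distance to the nearest vertex of Y (Y assumed non-empty).
double oneSidedL2(const std::vector<Vec3>& X, const std::vector<Vec3>& Y) {
    double sum = 0.0;
    for (const Vec3& x : X) {
        double best = dist2(x, Y.front());
        for (const Vec3& y : Y)
            best = std::min(best, dist2(x, y));
        sum += best;                       // squared distance, as in the metric
    }
    return std::sqrt(sum / X.size());
}

// Symmetric L2 distance: maximum of the two one-sided distances.
double l2Distance(const std::vector<Vec3>& X, const std::vector<Vec3>& Y) {
    return std::max(oneSidedL2(X, Y), oneSidedL2(Y, X));
}
```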
Figure 4.18 shows the reconstructed frame 25 using the octree approach and the PCA approach.

Figure 4.16: L2 distance comparison between the octree and PCA approaches using the Chef animation, at approximately the same compression ratio (12:1). To obtain this ratio, the threshold in the octree approach is set to 1% of the initial cubic bounding box size; this corresponds to PCA with 6 bases.

Figure 4.17: L2 distance comparison between the octree and PCA approaches using the Chef animation, at approximately the same compression ratio. The compression ratio of PCA with 3 bases is 24:1 and the compression ratio of the octree with a 2.5% threshold is 27:1.

Figure 4.18: The image in the middle is the original frame 25. The reconstructions using the octree approach with a 2.5% threshold and PCA with 3 bases are shown in the first and last images respectively.

L2 distance comparisons between the octree and PCA approaches using the Chicken animation sequence are shown in Figures 4.19 and 4.20. In Figure 4.20, the average L2 distances for the octree approach and the PCA approach are 0.167 and 0.18 respectively. Figure 4.21 shows a reconstructed sample frame using the octree and PCA approaches. Figure 4.19 shows that the octree approach has a much larger L2 distance error for frame 300 than the PCA approach; however, the quality of the reconstructed image using the PCA approach is similar to that of the octree approach, as shown in Figure 4.21. The L2 distance therefore does not exactly reflect the visual quality of the reconstruction. A more accurate quality metric might take into account the properties of the human visual system (HVS).

Figure 4.19: L2 distance comparison between the octree and PCA approaches using the Chicken animation, at approximately the same compression ratio. The compression ratio of PCA with 19 bases is 19:1 and the compression ratio of the octree with a 5% threshold is 18:1.

Figure 4.20: L2 distance comparison between the octree and PCA approaches using the Chicken animation, at approximately the same compression ratio (40:1). To obtain this ratio, the threshold in the octree approach is 10%; this corresponds to PCA with 9 bases.

Figure 4.21: The image in the middle is the original frame 300. The reconstructions using the octree approach with a 5% threshold and PCA with 19 bases are shown in the first and last images respectively.

L2 distance comparisons between the octree and PCA approaches using the Dance animation are shown in Figures 4.22 and 4.23. In Figure 4.22, the average L2 distance for the octree approach is 0.0014 and the average L2 distance for the PCA approach is 0.019.
Figure 4.22: L2 distance comparison between the octree and PCA approaches using the Dance animation, at approximately the same compression ratio. The compression ratio of PCA with 14 bases is 14:1 and the compression ratio of the octree with a 1% threshold is 15:1.

Figure 4.23: L2 distance comparison between the octree and PCA approaches using the Dance animation, at approximately the same compression ratio. The compression ratio of PCA with 4 bases is 48:1 and the compression ratio of the octree with a 5% threshold is 59:1.

Figure 4.24: The image in the middle is the original frame 150. The reconstructions using the octree approach with a 1% threshold and PCA with 14 bases are shown in the first and last images respectively.

Figure 4.24 shows a reconstructed sample frame using the octree and PCA approaches. Compared to the PCA approach, the octree-based approach has several advantages. The PCA approach requires significant memory and processing time, while the octree-based approach has a much smaller memory requirement and runs much faster. The octree-based approach compresses an animation sequence equally well regardless of how many frames it contains, whereas the PCA approach cannot achieve high compression ratios if the number of frames is significantly smaller than the number of vertices. The PCA approach also cannot control the quality of an individual frame in the animation sequence, and some reconstructed frames have very large errors. The octree approach, in contrast, controls the quality of each frame in the sequence through the threshold. The average L2 distance comparison shows that the octree approach is better than the PCA approach in terms of consistency of results; the PCA approach is good in terms of smoothness of the reconstructed shape.

4.4.3 Octree vs. a purely static compression mechanism

Another way to compress animated 3D models is to apply a static geometry compression algorithm individually to each frame in the animation sequence, but this does not take advantage of the temporal coherence present in the sequence, and it is expensive for large models. This approach was tested using the "parallelogram" coder proposed by Touma and Gotsman [8]. For the test data sets used in this thesis, the TG coder compresses the animation with a compression ratio of about 9:1, which is not a competitive animation result. Compared to this approach, the octree-based approach exploits not only the spatial coherence but also the temporal coherence in the sequence, and it achieves higher compression ratios with better reconstruction quality.

4.4.4 Octree vs. other approaches

According to the results reported by Ibarria and Rossignac, the Dynapack approach can compress the entire Chicken animation with a 13.7:1 compression ratio when it uses 11-bit quantization, and Lengyel's approach can achieve a 14:1 compression ratio with comparable accuracy [24]. At higher accuracy, Lengyel's approach achieves a 2:1 compression ratio and Dynapack achieves 10.8:1 with 15-bit quantization [24]. Ibarria and Rossignac claim that their approach can achieve results indistinguishable from the original using 13-bit quantization; in this case the compression ratio is 10.8:1 to 13.7:1.
The octree approach can achieve an 18:1 compression ratio with quality indistinguishable from the original animation. When less accuracy is required, the octree-based approach can achieve a 100:1 compression ratio. Compared to the results obtained by previously published animation compression techniques [22-24], the octree approach achieves higher compression ratios with comparable animation quality.

4.5 Conclusions

This chapter described the octree-based motion representation method in detail. The method provides a novel way to implement predictive, differential frame compression and achieves the goal of this thesis: to represent a 3D animation sequence with a reduced set of motion vectors that take advantage of the large data coherence in space and time. The approach is easy to implement and has a low-cost encoding process and a fast decoding process. Compared to other approaches, it achieves higher compression ratios while maintaining very good quality.

Chapter 5

5 Delta approach

This chapter presents a new delta coding approach and an evaluation of its performance for animated geometry compression. The delta coding approach generates 3D delta values from two consecutive frames and achieves compression by quantizing these delta values. Different frames in the animation sequence can use different quantization levels, and the quantized values are coded with arithmetic coding to achieve further data reduction. The method is simple and easy to implement. During the course of this research, delta coding was explored as an alternative to the octree-based approach. This chapter will show that the delta coding method does not achieve the consistent level of performance demonstrated by the octree approach; however, in limited cases the delta approach does outperform it. Hence it is explored here in isolation and then incorporated into a hybrid compression approach in Chapter 6.

5.1 Delta encoding process

Figure 5.1: Logical flow of the encoding process of the delta method: the 3D delta between the reconstructed previous frame and the current frame is computed, the quantization level that keeps the reconstruction error below the threshold is searched for, and the quantized values are arithmetic coded; a local frame decoder feeds the reconstructed frame back for the next delta.

The logical flow of the encoding process is illustrated in Figure 5.1. To encode the current frame, the previously reconstructed frame must be available on the encoder side. The first frame is therefore encoded differently from the other frames in the sequence, using an intraframe coding method. The first step in the delta encoding process is to generate 3D delta values between two consecutive frames, namely the reconstructed previous frame and the current original frame; the 3D delta data represents the differential motion between the two frames. The delta approach then searches for the quantization level with the minimum number of bits that satisfies the predefined threshold for the current 3D deltas. Once the number of quantization bits has been determined, linear quantization over the range of each vertex component is applied to the 3D deltas, as illustrated in Equation 5.1. The quantized values are encoded using adaptive arithmetic coding to achieve further data reduction; at this point the current frame is completely delta encoded. In the local decoding step, a linear inverse quantization is applied to the quantized samples, as shown in Equation 5.2:

    x_q = floor( 2^q (x - x_min) / (x_max - x_min) )    (5.1)

    x'  = x_min + x_q (x_max - x_min) / 2^q    (5.2)

where q is the number of quantization bits and [x_min, x_max] is the range of the delta values.
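A minimal sketch of this step follows, using the reconstructed form of Equations 5.1 and 5.2 together with the per-frame search for the smallest bit count that keeps every reconstructed delta within the threshold. The function names, the clamping of the largest value, and the maximum bit count are illustrative assumptions, not details taken from the thesis implementation.

```cpp
#include <algorithm>
#include <cmath>
#include <cstdint>
#include <vector>

// Linear quantization of one delta component over its range (Equation 5.1),
// assuming x_max > x_min.
uint32_t quantize(double x, double xmin, double xmax, int q) {
    double scale = std::pow(2.0, q);
    double t = (x - xmin) / (xmax - xmin);              // normalize to [0, 1]
    uint32_t code = static_cast<uint32_t>(std::floor(t * scale));
    uint32_t maxCode = static_cast<uint32_t>(scale) - 1;
    return std::min(code, maxCode);                     // keep x == xmax in range
}

// Matching linear inverse quantization (Equation 5.2).
double dequantize(uint32_t code, double xmin, double xmax, int q) {
    double scale = std::pow(2.0, q);
    return xmin + (xmax - xmin) * (static_cast<double>(code) / scale);
}

// Search for the smallest bit count whose reconstruction error stays below the
// threshold for every delta value (the per-frame search described above).
int bestQuantizationLevel(const std::vector<double>& deltas,
                          double xmin, double xmax, double threshold, int maxBits = 16) {
    for (int q = 1; q <= maxBits; ++q) {
        bool ok = true;
        for (double d : deltas) {
            double r = dequantize(quantize(d, xmin, xmax, q), xmin, xmax, q);
            if (std::fabs(r - d) > threshold) { ok = false; break; }
        }
        if (ok) return q;
    }
    return maxBits;
}
```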
The above steps are repeated for every frame except the first frame in the animation sequence.

5.2 Evaluations

In order to compare the results of the different approaches, the same test data sets used for the octree approach are used to evaluate the performance of the delta approach. Compression ratio and L2 distance results of the delta approach for the different animation sequences are presented, along with the overall and frame-wise compression ratio comparisons between the delta approach and the octree approach.

5.2.1 Delta approach results

This section presents the compression ratio results for the Chef, Chicken Crossing and Dance animation sequences for different threshold settings. Table 5.1 shows the results for the Chef animation; Table 5.2 and Table 5.3 show the results for the Chicken Crossing and Dance animations respectively.

Table 5.1: Compression ratios of the Chef animation using the delta approach. The Chef animation has 75 frames; each frame has 8162 triangles and 4241 vertices. The object size is 440 units in x, 148 units in y and 455 units in z. The original file size is 3,816,900 bytes. Max Distance threshold = p% x 455 units, where p = 1, 2, ..., 6.

    Max Distance        1%    2%    3%    4%    5%    6%
    Compression Ratio  17:1  21:1  31:1  32:1  32:1  34:1

Table 5.2: Compression ratios of the Chicken animation using the delta approach. The Chicken Crossing animation has 400 frames; each frame has 5664 triangles and 3030 vertices. The object size is 2.556 units in x, 2.23 units in y and 1.07 units in z. The original file size is 14,544,000 bytes. Max Distance threshold = p% x 2.556 units, where p = 2.5, 5, 7.5, 10, 15, 20.

    Max Distance       2.5%   5%    7.5%  10%   15%   20%
    Compression Ratio  21:1  26:1  28:1  30:1  33:1  35:1

Table 5.3: Compression ratios of the Dance animation using the delta approach. The Dance animation has 201 frames; each frame has 14118 triangles and 7061 vertices. The object size is 0.0758 units in x, 0.172 units in y and 0.074 units in z. The original file size is 17,031,132 bytes. Max Distance threshold = p% x 0.172 units, where p = 1, 2, ..., 6.

    Max Distance        1%    2%    3%    4%    5%    6%
    Compression Ratio  10:1  15:1  18:1  19:1  21:1  23:1

L2 distance results for the test data sets at different threshold settings are also included in this section. Figure 5.2 shows the L2 distance results for the Chef animation; Figure 5.3 and Figure 5.4 show the results for the Chicken animation and Dance animation respectively.

Figure 5.2: L2 distances of some sample frames in the reconstructed Chef animation using the delta approach for different threshold settings.

Figure 5.3: L2 distances of some sample frames in the reconstructed Chicken animation using the delta approach for different threshold settings.
Figure 5.4: L2 distances of some sample frames in the reconstructed Dance animation using the delta approach for different threshold settings.

5.2.2 Overall compression ratio comparison: delta vs. octree

Compression ratio is one of the most important measures of the performance of an approach. This section presents the overall compression ratio comparison between the delta approach and the octree approach. Compression ratio comparisons for the Chef animation are shown in Figure 5.5; Figure 5.6 and Figure 5.7 show the comparison results for the Chicken Crossing animation and the Dance animation.

Figure 5.5: Compression ratio comparison between the delta and octree approaches using the Chef animation.

Figure 5.6: Compression ratio comparison between the delta and octree approaches using the Chicken animation.

Figure 5.7: Compression ratio comparison between the delta and octree approaches using the Dance animation.

5.2.3 Frame-wise compression ratio comparison: delta vs. octree

The overall compression ratio comparisons show that the delta coding method does not achieve the consistent level of performance demonstrated by the octree approach. However, in limited cases the delta approach does outperform it, so frame-wise compression ratio comparisons were also conducted. Figure 5.8 shows the frame-wise compression ratio comparison between the delta and octree approaches for the Chef animation given the same threshold settings; in this case the delta approach is better than the octree approach for 18% of the frames. Figure 5.9 shows the frame-wise comparison results for the Chicken Crossing animation, where the delta approach achieves higher compression ratios for 22% of the frames given the same threshold settings. Figure 5.10 shows the comparison results for the Dance animation, where the delta approach is better than the octree approach for 9% of the frames.

Figure 5.8: Frame-wise compression ratio comparison between the delta and octree approaches using the Chef animation given the same threshold setting (threshold = 2%).

Figure 5.9: Frame-wise compression ratio comparison between the delta and octree approaches using the Chicken animation given the same threshold setting (threshold = 5%).
Figure 5.10: Frame-wise compression ratio comparison between the delta and octree approaches using the Dance animation given the same threshold setting (threshold = 1%).

5.3 Conclusions

This chapter presented the delta approach for compressing animated geometric data. The approach is straightforward and easy to implement, but it has a significant limitation: it cannot achieve consistently high compression ratios. The compression ratio comparisons with the octree approach show that the compression ratio curves of the delta approach stop growing after they reach a certain level, so the delta approach cannot be used alone for animated geometry compression. However, the frame-wise compression ratio comparisons between the delta approach and the octree approach showed that the delta approach can achieve higher compression ratios for some frames of an animation sequence. Although the delta approach does not perform well on its own, it may improve the performance of the octree approach when the two are combined. This is not the case for the PCA method: although PCA also exhibits a great deal of variability among frames, it must work over an entire sequence and cannot be efficiently applied to a single differential frame reconstruction. The delta approach, like the octree approach, is a pair-wise approach and can be applied to frames in isolation. In the following chapter, a hybrid approach is presented based on the combination of these two mechanisms.

Chapter 6

6 Hybrid approach

This chapter presents a hybrid coding approach that compresses animated geometric data by combining the octree and delta coding approaches. The motivation for this approach comes from the observation that the encoded octree size is sometimes larger than the encoded size of the delta coding. Section 6.1 presents the encoding process of the hybrid approach, and Section 6.2 presents the results of this approach and the comparison among the octree, delta and hybrid approaches.

6.1 Hybrid encoding process

Figure 6.1: Logical flow of the hybrid encoding process for compressing a 3D animation sequence: each frame is encoded with both the delta coder and the octree coder against the reconstructed previous frame, the smaller result is selected for the encoded animation sequence, and a local frame decoder feeds the reconstructed frame back for the next frame.

To overcome the shortcomings of the octree approach, the hybrid approach is presented in this chapter. The logical flow of the encoding process using the hybrid approach is shown in Figure 6.1. This approach uses a threshold to control the quality of the output: the threshold controls the subdividing process of the octree coding and the number of bits used in the quantization step of the delta coding. For the octree encoding process, if the error (Euclidean distance) of any reconstructed vertex is larger than the threshold, the current bounding box is subdivided and the motion vector computation step is repeated. For the delta encoding process, if the error of any reconstructed vertex is larger than the threshold, the search continues for the quantization level that brings the reconstruction error below the threshold. Therefore, every reconstructed vertex in every frame of the reconstructed animation sequence satisfies the criterion that the error between this vertex and the corresponding original vertex is smaller than the threshold, as shown in Equation 4.1.
The threshold in the hybrid coding method is set to the maximum allowed distance between a reconstructed vertex and its corresponding original vertex. It is expressed as a percentage of the edge length of the initial cubic bounding box of the object, as illustrated in Equation 4.2. To encode each frame except the first frame in the sequence, both the octree coding method and the delta coding method are applied in parallel. If the size of the encoded delta is smaller than that of the encoded octree, the encoded octree is replaced with the encoded delta. A single bit in the output data stream indicates whether octree or delta coding was used; a sketch of this per-frame selection follows.
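The sketch below summarizes the selection step: two coder callables are run on the same frame with the same threshold and the smaller payload is kept together with the selector bit. The interfaces are hypothetical stand-ins for the octree and delta coders of Chapters 4 and 5.

```cpp
#include <cstdint>
#include <functional>
#include <vector>

using Bytes = std::vector<uint8_t>;
using Frame = std::vector<double>;   // flattened vertex coordinates, illustrative

// One encoded frame: the single selector bit plus the chosen payload.
struct EncodedFrame {
    bool usedDelta = false;
    Bytes payload;
};

// Hybrid selection: run both coders against the same threshold and keep the
// smaller output. The callables stand in for the octree and delta encoders.
EncodedFrame encodeHybrid(const Frame& current, const Frame& prevDecoded, double threshold,
                          const std::function<Bytes(const Frame&, const Frame&, double)>& octreeCoder,
                          const std::function<Bytes(const Frame&, const Frame&, double)>& deltaCoder) {
    Bytes octree = octreeCoder(current, prevDecoded, threshold);
    Bytes delta = deltaCoder(current, prevDecoded, threshold);
    EncodedFrame out;
    out.usedDelta = delta.size() < octree.size();
    out.payload = out.usedDelta ? std::move(delta) : std::move(octree);
    return out;
}
```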
6.2 Evaluations

To evaluate the performance of the hybrid approach, the three animation data sets "Chef", "Chicken Crossing" and "Dance" are used, as shown in Figure 4.4, Figure 4.5 and Figure 4.6 respectively. This section presents the compression ratio results of the hybrid approach, reconstructions for different threshold settings and L2 distance results. It also includes the compression ratio comparison among the hybrid, octree and delta approaches given the same threshold settings.

6.2.1 Hybrid approach results

The hybrid method generates a mixed sequence of octree encoded and delta encoded frames. Table 6.1 shows the percentage of octree encoded and delta encoded frames for the different animation sequences and thresholds. The delta encoding contributes significantly to the hybrid method, accounting for up to 41% of the encoded frames. However, as the threshold increases, the percentage of delta coded frames in the encoded sequence decreases, because the performance of the octree approach improves relative to the delta coding approach as the threshold increases.

Table 6.1: Percentage of octree encoded and delta encoded frames in the hybrid method.

    Sequence  Threshold  Octree  Delta
    Chef      1%         59%     41%
              2%         82%     18%
              3%         86%     14%
              4%         90%     10%
    Chicken   2.5%       64%     36%
              5%         78%     22%
              7.5%       85%     15%
              10%        88%     12%
    Dance     1%         91%     9%
              2%         92%     8%
              3%         95%     5%
              4%         97%     3%

This section also presents the compression ratio results for the hybrid approach. Tables 6.2, 6.3 and 6.4 present the compression ratios of the Chef, Chicken Crossing and Dance animation sequences using the hybrid approach. Reconstructed sample frames of the Chef, Chicken Crossing and Dance animations using the hybrid approach are shown in Figures 6.2, 6.3 and 6.4 respectively. L2 distance results for different threshold settings using the hybrid approach are presented in Figures 6.5, 6.6 and 6.7.

Table 6.2: Compression ratios of the Chef animation using the hybrid approach. The Chef animation has 75 frames; each frame has 8162 triangles and 4241 vertices. The object size is 440 units in x, 148 units in y and 455 units in z. The original file size is 3,816,900 bytes. Max Distance threshold = p% x 455 units, where p = 1, 2, ..., 6.

    Max Distance        1%    2%    3%    4%    5%    6%
    Compression Ratio  22:1  32:1  41:1  65:1  78:1  96:1

Table 6.3: Compression ratios of the Chicken Crossing animation using the hybrid approach. The Chicken Crossing animation has 400 frames; each frame has 5664 triangles and 3030 vertices. The object size is 2.556 units in x, 2.23 units in y and 1.07 units in z. The original file size is 14,544,000 bytes. Max Distance threshold = p% x 2.556 units, where p = 2.5, 5, 7.5, 10, 15, 20.

    Max Distance       2.5%   5%    7.5%  10%    15%    20%
    Compression Ratio  29:1  47:1  60:1  77:1  112:1  153:1

Table 6.4: Compression ratios of the Dance animation using the hybrid approach. The Dance animation has 201 frames; each frame has 14118 triangles and 7061 vertices. The object size is 0.0758 units in x, 0.172 units in y and 0.074 units in z. The original file size is 17,031,132 bytes. Max Distance threshold = p% x 0.172 units, where p = 1, 2, ..., 6.

    Max Distance        1%    2%    3%    4%    5%    6%
    Compression Ratio  17:1  28:1  37:1  48:1  62:1  72:1

Figure 6.2: A selected set of frames from the reconstructed Chef animation using the hybrid approach. The top row is the original animation. The reconstructions using Max Distance thresholds of 1% and 5% are shown in rows 2 and 3 respectively, with compression ratios of 22:1 and 78:1.

Figure 6.3: A selected set of frames from the reconstructed Chicken Crossing animation using the hybrid approach. The top row is the original animation. The reconstructions using Max Distance thresholds of 5%, 10% and 15% are shown in rows 2, 3 and 4 respectively, with compression ratios of 47:1, 77:1 and 112:1.

Figure 6.4: A selected set of frames from the reconstructed Dance animation using the hybrid approach. The top row is the original animation. The reconstructions using Max Distance thresholds of 1% and 3% are shown in rows 2 and 3 respectively, with compression ratios of 17:1 and 37:1.

Figure 6.5: L2 distances of some sample frames in the reconstructed Chef animation using the hybrid approach for different threshold settings.

Figure 6.6: L2 distances of some sample frames in the reconstructed Chicken animation using the hybrid approach for different threshold settings.

Figure 6.7: L2 distances of some sample frames in the reconstructed Dance animation using the hybrid approach for different threshold settings.

6.2.2 Compression ratio comparison: hybrid, octree and delta

This section presents the compression ratio comparison among the hybrid, octree and delta approaches. Figure 6.8 shows the results using the Chef animation; Figure 6.9 and Figure 6.10 show the results using the Chicken Crossing and Dance animations respectively.

Figure 6.8: Compression ratio comparison among the hybrid, octree and delta approaches using the Chef animation.

Figure 6.9: Compression ratio comparison among the hybrid, octree and delta approaches using the Chicken animation.
Figure 6.10: Compression ratio comparison among the hybrid, octree and delta approaches using the Dance animation.

6.3 Conclusions

The hybrid approach outperforms the octree-only approach in terms of compression ratio given the same threshold settings. It provides a simple and efficient way to improve the performance of the octree-based motion representation method.

Chapter 7

7 System

A well developed concept in digital video compression is the layering of components into systems. MPEG video compression includes a structure for combining differential frame representations and intraframe coding such that streams can be rejoined and errors do not accumulate over time [78]. This system, the well known combination of I, P and B-frames, allows MPEG to exploit the massive interframe correlations of digital video while still providing occasional points where the stream can be picked back up for decoding. In addition, the MPEG systems standard defines another layer of stream encoding that allows a video stream to be merged with additional media streams such as audio [79]. In this chapter, a systems layer is proposed for animated geometry compression that supports practical application of these methods in distributed systems, including stream joining, random access in sequences, merging with additional multimedia streams, and playback controls. It describes the system implementation of a frame-based geometry system with particular emphasis on support for practical playback requirements, including stream joining, playback controls and error correction. The details of such a system are contrasted with the mature MPEG systems layer designs. A system block diagram is illustrated in Figure 7.1.

7.1 System design

Similar to MPEG, the complete hybrid compression system has I-frames and P-frames. I-frames (intra-coded frames) are coded independently using a static geometry compression method, as discussed in Chapter 3. P-frames (octree or delta predicted frames) are inter-coded frames, predicted from temporally preceding I-frames or P-frames in the animation sequence as described in Chapters 4, 5 and 6, and encoded using the hybrid coding method.

MPEG gains an advantage from the recognition that motion is often continuous, so the motion from one frame to another may be deduced from the location of content before and after the frame. The MPEG B-frames allow interpolation of motion between preceding and succeeding video frames. Though this thesis does not explore the concept in detail, it is easy to propose a B-frame for animated geometry compression. B-frames would be predicted from the nearest preceding and following I or P-frames in the sequence and could be encoded with an interpolation method. As an example, a simple fraction can indicate the distance the vertices are expected to move along a linear or cubic path between the preceding and succeeding frame vertex locations; a differential of that motion (the error after the prediction) is then encoded using the hybrid encoding method. This closely mirrors the concept of DCT compression of residuals in MPEG B-frames, and a sketch of this prediction is given below.
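As a hedged illustration of the B-frame idea sketched above, and not something implemented in this thesis, the following fragment computes a linear interpolation prediction between the two surrounding reference frames and the residual that would then be handed to the hybrid coder. The fraction t and all names are illustrative assumptions.

```cpp
#include <cstddef>
#include <vector>

struct Vec3 { double x, y, z; };

// Linear B-frame prediction: each vertex is predicted a fraction t of the way
// from the preceding reference frame to the following one, and only the
// residual (actual minus predicted) would then be encoded with the hybrid coder.
std::vector<Vec3> bFrameResidual(const std::vector<Vec3>& prevRef,
                                 const std::vector<Vec3>& nextRef,
                                 const std::vector<Vec3>& actual, double t) {
    std::vector<Vec3> residual(actual.size());
    for (std::size_t i = 0; i < actual.size(); ++i) {
        Vec3 pred{ (1 - t) * prevRef[i].x + t * nextRef[i].x,
                   (1 - t) * prevRef[i].y + t * nextRef[i].y,
                   (1 - t) * prevRef[i].z + t * nextRef[i].z };
        residual[i] = { actual[i].x - pred.x,
                        actual[i].y - pred.y,
                        actual[i].z - pred.z };
    }
    return residual;
}
```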
I-frames are required to support random access playback on the decoder side; decompression must always begin with an I-frame for any sequence. I-frames also allow stream joining, the acquisition in sequence of a broadcast stream, and recovery after errors. Broadcast multimedia streams commonly use forward error correction (FEC) to mask the effects of errors; if errors exceed the correction threshold, I-frames provide a later place to resume decoding. Increasing the number of I-frames lowers the compression ratio, since intra-coded frames are considerably larger than inter-coded frames, while having more inter-coded frames increases the compression ratio at the risk of reducing the reconstructed animation quality. The distribution of I, P and B-frames in MPEG video is typically application dependent, allowing applications to choose the granularity of stream random access and the quality of the content for a given data rate. The same idea clearly applies to animated geometry compression.

Figure 7.1: System diagram. The encoder decides the insertion criteria, encodes the connectivity, applies static geometry compression to I-frames and the hybrid approach to P-frames, and stores the encoded media sequence; the decoder retrieves records over the network and decodes I-frames with the static method and P-frames with the hybrid approach.

Animated geometry compression methods actually induce two possible variants of the I-frame. The data for a given frame consists of vertex and connectivity information. A complete intra-coded frame includes both elements, allowing the stream to be joined in progress and all data needed for rendering to be decoded. However, in many cases the connectivity information remains constant. This leads to a reduced I-frame variant (the I'-frame) consisting only of vertex data. Such a frame does not support stream joining, but it allows error recovery if the geometry does not change during the stream loss, and it supports random access, since the connectivity can be stored in the file header or associated with some grouping of frames.

A hybrid animated geometry compression system works as follows. Given a 3D animated sequence, the user first specifies some parameters, namely the I-frame, I'-frame, P-frame and B-frame intervals or insertion criteria. Insertion criteria might include a maximum spacing and the automatic insertion of I-frames following geometry changes (due to cuts or the inclusion of new objects). Connectivity may be assumed to remain unchanged for the entire sequence, so it can be encoded and transmitted only once, though new connectivity can always be transmitted using an I-frame. The connectivity information is encoded with each I-frame where connectivity has changed or where stream joining capabilities are to be supported. Other information, such as the number of vertices and the number of frames, is also stored with the I-frame. According to these parameters, each frame is encoded as an I-frame, I'-frame, P-frame or possibly a B-frame, as sketched below: the static geometry compression method is applied to I-frames and the hybrid approach to P-frames.
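One possible form of this frame-type decision is sketched below. The criteria names and default intervals are assumptions made for illustration, since the thesis only specifies that insertion criteria might include a maximum spacing and forced I-frames on connectivity changes.

```cpp
enum class FrameType { I, IPrime, P };

// Illustrative insertion criteria: a maximum spacing between full I-frames,
// plus a reduced I'-frame (vertex data only) at a shorter interval when the
// connectivity has not changed. Parameter names are assumptions.
struct InsertionCriteria {
    int maxIFrameSpacing = 60;       // full I-frame at least this often
    int maxIPrimeSpacing = 15;       // vertex-only refresh in between
};

FrameType chooseFrameType(int framesSinceI, int framesSinceIPrime,
                          bool connectivityChanged, const InsertionCriteria& c) {
    // A connectivity change (cut, new object) forces a full intra-coded frame.
    if (connectivityChanged || framesSinceI >= c.maxIFrameSpacing)
        return FrameType::I;
    if (framesSinceIPrime >= c.maxIPrimeSpacing)
        return FrameType::IPrime;
    return FrameType::P;             // differential (hybrid-coded) frame
}
```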
After all the frames are encoded, the encoded I and P-frame sequence is stored to media, and the encoded data can be transmitted over a network to the decoder side. The decoder reads the stream data from the encoder side. It first reads the insertion criteria information to determine parameters such as the I-frame, P-frame and B-frame intervals, and it restores the connectivity information. It then retrieves an encoded I-frame or P-frame record and decodes it: an I-frame is decoded using the static geometry compression method, a P-frame using the hybrid approach. After a frame has been decoded, it can be rendered on the screen or saved to a file. This process is repeated until all the frame records have been retrieved from the network.

7.2 System features

As a complete hybrid compression system similar in spirit to the MPEG system [80], the system design described in this chapter supports several features that are desirable for practical applications.

• Stream joining. Stream joining allows the stream to be joined in progress, for example when the user switches the geometry channel. I-frames are used to support this feature.

• Random access. Random access requires that a compressed bit stream be accessible in its middle [80]. It is an essential feature for a frame-based compression system and is achieved by providing access points within the sequence; intra-coded frames serve as these access points. Any full system needs to support the transmission of intra-coded data in addition to inter-coded data in order to handle scene changes and new content entering the scene. To support random access, the nearest I-frame to the requested access point is located and the decoder jumps directly to that I-frame's record.

• Interleaved audio support. It is possible to add audio support to the system. At the encoder, the 3D animation data and audio are encoded separately and then interleaved using a system layer standard such as that defined for the MPEG standards; the decoder decodes audio and animation from the interleaved stream. Time division multiplexing can be used if the delay from sender to receiver is fixed. However, in a packetized system the delay is usually not fixed, and the methods described in this thesis are fundamentally variable bit-rate mechanisms, so some means of synchronizing audio and animation packets is required. The solution used by the MPEG-2 system can be adopted [47]: a Decoding Time Stamp (DTS) indicates when a unit is to be decoded, and a Presentation Time Stamp (PTS) indicates when a unit is to be rendered, displayed or played. When the System Time Clock (STC) advances to the DTS, the unit is decoded; likewise, when the STC advances to the PTS, the unit is presented. This relatively simple scheme has been applied to a variety of media types and compression methods.

• Error correction. During the transmission of an encoded animation, errors may occur, and an error can be propagated by the differential hybrid decoding. However, I-frames correct previous errors and restart the correct animation sequence. How soon the system recovers from an error depends on the frequency of I-frames.

• VCR controls. The hybrid compression system described in this chapter can support VCR controls such as pause and fast forward. Fast forward is essentially a more demanding form of random access, and several levels of it can be supported. The fastest level decodes only I-frames and skips all inter-coded frames; if B-frames are included, the next level decodes P-frames and skips B-frames.

Chapter 8

8 Conclusions and Future work

This chapter presents conclusions that can be drawn from this thesis and some possible extensions to the current work.
This thesis has presented three novel approaches to efficiently represent 3D animation sequences: an octree-based approach, a delta coding approach and a hybrid approach. These approaches have been compared and evaluated using the "Chef", "Chicken Crossing" and "Dance" animation sequences.

8.1 Conclusions

This thesis began with a review of the basic techniques for data compression and of notable work in geometry compression and animated geometry compression. It then described in detail the implementation of a novel octree-based approach that efficiently represents the data coherence in a 3D animation sequence. In order to evaluate this approach, the PCA method was implemented and compared to the octree-based method in terms of compression ratio and L2 distance. The results show that the octree-based approach is significantly better in L2 distance at the same compression ratio. Compared to a purely static mechanism, the octree approach achieves much higher compression ratios.

A delta coding mechanism was also introduced that represents the data coherence in a 3D animation sequence and is efficient as a compression mechanism on some pair-wise frame sequences. This approach was evaluated by comparing its performance to the octree-based approach in terms of overall and frame-wise compression ratios. In some pair-wise frame cases it did outperform the octree-based method; however, it is not suitable as a general solution because its compression ratio does not scale with the number of vertices in a frame. Although the delta coding approach does not perform well on its own, it can improve the performance of the octree-based approach when the two are combined.

A hybrid approach was then presented that combines the octree-based method and the delta coding method. The results show that the hybrid approach outperforms both the octree-based and delta-encoding approaches in terms of compression ratio. This thesis also described the system implementation of a frame-based geometry system with particular emphasis on support for practical playback requirements, including stream joining, playback controls and error correction; the details of such a system were contrasted with the mature MPEG systems layer designs.

8.2 Future work

This section presents some possible extensions to the current work.

• A better 3D visual quality measure. Finding a good visual quality measure is a difficult task. In this thesis, the L2 distance was used as a distortion measurement, but it is not an accurate visual quality measure; there are many cases in which a reconstruction with a larger L2 distance has better visual quality. A better measure may need to take more factors into account, such as properties of the human visual system and characteristics of the rendered images. In addition to comparing the distortions of individual frames in an animation sequence, there is also a need to measure the quality of the reconstructed motion. Good visual quality measures may help the compression system adjust its parameters to achieve improved rate/distortion performance.

• Several variations on the octree design. To increase the performance of the current design, modifications of the splitting criteria deserve further exploration. Alternative ways to estimate and model the motion of a cell may also be worth exploring, such as compact quaternion representations and cubic interpolations.
• Motion clustering. The concept of clustering vertices so as to partition the data into groups with greater motion coherence is another area that deserves exploration. Compression methods take advantage of the motion coherence among vertices. This is clearly the case for vertices within an object, but it may not be the case among independently moving objects. Clustering the data into groups that mimic independently moving objects may allow better motion modeling within the groups, thereby increasing quality.

• Progressive compression. To support a system that can provide partial data transmission at reduced resolution, the octree-based implementation needs to be modified. One possible approach is to store motion vectors not only at the leaves but also at the internal nodes of an octree; a child node would construct its motion vectors based on its parent's reconstructed frame. One research issue is how to traverse the octree to provide a smooth and balanced increase of LOD for all parts of an object.

• Dynamic changes in topology. Connectivity is assumed to remain constant in this thesis; future work needs to address dynamic changes in topology.

Animated geometry compression is a relatively new area, but one with great promise. Numerous applications will be enabled by practical, high quality solutions to the problem. This thesis presents methods that significantly improve on existing methods, particularly in the areas of design simplicity, performance and frame-to-frame consistency, and a preliminary description of how to integrate this work into a system is presented. It is not assumed that this is the final solution to the problem, but the general method is very promising as a framework for developing a solution with wide applicability.

References

[1] "Nvidia brings back SLI," http://www.gamespot.com/news/2004/06/28/news6101476.html

[2] F. Sparacino, K. Hall, C. Wren, G. Davenport, and A. Pentland, "Improvisational Theater Space," Proceedings of The Sixth Biennial Symposium for Arts and Technology, New London, CT, 1997.

[3] N. Foster and R. Fedkiw, "Practical Animation of Liquids," Proceedings of ACM SIGGRAPH '01, 2001.

[4] N. Foster and D. Metaxas, "Modeling Water for Computer Animation," Communications of the ACM, vol. 43, no. 7, pp. 60-67, 2000.

[5] M. Deering, "Geometry Compression," Proceedings of ACM SIGGRAPH '95, pp. 13-20, 1995.

[6] G. Taubin and J. Rossignac, "Geometric compression through topological surgery," ACM Transactions on Graphics, vol. 17, no. 2, pp. 84-115, 1998.

[7] G. Taubin and J. Rossignac, "3D Geometry Compression," ACM SIGGRAPH '98 Course Notes 21, Orlando, Florida, 1998.

[8] C. Touma and C. Gotsman, "Triangle Mesh Compression," Proceedings of the 24th Conference on Graphics Interface (GI-98), pp. 26-34, San Francisco, 1998.

[9] J. Rossignac, "Edgebreaker: Connectivity Compression for Triangle Meshes," IEEE Transactions on Visualization and Computer Graphics, vol. 5, no. 1, pp. 47-61, 1998.

[10] H. Hoppe, "Progressive meshes," Proceedings of ACM SIGGRAPH '96, vol. 30, pp. 99-108, New Orleans, Louisiana, 1996.

[11] R. Pajarola and J. Rossignac, "Compressed progressive meshes," IEEE Transactions on Visualization and Computer Graphics, vol. 6, no. 1, 2000.

[12] J. Li and C. C. Kuo, "Progressive coding of 3D graphic models," Proceedings of IEEE Multimedia and Systems, pp. 1052-1063, 1998.
[13] C. L. Bajaj, V. Pascucci, and G. Zhuang, "Progressive compression and transmission of arbitrary triangle meshes," Proceedings of IEEE Visualization '99, pp. 67-72, 1999.

[14] D. Cohen-Or, D. Levin, and O. Remez, "Progressive compression of arbitrary triangular meshes," Proceedings of IEEE Visualization '99, pp. 67-72, Los Alamitos, California, 1999.

[15] G. Taubin, A. Gueziec, W. Horn, and F. Lazarus, "Progressive Forest Split Compression," Proceedings of ACM SIGGRAPH '98, pp. 123-132, 1998.

[16] P.-M. Gandoin and O. Devillers, "Progressive lossless compression of arbitrary simplicial complexes," Proceedings of ACM SIGGRAPH '02, pp. 372-379, Texas, 2002.

[17] A. Khodakovsky, P. Schroder, and W. Sweldens, "Progressive geometry compression," Proceedings of ACM SIGGRAPH '00, pp. 271-278, 2000.

[18] J. Popovic and H. Hoppe, "Progressive simplicial complexes," Proceedings of ACM SIGGRAPH '97, pp. 217-224, 1997.

[19] F. Bossen, "On The Art Of Compressing Three-Dimensional Polygonal Meshes And Their Associated Properties," Ph.D. Thesis, École Polytechnique Fédérale de Lausanne (EPFL), 1999.

[20] D. Shikhare, "State of the Art in Geometry Compression," National Centre for Software Technology, 2000.

[21] D. Luebke, "A survey of polygonal simplification algorithms," Dept. of Computer Science, University of North Carolina, Chapel Hill, Tech. Report TR97-045, 1997.

[22] J. E. Lengyel, "Compression of Time-Dependent Geometry," Proceedings of the ACM Symposium on Interactive 3D Graphics, pp. 89-95, New York, ACM Press, 1999.

[23] M. Alexa and W. Müller, "Representing Animations by Principal Components," Computer Graphics Forum, vol. 19, no. 3, pp. 411-418, 2000.

[24] L. Ibarria and J. Rossignac, "Dynapack: Space-Time Compression of the 3D animations of triangle meshes with fixed connectivity," Proceedings of the Eurographics/SIGGRAPH Symposium on Computer Animation, 2003.

[25] H. M. Briceño, P. V. Sander, L. McMillan, S. Gortler, and H. Hoppe, "Geometry Videos: A new representation for 3D Animations," Proceedings of the Eurographics/SIGGRAPH Symposium on Computer Animation (SCA '03), San Diego, California, 2003.

[26] Z. Karni and C. Gotsman, "Compression of soft-body animation sequences," Computers & Graphics, vol. 28, pp. 25-34, 2004.

[27] X. Gu, S. Gortler, and H. Hoppe, "Geometry images," Proceedings of ACM SIGGRAPH '02, pp. 355-361, 2002.

[28] J. Zhang and C. B. Owen, "Octree-based Animated Geometry Compression," Proceedings of the Data Compression Conference (DCC '04), pp. 508-517, Snowbird, UT, 2004.

[29] J. Zhang and C. B. Owen, "Hybrid Coding for Animated Polygonal Meshes: Combining Delta and Octree," Proceedings of the IEEE International Conference on Information Technology (ITCC '05), Las Vegas, NV, 2005.

[30] D. Salomon, Data Compression: The Complete Reference, 2nd edition, 2000.

[31] K. Sayood, Introduction to Data Compression, 2nd edition, Morgan Kaufmann Publishers, 2000.

[32] T. C. Bell, J. G. Cleary, and I. H. Witten, Text Compression, Prentice Hall, 1990.

[33] A. Moffat and A. Turpin, Compression and Coding Algorithms, Kluwer Academic Publishers, 2002.

[34] C. E. Shannon, "A Mathematical Theory of Communication," The Bell System Technical Journal, vol. 27, pp. 379-423 and 623-656, 1948.

[35] J. Rissanen and G. G. Langdon, "Arithmetic coding," IBM Journal of Research and Development, vol. 23, no. 2, pp. 149-162, 1979.

[36] R. N. Bracewell, The Fourier Transform & Its Applications, 3rd edition, McGraw-Hill Science/Engineering/Math, 1999.

[37] E. Brigham, Fast Fourier Transform and Its Applications, 1st edition, Prentice Hall, 1988.
[38] O. Rioul and M. Vetterli, "Wavelets and signal processing," IEEE Signal Processing Magazine, vol. 8, no. 4, pp. 14-38, 1991.

[39] A. Graps, "An Introduction to Wavelets," IEEE Computational Science & Engineering, vol. 2, no. 2, pp. 50-61, June 1995.

[40] C. K. Chui, An Introduction to Wavelets: Wavelet Analysis and its Applications, Academic Press, 1992.

[41] R. Polikar, "The Wavelet Tutorial," http://users.rowan.edu/~polikar/WAVELETS/WTtutorial.html

[42] A. Gersho and R. M. Gray, Vector Quantization and Signal Compression, Kluwer Academic Press, 1992.

[43] Y. Linde, A. Buzo, and R. M. Gray, "An Algorithm for Vector Quantizer Design," IEEE Transactions on Communications, vol. 28, pp. 84-95, 1980.

[44] J. Kominek, "Introduction to Fractal compression," http://www.faqs.org/faqs/compression-faq/part2/section-8.html

[45] W. B. Pennebaker and J. L. Mitchell, JPEG: Still Image Data Compression Standard, 1st edition, 1992.

[46] J. L. Mitchell, W. B. Pennebaker, C. E. Fogg, and D. J. LeGall, MPEG Video Compression Standard, Chapman & Hall, 1996.

[47] B. G. Haskell, A. Puri, and A. N. Netravali, Digital Video: An Introduction to MPEG-2, Kluwer Academic Publishers, 1996.

[48] Z. Karni and C. Gotsman, "Spectral compression of mesh geometry," Proceedings of ACM SIGGRAPH '00, pp. 279-286, 2000.

[49] S. Gumhold and W. Strasser, "Real time compression of triangle mesh connectivity," Proceedings of ACM SIGGRAPH '98, pp. 133-140, 1998.

[50] M. Isenburg and J. Snoeyink, "Face Fixer: Compressing Polygon Meshes With Properties," Proceedings of ACM SIGGRAPH '00, pp. 263-270, New Orleans, Louisiana, 2000.

[51] D. King, J. Rossignac, and A. Szymczak, "Connectivity compression for irregular quadrilateral meshes," Georgia Tech, Tech. Rep. TR-99-36, GVU, 1999.

[52] O. Staadt, M. Gross, and R. Weber, "Multiresolution Compression and Reconstruction," Proceedings of IEEE Visualization '97, pp. 337-364, 1997.

[53] C. Zach and K. Karner, "Progressive Compression of Visibility Data for View-dependent Multiresolution Meshes," Proceedings of WSCG '2003, Czech Republic, 2003.

[54] H. Hoppe, "View-dependent refinement of progressive meshes," Proceedings of ACM SIGGRAPH '97, pp. 189-198, 1997.

[55] H. Hoppe, "Smooth view-dependent level-of-detail control and its application to terrain rendering," Proceedings of IEEE Visualization '98, pp. 35-42, 1998.

[56] D. Schmalstieg and G. Schaufler, "Smooth Levels of Detail," Proceedings of the IEEE Virtual Reality Annual International Symposium (VRAIS '97), pp. 12-19, New Mexico, 1997.

[57] T. Funkhouser, "Advanced Computer Graphics Lecture Notes: Mesh Representations," http://www.cs.princeton.edu/courses/archive/fall02/cs526/lectures/reps.pdf

[58] "The Stanford 3D Scanning Repository," http://graphics.stanford.edu/data/3Dscanrep/

[59] C. Gotsman, S. Gumhold, and L. Kobbelt, "Simplification and compression of 3D meshes," Proceedings of the European Summer School on Principles of Multiresolution in Geometric Modelling (PRIMUS), Munich, 2001.

[60] A. Gueziec, F. Bossen, G. Taubin, and C. Silva, "Efficient Compression of Non-manifold Polygonal Meshes," Proceedings of IEEE Visualization '99, 1999.

[61] R. Bar-Yehuda and C. Gotsman, "Time/space tradeoffs for polygon mesh rendering," ACM Transactions on Graphics, vol. 15, no. 2, pp. 141-152, 1996.

[62] J. Neider, T. Davis, and M. Woo, OpenGL Programming Guide, Addison-Wesley, 1993.

[63] H. Hoppe, T. DeRose, T. Duchamp, J. McDonald, and W. Stuetzle, "Mesh optimization," Proceedings of ACM SIGGRAPH '93, pp. 19-26, 1993.
[64] M. Davis and M. Tuceryan, "Coding of Facial Image Sequences by Model-Based Optical Flow," Proceedings of the International Workshop on Synthetic-Natural Hybrid Coding and Three Dimensional Imaging (IWSNHC3DI '97), Greece, 1997.

[65] H. Tao, T. S. Huang, H. H. Chen, and T.-P. J. Shen, "Data compression for animated three dimensional objects," Rockwell Science Center, Inc., 1998.

[66] D. Cohen-Or, Y. Mann, and S. Fleishman, "Deep Compression for Streaming Texture Intensive Animations," Proceedings of ACM SIGGRAPH '99, 1999.

[67] A. Fournier and D. Y. Montuno, "Triangulating simple polygons and equivalent problems," ACM Transactions on Graphics, vol. 3, pp. 153-174, 1984.

[68] R. Seidel, "A simple and fast incremental randomized algorithm for computing trapezoidal decompositions and for triangulating polygons," Computational Geometry: Theory and Applications, vol. 1, no. 1, pp. 51-64, 1991.

[69] A. Glassner, T. McClure, S. Benza, and M. V. Langeveld, "Chicken Crossing," SIGGRAPH Video Review, 1996.

[70] C. L. Jackins and S. L. Tanimoto, "Oct-trees and their use in representing three-dimensional objects," Computer Graphics and Image Processing, vol. 14, pp. 249-270, 1980.

[71] N. Ahuja and C. Nash, "Octree Representations of Moving Objects," Computer Vision, Graphics and Image Processing, vol. 26, pp. 207-216, 1984.

[72] "3D Trilinear Interpolation," http://www.siggraph.org/education/materials/HyperVis/vised/VisTech/VisProcess/VisInterp/interp3.html

[73] R. Pozo, "Template Numerical Toolkit," http://math.nist.gov/tnt/index.html

[74] I. Witten, R. Neal, and J. Cleary, "Arithmetic Coding for Data Compression," Communications of the ACM, vol. 30, no. 6, pp. 520-540, 1987.

[75] A. Moffat, R. M. Neal, and I. H. Witten, "Arithmetic Coding Revisited," ACM Transactions on Information Systems, vol. 16, no. 3, pp. 256-294, 1998.

[76] "GNU Triangulated Surface Library v. 0.7.2," http://gts.sourceforge.net/

[77] P. Cignoni, C. Rocchini, and R. Scopigno, "Metro: Measuring Error on Simplified Surfaces," Computer Graphics Forum, vol. 17, no. 2, pp. 167-174, 1998.

[78] "Coding of moving pictures and associated audio for digital storage media at up to about 1.5 Mbit/s - Part 2: Video," ISO/IEC 11172-2, 1993.

[79] "Coding of moving pictures and associated audio for digital storage media at up to about 1.5 Mbit/s - Part 1: Systems," ISO/IEC 11172-1, 1993.

[80] D. Le Gall, "MPEG: A Video Compression Standard for Multimedia Applications," Communications of the ACM, vol. 34, no. 4, pp. 46-58, 1991.