This is to certify that the

thesis entitled

ROBUSTNESS OF TEC SPEECH WATERMARKING
TO CROPPING AND ADDITIVE NOISE

presented by

Aparna Gurijala

has been accepted towards fulﬁllment
of the requirements for

Master's degree in Electrical Eng

Major professor


ROBUSTNESS OF TEC SPEECH WATERMARKING TO
CROPPING AND ADDITIVE NOISE

By

Aparna Gurijala

A THESIS

Submitted to
Michigan State University
in partial fulﬁllment of the requirements
for the degree of

MASTER OF SCIENCE
Department of Electrical and Computer Engineering

2001

ABSTRACT

ROBUSTNESS OF TEC SPEECH WATERMARKING TO
CROPPING AND ADDITIVE NOISE

By

Aparna Gurijala

The widespread use of the Internet has created a need for technologies for the
protection of copyrighted digital information. Digital watermarking is one such
technology in which a preferably imperceptible signal (watermark) is embedded into a
copyrighted host signal. Digital watermarks are prone to a wide range of “attacks” and
other forms of distortion. In this work, the robustness of a new watermarking method
based on transform encryption coding (TEC) to cropping and additive noise is
investigated.

Experiments were conducted to test the robustness of TEC speech watermarking
to additive noise under different conditions including different SNRs and watermark
masking algorithm parameters. Although a cropping attack is easy to implement, the
resulting desynchronization severely hinders watermark detection and recovery. A
dynamic programming (DP) based algorithm for the detection of cropped speech samples
and reconstruction of the cropped stego-signal to enable watermark recovery has been
developed. Implementation details of the DP algorithm and performance under different
environmental conditions are presented. Factors influencing the robustness of TEC
speech watermarking are analyzed.

ACKNOWLEDGMENTS

I would like to acknowledge Dr. J.R. Deller, my advisor, for his invaluable
guidance, encouragement and support. Special thanks to Dr. Deller for his very helpful
remarks and suggestions that greatly contributed to my learning and understanding.
Special thanks to Dr. Seadle and Dr. Radha for their consideration, patience and effort.
The time spent by Dr. Deller, Dr. Seadle and Dr. Radha to ensure the completion of my
thesis is truly appreciated.

Personally, I would like to thank my parents for their love and encouragement. My
thanks to all my friends for their kindness and help.

TABLE OF CONTENTS

LIST OF TABLES
LIST OF FIGURES

CHAPTER 1
INTRODUCTION
Watermarking for the National Gallery of the Spoken Word
A typical watermarking system
Properties of digital watermarks
Classification of watermarking techniques
Attacks on watermarking systems
Document overview

CHAPTER 2
DIGITAL WATERMARKING OF SPEECH USING TEC
Watermarking algorithm
Correlation detector
Security and robustness

CHAPTER 3
ROBUSTNESS STUDY
Additive noise
Cropping
Algorithm for watermark recovery from cropped speech
Memory and computational requirements
Cropping in the presence of additive noise
Counterfeit attacks

CHAPTER 4
IMPLEMENTATION DETAILS, RESULTS AND CONCLUSIONS
Robustness testing engine
Robustness to additive noise
Robustness to cropping
Implementation details of the modified DP algorithm
Experimental results
Robustness to cropping in the presence of noise
Conclusions

CHAPTER 5
FUTURE WORK

REFERENCES

LIST OF TABLES

1. Quality rating
2. Robustness to Gaussian noise (constant gain factor)
3. Robustness to Gaussian noise (adaptive gain factor)
4. Robustness to uniformly distributed noise (constant gain factor)
5. Robustness to uniformly distributed noise (adaptive gain factor)
6. Robustness to cropping and additive noise (adaptive gain factor)

LIST OF FIGURES

1. A typical watermarking system
2. Watermarking process
3. Watermark recovery
4. Encryption using quasi m-arrays
5. Watermarking selectively to watermarking the entire record
6. Encryption and decryption processes
7. Noise amplitude distribution
8. Cropping in speech and images
9. Dynamic programming approach to recovering cropped speech samples
10. Robustness of TEC watermarking to Gaussian noise (constant gain factor)
11. Robustness of TEC watermarking to Gaussian noise (adaptive gain factor)
12. Robustness of TEC watermarking to uniformly distributed noise (constant gain factor)
13. Modified implementation of DP algorithm
14. DP algorithm for watermark recovery

Chapter 1

INTRODUCTION

1.1 Watermarking for the National Gallery of the Spoken Word

The National Gallery of the Spoken Word (NGSW) [1] project is creating an
online database of spoken word collections, spanning the 20th century. These collections
are mainly drawn from Michigan State University’s Vincent Voice Library, MSU
Museum, Chicago Historical Society and Northwestern University. They range from Thomas
Edison’s first cylinder recordings to the voices of Theodore Roosevelt, Florence
Nightingale, and Babe Ruth. The aural resources for the NGSW are in digital form.

Representation of information in digital form has many properties that make it
preferable to analog forms. An unlimited number of digital copies can be made with ease
and accuracy. This beneﬁt, however, has been a cause of concern for intellectual property
owners and content providers. The widespread use of the Internet coupled with the
developments in compression techniques facilitates fast and efﬁcient distribution of
digital content. However, while easy to implement, distribution of copyrighted digital
information without authorization threatens intellectual property rights. Copyright laws
protecting analog information are inapplicable to digital information. As a result, there is
a need to develop techniques for protecting the ownership of digital content and for
tracking intellectual piracy.

Digital watermarking is one such technique. Digital watermarking is the process
of embedding a permanent and preferably imperceptible signal into a copyrighted host
signal. The embedded signal may typically convey information about the owner, author
or carrier. More information about the need for watermarking in the NGSW project is
found in [7].

The concept of watermarking has its origins in the ancient Greek technique of
steganography or “covered writing” — interpreted as hiding information in other
information. Detailed information on the history of steganography and watermarking is
found in [8]. Applications of digital watermarking include copyright protection,
ﬁngerprinting, authentication, copy control, owner identiﬁcation, broadcast monitoring,
security control and tamper prooﬁng. Watermarking can be used to protect virtually any
form of digital information including images, speech, music, and video.

Most digital watermarking schemes have been developed for images. Audio
watermarking schemes include the method due to Boney et al. [9] in which the
watermark is generated by filtering a PN-sequence with a filter that approximates the
frequency masking characteristics of the human auditory system, and then accounting for
temporal masking. Bassia and Pitas [10] developed an audio watermarking method that
modifies the temporal characteristics of the audio signal in accordance with a seed
(watermark key) known only to the copyright owner. In [11] an audio watermarking
technique operating in the Fourier domain is presented. Bender et al. [12] use
homomorphic signal processing techniques to place information imperceptibly into audio
streams by the introduction of closely spaced echoes. Luy et al. [13] proposed a
multi-purpose audio watermarking scheme that embeds two complementary watermarks: one
for audio authentication and the other for the detection of tampered regions. The spread
spectrum watermarking technique developed by Cox et al. [14] can be applied to audio,
image, video and multimedia data.

This paper is concerned with the robustness of the digital speech watermarking
technique employing transform encryption coding (TEC) [2, 3].

1.2 A typical watermarking system

A typical watermarking system consists of a watermark generator, an embedder, a
watermark detector and possibly a component that distorts the stego-signal (deﬁned
below).

[Figure 1: block diagram. Blocks: watermark generator, watermark embedder, distortion
introducer, and watermark detection or recovery component. Labeled signals: signal and
key (generator inputs), watermark, cover-signal, stego-key, stego-signal, distorted
stego-signal, cover-signal and key* (detector inputs), and the output, a recovered
watermark or a yes/no decision.]

* Same as the one used for watermark generation.

Figure 1. A typical watermarking system

Due to the wide variations in the watermarking techniques, it is difficult to
generalize and characterize a “typical” watermarking scheme. To account for the vast
variations in watermarking approaches, certain inputs are indicated by dotted lines in
Figure 1, meaning that they may not be present in all techniques. A signal for which
copyright protection must be provided is called a cover-signal. A watermark is a signal
that is embedded into the cover-signal for this purpose in accordance with the stego-key¹.
The stego-key ensures the imperceptibility of the watermark and thus introduces
additional protection, by making the watermark location unknown. A watermark may
take different forms: an encrypted or modulated speech sequence or image, least
significant bit manipulations, or a pseudo-random sequence. As a result, the inputs to
watermark generators are highly diverse. For example, in the audio watermarking
technique proposed by Bassia and Pitas [10] the input signal is the cover-signal itself, and
the key is a randomly generated constant. In the spread spectrum watermarking scheme
of Cox et al. [14], the input signal is the same as the key and comprises a pseudo-random
sequence.

Watermark embedding techniques may be additive, multiplicative or
quantization-based [15] and may operate in the space or time domain, or in some
transform domain. The output of a watermark embedder is the stego-signal. The stego-
signal should be perceptibly similar to the cover-signal, in spite of the presence of the
watermark. Watermark detectors are classiﬁed as type I or type II. Type I detectors
require knowledge of the cover-signal to extract the watermark from the stego-signal.
Type II detectors provide a yes or no answer to the question of whether the watermark is
present in a distorted stego-signal. In the motivating application for this work, the TEC
speech watermarking system employs a type I detector.

Typically the term “watermark” is used to refer to the processed (modulated,
encrypted, etc.) form of the original signal to be embedded in the cover-signal as
indicated in Figure 1. However, in this document, “watermark” will refer to the
unprocessed watermark signal. The result of the processing step will be called the
encrypted watermark.

¹ In the TEC speech watermarking technique the stego-key is the constant or adaptive gain factor of the
masking algorithm [2].

1.3 Properties of digital watermarks

Some essential properties of watermarks are as follows:

• Perceptual transparency: Inserting a watermark into the host or cover-signal will
alter the cover-signal in some way. If the amount of alteration does not introduce any
perceptual degradation, then the watermark is said to be perceptually transparent
[16, 17]. This ensures that the value of the original material is not reduced by the
presence of the watermark.

• Robustness: Robustness refers to the degree to which a watermark can survive an
“attack” or distortion. An attack is a deliberate attempt to remove the watermark or
hinder its recovery. It should not be possible to destroy the watermark without
simultaneously destroying the cover-signal. A successful attack is one that removes
the watermark or obstructs the recovery process without causing perceptual
degradation of the cover-signal [16].

• Unambiguity: A recovered watermark should unambiguously identify the owner of
the watermarked material.

• Security: Encryption keys, if any, used in the watermarking process, and keys used
for watermark generation, should be very difficult to predict, guess, or otherwise
ascertain.

Another important property is the watermark bit rate [17]. This is determined by
the amount of information contained in the watermark (watermark payload) and the
amount of data needed to embed one unit of watermark information (watermark
granularity) while ensuring perceptual transparency.

For greater robustness it is desirable to have stronger components of the
watermark in the stego-signal. This in turn will affect the perceptual transparency of the
signal containing the watermark (stego-signal). Thus there are trade-offs among the
various watermark properties that must be considered in light of the requirements of a
particular watermarking application. Further, for an application like fragile watermarking
[19] robustness is not desirable. In such a case, fragile watermarks that get destroyed by
some or all of the transformations are used. The degree of adherence to the ideal
properties is dictated by the requirements of the particular application and the availability
of resources. More information is found in [16]-[19].

1.4 Classiﬁcation of watermarking techniques

Watermarking techniques are classiﬁed according to the domain in which the
watermark is inserted, the requirements of the watermark detection process, or the
availability of the keys.

Watermarking schemes are categorized as restricted- or unrestricted-key
watermarking schemes based on the relative availability of the key(s) [20]. Schemes in
which the keys are available to all the watermark detectors are called unrestricted-key
schemes. In the case of restricted-key schemes, the knowledge of keys is conﬁned to a
small number of detectors. The TEC-based speech watermarking scheme is a
restricted-key scheme. Though such a categorization appears to be mainly based on a
difference in usage, the complexity and suitability of a watermarking algorithm differ
between the two cases.

Schemes that require knowledge of the cover-signal to recover the watermark are
said to be non-oblivious [21]-[23]. TEC-based watermarking is non-oblivious.
Watermark recovery is effected by subtracting the cover-signal from the stego-signal.
Non-oblivious techniques generally yield more robust watermarks. However,
non-oblivious watermarking may be more prone to protocol attacks [20]-[22] due to the
availability of greater freedom for creating fake cover-signals and hence fake
watermarks. For example, a hacker may succeed in developing a suitable fake watermark
(say, a pseudo-random pattern). On subtracting it from the stego-signal, to which he or
she has access, a fake original can be created. The hacker now claims to be the owner of
this original. Oblivious watermarking schemes, on the other hand, are more prone to
attacks based on neutralizing the detector readings (if there is access to a detector, as in
the case of the DVD copy control problem [20]). The cover-signal is not required during
the detection process in oblivious watermarking and may be treated as noise. Oblivious
watermarking methods permit faster detection of the watermark and include bit-wise or
noise-dependent methods. These methods are sensitive to even small variations of the
stego-signal and are thus more fragile [22].

A watermarking strategy is designated as a spatial (time) or transform domain
technique according to whether the watermark is embedded into the cover-signal in the
signal or the transform domain. In the present application in which audio rather than
image data are watermarked, the term “signal,” rather than “spatial,” domain is more
appropriate.

If the same key is required in the watermark recovery or detection process as that
used for watermark embedding, the scheme is said to be symmetric. The need for
asymmetric or public key watermarking arises when the user of the copyrighted
information [23] must perform watermark detection. In this case there is a set of two
keys: a public key and a private key. The private key is required for watermark embedding
and recovery, and is known only to the owner. The public key is given to the users solely
for watermark detection. Knowledge of the public key should not provide any
information about the private key, and should not compromise the security and
robustness of the scheme. A variation on this idea occurs in the TEC strategy. A “public”
key is made available to descramble the speech signal, but this process has the effect of
further encrypting the watermark rather than detecting it.

1.5 Attacks on watermarking systems

Digital watermarks are prone to a wide range of attacks [15, 20] and other means
of distortion. As mentioned earlier, an attack is an attempt to remove the watermark or
preclude its recovery, while ensuring tolerable or no apparent damage to the stego-signal.
An attack can also be an attempt to create ambiguity of ownership. Attacks include those
due to common signal processing operations like resampling, compression, ﬁltering, D/A
conversion, and requantization. Introduction of noise can also affect a watermark.
Deliberate manipulations of the content like cropping, rescaling and rotation can severely
hinder watermark recovery.

By using secure keys, a cryptographic attack like brute-force key search [25] can
be thwarted. In the attack by statistical averaging [18, 20, 25], a large number of
differently watermarked copies of the same stego-signal may be averaged to get the
attacked stego-signal. A collusion attack differs from an attack by statistical averaging in
that only portions of the stego-signal, and not the entire stego-signal, are used to
create the attacked stego-signal [25, 26].

Counterfeit attacks [15, 24, 25], including inversion attack, multiple watermarks,
and copy attack, attempt to undermine the concept of watermarking itself by producing
fake originals or fake watermarked signals. Watermarking an already marked signal (the
problem of multiple watermarks) can negate the utility of any watermarking scheme. To
counteract these attacks, watermark registration with a trusted authority has been
proposed by some parties [21, 23].

Distortion is normally the result of signal processing operations or the presence of
noise. In this document, the term “attack” also encompasses the different types of
distortion that may be unintentionally introduced into the stego-signal.

Robustness against attacks is a very important aspect of a watermarking scheme.
A particular watermarking scheme may not be robust to all forms of attack. An attack on
the watermark may be directed at either removing the watermark or hindering its
recovery while causing tolerable apparent damage to the stego-signal. When the attack
hinders watermark recovery, then the general remedy is to attempt to identify the attack
and to undo the damage. Duric et al. [22] describe a method of recognizing distorted
images and recovering watermarks using identiﬁcation marks, salient features of the
image invariant to transformations like cropping, scaling and rotation. In the present
paper, watermark recovery from cropped speech is accomplished using a dynamic
programming approach. Most watermarking techniques are susceptible to damage caused
by cropping because it desynchronizes the watermark detection and recovery
process.

1.6 Document overview

Research in the ﬁeld of watermarking is progressing in different directions. New
watermarking techniques are being devised [11, 13, 27], new attacks on watermarking
schemes are being identiﬁed [15, 18, 20, 25], benchmarks to evaluate the different
watermarking schemes are being developed [15, 28], and algorithms for watermark
detection and recovery after attacks and other forms of distortion are being developed
[22, 24]. This document describes work that was mainly directed at TEC-based
watermark detection and recovery after being subjected to certain attacks.

Chapter 2 presents the TEC-based speech-watermarking technique. Chapter 3
describes the robustness of this watermarking scheme to different attacks that include
additive noise, cropping, or a combination. Algorithms for watermark recovery when
subjected to these attacks are described. Chapter 4 is concerned with the Matlab
implementation, experimental results and performance evaluation under different
conditions. Chapter 5 comprises a description of future work.

Chapter 2

DIGITAL WATERMARKING OF SPEECH USING TEC

2.1 Watermarking algorithm

Transform encryption coding (TEC) was originally developed by Kuo et al. [3] as
an algorithm for image compression and efﬁcient and secure transmission. It can be
applied to speech and other signals as well. TEC produces independent transform
coefﬁcients by passing the signal through an all-pass ﬁlter with unity gain. TEC derives
its encryption properties, and hence security, from the use of highly random ﬁlter
coefficients. Typically, quasi m-arrays and Gold code arrays [4] are used to obtain filter
coefficients with the desired property of unpredictability. The phase spectrum of the
signal to be transformed is scrambled in accordance with the phase spectrum of a quasi
m-array or Gold code array [3].
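The all-pass phase-scrambling idea can be illustrated with a minimal sketch. It is not the TEC implementation of [3]: it works on a short one-dimensional signal rather than two-dimensional arrays, and the helper `key_phases`, which draws pseudo-random phases from a seeded generator, is a hypothetical stand-in for the phase spectrum of a quasi m-array. The sketch only shows that rotating DFT phases leaves the signal energy unchanged and that the rotation can be undone only with the same key.

```python
import cmath
import math
import random


def dft(x):
    """Naive O(N^2) discrete Fourier transform (adequate for a short demo)."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * f * t / n) for t in range(n))
            for f in range(n)]


def idft(spec):
    """Inverse of dft()."""
    n = len(spec)
    return [sum(spec[f] * cmath.exp(2j * math.pi * f * t / n) for f in range(n)) / n
            for t in range(n)]


def key_phases(seed, n):
    """Hypothetical stand-in for the phase spectrum of a quasi m-array.

    Phases are antisymmetric (phi[n - f] = -phi[f]) so a real signal stays real.
    """
    rng = random.Random(seed)
    phi = [0.0] * n
    for f in range(1, (n + 1) // 2):
        phi[f] = rng.uniform(-math.pi, math.pi)
        phi[n - f] = -phi[f]
    return phi


def scramble(x, seed, inverse=False):
    """All-pass 'encryption': rotate each DFT phase, leave magnitudes untouched."""
    sign = -1.0 if inverse else 1.0
    phi = key_phases(seed, len(x))
    spec = [c * cmath.exp(sign * 1j * p) for c, p in zip(dft(x), phi)]
    return [v.real for v in idft(spec)]


def energy(x):
    return sum(v * v for v in x)


signal = [math.sin(0.3 * t) + 0.5 * math.sin(1.1 * t) for t in range(32)]
encrypted = scramble(signal, seed=1)
decrypted = scramble(encrypted, seed=1, inverse=True)   # same key: recovers signal
wrong_key = scramble(encrypted, seed=2, inverse=True)   # other key: stays scrambled
```

Because every spectral multiplier has unit magnitude, `energy(encrypted)` matches `energy(signal)` up to rounding, which is the all-pass property the text relies on.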

The speech watermarking technique developed by Ruiz et al. [2] employs TEC in
conjunction with a masking algorithm for encrypting and watermarking speech. The
one-dimensional speech signal is arranged in the form of two-dimensional arrays, each
having the same dimensions as the quasi m-arrays used for the TEC process.

 

 

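[Figure 2: block diagram of the watermarking process. The cover-signal c and the
watermark w are encrypted by T_1 and T_2 respectively, the encrypted watermark is scaled
by k^{0.5} and added to the encrypted cover-signal, and T_1^{-1} is applied to the sum,
giving s = c + T_1^{-1}{k^{0.5} × T_2{w}}.]

Figure 2. Watermarking process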

The watermarking process involves the application of TEC to both the cover-signal
and the watermark. Different quasi m-arrays are used for encrypting the cover-signal
and the watermark. The encrypted watermark is subjected to a masking algorithm
to ensure perceptual transparency based on the cover to watermark ratio (CWR), defined
as

$$\mathrm{CWR}_{\mathrm{dB}} = 10 \log_{10} \frac{E_c[n]}{k[n]\, E_w[n]} \qquad (1)$$

where $E_c[n]$ and $E_w[n]$ are the respective short-term energy measures for the encrypted
cover and watermark signals, and $k[n]$ is an adaptive gain factor (stego-key) at time $n$.
Alternatively, a constant gain factor $k$ can be used instead of $k[n]$. Since the encryption
process involves passing the cover and watermark signals through all-pass filters with
unity gain (see Figure 4), the energy content of the encrypted and non-encrypted signals
is similar in each case. The encrypted cover-signal and watermark are converted into
one-dimensional arrays. The encrypted, masked watermark is then added to the encrypted
cover-signal to obtain the encrypted stego-signal. Applying the inverse TEC operation to
decrypt the cover-signal component of the encrypted stego-signal subjects the watermark
to a second level of encryption (see Figure 6).
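As a hedged illustration of the masking step, the sketch below picks a constant gain factor for one frame so that a target CWR is met. It assumes that the CWR is the ratio of the cover energy to k times the watermark energy, so that the watermark amplitude is scaled by the square root of k (the k^{0.5} factor of Figure 2). The function names, the frame length, and the use of a plain sum of squares as the short-term energy are illustrative choices, not details taken from [2].

```python
import math


def short_term_energy(frame):
    """Sum of squared samples over one frame (illustrative energy measure)."""
    return sum(v * v for v in frame)


def gain_for_cwr(cover_frame, wm_frame, target_db):
    """Constant gain k such that 10*log10(E_c / (k * E_w)) equals target_db."""
    e_c = short_term_energy(cover_frame)
    e_w = short_term_energy(wm_frame)
    return e_c / (e_w * 10.0 ** (target_db / 10.0))


def cwr_db(cover_frame, masked_wm_frame):
    """Cover-to-watermark ratio in dB for an already masked watermark frame."""
    return 10.0 * math.log10(short_term_energy(cover_frame) /
                             short_term_energy(masked_wm_frame))


cover = [math.sin(0.2 * t) for t in range(64)]
watermark = [0.3 * math.cos(0.9 * t) for t in range(64)]

k = gain_for_cwr(cover, watermark, target_db=20.0)
masked = [math.sqrt(k) * v for v in watermark]   # amplitude scales with sqrt(k)
achieved = cwr_db(cover, masked)                 # should match the 20 dB target
```

An adaptive gain k[n] would repeat the same computation frame by frame; a lower target CWR gives a larger k and therefore a stronger embedded watermark.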

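[Figure 3: block diagram of watermark recovery. The cover-signal c is subtracted from
the stego-signal s, the difference is passed through T_1 and T_2^{-1}, and the gain is
removed, giving ŵ = k^{-0.5} × T_2^{-1}{T_1{s − c}}.]

Figure 3. Watermark recovery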

For watermark recovery, an estimate of the doubly encrypted watermark is
obtained by subtracting the cover-signal from the stego-signal,

$$\tilde{w} = s - c \qquad (2)$$

Finally the inverse TEC operations and the gain factor are applied to the estimated
twice-encrypted watermark (Figure 3):

$$\hat{w} = k^{-0.5} \times T_2^{-1}\{T_1\{\tilde{w}\}\} = k^{-0.5} \times T_2^{-1}\{T_1\{s - c\}\} \qquad (3)$$

[Figure 4: six panels, (a) to (f), described in the caption.]

Figure 4. Encryption using quasi m-arrays. (a) The original “Lena” image. (b) The
original “mandrill” image. (c) Lena encrypted using quasi m-array A (key A). (d)
Mandrill encrypted using quasi m-array B (key B). (e) Amplitude distribution of the Lena
encrypted using key A. (f) Amplitude distribution of the mandrill encrypted using key B.
The amplitude distributions are different for the encrypted versions of the two images.
However, they are similar to the amplitude distributions of the respective original images
due to the all-pass nature of the encryption process.

The recovery of the watermark is only possible with the knowledge of the two
quasi m-arrays (encryption keys) used in the process. The watermark may take many
forms including speech samples or images. Part of the future work of this project will be
concerned with researching properties that assure quality watermarks.
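The embedding and recovery relations shown in Figures 2 and 3 can be exercised end to end with a small sketch. It is an illustration under stated assumptions, not the implementation of [2]: the function `tec` is a hypothetical one-dimensional phase scrambler seeded by an integer key (standing in for a TEC operator built from a quasi m-array), the gain k is a constant, and the amplitude scaling is taken to be the square root of k.

```python
import cmath
import math
import random


def _dft(x, sign):
    """Naive DFT kernel; sign=-1 is the forward transform, sign=+1 the unscaled inverse."""
    n = len(x)
    return [sum(x[t] * cmath.exp(sign * 2j * math.pi * f * t / n) for t in range(n))
            for f in range(n)]


def tec(x, seed, inverse=False):
    """Hypothetical 1-D all-pass phase scrambler standing in for a TEC operator."""
    n = len(x)
    rng = random.Random(seed)
    phi = [0.0] * n
    for f in range(1, (n + 1) // 2):          # antisymmetric phases keep x real
        phi[f] = rng.uniform(-math.pi, math.pi)
        phi[n - f] = -phi[f]
    s = -1.0 if inverse else 1.0
    spec = [c * cmath.exp(s * 1j * p) for c, p in zip(_dft(x, -1.0), phi)]
    return [v.real / n for v in _dft(spec, 1.0)]


def embed(cover, wm, key1, key2, k):
    """s = c + T1^{-1}{ sqrt(k) * T2{w} }"""
    masked = [math.sqrt(k) * v for v in tec(wm, key2)]
    return [c + m for c, m in zip(cover, tec(masked, key1, inverse=True))]


def recover(stego, cover, key1, key2, k):
    """w_hat = T2^{-1}{ T1{ s - c } } / sqrt(k)"""
    residual = [s - c for s, c in zip(stego, cover)]
    return [v / math.sqrt(k) for v in tec(tec(residual, key1), key2, inverse=True)]


n = 32
cover = [math.sin(0.25 * t) for t in range(n)]
wm = [1.0 if (t // 4) % 2 == 0 else -1.0 for t in range(n)]   # toy watermark

stego = embed(cover, wm, key1=11, key2=22, k=0.01)
good = recover(stego, cover, key1=11, key2=22, k=0.01)   # both keys correct
bad = recover(stego, cover, key1=11, key2=99, k=0.01)    # wrong watermark key

err_good = max(abs(a - b) for a, b in zip(good, wm))
err_bad = max(abs(a - b) for a, b in zip(bad, wm))
```

With both keys correct the recovery error is at rounding level, whereas a wrong second key leaves the watermark scrambled, in line with the statement that both quasi m-arrays must be known.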

 

 

[Figure 5: two panels, (a) and (b), each showing the cover-signal and the encrypted
stego-signal.]

Figure 5. Watermarking selectively to watermarking the entire speech record. (a)
The entire speech record consisting of 48387 samples is watermarked. All the recovered
watermarks are shown. (b) The watermark is embedded in the first 16129 speech
samples.

Although the entire speech record is watermarked in Ruiz’s work [2], only
selected frames of speech may be watermarked depending upon the requirements of the
application (Figure 5). By watermarking selectively, a degree of unpredictability is
introduced about the exact locations of the watermarks. It is preferable to watermark the
higher intensity speech regions, since for a given CWR, the watermark intensity will be
greater in these regions. As a result, the embedded watermarks will be more robust
against certain attacks. The results supporting this are presented in Chapter 4. The
computational complexity, and hence the amount of time required for watermarking, is
also reduced when only selected speech regions are watermarked.

2.2 Correlation detector

A correlation detector (type II detector, see Section 1.2) can be used to detect the
presence of the watermark in the stego-signal when subjected to linear distortion. The
detector uses the normalized correlation between the original and possibly distorted
watermark recovery signals. The latter are obtained by taking differences between the
respective stego-signals and the cover-signal. Such a detector may not appear to be
necessary for the TEC strategy as the watermark recovery signal (possibly distorted) may
be fed to the recovery process in any case. However, the correlation detector is useful for
acquiring quantiﬁed information about the presence or absence of the watermark. This
information is crucial, for example, when an attack hinders the recovery process. The
correct alignment of the two watermark recovery signals (original and possibly distorted)
is an essential requirement for the correlation detector to perform correctly.

If $s'$ is a speech signal that differs from the original cover-signal by an added
sequence $\eta$, the correlation detector can be used to obtain information about the presence
or absence of the watermark in $s'$ as follows.

$$s' = c + \eta \qquad (4)$$

The normalized correlation between $\tilde{w}$, the watermark recovery signal $s - c$ obtained
from the undistorted stego-signal, and $(s' - c)$ is defined as

$$\rho = \frac{\tilde{w} \cdot (s' - c)}{|\tilde{w}|\,|s' - c|} \qquad (5)$$

A high value of $\rho$ indicates the presence of the watermark in $s'$.
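A minimal sketch of such a correlation detector follows. The signals are synthetic: a low-amplitude pseudo-random sequence stands in for the watermark recovery signal s − c, and Gaussian noise of similar strength plays the role of the distortion. These choices, and the function name `normalized_correlation`, are illustrative assumptions and not values from the experiments in this work.

```python
import math
import random


def normalized_correlation(a, b):
    """Inner product of a and b divided by the product of their Euclidean norms."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm > 0.0 else 0.0


rng = random.Random(0)
n = 2000
cover = [math.sin(0.05 * t) for t in range(n)]
w_tilde = [0.05 * rng.gauss(0.0, 1.0) for _ in range(n)]   # stands in for s - c
stego = [c + w for c, w in zip(cover, w_tilde)]
noise = [0.05 * rng.gauss(0.0, 1.0) for _ in range(n)]

marked = [s + e for s, e in zip(stego, noise)]     # distorted stego-signal
unmarked = [c + e for c, e in zip(cover, noise)]   # cover plus noise, no watermark

rho_marked = normalized_correlation(
    w_tilde, [x - c for x, c in zip(marked, cover)])
rho_unmarked = normalized_correlation(
    w_tilde, [x - c for x, c in zip(unmarked, cover)])
```

Here `rho_marked` is clearly larger than `rho_unmarked`, which stays near zero; in practice a decision threshold between the two would have to be chosen empirically.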

If the distortion has the effect of misaligning the stego-signal and the original,
then the samples must be resynchronized before using the correlation detector. Due to
such a requirement the correlation detector can be used for studying the effectiveness of
the algorithm for watermark recovery from cropped speech as described in the next
chapter.

2.3 Security and robustness

A good watermarking technique is one for which the security relies on the key
and not on the secrecy of the algorithm. Public knowledge of the watermarking
technology must not compromise security. This holds true for TEC-based speech
watermarking. Security means that only authorized parties can decode the watermark
[26]. It entails unpredictability and non-invertibility [23]. Non-invertibility of the
watermarking technique means, for a modulated or encrypted watermark signal, it is
practically impossible to ﬁnd a fake watermark that can be produced by the same process
[23].

TEC speech watermarking derives its security from the quasi m-arrays used for
cover-signal and watermark encryption. The recovery of the watermark is only possible
with the knowledge of two quasi m-arrays (encryption keys) per frame. Further,
encryption ensures secure transmission across the communication channel. It also helps
in data access control, i.e., an unauthorized person cannot retrieve the information [23].
Mere encryption without watermarking cannot provide copyright protection, as the data
are unprotected and open to content tampering and copyright violation [23]. Hence it is
important to combine encryption and watermarking for secure copyright protection.

It is generally recommended [20] that the unmarked original not be publicly
released. Enhanced security is also achieved by embedding the watermarks in random
locations of the stego-signal rather than predictably throughout the entire stego-signal.
Several further practices can contribute to security: watermarking different signals
differently; watermarking copies of the same signal similarly (alternately, it is better to
keep copies of the stego-signal rather than of the cover-signal); embedding more than one
watermark (preferably with different watermarks and keys) in a particular stego-signal;
and using keys of different dimensions. The use of different keys avoids the obsolescence
of the watermark if a set of keys used for watermark recovery were by chance made
public knowledge after intentional tampering or copyright violation. Using quasi m-arrays
and Gold code arrays (keys) of higher dimension achieves greater encryption security,
because the number of available quasi m-arrays or Gold code arrays increases with their
dimension. Greater security, however, comes with increased computational complexity,
so there is a trade-off between security and computational burden.

TEC’s masking algorithm provides additional protection by using different
parameters (stego-keys) in a random fashion while ensuring the imperceptibility of the
watermark.

The amount of data, measured in bits, needed to embed one unit of watermark
information is termed the watermark granularity [17]. Finer granularity may result in
greater robustness against certain attacks. In the case of cropping, for example, spreading
the watermark across a large number of cover-signal samples implies greater risk of
sample loss. However, finer granularity works against higher key dimension and hence
security.

For the robustness of the watermarking scheme, the CWR plays a very crucial
role. Lower CWR contributes to increased robustness. However, the need for a
perceptually transparent watermark places a practical lower bound on the CWR.

 

[Figure 6: four image panels, (a) to (d), described in the caption.]

Figure 6. Encryption and decryption processes. (a) Original “mandrill” image. (b)
Mandrill encrypted using key B. (c) Decryption of the encrypted mandrill in (b) using
key C. If a signal is encrypted using key A (key B), it can be decrypted only using key A
(key B). By using a different key for decryption, the mandrill gets encrypted twice. (d)
Decryption of the encrypted mandrill in (b) using key B.

The next chapter focuses on the robustness of TEC-based speech watermarking to
additive noise, cropping and protocol attacks.

Chapter 3

ROBUSTNESS STUDY

The issue of watermark robustness is introduced in Chapter 1. Robustness is the ability of the watermark to survive attacks and other forms of distortion. A watermarking scheme is said to be robust against a particular attack if watermark detection and recovery are still possible after the attack. This chapter deals with the robustness of TEC-based speech watermarking to additive noise, cropping and protocol attacks.

The encryption capabilities of TEC ensure secure transmission of the stego-signal across a communication channel and secure storage in an archive. If a hacker intercepts (or downloads) a transmitted (or stored) stego-signal and attempts to decrypt it without knowledge of the encryption keys, the nominal stego-signal obtained on decryption will be unintelligible. Hence, it is sufficient and necessary to use the unencrypted stego-signal for the robustness study. According to [3], the TEC encrypted signal is insensitive and robust to channel noise. Due to the all-pass nature of the encryption and decryption processes, the noise strength in every sample of the decrypted signal will be small.

The main focus of this chapter is robustness to cropping and additive noise. Cropping results in irretrievable loss of information that causes desynchronization in the watermark detection and recovery processes. As a consequence, the watermark fails to be detected and recovered. A number of watermarking techniques, especially signal (spatial or time) domain techniques, are vulnerable to the damage caused by cropping. The damage caused by cropping depends more on the watermark embedding strategy (for example, embedding according to an additive rule) or the detection strategy than on the nature of the watermark. In order to tackle cropping, an algorithm for watermark detection and recovery from cropped speech is presented. This algorithm can be applied to any cropped stego-signal, even if watermarked using a different watermarking technique.

3.1 Additive noise

Addition of uncorrelated and randomly generated noise is a common attack against a watermarked stego-signal. Techniques in which the watermark takes the form of LSB modifications are especially prone to such an attack or distortion. To study the robustness of TEC speech watermarking to additive noise, independent, uncorrelated and randomly generated noise was added to every sample of the stego-signal. The noise amplitude was either uniformly distributed or Gaussian distributed, as shown in Figure 7.

If s' is the noisy stego-signal, then

    s' = c + \hat{w} + \eta                    (6)

The recovered watermark signal will now be

    \hat{w}' = \hat{w} + \eta                  (7)

As implied by equation (6), the robustness of the watermarking technique to
additive noise depends upon the watermark to noise power ratio (WNR). The significance
of a particular value of the WNR cannot be ascertained independently of the following
factors:

i) Stego-signal to noise ratio (SNR), defined as

    SNR = 10 \log_{10} \frac{P_s}{P_\eta}                      (8)

where P_s and P_\eta are the signal and noise energy, averaged over the entire duration of the speech sequence (that is, the signal and noise power):

    P_s = \frac{1}{S} \sum_{n=1}^{S} s^2[n]                    (9)

    P_\eta = \frac{1}{S} \sum_{n=1}^{S} \eta^2[n]              (10)

Here s[n] and \eta[n] are samples of the stego-signal and noise at time n, and S is the number of samples in the sequence.
ii) Cover-signal to watermark ratio (CWR, see footnote 2).

The CWR is influenced by whether a constant or an adaptive gain factor is used in the masking algorithm. If a constant gain factor (k factor) is used, the CWR varies from sample to sample. On the other hand, when an adaptive gain factor (k[n] factor) is used, the CWR is constant throughout. In this case, robustness will also depend on the temporal placement of the watermark in the cover-signal. Since the intensity of the embedded watermark is adapted to the intensity of the cover-signal, the strength of the watermark will be greater in the higher intensity regions of speech. Experimental results for different CWRs, SNRs, and watermarks are presented in Chapter 4. To assess the significance of the experimental results, a group of individuals were asked to look at or listen to the results. The squared error (E) and normalized correlation (ρ) were used in conjunction to provide quantified information. It was inferred from the experimental results that, when the embedded watermark was mildly perceptible, the damage survived by the watermark was sufficient to lower the commercial value of the attacked stego-signal. A detailed discussion of the results is presented in Chapter 4.
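The additive-noise experiment described above can be sketched in a few lines of Python. This is a minimal illustration and not the testing engine used in this work: the helper names (`average_power`, `snr_db`, `make_noise`, `attack_and_recover`) are hypothetical, the recovery step simply subtracts the cover-signal from the noisy stego-signal, and the uniform noise bounds are left to the caller.

```python
import math
import random

def average_power(x):
    """Energy averaged over all samples, as in (9) and (10)."""
    return sum(v * v for v in x) / len(x)

def snr_db(stego, noise):
    """Stego-signal to noise ratio of (8), in dB."""
    return 10.0 * math.log10(average_power(stego) / average_power(noise))

def make_noise(n, kind, a, b, seed=0):
    """Draw n independent noise samples (hypothetical helper).

    kind="gaussian": a is the mean, b the standard deviation.
    kind="uniform":  a is the lower bound, b the upper bound.
    """
    rng = random.Random(seed)
    if kind == "gaussian":
        return [rng.gauss(a, b) for _ in range(n)]
    if kind == "uniform":
        return [rng.uniform(a, b) for _ in range(n)]
    raise ValueError("kind must be 'gaussian' or 'uniform'")

def attack_and_recover(cover, stego, noise):
    """Add noise to every stego sample, then subtract the cover-signal."""
    noisy = [s + e for s, e in zip(stego, noise)]       # equation (6)
    recovered = [x - c for x, c in zip(noisy, cover)]   # equation (7)
    return noisy, recovered

if __name__ == "__main__":
    n = 16000
    # Toy cover-signal and stand-in watermark; both are assumptions for the demo.
    cover = [0.1 * math.sin(2 * math.pi * 200 * t / 16000) for t in range(n)]
    mark = make_noise(n, "gaussian", 0.0, 0.005, seed=1)
    stego = [c + w for c, w in zip(cover, mark)]
    noise = make_noise(n, "gaussian", 0.0, 0.00092, seed=2)
    print("SNR (dB):", round(snr_db(stego, noise), 2))
```

Because the recovered signal is the embedded watermark plus the noise, lowering the SNR directly lowers the watermark to noise ratio for a fixed CWR.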

 

Footnote 2: The CWR is defined in (1) as CWR = 10 \log_{10} ( c[n] / (k[n] \times w[n]) ).

[Figure 7 panels: (a) uniformly distributed noise; (b) Gaussian distributed noise]

Figure 7. Noise amplitude distribution

3.2 Cropping

Cropping is an attack on the content of the stego-signal wherein samples of the
signal are deleted in a random or deterministic manner. About 1 in 50 speech samples
may be cropped without introducing any perceptible difference. Cropping may be an
intentional attack or unintentionally introduced distortion. It is extremely easy to
implement, but most digital watermarking schemes are vulnerable to the damage caused
by it.

One way to identify an attack as cropping is to use the cross-correlation between the original watermark recovery signal (obtained by taking the difference between an undistorted stego-signal and the cover-signal) and the attacked watermark recovery signal. If samples are indeed cropped from the stego-signal, the normalized cross-correlation continues to decrease sharply as more and more cropped samples are encountered.
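The cross-correlation check described above can be illustrated with a running normalized correlation. This is a sketch under stated assumptions: the normalized correlation is taken to be the usual inner product divided by the product of the norms (the definition in (5) is not repeated here), and the function names, the toy signals and the single cropped sample in `demo` are hypothetical.

```python
import math
import random

def running_correlation(w_ref, w_att, step):
    """Normalized correlation over the first k samples, k = step, 2*step, ..."""
    n = min(len(w_ref), len(w_att))
    sxy = sxx = syy = 0.0
    out = []
    for i in range(n):
        sxy += w_ref[i] * w_att[i]
        sxx += w_ref[i] * w_ref[i]
        syy += w_att[i] * w_att[i]
        if (i + 1) % step == 0:
            den = math.sqrt(sxx * syy)
            out.append(sxy / den if den > 0.0 else 0.0)
    return out

def demo(seed=0):
    """Crop one sample from a toy stego-signal and track the correlation."""
    rng = random.Random(seed)
    n = 400
    cover = [rng.uniform(-1.0, 1.0) for _ in range(n)]
    mark = [rng.gauss(0.0, 0.01) for _ in range(n)]
    stego = [c + w for c, w in zip(cover, mark)]
    cropped = stego[:200] + stego[201:]            # delete the sample at index 200
    w_ref = [s - c for s, c in zip(stego, cover)]  # original recovery signal
    w_att = [s - c for s, c in zip(cropped, cover)]  # attacked recovery signal
    return running_correlation(w_ref, w_att, step=100)

if __name__ == "__main__":
    print(demo())
```

In the demo the correlation stays close to one up to the cropped sample and collapses afterwards, because every later sample of the attacked recovery signal is misaligned with the cover-signal.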

 


Cropping desynchronizes the recovery process, making watermark recovery difficult. Hence there is a need for an algorithm to identify the cropped samples and to undo the damage caused by cropping, in order to make watermark recovery possible.

 


   

Figure 8. Cropping in images and speech. (a) Original mandrill image. (b) Cropped
mandrill image. (c) 1000 samples from the speech “Theodore Roosevelt talks about
Wilson and Taft”. (d) A cropped version of the speech in (c). About 1 in 50 samples were
cropped. It can be observed that there is greater predictability in the manner in which
cropping manifests itself in images than in speech.

Duric et al. [22] make use of registration patterns (invariant features of an image) to recognize and restore images that are subjected to detection-disabling affine transformations. Typically, the registration patterns, also known as identification marks, might be groups of points that exhibit uniqueness. Watermarks are then recovered from the restored images. Such a methodology works for images due to the manner in which cropping manifests itself in images. On cropping, the aspect ratio, shape or resolution of the image is generally affected. Hence, the effects of cropping are more predictable in the case of images (see Figure 8). Due to the random nature of speech, the geometry does not facilitate the derivation of registration patterns that are unique and invariant to transformations. Even in the case of images, the registration patterns can be exploited or attacked [20] to undermine their functionality.

Hence, in order to deal with cropping in speech, a dynamic programming based approach to identify the cropped samples and undo the damage was favored.

3.2.1. Algorithm for watermark recovery from cropped speech

A recovery algorithm is presented which is based on the concept of dynamic
programming [5]. An attempt to temporally align the samples of the cropped stego-signal
with the original stego-signal using dynamic programming (and hence dynamic time
warping (DTW) [5]) will inherently determine the (former) time locations of the cropped
samples.

Consider the i-j plane (as shown in Figure 9) with the cropped stego-signal (test string) along the i-axis and the stego-signal (reference string) along the j-axis. Determination of the cropped samples is treated as the problem of finding the minimum distance path through the grid. A path is a collection of nodes of the form (t(i), s(j)) connecting the original and terminal nodes. Distances or costs are assigned to paths in the form of nodal costs. The cost associated with the node (t(i), s(j)) is defined as

    d_n(i, j) = (t(i) - s(j))^2                    (11)

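[Figure 9 grid: cropped stego-signal samples t(1), ..., t(i), ..., t(T) along the i-axis; stego-signal samples s(j) along the j-axis, with s(i), s(i+N), s(N+1), s(T) and s(S) marked; original node at (0,0)]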

Figure 9. Dynamic programming approach to recovering cropped speech
samples.

The search for the optimal path is described as follows. Let S be the length (total number of time samples) of the uncropped stego-signal, and T be the length of the cropped stego-signal. Assuming that no additional or duplicate samples are added to the stego-signal, the number of samples cropped is

    N = S - T                    (12)

The following constraints are imposed on the search region to limit the amount of computation and to ensure appropriate matching between the test and reference strings:

Monotonicity. For the path to be monotonic it must advance in the upward direction, i.e., it should not go “south” or “west” in the grid. Further, movement of the path in the horizontal or the vertical direction is prohibited, as a single test sample cannot be associated with more than one reference sample and vice versa.

Global path constraints. Since N samples are cropped and the path can only move in the upward direction, element t(i) of the cropped stego-signal can be matched only with the (N+1) elements s(i) to s(i+N) of the stego-signal. A similar constraint is applied at the endpoints. The result is a constrained search region in the form of a diagonal strip as shown in Figure 9.

Local path constraints. As every sample of the cropped stego-signal is contained in the original stego-signal, the optimal path should include all the test string elements. That is, no skips are permitted along the i-axis. At most, N reference string samples may be skipped in the process of finding the optimal path, as N samples were cropped. Thus, for node (t(i), s(j)) in the search region, the possible immediate predecessor nodes are (t(i-1), s(k)), where k ranges from (i-1) to (j-1).

As a consequence of the Bellman optimality principle [5], the optimal path to the node (t(i), s(j)) can be found by considering the best paths associated with all the possible predecessor nodes and choosing the one with the minimum cost,

    D_{min}(i, j) = \min_{(i-1,k)} \{ D_{min}(i-1, k) + d_n(i, j) \},   k = (i-1), \ldots, (j-1)      (13)

After all the nodes in the search region are considered, a set of N+1 optimal paths is obtained. The first path, that is, the one that involves zero skips, is the same as the cropped stego-signal. The global optimal path is the one associated with the least cost among them. If the first path is associated with the least cost, then the last N samples of the stego-signal were cropped. It can be observed that the number of optimal paths is one more than the number of cropped samples. This follows as a direct consequence of the search constraints and equation (13). Although the paths might have common nodes, they never traverse each other. At every node (t(i), s(j)) of a particular optimal path, it is necessary to record the immediate predecessor node from which the path was extended. This way the path may be reconstructed by backtracking, beginning at the terminal node.
The overall algorithm based on the principles above involves the following steps:
i) Initialization: The original node is (0,0) and the nodal cost associated with it is zero. (0,0) is the only predecessor associated with nodes (t(1), s(j)), j = 1, ..., (1+N).
    D_{min}(1, j) = d_n(0, 0) + d_n(1, j),   j = 1, ..., (1+N)
    \psi(1, j) = (0, 0),   j = 1, ..., (1+N)
    \psi(i, j) = the index of the predecessor node to (i, j).
    \delta_1(j) = D_{min}(1, j),   j = 1, ..., (1+N)
ii) Recursion:
    For i = 2, ..., T
        For j = i, ..., (i+N)
            Compute D_{min}(i, j) using (13).
            (D_{min}(i-1, j) is held in \delta_1(j)).
            (\psi(i, j) is recorded for every (i, j)).
            \delta_1(j) = D_{min}(i, j)
        Next j
    Next i
iii) Termination: The best path is the one associated with the least cost,
    \min_j \{ D_{min}(T, j) \},   j = T, ..., (T+N)
iv) Reconstruction: The best path accurately identifies the samples of the cropped stego-signal that are present in the stego-signal. The cropped samples are those that are not present in the cropped stego-signal. The reconstructed stego-signal is obtained by reinserting the cropped samples at the appropriate places in the cropped stego-signal.
v) Watermark recovery: The watermark recovery process is applied to the reconstructed stego-signal.
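Steps i) to iv) above can be sketched in Python as follows. This is an illustrative sketch rather than the implementation used in this work, and it departs from the listing in two labeled ways: nodes are indexed by their offset d = j - i inside the diagonal strip, so one backpointer is stored per strip node, and the minimum over the predecessors in (13) is carried as a running minimum along each row. The function names are hypothetical, indices are 0-based, and the reconstruction step assumes that the original stego-signal is available to supply the missing samples.

```python
def find_cropped_indices(stego, cropped):
    """Return the 0-based indices of stego samples missing from cropped."""
    S, T = len(stego), len(cropped)
    N = S - T                      # number of cropped samples, equation (12)
    if N < 0:
        raise ValueError("cropped signal is longer than the stego-signal")
    if T == 0:
        return list(range(S))

    # Node (i, j) is stored by its offset d = j - i, 0 <= d <= N.
    prev = [(cropped[0] - stego[d]) ** 2 for d in range(N + 1)]
    back = [[0] * (N + 1) for _ in range(T)]   # predecessor offset per node

    for i in range(1, T):
        cur = [0.0] * (N + 1)
        best, best_d = float("inf"), 0
        for d in range(N + 1):
            # Predecessors of offset d in row i are offsets 0..d in row i-1;
            # the running minimum covers them without re-scanning.
            if prev[d] < best:
                best, best_d = prev[d], d
            cur[d] = best + (cropped[i] - stego[i + d]) ** 2   # (11) and (13)
            back[i][d] = best_d
        prev = cur

    # Termination: pick the cheapest of the N+1 paths, then backtrack.
    d = min(range(N + 1), key=lambda k: prev[k])
    kept = set()
    for i in range(T - 1, -1, -1):
        kept.add(i + d)
        d = back[i][d]
    return [j for j in range(S) if j not in kept]

def reconstruct(stego, cropped, missing):
    """Reinsert the missing samples (taken from stego) into cropped."""
    missing_set = set(missing)
    it = iter(cropped)
    return [stego[j] if j in missing_set else next(it)
            for j in range(len(stego))]

if __name__ == "__main__":
    stego = [0.1, 0.5, -0.3, 0.8, 0.2, -0.7, 0.4, 0.9]
    cropped = [0.1, 0.5, 0.8, 0.2, 0.4, 0.9]      # samples 2 and 5 removed
    missing = find_cropped_indices(stego, cropped)
    print(missing, reconstruct(stego, cropped, missing) == stego)
```

The running minimum gives the same path costs as evaluating every predecessor in (13), since the predecessor set of offset d is exactly the set of offsets 0 to d in the previous row.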

3.2.2. Memory and computational requirements

The algorithm requires about (N+1)T nodal costs or distance measures to be computed and approximately ((N+1)(N+2)T)/2 implementations of equation (13). Considering the memory requirements, a matrix of size O(TS) must be allocated for backtracking. This requirement cannot be replaced by the use of N+1 arrays of size O(T) each. Such a replacement would require precise knowledge of the nodes comprising each path, and this information is not available until the entire algorithm has been executed. To compute D_{min}(i, j) at every (i, j) within the search region, only the previous-row values D_{min}(i-1, k), k = (i-1), ..., (j-1), are needed. Therefore, at most an array of dimension 1 × (N+1) is required, assuming that the computation can be done in place.
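To give a feel for these counts, the short sketch below evaluates the expressions quoted above for a given T and N. The function name and the dictionary keys are hypothetical, and the numbers are only the approximate operation and storage counts stated in this section, not measurements.

```python
def dp_requirements(T, N):
    """Approximate work and storage of the DP search for T test samples
    and N cropped samples, using the expressions of Section 3.2.2."""
    S = T + N                                   # from N = S - T, equation (12)
    return {
        "nodal_costs": (N + 1) * T,
        # (N+1)(N+2) is a product of consecutive integers, hence even.
        "evaluations_of_13": (N + 1) * (N + 2) * T // 2,
        "backtracking_matrix_cells": T * S,
        "cost_array_cells": N + 1,
    }

if __name__ == "__main__":
    # About 1 in 50 samples cropped from a 7000-sample stego-signal.
    for name, value in dp_requirements(T=6860, N=140).items():
        print(name, value)
```

The quadratic dependence on N in the second count is what makes heavily cropped signals expensive, while the T by S backtracking matrix dominates the memory.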

3.2.3 Cropping in the presence of additive noise

Though TEC speech watermarking is fairly robust to additive noise, the recovery process is severely affected even if one sample is cropped. However, it was found that the DTW algorithm for watermark detection and recovery functioned quite efficiently in the presence of independent uncorrelated random noise. The experimental results for different SNR and CWR values are discussed in the next chapter.


3.3 Counterfeit attacks

Also known as protocol attacks [11, 18, 21, 24, 25], counterfeit attacks seek to undermine the concept of watermarking itself by producing fake originals or fake watermarked signals. Counterfeit attacks are not concerned with destroying the embedded watermark or disabling the recovery process. In the context of counterfeit attacks, robustness has a different meaning. A watermarking scheme is said to be robust against counterfeit attacks if the attack does not succeed in creating ambiguity in the resolution of ownership (or any other purpose for which watermarking is used).

There are different types of counterfeit attacks, including inversion attacks, multiple watermarks, and copy attacks. The basic idea behind the watermark copy attack is to copy a watermark from a stego-signal to another signal without knowledge of the watermarking algorithm and the key that were used to create the rightful stego-signal [15, 25]. This is achieved by estimating the embedded watermark either by direct prediction or denoising [25]. In the case of TEC speech watermarking, the coefficients of the embedded doubly encrypted watermark are outcomes of Gaussian random variables. For watermark recovery, a good estimate of these coefficients and knowledge of the encryption keys will be essential. Thus, the copy attack will be extremely difficult to implement in the case of TEC watermarking.

In an inversion attack, the attacker subtracts his or her watermark from the stego-signal. The attacker thus obtains a fake cover-signal (original) and claims to be the owner of the watermarked signal. This can create ambiguity in the resolution of the ownership of the stego-signal. Craver et al. [11] show that non-invertibility of the embedded watermarks is essential for robustness against inversion attack. Non-invertibility of TEC speech watermarking is discussed in Section 2.3.

The problem of multiple watermarks arises when an attacker inserts another
watermark into the already watermarked signal and claims ownership of the signal. As a
consequence, this creates ambiguity in the resolution of ownership. The TEC speech
watermarking technique can be made robust against such a problem as discussed below.

Suppose person A is the real owner of a speech signal watermarked using TEC. A's stego-signal is

    s = c + \hat{w}                    (14)

Person A releases only the watermarked speech and not the cover-signal to the public. Person B obtains a copy of s and is interested in selling illegal copies. B embeds another watermark w_B into s and circulates illegal copies of s_B. It is assumed that w_B is embedded in s in accordance with an additive rule and also that w_B is not correlated with s. This ensures that the distortion produced as a result of watermarking the already marked (using TEC) stego-signal is linear and uncorrelated. Robustness against distortion that is non-linear and correlated with the stego-signal is beyond the scope of this research.

    s_B = c + \hat{w} + w_B            (15)

Person A comes across one of the illegal copies and recovers w from it. When A tries to sue B, B claims ownership of s_B. However, this fails to create enough ambiguity for the following reasons.


i) It will not be possible for B to show a copy of the speech that does not contain A’s
watermark.

ii) A has the cover-signal and stego-signal that do not contain B’s watermark, giving
credence to the proposition that A is the true owner.

iii) Copies of the stego-signal (if any) not circulated by B contain A’s and not B’s watermark.


Chapter 4

IMPLEMENTATION DETAILS, RESULTS AND CONCLUSIONS

4.1 Robustness testing engine

In this chapter, the experimental results obtained by testing the robustness of TEC
watermarking to additive noise and to cropping are presented.

One main problem faced by the current digital watermarking technology is the
absence of common benchmarks for the evaluation of different watermarking schemes.
Petitcolas [28] proposes the establishment of a public benchmarking service. The
performance metrics to be used for evaluation are yet to be established. Software
packages StirMark [29] and unZign [30] include robustness testing engines for image
watermarks. Such services provide a common platform for the evaluation of different
watermarking techniques. Public domain software for testing the robustness of audio
watermarking techniques is not yet available.

To study the robustness of TEC speech watermarking, the robustness testing engine developed by Ruiz et al. [35] was used. The testing engine can be used to perform 17 tests on the stego-signal. The tests include addition of random noise, cropping, filtering, μ-law compression and expansion. The robustness testing engine accommodates a high degree of flexibility for setting the parametric values characterizing the tests. For the evaluation of TEC speech watermarking, an error measure was determined according to the following equation.

    E = \frac{\sum_n (\hat{w}[n] - \hat{w}'[n])^2}{\sum_n \hat{w}^2[n]}                    (16)

In (16), \hat{w} indicates the original watermark recovery signal and \hat{w}' indicates the watermark recovery signal obtained from a distorted stego-signal. In addition to the error E, the normalized correlation ρ, defined in (5), is used to evaluate the performance. A quality rating (see Table 1) on a scale from 1 to 5 was used to quantitatively describe the perceived results. Martin used a similar rating in [34] to rank the quality of the watermarked image. Two individuals were asked to rate the quality of the stego-signal, the distorted stego-signal and the watermarks recovered from the distorted stego-signal in accordance with Table 1. These ratings were obtained without providing the individuals with the knowledge of the error and normalized correlation values. A quality rating of 3 for the recovered watermark is considered sufficient, and it implies that the watermark is identifiable.
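The two objective measures can be sketched as below. Both definitions are assumptions made for illustration: the error is taken to be the squared difference between the two recovery signals normalized by the energy of the original recovery signal, and the normalized correlation is taken to be the inner product divided by the product of the norms. The function names are hypothetical.

```python
import math

def squared_error(w_ref, w_dist):
    """Squared difference normalized by the energy of the reference signal."""
    num = sum((a - b) ** 2 for a, b in zip(w_ref, w_dist))
    den = sum(a * a for a in w_ref)
    return num / den

def normalized_correlation(w_ref, w_dist):
    """Inner product divided by the product of the two norms."""
    num = sum(a * b for a, b in zip(w_ref, w_dist))
    den = math.sqrt(sum(a * a for a in w_ref) * sum(b * b for b in w_dist))
    return num / den if den > 0.0 else 0.0

if __name__ == "__main__":
    w = [1.0, -1.0, 1.0, -1.0]
    scaled = [2.0 * v for v in w]
    # A pure gain change leaves the correlation at one but raises the error.
    print(normalized_correlation(w, scaled), squared_error(w, scaled))
```

The example at the end shows one reason for reporting the two measures together: the correlation is insensitive to a gain change in the recovered signal, whereas the error is not.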
Table 1 - Quality rating

Rating  Quality of the              Quality of the       Effect of distortion
        watermarked signal          recovered watermark  on the stego-signal
1       Watermark imperceptible     Excellent            No perceptible damage
2       Perceptible, not annoying   Good                 Perceptible
3       Slightly annoying           Fair                 Mildly degrading
4       Disturbing                  Poor                 Degrading
5       Very disturbing             Bad                  Destructive

 

 

 

 

 

4.2 Robustness to additive noise

Experiments were performed to study the robustness of the stego-signal against uncorrelated additive noise. Gaussian or uniformly distributed noise was used. For all the experimental results and simulations presented in this section, a record of 48387 samples (3 seconds) of the speech “Theodore Roosevelt talks about Wilson and Taft” [32] was used as source material. The signal is monaural, sampled at 16 kHz with 16-bit quantization. The “Lena” image was used as the watermark.

Table 2 enumerates the experimental results obtained by adding randomly generated Gaussian noise to the stego-signal. A set of three watermarks was embedded in the 48387-sample speech waveform by dividing it into three frames, each consisting of 16129 samples (Figure 10). For the last five entries in the table, a watermark was embedded selectively in the first frame, as it was associated with higher speech energy. For all results in Table 2, the masking algorithm used a constant gain factor. Since every sample of the encrypted watermark was scaled by a constant, the CWR as defined in (1) was not a constant but a varying quantity. Table 2 shows the average CWR values over every frame and across the entire speech segment. When a constant gain factor is used, the mean CWR across the entire speech segment differs widely from the CWRs averaged across individual frames. The SNR and the normalized correlation between the distorted and original watermark recovery signals are tabulated.
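The frame-to-frame variation of the CWR under a constant gain factor can be illustrated with a toy computation. This sketch makes a simplifying assumption that should be read as such: the CWR of a frame is computed as the ratio of the cover-signal energy to the scaled watermark energy over that frame, whereas the values tabulated in this work are averages of the sample-wise CWR of (1). The function names and the toy signals are hypothetical.

```python
import math

def embed_constant_gain(cover, mark, k):
    """Additive embedding with the same gain k for every sample."""
    return [c + k * w for c, w in zip(cover, mark)]

def frame_cwr_db(cover, mark, k, frame_len):
    """Energy-based CWR in dB for each full frame of frame_len samples."""
    out = []
    for start in range(0, len(cover) - frame_len + 1, frame_len):
        c = cover[start:start + frame_len]
        w = mark[start:start + frame_len]
        cover_energy = sum(v * v for v in c)
        mark_energy = sum((k * v) ** 2 for v in w)
        out.append(10.0 * math.log10(cover_energy / mark_energy))
    return out

if __name__ == "__main__":
    cover = [1.0] * 4 + [0.1] * 4      # loud frame followed by a quiet frame
    mark = [1.0, -1.0] * 4             # stand-in for the encrypted watermark
    print(frame_cwr_db(cover, mark, k=0.01, frame_len=4))
```

With a fixed gain the watermark energy is the same in every frame, so the CWR simply follows the energy of the speech: the louder frame has the higher CWR and therefore the weaker relative watermark.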

It can be inferred from Table 2 that robustness against Gaussian additive noise depends on the CWR and the SNR. A lower CWR and a higher SNR contribute to increased robustness. Since the embedding process is independent of the speech intensity, watermarking selectively does not contribute to increased robustness, and all the recovered watermarks are of the same quality. In interpreting the normalized correlation value, it must be noted that it depends on both the SNR and the CWR. Even if the SNR is low, the normalized correlation between the original and distorted watermark recovery signals may be high if the CWR is low. When the embedded watermarks were very mildly perceptible, corresponding to a mean CWR of approximately 26 dB, the mean of the recovered watermarks was identifiable for an SNR of 42.63 dB. In this case, the noise had the effect of just mildly degrading the stego-signal. When the mean CWR was 21.4 dB, better robustness was exhibited. However, the watermark was perceptible in the speech.
Table 2 - Robustness to Gaussian noise (constant gain factor)

CWR (dB), frame      Mean CWR  Gaussian noise  SNR    Norm. corr.  Quality of    Effect of noise  Recovered watermarks
1     2     3        (dB)      μ   σ           (dB)   ρ            stego-signal  on stego-signal  1   2   3
31.4  19.1  27.6     26.04     0   .0046       26.61  0.3491       2             4                5   5   5
31.4  19.1  27.6     26.03     0   .00092      42.63  0.8784       2             2                4   4   4
31.4  19.1  27.6     26.06     0   .00009      62.60  0.9985       2             2                2   2   2
26.4  14.2  22.6     21.06     0   .0092       22.59  0.3161       2             4                5   5   5
26.4  14.2  22.6     21.05     0   .0046       28.63  0.5479       2             3                5   5   5
26.4  14.2  22.1     21.06     0   .0037       30.59  0.6302       2             3                4   4   4
26.4  14.2  22.6     21.06     0   .0023       34.67  0.7925       2             3                4   4   4
26.4  14.2  22.6     21.06     0   .00092      42.64  0.9562       2             2                2   2   2
21.4  9.2   17.6     16.06     0   .0046       28.68  0.7605       3             4                4   4   4
26.4  -     -        26.38     0   .0092       22.55  0.1865       2             4                5   -   -
26.4  -     -        26.38     0   .0023       34.65  0.5988       2             3                4   -   -
26.4  -     -        26.38     0   .00092      42.58  0.8805       2             2                3   -   -
21.4  -     -        21.38     0   .0046       28.60  0.5531       3             4                4   -   -
21.4  -     -        21.38     0   .00092      42.63  0.9581       3             2                2   -   -

 

 

 

 

 

 

 

 

 

 

 

 

 

 

[Figure 10 panels: cover-signal, encrypted stego-signal, distorted stego-signal and mean recovered watermark; histogram of the watermark recovery signal after noise addition, with amplitude on the horizontal axis and number of samples on the vertical axis]

Figure 10. Robustness of TEC watermarking to Gaussian noise (constant gain factor). 48387 samples of the speech “Theodore Roosevelt talks about Wilson and Taft” [32] were used as the cover-signal. The “Lena” image was used as the watermark. (a) The cover-signal, encrypted stego-signal, the stego-signal distorted by the addition of Gaussian noise and the watermarks recovered from the distorted stego-signal. The recovered watermarks were associated with a quality rating of 4 (Table 2). (b) Mean recovered watermark (quality rating of 3). (c) Histogram of the watermark recovery signal before and after the addition of Gaussian noise. (d) One of the recovered watermarks. Also shown are the histograms of the original and recovered watermarks, and the encrypted watermark reshaped into an array.


Table 3 - Robustness to Gaussian noise (adaptive gain factor)

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

CWR     Gaussian noise  SNR     Error (E)             Norm. corr.  Quality of    Effect of noise  Recovered watermarks
(dB)    μ   σ           (dB)    1      2      3       ρ            stego-signal  on stego-signal  1   2
23.90   0   .0009       42.59   .071   .6202  .203    0.9599       3             2                3   5
29.90   0   .0046       28.70   .784   1.076  .957    0.3275       1             4                5   5
29.90   0   .0009       42.66   .203   .8418  .433    0.8657       1             2                3   5
29.90   0   .0001       62.67   .002   .0754  .007    0.9983       1             1                1   3
26.91   0   .0046       28.64   .703   1.087  .869    0.4396       2             4                4   5
26.90   0   .0009       42.64   .115   .7401  .291    0.9253       2             2                3   5
22.04   0   .0009       42.66   .058   -      -       0.9500       3             2                3   -
26.04   0   .007        25.09   .849   -      -       0.2435       2             4                5   -
26.91   0   .0069       25.13   .959   -      -       0.3071       2             4                5   -
26.05   0   .0046       28.65   .709   -      -       0.3541       2             4                5   -
26.05   0   .0009       42.65   .135   -      -       0.8873       2             2                3   -
25.05   0   .0009       42.65   .119   -      -       0.9061       2             2                3   -
28.05   0   .0009       42.63   .179   -      -       0.8364       1             3                3   -
28.04   0   .0001       62.68   .002   -      -       0.9979       1             1                2   -

 

 

 

 

 

 

 

 

 

 

 

 

 

 

[Figure 11 panels: cover-signal, encrypted stego-signal and mean recovered watermark; histogram of the watermark recovery signal after noise addition, with number of samples on the vertical axis]

Figure 11. Robustness of TEC watermarking to Gaussian noise (adaptive gain factor). 48387 samples of the speech “Theodore Roosevelt talks about Wilson and Taft” [32] were used as the cover-signal. The “Lena” image was used as the watermark. (a) The cover-signal, encrypted stego-signal, the stego-signal distorted by the addition of Gaussian noise and the watermarks recovered from the distorted stego-signal. The recovered watermarks were associated with quality ratings of 3, 5 and 4, respectively (Table 3). (b) Mean recovered watermark (quality rating of 3). (c) Histogram of the watermark recovery signal before and after the addition of Gaussian noise. The shape of the histogram before attack indicates the use of an adaptive gain factor for watermark embedding. (d) One of the recovered watermarks. Also shown are the histograms of the original and recovered watermarks, and the encrypted watermark reshaped into an array.

Table 3 shows the results obtained by testing the robustness of TEC speech
watermarking against Gaussian noise when the masking process uses an adaptive gain
factor (also see Figure 11). Hence, the CWR is a constant throughout the speech. The
error, determined according to (16) is also tabulated. In addition to the CWR and the
SNR, robustness (and the error) is inﬂuenced by the intensity of the speech. On
comparing Tables 2 and 3, it is inferred that for a given quality of the watermarking
process, better robustness is exhibited by speech watermarked using an adaptive gain
factor. This fact suggests the use of masking algorithms that exploit the perceptual
properties of the human auditory system for increased robustness. The ultimate aim would be
to achieve robustness such that the quality of the recovered watermarks is better than, or
comparable to, the effect of noise on the stego-signal, even when the embedded
watermarks are imperceptible. That is, the rating in the recovered watermarks or at least
the mean recovered watermark column (see Table 2 or 3) is less than or equal to the
rating in the effect of noise on stego-signal column. At present, such results are achieved
only for CWRs that cause watermarks to be at least mildly perceptible. A similar
behavior was observed when experiments were conducted using non-zero mean Gaussian
noise.

Experiments were conducted to study the robustness of TEC watermarking to uniformly distributed noise (see Figure 12). The results are tabulated in Tables 4 and 5. When the embedded watermarks were mildly perceptible, at least one of the recovered watermarks was identifiable for an SNR of approximately 30 dB for the adaptive gain factor case. When the CWR was 31.4 dB, the mean recovered watermark was identifiable for an SNR of 28.34 dB. Taking into account the CWR, robustness of TEC watermarking tends to be better in the presence of uniformly distributed noise than in the presence of Gaussian noise.

Table 4 - Robustness to uniformly distributed noise (constant gain factor)

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

CWR (dB), frame      Mean CWR  Noise (uniform) SNR    Norm. corr.  Quality of    Effect of noise  Recovered watermarks
1     2     3        (dB)      μ      max.     (dB)   ρ            stego-signal  on stego-signal  M   1   2   3
31.4  19.1  27.6     26.0      .0046  .0092    27.40  0.8876       2             4                5   5   5   5
31.4  19.1  27.6     26.0      .0023  .0046    33.41  0.9259       2             3                4   5   5   5
31.4  19.1  27.6     26.1      .0012  .0023    39.43  0.9625       2             2                3   3   3   3
26.4  14.1  22.6     21.1      .0041  .0083    28.34  0.9251       2             3                3   4   4   4
26.4  14.1  22.6     21.0      .0023  .0046    33.39  0.9568       2             3                3   3   3   3
21.4  9.16  17.6     16.1      .0046  .0093    27.38  0.9512       3             4                3   4   4   4
21.4  9.15  17.6     16.1      .0035  .0070    29.09  0.9647       3             3                3   3   3   3

Table 5 - Robustness to uniformly distributed noise (adaptive gain factor)

CWR     Noise (uniform) SNR     Error              Norm. corr.  Quality of    Effect of noise  Recovered watermarks
(dB)    μ      max.     (dB)    1     2     3      ρ            stego-signal  on stego-signal  M   1   2   3
29.89   .0046  .0092    27.39   .178  .294  .233   0.8103       1             4                5   5   5   5
29.91   .0035  .0069    29.90   .141  .286  .209   0.8365       1             3                5   4   5   5
29.90   .0012  .0023    39.44   .046  .214  .099   0.9351       1             2                3   3   5   3
26.90   .0035  .0069    29.92   .119  .271  .172   0.8682       2             4                4   4   5   4
23.92   .0042  .0083    28.29   .111  .259  .157   0.8838       3             4                4   3   5   5
23.90   .0035  .0069    29.90   .077  .248  .138   0.9006       3             3                3   3   5   4

 

 

 

 

 

 

 

 

 

 

 

 

 

 

[Figure 12 panels: cover-signal, encrypted stego-signal, distorted stego-signal and mean recovered watermark; histogram of the watermark recovery signal after noise addition, with amplitude on the horizontal axis and number of samples on the vertical axis]

Figure 12. Robustness of TEC watermarking to uniformly distributed noise (constant gain factor). 48387 samples of the speech “Theodore Roosevelt talks about Wilson and Taft” [32] were used as the cover-signal. The “Lena” image was used as the watermark. (a) The cover-signal, encrypted stego-signal, the stego-signal distorted by the addition of noise and the watermarks recovered from the distorted stego-signal. The recovered watermarks were associated with a quality rating of 4 (Table 4). (b) Mean recovered watermark (quality rating of 3). (c) Histogram of the watermark recovery signal before and after the addition of Gaussian noise. (d) One of the recovered watermarks. Also shown are the histograms of the original and recovered watermarks, and the encrypted watermark reshaped into an array.


4.3 Robustness to cropping

The DP algorithm for the detection of cropped speech samples and watermark recovery is described in Section 3.2.1. The Matlab implementation differs slightly from the description found in Section 3.2.1. This deviation was necessary to avoid the “out of memory” problems encountered in Matlab when the algorithm was used for a speech sequence consisting of more than approximately 7000 samples.

4.3.1. Implementation details of the modified DP algorithm

The algorithm described in Chapter 3 requires a matrix of size O(TS), where T
and S are the lengths of the cropped and original stego-signals, respectively. The values
of T and S employed here result in out of memory problems when the unaltered DP
algorithm is implemented in Matlab. One simple remedy would be to break the
long speech sequence (greater than 7000 samples) into shorter sequences and to apply the
algorithm separately to each of them. However, this would necessitate the determination
of the exact number of cropped samples in each of the shorter segments. For this, the
exact end-points may have to be determined by cross-correlation between the original
stego-signal and the cropped speech segment in the appropriate region. Such an approach
may not perform optimally in the presence of noise, as noise might hinder the accurate
determination of the end-points.

Hence, the implementation of the algorithm was modified to avoid both the out of
memory problems and the need to determine the exact number of cropped samples in each
of the shorter speech segments. The modified form determines the global best path and
requires p matrices, each of which comprises m rows and m+N columns. Here, m is a
number less than 8000 and greater than N, the number of cropped samples. The
modification involves dividing the cropped stego-signal into frames of m samples each,
except the last one, which may contain fewer than m samples. p is the total number of
frames, excluding the last one. The search constraints described in Section 3.2.1 are
applied here (Figure 13). The algorithm proceeds as in the original version by
assigning costs to nodes and applying the Bellman optimality principle. During the transition
from one frame to another, the costs associated with the last N+1 nodes [in the case of the
first frame they include nodes (t(m), s(j)), m ≤ j ≤ m+N] of the previous frame are taken
as the initial costs associated with the next frame. Backtracking information for each of
the m-sample frames is stored in p matrices of dimension (m, m+N). At the end of the
last (that is, the (p+1)th) frame, the global best path is chosen from the N+1 optimal paths by
selecting the one with the least cost. The global best path is then reconstructed by
backtracking across the frames.

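[Figure 13. Search grid: samples of the cropped stego-signal, t(1), t(m+1), t(2m+1), ..., t(T), on the horizontal axis; samples of the original stego-signal, s(N+1), s(m+1), s(m+1+N), s(2m+1), s(2m+1+N), ..., on the vertical axis; the origin node is (0,0).]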

Figure 13. Modified implementation of the DP algorithm


The modified algorithm involves the following steps:
i) Initialization: The origin node is (0,0) and the nodal cost associated with it is zero.
(0,0) is the only predecessor associated with nodes (t(1), s(j)), j = 1, ..., (1+N).
    $D_{\min}(0,0) = d_n(0,0)$, $i = j = 0$.
    $\delta_1(j) = D_{\min}(0,0)$
ii) Recursion:
    For k = 1, ..., p
        For i = (k-1)m+1, ..., km
            For j = i, ..., (i+N)
                $D_{\min}(i,j) = \min_{(i-1,q)} \{ D_{\min}(i-1,q) + d_n(i,j) \}$, $q = (i-1), \ldots, (j-1)$
                ($D_{\min}(i-1,j)$ is held in $\delta_1(j)$; q indexes the candidate predecessor nodes.)
                Record $\psi_k(i,j)$.
                $\psi_k(i,j)$ = the index of the predecessor node to (i, j) in the kth frame.
                $\delta_1(j) = D_{\min}(i,j)$
            Next j
        Next i
    Next k
iii) Termination:
    For i = km+1, ..., T (with k = p)
        For j = i+1, ..., S
            $D_{\min}(i,j) = \min_{(i-1,q)} \{ D_{\min}(i-1,q) + d_n(i,j) \}$, $q = (i-1), \ldots, (j-1)$
            ($D_{\min}(i-1,j)$ is held in $\delta_1(j)$.)
            Record $\psi_{p+1}(i,j)$.
            $\psi_{p+1}(i,j)$ = the index of the predecessor node to (i, j) in the last frame.
            $\delta_1(j) = D_{\min}(i,j)$
        Next j
    Next i
    The best path is the one associated with the least cost:
    $\min_j \{ D_{\min}(T,j) \}$, $j = T, \ldots, (T+N)$
iv) Reconstruction: By backtracking through the (p+1) frames, the global best path is
obtained. The best path identifies the samples of the original stego-signal that are
present in the cropped stego-signal. The cropped samples are those that are not present in
the cropped stego-signal. The reconstructed stego-signal can be obtained easily by
reinserting the cropped samples at the appropriate places in the cropped stego-signal.
v) Watermark recovery: The watermark recovery process is applied to the reconstructed
stego-signal.
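
The steps above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the Matlab implementation used in this work: the nodal cost is assumed to be the absolute difference between samples, N = S - T is assumed known, and each frame's backtracking data is stored in band (offset) coordinates, m rows by N+1 columns, rather than as m-by-(m+N) matrices. The function and variable names are hypothetical.

```python
def detect_cropped(original, cropped, frame_len=4):
    """Return indices of `original` samples that are missing from `cropped`.

    Node (i, o) pairs cropped[i] with original[i + o], where the offset o
    (0..N) counts the samples cropped so far. Assumed nodal cost: |t(i) - s(j)|.
    """
    S, T = len(original), len(cropped)
    N = S - T  # number of cropped samples, assumed known
    if N < 0:
        raise ValueError("cropped signal is longer than the original")

    # Least accumulated cost per offset; the origin node has zero cost and is
    # the only predecessor of every node in the first row.
    costs = [0.0] * (N + 1)
    frames = []  # one block of backpointers per frame

    for start in range(0, T, frame_len):
        block = []
        for i in range(start, min(start + frame_len, T)):
            new_costs, back = [], []
            best, best_o = float("inf"), 0
            for o in range(N + 1):
                # Predecessors lie in the previous row at any offset <= o.
                if costs[o] < best:
                    best, best_o = costs[o], o
                new_costs.append(best + abs(cropped[i] - original[i + o]))
                back.append(best_o)
            costs = new_costs  # carried across frame boundaries
            block.append(back)
        frames.append(block)

    # Termination: choose the least-cost end node among the N+1 candidates.
    o = min(range(N + 1), key=costs.__getitem__)

    # Reconstruction: backtrack through the frames, last frame first.
    matched = [0] * T
    i = T - 1
    for block in reversed(frames):
        for back in reversed(block):
            matched[i] = i + o
            o = back[o]
            i -= 1

    kept = set(matched)
    return [j for j in range(S) if j not in kept]
```

Only the cost vector of N+1 entries is carried from one frame to the next, which mirrors the frame transition described above. With repeated sample values several zero-cost paths may exist, so this sketch is only unambiguous when the matched samples are distinguishable.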
Computational requirements for this modified implementation are the same as
those for the original algorithm of Section 3.2.2. Instead of a single matrix of size $O(TS)$,
the modified implementation requires p matrices of size $O(m^2 + mN)$, where m is small
compared to T. This modification solves the out of memory problems.
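
To make the saving concrete, the following sketch counts stored entries under both schemes. The values of T, N and m in the usage note are illustrative assumptions, not figures from the experiments, and the helper name is hypothetical. The frame count is rounded up, so the framed total is an upper bound.

```python
import math


def backtracking_entries(T, N, m):
    """Count matrix entries for the single-matrix and framed schemes.

    T: length of the cropped stego-signal
    N: number of cropped samples
    m: frame length, with N < m
    """
    S = T + N                      # length of the original stego-signal
    full = T * S                   # one T-by-S matrix
    frames = math.ceil(T / m)      # upper bound on the number of frame matrices
    per_frame = m * (m + N)        # one m-by-(m+N) matrix
    return full, frames * per_frame, per_frame
```

For example, `backtracking_entries(48000, 150, 4000)` gives about 2.3 billion entries for the single matrix, against twelve matrices of 16.6 million entries each for the framed form.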

4.3.2. Experimental results

As an example, the DP algorithm was applied to a cropped stego-signal
watermarked using TEC. The cover-signal was obtained from the TIMIT speech database
[33] and has a male voice saying: “She had your dark suit in greasy wash water all year.”
In Figure 14, 7968 speech samples were used. Quasi m-arrays of dimension 63x63 were
used for encryption.


[Figure 14. Panels: (a) cover-signal, encrypted stego-signal, and cropped stego-signal; (b) cover-signal, encrypted stego-signal, and reconstructed stego-signal.]

Figure 14. DP algorithm for watermark recovery. (a) Cropped stego-signal and the
watermarks recovered from the cropped speech. (b) Reconstructed stego-signal obtained
after the application of the DP algorithm, and the watermarks recovered from the
reconstructed stego-signal.


After the cover-signal was TEC watermarked, 150 samples of the stego-signal
were randomly cropped using the robustness testing engine. The watermarks recovered
from the cropped stego-signal are shown in Figure 14(a). The DP algorithm was then
applied to the cropped stego-signal to detect the cropped samples and to reconstruct the
stego-signal. The cropped samples were accurately determined and the watermarks were
recovered from the reconstructed stego-signal (Figure 14(b)).
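
A random cropping attack of this kind can be imitated in a few lines. The sketch below is not the robustness testing engine used in the experiments; the function name and the seeding scheme are assumptions made for illustration.

```python
import random


def crop_random(signal, n_crop, seed=None):
    """Remove n_crop samples at random positions, preserving sample order.

    Returns the cropped signal and the sorted indices that were removed,
    so that a detection algorithm's output can be checked against them.
    """
    rng = random.Random(seed)
    removed = sorted(rng.sample(range(len(signal)), n_crop))
    removed_set = set(removed)
    cropped = [x for i, x in enumerate(signal) if i not in removed_set]
    return cropped, removed
```

Keeping the list of removed indices allows the detected samples to be compared directly with the samples that were actually cropped.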

The robustness of the DP algorithm was tested with varying CWRs and varying
numbers of cropped samples. In the absence of additive noise, the cropped samples were
accurately determined under all tested conditions.

Table 6 — Robustness to cropping and additive noise (adaptive gain factor)

 

 

 

 

 

 

 

 

 

 

 

 

CWR      Gaussian noise       SNR      Number of  Error   Normalized   Cropped samples
(dB)     μ    σ               (dB)     cropped    (E)     correlation  accurately determined
                                       samples            ρ            (Yes/No)
32.2197  0    2.546x10^?      44.9662  19         0.2663  0.9134       Yes
32.1181       0.0013          30.8328  19         0.8355  0.4365       Yes
32.2246       0.0023          25.7200  19         0.9234  0.3252       Yes
31.9115       0.0026          25.3634  19         0.9902  0.2323       Yes
32.0573       0.0025          25.2290  92         1.0290  0.1167       Yes
31.9341       0.0128          11.3201  3          1.2426  0.0391       Yes
32.2893       0.0115          12.464   19         1.1785  0.0296       Yes
31.8826       0.0115          12.4326  92         1.2323  0.0861       No
26.0974       0.0115          12.2692  92         1.2043  0.0960       No

 

 

 

 

 

 

 

 


 

4.4. Robustness to cropping in the presence of noise

In the previous section, it was determined that in the absence of noise, cropped
samples were accurately determined. The DP algorithm was also tested for watermark
recovery from stego-signals distorted by additive noise as well as cropping. TEC speech
watermarking was found to be fairly robust (Section 4.3) to additive noise alone when the
embedded watermarks were not perfectly imperceptible. It is important for the DP algorithm
to tolerate noise, at least to the extent to which TEC speech watermarking is robust
against additive noise.

Using Ruiz’s robustness testing engine, the stego-signal was randomly cropped
and subjected to additive noise. In all the experiments (Table 6), 961 samples of the
utterance, “She had your dark suit in greasy wash water all year” [33] were used. The
accuracy of the DP algorithm was verified by comparing the actual cropped samples with
the missing samples detected by the algorithm. The performance is mainly dependent on
the SNR. The algorithm is robust for an SNR of 11.3 dB or above. As the SNR approaches
the 11.3 dB threshold, the performance degrades as the number of cropped samples
increases (see Table 6).
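
For readers reproducing such a test, the sketch below adds zero-mean Gaussian noise and reports the resulting SNR. It assumes the conventional definition of SNR as the ratio of signal energy to noise energy expressed in dB; the definition used by the testing engine may differ, and the function names are hypothetical.

```python
import math
import random


def snr_db(signal, noise):
    """SNR in dB, assumed here to be 10*log10(signal energy / noise energy)."""
    p_signal = sum(x * x for x in signal)
    p_noise = sum(n * n for n in noise)
    return 10.0 * math.log10(p_signal / p_noise)


def add_gaussian_noise(signal, sigma, seed=None):
    """Add zero-mean Gaussian noise with standard deviation sigma."""
    rng = random.Random(seed)
    noise = [rng.gauss(0.0, sigma) for _ in signal]
    noisy = [x + n for x, n in zip(signal, noise)]
    return noisy, noise
```

Under this definition, scaling the noise standard deviation by a factor of ten lowers the SNR by 20 dB.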

The DP algorithm tolerates additive noise well beyond the robustness threshold
of TEC speech watermarking (an SNR of approximately 30 dB when the embedded watermarks
were mildly perceptible). Experiments have confirmed that over the range of importance,
that is, when the recovered watermarks are identifiable, the algorithm is reliable.


4.5. Conclusions

The salient points from the experiments described above are summarized as follows:

• The robustness of TEC speech watermarking to additive noise is mainly dependent
on the SNR and CWR. Higher SNRs and lower CWRs contribute to increased
robustness. The need to maintain the perceptual transparency of the embedded
watermark imposes a lower limit on the CWR.

• When the watermark masking algorithm involves the use of an adaptive gain factor,
better robustness is exhibited by watermarks embedded in the higher intensity regions
of speech.

• The DP algorithm for the detection of cropped samples and subsequent watermark
recovery performs with 100% accuracy in the absence of noise. In the absence of
noise, the performance is independent of the number of samples cropped.

• In the presence of cropping and uncorrelated additive noise, the performance of the
DP algorithm is mainly determined by the SNR and the number of cropped samples.

• Unlike most watermarking techniques [9,10], TEC watermarking admits watermark
recovery, not just watermark detection. If the watermark contains information
supporting the owner or title, then, on recovery, this information will lend greater
credence to the claim of true ownership.


Chapter 5

FUTURE WORK

Digital watermarking is an emerging technology and it faces problems typical of
many new signal-processing endeavors. The main problems include difficulty in dealing
with many types of attacks, a lack of standard tools with which to assess and compare
watermarking schemes, and the lack of clear definitions of watermarking requirements
[42]. As the need for the technology increases, these problems will have to be resolved if
the methods are to be effective and therefore accepted by those depending on the
technology for copyright protection. In addition to the challenges common to all
watermarking techniques, TEC speech watermarking, as described in this thesis, requires
further research in a number of areas. Some areas identified for future work are as
follows:

• Robustness of TEC watermarking to other attacks [45] must be studied. In particular,
study of the robustness to signal-processing transformations like resampling,
compression, filtering and quantization is of importance. These transformations may
be the consequence of routine and unintentional operations on the stego-signal. Some
of the other deliberate attacks to be studied include collusion attacks, cryptographic
attacks and time-scale modification. While studying the robustness, it is also
essential to test TEC watermarking against a combination of two or more attacks.
Such a robustness study will entail developing a more elaborate robustness testing engine.

• The watermark-masking algorithm, as described in this document, involves scaling the
watermark by a gain factor in accordance with the CWR. Future work in this area
comprises the application of masking algorithms that exploit the perceptual properties
of the human ear. Application of perceptual model-based masking algorithms
becomes necessary for watermarking due to the rigid requirements of imperceptibility
and robustness. Such an application must be viewed in conjunction with robustness
against perceptual model-based compression algorithms like MPEG.

• Not much research has been done in the field towards understanding the embedding
capacity [31] offered by the different watermarking techniques. Research must be
done to determine the best strategies for utilizing the embedding capacity so as to
fulfill watermarking and channel bandwidth requirements.

• An appropriate audio transform coding strategy must be implemented to effect
compression in conjunction with watermarking. In such a scenario, it will be
important to use audio or speech rather than image watermarks.

• In the context of various attacks (although this did not matter for additive noise and
cropping), some classes of watermarks may perform better than others. Future work will
also pursue understanding the watermark characteristics that result in optimum
watermarks in the presence of a particular attack.

• TEC was initially developed as an image compression algorithm [3]. Application of
TEC watermarking to images and the performance appraisal are areas in need of
further work.


REFERENCES

1. National Gallery of the Spoken Word, Michigan State U.,
http://www.ngsw.msu.edu

2. Fco. J. Ruiz and J.R. Deller, Jr., “Digital watermarking of speech signals for the
National Gallery of the Spoken Word,” International Conference on Acoustics,
Speech and Signal Processing 2000, Istanbul, May 2000.

3. C.J. Kuo, J.R. Deller, Jr. and A.K. Jain, “Pre/post-filter performance improvement of
transform coding,” Signal Processing: Image Communication, vol. 8, pp. 229-239,
1996.

4. C.J. Kuo and H.B. Rigas, “2-D quasi m-arrays and gold code arrays,” IEEE
Transactions Information Theory, vol. 37, pp. 385-388, March 1991.

5. J.R. Deller, Jr., J.H.L. Hansen, and J.G. Proakis, Discrete Time Processing of Speech
Signals (2d ed.), New York: IEEE Press, 2000.

6. A. Gurijala and J.R. Deller, Jr., “Robust algorithm for watermark recovery from
cropped speech,” Proc. IEEE International Conference on Acoustics, Speech and
Signal Processing 2001, Salt Lake City, May 2001 (in press).

7. J.R. Deller, Jr., A. Gurijala, and M.S. Seadle, “Audio watermarking techniques for
the National Gallery of the Spoken Word,” Proc. 1st ACM-IEEE Joint Conference on
Digital Libraries 2001, Roanoke, Virginia, June 2001 (in press).

8. F.A.P. Petitcolas, R.J. Anderson and M.G. Kuhn, “Information Hiding - A Survey,”
Proceedings of the IEEE, special issue on protection of multimedia content, pp.
1062-1078, July 1999.

9. L. Boney, A.H. Tewfik and K.N. Hamdy, “Digital watermarks for audio signals,”
Proc. IEEE International Conference on Multimedia Computing and Systems,
Hiroshima, pp. 473-480, June 1996.

10. P. Bassia and I. Pitas, “Robust audio watermarking in the time domain,” Proc. IX
European Signal Processing Conference, Rhodes, Greece, vol. I, pp. 25-28, Sept.
1998.

11. M. Arnold, “Audio watermarking: features, applications and algorithms,” Proc.
IEEE International Conference on Multimedia and Expo (II), pp. 1013-1016, 2000.

12. W. Bender, D. Gruhl, N. Morimoto, and A. Lu, “Techniques for data hiding,”
IBM Systems Journal, vol. 35, pp. 313-336, 1996.

13. C.S. Lu, H.Y.M. Liao, and L.H. Chen, “Multipurpose Audio Watermarking,”
Proc. 15th International Conference on Pattern Recognition, Barcelona, Spain, vol.
III, pp. 286-289, Sept. 2000.

14. I.J. Cox, J. Kilian, T. Leighton, and T. Shamoon, “Secure spread spectrum
watermarking for multimedia,” IEEE Transactions on Image Processing, vol. 6, pp.
1673-1687, Dec. 1997.

15. S. Voloshynovskiy, S. Pereira, T. Pun, J.K. Su, J.J. Eggers, “Attacks and
Benchmarking,” submitted to IEEE Communication Magazine, 2001.

16. I.J. Cox, M.L. Miller, J.M.G. Linnartz, T. Kalker, “A review of watermarking
principles and practices,” Digital Signal Processing for Multimedia Systems, K.K.
Parhi, T. Nishitani (eds), New York: Marcel Dekker, Inc., pp. 461-485, 1999.

17. G.C. Langelaar, I. Setyawan, and R.L. Lagendijk, “Watermarking digital image and
video data,” IEEE Signal Processing Magazine, vol. 17, pp. 20-46, Sept. 2000.

18. I.J. Cox, M.L. Miller, and J.A. Bloom, “Watermarking applications and their
properties,” Proc. IEEE International Conference on Information Technology:
Coding and Computing, pp. 6-10, Mar. 2000.

19. M. Wu and B. Liu, “Watermarking for image authentication,” Proc. IEEE
International Conference on Image Processing, Chicago, vol. 2, pp. 437-441, Oct.
1998.

20. I.J. Cox and J.P. Linnartz, “Some general methods for tampering with watermarks,”
IEEE Journal on Selected Areas of Communication, vol. 16, pp. 587-593, May 1998.

21. M. Ramkumar, A.N. Akansu, “Robust protocols for proving ownership of images,”
Proc. IEEE International Conference on Information Technology: Coding and
Computing, Las Vegas, pp. 22-27, Mar. 2000.

22. Z. Duric, N.F. Johnson, and S. Jajodia, “Recovering watermarks from images,”
Information and Software Engineering Technical Report, ISE-TR-99-04, Apr. 1999.

23. G. Voyatzis and I. Pitas, “The use of watermarks in the protection of digital
multimedia products,” Proceedings of the IEEE, vol. 87, pp. 1197-1207, July 1999.

24. M. Ramkumar, A.N. Akansu, “Image watermarks and counterfeit attacks: Some
problems and solutions,” Content Security and Data Hiding in Digital Media,
Newark, NJ, May 1999.

25. M. Kutter, S. Voloshynovskiy and A. Herrigel, “Watermark copy attack,”
IS&T/SPIE’s 12th Annual Symposium, Electronic Imaging 2000: Security and
Watermarking of Multimedia Content II, San Jose, vol. 3971, Jan. 2000.

26. J.K. Su, J.J. Eggers, and B. Girod, “Capacity of digital watermarks subjected to an
optimal collusion attack,” Proc. European Signal Processing Conference, Tampere,
Finland, Sept. 2000.

27. J.J. Eggers, J.K. Su, and B. Girod, “Asymmetric watermarking schemes,” GI
Jahrestagung Informatik 2000, Sicherheit in Mediendaten, Berlin, Sept. 2000.

28. F.A.P. Petitcolas, “Watermarking schemes evaluation,” IEEE Signal Processing
Magazine, vol. 17, Sept. 2000.

29. F.A.P. Petitcolas and R.J. Anderson, “Evaluation of copyright marking systems,”
IEEE Multimedia Systems, Florence, Italy, vol. 1, pp. 574-579, June 1999.

30. F.A.P. Petitcolas, “unZign: is your watermark secure?,” In
http://www.cl.cam.ac.uk/~fapp2/watermarking/image watermarkinglunzignz

 


31. C. Candan and N. Jayant, “A new interpretation of data hiding capacity,” Proc.
IEEE International Conference on Acoustics, Speech and Signal Processing 2001,
Salt Lake City, May 2001 (in press).

32. “Theodore Roosevelt talks about Wilson and Taft” audio file, Vincent Voice Library,
Michigan State U. Libraries, http://www.lib.msu.edu/vincent/t roosevelt.m

33. W.M. Fisher, G.R. Doddington, and K.M. Goudie-Marshall, “The DARPA speech
recognition research database: Specifications and status,” Proc. DARPA Speech
Recognition Workshop, pp. 93-99, 1986.

34. C.G. Martin, “Digital image watermarking techniques,” In
http://home.att.net/~steamedcrab/masterns.Ef

35. Fco. J. Ruiz, A.A. Gokhale, and J.Y. Lee, Unpublished report, Class research project,
ECE 966A, Michigan State U., East Lansing, Fall 1999.
