This is to certify that the
dissertation entitled

SPEECH WATERMARKING THROUGH PARAMETRIC
MODELING

presented by

APARNA GURIJALA

has been accepted towards fulfillment
of the requirements for the

Ph.D. degree in Electrical Engineering

 

 

 



 

SPEECH WATERMARKING THROUGH PARAMETRIC
MODELING

By

Aparna Gurijala

A DISSERTATION

Submitted to
Michigan State University
in partial fulfillment of the requirements
for the degree of

DOCTOR OF PHILOSOPHY
Department of Electrical Engineering

2007

ABSTRACT

SPEECH WATERMARKING THROUGH PARAMETRIC
MODELING
By

Aparna Gurijala

Parameter-embedded watermarking of speech is effected through slight
perturbations of parametric models of deeply-integrated dynamics of the
signal. This research focuses on speech watermarking techniques based on
linear-in-parameters speech models. Information is embedded by modifying
the linear predictor coefficients of the original speech, subject to
fidelity constraints. The modified parameters are used to reconstruct the
watermarked speech. Experiments with real speech data are used to assess
robustness and other performance properties. A particular example of
watermark detector design is discussed and its performance is tested.

In set-membership filtering (SMF) based parametric watermarking, linear
predictor (LP) coefficients of the original speech are modified subject to
an objective fidelity constraint. SMF is used to obtain a hyperellipsoidal
set of allowable parameter perturbations (i.e., watermarks) subject to a
constraint on the error between the watermarked and original material.
This research discusses the robustness of SMF-based watermarking to
filtering, quantization and combination attacks. An important
consideration in watermark robustness is the energy of the watermark
signal (difference between watermarked and original signals). Watermarks
of higher energy are obtained from perturbed LP coefficients at the
boundary of the hyperellipsoidal set. A constrained optimization problem
is solved to obtain the best watermarks for filtering and quantization
attacks.

Finally, a generalized framework for parametric speech watermarking is
presented. In addition to the LP model, other parametric representations
such as log area ratio, inverse sine, line spectrum pair, and reflection
coefficients are used for speech watermarking. An application of perturbed
parameter theory for autoregressive models is presented. The perturbed
parameter theory is used to obtain bounds on the perturbation of the
stegosignal caused by watermarking.

ACKNOWLEDGMENTS

I would like to thank my faculty advisor Dr. Jack Deller for his
guidance, patience, understanding and support throughout my grad-
uate studies at Michigan State University. I sincerely thank him for
providing me with an opportunity to conduct research in speech wa-
termarking and for creating a conducive learning environment. I am
very grateful to my PhD committee, Drs. Aviyente, Jain, Radha, and
Seadle, for their concern, patience, and encouragement. The classroom
experience and research discussions with my professors have invaluably
contributed to my knowledge and understanding.

I would especially like to thank my family for their love, patience and
understanding. My parents always put great emphasis on education
and were willing to support me in every possible way. I would also like
to acknowledge my brother, Ashok, for his encouragement and his great
interest in technology. I would like to thank Ali for his encouragement
and conﬁdence in me and for the numerous research discussions we had.
I would like to thank Mujahid, Dale, and Margaret for their kindness,
encouragement and support throughout my PhD program. Ali, Mu-
jahid, Dale, and Margaret made my stay at MSU a very memorable
one.

This work is supported by the National Science Foundation of the United
States under Cooperative Agreement No. IIS-9817485. I would like to
acknowledge NSF for their generous support to the National Gallery of
Spoken Word project. Any opinions, findings, and conclusions or
recommendations expressed in this material are those of the author and do
not necessarily reflect the views of the National Science Foundation.

Table of Contents

List of Tables
List of Figures
1 Introduction
2 Background
  2.1 Speech watermarking
    2.1.1 Spread spectrum watermarking
    2.1.2 Watermarking integrated with speech synthesis
    2.1.3 Pitch and duration modification for watermarking
  2.2 Set-membership filtering
    2.2.1 Overview of SMF
    2.2.2 Set-membership weighted recursive least squares
  2.3 Lagrange Multipliers
3 Parametric Speech Watermarking in the LP Domain
  3.1 Introduction
    3.1.1 An algorithm for LP parametric watermarking
    3.1.2 Recovering LP parameter-embedded watermarks
    3.1.3 Perceptual aspects of LP parametric watermarking
    3.1.4 Security issues
    3.1.5 A detection algorithm for LP parametric watermarking
  3.2 Experiments and discussion
    3.2.1 Introduction
    3.2.2 Subjective perceptual tests
    3.2.3 Watermark robustness
4 LP Parametric Watermarking with a Fidelity Constraint
  4.1 Introduction
  4.2 SMF parametric watermarking
  4.3 Robustness optimization
    4.3.1 Optimal watermarks for a filtering attack
    4.3.2 Optimal watermarks for a quantization attack
    4.3.3 Maximizing watermark energy
  4.4 Experiments and discussion
5 Generalizations and Extensions
  5.1 Introduction
  5.2 Generalized framework for parametric watermarking
  5.3 Experiments and discussion
    5.3.1 Subjective perceptual tests
    5.3.2 Robustness experiments
  5.4 Perturbed parameter models in watermarking
    5.4.1 Time-varying AR models in watermarking
    5.4.2 Application of perturbed parameter Markov equations to watermarking
6 Conclusions
Bibliography
List of Tables

2.1 SM-WRLS algorithm
3.1 Watermark embedding algorithm
3.2 Watermark recovery algorithm
3.3 Effect of selective normalization
3.4 Estimates of SNR, $d^2$, $P_D$ and $P_F$
3.5 Robustness to speech coding
4.1 Robustness to quantization attacks
5.1 Generalized watermark embedding algorithm
5.2 Generalized watermark recovery algorithm
5.3 Conversion of reflection coefficients to LP coefficients
5.4 Conversion of LP coefficients to reflection coefficients
5.5 Robustness to speech coding at CWRseg of 7 dB
5.6 Robustness to speech coding at CWRseg of 27 dB
List of Figures

3.1 Typical noise distribution in the LP domain for any coefficient. For
    Fig. 3.1(a) 15 dB white noise was added in time domain to the
    stegosignal, and for Fig. 3.1(b) 15 dB colored noise was added to the
    stegosignal.

3.2 Effect of complete normalization, selective normalization, and no
    normalization of watermark coefficients on the correlation coefficient
    between original and recovered watermarks. In 3.2(a) the stegosignal
    was distorted by white noise in the time domain, and in 3.2(b) colored
    noise was added to the stegosignal.

3.3 Plots of (a) coversignal and (b) stegosignal at CWRseg of 7.715 dB.
    The coversignal and the stegosignal are of 1 s duration and sampled at
    16 kHz. The speech is divided into frames of 2000 samples and a
    watermark vector is embedded into each of the eight frames.

3.4 Segments of cover (dotted line) and stegosignals (continuous line) of
    480 samples, or 0.03 s duration, at a CWRseg of 7.715 dB. The cover and
    stegosignals used in the robustness experiments are of 1 s duration and
    sampled at 16 kHz. The speech is divided into frames of 2000 samples
    and a watermark vector is embedded into each of the eight frames.

3.5 Watermark robustness to white noise attack. Performance of parametric
    watermarking at CWRseg values of 7.715 dB and 10.68 dB is compared with
    that of SS watermarking at 7.715 dB, 10.68 dB, 27 dB and 30 dB CWRseg.

3.6 Watermark robustness to colored noise attack. Colored noise was
    generated by lowpass filtering white noise.

3.7 Improvement in watermark robustness to colored noise attack due to
    whitening transformation.

3.8 Plots of (a) magnitude spectrum of the watermark coefficients, and
    (b) magnitude response of the attack filter at a normalized cut-off
    frequency of 0.4. A 4th-order IIR Butterworth filter was used to test
    watermark robustness to lowpass filtering.

3.9 Robustness to lowpass filtering. A 4th-order IIR Butterworth filter
    was used to implement the lowpass filtering attack.

3.10 Plots of (a) magnitude spectrum of the original watermark
    coefficients $h[n]$, and (b) magnitude response of the transformed
    watermark coefficients, $(-1)^n h[n]$.

3.11 Robustness to 4th-order Butterworth highpass filter. In (a), the
    embedded watermark coefficients corresponded to the magnitude spectrum
    shown in Fig. 3.10(a), and in (b) the watermark coefficients were
    transformed using equation (3.29) and embedded.

3.12 Robustness to cropping. Samples of the stegosignal were randomly
    cropped. Parameter-embedded watermarking results in improved
    robustness to cropping.

4.1 Filtering attack. For Fig. 4.1(a) a 4th-order IIR Butterworth lowpass
    filter was used to distort the stegosignal, and for Fig. 4.1(b) an
    8th-order FIR highpass filter was used to attack the stegosignal.

4.2 Watermark robustness to combination of non-uniform quantization and
    IIR lowpass filtering attacks.

5.1 The first 100 bits of the 1000-bit binary watermark.

5.2 Effect of white Gaussian noise on LP, LSP, LAR, IS and PARCOR embedded
    watermarks. In 5.2(a) a CWRseg of 7 dB was used to obtain the
    stegosignals, and in 5.2(b) a CWRseg of 27 dB was used to obtain the
    stegosignals.

Chapter 1

Introduction

Digital media and global access to high-speed computer networks
are creating complex copyright issues for owners of legally-protected
materials [1]. A response to the unprecedented need to protect intellec-
tual property has been the emergence of an active research effort into
digital watermarking technologies. Digital watermarking is the process
of embedding data (the watermark) imperceptibly into a host signal
(the coversignal) to create a stegosignal. The term “coversignal” is
commonly used in watermarking literature [2] to denote the host sig-
nal and the term “stegosignal” is borrowed from steganography [3] to
represent the watermarked signal. The watermark is typically a pseudo-
noise sequence, or a sequence of symbols mapped from a message. A
watermark offers copyright protection by providing identifying information
which is accessible only to the owner of the material. Only a watermarked
version of copyrighted material is released to the public.
When copyright questions arise, the watermark is recovered from the
stegosignal as evidence of title. Watermarking has been argued to be
an advantageous solution to this modern copyright problem, and there
is strong evidence that the practice will be accepted by the courts as
proof of title [1].

The design of a watermarking strategy for speech involves the bal-
ancing of two principal criteria. First, embedded watermarks must be
imperceptible to the listener. That is, the stegosignal must be of high
ﬁdelity. Second, watermarks must be robust. That is, they must be
able to survive attacks [4] - those deliberately designed to destroy or
remove them, as well as distortions inadvertently imposed upon the
watermarks by technical processes (e.g., compression) or by systemic
processes (e.g., channel noise). These ﬁdelity and robustness criteria
are generally competing, as greater robustness requires more watermark
energy and more manipulation of the coversignal, which, in turn, lead to
noticeable distortion of the original content. Related measures of a wa-
termark’s efficacy include data payload, the number of watermark bits
per unit of time [2]. Another important requirement of a watermarking
strategy is its security, the inherent protection against unauthorized
removal, embedding or detection. A watermarking scheme generally
derives its security from secret codes or patterns (keys) that are used
to embed the watermark. Only a breach of keying strategies should
compromise the security of a watermarking technique; public knowledge
of the technical method should not lessen its effectiveness.

The speech watermarking methods described in this dissertation
involve private decoding, meaning that the coversignal is required for
watermark recovery. Private decoding techniques require additional in-
formation during watermark detection and recovery. However, among
other benefits, this additional information can be used to undo certain
attacks and distortion. In private decoding techniques, knowledge of the
coversignal at the detector serves as a registration pattern to undo any
temporal or geometric distortions of the stegosignal [2]. For
example, in the case of a “cropping attack,” wherein speech samples
are randomly deleted, a dynamic programming algorithm can be used
in conjunction with the coversignal to recover the watermark from the
desynchronized stegosignal [5]. Although watermarking schemes involv-
ing public decoding (coversignal not required for watermark recovery)
are applicable in a larger set of applications, techniques involving pri-
vate decoding can be used for content tracking, broadcast monitoring,
and owner identiﬁcation, in addition to copyright protection.

Robustness requirements of watermarking algorithms are application
dependent. Watermarking algorithms are broadly categorized into robust and
fragile watermarking algorithms based on the robustness requirements. For
a given application, robust watermarking algorithms are required to
survive all intentional attacks and also distortion introduced by normal
processing. Fragile watermarking algorithms are required to be selectively
robust. For example, in a speech authentication application of
watermarking, the embedded fragile watermarks are required to be robust to
compression, channel noise, and resampling and fragile to content
tampering due to re-embedding and changes to acoustic information. The
algorithms presented in this thesis fall under the robust watermarking
category and were developed for applications such as content management,
broadcast monitoring, and copyright protection.

Watermark embedding techniques vary widely in method and purpose.
Watermarks may be additive, multiplicative, or quantization-based, and may
be embedded in the time domain, or in a transform domain. Each technical
variation tends to be more robust to some forms of attack than to others,
and for this and other application-specific reasons, particular strategies
may be better suited to certain tasks. The methods reported in this
dissertation are motivated by the particular properties of speech
signals [6].

Parametric watermarking is based on manipulation of linear-in-parameters
speech models. The linear prediction (LP) model is a special case of
linear-in-parameters speech models that can be used for watermarking [6].
Generally speaking, the watermark information is concentrated in the few
LP coefficients during the watermark embedding and recovery processes,
while it is dispersed temporally and spectrally otherwise [7]. The
watermark recovery process involves least square error (LSE)
estimation [8] of modified LP coefficients, and this further contributes
to watermark robustness. Parametric watermarking provides sufficient
flexibility in terms of watermark selection for a wide range of data
payload, robustness, and stegosignal fidelity requirements.

In set-membership filtering (SMF) based parametric watermarking, LP
coefficients of the original speech are modified subject to an objective
fidelity constraint. SMF is used to obtain sets of allowable parameter
perturbations (i.e., watermarks) subject to a constraint on the error
between the watermarked and original material. The robustness of SMF-based
watermarking to filtering, quantization and combination attacks is
studied. An important consideration in watermark robustness is the energy
of the watermark signal (difference between watermarked and original
signals). The most robust watermark is obtained from perturbed LP
coefficients at the boundary of the membership set.¹ A constrained
optimization problem is solved to obtain the best watermarks for filtering
and quantization attacks.

¹ This phenomenon is discussed below.

The application that motivated the present work is the creation of the
National Gallery of the Spoken Word (NGSW), an NSF-sponsored Digital
Libraries Initiative II project. The goal of the NGSW effort is the
development and management of an extensive on-line repository of spoken
word collections, based largely on the renowned Vincent Voice Library.
Further information is available at www.lib.msu.edu/vincent/
and in [9].

Owners of copyrighted material are often reluctant to grant permis-
sion to post such material on the internet without sufﬁcient assurances
that their rights will be protected. Accordingly, a prime interest in the
development of the watermarking scheme is the need for robustness to
the broadest possible array of attacks. On the other hand, preserving
the audio history and authenticity of the NGSW materials requires that
robustness not come at the expense of perceptible distortion.

Although the NGSW application places few constraints on computational
load, parametric watermarking can be implemented in real time. Further,
since the NGSW is a permanent, large-scale repository of speech data with
a rich meta-data support structure, the association of relatively detailed
watermarking information with records in the database is not impractical.

Chapter 2

Background

In the last decade many algorithms have been proposed for multi-
media watermarking. Early work emphasized watermarking algorithms
that could be universally applied to a wide spectrum of multimedia con-
tent, including images, video, and audio. This versatility was deemed
conducive to the implementation of multimedia watermarking on com-
mon hardware [10]. However, many watermarking applications, includ-
ing copyright protection for digital speech libraries [11], embedding
patient information in medical records [12,13], or television broadcast
monitoring [14], involve embedding information into a single medium.
Also, the attacks and inherent processing distortions vary depending
on the nature of the data. For example, an attack on watermarked images
may involve rotation and translation operations to disable watermark
detection. However, such an attack is not applicable to audio data.
Watermarking algorithms that are specifically designed for particular
multimedia content can exploit well-understood properties of that content
to better satisfy the robustness, fidelity and data-payload constraints.
For example, unlike general audio, speech is characterized by intermittent
periods of voiced (periodic) and unvoiced (noise-like) sounds. Speech
signals are characterized by a relatively narrow bandwidth, with most
information below 4 kHz. Also, well-established analytical models for
speech production exist [6] which can be exploited in the watermarking
process.

2.1 Speech watermarking

Most existing watermarking algorithms for speech can be catego-
rized into either spread-spectrum (SS) or speech synthesis based ap-
proaches. SS watermarking [10] is one of the earliest and best-known
watermarking algorithms applied to multimedia data. In SS water-
marking, a narrowband watermark is embedded into a wideband “chan-
nel” that is the coversignal. In the second main approach, watermarks
are integrated through speech synthesis. An advantage of integrating
watermarking with the coding process [15] is a reduction in computa-
tional complexity.

In this work, we adopt a new approach that has both spectrum-spreading and
integration-by-synthesis aspects, but which is fundamentally different
from the existing approaches. For speech signals, a parametric approach is
naturally motivated by the extraordinary successes in applying parametric
models (in particular, the LP model) in several key speech technology
areas. The robustness of the LP model to practical anomalies occurring in
coding, recognition, and other applications suggests that some
representation of these parameters might provide an effective basis for
embedding durable watermarking data. Parametric watermarking provides
sufficient flexibility in terms of watermark selection for a wide range of
data payload, robustness, and stegosignal fidelity requirements. In the
strategy described here, LP parameters of speech are directly or
indirectly modified by an added watermark vector. The stegosignal is
constructed by passing the original speech through the modified inverse LP
filter; the result is then added to the prediction residual of the
unaltered LP model.
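The construction described above can be illustrated with a minimal sketch. It assumes one particular reading of the text: each stegosignal sample is the prediction formed from the original speech with the perturbed coefficients, plus the residual of the unperturbed model. The function names, the toy frame, the coefficient values, and the watermark vector are all hypothetical; the algorithm actually used in this work is formulated in Chapter 3.

```python
def lp_residual(x, a):
    """Residual e[n] = x[n] - sum_i a[i] * x[n-1-i], with zero initial state."""
    M = len(a)
    return [x[n] - sum(a[i] * x[n - 1 - i] for i in range(M) if n - 1 - i >= 0)
            for n in range(len(x))]


def embed_lp_watermark(x, a, w):
    """Add watermark vector w to LP coefficients a and rebuild the frame.

    Assumed model: y[n] = sum_i (a[i] + w[i]) * x[n-1-i] + e[n],
    where e is the residual of the unperturbed model.
    """
    M = len(a)
    a_mod = [ai + wi for ai, wi in zip(a, w)]
    e = lp_residual(x, a)
    return [sum(a_mod[i] * x[n - 1 - i] for i in range(M) if n - 1 - i >= 0) + e[n]
            for n in range(len(x))]


x = [0.0, 0.5, 0.9, 0.7, 0.1, -0.4, -0.8, -0.6]  # toy frame (illustrative)
a = [1.2, -0.5]                                   # assumed LP coefficients
w = [0.01, -0.02]                                 # assumed watermark vector
y = embed_lp_watermark(x, a, w)
```

Under this reading, the difference between stegosignal and coversignal at each sample is the watermark vector applied to past speech samples, so a zero watermark returns the frame unchanged.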

2.1.1 Spread spectrum watermarking

An important contribution of the work of Cox et al. [10] is the
demonstration that a watermark must be embedded in perceptually
signiﬁcant components of a signal for sufficient robustness to attack.
In [10], the DCT is applied to the coversignal and the watermark is
embedded in the n (typically 1000) highest magnitude coefﬁcients of
the DCT, not including the zero frequency component. Each value of
the watermark is drawn independently from a unit normal distribution.

SS watermarking is robust to a wide range of attacks, so it is used as a
standard against which to evaluate the robustness of parametric
watermarking in this work. For the SS algorithm used to compare
performance in this research, the stegosignal $\{\tilde{y}_j\}$ is
obtained by adding the watermark sequence $\{g_i\}_{i=1}^{1000}$ to the
1000 largest DCT coefficients of the coversignal of 1 s duration,

$$\tilde{Y}_i = Y_i + \lambda g_i, \tag{2.1}$$

where each $g_i$ is independently drawn from $N(0, 1)$, and $Y_i$ and
$\tilde{Y}_i$ are the $i$th largest DCT coefficients of the cover and
stegosignals, respectively. The parameter $\lambda$ controls the
stegosignal fidelity and is adjusted to satisfy a desired fidelity
constraint.
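As a rough illustration of this baseline, the sketch below embeds a Gaussian watermark in the largest-magnitude DCT coefficients of a short synthetic signal. It assumes the additive rule $\tilde{Y}_i = Y_i + \lambda g_i$ and an orthonormal DCT-II; the naive $O(N^2)$ transform, the toy signal, the value of `lam`, and all function names are illustrative assumptions rather than the configuration used in the experiments.

```python
import math
import random


def dct(x):
    """Orthonormal DCT-II (naive O(N^2), adequate for a short demo)."""
    N = len(x)
    return [math.sqrt((1.0 if k == 0 else 2.0) / N)
            * sum(x[n] * math.cos(math.pi * (n + 0.5) * k / N) for n in range(N))
            for k in range(N)]


def idct(X):
    """Inverse of dct() above (orthonormal DCT-III)."""
    N = len(X)
    return [sum(math.sqrt((1.0 if k == 0 else 2.0) / N) * X[k]
                * math.cos(math.pi * (n + 0.5) * k / N) for k in range(N))
            for n in range(N)]


def ss_embed(cover, n_coeffs, lam, rng):
    """Add lam * g_i to the n_coeffs largest-magnitude non-DC DCT coefficients."""
    Y = dct(cover)
    # Rank coefficients by magnitude, skipping index 0 (zero frequency).
    idx = sorted(range(1, len(Y)), key=lambda k: abs(Y[k]), reverse=True)[:n_coeffs]
    g = [rng.gauss(0.0, 1.0) for _ in idx]
    for k, gi in zip(idx, g):
        Y[k] += lam * gi
    return idct(Y), idx, g


cover = [math.sin(0.3 * n) + 0.5 * math.sin(1.1 * n) for n in range(64)]
stego, idx, g = ss_embed(cover, n_coeffs=8, lam=0.05, rng=random.Random(0))
```

With private decoding, a detector could subtract the coversignal's DCT coefficients from those of the received signal at the same indices and correlate the result with `g`.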

In SS signaling [16,17], the watermark message is ﬁrst modulated
by a lowpass ﬁltered pseudo-noise sequence. The resulting sequence is
shaped by the LP spectrum of the coversignal, before being added to
the coversignal. The latter measure reduces perceptual distortion. The
watermark receiver whitens the stegosignal using the inverse LP ﬁlter.
The watermark receiver requires perfect synchronization between the
whitened stegosignal and the pseudo-noise spreading sequence. These
techniques have been tested in low noise environments such as in the
presence of additive white Gaussian noise with a 20 dB SNR. However,
it is not known how such algorithms will perform under more challenging
channel conditions, or when subjected to deliberate attacks like
cropping, filtering, or the addition of colored noise.

2.1.2 Watermarking integrated with speech synthesis

In the approach by Hatada et al. [18], line spectrum pairs (LSP) [6]
are extracted from short-term segments of the coversignal. The LSP
parameters are selected because they correlate well with the formant
location [18]. Codebook vectors are created by applying a clustering
algorithm to the extracted LSPs. Watermarked codebook vectors are
obtained by modifying the frequency components of the original code-
book vectors. The LSPs of a particular frame are quantized by either
the watermarked or original codebooks depending on whether the frame
is to be watermarked or not. The stegosignal is synthesized using the
watermarked LSPs.

Even in the absence of watermarking, the LSPs of the original
speech and those of the synthesized speech are different. In the pres-
ence of watermarking, the difference between the original and extracted
LSPs will be even more substantial. Thus watermark detection is af-
fected even in the absence of an attack. Hence, to preserve the wa-
termark information as accurately as possible, it is necessary that the
speech frames used for embedding watermark data have very small LSP
differences with respect to the synthesized speech.

2.1.3 Pitch and duration modiﬁcation for watermarking

Celik et al. [19] propose a speech watermarking algorithm for semi-
fragile authentication applications. In the case of semi-fragile water-
marking, robustness to selective manipulations or attacks is desired.
Celik et al. use pitch and duration modiﬁcation of quasi-periodic speech
phonemes as the features for semi-fragile watermarking. The signiﬁ-
cance of these features makes them suitable for watermarking and the
variability of these features facilitates imperceptible data embedding.
A quantization index modulation scheme is used to embed watermark
bits into these features.

The coversignal is segmented into phonemes. A phoneme is a fun-
damental unit of speech that conveys linguistic meaning [6]. Certain
classes of phonemes such as vowels, semivowels, diphthongs, and nasals
are quasi—periodic in nature. The periodicity is characterized by the
fundamental frequency or the pitch period. The pitch synchronous over-
lap and add (PSOLA) algorithm is used to parse the coversignal and
to modify the pitch and duration of the quasi-periodic phonemes [20].
The pitch periods ($p_p$) are determined for each segment of the parsed
coversignal. The average pitch period is then computed for each segment,

$$p_{\mathrm{avg}} = \frac{1}{P} \sum_{p=1}^{P} p_p.$$

The average pitch period is modified to embed the $m$th watermark bit
($w_m$) by using dithered quantization index modulation [21],

$$p^{w}_{\mathrm{avg}} = Q_{w}(p_{\mathrm{avg}} + \eta) - \eta,$$

where $Q_w$ is the selected quantizer and $\eta$ is the pseudo-random
dither value. The pitch periods are then modified such that

$$p^{w}_{p} = p_p + (p^{w}_{\mathrm{avg}} - p_{\mathrm{avg}}).$$

The PSOLA algorithm is used to concatenate the segments and synthe-
size the stegosignal. The duration of the segments is modiﬁed for better
reproduction of the stegosignal. As necessitated by authentication ap-
plications, watermark detection does not require the original speech.
At the detector, the procedure is repeated and the modiﬁed average
pitch values are determined for each segment. Using the modiﬁed av-
erage pitch values, the watermark bits are recovered. The algorithm is
robust to distortions caused by low-bit-rate speech coding. This is be-
cause it uses features that are preserved by low-bit-rate speech coders
such as QCELP, AMR, and GSM-06.10 [22]. Robustness to coding
and compression is necessary for authentication applications. On the
other hand, the fragile watermarking algorithm is designed to detect
malicious operations such as re-embedding and changes to acoustic
information (e.g., phonemes).
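The dithered quantization index modulation step in this scheme can be sketched as follows. The sketch assumes a binary message and two uniform quantizers with step `step`, offset from each other by half a step; the quantizer design, the step size, the dither value, and the function names are hypothetical and are not taken from [19] or [21].

```python
def quantize(value, bit, step):
    """Uniform quantizer for `bit`; the lattice for bit 1 is shifted by step/2."""
    offset = bit * step / 2.0
    return step * round((value - offset) / step) + offset


def qim_embed(p_avg, bit, dither, step):
    """Dithered QIM: quantize (p_avg + dither), then remove the dither."""
    return quantize(p_avg + dither, bit, step) - dither


def qim_detect(p_avg_received, dither, step):
    """Pick the bit whose quantizer lattice is closest to the dithered value."""
    v = p_avg_received + dither
    return min((0, 1), key=lambda b: abs(v - quantize(v, b, step)))


def shift_pitch_periods(periods, bit, dither, step):
    """Shift every pitch period by the change applied to the segment average."""
    p_avg = sum(periods) / len(periods)
    delta = qim_embed(p_avg, bit, dither, step) - p_avg
    return [p + delta for p in periods]


periods = [80.0, 82.0, 81.0, 79.0]  # toy pitch periods in samples
step, dither = 4.0, 1.3             # assumed quantizer step and dither value
recovered = []
for bit in (0, 1):
    marked = shift_pitch_periods(periods, bit, dither, step)
    recovered.append(qim_detect(sum(marked) / len(marked), dither, step))
```

In this toy setting, detection tolerates any change in the average pitch period smaller than a quarter of the step, which suggests why a feature preserved by low-bit-rate coders is a natural carrier.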

2.2 Set-membership ﬁltering

The set-membership ﬁltering (SMF) concept was ﬁrst published by
Gollamudi et al. [23], and was more recently proposed as an innovative
solution to the design of channel equalizers for digital communication
by Nagaraj et al. [24]. SMF can be viewed as a reformulation of the
broadly-researched class of algorithms concerned with set-membership
identification (e.g., [25,26]). The application of SMF to parametric
speech watermarking is demonstrated in Chapter 4.

2.2.1 Overview of SMF

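The SMF problem is stated as follows:

SMF PROBLEM. Given a sequence $\{x_\tau \in \mathbb{R}^M\}_{\tau=1}^{t}$
of observations, a "desired" sequence
$\{z_\tau \in \mathbb{R}\}_{\tau=1}^{t}$, and a sequence of error
"tolerances" $\{\gamma_\tau\}_{\tau=1}^{t}$ (frequently constant with
$\tau$), find the exact feasibility set at time $t$,
$\mathcal{P}_t \subseteq \mathbb{R}^M$, which includes all vectors
(filters) $\theta \in \mathbb{R}^M$ satisfying

$$\mathcal{P}_t = \{\theta \mid |z_\tau - \theta^T x_\tau| < \gamma_\tau \text{ for } \tau \in [1, t]\}. \tag{2.2}$$

Note that when $\gamma_t$ is constant with $t$, say $\gamma_t = \gamma$,
then we may write

$$\mathcal{P}_t = \{\theta \mid \|z - \hat{z}\|_\infty < \gamma\}, \tag{2.3}$$

in which $z$ is the $t$-vector with $i$th element $z_i$, and $\hat{z}$ is
the $t$-vector with $i$th element $x_i^T \theta$.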
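The feasibility condition in (2.2) translates directly into a membership test. The sketch below is a minimal illustration with made-up observations and tolerances; the function name and the data are hypothetical.

```python
def in_feasibility_set(theta, xs, zs, gammas):
    """True if |z_tau - theta^T x_tau| < gamma_tau holds for every observation."""
    return all(abs(z - sum(t * xi for t, xi in zip(theta, x))) < gamma
               for x, z, gamma in zip(xs, zs, gammas))


xs = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # observation vectors x_tau (M = 2)
zs = [0.5, -0.3, 0.2]                      # desired sequence z_tau
gammas = [0.1, 0.1, 0.1]                   # constant error tolerance

fits = in_feasibility_set([0.5, -0.3], xs, zs, gammas)
misses = in_feasibility_set([0.7, -0.3], xs, zs, gammas)
```

Here `[0.5, -0.3]` reproduces every desired value within tolerance, while `[0.7, -0.3]` violates the bound on the first and third observations. The exact set is an intersection of such slabs, one per observation, which motivates bounding it by a simpler shape.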

The SMF problem is solved using a series of recursions which return at
iteration $t$ a hyperellipsoidal membership set, say
$\mathcal{E}_t \supseteq \mathcal{P}_t$, and the ellipsoid's center, say
$\theta_t$. The recursions execute an optimization strategy designed to
tightly bound $\mathcal{P}_t$ by $\mathcal{E}_t$ in some sense.
Accordingly, the broad class of algorithms employed in the SMF problem are
often called the optimal bounding ellipsoid (OBE) algorithms [25,26]. The
OBE algorithm used in the SMF-based parametric watermarking algorithm is
called the set-membership weighted recursive least squares (SM-WRLS)
algorithm, but the choice of OBE methods is somewhat arbitrary for the
present application.

2.2.2 Set-membership weighted recursive least squares

This section presents an overview of the SM-WRLS algorithm for
ﬁltering and identiﬁcation applications. The SM-WRLS algorithm is
used in the SMF based parametric watermarking algorithm. In the
SMF framework it is assumed that there is an observation sequence
$\{x_\tau\}_{\tau=1}^{t}$, a "desired" sequence $\{z_\tau\}_{\tau=1}^{t}$,
and a sequence of error tolerances $\{\gamma_\tau\}_{\tau=1}^{t}$ [25].
The feasibility set at time $t$, $\mathcal{P}_t$, includes all $\theta$
such that

$$\hat{z}_\tau = \theta^T x_\tau \quad \text{subject to} \quad |z_\tau - \hat{z}_\tau| < \gamma_\tau \quad \text{for } \tau = 1, 2, \ldots, t. \tag{2.4}$$

Let $X_t$ be the $t \times M$ matrix with $i$th row
$\sqrt{\lambda_i}\, x_i^T$ and let $z_t$ be the $t$-vector with $i$th
element $\sqrt{\lambda_i}\, z_i$, where $\{\lambda_\tau\}_{\tau=1}^{t}$
are a set of error minimization weights. Then the covariance matrix is
given by $C_t = X_t^T X_t$ and $c_t = X_t^T z_t$. The algorithmic steps
involved in implementing SM-WRLS for either identification or filtering
applications are given in Table 2.1 [27].

2.3 Lagrange Multipliers

The method of Lagrange Multipliers is a common approach for
solving constrained optimization problems [28]. The method of La-
grange Multipliers is used to obtain optimal watermarks from the mem-
bership set for a given attack on the stegosignal. In a constrained op—
timization problem, a function needs to be maximized or minimized
subject to certain conditions or constraints. A constrained optimiza-
tion problem with the variable a: E R" is characterized by an objective

function f0(a:), inequality constraint functions fi(:c) and equality con-

16

straint functions h,- (:r).

min/max f0(a:)
subject to fi(a:) S 0, for i = 1,2, . . . ,p1 (2-5)

hJ-(x) =0, for j=1,2,...,p2

In the method of Lagrange multipliers, the constraint functions are
taken into account by adding a weighted combination of the constraint
functions to the objective function [28]. That is, the Lagrangian
$L(x, \check{\lambda}, \check{\nu})$ is

$$L(x, \check{\lambda}, \check{\nu}) = f_0(x) + \sum_{i=1}^{p_1} \check{\lambda}_i f_i(x) + \sum_{j=1}^{p_2} \check{\nu}_j h_j(x) \tag{2.6}$$

where $L : \mathbb{R}^n \times \mathbb{R}^{p_1} \times \mathbb{R}^{p_2} \to \mathbb{R}$, $\{\check{\lambda}_i\}_{i=1}^{p_1}$ are the Lagrange multipliers
associated with the inequality constraints, and $\{\check{\nu}_j\}_{j=1}^{p_2}$ are the Lagrange
multipliers associated with equality constraints. The Lagrange
multipliers are nonzero and those associated with inequality constraints
are also nonnegative. The method of Lagrange multipliers converts
the constrained optimization problem into an unconstrained one with
$n + p_1 + p_2$ variables.

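The maxima or minima of the constrained optimization problem
occur when the gradient of the Lagrangian is zero, $\nabla L(x, \check{\lambda}, \check{\nu}) = 0$.
That is,

$$\nabla_x L(x, \check{\lambda}, \check{\nu}) = 0 \iff \nabla_x f_0 = -\left( \sum_{i=1}^{p_1} \check{\lambda}_i \nabla_x f_i + \sum_{j=1}^{p_2} \check{\nu}_j \nabla_x h_j \right) \tag{2.7}$$

and

$$\nabla_{\check{\lambda}_i} L(x, \check{\lambda}, \check{\nu}) = 0 \iff f_i = 0, \quad \text{for } i = 1, 2, \ldots, p_1 \tag{2.8}$$

$$\nabla_{\check{\nu}_j} L(x, \check{\lambda}, \check{\nu}) = 0 \iff h_j = 0, \quad \text{for } j = 1, 2, \ldots, p_2. \tag{2.9}$$

It can also be observed that

$$\frac{\partial L}{\partial f_i} = \check{\lambda}_i \quad \text{and} \quad \frac{\partial L}{\partial h_j} = \check{\nu}_j. \tag{2.10}$$

Equations (2.7)-(2.10) are used to obtain the optimal value of $x$. In
order to use the method of Lagrange multipliers, the objective and
constraint functions are not required to be convex.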
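
As a small illustration of equations (2.6)-(2.9), consider a toy problem that is not taken from this work: minimize $f_0(x) = x_1^2 + x_2^2$ subject to the single equality constraint $h_1(x) = x_1 + x_2 - 1 = 0$. The worked steps below are only a sketch of the mechanics.

$$\begin{aligned}
L(x, \check{\nu}_1) &= x_1^2 + x_2^2 + \check{\nu}_1 (x_1 + x_2 - 1) \\
\nabla_x L = 0 &\;\Rightarrow\; 2x_1 + \check{\nu}_1 = 0, \quad 2x_2 + \check{\nu}_1 = 0 \\
\nabla_{\check{\nu}_1} L = 0 &\;\Rightarrow\; x_1 + x_2 - 1 = 0 \\
&\;\Rightarrow\; x_1 = x_2 = \tfrac{1}{2}, \quad \check{\nu}_1 = -1, \quad f_0(x) = \tfrac{1}{2}.
\end{aligned}$$

The constrained problem in two variables has become an unconstrained stationarity problem in $n + p_2 = 3$ variables, as described above.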


Table 2.1: SM-WRLS algorithm

 

 

Initialization:

    $C_0^{-1} = P_0 = \epsilon^{-1} I$, where $\epsilon$ is small

    $\theta_0 = 0$

    $\lambda_1 = 1$

k0 = [Hillbil]2 + 2?, computed after step 3 for 7' = 1

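Recursion: For $\tau = 1, 2, \ldots, t$

1 $G(\tau)$ and $\epsilon_{\tau-1}(\tau)$ are updated.

    $G(\tau) = x_\tau^T P_{\tau-1} x_\tau$, where $P_\tau = C_\tau^{-1}$

    $\epsilon_{\tau-1}(\tau) = z_\tau - \theta_{\tau-1}^T x_\tau$

2 Skip step 2 if $\tau = 1$. The optimal $\lambda_\tau^*$ is computed by finding a
positive root of the following quadratic equation.

$$\begin{aligned}
F(\lambda) = 0 = {} & \{(M - 1) G^2(\tau)\} \lambda^2 \\
& + \{[2M - 1 + \gamma_\tau^{-1} \epsilon_{\tau-1}^2(\tau)] - \kappa_{\tau-1} \gamma_\tau^{-1} G(\tau)\} G(\tau) \lambda \\
& + \{M [1 - \gamma_\tau^{-1} \epsilon_{\tau-1}^2(\tau)] - \kappa_{\tau-1} G(\tau) \gamma_\tau^{-1}\}
\end{aligned}$$

If there are two positive roots, then the larger one is used.

3 Skip step 3 if $\tau = 1$. If $\lambda_\tau^* \le 0$, set $P_\tau = P_{\tau-1}$, $\theta_\tau = \theta_{\tau-1}$,
$\kappa_\tau = \kappa_{\tau-1}$, then go to step 5. Otherwise continue with step 4.

4 Update $P_\tau$, $\theta_\tau$, and $\kappa_\tau$.

$$C_\tau^{-1} = P_\tau = P_{\tau-1} - \lambda_\tau \frac{P_{\tau-1} x_\tau x_\tau^T P_{\tau-1}}{1 + \lambda_\tau G(\tau)}$$

$$\theta_\tau = \theta_{\tau-1} + \lambda_\tau P_\tau \epsilon_{\tau-1}(\tau) x_\tau$$

$$\kappa_\tau = \kappa_{\tau-1} + \lambda_\tau \gamma_\tau - \frac{\lambda_\tau \epsilon_{\tau-1}^2(\tau)}{1 + \lambda_\tau G(\tau)}$$

5 If $\tau < t$, increment $\tau$ and return to Step 1.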

 


 

Chapter 3

Parametric Speech Watermarking

in the LP Domain

3.1 Introduction

The general parametric watermarking algorithm is formulated in
the following way. Let $\{y_n\}$ denote the coversignal, and let $\{\tilde{y}_n\}$ be
the ultimate stegosignal. Each of these is assumed to be a real scalar
sequence over discrete-time $n$. It is assumed that the signals are
generated according to operations of the form [29]

$$y_n = \phi_\pi(\xi_n, x_n, n) \quad \text{and} \quad \tilde{y}_n = \tilde{\phi}_{\tilde{\pi}}(\tilde{\xi}_n, \tilde{x}_n, n), \tag{3.1}$$

in which $\{\xi_n\}$, $\{\tilde{\xi}_n\}$, $\{x_n\}$, and $\{\tilde{x}_n\}$ are measurable vector-valued
random sequences. The operator $\phi$ is parameterized by a set $\pi$, the
alteration of which (to create parameter set $\tilde{\pi}$) is responsible for
changing the operator $\phi$ to $\tilde{\phi}$ and the sequences $\{\xi_n\}$ and $\{x_n\}$ into
their “tilded” counterparts.

3.1.1 An algorithm for LP parametric watermarking

In the present study, the coversignal is assumed to be generated
by an LP model,

$$y_n = \sum_{i=1}^{M} a_i y_{n-i} + \xi_n, \tag{3.2}$$

a special case of the first equation in (3.1). The “true” model is
determined by standard LP analysis of a (long) frame selected for
watermarking [6]. The sequence $\{\xi_n\}$ is the prediction residual associated
with the estimated model. The duration of the FIR linear predictor
is naturally based on the assumed order of the LP model, $M$, used to
initially parameterize the speech. The stegosignal is constructed using
the FIR filter model

$$\tilde{y}_n = \sum_{i=1}^{M} \tilde{a}_i y_{n-i} + \xi_n, \tag{3.3}$$

where $\{\tilde{a}_i\}$ represents a deliberately perturbed version of the “true”
set $\{a_i\}$. The algorithmic steps of the LP parameter-embedded
watermarking procedure appear in Table 3.1. Numerous ways in which
parametric modification can be effected, including indirectly through
changes to other speech parameters such as log area ratio (LAR) values
or parcor values, are discussed further in Chapter 5.

Table 3.1: Watermark embedding algorithm

 

 

Let $\{y_n\}_{n=-\infty}^{\infty}$ denote a coversignal, and let $\{y_n\}_{n=n_k}^{n'_k}$ be the kth of
K speech frames to be watermarked. Then: For $k = 1, 2, \ldots, K$

1 Using the “autocorrelation method” (e.g., [6, Ch. 5]), derive
a set of LP coefficients of order $M$, say $\{a_i\}_{i=1}^{M}$, for the given
frame.

2 Use the LP parameters in an inverse filter configuration
to obtain the prediction residual on the frame,
$\{\xi_n = y_n - \sum_{i=1}^{M} a_i y_{n-i}\}_{n=n_k}^{n'_k}$.

3 Modify the LP parameters in some predetermined way to
produce a new set, say $\{\tilde{a}_i\}_{i=1}^{M}$. The modifications to the LP
parameters (or, equivalently, to the autocorrelation sequence
or line spectrum pairs, etc.) comprise the watermark.

4 Use the modified LP parameters as a (suboptimal) predictor
of the original sequence, adding the residual obtained in
Step 2 above at each $n$, to resynthesize the speech over the
frame, $\{\tilde{y}_n = \sum_{i=1}^{M} \tilde{a}_i y_{n-i} + \xi_n\}_{n=n_k}^{n'_k}$. (To the extent that the
watermark represents only small perturbations to the original
LP parameters, the resynthesized result is a pointwise
approximation to the coversignal over the same time frame.)

5 The sequence $\{\tilde{y}_n\}_{n=n_k}^{n'_k}$ is the kth frame of the watermarked
speech (stegosignal).

Next k.
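
The steps of Table 3.1 can be sketched for a single frame as follows. This is a minimal illustration, not the implementation used for the experiments: the function names (`lp_autocorrelation`, `predict`, `embed_frame`), the toy coversignal, and the watermark values are all assumptions made for the example, and samples before the start of the signal are treated as zero.

```python
import math
import random

def lp_autocorrelation(frame, order):
    """LP coefficients a_1..a_M by the autocorrelation method (Levinson-Durbin)."""
    n = len(frame)
    r = [sum(frame[t] * frame[t - lag] for t in range(lag, n))
         for lag in range(order + 1)]
    a = [0.0] * (order + 1)          # a[i] holds a_i; a[0] is unused
    err = r[0]
    for m in range(1, order + 1):
        if err <= 0.0:               # degenerate (e.g. all-zero) frame: stop early
            break
        k = (r[m] - sum(a[i] * r[m - i] for i in range(1, m))) / err
        prev = a[:]
        a[m] = k
        for i in range(1, m):
            a[i] = prev[i] - k * prev[m - i]
        err *= 1.0 - k * k
    return a[1:]

def predict(coeffs, y, n):
    """sum_i coeffs[i-1] * y[n-i]; samples before the signal start count as zero."""
    return sum(c * y[n - i] for i, c in enumerate(coeffs, start=1) if n - i >= 0)

def embed_frame(y, start, end, order, watermark):
    """Steps 1-5 of Table 3.1 for the frame y[start:end] (hypothetical helper)."""
    a = lp_autocorrelation(y[start:end], order)                      # Step 1
    residual = [y[n] - predict(a, y, n) for n in range(start, end)]  # Step 2
    a_mod = [ai + wi for ai, wi in zip(a, watermark)]                # Step 3
    stego = [predict(a_mod, y, n) + residual[n - start]              # Step 4
             for n in range(start, end)]
    return stego, a, residual                                        # Step 5

# Toy coversignal and watermark (illustrative values only).
rng = random.Random(0)
cover = [math.sin(0.05 * n) + 0.1 * rng.gauss(0.0, 1.0) for n in range(400)]
omega = [0.01, -0.02, 0.015, 0.0]
stego, a, residual = embed_frame(cover, 100, 300, 4, omega)
```

Because the residual from Step 2 is added back in Step 4, a zero watermark reproduces the cover frame, and a nonzero watermark changes each sample by the watermark coefficients applied to the past cover samples.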

 


 

When watermark embedding involves direct modification of LP
coefficients, the embedding process can be interpreted as a digital filter
design problem. Equation (3.3) can be rewritten as

$$\tilde{y}_n = \sum_{i=1}^{M} a_i y_{n-i} + \sum_{i=1}^{M} \omega_i y_{n-i} + \xi_n, \tag{3.4}$$

wherein the watermark sequence $\{\omega_i\}_{i=1}^{M}$ constitutes the impulse
response of an $M$th-order non-recursive filter. This filtered version of
the original speech incorporates the watermark information. Non-recursive
filters are inherently stable and less sensitive to quantization errors.
The watermark signal, $w_n = \sum_{i=1}^{M} \omega_i y_{n-i} = \tilde{y}_n - y_n$, has a spectrum
determined by the watermark coefficients and the coversignal. For
example, the watermark spectrum can be designed to have predominantly
lowpass, highpass or mid-band energy.

It is important to understand a key difference in the way LP mod-
eling is applied in this watermarking application relative to its conven-
tional deployment in speech coding and recognition. In these prevalent
applications, the goal is to ﬁnd a set of LP coefﬁcients that optimally
model quasi-stationary regions of speech. In parametric watermark-
ing, the LP model is used as a device to parameterize long intervals of
nonstationary speech without the intention of properly parameterizing
stationary dynamics in the waveform. Rather, the parameters are
derived according to the usual optimization criterion, minimizing the
total energy in the residual [6, Ch. 5], with the understanding that
the aggregate time-varying dynamics will be distributed between the
long-term parametric code and the residual sequence.

3.1.2 Recovering LP parameter-embedded watermarks

The algorithm for recovering the watermark from the stegosignal
appears in Table 3.2. An important step in the recovery process is the
least square error (LSE) estimation of the modified coefficients,
$\{\tilde{a}_i\}_{i=1}^{M}$, which is executed as follows. Let us consider a length $N$
frame of the coversignal and rewrite the stegosignal generation
equation (3.3) as

$$d_n = \sum_{i=1}^{M} \tilde{a}_i y_{n-i} = \tilde{\mathbf{a}}^T \mathbf{y}_n, \quad \text{with } d_n = \tilde{y}_n - \xi_n. \tag{3.5}$$

In principle, the system of equations (3.5) taken over $N$ samples,
$n = 1, 2, \ldots, N$, is noise free and can be solved for $\tilde{\mathbf{a}}$ without error using
any consistent subset of $M$ equations. For generality, to smooth round-off
and other errors, and to support further developments, we pose the
problem as an attempt to compute the LSE linear estimator of the
“output” signal, $d_n$, given observations $y_n$. The following normal equations
are solved,

$$C_y \hat{\tilde{\mathbf{a}}} = \mathbf{c}_{yd} \tag{3.6}$$

Table 3.2: Watermark recovery algorithm

For $k = 1, 2, \ldots, K$

1 Subtract the residual frame $\{\xi_n\}_{n=n_k}^{n'_k}$ from the stegosignal frame
$\{\tilde{y}_n\}_{n=n_k}^{n'_k}$. This results in an estimate of the modified predicted
speech, $\{d_n = \tilde{y}_n - \xi_n\}_{n=n_k}^{n'_k}$.

2 Estimate the modified LP coefficients $\{\tilde{a}_i\}_{i=1}^{M}$ by computing
the least-square-error solution, say $\{\hat{\tilde{a}}_i\}_{i=1}^{M}$, to the
overdetermined system of equations: $d_n \approx \sum_{i=1}^{M} \tilde{a}_i y_{n-i}$,
$n = n_k, \ldots, n'_k$.

3 Use the parameter estimates from Step 2 to derive the
corresponding watermark values.

Next k.

 

 

 

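in which $C_y = \sum_{n=1}^{N} \mathbf{y}_n \mathbf{y}_n^T = Y_N Y_N^T$ and $\mathbf{c}_{yd} = \sum_{n=1}^{N} \mathbf{y}_n d_n = Y_N \mathbf{d}_N$,
where

$$Y_N = [\, \mathbf{y}_N \;\; \mathbf{y}_{N-1} \;\; \cdots \;\; \mathbf{y}_1 \,] \in \mathbb{R}^{M \times N} \tag{3.7}$$

$$\mathbf{d}_N = [\, d_N \;\; d_{N-1} \;\; \cdots \;\; d_1 \,]^T \in \mathbb{R}^{N \times 1}. \tag{3.8}$$

The LSE method is based on time averages, and its performance
depends on the frame length used in the estimation [8]. In the stegosignal,
the watermark information is distributed in time and is present as the
watermark signal $\{w_n\}_{n=1}^{N}$. During recovery the watermark information
is concentrated in a few coefficients $\{\omega_i\}_{i=1}^{M}$ derived from an estimate
of the modified LP coefficients.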
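
The recovery step can be sketched by forming and solving the normal equations (3.6) directly. The code below is an illustrative sketch only: `solve` and `recover_coefficients` are hypothetical helper names, the coversignal is synthetic white noise, and the coefficient vector `a` is an arbitrary stand-in for the LP solution (the algebra of (3.5) holds for any `a`, since the residual is defined relative to it).

```python
import random

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x

def recover_coefficients(y, stego, residual, start, order):
    """LSE estimate of the modified LP coefficients from d_n = stego_n - residual_n.

    Assumes start >= order so that every past sample y[n - i] exists.
    """
    C = [[0.0] * order for _ in range(order)]   # C_y  = sum_n y_n y_n^T
    c = [0.0] * order                           # c_yd = sum_n y_n d_n
    for j, (s, e) in enumerate(zip(stego, residual)):
        n = start + j
        past = [y[n - i] for i in range(1, order + 1)]   # [y_{n-1}, ..., y_{n-M}]
        d = s - e
        for p in range(order):
            c[p] += past[p] * d
            for q in range(order):
                C[p][q] += past[p] * past[q]
    return solve(C, c)

# Synthetic check with illustrative values.
rng = random.Random(1)
y = [rng.gauss(0.0, 1.0) for _ in range(300)]
a = [0.5, -0.3, 0.1]
omega = [0.02, -0.01, 0.03]
a_mod = [ai + wi for ai, wi in zip(a, omega)]
start, end, order = 50, 250, 3

def pred(coef, n):
    return sum(cf * y[n - i] for i, cf in enumerate(coef, start=1))

residual = [y[n] - pred(a, n) for n in range(start, end)]
stego = [pred(a_mod, n) + residual[n - start] for n in range(start, end)]
a_hat = recover_coefficients(y, stego, residual, start, order)
omega_hat = [ah - ai for ah, ai in zip(a_hat, a)]
```

In this noise-free setting the recovered watermark matches the embedded one up to round-off; with channel noise the same normal equations yield the least-squares estimate instead.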


3.1.3 Perceptual aspects of LP parametric watermarking

The watermark embedding process can be interpreted as (i) a mod-
iﬁcation to the LP model or similarly derived models, plus (ii) FIR
ﬁltering. This section deals with the perceptual beneﬁts of parametric
watermarking and the constraint used in this research to objectively
quantify stegosignal ﬁdelity. Listening tests were also conducted on the
watermarked speech ﬁle available at the website [30]. The results of

these tests are discussed in detail in Section 3.2.3.

Echo embedding interpretation

The LP parametric watermarking algorithm can be interpreted as
the addition of $M$ echoes of small amplitudes and scales. The echoes are
delayed by $M$ samples or fewer. Typically, echoes of delay 20 ms or less are
imperceptible. Also, since the echoes are scaled by much smaller valued
watermark coefficients, the louder coversignal masks some components
of the echoes. It should be noted that the technique differs from the
echo hiding method of Gruhl et al. [31], in which binary “one” and
“zero” information is encoded in the offset and delay parameters of the
echo and not in the echo amplitude.

Stegosignal ﬁdelity

Fidelity is a measure of perceptual similarity between the coversignal
and the stegosignal. The watermarking process must not affect the
fidelity of the speech beyond an application-dependent standard. A
simple and mathematically tractable measure of fidelity is the
signal-to-noise ratio (SNR), or, in the present context, coversignal-to-watermark
ratio (CWR), defined as

$$\mathrm{CWR} = 10 \log_{10} \frac{E_y}{E_w} = 10 \log_{10} \frac{\sum_{n=1}^{N} y_n^2}{\sum_{n=1}^{N} w_n^2}, \tag{3.9}$$

where $w_n = [\tilde{y}_n - y_n]$. The CWR averages the relative distortion energy
of the coversignal over time and frequency. However, CWR is a poor
measure of speech fidelity for a wide range of distortions. The CWR is
not related to any subjective attribute of speech fidelity, and it weighs
the time domain errors equally [6]. A better measure of speech fidelity
can be obtained if the CWR is measured and averaged over short speech
frames. The resulting fidelity measure is known as segmental CWR [6],
defined as

$$\mathrm{CWR}_{\mathrm{seg}} = \frac{1}{K} \sum_{j=1}^{K} 10 \log_{10} \left[ \sum_{l=k_j-L+1}^{k_j} \frac{y_l^2}{[\tilde{y}_l - y_l]^2} \right], \tag{3.10}$$

where $k_1, k_2, \ldots, k_K$ are the end-times for the $K$ frames, each of which
is length $L$. The segmentation of the CWR assigns equal weight to the
loud and soft portions of speech. For computing CWRseg, the duration
of speech frames is typically 15 to 25 ms, with frames of 15 ms used for
the experimental results presented in this chapter.

Some of the other objective measures of speech quality include
the Itakura distance [6], the weighted-slope spectral distance and the
cepstral distance. According to Wang et al. [32], CWRseg is a much
better correlate to the auditory experience than the other objective
measures discussed above. A simple way to control the ﬁdelity of the
stegosignal is to scale the watermark vector, w, by a constant, say A,

before adding it to the original LP parameters (Step 3 of Table 3.1).
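
The two fidelity measures can be sketched as below. This is an illustrative reading rather than the code used in this work: the function names and the four-sample toy signals are assumptions, the per-frame ratio is taken as frame cover energy over frame distortion energy (the usual segmental SNR convention), and a frame with zero distortion would need special handling that is omitted here.

```python
import math

def cwr_db(cover, stego):
    """Coversignal-to-watermark ratio in dB over the whole signal, as in (3.9)."""
    signal = sum(c * c for c in cover)
    distortion = sum((s - c) ** 2 for s, c in zip(stego, cover))
    return 10.0 * math.log10(signal / distortion)

def cwr_seg_db(cover, stego, frame_len):
    """Segmental CWR: the mean of per-frame CWR values in dB (hypothetical helper).

    Trailing samples that do not fill a whole frame are ignored.
    """
    n_frames = len(cover) // frame_len
    total = 0.0
    for j in range(n_frames):
        lo, hi = j * frame_len, (j + 1) * frame_len
        total += cwr_db(cover[lo:hi], stego[lo:hi])
    return total / n_frames

# One soft frame and one loud frame with the same absolute distortion.
cover = [1.0, 1.0, 10.0, 10.0]
stego = [c + 0.1 for c in cover]
overall = cwr_db(cover, stego)
segmental = cwr_seg_db(cover, stego, 2)
```

In this toy case the loud frame dominates the overall CWR, while the segmental value gives the soft frame equal weight and is therefore lower, which is the behavior described in the text.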

3.1.4 Security issues

A watermark’s security refers to its ability to withstand attacks
designed for unauthorized removal, detection or embedding. A
watermarking technique must not rely on the secrecy of the algorithm for
its security. In parametric watermarking, a copy of the coversignal is
required for watermark recovery. The LP parameters of the stegosignal
are different from the modified LP values obtained by adding the
watermark vector to the LP parameters of the coversignal. An attacker has
access to the stegosignal and not to the coversignal, prediction
residual, frame length, and LP model order used for watermarking. Since
parametric watermarking involves the alteration of deeply-integrated
characteristics of speech signals, the embedded watermark information
is not easily determined from the resulting stegosignals. The security
of the present watermarking technique can also be further enhanced
by randomly selecting the speech frames to be watermarked, using a
different LP model order for each watermarked frame (model order also
depends on the fidelity constraint), and by embedding pseudorandom
watermark patterns. The LP parameters of the stegosignal can be
easily obtained,

$$\tilde{y}_n = \sum_{k=1}^{K} \acute{a}_k \tilde{y}_{n-k} + \acute{\xi}_n, \tag{3.11}$$

where $K$ is the LP model order selected by the attacker. However, the
LP parameters ($\{\acute{a}_k\}$) of the stegosignal are different from the modified
LP coefficients $\{\tilde{a}_i\}$ and also, $\{\acute{\xi}_n\}$ is different from the prediction
residual ($\{\xi_n\}$) associated with the coversignal, even if $K = M$.

Impact of ambiguity attacks: Ambiguity attacks are of concern to both
private and public watermarking techniques [33]. In an ambiguity
attack, counterfeit watermarks are identified or created by an attacker in
the stegosignal using a different watermarking scheme. The attacker
recovers his or her watermark from the stegosignal, claims rightful
ownership of the protected signal and succeeds in causing ambiguity
about the “true” owner of the stegosignal. According to Craver et
al., two necessary conditions for robustness to ambiguity attacks are
non-invertibility and non-quasi-invertibility [33]. For a watermarking
technique to be non-invertible, it is essential that the mapping from
the watermarked signal $\{\tilde{y}_n\}$ to $\{\omega'_i\}$ and $\{y'_n\}$ does not exist, where
$\{\omega'_i\}$ is the watermark carved out by the attacker and $\{y'_n\}$ is the fake
original created by the attacker. Non-quasi-invertibility is a much more
stringent requirement. For a watermarking technique to be
non-quasi-invertible, it should be impossible for an attacker to create $\{\omega'_i\}$ and
$\{y'_n\}$ from $\{\tilde{y}'_n\}$, which is perceptually similar to $\{\tilde{y}_n\}$ and such that $\{\omega'_i\}$
still exists in $\{y_n\}$, the true original. It is shown below that parametric
speech watermarking is non-invertible.

For an algorithm to be invertible, it should be possible for an
attacker to create a fake coversignal and a fake watermark from the
stegosignal [equation (3.3)],

$$\tilde{y}_n = \sum_{k=1}^{K} \tilde{a}'_k y'_{n-k} + \xi'_n = \sum_{k=1}^{K} (a'_k + \omega'_k) y'_{n-k} + \xi'_n, \tag{3.12}$$

where $K$ is the model order selected by the attacker, $y'_n = \sum_{k=1}^{K} a'_k y'_{n-k} + \xi'_n$
is the fake coversignal, and $\{\xi'_n\}$ is the corresponding minimum MSE
prediction residual.

An attacker can easily compute the LP coefficients and the
prediction residual associated with the stegosignal. Obviously, equation (3.11)
cannot be substituted by the attacker as the model for the fake
coversignal and fake watermark sequence, since the fake coversignal will be
the same as the stegosignal. On the other hand, an attacker can add a
sequence $\{\nu_n\}$ to the stegosignal and then compute the LP coefficients
and prediction residual,

$$\tilde{y}_n + \nu_n = \sum_{k=1}^{K} \acute{a}_k (\tilde{y}_{n-k} + \nu_{n-k}) + \acute{\xi}_n = \sum_{k=1}^{K} \acute{a}_k y'_{n-k} + \acute{\xi}_n. \tag{3.13}$$

The attacker can then subtract $\{\nu_n\}$ from $\{y'_n\}$ to obtain the
stegosignal,

$$\tilde{y}_n = \sum_{k=1}^{K} \acute{a}_k y'_{n-k} + (\acute{\xi}_n - \nu_n) = \sum_{k=1}^{K} (a'_k + \omega'_k) y'_{n-k} + (\acute{\xi}_n - \nu_n). \tag{3.14}$$

Comparing equations (3.12) and (3.14), the fake coversignal is given by
$y'_n = \sum_{k=1}^{K} a'_k y'_{n-k} + (\acute{\xi}_n - \nu_n)$, where $\acute{\xi}_n - \nu_n$ is the prediction residual.
However, from equation (3.13), the fake coversignal is $y'_n = \tilde{y}_n + \nu_n$ and
the minimum MSE prediction residual is $\acute{\xi}_n$, which is different from
$(\acute{\xi}_n - \nu_n)$, and hence this is a contradiction. Thus it will be impossible for an
attacker to invert the embedding process starting with the stegosignal.

3.1.5 A detection algorithm for LP parametric watermarking

A common approach to watermark detection employs classic
binary decision theory. The hypotheses are $H_0 : I_R = I$ and $H_1 : I_R =
I + w$, where $I_R$ is the received signal, $I$ is the original signal and $w$ is
the watermark signal [34,35]. A Bayesian or Neyman-Pearson paradigm
is followed in deriving the detection thresholds. For image
watermarking, the image DCT coefficients are modeled as generalized Gaussian
in distribution [34,36]. These approaches do not consider the effect
of noise while deriving the detection threshold. Several watermark
detectors are based on correlation detection in the time or in the DCT
domain [37,38]. That is, the correlation between the original and
recovered watermarks or the correlation between the original watermark
and recovered signal is compared against a threshold. Correlation
detectors are optimal when the watermark and noise are jointly Gaussian,
or, in case of blind detectors, when the watermarked signal and noise
are jointly Gaussian. For example, the detector presented in [2, Ch. 6]
assumes that the detector output for each bit is Gaussian distributed.
This is true for watermark patterns that are spectrally white, but this
is not the case with the watermark signal in parametric watermarking.
Hence there is a need to design a watermark detector in the parameter
domain.

This section describes a watermark detector for LP parametric
watermarking [39]. The stegosignal is distorted by additive white or
colored Gaussian noise in the time domain. The watermarks are
comprised of (eight) non-binary orthogonal vectors of length eight. Each
of these eight vectors can be mapped to a unique symbol. For
example, each vector can be interpreted as a particular integer from the set
{0, 1, 2, 3, 4, 5, 6, 7}. The watermark may be composed of many such
integers or symbols. In the examples in this paper, each orthogonal
watermark vector (symbol) is embedded into 0.125 seconds of speech
sampled at 16 kHz, resulting in a bit rate of 24 bits per second (bps).
The watermark vector is added to the coefficients of an eighth-order LP
model. The length of the watermark vector (and hence the predictor
model order) and the duration of speech frame can be selected
arbitrarily, subject to constraints on stegosignal fidelity. These constraints
include an upper limit on the predictor model order, and a need to use
FIR models of small order for short speech frames (~500 samples).
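
The quoted bit rate follows directly from the frame duration and the size of the symbol alphabet. As a short check of the arithmetic, using only the figures stated above:

$$\begin{aligned}
\text{bits per symbol} &= \log_2 8 = 3, \\
\text{samples per frame} &= 0.125\ \text{s} \times 16000\ \text{samples/s} = 2000, \\
\text{symbols per second} &= 1 / 0.125\ \text{s} = 8, \\
\text{bit rate} &= 3\ \text{bits/symbol} \times 8\ \text{symbols/s} = 24\ \text{bps}.
\end{aligned}$$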
Extensive experimentation by the author has shown that noise
in the parameter domain, caused by stegosignal exposure to additive
noise, is well-modeled by a Gaussian distribution. Figure 3.1(a) shows
a typical noise distribution in the LP domain when white noise (SNR 15
dB) is added to the stegosignal. The noise distribution for Fig. 3.1(a)
was obtained by conducting 1000 experiments, involving a stegosignal
of 1 s duration watermarked at CWRseg of 7 dB using a watermark
message consisting of eight orthogonal vectors, each vector embedded
into 0.125 seconds (2000 samples) of speech. When white Gaussian
noise was added to the stegosignal, the noise effects on a particular
watermark coefficient could be approximated as independent and
identically distributed (i.i.d.) Gaussian noise. The LP noise associated with
a particular watermark coefficient was uncorrelated with the LP noise
for other watermark coefficients. The parameter noise samples were
also uncorrelated with the corresponding LP coefficients. It should be
noted that when the stegosignal is subjected to additive white noise,
the parameter noise asymptotically tends to zero as $N \to \infty$ (discussed
further in Section 3.2.3) and is of very low power. However, the noise
generated using the “randn” function in MATLAB is not ideal white noise.

The parameter noise distribution of the stegosignal plus colored
noise is similar to that shown in Fig. 3.1(b). Colored noise was gener-
ated by lowpass ﬁltering white noise using an IIR Butterworth ﬁlter.
The LP noise affecting any given watermark coefﬁcient was found to be
i.i.d. Gaussian. However, a realization of noise affecting all the L = 8M
watermark coefﬁcients was found to be correlated with the original LP
coefﬁcients.

A solution to this problem is to normalize the watermark
coefficients before adding them to the original LP coefficients. That is,
instead of directly adding the watermark vector to the original LP
coefficients ($\tilde{a} = a + \omega$), we obtain the modified LP coefficients as

$$\tilde{a}_i = a_i + \omega_i |a_i|. \tag{3.15}$$

[Figure 3.1, panels (a) and (b): histograms titled “Distribution of noise affecting the 5th watermark coefficient”; horizontal axis: Amplitude; vertical axis: Frequency.]

Figure 3.1: Typical noise distribution in the LP domain for any
coefficient. For Fig. 3.1(a) 15 dB white noise was added in time domain to
the stegosignal, and for Fig. 3.1(b) 15 dB colored noise was added to
the stegosignal.

From the estimate of the modified LP coefficients, the watermark vector
is obtained as

$$\hat{\omega}_i = \frac{\hat{\tilde{a}}_i - a_i}{|a_i|} \tag{3.16}$$

with $\hat{\tilde{a}} = \{\hat{\tilde{a}}_i\}_{i=1}^{M}$, as defined in Table 3.2. However, when $|a_i| \ll 1$, the
recovery of watermark coefficients magnifies the noise variance in the
LP domain. To avoid this, watermark coefficients are normalized before
embedding, but only if $|a_i| \ge 1$. For the experiments presented in the
rest of the chapter, the watermark embedding and recovery involves this
“selective normalization.” Accordingly, Step 3 of Table 3.1 is carried
out using the following rule in the present algorithm:

$$\tilde{a}_i = \begin{cases} a_i + \omega_i |a_i|, & \text{if } |a_i| \ge 1 \\ a_i + \omega_i, & \text{otherwise.} \end{cases}$$

The final step in the recovery algorithm (Table 3.2) involves the
following equation:

$$\hat{\omega}_i = \begin{cases} (\hat{\tilde{a}}_i - a_i)/|a_i|, & \text{if } |a_i| \ge 1 \\ \hat{\tilde{a}}_i - a_i, & \text{otherwise.} \end{cases}$$
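
The selective normalization rule and its inverse can be sketched as a pair of small functions. The names `embed_selective` and `recover_selective` and the coefficient values are illustrative assumptions; in practice the second argument of the recovery function would be the LSE estimate from Table 3.2 rather than the exact modified coefficients.

```python
def embed_selective(a, omega):
    """Step 3 of Table 3.1 with selective normalization (hypothetical helper)."""
    return [ai + wi * abs(ai) if abs(ai) >= 1.0 else ai + wi
            for ai, wi in zip(a, omega)]

def recover_selective(a, a_hat):
    """Final step of Table 3.2: undo the normalization only where it was applied."""
    return [(ah - ai) / abs(ai) if abs(ai) >= 1.0 else ah - ai
            for ai, ah in zip(a, a_hat)]

# Illustrative LP coefficients: two with |a_i| >= 1 and two with |a_i| < 1.
a = [1.5, -0.3, -2.0, 0.8]
omega = [0.01, -0.02, 0.03, 0.04]
a_mod = embed_selective(a, omega)
omega_hat = recover_selective(a, a_mod)
```

Both functions branch on the original coefficient $a_i$, which is available to the private decoder, so embedding and recovery always agree on which coefficients were normalized.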

In Table 3.3, $\mu$ and $\sigma^2$ are the parameter noise mean and
variance, and $c_{ra}(0)$ is the cross-correlation between the recovered vector
and the original LP coefficients. The values of $\mu$, $\sigma^2$, $c_{ra}(0)$ were
determined by conducting 1000 experiments, involving a stegosignal of

 

 

 

 

 

 

 

Table 3.3: Effect of selective normalization

Noise   SNR (dB)   Normalization   $\mu$                     $\sigma^2$   $c_{ra}(0)$
White   10         no              $2.849 \times 10^{-4}$    0.0517       $-0.0059$
White   10         complete        $-0.0152$                 4.6477       $-0.0051$
White   10         selective       $-6.2 \times 10^{-5}$     0.1099       $7.645 \times 10^{-4}$
Color   15         no              $2.3438 \times 10^{-4}$   0.0049       0.0328
Color   15         complete        0.0139                    0.7518       $-0.0094$
Color   15         selective       $-1.162 \times 10^{-4}$   0.0071       0.0023

 

 

 

 

 

 

 

 

1 s duration. The stegosignal was watermarked at CWRseg of 7 dB
using a watermark message consisting of eight orthogonal vectors, each
vector embedded into 0.125 seconds (2000 samples). Selective normal-
ization of watermark coefﬁcients signiﬁcantly reduces the correlation
between noise in LP domain and the LP coefﬁcients, especially when
the stegosignal is subjected to colored noise in the time domain (see
Table 3.3). Moreover, as the noise variance in the LP domain is re-
duced, selective normalization improves the cross-correlation between
the original and recovered watermarks compared to the complete nor-
malization case. Figures 3.2(a) and (b) also show an improvement in
the correlation coefﬁcient values when selective normalization is used.
[Figure 3.2, panels (a) and (b): correlation coefficient versus SNR, with curves for selective normalization, complete normalization, and no normalization.]

Figure 3.2: Effect of complete normalization, selective normalization,
and no normalization of watermark coefficients on the correlation
coefficient between original and recovered watermarks. In 3.2(a) the
stegosignal was distorted by white noise in the time domain, and in
3.2(b) colored noise was added to the stegosignal.

The watermark detection process is treated as a binary decision
problem in the presence of additive noise. Preliminary watermark
detection experiments are used to set the hypotheses,

$$H_0: \; r_i = v_i, \quad i = 1, 2, \ldots, L$$
$$H_1: \; r_i = \omega_i + v_i, \quad i = 1, 2, \ldots, L$$

where $\{r_i\}_{i=1}^{L}$ is the set of elements in the observation vector. The null
hypothesis is that no watermark is present and only noise $\{v_i\}_{i=1}^{L}$ is
transmitted, while under $H_1$, both watermark $\{\omega_i\}_{i=1}^{L}$ and noise samples
$\{v_i\}_{i=1}^{L}$ are present in additive combination. Due to selective
normalization of watermark coefficients, noise in the LP domain, $v_i$, is distributed
as $\mathcal{N}(0, \sigma^2)$ when noise $\{\eta_n\}_{n=1}^{N}$ is added to the stegosignal in the time
domain such that the SNR is $S_1 = 10 \log_{10} [(\sum_{n=1}^{N} \tilde{y}_n^2) / (\sum_{n=1}^{N} \eta_n^2)]$. For
this watermark detection problem, the expressions for false-alarm,
detection and missed-detection rates are well-known and are given by
(e.g., [40]),

$$P_F = 0.5 \left[ \mathrm{erfc} \left( \frac{\ln \tau + \frac{1}{2\sigma^2} \sum_{i=1}^{L} \omega_i^2}{2 \sqrt{\epsilon}} \right) \right] \tag{3.17}$$

$$P_D = 0.5 \left[ \mathrm{erfc} \left( \frac{\ln \tau + \frac{1}{2\sigma^2} \sum_{i=1}^{L} \omega_i^2 - \mu_1}{2 \sqrt{\epsilon}} \right) \right] \tag{3.18}$$

$$P_M = 1 - P_D \tag{3.19}$$

Here, $\mu_1 = \sigma^{-2} \sum_{i=1}^{L} \omega_i^2$, $\epsilon = (2\sigma^2)^{-1} \sum_{i=1}^{L} \omega_i^2$, and $\tau$ is the
detection threshold. Let $\tau'' = \sigma^2 \ln \tau + 0.5 \sum_{i=1}^{L} \omega_i^2$; then, the decision rule
is

$$\sum_{i=1}^{L} r_i \omega_i \; \underset{H_0}{\overset{H_1}{\gtrless}} \; \tau''. \tag{3.20}$$

In a practical implementation, the threshold $\tau''$, corresponding to an
SNR of $S_1$, can be adjusted further if the actual SNR in the time
domain is determined. As an example, if the SNR were found to be
$S_2 = 10 \log_{10} [(\sum_{n=1}^{N} \tilde{y}_n^2) / (\sum_{n=1}^{N} \eta_n^2)]$ (assuming zero-mean noise), the
threshold $\tau''$ is altered by multiplying $\sigma^2$ with the adjustment factor
$1/\beta$, where $\beta = 10^{(S_1 - S_2)/10}$.
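
The decision rule and the associated error rates can be sketched numerically. The code below uses the textbook result for a known signal in white Gaussian noise, written in terms of $d^2 = \sum_i \omega_i^2 / \sigma^2$; that parameterization, the function names, and the example values are assumptions made for illustration and are not taken from the experiments.

```python
import math

def detection_rates(omega, sigma2, tau):
    """False-alarm, detection and missed-detection rates for likelihood-ratio
    threshold tau, assuming i.i.d. N(0, sigma2) parameter noise."""
    d2 = sum(w * w for w in omega) / sigma2        # parameter-domain SNR
    d = math.sqrt(d2)
    p_f = 0.5 * math.erfc((math.log(tau) + 0.5 * d2) / (math.sqrt(2.0) * d))
    p_d = 0.5 * math.erfc((math.log(tau) - 0.5 * d2) / (math.sqrt(2.0) * d))
    return p_f, p_d, 1.0 - p_d

def decide(r, omega, sigma2, tau):
    """Correlation test: True means the watermark is declared present (H1)."""
    threshold = sigma2 * math.log(tau) + 0.5 * sum(w * w for w in omega)
    statistic = sum(ri * wi for ri, wi in zip(r, omega))
    return statistic > threshold

# Illustrative watermark vector and noise variance.
omega = [0.3, -0.2, 0.1, 0.4]
p_f, p_d, p_m = detection_rates(omega, sigma2=0.01, tau=1.0)
present = decide(omega, omega, 0.01, 1.0)          # noise-free watermarked input
absent = decide([0.0] * 4, omega, 0.01, 1.0)       # noise-free unmarked input
```

With $\tau = 1$ the threshold sits midway between the two hypotheses, so the false-alarm and missed-detection rates are equal; raising $\tau$ trades a lower false-alarm rate for a lower detection rate.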

The SNR in the parameter domain is defined as $d^2 = (\mu_1 / \sqrt{2\epsilon})^2$ [40].
In the present case, $d = \sqrt{\mu_1}$. Hence embedded marks of greater energy
will result in improved robustness, while noise of higher variance in the
parametric domain will hinder watermark detection. The stegosignal
was subjected to additive white and colored noise, resulting in different
SNRs in the time and parameter domains. In each case, experiments
were repeated 1000 times in order to estimate the mean and variance of
the Gaussian noise affecting each watermark coefﬁcient. Receiver op-
erating characteristics (ROC) were determined using equations (3.17)

and (3.18). It is observed in Table 3.4 that very low false-alarm rates


Table 3.4: Estimates of SNR, $d^2$, $P_D$ and $P_F$

Noise     SNR (dB)   $d^2$     $P_D$     $P_F$                     $\tau''$
White     15         696.95    0.99999   4.37x10-m                 6.8699
White     10         72.79     0.99994   $1.37 \times 10^{-6}$     4.3960
Colored   7          167.29    0.99999   $1.20 \times 10^{-18}$    5.4038
White     3          14.45     0.9987    0.215                     1.6610
White     1          9.54      0.99715   0.37304                   0.8388

 

 

 

 

 

 

 

 

can be achieved using parametric watermarking with selective
normalization. For example, when 10 dB white noise is added to the
stegosignal, for a threshold $\tau'' = 4.3960$, a $P_D = 0.99994$ and a false-alarm
rate $P_F = 1.37 \times 10^{-6}$ is obtained. Experiments were performed for
time domain SNRs of 1 dB and 3 dB and $P_F$ was found to be 0.14 and
0.0033 respectively, an improvement over the results in Table 3.4. It
should be noted that for time domain SNRs below 10 dB, the resulting
stegosignals are degraded to the point of being of little use as surrogates
for the coversignal. Comparing the SNR in the time and parameter
domains, it can be observed from Table 3.4 that parametric
watermarking significantly boosts the SNR. The resulting parameter noise gain
suppression contributes to improved watermark detection.


3.2 Experiments and discussion

3.2.1 Introduction

Robustness refers to the ability of the watermark to tolerate distor-
tion from any source to the extent that the quality of the coversignal is
not affected beyond a set ﬁdelity standard, or that the watermark detec-
tion and recovery processes are not hindered. Experiments performed
to investigate the perceptual and robustness aspects of LP parametric
watermarking are presented in this section. Some of the factors affect-
ing the robustness of the present technique include the length of the
speech frame to be watermarked, the choice of watermark sequence, the
relative energy of the watermark, and the temporal locations and du-
rations of the watermarks in the stegosignal. In broader terms, water-
mark robustness also depends on the watermark embedding, recovery,
and detection algorithms.

[Figure 3.3, panels (a) and (b): waveform plots; vertical axis: Amplitude; horizontal axis: Time (s).]

Figure 3.3: Plots of (a) coversignal and (b) stegosignal at CWRseg of
7.715 dB. The coversignal and the stegosignal are of 1 s duration and
sampled at 16 kHz. The speech is divided into frames of 2000 samples
and a watermark vector is embedded into each of the eight frames.

For the experiments below, speech was watermarked using both
LP based parametric and SS watermarking algorithms. Both LP and
SS watermarking algorithms involve private decoding. In the
experiments presented below, the coversignal [shown in Fig. 3.3(a)] consists
of 1 s of speech from the TIMIT database [41], sampled at 16 kHz. The
sentence “She had your dark suit in greasy wash water all year.” is
uttered by a female talker. For the robustness experiments, parametric
watermarking is implemented at CWRseg’s of 7.715 dB and 10.68 dB.
SS watermarking was implemented at CWRseg’s of 7.715 dB, 10.68 dB, 27
dB, and 30 dB. This is explained further in Section 3.2.2. The sample
correlation coefficient is used as the measure of similarity between
original and recovered watermark vectors for both parametric and SS
watermarking techniques. The correlation coefficient between two
random variables $\omega$ and $r$ is given by

$$\frac{c_{\omega r}(0) - E(\omega) E(r)}{\sigma_\omega \sigma_r}, \tag{3.21}$$

where $c_{\omega r}$ is the cross-correlation between $\omega$ and $r$, $E(\omega)$ and $E(r)$ are
the expected values of $\omega$ and $r$, and $\sigma_\omega^2$ and $\sigma_r^2$ are the variances of
$\omega$ and $r$, respectively. For the sample correlation coefficient, the expected
values of $\omega$ and $r$ are replaced by the sample means $m_\omega = \frac{1}{L} \sum_{i=1}^{L} \omega_i$
and $m_r = \frac{1}{L} \sum_{i=1}^{L} r_i$, and the variances $\sigma_\omega^2$ and $\sigma_r^2$ are replaced by the
sample variances of $\omega$ and $r$ given by $\mathrm{var}_\omega = \frac{1}{L} \sum_{i=1}^{L} (\omega_i - m_\omega)^2$ and
$\mathrm{var}_r = \frac{1}{L} \sum_{i=1}^{L} (r_i - m_r)^2$, respectively. The sample cross-correlation
between $\omega$ and $r$ at lag 0 is $\frac{1}{L} \sum_{i=1}^{L} \omega_i r_i$. Since the watermark vectors
are mutually orthogonal, the correlation coefficient between distinct
watermark vectors is 0.
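
The sample correlation coefficient defined above can be computed as in the following sketch. The function name and the test vectors are illustrative assumptions; the two length-four vectors are chosen to be orthogonal and zero-mean so that their correlation is exactly zero.

```python
import math

def sample_correlation(w, r):
    """Sample version of (3.21), using 1/L normalization throughout."""
    L = len(w)
    m_w = sum(w) / L
    m_r = sum(r) / L
    var_w = sum((x - m_w) ** 2 for x in w) / L
    var_r = sum((x - m_r) ** 2 for x in r) / L
    c_wr0 = sum(x * y for x, y in zip(w, r)) / L   # cross-correlation at lag 0
    return (c_wr0 - m_w * m_r) / math.sqrt(var_w * var_r)

# Two orthogonal, zero-mean watermark vectors and a noisy copy of the first.
w1 = [1.0, -1.0, 1.0, -1.0]
w2 = [1.0, 1.0, -1.0, -1.0]
noisy = [x + e for x, e in zip(w1, [0.1, -0.05, 0.02, 0.08])]
rho_same = sample_correlation(w1, w1)
rho_other = sample_correlation(w1, w2)
rho_noisy = sample_correlation(w1, noisy)
```

A recovered vector that is a lightly perturbed copy of the embedded one yields a coefficient close to 1, while a different orthogonal watermark yields 0, which is what makes the measure usable for identifying which symbol was embedded.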

[Figure 3.4: overlaid waveform segments; vertical axis: Amplitude; horizontal axis: Time.]

Figure 3.4: Segments of cover (dotted line) and stegosignals (continuous
line) of 480 samples or of 0.03 s duration and a CWRseg of 7.715 dB.
The cover and stegosignals used in the robustness experiments are of 1
s duration and sampled at 16 kHz. The speech is divided into frames
of 2000 samples and a watermark vector is embedded into each of the
eight frames.

Bit error rate is another commonly used performance measure of
similarity between the original and recovered watermarks. The bit
error rate (BER) is defined as the ratio of the number of bit errors to the total
number of bits transmitted. In this work, it is more relevant to use
correlation coefﬁcient than BER because it is more important to char-
acterize the performance based on the detection and recovery of the
entire watermark vector rather than the individual bits or watermark
vector elements. The probability of signal or watermark vector error
is also a useful performance measure. The relation between correlation
coefﬁcient, probability of signal error and BER can be found in [42].
In a practical implementation, a recovered vector, possibly containing
the watermark, is ﬁrst sent to the detector of Section 3.1.5, which is
governed by the decision rule in equation (3.20).

For LP parametric watermarking, the speech was divided into eight
frames of 2000 samples each, or 0.125 seconds duration. The watermarks
comprised eight non-binary orthogonal vectors of length eight. In each
of the speech frames, a length-eight watermark vector was embedded
into the coefficients of an eighth-order LP model, resulting in a bit
rate of 24 bits per second. For the parametric watermarking
experiments presented in this section, the watermark embedding and
recovery involved selective normalization. The resulting stegosignal
[Fig. 3.3(b)] was subjected to the various attacks discussed below.

For the SS algorithm, the stegosignal $\{\tilde{y}_i\}$ was obtained by
adding the watermark sequence $\{g_i\}_{i=1}^{1000}$ to the 1000 largest
DCT coefficients of the coversignal of 1 s duration:

    $$\tilde{Y}_i = Y_i (1 + \lambda g_i),$$

where every $g_i$ is independently drawn from $N(0,1)$, and $Y_i$ and
$\tilde{Y}_i$ are the $i$th largest DCT coefficients of the coversignal
and stegosignal, respectively. The $\lambda$ parameter is adjusted to
obtain a desired CWRseg.
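
The multiplicative embedding rule above can be sketched as follows. This
is only an illustration under stated assumptions: "largest" is taken to
mean largest in magnitude, an orthonormal DCT-II is built explicitly as a
matrix (practical only for short frames), the cover is a random stand-in
rather than speech, and the names `dct_matrix`, `ss_embed` and `lam` are
hypothetical.

```python
import numpy as np

def dct_matrix(n):
    """Orthonormal DCT-II matrix D, so that D @ x is the DCT and D.T inverts it."""
    k = np.arange(n)[:, None]
    m = np.arange(n)[None, :]
    D = np.sqrt(2.0 / n) * np.cos(np.pi * (2 * m + 1) * k / (2 * n))
    D[0, :] /= np.sqrt(2.0)
    return D

def ss_embed(cover, g, lam):
    """Scale the len(g) largest-magnitude DCT coefficients by (1 + lam * g_i)."""
    D = dct_matrix(len(cover))
    Y = D @ cover
    idx = np.argsort(np.abs(Y))[::-1][:len(g)]   # positions of the largest coefficients
    Y_stego = Y.copy()
    Y_stego[idx] = Y[idx] * (1.0 + lam * g)
    return D.T @ Y_stego, idx

rng = np.random.default_rng(0)
cover = rng.standard_normal(256)       # stand-in for a speech frame (assumption)
g = rng.standard_normal(32)            # watermark sequence drawn from N(0, 1)
lam = 0.1                              # scaling parameter (illustrative value)
stego, idx = ss_embed(cover, g, lam)

# Informed recovery: invert the embedding rule using the known cover.
D = dct_matrix(len(cover))
g_rec = ((D @ stego)[idx] / (D @ cover)[idx] - 1.0) / lam
```

In the absence of an attack the watermark sequence is recovered exactly;
increasing `lam` raises the watermark energy and lowers the CWRseg.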

3.2.2 Subjective perceptual tests

Although CWRseg is used as the objective measure of fidelity,
listening tests were also performed to compare the watermarked speech
fidelity. Speech was watermarked using both parametric and SS
algorithms for CWRseg ranging from 1 dB to 40 dB.

For the robustness experiments discussed in the following section,
two implementations of LP parametric watermarking, at CWRseg of
7.715 dB and 10.68 dB, were used. Parameter-embedded watermarks
were inaudible at these or higher CWRseg [30]. Different CWRseg values
can be selected depending on the fidelity constraint for a given
application. For performance comparison, implementations of SS
watermarking at 7.715 dB and 10.68 dB were also used. Additionally,
listening tests were performed to subjectively identify the CWRseg
values of SS-watermarked stegosignals whose fidelity was comparable to
the 7.715 dB and 10.68 dB implementations of parametric watermarking.
This was imperative, as an objective measure such as CWRseg, although
an improvement over CWR, does not satisfactorily quantify all the
perceptual aspects of fidelity. Five subjects were asked to select the
SS watermarking implementations that sounded most similar to the 7.715
dB and the 10.68 dB implementations of LP parametric watermarking,
from a set of stegosignals with CWRseg values ranging from 1 dB to 40
dB. The sound files used in the listening tests are available at the
website [30]. Based on the subjective tests it was concluded that the
7.715 dB implementation of LP parametric watermarking was perceptually
similar to the 27 dB implementation of SS watermarking, and the
fidelity of the 10.68 dB implementation of LP parametric watermarking
was comparable with the 30 dB implementation of SS watermarking. This,
in itself, is significant because it demonstrates the fidelity benefits
that can be achieved through parametric watermarking.

3.2.3 Watermark robustness

In this section, we analyze the robustness to common attacks of
watermarks inserted by LP-based parametric watermarking. The
stegosignals used in these experiments were obtained by embedding
watermarks through direct manipulation of the LP coefficients. The SS
watermarking algorithm for multimedia signals [10] was used to
benchmark performance.

For meaningful analysis of detection performance, it is necessary
to consider stationary segments of the coversignal and the stegosignal.
That is, segments of $y_n$, $w_n$, and, hence, $\tilde{y}_n$ are assumed
to be partial realizations of wide-sense stationary (WSS) and ergodic
random processes. Generally, speech sequences can be considered
stationary across frames of duration 20 ms. However, the robustness
experiments presented below are based on speech frames of longer
duration, typically 125 ms, in order to balance the conflicting
requirements of stationarity and longer frame lengths for the LSE
estimation and stegosignal fidelity. Hence, one important observation
to be made based on the experimental results is the effect of
non-stationarity on watermark robustness.

Robustness to additive white noise attack

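Let $\{\eta_n\}_{n=1}^{N}$ be a partial realization of a zero-mean,
uncorrelated noise process which is added to the stegosignal samples
$\{\tilde{y}_n\}_{n=1}^{N}$. Let the corrupted stegosignal be denoted
$\{\tilde{y}_n^{\eta}\}_{n=1}^{N}$. In this case, the “output” signal
used in the LSE problem (equation (3.5) for $n \in [1, N]$) will be
likewise corrupted. That is, the clean signal $d_n$ is replaced by, say,

    $$d_n^{\eta} = \tilde{y}_n^{\eta} - \xi_n = d_n + \eta_n, \quad n = 1, 2, \ldots, N. \qquad (3.22)$$

Accordingly, the cross-correlation vector $\mathbf{c}_{yd^{\eta}}$ [i.e.,
right side of normal equations (3.6)], but only this vector, is affected
by the attack. The LSE solution is

    $$\hat{\tilde{\mathbf{a}}}^{\eta} = \mathbf{C}_y^{-1}\mathbf{c}_{yd^{\eta}} = (\mathbf{Y}_N\mathbf{Y}_N^T)^{-1}\mathbf{Y}_N\mathbf{d}_N^{\eta}, \qquad (3.23)$$

where $\mathbf{d}_N^{\eta} = [\, d_N^{\eta} \;\; d_{N-1}^{\eta} \;\cdots\; d_1^{\eta} \,]^T \in \mathbb{R}^N$.
Equation (3.23) can be expressed as

    $$\hat{\tilde{\mathbf{a}}}^{\eta} = \mathbf{C}_y^{-1}\mathbf{c}_{yd} + \mathbf{C}_y^{-1}\mathbf{c}_{y\eta} = \tilde{\mathbf{a}} + \mathbf{C}_y^{-1}\mathbf{c}_{y\eta}. \qquad (3.24)$$

The $i$th value of the cross-correlation term $\mathbf{c}_{y\eta}$ is
given by $c_{y\eta}(i) = \sum_{n=1}^{N} y_{n-i}\,\eta_n$. Since the noise
is uncorrelated, $\mathbf{c}_{y\eta}$ asymptotically, as $N \to \infty$,
approaches the zero vector $\mathbf{0}$. Hence the “corrupted”
cross-correlation, $\mathbf{c}_{yd^{\eta}}$, approaches $\mathbf{c}_{yd}$
for large $N$. The watermark is therefore asymptotically immune to the
white noise attack. In the presence of white noise,
$\hat{\tilde{\mathbf{a}}}^{\eta}$ is an unbiased and consistent estimator
of $\tilde{\mathbf{a}}$ for all $N$.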
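
The LSE recovery under a white noise attack can be sketched numerically
as follows. This is a minimal illustration, not the experimental code:
the cover is a synthetic autoregressive sequence standing in for a
speech frame, the watermark vector and noise level are arbitrary, and
the helper names are hypothetical. The regressor columns are ordered by
increasing time, which does not change the normal equations.

```python
import numpy as np

def regressor_matrix(y, M):
    """Columns are [y[n-1], ..., y[n-M]]^T for n = M, ..., len(y)-1."""
    return np.array([y[n - M:n][::-1] for n in range(M, len(y))]).T   # shape (M, N)

def lse(Y, d):
    """Solve the normal equations (Y Y^T) a = Y d."""
    return np.linalg.solve(Y @ Y.T, Y @ d)

rng = np.random.default_rng(1)
M, N = 8, 2000
# Synthetic stand-in for a speech frame: white noise through a stable AR(2) filter.
y = np.zeros(N + M)
e = rng.standard_normal(N + M)
for n in range(2, N + M):
    y[n] = 1.3 * y[n - 1] - 0.6 * y[n - 2] + e[n]

Y = regressor_matrix(y, M)
target = y[M:]
a = lse(Y, target)                 # LP coefficients of the cover
xi = target - Y.T @ a              # exact prediction residual
w = 0.05 * np.array([1, -1, 1, 1, -1, 1, -1, -1], dtype=float)  # watermark vector
stego = Y.T @ (a + w) + xi         # stegosignal from perturbed coefficients

def recover(stego_attacked):
    """Informed recovery: the cover (hence Y, a and xi) is known."""
    d = stego_attacked - xi        # "output" signal of the LSE problem
    return lse(Y, d) - a           # recovered watermark vector

noise = 0.1 * rng.standard_normal(N)
w_clean = recover(stego)
w_noisy = recover(stego + noise)
rho = np.corrcoef(w, w_noisy)[0, 1]
```

Without noise the watermark is recovered to numerical precision; with
additive white noise the estimate deviates by the term involving the
cross-correlation between the regressors and the noise, which shrinks as
the frame length grows.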

To verify the analysis, experiments were performed in which white
Gaussian noise resulting in different SNRs was added to speech
watermarked by both LP and SS algorithms. The correlation coefficients
between the original and recovered watermarks from all eight
stegosignal frames were determined and averaged. It is seen in Fig. 3.5
that, at any SNR, LP parametric watermarking at 7.715 dB and 10.68 dB
CWRseg results in higher correlation between original and recovered
watermarks compared to SS watermarking at CWRseg values of 7.715 dB,
10.68 dB, 27 dB or 30 dB. This improvement in the correlation
coefficient values, and hence robustness, is mainly due to the
LSE-based recovery algorithm. This level of robustness to white noise
attack is sufficient for a wide range of watermarking applications, as
the stegosignal is highly noisy below an SNR of 15 dB (for details see
[30]). The non-stationarity of the 2000-sample watermarked speech frame
can be ignored for practical applications of parametric watermarking.
Also, as expected, watermark robustness to attack increases as the
CWRseg is decreased, since there is greater watermark energy in the
same coversignal.

[Figure 3.5 plot: correlation coefficient versus SNR (-20 to 80) for
parametric watermarking at 7 dB and 10 dB and SS watermarking at 7 dB,
10 dB, 27 dB and 30 dB.]

Figure 3.5: Watermark robustness to white noise attack. Performance
of parametric watermarking at CWRseg values of 7.715 dB and 10.68 dB is
compared with that of SS watermarking at 7.715 dB, 10.68 dB, 27 dB
and 30 dB CWRseg.

Robustness to colored noise attack

In the next set of experiments, the stegosignal segment was distorted
by the addition of a colored noise process, $\{\gamma_n\}_{n=1}^{N}$.
Colored noise was generated by filtering a white noise process using an
11th-order FIR lowpass filter with a cut-off frequency of 0.4
(normalized) or 6400 Hz. The distorted stegosignal frame is denoted
$\{\tilde{y}_n^{\gamma}\}_{n=1}^{N}$. As in the white noise case, the
“output” signal in the watermark recovery process is corrupted. Instead
of $d_n$, we have access to

    $$d_n^{\gamma} = \tilde{y}_n^{\gamma} - \xi_n = d_n + \gamma_n, \quad n = 1, 2, \ldots, N. \qquad (3.25)$$

Consequently, the cross-correlation vector in the normal equations is
altered by the attack. Because of the correlation in the noise,
$\mathbf{c}_{yd^{\gamma}}$ no longer approaches $\mathbf{c}_{yd}$
asymptotically. Depending on the relative magnitudes of the
cross-correlation elements in $\mathbf{c}_{yd^{\gamma}}$, the LSE
estimation of the perturbed coefficients, and hence the watermark, will
be affected. The solution to this problem is a prewhitening procedure.

In the presence of colored noise, the LSE estimation problem is
represented by the following equation,

    $$\mathbf{Y}_N^T \hat{\tilde{\mathbf{a}}}^{\gamma} = \mathbf{d}_N + \boldsymbol{\gamma}_N, \qquad (3.26)$$

in which $\boldsymbol{\gamma}_N = [\, \gamma_1 \;\; \gamma_2 \;\cdots\; \gamma_N \,]^T$
and all other quantities are defined above. Pre-multiplying both sides
of equation (3.26) by the inverse covariance matrix of the colored
noise, $\mathbf{C}_{\gamma}^{-1}$, and rearranging the terms, results in

    $$\hat{\tilde{\mathbf{a}}}^{\gamma} = (\mathbf{Y}_N\mathbf{C}_{\gamma}^{-1}\mathbf{Y}_N^T)^{-1}\mathbf{Y}_N\mathbf{C}_{\gamma}^{-1}\mathbf{d}_N^{\gamma} = (\mathbf{Y}_N\mathbf{C}_{\gamma}^{-1}\mathbf{Y}_N^T)^{-1}\mathbf{Y}_N\mathbf{C}_{\gamma}^{-1}\mathbf{d}_N. \qquad (3.27)$$

Thus, the estimation of the perturbed LP coefficients is the solution
to (3.27) with $\mathbf{C}_y$ replaced by
$\mathbf{C}_y^{\gamma} = \mathbf{Y}_N\mathbf{C}_{\gamma}^{-1}\mathbf{Y}_N^T$
and $\mathbf{c}_{yd}$ replaced by
$\mathbf{c}_{yd}^{\gamma} = \mathbf{Y}_N\mathbf{C}_{\gamma}^{-1}\mathbf{d}_N$.
Whitening requires knowledge of the noise correlation properties, which
are readily determined in the present application.
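
The prewhitened estimate can be sketched as a generalized least-squares
computation. The following is an illustration under stated assumptions:
the regressors are random stand-ins rather than coversignal samples, the
coloring filter is a short two-tap FIR filter chosen so that the noise
covariance is well conditioned (not the 11th-order filter of the
experiments), and all function names are hypothetical.

```python
import numpy as np

def colored_noise_covariance(h, N, sigma2=1.0):
    """Covariance of white noise (variance sigma2) filtered by FIR taps h."""
    r = sigma2 * np.correlate(h, h, mode="full")[len(h) - 1:]   # lags 0..len(h)-1
    lags = np.abs(np.arange(N)[:, None] - np.arange(N)[None, :])
    C = np.zeros((N, N))
    mask = lags < len(r)
    C[mask] = r[lags[mask]]
    return C

def gls(Y, d, C_gamma):
    """(Y C^-1 Y^T)^-1 Y C^-1 d, using linear solves instead of explicit inverses."""
    Ci_Yt = np.linalg.solve(C_gamma, Y.T)      # C^-1 Y^T, shape (N, M)
    Ci_d = np.linalg.solve(C_gamma, d)         # C^-1 d
    return np.linalg.solve(Y @ Ci_Yt, Y @ Ci_d)

rng = np.random.default_rng(2)
M, N = 4, 300
Y = rng.standard_normal((M, N))              # stand-in regressor matrix
a_tilde = np.array([0.9, -0.4, 0.2, -0.1])   # perturbed LP coefficients to recover
h = np.array([1.0, 0.5])                     # FIR taps that color the noise (assumed)
gamma = 0.5 * np.convolve(rng.standard_normal(N + len(h) - 1), h, mode="valid")
d_gamma = Y.T @ a_tilde + gamma              # corrupted "output" signal

C_gamma = colored_noise_covariance(h, N, sigma2=0.25)
a_ols = np.linalg.solve(Y @ Y.T, Y @ d_gamma)   # recovery without whitening
a_gls = gls(Y, d_gamma, C_gamma)                # recovery with whitening

# Equivalent view: whiten with the Cholesky factor of C_gamma, then ordinary LSE.
Lc = np.linalg.cholesky(C_gamma)
Yw = np.linalg.solve(Lc, Y.T)
dw = np.linalg.solve(Lc, d_gamma)
a_whitened = np.linalg.lstsq(Yw, dw, rcond=None)[0]
```

The last three lines show why the procedure is called prewhitening: the
same estimate results from first decorrelating the noise and then
applying the ordinary LSE solution.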

[Figure 3.6 plot: correlation coefficient versus SNR (-20 to 80) for
parametric watermarking at 7 dB and 10 dB and SS watermarking at 7 dB,
10 dB, 27 dB and 30 dB.]

Figure 3.6: Watermark robustness to colored noise attack. Colored
noise was generated by lowpass ﬁltering white noise.

[Figure 3.7 plot: correlation coefficient versus SNR (-20 to 80) for
parametric watermarking with and without whitening at 7 dB and 10 dB.]

Figure 3.7: Improvement in watermark robustness to colored noise at-
tack due to whitening transformation.


The effect of colored noise on watermark robustness is represented
in Fig. 3.6. It is observed that LP parametric watermarking is fairly
robust to colored noise, even in the absence of a prewhitening operation
during the recovery process. In Fig. 3.6, the difference in performance
between parametric and SS watermarking algorithms at 7.715 and 10.68
dB is even greater than in the case of white noise (Fig. 3.5). An
improvement in watermark robustness to colored noise attack is observed
in Fig. 3.7, where the watermark recovery process involves prewhitening.
In fact, LP parametric watermarking with prewhitening at 10.68 dB
results in better robustness than at 7.715 dB without whitening, even
though the latter outperforms SS at 7.715 dB.

Robustness to ﬁltering

Let $\{\tilde{y}_n^f\}_{n=1}^{N}$ be the result of filtering the
stegosignal. At time $n$,

    $$\tilde{y}_n^f = \tilde{y}_n * h_n = y_n * h_n + w_n * h_n, \qquad (3.28)$$

where $\{h_n\}$ is the impulse response of the filter, $*$ denotes
linear convolution, and where we have continued to denote the watermark
signal $w_n = \tilde{y}_n - y_n$.

In the first analysis, it seems very reasonable that an ideal attack
would be designed to result in $\tilde{y}_n^f \approx y_n$. This
indicates that the ideal attack filter will maximize (in some sense)
the contribution of the first term in the sum in (3.28), and minimize
the second, similar to any optimal filter design to remove noise.¹ On
the other hand, (3.28) reveals that good watermark design requires
that the watermark signal be as spectrally similar to the coversignal
as possible, so that any attack on the watermark will also degrade the
coversignal component, thereby degrading fidelity. Since the
effectiveness of an attack is constrained by the perceptual distortion
of the stegosignal, for robustness to filtering attacks, it is
sufficient that most of the watermark information be present in
perceptually significant components of the coversignal [10]. In
general, speech signals have most of the perceptually significant
components in the low frequency spectrum, and hence watermark signals
with low frequency spectra are most likely to survive a filtering
attack, assuming that the attacker uses a rational approach which
preserves fidelity.

Watermark robustness to a 4th-order Butterworth lowpass filter
for a range of cut-off frequencies is shown in Fig. 3.9. The watermark
vector can be interpreted as the coefficients of an FIR filter,
$\{\omega_i\}_{i=1}^{M} = \{w[i]\}_{i=1}^{M}$. The magnitude response of
this FIR filter, $|W(\Omega)|$, is shown in Fig. 3.8(a), while the
magnitude response of the attack filter is shown in Fig. 3.8(b).

¹ Since the attacker does not have access to the watermark signal $w_n$,
truly optimal design, from the attacker's point of view, is not possible.

[Figure 3.8 plots: (a) magnitude response $|H(\Omega)|$ versus normalized
frequency ($\Omega$), 0 to 0.5; (b) IIR filter magnitude response versus
normalized frequency ($\Omega$), 0 to 0.5.]

Figure 3.8: Plots of (a) magnitude spectrum of the watermark
coefficients, and (b) magnitude response of the attack filter at a
normalized cut-off frequency of 0.4. A 4th-order IIR Butterworth filter
was used to test watermark robustness to lowpass filtering.

[Figure 3.9 plot: correlation coefficient versus normalized cut-off
frequency (0.34 to 0.5) for parametric watermarking at 7 dB and 10 dB
and SS watermarking at 7 dB, 10 dB, 27 dB and 30 dB.]

Figure 3.9: Robustness to lowpass filtering. A 4th-order IIR
Butterworth filter was used to implement the lowpass filtering attack.

Watermark robustness to filtering depends on the magnitude spectrum
of the embedded watermarks. Low-frequency and mid-frequency watermark
filters contribute to better robustness against lowpass filtering.
Watermark robustness can be improved further through diversity, by
repeatedly embedding watermark information [43]. Any highpass
watermark filter $\{w_{hp}[i]\}_{i=1}^{M}$ can be transformed into a
lowpass watermark $\{w_{lp}[i]\}_{i=1}^{M}$ using the relation [44]

    $$w_{lp}[i] = (-1)^i\, w_{hp}[i]. \qquad (3.29)$$

Robustness is improved by embedding the “same” watermark twice, in
the original form $\{w[i]\}_{i=1}^{M} = \{w'[i]\}_{i=1}^{M}$ and in the
frequency-translated form $w[i] = (-1)^i w'[i]$. The recovered
watermark that has a higher correlation with the embedded watermark is
used for watermark detection.
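
The effect of the frequency translation in (3.29) can be checked
numerically. The sketch below is illustrative only: the tap values are
arbitrary, frequencies are in cycles per sample (0 to 0.5), and the taps
are indexed from 0 rather than 1, which changes only the overall sign
and not the magnitude response.

```python
import numpy as np

def magnitude_response(w, omegas):
    """|W(omega)| of the FIR filter with taps w, omega in cycles per sample."""
    n = np.arange(len(w))
    return np.abs(np.exp(-2j * np.pi * np.outer(omegas, n)) @ w)

def frequency_translate(w):
    """Multiply tap i by (-1)^i, shifting the spectrum by half the sampling rate."""
    return w * (-1.0) ** np.arange(len(w))

w_hp = np.array([1.0, -2.0, 3.0, -3.0, 2.0, -1.0, 0.5, -0.5])  # illustrative highpass taps
w_lp = frequency_translate(w_hp)

omegas = np.linspace(0.0, 0.5, 51)
mag_hp = magnitude_response(w_hp, omegas)
mag_lp = magnitude_response(w_lp, omegas)
```

The translated response is the mirror image of the original about a
normalized frequency of 0.25, so energy concentrated near 0.5 moves to
the neighborhood of 0, where a lowpass attack leaves it intact.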

In order to illustrate this point, the coversignal was altered using
the watermark whose magnitude spectrum is shown in Fig. 3.10(a),
and its translated counterpart shown in Fig. 3.10(b). The resulting
stegosignals were subjected to a highpass filtering attack using a
4th-order Butterworth filter. Figures 3.11(a) and 3.11(b) show that the
transformed watermark results in improved robustness to filtering.
Highpass filters have a deleterious effect on speech quality. Even a
cut-off frequency of 0.04 (normalized) or 640 Hz resulted in significant
distortion of the stegosignals, making them unusable for typical
commercial use, for example, and certainly for the digital library
application addressed here.

Robustness to cropping

In a cropping attack, arbitrary samples of the stegosignal are
removed. Since parametric-modeling-based watermarking involves
an additive operation during the watermark embedding and recovery
processes, cropping results in desynchronization of the coversignal and
the stegosignal. However, as the present method is an informed
watermarking technique, the algorithm described in [5] can be used for
resynchronization of the cover and stegosignals.

In the present experiment, the stegosignal was subjected to a modified
version of cropping, sometimes called the jitter attack [45]. In this
modified implementation, a specified percentage of samples from each
frame of 2000 samples were randomly replaced by zeros. The fact that
watermark information is spread out in the stegosignal, while it is
concentrated during the recovery process involving LSE, contributes to
the increased robustness of LP parametric watermarking to cropping, as
shown in Fig. 3.12.
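
The modified cropping attack described above can be sketched as follows.
This is a minimal illustration; the function name, the fixed random
seed, and the constant placeholder signal are assumptions made only for
the example.

```python
import numpy as np

def zero_replacement_attack(stego, percent, frame_len=2000, seed=0):
    """Replace `percent` % of the samples in each frame with zeros, at random positions."""
    rng = np.random.default_rng(seed)
    attacked = np.array(stego, dtype=float)            # work on a copy
    for start in range(0, len(attacked), frame_len):
        frame = attacked[start:start + frame_len]      # view into `attacked`
        k = int(round(len(frame) * percent / 100.0))
        frame[rng.choice(len(frame), size=k, replace=False)] = 0.0
    return attacked

stego = np.ones(16000)                 # placeholder for a 1 s stegosignal at 16 kHz
attacked = zero_replacement_attack(stego, percent=20)
zeros_per_frame = [int(np.count_nonzero(attacked[s:s + 2000] == 0))
                   for s in range(0, 16000, 2000)]
```

Because sample positions are preserved, the attacked signal stays
synchronized with the coversignal, so the LSE recovery can be applied
frame by frame without resynchronization.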

[Figure 3.10 plots: (a) and (b) magnitude response $|H(\Omega)|$ versus
normalized frequency ($\Omega$), 0 to 0.5.]

Figure 3.10: Plots of (a) magnitude spectrum of the original watermark
coefficients $h[n]$, and (b) magnitude response of the transformed
watermark coefficients, $(-1)^n h[n]$.

 

[Figure 3.11 plots: (a) and (b) correlation coefficient versus
normalized cut-off frequency (0 to 0.16) for parametric watermarking at
7 dB and 10 dB and SS watermarking at 7 dB, 10 dB, 27 dB and 30 dB.]

Figure 3.11: Robustness to a 4th-order Butterworth highpass filter. In
(a), the embedded watermark coefficients corresponded to the magnitude
spectrum shown in Fig. 3.10(a), and in (b) the watermark coefficients
were transformed using equation (3.29) and embedded.

 

[Figure 3.12 plot: correlation coefficient versus % cropped samples
(0 to 100) for parametric watermarking at 7 dB and 10 dB and SS
watermarking at 7 dB, 10 dB, 27 dB and 30 dB.]

Figure 3.12: Robustness to cropping. Samples of the stegosignal were
randomly cropped. Parameter-embedded watermarking results in im-
proved robustness to cropping.

Table 3.5: Robustness to speech coding (correlation coefficient between
original and recovered watermarks)

Speech   Bit rate    Para. wmkg.  Para. wmkg.  SS wmkg.  SS wmkg.
codec    (kbits/s)   7 dB         10 dB        7 dB      10 dB
------------------------------------------------------------------
G.711    64          0.9990       0.9998       0.9985    0.9966
ADPCM    32          0.9889       0.9682       0.8207    0.7658
GSM      13.2        0.6584       0.5545       0.4095    0.3140
CELP     4.5         0.1488       0.1490      -0.0225   -0.0290
CELP     2.3         0.1269       0.1464       0.0209    0.0472
LPC10    2.4         0.1762       0.1762      -0.0184   -0.0470

Robustness to speech coding

Experiments were performed to study the effect of low-bit-rate
speech compression on LP-based parametric watermarking. The
stegosignal was compressed (coded), then decompressed (decoded), and
the watermarks were recovered from the decompressed signal. The
correlation coefficients between the original and recovered watermarks
for different speech codecs are tabulated in Table 3.5. The attacked
stegosignals are available on the website [30]. The G.711 μ-law, G.726
ADPCM, GSM (13.2 kbits/s), LPC10, and CELP (4.5 kbits/s) codecs
were obtained from the website [46]. The G.711 speech codec software
uses logarithmic pulse code modulation (PCM) and operates at a
sampling frequency of 8 kHz, with 8 bits per sample, to compress and
decompress speech. The G.726 codec uses an adaptive differential PCM
technique and is widely used in VoIP applications. The GSM full rate
codec uses 8th-order linear prediction along with 13-bit uniform PCM.
CELP and LPC10 codecs are also based on parametric models of speech. At
4.5 kbits per second or less, the attacked stegosignals are
intelligible, but are of low fidelity [30].

It is seen from Table 3.5 that the compression bit rate and CWRseg
are the main factors influencing watermark robustness. LP parametric
watermarking outperforms SS watermarking in the presence of speech
coding for all the bit rates tested. The performance of both SS and
parametric watermarking is degraded significantly by low-bit-rate
CELP, GSM and LPC10 coding. However, the quality of the decompressed
speech is also degraded considerably for the CELP, GSM and LPC10
codecs [30]. Parameter-embedded watermarks are slightly more robust to
LPC10 coding than to CELP coding at 2.3 kbits/s. Although parametric
watermarking involves perturbation of parameters of significance to
speech coding, it performs better than SS watermarking in the presence
of CELP, GSM and LPC10 codecs. This is because of the more speech-like,
rather than noise-like, characteristic of the LP-based watermark
signal. At the same time, since LP analysis is performed over
nonstationary segments of speech and synthesis is not involved in
stegosignal reconstruction, robustness to a particular speech coder is
not at the expense of the robustness to other codecs.

Chapter 4

LP Parametric Watermarking with a Fidelity Constraint

4.1 Introduction

The previous chapter described a speech watermarking algorithm
wherein the LP parameters of the coversignal were modified by the
addition of a watermark vector that was selected independently of the
coversignal. The stegosignal was constructed using the correspondingly
perturbed LP coefficients and the exact prediction residual,
$\{\xi_n\}_{n=1}^{N}$, using the FIR filter
$\tilde{y}_n = \sum_{i=1}^{M} \tilde{a}_i y_{n-i} + \xi_n = \tilde{\mathbf{a}}^T \mathbf{y}_n + \xi_n$.
LP-based parametric watermarking was found to be fairly robust against
a wide variety of attacks such as addition of noise, MP3 compression,
and cropping [7]. A main reason for good robustness is that the
watermark signal is concentrated into a parametric representation
during watermark embedding and recovery, while it is spread across the
entire work otherwise. Stegosignal fidelity and watermark robustness
can be improved further if the embedded watermarks are obtained by
integrating a fidelity constraint with the watermark embedding process.
This led to SMF-based parametric watermarking.

4.2 SMF parametric watermarking

SMF-based parametric watermarking subject to an $\ell_\infty$ fidelity
constraint [29,47] represents a step toward quantifying the relationship
between the competing requirements of robustness and fidelity. The
following general problem is addressed in this research:

CONSTRAINED WATERMARKING PROBLEM. For coversignal frame
$\{y_n\}_{n=1}^{N}$ generated according to model (3.2), find the set of
watermarks such that, for stegosignal frame $\{\tilde{y}_n\}_{n=1}^{N}$
generated according to (3.3), the following fidelity criterion is met,

    $$\|\mathbf{y} - \tilde{\mathbf{y}}\|_\infty < \gamma \qquad (4.1)$$

in which $\mathbf{y}$ and $\tilde{\mathbf{y}}$ are $N$-vectors with $n$th
elements $y_n$ and $\tilde{y}_n$, respectively.

 

In the present work, the determination of a watermark set guaranteed
to satisfy a fidelity criterion is readily solved as an SMF problem
(refer to Section 2.2). First, let us subtract $y_n$ from each side of
equation (3.3), negate each side, then rearrange to obtain

    $$y_n - \tilde{y}_n = (y_n - \xi_n) - \sum_{i=1}^{M} \tilde{a}_i y_{n-i} = (y_n - \xi_n) - \tilde{\mathbf{a}}^T \mathbf{y}_n. \qquad (4.2)$$

Given a coversignal $\{\mathbf{y}_n \in \mathbb{R}^M\}_{n=1}^{N}$, a
desired stegosignal $\{\tilde{y}_n \in \mathbb{R}\}_{n=1}^{N}$, and a
maximum error tolerance $\gamma$, SMF [25] can be used to obtain the
hyperellipsoidal membership set that tightly bounds the following
feasibility set ($\mathcal{P}_N \subseteq \mathbb{R}^M$) at time $N$,

    $$\mathcal{P}_N = \{\tilde{\mathbf{a}} \mid \|\mathbf{y} - \tilde{\mathbf{y}}\|_\infty < \gamma\}, \qquad (4.3)$$

in which $\mathbf{y}$ is the $N$-vector with $n$th element $y_n$, and
$\tilde{\mathbf{y}}$ is the $N$-vector with $n$th element
$\tilde{\mathbf{a}}^T \mathbf{y}_n + \xi_n$.

The fidelity constraint can be generalized to allow for more “local”
fidelity considerations in time as the signal properties change. A
fidelity criterion takes the form of pointwise absolute error bounds,
$\{\gamma_n\}_{n=1}^{N}$, on the difference between the stego- and
coversignals: $|y_n - \tilde{y}_n| < \gamma_n$ for each $n \in [1, N]$.
Upon defining the sequence

    $$z_n = y_n - \xi_n, \quad n = 1, 2, \ldots, N, \qquad (4.4)$$

(recall that $\{\xi_n\}$ is known), the search for the constrained
watermark parameters is reduced to an SMF problem as in (2.2). Applying
the SMF method to the estimation of $\tilde{\mathbf{a}}$ as in equation
(4.2) yields a hyperellipsoidal set of watermark (perturbed model
parameter) candidates, $\mathcal{E}_N$, guaranteed to contain and
tightly bound the following exact set

    $$\mathcal{P}_N = \{\tilde{\mathbf{a}} \in \mathbb{R}^M \mid |z_n - \tilde{\mathbf{a}}^T \mathbf{y}_n| < \gamma_n,\; n \in [1, N]\}. \qquad (4.5)$$

The fidelity constraint is a bound on $|w_n|$, where the watermark
signal is given by $w_n = \tilde{y}_n - y_n$. The hyperellipsoidal set is

    $$\mathcal{E}_N = \left\{\tilde{\mathbf{a}} \;\middle|\; (\tilde{\mathbf{a}} - \mathbf{a}_c(N))^T \frac{\mathbf{C}_N}{\kappa_N} (\tilde{\mathbf{a}} - \mathbf{a}_c(N)) < 1\right\}, \quad \tilde{\mathbf{a}} \in \mathbb{R}^M, \qquad (4.6)$$

where $\mathbf{a}_c(N)$ is the center of $\mathcal{E}_N$.
$\mathbf{C}_N \in \mathbb{R}^{M \times M}$ is the covariance matrix,
$\mathbf{C}_N = \mathbf{Y}_N \mathbf{Y}_N^T$, where
$\mathbf{Y}_N = [\, \mathbf{y}_N \;\; \mathbf{y}_{N-1} \;\cdots\; \mathbf{y}_1 \,] \in \mathbb{R}^{M \times N}$.
As shown in Table 2.2.2, $\kappa_n$ is updated recursively for
$n = 1, \ldots, N$ and the final value is obtained as $\kappa_N$. By
default, the center of the hyperellipsoid is used to construct the
stegosignal (equation 3.3) and the embedded watermark vector is
$\mathbf{w} = \mathbf{a}_c(N) - \mathbf{a}$.
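
The two set descriptions above can be checked for a candidate
coefficient vector as sketched below. This is an illustration only: it
assumes the cover follows the LP model with exact residual, so that
$z_n$ equals the original coefficients applied to the regressors; the
regressors are random stand-ins; and the value of `kappa` is arbitrary,
whereas in practice it comes from the SMF recursion, which is not
reproduced here. The function names are hypothetical.

```python
import numpy as np

def satisfies_fidelity(a_tilde, Y, z, gamma):
    """True if |z_n - a_tilde^T y_n| < gamma_n for every n (the exact set)."""
    return bool(np.all(np.abs(z - Y.T @ a_tilde) < gamma))

def in_hyperellipsoid(a_tilde, center, C, kappa):
    """True if (a - a_c)^T (C / kappa) (a - a_c) < 1."""
    diff = a_tilde - center
    return bool(diff @ (C / kappa) @ diff < 1.0)

rng = np.random.default_rng(3)
M, N = 4, 500
Y = rng.standard_normal((M, N))          # stand-in regressors as columns
a = np.array([0.5, -0.3, 0.2, -0.1])     # original LP coefficients (assumed)
z = Y.T @ a                              # z_n = y_n - xi_n under the LP model
gamma = np.full(N, 0.4)                  # pointwise error bounds

C = Y @ Y.T
kappa = 1.0                              # illustrative value only
small = a + 0.01                         # mild perturbation of every coefficient
large = a + 1.0                          # strong perturbation of every coefficient

small_ok = satisfies_fidelity(small, Y, z, gamma)
large_ok = satisfies_fidelity(large, Y, z, gamma)
```

A candidate that passes the pointwise test yields a stegosignal within
the prescribed bounds of the coversignal at every sample.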

The watermark recovery process for SMF parametric watermarking
involves LSE estimation of the modified LP coefficients (refer to Table
3.2). Hence, even in the case of SMF-based watermarking, the embedded
watermarks are asymptotically ($N \to \infty$) immune to an additive
white noise attack [47].

 

4.3 Robustness optimization

The robustness property is dependent on the selection of an
appropriate watermark solution from the hyperellipsoidal set, the
strength of the embedded watermark, and watermark detection. In
general, greater robustness can be obtained by embedding more energetic
watermarks, and this, in turn, affects stegosignal fidelity. Although,
by default, the center of the hyperellipsoid constitutes the watermark
solution, in most cases it might not be the optimal solution for a
given attack. For robustness analysis it is assumed that the
hyperellipsoidal set $\mathcal{E}_N$ is obtained through the SMF
algorithm subjected to the fidelity constraint
$|\tilde{y}_n - y_n| < \gamma_n$. It should be noted that the
hyperellipsoid is not centered at $\mathbf{a}$, the vector of original
LP coefficients. More energetic watermark vectors are embedded by
selecting perturbed LP parameters from $\mathcal{E}_N$ that are as far
as possible from the original LP parameters $\mathbf{a}$. The selection
of an appropriate watermark solution from $\mathcal{E}_N$ depends on
the attack and the targeted robustness.

The SMF-based watermarking approach is especially useful in improving
watermark robustness against attacks whose effects vary based on the
nature of the watermark signal. For example, robustness to a lowpass
filtering attack can be improved by selecting low-frequency watermark
signals.

4.3.1 Optimal watermarks for a ﬁltering attack

The impulse response of an attack filter is assumed to be known
and is denoted by $\{h_n\}$. The stegosignal of form (equation 3.3) is
to be constructed by selecting an appropriate vector of perturbed LP
coefficients from the hyperellipsoidal set $\mathcal{E}_N$. The
corresponding watermark vector is defined as
$\mathbf{w} = \tilde{\mathbf{a}} - \mathbf{a}$. Let
$\{\tilde{y}_n^f\}_{n=1}^{N}$ be the result of filtering the
stegosignal. That is, at time $n$,

    $$\tilde{y}_n^f = \tilde{y}_n * h_n = y_n * h_n + w_n * h_n, \qquad (4.7)$$

where $\{\tilde{y}_n\}$ is assumed to be a stegosignal constructed from
any $\tilde{\mathbf{a}} \in \mathcal{E}_N$, including the best
(optimized) $\tilde{\mathbf{a}}$. An ineffective attack on the
stegosignal will result in a filtered stegosignal with a filtered
coversignal component that is perceptually dissimilar to the original.
This is because watermark robustness is generally defined as the
ability of the watermark to survive an attack to the extent that the
speech fidelity is not affected beyond an application-dependent
criterion. Also, an attack is ineffective if the filtered watermark
signal $\{w_n^f\}_{n=1}^{N}$ approximates the original watermark signal
$\{w_n\}_{n=1}^{N}$. The coversignal and the attack filter $\{h_n\}$
are predetermined quantities and hence the filtered coversignal
component in equation (4.7) cannot be controlled by the watermark
embedding algorithm. However, the second term in (4.7)
($\{w_n^f\}_{n=1}^{N}$) can be made to be robust against the filtering
attack by selecting an appropriate $\tilde{\mathbf{a}}$ from the set
$\mathcal{E}_N$. The problem of selecting the “best” set of modified LP
coefficients from $\mathcal{E}_N$ is now addressed.

Let $\Delta w_n^f$ be defined as

    $$\Delta w_n^f = w_n^f - w_n = \left[\sum_{i=1}^{M} (\tilde{a}_i - a_i)\, y_{n-i}\right] * (h_n - \delta_n),$$

where $\delta_n$ is the Kronecker delta; $\delta_0 = 1$ and
$\delta_n = 0$ for $n \neq 0$. Clearly, $\Delta w_n^f$ is a function of
$\tilde{\mathbf{a}}$ for a given attack filter. Then the mean squared
error (MSE) between the filtered and original watermark signals is
given by

    $$f(\tilde{\mathbf{a}}) = \frac{1}{N}\sum_{n=1}^{N} (w_n^f - w_n)^2. \qquad (4.8)$$

If $\mathbf{w} = \tilde{\mathbf{a}} - \mathbf{a}$ is indeed the “best”
watermark vector, then the corresponding filtered watermark signal
$w_n^f$ is associated with minimum MSE. Then, $\tilde{\mathbf{a}}$ is
obtained by solving the following constrained optimization problem:

    minimize    $f(\tilde{\mathbf{a}})$
    subject to  $\tilde{\mathbf{a}} \in \mathcal{E}_N$                (4.9)

The method of Lagrange multipliers can be used to solve this
optimization problem [28]. The domain of the constraint function is the
hyperellipsoid, which is a convex set if $\mathbf{C}_N / \kappa_N$ is a
positive definite matrix.
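
The MSE objective for a known attack filter can be evaluated as sketched
below. This is a minimal illustration assuming causal FIR filtering
truncated to the frame length; the regressor matrix, coefficient values
and function name are hypothetical.

```python
import numpy as np

def filtering_mse(a_tilde, a, Y, h):
    """f(a_tilde) = (1/N) * sum_n (w_n^f - w_n)^2 for a known attack filter h."""
    w = Y.T @ (a_tilde - a)                    # watermark signal from the coefficient change
    w_f = np.convolve(w, h)[:len(w)]           # filtered watermark, truncated to N samples
    return float(np.mean((w_f - w) ** 2))

rng = np.random.default_rng(4)
M, N = 4, 500
Y = rng.standard_normal((M, N))                # stand-in regressors as columns
a = np.array([0.5, -0.3, 0.2, -0.1])           # original LP coefficients (assumed)
delta = np.array([0.05, -0.02, 0.01, 0.03])    # a candidate watermark vector

mse_identity = filtering_mse(a + delta, a, Y, [1.0])   # filter that changes nothing
mse_halving = filtering_mse(a + delta, a, Y, [0.5])    # filter that halves the signal
```

An all-pass identity filter leaves the watermark untouched and gives
zero MSE, while a pure gain of 0.5 removes half of each watermark sample
and gives one quarter of the watermark signal power.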

4.3.2 Optimal watermarks for a quantization attack

This section deals with uniform and non-uniform scalar quantizer
attacks on watermarks. The quantizer consists of $L$ equal or unequal
intervals $[I_1, I_2, \ldots, I_L]$. Each interval $I_l$, for
$l = 1, 2, \ldots, L$, is associated with a quantization value $x_l$.
The scalar quantization operation $Q$ can be expressed as

    $$\tilde{y}_n^q = Q(\tilde{y}_n),$$

where $\tilde{y}_n^q = x_l$ whenever $|\tilde{y}_n - x_l|$ is minimum
over $l = 1, 2, \ldots, L$.

To maximize watermark robustness to a specific quantization attack, a
constrained optimization problem similar to that in equation (4.9) is
solved with the objective function
$f(\tilde{\mathbf{a}}) = \frac{1}{N}\sum_{n=1}^{N} (w_n^q - w_n)^2$,
where $w_n^q = \tilde{y}_n^q - y_n$. In a similar way, optimal
watermarks for best robustness to a combination of filtering and
quantization attacks can be determined. The latter problem can be
generalized for a combined attack involving several distinct attacks on
the stegosignal.
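
The quantization attack and its MSE objective can be sketched as
follows. This is an illustration under stated assumptions: the uniform
quantizer places its values at the midpoints of equal intervals covering
an assumed amplitude range of -1 to 1, and all names are hypothetical.

```python
import numpy as np

def quantize(x, levels):
    """Map each sample to the nearest quantization value in `levels`."""
    levels = np.asarray(levels, dtype=float)
    x = np.asarray(x, dtype=float)
    idx = np.argmin(np.abs(x[:, None] - levels[None, :]), axis=1)
    return levels[idx]

def uniform_levels(bits, lo=-1.0, hi=1.0):
    """Midpoints of 2**bits equal intervals covering [lo, hi]."""
    L = 2 ** bits
    step = (hi - lo) / L
    return lo + step * (np.arange(L) + 0.5)

def quantization_mse(stego, cover, levels):
    """f = (1/N) * sum_n (w_n^q - w_n)^2 with w_n^q = Q(stego_n) - cover_n."""
    w = stego - cover
    w_q = quantize(stego, levels) - cover
    return float(np.mean((w_q - w) ** 2))

levels = uniform_levels(3)                       # 3 bits per sample, 8 values
stego = np.array([0.1, -0.9, 0.6])
cover = np.zeros(3)
stego_q = quantize(stego, levels)
mse = quantization_mse(stego, cover, levels)
```

A non-uniform quantizer is obtained simply by passing unequally spaced
values in `levels`; the nearest-value rule is unchanged.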


4.3.3 Maximizing watermark energy
The boundary of the hyperellipsoidal set obtained by SMF considerations
is given by

    $$\partial\mathcal{E}_N = \left\{\tilde{\mathbf{a}} \;\middle|\; (\tilde{\mathbf{a}} - \mathbf{a}_c(N))^T \frac{\mathbf{C}_N}{\kappa_N} (\tilde{\mathbf{a}} - \mathbf{a}_c(N)) = 1\right\}, \qquad (4.10)$$

where $\mathbf{a}_c(N)$ is the center of the hyperellipsoid. The
boundary of the hyperellipsoid is significant for the following reason.
An important factor affecting watermark robustness is the energy of the
watermark signal ($w_n = \tilde{y}_n - y_n$). The modified LP
coefficients from the boundary of the hyperellipsoid result in the
highest-energy watermarks for the corresponding fidelity constraint.
Watermark robustness is also a function of the frequency content of the
watermark signal. However, this work is mainly concerned with the
effect of watermark signal energy on watermark robustness, and the
“best” watermark vector is selected accordingly. The “best” watermark
vector $\mathbf{w}$ is such that the corresponding vector of modified
LP coefficients $\tilde{\mathbf{a}}$ is from the hyperellipsoidal
boundary $\partial\mathcal{E}_N$. The constrained optimization problem
in (4.9) is modified as follows.

    minimize    $f(\tilde{\mathbf{a}})$
    subject to  $\tilde{\mathbf{a}} \in \partial\mathcal{E}_N$        (4.11)

 

4.4 Experiments and discussion

Although Lagrange multipliers can be used to solve the optimization
problem in (4.9), the resulting computational complexity might be too
costly for certain watermarking applications. Moreover, this research
is mainly concerned with selecting modified LP coefficients such that
the resulting watermark signal has high energy. Hence, it is beneficial
to search the hyperellipsoidal boundary at intermittent points for
improved robustness. There is a trade-off between the number of points
selected from the hyperellipsoidal set and computational complexity. As
an example, the experiments reported in this section were executed in
Matlab on a 1.4 GHz Celeron processor with 512 MB RAM, with an average
run time of 5 seconds.
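
A search over intermittent boundary points can be sketched as follows.
This is not the procedure used in the experiments, only one possible
realization: boundary points are generated by mapping random unit
vectors through the Cholesky factor of the (assumed positive definite)
matrix defining the hyperellipsoid, and the objective used in the
example is a toy function standing in for the attack-specific MSE. All
names and values are hypothetical.

```python
import numpy as np

def boundary_points(center, P, num_points, seed=0):
    """Random points a with (a - center)^T P (a - center) = 1, for positive definite P."""
    rng = np.random.default_rng(seed)
    u = rng.standard_normal((num_points, len(center)))
    u /= np.linalg.norm(u, axis=1, keepdims=True)          # unit-norm directions
    L = np.linalg.cholesky(P)                              # P = L L^T
    return center + np.linalg.solve(L.T, u.T).T            # a = center + L^-T u

def best_candidate(candidates, objective):
    """Return the candidate with the smallest objective value."""
    values = [objective(c) for c in candidates]
    return candidates[int(np.argmin(values))]

center = np.array([0.0, 0.0])          # stand-in for the hyperellipsoid center
P = np.diag([4.0, 1.0])                # stand-in for the matrix C_N / kappa_N
target = np.array([0.0, 2.0])          # hypothetical point defining a toy objective

def objective(a_tilde):
    """Toy objective: squared distance to `target` (replace with the attack MSE)."""
    return float(np.sum((a_tilde - target) ** 2))

candidates = boundary_points(center, P, num_points=200)
best = best_candidate(candidates, objective)
```

The number of boundary points controls the trade-off noted above: more
points give a finer search at a proportionally higher cost, since the
objective is evaluated once per point.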

Experiments were performed to test the robustness of the “best”
SMF solution to ﬁltering and quantization attacks. The coversignal
consisted of 500 samples of the vowel sound /A/ sampled at 10 kHz.
The correlation coefﬁcient between the original and recovered water-
marks is used as a measure of robustness. A 4thorder LP model was
used for watermarking. The value of 7,, was 0.4 for all n E [1, N], and
the watermark signal was imperceptible in the resulting stegosignal.

Figure 4.1(a) shows the effect of a low pass ﬁltering attack involving

a 4thorder lowpass Butterworth on the best SMF solution and the cen-

76

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

0.995 - , D .
9" a \ , ’ ,I-
0.99'ik \ x e \ I ’0’ l / _
E \ \ \ I I /
.2 0.985- x ‘0“ x9 ’ -
9 *'~ \ ‘ ‘0 ’ X
% 0.98 ' \ \ I I ..
s m - 4, ’
0.975 ’- ‘ x _— — “ ..
.5 +
g 0.97 - _
8 0.965 ~ -
0
0.96 - -
(1955 . - e - best solution .
- + - central estimate
0.95 ‘ ' '
0.3 0.35 0.4 0.45
Normalized cut-off frequency
(a)
0.96 . . .
- + - central estimate
0- 9 — e - best solution
‘0.
,_. 0.955 - + + * ‘04 \ \ -
.5 ~ +\ x \ ~ \
e ‘e.
8 0.95 '- \k \ \ \ \ .,
O ‘ \ x
C ‘ x
e 33‘ .
<_u 0.945 - ‘ O . -
0) ‘ \
t \ \
O \ \
O \ \x
0.94[ is:
0.935 ‘ ' ' ‘
0 0.02 0.04 0.06 0.08 0.1

Normalized cut-off frequency
(b)

Figure 4.1: Filtering attack. For Fig. 4.1(a) a 4thorder IIR Butterworth
lowpass ﬁlter was used to distort the stegosignal, and for Fig. 4.1(b) an
8‘horder FIR highpass ﬁlter was used to attack the stegosignal.

77

tral estimate (default solution) of the membership set. In Fig. 4.1(b),
a highpass filtering attack was applied on the stegosignal by using an
8th-order FIR filter. The watermarks derived from the best solution
(ellipsoid boundary) are more robust than those derived from the central
estimate of the set. It is seen from Table 4.1 that parametric water-
marking is quite robust to quantization attacks. The original coversig-
nal was quantized at 16 bits per sample. The uniform quantizer in the
attack used 3 bits per sample. A sub-optimal non-uniform quantizer
requiring 3 bits to code the quantized value was implemented by ar-
bitrarily partitioning the quantization range. Finally, Fig. 4.2 shows
the effect of both quantization and lowpass ﬁltering on recovered wa-
termarks derived from the best and default solutions of the hyperellip-
soidal set. In almost all cases the watermarks recovered from the best
SMF solution perform signiﬁcantly better than the ones recovered from
modiﬁed LP coefﬁcients at the center of the membership set.
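
As a rough illustration of the quantization experiment, the following sketch applies a 3-bit uniform quantizer to a signal and computes a sample correlation coefficient between two watermark vectors. It is written in Python rather than the Matlab used for the reported experiments. The function names, the mid-rise quantizer design, the assumed signal range of [-1, 1), and the use of the usual Pearson form of the correlation coefficient are all illustrative assumptions, not the implementation behind Table 4.1.

```python
import math


def uniform_quantize(x, bits=3, lo=-1.0, hi=1.0):
    """Mid-rise uniform quantizer with 2**bits levels on [lo, hi).

    The range [lo, hi) is an assumption; samples outside it are clipped.
    """
    levels = 2 ** bits
    step = (hi - lo) / levels
    out = []
    for v in x:
        idx = int(math.floor((v - lo) / step))
        idx = min(max(idx, 0), levels - 1)    # clip to the valid cells
        out.append(lo + (idx + 0.5) * step)   # reconstruct at cell centre
    return out


def correlation_coefficient(w, w_rec):
    """Pearson sample correlation between original and recovered watermarks."""
    n = len(w)
    mean_w = sum(w) / n
    mean_r = sum(w_rec) / n
    num = sum((a - mean_w) * (b - mean_r) for a, b in zip(w, w_rec))
    den = math.sqrt(sum((a - mean_w) ** 2 for a in w) *
                    sum((b - mean_r) ** 2 for b in w_rec))
    return num / den
```

In an experiment of the kind described above, the quantizer would be applied to the stegosignal, the watermark recovered from the quantized signal, and the recovered vector compared with the embedded one through `correlation_coefficient`.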

In applications with little prior knowledge of potential attacks, di-
versity [43] in watermark embedding is employed for improved robust-
ness. The SMF robustness optimization can be viewed in this context
for embedding multiple watermarks, each with a targeted robustness
to a specific combination of attacks.


Table 4.1: Robustness to quantization attacks

Type of quantizer    SMF solution     Central estimate
                     (corr. coef.)    (corr. coef.)
Uniform              1                0.9998
Non-uniform          0.9998           0.9990

Figure 4.2: Watermark robustness to a combination of non-uniform
quantization and IIR lowpass filtering attacks. The plot shows the
correlation coefficient against the normalized cut-off frequency for
the central estimate and the best solution.

Chapter 5

Generalizations and Extensions

5.1 Introduction

In this chapter, a generalized framework for speech watermarking
based on linear-in-parameters models of the speech production process
is presented. Watermarks are embedded in the LSP parameters, log area
ratio (LAR) parameters, inverse sine (IS) parameters, and reﬂection
coefﬁcients (parcor coefﬁcients) [6]. The watermark robustness and
stegosignal ﬁdelity aspects of these alternate parametric speech models
are discussed in this chapter and compared with watermarking in the
LP domain.

The chapter also presents an application of perturbed parameter
theory to watermarking [48,49]. The perturbed parameter theory is
used for obtaining bounds on the perturbation of the stegosignal caused
by watermarking, and hence for assessing the effects of the embedded
watermarks on fidelity.

5.2 Generalized framework for parametric watermarking

A consequence of the LP watermarking framework is that alter-
nate or related representations of LP parametric models can be used
for watermarking. These representations, including LAR, LSP, IS, and
parcor coefficients, may prove to be beneficial for watermarking in
certain applications. For example, localization of watermark content in the
frequency domain is more effectively controlled through direct manip-
ulation of LSP coefﬁcients. On the other hand, since LAR coefﬁcients
have the highest correlation with subjective quality [50], they can be
directly altered to preserve stegosignal ﬁdelity.

Table 5.1: Generalized watermark embedding algorithm

Let $\{y_n\}_{n=-\infty}^{\infty}$ denote a coversignal, and let
$\{y_n\}_{n=n_k}^{n'_k}$ be the $k$th of $K$ speech frames to be
watermarked. Then: For $k = 1, 2, \ldots, K$

1 Using the “autocorrelation method” (e.g., [6, Ch. 5]), derive
  a set of LP coefficients of order $M$, say $\{a_i\}_{i=1}^{M}$, for the
  given frame.

2 Use the LP parameters in an inverse filter configuration to obtain
  the prediction residual on the frame,
  $\{\xi_n = y_n - \sum_{i=1}^{M} a_i y_{n-i}\}_{n=n_k}^{n'_k}$.

3 Convert the LP parameters to LSP or parcor parameters and
  embed the watermark vectors. Alternately, for watermarking
  in the LAR or IS domain, convert the LP parameters to
  parcor and then convert the resulting parcor parameters into
  LAR or IS parameters before embedding the watermark vectors.
  Use the modified LSP, LAR, IS, or parcor parameters
  to produce a corresponding set of modified LP parameters,
  say $\{\tilde{a}_i\}_{i=1}^{M}$.

4 Use the modified LP parameters as a (suboptimal) predictor
  of the original sequence, adding the residual obtained in
  Step 2 above at each $n$, to resynthesize the speech over the
  frame, $\{\tilde{y}_n = \sum_{i=1}^{M} \tilde{a}_i y_{n-i} + \xi_n\}_{n=n_k}^{n'_k}$.
  (To the extent that the watermark represents only small perturbations
  to the original LP parameters, the resynthesized result is a pointwise
  approximation to the coversignal over the same time frame.)

5 The sequence $\{\tilde{y}_n\}_{n=n_k}^{n'_k}$ is the $k$th frame of the
  watermarked speech (stegosignal).

Next $k$.

In order to obtain the LSP parameters, the Z-domain representation
of the $M$th-order LP inverse filter is decomposed into the following
polynomials:

$$P(z) = A(z) + z^{-(M+1)} A(z^{-1}) \tag{5.1}$$

$$Q(z) = A(z) - z^{-(M+1)} A(z^{-1}) \tag{5.2}$$

$$A(z) = \frac{P(z) + Q(z)}{2} \tag{5.3}$$

where $A(z) = 1 - \sum_{i=1}^{M} a_i z^{-i}$ is the Z-domain representation of the
inverse filter [6]. The zeros of the polynomials $P$ and $Q$ constitute the
LSP parameters. The zeros of P and Q occur in complex conjugate
pairs and hence M unique zeros are required to specify the vocal tract
model [6]. The magnitude of the zeros is unity and only the frequency
parameter is required to be represented. The LSPs represent the fre-
quency parameters. Conversion from LP domain to LSP or LSP to
LP parameters is quite simple [equations (5.1), (5.2) and (5.3)]. The
watermark embedding and recovery algorithms presented in Tables 5.1
and 5.2 include LSP to LP and LP to LSP conversions respectively, for
watermarking in the LSP domain. Since LSPs represent frequencies of
zeros lying on the unit circle, it must be ensured that the modified
LSP parameter values lie between 0 and $\pi$. This requirement imposes
a constraint on the strength of the embedded watermark vectors and
consequently on the energy of the watermark signal.
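
The decomposition in equations (5.1) to (5.3) can be checked at the level of polynomial coefficients. The sketch below is a minimal Python illustration under the sign convention $A(z) = 1 - \sum_i a_i z^{-i}$ used in the text; the function names are hypothetical. It builds the coefficient lists of $P$ and $Q$ and recovers the LP coefficients from them. Obtaining the LSP frequencies themselves additionally requires finding the angles of the zeros of $P$ and $Q$, which is not shown here.

```python
def lsp_polynomials(a):
    """Coefficient lists of P(z) and Q(z) in powers z^0, z^-1, ..., z^-(M+1).

    `a` holds the LP coefficients a_1..a_M of A(z) = 1 - sum_i a_i z^-i.
    """
    # Coefficients of A(z), padded with a zero for the z^-(M+1) term.
    A = [1.0] + [-ai for ai in a] + [0.0]
    # z^-(M+1) * A(z^-1) has the same coefficients in reverse order.
    A_rev = A[::-1]
    P = [x + y for x, y in zip(A, A_rev)]   # equation (5.1)
    Q = [x - y for x, y in zip(A, A_rev)]   # equation (5.2)
    return P, Q


def lp_from_lsp_polynomials(P, Q):
    """Recover a_1..a_M through A(z) = (P(z) + Q(z)) / 2, equation (5.3)."""
    A = [(p + q) / 2.0 for p, q in zip(P, Q)]
    return [-c for c in A[1:-1]]
```

By construction the coefficient list of $P$ is symmetric and that of $Q$ is antisymmetric, which is the property that places their zeros on the unit circle.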

Table 5.2: Generalized watermark recovery algorithm

For $k = 1, 2, \ldots, K$

1 Subtract the residual frame $\{\xi_n\}_{n=n_k}^{n'_k}$ from the stegosignal
  frame $\{\tilde{y}_n\}_{n=n_k}^{n'_k}$. This results in an estimate of the
  modified predicted speech, $\{d_n = \tilde{y}_n - \xi_n\}_{n=n_k}^{n'_k}$.

2 Estimate the modified LP coefficients $\{\tilde{a}_i\}_{i=1}^{M}$ by computing
  the least-square-error solution, say $\{\hat{\tilde{a}}_i\}_{i=1}^{M}$, to the
  overdetermined system of equations
  $d_n \approx \sum_{i=1}^{M} \hat{\tilde{a}}_i y_{n-i}$, $n = n_k, \ldots, n'_k$.

3 Convert the modified LP coefficients from Step 2 to modified
  LSP, LAR, IS, or parcor coefficients.

4 Use the parameter estimates from Step 3 to derive the
  corresponding watermark values.

Next $k$.

The reflection coefficients ($\kappa$) constitute an alternate representation
to LP coefficients and play an important role in speech coding
and analysis applications. The parcor coefficients are obtained as a
by-product of the Levinson-Durbin (L-D) recursion, which is used to
convert the autocorrelation values of speech to LP coefficients.
Conversion from reflection coefficients to LP coefficients is accomplished
using the algorithm in Table 5.3 [51]. LP coefficients can be converted
to reflection coefficients using the algorithm in Table 5.4 [51].
Watermark information can be added to the reflection coefficients and the
resultant converted to modified LP coefficients. While embedding the
watermark, it should be ensured that $|\kappa_i| \neq 1$ for any $i$; otherwise
finding the reflection coefficients is an ill-conditioned problem.
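
The two conversions referred to as Tables 5.3 and 5.4 are the familiar step-up and step-down recursions. The following Python sketch is one way to write them; the function names are hypothetical, and the sign convention simply follows the recursion as tabulated (conventions for the LP polynomial and for $\kappa$ differ between references, so signs may need adjusting for a particular definition of $A(z)$). In the step-down direction the sketch reverses only the first $j-1$ elements at order $j$, which is what makes it the exact inverse of the step-up recursion, and it raises an error when $|\kappa_j| = 1$, the ill-conditioned case noted above.

```python
def reflection_to_lp(kappa):
    """Step-up recursion: reflection coefficients -> LP coefficients."""
    a = [kappa[0]]
    for k in kappa[1:]:
        rev = a[::-1]
        # Update the existing coefficients, then append the new one.
        a = [x + k * r for x, r in zip(a, rev)] + [k]
    return a


def lp_to_reflection(a):
    """Step-down recursion: LP coefficients -> reflection coefficients."""
    a = list(a)
    kappa = [0.0] * len(a)
    for j in range(len(a), 0, -1):
        k = a[j - 1]
        if abs(abs(k) - 1.0) < 1e-12:
            raise ValueError("|kappa| = 1: conversion is ill-conditioned")
        kappa[j - 1] = k
        head = a[:j - 1]          # coefficients a_1 .. a_{j-1}
        rev = head[::-1]          # a_{j-1} .. a_1
        a = [(x - k * r) / (1.0 - k * k) for x, r in zip(head, rev)]
    return kappa
```

A watermark embedded in the parcor domain would be added to the output of `lp_to_reflection` and the result passed back through `reflection_to_lp` to obtain the modified LP coefficients.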

Table 5.3: Conversion of reflection coefficients to LP coefficients

Let $\boldsymbol{\kappa}$ be a vector of $M$ reflection coefficients.

1 Initialize the output LP vector $\mathbf{a}$ to the first element of
  $\boldsymbol{\kappa}$, i.e., $\mathbf{a} = \kappa_1$.

2 For $i = 2, \ldots, M$,
  $\mathbf{a} = [\mathbf{a} + (a_{i-1}, \ldots, a_1)\,\kappa_i,\ \kappa_i]$. Next $i$.

3 The final set of LP coefficients is obtained as the final vector $\mathbf{a}$.

Table 5.4: Conversion of LP coefficients to reflection coefficients

For $j = M, \ldots, 1$,

1 Let $\kappa_j = a_j$.

2 Consider elements 1 through $j-1$ of $\mathbf{a}$. Let
  $\mathbf{a} = [a_1, \ldots, a_{j-1}]$ and let
  $\bar{\mathbf{a}} = [a_{j-1}, a_{j-2}, \ldots, a_1]$. Then
  $\mathbf{a} = (\mathbf{a} - \kappa_j \bar{\mathbf{a}})/(1 - \kappa_j^2)$.

Next $j$.

Other sets of speech parametric models for embedding watermark
information include the LAR and inverse sine parameters. The LAR
and inverse sine parameters are related to the reflection coefficients as
shown in equations (5.4) and (5.5), respectively:

$$v_l = \frac{1}{2}\log\frac{1+\kappa_l}{1-\kappa_l} = \tanh^{-1}\kappa_l, \quad \text{for } l = 1, 2, \ldots, M, \tag{5.4}$$

$$\tau_l = \frac{2}{\pi}\sin^{-1}\kappa_l, \quad \text{for } l = 1, 2, \ldots, M. \tag{5.5}$$

Information can be embedded by modifying these parameters. The
modified LAR or inverse sine parameters are converted to the
corresponding modified reflection coefficients, which in turn are converted
to modified LP coefficients. The stegosignal is reconstructed by
following steps 4 and 5 in Table 5.1.
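
Equations (5.4) and (5.5) and their inverses map directly onto standard library functions. The sketch below is a minimal Python illustration with hypothetical function names; `embed_in_lar` adds a watermark vector in the LAR domain and maps the result back to reflection coefficients. One property worth noting is that the hyperbolic tangent maps every real number into the open interval (-1, 1), so reflection coefficients obtained from modified LAR parameters always satisfy $|\kappa_l| < 1$. The inverse sine domain has no such guarantee: the modified parameter must stay within [-1, 1] for the mapping back to be a true inverse.

```python
import math


def parcor_to_lar(kappa):
    """Equation (5.4): log area ratios, requires |kappa_l| < 1."""
    return [math.atanh(k) for k in kappa]


def lar_to_parcor(lar):
    """Inverse of (5.4)."""
    return [math.tanh(v) for v in lar]


def parcor_to_is(kappa):
    """Equation (5.5): inverse sine parameters, requires |kappa_l| <= 1."""
    return [(2.0 / math.pi) * math.asin(k) for k in kappa]


def is_to_parcor(tau):
    """Inverse of (5.5), valid for |tau_l| <= 1."""
    return [math.sin(0.5 * math.pi * t) for t in tau]


def embed_in_lar(kappa, w):
    """Add watermark vector w to the LAR parameters and map back."""
    lar = parcor_to_lar(kappa)
    return lar_to_parcor([v + wi for v, wi in zip(lar, w)])
```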

Figure 5.1: The first 100 bits of the 1000-bit binary watermark.
(Horizontal axis: number of bits; vertical axis: binary watermark.)

5.3 Experiments and discussion

Experiments were performed to compare the robustness and ﬁ-
delity aspects of watermarking in the LSP, LAR, IS, and parcor domains
with LP watermarking. Speech was watermarked in the LP domain us-
ing the algorithm in Table 3.1. The algorithm in Table 5.1 was used
for watermarking in the LSP, LAR, IS, and parcor domains.
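
Steps 2, 4 and 5 of the embedding algorithm in Table 5.1 can be sketched in a few lines of Python. This is a minimal illustration, not the code used for the experiments: the function names and the half-open frame indexing `[start, stop)` are assumptions, and the domain conversion of Step 3 is left to the caller, who passes in the modified LP coefficients directly.

```python
def lp_residual(y, a, start, stop):
    """Step 2 of Table 5.1: xi_n = y_n - sum_i a_i * y_{n-i} on the frame."""
    M = len(a)
    if start < M:
        raise ValueError("frame must start at least M samples into y")
    return [y[n] - sum(a[i] * y[n - 1 - i] for i in range(M))
            for n in range(start, stop)]


def resynthesize(y, a_mod, residual, start, stop):
    """Steps 4 and 5: predict from the ORIGINAL samples with the modified
    coefficients and add back the residual obtained in Step 2."""
    M = len(a_mod)
    if start < M:
        raise ValueError("frame must start at least M samples into y")
    return [sum(a_mod[i] * y[n - 1 - i] for i in range(M)) + residual[n - start]
            for n in range(start, stop)]
```

With unmodified coefficients the resynthesis returns the coversignal frame exactly. With modified coefficients $\tilde{a}_i = a_i + \omega_i$, the difference between stegosignal and coversignal at each sample is $\sum_i \omega_i y_{n-i}$, which is why small parameter perturbations give a pointwise approximation to the coversignal.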

In the experiments presented below, the coversignal consists of
15.625 s of speech from the TIMIT database [41], sampled at 16 kHz.
The coversignal consisted of samples from ten different sentences of the
TIMIT database, uttered by a female talker. The first 24000 or 26000
samples of each of the ten sentences were watermarked. A 1000-bit
pseudorandom binary watermark sequence was generated [Fig. 5.1]. The
coversignal was divided into 125 frames of 2000 samples, or 0.125 seconds
duration, each. In each of the speech frames, a length-eight watermark
vector was embedded into the coefficients of an eighth-order parametric
model, resulting in a data payload of 64 bits per second. Very few
audio watermarking algorithms can satisfactorily trade off robustness
and fidelity at a payload of 43 bits per second. For the LP parametric
watermarking experiments presented in this section, the watermark
embedding and recovery do not involve selective normalization. The
sample correlation coefficient [equation (3.21)] is used as the measure of
similarity between original and recovered watermark vectors for all the
parametric watermarking techniques.
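
The framing and payload figures quoted in this paragraph follow from simple arithmetic, written out here as a check. No quantities beyond those stated in the text are assumed.

```latex
\begin{align*}
\text{total samples} &= 125 \times 2000 = 250\,000, &
\text{duration} &= \frac{250\,000}{16\,000\ \text{Hz}} = 15.625\ \text{s},\\
\text{frame duration} &= \frac{2000}{16\,000\ \text{Hz}} = 0.125\ \text{s}, &
\text{payload} &= \frac{8\ \text{bits}}{0.125\ \text{s}} = 64\ \text{bits/s},\\
\text{watermark length} &= 125 \times 8 = 1000\ \text{bits}.
\end{align*}
```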

5.3.1 Subjective perceptual tests

Parametric watermarking in LP, LAR, IS, and parcor domains was
implemented at CWRseg values of 7.715 dB, 10.68 dB, 27 dB, and 30 dB.
LSP-based parametric watermarking was implemented at CWRseg values of
27 dB and 30 dB. In LSP-based watermarking, the modified LSP parameters
must be between 0 and $\pi$, and this imposes a constraint that
precludes watermarking at the lower CWRseg values of 7.715 dB and
10.68 dB. Parameter-embedded watermarks were fairly inaudible at these
CWRseg values [30].

Although CWRseg is used as the objective measure of fidelity, listening
tests were also performed to compare the watermarked speech
fidelity. Five subjects were asked to rank the stegosignals from different
parameter-embedded watermarking schemes in terms of the perceptual
similarity to the coversignal. The sound files used in the subjective
listening tests are available at the website [30]. At a CWRseg of 7 dB,
the stegosignal from LP watermarking was found to have the highest
fidelity, followed by stegosignals from LAR and IS watermarking. At a
CWRseg of 7 dB, the stegosignal from parcor watermarking was found
to have the least fidelity. At a CWRseg of 27 dB, the fidelity of LP,
LSP, LAR, and IS stegosignals was comparable, while the stegosignal
fidelity of parcor watermarking was marginally worse.

5.3.2 Robustness experiments

Experiments were performed to study the robustness of LP, LSP,
LAR, IS, and parcor based watermarking algorithms to additive noise
and speech coding. The stegosignals were subjected to white Gaussian
noise in the time domain resulting in SNRs ranging from -30 dB to 60
dB. Figure 5.2(a) shows the correlation coefficient between the 1000-bit
original and recovered watermarks for LP, LAR, IS, and parcor
watermarking at a CWRseg of 7 dB. It can be observed from Fig. 5.2(a) that
parcor watermarking is least robust to additive noise, and that IS
watermarking results in better robustness than LP and LAR watermarking
at a CWRseg of 7 dB. The improved robustness of LP watermarking at
7 dB CWRseg in Fig. 3.5 compared to LP watermarking in Fig. 5.2(a)
is due to selective normalization. The robustness of LP, LSP, LAR,
IS, and parcor watermarking to additive white noise at a CWRseg of 27
dB is shown in Fig. 5.2(b). At 27 dB CWRseg, LSP-based watermarking
results in more robust watermarks and LP watermarking without
normalization results in the least robust watermarks.
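
An additive-noise attack at a prescribed SNR can be sketched as follows. This is a minimal Python illustration under stated assumptions: the function names are hypothetical, the noise is drawn from the standard library's Gaussian generator, and the noise is rescaled so that the realized SNR equals the target for the particular noise sequence drawn (rather than only in expectation).

```python
import math
import random


def add_white_noise(x, target_snr_db, seed=0):
    """Add white Gaussian noise to x, scaled to the requested SNR in dB."""
    rng = random.Random(seed)
    noise = [rng.gauss(0.0, 1.0) for _ in x]
    p_sig = sum(v * v for v in x) / len(x)
    p_noise = sum(v * v for v in noise) / len(noise)
    # Choose the gain so that 10*log10(p_sig / (gain^2 * p_noise)) = target.
    gain = math.sqrt(p_sig / (p_noise * 10.0 ** (target_snr_db / 10.0)))
    return [v + gain * e for v, e in zip(x, noise)]


def measured_snr_db(clean, noisy):
    """SNR in dB between a clean signal and its noisy version."""
    p_sig = sum(v * v for v in clean)
    p_err = sum((a - b) ** 2 for a, b in zip(clean, noisy))
    return 10.0 * math.log10(p_sig / p_err)
```

Sweeping `target_snr_db` over the range used in the experiments and recovering the watermark from each noisy stegosignal would give a curve of the kind described for Fig. 5.2.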

Robustness of LP, LSP, LAR, IS, and parcor watermarking to existing
speech coding schemes was tested. G.711, ADPCM, GSM, CELP
and LPC10 were the speech coders used for robustness testing [22].
The original coversignal and stegosignal had a bit rate of 256 kbits per
second. The stegosignals were coded (compressed), then decoded
(decompressed), and the watermarks were recovered from the decoded signals.
The correlation coefficients between the original and recovered
watermarks for the different speech coders are tabulated in Tables 5.5 and 5.6
for the different parametric watermarking techniques. The attacked
stegosignals are available on the website [30]. The coding bit rates of
the different speech coders are also listed in Tables 5.5 and 5.6.

It is seen from Tables 5.5 and 5.6 that all the attacked stegosignals
at CWRseg of 7 dB and 27 dB are highly robust to the G.711 μ-law
coder, except LP watermarking at 27 dB CWRseg. LSP watermarking
is highly robust to G.711 μ-law and G.726 ADPCM coders even at a
higher CWRseg of 27 dB. The robustness of all the parametric schemes
decreases for very low bit rate speech coding (13.2 to 2.3 kbits per
second). A similar trend was observed in the robustness of SS
watermarking to low bit rate speech coding in Table 3.5.

Figure 5.2: Effect of white Gaussian noise on LP, LSP, LAR, IS and
PARCOR embedded watermarks (vertical axis: correlation coefficient).
In 5.2(a) a CWRseg of 7 dB was used to obtain the stegosignals, and in
5.2(b) a CWRseg of 27 dB was used to obtain the stegosignals.

Table 5.5: Robustness to speech coding at CWRseg of 7 dB
(entries are correlation coefficients)

Speech    Bit rate     LP         LAR        IS         PARCOR
codec     (kbits/s)    7 dB       7 dB       7 dB       7 dB
G.711     64           0.9916     0.9844     0.9967     0.9962
ADPCM     32           0.6669     0.8502     0.9381     0.6746
GSM       13.2         0.0246     0.1151     0.1481     -0.0074
CELP      4.5          -0.0905    -0.0180    -0.0043    -0.0250
CELP      2.3          -0.0665    0.0043     0.0291     -0.0078
LPC10     2.4          -0.0825    -0.0266    0.0194     0.0281

Watermark robustness to very low bit rate speech coding can be
improved by trading watermark payload for greater robustness.
For example, error correcting strategies can be applied to
watermarking [52] and the watermark can be repeatedly embedded for greater
diversity [43].
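
The payload-for-robustness trade can be illustrated with the simplest possible scheme, a repetition code with majority-vote decoding. The Python sketch below is only a stand-in for the error correcting strategies of [52] and the repeated embedding of [43], whose details are not given here; the function names and the choice of an odd repetition factor `r` are assumptions. Repeating each bit `r` times divides the payload by `r` and allows up to `(r - 1) // 2` bit errors per block to be corrected.

```python
def repeat_bits(bits, r):
    """Repeat every watermark bit r times (r should be odd)."""
    return [b for b in bits for _ in range(r)]


def majority_decode(received, r):
    """Decode a repetition-coded bit sequence by majority vote per block."""
    decoded = []
    for k in range(0, len(received), r):
        block = received[k:k + r]
        decoded.append(1 if 2 * sum(block) > len(block) else 0)
    return decoded
```

For instance, with `r = 3` a 64 bits per second payload would carry roughly 21 information bits per second.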

Table 5.6: Robustness to speech coding at CWRseg of 27 dB
(entries are correlation coefficients)

Speech    Bit rate     LP         LAR        IS         PARCOR     LSP
codec     (kbits/s)    27 dB      27 dB      27 dB      27 dB      27 dB
G.711     64           0.7767     0.9868     0.9840     0.9662     0.9966
ADPCM     32           0.0578     0.7590     0.7247     0.6212     0.8663
GSM       13.2         -0.0662    0.0225     0.0146     0.0034     0.0872
CELP      4.5          -0.0898    0.0068     0.0125     -0.0310    0.0124
CELP      2.3          -0.0639    0.0065     0.0453     -0.0357    -0.0207
LPC10     2.4          -0.0927    0.0049     -0.0176    -0.0319    -0.0217

5.4 Perturbed parameter models in watermarking

Deller and Gulboy [48,49] determined conditions under which an
autoregressive (AR) model$^1$ with stochastic parameters can be
approximated by a time-invariant one wherein the stochastic coefficients are
replaced by their mean values. In this section, the application of AR
perturbed parameter theory to LP parametric speech watermarking is
explored. We consider a general Markov equation with slightly
perturbed parameters of the form

$$\tilde{\mathbf{y}}_{n+1} = \Phi(\tilde{\mathbf{a}}_n, \tilde{\mathbf{y}}_n, n). \tag{5.6}$$

In general, $\tilde{\mathbf{y}}_n$ is a $Q$-vector, $\tilde{\mathbf{a}}_n$ is a first order
stationary stochastic $R$-vector, and $\Phi$ is a general vector function of
$\tilde{\mathbf{a}}$, $\tilde{\mathbf{y}}$ and $n$. The conditions under which the model in
equation (5.6) is well approximated by the following model [equation (5.7)]
are given in [48,49]:

$$\mathbf{y}_{n+1} = \phi(\mathbf{y}_n, n), \quad \text{where } \phi(\mathbf{y}_n, n) = E[\Phi(\tilde{\mathbf{a}}_n, \mathbf{y}_n, n) \mid \mathbf{y}_n, n]. \tag{5.7}$$

It is shown that, if the following conditions are true with probability
one on $n \in [1, N]$:

$$\|\Phi(\tilde{\mathbf{a}}_n, \mathbf{y}_n, n) - \phi(\mathbf{y}_n, n)\| < \epsilon \tag{5.8}$$

$$\|\phi(\tilde{\mathbf{y}}_n, n) - \phi(\mathbf{y}_n, n)\| \le K_L \|\tilde{\mathbf{y}}_n - \mathbf{y}_n\| \tag{5.9}$$

then the models in equations (5.6) and (5.7) approximate each other
according to

$$\|\tilde{\mathbf{y}}_n - \mathbf{y}_n\| \le \epsilon \left\{ 1 + \sum_{i=1}^{N-1} K_L^{\,i} \right\}, \quad \text{w.p. 1}, \quad n \in [1, N]. \tag{5.10}$$

This perturbed parameter theory is used for obtaining bounds on the
perturbation of the coversignal caused by watermarking.

$^1$ The AR model is the statistician's name for an LP model driven by white noise.
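
The bracketed factor in the bound (5.10) is a finite geometric sum, so it has a closed form. The short derivation below is a standard identity rather than a result taken from [48,49]; it assumes only that $K_L \geq 0$ denotes the Lipschitz constant of condition (5.9).

```latex
\begin{align*}
1 + \sum_{i=1}^{N-1} K_L^{\,i} \;=\; \sum_{i=0}^{N-1} K_L^{\,i}
  \;=\; \frac{1 - K_L^{\,N}}{1 - K_L}, \qquad K_L \neq 1,
\end{align*}
so that (5.10) reads
\begin{align*}
\|\tilde{\mathbf{y}}_n - \mathbf{y}_n\| \;\le\; \epsilon\,\frac{1 - K_L^{\,N}}{1 - K_L}
  \;<\; \frac{\epsilon}{1 - K_L} \qquad \text{when } 0 \le K_L < 1 .
\end{align*}
```

When $K_L < 1$ the bound therefore stays below $\epsilon/(1 - K_L)$ for every frame length $N$, whereas for $K_L > 1$ it grows geometrically with $N$.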


5.4.1 Time-varying AR models in watermarking

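The stegosignal obtained using equation (3.3) is manipulated into
a time-varying AR model:

$$\begin{aligned}
\tilde{y}_n &= \sum_{i=1}^{M} \tilde{a}_i\, y_{n-i} + \xi_n \\
&= \sum_{i=1}^{M} \left[ a_i + \omega_i + \frac{(a_i + \omega_i)(y_{n-i} - \tilde{y}_{n-i})}{\tilde{y}_{n-i}} \right] \tilde{y}_{n-i} + \xi_n \\
&= \sum_{i=1}^{M} \tilde{a}_{n,i}\, \tilde{y}_{n-i} + \xi_n .
\end{aligned}$$

The expression for the time-varying AR coefficients $\{\tilde{a}_{n,i}\}$ can be
manipulated into the form

$$\tilde{a}_{n,i} = a_i p_{n,i} + \omega_i p_{n,i} = a_i + a_i (p_{n,i} - 1) + \omega_i p_{n,i},$$

where $p_{n,i} = (y_{n-i}/\tilde{y}_{n-i}) \approx 1$. The time-varying AR parameters are
composed of the true parameter term, $a_i$, and the perturbation term,
$a_i (p_{n,i} - 1) + \omega_i p_{n,i}$.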

5.4.2 Application of perturbed parameter Markov equations to watermarking

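Now let us suppose that the stegosignal is constructed such that,

$$\tilde{y}_n = \sum_{i=1}^{M} (a_i + \omega_{n,i})\, y_{n-i} + \xi_n = \sum_{i=1}^{M} \tilde{a}_{n,i}\, \tilde{y}_{n-i} + \xi_n, \tag{5.11}$$

where

$$\omega_{n,i} = \begin{cases} \omega_i, & \text{if } n \text{ is odd} \\ -\omega_i, & \text{if } n \text{ is even.} \end{cases}$$

Then the time-varying AR parameters are given by,

$$\tilde{a}_{n,i} = a_{n,i}\, p_{n,i} = (a_i + \omega_{n,i})\, p_{n,i}. \tag{5.12}$$

The AR($M$) system is written in state space formulation as follows.
Let,

$$\tilde{\mathbf{y}}_{n+1} = \mathbf{A}_n \tilde{\mathbf{y}}_n + \mathbf{G}\xi_n = \Phi(\tilde{\mathbf{a}}_n, \tilde{\mathbf{y}}_n, n), \qquad \tilde{y}_n = \mathbf{C}^T \tilde{\mathbf{y}}_{n+1}, \tag{5.13}$$

where

$$\mathbf{A}_n = \begin{bmatrix} \tilde{a}_{n,1} \;\; \tilde{a}_{n,2} \;\; \cdots & \tilde{a}_{n,M} \\ \mathbf{I}_{(M-1)\times(M-1)} & \mathbf{0} \end{bmatrix},$$

$$\tilde{\mathbf{y}}_n = [\tilde{y}_{n-1}, \tilde{y}_{n-2}, \ldots, \tilde{y}_{n-M}]^T,$$

$$\mathbf{G} = \mathbf{C} = [1, 0, \ldots, 0]^T, \qquad \tilde{\mathbf{a}}_n = [\tilde{a}_{n,1}, \tilde{a}_{n,2}, \ldots, \tilde{a}_{n,M}]^T .$$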
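
The state space form in equation (5.13) is the usual companion-matrix representation of an AR recursion: the first row of the state matrix applies the (possibly time-varying) coefficients, and the remaining rows shift the state down by one sample. The Python sketch below is a minimal illustration with hypothetical function names; it takes the coefficient rows as given and does not compute them from a coversignal.

```python
def companion_step(a_n, state, xi_n):
    """One step of (5.13): returns A_n * state + G * xi_n.

    `state` holds the M most recent outputs, newest first.
    """
    newest = sum(c * s for c, s in zip(a_n, state)) + xi_n
    return [newest] + state[:-1]      # shift the older samples down


def simulate(coeff_rows, state0, xi):
    """Run the time-varying AR system and return its output sequence."""
    state = list(state0)
    out = []
    for a_n, e in zip(coeff_rows, xi):
        state = companion_step(a_n, state, e)
        out.append(state[0])          # output is the first state element
    return out
```

Passing the same coefficient row at every step gives the time-invariant model against which the perturbed system is compared.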

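The watermark coefficients ($\{\omega_{n,i}\}_{i=1}^{M}$) are such that the time-varying
AR coefficients are first order stationary with $E[\tilde{a}_{n,i}] = a_i$ and
$|\tilde{a}_{n,i} - a_i| < \iota$. The perturbed AR parameter theory is used for
determining how well the AR model in equation (5.13) is approximated by
the model

$$\mathbf{y}_{n+1} = \mathbf{A}\mathbf{y}_n + \mathbf{G}\xi_n = \phi(\mathbf{y}_n, n), \qquad y_n = \mathbf{C}^T \mathbf{y}_{n+1}. \tag{5.14}$$

In the above equation, $\mathbf{A} = E[\mathbf{A}_n]$. The vector $\mathbf{y}_n$ is defined similarly to
the analogous vector in equation (5.13). As demonstrated in [48], it is
similarly determined that the small perturbation condition
($|\tilde{a}_{n,i} - a_i| < \iota$) is equivalent to the condition (5.8) of the theorem.

$$\|\Phi(\tilde{\mathbf{a}}_n, \tilde{\mathbf{y}}_n, n) - \phi(\tilde{\mathbf{y}}_n, n)\|_* = \|(\mathbf{A}_n - \mathbf{A})\tilde{\mathbf{y}}_n\|_* \le \|\mathbf{A}_n - \mathbf{A}\|_* \|\tilde{\mathbf{y}}_n\|_* \tag{5.15}$$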

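According to [48, 49], the matrix norm $\|\cdot\|_*$ is selected such that for
any square matrix $\mathbf{A}$, $\|\mathbf{A}\|_* \le r(\mathbf{A}) + \epsilon$, given any $\epsilon > 0$. Here, $r(\mathbf{A})$
represents the spectral radius of the matrix $\mathbf{A}$. In Lemma 5.6.10 of [53]
the matrix norm $\|\cdot\|_*$ is given by

$$\|\mathbf{A}\|_* = \|\mathbf{D}_t \mathbf{U}^T \mathbf{A} \mathbf{U} \mathbf{D}_t^{-1}\|_1 = \|(\mathbf{U}\mathbf{D}_t^{-1})^{-1} \mathbf{A} (\mathbf{U}\mathbf{D}_t^{-1})\|_1, \tag{5.16}$$

in which $\|\cdot\|_1$ is the maximum column sum matrix norm induced by
the $\ell_1$ vector norm, and $\mathbf{D}_t = \mathrm{diag}(t, t^2, t^3, \ldots, t^n)$ with $t > 0$ and
sufficiently large. The matrix $\mathbf{U}$ is obtained by the Schur decomposition
of $\mathbf{A}$ given by $\mathbf{A} = \mathbf{U}\boldsymbol{\Lambda}\mathbf{U}^T$, $\boldsymbol{\Lambda}$ being an upper triangular matrix
with the main diagonal components comprised of the eigenvalues of
$\mathbf{A}$. The vector norm compatible with the induced matrix norm $\|\cdot\|_*$
is the $\ell_1$ norm since $\|\mathbf{D}_t \mathbf{U}^T \mathbf{A} \mathbf{U} \mathbf{D}_t^{-1} \mathbf{x}\|_1 \le \|\mathbf{D}_t \mathbf{U}^T \mathbf{A} \mathbf{U} \mathbf{D}_t^{-1}\|_1 \|\mathbf{x}\|_1$ and
$\mathbf{D}_t \mathbf{U}^T \mathbf{I} \mathbf{U} \mathbf{D}_t^{-1} = \mathbf{I}$, the identity matrix.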

It is reasonable to assume that $\|\xi_n\|_*$ is bounded by $W$, a
non-negative finite number. The bound on $\|\tilde{\mathbf{y}}_n\|_*$ is determined in [48,49]
and is given by,

$$\|\tilde{\mathbf{y}}_n\|_* < Y_0 \prod_{k=0}^{n-1} \|\mathbf{A}_k\|_* + \sum_{k=0}^{n-2} \prod_{j=k+1}^{n-1} \|\mathbf{A}_j\|_* \|\mathbf{G}\|_* W + \|\mathbf{G}\|_* W, \tag{5.17}$$

where $Y_0 = \|\tilde{\mathbf{y}}_0\|_*$ and $\|\mathbf{G}\|_* = 1$. Also, $\mathbf{A}_n \approx \mathbf{A}$ for all $n$ and hence it
is assumed that there exists a small number $\epsilon'(\iota)$, a function of $\iota$, such
that $\|\mathbf{A}_n\|_* \le r(\mathbf{A}_n) + \epsilon'(\iota)$. Let $\rho$ be the maximum pole magnitude
of the system associated with $\mathbf{A}$ and hence $\rho = r(\mathbf{A})$. Similarly, let
$\rho(n)$ be the pointwise maximum pole magnitude associated with $\mathbf{A}_n$
and $\rho(n) = r(\mathbf{A}_n)$ [48,49]. Also, $\rho(n) < \rho + \alpha\iota$ for small $\iota$ and $\alpha$
being a constant. Then, $\|\mathbf{A}_n\|_* \le \rho + \alpha\iota + \epsilon'(\iota)$. Since $|\tilde{a}_{n,i} - a_i| < \iota$,
$\|\mathbf{A}_n - \mathbf{A}\|_* \le f(\iota)$ for some number $f(\iota)$. Hence, equation (5.17) can
be rewritten as,

$$\|\tilde{\mathbf{y}}_n\|_* < Y_0 \left(\rho + \alpha\iota + \epsilon'(\iota)\right)^n + W \left\{ 1 + \sum_{i=1}^{n-1} \left(\rho + \alpha\iota + \epsilon'(\iota)\right)^i \right\} = S(n). \tag{5.18}$$

Also, $S(n) < \infty$ and $S(n) < S(N)$ for all $n \in [1, N]$. Hence equation
(5.15) can be expressed as,

$$\|\Phi(\tilde{\mathbf{a}}_n, \tilde{\mathbf{y}}_n, n) - \phi(\tilde{\mathbf{y}}_n, n)\|_* \le f(\iota) S(N) = \epsilon(N), \tag{5.19}$$

with probability one. The above equation represents condition (5.8) of
the theorem. The existence of the Lipschitz constant $K_L$ in equation (5.9)
can be demonstrated in a similar way as in [48,49],

$$\phi(\tilde{\mathbf{y}}_n, n) - \phi(\mathbf{y}_n, n) = (\mathbf{A}\tilde{\mathbf{y}}_n + \mathbf{G}\xi_n) - (\mathbf{A}\mathbf{y}_n + \mathbf{G}\xi_n) = \mathbf{A}(\tilde{\mathbf{y}}_n - \mathbf{y}_n).$$

Hence, condition (5.9) of the theorem is given by,

$$\|\phi(\tilde{\mathbf{y}}_n, n) - \phi(\mathbf{y}_n, n)\|_* \le \|\mathbf{A}\|_* \|\tilde{\mathbf{y}}_n - \mathbf{y}_n\|_* = (\rho + \epsilon) \|\tilde{\mathbf{y}}_n - \mathbf{y}_n\|_*, \tag{5.20}$$

with probability one. Based on conditions (5.19) and (5.20), the norm
of the difference between the stegosignal and the coversignal is bounded
as follows.

$$\|\tilde{\mathbf{y}}_n - \mathbf{y}_n\|_* \le \left( f(\iota) S(N) \right) \left\{ 1 + \sum_{i=1}^{N-1} (\rho + \epsilon)^i \right\}, \tag{5.21}$$

with probability one. Experiments were performed on speech data to
determine the final bounds on the watermark signal for parameter
perturbations. The coversignal consisted of 20 ms of the vowel sound
/A/ sampled at 10000 Hz. An 8th-order LP inverse filter was used for
watermarking and the watermark vector was,

$$\boldsymbol{\omega} = 10^{-3} \times [-0.1025, -0.1234, 0.0289, -0.0429, 0.0056, -0.0368, -0.0465, 0.0371].$$

By applying the perturbed parameter theory, the right-hand side of
equation (5.21) was determined to be 11.03. It was verified experimentally
that the $\ell_1$ norm of the difference between the stegosignal and
coversignal is bounded by 11.03. A tighter upper bound would be of
greater significance to practical implementations of parametric
watermarking. A relatively high value of 11.03 is obtained in the right-hand
side of (5.21) because the parameter perturbations are of higher energy than
the underlying requirement in the formulation of the theorem
[conditions (5.8) and (5.9)].


Chapter 6

Conclusions

The dissertation presents a general approach to watermarking of
speech signals based on LP, LSP, LAR, IS, and PARCOR parametric
models. The dissertation focusses on embedding watermark informa-
tion by directly or indirectly modifying the long-term LP parameters
of speech. Parametric watermarking incorporates characteristics of SS
watermarking algorithms, as well as those of integration-by-synthesis
techniques. These aspects strongly influence the fidelity, security and
robustness characteristics of the technique.

Watermark recovery is treated as a system identiﬁcation problem
involving LSE estimation. The watermark information is concentrated
during the embedding and recovery phases, while it is temporally and
spectrally distributed otherwise. The distributed nature of the
watermark, combined with the LSE estimation during recovery, contributes to
watermark robustness.

The dissertation initially focussed on speech watermarking in the
LP domain. In experiments presented here, and in many others, LP
parametric watermarking has proven to be robust to most common
forms of attack. An example parametric watermark detector has been
presented to assess performance. The noise in the parameter domain
was found to be Gaussian distributed when white or colored noise was
added to the stegosignal in the time domain. By selectively normalizing
watermark coefficients to parameter magnitudes, $1/|a_i|$, whenever $|a_i| > 1$,
the parameter noise affecting the watermark coefficients was rendered
independent of the original predictor coefficients. Through this selective
normalization, watermark detection can be treated as a signal detection
problem in the presence of Gaussian noise. Very low false-alarm rates
are achieved.

The method of Lagrange multipliers can be used for obtaining
optimally robust watermarks from perturbed LP coefficients selected from
the membership set. SMF optimization is, however, not useful against
attacks that are independent of the stegosignal. For applications limited
by computational complexity and where the energy of the watermark
signal is considered to be of main significance to robustness, searching
the hyperellipsoidal boundary at intermittent points results in more
robust watermarks than the central estimate of the membership set. The
use of SMF in obtaining watermarks robust to filtering, quantization,
and combination attacks is demonstrated.

The fidelity and robustness aspects of LP, LSP, LAR, IS, and parcor
parametric watermarking algorithms were compared. It is determined
that stegosignals obtained by LP and LAR watermarking are
generally associated with high fidelity even at a low CWRseg of 7 dB.
Although LSPs cannot be watermarked at 7 dB CWRseg, LSP-based
watermarking is highly robust to noise and to G.711 and G.726 codecs
even at a CWRseg of 27 dB. In general, parametric watermarking is
much less robust to CELP and LPC10 codecs compared to G.711, G.726
and GSM codecs. However, the quality of speech decompressed by low
bit rate CELP and LPC10 codecs is very low.

An application of AR perturbed parameter theory to speech water-
marking is presented and bounds are obtained on the watermark signal
for small parameter perturbations.

Parametric watermarking algorithms can be used for applications
such as content management, broadcast monitoring, owner identification
and copyright protection. Parametric watermarking is highly robust
to additive noise, quantization errors, cropping, and speech codecs
such as G.711, G.726, and GSM. Based on the requirements of an
application, the fidelity and robustness of parameter-embedded
watermarks can be systematically adjusted.

Bibliography

[1] M.S. SEADLE, J.R. DELLER, JR. and A. GURIJALA, “Why watermark?
The copyright need for an engineering solution,” Proceedings
of ACM/IEEE Joint Conference on Digital Libraries (JCDL), Portland,
July 2002.

[2] I.J. COX, M.L. MILLER and J.A. BLOOM, Digital Watermarking,
Academic Press, 2002.

[3] N.F. JOHNSON, Z. DURIC and S. JAJODIA, Information Hiding:
Steganography and Watermarking - Attacks and Countermeasures,
Kluwer Academic Publishers, 2000.

[4] S. VOLOSHYNOVSKIY, S. PEREIRA, T. PUN, J.K. SU and
J.J. EGGERS, “Attacks and benchmarking,” IEEE Communications
Magazine, August 2001.

[5] A. GURIJALA and J.R. DELLER, JR., “Robust algorithm for
watermark recovery from cropped speech,” Proceedings of IEEE
International Conference on Acoustics, Speech and Signal Processing
(ICASSP), Salt Lake City, May 2001.

[6] J.R. DELLER, JR., J.H.L. HANSEN and J.G. PROAKIS,
Discrete-Time Processing of Speech Signals (2d ed.), IEEE Press, 2000.

[7] A. GURIJALA, J.R. DELLER, JR., M.S. SEADLE and
J.H.L. HANSEN, “Speech watermarking through parametric
modeling,” Proceedings of International Conference on Spoken Language
Processing (ICSLP), Denver, CO, September 2002.

[8] S. HAYKIN, Adaptive Filter Theory (3d ed.), Prentice-Hall, 1996.

[9] J.H.L. HANSEN, B. ZHOU, M. AKBACAK, R. SARIKAYA and
B.L. PELLOM, “Audio stream phrase recognition for a National
Gallery of the Spoken Word: One small step,” Proceedings of ICSLP,
Beijing, October 2000, pp. 1089-1092.

[10] I.J. COX, J. KILIAN, T. LEIGHTON and T. SHAMOON, “Secure
spread spectrum watermarking for multimedia,” IEEE Transactions
on Image Processing, vol. 6, no. 12, pp. 1673-1687, December 1997.

[11] F.J. RUIZ and J.R. DELLER, JR., “Digital watermarking of
speech signals for the national gallery of the spoken word,”
Proceedings of IEEE ICASSP, Istanbul, Turkey, May 2000, pp. 1089-1092.

[12] D. ANAND and U.C. NIRANJAN, “Watermarking medical images
with patient information,” Proceedings of IEEE/EMBS Conference,
Hong Kong, October 1998, pp. 703-706.

[13] S.C. MIAOU, C.H. HSU, Y.S. TSAI and H.M. CHAO, “A secure
data hiding technique with heterogeneous data-combining capability
for electronic patient records,” Proceedings of the World Congress
on Medical Physics and Biomedical Engineering: Electronic
Healthcare Records, Chicago, July 2000.

[14] T. KALKER, G. DEPOVERE, J. HAITSMA and M. MAES, “A
video watermarking system for broadcast monitoring,” Proceedings
of SPIE IS&T/SPIE's 11th Annual Symposium on Electronic
Imaging '99: Security and Watermarking of Multimedia Contents,
Chicago, January 1999, vol. 3657.

[15] A.S. SPANIAS, “Speech Coding: A Tutorial Review,” Proceedings
of the IEEE, vol. 82, no. 10, pp. 1541-1582, October 1994.

[16] Q. CHENG and J. SORENSEN, “Spread spectrum signalling for
speech watermarking,” Proceedings of IEEE ICASSP, Salt Lake
City, May 2001, vol. 3, pp. 1337-1340.

[17] M. HAGMULLER, H. HORST, A. KROPFL and G. KUBIN,
“Speech watermarking for air traffic control,” Proceedings
of 12th European Signal Processing Conference, Vienna, Austria,
September 2004.

[18] M. HATADA, T. SAKAI, N. KOMATSU and Y. YAMAZAKI, “Digital
watermarking based on process of speech production,” Proceedings
of SPIE: Multimedia Systems and Application, 2002, vol. 4861.

[19] M. CELIK, G. SHARMA and A.M. TEKALP, “Pitch and Duration
Modification for Speech Watermarking,” Proceedings of IEEE
ICASSP, Philadelphia, PA, March 2005, vol. 2, pp. 17-20.

[20] E. MOLINES and F. CHARPENTIER, “Pitch-synchronous waveform
processing techniques for text-to-speech synthesis using
diphones,” Speech Communication, vol. 9, no. 5-6, pp. 453-467,
December 1990.

[21] B. CHEN and G.W. WORNELL, “Quantization index modulation:
A class of provably good methods for digital watermarking and
information embedding,” IEEE Transactions on Information Theory,
vol. 47, no. 4, pp. 1423-1443, May 2001.

[22] A.M. KONDOZ, Digital Speech: Coding for Low Bit Rate
Communication Systems (2d ed.), John Wiley & Sons, 2004.

[23] S. GOLLAMUDI, S. NAGARAJ, S. KAPOOR and Y.F. HUANG,
“SMART: A toolbox for set-membership filtering,” Proceedings of
1997 European Conference on Circuit Theory and Design,
Budapest, Hungary, 1997.

[24] S. NAGARAJ, S. GOLLAMUDI, S. KAPOOR and Y.F. HUANG,
“BEACON: An adaptive set-membership filtering technique with
sparse updates,” IEEE Transactions on Signal Processing, vol. 47,
no. 11, pp. 2928-2941, November 1999.

[25] J.R. DELLER, JR. and Y.F. HUANG, “Set-membership
identification and filtering for signal processing applications,” Circuits,
Systems, and Signal Processing (Special issue on signal processing
and its applications), vol. 21, no. 1, pp. 69-82, January 2002.

[26] J.R. DELLER, JR., M. NAYERI and S.F. ODEH, “Least square
identification with error bounds for real-time signal processing and
control,” Proceedings of the IEEE, vol. 81, pp. 813-849, June 1993.

[27] J.R. DELLER, JR., “Set membership identiﬁcation in digital signal
processing,” IEEE Acoustics, Speech and Signal Processing Maga—
zine, vol. 6, no. 4, pp. 4-20, October 1989.

[28] S. BOYD and L. VANDENBERGHE, “Convex Optimization,” Cam-
bridge University Press, 2004.

[29] A. GURIJALA and J .R. DELLER, JR., “Speech Watermarking
by Parametric Embedding with an €00 Fidelity Criterion,” Pro-
ceedings of Eurospeech-2003, Geneva, Switzerland, September 2003,
pp. 2933-2936.

[30] SPEECH FILES,
http: / /www.egr.msu.edu/ ~deller/Parankg/WAVFILES.

[31] D. GRUHL, A. LU and W. BENDER, “Echo hiding,” Lecture
Notes in Computer Science; Proceedings of the First International
Workshop on Information Hiding, Cambridge, UK, 1996, vol. 1174,
pp. 293-315.

[32] S. WANG, A. SEKEY and A. GERSHO, “An objective measure
for predicting subjective quality of speech coders,” IEEE Journal
on Selected Areas in Communications, vol. 10, no. 5, pp. 819-829,
June 1992.

[33] S.A. CRAVER, N. MEMON, B-L. YEO and M. YEUNG, “Resolving
Rightful Ownerships with Invisible Watermarking Techniques:
Limitations, Attacks, and Implications,” IEEE Journal on Selected
Areas in Communications - Special issue on Copyright and Privacy
Protection, vol. 16, no. 4, pp. 573-586, May 1998.

[34] J.J. HERNANDEZ, M. AMADO and F. PEREZ-GONZALEZ,
“DCT-domain watermarking techniques for still images: Detector
performance analysis and a new structure,” IEEE Transactions on Image
Processing, vol. 9, no. 1, pp. 55-68, January 2000.

[35] M. BARNI, F. BARTOLINI, A.D. ROSA and A. PIVA, “Optimal
decoding and detection of multiplicative watermarks,” IEEE
Transactions on Signal Processing, vol. 51, no. 4, pp. 1118-1123,
April 2003.

[36] T.P. CHEN and T. CHEN, “A framework for optimal blind
watermark detection,” Proceedings of ACM Multimedia and Security
Workshop, Ottawa, Canada, October 2001.

[37] J.P.M.G. LINNARTZ, A.C.C. KALKER and G.F. DEPOVERE,
“Modeling the false-alarm and missed detection rate for electronic
watermarks,” Lecture Notes in Computer Science, vol. 1525,
pp. 329-343, Springer-Verlag, 1998.

[38] M.L. MILLER and J.A. BLOOM, “Computing the probability of
false watermark detection,” Proceedings of the Third Workshop on
Information Hiding, Dresden, Germany, 1999, pp. 146-158.

[39] A. GURIJALA and J.R. DELLER, JR., “Detector design for para-
metric speech watermarking,” IEEE International Conference on
Multimedia and Expo (ICME), Amsterdam, The Netherlands, July
2005, pp. 251-255.

[40] H.V. POOR, An Introduction to Signal Detection and Estimation
(2d ed.), Springer-Verlag, 1994.

[41] P.J. PRICE, “A database for continuous speech recognition in
a 1000-word domain,” Proceedings of IEEE ICASSP, New York,
vol. 11, pp. 651-654, 1988.

[42] G.R. COOPER and C.D. MCGILLEM, Modern Communications
and Spread Spectrum, McGraw-Hill Book Company, 1996.

[43] D. KUNDUR and D. HATZINAKOS, “Diversity and attack
characterization for improved robust watermarking,” IEEE Transactions
on Signal Processing, vol. 49, no. 10, pp. 2383-2396, October 2001.

[44] J.G. PROAKIS and D.G. MANOLAKIS, Digital Signal Processing:
Principles, Algorithms, and Applications (3rd ed.), Prentice-Hall,
1996.

[45] F.A.P. PETITCOLAS, R.J. ANDERSON and M.G. KUHN, “Attacks
on copyright marking systems,” Proceedings of Second Workshop
on Information Hiding, Portland, Oregon, April 1998,
pp. 218-238.

[46] HAWKVOICE FROM HAWK SOFTWARE,
http://www.hawksoft.com/hawkvoice.

[47] A. GURIJALA and J.R. DELLER, JR., “Speech watermarking with
objective ﬁdelity and robustness criteria,” Proceedings of Asilomar
Conference on Signals, Systems, and Computers, Paciﬁc Grove, CA,
November 2003.

[48] J.R. DELLER, JR. and Z. GULBOY, “Simpliﬁed models for
perturbed parameter Markov equations with application to ARMA
systems,” International Journal on Systems Science, vol. 14, no. 10,
pp. 1185-1190, 1983.

[49] J.R. DELLER, JR. and Z. GULBOY, “A correction to ‘Simpliﬁed
models for perturbed parameter Markov equations with application
to ARMA systems’,” International Journal on Systems Science,
vol. 15, no. 8, pp. 915-916, 1984.

[50] S.R. QUACKENBUSH, T.P. BARNWELL and M.A. CLEMENTS,
Objective Measures of Speech Quality, Prentice-Hall, NJ, 1988.

[51] S. KAY, Modern Spectral Estimation: Theory and Application,
Prentice-Hall signal processing series, NJ, 1988.

[52] S. BAUDRY, J.-F. DELAIGLE, B. SANKUR, B. MACQ and
H. MAITRE, “Analysis of error correction strategies for typical
communication channels in watermarking,” Signal Processing, vol. 81,
pp. 1239-1250, 2001.

[53] R.A. HORN and C.R. JOHNSON, Matrix Analysis, Cambridge
University Press, 1996.
