This is to certify that the dissertation entitled SPEECH WATERMARKING THROUGH PARAMETRIC MODELING presented by APARNA GURIJALA has been accepted towards fulfillment of the requirements for the Ph.D. degree in Electrical Engineering.

SPEECH WATERMARKING THROUGH PARAMETRIC MODELING

By

Aparna Gurijala

A DISSERTATION

Submitted to Michigan State University in partial fulfillment of the requirements for the degree of DOCTOR OF PHILOSOPHY

Department of Electrical Engineering

2007

ABSTRACT

SPEECH WATERMARKING THROUGH PARAMETRIC MODELING

By

Aparna Gurijala

Parameter-embedded watermarking of speech is effected through slight perturbations of parametric models of deeply-integrated dynamics of the signal. This research focuses on speech watermarking techniques based on linear-in-parameters speech models. Information is embedded by modifying the linear predictor coefficients of the original speech, subject to fidelity constraints. The modified parameters are used to reconstruct the watermarked speech. Experiments with real speech data are used to assess robustness and other performance properties. A particular example of watermark detector design is discussed and its performance is tested.
In set-membership filtering (SMF) based parametric watermarking, linear predictor (LP) coefficients of the original speech are modified subject to an objective fidelity constraint. SMF is used to obtain a hyperellipsoidal set of allowable parameter perturbations (i.e., watermarks) subject to a constraint on the error between the watermarked and original material. This research discusses the robustness of SMF-based watermarking to filtering, quantization, and combination attacks. An important consideration in watermark robustness is the energy of the watermark signal (the difference between the watermarked and original signals). Watermarks of higher energy are obtained from perturbed LP coefficients at the boundary of the hyperellipsoidal set. A constrained optimization problem is solved to obtain the best watermarks for filtering and quantization attacks.

Finally, a generalized framework for parametric speech watermarking is presented. In addition to the LP model, other parametric representations such as log area ratio, inverse sine, line spectrum pair, and reflection coefficients are used for speech watermarking. An application of perturbed parameter theory for autoregressive models is presented. The perturbed parameter theory is used to obtain bounds on the perturbation of the stegosignal caused by watermarking.

ACKNOWLEDGMENTS

I would like to thank my faculty advisor Dr. Jack Deller for his guidance, patience, understanding and support throughout my graduate studies at Michigan State University. I sincerely thank him for providing me with an opportunity to conduct research in speech watermarking and for creating a conducive learning environment. I am very grateful to my PhD committee, Drs. Aviyente, Jain, Radha, and Seadle, for their concern, patience, and encouragement. The classroom experience and research discussions with my professors have invaluably contributed to my knowledge and understanding.
I would especially like to thank my family for their love, patience and understanding. My parents always put great emphasis on education and were willing to support me in every possible way. I would also like to acknowledge my brother, Ashok, for his encouragement and his great interest in technology. I would like to thank Ali for his encouragement and confidence in me and for the numerous research discussions we had. I would like to thank Mujahid, Dale, and Margaret for their kindness, encouragement and support throughout my PhD program. Ali, Mujahid, Dale, and Margaret made my stay at MSU a very memorable one.

This work is supported by the National Science Foundation of the United States under Cooperative Agreement No. IIS-9817485. I would like to acknowledge NSF for their generous support to the National Gallery of the Spoken Word project. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author and do not necessarily reflect the views of the National Science Foundation.

Table of Contents

List of Tables
List of Figures
1 Introduction
2 Background
  2.1 Speech watermarking
    2.1.1 Spread spectrum watermarking
    2.1.2 Watermarking integrated with speech synthesis
    2.1.3 Pitch and duration modification for watermarking
  2.2 Set-membership filtering
    2.2.1 Overview of SMF
    2.2.2 Set-membership weighted recursive least squares
  2.3 Lagrange Multipliers
3 Parametric Speech Watermarking in the LP Domain
  3.1 Introduction
    3.1.1 An algorithm for LP parametric watermarking
    3.1.2 Recovering LP parameter-embedded watermarks
    3.1.3 Perceptual aspects of LP parametric watermarking
    3.1.4 Security issues
    3.1.5 A detection algorithm for LP parametric watermarking
  3.2 Experiments and discussion
    3.2.1 Introduction
    3.2.2 Subjective perceptual tests
    3.2.3 Watermark robustness
4 LP Parametric Watermarking with a Fidelity Constraint
  4.1 Introduction
  4.2 SMF parametric watermarking
  4.3 Robustness optimization
    4.3.1 Optimal watermarks for a filtering attack
    4.3.2 Optimal watermarks for a quantization attack
    4.3.3 Maximizing watermark energy
  4.4 Experiments and discussion
5 Generalizations and Extensions
  5.1 Introduction
  5.2 Generalized framework for parametric watermarking
  5.3 Experiments and discussion
    5.3.1 Subjective perceptual tests
    5.3.2 Robustness experiments
  5.4 Perturbed parameter models in watermarking
    5.4.1 Time-varying AR models in watermarking
    5.4.2 Application of perturbed parameter Markov equations to watermarking
6 Conclusions
Bibliography

List of Tables

2.1 SM-WRLS algorithm
3.1 Watermark embedding algorithm
3.2 Watermark recovery algorithm
3.3 Effect of selective normalization
3.4 Estimates of SNR, $d^2$, $P_D$ and $P_F$
3.5 Robustness to speech coding
4.1 Robustness to quantization attacks
5.1 Generalized watermark embedding algorithm
5.2 Generalized watermark recovery algorithm
5.3 Conversion of reflection coefficients to LP coefficients
5.4 Conversion of LP coefficients to reflection coefficients
5.5 Robustness to speech coding at CWRseg of 7 dB
5.6 Robustness to speech coding at CWRseg of 27 dB

List of Figures

3.1 Typical noise distribution in the LP domain for any coefficient. For Fig. 3.1(a), 15 dB white noise was added in the time domain to the stegosignal, and for Fig. 3.1(b), 15 dB colored noise was added to the stegosignal.
3.2 Effect of complete normalization, selective normalization, and no normalization of watermark coefficients on the correlation coefficient between original and recovered watermarks. In 3.2(a) the stegosignal was distorted by white noise in the time domain, and in 3.2(b) colored noise was added to the stegosignal.
3.3 Plots of (a) coversignal and (b) stegosignal at a CWRseg of 7.715 dB. The coversignal and the stegosignal are of 1 s duration and sampled at 16 kHz. The speech is divided into frames of 2000 samples and a watermark vector is embedded into each of the eight frames.
3.4 Segments of cover (dotted line) and stegosignals (continuous line) of 480 samples (0.03 s, i.e., 30 ms) at a CWRseg of 7.715 dB. The cover and stegosignals used in the robustness experiments are of 1 s duration and sampled at 16 kHz. The speech is divided into frames of 2000 samples and a watermark vector is embedded into each of the eight frames.
3.5 Watermark robustness to white noise attack. Performance of parametric watermarking at CWRseg values of 7.715 dB and 10.68 dB is compared with that of SS watermarking at 7.715 dB, 10.68 dB, 27 dB and 30 dB CWRseg.
3.6 Watermark robustness to colored noise attack. Colored noise was generated by lowpass filtering white noise.
3.7 Improvement in watermark robustness to colored noise attack due to whitening transformation.
3.8 Plots of (a) magnitude spectrum of the watermark coefficients, and (b) magnitude response of the attack filter at a normalized cut-off frequency of 0.4.
A 4th-order IIR Butterworth filter was used to test watermark robustness to lowpass filtering.
3.9 Robustness to lowpass filtering. A 4th-order IIR Butterworth filter was used to implement the lowpass filtering attack.
3.10 Plots of (a) magnitude spectrum of the original watermark coefficients $h[n]$, and (b) magnitude response of the transformed watermark coefficients $(-1)^n h[n]$.
3.11 Robustness to a 4th-order Butterworth highpass filter. In (a), the embedded watermark coefficients corresponded to the magnitude spectrum shown in Fig. 3.10(a), and in (b) the watermark coefficients were transformed using equation (3.29) and embedded.
3.12 Robustness to cropping. Samples of the stegosignal were randomly cropped. Parameter-embedded watermarking results in improved robustness to cropping.
4.1 Filtering attack. For Fig. 4.1(a) a 4th-order IIR Butterworth lowpass filter was used to distort the stegosignal, and for Fig. 4.1(b) an 8th-order FIR highpass filter was used to attack the stegosignal.
4.2 Watermark robustness to a combination of non-uniform quantization and IIR lowpass filtering attacks.
5.1 The first 100 bits of the 1000-bit binary watermark.
5.2 Effect of white Gaussian noise on LP, LSP, LAR, IS and PARCOR embedded watermarks. In 5.2(a) a CWRseg of 7 dB was used to obtain the stegosignals, and in 5.2(b) a CWRseg of 27 dB was used to obtain the stegosignals.

Chapter 1

Introduction

Digital media and global access to high-speed computer networks are creating complex copyright issues for owners of legally-protected materials [1]. A response to the unprecedented need to protect intellectual property has been the emergence of an active research effort into digital watermarking technologies. Digital watermarking is the process of embedding data (the watermark) imperceptibly into a host signal (the coversignal) to create a stegosignal.
The term "coversignal" is commonly used in the watermarking literature [2] to denote the host signal, and the term "stegosignal" is borrowed from steganography [3] to represent the watermarked signal. The watermark is typically a pseudo-noise sequence, or a sequence of symbols mapped from a message. A watermark offers copyright protection by providing identifying information which is accessible only to the owner of the material. Only a watermarked version of copyrighted material is released to the public. When copyright questions arise, the watermark is recovered from the stegosignal as evidence of title. Watermarking has been argued to be an advantageous solution to this modern copyright problem, and there is strong evidence that the practice will be accepted by the courts as proof of title [1].

The design of a watermarking strategy for speech involves the balancing of two principal criteria. First, embedded watermarks must be imperceptible to the listener. That is, the stegosignal must be of high fidelity. Second, watermarks must be robust. That is, they must be able to survive attacks [4], both those deliberately designed to destroy or remove them and distortions inadvertently imposed upon them by technical processes (e.g., compression) or by systemic processes (e.g., channel noise). These fidelity and robustness criteria are generally competing, as greater robustness requires more watermark energy and more manipulation of the coversignal, which, in turn, lead to noticeable distortion of the original content. A related measure of a watermark's efficacy is its data payload, the number of watermark bits per unit of time [2]. Another important requirement of a watermarking strategy is its security, the inherent protection against unauthorized removal, embedding, or detection. A watermarking scheme generally derives its security from secret codes or patterns (keys) that are used to embed the watermark.
Only a breach of keying strategies should compromise the security of a watermarking technique; public knowledge of the technical method should not lessen its effectiveness.

The speech watermarking methods described in this dissertation involve private decoding, meaning that the coversignal is required for watermark recovery. Private decoding techniques require additional information during watermark detection and recovery. However, among other benefits, this additional information can be used to undo certain attacks and distortion. In private decoding techniques, knowledge of the coversignal at the detector serves as a registration pattern to undo any temporal or geometric distortions of the stegosignal [2]. For example, in the case of a "cropping attack," wherein speech samples are randomly deleted, a dynamic programming algorithm can be used in conjunction with the coversignal to recover the watermark from the desynchronized stegosignal [5]. Although watermarking schemes involving public decoding (coversignal not required for watermark recovery) are applicable in a larger set of applications, techniques involving private decoding can be used for content tracking, broadcast monitoring, and owner identification, in addition to copyright protection.

Robustness requirements of watermarking algorithms are application dependent. Watermarking algorithms are broadly categorized into robust and fragile watermarking algorithms based on the robustness requirements. For a given application, robust watermarking algorithms are required to survive all intentional attacks and also distortion introduced by normal processing. Fragile watermarking algorithms are required to be selectively robust.
For example, in a speech authentication application of watermarking, the embedded fragile watermarks are required to be robust to compression, channel noise, and resampling, and fragile to content tampering due to re-embedding and changes to acoustic information. The algorithms presented in this dissertation fall under the robust watermarking category and were developed for applications such as content management, broadcast monitoring, and copyright protection.

Watermark embedding techniques vary widely in method and purpose. Watermarks may be additive, multiplicative, or quantization-based, and may be embedded in the time domain or in a transform domain. Each technical variation tends to be more robust to some forms of attack than to others, and for this and other application-specific reasons, particular strategies may be better suited to certain tasks. The methods reported in this dissertation are motivated by the particular properties of speech signals [6].

Parametric watermarking is based on manipulation of linear-in-parameters speech models. The linear prediction (LP) model is a special case of linear-in-parameters speech models that can be used for watermarking [6]. Generally speaking, the watermark information is concentrated in a few LP coefficients during the watermark embedding and recovery processes, while it is dispersed temporally and spectrally otherwise [7]. The watermark recovery process involves least-square error (LSE) estimation [8] of the modified LP coefficients, and this further contributes to watermark robustness. Parametric watermarking provides sufficient flexibility in terms of watermark selection for a wide range of data payload, robustness, and stegosignal fidelity requirements.

In set-membership filtering (SMF) based parametric watermarking, LP coefficients of the original speech are modified subject to an objective fidelity constraint.
SMF is used to obtain sets of allowable parameter perturbations (i.e., watermarks) subject to a constraint on the error between the watermarked and original material. The robustness of SMF-based watermarking to filtering, quantization, and combination attacks is studied. An important consideration in watermark robustness is the energy of the watermark signal (the difference between the watermarked and original signals). The most robust watermark is obtained from perturbed LP coefficients at the boundary of the membership set.¹ A constrained optimization problem is solved to obtain the best watermarks for filtering and quantization attacks.

¹ This phenomenon is discussed below.

The application that motivated the present work is the creation of the National Gallery of the Spoken Word (NGSW), an NSF-sponsored Digital Libraries Initiative II project. The goal of the NGSW effort is the development and management of an extensive on-line repository of spoken word collections, based largely on the renowned Vincent Voice Library. Further information is available at www.lib.msu.edu/vincent/ and in [9]. Owners of copyrighted material are often reluctant to grant permission to post such material on the internet without sufficient assurances that their rights will be protected. Accordingly, a prime interest in the development of the watermarking scheme is the need for robustness to the broadest possible array of attacks. On the other hand, preserving the audio history and authenticity of the NGSW materials requires that robustness not come at the expense of perceptible distortion.

Although the NGSW application places few constraints on computational load, parametric watermarking can be implemented in real time. Further, since the NGSW is a permanent, large-scale repository of speech data with a rich meta-data support structure, the association of relatively detailed watermarking information with records in the database is not impractical.
Chapter 2

Background

In the last decade many algorithms have been proposed for multimedia watermarking. Early work emphasized watermarking algorithms that could be universally applied to a wide spectrum of multimedia content, including images, video, and audio. This versatility was deemed conducive to the implementation of multimedia watermarking on common hardware [10]. However, many watermarking applications, including copyright protection for digital speech libraries [11], embedding patient information in medical records [12,13], or television broadcast monitoring [14], involve embedding information into a single medium. Also, the attacks and inherent processing distortions vary depending on the nature of the data. For example, an attack on watermarked images may involve rotation and translation operations to disable watermark detection. However, such an attack is not applicable to audio data. Watermarking algorithms that are specifically designed for particular multimedia content can exploit well-understood properties of that content to better satisfy the robustness, fidelity, and data-payload constraints. For example, unlike general audio, speech is characterized by intermittent periods of voiced (periodic) and unvoiced (noise-like) sounds. Speech signals are characterized by a relatively narrow bandwidth, with most information below 4 kHz. Also, well-established analytical models for speech production exist [6] which can be exploited in the watermarking process.

2.1 Speech watermarking

Most existing watermarking algorithms for speech can be categorized into either spread-spectrum (SS) or speech-synthesis-based approaches. SS watermarking [10] is one of the earliest and best-known watermarking algorithms applied to multimedia data. In SS watermarking, a narrowband watermark is embedded into a wideband "channel" that is the coversignal. In the second main approach, watermarks are integrated through speech synthesis.
An advantage of integrating watermarking with the coding process [15] is a reduction in computational complexity.

In this work, we adopt a new approach that has both spectrum-spreading and integration-by-synthesis aspects, but which is fundamentally different from the existing approaches. For speech signals, a parametric approach is naturally motivated by the extraordinary successes in applying parametric models, in particular the LP model, in several key speech technology areas. The robustness of the LP model to practical anomalies occurring in coding, recognition, and other applications suggests that some representation of these parameters might provide an effective basis for embedding durable watermarking data. Parametric watermarking provides sufficient flexibility in terms of watermark selection for a wide range of data payload, robustness, and stegosignal fidelity requirements. In the strategy described here, LP parameters of speech are directly or indirectly modified by an added watermark vector. The stegosignal is constructed by passing the original speech through the FIR predictor defined by the modified LP parameters and adding the result to the prediction residual of the unaltered LP model.

2.1.1 Spread spectrum watermarking

An important contribution of the work of Cox et al. [10] is the demonstration that a watermark must be embedded in perceptually significant components of a signal for sufficient robustness to attack. In [10], the DCT is applied to the coversignal and the watermark is embedded in the $n$ (typically 1000) highest-magnitude coefficients of the DCT, not including the zero-frequency component. Each value of the watermark is drawn independently from a unit normal distribution. Because SS watermarking is robust to a wide range of attacks, it is used in this work as a standard against which to evaluate the robustness of parametric watermarking.
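The DCT-domain embedding described above can be sketched in a few lines. This is a minimal illustration, not the implementation used in this work: it assumes the multiplicative update of [10], $\tilde{Y}_i = Y_i(1 + \lambda g_i)$, applied to the $n$ largest-magnitude non-DC coefficients, with $\lambda$ a fidelity-controlling scale factor. The helper names `dct_matrix`, `ss_embed`, and `ss_recover` are hypothetical. A dense orthonormal DCT-II matrix is used only for clarity; it suits short excerpts, but a full 1 s signal at 16 kHz would call for a fast DCT instead.

```python
import numpy as np


def dct_matrix(N):
    """Orthonormal DCT-II matrix D; its transpose D.T is the inverse DCT."""
    n = np.arange(N)
    k = n[:, None]
    D = np.sqrt(2.0 / N) * np.cos(np.pi * (n + 0.5) * k / N)
    D[0, :] /= np.sqrt(2.0)  # scale the DC row so that D is orthonormal
    return D


def ss_embed(cover, n_coeffs, lam, rng):
    """Embed an N(0, 1) watermark in the n_coeffs largest non-DC DCT coefficients."""
    D = dct_matrix(len(cover))
    Y = D @ cover
    # Indices of the largest-magnitude coefficients, skipping k = 0 (DC).
    order = np.argsort(np.abs(Y[1:]))[::-1][:n_coeffs] + 1
    g = rng.standard_normal(n_coeffs)  # watermark values drawn i.i.d. from N(0, 1)
    Y_tilde = Y.copy()
    Y_tilde[order] = Y[order] + lam * Y[order] * g  # multiplicative rule of [10]
    return D.T @ Y_tilde, order, g


def ss_recover(stego, cover, order, lam):
    """Private decoding: invert the embedding rule using the coversignal."""
    D = dct_matrix(len(cover))
    Y, Y_tilde = D @ cover, D @ stego
    return (Y_tilde[order] - Y[order]) / (lam * Y[order])
```

With no attack, `ss_recover` returns the embedded sequence up to rounding error. In an attack experiment, the recovered sequence would instead be compared with the original one by a similarity measure such as normalized correlation.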
For the SS algorithm used to compare performance in this research, the stegosignal is obtained by adding the watermark sequence $\{g_i\}_{i=1}^{1000}$ to the 1000 largest DCT coefficients of a coversignal of 1 s duration:

$$\tilde{Y}_i = Y_i + \lambda Y_i g_i, \qquad (2.1)$$

where each $g_i$ is independently drawn from $\mathcal{N}(0,1)$, and $Y_i$ and $\tilde{Y}_i$ are the $i$th largest DCT coefficients of the cover and stegosignals, respectively. The parameter $\lambda$ controls the stegosignal fidelity and is adjusted to satisfy a desired fidelity constraint.

In SS signaling [16,17], the watermark message is first modulated by a lowpass filtered pseudo-noise sequence. The resulting sequence is shaped by the LP spectrum of the coversignal before being added to the coversignal. The latter measure reduces perceptual distortion. The watermark receiver whitens the stegosignal using the inverse LP filter. The watermark receiver requires perfect synchronization between the whitened stegosignal and the pseudo-noise spreading sequence. These techniques have been tested in low-noise environments, such as in the presence of additive white Gaussian noise with a 20 dB SNR. However, it is not known how such algorithms will perform under more challenging channel conditions, or when subjected to deliberate attacks like cropping, filtering, or the addition of colored noise.

2.1.2 Watermarking integrated with speech synthesis

In the approach by Hatada et al. [18], line spectrum pairs (LSP) [6] are extracted from short-term segments of the coversignal. The LSP parameters are selected because they correlate well with the formant locations [18]. Codebook vectors are created by applying a clustering algorithm to the extracted LSPs. Watermarked codebook vectors are obtained by modifying the frequency components of the original codebook vectors. The LSPs of a particular frame are quantized by either the watermarked or the original codebooks, depending on whether or not the frame is to be watermarked.
The stegosignal is synthesized using the watermarked LSPs. Even in the absence of watermarking, the LSPs of the original speech and those of the synthesized speech are different. In the presence of watermarking, the difference between the original and extracted LSPs will be even more substantial. Thus watermark detection is affected even in the absence of an attack. Hence, to preserve the watermark information as accurately as possible, it is necessary that the speech frames used for embedding watermark data have very small LSP differences with respect to the synthesized speech.

2.1.3 Pitch and duration modification for watermarking

Celik et al. [19] propose a speech watermarking algorithm for semi-fragile authentication applications. In the case of semi-fragile watermarking, robustness to selective manipulations or attacks is desired. Celik et al. use the pitch and duration of quasi-periodic speech phonemes as the features for semi-fragile watermarking. The significance of these features makes them suitable for watermarking, and their variability facilitates imperceptible data embedding. A quantization index modulation scheme is used to embed watermark bits into these features.

The coversignal is segmented into phonemes. A phoneme is a fundamental unit of speech that conveys linguistic meaning [6]. Certain classes of phonemes, such as vowels, semivowels, diphthongs, and nasals, are quasi-periodic in nature. The periodicity is characterized by the fundamental frequency or the pitch period. The pitch synchronous overlap and add (PSOLA) algorithm is used to parse the coversignal and to modify the pitch and duration of the quasi-periodic phonemes [20]. The pitch periods $p_p$ are determined for each segment of the parsed coversignal.
The average pitch period is then computed for each segment,

$$p_{\text{avg}} = \frac{1}{P}\sum_{p=1}^{P} p_p.$$

The average pitch period is modified to embed the $m$th watermark bit $w_m$ by using dithered quantization index modulation [21],

$$p'_{\text{avg}} = Q_{w_m}\left(p_{\text{avg}} + \eta\right) - \eta,$$

where $Q_{w_m}$ is the selected quantizer and $\eta$ is the pseudo-random dither value. The pitch periods are then modified such that

$$p'_p = p_p + \left(p'_{\text{avg}} - p_{\text{avg}}\right).$$

The PSOLA algorithm is used to concatenate the segments and synthesize the stegosignal. The duration of the segments is modified for better reproduction of the stegosignal. As necessitated by authentication applications, watermark detection does not require the original speech. At the detector, the procedure is repeated and the modified average pitch values are determined for each segment. Using the modified average pitch values, the watermark bits are recovered. The algorithm is robust to distortions caused by low-bit-rate speech coding because it uses features that are preserved by low-bit-rate speech coders such as QCELP, AMR, and GSM-06.10 [22]. Robustness to coding and compression is necessary for authentication applications. At the same time, the semi-fragile watermarking algorithm is designed to detect malicious operations such as re-embedding and changes to acoustic information (e.g., phonemes).

2.2 Set-membership filtering

The set-membership filtering (SMF) concept was first published by Gollamudi et al. [23], and was more recently proposed as an innovative solution to the design of channel equalizers for digital communication by Nagaraj et al. [24]. SMF can be viewed as a reformulation of the broadly-researched class of algorithms concerned with set-membership identification (e.g., [25,26]). The application of SMF to parametric speech watermarking is demonstrated in Chapter 4.

2.2.1 Overview of SMF

The SMF problem is stated as follows:

SMF PROBLEM.
Given a sequence $\{\mathbf{x}_\tau \in \mathbb{R}^M\}_{\tau=1}^{t}$ of observations, a "desired" sequence $\{z_\tau \in \mathbb{R}\}_{\tau=1}^{t}$, and a sequence of error "tolerances" $\{\gamma_\tau\}_{\tau=1}^{t}$ (frequently constant with $\tau$), find the exact feasibility set at time $t$, $\mathcal{P}_t \subseteq \mathbb{R}^M$, which includes all vectors (filters) $\boldsymbol{\theta} \in \mathbb{R}^M$ satisfying

$$\mathcal{P}_t = \left\{\boldsymbol{\theta} \;\middle|\; \left|z_\tau - \boldsymbol{\theta}^T\mathbf{x}_\tau\right| < \gamma_\tau, \ \text{for } \tau \in [1,t]\right\}. \qquad (2.2)$$

Note that when $\gamma_\tau$ is constant with $\tau$, say $\gamma_\tau = \gamma$, then we may write

$$\mathcal{P}_t = \left\{\boldsymbol{\theta} \;\middle|\; \left\|\mathbf{z} - \hat{\mathbf{z}}\right\|_\infty < \gamma\right\}, \qquad (2.3)$$

in which $\mathbf{z}$ is the $t$-vector with $i$th element $z_i$, and $\hat{\mathbf{z}}$ is the $t$-vector with $i$th element $\mathbf{x}_i^T\boldsymbol{\theta}$.

The SMF problem is solved using a series of recursions which return at iteration $t$ a hyperellipsoidal membership set, say $\mathcal{S}_t \supseteq \mathcal{P}_t$, and the ellipsoid's center, say $\boldsymbol{\theta}_t$. The recursions execute an optimization strategy designed to tightly bound $\mathcal{P}_t$ by $\mathcal{S}_t$ in some sense. Accordingly, the broad class of algorithms employed in the SMF problem are often called optimal bounding ellipsoid (OBE) algorithms [25,26]. The OBE algorithm used in the SMF-based parametric watermarking algorithm is called the set-membership weighted recursive least squares (SM-WRLS) algorithm, but the choice of OBE method is somewhat arbitrary for the present application.

2.2.2 Set-membership weighted recursive least squares

This section presents an overview of the SM-WRLS algorithm for filtering and identification applications. The SM-WRLS algorithm is used in the SMF-based parametric watermarking algorithm. In the SMF framework it is assumed that there is an observation sequence $\{\mathbf{x}_\tau\}_{\tau=1}^{t}$, a "desired" sequence $\{z_\tau\}_{\tau=1}^{t}$, and a sequence of error tolerances $\{\gamma_\tau\}_{\tau=1}^{t}$ [25]. The feasibility set at time $t$, $\mathcal{P}_t$, includes all $\boldsymbol{\theta}$ such that

$$\hat{z}_\tau = \boldsymbol{\theta}^T\mathbf{x}_\tau \ \text{ subject to } \ \left|z_\tau - \hat{z}_\tau\right| < \gamma_\tau, \quad \text{for } \tau = 1, 2, \ldots, t. \qquad (2.4)$$

Let $\mathbf{X}_t$ be the $t \times M$ matrix with $i$th row $\sqrt{\lambda_i}\,\mathbf{x}_i^T$, and let $\mathbf{z}_t$ be the $t$-vector with $i$th element $\sqrt{\lambda_i}\,z_i$, where $\{\lambda_\tau\}_{\tau=1}^{t}$ are a set of error minimization weights. Then the covariance matrix is given by $\mathbf{C}_t = \mathbf{X}_t^T\mathbf{X}_t$, and $\boldsymbol{\psi}_t = \mathbf{X}_t^T\mathbf{z}_t$.
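The batch quantities just defined can be made concrete with a small computation. The sketch below is an illustration under stated assumptions, not the recursion of Table 2.1: the helper names are hypothetical and NumPy is assumed. It tests membership in the exact feasibility set (2.2), forms $\mathbf{C}_t$ and $\boldsymbol{\psi}_t$, and computes the weighted least-squares center $\boldsymbol{\theta}_t = \mathbf{C}_t^{-1}\boldsymbol{\psi}_t$. It also computes $\kappa_t = \boldsymbol{\theta}_t^T\mathbf{C}_t\boldsymbol{\theta}_t - \sum_\tau \lambda_\tau(z_\tau^2 - \gamma_\tau^2)$. Summing the weighted constraints $\lambda_\tau(z_\tau - \boldsymbol{\theta}^T\mathbf{x}_\tau)^2 < \lambda_\tau\gamma_\tau^2$ shows that every $\boldsymbol{\theta} \in \mathcal{P}_t$ satisfies $(\boldsymbol{\theta} - \boldsymbol{\theta}_t)^T\mathbf{C}_t(\boldsymbol{\theta} - \boldsymbol{\theta}_t) < \kappa_t$. This is the kind of bounding hyperellipsoid that OBE algorithms maintain recursively.

```python
import numpy as np


def feasible(theta, X, z, gamma):
    """True if |z_tau - theta^T x_tau| < gamma_tau for every tau, as in (2.2)."""
    return bool(np.all(np.abs(z - X @ theta) < gamma))


def weighted_ls_ellipsoid(X, z, gamma, lam):
    """Batch weighted quantities; rows of X are x_tau^T, lam holds the weights."""
    sw = np.sqrt(lam)
    Xw = sw[:, None] * X        # i-th row: sqrt(lambda_i) x_i^T
    zw = sw * z                 # i-th element: sqrt(lambda_i) z_i
    C = Xw.T @ Xw               # C_t = X_t^T X_t
    psi = Xw.T @ zw             # psi_t = X_t^T z_t
    center = np.linalg.solve(C, psi)  # weighted least-squares estimate theta_t
    kappa = center @ C @ center - np.sum(lam * (z**2 - gamma**2))
    return C, psi, center, kappa


def in_ellipsoid(theta, C, center, kappa):
    """True if (theta - center)^T C (theta - center) < kappa."""
    d = theta - center
    return bool(d @ C @ d < kappa)
```

For example, data generated as $z_\tau = \boldsymbol{\theta}^T\mathbf{x}_\tau + v_\tau$ with $|v_\tau| < \gamma$ place the true $\boldsymbol{\theta}$ in $\mathcal{P}_t$, and therefore inside the ellipsoid, whatever positive weights are chosen.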
The algorithmic steps involved in implementing SM-WRLS for either identification or filtering applications are given in Table 2.1 [27].

2.3 Lagrange Multipliers

The method of Lagrange multipliers is a common approach for solving constrained optimization problems [28]. Here it is used to obtain optimal watermarks from the membership set for a given attack on the stegosignal. In a constrained optimization problem, a function needs to be maximized or minimized subject to certain conditions or constraints. A constrained optimization problem with the variable $x \in \mathbb{R}^n$ is characterized by an objective function $f_0(x)$, inequality constraint functions $f_i(x)$, and equality constraint functions $h_j(x)$:

$$\begin{aligned} \min/\max \quad & f_0(x) \\ \text{subject to} \quad & f_i(x) \le 0, \quad i = 1, 2, \ldots, p_1, \\ & h_j(x) = 0, \quad j = 1, 2, \ldots, p_2. \end{aligned} \qquad (2.5)$$

In the method of Lagrange multipliers, the constraint functions are taken into account by augmenting the objective function with a weighted combination of the constraint functions [28]. That is, the Lagrangian $L(x, \lambda, \nu)$ is

$$L(x, \lambda, \nu) = f_0(x) + \sum_{i=1}^{p_1}\lambda_i f_i(x) + \sum_{j=1}^{p_2}\nu_j h_j(x), \qquad (2.6)$$

where $L: \mathbb{R}^n \times \mathbb{R}^{p_1} \times \mathbb{R}^{p_2} \to \mathbb{R}$, $\{\lambda_i\}_{i=1}^{p_1}$ are the Lagrange multipliers associated with the inequality constraints, and $\{\nu_j\}_{j=1}^{p_2}$ are the Lagrange multipliers associated with the equality constraints. The Lagrange multipliers are nonzero, and those associated with inequality constraints are also nonnegative. The method of Lagrange multipliers converts the constrained optimization problem into an unconstrained one with $n + p_1 + p_2$ variables. The maxima or minima of the constrained optimization problem occur when the gradient of the Lagrangian is zero, $\nabla L(x, \lambda, \nu) = 0$. That is,

$$\nabla_x L(x, \lambda, \nu) = 0 \iff \nabla f_0(x) = -\left(\sum_{i=1}^{p_1}\lambda_i\nabla f_i(x) + \sum_{j=1}^{p_2}\nu_j\nabla h_j(x)\right) \qquad (2.7)$$

and

$$\nabla_{\lambda_i} L(x, \lambda, \nu) = 0 \iff f_i(x) = 0, \quad \text{for } i = 1, 2, \ldots, p_1, \qquad (2.8)$$

$$\nabla_{\nu_j} L(x, \lambda, \nu) = 0 \iff h_j(x) = 0, \quad \text{for } j = 1, 2, \ldots, p_2. \qquad (2.9)$$

It can also be observed that

$$\frac{\partial L}{\partial f_i} = \lambda_i \quad \text{and} \quad \frac{\partial L}{\partial h_j} = \nu_j. \qquad (2.10)$$

Equations (2.7)-(2.10) are used to obtain the optimal value of $x$. In order to use the method of Lagrange multipliers, the objective and constraint functions are not required to be convex.

Table 2.1: SM-WRLS algorithm

Initialization:
  $\mathbf{C}_0^{-1} = \mathbf{P}_0 = \epsilon^{-1}\mathbf{I}$, where $\epsilon$ is small
  $\boldsymbol{\theta}_0 = \mathbf{0}$
  $\lambda_1 = 1$
  $\kappa_0 = [\ldots]$, computed after step 3 for $\tau = 1$

Recursion: For $\tau = 1, 2, \ldots, t$
  1. $G(\tau)$ and $\varepsilon_{\tau-1}(\tau)$ are updated:
     $G(\tau) = \mathbf{x}_\tau^T\mathbf{P}_{\tau-1}\mathbf{x}_\tau$, where $\mathbf{P}_\tau = \mathbf{C}_\tau^{-1}$
     $\varepsilon_{\tau-1}(\tau) = z_\tau - \boldsymbol{\theta}_{\tau-1}^T\mathbf{x}_\tau$
  2. Skip step 2 if $\tau = 1$. The optimal $\lambda_\tau$ is computed by finding a positive root of the quadratic equation
     $$F(\lambda) = (M-1)G^2(\tau)\lambda^2 + \left\{\left[2M - 1 + \gamma_\tau^{-2}\varepsilon_{\tau-1}^2(\tau)\right] - \kappa_{\tau-1}\gamma_\tau^{-2}G(\tau)\right\}G(\tau)\lambda + \left\{M\left[1 - \gamma_\tau^{-2}\varepsilon_{\tau-1}^2(\tau)\right] - \kappa_{\tau-1}\gamma_\tau^{-2}G(\tau)\right\} = 0.$$
     If there are two positive roots, the larger one is used.
  3. Skip step 3 if $\tau = 1$. If $\lambda_\tau \le 0$, set $\mathbf{P}_\tau = \mathbf{P}_{\tau-1}$, $\boldsymbol{\theta}_\tau = \boldsymbol{\theta}_{\tau-1}$, $\kappa_\tau = \kappa_{\tau-1}$, then go to step 5. Otherwise continue with step 4.
  4. Update $\mathbf{P}_\tau$, $\boldsymbol{\theta}_\tau$, and $\kappa_\tau$:
     $$\mathbf{P}_\tau = \mathbf{P}_{\tau-1} - \frac{\lambda_\tau\mathbf{P}_{\tau-1}\mathbf{x}_\tau\mathbf{x}_\tau^T\mathbf{P}_{\tau-1}}{1 + \lambda_\tau G(\tau)}$$
     $$\boldsymbol{\theta}_\tau = \boldsymbol{\theta}_{\tau-1} + \lambda_\tau\mathbf{P}_\tau\varepsilon_{\tau-1}(\tau)\mathbf{x}_\tau$$
     $$\kappa_\tau = \kappa_{\tau-1} + \lambda_\tau\gamma_\tau^2 - \frac{\lambda_\tau\varepsilon_{\tau-1}^2(\tau)}{1 + \lambda_\tau G(\tau)}$$
  5. If $\tau < t$, increment $\tau$ and return to step 1.

Chapter 3

Parametric Speech Watermarking in the LP Domain

3.1 Introduction

The general parametric watermarking algorithm is formulated in the following way. Let $\{y_n\}$ denote the coversignal, and let $\{\tilde{y}_n\}$ be the ultimate stegosignal. Each of these is assumed to be a real scalar sequence over discrete time $n$. It is assumed that the signals are generated according to operations of the form [29]

$$y_n = \phi_\pi(\varepsilon_n, x_n; n) \quad \text{and} \quad \tilde{y}_n = \Phi_{\tilde{\pi}}(\tilde{\varepsilon}_n, \tilde{x}_n; n), \qquad (3.1)$$

in which $\{\varepsilon_n\}$, $\{\tilde{\varepsilon}_n\}$, $\{x_n\}$, and $\{\tilde{x}_n\}$ are measurable vector-valued random sequences. The operator $\phi$ is parameterized by a set $\pi$, the alteration of which (to create the parameter set $\tilde{\pi}$) is responsible for changing the operator $\phi$ to $\Phi$ and the sequences $\{\varepsilon_n\}$ and $\{x_n\}$ into their "tilded" counterparts.
3.1.1 An algorithm for LP parametric watermarking

In the present study, the coversignal is assumed to be generated by an LP model,

$$y_n = \sum_{i=1}^{M} a_i y_{n-i} + \xi_n, \tag{3.2}$$

a special case of the first equation in (3.1). The "true" model is determined by standard LP analysis of a (long) frame selected for watermarking [6]. The sequence $\{\xi_n\}$ is the prediction residual associated with the estimated model. The length of the FIR linear predictor is naturally based on the assumed order $M$ of the LP model used to initially parameterize the speech. The stegosignal is constructed using the FIR filter model

$$\tilde{y}_n = \sum_{i=1}^{M} \tilde{a}_i y_{n-i} + \xi_n, \tag{3.3}$$

where $\{\tilde{a}_i\}$ represents a deliberately perturbed version of the "true" set $\{a_i\}$. The algorithmic steps of the LP parameter-embedded watermarking procedure appear in Table 3.1. Numerous ways in which parametric modification can be effected, including indirectly through changes to other speech parameters such as log area ratio (LAR) values or parcor values, are discussed further in Chapter 5.

Table 3.1: Watermark embedding algorithm

Let $\{y_n\}_{n=-\infty}^{\infty}$ denote a coversignal, and let $\{y_n\}_{n=n_k'}^{n_k}$ be the $k$th of $K$ speech frames to be watermarked. Then:

For $k = 1, 2, \dots, K$
1. Using the "autocorrelation method" (e.g., [6, Ch. 5]), derive a set of LP coefficients of order $M$, say $\{a_i\}_{i=1}^M$, for the given frame.
2. Use the LP parameters in an inverse filter configuration to obtain the prediction residual on the frame, $\{\xi_n = y_n - \sum_{i=1}^M a_i y_{n-i}\}_{n=n_k'}^{n_k}$.
3. Modify the LP parameters in some predetermined way to produce a new set, say $\{\tilde{a}_i\}_{i=1}^M$. The modifications to the LP parameters (or, equivalently, to the autocorrelation sequence, line spectrum pairs, etc.) comprise the watermark.
4. Use the modified LP parameters as a (suboptimal) predictor of the original sequence, adding the residual obtained in step 2 at each $n$, to resynthesize the speech over the frame, $\{\tilde{y}_n = \sum_{i=1}^M \tilde{a}_i y_{n-i} + \xi_n\}_{n=n_k'}^{n_k}$. (To the extent that the
watermark represents only small perturbations to the original LP parameters, the resynthesized result is a pointwise approximation to the coversignal over the same time frame.)
5. The sequence $\{\tilde{y}_n\}_{n=n_k'}^{n_k}$ is the $k$th frame of the watermarked speech (stegosignal).
Next $k$.

When watermark embedding involves direct modification of LP coefficients, the embedding process can be interpreted as a digital filter design problem. Equation (3.3) can be rewritten as

$$\tilde{y}_n = \sum_{i=1}^{M} a_i y_{n-i} + \sum_{i=1}^{M} \omega_i y_{n-i} + \xi_n, \tag{3.4}$$

wherein the watermark sequence $\{\omega_i\}_{i=1}^M$, with $\omega_i = \tilde{a}_i - a_i$, constitutes the impulse response of an $M$th-order non-recursive filter. This filtered version of the original speech incorporates the watermark information. Non-recursive filters are inherently stable and less sensitive to quantization errors. The watermark signal, $w_n = \sum_{i=1}^M \omega_i y_{n-i} = \tilde{y}_n - y_n$, has a spectrum determined by the watermark coefficients and the coversignal. For example, the watermark spectrum can be designed to have predominantly lowpass, highpass, or mid-band energy.

It is important to understand a key difference in the way LP modeling is applied in this watermarking application relative to its conventional deployment in speech coding and recognition. In these prevalent applications, the goal is to find a set of LP coefficients that optimally model quasi-stationary regions of speech. In parametric watermarking, the LP model is used as a device to parameterize long intervals of nonstationary speech, without the intention of properly parameterizing stationary dynamics in the waveform. Rather, the parameters are derived according to the usual optimization criterion, minimizing the total energy in the residual [6, Ch. 5], with the understanding that the aggregate time-varying dynamics will be distributed between the long-term parametric code and the residual sequence.

3.1.2 Recovering LP parameter-embedded watermarks

The algorithm for recovering the watermark from the stegosignal appears in Table 3.2.
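Steps 1 to 4 of Table 3.1 and the filter interpretation in equation (3.4) can be illustrated for a single frame with the following NumPy sketch. It is a minimal illustration under stated assumptions, not the author's implementation: the frame is treated in isolation (samples before the frame start are taken as zero, as in the autocorrelation method), no analysis window is applied, the watermark is added directly to the LP coefficients ($\tilde a_i = a_i + \omega_i$), and the helper names `lp_autocorrelation`, `past_samples`, and `embed_frame` are hypothetical.

```python
import numpy as np


def lp_autocorrelation(y, M):
    """Order-M LP coefficients a_1..a_M by the (unwindowed) autocorrelation method."""
    N = len(y)
    r = np.array([y[: N - k] @ y[k:] for k in range(M + 1)])   # r(0..M)
    R = np.array([[r[abs(i - j)] for j in range(M)] for i in range(M)])
    return np.linalg.solve(R, r[1 : M + 1])                     # Toeplitz normal equations


def past_samples(y, M):
    """Row n holds [y_{n-1}, ..., y_{n-M}], with zeros before the frame start."""
    N = len(y)
    Y = np.zeros((N, M))
    for i in range(1, M + 1):
        Y[i:, i - 1] = y[: N - i]
    return Y


def embed_frame(y, omega):
    """Table 3.1 for one frame, with the watermark added directly: a_tilde = a + omega."""
    M = len(omega)
    a = lp_autocorrelation(y, M)        # step 1: LP analysis
    Y = past_samples(y, M)
    xi = y - Y @ a                      # step 2: inverse filter -> prediction residual
    a_tilde = a + omega                 # step 3: perturb the LP parameters
    y_tilde = Y @ a_tilde + xi          # step 4: FIR resynthesis, equation (3.3)
    return y_tilde, a, xi
```

Because the resynthesis uses the original samples $y_{n-i}$, `y_tilde - y` equals `past_samples(y, M) @ omega` exactly, which is the watermark signal $w_n$ of equation (3.4): the coversignal passed through the FIR filter $\{\omega_i\}$.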
An important step in the recovery process is the least-square-error (LSE) estimation of the modified LP coefficients, $\{\tilde{a}_i\}_{i=1}^M$, which is executed as follows. Consider a length-$N$ frame of the coversignal and rewrite the stegosignal generation equation (3.3) as

$$d_n = \sum_{i=1}^{M} \tilde{a}_i y_{n-i} = \tilde{\mathbf{a}}^T \mathbf{y}_n, \quad \text{with } d_n = \tilde{y}_n - \xi_n, \tag{3.5}$$

where $\tilde{\mathbf{a}} = [\tilde{a}_1 \ \cdots \ \tilde{a}_M]^T$ and $\mathbf{y}_n = [y_{n-1} \ \cdots \ y_{n-M}]^T$. In principle, the system of equations (3.5) taken over $N$ samples, $n = 1, 2, \dots, N$, is noise free and can be solved for $\tilde{\mathbf{a}}$ without error using any $M$ linearly independent equations. For generality, to smooth round-off and other errors, and to support further developments, we pose the problem as an attempt to compute the LSE linear estimator of the "output" signal, $d_n$, given the observations $\mathbf{y}_n$. The following normal equations are solved:

$$\mathbf{C}_y \hat{\mathbf{a}} = \mathbf{c}_{yd}, \tag{3.6}$$

in which $\mathbf{C}_y = \sum_{n=1}^N \mathbf{y}_n \mathbf{y}_n^T = \mathbf{Y}_N \mathbf{Y}_N^T$ and $\mathbf{c}_{yd} = \sum_{n=1}^N \mathbf{y}_n d_n = \mathbf{Y}_N \mathbf{d}_N$, where

$$\mathbf{Y}_N = [\mathbf{y}_N \ \mathbf{y}_{N-1} \ \cdots \ \mathbf{y}_1] \in \mathbb{R}^{M \times N}, \tag{3.7}$$

$$\mathbf{d}_N = [d_N \ d_{N-1} \ \cdots \ d_1]^T \in \mathbb{R}^{N \times 1}. \tag{3.8}$$

The LSE method is based on time averages, and its performance depends on the frame length used in the estimation [8]. In the stegosignal, the watermark information is distributed in time and is present as the watermark signal $\{w_n\}_{n=1}^N$. During recovery, the watermark information is concentrated in a few coefficients $\{\hat{\omega}_i\}_{i=1}^M$ derived from an estimate of the modified LP coefficients.

Table 3.2: Watermark recovery algorithm

For $k = 1, 2, \dots, K$
1. Subtract the residual frame $\{\xi_n\}_{n=n_k'}^{n_k}$ from the stegosignal frame $\{\tilde{y}_n\}_{n=n_k'}^{n_k}$. This results in an estimate of the modified predicted speech, $\{d_n = \tilde{y}_n - \xi_n\}_{n=n_k'}^{n_k}$.
2. Estimate the modified LP coefficients $\{\tilde{a}_i\}_{i=1}^M$ by computing the least-square-error solution, say $\{\hat{a}_i\}_{i=1}^M$, to the overdetermined system of equations $d_n \approx \sum_{i=1}^M \tilde{a}_i y_{n-i}$, $n = n_k', \dots, n_k$.
3. Use the parameter estimates from step 2 to derive the corresponding watermark values.
Next $k$.

3.1.3 Perceptual aspects of LP parametric watermarking

The watermark embedding process can be interpreted as (i) a modification to the LP model or similarly derived models, plus (ii) FIR filtering.
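Steps 1 and 2 of Table 3.2, i.e., the normal equations (3.6)–(3.8), can be sketched as below. This is an illustration under assumptions, not the author's code: the recovering party holds the coversignal frame `y` and its residual `xi` (private recovery), samples before the frame start are taken as zero, and `past_samples` and `estimate_modified_lp` are hypothetical names. To keep the example self-contained, the stegosignal frame is synthesized directly from equation (3.3).

```python
import numpy as np


def past_samples(y, M):
    """Row n holds y_n^T = [y_{n-1}, ..., y_{n-M}], with zeros before the frame."""
    N = len(y)
    Y = np.zeros((N, M))
    for i in range(1, M + 1):
        Y[i:, i - 1] = y[: N - i]
    return Y


def estimate_modified_lp(y_tilde, y, xi, M):
    """LSE estimate a_hat of the modified LP coefficients (Table 3.2, steps 1-2)."""
    d = y_tilde - xi                    # step 1: d_n = y~_n - xi_n
    Yt = past_samples(y, M)             # N x M; row order does not affect C_y or c_yd
    C_y = Yt.T @ Yt                     # C_y = Y_N Y_N^T
    c_yd = Yt.T @ d                     # c_yd = Y_N d_N
    return np.linalg.solve(C_y, c_yd)   # normal equations (3.6)
```

In the noise-free case $d_n = \tilde{\mathbf{a}}^T \mathbf{y}_n$ exactly, so `a_hat` reproduces $\tilde{\mathbf{a}}$ to round-off; subtracting the original coefficients (and undoing any normalization) gives the watermark estimate of step 3.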
This section deals with the perceptual benefits of parametric watermarking and the constraint used in this research to objectively quantify stegosignal fidelity. Listening tests were also conducted on the watermarked speech files available at the website [30]; the results of these tests are discussed in detail in Section 3.2.2.

Echo embedding interpretation

The LP parametric watermarking algorithm can be interpreted as the addition of $M$ echoes of small amplitude, each delayed by at most $M$ samples (at 16 kHz, an eighth-order model gives delays of at most 0.5 ms). Typically, echoes with delays of 20 ms or less are imperceptible. Also, since the echoes are scaled by the much smaller-valued watermark coefficients, the louder coversignal masks some components of the echoes. It should be noted that the technique differs from the echo hiding method of Gruhl et al. [31], in which binary "one" and "zero" information is encoded in the offset and delay parameters of the echo and not in the echo amplitude.

Stegosignal fidelity

Fidelity is a measure of perceptual similarity between the coversignal and the stegosignal. The watermarking process must not affect the fidelity of the speech beyond an application-dependent standard. A simple and mathematically tractable measure of fidelity is the signal-to-noise ratio (SNR), or, in the present context, the coversignal-to-watermark ratio (CWR), defined as

$$\mathrm{CWR} = 10\log_{10}\frac{E_y}{E_w} = 10\log_{10}\frac{\sum_{n=1}^N y_n^2}{\sum_{n=1}^N w_n^2}, \tag{3.9}$$

where $w_n = \tilde{y}_n - y_n$. The CWR averages the relative distortion energy of the coversignal over time and frequency. However, CWR is a poor measure of speech fidelity for a wide range of distortions: it is not related to any subjective attribute of speech fidelity, and it weighs all time-domain errors equally [6]. A better measure of speech fidelity can be obtained if the CWR is measured and averaged over short speech frames.
The resulting fidelity measure is known as the segmental CWR [6], defined as

$$\mathrm{CWR}_{\mathrm{seg}} = \frac{1}{K}\sum_{j=1}^{K} 10\log_{10}\frac{\sum_{l=k_j-L+1}^{k_j} y_l^2}{\sum_{l=k_j-L+1}^{k_j} [\tilde{y}_l - y_l]^2}, \tag{3.10}$$

where $k_1, k_2, \dots, k_K$ are the end times of the $K$ frames, each of length $L$. The segmentation of the CWR assigns equal weight to the loud and soft portions of speech. For computing $\mathrm{CWR}_{\mathrm{seg}}$, the duration of speech frames is typically 15 to 25 ms; frames of 15 ms are used for the experimental results presented in this chapter. Other objective measures of speech quality include the Itakura distance [6], the weighted-slope spectral distance, and the cepstral distance. According to Wang et al. [32], $\mathrm{CWR}_{\mathrm{seg}}$ is a much better correlate of the auditory experience than the other objective measures discussed above. A simple way to control the fidelity of the stegosignal is to scale the watermark vector, $\boldsymbol{\omega}$, by a constant, say $A$, before adding it to the original LP parameters (step 3 of Table 3.1).

3.1.4 Security issues

A watermark's security refers to its ability to withstand attacks designed for unauthorized removal, detection, or embedding. A watermarking technique must not rely on the secrecy of the algorithm for its security. In parametric watermarking, a copy of the coversignal is required for watermark recovery. The LP parameters of the stegosignal are different from the modified LP values obtained by adding the watermark vector to the LP parameters of the coversignal. An attacker has access to the stegosignal, but not to the coversignal, prediction residual, frame length, or LP model order used for watermarking. Since parametric watermarking involves the alteration of deeply integrated characteristics of speech signals, the embedded watermark information is not easily determined from the resulting stegosignals.
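Equations (3.9) and (3.10) translate directly into code. The sketch below assumes non-overlapping frames of $L$ samples and drops any trailing partial frame (an assumption; the text does not say how a partial last frame is handled). At 16 kHz, the 15 ms frames used in this chapter correspond to $L = 240$. The function names are illustrative.

```python
import numpy as np


def cwr_db(y, y_tilde):
    """Coversignal-to-watermark ratio (3.9), in dB."""
    w = y_tilde - y                                   # watermark signal w_n
    return 10.0 * np.log10(np.sum(y**2) / np.sum(w**2))


def cwr_seg_db(y, y_tilde, L):
    """Segmental CWR (3.10): mean of per-frame CWRs over K frames of length L."""
    K = len(y) // L                                   # trailing partial frame dropped
    yf = y[: K * L].reshape(K, L)
    wf = (y_tilde - y)[: K * L].reshape(K, L)
    per_frame = 10.0 * np.log10(np.sum(yf**2, axis=1) / np.sum(wf**2, axis=1))
    return per_frame.mean()
```

For a signal with a loud half (amplitude 10) and a soft half (amplitude 1) and a constant watermark of 0.1, the loud frames score 40 dB and the soft frames 20 dB, so $\mathrm{CWR}_{\mathrm{seg}}$ is 30 dB, whereas the global CWR (about 37 dB) is dominated by the loud half. This is the equal weighting of loud and soft portions described above.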
The security of the present watermarking technique can be further enhanced by randomly selecting the speech frames to be watermarked, by using a different LP model order for each watermarked frame (the model order also depends on the fidelity constraint), and by embedding pseudorandom watermark patterns. The LP parameters of the stegosignal can be easily obtained:

$$\tilde{y}_n = \sum_{k=1}^{K} \check{a}_k \tilde{y}_{n-k} + \check{\xi}_n, \tag{3.11}$$

where $K$ is the LP model order selected by the attacker. However, the LP parameters $\{\check{a}_k\}$ of the stegosignal are different from the modified LP coefficients $\{\tilde{a}_i\}$, and $\{\check{\xi}_n\}$ is different from the prediction residual $\{\xi_n\}$ associated with the coversignal, even if $K = M$.

Impact of ambiguity attacks: Ambiguity attacks are of concern to both private and public watermarking techniques [33]. In an ambiguity attack, counterfeit watermarks are identified or created in the stegosignal by an attacker using a different watermarking scheme. The attacker recovers his or her watermark from the stegosignal, claims rightful ownership of the protected signal, and succeeds in causing ambiguity about the "true" owner of the stegosignal. According to Craver et al., two necessary conditions for robustness to ambiguity attacks are non-invertibility and non-quasi-invertibility [33]. For a watermarking technique to be non-invertible, a mapping from the watermarked signal $\{\tilde{y}_n\}$ to $\{\omega'_k\}$ and $\{y'_n\}$ must not exist, where $\{\omega'_k\}$ is the watermark carved out by the attacker and $\{y'_n\}$ is the fake original created by the attacker. Non-quasi-invertibility is a much more stringent requirement. For a watermarking technique to be non-quasi-invertible, it should be impossible for an attacker to create $\{\omega'_k\}$ and $\{y'_n\}$ from a signal $\{\tilde{y}'_n\}$ that is perceptually similar to $\{\tilde{y}_n\}$, such that $\{\omega'_k\}$ still exists in $\{y_n\}$, the true original. It is shown below that parametric speech watermarking is non-invertible.
For an algorithm to be invertible, it should be possible for an attacker to create a fake coversignal and a fake watermark from the stegosignal [cf. equation (3.3)]:

$$\tilde{y}_n = \sum_{k=1}^{K} (a'_k + \omega'_k)\, y'_{n-k} + \xi'_n, \tag{3.12}$$

where $K$ is the model order selected by the attacker, $y'_n = \sum_{k=1}^{K} a'_k y'_{n-k} + \xi'_n$ is the fake coversignal, $\{\xi'_n\}$ is the corresponding minimum-MSE prediction residual, and $\{\omega'_k\}$ is the fake watermark.

An attacker can easily compute the LP coefficients and the prediction residual associated with the stegosignal. Obviously, equation (3.11) cannot be used by the attacker as the model for the fake coversignal and fake watermark sequence, since the fake coversignal would then be the same as the stegosignal. On the other hand, an attacker can add a sequence $\{\nu_n\}$ to the stegosignal and then compute the LP coefficients and prediction residual:

$$\tilde{y}_n + \nu_n = \sum_{k=1}^{K} \bar{a}_k (\tilde{y}_{n-k} + \nu_{n-k}) + \bar{\epsilon}_n = \sum_{k=1}^{K} \bar{a}_k y'_{n-k} + \bar{\epsilon}_n, \quad y'_n = \tilde{y}_n + \nu_n. \tag{3.13}$$

The attacker can then subtract $\{\nu_n\}$ from $\{y'_n\}$ to obtain the stegosignal:

$$\tilde{y}_n = \sum_{k=1}^{K} \bar{a}_k y'_{n-k} + (\bar{\epsilon}_n - \nu_n) = \sum_{k=1}^{K} (a'_k + \omega'_k)\, y'_{n-k} + (\bar{\epsilon}_n - \nu_n). \tag{3.14}$$

Comparing equations (3.12) and (3.14), the fake coversignal is given by $y'_n = \sum_{k=1}^{K} a'_k y'_{n-k} + (\bar{\epsilon}_n - \nu_n)$, where $\{\bar{\epsilon}_n - \nu_n\}$ is the prediction residual. However, from equation (3.13), the fake coversignal is $y'_n = \tilde{y}_n + \nu_n$, and its minimum-MSE prediction residual is $\bar{\epsilon}_n$, which is different from $\bar{\epsilon}_n - \nu_n$; hence, this is a contradiction. Thus, it is impossible for an attacker to invert the embedding process starting with the stegosignal.

3.1.5 A detection algorithm for LP parametric watermarking

A common approach to watermark detection employs classic binary decision theory. The hypotheses are $H_0: I_R = I$ and $H_1: I_R = I + w$, where $I_R$ is the received signal, $I$ is the original signal, and $w$ is the watermark signal [34,35]. A Bayesian or Neyman-Pearson paradigm is followed in deriving the detection thresholds.
For image watermarking, the image DCT coefficients are modeled as generalized Gaussian in distribution [34,36]. These approaches do not consider the effect of noise when deriving the detection threshold. Several watermark detectors are based on correlation detection in the time or DCT domain [37,38]; that is, the correlation between the original and recovered watermarks, or between the original watermark and the recovered signal, is compared against a threshold. Correlation detectors are optimal when the watermark and noise are jointly Gaussian or, in the case of blind detectors, when the watermarked signal and noise are jointly Gaussian. For example, the detector presented in [2, Ch. 6] assumes that the detector output for each bit is Gaussian distributed. This holds for watermark patterns that are spectrally white, but not for the watermark signal in parametric watermarking. Hence, there is a need to design a watermark detector in the parameter domain.

This section describes a watermark detector for LP parametric watermarking [39]. The stegosignal is distorted by additive white or colored Gaussian noise in the time domain. The watermarks comprise eight non-binary orthogonal vectors of length eight. Each of these eight vectors can be mapped to a unique symbol; for example, each vector can be interpreted as a particular integer from the set $\{0, 1, 2, 3, 4, 5, 6, 7\}$. The watermark may be composed of many such integers or symbols. In the examples in this chapter, each orthogonal watermark vector (symbol) is embedded into 0.125 seconds of speech sampled at 16 kHz, resulting in a bit rate of 24 bits per second (bps): 3 bits per symbol at eight symbols per second. The watermark vector is added to the coefficients of an eighth-order LP model. The length of the watermark vector (and hence the predictor model order) and the duration of the speech frame can be selected arbitrarily, subject to constraints on stegosignal fidelity.
These constraints include an upper limit on the predictor model order, and a need to use FIR models of small order for short speech frames (≈500 samples). Extensive experimentation by the author has shown that noise in the parameter domain, caused by stegosignal exposure to additive noise, is well modeled by a Gaussian distribution. Figure 3.1(a) shows a typical noise distribution in the LP domain when white noise (SNR 15 dB) is added to the stegosignal. The noise distribution for Fig. 3.1(a) was obtained by conducting 1000 experiments involving a stegosignal of 1 s duration watermarked at a $\mathrm{CWR}_{\mathrm{seg}}$ of 7 dB, using a watermark message consisting of eight orthogonal vectors, each embedded into 0.125 seconds (2000 samples) of speech. When white Gaussian noise was added to the stegosignal, the noise effects on a particular watermark coefficient could be approximated as independent and identically distributed (i.i.d.) Gaussian noise. The LP noise associated with a particular watermark coefficient was uncorrelated with the LP noise for other watermark coefficients. The parameter noise samples were also uncorrelated with the corresponding LP coefficients. It should be noted that when the stegosignal is subjected to additive white noise, the parameter noise asymptotically tends to zero as $N \to \infty$ (discussed further in Section 3.2.3) and is of very low power. However, the noise generated using the "randn" function in MATLAB is not ideal white noise.

The parameter noise distribution of the stegosignal plus colored noise is similar to that shown in Fig. 3.1(b). Colored noise was generated by lowpass filtering white noise using an IIR Butterworth filter. The LP noise affecting any given watermark coefficient was found to be i.i.d. Gaussian. However, a realization of the noise affecting all the $L = 8M$ watermark coefficients was found to be correlated with the original LP coefficients.
Figure 3.1: Typical noise distribution in the LP domain for any coefficient (histograms of frequency versus amplitude, each panel titled "Distribution of noise affecting the 5th watermark coefficient"). For Fig. 3.1(a), 15 dB white noise was added in the time domain to the stegosignal, and for Fig. 3.1(b), 15 dB colored noise was added to the stegosignal.

A solution to this problem is to normalize the watermark coefficients before adding them to the original LP coefficients. That is, instead of directly adding the watermark vector to the original LP coefficients ($\tilde{\mathbf{a}} = \mathbf{a} + \boldsymbol{\omega}$), we obtain the modified LP coefficients as

$$\tilde{a}_i = a_i + \omega_i |a_i|. \tag{3.15}$$

From the estimate of the modified LP coefficients, the watermark vector is obtained as

$$\hat{\omega}_i = \frac{\hat{a}_i - a_i}{|a_i|}, \tag{3.16}$$

with $\hat{\mathbf{a}} = \{\hat{a}_i\}_{i=1}^M$ as defined in Table 3.2. However, when $|a_i| \ll 1$, this division magnifies the noise variance in the LP domain during recovery. To avoid this, watermark coefficients are normalized before embedding only if $|a_i| \ge 1$. For the experiments presented in the rest of the chapter, watermark embedding and recovery involve this "selective normalization." Accordingly, step 3 of Table 3.1 is carried out using the following rule in the present algorithm:

$$\tilde{a}_i = \begin{cases} a_i + \omega_i |a_i|, & \text{if } |a_i| \ge 1, \\ a_i + \omega_i, & \text{otherwise.} \end{cases}$$

The final step in the recovery algorithm (Table 3.2) involves the following equation:

$$\hat{\omega}_i = \begin{cases} (\hat{a}_i - a_i)/|a_i|, & \text{if } |a_i| \ge 1, \\ \hat{a}_i - a_i, & \text{otherwise.} \end{cases}$$

In Table 3.3, $\mu$ and $\sigma^2$ are the parameter noise mean and variance, and $c_{ra}(0)$ is the cross-correlation between the recovered vector and the original LP coefficients. The values of $\mu$, $\sigma^2$, and $c_{ra}(0)$ were determined by conducting 1000 experiments involving a stegosignal of 1 s duration. The stegosignal was watermarked at a $\mathrm{CWR}_{\mathrm{seg}}$ of 7 dB using a watermark message consisting of eight orthogonal vectors, each embedded into 0.125 seconds (2000 samples) of speech.

Table 3.3: Effect of selective normalization
Noise | SNR (dB) | Normalization | $\mu$ | $\sigma^2$ | $c_{ra}(0)$
White | 10 | none | 2.849 × 10^−4 | 0.0517 | −0.0059
White | 10 | complete | −0.0152 | 4.6477 | −0.0051
White | 10 | selective | −6.2 × 10^−5 | 0.1099 | 7.645 × 10^−4
Colored | 15 | none | 2.3438 × 10^−4 | 0.0049 | 0.0328
Colored | 15 | complete | 0.0139 | 0.7518 | −0.0094
Colored | 15 | selective | −1.162 × 10^−4 | 0.0071 | 0.0023

Selective normalization of watermark coefficients significantly reduces the correlation between the noise in the LP domain and the LP coefficients, especially when the stegosignal is subjected to colored noise in the time domain (see Table 3.3). Moreover, because the noise variance in the LP domain is reduced, selective normalization improves the cross-correlation between the original and recovered watermarks compared to the complete normalization case. Figures 3.2(a) and (b) also show an improvement in the correlation coefficient values when selective normalization is used.

Figure 3.2: Effect of complete normalization, selective normalization, and no normalization of watermark coefficients on the correlation coefficient between original and recovered watermarks (correlation coefficient versus SNR). In 3.2(a) the stegosignal was distorted by white noise in the time domain, and in 3.2(b) colored noise was added to the stegosignal.

The watermark detection process is treated as a binary decision problem in the presence of additive noise. Preliminary watermark detection experiments are used to set the hypotheses

$$H_0: r_i = v_i, \qquad H_1: r_i = \omega_i + v_i, \qquad i = 1, 2, \dots, L,$$

where $\{r_i\}_{i=1}^L$ is the set of elements in the observation vector. The null hypothesis is that no watermark is present and only noise $\{v_i\}_{i=1}^L$ is transmitted, while under $H_1$ both the watermark $\{\omega_i\}_{i=1}^L$ and the noise samples $\{v_i\}_{i=1}^L$ are present in additive combination. Due to selective normalization of watermark coefficients, the noise in the LP domain, $v_i$, is distributed as $\mathcal{N}(0, \sigma^2)$ when noise $\{\eta_n\}_{n=1}^N$ is added to the stegosignal in the time domain such that the SNR is $S_1 = 10\log_{10}[(\sum_{n=1}^N \tilde{y}_n^2)/(\sum_{n=1}^N \eta_n^2)]$. For this watermark detection problem, the expressions for the false-alarm, detection, and missed-detection rates are well known and are given by (e.g., [40])

$$P_F = 0.5\,\mathrm{erfc}\!\left(\frac{\ln\tau + \delta}{2\mu_1}\right), \tag{3.17}$$

$$P_D = 0.5\,\mathrm{erfc}\!\left(\frac{\ln\tau - \delta}{2\mu_1}\right), \tag{3.18}$$

$$P_M = 1 - P_D. \tag{3.19}$$

Here, $\mu_1 = (2\sigma^2)^{-1/2}\big(\sum_{i=1}^L \omega_i^2\big)^{1/2}$, $\delta = (2\sigma^2)^{-1}\sum_{i=1}^L \omega_i^2$, and $\tau$ is the detection threshold. Letting $\tau'' = \sigma^2\ln\tau + 0.5\sum_{i=1}^L \omega_i^2$, the decision rule is

$$\sum_{i=1}^L r_i \omega_i \ \underset{H_0}{\overset{H_1}{\gtrless}}\ \tau''. \tag{3.20}$$

In a practical implementation, the threshold $\tau''$, corresponding to an SNR of $S_1$, can be adjusted further if the actual SNR in the time domain is determined. As an example, if the SNR were found to be $S_2 = 10\log_{10}[(\sum_{n=1}^N \tilde{y}_n^2)/(\sum_{n=1}^N \eta_n^2)]$ (assuming zero-mean noise), the threshold $\tau''$ is altered by multiplying $\sigma^2$ by the adjustment factor $1/\beta$, where $\beta = 10^{(S_2 - S_1)/10}$, so that $\sigma^2$ grows as the measured SNR falls. The SNR in the parameter domain is the deflection $d^2$ [40], which in the present case equals $\sigma^{-2}\sum_{i=1}^L \omega_i^2 = 2\delta$. Hence, embedded marks of greater energy will result in improved robustness, while noise of higher variance in the parameter domain will hinder watermark detection. The stegosignal was subjected to additive white and colored noise, resulting in different SNRs in the time and parameter domains.
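The selective-normalization rule and the detector of equations (3.17)–(3.20) can be sketched together as below. This is an illustrative sketch, not the author's implementation: the function names are hypothetical, the parameter-domain noise is assumed i.i.d. $\mathcal{N}(0, \sigma^2)$ as stated above, and $\sigma^2$ and $\tau$ are supplied by the caller (in the text, $\sigma^2$ comes from repeated noise experiments).

```python
import math

import numpy as np


def embed_selective(a, omega):
    """Step 3 of Table 3.1 with selective normalization (scale only if |a_i| >= 1)."""
    scale = np.where(np.abs(a) >= 1.0, np.abs(a), 1.0)
    return a + omega * scale


def recover_selective(a_hat, a):
    """Final recovery step: undo the selective normalization."""
    scale = np.where(np.abs(a) >= 1.0, np.abs(a), 1.0)
    return (a_hat - a) / scale


def detection_rates(omega, sigma2, tau):
    """P_F, P_D, and P_M of (3.17)-(3.19) for a known watermark vector omega."""
    energy = float(np.sum(np.asarray(omega) ** 2))
    delta = energy / (2.0 * sigma2)              # delta = (2 sigma^2)^-1 sum omega_i^2
    mu1 = math.sqrt(delta)                       # mu_1 = sqrt(delta)
    p_f = 0.5 * math.erfc((math.log(tau) + delta) / (2.0 * mu1))
    p_d = 0.5 * math.erfc((math.log(tau) - delta) / (2.0 * mu1))
    return p_f, p_d, 1.0 - p_d


def detect(r, omega, sigma2, tau):
    """Decision rule (3.20): True means H1 (watermark present)."""
    omega = np.asarray(omega, dtype=float)
    tau_pp = sigma2 * math.log(tau) + 0.5 * float(omega @ omega)   # tau''
    return float(np.asarray(r, dtype=float) @ omega) > tau_pp
```

Sweeping `tau` in `detection_rates` traces a receiver operating characteristic from (3.17) and (3.18). With `tau = 1` the threshold sits midway between the two hypothesis means, so $P_F + P_D = 1$.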
In each case, experiments were repeated 1000 times to estimate the mean and variance of the Gaussian noise affecting each watermark coefficient. Receiver operating characteristics (ROC) were determined using equations (3.17) and (3.18). It is observed in Table 3.4 that very low false-alarm rates can be achieved using parametric watermarking with selective normalization.

Table 3.4: Estimates of SNR, $d^2$, $P_D$, and $P_F$
Noise | SNR (dB) | $d^2$ | $P_D$ | $P_F$ | $\tau''$
White | 15 | 696.95 | 0.99999 | 4.37 × 10^−… | 6.8699
White | 10 | 72.79 | 0.99994 | 1.37 × 10^−6 | 4.3960
Colored | 7 | 167.29 | 0.99999 | 1.20 × 10^−18 | 5.4038
White | 3 | 14.45 | 0.9987 | 0.215 | 1.6610
White | 1 | 9.54 | 0.99715 | 0.37304 | 0.8388

For example, when 10 dB white noise is added to the stegosignal, a threshold $\tau'' = 4.3960$ yields $P_D = 0.99994$ and a false-alarm rate $P_F = 1.37 \times 10^{-6}$. Experiments were also performed for time-domain SNRs of 1 dB and 3 dB, and $P_F$ was found to be 0.14 and 0.0033, respectively, an improvement over the results in Table 3.4. It should be noted that for time-domain SNRs below 10 dB, the resulting stegosignals are degraded to the point of being unusable as surrogates for the coversignal. Comparing the SNRs in the time and parameter domains, it can be observed from Table 3.4 that parametric watermarking significantly boosts the SNR. The resulting suppression of parameter noise contributes to improved watermark detection.

3.2 Experiments and discussion

3.2.1 Introduction

Robustness refers to the ability of the watermark to tolerate distortion from any source to the extent that the quality of the coversignal is not affected beyond a set fidelity standard, or that the watermark detection and recovery processes are not hindered. Experiments performed to investigate the perceptual and robustness aspects of LP parametric watermarking are presented in this section.
Some of the factors affecting the robustness of the present technique include the length of the speech frame to be watermarked, the choice of watermark sequence, the relative energy of the watermark, and the temporal locations and durations of the watermarks in the stegosignal. In broader terms, watermark robustness also depends on the watermark embedding, recovery, and detection algorithms.

For the experiments below, speech was watermarked using both the LP-based parametric and the SS watermarking algorithms. Both algorithms involve private decoding. In the experiments presented below, the coversignal [shown in Fig. 3.3(a)] consists of 1 s of speech from the TIMIT database [41], sampled at 16 kHz. The sentence "She had your dark suit in greasy wash water all year." is uttered by a female talker.

Figure 3.3: Plots of (a) coversignal and (b) stegosignal at a $\mathrm{CWR}_{\mathrm{seg}}$ of 7.715 dB (amplitude versus time in seconds). The coversignal and the stegosignal are of 1 s duration and sampled at 16 kHz. The speech is divided into frames of 2000 samples, and a watermark vector is embedded into each of the eight frames.

For the robustness experiments, parametric watermarking is implemented at $\mathrm{CWR}_{\mathrm{seg}}$ values of 7.715 dB and 10.68 dB. SS watermarking was implemented at $\mathrm{CWR}_{\mathrm{seg}}$ values of 7.715 dB, 10.68 dB, 27 dB, and 30 dB; this is explained further in Section 3.2.2. The sample correlation coefficient is used as the measure of similarity between original and recovered watermark vectors for both the parametric and SS watermarking techniques. The correlation coefficient between two random variables $w$ and $r$ is given by

$$\rho_{wr} = \frac{c_{wr}(0) - E(w)E(r)}{\sigma_w \sigma_r}, \tag{3.21}$$

where $c_{wr}(0)$ is the cross-correlation between $w$ and $r$ at lag 0, $E(w)$ and $E(r)$ are the expected values of $w$ and $r$, and $\sigma_w^2$ and $\sigma_r^2$ are the variances of $w$ and $r$, respectively.
For the sample correlation coefficient, the expected values of $w$ and $r$ are replaced by the sample means $m_w = \frac{1}{L}\sum_{i=1}^L w_i$ and $m_r = \frac{1}{L}\sum_{i=1}^L r_i$, and the variances $\sigma_w^2$ and $\sigma_r^2$ are replaced by the sample variances $\mathrm{var}_w = \frac{1}{L}\sum_{i=1}^L (w_i - m_w)^2$ and $\mathrm{var}_r = \frac{1}{L}\sum_{i=1}^L (r_i - m_r)^2$, respectively. The sample cross-correlation between $w$ and $r$ at lag 0 is $\frac{1}{L}\sum_{i=1}^L w_i r_i$. Since the watermark vectors are mutually orthogonal, the sample cross-correlation at lag 0 between distinct watermark vectors is 0; their correlation coefficient is also 0 whenever at least one of the two vectors has zero sample mean.

Bit error rate is another commonly used measure of similarity between the original and recovered watermarks. The bit error rate (BER) is defined as the ratio of the number of bit errors to the total number of bits transmitted. In this work, it is more relevant to use the correlation coefficient than BER, because it is more important to characterize performance based on the detection and recovery of the entire watermark vector than on individual bits or watermark vector elements. The probability of signal (watermark vector) error is also a useful performance measure. The relation between the correlation coefficient, the probability of signal error, and BER can be found in [42].

Figure 3.4: Segments of cover (dotted line) and stegosignals (continuous line) of 480 samples (0.03 s duration) at a $\mathrm{CWR}_{\mathrm{seg}}$ of 7.715 dB (amplitude versus time, 0.26 to 0.285 s). The cover and stegosignals used in the robustness experiments are of 1 s duration and sampled at 16 kHz. The speech is divided into frames of 2000 samples, and a watermark vector is embedded into each of the eight frames.

In a practical implementation, a recovered vector, possibly containing the watermark, is first sent to the detector of Section 3.1.5, which is governed by the decision rule in equation (3.20).

For LP parametric watermarking, the speech was divided into eight frames of 2000 samples each (0.125 seconds). The watermarks comprised eight non-binary orthogonal vectors of length eight.
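The sample correlation coefficient described above can be sketched as follows, using the $1/L$ normalization given in the text. The length-eight Sylvester-Hadamard rows used to exercise it are only an assumed stand-in for the author's orthogonal watermark vectors, which are not specified here; the function names are illustrative.

```python
import numpy as np


def sample_corr_coef(w, r):
    """Sample version of (3.21), with 1/L normalization throughout."""
    w = np.asarray(w, dtype=float)
    r = np.asarray(r, dtype=float)
    m_w, m_r = w.mean(), r.mean()             # sample means
    c_wr = np.mean(w * r)                     # sample cross-correlation at lag 0
    var_w = np.mean((w - m_w) ** 2)           # sample variances
    var_r = np.mean((r - m_r) ** 2)
    return (c_wr - m_w * m_r) / np.sqrt(var_w * var_r)


def hadamard(n):
    """Sylvester construction of an n x n Hadamard matrix (n a power of 2)."""
    H = np.array([[1.0]])
    while H.shape[0] < n:
        H = np.block([[H, H], [H, -H]])
    return H
```

Rows 1 through 7 of the 8 × 8 Hadamard matrix are zero-mean, so any two of them give a correlation coefficient of 0. Row 0 is all ones: it is orthogonal to the others but has zero sample variance, so its correlation coefficient is undefined. This is why the zero-mean condition matters.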
In each of the speech frames, a length-eight watermark vector was embedded into the coefficients of an eighth-order LP model, resulting in a bit rate of 24 bits per second. For the parametric watermarking experiments presented in this section, watermark embedding and recovery involved selective normalization. The resulting stegosignal [Fig. 3.3(b)] was subjected to the various attacks discussed below.

For the SS algorithm, the stegosignal was obtained by adding the watermark sequence $\{g_i\}_{i=1}^{1000}$ to the 1000 largest DCT coefficients of the 1 s coversignal: $\tilde{Y}_i = Y_i(1 + A g_i)$, where every $g_i$ is independently drawn from $\mathcal{N}(0, 1)$, and $Y_i$ and $\tilde{Y}_i$ are the $i$th largest DCT coefficients of the coversignal and stegosignal, respectively. The parameter $A$ is adjusted to obtain a desired $\mathrm{CWR}_{\mathrm{seg}}$.

3.2.2 Subjective perceptual tests

Although $\mathrm{CWR}_{\mathrm{seg}}$ is used as the objective measure of fidelity, listening tests were also performed to compare the fidelity of the watermarked speech. Speech was watermarked using both the parametric and SS algorithms for $\mathrm{CWR}_{\mathrm{seg}}$ values ranging from 1 dB to 40 dB. For the robustness experiments discussed in the following section, two implementations of LP parametric watermarking, at $\mathrm{CWR}_{\mathrm{seg}}$ values of 7.715 dB and 10.68 dB, were used. Parameter-embedded watermarks were inaudible at these or higher $\mathrm{CWR}_{\mathrm{seg}}$ values [30]. Different $\mathrm{CWR}_{\mathrm{seg}}$ values can be selected depending on the fidelity constraint for a given application. For performance comparison, implementations of SS watermarking at 7.715 dB and 10.68 dB were also used. Additionally, listening tests were performed to subjectively identify the $\mathrm{CWR}_{\mathrm{seg}}$ values of SS-watermarked stegosignals whose fidelity was comparable to that of the 7.715 dB and 10.68 dB implementations of parametric watermarking. This was imperative, as an objective measure such as $\mathrm{CWR}_{\mathrm{seg}}$, although an improvement over CWR, does not satisfactorily quantify all the perceptual aspects of fidelity.
Five subjects were presented with a set of stegosignals with CWRseg values ranging from 1 dB to 40 dB. From this set, they were asked to select the SS watermarking implementations that sounded most similar to the 7.715 dB and 10.68 dB implementations of LP parametric watermarking. The sound files used in the listening tests are available at the website [30]. Based on the subjective tests, the 7.715 dB implementation of LP parametric watermarking was judged perceptually similar to the 27 dB implementation of SS watermarking. Likewise, the fidelity of the 10.68 dB implementation of LP parametric watermarking was comparable with that of the 30 dB implementation of SS watermarking. This result is significant in itself, because it demonstrates the fidelity benefits that can be achieved through parametric watermarking.

3.2.3 Watermark robustness

In this section, we analyze the robustness to common attacks of watermarks inserted by LP-based parametric watermarking. The stegosignals used in these experiments were obtained by embedding watermarks through direct manipulation of the LP coefficients. The SS watermarking algorithm for multimedia signals [10] was used to benchmark performance.

For meaningful analysis of detection performance, it is necessary to consider stationary segments of the coversignal and the stegosignal. That is, segments of $y_n$, $w_n$, and hence $\tilde{y}_n$ are assumed to be partial realizations of wide-sense stationary (WSS) and ergodic random processes. Generally, speech sequences can be considered stationary across frames of about 20 ms. However, the robustness experiments presented below use speech frames of longer duration, typically 125 ms. This choice balances two conflicting requirements: stationarity on the one hand, and longer frames for LSE estimation and stegosignal fidelity on the other. Hence, one important observation to draw from the experimental results is the effect of non-stationarity on watermark robustness.
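The experiments below embed watermarks by direct manipulation of the LP coefficients and recover them by LSE. The following minimal NumPy sketch makes that concrete; it is not the dissertation's implementation. It rests on these assumptions:

- LP analysis uses the least-squares (covariance) method over one frame.
- The stegosignal is $\tilde{y}_n = \tilde{\mathbf{a}}^T\mathbf{y}_n + \xi_n$, built from the exact residual.
- Recovery is informed, that is, it knows $y_n$ and $\xi_n$.
- Selective normalization is omitted.

The synthetic AR(2) frame and the watermark vector in the usage lines are hypothetical test data. Adding white noise to the stegosignal before recovery mimics the attack analyzed in the following subsection.

```python
import numpy as np


def regressors(y, M):
    """Rows are y_n^T = [y_{n-1}, ..., y_{n-M}] for n = M, ..., len(y) - 1.

    This (N - M) x M matrix is the transpose of Y_N in the normal equations.
    """
    y = np.asarray(y, dtype=float)
    return np.column_stack([y[M - i : len(y) - i] for i in range(1, M + 1)])


def lp_analysis(y, M):
    """Least-squares (covariance-method) LP coefficients a and exact residual xi."""
    X = regressors(y, M)
    target = np.asarray(y, dtype=float)[M:]
    a, *_ = np.linalg.lstsq(X, target, rcond=None)
    return a, target - X @ a


def embed(y, a, xi, w):
    """Stegosignal y~_n = (a + w)^T y_n + xi_n; the first M samples are copied."""
    M = len(a)
    stego = np.array(y, dtype=float)
    stego[M:] = regressors(y, M) @ (a + w) + xi
    return stego


def recover(received, y, a, xi):
    """Informed LSE recovery: regress d_n = received_n - xi_n on the cover
    regressors and return the estimated watermark a~ - a."""
    M = len(a)
    d = np.asarray(received, dtype=float)[M:] - xi
    a_tilde, *_ = np.linalg.lstsq(regressors(y, M), d, rcond=None)
    return a_tilde - a


def ar2_frame(n, rng, burn_in=200):
    """Hypothetical test coversignal: y_n = 1.3 y_{n-1} - 0.6 y_{n-2} + e_n."""
    e = rng.standard_normal(n + burn_in)
    y = np.zeros(n + burn_in)
    for k in range(2, n + burn_in):
        y[k] = 1.3 * y[k - 1] - 0.6 * y[k - 2] + e[k]
    return y[burn_in:]


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    y = ar2_frame(2000, rng)                       # one 2000-sample frame
    a, xi = lp_analysis(y, 8)                      # eighth-order LP model
    w = 0.05 * np.array([1, -1, 1, -1, -1, 1, -1, 1], dtype=float)
    stego = embed(y, a, xi, w)
    noisy = stego + 0.1 * rng.standard_normal(stego.size)  # white noise attack
    print(np.corrcoef(w, recover(noisy, y, a, xi))[0, 1])
```

Without noise, the recovery returns the embedded vector up to rounding error. With white noise, the estimate is perturbed by $\mathbf{C}_y^{-1}\mathbf{c}_{y\eta}$, which shrinks as the frame length grows.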
Robustness to additive white noise attack

Let $\{\eta_n\}_{n=1}^{N}$ be a partial realization of a zero-mean, uncorrelated noise process that is added to the stegosignal samples $\{\tilde{y}_n\}_{n=1}^{N}$. Let the corrupted stegosignal be denoted $\{\tilde{y}_n^{\eta}\}_{n=1}^{N}$. In this case, the “output” signal used in the LSE problem (equation (3.5) for $n \in [1, N]$) is likewise corrupted. That is, the clean signal $d_n$ is replaced by

$$d_n^{\eta} = \tilde{y}_n^{\eta} - \xi_n = d_n + \eta_n, \qquad n = 1, 2, \ldots, N. \qquad (3.22)$$

Accordingly, the attack affects the cross-correlation vector $\mathbf{c}_{yd^{\eta}}$ [i.e., the right side of the normal equations (3.6)], and only this vector. The LSE solution is

$$\tilde{\mathbf{a}}^{\eta} = \mathbf{C}_y^{-1}\mathbf{c}_{yd^{\eta}} = (\mathbf{Y}_N\mathbf{Y}_N^T)^{-1}\mathbf{Y}_N\mathbf{d}_N^{\eta}, \qquad (3.23)$$

where $\mathbf{d}_N^{\eta} = [d_N^{\eta}\;\, d_{N-1}^{\eta}\;\, \cdots\;\, d_1^{\eta}]^T \in \mathbb{R}^N$. Equation (3.23) can be expressed as

$$\tilde{\mathbf{a}}^{\eta} = \mathbf{C}_y^{-1}\mathbf{c}_{yd} + \mathbf{C}_y^{-1}\mathbf{c}_{y\eta} = \tilde{\mathbf{a}} + \mathbf{C}_y^{-1}\mathbf{c}_{y\eta}. \qquad (3.24)$$

The ith element of the cross-correlation term $\mathbf{c}_{y\eta}$ is $c_{y\eta}(i) = \sum_{n=1}^{N} y_{n-i}\eta_n$. The noise is zero-mean and uncorrelated with the coversignal. Therefore, as $N \to \infty$, $\frac{1}{N}\mathbf{c}_{y\eta}$ approaches the zero vector $\mathbf{0}$, while $\frac{1}{N}\mathbf{C}_y$ approaches the autocorrelation matrix of the coversignal, so the error term $\mathbf{C}_y^{-1}\mathbf{c}_{y\eta}$ vanishes. Hence the “corrupted” cross-correlation $\mathbf{c}_{yd^{\eta}}$, after normalization by N, approaches $\mathbf{c}_{yd}$ for large N. The watermark is therefore asymptotically immune to the white noise attack. In the presence of white noise, $\tilde{\mathbf{a}}^{\eta}$ is an unbiased and consistent estimator of $\tilde{\mathbf{a}}$.

To verify the analysis, white Gaussian noise at different SNRs was added to speech watermarked by both the LP and SS algorithms. The correlation coefficients between the original and recovered watermarks were determined for all eight stegosignal frames and then averaged. Fig. 3.5 shows that, at any SNR, LP parametric watermarking at a CWRseg of 7.715 dB or 10.68 dB yields higher correlation between the original and recovered watermarks than SS watermarking at CWRseg values of 7.715 dB, 10.68 dB, 27 dB, or 30 dB.

Figure 3.5: Watermark robustness to white noise attack (correlation coefficient versus SNR in dB). Performance of parametric watermarking at CWRseg values of 7.715 dB and 10.68 dB is compared with that of SS watermarking at 7.715 dB, 10.68 dB, 27 dB and 30 dB CWRseg.

This improvement in the correlation coefficient values, and hence in robustness, is mainly due to the LSE-based recovery algorithm. This level of robustness to white noise attack is sufficient for a wide range of watermarking applications, because the stegosignal is highly noisy below an SNR of 15 dB (for details see [30]). The non-stationarity of the 2000-sample watermarked speech frames can be ignored for practical applications of parametric watermarking. Also, as expected, watermark robustness to attack increases as the CWRseg is decreased, since there is greater watermark energy in the same coversignal.

Robustness to colored noise attack

In the next set of experiments, the stegosignal segment was distorted by the addition of a colored noise process, $\{\gamma_n\}_{n=1}^{N}$. Colored noise was generated by filtering a white noise process with an 11th-order FIR lowpass filter with a normalized cut-off frequency of 0.4, or 6400 Hz. The distorted stegosignal frame is denoted $\{\tilde{y}_n^{\gamma}\}_{n=1}^{N}$. As in the white noise case, the “output” signal in the watermark recovery process is corrupted. Instead of $d_n$, we have access to

$$d_n^{\gamma} = \tilde{y}_n^{\gamma} - \xi_n = d_n + \gamma_n, \qquad n = 1, 2, \ldots, N. \qquad (3.25)$$

Consequently, the attack alters the cross-correlation vector in the normal equations. Because the noise is correlated, $\mathbf{c}_{yd^{\gamma}}$ no longer approaches $\mathbf{c}_{yd}$ asymptotically. The LSE estimate of the perturbed coefficients, and hence the watermark, will be affected to an extent that depends on the relative magnitudes of the elements of $\mathbf{c}_{yd^{\gamma}}$. The solution to this problem is a prewhitening procedure.
In the presence of colored noise, the LSE estimation problem is represented by the equation

$$\mathbf{d}_N^{\gamma} = \mathbf{Y}_N^T\tilde{\mathbf{a}} + \boldsymbol{\gamma}_N, \qquad (3.26)$$

in which $\boldsymbol{\gamma}_N = [\gamma_N\;\, \gamma_{N-1}\;\, \cdots\;\, \gamma_1]^T$ is ordered like $\mathbf{d}_N^{\gamma}$, and all other quantities are as defined above. Let $\mathbf{C}_{\gamma}$ be the covariance matrix of the colored noise. Premultiplying both sides of equation (3.26) by $\mathbf{Y}_N\mathbf{C}_{\gamma}^{-1}$ and solving for $\tilde{\mathbf{a}}$ with the unknown noise term dropped gives the estimate

$$\tilde{\mathbf{a}}^{\gamma} = (\mathbf{Y}_N\mathbf{C}_{\gamma}^{-1}\mathbf{Y}_N^T)^{-1}\mathbf{Y}_N\mathbf{C}_{\gamma}^{-1}\mathbf{d}_N^{\gamma} = \tilde{\mathbf{a}} + (\mathbf{Y}_N\mathbf{C}_{\gamma}^{-1}\mathbf{Y}_N^T)^{-1}\mathbf{Y}_N\mathbf{C}_{\gamma}^{-1}\boldsymbol{\gamma}_N. \qquad (3.27)$$

Thus, the estimate of the perturbed LP coefficients is given by (3.27). It is obtained by replacing $\mathbf{C}_y$ with $\mathbf{C}_y^{\gamma} = \mathbf{Y}_N\mathbf{C}_{\gamma}^{-1}\mathbf{Y}_N^T$ and $\mathbf{c}_{yd}$ with $\mathbf{c}_{yd}^{\gamma} = \mathbf{Y}_N\mathbf{C}_{\gamma}^{-1}\mathbf{d}_N^{\gamma}$. Whitening requires knowledge of the noise correlation properties, which are readily determined in the present application.

Figure 3.6: Watermark robustness to colored noise attack (correlation coefficient versus SNR in dB; parametric watermarking at 7 dB and 10 dB, SS watermarking at 7, 10, 27 and 30 dB). Colored noise was generated by lowpass filtering white noise.

Figure 3.7: Improvement in watermark robustness to colored noise attack due to the whitening transformation (correlation coefficient versus SNR in dB, with and without whitening, at 7 dB and 10 dB).

Fig. 3.6 shows the effect of colored noise on watermark robustness. LP parametric watermarking is fairly robust to colored noise, even without a prewhitening operation during the recovery process. In Fig. 3.6, the performance difference between the parametric and SS watermarking algorithms at 7.715 and 10.68 dB is even greater than in the white noise case (Fig. 3.5). Fig. 3.7 shows a further improvement in watermark robustness to colored noise attack when the watermark recovery process includes prewhitening.
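The prewhitened estimate (3.27) can be sketched in three steps:

1. Factor $\mathbf{C}_\gamma = \mathbf{L}\mathbf{L}^T$ (Cholesky).
2. Transform the regressors and observations by $\mathbf{L}^{-1}$ so that the noise becomes white.
3. Solve an ordinary least-squares problem.

The result equals $(\mathbf{Y}_N\mathbf{C}_\gamma^{-1}\mathbf{Y}_N^T)^{-1}\mathbf{Y}_N\mathbf{C}_\gamma^{-1}\mathbf{d}_N^{\gamma}$.

This NumPy sketch assumes the colored noise is white noise passed through a known FIR filter, so its covariance is the Toeplitz matrix of the filter's autocorrelation. The Hamming-windowed-sinc `lowpass_fir` is a hypothetical stand-in for the 11th-order (12-tap) lowpass filter used in the experiments. Its actual design is not given in the text.

```python
import numpy as np


def fir_noise_covariance(h, n, sigma2=1.0):
    """Covariance matrix of gamma_k = sum_m h_m v_{k-m}, with v white of variance sigma2."""
    h = np.asarray(h, dtype=float)
    r = np.zeros(n)
    for lag in range(min(len(h), n)):
        r[lag] = sigma2 * np.dot(h[: len(h) - lag], h[lag:])
    lags = np.abs(np.subtract.outer(np.arange(n), np.arange(n)))
    return r[lags]  # symmetric Toeplitz matrix


def prewhitened_lse(X, d, C_gamma):
    """Estimate a from d = X a + gamma, given the noise covariance C_gamma.

    X stacks the regressor rows y_n^T (X = Y_N^T). With C_gamma = L L^T,
    L^{-1} gamma is white, so ordinary LSE on (L^{-1} X, L^{-1} d) equals
    (X^T C^{-1} X)^{-1} X^T C^{-1} d, the prewhitened solution.
    """
    L = np.linalg.cholesky(C_gamma)
    Xw = np.linalg.solve(L, X)
    dw = np.linalg.solve(L, d)
    a, *_ = np.linalg.lstsq(Xw, dw, rcond=None)
    return a


def lowpass_fir(num_taps=12, cutoff=0.4):
    """Hypothetical windowed-sinc lowpass; cutoff in cycles/sample (Nyquist = 0.5)."""
    k = np.arange(num_taps) - (num_taps - 1) / 2.0
    return 2.0 * cutoff * np.sinc(2.0 * cutoff * k) * np.hamming(num_taps)


def colored_noise(h, n, rng, sigma=1.0):
    """Stationary colored noise: white noise filtered by h, keeping only 'valid' outputs."""
    v = sigma * rng.standard_normal(n + len(h) - 1)
    return np.convolve(v, h, mode="valid")
```

A noise spectrum that is nearly zero over part of the band, as with a sharp lowpass filter, can make $\mathbf{C}_\gamma$ badly conditioned. Adding a small multiple of the identity before factoring is a common safeguard.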
Notably, LP parametric watermarking with prewhitening at 10.68 dB is more robust than at 7.715 dB without whitening, even though the latter outperforms SS at 7.715 dB.

Robustness to filtering

Let $\tilde{y}_n^f$ be the result of filtering the stegosignal. At time n,

$$\tilde{y}_n^f = \tilde{y}_n * h_n = y_n * h_n + w_n * h_n, \qquad (3.28)$$

where $\{h_n\}$ is the impulse response of the filter and $*$ denotes linear convolution. As before, the watermark signal is denoted $w_n = \tilde{y}_n - y_n$. At first glance, it seems reasonable that an ideal attack would be designed to result in $\tilde{y}_n^f \approx y_n$. The ideal attack filter would therefore maximize (in some sense) the contribution of the first term of the sum in (3.28) and minimize the second, as any optimal filter designed to remove noise does.¹ On the other hand, (3.28) reveals that good watermark design requires the watermark signal to be as spectrally similar to the coversignal as possible. Any attack on the watermark will then also degrade the coversignal component, and thereby degrade fidelity. The effectiveness of an attack is constrained by the perceptual distortion of the stegosignal. Hence, for robustness to filtering attacks, it is sufficient that most of the watermark information be present in perceptually significant components of the coversignal [10]. In general, most of the perceptually significant components of speech lie in the low-frequency spectrum. Watermark signals with low-frequency spectra are therefore the most likely to survive a filtering attack, assuming that the attacker uses a rational approach that preserves fidelity.

¹Since the attacker does not have access to the watermark signal $w_n$, truly optimal design, from the attacker's point of view, is not possible.

Fig. 3.9 shows watermark robustness to a 4th-order Butterworth lowpass filter over a range of cut-off frequencies. The watermark vector can be interpreted as the coefficients of an FIR filter, $\{w_i\}_{i=1}^{M} = \{w[i]\}_{i=1}^{M}$. Fig. 3.8(a) shows the magnitude response $|W(\Omega)|$ of this FIR filter, and Fig. 3.8(b) shows the magnitude response of the attack filter. Watermark robustness to filtering depends on the magnitude spectrum of the embedded watermarks.

Figure 3.8: Plots of (a) the magnitude spectrum of the watermark coefficients, and (b) the magnitude response of the attack filter at a normalized cut-off frequency of 0.4 (magnitude response versus normalized frequency, 0 to 0.5). A 4th-order IIR Butterworth filter was used to test watermark robustness to lowpass filtering.

Figure 3.9: Robustness to lowpass filtering (correlation coefficient versus normalized cut-off frequency, 0.34 to 0.5). A 4th-order IIR Butterworth filter was used to implement the lowpass filtering attack.

Low-frequency and mid-frequency watermark filters contribute to better robustness against lowpass filtering. Watermark robustness can be improved further through diversity, by embedding the watermark information repeatedly [43]. Any highpass watermark filter $\{w_{hp}[i]\}_{i=1}^{M}$ can be transformed into a lowpass watermark $\{w_{lp}[i]\}_{i=1}^{M}$ using the relation [44]

$$w_{lp}[i] = (-1)^i w_{hp}[i]. \qquad (3.29)$$

Robustness is improved by embedding the “same” watermark twice: in the original form $\{w[i]\}_{i=1}^{M} = \{w'[i]\}_{i=1}^{M}$, and in the frequency-translated form $w[i] = (-1)^i w'[i]$. The recovered watermark with the higher correlation with the embedded watermark is used for watermark detection.
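The translation (3.29) multiplies alternate coefficients by $-1$, which shifts the magnitude response by half the sampling rate. The stdlib sketch below checks this numerically and shows one way to implement the diversity decision described above. The helper names are hypothetical. Indices start at 0, which only flips the overall sign relative to a 1-based convention. `pick_recovered` assumes both copies have already been recovered by LSE.

```python
import cmath
import math


def frequency_translate(w):
    """w'[i] = (-1)^i w[i]: shifts the spectrum by half the sampling rate,
    so |W'(f)| = |W(0.5 - f)| for real coefficients."""
    return [(-1) ** i * wi for i, wi in enumerate(w)]


def magnitude_response(w, f):
    """|W(f)| of FIR coefficients w at normalized frequency f (cycles/sample)."""
    return abs(sum(wi * cmath.exp(-2j * math.pi * f * i) for i, wi in enumerate(w)))


def correlation(u, v):
    """Sample correlation coefficient of two equal-length sequences."""
    mu, mv = sum(u) / len(u), sum(v) / len(v)
    num = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    den = math.sqrt(sum((a - mu) ** 2 for a in u) * sum((b - mv) ** 2 for b in v))
    return num / den


def pick_recovered(embedded, recovered_plain, recovered_translated):
    """Diversity decision: undo the translation on the second copy (the map is
    its own inverse) and keep whichever copy correlates better with `embedded`."""
    back = frequency_translate(recovered_translated)
    c_plain = correlation(embedded, recovered_plain)
    c_back = correlation(embedded, back)
    return (recovered_plain, c_plain) if c_plain >= c_back else (back, c_back)
```

For example, the lowpass watermark [1, 1, 1, 1] becomes [1, -1, 1, -1]. This vector has its peak magnitude of 4 at the Nyquist frequency (f = 0.5) instead of at f = 0.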
To illustrate this point, the coversignal was altered using the watermark whose magnitude spectrum is shown in Fig. 3.10(a), and separately using its translated counterpart, shown in Fig. 3.10(b). The resulting stegosignals were subjected to a highpass filtering attack using a 4th-order Butterworth filter. Figures 3.11(a) and 3.11(b) show that the transformed watermark yields improved robustness to filtering. Highpass filters have a deleterious effect on speech quality. Even a normalized cut-off frequency of 0.04, or 640 Hz, distorted the stegosignals significantly. Such stegosignals would be unusable for typical commercial use, for example, and certainly for the digital library application addressed here.

Robustness to cropping

In a cropping attack, arbitrary samples of the stegosignal are removed. Parametric-modeling-based watermarking involves an additive operation during the watermark embedding and recovery processes. Cropping therefore desynchronizes the coversignal and the stegosignal. However, because the present method is an informed watermarking technique, the algorithm described in [5] can be used to resynchronize the cover and stegosignals.

In the present experiment, the stegosignal was subjected to a modified version of cropping, sometimes called the jitter attack [45]. In this modified implementation, a specified percentage of the samples in each 2000-sample frame was randomly replaced by zeros. The watermark information is spread out in the stegosignal, but it is concentrated during the LSE-based recovery process. This contributes to the increased robustness of LP parametric watermarking to cropping, as shown in Fig. 3.12.
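The jitter attack used here can be sketched as follows. The sketch assumes the stegosignal is a list of floats and that a seeded `random.Random` instance supplies the randomness. The function name is hypothetical. Because samples are zeroed rather than deleted, the frame length and synchronization are preserved.

```python
import random


def jitter_attack(signal, frame_len, percent, rng):
    """Replace `percent` % of the samples in every frame by zeros.

    Unlike true cropping, the signal length is unchanged, so the cover and
    stego signals stay synchronized.
    """
    out = list(signal)
    for start in range(0, len(out), frame_len):
        stop = min(start + frame_len, len(out))
        count = round((stop - start) * percent / 100.0)
        for idx in rng.sample(range(start, stop), count):
            out[idx] = 0.0
    return out
```

For a 1 s, 16 kHz stegosignal split into 2000-sample frames, `jitter_attack(stego, 2000, 10, random.Random(0))` zeroes 200 randomly chosen samples in each of the eight frames.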
Figure 3.10: Plots of (a) the magnitude spectrum of the original watermark coefficients h[n], and (b) the magnitude response of the transformed watermark coefficients, (−1)^n h[n] (magnitude response versus normalized frequency, 0 to 0.5).

Figure 3.11: Robustness to a 4th-order Butterworth highpass filter (correlation coefficient versus normalized cut-off frequency, 0 to 0.16, for parametric watermarking at 7 dB and 10 dB and SS watermarking at 7, 10, 27 and 30 dB). In (a), the embedded watermark coefficients correspond to the magnitude spectrum shown in Fig. 3.10(a). In (b), the watermark coefficients were transformed using equation (3.29) and then embedded.

Figure 3.12: Robustness to cropping (correlation coefficient versus percentage of cropped samples, 0 to 100%). Samples of the stegosignal were randomly cropped. Parameter-embedded watermarking results in improved robustness to cropping.
Table 3.5: Robustness to speech coding (correlation coefficient between original and recovered watermarks)

Speech codec   Bit rate (kbit/s)   Param. wmkg 7 dB   Param. wmkg 10 dB   SS wmkg 7 dB   SS wmkg 10 dB
G.711          64                  0.9990             0.9998              0.9985         0.9966
ADPCM          32                  0.9889             0.9682              0.8207         0.7658
GSM            13.2                0.6584             0.5545              0.4095         0.3140
CELP           4.5                 0.1488             0.1490              -0.0225        -0.0290
CELP           2.3                 0.1269             0.1464              0.0209         0.0472
LPC10          2.4                 0.1762             0.1762              -0.0184        -0.0470

Robustness to speech coding

Experiments were performed to study the effect of low-bit-rate speech compression on LP-based parametric watermarking. The stegosignal was compressed (coded), then decompressed (decoded), and the watermarks were recovered from the decompressed signal. Table 3.5 lists the correlation coefficients between the original and recovered watermarks for different speech codecs. The attacked stegosignals are available at the website [30]. The G.711 μ-law, G.726 ADPCM, GSM (13.2 kbit/s), LPC10, and CELP (4.5 kbit/s) codecs were obtained from the website [46].

The G.711 speech codec software compresses and decompresses speech using logarithmic pulse code modulation (PCM) at a sampling frequency of 8 kHz with 8 bits per sample. G.726 uses an adaptive differential PCM technique and is widely used in VoIP applications. The GSM full-rate codec uses 8th-order linear prediction along with 13-bit uniform PCM. The CELP and LPC10 codecs are also based on parametric models of speech. At 4.5 kbit/s or less, the attacked stegosignals are intelligible but of low fidelity [30].

Table 3.5 shows that the compression bit rate and CWRseg are the main factors influencing watermark robustness. LP parametric watermarking outperforms SS watermarking in the presence of speech coding at all the bit rates tested. The performance of both SS and parametric watermarking is degraded significantly by low-bit-rate CELP, GSM and LPC10 coding.
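The logarithmic PCM mentioned for G.711 can be illustrated with the continuous μ-law characteristic (μ = 255) followed by uniform quantization of the compressed value. This sketch is a stand-in only. The actual G.711 codec uses a segmented, piecewise-linear approximation with its own code format. The sketch is therefore neither bit-exact G.711 nor the codec software used in the experiments, and the function names are hypothetical.

```python
import math

MU = 255.0  # mu-law constant (the North American / Japanese variant of G.711)


def mu_compress(x):
    """Continuous mu-law characteristic for x in [-1, 1]."""
    return math.copysign(math.log1p(MU * abs(x)) / math.log1p(MU), x)


def mu_expand(y):
    """Inverse of mu_compress for y in [-1, 1]."""
    return math.copysign(math.expm1(abs(y) * math.log1p(MU)) / MU, y)


def mu_law_quantize(x, bits=8):
    """Clip to [-1, 1], compress, round to a uniform grid with 2**(bits-1) - 1
    steps per polarity, and expand back."""
    steps = 2 ** (bits - 1) - 1
    y = mu_compress(max(-1.0, min(1.0, x)))
    return mu_expand(round(y * steps) / steps)
```

The logarithmic curve gives small amplitudes finer absolute quantization steps than large ones. This is why 8 bits per sample suffice for telephone-quality speech.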
However, the CELP, GSM and LPC10 codecs also degrade the quality of the decompressed speech considerably [30]. Parameter-embedded watermarks are slightly more robust to LPC10 coding than to CELP coding at 2.3 kbit/s. Parametric watermarking perturbs parameters that are significant to speech coding. Even so, it performs better than SS watermarking in the presence of the CELP, GSM and LPC10 codecs. This is because the LP-based watermark signal is speech-like rather than noise-like. At the same time, LP analysis is performed over nonstationary segments of speech, and synthesis is not involved in stegosignal reconstruction. Robustness to a particular speech coder is therefore not gained at the expense of robustness to other codecs.

Chapter 4
LP Parametric Watermarking with a Fidelity Constraint

4.1 Introduction

The previous chapter described a speech watermarking algorithm in which the LP parameters of the coversignal were modified by adding a watermark vector selected independently of the coversignal. The stegosignal was constructed from the correspondingly perturbed LP coefficients and the exact prediction residual, $\{\xi_n\}_{n=1}^{N}$, using the FIR filter $\tilde{y}_n = \sum_{i=1}^{M}\tilde{a}_i y_{n-i} + \xi_n = \tilde{\mathbf{a}}^T\mathbf{y}_n + \xi_n$. LP-based parametric watermarking was found to be fairly robust against a wide variety of attacks, such as addition of noise, MP3 compression, and cropping [7]. A main reason for the good robustness is that the watermark signal is concentrated into a parametric representation during watermark embedding and recovery, while it is spread across the entire work otherwise. Stegosignal fidelity and watermark robustness can be improved further if a fidelity constraint is integrated into the watermark embedding process. This idea led to SMF-based parametric watermarking.
4.2 SMF parametric watermarking

SMF-based parametric watermarking subject to an $\ell_\infty$ fidelity constraint [29,47] represents a step toward quantifying the relationship between the competing requirements of robustness and fidelity. This research addresses the following general problem:

CONSTRAINED WATERMARKING PROBLEM. For a coversignal frame $\{y_n\}_{n=1}^{N}$ generated according to model (3.2), find the set of watermarks for which the stegosignal frame $\{\tilde{y}_n\}_{n=1}^{N}$, generated according to (3.3), meets the following fidelity criterion:

$$\|\tilde{\mathbf{y}} - \mathbf{y}\|_\infty < \gamma, \qquad (4.1)$$

in which $\mathbf{y}$ and $\tilde{\mathbf{y}}$ are N-vectors with nth elements $y_n$ and $\tilde{y}_n$, respectively.

In the present work, the determination of a watermark set guaranteed to satisfy a fidelity criterion is readily solved as an SMF problem (refer to Section 2.2). First, subtract $y_n$ from each side of equation (3.3), negate each side, and rearrange to obtain

$$y_n - \tilde{y}_n = (y_n - \xi_n) - \sum_{i=1}^{M}\tilde{a}_i y_{n-i} = (y_n - \xi_n) - \tilde{\mathbf{a}}^T\mathbf{y}_n. \qquad (4.2)$$

The inputs are a coversignal $\{y_n\}_{n=1}^{N}$, a desired stegosignal $\{\tilde{y}_n\}_{n=1}^{N}$, and a maximum error tolerance $\gamma$. From these, SMF [25] can obtain the hyperellipsoidal membership set that tightly bounds the following feasibility set ($\mathcal{P}_N \subseteq \mathbb{R}^M$) at time N:

$$\mathcal{P}_N = \{\tilde{\mathbf{a}} \mid \|\tilde{\mathbf{y}} - \mathbf{y}\|_\infty < \gamma\}, \qquad (4.3)$$

in which $\mathbf{y}$ is the N-vector with nth element $y_n$, and $\tilde{\mathbf{y}}$ is the N-vector with nth element $\tilde{\mathbf{a}}^T\mathbf{y}_n + \xi_n$.

The fidelity constraint can be generalized to allow for more “local” fidelity considerations in time as the signal properties change. Such a fidelity criterion takes the form of pointwise absolute error bounds, $\{\gamma_n\}_{n=1}^{N}$, on the difference between the stego- and coversignals: $|y_n - \tilde{y}_n| < \gamma_n$ for each $n \in [1, N]$. Define the sequence

$$z_n = y_n - \xi_n, \qquad n = 1, 2, \ldots, N, \qquad (4.4)$$

recalling that $\{\xi_n\}$ is known. The search for the constrained watermark parameters then reduces to an SMF problem as in (2.2).
Applying the SMF method to the estimation of $\tilde{\mathbf{a}}$ in equation (4.2) yields a hyperellipsoidal set of watermark (perturbed model parameter) candidates, $\mathcal{E}_N$. This set is guaranteed to contain and tightly bound the exact set

$$\mathcal{P}_N = \{\tilde{\mathbf{a}} \in \mathbb{R}^M \mid |z_n - \tilde{\mathbf{a}}^T\mathbf{y}_n| < \gamma_n,\; n \in [1, N]\}. \qquad (4.5)$$

The fidelity constraint is a bound on $|w_n|$, where the watermark signal is given by $w_n = \tilde{y}_n - y_n$. The hyperellipsoidal set is

$$\mathcal{E}_N = \left\{\tilde{\mathbf{a}} \;\middle|\; (\tilde{\mathbf{a}} - \mathbf{a}_c(N))^T\frac{\mathbf{C}_N}{\kappa_N}(\tilde{\mathbf{a}} - \mathbf{a}_c(N)) < 1\right\}, \qquad \tilde{\mathbf{a}} \in \mathbb{R}^M, \qquad (4.6)$$

where $\mathbf{a}_c(N)$ is the center of $\mathcal{E}_N$ and $\mathbf{C}_N \in \mathbb{R}^{M\times M}$ is the covariance matrix, $\mathbf{C}_N = \mathbf{Y}_N\mathbf{Y}_N^T$, with $\mathbf{Y}_N = [\mathbf{y}_N\;\, \mathbf{y}_{N-1}\;\, \cdots\;\, \mathbf{y}_1] \in \mathbb{R}^{M\times N}$. As shown in Table 2.2.2, $\kappa_n$ is updated recursively for $n = 1, \ldots, N$, and the final value is $\kappa_N$. By default, the center of the hyperellipsoid is used to construct the stegosignal (equation (3.3)), and the embedded watermark vector is $\mathbf{w} = \mathbf{a}_c(N) - \mathbf{a}$. The watermark recovery process for SMF parametric watermarking involves LSE estimation of the modified LP coefficients (refer to Table 3.2). Hence, even for SMF-based watermarking, the embedded watermarks are asymptotically ($N \to \infty$) immune to an additive white noise attack [47].

4.3 Robustness optimization

Robustness depends on the selection of an appropriate watermark solution from the hyperellipsoidal set, the strength of the embedded watermark, and watermark detection. In general, embedding more energetic watermarks yields greater robustness, but this in turn affects stegosignal fidelity. By default, the center of the hyperellipsoid constitutes the watermark solution. In most cases, however, it might not be the optimal solution for a given attack. For robustness analysis, the hyperellipsoidal set $\mathcal{E}_N$ is assumed to be obtained through the SMF algorithm subject to the fidelity constraint $|\tilde{y}_n - y_n| < \gamma_n$. Note that the hyperellipsoid is not centered at $\mathbf{a}$, the vector of original LP coefficients.
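The sets in (4.5) and (4.6), and the idea of moving from the center toward the boundary of $\mathcal{E}_N$ to raise watermark energy, can be sketched in NumPy. The SMF recursion of Table 2.2.2 is not reproduced here. The center $\mathbf{a}_c(N)$ and the matrix $\mathbf{P} = \mathbf{C}_N/\kappa_N$ are assumed to be supplied by that recursion. The matrix `X` stacks the regressor rows $\mathbf{y}_n^T$. The function names and the random-direction search are illustrative assumptions, not the dissertation's procedure.

```python
import numpy as np


def in_exact_set(a_tilde, X, z, gamma):
    """Membership in P_N of (4.5): |z_n - a~^T y_n| < gamma_n for every n.
    z_n = y_n - xi_n; gamma may be a scalar or a per-sample array."""
    resid = np.asarray(z, dtype=float) - X @ np.asarray(a_tilde, dtype=float)
    return bool(np.all(np.abs(resid) < gamma))


def in_hyperellipsoid(a_tilde, center, P):
    """Membership in E_N of (4.6); P stands for C_N / kappa_N from the SMF recursion."""
    e = np.asarray(a_tilde, dtype=float) - np.asarray(center, dtype=float)
    return float(e @ P @ e) < 1.0


def boundary_point(center, P, u):
    """Point where the ray center + t*u (t > 0) meets (a - c)^T P (a - c) = 1."""
    u = np.asarray(u, dtype=float)
    return np.asarray(center, dtype=float) + u / np.sqrt(u @ P @ u)


def watermark_energy(a_tilde, a, X):
    """Energy of the watermark signal w_n = (a~ - a)^T y_n over the frame."""
    diff = np.asarray(a_tilde, dtype=float) - np.asarray(a, dtype=float)
    return float(np.sum((X @ diff) ** 2))


def search_boundary(a, center, P, X, num_dirs, rng):
    """Map random directions (and their negatives) to the boundary of E_N and
    keep the candidate with the most energetic watermark signal. Boundary points
    lie on the closure of the open set E_N; a practical version might pull them
    slightly inward, e.g. by scaling u by (1 - eps)."""
    best, best_energy = None, -np.inf
    for _ in range(num_dirs):
        u = rng.standard_normal(len(center))
        for cand in (boundary_point(center, P, u), boundary_point(center, P, -u)):
            energy = watermark_energy(cand, a, X)
            if energy > best_energy:
                best, best_energy = cand, energy
    return best, best_energy
```

Pairing each direction with its negative guarantees that the selected boundary point yields at least the watermark energy of the center. The energies of $\mathbf{a}_c \pm t\mathbf{u}$ average to the center's energy plus $t^2\|\mathbf{X}\mathbf{u}\|^2$, so the larger of the two cannot fall below the center's energy.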
More energetic watermark vectors are embedded by selecting perturbed LP parameters from $\mathcal{E}_N$ that are as far as possible from the original LP parameters $\mathbf{a}$. The selection of an appropriate watermark solution from $\mathcal{E}_N$ depends on the attack and the targeted robustness. The SMF-based watermarking approach is especially useful for improving robustness against attacks whose effects depend on the nature of the watermark signal. For example, robustness to a lowpass filtering attack can be improved by selecting low-frequency watermark signals.

4.3.1 Optimal watermarks for a filtering attack

The impulse response of an attack filter is assumed to be known and is denoted by $\{h_n\}$. A stegosignal of the form of equation (3.3) is to be constructed by selecting an appropriate vector of perturbed LP coefficients from the hyperellipsoidal set $\mathcal{E}_N$. The corresponding watermark vector is defined as $\mathbf{w} = \tilde{\mathbf{a}} - \mathbf{a}$. Let $\{\tilde{y}_n^f\}_{n=1}^{N}$ be the result of filtering the stegosignal. That is, at time n,

$$\tilde{y}_n^f = \tilde{y}_n * h_n = y_n * h_n + w_n * h_n, \qquad (4.7)$$

where $\{\tilde{y}_n\}$ is assumed to be a stegosignal constructed from any $\tilde{\mathbf{a}} \in \mathcal{E}_N$, including the best (optimized) $\tilde{\mathbf{a}}$. An attack is ineffective if the filtered coversignal component of the filtered stegosignal is perceptually dissimilar to the original. This is because watermark robustness is generally defined as the ability of the watermark to survive an attack that does not degrade speech fidelity beyond an application-dependent criterion. An attack is also ineffective if the filtered watermark signal $\{w_n^f\}_{n=1}^{N}$ approximates the original watermark signal $\{w_n\}_{n=1}^{N}$. The coversignal and the attack filter $\{h_n\}$ are predetermined quantities. Hence, the watermark embedding algorithm cannot control the filtered coversignal component in equation (4.7).
However, the second term in (4.7), $\{w_n^f\}_{n=1}^{N}$, can be made robust against the filtering attack by selecting an appropriate $\tilde{\mathbf{a}}$ from the set $\mathcal{E}_N$. We now address the problem of selecting the “best” set of modified LP coefficients from $\mathcal{E}_N$. Let $\Delta w_n^f$ be defined as

$$\Delta w_n^f = w_n^f - w_n = \left[\sum_{i=1}^{M}(\tilde{a}_i - a_i)y_{n-i}\right] * (h_n - \delta_n),$$

where $\delta_n$ is the Kronecker delta; $\delta_0 = 1$ and $\delta_n = 0$ for $n \neq 0$. Clearly, $\Delta w_n^f$ is a function of $\tilde{\mathbf{a}}$ for a given attack filter. The mean squared error (MSE) between the filtered and original watermark signals is then given by

$$f(\tilde{\mathbf{a}}) = \frac{1}{N}\sum_{n=1}^{N}(w_n^f - w_n)^2. \qquad (4.8)$$

If $\mathbf{w} = \tilde{\mathbf{a}} - \mathbf{a}$ is indeed the “best” watermark vector, then the corresponding filtered watermark signal $w_n^f$ has minimum MSE. Hence $\tilde{\mathbf{a}}$ is obtained by solving the following constrained optimization problem:

$$\text{minimize } f(\tilde{\mathbf{a}}) \quad \text{subject to } \tilde{\mathbf{a}} \in \mathcal{E}_N. \qquad (4.9)$$

The method of Lagrange multipliers can be used to solve this optimization problem [28]. The domain of the constraint function is the hyperellipsoid, which is a convex set if $\mathbf{C}_N/\kappa_N$ is a positive definite matrix.

4.3.2 Optimal watermarks for a quantization attack

This section deals with uniform and non-uniform scalar quantizer attacks on watermarks. The quantizer consists of L equal or unequal intervals $I_1, I_2, \ldots, I_L$. Each interval $I_l$, for $l = 1, 2, \ldots, L$, is associated with a quantization value $x_l$. The scalar quantization operation Q can be expressed as $\tilde{y}_n^q = Q(\tilde{y}_n) = x_l$, where $l$ is the index that minimizes $|\tilde{y}_n - x_l|$ over $l = 1, 2, \ldots, L$.

To maximize watermark robustness to a specific quantization attack, a constrained optimization problem similar to (4.9) is solved, with the objective function $f(\tilde{\mathbf{a}}) = \frac{1}{N}\sum_{n=1}^{N}(w_n^q - w_n)^2$, where $w_n^q = \tilde{y}_n^q - y_n$. In a similar way, optimal watermarks can be determined for best robustness to a combination of filtering and quantization attacks. The latter problem can be generalized to a combined attack involving several distinct attacks on the stegosignal.
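The quantizer Q and the objective of Section 4.3.2 can be sketched in plain Python. Note that $w_n^q - w_n = (\tilde{y}_n^q - y_n) - (\tilde{y}_n - y_n) = Q(\tilde{y}_n) - \tilde{y}_n$. The objective is therefore simply the mean squared quantization error of the stegosignal, and it depends on $\tilde{\mathbf{a}}$ only through $\tilde{y}_n$. The helper names and the midpoint rule for uniform reconstruction levels are assumptions for illustration.

```python
def scalar_quantize(x, levels):
    """Q(x): the reconstruction value x_l closest to x (ties go to the first listed)."""
    return min(levels, key=lambda v: abs(x - v))


def uniform_levels(lo, hi, count):
    """Midpoints of `count` equal intervals covering [lo, hi]."""
    step = (hi - lo) / count
    return [lo + (k + 0.5) * step for k in range(count)]


def quantization_objective(stego, cover, levels):
    """f = (1/N) sum_n (w^q_n - w_n)^2 with w^q_n = Q(y~_n) - y_n and
    w_n = y~_n - y_n; the cover cancels, leaving Q(y~_n) - y~_n."""
    errs = [(scalar_quantize(s, levels) - c) - (s - c) for s, c in zip(stego, cover)]
    return sum(e * e for e in errs) / len(errs)
```

A non-uniform quantizer is handled by passing an unevenly spaced `levels` list to the same functions.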
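4.3.3 Maximizing watermark energy

The boundary of the hyperellipsoidal set obtained by SMF considerations is given by

$$\partial\mathcal{E}_N = \left\{\tilde{\mathbf{a}} \;\middle|\; (\tilde{\mathbf{a}} - \mathbf{a}_c(N))^T\frac{\mathbf{C}_N}{\kappa_N}(\tilde{\mathbf{a}} - \mathbf{a}_c(N)) = 1\right\} \ldots$$

… It is shown that the following conditions are true with probability one on $n \in [1, N]$:

$$\|\boldsymbol{\phi}(\boldsymbol{\theta}_n, \mathbf{y}_n, n) - \boldsymbol{\phi}(\mathbf{y}_n, n)\| < \epsilon \qquad (5.8)$$

… If these conditions hold, then the models in equations (5.6) and (5.7) approximate each other according to

$$\|\tilde{\mathbf{y}}_n - \mathbf{y}_n\| \le \epsilon\left\{1 + \sum_{i=1}^{N-1}K^{i}\right\}, \quad \text{w.p.1}, \quad n \in [1, N]. \qquad (5.10)$$

This perturbed parameter theory is used to obtain bounds on the perturbation of the coversignal caused by watermarking.

5.4.1 Time-varying AR models in watermarking

The stegosignal obtained using equation (3.3) is manipulated into a time-varying AR model:

$$\tilde{y}_n = \sum_{i=1}^{M}\tilde{a}_i y_{n-i} + \xi_n = \sum_{i=1}^{M}(a_i + w_i)\frac{y_{n-i}}{\tilde{y}_{n-i}}\tilde{y}_{n-i} + \xi_n = \sum_{i=1}^{M}\breve{a}_{n,i}\tilde{y}_{n-i} + \xi_n.$$

The expression for the time-varying AR coefficients $\{\breve{a}_{n,i}\}$ can be manipulated into the form $\breve{a}_{n,i} = \tilde{a}_i\rho_{n,i} = a_i + a_i(\rho_{n,i} - 1) + w_i\rho_{n,i}$, where $\rho_{n,i} = y_{n-i}/\tilde{y}_{n-i} \approx 1$. The time-varying AR parameters are composed of the true parameter term, $a_i$, and the perturbation term, $a_i(\rho_{n,i} - 1) + w_i\rho_{n,i}$.

5.4.2 Application of perturbed parameter Markov equations to watermarking

Now suppose that the stegosignal is constructed such that

$$\tilde{y}_n = \sum_{i=1}^{M}(a_i + w_{n,i})y_{n-i} + \xi_n = \sum_{i=1}^{M}\breve{a}_{n,i}\tilde{y}_{n-i} + \xi_n, \qquad (5.11)$$

where $w_{n,i} = w_i$ if n is odd and $w_{n,i} = -w_i$ if n is even. The time-varying AR parameters are then given by

$$\breve{a}_{n,i} = \tilde{a}_{n,i}\rho_{n,i} = (a_i + w_{n,i})\rho_{n,i}. \qquad (5.12)$$

The AR(M) system is written in state-space form as follows:

$$\tilde{\mathbf{y}}_{n+1} = \breve{\mathbf{A}}_n\tilde{\mathbf{y}}_n + \mathbf{G}\xi_n, \qquad \tilde{y}_n = \mathbf{C}^T\tilde{\mathbf{y}}_{n+1}, \qquad (5.13)$$

where

$$\breve{\mathbf{A}}_n = \begin{bmatrix} \breve{a}_{n,1} & \breve{a}_{n,2} & \cdots & \breve{a}_{n,M} \\ \multicolumn{3}{c}{\mathbf{I}_{(M-1)\times(M-1)}} & \mathbf{0} \end{bmatrix}, \qquad \tilde{\mathbf{y}}_n = [\tilde{y}_{n-1}, \tilde{y}_{n-2}, \ldots, \tilde{y}_{n-M}]^T,$$

$$\mathbf{G} = \mathbf{C} = [1, 0, \ldots, 0]^T, \qquad \breve{\mathbf{a}}_n = [\breve{a}_{n,1}, \breve{a}_{n,2}, \ldots, \breve{a}_{n,M}]^T.$$

The watermark coefficients $\{w_{n,i}\}_{i=1}^{M}$ are such that the time-varying AR coefficients are first-order stationary, with $E[\breve{a}_{n,i}] = a_i$ and $|\breve{a}_{n,i} - a_i| < L$.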
The perturbed AR parameter theory is used to determine how well the AR model in equation (5.13) is approximated by the model

$$\mathbf{y}_{n+1} = \mathbf{A}\mathbf{y}_n + \mathbf{G}\xi_n, \qquad y_n = \mathbf{C}^T\mathbf{y}_{n+1}. \qquad (5.14)$$

In the above equation, $\mathbf{A} = E[\breve{\mathbf{A}}_n]$. The vector $\mathbf{y}_n$ is defined analogously to $\tilde{\mathbf{y}}_n$ in equation (5.13). As demonstrated in [48], the small perturbation condition ($|\breve{a}_{n,i} - a_i| < L$) is similarly found to be equivalent to condition (5.8) of the theorem. … Here, $r(\mathbf{A})$ represents the spectral radius of the matrix $\mathbf{A}$. In Lemma 5.6.10 of [53], the matrix norm $\|\cdot\|_*$ is given by

$$\|\mathbf{A}\|_* = \|\mathbf{D}_t\mathbf{U}^T\mathbf{A}\mathbf{U}\mathbf{D}_t^{-1}\|_1 = \|(\mathbf{U}\mathbf{D}_t^{-1})^{-1}\mathbf{A}(\mathbf{U}\mathbf{D}_t^{-1})\|_1, \qquad (5.16)$$

in which $\|\cdot\|_1$ is the maximum column sum matrix norm induced by the $\ell_1$ vector norm, and $\mathbf{D}_t = \mathrm{diag}(t, t^2, t^3, \ldots, t^M)$ with $t > 0$ sufficiently large. The matrix $\mathbf{U}$ is obtained from the Schur decomposition $\mathbf{A} = \mathbf{U}\boldsymbol{\Delta}\mathbf{U}^T$, where $\boldsymbol{\Delta}$ is an upper triangular matrix whose main diagonal comprises the eigenvalues of $\mathbf{A}$. The vector norm compatible with the induced matrix norm $\|\cdot\|_*$ is the $\ell_1$ norm, since $\|\mathbf{D}_t\mathbf{U}^T\mathbf{A}\mathbf{U}\mathbf{D}_t^{-1}\mathbf{x}\|_1 \le \|\mathbf{D}_t\mathbf{U}^T\mathbf{A}\mathbf{U}\mathbf{D}_t^{-1}\|_1\|\mathbf{x}\|_1$ and $\mathbf{D}_t\mathbf{U}^T\mathbf{I}\mathbf{U}\mathbf{D}_t^{-1} = \mathbf{I}$, the identity matrix.

It is reasonable to assume that $\|\tilde{y}_n\|_*$ is bounded by W, a non-negative finite number. The resulting bound is determined in [48,49] and is given by … $[-0.1025, -0.1234, 0.0289, -0.0429, 0.0056, -0.0368, -0.0465, 0.0371]$. Applying the perturbed parameter theory gives a value of 11.03 for the right-hand side of equation (5.21). It was verified experimentally that the $\ell_1$ norm of the difference between the stegosignal and the coversignal is bounded by 11.03. A tighter upper bound would be of greater significance to practical implementations of parametric watermarking. The relatively high value of 11.03 for the right-hand side of (5.21) arises because the parameter perturbations have higher energy than the underlying requirement in the formulation of the theorem [conditions (5.8) and (5.9)].
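The algebra behind (5.11) and (5.12) can be checked numerically. With $\rho_{n,i} = y_{n-i}/\tilde{y}_{n-i}$, the coefficients $\breve{a}_{n,i} = (a_i + w_{n,i})\rho_{n,i}$ reproduce the stegosignal as an AR recursion on its own past. They also decompose as $a_i + a_i(\rho_{n,i} - 1) + w_{n,i}\rho_{n,i}$. The NumPy sketch below uses arbitrary test data, since the identities hold for any $\xi_n$ and any nonzero $\tilde{y}_{n-i}$. It treats the array index as n, and its function names are hypothetical.

```python
import numpy as np


def alternating_stego(y, xi, a, w):
    """Stegosignal of (5.11): y~_n = sum_i (a_i + w_{n,i}) y_{n-i} + xi_n with
    w_{n,i} = +w_i for odd n and -w_i for even n (n is the array index here).
    The first M samples are copied from the cover."""
    y = np.asarray(y, dtype=float)
    M = len(a)
    stego = y.copy()
    for n in range(M, len(y)):
        sign = 1.0 if n % 2 == 1 else -1.0
        past = y[n - M:n][::-1]  # [y_{n-1}, ..., y_{n-M}]
        stego[n] = (a + sign * w) @ past + xi[n]
    return stego


def time_varying_coeffs(y, stego, a, w, n):
    """Return (a_breve_n, rho_n) with rho_{n,i} = y_{n-i} / y~_{n-i} and
    a_breve_{n,i} = (a_i + w_{n,i}) rho_{n,i}, as in (5.12)."""
    M = len(a)
    sign = 1.0 if n % 2 == 1 else -1.0
    rho = y[n - M:n][::-1] / stego[n - M:n][::-1]
    return (a + sign * w) * rho, rho
```

For each n past the first few samples, `a_breve @ stego[n-M:n][::-1] + xi[n]` equals `stego[n]` up to rounding. This is exactly the time-varying AR form used in the state-space model (5.13).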
Chapter 6
Conclusions

The dissertation presents a general approach to watermarking of speech signals based on LP, LSP, LAR, IS, and PARCOR parametric models. The dissertation focuses on embedding watermark information by directly or indirectly modifying the long-term LP parameters of speech. Parametric watermarking incorporates characteristics of SS watermarking algorithms as well as those of integration-by-synthesis techniques. These aspects strongly influence the fidelity, security, and robustness characteristics of the technique. Watermark recovery is treated as a system identification problem involving LSE estimation. The watermark information is concentrated during the embedding and recovery phases, while it is temporally and spectrally distributed otherwise. The distributed nature of the watermark, combined with LSE estimation during recovery, contributes to watermark robustness.

The dissertation initially focused on speech watermarking in the LP domain. In the experiments presented here, and in many others, LP parametric watermarking has proven robust to most common forms of attack. An example parametric watermark detector has been presented to assess performance. The noise in the parameter domain was found to be Gaussian distributed when white or colored noise was added to the stegosignal in the time domain. Watermark coefficients were selectively normalized to parameter magnitudes, $1/|a_i|$ whenever $|a_i| > 1$. This rendered the parameter noise affecting the watermark coefficients independent of the original predictor coefficients. Through this selective normalization, watermark detection can be treated as a signal detection problem in the presence of Gaussian noise. Very low false-alarm rates are achieved.

The method of Lagrange multipliers can be used to obtain optimally robust watermarks from perturbed LP coefficients selected from the membership set.
SMF optimization is, however, not useful against attacks that are independent of the stegosignal. For applications limited by computational complexity, and where the energy of the watermark signal is considered the main determinant of robustness, searching the hyperellipsoidal boundary at intermittent points yields more robust watermarks than the central estimate of the membership set. The use of SMF in obtaining watermarks robust to filtering, quantization, and combination attacks is demonstrated.

The fidelity and robustness aspects of LP, LSP, LAR, IS, and PARCOR parametric watermarking algorithms were compared. Stegosignals obtained by LP and LAR watermarking are generally of high fidelity even at a low CWRseg of 7 dB. Although LSPs cannot be watermarked at a CWRseg of 7 dB, LSP-based watermarking is highly robust to additive noise and to the G.711 and G.726 codecs even at a CWRseg of 27 dB. In general, parametric watermarking is much less robust to the CELP and LPC10 codecs than to the G.711, G.726, and GSM codecs. However, the quality of speech decompressed by the low-bit-rate CELP and LPC10 codecs is itself very low.

An application of AR perturbed parameter theory to speech watermarking is presented, and bounds are obtained on the watermark signal for small parameter perturbations.

Parametric watermarking algorithms can be used for applications such as content management, broadcast monitoring, owner identification, and copyright protection. Parametric watermarking is highly robust to additive noise, quantization errors, speech codecs such as G.711, G.726, and GSM, and cropping. Based on the requirements of an application, the fidelity and robustness of parameter-embedded watermarks can be systematically adjusted.

Bibliography

[1] M.S. SEADLE, J.R. DELLER, JR. and A. GURIJALA, “Why watermark?
The copyright need for an engineering solution,” Proceedings of ACM/IEEE Joint Conference on Digital Libraries (JCDL), Portland, July 2002.
[2] I.J. COX, M.L. MILLER and J.A. BLOOM, Digital Watermarking, Academic Press, 2002.
[3] N.F. JOHNSON, Z. DURIC and S. JAJODIA, Information Hiding: Steganography and Watermarking - Attacks and Countermeasures, Kluwer Academic Publishers, 2000.
[4] S. VOLOSHYNOVSKIY, S. PEREIRA, T. PUN, J.K. SU and J.J. EGGERS, “Attacks and benchmarking,” IEEE Communications Magazine, August 2001.
[5] A. GURIJALA and J.R. DELLER, JR., “Robust algorithm for watermark recovery from cropped speech,” Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Salt Lake City, May 2001.
[6] J.R. DELLER, JR., J.H.L. HANSEN and J.G. PROAKIS, Discrete-Time Processing of Speech Signals (2d ed.), IEEE Press, 2000.
[7] A. GURIJALA, J.R. DELLER, JR., M.S. SEADLE and J.H.L. HANSEN, “Speech watermarking through parametric modeling,” Proceedings of International Conference on Spoken Language Processing (ICSLP), Denver, CO, September 2002.
[8] S. HAYKIN, Adaptive Filter Theory (3d ed.), Prentice-Hall, 1996.
[9] J.H.L. HANSEN, B. ZHOU, M. AKBACAK, R. SARIKAYA and B.L. PELLOM, “Audio stream phrase recognition for a National Gallery of the Spoken Word: One small step,” Proceedings of ICSLP, Beijing, October 2000, pp. 1089-1092.
[10] I.J. COX, J. KILIAN, T. LEIGHTON and T. SHAMOON, “Secure spread spectrum watermarking for multimedia,” IEEE Transactions on Image Processing, vol. 6, no. 12, pp. 1673-1687, December 1997.
[11] F.J. RUIZ and J.R. DELLER, JR., “Digital watermarking of speech signals for the national gallery of the spoken word,” Proceedings of IEEE ICASSP, Istanbul, Turkey, May 2000, pp. 1089-1092.
[12] D. ANAND and U.C. NIRANJAN, “Watermarking medical images with patient information,” Proceedings of IEEE/EMBS Conference, Hong Kong, October 1998, pp. 703-706.
[13] S.C. MIAOU, C.H. HSU, Y.S.
TSAI and H.M. CHAO, “A secure data hiding technique with heterogeneous data-combining capability for electronic patient records,” Proceedings of the World Congress on Medical Physics and Biomedical Engineering: Electronic Healthcare Records, Chicago, July 2000.
[14] T. KALKER, G. DEPOVERE, J. HAITSMA and M. MAES, “A video watermarking system for broadcast monitoring,” Proceedings of SPIE IS&T/SPIE's 11th Annual Symposium on Electronic Imaging '99: Security and Watermarking of Multimedia Contents, Chicago, January 1999, vol. 3657.
[15] A.S. SPANIAS, “Speech Coding: A Tutorial Review,” Proceedings of the IEEE, vol. 82, no. 10, pp. 1541-1582, October 1994.
[16] Q. CHENG and J. SORENSEN, “Spread spectrum signalling for speech watermarking,” Proceedings of IEEE ICASSP, Salt Lake City, May 2001, vol. 3, pp. 1337-1340.
[17] M. HAGMULLER, H. HORST, A. KROPFL and G. KUBIN, “Speech watermarking for air traffic control,” Proceedings of 12th European Signal Processing Conference, Vienna, Austria, September 2004.
[18] M. HATADA, T. SAKAI, N. KOMATSU and Y. YAMAZAKI, “Digital watermarking based on process of speech production,” Proceedings of SPIE: Multimedia Systems and Application, 2002, vol. 4861.
[19] M. CELIK, G. SHARMA and A.M. TEKALP, “Pitch and Duration Modification for Speech Watermarking,” Proceedings of IEEE ICASSP, Philadelphia, PA, March 2005, vol. 2, pp. 17-20.
[20] E. MOULINES and F. CHARPENTIER, “Pitch-synchronous waveform processing techniques for text-to-speech synthesis using diphones,” Speech Communication, vol. 9, no. 5-6, pp. 453-467, December 1990.
[21] B. CHEN and G.W. WORNELL, “Quantization index modulation: A class of provably good methods for digital watermarking and information embedding,” IEEE Transactions on Information Theory, vol. 47, no. 4, pp. 1423-1443, May 2001.
[22] A.M. KONDOZ, Digital Speech: Coding for Low Bit Rate Communication Systems (2d ed.), John Wiley & Sons, 2004.
[23] S. GOLLAMUDI, S. NAGARAJ, S.
KAPOOR and Y.F. HUANG, “SMART: A toolbox for set-membership filtering,” Proceedings of 1997 European Conference on Circuit Theory and Design, Budapest, Hungary, 1997.
[24] S. NAGARAJ, S. GOLLAMUDI, S. KAPOOR and Y.F. HUANG, “BEACON: An adaptive set-membership filtering technique with sparse updates,” IEEE Transactions on Signal Processing, vol. 47, no. 11, pp. 2928-2941, November 1999.
[25] J.R. DELLER, JR. and Y.F. HUANG, “Set-membership identification and filtering for signal processing applications,” Circuits, Systems, and Signal Processing (Special issue on signal processing and its applications), vol. 21, no. 1, pp. 69-82, January 2002.
[26] J.R. DELLER, JR., M. NAYERI and S.F. ODEH, “Least square identification with error bounds for real-time signal processing and control,” Proceedings of the IEEE, vol. 81, pp. 813-849, June 1993.
[27] J.R. DELLER, JR., “Set membership identification in digital signal processing,” IEEE Acoustics, Speech and Signal Processing Magazine, vol. 6, no. 4, pp. 4-20, October 1989.
[28] S. BOYD and L. VANDENBERGHE, Convex Optimization, Cambridge University Press, 2004.
[29] A. GURIJALA and J.R. DELLER, JR., “Speech Watermarking by Parametric Embedding with an ℓ∞ Fidelity Criterion,” Proceedings of Eurospeech-2003, Geneva, Switzerland, September 2003, pp. 2933-2936.
[30] SPEECH FILES, http://www.egr.msu.edu/~deller/Parankg/WAVFILES.
[31] D. GRUHL, A. LU and W. BENDER, “Echo hiding,” Lecture Notes in Computer Science; Proceedings of the First International Workshop on Information Hiding, Cambridge, UK, 1996, vol. 1174, pp. 293-315.
[32] S. WANG, A. SEKEY and A. GERSHO, “An objective measure for predicting subjective quality of speech coders,” IEEE Journal on Selected Areas in Communications, vol. 10, no. 5, pp. 819-829, June 1992.
[33] S.A. CRAVER, N. MEMON, B.-L. YEO and M.
YEUNG, “Resolving Rightful Ownerships with Invisible Watermarking Techniques: Limitations, Attacks, and Implications,” IEEE Journal on Selected Areas in Communications - Special issue on Copyright and Privacy Protection, vol. 16, no. 4, pp. 573-586, May 1998.
[34] J.J. HERNANDEZ, M. AMADO and F. PEREZ-GONZALEZ, “DCT-domain watermarking techniques for still images: Detector performance analysis and a new structure,” IEEE Transactions on Image Processing, vol. 9, no. 1, pp. 55-68, January 2000.
[35] M. BARNI, F. BARTOLINI, A.D. ROSA and A. PIVA, “Optimal decoding and detection of multiplicative watermarks,” IEEE Transactions on Signal Processing, vol. 51, no. 4, pp. 1118-1123, April 2003.
[36] TR CHEN and T. CHEN, “A framework for optimal blind watermark detection,” Proceedings of ACM Multimedia and Security Workshop, Ottawa, Canada, October 2001.
[37] J.P.M.G. LINNARTZ, A.C.C. KALKER and G.F. DEPOVERE, “Modeling the false-alarm and missed detection rate for electronic watermarks,” Lecture Notes in Computer Science, vol. 1525, pp. 329-343, Springer-Verlag, 1998.
[38] M.L. MILLER and J.A. BLOOM, “Computing the probability of false watermark detection,” Proceedings of the Third Workshop on Information Hiding, Dresden, Germany, 1999, pp. 146-158.
[39] A. GURIJALA and J.R. DELLER, JR., “Detector design for parametric speech watermarking,” IEEE International Conference on Multimedia and Expo (ICME), Amsterdam, The Netherlands, July 2005, pp. 251-255.
[40] H.V. POOR, An Introduction to Signal Detection and Estimation (2d ed.), Springer-Verlag, 1994.
[41] P.J. PRICE, “A database for continuous speech recognition in a 1000-word domain,” Proceedings of IEEE ICASSP, New York, vol. 11, pp. 651-654, 1988.
[42] G.R. COOPER and C.D. MCGILLEM, Modern Communications and Spread Spectrum, McGraw-Hill Book Company, 1996.
[43] D. KUNDUR and D.
HATZINAKOS, “Diversity and attack characterization for improved robust watermarking,” IEEE Transactions on Signal Processing, vol. 49, no. 10, pp. 2383-2396, October 2001.
[44] J.G. PROAKIS and D.G. MANOLAKIS, Digital Signal Processing: Principles, Algorithms, and Applications (3rd ed.), Prentice-Hall, 1996.
[45] F.A.P. PETITCOLAS, R.J. ANDERSON and M.G. KUHN, “Attacks on copyright marking systems,” Proceedings of Second Workshop on Information Hiding, Portland, Oregon, April 1998, pp. 218-238.
[46] HAWKVOICE FROM HAWK SOFTWARE, http://www.hawksoft.com/hawkvoice.
[47] A. GURIJALA and J.R. DELLER, JR., “Speech watermarking with objective fidelity and robustness criteria,” Proceedings of Asilomar Conference on Signals, Systems, and Computers, Pacific Grove, CA, November 2003.
[48] J.R. DELLER, JR. and Z. GULBOY, “Simplified models for perturbed parameter Markov equations with application to ARMA systems,” International Journal on Systems Science, vol. 14, no. 10, pp. 1185-1190, 1983.
[49] J.R. DELLER, JR. and Z. GULBOY, “A correction to ‘Simplified models for perturbed parameter Markov equations with application to ARMA systems’,” International Journal on Systems Science, vol. 15, no. 8, pp. 915-916, 1984.
[50] S.R. QUACKENBUSH, T.P. BARNWELL and M.A. CLEMENTS, Objective Measures of Speech Quality, Prentice-Hall, NJ, 1988.
[51] S. KAY, Modern Spectral Estimation: Theory and Application, Prentice-Hall signal processing series, NJ, 1988.
[52] S. BAUDRY, J.-F. DELAIGLE, B. SANKUR, B. MACQ and H. MAITRE, “Analysis of error correction strategies for typical communication channels in watermarking,” Signal Processing, vol. 81, pp. 1239-1250, 2001.
[53] R.A. HORN and C.R. JOHNSON, Matrix Analysis, Cambridge University Press, 1996.