This is to certify that the thesis entitled DEVELOPMENT OF THE DETECTOR STAGE OF AN ADAPTIVE ALERTING DEVICE FOR PEOPLE WITH HEARING DISABILITIES presented by Viktor Adut has been accepted towards fulfillment of the requirements for the M.S. degree.

DEVELOPMENT OF THE DETECTOR STAGE OF AN ADAPTIVE ALERTING DEVICE FOR PEOPLE WITH HEARING DISABILITIES

By

Viktor Adut

A THESIS

Submitted to Michigan State University in partial fulfillment of the requirements for the degree of

MASTER OF SCIENCE

Department of Electrical Engineering

1998

ABSTRACT

DEVELOPMENT OF THE DETECTOR STAGE OF AN ADAPTIVE ALERTING DEVICE FOR PEOPLE WITH HEARING DISABILITIES

By

Viktor Adut

Since their inception, devices which signal deaf people to the occurrence of alerting signals (fire alarms, doorbells, etc.) have changed little. Their use is limited to highly constrained environments and relatively few sounds. This thesis reports the results of the first phase of a commercial research project between Michigan State University's Speech Processing Laboratory and Silent Call Corporation whose goal is to develop a digital-signal-processing-based portable device which "listens" to the environment and detects the presence of alerting signals. The device will also have the capability to "learn" new sound sets. In this work, algorithms for detection of alerting signals under realistic noise environments are developed, and initial simulation results quantifying the performance of these algorithms are presented.

ACKNOWLEDGMENTS

I am indebted to Professor John R. Deller, Jr. for the guidance he provided.
I would like to thank my committee members, Professor Marvin Siegel and Professor Robert Nowak, for their insightful comments. I am also thankful to Mr. Dale Joachim (M.S., Ph.D. candidate) for numerous invaluable discussions, especially during the initial stages of the project.

This study is based in part on work supported by State Research Fund grant monies awarded to Silent Call Corporation by the Michigan Jobs Commission. Any opinions, findings, conclusions, or recommendations expressed in this thesis are solely mine and do not necessarily reflect the views of the Michigan Jobs Commission.

TABLE OF CONTENTS

LIST OF TABLES
LIST OF FIGURES
1 Introduction
2 Detector Architecture
    2.1 Overview
    2.2 Operating Environment
        2.2.1 Classification of Alerting Signals
        2.2.2 Noise Environment
    2.3 Sampling Stage
    2.4 Choosing a Detection Strategy
        2.4.1 Spectral Thresholding
        2.4.2 Dynamic Time Warping and Hidden Markov Models
        2.4.3 Artificial Neural Networks
    2.5 A Functional Block Description of Detector Architecture
3 Detector Design
    3.1 Design Methodology
    3.2 Detection Filters
        3.2.1 Matched Filter
        3.2.2 Eigenfilter
        3.2.3 Prediction Error Filter
        3.2.4 Autocorrelation Detector
    3.3 Decision Block
        3.3.1 Synchronization
        3.3.2 Decision Rule
    3.4 Postprocessor
4 Performance Evaluation
    4.1 Introduction
    4.2 Benchmarking with the SNR Improvement Factor
        4.2.1 Formal Developments
        4.2.2 Simulation Results
    4.3 Benchmarking Using Detection Rates
        4.3.1 Introduction
        4.3.2 Performance of the Decision Rule
        4.3.3 Performance of the Overall Detector
5 Conclusions and Considerations for Future Research
    5.1 Conclusions
    5.2 Future Research
BIBLIOGRAPHY

LIST OF TABLES

4.1 Design details of the detection filters used in the simulations.
4.2 SNR improvement factors in decibels for matched filter.
4.3 SNR improvement factors in decibels for prediction error filter.
4.4 SNR improvement factors in decibels for eigenfilter.
4.5 SNR improvement factors in decibels for autocorrelation detector.

LIST OF FIGURES

2.1 A fire alarm signal consisting of a stationary region and a region of silence.
2.2 A doorbell signal consisting of two stationary regions.
2.3 A police siren signal consisting of two stationary signal segments connected by a transition region.
2.4 A crying baby signal consisting of two stationary signal segments connected by transition regions.
2.5 Several example stationary signal segments and their frequency spectra.
2.6 Several example noise waveforms.
2.7 The sampling stage.
2.8 The basic detector architecture.
3.1 Framewise processing of a stationary signal segment.
4.1 Power curves of the matched filter detector for L=750 samples and several values of a.
4.2 Power curves of the matched filter detector for L=1000 samples and several values of a.
4.3 Power curves of the matched filter detector for L=1500 samples and several values of a.
4.4 Power curves of the eigenfilter detector for L=750 samples and several values of a.
4.5 Power curves of the eigenfilter detector for L=1000 samples and several values of a.
4.6 Power curves of the eigenfilter detector for L=1500 samples and several values of a.
4.7 Power curves of the autocorrelation detector for L=750 samples and several values of a.
4.8 Power curves of the autocorrelation detector for L=1000 samples and several values of a.
4.9 Power curves of the autocorrelation detector for L=1500 samples and several values of a.
4.10 Overall error rate of the matched filter detector for L=1500 samples and several values of a.
4.11 Overall error rate of the eigenfilter detector for L=1500 samples and several values of a.
4.12 Overall error rate of the autocorrelation detector for L=1500 samples and several values of a.

CHAPTER 1

Introduction

During the past decades, advances in biomedical engineering combined with breakthroughs in microelectronics have resulted in significant improvements in devices for people with hearing disabilities. For people with mild to profound hearing impairments, the most widely used solution is a classical hearing aid fitted in the ear canal. If the auditory nerves are damaged, a cochlear implant consisting of clusters of electrodes which directly excite the nerves in the cochlea can be considered [8]. However, these techniques are of limited success when hearing loss is severe.
They result in poor sound quality, and the problems can be so serious that the owner may opt not to use them unless absolutely necessary, or may reject them altogether [18]. If neither of these solutions helps, the inability to respond to emergency signals becomes a potentially life-threatening problem. In this case the deaf person must use alerting devices that signal the occurrence of acoustic events such as doorbells, telephones, smoke detectors, etc.

Such alerting devices have changed little over the past fifty years. They usually consist of a transmitter hard-wired to the signaling device, which activates a blinking light or a vibratory device carried by the system user. The operation of these devices is extremely constrained, and their use is limited to the everyday environments of the deaf person.

Recently, a digital-signal-processing (DSP) based adaptive alerting device which overcomes these limitations was proposed, and its feasibility was studied in [5]. Using DSP technology, one can develop a portable device which "listens" to the environment and detects the presence of alerting signals. One can also equip this device with the capability to "learn" new sound sets. Current advances in microprocessor technology provide the computational power required for real-time implementation of such algorithms at low cost. Given that there are 28 million individuals with hearing impairments in the United States alone [18], the potential world-wide utility of this device is enormous.

This thesis was written as part of a joint research project between Michigan State University's Speech Processing Laboratory and Silent Call Corporation whose ultimate goal is to develop a commercially available adaptive alerting device for mass markets. The results of the first phase of this project are presented here.

Before studying the development of detection algorithms, we must concisely state the specifications of the alerting device.
Our goal is to develop a device which can detect the presence of alerting signals (sirens, doorbells, crying babies, etc.) under realistic noise environments encountered in everyday life (people talking, highway noise, air conditioner noise, etc.). The algorithm must be computationally efficient, work with limited memory, and lend itself to trainability.

In this study we pursue a detection-theoretic approach and focus on the development of the detector stage of the adaptive alerting device. In Chapter 2, the basic structure of the detector stage is developed in terms of functional blocks. In Chapter 3, several detection strategies are examined from a theoretical standpoint. Results of simulation studies are presented in Chapter 4. The final chapter discusses open points and outlines the course of future research.

CHAPTER 2

Detector Architecture

2.1 Overview

The goal of this chapter is to describe the detector stage of the alerting device in terms of functional blocks. The primary factors affecting any detection algorithm are the properties of the signals to be detected and those of the noise environment. Therefore, in Section 2.2, we start by classifying alerting signals according to the complexity of their time-domain waveforms and by analyzing the diverse acoustic noise environments under which the alerting device is expected to operate. In Section 2.3, we turn our attention to the sampling stage and explain how the automatic gain controller (AGC) used in this stage complicates the detection procedure. Next, in Section 2.4, we justify the decision to apply a detection-theoretic technique to the development of the adaptive alerting device, and discuss the shortcomings of other possible approaches. Finally, in Section 2.5, our observations lead to the formulation of an alerting signal detector consisting of three stages.
2.2 Operating Environment

2.2.1 Classification of Alerting Signals

Let us consider Figures 2.1-2.4, which show frames of several alerting signals. Each frame is indexed beginning from time 0 and is long enough to represent the salient properties of the individual alerting signal. Like every example signal in this thesis, they were sampled at 8 kHz and quantized to 8 bits.

First, let us focus on the fire alarm signal in Figure 2.1. Every realization of this signal consists of the repetition of a stationary signal segment followed by a region of silence. The duration of these regions is fixed for every realization.

Next, let us examine the doorbell signal shown in Figure 2.2. It is composed of two connected stationary signal segments. Depending on how quickly one pushes and releases the doorbell button, the length of each segment in the generated signal will differ; however, their order will be the same.

Finally, let us analyze the crying baby signal shown in Figure 2.4. This signal consists of several quasi-stationary signal segments connected by transition regions; both their duration and temporal order will be different every time the baby cries.

Thus we conclude that alerting signals are locally stationary. Based on the above observations, we can classify them in order of increasing complexity as follows:

• Type I alerting signal: The temporal order and duration of each stationary signal segment are fixed for every realization (e.g., fire alarm).

• Type II alerting signal: The temporal order of the stationary signal segments is the same across realizations; however, their durations differ (e.g., doorbell).

• Type III alerting signal: Both the temporal order and the durations of the stationary signal segments vary across realizations (e.g., crying baby).

[Figure 2.1, panel (a): A fire alarm signal. Horizontal axis: Time (norm-sec).]
[Figure 2.1, panels (b) A stationary signal segment and (c) A region of silence. Horizontal axes: Time (norm-sec).]
Figure 2.1. A fire alarm signal consisting of a stationary region and a region of silence.

[Figure 2.2, panels (a) A doorbell signal, (b) First stationary signal segment, and (c) Second stationary signal segment. Horizontal axes: Time (norm-sec).]
Figure 2.2. A doorbell signal consisting of two stationary regions.

[Figure 2.3, panels (a) A police siren signal, (b) First stationary signal segment, (c) Second stationary signal segment, and (d) Transition region. Horizontal axes: Time (norm-sec).]
Figure 2.3. A police siren signal consisting of two stationary signal segments connected by a transition region.

[Figure 2.4, panels (a) A crying baby signal, (b) First stationary signal segment, (c) Second stationary signal segment, and (d) An example transition region. Horizontal axes: Time (norm-sec).]
Figure 2.4. A crying baby signal consisting of two stationary signal segments connected by transition regions.
Next, let us briefly consider the spectral properties of the stationary signal segments. From Figure 2.5, we see that the energy of the stationary signal segments of interest is concentrated in narrow spectral peaks.

In the subsequent sections we exploit the conclusions of the above analysis to derive a detection scheme tailored to the properties of alerting signals.

2.2.2 Noise Environment

The adaptive alerting device will operate under a multitude of different acoustic noise environments encountered in everyday life. As the example waveforms given in Figure 2.6 indicate, the noise environment can be described as additive, non-Gaussian, and nonstationary. With the exception of very restricted cases, such noise environments cannot be reliably modeled [9].

Comparable situations are frequently encountered in several major application areas of signal processing, such as communication and radar systems. In these cases, adaptive noise cancelers are widely used to reduce background noise [6]. However, for reasons explained in Section 2.3, our inability to estimate the instantaneous signal and noise powers bars us from utilizing an adaptive noise canceler.

Due to these difficulties, in the remainder of the thesis we shall not attempt to model the noise environment. Instead, we develop the detection algorithm under the assumption that the adaptive alerting device operates under independent, identically distributed Gaussian noise. The justification for this assumption will be provided in Section 3.2.

Another important factor in designing the detector is the signal-to-noise ratio (SNR) at which it is expected to operate. Since we do not expect to outperform the human auditory system, we target SNRs higher than -2 dB.
[Figure 2.5, panels (a) A stationary signal segment from a telephone signal and its frequency spectrum, (b) A stationary signal segment from a buzzer signal and its frequency spectrum, and (c) A stationary signal segment from a tornado siren signal and its frequency spectrum. Horizontal axes: Time (norm-sec) for the waveforms, Frequency (radians) for the spectra.]
Figure 2.5. Several example stationary signal segments and their frequency spectra.

[Figure 2.6, panel (a): Highway background noise. Horizontal axis: Time (norm-sec).]
[Figure 2.6, panels (b) People talking and (c) Computer keyboard. Horizontal axes: Time (norm-sec).]
Figure 2.6. Several example noise waveforms.

2.3 Sampling Stage

Now let us turn our attention to the sampling stage of the adaptive alerting device. This stage consists of four blocks: microphone, amplifier, sample-and-hold circuit, and analog-to-digital (A/D) converter (see Figure 2.7).

During the amplification stage, the weak signal from the microphone (typically 100-200 mV) is brought to a level suitable for processing by the A/D converter (typically 5-10 V). The desired amplification factor depends on the environment in which the device operates. In a noisy environment (e.g., background highway noise) the amplification should be less than in a quiet environment (e.g., a library). Therefore we use an automatic gain controller (AGC) to adjust the gain of the amplifier. The AGC ensures that the microphone signal is amplified maximally without causing saturation. Thus the resulting sampled signal has approximately constant amplitude. This complicates the detection procedure in the following ways:

• The instantaneous power of some Type I alerting signals (e.g., fire alarms, smoke detectors, and telephones), when considered as a function of time, is simply a rectangular pulse train. The detection of such signals is a well-studied problem in digital communications [3]. However, because of the effects of the AGC, the instantaneous power of the signal at the input of the detector stage will be approximately constant; therefore we cannot use the instantaneous power of the received signal as a detection feature.

• The instantaneous signal and noise levels cannot be estimated. Therefore the parameters of the detection algorithm must be independent of the instantaneous signal and noise levels.
Unfortunately, most standard statistical signal processing and pattern recognition algorithms require a priori knowledge of signal and noise powers (see, e.g., [14, 16]). This observation suggests using techniques from robust [10] and non-parametric detection theory [11]; however, these schemes are not widely applicable.

Thus we must develop a detection algorithm which is robust to the scaling introduced by the AGC; the parameters of the detector should not depend on signal and noise powers. We shall model this effect by assuming that the instantaneous power of the signal at the detector input is constant. Strictly speaking, this model is inaccurate: the fact that the input signal has almost constant amplitude does not imply that it has constant power. However, this assumption will still result in an algorithm that is indifferent to the effects of the AGC stage and thus will serve the present purpose.

2.4 Choosing a Detection Strategy

As we mentioned in Chapter 1, we shall pursue a detection-theoretic approach to the development of the alerting signal detector. Of course, this is not the only way of solving this problem; techniques from other fields such as speech recognition and neural networks could also be applied. In this section, we justify our decision to use a detection-theoretic scheme, and discuss the reasons for ruling out other approaches.

As described in Section 2.2.1, alerting signals consist of long stationary regions connected by brief transition regions. Although the transitions play an important role in human perception of auditory signals [1], they are very difficult to model due to their nonstationary nature. On the other hand, for our purposes, the stationary regions contain sufficient information for detection of alerting signals. This observation enables us to reduce the detection of alerting signals to the detection of stationary signal segments.
To form an analogy with communications theory, the stationary signal segments composing the alerting signals to be detected form a "symbol alphabet" [19]. Thus we can use a suitable modification of the standard detector structures used in communications receivers for detection of stationary signal segments.

[Figure 2.7 block diagram: Microphone Signal -> Amplifier -> Sample and Hold Circuit -> A/D Converter -> Discrete Time Signal]
Figure 2.7. The sampling stage.

In addition to this detection-theoretic scheme, we have investigated the feasibility of techniques based on spectral analysis, dynamic time warping (DTW), hidden Markov models (HMM), and artificial neural networks (ANN). The shortcomings and disadvantages of these approaches are discussed in the following paragraphs. It should be noted, however, that the primary advantage of pursuing a detection-theoretic approach is that, unlike the HMM-, DTW-, or ANN-based approaches discussed below, it promises to result in a computationally efficient solution.

2.4.1 Spectral Thresholding

A very simple detection algorithm could be based on the spectral properties of alerting signal segments. The fact that the energy of stationary signal segments is concentrated in narrow spectral regions suggests using a detector consisting of a bank of bandpass filters and a comparator. However, under noisy conditions, the spectral peaks are not as dominant as under noiseless conditions; combined with the inability to estimate SNR, this makes reliable threshold selection impossible. Additionally, signal processing techniques based on spectral thresholding have high false-alarm and miss rates in general (for example, formant estimation based on spectral thresholding [4]).

2.4.2 Dynamic Time Warping and Hidden Markov Models

Another idea is to use techniques from speech recognition, such as dynamic time warping (DTW) [4, 17] and hidden Markov modeling (HMM) [4, 15].
They are both tailored to handle the variability among different utterances of the same word; therefore they can easily cope with the differences among realizations of Type II and III alerting signals.

However, with decreasing SNR, the performance of these algorithms drops drastically. This lack of robustness is caused by the distance metrics used in DTW and HMM algorithms. Regardless of the choice of acoustic features, all distance measures are very sensitive to noise. Thus we rule out the possibility of using algorithms from the speech recognition field in the detector stage.

2.4.3 Artificial Neural Networks

Yet another detector structure could be designed using neural networks in combination with other suitable techniques [5]. Neural networks naturally account for commonly encountered properties of real-life data, including nonstationarity and non-Gaussianity [7]. However, their implementation can be very complex and computationally demanding; therefore, in practice, their use is reserved for situations where signal modeling is difficult or impossible.

2.5 A Functional Block Description of Detector Architecture

Before we proceed to the development of the detector architecture, we introduce some notation. We henceforth use the symbol $\mathcal{A}$ to denote an alerting signal and the symbol $\mathcal{S}$ to denote a stationary signal segment. Using a set-like notation, we shall describe an alerting signal consisting of $n$ stationary signal segments as $\mathcal{A} = \{\mathcal{S}_1, \mathcal{S}_2, \ldots, \mathcal{S}_n\}$.

Suppose we want to develop an algorithm for detection of the signals shown in Figures 2.1-2.4. Using the conventions introduced above, the fire alarm signal can be expressed as $\mathcal{A}_1 = \{\mathcal{S}_1\}$, the siren signal as $\mathcal{A}_2 = \{\mathcal{S}_2, \mathcal{S}_3\}$, the doorbell signal as $\mathcal{A}_3 = \{\mathcal{S}_4, \mathcal{S}_5\}$, and the crying baby signal as $\mathcal{A}_4 = \{\mathcal{S}_6, \mathcal{S}_7, \mathcal{S}_8\}$. Thus the "symbol alphabet" is $\mathcal{U} = \{\mathcal{S}_1, \mathcal{S}_2, \ldots, \mathcal{S}_8\}$.
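The set-like notation above can be made concrete with a small data-structure sketch. This is only an illustration: the dictionary layout, the string labels, and both function names are hypothetical and do not appear in the thesis. Each alerting signal is stored as the ordered tuple of its stationary segments, and the symbol alphabet is their union.

```python
# Hypothetical encoding of the four example alerting signals: each entry maps
# a signal name to the ordered tuple of its stationary signal segments.
ALERTING_SIGNALS = {
    "fire_alarm": ("S1",),
    "siren": ("S2", "S3"),
    "doorbell": ("S4", "S5"),
    "crying_baby": ("S6", "S7", "S8"),
}


def symbol_alphabet(signals):
    """Return every stationary segment used by any signal, in first-seen order."""
    alphabet = []
    for segments in signals.values():
        for segment in segments:
            if segment not in alphabet:
                alphabet.append(segment)
    return alphabet


def signals_containing(segment, signals=ALERTING_SIGNALS):
    """Return the names of the alerting signals that contain the given segment."""
    return [name for name, segments in signals.items() if segment in segments]
```

Keeping the segments as ordered tuples rather than unordered sets preserves the temporal order, which a later stage may need in order to tell alerting signals apart.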
This notation leads us to a detector consisting of the following three stages (Figure 2.8):

• Filter Bank Stage

• Decision Block

• Postprocessor

The first two stages represent extensions of similar stages in a standard communications receiver for M-ary symbol sets. They act as a detector for stationary signal segments. In a sense, these two stages take the received signal and convert it to a higher-level representation.

The postprocessor takes the output of the decision block and, using the duration and temporal order of the stationary signal segments, makes the following decisions:

• Whether an alerting signal is present at the detector input or not.

• If an alerting signal is present, which one.

[Figure 2.8 block diagram: Sampled input signal -> Detection Filter 1, ..., Detection Filter M -> Decision Block -> Postprocessor]
Figure 2.8. The basic detector architecture.

CHAPTER 3

Detector Design

3.1 Design Methodology

The non-Gaussian and nonstationary nature of the noise environment, combined with the inability to estimate SNR, prevents the pursuit of a strictly analytical procedure for derivation of the detector stages. For the same reasons, we cannot impose standard optimality criteria from statistical signal processing [14] on the solution of this problem. These observations lead to the following design methodology, which we shall follow in the remainder of the thesis:

• Using ad hoc techniques as well as suitable modifications of standard schemes, we propose several potential solutions for each stage of the detector. Each method will only approximately meet the ideal specifications stated in Section 2.5 and will have different performance, cost, and implementation advantages.

• Because of the lack of analytical models, we cannot meaningfully quantify the performance of different candidates analytically.
Therefore, based on simulation studies, we shall weigh the tradeoffs of different approaches and choose the most suitable configuration of the designed alternative blocks for use in the final device.

In this chapter we shall implement the first step of this design strategy and develop the detector stages. Their performance evaluation will be the subject of the next chapter.

3.2 Detection Filters

Let $\mathcal{S}$ be a signal segment to be detected. In this section, we develop several candidate detection filters for $\mathcal{S}$. The treatment is based on the following assumptions:

• In practice, the realizations of $\mathcal{S}$ are of finite duration. However, we assume that an infinite-duration realization $s(n)$, $-\infty < n < \infty$, of $\mathcal{S}$ exists and that a noisy version of this signal is input to the detector. For example, suppose $\mathcal{S}$ is the stationary signal segment corresponding to the fire alarm signal shown in Figure 2.1b. In this case, we can form an "eternal" realization of $\mathcal{S}$ by infinitely concatenating the signal segment with itself. This assumption enables us to design detector filters without being concerned with timing problems. Synchronization issues will be discussed in Section 3.3.

• Another important problem in the design of detector filters is developing appropriate signal models. A careful analysis of the alerting signal waveforms shown in Figures 2.1-2.4 reveals that some stationary signal segments are best modeled as deterministic, whereas others are best modeled as stochastic. For example, let us compare the stationary signal segments shown in Figures 2.1b (fire alarm) and 2.4c (crying baby). Every realization of the first will be identical, i.e., it is deterministic. On the other hand, the second one will be slightly different every time the baby cries. Therefore it should be modeled as a stochastic process. In the following discussion, we assume that every stationary signal segment can be modeled in both ways.
Although this is not strictly correct, it will enable us to exploit techniques developed for the detection of both deterministic and stochastic signals.

• Suppose we observe a realization of $\mathcal{S}$ in background noise. Let us denote the SNR in decibels at the input and output of the detection filter as $\mathrm{SNR}_{in}$ and $\mathrm{SNR}_{out}$, respectively. The design objective will be to maximize the difference $|\mathrm{SNR}_{out} - \mathrm{SNR}_{in}|$. In other words, the detection filter must accomplish one of the following two tasks:

    • Amplify the signal component while suppressing the noise component;

    • Suppress the signal component while amplifying the noise component.

Because of the absence of analytic signal and noise models, we shall try to accomplish these goals in a heuristic manner. We call the difference $|\mathrm{SNR}_{out} - \mathrm{SNR}_{in}|$ the SNR improvement factor¹ and this design criterion the SNR improvement criterion.

Next, let us describe how the detection filters designed according to this criterion will perform. Let $\mathcal{S}_1, \mathcal{S}_2, \ldots, \mathcal{S}_M$ denote the $M$ stationary signal segments to be detected and let $F_i$ be the detection filter corresponding to $\mathcal{S}_i$, where $1 \le i \le M$. Without loss of generality, we can assume that a noisy realization of $\mathcal{S}_1$ is at the input of the detector. Assuming that the detection filters follow the first version of the design criterion (amplifying the signal while suppressing the noise), $F_1$ should "resonate" with the input signal and suppress the noise component. On the other hand, from the perspective of all other filters, the input signal should appear to be noise only. Therefore they should suppress the input signal as well as the noise as much as possible. Thus the average power at the output of $F_1$ should be much higher than the output power of the other filters. The decision block should exploit this fact to decide in favor of $\mathcal{S}_1$.

¹ This notion is analogous to the concept of demodulation gain in communications theory.
• For high-SNR, large-sample-size detection problems, improper assumptions regarding the noise environment do not significantly affect detector performance [9]. It is easy to see that the development of the detection filters falls into this category. Therefore, as we remarked in Section 2.2.2, whenever the need to model the noise environment arises, we shall simply assume that it is independent, identically distributed Gaussian noise.

Now we are ready to proceed with the development of detection filters.

3.2.1 Matched Filter

Let $\mathcal{S}$ be a stationary signal segment, and let $s(k)$, $-\infty < k < \infty$, be an “eternal” realization of $\mathcal{S}$, obtained as described above. Assuming that $\mathcal{S}$ is deterministic, $s(k)$ can be regarded as the endless repetition of a representative region consisting of samples $r(0), r(1), \ldots, r(N-1)$, i.e., it can be expressed as $s(n) = r(n \bmod N)$. For example, Figures 2.5a, b and c show the representative regions of a buzzer, telephone, and tornado siren signal, respectively.

Now, suppose that we observe a noisy realization $x(k) = s(k) + n(k)$, $-\infty < k < \infty$, of $\mathcal{S}$, where $n(k)$ denotes the noise sequence. An intuitive way of achieving a high SNR improvement factor is based on the cross-correlation sequence between the received signal and the representative region. Let

$$y(k) = \sum_{l=0}^{N-1} r(l)\,x(k+l), \qquad -\infty < k < \infty \tag{3.1}$$

be the cross-correlation sequence between $x(k)$, $-\infty < k < \infty$, and $r(0), r(1), \ldots, r(N-1)$. The sequence $y(k)$ can be expressed as the sum of the sequences $y_s(k) = \sum_{l=0}^{N-1} r(l)\,s(k+l)$ and $y_n(k) = \sum_{l=0}^{N-1} r(l)\,n(k+l)$. We expect $y_n(k)$, the cross-correlation between the noise signal and the representative region, to assume values close to zero. On the other hand, $y_s(k)$, the cross-correlation between the signal to be detected and the representative region, should take higher values. Thus (3.1) suppresses the noise component of $x(k)$ while amplifying the signal component and meets the SNR improvement criterion.
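The cross-correlation in (3.1) can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration only: the representative region is a hypothetical 100-sample sinusoid, the noise is unit-variance white Gaussian noise, and the helper names (`cross_correlate`, `mean_power`, `snr_db`) do not come from the thesis. Because the filter is linear, the signal and noise components are filtered separately to measure the SNR before and after.

```python
import math
import random


def cross_correlate(x, r):
    """Return y(k) = sum_{l=0}^{N-1} r(l) * x(k + l) for every k where the window fits."""
    n = len(r)
    return [sum(r[l] * x[k + l] for l in range(n)) for k in range(len(x) - n + 1)]


def mean_power(seq):
    """Average power (mean square) of a sequence."""
    return sum(v * v for v in seq) / len(seq)


def snr_db(signal, noise):
    """SNR in decibels computed from separate signal and noise components."""
    return 10.0 * math.log10(mean_power(signal) / mean_power(noise))


# Hypothetical representative region: 5 cycles of a sinusoid over N = 100 samples.
N = 100
r = [math.sin(2.0 * math.pi * 5 * n / N) for n in range(N)]

# "Eternal" realization s(n) = r(n mod N), truncated to 20 periods for the demo.
rng = random.Random(0)
s = [r[n % N] for n in range(20 * N)]
noise = [rng.gauss(0.0, 1.0) for _ in range(len(s))]

# Linearity lets us pass the signal and noise components through (3.1) separately.
y_s = cross_correlate(s, r)
y_n = cross_correlate(noise, r)

snr_in = snr_db(s, noise)
snr_out = snr_db(y_s, y_n)
improvement = abs(snr_out - snr_in)  # SNR improvement factor in dB
```

In this toy setting the output SNR exceeds the input SNR, which corresponds to the first version of the SNR improvement criterion (amplify the signal, suppress the noise).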
It is easy to see that (3.1) can be implemented using a matched filter $F(z)$ with impulse response $f(m) = r(N-1-m)$, $m = 0, \ldots, N-1$ [14]. However, at a sampling rate of 8 kHz, the representative region is typically 300-500 samples long. A filter with such a long impulse response is not suitable for real-time applications. Therefore we shall confine ourselves to approximate implementations of $F(z)$ which are computationally tractable. In particular, since we are only concerned with the average power at the output of $F(z)$, realizing its magnitude response is sufficient. To this end, any of the standard filter design techniques available in the literature can be used [13].

3.2.2 Eigenfilter

Let us model $\mathcal{S}$ as an ergodic wide-sense stationary random process and let $R$ be its autocorrelation matrix. Under white noise, the impulse response of the optimum linear detection filter maximizing the SNR improvement factor is given by $f(m) = v_{\max}(m)$, $m = 0, \ldots, N-1$, where $[v_{\max}(0) \ldots v_{\max}(N-1)]^T$ is the eigenvector corresponding to the largest eigenvalue of $R$ and $N$ is the size of $R$ [6]. This filter is called an eigenfilter and can be regarded as the stochastic counterpart of the matched filter introduced above.

3.2.3 Prediction Error Filter

As in the previous section, assume that $\mathcal{S}$ is a stochastic signal segment. Let $s(k)$, $-\infty < k < \infty$, be the random process corresponding to $\mathcal{S}$ and assume that it obeys an autoregressive (AR) model [2] of a known order. Let $F(z)$ be the minimum-mean-square-error one-step linear predictor for $s(k)$ [2]. Now, let us show how the one-step linear prediction error filter for $s(k)$, $F'(z) = 1 - z^{-1}F(z)$, can be employed as a detection filter.

First, suppose that $s(k)$, $-\infty < k < \infty$, is at the input of $F(z)$. The predictions made by $F(z)$ will be close to the actual values. Therefore, when $s(k)$ is at the input of $F'(z)$, the average output power will be lower than the average power of $s(k)$.
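As a rough numerical check of the claim that the prediction error filter attenuates a signal obeying the assumed AR model, the sketch below fits a predictor with the Levinson-Durbin recursion and compares powers. The AR(2) coefficients, the sample counts, and all function names are hypothetical choices for illustration; the thesis does not prescribe a particular estimation method.

```python
import math
import random


def mean_power(seq):
    """Average power (mean square) of a sequence."""
    return sum(v * v for v in seq) / len(seq)


def autocorr(x, max_lag):
    """Biased sample autocorrelation of x for lags 0..max_lag."""
    n = len(x)
    return [sum(x[k] * x[k + h] for k in range(n - h)) / n for h in range(max_lag + 1)]


def levinson_durbin(rho, order):
    """Solve the Yule-Walker equations for predictor coefficients a[0..order-1],
    so that s_hat(k) = sum_i a[i] * s(k - 1 - i)."""
    a = []
    err = rho[0]
    for m in range(1, order + 1):
        acc = rho[m] - sum(a[i] * rho[m - 1 - i] for i in range(m - 1))
        k = acc / err  # reflection coefficient of stage m
        a = [a[i] - k * a[m - 2 - i] for i in range(m - 1)] + [k]
        err *= 1.0 - k * k
    return a


def prediction_error(x, a):
    """Output of F'(z) = 1 - z^{-1} F(z): e(k) = x(k) - sum_i a[i] * x(k - 1 - i)."""
    p = len(a)
    return [x[k] - sum(a[i] * x[k - 1 - i] for i in range(p)) for k in range(p, len(x))]


# Hypothetical AR(2) process with a narrow spectral peak (poles at radius 0.98).
rng = random.Random(1)
phi1 = 2.0 * 0.98 * math.cos(0.3)
phi2 = -(0.98 ** 2)
s = [0.0, 0.0]
for _ in range(5000):
    s.append(phi1 * s[-1] + phi2 * s[-2] + rng.gauss(0.0, 1.0))
s = s[1000:]  # discard the start-up transient

a = levinson_durbin(autocorr(s, 2), 2)
e = prediction_error(s, a)
power_ratio = mean_power(e) / mean_power(s)  # well below 1 for a predictable signal
```

The ratio `power_ratio` is small because the fitted predictor removes most of the predictable structure from the signal, leaving roughly the driving noise.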
Next, assume that a noise process $n(k)$, $-\infty < k < \infty$, is at the input of $F(z)$. $F(z)$ will try to predict future values of $n(k)$ as if they belonged to a realization of $\mathcal{S}$. Except by coincidence, the predictions will be erroneous. Thus, when $n(k)$ is at the input of $F'(z)$, the average output power will be much higher than in the previous case. Hence, when a noisy realization of $\mathcal{S}$ is at the input of $F'(z)$, the signal component is suppressed while the noise component is not, so the SNR at the output will be much lower than the SNR at the input. The difference $|\mathrm{SNR}_{\mathrm{out}} - \mathrm{SNR}_{\mathrm{in}}|$ is therefore large, and $F'(z)$ qualifies as a detection filter under the second version of the SNR improvement criterion.

3.2.4 Autocorrelation Detector

Another detection filter can be based on the autocorrelation functions of the signal segment to be detected and of the received signal. Let $x(k) = s(k) + n(k)$, $-\infty < k < \infty$, be the received signal, where $s(k)$ and $n(k)$ are defined as above. Assume that $s(k)$ is a realization of an ergodic wide-sense stationary random process whose autocorrelation function is $\rho_s(h)$, $-\infty < h < \infty$, and that $s(k)$ and $n(k)$ are uncorrelated sequences. Let $\rho_{x,m}(h)$ be the autocorrelation function of a frame of $x(n)$ starting at time $mN - L$ and ending at time $mN$, $-\infty < m < \infty$, where $N$ and $L$ satisfy $0 < N \le L$. In a similar fashion, we can define the functions $\rho_{s,m}(h)$ and $\rho_{n,m}(h)$ for $s(k)$ and $n(k)$. Because $s(k)$ and $n(k)$ are uncorrelated, $\rho_{x,m}(h)$ can be written as $\rho_{x,m}(h) = \rho_{s,m}(h) + \rho_{n,m}(h)$. Noting that $s(k)$ is the realization of a wide-sense stationary random process, we have $\rho_{s,m}(h) \approx \rho_s(h)$. Thus we obtain

$$\rho_{x,m}(h) \approx \rho_s(h) + \rho_{n,m}(h), \qquad -\infty < h < \infty. \tag{3.2}$$

Let us define the autocorrelation detector as a filter whose output is given by

$$y(m) = \sum_{h=-P}^{P} \rho_{x,m}(h)\,\rho_s(h) \tag{3.3}$$

when $x(n)$ is at its input. We shall call $P > 0$ the order of the autocorrelation detector. Note that the output rate of the autocorrelation detector is $N$ times lower than its input rate. We see that $y(m)$ is approximately the correlation between the sequences $\rho_{x,m}(h)$ and $\rho_s(h)$.
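The definition in (3.3) can be sketched as follows. This is an illustrative sketch under stated assumptions, not the thesis implementation: the biased sample autocorrelation stands in for the frame autocorrelation $\rho_{x,m}(h)$, the template $\rho_s(h)$ is estimated from a clean hypothetical sinusoid, and the values of `N`, `L`, and `P` are arbitrary choices satisfying $0 < N \le L$.

```python
import math
import random


def autocorr(x, max_lag):
    """Biased sample autocorrelation of x for lags 0..max_lag."""
    n = len(x)
    return [sum(x[k] * x[k + h] for k in range(n - h)) / n for h in range(max_lag + 1)]


def autocorr_detector(x, rho_s, N, L):
    """Evaluate (3.3): one output y(m) per frame x[m*N - L : m*N].

    rho_s holds rho_s(0..P). The sum over h = -P..P uses the even symmetry
    of autocorrelation functions, rho(-h) = rho(h).
    """
    P = len(rho_s) - 1
    first_end = N * -(-L // N)  # smallest multiple of N that is >= L
    outputs = []
    for end in range(first_end, len(x) + 1, N):
        rho_x = autocorr(x[end - L:end], P)
        y = rho_x[0] * rho_s[0] + 2.0 * sum(rho_x[h] * rho_s[h] for h in range(1, P + 1))
        outputs.append(y)
    return outputs


# Hypothetical narrowband signal: a sinusoid with a 20-sample period.
rng = random.Random(2)
N, L, P = 100, 300, 20
s = [math.sin(2.0 * math.pi * k / 20.0) for k in range(2000)]
rho_s = autocorr(s, P)  # template estimated from a clean realization

noise_a = [rng.gauss(0.0, 0.5) for _ in s]
noise_b = [rng.gauss(0.0, 0.5) for _ in s]

# Detector output with the signal present, and with noise only.
y_signal = autocorr_detector([u + v for u, v in zip(s, noise_a)], rho_s, N, L)
y_noise = autocorr_detector(noise_b, rho_s, N, L)
```

With these settings every entry of `y_signal` is larger than every entry of `y_noise`, and one output is produced for every `N` input samples, which matches the rate reduction noted above.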
Using (3.2), $y(m)$ can be expressed as

$$y(m) = y_s(m) + y_n(m) = \sum_{h=-P}^{P} \rho_s^2(h) + \sum_{h=-P}^{P} \rho_s(h)\,\rho_{n,m}(h). \tag{3.4}$$

According to the analysis done in Section 2.2.1, the energy of $s(k)$ is concentrated in narrow spectral peaks. Therefore, $\rho_s(h)$, $h \ge 0$, must be a slowly decaying periodic sequence [2]. No similar statement can be made about $\rho_{n,m}(h)$; however, unless $\rho_s(h)$ and $\rho_{n,m}(h)$ are periodic with the same frequency, which is only possible in pathological situations, the second term in (3.4) will be close to zero or negative. On the other hand, the first term of (3.4) is always positive. Thus, except under relatively low SNRs, we have $y_s(m) \gg y_n(m)$ for $-\infty < m < \infty$. Hence (3.3) meets the SNR improvement criterion and can be used as a detection filter.

Equation (3.3) has an interesting explanation in the frequency domain which clarifies the points made above [14]. Let $\phi_{x,m}(\omega)$, $\phi_s(\omega)$ and $\phi_{n,m}(\omega)$, $-\pi \le \omega \le \pi$, be the power spectral densities of $x(k)$, $s(k)$ and $n(k)$, $mN - L \le k < mN$. For $P = \infty$, according to the Wiener-Khintchine theorem [12], (3.3) can be written as

$$y(m) = \frac{1}{2\pi}\int_{-\pi}^{\pi} \phi_{x,m}(\omega)\,\phi_s(\omega)\,d\omega = \frac{1}{2\pi}\int_{-\pi}^{\pi} \phi_s^2(\omega)\,d\omega + \frac{1}{2\pi}\int_{-\pi}^{\pi} \phi_{n,m}(\omega)\,\phi_s(\omega)\,d\omega. \tag{3.5}$$

Usually, the energies of $s(k)$ and $n(k)$ are concentrated in non-overlapping regions of the frequency spectrum. Therefore the second term of (3.5) will be close to zero, and its first term, which is always positive, will dominate $y(m)$, except under low SNRs.

It can be shown that the autocorrelation detector is equivalent to the locally optimum detector for stochastic signals with unknown amplitude [14].

3.3 Decision Block

In this section we design the decision block. The decision block must accomplish the following two tasks: determine the decision instants (i.e., the starting and ending points of stationary signal segments), and compare the detection filter outputs to decide which stationary signal segment is present at the detector input.
3.3.1 Synchronization

In developing a synchronization technique, there are two main issues we have to address. The first is the determination of optimal decision instants. The second concerns the removal of the infinite-length stationary signal segment assumption made in designing the detection filters: How many samples should be used in making a decision? In this section, we answer these two questions.

First, to gain insight into these two problems, we examine how they are solved in digital communications and explain why these approaches cannot be applied to the decision block of the alerting signal detector. This comparison will lead to a surprisingly simple synchronization scheme.

Let us focus on the determination of decision instants. In communications receivers, this is usually done using a clock extraction circuit which exploits the periodicity of the received signal to obtain the starting and ending instants of the symbols to be detected [3]. However, the structure of alerting signals is quite different from that of the symbol strings transmitted in a communications channel. Therefore, we cannot apply synchronization schemes from classical detection theory; we must develop a technique which is compatible with the properties of the given alerting signals.

In communications applications the second issue never poses a problem, because the symbols to be detected are of fixed and known length. On the other hand, each stationary signal segment (or “symbol”) we have to detect is of different length; furthermore, the duration of some stationary signal segments varies with every realization. We must find a means of overcoming this difficulty.

Having seen why standard synchronization schemes cannot be applied to the decision block of the alerting signal detector, let us now concentrate on a major difference between the stationary signal segments we have to detect and the symbols used in digital communications: duration.
In communications receivers, the effect of synchronization errors on performance is coupled with signaling speed. The faster the bit rate, the shorter the symbols, and therefore the more precise the clock extraction circuit must be. On the other hand, it can be shown that the effect of timing errors on performance diminishes as the signaling rate decreases [3]. Since a stationary signal segment is at least a quarter of a second long, its “signaling rate” is very low. In fact, it is so low that assigning the decision instants arbitrarily would presumably not affect the detection rate.

To illustrate how we can exploit this conclusion in the development of a synchronization scheme, let us consider the detection of the Type I alerting signal consisting of a single stationary signal segment $\mathcal{S}$, shown in Figure 3.1a. As we see, $\mathcal{S}$ is approximately 7000 samples long. For comparison, in communications systems, the symbols to be detected are approximately 5-20 samples long.

Now, suppose that we make a decision every 1000 samples, based on the last 1500 samples. In other words, the decision rule processes the received signal using overlapping frames of 1500 samples. This point is illustrated in Figure 3.1b. As we see, the decision logic may miss $\mathcal{S}$ in the first and last frames; however, all of the other frames would be correctly detected. Thus the exact determination of the starting and ending points of stationary signal segments is not necessary.

[Figure 3.1. Framewise processing of a stationary signal segment. (a) A typical stationary signal segment. (b) Several consecutive overlapping frames. Horizontal axes: Time (norm-sec).]

This synchronization scheme is generalized as follows. Let $x(n)$, $-\infty < n < \infty$, be the received signal. Suppose that the shortest signal segment to be detected is $Q$ samples long. The decision block will make a decision every $N$ samples based on a frame of $x(n)$ starting at time $kN - L$ and ending at time $kN$, where $-\infty < k < \infty$ and $N$ and $L$ are integers satisfying $N \le L < Q$. In the above example, $Q$ is 7000 samples, $N$ is 1000 samples and $L$ is 1500 samples. Both $N$ and $L$ were chosen arbitrarily within the inequality given above. Due to the heuristic nature of this formulation, it is difficult to tell how different values of $N$ and $L$ will affect the detection rate; therefore no strict rule for the selection of $N$ and $L$ can be given. The designer must choose the most convenient values by experimentation.

3.3.2 Decision Rule

In this section, we investigate the difficulties encountered in developing a decision rule for the alerting signal detector and propose a suitable decision scheme.

As before, let $\mathcal{S}_1, \ldots, \mathcal{S}_M$ be the stationary signal segments to be detected, and let $F_1, \ldots, F_M$ be the corresponding detection filters. Throughout this discussion, we shall assume that the detection filters are designed according to the first version of the SNR improvement criterion introduced in Section 3.2. Modification of the results obtained below to the second version of this optimality criterion follows immediately.

According to the synchronization scheme developed above, the decision block must process the signals at the detector filter outputs framewise. Suppose that $x(0), \ldots, x(L-1)$, an input signal frame, is at the input of the detection filters. Let us denote the corresponding frames at the output of $F_i$ as $y_i(0), \ldots, y_i(L-1)$,¹ $1 \le i \le M$.
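The framing convention of Section 3.3.1, which produces the input frames referred to above, can be sketched as follows. The generator name `frames` is hypothetical, and the sketch follows the textual rule (frames span $kN - L$ to $kN$) rather than the exact frame offsets drawn in Figure 3.1b; treat it as an illustration under those assumptions.

```python
def frames(x, N, L):
    """Yield (k, frame) with frame = x[k*N - L : k*N].

    A decision is made every N samples, based on the last L samples,
    so consecutive frames overlap by L - N samples.
    """
    if not 0 < N <= L:
        raise ValueError("N and L must satisfy 0 < N <= L")
    k = -(-L // N)  # smallest k for which the frame start k*N - L is non-negative
    while k * N <= len(x):
        yield k, x[k * N - L:k * N]
        k += 1


# Values from the example in the text: Q = 7000, N = 1000, L = 1500.
Q, N, L = 7000, 1000, 1500
x = list(range(Q))  # sample indices stand in for signal values
framed = list(frames(x, N, L))
frame_starts = [frame[0] for _, frame in framed]
```

Here each frame holds `L` samples and successive frame starts are `N` samples apart, so neighboring frames share 500 samples.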
We must compare these frames to make the following decisions:

• Determine whether a stationary signal segment, a transition region, a region of silence, or just noise is present at the detector input.

• If a stationary signal segment is present at the detector input, decide which one.

¹ For the sake of simplicity, we overlook the fact that when a frame of length $L$ is at the input of a filter, the output signal may be of different length. Additionally, the output of one of the detector structures proposed in Section 3.2 is a single number per input frame. The modification of the decision logic explained in this chapter to this detector is straightforward.

A simple decision rule can be based on detector filter output frame energies. Suppose that the frame $x(0), x(1), \ldots, x(L-1)$ is at the input of the filters and let $E_x = \sum_{n=0}^{L-1} x^2(n)$ be its energy. Let $E_i = \sum_{n=0}^{L-1} y_i^2(n)$ be the output energy of $F_i$, $1 \le i \le M$. Further suppose that the detection filters satisfy the following criteria:

(i) When a noisy realization of $\mathcal{S}_i$ is at the input of the detection filters, we have $E_i \gg E_j$ for $j \ne i$, $1 \le i, j \le M$.

(ii) When only noise is at the input of the detection filters, we have $E_j \ll E_x$ for $1 \le j \le M$.

At first sight, it seems that filters with good SNR improvement factors would meet the above criteria. However, they will not unless their gains are chosen appropriately. Because of the heuristic nature of the detection filter design procedure, no rule for gain selection can be given. The designer must adjust the filter gains based on simulation studies and the characteristics of individual filters.

Criteria (i) and (ii) immediately suggest the following decision rule, which is an extension of the “choose maximum” scheme from classical detection theory [3]: Let $E_i = \max\{E_1, \ldots, E_M\}$ and $E_j = \max\{E_1, \ldots, E_{i-1}, E_{i+1}, \ldots, E_M\}$. Let $\alpha > 1$ be an experimentally determined parameter which we shall call the sensitivity parameter.
If $E_i / E_j > \alpha$, decide in favor of $\mathcal{S}_i$; otherwise conclude that no stationary signal segment is present at the detector input.

The basis for this technique is easy to understand: At low SNRs, when $\mathcal{S}_i$ is present at the detector input, $E_i$ must be substantially higher than the output energies of the other filters. On the other hand, if only noise is present at the detector input, all filter output energies must be close to each other. The ratio $\alpha$ is a measure of this closeness. If $\alpha$ is small, the detector will have a low miss rate but a high false alarm rate. If $\alpha$ is large, the opposite will occur. Hence the term sensitivity parameter for $\alpha$. At high SNRs, typical of the environments where the adaptive alerting device will operate, a wide range of $\alpha$ values will result in low miss and false alarm rates. This point will be verified by simulation in Chapter 4.

3.4 Postprocessor

The duty of the postprocessor is to complete the job of the decision block and translate the detected stationary signal segments into alerting signals. The degree of sophistication required in the postprocessor depends on the reliability of the decision block. To compensate for a high error rate by the decision block, the postprocessor can exploit the length and temporal order of the detected stationary signal segments. For example, algorithms from syntactic pattern recognition could be used to this end. On the other hand, if our confidence in the decision block is high, the translation procedure can be accomplished relatively easily. In this case, the following algorithm, which is a generalized majority selection rule, can be used:

At every decision instant, consider the last $P$ signal segments detected. If at least $\theta$ percent of them belong to a given alerting signal, declare its presence at the detector input. Otherwise conclude that no alerting signal is present.

Like all parameters introduced in this chapter, $P$ and $\theta$ must be experimentally determined.
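The decision rule of Section 3.3.2 and the majority-selection postprocessor described above can be sketched together. The function names, the mapping from segment indices to alerting signals, and the choice to count a partially filled window against the full window length $P$ are all assumptions introduced for illustration; the thesis leaves these details to experimentation.

```python
from collections import Counter, deque


def frame_energy(y):
    """Energy of one output frame: sum of squared samples."""
    return sum(v * v for v in y)


def decide(energies, alpha):
    """Return the index i of the winning filter if E_i / E_j > alpha, else None.

    E_i is the largest output energy and E_j the largest among the rest.
    The comparison is written without division so E_j = 0 is handled safely.
    """
    if len(energies) < 2:
        raise ValueError("need at least two detection filters")
    i = max(range(len(energies)), key=energies.__getitem__)
    runner_up = max(e for j, e in enumerate(energies) if j != i)
    return i if energies[i] > alpha * runner_up else None


def postprocess(decisions, segment_to_alert, P, theta):
    """Generalized majority rule over the last P decisions.

    Declares an alerting signal when at least theta percent of the last P
    decisions belong to it. Assumption: until P decisions have been seen,
    the percentage is still taken relative to P.
    """
    window = deque(maxlen=P)
    alerts = []
    for d in decisions:
        window.append(segment_to_alert.get(d))  # None means "no segment"
        counts = Counter(a for a in window if a is not None)
        alert = None
        if counts:
            best, n = counts.most_common(1)[0]
            if 100.0 * n / P >= theta:
                alert = best
        alerts.append(alert)
    return alerts


# Hypothetical mapping from stationary-segment indices to alerting signals.
segment_to_alert = {0: "fire alarm", 1: "doorbell"}
decisions = [0, 0, None, 0, 1, None, None, None]
alerts = postprocess(decisions, segment_to_alert, P=4, theta=75)
```

In this example the fire alarm is declared only at the fourth decision instant, when three of the last four decisions map to it; a single isolated doorbell decision never reaches the threshold.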
This postprocessor algorithm is strikingly simple; therefore we should first evaluate its performance in combination with the techniques developed in the previous pages, and only if we do not obtain satisfactory results should we attempt to develop a more complicated but more robust postprocessor block.

CHAPTER 4

Performance Evaluation

4.1 Introduction

In this chapter, we report the results of simulation studies that evaluate the strengths and weaknesses of the various alternative approaches considered, and draw conclusions about the appropriate detector structure to use in the ultimate device.

There are two levels at which individual detection filter design strategies can be benchmarked. At the lower level, the success of a particular filter is determined by its SNR improvement factor. At the higher level, the performance is specified by the detection rate obtained.

In Section 4.2, we report results of simulations used to obtain the SNR improvement factor achieved by the filter design techniques presented in Section 3.2. In Section 4.3, we quantify the performance of detector strategies based on the foregoing developments.

4.2 Benchmarking with the SNR Improvement Factor

4.2.1 Formal Developments

Let $F$ be a detection filter designed for a stationary signal segment $\mathcal{S}$. Let $s(k)$, $-\infty < k < \infty$, be a realization of the eternal extension of $\mathcal{S}$, and let $n(k)$, $-\infty < k < \infty$, be a noise sequence. Without loss of generality, we assume that the signal and noise are both unit-power sequences. Additionally, we assume that the two processes are mutually uncorrelated. Let $y_s(k)$, $-\infty < k < \infty$, and $y_n(k)$, $-\infty < k < \infty$, be the responses of $F$ to the signal and noise, respectively. Assume that the signal

$$x(k) = \sqrt{P_s}\,s(k) + \sqrt{P_n}\,n(k), \qquad -\infty < k < \infty \tag{4.1}$$

is at the input of $F$, where $P_s$ and $P_n$ denote the power of the signal and noise components. Thus we have

$$\mathrm{SNR}_{\mathrm{in}} = 10 \log \frac{P_s}{P_n}. \tag{4.2}$$

Next, let us derive expressions for $\mathrm{SNR}_{\mathrm{out}}$ and $|\mathrm{SNR}_{\mathrm{out}} - \mathrm{SNR}_{\mathrm{in}}|$.
The forms of these expressions depend on the characteristics of $F$. The filters derived in Section 3.2 can be grouped in two categories: linear filters (matched filter, prediction error filter, and eigenfilter) and the quadratic autocorrelation detector. We treat these cases separately.

Linear Detection Filter. The output of $F$ can be expressed as

$$y(k) = \sqrt{P_s}\,y_s(k) + \sqrt{P_n}\,y_n(k), \qquad -\infty < k < \infty. \tag{4.3}$$

Thus

$$\mathrm{SNR}_{\mathrm{out}} = 10 \log \frac{P_s \sum_{k=-\infty}^{\infty} y_s^2(k)}{P_n \sum_{k=-\infty}^{\infty} y_n^2(k)} \tag{4.4}$$

and

$$|\mathrm{SNR}_{\mathrm{out}} - \mathrm{SNR}_{\mathrm{in}}| = \left| 10 \log \frac{\sum_{k=-\infty}^{\infty} y_s^2(k)}{\sum_{k=-\infty}^{\infty} y_n^2(k)} \right|. \tag{4.5}$$

Quadratic Detection Filter. The response of a quadratic filter to the input sequence $x(k)$, $-\infty < k < \infty$, can be expressed as y(k)= P,y,() + Pug/"(k )+ VP, P n