This is to certify that the thesis entitled

EFFICIENT DISCRETE SYMBOL HIDDEN MARKOV MODEL EVALUATION
USING TRANSFORMATION AND STATE REDUCTION

presented by

Ross Kenneth Snider

has been accepted towards fulfillment of the requirements for the
Master's degree in Electrical Engineering.


EFFICIENT DISCRETE SYMBOL HIDDEN MARKOV MODEL EVALUATION
USING TRANSFORMATION AND STATE REDUCTION

By

Ross Kenneth Snider

A THESIS

Submitted to Michigan State University in partial fulfillment of the
requirements for the degree of

MASTER OF SCIENCE

Department of Electrical Engineering

1990


ABSTRACT

EFFICIENT DISCRETE SYMBOL HIDDEN MARKOV MODEL EVALUATION
USING TRANSFORMATION AND STATE REDUCTION

By Ross Kenneth Snider

A procedure is developed for evaluating the likelihood of a hidden Markov model (HMM) using only O([1 - κ]NT) floating point operations, where N is the number of states in the model, T is the number of observations derived from an utterance, and κ is the degree of computational complexity reduction, which can be close to unity. Effective recognition of speech using conventional algorithms requires O(3NT) (Bakis model) to O(N²T) (ergodic model) operations to evaluate an HMM. Reduction in computational complexity is required for near-real-time recognition algorithms to be feasible on ordinary personal computers. The motivation for this reduction is the development of a PC-based speech recognition system for disabled individuals. This thesis explores the tradeoff between computational complexity and recognition rate for various speakers, vocabularies, and compression techniques.


ACKNOWLEDGEMENTS

The idea for the state space formulation and transformation contained in this thesis was suggested by my advisor, Dr. John Deller Jr., who also made numerous helpful suggestions in the research and in the write-up. This work was supported by the Whitaker Foundation.


Contents

1 Introduction and Background
2 Computational Complexity Reduction
  2.1 Complexity Reduction by Transformation
  2.2 Complexity Reduction by Compression
3 Experimental Results
  3.1 Speakers and Vocabularies
  3.2 Data Handling and Model Training
  3.3 Recognition Rates before Compression
  3.4 Eigenvalues
  3.5 Nearest Neighbor Compression
  3.6 "Linearized" Nearest Neighbor Compression
  3.7 Bottom Up Compression
4 Conclusions
A Program 1 - Wavemark
B Program 2 - Cepstrum
C Program 3 - Codebook
D Program 4 - Quantize
E Program 5 - Hmmtrain
F Program 6 - ".m files"
G Program 7 - Evalhmms
H Program 8 - Compress


List of Tables

1 Vocabularies 1 and 2.
2 Vocabularies 3-5.
3 Recognition Rates for Vocabularies 1 and 2.
4 Recognition Rates for Vocabularies 3 and 4.
5 Recognition Rates for Vocabulary 5.


List of Figures

1 Computation of state probability vector before transformation.
2 Computation of state probability vector after transformation.
3 Transfer function of anti-aliasing filter.
4 Four state Bakis model.
5 Four state ergodic model.
6 Eigenvalues of transformed HMM's from Vocabulary 1.
7 Nearest neighbor compression for Vocabulary 1.
8 Nearest neighbor compression for Vocabulary 2.
9 Nearest neighbor compression for Vocabulary 3.
10 Nearest neighbor compression for Vocabulary 4.
11 Nearest neighbor compression for Vocabulary 5.
12 "Linearized" eigenvalues of transformed HMM's.
13 "Linearized" nearest neighbor compression for Vocabulary 1.
14 "Linearized" nearest neighbor compression for Vocabulary 2.
15 "Linearized" nearest neighbor compression for Vocabulary 3.
16 "Linearized" nearest neighbor compression for Vocabulary 4.
17 "Linearized" nearest neighbor compression for Vocabulary 5.
18 Bottom up compression for Vocabulary 1.
19 Bottom up compression for Vocabulary 2.
20 Bottom up compression for Vocabulary 3.
21 Bottom up compression for Vocabulary 4.
22 Bottom up compression for Vocabulary 5.


1 Introduction and Background

The general goal of this research is to assist in the development of a site within the Speech Processing Laboratory that will accept speech data from speech disabled clients at clinical centers, and return to these persons a customized speech recognition software package, implementable on a relatively ordinary personal computer (PC). The precise technical form of this recognizer will depend upon a particular individual's needs and purposes and will not be further addressed here. The specific goal of the work presented in this thesis is to provide a recognition technique, for use in the recognition software, that is faster than the existing algorithms used in evaluating discrete symbol hidden Markov models (HMM's). This research is restricted to hidden Markov modeling of isolated words, i.e. speech with deliberate pauses between words. The reason for this is that cerebral palsy speech is not well understood, and using it in the continuous speech case would involve more unknowns than for the isolated word case.
Thus this research addresses the simpler case, though nothing restricts the techniques reported in this thesis to the isolated word case. Though the end goal of this research is to assist speech disabled individuals, normal speech is used here to compare the technique in this thesis with existing methods of evaluating HMM's. Encouraging isolated word recognition results for speech of cerebral palsied individuals with a spectrum of articulatory abilities were recently reported [1, 2, 3].

The PC is used in order to maximize the accessibility of these developments to users with disabilities; customized architectures or mainframe computing environments are not generally available to the user of a prospective speech recognition device. Given the constraints of the PC, measures must be taken to speed up existing recognition algorithms, particularly those involved in feature extraction and HMM decoding. It is the HMM search with which this thesis is primarily concerned.

The Baum-Welch "forward-backward" recursions (see, e.g., [4]) used in evaluating an ergodic HMM (a fully connected model) require O(N²T) operations, where N is the number of states in the model and T is the number of observations extracted from the given utterance. While this represents a remarkable improvement over the O(N^T) operations required for a "brute force" procedure, it still does not allow for near-real-time evaluation of a realistic number of HMM's once we have the observations representing a given utterance. For example, a floating point operation (flop)¹ requires about 40 µs on our laboratory's 16 MHz, 80386-based PC equipped with a floating point coprocessor. The evaluation of an N = 6 state HMM with respect to an utterance represented by T = 100 observations requires about N²T = 3600 flops, or 0.144 s. If 100 HMM's are to be evaluated in some time slot, then the time requirement for recognition of this single word is 14.4 s, which is unacceptable. Even if an O(3NT) method is used, the recognition would take 7.2 s.

This thesis will demonstrate a method whereby the O(N²T) operations needed to evaluate an HMM can be reduced to O([1 - κ]NT), where κ is the compression index and is chosen on the basis of the desired tradeoff between computational complexity and recognition rate. Thus the value of κ is chosen on an experimental basis. If, for example, κ were chosen to be 0.9, then in the above example ([1 - κ]NT) = 60 flops would be necessary for each HMM, and the time to recognize a word would be 0.24 s, compared with 14.4 s at O(N²T), or 7.2 s at O(3NT), flops per model. This is "real-time," given the above vocabulary size, and further reductions are possible with faster clock speeds or other existing technologies which are quickly becoming "ordinary," or with further theoretical development.

¹A flop is taken to be one multiplication plus one addition.

2 Computational Complexity Reduction

2.1 Complexity Reduction by Transformation

In the following discourse it is assumed that the reader is familiar with hidden Markov modeling and how the likelihood evaluation of an HMM is computed. The reader is referred to [4, 5, 16] for more details. It should be pointed out that this thesis deals with discrete symbol HMM's, as opposed to other variants such as continuous mixture density HMM's (see, e.g., [15]). While it is not conventional to do so, the HMM is viewed here in a state-space-like formulation. Suppose a model, M, has N states q_i, i = 1, 2, ..., N and M discrete observation symbols z_k, k = 1, 2, ..., M.
At discrete observation time t, let us define the state probability vector, x(t), and the observation probability vector, y(t), as follows:

    x^T(t) = [x_1(t)  x_2(t)  ...  x_N(t)]                              (1)

    y^T(t) = [y_1(t)  y_2(t)  ...  y_M(t)]                              (2)

where x_i(t) = (the probability of being in q_i at discrete time t given the model M) = P(q_i at t | M), and y_k(t) = (the probability of generating symbol z_k at discrete time t given M) = P(z_k at t | M). It is not difficult to see that the following state equation and observation equation describe the dynamics of the model:

    x(t+1) = A x(t) + u(t) δ(t)                                         (3)

    y(t) = B x(t)                                                       (4)

where A is the N × N state transition matrix associated with the HMM, whose (i,j) element a_ij = P(q_i at t+1 | q_j at t) for any t; B is the M × N observation probability matrix whose (k,j) element b_kj = P(z_k | q_j); and u(0) is some vector such that when x(0) is defined to be zero, x(1) takes the proper initial values, with u(t) arbitrary but finite for all t ≠ 0, and δ(t) the unit sample sequence. Note carefully that if the observation symbol at time t is z_k, then only element y_k(t) of y(t) need be calculated.

To derive a more efficient computational structure, we first apply a diagonalizing equivalence transformation (e.g., see [6, pp. 146-154]) to obtain

    x̂(t+1) = Â x̂(t) + û(t) δ(t)                                         (5)

    ŷ(t) = B̂ x̂(t)                                                       (6)

where Â = UPAP^-1U^-1 with P the usual matrix of normalized eigenvectors, U a special diagonal matrix described below, x̂(t) = UPx(t), û(t) = UPu(t), and B̂ = BP^-1U^-1. The ith diagonal element of the matrix U is the reciprocal of the ith element of the vector Pu(0). As a consequence of this operation, each element of the excitation vector, û(t), is unity at time zero. This result will be important below.

The diagonalizing transformation has reduced the computation to O(NT) operations per HMM, since Â is a diagonal matrix, and since only N flops are needed to compute a relevant ŷ_k(t). Note that the systems (3)-(4) and (5)-(6) are equivalent and therefore this reduction has no impact on the recognition rate. It should be pointed out, however, that in computing the state space structure, a summing procedure is used rather than a Viterbi search (see, e.g., [16]). In the method employed here, the likelihood that the HMM produced the observation sequence (using any sequence of states) is computed, whereas in the Viterbi decoding approach, the likelihood associated with the best state sequence is assigned to the model. The summing procedure is used because it admits complexity reduction that is not possible with Viterbi decoding.

A graphical representation of the diagonalizing procedure can be seen in Figures 1 and 2. Figure 1 shows how the state probability vector is computed based on equation (3) before the transformation is applied. The computation of the state probability vector after the transformation is shown in Fig. 2, which is based on equation (5). The a_i's in Fig. 2 are the eigenvalues of the matrix A and are thought of as the poles of the state space system describing the HMM.

Figure 1: Computation of state probability vector before transformation.

Figure 2: Computation of state probability vector after transformation. The a_i's are the eigenvalues of the matrix A.
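To make the per-observation cost concrete, the fragment below carries out one time step of (5)-(6) for a single model. It is a minimal sketch, not the thesis software: all names here are hypothetical, the eigenvalues are assumed real and stored in a[], bhat_k[] stands for the row of B̂ selected by the observed symbol, and the scaling needed to prevent numerical underflow is omitted.

    /* One observation step for one transformed HMM (schematic).
       Returns yhat_k(t) = sum_i bhat_k[i]*xhat[i], then advances the
       state, xhat(t+1) = A-hat xhat(t), using the diagonal of A-hat. */
    double step_transformed(double *xhat, const double *a,
                            const double *bhat_k, int N)
    {
        double yk = 0.0;
        int i;
        for (i = 0; i < N; i++)
            yk += bhat_k[i] * xhat[i];   /* output row for observed symbol */
        for (i = 0; i < N; i++)
            xhat[i] *= a[i];             /* diagonal state update */
        return yk;                       /* O(N) work per observation */
    }

Starting from x̂ set to all ones (each element of û is unity at time zero), calling this routine once per observation and accumulating the returned ŷ_k values gives the model score in O(NT) operations, as claimed above.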
2.2 Complexity Reduction by Compression

To demonstrate further computational benefit, it is convenient to look at all the HMM's representing the words in the vocabulary at once. To do this, the HMM's are combined into one large "state space" formulation. This is done as follows. Suppose there are W total models or words in the vocabulary. Let us construct "global" state and observation equations

    x̄(t+1) = Ā x̄(t) + ū(t) δ(t)                                         (7)

    ȳ(t) = B̄ x̄(t)                                                       (8)

in which x̄(t) is a vector of dimension NW consisting of the concatenation of the W individual state vectors, ȳ(t) is a similarly formed MW-vector, Ā is a block diagonal matrix of dimension NW × NW with the individual state transition matrices, Â, running down its diagonal, and B̄ is a similarly formed MW × NW matrix on the output side. ū(t) is the obvious NW-vector of ones at t = 0 and is arbitrary otherwise.

The diagonal elements of Ā, say ā_ii, are the eigenvalues of the original A matrices, or the poles of the input-output description of the HMM's. Accordingly, these numbers are bounded in magnitude as 0 ≤ |ā_ii| ≤ 1. Empirically, we find that these numbers are often real (or have very small imaginary parts), and, if they represent "non-negligible" poles (those with significant magnitude) of the system, they are often clustered near z = 1 in the z-plane. This led us to consider whether, with some degree of quantization of the ā_ii coefficients, some of the state variables could be combined, eliminating the computational burden of nearly redundant states. If, for example, ā_ii ≈ ā_jj, then, since both state variables represent similar transient responses to a unit sample², it is possible to eliminate x̄_j(t), and reduce the size of the state space. The appropriate adjustment to the observation equation is to add column j of B̄ to column i, then eliminate column j. Thus a computational savings is realized in both the state and observation equations. Note carefully that the merging of states may take place even if the states are from different HMM's! Several strategies for compression will be described in Sections 3.5 through 3.7.

²Here we see the significance of the matrix U above.

We now define the compression index κ. This is given as

    κ = λ_Eliminated / λ_Total                                          (9)

where λ_Eliminated is the number of eigenvalues merged together and λ_Total is the total number of eigenvalues in all the HMM's, i.e. λ_Total is equal to NW, since one eigenvalue results from each state in each model. Notice that κ can never equal unity, since if all the eigenvalues were merged there would still be one "eigenvalue" left; thus κ is bounded as

    0 ≤ κ ≤ (λ_Total - 1) / λ_Total                                     (10)

where κ = 0 when no eigenvalues are eliminated.

Further computational efficiency is achieved by quantizing the b̄_kj coefficients, though the complexity reduction is much less significant than that in the quantization of the ā_ii coefficients. This is because the b̄_kj coefficients are not used at every observation, whereas all of the ā_ii coefficients are. The b̄_kj coefficients are used only when the observation symbol is the kth one. For formal reasons it is convenient to decompose the observation equation into individual equations for each symbol, z_k. For k = 1, 2, ..., M we have

    ȳ_k(t) = B̄_k D_k x̄(t)                                               (11)

where B̄_k represents the W rows of B̄ (with the column modifications indicated above) used in computing likelihoods for z_k. These likelihoods are contained in the W-vector, ȳ_k(t). D_k is a square matrix whose function will become clear below. Immediately following the state space reduction procedure, each D_k is simply an identity matrix with the dimension of the reduced state space. Further reduction is possible if the entries in any two columns of a particular row of B̄_k, say in columns l and m, are the same. It is more efficient to add the corresponding state variables first, then multiply the sum by the coefficient value once. This is accomplished by changing the element (l, m) of D_k from zero to one, and the redundant "b" coefficient in column m of B̄_k to zero. The inclusion of the matrix D_k sets up an appropriately modified state vector.
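The basic merge step can be sketched as follows. The fragment below collapses transformed state j into state i in the global system (7)-(8): the two poles are replaced by their mean and column j of B̄ is folded into column i. The storage layout and names are hypothetical, and the simple average shown should be replaced by a weighted mean when a state already represents several previously merged eigenvalues (see Section 3.5).

    /* Merge global state j into state i (schematic).
       abar[] : the NW diagonal entries (poles) of A-bar
       bbar[] : the MW x NW matrix B-bar, row-major with stride NW
       live[] : flags marking states still in use                   */
    void merge_states(double *abar, double *bbar, int MW, int NW,
                      int i, int j, char *live)
    {
        int k;
        abar[i] = 0.5 * (abar[i] + abar[j]);   /* nearby poles -> mean  */
        for (k = 0; k < MW; k++)
            bbar[k*NW + i] += bbar[k*NW + j];  /* add column j into i   */
        live[j] = 0;                           /* state j is eliminated */
    }

Each such merge removes one multiplication from the diagonal state update and one column from the output side, which is the origin of the [1 - κ] factor in the operation count.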
3 Experimental Results

3.1 Speakers and Vocabularies

The speakers for the following recognition studies were two normal adult males, hereafter referred to as Speaker 1 and Speaker 2. Speaker 1,2 will refer to the combined models of Speakers 1 and 2. By combining the models for Speaker 1,2 we are dealing with a speaker independent case, whereas for Speaker 1 and Speaker 2 we are dealing with speaker dependent cases.

The vocabularies were chosen for the following reasons. Vocabularies 2 and 4 are the Texas Instruments "TI-20" and "TI-46" vocabularies, respectively. These are standard vocabularies used in isolated word recognition tasks and were chosen so that results could be compared with other research using these databases. Vocabulary 3 was chosen to see what confusion exists among the similar sounding letters of Vocabulary 4 (e.g. b, c, d, e, g, p, t, v, z). Vocabulary 5 consists of words taken from the "Grandfather passage" used by linguists and speech pathologists (see, e.g., [12]) because of the number of phonetically balanced words contained therein. Vocabulary 1 is the list of all the words contained in Vocabularies 2-5, with the exception of the word you, which is indistinguishable from the letter u. Each of the compression experiments described below was performed for each of the three "speakers" on each vocabulary.

Table 1: Vocabularies 1 and 2.

  Vocabulary 1 (78 words): the letters a through z; the digits one
  through nine and zero; start, stop, yes, no, go, help, erase, rubout,
  repeat, enter; and about, all, an, as, beard, black, buttons, clings,
  coat, dresses, ever, frock, grandfather, he, himself, in, is, long,
  missing, my, nearly, ninety three, old, several, still, swiftly,
  thinks, usually, well, wish, years, yet.

  Vocabulary 2 (20 words): one, two, three, four, five, six, seven,
  eight, nine, zero, start, stop, yes, no, go, help, erase, rubout,
  repeat, enter.

Table 2: Vocabularies 3-5.

  Vocabulary 3 (26 words): the letters a through z.

  Vocabulary 4 (46 words): the letters a through z; the digits one
  through nine and zero; start, stop, yes, no, go, help, erase, rubout,
  repeat, enter.

  Vocabulary 5 (33 words): about, all, an, as, beard, black, buttons,
  clings, coat, dresses, ever, frock, grandfather, he, himself, in, is,
  long, missing, my, nearly, ninety three, old, several, still,
  swiftly, thinks, usually, well, wish, years, yet, you.

3.2 Data Handling and Model Training

To train the HMM's the following procedure was used. First, 15 utterances of each word were recorded on analog tape using a Sony F-510 microphone and a Teac W-450R tape deck with the Dolby C noise reduction system turned on. The order of the utterances was randomized for Vocabulary 5, whereas for Vocabulary 4 the utterances were given in the order listed in Table 2. A 40 megabyte hard disk file was created and all the utterances were digitized to the file using the Metrabyte Streamer software and DAS16F sampling board, after passing through an anti-aliasing filter and a NAD 3130 stereo amplifier. The utterances were sampled at 10 kHz after being bandlimited to 4.5 kHz. The transfer function of the bandlimiting filter used is found in Fig. 3.

Figure 3: Transfer function of anti-aliasing filter (4th order bandpass Butterworth; gain in dB versus log frequency in Hz).

Each utterance was then viewed graphically to mark the beginning and ending points and then written to a separate file. This extraction was accomplished by the program Wavemark, which is listed as Program 1 in Appendix A and which was developed as part of this effort. A Hamming window (see, e.g., [10]) 25.6 ms wide was used every 12.8 ms on the speech data. From each window, six mel-frequency cepstrum coefficients were computed, since this parametric representation has been shown to give superior recognition performance over other types of parametric representations (see, e.g., [11]).
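The framing step just described can be sketched as follows, assuming the 10 kHz sampling rate above (256 samples = 25.6 ms, advanced 128 samples = 12.8 ms at a time). This is a simplified stand-in for the Cepstrum program of Appendix B, which uses the same window size; the mel filterbank and cepstrum computation applied to each frame are omitted.

    #include <math.h>

    #define W_SIZE 256               /* 25.6 ms at 10 kHz     */
    #define W_STEP 128               /* 12.8 ms frame advance */
    #define PI 3.14159265358979

    /* Copy one Hamming-windowed frame starting at sample t0.
       speech[] holds the marked utterance; the caller ensures
       t0 + W_SIZE does not run past the end of the data.     */
    void get_frame(const int *speech, long t0, double *frame)
    {
        int n;
        for (n = 0; n < W_SIZE; n++) {
            double w = 0.54 - 0.46 * cos(2.0*PI*n/(W_SIZE - 1));
            frame[n] = w * speech[t0 + n];
        }
    }

Frames are taken at t0 = 0, W_STEP, 2*W_STEP, ..., and each frame then yields the six mel-frequency cepstrum coefficients used below.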
These coefficients were computed using the program Cepstrum, which is Program 2 in Appendix B. A codebook of 128 symbols was then produced using only the first 10 utterances of each word. A separate codebook for each speaker, and also a codebook combining both speakers, were produced. Program 3 in Appendix C, entitled Codebook, was used for this purpose. The codebooks were implemented in a binary tree fashion (see, e.g., [18]). The cepstrum coefficients were then quantized into symbol strings using the codebook. The program Quantize, which was used to do the quantization, appears as Program 4 in Appendix D.

Hidden Markov models were then trained for each word in the vocabulary using the first 10 utterances (i.e. symbol strings). For each word three models were trained: one for Speaker 1, another for Speaker 2, and the final one for Speaker 1,2. The HMM's were trained using the conventional Baum-Welch algorithm with appropriate scaling (see, e.g., [8]). The model used for training was the Bakis model [16] with six states, as opposed to the ergodic model, which is fully interconnected. Ergodic models have been used in the dysarthric speech case (see, e.g., [2]). Figure 4 shows the Bakis model while Fig. 5 shows the ergodic model.

Figure 4: Four state Bakis model.

Figure 5: Four state ergodic model.

The program Hmmtrain was used in training the models. This appears as Program 5 in Appendix E. The software package Matlab (The MathWorks, Inc.) was then used in conjunction with the ".m files" trnsfhmm.m and autotrns.m to transform the HMM's as in Section 2.1. These ".m files" are listed under Program 6 in Appendix F. Utterances 11-15 were used to test the models. Program Evalhmms, listed as Program 7 in Appendix G, was used in the evaluation. The results of the evaluations are found in Section 3.3. The program to implement the compressions described in Sections 3.5 through 3.7 is entitled Compress and is listed as Program 8 in Appendix H.
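The binary tree quantization mentioned above can be illustrated with the following fragment, which maps one cepstrum vector to one of the 128 symbols by descending seven levels of a tree of centroids. The node layout (node n has children 2n+1 and 2n+2, 255 nodes in all) is an assumption for illustration, not the file format used by the Codebook and Quantize programs.

    #define CEP_SIZE 6      /* cepstrum coefficients per frame */
    #define TREE_DEPTH 7    /* 2^7 = 128 leaf symbols          */

    /* Squared Euclidean distance between two cepstrum vectors. */
    static double dist2(const double *a, const double *b)
    {
        double d = 0.0, e;
        int i;
        for (i = 0; i < CEP_SIZE; i++) {
            e = a[i] - b[i];
            d += e*e;
        }
        return d;
    }

    /* Walk the tree: at each level move toward the nearer of the two
       child centroids; cent[n] is the centroid stored at node n.    */
    int quantize_frame(const double *cep,
                       const double cent[255][CEP_SIZE])
    {
        int node = 0, symbol = 0, level;
        for (level = 0; level < TREE_DEPTH; level++) {
            int left = 2*node + 1, right = 2*node + 2;
            int go_right = dist2(cep, cent[right]) < dist2(cep, cent[left]);
            symbol = (symbol << 1) | go_right;
            node = go_right ? right : left;
        }
        return symbol;   /* 0..127 */
    }

The tree search costs 14 distance computations per frame instead of the 128 that a full codebook search would require.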
3.3 Recognition Rates before Compression

Tables 3 through 5 list the recognition rates for the various vocabularies after the transformation and before compression. These are the baseline recognition rates, since no loss in the recognition rate has occurred due to compression.

Table 3: Recognition Rates for Vocabularies 1 and 2.

                  Speaker 1   Speaker 2   Speaker 1,2
   Vocabulary 1    91.79%      90.00%       81.67%
   Vocabulary 2    98.00%      97.00%       88.50%

Table 4: Recognition Rates for Vocabularies 3 and 4.

                  Speaker 1   Speaker 2   Speaker 1,2
   Vocabulary 3    90.77%      88.46%       82.31%
   Vocabulary 4    93.48%      89.13%       81.52%

Table 5: Recognition Rates for Vocabulary 5.

                  Speaker 1   Speaker 2   Speaker 1,2
   Vocabulary 5    96.97%      96.36%       92.42%

A comparison to a study on the recognition of digits from the literature is now given. Rabiner in [16] gives a recognition rate of 96.3% for the recognition of digits in a speaker independent case (100 speakers). Vocabulary 2, which has the digits plus 10 other words, is the closest to this case. The speaker dependent cases of Vocabulary 2 have higher recognition rates than 96.3% because, in general, a speaker dependent case results in better recognition than a speaker independent case. If we look at Speaker 1,2, which is the speaker independent case, the recognition rate is 88.5%, which is for the digits plus 10 other words (which results in more confusions). When the vocabulary is limited to the digits for Speaker 1,2, the recognition rate increases to 93.0%. A likely reason why this is lower than 96.3% is that there were fewer training samples for Speaker 1,2 than in [16]. The recognition rate for Speaker 1 (a speaker dependent case) on the digits was 100%. This recognition rate is expected since it is a speaker dependent case as opposed to the speaker independent case.

3.4 Eigenvalues

In Fig. 6 we see the eigenvalues of the transformed HMM's for Vocabulary 1. Notice that a sixth of the eigenvalues are unity. This is a result of using a six state Bakis model, which has an ending state (the final state has probability one of jumping to itself). In fact, any model with an ending state will have an eigenvalue of unity.

Figure 6: Eigenvalues of all transformed HMM's (nominal 474 eigenvalues). Top curve - Speaker 1,2; Middle curve - Speaker 1; Bottom curve - Speaker 2.

The proof of this is as follows. A Bakis model corresponds to a matrix representation of the form

        | a_11  a_12  a_13   0     0    ...    0         |
        |  0    a_22  a_23  a_24   0    ...    0         |
        |  0     0    a_33  a_34  a_35  ...    0         |
    A = |  .     .     .     .     .           .         |    (12)
        |  0    ...    0   a_{n-2,n-2} a_{n-2,n-1} a_{n-2,n} |
        |  0    ...    0        0      a_{n-1,n-1} a_{n-1,n} |
        |  0    ...    0        0          0         a_nn    |

Since A represents the state transition probabilities, the sum of each row must equal one. Therefore a_nn = 1. The eigenvalues are determined by

    det(λI - A) = 0                                                     (13)

Now det(λI - A) can be expressed as (see, e.g., [14])

    det(λI - A) = (λ - a_nn) det(λI_nn - A_nn)                          (14)

or (since a_nn = 1),

    det(λI - A) = (λ - 1) det(λI_nn - A_nn)                             (15)

where (λI_nn - A_nn) is obtained by striking out row n and column n in (λI - A). The eigenvalues of A are the roots of det(λI - A) = 0 which, from (15), clearly include λ = 1. Therefore each model has at least one unity eigenvalue for this model structure.
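As a concrete instance of (15), consider a hypothetical two state model (not one of the trained models) with

    A = | 0.9  0.1 |
        |  0    1  |

Then det(λI - A) = (λ - 0.9)(λ - 1), so the eigenvalues are 0.9 and 1; the ending state contributes exactly the unity eigenvalue that accounts for one sixth of the values in Fig. 6.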
3.5 Nearest Neighbor Compression

As noted in Section 2.2, eigenvalues that are close together represent similar transient responses to a unit sample input. Thus, to gain further computational savings, these eigenvalues need to be grouped together in some fashion. In this section we will explore what happens if we combine them in a nearest neighbor procedure (see, e.g., [13]). By nearest neighbor we mean eigenvalues which are closest in the Euclidean sense.

The nearest neighbor procedure works as follows. Say, for example, we have a vocabulary of 100 words and are using a six state Bakis model. We then have 600 eigenvalues to compress. The nearest neighbor procedure would first find the two eigenvalues that are closest together out of all 600 and replace them by their mean. A recognition experiment would then take place to see what happened to the recognition rate as a result of combining these two eigenvalues. The procedure then continues by finding the next two closest eigenvalues. Notice that, after this point, a certain "eigenvalue" could be the mean of a previous combining. If so, then the new mean would be that of the original eigenvalues represented by this "eigenvalue" together with the eigenvalue combining with it. This continues until all the eigenvalues are grouped into one value; thus the final value equals the mean of all 600 eigenvalues. The last recognition experiment is then run to see the effect of replacing all the eigenvalues by this mean.
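A sketch of this agglomerative loop is given below. It is schematic (a simple O(n²) pair search, with the recognition test after each merge left out, and real eigenvalues assumed, as was typically observed; complex pairs would use the complex modulus). Each surviving value carries a count so that, as described above, a merged value is always the mean of the original eigenvalues it represents; initially cnt[i] = 1 for every eigenvalue.

    #include <math.h>   /* fabs */

    /* One nearest neighbor merge: lam[] holds the n surviving values,
       cnt[i] the number of original eigenvalues lam[i] represents.
       Calling this n-1 times collapses everything to the population
       mean (requires n >= 2).                                        */
    void merge_closest_pair(double *lam, int *cnt, int *n)
    {
        int i, j, bi = 0, bj = 1;
        double d, best = 1e30;
        for (i = 0; i < *n; i++)             /* find the closest pair */
            for (j = i + 1; j < *n; j++) {
                d = fabs(lam[i] - lam[j]);
                if (d < best) { best = d; bi = i; bj = j; }
            }
        lam[bi] = (cnt[bi]*lam[bi] + cnt[bj]*lam[bj]) / (cnt[bi] + cnt[bj]);
        cnt[bi] += cnt[bj];                  /* mean of the originals */
        lam[bj] = lam[*n - 1];               /* compact the arrays    */
        cnt[bj] = cnt[*n - 1];
        (*n)--;
    }

Each merge eliminates one eigenvalue and therefore raises κ by 1/λ_Total.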
From Section 3.4 we notice that, in the above example, all 100 models have an eigenvalue that equals one. The significance of this is that 100 eigenvalues have nearest neighbors of distance zero and will be combined with no effective change in value, which results in no change in the recognition rate. Thus a compression of κ = 0.166 occurs with no loss in the recognition rate.

Figures 7 through 11 show how the recognition rate is affected by compression using the nearest neighbor method for Vocabularies 1-5. The solid line represents the recognition rates for Speaker 1, the dashed line for Speaker 2, and the dotted line for Speaker 1,2. Looking at Fig. 7, which represents Vocabulary 1, there are 468 eigenvalues (78 words with six states each) to compress. A compression of zero means nothing has been done to the eigenvalues, while a compression of one means that all the eigenvalues have been replaced by the mean of the 468 eigenvalues. In terms of κ, a compression of zero implies κ = 0, and if the 468 eigenvalues are replaced by the population mean, κ = 0.998.³ The reader is referred to (9) for the definition of κ.

³The compression index κ approaches 1 as all the eigenvalues are replaced by the population mean and as the number of eigenvalues approaches infinity.

The most noticeable feature in Fig. 7 is that the compression has a much more detrimental effect on Speaker 1,2 than on Speakers 1 or 2. All the recognition rates in Fig. 7 stay constant initially because all the identical eigenvalues are being combined, as discussed earlier. The recognition rate for Speaker 2 drops about 7% after a compression of κ = 0.2 and then remains somewhat constant until a compression of κ = 0.8. Speaker 1, on the other hand, just gradually degrades to the approximate 7% drop at a compression of κ = 0.8. What this means in terms of computation is that, if a 7% degradation in the recognition rate is acceptable (κ = 0.8) for Speaker 2 with this 78 word vocabulary, the computation needed to recognize an utterance 1.28 seconds long (100 observation symbols) would be reduced from 140,400 flops (5.6 s on the 16 MHz PC for the Bakis model before transformation) to 9360 flops (0.37 s on the PC). If a 25% drop in the recognition rate were acceptable, the compression could be taken to κ = 0.98. This would further reduce the computation from the 140,400 flops to 936 flops.

Figure 7: Nearest neighbor compression for Vocabulary 1. Solid line - Speaker 1; Dashed line - Speaker 2; Dotted line - Speaker 1,2.

In Fig. 8 for Vocabulary 2 we have 120 eigenvalues to compress. The results are similar to Fig. 7 except for Speaker 1, where a compression of κ = 0.9 results in a degradation of 10% in the recognition rate.

Figure 8: Nearest neighbor compression for Vocabulary 2. Solid line - Speaker 1; Dashed line - Speaker 2; Dotted line - Speaker 1,2.

Vocabulary 3, with 156 eigenvalues (Fig. 9), shows that the compression has a much more pronounced effect than for the previous vocabularies, especially for Speaker 2 and Speaker 1,2. The reason for this is that there are many letters (e.g. b, c, d, g, p, t, v, z) which are confused easily. Thus distinct eigenvalues are apparently important for similar words.

Figure 9: Nearest neighbor compression for Vocabulary 3. Solid line - Speaker 1; Dashed line - Speaker 2; Dotted line - Speaker 1,2.

The results of compression for Vocabulary 4 (276 eigenvalues) are seen in Fig. 10. Here we see that Speaker 1 loses only 5% in the recognition rate for a compression of κ = 0.65, but then falls off. Speaker 2, as in Fig. 9, drops off 10% but then remains somewhat constant out to a compression of κ = 0.8, where it drops off. Speaker 1,2 does not perform well with compression greater than κ = 0.3.

Figure 10: Nearest neighbor compression for Vocabulary 4. Solid line - Speaker 1; Dashed line - Speaker 2; Dotted line - Speaker 1,2.

Vocabulary 5 (198 eigenvalues) in Fig. 11 shows that all speakers have a similar performance at a compression of κ = 0.7. This is because the words in this vocabulary are much more dissimilar than the words in the other vocabularies. The dissimilar words are not confused as easily and allow better compression than, say, in Vocabulary 3 (Fig. 9). This is especially noticeable for Speaker 1,2.

Figure 11: Nearest neighbor compression for Vocabulary 5. Solid line - Speaker 1; Dashed line - Speaker 2; Dotted line - Speaker 1,2.

3.6 "Linearized" Nearest Neighbor Compression

If we look at Fig. 6 we can see that the values approach 0.95 in a "logarithmic" sense. Thus the eigenvalues that were combined first in the nearest neighbor compression were those close to 0.95. The question arises as to whether the eigenvalues close to 0.95 and close to each other should be combined so soon. Thus we attempt to "linearize" the eigenvalues in an attempt to combine them in a more uniform fashion. To do the "linearization" the following function is used:

    λ_new = (10^λ - 1) / 9                                              (16)

where λ_new is the modified eigenvalue. The new eigenvalues are then treated just as the eigenvalues were in Section 3.5, in the same nearest neighbor fashion with identical values combined first. The new eigenvalues are used only for the distance measure, while the old eigenvalues are used to calculate the new means. Figure 12 shows the result of the "linearization."
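In code, the only change from the procedure of Section 3.5 is the distance used for pairing; the merge itself still averages the original eigenvalues. A hypothetical fragment:

    #include <math.h>

    /* Equation (16): used only for the pairing distance. */
    double linearize(double lam)
    {
        return (pow(10.0, lam) - 1.0) / 9.0;
    }

    /* "Linearized" pairing distance between two (real) eigenvalues;
       the new means are still computed from the unmodified values.  */
    double lin_dist(double lam_i, double lam_j)
    {
        return fabs(linearize(lam_i) - linearize(lam_j));
    }

The map (16) takes [0, 1] onto [0, 1] while stretching the region near unity (for example, 0.95 maps to about 0.88), so the values clustered near 0.95 are no longer the first to be combined.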
Figures 13 through 17 show how the recognition rate is affected by compression using the "linearized" nearest neighbor method for Vocabularies 1-5. The solid line represents the recognition rates for Speaker 1, the dashed line for Speaker 2, and the dotted line for Speaker 1,2.

Figure 12: "Linearized" eigenvalues of transformed HMM's (plotted value = (10^eigenvalue - 1)/9). Top curve - Speaker 1,2; Middle curve - Speaker 1; Bottom curve - Speaker 2.

The figures for the "linearized" nearest neighbor compression are very similar to those for the nearest neighbor case. For Speakers 1 and 2 the "linearized" method is slightly better, since it achieves slightly more compression for the same recognition rate compared with the nearest neighbor method. The figures look similar except that the figures for the "linearized" method are "stretched" a little, and this is most noticeable at the "knee," where the recognition rate plunges. For Speaker 1,2 the effect is just the opposite: there is a slightly worse performance, since the recognition rate falls off a bit sooner than in the nearest neighbor case. To the extent that these experiments are generalizable, this method seems to work slightly better for the speaker dependent case and slightly worse for the speaker independent case, compared with the nearest neighbor method without linearization.

Figure 13: "Linearized" nearest neighbor compression for Vocabulary 1. Solid line - Speaker 1; Dashed line - Speaker 2; Dotted line - Speaker 1,2.

Figure 14: "Linearized" nearest neighbor compression for Vocabulary 2. Solid line - Speaker 1; Dashed line - Speaker 2; Dotted line - Speaker 1,2.

Figure 15: "Linearized" nearest neighbor compression for Vocabulary 3. Solid line - Speaker 1; Dashed line - Speaker 2; Dotted line - Speaker 1,2.

Figure 16: "Linearized" nearest neighbor compression for Vocabulary 4. Solid line - Speaker 1; Dashed line - Speaker 2; Dotted line - Speaker 1,2.

Figure 17: "Linearized" nearest neighbor compression for Vocabulary 5. Solid line - Speaker 1; Dashed line - Speaker 2; Dotted line - Speaker 1,2.
3.7 Bottom Up Compression

Bottom up compression is used to see at what point the small eigenvalues can be neglected. Bottom up compression means that the smallest values are averaged together before the larger values, beginning with those of smallest magnitude. The only exception is that all the identical values are compressed first (i.e. all the unity eigenvalues are combined first). The bottom up compression works as follows. First, all identical values are compressed together, since this compression has no effect on the recognition rate. Then the smallest two eigenvalues are replaced by their mean, followed by a recognition test to see the effect. Next the original smallest three eigenvalues are replaced by their mean and a recognition test is run to see the effect. This continues until all eigenvalues are replaced by one value representing the population mean.
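A sketch of one step of this scan, assuming the eigenvalues have been sorted into ascending order in lam[]:

    /* Bottom up compression (schematic): after step m, lam[0..m-1] all
       hold the mean of the original smallest m eigenvalues.  Because
       each step preserves the sum of the first m-1 entries, running
       this on the already-modified array still yields the mean of the
       original values.  A recognition test follows each step.        */
    void bottom_up_step(double *lam, int m)
    {
        double sum = 0.0, mean;
        int i;
        for (i = 0; i < m; i++)
            sum += lam[i];
        mean = sum / m;
        for (i = 0; i < m; i++)
            lam[i] = mean;     /* the m smallest states share one pole */
    }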
Figures 18 through 22 show how the recognition rate is affected by compression using the bottom up method for Vocabularies 1-5. The solid line represents the recognition rates for Speaker 1, the dashed line for Speaker 2, and the dotted line for Speaker 1,2.

The main reason for doing the bottom up compression was to see how far the "plateau" of the initial recognition rate could be extended before it would drop off. Thus if, say, all the eigenvalues less than 0.8 did not contribute much, then the recognition rate would stay the same until the eigenvalues above 0.8 were affected. This turns out not to be the case. As can be seen in Figures 18 through 22, the recognition rate starts dropping as soon as all the identical values are merged together; the only exceptions are for Speaker 1 in Vocabularies 2 and 5, Speaker 2 in Vocabulary 3, and Speaker 1,2 in Vocabularies 2 and 3. In fact, the nearest neighbor method did as well or better in extending the initial "plateau." Thus there appears to be no threshold below which all the eigenvalues can be readily merged together.

Figure 18: Bottom up compression for Vocabulary 1. Solid line - Speaker 1; Dashed line - Speaker 2; Dotted line - Speaker 1,2.

Figure 19: Bottom up compression for Vocabulary 2. Solid line - Speaker 1; Dashed line - Speaker 2; Dotted line - Speaker 1,2.

Figure 20: Bottom up compression for Vocabulary 3. Solid line - Speaker 1; Dashed line - Speaker 2; Dotted line - Speaker 1,2.

Figure 21: Bottom up compression for Vocabulary 4. Solid line - Speaker 1; Dashed line - Speaker 2; Dotted line - Speaker 1,2.

Figure 22: Bottom up compression for Vocabulary 5. Solid line - Speaker 1; Dashed line - Speaker 2; Dotted line - Speaker 1,2.

4 Conclusions

A procedure has been presented for evaluating the likelihood of a hidden Markov model using only O([1 - κ]NT) floating point operations, where N is the number of states, T is the number of observations derived from an utterance, and κ, the compression index, can be close to unity depending on the acceptable degradation in recognition rate. This is a reduction from the O(3NT) to O(N²T) operations used in existing algorithms to evaluate the likelihood of an HMM. This is significant for the development of a speech recognizer to be run on a PC for speech disabled individuals.

It has been found that the compression has a greater detrimental impact on the speaker independent case than on the speaker dependent case. This is not critical, since the compression technique will be used for people with cerebral palsy on an individual basis. It has also been found that vocabularies with words that are dissimilar perform better under compression than vocabularies with similar words. This is true for recognition in general. It has been observed that larger vocabularies result in greater computational savings. This is because with larger vocabularies there are more eigenvalues that are very close to each other and can be merged with negligible effect on the recognition rate.

Near-real-time recognition is possible for large vocabularies, since the compression can greatly improve the speed at which recognition takes place. This can be seen in the following example. Suppose a vocabulary of 1000 words were used in conjunction with a six state Bakis model, the compression index were κ = 0.9, and an utterance contained 100 observation symbols. The computation to recognize the word would then drop from 1.8 million flops to 60 thousand flops. For the PC described earlier, this would be the difference between 72 s and 2.4 s to recognize a word. This increased speed comes at the expense of at least some recognition rate degradation, however. It should be pointed out that, though the recognition rate drops as the compression is increased, if a word is misrecognized it is usually found among the words with the highest scores in these experiments. The significance of this can be seen if one realizes that this technique is to be used in a system to help disabled individuals communicate. Thus, if a word is misrecognized, the correct word can be quickly found by searching the words with the top scores in a binary fashion, as opposed to having to resort to some other method like spelling the word out.

It is the conclusion of this thesis that the reduction in computational complexity achieved by the transformation and compression of hidden Markov models is feasible for the implementation of an efficient likelihood evaluation in a speech recognition unit for disabled individuals.

References

[1] D. Hsu and J.R. Deller, Jr., "On the use of HMM's to recognize cerebral palsy speech: Isolated word case," Proc. ICASSP '89, Glasgow, vol. 1, pp. 290-293, May 1989.

[2] D. Hsu, "Computer recognition of nonverbal speech using hidden Markov modelling" (Ph.D. dissertation), Northeastern University, Boston, 1988.

[3] D. Hsu and J.R. Deller, Jr., "Recognition of cerebral palsy speech using hidden Markov modelling: Isolated word case," Biomedical Measurement, Control, and Informatics, in review.

[4] S.E. Levinson, "Structural methods in automatic speech recognition," Proc. IEEE, vol. 73, pp. 1625-1650, 1985.

[5] L.R. Rabiner and B.H. Juang, "An introduction to hidden Markov models," IEEE ASSP Magazine, vol. 3, pp. 4-16, 1986.

[6] C.T. Chen, Linear System Theory and Design, New York: Holt, Rinehart and Winston, 1984.

[7] J.R. Deller, Jr. and D. Hsu, "An alternative adaptive sequential regression algorithm and its application to the recognition of cerebral palsy speech," IEEE Trans. Circ. and Sys., vol. 34, pp. 782-787, 1987.

[8] S.E. Levinson, L.R. Rabiner, and M.M. Sondhi, "An introduction to the application of the theory of probabilistic functions of a Markov process to automatic speech recognition," Bell Sys. Tech. J., vol. 62, pp. 1035-1073, 1983.
[9] P.A. Devijver and J. Kittler, Pattern Recognition: A Statistical Approach, London: Prentice-Hall International, 1982.

[10] A.V. Oppenheim and R.W. Schafer, Digital Signal Processing, London: Prentice-Hall International, pp. 239-250, 1975.

[11] S.B. Davis and P. Mermelstein, "Comparison of parametric representations for monosyllabic word recognition in continuously spoken sentences," IEEE Trans. ASSP, vol. ASSP-28, pp. 357-366, August 1980.

[12] W. Johnson, F.L. Darley, and D.C. Spriestersbach, Diagnostic Methods in Speech Pathology, New York: Harper & Row, 1963.

[13] A.K. Jain and R.C. Dubes, Algorithms for Clustering Data, London: Prentice-Hall International, pp. 128-130, 1988.

APPENDICES

A Program 1 - Wavemark

This program allows the user to view the data sampled by the Streamer program by Metrabyte. It is run using the following command line:

    wavemark filename.dat x

where x=0 if the file was produced by Streamer, x=1 if the file was produced by Wavemark, and x=2 if using the cerebral palsy data from Northeastern University.

/* The original #include list was unreadable in the scan; these are
   the headers the program requires under Turbo C. */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <ctype.h>
#include <math.h>
#include <io.h>
#include <graphics.h>

long stepsize;
int inc,opt,b,e,M,maxx,maxy;
long b_index,e_index;
long FILE_SIZE,FILE_INDEX,display_size,end_val; /* Upper case are indexes */
                        /* in bytes, lower case are indexes in points */
char bufferb[900],buffere[900];   /* there are two bytes per point (int) */
char outfilename[80],infilename[80];
char PATH[80];
int MAXVAL,INFILE_TYPE;
FILE *infile1;

main(argc,argv)
int argc;
char *argv[];
{
    int graphdriver = DETECT,graphmode;
    int errorcode;
    int inchar;
    char *stop;
    long filelength();
    void change_stepsize();
    void change_position();
    void change_marker();
    void save_points2file();
    void change_fileindex();
    void change_displaysize();
    void change_endindex();
    void change_path();
    void update_screen();
    void initial_screen();
    void zoom_screen();
    void listen_word();
    void exit();

    if( argc < 3 ){
        printf("Usage: Wavemark <binary file name> <type of file>\n\n");
        printf(" For type of file enter <0> for streamer binary file (Metrabyte) or <1> for integer binary file.\n");
        printf(" Wavemark saves files in the integer binary file format.\n\n");
        printf("Example: Wavemark cpdata.dat 0\n");
        exit(0);
    }
    strcpy(infilename,argv[1]);
    INFILE_TYPE=atoi(argv[2]);
    printf("infilename=%s INFILE_TYPE=%d\n",infilename,INFILE_TYPE);
    strcpy(PATH,"\\");
    FILE_INDEX=0;
    display_size=50000;
    if ( (infile1 = fopen(infilename, "rb")) == NULL) {
        printf("fopen failed for file %s.\n",infilename);
        exit(0);
    }
    printf("file %s has been opened\n",infilename);
    if ( (FILE_SIZE = filelength(fileno(infile1))) != -1L) {
        printf("Size of file = %ld bytes = %ld data points\n",
               FILE_SIZE,FILE_SIZE/2);
    } else {
        printf("ERROR getting file size\n");
    }

    if( (FILE_INDEX + display_size*2) > FILE_SIZE )
        display_size = FILE_SIZE - FILE_INDEX;

    MAXVAL=2048;
    M=(int)ceil((double)display_size/100);
    printf("M=%d display_size=%ld fileindex=%ld filesize=%ld\n",
           M,display_size,FILE_INDEX,FILE_SIZE);

    initgraph(&graphdriver,&graphmode,"c:\\tc");
    errorcode = graphresult();
    if( errorcode != 0 ) {
        printf("Graphics error: %s\n",grapherrormsg(errorcode));
        exit(1);
    }
    maxx = getmaxx();
    maxy = getmaxy();
    initial_screen();
    change_fileindex(0);
    update_screen();

    inchar=0;
    while( inchar != 120 ) {                       /* 'x' exits */
        inchar=getch();
        switch(inchar) {
            case 43 : change_stepsize(1);     break;  /* +   */
            case 45 : change_stepsize(-1);    break;  /* -   */
            case 61 : change_marker(1);       break;  /* F3  */
            case 62 : change_marker(2);       break;  /* F4  */
            case 63 : change_endindex(-1);    break;  /* F5  */
            case 64 : change_endindex(1);     break;  /* F6  */
            case 65 : change_displaysize(-1); break;  /* F7  */
            case 66 : change_displaysize(1);  break;  /* F8  */
            case 67 : change_fileindex(-1);   break;  /* F9  */
            case 68 : change_fileindex(1);    break;  /* F10 */
            case 75 : change_position(-1);    break;  /* <-  */
            case 77 : change_position(1);     break;  /* ->  */
            case 102: save_points2file();     break;  /* f   */
            case 108: listen_word();          break;  /* l   */
            case 117: update_screen();        break;  /* u   */
            case 122: zoom_screen();          break;  /* z   */
            default:                          break;
        }
    }
    closegraph();
    exit(0);
}

/*********************************************************/
void save_points2file()
{
    char string[80],string1[80],string2[80],string3[80],ch[2];
    unsigned inbuffer[100];
    int inkey,inval,length,i,j,outbuffer[100];
    long index_c;
    FILE *outfile;
    char *ltoa();
    char *itoa();

    setviewport(0,370,300,390,0); setbkcolor(0); clearviewport();
    setviewport(0,0,maxx,maxy,0);
    strcpy(string,"Enter output file name >>");
    setcolor(60);
    outtextxy(5,370,string);
    strcpy(string1,"\0");
    ch[1]='\0';
    inkey=0;
    while( inkey != 27 )                           /* until Esc */
    {
        inkey=getch();
        if( isalnum(inkey) != 0 || inkey == 46 ){  /* alpha num or "." */
            ch[0]=inkey;
            strcat(string1,ch);
            setviewport(210,370,639,390,0); setbkcolor(0); clearviewport();
            setviewport(0,0,maxx,maxy,0);
            setcolor(60);
            outtextxy(215,370,string1);
        }
        if( inkey == 8 ){                          /* backspace */
            length=strlen(string1);
            string1[length-1]='\0';
            setviewport(210,370,639,390,0); setbkcolor(0); clearviewport();
            setviewport(0,0,maxx,maxy,0);
            setcolor(60);
            outtextxy(215,370,string1);
        }
        if( inkey == 13 ){                         /* Enter: write points */
            setviewport(0,370,639,390,0); setbkcolor(0); clearviewport();
            setviewport(0,0,maxx,maxy,0);
            setcolor(62);
            strcpy(string,"Writing points to file ");
            strcat(string,string1);
            outtextxy(5,370,string);
            outfile = fopen(string1, "wb");
            fseek(infile1,FILE_INDEX,SEEK_SET);
            M=(int)ceil((double)display_size/100);
            index_c = 0;
            for( i=0; i<M; i++ ){
                fread((void *)inbuffer,sizeof(int),100,infile1);
                for( j=0; j<100; j++ ){
                    if( INFILE_TYPE == 0 ){
                        inval = inbuffer[j]>>4;
                        inval -= 2048;
                    }
                    /* File structure from Metrabyte's streamer sampling
                       program.  The data is in the 12 MSB's (the 4 LSB's
                       contain the channel info).
                       Thus the data is shifted 4 bits to the right and
                       2048 subtracted to get the bipolar format (unsigned,
                       the 12-bit values range from 0-4095). */
                    if( INFILE_TYPE == 1 )
                        inval = inbuffer[j];
                    if( INFILE_TYPE == 2 )
                        inval = inbuffer[j]-2048;
                    outbuffer[j] = inval;
                    index_c++;
                }
                fwrite((void *)outbuffer,sizeof(int),100,outfile);
            }
            fclose(outfile);
            setviewport(0,370,639,390,0); setbkcolor(0); clearviewport();
            setviewport(0,0,maxx,maxy,0);
            break;
        }
        if( inkey == 27 ){                         /* Esc: cancel */
            setviewport(0,370,639,390,0); setbkcolor(0); clearviewport();
            setviewport(0,0,maxx,maxy,0);
            break;
        }
    }
}

/*********************************************************/
void change_stepsize(inc)
int inc;
{
    char string[80];

    if( inc == -1 || inc == 1 ) {
        switch(stepsize){
            case 1:
                if( inc == 1 ) {
                    stepsize=10;
                    setcolor(56);
                    strcpy(string,"STEP SIZE : 1 10 100 1,000 10,000 100,000 1,000,000");
                    outtextxy(10,350,string);
                    setcolor(62);
                    strcpy(string,"STEP SIZE 10");
                    outtextxy(10,350,string);
                }
                break;
            case 10:
                if( inc == -1 ){
                    stepsize=1;
                    setcolor(56);
                    strcpy(string,"STEP SIZE : 1 10 100 1,000 10,000 100,000 1,000,000");
                    outtextxy(10,350,string);
                    setcolor(62);
                    strcpy(string,"STEP SIZE 1");
                    outtextxy(10,350,string);
                }
                if( inc == 1 ) {
                    stepsize=100;
                    setcolor(56);
                    strcpy(string,"STEP SIZE : 1 10 100 1,000 10,000 100,000 1,000,000");
                    outtextxy(10,350,string);
                    setcolor(62);
                    strcpy(string,"STEP SIZE 100");
                    outtextxy(10,350,string);
                }
                break;
            case 100:
                if( inc == -1 ){
                    stepsize=10;
                    setcolor(56);
                    strcpy(string,"STEP SIZE : 1 10 100 1,000 10,000 100,000 1,000,000");
                    outtextxy(10,350,string);
                    setcolor(62);
                    strcpy(string,"STEP SIZE 10");
                    outtextxy(10,350,string);
                }
                if( inc == 1 ){
                    stepsize=1000;
                    setcolor(56);
                    strcpy(string,"STEP SIZE : 1 10 100 1,000 10,000 100,000 1,000,000");
                    outtextxy(10,350,string);
                    setcolor(62);
                    strcpy(string,"STEP SIZE 1,000");
                    outtextxy(10,350,string);
                }
                break;
            case 1000:
                if( inc == -1 ) {
                    stepsize=100;
                    setcolor(56);
                    strcpy(string,"STEP SIZE : 1 10 100 1,000 10,000 100,000 1,000,000");
                    outtextxy(10,350,string);
                    setcolor(62);
                    strcpy(string,"STEP SIZE 100");
                    outtextxy(10,350,string);
                }
                if( inc == 1 ){
                    stepsize=10000;
                    setcolor(56);
                    strcpy(string,"STEP SIZE : 1 10 100 1,000 10,000 100,000 1,000,000");
                    outtextxy(10,350,string);
                    setcolor(62);
                    strcpy(string,"STEP SIZE 10,000");
                    outtextxy(10,350,string);
                }
                break;
            case 10000:
                if( inc == -1 ){
                    stepsize=1000;
                    setcolor(56);
                    strcpy(string,"STEP SIZE : 1 10 100 1,000 10,000 100,000 1,000,000");
                    outtextxy(10,350,string);
                    setcolor(62);
                    strcpy(string,"STEP SIZE 1,000");
                    outtextxy(10,350,string);
                }
                if( inc == 1 ){
                    stepsize=100000;
                    setcolor(56);
                    strcpy(string,"STEP SIZE : 1 10 100 1,000 10,000 100,000 1,000,000");
                    outtextxy(10,350,string);
                    setcolor(62);
                    strcpy(string,"STEP SIZE 100,000");
                    outtextxy(10,350,string);
                }
                break;
            case 100000:
                if( inc == -1 ){
                    stepsize=10000;
                    setcolor(56);
                    strcpy(string,"STEP SIZE : 1 10 100 1,000 10,000 100,000 1,000,000");
                    outtextxy(10,350,string);
                    setcolor(62);
                    strcpy(string,"STEP SIZE 10,000");
                    outtextxy(10,350,string);
                }
                if( inc == 1 ){
                    stepsize=1000000;
                    setcolor(56);
                    strcpy(string,"STEP SIZE : 1 10 100 1,000 10,000 100,000 1,000,000");
                    outtextxy(10,350,string);
                    setcolor(62);
                    strcpy(string,"STEP SIZE 1,000,000");
                    outtextxy(10,350,string);
                }
                break;
            case 1000000:
                if( inc == -1 ){
                    stepsize=100000;
                    setcolor(56);
                    strcpy(string,"STEP SIZE : 1 10 100 1,000 10,000 100,000 1,000,000");
                    outtextxy(10,350,string);
                    setcolor(62);
                    strcpy(string,"STEP SIZE 100,000");
                    outtextxy(10,350,string);
                }
                break;
            default:
                break;
        }
    }
}

/*********************************************************/
void change_position(nxt)
int nxt;
{
    char string1[10];
    int oldb,olde;
    char *ltoa();

    if( opt == 1 || nxt == 0 ) {
        oldb=b;
        b_index = b_index + stepsize*nxt;
        if( b_index < FILE_INDEX/2 )
            b_index = FILE_INDEX/2;
        if( b_index > e_index )
            b_index = e_index - display_size/maxx;
        b=(b_index - FILE_INDEX/2)*maxx/display_size;
        setviewport(200,330,300,340,0); setbkcolor(0); clearviewport();
        setviewport(0,0,maxx,maxy,0);
        ltoa(b_index,string1,10);
        setcolor(63);
        outtextxy(200,330,string1);
        putimage(oldb,10,bufferb,COPY_PUT);
        getimage(b,10,b,225,bufferb);
        setcolor(63);
        line(b,10,b,225);
    }
    if( opt == 2 || nxt == 0 ) {
        olde=e;
        e_index = e_index + stepsize*nxt;
        if( e_index < b_index )
            e_index = b_index + display_size/maxx;
        if( e_index > FILE_INDEX/2 + display_size )
            e_index = FILE_INDEX/2 + display_size;
        e=(e_index - FILE_INDEX/2)*maxx/display_size;
        setviewport(470,330,570,340,0); setbkcolor(0); clearviewport();
        setviewport(0,0,maxx,maxy,0);
        ltoa(e_index,string1,10);
        setcolor(57);
        outtextxy(470,330,string1);
        putimage(olde,10,buffere,COPY_PUT);
        getimage(e,10,e,225,buffere);
        setcolor(57);
        line(e,10,e,225);
    }
}

/*********************************************************/
void change_marker(mrk)
int mrk;
{
    if ( mrk == 1 ) opt = 1;
    if ( mrk == 2 ) opt = 2;
}

/*********************************************************/
void change_fileindex(nxt)
int nxt;
{
    char string[40],string1[40];

    FILE_INDEX=FILE_INDEX+stepsize*2*nxt;
    if ( FILE_INDEX < 0 )
        FILE_INDEX=0;
    if ( FILE_INDEX > FILE_SIZE - 20 )
        FILE_INDEX=FILE_SIZE-20;
    ltoa(FILE_INDEX/2,string1,10);
    setviewport(10,280,110,290,0); setbkcolor(0); clearviewport();
    setviewport(0,0,maxx,maxy,0);
    setcolor(58);
    outtextxy(10,280,string1);

    end_val = FILE_INDEX/2 + display_size;
    if ( end_val > FILE_SIZE/2 ) {
        end_val = FILE_SIZE/2;
        display_size = FILE_SIZE/2 - FILE_INDEX/2;
    }
    ltoa(end_val,string1,10);
    setviewport(540,280,639,290,0); setbkcolor(0); clearviewport();
    setviewport(0,0,maxx,maxy,0);
    setcolor(58);
    outtextxy(540,280,string1);

    ltoa(display_size,string1,10);
    strcpy(string,"<-- ");
    strcat(string,string1);
    strcat(string," -->");
    setviewport(250,280,390,290,0); setbkcolor(0); clearviewport();
    setviewport(0,0,maxx,maxy,0);
    setcolor(58);
    outtextxy(250,280,string);
}

/*********************************************************/
void change_endindex(nxt)
int nxt;
{
    char string[40],string1[40];

    end_val = end_val + stepsize*nxt;
    if ( end_val > FILE_SIZE/2 )
        end_val = FILE_SIZE/2;
    if ( end_val < FILE_INDEX/2 )
        end_val = FILE_INDEX/2 + 10;
    display_size = end_val - FILE_INDEX/2;
/*********************************************************/
/* Change the number of samples shown on the screen and  */
/* update the start/end/size readouts.                   */
void change_displaysize(nxt)
int nxt;
{
    char string[40],string1[40];

    display_size = display_size + stepsize*nxt;
    if ( display_size < 10 )
        display_size = 10;
    if ( display_size > FILE_SIZE/2 - FILE_INDEX/2 )
        display_size = FILE_SIZE/2 - FILE_INDEX/2;
    ltoa(display_size,string1,10);
    strcpy(string,"<-- ");
    strcat(string,string1);
    strcat(string," -->");
    setviewport(250,280,390,290,0);
    setbkcolor(0);
    clearviewport();
    setviewport(0,0,maxx,maxy,0);
    setcolor(58);
    outtextxy(250,280,string);
    end_val = FILE_INDEX/2 + display_size;
    if ( end_val > FILE_SIZE/2 ) {
        end_val = FILE_SIZE/2;
        display_size = FILE_SIZE/2 - FILE_INDEX/2;
    }
    ltoa(end_val,string1,10);
    setviewport(540,280,639,290,0);
    setbkcolor(0);
    clearviewport();
    setviewport(0,0,maxx,maxy,0);
    setcolor(58);
    outtextxy(540,280,string1);
}

/*********************************************************/
/* Redraw the waveform, axis, readouts, and key legend.  */
void update_screen()
{
    char string[80],string1[40];
    int x1,x2,y1,y2;
    int i,j;
    long index_1,index_2,inc_value;
    unsigned buffer[100];
    int inval;
    void change_position();

    setviewport(0,10,639,225,0);
    setbkcolor(0);
    clearviewport();
    setviewport(0,0,maxx,maxy,0);
    setcolor(58);

    /* First pass: scan the visible samples in 100-sample chunks  */
    /* for the peak amplitude used to scale the plot.  For very   */
    /* long windows just assume the full-scale value.             */
    fseek(infile1,FILE_INDEX,SEEK_SET);
    M = (int)ceil((double)display_size/100);  /* number of chunks */
    if( M < 2000 ){
        MAXVAL = 0;
        for( i=0; i<M; i++ ){
            fread((void *)buffer,sizeof(int),100,infile1);
            for( j=0; j<100; j++ ){
                if( INFILE_TYPE == 0 ){
                    inval = buffer[j]>>4;
                    inval -= 2048;
                }
                if( INFILE_TYPE == 1 )
                    inval = buffer[j];
                if( INFILE_TYPE == 2 )
                    inval = buffer[j]-2048;
                if( MAXVAL < abs(inval) )
                    MAXVAL = abs(inval);
            }
        }
    }
    else
        MAXVAL = 2048;
    itoa(MAXVAL,string1,10);
    strcpy(string,"Amplitude of waveform");
    setviewport(220,295,390,315,0);
    setbkcolor(0);
    clearviewport();
    setviewport(0,0,maxx,maxy,0);
    setcolor(2);
    outtextxy(220,295,string);
    outtextxy(290,305,string1);
    setcolor(58);

    /* Second pass: draw the waveform sample by sample. */
    x1 = 0;
    y1 = 120;
    fseek(infile1,FILE_INDEX,SEEK_SET);
    index_1 = 0;
    M = (int)ceil((double)display_size/100);
    for( i=0; i<M; i++ ){
        fread((void *)buffer,sizeof(int),100,infile1);
        for( j=0; j<100; j++ ){
            if( INFILE_TYPE == 0 ){
                inval = buffer[j]>>4;
                inval -= 2048;
            }
            if( INFILE_TYPE == 1 )
                inval = buffer[j];
            if( INFILE_TYPE == 2 )
                inval = buffer[j]-2048;
            x2 = floor((double)maxx/(double)display_size*(double)index_1);
            y2 = floor(-(double)inval/(double)MAXVAL*100+120);
            line(x1,y1,x2,y2);
            x1 = x2;
            y1 = y2;
            index_1++;
        }
    }
    b_index = FILE_INDEX/2;
    e_index = FILE_INDEX/2 + display_size;
    change_position(0);

    /* Axis lines and tick marks every 64 pixels. */
    setcolor(63);
    line(0,235,639,235);
    line(0,0,639,0);
    for( i=0; i<11; i++) {
        x1 = i*64-1;
        if( x1 < 0 )
            x1 = 0;
        line(x1,230,x1,235);
        line(x1,0,x1,5);
    }
    setviewport(0,240,639,260,0);
    setbkcolor(0);
    clearviewport();
    setviewport(0,0,maxx,maxy,0);
    setcolor(63);
    inc_value = display_size/10;
    for( i=1; i<10; i++) {
        index_2 = floor(FILE_INDEX/2 + inc_value*i);
        x1 = i*64-1;
        ltoa(index_2,string1,10);
        if ( i%2 == 0 )
            outtextxy(x1,250,string1);
        else
            outtextxy(x1,240,string1);
    }

    /* Key legend. */
    strcpy(string,"       - End Marker");
    outtextxy(5,400,string);
    strcpy(string,"       - End Point      <F5 > + End Point");
    outtextxy(5,410,string);
    strcpy(string,"<F7 >  - Display Size   <F8 > + Display Size");
    outtextxy(5,420,string);
    strcpy(string,"       - Start Point    <F9 > + Start Point");
    outtextxy(5,430,string);
    strcpy(string," - Step Size  < - >  < + >  + Step Size");
    outtextxy(5,440,string);
    strcpy(string," Move Marker  <<- >  < ->>  Move Marker");
    outtextxy(5,450,string);
    strcpy(string," Update Screen");
    outtextxy(380,400,string);
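    /* A minimal editorial sketch (the function name is an assumption)
       of the vertical scaling used just above: each sample is
       normalized by the on-screen peak MAXVAL and mapped into a
       100-pixel band centered on screen row 120.

       int sample_to_row(inval)
       int inval;
       {
           return (int)floor(-(double)inval/(double)MAXVAL*100 + 120);
       }
    */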
Zoom Screen"); outtextxy(380,410,string); strcpy(string," Save Marked points to FILE"); outtextxy(380,420,string); strcpy(string," Listen to Marked Portion"); outtextxy(380,430,string); strcpy(string," Exit"); outtextxy(380,440,string); /*strcpy(string," Change directory Path"); outtextxy(380,450,string);*/ setcolor(56); strcpy(string,"STEP SIZE : 1 10 100 1,000 10,000 100,000 1,000,000"); outtextxy(10,350,string); setcolor(62); strcpy(string,"STEP SIZE 1,000"); outtextxy(10,350,string); stepsizeI1000; opt-1; b-index I FILE_INDEX/2; bI(b-index - FILE_INDEX/2)Imaxx/display-size; e_index I FILE_INDEX/2 + display-size; eI(e_index - FILE-INDEX/2)*maxx/display_size; getimage(b,10,b,225,bufferb); getimage(e,10,e,225,buffere); 62 ltoa(b_index,string1,10); strcpy(string,"Beginning point I ")3 setcolor(2); outtextxy(50,330,string); setcolor(63); outtextxy(200,330,stringi); ltoa(e_index,string1,10); strcpy(string,"Ending point I "); setcolor(2); outtextxy(340,330,string); setcolor(57); outtextxy(470,330,string1); } /*********************************************************/ void zoom_screen() { FILE_INDEX I b_index*2; display_size I e_index - b_index; update-screen(); change_fileindex(0); } /********************************************************/ void change_path() { int inkey; int length; char string[80],ch[2]; void delayC); setviewport(0,370,300,390,0); setbkcolor(0); clearviewport(); setviewport(0,0,maxx,maxy,0); strcpy(string,"Enter directory path >>"); setcolor(60); outtextxy(5,370,string); strcpy(PATH,"\0")3 Ch[1]=’\0’; 63 inkeyIO; while( inkey.!= 27 ) /* CR */ { inkengetchC); if( isalnum(inkey) !I 0 II inkey II 58 ll inkey II 92 ) II alpha num or "z" or "\" */ { ch[0]Iinkey; strcat(PATH,ch); setviewport(410,370,639,390,0); setbkcolor(0); clearvievportC); setvievport(0,0,maxx,maxy,0); setcolor(60); outtextxy(415,370,PATH); } if( inkey == 8 ) { length-strlen(PATH); PATHEIength-l]=’\0’3 setvievport(410,370,639,390,0); setbkcolor(0); clearviewportC); setviewport(0,0,maxx,maxy,0); setcolor(60); outtextxy(415,370,PATH); } if( inkey II 13 ) { setviewport(0,370,639,390,0); setbkcolor(0); clearviewport(); setvievport(0,0,maxx,maxy,0); strcpy(string,"?ath name entered >> "); strcat(string,PATH); setcolor(62); outtextxy(5,370,string); delayClOOO); setvievport<0,370,639,390,0); setbkcolor(0); clearviewportC); setviesport(0,0,maxx,maxy,0); 64 break; } if( inkey II 27 ) /* esc */ { setvievport(0,370,639,390,0); setbkcolor(0); clearviewportC); setviewport(0,0,maxx,maxy,0); break; } } /*****************Ik**************************************/ void listen_word() { long i,j; int port; unsigned inbufferfi]; int inval,outnum; long delay,total,k; char string[80]; void outportC); portI820; /* 816 + 4 I/ II 816 I 330B */ port-772; /* 768 + 4 */ /* 768 I 300B */ if( INFILE_TYPE II 0 ) delayI6; if( INFILE_TYPE II 1 ) delayIS; if( INFILE-TYPE as 2 ) delayI7; total I e_index - b_index; fseek(infile1,b_index*2,SEEK_SET); for( iI0; i>4; 65 outnum I inval<<4; } if( INFILE_TYPE as 1 ) outnum I inbuffer[0]+2048; if( INFILE_TYPE as 2 ) outnum I inbufferEOJ; for( kIO; k #include #include #include #include #define W-SIZE 256 /* Hamming window size */ #define CEP-SIZE 6 /* number of cepstral coefficients */ #define FFT-SIZE 4096 /* number of points transformed real & imag I/ #define PI 3.14159265358979 int count; double mel[21]; /* me1-frequency energy */ double mf[22]; /* mel frequency scale */ double waFFT_SIZE]; /* 256 samples plus zero padding to 4096 pts */ double cep_coef[CEP_SIZE]; double fftmag[1024]; int 
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <math.h>

#define W_SIZE 256       /* Hamming window size                        */
#define CEP_SIZE 6       /* number of cepstral coefficients            */
#define FFT_SIZE 4096    /* number of points transformed, real & imag  */
#define PI 3.14159265358979

int count;
double mel[21];          /* mel-frequency energy                       */
double mf[22];           /* mel frequency scale                        */
double wf[FFT_SIZE];     /* 256 samples plus zero padding to 4096 pts  */
double cep_coef[CEP_SIZE];
double fftmag[1024];
int sdata[128],buffer[128],stemp;
FILE *outfile;

int main(){
    int i,h,s,t;
    long S,T;
    int numread;
    char infilename1[80],infilename2[80],outfilename[80];
    char instring[40],numstring[5];
    FILE *infile,*file;
    void exit();
    void mel_cepstrum();
    void mel_freq();
    void mel_energy();
    void window();
    void fft();

    mel_freq();
    strcpy(infilename1,"words.dat");
    if ( (infile = fopen(infilename1, "r")) == NULL){
        printf("fopen failed for %s.\n",infilename1);
        exit(0);
    }
    S = T = 0;
    while( fscanf(infile,"%s\n",instring) != EOF){
        for(h=1; h<16; h++){
            strcpy(infilename2,"d:\\sdata\\");
            strcat(infilename2,instring);
            strcat(infilename2,".d");
            sprintf(numstring,"%02d",h);   /* tokens "01" through "15" */
            strcat(infilename2,numstring);
            if ( (file = fopen(infilename2, "rb")) == NULL){
                printf("fopen failed for infilename2 %s\n",infilename2);
                exit(0);
            }
            printf("reading in data from file %s\n",infilename2);
            strcpy(outfilename,"c:\\cdata2\\");
            strcat(outfilename,instring);
            strcat(outfilename,".j");
            strcat(outfilename,numstring);
            outfile = fopen(outfilename, "wb");
            printf("Writing cepstral coeffs. to file %s\n",outfilename);
            s = t = 0;
            numread = fread((void *)buffer,sizeof(int),128,file);
            t += numread;
            do {
                for(i=0; i<128; i++){
                    sdata[i] = buffer[i];  /* previous half frame      */
                }
                numread = fread((void *)buffer,sizeof(int),128,file);
                t += numread;
                if( numread < 128 ){
                    for(i=numread; i<128; i++){
                        buffer[i] = 0;     /* zero-pad a short read    */
                    }
                }
                window();
                fft(wf-1,2048,1);          /* -1 for inverse fft       */
                mel_energy();
                mel_cepstrum();
                s++;
                fwrite((void *)cep_coef,sizeof(double),6,outfile);
                /*if( s%10 == 0 ){
                    printf(" %d\n",s);
                    for(i=0; i<6; i++){
                        printf("cep_coef[%d]=%f\n",i,cep_coef[i]);
                    }
                }*/
            } while( numread == 128 );
            fclose(file);
            fclose(outfile);
            T += t;
            S += s;
            printf("%u data points read from file %s\n",t,infilename2);
            printf(" %ld global points read so far\n",T);
            printf("%u cepstral vectors (%u coeffs) computed and written to file %s\n",
                   s,s*6,outfilename);
            printf(" %ld global vectors computed so far\n",S);
        }
    }
    fclose(infile);
}

/**********************************************************
*  This function takes 256 points from the sampled data,  *
*  applies a Hamming window, and puts zeros in the        *
*  remaining positions to implement a 2048 point FFT      *
**********************************************************/
void window()
{
    int i;

    for (i=0; i<256; i++){
        if( i < 128 ){
            wf[i*2]   = sdata[i]*(0.54-0.46*cos(2*PI*i/255));
            wf[i*2+1] = 0;
        }else{
            wf[i*2]   = buffer[i-128]*(0.54-0.46*cos(2*PI*i/255));
            wf[i*2+1] = 0;
        }
    }
    for (i=512; i<4096; i++){
        wf[i] = 0;
    }
}
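/* window() interleaves real and imaginary parts and zero-pads 256
   windowed samples out to a 2048-point complex record, which
   refines the frequency sampling of the transform.  A minimal
   editorial sketch (the function name is an assumption, not part
   of the original program) of the same Hamming weighting applied
   to one frame, without the interleaving:                         */
void hamming(frame,n)
double frame[];
int n;
{
    int i;

    for (i=0; i<n; i++)
        frame[i] *= 0.54 - 0.46*cos(2*PI*i/(n-1));
}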
/**********************************************************/
/*  This routine implements a radix-2 FFT                 */
/**********************************************************/
#define SWAP(a,b) tempr=(a);(a)=(b);(b)=tempr

void fft(data,nn,isign)
double data[];
int nn,isign;
{
    int n,mmax,m,j,istep,i;
    double wtemp,wr,wpr,wpi,wi,theta;
    double tempr,tempi;

    n = nn << 1;
    j = 1;
    for (i=1;i<n;i+=2){                    /* bit-reversal reordering */
        if (j > i){
            SWAP(data[j],data[i]);
            SWAP(data[j+1],data[i+1]);
        }
        m = n >> 1;
        while (m >= 2 && j > m) {
            j -= m;
            m >>= 1;
        }
        j += m;
    }
    mmax = 2;
    while (n > mmax) {                     /* Danielson-Lanczos recursion */
        istep = 2*mmax;
        theta = 6.28318530717959/(isign*mmax);
        wtemp = sin(0.5*theta);
        wpr = -2.0*wtemp*wtemp;
        wpi = sin(theta);
        wr = 1.0;
        wi = 0.0;
        for (m=1;m<mmax;m+=2) {
            for (i=m;i<=n;i+=istep) {
                j = i+mmax;
                tempr = wr*data[j]-wi*data[j+1];
                tempi = wr*data[j+1]+wi*data[j];
                data[j]   = data[i]-tempr;
                data[j+1] = data[i+1]-tempi;
                data[i]   += tempr;
                data[i+1] += tempi;
            }
            wr = (wtemp=wr)*wpr-wi*wpi+wr;
            wi = wi*wpr+wtemp*wpi+wi;
        }
        mmax = istep;
    }
}

/*********************************************************/
/* Accumulate FFT magnitudes into triangular mel-spaced  */
/* filters; 4.8828125 Hz is the width of one FFT bin.    */
void mel_energy()
{
    int i,bindex;

    for(i=1; i<21; i++)
        mel[i] = 0.0;
    for(i=1; i<21; i++){
        bindex = 0;
        while( bindex*4.8828125 < mf[i+1] ){
            if( bindex*4.8828125 >= mf[i-1] && bindex*4.8828125 < mf[i] ){
                mel[i] += fftmag[bindex]
                          *(bindex*4.8828125-mf[i-1])/(mf[i]-mf[i-1]);
            }
            if( bindex*4.8828125 >= mf[i] && bindex*4.8828125 < mf[i+1] ){
                mel[i] += fftmag[bindex]
                          *(mf[i+1]-bindex*4.8828125)/(mf[i+1]-mf[i]);
            }
            bindex++;
        }
    }
}

/*********************************************************/
/* Cepstral coefficients via a cosine transform of the   */
/* log mel-filter energies.                              */
void mel_cepstrum()
{
    int i,k;

    for(i=1; i<21; i++){
        mel[i] = log(mel[i]);
    }
    for(k=0; k<CEP_SIZE; k++){
        cep_coef[k] = 0.0;
        for(i=1; i<21; i++){
            cep_coef[k] += mel[i]*cos(k*(i-0.5)*PI/20.0);
        }
    }
}

To run this program under the virtual memory driver, type

    > vmmdrv codebook

The -vm command means that the program is running in virtual memory
(i.e., the hard disk is being used as RAM), and vmmdrv is the virtual
memory driver contained in the directory specified by .

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <math.h>

#define NUM 40000        /* maximum number of cepstral vectors         */
#define MSIZE 30         /* maximum number of characters in input      */
#define M0 6             /* number of cepstral coefficients            */
#define dist_thresh 0.00000001
                         /* threshold for separation of old/new means  */
#define NLEVELS 7        /* number of levels in binary codebook        */
#define TWOPOWER 128     /* 2^NLEVELS                                  */
#define CONVTHRESH 100   /* max iterations per node                    */

int quant[NUM];
double s_vector[NUM][M0];
long total_count;
long tot1,oldtot1,tot2,oldtot2;
double vector1[M0],vector2[M0],old_vector1[M0];
double old_vector2[M0];
double vector[NLEVELS+1][TWOPOWER][M0];
FILE *outfile,*outfile2;

main()
{
    char outfilename[80];
    void initialize();
    void vectorize();

    strcpy(outfilename,"cdbkjdif.dat");
    outfile = fopen(outfilename,"w");
    outfile2 = fopen("cdbkjdif.log","w");
    printf("Start initialization...\n");
    initialize();
    vectorize(NLEVELS);
    fclose(outfile);
    fclose(outfile2);
}

/*******************************************************/
/* Grow the binary codebook level by level: each pass  */
/* splits every current centroid in two and refines    */
/* the pair until the distortion change falls below    */
/* dist_thresh or CONVTHRESH iterations have passed.   */
void vectorize(nlevels)
int nlevels;
{
    int Level,Level2,nt;
    register int i,k;
    long j;
    double old_dist1,old_dist2;
    double distance(),dist1,dist2;
    long input_data();
    void separate();
    void compute_vector1();
    void compute_vector2();
    void perturb();

    printf("Start vectorize\n");
    total_count = input_data();
    printf("total_count=%ld\n",total_count);
    for(Level=0; Level<nlevels; Level++){
        Level2 = (int)pow((double)2,(double)Level);
        for(i=0; i<Level2; i++){
            perturb();               /* split centroid i into two seeds  */
            nt = 0;
            dist1 = dist2 = 0.0;
up:         nt++;
            old_dist1 = dist1;
            old_dist2 = dist2;
            separate();              /* assign vectors to the nearer seed */
            compute_vector1();       /* re-estimate the two centroids     */
            compute_vector2();
            dist1 = distance(vector1,old_vector1);
            dist2 = distance(vector2,old_vector2);
            if( fabs(dist1-old_dist1) >= dist_thresh && nt < CONVTHRESH ){goto up;}
            if( fabs(dist2-old_dist2) >= dist_thresh && nt < CONVTHRESH ){goto up;}
            for(k=0; k<M0; k++){     /* store the two new codebook rows   */
                vector[Level+1][2*i][k]   = vector1[k];
                vector[Level+1][2*i+1][k] = vector2[k];
            }
        }
    }
}
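The codebook program grows its binary codebook by repeated splitting:
each centroid is perturbed into two nearby seeds, vectors are
reassigned to the nearer seed, and the two centroids are re-estimated
until the change in distortion is negligible. A minimal self-contained
sketch of one such split under those assumptions; the function and
variable names here are hypothetical, not the thesis routines, and the
sketch iterates a fixed number of passes where the program also tests
the dist_thresh convergence criterion:

    #include <math.h>

    #define M0 6

    /* One binary split: seed two centroids from a parent by a small
       perturbation, then alternate assignment and re-estimation.    */
    void split(parent,data,n,c1,c2)
    double parent[M0],data[][M0],c1[M0],c2[M0];
    long n;
    {
        long i,n1,n2;
        int k,it;
        double d1,d2,sum1[M0],sum2[M0];

        for( k=0; k<M0; k++ ){              /* perturbed seeds        */
            c1[k] = parent[k]*(1.0 + 1.0e-4);
            c2[k] = parent[k]*(1.0 - 1.0e-4);
        }
        for( it=0; it<100; it++ ){          /* cf. CONVTHRESH         */
            for( k=0; k<M0; k++ )
                sum1[k] = sum2[k] = 0.0;
            n1 = n2 = 0;
            for( i=0; i<n; i++ ){           /* nearest-seed assignment */
                d1 = d2 = 0.0;
                for( k=0; k<M0; k++ ){
                    d1 += (data[i][k]-c1[k])*(data[i][k]-c1[k]);
                    d2 += (data[i][k]-c2[k])*(data[i][k]-c2[k]);
                }
                if( d1 < d2 ){ for(k=0;k<M0;k++) sum1[k]+=data[i][k]; n1++; }
                else         { for(k=0;k<M0;k++) sum2[k]+=data[i][k]; n2++; }
            }
            for( k=0; k<M0; k++ ){          /* centroid re-estimation  */
                if( n1 ) c1[k] = sum1[k]/n1;
                if( n2 ) c2[k] = sum2[k]/n2;
            }
        }
    }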
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <math.h>

#define M0 6             /* number of cepstral coefficients       */
#define MSIZE 30         /* maximum number of characters in input */
#define NLEVELS 7        /* number of levels in binary codebook   */
#define LEVELINDX 126    /* 2^NLEVELS-2                           */
#define TOTVECT 254      /* number of vectors in codebook         */

int count;
double in_vector[M0];
double codebook[TOTVECT][M0];
double buffer[6];
char outfilename[80];
char codefile[80];
char words[MSIZE];
FILE *outfile;

int main(argc,argv)
int argc;
char *argv[];
{
    int i,j,k;
    int numread;
    char infiles[80],infilename1[80];
    char wrdfile1[80],wrdfile2[80],wrdnum[10];
    FILE *infile1,*infile2;
    int count;
    void codebook_entry();
    void vector_quantize();

    if( argc < 3 ){
        printf("***** After program name enter two file names *****\n");
        printf("1. The first file name is the name of the codebook.\n");
        printf("2. The second is the name of the file that contains the paths and\n");
        printf("   names of all the data files to be quantized.\n\n");
        printf("Example: ");
        printf("%s codebook.dat allfiles.dat\n",argv[0]);
        exit(0);
    }
    strcpy(codefile,argv[1]);
    strcpy(infiles,argv[2]);
    codebook_entry();                       /* load codebook from codefile */
    if( (infile1 = fopen(infiles,"r")) == NULL ){
        printf("fopen failed for %s\n",infiles);
        exit(0);
    }
    while( fscanf(infile1,"%s\n",infilename1) != EOF ){
        if( (infile2 = fopen(infilename1,"rb")) == NULL ){
            printf("fopen failed for %s\n",infilename1);
            exit(0);
        }
        strcpy(outfilename,infilename1);
        strcat(outfilename,".v");  /* output name: input name plus an */
                                   /* assumed ".v" suffix             */
        outfile = fopen(outfilename,"w");
        count = 0;
        for(j=0; (numread = fread((void *)buffer,sizeof(double),6,infile2)) == 6; j++){
            for(k=0; k<M0; k++)
                in_vector[k] = buffer[k];
            vector_quantize();
            count++;
        }
        fclose(infile2);
        fclose(outfile);
        printf("%d vectors quantized from file %s\n",count,infilename1);
    }
    fclose(infile1);
}

/**********************************************************/
/* Quantize in_vector by walking down the binary codebook: */
/* at each level compare the input with the two children   */
/* of the current node and descend toward the nearer one.  */
void vector_quantize()
{
    int i,vq,index1,index2;
    double idm1,idm2;
    double distance();
    int level_index();

    index1 = level_index(1);       /* first of the two level-1 nodes */
    index2 = index1 + 1;
    idm1 = distance(in_vector,codebook[index1]);
    idm2 = distance(in_vector,codebook[index2]);
    if( idm1 > idm2 )
        index1 = index2;
    for(i=1; i<NLEVELS; i++){
        index1 = level_index(i+1) + 2*(index1 - level_index(i));
        index2 = index1 + 1;
        idm1 = distance(in_vector,codebook[index1]);
        idm2 = distance(in_vector,codebook[index2]);
        if( idm1 > idm2 )
            index1 = index2;
    }
    vq = index1 - LEVELINDX;       /* leaf number 0-127 */
    fprintf(outfile,"%d\n",vq);
    /*printf(" %d\n",vq);*/
}

/**********************************************************/
/* This routine calculates which vector to compare next   */
/* in the codebook once a vector index in the previous    */
/* level is given.                                        */
/**********************************************************/
int level_index(k)
int k;
{
    int num;

    num = (int)pow((double)2,(double)k) - 2;
    return num;
}

/*********************************************************/
/* Squared Euclidean distance between two M0-dimensional */
/* cepstral vectors.                                     */
double distance(vector_a,vector_b)
double vector_a[M0],vector_b[M0];
{
    int k;
    double dist;

    dist = 0.0;
    for(k=0; k<M0; k++){
        dist += (vector_a[k]-vector_b[k])*(vector_a[k]-vector_b[k]);
    }
    return dist;
}
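For comparison, a flat search over the 128 leaf vectors costs 128
distance evaluations per input vector, where the tree search in
vector_quantize() costs only two per level, or 2 x 7 = 14 for
NLEVELS = 7. A sketch of the flat search using the same globals; the
function name is hypothetical, not part of the original program:

    int full_search()
    {
        int i,best;
        double d,dmin;
        double distance();

        best = LEVELINDX;
        dmin = 1.0e30;
        for( i=LEVELINDX; i<TOTVECT; i++ ){  /* leaves occupy rows 126-253 */
            d = distance(in_vector,codebook[i]);
            if( d < dmin ){
                dmin = d;
                best = i;
            }
        }
        return best - LEVELINDX;             /* leaf number 0-127 */
    }

Note that the tree search is greedy and may not return the true nearest
leaf; accepting that small loss of accuracy in exchange for the large
reduction in distance computations is the usual trade-off of
tree-structured vector quantization.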