_.,‘.

,, ,--, ‘..~‘:V’“'"T., 1-15.1 . ..,~ ~.
' JQIC‘? ' ' ‘ I ‘3. W O .'P' $J.‘ "‘1‘“ _ . €2.03;
'5. 1 . » ' - 4 r ‘_ . ‘{ QKRI t r
"‘1.- ' ' " ' ‘ .y ;.~Vw::5‘$'m '1'." ." v
“ P , .‘ ‘.

I1
.43!

>

xx" .55.- wow.
7"..- 5‘L\ 5‘17“?"

'.;.";’

‘wf

| —..;u .__.- .. 3;:
't. SQ}. ' :4 ’JA 0: ti. “
(ff-".3 :I-r‘m‘u- a w v .-.u ' I f- 5 ”g”,
'1“ , ,: ff? .v ri’ﬁ _‘ -.. 1... fft'v‘s
, ‘2‘, Layqﬁ‘ ' ,5} r. 1" ... . . E???” 3'“.
".143?! ~ *3"; W"! L 1.? - ., ‘43:. (not @143" 3:;
5 ,r_ .." 44R

.5

x.

.no. -y.~»‘.;~a..mi~1.‘ ‘uu.W‘-p—o.w.u.Lawninnﬂnwmsnuy» v -....

   
 
   

A“-

.x.
Wimﬁv
.m‘g“; ‘5. ﬁx":

L, Q .l‘"
5-.1-‘6é9 5": 1’
”93‘ 4‘

gt " -
.5 .
if?!) -.".‘.t;
3.72:: .-
1 3:6! 756

a:
lap}?!

”3'12". 1:
.‘ :L‘u . -'

Jig-,5 F ,3
.7: "85$: ' "gaﬁ‘g 5354‘
r. I 4 ,. .

‘
‘11":
n

z.
’7:
L‘.

,.
3!.
..

"3‘93 .

. .‘Af: -
“lax-:33“, 3’: [

.1

a; T.
b '1 V
1 £35.?"
r." ..
351:...
2 air"
. P'

u
\‘vh‘.
. f
‘ jg

t»‘
13‘

«3;

’l‘.l’
..
'yalm
‘ I
v I'. .

v.7
.o
' . "1"”.1
hunky?" , 1"” v
‘1:

‘ d c-l‘r":
C.rrf::':af
5" ”L 1'19 W25:

3'.

r ,
5., ,1 .
r; .‘ '9"

. ‘“

.. 4%.;‘4372‘.

‘« .73.:

5.
$

‘5
M

ME— .2555;

I; J»?

.. —:-

 

ENU IVERSI TY LIB

IIIIIIIIIIIIIIIIIIIIIIII|III|| II

31293 00891 9288

 

                           

 

 

 

 

 

 

 

 

 

 

 

 

 

This is to certify that the

thesis entitled

EFFICIENT DISCRETE SYMBOL HIDDEN MARKOV
MODEL EVALUATION USING TRANSFORMATION
AND STATE REDUCTION

presented by
Ross Kenneth Snider
has been accepted towards fulﬁllment

of the requirements for

Master's degree in Electrical Engineering

 

 

Emfétwbcuﬂ‘y

 

Date q 710‘“ 40

 

0-7 539 MS U is an Afﬁrmative Action/Equal Opportunity Institution

 

 

 

 

I LIBRARY
Michigan State
I L Unlverelty I

 

 

 

PLACE IN RETURN BOX to remove this checkout from your record.
TO AVOID FINES return on or betore date due.

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

_____I

 

MSU le An Affirmative Action/Equal Opportunity Institution
exIMms-OJ

EFFICIENT DISCRETE SYMBOL

HIDDEN MARKOV MODEL EVALUATION
USING TRANSFORMATION AND STATE REDUCTION

By

Ross Kenneth Snider

A THESIS

Submitted to
Michigan State University
in partial fulﬁllment of the requirements
ﬁartheckqpeecﬁ

MASTER OF SCIENCE
Department of Electrical Engineering

1990

ABSTRACT

EFFICIENT DISCRETE SYMBOL

HIDDEN MARKOV MODEL EVALUATION
USING TRANSFORMATION AND STATE REDUCTION

By

Ross Kenneth Snider

A procedure is developed for evaluating the likelihood of a hidden Markov model
(HMM) using only 0([1 - x]N T) ﬂoating point operations, where N is the number of
states in the model, T is the number of observations derived from an utterance, and
x is the degree of computational complexity reduction which can be close to unity.
Effective recognition of speech using conventional algorithms require 0(3N T) (Bakis
model) to 0(N’T) (ergodic model) operations to evaluate an HMM. Reduction in
computational complexity is required for near-real-time recognition algorithms to be
feasible on ordinary personal computers. The motivation for this reduction is the
development of a PC-based speech recognition system for disabled individuals. This
thesis will explore the tradeoﬁ' between computational complexity and recognition

rate for various speakers, vocabularies, and compression techniques.

ACKNOWLEDGEMENTS

The idea for the state space formulation and transformation contained in this thesis
was suggested by my advisor Dr. John Deller J r., who also made numerous helpful

suggestions in the research and also in the write up.

This work was supported by the Whitaker Foundation.

iii

Contents

Introduction and Background

Computational Complexity Reduction
2.1 Complexity Reduction by Transformation ................

2.2 Complexity Reduction by Compression .................

Experimental Results

3.1 Speakers and Vocabularies ........................
3.2 Data Handling and Model Training ...................
3.3 Recognition Rates before Compression .................
3.4 Eigenvalues ...... ' ..........................
3.5 Nearest Neighbor Compression ......................
3.6 “Linearized” Nearest Neighbor Compression ..............

3.7 Bottom Up Compression .........................
Conclusions

Program 1 - Wavemark

Program 2 - Cepstrum

Program 3 - Codebook

Program 4 - Quantize

Program 5 - Hmmtrain

Program 6 - “.m ﬁles”

iv

11
11
14
15
19
21
28
35

41

45

67

75

83

89

108

G Program 7 - Evalhmms 110

H Program 8 - Compress 116

List of Tables

MACON

Vocabularies 1 and 2 ............................ 12
Vocabularies 3-5. ............................. 13
Recognition Rates for Vocabularies 1 and 2. .............. l8
Recognition Rates for Vocabularies 3 and 4. .............. 18
Recognition Rates for Vocabulary 5 .................... 18

vi

List of Figures

1000-40301vath

[\DMMHHHHHHHHHH
NHOCDOOKJOBO‘erwr—‘O

Computation of state probability vector before transformation. . . . .
Computation of state probability vector after transformation ......
Transfer function of anti-aliasing ﬁlter. .................
Four state Bakis model. .........................
Four state ergodic model. ........................
Eigenvalues of transformed HMM’s from Vocabulary 1. ........
Nearest neighbor compression for Vocabulary l. ............
Nearest neighbor compression for Vocabulary 2. ............
Nearest neighbor compression for Vocabulary 3. ............
Nearest neighbor compression for Vocabulary 4. ............
Nearest neighbor compression for Vocabulary 5. ............
“Linearized” eigenvalues of transformed HMM’s .............
“Linearized nearest neighbor compression for Vocabulary 1 .......
“Linearized nearest neighbor compression for Vocabulary 2 .......
“Linearized nearest neighbor compression for Vocabulary 3 .......
“Linearized nearest neighbor compression for Vocabulary 4 .......
“Linearized nearest neighbor compression for Vocabulary 5 .......
Bottom up compression for Vocabulary 1. ...............
Bottom up compression for Vocabulary 2. ...............
Bottom up compression for Vocabulary 3. ...............
Bottom up compression for Vocabulary 4. ...............

Bottom up compression for Vocabulary 5. ...............

vii

16
17
17
20
23
24
25
26
27
29
30
31
32
33
34

1 Introduction and Background

The general goal of this research is to assist in the development of a site within
the Speech Processing Laboratory that will accept speech data from speech disabled
clients at clinical centers, and return to these persons a customized speech recognition
software package, implementable on a relatively ordinary personal computer (PC).
The precise technical form of this recognizer will depend upon a particular individual’s
needs and purposes and will not be further addressed here.

The speciﬁc goal of the work presented in this thesis is to provide a recognition
technique that will be used in the recognition software that is faster than the existing
algorithms used in evaluating discrete symbol hidden Markov models (HMM). This
research is restricted to hidden Markov modeling of isolated words, i.e. speech with
deliberate pauses between words. The reason for this is that cerebral palsy speech
is not well understood and using it in the continous speech case would involve more
unknowns than for the isolated .word case. Thus this research addresses the simpler
case, though nothing restricts the techniques reported in this thesis to the isolated
word case. Though the end goal of this research is to assist speech disabled indi-
viduals, to compare the technique in this thesis with existing methods of evaluating
HMM’s, normal speech is used.

Encouraging isolated word recognition results for speech of cerebral palsied indi-
viduals with a spectrum of articulatory abilities were recently reported [1, 2, 3]. The

reason the PC is being used is to maximize accessibility of these developments to users

with disabilities. Customized architectures or mainframe computing environments are
not generally available to the user of a prospective speech recognition device. Since
the constraint of the PC is being used, measures must be taken to speed up existing
recognition algorithms, particularly those involved in feature extraction and HMM
decoding. It is the HMM search with which this thesis is primarily concerned.

The Baum-Welsh “forward-backward” recursions (see, e.g., [4]) in evaluating an
ergodic HMM (a fully connected model) require 0(N2T) operations where N is the
number of states in the model and T is the number of observations extracted from the
given utterance. While this represents a remarkable improvement over the 0(NT)
operations required for a “brute force” procedure, it still does not allow for near-
real-time evaluation of a realistic number of HMM’s once we have the observations
representing a given utterance. For example, a ﬂoating point operation (ﬂop)1 requires
about 40 us on our laboratory’s 16 MHz, 80386-based PC equipped with a ﬂoating
point coprocessor. The evaluation of an N = 6 state HMM with respect to an
utterance represented by T = 100 observations requires about N 2T = 3600 ﬂOps, or
0.144 s. If 100 HMM’s are to be evaluated in some time slot, then the time requirement
for recognition of this single word is 14.4 s which is unacceptable. Even if a 0(3N T)
method is used, the recognition would take 7.2 s. This thesis will demonstrate a
method where 0(N2T) operations needed to evaluate an HMM can be reduced to
0([1 — n]N T) where x is the compression index and is chosen on the basis of the
desired tradeoff between computational complexity and recognition rate. Thus the
value of x is chosen on an experimental basis. If, for example, K. were chosen to be 0.9,
then in the above example l-ldN T) ﬂops would be necessary for each HMM and the

time to recognize a word would be .24 3 compared with 14.4 s for (N 2T), or 7.2 s for

 

lA ﬂop is taken to be one multiplication plus one addition.

0(3N T), ﬂops per model. This is “real-time,” given the above vocabulary size, and
further reductions are possible with faster clock speeds or other existing technologies

which are quickly becoming “ordinary,” or with further theoretical development.

2 Computational Complexity Reduction

2.1 Complexity Reduction by Transformation

In the following discourse it is assumed that the reader is familiar with hidden Markov
modeling and how the likelihood evaluation of an HMM is computed. The reader is
referred to [4, 5, 16] for more details. It should be pointed out that this thesis is
dealing with discrete symbol HMM’s as opposed to other variants such as continuous
mixture density HMM’s (see, e.g., [15]).

While it is not conventional to do so, the HMM is viewed here in a state-space-
like formulation. Suppose a model, M, has N states (1;, i = 1,2,...,N and M
discrete observation symbols 2],, lc = 1,2,. . . ,M. At discrete observation time t,
let us deﬁne the state probability vector, x(t), and the observation probability vector,

y(t), as follows:

xT(t) é [31(t) 2:2(t) xN(t)] (1)

yT(t) *- [y1(t) y2(t) WW] (2)

where, 2:;(t) = (the probability of being in q.- at discrete time t given the model
M ) = P(q,- at t | M ), and yk(t) = (the probability of generating symbol 2;, at
discrete time t given M ) = P(zk at t | M). It is not difﬁcult to see that the following

state equation and observation equation describe the dynamics of the model:

x(t + 1) = Ax(t) + u(t) 6(t) (3)

y(t) = BXU) (4)

where, A is the N x N state transition matrix associated with the HMM whose

(i,j) element, aij = P(q,- at t + 1 |qj at t) for any t; B is the M x N observation
probability matrix whose (k, j) element, bkj = P(2z;c | Qj); and u(0) is some vector
such that when x(O) is deﬁned to be zero, x(l) takes the proper initial values, with
u(t) arbitrary but ﬁnite Vt yé 0, and 6(t) the unit sample sequence. Note carefully
that if the observation symbol at time t is 2],, then only element yk(t) of y(t) need be
calculated.

To derive a more efﬁcient computational structure, we ﬁrst apply a diagonalizing

equivalence transformation (e.g., see [6, pp.146-154]) to obtain

x(t+1) = Ax(t)+ﬁ(t)6(t) (5)
y(t) Bib) (6)

where A = UPAP"1U'1 with P the usual matrix of normalized eigenvectors, U a
special diagonal matrix described below, 2(t) = UPx(t), ﬁ(t) = UPu(t), and B =
BP‘lU’l. The 2"" diagonal element of the matrix U is the reciprocal of the 2""
element of the vector Pu(0). As a consequence of this operation each element of the
excitation vector, ﬁ(t), is unity at time zero. This result will be important below.
The diagonalizing transformation has reduced the computation to 0(N T) oper-
ations per HMM since A is a diagonal matrix, and since only N ﬂops are needed to
compute a relevant yk(t). Note that the systems (3) - (4) and (5) - (6) are equivalent
and therefore this reduction has no impact on the recognition rate. It should be
pointed out, however, that in computing the state space structure, a summing pro-
cedure is used rather than a Viterbi search (see, e.g., [16]). In the method employed
here, the likelihood that the HMM produced the observation sequence (using any
sequence of states) is computed whereas in the Viterbi decoding approach, the likeli-

hood associated with the best state sequence is assigned to the model. The summing

procedure is used because it admits complexity reduction that is not possible with
Viterbi decoding. A graphical representation of the diagonalizing procedure can be
seen in Figures 1 and 2. Figure 1 shows how the state probability vector is computed
based on equation (3) before the transformation is applied. The computation of the
state probability vector after the transformation is shown in Fig. 2, which is based on
equation (5). The ag’s in Fig. 2 are the eigenvalues of the matrix A and are thought

of as the poles of the state space system describing the HMM.

 

110+!) Xi“)

  

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

n20)

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

. . . O
.
u I) IN(I+I) “Hal
Na 2 A z" _=
‘M —
‘N2
0
e
e
‘NN A

 

 

 

 

 

Figure 1: Computation of state probability vector before transformation.

 

an il(‘+l) i‘“)
2 ‘ z'I %

\ a! ‘
5(1) i2(t+ I) i2“)
———O( 2 bf ‘ 1“ 4v
0 \
0
e

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

“2

 

 

 

 

5a) iN(I+l) iNa)

 

 

 

 

 

 

 

 

 

 

 

Figure 2: Computation of state probability vector after transformation.
The ai’s are the eigenvalues of the matrix A.

2.2 Complexity Reduction by Compression

To demonstrate further computational beneﬁt, it is convenient to look at all the
HMM’s representing the words in the vocabulary at once. To do this the HMM’s are
combined into one large “state space” formulation. This is done as follows. Suppose
there are W total models or words in the vocabulary. Let us construct “global” state

and observation equations

52(t+1) = Ax(t)+ﬁ(t)6(t) (7)

W) = 3i“) (8)

in which x(t) is a vector of dimension NW consisting of the concatenation of the W

individual state vectors, y(t) is a similarly formed M W-vector, A is a block diagonal

matrix of dimension NW x NW with the individual state transition matrices, A,
running down its diagonal, and B is a. similarly formed NW x MW matrix on the
output side. fi(t) is the obvious N W-vector of ones at t = 0 and is arbitrary otherwise.

The diagonal elements of A, say 5“, are the eigenvalues of the original A matrices
or the poles of the inputoutput description of the HMM. Accordingly, these numbers
are bounded in magnitude as 0 S I 6;; | _<_ 1. Empirically, we ﬁnd that these
numbers are often real (or have very small imaginary parts), and, if they represent
“non-negligible” poles (those with signiﬁcant magnitude) of the system, they are often
clustered near 2 = 1 in the z-plane. This led us to consider whether, with some degree
of quantization of the 71,-,- coefﬁcients, some of the state variables could be combined,
eliminating the computational burden of nearly redundant states. If, for example,
6;,- z szj then, since both state variables represent similar transient responses to a
unit samplez, it is possible to eliminate fig-(t), and reduce the size of the state space.
The appropriate adjustment to the observation equation is to add column j of B
to column 2', then eliminate column j. Thus a computational savings is realized in
both the state and observation equations. Note carefully that the merging of states
may take place even if the states are from diﬂ'erent HMM’s! Several strategies for
compression will be described in Section 3.1.

We now deﬁne the compression index K. This is given as

___ AEliminated (9)

ATotal
where Agumgnated is the number of eigenvalues merged together and ATM; is the total
number of eigenvalues in all the HMM’s, i.e. Arm; is equal to NW since one eigenvalue

results from each state in each model. Notice that n can never equal unity since if

 

2Here we see the signiﬁcance of the matrix U above.

all the eigenvalues were merged there would still be one “eigenvalue” left, thus is is

bounded as

0 s ,c s M (10)
/\Total

where K. = 0 when no eigenvalues are eliminated.

Further computational efﬁciency is achieved by quantizing the “bk,” coefﬁcients,
though the complexity reduction is much less signiﬁcant than that in the quantization
of the 6;,- coefﬁcients. This is because the “bk,” coefﬁcients are not used at every
observation whereas all of the 6;; coefﬁcients are. The “bkj” coefﬁcients are used only
when the observation symbol is the k“ one. For formal reasons it is convenient to
decompose the observation equation into individual equations for each symbol, 2)..
For k =1,2,...,M wehave,

Mt) = ﬁtDtiu) (11)

where B 1. represents the W rows of B (with the column modiﬁcations indicated above)
used in computing likelihoods for 21,. These likelihoods are contained in the W-vector,
mt). D), is a square matrix whose function will become clear below. Immediately
following the state space reduction procedure, each D), is simply an identity matrix
with the dimension of the reduced state space. Further reduction is possible if the
entries in any two columns of a particular row of B1,, say in columns I and m, are the
same. It is more efﬁcient to add the corresponding state variables first, then multiply
the sum by the coefﬁcient value once. This is accomplished by changing the element
(1, m) of D). from zero to one, and the redundant “b” coefﬁcient in column m of B). to

zero. The inclusion of the matrix D], sets up an appropriately modiﬁed state vector.

10

3 ‘ Experimental Results

3.1 Speakers and Vocabularies

The speakers for the following recognition studies were two normal adult males, here-
after refered to Speaker 1 and Speaker 2. Speaker 1,2 will refer to the combined
models of Speakers 1 and 2. By combining the models for Speaker 1,2 we are dealing
with a speaker independent case whereas for Speaker 1 and Speaker 2 we are dealing
with speaker dependent cases.

The vocabularies were chosen for the following reasons. Vocabularies 2 and 4
are the Texas Instruments “TI-20” and “TI-46” Vocabularies, respectively. These are
standard vocabularies used in isolated word recognition tasks and were chosen so that
results could be compared with other research using these data bases. Vocabulary
3 was chosen to see what confusion exists among the similar sounding letters of
Vocabulary 4 (e.g. b,c,d,e,g,p,t,v,z ). Vocabulary 5 consists of words taken from
the “Grandfather passage” used by linguists and speech pathologists (see, e.g., [12])
because of the number of phonetically balanced words contained therein. Vocabulary
1 is the list of all the words contained in Vocabularies 2-5, with the exception of
the word you which is indistinguishable from the letter u. Each of the compression
experiments described below was performed for each of the three “speakers” on each

vocabulary.

11

 

Vocabularies

 

 

 

2
a ﬁve 11 swiftly one
about four nearly t two
all frock nine thinks three
an g ninety three three four
as go no two ﬁve
b grandfather o u six
beard h old usually seven
black he one v eight
buttons help p w nine
c himself q well zero
clings i r wish start
coat in repeat x stop
d is rubout y yes
dresses j 8 years no
e k seven yes go
eight 1 several yet help
enter long six 2 erase
erase m start zero rubout
ever missing still repeat
f my stop enter

 

 

Table 1: Vocabularies l and 2.

l2

 

 

Vocabularies

 

 

 

 

 

3 4 5

a one g about thinks
b two h all usually
c three i an well
d four j as wish
e ﬁve k beard years
f six 1 black yet

g seven m buttons you
b eight n clings

i nine 0 coat

j zero . p dresses

k start q ever

1 stop r frock

m yes s grandfather

n no t he

0 go u himself

p help v in

q erase w is

r rubout x long

3 repeat y missing

t enter z my

u' a nearly

v b ninety three

w c old

x (1 several

y e still

2 f swiftly

 

Table 2: Vocabularies 3-5.

13

 

3.2 Data Handling and Model Training

To train the HMM’s the follow procedure was used. First 15 utterances of each word
were recorded on analog tape using a Sony F-510 microphone and a Teac W-45OR
tape deck with a Dolby C noise reduction system turned on. The order of the ut-
terances was randomized for Vocabulary 5 whereas for Vocabulary 4 the utterances
were given in the order as listed in Table 2. A 40 megabyte hard disk ﬁle was created
and all the utterances were digitized to the file using the Metrabyte Streamer soft-
ware and DASI6F sampling board after passing through an anti-aliasing ﬁlter and
a NAD 3130 stereo ampliﬁer. The utterances were sampled at 10 Khz after being
bandlimited to 4.5 Khz. The transfer function of the bandlimiting ﬁlter used is found
in Fig. 3. Each utterance was then viewed graphically to mark the beginning and
ending point and then written to a separate ﬁle. This extraction was accomplished
by the program Wavemark which is listed as Program 1 in Appendix A and which
was developed as part of this effort. A Hamming window (see, e.g., [10]) 25.6 ms wide
was used every 12.8 ms on the speech data. From each window, six mel-frequency
cepstrum coefﬁcients were computed since this parametric representation has been
shown to give superior recognition performance over other types of parametric rep-
resentations (see, e.g., [11]). These coefﬁcients were computed using the program
Cepstrum which is Program 2 in Appendix B. A codebook of 128 symbols was then
produced using only the ﬁrst 10 utterances of each word. A separate codebook for
each speaker and also a codebook combining both speakers were produced. Pro-
gram 3 in Appendix C entitled 'Codebook was used for this purpose. The codebooks
were implemented in a binary tree fashion (see, e.g., [18]). The cepstrum coefﬁcients
were then quantized into symbol strings using the codebook. The program Quan-

tize, which was used to do the quantization, appears as Program 4 in Appendix D.

14

Hidden Markov models were then trained for each word in the vocabulary using the
ﬁrst 10 utterances (i.e. symbol strings). For each word three models were trained,
one for Speaker 1, another for Speaker 2 and the final for Speaker 1,2. The HMM’s
were trained using the conventional Baum-Welsh algorithm with appropriate scal-
ing (see, e.g., [8]). The model used for training was the Bakis model [16] with six
states, as opposed to the ergodic model which is fully interconnected. Ergodic mod-
els have been used in the dysarthic speech case (see, e.g., [2]). Figure 4 shows the
Bakis model while Fig. 5 shows the Ergodic model. The program Hmmtrain was
used in training the models. This appears as Program 5 in Appendix E. The soft-
ware package Matlab (Math Works, Inc.) was then used in conjuction with the “.m
ﬁles” trnsfhmm.m and autotrns.m to transform the HMM’s as in Section 2.1. These
“.m ﬁles” are listed under Program 6 in Appendix F. Utterances 11-15 were used to
test the models. Program Evalhmms, listed as Program 7 in Appendix G, was used
in the evaluation. The results of the evaluations are found in Section 3.3. The pro-
gram to implement the compressions described in Sections 3.5 through 3.7 is entitled

Compress and is listed as Program 8 in Appendix H.

3.3 Recognition Rates before Compression

Tables 3 through 5 list the recognition rates for the various vocabularies after the
transformation and before compression. These are the baseline recognition rates since
no loss in the recognition rate has occured due to compression. A comparison to a
study on the recognition of digits from the literature is now given. Rabiner in [16] gives
a recognition rate of 96.3% for the recognition of digits in a speaker independent case
(100 speakers). Vocabulary 2 which has the digits plus 10 other words is the closest to

this case. The speaker dependent cases of Vocabulary 2 have higher recognition rates

15

68

4th Order Bandpass Butterworth Filter

 

20 ‘m ‘ Li
I DO 0 U C! HOODED
15 -‘

 

(5'

10.. D?
U
C]
D

 

_5—4

-15 ..

 

 

 

 

T I l I “I

_,.I i I I

Log( Frequency in Hz)

Figure 3: Transfer function of anti-aliasing filter.

16

 

 

Figure 5: Four state ergodic model.

17

 

Vocabulary l

Vocabulary 2

 

Speaker 1

Speaker 2

Speaker 1,2

Speaker 1

Speaker 2

Speaker 1,2

 

 

91.79%

 

90.00%

 

81.67%

 

 

98.00%

 

97.00%

 

88.50%

 

Table 3: Recognition Rates for Vocabularies 1 and 2.

 

Vocabulary 3

Vocabulary 4

 

Speaker 1

Speaker 2

Speaker 1,2

Speaker 1

Speaker 2

Speaker 1, 2

 

90.77%

 

88.46%

 

 

82.31%

 

 

93.48%

89.13%

 

 

81.52%

 

than 96.3% because in general a speaker dependent case results in better recognition
than a speaker independent case. If we look at Speaker 1,2 which is the speaker
independent case, the recognition rate is 88.5%, which is for the digits plus 10 other
words (which results in more confusions). When the vocabulary is limited to the
digits for Speaker 1,2, the recognition rate increases to 93.0%. A likely reason why
this is lower then 96.3% is because there were fewer training samples for Speaker 1,2
than in [16]. The recognition rate for Speaker 1 (a speaker dependent case) on the

digits was 100%. This recognition rate is expected since it is a speaker dependent

Table 4: Recognition Rates for Vocabularies 3 and 4.

case as opposed to the speaker independent case.

 

Vocabulary 5

 

Speaker 1

Speaker 2

Speaker 1,2

 

 

96.97%

 

96.36%

 

92.

42%

 

 

Table 5: Recognition Rates for Vocabulary 5.

18

 

 

3.4 Eigenvalues

In Fig. 6 we see the eigenvalues of the transformed HMM’s for Vocabulary 1. Notice
that a sixth of the eigenvalues are unity. This is a result of using a six state Bakis
model which has an ending state (the ﬁnal state has probability one of jumping to
itself). In fact, any model with an ending state will have an eigenvalue of unity. The

proof of this is as follows. A Bakis model corresponds to a matrix representation of

the form

I. an an an 0 0 0 0 . . . 0 ‘
0 an an» an 0 0 0 . . . 0
0 0 an an an 0 0 0

A = (12)

0 . . . 0 0 0 0 an_g,,,_2 an_2,,,_1 an_2,,,
0 . . . 0 0 0 0 0 an-1,n_1 and,"

. 0 . . . 0 0 0 0 0 0 am, ‘

 

 

Since A represents the state transition probabilities, the sum of each row must equal

one. Therefore a,m = 1. The eigenvalues are determined by

detox — A) = o (13)

Now det(/\I — A) can be expressed as (see, e.g., [14])

det()II — A) = (A - an”) det()\I,,,, — Am.) (14)

19

 

Eigenvalues of Transformed Hidden Markov Models

 

 

 

 

l '- —1
0.8 - 'I
O 6 " ’a .—
0 s
:3
s s?
0.4 " f’i; "
: .' .-
')0’
0.2 '- :& -1
O —- -
O 50 100 150 200 250 300 350 400 450 500

Nominal - 474 Eigenvalues

Figure 6: Eigenvalues of all transformed HMM’s.
Top curve - Speaker 1,2; Middle curve - Speaker 1; Bottom curve - Speaker 2.

or (since a,m = 1),

det()\I — A) = (A — 1) det(AI.,,, — A...) (15)

where (AIM - Ann) is obtained by striking out row n and column n in (AI — A).
The eigenvalues of A are the roots of det(AI - A) = 0 which, from (15) clearly
include A = 1. Therefore each model has at least one unity eigenvalue for this model

structure.

20

3.5 Nearest Neighbor Compression

As noted in Section 2.2 the eigenvalues that are close together represent similar tran-
sient responses to a unit sample input. Thus to gain further computation savings
these eigenvalues need to be grouped together in some fashion. In this section we will
explore what happens if we combine them together in a nearest neighbor procedure
(see, e.g., [13]). By nearest neighbor we mean eigenvalues which are closest in the
euclidean sense. The nearest neighbor procedure works as follows. Say for example
we have a vocabulary of 100 words and are using a six state Bakis model. We then
have 600 eigenvalues to compress together. So the nearest neighbor procedure would
ﬁrst ﬁnd the two eigenvalues that are closest together out of all 600 and replace them
by their mean. A recognition experiment would then take place to see what happened
to the recognition rate as a result of combining these two eigenvalues. The procedure
then continues by ﬁnding the next two closest eigenvalues. Notice after this point that
a certain “eigenvalue” could be the mean of a previous combining. If so, then the new
mean would be that of the original eigenvalues represented by this “eigenvalue” and
the eigenvalue combining with it. This continues until all the eigenvalues are grouped
into one value, thus the ﬁnal value equals the mean of all 600 eigenvalues. The last
recognition experiment is then run to see the effect of replacing all the eigenvalues by
this mean.

From Section 3.4 we notice in the above example that all 100 models have an
eigenvalue that equals one. The signiﬁcance of this is that 100 eigenvalues have
nearest neighbors of distance zero and will be combined with no effective change
in value which results in no change in the recognition rate. Thus a compression of
n = 0.166 occurs with no loss in the recognition rate. Figures 7 through 11 show how

the recognition rate is affected by compression using the nearest neighbor method for

21

the Vocabularies 1-5. The solid line represents the recognition rates for Speaker 1,
the dashed line for Speaker 2, and the dotted line for-Speaker 1,2.

Looking at Fig. 7 which represents Vocabulary 1, there are 468 eigenvalues (78
words with six states each) to compress. A compression of zero means nothing has
been done to the eigenvalues while a compression of one means that all the eigenvalues
have been replaced by the mean of the 468 eigenvalues. In terms of k, a compression
of zero implies K. = O and if the 468 eigenvalues are replaced by the population means,
n: = 0.998. The reader is refered to page 9 for the deﬁnition of 1c. The most noticable
feature in Fig. 7 is that the compression has a much more detrimental affect on
Speaker 1,2 then on Speakers 1 or 2. All the recognition rates in Fig. 7 stay constant
initially because all the identical eigenvalues are being combined as discussed earlier.
The recognition rate for Speaker 2 drops about 7% after a compression of n = 0.2 and
then remains somewhat constant until a compression of k = 0.8. Speaker 1 on the
other hand just gradually degrades to the approximate 7% drop for a compression of
k = 0.8. What this means in terms of computation is that if a 7% degradation in the
recognition rate is acceptable (K. = 0.8) for Speaker 2 with this 78 word vocabulary,
the computation needed to recognize an utterance 1.28 seconds long (100 observation
symbols) would be reduced from 140,400 ﬂops (5.6 s on the 16 MHz PC for Bakis
model before transformation) to 9360 ﬂops (0.37 s on the PC). If a 25% drOp in the
recognition rate was acceptable, the compression would then be K. = 0.98. This would
further reduce the computation from the 140,400 flops to 936 flops.

In Fig. 8 for Vocabulary 2 we have 120 eigenvalues to compress. The results are

similar to Fig. 7 except for Speaker 1 where a compression of K. = 0.9 results in a

degradation of 10% in the recognition rate. Vocabulary 3 with 156 eigenvalues shown

 

3The compression index I: approaches 1 as all the eigenvalues are replaced by the population
mean and as the number of eigenvalues approaches inﬁnity.

22

Recognition vs Compression for Vocabulary 1
100 . . . m

 

 

 

 

 

 

 

 

,8

.32 r
30 ~ ‘
20 ]. _
10 - 4

o L s m 1 J
O 0.2 0.4 0.6 0.8 1 1.2
Compression

Figure 7: Nearest neighbor compression for Vocabulary 1.
Solid line - Speaker 1; Dashed line - Speaker 2; Dotted line — Speaker 1,2.

in Fig. 9 shows that the compression has a much more pronounced affect than for the
previous vocabularies, especially for Speaker 2 and Speaker 1,2. The reason for this
is that there are many letters (e.g. b,c,d,g,p,t,v,z) which are confused easily. Thus
distinct eigenvalues are apparently important for similar words.

The results of compression for Vocabulary 4 (276 eignevalues) is seen in Fig. 10.
Here we see that Speaker 1 loses only 5% in the recognition rate for a compression of
k = 0.65, but then falls off. Speaker 2 as in Fig. 9 drops off 10% but then remains
somewhat constant out to a compression of K. = 0.8 where it drops off. Speaker 1,2
does not perform well with the compression greater than k = 0.3. Vocabulary 5
(198 eigenvalues) in Fig. 11 shows that all speakers have a similar performance at a
compression of ft = 0.7. This is due to the vocabulary of words that are much more

dissimilar than the words in the other vocabularies. The dissimilar words are not

23

 

Recognition vs Compression for Vocabulary 2
100 v A v I

---------------

\--—-‘ .--
o----—---- \c.--
....--. ‘-----

80—

70-

 

50.

Recognition Rate

40 P _
30 4 ~
20 - K J

10 I .

 

 

 

 

O 0.2 0.4 0.6 0.8 1 1.2

Compression

Figure 8: Nearest neighbor compression for Vocabulary 2.
Solid line - Speaker 1; Dashed line - Speaker 2; Dotted line - Speaker 1,2.

confused as easily and get better compression than, say, in Vocabulary 3 (Fig. 9).

This is especially noticable for Speaker 1,2.

24

Recognition vs Compression for Vocabulary 3

 

 

 
   

 

 

 

100 . . ,
90 : ........... .. a
80 ._. ....................... .. q
70 - "-2. ----------------- -
:3 so - ........... ' -
g, 40 _ ......... 4.. _
3O - .1
20+ ‘
10 - .

00 0:2 0.3 0:6 03 i 1.2

Compression

Figure 9: Nearest neighbor compression for Vocabulary 3.
Solid line - Speaker 1; Dashed line - Speaker 2; Dotted line - Speaker 1,2.

25

 

 

 

 

 

 

 

Recognition Rate

- Recognition vs Compression for Vocabulary 4
100 T . . .

 

 

 

50L-

Rccogniliou Rate

BOI-

20-

 

10-

 

 

 

O 0.2 0.4 0.6 0.8 1

Compression

Figure 10: Nearest neighbor compression for Vocabulary 4.

Solid line - Speaker 1; Dashed line - Speaker 2; Dotted line - Speaker 1,2.

26

1.2

U—ﬂV— CAv—u-Civuuz

Recognition vs Compression for Vocabulary 5

 

Recognition Rate

 

 

 

 

O 0.2 0.4 0.6 0.8 1

Compression

Figure 11: Nearest neighbor compression for Vocabulary 5.

Solid line - Speaker 1; Dashed line - Speaker 2; Dotted line - Speaker 1,2.

27

1.2

PX

be

lit

in

3.6 “Linearized” Nearest Neighbor Compression

If we look at Fig. 6 we can see that the values approach 0.95 in a “logarithmic” sense.
Thus the eigenvalues that were combined ﬁrst in the nearest neighbor compression
were those close to 0.95. The question arises as to whether the eigenvalues close to 0.95
and close to each other should be combined so soon. Thus we attempt to “linearize”
the eigenvalues in an attempt to combined the eigenvalues in a more uniform fashion.

To do the “linearization” the following function is used:

A ...
Anew = Ellis—.32 (16)

where Anew is the modiﬁed eigenvalue. The new eigenvalues are then treated just as
the eigenvalues were in Section 3.5 in the same nearest neighbor fashion with identical
values combined ﬁrst. The new eigenvalues are used only for the distance measure
while the old eigenvalues are used to calculate the new means. Figure 12 shows the
result of the “linearization”.

Figures 13 through 17 show how the recognition rate is affected by compression
using the “linearized” nearest neighbor method for the Vocabularies 1-5. The solid
line represents the recognition rates for Speaker 1, the dashed line for Speaker 2, and
the dotted line for Speaker 1,2.

The ﬁgures for the “linearized” nearest neighbor compression are very similar to
the nearest neighbor case. For Speakers 1 and 2 the “linearized” method is slightly
better since it has a slightly better compression for the same recognition rate compared
with the nearest neighbor method. The ﬁgures look similar except that the ﬁgures
in the “linearized” method are “stretched” a little and this is most noticable at the
“knee,” where the recognition rate plunges. For Speaker ”1,2 the effect is just the

opposite. There is a slightly worse performance since the recognition rate falls off a

28

Value

fig)
lop
bit 5
gene
C356

neig

"Linearized" Eigenvalues of Transformed Hidden Markov Models

T I

 

0.8 - ..

 

 

 

0.6 - _
D
D
7.!
>
0.4 - q
4» /
f
’4’}
0.2 — If." -
"5/
.;.-
o ...- _
0 so 100 150 200 250 300 350 400 450 $00

Value = (IOAEigenvalue - 1)]9

Figure 12: “Linearized” eigenvalues of transformed HMM’s.

Top curve - Speaker 1,2; Middle curve - Speaker 1; Bottom curve - Speaker 2.

bit sooner than in the nearest neighbor case. To the extent that these experiments are
generalizable, this method seems to work slightly better for the speaker dependent
case and slightly worse for the speaker independent case compared with the nearest

neighbor method without linearization.

29

Recognition vs Compression for Vocabulary 1
1m V M I I f

 

 

 

 

 

 

 

.1
._§ -
'a ..........
g ... ....... _4
3o .. ..
20 " .1
10 F -
o 1 A A A A
O 0.2 0.4 0.6 0.8 l 1.2

Compression

Figure 13: “Linearized nearest neighbor compression for Vocabulary 1.

Solid line - Speaker 1; Dashed line - Speaker 2; Dotted line - Speaker 1,2.

30

Recognition vs Compression for Vocabulary 2
100 . .

 

 

 

 

 

 

 

 

 

9O - .................................... ..-- """" ~- s

so - i, a

70 _ i; s
3 60 b :1, in; I
:§ 50 I- ...-I“: q
§ 40 - -
°‘

30 - q

20 - 3 «

10 - I

o A A n l A
o 0.2 0.4 0.6 0.8 1 1.2

Compression

Figure 14: “Linearized nearest neighbor compression for Vocabulary 2.
Solid line - Speaker 1; Dashed line - Speaker 2; Dotted line - Speaker 1,2.

31

 

Recognition vs Compression for Vocabulary 3
1m r 1 V V T

 

 

 

 

 

 

9..
a -4
:3 —
5"
m _.
30 P ..
20 - . -
10 - = ~
0 . L
1

0 0.2 0.4 0.6 0. 8
. Compression

Figure 15: “Linearized nearest neighbor compression for Vocabulary 3.

Solid line - Speaker 1; Dashed line - Speaker 2; Dotted line - Speaker 1,2.

32

1.2

Recognition vs Compression for Vocabulary 4
100 v r . r m

 

 

   

 

 

 

 

2
:2
:3 ':
g
30 ..
20 -
10 -
0 .1 J 41 A A
o 0.2 0.4 0.6 0.8

Compression

Figure 16: “Linearized nearest neighbor compression for Vocabulary 4.

Solid line - Speaker 1; Dashed line - Speaker 2; Dotted line - Speaker 1,2.

33

Recognition vs Compression for Vocabulary 5
1% V V V V' r

 

..............................

9O

 
    

.........
-----------------
....................

80

70

I
'0

o--\

50»-

Rccognition Rate

40-

30]-

 

 

20)-

 

IOI-

 

)-

 

 

O 0.2 0.4 0.6 0.8 1

Compression

Figure 17: “Linearized nearest neighbor compression for Vocabulary 5.

Solid line - Speaker 1; Dashed line - Speaker 2; Dotted line - Speaker 1,2.

34

1.2

3.7 Bottom Up Compression

Bottom up compression is used to see at what point the small eigenvalues can be
neglected. Bottom up compression just means that the smallest values are aver-
aged together before the larger values, beginning with those of smallest magnitude.
The only exception is that all the identical values are ﬁrst compressed (i.e. all the
unity eigenvalues are combined ﬁrst). The bottom up compression works as follows.
First all identical values are compressed together since this compression has no effect
on recognition rate. Then the smallest two eigenvalues are replaced by their mean
followed by a recognition test to see the effect. Next the original smallest three eigen-
values are replace by their mean and a recognition test is run to see the effect. This
continues until all eigenvalues are replaced by one value representing the population
mean.

Figures 18 through 22 show how the recognition rate is affected by compression
using the botton up method for the Vocabularies 1-5. The solid line represents the
recognition rates for Speaker 1, the dashed line for Speaker 2, and the dotted line for
Speaker 1,2.

The main reason for doing the bottom up compression was to see how far the
“plateau” of the initial recognition rate could be extended out before it would drop
off. Thus, if say all the eigenvalues less than 0.8 did not contribute much, then the

recognition rate would stay the same until the eigenvalues above 0.8 were affected.
This turns out not to be the case. As can be seen in Figures 18 through 22 the
recognition rate starts dropping as soon as all the identical values are merged together,
the only exceptions are for Speaker 1 in Vocabularies 2 and 5, Speaker 2 in Vocabulary
3 and Speaker 1,2 in Vocabularies 2 and 3. In fact the nearest neighbor method did

as well or better in extending the initial “plateau.” Thus there appears to be no

35

Recognition vs Compression for Vocabulary 1

 

100 . . m ,

 

80

7O

60

50

4O

Recognition Rate

30

20

IO

 

 

 

 

0 0.2 0.4 0.6 0. 8 1
Compression

Figure 18: Bottom up compression for Vocabulary 1.
Solid line - Speaker 1; Dashed line - Speaker 2; Dotted line - Speaker 1,2.

threshold below which all the eigenvalues can be readily merged together.

36

Recognition vs Compression for Vocabulary 2

 

 

Recognition Rate

 

 

 

 

Compression

Figure 19: Bottom up compression for Vocabulary 2.

Solid line - Speaker 1; Dashed line - Speaker 2; Dotted line - Speaker 1,2.

37

1.2

Recognition vs Compression for Vocabulary 3

 

100 r . . .

 

Recognition Rate

 

 

 

 

O 0.2 0.4 0.6 0.8

Compression

Figure 20: Bottom up compression for Vocabulary 3.

Solid line - Speaker 1; Dashed line - Speaker 2; Dotted line - Speaker 1,2.

38

1.2

Recognition vs Compression for Vocabulary 4

 

 

        

 

 

 

 

1m 1 i r v T
90 -. .............. "
80 [- ...................................... _.
7o .. ., “:5 " q
3
2 6° ~ ~
:3 so - -
= b
g 40 - 4
3O - a
20 i— -I
10 - -
o 1 i t i .
O 0.2 0.4 0.6 0.8 l 1.2

Compression

Figure 21: Bottom up compression for Vocabulary 4.

Solid line - Speaker 1; Dashed line - Speaker 2; Dotted line - Speaker 1,2.

39

3.3... 55.5%».qu

Recognition vs Compression for Vocabulary 5

 

100 .

 

 

 

 

 

8

a _-
§ .
a: -
1
20 - -
10 I- q

o L 1 A 1 A

o 0.2 0.4 0.6 0.8 1

Compi-ession

Figure 22: Bottom up compression for Vocabulary 5.

Solid line - Speaker 1; Dashed line - Speaker 2; Dotted line - Speaker 1,2.

40

1.2

4 Conclusions

A procedure has been presented for evaluating the likelihood of a hidden Markov
model using only 0([1 — k]N T) ﬂoating point operations, where N is the number
of states, T is the number of observations derived from an utterance, and k, the
compression index, can be close to unity depending on the acceptable degradation in
recognition rate. This is a reduction from the 0(3N T) to 0(N2T) operations used in
existing algorithms to evaluate the likelihood of an HMM. This is signiﬁcant for the
development of a speech recognizer to be run on a PC for speech disabled individuals.

It has been found that the compression has a greater detrimental impact on the
speaker independent case than for the speaker dependent case. This is not critical
since the use of the compression technique will be used for people with cerebral palsy
on an individual basis. It has also been found that the vocabularies with words that
are dissimilar perform better under compression than vocabularies with similar words.
This is true for recognition in general. It has been observed that larger vocabularies
result in greater computational savings. This is because with larger vocabularies
there are more eigenvalues that are very close to each other and can be merged with
negligable affect on the recognition rate.

Near-real-time recognition is. possible for large vocabularies since the compression
can greatly improve the speed at which recognition takes place. This can be seen in
the following example. Suppose a vocabulary of 1000 words were used in conjunction
with a six state Bakis model, the compression index was K = 0.9, and an utterance
contained 100 observation symbols. The computation to recognize the word would
then drop from 1.8 million ﬂops to 60 thousand ﬂops. For the PC described earlier,
this would be the difference between 72 s and 7.2 s to recognize a word. This increased

speed comes at the expense of at least some recognition rate degradation however.

41

It should be pointed out, that though the recognition rate drops as the compression
is increased, if a word is misrecognized it is usually found among the words with
the highest scores in these experiments. The signiﬁcance of this can be seen if one
realizes that this technique is to be used in a system to help disabled individuals
communicate. Thus if a word is misrecognized, the correct word can be quickly found
by searching the words with the top scores in a binary fashion, as opposed to having
to resort to some other method like spelling the word out.

It is the conclusion of this thesis that the reduction in computational complexity
achieved by the transformation and compression of hidden Markov models is feasable
for the implementation of an efﬁcient likelihood evaluation to be implemented in a

speech recognition unit for disabled individuals.

42

References

[I] D. Hsu and J .R. Deller, Jr., “On the use of HMM’s to recognize cerebral palsy
speech: Isolated word case,” Proc. ICASSP ‘89, Glasgow, vol. 1, pp. 290-293,
May 1989.

[2] D. Hsu, “Computer recognition of nonverbal speech using hidden Markov mod-
elling” (Ph.D. dissertation), Northeastern University, Boston, 1988.

[3] D. Hsu and J .R. Deller, Jr., “Recognition of cerebral palsy speech using hidden
Markov modelling: Isolated word case,” Biomedical Measurement, Control, and
Informatics, in review.

[4] SE. Levinson, “Structural methods in automatic speech recognition,” Proc.

IEEE, vol. 73, pp. 1625-1650, 1985.

[5] L.R. Rabiner and RH. Juang, “An introduction to hidden Markov models,”
IEEE ASSP Magazine, vol. 3, pp. 4-16, 1986.

[6] C.T. Chen, Linear Systems Theory and Design, New York: Holt, Rinehart and
Winston, 1984.

[7] J.R. Deller, Jr. and D. Hsu, “An alternative adaptive sequential regression al-
gorithm and its application to the recognition of cerebral palsy speech,” IEEE
Trans. Circ. and Sys., vol. 34, pp. 782—787, 1987.

[8] SE. Levinson, L.R. Rabiner, and M.M. Sondhi, “An introduction to the appli-
cation of the theory of probablistic functions of a Markov process to automatic
speech recognition,” Bell Sys. Tech. J., vol. 62, pp. 1035-1073, 1983.

[9] RA. Divijver and J. Kittler, Pattern Recognition: A Statistical Approach, Lon-
don: Prentice-Hall International, 1982.

[10] A.V. Oppenheim and R.W. Schafer, Digital Signal Processing, London: Prentice-
Hall International, pp.239-250, 1975.

[11] SA. Davis and P. Mermelstein, “Comparison of parametric representations for
monosyllabic word recognition in continuously spoken sentences,” IEEE Trans.
ASSP. vol. ASSP-28, pp. 357-366, August 1985.

[12] W. Johnson, EL. Darley, and DC. Spriestersbach, Diagnostic Methods in Speech
Pathology, New York: Harper & Row Publishers, 1963.

[13] A.K. Jain and RC. Dubes, Algorithms for Clustering Data, London: Prentice-
Hall International, pp.128—130, 1988.

43

APPENDICES

A Program 1 - Wavemark

This program allows the user to view the data sampled by the streamer program by
Metrabyte. It is‘ run using the following command line arguments.

Wavemark ﬁlename.dat x

where x=0 if the ﬁle was produced by streamer, x=1 if the ﬁle was produced by
Wavemark, and x=2 if using the cerebral palsy data from Northeastern University.

45

#include <stdlib.h>
«include <graphics.h>
#include <conio.h>
#include <stdio.h>
#include <string.h>
#include <math.h>
#include <ctype.h>

long stepsize;
int inc,opt,b,e,M,maxx,maxy;
long b_index,e-index;
long FILE_SIZE,FILE-INDEX,disp1ay-size,end_val;
/* Upper case are indexes */
/* in bytes , lower case are indexes in points */
char bufferbEQOO],buffere[900];
/* there are two bytes per point (int) */
char outfilenameEBO],infilenameIBO];
char PATHISO];
int MAXVAL,INFILE_TYPE;
FILE *infilei;
main(argc,argv)
int argc;
char *argvf];
{

int graphdriver I DETECT,graphmode;
int errorcode;

int 1
char
long
void
void
void
void
void
void
void
void
void
void
void

nchar;

*stop;
filelengthC):
change_stepsize();
change-position();
change_marker()3
save-pointsZtile();
change-fileindex();
change-disp1aysize();
change-endindex();
change_path();
update-screen()3
initial-screen();
zoom_screen();

46

void listen_vord();
void exit()3

if( argc (3 1){
printf("Usage: Hordmark (binary file name)
(type of fi1e>\n\n");
printf(" For type of file enter
(0) for streamer binary file (Metrobyte)
or <1> for integer binary file.\n");
printf(" Hordmark saves files in the
integer binary file format.\n\n");
printf("Examp1e: Wordmark cpdata.dat 0\n");
exit(0);
}

strcpy(infilename,argv[1]);

INFILE-TYPE=atoi(argv[2]):

printf("infilename-%s INFILE-TYPE-%d\n"
,infilename,INFILE_TYPE);

strcpy(PATH,"\\");

FILE_INDEX'0;

display_size=50000;

if ( (infilel - fopen(infilename, "rb")) -- NULL) {
printf("fopen failed for file Zs.\n",infilename);
exit(0);

}

printf("file 13 has been opened\n",infilename);

if ( (FILE-SIZE 8 filelength(fileno(infilei))) !8 -1L) {
printf("Size of file 3 21d bytes 8 21d data points\n"

,FILE_SIZE,FILE_SIZE/2);.

}

elsef
printf("ERROR getting file size\n");

}

if( (FILE-INDEX + display_size*2) > FILE-SIZE )
display_size I FILE-SIZE - FILE_INDEX;

MAXVAL82048;

Ms(int)ceil((double)display_size/100);

printf("M-%d display-size-%ld fileindex-Zld
filesize-de\n",M,display-size,FILE-INDEX,FILE_SIZE);

47

initgraph(&graphdriver,agraphmode,"c:\\tc");
errorcode = graphresult();
if( errorcode !- 0 ) {

printf("Graphics error: %s\n",grapherrormsg(errorcode));
exit(1);

}

maxx = getmaxxC);
maxy = getmaxyC);

initial_screen();
change-fileindex(0);
update_screen();

inchar=0;
while( inchar != 120 )

{

inchar=getch();

svitch(inchar)

}

closegraphC);
exit(0);

}

{

case
case
case
case
case
case
case
case
case
case
C386
C386
case
case
case
case

43 :
45 :
61 :
62 :
63 :
64 :
65 :
66 :
67 :
68 :
75 :
77 :
102:
108:
117:
122:

default:

}

change_stepsize(1);
change_stepsize(-1);
change_marker(1);
change-marker(2);
change_endindex(-1);
change_endindex(1);

change-displaysize(-1);

change_disp1aysize(1);

change_fileindex(-1)
change_fileindex(1);
change-position(-1):
change_position(1);
save-point32file();
listen_vord();
update-screen();
zoon_screen();

break;

s
9

48

break;
break;
break;
break;
break;
break;
break;
break;
break;
break;
break;
break;
break;
break;
break;
break;

/* x

/* +
/* -
/* F3
/* F4
/* F5
/* F6
/* F7
/* F8
/* F9
/* F10
/* <-
/* ->
/* f
/* 1
/* u
/* z

*/
*/
*/
*/
*/
*/
*/
*/
*/
*/
*/
*/
*/
*/
*/
*/

/*********************************************************/
void save_points2file(){

char stringEBO],string1[80],string2E80],stringSE803,ch[2];
unsigned inbufferElOO];

int inkey,inva1,1ength,i,j,outbufferEiOO];

long index-c; '

FILE *outfile;

char *1toa()3
char *itoaC):

setviewport(0,370,300,390,0);
setbkcolor(0);

clearviewport();
setviewport(0,0,maxx,maxy,0);
strcpsttring,"Enter output file name >>");
setcolor(60);

outtextxy(5,370,string);

strcpy(stringl,"\0");
ch[1]=’\0’;
inkeyso;
while( inkey !8 27 ) /* CR II esc */
{
inkey=getch();
if( isalnum(inkey) !- 0 ll inkey -- 46){
/* alpha num or "." */
ch[0]=inkey;
strcat(stringi,ch);
/* length=str1en(string1);*/
setviewport(210,370,639,390,0);
setbkcolor(0);
clearviewportC);
setvieVport(0,0,maxx,maxy,0)3
setcolor(60);
outtextxy(215,370,stringl);
}
if( inkey -- 8 ){
length-strlen(stringl);
stringiElength-1]-’\O’;
setviewport(210,370,639,390,0);
setbkcolor(0);

49

clearviewportC);
setviewportC0,0,maxx,maxy,0);
setcolor(60);
outtextxy(215,370,string1);
}
if( inkey .. 13 ){
setviewport(0,370,639,390,0);
setbkcolor(0);
clearviewport()3
setviewport(0,0,maxx,maxy,0);
setcolor(62);
strcpsttring,"Hriting points to file ");
strcatCstring,string1);
outtextxy(5,370,string);
outfile = fopen(string1, "wb");
fseekCinfilel,FILE_INDEX,SEEK_SET);
M=(int)ceil((double)display_size/100);
index-c = 0;
for( i=0; i<M; i++){'
fread((void *)inbuffer
,sizeof(unsigned),100,infile1);
for(j=0; j<100; j++) {
if( INFILE_TYPE .. o ) {
inval = inbuffer[j1>>4;
inval -= 2048;
}

/* File structure from Metrobyte’s streamer sampling
program. The data is in the 12 MSB’s (the 4 LSB’s
contain the channel info). Thus the data is shifted
4 bits to right and 2048 subtracted to get the bipolar
format (unsigned the values range from 0-4098 for
12 bits) */

if( INFILE-TYPE -- 1 )

inval - inbuffer[j];
if( INFILE_TYPE -- 2 )

inval = inbufferEjJ-2048;
outbufferEj] - inval;

index-c++;
}
fwrite((void *)outbuffer, sizeof(int),100,outfile);
}
fcloseCoutfile);

50

}

setviewport(0,370,639,390,0);

setbkcolor(0);

clearviewport();

setviewport(0,0,maxx,maxy,0);

break;

if( inkey =8 27 ){

}
}
}

setviewport(0,370,639,390,0);
setbkcolor(0);
clearviewportC);
setviewport(0,0,maxx,maxy,0);
break;

/*********************************************************/

void change_stepsize(inc)
int inc;

{

char string[80];
if( inc -- -1 ll inc == 1 ) {
switch(stepsize){

case 1: if(inc =8 1 ) {
stepsize-10;
setcolor(56);

strcpy(string,"STEP SIZE :

100 1,000 10,000 100,000
outtextxy(10,350,string);
setcolor(62);
strcpy(string,"STEP SIZE
outtextxy(10,350,string);

}
break;
case 10: if(inc -- -1 ){
stepsize-l;
setcolor(56);

strcpy(string,"STEP SIZE :

100 1,000 10,000 100,000
outtextxy(10,350,string);
setcolor(62);
strcpy(string,"STEP SIZE

51

1 10
1,000,000");

10");

1 10
1,000,000");

1");

outtextxy(10,350,string);
}
if(inc .. 1 ) {
stepsize-100;
setcolor(56);
strcpy(string,"STEP SIZE : 1 10
100 1,000 10,000 100,000 1,000,000");
outtextxy(10,350,string);
setcolor(62);
strcpy(string,"STEP SIZE ‘100");
outtextxy(10,350,string);
}
break;
case 100: if(inc as -1 ){
stepsize810;
setcolor(56);
strcpy(string,"STEP SIZE : 1 10
100 1,000 10,000 100,000 1,000,000");
outtextxy(10,350,string);
setcolor(62);
strcpy(string,"STEP SIZE 10");
outtextxy(10,350,string);
}
if(inc 8- 1 ){
stepsize-1000;
setcolor(56);
strcpy(string,"STEP SIZE : 1 10
100 1,000 10,000 100,000 1,000,000");
outtextxy(10,350,string);
setcolor(62);
strcpy(string,"STEP SIZE 1,000");
outtextxy(10,350,string);
}
break;
case 1000: if(inc .- -1 ) {
stepsize-100;
setcolor(56);
strcpy(string,"STEP SIZE : 1 10
100 1,000 10,000 100,000 1,000,000");
outtextxy(10,350,string);
setcolor(62);
strcpy(string,"STEP SIZE 100");

52

outtextxy(10,350,string);
}
if(inc .. 1 ){
stepsize=10000;
setcolor(56);
strcpsttring,"STEP SIZE : 1 10
100 1,000 10,000 100,000 1,000,000");
outtextxy(10,350,string);
setcolor(62);
strcpy(string,"STEP SIZE 10,000");
outtextxy(10,350,string);
}
break;
case 10000: if(inc == -1 ){
stepsize=1000;
setcolor(56);
strcpy(string,"STEP SIZE : 1 10
100 1,000 10,000 100,000 1,000,000");
outtextxy(10,350,string);
setcolor(62);
strcpy(string,"STEP SIZE 1,000");
outtextxy(10,350,string);
}
if(inc .. 1 ){
stepsize=100000;
setcolor(56);
strcpy(string,"STEP SIZE : 1 10
100 1,000 10,000 100,000 1,000,000");
outtextxy(10,350,string);
setcolor(62);
strcpy(string,"STEP SIZE 100,000");
outtextxy(10,350,string);
}
break;
case 100000: if(inc .- -1 ){
stepsize-10000;
setcolor(56);
strcpy(string,"STEP SIZE : 1 10
100 1,000 10,000 100,000 1,000,000");
outtextxy(10,350,string);
setcolor(62);
strcpy(string,"STEP SIZE 10,000");

53

outtextxy(10,350,string);
}
if(inc == 1 ){
stepsize=1000000;
setcolor(56);
strcpy(string,"STEP SIZE : 1 10
100 1,000 10,000 100,000 1,000,000");
outtextxy(10,350,string);
setcolor(62);
strcpy(string,"STEP SIZE 1,000,000");
outtextxy(10,350,string);
}
break;
case 1000000: if(inc =8 -1 ){
stepsize-100000;
setcolor(56);
strcpy(string,"STEP SIZE : 1 10
100 1,000 10,000 100,000 1,000,000");
outtextxy<10,350,string);
setcolor(62);
strcpy(string,"STEP SIZE 100,000");
outtextxy(10,350,string);
}
break;
default: break;
}
}
}
/******************************************#*************/
void change-position(nxt)
int nxt;
{
char string1[10];
int oldb,olde;

char *ltoa();

if( opt =- 1 || nxt 8- 0)
{
oldb-b;
b-index ' b-index + stepsize*nxt;
if( b-index < FILE-INDEX/2 )

54

b_index 8 FILE-INDEX/2;
if( b_index > e,index)

b-index I e_index - display-size/maxx;
b=(b_index - FILE-INDEX/2)*maxx/display_size;
setviewport(200,330,300,340,0);
setbkcolor(0);
clearvievportC);
setviewport(0,0,maxx,maxy,0);
ltoa(b-index,string1,10);
setcolor(63);
outtextxy(200,330,stringl);
putimage(o1db,10,bufferb,COPY_PUT);
getimage(b,10,b,225,bufferb);
setcolor(63);
line(b,10,b,225);
}

if( opt =8 2 II nxt =8 0)
{
oldese;
e_index - e_index + stepsize*nxt;
if( e-index < b-index )
e-index - b-index + display_size/maxx;
if( e_index > FILE_INDEX/2 + display-size )
e-index I FILE_INDEX/2 + display-size;
e-(e-index - FILE_INDEX/2)*maxx/display-size;
setvievport(470,330,570,340,0);
setbkcolor(0);
clearvievportC);
setviewportC0,0,maxx,maxy,0);
ltoa(e-index,string1,10);
setcolor(57);
outtextxy(470,330,stringi);
putimage(olde,10,buffere,CDPY_PUT);
getimage(e,10,e,225,buffere);
setcolor(57)3
1ine(e,10,e,225);
}

}

l*********************************************************/

55

void change-marker(mrk)
int mrk;
{
if ( mrk ‘3 1 )
opt I 1;
if ( mrk -- 2 )
opt 8 2;

}

Ittt*****************#************************************/
void change-fileindex(nxt)
int nxt;
{
char stringE40],string1[40];
FILE-INDEXIFILE-INDEX+stepsize*2*nxt;
if ( FILE_INDEX < 0 >
FILE_INDEX=0;
if ( FILE-INDEX > FILE-SIZE - 20)
FILE_INDEX3FILE_SIZE-20;
ltoa(FILE-INDEX/2,string1.10);
setviewport(10,280,110,290,0);
setbkcolor(0);
clearviewport();
setviewport(0,0,maxx,maxy,0);
setcolor(58);
outtextxy(10,280,stringl);

end_val I FILE-INDEX/2 + display-size;
if ( end_val > FILE_SIZE/2 )
{
end-val 8 FILE_SIZE/2;
display_size - FILE-SIZE/2 - FILE-INDEX/2;
}
ltoa(end_val,string1,10);
setvievport<540,280,639,290,0);
setbkcolor(0);
clearviewportC);
setvievportC0,0,maxx,maxy,0);
setcolor(58);
outtextxy(540,280,3tring1);

56

ltoa(display-size,string1.10);
strcpsttring,"<-- ");
strcat(string,string1);
strcat(string," -->");
setviewport(250,280,390,290,0);
setbkcolor(0);

clearviewport();
setviewport(0,0,maxx,maxy,0);
setcolor(58);
outtextxy(250,280,string);

}
/*********************************************************/
void change-endindex(nxt)

int nxt;

{

char string[40],string1[40];

end_val I end_va1 + stepsizeInxt;
if ( end_val > FILE-SIZE/2 )
end-val I FILE_SIZE/2;
if ( end_va1 < FILE-INDEX/2 )
end-val I FILE-INDEX/2 + 10;
display-size I end_val - FILE-INDEX/2;
ltoa(end_val,string1,10);
setviewport(540,280,639,290,0);
setbkcolor(0);
clearviewport();
setviewport(0,0,maxx,maxy,0);
setcolor(58);
outtextxy<540,280,stringl);

ltoa(disp1ay_size,string1,10);
strcpy(string,"<-- ");
strcat(string,string1);
strcat(string," -->");
setviewport(250,280,390,290,0);
setbkcolor(0);

clearvievportC);
setviewport(0,0,maxx,maxy,0);
setcolor(58);
outtextxy(250,280,string);

57

}

/*********************************************************/
void change-displaysize(nxt)
int nxt;
{
char string[40],string1[40];
display-size I display_size + stepsize*nxt;
if ( display-size < 10 )
display_sizeI10;
if ( display-size > FILE_SIZE/2 - FILE_INDEX/2 )
display-sizeIFILE-SIZE/2 - FILE_INDEX/2;
ltoa(disp1ay_size,stringl,10);
strcpy(string,"<-- ");
strcat(string,string1);
strcat(string," -->");
setviewport(250,280,390,290,0);
setbkcolor(0);
clearvievport();
setviewport(0,0,maxx,maxy,0);
setcolor(58);
outtextxy<250,280,string);

end_val I FILE-INDEX/2 + display-size;
if ( end-val > FILE_SIZE/2 )
{
end-val I FILE-SIZE/2;
display-size I FILE_SIZE/2 - FILE_INDEX/2;
}
1toa(end-val,string1,10);
setviewport(540,280,639,290,0);
setbkcolor(0);
clearvieVport();
setviewport(0,0,maxx,maxy,0);
setcolor(58);
outtextxy(540,280,string1);

}

/*****#***#****************##********t********************/
void update_screen()

{

58

char string[80],string1[40];
int x1,x2,y1,y2,inchar;

int i,j;

long index-1;

long index_2,inc_value;
unsigned buffer[100];

int inval;

void change_fileindex();

setviewport(0,10,639,225,0)3
setbkcolor(0);
clearviewport();
setviewport(0,0,maxx,maxy,0);
setcolor(58);

fseek(infilel,FILE_INDEX,SEEK_SET);
M=(int)ceil((double)display_size/100);
if( M < 2000 ){
MAXVALIO;
for( 1-0; i<M; i++) {
fread((void *)buffor, sizeof(unsigned),100,infile1);
for(j=0; j<100; j++)
{
if( INFILE-TYPE 3' 0 )1
inval I bufferEj]>>4;
inval -- 2048;
}
if( INFILE-TYPE .. 1 )
inval I bufferEj];
if( INFILE-TYPE 3' 2 )
inval I bufferEjJ-2048;
if( MAXVAL < abs(inval) )
MAXVAL I abs(inval);

}

}

}

else
MAXVALI2048;

itoa(MAXVAL,string1,10);
strcpy(string,"Amplitude of waveform");
setviewport(220,295,390,315,0);

59

setbkcolor(0);
clearviewportC);
setviewport(0,0,maxx,maxy,0);
setcolor(2);
outtextxy(220,295,string);
outtextxy(290,305,stringl);

setcolor(58);

x1I0;

y1I120;
fseekCinfilel,FILE_INDEX,SEEK-SET);
index_1I0;
MI(int)ceil((double)display-size/100);
for( i=0; i<M; i++)

{
if( kbhit() !I 0 ) /* abort drawing if esc is entered */
{
incharIgetchC);
if( inchar II 27 ) /* esc I 27 */
break;
}

fread((void *)buffer, sizeofCunsigned),100,infile1);
for(jI0; j<1003 j++)
{
if( INFILE-TYPE II 0 )1
inval I bufferEj]>>4;
inval -- 2048;
}
if( INFILE-TYPE II 1 )
inval I bufferEj];
if( INFILE-TYPE II 2 )
inval I bufferEjJ-2048;
x2Ifloor((double)maxx
/(double)disp1ay-size*(double)index-1);
y2Ifloor(-(double)inval/(double)MAXVAL*100+120);
line(x1,y1,x2,y2);
x1Ix2;
yl=y2z
index-1++;

}

60

b_index I FILE_INDEX/2;
e_index I FILE-INDEX/2 + display_size;
change-position(0);

setcolor(63);
11ne(0,235,639,235);
line(0,0,639,0);
for( iIO; i<11; i++)

{

xi=i*64-1;

if( :1 < 0 )

x1I0;
line(x1,230,x1,235);
line(x1,0,11,5);

}
setviewport(0,240,639,260,0);
setbkcolor(0);
clearviewport()3
setviewport(0,0,maxx,maxy,0);
setcolor(63);
inc-value I display_size/10;
for( i=1; i<10; i++)

{

index-2 I floor(FILE-INDEX/2 + inc_value*i);

x1Ii*64-1;

ltoa(index-2,string1,10);

if ( 1%2 -- o )
outtextxy(xl,250,string1);

else

outtextxy<xl,240,string1);
}
}

/***********#****1**********#********#********************/

void,initia1_screen()
{
Char string[80],string1[80];

setbkcolor(56);

setcolor(2);
ltoa(FILE_SIZE/2,string1,10);
strcpsttring,"Total points");

61

outtextxy(540,295,stringi);
outtextxy(540,305,string);
setcolor(3);
strcpy(string," Begin Marker <F3 > <F4 > End Marker");
outtextxy(5,400,string);
strcpy(string," - End Point (F5 > <F6 > + End Point");
outtextxy(5,410,string);
strcpy(string,"- Display Size (F7 > (F8 > + Display Size");
outtextxy(5,420,string);
strcpy(string," - Start Point (F9 > <F10> + Start Point");
outtextxy(5,430,string);
strcpy(string," - Step Size < - > < + > + Step Size");
outtextxy(5,440,string);
strcpy(string," Move Marker <<- > < ->> Move Marker");
outtextxy(5,450,string);
strcpy(string,"<u> Update Screen");
outtextxy(380,400,string);
strcpy(string,"<z> Zoom Screen");
outtextxy(380,410,string);
strcpy(string,"<f> Save Marked points to FILE");
outtextxy(380,420,string);
strcpy(string,"<l> Listen to Marked Portion");
outtextxy(380,430,string);
strcpy(string,"<x> Exit");
outtextxy(380,440,string);
/*strcpy(string,"<c> Change directory Path");
outtextxy(380,450,string);*/
setcolor(56);
strcpy(string,"STEP SIZE : 1 10 100 1,000

10,000 100,000 1,000,000");
outtextxy(10,350,string);
setcolor(62);
strcpy(string,"STEP SIZE 1,000");
outtextxy(10,350,string);
stepsizeI1000;
opt-1;
b-index I FILE_INDEX/2;
bI(b-index - FILE_INDEX/2)Imaxx/display-size;
e_index I FILE_INDEX/2 + display-size;
eI(e_index - FILE-INDEX/2)*maxx/display_size;
getimage(b,10,b,225,bufferb);
getimage(e,10,e,225,buffere);

62

ltoa(b_index,string1,10);
strcpy(string,"Beginning point I ")3
setcolor(2);
outtextxy(50,330,string);
setcolor(63);
outtextxy(200,330,stringi);
ltoa(e_index,string1,10);
strcpy(string,"Ending point I ");
setcolor(2);
outtextxy(340,330,string);
setcolor(57);
outtextxy(470,330,string1);

}

/*********************************************************/
void zoom_screen()

{

FILE_INDEX I b_index*2;

display_size I e_index - b_index;

update-screen();

change_fileindex(0);

}

/********************************************************/

void change_path()

{

int inkey;

int length;

char string[80],ch[2];

void delayC);

setviewport(0,370,300,390,0);
setbkcolor(0);

clearviewport();
setviewport(0,0,maxx,maxy,0);
strcpy(string,"Enter directory path >>");
setcolor(60);

outtextxy(5,370,string);

strcpy(PATH,"\0")3
Ch[1]=’\0’;

63

inkeyIO;
while( inkey.!= 27 ) /* CR */

{

inkengetchC);

if( isalnum(inkey) !I 0 II inkey II 58 ll inkey II 92 )

II alpha num or "z" or "\" */

{
ch[0]Iinkey;
strcat(PATH,ch);
setviewport(410,370,639,390,0);
setbkcolor(0);
clearvievportC);
setvievport(0,0,maxx,maxy,0);
setcolor(60);
outtextxy(415,370,PATH);
}

if( inkey == 8 )
{
length-strlen(PATH);
PATHEIength-l]=’\0’3
setvievport(410,370,639,390,0);
setbkcolor(0);
clearviewportC);
setviewport(0,0,maxx,maxy,0);
setcolor(60);
outtextxy(415,370,PATH);
}

if( inkey II 13 )
{
setviewport(0,370,639,390,0);
setbkcolor(0);
clearviewport();
setvievport(0,0,maxx,maxy,0);
strcpy(string,"?ath name entered >> ");
strcat(string,PATH);
setcolor(62);
outtextxy(5,370,string);
delayClOOO);
setvievport<0,370,639,390,0);
setbkcolor(0);
clearviewportC);
setviesport(0,0,maxx,maxy,0);

64

break;

}

if( inkey II 27 ) /* esc */
{
setvievport(0,370,639,390,0);
setbkcolor(0);

clearviewportC);
setviewport(0,0,maxx,maxy,0);
break;

}

}

/*****************Ik**************************************/
void listen_word()

{
long i,j;
int port;

unsigned inbufferfi];
int inval,outnum;
long delay,total,k;
char string[80];

void outportC);

portI820; /* 816 + 4 I/ II 816 I 330B */
port-772; /* 768 + 4 */ /* 768 I 300B */
if( INFILE_TYPE II 0 )

delayI6;
if( INFILE_TYPE II 1 )

delayIS;
if( INFILE-TYPE as 2 )

delayI7;

total I e_index - b_index;
fseek(infile1,b_index*2,SEEK_SET);
for( iI0; i<total; i++)
' {
fread((void I)inbuffer, sizeof(unsigned),1,infi1e1);
if( INFILE_TYPE as 0 )
{
inval I inbuffer[0]>>4;

65

outnum I inval<<4;
}
if( INFILE_TYPE as 1 )
outnum I inbuffer[0]+2048;
if( INFILE_TYPE as 2 )
outnum I inbufferEOJ;
for( kIO; k<delay; k++);
outporthort,outnum);
}

66

B Program 2 - Cepstrum

This program computes the mel-cepstrum coefﬁcients of an input speech data ﬁle
produced by Wavemark.

67

#include <stdio.h>
#include <ctype.h>
#include <string.h>
#include <time.h>
#include <math.h>

#define W-SIZE 256

/* Hamming window size */
#define CEP-SIZE 6

/* number of cepstral coefficients */
#define FFT-SIZE 4096

/* number of points transformed real & imag I/
#define PI 3.14159265358979

int count;
double mel[21]; /* me1-frequency energy */
double mf[22]; /* mel frequency scale */
double waFFT_SIZE];

/* 256 samples plus zero padding to 4096 pts */
double cep_coef[CEP_SIZE];
double fftmag[1024];
int sdata[128],buffer[128],stemp;
FILE *outfile;

int main(){
int i,h,s,t;
long S,T;
int numread;
char infilename1[80],infilename2[80],outfilename[80];
char instring[40],numstringESJ;
FILE *infile,*file;
void exit();
void mel_cepstrum();
void mel-freq();
void mel-energy();
void vindOVC);
void fftC);

me1-freq();

strcpy(infilenamel,"dvrds.dat");
if ( (infile I fopen(infilename1, "r")) II NULL){

68

printf("fopen failed for Xs.\n",infilename1);
exit(0); }
SITIO;
vhile( fscanfCinfile,"%s\n",instring) !I EOF){
for(hI1; h<16; h++){
strcpy(infilename2,"d:\\sdata\\");
strcat(infilename2,instring);
strcat(infilename2,".d");

switch(h){
case 1 : strcpy(numstring,"01");
break;
case 2 : strcpy(numstring,"02");

break;

case 3 : strcpy(numstring,"03");
break;

case 4 : strcpy(numstring,"04");
break;

case 5 : strcpy(numstring,"05");
break;

case 6 : strcpy(numstring,"06");
break;

case 7 : strcpy(numstring,"07");
break;

case 8 : strcpy(numstring,"08");
break;

case 9 : strcpy(numstring,"09");
break;

case 10 : strcpy(numstring,"10");
break;

case 11 : strcpy(numstring,"11");
break;

case 12 : strcpy(numstring,"12");
break;

case 13 : strcpy(numstring,"13");
break;

case 14 : strcpy(numstring,"14");
break;

case 15 : strcpy(numstring,"15");
break;

default : break;

}

strcat(infilename2,numstring);

69

/*

}

if ( (file I fopen(infi1ename2, "rb")) II NULL){
printf("fopen failed for infilename2 %s\n"
,infilename2);
exit(0); }
printf("reading in data from file %s\n"
,infilename2);
strcpy(outfilename,"c:\\cdata2\\");
strcat(outfilename,instring);
strcat(outfilename,".j");
strcat(outfilename,numstring);
outfile I fopen(outfilename, "vb");
printf("Hriting cepstral coeffs. to file %s\n"
,outfilename);
SItIO;
numreadIfread((void*)buffer,sizeofCint),128,fi1e);
t+Inumread;
do {
for(iIO; i<127; i++){
sdataEi] I bufferEi]; }
numread I fread((void I)buffer
,sizeof(int),128,file);
t+Inumread;
if( numread < 128 ){
for(iInumread; i<128; i++){
bufferEiJIO;
}
}
window();
ffthf-1,2048,1); /* -1 for inverse fft */
mel_energy();
mel_cepstrum();
s++;
fwrite((void *)cep-coef
,sizeof(doub1e),6,outfile);
if( 3110 II 0 ){
printh" %d\n",s);

for(iIO; i<6; i++){

printf("cep_coef[Zd]I%f\n",i,cep_coef[i]);

}*/
}while( numread II 128 );
fclose(file);

70

}

fclose(outfile);

T+It;

S+Is;

printf("%u data points read from file %s\n"
,t,infi1ename2);

printf(" Zld global points read so far\n",T);

printf("%u cepstral vectors (Zu coeffs)
computed and written to file %s\n"
,s,s*6,outfilename);

printf(" Zld global vectors computed so far\n",S);

}

}

fclose(infile);

/**********************************************************

I This function takes 256 points from sampled data using I
I Hamming window and puts zeros in remaining positions I

I to implement a 2048 point FFT I
*sassssssssssssassssassssssssssssssassasssssssssssssassss/

void window() {

}

int i;

for (i=0; i<256; i++){

if( 1 < 128 ){
wf[iI2]Isdata[i]I(0.54-0.46Icos(2IPIIi/255));
waiI2+11so;

}else{
wf[iI2]Ibuffer[i-128]I(0.54-0.46Icos(2IPIIi/255));
wr[iI2+1]so;

}

}
for (iI512; i<4096; i++){
waiJIO; }

/*********************************#***********************/

II This routine implements a radix-2 FFT I/
[assssssssasssssssssssssssssssssssssssssssssssasssssssssss/

#define SHAP(a,b) temprI(a);(a)I(b);(b)Itempr

void fft(data,nn,isign)

71

double data[];
int nn,isign;

{

int n,mmax,m,j,istep,i;
double wtemp,wr,wpr,wpi,wi,theta;
double tempr,tempi;

nInn << 1;
1‘1:
for (iI1;i<n;i+I2) {

}

if(j>i){
SHAP(data[j],data[i]);
swAp(data[j+1],data[i+1]);
}
mIn >> 1;
while (m >- 2 && j > m) {
j -= m;
m >>I 1;
}

j+3m;

mmaxI2;
while (n > mmax) {

istepI2Immax;
theta-6.28318530717959/(isignImmax);
wtempIsin(0.5Itheta);
wpr I -2.0themprtemp;
wpiIsin(theta);
er1.0;
wiI0.0;
for (mI1;m<mmax;m+I2) {
for (iIm;i<In;i+Iistep) {
in+mmax;
temprIerdata[j]-wiIdata[j+1];
tempinrIdatan+11+wiIdata[j];
datanJIdataEi]-tempr;
data[j+1]Idata[i+1]-tempi;
data[i] +I tempr;
data[i+1] +I tempi;
}
er(wtemprr)prr-winpi+wr;
wininpr+wtemprpi+wi;

72

}

}

/**********************************************************

I This routine computes the MEL-frequencies, then computes
I the critical and energy and put these values in the same

}

mmainstep;

I array.

*********************************************************l

void mel_energy(){
int i;
int binder;

for(iIO; i<1024; i++){

fftmagEi]Ilog10(sqrt(pow(wf[iI2],2)+pow(wf[iI2+1],2)));
}

for(iI1; i<21; i++){

}
}

/********#*************************************************

I This routine the computes MEL-based cepstral coefficients
ssssassasssssssssssssstsssssssssssssssssssssssssssssssssss/

bindex I floor( mf[i-1]I0.2048 );
mel[iJIO;
while( bindexI4.8828125 < mf[i+1] ){
if( bindex*4.8828125 >I mei-l] && bindex*4.8828125
< mei] ){
mel[i] +I fftmagEbindex]
I(bindexI4.8828125-mf[i-1])/(mf[i]-mf[i-1]);
}
if( bindexI4.8828125 >I mf[i] && bindex*4.8828125
< mf[i+1] ){ ‘
mel[i] +I fftmagEbindex]
I(mf[i+1]-bindexI4.8828125)/(mf[i+1]-mf[i])3
}
bindex++;

}

void me1-cepstrum()

{

int i,k;

73

for(iIO; i<CEP_SIZE; i++){
cep-coef[i]I0.0;
for(kI1; k<I20; k++){
cep_coef[i]+Imel[k]Icos(iI(k-0.5)IPI/20);
}
}
}

[ssssssssssssssssssssssssssssssssssssssssssssssssssssssssss
I This routine computes the MEL-frequencies. I
ssssssssssassssssssssssssssssssassssssssssssssassssssssss/

void mel-freq() {

int i;

meOJIO;

mf[21]I5000;

for(iI1; i<21; i++){
mf[i]I1000I(pow(10.0,0.03705482Ii)-1);

}

74

C Program 3 - Codebook

This program produces the binary tree codebook used by Quantize. The program re-
quires the Phar Lab Dos extender supplied with the MicroWay C compiler because of
the large amount of data involved. The program is run using the following command.

run386 -vm <path> vmmdrv codebook

The -vm command means that the program is running in virtual memory (i.e. the
hard disk is being used as RAM) and vmmdrv is the virtual memory driver contained
in the directory specifed by <path>.

75

#include <stdio.h>
#include <ctype.h>
#include <string.h>
#include <time.h>
#include <math.h>

#define NUM 40000

/I maximum number of cepstral vectors
#define MSIZE 30

/* maximum number of characters in input
#define M0 6

/I Number of Cepstral coefficients
ﬁdefine dist_thresh 0.00000001

II threshold for separation of old and new means
#define NLEVELS 7

/* number of levels in binary codebook
#define THOPOHER 128

/I 2‘NLEVELS

#define CUNVTHRESH 100
/* max iteration per node

int quantENUM];

double s_vector[NUM][M0];

long total_count;

long totl,oldtot1,tot2,oldtot2;

double vectorlEMO],vector2[M0],old_vector1[M0];
double old_vector2[M0];

double vector[NLEVELS+1][THOPOHER][MOJ;

FILE Ioutfile,Ioutfile2;

mainC)

{

char outfilename[80];
void initialiZBC);
void vectorize();

strcpy(outfilename,"cdbkjdif.dat");
outfile I fopen(outfilename,"w");
outfile2 I fopen("cdbkjdif.1og","w");
printf("Start initialization...\n");

76

initializeC);
vectorize(NLEVELS);
fclose(outfile);
fclose(outfile2);

}

/*******************************************************/

void vectorize(nlevels)
int nlevels;

{

int 1 Leve1,Leve12,nt;
register int i,k;

long j;

double old_dist1,old_dist2;
double distance(),dist1,dist2;
long input_data();

void separate();

void compute-vector1();
void compute_vector2();
void perturb();
printf("Start vectorize\n");

total_count I input_data();
printf("total_countI%ld\n",tota1_count);
for(LevelIO;Level<nlevels;Level++)
{
for(jI0;j<total_count;j++)
quantEjJII2;
Level2I(int)pow((double)2,(double)Level);
for(iI0;i<Level2;i++)
{
printf("\n");
distlIdistZIO;
perturb(vectorELevel][i]);
ntIO;
up: separate(i);
nt++;
01d-dist1 I distl;
old-dist2 I dist2;
dist1Idistance(vector1,old-vector1);
dist2Idistance(vector2,old-vector2);

77

printf("levelIXu nodeIXu ntI%u\n",Level,i,nt);
printf(" dist1IXg old-dist1I%g difI%g\n",dist1,
old-dist1,fabs(dist1-old_dist1));
printf(" dist2IXg old_dist2I%g difI%g\n",dist2,
old_dist2,fabs(dist2-old_dist2));
fprintf(outfile2,"1le%u ndeu ntIXu dstiIZg
dst2IZg\n\015",Level,i,nt,dist1,dist2);
if( fabs(dist1-old_dist1) >I dist-thresh && nt
< CONVTHRESH ){goto up;}
if( fabs(dist2-old_dist2) >I dist_thresh && nt
< CONVTHRESH ){goto ups}
for(kI0;k<M0;k++){
vector[Level+1][2Ii][kJIvectorIEk];
vectorELevel+1][2Ii+1][kJIvector2Ek];

}
}
}
for(iI1;i<In1evels;i++)
{
fprintf(outfi1e,"Level Xu ....... \n",i);
for(jI0;j<(int)pow((double)2,(double)i);j++)
{
for(kI0;k<M0;k++)
fprinthoutfile,"% 14.10f ",vectorEiJEjJEkJ);
fprintf(outfile,"\n");
}
}
}

/*****************************************************/

void separate(i)
long i; {
register int j,k;
long n;
double dist1,dist2;
double sum1[M0],sum2[M0];
double distance();

tot1Itot2I0;

for(jIO; j<M0; j++){
sum1[j]I0;

78

sum2[j]I0;
}
for(nIO; n<total-count; n++){
if(quantEnJIIiI2 ll quantEnJIIiI2+1){
dist1Idistance(vector1,s_vector[n]);
dist2=distance(vector2,s_vector[n]);
if( dist1 < dist2 ){
for(kIO; k<M0; k++)
sum1[k]+Is_vector[nl[k];
tot1++;
quantEnJIiI2;
}else{
for(kIO; k<M0; k++)
sum2[k]+Is_vector[n][k];
tot2++;
quantEnJIiI2I1;
}
}

for(kIO; k<M0; k++){
old_vector1[k]Ivector1[k];
old_vector2[kl-vector2[k];
}
if(tot1!I0)
for(kIO; k<M0; k++)
vector1[k]Isum1[k]/tot1;
if(tot2!=0)
for(kIO; k<M0; k++)
vector2[k]Isum2[k]/tot2;
printf("totlled tot2Ide\n",tot1,tot2);
fprintf(outfile2,"tot1Ide tot2Ide\n\015",tot1,tot2);
}

/*********************#***#***¢**********************#****/

double distance(vector-a,vector_b)
double vector_a[M0],vector_b[M0];{
int k;
double dist;

dist-0.0;
for(kIO; k<M0; k++){

79

distIIpow((vector_a[k]-vector-b[kl).2.0);
}
dist-sqrt(dist);
return(dist);

}

[ass***s*s**assssssssss*ssssssssssssss*sssssssssssssssssss/
void perturb(vectora)

double vectoraEMO];{

int j;

for(jIO; j<M0; j++){-
vector1[j]Ivectora[j]I1.01;
vector2[j]Ivectora[j]I0.99;
}
}

[assasssssassssssssssssssssssssssssssassssssssssssssssssss/
void initialize(){
long i;

for(iIO; i<M0; 1++){
vectorlEiJI0.0;
vector2[i]I0.0;
}
for(iIO; i<NUM; i++)
quantEiJIO;
}

[asssasssssssssssasssssassassssssssssssssssasssssssssssss/
long input_data(){

int i,h,t;

long count;

char filename[80],infilenameEBO];

char instring[20],numstring[10];

FILE Iinfile,Ifile;

int numread;

double bufferESJ;

void exit();

countIO;
strcpy(infilename,"dwrds.dat");

80

printf("Cepstral vector input -
infilename I Xs\n",infilename);
if ( (infile I fopen(infilename, "r")) II NULL){
printf("fopen failed for
infilename Xs.\n",infilename);
exit(0);
}
while( fscanf(infile,"%s\n",instring) !I EOF){
for(hI1; h<11; h++){
strcpy(filename,"c:\\cdata2\\")3
strcat(filename,instring);
strcat(filename,".j");

switch(h)

case 1 : strcpy(numstring,"01");
break;

case 2 : strcpy(numstring,"02");
break;

case 3 : strcpy(numstring,"03");
break;

case 4 : strcpy(numstring,"04");
break;

case 5 : strcpy(numstring,"05");
break;

case 6 : strcpy(numstring,"06");
break;

case 7 : strcpy(numstring,"07");
break;

case 8 : strcpy(numstring,"08");
break;

case 9 : strcpy(numstring,"09");
break;

case 10 : strcpy(numstring,"10");
break;

case 11 : strcpy(numstring,"11");
break;

case 12 : strcpy(numstring,"12");
break;

case 13 : strcpy(numstring,"13");
break;

case 14 : strcpy(numstring,"14");
break;

case 15 : strcpy(numstring,"15");

81

break;

.default : break;
}
strcat(filename,numstring);
_pmode I 0x8000; /I binary mode for NDPCI/
if ( (file I fopen(filename, "r")) II NULL){

printf("fopen failed for

filename %s\n",filename);

exit(0);
}
printf("reading in data from
file 23 ..... \n",fi1ename);
tI0;
do{

numread I fread((void I)buffer,
sizeof(double),6,file);
if(numread II 0)
break;
for(iIO; i<6; i++){
s-vector[count][i] I buffer[i];
vector1[i] +I bufferEi];

}

count++;

t++;

}while( numread II 6 );

fclose(file);

printf("Zld vectors (Zld coeffs.) read from file
Zs\n",t,tI6,filename);

printf(" Zld total vectors (21d coeffs.)
read so far\n",count,countI6);

}

for(iIO; i<M0; i++){
vector[0][0][iJIvector1[i]/count;
}

return count;

82

D Program 4 - Quantize

This program is used to quantize the mel-cepstrum coefﬁcients into symbols that are
then used in program Hmmtrain. The program is run using the following command:

Quantize codebookﬁle inputwords.dat

where codebook.ﬁle is the codebook produced by program Codebook and inputwords.dat
contains the words of the vocabulary being quantized.

83

#include (std

io.h>

#include <ctype.h>

#include <str
#include <tim
#include (mat
#define MSIZE

/I maximum number of characters in input

ing.h>
e.h>
h.h>
50

#define M0 6

/I number of cepstral coefficients
#define NLEVELS 7

/* number of levels in binary codebook

#define LEVELINDX 126
/* 2‘NLEVELS-2
#define TOTVECT 254
/* number of vectors in codebook

int count;

double in_vector[H0];

double codebookETDTVECTlEMOJ;
double buffer[6];

char outfilename[80];

char codefile[80];

char wordsEMSIZE];

FILE Ioutfile;

int main(argc,argv)

int argc;

char IargVE];{

int i,j,k;

int numread;

char infil
char wrdfi

es[80],infilename1[80];
le1[80],wrdfile2[80],wrdnum[10];

FILE Iinfile1,*infile2;

int count;

void codebook-entry();

void vecto

if( argc <
printf(

printf(

r_quantize();

I 1 ){

"IIIII After program name
enter two file names IIIII\n");
"1. The first file name is the
name of the codebook.\n");

84

printf("2. The second is the
name of the file that contains the paths and\n");
printf(" names of all the data files to be
qauntized.\n\n");
printf("Example: ");
printf("lpc2 codebook.dat allfiles.dat\n",argv[0]);
exit(0);
}
strcpy(codefile,argv[1]);
strcpy(infiles,argv[2]);

codebook_entry();

for(jIO; j<M0; j++)
printf("Z 10.8f ",codebook[253][j]);
printf("\n");

if ( (infilel I fopen(infiles, "r")) II NULL){
printf("fopen failed for file Xs.\n",infilename1);
exit(0);
}
while( fscanf(infi1e1,"%s\n",words) !I EOF){
for(iI1; i<16; i++){
strcpy(wrdfile1,"c:\\cdata2\\");
strcat(wrdfi1e1,words);
strcat(wrdfilel,".j");
itoa(i,wrdnum,10);
if( i < 10 )
strcat(wrdfilel,"0");
strcat(wrdfile1,wrdnum);
if ( (infile2 I fopen(wrdfilel, "rb")) II NULL){
printf("fopen failed for file Zs.\n",wrdfile1);
exit(0);
}
strcpy(wrdfile2,"c:\\qdata2\\")3
strcat(wrdfile2,words);
strcat(wrdfile2,".q");
itoa(i,wrdnum,10);
if( i < 10 )
strcat(wrdfile2,"0");
strcat(wrdfile2,wrdnum);
if ( (outfile I fopen(wrdfile2, "w")) II NULL){

85

printf("fopen failed for file Xs.\n",wrdfile2);

exit(0);
}
countIO;
printf("reading in data from
file 13 ..... \n",wrdfi1e1);
do{
numread I fread((void I)buffer
,sizeof(doub1e),6,infile2);
if(numread II 0)
break;
vector_quantize();
count++;
}while( numread II 6 );
fclose(infile2);
fclose(outfile);

printf("Zu data points read from file %s\n"
,count,wrdfile1);

printf(" Zu Quantized vectors written to
file %s\n",count,wrdfile2);

}

}

/***************#10"!It**************************************/

/I This routine vector quantizes the computed lpc vector I/
/I with respect to the given codebook. I/
[IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII/
void vector_quantize()

{

int i,index1,index2,vq;

double idm1,idm2;

double distance();

int level_index();

indexl I 0;

index2 I 1;

idm1 I distance(codebook[indexi],buffer);

idm2 I distance(codebookEindex2J,buffer);
lerintf("index1I%d index2IXd\n",index1,index2);I/
/Iprintf(“ idmiIZf idm2I%f\n",idm1,idm2);I/

if( idm1 > idm2 )

86

index1 I index2;

for(iI1; i<NLEVELS; i++)

{

index1 I (index1 - level_index(i))I2 + leve1_index(i+1);

index2 I index1 + 1;

idml I distance(codebook[index1],buffer);

idm2 I distance(codebook[index2],buffer);
/*printf("index1I%d index2IXd\n",index1,index2);*/
/*printf(" idmIIXf idm2I%f\n",idm1,idm2);I/

if( idmi > idm2 )

. index1 I index2;

}
vq I index1 - LEVELINDX;
fprintf(outfile,"%d\n",vq);
/Iprintf(" %d\n",vq);I/

}

/sassstsssssssssssssssssssssssssssssssssssssssssssssssssss/
/I This routine calculates which vector to compare I/
/I next in the codebook once a vector index in the I/
II previous level is given I/

[IIIIIIIIIIIIIIII*IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII/
int level-index(k)

int k;

{

int num;

num I (int)pow((double)2,(double)k) - 2;

return num;

}

/*********************************************************/

double distance(vector-a,vector_b)
double vector_a[M0],vector_b[M0];{
int k;
double dist;

distI0.0;

for(kIO; k<M0; k++){
dist+Ipow((vector-a[k]-vector-b[kl),2.O);

}

distIsqrthist);

87

return(dist);

}

/asassssssssssssssssssssssssssssssssssssssssssssssssssssss/
void codebook-entry(){

FILE Iinfile3;

int i,j,k,m;

char input1[80],input2[80],input3[80],fp[6][20];

infile3 I fopen(codefile,"r");

printf("\nReading %s\n",codefile);
mIO;
for(iI1; i<INLEVELS; i++){
fscanf(infi1e3,"%s Xs Xs\n",input1,input2,input3);
printf("Zs Zs %s\n",input1,input2,input3);
for(jIO; j<(int)pow((double)2,(double)i); j++){
fscanf(infile3,"%s Is Is Is 18 %s\n"
.fp[0].fp[1].fp[2].fp[3].fp[4].fp[5]);
for(kIO; k<6; k++){
codebookEm][kJIatof(fp[k]);

}
m++;
}
}
fclose(infile3);

88

E Program 5 - Hmmtrain

This program is used to train the hidden Markov models and uses the ﬁles containing
the symbols produced by program Quantize. It is run using the command:

run386 Hmmtrain

Certain ﬁles have to be set up before Hmmtrain can be used. these are set up using
the programs Makew8 and Makewg. Makew8 is used to set up the paths to the data
ﬁles and Makewg is used to set up the batch ﬁle containing the command found above.
The listing of these programs are found following Hmmtrain.

89

#include <stdio.h>
#include <ctype.h>
#include <string.h>
#include <time.h>
#include <errno.h>
#include <math.h>

#define NUM 5000

/I total allowable number of vq indices for training I/
#define SYM 128

/* number of symbols in B matrix */
#define TNUM 500

/I number of vq indices in each utterance I/
#define MURDER 6

/I model order of the A matrix I/
#define MAXITER 150

/I maximum number of training iterations */
#define HNUM 20

/I number of training files I/
#define INIT 5

/* number of initial tries */
#define INITLIMIT 5

/I number of iterations for each try I/
#define PTH 0.02

#define EPSILON 1.06-8
#define EPSLN 1.0e-8
/Ithresho1d for bw[][l, used to keep beta from overflowI/

int
int
int
int
int
int
int

vq_data [NUM] , length [mm] , ind1 [8m] ,ind2 [SYN] ;
M;

v,w1;

indeHNUM+11§

iteration;

num_vq_indices;

scalenum;

double aEMORDER] [MURDER] ,bEMORDER] [3m] ;

double bw [MURDER] [3m] ,pEMURDER];

double ia[INIT][MORDER][MURDER],ibw[INIT][MURDER][SYMJ;
double alpha [mu] [mun] [MURDER] ,batathUM] [1mm] [MURDER] ;
double ctEHNUulfTNUM];

double prbwaUMl,prb_old[HNUMJ;

double probsum,prob_oldsum;

double initprobeNIT];

90

char input2[80];

main(argc,argV)

int argc;

char Iargi];

{

int i,j,k,n;

int prbindx;

double smallest;

int input_vq_indices();
void order-vq_indices();
void initialize_ab();
void alpha_beta_computation()s
void reestimate-ab();
void ab_matrix-output();

if( argc (I 2)

{

printf("\nCommand line arguments should be two file
names.\n");

printf(" 1. The first file name is for a file
that contains\n");

printf(" the names and paths of the
quantized data files.\n");

printf(" 2. The second file name is the
name of the HMM to be calculated.\n");

printf("Example: hmm1281r qfiles.dat hmm2.dat\n\n");

exit(0);

}

num-vq,indiceinnput_vq_indices(argv[1]);

printf("total vq indices read in is: Zd\n",num-vq_indices);
printf("Start function order-vq_indices()\n");
order-vq-indices();

printf("Start function initialize-ab()\n");
initialize-ab();

for(nIO; n<INIT; n++)
{
for(iIO; i<MORDER; i++)
{

91

for(jIO; j<MORDER; j++)
a[i] [j] I iaEn] [i] [j];
for(kIO; k<M; k++)
bwfil [k] I ibw [n] [i] [k] ;
}
for(iIO; i<WNUH; i++)
prb[i]=prb_old[i]I0;
for(kIO; k<INITLIMIT3 k'H')
{
prob_oldsumIprobsum;
for(jIO; j<HNUM; j++>
prb_old[jl=prb[j];
printf("\niteration In of trial I %u\n",k,n);
alpha_beta-computation();
/* printf("Start function reestimate_ab();\n");I/
reestimate_ab();
printf("old: ");
for(jIO; j<HNUM; j++)
printf("x 6.1f",prb_old[jl);
printf("\n");
printf("new: ");
for(jIO; j<wNUM; j++)
printf("7. 6 . 11:" ,prb [j] );
printf("\n");
printf("prob_oldsum IX 12.7f\n",prob_oldsum);
printf("probsum IX 12.7f\n",probsum);
}
for(iIO; i<MORDER; i++>
{
for(jIO; j<MORDER; j++)
ia[n] [i] [j] I 8.11] [j];
for(kIO; k<M; k++)
ibw [n] [i] [k] I bw [i] [k] 3
}
initprobEn] I probsum;
}
for(nIO; n<INIT; n++>
printf("initprobquJIZ 12.7f\n",n,initprob[n]);
smallest I initprob[0];
prbindx I 0;
for(nI1; n<INIT; n++)
if( initprob[n] < smallest )

92

{
prbindx I n;
smallest I initprobEn];
}
printf("smallest I Z 12.7f\n",smallest);
printf("Training with trial In prob was % 12.7f\n"
,prbindx,smallest);
for(iIO; i<MORDER; i++){
for(jIO; j<MORDER; j++)
a[i] [j] I iafprbindx] [i] [j] 3
for(kIO; k<M; k++)
bw[i] [k] I ibw [prbindx] [i] [k] ;
}

printf("Starting training...\n");

iteration I 0;
probsum I smallest;
for(iIO; i<wNUM; i++)

prbEiJIprb-old[i]I0;

for(iIO; i<MAXITER; i++)

/*
ls

{
prob-oldsumIprobsum;
for(jIO; j<WNUM; j++)

prb-old[leprb[jl;
printf("\niteration I Xd\n",iteration);

printf("Start function alpha_beta_computation()\n");I/
alpha-beta_computation();
printf("Start function reestimate-ab();\n");*/

reestimate-ab();
printf("old: ");
for(jIO; j<HNUM; j++)

printf("x 6.1f",prb_old[j]);
printf("\n");
printf("new: ");
for(jIO; j<wNUM; j++)

printf("! 6.1f",prb[j]);
printf("\n");
printf("prob_oldsum IX 12.7f\n",prob_oldsum);
printf("probsum IX 12.7f\n",probsum);
printf("difference I Xg\n",probsum-prob_oldsum);
if( fabs(probsum - prob-oldsum) < PTH && iteration > 15)

break;

93

iteration++;

}
printf("threshold reached ..... \n");
ab_matrix-output(argv[2]);
}

/************************¢*******************************

This function inputs the training lpc quantized vectors I
from their respective files corresponding to each I
training utterance and puts them in an array vq_data[] I
sass:assssssssssssssssssstssstssssssssssssssssssssassssss/
int input,vq_indices(infilename)

char infilename[];

{

char input3[15];

FILE Iinfile1,*infile2;

int j,k,t;

int symbnumeNUM];

H1ItIindx[0]I0;
if ( (infile1 I fopen(infi1ename, "r")) II NULL)
{
printf("fopen failed for file Zs.\n",infilename);
exit(0);
}
while( fscanf(infile1,"%s\n",input2) !I EOF)
{
if ( (infile2 I fopen(input2, "r")) II NULL)
{
printf("fopen failed for file Zs.\n",input2);
exit(0);
}
printf("reading in data from file Zs ..... \n",input2);
H1++5
kIO;
while( fgets(input3,15,infile2) !I NULL)
{
vq-data[t] I atoi(input3);
t++;
k++;
}
indx[H1]It;

94

fclose(infile2);
symbnume1-1JIk;
printf(" Zu symbols read in\n",k);
printf(" indeZu]I%u\n",W1,indx[W1])5
}
fclose(infilel);
scalenumIsymbnumEOJ;
for(in; j<W1; j++)
if( symbnumEj] <I scalenum )
scalenum I symbnumfj];
printf("The smallest file contains Zu
1pc vectors\n",scalenum);
WIH1;
return t;

}

/ssssssssssssssssassassssss*ssssssssssssssssssssssssssssss/
/I This function outputs the evaluated A and B matrices I/
/sssssassasssssssssssssssssssssssassassssssssassssasssssss/
void ab_matrix_output(outfilename)

char outfilenameE];

{

int i,j,k,l;

double bEMORDERJESYMl;

double bsum;

FILE Ioutfile;

printf("writing hmm to file Xs\n",outfilename);
outfile I fopen(outfilename,"w");

for(iIO; i<MORDER; i++)
{ /I A is being transposed to file I/
for(jIO; j<MORDER; j++>
fprintf(outfile,"%10.8f ",anJEil);
fprinthoutfile,"\n");
}
fprintf(outfile,"\n");

for(jIO; j<MORDER; j++)

for(kIO; k<SYM; k++>
b[j][k] I 0.0:

95

for(kIO; k<M; k++)
for(jIO; j<MORDER; j++)
bfj][ind1[k]] I wajlfkl;

for(jIO; j<MORDER; j++)

{
lI0;
bsum I 0.0;
for(kIO; k<SYM; k++)
{
if( bEjJEk] < EPSILON )
l I l + 1;
else
bsum I bsum I b[j1[k];
}
for(kIO; k<SYM; k++)
{
if( bEjJEk] < EPSILON )
bEjltk] s EPSILON;
else
bfjlfk] I (1.0 - (double)1IEPSIL0N)Ib[jl[kl/bsum;
}
}

for(iIO; i<SYM; i++)
{ /I B is being transposed to file I/
for(jIO; j<MORDER; j++>
fprinthoutfile,"%10.8f ",bEjJEil);
fprintf(outfile,"\n");

}
fclose(outfile);
}
[sasssasastsssstssssssssssssssssssssssssssssssssssassssss/
/I This function reestimates the A and 8 matrices I/
/I using the Baum - Helch reestimation formulas. I/

/IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII/
void reestimate-ab()

{
int i,j,k,l,t,s;
int iw;

96

double bsum;

double xi[WNUM][MORDER][MURDER];
double thHNUMJEMORDERJESYM];

double gammaEWNUM][MORDER];

double temp1,temp2;

double a2[MORDER],b2[MORDER],tempsum;
double end;

for(iwIO; iw<W; iw++)
for(iIO; i<MORDER; i++)
{
gammaEiw] [i] I 0.0;
for(jIO; j<MORDER; j++)
xifiwlfilfj] I 0.0;
for(jIO; j<M; j++)
bt[iw][i][j] I 0.0;
}
for(iwIO; iw<H; iw++){
for(iIO; i<MORDER; i++){
for(jIO; j<MORDER; j++){
tIO;
for(inndeiw]; k<indx[iw+1]-1; k++){
xi[iw][i][j] +I alphaEiwlftlfilIaEilfjl
Ibw[j][ind2[vq_data[k+1]]]IbetaEiw][t+1][j];
t++;
}
}
tI0;
for(inndeiw]; k<indx[iw+1]-1; k++){
gamma [iw] [1] II alphafiw] [t] [i]
IbetaEiwlftlfillctfiwl[t];
t++;
}
}
}

lIfor(iwI0; iw<H; iw++)
for(iIO; i<MORDER; i++)
{
tempsumIO;
for(jIO; j<MDRDER; j++>
tempsum +I xifiwlfilfj];

97

printf("gammaEXulEXulIZg summed over j
xiEZu][Xu]I%g\n",iw,i,gamma[iw][i],iw,i,tempsum);

}*/

/Iprintf("Begin A calculations\n");*/
for(iIO; i<MORDER; i++)
{
a2[i]I0;
for(jIO; j<MORDER; j++)
{
temp1Itemp2I0;
for(iwIO; iw<H; iw++){
temp1 +I xi[iw][i][j];
temp2 +I gammafiwlfil;
}
a[i][j] I temp1/temp2;
a2[i] +I afi] [j];
printf("%6.5f ",afilfjl);
}
printf("\n");
}
for(iIO; i<MORDER; i++)
printf("a2EZuJIZg ",i,a2[i]);
printf("\n");
for(iIO; i<MORDER; i++)
for(jIO; j<MORDER; j++)
afilfj] I a[i][j]/a2[i];

for(iwIO; iw<H; iw++)
{
end I indeiwIll-indein-l;
for(jIO; j<MORDER; j++)
{
gammaEiw][j] +I alphaEiw][end][jl
Ibeta[iw][end][j]/ct[iw][end];
}
for(jIO; j<HORDER; j++>
for(kIO; k<H; k++){
tI0;
for(lIindx[iw]; l<indx[iw+1]; l++){
if( ind1[k] II vq_data[l] ){

98

btfiwlfjlfkl +I alphafiwlftlfjl
sbetaEinEt][jllcttivltt];
}
t++;
}
}
}
/Iprintf("Begin B calculations\n")3*/
for(jIO; j<MORDER; j++>
{
b2[j]I0;
for(kIO; k<M; k++)
{
temp1Itemp2I03
for(iwIO; iw<H; iw++){
temp1 +I bt [iw] [j] [k] 3
temp2 +I gammafiwlfjl3
}
bw[j][k] I temp1/temp2;
b2[jl +I bVEjlfkl3
}
printf("b2[%u]I%g ".j3b21jl);
}
printf("\n");

printf("%s\n",input2);

for(jIO; j<MDRDER; j++)
for(kIO; k<M; k++)
bwfjlfk] a bw[j1[k1/b2[j];

for(jIO; j<MORDER; j++)
{
lIO;
bsum I 0.0;
for(kIO; k<M; k++)
{
if( bw[j][k] < EPSLN )
1 I l + 1;
else
bsum I bsum + bwfjlfkl3
}

99

}

for(kIO; k<SYM; k++)
{
if( bwfjlfk] < EPSLN )
bwtjltk] a EPSLN;
else
bw[j][k] I (1.0 - (double)lIEPSLN)Ibw[j][kJ/bsum;
}

/**********************#******************************/

/*
/*
/*

This function calculates alpha and beta of the I/
forward-backward procedure and does it for */
each training utterance. I/

/ssssssssssssssssssssssssssssssssssssssssssssassssssss/
void alpha-beta_computation()

{

int i,j,k,m,t,t2;

int iw;

double sumalpha;
for(iwIO; iw<H; iw++)

/*

{
for(jIO; j<MORDER; j++) I Computing alpha I/
alpha[iw][0][j]I p[j]Ibw[j][ind2[vq_data[indx[iw]]l]3
sumalpha I 0;
for(jIO; j<MORDER; j++)
sumalpha +I alpha[iw] [0] [j] ;
if( sumalpha II 0 )
{
printf("sumalph I 0 iwIXu tI0\n",iw);I/
ct[iw][0]I1;
}
else
ct[iw][0] I 1/sumalpha;
for(jIO; j<MORDER; j++)
{
alpha[iwl[0][j] II ctfinEOJ;
}

t I 0;
for(inndxfiw]; k<indx[iw+1]-1; k++)
{

100

for(jIO; j<MORDER; j++){
alpha[iw][t+1][j] I 0.0;
for(iIO; i<MORDER3 i++)
alpha[iw][t+1][j] +I alpha[iw][t][i]Ia[i][j];
alpha[iw][t+1][j] *- waj][ind2[vq_data[k+1]]];
}
sumalpha I 0;
for(jIO; j<MORDER3 j++)
sumalpha +I alpha[iw][t+1][j]3
if( sumalpha II 0 ){
printf("sumalpha I 0 iwIZu t+1I%u\n",iw,t+1);I/
ct[iw][t+1] I 1;
}
else
ct[iw][t+1] I 1/sumalpha;
for(jIO; j<MORDER; j++) {
alpha[iw][t+1][j] II ct[iw][t+1];
}
t++;
}
t2It;
for(iIO; i<MORDER-1; i++)
beta[iw][indx[iw+1]-1][i] I 0.0;
beta[iw][indx[iw+1]-indx[iw]-1][MURDER-1] I 1.0;
betaEiw][indxfiw+1]-indx[iwl-1][MURDER-1] *s ctEiw][t2]3
t2--;
tIindeiwIIJ-indein-2;
for(inndeiwI1J-2; k>Iindx[iw]; k--)
{
for(iIO; i<MORDER; i++){
betaEiw][t][i] I 0.0;
for(jIO; j<MORDER; j++){
beta[iw][t][i] +s atiJEjJ
Iwaj][ind2[vq_data[k+1]]]Ibeta[iw][t+1][j];
}
}
for(iIO; i<MORDER; i++){
betaEinEtJEi] Is ct[iw][t2];
}
t“3
t2--;
}

101

}

for(iwIO; iw<W; iW++)

prb[iw]I0;

probsumIO;
for(iwIO; iw<H; iw++)

}

{

for(jIO; j<indx[iw+1]-indx[iw]3 j++)
prb[iw] +s log10(ct[iw][j]);

probsum +I prbfivl3

}

/**************************************************#******l

ls
/*
[I
/*
/*
[s
[s

This function finds and orders the vq indices that
occur in the training phase so that only these vq
indices get any type of probability of occuring,
so when we randomize the B matrix only the vq
indices that have occured received random values.
The others are set to value EPSILON later

in the program

*/
I/
*/
I/
II
I/
I/

/********************#************************************/

void order_vq_indices()

{

int f,i,j,k;

MI1;
ind1 [0] qu-data [0] 3
for(iI1; i<num_vq-indices; i++)

{
fIO;
for(jIO; j<M; j++)
{
if( vq-data[i] < ind1[j] ){
for(kICM-l); k>Ij; k--)
ind1[k+1]Iind1[k]3
ind1[j] s vq_data[il3
MIMI1;
fI1;
break;
}
if( vq-data[i] II ind1[j] ){
fI1;

102

break;

}

}

if( f == 0 )
{
ind1[M]=vq_data[il3
MIM+1;
}

}

printf("distinct vq indices: MI%d\n",M);

for(iIO; i<M; i++)
ind2[ind1[i]]Ii;
/I this "compresses" the indices from 0 to M I/
/I instead of using the symbols as indexes I/

}

/********assasstssssssssssssss*ssssstssssssssss*sss/
/* This function sets the values of the A and B I/
/I matrices to initial random values and then */
/* normalizes the rows of A and B */
/*ssss***ssssssssssssssssssssssssssssssssssssssasss/
void initialize-ab()

{

int i,j,k,n;

double suma,sumb,sump;

for(nIO; n<INIT; n++)
for(iIO; i<MORDER; i++)
/* Randomizing the A and B matrices */
{
for(jIO; j<MORDER; j++){
if( j>Ii as j<i+3 )
ia[n][i][j]I(double)rand();
else
iafnlfilfle03
}
for(kIO; k<M; k++>
ibw[n][i][kl-(double)rand();
}
p[0]I1;
for(iI1;i<MORDER;i++) /I state 0 to 1 since for this hmm I/

103

p[i]I0; /* model we only start at state 0 */
sump I 0.0; ‘
for(iI0;i<MDRDER;i++) /INormalizing p (not really needed I/

sump I sump + p[i]; /I but included for generality) I/
for(iIO; i<MORDER; i++)

pEilefiJ/sump;
for(nIO; n<INIT; n++)
for(iIO; i<MORDER; i++) /I Normalizing the rows I/
{ /I of A and B I/
sumaIsumbIO;

for(jIO; j<MORDER; j++)
suma I suma + ia[n] [i] [j];
for(kIO; k<M; k++)
sumb I sumb + ibwfn] [i] [k];
for(jIO; j<MORDER; j++>
iaEnJEiJEj] I iafnlfilfjl/suma;
for(kIO; k<M; k++)
iwan][i][k] s ibwthEiJEkllsunb;
}
/Ifor(iI0; i<MORDER; i++)
{
for(kIO; k<M; k++)
printf("Zg ",bwfi][kl)3
printf("\n");
}*/

104

Listing of program Makew8

#include <stdio.h>
#include <string.h>
#include <math.h>
main(argc,argv)

int argc;

char Iarng];

{

int k,m,s;

FILE Ioutfile,Iinfile;
char infilename[80];
char numstringES];
char prefix[30],prefix1[30];
char words[192][30];

strcpy(infilename,argv[1]);
printf(“infilenameI%s\n",infilename)3
if ( (infile I fopen(infilename, "r")) II NULL)
{
printf("fopen failed for file Xs.\n",infilename);
exit(0);
}
while( fscanf(infile,"%s\n",prefixi) !I EOF)
{
strcpy(prefix,"c:\\fdata\\")3
strcat(prefix,prefix1);
strcat(prefix,".dat");
outfile I fopen(prefix,"w");
printf("%s\n",prefix);
for( mI1; m<11; m++)
{
strcpy(prefix,"c:\\qdata\\")3
strcat(prefix,prefix1);
strcat(prefix,".q");
itoa(m,numstring,10);
if( m < 10 )
strcat(prefix,"0");
strcat(prefix,numstring);
fprintf(outfile,"%s\n".prefix);
strcpy(prefix,"c:\\qdatars\\")3
strcat(prefix,prefix1);
strcat(prefix,".q");

105

if( m < 10 )
strcat(prefix,"0");
strcat(prefix,numstring)3
fprintf(outfile,"%s\n",prefix)3
}
fclose(outfile);
}

106

Listing of program Maker

#include <stdio.h>

#include <string.h>

#include <math.h>

main(argc,argv)

int argc;

char Iargv[];

{

int i,k,m,n,t1,t2,s;

FILE Ioutfile,Iinfile;

char outfilename[80],infilenamef80],conv1[1],conv2[2],inputE40];
char number[10],prefix2[45],prefix3[45];

char prefix[75],prefix1[30]3

char filename1[20],filename2[20],filename3[20];

strcpy(infilename,argvf1])3
strcpy(outfilename,argVE2]);

if ( (infile I fopen(infi1ename, "r")) II NULL)
{
printf("fopen failed for file Xs.\n",infilename);
exit(0);
}
outfile I fopen(outfilename,"w");
sIO;
while( fscanf(infile,"%s\n",prefixl) !I EOF){
strcpy(prefix,"run386 hmm1281r c:\\fdata\\")3
strcat(prefix,prefix1);
strcat(prefix,".dat ");
strcat(prefix," c:\\hmmcomb\\");
strcat(prefix,prefix1);
strcat(prefix,".hmm");
fprinthoutfile,"%s\n",prefix);
}
fclose(infile);
fclose(outfile);
}

107

F Program 6 - “.m ﬁles”

These “.m ﬁles” are used in conjuction with the software program Matlab (Math
Works Inc.) to diagonalize the HMM’s. The “.m ﬁle” trnsfhmm.m does the actual
transformation. The “.m ﬁle” autotrns.m reads in the HMM’s, uses the “.m ﬁle”
tmsfhmm.m to do the transformation, and then saves the transformed HMM. The
listing of autotms.m gives an example for transforming the HMM’s (a, about, all, &
an)

Listing of “.m ﬁle” tmsﬂzmm.m

function [a1,a2,c2]Itrnsfhmm(a,c);

[q,a2]Ieig(a);

a1Ieig(a);

nI[1;zeros(5,1)];

bIaIn;

b1=inv(q)*b3

c1Ich;

dIeye(6);

for iI1:6
d(i,i)Ib1(i);

end

c2Ic1Id;

108

Listing of “.m ﬁle” autotms.m

load c:\hmmcomb\a.hmm
[a1,a2,c2]Itrnsfhmm(a(1:6,:),a(7:134,:));
hmmI[a2;C2l3
eigval=[eigval;a1];
save c:\hmmcomb\a.hmt hmm lascii /double
clear a a1 a2 c2 i hmm

load c:\hmmcomb\about.hmm
[a1,a2,c2]Itrnsfhmm(about(1:6,:),about(7:134,:));
hmmI[a2;c2];
eigval-[eigva1;all;
save c:\hmmcomb\about.hmt hmm /ascii /double
clear about a1 a2 c2 i hmm

load c:\hmmcomb\all.hmm
[a1,a2,c2]Itrnsfhmm(all(1:6,:),all(7:134,:));
hmmI[a2;c2];
eigvalIEeigval;a113
save c:\hmmcomb\all.hmt hmm /ascii /doub1e
clear all a1 a2 c2 i hmm

load c:\hmmcomb\an.hmm
[a1,a2,c2]Itrnsfhmm(an(1:6,:),an(7:134,:));
hmmI[a2;c2]3
eigvalIEeigval;a1];
save c:\hmmcomb\an.hmt hmm lascii /doub1e
clear an a1 a2 c2 i hmm

save eigvalcm.dat eigval lascii [double

109

G Program 7 - Evalhmms

This program is used in the Recognition experiments to evaluate the likelihoods of
the HMMs.

110

 

‘L'Illh

 

#include <stdio.h>
#include <ctype.h>
#include <string.h>
#include <time.h>
#include <errno.h>
#include <math.h>

#define SYN 128
/I number of symbols in B matrix */
#define TNUM 200

/I number of vq indices in each utterance I/
#define MURDER 6

II model order of the A matrix */
#define WNUM 79

int dataETNUM];

double hmmaEHNUM] [MURDER] [MURDER] ,hmmbEHNUM] [MURDER] [51m] ;
char wordsEHNUM][10];

int H,mt;

int main(argc,argv)
int argc;
char IargVEl3f
char infilename[80]3
char outfilenamef8013
char input3[15];
char prefix[80],prefix9[80];
FILE Ioutfile,Iinfile2;
int i,j,k,hmm,p,n,n1,n2,t,tot;
double probEWNUM13
double computation();
double greatest[8]3
int great-index[8]3
void input_vq_symbols()3
void input-hmms()3
void exit()3
int atoi()3

strcpy(infilename,argv[1])3

111

strcpy(outfilename,argVE2l)3

input_hmms(infilename);
for(jIO; j<w; j++)
printf("%s\n",words[j]);

pItotIO;
outfile-fopen(outfilename,"w")3
for(nIO; n<N; n++){
for(iI11; i<16; i++){
switch(i){
case 11 : strcpy(prefix9,"11"); break;
case 12 : strcpy(prefix9,"12"); break;
case 13 : strcpy(prefix9,"13"); break;
case 14 : strcpy(prefix9,"14"); break;
case 15 : strcpy(prefix9,"15"); break;
default : break;
}
strcpy(prefix,"c:\\qdata\\")3
strcat(prefix,words[n]);
strcat(prefix,".q");
strcat(prefix,prefixQ);
if ( (infile2 I fopen(prefix, "r")) II NULL){
printf("fopen failed for file Xs.\n",prefix);
exit(0);
}
tIO;
while( fgets(input3,15,infile2) !I NULL){
dataEt] I atoi(input3);
t++;
}
mtIt;
fclose(infile2);
for(hmmIO; hmm<U; hnn++){
probEhmm] I computation(hmm);
}
for(n1I0; n1<8; n1++){
greatestEnl] I prob[n1];
great_index[n1] I n1;
}
for(kI1; k<w; k++){
for(n1I0; n1<8; n1++){

112

}

}
1

if ( greatest [n1] < prob[k] ){

for(n2I8; n2>n1; n2--){
greatestCn2] I greatestEn2-1];
great_index[n2] I great_index[n2-1];

}

greatestEnl] I probEk];

great-index[n1] I k;

break;

}
}
if( n II great_index[0] )1
printf("XBs - Z2u ---> ZBs\n"
,wordsEn],i-11,words[great_index[0]]);
fprintf(outfile,"%88 - Z2u ---> Z8s\n"
,wordsEn],i-11,wordsEgreat_index[0]]);
p++;
}else{
printf("ZSs - X2u ---> 28s MISCLASSIFIED\n"
,wordsEn],i-11,words[great_index[0]]);
fprintf(outfile,"%88 - X2u ---> Z88
MISCLASSIFIED\n",wordsEn],i-11
,wordsfgreat_index[0]]);
for(n1I0; n1<8; n1++){
printf(" prob[ZBs]I%f\n"
,wordsfgreat_index[n1]],greatestEnIJ);
fprintf(outfile,"prob[%8s]I%f\n"
,wordsEgreat-index[n1]],greatestEnIJ);

}

tot++;

printf("recognition rate I X 5.2f percent\n"

,(double)p/(double)totI100)3

fprintf(outfile,"recognition rate I X 5.2f percent\n"

,(double)p/(double)totI100);

fclose(outfile);

/**********************************************************

This function inputs the training lpc quantized vectors I
from their respective files corresponding to each I

113

 

_..._. if“

 

training utterance and puts them in an array data[] I
/*****sssssssssassssassssssssssssss*sssssssssasassasasssas/

void input_hmms(infilename)

/*

char infilename[l;{

char input2[80]3

char prefix[80],in[6][40]3
FILE *infilel,*inf1162;
int i,j,n;

double extractword();

strcpy(infilename,"ti20.dat"); */

if ( (infile1 I fopen(infi1ename, "r")) II NULL){
printf("fopen failed for file Zs.\n",infilename);
exit(0);

nIO;
while( fscanfCinfilel,"%s\n",input2) !I EOF){
strcpy(wordsEn],input2);
strcpy(prefix,"c:\\hmmjd\\")3
strcat(prefix,input2);
strcat(prefix,".hmt");
if ( (infile2 I fopen(prefix, "r")) II NULL){
printf("fopen failed for file Zs.\n",input2);
exit(0);
}
printf("reading data from file Xs\n",prefix);
for(iIO; i<MURDER; i++){
fscanf(infile2,"%s Zs Is 28 Is %s\n"
.inEOJ ,infil ,inf2l ,inf3] 311114] ,inESJ);
for(jIO; j<MORDER; j++){
hmmafnl [j] [i] I atoi(infjl);
}
}
for(iIO; i<SYM; i++){
fscanf(infile2,"%s Zs Is Is 18 Xs\n"
.infol,in[1].in[2].in[3].in[4],inf5])3
for(jIO; j<MURDER; j++)
nnnbEnJEjJEi] a atof(in[jl)3

}
fclose(infile2);
n++;

if( n II R ){

114

fclose(infilei);

break;
}
}
fclose(infileI);
HIn;

}

/*****s*sssssss***sss*s***¢t*ttsttsssssssssss***sssssssss/

double computation(h1)
int h1;{
int j,t;
double ct,sum;
double xEMORDER];

for(jIO; j<MORDER; j++)
x[j]=13
sum I 0;
ctI0;
for(tIO; t<mt; t++){
for(jIO; j<MURDER; j++){
x[jl s- hnnn[h1][j][j];
}
ctI0;
for(jIO; j<MORDER; j++){
ct +I hmmbEhl][jJEdataEtJJIxfj];

}

sum +I log10(ct);
}
return sum;

115

H Program 8 - Compress

This program is used in combining the eigenvalues to implement the compression
techniques.

116

#include <stdio.h>
#include <ctype.h>
#include <string.h>
#include <time.h>
#include <errno.h>
#include <math.h>

#define SYM 128
/* number of symbols in B matrix */
#define TNUM 100
/* number of vq indices in each utterance I/
#define MURDER 6
/* model order of the A matrix */
#define HNUM 79

int ldataEWNUM][1013

int qdataEHNUlelolfTNUMl3

double hmnamuun] [MURDER] ,hmmbWNUM] [MURDER] [SYM] ;
double eigsum [HNUMIMURDER] , eigval [HNUMIMURDER] ;
int eigindeHNUMIMURDER],eigcnt[HNUMIMURDER];

int eigvalndfoNUMIMURDER];

char words[HNUM][10];

char infilename[80];

int H,mt,T;

int main(){
FILE Ioutfile;
int i,j,C3
double rate;
double computation();
void input_vq_symbols()3
void input_hmms();
void input_qdata()3
void qs()3
void exit()3
void quantize()3
void initialize();
int atoi()3
double recognition();

117

}

strcpy(infilename,"dwrdsnu.dat");
outfileIfopen("outcm1bu.log","w");

input_hmms();
input-qdata();
initialize();
for(jIO; j<w; j++>
printf("Xs\n",words[j]);
qs(eigsum,eigindx,0,T-1);
qs(eigval,eigvalndx,0,T-1);
rateIrecognition();
printf("23d rateI%f\n",T,rate);
fprintf(outfi1e,"%3d %f\n",T,rate);
cIO;
for(iIO; i<T-1; i++){
if( eigsumfi] II eigsum[i+1] ){
c++;
}
}
for(iIO; i<c; i++){
printf("13d rateIXf\n",T-i-1,rate);
fprintf(outfile,"%3d %f\n",T-i-1,rate);
}
quantize();
while( c < T-1 ){
quantize();
quantize();
quantize();
rateIrecognition();
CIO;
for(iIO; i<T-1; i++){
if( eigsumEi] II eigsumfi+11 ){
c++'

}
}
printf("23d rateI%f\n",T-c,rate);
fprintf(outfile,"%3d Xf\n",T-c,rate);

/*******************************************************/

118

double recognition() {
int i,k,hmm,p,n,tot;
double rate;
double probeNUM];
double computation();
double greatest;
int great_index;
void input_vq_symbols();
void input_hmms();
void qs();
void exit()3
int atoi()3
pItotIO;
for(nIO; n<H; n++){

for(iIO; i<5; i++){

}
}

for(hmmIO; nnm<w; hmm++){
probEhmm] I computation(hmm,n,i);
}
greatest I prob[0]3
great_indexI0;
for(kI1; k<H; k++){
if( greatest < probfkl ){
greatest I probEkl3
great-indeka;
}
}

if( n II great-index ){
p++;

}

tot++;

rateI(double)p/(double)totI100;
/I printf("recognition rate I X 5.2f percent\n",rate);I/
return rate;

}

/sssssssasssssssssssssssssssssssssssssssssssassssssssssss/
void input_qdata(){

char input3[15];

char prefix[80],prefix9[10];

FILE Iinfile2;

int i,k,n,t;

119

int atoi();
for(nIO; n<H; n++){
kIO;
for(iI11; i<16; i++){
switch(i){
case 11 : strcpy(prefix9,"11"); break;
case 12 : strcpy(prefix9,"12"); break;
case 13 : strcpy(prefix9,"13"); break;
case 14 : strcpy(prefix9,”14"); break;
case 15 : strcpy(prefix9,"15"); break;
default : break;
}
strcpy(prefix,"c:\\qdata\\")3
strcat(prefix,words[n]);
strcat(prefix,".q");
strcat(prefix,prefixQ);
if ( (infile2 I fopen(prefix, "r")) II NULL){
printf("fopen failed for file Zs.\n",prefix);

exit(0);
}
printf("reading in data from
file 2s ..... \n",prefix);
tI0;

while( fgets(input3,15,infile2) !I NULL){
qdatafnlfklft] I atoi(input3);
t++;

}

ldata [n] [k] '13 3

fclose(infile2);

k++;

strcpy(prefix,"c:\\qdatars\\")3

strcat(prefix,words[n]);

strcat(prefix,".q");

strcat(prefix,prefixQ);

if ( (infile2 I fopen(prefix, "r")) II NULL){
printf("fopen failed for file Zs.\n",prefix);

exit(0);
}
printf("reading in data from
file Xs ..... \n",prefix);
tI0;

while( fgets(input3,15,infile2) !I NULL){

120

qdataEn][k][t] I atoi(input3);
t++;

}

ldata [n] [kl It 3

fclose(infile2);

k++ '

9

}

/*********************************************************/

void quantize(){
int i,j,n,indx;
double tempsum,dist,prvdist;
int tempcnt;

distIO;
for(iIO; i<T-1; i++){
prvdist I dist;
dist I eigsum[i+1]/eigcnt[i+1] - eigsumEi]/eigcnt[i];
if( dist !I 0 kt prvdist II 0){
tempsum I eigsum[i+1] + eigsumEi];
tempcnt I eigcnt[i+1] + eigcntEi];

indei;
break;
}
}
for( iIO; i<indx+2; i++){
eigsum[i]Itempsum;
eigcntEilItempcnt;
}

for(iIO; i<T; i++){
nI(int)floor((double)eigindx[i]/6.0);
jIeigindXEiJ-nI6;
hmmaEn][jJIeigsumEiJ/eigcntfi];
}
}

[sassssssssssasssssssasasssssssssssssassssssssssssssssssss/
II Alternate quantization procedure

void quantize(){

121

int i,j,n,indx;
double tempsum,dist,prvdist;
int tempcnt;

distIO;
for(iIO; i<T-1; i++){
prvdist I dist;
dist I eigsum[i+1]/eigcnt[i+1] - eigsum[i]/eigcnt[i];
if( dist !I 0 && prvdist II 0){
tempsum I eigsum[i+1] I eigsumEi];
tempcnt I eigcnt[i+1] + eigcntEi];

indei;
break;
}
}
for( iIO; i<indx+2; i++){
eigsumEiJItempsum;
eigcntEilItempcnt;
}

for(iIO; i<T; i++){
nI(int)floor((double)eigindx[i]/6.0);
jIeigindeil-nIG;
hmmaEn][jJIeigsumEiJIeigcntEi];
}
}
*/
/*sassssssssssssssssssssssssssssssssssssssssssssssssssssss/
void qs(item,itemindx,left,right)
double itemEMURDERIHNUM];
int itemindfoURDERIHNUM]; /I added I/
int left,right;
{
register int i,j;
double x,y;
int h; /* added */
iIleft;
jIright;
x=item[(left+right)/2];
do {
while(itemEi]<x && i<right) i++;
while(x<item[j] && j>left) j-I3
if(i<sj){

122

yIitem[il;
itemEilIitemEj];
itemfjl=y3
hIitemindXEil3
itemindxfil=itemindxfjl3
itemindxfleh;
i++;
j--;
}
} while(i<-j);
if(left<j) qs(item,itemindx,left,j)3
if(i<right) qs(item,itemindx,i,right);
}

/**sssss**sssssssassssssssssssssssssss*sssssssssssssssssss/
void initialize(){
int i;
for(iIO; i<wNUMsMURDER; i++){
eigcntEi] I 13
}

[sssssssssssssssssssasssssssssssssssssssssssssssssssssssss
This function inputs the training 1pc quantized vectors I
from their respective files corresponding to each I
training utterance and puts them in an array data[] I
ssssssssasssssssssssssssssssssssssssss*ssssssssssssssssss/
void input_hmms(){

char input2[80];

char prefix[80],in[6][40];

FILE Iinfile1,Iinfile2;

int i,j,n,s;

double extractword();

if ( (infile1 I fopen(infi1ename, "r")) II NULL){
printf("fopen failed for file Xs.\n",infilename);
exit(0);

}

nISIO;

while( fscanfCinfilel,"Xs\n",input2) !I EUF){
strcpy(words[n],input2);

123

strcpy(prefix,"c:\\hmmcomb\\");
strcat(prefix,input2);
strcat(prefix,".hmt");
if ( (infile2 I fopen(prefix, "r")) II NULL){
printf("fopen failed for file Xs.\n",input2);
exit(0);
}
printf("reading data from file Zs\n",prefix);
for(iIO; i<MURDER; i++){
fscanf(infi1e2,"%s 18 Is 23 la %s\n",in[0]
,in[1],in[2],in[3],in[4],in[5]);
printf("inEZdJIXs atof(in[2d1)I%f nIZd "
,i,in[i],i,atof(in[i])3n);
hmmaEnJEi] I atof(in[i]);
printf("hmmaEZdJ[Zd]IXf\n",n,i,hmma[n][i]);
eigval[s]I(pow(10.0,hmma[n][1])I1)/93
eigvalndesJIs;
eigsumfs] Ihmma [n] [i] ;
eigindeslIs;
s++;
}
for(iIO; i<SYM; i++){
fscanf(infile2,"%s 2s Is Is 28 %s\n",in[0]
,in[1],in[2],in[3],in[4l,in[5]);
for(jIO; j<MURDER; j++)
hmmbEnJEjJEi] I atofCinfjl)3

}
fclose(infile2);
n++;
if( n II V ){
fclose(infilel);
break;
}
}
fclose(infile1);
“In;
TIs;

}

/**********t**********t**#*******************************/

double computation(h1,n,i)
int h1,n,i;{

124

int j,t;
double ct,sum;
double x[MURDER];

for(jIO; j<MURDER; j++)
x[j1=13
sum I 0;
ctIO;
for(tIO; t<ldata[n][i]; t++){
for(jIO; j<MURDER; j++){
x[j] *a hnnn[h1][j]:
}
ctIO;
for(jIO; j<MURDER; j++){
ct += hmmbfhllfjlfqdatafnlfi][tl]*x[j]3
}
sum += log10(ct);
}

return sum;

125

"I11111111111111'Ts