THESIS

 


This is to certify that the

thesis entitled

COMPUTER VOICE IDENTIFICATION METHOD
BY USING INTENSITY DEVIATION SPECTRA
AND FUNDAMENTAL FREQUENCY CONTOUR

presented by

Hirotaka Nakasone

has been accepted towards fulfillment
of the requirements for

Ph.D. degree in Audiology and Speech

Sciences

Major professor

Date November 8, 1983

 


 

 

 

 


 

 

 

 

 

 

 

COMPUTER VOICE IDENTIFICATION METHOD
BY USING INTENSITY DEVIATION SPECTRA
AND FUNDAMENTAL FREQUENCY CONTOUR

By

Hirotaka Nakasone

A DISSERTATION

Submitted to
Michigan State University
in partial fulfillment of the requirements
for the degree of

DOCTOR OF PHILOSOPHY

Department of Audiology and Speech Sciences

1984

 

 

 

 

© Copyright by
Hirotaka Nakasone

1984

 

 

 

 

 

 

 

 

ABSTRACT
COMPUTER VOICE IDENTIFICATION METHOD
BY USING INTENSITY DEVIATION SPECTRA
AND FUNDAMENTAL FREQUENCY CONTOUR
by

Hirotaka Nakasone

The major purpose of this study was to investigate the
effectiveness of several speech parameters in eliminating the
influence of the transmission and recording channels of
unknown response characteristics. These speech parameters
were tested for their effectiveness in text-independent voice
identification by computer.

Text-independent speech samples were recorded from 10
male speakers randomly selected from a population of the
native speakers of Midwest American-English dialect, each
serving as unknown speaker and also known speaker. Recording
was made simultaneously by three different transmission and
recording devices. From these speech data, the following
parameters were extracted to represent the unknown and the
known speakers: 1) intensity deviation spectrum (IDS), 2) a
set of fundamental frequency related measurements (FFC), 3)
long-term averaged spectrum (LTAS), and 4) choral spectrum
(SPT). The principal algorithm utilized for the measurement
of parameters 1), 3), and 4) was the Fast Fourier
Transform (FFT); for parameter 2), the interactive peak
picking technique was employed. All speech parameters were
subjected to pre-processing in order to select optimum
features from each parameter by using hierarchical
clustering, F-ratio, and data standardization. Distance
between the speakers was measured by Euclidean distance. The
decision rules employed were the nearest-neighbor rule and the
minimum set distance rule. The rate of correct identification
served as the criterion to determine the effectiveness of each
parameter in eliminating the influence of the response
characteristics.

From the results of this study, the following general
conclusions were drawn: Both IDS and FFC were found to be
effective in eliminating the influence of the transmission
and/or recording channels, but their correct identification
rates were only moderate (50-60%). The composite parameter of
FFC and IDS was found to be effective in eliminating the
influence of the response characteristics, although the
correct identification rate was not improved, i.e., it was as
good as each component of that composite parameter. The
composite of FFC and LTAS was found to be the most effective
parameter in eliminating the influence of the transmission
system by achieving the highest possible correct
identification rate (100%).

 

 

Dedicated to
My mother, Haru Nakasone

 

 

 


 

 

 

 

ACKNOWLEDGMENTS

I am deeply indebted to Dr. Oscar Tosi, my professor and
dissertation director, whose guidance, encouragement, efforts, and
most of all patience, made the completion of this study possible.

I would like to extend my sincere appreciation to Committee
members, Dr. Leo V. Deal and Dr. Paul A. Cooke, for their
helpful suggestions, corrections, and constructive criticisms
toward the completion of this study. I also would like to thank
Dr. Jose-Luis Menaldi of the Department of Mathematics, Wayne
State University, who also served as a Committee member, for his
mathematical assistance and critical review of the computer
programs developed for this study.

Special acknowledgments are due Dr. Richard C. Dubes,
Department of Computer Science, who kindly made the graphic plotter
available, and Dr. Ernest J. Moore, Chairman of the Department of
Audiology and Speech Sciences, for his interest and valuable
suggestions during the final oral defense meeting.

My warmest gratitude is expressed to Ms. Nancy Brown of the
State Bar of Michigan who proofread my draft, Dr. Christina
Jackson-Menaldi for her companionship and her enthusiasm about this
study, and Dr. Paul N. Deputy of Idaho State University, who
first introduced me to this exciting field of Speech Science during
the early stages of my graduate program.

Special recognition is due to a number of individuals who
provided their voice samples for this study. I am very grateful
for their cooperation.

Finally, but certainly not the least, I thank my wife, Aiko,
who gave me her unconditional support and most qualified assistance
in typing and retyping the numerous rough drafts, as well as the
final version.


TABLE OF CONTENTS

LIST OF TABLES
LIST OF FIGURES
CHAPTER
I. INTRODUCTION
   Statement of the Problem
   Purpose of the Study
   Significance of the Study
   Limitation of the Study
   Background in Selecting Speech Parameters
   Literature Review
   Definition of the Terminology
   Organization of the Study
II. EXPERIMENTAL PROCEDURE
   List of Equipment
   Recording of Phonetic Materials
      Speakers and Phonetic Materials
      Recording Setting
      Arrangement of Speech Data
   Pre-processing of Speech Data
      Digitization
      Pause Elimination
      Extraction of Speech Parameters
         IDS (intensity deviation spectrum)
         LTAS (long-term averaged spectrum)
         FFC (fundamental frequency contour)
         SPT (choral spectrum)
      Optimization of Features
   Voice Identification Experiment
      Organization of Experiment
      Distance Measurement and Standardization
      Decision Rules
         The nearest-neighbor decision rule
         The minimum set distance decision rule


III. RESULTS
IV. DISCUSSIONS AND CONCLUSIONS
   Discussions
      On Speaker Population
      On Effects of Feature Optimization
      On Composite of Parameters
      On Influence of Pause Elimination
      On Interaction by the Experimenter
   Conclusions
   Implications for Further Research
REFERENCES
APPENDIX A  A SAMPLE TEXT EXCERPT
APPENDIX B  RESPONSE CHARACTERISTICS OF TEAC TAPE
            RECORDER AND BRUEL & KJAER MICROPHONE
APPENDIX C  SUMMARY OF THE RESULTS FROM PAUSE ELIMINATION
APPENDIX D  COMPLETE-LINK DENDROGRAMS OF IDS AND
            LTAS FEATURES
APPENDIX E  RAW DATA OF FFC FEATURES
APPENDIX F  A SAMPLE OUTPUT: PARTIAL COMPUTER PRINTOUT
            OF VOICE IDENTIFICATION
APPENDIX G  VOICE IDENTIFICATION RESULTS:
            OPERATIONS 1 THROUGH 24
APPENDIX H  LIST OF FORTRAN SOFTWARE


LIST OF TABLES

Tables

1. Results of the feature clusterings and
   corresponding F-ratios of (a) IDS and (b) LTAS

2. Optimum features selected for IDS parameter
   (as denoted by circles in Table 1(a))

3. Optimum features for LTAS parameters
   (as denoted by circles in Table 1(b))

4. Nine features and F-ratios of the FFC

5. Summary of the results of the cross-transmission
   voice identification operations

6. Summary of the results of 24 voice identification
   operations


LIST OF FIGURES

Figure

1. Response curve of a transmitting and recording
   system including a commercial telephone line and
   a magnetic pick-up attached at the receiver end
   of the line

2. A diagram showing equipment used for three
   simultaneous transmission and recording systems

3. Sample arrangements of text-independent phonetic
   materials from two speakers

4. A graphic illustration of pause
   elimination procedure

5. Two sound spectrograms showing examples of
   an input speech and a resulting output speech
   after pause elimination

6. Computer plottings of three IDS's generated from
   the text-independent speech data of a speaker

7. Computer plottings of three LTAS's generated from
   the text-independent speech data of a speaker

8. Photographs of the CRT displaying the interactive
   peak detecting procedures

9. Illustration of F0, F0, and A0 by using
   three consecutive peaks in a simplified
   short speech segment

10. A plotting of a sample choral spectrum based on an
    11-second long speech taken from a speaker

11. A diagram illustrating the nearest-neighbor
    decision rule

12. A diagram showing an example of the minimum
    set distance rule

13. Sammon's projections of 5 known speakers and 5
    unknown speakers, all represented by telephone
    IDS parameter

14. Sammon's projections of 5 known speakers and 5
    unknown speakers, all represented by telephone
    LTAS parameter

15. Sammon's projections of 5 known speakers and
    5 unknown speakers - knowns represented by the
    composite parameter of FFC and LTAS by normal
    transmission system and unknowns represented
    by the same composite parameter by telephone
    transmission system


 

CHAPTER I

INTRODUCTION

Methods of voice identification can be classified into two
general groups: subjective and objective. The subjective methods
include aural and spectrographic examination; the objective
methods are usually performed by using a computer. Aural methods
(short term and long term memory) are performed by listening to the
recorded voices of an unknown and a known or by remembering the
speaker-dependent features of a voice. These are primarily based
upon perceptual extraction of the speaker-dependent speech
characteristics. The final decision regarding the identity of
voices is made by the human examiner based upon his subjective
judgment. With the spectrographic methods, the sound spectrograms
produced from speech samples under study are examined. The
examiner compares the acoustical characteristics of the known and
the unknown voices displayed on the three dimensional plottings of
the spectrograms (frequency, intensity, and time). In spite of the
objective means of displaying the speech parameters, the final
decision still belongs to the subjective judgment of the human
examiner. Hence, both aural and spectrographic methods are called
subjective methods (Tosi, 1979). When a computer is properly
programmed with a set of algorithms, the results concerning
identification of the voices are reproducible: when similar types
of data are submitted to the same procedure, the same output is
expected. Hence, the computer method is considered to be objective
and is commonly referred to as automatic or semi-automatic speaker
recognition.

The term 'voice identification' has been applied as a generic
name which encompasses various aspects in the process of
determining the identity of an unknown speaker, given his/her voice
samples and voice exemplars collected from one or more known
speakers. To be more specific, Tosi (1979) classifies tasks of
voice identification as follows:

According to the composition of unknown and known voice
samples, tests of voice identification or elimination can
be classified into three groups: discrimination tests,
open tests, and closed tests. In the discrimination
tests the examiner is provided with one unknown voice
sample and one known voice sample. He has to decide
whether or not both samples belong to the same talker....
In the open tests the examiner is given one unknown voice
sample and several known samples. He is told that the
unknown sample may or may not be found among the known
samples.... In the closed tests of voice identification
the examiner is also given one unknown voice sample and
several known voice samples but he is told that the
unknown voice sample is also included in the known voice
samples....(pp. 4-5).

In this study, since in all tests the unknown voice was always
included within the known voices, the task is considered to be
equivalent to the 'closed test' quoted above, excepting that the
'examiner' is being replaced by a 'computer'. Hence, the term
'voice identification' as used in this study covers only this
specific task which is described as follows: Given voice samples
of an unknown and a group of knowns, the task is to select a
speaker whose voice sample is the closest to that of the unknown.
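The closed-test task just described can be sketched in a few lines of
Python. This is only an illustration under stated assumptions: each
speaker is reduced to a single feature vector, closeness is taken to
be Euclidean distance, and the speaker labels, the feature values, and
the function name identify_closed_set are all hypothetical.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def identify_closed_set(unknown, knowns):
    """Return the label of the known speaker closest to the unknown.

    `knowns` maps a speaker label to one feature vector. Because the
    test is closed, the unknown is assumed to be one of these speakers,
    so a label is always returned and nobody is ever rejected.
    """
    return min(knowns, key=lambda label: euclidean(unknown, knowns[label]))


# Hypothetical three-feature vectors for three known speakers.
knowns = {
    "speaker_A": [1.0, 0.2, 3.1],
    "speaker_B": [0.4, 1.9, 2.0],
    "speaker_C": [2.2, 0.1, 0.5],
}
unknown = [0.5, 1.7, 2.2]

print(identify_closed_set(unknown, knowns))  # speaker_B
```

An open test would additionally need a rejection threshold on the
smallest distance; the closed test needs none, which is why the sketch
is so short.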

 


 

The standard procedure of voice identification by computer can
be broadly divided into three major stages: data collection,
measurement, and identification processing. In the first stage,
speech samples from a given group of speakers are recorded and
stored. In the second stage, speech parameters (characteristics)
are measured. This stage includes a series of pre-processings,
such as filtering, deletion of pauses and gaps, extraction of
features, and statistical processing for feature optimization. In
the third stage, the identification operation is performed by
applying appropriate decision rules or criteria.

Speech samples collected can be either 'contemporary' or
'noncontemporary' (Tosi, 1979). Contemporary speech samples are
obtained from each speaker during the same recording session,
whereas noncontemporary speech samples are recorded over some time
intervals depending upon the scope of the researcher's interest
(days, weeks, months, or years). It has been noted that the task
of voice identification is easier with the contemporary samples
than with the noncontemporary samples.

Another aspect involved during the data collection stage is
concerned with the type of the phonetic content spoken by the
speakers: 'text-dependence' vs. 'text-independence'. When all
the speech samples of the speakers are the same in context, it is
commonly referred to as 'text-dependence'. In this case the speech
samples of the speakers under an identification process are
compared word by word, phrase by phrase, or sentence by sentence.
The major advantage of text-dependent speech samples is that they
can be utilized in every method of voice identification. Because
of this advantage, this type of text has been rigorously studied by
many researchers, but mainly directed toward industrial or
commercial applications.

In 'text-independence', all the speech samples spoken by the
speakers are different in phonetic content. The duration of the
text-independent materials must be relatively long, and the minimum
duration appears to vary in length depending upon the researchers
who use the term 'text-independence' somewhat differently. Atal
(1974) generated the text-independent speech sample from a single
sentence by cutting it into 40 equal segments and later recombining
them in random order. He reported that the minimum of two seconds
of text-independent speech sample resulted in a high correct
identification rate. Bunge (1978) and Furui et al. (1972)
reported, in close agreement, that minimum durations of 11 and 10
seconds are required for a sufficiently high correct identification
rate. In Bunge's study, 41 male and 9 female speakers produced 50
different texts, each text lasting 11 seconds. It was found that
when the text length was decreased to below 11 seconds, the
influence of the text became increasingly obvious and finally
dominated the speaker's identity. He concluded that an 11-second
duration was a limit for text-independence for a high correct
identification rate. Markel and Davis (1979) defined the term
'text-independent' a little more stringently. Their speech data
consisted of the extemporaneous speech material from 17 male
speakers, each speaker recorded in five interview sessions at one
week intervals. They attempted to make the speech samples free
from linguistic constraint and further free from the manner of
speech production. With this type of text-independent materials, a
39-second text length (containing only voiced frames) resulted in
high correct identification. Tosi et al. (1979) extended the
scope of text-independence to different languages. In their study
speech samples were obtained from 20 speakers who could speak three
languages (Piamontes, Italian, and French) with equal fluency.
Each speaker was recorded while reading a 10-minute long passage
from books and newspapers in three sessions, each session separated
by one week. The authors concluded that automatic voice
identification with text-independent speech materials of different
languages is promising.

Speech samples so collected are then processed for extracting
acoustic speech parameters to represent the speakers. Acoustic
speech parameters are the measurements derived from speech signals
which are considered to consist of three major elements. The first
element is the energy source coming out of the lungs as a stream of
air pressure. The second element is a modulation of this air
stream into vibratory motions (for voicing) set by the vocal cords
or turbulence of air in a constriction of the vocal tract (friction
and plosive). The third element involves resonance phenomena
taking place as the modulated air pressure traverses through the
vocal tract (pharynx, oral and nasal cavities). In each element
contained is information of the linguistic contents as well as the
speaker characteristics.

 

 


 

 

The first element carries variation of overall speech
intensity as a function of time. The second element determines the
fundamental frequency and its harmonics of voiced phonemes giving
rise to perceptual pitch of the speaker. The third element is
considered to be the most important because resonance gives a
particular shape or envelope to the spectrum of the speech sound
output which includes both a phonetic content and the individual
characteristics of each speaker (Tosi, 1979).

Speech characteristics (parameters) used in voice
identification by a computer are generally extracted from one or a
combination of these acoustic elements. Often used parameters are
short-term spectra, long-term spectra, formants, fundamental
frequency, and other variations statistically derived from these
basic parameters. Usually, a certain number of features are
further selected from the parameter and these features represent
the speaker in a multidimensional space. In general, the
identification process is based on the distance measured between a
set of features of the unknown speaker and that of the known
speakers.
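The idea of speakers as points in a multidimensional feature space,
compared by distance, can be illustrated as follows. The sketch
assumes z-score standardization of each feature before the Euclidean
distance is taken; the abstract names data standardization and
Euclidean distance, but the exact standardization used in the study is
not given in this section, so the z-score and all values below are
assumptions made for illustration.

```python
import math
from statistics import mean, pstdev


def standardize(vectors):
    """Z-score each feature (column) across all the given vectors."""
    columns = list(zip(*vectors))
    means = [mean(col) for col in columns]
    # Guard against a constant feature, whose deviation would be zero.
    stds = [pstdev(col) or 1.0 for col in columns]
    return [[(v - m) / s for v, m, s in zip(vec, means, stds)]
            for vec in vectors]


def euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


# Hypothetical speakers: feature 0 is in Hz, feature 1 is a small ratio.
raw = [[110.0, 0.30], [130.0, 0.10], [114.0, 0.11]]
z = standardize(raw)

# On raw values the Hz feature dominates and speaker 2 looks nearest to
# speaker 0; after standardization speaker 2 is nearest to speaker 1.
print(euclidean(raw[0], raw[2]) < euclidean(raw[1], raw[2]))  # True
print(euclidean(z[1], z[2]) < euclidean(z[0], z[2]))          # True
```

Without some form of standardization, a feature measured on a large
numeric scale decides the outcome almost by itself, whatever its
actual value for telling speakers apart.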
The basic implicit assumption for voice identification is that
a speaker can be distinguished by his/her speech signals and that
the variation of speech characteristics within an individual
speaker differs from that of the other speakers. The former
variation is commonly referred to as the 'intra-speaker
variability' and the latter is referred to as the 'inter-speaker
variability'. The sources of the intra-speaker variability can be
attributed to different emotional status, various manners of speech
production demanded by different circumstances, and small
physiological changes in articulatory apparatus of the same person
over an interval of time. On the other hand, the sources of the
inter-speaker variability are the different vocal tract
configurations, physiological characteristics of the vocal cords,
and idiosyncratic speaking habits of different speakers, etc.
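The two kinds of variability can be compared numerically for a single
feature. The sketch below uses one common form of the F-ratio, the
variance of the speaker means divided by the average within-speaker
variance; whether this is exactly the F-ratio used in the study is an
assumption, and the measurements are invented for illustration.

```python
from statistics import mean, pvariance


def f_ratio(samples_by_speaker):
    """Inter-speaker variance over intra-speaker variance for one feature.

    `samples_by_speaker` holds repeated measurements of the same
    feature, one inner list per speaker.
    """
    speaker_means = [mean(s) for s in samples_by_speaker]
    inter = pvariance(speaker_means)  # spread of the speaker means
    intra = mean([pvariance(s) for s in samples_by_speaker])  # spread within speakers
    return inter / intra


# Hypothetical feature that separates speakers well...
good = [[100, 102, 98], [140, 141, 139], [120, 119, 121]]
# ...and one that varies more within a speaker than between speakers.
poor = [[100, 140, 120], [119, 101, 139], [121, 99, 141]]

print(f_ratio(good) > f_ratio(poor))  # True
```

A feature with a large ratio varies little within a speaker and a lot
between speakers, which is the property a feature needs in order to be
useful for identification.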

Statement of the Problem

 

There are several sources which could interfere with the
process of voice identification by a computer or by any other
method. These are distortion of speech characteristics of an
individual due to the unknown response curve of the transmission
and/or recording devices, various types of noise which deteriorate
the intelligibility of speech samples, and intentional self
alteration of voice either to disguise the identity or to
impersonate another person. In addition, differences in phonetic
content and duration of the speech utterances of the speaker under
study interact with the procedures of voice identification. On
different occasions, the phonetic content spoken by the unknown
speaker can be different from the one spoken by the known speaker.
This condition calls for so-called 'text-independent' voice
identification, which usually requires speech samples of relatively
long duration obtained from each speaker.
This study focuses on the problem caused by the influence of
the transmission and recording devices by using the
text-independent and contemporary phonetic materials.

Telephone transmission contains in itself several sources of
variables such as the carbon microphone, number of connecting
junctions, line distance, and carrier systems. Under this
condition, the response characteristics of the telephone line, in a
real life setting, cannot be determined. This is largely due to
the uncertainty of the telephone line involved in each telephone
call, which is random, according to the existing traffic at the
moment of the call. Ordinarily the speech signals must be stored
for later processing in a recording medium which also has its own
response characteristics. An example of this type of combined
distortion in response characteristics is given in Figure 1.

 

[Figure 1: plot of intensity (dB) against frequency (kHz), on a
logarithmic frequency axis from 0.05 to 10 kHz; image not reproduced]

Figure 1. Response curve of a transmitting and recording system,
including a commercial telephone line and a magnetic pick-up
attached at the receiver end of the line (taken from Tosi,
1979).


 

 

In many cases, speech samples of the unknown speaker
transmitted and recorded through the system which has the
characteristics as shown in Figure 1 are compared with speech
samples of the known speakers transmitted and recorded through a
conventional microphone-to-tape recorder which has relatively flat
response characteristics (linear). In such a case, the magnitude
of the distortion present in two entirely different transmission
and recording systems may be even more serious than the case where
only one type of the transmission system is being used. It seems
to preclude any reasonable effort to eliminate this type of
distortion made up of the unknown sources of variables. One
researcher (Tosi, 1979), being pessimistic about this phenomenon,
proposed a very interesting idea as a possible solution:

....elimination of perturbing telephone influence could
consist of including a 'standard' burst sound at the
beginning of every telephone communication. Because the
real spectrum of such a 'standard' sound would be known,
the transfer function (response characteristics) of the
telephone line could then be easily computed. (p. 55)

To date, unfortunately, this idea has not yet been realized.
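The arithmetic behind the 'standard' burst proposal can be sketched as
follows. The sketch assumes the line behaves as a fixed gain in each
frequency band, so that the gain is the received burst magnitude
divided by the known burst magnitude. The four bands, all numbers, and
the function names are hypothetical; the source describes no such
implementation.

```python
import math


def estimate_transfer(known_burst, received_burst):
    """Per-band gain of the line: received magnitude / known magnitude."""
    return [r / k for k, r in zip(known_burst, received_burst)]


def remove_channel(received, transfer):
    """Undo the line by dividing each band by its estimated gain."""
    return [y / h for y, h in zip(received, transfer)]


# Hypothetical linear magnitudes in four frequency bands.
burst_sent = [1.0, 1.0, 1.0, 1.0]   # spectrum of the known 'standard' burst
line_gain = [0.2, 0.9, 1.0, 0.1]    # response of the line, unknown to the receiver
burst_received = [b * g for b, g in zip(burst_sent, line_gain)]

speech_sent = [3.0, 5.0, 2.0, 4.0]
speech_received = [s * g for s, g in zip(speech_sent, line_gain)]

h = estimate_transfer(burst_sent, burst_received)
restored = remove_channel(speech_received, h)
print(all(math.isclose(a, b) for a, b in zip(restored, speech_sent)))  # True
```

In practice noise and bands with very small gain would make the
division unreliable, so this shows the principle rather than a usable
equalizer.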

Other alternative methods to eliminate the influence of the
frequency distortion are 'normalization' procedures on the speech
parameters extracted (Atal, 1978; Furui, 1981a; Bunge, 1978; Tosi
and Nakasone, 1980) and selection of the appropriate speech
parameters which are considered to be inherently resistive to the
frequency response characteristics (Atal, 1972; Markel et al.;
Hunt et al., 1977). Since these alternative methods are
implemented in this study, details will be discussed in a later
chapter.


 

 

Purpose of the Study

The major purpose of this study was to investigate the
effectiveness of several speech parameters in eliminating the
influence of the transmission and recording devices of the
undefined response characteristics. These speech parameters were
tested in text-independent voice identification by a computer.

It is known that a speaker can be recognized by his/her voice
even when the content of the text spoken is different
(text-independent). Among many conditions necessary for a high
rate of success, two factors are essential: 1) the speakers are
cooperative (no disguise/mimicry, small variation in voice
characteristics from one recording session to another, etc.) and 2)
the same transmission recording device is used for acquisition of
all the unknown and known voices.

In the present study, one of these two factors was under
control, i.e., all the speakers were cooperative, rendering clean
and relatively uniform texts. Each speaker read an excerpt
prescribed for him and was recorded in a single recording session,
thus minimizing possible variation in his speech due to a time
interval between sessions. The other factor was intentionally
varied, i.e., voice samples were collected through different
transmission and recording devices. An annoying outcome of the
latter procedure is that two identical speech samples of one
speaker (one sample being collected by one transmission and
recording device, another sample being collected by another
transmission and recording device) may result in different spectral
shapes and consequently yield a false identification. This type of
problem can be easily solved if the transfer functions of the
transmission systems involved in the recording are well-defined.
But the prime obstacle in solving this problem is that in most
cases the true transfer function of the transmission recording
channels is not known.
The present study focused on alternative approaches to solve
this problem of eliminating the influence of the undefined response
characteristics upon the speech samples. These approaches were 1)
application of the speech parameter IDS (intensity deviation
spectra), 2) application of the speech parameter FFC (several
fundamental frequency related measurements), 3) application of the
speech parameter LTAS (long-term averaged spectrum), and 4)
application of the composite parameter of 1) and 2) and of 2) and
3). The IDS is a spectrum statistically derived from a set of
successive short-term spectra. The prototype of computational
procedure for IDS parameter was introduced by Bunge (1978). IDS has
two properties: By definition (details to be discussed in Chapter
II), it cancels the influence of the transmission systems of the
unknown response characteristics, and it represents dynamically
changing spectral structures of the speech signals. The FFC is
composed of a set of fundamental frequency related measurements
(defined in this study) derived from a pitch contour. Literature
reviewed in the field of voice identification indicates that the
fundamental frequency is a sufficiently effective speech parameter
in distinguishing speakers and is relatively insensitive to the
response characteristics of the transmission line used for
recording the speech samples.
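Why a deviation-type spectrum can cancel a fixed channel while a
long-term average cannot is easy to see numerically. The sketch below
assumes one plausible formulation of the IDS: in each frequency band,
the mean absolute deviation of the short-term intensities divided by
their mean. The definition actually used in the study is deferred to a
later chapter and is not reproduced in this section, so this formula,
the linear (non-dB) intensities, and the per-band channel gains are
all assumptions.

```python
import math
from statistics import mean


def ltas(frames):
    """Long-term averaged spectrum: per-band mean over all frames."""
    return [mean(band) for band in zip(*frames)]


def ids(frames):
    """Per-band mean absolute deviation, normalized by the band mean."""
    result = []
    for band in zip(*frames):
        m = mean(band)
        result.append(mean([abs(x - m) for x in band]) / m)
    return result


# Hypothetical short-term spectra: three frames, three frequency bands.
frames = [[4.0, 1.0, 0.5],
          [2.0, 3.0, 0.5],
          [6.0, 2.0, 2.0]]

# A fixed channel multiplies every frame by the same gain in each band.
gains = [0.5, 2.0, 0.1]
through_channel = [[x * g for x, g in zip(frame, gains)] for frame in frames]

same_ids = all(math.isclose(a, b)
               for a, b in zip(ids(frames), ids(through_channel)))
same_ltas = all(math.isclose(a, b)
                for a, b in zip(ltas(frames), ltas(through_channel)))
print(same_ids, same_ltas)  # True False
```

The band gain scales the deviation and the mean by the same factor, so
it drops out of their ratio, whereas the LTAS is simply multiplied by
the gain.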

In order to enhance the 'elimination' effects, all speech
parameters (except choral spectra) were subjected to a series of
pre-processing procedures, such as pause elimination, feature
optimization by the hierarchical clustering technique and F-ratio
statistics, and standardization of features. Most of the
procedures were carried out by Fortran software implemented on a
PDP 11/40 minicomputer at the Department of Audiology and Speech
Sciences, Michigan State University.
The following assumptions on the performance of each speech
parameter were set up:

1. The IDS is sufficiently effective in eliminating the
   influence of the frequency distortion from the voice samples
   of the unknown and the known speaker recorded through
   different transmission and recording systems, and it is
   equally effective for the voice samples recorded through the
   same transmission and recording system.

2. The FFC is sufficiently effective in eliminating the
   influence of the frequency distortion from the voice samples
   of the unknown and the known speaker recorded through
   different transmission and recording systems, and it is
   equally effective for the voice samples all recorded through
   the same transmission and recording system.

3. The LTAS is highly effective in eliminating the influence of
   the frequency distortion from the voice samples of the
   unknown and the known all recorded through the same
   transmission and recording system, but the effectiveness is
   decreased when the voice sample of the unknown is recorded
   through one transmission and recording system and that of
   the known through another system.

4. Choral spectrum is assumed to be as effective as the LTAS
   for the same conditions described in 3.

5. The composite parameter of FFC and IDS increases the
   effectiveness level in eliminating the influence of the
   frequency distortion from the voice samples of the unknown
   and the known recorded through different transmission and
   recording systems.

6. The composite parameter of FFC and LTAS increases the
   effectiveness level in eliminating the influence of the
   frequency distortion from the voice samples of the unknown
   and the known recorded through different transmission
   and recording systems.

To test the above assumptions, a total of 24 voice
identification operations were conducted in different designs
according to various combinations of the speech parameters and the
transmission systems used for recording the unknown and the known
speakers. Each operation yielded the results in terms of the rate
of correct identification, which served as the measure of the
effectiveness of each parameter tested.

 


Significance of the Study

It is known that a person can be recognized by his/her voice
alone when heard live or over the telephone line, provided that the
person is somebody the listener is familiar with. Despite some
change in the perceptual quality of the voice, the judgment on the
identity of a speaker does not seem to be critically influenced
by the transmission line. This is the underlying fact that
the objective of this study rests on. This study was designed to
search and study several speech characteristics which are not
disturbed by transmission and/or recording devices. It is hoped,
therefore, that the results from this study will contribute
directly or indirectly to understanding more about human speech
sound other than the linguistic message it carries.

Another justification for this study is the scarcity of
research reports dealing with the problem of the influence of the
transmission and/or recording devices coupled with the
text-independent speech materials for voice identification. A
large body of research reports is available, though most of these
reports are based on the text-dependent speech data recorded under
optimally controlled conditions. To date, only a few researchers
have been concerned with these two problems together. One such
study conducted by Hunt et al. (1977) consisted of the
text-independent materials spoken by 13 speakers transmitted over
the FM radio broadcast. However, they used only one transmitting
and receiving system of high quality for recording all the
speakers. Therefore, even though excellent identification rates

 

 

were reported in their study, the adverse influence of the
transmission and recording media does not seem to have been taken
into account.

The problem addressed in the present study involved
text-independent voice identification tasks by using the voice of
the unknown speakers distorted by the telephone transmission system
and the relatively clean voices of the known speakers recorded
through a conventional microphone-to-tape recorder system.
Admittedly, although this problem is very difficult to fully solve,
there is a legitimate need for investigation.

Limitations

This study was exploratory in nature. Therefore, the results
are applicable only within the following limitations.

First, a nominal size of 10 speakers was employed. This size
is nontrivial considering that actual speaker size was tripled by
three simultaneous transmission systems and that only Fortran
softwares were available (no real-time hardware processor was
utilized). Because of this small speaker size, no statistical
inference was attempted for generalization of the results.

Second, all the speakers were recorded only once (contemporary
text) while reading in more or less the same style. These two
conditions obviously contributed to producing the unusually small
intraspeaker variability. Thus, the results from this study can
be generalized only to these types of speech data.


Third, though a mini computer was applied as the major tool
for data processing including decision algorithms, at a few
critical points arbitrary intervention by the experimenter was
demanded. For this reason, the design of the voice identification
in this study was not meant to be a completely 'objective' method.

Finally, the term 'voice identification' operationally defined
for the purpose of this study was 'closed' type, i.e., in each
identification process, the speech sample of the unknown speaker
was always included among those of the known speakers. Presumably
this type of identification procedure can be considered as
exploratory, or lenient, in terms of its credibility; hence, the
model tested in this study, as it is, has little immediate
applicability to real environment.

:gggg lg Selecting §peech Parameters
Speech samples in this study were not only under the influence
of unknown response characteristics but also text-independent.
When text-independent voice samples are involved, a relatively long
duration of speech is required in order to homogenize the phonetic
content of the different texts from all speakers. Within this
constraint, several different types of speech parameters were
reviewed from the literature in this field. Speech parameters
considered prima facie were cepstral coefficients, linear
predictive coefficients (LPC), long-term averaged spectrum (LTAS),
choral spectrum, variance spectrum, intensity deviation spectrum
(IDS), and fundamental frequency contour (FFC).

 


Of these seven parameters, the cepstrum was discarded despite
its well recognized useful property -- it is relatively resistant
to the influence of the frequency response characteristics of the
transmission system. It was found to be impractical to apply the
cepstral analysis for the amount of 'text-independent' speech data
to be processed by using only Fortran softwares on a mini computer.
The LPC was also discarded because of its susceptibility to the
response characteristics of the transmission system used. The
remaining five parameters, LTAS, choral spectrum, variance
spectrum, IDS and FFC, were tested in the pilot study. Speech
samples were collected from five male speakers through two
transmission systems, telephone line and a conventional microphone
to tape recorder. The identification process was performed by the
hierarchical clustering technique with complete-link method,
without feature optimization procedure.

From the results of the pilot study, it became apparent that
LTAS, variance spectrum, and choral spectrum were very sensitive to
the frequency response characteristics of the transmission system
and that IDS and FFC were relatively insensitive to the influence.
In spite of their susceptibility to the influence, LTAS and choral
spectra were found to be very effective speech parameters for voice
identification provided all voice samples were recorded through the
same transmission and recording channels. Consequently, IDS, FFC,
LTAS, and choral spectrum were adopted as speech parameters for
this study.
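
The complete-link hierarchical clustering mentioned for the pilot study can be illustrated with a minimal Python sketch. It is not the Fortran software used in this study: the function names, the two-dimensional toy patterns, and the choice of Euclidean distance are all assumptions made only for illustration.

```python
import math

def euclidean(a, b):
    """Euclidean distance between two equal-length patterns."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def complete_link_clusters(patterns, n_clusters):
    """Agglomerative clustering that stops when n_clusters remain.

    Complete link: the distance between two clusters is the distance
    between their two farthest members.
    """
    clusters = [[i] for i in range(len(patterns))]
    while len(clusters) > n_clusters:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = max(euclidean(patterns[p], patterns[q])
                        for p in clusters[i] for q in clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        clusters[i] = clusters[i] + clusters[j]  # merge the closest pair
        del clusters[j]                          # j > i, so index i stays valid
    return [sorted(c) for c in clusters]

# Hypothetical toy patterns (two features each), not data from this study.
patterns = [(0.0, 0.1), (0.2, 0.0), (5.0, 5.1), (5.2, 4.9), (9.0, 0.0)]
result = complete_link_clusters(patterns, 3)
print(result)
```

In a voice identification setting, each pattern would stand for one speech sample, and samples from the same speaker would be expected to merge early.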


Literature Review

A review of the related literature revealed that research
reports on voice identification by a computer contain vast
diversification in the methodology employed depending upon the type
and number of speakers, phonetic materials, and the tasks involved.
In contrast, there are very few reports primarily dealing with the
problem of the transmission and/or recording media. It was,
therefore, felt that organized presentation of these diversified
studies in methodology was neither practical nor essential for the
purpose and the scope of this study. Hence, this section presents
several studies dealing with automatic voice identification,
arranged by the type of speech parameters employed.

A short-term spectrum is generated from a very brief portion
of a speech signal. It can be produced by the use of a bank of
filters simulated on a digital computer (Pruzansky, 1963; Bricker
et al., 1971) or by using the Fast Fourier Transform. In either
case, the principal expression used is the Fourier transform:

$$F(\omega) = \int_{0}^{NT} f(t)\, e^{-j\omega t}\, dt \qquad \text{(for continuous signal)}$$

$$F(\omega) = \sum_{n=0}^{N-1} f(nT)\, e^{-j\omega nT}\, T \qquad \text{(for discrete signal)}$$

where N = number of sample (discrete) points of a time function,
spaced apart by a sampling period of T seconds. As used
conventionally, a large number of short-term spectra are created in
succession as a function of time. Hence, this spectrum is
appropriately called 'temporal acoustic spectra' (Tosi, 1979) and
known to carry "....nearly all of the important information in
speech...." (Atal, 1978). The duration of a single short-term
spectrum is typically in the order of 20-30 milliseconds.
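
The discrete expression can be evaluated directly at the frequencies ω = 2πk/(NT). The following Python sketch does so for one frame; it is an illustration only, not the software of this study. The 10 kHz sampling rate and the 256-point (25.6 msec) frame follow values stated elsewhere in this study, while the 1250 Hz test tone and the function name are assumptions.

```python
import cmath
import math

def dft_magnitude(frame, T):
    """|F(w_k)| for w_k = 2*pi*k/(N*T), k = 0..N/2, using
    F(w) = sum_{n=0}^{N-1} f(nT) * exp(-j*w*n*T) * T."""
    N = len(frame)
    mags = []
    for k in range(N // 2 + 1):
        acc = sum(frame[n] * cmath.exp(-2j * math.pi * k * n / N)
                  for n in range(N))
        mags.append(abs(acc) * T)
    return mags

fs = 10000.0        # sampling rate in Hz
T = 1.0 / fs        # sampling period in seconds
N = 256             # 256 samples = 25.6 msec at 10 kHz
tone_hz = 1250.0    # hypothetical test tone; falls exactly on bin 32

frame = [math.sin(2 * math.pi * tone_hz * n * T) for n in range(N)]
mags = dft_magnitude(frame, T)
peak_bin = max(range(len(mags)), key=lambda k: mags[k])
peak_hz = peak_bin * fs / N
print(peak_bin, peak_hz)
```

A Fast Fourier Transform produces the same values with far fewer operations; the direct sum is used here only because it mirrors the expression term by term.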

Some researchers investigated the effectiveness of the
short-term spectra for text-dependent speaker recognition.
Pruzansky (1963) generated the short-term spectrum by passing the
short phrase of four words taken from each of 10 speakers through a
multichannel filter bank, frequency ranging from 100-7000 Hz. By the
use of the product-moment correlation coefficients computed on the
time-aligned spectra, she reported the correct identification rate
of 89% out of a total of 393 trials tested. The same speech data
were later tested by Bricker et al. (1971) with some modification
in procedure, resulting in a higher recognition rate of 97%. The
two studies cited were performed on speech data recorded through
the same transmission and recording equipment, though it was noted
that the spectra were strongly influenced by the frequency

 

characteristics of the recording devices. The short-term spectrum
has been often applied to the text-dependent data for which tedious
and complex procedures for time alignment become extremely
important during the recognition procedure.

Later, long-term averaged spectrum was considered as an
alternative to compensate for the tedious procedures of the
temporal alignment of the set of the text-dependent short-term
spectra. Typically, long-term averaged spectrum is obtained from a
speech signal of a relatively long duration (1-2 minutes in Tosi,
1979; 70 seconds in Markel et al., 1977; 10 seconds in Furui et
al., 1972; 11 seconds in Bunge, 1978). When a computer is used,
this spectrum can be easily produced either by using a bank of
filters, or by FFT algorithm. A unique property of this spectrum,
when taken from a sufficiently long speech, is that phonetic
contents of different utterances can be balanced, thus enabling
text-independent voice identification. Its potential use for
text-independent voice identification has been recognized by many
researchers (Tosi et al., 1979; Bunge, 1978; Majewski and Hollien),
provided there is no influence of the transmission and/or
recording media.
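
The balancing property can be sketched in Python under idealized assumptions. Two hypothetical signals contain the same two tones in opposite order, standing in for different texts with the same phonetic inventory; averaging the power spectra of successive frames gives the same long-term result for both. The frame length, tone frequencies, and function names are illustrative and are not taken from this study.

```python
import cmath
import math

def power_spectrum(frame):
    """Power at DFT bins 0..N/2 of one frame (direct DFT sum)."""
    N = len(frame)
    return [abs(sum(frame[n] * cmath.exp(-2j * math.pi * k * n / N)
                    for n in range(N))) ** 2
            for k in range(N // 2 + 1)]

def long_term_average(signal, frame_len):
    """Average the power spectra of successive non-overlapping frames."""
    frames = [signal[i:i + frame_len]
              for i in range(0, len(signal) - frame_len + 1, frame_len)]
    spectra = [power_spectrum(f) for f in frames]
    return [sum(s[k] for s in spectra) / len(spectra)
            for k in range(len(spectra[0]))]

frame_len = 64

def make_signal(bin_order):
    """Two frames of a pure tone per entry; each tone sits on a DFT bin."""
    signal = []
    for k in bin_order:
        signal += [math.sin(2 * math.pi * k * n / frame_len)
                   for n in range(2 * frame_len)]
    return signal

ltas_a = long_term_average(make_signal([4, 10]), frame_len)  # "text" A
ltas_b = long_term_average(make_signal([10, 4]), frame_len)  # "text" B
max_diff = max(abs(a - b) for a, b in zip(ltas_a, ltas_b))
print(max_diff)
```

Real speech only approaches this balance when the sample is long enough, which is why durations from about ten seconds to minutes are reported above.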
One variation of the long-term averaged spectrum is called
'choral spectrum', developed by Tosi (1979). He defines the spectra
as long-term Fourier transforms of temporal choral speech
(Tarnoczy, 1958), which is produced from a temporal rearrangement
of a speaker's normal speech. The major difference between
choral spectrum and long-term averaged spectrum is seen in the
computational economy: The former is said to be generated much
faster than the other by a factor of about 20. Tosi et al. (1979)
conducted a text-independent voice identification by applying the
choral spectra. Speech samples of different languages were
recorded from 20 speakers. By using the hierarchical clustering
technique, they reported the identification error rate of 5 to 30%
depending upon the method used and suggested the promising utility
of the choral spectra for text-independent voice identification.
However, inasmuch as the spectrum bears the same property as the
long-term averaged spectrum, the choral spectrum is also known to
be susceptible to the influence of the transmission media.

Linear predictive coefficients (LPC) have been studied by many
speech researchers for automatic speaker recognition (Atal, 1974,
1978; Markel et al., 1977; Markel and Davis, 1979; He and Dubes,
1982, etc.) The LPC is usually derived by using the autocorrelation
method from speech signals, revealing the spectral properties of
the speech as a function of time. LPC can represent the
fundamental frequency and its harmonics when the order of the
predictor coefficients is relatively high (40 coefficients) and can
also represent formants when the order of the predictor is low (12
coefficients), but it is susceptible to the frequency response of
the recording apparatus and the transmission systems. Atal (1978)
studied the effectiveness of LPC (12 coefficients) for automatic
speaker identification by using 10 female speakers. All speakers
recorded six repetitions of the same short sentence, using a
high quality microphone on two occasions at 27-day intervals. Each
utterance was divided into 40 segments; and from each segment, 12
predictor coefficients were extracted to form a vector of AD
dimensions. The identification decision was based on the
non-Euclidian distance measure defined by Shafer and Rabiner
(1975). The correct rate of identification was found to be 63.8%
(of 60 total judgments).

He and Dubes (1982) presented a paper on speaker
identification by using LPC and pitch contour. Speech samples were
recorded in a sound booth by eight Chinese male speakers, each
uttering 15 repetitions of a short sentence in the Chinese language
into a microphone attached to a tape recorder. Each utterance was
divided into b-second epochs, resulting in five speech data per
speaker (each datum thus containing either two complete spoken
sentences or one complete and one partially complete sentence).
Then, finally, each datum was partitioned into 40 segments. From
each segment, 12 predictor coefficients were computed. A pitch
contour was also prepared from each datum by two different
methods: cepstrum method and peak detecting technique. Five
features measured on a pitch contour were maximum, minimum, average
pitch period, the maximal slope, and the larger one of the two
pitch period values determining the largest slope. Subsequently,
these data were subjected to feature optimization procedures
consisting of the hierarchical clustering technique, discriminant
analysis, and F-ratio as discussed in Chapter II. For
identification decision operation, the Euclidian distances were
computed between the test pattern and reference patterns and the
decision criterion was based upon the nearest-neighbor decision
rule. The results from their study were as follows: 81.9% by
pitch contour when all speech data were included, but the rate rose
to 96.4% when the data containing partially complete sentences were
discarded; 75.6% by LPC for the entire data, but increased to
[...].4% when the data containing partially complete sentences were
discarded. A combination of features from the pitch contour and
the LPC was also tested with all speech data included (in an
attempt to compensate for varying text contents). This resulted in
[...].8% correct identification. Although the authors of this study
did not specify, the speech data were rather text-dependent even if
the proper alignment of each phonetic unit was not attempted.

The cepstrum of a speech signal is defined as the power
spectrum of the logarithm power spectrum of the signal (Noll,
1967). This method was introduced as a means to separate
fundamental frequency from the speech signal in the frequency
domain. Much attention has been given to the cepstral method in
the field of automatic speaker recognition (Atal, 1978; Luck, 1969;
Bunge, 1978; Tosi, 1979; Furui, 1981b; He and Dubes, 1982). The
reason for its popular application appears to be twofold: The
cepstrum is mathematically well defined, i.e., lends itself to
reliable algorithmic implementation on a computer, and it is
relatively resistive to the frequency response characteristics of
the transmission system as well as the recording devices.
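
One reason for this resistance can be sketched in Python under a simplifying assumption: a fixed linear channel multiplies the spectrum of every frame by the same response, which adds the same constant vector to the cepstrum of every frame. Averaging the coefficients over the utterance and subtracting that average from every frame (the normalization attributed to Furui in this review) then removes the channel term. The numbers and names below are hypothetical.

```python
def subtract_utterance_mean(frames):
    """Subtract the per-coefficient utterance average from every frame."""
    n_frames = len(frames)
    n_coef = len(frames[0])
    mean = [sum(f[d] for f in frames) / n_frames for d in range(n_coef)]
    return [[f[d] - mean[d] for d in range(n_coef)] for f in frames]

# Hypothetical cepstral coefficients: three frames, three coefficients each.
clean = [[1.0, 0.5, -0.2],
         [0.8, 0.1, 0.3],
         [1.4, -0.3, 0.1]]

# Assumed effect of a fixed channel: the same offset added to every frame.
channel_offset = [0.6, -0.4, 0.25]
distorted = [[c + o for c, o in zip(frame, channel_offset)] for frame in clean]

norm_clean = subtract_utterance_mean(clean)
norm_distorted = subtract_utterance_mean(distorted)

max_diff = max(abs(a - b)
               for fa, fb in zip(norm_clean, norm_distorted)
               for a, b in zip(fa, fb))
print(max_diff)
```

The sketch ignores additive noise and channels that change during the utterance, for which the offset is no longer constant.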

Furui (1981a) published a comprehensive study on the
techniques for automatic speaker verification based on the cepstrum
coefficients computed on a fixed, sentence-long utterance
(text-dependent) recorded over the conventional telephone
connection. A total of 50 utterances from each of 20 speakers (10
males, 10 females) were recorded over the period of two months.
Several kinds of utterance sets were prepared, all band limited
from 100 Hz to 3.0 kHz. Cepstral coefficients were derived from
the predictor coefficients (LPC) obtained from the same speech
segment. After several pre-processings -- such as pause
elimination, time registration, normalization, and optimum feature
selection -- were applied, the results of verification error rate
were less than 1% even if the test utterance and the reference
utterance were subjected to different transmission conditions (but,
presumably, within the same telephone connection). The key factor,
Furui commented, would be in the normalization procedure of the
cepstrum coefficients to remove the distortions of the response
characteristics introduced by the transmission system. The
coefficients were averaged over the duration of the entire
utterance, and the average values were subtracted from the cepstrum
coefficients of every frame.
Pitch contour is a plotting of the time-varying pitch of the
speech signal. Or, synonymously expressed, pitch contour is
"...the glottal frequency - characteristics melody curve of the
speaker." (Tosi, 1979). There are two properties of pitch
contour: First, it is sufficiently speaker-dependent; second, it
is resistive to the distortion introduced by the frequency response
characteristics of the transmission and recording devices. Many
researchers (Atal, 1972; Markel, Oshika, and Gray, 1977; Hunt et
al., 1977; He and Dubes, 1982) explored pitch contour as one of the
speech parameters for speaker recognition, and confirmed it to be a
highly (or, at least sufficiently) reliable speaker-dependent
characteristic.

Atal (1972) studied the average pitch and the measurements of
the temporal variation of pitch for automatic speaker
identification. Speech data were text-dependent and collected from
[...] male speakers, each producing 6 repetitions of the short
sentence, and each sentence lasting 1.8 to 2.8 seconds depending
upon the different rate of utterance by individual speakers. He
reported that the measurement of the average pitch was far better
([...]% correct identification) than that of the temporal variation
of pitch.

Markel et al. (1977) also investigated pitch contour but
applied it to text-independent speech materials obtained from a
small group of speakers with homogeneous pitch distribution.
They used F-ratio (analysis of variance) as the measure of the
effectiveness of the average pitch and the standard deviation
(computed on pitch contour) as a function of the number of frames
(from Lv = 10 to Lv = 1000, which correspond to about 70
seconds). It was concluded that the average pitch is significantly
more effective than the standard deviation of pitch and that the
estimated standard deviation about the average pitch was reduced
from about 18 Hz (for Lv = 10) to about 6 Hz (for Lv = 1000). They
applied two other parameters, the spectral-related feature obtained

 

 

by LPC and gain variation, to the same speech data for the purpose
of comparing the three speech parameters. Ranking of the
effectiveness in discriminating speakers was in the order of
spectral feature, pitch contour, and gain variation.

Hunt et al. (1977) conducted text-independent voice
identification by using pitch contour (and including other speech
parameters). They used a group of 12 professional meteorologists,
each reading two sets of different text transmitted over the
communication channel. Seven different kinds of fundamental
frequency related measures were derived from a pitch contour, viz,
mean fundamental frequency, group mean of the mean fundamental
frequency and its rate of change, and proportion of time that
fundamental frequency is rising or falling. The pitch contour was
prepared by the use of a hardware implemented real time cepstral
processor. Identification performance was tested in two ways:
First, texts of the unknown and the known speakers were arranged in
a noncontemporary manner, resulting in 89% (133 out of 149 samples)
correct identification; and second, texts were arranged in a
contemporary manner, resulting in 100% correct identification. In
addition to pitch contour, they included two other parameters,
spectral related and gain related. When all three parameters were
compared in terms of identification performance, ranking was in
order of spectral parameters, followed by pitch contour, then
finally, gain related parameters. The study by Hunt et al. showed
that voice identification can be done even if the speech data of
different content were transmitted over the communication channel;
however, the degree and nature of the distortion of the
transmitting system were not specified clearly. It appears that
all speech data in their study were transmitted and recorded by the
same system.
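
Measures of the kind listed for a pitch contour can be sketched in Python. This covers only a subset (mean fundamental frequency, mean absolute rate of change, and the proportion of time the contour is rising or falling) and is not the procedure of Hunt et al. or of this study. Marking unvoiced frames with 0, the 10 msec frame step, and the contour values are assumptions.

```python
def f0_measures(contour_hz, frame_step_s):
    """Simple measures on a pitch contour sampled at a fixed frame step.

    Assumption: a value of 0 marks an unvoiced frame and is skipped.
    """
    voiced = [f for f in contour_hz if f > 0.0]
    mean_f0 = sum(voiced) / len(voiced)

    # Slopes (Hz per second) between consecutive voiced frames only.
    slopes = [(cur - prev) / frame_step_s
              for prev, cur in zip(contour_hz, contour_hz[1:])
              if prev > 0.0 and cur > 0.0]

    return {
        "mean_f0": mean_f0,
        "mean_abs_rate": sum(abs(s) for s in slopes) / len(slopes),
        "prop_rising": sum(1 for s in slopes if s > 0.0) / len(slopes),
        "prop_falling": sum(1 for s in slopes if s < 0.0) / len(slopes),
    }

# Hypothetical contour in Hz, one value every 10 msec; zeros are unvoiced.
contour = [100.0, 110.0, 120.0, 0.0, 0.0, 130.0, 125.0, 125.0, 120.0]
measures = f0_measures(contour, 0.01)
print(measures)
```

Such measures depend on the glottal source rather than on spectral shape, which is consistent with the resistance to channel response noted above.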

 

 


Definition of the Terminology

Key terminologies used throughout this study are operationally
defined as follows.

Category: The term refers to a set of patterns (or samples) and
    is used synonymously with a speaker in this study.

Cross-transmission: This term refers to a voice identification
    procedure in which the speech data from the unknown speaker
    and the known speaker are prepared through two different
    transmission systems.

Feature: A feature refers to an individual measurement component
    within a speech parameter. The number of features determines
    the dimensionality of a parameter. For instance, the first
    frequency component in the IDS, or the average fundamental
    frequency in the FFC, is called a feature.

FFC: The FFC is a speech parameter which consists of a set of
    fundamental frequency related measurements computed on a pitch
    contour of a running speech sample.

IDS (intensity deviation spectrum): The IDS is a speech parameter
    derived from a set of successive short-term spectra. The IDS
    reflects, by definition, temporal variations of the spectra of
    speech sound.

Linear system: This system refers to a path in which the speech
    sound is transmitted by a microphone and recorded onto an
    audio tape by using a tape recorder. In this system the
    microphone and the tape recorder are characterized as having
    the relatively flat response curve (Linear) covering the
    speech frequency range. The term 'Linear speech' or 'Linear
    -' in this study refers to the speech sound, or processed
    speech data made available by using this system.

LTAS (Long-term averaged spectrum): The LTAS is a speech
    parameter computed by superposing and averaging n
    short-term spectra. Each one of these spectra originates
    from successive segments of the speech samples of about 11
    seconds utilized in this study. The LTAS reflects static
    spectral features of speech sound.

Normal system: It refers to a path in which the speech is
    transmitted by a microphone and recorded onto a magnetic tape
    by a tape recorder. The system is assumed to have undefined
    response characteristics. The term 'normal speech' or 'normal
    -' in this study denotes the speech sound, or processed speech
    data made available by this system.

Pattern: It is composed of the set of features chosen from
    one or more of the speech parameters. A pattern is
    equivalent to a speech sample and is the basic data set to
    represent the voice characteristics of the speaker.

Short-term spectrum: This spectrum is generated from a short
    segment (in this study 25.6 msec) of each processed speech
    sample by using Fast Fourier Transform (FFT).

Speech parameter: The term refers to the measurement(s) derived
    from acoustic speech signal. The individual feature is
    extracted from a speech parameter.

SPT (choral spectrum): The SPT is a speech parameter produced
    from choral speech by processing it through FFT. An
    elaborated definition and algorithm to generate this spectrum
    is presented by Tosi (1979). In this study, choral speech is
    obtained by superimposing 0.4096 second long segments of
    on-going speech.

Telephone system: It refers to a path in which the speech is
    transmitted via the telephone transmitter, received at the
    remote end of the local line telephone set, and recorded onto
    an audio tape recorder. The term 'telephone speech' or
    'telephone -' in the text refers to the speech sound or
    processed speech data made available by this system.

Text-independence: This term refers to the type of phonetic
    materials used as the speech data for voice identification.
    Text-independent voice identification uses different texts
    from the unknown and the known speakers. Counterpart of this
    term is the 'text-dependence'.

Transmission system: Restricted to this study, 'transmission
    system' refers to a system of the devices used for
    transmitting the speech sound, such as a microphone, telephone
    transmitter and its attachment, and so forth. The term
    'system' is used to refer to a path of the speech signal from
    the speaker's mouth to the sound storage device, an audio tape
    recorder.

Voice Identification: It is defined as a process of selecting a
    speaker (from a group of the known speakers) whose voice
    sample is the closest to that of the unknown.

Within-transmission: This term refers to a voice identification
    procedure in which all speech data from the unknown and the
    known speakers are prepared by the same transmission system.
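
The operational definition of voice identification (a closed test in which the known speaker whose sample is closest to the unknown is selected) can be sketched in Python together with the rate of correct identification. The two-feature patterns, the speaker labels, and the use of plain Euclidean distance without standardization are assumptions for illustration only.

```python
import math

def distance(a, b):
    """Euclidean distance between two patterns of equal dimensionality."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def identify(unknown, known):
    """Closed test: return the known speaker whose pattern is closest."""
    return min(known, key=lambda name: distance(unknown, known[name]))

# Hypothetical patterns, two features per speaker (not data from this study).
known = {
    "A": (120.0, 15.0),
    "B": (140.0, 22.0),
    "C": (98.0, 9.0),
}

# Each trial pairs the true speaker with an unknown pattern.
trials = [
    ("A", (118.0, 16.0)),
    ("B", (129.0, 18.0)),   # lies nearer to A, so this trial is an error
    ("C", (101.0, 10.0)),
]

decisions = [identify(pattern, known) for _, pattern in trials]
n_correct = sum(1 for (truth, _), d in zip(trials, decisions) if truth == d)
rate = n_correct / len(trials)
print(decisions, rate)
```

Because the test is closed, a decision is always forced onto one of the known speakers, which is the leniency noted in the limitations of this study.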

Organization of the Study

This study is divided into four chapters.

Chapter I presented a general introduction to voice
identification, the statement of the problem, the purpose,
significance and limitation of this study, a review of the
literature, and a list of operational definitions of the
terminologies.

Chapter II is devoted to the description in detail of
experimental procedures: Recording of speakers,
digitization, pause elimination, generation of speech
parameters, feature optimization and standardization, and
identification operations.

Chapter III presents the results from the identification
operations.

Chapter IV concludes this study by presenting
discussions, conclusions, and implications for further
research.


CHAPTER II

EXPERIMENTAL PROCEDURE

This chapter is organized into three sections: 1) recording
of phonetic materials, 2) pre-processing of these phonetic
materials, and 3) experimental procedure for voice identification
operations. In the first section, recording and arrangement of the
recorded speech data are discussed. In the second section,
procedures and algorithms for pause elimination, extraction of the
speech parameters, optimization of the features, and
standardization of the features are discussed. In the third
section, distance measurements, decision processes, and designs of
voice identification operations are covered. The following is a
list of equipment and softwares used throughout this study.

List of Equipment

For recording speech data:

Condenser microphone, Bruel & Kjaer, type 4132
Cathode follower, Bruel & Kjaer, type 2619
Microphone amplifier, Bruel & Kjaer, type 2603
Dynamic microphone, Ampex, model 2001
Local line telephone sets
Open-reel tape recorder, Teac, model A-7010
Open-reel tape recorder, Sony, model TC-106A
Cassette tape recorder, Marantz, Superscope, model C-202-LP
Open-reel tapes, Scotch, low noise, 1.5 mil, 1200 ft
Cassette tapes, 3M, low noise, 30-minute (one side)

For processing speech data:

PDP 11/40 mini computer, 64 k(byte) memory, with 2 disk drives
16-bit A/D and D/A converters, 3 Rivers Computer Corp.
RK05 disks, 2.4 Mega bytes
CRT monitor
Light pen connected to the CRT monitor
Decwriter II, Digital Equipment Corp.
Open-reel tape recorder, Ampex, model 4000G
Fortran software (see Appendix H)

RECORDING OF PHONETIC MATERIALS

Speakers and Phonetic Materials

The phonetic materials used in this study consisted of
6-minute long speech samples. The subjects were 10 male speakers
randomly selected from a population of native speakers of
Midwestern American-English dialect, ages ranging from 20 - 35,
free from defective or pathological voice conditions. All speakers
read different excerpts from a nontechnical book (Appendix A shows
a sample excerpt) at a 'normal' reading speed. Each speaker was
asked to rehearse by reading aloud a brief paragraph (one which was
not going to be included as speech data for him) while all the
recording equipment and the telephone line were checked for proper
function. During recording, each speaker was instructed to
maintain approximately the same distance from his mouth to the
transmitter of the telephone set (about 3-5 cm) and to the other
two microphones (about 15 cm). No additional instructions as to
the manner in which the speaker should read the excerpt were given.

Recording Setting

Figure 2 illustrates the simultaneous recording of each
speaker through three different transmission and recording
systems: 1) through a telephone line with the remote end connected
to a tape recorder by an inductive pick up ('telephone
transmission'); 2) through a conventional microphone-to-tape
recorder ('normal transmission'); and 3) through a
microphone-to-tape recorder of an almost linear frequency response
characteristic. Hereafter, these three systems are referred to as
telephone transmission, normal transmission, and linear
transmission system, respectively.

For telephone transmission, the telephone set was placed
inside of a sound booth and dialed up to the other end of the local
telephone system (campus line at Michigan State University).
Speech signals were drawn by the use of an inductive coil directly
attached around the receiver of the telephone set, and connected to
the microphone input of a cassette tape recorder (Marantz,
Superscope model C-202LP). No care was taken to check the response
characteristics of this telephone transmission system. For normal
transmission, a dynamic microphone (Ampex, model 2001) was placed
in a sound booth and connected to an open-reel tape recorder (Sony,
model TC-106A) outside the booth. No care was taken for
calibrating the frequency response characteristics of this
microphone and tape recorder either. Thus, a normal transmission
system was assumed to have undefined response characteristics.
For linear transmission, a condenser microphone (Bruel & Kjaer,
type 4132) was coupled with the cathode follower (Bruel & Kjaer,
type 2619) connected to a microphone amplifier (Bruel & Kjaer, type
2603) outside the booth, then to an open-reel tape recorder (Teac,
model A-7010). This condenser microphone and Teac tape recorder
were calibrated for their linearity of the response
characteristics. Plottings of the response characteristics of
these two devices are given in Appendix B.

Although each speaker read only one 6-minute long text,
because of the above described simultaneous recordings by three
transmission systems, each speaker produced a total of 18-minute
long speech data.

Figure 2. A diagram showing equipment used for three simultaneous
transmission and recording systems.

Arrangement of Speech Data

In this study, all speakers served both as unknown and as the
known persons. This required that the speech sample of a speaker
as an unknown differed in content from that of the same speaker as
a known. Therefore, the speech data stored on audio tapes
were properly arranged to enable 'text-independence' from three
identical texts simultaneously produced by a speaker. Figure 3
illustrates this procedure for proper arrangement. As shown in
Figure 3, a 6-minute long speech of each speaker for each
transmission was partitioned into three 2-minute long portions.
The initial 2-minute portion was then segmented from telephone
transmission, the medial portion was segmented from normal
transmission, and the final portion was segmented from linear
transmission (each indicated by a number in a circle). This
arrangement was conducted to prepare input speech data for
cross-transmission voice identification. In addition, another set
of portions was necessary to prepare 'text-independent' speech data
as used in within-transmission voice identification operations.
The latter set of partitionings is indicated by the number encased
in a square seen in Figure 3. Unmarked portions were not used as
data. Consequently, two 2-minute portions of different phonetic
content were segmented as raw speech data from each transmission
system and represented a speaker. In total, 60 2-minute long
speech data resulted from the above arrangement:

10 (speakers) x 3 (transmissions) x 2 (portions) = 60.

Figure 3. Sample arrangements of text-independent phonetic
materials from two speakers. For the cross-transmission voice
identification operation: The segmented portion 1 in a circle in
telephone transmission was used to prepare the data base for the
speaker as an unknown while portions 1 and 2 in circles in normal
transmission and linear transmission were used to prepare the data
base for the same speaker as the knowns. For the
within-transmission voice identification: For each transmission
system, the segmented portion in a circle was used to prepare the
data base as an unknown and the portion in a square, as a known
speaker.

PRE-PROCESSING OF SPEECH DATA

Digitization

During the digitization process, each analog speech sample
stored on the original magnetic tape was played back on the same
recording equipment which was used to record it. A 16-bit
analog-to-digital converter (ADC) interfaced with a minicomputer
(PDP 11/40) digitized one 2-minute long speech sample at a time, at a
sampling rate of 10000/sec. Then the digitized speech was stored
on a disk (2.4 Mega bytes) for the subsequent pause elimination
procedure. Each sampled point (digitized) was quantized by 2 bytes
(=1 word), resulting in a dynamic range of about 90 dB as specified by
the ADC.
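A quick arithmetic check, assuming exactly 120 seconds of speech per sample and no file overhead, shows why one 2-minute sample fits the stated disk size:

```latex
\[
120\ \text{s} \times 10000\ \frac{\text{samples}}{\text{s}} \times 2\ \frac{\text{bytes}}{\text{sample}}
  = 2\,400\,000\ \text{bytes} \approx 2.4\ \text{Mega bytes}
\]
```

This is consistent with digitizing one 2-minute speech sample at a time.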

The frequency transfer function of the ADC indicated some amount
of nonlinearity. However, this nonlinearity (distortion) in the
digitization process was not considered a source of variation
because it was constant for all input speech data.

Pause Elimination

The silent portions and pauses were detected and deleted
automatically from the speech samples. The objective of pause
elimination was to reduce the amount of speech data without
altering speech characteristics. Also, this procedure was
particularly important for properly computing the IDS parameter, whose
extraction was based on a set of successive short-term spectra,
each spectrum being transformed from a brief speech segment
about 25 msec long. It was deemed essential that no short-term
spectrum result from the 'silent' portions or 'pauses'.

In many studies on automatic speaker identification,
elimination of the silent portions and pauses is included as a
standard procedure, though no author has provided a specific
definition of the pauses. In this study, pauses were clearly
defined by applying the quantitative pausometric definition of
pauses proposed by Tosi (1974). The entire process was implemented
by Fortran software on the PDP 11/40.

 


This process was performed in the time domain in such a way
that when the signal fell within two pre-set parameters (one
determining the amplitude threshold, 'Ap', and the other
determining the time threshold, 'Tp'), that portion of the speech wave
was determined to be a pause. The result was a concatenated speech of
only 'signals' from which all pauses were eliminated. Initially,
the values for the Tp and Ap parameters were sought by listening to the
residual pauses (deleted and concatenated for audio playback) so
that most unvoiced and weak consonants, such as /f/, /θ/, /h/,
and brief portions preceding and following plosive
phonemes such as /p/, /t/, and /k/ were detected and deleted as
pauses. Consequently, typical values for Tp and Ap were found to
be somewhere between 15 and 20 milliseconds, and 0.90 and 0.99,
respectively.
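The two-threshold rule described above can be sketched as follows. This is a minimal illustration, not the original Fortran program: the function name `remove_pauses` and its arguments are hypothetical, and the amplitude threshold is passed as an absolute level (`amp_threshold`) because the mapping from an Ap value such as 0.90 to a signal level is not spelled out in this section.

```python
def remove_pauses(samples, sample_rate, amp_threshold, tp_ms):
    """Delete every run of low-amplitude samples lasting at least tp_ms.

    amp_threshold is an assumed absolute amplitude level; how the thesis
    derives it from the Ap parameter is not reproduced here.
    """
    min_run = int(round(tp_ms * sample_rate / 1000.0))  # Tp in samples
    kept = []
    run = []  # current run of consecutive low-amplitude samples
    for s in samples:
        if abs(s) < amp_threshold:
            run.append(s)
        else:
            if len(run) < min_run:
                kept.extend(run)  # too short to count as a pause: keep it
            run = []
            kept.append(s)
    if len(run) < min_run:
        kept.extend(run)  # trailing short run is kept as well
    return kept
```

With a 10000/sec sampling rate and Tp = 20 msec, a low-amplitude run must last at least 200 samples before it is deleted; shorter dips are left in place, and the surviving samples are simply concatenated.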

Figure 4 illustrates how pauses were defined and deleted from
the input speech wave. Figure 5 shows spectrographic
representations of the original input speech containing all pauses
and the resulting output speech without pauses.

The goal of pause elimination was to obtain a 'pause-free'
speech sample about 55 to 60 seconds long from each portion for
a speaker per transmission. The lower boundary of 55 seconds was
fixed so that five 11-second epochs were secured for each speaker,
and the upper boundary was set to provide a margin of one second for
each epoch. A summary of the results from the above pause
elimination procedure is given in Appendix C. Subsequently, this
sample was subdivided into five 11-second long epochs, each epoch
to be used for generating the speech parameters IDS and LTAS. All the
processed speech segments, epoch by epoch, were also stored as
analog speech (by playing back through the DAC) onto audio tapes by
a tape recorder (Ampex, model 4000 G) for later use to generate FFC
and choral spectra.

Figure 4. A graphic illustration of the pause elimination procedure.
(a) An input speech signal about 2 seconds long is subjected to the pause
elimination procedure with two different Ap values. A dotted horizontal line
is the computed average peak amplitude. (b) Pauses are eliminated at
Ap = 0.90 and Tp = 20 msec. (c) Signals are detected from (a) at Ap = 0.90
and Tp = 20 msec and concatenated. (d) Pauses are eliminated at Ap = 0.60
and Tp = 20 msec. (e) Signals are detected from (a) at Ap = 0.60 and
Tp = 20 msec and concatenated.

Figure 5. Spectrographic representations of input speech containing pauses
and of the output speech after pause elimination.

Extraction of Speech Parameters

1. IDS (intensity deviation spectrum)
An IDS was generated from a set of short-term spectra. From
an 11-second epoch of the processed speech, a set of 440 portions
(one portion = 25.6 milliseconds, or a window of 256 sampled points)
were transformed to the corresponding number of short-term spectra
by using the FFT (Fast Fourier Transform). Each short-term spectrum
was then represented by the intensity at 128 discrete frequencies,
covering the frequency range from 0 to 5000 Hz, with an interval of
about 39 Hz. The IDS was computed by the following expression:

\[
P_{ik} = \frac{1}{\bar{S}_{ik}} \sum_{j=1}^{J} \left| S_{ijk} - \bar{S}_{ik} \right|
\]

where $P_{ik}$ = intensity of the ith frequency of the kth IDS,

$\bar{S}_{ik}$ = average intensity of the ith frequency over J
short-term spectra from the kth segment,

$S_{ijk}$ = intensity of the ith frequency of the jth short-term
spectrum from the kth segment.

 

Figure 6. Computer plottings of three IDS's generated from the
text-independent speech data of speaker 1:
(a) by telephone, (b) by normal, and (c) by linear system.
(Horizontal axis: frequency in kHz.)

Essentially, the above expression to compute the IDS can be
expressed by: 1) adding the intensities within each frequency of
these 440 short-term spectra and dividing the sum by the total
number of spectra, thus obtaining the average intensity for that
frequency; 2) subtracting the average intensity from each intensity
at that particular frequency, taking the sum of all absolute
differences, then dividing this sum by the same average intensity;
3) repeating the above steps for all ordinates over the 440 short-term
spectra.
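The three steps above translate almost directly into code. The sketch below is illustrative only: the function name is hypothetical, the short-term spectra are assumed to be already available as lists of positive intensities (how intensity was scaled in the original software is not stated here), and a zero average intensity is not handled.

```python
def intensity_deviation_spectrum(spectra):
    """Compute one IDS from J short-term spectra.

    spectra: list of J spectra, each a list of I positive intensities
    (J = 440 and I = 128 in the text; any sizes work here).
    """
    num_spectra = len(spectra)
    num_freqs = len(spectra[0])
    ids = []
    for i in range(num_freqs):
        column = [spectrum[i] for spectrum in spectra]
        # Step 1: average intensity at this frequency.
        mean_intensity = sum(column) / num_spectra
        # Step 2: sum of absolute deviations, divided by the same average.
        total_deviation = sum(abs(s - mean_intensity) for s in column)
        ids.append(total_deviation / mean_intensity)
    # Step 3 is the loop itself: one value per frequency ordinate.
    return ids
```

Because each deviation sum is divided by the average intensity at the same frequency, a frequency-dependent gain that scales all J spectra equally cancels out of the IDS, which is the property that motivates the parameter in this study.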

Consequently, an IDS was represented by 128 intensities -- one
for each frequency available in the spectrum. Five IDS's were
assigned to each speaker per transmission. In total, 300 IDS's
were generated:

5 (IDS) x 10 (speakers) x 3 (types of transm.) x
2 (cross- or within-transm.) = 300

Figure 6 shows three computer plottings of IDS's generated from
speech samples of a speaker recorded via three different
transmission systems.

2. LTAS (Long-term averaged spectrum)
An LTAS was computed by averaging the ordinate values across a
set of 440 successive short-term spectra, the same set of spectra
which generated the IDS. It was computed by:

\[
L_{ik} = \frac{1}{J} \sum_{j=1}^{J} S_{ijk}
\]

where $L_{ik}$ = average intensity of the ith frequency for the kth
LTAS,

$S_{ijk}$ = intensity of the ith frequency of the jth short-term
spectrum for the kth segment.

The total number of LTAS's generated was also 300. Figure 7 shows
three computer plottings of LTAS's generated from speech samples of a
speaker recorded via the three different transmission systems.
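For comparison with the IDS, the LTAS is a plain average over the same set of short-term spectra. The following sketch uses a hypothetical function name and assumes the spectra are given as equal-length lists of intensities.

```python
def long_term_averaged_spectrum(spectra):
    """Average J short-term spectra frequency by frequency."""
    num_spectra = len(spectra)
    # zip(*spectra) groups the J intensities belonging to each frequency.
    return [sum(column) / num_spectra for column in zip(*spectra)]
```

Unlike the IDS, this average is not normalized, so scaling every short-term spectrum by a constant gain scales the LTAS by the same gain.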

3. FFC (Fundamental frequency related measurements)

An FFC was prepared from the first five seconds of each
segment of the pause-deleted speech sample which had previously
been processed and stored on audio tape. Once again, this
speech segment was digitized by the ADC at a sampling rate of
10000/second, one segment at a time. Then it was further processed
for measuring fundamental frequencies (Fo) as described below.

A. Detection of Fo

Several techniques for the computer implementation of direct
peak detection, which estimate Fo's from the digitized speech, were
reviewed from the existing literature (Gold and Rabiner, 1969; He
and Dubes, 1982). The review indicated that this type of technique
requires frequent heuristic adjustments when incorrect Fo's
occur, tending to result in a grossly smoothed Fo contour.
The one developed by He and Dubes was tested for computing the
average Fo from a half-second long speech signal, and the results
were compared to the ones estimated by laboratory equipment (sound
spectrograph, visipitch, and oscilloscope). The match was
excellent. Nevertheless, this preliminary experimentation
indicated that a direct peak picking technique may lose significant
information on Fo variations. Markel (1977) also attests to this
problem.

Figure 7. Computer plottings of three LTAS's generated from the
text-independent speech data of speaker 1:
(a) by telephone, (b) by normal, and (c) by linear system.
(Horizontal axis: frequency in kHz.)

For the reasons listed above and inasmuch as the aim of
applying the FFC was to represent fine glottal dynamic variations,
a new peak detecting technique was devised specifically for this
study. This technique demands interactive participation of the
experimenter; hence, it is called the 'interactive peak detecting
technique.' Discussion of this technique and measurement
procedures for the FFC follows.

Cycle-to-cycle Fo's were measured directly from the time
domain speech by the use of an interactive peak picking method.
Figure 8 shows a photograph of the actually displayed speech
segment on the CRT (display screen operated via the PDP 11/40)
while Fo detection was in progress. Major steps involved in this
method were as follows: 1) displaying the digitized speech wave
of about 100 milliseconds on the CRT, 2) visually inspecting the
wave to determine recurrent wave patterns, 3) drawing a flexible
line by a light pen capturing those recurrent peaks associated with
fundamental frequencies, and 4) repeating the above steps until an
entire 5-second segment was exhausted. Inevitably, because of
deleted pauses, there were discontinuity points within the speech
segment under process. These discontinuities in the displayed wave
were carefully avoided so that they would not be falsely considered
as pitch periods. This manipulation was carried out by the light
pen maneuver. However, the continuity from one frame of the
displayed signal to the subsequent frame was maintained
automatically by software.

Figure 8. Photographs of the CRT displaying the interactive peak detecting
procedures. Each speech segment shown is about 102.4 msec (1024 sampled
points) long.

An output of this interactive peak detecting technique was a
pitch contour containing successive pitch periods and amplitudes of
two ends of the pitch periods. Although it required careful
participation of the experimenter to correctly target pitch
periods, this method was found to be quite simple, quick and
reliable.

B. Extraction of FFC

Nine features (measurements) were computed for the FFC on each
Fo contour created earlier. Computational procedures are described
below. For all procedures, Fo = fundamental frequency, N = total
number of Fo's (in each Fo contour), and Ao = relative amplitude
of the peak (of Fo).

i. $\bar{F}_o$ : The average Fo computed on an Fo contour.

\[ \bar{F}_o = \frac{1}{N} \sum_{n=1}^{N} F_{o_n} \]

ii. $\sigma_{F_o}$ : The standard deviation of Fo.

\[ \sigma_{F_o} = \sqrt{ \frac{1}{N} \sum_{n=1}^{N} \left( F_{o_n} - \bar{F}_o \right)^2 } \]

iii. $\Delta F_o$ : The average temporal variation of Fo in successive
cycles.

\[ \Delta F_o = \frac{1}{N-1} \sum_{n=1}^{N-1} \left| F_{o_{n+1}} - F_{o_n} \right| \]

iv. $\Delta F_o / \bar{F}_o$ : The ratio of the temporal variation of Fo to the
average Fo.

v. $\sigma_{A_o}$ : Standard deviation of cycle-to-cycle peak
amplitudes.

\[ \sigma_{A_o} = \sqrt{ \frac{1}{N} \sum_{n=1}^{N} \left( A_{o_n} - \bar{A}_o \right)^2 } \]

vi. $\Delta A_o$ : The average temporal variation of peak amplitudes.

\[ \Delta A_o = \frac{1}{N-1} \sum_{n=1}^{N-1} \left| A_{o_{n+1}} - A_{o_n} \right| \]

vii. Fo(max) : The maximum Fo in a pitch contour.

\[ F_{o(max)} = \mathrm{Max}\left[ F_{o_1}, F_{o_2}, \ldots, F_{o_N} \right] \]

viii. Fo(min) : The minimum Fo in a pitch contour.

\[ F_{o(min)} = \mathrm{Min}\left[ F_{o_1}, F_{o_2}, \ldots, F_{o_N} \right] \]

ix. Fo(rng) : The range of the Fo. The Fo(rng) is simply
computed by

\[ F_{o(rng)} = F_{o(max)} - F_{o(min)} \]

Figure 9 illustrates three basic measurements (periods, fundamental
frequency, and amplitude) required for the computations described
above.
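The nine FFC measurements can be gathered in one small routine. This is a sketch under stated assumptions, not the original software: the function and key names are hypothetical, each Fo value is taken as the reciprocal of one pitch period, one peak amplitude is assumed per Fo value, and the standard deviations use the population form (division by N) with a square root.

```python
import statistics

def ffc_features(periods_sec, amplitudes):
    """Return the nine FFC measurements for one Fo contour.

    periods_sec: successive pitch periods in seconds.
    amplitudes:  one relative peak amplitude per period (assumed pairing).
    """
    fo = [1.0 / t for t in periods_sec]  # cycle-to-cycle Fo in Hz
    n = len(fo)
    mean_fo = statistics.mean(fo)
    # Average absolute change between successive cycles.
    delta_fo = sum(abs(fo[k + 1] - fo[k]) for k in range(n - 1)) / (n - 1)
    delta_ao = sum(abs(amplitudes[k + 1] - amplitudes[k])
                   for k in range(n - 1)) / (n - 1)
    return {
        "mean_fo": mean_fo,
        "sd_fo": statistics.pstdev(fo),
        "delta_fo": delta_fo,
        "delta_fo_ratio": delta_fo / mean_fo,
        "sd_ao": statistics.pstdev(amplitudes),
        "delta_ao": delta_ao,
        "fo_max": max(fo),
        "fo_min": min(fo),
        "fo_range": max(fo) - min(fo),
    }
```

For example, periods of 10, 8, and 10 msec give Fo values of about 100, 125, and 100 Hz, so the average temporal variation is about 25 Hz and the range is about 25 Hz.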

4. SPT (choral spectrum)

Since the detailed procedure to create an SPT is given
elsewhere (Tosi, 1979), here only a brief description of the
spectrum is presented. A choral spectrum is a Fourier Transform of
a choral speech. A choral speech is generated by segmenting a
speech wave of a certain duration into n portions of the same
duration t (in this case 0.4096 second, or 4096 sampled points) and
stacking all n portions, one on top of another, resulting in one
speech segment of length t. The choral spectrum used in this study was
generated by using the FFT, resulting in 1-byte integer intensity values
(...20 dB) at 2048 discrete frequency components with an interval of
about 2.44 Hz. For this study, all choral spectra were generated from
the previously processed speech segments which were used for the other
parameters. A total of 300 choral speech segments resulted. Figure 10
shows a computer plotting of a sample choral spectrum.
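The construction of a choral spectrum can be sketched with NumPy as below. The function name is hypothetical, "stacking" is interpreted here as sample-by-sample summation of the portions (an assumption; the cited procedure may differ), and the conversion to 1-byte integer intensity values is not reproduced.

```python
import numpy as np

def choral_spectrum(samples, portion_len=4096):
    """Stack equal-length portions of a speech wave and take the FFT magnitude."""
    x = np.asarray(samples, dtype=float)
    n_portions = len(x) // portion_len
    # Drop any incomplete trailing portion, then arrange one portion per row.
    portions = x[: n_portions * portion_len].reshape(n_portions, portion_len)
    # Assumed meaning of "stacking": add the portions sample by sample.
    choral_speech = portions.sum(axis=0)
    magnitude = np.abs(np.fft.rfft(choral_speech))
    # Keep portion_len // 2 components (2048 for 4096-point portions).
    return magnitude[: portion_len // 2]
```

At a sampling rate of 10000/second, 4096-point portions give a component spacing of 10000 / 4096, or about 2.44 Hz, matching the interval quoted in the text.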

 

 

 

Figure 9. Illustration of Fo, ΔFo, and ΔAo by using three consecutive
peaks in a simplified short speech segment. In this figure,
Fo = fundamental frequency; Ti = ith pitch period;
and Aoi = amplitude of the ith peak.

Figure 10. Computer plotting of a sample choral spectrum based on an
11-second long speech sample.

Optimization of Features

The objective of feature optimization was to reduce the number
of features for computational simplicity by selecting only those
features which were determined to be effective in discriminating
between the speakers. The importance of the optimization procedure
has been recognized in many studies on automatic voice
identification, especially when the number of speakers is very small
relative to the number of features utilized. Hughes (1968) showed
in a general statistical model that for a fixed number of samples,
the mean identification accuracy increased when the dimensionality
increased until an optimum value was reached. Beyond the optimum
number, the accuracy decreased linearly. He suggested the optimum
number of features to be 5, 10, 20, and 100 or greater for sample
sizes of 20, 100, 500, and larger, respectively. Jain and Dubes
(1978) suggested as a rule of thumb that the number of features
should be at least five for each sample.

Taking the notions suggested above into consideration, the
optimization procedure was applied to the original set of features
contained in each speech parameter used in this study. The number
of features was 128 for both IDS and LTAS, nine for FFC, and 2048
for the choral spectrum. Optimization was carried out by using the
hierarchical clustering technique with the complete-link method, and/or
F-ratio statistics. For the parameters IDS and LTAS, both the
hierarchical technique and the F-ratio statistics were applied in a
two-step sequence, whereas for the parameter FFC, only the F-ratio was
used.


 

The clustering technique has been employed as an effective
tool in many scientific endeavors including the field of general
speech research. This technique has been applied to the study of
word recognition (Rabiner et al., 1977) and also to voice
identification by computer (Tosi et al., 1979). An elaborate review
and theory of the clustering technique is provided by Anderberg
(1973). In this study the technique was used in only one
particular way, which illustrates the diverse applicability of the
clustering technique. This particular usage was suggested by Jain
and Dubes (1978) as a means to reduce the number of features in a
pattern by finding those features which are highly correlated among
themselves. This technique was later coupled with F-ratio
statistics, which have also often been implemented by
researchers (Paul et al., 1975; Markel and Davis, 1979; Atal, 1978)
as a part of the procedure for selecting effective features for
voice identification.

Feature optimization procedures applied to the speech
parameters used in this study are discussed next.
1. IDS and LTAS
First, the original number of 128 features for IDS and LTAS
was reduced to 100. This reduction was imposed by the limitation
of computer memory capacity available on the PDP to carry on further
computational procedures. Sacrificing the lowest three frequency
components (78 Hz and lower) and the highest 25 components (4017 Hz and
higher, up to 5000 Hz), the new IDS and LTAS were then represented
by 100 features of frequency ranging from 117 to 3978 Hz, with an
interval of 39 Hz. Optimization procedures were sequentially
carried out by the hierarchical clustering technique with the
complete-link method, and then by F-ratio statistics.
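The bookkeeping behind the reduction from 128 to 100 features can be sketched as follows. The names are hypothetical, and the frequencies are computed with the exact component spacing of 10000 / 256 = 39.0625 Hz; the round figures quoted above (117 to 3978 Hz) follow from the rounded 39 Hz interval instead.

```python
SAMPLE_RATE_HZ = 10000
WINDOW_POINTS = 256
BIN_SPACING_HZ = SAMPLE_RATE_HZ / WINDOW_POINTS  # 39.0625 Hz

def reduced_feature_bins(num_bins=128, drop_low=3, drop_high=25):
    """List (feature number, FFT component index, frequency in Hz) triples."""
    kept = range(drop_low, num_bins - drop_high)  # components 3 .. 102
    return [(feature, bin_index, bin_index * BIN_SPACING_HZ)
            for feature, bin_index in enumerate(kept, start=1)]
```

Under this indexing, feature 1 corresponds to about 117 Hz and feature 29 to about 1211 Hz, which agrees with the feature and frequency pairs listed in Table 1(a).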

Hierarchical Clustering: The objective of using the
hierarchical clustering technique was to determine subsets of
features which were highly correlated among themselves. The
Spearman-product moment correlation coefficient was calculated for
all pairs of features over a set of 50 patterns (5 for each
speaker). Then, a similarity matrix was prepared from these
correlation coefficients and submitted as input data to the
hierarchical clustering technique. Six similarity matrices were
prepared from six sets of 50 patterns (three sets for the IDS, and
three sets for the LTAS, both based upon three different
transmission systems), resulting in six complete-link dendrograms.
From each dendrogram, 10 clusters (groupings by frequency
components) were systematically chosen by means of placing a
horizontal line at the appropriate proximity level. The resulting
six dendrograms are presented in Appendix D.
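A complete-link grouping of features from a similarity matrix can be sketched in a few lines. This is a simplified illustration with a hypothetical function name: it merges clusters until a requested number remains, whereas the study cut each dendrogram with a horizontal line at a chosen proximity level, and it makes no attempt at efficiency.

```python
def complete_link_clusters(similarity, k):
    """Group features into k clusters by complete-link agglomeration.

    similarity: square matrix (list of lists) of pairwise correlations.
    """
    clusters = [[i] for i in range(len(similarity))]

    def linkage(a, b):
        # Complete link: two clusters are only as similar as their
        # least similar pair of members.
        return min(similarity[i][j] for i in a for j in b)

    while len(clusters) > k:
        best = None
        for p in range(len(clusters)):
            for q in range(p + 1, len(clusters)):
                s = linkage(clusters[p], clusters[q])
                if best is None or s > best[0]:
                    best = (s, p, q)
        _, p, q = best
        clusters[p] = clusters[p] + clusters[q]  # merge the closest pair
        del clusters[q]
    return [sorted(c) for c in clusters]
```

Features that end up in the same cluster are highly correlated with one another, so keeping one representative per cluster discards little information.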
F-ratio statistics: The objective of applying F-ratio
statistics was to pick the best feature from each cluster formed in
the dendrogram. The F-ratio was computed for every feature within
each cluster. This F-ratio was expressed as:

\[
F = \frac{\text{between (inter-) speaker variance (considering each speaker as a group)}}
         {\text{within (intra-) speaker variance (considering 5 samples per speaker)}}
\]

 


Basically, the larger the F value of a feature, the greater
the discriminating power indicated by that feature. All
F-ratios so computed within each cluster were then rank-ordered,
and the feature which yielded the largest F-ratio was chosen as the
best feature in that cluster.
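A minimal sketch of the F-ratio and of the rank-ordering step follows. The exact variance estimators used in the study are not given in this section, so the sketch assumes the usual one-way analysis-of-variance form with equal numbers of samples per speaker; the function names are hypothetical.

```python
import statistics

def f_ratio(samples_by_speaker):
    """F-ratio of one feature: between-speaker over within-speaker variance.

    samples_by_speaker: one list of feature values per speaker, all of the
    same length (5 per speaker in the text).
    """
    n = len(samples_by_speaker[0])
    speaker_means = [statistics.mean(v) for v in samples_by_speaker]
    # Between-speaker mean square for equal group sizes.
    between = n * statistics.variance(speaker_means)
    # Within-speaker mean square: average of the per-speaker variances.
    within = statistics.mean(statistics.variance(v) for v in samples_by_speaker)
    return between / within

def best_feature_in_cluster(cluster, f_ratios):
    """Pick the feature with the largest F-ratio among a cluster's members."""
    return max(cluster, key=lambda feature: f_ratios[feature])
```

A feature whose values differ widely between speakers but vary little across repeated samples of the same speaker receives a large F-ratio and is therefore preferred.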

Table 1(a) shows that three different transmission data bases
of the IDS parameter resulted in varying compositions of the
features and their F-ratios within each cluster formed by the
dendrogram. For example, in the case of telephone IDS, two
features, viz., 117 Hz and 195 Hz, were grouped in one cluster,
indicating a relatively high correlation between the two features,
while the same two features were grouped in different clusters in
the case of normal IDS and of linear IDS.

Since it was a necessary condition that each resulting pattern
be composed of the same set of features regardless of the
type of transmission system, some degree of arbitrary interaction
was introduced by the experimenter. By inspecting Table 1(a), the
following strategic steps were taken to determine 10 optimum
features for IDS.

a. The minimum F-ratio, denoted by Fm, was chosen as a cut-off
   criterion and was arbitrarily set to Fm = 5.0.

b. From each cluster of features in the telephone column, the
   feature with the largest F value (equal to or greater
   than Fm) was chosen.

c. F values of the same feature in the other two transmission data
   bases (normal and linear) were checked to see whether they were
   greater than or

Table 1(a). Results of the feature clusterings and the corresponding
F-ratios of IDS in three transmission systems.

 

Telephone transmission       Normal transmission       Linear transmission
Feature Frequency F-ratio    Feature Frequency F-ratio    Feature Frequency F-ratio
29 1211 6.235

28 1172 7.687

26 10916 9.688

,_, 27 1133 11.904

25 1055 3.455

° 69 1992 4.106

g 60 1953 2.629

51 2070 3.194

52 2109 1.566

c1.

 

3

v-l

C

 

 

5 q
7 .
...4
u
30 1
96 3020 16.260
22 937 0.211 ‘9 57 2305 1.626
19 020 6.266 . i 50 2031 2.276
10 701 6.994 .3 25 1055 1.756
17 762 12.342 30 1250 5.421
16 703 12.676 27 1133 6.669 .
15 664 11.010 26 1016 6.140 --t
39 1601 2.666 " 26 1094 5.000 U
13 506 5.546 . 29 1211 7.751
30 1562 6.660 v-I 20 1172 10.974
35 1645 4.350 0 32 1321 9.270
33 1367 6.723 31 1209 4.003
93 3711 17.625
90 3594 20.345 00 69 2773 1.679 o
< 92 3672 17.379 62 1719 0.909
95 __ 3739 _ 22_._6_67 - 73 2930 5.055 -
L:~_-:___3_7§O__ “'2'3.gg}'v H 71 2051 2 H
91, 3633 "‘"2'12033' ° 72 2091 3.949 9
3437 26.927 70 2012 2.696
05 33 0 34. 00
03 3320 23.206 0‘ 36 1406 12.500 " 16 703 16.795
00 3516 39.217 . 35 1445 5.071 . 15 664 13.952
04 3359 36.203 .—4 37 1523 7.576 H 20 059 9.606
37 ...--.5. _ _ _31." U 23 976 3.616 U 19 020 6.673
L§£__ _3555 2 .620:- 75 3000 5.363 17 742 6.965

(continued)

Table 1(a) continued.

Telephone transmission
Feature Frequency F-ratio

61 16 .
36 1606 7.282
b.

 

 

 

37 1523 2.436
36 1404 3.145
32 1320 9.552
29 1211 4.938
V 26 1096 3.301
74 2969 7.161
31 1209 6.600
40 1641 2.176
25 055 2.153
C 02 3201 15.643
3242 .
70 3125 11.430
00 3203 9. 770
(17' ‘_";300‘6 __ 9. 556‘,
79“ ‘ 3164‘“ T396
75 3000 7. 932
I 76 3047 5.472
30 1250 9.709
20 1172 3.033
72 2091 6.292
71 2051 6.043
70 2012 3.670
73 2930 5.019
67 2695 6.294
66 2656 5.050
69 2773 2.300
60 2734 3.374

6].

Normal transmission
Feature Frequency F-ratio
0 .- -.- - u-a».. - - ~ -
. (_9 __-- ,4__30_- __,__6._905,1
61 1600 1. 052
Ch 15 666 7.295
60 1661 6.523
F; 0 391 2.244
0 52 2109 4.640
56 2266 3.103
74 2969 4.660
70 3125 7. 069
(fzf‘j; 30067' f:9. 603‘.
03’ " 3320 ‘ ' 9.691
79 3164 7. 009
.09_ - . -3203 10_. 972
g 02_ _ 3201 __ 12. 2011
‘01 3242 'i1. 306
04 3359 11.300
4 234 0.257
__2|<___ 3631_ h.622___
C 9 3555 13.626
‘9‘ 36777 6.
$3 06 3437 .5.115
05 3390 1.730
. 90 3594 5.062
ed 07 3476 4.933
‘1 00 3516 9.714
7 351 0.003
47 1914 2.014
10 701 3.502
20 059 4.739
19 020 4.236
40 1953 4.063
16 703 7.026

 

 

 

1:1 . £3

9

(:1

l()

(:1.

 

Linear transmission
Feature Frequency F-ratio
71 2051 4.704
6 312__-___ _4_. 0_39'
C 231111395 ____20_ ..0353 ‘.
72 2091 2. 37 0
70 2012 2.999
14 625 6.974
32 1320 5.046
31 1209 7.034
33 1367 4.157
30 1256 5.544
36 1404 7.317
35 __ 1445 ‘__7._7_49
‘1;;4__ ___;Izg§jj:_m._10. 1‘ j
100 ° 3904 157634
90 3906 10.004
99 3945 14.675
37 1523 0.333
20 1562 3.020
24 1016 5.196
, 10 701 7.164
-31-- _ __09§ 11.399“
C17,. #086" 1 2136M
C 94 3756 32 102 3
3023 20 935“'
) 95 3709 21.514
93 3711 20.590
92 3672 22.930
91 3633 16.660
1 90 3594 17.117
12 547 4.634

 

 

* Only features (frequency components) of IDS considered in this
study were the ones circled. Full line circles indicate those
features selected as optimum from the corresponding transmission
column. Dotted line circles also indicate optimum features, but
ones which resulted from the interaction in the selection strategy
discussed in the text. Note that in every transmission column
there are 10 common features as indicated by full and dotted line
circles. Clusterings of 100 features into 10 subsets (as indicated
by cl. 1, cl. 2, etc.) in each column in the above table were produced
by dendrograms in Appendix D.


   equal to Fm = 5.0.

d. Then, if both F values in the two transmission data bases met the
   criterion (F equal to or greater than Fm = 5.0), that feature was
   selected as an optimum feature from the cluster considered.
   Otherwise, another feature with the next largest F value from the
   same cluster was processed in the same sequence until a feature
   was found which met the criterion in a, or all features were
   exhausted.
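Steps a through d amount to a small search within each cluster, sketched below. The function and argument names are hypothetical; `f_primary` stands for the F-ratios of the data base whose clusters are being scanned (telephone in the first pass) and `f_others` for the F-ratios of the same features in the remaining two data bases.

```python
def select_optimum_feature(cluster, f_primary, f_others, f_min=5.0):
    """Return the first feature, in decreasing order of primary F-ratio,
    that reaches f_min in the primary and in every other data base."""
    for feature in sorted(cluster, key=lambda f: f_primary[f], reverse=True):
        if f_primary[feature] < f_min:
            break  # remaining candidates have even smaller F values
        if all(other[feature] >= f_min for other in f_others):
            return feature
    return None  # all features exhausted without meeting the criterion
```

For instance, if the feature with the largest telephone F-ratio falls below 5.0 in the normal data base, the search moves on to the feature with the next largest telephone F-ratio in the same cluster.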

The above steps were repeated for the remaining clusters of
the telephone transmission data base. Optimum features so far
determined are marked by the solid circles in Table 1(a). The same
strategy was applied to further determine the remaining 5 optimum
features based on the other two transmission data bases. The
results are also marked by full line circles entered in the
corresponding columns of Table 1(a). Dotted line circles appearing
in each transmission column of the table are also optimum features
which resulted from the interaction of the aforementioned
steps.

Consequently, 10 mutually common optimum features of the IDS
based on three transmission data bases were selected. These are
summarized in Table 2.

Another set of 10 optimum features of the LTAS was also
prepared by referring to Table 1(b) according to steps similar to
those described above. The resulting 10 optimum features determined for
LTAS are summarized in Table 3.

Table 1(b). Results of the feature clusterings and the corresponding
F-ratios of LTAS in three transmission systems.

 

 

Telephone transmission
Feature Frequency F-ratio
4 4 234 50.969
2 156 6. 92
. ( 1 117 131.090 )_3
* .-----_QQ_-HALQL_
J (_3 ________ 195_ 17.619 ‘
35 IMS 1L2m
M
. 4:4 11.1 3
30 1562 13.054
9 37 1523 0.227
. 42 1719 14.422
4 41 1600 17.039
3 40 1641 9 571
39 1601 0.292
20 1172 3.210
27 1133 7.050
33 1367 10.652
31 1209 10.401
3 32 1320 12.612
. 30 1250 11.374
4 29 1211 6.040
3 0 391 16.120
7 351 13.136
5 273 20.009
56‘. ‘ 26.59.... _ _li-315-
(6.? :72511 -.ll-iOL’
64 " 2570 9.614
62 2500 0.069
61 2461 6.112
63 2539 7.125
r 47 1914 7.965
46 1075 0.026
; 40 1953 3.160
, 26 1094 s 504
25 1055 2.932
24 1016 6.775
23 976 3.942
22 937 4.256
2%9 103m
2930 7.450
_ .. .. 2921 -3éé
‘ -2051 _ _ [§.535‘J
, 0"’ 2734' 103259
4 2695 12.312
J 2012 0.007
2773 11.396
2422 5.075
2303 6.035
2344 7.724
2305 5.406
2266 5.213
2226 4.220
2107 3.161
2031 5.303
> 1w2 57w
. 2140 4.565
4 2109 4.422
J 2070 5.076
1797 7.400
2750 0.521
1036 6.647
312 4.559
3201 16.610
3242_ _ _ _.1__7._930~
~ ' -39.:1'125, -- --_9-.8§_7.1
3125 17 225
° 3006 10.002
1 3164 16.610
3047 14.053
3000 10.020

Normal transmission
Feature Frequency F-ratio
20 059 12.959
19 020 10.354
70 2012 0.907
69 2773 10. 609
72 ..-.-. _ 2321.....- ___-1.15

”‘71-- -_2.3_5_1 ........ 3 -99! J

Z‘p_ __ 742_ _ _12. 901)
703 187073
10 701 17.325
95 3709 13.799
94 3750 22.609

__29.._._.-329§ .--- - lliell

'-_9Z _____ 3067 __.“Léei§§)
99 594 16.670
96 3020 15.930
I“ 'L .

a‘_-_’ﬁtﬂ__2LM6J

11 500 ‘40.115
10 669 26.603
06 3437 0.971
05 3390 0.760
00 3516 0.735
07 3476 12.500
93 3711 2 .360
92 3672 9.701
91 3633 10.460
90 3594 12.744
09 3555 11.660
22 937 6.450
21 090 6.047
74 2969 17.101
73 2930 10.026
60 2422 32.043

 

77 3006 35.397
76 3047 10.257
75 3000 16.621
4 234 75.029

2 156_ _ 107.373\
5 ”10:34:40

5'9 "' 1601 15.202‘
30 1562 15.691
37 1523 21.660
13 506 22.930
50 2344 22.609
57 2305 10.117
56 2266 14.462
55 2226 13.313
17 1916 6.19
66 1075 9.650
15 1036 11.533
44 1797 7.631
43 1750 4.060
12 1719 5.809
11 1600 0.300
40 1661 9.641

(continued)

Linear transmission
Feature Frequency F-ratio
07 3476 7.318

76 3047 39.477
75 3000 40 212
70 3125 64.471
77 3006 31.590
74 2969 25.100
73 2930 20.035
72 2091 26.349
40 1953 6.470
47 1914 9.007
49 1992 10 494
51 2070 9.243
50 2031 9.969
53 2140 9.939
52 2109. 11.612
42 1719 0. 757
41 1600 17. 500
57 2305 30.191
56 2266 41.343
55 2226 17.351
54 2107 10 135
30 1562 16.601
37 1523 19.336
60 1641 15 954
39 1601 9.169
' 35 _ 1445___ __ 15:913.
.’ ‘35 : "T406 57.294 /-
‘3 “'T4‘04"“ 21.670
62 2500 25.403
61 2461 60 127
59 2303 29.503
50 2344 35.593
60 2422 33.442
13 506 17.404
71 2051 23. 296
70 2012 20. 467
69 2773 11.245
60 2734 9.152
67 2695 6.090
66 “56.- _1g4u_
5:63": : :00- _ f- .1005 -:
64’ 2570 16.509
63 2539 15. 591
.95 3709 15.214
94 3750 16. 246
93 3711 23.021
99 3945 17.020
90 906
(::97 3067 33. 640
96 3020 .

100 ___ -395t - —.- 11353
('3' ”L _10.2” J
23 976 12.142
22 937 11.300
26 1016 12 915
19 020 9.535
16 703 25.364

10 78 33:2‘?..
”T7" -h’ 722 .-. 13.5711 ‘
‘23 """i9s"" "BEST
20 359 7. 273

 

c1.

10

c1.

 

8

c1.

cl.

10

c1.

 

Table 1(b) continued.
Telephone transmission Normal transmission Linear transmission
Feature Frequency F-ratio Feature Frequency F-ratio Feature Frequency F-ratio
; 54 2107 11.302 92 3672 16.137
100 3904 11.902 53 2140 10.170 91 3633 10.267
99 3945 17.497 r~ J 52 2109 16.709 r~ 3594 10.290
96 3020 2 . - . 51 2070 12.407 . 09 3555 0.917
M .. I so 2... ...... H . m m
:3 3320 16.394 0 49 1992 11.061 0 10 469 22.361
07 3476 32.655 60 1953 7.620 430 16.746
-25--.-.-.3.996.---_ -9-é3} ...............
(_97“ -__3_86_7 ' 21.4_43 :1 26 1094 9.599 r_'1_2 ___ _54_7_ ___19._4_2§;
95 3709"' "‘1T.462 25 1055 7.505 a: 11 500 39.773
94 3750 13 300 0 391 17.570 . 3_ .
93 3711 13.005 29 1211 19.242 ,_, ' _ 2.. __-_ _143.02_~
92 3672 14.301 20 1172 22.032 :1 t 1 117__ 191.461 1
90 3594 10.374 00 27 1133 17.779 ~ - " "'" "“""‘
91 3633 12.009 32 1320 27.179 0 391 10.092
09 3555 13.005 ,4 31 1209 19 942 5 273 37.250
34 14.007 01 30 1250 17.065 15 664 39.059
24 1016 10.195 Ch 14 625 23.351
32 1320 19.659
1 15 664 17.506 . 31 1209 10.072
Ch 7 351 31.731 *4 33 1367 26.045
, 14 625 13.370 U 7 351 23.070
.4 23 976 5/905
0 5 273 03.671 46 1075 14.662
45 1036 14.907
64 2570 13.550 44 1797 24.200
<3 63 2539 14.040 c3 43 1797 17.924
.4 67 2695 15.116 '9 26 1094 11.755
. 66 2656 17.302 _ 25 1055 7.656
,4 60 2734 11.231 .4 27 1133 19.000
0 o 29 1211 10.722
- 430 20.091 20 1172 14.200
6 312 12.267 30 1250 10.073

 

 

* Only the circled features (frequency components) of the LTAS were
considered in this study. Full-line circles indicate features
selected as optimum from the corresponding transmission column.
Dotted-line circles also indicate optimum features, but ones that
resulted from the interaction in the selection strategy discussed
in the text. Note that every transmission column has 10 common
features, indicated by the full-line and dotted-line circles
combined. The clusterings of the 100 features into 10 subsets
(indicated by cl. 1, cl. 2, etc.) in each column of the above
table were produced from the dendrograms in Appendix D.

 


 

 

Table 2. Optimum features selected for IDS parameter (as denoted by
circles in Table 1(a)).

                                   F-ratio* in
Feature   Frequency   Telephone      Normal         Linear
  (#)       (Hz)      transmission   transmission   transmission

   1         117        11.02          23.75          31.99
   2         195        30.28          61.35          28.83
   3         430         9.02           6.96          10.26
   4         508         7.40          13.91          12.71
   5        1406         7.28           8.47          10.61
   6        3086         9.56           9.61          10.99
   7        3281        15.64          12.20           9.13
   8        3555        21.62          15.63          15.48
   9        3750        23.28          12.62          32.10
  10        3867        22.92          14.55          21.68

* F-ratio is statistically significant (p=0.05) for F > 2.12 with
Df of numerator = 9, and Df of denominator = 40.

 

 

 


Table 3. Optimum features selected for LTAS parameter (as denoted by
circles in Table 1(b)).

                                   F-ratio* in
Feature   Frequency   Telephone      Normal         Linear
  (#)       (Hz)      transmission   transmission   transmission

   1         117       131.10         147.55         191.46
   2         195        17.62          50.30          28.36
   3         547        12.85          21.09          18.84
   4         742        36.73          21.90          18.84
   5        1406        20.90          39.17          47.20
   6        2617        11.41          29.24          10.62
   7        2851        15.24           8.36          23.30
   8        3203         9.89          43.49          56.85
   9        3359        37.87          17.06          10.85
  10        3867        21.44          14.59          33.64

* F-ratio is statistically significant (p=0.05) for F > 2.12 with
Df of numerator = 9, and Df of denominator = 40.

 


The general trend shown in Table 2 was that the lowest two
frequency components, 117 Hz and 195 Hz, and the highest four
components, 3281 Hz, 3555 Hz, 3750 Hz, and 3867 Hz, had relatively
greater F values than the frequencies between the two extremes.
Unlike the case of IDS, most features of the LTAS resulted in
somewhat similar F values across the three transmission systems,
except for one, 117 Hz. This feature yielded the extremely high
values of 131.10, 147.55, and 191.46 for the telephone, normal, and
linear transmission LTAS parameters, respectively.

In all cases, the F-ratio was statistically significant (at
p=0.05) for F>2.12. All features (for IDS and LTAS) had F values
greater than this critical value of 2.12. One interesting outcome
was the composition of the optimum features in the IDS and LTAS
parameters: in spite of the presumably different speech
characteristics that they carried, four features (117, 195, 1406,
and 3867 Hz) were shared by both parameters. The remaining six
features were distributed somewhat differently along the frequency
interval.

2. FFC

Since the number of features included in an FFC was not very
large, feature optimization for the FFC speech parameter was
attempted by the use of the F-ratio alone. To study their relative
effectiveness in discriminating between different speakers, all
features in an FFC were subjected to F-ratio statistics. Each F
value was computed over 50 patterns (5 for each speaker),
separately for each of the three transmission systems.
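As a rough illustration of the F-ratio statistic used here, the following Python sketch computes a one-way analysis-of-variance F value for a single feature. It assumes that the F-ratio is the between-speaker mean square divided by the within-speaker mean square, which agrees with the degrees of freedom quoted in this chapter (9 and 40 for 10 speakers with 5 samples each). The function name `f_ratio` and the sample values are hypothetical and are not taken from the study.

```python
from statistics import mean


def f_ratio(groups):
    """Return (F, df_between, df_within) for one feature.

    groups: one list of feature values per speaker.
    Assumption: F = between-speaker mean square / within-speaker mean square.
    """
    k = len(groups)                              # number of speakers
    n_total = sum(len(g) for g in groups)        # total number of patterns
    grand_mean = mean([x for g in groups for x in g])
    group_means = [mean(g) for g in groups]

    # Variation of the speaker means around the grand mean.
    ss_between = sum(len(g) * (m - grand_mean) ** 2
                     for g, m in zip(groups, group_means))
    # Variation of each sample around its own speaker mean.
    ss_within = sum((x - m) ** 2
                    for g, m in zip(groups, group_means) for x in g)

    df_between = k - 1
    df_within = n_total - k
    f_value = (ss_between / df_between) / (ss_within / df_within)
    return f_value, df_between, df_within


# Made-up values for 10 speakers with 5 samples each (not data from this study).
toy = [[100 + 10 * s + d for d in (-2, -1, 0, 1, 2)] for s in range(10)]
f_value, df_b, df_w = f_ratio(toy)
print(df_b, df_w)  # degrees of freedom for a 10 x 5 design
```

With 10 speakers and 5 samples per speaker the sketch yields 9 and 40 degrees of freedom, the same pair against which the critical value of 2.12 is quoted.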

 

 

 

 


 

 

 

Table 4 lists the F-ratios of the nine features of the FFC
computed from speech data in the three transmission systems. As
shown in Table 4, F̄o (mean fundamental frequency) had the largest
F-ratio in all transmission systems (F=429.705, 147.915, and
313.346 for telephone, normal, and linear, respectively). σFo
(standard deviation of Fo) resulted in a much smaller F-ratio than
that of F̄o in all transmission systems (F=55.378, 28.488, and
40.976 for telephone, normal, and linear, respectively).

These results for F̄o and σFo comply with those reported by
Markel et al. (1977), Hunt et al. (1977), and Atal (1972) with
respect to the relative effectiveness of these two features in
discriminating speakers.

Less conspicuous features were σAo (standard deviation of the
amplitudes of successive peaks of Fo) and ΔAo (temporal variation
of the amplitudes of successive peaks of Fo), which had the
smallest F-ratios among the features in all transmission systems.
In linear transmission in particular, these two features, σAo and
ΔAo, yielded F-ratios smaller than the critical value of F=2.12.

3. SPT

No feature optimization procedure was applied to the SPT, i.e.,
all SPT's retained the original dimensionality of 2048 frequency
components and were entered as they were into the voice
identification operations.

 

 

 


   

Table 4. Nine features and F-ratios of the FFC.

                             F-ratio* in
Feature       Telephone      Normal         Linear
name**        transmission   transmission   transmission

F̄o             429.705        147.915        313.346
σFo             55.378         28.488         40.976
ΔFo             34.616         37.447         27.443
ΔFo / F̄o         5.274          5.404          6.499
Fo(max)         41.384         51.388          9.240
Fo(min)          7.289         11.570          4.440
Fo(rng)         18.457         17.272          7.107
ΔAo              5.274          5.320          1.152
σAo              2.486          4.374          1.370

* The F-ratio of each feature for each transmission was computed on
a data set of 50 samples. The between-speaker variance was based
on 10 speakers and the within-speaker variance was based on 5
samples for each speaker. The F-ratio was statistically significant
(p = 0.05) for F > 2.12 with Df of numerator = 9, and Df of
denominator = 40.

** F̄o = mean fundamental frequency (Fo).
σFo = standard deviation of Fo.
ΔFo = average temporal variation of Fo.
ΔFo / F̄o = ratio of the average variation of Fo to the mean of Fo.
Fo(max) = maximum (highest) Fo.
Fo(min) = minimum (lowest) Fo.
Fo(rng) = range of Fo.
ΔAo = average temporal variation of peak amplitude of Fo.
σAo = standard deviation of peak amplitude of Fo.

 

 

 

 


 

 

 

VOICE IDENTIFICATION EXPERIMENTS

Organization of Experiment

A total of 24 voice identification experiments were conducted
with different combinations of the speech parameters and types of
transmission systems. All the parameters were tested in
cross-transmission as well as within-transmission voice
identification experiments. In the cross-transmission experiments,
all unknown speakers' voices were based upon the telephone system,
while all known speakers' voices were based upon either the normal
or the linear transmission system. It was assumed that in this case
all speakers were represented with biased response characteristics.
In contrast, in the within-transmission experiments, both the
unknown's and the known's voices were based upon the same
transmission system. Hence, in the latter case, all the speakers
were represented free from the influence of the transmission system.

The two major steps involved in each experiment were 1)
measurement of distance and 2) application of the decision rules.
A description of these two steps follows.

Distance Measurement

As a measure of proximity, or separation, between a pair of
patterns, one belonging to the unknown and another belonging to the
known, Euclidian distance was applied. Euclidian distance is a
vectorial summation of the differences between a pair of features

available in the patterns. This implies that if the values

 

 


 

 

assigned to the features are not distributed homogeneously across a
pattern, the distance measure may introduce a highly biased result.
For this reason, prior to the computation of Euclidian distance,
all features in the parameters (IDS, LTAS, and FFC) were
standardized by Z-transformation. Each feature in a pattern was
standardized by transforming it into a Z-score as described below.

\[
Z_{ij} = \frac{P_{ij} - \bar{P}_i}{\sigma_{P_i}}, \qquad
i = 1, 2, \ldots, I \ \text{(number of features)}, \quad
j = 1, 2, \ldots, J \ \text{(number of patterns)}
\]

where \bar{P}_i and \sigma_{P_i} are the mean and the standard
deviation of the ith feature computed over J (=50 in this study)
patterns. Z_{ij} is the transformed score for the ith feature of the
jth pattern. Then, these standardized Z values were used in the
subsequent Euclidian distance measurement.

Euclidian distance was calculated by the following expression:

\[
D_{ij} = \sqrt{\sum_{k=1}^{K} \left( Z_{ik} - Z_{jk} \right)^{2}}
\]

where D_{ij} = Euclidian distance between the ith pattern and the jth
               pattern,
      Z_{ik} = kth feature of the ith pattern,
      Z_{jk} = kth feature of the jth pattern,
      K      = total number of features within a pattern.
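The standardization and distance computation described above can be sketched in Python as follows. This is a minimal sketch under stated assumptions: the standard deviation is taken in its population form (the text does not say whether n or n-1 was used), every feature is assumed to have nonzero spread, and the function names and toy values are hypothetical.

```python
import math
from statistics import mean, pstdev


def z_standardize(patterns):
    """Convert each feature to a Z-score across all J patterns.

    patterns: list of J patterns, each a list of I feature values.
    Assumption: population standard deviation; a feature with zero
    spread would cause a division by zero and is not handled here.
    """
    n_features = len(patterns[0])
    means = [mean([p[i] for p in patterns]) for i in range(n_features)]
    sds = [pstdev([p[i] for p in patterns]) for i in range(n_features)]
    return [[(p[i] - means[i]) / sds[i] for i in range(n_features)]
            for p in patterns]


def euclidean(z_a, z_b):
    """Square root of the summed squared feature differences."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(z_a, z_b)))


# Two made-up patterns whose features live on very different scales.
raw = [[1.0, 100.0], [3.0, 300.0]]
z = z_standardize(raw)
print(euclidean(raw[0], raw[1]))  # dominated by the large-valued feature
print(euclidean(z[0], z[1]))      # both features contribute equally
```

The toy values show why standardization precedes the distance step: on the raw values the second feature alone determines the distance, while after the Z-transformation both features carry equal weight.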

 

 

 

 

 


Decision rules

 

Two decision rules, the nearest-neighbor decision rule and the
minimum set distance rule, were applied concurrently in all voice
identification experiments.

1. The nearest-neighbor decision rule:

This decision rule assigned one of the known speakers to the
unknown by the following sequence, which is illustrated by the
simplified diagram in Figure 11.

a. Designating one of the n patterns belonging to the unknown
as the test pattern, and all patterns belonging to the
knowns as reference patterns.

b. Computing the Euclidian distance between this test pattern
and all reference patterns.

c. Assigning the test pattern to the known speaker one of whose
reference patterns is the closest (the nearest) to the
test pattern. One decision has then been rendered.

d. Repeating a through c until all patterns of the unknown
have been processed as test patterns.

Up to this point, n identification decisions (n=5 in this study)
had been reached, one for each of the n test patterns available for
an unknown, each based on the Euclidian distances from that test
pattern to all reference patterns of the known speakers (the number
of reference patterns in this study was 50). Then the entire
sequence was repeated to process the remaining unknown speakers.

A total of 50 decisions for each voice identification
experiment was yielded by the nearest-neighbor decision rule.
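Steps a through d can be sketched in Python as below. This is only an illustrative sketch: the dictionary layout, the function names `nearest_neighbor_decisions` and `correct_rate`, and the toy patterns are assumptions made for the example, and the patterns are assumed to be already standardized.

```python
import math


def nearest_neighbor_decisions(unknown, known):
    """Return one (unknown speaker, assigned known speaker) pair per test pattern.

    unknown, known: dict mapping a speaker id to a list of patterns.
    """
    decisions = []
    for u_id, test_patterns in unknown.items():
        for test in test_patterns:
            # Steps a-b: distance from this test pattern to every reference pattern.
            candidates = [(math.dist(test, ref), k_id)
                          for k_id, refs in known.items() for ref in refs]
            # Step c: the speaker owning the closest reference pattern is assigned.
            _, assigned = min(candidates, key=lambda c: c[0])
            decisions.append((u_id, assigned))
    return decisions


def correct_rate(decisions):
    """Percentage of decisions in which the assigned speaker is the true one."""
    hits = sum(1 for true_id, assigned in decisions if true_id == assigned)
    return 100.0 * hits / len(decisions)


# Made-up two-speaker example with two patterns per speaker.
unknown = {1: [[0.0, 0.1], [0.2, 0.0]], 2: [[5.0, 5.1], [0.1, 0.3]]}
known = {1: [[0.0, 0.0], [0.1, 0.1]], 2: [[5.0, 5.0], [5.2, 4.9]]}
decisions = nearest_neighbor_decisions(unknown, known)
print(decisions, correct_rate(decisions))
```

With 10 unknown speakers and 5 test patterns each, the same loop would render the 50 decisions per experiment mentioned above.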

 

[Figure 11 diagram: test patterns of unknown speakers 1 through 10 (by telephone) on the left; reference patterns of known speakers 1 through 10 (by normal/linear) on the right.]

Figure 11. A diagram illustrating the nearest-neighbor decision rule.
All the speakers are treated as the unknowns, based on the telephone
transmission, as well as the knowns, based on the linear/normal
transmission system. An arrow indicates the Euclidian distance from
the test pattern of the unknown to a reference pattern of the
knowns. Note that Euclidian distance is not computed among the
unknowns nor among the knowns, and that the length of an arrow
is not proportional to the actual Euclidian distance.


2. The minimum set distance rule:

Unlike the former decision rule, this set distance rule
requires a priori category (speaker) information in determining the
set distance between the two speakers being processed. A set
consists of the n patterns assigned to each speaker. The major
steps to compute a set distance in this study are discussed next.
Figure 12 illustrates these steps using n=3 for simplicity.

a. The Euclidian distances from each pattern in the set of
unknown patterns to each pattern in each set of known
patterns are computed.

b. Then, for each unknown pattern, the maximum of its distances
to the known patterns within a category is chosen.

c. From this set of maximum Euclidian distances, the minimum
is chosen to represent the distance between the unknown
speaker and that known speaker. This is done for every
known speaker, and the resulting set distances are ranked
from the shortest to the longest.

d. Finally, the known speaker whose set distance to the
unknown is the shortest is assigned to the unknown.

The above procedures were then repeated for the remaining unknown
speakers. A total of 10 decisions for each voice identification
experiment was yielded by the minimum set distance decision rule.
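The set distance of steps a through d can be sketched in Python as follows. The sketch assumes the reading given in the caption of Figure 12: for each unknown pattern the maximum distance to one known speaker's patterns is taken, and the minimum of those maxima is the set distance. The names `set_distance` and `identify` and the toy patterns are hypothetical.

```python
import math


def set_distance(unknown_patterns, known_patterns):
    """Minimum, over the unknown patterns, of the maximum distance
    from that unknown pattern to the patterns of one known speaker."""
    maxima = [max(math.dist(u, k) for k in known_patterns)   # steps a-b
              for u in unknown_patterns]
    return min(maxima)                                        # step c


def identify(unknown_patterns, known):
    """Rank the known speakers by set distance; the first one is assigned."""
    ranked = sorted(known, key=lambda k_id: set_distance(unknown_patterns,
                                                         known[k_id]))
    return ranked[0], ranked                                  # step d


# Made-up example: one unknown set and two known sets, two patterns each.
unknown_set = [[0.0, 0.0], [1.0, 0.0]]
known = {"A": [[0.0, 1.0], [1.0, 1.0]], "B": [[4.0, 0.0], [5.0, 0.0]]}
assigned, ranking = identify(unknown_set, known)
print(assigned, ranking)
```

Because one decision is made per unknown speaker rather than per test pattern, 10 unknown speakers give the 10 decisions per experiment stated above.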

The two decision rules described above were applied concurrently
to the voice identification operations conducted under the various
combinations of speech parameters and transmission systems.

 

 

 

 

Figure 12. A diagram showing an example of the minimum set distance rule.
Three speakers are shown, one as an unknown and two as knowns, each
represented by three patterns. In this diagram, the symbol U11 denotes
the first pattern of the unknown speaker 1, and the symbol K11 the first
pattern of the known speaker 1. A line drawn between a pattern of the
unknown and that of the known indicates the Euclidian distance. The
length of the line is proportional to the Euclidian distance computed.
A '”' indicates the maximum distance among Euclidian distances computed
from a pattern of the unknown to all patterns of the known. A '0' is
the minimum distance of the maxima between the unknown and the known.
In this example, the minimum set distance between the unknown and the
known speaker 1 is designated as D(1,1) and that between the unknown
and the known speaker 2 as D(1,2). Since D(1,1) < D(1,2) in this
example, the unknown is identified with the known speaker 1.

 

 


CHAPTER III

RESULTS

This chapter focuses on the results of the voice
identification operations, which were conducted by using different
speech parameters tested under various combinations of transmission
systems (telephone, normal, and linear). The speech parameters
tested were IDS (intensity deviation spectra), LTAS (long-term
averaged spectra), FFC (fundamental frequency contour), SPT (choral
spectra), and two composite parameters, one of IDS and FFC and one
of LTAS and FFC. In each identification operation, 10 male speakers
served as both unknown and known speakers. The speech data obtained
from all the speakers were 'text-independent' as described in the
previous chapter.

Table 5 summarizes the results of the cross-transmission voice
identification (all the unknown speakers were recorded through a
telephone system and all the known speakers were recorded through
either the normal or the linear system). The relative effectiveness
of the speech parameters is shown in Table 5 in terms of the correct
identification rates. It is clearly seen from this table that the
highest correct identification rate of 100 % was achieved by the
composite parameter of LTAS and FFC, and the lowest rate of 20 % by
SPT. The identification rates of the remaining parameters, IDS,
LTAS, and FFC (each tested as a single parameter) and the composite


Table 5. Summary of the results of the cross-transmission
voice identification operations.

Type of parameter             Rate of correct
and transmission              identification (%)*

IDS
  Telephone vs. Normal               70
  Telephone vs. Linear               60
LTAS
  Telephone vs. Normal               70
  Telephone vs. Linear               70
FFC
  Telephone vs. Normal               50
  Telephone vs. Linear               40
SPT
  Telephone vs. Normal               20
  Telephone vs. Linear               20
IDS + FFC
  Telephone vs. Normal               60
  Telephone vs. Linear               60
LTAS + FFC
  Telephone vs. Normal              100
  Telephone vs. Linear              100

* By the minimum set distance rule.

 


of IDS and FFC fell in the intermediate range of 40 to 70 %.

Table 6 provides more comprehensive results, which allow
interpretation of how far each parameter eliminated the influence
of the response curve of the transmission systems. Further detailed
identification results for all the operations are presented in
Appendix G. The two independently applied decision rules, as seen
in Table 6, resulted in close conformity to one another. Therefore,
the following discussion of the results is based on the
identification rates yielded by the minimum set distance rule only.

First, IDS had the effect of eliminating the influence of
the response curve. This interpretation is clearly supported by
the fact that similar identification rates were obtained in both
the cross-transmission and the within-transmission identification
operations. This effect can also be seen by comparing the rates
obtained by IDS (60 to 70 %) with the ones obtained by SPT (20 %).

Second, LTAS was found to be susceptible to the influence of
the type of transmission system used. This susceptibility is
shown by the decrease in the identification rates from 100 % (when
tested under the within-transmission operations) to 60 to 70 %
(when tested under the cross-transmission operations). In spite of
this susceptibility of the LTAS to the type of transmission system,
this parameter yielded about the same identification rate as the
IDS. The reason that both LTAS and IDS resulted in the same
identification rate could lie in the way that LTAS was extracted by
overlapping the set of short-term spectra. Practically, LTAS was
the same as the denominator used in the expression to compute the

 

 

 

 

 

 

 

 


 

 

 

 

 

Table 6. Summary of the results of 24 voice identification operations.

               Type of transmission and speech                Identification
               parameter used for                             rate (in %)
               unknown speakers         known speakers        rule 1*  rule 2**

Design 1
Cross-         Telephone (IDS)      vs. Normal (IDS)            52       70
transmission   Telephone (IDS)      vs. Linear (IDS)            56       60
               Telephone (LTAS)     vs. Normal (LTAS)           70       70
               Telephone (LTAS)     vs. Linear (LTAS)           66       70
Within-        Telephone (IDS)      vs. Telephone (IDS)         58       60
transmission   Normal (IDS)         vs. Normal (IDS)            64       70
               Linear (IDS)         vs. Linear (IDS)            70       70
               Telephone (LTAS)     vs. Telephone (LTAS)       100      100
               Normal (LTAS)        vs. Normal (LTAS)           98      100
               Linear (LTAS)        vs. Linear (LTAS)           98      100
Design 2
Cross-         Telephone (FFC)      vs. Normal (FFC)            52       50
transmission   Telephone (FFC)      vs. Linear (FFC)            58       40
Within-        Telephone (FFC)      vs. Telephone (FFC)         48       40
transmission   Normal (FFC)         vs. Normal (FFC)            48       40
               Linear (FFC)         vs. Linear (FFC)            56       50
Design 3
Cross-         Telephone (IDS+FFC)  vs. Normal (IDS+FFC)        62       60
transmission   Telephone (IDS+FFC)  vs. Linear (IDS+FFC)        56       60
               Telephone (LTAS+FFC) vs. Normal (LTAS+FFC)       92      100
               Telephone (LTAS+FFC) vs. Linear (LTAS+FFC)       94      100
Design 4
Cross-         Telephone (SPT)      vs. Normal (SPT)            10       20
transmission   Telephone (SPT)      vs. Linear (SPT)            20       20
Within-        Telephone (SPT)      vs. Telephone (SPT)         92       80
transmission   Normal (SPT)         vs. Normal (SPT)            94       90
               Linear (SPT)         vs. Linear (SPT)            88       80

* The nearest-neighbor decision rule.
** The minimum set distance decision rule.

 

 

 

 

 

 

 

 


 

 

 

 

IDS parameter.

Third, FFC was also shown to be quite free from the influence
of the frequency response curve: FFC resulted in very similar
correct identification rates no matter what transmission systems
were used for the unknown and known speakers. However, the rates
were only moderate at 40 to 50 %. This implies that FFC, although
free from the influence of the response curve, may not be a
sufficiently effective speech parameter for voice identification.

With reference to the FFC features, the raw data presented in
Appendix E were inspected. The inspection revealed that certain
groups of speakers shared extremely similar F̄o's (average
fundamental frequency) and other features. For example, speakers 1,
3, 5, and 9 had almost interchangeably close F̄o's among
themselves. Speakers 6 and 10 formed another group with very close
F̄o's. This homogeneity of the distribution of F̄o's within certain
groups of speakers appears rather contradictory to the fact that
this feature, F̄o, resulted in the highest F-ratio (indicating good
discriminating power) in all the transmission data bases. Such a
contradiction, however, may not be surprising considering that the
F-ratio (as applied in this study) reflected only the variation of
the feature values among the speakers as a whole group, rather than
the variation between all possible pairings of the individual
speakers. Apparently, the face value of the F-ratio must be
interpreted with caution as a measure of discriminating power for
the speakers.

 

 

 

 

 

 

 


Fourth, SPT came out as predicted. High correct
identification rates (80 to 90 %) were produced when the voices
were recorded by only one type of transmission system, but the
rates decreased to 20 % when the voices of the unknown and the
known speakers were recorded through different transmission systems.

Fifth, the composite of IDS and FFC did not show any
noticeable improvement in terms of the elimination effect or the
correct identification rates. This was probably due to the fact
that both parameters included the same type of speech
characteristics. In view of the result that these two parameters
(as tested separately) were relatively free from the influence of
the transmission system, the identification rate of 60 % by a
combination of the features from IDS and FFC was a rather
disappointing outcome.

Finally, the composite of LTAS and FFC showed the unexpectedly
high correct identification rate of 100 % in the two
cross-transmission operations (telephone vs. normal and telephone
vs. linear). The probable reason for this high identification rate
is as follows. LTAS and FFC carried different types of speech
characteristics that worked in a complementary fashion, i.e., LTAS
provided the static spectral features, thus reflecting more or less
the average vocal tract shape during speech production, while FFC
contained the fundamental-frequency-related features, thus
reflecting information about the glottal dynamics in on-going
speech.

 

 

 


 

 

 

 

Figures 13(a-e), 14(a-e), and 15(a-e) show, for the sake of
illustration, two-dimensional projections (nonlinear projection
algorithm by Sammon, 1969), each projection consisting of
1 unknown and 5 known speakers. Briefly, Sammon's projection
performs a point mapping of N L-dimensional vectors from the
L-space to a lower-dimensional space so as to preserve the
approximate data structure. In this study, N=6 (1 unknown and 5
known speakers) and L=5 (5 samples/speaker) were plotted into a
two-dimensional space. In each projection, the five patterns
(samples) of the unknown speaker are denoted by ui (i = unknown
speaker index from 1 to 5), and the center of the dispersion of the
unknown speaker i is indicated by Uci. The patterns of the known
speakers are denoted simply by the speaker index, and the center of
the dispersion of known speaker i is indicated by Kci.
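To make "preserving the approximate data structure" concrete, the following Python sketch computes the mapping error (stress) that Sammon's nonlinear mapping seeks to minimize: the squared differences between the original and the projected inter-point distances, each weighted by the inverse of the original distance and normalized by the sum of the original distances. Only the error measure is shown, not the iterative minimization; the function names and the toy points are hypothetical, and all original points are assumed to be distinct.

```python
import math


def pairwise_distances(points):
    """Distances between all pairs of points, keyed by the index pair (i, j), i < j."""
    n = len(points)
    return {(i, j): math.dist(points[i], points[j])
            for i in range(n) for j in range(i + 1, n)}


def sammon_stress(high_dim, low_dim):
    """Sammon's mapping error between original and projected configurations.

    Assumption: no two original points coincide (otherwise a zero
    distance would appear in a denominator).
    """
    d_star = pairwise_distances(high_dim)   # distances in the original space
    d = pairwise_distances(low_dim)         # distances in the projection
    total = sum(d_star.values())
    return sum((d_star[p] - d[p]) ** 2 / d_star[p] for p in d_star) / total


# Made-up three-point example in three dimensions.
original = [[0.0, 0.0, 0.0], [3.0, 4.0, 0.0], [6.0, 8.0, 0.0]]
faithful = [[0.0, 0.0], [3.0, 4.0], [6.0, 8.0]]     # keeps every distance
distorted = [[0.0, 0.0], [1.0, 0.0], [2.0, 0.0]]    # shrinks every distance
print(sammon_stress(original, faithful))
print(sammon_stress(original, distorted))
```

A stress of zero means that all inter-point distances are reproduced exactly; larger values mean that the two-dimensional picture distorts the distances on which the decision rules actually operate.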

In Figure 13(a-e) all the speakers (unknown and known) are
represented by the telephone IDS data base. It is shown that unknown
speaker 1 is closest to known speaker 1 (correct identification,
13(a)) and unknown speaker 4 is closest to known speaker 4 (correct
identification, 13(d)), but unknown speaker 5 is closest to unknown
speaker 3 (incorrect identification, 13(c)). As clearly shown in
these projections (Figure 13(a-e)), the relatively tight spatial
distribution of the 5 speakers could account for the rather moderate
correct identification rate achieved by IDS as tested under all the
within- and cross-transmission operations.

In Figure 14(a-e), each projection shows 1 unknown and 5 known
speakers, all the speakers represented by telephone LTAS. In

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

Figure 13(a). 5 known speakers (telephone IDS) vs. unknown speaker 1
(telephone IDS):

Known speaker 1 = 1
Known speaker 2 = 2
Known speaker 3 = 3
Known speaker 4 = 4
Known speaker 5 = 5
Unknown speaker 1 = u1

Kcj = The center of dispersion of samples of the jth known speaker.
Uc1 = The center of dispersion of samples of unknown speaker 1.

Figure 13(a-e). Sammon's projections of 5 known speakers and 5 unknown
speakers, all represented by telephone IDS parameter: (a) 5 knowns vs.
unknown 1; (b) 5 knowns vs. unknown 2; (c) 5 knowns vs. unknown 3;
(d) 5 knowns vs. unknown 4; and (e) 5 knowns vs. unknown 5.

 

 

 

 

 

 

Figure 13(b). 5 known speakers (telephone IDS) vs. unknown speaker 2
(telephone IDS):

Known speaker 1 = 1
Known speaker 2 = 2
Known speaker 3 = 3
Known speaker 4 = 4
Known speaker 5 = 5
Unknown speaker 2 = u2

Kcj = The center of dispersion of samples of the jth known speaker.
Uc2 = The center of dispersion of samples of unknown speaker 2.

 

 

 

 

Figure 13(c). 5 known speakers (telephone IDS) vs. unknown speaker 3
(telephone IDS):

Known speaker 1 = 1
Known speaker 2 = 2
Known speaker 3 = 3
Known speaker 4 = 4
Known speaker 5 = 5
Unknown speaker 3 = u3

Kcj = The center of dispersion of samples of the jth known speaker.
Uc3 = The center of dispersion of samples of unknown speaker 3.

 

 

Figure 13(d). 5 known speakers (telephone IDS) vs. unknown speaker 4
(telephone IDS):

Known speaker 1 = 1
Known speaker 2 = 2
Known speaker 3 = 3
Known speaker 4 = 4
Known speaker 5 = 5
Unknown speaker 4 = u4

Kcj = The center of dispersion of samples of the jth known speaker.
Uc4 = The center of dispersion of samples of unknown speaker 4.

 

Figure 13(e). 5 known speakers (telephone IDS) vs. unknown speaker 5
(telephone IDS):

Known speaker 1 = 1
Known speaker 2 = 2
Known speaker 3 = 3
Known speaker 4 = 4
Known speaker 5 = 5
Unknown speaker 5 = u5

Kcj = The center of dispersion of samples of the jth known speaker.
Uc5 = The center of dispersion of samples of unknown speaker 5.

 

 

Figure 14(a). 5 known speakers (telephone LTAS) vs. unknown speaker 1
(telephone LTAS):

Known speaker 1 = 1
Known speaker 2 = 2
Known speaker 3 = 3
Known speaker 4 = 4
Known speaker 5 = 5
Unknown speaker 1 = u1

Kcj = The center of dispersion of samples of the jth known speaker.
Uc1 = The center of dispersion of samples of unknown speaker 1.

Figure 14(a-e). Sammon's projections of 5 known speakers and 5 unknown
speakers, all represented by the telephone LTAS parameter:
(a) 5 knowns vs. unknown 1; (b) 5 knowns vs. unknown 2;
(c) 5 knowns vs. unknown 3; (d) 5 knowns vs. unknown 4;
(e) 5 knowns vs. unknown 5.

Figure 14(b). 5 known speakers (telephone LTAS) vs. unknown speaker 2
(telephone LTAS):

Known speaker 1 = 1
Known speaker 2 = 2
Known speaker 3 = 3
Known speaker 4 = 4
Known speaker 5 = 5
Unknown speaker 2 = u2

Kcj = The center of dispersion of samples of the jth known speaker.
Uc2 = The center of dispersion of samples of the unknown speaker 2.

 

Figure 14(c). 5 known speakers (telephone LTAS) vs. unknown speaker 3
(telephone LTAS):

Known speaker 1 = 1
Known speaker 2 = 2
Known speaker 3 = 3
Known speaker 4 = 4
Known speaker 5 = 5
Unknown speaker 3 = u3

Kcj = The center of dispersion of samples of the jth known speaker.
Uc3 = The center of dispersion of samples of the unknown speaker 3.

Figure 14(d). 5 known speakers (telephone LTAS) vs. unknown speaker 4
(telephone LTAS):

Known speaker 1 = 1
Known speaker 2 = 2
Known speaker 3 = 3
Known speaker 4 = 4
Known speaker 5 = 5
Unknown speaker 4 = u4

Kcj = The center of dispersion of samples of the jth known speaker.
Uc4 = The center of dispersion of samples of the unknown speaker 4.

 

Figure 14(e). 5 known speakers (telephone LTAS) vs. unknown speaker 5
(telephone LTAS):

Known speaker 1 = 1
Known speaker 2 = 2
Known speaker 3 = 3
Known speaker 4 = 4
Known speaker 5 = 5
Unknown speaker 5 = u5

Kcj = The center of dispersion of samples of the jth known speaker.
Uc5 = The center of dispersion of samples of the unknown speaker 5.

 


contrast to the spatial dispersion of the 5 known speakers by IDS
given in Figure 13(a-e), here all the speakers (both unknown and
known) are more clearly separated.

In Figure 15(a-e), each projection shows 1 unknown and 5 known
speakers, all speakers being represented by the composite of the
LTAS and FFC parameters, with the known speakers recorded through
the normal transmission system and the unknown speaker through the
telephone transmission system. These projections (a-e) indicate that
all 5 unknown speakers were correctly identified and that all the
known speakers had relatively small intra-speaker variation.

 

Figure 15(a). 5 known speakers (composite of FFC and LTAS by normal
transmission) vs. unknown speaker 1 (composite of FFC and LTAS by
telephone transmission):

Known speaker 1 = 1
Known speaker 2 = 2
Known speaker 3 = 3
Known speaker 4 = 4
Known speaker 5 = 5
Unknown speaker 1 = u1

Kcj = The center of dispersion of samples of the jth known speaker.
Uc1 = The center of dispersion of samples of the unknown speaker 1.

Figure 15(a-e). Sammon's projections of 5 known speakers and 5 unknown
speakers, with the knowns represented by the composite parameter of FFC
and LTAS by normal transmission system and the unknowns represented by
the same composite parameter by telephone transmission system.
5 known speakers vs. (a) unknown speaker 1; (b) unknown speaker 2;
(c) unknown speaker 3; (d) unknown speaker 4; (e) unknown speaker 5.

 

 

 

Figure 15(b). 5 known speakers (composite of FFC and LTAS by normal
transmission) vs. unknown speaker 2 (composite of FFC and LTAS by
telephone transmission):

Known speaker 1 = 1
Known speaker 2 = 2
Known speaker 3 = 3
Known speaker 4 = 4
Known speaker 5 = 5
Unknown speaker 2 = u2

Kcj = The center of dispersion of samples of the jth known speaker.
Uc2 = The center of dispersion of samples of the unknown speaker 2.

 

 

 

 

Figure 15 (c). 5 known speakers (composite of FFC and LTAS by normal
transmission) vs. unknown speaker 3 (composite of FFC and LTAS by
telephone transmission).

Known speaker 1 = 1
Known speaker 2 = 2
Known speaker 3 = 3
Known speaker 4 = 4
Known speaker 5 = 5
Unknown speaker 3 = u3

Kcj = The center of dispersion of samples of the jth known speaker.
Uc3 = The center of dispersion of samples of the unknown speaker 3.

 

 

 

 

 

 

Figure 15 (d). 5 known speakers (composite of FFC and LTAS by normal
transmission) vs. unknown speaker 4 (composite of FFC and LTAS by
telephone transmission).

Known speaker 1 = 1
Known speaker 2 = 2
Known speaker 3 = 3
Known speaker 4 = 4
Known speaker 5 = 5
Unknown speaker 4 = u4

Kcj = The center of dispersion of samples of the jth known speaker.
Uc4 = The center of dispersion of samples of the unknown speaker 4.

 

Figure 15 (e). 5 known speakers (composite of FFC and LTAS by normal
transmission) vs. unknown speaker 5 (composite of FFC and LTAS by
telephone transmission).

Known speaker 1 = 1
Known speaker 2 = 2
Known speaker 3 = 3
Known speaker 4 = 4
Known speaker 5 = 5
Unknown speaker 5 = u5

Kcj = The center of dispersion of samples of the jth known speaker.
Uc5 = The center of dispersion of samples of the unknown speaker 5.

 

 

 

 

 

 

 

 

CHAPTER IV

DISCUSSIONS AND CONCLUSIONS

In the present study, the speech materials were produced by
10 male speakers, each speaker simultaneously recorded by three
different transmission and recording systems. Four types of speech
parameters were extracted from the text-independent materials to
represent the speakers, as unknown and known persons. Then these
parameters were studied in voice identification operations for
their effectiveness in eliminating the influence of the response
characteristics of the transmission and recording devices.

DISCUSSIONS

Analytical comparisons of the results obtained in this study
to those reported in the literature do not appear feasible or
meaningful due to the large variation in the types of phonetic
materials, size and type of the speaker population, methodology and
procedure employed, etc. Nonetheless, in order to facilitate some
reasonable interpretation of the results from this study, several
factors other than the distortion due to the transmission system
which could have been critically involved in the process are
discussed below.

On Speaker Population

Typically the number of speakers included in studies of voice
identification by computer is small. This is mainly due to
the huge amount of information present in a brief segment of speech,
which must be processed within the limited memory capacity and
computational speed of most computers. Clearly this constraint upon
the size of the speaker population makes it difficult to generalize
many study results. Doddington (1974) presented a computer simulation
of the expected error rate for speaker identification as a function of
population size. It was shown that the overall probability of an
incorrect decision is a monotonically increasing function of
population size. Some examples of expected error rate (for optimum
identification) with population size N were 0.01 for N=2, 0.025 for
N=5, 0.08 for N=10, 0.18 for N=20, and 0.5 for N=100. According to
Doddington's study, then, for a population of 10 speakers an error
rate of 0.08, or conversely a correct identification rate of 0.92
(92%), can be interpreted as optimum, or sufficient, for voice
identification by computer; but it seems to be far from the ultimate
goal for practical use.

In addition to the number of speakers employed, the
homogeneity of the speaker population is also known to considerably
affect the identification rate. On the issue of 'homogeneity' of
speaker population, Bogner (1981) presented a good example in
explaining the varying results of identification rates reported in
much of the literature:

    One possible 'explanation' is that this talker's speech
    is exceptionally similar to the average of the population
    of talkers. Let us denote talkers of this type as
    'type-X.' Assuming that a proportion 0.1 of the talker
    population is of type-X, we find that the expected number
    of such talkers in a sample of 10 is 1, with a standard
    deviation of 1. Thus, it would not be surprising to find
    some sets of 10 talkers, with no type-X talkers, and some
    with 2 or 3, causing the resultant error rates to differ
    greatly, by factors of more than 3.
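
Bogner's figures follow from a binomial model. As a hedged illustration (it assumes each of the 10 talkers is independently of type-X with probability 0.1; the function and variable names are hypothetical), the short sketch below computes the expected count, its standard deviation, and the chance of drawing no type-X talker or 2 to 3 of them.

```python
from math import comb, sqrt

def type_x_distribution(n=10, p=0.1):
    """Binomial probabilities P(k type-X talkers) for k = 0..n."""
    return [comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(n + 1)]

n, p = 10, 0.1
probs = type_x_distribution(n, p)

mean = n * p                          # expected number of type-X talkers
std = sqrt(n * p * (1 - p))           # about 0.95, i.e. roughly 1
p_none = probs[0]                     # no type-X talker in the sample
p_two_or_three = probs[2] + probs[3]  # 2 or 3 type-X talkers

print(f"mean={mean:.2f} std={std:.2f} "
      f"P(0)={p_none:.3f} P(2 or 3)={p_two_or_three:.3f}")
```

Under this assumption roughly one sample of 10 talkers in three contains no type-X talker, and about one in four contains 2 or 3, which is consistent with the large spread in reported error rates that the quotation describes.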

Raw data (from Appendix F) were inspected to check how the
speaker population in this study can be explained in Bogner's view.
The sample average of Fo, which yielded the largest F-ratio value,
was calculated over five FFC's for all the speakers. Based on this
feature alone, speakers 1, 3, 5, and 9 were found to be extremely
similar, as their Fo's were 109.8, 110.1, 111.1, and 111.7 Hz,
respectively. Visual inspection of Appendix G (11-15) also
indicated that these four speakers were frequently misidentified
among themselves. Speakers 2 and 10 were seldom misidentified with
any other speakers. Speaker 2 had the lowest Fo, whereas speaker
10 had the highest Fo.

Another interesting interpretation can be drawn from the
results obtained by the use of the LTAS speech parameter. Judging
from the 100% identification rate achieved under the
within-transmission voice identification, it appears that the
speakers in this study were less homogeneous when represented by
their long-term spectral characteristics than when represented by
glottal dynamic characteristics. In other words, given a group of
speakers, the spectral features have greater discriminability in
distinguishing speakers than glottal features do. In general, this
agrees with the results of Markel et al. (1977) and Hunt et al.
(1977).

On Effects of Feature Optimization

The major topic of this study was the 'elimination' of the
distortion of the frequency characteristics existing in the speech
samples of the unknown speakers and the known speakers who were
recorded through the different transmission systems. Two
approaches were tried. The first approach was attempted by
applying the IDS and the LTAS parameters whose features were
optimized (reduction and selection). The second approach was
carried out by means of selecting several time-varying fundamental
frequency related features forming the FFC parameter whose features
were also subjected to the optimization procedure (selection).
Since the design of this study was not intended to study the effect
of the optimization procedure per se, no substantiating grounds for
interpreting this effect are given in this study. Despite this
lack, indirect interpretation of this optimization effect was
attempted by referring to the resulting correct identification
rates as the measure of effectiveness. It became obvious that the
optimization procedure induced the 'elimination' effects to
different degrees according to each speech parameter in use.

First, the effect of the optimization upon the IDS was
undeterminable simply because there were no contrasting results to
be compared. Second, in view of the fact that all the original
features of the FFC were retained for the voice identification
operation, the optimization procedure applied to the FFC was not in
effect at all for the 'elimination'. Contrary to this, when only
five features with relatively high F-ratio values were used, the
FFC yielded very poor identification rates. Third, in view of the
fact that the LTAS resulted in a better identification rate than
the choral spectrum did, the feature optimization procedure was
probably in effect for 'elimination' of the distortion. An earlier
pilot study carried out with the LTAS which included all the
original 128 features (no optimization) also resulted in an
identification rate of only a chance level of 10-20%. A reasonable
speculation may be that the 10 features (frequency components in
Table 3) chosen for the LTAS were those which represented
speaker-dependent characteristics, leaving out other
characteristics related to linguistic content and to the
transmission and recording devices. This speculation, of course,
is difficult to verify but appears to be worthy of further
elaborate investigation.
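
Feature selection by F-ratio can be sketched as follows. This is only an illustration under a commonly used definition (variance of the speaker means divided by the average within-speaker variance), which may differ in detail from the formula applied in this study; the data and function names are hypothetical.

```python
from statistics import mean, pvariance

def f_ratio(values_by_speaker):
    """F-ratio of one feature: between-speaker over within-speaker variance.

    values_by_speaker holds one list of repeated measurements per speaker.
    Assumed definition; the within-speaker variance must be non-zero.
    """
    speaker_means = [mean(v) for v in values_by_speaker]
    between = pvariance(speaker_means)
    within = mean([pvariance(v) for v in values_by_speaker])
    return between / within

def select_features(data, k):
    """Return indices of the k features with the largest F-ratio.

    data[speaker][sample][feature] is a nested list of measurements.
    """
    n_features = len(data[0][0])
    ratios = []
    for f in range(n_features):
        per_speaker = [[sample[f] for sample in spk] for spk in data]
        ratios.append(f_ratio(per_speaker))
    ranked = sorted(range(n_features), key=lambda f: ratios[f], reverse=True)
    return ranked[:k]

# Hypothetical toy data: 3 speakers x 3 samples x 2 features.
# Feature 0 separates the speakers; feature 1 is mostly noise.
toy = [
    [[100.0, 5.0], [102.0, 1.0], [101.0, 3.0]],
    [[120.0, 2.0], [121.0, 4.0], [119.0, 6.0]],
    [[140.0, 3.0], [139.0, 5.0], [141.0, 1.0]],
]
best = select_features(toy, 1)
print(best)
```

A high F-ratio indicates that a feature varies more between speakers than within a speaker. It says nothing directly about robustness to the transmission channel, which is why the 'elimination' effect discussed above cannot be read off the F-ratio alone.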

On Composite Parameter

Besides the 'elimination' of the influence of the response
characteristics, another issue of concern in this study was the
discriminating power of the speech parameter used. The goal of
this study was to investigate speech parameters which are
relatively resistant to distortion and also highly effective in
identifying the speaker. The results indicated that the composite
of the LTAS and FFC was the best approach to such a goal. Probable
effects of the composite are suspected to be twofold: either the
number of features was simply increased by combining 10
from the LTAS and 9 from the FFC, resulting in a total of 19
features; or different types of speech characteristics were
integrated into one parameter containing both static spectral
information and dynamic glottal information.
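
The construction of a 19-feature composite can be sketched as a simple concatenation of the two feature vectors. In the hedged example below, the decision step is a plain nearest-centroid rule with Euclidean distance, used only as a stand-in; it is an assumption and not necessarily the minimum set distance rule applied in this study. All names and numbers are hypothetical.

```python
from math import dist

def composite(ltas_features, ffc_features):
    """Concatenate 10 LTAS features and 9 FFC features into one vector."""
    return list(ltas_features) + list(ffc_features)

def centroid(vectors):
    """Component-wise mean of a set of equal-length vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]

def identify(unknown_samples, known_samples_by_speaker):
    """Return the known speaker whose centroid is nearest the unknown's.

    Stand-in decision rule (assumption): Euclidean distance between centroids.
    """
    u = centroid(unknown_samples)
    return min(
        known_samples_by_speaker,
        key=lambda spk: dist(u, centroid(known_samples_by_speaker[spk])),
    )

# Hypothetical toy samples: two known speakers, two samples each.
known = {
    "speaker 1": [composite([1.0] * 10, [0.0] * 9),
                  composite([1.2] * 10, [0.1] * 9)],
    "speaker 2": [composite([0.0] * 10, [1.0] * 9),
                  composite([0.1] * 10, [1.1] * 9)],
}
unknown = [composite([0.9] * 10, [0.2] * 9)]

n_features = len(unknown[0])
decision = identify(unknown, known)
print(n_features, decision)
```

In practice the two feature groups have different units and ranges, so some scaling would be needed before distances are computed; that step is omitted here.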

With respect to the first notion, Hughes (1968) showed that
the error rate of identification tends to increase monotonically
beyond a certain optimum number of features, and suggested that
the optimum number of features was 5, 10, 20, and 100 or greater
for sample sizes of 20, 100, 500, and larger, respectively. A
similar result was also reported by Cheung and Eisenstein (1978).
Using 32 features (pitch, log energy, 10 partial correlation
coefficients, 10 cepstral coefficients, normalized absolute
prediction error energy, and 9 normalized autocorrelation
coefficients) extracted from text-independent speech data, they
plotted the identification error rate as a function of the number
of features. They concluded that the identification error rate
gradually decreases as the number of features increases, but it
starts to taper off at 6 features -- no further improvement is
gained by using more than 6. Many other researchers in
automatic speaker recognition appear to select a set of a certain
number of features according to prior knowledge about their
effectiveness without giving specific attention to the optimum
number of features.

Since not only the number but also the type of speech parameter
from which features are derived may interact with the results, both
theoretically and practically it does not seem feasible to
establish the optimum number of features. For these reasons, the
possible effect of the increased number of features in the
composite parameter upon the improvement of the identification rate
in this study remains unanswered.

The latter notion perhaps provides a more reasonable
interpretation of the possible effect for improving the
identification rate. As discussed earlier, one particular subgroup
of speakers was consistently misidentified (correct rate 50%) among
themselves when represented by one parameter (FFC), but another
subgroup of speakers was misidentified (correct rate 70%) when
represented by the other parameter (LTAS). In one particular
condition, the telephone vs. linear cross-transmission voice
identification operation, the correct rate of 100% (by minimum set
distance rule) was achieved when the above two parameters were made
into a single composite parameter.

Cheung and Eisenstein (1978) studied text-independent voice
identification and showed that if the speakers were represented by
the set of features from three different types of parameters
including pitch, energy, and spectral information, the
identification performance was much better than if the speakers
were represented by the set of features from only one of these
parameters. By using text-dependent speech samples, He and Dubes
(1982) also concluded that the combined feature set from the LPC
and pitch contour showed a similar trend. Although these
two studies were not concerned with the influence of the response
characteristics, the results may indicate the general efficacy of a
composite parameter of different types.

The poor identification rate obtained by the
composite of the IDS and the FFC parameters may be largely due to
the similar type of speech characteristics contained in both
parameters. Since IDS was derived by the time-normalization
formula reflecting the intensity variation of the feature
(frequency component) as a function of time -- hence partially
dynamic in nature, and FFC consisted mainly of the variation of
the fundamental frequency related features extracted from the time
domain speech -- also dynamic in nature, these two parameters can
be considered to share partially common characteristics.

More convincing supportive evidence for the usage of a composite
parameter of different types might be seen in the other methods of
voice identification, namely the spectrographic method and the aural
method. Typically in the spectrographic method, multiple speech
characteristics (parameters) are concurrently examined in a paired
set of spectrograms, one for the unknown speaker and the other for
the known speaker. These include mean frequencies and bandwidths
of vowel formants, gaps and type of vertical striation, slopes and
transition of formants, duration of similar phonetic elements and
plosive gaps, energy distortion of fricatives and plosives, and
interformant acoustic density patterns (Tosi et al., 1972).
Perceptual speech characteristics predominantly used by the human
examiner in the aural method are known to be clarity, roughness,
magnitude, and animation (Voiers, 1964), pitch, intensity, quality,
and rate (Holmgren, 1967), or quality, rhythm, melody pattern,
pitch, rate and respiratory group (Tosi, 1979). To date there is
no evidence alluding to the superiority of the perceptual method
over the computer method of identification, or vice versa.
Nonetheless, in practice, it is a general notion that the computer
method in the present state of the art would often fail to correctly
identify the speaker given text-independent speech materials
containing other undefined characteristics in addition to the
speaker-dependent characteristics. Under the same circumstances,
the human examiner can often recognize the speaker with relative
ease. In this sense, the ultimate goal of computer voice
identification seems to be attainable only by simulating the yet
unknown perceptual mechanisms of the trained examiner, who can
extract multiple speech parameters singly or in any combination,
depending upon the type of speech data at hand.

To sum up, considering the results from this study and reports
in the literature cited above, the main reason for the improved
identification rate by the composite parameter seems to be
the inclusion of two different types of speech
parameters, one type carrying static spectral features, the other
type carrying dynamic glottal features. Though much effort in the
study of automatic voice identification has been focused on the
search for the cardinal speech parameter(s), it is likely that the
identification system demands several types of parameters to fully
represent speakers under text-dependent or text-independent
conditions, with or without the influence of the transmission system.

On Influence of Pause Elimination

During the recording session, the speaker was allowed to read
the text material at his preferred reading rate. Consequently, the
varying rate of each individual speaker was reflected in the
resulting speech data in terms of the duration of the voiced frames
or in terms of the overall articulatory patterns. For example,
compressed speech (processed speech data from which pauses are
deleted) of a speaker who read at a relatively faster rate might
have contained more voiced frames per unit of time than that of a
speaker who read at a slower rate. Inevitably, the rate of voiced
frames included in the compressed speech appeared to have
interacted with each derived speech parameter to different
degrees and in different aspects.

One of the speech parameters, LTAS (long-term averaged
spectrum) is believed to have been affected to the minimum degree
by the varying rate of voiced frames. Because LTAS was a spectrum
of the averaged intensities of frequency components, it reflected
no dependency upon the time variation. Also, the duration of the
total voiced frames in the compressed speech of all the speakers
was assumed to be long enough to counterbalance the different
phonetic contents for each speaker.
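
The independence of LTAS from the time ordering of frames can be illustrated with a small sketch. It assumes that the LTAS is the frame-by-frame average of short-term power spectra; the frame length, the direct DFT, and the random test frames are hypothetical choices for illustration, not the analysis settings of this study.

```python
import cmath
import math
import random

def power_spectrum(frame):
    """Power spectrum of one frame via a direct DFT (bins 0..N/2)."""
    n = len(frame)
    spectrum = []
    for k in range(n // 2 + 1):
        coeff = sum(x * cmath.exp(-2j * cmath.pi * k * t / n)
                    for t, x in enumerate(frame))
        spectrum.append(abs(coeff) ** 2)
    return spectrum

def ltas(frames):
    """Long-term averaged spectrum: mean of the short-term power spectra."""
    spectra = [power_spectrum(f) for f in frames]
    return [sum(col) / len(spectra) for col in zip(*spectra)]

rng = random.Random(0)  # fixed seed so the toy frames are reproducible
frames = [[rng.uniform(-1.0, 1.0) for _ in range(8)] for _ in range(20)]
shuffled = frames[:]
rng.shuffle(shuffled)   # same frames, different temporal order

original_ltas = ltas(frames)
shuffled_ltas = ltas(shuffled)
same = all(math.isclose(a, b, rel_tol=1e-9, abs_tol=1e-9)
           for a, b in zip(original_ltas, shuffled_ltas))
print(same)
```

Because averaging discards the order of the frames, any parameter built this way cannot carry timing information, in contrast to parameters computed as a function of time.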

In contrast, because all the features in IDS and in FFC
(except Fo) were computed as a function of time (IDS, computed from
successive temporally varying short-term spectra, and FFC,
computed from the time-varying Fo contour), these two parameters are
believed to have been under the influence of the rate of voiced
frames. The influence seems to be double faceted: one is that
the rate of voiced frames successfully reflected speaker-dependent
characteristics as it was intended; another is that it was involved
as a confounding variable. The latter aspect of the influence
certainly leaves some room for further investigation.

On Interaction of the Experimenter

Certainly any kind of interaction by the experimenter should
be avoided, or at least minimized, if a completely objective or
automatic method of voice identification is desired. In this study
there were two spots which demanded the intensive participation of
the experimenter. One spot took place during the feature optimization
procedure, where some amount of experimenter's strategy was
inevitable in choosing the 10 features for the IDS and LTAS
parameters. This interaction during the feature selection appears
to be a drawback in view of the repeatability of the procedure.
Since it was found that the feature optimization was a crucial
procedure in the 'elimination' of the influence of the response
characteristics, the need for further study of interaction-free
algorithms for this feature optimization scheme is obvious.

Another spot of interaction took place during the interactive
peak detecting method applied for the measurement of 9 features of
the FFC parameter. This method was implemented to ascertain
accurate measurements of fine temporal variations of the
fundamental frequency. Since the compressed speech data included
many discontinuous points between successive voiced frames
(signals), the application of the interactive method was crucial in
order to prevent these discontinuities from resulting in erroneous
measurements. Fortunately, it became clear that as long as the
experimenter is familiar with the wave pattern of speech sound
depicting the recurrent peaks, this interactive method would result
in stable measurements from experiment to experiment. For this
reason, the interaction involved in the measurement procedure for
the FFC parameter does not appear to have contributed to the
resulting identification rate as a confounding factor.

CONCLUSIONS

This study was exploratory in nature regarding the methodology
applied and the types of problems dealt with. Despite the
application of a computer as the major computing source, there were
several stages where interactions by the author were required.
Within these limitations, the following general
conclusions were drawn from the results of this study.

1. Both IDS (intensity deviation spectra) and FFC (fundamental
frequency contour) are effective in eliminating the
influence of the response characteristics of the
transmission and recording channels. But their correct
identification rates were only moderate (50-60%).

2. LTAS (long-term averaged spectra) is susceptible to the
influence of the response characteristics, but even under
that influence, the correct identification rate was 60-70%.

3. The composite parameter of IDS and FFC is effective in
eliminating the influence of the response characteristics.
However, the correct identification rate is not improved,
i.e., it is only as good as each component.

4. The composite parameter of LTAS and FFC is the most
effective speech parameter in eliminating the influence of
the response characteristics of the transmission systems.
It achieved the highest possible correct identification rate
of 100%.

IMPLICATIONS FOR FURTHER RESEARCH

The major finding of this study was that the speech parameter
composed of the optimized features of LTAS and the derived features
of FFC can successfully eliminate the influence of the biased
frequency response characteristics of the transmission and
recording devices. In relation to this finding, further immediate
research topics focusing on the same problem investigated in this
study are suggested below.

1. The methodology used in this study could be replicated by
using one composite parameter of LTAS and FFC with an
increased speaker population of 50 or more.

2. Choral spectra could be investigated for their feasibility for the
feature optimization procedure. It is clear that choral
spectra and the LTAS possess equally useful properties in
distinguishing the speakers provided all the voices are
collected by the same transmission systems. The major
advantage of choral spectra over LTAS (long-term averaged
spectra) is a drastic reduction in the amount of time
required to generate them. If the feature optimization can be
made feasible with choral spectra, then they can form a
composite parameter with FFC.

3. The feature optimization procedure applied in this study could
be investigated for further improvement. Although it
was found that this procedure played a nontrivial role
in the elimination effects, it required some amount of
arbitrary interaction by the experimenter. Ideally, there
should be no interaction by the experimenter during
the feature optimization process. This aspect appears to be
worthy of serious investigation to make the computer method of
voice identification more objective.

4. The measure of the intra-speaker variability could
explicitly be taken into account to establish the basis for
assigning the probability of errors of identification.

5. Speech parameters other than IDS, LTAS, choral spectra, and
FFC could be investigated not as alternative parameters, but
as possible candidates to be included in the composite
parameter. One such candidate is the pause characteristics
(considered to be independent of the frequency response
characteristics) in the on-going speech of the speakers.

 

 

REFERENCES

Anderberg, M.R. Cluster Analysis for Applications, Academic Press,
New York, 1973.

Atal, B.S. 'Automatic recognition of speakers from their voices'
in Automatic Speech & Speaker Recognition, N. Rex Dixon and
Thomas B. Martin (ed.), IEEE Press, New York, 1978,
pp.349-364.

Atal, B.S. 'Effectiveness of linear prediction characteristics of
the speech wave for automatic speaker identification and
verification', J. Acoust. Soc. Amer., 1974, Vol. 55, pp.
1304-1312.

Atal, B.S. 'Automatic speaker recognition based on pitch contour',
J. Acoust. Soc. Amer., 1972, Vol. 52, pp. 1687-1697.

Bogner, R.E. 'On talker verification via orthogonal parameters',
IEEE Trans. Acoust., Speech, and Signal Processing, 1981,
Vol. ASSP-29, No. 1, pp.1-12.

Bricker, P.D. et al. 'Statistical techniques for talker
identification', Bell System Technical Journal, 1971, Vol.
50, pp.1427-1454.

Bunge, E. 'Automatic speaker recognition system AUROS for security
systems and forensic voice identification' in Automatic
Speech & Speaker Recognition, N. Rex Dixon and Thomas B.
Martin (ed.), IEEE Press, New York, 1978, pp. 414-420.

Cheung, R.S. and Eisenstein, B.A. 'Feature selection via dynamic
programming for text-independent speaker identification',
IEEE Trans. Acoust., Speech, and Signal Processing, Oct. 1978,
Vol. ASSP-26, No.5, pp.397-403.

Das, S.K. and Mohn, W.S. 'A scheme for speech processing in
automatic speaker verification', IEEE Trans. Audio
Electro-Acoust., 1971, Vol. AU-19, pp.32-43.

Doddington, G.R. 'A method of speaker verification', Paper
presented at The Eightieth Meeting of the Acoust. Soc. Amer.,
1970, Nov. 3-8, Houston, Texas.

Doddington, G.R. 'Speaker verification - Final report', Rome Air
Development Center, Griffiss AFB, N.Y., Tech. Rep., April,
1974, RADC 74-179.

Furui, S. 'Cepstrum analysis technique for automatic speaker
verification', IEEE Trans. Acoust., Speech, and Signal
Processing, April, 1981a, Vol. ASSP-29, No. 2, pp.254-272.

Furui, S. 'Comparison of speaker recognition methods using
statistical features and dynamic features', IEEE Trans.
Acoust., Speech, and Signal Processing, 1981b, Vol. ASSP-29,
No. 3, pp.342-350.

Furui, S., Itakura, F., and Saito, S. 'Talker recognition by
longtime averaged speech spectrum', Electronics and
Communications in Japan, 1972, 55-A, pp.54-61.

Gold, B. and Rabiner, L. 'Parallel processing techniques for
estimating pitch periods of speech in the time domain', J.
Acoust. Soc. Amer., 1969, Vol. 46, pp.442-448.

Hair, G.D. and Rekieta, T.W. 'Automatic speaker verification
using phoneme spectra', J. Acoust. Soc. Amer., 1972, Vol.
51, p.131(a).

Hair, G.D. and Rekieta, T.W. 'Mimic resistance of speaker
verification using phoneme spectra', J. Acoust. Soc. Amer.,
1972, Vol. 51, p.131(a).

He, Q., and Dubes, R. 'An Experiment in Chinese speaker
identification', presented at 1982 IEEE Int'l Conf. Trans.
Acoust., Speech, and Signal Processing.

Holmgren, G. 'Physical and psychological correlates of speaker
recognition', Journal of Speech and Hearing Research, 1967,
Vol. 10, pp.57-66.

Hughes, G.F. 'On the mean accuracy of statistical pattern
recognizers', IEEE Trans. on Information Theory, 1968, Vol.
IT-14, pp.55-63.

Hunt, M.J., Yates, J.W., and Briddle, J.S. 'Automatic speaker
recognition for use over communication channels', IEEE Int'l
Conf. Record on Acoust., Speech and Signal Process., May
9-11, 1977, pp.764-767.

Jain, A.K. and Dubes, R. 'Feature definition in pattern
recognition with small sample size', Pattern Recognition,
1978, Vol.10, pp.85-97.

Luck, J.E. 'Automatic speaker verification using cepstral
measurements', J. Acoust. Soc. Amer., 1969, Vol.46,
pp.1026-1032.

Majewski, Z.W., and Hollien, H. 'Cross correlation of long-term
speech spectra as a speaker identification technique',
Acustica, 1975, Vol. 34, pp.20-24.

Markel, J.D. 'The SIFT algorithm for fundamental frequency
estimation', IEEE Trans. Audio Electroacoust., 1977, Vol.
AU-20, pp.367-377.

Markel, J.D., and Davis, S.B. 'Text-independent speaker
recognition from a large linguistically unconstrained
time-spaced data base', IEEE Trans. Acoust., Speech, and
Signal Processing, Feb., 1977, Vol. ASSP-27, No.1, pp.74-82.

Markel, J.D., Oshika, B.T., and Gray, A.H. 'Long-term feature
averaging for speaker recognition', IEEE Trans. Acoust.,
Speech, and Signal Processing, 1977, Vol. ASSP-25,
pp.330-337.

The National Research Council, On the Theory and Practice of Voice
Identification, National Academy of Sciences, Washington D.C.,
1979.

Noll, A.M. 'Cepstrum pitch determination', J. Acoust. Soc.
Amer., 1967, Vol.41, pp.293-309.

Paul, J.E., Rabinowitz, A.S., Riganati, J.P., Richardson, J.M.
'Development of analytical methods for a semi-automatic
speaker identification system', 1975 Carnahan Conf. on Crime
Countermeasures, 1975, pp.52-64.

Pruzansky, S. 'Pattern-matching procedure for automatic talker
recognition', J. Acoust. Soc. Amer., 1963, Vol.35,
pp.354-358.

Rabiner, L.R., Levinson, S.E., Rosenberg, A.E., and Wilson, J.G.
'Speaker-independent recognition of isolated words using
clustering techniques', IEEE Trans. Acoust., Speech, and
Signal Processing, Aug., 1977, Vol. ASSP-27, No.4,
pp.336-349.

Sammon, J.W. Jr. 'A nonlinear mapping for data structure
analysis', IEEE Transactions on Computer, May 1969,
pp.401-409.

Shafer, R.L., and Rabiner, L. 'Digital representation of speech
signals', Proceedings of IEEE, 1975, Vol. 63, pp.662-667.

Tarnóczy, T. 'Determination du spectre de la parole avec une
methode nouvelle', Acustica, 1958, 8:392-395.

Tosi, O. Voice Identification: Theory and Legal Applications.
University Press, Baltimore, 1979.

Tosi, O. 'Pausometry: Measurement of a low level of acoustic
energy', in World Papers in Phonetics, Phonetics Society of
Japan, The Phonetic Society of Japan, Tokyo, 1974, pp.129-144.

Tosi, O., and Nakasone, H. 'Cancellation of the telephone response
curve', Paper presented at International Association for
Identification, Aug., 1982, Rochester, New York.

Tosi, O., Pisani, R., Dubes, R., and Jain, A. 'An objective method
of voice identification', Current Issues in the Phonetic
Sciences, Harry & Patricia Hollien (ed.), in series of Current
Issues in Linguistic Theory Vol. 9, in Amsterdam Studies in
the Theory and History of Linguistic Science IV,
Amsterdam-John Benjamins B.V., 1979, pp.851-861.

Tosi, O., et al. 'Experiment on voice identification', J. Acoust.
Soc. Amer., 1972, Vol. 51, pp.2030-2043.

Voiers, W. 'Perceptual bases of speaker identity', J. Acoust.
Soc. Amer., 1964, Vol. 36, pp.1065-1073.

 

APPENDIX A

A SAMPLE TEXT EXCERPT

[Sample text excerpt: the scanned page is not legible in this copy.]
99:. 9.9:. 9.. 95.8... 59.0 9.9.. .9995: 9.9.: 9 6:36.99... ..9 99.99 6:59.99
.99... 9:. 5..» 9.9... 8.6:. 9:. ....9 9.99 9: 9.999 999 97. ".969...9».9 9...: .3 9959..
59: 96: 5:. 9.999 .9 989 659...... 9:. .9 69:9: 9:. ..9 .99... 9:. ... 99.3.9 ....95...
98:. 9.93 999.99 99 .9 9:9... 9 69.. 9.25998 9.2... 996.6 2:398 .. .95.... 9.6
9.9... .299 89.5... 5:. ... .. 999.. 9.9999... 9:. 99989: 09.59 .99. 9:. 9.1:... 9399:
:93 99s 3:... .55 99:8 99:96:99 9....-.9... 999.899.... 9 ..9 99.8. 96.98 9.9:.
.9 99.5 .8595 9:91. ..9 8.9:. 9:. ..9 95.99 5...:— 998290 9:. 99.99: 999:. 99 9.9
9...! 9:. 59:9 ..9 99.99. 29.9.6 .9550 9.9.0 9:. .9 68:5... 99:... 99899:. <
9:99. 9.65.5 £6.55... 99.99 ..9 8989
.98.... 09538-52 999 o. 99-9: 55.8. 9:. .993. 9:. 3...... 5:... 9.5.5.5
9...»: :5. 9.. 9. 5996:... 9.2.9.. .6: 6:. 99.6.59. .99. .9 996999... .863:
9:. .3 599.99.... 9.9.9.6.. 99 999:. 1999.59... 8:19:59 69... ...: .959: 953 92.9:-
vuo 93. .9 990 99.9.5. 9.599 .9. 995.89 .....9.9.8 9.9.9 3.99.26 9.... .9 9.99..
.23.. 9:. .96 ..< 96.29.09» 5.9999 999.69.... 999:. 9.8.. :99... .9...— mg..oE .36
99:9... 99 99.9.39 99 9.9... 99.56.99 :9... ..9 :99 .....9 9.98. 9. 8.9.85 6.... 9:. 6:.
9: .. p.990 .52. 9.9.. 99:35 59: 96: 9999.. :999 ..9 6:. 2.9.9.6 £25.69 99
9.9 93:9 :82... 899.999.: 9:. ....9 9:99.. 999:. 99989: 9.99:9 9.: ..O 95.9 9.<
9:95 .9... ..9 99.9» .....9 5.9... ...6 .5399: p99 56...: 9.9 .9:
.999:- 9995 ..9 9.: 9.9.9. 6.9.99.3: ... .99 9: 3.99.. .8962... 4.9.. 9:... 6...: .26
..9 .9195. 9.99.... 9.. 96 999:. Sn .9... .9 99:99.99: .09.. 9:. .9. 9959.5 :9... 9. 9:9:
29.... 99. 9.9... 9.8.. 93...... 989.9 999 859...: 9:. 9. 56.. 59: 96: 3.99. 9:. 9.6
9... 9:. 39.9: 9:... .852. 9 39.. 9.6 99% 9:93 .99. 5.: 593.9: 59.5 9...: .9»;
9.6.9.90 9:. 9.9:! 99.9. 539. 9:. 9.... .99. 6 9.... 99.. 659.6 96. 9:. an 556:!
9.... ..9 99.9 9.. 9. 9.9:. .99.. .9: .9589... .9 9.96. 5999:. 95:598.. ....9 9.9
:9» .56.. .63 9:. ..9 5.69.7992... .98.. 89 3.99... 9:. 99) 6:! ... 9:9: .9 .6999:
9 9.29: ...9. 96: 6:. 9.9.9.. E6 :65 39.. 9 9.9 9.9.; 9...: 9.6.9 9.66.99 «9993.99:
..9 5.9 9.. 9.9.98 399. 9:. 1 5.18 9.8» 92...... 5.3.9: 9 p99 .. .96. 99 .9
.99: ..< :9: 5.99....- 92659 ..9 89°: 9:. 9.9 9.9:. .9: .95... ....99. 9: 9. 8.....9. 99
9.6 9.9:... 8998...: 95.8.1853 9. 99.8 99. £99.60 9:. 9.99.. 63...;—

Bmmmuxm EXMB MQQde d

ﬂ xHQmemﬂ

 

APPENDIX B

RESPONSE CHARACTERISTICS OF TEAC TAPE RECORDER
AND BRUEL & KJAER MICROPHONE

 


 

 

 
 

 

 

[Figure: frequency response charts (Bruel & Kjaer chart paper; potentiometer range: 50 dB; frequency axis in Hz) for the Teac tape recorder and the Bruel & Kjaer microphone. The curves are not recoverable from this copy.]

 

 

APPENDIX C

SUMMARY OF THE RESULTS FROM PAUSE ELIMINATION

[Table printed rotated; the numerical entries are not legible in this copy. Layout: one row per speaker (1 through 10), followed by mean and S.D. rows; for each of the three transmissions (telephone, normal, linear; TH = 20 msec) the columns include output (sec) and ratio.]

Note: The duration of the input speech sample was 2 minutes for all speakers. The ratio was computed by dividing output length (in sec) by input length (in sec). The TH value was kept constant at 20 msec. [Remainder of note not legible.]
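As far as the rotated note to this table can be read, the ratio is the output length divided by the input length, with a 2-minute input sample for every speaker. The following is a minimal Python sketch of that arithmetic only. The function name and the 90-second example are hypothetical and are not values taken from the table.

```python
INPUT_LENGTH_SEC = 120.0  # 2-minute input sample, as read from the table note


def pause_elimination_ratio(output_sec: float,
                            input_sec: float = INPUT_LENGTH_SEC) -> float:
    """Return output length divided by input length (both in seconds)."""
    if input_sec <= 0:
        raise ValueError("input length must be positive")
    return output_sec / input_sec


# Hypothetical example: 90 s of speech remain after pause elimination.
example_ratio = pause_elimination_ratio(90.0)
print(f"{example_ratio:.3f}")
```

A ratio near 1 would mean that little of the input was removed as pauses; a smaller ratio would mean that more was removed.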

APPENDIX D

COMPLETE-LINK DENDROGRAMS OF IDS AND LTAS FEATURES

 

APPENDIX D(1)

DENDROGRAM OF 100 FEATURES BASED ON [illegible] IDS.

[Figure: complete-link dendrogram; the plot is not recoverable from this copy.]

APPENDIX D(2)

DENDROGRAM OF 100 FEATURES BASED ON NORMAL IDS.

[Figure: complete-link dendrogram; the plot is not recoverable from this copy.]

APPENDIX D(3)

DENDROGRAM OF 100 FEATURES BASED ON LINEAR IDS.

[Figure: complete-link dendrogram; the plot is not recoverable from this copy.]

 

APPENDIX D(4)

DENDROGRAM OF 100 FEATURES BASED ON TELEPHONE LTAS.

[Figure: complete-link dendrogram; the plot is not recoverable from this copy.]

APPENDIX D(5)

DENDROGRAM OF 100 FEATURES BASED ON NORMAL LTAS.

[Figure: complete-link dendrogram; the plot is not recoverable from this copy.]

APPENDIX D (6)

DENDROGRAM OF 100 FEATURES BASED ON LINEAR LTAS.

[Figure: complete-link dendrogram; the plot is not recoverable from this copy.]
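The dendrograms in this appendix are described as complete-link. As an illustration of what that linkage rule does, here is a small pure-Python sketch. It is not the program used in the study: the function name, the input format, and the four example points are hypothetical, and the dissimilarity measure actually used between features is not stated in this appendix.

```python
from itertools import combinations


def complete_link_merges(dist, n):
    """Agglomerative clustering with complete linkage.

    dist maps frozenset({i, j}) to the dissimilarity between items i and j
    (0 <= i, j < n). Returns the merges in order, each as
    (members_a, members_b, merge_height).
    """
    clusters = {i: (i,) for i in range(n)}
    merges = []
    next_id = n
    while len(clusters) > 1:
        best = None
        for a, b in combinations(sorted(clusters), 2):
            # Complete link: cluster distance is the LARGEST pairwise distance.
            height = max(dist[frozenset((p, q))]
                         for p in clusters[a] for q in clusters[b])
            if best is None or height < best[0]:
                best = (height, a, b)
        height, a, b = best
        merges.append((clusters[a], clusters[b], height))
        clusters[next_id] = clusters.pop(a) + clusters.pop(b)
        next_id += 1
    return merges


# Hypothetical example: four items placed on a line at 0, 1, 4, and 6.
points = [0.0, 1.0, 4.0, 6.0]
example_dist = {frozenset((i, j)): abs(points[i] - points[j])
                for i, j in combinations(range(len(points)), 2)}
example_merges = complete_link_merges(example_dist, len(points))
for members_a, members_b, height in example_merges:
    print(members_a, members_b, height)
```

Each merge height corresponds to the level at which two branches join in a dendrogram. Under complete linkage, that height is the largest dissimilarity between any member of one branch and any member of the other.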

 

 

APPENDIX E

RAW DATA OF FFC FEATURES

APPENDIX E

RAW DATA OF FFC FEATURES: BY LINEAR TRANSMISSION

Speaker  Sample       F̄o      σFo    col.3    col.4    col.5  Fo(max)  Fo(min)  Fo(rng)      col.9
 number  number     (Hz)     (Hz)           (ratio)              (Hz)     (Hz)     (Hz)
      1       1   107.24    16.69     5.53     5.16    43.43   163.93    67.57    96.37    8761.86
      1       2   109.29    17.18     6.18     5.66    43.75   204.08    64.52   139.57    8643.10
      1       3   106.77    13.86     4.18     3.92    43.16   172.41    53.76   118.65    8951.75
      1       4   110.73    16.39     5.05     4.56    45.59   161.29    57.47   103.82    9001.60
      1       5   109.21    17.60     5.62     5.14    42.06   178.57    82.64    95.93    8603.13
      2       1    89.66    13.47     3.69     4.12    49.17   125.00    70.92    54.08    9151.75
      2       2    90.08    11.50     3.74     4.15    44.28   138.89    68.49    70.40    9495.82
      2       3    88.33    10.94     3.11     3.52    49.00   121.95    70.42    51.53    9320.30
      2       4    89.46    12.28     3.98     4.45    53.00   112.36    57.47    54.89   10542.20
      2       5    91.92    10.46     2.53     2.76    45.28   109.89    78.74    31.15    8993.58
      3       1   108.91    19.83     3.43     3.15    43.08   153.85    73.53    80.32    8752.73
      3       2   109.47    18.54     6.61     6.04    19.03   166.67    78.74    87.93    4783.08
      3       3   112.35    19.84     4.81     4.28    50.04   166.67    69.44    97.22    9384.75
      3       4   110.15    22.19     5.04     4.57    46.57   156.25    79.37    76.88    9170.75
      3       5   109.00    22.40     4.87     4.47    47.06   158.73    49.50   109.23    9280.34
      4       1   127.56    24.14     6.09     4.77    40.71   217.39    90.91   126.48    8228.55
      4       2   130.94    28.20     6.13     4.68    45.72   181.82    66.67   115.15    9009.06
      4       3   131.27    19.17     5.15     3.92    43.74   178.57   104.17    74.40    8677.66
      4       4   131.03    20.33     7.47     5.70   101.62   185.19    93.46    91.73    3744.43
      4       5   125.73    26.98     6.07     4.83    46.27   208.33    90.09   118.24    8950.38
      5       1   107.33    ??.14     5.91     5.50    46.77   153.85    45.05   108.80    9072.82
      5       2   110.56    18.51     5.86     5.30    50.61   151.52    86.21    65.31    9663.06
      5       3   111.06    22.03     5.69     5.13    48.33   151.52    53.76    97.75    9467.44
      5       4   111.69    23.96     7.56     6.77    46.65   370.37    74.63   295.74    8978.88
      5       5   109.81    21.11     6.09     5.54    43.14   175.44    44.25   131.19    8624.11
      6       1   165.30    33.61    10.77     6.51    20.24   277.78    83.33   194.44    5174.55
      6       2   165.14    31.63     8.45     5.12    41.26   227.27    86.21   141.07    8482.85
      6       3   164.52    40.46     9.25     5.62    44.99   277.78    98.04   179.74    8697.50
      6       4   173.73    39.26     9.94     5.72    40.25   263.16    88.50   174.66    8245.51
      6       5   160.64    42.05     9.12     5.68    44.97   270.27    71.43   198.84    8908.37
      7       1   116.62    24.00     4.53     3.88    47.22   181.82    80.00   101.82    8860.46
      7       2   117.27    28.89     4.72     4.02    48.08   185.19    66.67   118.52    9619.97
      7       3   113.69    19.98     4.18     3.68    43.95   153.85    86.96    66.89    8684.46
      7       4   115.08    20.08     4.13     3.59    45.07   151.52    80.65    70.87    8772.21
      7       5   115.15    22.72     4.26     3.70    45.31   166.67    88.50    78.17    8894.06
      8       1   118.43    33.15     6.38     5.38    47.12   232.56    80.65   151.91    8947.70
      8       2   117.41    27.95     6.57     5.60    46.17   181.82    85.47    96.35    8767.81
      8       3   119.79    36.01     4.70     3.92    44.83   192.31    84.03   103.27    8580.54
      8       4   119.46    32.73     5.73     4.79    41.50   192.31    80.65   111.66    8450.04
      8       5   115.22    21.98     5.07     4.40    45.26   161.29    84.75    76.54    8803.62
      9       1   115.00    28.89     4.70     4.09    53.14   185.19    75.76   109.43   10158.54
      9       2   110.27    25.38     4.82     4.37    47.78   185.19    74.07   111.11    9009.67
      9       3   108.88    18.64     3.90     3.58    50.73   151.52    79.37    72.15    9409.29
      9       4   112.15    24.50     4.27     3.80    42.30   166.67    89.29    77.38    8383.58
      9       5   107.63    20.12     3.60     3.35    48.00   156.25    78.12    78.12    9300.37
     10       1   146.24    43.08     7.24     4.95    44.10   256.41    99.01   157.40    8879.20
     10       2   148.26    46.44     8.46     5.71    43.23   294.12    68.97   225.15    8156.65
     10       3   151.89    40.30     6.64     4.37    40.76   238.10   104.17   133.93    7961.03
     10       4   158.83    48.06     8.54     5.38    42.78   344.83    94.34   250.49    8422.73
     10       5   156.70    45.91     7.94     5.07    41.93   357.14    94.34   262.80    8485.80

Note: the headings of columns 3, 4, 5, and 9 are not legible in this copy; column 4 carries the unit label "(ratio)". An entry shown with "??" is partly illegible.

APPENDIX E

RAW DATA OF FFC FEATURES: BY NORMAL TRANSMISSION

Speaker  Sample       F̄o      σFo    col.3    col.4    col.5  Fo(max)  Fo(min)  Fo(rng)      col.9
 number  number     (Hz)     (Hz)           (ratio)              (Hz)     (Hz)     (Hz)
      1       1   106.87    11.52     4.49     4.20    41.54   126.58    75.76    50.82    8612.94
      1       2   108.29    12.62     4.31     3.98    41.83   147.06    90.91    56.15    8371.87
      1       3   109.11    11.35     3.90     3.58    42.37   135.14    90.09    45.05    8640.17
      1       4   107.85    12.72     5.04     4.67    47.56   138.89    71.94    66.95    9021.36
      1       5   107.88    12.91     5.13     4.76    38.67   149.25    84.03    65.22    8045.95
      2       1    84.67    10.00     2.87     3.40    50.49    98.04    67.57    30.47    9486.71
      2       2    90.79    17.94     3.93     4.33    53.35   116.28    55.25    61.03   10147.71
      2       3    91.83    14.39     3.02     3.29    58.51   113.64    71.94    41.69   10640.63
      2       4    88.39    13.06     3.79     4.28    48.70   123.46    72.99    50.46    9866.24
      2       5    93.77    12.85     3.86     4.12    53.71   117.65    72.46    45.18    9906.37
      3       1   105.26    15.30     4.97     4.72    42.07   129.87    49.02    80.85    7994.99
      3       2   109.99    15.40     4.90     4.46    43.06   138.89    78.12    60.76    8594.69
      3       3   115.17    14.48     3.93     3.41    47.74   140.85    97.09    43.76    9141.12
      3       4   111.88    13.82     3.84     3.44    46.56   147.06    90.91    56.15    8908.32
      3       5   104.27    26.21     4.84     4.64    42.67   136.99    45.05    91.94    9057.27
      4       1   132.93    34.67     4.94     3.72    43.01   188.68   100.00    88.68    8491.22
      4       2   121.32    18.62     4.21     3.47    44.99   151.52    89.29    62.23    9071.54
      4       3   127.80    34.89     4.96     3.88    45.05   196.08    89.29   106.79    9001.69
      4       4   122.78    26.37     5.66     4.61    45.34   172.41    96.15    76.26    9241.12
      4       5   126.17    37.82     5.35     4.24    44.81   217.39    83.33   134.06    8633.23
      5       1   109.47    24.46     5.25     4.90    45.57   147.06    88.50    58.56    9136.10
      5       2   109.90    18.17     5.76     5.24    50.19   131.58    90.09    41.49    9117.34
      5       3   106.87    11.27     4.80     4.49    57.30   128.21    78.12    50.08   10205.92
      5       4   107.34    13.71     5.73     5.34    44.06   133.33    86.21    47.13    8772.67
      5       5   106.76    24.63     5.41     5.07    56.02   131.58    45.87    85.71   10280.46
      6       1   164.97    33.15     8.41     5.10    42.06   232.56   108.70   123.86    8170.47
      6       2   167.78    25.10     7.95     4.74    42.25   212.77   131.58    81.19    8447.16
      6       3   170.50    39.29     8.22     4.82    42.63   243.90   126.58   117.32    8434.58
      6       4   171.24    31.82     7.94     4.64    42.10   256.41   131.58   124.83    8257.78
      6       5   162.60    38.34     7.27     4.47    47.92   270.27   126.58   143.69    8852.94
      7       1   117.38    21.91     3.81     3.24    48.10   156.25    92.59    63.66    9380.95
      7       2   118.60    19.42     4.05     3.42    49.48   158.73    91.74    66.99    9309.45
      7       3   118.57    21.61     4.56     3.85    48.24   151.52    95.24    56.28    9451.14
      7       4   118.72    21.21     3.97     3.34    46.74   149.25    94.34    54.91    9089.74
      7       5   112.41    15.34     4.47     3.98    47.95   135.14    92.59    42.54    8823.71
      8       1   113.28    23.39     5.44     4.80    55.73   153.85    90.09    63.76   10916.22
      8       2   111.72    19.59     5.14     4.60    44.92   147.06    92.59    54.47    8686.57
      8       3   127.53    39.21     5.74     4.50    42.38   192.31    98.04    94.27    8182.26
      8       4   116.75    32.64     5.01     4.29    49.61   161.29    86.96    74.33    9662.65
      8       5   112.59    28.71     4.32     3.84    55.10   161.29    86.96    74.33   10539.89
      9       1   114.59    28.86     3.85     3.36    46.13   161.29    85.47    75.82    8442.39
      9       2   120.09    26.85     4.33     3.61    48.06   158.73    81.30    77.43    9319.74
      9       3   106.59    21.04     4.36     4.09    51.52   147.06    84.75    62.31   10265.07
      9       4   114.73    25.68     3.54     3.08    54.27   161.29    86.21    75.08    9850.66
      9       5   107.75    21.50     4.19     3.88    44.59   149.25    74.07    75.18    8694.71
     10       1   160.76    52.81     6.26     3.89    42.55   256.41   116.28   140.13    8352.17
     10       2   146.77    50.57     7.43     5.07    42.94   294.12   112.36   181.76    8050.48
     10       3   157.68    52.30     6.45     4.09    44.54   250.00   114.94   135.06    8914.43
     10       4   157.99    50.07     6.77     4.28    43.08   243.90   116.28   127.62    8617.10
     10       5   151.56    55.50     5.93     3.91    42.41   250.00    93.46   156.54    8536.33

Note: the headings of columns 3, 4, 5, and 9 are not legible in this copy; column 4 carries the unit label "(ratio)".

 

APPENDIX E

RAW DATA OF FFC FEATURES: BY TELEPHONE TRANSMISSION

Speaker  Sample       F̄o      σFo    col.3    col.4    col.5  Fo(max)  Fo(min)  Fo(rng)      col.9
 number  number     (Hz)     (Hz)           (ratio)              (Hz)     (Hz)     (Hz)
      1       1   107.41    21.46     5.29     4.93    45.14   136.99    49.26    97.73    9552.70
      1       2   109.07    12.96     4.51     4.13    49.42   138.89    74.07    64.81    9637.16
      1       3   110.94    19.38     6.87     6.19    45.18   151.52    59.88    91.63    9015.24
      1       4   107.75    18.81     5.53     5.14    43.04   147.06    44.64   102.42    8968.29
      1       5   108.75    13.72     5.79     5.33    44.14   140.85    84.75    56.10    8357.97
      2       1    87.89    14.65     3.74     4.25    48.81   123.46    40.98    82.47    9445.13
      2       2    91.62    14.45     4.93     5.38    50.22   135.14    61.35    73.79    9881.80
      2       3    90.07    13.22     4.12     4.58    50.46   114.94    68.97    45.98    9614.30
      2       4    87.95    13.60     4.01     4.56    53.07   121.95    44.84    77.11   10119.20
      2       5    90.48    17.98     4.47     4.94    55.89   142.86    64.10    78.75   10010.96
      3       1   106.51    25.01     4.77     4.48    51.77   147.06    45.45   101.60   10059.78
      3       2   110.28    18.47     4.32     3.91    46.83   149.25    78.74    70.51    8950.81
      3       3   113.42    18.17     4.74     4.18    41.28   166.67    88.50    78.17    8340.65
      3       4   107.50    23.61     4.69     4.36    48.25   147.06    46.95   100.11    9309.29
      3       5   112.63    21.00     4.45     3.95    48.63   151.52    75.76    75.76    9403.09
      4       1   126.64    26.89     5.74     4.53    44.00   188.68    79.37   109.31    8607.10
      4       2   128.27    30.66     6.88     5.36    48.98   192.31    65.79   126.52    9158.48
      4       3   129.49    31.27     6.18     4.77    48.59   188.68    85.47   103.21    9666.28
      4       4   128.04    26.58     5.64     4.40    39.82   188.68    78.74   109.94    8031.53
      5       1   109.01    23.13     5.24     4.80    42.24   172.41    77.52    94.89    8524.42
      5       2   111.30    19.18     4.81     4.32    46.51   172.41    81.30    91.11    8980.70
      5       4   110.29    27.92     6.75     6.12    50.52   166.67    47.85   118.82    9532.09
      5       5   112.44    24.89     5.88     5.23    52.23   158.73    47.39   111.34   10119.64
      6       2   165.37    37.07     8.73     5.28    42.66   285.71    98.04   187.68    8232.82
      6       3   169.01    41.91     9.86     5.84    45.49   277.78    90.09   187.69    8986.43
      6       4   168.65    39.11    10.43     6.18    40.62   263.16    90.09   173.07    8412.08
      6       5   161.90    38.33     9.36     5.78    43.60   270.27    64.94   205.34    8372.09
      7       1   115.97    24.94     4.52     3.90    45.13   192.31    88.50   103.81    8982.04
      7       2   118.82    26.35     4.84     4.08    47.23   185.19    89.29    95.90    8893.48
      7       3   115.74    24.03     5.58     4.82    45.87   158.73    75.76    82.97    9061.41
      7       4   120.14    27.20     5.65     4.70    48.72   178.57    91.74    86.83    9427.97
      7       5   116.20    21.59     4.61     3.97    48.05   172.41    89.29    83.13    9538.86
      8       1   120.21    37.86     5.95     4.95    47.92   212.77    91.74   121.02    9418.07
      8       2   117.39    35.26     5.51     4.69    48.07   200.00    52.63   147.37    8961.88
      8       3   120.33    35.35     5.78     4.81    46.55   188.68    73.53   115.15    8968.28
      8       4   120.33    38.26     6.34     5.27    47.73   250.00    75.76   174.24    9348.98
      8       5   114.78    34.96     6.26     5.45    47.14   188.68    44.44   144.23    9310.05
      9       1   112.60    27.63     4.66     4.14    50.84   169.49    84.03    85.46    8916.34
      9       2   112.49    25.69     5.15     4.58    49.98   169.49    78.74    90.75    9965.02
      9       3   111.21    22.89     4.77     4.29    46.76   185.19    86.21    98.98    9126.73
      9       4   114.17    24.45     4.22     3.70    48.39   161.29    89.29    72.00    9202.25
      9       5   108.19    20.40     4.59     4.24    52.75   161.29    78.12    83.17   10054.63
     10       1   151.49    42.67     7.30     4.82    45.72   263.16   112.36   150.80    9392.55
     10       2   150.60    46.24     6.64     4.41    47.69   277.78   114.94   162.84    9158.49
     10       3   156.42    45.91     8.83     5.65    51.99   344.83   107.53   237.30   10142.53
     10       4   156.95    40.71     7.16     4.56    45.83   263.16   119.05   144.11    9048.48
     10       5   161.80    37.00     7.27     4.49    46.02   232.56   117.65   114.91    9215.12

Note: the headings of columns 3, 4, 5, and 9 are not legible in this copy; column 4 carries the unit label "(ratio)". Rows for speaker 4 sample 5, speaker 5 sample 3, and speaker 6 sample 1 do not appear in this copy.
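Two arithmetic relations appear to hold among the tabulated FFC values: the fourth numeric column looks like 100 times the third column divided by the mean Fo, and Fo(rng) looks like Fo(max) minus Fo(min). The Python sketch below checks both on one row of the telephone table (speaker 1, sample 2). The column meanings are inferred from the numbers and are an assumption, and the function and variable names are hypothetical.

```python
def derived_columns(mean_f0, col3, f0_max, f0_min):
    """Recompute the two columns that appear to be derived from the others."""
    ratio = 100.0 * col3 / mean_f0   # assumed: column 4 = 100 * column 3 / mean Fo
    f0_range = f0_max - f0_min       # assumed: Fo(rng) = Fo(max) - Fo(min)
    return round(ratio, 2), round(f0_range, 2)


# Telephone transmission, speaker 1, sample 2, as tabulated.
row = {
    "mean_f0": 109.07, "col3": 4.51, "col4": 4.13,
    "f0_max": 138.89, "f0_min": 74.07, "f0_rng": 64.81,
}

ratio, f0_range = derived_columns(
    row["mean_f0"], row["col3"], row["f0_max"], row["f0_min"]
)
ratio_diff = abs(ratio - row["col4"])
range_diff = abs(f0_range - row["f0_rng"])
print(ratio, f0_range, ratio_diff <= 0.02, range_diff <= 0.02)
```

A tolerance of 0.02 is used because the tabulated values are rounded to two decimals. A row that fails such a check is more likely a misprint or scanning error than a real measurement difference, but that cannot be decided from the table alone.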

 

 

 

APPENDIX F

A SAMPLE OUTPUT: PARTIAL COMPUTER PRINT-OUT
OF VOICE IDENTIFICATION OPERATION

(Cross-transmission by the composite of FFC and LTAS)

 

 

[Computer print-out; the listing is not legible in this copy.]

 

 

 

 

 

 

 

APPENDIX G

VOICE IDENTIFICATION RESULTS: OPERATIONS 1 THROUGH 24

 

 

 

 

APPENDIX G(1)
Unknown speakers by telephone IDS; known speakers by normal IDS.

Unknown Known Identification decision
Speaker Sample by the nearest-neighbor by the minimum set
(j) (i) Dist. with(i,j) Result Dist. with(j) Result
1 1 1.7777 (3,9) wrong
2 1.5292 (4.1) correct
3 1.8175 (5,9) wrong 2.3586 (1) correct
4 1.0640 (3,5) wrong
5 1.6315 (3,5) wrong
2 1 1.8673 (1,2) correct
2 2.1925 (3,2) correct
3 1.9993 (3,2) correct 3.4640 (2) correct
4 1.6759 (3,2) correct
5 1.6037 (3,2) correct
3 1 1.4836 (1,5) wrong
2 1.2609 (1,5) wrong
3 1.1477 (1,5) wrong 2.5344 (1) wrong
4 1.5970 (1,1) wrong
5 2.0373 (5,3) correct
4 1 1.4600 (1.4) correct
2 1.8592 (1,4) correct
3 1.6361 (2,4) correct 2.2995 (4) correct
4 2.6929 (5,8) wrong
5 1.9339 (1,5) wrong
5 1 1.5335 (2,3) wrong
2 1.9991 (4,8) wrong
3 1.5074 (5.1) wrong 2.0586 (5) correct
4 1.3402 (2,3) wrong
5 1.2415 (5.5) correct
6 1 1.5413 (1.6) correct
2 2.2373 (5,1) wrong
3 2.2056 (1.8) wrong 2.8626 (6) correct
4 2.1080 (1.6) correct
5 1.5771 (1,10) wrong
7 1 1.7083 (3,5) wrong
2 1.6675 (4,4) wrong
3 1.9032 (4,7) correct 2.6385 (7) correct
4 1.6261 (5,7) correct
5 1.6496 (3,7) correct
8 1 1.8931 (5,8) correct
2 0.9828 (3,8) correct
3 1.8621 (3.5) wrong 2.8592 (4) wrong
4 1.8286 (4,5) wrong
5 1.7547 (3.8) correct
9 1 2.0119 (5,7) wrong
2 1.7101 (2,9) correct
3 2.4109 (3,7) wrong 2.5586 (1) wrong
4 1.2693 (5.9) correct
5 1.7384 (4,3) wrong
10 1 1.7599 (3.10) correct
2 1.3788 (4.10) correct
3 1.9221 (4,10) correct 2.3297 (10) correct
4 1.7877 (5,10) correct
5 1.9251 (5,10) correct
i = sample index    Number of correct I.D.: 26 (nearest-neighbor), 7 (minimum set)
j = speaker index   Rate: 52% (nearest-neighbor), 70% (minimum set)
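
The "Result" entries and the summary counts in the table above can be cross-checked mechanically. The following is a minimal sketch, not the program used in the study. It assumes that a decision is scored correct when the chosen speaker index j equals the unknown speaker, which is consistent with the tabulated results. The names nearest_j, min_set_j, and score are illustrative only; the index values are transcribed from Appendix G(1).

```python
# Speaker index j of the nearest known sample for each of the five
# unknown samples, keyed by unknown speaker (transcribed from Appendix G(1)).
nearest_j = {
    1: [9, 1, 9, 5, 5],
    2: [2, 2, 2, 2, 2],
    3: [5, 5, 5, 1, 3],
    4: [4, 4, 4, 8, 5],
    5: [3, 8, 1, 3, 5],
    6: [6, 1, 8, 6, 10],
    7: [5, 4, 7, 7, 7],
    8: [8, 8, 5, 5, 8],
    9: [7, 9, 7, 9, 3],
    10: [10, 10, 10, 10, 10],
}

# Speaker chosen by the minimum-set decision, one per unknown speaker.
min_set_j = {1: 1, 2: 2, 3: 1, 4: 4, 5: 5, 6: 6, 7: 7, 8: 4, 9: 1, 10: 10}


def score(decisions):
    """Return (number correct, number of trials, percent correct).

    Assumption: a decision is correct when the chosen speaker index
    equals the unknown speaker index.
    """
    pairs = []
    for speaker, chosen in decisions.items():
        chosen_list = chosen if isinstance(chosen, list) else [chosen]
        pairs.extend((speaker, j) for j in chosen_list)
    correct = sum(1 for speaker, j in pairs if speaker == j)
    return correct, len(pairs), 100.0 * correct / len(pairs)


print(score(nearest_j))  # (26, 50, 52.0)
print(score(min_set_j))  # (7, 10, 70.0)
```

Both results agree with the counts and rates printed at the foot of the table. The same check can be applied to the other operations by transcribing their index columns.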

 

 

 

 

 

 

 

APPENDIX G(2)
Unknown speakers by telephone IDS; known speakers by linear IDS.

Unknown Known Identification decision
Speaker Sample by the nearest-neighbor by the minimum set
(j) (i) Dist. with(i,j) Result Dist. with(j) Result
1 1 1.6555 (4,3) wrong
2 2.1921 (2,3) wrong
3 1.3730 (2,1) correct 2.7161 (1) correct
4 1.7904 (5,9) wrong
5 1.7447 (2,4) wrong
2 1 1.3595 (5,2) correct
2 1.6832 (1,2) correct
3 2.1007 (4.2) correct 2.6780 (2) correct
4 2.3101 (4,2) correct
5 1.9775 (4,2) correct
3 1 2.1425 (5,1) wrong
2 1.8025 (4,5) wrong
3 1.6916 (5,3) correct 2.4885 (3) correct
4 1.9905 (2,1) wrong
5 2.3532 (5,3) correct
4 1 1.4871 (1,4) correct
2 1.7267 (2,4) correct
3 1.9639 (5,7) wrong 2.6164 (4) correct
4 2.2227 (3,5) wrong
5 1.0971 (3,5) wrong
5 1 1.5373 (1,3) wrong
2 2.1961 (5,8) wrong
3 1.6682 (4.3) wrong 2.3964 (3) wrong
4 1.6418 (4,9) wrong
5 1.7190 (3,9) wrong
6 1 1.7451 (1,6) correct
2 1.6678 (4,10) wrong
3 1.3089 (1,6) correct 2.7952 (3) wrong
4 1.9476 (4,8) wrong
5 1.9751 (2,10) wrong
7 1 1.4434 (5,7) correct
2 1.6036 (2,7) correct
3 1.4319 (3,7) correct 3.0197 (7) correct
4 1.3947 (2,7) correct
5 1.8256 (4,7) correct
8 1 1.6241 (1,4) wrong
2 1.7658 (3,6) wrong
3 1.2124 (3,8) correct 2.7587 (4) wrong
4 2.1096 (5,8) correct
5 1.2452 (1,4) wrong
9 1 2.1398 (2,9) correct
2 1.7909 (5,1) wrong
3 2.3345 (5,9) correct 2.3591 (1) wrong
4 1.3858 (5,1) wrong
5 1.9867 (5,9) correct
10 1 1.4806 (5,10) correct
2 1.9687 (1,10) correct
3 1.7139 (2,10) correct 2.4750 (10) correct
4 1.5606 (5,10) correct
5 1.8269 (5,10) correct
i = sample index    Number of correct I.D.: 28 (nearest-neighbor), 6 (minimum set)
j = speaker index   Rate: 56% (nearest-neighbor), 60% (minimum set)

 

APPENDIX G(3)
Unknown speakers by telephone LTAS; known speakers by normal LTAS.

Unknown Known Identification decision
Speaker Sample by the nearest-neighbor by the minimum set
(j) (i) Dist. with(i,j) Result Dist. with(j) Result
1 1 2.3676 (3,6) wrong
2 2.4635 (1,1) correct
3 2.2858 (5,5) wrong 3.0963 (6) wrong
4 2.2326 (1,1) correct
5 2.6622 (1,5) wrong
2 1 3.5451 (2,7) wrong
2 2.2639 (4,2) correct
3 3.5966 (3,2) correct 4.0967 (2) correct
4 3.1266 (4,2) correct
5 3.1417 (4,2) correct
3 1 1.9319 (3,3) correct
2 1.7198 (3,9) wrong
3 2.3838 (3,3) correct 2.4546 (3) correct
4 2.3213 (4,3) correct
5 2.2562 (3,4) wrong
4 1 2.5638 (1,4) correct
3 2.3327 (1,4) correct 2.9174 (4) correct
4 2.4394 (1,4) correct
5 2.8335 (4,5) wrong
5 1 1.3051 (1,5) correct
2 2.1310 (2,5) correct
3 2.1985 (1,5) correct 2.5603 (5) correct
4 2.2406 (4,7) wrong
5 2.2180 (5,5) correct
6 1 2.2113 (1.6) correct
2 1.7672 (1.5) correct
3 1.4344 (1,6) correct 2.5684 (6) correct
4 2.2084 (1,6) correct
5 2.3095 (1,6) correct
7 1 1.8280 (3,3) wrong
2 2.0575 (4,7) correct
3 2.1566 (3,9) wrong 2.5092 (3) wrong
4 2.2262 (4.7) correct
5 1.5700 (5,5) wrong
8 1 1.1033 (2,8) correct
2 1.2544 (2,8) correct
3 1.3869 (4,8) correct 2.0834 (8) correct
4 1.2812 (4,8) correct
5 1.3601 (1,8) correct
9 1 2.4932 (3.10) wrong
2 1.9309 (5,9) correct
3 1.9735 (5.9) correct 2.8942 (9) correct
4 2.7607 (3.4) wrong
5 1.9386 (4,9) correct
10 1 2.5930 (4.8) wrong
2 1.4026 (1,10) correct
3 2.1133 (1,10) correct 2.6961 (8) wrong
4 2.2182 (4.10) correct
5 2.0961 (1,8) wrong
i = sample index    Number of correct I.D.: 35 (nearest-neighbor), 7 (minimum set)
j = speaker index   Rate: 70% (nearest-neighbor), 70% (minimum set)

 

 

 

 

 

 

 

APPENDIX G(4)
Unknown speakers by telephone LTAS; known speakers by linear LTAS.

Unknown Known Identification decision
Speaker Sample by the nearest-neighbor by the minimum set
(j) (i) Dist. with(i,j) Result Dist. with(j) Result
1 1 2.3288 (1,5) wrong
2 2.7100 (1,7) wrong
3 2.4538 (5,5) wrong 3.1008 (1) correct
4 2.7360 (3,1) correct
5 2.5760 (1,5) wrong
2 1 3.2513 (1,7) wrong
2 2.2536 (4,2) correct
3 4.4801 (2,2) correct 3.6387 (7) wrong
4 3.6024 (4,2) correct
5 3.3782 (4,2) correct
3 1 1.5379 (3,3) correct
2 2.0567 (2,3) correct
3 2.3577 (3,3) correct 2.0171 (3) correct
4 1.7694 (3,3) correct
5 1.5903 (2,3) correct
4 1 2.3857 (4,7) wrong
2 2.3144 (1,4) correct
3 2.0629 (4,7) wrong 3.1618 (4) correct
4 2.1618 (3,5) wrong
5 2.2359 (4,5) wrong
5 1 1.3720 (1,5) correct
2 1.7971 (4.5) correct
3 1.9521 (5,5) correct 2.5686 (5) correct
4 1.9209 (4,5) correct
5 2.0852 (5,5) correct
6 1 2.1020 (2,6) correct
2 2.0708 (2,6) correct
3 1.5473 (1,6) correct 2.4420 (6) correct
4 1.3797 (2,6) correct
5 1.9305 (2.6) correct
7 1 2.1016 (1,3) wrong
2 2.4252 (5,7) correct
3 2.6531 (4,7) correct 2.6905 (3) wrong
4 2.8138 (4.3) wrong
5 2.8822 (5.7) correct
8 1 2.0032 (3,8) correct
2 1.7706 (3,8) correct
3 2.2102 (3,8) correct 2.4422 (8) correct
4 2.1054 (3.8) correct
5 1.7313 (5.8) correct
9 1 1.9003 (1,9) correct
2 1.7539 (3.9) correct
3 1.7371 (3.9) correct 2.5414 (9) correct
4 2.2954 (2,6) wrong
5 2.2359 (3,9) correct
10 1 2.2030 (4,8) wrong
2 1.9102 (3,8) wrong
3 2.3798 (4,8) wrong 3.3879 (8) wrong
4 2.0649 (3,8) wrong
5 1.9134 (4,8) wrong
i = sample index    Number of correct I.D.: 33 (nearest-neighbor), 7 (minimum set)
j = speaker index   Rate: 66% (nearest-neighbor), 70% (minimum set)

 

 

 

 

 

APPENDIX G(5)
Both unknown and known speakers by telephone IDS.

Unknown Known Identification decision
Speaker Sample by the nearest-neighbor by the minimum set
(j) (i) Dist. with(i,j) Result Dist. with(j) Result
1 1 1.6114 (5,8) wrong
2 1.5670 (1,3) wrong
3 1.6030 (4,9) wrong 2.3475 (9) wrong
4 1.3874 (5,9) wrong
5 2.6277 (2,7) wrong
2 1 3.1701 (2,7) wrong
2 2.4383 (3,2) correct
3 2.1099 (4,2) correct 3.1701 (2) correct
4 1.9481 (5,2) correct
5 1.9378 (4,2) correct
3 1 1.5671 (2,1) wrong
2 1.5846 (3.3) correct
3 1.4708 (2.3) correct 2.4615 (3) correct
4 1.9546 (5,9) wrong
5 131956 (3,3) correct
4 1 1.3381 (3,4) correct
2 1.7394 (1,4) correct
3 1.3125 (1,4) correct 2.3490 (8) wrong
4 2.2637 (5,4) correct
5 1.7677 (3,5) wrong
5 1 0.9872 (4,5) correct
2 1.9600 (2,4) wrong
3 1.4251 (1,5) correct 2.4753 (1) wrong
4 0.9874 (1.5) correct
5 1.6024 (4,5) correct
6 1 1.7376 (4.6) correct
2 1.8982 (1,1) wrong
3 1.8080 (4,6) correct 2.5765 (10) wrong
4 1.8392 (1,6) correct
5 2.3214 (1,10) wrong
7 1 1.9659 (4,1) wrong
2 1.6277 (5,1) wrong
3 1.8614 (4,7) correct 2.4696 (7) correct
4 1.7286 (2,7) correct
5 1.7756 (5,9) wrong
8 1 1.9961 (1,4) wrong
2 1.7430 (5,8) correct
3 1.8135 (1,4) wrong 2.3234 (8) correct
4 1.3860 (1,4) wrong
5 1.6114 (1,1) wrong
9 1 1.8976 (3.9) correct
2 1.6164 (4,9) correct
3 1.8860 (1,9) correct 2.3502 (9) correct
4 1.5055 (4,5) wrong
5 1.3874 (4,1) wrong
10 1 1.7663 (4,10) correct
2 2.0853 (3.10) correct
3 1.9005 (2,6) wrong 2.3203 (10) correct
4 1.6498 (5.10) correct
5 1.0461 (4.10) correct
i = sample index    Number of correct I.D.: 29 (nearest-neighbor), 6 (minimum set)
j = speaker index   Rate: 58% (nearest-neighbor), 60% (minimum set)

 

 

 

 

APPENDIX G(6)
Both unknown and known speakers by normal IDS.

Unknown Known Identification decision
Speaker Sample by the nearest-neighbor by the minimum set
(j) (i) Dist. with(i,j) Result Dist. with(j) Result
1 1 1.5889 (5.1) correct
2 1.7836 (1.5) wrong
3 1.4125 (4,1) correct
4 1.5359 (3,1) correct
5 1.7123 (1,1) correct 2.4946 (3) wrong
2 1 2.3320 (5,2) correct
2 2.6436 (3,2) correct
3 2.0921 (5,2) correct
4 2.5106 (3,2) correct
5 2.2155 (3,2) correct 3.1067 (2) correct
3 1 1.3037 (2,5) wrong
2 1.2637 (1,5) wrong
3 1.7365 (1,3) correct
4 1.7695 (1,1) wrong
5 1.8087 (2,1) wrong 2.4236 (4) wrong
4 1 1.8115 (2.5) wrong
2 1.6386 (3,4) correct
3 1.7621 (2.4) correct
4 2.0575 (3.4) correct
5 2.1941 (3,5) wrong 2.3642 (4) correct
5 1 1.2637 (2,3) wrong
2 1.3037 (1,3) wrong
3 1.6134 (1.9) wrong
4 1.7977 (2,5) correct
5 1.7198 (1.5) correct 1.0858 (5) correct
6 1 2.4533 (2,6) correct
2 2.3671 (2,8) wrong
3 2.5581 (4,6) correct
4 2.5285 (5,6) correct
5 2.6519 (4,6) correct 2.8170 (6) correct
7 1 2.0259 (3,7) correct
2 2.5177 (3,7) correct
3 2.0802 (1,7) correct
4 2.0575 (5,7) correct
5 2.1809 (4,7) correct 3.1426 (7) correct
8 1 2.4965 (2,8) correct
2 1.8168 (4,3) wrong
3 1.8904 (3,9) wrong
4 2.2565 (3,4) wrong
5 2.1575 (3,3) wrong 3.2555 (9) wrong
9 1 1.6134 (3.5) wrong
2 2.2370 (3,9) correct
3 1.7958 (3.1) wrong
4 2.0560 (5,9) correct
5 2.0016 (4.9) correct 2.3946 (9) correct
10 1 1.8376 (5,10) correct
2 1.9327 (4,10) correct
3 2.5047 (2,6) wrong
4 1.9870 (2,10) correct
5 2.0426 (1.10) correct 2.6479 (10) correct
i = sample index    Number of correct I.D.: 32 (nearest-neighbor), 7 (minimum set)
j = speaker index   Rate: 64% (nearest-neighbor), 70% (minimum set)

 

 

 

 

APPENDIX G(7)
Both unknown and known speakers by linear IDS.

Unknown Known Identification decision
Speaker Sample by the nearest-neighbor by the minimum set
(j) (i) Dist. with(i,j) Result Dist. with(j) Result
1 1 2.0848 (1,4) wrong
2 2.2532 (5,1) correct
3 1.8320 (1,3) wrong
4 2.2975 (3,1) correct
5 2.0145 (2,9) wrong 2.8420 (1) correct
2 1 1.8041 (3,2) correct
2 1.6173 (3,2) correct
3 1.6716 (2,2) correct
4 1.8506 (2,2) correct
5 2.6898 (4,2) correct 2.8871 (2) correct
3 1 1.8320 (3,1) wrong
2 2.1846 (1,3) correct
3 1.9999 (1,3) correct
4 1.9483 (1,5) wrong
5 2.2215 (5,5) wrong 2.5094 (1) wrong
4 1 1.8429 (3.5) wrong
2 1.6622 (2,5) wrong
3 1.2410 (4,4) correct
4 1.4460 (3,4) correct
5 2.0644 (3,4) correct 2.4883 (4) correct
5 1 1.8700 (1,3) wrong
2 1.6622 (2,4) wrong
3 1.8429 (1.4) wrong
4 1.7624 (3,9) wrong
5 1.9707 (4,5) correct 2.4894 (5) correct
6 1 2.3903 (4,6) correct
2 2.1771 (5.6) correct
3 2.2580 (1,8) wrong
4 2.0699 (5,6) correct
5 2.0536 (2,6) correct 2.7320 (6) correct
7 1 2.1139 (3,7) correct
2 1.6166 (5,7) correct
3 2.2373 (1,7) correct
4 2.4234 (2,7) correct
5 1.8216 (2,7) correct 2.7158 (5) wrong
8 1 1.8702 (3,8) correct
2 2.0453 (4,8) correct
3 1.8159 (1.8) correct
4 1.8968 (5,8) correct
5 1.6918 (4,8) correct 2.3813 (8) correct
9 1 2.8487 (5,9) correct
2 1.9816 (4,9) correct
3 1.6758 (3,4) wrong
4 1.8581 (2,9) correct
5 1.7386 (2,4) wrong 2.6370 (4) wrong
10 1 2.0094 (5.10) correct
2 1.6545 (4,10) correct
3 1.7508 (5,10) correct
4 1.7779 (2,10) correct
5 1.6576 (2,10) correct 2.0094 (10) correct
Correct rate: 70% (nearest-neighbor), 70% (minimum set)

i - sample index
j - speaker index

 

APPENDIX G(8)
Both unknown and known speakers by telephone LTAS.

Unknown Known Identification decision
Speaker Sample by the nearest-neighbor by the minimum set
(j) (i) Dist. with(i,j) Result Dist. with(j) Result
1 1 1.6922 (5,2) correct
2 1.2325 (3,1) correct
3 1.2868 (2,1) correct
4 2.4086 (2,1) correct
5 1.5688 (1,1) correct 2.8117 (1) correct
2 1 2.6566 (4,2) correct
2 2.0052 (5,2) correct
3 2.5504 (5,2) correct
4 1.6415 (5,2) correct
5 1.8465 (4,2) correct 2.8150 (2) correct
3 1 1.2647 (4,3) correct
2 2.0079 (5,3) correct
3 1.8817 (4,3) correct
4 1.1413 (1,3) correct
5 1.5908 (1,3) correct 2.2982 (3) correct
4 1 0.6117 (3,4) correct
2 1.9871 (4,4) correct
3 0.6660 (1,4) correct
4 1.4275 (3,4) correct
5 1.9974 (4,4) correct 1.9974 (4) correct
5 1 1.9415 (2,5) correct
2 1.7755 (4,5) correct
3 1.4398 (5,5) correct
4 1.7212 (2,5) correct
5 1.3164 (3,5) correct 2.4355 (5) correct
6 1 1.1004 (3,6) correct
2 1.7214 (3,6) correct
3 1.2239 (1,6) correct
4 1.3003 (3,6) correct
5 1.5774 (4,6) correct 1.8204 (6) correct
7 1 1.1491 (4,7) correct
2 0.8082 (3,7) correct
3 0.9317 (2,7) correct
4 0.9440 (1,7) correct
5 1.7216 (3,7) correct 1.7216 (7) correct
8 1 1.6059 (4,8) correct
2 1.3370 (4,8) correct
3 1.2555 (4,8) correct
4 1.4606 (3,8) correct
5 2.0294 (1,8) correct 2.0294 (9) correct
9 1 1.6242 (2,9) correct
2 1.4756 (3,9) correct
3 1.3522 (2,9) correct
4 1.9257 (5,9) correct
5 1.4617 (3,9) correct 2.2893 (9) correct
10 1 1.9326 (3,10) correct
2 1.5187 (3,10) correct
3 1.5730 (2,10) correct
4 1.6352 (5,10) correct
5 1.5421 (2,10) correct 1.9890 (10) correct
i = sample index    Number of correct I.D.: 50 (nearest-neighbor), 10 (minimum set)
j = speaker index   Rate: 100% (nearest-neighbor), 100% (minimum set)

 

 

APPENDIX G(9)
Both unknown and known speakers by normal LTAS.

Unknown Known Identification decision
Speaker Sample by the nearest-neighbor by the minimum set
(j) (i) Dist. with(i,j) Result Dist. with(j) Result
1 1 1.4748 (3,1) correct
2 1.5367 (1,1) correct
3 1.5983 (1,1) correct
4 1.9904 (1,1) correct
5 1.7460 (1.5) wrong 2.1580 (1) correct
2 1 2.3024 (4,2) correct
2 2.4877 (5,2) correct
3 1.9504 (4,2) correct
4 1.7453 (3,2) correct
5 2.4345 (1,2) correct 2.9751 (2) correct
3 1 0.6638 (5,3) correct
2 1.1126 (5,3) correct
3 0.6853 (1,3) correct
4 1.7203 (1,3) correct
5 1.7181 (1.3) correct 1.7203 (3) correct
4 1 0.7979 (4,4) correct
2 1.1831 (5,4) correct
3 0.8601 (1,4) correct
4 0.9213 (1,4) correct
5 1.2374 (2,4) correct 1.9373 (4) correct
5 1 1.1742 (2,5) correct
2 1.2977 (1,5) correct
3 1.8747 (2,5) correct
4 1.2829 (2,5) correct
5 2.1601 (3,5) correct 2.1601 (5) correct
6 1 1.1445 (3,6) correct
2 1.1783 (3,6) correct
3 1.0902 (1,6) correct
4 1.6034 (1,6) correct
5 1.4621 (1,6) correct 1.6034 (6) correct
7 1 1.4135 (2,7) correct
2 1.5370 (1,7) correct
3 1.4139 (2.7) correct
4 1.8349 (5.7) correct
5 1.6299 (4,7) correct 2.1398 (7) correct
8 1 1.4018 (3,8) correct
2 1.3827 (3,8) correct
3 1.5878 (2.8) correct
4 1.4099 (3,8) correct
5 1.4210 (3,8) correct 1.4210 (8) correct
9 1 1.8797 (2,9) correct
2 1.7563 (1,9) correct
3 1.5394 (4,9) correct
4 1.5537 (5,9) correct
5 1.4302 (4,9) correct 2.4886 (9) correct
10 1 1.3632 (5.10) correct
2 1.5876 (5,10) correct
3 2.0014 (4,10) correct
4 1.7963 (3,10) correct
5 1.4171 (1,10) correct 2.3288 (10) correct
i = sample index    Number of correct I.D.: 49 (nearest-neighbor), 10 (minimum set)
j = speaker index   Rate: 98% (nearest-neighbor), 100% (minimum set)

 

 

 

APPENDIX G(10)
Both unknown and known speakers by linear LTAS.

Unknown Known Identification decision
Speaker Sample by the nearest-neighbor by the minimum set
(j) (i) Dist. with(i,j) Result Dist. with(j) Result
1 1 1.6216 (4,1) correct
2 2.5226 (1,1) correct
3 1.1635 (5,1) correct
4 1.4345 (5,1) correct
5 1.0019 (3,1) correct 2.5994 (1) correct
2 1 1.7803 (2,2) correct
2 2.0002 (1,2) correct
3 1.9460 (4,2) correct
4 1.7437 (3,2) correct
5 1.8164 (3,2) correct 2.0713 (2) correct
3 1 1.0323 (3,3) correct
2 1.3142 (5,3) correct
3 0.9088 (1.3) correct
4 1.1386 (1,3) correct
5 1.1092 (2,3) correct 1.5842 (3) correct
4 1 1.8273 (3,4) correct
2 1.6194 (5,4) correct
3 1.2902 (4,4) correct
4 1.1721 (5,4) correct
5 1.2264 (4.4) correct 1.9545 (4) correct
5 1 1.1776 (2,5) correct
2 1.1068 (4,5) correct
3 1.7464 (4,5) correct
4 1.3118 (2,5) correct
5 1.9072 (2,8) wrong 2.2466 (5) correct
6 1 1.1335 (3,6) correct
2 1.5933 (1,6) correct
3 1.0792 (1,6) correct
4 1.2828 (1,6) correct
5 1.6387 (3,6) correct 1.7635 (6) correct
7 1 1.1346 (3,7) correct
2 1.5510 (5,7) correct
3 1.0468 (5,7) correct
4 1.6726 (5,7) correct
5 1.1702 (3,7) correct 1.6726 (7) correct
8 1 1.6215 (2,8) correct
2 1.5671 (1,8) correct
3 1.5418 (4,8) correct
4 1.6650 (5,8) correct
5 1.4599 (4,8) correct 1.9684 (8) correct
9 1 2.0199 (2,9) correct
2 1.3856 (5,9) correct
3 1.3715 (2,9) correct
4 1.7271 (5,9) correct
5 1.2622 (2,9) correct 2.1987 (9) correct
10 1 1.8211 (4,10) correct
2 1.7614 (4,10) correct
3 1.9976 (5,10) correct
4 1.9664 (2,10) correct
5 1.9681 (1,10) correct 2.3926 (10) correct
i = sample index    Number of correct I.D.: 49 (nearest-neighbor), 10 (minimum set)
j = speaker index   Rate: 98% (nearest-neighbor), 100% (minimum set)

 

 

 

APPENDIX G(11)
Unknown speakers by telephone FFC; known speakers by normal FFC.

Unknown Known Identification decision
Speaker Sample by the nearest-neighbor by the minimum set
(j) (i) Dist. with(i,j) Result Dist. with(j) Result
1 1 1.1428 (2,3) wrong
2 1.3478 (3,3) wrong
3 0.8861 (3.1) correct 1.6879 (1) correct
4 1.2316 (2,3) wrong
5 1.5837 (5,1) correct
2 1 1.2170 (1,2) correct
2 1.1113 (4,2) correct
3 1.2383 (5,2) correct 2.5539 (2) correct
4 1.2113 (5,2) correct
5 1.1452413351_ wrong
3 1 0.8706 (2,8) wrong
2 0.5634 (4,3) correct
3 1.3086 (2,1) wrong 3.0001 (7) wrong
4 1.3544 (5,9) wrong
5 0.9553 (2,1) wrong
4 1 1.2479 (1,4) correct
2 1.4083 (4.4) correct
3 0.9067 (4,8) wrong 2.8700 (8) wrong
4 2.3305 (1,4) correct
5 1.4067 (2,2) wrong
5 1 0.6568 (3,3) wrong
2 0.7151 (4,3) wrong
3 1.1855 (3,7) wrong 2.9900 (7) wrong
4 1.8224 (5,5) correct
5 1.2930 (5,5) correct
6 1 1.9926 (1,6) correct
2 1.5376 (1,6) correct
3 1.5371 (1,6) correct 2.7445 (6) correct
4 1.4852 (1,6) correct
5 2.0072 (1,6) correct
7 1 1.0438 (2,4) wrong
2 1.9616 (4,7) correct
3 0.7965 (2,8) wrong 3.2124 (9) wrong
4 0.7700 (3,5) wrong
5 0.7624 (1,7) correct
8 1 1.3854 (4,8) correct
2 1.5792 (3.4) wrong
3 1.2232 (3,4) wrong 1.9745 (8) correct
4 1.3309 (5,4) wrong
5 1.6731 (5,3) wrong
9 1 1.2714 (2,7) wrong
2 0.5531 (3.9) correct
3 0.9244 (2,4) wrong 2.0333 (7) wrong
4 0.7184 (4,7) wrong
5 1.1367 (3,9) correct
10 1 1.2418 (3.10) correct
2 1.2231 (3,10) correct
3 3.1401 (5,6) wrong 2.9788 (10) correct
4 1.4187 (4,10) correct
5 2.2191 (4,10) correct
i = sample index    Number of correct I.D.: 26 (nearest-neighbor), 5 (minimum set)
j = speaker index   Rate: 52% (nearest-neighbor), 50% (minimum set)

 

 

APPENDIX G(12)
Unknown speakers by telephone FFC; known speakers by linear FFC.

Unknown Known Identification decision
Speaker Sample by the nearest-neighbor by the minimum set
(j) (i) Dist. with(i,j) Result Dist. with(j) Result
1 1 1.7007 (1,1) correct
2 1.0547 (3.9) wrong
3 1.8466 (2.1) correct 1.9787 (3) wrong
4 1.5381 (1,1) correct
5 2.1778 (5,1) correct
2 1 1.4248 (2,2) correct
2 1.5386 (4,2) correct
3 0.9873 (1,2) correct 2.3223 (2) correct
4 1.0944 (4,2) correct
5 2.1796 (4,2) correct
3 1 0.9494 (2.8) wrong
2 0.8041 (4.7) wrong
3 2.2421 (4,9) wrong 2.4319 (7) wrong
4 0.8285 (5.3) correct
5 1.8538 (1,1) wrong
4 1 1.4131 (1,4) correct
2 1.3132 (1,8) wrong
3 0.9596 (5.4) correct 1.7576 (8) wrong
4 2.8226 (2.3) wrong
5 0.9950 (3,8) wrong
5 1 0.4113 (3,9) wrong
2 1.0196 (3,7) wrong
3 0.8494 (4,3) wrong 1.8548 (7) wrong
4 1.8840 (1.5) correct
5 1.6649 (3.5) correct
6 1 1.2306 (5.6) correct
2 2.2216 (4,6) correct
3 1.2181 (4.6) correct 3.0534 (6) correct
4 1.7259 (1,6) correct
5 2.0479 (2,10) wrong
7 1 0.9909 (4,9) wrong
2 0.8129 (4,9) wrong
3 0.9098 (5,8) wrong 1.7310 (9) wrong
4 0.8707 (5.8) wrong
5 0.6740 (5,7) correct
8 1 0.9648 (1.8) wrong
2 1.6674 (2,8) correct
3 0.5557 (4,8) correct 1.8843 (8) correct
4 1.1638 (1,8) correct
5 1.6222 (5,5) wrong
9 1 1.2716 (1,7) wrong
2 0.8444 (1.9) correct
3 0.7246 (1,7) wrong 1.8902 (7) wrong
4 0.8177 (4,7) wrong
5 1.5685 (3.9) correct
10 1 0.3579 (1.10) correct
2 1.4299 (1,10) correct
3 2.8851 (4,10) correct 3.0185 (10) correct
4 1.0524 (3,10) correct
5 1.0439 (3.10) correct
i = sample index    Number of correct I.D.: 29 (nearest-neighbor), 4 (minimum set)
j = speaker index   Rate: 58% (nearest-neighbor), 40% (minimum set)

 

 

 

 

 

 

 

APPENDIX G(13)
Both unknown and known speakers by telephone FFC.

Unknown Known Identification decision
Speaker Sample by the nearest-neighbor by the minimum set
(j) (i) Dist. with(i,j) Result Dist. with(j) Result
1 1 1.2070 (4,1) correct
2 1.1192 (1.5) wrong
3 1.8451 (1,3) wrong
4 1.3057 (1,1) correct
5 1.9226 (5,3) wrong 2.3325 (9) wrong
2 1 1.6466 (4.3) wrong
2 1.7131 (3,2) correct
3 1.3898 (2,1) wrong
4 1.8547 (5,2) correct
5 1.5961 (4,2) correct 1.9433 (2) correct
3 1 1.1875 (3,7) wrong
2 1.0107 (2,5) wrong
3 1.4440 (5,3) correct
4 1.6466 (1.2) wrong
5 1.2807 (1,4) wrong 2.3315 (7) wrong
4 1 1.3795 (5,3) wrong
2 1.5756 (5.8) wrong
3 1.0004 (4,7) wrong
4 1.6816 (1,4) correct
5 1.1115 (2,7) wrong 2.0275 (8) wrong
5 1 0.8933 (5,7) wrong
2 0.6366 (3.9) wrong
3 1.2392 (3,7) wrong
4 1.9150 (5,8) wrong
5 1.9986 (4,5) correct 1.7363 (7) wrong
6 1 1.3464 (3.6) correct
2 1.9976 (5,6) correct
3 1.6476 (1,6) correct
4 1.9962 (5,6) correct
5 1.6168 (1,6) correct 1.9976 (6) correct
7 1 0.8063 (2,7) correct
2 0.7863 (3,9) wrong
3 1.1875 (1,3) wrong
4 1.0004 (3,4) wrong
5 0.8933 (1,5) wrong 1.6709 (7) correct
8 1 1.2497 (3.4) wrong
2 1.4292 (3,8) correct
3 1.5328 (2,8) correct
4 1.7872 (1,8) correct
5 1.5636 (2.8) correct 2.2270 (8) correct
9 1 1.2398 (2,7) wrong
2 1.5109 (5,9) correct
3 0.6366 (2,5) wrong
4 0.9424 (5,7) wrong
5 1.2523 (2,9) correct 1.4046 (7) wrong
10 1 1.0207 (4,10) correct
2 1.1375 (4,10) correct
3 3.6933 (3,6) wrong
4 0.9220 (1,10) correct
5 1.0978 (4.10) correct 3.9223 (7) wrong
i = sample index    Number of correct I.D.: 24 (nearest-neighbor), 4 (minimum set)
j = speaker index   Rate: 48% (nearest-neighbor), 40% (minimum set)

 

 

APPENDIX G(14)
Both unknown and known speakers by normal FFC.

Unknown Known Identification decision
Speaker Sample by the nearest-neighbor by the minimum set
(j) (i) Dist. with(i,j) Result Dist. with(j) Result
1 1 0.8283 (2.3) wrong
2 0.9473 (4,5) wrong
3 1.4701 (2.5) wrong 2.3165 (5) wrong
4 1.2608 (2,3) wrong
5 1.3943 (2,3) wrong
2 1 1.8760 (5.2) correct
2 1.1495 (1,5) wrong
3 2.1275 (5,2) correct 2.1275 (2) correct
4 1.2417 (5.2) correct
5 1.1429 (4.2) correct
3 1 2.0839 (5.3) correct
2 0.8283 (1.1) wrong
3 0.7673 (4.3) correct 1.2440 (7) wrong
4 0.6685 (3.3) correct
5 1.7827 (1,3) correct
4 1 1.1918 (3,4) correct
2 0.6282 (4.7) wrong
3 1.2954 (1.4) correct 1.4211 (7) wrong
4 1.4253 (2.8) wrong
5 1.3122 (3.4) correct
5 1 1.1495 (2.2) wrong
2 1.4701 (3.1) wrong
3 1.4735 (5,2) wrong 2.1218 (7) wrong
4 0.9473 (2,1) wrong
5 2.2079 (2,2) wrong
6 1 1.2855 (3.6) correct
2 1.7038 (4,6) correct
3 0.8738 (4,6) correct 1.8526 (6) correct
4 1.1382 (3.6) correct
5 1.7863 (4.10) wrong
7 1 0.6297 (2,7) correct
2 0.5310 (1.7) correct
3 0.9912 (2,7) correct 1.2240 (7) correct
4 0.6257 (1.7) correct
5 1.1836 (3.3) wrong
8 1 1.8702 (3.5) wrong
2 1.0018 (2.3) wrong
3 1.6076 (1.4) wrong 1.8001 (1) wrong
4 1.4763 (3.7) wrong
5 1.2645 (3,9) wrong
9 1 1.3790 (4,7) wrong
2 0.9993 (2,7) wrong
3 1.2645 (5.8) wrong 1.8749 (7) wrong
4 1.5850 (2.7) wrong
5 1.3876 (2,3) wrong
10 1 0.9932 (3,10) correct
2 2.5705 (4,10) correct
3 0.7290 (4,10) correct 2.5705 (10) correct
4 1.0303 (3.10) correct
5 1.4040 (1.10) correct
i = sample index    Number of correct I.D.: 24 (nearest-neighbor), 4 (minimum set)
j = speaker index   Rate: 48% (nearest-neighbor), 40% (minimum set)

 

 

APPENDIX G(15)
Both unknown and known speakers by linear FFC.

Unknown Known Identification decision
Speaker Sample by the nearest-neighbor by the minimum set
(j) (i) Dist. with(i,j) Result Dist. with(j) Result
1 1 1.0832 (4,1) correct
2 1.2584 (1.1) correct
3 1.0433 (4,1) correct 1.9375 (5) wrong
4 0.9005 (5.3) wrong
5 1.1143 (1,1) correct
2 1 0.7391 (2,2) correct
2 0.8428 (1,2) correct
3 0.8131 (1.2) correct 1.8741 (2) correct
4 1.6192 (2,2) correct
5 1.2699 (3.2) correct
3 1 0.8031 (5.9) wrong
2 4.2353 (5.1) wrong
3 0.8754 (2,9) wrong 1.6365 (9) wrong
4 0.6028 (5,8) wrong
5 0.9005 (4,1) wrong
4 1 0.9243 (5,4) correct
2 1.3800 (2,7) wrong
3 1.5194 (5.7) wrong 1.7212 (8) wrong
4 7.2062 (2.8) wrong
5 1.2256 (1.4) correct
5 1 0.8874 (5.5) correct
2 1.2873 (4.3) wrong
3 0.8830 (1.5) correct 2.3478 (1) wrong
4 3.7324 (2.10) wrong
5 0.7887 (1.5) correct
6 1 3.6106 (4.6) correct
2 1.6442 (4.6) correct
3 1.2236 (4.6) correct 2.5136 (10) wrong
4 1.1199 (3.6) correct
5 1.2410 (2,10) wrong
7 1 0.8159 (2,9) wrong
2 0.9658 (1.9) wrong
3 0.5861 (4.7) correct 1.3057 (7) correct
4 0.4873 (3.7) correct
5 0.4876 (3,7) correct
8 1 1.3840 (5,4) wrong
2 1.2177 (5.4) wrong
3 1.2593 (4.8) correct 1.6033 (8) correct
4 1.1899 (5.4) wrong
5 0.6028 (4.3) wrong
9 1 0.9658 (2,7) wrong
2 0.8159 (1.7) wrong
3 0.5846 (5.9) correct 1.6800 (9) correct
4 0.5988 (5,7) wrong
5 0.4809 (3.9) correct
10 1 1.6610 (3,10) correct
2 1.2410 (5.6) wrong
3 1.3597 (1.10) correct 2.8418 (10) correct
4 0.7244 (5,10) correct
5 0.6256 (4.10) correct
i = sample index    Number of correct I.D.: 28 (nearest-neighbor), 5 (minimum set)
j = speaker index   Rate: 56% (nearest-neighbor), 50% (minimum set)

APPENDIX G(16)
Unknown speakers by telephone (IDS+FFC); known speakers by normal (IDS+FFC).

Unknown Known Identification decision
Speaker Sample by the nearest-neighbor by the minimum set
(j) (i) Dist. with(i,j) Result Dist. with(j) Result
1 1 3.4344 (1,5) wrong
2 3.3339 (1.5) wrong
3 3.0029 (3.1) correct 3.8780 (5) wrong
4 2.7733 (1.1) correct
5 2.9467 (5.3) wrong
2 1 4.0001 (1,8) wrong
2 2.3910 (4,2) correct
3 3.9761 (3.2) correct 4.3086 (2) correct
4 3.7089 (4,2) correct
5 3.5406 (3,2) correct
3 1 3.8548 (2.9) wrong
2 3.5361 (3.3) correct
3 4.6579 (4.3) correct 3.6655 (5) wrong
4 3.8641 (4.5) wrong
5 2.2600 (2,3) correct
4 1 2.3104 (1,7) wrong
2 2.0866 (4,4) correct
3 3.0011 (4,8) wrong 4.0164 (4) correct
4 3.0998 (1,4) correct
5 3.1235 (1,4) correct
5 1 2.0202 (1,9) wrong
2 3.0375 (4.9) wrong
3 3.6108 (2,8) wrong 4.6013 (9) wrong
4 3.3980 (2,5) correct
5 2.1731 (5.5) correct
6 1 2.9056 (1.10) wrong
2 2.2243 (1.6) correct
3 2.1478 (1.6) correct 3.4627 (6) correct
4 3.6614 (1.6) correct
5 3.0109 (1,10) wrong
7 1 2.1084 (3,3) wrong
2 2.2581 (4,3) wrong
3 3.0555 (5.7) correct 3.4177 (7) correct
4 2.1761 (5,7) correct
5 3.5554 (5.7) correct
8 1 2.9450 (4,8) correct
2 3.0108 (4,8) correct
3 2.0001 (4,8) correct 4.0514 (8) correct
4 2.5805 (4,8) correct
5 3.1222 (4,8) correct
9 1 3.2222 (2.9) correct
2 1.8033 (3,1) wrong
3 3.0005 (5.9) correct 3.7224 (1) wrong
4 2.9989 (2,4) wrong
5 2.5412 (3.9) correct
10 1 2.9899 (1.10) correct
2 2.3706 (1.6) wrong
3 3.9989 (2,6) wrong 4.0197 (10) correct
4 3.9203 (4.10) correct
5 3.1369 (4.10) correct
i = sample index    Number of correct I.D.: 31 (nearest-neighbor), 6 (minimum set)
j = speaker index   Rate: 62% (nearest-neighbor), 60% (minimum set)

APPENDIX G(17)
Unknown speakers by telephone (IDS+FFC); known speakers by linear (IDS+FFC).

Unknown Known Identification decision
Speaker Sample by the nearest-neighbor by the minimum set
(j) (i) Dist. with(i,j) Result Dist. with(j) Result
1 1 3.2256 (1,5) wrong
2 3.2207 (5.9) wrong
3 3.3550 (5,5) wrong 3.8575 (1) correct
4 3.2401 (4,1) correct
5 4.3913 (2,5) wrong
2 1 4.4399 (1.7) wrong
2 2.7287 (4,2) correct
3 4.7167 (2.2) correct 4.8425 (7) wrong
4 3.7650 (4,2) correct
5 4.0204 (4,2) correct
3 1 2.4843 (4,1) wrong
2 2.7575 (1,3) correct
3 3.5950 (1.3) correct 3.7410 (3) correct
4 2.1226 (3.3) correct
5 3.1720 (1,4) wrong
4 1 2.8339 (1,4) correct
2 2.6766 (2,3) wrong
3 2.9562 (1.4) correct 3.6610 (8) wrong
4 3.6870 (1.4) correct
5 3.3761 (4.3) wrong
5 1 2.8435 (5,8) wrong
2 3.2175 (2.7) wrong
3 2.8178 (2.1) wrong 4.0610 (8) wrong
4 3.5123 (1.5) correct
5 3.3593 (1.5) correct
6 1 3.1294 (2,6) correct
2 3.5379 (2.6) correct
3 2.5508 (3.6) correct 3.8011 (6) correct
4 2.4671 (1,10) wrong
5 3.3762 (1.10) wrong
7 1 2.6171 (1,3) wrong
2 2.6221 (5,7) correct
3 3.0940 (4,3) wrong 4.2421 (7) correct
4 2.9949 (4,3) wrong
5 2.9599 (5.3) wrong
8 1 2.4948 (1,8) correct
2 2.8349 (3,8) correct
3 2.3684 (4.8) correct 3.2303 (8) correct
4 3.2630 (4,8) correct
5 3.1400 (4,8) correct
9 1 2.8160 (1.9) correct
2 2.3419 (3.1) wrong
3 2.3018 (3.9) correct 2.9851 (5) wrong
4 2.8760 (4,1) wrong
5 2.7312 (3.9) correct
10 1 3.5804 (1,6) wrong
2 2.8889 (1.10) correct
3 3.7659 (4,6) wrong 4.3465 (10) correct
4 3.0866 (1.10) correct
5 3.4282 (1.10) correct
- dex rrect I.D. - .
1 sample in Number of co Rate - £2 2 . 20 z

j - speaker index

 

APPENDIX G(18)
Unknown speakers by telephone (LTAS+FFC); known speakers by normal (LTAS+FFC).

Unknown Known Identification decision
Speaker Sample by the nearest-neighbor by the minimum set
(j) (i) Dist. with(i,j) Result Dist. with(j) Result
1 1 3.6524 (1,5) wrong
2 3.0951 (1,1) correct
3 3.3446 (1.1) correct 4.4697 (1) correct
4 2.3071 (1,1) correct
5 3.8588 (5,1) correct
2 1 4.2840 (1,2) correct
2 2.6401 (4,2) correct
3 4.5476 (3.2) correct 4.4841 (2) correct
4 3.3295 (4,2) correct
5 3.3492 (4,2) correct
3 1 3.4609 (5,3) correct
2 3.0524 (3,3) correct
3 2.8537 (1.3) correct 3.9884 (3) correct
4 2.7594 (1,3) correct
5 2.9426 (5,3) correct
4 1 2.9590 (1,4) correct
2 3.0474 (5.4) correct
3 2.8607 (1,4) correct 3.5549 (4) correct
4 2.7287 (1.4) correct
5 3.7803 (1.4) correct
5 1 2.8110 (1,5) correct
2 2.6593 (1,5) correct
3 3.2131 (2.5) correct 4.0908 (5) correct
4 3.4112 (1.3) wrong
5 3.5637 (1,5) correct
6 1 3.2555 (3,6) correct
2 2.9076 (2,6) correct
3 2.3935 (3,6) correct 3.2130 (6) correct
4 3.2903 (3.6) correct
5 2.8991 (3.6) correct
7 1 2.2171 (1,3) wrong
2 2.2879 (4.7) correct
3 2.4743 (3.9) wrong 3.4856 (7) correct
4 2.2801 (4.7) correct
5 2.1019 (5.7) correct
8 1 3.3397 (2,8) correct
2 1.8287 (2.8) correct
3 2.1621 (4.8) correct 3.7699 (8) correct
4 1.9583 (4.8) correct
5 3.3247 (3.8) correct
9 1 3.2522 (1.9) correct
2 2.4636 (5.9) correct
3 2.8187 (3.9) correct 3.3593 (9) correct
4 3.5689 (5.9) correct
5 2.6800 (4,9) correct
10 1 3.5119 (1.10) correct
2 3.9226 (5.10) correct
3 2.5923 (1,10) correct 3.5294 (10) correct
4 2.7880 (4,10) correct
5 3.4344 (2,10) correct
i = sample index    Number of correct I.D.: 46 (nearest-neighbor), 10 (minimum set)
j = speaker index   Rate: 92% (nearest-neighbor), 100% (minimum set)

 

 

 

 

APPENDIX G(19)
Unknown speakers by telephone (LTAS+FFC); known speakers by normal (LTAS+FFC).

Unknown Known Identification decision
Speaker Sample by the nearest-neighbor by the minimum set
(j) (i) Dist. with(i,j) Result Dist. with(j) Result
1 1 3.6944 (1,1) correct
2 3.6321 (1.1) correct
3 3.0198 (1,1) correct 3.8780 (1) correct
4 2.5733 (3,1) correct
5 3.5973 (5.1) correct
2 1 4.2680 (1.2) correct
2 2.5219 (4,2) . correct
3 4.4614 (3,2) correct 4.3086 (2) correct
4 3.6374 (3,2) correct
5 3.8467 (3.2) correct
3 1* 2.8549 (2,3) correct
2 2.3732 (3.3) correct
3 3.5636 (4,3) correct 3.6655 (3) correct
4 3.0286 (4.3) correct
__ _i 2-8259JZJ) correct
4 1 2.8514 (1,4) correct
2 2.7566 (4,4) correct
3 3.0358 (4,4) correct 4.0164 (4) correct
4 3.3737 (4,4) correct
5 3.3922 £1,4) correct
' 5 1 2.8025 (1,5) correct
2 2.5935 (4.5) correct
3 3.1862 (2.8) wrong 4.6013 (5) correct
4 3.3000 (2.5) correct
5 2.5674 (SLS) correct
6 1 2.9766 (1,6) correct
2 2.3424 (1.6) correct
3 2.1369 (1.6) correct 3.4627 (6) correct
4 2.6614 (1.6) correct
5 3.0599 (1.6) correct
7 1 2.4258 (3.3) worng
2 2.2711 (4,7) correct
3 2.5062 (5,7) correct 3.4177 (7) correct
4 2.7195 (5,7) correct
5 2.2268 (5,7) correct
8 1 2.3775 (4,8) correct
2 2.6494 (4.8) correct
3 2.0882 (4.8) correct 4.0514 (8) correct
4 2.7563 (4.8) correct
5 3.0276 (418) correct
9 1 3.2060 (2.9) correct
2 2.7576 (3.9) correct
3 2.4444 (5.9) correct 3.7224 (9) correct
4 3.2542 (2,4) wrong
5 2.3321 (3,9) correct
10 1 3.3501 (1.10) correct
2 2.1988 (1,10) correct
3 4,8714 (2,10) correct 4.0197 (10) correct
4 2.6331 (4,10) correct
5 3.5376 (4,10) correct
1 - sample index Number of correct I.D. - 47 - 10
j - speaker index Rate - 96 Z - 100 Z

 

 


APPENDIX G (20)
Unknown speakers by telephone SPT; known speakers by normal SPT.

Unknown  Known    Identification decision
Speaker  Sample   by the nearest-neighbor          by the minimum set
(s)      (i)      Dist.    with (i,j)   Result     Dist.    with (j)   Result
1        1        0.7505   (5,9)        wrong
         2        0.8211   (1,9)        wrong
         3        0.8084   (1,9)        wrong      0.8710   (6)        wrong
         4        0.8783   (1,9)        wrong
         5        0.8454   (1,9)        wrong
2        1        0.9380   (1,9)        wrong
         2        0.9160   (1,9)        wrong
         3        1.0769   (1,9)        wrong      1.2068   (6)        wrong
         4        1.1695   (1,9)        wrong
         5        1.0225   (1,9)        wrong
3        1        0.8005   (1,9)        wrong
         2        0.7916   (1,9)        wrong
         3        0.8082   (1,9)        wrong      0.9076   (6)        wrong
         4        0.7805   (1,9)        wrong
         5        0.8075   (1,9)        wrong
4        1        0.7306   (1,9)        wrong
         2        0.8192   (1,9)        wrong
         3        0.7532   (1,9)        wrong      0.9006   (6)        wrong
         4        0.7757   (1,9)        wrong
         5        0.7336   (1,9)        wrong
5        1        0.8386   (1,9)        wrong
         2        0.8176   (1,9)        wrong
         3        0.9402   (1,9)        wrong      0.0402   (6)        wrong
         4        0.8506   (1,9)        wrong
         5        0.9102   (1,9)        wrong
6        1        0.8087   (4,6)        correct
         2        0.7775   (4,10)       wrong
         3        0.8298   (1,9)        wrong      0.9203   (6)        wrong
         4        0.8365   (4,10)       wrong
         5        0.8410   (1,9)        wrong
7        1        0.7924   (1,9)        wrong
         2        0.7608   (1,9)        wrong
         3        0.7669   (5,9)        wrong      0.8838   (9)        wrong
         4        0.7860   (5,9)        wrong
         5        0.7667   (5,9)        wrong
8        1        0.8214   (1,9)        wrong
         2        0.8072   (1,9)        wrong
         3        0.7628   (3,10)       wrong      0.8422   (6)        wrong
         4        0.8004   (1,9)        wrong
         5        0.7753   (5,9)        wrong
9        1        0.9158   (1,5)        wrong
         2        0.8481   (1,9)        correct
         3        0.8718   (1,9)        correct    1.0581   (9)        correct
         4        0.8567   (1,9)        correct
         5        0.8462   (1,9)        correct
10       1        0.8779   (1,9)        wrong
         2        0.8536   (1,9)        wrong
         3        0.8698   (1,9)        wrong      0.8500   (6)        wrong
         4        0.7421   (4,6)        wrong
         5        0.9261   (1,9)        wrong

i = sample index     Number of correct I.D. = 5 (nearest-neighbor), 2 (minimum set)
j = speaker index    Rate = 10% (nearest-neighbor), 20% (minimum set)

 

 

 

 

APPENDIX G (21)
Unknown speakers by telephone SPT; known speakers by linear SPT.

Unknown  Known    Identification decision
Speaker  Sample   by the nearest-neighbor          by the minimum set
(s)      (i)      Dist.    with (i,j)   Result     Dist.    with (j)   Result
1        1        0.7425   (4,9)        wrong
         2        0.7632   (3,9)        wrong
         3        0.7670   (2,9)        wrong      0.7999   (6)        wrong
         4        0.7952   (1,9)        wrong
         5        0.7854   (1,9)        wrong
2        1        0.7750   (1,9)        wrong
         2        0.7708   (1,9)        wrong
         3        0.8332   (1,9)        wrong      0.9578   (9)        wrong
         4        0.9017   (1,9)        wrong
3        1        0.7599   (3,6)        wrong
         2        0.7501   (3,9)        wrong
         3        0.7722   (3,6)        wrong      0.7566   (6)        wrong
         4        0.7405   (5,6)        wrong
         5        0.7564   (3,5)        wrong
4        1        0.7285   (5,7)        wrong
         2        0.7926   (2,9)        wrong
         3        0.7335   (2,9)        wrong      0.8018   (6)        wrong
         4        0.7369   (4,4)        correct
         5        0.7153   (5,9)        wrong
5        1        0.7892   (2,9)        wrong
         2        0.7974   (2,9)        wrong
         3        0.8132   (1,9)        wrong      0.8600   (9)        wrong
         4        0.8164   (2,9)        wrong
         5        0.8161   (1,9)        wrong
6        1        0.7653   (3,6)        correct
         2        0.7372   (3,6)        correct
         3        0.7601   (3,6)        correct    0.8111   (6)        correct
         4        0.7764   (3,10)       wrong
         5        0.8045   (5,6)        correct
7        1        0.7697   (1,9)        wrong
         2        0.7485   (5,9)        wrong
         3        0.7462   (5,9)        wrong      0.8462   (6)        wrong
         4        0.7892   (5,9)        wrong
         5        0.7385   (5,9)        wrong
8        1        0.7873   (2,9)        wrong
         2        0.7739   (2,9)        wrong
         3        0.7661   (4,9)        wrong      0.8074   (6)        wrong
         4        0.7746   (3,9)        wrong
         5        0.7776   (3,6)        wrong
9        1        0.8104   (1,9)        correct
         2        0.8105   (3,9)        correct
         3        0.8181   (3,9)        correct    0.8837   (9)        correct
         4        0.8273   (2,9)        correct
         5        0.8299   (2,9)        correct
10       1        0.8264   (3,3)        wrong
         2        0.8421   (2,9)        wrong
         3        0.8534   (2,9)        wrong      0.8269   (6)        wrong
         4        0.7768   (5,7)        wrong
         5        0.8895   (2,9)        wrong

i = sample index     Number of correct I.D. = 10 (nearest-neighbor), 2 (minimum set)
j = speaker index    Rate = 20% (nearest-neighbor), 20% (minimum set)

 

 

 


APPENDIX G (22)
Both unknown and known speakers by telephone SPT.

Unknown  Known    Identification decision
Speaker  Sample   by the nearest-neighbor          by the minimum set
(s)      (i)      Dist.    with (i,j)   Result     Dist.    with (j)   Result
1        1        0.6572   (2,7)        wrong
         2        0.6497   (3,1)        correct
         3        0.6497   (2,1)        correct    0.6976   (7)        wrong
         4        0.6726   (2,1)        correct
         5        0.6759   (2,1)        correct
2        1        0.6785   (2,2)        correct
         2        0.6785   (1,2)        correct
         3        0.7168   (5,2)        correct    0.7367   (2)        correct
         4        0.7150   (5,2)        correct
         5        0.7050   (4,2)        correct
3        1        0.6564   (3,3)        correct
         2        0.6804   (3,3)        correct
         3        0.6564   (1,3)        correct    0.6950   (3)        correct
         4        0.6914   (1,1)        wrong
         5        0.6704   (3,3)        correct
4        1        0.6555   (3,4)        correct
         2        0.6816   (4,8)        wrong
         3        0.6477   (4,4)        correct    0.6874   (4)        correct
         4        0.6477   (3,4)        correct
         5        0.6601   (4,4)        correct
5        1        0.6475   (2,5)        correct
         2        0.6475   (1,5)        correct
         3        0.6903   (5,5)        correct    0.6827   (6)        wrong
         4        0.6622   (1,5)        correct
         5        0.6800   (4,5)        correct
6        1        0.6304   (3,6)        correct
         2        0.6455   (1,6)        correct
         3        0.6304   (1,6)        correct    0.6578   (6)        correct
         4        0.6377   (3,6)        correct
         5        0.6578   (3,6)        correct
7        1        0.6233   (2,7)        correct
         2        0.6181   (5,7)        correct
         3        0.6287   (5,7)        correct    0.6288   (7)        correct
         4        0.6201   (5,7)        correct
         5        0.6181   (2,7)        correct
8        1        0.6400   (4,8)        correct
         2        0.6545   (1,8)        correct
         3        0.6624   (5,8)        correct    0.6819   (8)        correct
         4        0.6400   (1,8)        correct
         5        0.6583   (1,8)        correct
9        1        0.6555   (4,9)        correct
         2        0.6358   (4,9)        correct
         3        0.6535   (2,9)        correct    0.6618   (9)        correct
         4        0.6246   (5,9)        correct
         5        0.6246   (4,9)        correct
10       1        0.6507   (3,10)       correct
         2        0.6112   (3,10)       correct
         3        0.6111   (2,10)       correct    0.6712   (10)       correct
         4        0.6640   (5,7)        wrong
         5        0.6394   (3,10)       correct

i = sample index     Number of correct I.D. = 46 (nearest-neighbor), 8 (minimum set)
j = speaker index    Rate = 92% (nearest-neighbor), 80% (minimum set)

 


APPENDIX G (23)
Both unknown and known speakers by normal SPT.

Unknown  Known    Identification decision
Speaker  Sample   by the nearest-neighbor          by the minimum set
(s)      (i)      Dist.    with (i,j)   Result     Dist.    with (j)   Result
1        1        0.5457   (5,1)        correct
         2        0.6013   (1,1)        correct
         3        0.6065   (2,1)        correct    0.6157   (1)        correct
         4        0.5104   (1,1)        correct
         5        0.5700   (1,1)        correct
2        1        0.6561   (2,2)        correct
         2        0.6255   (4,2)        correct
         3        0.6682   (2,2)        correct    0.6188   (2)        correct
         4        0.6235   (2,2)        correct
         5        0.6588   (2,2)        correct
3        1        0.5510   (4,3)        correct
         2        0.5795   (5,3)        correct
         3        0.5562   (5,3)        correct    0.5887   (3)        correct
         4        0.5634   (1,3)        correct
         5        0.5801   (3,3)        correct
4        1        0.6160   (2,4)        correct
         2        0.5936   (5,4)        correct
         3        0.6232   (5,4)        correct    0.6713   (4)        correct
         4        0.6713   (1,4)        correct
         5        0.6175   (2,4)        correct
5        1        0.5919   (4,8)        wrong
         2        0.6238   (4,5)        correct
         3        0.6039   (1,5)        correct    0.6384   (5)        correct
         4        0.6116   (5,5)        correct
         5        0.6028   (1,8)        wrong
6        1        0.6198   (2,6)        correct
         2        0.6318   (5,6)        correct
         3        0.6410   (5,6)        correct    0.6410   (6)        correct
         4        0.6299   (5,6)        correct
         5        0.6422   (4,6)        correct
7        1        0.6497   (2,7)        correct
         2        0.6263   (3,7)        correct
         3        0.6028   (5,7)        correct    0.6611   (7)        correct
         4        0.6274   (5,7)        correct
         5        0.6267   (3,7)        correct
8        1        0.5844   (4,8)        correct
         2        0.6016   (4,8)        correct
         3        0.6369   (5,8)        correct    0.6369   (8)        correct
         4        0.5805   (5,3)        wrong
         5        0.5937   (1,8)        correct
9        1        0.6652   (5,9)        correct
         2        0.5975   (3,9)        correct
         3        0.6095   (2,9)        correct    0.6773   (5)        wrong
         4        0.6051   (2,9)        correct
         5        0.6636   (3,9)        correct
10       1        0.6236   (2,10)       correct
         2        0.6360   (1,10)       correct
         3        0.6324   (2,10)       correct    0.6488   (10)       correct
         4        0.6476   (1,10)       correct
         5        0.6253   (1,10)       correct

i = sample index     Number of correct I.D. = 47 (nearest-neighbor), 9 (minimum set)
j = speaker index    Rate = 94% (nearest-neighbor), 90% (minimum set)


APPENDIX G (24)
Both unknown and known speakers by linear SPT.

Unknown  Known    Identification decision
Speaker  Sample   by the nearest-neighbor          by the minimum set
(s)      (i)      Dist.    with (i,j)   Result     Dist.    with (j)   Result
1        1        0.7017   (2,5)        wrong
         2        0.6498   (3,1)        correct
         3        0.7354   (5,1)        correct    0.7069   (3)        wrong
         4        0.6544   (5,1)        correct
         5        0.6366   (3,1)        correct
2        1        0.6701   (4,2)        correct
         2        0.6811   (5,2)        correct
         3        0.7462   (4,1)        wrong      0.7852   (2)        correct
         4        0.9533   (5,2)        correct
         5        0.6670   (4,2)        correct
3        1        0.6376   (3,3)        correct
         2        0.7594   (5,3)        correct
         3        0.7364   (1,3)        correct    0.6650   (3)        correct
         4        0.6382   (1,3)        correct
         5        0.6557   (2,3)        correct
4        1        0.7596   (5,4)        correct
         2        0.9718   (4,4)        correct
         3        0.6945   (4,4)        correct    0.6945   (4)        correct
         4        0.6705   (2,4)        correct
         5        0.6608   (1,4)        correct
5        1        0.6756   (2,5)        correct
         2        0.6531   (5,8)        wrong
         3        0.6562   (2,8)        wrong      0.6926   (5)        correct
         4        0.6772   (1,5)        correct
         5        0.6595   (2,5)        correct
6        1        0.6624   (5,6)        correct
         2        0.6519   (5,6)        correct
         3        0.7284   (5,6)        correct    0.6624   (6)        correct
         4        0.6432   (5,6)        correct
         5        0.6297   (3,6)        correct
7        1        0.6612   (2,7)        correct
         2        0.9065   (5,7)        correct
         3        0.7341   (4,7)        correct    0.7416   (8)        wrong
         4        0.6354   (3,7)        correct
         5        0.6479   (2,7)        correct
8        1        0.9161   (3,8)        correct
         2        0.6562   (3,5)        wrong
         3        0.6575   (1,8)        correct    0.6880   (8)        correct
         4        0.6597   (5,8)        correct
         5        0.6531   (2,5)        wrong
9        1        0.7194   (3,9)        correct
         2        0.7390   (5,9)        correct
         3        0.6458   (2,9)        correct    0.7194   (9)        correct
         4        0.6525   (5,9)        correct
         5        0.6403   (2,9)        correct
10       1        0.6600   (5,10)       correct
         2        0.7810   (3,10)       correct
         3        0.6773   (2,10)       correct    0.6971   (10)       correct
         4        0.6971   (5,10)       correct
         5        0.6732   (1,10)       correct

i = sample index     Number of correct I.D. = 44 (nearest-neighbor), 8 (minimum set)
j = speaker index    Rate = 88% (nearest-neighbor), 80% (minimum set)

 

APPENDIX H

LIST OF FORTRAN SOFTWARE

 


 

 

Main Program     Usage

*  AUTOCM        Digitization, automatic speech compression.

*  POZCMP        Detection and concatenation of pauses and audio
                 playback of the concatenated pauses.

*  SHORT8        Segmentation of digitized speech and generation of
                 short-term spectra. Uses FFT algorithm with 256 (2^8)
                 sampled points.

*  IDSLTS        Computes IDS (intensity deviation spectrum) and LTS
                 (long-term averaged spectrum).

*  DENCOR        Computes the Pearson correlation coefficients, then
                 creates a similarity matrix of these coefficients as
                 input data file to the clustering program, COMPLE.

** COMPLE        Major program to prepare the complete-link dendrogram.

*  FRATIO        Computes F-ratios of the features within each cluster
                 of IDS and LTS formed by complete-link dendrogram.

*  FFFRAT        Computes F-ratios of FFC features.

*  UPICK         Interactive peak detecting algorithm. Uses a light
                 pen to detect F0 from digitized speech signal in
                 the time domain, then creates a file of F0 contour.

*  FFPICK        Computes FFC features from a file of F0 contour.

   NAKRO1        Executes the cross-transmission voice identification
   NAKRO2        operations.
   NAKRO3*
   NAKRO4

   NAKROA        Executes the within-transmission voice identification
   NAKROB        operations.
   NAKROC

   NAKROP        Plotting programs.
   PLOTIL

** TOSI1         Creates choral speech and choral spectrum. Uses FFT
                 algorithm with 4096 sampled points.

** SAMMON        Nonlinear projection program.

Subprograms

*  ZNORM         Standardization routine. Uses the z-transformation.

*  SUBF          Routine to compute F-ratios.

*  SETDIS        Routine to find minimum set distance.

** IDA           Routine to perform A/D and D/A conversions (written
                 in assembly language).

*  FILFIL        Creates a master file of file names.

*  RDFILE        Reads file names from a master file.

** FFT10         Fast Fourier Transform (written partially in assembly
** FFT12         language). FFT10 is called by Program SHORT8, and
                 FFT12 is called by TOSI1.

System           Graphic processing routines, Fortran IV compiler,
softwares        and other essential system routines.

*  Listings of the Fortran source codes are attached in the following
   pages. These softwares were designed and written by the author.

** Laboratory software used on PDP 11/40 at Speech and Hearing Sciences
   Research Laboratory (Director, Dr. Oscar Tosi), Audiology and Speech
   Sciences, Michigan State University.
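
As a rough illustration of what the list says DENCOR produces (Pearson correlation coefficients arranged as a similarity matrix for the clustering step), a minimal sketch follows. It is not the DENCOR listing: the function names `pearson` and `similarity_matrix` are hypothetical, and the assumption that each row of `features` is one item to be correlated with every other row is made only for the example.

```python
import math


def pearson(x, y):
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def similarity_matrix(features):
    """Square matrix whose (a, b) entry is the correlation of rows a and b."""
    k = len(features)
    return [[pearson(features[a], features[b]) for b in range(k)]
            for a in range(k)]
```

The matrix is symmetric with ones on the diagonal, so a clustering program only needs one triangle of it.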


CHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHH
C
C     AUTOCM: AUTOMATIC SPEECH COMPRESSOR - FIRST PART -
C
C     Written by    : Hirotaka Nakasone
C     Last modified : 11-May-83
C
C     Department of Audiology and Speech Sciences
C     Michigan State University
C     East Lansing
C
CHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHH
C234567890 234567890 234567890 234567890 234567890 234567890 234567890
C
      DOUBLE PRECISION PROG
      INTEGER*4 IN4,JB4,LE4
      INTEGER IN2(2),IBUF(500)
      INTEGER BUFF1(5120),BUFF2(5120)
      INTEGER SFILE(4),OFILE(4),WER,FILCNT,TP
      LOGICAL*1 FILNAM(6),YORN,SI,ITAB,IB
      COMMON /PASS3/ ISWICH,NSEC,SFILE,TP,HAP,SUMMAX,SUMMIN,OFILE
      COMMON /BUFFR/ BUFF1,BUFF2
      COMMON /FIL/ FILNAM,FILCNT
      EQUIVALENCE (IN4,IN2)
      DATA ITAB/"013/,IB/"007/,SUMMAX/0.0/
      DATA SFILE(1)/3RDK0/,SFILE(4)/3RSND/,SUMMIN/0.0/
      DATA OFILE(1)/3RDK1/,OFILE(2)/3RCMP/,OFILE(3)/3RRSD/
      DATA OFILE(4)/3RSND/,IN2/256,0/
      DATA PROG/12RDK1HNAUTOSAV/

FORMAT(’+’;1A1)
F0RMAT<1X91A1>
CALL INIT(IBUFv500)
CALL SCROL<191000)
TYPE 1220! ITAB
TYPE 1220’ ID
CALL APNT(O.9730.vOy-3y0y4)
CALL UECT(900.90.)
CALL APNT<200.9860.909—8)
CALL SUBP(1)
CALL TEXT(’ PROGRAM AUTOCM’)
CALL ESUB
TYPE 1200' ID
CALL OFF(1)
CALL SCROL<20957078)
FORMAT<1A1)
TYPE 215
FORMAT(’ Enter file name ( 6 letters ) 3 ’r$)
READ<59405) (FILNAM(M)1M=196)
FORMAT<6A1)
TYPE 220
FORMAT(’ How mans seconds ( in I3 FMT ) ? 3 ',$)
ACCEPT 4109 NSEC
FORMAT(13)
IF<NSEC.GE.1 .AND. NSEC.LE.118) GO TO 105
TYPE 225
FORMAT(’ Duration out of range. Trs again.’)

GO TO 110

105

Ul
(.0
UI

m
m
o

17

r.)
()1
U1


NBLKS = NSEC*40+1
ICHAN = IGETC()

IF(ICHAN .LT.0) STOP ’ Channel not available’
NLET = IRAD50<69FILNAM98FILE<2>)
IF(NLET.NE.6) STOP ’Dad RAD50 conversion’
MBLKS = IENTER<ICHANvSFILErNBLKS)
IF(MBLKS.LT.O) GO TO 115
TYPE 555
FORMAT(’ ENTER AP(0.0 - 1.0) b’y$)
READ<59559> HAP
FORMAT<F6.4)
TP = 10
wER = O
FILCNT = NSECX2

—-- Digitization ---
CALL QICH<ICHAN9FILCNTrDUFF1yBUFFBwaR)

IF(UER .EG. 0) GO TO 120

TYPE 2301wER

FORMAT(/’ Register 0 = ’9010)

STOP ’ wRITE ERROR IN OUTPUT FILE’

CALL CLOSEC(ICHAN)

CALL IFREEC<ICHAN)

GO TO 125

CALL CLOSEC<ICHAN)

IF(IFREEC(ICHAN).NE.0) STOP’ Error IFREEC next to 115’
IF(MBLKS .E0. -1) STOP ’NO CHANNEL OPEN’

TYPE 2351NBLKSrNSEC

FORMAT(’ Too land for current free blocks available.’/
8:1XrI4v’ blocks needed for ’713!’ second(s).’/

8’ Want to trs again by reducins # of sec (Y/N) ? ’r$)
ACCEPT 4009Y0RN

IF(YORN.EG.89) GO TO 110

STOP ’PROGRAM EXITED’

URITE(79240) (FILNAM(M)7M=1y6)

FORMAT(///v’ A sound file Just created 2 ’v6A19’.SND’/)
TYPE 12009 ID

ISUICH = 2

--- Chainins to Program HNAUTO to complete
automatic Speech compression -~—
CALL CHAIN<PROGrISwICH934)

CALL EXIT
Link information:

END

*AUTOCM=AUTOCMvGICHyGTLIBvFUSSvSYSLIB/F

 

CHHHHH HNAUTO: SECOND PART OF AUTOCM (AUTOMATIC SPEECH COMPRESSOR)

3011

C     PROGRAM HNAUTO IS DESIGNED TO PERFORM AUTOMATIC
C     SPEECH COMPRESSION BY APPLYING THE PAUSOMETRIC DEFINITION
C     OF PAUSES PROPOSED BY TOSI (1974).
C
C     Program HNAUTO is written on RT-11FB, V02C-02B, PDP11/40.
C
C     HNAUTO detects and measures pauses in a sound input file,
C     eliminates the pauses, compresses the signal, and creates a
C     compressed speech file of pre-determined duration in
C     seconds.
C
C     Presently, the maximum input speech duration is 118 seconds,
C     and the maximum output (compressed speech) depends on the
C     free blocks available in one of two disks in operation.
C
C     To compile: /U/S/N:m switches
C     To link   : *HNAUTO=HNAUTO,IAUTO,CMPRES,HNLIB,SYSLIB/F
C
C     Author : Hirotaka Nakasone
C     Date   : 03 APR, 1983
C
C     Department of Audiology and Speech Sciences
C     Michigan State University
C
CHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHH

DOUBLE PRECISION UCHrPCH

COMMON XBZX COUNT,IOUTvNSIZEyIDONEvLAST

COMMON fPASS3/ ISUICHvNSECySFILErTPpAPrSUMMAXySUMMINyOFILE
INTEGERN4 IN4’JD47LE4

INTEGER IHOLD‘2560)vIBUF<2560>vSFILE(4)vOFILE(4)

INTEGER IOUT(2560)yFLAGyTMPKSvTPKTyTP

INTEGER TOTPrTSyIN2(2)

EGUIUALENCE (IN4vIN2)

LOGICALXI TM(8)vYOPNvNAME(6)vIBELvDLANK

DATA SFILE<1)/3RDKO/vSFILE(4)/3RSND/

DATA OFILE/3RDK193RCMP93RRSD93RSND/

DATA PCH/12RDK1POZCMPSAU/

DATA VCH/12RDK1POZUI SAU/

DATA IN2/25690/vISUICH/B/rIDEL/‘OO7/vBLANK/’ ’/

FORMAT(’r’71A1)

CALL RCHAIN(IF?ISUICH734)
NSIZE = 2560

UP = 000.1.

DOWN ; 0.01

COUNT = 0.0

NH = .10

Check from which pregam this program is chained to.

IF(JF.NE.-1) GO TO 4100

 

4100
4200

4400

1352

4110

7001

5001

5000

5010

5030

6030

 


IF(ISUICH.EO.2) GO TO 4105
IF(ISUICHoEQ.l) GO TO 4110
STOP ’ *XXINUALID ISUITCHXXX’

TYPE 4200

FORMAT(’ Enter input sound file name (6 letters) P’v$)
READ<594400) (NAME(M)7M = 196)

FORMAT(6A1)

CALL IRAD50<69NAMErSFILE(2))

URITE<791350)

READ(591220) NSEC

ICIN = IGETC()

IF(ICIN .LT. 0) STOP ’ NO INPUT CHANNEL.’
ILK = LOOKUP<ICINrSFILE)

IF(ILK .LT. 0) STOP ’ BAD LOOKUP.’

INBK = 0
SUMMIN = 10000.
SUMMAX = 0.

IF(NSEC.LT.O) GO TO 1352
IF(NSEC.GE.4) GO TO 7001
MBLOCK = NSECX40 - 10

GO TO 5001
MBLOCK = 150
INBK = 40

CALL TIME(TM)
URITE(7!1000) (TM(M)7M=178)
CONTINUE
NUD = IREADU(NSIZE7IBUF9INDKIICIN)
IF (NUD .LT. 0) STOP ’ READ ERROR.’
JJ = 1
SUN =3 00
DO 5030 J=17128
SUM = SUM + IABS(IBUF(JJ))
JJ = JJ + 1
CONTINUE
SUM = SUM/128.
IF(SUM .LT. SUMMIN) SUMMIN=SUM
IF(SUM oGTo SUMMAX) SUMMAX=SUM
JJ = JJ - 64
IF(JJ .LT. NSIZE) GO TO 5010
INBK = INBK + NH
IF ( INFK oLEo MBLOCK ) GO TO 5000
CALL CLOSEC(ICIN)
CALL IFREEC(ICIN)

CALL TIME(TM)

TYPE 3011vIBEL
URITE(7¢2000) (TM(M)9M=178)
URITE<795200) SUMMAXvSUMMIN

URITE(7!1100)
READ(591200) HAP
URITE(791300)
READ(591220) TP

CALL TIME(TM)

TYPE 20209(TM(M)9H=198)

ICIN = IGETC()

IF(ICIN oLTo 0) STOP ' N0 INPUT CHANNEL.’
ILK=LO0KUP<ICINDSFILE)

IF(ILK .LT. 0) STOP ’ BAD LO0KUP»’

THSEC = 0.5 * FLOAT(ILK) * 25.6
ICUUT=IGETC()

2999

5500

50
60

65

D8012
8010

8050

08022
8020


IF(ICOUT .LT. 0) STOP ' NO OUTPUT CHANNEL’
NB=IENTER<ICOUTyOFILEv-l) P
IF(NB.LT.0) STOP ' ENTER FAICED’

---INITIALIZE PARAMETERS---
MBLOCK = NSECt40 - 1o
COUNT = 0.0
AP = <1.- HAP) x (sunnnx - SUMMIN) + SUMMIN
no 5500 I = 1. NSIZE

o

IBUF(I) =
IOUT(I) = 0
IHOLD(I)= 0
CONTINUE
INBK = 0
IBKOUT = 0
IDONE = O
LHOLD = 0
LAST = O
TPKT = 0
FLAG = 0
IBEG = l
IEND = 0
CONTINUE

NUD=IREADU(NSIZE9IBUFTINBKyICIN)
. IF(NUD .LT. 0) STOP ’ READ ERROR.’
' ----- STARTING PAUSOMETRY----
KS = 0
INS
TMPK
JK = 0
IPT
SUM 0.
. DO 65 I =1910
JK = JK + l
SUM = SUM + IABS(IBUF(JK))
CONTINUE
SUM = SUM / 10.

0

(Jill

0

O

----- TESTING AP PARAMETER-----
IF(SUM .LE. AP) GO TO 70
KS = KS + 1
IF(FLAG .E0. 0) GO TO 115
ITOT = 10 * (KS-1+TMPKS)
IF(ITOT oLE. 0) GO TO 8050
IBEG = IEND - ITOT + 1
TYPE 801291ENDI IBEG, ITOT
FORMAT(1XT’FROM 8010b>>IEND=’vI77’ IBEG=”I7!’
CALL C"PRES(IBUF’IBEG'IEND’ITDT)
IF(IDONE .LT. 1) GO TO 8050 .
ND = IURITU<NSIZE9IOUTvIBKOUTvICOUT)
IBKOUT = IBKOUT + NU
IF(IDONE .EO. 2) GO TO 8010
CONTINUE
IKS = 0
KS = 1
FLAG = 0
GO TO 120
IF(LHOLD .E0. 0) GO TO 120
LHOLD 0
IHBEG 1
IHEND IHTOT
TYPE 8022!IHBEGVIHENDIIHTOT

FORMAT<1X7’FROM 8020??? IHBEG=’9I7T’ IHEND=IVI77’

CALL CMPRES(IHOLDTIHBEG9IHENDTIHTOT)
IF(IDONE oLT. 1) GO TO 120
NB = IURITU(NSIZETIOUT9IBKOUTyICOUT)

ITOT=’rI7)

IHTOT’yI7)

 

 

t.
08072

8070

80
85

8100

90
93


‘IBKOUT = IBKOUT + NU
IF(IDONE .E0. 2) GO TO 8020

TPKT = O

TMPKS = IKS

IF( JK .LT. NSIZE ) GO TO 50

ITOT = (KS+TMPKS) * 10

IBEG = JK - ITOT + 1

IEND = JK

TYPE 807ZyIBEGvIENDvITOT
FORMAT(1X9’ FROM 8070?}? IDEG=’vI7r’ IEND=’:I79’ ITOT=’9 I7)
CALL CMPRES(IBUFTIBEGvIENDvITOT)
IF(IDONEoLTol) GO TO 4

NB = IURITU(NSIZE9IOUTTIBKOUTTICOUT)
IBKOUT = IBKOUT + NU .
IF(IDONE.EO.2) GO TO 8070

GO TO 4

---END OF PROCESSING FOR IBUF 4 AP--

IPT = IPT + 1
----- TESTING TP PARAMETER-—--—
IF((IPT +TPKT) .LT. TP) GO TO 80‘
IEND = JK - IPT*10
TYPE 72vIENDyCOUNT/10.
FORMAT<1X” FROM n72, IEND = '917.' COUNT =',F1o.2)
FLAG = 1
LHOLD= 0
INS = o
60 TO 85
IKS = IKS + 1
CONTINUE
IF(JK .LT. NSIZE) GO TO 60
TPKT = TPKT + IPT
IF(TPKT .OT. 30000) GO TO 90
IF(FLAG.EG.1) GO TO 5
IH=JK-IPT#10
LHOLD = 1
IHTOT = IPT x 10
no 9020 J=1rIHTOT

IH = IH + 1
IHOLD(J)=IBUF(IH)
CONTINUE

ITOT = (KS+TMPKS)#10
IF(ITOT .LE. 0) GO TO 4
IEND = JK - IPT*10
IBEG = IEND - ITOT + 1
CALL CMPRES(IBUFTIBEGTIENDTITOT)
IF(IDONEoLTol) GO TO 4
CALL IURITU(NSIZErIOUTrIBKOUTvICOUT)
IBKOUT=IBKOUT+NU
IF(IDONE .EO. 2) GO TO 8100
INBK = INBK + NU
IF(INDK .LE. MBLOCK) GO TO 1
IF(IDONEoLTo1) CALL IURITU(LASTTIOUT9IBKOUTTICOUT)

GO TO 9050 _
————— END OF INPUT SIGNAL——--—

TYPE 939TPKT

FORMAT(’ Uarnins! Excessively lons pause Of”IS" msec so

1 far detected.’/’ Check AF or TP parameter, or input sound
8 file.’/’ Run again.’y//)

CALL CLOSEC<ICIN>

CALL IFREEC<ICIN>

STOP’ Interrupted exit’

—-— Check compression status by UP and DOUN method
be calling INTEGER FUNCKTION IAUTO ---

 

 

9050

6696

6666

7100

6000

1000

1100
1200
1220
1300
1350
2000
2020
3010
3040
5200


IF(IAUTO<HAP9UP9DOUN9COUNT.TMSEC).LT.O) GO TO 2999
Come here if compression satisfactory.

CALL CLOSEC(ICOUT)

CALL IFREEC<ICOUT>

CALL CLOSEC<ICIN)

CALL IFREEC<ICIN>

TYPE 3010913EL

CALL TIME(TM)

URITE (773020) (TM(M)9M=198)

FORMAT<1X98A1T’ Signal compression completed.’/)

--- Audio playback of the compressed speech ---

ICHAN = IGETC()

NB = LOOKUP<ICHANTOFILE)
CALL JICVT‘NB'JB4)

CALL JMUL(IN49JB49LE4)
TYPE 6696’ IBEL’IBELVIBEL
FORMAT(’+’T3A1)

COUNT = COUNT/10.
TYPE 6OOOTCOUNT9NB

IUAIT = 60

IER = IDA(ICHAN96940091009LE4,vIUAITTO)

IUAIT = 0

TYPE 7100

FORMAT(’ Hit {Return} to play asain.’/’ Otherwise type any
8 key: then hit {return} 5’95)

ACCEPT 3010uYORN

IF(YORN.EO.BLANK) GO TO 6666

CALL CLOSEC<ICHAN)

IF(IFREEC(ICHAN).NE.0) STOP ’ CLOSEC FAILED’

FORMAT(’ Output file name: CMPRSD.5ND’r/T' Total compress
zed signal duration = ’yF10.1.’ msec.(’vI4.’ Blocks)’//)

ISUICH = 3
FORHAT<1X98A1" Searching the average peak amplitude.

8Uait.’v$)

FORMAT(’ Enter AP parameter ( 0.0 =€-AP =ﬁ 1.0000 ) P’.$)
FORMAT<F10.4)

FORMAT(15)

FORMAT(’ Enter TP parameter ( 1 =ﬁ TP =¥ 10000 msec) >’9$)

FORMAT(’ Enter the number of seconds < 1=€SEC=ﬁ60 ) b’.$)
FORMAT<1X98A19’ Search ended.’v/)

FORMAT<1X98A17’ Compression in procee. Uait.’/)
FORMAT<1A1)

FORMAT(F8.1)
FORMAT<1X9’Pre-detection outputs :’v//v1Xv’Computed maximum

8 amplitude =’vF10.49/91Xv’Computed minimum amplitude =’p
8F10.475(/))

CALL EXIT
END
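
The pause test at the heart of HNAUTO can be summarized, in heavily simplified form, by the sketch below. It follows what the listing shows: the mean absolute amplitude of each 10-sample frame (1 msec, given the listing's divide-by-10 conversion from samples to msec) is compared with a threshold AP derived from the operator's 0.0 to 1.0 setting and the measured minimum and maximum levels, and a run of quiet frames is treated as a pause only when it lasts at least TP frames. The function names, the list-based buffering, and the dropping of a trailing partial frame are assumptions of this sketch; the original's 10-block disk buffers, hold buffer, and chaining logic are not reproduced.

```python
def amplitude_threshold(hap, summax, summin):
    """AP = (1 - HAP) * (SUMMAX - SUMMIN) + SUMMIN, as in the listing."""
    return (1.0 - hap) * (summax - summin) + summin


def remove_pauses(samples, ap, tp, frame=10):
    """Return the samples with pauses of at least `tp` frames removed.

    A frame is quiet when its mean absolute amplitude is <= ap.
    Quiet runs shorter than `tp` frames are kept as part of the signal.
    Samples after the last complete frame are ignored in this sketch.
    """
    kept = []
    quiet_run = []  # frames of the current candidate pause
    for start in range(0, len(samples) - frame + 1, frame):
        chunk = samples[start:start + frame]
        level = sum(abs(s) for s in chunk) / frame
        if level <= ap:
            quiet_run.append(chunk)
            continue
        if len(quiet_run) < tp:
            for q in quiet_run:      # too short to count as a pause
                kept.extend(q)
        quiet_run = []
        kept.extend(chunk)
    if len(quiet_run) < tp:
        for q in quiet_run:
            kept.extend(q)
    return kept
```

Raising AP or lowering TP classifies more of the recording as pause, which shortens the compressed output.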

 

 


C**********************************************************
C                                                         *
C     INTEGER FUNCTION IAUTO(HAP,UP,DOWN,COUNT,TMSEC)     *
C                                                         *
C     Called by Program HNAUTO.                           *
C                                                         *
C     TMSEC = 0.5*FLOAT(NUBLK)*25.6    ! Threshold.       *
C                                                         *
C     COUNT = The number of sample points detected as     *
C             pauses ( divide by 10 to obtain msec.)      *
C                                                         *
C     IAUTO = 999 IF COMPRESSION COMPLETED.               *
C                                                         *
C     IAUTO = -999 IF COMPRESSION NOT COMPLETED.          *
C                                                         *
C     Written by : Hirotaka Nakasone                      *
C     Date       : 11 May, 1983                           *
C**********************************************************

      FUNCTION IAUTO(HAP,UP,DOWN,COUNT,TMSEC)

      TEST = COUNT/10. - TMSEC
      THRESH = TMSEC * 0.1
      IF(TEST.GT.THRESH) GO TO 10
      IF(TEST.LT.0.0) GO TO 20

C     HERE, MISSION COMPLETED.
      IAUTO = 99
      RETURN
C     HERE, COMPRESSED SIGNAL TOO LONG. SO REDUCE
C     AP VALUE BY CURRENT DOWN VALUE.
10    HAP = HAP - DOWN
      UP = DOWN / 2.
D     TYPE 100,COUNT/10.,TEST,DOWN,HAP
100   FORMAT(1X,' Compressed too long; ',F10.1,' msec.',/,
     &' Difference = ',F10.1,' msec. Ap is reduced by ',F7.5,/,
     &' Repeating compression by Ap = ',F7.5,//)

      IAUTO = -99

      RETURN
C     HERE, COMPRESSED SIGNAL TOO SHORT. SO INCREASE THE
C     AP VALUE BY CURRENT UP VALUE
20    HAP = HAP + UP
      DOWN = UP / 2.
D     TYPE 200,COUNT/10.,TEST,UP,HAP
200   FORMAT(' Compressed too short; ',F10.1,' msec.',/,
     &' Difference = ',F10.1,' msec. Ap is increased by',F7.5,/,
     &' Repeating compression by Ap = ',F7.5,//)

      IAUTO = -99

      RETURN
      END
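
The UP and DOWN adjustment in IAUTO is easier to follow when written without GO TO statements. The sketch below restates the same three-way decision; the function name `adjust_ap` and the returned tuple are conventions of this example only, since the Fortran function modifies HAP, UP and DOWN in place and signals the outcome through a positive or negative return value.

```python
def adjust_ap(hap, up, down, count, tmsec):
    """One IAUTO-style step. Returns (done, hap, up, down).

    count / 10 is the compressed duration in msec and tmsec the target.
    The result is accepted when it is no shorter than the target and at
    most 10% longer; otherwise HAP is stepped and the opposite step size
    is halved, so repeated calls narrow in on an acceptable value.
    """
    test = count / 10.0 - tmsec
    if test > 0.1 * tmsec:
        # Compressed signal too long: reduce HAP by the current DOWN step.
        hap -= down
        up = down / 2.0
        return False, hap, up, down
    if test < 0.0:
        # Compressed signal too short: increase HAP by the current UP step.
        hap += up
        down = up / 2.0
        return False, hap, up, down
    return True, hap, up, down
```

A caller would repeat the compression with the returned HAP until `done` is true, which corresponds to HNAUTO branching back while IAUTO returns a negative value.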

 

 

 


CHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHH
C     Program POZCMP (on RT-11FB, V02C-02B, PDP11/40)
C
C     POZCMP detects, measures, and stores pauses from running
C     speech. The detected pauses are concatenated so that they
C     can be reproduced through the DAC either to a loudspeaker
C     or to a tape recorder.
C
C     An input file resides in Disk 0 with .SND extension,
C     and an output file, also in Disk 0.
C
C     Subprograms called:
C        CMPRES: Used to fill temporary buffer
C                until it holds a full 10 blocks (2560 words)
C        IDA:    DAC & ADC routine found in FUSS.LIB.
C
C     Program chaining (from) : PAUSOM
C                      (to)   : POZV1, SIGCMP, DTOA
C     POZCMP can be run from monitor directly by typing .RUN POZCMP.
C
C     To compile: /U/S/N:m switches (m=14)
C     To link   : *POZCMP=POZCMP,CMPRES,FUSS,SYSLIB/F
C
C     Written by : Hirotaka Nakasone
C     Date       : 27 March, 1982
C
C     Department of Audiology and Speech Sciences
C     Michigan State University
C     East Lansing, Michigan
CHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHH
      DOUBLE PRECISION VCH,SCH
      COMMON/PASS3/ ISWICH,NSEC,SFILE,TP,AP,SUMMAX,SUMMIN,OFILE
      COMMON/B1/COUNT,IOUT,NSIZE,IDONE,LAST
      INTEGER*4 IN4,JB4,LE4
      INTEGER IHOLD(2560),IBUF(2560),SFILE(4),OFILE(4)
      INTEGER IN2(2),IOUT(2560),FLAG,TPKT,TP
      LOGICAL*1 TM(8),YORN,NAME(6),IBEL,BLANK
      EQUIVALENCE (IN4,IN2)
      DATA SFILE(1)/3RDK0/,SFILE(4)/3RSND/
      DATA OFILE/3RDK0,3RCMP,3RRSD,3RSND/
      DATA VCH/12RDK POZV1 SAV/,SCH/12RDK SIGCMPSAV/
      DATA ISWICH/2/,IBEL/"007/,BLANK/' '/

 

3011 FORMAT(’+’91A1)
CALL RCHAIN(IFTISUICH934)
COUNT.= 0.0
NSIZE=2560

NU = 10
TYPE 4200

4200 FORMAT(’ Enter input file name (6 letters) F’p$)
READ(594205) (NAME(M)9M=176)

4205 FORMAT<6A1>

CALL IRAD50(6!NAMETSFILE(2))
1352 URITE(7:1350)

READ(59122O) NSEC
4100 IF(ISUICH.EO.3) GO TO 6050

3010

7001
5001
5000

5010

5030

5500

6023
6030

O14oo


FORMAT(IAI)

ICIN = IGETC()

IF(ICIN .LT. 0) STOP ’ NO INPUT CHANNEL.’

ILK = LOOKUP<ICINTSFILE)

IF(ILK .LT. 0) STOP ’ BAD LOOKUP.’
INBK = 0
SUMMIN = 10000.

. SUMMAX = 0.
IF(NSEC.LT.0) GO TO 1352
IF(NSEC.GE.4) GO TO 7001’

MDLOCK = NSECX4O - 10

GO TO 5001
MBLOCK = 150
INBK = 40

CALL TIME(TM)
URITE(7.1000> (TM(M).M=1.8)
CONTINUE
NUD = IREADU<NSIZE9IBUFyINDKyICIN)
IF (NUD .LT. 0) STOP ' READ ERROR.’
JJ = 1
SUN = 09
DO 5030 J=1v128
SUM = SUM + IABS(IBUF(JJ))
JJ = JJ + 1
CONTINUE
SUM = SUM/128.
IF(SUM .LT. SUMMIN) SUMMIN=SUM
IF(SUM .GT. SUMMAX) SUMMAX=SUM
JJ = JJ - 64
IF(JJ .LT. NSIZE) GO TO 5010
INBK = INBK + NU
IF < INBK .LE. MBLOCK ) GO TO 5000
CALL CLOSEC(ICIN)
CALL IFREEC(ICIN)
CALL TIME(TM)
URITE<792000) (TM(M)9M=198)
TYPE 3011. IBEL
URITE<7952OO) SUMMAXrSUMMIN

DO 5500 I = 19 NSIZE
IBUF(I) = 0
IOUT(I) = 0
IHOLD(I)= 0

CONTINUE

GO TO 6030

FORMAT(F10.4)

URITE(7T1100)

READ(591200) AP

AP = (1.- AP) X (SUMMAX - SUMMIN) + SUMMIN
URITE<711300)

READ(5!1220) TP

CALL TIME(TM)

TYPE 20209<TM(M)9M=198)

~--DIRECT ACCESS PROCEDURES FOR
---INPUT AND OUTPUT FILES---
ICIN = IGETC()
IF(ICIN .LT. 0) STOP ’ NO INPUT CHANNEL-’
ILK=LOOKUP<ICINTSFILE)
IF(ILK .LT. 0) STOP ’ BAD LOOKUP.’

ICOUT=IGETC()
IF(ICOUT .LT. 0) STOP ’ NO OUTPUT CHANNEL’

NB=IENTER(ICOUTTOFILEr-1)

TYPE 14009ND
FORMAT(iX!’ NUMBER OF BLOCKS ALLOCATED FOR OUTPUT FILE =’sI4)

IF(NB.LT.O) STOP ’ ENTER FAILED’

 

 

LT}

Int-‘0

l')

50
6O

1')

80

 

 

 

 

 


~--INITIALIZE PARAMETERS~--
MBLOCK = NSECX40 - 10
COUNT = 0.0
INBK = 0
IBKOUT =
IDONE =
LHOLD =
IH =
LAST
TPKT
FLAG
IBEG

II II II ll 0

H000

---PAUSE COMPRESSION BEGINS-—-
CONTINUE

NUD=IREADU(NSIZE9IBUFyINDKvICIN)
IF(NUD .LT. 0) STOP ’ READ ERROR.’
----- STARTING PAUSOMETRY-~-—

INS = 0
JR = 0
IPT = 0
SUN = 0 0
DO 65 I =1710
JK = JK + 1
SUM = SUM + IABS(IFUF(JK))
CONTINUE

SUM = SUM / 10.
----- TESTING AP PARAMETER-----

IF(SUM .LT. AP) GO TO 70
IF(FLAG .E0. 0) GO TO 120

FLAG = o
IEND = JK — 10
ITOT = IEND — IBEG + 1

IF(LHOLD .E0. 0) GO TO 8010

LHOLD = 0

CALL C"PRES(IHOLDVIHBEG!IHEND!IHTOT)

IF(IDONE .LT. 1) GO TO 8010

TYPE 8022’IBKOUTPINBK _
FORMAT(IXT’ IURITU UITH IBKOUT=’9159’ INDK=’9I5)
ND = IURITU(NSIZE9IOUTrIDKOUT!ICOUT)

TYPE 80249NB

FORMAT(le’ NUMBER OF BLOCK SIZE = ’2I4)
IBKOUT=IBKOUT+NU

IF(IDONE .E0. 2) GO TO 8020

CALL CMPRES(IDUFTIBEGTIENDTITOT)

IF(IDONE .LT. 1) GO TO 120

TYPE 8012TIBKOUTTINBK

FORMAT(le’IURITU UITH IBKOUT=’9159’ INDK=’915)
NB = IURITU(NSIZEpIOUTrIBKOUTTICOUT)

TYPE 80149NB -

FORMAT(lXp’ NUMBER OF BLOCK SIZE = ’9I4)
IBKOUT=IBKOUT+NU V

IF(IDONE .EO. 2) GO TO 8010

TPKT = 0
IF( JK .LT. NSIZE) GO TO 50
GO TO 4

IPT = IPT + 1
----- TESTING TP PARAMETER---—-

IF((IPT +TPKT) .LT. TP) GO TO 80

IBEG = JN + 1 - (IPT * 10)

FLAG = 1
IF(JK .LT. NSIZE) GO TO 60
TPKT = TPKT + IPT
IF(TPKT .GT. 30000) GO TO 90
IF(FLAG.EO.0) GO TO 8040
IF(LHOLD.E0.0) GO TO 8030

8000

D
D8002

D
D8004

8030

9000
D9002

D9004

8040

90
93

1000
1100
1200
1220
1300

3040
5200


LHOLD = O
IHBEG=1
IHEND=IHTOT
CALL CMPRES(IHOLD!IHBEG9IHEND9IHTOT)
IF(IDONE .LT. 1) GO TO 8030
TYPE 80029IBKOUT9INBK
FORMAT(1X9’IURITU UITH IBKOUT=’9159’ INBKp’slS)
NB = IURITU(NSIZE9IOUT9IBKOUT9ICOUT)
TYPE 80049NB
FORMAT(1X9’ NUMBER OF BLOCK SIZE = ’9I4)
IBKOUT=IBKOUT + NU
IF(IDONE .E0. 2) GO TO 8000
IF(FLAG.E0.0) GO TO 4
FLAG = 0
IEND=JK
ITOT=IPT¥10
CALL CMPRES<IBUF9IBEG9IEND9ITOT)
IF(IDONE .LT. 1) GO TO 4
TYPE 90029IBKOUT9INBK
FORMAT(1X9’IURITU UITH IBKOUT=’9I59’ INBK=’915)
NB = IURITU<NSIZE9IOUT9IBKOUT9ICOUT)
TYPE 90049NB
FORMAT(1X9’ NUMBER OF BLOCK SIZE = ’9I4)
IBKOUT=IBKOUT+NU
IF(IDONE .E0. 2) GO TO 9000
GO TO 4
IH=NSIZE-IPT*10
LHOLD = 1
IHTOT = IPT * 10
DO 9020 J=19IHTOT
IH = IH + 1
IHOLD(J)=IBUF(IH)
CONTINUE
IF(INBK.LT.MBLOCK) GO TO 85
INBK = INBK + NU
IF(INBK .LE. MBLOCK) GO TO 1
IF(IDONE.LT.1) CALL IURITU(LAST9IOUT9IBKOUT9ICOUT)

GO TO 95
----- END OF INPUT SIGNAL---——

TYPE 939TPKT

FORMAT(’ Uarnins! Excessivele long pause of ’9I59 ’ msec so
Sfar detected.’/’ Check AP or TP parameter9 or input sound
8 file’/’ Run again.’9//)

CALL CLOSEC(ICIN)
CALL IFREEC(ICIN)
STOP ’ Interrupted exit’

CALL CLOSEC(ICOUT)
CALL IFREEC(ICOUT)
CALL CLOSEC(ICIN)
CALL IFREEC(ICIN)

CALL TIME(TM)
URITE (793011) (IBEL9M=193)

URITE(797100) (TM(M)9M=198)

FORMAT(1X98A19’ Searching the average peak amplitude.’/)

FORMAT(’ Enter AP parameter( 0.0 =ﬁ AP =ﬁ 1.0000 ) ﬁ’r$)
FORMAT(F10.4)

FORMAT(IS)

FORMAT(’ Enter TP parameter( 1 =ﬁ TP ={ 10000 msec) >’9$)

FORMAT(’ Enter the number of seconds ( 1=€SEC=€60 ) >’9$)
FORMAT(1X98A19’ Search ended.’9/) . .
FORMAT(T398A19’ Pause compression begins. Uait. ’9$)

FORMAT(FB.1) .
FORMAT(1X9’Pre-detection outputs:’9//91X9'Computed maximum

 

 

7100

6000

6666

6695
6670

6690

6680


8 amplitude =’9F10.43/91X9’Computed minimum amplitude =’9
8F10.495(/))

FORMAT(/9T398A19’ PAUSE compression completed.’9//)
COUNT=COUNT/10.

ICHAN = IGETC()

IN2(1)=256

IN2(2)=0

NB = LOOKUP<ICHAN9OFILE>

TYPE 60009COUNT9NB '

FORMAT(’ Output file name: CMPRSD.SND’9/9’ Total compressed
% pauses = ’9F10.19’ msec(’9I49’ blocks)’/)

CALL JICVT(NB9JB4)

CALL JMUL(IN49JB49LE4)

IER = IDA(ICHAN96940091009LE499120960)

TYPE 6710
FORMAT(’ Hit {Return} to plas asain.’/’ Otherwise tepe ans
8 kee9 then hit {Return} b’9$)

ACCEPT 30109YORN

IF(YORN .EG. BLANK) GO TO 6720

CALL CLOSEC(ICHAN) '

IF(IFREEC(ICHAN).NE.0) STOP ’ IFREEC failed’
TYPE 6666

FORMAT(’ Try with other TP and AP (Y/N) ?’9$)
ACCEPT 30109YORN

IF(YORN.EO.89) GO TO 6030

ISUICH = 2

TYPE 6670

FORMAT(’ Options (Type a letter of your choice below.)’9///
89T59’ S --- to creat a compressed SIGNAL.’9/

89T59’ H --- to Set PRINT OUTS of pauses and sianals.’9/
89T59’ O --- to QUIT this program.’9//9TS9’P’9$)

ACCEPT 30109YORN

IF(YORN.EG.81) GO TO 6680

ISUICH = 2

IF(YORN.EO.72) CALL CHAIN<VCH9ISUICH934)
IF(YORN.EG.83) CALL CHAIN<SCH9ISUICH934>
TYPE 6690

FORMAT(’ Invalid choice. Try again !’/)
GO TO 6695

CONTINUE

CALL EXIT

END


CHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHH

C
C
C
C
C
C
C
C
C
C
C
C

D100
[I

To 1‘)

UI

DUCT-b
to
O
0

Program CMPRES is called from AUTOCM9 POZCMP9 SIGCMP9 and
POZV1.

CMPRES fills output buffer be the (pauses/signals) one
NSIZE (10 blocks = 2560 words) at a time.

Uritten be 3 Hirotaka Nakasone

Last modified: 27-March9 1982

Department of Audiology and Speech Sciences
Michigan State Universite

HHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHH

      SUBROUTINE CMPRES(INB,IB,IE,IT)
      COMMON/B1/COUNT,IOUT,NSIZE,IDONE,LAST
      INTEGER INB(2560),IOUT(2560)
D     TYPE 100,LAST,IDONE,IB,IE
D100  FORMAT(//,1X,' AT CMPRES ENTRANCE, LAST=',I7,' IDONE=',I2,/,
     &T24,' IBEG=',I7,' IEND=',I7)
      IF(IT .LE. 0) GO TO 4
      IF((LAST+IT) .LE. NSIZE) GO TO 2
      ITEMP = NSIZE - LAST - 1
      T=0.
      DO 1 I=IB,IB+ITEMP
      LAST = LAST + 1
      IOUT(LAST)=INB(I)
      IT = IT - 1
      T=T+1.
1     CONTINUE
      COUNT=COUNT+T
      LAST = 0
      IDONE = 2
      IB = IB + ITEMP + 1
      GO TO 4
2     CONTINUE
      T=0.
      DO 3 I = IB,IE
      LAST = LAST + 1
      IOUT(LAST)=INB(I)
      T=T+1.
3     CONTINUE
      COUNT=COUNT+T
      IF(LAST.LT.NSIZE) GO TO 5
      LAST=0
      IDONE=1
      GO TO 4
5     CONTINUE
      IDONE=0
4     CONTINUE
D     TYPE 200,LAST,IDONE,IB,IE
D200  FORMAT(1X,' LEAVING CMPRSD WITH LAST =',I7,' IDONE=',I2,/,
     &T23,' IBEG=',I7,' IEND=',I7,///)
      RETURN
      END
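The buffer-filling rule of CMPRES can be illustrated outside FORTRAN. The following is a minimal Python sketch, not the author's code: the function name `append_segment` and its return convention are assumptions made for illustration, and the status values 0, 1 and 2 mirror how IDONE appears to be set in the listing (buffer not yet full, buffer exactly filled, buffer filled with part of the segment left over).

```python
def append_segment(buffer, segment, nsize):
    """Append `segment` to `buffer`, never letting it exceed `nsize` items.

    Returns (status, remainder):
      0 -> segment stored, buffer still has room
      1 -> segment stored, buffer is now exactly full
      2 -> buffer filled, `remainder` holds the samples that did not fit
    The caller is expected to write out and empty a full buffer.
    """
    room = nsize - len(buffer)
    if len(segment) > room:
        buffer.extend(segment[:room])
        return 2, list(segment[room:])
    buffer.extend(segment)
    if len(buffer) == nsize:
        return 1, []
    return 0, []
```

In the listing, NSIZE is 2560 words (10 blocks), and the number of samples copied is accumulated in COUNT so that the caller can report the total duration.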

C**** PROGRAM SHORTS: *********************************************
C
C     SHORTS is designed to create a set of parent files of
C     short-term spectra. A parent file can have as many as
C     400 short-term spectra. 2 spectra are stored in a 1-block
C     INTEGER*2.
C
C     Input sound file resides in DK1 with .SND extension.
C     Output short-term spectrum will be assigned .STS
C     extension name.
C     This output file is then stored in DK0.
C
C     To compile:
C          *SHORTS=SHORTS/W
C     To link:
C          *SHORTS=SHORTS,FFT10H,SHUFFL,TABE10,SYSLIB/F
C
C     Written by: Hirotaka Nakasone
C     Date:       May 4, 1983
C     Updated:    May 6, 1983
C
C******************************************************************
      INTEGER FILEIN(5,6),FILOUT(5,5,6),NSEC(5),WCNT
      INTEGER IBUF(256),NSPT(5),DATAIN(4),DATOUT(4),LBUF(256)
      REAL TX(256)
      COMPLEX F(256)
      COMMON /FFTCOM/F
      LOGICAL*1 TM(8),BUG, YN
      DATA DATAIN(1)/3RDK0/,DATAIN(4)/3RSND/
      DATA DATOUT(1)/3RDK0/,DATOUT(4)/3RSTS/
      FUNIT = 5000./128.
      BUG = .FALSE.
D     TYPE 10
10    FORMAT(' DEBUG (Y/N) ?',$)
D     ACCEPT 11, YN
11    FORMAT(1A1)
D     IF(YN.EQ.89) BUG = .TRUE.
      TYPE 20
20    FORMAT(' Which disk has input file (1/0) ?',$)
      ACCEPT 25, NDISK
25    FORMAT(I1)
      IF(NDISK.EQ.1) CALL IRAD50(3,'DK1',DATAIN(1))
      TYPE 30
30    FORMAT(' Which disk will store output file (1/0) ?',$)
      ACCEPT 25, NDISK
      IF(NDISK.EQ.1) CALL IRAD50(3,'DK1',DATOUT(1))
      TYPE 100
100   FORMAT(' Number of input sound files(max=5) >',$)
      ACCEPT 200,NFILES
200   FORMAT(I3)
      DO 300 I=1,NFILES
      TYPE 400,I

 

 

400   FORMAT(1X,'Type input file #',I2,' (6A1) > ',$)
      READ(5,500) (FILEIN(I,J),J=1,6)
500   FORMAT(6A1)
      TYPE 510
510   FORMAT(' Number of seconds of this input file > ',$)
      ACCEPT 200,NSEC(I)
      TYPE 600
600   FORMAT(' Number of parent spectra from this file
     &(max=5) > ',$)
      ACCEPT 200,NSPT(I)
      TYPE 700
700   FORMAT(' Type all parent spectra names (6A1) below.'/)
      DO 800 K=1,NSPT(I)
      TYPE 900,K
900   FORMAT(T20,' For output parent file #',I2,' > ',$)
      READ(5,500)(FILOUT(I,K,J),J=1,6)
800   CONTINUE
300   CONTINUE
C     END OF INPUT PROCEDURES. NOW BEGIN LOOP FOR
C     ALL INPUT FILES.
      NSAMP = 256
      DO 1100 II=1,NFILES
      CALL TIME(TM)
      TYPE 1110,(TM(M),M=1,8),(FILEIN(II,KK),KK=1,6)
1110  FORMAT(1X,8A1,' BEGIN INPUT FILE DK0: ',6A1,'.SND'/)
      CALL CVRAD1(DATAIN,FILEIN,II)
      ICH = IGETC()
      IF(LOOKUP(ICH,DATAIN).LT.0) STOP 'BAD LOOKUP'
      KSIZE = (NSEC(II)/NSPT(II))*40
      IF(KSIZE.LE.0) STOP ' WRONG KSIZE'
      MBLOCK = KSIZE-4
      NBLK = 0
      DO 1200 JJ= 1,NSPT(II)
      IH = 0
      NST = 0
      CALL CVRAD2(DATOUT,FILOUT,II,JJ)
      ICOUT = IGETC()
      NB = IENTER(ICOUT,DATOUT,-1)
C
5111  NWD = IREADW(256,IBUF,NBLK,ICH)
D     KBLK=NBLK
      IF(NWD.LT.0) STOP ' READ ERROR'
C
      DO 110 J=1,256
      TR = FLOAT(IBUF(J))/100.
      F(J)=CMPLX(TR,0.0)
110   CONTINUE
D     TYPE 3, NBLK
D3    FORMAT(1X,' NBLK =',I4)
C
C --- NOW CALL THE FFT ALGORITHM ---
      CALL FFT10H(8)
C
C --- CONVERT RETURNED REAL AND IMAGINARY
C     COMPONENTS TO ABSOLUTE VALUES ---
      DO 7010 I= 1,128
      TX(I)=CABS(F(I))
      LBUF(IH+I)= IFIX(TX(I))
D     FREQ = FUNIT * (I-1)
D     IF(BUG) TYPE 7014,I,TX(I),FREQ,IH+I,LBUF(IH+I)
D7014 FORMAT(6X,I4,7X,E13.4,3X,6X,F8.1,I4,6X,I5)

 

 

 

7010  CONTINUE
      IH = IH + 128
      IF(IH.LE.128) GO TO 7030
      IH = 0
C     WRITE 1 BLOCK OF STS
      CALL IWRITW(256,LBUF,NST,ICOUT)
C     UPDATE NUMBER OF BLOCKS
      NST = NST + 1
7030  NBLK = NBLK + 4
      IF(NBLK.LE.MBLOCK) GO TO 5111
      MBLOCK = MBLOCK + KSIZE
      CALL CLOSEC(ICOUT)
      IF(IFREEC(ICOUT).NE.0) STOP 'ERROR IN IFREEC'
      CALL TIME(TM)
      TYPE 7050,(TM(M),M=1,8),(FILOUT(II,JJ,KK),KK=1,6)
7050  FORMAT(T20,8A1,T45,'DK1:',6A1,'.STS'/)
1200  CONTINUE
      CALL CLOSEC(ICH)
      IF(IFREEC(ICH).NE.0) STOP ' ERROR IN IFREEC(ICH)'
      TYPE 1250,II,(FILEIN(II,KK),KK=1,6)
1250  FORMAT(1X,'AT END OF INPUT FILE #',I2,' ',6A1,'.SND'/)
1100  CONTINUE
      TYPE 1300
1300  FORMAT(20(/),1X,'SUMMARY OF OUTPUT FILES IN',/
     $,1X,'CATEGORY #    SHORT-TERM SPECTRA FILE NAME'/)
      DO 1310 I=1,NFILES
      DO 1320 J = 1,NSPT(I)
      TYPE 1330,I,(FILOUT(I,J,K),K=1,6)
1330  FORMAT(4X,I3,T20,6A1,'.STS')
1320  CONTINUE
1310  CONTINUE
      CALL EXIT
      END
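The framing used by SHORTS (one 256-sample block analysed, then the block counter advanced by 4, with 128 magnitudes kept per frame) can be sketched as follows. This is a hedged Python illustration, not the author's FFT10H routine: it assumes that `FFT10H(8)` performs a 256-point transform, replaces it with a naive DFT, and uses hypothetical function names. The division by 100 and the truncation to an integer follow the listing.

```python
import cmath
import math

FRAME = 256               # samples per analysed frame (one 256-word block)
HOP = 4 * FRAME           # the listing advances NBLK by 4 blocks per frame
N_BINS = 128              # magnitudes kept per frame
BIN_HZ = 5000.0 / 128.0   # FUNIT in the listing, about 39 Hz per bin


def magnitude_spectrum(frame):
    """Return truncated |DFT| for bins 0..127 of one frame (naive DFT)."""
    n = len(frame)
    spectrum = []
    for k in range(N_BINS):
        acc = 0j
        for t, x in enumerate(frame):
            acc += (x / 100.0) * cmath.exp(-2j * math.pi * k * t / n)
        spectrum.append(int(abs(acc)))   # truncation, as IFIX does
    return spectrum


def short_term_spectra(samples):
    """Return one 128-bin magnitude spectrum per analysed frame."""
    return [
        magnitude_spectrum(samples[start:start + FRAME])
        for start in range(0, len(samples) - FRAME + 1, HOP)
    ]
```

With a 10 kHz sampling rate this gives roughly ten spectra per second, which is consistent with the limit of 400 spectra per parent file stated in the header.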

      SUBROUTINE CVRAD1(OUTFIL,INFILE,I2)
      INTEGER OUTFIL(4),INFILE(5,6)
      LOGICAL*1 DUMMY(6)
      DO 1 I=1,6
      DUMMY(I) = INFILE(I2,I)
1     CONTINUE
      IER=IRAD50(6,DUMMY,OUTFIL(2))
D     TYPE 10,IER,(DUMMY(M),M=1,6)
D10   FORMAT(1X,'NUMBER OF CHAR. CONVERTED(IN)=',I3,' : ',6A1,/)
      RETURN
      END

      SUBROUTINE CVRAD2(OUTFIL,INFILE,I2,J2)
      INTEGER OUTFIL(4),INFILE(5,5,6)
      LOGICAL*1 DUMMY(6)
      DO 1 I=1,6
      DUMMY(I) = INFILE(I2,J2,I)
1     CONTINUE
      IER = IRAD50(6,DUMMY,OUTFIL(2))
D     TYPE 10,IER,(DUMMY(M),M=1,6)
D10   FORMAT(1X,'NUMBER OF CHAR. CONVERTED(OUT)=',I3,' : ',6A1,/)
      RETURN
      END

C     FILE IDSLTS.FOR
C******************************************************************
C
C     PROGRAM IDSLTS:
C
C     IDSLTS is designed to generate several different types of
C     spectra from a parent short-term spectra.
C
C     (1) Mean deviation spectrum, defined by
C           D(k) = [ SUM [ S(i,j,k) - m(i,j,k) ]**2 ] / I
C           for i = 1, 2, 3, ..., I
C               I = number of short-term spectra
C           for j = 1, 2, 3, ..., J
C               J = number of frequency components, and
C           k = 1, 2, ..., N; k is index of kth pattern.
C           m(i,j,k) = ( SUM [S(i,j,k)] ) / I
C
C     (2) Intensity deviation spectrum (IDS), defined by
C           C(k) = [ SUM [ABS( S(i,j,k) - m(i,j,k) )] ] / m(i,j,k)
C           where m and i defined above.
C
C     (3) Long-term averaged spectrum (LTS), defined by
C           L(k) = [ SUM [S(i,j,k)] ] / I
C           where i defined above.
C
C     All the spectra defined above are derived from the same set of
C     parent short-term spectra.
C
C     Input     Short-term spectra file with ext. STS
C     Output    (1) Intensity deviation spectra (IDS) with ext. IDS
C               (2) Long-term averaged spectra (LTS) with ext. LTS
C
C     To compile    : IDSLTS=IDSLTS/W
C     To link       : IDSLTS=IDSLTS,NCHECK,SYSLIB/F
C
C     Programmed by : Hirotaka Nakasone
C     Date          : 10 Oct., 1982
C     Last modified : 14 Oct., 17, May, 1983
C
C     Subprogram called: FUNCTION NCHECK for file input verification.
C
C     Department of Audiology and Speech Sciences
C     Michigan State University
C     East Lansing, Michigan
C
C******************************************************************

      COMMON /NAME/ FNAMES
      DOUBLE PRECISION DEXT
      REAL SUM1(128),AIDS(128)
      INTEGER IBUF(256),EXT(4)
      LOGICAL*1 FNAMES(15,100),LDATE(9),LTIME(8),LDEV(3),BUG
      LOGICAL*1 LEXT(3),MFILE(15),FMT(60),L12(12),LN(9)
      LOGICAL*1 ED(3),EL(3),ES(3),NEED,LEX(3,5),LFILE(15),YN
      LOGICAL*1 NORMD(3),NORML(3)
      DATA MFILE(1)/'D'/,MFILE(2)/'K'/,MFILE(4)/':'/,MFILE(11)/'.'/
      DATA DEXT/3RSTS/,EXT(4)/3RSTS/
      DATA ED(1)/'I'/,ED(2)/'D'/,ED(3)/'S'/
      DATA EL(1)/'L'/,EL(2)/'T'/,EL(3)/'S'/
      DATA ES(1)/'S'/,ES(2)/'T'/,ES(3)/'S'/
1     FORMAT(1A1)
2     FORMAT(6A1)
3     FORMAT(1A1,3A1)
4     FORMAT(15A1)
5     FORMAT(I3)
6     FORMAT(12A1)
7     FORMAT(14A1)
8     FORMAT(9A1,3A1)
9     FORMAT(1X,/)
10    FORMAT(1X,8A1,3X,9A1,3X,'PROGRAM: IDSLTS.'///)
      NORMD(1) = ED(1)
      NORMD(2) = ED(2)
      NORMD(3) = 78
      NORML(1) = EL(1)
      NORML(2) = EL(2)
      NORML(3) = 78
      CALL DATE(LDATE)
      CALL TIME(LTIME)
      WRITE(7,10) LTIME,LDATE
      BUG = .FALSE.
      TYPE 15
15    FORMAT(' DEBUG(Y/N) ?',$)
      ACCEPT 1,YN
      TYPE 20
20    FORMAT(' MASTER FILE OF FILE NAMES AVAILABLE(Y/N) ?',$)
      ACCEPT 1, YN
      IF(YN.NE.89) GO TO 200
C     TO HERE TO READ THE MASTER FILE NAME OF FILES.
      NEED = .FALSE.
      TYPE 30
30    FORMAT(' ENTER MASTER FILE NAME [DEV:FILNAM.EXT] >',$)
      READ(5,7) (MFILE(M),M=1,14)
      MFILE(15) = 0
C     OPEN CHNN. FOR FILE INPUT.
      CALL ASSIGN(12,MFILE,14,'OLD')
      READ(12,50) (LFILE(L),L=1,14),NUMF,FMT
50    FORMAT(14A1,I3,60A1)
      DO 60 I = 1,NUMF
      READ(12,7) (FNAMES(M,I),M=1,14)
D     WRITE(7,65) (FNAMES(M,I),M=1,14),I
D65   FORMAT(1X,14A1,5X,I3)
60    CONTINUE
      CALL CLOSE(12)
      GO TO 250
C     TO HERE IF THE MASTER FILE NAME IS NOT AVAILABLE.
200   CONTINUE
      NEED = .TRUE.
      TYPE 100
100   FORMAT(1X,'TOTAL NUMBER OF PARENT STS FILES ?',$)

      READ(5,5) NUMF
      TYPE 110
110   FORMAT(1X,'ENTER ALL PARENT STS FILE NAMES BELOW.'//)
C     LOOP FOR FILE NAME INPUT.
      DO 120 I = 1,NUMF
      WRITE(7,130) I
130   FORMAT(' STS FILE NAME #',I3,' ',$)
      IF(LCHECK(DEXT,ICHAN,I).LT.0) STOP 'ERROR LCHECK'
      CALL CLOSEC(ICHAN)
      IF(IFREEC(ICHAN).NE.0) STOP '*ERROR IFREEC*'
120   CONTINUE
C     BEGIN MAIN LOOP TO COMPUTE IDS AND LTS.
250   CONTINUE
      KOUNT = 0
      DO 300 I = 1,NUMF
C     INITIALIZE
      DO 305 J = 1, 127
      SUM1(J) = 0.0
      AIDS(J) = 0.0
305   CONTINUE
C     OPEN FOR A FILE
      ICHAN = IGETC()
      CALL IRAD50(3,FNAMES(1,I),EXT(1))
      CALL IRAD50(6,FNAMES(5,I),EXT(2))
      CALL IRAD50(3,FNAMES(12,I),EXT(4))
      CALL R50ASC(9,DEXT,LN(1))
      MAXBLK = LOOKUP(ICHAN,EXT)
      IF(MAXBLK .LT. 0) STOP '*BAD LOOKUP*'
      NBLK = 0
      JH = 0
310   NWORD = IREADW(256,IBUF,NBLK,ICHAN)
      IF(NWORD.LT.0) STOP'ERROR IN IREADW'
CDEBUGDEBUGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGDEBUG
      IF(.NOT. BUG) GO TO 315
      IF(NBLK.GE.2) GO TO 315
      TYPE 1100
1100  FORMAT(1X,//,' STS#1    STS#2'/)
      DO 312 IB = 2,128
      WRITE(7,1000) IBUF(IB),IBUF(IB+128)
1000  FORMAT(1X,I6,3X,I6)
312   CONTINUE
      TYPE 9
CDEBUGDEBUGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGDEBUG
315   CONTINUE
      DO 320 J = 1,127
      SUM1(J) = SUM1(J) + FLOAT(IBUF(J+1+JH))
320   CONTINUE
      JH = JH + 128
      IF(JH.LE.128) GO TO 315
      JH = 0
      NBLK = NBLK + 1
      IF(NBLK.LT.MAXBLK) GO TO 310
      DSTS = FLOAT(MAXBLK*2)
C     COMPUTE THE AVERAGE INTENSITY OF EACH
C     FREQUENCY COMPONENT.

 

 

      DO 330 J = 1, 127
      SUM1(J) = SUM1(J)/DSTS
330   CONTINUE
C     SUM1 NOW HOLDS THE AVERAGE INTENSITY.
C     RESET NBLK = 0
      NBLK = 0
      JH = 0
350   NWORD = IREADW(256,IBUF,NBLK,ICHAN)
355   CONTINUE
      DO 360 J = 1,127
      AIDS(J) = AIDS(J) + ABS(FLOAT(IBUF(J+1+JH))-SUM1(J))
360   CONTINUE
      JH = JH + 128
      IF(JH.LE.128) GO TO 355
      JH = 0
      NBLK = NBLK + 1
      IF(NBLK .LT. MAXBLK) GO TO 350
      DO 365 J = 1,127
      AIDS(J) = AIDS(J) / SUM1(J)
365   CONTINUE
C     END OF ONE STS. NOW,
C     WRITE THE RESULTS.
      TYPE 9
      CALL WTSPEC(127,AIDS,I,ED)
      CALL WTSPEC(127,SUM1,I,EL)
      CALL NORPER(127,AIDS)
      CALL WTSPEC(127,AIDS,I,NORMD)
      CALL NORPER(127,SUM1)
      CALL WTSPEC(127,SUM1,I,NORML)
C     CLOSE CHANNEL.
      CALL CLOSEC(ICHAN)
      IF(IFREEC(ICHAN).NE.0) STOP 'ICHAN NOT FREED.'
      KOUNT = KOUNT + 1
300   CONTINUE
C     END OF ALL STS'S
      TYPE 9
      TYPE 9
      CALL TIME(LTIME)
      WRITE(7,400) LTIME,KOUNT
400   FORMAT(1X,8A1,//,' TOTAL NUMBER OF STS FILES PROCESSED =',I3,
     &///)
      DO 405 I = 1,3
      LEX(I,1) = ED(I)
      LEX(I,2) = NORMD(I)
      LEX(I,3) = EL(I)
      LEX(I,4) = NORML(I)
      LEX(I,5) = ES(I)
405   CONTINUE
      N = 4
      IF(NEED) N = 5
      YN = 0
      DO 420 I = 1,N
      WRITE(7,430) (LEX(J,I),J=1,3)
430   FORMAT(1X,'MASTER FILE NAME OF ',3A1,' FILES
     & [DEV:FILNAM.EXT] >',$)
      READ(5,7) (MFILE(M),M=1,14)
      IF(MFILE(1).EQ.48) GO TO 420
      MFILE(15) = 0
      CALL ASSIGN(61,MFILE,14,'NEW')
      WRITE(61,440) (MFILE(M),M=1,14), KOUNT
440   FORMAT(14A1,I3,' (5E15.7)')
      DO 450 II = 1,KOUNT
      WRITE(61,7)(FNAMES(M,II),M=1,11),(LEX(M,I),M=1,3)
450   CONTINUE
      CALL CLOSE(61)
420   CONTINUE
      CALL EXIT
      END
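The three spectra defined in the IDSLTS header can be sketched in a few lines. The following Python function is an illustration only: the name `ids_lts` and the list-of-lists input are assumptions, and the guard against a zero mean is an addition that does not appear in the listing. As in the main loop of the listing, the IDS is the sum of absolute deviations divided by the mean intensity of the component.

```python
def ids_lts(spectra):
    """Compute (mean deviation, IDS, LTS) per frequency component.

    `spectra` is a list of short-term spectra, each a list of
    intensities with the same number of frequency components.
    """
    count = len(spectra)
    n_freq = len(spectra[0])
    mean_dev, ids, lts = [], [], []
    for j in range(n_freq):
        column = [s[j] for s in spectra]
        m = sum(column) / count                      # long-term average
        lts.append(m)
        mean_dev.append(sum((v - m) ** 2 for v in column) / count)
        abs_dev = sum(abs(v - m) for v in column)
        ids.append(abs_dev / m if m else 0.0)        # zero guard is an assumption
    return mean_dev, ids, lts
```

Because each IDS value is a ratio of a deviation to a mean at the same frequency, a fixed gain applied to that frequency component cancels out, which is presumably why the parameter is of interest for unknown channel responses.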

C******************************************************************
      SUBROUTINE WTSPEC(NFREQ,SDATA, IX, E3)
C     Written by : Hirotaka Nakasone                               *
C     Date       : 17 May 1983                                     *
C******************************************************************
      COMMON /NAME/ FNAMES
      LOGICAL*1 FNAMES(15,100),E3(3)
      REAL SDATA(NFREQ)
      DO 1 I = 1,3
      FNAMES(I+11,IX)= E3(I)
1     CONTINUE
      CALL ASSIGN(61,FNAMES(1,IX),14,'NEW')
      WRITE(61,3) (SDATA(J),J=1,NFREQ)
3     FORMAT(5E15.7)
      CALL CLOSE(61)
      WRITE(7,4) (FNAMES(L,IX),L=1,14)
4     FORMAT(1X,14A1)
      RETURN
      END

C******************************************************************
C     FILE: NORPER.FOR
C     SUBROUTINE NORPER(NFREQ,SDATA)
C
C     NORPER IS DESIGNED TO NORMALIZE IDS AND LTS FILES.
C     NORMALIZED DATA WILL BE REPRESENTED BY NUMBERS
C     BETWEEN 0 AND 100.
C     SDATA IS A REAL ARRAY DIMENSIONED BY NFREQ.
C     EACH DATUM WILL BE DIVIDED BY THE SUM OF NFREQ VALUES.
C
C     IT HAS BEEN IMPLEMENTED AS A PART OF PRE-PROCESSING
C     OF SPEECH DATA FOR A PH.D. DISSERTATION RESEARCH,
C     ON THE PDP 11/40, DEPARTMENT OF AUDIOLOGY AND SPEECH
C     SCIENCES, MICHIGAN STATE UNIVERSITY.
C
C     WRITTEN BY HIROTAKA NAKASONE, 19 MAY 1983.
C******************************************************************
      SUBROUTINE NORPER(NFREQ,SDATA)
      REAL SDATA(NFREQ)
      TOTAL = 0.
      DO 1 I = 1, NFREQ
      TOTAL = TOTAL + SDATA(I)
1     CONTINUE
      DO 2 I = 1, NFREQ
      SDATA(I) = SDATA(I)/TOTAL
2     CONTINUE
      RETURN
      END
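The normalization performed by NORPER amounts to dividing every component by the sum of all components. A minimal Python sketch follows; the function name and the `scale` parameter are hypothetical. Note that the listing as printed divides by the total only, which yields fractions that sum to 1; a `scale` of 100 would give the 0 to 100 range mentioned in the header comment.

```python
def normalize_by_sum(values, scale=1.0):
    """Return `values` divided by their sum, optionally rescaled."""
    total = sum(values)
    if total == 0:
        raise ValueError("cannot normalize a spectrum whose sum is zero")
    return [scale * v / total for v in values]
```

For example, `normalize_by_sum([1, 1, 2])` gives `[0.25, 0.25, 0.5]`.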

C     FILE = DENCOR.FOR
C******************************************************************
C
C     PROGRAM DENCOR.FOR
C
C     DENCOR COMPUTES THE SPEARMAN PRODUCT MOMENT CORRELATION
C     COEFFICIENTS (ABSOLUTE) OF ALL PAIRINGS OF FREQUENCY COMPONENTS
C     OF SPECTRA, I AND J, OVER NUMF FILES.
C
C     A ((NDATA)*(NDATA-1)/2) COR. MATRIX WILL BE USED AS INPUT TO
C     A CLUSTERING PROGRAM (COMPLETE LINK) TO SEARCH GROUPINGS BY
C     FREQUENCY COMPONENTS. NDATA IS THE TOTAL NUMBER OF DISCRETE
C     FREQUENCIES, EACH REPRESENTING ABOUT 39 HZ, FROM 117 TO 3978 HZ.
C     THE OUTPUT OF DENCOR IS A SIMILARITY MATRIX, I.E., THE GREATER
C     THE VALUE IN THE MATRIX IS, THE MORE SIMILAR TO ONE ANOTHER.
C
C     SUBPROGRAMS:
C     FUNCTION LCHECK (H.NAKASONE): FILE VERIFICATION ROUTINE.
C     FUNCTION INDEX: MATRIX FORMATTING ROUTINE.
C
C     PROGRAMMED AS A PART OF FEATURE EXTRACTION PROCEDURE FOR A
C     PH.D. DISSERTATION (COMPUTER VOICE IDENTIFICATION BY USING
C     INTENSITY DEVIATION SPECTRA AND FUNDAMENTAL FREQUENCY CONTOUR)
C     BY THE AUTHOR.
C
C     WRITTEN BY : HIROTAKA NAKASONE, 22-MAY-1983
C     DEPARTMENT OF AUDIOLOGY AND SPEECH SCIENCES, MSU.
C
C     TO COMPILE: DENCOR=DENCOR/UE/DJ
C     TO LINK   : DENCOR=DENCOR,SYSLIB/F
C
C******************************************************************

      COMMON /NAME/FNAMES
      COMMON /NUMF/N
      DOUBLE PRECISION EXT,PROG
      REAL DMAT(4950),DAT(30,100)
      LOGICAL*1 FNAMES(15,100),YN,LNAME(15),FMT(60)
      LOGICAL*1 NEXT(3)
      DATA LNAME(1)/'D'/,LNAME(2)/'K'/,LNAME(4)/':'/
      DATA LNAME(10)/'.'/
      DATA PROG/12RDK1ORDPR SAV/
      LNAME(15) = 0
1     FORMAT(1A1)
2     FORMAT(I3)
3     FORMAT(6A1)
4     FORMAT(14A1)
5     FORMAT(15A1)
6     FORMAT(12A1)
8     FORMAT(1X,' WAIT.')
9     FORMAT(1X,/)
      N = 100
      DO 10 I = 1, 4950
      DMAT(I) = 0.
10    CONTINUE

C READ DATA FILES FROM A MASTER FILE.

 

300   TYPE 210
210   FORMAT(' ENTER MASTER FILE NAME (DEV:FILNAM.EXT) >',$)
      READ(5,4) (LNAME(L),L=1,14)
      CALL ASSIGN(13,LNAME,14,'OLD')
      READ(13,410) (LNAME(L),L=1,14),NUMF,(FMT(M),M=1,60)
410   FORMAT(14A1,I3,60A1)
      DO 420 I = 1, NUMF
      READ(13,4) (FNAMES(M,I),M=1,14)
      WRITE(7,425) (FNAMES(M,I),M=1,14)
425   FORMAT(1X,14A1)
      CALL ASSIGN(12,FNAMES(1,I),14,'OLD')
      READ(12,FMT) DUM, DUM
      READ(12,FMT) (DAT(I,J),J=1,100)
      CALL CLOSE(12)
420   CONTINUE
      CALL CLOSE(13)
C     COMPUTE SUM(N) AND SSQ(N) FOR N FREQUENCIES OVER NUMF SPECTRA.
      I1 = 3
      I2 = 102
      RN = FLOAT(NUMF)
      ND = 0
      DO 620 I=1,N-1
      DO 610 J=I+1,N
      SUMX = 0.
      SUMY = 0.
      SUMXX = 0.
      SUMYY = 0.
      SUMXY = 0.
      DO 600 K=1,NUMF
      SUMX = SUMX + DAT(K,I)
      SUMY = SUMY + DAT(K,J)
      SUMXX= SUMXX+ DAT(K,I)*DAT(K,I)
      SUMYY= SUMYY+ DAT(K,J)*DAT(K,J)
      SUMXY= SUMXY+ DAT(K,I)*DAT(K,J)
600   CONTINUE
      XBAR = SUMX / RN
      YBAR = SUMY / RN
      SDX = (SUMXX/RN - XBAR*XBAR)**0.5
      SDY = (SUMYY/RN - YBAR*YBAR)**0.5
      SDXSDY = SDX*SDY
      TEST = SUMXY/RN - XBAR*YBAR
      IF(SDXSDY .GT. 1.E-10) GO TO 605
      R = 1.0
      GO TO 606
605   R = ABS(TEST / SDXSDY)
606   DMAT(INDEX(I,J)) = R
      ND = ND+ 1
CD    WRITE(7,666) R,I,J
CD666 FORMAT(1X,'R=',E15.7,' FOR ',I3,' VS.',I3,' DMAT')
610   CONTINUE
620   CONTINUE
      CALL ASSIGN(13,'PROXTP.DAT',10,'NEW')
      WRITE(13,700) N
700   FORMAT(I3,'-1'/ ' (10F8.5)')
      DO 710 I = 1, N-1
      WRITE(13,720) (DMAT(INDEX(I,J)),J=I+1,N)
710   CONTINUE
720   FORMAT(10F8.5)
      CALL CLOSE(13)

 

 

      TYPE 800
800   FORMAT(' EXITING DENCOR. HAVE YOU SET TTY WIDTH=132 ?'
     &/' IF NOT, INTERRUPT DENCOR AND DO FOLLOWS.'
     &//' .SET TTY WIDTH=132 {RET}
     &'/' .RUN ORDPR {RET}'//' IF YOU HAVE, JUST WAIT.'/////)
C --- PROGRAM CHAINING TO ORDPR (INITIAL PART OF
C     HIERARCHICAL CLUSTERING BY COMPLETE-LINK METHOD
      CALL CHAIN(PROG,,0)
      CALL EXIT
      END

      INTEGER FUNCTION INDEX(I,J)
      COMMON /NUMF/ N
      INTEGER ROW,COL
      ROW = MIN0(I,J)
      COL = MAX0(I,J)
      INDEX = (ROW-1)*N-ROW*(ROW+1)/2+COL
      RETURN
      END
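Two pieces of DENCOR lend themselves to a short illustration: the packing of the upper triangle of the similarity matrix into a one-dimensional array (function INDEX), and the absolute correlation computed from running sums. The Python sketch below is an assumption-laden illustration, not the author's code; the function names are hypothetical. The statistic is the product-moment formula evaluated on the raw values, with the same fallback to 1.0 when the product of standard deviations is negligible.

```python
import math


def packed_index(i, j, n):
    """1-based position of pair (i, j), i != j, in the packed upper triangle."""
    row, col = min(i, j), max(i, j)
    return (row - 1) * n - row * (row + 1) // 2 + col


def abs_correlation(x, y, eps=1e-10):
    """Absolute product-moment correlation of two equal-length sequences."""
    n = len(x)
    xbar = sum(x) / n
    ybar = sum(y) / n
    var_x = sum(v * v for v in x) / n - xbar * xbar
    var_y = sum(v * v for v in y) / n - ybar * ybar
    sd_product = math.sqrt(max(var_x, 0.0)) * math.sqrt(max(var_y, 0.0))
    if sd_product <= eps:
        return 1.0          # same fallback value as in the listing
    cov = sum(a * b for a, b in zip(x, y)) / n - xbar * ybar
    return abs(cov / sd_product)
```

For N = 100 frequency components the packed array needs 100 * 99 / 2 = 4950 entries, which matches the dimension of DMAT.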


C     FILE: FRATIO.FOR
C******************************************************************
C
C     FRATIO IS DESIGNED TO COMPUTE A SET OF F-RATIOS OF IDS AND LTS
C     FEATURES WHICH ARE SENT AS SUBSETS BY THE PRECEDING CLUSTERING
C     PROGRAMS.
C
C     INPUT:
C     (1) A MASTER FILE WHICH CONTAINS NAMES OF IDS FILES.
C     (2) HNLIST.TMP, TEMPORARY FILE CREATED BY THE PROGRAM
C         DENCOR, AND DENDRO. HNLIST.TMP HOLDS INFORMATION
C         ABOUT CLUSTERING GROUPS, AND NODES (OR, DISCRETE
C         FREQUENCY COMPONENTS) IN EACH CLUSTERING.
C
C     OUTPUT:
C     (1) F-RATIO'S SUMMARIZED IN THE TABLE WITH OTHER INFO'S.
C
C     WRITTEN BY: HIROTAKA NAKASONE
C     DATE:       30-MAY-83
C
C     DEPARTMENT OF AUDIOLOGY AND SPEECH SCIENCES,
C     MICHIGAN STATE UNIVERSITY, EAST LANSING, MICHIGAN.
C
C******************************************************************
      COMMON /SENSE/ X, F
      COMMON /SENSEP/ NODLST
      REAL A(5,10,100), X(5,10,30), F(30)
      INTEGER NODLST(100),IEND(20),NG,NP
      LOGICAL*1 FNAMES(14,50),LNAME(14),FMT(60),YN
      LOGICAL*1 MYNAME(10)
      DATA NG/10/,NP/5/,NS/10/,MAXNOD/30/
      DATA LNAME(1)/'D'/,LNAME(2)/'K'/,LNAME(3)/'0'/,LNAME(4)/':'/
C     CHECK IF CHAINED TO, OR RUN FROM THE KEYBOARD.
      CALL RCHAIN(IF,MYNAME,5)
      IF(IF.EQ.-1) WRITE(7,1010) (MYNAME(L),L=1,10)
      IF(IF.NE.-1) TYPE 1020
1010  FORMAT(1X,/////,' PROGRAM FRATIO: FILE SENT =
     &',10A1,//)
1020  FORMAT(1X,/////,' PROGRAM FRATIO:'//)
5     FORMAT(1A1)
999   FORMAT(1X,/)
      CALL ASSIGN(12,'HNLIST.TMP',10,'OLD')
      READ(12,10) NODES
10    FORMAT(I3,/)
      READ(12,20) (NODLST(I),I=1,NODES)
20    FORMAT(20I4)
      CALL CLOSE(12)
D     WRITE(7,22) (NODLST(I),I=1,NODES)
D22   FORMAT(1X,20I4)
      IF(IF.EQ.-1) GO TO 3000
      TYPE 30
30    FORMAT(' Master file name (DEV:6A1.EXT) ?',$)

 

 

      READ(5,40) (LNAME(L),L=1,14)
40    FORMAT(14A1)
      GO TO 3500
3000  CONTINUE
      DO 3510 IM = 1,10
      LNAME(IM+4) = MYNAME(IM)
3510  CONTINUE
3500  TYPE 50
50    FORMAT(' Number of clusters ?',$)
      ACCEPT 60, NCLUST
60    FORMAT(I2)
      TYPE 70
70    FORMAT(' Enter ending node location for each cluster.'//)
      IBEG = 1
      DO 80 I = 1, NCLUST
95    TYPE 90, I
90    FORMAT(' For cluster ',I2,' :',$)
      ACCEPT 61, IEND(I)
61    FORMAT(I3)
      NC = IEND(I) - IBEG + 1
      IF(NC.LE.MAXNOD) GO TO 99
      TYPE 98
98    FORMAT(' NUMBER OF NODES OUT OF RANGE.')
      GO TO 95
99    TYPE 100, NC
100   FORMAT(' Total number of nodes = ',I2,' OK ?',$)
      ACCEPT 5, YN
      IF(YN.EQ.78) GO TO 95
      IBEG = IEND(I) + 1
80    CONTINUE
      TYPE 999
D888  FORMAT(10X,'X    A    I    J    K'/)
      CALL ASSIGN(13,LNAME,14,'OLD')
      READ(13,410) (LNAME(L),L=1,14),NUMF,(FMT(M),M=1,60)
410   FORMAT(14A1,I3,60A1)
      I = 1
      J = 1
      DO 420 IX = 1, NUMF
      READ(13,40) (FNAMES(M,IX),M=1,14)
      CALL ASSIGN(12,FNAMES(1,IX),14,'OLD')
      READ(12,FMT) DUMMY,DUMMY,(A(I,J,K),K=1,100)
      TYPE 999
      CALL CLOSE(12)
      I = I + 1
      IF(I.LE.NP) GO TO 420
      I = 1
      J = J + 1
420   CONTINUE
      CALL CLOSE(13)
      IBEG = 1
      KCLUST = 0
      TYPE 999
D     TYPE 888
800   KCLUST = KCLUST + 1
      ND = IEND(KCLUST) - IBEG + 1
      KNT = 0
      DO 500 K = IBEG, IEND(KCLUST)
      KNT = KNT + 1
      DO 600 J = 1, NG

 

 

 

      DO 700 I = 1, NP
      X(I,J,KNT) = A(I,J,NODLST(K))
D     WRITE(7,777) X(I,J,KNT),A(I,J,NODLST(K)),I,J,K,NODLST(K),KNT
D777  FORMAT(1X,F14.5,F14.5,5X,3I5,' NODLST(K)=',I3,' KNT=',I3)
700   CONTINUE
600   CONTINUE
500   CONTINUE
D     TYPE 999
      CALL SUBFR(NP,NG,ND)
      CALL PRTFR(F,KCLUST,ND,IBEG,IEND(KCLUST))
      IBEG = IEND(KCLUST) + 1
      IF(KCLUST.LT.NCLUST) GO TO 800
      STOP' DONE. NORMAL EXIT.'
      END

C*****************************************
      SUBROUTINE PRTFR(F,KCL,ND,IB,IE)
C*****************************************
      COMMON /SENSEP/ NODLST
      INTEGER NODLST(100)
      REAL F(30)
D     WRITE(7,50) IB,IE
D50   FORMAT(1X,' IB =',I4,' IE =',I4,$)
      IFREQ = INT(FINTV*2.)
D     WRITE(7,60) FINTV,IFREQ
D60   FORMAT('+','FINTV =',F8.2,' IFREQ =',I5,/)
      TYPE 1, KCL
1     FORMAT(1X,///,T5,'F-RATIO TABLE FOR CLUSTER NO. ',I2,/)
      TYPE 2
2     FORMAT(1X,T5,'FREQUENCY(HZ)',T20,'SEQUENCE # ',T40,
     &'F-RATIO'/,T5,'-------------',T20,'-----------',T40,
     &'-------'/)
      K = 0
      DO 5 I = IB, IE
      K = K + 1
D     TYPE 55,NODLST(I)
D55   FORMAT(' NODLST(I)=',I5)
      KF = IFREQ + INT( FINTV * FLOAT(NODLST(I)) + 0.5)
      WRITE(7,10) KF, NODLST(I),F(K)
10    FORMAT(6X,I4,T23,I3,T33,F14.3)
5     CONTINUE
      TYPE 20
20    FORMAT(1X,//,T5,'-------------------------------------------'/)
      RETURN
      END

C**********************************
      SUBROUTINE SUBFR(NP,NG,ND)
C     COMPUTATION PART FOR FRATIOS
C     BY H. NAKASONE
C**********************************
      COMMON /SENSE/ X, F
      REAL X(5,10,30), SUM(10),SSQ(10),F(30)
C     ND: HOLDS NUMBER OF DIMENSIONS IN EACH NP.
C     NP: HOLDS NUMBER OF PATTERNS IN EACH NG.
C     NG: HOLDS NUMBER OF SPEAKERS (OR GROUPS).
      DFB = FLOAT(NG - 1)
      DFW = FLOAT(NG*NP - NG)
      GN = FLOAT(NP*NG)
      SN = FLOAT(NP)
      SUM(J) = 0.
      SSQ(J) = 0.
      DO 3 I = 1, NP
      SUM(J) = SUM(J) + X(I,J,K)
      SSQ(J) = SSQ(J) + X(I,J,K)*X(I,J,K)
3     CONTINUE
      GSUM = GSUM + (SUM(J)*SUM(J))/SN
      GSSQ = GSSQ + SSQ(J)
      T = T + SUM(J)
      CONTINUE
      SSTOT = GSSQ - (T*T)/GN
      SSBET = GSUM - (T*T)/GN
      SSWIT = SSTOT - SSBET
      BGMS = SSBET / DFB
      WGMS = SSWIT / DFW
D     TSWG = 0
D     DO 4 J = 1, NG
D     TSWG = TSWG + ( SSQ(J) - (SUM(J)*SUM(J))/ SN )
D4    CONTINUE
D     WRITE(7,10) SSTOT,SSBET,SSWIT
D10   FORMAT(' SSTOT=',F14.3,' SSBET=',F14.3,' SSWIT=',F14.3)
D     WRITE(7,12) BGMS,WGMS,TSWG
D12   FORMAT(' BGMS=',F14.3,' WGMS=',F14.3,' TSWG=',F14.3)
      F(K) = 9999.9091
      IF(WGMS.GT.0.) F(K) = BGMS / WGMS
      CONTINUE
      RETURN
      END
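The quantity computed in SUBFR for each feature dimension is a one-way analysis-of-variance F-ratio: between-speaker mean square over within-speaker mean square. Since parts of the printed routine did not survive reproduction, the following Python sketch shows the computation as it can be read from the surviving statements. It is an illustration under stated assumptions: the function name, the input layout (one list of pattern values per speaker), and the sentinel returned when the within-speaker variance is not positive are all hypothetical choices.

```python
NO_WITHIN_VARIANCE = 9999.9   # illustrative sentinel, not taken from the listing


def f_ratio(groups):
    """One-way ANOVA F-ratio for equal-sized groups of observations."""
    n_groups = len(groups)
    n_per_group = len(groups[0])
    n_total = n_groups * n_per_group
    grand_sum = sum(sum(g) for g in groups)
    correction = grand_sum * grand_sum / n_total

    ss_total = sum(v * v for g in groups for v in g) - correction
    ss_between = sum(sum(g) ** 2 / n_per_group for g in groups) - correction
    ss_within = ss_total - ss_between

    ms_between = ss_between / (n_groups - 1)
    ms_within = ss_within / (n_total - n_groups)
    if ms_within > 0:
        return ms_between / ms_within
    return NO_WITHIN_VARIANCE
```

With 10 speakers and 5 patterns per speaker, as set in the main program, the degrees of freedom are 9 between speakers and 40 within speakers.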

 

 

 

 


C***************************************************************
C     FILE: WPICK.FOR
C
C     WPICK IS DESIGNED TO DETECT FUNDAMENTAL FREQUENCIES IN A
C     RUNNING SPEECH BY INTERACTIVE METHOD. THIS PROGRAM MUST BE
C     CHAINED TO PROGRAM HPICK, THEN TO FFPICK TO COMPLETE
C     MEASUREMENT OF 9 FEATURES OF FFC.
C
C     WRITTEN BY: H. NAKASONE
C     DATE:       26-JUN-83
C
C     AS A PART OF SERIES OF SOUND PROCESSING SOFTWARES
C     USED IN THE PH.D. DISSERTATION BY THE AUTHOR.
C
C     DEPARTMENT OF AUDIOLOGY AND SPEECH SCIENCES
C     MICHIGAN STATE UNIVERSITY
C
C***************************************************************

      COMMON /SENSE/ MAXPIC,NBLOCK,NAME12,NSR
      COMMON /ICSPEC/ ISPEC(39)
      COMMON /BUFF/ NBUF,XMAX,XMIN,YMAX,YMIN,YRANGE
      DOUBLE PRECISION HPROG, EXT
      INTEGER*4 JLEN
      INTEGER NDEV, NEXT, NBUF(3600)
      LOGICAL*1 NAME6(6), NAME12(12), YN, BEL, ONE
      LOGICAL*1 ZERO, CHA, REPT, VTAB, DOLBY
      DATA EXT/3RSND/, NSR/10000/, IFILTR/3/, BEL/"007/
      DATA ONE/.TRUE./, ZERO/.FALSE./, REPT/.FALSE./, VTAB/"013/
      DATA HPROG/12RDK1HPICK SAV/
      KNIT = 3600
      TYPE 1
1     FORMAT(' PROGRAM WPICK: ',///,1X,' Want DOLBY?',$)
      ACCEPT 77,YN
77    FORMAT(A1)
      DOLBY = .FALSE.
      IF(YN.EQ.89) DOLBY = .TRUE.
500   CALL INIT(NBUF,KNIT)
      CALL SCROL(1,1000)
      TYPE 2000, VTAB
2000  FORMAT(1X,A1)
      CALL SCROL(2,100)
      CALL APNT(0.,200.,1,-5,0,4)
      CALL APNT(0.,700.,1,-7,0,1)
      CALL SUBP(1)
      CALL TEXT(' READY ANALOG INPUT.',2,' WHEN READY,
     $ HIT RETURN KEY TO START.')
      CALL ESUB
      CALL OFF(1)
      CALL APNT(0.,800.,0,-7,1,1)

 

      CALL SUBP(2)
      CALL TEXT(' CONVERSION IN PROCESS. ')
      CALL ESUB
      CALL OFF(2)
      ICHAN = IGETC()
      IF(ICHAN.LT.0) STOP' ICHAN NOT AVAILABLE'
      TYPE 2
2     FORMAT(' NUMBER OF SECONDS (I3) ?',$)
      ACCEPT 3, NSEC
      LEN = NSR/256
      RSEC = FLOAT(NSEC)
      SR = FLOAT(NSR)
      LENB = LEN * NSEC + 1
      TYPE 10
10    FORMAT(1X,'ENTER OUTPUT FILE NAME >',$)
      IF(ICSI(ISPEC,EXT,,,0).NE.0) GO TO 20
      IF(ISPEC(16).NE.0) GO TO 30
20    TYPE 22
22    FORMAT(' ?HARDWARE ERROR?'//)
      CALL EXIT
30    IF(IFETCH(ISPEC(16)).NE.0) STOP'FETCHING ERROR'
      NBLOCK = IENTER(ICHAN,ISPEC(16),-1)
      IF(NBLOCK.GE.LENB) GO TO 100
      TYPE 40, LENB, NBLOCK
40    FORMAT(1X,'NOT ENOUGH BLOCK SIZE.'/'
     & BLOCK SIZE REQUESTED = ',I4,' AND ALLOCATED =',I4)
      CALL EXIT
100   TYPE 7, BEL
      CALL R50ASC(12,ISPEC(16),NAME12(1))
      CALL JAFIX(SR*RSEC+.5, JLEN)
      CALL ON(1)
      CALL BIT12(ONE)
210   CHA = ITTINR()
      IF(CHA.NE.13) GO TO 210
      CALL BIT12(ZERO)
      CALL OFF(1)
      CALL ON(2)
      TYPE 7, BEL
      I=IDA(ICHAN,7,INT(0.04*SR),INT(0.01*SR),JLEN,,60,0)
      IF(I.NE.0) TYPE 225, I
225   FORMAT(' ERROR IDA I = ',I4)
      CALL CLOSEC(ICHAN)
      CALL IFREEC(ICHAN)
      CALL OFF(2)
      TYPE 250
250   FORMAT(1X,//,' SOUND INPUT PROCEDURE COMPLETED. WAIT...'//)
      CALL INIT(NBUF,KNIT)
      NWORD = 9
C --- CREATE SUBPICTURES OF SPEECH WAVES ---
      CALL SUBPIC(DOLBY)
C --- CHAINING TO PROGRAM HPICK ---
      CALL CHAIN(HPROG,MAXPIC,NWORD)
3     FORMAT(I3)
7     FORMAT('+',A1)
C     TO COMPILE, WPICK=WPICK/W
C     TO LINK, WPICK=WPICK,FUSS,BIT12,SUBPIC,GTLIB,SYSLIB/F
      END

 

C     FILE SUBROUTINE: SUBPIC.FOR
C*****************************************************************
C
C     Routine SUBPIC creates a set of display subpictures of the
C     speech wave and command characters required by the succeeding
C     program HPICK (main part of the interactive peak picking
C     technique). SUBPIC is called from program WPICK.
C
C     H. NAKASONE
C     JUN 27, 1983
C
C*****************************************************************

      SUBROUTINE SUBPIC(DOLBY)
      COMMON /SENSE/ MAXPIC, NBLOCK, NAME12, NSR
      COMMON /BUFF/ NBUF, XMAX, XMIN, YMAX, YMIN, YRANGE
      DOUBLE PRECISION DUMMY
      INTEGER IBUF(1024), NBUF(3600)
      LOGICAL*1 BEL, YN, DONE, NAME12(12), NAMOUT(10)
      LOGICAL*1 VTAB,NAME15(15),NAMER(6),DOLBY
      DATA VTAB/"013/, NSR/10000/
      DATA NAME12(10)/'S'/,NAME12(11)/'N'/,NAME12(12)/'D'/
      DATA NAME12(1)/'D'/,NAME12(2)/'K'/,NAME12(3)/'0'/
      DATA NAME15(1)/'D'/,NAME15(2)/'K'/,NAME15(3)/'0'/
      DATA NAME15(4)/':'/,NAME15(5)/'T'/,NAME15(6)/'M'/
      DATA NAME15(7)/'P'/,NAME15(11)/'.'/,NAME15(12)/'D'/
      DATA NAME15(13)/'P'/,NAME15(14)/'Y'/
      XMIN = 0.
      YMIN = -573.
      XMAX = 1023.
      YMAX = 450.
      YRANGE = 400.
      XEND = 1022.
      NOISE = 400
      KSIG = 6000
      LION = 8000
      IPRO = 7001
      IRPT = 7002
      ICOR = 7003
      IJUMP = 7004
      ISTART = 7005
      LWAIT = 9000
      IDOT = 9500
      IDOTC = 9600
      ISIG = 8500
      KNIT = 3600
      NPTS = 1024
      CALL SCROL(1,1000)
      TYPE 840,VTAB
840   FORMAT(1X,A1)
      CALL SCROL(1,12)
      CALL SCAL(XMIN,YMIN,XMAX,YMAX)
      CALL APNT(XMIN,YMAX,1,-8,0,1)
      CALL SUBP(LION)
      CALL LVECT(XMAX,0.,1)
      CALL LVECT(0.,-2.*YMAX,1)
      CALL LVECT(-XMAX,0.,1)

 

      CALL LVECT(0.,2.*YMAX,1)
      CALL APNT(0.,0.,0,-1,0,1)
      CALL LVECT(1023.,0.)
      CALL ESUB
      CALL ERAS(LION)
      CALL CMPRS
      CALL SAVE('DK0:FRAMER.DPY')
      CALL INIT(NBUF,KNIT)
      CALL SCAL(XMIN,YMIN,XMAX,YMAX)
      YBOT = -(YMAX+65.)
      CALL APNT(150.,YBOT,1,-8,0,1)
      CALL SUBP(ICOR)
      CALL TEXT('?Correction?')
      CALL ESUB
      CALL ERAS(ICOR)
      CALL CMPRS
      CALL SAVE('DK0:CORECT.DPY')
      CALL INIT(NBUF,KNIT)
      CALL SCAL(XMIN,YMIN,XMAX,YMAX)
      CALL APNT(800.,YBOT,1,-8,0,1)
      CALL SUBP(IRPT)
      CALL TEXT('?Repeat?')
      CALL ESUB
      CALL ERAS(IRPT)
      CALL CMPRS
      CALL SAVE('DK0:REPEAT.DPY')
      CALL INIT(NBUF,KNIT)
      CALL SCAL(XMIN,YMIN,XMAX,YMAX)
      CALL APNT(550.,YBOT,1,-8,0,1)
      CALL SUBP(IPRO)
      CALL TEXT('?Proceed?')
      CALL ESUB
      CALL ERAS(IPRO)
      CALL CMPRS
      CALL SAVE('DK0:PROCED.DPY')
      CALL INIT(NBUF,KNIT)
      CALL SCAL(XMIN,YMIN,XMAX,YMAX)
      CALL APNT(0.,YBOT-26.,-1,-8,0,1)
      CALL SUBP(ISTART)
      CALL TEXT(' Hit frame')
      CALL ESUB
      CALL ERAS(ISTART)
      CALL CMPRS
      CALL SAVE('DK0:STARTR.DPY')
      CALL INIT(NBUF,KNIT)
      CALL SCAL(XMIN,YMIN,XMAX,YMAX)
      CALL APNT(350.,YBOT,1,-8,0,1)
      CALL SUBP(IJUMP)
      CALL TEXT('?Jump?')
      CALL ESUB
      CALL ERAS(IJUMP)
      CALL CMPRS
      CALL SAVE('DK0:JUMPER.DPY')
      CALL INIT(NBUF,KNIT)
      CALL SCAL(XMIN,YMIN,XMAX,YMAX)
      CALL APNT(0.,YBOT,-1,-8,0,1)
      CALL SUBP(ISIG)
      CALL TEXT(' Hit wave')
      CALL ESUB
      CALL ERAS(ISIG)
      CALL CMPRS
      CALL SAVE('DK0:SIGNAL.DPY')
      CALL INIT(NBUF,KNIT)
      CALL SCAL(XMIN,YMIN,XMAX,YMAX)
      CALL APNT(300.,0.,-1,-8)

 

 

 

 

 

 

      CALL SUBP(LWAIT)
      CALL TEXT(' W A I T.')
      CALL ESUB
C     CALL ERAS(LWAIT)
C     CALL CMPRS
      CALL SAVE('DK0:WAITER.DPY')
      CALL INIT(NBUF,KNIT)
      CALL SCAL(XMIN,YMIN,XMAX,YMAX)
50    SR = FLOAT(NSR)
      ICHAN = IGETC()
      CALL IRAD50(12,NAME12(1),DUMMY)
      NBLOCK = LOOKUP(ICHAN,DUMMY)
      IF(NBLOCK.LT.-1) STOP ' BAD LOOKUP'
      IF(NBLOCK.EQ.-1) STOP ' FILE NOT FOUND'
      RSEC = 256.* FLOAT(NBLOCK)/SR
D     TYPE 60, NBLOCK, RSEC
D60   FORMAT(1X,I4,' BLOCKS(= ',F8.4,' SEC) FOR THIS FILE.'//)
      MAXBLK = NBLOCK - (NPTS/256)
      NBLK = 0
      KSIG = 6000
      NAMSIG = 0
1000  NW = IREADW(NPTS,IBUF,NBLK,ICHAN)
      NAMSIG = NAMSIG + 1
      MAX = 0
C     CHECK IF NOISE REDUCTION BY DOLBY DESIRED.
      IF(DOLBY) GO TO 53
C     COME HERE IF NO DOLBY REQUESTED.
      DO 1550 I = 1, 1024
      IF(IABS(IBUF(I)).GT.MAX) MAX = IABS(IBUF(I))
1550  CONTINUE
      GO TO 55
C     COME HERE TO DO DOLBY.
53    DO 1553 I = 1, 1024
      ITEMP = 0
      IF(IBUF(I).GT.NOISE) ITEMP = IBUF(I) - NOISE
      IF(IBUF(I).LT.-NOISE)ITEMP = IBUF(I) + NOISE
      IBUF(I) = ITEMP
      IF(IABS(ITEMP).GT.MAX) MAX = IABS(ITEMP)
1553  CONTINUE
55    FAC = 1.0
      IF(MAX.GT.400) FAC = 400. / MAX
      DO 560 I = 1, NPTS
      IBUF(I) = IBUF(I) * FAC
560   CONTINUE
      CALL APNT(0.,0.,9,-8,0,1)
      CALL SUBP(KSIG)
      CALL LVECT(0.,FLOAT(IBUF(1)))
      DO 565 I = 2,NPTS
      DY = FLOAT(IBUF(I) - IBUF(I-1))
      CALL LVECT(1.,DY)
565   CONTINUE
      CALL ESUB
      ND = NAMSIG / 100
      NAME15(8) = ND + "60
      ND = MOD(NAMSIG,100)
      NAME15(9) = ND / 10 + "60
      ND = MOD(ND,10)
      NAME15(10) = ND + "60
      NAME15(15) = 0

 

 

 

 
 

      CALL ERAS(KSIG)
      CALL CMPRS
      CALL SAVE(NAME15)
D     WRITE(7,700) NAME15, KSIG, NAMSIG
D700  FORMAT(1X,'DONE FOR ',15A1,3X,' KSIG=',I5,
     &' AND NAMSIG=',I5,/)
      CALL INIT(NBUF,KNIT)
      CALL SCAL(XMIN,YMIN,XMAX,YMAX)
      NBLK = NBLK + 4
      IF(NBLK.LE.MAXBLK) GO TO 1000
300   MAXPIC = NAMSIG
      CALL CLOSEC(ICHAN)
      CALL IFREEC(ICHAN)
      RETURN
      END
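The option that the listing calls "DOLBY" is a simple centre-clipping noise gate followed by peak scaling for the display. The Python sketch below illustrates that step only; the function name and keyword arguments are assumptions, and the defaults (a noise threshold of 400 and a display full scale of 400) are the constants that appear in SUBPIC.

```python
def gate_and_scale(samples, noise=400, full_scale=400, gate=True):
    """Centre-clip samples by `noise`, then scale so the peak fits `full_scale`."""
    if gate:
        gated = []
        for s in samples:
            if s > noise:
                gated.append(s - noise)      # shrink positive excursions
            elif s < -noise:
                gated.append(s + noise)      # shrink negative excursions
            else:
                gated.append(0)              # inside the noise band
    else:
        gated = list(samples)

    peak = max((abs(s) for s in gated), default=0)
    factor = full_scale / peak if peak > full_scale else 1.0
    return [int(s * factor) for s in gated]  # truncation, as integer assignment does
```

For instance, `gate_and_scale([100, -100, 600, -1200])` yields `[0, 0, 100, -400]`: the two small samples fall inside the noise band, and the remaining peak of 800 is halved to fit the display range.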

 

C     FILE: BIT12.FOR
C********************
C     SUBROUTINE TO ACTIVATE/DEACTIVATE A BIT 12 FOR
C     SPECIAL KEY MODE.
C
C     WRITTEN BY H. NAKASONE
C     08-JUN-1983
C********************
      SUBROUTINE BIT12(KON)
      LOGICAL*1 KON
C     DEACTIVATE SPECIAL KEY MODE FOR KON = .FALSE..
      IF(.NOT.KON) CALL IPOKE("44,IPEEK("44).AND."167777)
C     ACTIVATE SPECIAL KEY MODE FOR KON = .TRUE..
      IF(KON) CALL IPOKE("44,IPEEK("44).OR."10000)
      RETURN
      END
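The two masks used in BIT12 are complementary within a 16-bit word: octal 10000 selects bit 12, and octal 167777 is every other bit. The following Python sketch only demonstrates that bit arithmetic; the function name is hypothetical, and nothing here touches a real status word.

```python
SPECIAL_MODE_BIT = 0o10000   # bit 12 of a 16-bit word
CLEAR_MASK = 0o167777        # all 16 bits except bit 12


def set_special_mode(status_word, enable):
    """Return `status_word` with bit 12 set (enable) or cleared (disable)."""
    if enable:
        return status_word | SPECIAL_MODE_BIT
    return status_word & CLEAR_MASK
```

Setting and then clearing the bit returns the original word whenever bit 12 was clear to begin with, which is how WPICK brackets its wait for the RETURN key.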

 

 

 


C     FILE HPICK.FOR
C******************************************************************
C
C     HPICK is designed to measure Fo's (fundamental frequency)
C     in speech signal directly from the time domain by the use
C     of 'Interactive peak detecting technique'.
C
C     Input:  File of display subpictures created by WPICK.
C     Output: Data file which contains amplitudes (in absolute value)
C             and periods (in number of sampled points).
C
C     Requirement: (1) Light pen attached to the CRT.
C                  (2) This program must be chained to FFPICK to
C                      complete a FFC feature file.
C
C     H. NAKASONE
C     JUN 27 1983
C     Dept. of Audiology and Speech Sciences, MSU, East Lansing, MI.
C
C******************************************************************
      COMMON /FFC/ KT, NAMOUT
      COMMON /SENSE/ MAXPIC, NBLOCK, NAME12, NSR
      COMMON /BUFF/ NBUF, XMAX, XMIN, YMAX, YMIN, YRANGE
      COMMON /ICSPEC/ ISPEC(39)
      DOUBLE PRECISION DUMMY, FPROG
      INTEGER IBUF(1024), NBUF(4000), NAMP(1500)
      INTEGER NTIME(100), NTAO(1500)
      LOGICAL*1 BEL, YN, DONE, NAME12(12), YNEND, NAMOUT(10)
      LOGICAL*1 DEBUG,VTAB, NAME(15), KNTP(4),LNAME(5)
      EQUIVALENCE (X0,X),(Y0,Y),(M0,M),(N0,N)
      DATA DUMMY/12RDK1TESTERSND/,VTAB/"013/
      DATA BEL/"7/,NAME12(10)/'S'/,NAME12(11)/'N'/,NAME12(12)/'D'/
      DATA NAME12(1)/'D'/,NAME12(2)/'K'/,NAME12(3)/'0'/
      DATA NAMOUT(10)/'C'/,NAMOUT(9)/'F'/,NAMOUT(8)/'F'/
      DATA NAMOUT(7)/'.'/
      DATA NSR/10000/
      DATA NAME(1)/'D'/,NAME(2)/'K'/,NAME(3)/'0'/,NAME(4)/':'/
      DATA NAME(5)/'T'/,NAME(6)/'M'/,NAME(7)/'P'/,NAME(11)/'.'/
      DATA NAME(12)/'D'/,NAME(13)/'P'/,NAME(14)/'Y'/
      DATA FPROG/12RDK1FFPICKSAV/
      CALL RCHAIN(IF,MAXPIC,9)

C INITIALIZE.

333

334

DO 333 I

DO 334 I
NTIME(

      IF(IF.EQ.-1) GO TO 2222
      TYPE 2224
2224  FORMAT(' NUMBER OF DISPLAY PICTURES ?',$)
      ACCEPT 2226,MAXPIC
2226  FORMAT(I3)
      TYPE 2228
2228  FORMAT(' FILE NAME (6A1) >',$)
      READ(5,2230) (NAME12(M),M=4,9)
2230  FORMAT(6A1)
2222  XMIN = 0.
      YMIN = -573.
      XMAX = 1023.
      YMAX = 450.
      YRANGE = 400.
      XEND = 1022.

 

 

 

      KSIG = 6000
      LION = 8000
      IPRO = 7001
      IRPT = 7002
      ICOR = 7003
      IJUMP = 7004
      ISTART = 7005
      IDOT = 9500
      IDOTC = 9600
      ISIG = 8500
      KNIT = 4000
      NPTS = 1024
      ENCODE(3,10,KNTP) MAXPIC
   10 FORMAT(I3)
      KNTP(4) = 0
      DEBUG = .FALSE.
D     TYPE 3010
D3010 FORMAT(' DEBUG ?',$)
D     ACCEPT 805,YN
D     IF(YN.EQ.89) DEBUG = .TRUE.
      SR = FLOAT(NSR)
      DO 35 I = 4, 9
         NAMOUT(I-3) = NAME12(I)
   35 CONTINUE
      TIMOLD = 0.
      TAOOLD = 0.
      AMPOLD = 0.
      KT = 0
      NU = 0
      NBLK = 0
      KOUNT = 0
 1111 CONTINUE
      CALL INIT(NBUF,KNIT)
      CALL SCROL(1,1000)
      TYPE 840,VTAB
  840 FORMAT(1X,A1)

      CALL SCROL(1,12)
      CALL SCAL(XMIN,YMIN,XMAX,YMAX)
      CALL SUBP(LION)
      CALL RSTR('DK0:FRAMER.DPY')
      CALL ESUB
      CALL OFF(LION)
      YBOT = -(YMAX+65.)
      CALL APNT(150.,YBOT,1,-8,0,1)
      CALL SUBP(ICOR)
      CALL RSTR('DK0:CORECT.DPY')
      CALL ESUB
      CALL OFF(ICOR)
      CALL APNT(800.,YBOT,1,-8,0,1)
      CALL SUBP(IRPT)
      CALL RSTR('DK0:REPEAT.DPY')
      CALL ESUB
      CALL OFF(IRPT)
      CALL APNT(550.,YBOT,1,-8,0,1)
      CALL SUBP(IPRO)
      CALL RSTR('DK0:PROCED.DPY')
      CALL ESUB
      CALL OFF(IPRO)
      CALL APNT(0.,YBOT-26.,-1,-8,0,1)
      CALL SUBP(ISTART)
      CALL RSTR('DK0:STARTR.DPY')
      CALL ESUB
      CALL OFF(ISTART)
      CALL APNT(350.,YBOT,1,-8,0,1)
      CALL SUBP(IJUMP)
      CALL RSTR('DK0:JUMPER.DPY')
      CALL ESUB
      CALL OFF(IJUMP)
      CALL APNT(0.,YBOT,-1,-8,0,1)
      CALL SUBP(ISIG)
      CALL RSTR('DK0:SIGNAL.DPY')
      CALL ESUB
      CALL OFF(ISIG)

 1000 NU = NU + 1
      LASTK = 0
      RMAX = 0.0
      ND = NU / 100
      NAME(8) = ND + '60
      LNAME(1)= NAME(8)
      ND = MOD(NU,100)
      NAME(9) = ND / 10 + '60
      LNAME(2)= NAME(9)
      ND = MOD(ND,10)
      NAME(10) = ND + '60
      LNAME(3) = NAME(10)
      NAME(15) = 0
      LNAME(4) = 45
      LNAME(5) = 0
      CALL ERAS(5000)
      CALL APNT(860.,-440.,-5,-8,0,1)
      CALL SUBP(5000)
      CALL TEXT(LNAME)
      CALL APNT(914.,-440.,-5,-8)
      CALL TEXT(KNTP)
      CALL ESUB
      CALL SUBP(KSIG)
      CALL RSTR(NAME)
      CALL ESUB
      CALL ON(LION)
  450 TX = -1.0
      X = 0.0
      LASTK = 0
  550 K = 0
      KLION = 1000
      CALL ON(ISTART)
C COMMENCE THE FIRST LIGHT PEN HIT TESTING ON THE FRAME.
      CALL ERAS(5050)
   90 CALL LPEN(M0,N0,X0,Y0)
      IF(M.NE.1 .OR. N.NE.LION .OR. X.LT.TX) GO TO 90

 

 

100

101

106

1008

196

IF(ABS(XEND-X).LE.1.0) GO TO 300
TYPE 79 BEL

CALL ON(ISIG)

CALL OFF(ISTART)

= X
TY = Y
CALL LPEN(M09N09XO9Y0)

IF(M.NE.1 .OR. N.NE.KSIG .OR. X-TX.LT.0.0) GO TO 100
CALL ON(ISTART)

K = K + 1
Ex = Tx
EY = TY
KX = INT(X) + 1
IX = KX

IBX = IABS(IBUF(KX))

KX = KX + 1
IF(IABS(IBUF(KX)).LT.IBX) GO TO 106
IBX = IABS(IBUF(KX))

GO TO 105 _

KX = KX - 1

CALL APNT(TX9TY9-29-39091)
CALL SUBP(K)

CALL LVECT(X-TX9Y-TY)

CALL ESUB

NTIME(K) = KX

IF(K.LT.2) GO TO 1008

FREQ = SR / FLOAT(NTIME(K)-NTIME(K-1)+1)
CALL ERAS(5050)

CALL APNT(86O.9410.9-19-89091)

CALL NMBR(5O509FREQ9’F6.1’)

CALL ON(ICOR)
TX = FLOAT(KX)
TY = Y
CALL LPEN(M09N09X09Y0)

IF(N.NE.1) GO TO 110

IF(N.EQ.KSIG .AND. X-TX.GT.20.) GO TO 101
IF(NOEGOLION .AND. x—TXOGEOOOO) GO TO 160
IF(N.NE.ICOR) GO TO 110

COME HERE IF N = ICOR TO CORRECT PREVIOUSLY DROUN VECTOR.

145

CALL ERAS(K)

CALL CMPRS
K = R -1
Tx = Ex
TY = EY

CALL OFF(ICOR)
CALL ON(ISIG)
GO TO 100

COME HERE IF N = LION TO CHECK OPTIONS MADE.

160

DX = X - TX

DY = Y - TY

KLION = KLION + 1
CALL APNT(TX9TY909'39091)
CALL SUBP(KLION)
CALL LVECT(DX9DY)
CALL ESUB

TYPE 79 DEL

CALL ON(IJUMP)
CALL ON(IPRO)
CALL OFF(ICOR)
CALL ON(IRPT)

 


CALL OFF(ISIG)
CALL OFF(ISTART)

TX = X
TY = Y
180 CALL LPEN(M09N09X09Y0)

IF(M.NE.1) GO TO 180

IF(N.NE.IJUMP .AND. N.NE.IPRO .AND. N.NE.IRPT) GO TO 180
CALL OFF(IJUMP)

CALL OFF(IPRO)

CALL OFF(IRPT)

TYPE 79 BEL

IF(N.EQ.IRPT) GO TO 250

IF(K.GE.2) GO TO 200

CALL ERAS(K)

CALL ERAS(KLION)

CALL CMPRS

K = K - 1

KLION = KLION - 1
IF(N.EQ.IPRO) GO TO 300
CALL OFF(IJUMP)

CALL OFF(IPRO)

CALL OFF(ISIG)

CALL ON(ISTART)

Ex = Tx

EY = TY

GO TO 90
200 KT = KT + 1

IF(KT.GT.1500) GO TO 600
NAMP(KT) = IABS(IBUF(NTIME(K+1)))
NTAO(KT) = O

TIMNEU = FLOAT(NTIME(1))

TAONEU = TIMOLD + TIMNEU

CD IF(DEBUG) TYPE 22069TIMOLD9TAOOLD
CD2206 FORMAT(’ TIMOLD =’9F8.09’ TAOOLD =’9F8.0)
CD IF(DEBUG) TYPE 22059TIMNEU9TAONEU
CD2205 FORMAT(’ TIMNEU =’9F8.09’ TAONEU =’9F8.0)
IF(TAONEU.LE.0.) GO TO 205
IF( (ABS(TAONEU-TAOOLD)/TAONEU).GT. 0.25) GO TO 205
NTAO(KT) = INT(TAONEU)
205 CONTINUE
D IF(DEBUG) TYPE 2179 NAMP(KT)9NTAO(KT)9KT
DO 210 J = 29 K
KT = KT + 1
NAMP(KT) = IABS(IBUF(NTIME(J)))
NTAO(KT) = NTIME(J) - NTIME(J-l)
D IF(DEBUG) TYPE 2179 NAMP(KT)9NTAO(KT)9KT
210 CONTINUE

D217 FORMAT(’ NAMP=’9I69’ NTAO=’9I69’ KT=’9I4)
LASTK = LASTK + K

250 IF(K.LE.0) GO TO 305
D0 310 I = 19 K
CALL ERAS(I)
310 CONTINUE
305 IF(KLION.LE.1000) GO TO 330
D0 320 I = 10019 KLION
CALL ERAS(I)
320 CONTINUE
330 CALL CMPRS

IF(N.EQ.IRPT) GO TO 400
IF(N.EQ.IPRO) GO TO 300

 

198

IF(TX.GE. XEND-10.) GO TO 300

COME HERE IF N = IJUMP.

400

300

600

610

630
620

800
805
810
820
606

CALL OFF(IJUMP)
CALL OFF(IPRO)

CALL OFF(ISIG)

CALL ON(ISTART)
GO TO 550

CALL OFF(IJUMP)
CALL OFF(IRPT)
CALL OFF(IPRO)
CALL OFF(ICOR)
CALL OFF(ISIG)
KT = KT - LASTK
GO TO 450

IF(NU .GE. MAXPIC) GO TO 600
CALL OFF(IJUMP)
CALL OFF(IRPT)
CALL OFF(IPRO)
CALL OFF(ICOR)
CALL OFF(ISIG)
CALL OFF(ISTART)
CALL OFF(LION)
CALL ERAS(KSIG)

CALL CMPRS
TIMOLD = 0.
TAOOLD = 0.

IF(N.NE.IPRO) GO TO 1234
TIMOLD = XEND - FLOAT(NTIME(K))
TAOOLD = FLOAT(NTAO(KT))

CONTINUE

KOUNT = KOUNT + 1

TYPE 79 BEL

IF(KOUNT.LE.5) GO TO 1000

KOUNT = 0

CALL INIT(NBUF9KNIT)

GO TO 1111

  600 CALL INIT(NBUF,KNIT)
      CALL FREE
      TYPE 605,NAMOUT
  605 FORMAT(' WRITING FFC RECORD FOR: ',10A1,/)
      CALL ASSIGN(13,NAMOUT,10,'NEW')
      WRITE(13,610) (NAMOUT(M),M=1,10),KT,NSR
  610 FORMAT(10A1,I5,I6,' (3I7)'/)
      DO 620 I = 1, KT
         WRITE(13,630) NAMP(I),NTAO(I),I
  630    FORMAT(3I7)
  620 CONTINUE
      CALL CLOSE(13)
      TYPE 820
      CALL CHAIN(FPROG,KT,10)
    7 FORMAT('+',A1)
  800 FORMAT('+',A1)
  805 FORMAT(A1)
  810 FORMAT(' OK (Y/N) ?',$)
  820 FORMAT(' CHAINING TO FFPICK.'/)
  606 FORMAT(10A1)
      END
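
The peak search and period bookkeeping in the listing above can be restated compactly. The Python sketch below is an illustration only, not the author's code: the function names `refine_peak` and `periods_from_peaks` are hypothetical, and the end-of-buffer check is an added assumption. It mirrors the loop that walks forward from the light-pen position while the absolute sample value does not decrease, and the later step that stores each period as the difference between successive peak indices.

```python
def refine_peak(samples, start):
    """Walk forward from `start` while |sample| does not decrease.

    Returns the index of the last sample before the first decrease,
    as in the KX = KX + 1 ... KX = KX - 1 loop of the listing.
    The bounds check is an assumption added for safety.
    """
    k = start
    best = abs(samples[k])
    while k + 1 < len(samples) and abs(samples[k + 1]) >= best:
        k += 1
        best = abs(samples[k])
    return k


def periods_from_peaks(peak_indices):
    """Periods in sampled points: differences of successive peak indices."""
    return [b - a for a, b in zip(peak_indices, peak_indices[1:])]


if __name__ == "__main__":
    buf = [0, 3, 5, 9, 4, 2, 8]
    print(refine_peak(buf, 1))
    print(periods_from_peaks([3, 103, 205]))
```

With the 10000 Hz sampling rate set in the DATA statement (NSR), a period of 100 sampled points corresponds to a fundamental frequency of 100 Hz.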

 

 

199

C FILE FFPICK.FOR
C****************************************************************
C  FFPICK COMPUTES SEVERAL MEASURES BASED UPON FUNDAMENTAL
C  FREQUENCY CONTOURS.
C
C  INPUT:  FFC DATA FILE CONTAINING PERIODS (STORED IN THE NUMBER
C          OF SAMPLED POINTS) AND ABSOLUTE AMPLITUDES.
C
C  OUTPUT: HAN FILE CONTAINING ITS FILE NAME, FORMAT INFORMATION
C          OF 9 FEATURES (MEASUREMENTS).
C
C  WRITTEN BY H. NAKASONE
C  DATE 23-JUN-83
C
C  DEPARTMENT OF AUDIOLOGY AND SPEECH SCIENCES
C  MICHIGAN STATE UNIVERSITY
C  EAST LANSING, MICHIGAN
C****************************************************************
      COMMON /FFC/ KT, NAMOUT
      INTEGER NTAO(1500), NAMP(1500)
      REAL RTAO(1500), RAMP(1500)
      LOGICAL*1 NAMOUT(10),BEL,FILNAM(10),FMT(60)
      EQUIVALENCE (NAMOUT,FILNAM)
      DATA BEL/'007/
C CHECK IF CHAINED FROM HPICK. DATA SENT ARE STORED IN KT AND NAMOUT.
      CALL RCHAIN(IF,KT,10)
      IF(IF.EQ.-1) GO TO 30
      TYPE 10
   10 FORMAT(' INPUT FFC FILE NAME (10A1) ?',$)
      READ(5,20) (FILNAM(M),M=1,10)
   20 FORMAT(10A1)
   30 CALL ASSIGN(13,FILNAM,10,'OLD')
   40 READ(13,50) FILNAM,NT,NSR,FMT
   50 FORMAT(10A1,I5,I6,60A1)
      TYPE 51
   51 FORMAT(1X,'Computation on FFC in progress. Wait.'/)
      DO 60 I=1,NT
         READ(13,FMT) NAMP(I),NTAO(I),J
         RAMP(I) = FLOAT(NAMP(I))
         RTAO(I) = FLOAT(NTAO(I))
   60 CONTINUE
      CALL CLOSE(13)
      SR = FLOAT(NSR)
      IF(IF.EQ.-1) TYPE 65,KT,NT
   65 FORMAT(' TOTAL PERIODS SENT =',I5,' AND READ =',I5)
      K = 0
      TAOMAX = 0.
      TAOMIN = 500.
      DO 100 I = 1, NT

 

 

         IF(RTAO(I).LE.20. .OR. RTAO(I).GE.250.) GO TO 100
         K = K + 1
         RTAO(K) = RTAO(I)
         RAMP(K) = RAMP(I)
         IF(RTAO(K).LT.TAOMIN) TAOMIN = RTAO(K)
         IF(RTAO(K).GT.TAOMAX) TAOMAX = RTAO(K)
  100 CONTINUE
      RK = FLOAT(K)
D     TYPE 110,NT,K
D110  FORMAT(' ORIGINAL TAO''S =',I4,' NOZERO TAO''S =',I4)
C COMPUTE AVERAGE TAO, AVERAGE AMP, ETC.
      AVETAO = 0.
      AVEAMP = 0.
      SSQFF = 0.0
      SSQAMP = 0.
      DO 200 I = 1, K
         AVETAO = AVETAO + RTAO(I)
         AVEAMP = AVEAMP + RAMP(I)
         SSQFF = SSQFF + (SR/RTAO(I))*(SR/RTAO(I))
         SSQAMP = SSQAMP + RAMP(I)*RAMP(I)
  200 CONTINUE
      AVETAO = AVETAO / RK
      AVEAMP = AVEAMP / RK
      SDFF = SQRT(SSQFF/RK - (SR/AVETAO)*(SR/AVETAO))
      SDAMP = SQRT(SSQAMP/RK - AVEAMP*AVEAMP)
      DBSDAM = 0.
      IF(SDAMP.GT.0.0) DBSDAM = 20.*ALOG10(SDAMP)
      DBAVAM = 0.
      IF(AVEAMP.GT.0.0) DBAVAM = 20.*ALOG10(AVEAMP)
C COMPUTE
      DLFF = 0.
      DAMP = 0.
      DO 300 I = 2, K
         F1 = SR / RTAO(I)
         F2 = SR / RTAO(I-1)
         DLFF = DLFF + ABS(F1 -F2)
         DAMP = DAMP + ABS(RAMP(I) - RAMP(I-1))
  300 CONTINUE
      DLFF = DLFF / (RK-1.)
      DAMP = DAMP/(RK-1.)
      IF(DAMP.GT.0.) DAMPDB = 20.*ALOG10(DAMP)
      AVFF = 0.
      IF(AVETAO.GT.0.) AVFF = SR / AVETAO
      FFMX = SR / TAOMIN
      FFMN = SR / TAOMAX
      FFRG = FFMX - FFMN
      PERJIT = (DLFF/AVFF)*100.
      PERSIM = (DAMP/AVEAMP)*100.

      TYPE 7222
 7222 FORMAT(1X,/)
      WRITE(7,7200) (FILNAM(M),M=1,10),K

FORMAT(//91X9'
z—— "/

8’ Fi1e3 ’910A19/9’ Number of periods detected
8 = ’9I49/9’ “'7'
8"

 

'4/>

 

201

      WRITE(7,7300) AVFF,SDFF,DLFF,PERJIT
 7300 FORMAT(' Summary of fundamental frequency (F.F.):',//,
     &' Mean F.F. =',F8.2,' (Hz)'/,
     &' Standard deviation of F.F. =',F8.2,' (Hz)'/,
     &' Average variation of F.F.(DF) =',F8.2,' (Hz)'/,
     &' Ratio of DF to mean F.F =',F8.2,' (%)'///)
      WRITE(7,7400) AVEAMP,DBAVAM,SDAMP,DBSDAM,DAMP,DAMPDB,PERSIM
 7400 FORMAT(' Summary of amplitude of F.F. :',//,
     &' Mean amplitude =',F9.2,' (',F5.1,' dB)'/,
     &' Standard deviation of ampl. =',F9.2,' (',F5.1,' dB)'/,
     &' Average variation of ampl.(DA) =',F9.2,' (',F5.1,' dB)'/,
     &' Ratio of DA to mean ampl. =',F9.2,' (%)'
8/9’ ~ '
8 ’9/)
      WRITE(7,7450) FFMX,FFMN,FFRG
 7450 FORMAT(1X,'Maximum F.F.=',F8.2,' Minimum F.F.=',F8.2,
     &' Range F.F.=',F7.2)
C CONVERT EXTENSION .FFC TO .HAN.
C CHARACTER H = 72, A = 65, N = 78 BY ASCII.
      FILNAM(10) = 78
      FILNAM(9) = 65
      FILNAM(8) = 72
      CALL ASSIGN(13,FILNAM,10,'NEW')
      WRITE(13,400) (FILNAM(M),M=1,10)
  400 FORMAT(10A1,' (8F6.2,F10.2)'/)
      WRITE(13,410)AVFF,SDFF,DLFF,PERJIT,PERSIM,FFMX,FFMN,FFRG,DAMP
  410 FORMAT(8F6.2,F10.2)
      CALL CLOSE(13)
      TYPE 425,FILNAM
  425 FORMAT(//////,1X,'Computed HAN file name: ',10A1//)
      CALL EXIT
      END
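
As a reading aid, the nine FFC measures that FFPICK writes to the HAN file can be summarized in a short Python sketch. This is a restatement under assumptions, not the original program: the function name `ffc_features` and the dictionary layout are hypothetical, while the period gate (more than 20 and fewer than 250 sampled points), the use of SR divided by the mean period as the mean F.F., and the feature names follow the listing.

```python
import math


def ffc_features(periods, amps, sample_rate=10000.0):
    """Compute FFPICK-style measures from periods (in samples) and amplitudes."""
    # Keep only periods inside the gate used in the listing (20 < tao < 250).
    kept = [(float(t), float(a)) for t, a in zip(periods, amps)
            if 20.0 < t < 250.0]
    if len(kept) < 2:
        raise ValueError("need at least two valid periods")
    taos = [t for t, _ in kept]
    amp = [a for _, a in kept]
    k = len(taos)

    ave_tao = sum(taos) / k
    ave_amp = sum(amp) / k
    f0 = [sample_rate / t for t in taos]

    avff = sample_rate / ave_tao                      # mean F.F. via mean period
    ssqff = sum(f * f for f in f0) / k
    sdff = math.sqrt(max(ssqff - avff * avff, 0.0))   # guard against rounding
    dlff = sum(abs(f0[i] - f0[i - 1]) for i in range(1, k)) / (k - 1)
    damp = sum(abs(amp[i] - amp[i - 1]) for i in range(1, k)) / (k - 1)

    ffmx = sample_rate / min(taos)
    ffmn = sample_rate / max(taos)
    return {
        "AVFF": avff, "SDFF": sdff, "DLFF": dlff,
        "PERJIT": 100.0 * dlff / avff,
        "PERSIM": 100.0 * damp / ave_amp,
        "FFMX": ffmx, "FFMN": ffmn, "FFRG": ffmx - ffmn,
        "DAMP": damp,
    }


if __name__ == "__main__":
    feats = ffc_features([100, 100, 125, 10, 100], [1000, 1200, 1000, 5, 1000])
    for name, value in feats.items():
        print(name, round(value, 3))
```

In this example the period of 10 sampled points is rejected by the gate, so the measures are computed from the remaining four periods.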

 

 

202

C FILE: FFFRAT.FOR
C****************************************************************
C  FFFRAT IS DESIGNED TO COMPUTE F-RATIO'S OF 9 FFC FEATURES
C
C  INPUT:  DATA FILE WITH AN EXT. NAME 'HAN'.
C  OUTPUT: DATA FILE WITH AN EXT. NAME 'TIM'.
C
C  11-JULY-83
C  H.NAKASONE
C****************************************************************
      COMMON /SENSE/ X, F
      COMMON /PASS/ FMT
      DOUBLE PRECISION VN(9)
      REAL X(5,10,20), F(20)
      LOGICAL*1 FNAMES(14,50),MNAME(14),LNAME(14),FMT(60)
      LOGICAL*1 OUTEXT(3),OUTDEV(3),EXT(3),DEV(3)
      DATA OUTEXT(1)/'T'/,OUTEXT(2)/'I'/,OUTEXT(3)/'M'/
      DATA OUTDEV(1)/'D'/,OUTDEV(2)/'K'/,OUTDEV(3)/'1'/
      DATA VN(1)/' AVFF'/,VN(2)/' SDFF'/,VN(3)/' DLFF'/
      DATA VN(4)/' PERJIT'/,VN(5)/' PERSIM'/,VN(6)/' FFMAX'/
      DATA VN(7)/' FFMIN'/,VN(8)/' FFRNG'/,VN(9)/' DAMP'/
      NP = 5
      NG = 10
      ND = 9
      TYPE 10
   10 FORMAT(' ENTER MASTER FILE NAME (14A1) >',$)
      READ(5,20) (MNAME(M),M=1,14)
   20 FORMAT(14A1)
      TYPE 30
   30 FORMAT(' ENTER EXT NAME OF INPUT DATA FILE >',$)
      READ(5,35) (EXT(N),N=1,3)
   35 FORMAT(3A1)
      TYPE 40
   40 FORMAT(' ENTER DEV NAME OF INPUT DATA FILE >',$)
      READ(5,35) (DEV(M),M=1,3)
      CALL RDFILE(MNAME,FNAMES,EXT,DEV)
C TRANSFER DATA TO ARRAY X.
D     TYPE 90
D90   FORMAT(1X,'AVFF',T8,'SDFF',T16,'DLFF',T24,'PERJIT',
D    &T32,'PERSIM',T40,'FFMAX',T48,'FFMIN',T56,'FFRNG',
D    &T64,'DAMP'//)
      K = 0
      DO 100 I = 1, NG
      DO 200 J = 1, NP
         K = K + 1
         CALL ASSIGN(12,FNAMES(1,K),14,'OLD')
         READ(12,210) (LNAME(M),M=1,10),FMT

 

 

  210    FORMAT(10A1,60A1,/)
         READ(12,FMT) (X(J,I,KD),KD=1,ND)
D        WRITE(7,220) (X(J,I,KD),KD=1,ND)
D220     FORMAT(1X,8F7.2,F10.2)
         CALL CLOSE(12)
  200 CONTINUE
      TYPE 9
    9 FORMAT(1X,/)
  100 CONTINUE
D     TYPE 90
      CALL SUBF(NP,NG,ND)
      PAUSE ' ADJUST TO THE TOP OF NEW PAGE.'
      WRITE(7,300) (MNAME(M),M=1,14)
  300 FORMAT(10X,'SUMMARY OF F-RATIO''S: ',14A1,//,
     &10X,'FEATURE',15X,'F-RATIO',/,10X,' ------- ',15X,
     &' ------- ',/)
      DO 310 I = 1, ND
         WRITE(7,320) I,VN(I),F(I)
  320    FORMAT(11X,'X(',I1,')',3X,A8,F12.3,/)
  310 CONTINUE
      TYPE 330
  330 FORMAT(10X,' ------------------------------- ')
      TYPE 9
      TYPE 9
C NORMALIZE
      CALL ZNORM(X,NP,NG,ND)
C CHANGE EXTENSION NAME.
      DO 350 I = 1, 50
      DO 360 J = 1, 3
         FNAMES(J,I) = OUTDEV(J)
         FNAMES(J+11,I) = OUTEXT(J)
  360 CONTINUE
  350 CONTINUE
C WRITE THE NORMALIZED DATA FILE WITH NEW EXT NAME.
      J = 0
      K = 1
      DO 400 I = 1, 50
         CALL ASSIGN(12,FNAMES(1,I),14,'NEW')
         WRITE(12,20) (FNAMES(M,I),M=1,14)
         J = J + 1
         IF(J.LE.NP) GO TO 410
         J = 1
         K = K + 1
  410    CONTINUE
         DO 420 KD = 1, ND
            WRITE(12,425) X(J,K,KD)
  425       FORMAT(F14.6)
  420    CONTINUE
         CALL CLOSE(12)
  400 CONTINUE
      CALL EXIT
      END

 

 


CHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHH
PROGRAM NAKR03.FOR

NAKR03 IS DESIGNED TO EXECUTE VOICE IDENTIFICATION OPERATION FOR
DgglGNSS (CROSS-TRANSMISSION DY COMPOSITE PARAMETER OF FFC AND IDS
L ). - '

NAKR03 REQUIRES 4 MASTER FILES, EACH CONTAINING SO FILE NAMES OF
PATTERNS.

13-JULY-83
H. NAKASONE

DEPARTMENT OF AUDIOLOGY AND SPEECH.SCIENCES
MICHIGAN STATE UNIVERSITY

C
C
C
C
C
C
C
C
C
E SUBPROGRAMS CALLED: SBUEITvRDFILE92NORMyAND SUBSET.
C
C
C
C
C
C EAST LANSING, MICHIGAN

C

C

HHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHH

REAL TXU(5910920)vTXK(5910920)

REAL XU(5v10920)9 XK(5910T20)9FUEIT(10)vTUEIT(9)
LOGICALXI MFILUF(14)vMFILUT<14)vMFILKF(14)rMFILKT<14)
LOGICAL*1 UFILEF(14950)vUFILET(14950)rKFILEF<14vSO)
LOGICALXI KFILET(14950)vFINEXT(3)vTINEXT(3)rINDEV(3)
LOGICALXI LFILE(14)9DEDUG9YNvDEL

DATA TINEXT<1)/’T’/vTINEXT(2)/’I’/vTINEXT(3)/’M’/
DATA INDEV(1)/’D’/rINDEV(2)/’K’/rINDEV(3)/’1’/

DATA BEL/'007/yDEBUG/.FALSE./

FORMAT(’+’7A1)
FORMAT(Al)
FORMAT(1X9/)

4 FORMAT(3A1)

H‘OWV

NDF = 10

NDT = 9

ND NDF + NDT
NP 5

N6 10

DEBUG = .FALSE.
TYPE 4500 -
4500 FORMAT(’ DEBUG ?’9$)
ACCEPT 45101YN
4510 FORMAT(AIL

TYPE 1
FORMAT(’ PROGRAM NAKRO3 - Experiment III (Cross-Transmission)’/

8’ A Pattern contains features from Freeuencs and Time Domain.’/

/)
TYPE 10

10 FORMAT(’ UNKNOWN MASTER FILE FROM FREO.(14A1) >’y$)
READ(SvZO) MFILUF

20 FORMAT(14A1)
TYPE 30

30 FORMAT(’ KNOUN MASTER FILE FROM FREO.(14A1) >’r$)

READ(592O) MFILKF

 

205

TYPE 40

40 FORMAT(’ EXT NAME OF FREQ. FILE (FRI 0R FR a,
READ(SySO) FINEXT L) , '$>

50 FORMAT(ZAI)

CALL RDFILE<MFILUFvUFILEFvFINEXTrINDEV)
CALL RDFILE(MFILKFvKFILEFyFINEXTvINDEV)
TYPE 9
TYPE 60
60 FORMAT(’ UNKNOUN MASTER FILE FROM TIME(14A1) h’yS)
READ(592O) MFILUT
TYPE 70
70 FORMAT(’ KNOUN MASTER FILE FROM TIME(14A1) }’9$)
READ(5920)VMFILKT

CALL RDFILE<MFILUT9UFILETvTINEXTrINDEV)
CALL RDFILE(HFILKTrKFILETrTINEXTrINDEV)

TYPE 9
TYPE 80
80 FORMAT(’ PROVIDE INFO. FOR FREQUENCY FEATURES.’/l
CALL SBUEIT(FUEITvNDF)
TYPE 9
TYPE 90
90 FORMAT(’ PROVIDE INFO. FOR TIME FEATURES.’/)
CALL SBUEIT(TUEIT9NDT)

C TRANSFER DATA FROM UFILE AND KFJILE TO XU AND XKv RESPECTIVELY.

KT = 0
DO 100 J = 1' N6

DO 110 I = 19 NP
KT = KT + 1
CALL ASSIGN(129UFILEF(19KT)9149’OLD’)
READ(12920) LFILE
DO 120 K = 19 NDF
READ(127150) TXU(IvJvK)

150 FORMAT(F14.6)-
XU(I!J9K) = TXU(I9J7K) X FUEIT(N)
120 CONTINUE

CALL CLOSE(12)

CALL ASSIGN(139KFILEF(1vKT)9149’OLD’)
READ(13v20) LFILE
DO 130 K = 19 NDF

READ(139150) TXK(IvJ9K)

XK<IvaKJ = TXK(IerK) * FHEIT(K)

130 CONTINUE

CALL CLOSE(13)
110 CONTINUE
100 CONTINUE

CALL zNORHtxu.NR.NC.NDF>
CALL ZNORM<XK.NP.NG.NDF>
CALL ZNORM(TXU9NP:NGyNDF)
CALL ZNCRN<TXK.NP.NC.NDF)

KT = 0
DO 5100 J = 1' NB

DO 5110 I = 19 NP

.KT = KT + 1
CALL ASSIGN<129UFILET(1yKT)914,’OLD’n

206

READ(12920) LFILE
KNTW = 0
DO 5120 K = 1+NDF7 NDT+NDF
READ(127150) TXU(I!J9K)
KNTU = KNTW + 1
.XU(I!J9K) = TXU(I!J9K) * TWEIT(KNTW)
5120 CONTINUE
CALL CLOSE(12)
CALL ASSIGN<137KFILET<17KT)714!’OLD’)
READ(13920) LFILE
KNTW = 0
D0 5130 K = 1+NDF9 NDT+NDF
READ(139150) TXK(I!J!K)
KNTW = KNTW + 1
XK(I!J7K) = TXK(I!J9K) * TWEIT<KNTW)

5130 CONTINUE
CALL CLOSE(13)

5110 CONTINUE
5100 CONTINUE
202 KNTU = 1

KNTUP= 1

KCORR= 0
2000 IF(DEBUG) TYPE 2057 KNTUPrKNTU
205 FORMAT(1Xy//9’ EUCLIDIAN DISTANCES BETWEEN PATTERN ’911!’

8 OF UNKNOWN SPEAKER ’vI29/9’ AND PATTERN I OF
8 KNOWN SPEAKER J.’//)

XMIN = 1.E30
ITEMP = 1
JTEMP = 1
COMPUTE EUCLIDIAN DISTANCE
DO 210 J = 1r NG

DO 220 I = 17 NP

EUCL = .0
DO 230 K = 19 NDF+NDT
EUCL = EUCL + (XU(KNTUP7KNTU9K) - XK(IrJyK))##2.
230 CONTINUE

EUCL = EUCL**0.S
IF(EUCL.GT.XMIN) GO TO 250

ITEMP = I

JTEMP = J

XMIN = EUCL
250 IF(DEBUG) WRITE(7r260) EUCLvle
260 FORMAT(’+’vF10.59’E’vI1v’9’9129’3’v3)
220 CONTINUE

IF(DEBUG) TYPE 9

210 CONTINUE

IF(KNTU.EG.JTEMP) KCORR = KCORR + 1

WRITE(77280) KNTUP, KNTUvITEMPvJTEMPvXMIN

280 FORMAT(’ PATTERN ’9119’ OF UNKNOWN SPEAKER ’712,
8’ IS IDENTIFIED WITH’/’ PATTERN ’9119’ OF KNOWN
8 SPEAKER ’vIZy/v’ DISTANCE =’vF10.59//)

KNTUP = KNTUP + 1
IF(KNTUP.LE.NP) GO TO 2000

KNTUP = 1
KNTU = KNTU + 1

3000

COMPUTE

6300

6400
6200
6100

9999

 

207

IF(KNTU.LE.NG) GO TO 2000

KINCOR = 50 - KCORR
PCOR = (FLOAT(KCORR)/50.) * 100.

TYPE 3000, KCORRvKINCORvPCOR'
FORMAT(1Xv//v’ Summary by Nearest Neighbor Rule:’//’
8 Correct =’!I3v’! Incorrect =’9139’9 Rate(Z) =’9F6.2r////>

TYPE 9
TYPE 9

THE MINIMUM SET DISTANCE

CALL SUBSET(XU9XK1NP7NG’ND)

TYPE 9

TYPE 9

IFﬂ.NOT.DEBUG) GO TO 9999

TYPE 6000

FORMAT(’ WANT DIFFERENT FEATURES COMBINATION ?’r$)
ACCEPT 60089YN

FORMAT(Al) .

IF(YN.NE.89) GO TO 9999

TYPE 6050

FORMAT(’ FOR FREQUENCY DOMAIN:’/)
CALL SBWEIT<FWEITTNDF>

TYPE 6060

FORMAT(’ FOR TIME DOMAIN:’/)

CALL SBWEITSTWEITINDT)

CALL EXIT

DO 6100 J = 19 N6
DO 6200 I = 1r NP
DO 6300 K = 1: NDF
XU<IrJrK) = TXU(I9J9K) * FWEIT(K)
XK(I9J9K) = TXK(IrJvK) X FWEIT(K)

CONTINUE

KNTW = 0

DO 6400 K = 1+NDF9 NDT+NDF
KNTW = KNTW + 1
XU(IrJvK) = TXU(IvJ9K) * TWEIT(KNTW)
XK(IvJvK) = TXK(I,J!K) X TWEIT(KNTW)

CONTINUE

CONTINUE
CONTINUE

GO TO 202

CALL EXIT
END
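
The nearest neighbor decision made in the listing above can be illustrated with a small Python sketch. It is a simplified restatement under assumptions: the function name `nearest_neighbor`, the dictionary of known speakers, and the per-feature weight list are hypothetical stand-ins for the XU, XK, FWEIT and TWEIT arrays, and the standardization step is left out. Ties are resolved in favor of the later pattern, as the comparison in the listing only skips a candidate when its distance is strictly greater than the current minimum.

```python
import math


def nearest_neighbor(unknown, known, weights):
    """Return (speaker, pattern_index, distance) of the closest known pattern.

    unknown : feature vector of one unknown pattern
    known   : dict mapping speaker id -> list of feature vectors
    weights : one multiplicative weight per feature
    """
    best = None
    for speaker, patterns in known.items():
        for idx, vec in enumerate(patterns):
            # Weighted Euclidean distance between the two patterns.
            d = math.sqrt(sum((w * (u - k)) ** 2
                              for w, u, k in zip(weights, unknown, vec)))
            if best is None or d <= best[2]:
                best = (speaker, idx, d)
    return best


if __name__ == "__main__":
    known = {"A": [[0.0, 0.0], [1.0, 1.0]], "B": [[5.0, 5.0]]}
    print(nearest_neighbor([0.9, 1.2], known, [1.0, 1.0]))
```

An identification is counted as correct when the speaker returned for an unknown pattern matches the true speaker of that pattern; the listing reports the count out of 50 patterns as a percentage.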

 

208

CHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHH H

C PROGRAM NAKROA.FOR H HHHHHHHHHHHHHHHHHHHHHHHHH
C NAKROA IS DESIGNED TO EXECUTE WITHIN-TRANSMISSION VOICE

C IDENTIFICATION OPERATION.

C 12*JULY- -83

C H. NAKASONE
CHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHH

REAL XU(5910910)r DMAX<571O)v WEIGHT<10>
LOGICALXl MFILU(14) ‘
LOGICALII UFILE(14950)7 INEXT(3)!INDEU(3)
LOGICALXI OUTEXT<3>90UTDEV(3)9 YN9 DEBUG

DATA INDEV(1)/’D’/vINDEV(2)/’K’/9INDEV(3)/’1’/
DATA DEBUG/.FALSE./

8 FORMAT(AI)
9 FORMAT(1X9/)
TYPE 10
10 FORMAT(’ PROGRAM NAKROA: VOICE I.D. WITHIN TRANSMISSION.’//)
TYPE 15
15 FORMAT(’ ENTER MASTER FILE NAME (14A1) 3’95)
. READ(5v20) MFILU
20 FORMAT(14A1)
TYPE 25
25 FORMAT(’ ENTER EXT NAME (FRI, FRL, OR TIM) ﬁ’r$)
C FRI = IDS
C FRL = LTS
C TIM = FFC
READ(5r22) INEXT
22 FORMAT(3A1)
TYPE 26
26 FORMAT(’ ENTER NUMBER OF DIMENSIONS >’s$)
READ(5927) ND
27 FORMAT(I3)
CALL RDFILE<MFILU9UFILEIINEXT,INDEV)
TYPE 29
29 FORMAT(’ WANT PRINT OUT OF DISTANCE MATRIX h'yS)

READ(598) YN
IF(YN.EO.B9) DEBUG = .TRUE.

CALL SDWEIT(WEIGHT9ND)
NP 5
N6 10

C PLACE DATA FROM UFILE TO AN ARRAY xu.

KT = 0
Do 100 J = 1. NB

DO 110 I = 19 NP
KT = KT + 1
CALL ASSIGN<129UFILE<19KT)714T’OLD’)
READ(12 20) LFILE
D0 120 K = 17 ND
READ(129150) XU(Iv JrK)

1000

1410
1400

2000

230

210

FORMAT(F14.6)

XU(I1J9K) = XU(I’J’K) * WEIGHT(K)
CONTINUE
CALL CLOSE(12)

CONTINUE
CONTINUE

84 = CHARACTER Ty FIRST OF TIM EXTENSION.
IF(INEXT(1).NE.B4) CALL ZNORM(XU7NP7NG1ND)

KCOR1 = 0

KCOR2 = 0

KNTU = 0

KNTU = KNTU + 1

DO 1400 J = 1: NB
DO 1410 I = 1, NP

DMAX(I1J) = 0.

CONTINUE

CONTINUE

KNTUP = 0

KNTUP = KNTUP + 1

IF(DEBUG) TYPE 205’ KNTUPIKNTU

FORMAT(1Xv//r’ EUCLIDIAN DISTANCES DETWEEN PATTERN ’III!’
8 OF UNKNOWN SPEAKER ’y12,/y' AND PATTERN I OF KNOWN

8 SPEAKER J.’//)

XMIN = 1.E30
ITEMP =
JTEMP =

DO 210 J = 1’ N6

RMAX = -1.E3O
DO 220 I = 1! NP

EUCL = 0.0
DO 230 K = 1, ND
EUCL = EUCL + (XU(KNTUP!KNTU;K)-XU(IvaK))**2
CONTINUE
EUCL = EUCL**0.5
IF(EUCL.GT.RMAX) RMAX = EUCL
IF(EUCL.GT.XMIN) GO TO 250
ITEMP = I
JTEMP = J
XMIN = EUCL
IF(DEBUG) WRITE(79260) EUCLyIrJ
GO TO 220

IF(DEBUG) TYPE 265
FORMAT(’+’916X1$)

FORMAT(’+’TF10.57’E’iny’v’yI21’J’y3)
CONTINUE

IF(DEBUG) TYPE 9
DMAX<KNTUPrJ) = RMAX

CONTINUE

IF(JTEMP.EG.KNTU) KCOR1 = KCOR1 + 1
TYPE 9

.

. .ﬂxuxk. ﬂ;
r

I . T
crawl

 

 

210

WRITE<79280) KNTUP, KNTUyITEMP’JTEMPpXMIN

FORMAT(’ THE PATTERN t’rIly’ OF UNKNOWN SPEAKER ’9I29
8’ IS IDENTIFIED WITH’/’ THE PATTERN 4’9I1v’ OF KNOWN
8 SPEAKER ’rI2v/v’ DISTANCE =’9F10.59//)

IF(KNTUP.LT.NP) GO TO 2000

CALL SETDIS(DMAXyKNTUrJTMPvSETMINrNG)
IF(JTMP.EG.KNTU) KCOR2 = KCOR2 + 1

IF(KNTU.LT.NG) GO TO 1000

INCOR1 = N6 * NP - KCOR1
CRATE1 = KCOR1 I 2.
INCOR2 = N6 - KCOR2
CRATE2 = KCOR2 * 10.
TYPE 9

TYPE 9

TYPE 9

WRITE<775OO) (MFILU(M)9M=5914)

FORMAT(’ SUMMARY OF RESULTS E MASTER FILE = ’rlOAly’ J’//)
WRITE<79510) KCORleNCORvaRATEI

FORMAT(’ BY THE NEAREST NEIGHBOR RULE:’/’ CORRECT = ’9139
8’9 INCORRECT = ’vI3v’ RATE(Z) = ’rF6.27/)

WRITE(7952O) KCOR291NCOR2vCRATE2
FORMAT(’ BY THE MINIMUM SET DISTANCE RULE3’/’ CORRECT = ’9139
8’9 INCORRECT = ’713!’ RATE(Z) = ’vF6.2v/)

TYPE 9

CALL EXIT
END

 

211

C****************************************************************
C  PROGRAM RDCFET
C
C  RDCFET CREATES A NEW DATA FILE OF OPTIMUM FEATURES OF
C  IDS OR LTS PARAMETER.
C
C  PART OF PRE-PROCESSING STAGE OF EXPERIMENTAL PROCEDURE
C  (CHAPTER II).
C
C  H. NAKASONE
C  12-JULY-83
C
C  DEPARTMENT OF AUDIOLOGY AND SPEECH SCIENCES
C  MICHIGAN STATE UNIVERSITY
C****************************************************************
      COMMON /PASS/ FMT
      REAL X(50,100), Y(10)
      INTEGER KFET(10)
      LOGICAL*1 FNAMES(14,50), MFILE(14),INEXT(3),INDEV(3)
      LOGICAL*1 OUTEXT(3),OUTDEV(3),BEL,FMT(60)
      DATA OUTDEV(1)/'D'/,OUTDEV(2)/'K'/,OUTDEV(3)/'1'/
      DATA BEL/'007/
    8 FORMAT(A1)
    9 FORMAT('+',4X,'*',$)
      NUMF = 50
 1000 TYPE 10
   10 FORMAT(' MASTER FILE NAME (14A1) >',$)
      READ(5,20) MFILE
   20 FORMAT(14A1)
      TYPE 30
   30 FORMAT(' EXT NAME OF INPUT DATA FILE (3A1) >',$)
      READ(5,40) INEXT
   40 FORMAT(3A1)
      TYPE 45
   45 FORMAT(' EXT NAME OF OUTPUT DATA FILE (3A1) >',$)
      READ(5,40) OUTEXT
      TYPE 50
   50 FORMAT(' DEV NAME OF INPUT DATA FILE (3A1) >',$)
      READ(5,40) INDEV
      TYPE 6
    6 FORMAT(1X,/)
      CALL RDFILE(MFILE,FNAMES,INEXT,INDEV,NUMF)
      TYPE 9
      DO 100 I = 1, NUMF
         CALL ASSIGN(12,FNAMES(1,I),14,'OLD')
         READ(12,FMT) DUM, DUM, (X(I,J),J=1,100)
         CALL CLOSE(12)
  100 CONTINUE

 

 

      TYPE 9
      DO 110 I = 1, NUMF
      DO 120 J = 1, 3
         FNAMES(J,I) = OUTDEV(J)
         FNAMES(J+11,I) = OUTEXT(J)
  120 CONTINUE
  110 CONTINUE
C X(50,100) HAS BEEN FILLED.
      TYPE 8, BEL,BEL,BEL
      TYPE 200
  200 FORMAT(' ENTER NUMBER OF FEATURES :',$)
      READ(5,210) NFET
  210 FORMAT(I3)
      TYPE 220
  220 FORMAT(' ENTER FEATURE NODES BELOW.'//)
      DO 230 I = 1, NFET
         TYPE 240, I
  240    FORMAT(' NODE # OF THE FEATURE',I3,' >',$)
         READ(5,210) KFET(I)
  230 CONTINUE
C NOW, KFET(I), I.E., INFORMATION ON OPTIMUM FEATURES,
C HAS BEEN FURNISHED.
      DO 300 I = 1, NUMF
         DO 310 J = 1, NFET
            Y(J) = X(I,KFET(J))
  310    CONTINUE
         CALL ASSIGN(12,FNAMES(1,I),14,'NEW')
         WRITE(12,20) (FNAMES(M,I),M=1,14)
         DO 320 J = 1, NFET
            WRITE(12,330) Y(J)
  330       FORMAT(F14.6)
  320    CONTINUE
         CALL CLOSE(12)
  300 CONTINUE
      TYPE 9
      TYPE 8,BEL
      WRITE(7,400) NUMF,OUTEXT,OUTDEV(3)
  400 FORMAT(1X,'DONE. ',I3,1X,3A1,' FILES STORED IN DISK ',A1,//)
      GO TO 1000
      CALL EXIT
      END

213

CHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHH
PROGRAM NAKROP .

NAKROP IS TO PREPARE A PROXIMITY MATRIX AND A
CATEGORY NAME FILE AS INPUT DATA REQUIRED BY A
PROJECTION PROGRAM ’SAMMON’.

SUBPROGRAM REQUIRED: NONE.
PROGRAMMS TO BE CHAINED: CHNROA9CHNRO29 8 CHNRO3

WRITTEN BY: HIROTAKA NAKASONE
DATE: 28 JULY9 1983

C

C

C

C

C

C

C

C

C

C

C

C DEPARTMENT OF AUDIOLOGY AND SPEECH SCIENCES
C MICHIGAN STATE UNIVERSITY
C EAST LANSING9 MICHIGAN
CHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHH
REALXB PROGA9PROG29PROG3

INTEGER NCATEG(10)9CATNAM(5910)

LOGICAL¥1 NAMOUT(14)9YN9IDATE(2O)9TITLE(80)

DATA PROGA/12RDK1CHNROASAV/9PROG2/12RDK1CHNRO2SAV/
DATA PROG3/12RDK1CHNRO3SAV/

TYPE 5
5 FORMAT(’ NAKROP : PRELIMINARY TO SAMMON”S PROJECTION.’//)

TYPE 20

20 FORMAT(’ ENTER DESCRIPTION OF THIS RUN.’/’ k’9$)
READ(5916) (TITLE(L)9L=1972)

16 FORMAT(72A1)

TYPE 25

25 FORMAT(’ NUMBER OF CATEGORIES ?’9$)
READ(59200) NCAT

200 FORMAT(I2)
NFILES = 0

DO 30 I = 19 NCAT
TYPE 319 I
31 FORMAT(1X9’CATEGORY # ’9129’ }’9$)
READ(5932) (CATNAM(J9I)9J=195)
32 FORMAT(5A2)
NCATEG(I) = 5
NFILES = NFILES + 5

30 ' CONTINUE
TYPE 350 _

350 FORMAT(’ OUTPUT CATEGORY FILE NAME(14A1) p'.s)
READ(S937O) (NAMOUT(N)9N=1914)

37o FORMAT(14A1)

C PREPARE CAT FILES IN TWO DIFFERENT NAMES.

YN = oFALSEo
CALL ASSIGN(119’CNAMES.DAT’9109’NEW’)

2200 WRITE(1194OOO) NCAT

4000 FORMAT(IZ)
DO 4010 I = 19 NCAT
WRITE(1194020) NCATEG(I)9 (CATNAM(M9I)9M=195)

4020 FORMAT(1295A2)

 

 

 

4010

2300
3100

3310
3300

3400

214

CONTINUE
CALL CLOSE(II)

IF(YN) GO TO 2300

YN = .TRUE.
CALL ASSIGN(119NAMOUT9149’NEW’)
GO TO 2200

TYPE 31009 NAMOUT
FORMAT(1X9/9’ CNAMES.DAT = ’914A19///)

TYPE 3300 .
FORMAT(’ OPTION: CHOOSE 1. 2. OR 3.'//'

a 1 FOR WITHIN-TRANSMISSION PROJECTION.’/’

2 2 FOR CROSS-TRANSMISSION BY 1 PARAMETER.’/’

& 3 FOR CROSS-TRANSMISSION BY 2 PARAMETERS.’//’
a ?'

READ(593400) YN

FORMAT(AI)

IF(YN.EQ.49) CALL CHAIN(PROGA990)

IF(YN.EQ.50) CALL CHAIN(PROG299O)

IF(YN.EQ.51) CALL CHAIN(PROG3990)

IF(YN.NE.49 .AND. YN.NE.50 .AND. YN.NE.51) GO TO 3310

CALL EXIT
END

 

 

 

215

CHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHH

C
C
C
C
C
C
C
C
C
C
C
C
C
C
C
C
C
C
C
CH

14
10

13

PROGRAM CHNRO2:

CHNRO2 IS CHAINED FROM NAKROP. IT CAN BE ALSO RUN FROM THE KEY
BOARD DIRECTLY.

INPUT IS A MASTER DATA FILE NAME(8).
OUTPUT IS A PROXIMITY MATRIX(UPPER HALF DIAGNAL).

PURPOSE OF THIS PROGRAM IS TO PREPARE A SPECIFIC DATA FORMAT REQUIRED
BY A LINEAR PROJECTION PROGRAM9 ’SAMMON’.

SUBPROGRAMS: SBWEIT9RDFILE9ZNORM9EUCMAT9INDEX.

H. NAKASONE
28-JULY-83

DEPARTMENT OF AUDIOLOGY AND SPEECH SCIENCES

MICHIGAN STATE UNIVERSITY
HHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHH

REALXS PROGS

REAL XU(5910910)9 XK(5910910)9WEIGHT(10)

LOGICALXI MFILU(14)9MFILK(14)9LFILE(14)

LOGICAL*1 UFILE(14950)9 KFILE(14950)9 INEXT(3)9INDEV(3)
LOGICAL*1 OUTEXT(3)9OUTDEV(3)9DEBUG9YN9NAMPRX(14)

DATA INDEV<1)/’D’/9INDEV(2)/’K’/9INDEV(3)/’1’/

DATA PROGS/12RDKISAMMONSAV/

FORMAT(1X9/)

FORMAT(3A1)

TYPE 10

FORMAT(’ PROGRAM CHNRO2: CROSS-TRANSM. BY 1 DOMAIN.’//)
TYPE 13

FORMAT(’ EXT NAME OF INPUT FILE(FRI9FRL9 OR TIM) F’9$)
READ(5914) INEXT

TYPE 15
FORMAT(’ MASTER FILE NAME FOR UNKNOWN (14A1) P’95)

READ(592O) MFILU
FORMAT(14A1)

TYPE 25
FORMAT(’ MASTER FILE NAME FOR KNOWN (14A1) P’9$)

READ(592O) MFILK

TYPE 26

FORMAT(’ ENTER NUMBER OF DIMENSIONS }’9$)
READ(5927) ND

FORMAT(I3)

TYPE 28
FORMAT(’ OUTPUT NAME FOR PROXIMITY MATRIX(14A1) >’9$)

READ(5920) (NAMPRX(M)9M=1914)

DEBUG = .FALSE.

TYPE 4500
00 FORMAT(’ DEBUG ?’9$)

ACCEPT 45109YN

IF(YN.EQ.89) DEBUG = .TRUE.
10 FORMAT(Al) -

CALL SBWEIT<WEIGHT9ND)

 

 

216 '

CALL RDFILE<MFILU9UFILE9INEXT9INDEV)
CALL RDFILE<MFILK9KFILE9INEXT9INDEV)

C UFILE AND KFILE HAVE BEEN FILLED.

5

10

NP
NG

C TRANSFER DATA FROM UFILE AND KFJILE TO XU AND XK9 RESPECTIVELY.

130

110
100

m
s 9

KT = 0
DO 100 J = 1. NB

DO 110 I = 1. NP
KT = KT + 1
CALL ASSION<12.UFILE<1.RT).14.'OLD'>
READ(12920) LFILE
DO 120 K = 1. ND
READ(129150) XU(I9J9K)

FORMAT(F14.6)
XU(I9J9K) = XU(I9J9K) * WEIGHT(K)
CONTINUE

CALL CLOSE(12)

CALL ASSIGN<139KFILE<19KT)9149’OLD’)
READ(13920) LFILE
DO 130 K = 19 ND
READ(139150) XK(I9J9K)
XK(I9J9K) = XK(I9J9K) X WEIGHT(K)
CONTINUE
CALL CLOSE(13)
CONTINUE
CONTINUE

IF(INEXT(1).EQ.84) GO TO 200
-—- STANDARDIzE DATA ---

CALL ZNORM(XU9NP9NG9ND)
CALL ZNORM(XK9NP9NG9ND)

IF(DEBUG) TYPE 4210
FORMAT(’ NORMALIZATION DONE.’/)

CALL EUCMAT(XU9XK9NAMPRX9ND9NP9NG)

IF(DEBUG) TYPE 4215
FORMAT(’ EUCMAT RETURNED. NOW CHAINING TO DRVSAV.’//)

CALL CHAIN(PROGS990)
CALL EXIT
END

 

 

CHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHH

FILE: FILFIL.FOR

FILFIL IS TO CREATE A MASTER FILE OF FILE NAMES.
MAXIMUM NUMBER OF FILES CAN BE STORED IN A MASTER
FILE IS CURRENTLY 100.

HIROTAKA NAKASONE9 22-MAY-1983
DEPARTMENT OF AUDIOLOGY AND SPEECH SCIENCES9 MSU.

C

C

C

C

C

C

C

C

C

C TO COMPILE: FILFIL=FILFIL/U
C TO LINK : FILFIL=FILFIL9LCHECK9SYSLIB/F
CHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHH
COMMON /NAME/FNAMES

COMMON /NUMF/NUMF

DOUBLE PRECISION EXT

LOGICALXl FNAMES(159100)9LETR(9)9YN9LNAME<15>9FMT(BO)
LOGICAL*1 NEXT(3)

DATA LNAME(l)/’D’/9LNAME(2)/’K’/9LNAME(4)/’:’/

DATA LNAME(10)/’.’/

LNAME(15) = 0
FORMAT(1A1)
FORMAT(I3)
FORMAT(6A1)
FORMAT(14A1)
FORMAT(12A1)
FORMAT(15A1)
FORMAT(3A1)
FORMAT(1X9’ WAIT.’)
FORMAT(1X9/)

('1 omnomauww

TO HERE TO ENTER ALL FILES.

TYPE 30 .
30 FORMAT(’ ENTER NUMBER OF FILES (I3) }’9$)

READ(594O) NUMF
40 FORMAT(I3)

TYPE 50
‘0 FORMAT(’ SPECIFY EXT NAME (3A1) P’9S)

READ(597) (NEXT(N)9N=193)
CALL IRAD50(39NEXT(1)9EXT)

TYPE 60
6O FORMAT(’ ENTER ALL FILES BELOW (6A1).’9//)

DO 70 I = 19NUMF
75 TYPE 8091
80 FORMAT(’ FILE 9’9139’ }’9$)
IF(LCHECK(EXT9ICHAN9I).GT.0) GO TO 79
TYPE 90
90 FORMAT(1X9’*FATAL ERRORX’)

STOP
79 CALL CLOSEC(ICHAN)
CALL IFREEC(ICHAN)

70 CONTINUE

IE = 5
WRITE(79100)
100 FORMAT(//91X9’ TABLE OF FILE NAMES ’//)

 

130
120

145
140

150

155
160

230

218

D0 120 J = 19 NUMF
WRITE(79130) (FNAMES(M9J)9M=1915)9J
FORMAT(1X915A19’E’9I39’J’)

CONTINUE

TYPE 9

TYPE 140

FORMAT(’ OK ?79$)
ACCEPT 19YN
IF(YN.EQ.89) GO TO 200

THEN DO SOME CORRECTION BUSINESS HERE.

TYPE 150

FORMAT(’ SPECIFY THE FILE NUMBER TO BE CORRECTED b’9$)
ACCEPT 291

TYPE 16091

FORMAT(’ ENTER THE CORRECT FILE NAME FOR #’9I39’ P’9$)
IF(LCHECK(EXT9ICHAN9I).LT.0) STOP’FATAL ERROR’

CALL CLOSEC(ICHAN) I

CALL IFREEC(ICHAN)

GO TO 145

TYPE 210 .

FORMAT(’ ENTER MASTER FILE NAME(DEV:FILNAM.EXT) b’9$)
READ(594) (LNAME(L)9L=1914)

CALL ASSIGN<129LNAME9149’NEW’)

WRITE(129220) (LNAME(L)9L=1914)9NUMF
FORMAT(14A19I39’ (5E15.7)’)

DO 230 I = 19NUMF
WRITE(1294) (FNAMES(M9I)9M=1914)
CONTINUE
CALL CLOSE(12)
CALL EXIT
END

219

C FILE: ZNORM.FOR
C****************************************************************
C SUBROUTINE ZNORM: TO STANDARDIZE FEATURE VALUES OF SPEECH
C PARAMETERS, IDS, FFC, AND LTS.
C Written by: Hirotaka Nakasone
C Date: 11 JULY, 1983
C Dept. of Audiology and Speech Sciences, MSU, East Lansing, MI.
C****************************************************************
      SUBROUTINE ZNORM(X,NP,NG,ND)
      REAL X(NP,NG,ND)
C NP = NUMBER OF PATTERNS
C NG = NUMBER OF GROUPS, OR SPEAKERS
C ND = NUMBER OF DIMENSIONS, OR SAMPLES / SPEAKER
      NPAT = NP * NG
      TYPE 2
    2 FORMAT(1X,'NORMALIZATION BY Z-TRANSFORM.'/)
      DO 10 J=1,ND
         RM = 0.0
         SD = 0.0
         DO 20 I = 1, NG
         DO 25 K = 1, NP
            RM = RM + X(K,I,J)
            SD = SD + X(K,I,J)*X(K,I,J)
   25    CONTINUE
   20    CONTINUE
         RM = RM/FLOAT(NPAT)
         SD = (SD/FLOAT(NPAT) - RM*RM)**0.5
         IF(SD.EQ.0.) GO TO 40
         DO 30 I = 1, NG
         DO 35 K = 1, NP
            X(K,I,J) = (X(K,I,J)-RM)/SD
   35    CONTINUE
   30    CONTINUE
         GO TO 10
   40    CONTINUE
         DO 50 I = 1, NG
         DO 45 K = 1, NP
            X(K,I,J) = 0.0
   45    CONTINUE
   50    CONTINUE
   10 CONTINUE
      RETURN
      END
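
The standardization performed by ZNORM can be sketched in a few lines of Python. This is an illustrative restatement, assuming a nested-list layout `x[pattern][speaker][feature]` and a hypothetical function name `znorm`; it returns a new array instead of overwriting the input as the subroutine does. Each feature is centered on its mean over all patterns of all speakers and divided by the population standard deviation, and a feature with zero spread is set to zero.

```python
import math


def znorm(x):
    """Standardize each feature over all patterns and speakers.

    x[p][g][d]: pattern p of speaker g, feature d.
    """
    n_p, n_g, n_d = len(x), len(x[0]), len(x[0][0])
    n = n_p * n_g
    out = [[[0.0] * n_d for _ in range(n_g)] for _ in range(n_p)]
    for d in range(n_d):
        vals = [x[p][g][d] for p in range(n_p) for g in range(n_g)]
        mean = sum(vals) / n
        var = sum(v * v for v in vals) / n - mean * mean
        sd = math.sqrt(var) if var > 0.0 else 0.0
        for p in range(n_p):
            for g in range(n_g):
                # A constant feature carries no information; map it to 0.
                out[p][g][d] = (x[p][g][d] - mean) / sd if sd > 0.0 else 0.0
    return out


if __name__ == "__main__":
    print(znorm([[[1.0, 5.0]], [[3.0, 5.0]]]))
```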

 

 

220

C****************************************************************
C SUBROUTINE RDFILE.FOR.
C RDFILE IS TO READ ALL THE FILES IN A MASTER FILE.
C 11 JULY 1983
C H. NAKASONE
C****************************************************************
      SUBROUTINE RDFILE(MFILE,NAME50,EXT,DEV)
      COMMON /PASS/ FMT
      LOGICAL*1 MFILE(14),LNAME(14),FMT(60),NAME50(14,50)
      LOGICAL*1 EXT(3), DEV(3)
      CALL ASSIGN(13,MFILE,14,'OLD')
      READ(13,10) LNAME,NUMF,(FMT(M),M=1,60)
   10 FORMAT(14A1,I3,60A1)
      DO 20 I = 1, NUMF
         READ(13,30) (NAME50(N,I),N=1,14)
   30    FORMAT(14A1)
         DO 40 J=1,3
            NAME50(J+11,I) = EXT(J)
            NAME50(J,I) = DEV(J)
   40    CONTINUE
   20 CONTINUE
      CALL CLOSE(13)
      RETURN
      END

 

 

 

 

 

 

 

C****************************************************************
      SUBROUTINE SUBF(NP,NG,ND)
C COMPUTATION PART FOR F-RATIO'S
C BY H. NAKASONE
C****************************************************************
      COMMON /SENSE/ X, F
      REAL X(5,10,20), SUM(10),SSQ(10),F(20)
C ND: HOLDS NUMBER OF DIMENSIONS IN EACH NP.
C NP: HOLDS NUMBER OF PATTERNS IN EACH NG.
C NG: HOLDS NUMBER OF SPEAKERS (OR GROUPS).
      DFB = FLOAT(NG - 1)
      DFW = FLOAT(NG*NP - NG)
      GN = FLOAT(NP*NG)
      SN = FLOAT(NP)
      DO 1 K = 1, ND
         GSUM = 0.
         GSSQ = 0.
         T = 0.
         DO 2 J = 1, NG
            SUM(J) = 0.
            SSQ(J) = 0.
            DO 3 I = 1, NP
               SUM(J) = SUM(J) + X(I,J,K)
               SSQ(J) = SSQ(J) + X(I,J,K)*X(I,J,K)
    3       CONTINUE
            GSUM = GSUM + (SUM(J)*SUM(J))/SN
            GSSQ = GSSQ + SSQ(J)
            T = T + SUM(J)
    2    CONTINUE
         SSTOT = GSSQ - (T*T)/GN
         SSBET = GSUM - (T*T)/GN
         SSWIT = SSTOT - SSBET
         BGMS = SSBET / DFB
         WGMS = SSWIT / DFW
         TSWG = 0.
         DO 4 J = 1, NG
            TSWG = TSWG + ( SSQ(J) - (SUM(J)*SUM(J))/ SN )
    4    CONTINUE
D        WRITE(7,10) SSTOT,SSBET,SSWIT
D        WRITE(7,12) BGMS,WGMS,TSWG
D10      FORMAT(' SSTOT=',F14.3,' SSBET=',F14.3,' SSWIT=',F14.3)
D12      FORMAT(' BGMS=',F14.3,' WGMS=',F14.3,' TSWG=',F14.3)
         F(K) = 9999.9091
         IF(WGMS.GT.0.) F(K) = BGMS / WGMS
    1 CONTINUE
      RETURN
      END
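
For one feature, the F-ratio computed by SUBF is the between-speaker mean square divided by the within-speaker mean square. The Python sketch below restates that computation under assumptions: the function name `f_ratio` and the list-of-lists input (one list of values per speaker, all of equal length) are hypothetical, and an infinite value is returned where the listing substitutes a large constant when the within-speaker mean square is not positive.

```python
import math


def f_ratio(groups):
    """One-way F-ratio for equally sized groups (one group per speaker)."""
    n_g = len(groups)
    n_p = len(groups[0])
    gn = n_g * n_p

    total = sum(sum(g) for g in groups)                 # T in the listing
    gssq = sum(v * v for g in groups for v in g)        # sum of squares
    gsum = sum(sum(g) ** 2 / n_p for g in groups)       # sum of (group sum)^2 / n

    ss_tot = gssq - total * total / gn
    ss_bet = gsum - total * total / gn
    ss_wit = ss_tot - ss_bet

    bgms = ss_bet / (n_g - 1)                           # between-group mean square
    wgms = ss_wit / (n_g * n_p - n_g)                   # within-group mean square
    return bgms / wgms if wgms > 0.0 else math.inf


if __name__ == "__main__":
    print(f_ratio([[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]))
```

A large F-ratio indicates a feature that varies more between speakers than within a speaker, which is the property sought in a feature for speaker discrimination.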

C****************************************************************
      SUBROUTINE SUBSET(AU,BK,NP,NG,ND)
C
C  TO COMPUTE MINIMUM SET DISTANCE OF SPEAKERS RECORDED
C  THROUGH TWO DIFFERENT TRANSMISSION AND/OR RECORDING
C  MEDIA.
C
C  CALLED FROM NAKRO2 AND NAKRO3 (both programs used in
C  the final stage of voice I.D. in a Ph.D. dissertation.)
C
C  21-JUL-83
C  H. NAKASONE
C****************************************************************
      REAL AU(5,10,ND), BK(5,10,ND),SETMIN(10)
C INITIALIZE
      KNTC = 0
      DO 1 I = 1, NG
         SETMIN(I) = 0.
    1 CONTINUE
      TYPE 2
    2 FORMAT(' Summary by Minimum Set distance Rule:'//
     &' Unknown Speaker ',T20,'Known Speaker',T40,
     &'Set Distance'/)
      TYPE 9
    9 FORMAT(' -------------------------')
      KUG = 0
  100 KUG = KUG + 1
      KNG = 0
   30 KNG = KNG + 1
      SMIN = 1.E20
      DO 5 JU = 1, NP
         SMAX = -1.E20
         DO 10 JK = 1, NP
            EUCL = 0.
            DO 20 K = 1, ND
               EUCL = EUCL + (AU(JU,KUG,K)-BK(JK,KNG,K))**2.0
   20       CONTINUE
            EUCL = EUCL**0.5
            IF(EUCL.LE.SMAX) GO TO 10
            SMAX = EUCL
            JKTEMP = JK
   10    CONTINUE
         IF(SMAX.GE.SMIN) GO TO 5
         SMIN = SMAX
         JUTEMP = JU
    5 CONTINUE
      SETMIN(KNG) = SMIN
      KGTEMP = JUTEMP
      IF(KNG.LT.NG) GO TO 30
      RMIN = SETMIN(1)
      KTEMP = 1
      DO 60 K = 2, NG
         IF(SETMIN(K).GT.RMIN) GO TO 60
         RMIN = SETMIN(K)
         KTEMP = K
   60 CONTINUE
      IF(KUG.EQ.KTEMP) KNTC = KNTC + 1
      WRITE(7,70) KUG,KTEMP,RMIN
   70 FORMAT(8X,I2,T25,I2,T39,F12.5,/)
      IF(KUG.LT.NG) GO TO 100
      TYPE 9
      P = KNTC * 10.
      TYPE 99,KNTC,NG-KNTC,P
   99 FORMAT(1X,/,' Correct =',I2,', Incorrect =',I2,
     &', Rate(%) =',F6.2)
      RETURN
      END

224
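
As coded in SUBSET, the set distance between an unknown and a known speaker is found by taking, for each unknown pattern, the largest Euclidean distance to any known pattern, and then keeping the smallest of those maxima. The sketch below illustrates that rule on two small point sets. The program name SETDEM, the variable names, and the data are hypothetical and are not taken from the thesis.

```fortran
      PROGRAM SETDEM
C     Hypothetical data (not from the thesis): two sets of NP = 2
C     patterns with ND = 2 features each.
      INTEGER NP, ND
      PARAMETER (NP = 2, ND = 2)
      REAL A(NP,ND), B(NP,ND), D, DMAXJ, DSET
      INTEGER JU, JK, K
C     Column-major fill: A holds (0,0) and (3,0),
C     B holds (0,1) and (0,4).
      DATA A / 0., 3.,  0., 0. /
      DATA B / 0., 0.,  1., 4. /
      DSET = 1.E20
      DO 30 JU = 1, NP
         DMAXJ = -1.E20
         DO 20 JK = 1, NP
            D = 0.
            DO 10 K = 1, ND
               D = D + (A(JU,K) - B(JK,K))**2
10          CONTINUE
            D = SQRT(D)
C           Keep the largest distance from pattern JU to set B.
            IF (D .GT. DMAXJ) DMAXJ = D
20       CONTINUE
C        Keep the smallest of those maxima over all JU.
         IF (DMAXJ .LT. DSET) DSET = DMAXJ
30    CONTINUE
      WRITE(*,*) 'SET DISTANCE =', DSET
      END
```

Here the first pattern of A lies at distances 1 and 4 from the patterns of B (maximum 4), and the second lies at distances of about 3.16 and 5 (maximum 5), so the reported set distance is 4.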

C********************************************************
C
C     SUBROUTINE EUCMAT: TO GENERATE A DISSIMILARITY
C     MATRIX BY USING EUCLIDIAN DISTANCE MEASURE.
C
C     CALLED FROM CHNRO2 AND CHNRO3.
C
C     WRITTEN BY: HIROTAKA NAKASONE
C     DATE: 28 JULY, 1983
C
C********************************************************
      SUBROUTINE EUCMAT(XU,XK,NAMOUT,ND,NP,NG)
      COMMON /NFILES/ NFILES
      REAL XU(NP,NG,ND),XK(NP,NG,ND),X(50,20),DMAT(1300)
      LOGICAL*1 NAMOUT(14),YN
C
C     ARRANGE X ARRAY SO THAT ELEMENTS 1 TO 25 WILL CONTAIN
C     DATA OF UNKNOWN SPEAKERS, AND 26 TO 50, OF KNOWN SPEAKERS.
C
      NFILES = NG*NP
      JU = 0
      DO 202 J = 2
      DO 205 I = 1,
      JU = JU + 1
      JK = JU + 25
      DO 210 K = 1, ND
      X(JU,K) = XU(I,J,K)
      X(JK,K) = XK(I,J,K)
210   CONTINUE
205   CONTINUE
202   CONTINUE
C
C     COMPUTE EUCLIDIAN DISTANCE.
C
      DO 300 I = 1, 49
      DO 310 J = I+1, 50
      SUM = 0.
      DO 320 K = 1, ND
      SUM = SUM + (X(I,K)-X(J,K))**2.
320   CONTINUE
      DMAT(INDEX(I,J)) = SUM**0.5
310   CONTINUE
300   CONTINUE
C
C     CONTINUE TO WRITE THE RESULTS.
C
      YN = .FALSE.
      CALL ASSIGN(12,'PROXTP.DAT',10,'NEW')
5400  WRITE(12,1250) NFILES
1250  FORMAT(I3,' 1'/ '(10F8.3)' )

      DO 1400 I = 1, NFILES-1
      WRITE(12,1600) (DMAT(INDEX(I,J)),J=I+1,NFILES)
1400  CONTINUE
1600  FORMAT(10F8.3)
      CALL CLOSE(12)

      IF(YN) GO TO 2100
      YN = .TRUE.
      CALL ASSIGN(12,NAMOUT,14,'NEW')
      GO TO 5400

2100  WRITE(7,2200) NAMOUT
2200  FORMAT(1X,'TWO RECORDS, PROXTP.DAT AND ',14A1,' DONE.'/)
      RETURN
      END

      INTEGER FUNCTION INDEX(I,J)
      COMMON /NFILES/ NFILES
      INTEGER ROW,COL

      ROW = MIN0(I,J)
      COL = MAX0(I,J)
      INDEX = (ROW-1)*NFILES-ROW*(ROW+1)/2+COL
      RETURN
      END

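
The INDEX function maps an unordered pair (I,J) onto a position in DMAT, which stores only the upper triangle of the distance matrix row by row. The following sketch checks that the mapping counts 1, 2, ..., N(N-1)/2 without gaps for a small N. The names IDXDEM and IPACK are hypothetical, and N is passed as an argument here rather than through COMMON /NFILES/ as in the listing.

```fortran
      PROGRAM IDXDEM
C     Hypothetical check (not part of the original listing): for
C     N items the packed index should run 1, 2, ..., N*(N-1)/2
C     when the pairs (I,J), I < J, are visited in row order.
      INTEGER N, I, J, K, NBAD, IPACK
      PARAMETER (N = 5)
      K = 0
      NBAD = 0
      DO 20 I = 1, N-1
         DO 10 J = I+1, N
            K = K + 1
C           The mapping is symmetric in its two arguments.
            IF (IPACK(I,J,N) .NE. K) NBAD = NBAD + 1
            IF (IPACK(J,I,N) .NE. K) NBAD = NBAD + 1
10       CONTINUE
20    CONTINUE
      WRITE(*,*) 'PAIRS =', K, ' MISMATCHES =', NBAD
      END

      INTEGER FUNCTION IPACK(I, J, N)
C     Same arithmetic as INDEX, with N passed as an argument
C     instead of being read from COMMON /NFILES/.
      INTEGER I, J, N, ROW, COL
      ROW = MIN0(I,J)
      COL = MAX0(I,J)
      IPACK = (ROW-1)*N - ROW*(ROW+1)/2 + COL
      RETURN
      END
```

With N = 5 the program visits 10 pairs and reports no mismatches. The formula works because row ROW is preceded by (ROW-1)*N - ROW*(ROW-1)/2 stored elements, and the element sits COL - ROW places into its row.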

C*************************************************
C
C     SUBROUTINE SETDIS:
C
C     SETDIS IS TO FIND THE MINIMUM SET DISTANCE BETWEEN
C     TWO SPEAKERS EACH REPRESENTED BY A VECTOR OF 9 OR
C     10 DIMENSIONS (FEATURES).
C
C     SETDIS IS CALLED FROM VOICE IDENTIFICATION PROGRAMS,
C     NAKRO4 AND NAKROA.
C
C     21-JUL-83
C     H. NAKASONE
C
C*************************************************
      SUBROUTINE SETDIS(DMAX,KUG,JTMP,TMIN,N)
      DIMENSION DMAX(5,10)

      JTMP = 1
      TMIN = 1.E30
      DO 20 J = 1, N
      SETMIN = DMAX(1,J)
      DO 10 I = 2, 5
      IF(DMAX(I,J).GT.SETMIN) GO TO 10
      SETMIN = DMAX(I,J)
10    CONTINUE
      IF(SETMIN.GT.TMIN) GO TO 20
      TMIN = SETMIN
      JTMP = J
20    CONTINUE

      TYPE 30,KUG,JTMP,TMIN
30    FORMAT(' BY SET DISTANCE RULE:'/'
     & UNKNOWN ',I2,' IS IDENTIFIED WITH KNOWN ',I2,/,'
     & WITH THE MINIMUM SET DISTANCE =',F12.6,/)

      RETURN
      END
