NEURAL NETWORKS FOR DYNAMIC PROGRAMMING

By

Chinchuan Chiu

A DISSERTATION

Submitted to Michigan State University in partial fulfillment of the requirements for the degree of

DOCTOR OF PHILOSOPHY

Department of Electrical Engineering

1991

ABSTRACT

NEURAL NETWORKS FOR DYNAMIC PROGRAMMING

By Chinchuan Chiu

This dissertation presents a fundamentally new and different artificial neural network approach to dynamic programming, including network formulation, analysis, simulation, implementation, and applications. The proposed artificial neural network method is attractive due to its robustness and radically improved speed over conventional techniques. The network, based on the Hopfield model, is defined by an energy function in which the optimal solution corresponds to the lowest energy state. The functionality and the equilibria of the formulated network are analytically proved from the energy function point of view. The quality of the solutions, the network behavior, and the basins of attraction are thoroughly investigated by simulation. An architecture design well-suited to current VLSI technology, in which the network is constructed from neuron array and weight assignment chips, is described. Furthermore, this new approach has been successfully applied to different applications such as optimal control, autoassociative memory, and speech recognition.

ACKNOWLEDGEMENTS

I sincerely thank Dr. M. A. Shanblatt, my major advisor, for his helpful guidance and support throughout the years of my doctoral studies. I also thank Dr. E. Goodman, Dr. P. D. Fisher, Dr. R. Enbody, and Dr. C. Ganser for their valuable comments and suggestions on this work. Additionally, I wish to express my gratitude to Dr. J. R. Deller and Mr. M. S. Liu for providing the cepstral coefficients of the vocabulary set used in the speech recognition experiments. Finally, I want to dedicate this dissertation to my wife Lingming and my adorable daughters, Honyin and Wileen, for they have been sharing love and joy with me.
TABLE OF CONTENTS

LIST OF TABLES
LIST OF FIGURES
1.0 Introduction
1.1 Overview of Neural Networks
1.2 Statement of Problem
1.3 Research Tasks
1.4 Organization of This Dissertation
2.0 Background
2.1 Artificial Neural Network Models
2.1.1 Single-layered Perceptrons
2.1.2 Multi-layered Perceptrons
2.1.3 An Example
2.2 The Hopfield-Tank Model
2.3 Dynamic Programming Problems
3.0 Network Formulation and Analysis
3.1 Network Formulation
3.2 Network Analysis
3.2.1 Definitions
3.2.2 Analysis of Network Functionality
3.2.3 Analysis of Minimum States
3.2.4 Location of Minimum States
3.2.5 Example
3.3 Summary
4.0 Network Simulation
4.1 Network Simulation
4.1.1 Simulation Algorithm
4.1.2 Quality of Solutions and Function Parameters
4.1.3 Neural Network Solutions for Different Problems
4.1.4 Basins of Attractions
4.1.5 Network Behavior due to Connection Weight Errors
4.2 Application to an Optimal Control Problem
4.3 Summary
5.0 A Building Block Architecture and Problem Decomposition
5.1 A VLSI Building Block Architecture
5.2 Problem Decomposition
5.2.1 Algorithm
5.2.2 Examples
5.2.3 Simulation
5.3 Summary
6.0 Application: Incremental Autoassociative Memory
6.1 Introduction
6.2 Autoassociative Memory Network
6.2.1 Formulation
6.2.2 Definitions
6.2.3 Analysis
6.3 Examples
6.3.1 Initial Conditions
6.3.2 Simulation
6.3.3 Basins of Attraction
6.3.4 Examples
6.4 Fault Tolerance in the Autoassociative Memory Network
6.5 Comparisons
6.6 Summary
7.0 Application: Dynamic Time Warping in Speech Recognition
7.1 Introduction
7.2 Dynamic Time Warping
7.2.1 A Neural Network for Dynamic Time Warping
7.3 Network Simulation
7.3.1 Vocabulary Set
7.3.2 Initial Conditions
7.3.3 Results
7.4 A Speech Recognition System Based on the Proposed Network
7.5 Summary
8.0 Conclusion
8.1 Summary
8.2 Contributions
APPENDIX A
APPENDIX B
BIBLIOGRAPHY

LIST OF TABLES

Characteristics of neural networks and conventional computers.
Comparison of D between two groups of simulations.
Lookup table of P for some values of D.
Local distances and path found by the algorithm for the 6x6 DPP. The total length of the selected path = 3 + 1 + 1 + 3 + 3 + 1 + 1 = 13.
a) Initial neuron outputs for the 6x6 DPP; b-d) Neuron outputs after 25, 50, and 100 iterations, respectively.
Simulation results for DPPs with different sizes. The parameter pair (α, β) has been chosen as (50, 3) for instances with the number of stages less than 8, (5, 1) for instances with the number of stages greater than 16, and (25, 3) for instances with the number of stages in between.
Number of attracted initial conditions to global and local minima with different initial conditions.
Number of attracted initial conditions to global minima with different values of β.
Comparisons of solution quality, processing time, and the network size required for the methods with and without decomposition for the first example.
Number of attracted patterns for global and local minima with different initial conditions.
Number of attracted patterns for global minima with different values of β.
Success rate of recall for the second and third example with different percentages of pixels being incorrect.
Comparisons of Kosko, Wang, and the proposed network for memory patterns with a length of 100 bits.
Comparisons of distances obtained by the conventional and neural network methods for DTW for one of the test patterns, A.
Major misclassifications of the simulation for the neural network methods. The second column lists the misclassified patterns for some of the test patterns in the first column. The third column lists the number of misclassifications. For example, B is misclassified 4 and 2 times in 10 attempts as D or V in the methods with one and five reference patterns, respectively.

LIST OF FIGURES

A 3x6 dynamic programming problem (full interconnection not shown).
Simplified representation of a neuron.
Neuron working functions.
A 2-layered feed-forward neural network (connections are not fully shown).
A feedback neural network.
An XOR network.
A basic neuron of the Hopfield-Tank model.
A general Hopfield-Tank artificial neural network.
A conceptual 3x6 dynamic programming network (full interconnections not shown) and its processing element.
a) Data for demonstrating the example, b) Energy surface for E1, c) Energy surface for E2, and d) Energy surface for E = 25E1 + E2.
a) The relation between the average normalized path length D and different values of β for 8x8 and 16x16 DPPs and b) The relation between the average normalized path length D and different values of β for 32x32 DPPs.
a) Initial neuron outputs for the 6x6 DPP; b-d) Neuron outputs after 25, 50, and 100 iterations, respectively.
A problem consisting of two minimum paths shown in thick lines. The local distances for the two minimum paths are all zero; other local distances, not shown, are all set to 50.
a) Optimal state trajectory and selected state trajectory for x(0) = 1.0. b) Optimal state trajectory and selected state trajectory for x(0) = -1.0. The curves of state trajectories have been smoothed.
A building block architecture (only 3 stages shown) for the dynamic programming neural network.
a) A neuron array. b) A neuron design.
A connection matrix design.
A neuron processing system for dynamic programming.
a) A 4 x 4 problem is divided into 4 2 x 2 sub-problems. b) A new 2 x 2 problem with each state representing a 2 x 2 sub-problem.
An example illustrating how to determine the states for each stage.
Different solutions for the first example.
Different solutions for the second example.
A 6 by 6 network (full connection not shown) which can store patterns ..., (1,1,1,0,0,0), and (1,0,1,0,1,0). Three shortest paths are shown in ...
a) Eight single-error-correcting code words composed of 3 message bits and 3 check bits and parity check table. b) 6-cube map for the code of (a).
a) Three stored patterns. b) Three test patterns with random noise. c) Three test patterns with opaque image.
Fault recovery process. The thick line represents the new minimum cost path.
An example of the time warping function and search space of the optimal path. The path corresponding to this warping function is shown by the hatched line.
The block diagram of the speech recognition system which includes the neural network for dynamic time warping.
The pipelined local distance calculator.

Chapter 1

1.0 Introduction

Artificial neural networks are a fundamentally new and different approach to information processing.
They have proven to be very efi’ective in various applications such as pattern recognition, associative memory, control, robotics, and combinatorial optimization. This dissertation presents a new method for dynamic programming using a feedback Hopfield-Tank type network, including network formulation, theory, implementation, and applications. This Chapter begins with a brief overview of the field of neural networks. The problem to be solved in this dissertation is then stated, followed by the research tasks. Finally, the organization of this dissertation is outlined. 1.1 Overview of Neural Networks The past few years have seen a resurgence of research in the field of neural networks promoted by advances in electronic circuit technology as well as a deeper understanding of the functioning of brain[1-3]. One motivation is a desire to build a new breed of powerful computers (the sixth generation) to solve a variety of problems that are very difficult to solve, or at least computationally cumbersome, with conventional digital 2 computers. Cognitive tasks such as recognizing a familiar face, learning to speak, and understand a natural language, retrieving contextually appropriate information from memory, and guiding a mechanical hand to grasp objects of different shapes and sizes are representative examples of these problems [4-6]. All require some form of pattern recognition, pattern matching, nonlinear discrimination, and/or combinatorial optimization. Quite interestingly, these tasks are analogous to those typically performed naturally and robustly by the brain. For example, humans can usually recognize a familiar face in about 200 milliseconds [7]. The state-of-the—art image processing system can not even come close to this performance. Moreover, the brain is remarkable in that this performance is obtained by a system whose individual components, neurons, are individually much slower than currently used electronic components [7,8]. Another motivation is a desire to develop cognitive models that can serve as a foundation for artificial intelligence [9-11]. Although it is well known that the brain is not as good as a digital computer at performing arithmetic operations, there are several aspects of brain function that we are not able to duplicate with conventional computers. Some of these are association, categorization, generalization, feature extraction, and optimization [12,13]. Many researchers have been developing models to study neural networks. These models fall into two categories. One is biological modeling with a goal to study the structure and functions of the real brain in order to explain biological behaviors. The other is technological modeling which studies brains in order to extract concepts to be utilized in new and more powerful computer architectures [14-16]. Although there is a controversy over which one of these two branches should constitute the true focus of research in neural network modeling, the latter path has been taken by many working in the area of what has come to be called artificial neural networks (ANN). An ANN is a parallel distributed information processing system which consists of processing elements (neurons) and connections (synapses). Each processing element, 3 characterized by its working function, can receive input signals from a certain number of incoming connections and produce output signals to a certain number of outgoing connections. 
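As a minimal, self-contained illustration of such a processing element, the C fragment below forms the weighted sum of m incoming signals, offsets it by a threshold, and passes it through a working function, anticipating the neuron models reviewed in Section 2.1. The function names and the particular choice of a sigmoid working function are illustrative assumptions only, not a specification taken from this dissertation.

#include <math.h>
#include <stddef.h>

/* A sigmoid working function bounded in (0, 1). */
static double working_function(double activation)
{
    return 1.0 / (1.0 + exp(-activation));
}

/* Output of one processing element: the weighted sum of its m incoming
   signals x, offset by a threshold theta, passed through the working
   function. */
double neuron_output(const double *x, const double *w, size_t m, double theta)
{
    double activation = -theta;
    for (size_t k = 0; k < m; k++)
        activation += w[k] * x[k];
    return working_function(activation);
}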
Table 1-1 contains a comparison of some of the characteristics of neural networks and conventional digital computers.

Characteristics            Neural Networks    Conventional Computers
Memory Structure           Distributed        System-Dependent
Memory Recall              Associative        Specific Input
Fault-Tolerance            Inherent           Not Inherent
Pattern Recognition        Excellent          Poor
Classification             Excellent          Poor
Learning                   Excellent          Poor
Arithmetic Operation       Poor               Excellent
Timing Scheme              Asynchronous       System-Dependent
Degree of Parallelism      High               System-Dependent
Processing Element         Simple             Complex
Degree of Connectivity     High               Low

Table 1-1. Characteristics of neural networks and conventional computers.

The stored information and network output of an ANN are determined by its structure, processing elements, and connections. Since the output results are collective decisions of all parameters together, information lost to faults such as damaged connections and malfunctioning processing elements will still yield correct outputs as long as the remaining parts of the network overrule the faulty ones. This gives ANNs an inherent fault-tolerance property. Massive parallel computation, with many processing elements operating simultaneously, is another important feature of ANNs; it is essential for applications requiring a high computation rate, such as pattern recognition and combinatorial optimization.

The history of neural network research has been intertwined with symbolic artificial intelligence since the beginning of both fields [17-19]. In the late 1950s, when a group of scientists turned their attention to attempting to build intelligent systems, two different approaches emerged. One focused on how the brain did things; the other concentrated on what the brain did, regardless of how it was accomplished biologically. The second approach, artificial intelligence, became favored over the first because of rapid advances in software technology, which provided flexible and powerful tools for testing models and concepts. In the early 1980s, several key papers published by Hopfield, Grossberg, and Rumelhart spurred a revival of interest in the first approach - neural networks [2-4].

The Hopfield-Tank network, one of the most well-known artificial neural networks, has been applied to combinatorial optimization problems such as the traveling salesman problem (TSP) [20-23], linear programming problems [24,25], nonlinear programming problems [26,27], communication network routing problems [28,29], and others [30-33]. Due to the gradient property of the Hopfield-Tank network [2], each of these problems can be solved by designing an appropriate network whose minimum energy states correspond to the problem's solutions. In particular, a dynamic programming neural network has been developed for dynamic programming problems using a feedback Hopfield-Tank type network [34,35]. This network formulation has been shown to robustly provide near-optimal solutions in a very short period of time, independent of network size. It has also been shown to be a gradient system, based on the fact that the derivative of the associated energy function along all trajectories is less than or equal to zero. The network converges to one of the stable equilibria (minimum states of the associated energy function) if the initial conditions are sufficiently close to that stable equilibrium. The stable equilibria can be regarded as near-optimal solutions to the problem.
1.2 Statement of Problem A typical dynamic programming problem (DPP), shown in Figure 1-1, can be modeled as a set of source and destination nodes with n intermediate stages, m states in each stage, and metric data dxi’ (x + I ) j’ where x is the index of stages, and i and j are the indices of the states in each stage. The object is to find an optimal path composed of one and only one state in each stage fiom source to destination. The conventional approach uses the principle of optimality which requires a large number of computations to determine the optimal solution [35]. In fact, the number of computations is so great for large dynamic programming problems, that the conventional algorithms are not effective in many applications. This is especially true for those requiring fast, perhaps real-time, solutions. Additionally, computation in the conventional algorithms for dynamic programming is executed often sequentially, therefore the computation time is proportional to the problem size. This is primarily due to the fact that the basic algorithms are only partially parallelizable. Interestingly though, in many real-time dynamic programming applications, the stage 1 stage 2 stage 3 Destination Figure 1-1. A 3x6 dynamic programming problem (full interconnection not shown). 7 rapid calculation of near-optimal solutions is sufficient when contrasted to a slowly computed globally optimal solution. For example, robot trajectory planning problems, aircraft trajectory control problems, and other optimal control problems that must respond quickly to radically changing environmental conditions are of this type. To solve dynamic programming problems with an artificial neural network, an appropriate network formulation must first be determined such that the problem solution corresponds to the global optimal solution, or at least a local near-optimal solution. The essential issues of stability and convergence of the formulated network must be ensured so that the network always converges to one of the stable equilibria. In order to obtain good solutions for problems, the qualitative network behavior must be understood. As will be seen in Chapter 3, this network characteristic is determined by its associated energy function. Finally, some of the implementation problems for this network and networks of this type must be addressed. 1.3 Research Tasks The tasks of this research are to (l) formulate an artificial neural network for dynamic programming; (2) analyze the formulated network from the view point of associated energy function; (3) verify the formulation and analysis through network simulation; (4) demonstrate the applicability of the formulated network by solving optimal control, associative memory, and speech recognition problems; and (5) propose an architecture for the formulated network. The network formulation will be based on the I-Iopfield-type network. To design an Hopfield-type network for solving a specific problem and determine the connection weights and parameters to achieve the best solutions to the problem, a general procedure goes as follows: (1) Select an encoding procedure such that the outputs of the network correspond to the solutions of the problem. 8 (2) Choose a proper energy function, bounded from below, whose minimum corresponds to the optimal solution of the problem. (3) Determine connection weights and bias current which properly represent the objective function and constraints of the problem. 
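Before turning to the network formulation itself, it helps to fix what the raw problem data just described might look like in code. The C sketch below is only an illustration of the problem statement under stated assumptions: the type and function names (dpp_t, path_length) are hypothetical, and the metric is stored as d[x][i][j], the distance from state i of stage x to state j of stage x+1, with stage 0 holding the single source node and stage n+1 the single destination node.

/* A dynamic programming problem with n intermediate stages and m states
   per stage.  d[x][i][j] is the distance from state i of stage x to
   state j of stage x+1; stage 0 is the source (one state) and stage n+1
   is the destination (one state). */
typedef struct {
    int n;          /* number of intermediate stages      */
    int m;          /* number of states in each stage     */
    double ***d;    /* metric data d[x][i][j], x = 0 .. n */
} dpp_t;

/* Total length of a candidate path.  path[x] is the state selected in
   stage x, for x = 1 .. n; the source and destination are implicit. */
double path_length(const dpp_t *p, const int *path)
{
    double total = p->d[0][0][path[1]];               /* source to stage 1      */
    for (int x = 1; x < p->n; x++)                    /* stage x to stage x+1   */
        total += p->d[x][path[x]][path[x + 1]];
    total += p->d[p->n][path[p->n]][0];               /* stage n to destination */
    return total;
}

Since one of m states must be chosen independently in each of the n stages, there are m^n candidate paths, which is why exhaustive evaluation of path_length quickly becomes impractical and why fast, collective computation of near-optimal paths is attractive.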
To formulate a neural network for dynamic programming, each state node will be considered as an individual neuron. Examining the characteristics of the optimal path carefully, two constraints become evident. First, the optimal path must visit one and only one state in each stage (a structural constraint). Secondly, the optimal solution must have the minimum total cost based on the given performance measure (a cost constraint). Thus, the energy function has two requirements. The structural constraint implies that the energy function must converge to stable states where one and only one state in each stage is active. The cost constraint dictates that the energy function must converge to stable states representing a minimum path. After the associated energy function has been determined, the network can be formulated as will be shown in Chapter 3. The second task is a two-fold analysis of the network. The first objective is to explain why the formulated network is functional by investigating the following properties. (1) The network must possess a unique solution for every initial condition. (2) The time derivative of the associated energy function decreases monotonically along the trajectories and becomes zero only at equilibrium points. (3) The network must have a finite number of equilibrium points. (4) Every equilibrium point of the network must be a local minimum of the associated energy function. Property 1 will ensure the existence and uniqueness of the solution. Property 2 will guarantee the stability and convergence of the network. The possibility of existing periodic solutions will be avoided by Property 3. Finally, Property 4 will provide a way to find the equilibria from the associated energy function. Once the above properties are verified, the network will be shown to converge to one of the equilibria, if the initial condition is in one 9 of the basins of attraction of the equilibrium points. The goal of designing the network is to obtain near-optimal solutions for dynamic programming problems. Therefore, the equilibria must reside in the regions which can be regarded as valid and near-optimal solutions. The second objective then is to analyze the relationship between the quality of the solutions and the parameters of the associated energy function, derivation of the locations and numbers of the minimum states for different components of the energy function, and the locations of the minimum states of the energy function with different values of parameters. The third task involves a simulation which will demonstrate the functionality of the formulated network and show that it can produce near-optimal solutions for DPPs. Since the network is described by a set of first order differential equations, the network can be simulated by solving the associated differential equations with numerical analysis techniques running on a conventional computer. Simulation programs written in C and based on a fourth-order Run ge-Kutta method for the formulated network will be developed. 
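As a rough sketch of what such a simulation program looks like, the C fragment below advances the vector of neuron inputs by one classical fourth-order Runge-Kutta step. It is a generic integrator written under stated assumptions rather than the simulator developed for this work: deriv() is a placeholder for the right-hand side of the network equations derived in Chapter 3, and the names rk4_step and deriv are illustrative only.

#include <stdlib.h>

/* Right-hand side of the network equations: given the N neuron inputs u,
   write du/dt into dudt.  The context pointer carries the connection
   weights and bias currents. */
typedef void (*deriv_fn)(const double *u, double *dudt, size_t N, void *ctx);

/* One classical fourth-order Runge-Kutta step of size h applied in place
   to u.  (Allocation checks are omitted for brevity.) */
void rk4_step(double *u, size_t N, double h, deriv_fn deriv, void *ctx)
{
    double *k1  = malloc(N * sizeof *k1);
    double *k2  = malloc(N * sizeof *k2);
    double *k3  = malloc(N * sizeof *k3);
    double *k4  = malloc(N * sizeof *k4);
    double *tmp = malloc(N * sizeof *tmp);

    deriv(u, k1, N, ctx);
    for (size_t i = 0; i < N; i++) tmp[i] = u[i] + 0.5 * h * k1[i];
    deriv(tmp, k2, N, ctx);
    for (size_t i = 0; i < N; i++) tmp[i] = u[i] + 0.5 * h * k2[i];
    deriv(tmp, k3, N, ctx);
    for (size_t i = 0; i < N; i++) tmp[i] = u[i] + h * k3[i];
    deriv(tmp, k4, N, ctx);

    for (size_t i = 0; i < N; i++)
        u[i] += (h / 6.0) * (k1[i] + 2.0 * k2[i] + 2.0 * k3[i] + k4[i]);

    free(k1); free(k2); free(k3); free(k4); free(tmp);
}

In a simulation, such a step is applied repeatedly from a chosen initial condition until the neuron outputs settle, which is the procedure whose results are reported in Chapter 4.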
This task will be divided into four sub-tasks which are to: (l) depict the relationship between the quality of solutions and the parameters of the associated energy function; (2) simulate different networks ranging from 2 x 2 to 64 x 64; (3) simulate networks with random errors to connection weights; (4) simulate networks with different initial conditions and different parameters of the associated energy function; and (5) simulate large-scale dynamic programming problems with networks of smaller size by decomposing the problems. The first sub-task can provide appropriate values of the parameters for the subsequent simulations. The results of the second sub-task will serve as an index to the success of the formulation of the networks. The Hopfield-Tank networks are shown to be gradient systems only when the networks are symmetric. However, it is believed that 10 sufficiently small changes of the connection weights will not alter the qualitative properties of the networks [37]. The third sub-task will show the network sensitivity to errors in the connection weights. The fourth sub-task will qualitatively show how the parameters affect the basins of attraction of the equilibrium points. The implementation of large scale neural networks has been severely restricted by current technology. One way to cope with this problem is to use a divide-and-conquer method. The advantage of this method is that the size of the network used to solve larger problems will remain small. However, the time to find a solution will increase accordingly. In the last sub-task, the reduction of the network connectivity, the problem-solving time, and the quality of the solutions will be examined. For the fourth task, three problems in different applications, optimal control, associative memory, and speech recognition, will be solved by the formulated network. Dynamic programming is one of the methods which have been used to solve optimal control problems. A continuous system described by a first order differential equation will be used as an example to show how dynamic programming neural networks can produce near-optimal solutions based on a predefined perfomrance measure. Before the network can be applied, the system differential equation must be approximated by a corresponding difference equation, and the performance measure must be expressed or approximated in a discrete form. Then the structure and connection weights of the network can be determined. The solutions produced by the networks will be compared to the optimal solutions. One of the most useful and most explored areas of applications for artificial neural networks is associative memory. Suppose that a set of memory patterns [Y1 , Y2,..., 1"} is to be stored, where Y' e R" for r = 1 to 1. Given a stimulus X = I" + e where e is sufficiently small, an associative memory is designed in such a way that the output pattern I" can be recalled successfully. The Hopfield—Tank model is reportedly capable of implementing associative networks [7]. However, some of the proposed networks do not guarantee that l 1 each desired memory pattern can be stored [7 ,38,39], and some of them suffer from a rather complex synthesis procedure [40,41]. The network can be used to store patterns by selecting an appropriate working function for processing elements and connection weights between processing elements so that the patterns correspond to the minima of the network. 
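For orientation, the small C routine below states the recall task itself rather than the network that solves it: given a stimulus, report which stored binary pattern is nearest in Hamming distance. It is a hypothetical test harness of the kind that can be used to judge recall success; the function name nearest_pattern and the integer bit-vector representation are assumptions, not part of the proposed autoassociative memory network.

#include <stddef.h>

/* Hamming distance between two binary patterns of the given length. */
static size_t hamming(const int *a, const int *b, size_t len)
{
    size_t d = 0;
    for (size_t k = 0; k < len; k++)
        if (a[k] != b[k])
            d++;
    return d;
}

/* Index r of the stored pattern Y[r] closest to the stimulus x.  A recall
   is counted as successful when the memory network, started from the
   stimulus x = Y[r] + e with small noise e, settles on Y[r]. */
size_t nearest_pattern(const int *x, const int *const *Y,
                       size_t num_patterns, size_t len)
{
    size_t best = 0;
    size_t best_d = hamming(x, Y[0], len);
    for (size_t r = 1; r < num_patterns; r++) {
        size_t d = hamming(x, Y[r], len);
        if (d < best_d) {
            best_d = d;
            best = r;
        }
    }
    return best;
}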
In the template matching approach for speech recognition, the time scales of a test and a reference speech pattern are generally not perfectly aligned, therefore a nonlinear time warping is required to compensate for local compression and/or expansion of the time scales in order to find the best difference measure between them. Dynamic time warping methods based on dynamic programming techniques have been developed to perform time alignment and evaluate the difference between a test and a reference speech pattern [42- 45]. The network for dynamic time warping is designed in such a way that the minimum states of the network’s energy function correspond to the near-best correlation between a test and a reference pattern [46]. Although artificial neural networks are composed of basic circuits emulating neurons and synapses, the implementation of AN Ns has been seveme restricted in size by current VLSI technology limits such as real-estate considerations, and fan-in, fan-out, and connectivity limitations [47, 48]. To alleviate some of these implementation problems, a building block paradigm, in which neural networks are constructed by numbers of VLSI chips implementing neurons and synapses separately, will be used. Eberhardt, et. al. have designed various CMOS VLSI building block components in a single chip, such as a synapse chip and neuron array chip, for feed-forward networks [47]. Based on their designs, a simple, regular, and implementable architecture will be proposed for dynamic programming neural networks. In addition, the proposed architecture can serve as a co- processor such that the connection weights can be programmed via a host computer so that different DPPs may be solved. 12 1.4 Organization of This Dissertation This dissertation is organized as follows. Chapter 2 contains a background discussion of the appropriate topics. Chapter 3 presents a general neural network structure for dynamic programming and analyzes the network formulation. For the first part, an artificial neural network formulation for solving the dynamic programming problem is presented. The formulation procedure is implemented and demonstrated using a modified Hopfield-Tank ANN. In the second part, an analytical examination of the energy function associated with a dynamic programming neural network is presented. The analysis is carried out in two steps. First, the locations and numbers of the minimum states for different components of the energy function are investigated in the extreme cases. Then, the locations of the minimum states of the energy function using different parameter values are derived. This analysis provides a theoretical foundation for dynamic programming neural networks. Chapter 4 presents simulation results which will demonstrate the functionality of the formulated network and show that it can produce near-optimal solutions for different problems. The relationship between the quality of solutions and the parameters of the associated energy function, the network behavior due to random errors in connection weights, and the basins of attraction of the network’s equilibria are studied. 
Moreover, the simulation results of different networks ranging from 2 x 2 to 64 x 64; are also presented Chapter 5 proposes an building block architecture for dynamic programming neural networks and a problem decomposition method The proposed architecture can serve as a co- processor such that the connection weights can be programmed via a host computer so that different DPPs may be solved The problem decomposition based on a divide-and-conquer method can be used to solve large-size dynamic programming problems with networks of smaller size. Chapter 6 presents a new autoassociative memory network using dynamic programming neural networks.The formulation, based on dynamic programming techniques used to find minimum cost paths, is very simple and straightforward in the sense that memory 13 patterns can be easily added to an existing associative memory without any modification to the previous network connections. Two examples, with the first one designed to correct single-error codes and the second to distinguish three images, are used to demonstrate the power of the formulated network. The issue of fault tolerance on the autoassociative memory network is also considered. Comparisons between the proposed network and the other two networks are also given in this chapter. Chapter 7 presents a neural network method for dynamic time warping on speech recognition. The network for dynamic time warping is designed in such a way that the minimum states of the network’s energy function correspond to the near-best correlation between a test and a reference pattern. To achieve real-time applications, an architecture based on this method is also proposed. Finally, Chapter 8 describes the contributions and conclusions of this dissertation. Chapter 2 2.0 Background This chapter begins with an introduction of artificial neural network models. The Hopfield model is then presented. Finally, the dynamic programming problems which will be dealt with in this dissertation are defined. 2.1 Artificial Neural Network Models Scientists and researchers have been trying to build neural network models for the purpose of studying the function of the brain for many years. These models, generally involving very complicated mechanisms and structures, fall into two categories. In biological modeling, which is of primary interest to neurobiologists, the goal is to study the structure and function of the real brains in order to explain biological behaviors. In technological modeling, which is of primary interest in this work, the goal is to study brains in order to extract concepts for the new generation computer architectures. Although there is a controversy over which one of these two branches should constitute the true focus of research in neural network modeling, the latter path has been taken by many working in the area of artificial neural networks. An artificial neural network is a parallel distributed information processing system 14 15 consisting of processing elements (neurons) and connections (synapses). Each processing element, characterized by its working function, can receive input signals from a certain number of incoming connections and produce output signals to a certain number of outgoing connections [49]. Neural networks incorporate a combination of certain features of various information processing systems together with some special features. 
These special features include the use of simple processing elements, learning abilities to adjust parameters and connection weights to give the desired responses as well as to compensate for inaccuracies and faults in the hardware, and fine-grain and massive-scale parallel problem solving ability. The combination of these features results in significant advantages for several classes of information processing problems. Massively parallel algorithms can provide rapid processing if implemented on an appropriate processing architecture, especially one which is matched to the algorithm. The use of only a few different type of simple processing elements can facilitate fabrication of such massively parallel systems. The adaptive feature of the processing elements also allows some element inaccuracies to be compensated. Learning ability also provides a very important advantage in dealing with the detailed knowledge necessary to build the system. A neuron is the basic cell of a neural network. The simplest artificial neuron model was introduced by McCulloch and Pitts [50]. This model sums the weighted inputs and passes the result through a certain function. The output of the function, considered as the output of the neuron, is branched out via weighted connection to the inputs of other neurons. This simplified representation is shown in Figure 2-1 where xi’s are the inputs from other neurons, wij is the weight of the connection from the output of neuron i to the input of neuron j, g(.) is the neuron working function, 0- is the threshold value, and yj is the output. 16 Figure 2- 1. Simplified representation of a neuron. The neuron working function is commonly one of the three types: hard limiters, threshold logic elements, and sigmoid nonlinearities. These input/output relationships are shown in Figure 2-2. A more complex neuron may include temporal integration or other types of time dependencies and more complex mathematical operations than summation [51]. The sigmoid nonlinearity is the most commonly used because it is bounded, monotonically increasing, and most resembles a real neuron working function. The range of the sigmoid function can be [0, 1] for binary mode or [-1, 1] for bipolar mode, depending on the application. Groups of artificial neurons can be interconnected in a variety of ways to form AN Ns. Many topological configurations for ANN s have been proposed. These networks fall into two categories: feed-forward and feedback networks. A feed-forward network is shown in Figure 2—3. The signal flow of a feed-forward network is unidirectional. Feed- forward networks can be single-layered or multi-layered Figure 24 shows a feedback network. Both forward and backward connections exist in feedback networks. Feed- forward networks have been used in applications such as pattern recognition [52-56], 17 8(1) ‘ g(x) D < x ' 'Y 7 (a) Hard limiter. (b) Threshold logic element. A 800 f > x V (c) Sigrnoid nonlinearity. Figure 2-2. Neuron working functions. 18 (0‘9 9'0 o'o‘e Output la er . e . y . . 0?. -’o ' o P Figure 2-3. A 2-layered feed-forward neural network (connections are not fully shown). Figure 2-4. A feedback neural network. l9 robotics [57], and control problems [58-60]. However, feedback networks are useful in optimization [61,62], object recognition [63-65], and associative memory problems [11,12]. 2.1.1 Single-layered Perceptrons A single-layered feed-forward ANN can be created from an array of n artificial neurons. 
Each of these receives m inputs x1, x2,..., xm, via corresponding weights. This network is able to learn to recognize simple patterns via a convergence procedure for adjusting weights which stores patterns in a collective fashion [66,67]. This convergence procedure goes as follows. First, connection weights and the threshold values, each of which can be considered as another weight associated with a fixed input value, are initialized to random nonzero values. Second, a training pattern with n features is applied to the input and then the output is computed. Third, the connection weights are adapted according to the following equation Wij(t+1) = W3} (0 +11 [dj(t) ‘Yj (0], (2'1) where d}(t) is the desired output of neuron j, t is the number of iterations, and n is a positive gain factor less than 1. The procedure is repeated until the values of Wij(t+1) - Wi}(t) are less than a predefined value for all i and j. In the simplest case with n = l, the network is a single-neuron network shown in Figure 2-1. This network, called a perceptron, can decide whether an input belongs to one of two classes (denoted as class A or B) after a sufficient training process. The perceptron uses a hard limiter such that the output is either 1 or -1. The network will respond as class A if the output is 1 and class B ifthe output is -1. Unfortunately, the single-layered perceptron is not able to solve problems where classes cannot be separated by a hyperplane or a line in two dimension [19,25]. The XOR 20 problem which is nonseparable by a line is an example of this problem type. 2.1.2 Multi-layered Perceptrons Multi-layered perceptrons are feed-forward networks with one or more layers of neurons between the input and output layer. A two-layered perceptron with one hidden layer is shown in Figure 2-3. With the development of the back-propagation training algorithm, multi-layered perceptrons overcome many of the limitations of single-layered perceptrons [3]. The multi-layered perceptrons with a back-propagation training algorithm have been tested successfully with a number of deterministic problems such as the XOR problem [3], pattern recognition problems [11], and others [68,69]. The back-propagation algorithm uses a gradient search technique to minimize a cost function equal to the mean square difference between the desired and the actual outputs [3]. The network is trained by initially selecting the weights randomly and then the intermediate outputs corresponding to a given input pattern from layer to layer in a forward fashion are calculated. The final layer output is compared to the desired output and the error is passed in a backward direction to adjust the connection weights. The modification of weights is carried out from layer to layer in a backward fashion. The weights are adjusted after each trial until weights converge and the cost function is reduced to an acceptable value [3]. 2.1.3 An Example One of the most famous backpropagation networks is the XOR multi-layered perceptron. This network was used to solve the logic exclusive-or problem by training with examples and can be also used to show how neural networks do information processing. Though not particularly interesting from an application perspective, the exclusive-or problem was not solvable using the neural networks of the early 60’s [8]. Figure 2-5 shows this network. Processing elements 1 and 2, which are in the input layer, simply pass whatever input is applied to them directly to their output. 
Processing 21 elements 3 in the hidden layer and 4 in the output layer function the same way as shown in Figure 2-1. The internal activation level for processing elements 3 and 4 is computed from the weighted sum of the inputs. This internal activation level is transformed into the output by its sigmoid function. The lines between processing elements are connections. The arrows show the direction of information flow of a connection. The weight for each connection modulates the output value of the source processing element before it is passed on to the destination. The two inputs, T1 and T2, can take on any value between 0.0 and 1.0. The output ranging from 0.0 to 1.0 of processing element 4 is the output of the network The training process involves applying T1 and T2 to the network, and presenting the corresponding output to the network. With each iteration, the network slightly changes the strength of the connection weights. Over the course of about 100 iterations of each case in the training set, the weights converge to the results shown in Figure 2-5. W4,2 W3’0 = 2.8, W3’1 = ~7.4, W32 = -7.4 “(4,0 = 8.6, W4’1 = -5.6, W42 =-5.6 inputs T1 T2 Figure 2-5. An XOR network. 22 2.2 The Hopfield-Tank Model The Hopfield-Tank model falls into the feedback network category. One of the major contributions of the Hopfield-Tank model is that it can be built with analog components and is suitable for analog VLSI implementation [70-74]. In this model, each neuron with input u,- and output v,- is modeled as an amplifier with a capacitive element C,- and a resistive element p,- at the input node. These components define the time constant of the neuron. The output of neuron j is connected to the input of neuron i via a conductance Tij- Figure 2-6 illustrates the basic neuron structure used in the Hopfield-Tank model. The input-output relationship of the amplifier is sigmoidal. A general Hopfield-Tank network is shown in Figure 2-7. This network can be described by a set of first-order differential equations of the form [14,15] dui " u,- whme 1 1 ” _. = 35—- + 2 Ti}, (2-2.2) r t i=1 V: = Kiwi) ) (2'23) n is the number of neurons, I ,- is the external input current, and gi is a sigmoid function. The energy function selected by Hopfield for this network is 1 n n n n lvi - i=1 i=1 ‘0 such that 23 Figure 2-6. A basic neuron of the Hopfield-Tank model. 2 V3 Vn Figure 2-7. A general Hopfield-Tank artificial neural network. 24 d . 00%) = -35 (23.2) 1 Suppose Tij- =Tji for all i and j, then the time derivative of the energy function along the trajectories can be derived by applying the chain rule as dE_ 8E dv du 1 dg.—(u-) (235 ziza—v du dt = C du (37. (2-4) Since g,(u,~) is monotonically increasing, E S 0 for all t. As a result, the value of the d t energy function is strictly decreasing and becomes zero only at equilibrium points where dE _ d“: _ . 717 — Ci-dT — 0 forallr. If “i is replaced by hui in equation (2-2.3), where 2. is a constant representing the neuron gain, then we have vi = giOcui), (2'51) 1 -r u. = x8.- (Vt): (25.2) and —%)_: ZTi-21v+2fi—Ig71(§)d§ (2-5.3) i= lj=l i=1 i=1 If it is chosen to be very large, then the integral term of the energy function is negligible compared to the other terms. This leads to the following: 25 dui N (261) i=1 and 1 n I! n i=1j=l i=1 These two equations are valid only for high gain limit, that is, when A is very large. Equations (2-2.l) - (2-4) actually define a gradient system and thus guarantee no oscillation or any complicated behavior in the system [14,75,76]. 
Furthermore, it has been proven that such a system has only a finite number of equilibria if the equilibria are isolated [40,41]. Those isolated equilibria may correspond to memory patterns in associative memory, feature patterns in pattern recognition problems, or solutions to an optimization problem.

2.3 Dynamic Programming Problems

Dynamic programming is a very useful mathematical technique for making a sequence of interrelated decisions. It provides a systematic procedure for determining the combination of decisions that maximizes or minimizes an overall performance measure. However, there does not exist a standard formulation for dynamic programming problems. Rather, dynamic programming is a general approach to problem solving, and the particular equations must be developed to fit each individual situation. The following example shows how such a problem can be solved by the traditional dynamic programming procedure. This dynamic programming problem has been shown in Figure 1-1. A performance measure is defined as the total length of a valid path from the source node to the destination node. Given the source and destination nodes, the number of stages n, the number of states in each stage m, and the metric data d_{xi,(x+1)j}, where x is the index of stages and i and j are the indices of the states in each stage, the problem is to find an optimal path from source to destination. This optimal path is measured with respect to a performance criterion.

To illustrate the conventional algorithm for solving the DPP, the principle of optimality is presented first. Suppose the optimal path for a multistage decision problem starting at w and reaching z consists of segment w-x with cost C_{wx} and segment x-z with cost C_{xz}. The minimum cost C^* is equal to the sum of C_{wx} and C_{xz}. The principle of optimality states that if w-x-z is the optimal path from w to z, then x-z is the optimal path from x to z. This principle can be proven by contradiction. Suppose there exists an optimal path x-y-z which is different from x-z. Then C_{xyz} must be less than C_{xz}, and C_{wx} + C_{xyz} < C_{wx} + C_{xz} = C^*. Since C^* is the minimum cost from w to z, this is a contradiction, and the principle of optimality is proved.

Traditional dynamic programming is a computational technique which makes a sequence of decisions to define an optimal path based on the principle of optimality. The conventional algorithm begins by finding the optimal path for the last stage and moves backward stage by stage until the optimal path starting at the source node is found. This procedure can be stated with the help of Figure 1-1, in which the source node is labeled 00 and the destination node is labeled 40, as follows. At the final stage (i.e., stage 3),

L^*_{3i,40} = d_{3i,40} = the minimum length from state i of stage 3 to the destination.

At stages k = 2 and k = 1,

L^*_{ki,40} = \min_j \{ d_{ki,(k+1)j} + L^*_{(k+1)j,40} \} = the minimum length from state i of stage k to the destination.

At the source node,

L^*_{00,40} = \min_j \{ d_{00,1j} + L^*_{1j,40} \} = the minimum length from the source to the destination,

where i and j index the states 0, 1, ..., 5 of each stage.

Since the conventional algorithm finds the optimal path in a sequentially backward stage-by-stage fashion, the algorithm cannot be fully parallelized. In addition, due to the limitations of computation capacity, synchronization, communication, and data distribution, high-speed digital computers with some degree of parallelism may have difficulty dealing with large-scale problems. The previous problem is a typical example of a dynamic programming problem.
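A direct C transcription of the backward recursion above is sketched next for a general n-stage, m-state trellis, reusing the d[x][i][j] layout assumed in Chapter 1 (distance from state i of stage x to state j of stage x+1, with a single source at stage 0 and a single destination at stage n+1). The function name dpp_optimal_length is hypothetical; the code merely restates the conventional procedure and is not the neural network method of this dissertation.

#include <stdlib.h>
#include <float.h>

/* Minimum source-to-destination path length by backward dynamic
   programming.  L[i] holds the minimum length from state i of the
   current stage to the destination. */
double dpp_optimal_length(int n, int m, double ***d)
{
    double *L     = malloc((size_t)m * sizeof *L);
    double *Lprev = malloc((size_t)m * sizeof *Lprev);

    /* Final stage: L*_{ni} = d_{ni,dest}. */
    for (int i = 0; i < m; i++)
        L[i] = d[n][i][0];

    /* Intermediate stages k = n-1, ..., 1. */
    for (int k = n - 1; k >= 1; k--) {
        for (int i = 0; i < m; i++) {
            double best = DBL_MAX;
            for (int j = 0; j < m; j++) {
                double len = d[k][i][j] + L[j];
                if (len < best)
                    best = len;
            }
            Lprev[i] = best;
        }
        for (int i = 0; i < m; i++)     /* roll the buffer back one stage */
            L[i] = Lprev[i];
    }

    /* Source node: min_j { d_{source,1j} + L*_{1j} }. */
    double best = DBL_MAX;
    for (int j = 0; j < m; j++) {
        double len = d[0][0][j] + L[j];
        if (len < best)
            best = len;
    }

    free(L);
    free(Lprev);
    return best;
}

Each stage must be completed before the preceding one can be processed, which is the sequential dependence noted above; the network formulation of Chapter 3 replaces this stage-by-stage sweep with a collective relaxation whose settling time is essentially independent of the problem size.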
One way to recognize a situation that can be formulated as a dynamic programming problem is to notice its basic structure is analogous to that of the previous problem. For this dissertation, the dynamic programming problems which are considered solvable by the proposed neural network method have the following basic features. (1) The problems can be divided into stages, with a deterministic policy decision required at each stage. (2) Each stage has a finite number of states associated with it. (3) The effect of the policy decision at each stage is to transform from the current state into a state associated with next stage. (4) The solution procedure is designed to find an optimal solution for the problems. (5) Given the current state, an optimal sub-solution for the remaining stages is independent of the sub- solution adopted in previous stages. Chapter 3 3.0 Network Formulation and Analysis Chapter 3 consists of two major parts, network formulation and network analysis. For the first part, an artificial neural network formulation for solving the dynamic programming problem is presented. The problem entails finding an optimal path from a source node to a destination node which minimizes (or maximizes) a performance measure of the problem. The optimization procedure is implemented and demonstrated using a modified I-Iopfield-Tank network. The proposed network for dynamic programming is attractive due to its radically improved speed over conventional techniques, especially where real-time near-optimal solutions are required. In the second part, an analytical examination of the network is presented. First of all, propositions related to the functionality of the network are proved. Second, a two-step energy function analysis is carried out in order to explain how the dynamic programming neural network can provide near-optimal solutions, since the network behavior is intimately related to the associated energy function. First, the locations and numbers of the minimum states for difi’erent components of the energy function are investigated in the extreme cases. A clearer insight of the energy fitnction can be visualized through the minimum states of difi'erent components. Second, the locations of the minimum states of the 28 29 energy function using difl‘erent parameter values are derived. It is shown that the minimum states can reside in regions which are regarded as valid solutions with certain conditions. 3.1 Network Formulation Generally, the procedure for solving a specific problem using the Hopfield network involves three steps. First, an encoding procedure is selected such that the outputs of the network correspond to the solution(s) of the problem. Second, a proper energy function, whose minimum corresponds to the optimal solution of the problem, is determined. Third, according to the determined energy function, connection weights and bias currents which properly represent the objective function and constraints of the problem are defined By giving an appropriate working function to each processing element and an associated weight for each connection between two processing elements, a properly configured network can rapidly and robustly provide at least a locally optimal solution due to the nature of the collective computations. 
Although the topology of the optimization surface in the solution space is so complicated that the globally optimal solution is not guaranteed, many good solutions which are at least locally similar to the optimal solution can be obtained. Consider again the 3x6 DPP shown in Figure 1-1. The goal is to find a valid path which starts at the source node, visits one and only one state node in each stage, reaches the destination node, and has a minimum total length (cost) among all possible paths. A properly designed network is defined by an energy function in which the optimal solution corresponds to the lowest energy state of the network. Examining the characteristics of the optimal path carefully, two constraints become evident. First, the optimal path must visit one and only one state in each stage (a structural constraint). Secondly, the optimal solution must have the minimum total cost based on the given performance measure (a cost constraint). Thus, the energy function has two requirements. The structural constraint implies that the energy function must converge to stable states where one and only one state in each stage is active. The cost constraint dictates that the energy function must converge to stable states representing a minimum path.

Each state node can be conveniently considered as an individual processing element. To develop an appropriate energy function for the network, v_{xi} is taken as the output of the processing element of the ith state in the xth stage, where n is the number of intermediate stages. The following formal constraints are thus defined. To satisfy the structural constraint, one and only one processing element must be active in each stage and the number of active processing elements must equal the number of stages. This constraint is embedded in

E_1 = \sum_x \sum_i \sum_{j \ne i} v_{xi} v_{xj} + \left( \sum_x \sum_i v_{xi} - n \right)^2.    (3-1)

And, the corresponding cost constraint is given by

E_2 = \sum_x \sum_i \sum_j \left[ d_{xi,(x+1)j} v_{xi} v_{(x+1)j} + d_{(x-1)j,xi} v_{xi} v_{(x-1)j} \right].    (3-2)

E_1 will vanish for a valid path. For a minimum cost path, E_2 takes on a minimum value. Let the sigmoid function be g(u_{xi}) = (1/2)(1 + \tanh(u_{xi}/\lambda)). From equation (2-3.1), we define the energy function for the dynamic programming network as

E = \frac{a}{2} E_1 + \frac{b}{2} E_2 + \sum_x \sum_i \frac{1}{R_{xi}} \int_{0.5}^{v_{xi}} g^{-1}(\xi) \, d\xi,    (3-3)

where a and b are positive numbers. The quadratic terms in equation (3-3) define the connection weight matrix T and the linear term defines the bias current vector I. From the circuit equation of the Hopfield network,

C_{xi} \frac{du_{xi}}{dt} = \sum_y \sum_j T_{xi,yj} v_{yj} - \frac{u_{xi}}{R_{xi}} + I_{xi},    (3-4)

the weight of the connection linking the ith neuron of stage x with the jth neuron of stage y is

T_{xi,yj} = -a \delta_{xy} (1 - \delta_{ij}) - a - \frac{b \, d_{xi,yj} (\delta_{(x+1)y} + \delta_{(x-1)y})}{2},    (3-5)

where a \delta_{xy}(1 - \delta_{ij}) is the inhibitory connection within each stage, a is the global inhibition, b \, d_{xi,yj}(\delta_{(x+1)y} + \delta_{(x-1)y})/2 is the strength of the distance metric,

\delta_{ij} = 1 if i = j and 0 otherwise (and likewise \delta_{xy} = 1 if x = y),

and the input bias current of the ith neuron of stage x is I_{xi} = a n. Therefore, the dynamic programming network can be described by the following differential equation and is conceptually depicted in Figure 3-1:

C_{xi} \frac{du_{xi}}{dt} = -\frac{u_{xi}}{R_{xi}} - a \sum_{j \ne i} v_{xj} - a \left( \sum_y \sum_j v_{yj} - n \right) - \frac{b}{2} \sum_j \left( d_{xi,(x+1)j} v_{(x+1)j} + d_{(x-1)j,xi} v_{(x-1)j} \right),    (3-6.1)

where

\frac{1}{R_{xi}} = \frac{1}{\rho_{xi}} + \sum_y \sum_j T_{xi,yj}.    (3-6.2)

Figure 3-1. A conceptual 3x6 dynamic programming network (full interconnections not shown) and its processing element.

Moreover, equation (3-3) can be rewritten as

E = -\frac{1}{2} \sum_x \sum_i \sum_y \sum_j T_{xi,yj} v_{xi} v_{yj} - \sum_x \sum_i I_{xi} v_{xi} + \sum_x \sum_i \frac{1}{R_{xi}} \int_{0.5}^{v_{xi}} g^{-1}(\xi) \, d\xi.    (3-7)
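The connection pattern of equation (3-5) and the bias I_{xi} = a n are simple enough to be tabulated directly. The C sketch below fills a flat weight matrix and bias vector for the n*m intermediate neurons, indexing neuron (x, i) at (x-1)*m + i; dist(x, i, y, j) stands for the metric d_{xi,yj} supplied by the problem. The index convention and function names are illustrative assumptions, and the terms involving the source and destination nodes that appear in equation (3-6.1) are deliberately left out of this sketch.

/* Kronecker delta. */
static double delta(int p, int q) { return (p == q) ? 1.0 : 0.0; }

/* Fill the weight matrix T (size (n*m) x (n*m), row-major) and the bias
   vector I for the intermediate neurons of an n-stage, m-state dynamic
   programming network according to equation (3-5) and I_xi = a*n.
   Neuron (x, i), x = 1..n, i = 0..m-1, is stored at index (x-1)*m + i.
   dist(x, i, y, j) must return d_{xi,yj} for adjacent stages |x - y| = 1. */
void build_weights(int n, int m, double a, double b,
                   double (*dist)(int x, int i, int y, int j),
                   double *T, double *I)
{
    int N = n * m;
    for (int x = 1; x <= n; x++) {
        for (int i = 0; i < m; i++) {
            int row = (x - 1) * m + i;
            I[row] = a * n;                                /* bias current I_xi = a n */
            for (int y = 1; y <= n; y++) {
                for (int j = 0; j < m; j++) {
                    int col = (y - 1) * m + j;
                    double w = -a * delta(x, y) * (1.0 - delta(i, j))  /* within-stage inhibition */
                               - a;                                     /* global inhibition       */
                    if (y == x + 1 || y == x - 1)                       /* distance metric term    */
                        w -= b * dist(x, i, y, j) / 2.0;
                    T[row * N + col] = w;
                }
            }
        }
    }
}

In a simulation, the resulting T and I are simply substituted into the circuit equation (3-4) and integrated; by construction T is symmetric whenever the metric satisfies d_{xi,yj} = d_{yj,xi}, which is the symmetry assumed in the analysis of Section 3.2.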
3.2 Network Analysis To represent equation (3-4) in terms of the variables inr it can be written as dvxi dvxi duxi 1 a: (17;? ‘27.; a—_g;.- (21,, £331..)-f.t- dvxi (3-8) Therefore, the network will be considered as an autonomous system since the function f,“- is not dependent on time. 3.2.1 Definitions The following definitions will be used in the remainder of this dissertation. i) The domain and range of the sigmoid function g are defined as U=RN and V=(0, 1)”, respectively, where N = m x n. Every vector v e V can be expressed as [vll,v12,... ,vmn]T , where 0 < th' < 1. The zero vector of V is denoted as 0N- The closure and the boundary of V are denoted as l7 and 8V, respectively. ii) A vector c e V is said to be a valid corner state if Cxi is either unity or zero and Ecxi = l forx=l to n. t‘ iii) The valid corner set C is defined as the set of all valid corner states. iv) A vector v e V is said to be in an h-neighborhood of a valid comer state c if [I c — v”... < h, where H c — vll .. is the infinite norm and h is a constant. 34 v) A vector v e V is said to be in an h-neighborhood ofC if minll C — VII... < h' c e C vi) The set Ch is defined as the set of vectors which are in the h-neighborhood of C, i.e., Ch=[v 6 VI minllc—vl|”< h, where c e V]. vii) The set Q is defined as the set of vectors q=[q11, q12,..., qmn]T where 4xi=0 if enco, and l— h < qxi 51 if exi=1, for every 0 e C. The set QC is a subset of Ch. viii) A vector of convergence, v e V, is interpreted as a valid solution if it is in the h- neighborhood of C where h is usually chosen to lie between (0, 0.2) [34]. V ix) Given .2 > o, the set 05 is defined as [5, 1— 8] N such that o s j g‘1(§)d§s e for all 0.5 v 6 D5. Note that 0 < 8 < 0.5 and is a function of the gain it such that as 7t —) co, 5 -—> 0. Also note that as l —9 oo, 05 —~) V. x) The set 55 is defined as V—Da. xi) The open set B (o, 8) is defined as B(o,8) = {veRNl (IV-0l<5)}' xii) A vector v0 5 V is said to be an equilibrium point of the system described by equation (3-8) if and only if f)“. (v0) 5 O for all x and i. xiii) The equilibrium point 0 is said to be stable if, for each 8 > 0, there exists a E, (e) > 0 such that "NW ‘0“ <§(e) =>||V(t) -0|| <8, V(t2t0)- 35 xiv) The equilibrium 0 is unstable if it is not stable. xv) The equilibrium 0 is asymptotically stable if (1) it is stable, and (2) there exists a number £1 > 0 such that ||v(t0) -o|| <§1=>||v(t) —o|| —)O, as (t—ieo). xvi) A system on an open set W c R" is said to be a gradient system if 9% = —grad L(x), where L is a C2 function, and _ 8L 8L grad L — [a—xl ,..., ax—J is the gradient vector field. 3.2.2 Analysis of Network Functionality The following propositions set forth to explain why the formulated network can solve dynamic programming problems. Some similar results have been derived for general Hopfield-type networks [40,41], nevertheless, they are restated for the formulated network in this dissertation for completeness. Proposition 1. For any initial state v e V, there is a unique solution for the system described by equation (3-8). Proof: From equation (3-8),f(vn~) and a—e—flvxi) are continuous. According to the theory I! of global uniqueness and existence in systems of differential equations, the network 36 satisfies a global Lipschitz condition [75]. That is ||f(X) -f(y) II 5 MIX -y|| and llf(xo) II S h where x, y, and x0 6 V, and k and h are finite constants. Then the network has exactly one solution. Proposition 2. 
The vector 0 e V is an equilibrium point of the system described by equation (3-8) if and only if 8 5‘” = 0 (39.1) av,“- for all x and i. Proof: Necessary condition: If a is an equilibrium point of the system described by equation (3-8), then vxi _ _ E _ fn.(v) .. 0 (3-92) for all x and i. Therefore, we have 27" an; ,,- 7+1: 8 EM = (3-93) It avxi 37 for all x and i. Sufficient condition: If 8 _ Evy-E (v) -— 0 then 1 1 “xi ] _x————x T. .v.__+1_ =f.(v)=o. C11. dg;i1 (in) (g 11, y} y] in II x; dvxi for all x and i. Q.E.D. Proposition 3. The energy function E decreases monotonically along the trajectories and becomes zero only at equilibrium points. Proof: Since the network is symmetric, that is, Txi’ychw-fi, the derivative of E along the trajectories with respect to time can be derived as 2 d5 __ 3E dvxi d“xi __ duxi d H? - 21:; (37,5) (37;) (a; - -;;Ci(d—t ) d—untg (“1a) . (3‘10-1) Since gxi is monotonically increasing, dE/dt is less than or equal to zero. dE/dt equals zero only when xi _ a _ _ dt — av .E (v) — O (3 10.2) X! for all x and i. Q.E.D. 38 Proposition 4. Assume that each equilibrium point of the system described by equation (3- 8) is isolated, then there are a finite number of equilibrium points in the system [40,41]. Proof: Since g: (in) —> + co as in approaches 1, and g: (in) —~) —oo as in approaches 0, then 3 8v 50’) xi '-'-C‘ Cad—t (3-11) RX! _8;.- (vs) 1 2]? xi yj Vyj , xi will approach infinity as v approaches the boundary of V. Therefore, there exists a i, such 33 E(v) #0, outside of C = (0+ g, 1 — 5)”, where O <§< %. By proposition 2, xi that every equilibrium point must be in the closure of C, [O + g, 1 - é] N. Since the closure of C is compact, and every equilibrium point is isolated, it can be concluded that there are a finite number of equilibrium points in the system. Q.E.D. Proposition 5. Every isolated equilibrium point of the system described by equation (3-8) is asymptotically stable. dE (v) dt Proof: Since 5 O and E(v) > 0 for all v, the energy function E is a strict Liapunov function in some neighborhood of every isolated equilibrium point. Hence, every isolated equilibrium point is asymptotically stable. Q.E.D. Proposition 6. o is an equilibrium point of the system described by equation (3-8) if and only if 0 is a local minimum state of the energy function E [40,41]. Proof: Necessary condition: If 0 is an equilibrium point, then 0 is asymptotically stable according to proposition 5. For the purpose of contradiction, assume that o is an equilibrium point and is not a local 39 minimum state of E. Then there exists a sequence {0,} c: (0.1)” such that E (om) < E (0) and 1 O<|om-o| <31- where om e {on} . Srnce 0 is an isolated equilibrium pornt, there exrsts an e > 0 such that there are no equilibrium points in B (o, e) - {o}. For any 8, e > 8 > 0, choose m such that —1— < 6. Therefore, m ome B (0, 5) — {0} CB (0,8) - {o} and om is not an equilibrium point. Suppose the solution (1) (om) with initial state om converges to an equilibrium point 6. Then 5(a) 0 such that E(0) = 0- “-1230 vnm in: yj The only solution for the N linear equations is 9 = [911, 912, ..., Orrin] E V where n "i _ mn+m—1 . This can be illustrated by the following example with n=2 and m=2. The four linear equations are 3E1 av = 2(v12+ (v11+v12+v21+v22) —2) = 0, (343.1) 11 3E1 8E1 5v— = 2(v22+ (v11+v12+v21+v22) —2) = O, (3-13.3) 21 and BE 1 From the first two equations, v11=v12 and from the last two equations, v21=v22. 
Substituting v12 with VII and v22 with v21 into the first equation, v22 is found to be identical to V11. Therefore, v11=v12=v21=v22=2l5=n/(mn+m-l). To find out if 9 is a local minimum of E 1, it is usually necessary to lmow if G(E1,v) is positive-definite when evaluated at 0. However, from Theorem 1, if any principal submatrix of G(E1,v) is not positive-definite, then G(E1,v) is not positive-definite. The leading principal submatrix of G(E1,v) of order m is 32151 32 E1 32 El avzl a"rravrz a"ira‘fim ——§-:—El iElH __.<32__51 Gll(E1,v)-_—.— avtzavu 31,32 avtzavi... (313.5) 82 32 a El E1 avlmavn avlmav12 3v?” J — mxm which can be further calculated to be 2 4... 4 G11(E1,v) = 4 2'" 4 (3-13.6) _4 4.. 2] Theorem 2 states that if 011(E1,v) is positive-defmite, then every eigenvalue of GI 1(E1,v) must be positive.The eigenvalues of 011(El,v) satisfy the following condition det(G,,(El,v) —M) = 0, (313.5) 45 where 2. is an eigenvalue of 011(E1,v) and I is the m x m identity matrix. Therefore, -2 is an eigenvalue of 011(E1,v) because det( Gll(E 1,v)+21)=0. Also 4(m-1)+2, which is positive, is another eigenvalue. Since Gll(E1,v) is neither positive-definite nor negative- definite, v is a saddle point of E 1. It is thus concluded there are no local minimum states in the interior of V. Q.E.D. Proposition 8. v is a valid comer state if and only if v is a global minimum state of E1 in V. Proof: For every state vector v e V, E1(V)= 222v ,.v.)+(22v..-- n) 20 (3—14) x int: If v e C, by the definition of valid corner state, it follows that 22v”. = x i andz 2v in v,” = 0. Consequently, E l(v)=0 and v is a global minimum state of E1. On the i j¢i other hand, if v is a global minimum state of E 1, then E1(v) = O. This implies that 22in = n ”“122thij = 0. Consequently, in is either unity or zero and ijati 2v“. = 1 forx = 1 to n. Therefore, v is a valid comer state. Q.E.D. Proposition 9. The zero state vector ON is the unique minimum state of E2 in V. Proof: For every state vector v e V, E2 = zzzdxi, (x+l)j vxiv(x+l)j+d(x- l)j,xivxiv(x— l)j20 (3'15°1) x t 1 E2 is zero only at ON. Since the first partial derivative of E2 with respect to th 46 8E2 ET = z (dxi,(x+ 1)j"(x+ 1)j + d(x— l)j,xiv(x-1)j): (3-15.2) I! J is greater than zero when v at ON and equal to zero when v = 0N for everyx and i, it follows that E2 is monotonically increasing. Therefore ON is the unique minimum state of E2. Q.E.D. Propositions 7 and 8 state that there are exactly n’" valid solutions since E 1 consists of exactly n’" minimum states. Proposition 9 shows that the zero state vector ON is the unique minimum state of E2. These three pmpositions together imply that the minimum a states of E will be close to comer states if 2 E 1 dominates over %E 2; they will be close to ON if the reverse is true. Proposition 10. There are no local minimum states of E = g-E 1 + 2E2 in the interior of V. Proof : The proof is similar to that of Proposition 7. We have that . 8E 8E 8E f1(E,V) = [Barman-9m] (3-16.1) where 3E b Bit—1. = a ( évli + 2‘91 " n) + 22(d1i,2j"2j+ doom-"00)» (3-16.2) ‘ J I yr 1 and v00 is the source node and d00,1i is the distance between the source node and node j of the first stage. It can be shown that Gll (E, v) , the leading principal submatrix of G (E, v) of order m, is equal to a x Gll (E1, v) . Since Gll(E1,v) is not positive-definite, G11 (E, v) 47 is not positive-def'mite. Therefore, Proposition 10 is concluded. Q.E.D. According to Proposition 10, the local minimum states of E, i.e., the equilibrium points, are in the set B5. 
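As a concrete companion to Propositions 7 through 10, the following C sketch (hypothetical names, 0-based indices, source and destination terms omitted for brevity) evaluates E = (a/2)E1 + (b/2)E2 from the forms given in (3-14) and (3-15.1), together with its partial derivatives; by Propositions 2 and 6, the equilibrium points are exactly the states at which every component of this gradient vanishes. The gradient expression follows from differentiating the two sums, in which each within-stage pair and each inter-stage edge appears twice, so the 1/2 factors cancel.

/* Sketch only: energy and gradient of the DPP network.
 * v[x][i]: neuron outputs, x = 0..N-1 stages, i = 0..M-1 states.
 * d[x][i][j]: local distance from state i of stage x to state j of
 * stage x+1. */
#define N 4   /* number of stages           */
#define M 3   /* number of states per stage */

double energy(double v[N][M], double d[N-1][M][M], double a, double b) {
    double e1 = 0.0, e2 = 0.0, total = 0.0;
    for (int x = 0; x < N; x++)
        for (int i = 0; i < M; i++) {
            total += v[x][i];
            for (int j = 0; j < M; j++)
                if (j != i) e1 += v[x][i] * v[x][j];
        }
    e1 += (total - N) * (total - N);
    for (int x = 0; x < N - 1; x++)     /* each edge of (3-15.1) appears */
        for (int i = 0; i < M; i++)     /* twice; count it once and      */
            for (int j = 0; j < M; j++) /* multiply by 2                 */
                e2 += 2.0 * d[x][i][j] * v[x][i] * v[x + 1][j];
    return 0.5 * a * e1 + 0.5 * b * e2;
}

/* dE/dv[x][i]; an equilibrium point is a state where every component of
 * this gradient is zero (Propositions 2 and 6). */
double dE_dv(double v[N][M], double d[N-1][M][M],
             double a, double b, int x, int i) {
    double total = 0.0, same_stage = 0.0, edges = 0.0;
    for (int y = 0; y < N; y++)
        for (int j = 0; j < M; j++) total += v[y][j];
    for (int j = 0; j < M; j++)
        if (j != i) same_stage += v[x][j];
    if (x + 1 < N)
        for (int j = 0; j < M; j++) edges += d[x][i][j] * v[x + 1][j];
    if (x - 1 >= 0)
        for (int j = 0; j < M; j++) edges += d[x - 1][j][i] * v[x - 1][j];
    return a * (same_stage + total - N) + b * edges;
}

Note that the negated gradient reproduces the input terms of the network equation used later in the simulations, which is exactly the gradient-system property exploited in Propositions 3 and 5.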
This means the equilibrium points are very close to the boundary of V when a high- gain limit is assumed. Finally, due to the gradient system property, which states that every equilibrium point is stable [76], the following corollary must be true. Corollary 1. The network does not possess periodic trajectories. 3.2.4 Location of Minimum States In this section, the possible locations of the minimum states of the associated energy function of the dynamic programming neural network will be discussed. With every dxi,(x+1)j fixed, the minimum states of the network can be approximately placed in a neighborhood of C such that the minimum states can be interpreted as a valid path with a locally minimum cost by appropriately adjusting parameters a and b. Assume that b is fixed, three cases may occur relating to different values of a. Case 1: a —)oo. The energy function E is approximately equal to gEl and the minimum states will be determined by El exclusively. Therefore, the minimum states are in the neighborhood of C and the number of minimum states is equal to n’". However, the corresponding paths, which can be considered as randomly selected paths, may be unacceptably far from optimal. Case 2: a=0. Since the minimum state is determined by E2, there is a unique minimum state, i.e., the zero state. Case 3: O < a < ea. In this case, a qualitative discussion is appropriate. As a decreases, the minimum states may move away from comer states in C. The following proposition indicates the 48 possible locations of the minimum states if they are in the neighborhood of C. Proposition 11. Assume there is at most one minimum state vector win the h- neighborhood of a valid corner state c where 2a+bd min hs 2a (n+ l) +bdmin (3-17) and dmin is the minimum value of metric data. Then the minimum state vector w belongs to Q, which has been defined in Section 3.1. In other words, waO ifcxi=0, andl—hwaiS 1 ifch-l forallx and i. Proof: Every vector v in the h-neighborhood of c can be decomposed as v = v + v- C C where vc = [cllvll’c12vl2’ "” cnmvnm] E Qc and v5 = [(1—c11)v11, ..., (l-cm)vm]° Then, E1(v) and E2(v) can be expressed as follows. 151 (v) = 222 [cxivxi + (1 — on.) vxi] x [cxjvxj+ (1 - an.) vxj] x ij¢i 2 + {[zzcxivxi + (l — cxi) vxi] _ n} (3-18.1) and 49 E2 (V) = zzzdxi,(x+ 1)} [cxivxi+ (1— Cxi) vxi] X x t j [C(x+l)jv(x+1)j+ (1 "c(x+1)j)"(x+1)j] + d(x-1)j,xi[cxi in+ (1" Cxi)in] [C(x— l)jv(x— 1)j+ (1" C(x- 1)j)"(x- 1))‘1' (3-18.2) Therefore, E(v) can be derived as a b ' E(v) = -2—E1(vc)+ 2E2 (vc) +51 (V) +52 (v) (3.133) where )2 E1(Vc) = zzzcxiv xi cxj vxj+ (ZZCxiv xi_ ’ (3-18.4) x ijati 52 (VC) = Ezzdxi,(x+1)jcxivxic(x+l)jv(x+1)}+d(x-1)].XicxiVXic(x—1)Jv(x-l)!’ x t j (3-18.5) Sl(v)=—222(1-cxi)vxi(l—cxj)vxj+—2-a(22(1—cxi)vxi)+ x ijati b zzzzdxi,(x+l)j(cxt xt(1 c(x+l)j)v(x+l)j+ (1'— cxi)vxi(1— c(x+1)j)v(x+l)j) x t j +d(x-l)j,xi(cxi vxi(1- C(x- l)j)v(x- 1)j+(1— Cxi)".xi(1"C(x--l)j)"(x—1M): (3-18.6) and 50 52 (V) = a (226% vxi —n) (;;(1— cxi) vxi)+ g2 212i cxi vxi( 1may) vxj b +5222Qj“ xj -cxi) in+ zzzzdxi, (x+ 1)jc(x+ 1)jv(x+ 1); (1 " 613) V,“- x t j + 2222d(x-1)}.xic(x-1)jv(x-1)1‘(1_ Cxi) Vt“ xtj (3-18.7) Each term of S 1 (v) 2 O is greater than or equal to zero. Since v is in the h-neighborhood of c, there exists one and only one k for every x = l to n, where l S k S m, ka = l, and ex]- =Oifj¢k, suchthat1—h E (vc) for every v = vc + v5 and v5 at: ON. Thus, if w is the only minimum state of E in Ch, then w must be in QC. Q.E.D. 
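A small helper makes the bound of Proposition 11 easy to apply in practice; the formula is taken from inequality (3-17) as printed above, so the function name and usage are illustrative only.

/* Sketch: largest neighborhood radius h for which Proposition 11 applies,
 * as read from (3-17):  h <= (2a + b*d_min) / (2a*(n+1) + b*d_min). */
double h_limit(double a, double b, double d_min, int n) {
    return (2.0 * a + b * d_min) / (2.0 * a * (n + 1) + b * d_min);
}

For instance, with a = 50, b = 4, d_min = 1, and n = 6 stages the bound evaluates to about 0.15, which lies inside the (0, 0.2) range normally used for interpreting a convergence vector as a valid solution.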
51 With certain conditions, it has been shown that the assumption used in Proposition 5 is valid [45]. According to Proposition 5, the associated energy function can be expressed as E(v) = E(vc) = g51(vc)+§ez(vc)- (319) Therefore, if the minimum states are in the h—neighborhood of C, they can be obtained from equation (3—19) with only n unknown variables, instead of the original E(v) = 5255100 +2520) which consists of n x m variables. This reduction will be very helpful in finding the minimum states of the associated energy function. For a minimum state v, suppose that vJr = max (Van) for i=1 to n, O is the output state vector, E, is the processing element associated with vx in stage x, dx, x + 1 is the metric data between E, andi'x... 1, and dx_1’xis the metric data betweenf'x and 133-1. Without loss of generality, assume that gn-=g, in=R, and CxFC for every x and i. Also redefine 0t=a/C and B=b/C. Better quality paths result when a is small [39]. Therefore, it is desirable to use a smaller value of a and also keep the minimum states in the neighborhood of C so that the minimum states can be regarded as valid and quality solutions. However as a is decreased to a modest value, two cases may occur. First, the ideal case where the output state vector 0 stays in the h-neighborhood of C could occur when the sum of dx, ,.. 1 and 33.1,; are very close to each other for all x. Second, 0 may fall outside the h-neighborhood of C if there is at least one E, where the sum of 21,,“ and fix- 1,, is very large compared to others. To prevent the second case from occurring, a maximum value v' which satisfies the following conditions is set to limit vx. For the particular processing element B, in stage x, 52 others. To prevent the second case from occurring, a maximum value it which satisfies the following conditions is set to limit vx. For the particular processing element P, in stage x, 513 dt = -i‘t—0tm7—[3vdavg+om>0, (3-20.l) where it = g-1 (5) , then .. < an — it an + Bdavg (3-20.2) For the other processing elements in stage x, a; = m— ai— anv + an— pidm,,,< o- (321.1) Therefore, (In-rum 3-21.2 ”>ot(n+1)+otdm,.n ( ) where um is the maximum allowable value of u due to circuit considerations, rim,g is the average of the local distances dxt,(x+1)fit and d,,,,-,, is the minimum of the local distances. These two conditions are derived from the network dynamical equation (3-S.1) with an assumption that the sum of 21“,, and 33-1,, is less than “mg The first condition guarantees that every waith output value less than i5 will increase its output value until reaching v. The second Condition guarantees that the output values of the inactive processing elements will decrease until they become very close to zero. Ultimately, for each stage x, the trajectory will be forced to reach the state where only E, has an output value equal to § and all others are very close to zero. From equation (3-20.2), with O = 1 - h, a lower limit for at can be derived as Bd..,<1—h) +g“(1—h) a) . nh (3-22) As a is decreased below the value given in equation (3-22), the minimum states 5 3 move too far away from the h-neighborhood of C such that they can no longer be regarded as valid paths. 3.2.5 Example The following example demonstrates the theoretical results presented in this section. Figure 3-2a shows a simple problem which has an optimal path with a total distance of 2 units. 
The energy function in the form of equation (3-6) is composed of

E1 = 2 v11 v12 + (v11 + v12 - 1)^2

and

E2 = 2 v11 + 18 v12.

There are two minimum states, (1, 0) and (0, 1), in E1, as shown in Figure 3-2b. Additionally, the saddle point of E1 is (1/3, 1/3). The state (0, 0) is the unique minimum state of E2, shown in Figure 3-2c. Choosing a = 50 and b = 4, Figure 3-2d shows that the associated energy function possesses only one minimum state, which is close to the corner state (1, 0). The accurately calculated minimum state is (0.96, 0), as determined by equation (3-19).

Figure 3-2. a) Data for the example, b) energy surface for E1, c) energy surface for E2, and d) energy surface for E = 25 E1 + E2.

3.3 Summary

This chapter presented a dynamic programming artificial neural network that can rapidly and robustly provide a near-optimal solution for the DPP. Since dynamic programming is a very useful technique for solving optimal control problems, the proposed network is quite practical.

The core of this chapter is an analytic exploration of the proposed dynamic programming neural network. First, propositions related to the functionality of the network were proved. Second, a two-step energy function analysis was carried out in order to explain how the dynamic programming neural network can provide near-optimal solutions. Each component of the associated energy function was discussed in the extreme cases: with a approaching infinity, the minimum states are the valid corner states representing valid solutions; with a = 0, the zero state is the unique minimum state. The possible locations of the minimum states of the associated energy function for different parameter values a and b were then derived. It was also shown that a must be sufficiently large to ensure that the minimum states reside in regions which can be regarded as valid solutions. Through the minimum states, the network behavior can be predicted and the near-optimal solutions can be obtained. The analysis developed in this chapter can be further extended to other networks, such as the TSP Hopfield-Tank network. It would be interesting to investigate the region of attraction of every stable equilibrium point, so that the initial conditions can be chosen appropriately to obtain the desired solutions. How the parameters a and b affect the number of minimum states of the associated energy function is another issue to be studied further.

Chapter 4

4.0 Network Simulation

This chapter describes a simulation of the formulated network to validate the analysis in Chapter 3. Three items are studied by the simulation: the relationship between the quality of solutions and the parameters of the associated energy function, the network behavior due to random errors in connection weights, and the basins of attraction of the network's equilibria. Results show that the formulated network can provide a near-optimal solution during an elapsed time of only a few characteristic time constants of the circuit, for problems with sizes as large as 64 stages with 64 states in each stage. An application of the proposed algorithm to an optimal control problem is also presented.

4.1 Network Simulation

Since the network is described by a set of first-order differential equations, it can be simulated by solving the associated differential equations with numerical analysis techniques running on a conventional computer.
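As a minimal illustration of such a simulation, the simple example of Section 3.2.5 can be integrated by forward Euler. The sketch below applies the Hopfield-type dynamics du_i/dt = -u_i/tau - dE/dv_i with v_i = g(u_i); the energy weighting E = 25 E1 + E2 follows the caption of Figure 3-2d, while the gain lambda = 50, tau = 1, the step size, and the unbiased starting point u = (0, 0) are assumptions made for illustration, not values taken from the dissertation.

/* Self-contained sketch: forward-Euler integration of the two-neuron
 * example; the trajectory settles near the reported minimum (0.96, 0). */
#include <math.h>
#include <stdio.h>

int main(void) {
    double u[2] = {0.0, 0.0}, v[2] = {0.5, 0.5};
    const double lambda = 50.0, tau = 1.0, dt = 0.0005;

    for (int step = 0; step < 40000; step++) {
        for (int i = 0; i < 2; i++)
            v[i] = 1.0 / (1.0 + exp(-lambda * u[i]));
        /* dE/dv for E = 25*[2*v1*v2 + (v1+v2-1)^2] + 2*v1 + 18*v2 */
        double g0 = 25.0 * (2.0 * v[1] + 2.0 * (v[0] + v[1] - 1.0)) + 2.0;
        double g1 = 25.0 * (2.0 * v[0] + 2.0 * (v[0] + v[1] - 1.0)) + 18.0;
        u[0] += dt * (-u[0] / tau - g0);
        u[1] += dt * (-u[1] / tau - g1);
    }
    printf("converged to (%.3f, %.3f)\n", v[0], v[1]); /* about (0.96, 0.00) */
    return 0;
}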
Simulation programs written in C language for the formulated network have been developed 56 4.1.] wht 57 4.1.1 Simulation Algorithm For convenience, rewrite the network equation (3-6.1) as duxi uxi b ’5; (dxz.4< H II N 001th < xi i=0 1 2 3 4 5 0.178 0.029 0.411 0.046 0.087 0.046 0.026 0.081 0.492 0.043 0.039 0.074 0.098 0.038 0.013 0.257 0.305 0.031 0.066 0.048 0.162 0.076 0.270 0.052 0.056 0.040 0.482 0.015 0.039 0.101 0.060 0.003 0.135 0.788 0.000 0.003 W H II N O‘UI-hWN if i i= 1 2 3 4 5 0.985 0.000 0.000 0.000 0.000 0.000 0.993 0.000 0.000 0.000 0.001 0.000 0.000 0.033 0.821 0.000 0.000 0.000 0.011 0.000 0.844 0.000 0.000 0.000 0.991 0.000 0.000 0.000 0.000 0.000 0.000 0.999 0.000 0.000 H 31. o 8 N 9: § 0111-th ouamul O 0 0o 0 O O 0.000 0.000 0.000 0.922 0.000 0.000 m Table 4—4. a) Initial neuron outputs for the 6x6 DPP; b—d) Neuron outputs after 25, 50, and 100 iterations, respectively. 4956921 . 169117 ‘ . l‘l'al'le ‘ .7aeooo ‘ .39‘1000 " 0.00000 ‘ (b) Figure 4-2. a) Initial neuron outputs for the 6x6 DPP; b-d) Neuron outputs after 25, 50, and 100 iterations, respectively. F“... 65 399013} .9930?“ 0.00000 - .999000'1 .WO‘ 0 .00000 -' (d) Figure 4—2. (cont’d) 66 # of stages # of states D P 2 2 3.25 0.05* 4 4 3.12 0.03* 8 8 2.00 ~ 3E-4 l6 16 1.85 3E-4>P>1E-5 32 32 1.61 3E-4>P>1E—5 64 64 1.39 1E-5>P>0.0 l6 2 3.21 0.02* 16 4 2.98 0.005* 16 8 1.85 3E-4>P>lE-5 16 32 1.79 BE-4>P>lE-5 2 16 1.53 0.02* 4 16 1.60 0.004“ 8 16 1.76 3E-4>P>lE-5 Table 4—5. Simulation results for DPPs with different sizes. The parameter pair (a, B) has been chosen as (50, 3) for instances with the number of stages less than 8, (5, 1) for instances with the number of stages greater than 16, and (25, 3) for instances with the number of stages in between. 67 B = (1, 1, 1, 1, 1, 1, 1, 1). 23 = 256 different codes are used to define initial conditions as follows. If the ith bit of the initial condition code is 0, then the neuron 0 of the ith stage is given 0.9 as an initial output value and the other neurons of the ith stage are given 0.1 as their initial value. On the other hand, if the jth bit of the initial condition code is 1, then the neuron 3 of the jth stage is given 0.9 as a initial output value and the other neurons of the jth stage are given 0.1 as their initial value. Figure 4—3. A problem consisting two minimum paths shown in thick lines. The local distances for the two minimum paths are all zero, other local dis- tances, not shown, are all set to 50. Table 4-6 shows the number of attracted initial conditions to global minimum states and local minimum states of the network with or = 50 and B = 4, where H is the Hamming distance between code A or B and the initial condition codes. As can be seen in the table, the number of attracted initial conditions increases when h which is defined in Chapter 3 is increased. The case of h = 0.20 gives the largest number (184) of attracted initial conditions. To understand how the parameters at and B affect the basins of attraction of the global minimum states, the following experiment has been done using the previous example. With h and or fixed to 0.10 and 50, respectively, B is changed from 1 to 5. For every value of B, 28 = 256 initial conditions have been fed to the network. The numbers of 68 attracted initial conditions of different Hamming distances to code A and B for each value of B is listed in Table 4-7. It can be seen that the number of attracted initial conditions, which may be used as an index of the basins of attraction, increases when B is increased. 
However, the network may converge to local minimum states as B becomes greater than 4. $31323“ Globalminima . . init. cond’s. 0 s H s 4 5 s H s 6 L°°al 1mm” b = 0.01 91 0 165 h = 0.05 107 0 149 h = 0.10 107 0 149 h = 0.15 131 o 125 h = 0.20 184 41 31 Table 4-6. Number of attracted initial conditions to global and local minima with different initial conditions. 0 p—e & VI 22 18 22 20 28 39 28 49 ADJOOOOOO N & A B A B A B A B A B r—tt—IHr—Ar—It—ar—nr—It-ar—s OOOOWOOWMCOCO \INOOOOOOOO A p—A Table 44. Number of attracted initial conditions to glo- bal minima with different values of B. 69 4.1.5 Network Behavior due to Connection Weight Errors The Hopfield-Tank networks have been shown to be gradient systems only when the networks are symmetric [14,15]. The connection weights may not be as accurate as desired in a real hardware implementation, therefore the networks are not exactly symmetric. However, it is believed that sufficiently small changes of the connection weights will not alter the qualitative properties of the networks [83]. In this simulation, the previous simple example is used again. With a = 50, B = 4, and initial condition code being set either (0, 0, 0, 0, 0, 0) or (1, 1, 1, l, l, l), simulation shows that these two minimum states are still preserved if the errors of connection weights are less than 5% of the original values. As the errors increases over 5%, the network may become unstable or the minimum states may move away from their original locations such that they can not be considered as minimum cost paths any longer. It is worthy noting that the allowable error in connection weights for the minimum states to remain in the valid regions heavily depends on the problems themselves. In general, the smaller the corresponding total path of a minimum state is, the more likely the minimum state can be preserved in its associated valid region. 4.2 Application to an Optimal Control Problem Consider a continuous system described by the first order differential equation dx (t) dt = —20x(t) +u(t) (43) where x( t) and u(t) are the state and control variables, respectively. The performance measure to be minimized is ‘1 J = 50x2(tf) + j (0.1u2(t) +x2(t) )dt (4-4) 0 70 where tfis the specified terminal time. This performance measure tends to drive the final state x(tf) close to zero without wasting too much energy. The admissible values of the state and control variables are constrained by -1.0 Sx(t) 51.0, (4-5-1) and -5.0 S u (t) S 5.0 - (4-5.2) Before the dynamic programming neural network algorithm can be applied, the system differential equation must be approximated by a corresponding difference equation, and the integral in the performance measure must be approximated by a summation. The system difference equation and the performance measure can be written as follows [10] x(k+1) = (1-20At)x(k)+Atxu(k), (46.1) and N-l J = 50x2(N) +At 2 (0.1112 (k) +x2(k+ 1)), (4-6.2) k=0 where At is small enough so that the control signal can be approximated by a piecewise- constant function that can change only at the time instants t = 0, At,....(N-1)At. N is the number of time instants from 0 to tfand k is the index of time instants. It is assumed in this problem that the state and control variables are allowed to take on only quantized values of the state set X={-1.0, -0.9,..., 0,..., 1.0] and the control set U=[-5.0, -4.0,..., 5.0}. In addition, tfwill be specified as 1.0. Therefore, a 10x21 network 71 is needed to solve this problem. 
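Before mapping the problem onto the network, the quantized problem data can be tabulated. The sketch below (hypothetical names, not the dissertation's code) builds the stage costs from the difference equation (4-6.1) and the performance measure (4-6.2) on the 21-value state grid and 11-value control grid with delta t = 0.1, assigning a large penalty to any transition that leaves the admissible state set; the terminal cost 50 x^2(N) would be attached to the last-stage nodes.

/* Sketch: stage-cost table for the discretized optimal control problem. */
#include <stdio.h>

#define NX 21          /* quantized states  -1.0, -0.9, ..., 1.0 */
#define NU 11          /* quantized controls -5, -4, ..., 5      */
#define NSTAGES 10     /* N time instants, dt = 0.1, tf = 1.0    */
#define PENALTY 1.0e6

static double xval(int i) { return -1.0 + 0.1 * i; }   /* state grid   */
static double uval(int j) { return -5.0 + 1.0 * j; }   /* control grid */

int main(void) {
    const double dt = 0.1;
    /* cost[k][i][j]: cost of applying control u_j in state x_i at stage k */
    static double cost[NSTAGES][NX][NU];

    for (int k = 0; k < NSTAGES; k++) {
        for (int i = 0; i < NX; i++) {
            for (int j = 0; j < NU; j++) {
                double x  = xval(i), u = uval(j);
                double xn = (1.0 - 20.0 * dt) * x + dt * u;   /* eq. (4-6.1) */
                if (xn < -1.0 - 1e-9 || xn > 1.0 + 1e-9)
                    cost[k][i][j] = PENALTY;   /* next state leaves the grid */
                else
                    cost[k][i][j] = dt * (0.1 * u * u + xn * xn); /* (4-6.2) */
            }
        }
    }
    printf("cost[0][20][0] = %g\n", cost[0][20][0]); /* x = 1.0, u = -5 */
    return 0;
}

Because the grid spacings are multiples of 0.1, every reachable next state lands exactly on the grid, so only the range check is needed before accepting a transition.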
The kth stage of the network represents the time instant kAt and the processing elements correspond to the allowed values of state variables. Assume xi(k) is the state i of the kth stage time instant and uiflk) is the control signal which drives xlflc) to x}( k+ I ). The cost of operation in the interval k to k+ I for specified quantized control uiflk) is 1,..- «+1», where 0 3k SN-I. It is equal to o. lAtufj (k) +xj2(k+1) ifx,(k+1) belongs to the state set X. Otherwise, Jk; (1H1)!- will be set to a very high value to prevent the sub- path not) to x10” 1) from being selected. In addition, 1N,- (N+1)0 is equal to 50 times x? (N) which is the square of the difference between the final state and the zero state. Therefore, the connection weight Txi.yj is defined as [31 _ 5 xi. yJ' (4’?) T +8 xi, yj = “won-51'9”“ (5mm (Jr-1n) Figure 4-4a shows the optimal state trajectory and the state trajectory found by the proposed algorithm with initial state x(0)=1.0. The selected control signals u(k) are equal to {5, -3, 2, 0, 0, 0, -1, 0, 1, 0]. These are very close to the optimal control signals u*(k)= {5, -5, 0, ..... , 0}. Figure 4-4b depicts the optimal state trajectory and the selected state trajectory with initial state x(0)=-l. The optimal control signal sequence is {-5, 5, 0...... 0} and the obtained control signal sequence is {-5, 3, -2, -l, 0, 0, 0, l, 0, 0]. Again, these two state trajectories are very close. 4.3 Summary Simulations of the network have been developed to demonstrate the robustness of the network. The neural network can provide a near-optimal solution in an elapsed time of only a few characteristic time constants of the circuit This has been found to apply for DPPs as large as 64 stages with 64 states in each stage. The relationship between the quality of solutions and the parameters of the associated energy function was explored. Simulation shows that the minimum states can be preserved if the errors of modified connections is A _ .‘W—X'l !“ s. 72 X(k) * 1.0 X 00 Optimal solution x(k) selected solution 0.5 ' 1 l I 5 AF L = 5 V 10 1t - 0.5 1 (a) 1:00 1 X*(k) Optimal solution 0.5 - x(k) selected solution 0)) Figure 4-4. a) Optimal state trajectory and selected state trajectory for x(0)=1.0. b) Optimal state trajectory and selected state trajectory for x(0)=-1.0. The curves of state trajectories have been smoothed. 73 small. Moreover, the basins of attractions of the network’s equilibria can be increased by appropriately increasing the parameters of the associated energy function. Finally, an application of the new network to an optimal control problem demonstrated the capability of the network. Chapter 5] 5.0 A Building Block Architecture and Problem Decomposition In the first part of this chapter, an architecture for the dynamic programming neural network is presented. The architecture is based on a building block paradigm in whi Ch the network is constructed from neuron array and weight assignment chips. Because Of its simple and regular structure, the architecture is a feasible implementation for dynamic programming neural networks with current VLS'I technology. Moreover, this architecture can be fitrther extended to other Hopfield-Tank type networks. Following the p r €Sentation of the architecture, a divide-and-conquer method for solving very large scale dynamic programming problems is described. A flight planning problem is used to n I “Strata the advantages and disadvantages of this method in terms of the network size, processing time, and quality of solutions. 
74 75 5.1 A VLSI Building Block Architecture Although artificial neural networks are composed of basic circuits emulating neurons and synapses, direct hardware implementation has been severely restricted in size by constraints of current VLSI technology. These constraints include real-estate considerations, and fan-in/fan-out and connectivity limitations. To alleviate these problems, especially in the case of moderate to large size networks, a building block paradigm in which a neural network is constructed by several VLSI chips implementing various components has been used for feed-forward networks [83]. Even though this paradigm may reduce the processing speed over that of single-chip designs due to off-chip routing, it allows the construction of networks with arbitrary feed-forward architectures by these pre-designed building blocks [83]. An architecture for dynamic programming neural networks is presented following this paradigm. For convenience, the network dynamic equation is written again. dun- Cx- (71%)“. J—i-azvxj— azzvyj+an xi phi :22(dx xi,(x+l)j v(x+1)j+d(1 1).]lx‘v (x- 1)j) (5-1.1) Where 1 1 _ = _ (5-1.2) in pxi+§j1TxiJj E(Elation (5-1. 1) can be expressed as duxi __ uxi Cxi(—d,—) “-R:+f(x-l)i(vx-l) +f(x+1)i(Vx+l) —a2vch -a2vyj +an j¢i (5-2. 1) where 76 vx = [VXP Vx2’ ..., vxm]T, (5-2-2) b - f(x-1)i(vx—1) = ‘5 . (d(x—l)j,xiv(x—l)j)’ (5 2'3) 1 __ b f(x+l)i(vx+l) — —§2(dxi,(x+l)jv(x+l)j)’ (5.24) J and m is the number of neurons in each stage. Let __ “xi uxi (5_3 1) wn- — CIA—ET) +§—. +a2.vxj+02vyj—an, . 1“ NH )1 T - wx = [va Wx2’ ..., Wm] , (5 3.2) T F1 : [.fo (vx) ’fxz (vx)’ “°!fxm (vx)] 9 (5-3°3) and b "I Der-1):: = 5&3 r.j=1’“'h"e Ci} = dtx-l)i.x1" (5‘34) From equation (5-1.1), it can be shown that T T Wx = Fx-1+Fx+l = D(x-1)x. (_vx-1) +D(x+l)x. (_vx+l) ' (5-4) Therefore, the network can be decomposed into an element (chip) implementing the wx’s and connection chips implementing the D(x_1)x’s. Figure 5-1 shows part of such a 77 decomposed architecture for the dynamic programming neural network. Consider an n x m network with n neuron array chips and 2n-2 connection chips. Each neuron array chip, shown in Figure 5-2a, has m processing elements, shown in Figure 5-2b, and each connection chip, shown in Figure 5-3, consists of m x m matrix-like connections. Notice that every neuron array chip has connections to its two neighbor chips, a global input Va ,1 = zvyj, and an output Vsub (x) = 2v?!" Va” is the sum of all neuron outputs and w J can be implemented by an analog adder shown in Figure 5-1. Vsub(x) is the sum of neuron outputs in stage x. Notice that the network connectivity of this architecture is 0(nm2) instead of 0(n2m2), which is generally required for a Hopfield-type network with n x m totally connected neurons. The resistance elements of the connection chips and the input capacitances of the neuron array chips may be designed as adaptable circuits to allow the circuit to be used to SOIVe different problems. Many methods have been deve10ped for adaptable capacitors and reSistors, including implementations of connection weights and input capacitances with digital and analog programmable circuits [83-86]. Eberhardt, et al., have designed a 32 X 32 connection chip which utilized the four-quadrant Gilbert multiplier to convert charge stored on capacitors into conductances for feed—forward neural networks. These Clesigns can be utilized to implement this network. 
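The role of the connection chips can be illustrated with a short sketch. The two routines below (hypothetical names) compute the coupling terms of equations (5-2.3) and (5-2.4): the contribution that stage x receives from its two neighbouring stages is a pair of m x m matrix-vector products, which is what keeps the connectivity of the architecture O(nm^2) rather than O(n^2 m^2).

/* Sketch: per-stage coupling terms of the building-block architecture. */
#define M 8   /* neurons per stage (example value) */

/* Contribution of stage x-1 to stage x:
 * f_i = -(b/2) * sum_j d_prev[j][i] * v_prev[j]   (eq. 5-2.3) */
void coupling_from_prev(double d_prev[M][M], const double v_prev[M],
                        double b, double f[M]) {
    for (int i = 0; i < M; i++) {
        f[i] = 0.0;
        for (int j = 0; j < M; j++)
            f[i] -= 0.5 * b * d_prev[j][i] * v_prev[j];
    }
}

/* Contribution of stage x+1 to stage x:
 * f_i = -(b/2) * sum_j d_next[i][j] * v_next[j]   (eq. 5-2.4) */
void coupling_from_next(double d_next[M][M], const double v_next[M],
                        double b, double f[M]) {
    for (int i = 0; i < M; i++) {
        f[i] = 0.0;
        for (int j = 0; j < M; j++)
            f[i] -= 0.5 * b * d_next[i][j] * v_next[j];
    }
}

Each neuron-array chip would combine these two vectors with the global sum V_all and its own within-stage sum, as in equation (5-4).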
If a host computer, an interface between the host and the network, and an AID converter are considered along with the dynamic programming neural network, then this integrated system is capable of solving different problems. This system, shown in Figure 5‘4. works as follows. Initially, a problem is specified in the host computer. Then, the conlputer will calculate the connection weights, download the weight values to connection chips, and set up initial conditions via the interface. To download a weight value to a reSistance element and an initial value to an input capacitance, each adaptive resistance and capacitance must be addressable. With the associated address selected by the host stage x-2 chip L..— . stage x-l \ — neuron array v chip V.“ Vmb(x- 1) 78 Vmb(2 . FX-l-l con“. chip stage x stage x+l I_‘ 1 1—1 t———1 00“" _’. neuron “,... conn. j. neuron . chip array 1 chip my Vx+1 .———p chip —p chip —> Fx-l ht ‘ .___. Val Vmb(x) Vail vsub(x+1) conn Fri-2 chrp “ from stage x+2 Vsuba) r)0» V‘“ vsubm) / Figure 5-1. A building block architecture (only 3 stages shown) for the dynamic programming neural network. 79 ’Vali _ V f(p 1)1( p-I) . ‘VpI O O O f(p-1)q("p-1) $— 'qu O O O f(p-1)m(vp.1) .1} -me + ‘ sub(P) (a) f(p-1)q(" -1) —> ’ sub(p) l/a 1 ‘ -V 1 “11> JAR + p Bias: an m 0 V ._ ' pq __ __= cm 0)) Figure 5-2. a) A neuron array. b) A neuron design. 80 d’iI ,pI - VP] f p] (vp) + O O O -qu + f pq( Vp) O O O -me fpm(vp) + l de-Pm Figure 5-3. A connection matrix. 81 computer, each connection and iniu'al condition can be “programmed” by downloading the corresponding values which are converted into analog signals by the interface. The host computer then starts the dynamic programming neural network and samples each processing element output at a certain rate to check if the network reaches a steady state. The steady state can be read via the AID converter after the network has converged. address .———> Front-end control output Computer ._—. Interface data ANN .r__.’ A/D signals .__> signals + digital output signals Figure 5-4. A neuronprocessing system for dynamic pro- gramming. 5-2 Problem Decomposition The implementation of large scale neural networks has been severely restricted by Curr62m technology. One way to cope with this problem is to use a divide—and-conquer Su’ategy. The advantage is that many smaller and feasible networks can be used to solve larger problems. However, the time to find a solution will increase accordingly, and the 80111tion quality may not be as good as that for the solutions found without decomposition. 5~2-l Algorithm This decomposition method is described as follows. (1) Decompose the problem to be solved into sub-problems with size less than or equal to that of the available smaller network. Figure S-Sa depicts a decomposition which divides a 4 x 4 problem into 4 2 x 2 sub-problems. 82 With each sub-problem representing a state node, a new problem is formed which is shown in Figure S-Sb using the same example. The new metric distances between two nodes in two adjacent stages will be assigned by a predefined function of the metric distances of the original problem. (2) Find a near-optimal solution for the new problem. (3) Find a near-optimal solution for every sub-problem which corresponds to a state visited by the near-optimal path of the new problem. (4) Link the near-optimal solutions found in step (3) as a complete solution. 
5.2.2 Examples The problem to be solved is to find a minimum cost path from a start point to an end Point in a map consisting of 50 x60 points. Each point has an associated cost. The set of 50 X60 cost values is given in Appendix A. The total cost of the path is the sum of the costs of the POints along the path. To reduce the network size, a divide-and-conquer strategy is used. This method divides the 50x60 map into 100, 5x6 regions. Then, with each region Considered as a point, the 50x60 map can be reduced to a 10 x10 map. The cost of each point 0f the reduced 10x10 map is simply the sum of the costs of the 30 points in the correSpending 5x6 region. The set of cost values for the reduced 10x10 map is given in ApPCDdix B. A minimum cost path in the reduced map then can be obtained using a smaller netVvork. Finally, a minimum cost path in thc original 50x60 map can be generated by finding each sub-path in each corresponding 5x6 region which is visited in the minimum cost Path of the 10x10 map. To convert the problem to a neural network, the number of stages of the network and the States in each stage must be determined first. Assume the desired path is regular and SmOOIh, the number of stages for a start point (sx, sy) and an end point (ex, ey) can be set to M = c xmax (lex—sxl, |ey-sy|) 83 (b) Figure 5-5. a) A 4 x 4 problem is divided into 4 2 x 2 sub-problems. b) A new 2 x 2 problem with each state representing a 2 x 2 sub-problem. 84 where c > 1. The larger the coefficient c is, the more potential solutions the network will have. However, the network size will increase. In simulations, c is chosen as 1.5 S c S 2.0 which gives a reasonable trade-off between the solution quality and network size. Let T(i) be the set of states in the ith stage. Assume the path directions are only allowed in the -45, 0, and 45 degree directions. The possible states in the ith stage are derived in two steps. The first step where 0 5 stage (i) S [g] begins from the start point to the half-way point of the path as follows. Let T(0) be the set of the starting state (sx, sy). The state set of the first stage is defined as T(l) = {qqsx-qsr, lsy—ty|Sl) and (lex—IIISM (Icy-ty|SM))}° The states in T(I ) are those which are the neighbor states of (3, 3).) and whose distances along x and y axis to (ex, ey) are less than the number of stages M. Define dir (p, q) as the direction fromp to q, where p e T (i— 2) and q 6 T(i- 1) . Then each state t in T(i) where i > 1 must satisfy the following four conditions: 1. |qx—tx| S 1, lqy-tyl $1 2. I: q 3.|ex-tx|SM-1, ley—ty|_<.M—1 4. dir (p, q) —dir (q, t) = —45, 0, or 45 degrees. The second step, [1%,] < i S M, starting from the end point to the half-way point of the path in backward fashion, follows the same procedure as the first step. Figure 5-6 depicts parts of the network construction procedure in a 10x10 map. The starting point is (1,1) and the ending point is (9,9). Suppose M=14, then the first step goes from stage 1 to stage 7, and the second step goes from stage 14 to stage 8. As can be seen in Figure 5-6, the states in the first stage are (0,0), (0,1), (0,2), (1,0), (1,2), (2,0), (2,1), and 85 10 14 E 9 14 14 8 7 6 5 3 3 3 4 2,3 2 2,3 3 1 1 2,3 2 1 S 2 1 1 1 2 0 1 2 3 4 5 6 7 8 9 Figure 5-6. An example illustrating how to determine the states for each stage. n x 86 (2,2). The states in the second state are (0,3), (1,3),..., (3,1), and (3,0). The final stage, i.e., stage 14, consists of states (9,8), (9,9), (8,9), and (8,8). 
After the stage number and the number of states in each stage have been determined, the connection weights d(u,v) where u e T(i) andv e T (i + 1) need to be selected to build a network. There are three cases: Case 1: u = v. Then d(u, v) = 0. Case 2: lax—VII $1 andluy—vy| S 1 but u¢v. d (u, v) = f(cost (v)) if f(cost (v)) S threshold , else d (u, v) = D, where 1%) is a function which normalizes cost (v) such that the function values fall in [0, 10], threshold is an adjustable positive value in [5, 10] and D is set to a large value (over 100). Case 3: lux-vxl >1 or|uy-vy|>1. To prevent an illegal sub-path from being selected, the connection weight is set to a large value as d(u,v) = max (thresholdx (lax-VII +|uy-vy|),D) . 5.2.3 Simulation In the first example starting from (7, 15) and ending at (37, 21) in the original 50x60 map (from point (1,2) to point (7 , 3) in the reduced 10x10 map), 8 stages are used for the network and the average number of states in each stage is about 15. The path shown in Figure 5-7, with a cost 16.83, is obtained by the method with decomposition. Figure 5-7 also shows the path found by the method without decomposition, with a cost 15.54, and the optimal path. Figure 5-8 shows the path with a cost 11.96 from point (2, 3) to point (28, 39) in the original 50x60 map (from point (0, 0) to point (5 , 6) in the reduced 10x10 map) for the second example. The path, also shown in Figure 5-8, selected by the method without 87 decomposition has a cost of 12.21. Table 5-1 is a comparison of solution quality, processing time, and the network size required for the methods with and without decomposition in the first example. With the time increment step set to 0.01 in the simulation, the networks for the problems with the 5x6, 10x10, and 50x60 maps approximately require 200, 300, and 1,000 iterations to converge, respectively. Therefore, the number of iterations required for the method with decomposition in the first example is calculated as 8 x 200 + 300 = 1, 900. With decomposition, the network size required is about 8x15 for the first example. However, the network is as large as 60x200 if the problem is not decomposed. 5.3 Summary A building block based neural network architecture for dynamic programming was presented. The advantage of this proposed architecture is that the networks can be implemented with some simple well-designed VLSI chips and a reduced network connectivity. This architecture can also be extended to some other Hopfield—type networks. A problem decomposition method was described with a large-scale flight planning problem as an example. Processing . Method Cost iterations Network srze With decomposition 16.83 ~1,900 ~8x15 Without decomposition 15.54 ~1,000 ~60x200 Table 5-1. Comparisons of solution quality, processing time, and the network size required for the methods with and without decomposition for the first example. Ut 88 O 10 20 30 40 50 Optimal solution _____ Neural net solution with decomposition __ __ _ Neural net solution without decomposition Figure 5-7. Different solutions for the first example. 89 10 20 30 ' 40 _ _ _ Optimal solution _____ Neural net sdution with decomposition __ .____ _ Neural net solution without decomposition Figure 5-8. Different solutions for the second example. Chapter 6 6.0Application: Incremental Autoassociative Memory For an autoassociative memory, each desired memory pattern can be considered as a state (path) with minimum energy (cost). 
Thus, the autoassociative memory problem can be viewed as a problem of finding the minimum energy state in a constrained search space dependent on the initial condition. The application, based on dynamic programming techniques used to find minimum cost paths, is very simple and straightforward in the sense that memory patterns can be easily added to an existing associative memory without any modification to the previous connections. Analysis reveals that each stored memory pattern corresponds to a global minimum state of the formulated network. Simulation also shows that every desired pattern can be successfully stored and recalled and the extent of the basins of attraction of the global minima can be adjusted by selecting the parameters of the network energy function. Two examples, with the first one designed to correct single-error codes and the second to distinguish three images, are used to demonstrate the power of the formulated network. The issue of fault tolerance of the autoassociative memory network is also considered. The network can recall the stored memory patterns very successfully even when noise is added to the inputs. Moreover, the network can detect, locate, and recover from a single stuck-at fault. Comparison with two other networks from the literature shows that the incremental autoassociative memory network is more flexible and robust. 90 91 6.1 Introduction Suppose that a set of memory patterns [1’1 , Y2...., 1”} is to be stored, where I" e R" for r = 1 to 1. Given a stimulus X = Y' + e where e is sufficiently small, an autoassociative memory is designed in such a way that the output pattern I" can be recalled successfully. In other words, an autoassociative memory is designed to store a set of desired memory patterns where each pattern is attractive in the sense that any input pattern in its region of attraction will eventually converge to it. The Hopfield-Tank model is capable of implementing associative networks [14]. However, some of the proposed networks do not guarantee that each desired memory pattern can be stored [14.38.39]. Moreover, some of them suffer from rather complex synthesis procedures such that, in order to store large memory patterns, the network connection weights must be determined by solving a large number of equations[4l]. Additionally, a major drawback against them is that the whole network must be reconstructed if additional memory patterns are to be stored. It has been noted that the associative memory problem can be considered as an optimization problem [7]. For example, the problem of recognizing a familiar face can be interpreted as finding which one of those acquainted faces in the memory most closely resembles the test face. That is, to find which one maximizes the overlap between the test face and the acquainted faces. Dynamic programming, one of the techniques used to solve combinatorial optimization problems, can be used to find optimal paths from source to destination. If each desired memory pattern is considered as a shortest path, then the associative memory problem can be regarded as a dynamic programming problem, i.e., finding the shortest path within a constrained search space dependent on the initial condition. A coding procedure for autoassociative memory based on dynamic programming neural networks will be presented. It will be shown that each desired memory pattern can 92 be successfully stored in the network and each memory pattern corresponds to a global minimum state of the associated energy function. 
Furthermore, the extent of the basins of attraction of the stored memory pattern can be controlled by selecting the parameters of the associated energy function. One of most significant advantages of this autoassociative memory network is the inherent fault tolerance. The fault models considered in this chapter are noisy inputs and single stuck-at faults. The network can recall the stored memory patterns in a very noisy environment. Moreover, the network has the ability to detect, locate, and recover from a single stuck-at fault. This chapter explores the autoassociative network as follows. Section 6.2 presents the network formulation procedure and analysis. Simulations and examples are given in Section 6.3. The issue of fault tolerance is addressed in Section 6.4 and Section 6.5 compares the proposed network to two other known networks. 6.2 Autoassociative Memory Network 6.2.1 Formulation Assume that a set of memory patterns Y = {Y1 , Y2,..., Y’} is to be stored in an autoassociative memory, where Y' = [Yi’ 3’; ..., y;] e R" is in binary form, i.e., y: is either 1 or 0. To design an autoassociative memory using a dynamic programming network, the minimum paths corresponding to the desired memory patterns must be defined. Assume a dynamic programming network with n stages and 2l states for each stage. Let Path(r) be the minimum path associated with memory pattern I", where Path(r) starts from the source node, passes through states Pi: p3, ..., and p; in stage 1,2...,n, respectively, and finally ends at the destination node. The distance between p; and p; H is defined as d (x, p;, p;+1) . Let p6 be the source node, p; +1 be the destination node, and 93 s(x, i)be the ith statein stagex. Wedefinep; = s (x, r+y; x l) , and d(x,p;,p;+ l) = 0, where 0 S x S n and 1 S r S l. The other distances, between any two states in two adjacent stages, are all set to a large positive number D. Therefore, the total distance of each minimum path is zero and the others are at least D. Figure 6-1 shows the network which can store three memory patterns. It will be shown later that the designed network has exactly 1 global minimum states which conespond to the desired memory patterns in a one-to-one fashion. i}? 93$ -"/’A‘A‘\‘ .' “:1. ctr/1.x“.- .’/\\ A. . Figure 6-1. A 6 by 6 network (full connection not shown) which can store pat- terns (0,0,0,1,1,1), (1,1,1,0,0,0) and (1,0,1,0,1,0). Three shortest paths are shown in thick lines. 6.2.2 Definitions The following definitions will be used in the remainder of this chapter. i) The domain and range of the sigmoid function g are defined as U=RN and V=[0, 1]”, 94 respectively, where N = m x n. Every vector v e V can be expressed as [vn,v12,...,vmn] T, where 0 S ”in S 1. The zero vector of V is denoted as 0N. ii) A vector c e V is said to be a valid corner state if cn- is either unity or zero and 26”- = 1 forx=l to n. 1 iii) The valid corner set C is defined as the set of all valid corner states. iv) The path vector 1" associated with memory pattern Y’ is defined as T' = “i”; ...,t;] ,where t; = r+2lxy;, and 1 SxSn. v) The minimum state Vim." associated with memory pattern 1" is defined as f f f r r . o r . Vmin = [vn,v12, ...,vmn] ,where v“. = 1 if t = tx,and00therwrse, 1 SxSn, 1SiS2l,and1SrSI.ThusV' min is a valid corner state. 6.2.3 Analysis The following propositions provide a theoretical insight into the autoassociative memory network. For completeness, some of the relevant propositions which were presented in Chapter 3 are restated here without proof. 
Proposition 1. v is a valid comer state if and only if E1(v) = 0. Proposition 2. Every V' min rs a global mmrmum state. Proof: Since V' min is a valid comer state, E 1 (VIM r r 5'2 (Wm-n) = zzzdxi,(x+ 1)}inV(x+ 1)j+ d(x-— 1)j,xi";i"r(x— 1); x i j =22d(x,p;,p;+1)=0~ x l ) = 0. Furthermore, we have Therefore, it follows that 95 E(v'.) = geuv' mm min )+§-E2(V' min )=0- Since E(v) 20 for every v e V, V’ m." is a global minimum state. Q.E.D. Proposition 3. There are exactly 1 global minimum states. Proof: Suppose there is another global minimum state v which does not correspond to any of the minimum paths. Since v is a global minimum state, E2(v) is equal to zero. Consequently, v must be the zero state 0N. However, 0N is not a valid corner state. Therefore, according to Pr0position 1, v is not a global minimum state. Q.E.D. Some undesired local minimum states, which correspond to paths that are either invalid or with a total distance greater than or equal to D, may also exist in the network. However, the following proposiu'on states that they are all on the boundary of the unit hypercube. This gives a guide as to how to set the initial conditions as described in the next section. Proposition 4. Every local minimum state of E is on the boundary of V. 6.3 Examples For convenience, rewrite the simplified network equation as du . C11. (—d—:‘) = "' 6.2.ij- (122131"!- an 1“ y I l3 “2‘; (dxit(x+l)jv(x+1)j+ d(x— 1)j,xiv (x_ 1),.) ° 96 6.3.1 Initial Conditions The initial condition Z' associated with memory pattern I" for the input pattern X = [x1,x2,...,x,,] is defined as Z’ = [2:1, 2:2, ..., zfnn] 6 R" where 2;,- = h ifi = r+xk x l, and 1 - h otherwise. h is usually selected in the range (0, 0.2) depending on the application [8]. Since Proposition 4 shows that the undesired local minimum states lie on the boundary of V, the initial conditions should not be set too close to the boundary to keep them from falling into the basins of attraction of the undesired local minima. Given two memory patterns [0, 0, 0, 0, O, 0, 0, 0] (code A)and[1,1,1,1,1,1,1,1] (code B), a network with at = 50 and B = 4 has been designed. Table 6-1 shows the number of attracted patterns to global minimum states and local minimum states of the network with different initial conditions, where H is the Hamming distance between the stored patterns. As can be seen in Table 6-1, the number of attracted patterns increases when h is increased. The case of h = 0.20 gives the largest number (184) of attracted patterns; however, classification errors also occur since 41 patterns with Hamming distancesS or more units away from the stored patterns are attracted. Number? Global minima 333$? 0SHS4 ssnso “calm h = 0.01 91 o 165 h = 0.05 107 o 149 h = 0.10 107 o 149 h = 0.15 131 o 125 h = 0.20 184 41 31 Table 6- 1 . Number of attracted patterns for global and local min- ima with different initial conditions. 97 6.3.2 Simulation Three major steps are followed to verify the above by simulation. Step 1.0 The memory patterns to be stored and the test pattern X are specified. Step 2.0 The local distance dxi.(x+1)j is generated according to Section 6.2.1. 
Step 3.0 Set r = 1; While (r S l) 3.1 Set the initial condition associated with memory pattern I”; 3.2 Relax the network; 3.3 Check the network for convergence; 3.4 If the converged state corresponds to memory pattern I” then stop; Else increase r by 1; 3.5 Reject the test pattern X and stop; 6.3.3 Basins of Attraction To understand how the parameters a and b affect the basins of attraction of the global minimum states, the following experiment has been done using the previous example in Section 6.3. 1. With h and a fixed to 0.10 and 50, respectively, B is changed from 1 to 5. For every value of B, 28 = 256 test patterns have been fed to the network. The numbers of attracted patterns of different Hamming distances to code A and B for each value of B is listed in Table 6—2. It can be seen that the number of attracted patterns, which may be used as an index of the basin of attraction, increases when B is increased. However, classification errors occur as B becomes greater than 4. It also can be seen, interestingly, that the network perfectly recovers the codes with H = 2 in the case of B = 5. 98 O H h LII 22 18 22 20 28 39 28 49 #wOOOOOO N is A B A B A B A B A B HHHI—‘HHHHD—ir-I OOOOOOOONOOOOOO QNOOCOOOOO k H Table 6-2. Number of attracted patterns for global mini- ma with different values of B. 6.3.4 Examples With at = 15, B = 2, and h = 0.05, the first example shows that a network can be designed to correct single-error codes. Eight six-bit code words shown in Figure 6-2a with three message bits and three check bits have been stored in a 16 x 6 network. The code words of Figure 6.2a are plotted on the 6-cube map of Figure 6.2b. Each code word is indicated by the corresponding letter and all cells one distance away from a code word are marked with an x. As can be seen in Figure 6.2b. each code word is surrounded by 6 single- error codes. Simulations show that the network converges to the conect code word when each of the 6 single-error codes is fed to the network. The second example demonstrates the power of the artificial neural networks in distinguishing among three patterns, an English letter, a Chinese character, and a Greek racecourse-m C2 C3 00 01 II 10 C2 C3 00 01 11 10 Code M1 M2 M3 C1 C2 C3 0 0 0 0 0 0 0 0 I 1 I 0 0 I 0 0 I I 0 I I I 0 I I 0 0 I 0 I 1 0 I 0 I I I I 0 I I 0 I I I 0 0 0 (30 M1 M2 = 00 M3 CI 00 01 11 10 a a’ b’ a’ a) e) d, c’ b’ 1’ a’ b’ b b’ M1 M2 = 10 M3 C1 00 01 11 10 a) e, h, e’ e e’ f e’ f f g’ I” f (b) C2 C3 00 01 I I 10 C2 C3 00 01 I I 10 CI = M] $ M3 C2 = M2 63 M3 C3 = M1 6 M2 M1 M2 = 01 M3 C1 00 01 II 10 a’ d’ h’ c’ d’ d d’ c c’ d’ c’ c’ g’ b’ M1 M2 = 11 M3 CI 00 01 11 10 h’ g’ h’ h e’ d’ h’ C’ 8’ f g’ g g’ 12’ Figure 6-2. a) Eight single-error-correcting code words composed of 3 message bits and 3 check bits and parity check table. b) 6—cube map for the code of (a). 100 letter, as shown in Figure 6.3a. The network with parameters (it = 15, B = 2.5, and h = 0.10 has been tested with hundreds of patterns with randomly introduced noise. The noise is added to the patterns in such a way that every pixel may be reversed as detemrined by a probability p. Table 6-3 shows the success rate of recall with p being varied from 5% to 33%. 100 test patterns with random noise have been fed to the network for each stored pattern and for each p. It can be seen that the network performs well when p S 20%. Figure 6.3b shows some of the test patterns. Furthermore, three test patterns with part of the images opaque, as shown in Figure 6.3c, have been also successfully recalled. 
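The recall procedure of Section 6.3.2 reduces to a short loop. In the sketch below (hypothetical names), the relaxation routine and the initial-condition vector Z^r of Section 6.3.1 are assumed to be supplied by the simulator; the loop mirrors Steps 3.1 through 3.5.

/* Sketch: recall loop over the stored patterns.  The three routines are
 * assumed interfaces provided by the network simulator. */
#define NPAT 3

void set_initial_condition(int r, const int *test_pattern);  /* step 3.1      */
void relax_network(void);                                    /* step 3.2      */
int  converged_to_pattern(int r);                            /* steps 3.3-3.4 */

/* Returns the index of the recalled pattern, or -1 to reject (step 3.5). */
int recall(const int *test_pattern) {
    for (int r = 0; r < NPAT; r++) {
        set_initial_condition(r, test_pattern);
        relax_network();
        if (converged_to_pattern(r))
            return r;
    }
    return -1;
}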
For a third example, the same procedure has also been run through with airplane, tank, and helicopter patterns as used in [39]. The simulation results are also listed in Table 63. Notice that the results in both examples are very close. I p% 5% | 10% 15% l 20% l 25% 30% lFigureSexamples 100%| 100% 98.5%| 98%| 88% 70.5% IExamples from [39] 100% I 100% 97.7%] 97%] 86% 65.7% Table 6-3. Success rate of recall for the second and third ex- ample with different percentages of pixels being incorrect. 6.4 Fault Tolerance in the Autoassociative Memory Network Unlike conventional computers, stored information in a neural network is determined by its structure, processing elements, and connections. Since the output results are collective decisions of all parameters together, the loss of information due to faults, such as damaged connections and malfunctioning processing elements, will still often result in the network obtaining a correct output. This provides an inherent fault tolerance property for neural networks. However, the fault tolerance works only when the neural 101 (C) Figure 6-3. a) Three stored patterns. b) Three test patterns with random noise. c) Three test patterns with opaque image. 102 network is large enough to store the pieces of information in a redundant way [87]. Moreover, if faults occur in some important neurons, the fault tolerance ability may be degraded. For example, if stuck-at faults occur in some of the output neurons in a feed- forward network, the results will always be incorrect. If other neurons are faulty, the learning process may be slowed or divergent, and stored information may be lost. Fault models to be examined in this chapter include noisy inputs and single stuck- at-faults. Neural networks are expected to operate in noisy environments, where the received inputs do not exactly match the desired inputs, such as in pattern recognition problems where only partial information is provided as the input. This partial information can be viewed as noisy input. In the binary domain, a noisy input is considered as a vector with some bits complemented. For such noisy input, the results in Table 6-3 show the success rate of recall with p being varied from 5% to 33%. It can be seen that the network can recall very successfully when p S 20%. Let neuron(x, i) denote the ith neuron of stage x. Neuron(x,i) is said to be stuck at 1 or 0 if in is equal to 1 or 0 for all t (time), respectively. For a single stuck-at fault, only two cases, stuck-at-l and stuck-at-O, are considered. From the definitions in Section 6.2.2, the neuron in the xth stage of Path(r), which corresponds to the memory pattern Y', is neuron (x, r + 21 x y;) . For the first case, assume neuron(x,i) is stuck at 1. If i t r + 21 x y;, then the network will not converge when the memory pattern I" is applied, since the output of neuron (x, r + 21 x y;) will be forced to 0. This is because neuron(x, i) is stuck at l and one neuron’s output can, by definition, be one. On the other hand, if i = r+ 21 x y;, i.e., neuron (x, r + 21 x y;) is stuck at 1, then the memory pattern 1" can be recalled when it is applied. For the second case, assume neuron(x, i) is stuck at 0. If i = r + 21 x y;, i.e., neuron (x, r + 21 x y;) is stuck at 0, then the memory pattern Y' will not be recalled when it is applied since the output of neuron (x, r + 21 x y;) is always 0. However, if i at r + 21 x y; the memory pattern Y' will be recalled when it is applied since the network’s convergence is not affected by the faulty neuron. 
103 To locate a stuck-at neuron, it is necessary to apply every memory pattern to the network. Four cases can occur according to the stuck-at-fault analysis derived in the previous paragraph. Case 1: The network converges only when the memory pattern 1" is applied. There is a stuck-at-l fault in one of the neurons in Path(r). Moreover, the divergent neurons must be in the same stage, say x, when memory patterns other that I" are applied. Then neuron (x, r + 21 x y;) is the faulty neuron. Case 2: The network diverges only when I" is applied. There is a stuck-at-O fault in one of the neurons in Path(r). Suppose the faulty neuron is in stage x, then neuron (x, r + 21 x y;) is the faulty one. Case 3: The network diverges when each memory pattern is applied. There is a stuck-at-l faulty neuron which is not in Path(k) where k = l to l. The neuron is the one whose output is always one when each memory pattern is applied. Case 4: The network converges when each memory pattern is applied. There is either no faulty neuron or a stuck-at-O neuron which is not in Path(k) where k = 1 to 1. In either case, no memory is lost. In the first three cases, if the fault has been located, the network can be restored by replacing the faulty neuron with a spare fault-free neuron. This fault-recovery method, shown in Figure 6-4, is to add one spare row of neurons to the original network. Suppose that only one neuron in each stage can be faulty, one spare row is sufficient to recover the faults. The recovery process is also shown in Figure 6-4 where the stuck-at—O neuron is removed and a new minimum path passing through the spare neuron in stage it is assigned. 104 Figure 6-4. Fault recovery process. The thick line represents the new minimum cost path. 6.5 Comparisons Table 6-4 shows the comparisons between the proposed network and two other networks proposed by Kosko [38] and Wang, et al [39]. The network proposed by Wang, et al., which used the multiple training encoding strategy, is an enhanced version of Kosko’s network. The data shown in Table IV was obtained from [39]. In both networks, the maximum number of stored patterns is limited. For example, the maximum number of stored patterns is 8 for Kosko’s network and 11 for Wang’s network when the length of the stored patterns is 100 bits. The stored patterns correspond to local minima in both Kosko’s and Wang’s networks, while they correspond to global minima in our network. Both other networks will fail to work if any neuron is damaged. The proposed network can continue to work in general if the simple fault recovery mechanism discussed 105 in Section 6.4 is adopted. One other major shortcoming for both Kosko and Wang’s network is that if additional memory patterns are to be stored, the network connection matrix must be recalculated. One the other hand, the additional patterns can be easily added incrementally in this proposed network without any modification to previous connections provided there are enough spare neurons. The disadvantage of this network formulation is that the number of neurons required is proportional to the number and length of stored patterns. However, an architecture design using VLSI building blocks for networks of this type has been discussed in Chapter 5. Moreover, only two types of connection weights, 0 or D are required. Thus, it can be envision that this type of structure can ultimately be implemented using cell-based computer-aided design much like the current compilation techniques used to configure memory circuits. 
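Before summarizing, the single-fault location test of Section 6.4 can be sketched as a simple classification of the convergence results obtained by applying each stored pattern in turn (hypothetical names; the offending stage and neuron are then identified from the simulator's outputs as described in Cases 1 through 3). When the number of stored patterns is very small, Cases 1 and 2 can coincide and require further inspection.

/* Sketch: classify a single stuck-at fault from per-pattern convergence.
 * conv[r] is nonzero if the network converged when pattern r was applied. */
typedef enum {
    NO_FAULT_OR_HARMLESS,   /* Case 4 */
    STUCK_AT_1_ON_PATH,     /* Case 1 */
    STUCK_AT_0_ON_PATH,     /* Case 2 */
    STUCK_AT_1_OFF_PATH     /* Case 3 */
} fault_class;

fault_class classify(const int *conv, int l, int *which_pattern) {
    int n_conv = 0, last_conv = -1, last_div = -1;
    for (int r = 0; r < l; r++) {
        if (conv[r]) { n_conv++; last_conv = r; }
        else         { last_div = r; }
    }
    *which_pattern = -1;
    if (n_conv == l) return NO_FAULT_OR_HARMLESS;        /* Case 4 */
    if (n_conv == 0) return STUCK_AT_1_OFF_PATH;         /* Case 3 */
    if (n_conv == 1) {                                   /* Case 1 */
        *which_pattern = last_conv;
        return STUCK_AT_1_ON_PATH;
    }
    if (n_conv == l - 1) {                               /* Case 2 */
        *which_pattern = last_div;
        return STUCK_AT_0_ON_PATH;
    }
    return NO_FAULT_OR_HARMLESS;   /* outside the single-fault model */
}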
6.6 Summary

In this chapter, an incremental autoassociative memory network based on dynamic programming neural networks was proposed. The formulation of this network is simple and straightforward in the sense that memory patterns can be easily added to an existing associative memory without any modification to the previous connections. It has been shown that each stored memory pattern corresponds to a global minimum state. Moreover, the inherent ability for fault tolerance makes the proposed network more robust. Simulation has shown that every desired pattern can be successfully stored and recalled. Furthermore, the extent of the basins of attraction of the global minimum state can be adjusted by controlling the parameters α and β. Even though there exist some undesired local minimum states, it has been shown that the local minima are all on the boundary of the domain of interest. Two examples, for code recovery and pattern recognition, have been presented to demonstrate the network's storage ability and flexibility.

Table 6-4. Comparisons of Kosko, Wang, and the proposed network for memory patterns with a length of 100 bits.

                                Kosko [38]            Wang [39]             Proposed
                                (bipolar mode)        (bipolar mode)
  N                             8                     11                    not limited
  # of neurons required         200                   200                   N*200
  global or local minima        local                 local                 global
  add extra memory              change all            change all            add extra
                                connections           connections           connections
  one neuron can be damaged     the network will fail the network will fail easily restored
  fault recovery                unknown               unknown               excellent

Chapter 7
7.0 Application: Dynamic Time Warping in Speech Recognition

This chapter presents a neural network application for dynamic time warping in speech recognition. The network, based on a feedback-type network for dynamic programming, is designed such that the minimum states of the network's energy function correspond to the near-best correlation between a test and a reference pattern. Simulations for classifying speaker-dependent isolated words, consisting of 0 to 9 and A to Z, show that the recognition rate of the method is as good as that of conventional methods with the same vocabulary set. To achieve real-time operation, a speech recognition system based on this method, which can recognize an utterance in 200 ms with a set of 1,000 references, is also proposed.

7.1 Introduction

To classify human utterances in speech recognition systems, the utterances are generally preprocessed into a set of spectral feature vectors or an acoustic model, and then compared to some pre-stored reference patterns by either a deterministic process (the template matching approach) or a stochastic process (the hidden Markov model) [88, 89]. In the template matching approach, a nonlinear time warping is required to compensate for local compression and/or expansion of the time scales in order to find the best difference measure between the test and reference pattern, since their time scales are generally not aligned. Dynamic time warping methods, based on traditional dynamic programming, have been developed to perform time alignment and evaluate the difference between patterns, resulting in improved accuracy of speech recognition systems [90-92]. Computational problems, however, exist when the difference measure is computed sequentially. The computation time depends on the length of the feature patterns and the dimension of the search space for evaluating the best difference measures.
Neural networks have been shown to be effective in speech recognition due to the high computation rate attained by the use of many simple processing elements (neurons) operating in parallel, and the ability to learn by altering network parameters and connection weights [93, 94]. In this chapter, the dynamic programming neural network method is applied to speech recognition of speaker-dependent isolated words. The neural network method for dynamic time warping is introduced in Section 7.2. Simulations and results of recognition of speaker-dependent isolated words, in this case the digits 0 to 9 and the English letters A to Z, are shown in Section 7.3. To perform the recognition task in real time, a speech recognition system based on the proposed network is described in Section 7.4.

7.2 Dynamic Time Warping

Dynamic programming based on the Principle of Optimality has been an important tool in speech recognition. In the recognition task, the distance between the pattern to be recognized and each of the reference patterns has to be evaluated. Each speech pattern is represented by a set of feature vectors regularly spaced along a time axis. These vectors can be the outputs of a set of filters, LPC coefficients, or cepstral coefficients, as will be used in the simulations described in this chapter. Since the time scales of a test and a reference speech pattern are generally not perfectly aligned, alignment becomes necessary in order to obtain the difference measure. In most cases, a nonlinear time warping technique is required to obtain the optimal alignment, since the differences between patterns in time, frequency, and amplitude are nonlinear. Therefore, a class of algorithms known as dynamic time warping (DTW) has been developed to perform the optimal alignment and evaluate the difference between two patterns [90-92].

Suppose that the kth reference pattern, R_k, is a sequence of vectors R_{k,m}, where m = 1, 2, ..., M(k), and the test pattern T is a sequence of vectors T_n, where n = 1, 2, ..., N. An example of a time warping function w(n) is shown in Figure 7-1, with the boundary conditions [95]

    w(1) = 1,  and  w(N) = M(k),        (7-1.1)

and continuity conditions

    w(n+1) - w(n) = \begin{cases} 0, 1, 2 & \text{for } w(n) \neq w(n-1) \\ 1, 2 & \text{for } w(n) = w(n-1) \end{cases}        (7-1.2)

Equations (7-1.1) and (7-1.2) determine the dimension of the search space, and each time warping function produces a specific path in the search space. Throughout this chapter, it is assumed that the beginning and ending points of the test and reference patterns have been accurately located. The distance between vectors T_n and R_{k,w(n)} is defined as

    d(n, w(n); k) = \frac{\sum_{l=1}^{L} (T_n[l] - R_{k,w(n)}[l])^2}{1024 \times L}        (7-2)

where L is the dimension of the feature vectors, and T_n[l] and R_{k,w(n)}[l] are the lth components of vectors T_n and R_{k,w(n)}, respectively. The value of every component of the feature vectors is in the range [0, 1024], and so is every d(n, w(n); k) for all n and k.

Figure 7-1. An example of the time warping function and search space of the optimal path (axes: test pattern and reference pattern). The path corresponding to this warping function is shown by the hatched line.

The total distance for the path resulting from the time warping function, which starts at grid point (1, 1) and ends at grid point (N, M(k)), can be defined as the sum of the local distances d(n, w(n); k) over the grid points (n, w(n)) of the path.
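To make the preceding definitions concrete, the sketch below implements the local distance of equation (7-2) and the conventional sequential minimization of the total path distance that the following paragraphs formalize. It is a simplified illustration rather than the dissertation's simulator: the continuity condition (7-1.2) is enforced by forbidding two consecutive horizontal moves, integer feature components in [0, 1024] are assumed, and the usual search-window restriction is omitted.

```python
import math

def local_distance(t_frame, r_frame):
    """Local distance d(n, w(n); k) of Eq. (7-2): scaled sum of squared differences."""
    L = len(t_frame)
    return sum((t - r) ** 2 for t, r in zip(t_frame, r_frame)) / (1024 * L)

def dtw_distance(test, ref):
    """Minimum total distance over warping paths from grid point (1, 1) to (N, M(k)).

    test, ref : lists of feature vectors T_n and R_{k,m} (0-based indexing here).
    A horizontal move (same reference frame twice) is allowed only if the previous
    move was not horizontal, mirroring condition (7-1.2).
    """
    N, M = len(test), len(ref)
    INF = math.inf
    cost = [[INF] * M for _ in range(N)]     # accumulated distance at each grid point
    horiz = [[False] * M for _ in range(N)]  # did the best path enter this point horizontally?

    cost[0][0] = local_distance(test[0], ref[0])
    for n in range(1, N):
        for m in range(M):
            steps = []
            if not horiz[n - 1][m]:
                steps.append((cost[n - 1][m], True))        # horizontal move
            if m >= 1:
                steps.append((cost[n - 1][m - 1], False))   # diagonal move
            if m >= 2:
                steps.append((cost[n - 1][m - 2], False))   # skip one reference frame
            if not steps:
                continue
            best, was_horiz = min(steps, key=lambda s: s[0])
            if best < INF:
                cost[n][m] = best + local_distance(test[n], ref[m])
                horiz[n][m] = was_horiz
    return cost[N - 1][M - 1]
```

The nested loops make the sequential nature of the computation explicit; the neural network formulation developed below replaces them with parallel updates.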
Then the problem of dynamic time warping is to find an optimal path which minimizes the sum of d(n, w(n); k) among all possible time warping functions. The distance of the optimal path is expressed as

    D(k) = \min_{w(\cdot)} \sum_{n=1}^{N} d(n, w(n); k) = \min_{w(\cdot)} \sum_{n=1}^{N} \frac{\sum_{l=1}^{L} (T_n[l] - R_{k,w(n)}[l])^2}{1024 \times L}        (7-3)

Define the partial distance of D(k) from time instant j = 1 to n as

    D(n, w(n); k) = \min_{w(\cdot)} \sum_{j=1}^{n} d(j, w(j); k)        (7-4)

The distance D(k) in the example can be efficiently computed, using the algorithm of dynamic programming, by the following recursive equation

    D(n+1, w(n+1); k) = d(n+1, w(n+1); k) + \min\{ D(n, w(n+1); k) \times g(n),\; D(n, w(n+1)-1; k),\; D(n, w(n+1)-2; k) \}        (7-5.1)

where

    g(n) = \begin{cases} \infty & \text{for } w(n) = w(n-1) \\ 1 & \text{for } w(n) \neq w(n-1) \end{cases}        (7-5.2)

such that the continuity conditions of equation (7-1.2) are satisfied [90]. In equation (7-5.1), it is assumed that d(n, m; k) outside the search space is infinitely large.

Notice that the computation of the conventional dynamic time warping algorithm is executed sequentially; therefore the computation time depends on the length of the feature patterns and the dimension of the search space. On the other hand, the following method, based on a dynamic programming neural network, performs the distance calculations fully in parallel. Thus the computation time is significantly reduced and is much less dependent on the length of the patterns and the dimension of the search space.

7.2.1 A Neural Network for Dynamic Time Warping

Consider Figure 7-1 again. Assume each warping function w(n) has one and only one value for each time instant. A valid and optimal path associated with the test pattern T and the kth reference pattern R_k is one which starts at grid point (1, 1), visits one and only one grid point in each time instant, reaches grid point (N, M(k)), and has a minimum total distance among all possible paths. Each grid point can be conveniently considered as an individual processing element. To develop an appropriate energy function for the network, v_{ni} is taken as the output of the processing element in position (n, i); the weight w_{ni,(n+1)j} for the connection between grid points (n, i) and (n+1, j) is defined as w_{ni,(n+1)j} = d(n+1, j; k) for j - i = 0, 1, and 2, and infinity otherwise. The formal constraints on valid paths are thus defined. Following the same procedure as in Chapter 3, the neural network for dynamic time warping can be described by the following simplified differential equation

    \frac{du_{xi}}{dt} = \cdots - \beta \sum_{j} \left( d_{xi,(x+1)j}\, v_{(x+1)j} + d_{(x-1)j,xi}\, v_{(x-1)j} \right)        (7-9.1)

where

    \frac{1}{R_{ni}} = \frac{1}{\rho_{ni}} + \cdots        (7-9.2)
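As an illustration of this connection scheme, the weight assignment can be written out directly. The sketch below is a schematic rendering of the rule just stated, with infinite weights standing in for transitions that are not allowed; the dictionary layout and 1-based indexing are choices made here for readability, not taken from the dissertation's simulator.

```python
import math

def connection_weights(local_d, N, M):
    """Weights between grid-point neurons for one reference pattern.

    local_d(n, m) : local distance d(n, m; k) for test frame n and reference frame m
    N, M          : numbers of test and reference frames

    Returns w[(n, i), (n + 1, j)] = d(n + 1, j; k) when j - i is 0, 1, or 2,
    and infinity otherwise, following the rule of Section 7.2.1.
    """
    w = {}
    for n in range(1, N):                      # stage n connects only to stage n + 1
        for i in range(1, M + 1):
            for j in range(1, M + 1):
                if 0 <= j - i <= 2:
                    w[(n, i), (n + 1, j)] = local_d(n + 1, j)
                else:
                    w[(n, i), (n + 1, j)] = math.inf
    return w
```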
7.3 Network Simulation

Although it is not guaranteed that the networks can provide the globally optimal paths, the simulation results in Chapter 4 have shown that the generated paths are very close to the optimal paths. The distances for the selected paths can then be used as difference measures to classify speech patterns.

7.3.1 Vocabulary Set

The vocabulary set consists of 36 isolated words, which are the letters A-Z and the digits 0-9. Each word was spoken 15 times by a male speaker, with the first 5 utterances used for reference patterns and the other 10 utterances used for test patterns. The number of segments per utterance is in the range of 30 to 50, with each segment corresponding to 10 to 20 ms of acoustic signal. Each segment is represented by a 6-dimensional vector of cepstral coefficients.

7.3.2 Initial Conditions

The selection of initial conditions for the networks is very important since there is more than one equilibrium point in each associated energy function. Experiments have shown that the optimal path usually occurs near the diagonal connecting the grid points (1, 1) and (N, M(k)) [95]. Moreover, the grid points with larger local distance should be penalized to prevent them from being selected as part of valid paths. Suppose the diagonal can be expressed as y = f(x; k). The initial condition is defined as

    v_{nm} = 0.05 \times |m - f(n;k)| \times \delta_1(3 - |m - f(n;k)|) + \frac{300 - d(n, m; k)}{3000} \times \delta_1(300 - d(n, m; k)) + 0.05        (7-10)

where \delta_1(c) = 1 if c ≥ 0, and 0 otherwise. Since each value of d(n, m; k) belongs to [0, 1024] in the simulation, v_{nm} is in the range of [0.05, 0.3] for all n and m.

7.3.3 Results

In the simulation, the parameters α and β are set to 25 and 0.01, respectively. To simplify the calculations, each d(n, m; k) is rounded to the nearest integer. Three methods are simulated. The first uses the conventional time warping method described in Section 7.2 with one reference pattern. The other two, based on the neural network method for dynamic programming described in Section 7.2.1, use one and five reference patterns, respectively. The method with five reference patterns follows a classification procedure similar to that of the 5-nearest-neighbor method [46]. The threshold to classify a test pattern into a category is 2; that is, a test pattern is classified into the category which holds more than two of the five best scores. In the case of a tie, where two categories each have 2 best scores among the best five, a test pattern is classified into the category which has the lower sum of scores. Ten test patterns are used for each word to test the performance of the networks; therefore there are 360 test patterns in the simulation. The recognition rates for the three methods are 87.22%, 86.66%, and 92.22%, respectively. Note that the first two results are very close.

Table 7-1 compares some of the distances found by the conventional method, which obtains the optimal path distances, and the neural network method for dynamic time warping with only one reference, for a test utterance of A. In this example, it happens that both methods correctly classify the test pattern. However, this is not true for every test pattern; in some cases, misclassifications occur in either method or both. A similar recognition rate for the same vocabulary set, using LPC coefficients, obtained by Itakura is 88.60% for 720 utterances [90]. This is not surprising since they are all based on dynamic time warping. Notice that the method with 5 reference patterns has a much better recognition rate than that with only one reference pattern. This fact affirms that the k-nearest-neighbor method is superior to the nearest-neighbor method for an appropriate value of k. However, the penalty is the increase in classification time.

Table 7-1. Comparisons of distances obtained by the conventional and the neural network method for DTW for one of the test patterns of A.

  Reference       A     B     C      D     E      F      G      H      I
  Conventional    432   463   5534   549   2247   1711   6398   2107   1940
  Neural net.     702   750   6187   739   2565   2490   9020   2158   2590

All digits are correctly recognized in the simulation. Most of the misclassifications occur in utterances where the vowel part is identical and the difference in the consonant part is relatively small. For example, B is often misclassified as D or V. Major misclassifications for the neural network methods are listed in Table 7-2. It is worth noting that the E test utterances are totally misclassified if only one reference pattern is used; however, only 2 errors occur when 5 reference patterns are used.
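The decision rule used by the five-reference method can be stated compactly in code. The sketch below is an illustration of the rule as described above (majority among the five best scores, with ties between two-score categories broken by the lower sum); the dictionary interface and names are choices made here, not taken from the dissertation.

```python
from collections import defaultdict

def classify_5ref(distances):
    """Classify a test pattern from its distances to the reference patterns.

    distances : dict mapping (category, reference_index) -> distance D(k)

    A category holding more than two of the five best scores wins; when categories
    tie, the one with the lower sum of its best scores is chosen.
    """
    best_five = sorted(distances.items(), key=lambda kv: kv[1])[:5]

    votes = defaultdict(int)
    sums = defaultdict(float)
    for (category, _), dist in best_five:
        votes[category] += 1
        sums[category] += dist

    top = max(votes.values())
    leaders = [c for c, v in votes.items() if v == top]
    if len(leaders) == 1:                        # a clear majority among the best five
        return leaders[0]
    return min(leaders, key=lambda c: sums[c])   # tie: lowest sum of scores wins
```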
7.4 A Speech Recognition System Based on the Proposed Network

Dynamic time warping is often implemented either on general digital signal processors or on specially designed processors [96-99]. In this section, we propose a neural network architecture for dynamic time warping with three salient advantages. The first is the high computation rate due to many simple processing elements operating in parallel, a typical feature of artificial neural networks. The second is the attribute of hardware fault tolerance due to the nature of collective computation in neural networks: the network can still provide good solutions even in situations where some of the processing elements fail to work, primarily because there is more than one minimum state in the associated energy function. The third advantage comes from the simple processing elements and the regular structure of this architecture, which make its implementation feasible with current VLSI technology.

Table 7-2. Major misclassifications of the simulation for the neural network methods. The second column lists the misclassified patterns for some of the test patterns in the first column. The last two columns list the number of misclassifications; for example, B is misclassified 4 and 2 times in 10 attempts as D or V in the methods with one and five reference patterns, respectively.

  Test Pattern    Misclassified as    Number of Errors
                                      1 ref.    5 ref.
  B               D or V              4         2
  D               B or V              5
  E               B                   10        2
  I               Y                   2         1
  N               M                   2         2
  T               D or P              5         5
  others                              19        11
  Total errors                        48        28

Figure 7-2 shows the block diagram of a speech recognition system including the artificial neural network for dynamic time warping. This system consists of four major parts: a signal preprocessor, a pipelined distance calculator, a control unit, and the artificial neural network itself. An utterance signal is preprocessed by the signal preprocessor into a sequence of feature vectors of cepstral coefficients. The local distances, d(n, m; k), for a test pattern and a reference pattern are calculated by the pipelined distance calculator in order to speed up the distance calculations. The calculated distances are then sent by the control unit to the neural network to set the connection weights of the network. The outputs of the network are fed back to the control unit to determine the difference measure between the test and reference pattern.

Figure 7-2. The block diagram of the speech recognition system which includes the neural network for dynamic time warping. (Blocks: signal processor, pipelined distance calculator, control unit, and neural network; the system output is the index of the best matched pattern.)

The pipelined distance calculator, shown in Figure 7-3, has 6 stages, corresponding to the case where the dimension of the feature vectors is 8. Four stages are for addition (subtraction), one is for multiplication, and the last stage is for division. This last stage can be simply implemented by a shift-right register since the denominator is 2^13. The control unit sets the initial conditions of the processing elements, determines the weights of the connections, and selects the best matched reference pattern.

Figure 7-3. The pipelined local distance calculator. (The stages compute the differences r[l] - t[l], square them, sum the products, and right-shift the result to produce d(m, n; k).)
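Because the denominator 1024 × L equals 2^13 for 8-component vectors, the final division is just a 13-bit right shift. The sketch below mirrors the stages of the pipeline in software, using only the integer operations a hardware implementation would need; it is an illustration of the datapath described above, not a hardware description.

```python
def pipelined_local_distance(r, t):
    """Integer-only local distance following the 6-stage pipeline of Figure 7-3 (L = 8).

    Four stages perform additions/subtractions, one performs the squaring,
    and the final division by 1024 * 8 = 2**13 is a right shift.
    """
    assert len(r) == len(t) == 8
    diffs = [ri - ti for ri, ti in zip(r, t)]                        # stage 1: subtract
    squares = [d * d for d in diffs]                                 # stage 2: multiply (square)
    pairs = [squares[2 * i] + squares[2 * i + 1] for i in range(4)]  # stage 3: first adder level
    halves = [pairs[0] + pairs[1], pairs[2] + pairs[3]]              # stage 4: second adder level
    total = halves[0] + halves[1]                                    # stage 5: final addition
    return total >> 13                                               # stage 6: shift-right divide
```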
To give an approximate estimate of the processing speed of the system, we assume that the cepstral coefficients of the test pattern have already been obtained by the signal preprocessor and that the processing time spent in the control unit can be neglected. Let the maximum delay for a stage of the pipelined distance calculator be T_delay, the number of grid points in the search space be N_g, and the convergence time for the neural network be T_c. Then the total processing time to classify a test pattern is approximately equal to

    K \times \left( (N_g + 5)\, T_{delay} + T_c \right)        (7-11)

where K is the number of reference patterns. The stage with the maximum delay in the distance calculator is the multiplication stage, and the convergence time of the neural network depends on the RC constants of the network. A 16-by-16 multiplier, which can be constructed from 4-by-4 binary multipliers (SN74S274) and look-ahead adders (SN74S182), has a typical multiplication time of less than 150 ns [100]. Assume the length of the feature patterns is 50; then the number of grid points in the search space is approximately equal to 500 [90]. From the experiments we have conducted, if RC is set to 13000, a network requires approximately 100 μs to converge. If a recognition system has 1,000 reference patterns, then the processing time to classify a test pattern is approximately equal to 200 ms, well within the limits for real-time applications.
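As a quick check of this estimate, equation (7-11) can be evaluated with the figures quoted above (1,000 references, roughly 500 grid points, a 150 ns multiplier stage, and a 100 μs convergence time). The few lines below are only that arithmetic; the values are the assumptions stated in this section, not measurements.

```python
def classification_time(K, Ng, T_delay, T_c):
    """Total time to classify one test pattern, Eq. (7-11): K * ((Ng + 5) * T_delay + T_c)."""
    return K * ((Ng + 5) * T_delay + T_c)

K = 1000           # reference patterns
Ng = 500           # grid points in the search space
T_delay = 150e-9   # seconds, multiplier-stage delay
T_c = 100e-6       # seconds, network convergence time

print(classification_time(K, Ng, T_delay, T_c))   # about 0.176 s, i.e. roughly 200 ms
```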
7.5 Summary

A neural network method for dynamic time warping in speech recognition has been proposed. Simulations for classifying speaker-dependent isolated words have shown that the recognition rate of the method is as good as that of conventional methods with the same vocabulary set. The recognition rate can be improved if more reference patterns are used. An architecture based on this method has also been proposed. There are three advantages to this proposed architecture. The first is the high computation rate due to many simple processing elements operating in parallel. The second is the attribute of fault tolerance to hardware faults due to the nature of collective computation of the neural network. And the third is that this architecture is simple and regular, making its implementation feasible using current VLSI technology.

Chapter 8
8.0 Conclusion

Dynamic programming is a very useful technique in many engineering applications, such as optimal control, optimization, and speech recognition. Due to the curse of dimensionality and the sequential nature of the computation, conventional algorithms are not effective in many applications requiring fast, perhaps real-time, solutions. This dissertation presents a fundamentally new approach to dynamic programming, including network formulation, analysis, simulation, implementation, and applications. The described artificial neural network method is attractive due to its robustness and radically improved speed over conventional techniques, especially where real-time near-optimal solutions are required.

8.1 Summary

This dissertation has presented an intensive study of applying neural networks to dynamic programming problems. The network formulation was defined by an energy function in which the optimal solution corresponds to the lowest energy state of the network. Following the development of the energy function, the network equations were determined. The network formulation has been shown to be a gradient system which converges to one of the stable equilibria if the initial conditions are sufficiently close.

The formulated network was analytically examined from the viewpoint of large-scale nonlinear systems. Properties related to the functionality of the network were proved. These include the existence and uniqueness of the solution, the stability and convergence of the network, the relationship between the equilibria and the minima of the energy function, and the finite number of equilibria of the network. To gain a clearer insight into the nonlinear system, each component of the associated energy function was discussed in the extreme cases. With α → ∞, the minimum states are the valid corner states representing valid solutions. With α = 0, the zero state is the unique minimum state. Then, the possible locations of the minimum states of the associated energy function for different values of the parameters α and β were derived.

Simulations have shown that the quality of the solutions improves as the value of the parameter β increases, up to a point. However, invalid results may occur if β is greater than a certain threshold. The network may become unstable, or the minimum states may move away from their original locations, if the connection errors exceed a certain value. In general, the smaller the total path length corresponding to a minimum state is, the more likely the minimum state can be preserved in its associated valid region. The basins of attraction of the network's equilibria were also investigated. It has been shown that the number of attracted initial conditions, which can be used as an index of the basins of attraction, increases as β is increased. Moreover, the results indicate that the formulated network can provide a near-optimal solution for different problems with sizes as large as 64 stages with 64 states in each stage.

An architecture based on a building block paradigm, in which the network is constructed from neuron array and weight assignment chips, was described. Because of its simple and regular structure, the architecture can be feasibly implemented with current VLSI technology.

Two significant applications based on dynamic programming neural networks were described in this dissertation as case studies. First, an incremental autoassociative memory network, in which memory patterns can be easily added to an existing associative memory without any modification to the previous connections, was proposed. It has been proven that each stored memory pattern corresponds to a global minimum state. Simulation has shown that every desired pattern can be successfully stored and recalled. Furthermore, the extent of the basins of attraction of the global minimum state can be adjusted by controlling the parameters α and β. Comparison with two other networks from the literature shows that the incremental autoassociative memory network is more flexible and robust.

The second application is dynamic time warping in speech recognition. The network was designed such that the minimum states of the energy function correspond to the near-best correlation between a test and a reference pattern. Simulations for classifying speaker-dependent isolated words, consisting of 0 to 9 and A to Z, have shown that the recognition rate of the method is as good as that of conventional methods with the same vocabulary set. Moreover, a speech recognition system based on this neural network method, which can recognize an utterance in 200 ms with a set of 1,000 references, was proposed.
8.2 Contributions This dissertation has the following salient contributions: (1) This dissertation has presented a fundamentally new and different approach to dynamic programming. This artificial neural network method is attractive due to its robustness and radically improved speed over conventional techniques. (2) This dissertation has provided a new analysis method for neural networks from the energy function point of view. These analytical results lay a mathematical foundation for the future research. (3) This dissertation has successfully applied the dynamic programrnin g neural networks to different applications such as optimal control, autoassociative memory, speech recognition, and flight planning. APPENDICES APPENDIX A The 50x60 cost values: 0.000000 0.084274 0.078459 0.068828 0.044160 0.014342 0.004604 0.004604 0.007609 0.014133 0.034845 0.102514 0.112155 0.108527 0.086704 0.060406 0.042231 0.005931 0.017125 0.018539 0.038956 0.111514 0.132861 0.141905 0.124018 0.097794 0.080150 0.041049 0.044340 0.020466 0.039012 0.111668 0.133011 0.141958 0.124021 0.102154 0.083596 0.042705 0.046155 0.020466 0.039012 0.111668 0.133011 0.142258 0.124221 0.102438 0.083758 0.042992 0.046453 0.020653 0.039210 0.111930 0.134162 0.142304 0.124221 0.102438 0.083758 0.043160 0.046601 0.020745 0.000000 0.078297 0.085005 0.058051 0.038992 0.009139 0.009139 0.009139 0.009139 0.013671 0.019753 0.090301 0.091803 0.099115 0.084584 0.051351 0.011106 0.005291 0.011485 0.018510 0.019753 0.098548 0.095402 0.133081 0.121798 0.087752 0.048353 0.040422 0.041041 0.020563 0.020054 0.098789 0.095651 0.133121 0.140290 0.109107 0.060448 0.043609 0.043095 0.020563 0.020054 0.098789 0.095651 0.133121 0.140471 0.110876 0.061522 0.043877 0.043347 0.020818 0.020236 0.099391 0.095787 0.133121 0.140471 0.110876 0.061522 0.043910 0.043447 0.022108 0.000000 0.065918 0.047892 0.032700 0.018501 0.018501 0.018501 0.018501 0.018501 0.018501 0.031112 0.078490 0.052349 0.074612 0.079573 0.012063 0.029771 0.003365 0.001970 0.000951 0.031239 0.083819 0.071379 0.104705 0.114286 0.047060 0.036348 0.040704 0.006716 0.007825 0.031523 0.083872 0.071427 0.113629 0.138166 0.113242 0.078295 0.048324 0.008667 0.008854 0.031523 0.083872 0.071427 0.113629 0.138302 0.113567 0.078495 0.048604 0.008895 0.009066 0.032291 0.084066 0.071427 0.113629 0.138302 0.113567 0.078495 0.048604 0.009186 0.009891 0.095740 0.024221 0.025691 0.013847 0.008064 0.008064 0.008064 0.008064 0.008064 0.008064 0.021106 0.035518 0.029386 0.018773 0.045479 0.022924 0.016031 0.009240 0.004304 0.004002 0.021124 0.039101 0.040989 0.043687 0.077465 0.053312 0.047119 0.035041 0.007469 0.004574 0.021127 0.039101 0.040989 0.066690 0.122496 0.094759 0.084801 0.073846 0.011204 0.012461 0.021852 0.039101 0.040997 0.066740 0.122733 0.094988 0.084901 0.074097 0.013410 0.012547 0.021880 0.039101 0.040997 0.066740 0.122733 0.094988 0.084901 0.074097 0.013410 0.012817 123 124 0.084196 0.008955 0.001701 0.001701 0.001701 0.001701 0.001701 0.001701 0.001701 0.001701 0.010239 0.013617 0.008643 0.006131 0.018624 0.013942 0.014948 0.008470 0.004492 0.004175 0.010309 0.014043 0.014882 0.020080 0.022032 0.035962 0.035928 0.011133 0.005481 0.004341 0.010374 0.018089 0.082452 0.061164 0.079141 0.131332 0.101828 0.045423 0.008961 0.005942 0.010374 0.018089 0.082465 0.061231 0.079410 0.131669 0.102035 0.045789 0.009334 0.006072 0.010374 0.018089 0.082465 0.061231 0.079410 0.131669 0.102035 0.045789 0.009334 0.00607 2 0.079307 0.079307 0.079307 0.079307 0.079307 0.079307 0.079307 0.079307 0.079307 
0.079307 0.079307 0.001504 0.001664 0.003166 0.003734 0.003317 0.003722 0.001779 0.001067 0.000816 0.000985 0.001587 0.001764 0.007138 0.010230 0.010728 0.004610 0.002269 0.001519 0.000983 0.005858 0.009982 0.018336 0.063641 0.078575 0.027667 0.014481 0.007524 0.003569 0.000983 0.005858 0.009985 0.018355 0.064170 0.078843 0.028146 0.014745 0.007 819 0.003909 0.001113 0.005858 0.009985 0.018355 0.064170 0.07 8843 0.028146 0.014745 0.007819 0.003909 0.001113 0.074540 0.074540 0.074540 0.074540 0.074540 0.074540 0.074540 0.074540 0.074540 0.074540 0.074540 0.074540 0.074540 0.074540 0.002125 0.002196 0.001317 0.000496 0.000330 0.000519 0.000634 0.002134 0.001621 0.001202 0.004060 0.004477 0.001511 0.001702 0.000562 0.000785 0.002441 0.006723 0.011156 0.046189 0.079500 0.017260 0.001511 0.001702 0.000562 0.000785 0.002441 0.006723 0.011194 0.046317 0.079867 0.017833 0.001649 0.001800 0.000680 0.000818 0.002441 0.006727 0.011194 0.046317 0.079867 0.017833 0.001649 0.001800 0.000680 0.000818 0.089847 0.089847 0.089847 0.089847 0.089847 0.089847 0.089847 0.089847 0.089847 0.089847 0.089847 0.001582 0.001582 0.001582 0.001436 0.001436 0.001436 0.001436 0.000840 0.000581 0.000957 0.001808 0.001808 0.001808 0.002554 0.002554 0.002554 0.003443 0.001050 0.000845 0.001096 0.002891 0.005736 0.008696 0.007455 0.004714 0.004714 0.003443 0.002202 0.000845 0.001096 0.002908 0.005802 0.009602 0.007760 0.004991 0.003652 0.003499 0.002309 0.000879 0.001096 0.002908 0.005802 0.009602 0.007760 0.004991 0.003652 0.003499 0.002309 0.000879 0.089367 0.089367 0.089367 0.089367 0.089367 0.089367 0.089367 0.089367 0.089367 0.089367 0.089367 0.001567 0.000687 0.000687 0.000687 0.000687 0.000251 0.000413 0.000326 0.000494 0.001775 0.003315 0.001771 0.001771 0.001771 0.001119 0.000396 0.000718 0.000655 0.000873 0.001995 0.004231 0.003756 0.006838 0.004328 0.001180 0.000396 0.000718 0.000655 0.000873 0.001996 0.004259 0.003909 0.007110 0.004503 0.001435 0.001278 0.000863 0.001953 0.000922 0.001996 0.004259 0.003909 0.007110 0.004503 0.001435 0.001278 0.000863 0.001953 0.000922 125 0.096186 0.104923 0.142991 0.139899 0.123460 0.093852 0.044177 0.051987 0.027380 0.021356 0.039847 0.106748 0.143206 0.139971 0.123507 0.093921 0.046200 0.052235 0.027479 0.024214 0.039989 0.106785 0.143207 0.140299 0.123623 0.094110 0.046389 0.053425 0.027684 0.024637 0.040350 0.106988 0.143339 0.140966 0.125173 0.094286 0.046487 0.053489 0.027743 0.024709 0.040439 0.108220 0.143653 0.143040 0.126656 0.094639 0.046665 0.053637 0.028049 0.025566 0.040445 0.108220 0.143653 0.143040 0.126656 0.094639 0.046665 0.053637 0.028049 0.025566 0.074625 0.113498 0.134280 0.142469 0.124337 0.102448 0.083758 0.043160 0.046601 0.020745 0.039501 0.113721 0.134400 0.143201 0.133389 0.106048 0.087341 0.046488 0.052456 0.024666 0.040583 0.114253 0.134449 0.143201 0.133819 0.106195 0.087569 0.046672 0.052620 0.025012 0.040961 0.114555 0.134709 0.143830 0.133936 0.107998 0.087802 0.046803 0.052738 0.025196 0.041180 0.116807 0.136736 0.144040 0.134329 0.108437 0.088521 0.047662 0.053017 0.025350 0.041180 0.116807 0.136736 0.144040 0.134329 0.108437 0.088521 0.047913 0.053023 0.025377 0.075134 0.101420 0.095907 0.134315 0.141322 0.110974 0.061522 0.043910 0.043447 0.022108 0.021378 0.101420 0.095941 0.135529 0.145395 0.126147 0.086452 0.048697 0.051433 0.025200 0.023065 0.102345 0.096234 0.135529 0.145395 0.126200 0.088035 0.048942 0.051521 0.025377 0.023304 0.102509 0.097863 0.135713 0.146094 0.126329 0.088215 0.049067 0.051618 0.025444 0.023454 0.102656 0.097959 
0.136141 0.147216 0.126750 0.089733 0.049199 0.051731 0.025444 0.023454 0.102656 0.097959 0.136141 0.147216 0.126819 0.091398 0.049593 0.052151 0.025444 0.088932 0.085376 0.071478 0.114632 0.138420 0.113645 0.07 8495 0.048604 0.009186 0.009891 0.033774 0.085376 0.071478 0.114812 0.142974 0.121308 0.085924 0.054358 0.027330 0.015630 0.036933 0.085819 0.071777 0.114812 0.142974 0.121308 0.086052 0.054462 0.027375 0.015664 0.036997 0.085899 0.071899 0.116411 0.144107 0.121378 0.086164 0.054533 0.027488 0.015745 0.037034 0.085968 0.072091 0.116626 0.144285 0.121613 0.086410 0.054619 0.027488 0.015745 0.037034 0.085968 0.072124 0.117036 0.145507 0.130535 0.087936 0.055993 0.032256 0.015970 0.091389 0.040357 0.041684 0.068088 0.122892 0.095079 0.084901 0.074097 0.013410 0.012817 0.022078 0.040357 0.041684 0.072424 0.132912 0.116698 0.146721 0.086507 0.018820 0.016978 0.023929 0.041328 0.042153 0.072431 0.132912 0.116698 0.146773 0.086507 0.018820 0.016999 0.023975 0.041385 0.042281 0.072647 0.133180 0.116939 0.146957 0.086507 0.018820 0.016999 0.024044 0.041453 0.042469 0.072806 0.133312 0.117112 0.147021 0.086507 0.018820 0.016999 0.024044 0.041524 0.042621 0.073858 0.135326 0.120654 0.151169 0.105772 0.035364 0.018110 126 0.065059 0.019289 0.082968 0.062783 0.081108 0.131742 0.102035 0.045789 0.009334 0.006072 0.011181 0.019289 0.083551 0.066265 0.092754 0.210739 0.154901 0.090700 0.059835 0.015111 0.014591 0.028154 0.084261 0.070244 0.092754 0.2107 39 0.154901 0.090700 0.059835 0.015125 0.014623 0.028200 0.084337 0.070346 0.092850 0.210884 0.155013 0.090700 0.059835 0.015125 0.014623 0.028200 0.084452 0.070416 0.092884 0.211148 0.155013 0.090700 0.059835 0.015125 0.014623 0.028270 0.085198 0.073440 0.097452 0.217360 0.188825 0.124065 0.086112 0.030682 0.059192 0.011365 0.018425 0.065374 0.079084 0.029137 0.015159 0.007 819 0.003909 0.001113 0.005908 0.011440 0.021838 0.071727 0.095355 0.064497 0.054709 0.047702 0.041985 0.038599 0.008802 0.022501 0.026781 0.072783 0.095355 0.064497 0.054775 0.047702 0.041985 0.038602 0.008809 0.022531 0.026812 0.072813 0.095371 0.064539 0.054821 0.047702 0.041985 0.038602 0.008809 0.022531 0.026838 0.073134 0.096962 0.066890 0.055162 0.047702 0.041985 0.038602 0.008809 0.022761 0.032459 0.077636 0.103061 0.103206 0.090400 0.082687 0.075780 0.059898 0.087031 0.008631 0.012501 0.047269 0.080993 0.018009 0.001791 0.001846 0.000680 0.000818 0.002490 0.008631 0.012501 0.054877 0.093342 0.043162 0.038811 0.041978 0.014666 0.029396 0.016179 0.010026 0.016472 0.054877 0.093342 0.043162 0.038940 0.042209 0.014829 0.029525 0.016306 0.010089 0.016488 0.054887 0.093353 0.043179 0.038940 0.042209 0.014829 0.029525 0.016306 0.012688 0.020454 0.062141 0.102047 0.052207 0.043750 0.042409 0.014829 0.029525 0.016306 0.013014 0.028045 0.083752 0.134096 0.087636 0.080198 0.079019 0.049327 0.057656 0.073765 0.004105 0.007291 0.011094 0.008751 0.005700 0.004785 0.003676 0.003106 0.000933 0.001675 0.004157 0.008082 0.017533 0.018797 0.022157 0.028976 0.010188 0.006952 0.002985 0.003010 0.009079 0.011137 0.017533 0.018797 0.022187 0.030298 0.012125 0.007749 0.004068 0.003302 0.009203 0.011209 0.017545 0.018799 0.022187 0.030301 0.012129 0.007749 0.004068 0.003302 0.015053 0.023259 0.036834 0.042815 0.041243 0.047361 0.013460 0.008224 0.004097 0.003302 0.015053 0.024276 0.056289 0.076209 0.077581 0.085022 0.050051 0.042550 0.008519 0.057531 0.005027 0.004842 0.007411 0.006030 0.001863 0.001417 0.001142 0.002213 0.002282 0.002198 0.006177 0.006476 0.011748 0.011158 0.003242 0.003590 
0.002511 0.002809 0.003359 0.011633 0.006914 0.007062 0.011748 0.011187 0.003356 0.003788 0.002700 0.002858 0.003505 0.011910 0.007575 0.007251 0.011748 0.011187 0.003356 0.003788 0.002703 0.002858 0.003505 0.014819 0.020207 0.031794 0.043173 0.044164 0.037296 0.029755 0.018699 0.007715 0.003534 0.014819 0.020259 0.035568 0.045513 0.069374 0.071268 0.064554 0.053220 0.036960 0.007050 127 0.040026 0.108685 0.143757 0.143217 0.126837 0.095779 0.047268 0.054559 0.029453 0.026731 0.040563 0.109361 0.143905 0.143613 0.127936 0.101525 0.048228 0.060819 0.030344 0.028010 0.041548 0.109680 0.144937 0.143635 0.128047 0.101727 0.049095 0.061472 0.030491 0.028206 0.041716 0.109757 0.145161 0.143642 0.128047 0.101727 0.049095 0.061472 0.030491 0.028415 0.046720 0.128485 0.176477 0.178744 0.165046 0.138163 0.084097 0.065296 0.039891 0.028678 0.046720 0.128505 0.178314 0.187154 0.183469 0.160480 0.111180 0.090787 0.059823 0.038243 0.047013 0.117183 0.136810 0.144576 0.135351 0.110064 0.089556 0.047992 0.054027 0.026606 0.042427 0.117986 0.137684 0.145172 0.135562 0.110433 0.095218 0.052736 0.055827 0.026653 0.042459 0.118032 0.137738 0.145282 0.135686 0.110611 0.095329 0.052873 0.056015 0.026865 0.042587 0.118075 0.137761 0.145292 0.135686 0.110611 0.095329 0.052873 0.056015 0.027079 0.050892 0.140764 0.170776 0.182250 0.172903 0.148729 0.130135 0.079020 0.065624 0.028882 0.050892 0.140764 0.171657 0.184366 0.180301 0.160796 0.144143 0.094425 0.073872 0.034107 0.041253 0.102995 0.098010 0.136720 0.148225 0.127043 0.093088 0.049814 0.053680 0.027087 0.025608 0.104472 0.099459 0.138320 0.150196 0.128968 0.094932 0.053199 0.053984 0.027330 0.025718 0.104563 0.099624 0.138565 0.150356 0.129092 0.096500 0.053474 0.054142 0.027440 0.025820 0.104611 0.099644 0.138604 0.150356 0.129092 0.096500 0.053474 0.054142 0.027440 0.032096 0.124894 0.133494 0.174308 0.188385 0.167040 0.130805 0.076223 0.055975 0.027914 0.032126 0.124894 0.133494 0.174311 0.189647 0.170413 0.136814 0.078218 0.058205 0.028983 0.028983 0.086133 0.072167 0.117093 0.146094 0.131324 0.088969 0.056150 0.033415 0.016084 0.037198 0.086859 0.073255 0.118597 0.146539 0.131541 0.089098 0.056339 0.033698 0.016274 0.037337 0.086953 0.07 3255 0.118709 0.146666 0.131644 0.089347 0.058012 0.033825 0.016322 0.037402 0.086998 0.073283 0.118713 0.146666 0.131644 0.089347 0.058012 0.033825 0.016322 0.041184 0.089605 0.099125 0.151874 0.153545 0.170811 0.118776 0.081311 0.034687 0.016452 0.041184 0.089605 0.099125 0.151874 0.153545 0.171295 0.119203 0.081311 0.034687 0.016452 0.008839 0.041566 0.042640 0.073895 0.135505 0.120885 0.151201 0.106167 0.036072 0.018172 0.027109 0.041948 0.042771 0.074637 0.135642 0.120990 0.151311 0.106205 0.036177 0.019206 0.027241 0.042024 0.042771 0.074637 0.135704 0.121129 0.151438 0.106991 0.036257 0.019268 0.027283 0.042119 0.042848 0.074673 0.135704 0.121129 0.151438 0.106991 0.036257 0.019272 0.029080 0.050554 0.060769 0.096649 0.162990 0.147307 0.155325 0.108597 0.036442 0.019273 0.029080 0.050554 0.060769 0.096649 0.162990 0.147307 0.155325 0.108597 0.036442 0.019273 128 0.019273 0.028270 0.085204 0.073456 0.097540 0.217480 0.188841 0.124287 0.086159 0.030723 0.019661 0.028652 0.085772 0.073541 0.097562 0.217563 0.188844 0.124429 0.086353 0.030978 0.019760 0.028826 0.085772 0.07 3541 0.097701 0.217821 0.188896 0.124510 0.086422 0.031052 0.019825 0.029074 0.086038 0.073662 0.097701 0.217821 0.188896 0.124521 0.086422 0.031071 0.020523 0.031002 0.093305 0.085795 0.112160 0.233943 0.190266 0.125467 0.086422 0.031071 
0.020523 0.031002 0.093305 0.085795 0.112160 0.233943 0.190266 0.125467 0.086422 0.031071 0.031071 0.022761 0.032459 0.077638 0.103061 0.103207 0.090400 0.082687 0.075782 0.059906 0.010543 0.022819 0.032790 0.077677 0.103197 0.103207 0.090532 0.082764 0.075946 0.060039 0.010682 0.022929 0.032790 0.077677 0.103270 0.103315 0.090577 0.082802 0.075977 0.060101 0.010770 0.023272 0.032848 0.077982 0.103575 0.103544 0.090656 0.082914 0.076041 0.060215 0.010815 0.023272 0.034040 0.078579 0.104561 0.103848 0.091061 0.083988 0.076041 0.060215 0.010815 0.023272 0.034040 0.078579 0.104561 0.103848 0.091061 0.083988 0.076041 0.060215 0.060215 0.013014 0.028045 0.083752 0.134096 0.087636 0.080198 0.079019 0.049327 0.057658 0.018233 0.013030 0.028056 0.083837 0.134103 0.087652 0.080247 0.079047 0.049429 0.057733 0.018278 0.013059 0.028056 0.083837 0.134103 0.087681 0.080274 0.079078 0.049469 0.057806 0.018367 0.013111 0.028557 0.084311 0.134757 0.088132 0.080472 0.079324 0.049617 0.057847 0.018588 0.013118 0.028557 0.084757 0.135099 0.088132 0.080472 0.079324 0.049617 0.057847 0.018588 0.013118 0.028557 0.084757 0.135099 0.088132 0.080472 0.079324 0.049617 0.057847 0.000000 0.015053 0.024276 0.056289 0.07 6209 0.077581 0.085022 0.050051 0.042550 0.008519 0.004797 0.015053 0.024276 0.056298 0.076209 0.077581 0.085030 0.050071 0.042613 0.008554 0.004824 0.015079 0.024276 0.056303 0.07 6209 0.077609 0.085065 0.050118 0.042736 0.008749 0.004959 0.015168 0.025359 0.057238 0.077264 0.077775 0.086000 0.050877 0.043252 0.008833 0.005397 0.015193 0.025405 0.057238 0.077264 0.077775 0.086000 0.050877 0.043252 0.008833 0.005397 0.015259 0.025405 0.057238 0.077264 0.077775 0.086000 0.050877 0.043252 0.008833 0.000000 0.020259 0.035568 0.045513 0.069374 0.071268 0.064554 0.053220 0.036960 0.007050 0.020442 0.020259 0.035568 0.045515 0.069374 0.071269 0.064554 0.053220 0.037042 0.007060 0.020442 0.020259 0.035568 0.045525 0.069390 0.071293 0.064617 0.053355 0.037191 0.007214 0.020568 0.020437 0.035877 0.045796 0.069609 0.071516 0.066383 0.054909 0.037352 0.007885 0.020697 0.020851 0.035896 0.045901 0.069609 0.071516 0.066383 0.054946 0.037390 0.008062 0.021118 0.021304 0.036060 0.045901 0.069619 0.071534 0.066388 0.054946 0.037390 0.008062 129 0.000000 0.128505 0.178314 0.187154 0.183469 0.160480 0.111180 0.090787 0.059823 0.038243 0.048043 0.128505 0.178314 0.187154 0.183469 0.160480 0.111180 0.0907 87 0.059823 0.038247 0.048043 0.128514 0.178332 0.187170 0.183469 0.160526 0.111261 0.090928 0.059881 0.038310 0.048144 0.128602 0.178496 0.187377 0.184589 0.161360 0.112292 0.091174 0.060054 0.038878 0.048582 0.128728 0.178634 0.187378 0.184589 0.161360 0.112292 0.091630 0.060703 0.040293 0.049975 0.139709 0.184752 0.188268 0.184808 0.161760 0.112414 0.091637 0.060716 0.040295 0.000000 0.140764 0.171657 0.184366 0.180301 0.160796 0.144143 0.094425 0.073872 0.034107 0.050892 0.140764 0.171657 0.184366 0.180301 0.160796 0.144143 0.094425 0.07 3872 0.034107 0.050903 0.140791 0.171703 0.184385 0.180301 0.160853 0.144329 0.094501 0.073911 0.034159 0.051026 0.140853 0.171823 0.185440 0.181606 0.161755 0.144807 0.095648 0.075274 0.035310 0.051774 0.140958 0.172098 0.185567 0.181720 0.162626 0.145326 0.104534 0.090163 0.055429 0.055113 0.144102 0.173994 0.186744 0.182133 0.162684 0.145385 0.105021 0.090206 0.055451 0.055451 0.124894 0.133494 0.174311 0.189647 0.170413 0.136814 0.078218 0.058205 0.028983 0.032126 0.124894 0.133494 0.174311 0.189647 0.170413 0.136814 0.07 8218 0.058205 0.028985 0.032138 0.124927 0.133566 0.174338 
0.189677 0.170500 0.136995 0.078326 0.058315 0.029075 0.032381 0.125046 0.133760 0.175678 0.191456 0.172373 0.138087 0.078748 0.060046 0.031166 0.033522 0.125861 0.134398 0.175896 0.192218 0.173957 0.151243 0.102475 0.085342 0.063468 0.067472 0.158959 0.161036 0.192567 0.192968 0.174087 0.151317 0.103293 0.085915 0.063849 0.063849 0.089605 0.099125 0.151874 0.153545 0.171295 0.119203 0.081311 0.034687 0.016452 0.041184 0.089605 0.099125 0.151874 0.153545 0.171295 0.119203 0.081311 0.034687 0.016452 0.041194 0.089639 0.099216 0.151975 0.153646 0.171363 0.119311 0.081399 0.034811 0.016681 0.041315 0.089698 0.099893 0.152575 0.155003 0.173385 0.121281 0.082099 0.038341 0.018959 0.042862 0.090982 0.100031 0.153155 0.157773 0.194173 0.126090 0.103422 0.087879 0.061593 0.079755 0.126793 0.133556 0.174598 0.168289 0.194348 0.126221 0.104715 0.089285 0.062824 0.000000 0.050554 0.060769 0.096649 0.162990 0.147307 0.155325 0.108597 0.036442 0.019273 0.029080 0.050554 0.0607 69 0.096649 0.162990 0.147307 0.155325 0.108597 0.036442 0.019274 0.029097 0.050595 0.060874 0.096852 0.163126 0.147381 0.155374 0.108694 0.036624 0.019741 0.029108 0.050595 0.060875 0.096852 0.166342 0.152923 0.167025 0.119680 0.047138 0.025923 0.030742 0.050858 0.061037 0.098967 0.197857 0.185960 0.200736 0.177100 0.105375 0.077186 0.072076 0.087498 0.095651 0.102021 0.198907 0.186483 0.200989 0.178591 0.106959 0.078820 129 0.000000 0.128505 0.178314 0.187154 0.183469 0.160480 0.111180 0.090787 0.059823 0.038243 0.048043 0.128505 0.178314 0.187154 0.183469 0.160480 0.111180 0.0907 87 0.059823 0.038247 0.048043 0.128514 0.178332 0.187170 0.183469 0.160526 0.111261 0.090928 0.059881 0.038310 0.048144 0.128602 0.178496 0.187377 0.184589 0.161360 0.112292 0.091174 0.060054 0.038878 0.048582 0.128728 0.178634 0.187378 0.184589 0.161360 0.112292 0.091630 0.060703 0.040293 0.049975 0.139709 0.184752 0.188268 0.184808 0.161760 0.112414 0.091637 0.060716 0.040295 0.000000 0.140764 0.171657 0.184366 0.180301 0.160796 0.144143 0.094425 0.07 3872 0.034107 0.050892 0.140764 0.171657 0.184366 0.180301 0.160796 0.144143 0.094425 0.07 3872 0.034107 0.050903 0.140791 0.171703 0.184385 0.180301 0.160853 0.144329 0.094501 0.073911 0.034159 0.051026 0.140853 0.171823 0.185440 0.181606 0.161755 0.144807 0.095648 0.075274 0.035310 0.051774 0.140958 0.172098 0.185567 0.181720 0.162626 0.145326 0.104534 0.090163 0.055429 0.055113 0.144102 0.173994 0.186744 0.182133 0.162684 0.145385 0.105021 0.090206 0.055451 0.055451 0.124894 0.133494 0.174311 0.189647 0.170413 0.136814 0.078218 0.058205 0.028983 0.032126 0.124894 0.133494 0.174311 0.189647 0.170413 0.136814 0.07 8218 0.058205 0.028985 0.032138 0.124927 0.133566 0.174338 0.189677 0.170500 0.136995 0.07 8326 0.058315 0.029075 0.032381 0.125046 0.133760 0.175678 0.191456 0.172373 0.138087 0.078748 0.060046 0.031166 0.033522 0.125861 0.134398 0.175896 0.192218 0.173957 0.151243 0.102475 0.085342 0.063468 0.067472 0.158959 0.161036 0.192567 0.192968 0.174087 0.151317 0.103293 0.085915 0.063849 0.063849 0.089605 0.099125 0.151874 0.153545 0.171295 0.119203 0.081311 0.034687 0.016452 0.041184 0.089605 0.099125 0.151874 0.153545 0.171295 0.119203 0.081311 0.034687 0.016452 0.041194 0.089639 0.099216 0.151975 0.153646 0.171363 0.119311 0.081399 0.034811 0.016681 0.041315 0.089698 0.099893 0.152575 0.155003 0.173385 0.121281 0.082099 0.038341 0.018959 0.042862 0.090982 0.100031 0.153155 0.157773 0.194173 0.126090 0.103422 0.087879 0.061593 0.079755 0.126793 0.133556 0.174598 0.168289 0.194348 0.126221 
0.104715 0.089285 0.062824 0.000000 0.050554 0.060769 0.096649 0.162990 0.147307 0.155325 0.108597 0.036442 0.019273 0.029080 0.050554 0.0607 69 0.096649 0.162990 0.147307 0.155325 0.108597 0.036442 0.019274 0.029097 0.050595 0.060874 0.096852 0.163126 0.147381 0.155374 0.108694 0.036624 0.019741 0.029108 0.050595 0.060875 0.096852 0.166342 0.152923 0.167025 0.119680 0.047138 0.025923 0.030742 0.050858 0.061037 0.098967 0.197857 0.185960 0.2007 36 0.177100 0.105375 0.077186 0.072076 0.087498 0.095651 0.102021 0.198907 0.186483 0.200989 0.178591 0.106959 0.07 8820 130 0.000000 0.031002 0.093305 0.085795 0.112160 0.233943 0.190266 0.125467 0.086422 0.031071 0.020523 0.031002 0.093305 0.085795 0.112160 0.233943 0.190266 0.125467 0.086422 0.031071 0.020540 0.031310 0.093474 0.085988 0.112245 0.234005 0.190362 0.125608 0.086516 0.031087 0.020540 0.031310 0.093474 0.086002 0.119292 0.249645 0.211239 0.151128 0.108303 0.049731 0.028496 0.031885 0.093762 0.087494 0.161705 0.286236 0.247868 0.189686 0.175708 0.105676 0.066388 0.068014 0.129229 0.108173 0.162992 0.286616 0.248051 0.190621 0.176208 0.106599 0.106599 0.023272 0.034040 0.07 8579 0.104561 0.103848 0.091061 0.083988 0.076041 0.060215 0.010815 0.023272 0.034040 0.07 8579 0.104561 0.103848 0.091061 0.083988 0.076041 0.060215 0.011054 0.023768 0.034207 0.080103 0.104609 0.103967 0.091331 0.084095 0.076168 0.060215 0.011054 0.023768 0.034207 0.084265 0.120884 0.129146 0.124864 0.118300 0.111812 0.088513 0.026512 0.024997 0.034360 0.084460 0.126202 0.176799 0.165853 0.156333 0.171793 0.126444 0.060578 0.058192 0.061962 0.087260 0.126724 0.177847 0.167075 0.157888 0.172857 0.126538 0.126538 0.013118 0.028557 0.084757 0.135099 0.088132 0.080472 0.079324 0.049617 0.057847 0.018588 0.013118 0.028557 0.084757 0.135099 0.088132 0.080472 0.079324 0.049617 0.057851 0.018644 0.013902 0.028725 0.084980 0.135952 0.089921 0.080706 0.079702 0.049617 0.057851 0.018644 0.013902 0.029001 0.092446 0.155197 0.123206 0.116710 0.116875 0.086617 0.092593 0.041215 0.028283 0.029199 0.092486 0.155257 0.133727 0.165289 0.154661 0.136201 0.132680 0.074098 0.050353 0.046957 0.101128 0.155453 0.135035 0.166705 0.156599 0.138162 0.133725 0.000000 0.015259 0.025405 0.057238 0.077264 0.077775 0.086000 0.050877 0.043252 0.008833 0.005397 0.015259 0.025405 0.057238 0.077264 0.077775 0.086000 0.050877 0.043255 0.008846 0.005446 0.015841 0.025522 0.057474 0.077394 0.077972 0.086143 0.051066 0.043255 0.008846 0.005446 0.015841 0.026942 0.065499 0.102799 0.111010 0.123907 0.088142 0.080180 0.044463 0.032247 0.017881 0.030886 0.065526 0.102849 0.111035 0.135543 0.139015 0.115101 0.069569 0.046775 0.030179 0.031755 0.066304 0.102913 0.111532 0.135930 0.139770 0.116197 0.070718 0.070718 0.024567 0.036060 0.045901 0.069619 0.071534 0.066388 0.054946 0.037390 0.008062 0.021118 0.024567 0.036060 0.045901 0.069619 0.071534 0.066388 0.054946 0.037399 0.008103 0.021185 0.025188 0.036217 0.048039 0.070593 0.071679 0.066545 0.055177 0.037399 0.008103 0.021185 0.025188 0.036217 0.056068 0.090974 0.105671 0.103295 0.092529 0.074999 0.014833 0.044657 0.027209 0.041229 0.056134 0.091043 0.105713 0.103315 0.098465 0.099828 0.018908 0.049272 0.027749 0.042471 0.056170 0.091043 0.105713 0.103315 0.098465 0.100126 0.019830 131 0.000000 0.139709 0.184752 0.188268 0.184808 0.161760 0.112414 0.091637 0.060716 0.040295 0.049979 0.139709 0.184752 0.188268 0.184808 0.161760 0.112414 0.091637 0.060724 0.040331 0.050031 0.139768 0.185885 0.188466 0.184886 0.161932 0.112520 0.091828 0.060724 0.040331 
0.050031 0.139768 0.185885 0.192661 0.199693 0.188126 0.146766 0.127541 0.096489 0.072710 0.069377 0.140991 0.186069 0.192760 0.200181 0.188154 0.146856 0.130148 0.109018 0.073762 0.069494 0.141052 0.186069 0.192760 0.200181 0.188154 0.146856 0.130148 0.109018 0.074167 0.000000 0.144102 0.173994 0.186744 0.182133 0.162684 0.145385 0.105021 0.090206 0.055451 0.055136 0.144102 0.173994 0.186744 0.182133 0.162684 0.145385 0.105021 0.090212 0.055488 0.055209 0.144423 0.174540 0.189496 0.182459 0.163212 0.145723 0.105272 0.090212 0.055488 0.055209 0.144423 0.174540 0.191818 0.190233 0.180591 0.166442 0.132360 0.121665 0.060357 0.056942 0.144649 0.174739 0.192754 0.190326 0.180605 0.166442 0.132460 0.121986 0.060796 0.057014 0.144649 0.174739 0.192754 0.190326 0.180605 0.166442 0.132460 0.121986 0.060796 0.000000 0.158959 0.161036 0.192567 0.192968 0.174087 0.151317 0.103293 0.085915 0.063849 0.067798 0.158959 0.161036 0.192567 0.192968 0.174087 0.151317 0.103293 0.085921 0.063890 0.067873 0.161062 0.165771 0.201739 0.204946 0.185276 0.152326 0.104707 0.085921 0.063890 0.067873 0.161062 0.165771 0.201800 0.205216 0.190694 0.162218 0.106746 0.101512 0.074500 0.069038 0.161147 0.167737 0.201933 0.205283 0.190708 0.162218 0.106746 0.101549 0.074704 0.069046 0.161147 0.167737 0.201933 0.205283 0.190708 0.162218 0.106746 0.101549 0.074704 0.000001 0.126793 0.133556 0.174598 0.168289 0.194348 0.126221 0.104715 0.089285 0.062824 0.080694 0.126793 0.133556 0.174598 0.168289 0.194348 0.126221 0.104715 0.089290 0.062849 0.080731 0.131140 0.145080 0.195873 0.193692 0.218867 0.129024 0.113841 0.094097 0.063435 0.080731 0.131140 0.145080 0.195873 0.193692 0.219593 0.129719 0.114618 0.094470 0.063765 0.080890 0.131271 0.145164 0.195937 0.193722 0.219593 0.129719 0.114638 0.094553 0.063765 0.080890 0.131271 0.145164 0.195937 0.193722 0.219593 0.129719 0.114638 0.094553 0.063765 0.000000 0.087498 0.095651 0.102021 0.198907 0.186483 0.200989 0.178591 0. 106959 0.07 8820 0.072310 0.087498 0.095651 0.102021 0.198907 0.186483 0200989 0.178591 0.106960 0.078827 0.074611 0.097926 0.119055 0.132910 0.233182 0.222195 0.204930 0.182597 0.114702 0.080378 0.074611 0.097926 0- l19055 0.132910 0.233182 0.222356 0.205708 0.183887 0.114899 0.081128 0.075752 0.098185 0.119312 0.132988 0.233210 0.222360 0.205730 0.183887 0. 
1 14972 0.081318 0.075757 0.098185 0.119312 0.132988 0.233210 0.222360 0.205730 0.183887 0.114972 0.081318 132 0.000000 0.068014 0.129229 0.108173 0.162992 0.286616 0.248051 0.190621 0.176208 0.106599 0.066601 0.068014 0.129229 0.108173 0.162992 0.286616 0.248051 0.190621 0.176208 0.107542 0.070766 0.083223 0.157788 0.142298 0.199701 0.325753 0.253098 0.195019 0.190810 0.109497 0.071444 0.083529 0.158170 0.142609 0.200165 0.326739 0.254552 0.196198 0.190908 0.109733 0.071671 0.083804 0.158407 0.142713 0.200212 0.326765 0.254651 0.196198 0.190908 0.109782 0.071671 0.083804 0.158407 0.142713 0.200212 0.326765 0.254651 0.196198 0.190908 0.109782 0.000000 0.058192 0.061962 0.087260 0.126724 0.177847 0.167075 0.157888 0.172857 0.126538 0.060728 0.058192 0.061962 0.087260 0.126724 0.177847 0.167075 0.157888 0.172857 0.126538 0.066673 0.075531 0.091114 0.123350 0.163657 0.217404 0.172067 0.184240 0.186152 0.132903 0.068504 0.076853 0.092280 0.124453 0.165032 0.219513 0.173767 0.184389 0.186403 0.135049 0.068754 0.077170 0.092597 0.124622 0.165115 0.219593 0.173885 0.184389 0.186403 0.135049 0.068754 0.077170 0.092597 0.124622 0.165115 0.219593 0.173885 0.184389 0.186403 0.135049 0.000000 0.050353 0.046957 0.101128 0.155453 0.135035 0.166705 0.156599 0.138162 0.133725 0.074434 0.050353 0.046957 0.101128 0.155453 0.135035 0.166705 0.156599 0.138162 0.133725 0.076810 0.066875 0.074410 0.135521 0.192525 0.172302 0.171228 0.181058 0.150824 0.137905 0.078895 0.068217 0.075507 0.136365 0.193682 0.173548 0.172129 0.181220 0.151089 0.138206 0.079171 0.068901 0.075748 0.136459 0.193705 0.173553 0.172129 0.181220 0.151089 0.138206 0.079171 0.068901 0.075748 0.136459 0.193705 0.173553 0.172129 0.181220 0.151089 0.138206 0.138206 0.030179 0.031755 0.066304 0.102913 0.111532 0.135930 0.139770 0.116197 0.070718 0.047585 0.030179 0.031755 0.066304 0.102913 0.111532 0.135930 0.139770 0.116197 0.070718 0.049797 0.039195 0.053889 0.097568 0.137946 0.146238 0.163959 0.161490 0.124380 0.072223 0.050610 0.040171 0.054877 0.098156 0.138637 0.146752 0.164624 0.163051 0.124561 0.073116 0.051210 0.040335 0.055034 0.098221 0.138646 0.146752 0.164624 0.163051 0.124561 0.073116 0.051210 0.040335 0.055034 0.098221 0.138646 0.146752 0.164624 0.163051 0.124561 0.073522 0.07 3522 0.027749 0.042471 0.056170 0.091043 0.105713 0.103315 0.098465 0.100126 0.019830 0.050543 0.027749 0.042471 0.056170 0.091043 0.105713 0.103315 0.098465 0.100126 0.019830 0.051697 0.033674 0.054250 0.076674 0.116309 0.108810 0.122090 0.109019 0.103749 0.021322 0.053206 0.035177 0.056358 0.078152 0.117612 0.110205 0.123631 0.109298 0.104989 0.022231 0.053436 0.035348 0.056565 0.078162 0.117612 0.110205 0.123631 0.109298 0.104989 0.022231 0.053436 0.035431 0.056605 0.07 8162 0.117612 0.110205 0.123631 0.109298 0.105286 0.023150 133 0.095740 0.089848 0.070838 0.050209 0.018396 0.003389 0.003389 0.003389 0.017086 0.035827 0.098392 0.122725 0.108403 0.088621 0.057121 0.008801 0.022183 0.022183 0.020133 0.039454 0.103325 0.141772 0.137085 0.123017 0.093501 0.044051 0.051729 0.026937 0.021189 0.039495 0.103325 0.141772 0.137134 0.123059 0.093502 0.044051 0.051729 0.026937 0.021189 0.039495 0.103325 0.141821 0.137982 0.123279 0.093768 0.044177 0.051987 0.027380 0.021291 0.039613 0.103557 0.142056 0.138078 0.123279 0.093768 0.044177 0.051987 0.027380 0.021332 0.039691 0.00 0.00 0.00 134 APPENDIX B The 10x10 cost values: 1.018643 0.622603 1.223525 1.352929 1.360540 2.568414 1.323577 2.765275 1.609030 2.018555 2.699680 2.035996 0.783962 0.519023 0.431838 1.071347 