A MASSIVELY PARALLEL ARCHITECTURE DESIGN FOR PATH PLANNING APPLICATIONS

By

Lyle Amos Reibling

A DISSERTATION

Submitted to Michigan State University in partial fulfillment of the requirements for the degree of

DOCTOR OF PHILOSOPHY

Department of Computer Science

1992

ABSTRACT

A MASSIVELY PARALLEL ARCHITECTURE DESIGN FOR PATH PLANNING APPLICATIONS

By

Lyle Amos Reibling

This research has investigated computer system design issues for massively parallel architectures applied to path planning in two-dimensional risk fields. Physical phenomena were found to be useful in modeling shortest path planning problems. These phenomena provide analogies in nature which form the basis for the models in path planning. The electrostatic model in field theory describes the distribution of current flow through a nonuniform conducting media, such as a plate of nonhomogeneous resistive material. The paths of current flow are used as the analogy in nature to path planning by associating the resistivity of the media with the cost function of the path planning problem. The distribution of current flow for electrostatic fields is described by the Laplacian partial differential equation.
An artificial neural network was designed to solve the Laplacian equation for nonhomogeneous media and was used in a massively parallel architecture to compute minimal cost and alternative paths. The scalar field for the nonhomogeneous media is found by computing a finite difference approximation to the partial differential equation. The artificial neural network computes this field. Path solutions are computed from a vector field which is orthogonal to equipotential contours of the scalar field. Connectivity of the massively parallel architecture was examined for its influ- ence on the convergence rate of the neural network approximation. Experimental investigation of this technique has shown promising results. The solutions gener- ated by the architectures have been checked against known admissible algorithm results. A simulation of an analog implementation of the artificial neural network with 3,600 neurons resulted in projected real-time convergence of 18 milliseconds. A significant result of this research was the discovery ’of an architecture design paradigm for massively parallel architectures. This paradigm is called natural parallelism. Copyright © by LYLE AMOS REIBLING 1992 DEDICATION To my wife, Anne, and our children: David, Rebekah, Rachel, and Daniel. ACKNOWLEDGEMENTS I am deeply indebted to my advisor, Dr. George Stockman, whose patience with me throughout this endeavor has been beyond measure. His guidance has been most crucial to the successful understanding and completion of this re- search. I also have deep appreciation for Dr. Michael Olinger, who has been both a friend and mentor with his intellectual stimulation and insight into this research. I wish to also extend my thanks to my employer, Smiths Industries Aerospace & Defense Systems, Inc. Through its Doctoral Assistance Program, company re- sources have been provided to support this research. I am indebted to my man- agers for their support and encouragement; in particular, the late Don Gray who encouraged me at the very onset. Others who have been instrumental through the years were Richard Corey and Michael Olinger. Without a doubt, the most critical help I received was from the Chief Librarian at Smiths Industries, Scott Brackett. l have spent many hours with Scott talking about the research, conducting litera- ture searches, reviewing documents, and explaining my investigation. I seriously doubt this research could have been undertaken without the excellent skill and interest of Scott and his library staff. I have been blessed to have close friends who have stood by me and exhorted me to continue under the most trying circumstances. To Bob, Pamela, and Dave, lsaythanks. vi Table of Contents List of Tables xi List of Figures xv List of Symbols xvi 1 Introduction 1 1.1 Nature of the Problem .......................... 1 1.1.1 Search Requirements Make Demands on Architectures . . 2 1.1.2 Massively Parallel Architecture Needs ............. 4 1.1.3 The Architecture Design Challenge .............. 6 1.2 Objective of the Research ........................ 7 1.2.1 Research Motivation ....................... 7 1.2.2 Research Method ........................ 8 1.2.3 Massively Parallel Architecture Approach ........... 8 1.3 Remaining Organization of the Dissertation .............. 9 2 Searching, Parallelism, and the Path Planning Problem 10 2.1 Search Algorithms ............................ 1O vii 2.1.1 Problem Space Representation ................ 2.1.2 Game Tree Search ....................... 
2.1.3 Path Search ........................... 2.1.4 Shortest Path Search ...................... 2.1.5 Simulated Annealing ...................... 2.2 Computer Architectures and Parallelism ............... 2.2.1 Von Neumann Multiprocessing ................ 2.2.2 VLSI Technology and Massive Parallelism .......... 2.3 The Path Planning Application ..................... 2.3.1 Overview of Path Planning ................... 2.3.2 Path Generation by Searching ................. 2.4 Summary ................................. Parallel Architecture Model For Path Planning 3.1 Overview Of An Analogy In Nature ................... 3.2 The Electrostatic Model for Parallel Path Planning .......... 3.2.1 The Mathematical Model .................... 3.2.2 Analysis .............................. 3.3 Parallel Architecture Design For Path Planning ............ 3.3.1 Constructing the Difference Formula ............. 3.3.2 Defining the Neural Network Architecture ........... 3.3.3 Extracting Paths from the Neural Network Architecture . . . 3.3.4 Variable Connectivity for the Architecture ........... viii 12 19 28 34 35 37 41 43 43 44 51 55 56 64 69 71 3.4 Complexity Analysis of the Neural Network .............. 99 3.5 Summary ................................. 100 Path Planning Application Results 102 4.1 Test Case Descriptions ......................... 103 4.2 Electrostatic Model Results ....................... 104 4.2.1 Results Contrasted with Sequential Search .......... 106 4.2.2 Results of the Parallel Search Architecture .......... 107 4.2.3 Multiple Path Results ...................... 116 4.3 Variable Connectivity Results ...................... 125 4.3.1 Taylor Series Template Method ................ 125 4.3.2 Parallel Summation Method .................. 132 4.4 Connection Machine Analog Simulation Results ........... 136 4.5 Timing Results .............................. 146 Discussion of Results 150 5.1 Design for Massively Parallel Architectures .............. 150 5.1.1 The Characteristics of Parallelism ............... 151 5.2 A Massively Parallel Architecture Design Paradigm ......... 153 5.2.1 The Paradigm of Natural Parallelism ............. 153 5.2.2 Understanding the Natural Parallelism Paradigm ...... 155 5.2.3 Using the Natural Parallelism Paradigm ............ 157 5.3 Discussion of Path Planning Architecture Results .......... 159 6 Research Conclusions 6.1 Path Planning Using the Electrostatic Model ............. 6.2 Designing the Neural Network Architecture .............. 6.3 Influence of Variable Connectivity ................... 6.4 The New Paradigm of Natural Parallelism ............... 6.4.1 Natural Parallelism Related to Algorithms ........... 6.4.2 Natural Parallelism Related to Architectures ......... 6.4.3 Natural Parallelism Related to Applications .......... 6.5 Summary ................................. 6.5.1 Suggestions for Further Research ............... A Description of the Test Cases A.1 Test Case A ............................... A.2 Test Case B ............................... A.3 Test Case C ............................... A.4 Test Case D ............................... A.5 Test Cases E, F, and G ......................... Bibliography 160 161 162 164 165 165 166 167 168 169 172 172 175 175 178 178 187 List of Tables 3.1 3.2 4.1 4.2 4.3 4.4 4.5 4.6 4.7 4.8 4.9 Coordinates of the 5-pt. Finite Difference Formula .......... Neural Network Weight Definitions for the 5-pt. Approximation . . . Path Endpoints and Optimal Costs of the Test Cases ........ Test Case Results Using the S-pt. 
Approximation Neural Network . Costs and Headings for 10 Best Path Values of Test Case A Costs and Headings for 10 Best Path Values of Test Case B Costs and Headings for 10 Best Path Values of Test Case C Costs and Headings for 10 Best Path Values of Test Case D Costs and Headings for 10 Best Path Values of Test Case E Costs and Headings for 10 Best Path Values of Test Case F. . . . Costs and Headings for 10 Best Path Values of Test Case G 58 67 103 106 117 119 120 121 122 123 124 4.10 Neural Network Simulation Iterations Using Taylor Series Templates 125 4.11 Least Cost Path Values Using Taylor Series Templates ....... 126 4.12 VAX Timing Results for the Neural Network Simulation ....... 148 4.13 CM-2 Timing Results for the Neural Network Simulation ...... 148 4.14 Analysis of Connections per Second on the VAX .......... 149 A1 Problem Space Size and Path Endpoints ............... 172 xi List of Figures 1.1 Example Cost Function Showing Risk Values Using Contour Lines 3 2.1 Explicit Search Tree Representation Example for a Search Space . 13 2.2 Trajectory Generation Through a Region Partitioned into Grid Cells 39 3.1 Example of Field Lines Used as Multiple Paths ........... 47 3.2 Image Dipoles Resulting from Reflecting Half-Plane A into B . . . . 60 3.3 Mapping the Approximation point 0’ for P into 0 .......... 60 3.4 Image Reflections Surrounding the Path Planning Problem Space 61 3.5 The Unique Reflection Images of the Path Planning Problem Space 61 3.6 Reflection Images Assigned to Quadrants of the Problem Space . 62 3.7 Algorithm to Map Image Indicies into Problem Space Indicies . . . 63 3.8 Neuron Model for the 5-pt. Finite Difference Approximation . . . . 65 3.9 Artificial Neural Network Op-Amp Circuit for 5-pt. Template . . . . 68 3.10 Algorithm to Extract Path Points from the Potential Surface . . . . 70 3.11 Spatial Illustration of the Euler Integration Concept ......... 74 3.12 Parallel Summation Equations Used in Variable Connectivity Analysis 76 3.13 Components of the Parallel Summation Architecture ........ 77 3.14 K=2 Connectivity of the Parallel Summation Equations ....... 80 xii 3.15 K=2 Interconnections for the Parallel Summation Architecture . . 81 3.16 K=3 Connectivity of the Parallel Summation Equations ....... 82 3.17 K=3 Interconnections for the Parallel Summation Architecture . . 83 3.18 K=4 Connectivity of the Parallel Summation Equations ....... 84 3.19 K=4 Interconnections for the Parallel Summation Architecture . . 85 3.20 K= n+1 Connected Grouping of the Parallel Summation Equations 86 3.21 K=n+1 Interconnection for the Parallel Summation Architecture . 87 3.22 Illustration of Backward Difference Parallel Summation Architecture 89 3.23 Illustration of the Central Difference Parallel Summation Architecture 91 3.24 Recursive Expansion of the Parallel Summation Architecture . . . . 92 3.25 Expansion of Central Difference Approximation for Parallel Summation 94 3.26 Two Expansions of the 5-pt. Kernel Approximation Template . . . 97 3.27 Algorithm to Generate the Parallel Summation Neuron Weights . . 98 4.1 Test Case 8 Showing a Cost Function Surface by Contours . . . . 105 4.2 Constrasting Sequential Search Path with Gradient Descent Path . 107 4.3 Multiple Path Results of the Electrostatic Model on Test Case 8 . . 109 4.4 Least Cost Path Value Found During the Neural Network Simulation 110 4.5 Algorithm of the Neural Network Parallel Path Planning Simulation 111 4.6 Convergence of Neural Outputs During Network Simulation . . . . 
112 4.7 Least Cost Path Heading Found During Neural Network Simulation 113 4.8 System Block Diagram of the Electrostatic Model Neural Network. 115 4.9 Block Diagram of a Trajectory Generation System .......... 116 xiii 4.10 Plot of Test Case A Multiple Path Values Ordered by Cost ..... 117 4.11 Plot of Test Case 8 Multiple Path Values Ordered by Cost ..... 119 4.12 Plot of Test Case C Multiple Path Values Ordered by Cost ..... 120 4.13 Plot of Test Case D Multiple Path Values Ordered by Cost ..... 121 4.14 Plot of Test Case E Multiple Path Values Ordered by Cost ..... 122 4.15 Plot of Test Case F Multiple Path Values Ordered by Cost ..... 123 4.16 Plot of Test Case G Multiple Path Values Ordered by Cost ..... 124 4.17 Connectivity Curves for the Least Cost Path Found Using the Taylor Series Templates on Test Case A ................... 129 4.18 Connectivity Curves for the Least Cost Path Found Using the Taylor Series Templates on Test Case 8 ................... 130 4.19 Connectivity Curves for the Least Cost Path Found Using the Taylor Series Templates on Test Case C ................... 131 4.20 Connectivity Curves for the Least Cost Path Found Using the Parallel Summation Templates on Test Case A ................ 133 4.21 Connectivity Curves for the Least Cost Path Found Using the Parallel Summation Templates on Test Case 8 ................ 134 4.22 Connectivity Curves for the Least Cost Path Found Using the Parallel Summation Templates on Test Case C ................ 135 4.23 Operational Amplifier Schematic .................... 137 4.24 Operational Amplifier Block Diagram ................. 137 4.25 Op Amp Digital Simulation Block Diagram .............. 138 4.26 Configuration of an Op Amp Adder with Gain ............ 138 xiv 4.27 Neuron Schematic for 5-pt. Approximation using Op Amps . . . . 140 4.28 Op Amp Simulation Using a 1 Volt Input With a Gain of 2 ..... 143 4.29 Op Amp Simulation Using a 10 Volt Input With Gain of 2 and 15 Volt Limit ................................. 144 4.30 Analog Circuit Simulation of 5-pt. Neural Network on CM-2 Showing Projected Real-Time Convergence ................... 145 5.1 Relationship Among Elements of the Natural Parallelism Paradigm 157 5.2 Applying the Natural Parallelism Paradigm .............. 158 A1 Diagram of the Cost Layout for Test Case A ............. 173 A2 Test Case A Cost Function Contours and Optimal Path ...... 174 A8 Synthetic Terrain Data for Test Case 8 ................ 176 A4 Test Case 8 Cost Function Contours and Optimal Path ...... 177 A5 Test Case C Cost Function Contours and Optimal Path ...... 179 A6 Test Case D Cost Function Contours and Optimal Path ...... 180 A] Diagram of the Cost Layout for Test Case E ............. 181 A8 Test Case E Cost Function Contours and Optimal Path ...... 182 A9 Diagram of the Cost Layout for Test Case F ............. 183 A10 Test Case F Cost Function Contours and Optimal Path ....... 184 A11 Diagram of the Cost Layout for Test Case G ............. 185 A12 Test Case G Cost Function Contours and Optimal Path ...... 186 List of Symbols ¢ - Potential field (scalar) p - Resistivity a' - Conductivity j - Current density field (vector) t - Time - Electric field (vector) - Resistance E R V - Voltage W - Synaptic weight n — Neuron input voltage 1: - Neuron output voltage Chapter 1 Introduction 1.1 Nature of the Problem Many applications can be described as problems in combinatorial search. 
These problems require searching[3, 50, 72] through the states of a problem by tracing paths from an initial state to a final state along the allowable transitions between states. The purpose of the search is to find the best path according to some objective function which is defined over the states of the problem[57]. The objective function defines the criterion for the search. The best path is the one which minimizes the objective function when evaluated over the problem states on the path.

The problem states are typically defined through the use of a cost function. The cost function is a hypersurface in the multidimensional state space of the problem. Figure 1.1 is an example of a problem space showing the risk field of a cost function. The risk is shown by the use of contour lines which describe the elevation of the function's values. The best path is the path P which minimizes the integrated risk along the path between the start and goal states of the problem space. This is defined by

min_P ∫_P c(s) ds        (1.1)

where c(s) is the cost function which defines the risk along some path P which begins at the start state and ends at the goal state of the problem. (A discrete sketch of this criterion is given at the end of Section 1.1.1.) The best path is also shown in Figure 1.1 as a dashed line which is the optimal path between its end points for the cost function.

1.1.1 Search Requirements Make Demands on Architectures

The states of the problem and their transitions can be represented as a directed graph (digraph). As problems grow in size and computational complexity, their associated digraph becomes more complex. This complexity is due to the number of nodes and interconnections between the nodes of the digraph. Very complex problems make severe computing demands for search algorithms on serial processing architectures[53, 54].

A general purpose sequential processor, or even a group of multiprocessors, may not be effective when applying traditional algorithms to these demanding problems. This is because of the enormous number of possible solutions to these problems. The search problem is a very complex issue for algorithm design. In general, the complexity of the search problem is exponential. As will be shown in Chapter 2, the search problem can be represented by a tree. A solution path in the search problem is then found by tracing the paths in the search tree from its root to a terminal leaf node. If there are B successors to a state in the search problem, then the tree has a branching factor of B. Therefore, each new level in the search tree will grow by B new nodes for each node in the previous level of the tree. This growth in the search tree is often referred to as combinatorial explosion. The general search problem with unrestricted state operators will have a worst case path which includes all states of the search space. Thus, the total number of solutions is O(B^n), where n is the number of states in the search problem. This exponential number of solutions to the fundamental search process is why it is an NP-complete problem, even though there are only n states in the search space.

The Von Neumann architecture, a serial processing architecture, requires algorithms which are defined as serial instruction streams. Even multiprocessors, while simultaneously processing in parallel as a group, are processing serially as individual processors[19]. To meet the requirements of larger, more complex problems, the processing architecture and algorithms need to be more closely matched to the nature of the application problem[62].
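To make the path-cost criterion of Equation (1.1) concrete before turning to architectures, here is a minimal sketch assuming a discretized grid of risk values: the integral is approximated by summing sampled cell costs weighted by segment length. The grid, the toy risk field, and the two candidate paths are illustrative assumptions, not data from the test cases described later.

```python
import numpy as np

def path_cost(risk, path):
    """Discrete form of Equation (1.1): approximate the integral of the risk
    field c(s) along a path by summing sampled cell costs times step length."""
    total = 0.0
    for (r0, c0), (r1, c1) in zip(path[:-1], path[1:]):
        step = np.hypot(r1 - r0, c1 - c0)                     # segment length ds
        total += 0.5 * (risk[r0, c0] + risk[r1, c1]) * step   # c(s) sampled at segment ends
    return total

# Toy risk field (assumed values) with a high-risk block, and two candidate paths.
risk = np.ones((10, 10))
risk[3:7, 3:7] = 5.0
through = [(5, c) for c in range(10)]   # cuts straight through the high-risk region
around = [(9, c) for c in range(10)]    # skirts the edge of the field
print(path_cost(risk, through), path_cost(risk, around))     # 25.0 9.0
```

Under this discretization, the best path is simply the candidate whose integrated risk is smallest, which is the quantity the search methods reviewed in Chapter 2 minimize.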
1.1.2 Massively Parallel Architecture Needs Advances in integrated circuit and optical computing technology have made implementation of massively parallel computer architectures feasible. These ar- chitectures are more complex than multiprocessor networks, often inspired by the structure of the brain as in artificial neural networks[31, 41]. Discovering properties of computation in these new architectures is a contemporary issue in computer science. Computation is not specified, or programmed, in the usual manner of 5 a stored program computer; rather, the collection of interconnection strengths between nonlinear parallel processing elements of these networks determine the computation of the processing system[29]. As artificial neural network technology is developed to implement solutions to search problems, obstacles need to be overcome in designing realizable hard- ware. One such obstacle is the need for a method to design the architecture of the artificial neural network. Several questions regarding a neural network need to be answered before an architecture can be defined to solve the problem: How many interconnections are required between the neurons? What pattern of in- terconnection is needed? How does the pattern of interconnections realize the numerical solution of the problem? What should be the definition of the intercon- nection strengths? Does the number of interconnections influence the feasibility of the hardware realization, the convergence and quality of the solution, or the speed of the network? What function does the neuron need to compute? How does the function realize the numerical solution to the problem? While the previous questions describe the architecture needs, they do not ad- dress any process to use in designing architectures. Other questions illustrate needs in the design process. How does the development of a particular archi- tecture relate to the algorithm solution? Is the algorithm needed first, and then the architecture definition follows, or does it make more sense the other way around? Is there an interaction between these needs? Is there some underlying mathematical model of the problem which will solve this algorithm/architecture in- terrelationship? If so, is the mathematical model a special one for each problem, 6 or is there some general principle or model which will solve the problem? How can the mathematical model be mapped into the artificial neural network? What class of problems can the resulting design compute? Questions such as these have motivated this research. The research was directed to answer these questions, although not necessarily in the order given above. Taken as a whole, these questions aid in revealing the issues that need to be understood in developing a design methodology for search architectures using massive parallelism. These design issues are taken up in the subsequent chapters of this dissertation. 1.1.3 The Architecture Design Challenge The traditional approach to algorithm conceptualization is influenced by the designer’s own sequential thinking processes and understanding of the imple- mentation of algorithms on serial processors. Parallel architectures developed from multiprocessor networks have incorporated serial processors[19, 33]. The design challenge of these networks address the control and data partitioning needed to map serial algorithms onto multiple communicating serial processors. This is not an effective method to develop massively parallel computing architec- tures. 
The enormous number of processor nodes and interconnections is beyond the designer’s ability to partition algorithms due to the explosive number of parti- tion combinations. Massive parallelism is characterized by collective computation and as such a new approach is needed for thinking about how the roles of appli- cation, algorithm, and architecture are interrelated in a design. This relationship 7 creates a design challenge for determining interdependence models between the application problem to be solved, the algorithm to solve the problem, and the massively parallel architecture which computes the solution. A simpler paradigm for architecture design must address these three related issues. 1.2 Objective of the Research 1.2.1 Research Motivation The motivation of this research is the investigation of computer architectures and algorithms characterized by massive parallelism which have computational properties that yield the solution to combinatorial search problems[30] with ap- plication to path planning[6, 13, 20, 55]. The central thesis of this research is that a massively parallel computing architecture can be defined in which the com- putational units correspond identically to the explicit problem states and exhibits convergence to values which represent a solution to the particular problem. Ques- tions fundamental to investigation of this central thesis are: 1. What is the theoretical basis for a model of path planning in the central thesis? 2. What is the computational definition of the massively parallel architecture in the central thesis? 3. How does the amount of interconnection among computational units in the massively parallel architecture influence its accuracy and convergence? 4. What is the performance of the massively parallel architecture? 8 5. What, if any, is the fundamental principle which unifies the distinct, yet in- terrelated roles of architecture, algorithm, and application in the definition of the massively parallel architecture of the central thesis? 1 .2.2 Research Method In this research, analogies in nature are used to design massively parallel architectures for path planning. These architectures are analyzed for their effec- tiveness in computing solutions to the search problem. The tool for this research is the simulation of the architecture models. This provided a method to study the topological properties of the architectures for their influence on solution con- vergence and performance characteristics through experimental verification. The solutions generated by the architecture simulations were checked against known admissible algorithm results. 1.2.3 Massively Parallel Architecture Approach There needs to be an understanding of how to effectively determine the archi- tectural definitions of interconnection strengths and processing element functions for performing desired computations. This dissertation describes the problem, and a solution methodology to design massively parallel architectures which use new parallel models for path planning applications. This new approach is called the natural parallelism paradigm. This paradigm uses analogies in nature to develop massively parallel architecture designs. This research has shown that the paradigm is an effective methodology for designing 9 massively parallel architectures to solve combinatorial search problems such as path planning. 1.3 Remaining Organization of the Dissertation Chapter 2 is a review of the literature which is background to this research. 
Tra- ditional serial search algorithms for this problem are discussed. Multiprocessor architectures are discussed and developments in massively parallel architectures, particularly artificial neural networks, are discussed. Finally, the combinatorial search problem used in path planning is described. Chapter 3 and Chapter 4 investigate the development of the parallel architec- ture for path planning. In Chapter 3, the mathematical model of a new parallel approach is developed. The parallel architecture design for the model is also de- scribed. In Chapter 4, the results of the architecture design for the path planning application are presented. In Chapter 5, the key discovery of the new design paradigm of natural par- allelism for massively parallel architectures is described. The natural parallelism concept, motivated from nature, is at the heart of the paradigm to enable map- ping algorithms onto these architectures. The significance of the paradigm is discussed. The path planning results are discussed and its performance is ana- lyzed. Chapter 6 is a summary of the research. The dissertation concludes with recommendations for further research. Chapter 2 Searching, Parallelism, and the Path Planning Problem This research concerns a new search technique and its application to path planning using massively parallel architectures. In this research, architectures are investigated for solving the parallel implementation of path planning. To develop an understanding of the background to this research, the interrelated roles of algorithm, architecture, and application in designing solutions are reviewed as distinct topics. The topics reviewed begin with a general discussion of search algorithms. The search algorithm discussion is be followed by some background discussion on traditional parallelism of multiprocessor computer architectures. Finally, the problem of path planning for aircraft trajectory generation is described. 2.1 Search Algorithms Conventional serial-processing search algorithms have addressed adversarial game tree search, path search, and shortest path search problems. Much re- search regarding combinatorial search algorithms have focused on traditional algorithm development and analysis for sequential processors, or distributed 1O 1 1 multiprocessor architectures. Many problems are solved using search[3, 50]. Winston[72] describes search as exploring alternatives. Such an exploration deals with situations in which choices are made, leading to new situations and more choices. This is a very common scenario in artificial intelligence endeavors. An optimal solution to a problem is obtained when the search results in the best possible answer. But in many cases, the best solution may be too costly or prohibitive to find, so reasonable solutions are suitable alternatives. Good ap- proximations to optimal solutions often serve well as satisfactory results. People are an example of problem solvers which are satisficers, not optimizers. A satis- ficing solution is often more apprOpriate because worst case situations requiring optimal solutions rarely arise in the real world. Among the many uses of search algorithms, two major classes are adversarial board game problems and path search problems. The adversarial scenario looks for solutions leading to a winning outcome for the game. The path search problem finds routes through a problem space between initial conditions and problem solutions. 
This latter search problem can also require finding the best path which meets some defined criterion. To generalize search algorithms, a method is needed to provide a mapping of the particular problem into a uniform description. This description is used by the search algorithm to solve the problem, and is called the problem space representation. 12 2.1.1 Problem Space Representation The problem space representation is often a directed graph. A graph is a set of nodes with a set of edges which connect pairs of nodes. If an edge in the graph is not bidirectional, then it is called an are. A graph which contains arcs is called a directed graph (digraph). When a graph contains an arc from a node n,- to node nj, the node n,- is called a successor of node 11,-, and conversely the parent of n,- is n,. Arcs can be weighted with a value which represents the incremental cost from a node to its successor. In many situations a particular directed graph called a search tree[72] is used. The search process is a traversal of this directed graph where each node of the directed graph is used to represent a state of the problem[63]. The arcs of the digraph define the state transition relationship between the states connected by the arc. The digraph can be represented either explicitly or implicitly. In the former representation, the digraph is defined by enumerating the entire set of states. The arcs of the digraph are shown pointing from one state to another state. In the latter representation, the digraph is defined by establishing an initial set of states and operations which transforms one state into another state. A search tree can be constructed from a digraph by expanding the possible successors to each node of the graph, beginning with some particular node as the root of the tree. Trees are expressed in geneological terms such as parent and child. A node with no parent is the root node. A node with no children is a 13 terminal node. Expanding a node means defining its children. A search algorithm is a procedure for finding a path from the root node of a tree or graph (the start state) to a goal node (leaf). The first consideration of any search algorithm begins with the representation of the search space. This choice is either an explicit representation, or an implicit one. For example, the explicit representation in Figure 2.1 can be represented implicitly as 2 4 5 6 7 8 1O 11 12 13 l 14 15 17 1 3 9 Figure 2.1: Explicit Search Tree Representation Example for a Search Space ROOT=1 SUCC(n) = {mlm = 311— 1,311,311 +1},1Sn g 5 Each node has only a finite number of successors (which excludes an infinite- breadth tree). The depth of the search tree, however, may be infinite. Arcs of the search tree often have costs associated with them. 14 State Spaces Problems may be defined as state spaces. Searching then occurs in an envi- ronment called the problem space, which has a set of states and operators[57]. These states describe the problem. An instance of the problem is defined by including a particular initial and final state with the problem states. Then problem solving becomes a state space search problem [63]. State space search is a prob- lem solving method where the problem is encoded into a state representation, and the problem is to move through various configuration of the state space. This is done by combining the allowable move operations of the problem with search techniques. 
The searching task solves the problem by finding the sequence of operators which converts the initial state into the final state. This sequence is the problem solution, and is described as a path through the problem space from the initial state to the final state. In problem spaces which correspond to physical space, the resulting path represents a physical route. 2.1.2 Game Tree Search An example of searching as exploration is common in board games. Adver- sarial choices are made to lead to winning (and losing) situations by the players. Some of the search algorithms used for this kind of problem are minimax and alpha-beta pruning. The search tree has been used in computer programs as the data structure for modeling chess and other games. In adversarial board games the search tree 15 nodes correspond to the board configurations[72]. These board configurations define the game state. This tree is used to represent the possible continuation of the game from the current situation of the game via the moves of the game. The branches of the search tree transform the situation which corresponds to moves of the games. Since the tree represents the possible game continuation from the current situation, it is called a Iookahead tree. The root of the tree is the current game configuration of the game pieces on the board. The successor nodes represent the new game configurations resulting from the choice of a single legal move of the game. The leaves of the tree represent the possible game configurations after a number of moves which correspond to the height of the tree. Alpha nodes of the tree correspond to board configurations from which the program (the maximizer) selects the next move, and beta nodes correspond to the board configurations from which the opponent (the minimizer) selects a move. Each tree level thus alternatively consists of alpha and beta moves, with the root being an alpha node. The leaves of the search tree are examined to determine a value of the re- sulting board configuration, since the leaves of the tree may not represent game termination. This value of the leaf is used as a numerical estimate of the relative strengths for the players in the leaf’s board configuration. An analyzer converts the board configuration judgments into a single number for the quality of the situ- ation. This is called the static evaluation score. A positive score favors one player called the maximizer, and a negative score favors the minimizing player. Evenly 16 matched board configurations have a value of zero. This numerical value of the static evaluation score is calculated using a static evaluation function (SEF). The minimax backup procedure is used to propagate these values up to the root of the tree. For an alpha node, the greatest value in its successor nodes is selected, which represents the program making that move because it will be the best new board position. The least value is selected for the beta nodes, because this represents the move of the opponent, who is assumed to be hostile and perfect and, therefore, will make moves which are the worst new board position for the program. Thus, alpha node values are the maximum of its successors’ values and beta node values are the minimum of its successors’ values. The value at the root node indicates the maximum value available for the search tree. Alpha-beta search is a minimax procedure for pruning the growth of the game tree[72]. 
It generates the successor nodes for the limited depth search tree in a depth-first order, and combines the development of the Iookahead tree and the leaf node evaluations with the minimax backup procedure. This allows the omis- sion of the development of subtrees by comparing the backed-up values of nodes and cutting off their generation when they will not contribute to improvement in the minimax values. 2.1 .3 Path Search Another use of search for exploring alternatives is finding the path from a start node to a goal node in a problem space. A path from node 11,, to node as. is a sequence of nodes as. ,n,,,. . . ,n,,_, where every node “i,- is the successor of the 17 node n,-5_,, for j e [2, k]. A node n5 is a descendant of the node n,- if there exists a path from n,- to n,. When the length (or cost) of the path is not important, simple procedures such as depth-first search, breadth-first search, hill climbing, beam search, and best- first search will suffice to find a solution path[72]. These procedures are easy to implement, but usually find only sub-optimal paths. In depth-first search, one alternative is selected at every node. The search then works forward from the selected node. The remaining alternatives are ignored at first. As needed, the alternatives are selected in the order they were generated from this parent node expansion. The algorithm begins by selecting the root node. A successor of the root node is then selected. If the successor meets the goal condition, the algorithm stops. Otherwise a successor of the successor is selected, and the evaluation is continued. If there are no further successors, the previous available successor IS selected and the search continues. The algorithm is a very intuitive approach to searching. It is good for small trees with many goal nodes. However, it is not guaranteed to find an optimal solution path, even if one exists. I-lill climbing uses a heuristic measure of the distance to a goal. The most promising alternatives are used first in a depth-first search fashion. The child nodes must be sorted to determine the best alternative. Three problems occur with the hill climbing approach. The foothill problem occurs when there is a sec- ondary peak in the problem giving locally optimum results. The plateau problem occurs in large flat regions because no local improvement occurs. The ridge 18 problem occurs where none of the directions show improvement and requires more alternatives to detect the local improvement. Breadth-first search uniformly pushes into the search tree by levels. In breadth- first search, each successor of the selected node is examined against the goal condition. If a successor meets the condition, then the algorithm stops. Oth- erwise, all successors of the previous successor are examined against the goal condition, and the algorithm continues. This algorithm is very easy to implement. If a solution exists, it is guaranteed to find it. Also, the solution will have a minimum path length (tree height, but not necessarily minimum cost). Beam search is like breadth-first search by level. It is restricted to move down by only the to best nodes, where w is the width of the beam. Best-first search extends the best partial path [72] by expanding the best open node. However, this requires sorting all of the open nodes. Breadth-first search, depth-first search, and best-first search can be general- ized to state space search. 
The states which are contained in the frontier of the search tree (the leaf nodes) are elements of the set called OPEN, and the interior nodes of the search tree are elements of the set called CLOSED. Consider an OPEN set which contains the search frontier as a specific data structure in the context of the following state-space search algorithm: 1. Put ROOT in OPEN 2. If OPEN = , then STOP-FAILURE 3. Remove the “best” state, 71., from OPEN, and place into CLOSED. If n e 19 GOAL, STOP-SUCCESS 4. Generate successors of n and place them on OPEN 5. Go to STEP 2 If the OPEN set is implemented as a stack data structure, the above state- space search algorithm is depth-first search. Implementing OPEN as a queue results in the breadth-first search algorithm. In step 3 the removal of the “best” state n is accomplished by the remove function for the stack or queue. In order to actually remove the “best” state as in best-first search, the contents of the queue need to be ordered by some evaluation function. Thus, if the data structure is implemented as a prioritized queue ordered by the evaluation function, the resulting algorithm is best-first search. Each algorithm has its advantages in certain situations[72]. Depth-first search is good when blind alleys are not too deep. Breadth-first search is good with few alternatives (small branching factor). Hill climbing is good when there is a natural measure for the goal distance, and the likely choice is a good one. Beam search is good when a natural measure exists and the likely partial paths are shallow. Best-first is good when there is a natural measure and good paths look bad at shallow levels. 2.1.4 Shortest Path Search A third class of search algorithms are used when the best path is needed. These algorithms are more complicated, because the cost of traversing the path 20 is important for determining the best path[72]. Examples of algorithms used for shortest path search are British Museum search, branch and bound search, discrete dynamic programming, A* search, and simulated annealing. The British Museum procedure is to find all possible paths, and then select the best one. This is like breadth or depth-first search, but without stopping on a goal. Conceptually, it is a very simple approach. But it obviously suffers from combinatorial explosion, and is not very useful for other than simple problems. Branch and bound search maintains a set of partially developed paths. It proceeds by extending the shortest incomplete path. It terminates when the shortest incomplete path is longer than the shortest complete path. Branch and bound is good for big search trees, and when bad paths turn very bad fast (at shallow levels) [72]. An improvement in these algorithms is to discard redundant paths. This is done by deleting paths to common nodes which have a larger cost than the mini- mum path to the same node. This technique underlies the dynamic programming principle[72]. In looking for the best path from start to goal, all paths from the start to any intermediate node except the minimum can be ignored. Bellman[4] introduced the dynamic programming concept for decision making processes. Dreyfus and Law[14] have extended this dynamic programming principle of op- timality to unconstrained shortest path search: The minimum cost path from A to B has the property that, whatever the initial cost at A, the remaining path to B, starting from the next node after A, must be the minimum cost path from that node to B. 
Dynamic programming is good when there are many paths to common nodes[72], such as with problem graphs defined on grids, where each grid point is a state node in the problem space.

Underestimates can be used to improve the efficiency of these search procedures by guessing the distance remaining to the goal. Several search ideas come together in the A* search algorithm. It is branch and bound search together with a distance estimate and the dynamic programming principle[72].

Additional notation is required for the following discussion of the A* search algorithm. Functions are defined that describe costs associated with the paths. When it is desirable to assign a cost to an arc to represent the cost of applying the transition represented by the arc, the notation c(n_i, n_j) is used to represent the cost of the arc from n_i to n_j. (Costs may instead be associated with the nodes. In such a case, the cost c(n_i) at node n_i can be assigned to the arc to each successor node n_j of n_i, so that c(n_i, n_j) = c(n_i) for all n_j ∈ SUCCESSOR(n_i).) The least path cost from the start node, s, to any arbitrary node, n, is denoted as g*(n). A path from s to n which has this cost is an optimal path from s to n. The least cost from any arbitrary node n to some goal node g is denoted as h*(n). The function

f*(n) = g*(n) + h*(n)        (2.1)

then represents the least cost from the start node s to the goal node g, with the path constrained to include the node n. This path through n, however, may not necessarily be the minimum cost path from s to g. In the following algorithms leading to A*, choices are made about which nodes to expand based upon some estimate of the "worth" of the node. g(n) is used to estimate g*(n) and h(n) is used as an estimate of h*(n). Thus, the evaluation function used in the choice of the next node to expand is

f(n) = g(n) + h(n)        (2.2)

Every node n which has been reached during the search belongs to one of two sets. n ∈ CLOSED if n has been examined and is no longer in the search frontier. n ∈ OPEN if n is a candidate for immediate exploration due to being in the search frontier.

A* is a graph search with knowledge (heuristic estimate). In A*, h(n) is an underestimate of the cost from n to a goal node. Rather than optimizing g(n), A* will optimize the function f(n) = g(n) + h(n); for any node n which is on the optimal path, A* will optimize g(n). A good estimate, h, is a prerequisite for a good A*. Goodness is dependent upon the admissibility of the estimate and efficiency in evaluating the estimate. The A* search algorithm is as follows:

1. Initialize by placing ROOT in OPEN, g(ROOT) = 0, calculate h(ROOT), and f(ROOT) = h(ROOT).

2. Select N such that f(N) = min{f(n) | n ∈ OPEN}. Test if N is a goal. If yes, then STOP-SUCCESS with PATH(ROOT, N) the minimum cost solution, cost = g(N). If OPEN = ∅, STOP-FAILURE. Test if there are any more nodes N such that f(N) = min{f(n) | n ∈ OPEN}. If there are, check to determine if any of the nodes satisfy the goal conditions and STOP-SUCCESS; otherwise choose one randomly to be the next N.
They provide a background of the search problem and historical approaches to finding shortest paths. These algorithms were all deterministic. In the next section, a non-deterministic procedure is described which has been used for shortest path search. 2.1.5 Simulated Annealing Shortest path searching can be considered a combinatorial optimization pro- cedure. In combinatorial Optimization, the function to minimize is called the ob- jective function[56]. The constraints on the optimization are the set of feasible options in the solution to the problem. A problem is combinatorial when the configuration elements are finite, or a countably infinite set. The annealing concept has been posed as an applicable general routine to use on combinatorial optimization problems[56]. Annealing is a physical process 24 in condensed matter physics[70]. A solid is placed in a heat bath and heated to increase its temperature to a maximum value where its particles are then ran- domly arranged into a liquid. The liquid is then cooled by slowly lowering the temperature of the heat bath. As it cools, the quuid’s particles are arranged into low energy ground states of a corresponding lattice. This process requires a sufficiently high temperature and slow cooling. Thermal equilibrium is reached at each temperature with the energy states arranged according to a Boltzmann distribution. As the temperature decreases, this distribution concentrates on the low energy states. In simulated annealing[35], minimizing the cost function of an optimization problem is analogous to the slow cooling of a solid reaching a low energy state[70]. The cost function is substituted for energy and executing the Metropolis algorithm is analogous to the slowly decreasing temperature. Simulated annealing is also known as probabilistic hill climbing[70]. The Metropolis algorithm is a simulated evolution to thermal equilibrium of a solid for a given fixed temperature, T. It is a Monte Carlo method which generates a state sequence of a solid. A small random perturbation of a particle is applied. This new state results in a difference in the system’s energy, AE. If AE < 0, the process is continued for lower energy states. If AB 2 0, the new state is accepted with the probability e-AEN‘BT. This acceptance rule is called the Metropolis criterion[70]. To use this on a combinatorial optimization problem, a sequence of config- urations of the problem space are generated. These configurations are like the 25 states of the solid. The cost function is like energy, and the control parameter is like temperature. The simulated annealing algorithm is a sequence of runs of the Metropolis algorithm which is evaluated at sequences of decreasing values of the control parameter. To use this approach, a unique representation of the state configurations needs to be defined. This representation must proscribe a procedure for mea- suring the quality of a configuration, and there must be a neighbor relation for the configurations. A difficulty with simulated annealing is that there is no definitive answer for how to formulate and control the annealing process. This review has provided a look at several sequential path searching algo- rithms and procedures which have been developed for execution on sequential processors. Because of their intended use, they have an attribute which is not only sequential but of singular activity. 
This attribute can be seen in cognitive psychology terms as the focus of attention and the serialization of the human thought process in problem solving. The tie between the serial thought process and resulting serial processing designs must be acknowledged. New massively parallel architecture design approaches will be more viable if they recognize this traditional relationship in existing architectures. 2.2 Computer Architectures and Parallelism Parallel processing is a system consisting of at least two processing elements which are capable of simultaneous operation[19]. Multiprocessing has focused on integrating multiple functional units into a parallel processing system. Central 26 processing unit concurrency is obtained by either replicating functional units for concurrent operations on different problems as in a multiprocessor, or by pipelin- ing, which performs operations incrementally on a stream of data at stations within the pipeline. Using a definition of processing as the application of functions to data, or oper- ations performed upon data (memory), the distribution of processing means that this processing is applied to multiple data simultaneously (either synchronously or asynchronously) in a non-serial fashion. This is to be distinguished from cen- tralized processing where all data operations occur in a single processor, which results in serialized computation. In their definitions for parallel computer structures, Hwang and Briggs[33] consider different configurations of parallel computers. They relate these config- urations to a type of parallelism which results from the architecture. Temporal parallelism results from pipelined computers by the use of overlapped computa- tions. Spatial parallelism is achieved from multiple synchronized arithmetic logic units in array processors. And asynchronous parallelism is achieved in multipro- cessor systems by a set of interactive processors with shared resources such as memories, databases, etc. 2.2.1 Von Neumann Multiprocessing The Von Neumann machine is a uniprocessor which had the same basic struc- ture for all early stored program machines[19]. The prevalent structure through 1958 consisted of four main functional units. These units were input/output, 27 memory, control, and the arithmetic/logic unit (ALU). The weakness of this ar- chitecture is routing all of the I/O through the ALU. This made for high hardware costs because of sharing costly buffer registers. In addition, performance was degraded because no processing occurs during l/O. In the mid 1950’s direct memory access (DMA) became available. The data flow paths were altered to allow direct memory access for l/O. The control unit still was responsible for l/O operations, however, and some degraded performance continued during l/O. l/O channels then became available. These independent channels separated the ac- cess paths to and from memory, and separated control for the HO operations. Essentially, the l/O channel became a small computer which operated simultane- ously (in parallel) with the main processor. The ALU operates free from l/O until interrupted. A multiprocessor system requires at least two CPU’s. The main processor memory is shared and accessible by all of the processors. l/O is sharable, and there is a single integrated operating system which controls the hardware and software of the multiprocessor. It has benefits of reliability because of the multi- ple units which may be reconfigurable. 
System performance is improved through the parallel execution of independent tasks. Distribution of memory does not con- strain distributed processing which occurs locally. However, as local processing yields results which are shared with other remote processing units, the transmit- tal time effect of this sharing of data may become evident as a constraint on the global processing behavior. The system hardware can be organized in several ways. Single access port 28 units can be time-shared or placed on a common bus. There is no direct connection between function units, but rather a parallel connection to the bus. With a crossbar switch matrix any memory module can be connected to a pro- cessor or l/O unit. This is space-division multiplexing. When memory has more than a single access port, a multibus or multiport connection architecture is pos- sible. However, this requires additional conflict resolution logic to arbitrate si- multaneous requests for memory access. Pipeline systems perform the same operation repeatedly on several items in a single data stream. Array or vector processors perform the same operation on large collections of data which are related concurrently. Coupled systems are connected electrically and share common hardware re- sources. lndirectly coupled systems have no interaction between their programs except at the data set level such as is implemented by a mailbox. A directly cou- pled system shares addressable storage with high speed channel connections. 2.2.2 VLSI Technology and Massive Parallelism Very large scale integration (VLSI) architectures are a dominant new technol- ogy for computer hardware construction [52]. VLSI offers more promising perfor- mance than that of conventional computing hardware[47]. Computer structures can be built by exploiting the properties of VLSI. In 1978, Kung and Leiserson[36] introduced the computational architecture known as the systolic array[24]. The systolic “array” is derived from its similarity to a grid, with each point of the grid as a processor, and the links of the grid consisting of the links between the 29 processors. This array characteristic of the architecture capitalizes on regular and modular structures which match the computational requirements of the problems they are intended to solve. Many classical sequential devices fail to yield results in reasonable time. The technology for faster sequential devices now appears to be approaching limits on speed by the laws of physics[18]. The great concurrency of processing elements and memory can be easily implemented in VLSI at many processing sites on a chip. An example is custom VLSI machines for massively parallel Al architectures. Parallel computers with 100,000 to 1,000,000 gates per chip make it feasible to fabricate massively parallel computers. Fahlman & Hinton[21] explain that exploring the use of massively parallel ar- chitectures is an attempt to get around the limitations of conventional symbolic processing. In the massively parallel architecture, the system’s collection of per- manent knowledge is stored as patterns of interconnection strengths among pro- cessing elements. The knowledge directly determines how the processing ele- ments interact rather than residing passively in memory, awaiting examination by a central processing unit. Comparatively slow (on the order of milliseconds) neuronal computing ele- ments with complex parallel connections give rise to complex, parallel processing with fast processing effects. 
Time is often the critical resource in many compu- tational problems. Neurons with a computational speed of a few milliseconds can account for complex behavior in a few hundred milliseconds. Entire complex behaviors are accomplished in a few hundred time steps of this computational 30 system. The fundamental premise of massive parallelism is that individual neurons do not transmit large amounts of symbolic information, but rather compute by appropriate connections to large numbers of similar computational units. For a given massively parallel network defined by its interconnection topology and individual procedures, there is a need to characterize the class of functions which are computable by the device, and how the functions are computed[18]. This analysis is needed so that these devices can be designed and constructed to provide these functions. A theoretical basis is needed for models of computation in massive parallelism. Neural networks[16, 17, 41] are a new computer architecture based on the massive parallelism of simple computational units (such as the neurons of the brain). These networks are based upon an abstraction of the understanding of the information processing properties of neurons. The development of neural network technology began in 1943 with the work of McCulloch and Pitts[46]. In their work, they showed how to use the biological neuron as a computational model to compute boolean operations. This type of computing is an alternative to the conventional Von Neumann architecture. A distinctive feature of neural net- works is their massive parallelism. These networks have high speed and inherent fault tolerance. Artificial neural networks are a computational paradigm which is motivated by a model of the biological neuron. Hopfield [29] states that Parallel analog computation in a network of neurons is thus a natural way to organize a nervous system to solve optimization problems. 31 In developing a massively parallel architecture, the individual computational unit must be defined. This standard model of a processing unit loosely corre- sponds to the information processing model of a neuron. For an artificial neural network to be useful, the network of units requires a means to make decisions or to perform some coherent action. This requirement implies that these networks need to converge to stable states to effect decision making and coherent actions. The architecture of neural networks is simple processors with a dense ar- rangement of interconnections. The processing power of a neural network is measured in terms of the number of updates per second of the interconnections, rather than instructions per second for Von Neumann machines. This measure is defined as the connections per second (CPS) of the neural network. Neu- ral networks are characterized by having more interconnections than processing elements[12]. This is in contrast to more traditional parallel processing architec- tures which have a lower ratio and incorporate Von Neumann machines as the processing elements. One problem is how to orchestrate the mapping of the designed architecture [2]. This includes the problem of the global control of local elements. How are the local computational rules changed to implement a general algorithm? This pro- cessing is coherent and parallel, and must be robust by not exhibiting sensitivity to the failure of a few individual nodes. Interaction of the devices is also an im- portant consideration. 
The topology of concurrent systems require locality prop- erties to reduce the interference from too much communication[47]. The foun- dations of the topology are found in the interconnections for processing. Wholly 32 interconnected arrays are not a viable general VLSI circuit design[2]. Techniques are needed which seek primarily local interconnections. The Hopfield neural network can be used as an associative memory and also to compute solutions to optimization problems[30]. It fostered the rebirth of in- terest in neural networks[12]. Each neuron in the Hopfield neural network is connected to every other neuron except for itself. The connection strengths are symmetric and provide for convergence to stable states of the network. The ob- jective function for the optimization problem is the energy function of the network. As the network is updated, its energy is reduced. This neural network has limita- tions, however. It gives limited good answers. For example, the network doesn’t work well above 10 cities in the traveling salesman problem (T SP)[12]. It has the advantage, however, that it is an inherently parallel architecture. A solution to an optimization problem can be mapped onto a Hopfield network by generating an appropriate quadratic energy function which describes the constraints of the problem. The network connections are derived from an energy function that produces the optimal solution at the lowest energy state of the network. The Hopfield network often fails to find valid solutions and is sensitive to parameterization and initial conditions. A procedure to map the problem onto the Hopfield network requires the fol- lowing steps[42]: 1. Select an encoding of the problem so the outputs of the neurons correspond 33 to the solution to the problem. 2. Define an energy function whose minimum corresponds to the optimal so- lution of the problem. 3. Derive connection weights and bias currents which represent the objective function and constraints of the problem. 4. Choose initial values for the neuron outputs so the network converges to a stable state which is a feasible solution to the problem. An advantage of the Hopfield network is that it is relatively simple to implement as an analog circuit. One difficulty though, is the physical implementation of the high gain limit. Also, a globally optimal solution can only be obtained from the network when the conductances are infinite[43]. It is difficult to define an appro- priate energy function which corresponds to the desired optimization problem. Some convergence states are local minima which represent infeasible solutions to the problem. The network is also very sensitive to parameterization and initial conditions. Constrained optimization problems can be mapped onto neural networks by using parameterized penalty terms for the violation of constraints[59]. The quality of the solution depends on the selection of the network parameters in the energy function of Hopfield. Assigning values to the cost function parameters is crucial to the kinds of solutions the network will be able to find. Network parameters are needed to get sensible solutions, and determining their value depends on how to select them, and how to tune them. A linear relationship has been found between 34 some of the parameters of the Hopfield TSP cost function[27]. Inhibition between the neurons increased for feasible solutions. The global bias to insure complete tours must also increase to stay in the feasible solution region. 
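To make the mapping procedure above concrete, the following is a minimal sketch (not the formulation used in this research) of steps 1 through 4 for a toy constraint: each row of a small binary output matrix should contain exactly one active unit. A quadratic penalty energy is expanded and matched against the Hopfield energy $E = -\frac{1}{2}\sum_{i\neq j} W_{ij}v_iv_j - \sum_i b_iv_i$, which fixes the connection weights and bias currents. The penalty coefficient A, the 0/1 threshold units, and the random asynchronous update schedule are illustrative assumptions.

    import numpy as np

    rng = np.random.default_rng(0)
    A = 1.0                                  # penalty strength (illustrative choice)
    rows, cols = 2, 2
    n = rows * cols                          # neurons indexed as r * cols + c

    # Steps 1-3: expand E = A * sum_r (sum_c v[r,c] - 1)^2 for binary v and match it
    # against E = -1/2 sum_{i!=j} W_ij v_i v_j - sum_i b_i v_i to read off W and b.
    W = np.zeros((n, n))
    b = np.full(n, A)                        # linear term of the expanded penalty
    for r in range(rows):
        for c1 in range(cols):
            for c2 in range(cols):
                if c1 != c2:
                    W[r * cols + c1, r * cols + c2] = -2.0 * A   # in-row mutual inhibition

    # Step 4: start from an arbitrary state and apply asynchronous threshold updates.
    v = rng.integers(0, 2, size=n).astype(float)
    for _ in range(100):
        i = rng.integers(n)
        v[i] = 1.0 if W[i] @ v + b[i] > 0 else 0.0

    print(v.reshape(rows, cols))             # stable states have exactly one active unit per row

For this example constraint the derived weights are purely inhibitory within a row, which mirrors the observation above that inhibition between neurons is what enforces feasible solutions.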
As the size of the problem (number of cities) increases, this feasible region narrows. This suggests that the network does not scale up, but rather can find fewer feasible solutions as the network grows[27]. The three most important aspects of artificial neural network research are algo- rithms, application, and architecture[32]. Algorithm mapping technology needs to be used along with CAD technology to derive VLSI arrays from neural network algorithms. 2.3 The Path Planning Application One application which can benefit from the development of search techniques using massively parallel architectures is path planning for aircraft or mobile robots. A very hard problem in this path planning application is trajectory generation, which requires the computation of the desired trajectory in an embedded real-time system. The tremendous computational demands for trajectory generation[54] make it a good candidate application for this research. The optimization tech- nique which has been extensively used in aircraft trajectory generation systems is dynamic programming. 35 2.3.1 Overview of Path Planning Three functions of a flight management system are trajectory generation from the mission plan[55], trajectory optimization for control objectives[20], and tra- jectory control[53]. Mission plans are defined by waypoint position definitions. Trajectory generation[6] attempts to maximize survivability by optimizing threat penetration paths for terrain masking[37]. A terrain and obstacle avoidance path planner called AUTOPATH was de- scribed in [13]. This algorithm is an optimization based technique to generate global flight trajectories. The global flight path incorporates terrain and obstacle avoidance instead of using local avoidance procedures to a preplanned global path. AUTOPATH is an example of a path planner which has several uses. One use is in mission planning. Mission planning uses apriori threat information and computes a baseline mission profile for the aircraft to use during flight. This pro- file is computed on the ground before the flight using general purpose machines with very good performance. A second use of path planning is to analyze the effectiveness of this baseline mission profile prior to flight. Effectiveness means the objectives of the mission can be met and the aircraft can be returned to base intact and undamaged. Mission effectiveness is a qualitative judgement which is often based upon a value which rates a flight profile. This value is usually a nu- merical measure defined as a probability of survival along the mission profile and considers the total threat exposure of the aircraft during the mission. The third use of path planning is the real-time onboard updating to the mission plan. This 36 occurs in-flight while the mission is underway and uses new threat information which is discovered by the sensor suite of the aircraft. Manual flight trajectory generation is accomplished by inserting waypoints be- tween which a smoothed flight path is created. To generate avoidance paths, additional waypoints are inserted in such a manner that the resulting smoothed flight path will deviate around the area to avoid. This technique works for threat areas which are well defined and have good separation. In addition, robots, drones, remotely-piloted vehicles, and particularly autonomous vehicles cannot use manual waypoints. Most threat scenarios involve regions where the threats are arranged in a very dense coverage with much overlap. 
Since there is no isolation between the threats within these regions, this manual type of avoidance trajectory generation is eliminated. What is required is a mathematical optimiza- tion process which generates trajectories with minimal threat exposure for the non-uniform continuum of threats which are associated with the dense threat coverage. The path planning problem consists of a large problem space, typically with several state variables. Path planning requires finding a path out of all the pos- sible paths in a multidimensional search space which is a path with the optimal performance measure. Due to its large search space, the path planning problem places severe demands on computational resources[54]. These problems often require real-time solutions unavailable from current technology. System response time depends on the throughput, memory, and interfacing capacities of the computer or computer network used for airborne mechanization 37 of trajectory generation. This problem is difficult to solve in a time expedient fash- ion using the currently available airborne processing technology, due to the mas- sive computational throughput requirement associated with an extensive search space. 2.3.2 Path Generation by Searching Aircraft trajectory optimization is an application of path planning by searching through a multidimensional search space for some path which has the optimal performance measure through the search space. Optimization techniques to compute minimum danger paths for mission segments involving enemy threats have been developed[54]. Survivability is an example of a performance measure of mission effectiveness. It is quantitatively defined in a measure called danger by accounting for the threat exposure of the aircraft during an entire mission. This includes threat lethality and terrain masking advantages of the mission profile. Lethality is taken to be the time derivative of the probability of kill PK. Aircraft speed effects are implicitly included in this measure since it is a danger rate. The quantitative measure of performance is usually expressed as a probability of survival P, at all points along the aircraft trajectory. This performance measure attempts to capture a numerical representation of the probability of successfully completing the mission without incurring damage or destruction from the threats. Survival is improved by finding a trajectory through the threat region for which the performance measure is maximized. Rather than maximizing the probability of survival performance measure, the problem is converted to the minimization of a 38 performance measure which is a monotone decreasing function of the negative logarithm of the probability of survival: D = .7-‘(— log P.) (2.3) This function computes a danger measure based upon the survivability in a dense threat environment taking into account the threat locations and lethality, aircraft exposure due to altitude, and masking effects from the terrain. This metric is used as the cost function for the search space by assigning a danger value to every state in the search space. Danger for any path is by definition: Dpath = (1D (2.4) path d0 d0 dt — [path—(E 8 _ path E5 8 (2'5) 1') . = —d = dt 2.6 [path V 8 path D ( ) The threat penetration problem was formulated as an optimization problem where a performance measure which is a function of the aircraft trajectory was defined. 
The threat penetration trajectory generation algorithm computes the tra- jectory which minimizes danger as the following performance measure (J): J(path) = [pm 1') dt (2.7) D is the danger index function representing the risk per unit time imposed on the aircraft by the threat environment as a function of horizontal position, altitude, and heading. Dynamic programming [5] is the mathematical technique used for deter- mining the trajectory which minimizes the performance measure by performing 39 a search of potential trajectories using a recursive formulation. By accumulat- ing transition costs over the sequence of node transitions making up a particular trajectory, the performance measure (J) for the trajectory is determined, i.e., J = 215,44 (2.8) AUTOPATH is a dynamic programming approach in which the performance measure for the mathematical optimization is defined as a military gain[13]. This gain is a usually a weighted sum of the probability of successfully completing a mission, and the probability of surviving the mission. It is implemented by partitioning the threat coverage region of space through which the aircraft flies into an array of small cells. These cells usually cover an area of one to four nautical miles (see Figure 2.2). The center of these cells are the coordinates of a TRAJECTORY Figure 2.2: Trajectory Generation Through a Region Partitioned into Grid Cells 4O multidimensional grid. Each cell has an associated vector of values which reflect the probability of survival based upon the threat location and models of lethality. The overall survivability of a path through the cells is the product along the path of the probability of survival through each cell[13]. A path is defined with a start cell, an end cell, and a contiguous set of intermediate cells from the start to end cell. The performance measure is implemented by setting the cell value to the negative logarithm of the probability of survival. There is a benefit in trans- forming the measure with this implementation. It is more desirable to work with summation than multiplication since it is a simpler computation. The logarithm accomplishes this, although it must be recognized that this transforms the prob- lem from maximizing products to minimizing sums. The negative logarithm of the probability of survival is proportional to the probability of kill. This is, therefore, a simpler computation which is equivalent to minimizing the probability of kill (not surviving). The generated mission flight trajectories are to minimize pilot risk (the inte- grated danger) due to the threats over the set of possible trajectories defined on a geographical grid of the mission[54]. Discrete dynamic programming is the optimization technique which has been used to compute trajectories on this grid. The state space of the trajectory generation problem is a discretized lattice of nodes. The trajectory is the successive transitions between adjacent nodes of the grid. The transition cost is the value of the danger at the node. The computing grid for this research is defined as an orthogonally connected mesh-connected processor array[47]. 41 2.4 Summary The algorithms described in this chapter are several of the traditional ap- proaches to solving problems using search. All of these approaches have cen- tered on algorithms for implementation on serial processors. Some research has addressed converting these algorithms into parallelized versions for execution on multiprocessors[1, 26, 34, 40]. 
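Returning to the grid-cell formulation above, a small sketch (illustrative values only, not AUTOPATH data) shows why the negative-logarithm transformation is convenient: per-cell costs of $-\log P_s$ add along a path, and the accumulated sum equals the negative logarithm of the product of the per-cell survival probabilities.

    import numpy as np

    p_survive = np.array([[0.99, 0.95, 0.60],
                          [0.98, 0.70, 0.90],
                          [0.99, 0.97, 0.99]])      # per-cell survival probabilities (invented)
    cost = -np.log(p_survive)                       # cell value = -log(P_s), so costs add along a path

    path = [(0, 0), (1, 0), (2, 0), (2, 1), (2, 2)] # one candidate path through the grid
    path_cost = sum(cost[r, c] for r, c in path)
    p_path = np.prod([p_survive[r, c] for r, c in path])

    print(path_cost, -np.log(p_path))               # identical up to floating-point rounding

Minimizing the summed cost over paths is therefore equivalent to maximizing the product of survival probabilities, which is the simplification the transformation is meant to provide.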
Those approaches also begin with the serial perspective on algorithm design. This research will show a unique approach to searching that does not use a serial viewpoint, but rather a truly parallel model which encompasses the problem space representation as well as the algorithmic model. VLSI offers opportunities to implement feasible, large scale neural networks to achieve massive parallelism. Optimization problems, such as shortest path search, can be mapped onto neural networks. However, the Hopfield network approach suffers from several difficulties. It is difficult to encode problems using the energy function minimization approach. Several parameters are required and little is known on how to specify their value. Evidence indicates that the network does not scale well for larger problems. Later chapters will describe a neural network which avoids these problems, since it is based on existing numerical analysis and has a good mathematical basis. Trajectory generation for aircraft is a well studied application of path planning search. It has been mapped into a multidimensional grid representation with each cell of the grid containing the cost of flying through that cell. The trajectory 42 optimization problem then is to find the minimum cost path through the grid by searching. This application has been studied using discrete dynamic program- ming. In later chapters, this application will be used to illustrate results obtained from the massively parallel neural network architecture design for trajectory gen- eration. Chapter 3 Parallel Architecture Model For Path Planning This chapter introduces a new parallel model for the shortest path problem[62]. The natural phenomenon of electrostatic fields are shown to have a similarity to path planning with multiple, alternative solution paths. Analysis of the electrostatic model results in a partial differential equation and its boundary conditions which correspond to the path planning problem requirements. A finite difference approx- imation for computing the numerical solution to the partial differential equation is also derived. This finite difference approximation is the basis for a new neural network designed to compute the numerical solution. Finally, a technique is de- veloped for increasing the amount of connectivity in the neural network in order to decrease the solution time. 3.1 Overview Of An Analogy In Nature The basis for this new model of path planning lies in the understanding that problems can be described by physical analogies in nature. This new path plan- ning model uses an analogy of electromagnetic field theory for the mathematical 43 44 model. The solution to the application problem is provided by a partial differential equation in field theory. The model taken from electromagnetic phenomena is based on electrostatics. The electrostatic model is used to compute multiple, parallel paths which avoid regions of high cost. The mathematical model describes paths through a region containing a variable cost function as a problem in mathematical physics. One such problem is finding the distribution of current flow through a nonuniform conducting media such as a plate of nonhomogeneous resistive material. This physics problem is described using partial differential equations. The cost function for the path planning problem is analogous to the nonuniform resistivity of the media. The solution to the Laplacian partial differential equation is a potential field in the media. 
Paths are computed orthogonal to the equipotential contours of this field. 3.2 The Electrostatic Model for Parallel Path Planning The electrostatic model is a physical model in nature which is used for the natural analogy of parallel path planning. The natural analogy contains corre- sponding entities between the physical model and the abstract path planning model. The paths of the path planning model correspond to the current flux lines of the electrostatic model. These flux lines describe the distribution of current flow. The variable cost region of the path planning model corresponds to the nonuniform conducting medium of the electrostatic model. A plate of nonhomo- geneous resistive material is an example of such a medium. With this analogy, 45 finding good paths is like describing the manner in which current flows through the medium. The solution of the distribution of current flow can be used as a corresponding solution for path planning. When a potential difference is applied at two different points in a resistive media, current flows along paths between the points where the different potentials are applied. The point with the higher potential is the source point, and the point with the lower potential is the sink point. The paths of current flow are determined by the resistivity of the media and the locations of the source and sink points. The distribution of current flow along these paths is optimized by nature in some fashion. Current path distribution is less dense through regions where the media has a higher concentration of resistivity. Conversely, regions with more conductivity (lower resistivity) have a greater density of current distribution. Current flow is along flux lines which correspond to the current density field. The current density field is continuous, so there are an infinite number of flux lines from the source point to the sink point. From the source to the sink, the current flux lines can be traced along the direction specified by the current density vector field. (Just as magnetic flux lines can be seen between the poles of a bar magnet by sprinkling metal filings upon a sheet of paper placed on top of the magnet.) In the electrostatic model for path planning, several analogies to nature are defined. The variable cost function for the search problem is represented by the nonuniform resistivity of the media. This media has the same dimensionality as the problem space. The range of each dimension of the problem space defines the extent of a grid upon which a spatial sampling is performed for the architecture. 46 Time may be explicitly represented as one of the dimensions, if the cost function is a function of time. The source and sink locations represent the start and goal nodes, respectively of the search space. 3.2.1 The Mathematical Model The mathematical model is based upon electric field theory[51]. From the definition for current density and its divergence in steady state conditions the N- dimensional electric field potential ¢ E ¢(2:1,2:2,. . . ,mN) with 2,- as orthogonal coordinates of a nonuniform conductive media can be derived [60] as the solution to the second order partial differential equation V2¢ + pva.v¢ = o (3.1) where p E p(:c1,$2, . . . ,mN) is the nonuniform resistivity of the media, and a' = p“. The cost function is modeled by the resistivity of the conducting media. 
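Since $\sigma = \rho^{-1}$, Equation 3.1 can also be read as the statement that the steady current is divergence-free. This one-line restatement (mine, obtained with the product rule, not an equation taken from the dissertation) makes the connection explicit:

$\nabla\cdot(\sigma\nabla\phi) = \sigma\,\nabla^2\phi + \nabla\sigma\cdot\nabla\phi = \sigma\left(\nabla^2\phi + \rho\,\nabla\sigma\cdot\nabla\phi\right) = 0$

so Equation 3.1 is equivalent to $\nabla\cdot\mathbf{j} = -\nabla\cdot(\sigma\nabla\phi) = 0$, the steady-state condition that is derived step by step in the Background Theory subsection that follows.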
Expanding the gradient and divergence operators in the two dimensions $x, y$ results in the following second order partial differential equation
$\phi_{xx} + \phi_{yy} + \rho\,\sigma_x\phi_x + \rho\,\sigma_y\phi_y = 0$ (3.2)
Since the electric field and the current density result from the gradient of a scalar field, they are conservative fields, and the field lines emanate from source charges and terminate on sink charges[71]. These field lines represent possible solutions to the search problem as multiple paths. Figure 3.1 is an example of this concept. This example is the field which results from dipole charges on a uniformly conductive surface. This corresponds to the path planning problem in a region which has a constant cost.

[Figure 3.1: Example of Field Lines Used as Multiple Paths. The plotted field itself is not legible in this copy.]

The least cost path between the end points is a straight line. This is shown by a straight field line in the figure. Additional curved field lines are shown which "bulge" out from the straight field line between the dipole charges. These curved lines then correspond to multiple paths of the corresponding path planning problem.

These multiple path solutions do not include all of the possible solution paths in the exponential search space. In using the field lines as paths, the solution space is restricted to include only those paths which do not share any path segments. The multiple paths will not split from any common path or join together into a single path. The multiple paths all originate at the start state as unique alternatives.

Background Theory

This section reviews those aspects of electric field theory which are pertinent to the model. For a more extensive explanation of electromagnetic phenomena, see [45, 51, 71]. The following analysis will be used to derive Equation 3.1.

The divergence of the current density $\mathbf{j}$ is defined as the rate of change of the charge density $\rho$:
$\nabla\cdot\mathbf{j} = -\dfrac{\partial\rho}{\partial t}$ (3.3)
In the steady state condition, no charge distribution changes occur:
$\dfrac{\partial\rho}{\partial t} = 0$ (3.4)
Thus, in the steady state condition,
$\nabla\cdot\mathbf{j} = 0$ (3.5)
The current density is defined in terms of the electric field $\mathbf{E}$ and the conductivity $\sigma$ as
$\mathbf{j} = \sigma\mathbf{E}$ (3.6)
Substitution of Equation 3.6 into Equation 3.5 gives
$\nabla\cdot(\sigma\mathbf{E}) = 0$ (3.7)
or, equivalently through a vector identity,
$\nabla\sigma\cdot\mathbf{E} + \sigma\,\nabla\cdot\mathbf{E} = 0$ (3.8)
The electric field is also defined as the negative gradient of the potential field $\phi$:
$\mathbf{E} = -\nabla\phi$ (3.9)
Substituting Equation 3.9 into Equation 3.8 yields
$\nabla\sigma\cdot(-\nabla\phi) + \sigma\,\nabla\cdot(-\nabla\phi) = 0$ (3.10)
After dividing the equation by $(-\sigma)$, the equation for the electric field potential of a nonuniform conductive media is
$\nabla^2\phi + \dfrac{1}{\sigma}\,\nabla\sigma\cdot\nabla\phi = 0$ (3.11)
Expanding the gradient and divergence operators in two dimensions results in the following second order partial differential equation
$\dfrac{\partial^2\phi}{\partial x^2} + \dfrac{\partial^2\phi}{\partial y^2} + \dfrac{1}{\sigma}\dfrac{\partial\sigma}{\partial x}\dfrac{\partial\phi}{\partial x} + \dfrac{1}{\sigma}\dfrac{\partial\sigma}{\partial y}\dfrac{\partial\phi}{\partial y} = 0$ (3.12)
This is expressed more conveniently using subscript notation for partial derivatives as
$\phi_{xx} + \phi_{yy} + \dfrac{1}{\sigma}\sigma_x\phi_x + \dfrac{1}{\sigma}\sigma_y\phi_y = 0$ (3.13)
By solving Equation 3.13 for $\phi$, the current flow vector field can be found by substituting Equation 3.9 into Equation 3.6.

Field Lines Form Paths

The field lines for the electrostatic model are used as the path definitions. The field lines are computed once the electrostatic potential field $\phi$ is derived for the problem from the solution to Equation 3.13. The current density vector field $\mathbf{j}$ of Equation 3.6 is used to trace the field lines. This vector field is the tangential field for the solution.
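As a small numerical illustration of how the tangential field is formed, the sketch below takes a potential $\phi$ already sampled on a grid and computes $\mathbf{j} = \sigma\mathbf{E} = -\sigma\nabla\phi$ with central differences. The stand-in potential, the uniform conductivity, and the use of np.gradient are choices made for the example; in the architecture developed below, $\phi$ is produced by the neural network itself.

    import numpy as np

    # Stand-in data: phi would come from solving Equation 3.13; sigma is the
    # conductivity (the reciprocal of the cost/resistivity) over the same grid.
    n = 64
    y, x = np.mgrid[0:n, 0:n]
    phi = np.hypot(x - 10.0, y - 32.0) - np.hypot(x - 54.0, y - 32.0)   # dipole-like placeholder
    sigma = np.ones((n, n))                                             # uniform-cost example

    dphi_dy, dphi_dx = np.gradient(phi)     # central-difference partial derivatives
    Ex, Ey = -dphi_dx, -dphi_dy             # E = -grad(phi), Equation 3.9
    jx, jy = sigma * Ex, sigma * Ey         # j = sigma * E, Equation 3.6
    # (jx, jy) is the tangential field along which the field lines (candidate paths) are traced.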
There are an infinite number of field lines which leave the source point and enter the sink point. At every point in the potential field with the exception of the source and sink, the vector field (the electric field vector) is defined. From the source point, an angle (initial heading at the start node) is selected from which to trace out the field line. Using a small incremental path step, the path progresses along this direction. From this point on, the gradient vector can be computed which will define the direction of the path descent over the potential field surface, obtaining the path as the field line. Gradient descent over the potential field will not be troubled with local minima since the potential field cannot have maximum or minimum values except at the source and sink Iocations[68]. In addition to the source and sink values included as boundary conditions described previously, it is necessary to establish boundary conditions along the edge of the grid for the problem. These boundary conditions will determine what happens to the paths where they approach the edge of the grid. The grid itself is 51 assumed to be large enough to include a sufficient portion of the path planning area in which multiple solutions are needed. Since all of the paths should be continuous from start to goal within this planning area, it is desired that none of the solution paths run off the grid. Thus, asymptotically near the edge of the grid, each path should approach a parallel course to the edge. For the field lines to be asymptotically parallel at the edge of the problem grid, it is necessary to establish the equipotential contours as normal to the grid edges as they approach the edge. This is accomplished by setting up the following boundary conditions: 345 — = o 3.14 Byl (y = yminm = ymax) ( ) and 345 — = O 3.15 an: (a: = ”min,” = zmax) ( ) The first boundary condition (Equation 3.14), causes the field lines, E, to be parallel to the unit vector, to, since along the top and bottom edges of the grid the y-component of the potential gradient is zero. Similarly, the second boundary condition (Equation 3.15) causes parallel solutions of E along the left and right edges of the grid because the x-component of the potential gradient is zero also. 3.2.2 Analysis In this section, the analysis of the minimization performed by the electrostatic model is reviewed. The analysis uses the calculus of variations [7, 8, 9, 11, 38, 44, 48]. The partial differential equation for the electrostatic solution to the Th Th. MI 52 In Ol 52 nonuniformly conductive media (Equation 3.13) is derivable from the calculus of variations. The derivation shows the corresponding minimization function for the electrostatic model. The objective of the calculus of variations is to find the extremal surface u(2:, y) which minimizes the variational integral / I(u,:c,y,u,,,u,,)da: dy (3.16) This is satisfied when the Euler Equation is found: BI 8 81 6 81 EZ‘EQE‘EE’O ‘3") This gives a second order partial differential equation in the cases of interest. The following will derive the Euler Equation for two independent variables, 2 and y. It is desired to find a function u(:c, y) which will make stationary the integral LzZI/ym1(u,z,y,uz,uy)dzdy (3.18) The stationarity condition is satisfied when [2],,” 51.1sz = o (3.19) The variation, 511., is the increment in passing from the extremal, u, to the neigh- boring function, U, while holding the independent variables 2:,y fixed. From this 62: = 6y = 0, which leaves BI 81 81 6] — $611. 
"I" £611., + Easily (3.20) In order to evaluate Equation 3.19 and Equation 3.20, it should be noticed that I] 6:26"de = l] 3.5%) “My = f/figwu) dandy (3.21) 53 Now integrate Equation 3.21 by parts by letting BI 811. v: and dw = 63 (5..) dz = Buldz (3.22) a: while holding 31 fixed. From this, B BI (11) — 52; (Bus) dz and w — 6a (3.23) So upon substitution into fvdw = 1m: — fwd'v (3.24) there is =2 BI a BI “’2 a BI m 314,5; (611.) dz — 811,6" m1 — [6145:- (due) dz (3.25) This yields BI “'2 BI 6146““ = an, [U (22.3!) - 14232.31) - U ($1.11) + u(a=1.y)] (3-26) 31 with y held fixed. But, U(z2,y) = u(z2,y) and U(z1,y) = u(z1,y) SO what iS left iS =2 BI B =2 B BI :1 81126—2: (6n) dz — — [21 a (8.1,) 611. dz (3.27) Similar evaluations for 31 gives In BI B m B BI ——-—- 61:. d = —/ — (—) 6nd 3.28 £31583” ) y may Buy y ( ) Combining Equation 3.19 and Equation 3.20 with substitution by Equation 3.27 and Equation 3.28 gives =2 112 BI B BI B BI [81 -/y1 (a-aE—b—y‘b—uj)6u¢l$dy—O (3.29) 54 Thus, the Euler Equation now is BI B B1 B BI To apply the calculus of variations to the electrostatic model, set the function to minimize, I, to I(¢az:ya¢m¢y) = j ’ E (331) This right side is by definition j -E = 0‘(:v.y)(<—) Substitution of Equation 3.34, Equation 3.37, and Equation 3.38 into Equation 3.39 yields —20',,¢,, — 20¢" — 203,439 — 20¢” = O 55 Dividing through by -20' gives the final result: (bu + (12”,, + $039!), + £03,433, = O (3.40) This shows that the partial differential equation (Equation 3.40) minimizes the energy loss due to heat loss (j -E) in a nonuniform conducting medium[22, 58]. 3.3 Parallel Architecture Design For Path Planning In the previous section, path planning using the Electrostatic Model was de- veloped. A nonhomogeneous media was used to model the cost function by representing regions of high cost through high resistivity values in the media. The Laplacian partial differential equation was derived using the nonhomoge- neous media. The solution to the Laplacian is the potential surface over which the electrostatic field lines represent multiple paths which avoid regions of high cost. This section develops the architectural design of a neural network[61] that computes the numerical solution to the Laplacian partial differential equation. The architecture implements a finite difference approximation[10] to compute a nu- merical solution to the partial differential equation. This section begins with a review of the numerical approximation which maps into the mathematical physics model of the problem. The approximation defines an artificial neural network pro- cessing unit and its weighted interconnections to a selected set of neighboring processing units. The computational definition of the massively parallel architecture is based on numerical solutions to the partial differential equation. A neural network architecture 56 is defined which computes the scalar potential field solution to the partial differ- ential equation by using the method of finite difference approximation. The ap- proximation templates of the difference formula define the connection strengths. The experimental results of this method are presented in Section 4.2. 3.3.1 Constructing the Difference Formula It is desired that the general second-order partial differential equation Bzu Bzu Bzu Bu Bu = —— + — + — + = . L(u) ABz2 BBz_—B_y+ +0 2 D— +E Fu 0 (3 41) be approximated[23, 25, 39, 49, 64, 65] on a unit grid by L1(u) % ao‘IIp — 2: an“); (3.42) i=1 with the n neighbors of P being Q1,...,Q,,, where Q, = (z + {by + 1),). 
The Taylor series about P = (z,y) in two variables is flz+ih.y+jk) = f(z.y)+ih;;f(z.y)+jk—a-f(2.y) 6y i 2 2 2 + (2) $42.31) + whiz-3:31:33) (jk)2 62 2 w Let the neighborhood of P be defined as the n points (an n + 1 point approximation + f(m) + (3.43) template) Q,- = {(z + §,,y + 17.)}, where 5,7),- are integers (h = = ifor the unit grid). The expansion of u(z + £,-,y + 1),) about P is Bu Bu ulz+£t,y+m)=UQ. = ulzlyilp+5ia— +4.73; ”p p (£23211. Bzu + __ 2 322,, +6"ma—1: By 2 2 17. Bu + —*-— +... (3.44) 2 B3121, 57 So Equation 3.42 upon substitution by Equation 3.44 and collection of common terms becomes Ban 1 " L1(‘U) = O = — aiéifli-i- atni Bz2 P 2i=__Z1ai£§+—B—_zazB;I/upg:1 22 Bu " + —— agfi + — - + up (n a,- - 2:72:10) (3.45) Bz P231 2: Z31 Substituting the coefficients of Equation 3.45 for those of Equation 3.41 gives the system of equations i?”"“ = F p43 :amg = E (3.47) :45.- = D (3.48) :am,2 = 20 (3.49) imam = B (3.50) £3046? = 2A (3.51) The solution of the coefficients a,- of the above system of equations allows approximating the value of a grid point by solving Equation 3.52 for up = p-1—o :10 auQi (3.52) For the nonhomogeneous equation, set the quantities C, = az/a and 0,, - cry/a, resulting in gi+c,§9=o 3w) _ 3.2 3,2 6y and the coefficients of Equation 3.41 are A = 1,8 = 0,0 = 1,D = CHE = Cy,and,F = 0. It is desired to derive a five-point finite difference formula. The 58 coefficients are as shown in Table 3.1. Substituting the values of Table 3.1 into Table 3.1: Coordinates of the 5-pt. Finite Difference Formula 55 7h 1 0 O 1 -1 o -1 #OJN-k—- the system defined by Equation 3.46 through Equation 3.51, yields the following system of equations a1 + a2 + a3 + a4 = a0 (3.54) (12 — a4 = 0,, (3.55) 0:1 - a3 = 0,, (3.56) a2 + a4 = 2 (3.57) a1 + a3 = 2 (3.58) The solution to this set of equations is a. = 1 + 92: (3.59) 012 = 1 + % (3.60) as = 1 — 9;: (3.61) (14 = 1 — 95! (3.62) do = 4 (3.63) Substituting Equation 3.59 through Equation 3.63 into Equation 3.42 and solving 59 for Up gives the finite difference formula HRH-221w(Haw(“3441—341 (3.84) When the template overlaps a boundary of the grid, a method is needed to define the appropriate connections for the off-grid node. The solution is to use the method of images[58, 68, 71]. The method of images is an analysis technique for dealing with boundary conditions in electrostatic fields. The technique requires a mirror image of the conductivity function to be reflected about the boundary of the medium. In addition to reflecting the conductivity function, the point charges are likewise reflected. This creates a dipole charge relationship for which a field solution is easily generated using this expanded region of the medium. The problem is reflected about the boundary whose condition is Bf O at; = As the problem has an identical image, this results in maintaining the boundary condition. For example, if solving the half-plane A shown in Figure 3.2 by reflecting the half-plane A into B, and reflecting the point charge P, solving the entire plane as a dipole charges results in maintaining the boundary condition 3f —=0 31! along the reflection line z. Now the boundary region approximations will result in reflecting back into the original region. 
So if the approximation for the point P as shown in Figure 3.3 uses 0’ in the image problem, 0’ can be reflected back into 60 (JP A /////////////////_/_/_////_/ //_/ 0 Figure 3.2: Image Dipoles Resulting from Reflecting Half-Plane A into B (90 0P A ////////////.///,//././////./ //,/ 0 0’ Figure 3.3: Mapping the Approximation point 0’ for P into 0 O which can be used instead of Q in the approximation formula. For the path planning problem space the reflections shown in Figure 3.4 need to be established. The required problem space with the needed reflection im- ages are shown in Figure 3.5. Although there are eight reflections for the problem space to be completely surrounded, only three unique reflection images are re- quired. These unique reflection images are shown in Figure 3.5. For each of the reflection images (I, II, and III), the coordinates need to be reflected back into the original problem space. In Figure 3.5, notice the correspondence between the problem and image maps and the quadrant assignments. This quadrant 61 q if. 3 V Figure 3.4: Image Reflections Surrounding the Path Planning Problem Space ---------‘ l---------q| b--------- i--------d Problem Space Image I Image II Image III Quadrant I Quadrant II Quadrant III Quadrant IV Figure 3.5: The Unique Reflection Images of the Path Planning Problem Space 62 assignment is shown in Figure 3.6. If the row or column index for a template co- : m a w a m g II I II III IV n1 Figure 3.6: Reflection Images Assigned to Quadrants of the Problem Space ordinate occurs in one of these image maps, then the index needs to be mapped back into the problem space. If the index k > n, then to reflect the index back into the problem it is necessary to compute n — k mod n. If the index k < 1, then to reflect the index back into the problem It is necessary to compute 2 -— 1:. So the coordinates to map a quadrant 0 point (i, 3),, into a point In the first quadrant, l, (‘11.)! IS (1..) ={((n—im0dn),j)n ifi>n ’7 ’ ((2—il.j)u 'rrz' <1 (, j)! = { (Mn—J“ mod a)». in > n ’ (1N2 ‘jIIIV ifj < 1 For quadrant lll, (i,j), = (Ic,l), where k = n — (in; mOd n) If in] > n 2 '— z.III If in] < 1 and I: 17,—(qu modn) Ifjlu>n 2 '7 J111 '1 J111 < 1 The algorithm for mapping the image index is shown in Figure 3.7. An example of 1. begin IHAGEINDEX(p,q,n,p’,q’) 2. p’ 4—— p; 3. q’ +— q: 4. If p’ > n then 5. p’e—n—p’modn; 6. elseif p' < 1 then 7. p’ +—— 2 - p’; 8. end" 9. If q’ > n then 10. q’H—n—q’modn; 11 . elself q’ < 1 then 12. q’ «— 2 — q’; 13. end" 14. end IMAGEINDEX; Figure 3.7: Algorithm to Map Image Indicies into Problem Space Indicies this mapping using the five point approximation for a nonuniform medium follows. The approximation at the y = n border is defined as follows 1 C; 0: Cy y 451... = 2,- (1 1‘ '_2‘)¢i+1.n 1' (1 — §I¢t—1.n 1’ (1 1' —2—')¢i,n+1 1” (1 — %)¢i,n—1 (3.85) But, by definition it . O = ¢i,n+1 _ ¢i,n—1 (3.66) 33 1,11. 2 Therefore, ¢i,n+1 = 451,..-1 which, of course is the expected image mapping. By substituting this back into Equation 3.65, the final result is 1 C, 0,, 4’1... = 4 [(1 1' —2_)¢i+1,n + (1 - —2_)¢i—1,n 1' 2¢1,n—1 (357) Notice, however, an intermediate step in the simplification of the above equation which resulted in Equation 3.67, included the step 451.1. = ,1, [(1 1‘ %)¢i+1,n + (1 — %)¢i—1,n + (1 1' %)¢i.n—1 1’ (1 — %)¢i,n—1 (3.88) But Equation 3.68 is just the computation template architecture including the ap- propriate image mapping. 
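A runnable restatement of the IMAGEINDEX mapping of Figure 3.7 may be easier to read than the scanned listing. The function below follows the reflection rules stated above (an index $k > n$ maps to $n - k \bmod n$, and $k < 1$ maps to $2 - k$) and assumes 1-based grid indices from 1 to $n$, as in the original; the example values are mine.

    def image_index(p: int, q: int, n: int) -> tuple[int, int]:
        """Reflect an off-grid template coordinate (p, q) back into the 1..n problem grid."""
        if p > n:
            p = n - p % n          # reflection about the upper boundary
        elif p < 1:
            p = 2 - p              # reflection about the lower boundary
        if q > n:
            q = n - q % n
        elif q < 1:
            q = 2 - q
        return p, q

    # Example: the template point just beyond the y = n border maps to the row beneath it,
    # matching the identity phi(i, n+1) = phi(i, n-1) used in the boundary approximation.
    assert image_index(3, 11, 10) == (3, 9)
    assert image_index(0, 5, 10) == (2, 5)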
So the desired approximation at the boundary (Equa- tion 3.67) is accomplished using the IHAGEINDEX mapping for the approximation (Equation 3.68). 3.3.2 Defining the Neural Network Architecture This artificial neural network architecture is similar to the computational archi- tecture known as the systolic array. From its similarity to a grid, each point of the grid is a processor, and the links of the grid consist of the links between the processors. This array characteristic of the architecture capitalizes on the regu- lar and modular structures which match the computational requirements of the model. The mapping of the finite difference approximation into this grid is illustrated in Figure 3.8. The block diagram which shows the neuron model definition which implements the five point approximation of Equation 3.68 is also shown in Figure 3.8. V144,) +V vi,j-l-l ref ref V 1,).1 < Figure 3.8: Neuron Model for the 5-pt. Finite Difference Approximation id 66 An artificial neural network, as defined by the model, uses an electronic cir- cuit (see Figure 3.9) as its basic neuron model, with the interconnecting resistors representing the synapses between the neurons. The numerical approximation computes the solution to a continuous system of equations (the implicit differ- ence equations) and solves Equation 3.64. The circuit equation for the inverting operational amplifier implementation of the neuron is v. = mite/R.- (3.69) '=1 where V,- is the output voltage of neuron i, limited to the range [—Vre f, + Vre f], R, is the feedback resistor for the neuron operational amplifier and R,- is the inter- connecting resistor from neuron 1. Thus the synaptic weight W,- is implemented as the ratio of resistances: —R, / 12,-. The neuron model is defined as consisting of an input function and output function. The input function computes the weighted sum of the interconnections from the other neurons, .1, = ZWM (3.70) i=1 and the output function computes the linear ramp between the saturation voltages [—Vre f, + Vref] of the operational amplifier. The neural network is used to perform a parallel computation of the scalar potential field 4» at every spatial point. Thus, the network computes a numerical solution to Equation 3.13. The architecture of the neural network must solve an elliptic partial differential equation. There are many numerical methods for solving partial differential equations. In this research, finite differences were employed to 67 solve the potential field Equation 3.13. Such an approximation is the five-point formula defined in Equation 3.64. For the neural network implementation, the neuron input function 14,,- is defined as 1 0'3 a d: 0' "1.1 = Z (1 + 5;)”HLJ' + (‘I 1' film's-1 1‘ (1 ~ Elm—1.1 + (1 - Q—pviJ—l] (3.71) The synaptic weights implement the non-uniformity of the cost functions (the co- efficients of Equation 3.71 which contain 8,, and a”). The weights to neuron(i,j) are shown in Table 3.2. The non-linear neuron output function 8,3,- is defined as -Vre1 3 111,5 < -Vrej v“, T { ut,j 3 — ref 5 “1.1 S +Vr¢f (3-72) + V"! 3 115,5 > + Vre} The entire search space is constructed by replicating this portion of the neural network over the entire grid. The output of the source and sink neurons which correspond to the start and goal nodes are clamped to + V"; and —V,.,,, respec- tively. 
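A simulation-style sketch of this network is given below: every interior neuron repeatedly applies the input function of Equation 3.71 with the Table 3.2 weights, outputs are limited to $[-V_{ref}, +V_{ref}]$ as in Equation 3.72, and the source and sink neurons are held at $+V_{ref}$ and $-V_{ref}$. The synchronous (Jacobi-style) sweep, the stopping tolerance, the example conductivity, and the periodic edges produced by np.roll (a simplification of the method-of-images treatment described earlier) are assumptions made for illustration, not a description of the analog hardware.

    import numpy as np

    def relax_potential(sigma, src, snk, vref=1.0, tol=1e-6, max_iters=50000):
        """Synchronous sketch of the 5-point neuron update of Equation 3.71."""
        sig_y, sig_x = np.gradient(sigma)        # numerical sigma_y and sigma_x over the grid
        c_x, c_y = sig_x / sigma, sig_y / sigma  # the sigma_x/sigma and sigma_y/sigma coefficients
        v = np.zeros_like(sigma)
        v[src], v[snk] = +vref, -vref            # source and sink neurons clamped
        for _ in range(max_iters):
            u = 0.25 * ((1 + c_x / 2) * np.roll(v, -1, axis=1)     # east neighbour  v(i+1, j)
                        + (1 + c_y / 2) * np.roll(v, -1, axis=0)   # north neighbour v(i, j+1)
                        + (1 - c_x / 2) * np.roll(v, 1, axis=1)    # west neighbour  v(i-1, j)
                        + (1 - c_y / 2) * np.roll(v, 1, axis=0))   # south neighbour v(i, j-1)
            u = np.clip(u, -vref, vref)          # saturating linear-ramp output, Equation 3.72
            u[src], u[snk] = +vref, -vref
            if np.max(np.abs(u - v)) < tol:      # assumed stopping rule for the simulation
                return u
            v = u
        return v

    # Uniform-cost example with source and sink near opposite corners of a 32 x 32 grid.
    phi = relax_potential(np.ones((32, 32)), src=(2, 2), snk=(29, 29))

The settled array phi plays the role of the scalar potential field from which the paths are later extracted.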
Since the clamping of the source and sink to the maximum and minimum (respectively) potential value occurs on the boundary of the problem, it is appro- priate to use these clamped values as the limits of the neuron output function (Equation 3.72), since all potential values inside the boundary of the problem are guaranteed to lie between these limits. Table 3.2: Neural Network Weight Definitions for the 5-pt. Approximation Weight From Neuron Value W1 (i+1,j) (1+ d,/20')/4 W2 (i,j + 1) (1 + ay/20)/4 W3 (1' - 1.1) (1 - tr../20)/4 W4 (131' - 1) (1 - (Tu/20)“ Figure 3.9: Artificial Neural Network Op-Amp Circuit for 5-pt. Template 69 3.3.3 Extracting Paths from the Neural Network Architecture The paths are extracted from the neural network once it settles. The network computes the scalar potential field, as described previously. The final values of the nodes represent a surface with a global maximum at the start node and a global minimum at the goal node. A direction is selected for a path at the start node. Gradient descent over the potential surface is used to extract the path for that direction. Bivariate interpolation of the potential values is used between the grid points. Figure 3.10 illustrates the gradient descent algorithm. The algorithm computes the new location, new accumulated cost, and new gradient angle. The incremental path length is given as 5. Each pass through the algorithm generates another point along the path. This algorithm is repetatively invoked until the entire path from the start node to the goal node is extracted from the gradient descent of the potential surface. This will have a complexity of 0(L), where L is the length of the path. 7O ComflGO'IIFWNI-l begin GRAD IENT-DESCENT real constant S : incremental path length; real 03,0” : current: location; real CA : accmmtlated cost; real G : gradient angle; 1. 1—- [0.]; 11¢ ‘T— 1.01113 If I, = N then Ic +— Ic — 1; end" If Jc = N then Jc <—— Jc — 1; endlf XN *— C, + SCOSG; YN t—— 0,, '1' SSinG; IN *— LXNJ ; JN 1——- LYN] 3 0,, «— 8051(I,,J,,); 0N t—— CUST(IN,JN); Ac (— Sgt—E—QE; CA (~— 04 ‘1' Ac; zID (— C, — IC; 3]? t-— Cy — Jo; f; «— BIvaATE(z..y., f.,I.,J.); f; +— BIVARIATE(z,,, y,, f,, 1,, J6) ; G .— arctan (jg/jg); 0. 1— Xni (Ive—Y"; end GRADIENT-DESCENT ; Figure 3.10: Algorithm to Extract Path Points from the Potential Surface 71 3.3.4 Variable Connectivity for the Architecture In this section, the development of a method to implement a variable connec- tivity of the architecture is developed. The concept of variable connectivity means that rather than solving the partial differential equation with each node of the grid using inputs from only four immediate neighboring nodes, additional nodes are connected. The assumption is that increased connectivity will lower the conver- gence time of the architecture. This increase in architecture complexity should not outweigh the benefits obtained in convergence time, or accuracy of the results. The Taylor-series based approach of Section 3.3.1 was used to develop tem- plate definitions for increasing sizes of templates. This was an intuitive approach for increased connectivity. The approach had a sound mathematical basis for defining approximation templates. Given a template pattern, it seemed straight- forward to enlarge the pattern from five points to templates with nine, 13, or 25 points. However, this approach failed to provide the accuracy and convergence results which were anticipated. 
Increasing connectivity by using larger templates actually decreased the convergence rate. Section 4.3 gives the detailed results of this approach and discusses why it failed to produce the anticipated improvements in convergence rate.

A second method to accomplish the desired variable connectivity in the artificial neural network is now developed. A parallel summation technique using the 5-point finite difference approximation is developed which is based on the Euler method of integration.

Unidirectional Parallel Summation

Considering one dimension, a first-order differential equation can be solved numerically using the Euler method. The Euler method can be viewed as a recurrence relation. In expanding the recurrence relation for each successive solution point, all of the summations arrived at by the recurrence relation can be seen. By considering the terms of the summation for each solution point, and comparing them to the total summation of previous solution points, previous solution points can be substituted for these terms in a systolic fashion. There are many ways to partition these summation terms, which in effect varies the parallelism of the architecture and thus the connectivity of the neural network. This section defines the equations and illustrates architectures of this approach. A good approximation is retained over the entire problem space, because the fundamental approximation to the derivative is unchanged. The variable connectivity is accomplished by collecting computations of the numerical approximation into different groupings which may be implemented in parallel without altering the basic approximation to the derivative. The connectivity is variable in a straightforward manner, with computational speed results apparent from the inherent serial or parallel combinations of the architecture. These connection equations (summation formulas) are developed from the first-order one-dimensional case to the two-dimensional case for their use in the simulation.

Consider the differential equation
$\dfrac{df}{dx}(x) = g(x)$, with initial condition $f(x_0) = f_0$ (3.73)
Substitute the numerical approximation for the derivative and obtain
$\dfrac{f(x_{i+1}) - f(x_i)}{\Delta x_i} = g(x_i)$ (3.74)
Solving for $f(x_{i+1})$ results in the Euler forward integration equation[64]
$f(x_{i+1}) = f(x_i) + \Delta x_i\, g(x_i)$ (3.75)
Let $h = \Delta x_i$ be used as the grid size in a parallel architecture. Recursive application of Equation 3.75 can be used to solve for any value $f(x_n)$:
$f(x_0) = f_0$
$f(x_1) = f(x_0) + h g(x_0)$
$f(x_2) = f(x_1) + h g(x_1)$
$\vdots$
$f(x_n) = f(x_{n-1}) + h g(x_{n-1})$
This concept is illustrated in Figure 3.11. If the definitions of the $f(x_i)$ terms on the right hand side of the above set of equations are expanded, this gives
$f(x_0) = f_0$
$f(x_1) = f_0 + h g(x_0)$
$f(x_2) = f_0 + h\,[g(x_0) + g(x_1)]$
$f(x_3) = f_0 + h\,[g(x_0) + g(x_1) + g(x_2)]$
For the $n$th term this is
$f(x_n) = f(x_0) + \displaystyle\sum_{i=0}^{n-1} h g(x_i)$ (3.76)

[Figure 3.11: spatial structure of the serially expanded Euler integration; the diagram itself is not legible in this copy.]

The summation in Equation 3.76 can be split into two summations:
$f(x_n) = f(x_0) + \displaystyle\sum_{i=0}^{n-k-1} h g(x_i) + \displaystyle\sum_{j=n-k}^{n-1} h g(x_j)$ (3.77)
The first two terms of Equation 3.77 are $f(x_{n-k})$, so this becomes
$f(x_n) = f(x_{n-k}) + \displaystyle\sum_{j=n-k}^{n-1} h g(x_j)$ (3.78)
The connectivity will be shown later to be the fan-in $K = k + 1$ to the neuron. Equation 3.78 gives a definition for the variable connectivity in the parallel summation for the Euler integration. This variable connectivity encompasses varying degrees of parallelism via the fan-in, and has the effect of varying the amount of serial delay in computing its results.
The first term of Equation 3.78 is the serially cascaded computation. The second term is the parallel (in addition to the serial input) computation. Of course, it will have a valid output only after the cascaded serial summation delay. The connectivity is derived as follows. Equation 3.78 is spatially constructed for parallel summations. So how many summations are there? The last term has (n — 1) — (n — k) + 1 addends, for a total of 1: inputs to the parallel summer. Including the input for the first term, this is k + 1 a K total inputs to the summer. Use of Equation 3.78 for specifying the connectivity of the network architecture will be illustrated. Figure 3.12 contains the recursive equations and Figure 3.13 shows the corresponding interconnections used with various configurations to show how adjusting the fan-in results in a unique architecture. This set of illustra- tions uses ten points of the function (11. = 10). 76 Max:854 53382200 83mins 5 new: 8:833 coamEEzw 3:8me & fin 939“. A 8531395 Aroma: $392+ Ania: A3595. Amie. $321555Apcméiea u A288 A??? $532+ A8595 Among: News... A??? mews... Aroma? A853: ASL u A 8 .8 A??? 35?? Among... A4595 mews... mews: Aroma? A855,: 8: u A w 88 news... A8595. A452... Amie: $321595 A8595. A92 u A A. .9 Amewé. «swat. Among... 39:169.? A8595. Ac: u A 8 Q A4535. Amewé «55.: A2595 A8535 A92 u A m S «882.. mesa. A582. A8882... as "A 4 .3 3?: Areas: A8595 8: n A m 9 A695 A889,... 8: u A m .8 A859,... A92 u A A .9 anoacoé‘ coAAmEEg A2331 2: Ac $230328 ”m A .m 2:9”. 9999999999 Afvw Ans» Aye» Aye» Ami _A§w_ _A§m_ _A~5w_ _A_%_ A?» Ape 78 Figure 3.14 illustrates the summation equation with a fan-in of 2, which is f(z‘n) = f(zm) + hy(a=n—1) (3-79) Each output on the left side of the equations in the figure picks up two terms, the previous output term, and the last input term on the right hand side of the equations. This architecture, as shown in Figure 3.15 is parallel in space, but serial in time. It has the minimum number of connections, and the maximum time to settle because of the cascaded ripple wave computations. The next illustration is Figure 3.16 for the fan-in of 3. Its equation is n—1 f(rn) = f(wn—z) + Z hg(wa) (3-80) i=n—2 Each output on the left side of the equations of the figure picks up three terms, the second output term preceeding this output, and the last two input terms on the right hand side. This architecture shown in Figure 3.17 is the next step in adding more connectivity, which has a corresponding decrease in the time required for the cascaded serial computational wave. Figure 3.18 illustrates the next step with a fan-in of 4. The corresponding equation is n—1 f(zn) = f(zn—a) + Z hg(za) (3-81) i=n—3 The output term on the left side now picks up four terms. The serial term which was picked up preceeds the current term by three delay steps in the cascade ad- dition. The last three terms of the right hand side are picked up in the summation. Figure 3.19 illustrates this architecture. 79 In Figure 3.20 the fan-in for the complete connectivity of n + 1 is shown. The equation describing this is n—1 f(zn) = N» + X hg(=v.-) (3-82) i=0 Here no serial terms are picked up from the cascade ripple wave. All of the input terms are from the right hand side of the equation. This architecture is parallel in space, and also parallel in time. This has the maximum number of connections and minimum time because there is no ripple wave for the computation due to no cascading structure. The architecture is shown in Figure 3.21. 
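The different fan-ins regroup, but do not change, the Euler sums; what varies is the length of the serial dependency chain that must ripple through the network. The short check below confirms that Equation 3.78 reproduces the serial recurrence for several values of $k$; the integrand, grid size, and the depth estimate of roughly $\lceil n/k \rceil$ stages are my own choices and my reading of the cascade structure, not figures from the text.

    import numpy as np
    from math import ceil

    h, n = 0.1, 10
    g = np.sin(h * np.arange(n))       # samples g(x_0) ... g(x_{n-1}); sin is just an example
    f0 = 0.0

    def serial_euler():
        f = np.empty(n + 1); f[0] = f0
        for i in range(n):                         # Equation 3.75, one step per stage
            f[i + 1] = f[i] + h * g[i]
        return f

    def fanin_euler(k):
        f = np.empty(n + 1); f[0] = f0
        for m in range(1, n + 1):                  # Equation 3.78 with fan-in K = k + 1
            back = min(k, m)
            f[m] = f[m - back] + h * g[m - back:m].sum()
        return f

    ref = serial_euler()
    for k in (1, 2, 3, n):
        assert np.allclose(fanin_euler(k), ref)    # the same sums, regardless of grouping
        print(f"k = {k} (fan-in K = {k + 1}): about {ceil(n / k)} serial stages")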
[Figure 3.14: summation equations for fan-in K = 2; the page is not legible in this copy.]

[Figure 3.15: interconnections for fan-in K = 2 in the parallel summation architecture.]

[Figure 3.16: summation equations for fan-in K = 3.]

[Figure 3.17: interconnections for fan-in K = 3 in the parallel summation architecture.]

[Figure 3.18: summation equations for fan-in K = 4.]

[Figure 3.19: interconnections for fan-in K = 4 in the parallel summation architecture.]

[Figure 3.20: summation equations for the complete connectivity, fan-in K = n + 1.]

[Figure 3.21: interconnections for fan-in K = n + 1 in the parallel summation architecture.]

Bidirectional Parallel Summation

This section develops the next step in the construction of the parallel summation architecture.
Since the objective of the parallel summation architecture is to have a variable amount of connections for a multidimensional problem space, the summation technique must not have a unidirectional flow to the computation. This requirement is necessary because the local connection architecture must be robust enough to support the processing of data arriving from any direction within the architecture. Therefore a bidirectional parallel summation is needed to eliminate the unidirectional computation flow of the previous architecture.

To develop a bidirectional formula, the backward difference formula first needs to be derived. Begin again with the differential equation

    \frac{df(x)}{dx} = g(x), with initial condition f(x_{N+1}) = f_{N+1}    (3.83)

The backward difference approximation to Equation 3.83 is

    \frac{f(x_{i+1}) - f(x_i)}{\Delta x_i} = g(x_{i+1})    (3.84)

Letting h = \Delta x be the grid size, and solving Equation 3.84 for f(x_i), gives

    f(x_i) = f(x_{i+1}) - h g(x_{i+1})    (3.85)

The general expression then is

    f(x_j) = f(x_{j+k}) - \sum_{i=j+1}^{j+k} h g(x_i)    (3.86)

The architecture for this equation is shown in Figure 3.22.

[Figure 3.22: Illustration of the backward difference parallel summation architecture]

As should be expected, this result compared with Figure 3.11 is seen to be inverted along the single dimension.

Now a central difference formula is derived by combining the forward and backward difference formulas. Recall the forward difference expression,

    f(x_j) = f(x_{j-k}) + \sum_{i=j-k}^{j-1} h g(x_i)    (3.87)

and the backward difference is Equation 3.86. Adding these together and solving for f(x_j) yields

    f(x_j) = \frac{f(x_{j-k}) + f(x_{j+k})}{2} + \sum_{i=j-k}^{j-1} \frac{h}{2} g(x_i) - \sum_{i=j+1}^{j+k} \frac{h}{2} g(x_i)    (3.88)

Figure 3.23 shows the architecture for Equation 3.88 with k = 1.

There is a pattern to the increasing parallelism (k) in the summation equations and the architectural configuration. This is illustrated in Figure 3.24, where the right side of the figure shows the pattern after increasing the parallel connections of the example on the left. When increasing the parallelism of the summation, the input to neuron C is increased. This is done by disconnecting the output O_B of neuron B, and adding connections from the input I_B of neuron B directly as additional inputs to neuron C. This will include connecting the output O_A of neuron A to the input of neuron C. Thus the pattern of increasing parallelism in the summation equations incrementally bypasses neurons by adding connections from their inputs directly to those neurons which they were connected to by their original output.

[Figure 3.23: Illustration of the central difference bidirectional parallel summation architecture]
[Figure 3.24: Growth pattern of the bidirectional summation architecture]

    O_A = A(I_A, X_A)    (3.89)
    O_B = B(I_B, O_A)    (3.90)
    O_B = B(I_B, I_A, X_A)    (3.91)
    O_C = C(I_C, O_B)    (3.92)
    O_C = C(I_C, I_B, O_A) = C(I_C, B(I_B, O_A)) = C(I_C, O_B)    (3.93)

Thus, the recursive expansion by substitution. Figure 3.25 illustrates the architecture of this observation for the central difference formula.

[Figure 3.25: Expansion of the central difference architecture for parallel summation]
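Before the template is generalized to two dimensions, a minimal one-dimensional sketch (illustrative names, not the dissertation's code) of the combined estimate in Equation 3.88 may help: it simply averages the forward form (3.87) and the backward form (3.86), which is what lets a value at x_j be assembled from data arriving from either side.

    import numpy as np

    def bidirectional_estimate(g, h, f_left, f_right, j, k):
        # Sketch of Equation 3.88: average the forward (3.87) and backward (3.86)
        # summations.  f_left = f(x_{j-k}) and f_right = f(x_{j+k}) are already
        # available; g holds the samples g(x_i) on a grid of spacing h.
        forward = f_left + h * np.sum(g[j - k:j])             # Equation 3.87
        backward = f_right - h * np.sum(g[j + 1:j + k + 1])   # Equation 3.86
        return 0.5 * (forward + backward)                     # Equation 3.88

    # Example with f(x) = x^2, g(x) = 2x on a grid of spacing 0.1: the estimate of
    # f(x_5) = 0.25 comes out as 0.24, the first-order error of the two differences.
    x = np.arange(0.0, 1.01, 0.1)
    print(bidirectional_estimate(2 * x, 0.1, x[4]**2, x[6]**2, j=5, k=1))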
Two-dimensional Parallel Summation

With the development of the central difference formula for bidirectional parallel summation, and the observation of the recursive expansion, the final step in the construction of multidimensional parallel summation can be accomplished. Using the five point approximation template as a kernel, the template can be "unfolded" as many times as desired in a recurrent fashion, obtaining a larger connection neighborhood which takes into account the local C_x(x, y) and C_y(x, y) in the solution. Different kernels generate different connection patterns.

Recalling the weight expressions for the model from Table 3.2, let them be written as the following equations

    w_x^+(i,j) = \frac{1}{4}[1 + Q_x(i,j)]    (3.94)
    w_x^-(i,j) = \frac{1}{4}[1 - Q_x(i,j)]    (3.95)
    w_y^+(i,j) = \frac{1}{4}[1 + Q_y(i,j)]    (3.96)
    w_y^-(i,j) = \frac{1}{4}[1 - Q_y(i,j)]    (3.97)

where Q_x(i,j) and Q_y(i,j) denote the cost-gradient terms of Table 3.2. The general equation for the neuron input function is

    u(i,j) = w_x^+(i,j) v(i+1,j) + w_x^-(i,j) v(i-1,j) + w_y^+(i,j) v(i,j+1) + w_y^-(i,j) v(i,j-1)    (3.98)

which upon recursive expansion (for k = 2) is

    u(i,j) = w_x^+(i,j) [ w_x^+(i+1,j) v(i+2,j) + w_x^-(i+1,j) v(i,j) + w_y^+(i+1,j) v(i+1,j+1) + w_y^-(i+1,j) v(i+1,j-1) ]
           + w_x^-(i,j) [ w_x^+(i-1,j) v(i,j) + w_x^-(i-1,j) v(i-2,j) + w_y^+(i-1,j) v(i-1,j+1) + w_y^-(i-1,j) v(i-1,j-1) ]
           + w_y^+(i,j) [ w_x^+(i,j+1) v(i+1,j+1) + w_x^-(i,j+1) v(i-1,j+1) + w_y^+(i,j+1) v(i,j+2) + w_y^-(i,j+1) v(i,j) ]
           + w_y^-(i,j) [ w_x^+(i,j-1) v(i+1,j-1) + w_x^-(i,j-1) v(i-1,j-1) + w_y^+(i,j-1) v(i,j) + w_y^-(i,j-1) v(i,j-2) ]    (3.99)

Figure 3.26 shows the first two expansions for a 5-point mesh architecture. An algorithm which implements the connection weights for any value of k is shown in Figure 3.27. In this manner, new connection weight values are defined. This expansion continues until the desired amount of connectivity is achieved. The maximum connectivity is obtained when the expansion for each node has recursively used all other nodes. This will scale directly with the length of the largest dimension of the multidimensional space.

[Figure 3.26: Two Expansions of the 5-pt. Kernel Approximation Template]

    1.  begin F(k, v, i, j)
    2.    if k = 0 then
    3.      F ← v(i, j);
    4.    else
    5.      F_x^+ ← w_x^+(i,j) F(k-1, v, i+1, j);
    6.      F_x^- ← w_x^-(i,j) F(k-1, v, i-1, j);
    7.      F_y^+ ← w_y^+(i,j) F(k-1, v, i, j+1);
    8.      F_y^- ← w_y^-(i,j) F(k-1, v, i, j-1);
    9.      F ← F_x^+ + F_x^- + F_y^+ + F_y^-;
    10.   endif
    11. end

Figure 3.27: Algorithm to Generate the Parallel Summation Neuron Weights
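Read as executable code, the recursion of Figure 3.27 could look like the sketch below (the function and argument names are illustrative, not the dissertation's, and boundary handling is omitted). Collecting the coefficient that each surrounding node picks up in this expansion yields the new connection weights for the enlarged neighborhood, exactly as Equation 3.99 shows for k = 2.

    def expanded_input(k, v, w, i, j):
        # Sketch of the Figure 3.27 recursion: unfold the 5-point template k times
        # around node (i, j).  v[i][j] holds node outputs; w(i, j) returns the four
        # template weights (w_x^+, w_x^-, w_y^+, w_y^-) of Equations 3.94-3.97.
        if k == 0:
            return v[i][j]
        wxp, wxm, wyp, wym = w(i, j)
        return (wxp * expanded_input(k - 1, v, w, i + 1, j)
                + wxm * expanded_input(k - 1, v, w, i - 1, j)
                + wyp * expanded_input(k - 1, v, w, i, j + 1)
                + wym * expanded_input(k - 1, v, w, i, j - 1))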
3.4 Complexity Analysis of the Neural Network

The neural network computes a finite difference approximation, which can be expressed in matrix form. The matrix equation for the neural network is

    u^{(j+1)} = A u^{(j)}    (3.100)

where u^{(j)} is the jth update to the (n^2 x 1) neuron vector

    u = [u_{1,1}, u_{1,2}, ..., u_{1,n}, u_{2,1}, ..., u_{n,n}]^T    (3.101)

A is the block tri-diagonal square (n^2 x n^2) matrix defined as

    A = \begin{bmatrix} T & B & & & \\ Y_N & T & Y_P & & \\ & \ddots & \ddots & \ddots & \\ & & Y_N & T & Y_P \\ & & & B & T \end{bmatrix}    (3.102)

along with the square (n x n) matrices B, T, Y_P, and Y_N of Equations 3.103 through 3.106, whose nonzero entries are the template weights: T is tridiagonal in the x-direction weights of Equations 3.94 and 3.95, while Y_P and Y_N are diagonal in the y-direction weights of Equations 3.96 and 3.97, respectively.

Each iteration of Equation 3.100 requires O(n^4) operations on a Von Neumann processor, but only a single cycle on the neural network. The neural network implements the five point finite difference approximation as a point iterative Jacobi[39] algorithm (all points are computed simultaneously in parallel, however). Lapidus and Pinder show[39] that the rate of convergence of the iterations of Equation 3.100 is inversely proportional to the square of the discretization of the grid (1/n). Therefore, the convergence of the neural network approximation is O(n^2). The complexity of the PPPH algorithm is then

    O(n^2) + O(all paths) = O(n^2) + (no. of paths) * O(path length)    (3.107)

The O(path length) is O(n^2) since the worst case length includes all nodes of the neural network. The number of paths computed is linear in heading at the start node, so the overall complexity of the algorithm PPPH is O(n^2).

3.5 Summary

This chapter has developed the details of a massively parallel architecture for path planning. Electrostatic field theory was used to model the path planning problem. Multiple path solutions are provided by this model through the field lines which describe the distribution of current. Numerical approximation provided the means to directly map the solution of the Laplacian partial differential equation of the model to a massively parallel architecture through a finite difference approximation. An artificial neural network was designed to compute the finite difference approximation by using the approximation template to define the synaptic weights of the network. A parallel summation technique was developed which provided for a variable connectivity in the neural network. This technique recursively applies the approximation template to define the synaptic weights of additional connections in the neural network. Finally, the complexity of the neural network architecture was shown to be O(n^2) for a network with n^2 neurons.

Chapter 4

Path Planning Application Results

In Chapter 3, an architecture design was developed for the electrostatic model of path planning. The architecture used a neural network computing model to solve the partial differential equation for the electrostatic model. The neural network was simulated to examine the usefulness of this approach to path planning. In this chapter, the results of the simulation runs are discussed.

Experimental verification of the solution and performance characteristics of the massively parallel architectures was accomplished using simulations on a VAX and a Connection Machine. The results and performance of the neural network which implements the electrostatic model are reviewed, including the simulations on the Connection Machine. The solutions generated by the massively parallel architecture have been checked against known admissible algorithm results.

The effects of variable connectivity on the results of the massively parallel architecture are also discussed. The amount of interconnection between computational units in the massively parallel architecture affects the accuracy and convergence of the network solution. It is shown that as the amount of connectivity in the network increases, the convergence rate improves.

The results using the electrostatic model show an average difference of eight percent in computing the best cost paths relative to sequential best first search. Connectivity curves are included which show the convergence of the Taylor series based templates and the parallel summation technique templates at various degrees of connectivity in the neural network. These curves show that the parallel summation technique templates converge faster with higher connectivity. The analog circuit simulation on the 16k CM-2 Connection Machine resulted in a real-time performance projection of 18 milliseconds for Test Case B, whose neural network had 3,600 neurons.

4.1 Test Case Descriptions

Several test cases were used to experiment with the architecture simulations in this research. Table 4.1 lists the features of each of the test cases.
Table 4.1: Path Endpoints and Optimal Costs of the Test Cases

    CASE    N: Size    S: Start    G: Goal    P*: Optimal Cost
    A       30         (3, 23)     (17, 2)    0.33
    B       50         (20, 10)    (45, 55)   5.34
    C       50         (8, 42)     (37, 11)   0.71
    D       65         (14, 35)    (50, 50)   0.45
    E       21         (13, 5)     (11, 17)   18.14
    F       21         (17, 5)     (5, 17)    17.56
    G       26         (3, 4)      (24, 24)   38.99

The column labelled "N" is the size of the square grid used for the test case. The total number of grid points in each test case is N^2. The next column, labelled "S", contains the coordinates of the start node on the grid. The column labelled "G" lists the coordinates of the goal node. To establish the minimum cost path and its cost value as a benchmark against which the simulated architecture results could be compared, a branch-and-bound search for each test case with the specified start and goal node was conducted. The last column, labelled "P*", shows the value of the accumulated cost for the best path between S and G of each case.

An example of the cost functions for the test cases is displayed in Figure 4.1, which is the cost function for Test Case B. The cost function value is illustrated as the contour lines of a surface whose height corresponds to the function value. Contour plots of the cost function for all of the test cases are illustrated in Appendix A. A detailed description of each test case is included in the appendix.

These test cases collectively contain contrasting features in the cost function. These contrasting features included single versus multiple cost regions, discrete versus continuous cost regions, dense versus sparse cost regions, and synthetic versus real-world cost model regions. In addition to their contrasting features, the various sizes serve to investigate the scalability of the neural network architecture. Many neural network investigations have used at most 100 neurons in their test cases. This test set included networks with several thousand neurons.

4.2 Electrostatic Model Results

Table 4.2 shows the results of the electrostatic model neural network architecture on the test cases. The average difference for all of the test cases was 8.3 percent.

Table 4.2: Test Case Results Using the 5-pt. Approximation Neural Network

    CASE    Network Cycles    P: Least Cost Path Value    % Difference from P*
    A       4,000             0.33                        0.0
    B       3,483             7.67                        +43.6
    C       5,460             0.84                        +18.3
    D       6,767             0.43                        -4.4
    E       566               19.00                       +4.7
    F       370               18.00                       +2.5
    G       2,610             36.50                       -6.4
                                                 Average  +8.3

4.2.1 Results Contrasted with Sequential Search

In examining the data, an obvious question is: how can there be a negative difference? This is due to differences in the method for generating the reference path and for extracting the path from the electrostatic model. The former used a branch-and-bound search using the nodes of the test case and their cost values. Recall that the gradient descent algorithm (Figure 3.10) for extracting paths used a fractional step in the direction of the gradient, which resulted from a bivariate interpolation of the potential values at the neural network nodes.

This distinction is further illustrated by an example shown in Figure 4.2. In the figure, the discretization of the cost function is depicted by the nodes of the grid. The solid nodes are those which are included in the reference path found by the sequential search algorithm. This algorithm expands the nodes of the search space (grid) sequentially, and the path is defined by connecting the nodes of the path with straight line segments; this is shown as the solid line segments. In contrast, an example of the type of path generated by the gradient descent algorithm is illustrated as the dashed line in the figure. It is usually a much smoother path, since it uses a gradient step size which is often defined as a fraction of the grid discretization value.

The fractional step size used in the neural network simulations was 0.5 (one-half of the grid step defined by the test case nodes), so this resulted in a higher resolution of the path points extracted than those of the reference path. With a higher resolution, some of the extracted paths in the test cases had a lower cost than the reference path, which resulted in a negative difference.
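As a rough sketch of the kind of extraction step just described (this is illustrative code, not the dissertation's Figure 3.10 algorithm; the names, the bilinear gradient estimate, and the stopping rule are assumptions), a fractional-step descent over the settled potential surface can be written as follows.

    import numpy as np

    def descend(potential, start, goal, step=0.5, tol=1.0, max_steps=10000):
        # Sketch of fractional-step gradient descent over the potential surface.
        # potential is an (N, N) array of settled neuron outputs; start and goal
        # are (row, col) grid coordinates.  The gradient at a fractional position
        # comes from a bilinear interpolation of the four surrounding grid values;
        # each move is half a grid step.  Boundary handling is omitted.
        def local_gradient(p):
            i, j = int(np.floor(p[0])), int(np.floor(p[1]))
            fi, fj = p[0] - i, p[1] - j
            z = potential[i:i + 2, j:j + 2]
            d_di = (1 - fj) * (z[1, 0] - z[0, 0]) + fj * (z[1, 1] - z[0, 1])
            d_dj = (1 - fi) * (z[0, 1] - z[0, 0]) + fi * (z[1, 1] - z[1, 0])
            return np.array([d_di, d_dj])

        p = np.array(start, dtype=float)
        path = [p.copy()]
        for _ in range(max_steps):
            g = local_gradient(p)
            norm = np.linalg.norm(g)
            if norm == 0.0:
                break
            p = p - step * g / norm              # descend toward the goal (sink)
            path.append(p.copy())
            if np.linalg.norm(p - np.asarray(goal, dtype=float)) < tol:
                break
        return np.array(path)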
[Figure 4.2: Contrasting Sequential Search Path with Gradient Descent Path]

4.2.2 Results of the Parallel Search Architecture

Figure 4.3 shows an example of the multiple paths computed by the architecture along with the benchmark optimal path. The contour plot depicts the cost function of the test case. In the figure, seven paths are illustrated which are separated at the start node by a 45 degree direction angle.

Figure 4.4 illustrates the solution cost from the simulated architecture. The path cost shown is that of the least cost path over 360 headings around the start node with a one degree increment. The algorithm for the parallel path planning method is shown in Figure 4.5. This algorithm implements the neural network simulation. At the end of each cycle of the simulated architecture, the path defined by the flux line at each of the 360 angles was computed and the cost of the path accumulated. The plotted cost is the minimum of all of these 360 possible paths at each pass of the simulation. This figure illustrates the convergence of the minimum cost path to asymptotic values for the electrostatic model.

Figure 4.6 is a plot of the convergence over time of the simulated architecture. Each of the eight test cases exhibited a similar convergence profile. Figure 4.7 is a plot of the heading from the start node of the minimum cost path after each pass of the algorithm. This illustrates how the heading of the minimum cost path settles. All of the cases also exhibited a similar global profile in heading. For each of these plots, however, the local variations of the curves will change from one cost function to another.

[Figure 4.3: Multiple paths computed by the simulated architecture, shown with the benchmark optimal path on the cost function contours]
[Figure 4.4: Least cost path value found during the neural network simulation]

    1.  begin PPPH
    2.    real totdiff : total sum-squared difference;
    3.    real stest : convergence test criterion;
    4.    integer ncycle : number of iterations;
    5.    integer maxcycles : iteration limit;
    6.    integer nsize : neural network square size;
    7.    real u(nsize, nsize) : neural network outputs;
    8.    real x(nsize, nsize) : previous u;
    9.    real ssd : sum squared difference;
    10.   integer j, k : general indices;
    11.   while ( totdiff > stest and ncycle < maxcycles )
    12.     Compute neural network update;
    13.     for j ← 1 to nsize
    14.       for k ← 1 to nsize
    15.         ssd ← u(j,k) - x(j,k);
    16.         ssd ← ssd * ssd;
    17.         totdiff ← totdiff + ssd;
    18.       endfor
    19.     endfor
    20.     totdiff ← totdiff / nsize / nsize;
    21.     totdiff ← SQRT(totdiff);
    22.     ncycle ← ncycle + 1;
    23.     x ← u;
    24.     for j ← 1 to 360
    25.       Perform gradient descent over u at angle j;
    26.     endfor
    27.   endwhile
    28. end PPPH;

Figure 4.5: Algorithm of the Neural Network Parallel Path Planning Simulation

[Figure 4.6: Convergence of the neural network during the simulation]
[Figure 4.7: Least cost path heading during the neural network simulation]

The electrostatic model neural network becomes an element in a system which can be applied to the generation of aircraft trajectories. Such a trajectory generator is shown as a block diagram in Figure 4.8. Inputs to the neural network are the start and goal coordinates for the trajectory, and the explicit cost function values for each cell of the problem grid. In the PPPM algorithm, the network settling is checked against an input parameter for convergence. This check tests whether the sum-squared difference of all the neuron outputs between successive iterations is less than the input parameter. When the neural network settles to its stable state, the neuron outputs define the potential surface solution for the electrostatic model. This surface is provided to the gradient descent procedure, along with the start and goal coordinates, and a discrete signal indicating that the network outputs are settled. The descent procedure also receives a direction angle as an external input. This angle is the heading at the start coordinate used to initialize the descent procedure. The procedure then outputs the list of trajectory coordinates which are obtained from the descent starting with this heading. The accumulated cost along this trajectory is also provided with the list of coordinates. Once the neural network has settled, new direction angles may be input to direct the trajectory generator to supply a new list of trajectory coordinates.

This is used in the trajectory generation system which is illustrated in Figure 4.9. The trajectory generator of Figure 4.8 is contained in this system as shown in the block diagram. The current aircraft heading is supplied to a trajectory analysis function as an initial heading. The current aircraft location is supplied to the trajectory generator as the start coordinate. The trajectory analysis function passes the initial heading as the current direction angle to the neural network. When the network is settled, the trajectory coordinates and the final accumulated cost of the trajectory are input to the analysis function. This function can then adjust the direction angle and receive a new set of trajectory coordinates and accumulated cost from the neural network, without a re-settling of the network, since the cost, start, and goal conditions have not been changed. This allows the analysis function to employ comparison techniques to converge on the minimum cost trajectory as a function of heading. Examples of techniques to use could be a binary search over heading, or a branch and bound search over heading. When the analysis function finds the best trajectory according to its search criteria, it signals that the trajectory has been found and the corresponding trajectory coordinates are output.
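A minimal sketch of such an analysis loop follows (illustrative only: extract_path is a hypothetical helper standing in for the gradient descent procedure of Figure 3.10, and the coarse-then-fine sweep is just one of the comparison strategies mentioned above).

    def best_heading(extract_path, potential, start, goal, coarse_step=45, fine_step=1):
        # Sketch of the trajectory analysis function.  extract_path(potential,
        # start, goal, heading) is assumed to run gradient descent from the start
        # coordinate at the given initial heading and return (coordinates, cost).
        # Because the network stays settled, each new heading only re-runs the
        # descent, never the network relaxation.
        def sweep(headings):
            best = None
            for h in headings:
                coords, cost = extract_path(potential, start, goal, h % 360)
                if best is None or cost < best[1]:
                    best = (h % 360, cost, coords)
            return best

        h0, _, _ = sweep(range(0, 360, coarse_step))
        lo, hi = h0 - coarse_step, h0 + coarse_step
        return sweep(range(lo, hi + 1, fine_step))   # (best heading, cost, coordinates)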
[Figure 4.8: System Block Diagram of the Electrostatic Model Neural Network (inputs: cost function, start and goal coordinates, direction angle; outputs: network settled signal and path coordinates from gradient descent)]
[Figure 4.9: Block Diagram of a Trajectory Generation System]

4.2.3 Multiple Path Results

One of the features of the electrostatic model for path planning is the generation of multiple paths in the solution space. Once the potential surface is computed by the neural network, gradient descent from the start node over the surface extracts the path. The initial direction angle, which is given as a heading, determines a unique path to the goal node. For all of the test cases, a one degree increment was used at the start node heading to create the unique paths. This section reviews the results of all the test cases for their multiple paths.

Table 4.3 shows the best ten paths for Test Case A. The costs of the first 10 paths had a range of 0.005, which is approximately 1.5 percent of the best path cost of 0.3287. The PPPH algorithm computed a total of 170 paths for Test Case A. The costs of these paths are plotted in sorted order from the best value to the worst value in Figure 4.10.

Table 4.3: Costs and Headings for 10 Best Path Values of Test Case A

    RANK    Path Cost Value    Heading
    1       0.3287             131
    2       0.3306             132
    3       0.3312             133
    4       0.3319             125
    5       0.3319             126
    6       0.3325             127
    7       0.3325             128
    8       0.3325             134
    9       0.3331             129
    10      0.3337             130

[Figure 4.10: Plot of Test Case A Multiple Path Values Ordered by Cost]

The figure clearly indicates a linear cost increase of only 0.0006 units per path up through the first 157 paths which were generated. This is a mere 0.18 percent increase in the cost per path. Overall, the 157th path cost of 0.4206 represents only a 28 percent increase in the path cost from the best path value of 0.3287. Of the 170 paths generated for Test Case A, these 157 nearly linearly increasing path values account for 92 percent of the solutions which were generated by the PPPM algorithm for Test Case A.

The best 10 path values for the remaining test cases are contained in Table 4.4 through Table 4.9. The plots of the ordered path values are displayed in Figure 4.11 through Figure 4.16. These data show that similar results hold for all of the test cases. All test cases show that the increase in path cost values for the multiple paths starts out linearly, followed by a more dramatic rise in the path cost values. Over the linear region of these path cost values, the slope of the linear increase averaged only 0.22 percent of the best path value per new path. This linear portion of the multiple paths averaged 63 percent of the multiple paths generated by the algorithm PPPM over all of the test cases.
Table 4.4: Costs and Headings for 10 Best Path Values of Test Case B

    RANK    Path Cost Value    Heading
    1       7.672              92
    2       7.678              96
    3       7.694              95
    4       7.701              93
    5       7.703              94
    6       7.725              89
    7       7.729              97
    8       7.742              90
    9       7.790              91
    10      7.793              99

[Figure 4.11: Plot of Test Case B Multiple Path Values Ordered by Cost]

Table 4.5: Costs and Headings for 10 Best Path Values of Test Case C

    RANK    Path Cost Value    Heading
    1       0.8397             52
    2       0.8404             51
    3       0.8417             50
    4       0.8418             77
    5       0.8422             76
    6       0.8428             75
    7       0.8456             74
    8       0.8459             72
    9       0.8463             81
    10      0.8464             37

[Figure 4.12: Plot of Test Case C Multiple Path Values Ordered by Cost]

Table 4.6: Costs and Headings for 10 Best Path Values of Test Case D

    RANK    Path Cost Value    Heading
    1       0.4298             65
    2       0.4304             64
    3       0.4315             63
    4       0.4320             62
    5       0.4323             74
    6       0.4324             73
    7       0.4330             72
    8       0.4331             61
    9       0.4336             71
    10      0.4338             60

[Figure 4.13: Plot of Test Case D Multiple Path Values Ordered by Cost]

Table 4.7: Costs and Headings for 10 Best Path Values of Test Case E

    RANK    Path Cost Value    Heading
    1       19.00              71
    2       19.50              72
    3       19.50              74
    4       19.50              75
    5       20.00              68
    6       20.00              76
    7       20.00              77
    8       20.00              78
    9       20.00              79
    10      20.50              69

[Figure 4.14: Plot of Test Case E Multiple Path Values Ordered by Cost]

Table 4.8: Costs and Headings for 10 Best Path Values of Test Case F

    RANK    Path Cost Value    Heading
    1       18.00              336
    2       18.50              337
    3       18.50              338
    4       18.50              339
    5       18.50              340
    6       18.50              341
    7       18.50              342
    8       18.50              343
    9       18.50              344
    10      18.50              345

[Figure 4.15: Plot of Test Case F Multiple Path Values Ordered by Cost]

Table 4.9: Costs and Headings for 10 Best Path Values of Test Case G

    RANK    Path Cost Value    Heading
    1       36.50              13
    2       38.00              12
    3       38.00              14
    4       38.00              15
    5       38.00              16
    6       38.00              20
    7       38.00              21
    8       38.50              9
    9       38.50              10
    10      38.50              11

[Figure 4.16: Plot of Test Case G Multiple Path Values Ordered by Cost]

4.3 Variable Connectivity Results

4.3.1 Taylor Series Template Method

Table 4.10 shows the number of cycles for each simulation run of the neural network using the templates generated in Section 3.3.1.

Table 4.10: Neural Network Simulation Iterations Using Taylor Series Templates

    Template Size    Cycles per Case
                     A        B         C
    5 point          4,000    3,484     5,460
    13 point         3,647    2,251     4,015
    25 point         3,621    2,612     4,518
    41 point         6,947    15,000    10,140

These runs were executed on a VAX 8600 until the convergence input parameter was satisfied at the end of each iteration of the simulation (or until 15,000 iterations of the simulation were executed). The sum-squared difference of all neuron outputs (TOTDIFF in the PPPM algorithm) is checked against the input parameter to test for convergence.
For the simulation runs, the convergence input parameter was arbitrarily set to 0.001. In each case, it can be seen that fewer cycles of the parallel algorithm are required for convergence using the 13 point template over the 5 point template. While this improvement continues marginally for the 25 point template in Case A, in the other cases the number of cycles to achieve convergence increased. Continued expansion to the 41 point template resulted in a further, and rather significant, degradation in convergence for all cases.

Table 4.11 contains data for the final minimum cost path value at the end of the algorithm execution. Each larger template failed to provide more accurate (smaller) cost values in each case. In fact, the cost values increase with the larger templates using the Taylor series method.

Table 4.11: Least Cost Path Values Using Taylor Series Templates

    Template Size    Path Cost per Case
                     A        B        C
    5 point          0.329    7.672    0.840
    13 point         0.331    7.599    0.937
    25 point         0.331    7.631    1.006
    41 point         0.333    7.826    1.068

The results reported here illustrate the failure of the Taylor series template method which was reported in Section 3.3.4 and led to the development of the parallel summation template method.

Figure 4.17 through Figure 4.19 are composite plots of the minimum cost of successive algorithm iterations for the four increasingly complex Taylor series template connectivity patterns. Over 135 simulations were run and their results were plotted. These plots show the value of the minimum cost path found by the neural network during each cycle of the run. This was repeated for the more connected patterns, and each run was plotted for a particular test case. A family of curves results which shows the computed cost over computation time for the varying connectivity. These plots show the impact of increased connectivity on the quality of the computation results.

Each figure illustrates that the convergence patterns are similar for all templates. The plots of different test cases have a different character or general shape. There are three considerations to explain this phenomenon. First, each test case had a different cost function. Not only was the size of the problems different, but the variability of the cost function was also different. The cost functions included rather smooth surfaces, sharp edges, or frequent hills and valleys. The unique surfaces of these cost functions result in the different characteristics of these connectivity curves. The second reason is that the value plotted is for the minimum cost path found thus far in the algorithm iteration. This can be any flux line, and the heading plot of Figure 4.7 shows how much this changes at the beginning of the algorithm. The final consideration is that the plot illustrates tentative results towards convergence. The path value depends on the latest potential surface computed by the neural network at that point in the algorithm convergence. Again, Figure 4.6 illustrates that during the early iterations of the algorithm, the potential surface has not converged to a stable configuration. These reasons considered together account for the differences in the curves between the test cases.

Upon close inspection of the plots, the desired result of improvement from using a more complex template connectivity pattern does not occur. Indeed, as the template connectivity is increased, there is actually a decrease in the convergence, exhibited by a higher final cost of the minimum cost path.
However, these connection patterns are very small on the scale of connectivity. In order to get more significant results at larger connectivity, much larger templates are required. Additional constraints are added to convert the template coefficient equations (Equation 3.46 through Equation 3.51) from an underdetermined system. These constraints are intuitive, but somewhat ad hoc, trying to account for behavior farther from the approximation point at the center of the template.

As the connectivity increases with the larger templates, it was hoped that the cost plot would drop sooner for each increase in connectivity, reaching the minimum cost in less time. The larger templates use more of the network node outputs. As changes to node outputs occur in a remote region of the network more distant from a local node of interest, these larger templates facilitate propagation of the remote changes to each local node's output computation. However, plots of the simulation results show that the larger templates are not providing a better approximation. An examination of Figure 4.17 through Figure 4.19 reveals that the more highly connected templates are not, in general, converging faster.

The template defines the differential operator for the approximation. Expansion of the template would seem to run counter to the definition of a derivative in the limit, which would argue for contracting the template instead of expanding it. As the cost function varies more, less accurate approximations appear with the larger templates, which argues for smaller templates in order to have more accurate approximations of the derivatives under higher frequency cost function variation.

[Figure 4.17: Connectivity Curves for Test Case A Using the Taylor Series Templates]
[Figure 4.18: Connectivity Curves for Test Case B Using the Taylor Series Templates]
[Figure 4.19: Connectivity Curves for Test Case C Using the Taylor Series Templates]

4.3.2 Parallel Summation Method

The parallel summation method provided a solution to the convergence and accuracy problems of the Taylor series templates of larger connectivity. The solution did not change the underlying accuracy of the five-point approximation to the derivative, but rather achieved increased connectivity through the computation architecture of the template.

Figure 4.20 through Figure 4.22 are composite plots of the minimum cost of successive algorithm iterations for the increasingly complex template connectivity patterns derived using the parallel summation architecture of Section 3.3.4. Each figure illustrates that the convergence patterns are similar for each template.
However, unlike the previous template patterns derived from the Taylor series, these template patterns exhibit the improvement expected from using increased connectivity, as hypothesized in Section 3.3.4. As the template connectivity is increased through more parallelism in the approximation summation, the network architecture accomplished a more rapid decrease in minimum cost paths. In each plot, the increased connectivity is indicated by a larger index, k. Each test case has a similar behavior. The higher the connectivity index, the faster the minimum cost drops during the algorithm iteration.

[Figure 4.20: Connectivity Curves for Test Case A Using the Parallel Summation Technique]
[Figure 4.21: Connectivity Curves for Test Case B Using the Parallel Summation Technique]
[Figure 4.22: Connectivity Curves for Test Case C Using the Parallel Summation Technique]

4.4 Connection Machine Analog Simulation Results

Hardware implementations of neural networks have been offered[12, 15, 30, 31] using many technologies. Analog neural networks have received a great deal of interest due to their speed and potential for VLSI implementation. This section reports on an investigation of an analog implementation of the neural network architecture for the electrostatic model. Because of the number of neurons involved in the test case (3,600), this required 7,200 operational amplifiers for an analog circuit. Rather than building such a massive circuit using breadboard techniques, a simulation of the analog circuit was developed for execution on a Connection Machine. The Connection Machine was selected because of the ease with which the simulation could be programmed for parallel execution and the resulting simplicity of the neuron model. Each processor of the Connection Machine was devoted to simulating a single neuron.

The analog circuit simulation is based upon the neuron model illustrated in Figure 3.9. For simulating the neural network architecture on the Connection Machine[28], a simple operational amplifier model[66] was employed as shown in Figure 4.23. A block diagram of the function of the inverting operational amplifier is shown in Figure 4.24. This block diagram computes the function

    e_o = -\omega_H \int e \, dt

where \omega_H is the open-loop gain of the amplifier. The digital simulation is implemented as shown in Figure 4.25.

[Figure 4.23: Operational Amplifier Schematic]
[Figure 4.24: Operational Amplifier Block Diagram]
[Figure 4.25: Op Amp Digital Simulation Block Diagram]
[Figure 4.26: Configuration of an Op Amp Adder with Gain]

The configuration employed in the implementation of the model using inverting operational amplifiers is shown in Figure 4.26. The operational amplifier attempts to match its inputs. Since the positive input is grounded, the output of the amplifier will be such that the negative input will be driven to match the ground state through the feedback path. The equations for the configuration begin with applying Kirchhoff's current law to the junction at the negative input to the op amp, labelled as e. Since there is a very high impedance at the input to the op amp,
The equations for the configuration begin with applying Kirkoff’s current law to the junction at the negative input to the op amp, labelled as e. Since there is a very high impedance at the input to the op amp, —>®e—>—§- -A 138 Figure 4.25: Op Amp Digital Simulation Block Diagram Figure 4.26: Configuration of an Op Amp Adder with Gain ”co 139 any current flow to or from the circuit node e and the op amp at the negative input can be ignored. So the first equation needed is C -—-e 8 ‘—-e ‘ + ° = 0 4.1 Solving for 6, gives —__R_f__e +__R‘_e R,+R, ‘ R,+R. ° 6 (4.2) But since the operational amplifier is driving its output so that its inputs match and the positive input is ground, 6 = O, and Equation 4.2 reduces to 6,, = —-——e, (4.3) Thus, the closed loop gain provided by the operational amplifier is R GAI = 52 = ——’ 4.4 e, R. ( ) For n inputs, (:1, . . . , en, rather than the single input e,, this configuration general- izes to 61 62 en e,,=—R (—+—+.-.+—) 4.5 ’ R1 R2 12.. ( ) Figure 4.27 is an illustration of the op amp schematic for implementing the 5- point finite difference approximation template which computes the weighted sum of its four neighbors. 140 3:? Q0 3.6: coamEEonQc‘ .36 do“ 03838 5:62 Kmé 239m e> a 141 To set up for the simulation of the operation amplifier neural network, a con- ductance, G, for each input connection is computed. This conductance, G, is the inverse of R. In the following procedure, the subscripts R,L,U,D denote (respectively), the right, left, up, and down neighbor nodes in the finite difference approximation template. The procedure is as follows: 1. Establish the reference conductance, G, 2. For each input connection weight, 10,, compute its corresponding conduc- tance value G,- = 0,112,, where w,- = a,/a,,, as computed in Equation 3.59 through Equation 3.63 and shown in Table 3.2. 3. Compute the voltage divider coefficients G VDCR=GR+GL+GZ+GD+Gf G VDCLzGR+GL+GS+GD+Gf G VDCU:GR+GL+G:+GD+G, G VDCDzGR+GL+G3+GD+Gf VDC,= 0’ GR+GL+GU+GD+Gf 4. Implement the operational amplifier by computing 61 = VRVDCR “I" VLVDCL + VUVDCU + VDVDCD + eaVDCf eo——A/e1dt _2+K ‘32 2 2 142 Figure 4.28 and Figure 4.29 both illustrate the response of the simulated opera- tional amplifier used in the Connection Machine simulation of the neural network architecture. Figure 4.30 illustrates the convergence of the neural network simulation con- ducted on the Connection Machine. The simulation time step is 200 nanoseconds. This simulation converged to within 0.000001 of the results of the digital simulation of the network in approximately 18 milliseconds of real time. m a see a 55. ads as . e as»: 8335.6 use. no Home 2:9... Amocouomosumtv one» and cud ovfi Cad 006 nu n and nué '6 _ HIP, r rll+|Mm _r m w _ u _ mm 2:] fill m z / j .. w m .p 1 g N ,IK r .. m azaaao eDQC~ o.mduxae> .nmo._u¢ .Numm\.m .~:x emu “ 144 as: as. 2 Re w a sec 5.5. ads as, 2 m use: 83:55 85 do ”was 9:9”. lOWdHUdO Anacouomoguuav asap _O cud OQN Wad a... nu." on." ma; ... . i . i J-“ _ u T / / / w... I ( / m m \ m .u .ga.3o fl u :ac_ u o.mduxee> .mms.~uc .Nummxwm .Nzy omw u 145 8399.30 oERfiot 8.00on assocm «.20 co E9502 $502 .36 do :oamEEB “.526 uoRSx Home 959“. nutcouons__msv.osah 8..” 8.0 8.. n... I! ‘t It '8 l8 '1 I '0 0093105 ‘0 '9 II '9 R 'l 146 4.5 Timing Results Table 4.12 shows the timing results for the electrostatic model running on a VAX 4000 Workstation with a 6 MFLOPS performance for each of the test cases. 
4.5 Timing Results

Table 4.12 shows the timing results for the electrostatic model running on a VAX 4000 Workstation with a 6 MFLOPS performance for each of the test cases. The second column shows the size of the problem, defined by the total number of neurons, which corresponds to the number of states in the problem. The third column is the time in milliseconds for the network to compute the potential for all nodes of the network. This time is divided by the network size and the result is contained in the fourth column as the average computation time per neuron for the parallel architecture. The average for all cases was 99.7 microseconds with a standard deviation of 8.2 microseconds. This data provides a baseline for comparison with the Connection Machine implementation and the projected analog circuit performance. It also indicates that the network scales linearly on the VAX (not surprising, since the algorithm loops on the neuron indices). More important is that this shows results for scaling with neural networks of significant sizes (441 neurons up to 3,600 neurons); the larger network is roughly eight times the size of the smaller. Many of the neural networks studied in the past have dealt with at most a few hundred neurons.

Table 4.13 shows the results of the average Connection Machine (CM-2) time per cycle update for the parallel network simulation. The results are in milliseconds. The average cycle time for all cases was 5.4 milliseconds with a standard deviation of 0.3 milliseconds. This parallel computation occurred using 16,384 processors for an actual network of 16,384 neurons. Using the result of 99.7 microseconds per neuron for the VAX, it would require 0.0997 x 16384 = 1633.5 milliseconds per cycle to simulate the 16,384 node network on the VAX. The ratio between these implementations is

    Speedup = (VAX time for 16,384 neurons) / (CM time for 16,384 neurons) = 1633.5 msec / 5.4 msec ≈ 303

Thus the Connection Machine implementation results in a performance improvement for the neural network of 300 times over the VAX, due to its explicit parallelism.

The Connection Machine employed was a 16K CM-2 at the Argonne National Lab. It is a data parallel computing system which associates one processor with each data element of an array. There are two parallel processing units, each of which has 16,384 data processors. Each data processor has 64K bits of bit-addressable memory and an ALU. The ALU is a bit processor, and arithmetic is a bit-serial process. One floating point accelerator exists for every 32 data processors. The performance specification for single precision floating point is 2500 MFLOPS[69]. Comparing this with the 6 MFLOPS performance of the VAX yields an expected speedup of 416. This is consistent with the results obtained above, considering that the timing for the neural network included other processing. This additional processing of the neuron outputs included saving previous values, range limiting, and clamping boundary conditions.

Since neural networks are fundamentally different from Von Neumann processors, they have their own performance criteria. Rather than using a measure of instructions per second (IPS), neural networks have been rated using the measure of connections per second (CPS).
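The arithmetic behind these figures is easy to check; the short script below (illustrative, using only the numbers reported in this section) reproduces the per-cycle VAX time, the CM-2 speedup, and the connections-per-second rates.

    # Values reported in this section.
    neurons = 16_384                 # simulated network size
    connections = 65_536             # four connections per neuron (5-pt template)
    vax_us_per_neuron = 99.7         # average VAX time per neuron, microseconds
    cm2_cycle_ms = 5.4               # average CM-2 cycle time, milliseconds

    vax_cycle_ms = vax_us_per_neuron * neurons / 1000.0
    print(f"VAX cycle time for 16,384 neurons: {vax_cycle_ms:.1f} ms")          # ~1633.5 ms
    print(f"Speedup over the VAX: {vax_cycle_ms / cm2_cycle_ms:.1f}x")          # ~302.5x
    print(f"VAX rate: {connections / (vax_cycle_ms / 1000.0) / 1e3:.0f} KCPS")  # ~40 KCPS
    print(f"CM-2 rate: {connections / (cm2_cycle_ms / 1000.0) / 1e6:.1f} MCPS") # ~12 MCPS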
Table 4.12: VAX Timing Results for the Neural Network Simulation

    CASE    Network Size     Average Network Time    Normalized Neuron Time
            (n^2 Neurons)    (milliseconds)          (microseconds/neuron)
    A       900              91.6                    101.8
    B       3,600            369.3                   102.6
    C       2,500            256.5                   102.6
    D       4,225            436.3                   103.6
    E       441              45.1                    102.3
    F       441              45.7                    103.6
    G       676              54.9                    81.2

Table 4.13: CM-2 Timing Results for the Neural Network Simulation

    CASE    Average Cycle Time (milliseconds)
    A       5.5
    B       5.9
    C       5.5
    D       5.2
    E       5.1
    F       5.5
    G       5.1

Table 4.14 gives the data for the neural network performance in terms of connections per second.

Table 4.14: Analysis of Connections per Second on the VAX

    CASE    Neurons    Connections    Time/Neuron    KCPS    Time/16k       B&B Time
                                      (microsecs)            (millisecs)    (seconds)
    A       900        3,600          101.8          39      1,668          71
    B       3,600      14,400         102.6          39      1,681          696
    C       2,500      10,000         102.6          39      1,681          319
    D       4,225      16,900         103.6          39      1,697          846
    E       441        1,764          102.3          39      1,681          26
    F       441        1,764          103.6          39      1,697          26
    G       676        2,704          81.2           49      1,330          54

The Time/Neuron column is given in microseconds and has an average for all cases of 99.7 microseconds with a standard deviation of 8.2 microseconds. The KCPS column is the kilo-connections per second rate, with an average of 40 KCPS and a standard deviation of 4 KCPS. The sixth column is the time required in milliseconds for a 16,384 node VAX cycle update. Its average is 1,636 milliseconds with a standard deviation of 134 milliseconds. Finally, the last column shows the time for the branch and bound algorithm to find the optimal path for each test case.

Recall that the time for the Connection Machine to update 16,384 neurons with 65,536 connections was 5.4 milliseconds. This gives an effective performance rate, defined in terms of the VAX performance, of 12 million connections per second. This is approximately 300 times better than the VAX performance of 40 KCPS, and, again, consistent with the 16k CM-2 performance.

Chapter 5

Discussion of Results

This chapter discusses the results obtained in the research. The first section presents a general result of the research concerning the approach for the design of massively parallel architectures. Following the discussion of this design paradigm, the path planning results in the areas of multiple path generation, variable connectivity, and network performance are discussed.

5.1 Design for Massively Parallel Architectures

Three distinct computer architectures are the Von Neumann processor, the multiprocessor network, and the massively parallel network. The Von Neumann architecture is characterized as a single processor (CPU) with a stored program memory. An example of the Von Neumann processor is a microprocessor such as the Motorola 68020 or the Intel 80386. The multiprocessor network is a group of Von Neumann processors communicating through an interconnection mechanism, such as a common bus. The interconnection topology is a federated arrangement, or sometimes a communications switching network. The neural network is an example of massive parallelism, where an enormous number of simple non-Von Neumann processing units are locally connected to many other processing units.

5.1.1 The Characteristics of Parallelism

A distinguishing feature of an architecture is its degree of parallelism. Parallelism exists at many levels within the complex structure of even a Von Neumann processor. At the electrical circuit level, parallelism is exhibited within semiconductors as analog signals, such as the simultaneous propagation of current throughout the device.
At the next level, these semiconductors are brought together into a digital device. Parallelism is also illustrated here as discrete voltage levels change simultaneously at various places within a digital circuit. Even at the hardware function level, such as I/O processing, instruction fetching, or cache memory operation, parallel activity occurs.

However, considering software as the means to specify computation activity, there is no parallelism in the algorithms programmed for a single CPU, since the processing unit can decode and execute only one instruction at a time. More complex computations are involved with a single CPU than with the parallelism which is described next. For multiprocessor networks, the granularity of parallelism is coarse. Coarse granularity implies a low level of communication activity between the processors. These processors are performing simpler computations on the algorithm than in the single CPU. However, there is increased processing to handle the communications. The massively parallel architecture has a fine granularity and a high communication to computation ratio, due to simple computations and a high degree of communication between adjacent computing units. This architecture has the algorithm "impressed" onto its medium in such a way that the algorithm solution emerges from local processing.

Consider the design approach in using these three architectures to solve problems. For the single CPU, classic design methodologies are used. These include flowcharts, stepwise decomposition, and structured analysis and design, for example. Many of these methods are also used for multiprocessor designs, but additional complexity requires careful attention to algorithm requirements. Control and data partitioning of algorithms for process allocation over several multiprocessors is required. These requirements have to be carefully distinguished and managed to avoid inaccurate results and other harmful side effects.

Advancements in computer architectures and hardware technology have led to practical implementations of massively parallel computing[12, 15, 16, 28, 33]. The first fundamental question for this research from Section 1.2 asked if, in designing massively parallel architectures, any unifying principle exists for the distinct roles of application, algorithm, and architecture. Because of its characteristic of collective computation, a new way of thinking about the role of application, algorithm, and architecture in the design of massive parallelism was needed. The design challenge was to determine an appropriate model of the interdependence between the application problem to be solved, the algorithm which solves the problem, and the massively parallel architecture which solves the problem using collective computations. A key discovery of this research was a new design paradigm for massively parallel architectures which combines these interrelated issues.

5.2 A Massively Parallel Architecture Design Paradigm

A new design paradigm of natural parallelism has been discovered in the course of this research. This paradigm is an architecture design concept for massively parallel architectures. The motivation for this paradigm was to meet a design need in the evolution of computer architectures towards massive parallelism. The paradigm of natural parallelism is hypothesized to meet this need. The results of experimental research presented in this dissertation have verified the usefulness of the paradigm.
A design paradigm needs to unify the issues of application, architecture, and algorithm in massive parallelism. This new method determines a mapping onto the processing elements and interconnections of the massively parallel architecture from the application model. This new paradigm for massively parallel architecture design is called Natural Parallelism. The natural parallelism paradigm uses phenomena in nature to suggest analogies for the computational model of the application problem.

5.2.1 The Paradigm of Natural Parallelism

Recent developments in massively parallel architecture technology have increased its capacity while shrinking its size. To capitalize on the advantages offered by these new architectures, a new approach to their design is needed. This new approach is based on the increased density of hardware implementations for massively parallel architecture technology. Processing units of the architecture are dedicated to explicit, unique state assignments of the problem, given a suitable state variable resolution.

Consider how discovery of nature is done. Nature is observed and a hypothetical model is conjectured to express natural law. This model is an approximation of the physical phenomenon which is observed, and is expressed as mathematical equations in physics. When computing solutions to these equations, a numerical simulation of nature's behavior is often performed. Now consider this viewpoint reversed. Nature is the perfect analog computer for these same equations. Nature is the analogy for solving problems described by the mathematical equations of physics. Thus, the new approach for massively parallel design consists of 1) determining a natural phenomenon which is an analog of the abstract problem, 2) creating an analytical model of the natural phenomenon, 3) constructing a mapping of the analytical model to the abstract problem, and 4) implementing the massively parallel architecture for the mapping.

Recall that in Chapter 1, problems were described as state transition digraphs. Nature can also be described by states and operations which transform these states. This provides another analogy to massively parallel computation. Rather than the traditional sequential computer system, in which a single processor fetches data from memory, modifies the data, and puts the data back into memory, consider memory which has a computation capability. This computation is accomplished by exchanging and modifying values among the data elements of the memory. When the data elements of the memory represent state assignments of a problem, the memory values are possible state variable solutions. This structure provides an explicit mapping between processing nodes of a massively parallel architecture and state assignments of a physics problem. The mapping between state assignments of the natural analogy and the processing nodes occurs by a sampling of the state space into a computing grid with appropriate sampling resolution.

In the suggested analogy, the states of nature correspond to computing nodes, as described in the reversed viewpoint above. The explicit representation of the application problem states defines the processing elements in the computing structure. Application solutions are computed numerically in a massively parallel architecture which is implemented in a grid structure corresponding to a sampling of states in the problem space.

5.2.2 Understanding the Natural Parallelism Paradigm
Synergism de- scribes phenomena where elements of a system work cooperatively in such a way that the total effect of the system exceeds the sum of the effects of the inde- pendent elements. Similar concepts are available from other disciplines. Holism was defined in 1926 by evolutionists as the orderly grouping of unit structures into whole bodies or organisms in nature[67]. Holism was argued to be a synthesis which makes parts act as one. According to John Hopfield, the physiological computation of biological neural networks can be viewed holistically as sponta- neous behaviour of the whole network: 156 The bridge between simple circuits and the complex computational properties of higher nervous systems may be the spontaneous emer- gence of new computational capabilities from the collective behaviour of large numbers of simple processing elements[29]. The synergy described above is represented as collective behavior of simple processing elements. Similarly, the natural parallelism paradigm has a synergy arising from its own defined elements of application, algorithm, and architecture. Each of these elements contributes to a holistic effect of the design paradigm. The relationship between the three elements of the natural parallelism paradigm is shown in Figure 5.1. Each element of the paradigm influences the other two el- ements. While individual descriptions of its elements do not characterize the syn- ergy of the paradigm, it does help for understanding the concept of the paradigm. In this figure the aim is to get a sense of the unifying character of the paradigm. The paradigm tries to address massively parallel design in a holistic manner. The three elements of the paradigm are algorithm, architecture, and applica- tion. The application is the problem requiring a solution, which is the purpose of the system. The algorithm is the model for using natural parallelism in the system. The architecture is the implementation of the system. Consider the interaction between the elements of the paradigm as depicted by the arrows in Figure 5.1. The application concept suggests an analogy in nature which is the basis of the algorithmic model. The inverse relationship characterizes the type of problems which can be solved by the model. The algorithm defines the mapping needs for the architecture. The architecture, in turn, specifies the neuron and weight functions to the algorithm. The influence of the application on 157 Hath nodd nape to Mn Neuron transfer function \ W on God for and/dainiafinition max analyst ARCHITECTURE APPLICATION Explicit state dalinition and problem in ‘ State a ace mapping and amp in; roquimt Figure 5.1: Relationship Among Elements of the Natural Parallelism Paradigm the architecture develops state space mapping requirements such as the spatial sampling resolution. Conversely, explicit definition for the application problem states is provided by the architecture and its problem size is scoped. 5.2.3 Using the Natural Parallelism Paradigm In Figure 5.2 the process of applying the paradigm of natural parallelism is illustrated. The process begins with the selection of some problem of interest. The first step in the process is to look for analogies in nature for the selected problem. When an appropriate analogy is found, the natural parallelism concept is formed. This concept from the paradigm is the key to the massively parallel design. 
The relationship of the concept to the application is understood and is explainable. The algorithm is identified with the natural law which applies to the concept. The architecture is mapped through analysis of the concept to define the computational structure of the network. The natural law for specifying the algorithmic solution to the problem is a result of the analysis of the physics model used in the natural parallelism concept. The mathematical model of the natural law is used in constructing the computation structure of the architecture.

[Figure 5.2: Applying the Natural Parallelism Paradigm]

In the introduction to this dissertation, a central thesis was posed. It stated that a massively parallel computer architecture could be defined in which the computational units correspond to explicit problem states converging to problem solutions. The natural parallelism design paradigm has achieved this objective.

5.3 Discussion of Path Planning Architecture Results

Chapter 4 presented the results of the artificial neural network architecture designed for path planning. Assessment of the data from all of the test cases showed that the value of the paths generated by the neural network averaged 8.3 percent above the value of the optimal path for the test case. Although an 8.3 percent penalty may be higher than desired for the path planning application, the multiple paths generated by the method compensate for it. The data revealed a significant effect from the multiple path generation: the costs of the top-ranked paths generated by the neural network increased linearly for at least the first half of the alternative paths. Furthermore, the rate of increase averaged only 0.22 percent of the value of the least-cost path found by the neural network. So although the value of the least-cost path may be higher than that of the optimal path, the architecture will generate many additional paths which do not significantly degrade in cost.
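These two statistics can be combined into a simple, and only approximate, rule of thumb. Writing $C^{*}$ for the optimal path cost, $C_{\text{nn}}$ for the cost of the best path produced by the network, and $C_{k}$ for the cost of the $k$-th ranked alternative, the averages reported above correspond to

\[
C_{\text{nn}} \approx 1.083\,C^{*}, \qquad
C_{k} \approx C_{\text{nn}}\bigl(1 + 0.0022\,(k-1)\bigr)
\quad \text{for } k \text{ in the first half of the alternatives.}
\]

Under this reading, even the tenth-ranked alternative costs only about $1.020\,C_{\text{nn}} \approx 1.10\,C^{*}$, roughly two percent above the network's own best path. The linear form is an interpretation of the averaged results, not an additional experimental finding.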
Chapter 6
Research Conclusions

The research of this dissertation dealt with issues of architecture design in massive parallelism. In Chapter 1, the requirements associated with large, complex problems created the need for a closer match between the application problem and the massively parallel architectures and algorithms. The discovery of a paradigm to guide the design of massively parallel architectures is one of the significant results of the research. This paradigm represents the development of a working synergy between the traditionally distinct elements of application, algorithm, and architecture in computer system design. A second significant result of this research was the use of the paradigm to design massively parallel architectures. An artificial neural network was designed which can solve a class of partial differential equations. These designs were used as examples to test the paradigm. A third significant result of this research was showing how the architectures can be applied to path minimization problems.

6.1 Path Planning Using the Electrostatic Model

One of the most difficult tasks of this research was to develop a model of path planning search which had a true parallelism inherent in the model. This interest came about because strictly sequential algorithms are not directly applicable to artificial neural network architectures. Parallelized versions were not of interest either, since they are targeted at loosely coupled multiprocessor systems. The path planning model had to be directly usable on tightly coupled architectures such as artificial neural networks.

The second interest, and perhaps a more compelling requirement for path planning, was a desire for the resulting system to produce alternative, sub-optimal solutions. These alternative paths provide choices at the starting node to the user of the path planning system. For various operational considerations, users often desire alternative choices when employing automated planning systems. But this is a very difficult proposition for classical sequential search algorithms: the principle of optimality does not support the consideration of simultaneous, sub-optimal decisions.

It was at this point that the electrostatic model was conjectured to satisfy these requirements for a directly parallel model and alternative sub-optimal paths. The idea was simple enough: nature "automatically" distributes electrical current through conducting material in an optimal manner, although this optimization criterion is different from that for a least-cost path. Could the current flux lines then be used as the alternative paths? The result, as shown in Section 3.2, is that they can.

These alternative paths make intuitive sense. They avoid regions of high cost and prefer regions of low cost. The paths are also parallel alternatives which have an incremental variation between adjacent paths. Neighboring (adjacent) paths are defined using the theory of the calculus of variations. The paths are defined by a function; this function, when adjusted by a "variational" increment, defines a new path neighboring the original path[7]. The variational increment is very similar to a differential, only applied to functionals rather than to variables. Consider a uniformly conducting material with dipole charges. The electrical current flux lines for this material look very similar to the magnetic field lines which form between the north and south poles of a bar magnet. As the conducting material is modified so that its conductivity is no longer homogeneous, the flux lines warp and bend simultaneously to accommodate the changes in the conductivity of the material. Yet they remain adjacent to each other in a variational sense, preserving the alternative choices for the user. Thus, the electrostatic model was perceived as a viable approach to modeling this path planning problem.

What was needed next was an approach to implement this model on an artificial neural network.

6.2 Designing the Neural Network Architecture

The electrostatic model provided another important benefit in the research: a sound mathematical basis. This was due to the nonhomogeneous partial differential equation (the Laplacian) of the electrostatic potential field. With the boundary conditions applied, the flux lines could be extracted by a gradient descent over the potential surface from the source node (the maximum potential boundary condition) to the sink node (the minimum potential boundary condition). Therefore, the challenge was to compute a solution to the partial differential equation using an artificial neural network.
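As an illustration of the extraction step just described (independent of how the potential field itself is computed, which is the subject of the next paragraphs), the sketch below traces one path by steepest descent over an already computed potential field. The array name, the eight-neighbor step rule, and the stopping test are assumptions made for this sketch, not the extraction procedure used in the dissertation.

    import numpy as np

    def trace_path(phi, source, sink, max_steps=10000):
        # Follow the potential surface phi (a 2-D array) downhill from the
        # source cell (highest potential) toward the sink cell (lowest
        # potential), moving one grid cell at a time.
        path = [source]
        i, j = source
        for _ in range(max_steps):
            if (i, j) == sink:
                break
            best = (i, j)
            for di in (-1, 0, 1):            # examine the eight neighbors
                for dj in (-1, 0, 1):
                    ni, nj = i + di, j + dj
                    if 0 <= ni < phi.shape[0] and 0 <= nj < phi.shape[1]:
                        if phi[ni, nj] < phi[best]:
                            best = (ni, nj)
            if best == (i, j):               # stuck at a local minimum
                break
            i, j = best
            path.append(best)
        return path

Alternative paths can be obtained, for example, by starting the same descent from different cells on a small ring around the source, which is one way of imitating distinct flux lines leaving the source.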
The means to accomplish this was a finite difference approximation to the partial differential equation, developed in Section 3.3.1. A study of the finite difference approximation formulas revealed a striking similarity to the familiar equations for the artificial neural network. Thus, by an appropriate mapping of the coefficients of the finite difference approximation formulas into the weights of the artificial neural network, the neurons could compute the numerical solution to the partial differential equation. This design was successfully accomplished in this research and is described in Section 3.3.2.

A significant result of this approach is that the resulting design benefits from the classical numerical analysis of finite difference approximations for partial differential equations. Since the artificial neural network directly implements the finite difference approximation (in a truly parallel architecture), it retains the convergence and accuracy properties of the finite difference approximation. Additionally, the artificial neural network scales directly with the size of the problem, just as the finite difference approximation does. With this architecture design in place, the influence of the number of interconnections could be investigated.

6.3 Influence of Variable Connectivity

The finite difference approximation formulas map into simple interconnections between adjacent neurons in the artificial neural network. Improvements to the convergence of the network were conjectured to result from increasing the interconnections between neurons beyond the locally adjacent neurons to larger regions of the network. This required a way to increase the number of input connections to each neuron from other neurons while preserving the integrity of the finite difference approximation.

The solution was found in the parallel summation technique developed in Section 3.3.4. This approach preserved the fundamental finite difference approximation formula as a template kernel, so that the approximation of the derivative would remain consistent in larger interconnection patterns. The technique consisted of recursively unfolding the kernel: each neuron's input connections were rewired directly to the neurons that fed its original inputs, so that every unfolding drew inputs from a progressively larger neighborhood. Each recursive unfolding of the approximation template therefore resulted in a larger number of interconnections. This was shown in Section 4.3.2 to reduce the convergence time while preserving the accuracy of the approximation. The benefit of variable connectivity is to give the neural network designer tradeoffs among convergence time, accuracy, and network cost (measured in terms of the implementation complexity through connectivity). For a given accuracy and time requirement, this tradeoff specifies the amount of interconnectivity that is necessary in the artificial neural network. Conversely, for an embedded implementation of an artificial neural network with a given amount of connectivity, the design tradeoff can be used to project the accuracy obtainable from the network in a specified time interval.
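The two ideas summarized above, the finite difference template acting as a neuron update and the recursive unfolding of that template to widen connectivity, can be sketched as follows. The neighbor-conductivity weighting and the unfolding by stencil composition are illustrative stand-ins for the weight formulas and the parallel summation technique of Sections 3.3.2 and 3.3.4, which are not reproduced here.

    import numpy as np

    def five_point_weights(sigma, i, j):
        # Illustrative interconnection weights for the neuron at cell (i, j):
        # each neighbor is weighted by the local conductivity and the set is
        # normalized to sum to one (a stand-in for the actual formulas).
        w = np.array([sigma[i - 1, j], sigma[i + 1, j],
                      sigma[i, j - 1], sigma[i, j + 1]])
        return w / w.sum()

    def sweep(phi, sigma, fixed):
        # One synchronous sweep: every free neuron outputs the weighted sum
        # of its four neighbors, the five-point template for the
        # nonhomogeneous Laplacian equation.
        out = phi.copy()
        for i in range(1, phi.shape[0] - 1):
            for j in range(1, phi.shape[1] - 1):
                if not fixed[i, j]:
                    w = five_point_weights(sigma, i, j)
                    out[i, j] = (w[0] * phi[i - 1, j] + w[1] * phi[i + 1, j] +
                                 w[2] * phi[i, j - 1] + w[3] * phi[i, j + 1])
        return out

    def unfold(kernel):
        # Compose a stencil kernel with itself (a 2-D convolution).  At a
        # fixed point of phi = K * phi, the composed kernel is an equivalent
        # template whose inputs reach one ring of neighbors farther out.
        k = kernel.shape[0]
        big = np.zeros((2 * k - 1, 2 * k - 1))
        for i in range(k):
            for j in range(k):
                big[i:i + k, j:j + k] += kernel[i, j] * kernel
        return big

    five_point = np.array([[0.0, 0.25, 0.0],
                           [0.25, 0.0, 0.25],
                           [0.0, 0.25, 0.0]])
    wider_template = unfold(five_point)    # 5 x 5 template, more connections

Written this way, the tradeoff described above is visible directly: each application of unfold increases the number of nonzero weights per neuron (the connectivity) in exchange for fewer sweeps being needed for boundary information to propagate across the grid.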
6.4 The New Paradigm of Natural Parallelism

A new principle to guide the design of massively parallel architectures was conceived in this research. The paradigm which was conceived is as unique to massively parallel architectures as the sequential thinking process is unique to Von Neumann architectures. This paradigm was called natural parallelism, and it can be understood as nature acting as a massively parallel analog architecture. Nature serves as the ultimate massively parallel architecture. As such, natural phenomena can be used as analogs for abstract computation problems. Once discovered, these natural analogs can be analyzed to develop the mathematical model foundations upon which massively parallel architectures can be constructed. This concept provided a synergy that unified the distinct Von Neumann treatment of application, algorithm, and architecture in computer design.

6.4.1 Natural Parallelism Related to Algorithms

The natural law which was employed to investigate the paradigm was a class of partial differential equations in mathematical physics. These partial differential equations describe physical phenomena in field theory. Using natural parallelism, these physical phenomena were shown to describe certain optimization problems. Nature was viewed as an optimization process, exhibiting equivalent behavior for which mathematical expressions are available (that is, the partial differential equations of field theory). A neural network architecture was designed in the research to solve a class of partial differential equations. The significance of this design was a neuron transfer function and weight formulas for computing the finite difference approximation.

6.4.2 Natural Parallelism Related to Architectures

The partial differential equations were mapped onto the massively parallel architecture using finite difference approximations. The numerical solutions of the approximation formulas effectively simulated the analog computation of nature. An architecture was designed to compute a five-point finite difference approximation solution to these equations. The five-point finite difference approximation formula provided the definition of the interconnection weights in the architecture.

The massively parallel computing architecture was defined using explicit states of the problem as the computing nodes. This architecture computed solutions by exchanging information along interconnections defined in a mesh topology. As this exchange continued, each node's output converged to a value which represented the corresponding state's value in the problem solution. These state outputs are the scalar potential field solution to the partial differential equation. The approximation templates for the finite difference formula defined the interconnection weights for the particular architecture configuration.

This research resulted in a parallel summation technique using variable connectivity to perform the integration necessary for the finite difference approximation in the neural network. This technique was significant in its ability to define varying degrees of interconnection between the neurons. This was necessary in order to assess the convergence of the network with different amounts of interconnection. The results of the connectivity study verified the hypothesis that faster convergence occurs as the number of interconnections increases.

6.4.3 Natural Parallelism Related to Applications

The application of primary interest was path planning. This required searching for a path in a space over which a cost function was defined. The path end points were given, and the path with the least total cost along its length was the minimum cost solution path to the problem. The paradigm was employed to define a massively parallel architecture to solve this problem.
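Stated symbolically (a standard formulation, included here only to make the problem precise), the task is to find, between the fixed endpoints $A$ and $B$, a path $\Gamma$ that minimizes the accumulated cost along its length,

\[
\Gamma^{*} \;=\; \arg\min_{\Gamma:\,A \to B} \int_{\Gamma} c(x, y)\, ds ,
\]

where $c(x, y) \ge 0$ is the cost field over the two-dimensional region and $ds$ is arc length along the path. In the electrostatic analogy, high-cost regions correspond to poorly conducting regions, so the current flux lines favor low-cost regions.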
Natural parallelism provided a model in field theory as an analog to this problem: the electrostatic model. The architecture design for the electrostatic model used an interconnection definition which supported variable connectivity. This architecture was shown to provide multiple, alternative solution paths. The architecture was also used to study the effect of connectivity on the solution convergence.

This research has shown the use of the natural parallelism paradigm to solve path planning applications. Artificial neural networks designed using the natural parallelism concept were shown to provide paths which correspond to a natural analogy of field theory in electrostatics. The electrostatic model architecture was shown to compute multiple, alternative paths. The cost of these multiple path solutions showed only a linear increase among a significant number of the available alternative solutions.

6.5 Summary

In Chapter 1, several questions were raised to motivate and direct this research. As a result of this research, several observations and answers can be made on the issues they address. The finite difference approximation method was used to define a numerical solution to the problem. The approximation template coefficients provided the definitions for the weights of the neural network. The template also determined the number of interconnections between the neurons and their interconnection pattern. The results on variable connectivity illustrated the influence of the number of interconnections on the convergence of the neural network to stable states. The approximation also defined the neuron transfer function.

Regarding the design process questions, the natural parallelism paradigm resolves the dilemma concerning the interaction of algorithm, architecture, and application issues. It also provided an underlying mathematical model for the architecture from the natural analogy in physics. The model further provided for the mapping of the problem onto the neural network through the finite difference approximation.

6.5.1 Suggestions for Further Research

An interesting follow-up to this research would be to further test the concept by implementing the architectures in a path planning system environment. A first step would be to include the architectures in a real-time environment such as an aircraft simulation. Further development could be pursued by incorporating the designs into hardware prototypes. Various hardware implementations are suggested. A simulation of the designs could be written in the VHSIC Hardware Description Language (VHDL), which would provide a behavioral circuit description of a network engine implementation. Alternatively, a fully parallel VLSI design could be developed to fabricate a chip set to realize the networks. Other technologies, such as optical computing, could be examined for implementation.

The robustness of the design paradigm could be further examined using the application of constrained optimization. Constraints should be modeled into the analogy just as the basic model is. One example is constraining paths to a certain sector about the current vehicle state vector. The analogy for the electrostatic model can use zero-conductivity regions behind the vehicle, whose radii correspond to the constraint, in order to restrict the field lines to the desired sector ahead of the vehicle.

Another improvement to the architecture design concerns the approximation technique. This research used finite difference formulas for the numerical approximation templates.
Finite element modeling is another technique that could be investigated to implement the architecture design.

Another issue is the refinement of the architecture design to develop adaptive mesh topologies for the interconnection topology and node definitions for the grid. Node definitions and interconnection weight adjustments would need to be defined with adaptive mesh generation algorithms. The self-organizing map is another neural network architecture which adjusts its connection weights in order to learn patterns in data. This technology may be a useful approach for mapping the variable structure of the cost functions into an adaptive mesh configuration.

Explicit representation of the state space in the architecture assumes the availability of the cost function values. However, this requires tremendous I/O capability, which presents an implementation problem in supporting such a large data bandwidth. In addition, not all cost functions can be explicitly defined in a spatial organization.

There are certainly other applications besides path planning that can benefit from the natural parallelism paradigm. Other applications should be examined to develop their natural analogies for mapping onto massively parallel architectures. Certainly the mathematical models of the analogies themselves can be solved using the architectures developed in this research. One possible application is the solution of computational fluid dynamics problems. Another example is the use of the architecture to solve some linear algebra problems.

Finally, other design paradigms may also be appropriate for the development of massively parallel architectures. The design paradigm of natural parallelism is just one of the possible techniques for envisioning solutions to the synergy problem associated with these architectures. Hopefully, this research has enlightened the subject of massively parallel architecture design and suggests alternative methodologies for its analysis and development.

Appendix A
Description of the Test Cases

In this appendix, the test cases used to conduct the experiments in this research are described. There were seven different test cases, A through G, as shown in Table A.1. The column labelled "N" is the size of the square grid used for the test; thus, the total number of grid points in each test case is N². The column labelled "S" contains the coordinates of the start node on the grid, and the column labelled "G" lists the coordinates of the goal node.

Table A.1: Problem Space Size and Path Endpoints

    CASE    N     S: Start    G: Goal
    A       30    (3, 23)     (17, 2)
    B       60    (20, 10)    (45, 55)
    C       50    (3, 42)     (37, 11)
    D       65    (14, 35)    (50, 50)
    E       21    (13, 5)     (11, 17)
    F       21    (17, 5)     (5, 17)
    G       26    (3, 4)      (24, 24)

A.1 Test Case A

A diagram of the cost layout for Test Case A is given in Figure A.1. Case A is a 30 by 30 grid with five regions containing different uniform cost values. Each region in the diagram contains a number which is the cost for each cell in that region. The background region has a cost of 0.010, and contains the start node at (3,23) and the goal node at (17,2). Between the start and goal node is a circular region of infinite cost. At the top and right of the grid are two rectangular regions with lower costs of 0.005 and 0.003, respectively. Another, smaller circular region of infinite cost is located in the upper right corner of the grid between the lower-cost strips.

[Figure A.1: Diagram of the Cost Layout for Test Case A]

These regions were designed to bring a nonuniform distribution of cost regions to bear on the warping of the multiple path solutions. The cost function contour plot with the optimal path is shown in Figure A.2.

[Figure A.2: Test Case A Cost Function Contours and Optimal Path]
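A layout of this kind is easy to encode as a grid of cell costs. The fragment below rebuilds a 30 by 30 cost field with the same ingredients (a uniform background, two low-cost strips, and two impassable circles). The positions and sizes of the regions are placeholders chosen for illustration, since the exact extents are only given by the figure; only the cost values and the endpoints come from the text and Table A.1.

    import numpy as np

    def circle_mask(n, center, radius):
        # Boolean mask of the grid cells inside a circle.
        ii, jj = np.ogrid[:n, :n]
        return (ii - center[0]) ** 2 + (jj - center[1]) ** 2 <= radius ** 2

    n = 30
    cost = np.full((n, n), 0.010)                 # background cost
    cost[0:5, :] = 0.005                          # low-cost strip (placeholder extent)
    cost[:, n - 5:n] = 0.003                      # low-cost strip (placeholder extent)
    cost[circle_mask(n, (12, 12), 5)] = np.inf    # large impassable circle (placeholder location)
    cost[circle_mask(n, (4, 25), 2)] = np.inf     # small impassable circle (placeholder location)

    start, goal = (3, 23), (17, 2)                # endpoints from Table A.1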
A.2 Test Case B

Test Case B is a threat penetration scenario using synthetic terrain elevation and threat coverage. The terrain which was generated is shown in Figure A.3. In the perspective view of the figure, the nearest corner is the southwest corner of the grid and the farthest corner is the northeast corner of the grid. This terrain covers an area of 3600 square nautical miles. The grid measures 60 by 60, where each cell is one square nautical mile of the region. The terrain was generated to establish a scenario in which a mountainous region had a valley providing a route along which defensive threats could be deployed. Threats with different effective ranges and risks were then added into the region, taking the terrain into account. The combined risk of these threats was then analyzed by line-of-sight calculations, taking the effects of terrain masking into account, to generate the cost function for the search. The costs were obtained for a low-altitude flight constrained to stay 1000 feet above the terrain. The resulting cost function is used in Test Case B, and is shown in Figure A.4 along with the optimal path for the start and goal nodes.

[Figure A.3: Synthetic Terrain Data for Test Case B]

A.3 Test Case C

Test Case C is another threat penetration scenario, but this case uses actual terrain elevation data. The region of the data is near the town of Fulda, Germany. This data has been used in several studies[53, 54, 55] because many military analysts believed this was the location where Soviet tanks would pour into the West at the beginning of a major offensive by the Warsaw Pact. Representative models of surface-to-air missile radar sites are used in the generation of the risk data for the cost function. The grid for this test case covers a 50 nautical mile by 50 nautical mile area, with the grid cells at one nautical mile increments. One feature of the cost function is that it has moguls in a checkerboard pattern, which increase in value from west to east. The cost function contour plot is shown in Figure A.5 along with the optimal path for the start and goal nodes.

[Figure A.5: Test Case C Cost Function Contours and Optimal Path]

A.4 Test Case D

Test Case D uses a smoother cost function than the previous test cases. The data for the cost function is a spherical surface of large curvature with some smaller, local bumps placed on top of the surface.
These were averaged to generate a smooth surface with some local variation in cost. The cost function contour plot is shown in Figure A.6 along with the optimal path for the start and goal nodes.

[Figure A.6: Test Case D Cost Function Contours and Optimal Path]

A.5 Test Cases E, F, and G

Test Cases E, F, and G are cost functions which contain highly patterned regions of high cost.

A diagram of the cost layout for Test Case E is given in Figure A.7. Test Case E is a 21 by 21 grid with two regions containing uniform cost values. The background region has a cost of 1.0, and contains the start node at (13,5) and the goal node at (11,17). Between the start and goal node is a rectangular strip with a cost of 4.0. The cost function contour plot with the optimal path is shown in Figure A.8.

[Figure A.7: Diagram of the Cost Layout for Test Case E]
[Figure A.8: Test Case E Cost Function Contours and Optimal Path]

A diagram of the cost layout for Test Case F is given in Figure A.9. Test Case F is a 21 by 21 grid with two regions containing uniform cost values. The background region has a cost of 1.0, and contains the start node at (17,5) and the goal node at (5,17). Between the start and goal node is a triangular strip with a cost of 4.0. The cost function contour plot with the optimal path is shown in Figure A.10.

[Figure A.9: Diagram of the Cost Layout for Test Case F]
[Figure A.10: Test Case F Cost Function Contours and Optimal Path]

A diagram of the cost layout for Test Case G is given in Figure A.11. Test Case G is a 26 by 26 grid with nine regions containing two different uniform cost values. The background region has a cost of 1.0, and contains the start node at (3,4) and the goal node at (24,24). Between the start and goal node are several rectangular regions with a cost of 4.0. The cost function contour plot with the optimal path is shown in Figure A.12.

[Figure A.11: Diagram of the Cost Layout for Test Case G]
[Figure A.12: Test Case G Cost Function Contours and Optimal Path]

Bibliography

[1] Jeffrey Michael Abram. Shortest-Path Algorithms with Decentralized Information and Communication Requirements. D.Sc. thesis, Washington University, 1981.

[2] L. A. Akers, M. R. Walker, D. K. Ferry, and R. O. Grondin. Limited interconnectivity in synthetic neural systems. In R. Eckmiller and C. von der Malsburg, editors, Neural Computers, pages 407–416, Springer, Berlin, Heidelberg, 1988.

[3] A. Barr and E.A. Feigenbaum. Handbook of Artificial Intelligence. Kaufmann, 1981.

[4] R.E. Bellman. Dynamic Programming. Princeton University Press, 1957.

[5] R.E. Bellman and S.E. Dreyfus. Applied Dynamic Programming. Princeton University Press, 1962.

[6] M. Bird, M. Olinger, R. Denton, and J. Jones. The trajectory generator for tactical flight management. In Proceedings of the 1983 IEEE National Aerospace and Electronics Conference, 1983.

[7] Gilbert A. Bliss. Lectures on the Calculus of Variations. University of Chicago Press, 1946.

[8] Oskar Bolza. Lectures on the Calculus of Variations. Dover Publications, 1961.

[9] C. Carathéodory. Calculus of Variations and Partial Differential Equations of the First Order. Holden-Day, 1965.

[10] Lothar Collatz. The Numerical Treatment of Differential Equations. Springer-Verlag, New York, NY, third edition, 1966.
[11] R. Courant and D. Hilbert. Methods of Mathematical Physics. Interscience Publishers, 1953.

[12] Judith E. Dayhoff. Neural Network Architectures. Van Nostrand Reinhold, 1990.

[13] R.V. Denton and J.P. Marsh. Applications of autopath technology to terrain/obstacle avoidance. In Proceedings of the 1982 IEEE National Aerospace and Electronics Conference, 1982.

[14] S.E. Dreyfus and A.M. Law. The Art and Theory of Dynamic Programming. Academic Press, 1977.

[15] R. Eckmiller, editor. Advanced Neural Computers. North-Holland, Amsterdam, 1990.

[16] R. Eckmiller, G. Hartmann, and G. Hauske, editors. Parallel Processing in Neural Systems and Computers. North-Holland, Amsterdam, 1990.

[17] R. Eckmiller and C. von der Malsburg, editors. Neural Computers. Springer, Berlin, Heidelberg, 1988.

[18] Omer Egecioglu, Terence R. Smith, and John Moody. Computable functions and complexity in neural networks. In Real Brains, Artificial Minds, pages 135–164, Elsevier Science Publishers, 1987.

[19] Philip H. Enslow, Jr. Multiprocessors and Parallel Processing. John Wiley & Sons, 1974.

[20] H. Erzberger and H. Lee. Constrained optimum trajectories with specified range. Journal of Guidance and Control, 3(1):78–85, 1980.

[21] S. E. Fahlman and G. E. Hinton. Connectionist architectures for artificial intelligence. Computer, pages 100–109, January 1987.

[22] Richard P. Feynman. The Feynman Lecture Series on Physics. Addison-Wesley Publishing Co., 1963.

[23] George E. Forsythe and Wolfgang R. Wasow. Finite-Difference Methods for Partial Differential Equations. John Wiley & Sons, Inc., New York, NY, 1960.

[24] J.A.B. Fortes and B.W. Wah. Systolic arrays: from concept to implementation. Computer, 20(7):12–17, 1987.

[25] L. Fox. Numerical Solution of Ordinary and Partial Differential Equations. Pergamon Press, Oxford, England, 1962.

[26] Jun Gu. Parallel Algorithms and Architectures for Very Fast AI Search. PhD thesis, University of Utah, 1989.

[27] Shailesh U. Hegde, Jeffrey L. Sweet, and William B. Levy. Determination of parameters in a Hopfield/Tank computational network. In Proceedings of the 1988 Connectionist Models Summer School, pages 211–216, Morgan Kaufmann Publishers, 1988.

[28] W. D. Hillis. The Connection Machine. MIT Press, Cambridge, MA, 1985.

[29] J. J. Hopfield. Neural networks and physical systems with emergent collective computational abilities. Proceedings of the National Academy of Sciences, 79:2554–2558, April 1982.

[30] J. J. Hopfield and D. W. Tank. "Neural" computation of decisions in optimization problems. Biological Cybernetics, 52:141–152, 1985.

[31] John J. Hopfield and David W. Tank. Computing with neural circuits: a model. Science, 233:625–633, 1985.

[32] Jenq-Neng Hwang. Algorithms/Applications/Architectures of Artificial Neural Nets. PhD thesis, University of Southern California, 1988.

[33] K. Hwang and F.A. Briggs. Computer Architecture and Parallel Processing. McGraw-Hill, 1984.

[34] Leah H. Jamieson, Dennis Gannon, and Robert J. Douglass, editors. The Characteristics of Parallel Algorithms. MIT Press, 1987.

[35] S. Kirkpatrick, C. D. Gelatt, and M. P. Vecchi. Optimization by simulated annealing. Science, 220(4598):671–680, May 1983.

[36] H.T. Kung and C.E. Leiserson. Systolic arrays (for VLSI). Sparse Matrix Proceedings, pages 256–282, 1978.

[37] R.A. Kupferer and D.J. Halski. Tactical flight management: survivable penetration. In Proceedings of the 1984 IEEE National Aerospace and Electronics Conference, May 1984.
[38] C. Lanczos. The Variational Principles of Mechanics. Dover Publications, Inc., New York, NY, fourth edition, 1970.

[39] Leon Lapidus and George F. Pinder. Numerical Solution of Partial Differential Equations in Science and Engineering. John Wiley & Sons, Inc., New York, NY, 1982.

[40] Guo-Jie Li. Parallel Processing of Combinatorial Search Problems. PhD thesis, Purdue University, 1985.

[41] Richard P. Lippmann. An introduction to computing with neural nets. IEEE ASSP Magazine, 4(2):4–22, April 1987.

[42] Chia-Yiu Maa and Michael A. Shanblatt. Adjustment of parameters for signal decision networks. In Proceedings of the 1989 International Joint Conference on Neural Networks, 1989.

[43] Chia-Yiu Maa and Michael A. Shanblatt. Stability of linear programming neural network for problems with hypercube feasible region. In Proceedings of the 1990 International Joint Conference on Neural Networks, 1990.

[44] Henry Margenau and George Moseley Murphy. The Mathematics of Physics and Chemistry. D. Van Nostrand Company, Inc., Princeton, NJ, second edition, 1956.

[45] James Clerk Maxwell. A Treatise on Electricity and Magnetism. Volume 1, Dover Publications, Inc., New York, NY, 1954. An unabridged and unaltered republication of the third edition of 1891.

[46] W. S. McCulloch and W. Pitts. A logical calculus of the ideas immanent in nervous activity. Bulletin of Mathematical Biophysics, 9:127–147, 1943.

[47] Carver Mead and Lynn Conway. Introduction to VLSI Systems. Addison-Wesley Publishing Co., 1980.

[48] D. H. Menzel. Mathematical Physics. Dover Publications, Inc., New York, NY, 1961.

[49] A. R. Mitchell and D. F. Griffiths. The Finite Difference Method in Partial Differential Equations. John Wiley & Sons, Inc., New York, NY, 1980.

[50] N.J. Nilsson. Principles of Artificial Intelligence. Tioga, 1980.

[51] Allen Nussbaum. Electromagnetic Theory for Engineers and Scientists. Prentice-Hall, Inc., Englewood Cliffs, NJ, 1965.

[52] John O'Donnell. Supporting functional and logic programming languages through a data parallel VLSI architecture. In VLSI for AI, pages 49–60, Kluwer Academic Publishers, 1989.

[53] M.D. Olinger and M.W. Bird. Tactical Flight Management Phase 1 Final Report. Technical Report ID-04R-0683, Lear Siegler, Inc. Instrument Division, 4141 Eastern Avenue, SE, Grand Rapids, MI, 49518, 1983.

[54] M.D. Olinger and M.W. Bird. Tactical Flight Management Phase 2 Final Report. Technical Report ID-03R-0484, Lear Siegler, Inc. Instrument Division, 4141 Eastern Avenue, SE, Grand Rapids, MI, 49518, 1984.

[55] M.D. Olinger and M.W. Bird. Tactical flight management threat penetration design. In Proceedings of the 1984 IEEE National Aerospace and Electronics Conference, 1984.

[56] R. H. J. M. Otten and L. P. P. P. van Ginneken. The Annealing Algorithm. Kluwer Academic Publishers, 1989.

[57] Judea Pearl and Richard E. Korf. Search techniques. Annual Review of Computer Science, 2:451–467, 1987.

[58] Robert Plonsey and Robert E. Collin. Principles and Applications of Electromagnetic Fields. McGraw-Hill, 1961.

[59] J. Ramanujam and P. Sadayappan. Parameter identification for constrained optimization using neural networks. In Proceedings of the 1988 Connectionist Models Summer School, pages 154–161, Morgan Kaufmann Publishers, 1988.

[60] L. A. Reibling. Research and Development in Neural Network Technology and Applications. Annual Report GR-09P-0989, Smiths Industries Aerospace & Defense Systems, Inc., October 1989.
[61] L. A. Reibling. A neural network implementation of parallel search for multiple paths. In Proceedings of the 1990 International Joint Conference on Neural Networks, pages 675–678, January 1990.

[62] L. A. Reibling. Neural network solutions to mathematical models of parallel search for optimal trajectory generation. In Proceedings of the 61st Symposium on Machine Intelligence for Aerospace Electronic Systems, Advisory Group for Aerospace Research & Development, 1991.

[63] Elaine Rich. Artificial Intelligence. McGraw-Hill, 1983.

[64] Mario G. Salvadori and Melvin L. Baron. Numerical Methods in Engineering. Prentice-Hall, Inc., Englewood Cliffs, NJ, 1961.

[65] G. D. Smith. Numerical Solution of Partial Differential Equations. Oxford University Press, New York, NY, 1965.

[66] John Smith. Modern Operational Circuit Design. John Wiley, 1971.

[67] J. C. Smuts. Holism and Evolution. Macmillan, London, 1926.

[68] William R. Smythe. Static and Dynamic Electricity. McGraw-Hill, 1950.

[69] Thinking Machines Corporation. Connection Machine Model CM-2 Technical Summary. Thinking Machines Technical Report HA87-4, April 1987.

[70] P.J.M. van Laarhoven and E.H.L. Aarts. Simulated Annealing: Theory and Applications. D. Reidel Publishing Co., Dordrecht, Holland, 1987.

[71] Ernst Weber. Electromagnetic Fields. Volume I: Mapping of Fields, John Wiley & Sons, Inc., New York, NY, 1950.

[72] Patrick H. Winston. Artificial Intelligence. Addison-Wesley Publishing Co., 2nd edition, 1984.