. H 0L . 3.313.... . , ‘ V , . 5:. . . ‘ tv ’1 ‘ . . , . .ims , ‘ V ‘ . 2k: I75} , . . Ill-‘Pilr . ,, r ‘ ., 43¢: ‘ . x , .Fll a. .1 z: 2 .. 4%}..5? . 1. 1 a s) h. firs. .3 3.2?!“ I“ . . hp. rat“ . 31...»?! hr... aux i! \ .kfiwuwbufl‘vnfiv S ,2 . , s\\-..$2¢.>.l.!~ao.v. Elk. ;I.r.’0\: , ‘ , 591...).us ! a. 13!. z ‘ w I . ‘ ‘ I G, . ,- .. ém my .2: ‘ H ER}. fulfil-s Q -. 3.) is“) llllllllllllllllllllllllllllllllllllllllll 3 1293 01572 20 LIBRARY Michigan State University This is to certify that the thesis entitled FUNCTIONAL CORROBORATION OF DIGITAL MULTILAYER NEURAL NETWORKS presented by QAISER HAMEED MALI K has been accepted towards fulfillment of the requirements for Master's degree in Electrical Eng Mia/— Major professor Date fine 2/; /??\5-— 0-7639 , MS U i: an Affirmative Action/Equal Opportunity Institution PLACE Ii RETURN BOX to man thb checkout from your mead. TO AVOID FINES Mum on or baton data duo. DATE DUE DATE DUE DATE DUE MSU I. An mum Action/Equal Opportunity lnotiiuion Walnut FUNCTIONAL CORROBORATION OF A DIGITAL MULTILAYER NEURAL NETWORK By Qaiser Hameed Malik A THESIS Submitted to _ Michigan State University in partial fulfillment of the requirements for the degree of MASTER OF SCIENCE Department of Electrical Engineering 1995 ABSTRACT FUNCTIONAL CORROBORATION OF A DIGITAL MULTILAYER NEURAL NETWORK By Qaiser Hameed Malik A model of the digital multilayer neural network (DMNN), has been implemented in sofiware using the C programming language. The DMNN employs a stochastic nonlinear fimction in the backpropagation learning rule. A suite of test cases is applied to the sirmrlated DMNN and its performance is analyzed with reference to an ordinary Artificial Neural Network (ANN) employing a sigmoidal function. The backpropagation learning algorithm, used in the DMNN, has been revised in order to accelerate the convergence process. A modified DMNN has been simulated based upon the refinements suggested in the backpropagation algorithm The modified DMNN has been found to converge for a wider range of learning rates (17) and momentum factors (a). Three networks, DMNN, modified DMNN and ANN, are simulated on an 80486 based Personal Computer (PC). The learning pattern of each network is determined for the test cases and their performance is analyzed. ACKNOWLEDGMENTS I would like to thank my thesis advisor, Professor Michael A. Shanblatt for his guidance, patience and encouragement throughout the course of this research. I would also like to thank Dr. Niaz Mahmood and Dr. Muhammad Akbar, my co-advisors at the National University of Sciences and Technology (Pakistan), for their continued support and supervision. Thanks are also due to the members of my Master’s defence committee, Dr. Mohammad Aslam and Dr. Diane Rover for their comments and valuable suggestions. My sincere and heartfelt thanks are due to my friend Abdul Yamin for providing an understanding of the programming techniques in C language and assisting me in the laborious task of network simulations. 1 would like to express my gratitude and appreciation to my mother, my wife and my children, Farina and Arsalan, for their continuous love, long patience and understanding during the course of this research. I am indebted to the National University of Sciences and Technology (Pakistan), for providing me an opportunity to pursue my MS degree program at the Michigan State University, USA. TABLE OF CONTENTS LIST OF TABLES .......................................................................................................... x LIST OF FIGURES ........................................................................................................ xi Chapter 1 INTRODUCTION .......................................................................................... l 1.1 Overview .................................................................................................... 2 1.2 Problem Statement ..................................................................................... 4 1.3 Research Tasks .......................................................................................... 5 1.4 Organization ofthe Thesrs ......................... 7 Chapter 2 BACKGROUND ............................................................................................ 9 2.1 Introduction ............................................................................................... 9 2.2 Historical Background"; ........................................................................... 10 2.3 Artificial Neural Networks ....................................................................... 14 2.3.1 Biological and Artificial Neurons .................................................... 14 2.3.1.1 Biological Neuron ............................................................. 14 2.3.1.2 Artificial Neuron ............................................................... 17 2.3.2 Network Architecture ..................................................................... 19 2.3.3 Learning Rules ............................................................................... 21 2.3.3.1 Hebb’s Rule ...................................... i ................................ 23 iv 2.3.3.2 The Delta Rule .................................................................. 23 2.3.3.3 Gradient Descent Rule ....................................................... 24 2.3.3.4 Kohonen’s Learning Law [2,29] ........................................ 24 2.3.3.5 Grossberg Learning Rule [2] ............................................. 25 2.3.3.6 Backpropagation Learning Rule ........................................ 25 2.3.4 Neural Network Models ................................................................. 26 2.3.4.1 Feedback Model ................................................................ 26 2.3.4.2 Feedforward Model ........................................................... 29 2.3.4.2.] Single Layer Perceptron .................................... 29 2.3.4.2.2 Multilayer Perceptron ........................................ 31 2.3.4.3 Recurrent Model ............................................................... 36 2.4 Neural Network Implementation ............................................................... 37 2.4.1 Hardware Implementation .............................................................. 37 2.4.1.1 Analog Implementation ..................................................... 38 2.4.1.2 Digital Implementation ...................................................... 39 2.4.2 Software Implementation ................................................................ 40 Chapter 3 DIGITAL MULTILAYER NEURAL NETWORKS ..................................... 41 3.1 Introduction ............................................................................................. 41 3.2 Stochastic Computing in the DMNN ........................................................ 42 3.2.1 Generating Probability .................................................................... 42 3.2.2 Basic Stochastic Computing. .......................................................... 44 3.2.3 Stochastic Computing in the DMNN .............................................. 46 3.3 Hardware Implementation ........................................................................ 49 3.3.1 Basic Computing Elements ............................................................. 49 3.3.1.1 Random Pulse Generator (RPG) ........................................ 49 3.3.1.2 Synaptic Element. .............................................................. 50 3.3.1.3 Input Neuron Body Element (INB) ................................... 51 3.3.1.4 Regular Neuron Body Element (RNB) .............................. 52 3.3.2 DMNN Architecture ....................................................................... 53 3.3.3 DMNN Coprocessor ...................................................................... 54 3.4 Behavioral Model of a DMNN Coprocessor ............................................. 55 3.4.1 Design Methodology ...................................................................... 55 3.4.2 Coprocessor Control ...................................................................... 57 3.5 DMNN Applications ........................................ . ......................................... 58 3.5.1 Application Methodology ............................................................... 58 3.5.2 Benchmark Problems ..................................................................... 59 3.5.2.1 XOR Problem Solver ........................................................ 59 3.5.2.2 8x3 Encoder ...................................................................... 61 3.5.3 DMNN Character Recognizer ........................................................ 62 3.6 Performance Summary ............................................................................. 66 Chapter 4 BACKPROPAGATION IN THE DMNN ..................................................... 67 4.1 Introduction ............................................................................................. 67 4.2 Backpropagation Implementation ............................................................. 68 4.2.1 Review of the Stochastic Function .................................................. 68 4.2.2 Backpropagation Analysis .............................................................. 70 4.3 Revieng Backpropagation Philosophy ................................................... 77 4.3.1 Backpropagation through Noise ..................................................... 79 4.3.2 Simultaneous Change ..................................................................... 80 4.3.3 Numerous Basins and Crests .......................................................... 80 4.3.4 Lack of Digital Convergence Proof ................................................ 81 4.3.5 Dependence on Initial Conditions ................................................... 82 4.3.6 Non-real Time Adaptation .............................................................. 82 4.4 Accelerating the Convergence .................................................................. 83 4.4.1 Momentum Factor ((2) .................................................................... 83 4.4.1.1 Implementing the Momentum Factor ................................. 84 4.4.1.2 Analysis of Kim’s Technique ............................................. 85 4.4.1.3 Redefining the Implementation of Momentum Factor ........ 86 4.4.2 Updating the Weights ..................................................................... 86 4.4.3 Varying the Learning Rate (7)) ........................................................ 87 4.5 Summary .................................................................................................. 88 Chapter 5 DMNN ANALYSIS AND APPLICATIONS ................................................ 89 5.1 Introduction ............................................................................................. 89 5.2 Methodology ............................................................................................ 89 5.2.1 General Methodology ..................................................................... 89 5.2.2 Training and Classification .............................................................. 92 5.3 Benchmark Problems ................................................................................ 93 vii 5.3.1 XOR Problem Solver. ..................................................................... 93 5.3.2 8x3 Encoder ................................................................................... 97 5.4 Character Recognizer ............................................................................. 100 5.5 DMNN Simulation Analysis ................................................................... 104 5.5.1 Kim’s DMNN .............................................................................. 104 5.5.1.1 Program Compatibility ............... '. .................................... 105 5.5.1.2 Invalid Synaptic Weights ................................................. 105 5.5.1.3 Longer Convergence Cycles ............................................ 105 5.5.1.4 Implementation of Momentum Factor: ............................. 105 5.5.2 Variations and Improvements ....................................................... 106 5.5.2.1 Program Compatibility .................................................... 106 5.5.2.2 Generation of Random Weights ....................................... 107 5.5.2.3 Redefining the Implementation of Momentum Factor ...... 107 5.5.2.4 Varying the Learning Rate (77) ......................................... 107 5.6 Summary ................................................................................................ 108 Chapter 6 CONCLUSION .......................................................................................... 109 6.1 Summary ................................................................................................ 109 6.2 Contributions ......................................................................................... 112 6.3 Future Research ..................................................................................... 112 APPENDICES .................................................................... _ .............................................. A Flowcharts. ............................................................................................... 115 B Input Data for DMNN Binary Classifiers .................................................. 119 viii LIST OF REFERENCES ............................................................................................ 126 LIST OF TABLES 3.1 An n-bit Two-layer DMNN XOR Results [1] .......................................................... 60 3.2 Truth Table for an 8x3 Encoder [1} ........................................................................ 61 3.3 Average Number of Iterations for 5-Digit Classifier [1] ........................................... 64 3.4 Performance of 5-Digit Classifier [1] ....................................................................... 65 3.5 Performance of 10-Digit Classifier [l] ..................................................................... 65 5.1 Comparison of Input Variables ................................................................................ 91 5.2 Performance of 5-Digit Classifier Trained with Ideal Digits ................................... 103 6.1 The Fuzzy Associative Memory (FAM) Rules for r] (e, n) ...................................... 114 LIST OF FIGURES 2.1 Biological Neuron [26] ........................................................................................... 15 2.2 Electrical Synapse [1] ............................................................................................. 17 2.3 Artificial Neuron ..................................................................................................... 18 2.4 Sample Activation Functions ................................................................................... 19 2.5 Neural Network Layer ............................................................................................ 20 2.6 A Fully Connected Neural Network ........................................................................ 21 2.7 Hopfield Neural Network Model [2] ....................................................................... 27 2.8 A Three Layer Neural Network ............................................................................... 31 2.9 Boltzmann Machine ................................................................................................ 36 3.1 Block Diagram of 3 Linear Feedback Shift Register (LFSR) [l] .............................. 43 3.2 Random Pulse Generator (RPG) [1] ........................................................................ 43 3.3 Stochastic Computation in the DMNN: (a) Stochastic Multiplication; (b) Logical OR; (o) Neural Activation [1] ..................................................................... 47 3.4 (a) An 8-bit LFSR with a Primitive Polynomial; (b) An RPG for v.- [1] ..................... 50 3.5 (a) A Synaptic Element; (b) Block Diagram of SYN [1] .......................................... 51 3.6 (a) Input Neuron Body (INB) Element; (b) Block Diagram of INB [l] .................... 52 3.7 (a) A Regular Neuron Body (RNB) Element; (b) Block Diagram of RNB [1] .......... 53 3.8 DMNN Architecture [1] .......................................................................................... 54 3.9 DMNN Coprocessor [1] ......................................................................................... 55 3.10 DMNN Structural Hierarchy [1] ........................................................................... 57 3.11 The XOR Problem ................... ............................................................................ 60 3.12 8x3 Encoder ......................................................................................................... 61 3.13 5-Digit Classifier Patterns: (a) Ideal; (b) 10% Noisy; (c) 20% Noisy [1] ................ 63 3.14 lO-Digit Classifier Patterns: (a) Ideal; (b) 10% Noisy; (c) 20% Noisy [1] .............. 63 4.1 Three Layered Neural Network .............................................................................. 68 4.2 Error Surface in: (a) Linear ANN; (b) Nonlinear ANN ............................................ 78 4.3 Local Minimas and Relative Weight Changes .......................................................... 81 5.1 DMNN as XOR Solver for a Momentum Factor (1 = 0.2 ......................................... 95 5.2 DMNN XOR Solver in 3-D, 17 vs (2 vs Number of Iterations ................................... 95 5.3 ANN as XOR Solver for a Momentum Factor 0: = 0.2 ............................................ 96 5.4 ANN XOR Solver in 3-D, 77 vs a vs Number of Iterations ...................................... 96 5.5 DMNN as 8x3 Encoder for a Momentum Factor a = 0.2 ........................................ 98 5.6 DMNN 8x3 Encoder in 3-D, 17 vs a vs Number of Iterations .................................. 98 5.7 ANN as 8x3 Encoder for a Momentum Factor a = 0.2 ............................................ 99 5.8 ANN 8x3 Encoder in 3-D, 1) vs a vs Number of Iterations ...................................... 99 5.9 DMNN as 5-Digit Classifier for a Momentum Factor a = 0.2 ................................ 101 5.10 DMNN 5-Digit Classifier in 3-D, 7] vs a vs Number of Iterations ........................ 101 5.11 ANN as 5-Digit Classifier for a Momentum Factor 0: = 0.2 ................................. 102 5.12 ANN 5-Digit Classifier in 3-D, nvs 0: vs Number of Iterations ........................... 102 CHAPTER 1 INTRODUCTION Artificial Neural Networks (ANN) are systems made of a large number of simple neurons or Processing Elements (PE), whose fimction is determined mainly by their interconnection topology. These systems are capable of solving. computationally intensive and ill-defined problems such as pattern classification, adaptive control, optimization and many more data processing tasks. Currently there are many models available that can be implemented via software and hardware. SoftWare implementation usually employs an algorithm, which is based on the architecture and an tmderstanding of neural networks, mapped onto the conventional digital computers. This thesis presents a software model of a pulse mode Digital Multilayer Neural Network (DMNN) developed by Kim [1]. The capabilities of the network are validated and compared with an ordinary ANN by using data encoding and pattern classification problems. This chapter begins with a brief overview of the DMNN model. The problem to be solved is then defined, followed by the research tasks. Finally the organization of this thesis is outlined. 1.1 Overview Conventional digital computers are very useful in solving well-defined problems since they can be represented by sequences of instructions. However, conventional computers can not compete with the performance of human beings in solving ill-defined random problems such as pattern recognition, classification, speech understanding, vision and so on. The typical characteristics of human nervous system and the brain are; adaptive learning, generalization, error correction, robustness, and creativity. Its computational power is due to its massive interconnection and asynchronous parallel commtmication among neurons (basic anatomical units of the nervous system). Biologically inspired computing machines, often referred to as artificial neural networks (ANN), work in a way similar to the neurons in the human brain. In the past few years, neural networks have received a great deal of attention and are being touted as one of the important computational tools developed so far. Much of the excitement is due to the apparent ability of the neural networks to imitate the brain’s ability to make decisions and draw conclusions when presented with complex, noisy, irrelevant and/or partial information. It is believed that the massive parallelism and computational power of the human brain is due to the complex interconnections among a large number of neurons and not due to the complexity of an individual neuron. Artificial neural networks are inspired both by neurobiology and various mathematical theories of learning and information processing. The main objective of the ANN research is to design new architectures and algorithms that can solve problems which are difiicult to solve by conventional digital computers. With the enormous developments in the fields of VLSI technology and with improved understanding of the human nervous system, it has become possible to implement models of neural nets by mimicking some aspects of the nervous system of the mammals. The implementation of artificial neural networks is based on a large number of sinrple computational elements. Currently there are many models of ANN s [2-4] that can be implemented via hardware or software. Software implementation usually employs an algorithm which is based on the architecture and an understanding of the neural networks, mapped onto the conventional digital computers. Software implementation is very flexible since one can easily modify codes or algorithms in order to implement another ANN model or learning mechanism. Hardware implementation is more suitable to implementing large ANN s and its computational speed is nnrch faster than that of the sofiware implementation. With the current advancements in VLSI technology, dedicated hardware implementation of ANN s is progressing and many analog and digital ANN models have been built [5-8]. Recently, a new digital approach has been introduced in which a synaptic multiplication and neuron activation function is implemented with the simple logic gates using stochastic computing techniques [9,10]. Based on this technique, a statistical model of a pulse mode Digital Multilayer Neural Network (DMNN) has been developed [1]. The applicability of the developed network has been demonstrated using benchmark comparisons and character recognition problems. This thesis presents the functional validation of the DMNN presented by Kim [1]. A software simulation of the DMNN is presented and its results are compared with an ordinary ANN. The backpropagation algorithm, used in the DMNN, is modified to accelerate the convergence. Based on these modifications, an improved DMNN is presented and its performance is evaluated by solving a suite of test cases. 1.2 Problem Statement Many researchers have worked on the implementation of artificial neural networks. Consequently a number of ANN models have been developed both in software and hardware. Software implementation offers more flexibility in carrying out the performance evahration of various ANN models. However, the software implementation sufiers from the major disadvantages of excessive conrputational time and increased memory requirements with increase in the number of neurons (processing elements). Hardware ANN implementation models have been built using analog and digital components [10- 13]. Analog implementation due to its inherent drawbacks, such as, inaccuracy of analog computations and low design flexibility of the analog electronic devices, limit their chip density thus restricting their applications. The advanced design techniques of current VLSI technology have facilitated the digital implementation of the ANN 5. To reduce the area requirements of a conventional digital approach, stochastic corrrputing techniques have been developed that have resulted in the possibility of a low cost and high speed digital ANN implementation. In these architectures, algebraic operations are replaced by random processes and the network performs pseudo-analog computations with operands ranging from 0.0 to 1.0. An operand x in the pulse mode representation is the probability of pulse occurrence in the corresponding binary pseudo-random pulse sequence x(,.), generated at each clock [9,10]. Based on this technique, a pulse mode digital multilayer neural network (DMNN) architecture and its corresponding statistical model has been developed by Kim [1]. Kim has employed the VHSIC (Very High Speed Integrated Circuits) Hardware Description Language (VHDL) for simulating the DWN [1]. However, there is a need to verify the applicability of the developed network using software sinrulation techniques. The DMNN needs to be compared with some standard ANN model to observe the performance of the stochastic fimction used in the backpropagation algorithm 1.3 Research Tasks The research tasks identified for this work are: 1. Sirmrlate the statistical model of the digital multilayer neural network (DMNN), developed by Kim [1], in software using the C programming language. 2. Apply a suite of test cases to the simulated DMNN, and verify its results. 3. Verify validity of the DMNN by solving the test cases with a conventional ANN using a sigrnoidal fimction and comparing their results. To simulate the statistical model of the DMNN, the first step is to understand the model developed by Kim [1]. The research is, therefore, initiated with the study of artificial neural networks. An understanding of the various models of ANN 5, based on different learning algorithms, is developed with a special emphasis on the backpropagation algorithm A detailed study of the statistical model of the DMNN developed by Kim, is undertaken. The DMNN is simulated in C, using a 80486 based Personal Computer (PC). A suite of test problems is applied to the DMNN and the results achieved by Kim are validated. The DMNN is first applied to the famous XOR problem Next, an 8x3 data encoder is sinmlated and the results compiled in graphical form Efforts are then directed to simulate the character classification problem. Two experiments are attempted: one with a 5-digit classifier and the other with a 10-digit classifier. Each experiment is first performed with an ideal set of digits and then with an addition of noisy data. For the last task, an artificial neural network (ANN) using an ordinary sigrnoidal fimction is sinmlated using C language. The network is presented with the same set of test problems and results are compared with the performance of the DMNN. A number of research constraints have been encountered that have restricted the verification of the character classification problem The proficiency of the simulated DMNN is enhanced by incorporating a number of variations in the program structure and its logic. The backpropagation algorithm is modified in an attempt to sinmlate the character classifier given the limitations of the PC. 1.4 Organization of the Thesis This thesis has been organized in six chapters. Chapter 2 discusses the background of artificial neural networks. It begins with a brief history of the different development phases and the important contributions made by various researchers and scientists in this field. Working models of biological and artificial neurons are presented and basic building blocks of an artificial neural network are introduced. Various ANN models are presented and their learning rules are defined. The chapter concludes with a discussion of few implementation techniques. Chapter 3 is a summary/review of Kimls work [1]. The digital multilayer neural network developed by Kim is presented The stochastic computing techniques are introduced and the nonlinear neuron activation fimction used in the DMNN is presented. Basic computing elements of the DMNN are introduced and a modular DMNN architecture is discussed. The DMNN is applied to a suite of test cases and its performance is presented. Chapter 4 discusses the implementation of the backpropagation algorithm in the DMNN. A complete mathematical derivation of the feedforward and feedback processes is presented. A two layer feedforward DMNN is simulated in C using the backpropagation learning rule. The backpropagation learning rule is analyzed for its weaknesses and improvements are suggested to accelerate the convergence process. The DMNN simulation results are presented in Chapter 5. The results of the application problems are compared to those attained through a standard ANN using a sigmoidal function. The improvements suggested in Chapter 4 are then incorporated to enhance the performance of the DMNN. The results are tabulated in graphical form for ease of comparison. Finally, Chapter 6 lists major contributions of the research and identifies firture directions of this work. CHAPTER 2 BACKGROUND 2.1 Introduction The human brain is the most complex computing device known to man. The brain's powerful thinking, remembering, and problem-solving capabilities have inspired many scientists to attempt modeling of its operation. One group of researchers has sought to create a computer model that matches the fimctionality of the brain in a very simplified manner, the result has been the study of Artificial Neural Networks (ANN). This chapter discusses the brief history of artificial neural networks. A review of biological and artificial neurons is provided. Next, various learning mles and typical network models based on these learning rules, with a special emphasis on the feedforward multilayer perceptron using backpropagation learning, are discussed. Thereafter, difi‘erent implementation techniques currently employed are introduced. 10 2.2 Historical Background This section briefly discusses the history of artificial neural networks. From now onwards the artificial neural networks will be referred to as neural networks, neural nets or ANNs. Attenrpts to tmderstand the human brain go back a long way, even centuries. It has been almost exactly a century, 1890, since William James's text, Psychology (Briefer Course) provided insights into brain activity and foreshadowed current theories [14]. Authors of many texts and papers quote the following from William James's text as an example: “ Let us then assume as the basis of all our subsequent reasoning this law: When two elementary brain processes have been active together or in immediate succession, one of them, on re-occurring, tends to propagate its excitement into the other. " In 1936, Alan Turing was the first to use the brain as computing example. In 1943, McCulloch and Pitts published one of the most famous neural network papers about how neurons might work [15]. McCulloch and Pitts derived models of neural networks based on what was known about the biological structures in the early 19405. In coming to their conclusions, they stated five physical assumptions: 1. The activity of a neuron is an ‘all-or-none’ process. ll 2. A certain fixed number of synapses must be excited within a period of time during which the neuron is able to detect the values present on its inputs, the synapses, in order to excite a neuron at any time. This number is independent of previous activity on the neuron. 3. The only significant delay within the nervous system is synaptic delay, i.e., the time delay between sensing inputs and acting on them by transmitting an outgoing pulse. 4. The activity of an inhibitory synapse absolutely prevents excitation of the neuron at that time. 5. The structure of the net does not change with time. The neuron described by these five assumptions is known as McCulloch-Pitts neuron. The next personality in the Age of Conception is Donald 0. Hebb, whose 1949 book was the first one to define the method of updating synaptic weights that we now refer to as Hebbian [16]. Donald 0. Hebb is among the first to use the term ‘connectionism’. The 19505 was the age of computer sinmlation. It became possible to test theories about nervous system fimctions. The IBM research laboratories conducted an tmsuccessful software simulation of a neural network model based on Hebb's work It was not tmtil the mid 19505, when with the collaboration from Hebb and others, successful adaptations were made. Theories began to be modified. In the summer of 1956, the Dartmouth Summer Research Project on Artificial Intelligence (AI) gave birth to neural nets. The following year, John von Neumann 12 suggested imitating simple neuron functions by using telegraph relays and vacuum tubes [17]. In 1958, a milestone paper by Frank Rosenblatt referred to the perceptron as a neural network structure [18]. The perceptron was simulated in detail on an IBM 704 computer at the Cornell Aeronautical Laboratory. The perceptron was probably the first ‘learning machine’. The primary learning mechanism of a perceptron is ‘self-organizing’ or ‘self-associative’ in the sense that an initially random process tends to become organized. In 1959, Bernard Widrow and Marcian Hofi’ developed models for ADALINE, and MADALINE (Multiple ADAptive LINear Elements). This was the first neural network applied to a real-world problem, i.e., adaptive filters to eliminate echoes on phone lines. Widrow and Hofl’s paper [19] is considered to be ‘prophetic’. They suggested several practical implementations of their ADALINE: “ If a computer were built of adaptive neurons, details of structures could be imparted by the designer by training (showing it examples what he would like to do) rather than by direct designing. ” Then came the so called ‘Dark Age’ for neural network research. In 1969, Marvin Minsky and Seymour Papert published a book [20], which condenmed the Rosenblatt's perceptron. The charge was that it could not solve any ‘interesting’ problems. This brought a halt to nmch of the funding for neural network research. Many researchers turned their attention to expert systems, which make decisions based on a set of pre- defined rules under certain conditions. 13 Some of the disappointed researchers continued working. These included James Anderson, a neurophysiologist at Brown University, who developed a network model with the name of Brain-State-in-a-Box (BSB) [21]. Research also continued in Japan and Europe. Kunihiko Fukushima developed the Neocognitron, a neural network model for visual pattern recognition [22]. Teuvo Kohonen, an elecnical engineer at Helsinki University, independently developed a similar model to Anderson's BSB [23]. There were others, plugging away quietly in their labs: Grossberg, Rumelhart and McClelland, Marr and Poggio, Amari, Cooper, and many others. In 1982, John J. Hopfield at California Institute of Technology, published a paper reviving the neural network’s field [24]. His follow-on paper was published in 1984 [25]. With clarity and with mathematical analysis, he showed how neural networks could work and what they could do. The fixture of artificial intelligence started to brighten once again What had been thought to be the failures of artificial intelligence could now be overcome. In 1985, the American Institute of Physics began an annual Neural Networks for Computing meeting. In 1987, the Institute of Electrical and Electronics Engineers’ (IEEE) First International Conference on Neural Networks drew more than 1,800 attendees and 19 vendors. Later the same year, the International Neural Network Society (INNS) was formed lmder the leadership of Grossberg in U.S., Kohonen in Finland, and Amari in Japan. The International Joint Conference on Neural Networks (IJCNN), held in Inc 1989, produced 430 papers, 63 of which focused on the application development. The 14 January 1990 IJCNN in Washington, DC. included a concert of music generated by neural networks. Today, there are numerous joumals/magazines published monthly/quarterly on neural networks all over the world. A number of different conferences and symposia are held periodically and a large number of papers are presented by researchers on the various aspects and applications of the neural networks. 2.3 Artificial Neural Networks This section presents a brief review of biological and artificial neurons followed by the network architecture. Several learning rules are discussed. Based upon these rules, difl‘erent types of ANN models are introduced with a detailed mathematical treatment for the multilayer feedforward perceptron using backpropagation rule: the model subsequently used in the simulation of Digital Multilayer Neural Networks (DMNN). 2.3.1 Biological and Artificial Neurons 2.3.1.1 Biological Neuron Before we describe the building blocks and operation of an artificial neural network, it is instructive to briefly examine the corresponding components of the brain which inspired neural computing. The basic anatomical lmit in the biological nervous system is a specialized cell called the neuron. The brain is composed of about 1011 neurons of many different types. Figure 2.1 portrays a conceptual diagram of a typical 15 neuron. Most neurons can be divided into four distinct regions, each performing a Specialized function. A tree-like network of nerve fiber called dendrites are connected to the cell body or soma, where the cell is located. Synapse Dendme\ 1‘ 7n Hliiock /”'Axon SWWN Nucleus within the cell body Figure 2.1. Biological Neuron [26]. Extending from the cell body is a single long fiber called the axon. The axon eventually branches or arborizes into strands and substrands. At the end of these are the transmitting ends of the synaptic junctions or synapses, to the other neurons. Receiving ends of these junctions on other cells can be found both on the dendrites (1&3 in Figure 2.1) and on the cell bodies themselves (2 in Figure 2.1). The axon of a typical neuron makes a few thousand synapses with other neurons [26]. 16 are unidirectional in nature and determine the direction of signal flow. Axons are specialized for carrying signal towards other cells without much afl‘ecting its magnitude. Action potentials originate at the axon hillock, travel towards synapses and pass to other cells. Dendrites receive signals fi'om sensory organs or from the axons of other neurons, convert these signals into electrical impulses and transmit them to the cell body. The cell body receives signals independently either through dendrites or through synapses directly connected with it. If electrical impulses are greater than a certain threshold, action potentials ill the form of the pulse streams, are generated and actively conducted down the axon. The signal flow goes fi'om dendrites, through the cell body and out through the axon. The axon terminal from a presynaptic cell sends chemical or electrical signals through a synaptic gap. The signals are collected by a postsynaptic cell. Two types of synapses are believed to exist in a biological neural system; electrical synapses and chemical synapses. The two differ in both structure and fimction. Figure 2.2 shows the schematic of an electrical synapse. Cells commlmicating by electrical synapses are connected by gap junctions which allow an electrical pulse to pass fiom the presynaptic cell to the postsynaptic cell. Action potential is generated ill the postsynaptic cell. Chemical substances, called neurotransmitters, are responsible for transmitting the signals in chemical synapses [27]. Plasma Membrane Gap Junction Connection €-~a..-~_~.-.~r.m:-.. . .. w. .. a I ‘Postsynaptic Cell g 1 Figure 2.2. Electrical Synapse [1]. Two types of signals are generated in synapses: excitatory and inhibitory. In excitatory synapse, the signal from the presynaptic cell causes a change in the plasma membrane of the post synaptic cell that tends to induce an action potential. Action potentials are pulse streams with a pulse-width of about 1 msec. With an inhibitory synapse, a nerve impulse in a presynaptic neuron prevents the generation of an action potential by affecting the electrical properties of the post synaptic membrane. Excitatory and inhibitory stimuli often affect a single neuron in combination [1]. 2.3.1.2 Artificial Neuron An artificial neuron, a unit analogous to a biological neuron is referred to as a processing element (PE). Just as a biological neuron, the artificial neuron is a many-input 18 single-output processing element. Each input is weighted. The output is a linear or nonlinear function of weighted sum of its inputs. Figure 2.3 portrays a single artificial neuron. Figure 2.3. Artificial Neuron. The inputs and the weights on the inputs are represented as vectors, [x], xi, x,.] and [w,, W), wn], respectively. The total input signal is the dot or inner product of the two vectors and this is compared to a threshold value to determine the output. If the total input is greater than the threshold value, the processing element fires (excitatory), and if it is less, the processing element does not fire (inhibitory). Mathematically, if y is the output of a neuron and 6 is the threshold value, the required neuron transfer fimction can be expressed as l9 y=f[‘:xjwj 49) (2.1) j=i where fi-) is a linear or nonlinear threshold function. Some of the commonly used nonlinear functions are shown in Figure 2.4. The most pervasive threshold function is the sigmoid function because it is a bounded, monotonic and non-decreasing function that provides a graded, nonlinear response, most resembling a biological neuron. f(x) f(x) l A ll_ 1}.” i l/ r %x >x __l.1 . l Hard Limiter Threshold Logic J ¥ _,___,_]_1 V'x Sigrnoid Functions Figure 2.4. Sample Activation Functions. 2.3.2 Network Architecture So far a single neuron (processing element) has been discussed. These processing elements are combined together to form a layer of neurons whose cell body is called a 20 node. Inputs could be connected to many nodes with difi'erent weights. Each node has an individual output. A neural network layer is shown in Figure 2. 5. W W "’21 W x2 >@———o y, W2", 22 W"! n2 x" wnm a' f(.) > yin Figure 2.5. Neural Network Layer. Several layers can be interconnected to form a neural network. The layer that receives the inputs is called the input layer or input buffer. The network output is generated from the output layer or the output buffer. The intermediate layers are called the hidden layers. The network is said to be fully connected if every output fi'om one layer is passed along to every node in the next layer. Figure 2.6 shows a fully connected network. '.il i i. I-Iiddar Layers Figure 2.6. A Fully Connected Neural Network. 2.3.3 Learning Rules The main characteristic of a neural network is its ability to learn. Artificial neural networks are constructed according to some learning rule used to change the interconnected weights. In a very broad sense, neural networks employ two modes of operation. In the first mode, the operational mode, a neural network computes the output for a given set of inputs. This mode assumes that the interconnected weights are fixed to some desired values. The other mode, called the learning mode, the interconnected weights are adjusted using an appropriate learning law with the objective of training the neural network to give a desired output. In most cases, the operational mode is followed by the learning mode, that is, first the desired accuracy is achieved through the learning process and then the neural network is used in the operational mode. 22 The weights can be dynamically adjusted to produce a given output. This dynamic modification of the weights is the very essence of the learning process. Learning is the process by which a neural network modifies its weights in response to the external inputs. The equation that specifies this change is called the learning law, or the learning rule. Before discussing various learning rules, learning types, that is, supervised, unsupervised, and reinforced learning are defined. Supervised learning requires a teacher. The teacher may be a training data set or an observer who grades the performance. In either case, the teacher is used to form an error signal which is used in some kind of law to update the interconnected weights of the neural network. When there is no external teacher, the neural net rmrst organize itself by utilizing some internal criteria designed into the network This type of learning, called lmsupervised learning is based on the input/output values of the neural net. This kind of learning is often used in the feature extraction applications. For reinforced learning, the weights connected to a neuron are not updated in direct or inverse proportion to the error signal of that particular neuron, but instead they are updated in proportion to some type of global reinforcement signal. A binary or a trinary neural network is an example of such type of learning. In a binary neural net, weight update can only take two values (+1 or -1) whereas in a trinary network three update values are allowed (+1, 0, -1). Many learning laws are in the common use. Research has continued, however, and new ideas are being tried. The definition of a few learning laws, generally omitting the mathematical details, is given here. 2.3.3.1 Hebb's Rule The first rule was introduced by Donald Hebb. He stated in his book [16]: “ If a processing element receives an input from another processing element, and if both are highly active (mathematically have the same sign), the weight between the processing elements should be strengthened ” 2.3.3.2 The Delta Rule This rule is also known as the Widrow-Hofi’ Learning Rule. The Delta Rule is based on the idea of continuously modifying the strengths of the connections to reduce the difi’erence (delta) between the desired output vahle and the current output value of the processing element. If there is no difference between the two, no learning takes place. The rule for changing the weights, following the presentation of the input/output pair p, in case of no hidden layers, is given by A pwU = I] (th — Ori)ipl = "as! is" (2'2) where t” is the target input for the jth component of the output pattern for pattern p, 0,,- is the jth element of the actual output pattern produced by the presentation of input pattern p, i,- is the value of the ith element of the input pattern, 6,, = ty- - OH, and Apr is the change to be made to the weight fiom the ith to the jth lmit following the presentation of the pattern p. Widrow and Half used this rule in their ADALINE model [28]. 24 2.3.3.3 Gradient Descent Rule In this rule the weights are modified by an amount proportional to the first derivative of the error with respect to the weight. IfWg' is the connection weight between the neuron i and neuron j, and Ep is the error cost filnction for the pattern p; the gradient descent rule states DE A Fwy cc #6:: . (2.3) This corresponds to performing the steepest descent on a surface in the weight space whose height at any point in the weight space is equal to the error measure [3]. Due to a mathematical approach to minimize the error between the actual and the desired outputs, the gradient descent rule converges to a point of stability very slowly. 2.3.3.4 Kohonen's Learning Law [2,29] This rule is based on the unsupervised learning approach. In this procedure, the processing elements compete fer the opportunity of learning. The processing element with the largest output is declared the winner and has the capability of inhibiting its conrpetitors as well as exciting its neighbors. Only the winner is permitted an output, and only the winner plus its neighbors are permitted to adjust their weights. Further, the size of the neighborhood can vary dluing the training period The usual pattern is to start with a larger definition of the neighborhood, and narrow it as the training proceeds. 25 2.3.3.5 Grossberg Learning Rule [2] In the Grossberg's model, every neural network is made up of the instars and the outstars. An instar is a processing element receiving many inputs and an outstar is a processing element sending its output to many other processing elements. The connections allow recall of a concentrated image fiom a single outstar node. The pattern is stored distributively. If both the input and the output activities of a node are high, the weights change significantly. If either the total input or the output is small, the change in the weights is very small, and the weights could even approach to 0 on unimportant connections. Time is important in Grossberg learning. If an input stimulus is removed over time, the output response reduces as forgetting sets in The network threshold is important: if set too high, the network responds to every little nuance and if it is set too low, the network ignores too much. Learning is turned ofl’ during any recall. 2.3.3.6 Backpropagation Learning Rule The backpropagation of errors technique is the most commonly used generalization of the Delta Rule. This technique involves two phases The first phase, called the forward phase, occurs when the input is presented and propagated forward through the network to compute an output value for each processing element. For each processing element, all the current outputs are compared with the desired output, and the difi’erence, or the error, is computed. In the second phase, called the backward phase, the recurring difference computation (from the first phase) is performed in a backward direction. The new inputs can only be presented when these two phases are complete. 26 Generally this technique is applied to the hierarchical networks involving hidden layers. At the output layer, the actual output from each node and the expected output is known. The ‘magic’ of this rule is, how the adjustment of weights take place in hidden layers [2]. There are several important drawbacks: backpropagation is very slow, requires much off-line training, exhibits temporal instability (can oscillate), and has the tendency to get stuck at a local minima (see Section 4.3). 2.3.4 Neural Network Models There are basically three types of neural network models: feedback models, feedforward models and recurrent models. A brief description of these models is presented here. A detailed mathematical treatment has been given to the multilayer feedforward perceptron model to facilitate its subsequent use. 2.3.4.1 Feedback Model In feedback neural networks, the neural elements are connected to one another by the feedback paths fiom outputs to the inputs of the neural elements. A key issue for these networks is the definition of an energy flmction which always decreases during the dynamical evolution. The Hopfield model is a typical example of the feedback models. The Hopfield model is a one-layered feedback network which consists of interconnected nonlinear neurons Many implementations have been built using this model One version of the original net [2,24] which can be used as a content addressable 27 memory is described here. This net, shown in Figure 2.7, has N nodes containing hard limiting nonlinearities and the binary inputs and the outputs taking on the values +1 and - l. The output of each node is fed back to all the other nodes via the weights denoted Wg'. OUTPUT S (validafterconvergence) yo yr ‘ yn-Z yl-I l l x1 x2 ‘ x54 xii-l INPUT S (appliedattime zero) Figure 2.7. Hopfield Neural Network Model [2]. The weights are first set from exemplar patterns for all classes using Ail x‘x", ...... i¢ ' wu= ..o ‘ 1 J (2.4) 0,W.mi=j,ogi,jsM-1, where Wg is the connection weight fiom node i to node j. xr’, which can be +1 or -1, is the element i of the exemplar for class 5 and M is the total number of classes. An lmknown pattern is imposed on the net at time zero by forcing the output of the net to match the unknown pattern, i.e., 28 u,(0)=x,, --------- osisN—l, (2.5) where u, (t) is the output of node i at time t and x, which can be +1 or -1, is element 1' of the input pattern. N is the total number of nodes. Following this initialization, the net iterates in descrete time steps expressed using the formula u,(t+l)= f,[§wuu,(t)], ------ OS j_<. M-l, (2.6) where fl. is the hard limiting nonlinearity. The process is repeated until node outputs become constant on successive iterations. The node outputs then represent the exemplar pattern that best matches the lmknown input. The pattern specified by the node outputs, after convergence, is considered the net output. Hopfield has proven that the net converges when the weights are symmetric (wg=w,-,-) and the node outputs are updated asynchronously using the equations mentioned above [24]. Hopfield also demonstrated that the net converges when the graded nonlinearity, similar to the sigmoid nonlinearity in Figure 2.4, is used [25]. The Hopfield model has also been applied to the combinatorial optimization problems where it converges to a good solution in a few time constants [30,31]. When the Hopfield network is used as an associative memory, the net output after convergence is used directly as the conrplete restored memory. While applying the Hopfield model as a classifier, the output afier convergence has to be compared to the M exemplars to determine if it matches an exemplar exactly. If it does, the output is that class whose exemplar matches the output pattern. Ifit does not, then a ‘no match’ result occurs [2]. 29 2.3.4.2 Feedforward Model Feedforward networks were first studied in detail by Rosenblatt in the early 1960's [32]. A number of algorithms for training of these networks have been developed since then. These networks can learn a set of input-output pairs as examples and can successftu generalize the learned patterns. Feedforward networks have been applied to pattern recognition, robotics, and control problems [33,34]. 2.3.4.2.1 Single Layer Perceptron This net generated much interest when developed initially because of its ability to learn to recognize simple patterns. A simple perceptron is a single layered feedforward neural network consisting of n inputs and an output layer. Figure 2.5 illustrates an example of a single layer perceptron. x.” is the ith element of the input pattern and yf‘ is the output of the neuron i when pattern p is presented to the network W5 is the connection weight between the neuron i and the jth element of the input pattern. If the number of patterns is p such that p = 1,2,...,p, the output of the output layer can be described by y,” = f Z-Ewy x; 44),] (2.7) ~ 1 y,” = f [’Z-gw, xj) , (2.8) 30 where x0“=l for all p and w,0=i9, is a bias. fl) is the continuous. nondecreasing and differentiable function. The system's performance is measured by the error cost firnction which is defined as 1 E=3§lEu é—Z Z“: (1,.“ - y,“)2 (2.9) é—zaltr - fl; mil where t,” is the desired output of the neuron i for the input pattern p. The connection weights, w”, are changed by the gradient descent rule (Section 2.3.3 .3) Aw . = -11— (2.10) U flé(’ru‘yru)f'[zwrj x51) (2-11) The simple perceptron can not solve problems in which input patterns are linearly dependent, and may offer alternate partial solutions. However, multilayer feedforward neural networks with nonlinear neuron elements can overcome this limitation [1]. 31 2.3.4.2.2 Multilayer Perceptron Multilayer perceptrons are backpropagation neural networks consisting of an input layer, an output layer and one or more hidden layers of neurons. Figure 2.8 depicts a neural network which has one input layer, 1, one output layer, K, and one hidden layer, J. The input layer, 1, consists of n. neurons, the output layer, K, consists of n], neurons, and the hidden layer, J, consists of n, neurons. Bias Inputs. +1 Ui-lC'UZ- . O : ° U . 2 T ' t P U T O O 8 Connecting wears . 3 . Weighs Input Layer. 1 Hidden Layer, I Output Layer. K Figure 2.8. A Three Layer Neural Network. The mathematical expressions for the network are "I 0,. =f(ZH,W,-,,) ,~ H,=1.0 whenj=0 (2.12) j=0 32 and H, =f(ZI.W.,) .- 1,=1.o wheni=0, (2.13) =0 where f (-) is a continuous differentiable function, m is the adjustable weight parameter connecting the jth hidden layer neuron to the kth output layer neuron, and W, is the adjustable weight parameter connecting the ith input layer neuron to the jth hidden layer neuron. The outputs, Or for the output layer K and H, for the hidden layer J, are calculated using equations (2.12) and (2.13), for k = 1,2,...,nr and j = 1,2,...,n,-, respectively. Yr." are the desired outputs for k = 1,2,...,nt. The objective is to minimize the difference between the desired outputs, Ytd, and the observed outputs, Or, for k = l,2,...,nr. Conventionally, the error cost function for each k is defined as 1 d 2 ek = 3(Yk-0k) for k = 1,2, ... ,nk . (2.14) The variable parameters in the neural networks are the connecting weights that can be adjusted by minimizing the cost function. Therefore, minimizing the cost filnction for the kth output node with respect to the weight connecting jth hidden node, produces 5et d 601' — - Y - 0 . 2.15 aij ( k k) 6W1} ( ) Substituting equation (2.12) in equation (2.15) yields aet aw}, "1 = -(Y:'0k)f'(zHiWflr)Hl ; Hi = 1.0 when j = 0. (2-16) j=0 33 The kth gradient of the error cost function, equation (2.15), with respect to the weight parameter connecting the jth hidden neuron to the kth output neuron is given by equation (2.16). Defining a. = (Yi-Ot)f'(iH,W,t) (2.17) 1'0 and substituting the value of 5., in equation (2.16) leads to be; am. = 43*,” (2.13) Backpropagation is a gradient descent learning rule that minimizes the error cost function, 8C1uation (2.14), according to the following formula 5 W,.ln+ 1] = w,.lnl+ n(- a 8* )+aAW,-,,[n]. (2.19) jk Substituting equation (2.18) in (2.19), gives the backpropagation law to update the weight parameters connecting the hidden layer to the output layer. The expression is given as W}, [n+ l] = W,,,[n]+ 175,, H). + aAW,, [n] (2.20) where n is the time index, n is the learning rate, a is the momentum factor, and AW,t[n] is the change in weight, connecting the jth hidden neuron to kth output neuron, in the last step. Equation (2.20) can be easily implemented for weights connecting the output layer to the last hidden layer, for example, in Figure 2.8 for layer K to layer J. However, equation (2.20) can not be used to update weights connecting any other two layers. For 34 example, weights, W9, connecting the input layer, 1, to the hidden layer, J, can not be adjusted using equation (2.20). The reason being that there is no known target response for the hidden layer neurons. However, the contribution in the total error by the hidden layer neurons at the output layer neurons must be propagated back. In Figure 2.8, weights W.) are updated reflecting back the error fiom all the output nodes to the jth hidden node. Mathematically, this is accomplished by the relationship W,ln+11 = Wanna-i?" in]. (2.21) i=0 r; In equation (2.21) the factor involving partial derivatives is calculated using the chain rule given by m a m (2.22) e__t_ i—(rt- 00’] E— 6W.- .2 slam-2 “To. a—H, 6W; 6H =-Z (Yt- 0..)0f’ (Em-Wow..- . i=0 aWy Substituting equation (2.17) in equation (2.22) gives in aek n. "" = 8 W . 2.23 gm- 6W.- Z... ‘ ”‘ ( ) From equation (2. 13), we have 6H}- - ' nl - f (Elm-)1. . (2.24) 5W.) i=0 Substitution of equation (2.24) in equation (2.23) gives as a n, m 2““ = -f’(Z/.W.,)I.-Z§.W,r. (2.25) k=0 aWU i=0 k=0 “Whiting 6,— = f '(21. W,-)Z6t W... (2.26) i=0 i=0 allows equation (2.25) to be written as "k 6e __L = ' i . 2.27 Finally, substituting equation (2.27) into equation (2.21) produces W,ln+ 1] = Wain1+ n6,1.— + aAngn] . (2.28) Similarities between equations (2.20) and (2.28) are obvious. In a more general form, the parameter adjustment between a node a and a node b can be given as Wn[n+ 1] = Wa[n]+ ”abSa + aAWalnl - (229) Here Sn, the symbolizing source, is the output of node a or an external input to the network and 8b is an associated error at the node b. 8;, can be calculated as 6b = Sh (Y: —Sb) (230) for the output layer nodes, and a, = 5,2 W,,5, (2.31) b for the hidden layer nodes. 36 2.3.4.3 Recurrent Model In recurrent networks, connections in either direction, i.e., between a pair of layers and within a layer to itself; are possible. The Boltzmann machine is a typical example of the recurrent networks with symmetric connections. The Boltzmann Machine is comprised of visible units (input units and output units) and hidden units [35,36]. Figure 2.9 illustrates the structure of a Boltzmann machine. The units are stochastic and take output value v.- = :1 with a certain probability. A sinmlated annealing procedure is used to achieve a global minimum Boltzmann machines have been used for pattern recognition and the optimization problems. However, these networks take a very long time to converge and their hardware implementation is not practical Visible Units (output) Figure 2.9. Boltzmann Machine. 37 2.4 Neural Network Implementation The preliminary efforts of neural network modeling were mostly theoretical since the implementation tools had not been developed with the same pace. As sophistication ill the field of computers progressed, many practical models have been presented. Software simulations have not only moved this fledgling technology along, but also provided an environment that nurtures testing and experimentation. The goal of many researchers has been, and still is, to find implementations which are fast and yet not too expensive. This generally means implementing in hardware, however, many current models have been sinmlated ill software using serial or parallel digital computers. To date a number of hardware and software prototypes have been built using electronic/optoelectronic techniques and employing various programming tools. Network implementations can be divided into two broad categories: hardware implementation and software implementation. This section briefly discusses these categories. 2.4.1 Hardware Implementation Hardware implementation can be divided into two categories based upon the method used to express the values within the network: analog, and digital 38 2.4.1.1 Analog Implementation Analog implementations are mostly based upon the physical properties of analog components such as the behavior of voltage/current in resistors, capacitors, and semiconductor devices. The interconnections in an analog neural network can be simple fixed value resistors representing the synaptic weight values. Such networks, however, are not of much practical significance since these networks can not be made programmable due to constant value synaptic weights. To design a programmable network, the synaptic weights have to be stored in a memory. A weight can be stored as the voltage difference between the two capacitors [37]. This technique, however, requires a refiesh circuitry to overcome the problem of leakage in the capacitors. Static memory cells have been used for storage of weight vahres [6]. This method is also not very eflicient since most applications require a higher resolution. The weights can also be stored digitally but this technique requires a D/A converter at each connection to perform analog multiplication of the stored weights with the input signal. Hardware implementations using analog computations are less expensive because the network operations can be performed using inexpensive hardware. This category however, sufi‘ers from a number of disadvantages, such as low accuracy and limited dynamic range, due to physical constraints of the analog components. The design flexibility in analog implementation is also strictly constrained because only mathematical flmctions resulting from physical principles are available. 39 2.4.1.2 Digital Implementation Since the artificial neural networks require a large amount of intercOnnections and computation, especially in training, the development of a specialized high performance processor is needed. Digital implementation is a reasonable alternative for realizing neural network VLSI processor. Several digital neural networks based on custom VLSI design have been developed where a neuron is a processing element consisting of computing units, registers, and a lookup table. Suzuki and Atlas mapped a network to an array of custom processors [38,39]. This ANN hardware has a high design flexibility, but sufl‘ers fiom a large hardware requirement. In order to achieve a high design flexibility and high chip density, digital networks using stochastic computing techniques have been proposed [9,10,13]. In this approach algebraic operations are replaced by stochastic processes using pseudo-random pulse sequences. Simple logic gates alongwith other digital components perform nmltiplication and nonlinear transformation of signals. The values for synaptic weights and the input operands are restricted in the range between 0.0 and 1.0 both for training and testing [1]. An operand x in the pulse mode representation is the probability of a pulse occurrence ill the corresponding binary sequence xm at each clock Stochastic computations using random pulse sequences inherently utilize concurrent processing in all synaptic and neuron elements. Use of simple logic gates allows high neuron density and compact network architecture. The network can be made programmable thus increasing the design flexibility. A pulse mode digital multilayer neural network has been developed by Kim [1]. Details of Kim’s model are presented in the next chapter. 40 2.4.2 Software Implementation Software implementation or simulations can be affected either on general purpose computers or on special purpose computers. General purpose parallel computers with a large number of processing elements are being used for ANN simulations. Processing elements commlmicate through a single high speed data path and may have a dedicated memory to store data. Warp machine was used to implement a backpropagation network using a systolic array of 10 processing elements. [40]. Forrest, et al., used a Distributed Array Processor (DAP) comprised of 4096 processors to implement a Hopfield network [41]. The DAP was able to perform 25 million additions per second General purpose parallel computers can solve problems of limited size within a reasonable time. But, as the size of ANN increases, the computational load increases beyond the acceptable bounds and these computers take hours to solve a given problem. Special purpose computers designed for ANN simulations are called neurocomputers. Neurocomputers are attached to a host computer as coprocessors and are controlled through a user program Mark III and Mark IV neurocomputers were developed by TRW [42]. These machines were able to process up to 450,000 and 5,000,000 interconnections per second respectively. 3F the CHAPTER 3 DIGITAL MULTILAYER NEURAL NETWORKS 3.1 Introduction Artificial Neural Networks present a practical approach to solving many computationally intensive and ill-defined problems such as pattern recognition, optimization, and other complex information processing tasks. Kim [1] has developed a new architecture for a digital feedforward neural network A statistical model of the digital multilayer neural network (DMNN) is also evolved based on stochastic computing. Kim presents a compact and expandable pulse mode DMNN architecture employing simple logic gates and modular design techniques. The colossal parallelism embedded in the stochastic computations, using random pulse streams, is thoroughly explored with this architecture. Kim makes use of VI-IDL hardware description language to model and simulate all the components of the DMNN coprocessor. Kim also demonstrates the applicability of his model by using benchmark and character recognition problems [1,43]. This chapter presents a brief overview of Kim’s work with a view to develop a thorough understanding of the DMNN. 41 3.2 l 3.2.1 111173. This art}: Hit usir. 0a: m; 42 3.2 Stochastic Computing in DMNN 3.2.1 Generating Probability Stochastic computing techniques, using random pulse streams, were proposed in the 19605 [44,45]. In stochastic computations, the operands are normalized and represented by probabilities which are actually encoded in random pulse streams. Probability is estimated as a relative frequency of 1 pulse occurrences in a finite but long pulse stream. Since the probability can not be measured exactly, errors by estimation are introduced in the form of variance when the stochastic computing techniques are used. This idea has been used as an alternative to the deterministic computations in the area of artificial neural networks since late 19805. The main reason is that stochastic computing using random pulse sequences shares important characteristics with ANN dynamics, i.e., network performance depends on the collective properties of the network where each processing element does not necessarily perform correct computations. Stochastic computing techniques require that values of all the operands lie between 0 and l. The fiactional numbers represented by the probabilities are encoded in the random pulse streams. Ifthe fractional number is stored in an n-bit register, the resolution is 1/(2"-1). The random pulse streams corresponding to a fractional number can be generated by comparing the number with a pseudo random number. The pseudo random number can be generated fi'om a PN sequence by taking all the bits of a tapped Linear Feedback Shift Register (LFSR) in parallel. An n-bit LFSR is shown in Figure 3.1. 43 2 3 -- - n-l n ——-—-> MSB ---- LSB Figure 3.1. Block Diagram of a Linear Feedback Shift Register (LFSR) [1]. Fractional numbers from 1/(2"-l) to 1, equally spaced by l/(2"-l), are generated exactly once in a period (2"-l). The distribution of the pseudo random number is close to an ideal uniform distribution. Figure 3.2 shows the diagram of a Random Pulse Generator (RPG) for a fractional number x. The generating probability x is the probability of pulse occurrence in the corresponding random pulse sequences x(,,) at each clock pulse. x is estimated in a sampling clock period. The sampling clock period is defined as finite clock periods taken for the estimation of x. Digital Comparator clock LFSR Figure 3.2. Random Pulse Generator (RPG) [1]. 44 Error (noise) is involved in estimating the generating probability in finite clock periods. Estimate it is taken as the original signal plus random noise. The generating probability of the random pulse generator has been modeled as a binomial distribution in the literature [45]. However, the pulse occurrence in x(,,) is not perfectly independent because a maximum length LFSR generates fractional numbers between l/(2"-1) and I .such that each number occurs exactly once in a clock period. Accordingly, the pulse occurrence in x(,,) has statistical dependency. The sampling distribution of x can be modeled closely by a hypergeometric distlibution. The details of this statistical model are given in [1]. This new statistical model has been used to perform an analysis on random noise effects in digital multilayer neural networks (DMNN). 3.2.2 Basic Stochastic Computing Stochastic computing exploits the similarities between probability algebra and Boolean algebra. A random pulse sequence x(,.) is a sequence of pulses whose probability x can not be measured during any one clock period. It can, however, be approximated by a measurement of the average pulse rate. Any Boolean operation over individual pulses corresponds to an algebraic operation among variables represented by their respective average pulse rates [45]. If two sequences rm and yo.) are statistically independent, the probability of pulse occurrence in an output sequence 20., of an AND gate is z = P[z(,,, = l] (3.1) = P[x(n) = 1 AM») = 1] (3-2) 45 = P[x(,,, = 1112mm: 1] (3.3) =xy, ‘ (3.4) The probability of pulse occurrence in an output sequence 2,,” of an OR gate is z =P[z(,,)= l] (3.5) = Pixel = 1 Vym = 1] (3-6) = Pl:x(r1)=1]+ PLVm:1]'P[x(n1=1/\y(n)=1] (3-7) =x+y-xy. (33) Instead of being statistically independent, if the two sequences are mutually exclusive, i.e., x y = O, the logical OR performs a direct summation. A NOT gate produces an output pulse whenever no input pulse occurs. If x(,,) is the input sequence of a NOT gate, the probability of pulse occurrence in the output sequence 2(a) is Z =P[Z(n) = 1] ' (3.9) =1 -P[x(,,)=l] (3.10) =1 -x. (3.11) Examples of stochastic computations utilizing the duality between Boolean and algebraic operations can be found in [45]. 46 3.2.3 Stochastic Computing in the DMNN Neural operations in the DMNN are performed with basic gates using pulse sequences as inputs. Let w,-,- and v,- be the connection weight between neurons 1' and j, and the neural activation of neuron j, respectively. If two sequences WW.) and v,~(,,) are statistically independent, the probability of pulse occurrence in an output sequence mm") of an AND gate is m.) = P[m,-,(,,) =l ] (3.12) =P[w,«,(,,) = l /\ v10.) = l] (3.13) =leyln1= 1 IPLVJ'OI) = 1} (3-14) =w,-j 12,-. (3.15) Input summation and nonlinear transformation is performed simultaneously using a logical OR operation. The inputs of an OR gate are product sequences, mgr”), produced from AND gates. Two kinds of synaptic weights are necessary for most feedforward networks: positive or excitatory weights, Wy‘+, and negative or inhibitory weights, w,,-'. Two separate OR gates per neuron are, therefore, necessary to form excitatory and inhibitory net inputs. 47 mum W 9 1'0!) . ”3— .. in H H - ”1(a) I ’ (a) netw min») netw L] n H H : minor) F t (b) ”1(a) net ‘ (a f I _.._>— .n n m n (C) Figure 3.3. Stochastic Computation in the DMNN: (a) Stochastic Multiplication; (b) Logical OR; (c) Neural Activation [1]. Let net} be the probability of a pulse occurrence in the output sequence net“. f of an n-input OR gate for excitatory net input in neuron i, and, net; likewise for an inhibitory net input (Figure 3.3). net,+ and net; can be described as net,+ = P(net,‘:,,) =1) (3.16) = P(m,*,(,, = wing“, = 1v vm;(,, = 1) (3.17) =1-(1-P(m;;(,, = l))(1-P(m,*,(,, = 1)) ...(1—P(m;(,,) = 1)) (3.18) =1-fi(1-m;) (3.19) j-l fl! A.|V LU‘ 48 :1— (l—w’v) (3.20) U 1 and net,“ = l- (1+w’v ). (3.21) U 1 \. 11‘ Two net inputs, formed by dedicated OR gates are ANDed together to form the activation function. The probability of a pulse occurrence, v,, for two statistically independent sequences netf and net,’, is given by =P(net+ =1Anet,}n)= ) (3.22) l(n) = net: (1 - netf) (3 .23) f n n = [l- 1:1(1—wyvj) 111(l+w,jvj) . (3.24) The nonlinear activation filnction in equation (3.24) is both continuous and differentiable and therefore a good candidate for backpropagation training [3]. This form of stochastic computation has been used by Kim [I] in the development of the DMNN architecture, a subject for the next section. A detailed mathematical derivation of the backpropagation algorithm using the above stochastic filnction is covered in Chapter 4. TC 49 3.3 Hardware Implementation 3.3.1 Basic Computing Elements Kim developed a modular network architecture by developing the basic building blocks in the form of a Random Pulse Generator (RPG), a Synaptic element (SYN), an Input Neuron Body element (INB), and a Regular Neuron Body element (RNB) [1]. 3.3.1.1 Random Pulse Generator (RPG) The RPG is comprised of a tapped LFSR and a digital comparator. The tapped LFSR is constructed using D flip-flops and XOR logic gates [1]. Figure 3.4(a) shows an 8th order LFSR, with a feedback function f(x) = x2 $ x3 9 x4 $ x3, as implemented by XOR logic gates. The period of the sequence v(,,, is 28-] = 255. Figure 3.4(b) presents an RPG. A logic ‘1’ or ‘0’ pulse is generated at every clock pulse for v, 2 x and v, < 3:, respectively. 50 ’l l ‘A l at i , 1 ; ir—DDFF—'TDli’Fe-Qi— ——————— DFF I DI-‘lT—-Q:—JDFF—-Ol . l | l g clock 1,4; ,1+l-l . i i i i v r _< f(x)=2$x369x4etxg > Vt clock (LFSR x Figure 3.4. (a) An 8-bit LFSR with a Primitive Polynomial; (b) An RPG for v,- [1]. 3.3.1.2 Synaptic Element The synaptic element (SYN) consists of an RPG, a weight register, two AND gates, and two wired-OR lines [1]. Figure 3.5 shows the structure and block diagram of a digital synaptic element. Weight w.) represented as an r-bit fractional number is loaded into a weight register and the corresponding random pulse stream WW, is generated through the RPG. The pulse stream is transmitted to the two AND gates for discrimination of positive and negative weights. If the synaptic weight is positive, a resulting product sequence mm"; is transmitted to an excitatory net-input line. Otherwise, mm"; is transmitted to an inhibitory net-input line. 3.3.] 51 sign-bit ’ ‘ ~ \».\ + W” Ma ) \ m U (I) ‘ load Register > L > 7 select _—/ / magnitude \\\ ., m ”m. 3 gfint°;:f wliil) 0— > E» 1‘ clock 1 LFSR I// g. // 1 fi/ RPG v“" v v (a) l i i , :11 —> SYN I: mum in) + mill-1' (b) Figure 3.5. (a) A Synaptic Element; (b) Block Diagram of SYN [1]. 3.3.1.3 Input Neuron Body Element (INB) An INB consists of an n-bit register and an RPG [1]. The INB converts the value of the ith element in an input pattern, v,-, to a corresponding random pulse sequence van). Figure 3.6 shows the structure and block diagram of the INB. 52 select V. select vI Io“ b't R ° t l l C'OCK - ll - I egrs er I d , OI ——. L LFSR l vii-1 RPG v1(a) (t) (b) Figure 3.6. (a) Input Neuron Body (INB) Element; (b) Block Diagram of INB [1]. 3.3.1.4 Regular Neuron Body Element (RNB) An RNB consists of an AND gate, an OR gate, an up-counter, a 2x1 multiplexer, a bufier, and an RPG[1]. Two net-input pulse streams, transmitted fi'om synaptic elements, are collected in an up-counter through an AND gate to form a neural activation. Figure 3.7 shows the structure and block diagram of an RNB. v,- is estimated as v, which is actually the value of the up-counter after each iteration. The signal new_iter changes from ‘0’ to ‘1’ after each iteration and transfers the output of the counter to a buffer via a 2x1 multiplexer at the next clock. At the same time, the up—counter is reset. This output is used to generate a new action pulse sequence A 15-0,, while the up-counter continues to accumulate incoming pulses. v , (dotted line) is used (4.1 ila; [lag 53 as an output of a neuron i in the output layer, while van) (solid line) is used in the hidden layers. netm’ netm' lock [ P V new_iter c l U' "m i - load data 1 | Multiplcxcraxl) )——i (L) ““x , I) select ‘ 4| Bu fl’er i I d‘“ net“) 5 ; clock new lter select m load- v“) v. (a) (b) Figure 3.7. (a) A Regular Neuron Body (RNB) Element; (b) Block Diagram of RNB [1]. 3.3.2 DMNN Architecture The DMNN has been constructed using four basic modules: Input Layer Module (ILM), Synaptic Array Module (SAM), Regular Neuron-body Array Module (RNAM), and Interconnection Module (ICM) [1]. The ILM is composed of a group of INB elements. It receives input and transforms them into corresponding binary pulse sequences. These pulse sequences are transmitted to the SAM through the ICM. The SAM consists of a group of synaptic 54 elements and net input lines. All synaptic multiplications in the same layer are performed simultaneously. The ICM is a group of connection lines that transmit action pulse sequences fiom the previous layer to the SAM in the next layer. Pulses on two net-input sequences transmitted from the SAM are collected at the RNAM. All v,(,,,’s from the RNAM are produced simultaneously. Using these modules, any size of DMNN, with any number of neurons in each layer and any number of layers, can be configured. Figure 3.8 portrays architecture of the DMNN. Input i Layer I l ILM l i W data ICM >'<| m—— net_ioad l r l Output I l SAM 1 1 rd——— select Layer 1 ‘ | RNAM ] . L h :14— new_lter Figure 3.8. DMNN Architecture [1]. 3.3.3 DMNN Coprocessor The DMNN coprocessor is composed of a controller, a memory, an iteration counter and a clock generator. Figure 3 .9 shows the architecture of a DMNN 55 coprocessor. The controller consists of a microprocessor and a control unit which may be either programmed or hardwired. The training of the network is performed on a host computer. The trained weights, network configuration, input patterns and some control commands are downloaded from the host memory. clk_enable ‘ d°°k new it“ Coprocessor —"l_1£l—f—' wan... elk net_loadj . contro . net_enable ”" /1——'\ Host m \luj/ . I address DMNN 4 ) address Memory \t——t/ Figure 3.9. DMNN Coprocessor [1]. 3.4 Behavioral Model of a DMNN Coprocessor 3.4.1 Design Methodology The design of any hardware typically starts from a gate or circuit level schematic. However, as the system becomes complex, a top down approach is needed to reduce the development cost as well as the design cost. The top down design starts with a high level 56 specification which is subsequently reduced to lower level specifications in a hierarchical manner. Hardware Description Languages (HDL) are very important tools for a high level or top down design [46]. VHSIC Hardware Description Language (VHDL) is a typical behavioral description language which is semantically oriented for digital systems. VHDL can be used to model the behavior of systems and simulate them to verify the design. Modeling involves specifying the inputs and outputs of a device, and describing its behavior and/or structure [47]. A top down approach was needed to model the DMNN coprocessor. Kim used VHDL to model the proposed DMNN architecture. The coprocessor model starts with a high level specification of the network which is decomposed into lower level specifications in a hierarchical fashion [1]. Figure 3.10 shows the design hierarchy of the DMNN coprocessor. The DMNN itself consists of three layers: input layer, hidden layer, and an output layer. Each component is described in VHDL using the two styles of description: behavioral and structural. The components in the lower branches of the hierarchy are modeled using behavioral descriptions and all the other components are modeled using structural descriptions. 57 DMNN Coprocessor lNBs SYNs RNBs SYNs RNBs {gm {nor} {cw} gm; {RG'rfs {CMP} €30de- {RGT {SW} {Ms} 59W} ----------------------------------------- 3.4.2 Step 1. Step 2. Step 3. Figure 3.10. DMNN Structural Hierarchy [1]. Coprocessor Control Kim’s approach for modeling the DMNN coprocessor is elaborated below: The network is trained using software simulations and the backpropagation rule. (A detailed derivation of the backpropagation algorithm, using Kim’s stochastic model, is presented in Chapter 4). The trained weights, input patterns and the test patterns are written in files. The DMNN coprocessor reads the weights file and the file for test patterns. The coprocessor initializes the LFSRs in the network and loads synaptic weight registers with the trained weight values. 58 Step 4. The coprocessor initiates the controller to start generating the control signals. Step 5. The DMNN starts classifying the input patterns and writes the output patterns into a file. The complete VHDL code for the DMNN coprocessor is given in [1]. 3.5 DMNN Applications 3.5.1 Application Methodology Binary classification problems are applied to the DMNN coprocessor as test bench problems. The sampling data, comprised of the training data and the testing data, is selected first for each problem The DMNN is trained off-line on a host computer using the training data. The training is carried out using the backpropagation algorithm (Chapter 4). Afier the training is complete, the network architecture is setup and the test data is applied to the DMNN coprocessor. The performance of the DMNN is evaluated based on the test results. Each pattern is represented as an array of binary numbers. The number of neurons in the input and output layers of the DMNN are directly dependent on the nature of the problem The number of neurons in the input layer are one more than the number of elements in the pattern vector. The additional input is used to provide a permanent I as a threshold input to the synaptic elements located in the upper layers. The number of neurons in the output layer is the same as the number of categories into which the sampling patterns are to be classified. 59 Kim’s results show that two layers (one hidden layer) are enough for a binary classification. He has also shown that a relationship exists between the existence of solutions, the value of the sum squared error and the ratio of input vs output values. Kim has suggested the ratio value as 0.1 and used 0.55 and 0.45 as target values to represent the ON and OFF states of the output neurons. The network configuration is a matrix that represents the interconnection of all the neuron elements in the network. It can also be represented as no x n, x n; . . . x n,,, where no denote the number of elements in the input layer, n1, n3, . . . ,n,;, denote the number of elements in the hidden layers and n, represents the number of elements in the output layer. A minimal network configuration is defined as the network that converges to an error state equal to or below the predefined error threshold with a minirmrm number of neurons in the hidden layers and minimum number of hidden layers. 3.5.2 Benchmark Problems Two benchmark problems: the XOR problem and an 8x3 encoding problem were applied to the DMNN coproceSsor to test its ability to classify linearly unseparable patterns and whether it can work as a data compressor. 3.5.2.1 XOR Problem Solver The Exclusive-OR (XOR) problem is a dual input single output problem with linearly unseparable patterns. Figure 3.11 displays an XOR problem The nonlinearity associated with the XOR is also represented graphically. A network configuration of 60 3x2xl was considered. Table 3.1 represents XOR problem for a DMNN showing four input patterns and their corresponding target patterns. The DMNN was able to classify the patterns correctly when the register length was more than ‘6’. Values below ‘6’ degraded the network performance considerably. Truth Table - XOR I1 I2 0 0.0 0.0 0.45 0.0 1.0 05 5 1.0 0.0 0.55 1.0 1.0 0.45 I2 (0,1) 0 (1,1) x > (0.0) (1.0) I] Figure 3.11. The XOR Problem Table 3.1. An n-bit Two-layer DMNN XOR Results [1]. 0 i I. I; n=10 n=9 n=8 n= n=6 n= 0.0 0.0 0.451 0.456 0.461 0.468 0.469 0.500 0.0 1.0 0.549 0.556 0.563 0.563 0.688 0.563 1.0 0.0 0.549 0.556 0.570 0.578 0.719 0.563 1.0 1.0 0.457 0.449 0.445 0.484 0.625 0.438 61 3.5.2.2 8x3 Encoder In order to see the behavior of DMNN as a data compressor, an 8x3 encoding problem was applied. A minimal network configuration of 9x3x3 was considered. Figure 3.12 portrays an example of the 8x3 encoder. The problem is represented in Table 3.2. The DMNN successfully encodes the 8-bit data when the register length is more than ‘8’. Figure 3.12. 8x3 Encoder. Table 3.2. Truth Table for an 8x3 Encoder [1]. Ir I: la 14 Is 16 17 111 or 02 03 1.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.45 0.45 0.45 0.0 1.0 0.0 0.0 0.0 0.0 0.0 0.0 0.55 0.45 0.45 0.0 0.0 1.0 0.0 0.0 0.0 0.0 0.0 0.45 0.55 0.45 0.0 0.0 0.0 1.0 0.0 0.0 0.0 0.0 0.55 0.55 0.45 0.0 0.0 0.0 0.0 1.0 0.0 0.0 0.0 0.45 0.45 0.55 0.0 0.0 0.0 0.0 0.0 1.0 0.0 0.0 0.55 0.45 0.55 0.0 0.0 0.0 0.0 0.0 0.0 1.0 0.0 0.45 0.55 0.55 0.0 0.0 0.0 0.0 0.0 0.0 0.0 1.0 0.55 0.55 0.55 62 3.5.3 DMNN Character Recognizer The DMNN was used as a character recognizer for solving two problems: a five- digit problem for classifying digits 0 to 4 and a ten-digit classifier for digits 0 to 9. The network was trained with two sets of training data for each prOblem The first set consists of ideal digits and the second set contained 10% noisy patterns (three patterns per digit). An additional set of 20% noisy patterns (three patterns per digit) was added in the test data set. For the five-digit classifier, each of the five digits were represented by a matrix of 6x4 pixels whereas the pixel size was increased to 7x5 ill the ten-digit problem Figures 3.13 and 3.14 depict the pixel images for the five-digit and ten-digit problems. Kim compared the performance of the DMNN with an ordinary multilayer neural network employing a sigmoidal filnction. Typical results of his comparison using a 5-digit classifier are shown ill Table 3.3. The ‘*’ values indicate minimal network configuration and N1 and N2 are the number of neurons in the first and second hidden layers respectively. Each figure in the table is the average value based on ten different initial sets of weights. A learning rate (7]) = 0.2, momentum factor (a) = 0.5, and sum squared error (ss-error) = 0.0001 are used for the DMNN while 7] = 1.0, a = 0.5, and ss-error = 0.01 are used for the ordinary neural network. Lower values of 17 and ss-error used for training of the DMNN are due to the constraints in its operation range. Figure 3.14. lO-Digit Classifier Patterns: (a) Ideal; (b) 10%Noisy; (c) 20% Noisy [1]. 64 Table 3.3. Average Number of Iterations for 5-Digit Classifier [1]. DMNN ANN Trained with a = 0.0 Trained with a = 0.5 Trained with a = 0.5 N 1 N 2 No of iterations N I N 2 No of iterations N I N 2 No of iterations *4 0 2086 *4 0 1738 ‘5 0 13311 10 0 420 10 0 293 10 0 17212 30 0 332 30 0 182 15 0 11905 50 0 274 50 0 135 25 0 14108 6 4 12263 6 4 7482 8 4 79320 8 6 10446 8 6 4959 8 8 35480 10 8 7427 10 8 2972 10 8 44520 12 10 6135 12 10 1820 12 10 55025 Kim’s results show that the DMNN network can be trained with a significantly reduced number of iterations compared to an. ordinary ANN. The fast convergence obtained in the DMNN is due to the fact that the vahles of the output neurons can take on intermediate vahles between 0 and 1 as target values. The number of iterations generally increase with the number of hidden layers and decrease with the increase in the number of neurons in the hidden layers of the DMNN. Performance results for a 5-digit recognizer are tabulated in Table 3.4. A DMNN model with 8-bit and 9-bit register lengths and a C simulated DMNN are compared for nrisclassification errors. The misclassification occurs when no neuron or more than one neurons are turned on during the test mode. Each figure is based on an average result of 20 tests and for each test a new set of synaptic weights are loaded The results indicate that a 9—bit character recognizer behaves very close to a deterministic simulation It is also noted that the performance of DMNN improves when training data contains noisy patterns. 65 Table 3.4. Performance of 5-Digit Classifier [1]. Trained with ideal digits Trained with ideal + 10% noisy data MisJassi/ied error Misclassified error Msclassi/ied error on 10% noisy data on 20% noisy data on 20% noisy data 8-Bit DMNN 32.2% 55.6% 38.7% 9-Bit DMNN 11.7% 30% 26.7% C-simulation 8.5% 43% 35.3% Table 3.5 shows the performance results for a lO-digit classifier. The learning rate (7]) = 0.05, momentum factor (a) = 0.0, and ss-error = 0.005 are used. Only two layers are considered for this experiment and minimal network configurations are 36x9x10 and 36x30x10 for ideal set and ideal plus noisy sets of training data, respectively. The results are compared with those of the deterministic simulations and ordinary ANN classifier. Each figure in the table is the average vahre based on twenty sets of weights The classification rates obtained from the DMNN are close to the results of the deterministic sinmlations when the register length is greater than 8. Table 3.5. Performance of lO-Digit Classifier [l]. Trained with ideal digits Trained with ideal + 10% noisy data Miscfisification Misclassification Misclassification error on 10% noisy error on 20% noisy error on 20% data data noisy data TRW—WI . % 433% W—z .4 0 9-bit DMNN 9.7% 363% 20.3% ' C-simulation 6.7% 35.77% 13.5% L ANN 15.8% 30.8% I6.’7% ‘ 66 3.6 Performance Summary A pulse mode digital multilayer neural network architecture implementable with simple logic gates, developed by Kim, has been presented. The DMNN employs stochastic computing techniques and a modular architecture that is compact and programmable. All operations in a layer are performed in parallel and operations between two layers are performed in a pipeline fashion thus utilizing fill] parallelism. The processing speed depends only on the clock speed and the register length; not on the network size. Therefore, any size of network can be constructed by interconnecting the desired number of modules without compromising speed The DMNN has been applied successfully to binary classification problems such as XOR, 8x3 encoding, and character recognition. Kim simulated the DMNN in VHDL and found the performance comparable to deterministic DMNN simulations and ordinary backpropagation neural networks. CHAPTER 4 BACKPROPAGATION IN THE DMNN 4.1 Introduction The digital multilayer neural network is a feedforward neural network which can be trained with the backpropagation algorithm discussed in Section 2.3.4.2.2. Backpropagation performs an iterative gradient descent over a Stun-squared error measure. This chapter discusses the method of implementing the non traditional stochastic flmction, used by Kim (Chapter 3), into the backpropagation algorithm Kim, though he uses the backpropagation algorithm in his DMNN, does not provide the mathematical details of the C simulation program [1]. In order to understand his C code and to be able to develop an improved simulation technique, it is essential to derive the backpropagation equations. These equations are subsequently used ill the software sinmlation of the DMNN. The backpropagation technique is criticized and its inherent weaknesses are identified. Enhancements are suggested in the end to improve the performance of the DMNN. 67 4.2 7 4.2.1 This 11 deelc 511111112 preset 68 4.2 Backpropagation Implementation 4.2.1 Review of the Stochastic Function The stochastic filnction developed by Kim [1] has been presented in Section 3.1.3. This non-linear, non-decreasing, continuously difi‘erentiable function has been used in the development of a digital multilayer neural network. This section discusses a software sinnllation of the DMNN using the backpropagation algorithm discussed in Section 2.3.4.2.2. To review the stochastic firnction, a three layered feedforward neural network presented in Figure 2.8, is redrawn again as Figure 4.1 for ease of reference. m—ichH Input Layer, I dell Hiddur Layor,J OutputIayer,K Figure 4.1. Three Layered Neural Network. wr-lC'cv-lcio 69 In the above figure, v,- is the output of the ith input layer passive neuron, v,- is the output of the jth hidden layer active neuron, and vi is the output of the of the kth output layer neuron. w.) ’s are the synaptic weights connecting the ith input layer neuron to the jth hidden layer neurons and WI), '3 are the synaptic weights connecting jth hidden layer neuron to the kth output layer neurons. From Section 3.2.3, we know that the excitatory net input for the jth neuron in the hidden layer is given by net; = l-fi(l—w;v,) (4.1) 1.0 and the inhibitory net input for the jth neuron ill the hidden layer is given by Ill net; =l-H(1+w,;v,) . . (4.2) i-O In equations (4.1) and (4.2), v,- = l for i = 0, and n,~is the total number of neurons in the input layer. Similarly, the excitatory net input for kth neuron in the output layer is net; = l-fi(l-wj’;vj) (4.3) j-O and the inhibitory net input for the kth neuron in the output layer is net; =l-1gI(l+w;,vj) . (4.4) j-O 70 In equations (4.3) and (4.4), 13 = 1 for j = O, and nj is the total number of neurons in the hidden layer. Therefore the output of the kth neuron in the output layer is v, =nez;(1—nez;). (4.5) From equations (4.3) and (4.4) "J v, = [l—fi(l—w;kvj)] (l+w;,vj) (4.6) j=0 j=0 where v, is the output of the jth hidden layer neuron given by v) = net;(1-net;) . (47) Finally, from equations (4.1) and (4.2) v}. =[1-In](1-w;v,)]]31(1+w,;v,) . (4.8) i=0 i=0 4.2.2 Backpropagation Analysis The outputs, v), for the output layer and v,- for the hidden layer, are calculated using equations (4.1) and (4.3) for positive weights and equations (4.2) and (4.4) for negative weights. Let it be the desired outputs. The objective is to minimize the difference between the desired outputs, it, and the actual outputs vr. 71 Define the error cost firnction Er for each k as l 2 E, =—2-(z, - v,) . (4.9) Therefore, minimizing the cost function for the kth output node with respect to the weight connecting jth hidden node involves using as, _ 6E, av, '(410) + " . + 0W}. 0V. 5W). (E 6E 6v and f = ' o f . 4.11 aw... av. aw... ‘ ’ From equations (4.9) and (4.10) 6E, _ 6v, —aw;, - (r, v,) aw}; (4.12) and fiom equations (4.9) and (4.11) 6E, 6v, aw]; (t, -v,) aw]; (4.13) Applying the chain rule gives 6v, 6v, anet; —_+’ = , 0 + 4.14 aw, aim, 6w], ( ) and av, — av, am" (4.15) ___ _, _- aw, finer, aw, 72 From equation (4. 5), we find 5V. = 1- net' anet: " av and ’_ = «net,+ . anet, From equations (4.3) and (4.4), we obtain where,v,-=lforj=0. (4.168) (4.16b) (4.17) (4. 18) Now baclmacking by substituting equations (4.16), (4.17), (4.18) ill equations (4.14) and (4.15) and then substituting equations (4.14) and (4.15) ill equations (4.12) and (4.13) at appropriate places, we obtain BE 6,}; =-(t.-v.)(1-mt;)(1-mr:),_fj;,j 6E and aw}: = -(t,-v,)(net;)(1-net;) 1+3;ka . Let 8: =(t, -v,)(l-net;) and 6; = (t, —v,)(net;) . (4.19) (4.20) (4.21) (4.22) 73 Hence equation (4.19) becomes (915‘I v. = _ 5+ 1_ t+ J aw}; k( nek)l-W;kvj and from equation (4.20) aEt v} =— ' l— t‘ 6w}, 5"( ne ‘) 1+wj’,vj where, vj= l, forj = 0. (4.23) (4.24) Backpropagation is a gradient descent learning rule. For excitatory weights, it is given by as w;,[n+1] = w,’,[n]+n[- mi] and for inhibitory weights, by 6E w‘[n+1]=w‘[n]+n[- 1‘] ft 1k 6W.“ where n is the number of presentation and 77 is the learning rate. (4.23) and (4.24) in equations (4.25) and (4.26) respectively gives v] + l wj,vj w;,[a+ l] = w;,[n] +118: (l—acr;) v1 1+wfl‘vj and w;.in+11= w;.inl -n6;(1-net;) where,v,-= 1forj=0. (4.25) (4.26) Substituting equations (4.27) (4.28) 74 It is important to note that equations (4.27) and (4.28) are only valid for weights connecting the jth hidden layer neurons to the kth neuron in the ouput layer. Weights connecting any other two layers are updated by reflecting back the error or difference at the output neurons. Therefore, referring back to Figure 4.1, weights, w.) are updated reflecting back the error from _a_l_l the output nodes to the jth hidden node. . , + " 6E, That 15, w” [n+1] = w” [n] +1] — aw, k-i y " 6E and wgin+li=wginl+n[-X ’1) kn] my Using chain rule gives '6E,_ 6E, av anet; 6v .Ianet k-l aWJ- k-l av; .Ianet .Jav ”anet " 6E,_ 6E av, anet; 6v anet‘ .Wdr and Shimmy M aw; M av anet;.15v anet' aw- ' We knowthat %-_(tk—vk)s 67622;; -1 net; , and 6:2; ‘ -net;’ (4.29) (4.30) (4.31) (4.32) (4.33) (4.34a) (4.34b) 75 5E (3V Therefore, 6v: om=-(t, —v,)(l—net;) =—5; 5E 6v + - and Bf.-6n—eif=-(t*-v*)(-netk)=8* . Next, from equations (4.3) and (4.4) we write and — =—(l-net,) _ 6v] 5 etj (l-netj) d j t‘ anet“ J + l , = l—net , 6w” ( 1)1—wyv, d duet; (I 1‘ vi =- -ne an an»; 1 1+w;, (4.35) (4.36) (4.37) (4.38) (4.398) (4.3%) (4.40) (4.41) Substituting equations (4.33) through (4.41) in equations (4.31) and (4.32) at appropriate places, we obtain 76 ‘1 =— i 1— t? — 2’ - . ° 4.4 H 6w; m (6,)( ne L)1-W,',.", (1 mt} )(1 Iltt})1_w’;vl ( 2) n 5E), n “5): V =_ - _ - * — ‘ ' . 4. and M 6w; H (5,)(1 netk) ”wit", (net) )(1 net!) l+w,;v,. ( 43) Now, define the identities n W+ 5*: 1— r 5* 1— 1* —’-"-—— 4.44 J ( nej)k=l k( nek)l—wikvr ( ) w" 5'. = r: 5' 1— r "’ . 4.45 and ’ ne 1; "( ne ’) l+w;,vj ( ) Hence, equations (4.42) and (4.43) become " 6E, + + v, w aw: --5,(1—net,.) l_w,v (4.46) 11 u I " 6E d “ =—6'. 1— r: . . an H 6w; )( ne ’)1+wv’.v, (4 47) Substituting equations (4.46) and (4.47) in equations (4.29) and (4.30) respectively, we . + + + 4» vi obtaln w”. [n+ l] = w”. [71] +115]. (1 — net!) 1_ wfv (4.48) 1} l a -[ 11— -[ 1 sh -) v.- an w”. n+ —w,.j n +1] I —netj 1+ngr . (4.49) Finally, rewrite the above equations as Aw;[n+ l] = no; (1 - net;)-]-:i‘—+‘T (4.50) U 1 77 _ - _ v, and Awy [n+1]: 775}(1—netj)1+w;v, (4-51) where n is the number of presentation steps and Awy[n+ l] is the change in weight required for the next presentation. Flowcharts representing the implementation of DMNN using this backpropagation algorithm are placed in Appendix A. 4.3 Reviewing Backpropagation Philosophy We know that each input vector V,- in a backpropagation algorithm has a corresponding desired output vector T. In response to V.-, the system yields its own output vector Va. The backpropagation philosophy considers performance error to be the vector difference between V), and T, that is, error vector E is E: T- Vk = ( [(I'VkI], [trvkzL a [tn'vlm] ) ' (4°52) Minimizing the squared length of the error vector E amounts to minimizing the sum of the squares of each of the elements or n E 112 = [2,-v.,]2+ [trvu]2+ + [1,..v,.,,]2 . (4.53) The main idea is to minimize E by adjusting each weight in the ANN. The technique is similar to the Widrow-Hoff algorithm for linear systems [48], even though an ANN is typically nonlinear. The entire approach rests on the desire to modify each weight by an amount proportional to how it can decrease the output error. In other words 78 fl:_ dllEllz (454) dt n dw ' where dw/dt is the amount of adjustment each weight would receive for one training exposure. Each weight is updated as w[n+l] =w[n]+-::v—t. (4.55) The critical difference between the backpropagation and the Widrow-Hoff model is that the former uses PBS with nonlinear output transfer functions and the latter uses a linear function. This poses a problem for the backpropagation algorithm. In a linear system, any arrangement of rows of PBS with linear data processing could always be reduced to just two rows. As a consequence, error minimization amounts to adjusting those weights which will maximize the descent down an n-dimensional parabolic basin, the least mean-squared error surface. This makes implementation of equation (4.54) straight forward in linear systems [48]. Figure 4.2 portrays the least mean squared error surfaces ill—3h (a) (b) in a linear and a nonlinear ANN. Figure 4.2. Error Surface in: (a) Linear ANN; (b) Nonlinear ANN. 79 An ANN with rows of nonlinear PEs cannot, in general, be reduced to fewer rows. This has two major consequences : 1. We must carry out the same algorithm foral_l rows when adjusting the weights. 2. The error surface is gutted with many basins of which few should ever resemble the analytically appealing paraboloid (Figure 4.2). The consequences of these two issues as well as a few others [49] are examined in the following sections. 4.3.1 Backpropagation through Noise Since, the system error is first derived at the output, the error signal must propagate back through the rows in order to reach every row. To begin with, however, every row is assigned random weights. Thus the error becomes increasingly meaningless as it backpropagates through the net of randomized weights A5 a consequence, the fiont rows are ‘misled’ into what adjustment was truly needed Afier this almost random ' correction at the front rows, the back rows will now receive data corrupted by the updated fiont rows. This noisy commrmication appears to be very ineficient because some simple training tasks may require thousands of training cycles. 80 4.3.2 Simultaneous Change The mathematical derivation of the formula for weight adjustment uses partial derivatives, which assume that nothing else is being changed except the weight in question. However, in the true gradient descent method the weight change is affected after the complete set of training patterns have been presented to the network. If the weight change is to be implemented at the presentation of each pattern then the learning rate has to be small enough to enable minimization of the error fimction. 4.3.3 Numerous Basins and Crests One reason why the mean-squared method worlm well for linear systems is that the error fimction yields only one n-dirnensional parabolic error surface having only one minima at the paraboloid’s bottom Nonlinear systems, on the other hand, will typically have an error sin-face indented with numerous basins. For backpropagation, the problem is enhanced since these basins are usually non-parabolic and time varying as shown in Figure 4.3. 81 -/ / slope = 0; => no change Figure 4.3. Local Minimas and Relative Weight Changes. Zero slopes exist between any two basins as well as at the bottom of each basin (local minima). This implies that the weight modification may be very slow within regions of relative maximum error as well as at the relative minimas. This is yet another cause for slow learning. Since there is no simple relation between slope and distance to a relative minima, large misguided adjustments can frequently occur, sending the system state straight down through the local minima and far up the other side, giving the system an error performance worse than prior to the adjustment. 4.3.4 Lack of Digital Convergence Proof It can be shown mathematically that if one uses infinitesimally small weight adjustments, which also requires infinitely long patience, the convergence to an optimal 82 behavior will occur in a backpropagation network. The reality, however, requires a finite training period and therefore a fixed valued learning constant. Thus, digital applications may not necessarily converge and there is no way to find out when to stop the training session. 4.3.5 Dependence on Initial Conditions The presence of numerous minimas with unequal depths is another significant issue. Convergence towards one minima may yield a lower mean squared error than convergence towards another. There is no direct way to determine whether the backpropagation has reached a local minima or a global minima. The only way of reaching better results is to randomize the weights and re-train again and again. The final performance, therefore, is partially dependent upon the randomized initial conditions for ..g 4.3.6 Non-real Time Adaptation The backpropagation tends to fade its memory of previously learnt patterns as it learns the new patterns. The consequence is that every new pattern for training requires all the old patterns to be recycled alongwith inorder to achieve a long term noise irnnnmity during the training 4.4 Accelerating the Convergence Several heuristics are available for attaining improvements in the rate of convergence [3,49,50]. The methods adopted by Kim [1] are discussed, and variations are suggested to further accelerate the convergence process. 4.4.1 Momentum Factor (01) Rumelhart suggests that one way to increase the learning rate without leading to oscillations is to modify the generalized delta rule to inchlde a momentum term [3]. This can be accomplished by the rule AMI“ 1] = 77 (5110:») + a Awg{n] (455) where n is the presentation number, AWg{n+ 1] is the weight change required for the next step and or is the momentum factor such that 0 S a S 1. or determines the efi‘ect of past weight changes on the current direction of movement ill the weight space. It provides each connection weight “’9' with a kind of momentum so that it tends to change in the direction of the average downhill force instead of oscillating with the high fiequency variations of the error surface in the weight space. In turn, the effective learning rate 17 can be made larger without initiating oscillations. Adding the momentum term in equations (4.50) and (4.51) gives 84 Aw;[n+ l] = 776:0 —net;)-1—v‘—.—+aAwu’[n] (4.57) — w” V. and Aw; [71+ l] = 7,6;(1— neg) 1+1), _ + a Awgln] . (4.58) W” V. 4.4.1.1 Implementing the Momentum Factor Kim uses following procedure for implementing the momentum factor: Step 1. A number of registers equal to the number of weights (w) in the network are designated for weight change (Aw[n]). All registers are initialized to zero before presenting the first pattern (p). Step 2. First pattern is presented and weight changes, Aw[n+l], are calculated using equations (4.57) and (4.58). Here the momentum term is zero due to Step 1. Step 3. Next pattern is presented and weight changes are calculated using the results of step 2 in the momentum term. Step 4. All patterns are presented one after the other and corresponding weight changes are calculated in a similar fashion. Step 5. Weight changes calculated for the last pattern are dropped. Step 6. F eedforward and then go to step 1 for the next iteration until the desired error or maximum number of iterations is reached. 85 In other words the weight change calculated for the previous pattern is used in the momentum term for calculating the weight change for the next pattern (except the first pattern), and the weight change calculated for the last pattern is not considered for this purpose (see flowchart in Appendix A). 4.4.1.2 Analysis of Kim’s Technique Kim’s technique suffers fiom two problems: 1. All the registers are initialized to zero at the beginning of each iteration thereby discarding the last calculation of weight change in the previous iteration. Hence the weight changes calculated for the first pattern are without a momentum term 2. The weight change calculated for one pattern is used in the momentum term for calculation of weight change for the next pattern. The training results indicate the following response attributable to the above mentioned observations: 1. The convergence range is extremely restricted. Weight values for most of the processing elements tend to cross the limits (-1.0 S w S +1.0) and start increasing without bound. 2. The network takes a long time (too many iterations) to converge. 86 4.4.1.3 Redefining the Implementation of Momentum Factor In order to achieve better results, the procedure for implementing the momentum factor is redefined as follows: Step 1. A number of registers, equal to the number of patterns (p), are designated for weight changes (Awp) in each weight (w) in the network All the registers are initialized (Aw,[n] = 0) at the beginning of the training mode. Step 2. All patterns are presented one by one and weight changes, Aw,[n+1], are calculated using equations (4.57) and (4.58). Here, the momentum term is zero for all the patterns for the first iteration (n = 0) only. Step 3. In the next iteration (n +1), weight changes, Awp[n +1], are calculated using the results of step 2 in the momentum term Step 4. Go to step 2 lmtil the desired error or maxirmlm number of iterations is reached. In essence, the weight change is calculated separately for each pattern and it is used in the momentum term for calculating the next weight change for the same pattern. Moreover, n_o weight change is discarded. 4.4.2 Updating the Weights The network weights are not updated afier presenting each pattern. Rather, the weights are modified only after all input patterns have been presented. After each pattern 87 is presented, all weight changes are calculated, as usual, but these changes are not immediately applied. Instead, the changes for each weight are summed separately for all the input patterns, and the sum is applied to modify the weight after each iteration. That is, Aw” [n +1] = :pry [n]. (4.59) iI-l Note that n represents the iteration number rather than the presentation number, since the weights are updated only once per iteration for all the patterns. The flowchart in Appendix A elaborates the above procedure in detail 4.4.3 Varying the Learning Rate (17) An intelligence is incorporated in the program to check the rate of convergence and apply the appropriate remedial measures. The learning rate (7]) is varied in accordance with the performance index (total error for all patterns for each iteration). If an update results in reduced total error, 7] is multiplied by a factor ¢ > 1 for the next iteration. If a step produces a network with a total error more than a few (typically 1-5) percent above the previous value, all changes to the weights are rejected and n is multiplied by a factorfl <1. 01 is set equalto zero and the step is repeated. or is reset to its original value only after execution of a successful step. The rationale behind this maneuver is the idea that as long as the topography of the terrain is relatively uniform and the descent is relatively smooth, the memory implicit in or will aid convergence. If; however, a step results in a degradation of the performance of the system, then clearly the topography of the terrain demands a change in the direction of the optimization. Moreover, experience incorporated in the term in or will be misleading rather than beneficial Hence or is set to zero so that memory from the previous steps is lost. a will assume a nonzero value only after the network has taken a step that reduces the total error. 4.5 Summary The mathematical derivation of the stochastic function in backpropagation algorithm has been presented A critical analysis of the backpropagation technique is given. Measures taken by Kim to improve the performance index of the DMNN are discussed and possible improvements are suggested to further improve the performance of the DMNN. CHAPTER 5 DRINN ANALYSIS AND APPLICATIONS 5.1 Introduction The DMNN developed by Kim [1] has been simulated using a software model. The simulated DMNN is applied to the benchmark and the character recognition problems and its results are compared to an ordinary ANN using a sigmoidal fimction. The improvements suggested in Chapter 4 are also implemented and their afi‘ect is observed. The dificulties and constraints encountered during the course of this research are discussed at the end. 5.2 Methodology 5.2.1 General Methodology Three networks have been considered for application and analysis: the DMNN, the modified DMNN and the ANN. The DMNN and the ANN incorporate the algorithm discussed in Sections 4.4.1.3 and 4.4.2. The modified DMNN incorporates the technique discussed in Section 4.4.3. Each network is tested for three basic classification problems: 89 90 the XOR, data encoder, and pattern recognizer problems. The sampling data (patterns) for each problem is divided into the training data and the testing data. Each network is trained first with the set of training data. To study the learning pattern, networks are trained iteratively for a wide range of values for the learning rate (77) and the momentum factor ((2). Once the training is completed, the test data is applied to the trained networks. Based on these results, the performance of the networks is evaluated. Each pattern in the sampling data is an array of binary values. The number of neurons in the input layer of a network equals the number of elements in the pattern vector plus one. The additional input is required to provide the synaptic elements located in the upper layers of the network with a constant threshold input of 1.0. The number of neurons in the output layer is the same as the number of categories into which the sampling patterns are to be classified. The network configuration defines, in matrix form, the interconnection layout of the processing elements in the network A minimal network configuration for solving a particular problem can be defined as the network that converges to an error state equal to or below the predefined error threshold with a minimum number of neurons in the hidden layers and minimum number of hidden layers. Kim suggests that a two layer network (one hidden layer) with minimal network configuration is enough for solving most problems. He selects 0.0 and 1.0 to represent a logic 0 and a logic 1 respectively. A150 0.45 and 0.55 have been used as target values corresponding to ON and OFF thresholds at the output neurons. The above mentioned selections are incorporated in the three networks for solving the test cases. Input variables are selected ill the following manner: 91 Learning Rate (77): 77 is varied from 0.1 to 1.0 with a step size of O. 1. Momentum Factor ((2): a is varied from 0.0 to 0.9 with a step size of 0.1. Since this value is changed for each 77, a total of 100 different combinations for each problem are applied to the networks. Synaptic Weights (w): The synaptic weights are generated randomly. To be able to compare and draw meaningful conclusions, results are averaged over 20 repetitions for each combination. Maximum Sum-squared Error (as-error): A value of 0.00001 is used for all the problems because the higher vahles (ss-error > 0.00001) do not sufficiently train the networks. A comparison of the selection of input variables with those selected by Kim, is given ill Table 5.1. Table 5.1. Comparison of Input Variables. Input Variable Kim’s Value [1] This Research Learning Rate 77 0.2 0.1 to 1.0 Momentum Factor 71 0.5 0.0 to 0.9 SS - Error 0.0001 0.00001 Maximum Iterations 30,000 30,000 Range ofFinal Weights -1.0 to +1.0 -1.0 to +1.0 Network Configuration minimal minimal Range of Weight Generation 0.0 to +1.0 0.0 to +0.5 92 5.2.2 Training and Classification Each network is trained with a set of training and target patterns. Thereafter it is tested for classification using a set of test patterns. The synaptic weights are randomly generated once and updated continuously during the training session until the desired convergence is achieved. The training steps are listed below: Step 1. Load the desired network configuration which defines the network according to the nature of the problem. Input the values of learning rate (77) and the momentum factor ((1). Also input the values of phi (45) and beta (E) in case of the modified network discussed in Section 4.4.3. Step 2. Randomize all the synaptic weights between 0.0 and +0.5. Step 3. Apply all the training and target patterns one by one. Perform feedforward operation and calculate the sum-squared error for the network. Step 4. Apply all the training and target patterns one by one. Perform the feedback operation to prepagate the error in a backward direction and modify the synaptic weights. Step 5. Go to step 3 until the network converges and the sum-squared error value reaches below the threshold value of 0.00001. If the network oscillates or is stuck in a local minima, terminate the operation after 30,000 iterations. Step 6. Save the final weights for testing the desired patterns. 93 Step 7. Check if the values of all synaptic weights are valid, i.e., -1.0 S w 5 1.0. In case of invalid weights, repeat steps 1 to 6. Flowcharts elaborating the above procedure are given in Appendix A. 5.3 Benchmark Problems The benchmark problems inchide an XOR problem and an 8x3 data encoder problem An analysis of these problems is presented in the following two sections. 5.3.1 XOR Problem Solver The XOR problem is a classic problem involving hidden imits. This problem has invariably been applied to the neural networks to study their nonlinear behavior. The XOR problem is applied to each network with a network configuration of 3x2x1. The performance of the DMNN for 0.1 S n S 0.8 and a momentum factor, a = 0.2, is shown in Figure 5.1. A three dimensional view of the network’s learning pattern is presented in Figure 5.2, where, 0.1 _< 77 50.8 and 0.0 _< a _< 1.0. The graphical representation identifies the values of r) and a for which the network learns faster. Figure 5.2 shows that the best combination is for 0.6 S 77 S 0.7 and 0.3 S a S 0.4 when the network learns below 2000 iterations. These values are useful for the future learning and testing of the network The 3-D graph in Figure 5.2 also indicate the operational bounds of the DMNN. 94 The learning pattern for the ANN is presented in Figures 5.3 and 5.4. The pattern indicates several peaks and troughs showing an extremely divergent behavior. For comparison, similar values of n and a have been selected. The network generally learns faster with the increase in both the above mentioned variables. Figure 5.4 indicates that the best learning values are, n = 0.8 and a =0.9, when the iterations are below 2000. These are, however, only the troughs, as the learning does not follow a set pattern. In case of the modified DMNN, the training starts with an initial value of learning rate (11), and momentum factor (a), set by the operator. The learning rate is modified throughout the training cycle. it is therefore impractical to draw any meaningful sketch of its learning behavior. It has been noticed that the modified DMNN would invariably learn faster if the training is initiated with the acceptable values of n and (1. Moreover, due to its construction, the network would respond to a wider range of n and a. Iterations 95 Figure 5.1. DMNN as XOR Solver for a Momentum Factor on = 0.2. Iterations Figure 5.2. DMNN XOR Solver in 3-D, n vs at vs Number of Iterations. Iterations Iterations aoooo . 25000 20000 4 15000 . 10000 . 5000 {L 96 Eta Figure 5.3. ANN as XOR Solver for a Momentum Factor a = 0.2. Eta Figure 5.4. ANN XOR Solver in 3-D, 1] vs a vs Number of Iterations. 97 5.3.2 8x3 Encoder The purpose of the 8x3 encoder problem is to test the data compression ability of the networks. A network configuration of 9x3x3 has been considered. DMNN and ANN encoders are presented with the same set of values for the learning rate, 7], and momentum factor, a. The learning performance of the DMNN is represented graphically in Figures 5.5 and 5.6. The ANN counterpart is portrayed in Figures 5.7 and 5.8. The learning pattern shows that DMNN learns fast for n S 0.3. The performance degrades logarithmically as the learning rate is increased above 0.3. The ANN on the other hand, performs well for 17 2 0.2 and shows poor performance for lower values of r]. The divergent behaviors of the two networks indicate difi‘erent characteristics of the squashing fimctions used in the networks. The performance of the modified DMNN is tested for different initial values of 77 and a. The network converges relatively faster when r] S 0.4 for the same reasons mentioned in the above section. Iterations 98 Figure 5.5. DMNN as 8x3 Encoder for a Momentum Factor on = 0.2. Iterations Figure 5.6. DMNN 8x3 Encoder in 3-D, 1] vs or vs Number of Iterations. Iterations 10000 99 Figure 5.7. ANN as 8x3 Encoder for a Momentum Factor on = 0.2. Iterations Figure 5.8. ANN 8x3 Encoder in 3-D, 11 vs a vs Number of Iterations. 100 5.4 Character Recognizer The problem of classification of numeric characters is considered in two levels. The first level involves classifying five digits fi'om 0 to 4 and the second level involves classifying ten digits from 0 to 9. Each pattern in a 5-digit classifier is represented by the arrangement of 0’s and 1’s in a 6x4 bit matrix (Figure 3.13). The network configuration is 25x4x5. The network is trained first with a set of ideal digits (noise free). The simulation results are depicted by plots in Figures 5.9 and 5.10. The learning pattern indicates that the DMNN learns comparatively faster (less than 2000 iterations) for 17 2 0.2 and 0.1 S a .<. 0.4. The surface, however, looks rocky for 0.1 s n S 0.45 and 0.4 S a S 1.0. The learning pattern for the ANN is shown in Figures 5.11 and 5.12. It is observed that the ANN produced more encouraging results. A flat surface is observed (learning in less than 1000 iterations) for 0.2 S n S 1.0 and 0.4 S as 0.9. The S-digit classifier is tested (after training on ideal digits) for a misclassification error on 10% noisy patterns and 20% noisy patterns. All networks showed similar results. The performance is considerably reduced when the test is conducted for 20% noisy data. The average misclassification error results are presented in Table 5.2. Five sets of noisy patterns, in each category, are considered for each network and the results are averaged out. The performance of the networks can improve considerably if they could also be Iterations 101 10000 l0000 .E. 10000 .. 12000 ., ‘0000 .. , 0000 .. 0000 .. 4000 n. 2000 .. 0.: 0.2 0.3 0.4 0.5 Eta Figure 5.9. DMNN as S-Digit Classifier for a Momentum Factor on = 0.2. s Iterations Figure 5.10. DMNN S-Digit Classifier in 3-D, 11 vs a vs Number of Iterations. hennions 102 13000 10000 a 14000 0 12000 + roooo “ uooo . coon i .000 i 2000 . Eta Figure 5.11. ANN as S-Digit Classifier for a Momentum Factor a = 0.2. lunafions Figure 5.12. ANN 5-Digit Classifier in 3-D, 1] vs or vs Number of Iterations. 103 trained for noisy patterns. Due to time and machine constraints (Section 5.5), the networks could not be trained for noisy patterns. Table 5.2. Performance of 5-Digit Classifier Trained with Ideal Digits. Missclassified Error Rate on Missclassified Error Rate ‘ 10% Noisy Data on 20% Noisy Data DMNN 18% 54% ANN 17.5% 52% It is observed that pattern classification is beyond the scope of the computational equipment used in this research. Even an ideal 5-digit classifier presents a computational load that takes a considerable time. Each learning cycle for an ideal 5-digit classifier takes an average of one hour. One learning cycle means training the network for a particular value of 77 and a. 100 combinations of r] and a (ranging from 0.1 to 0.9 with a step size of 0. l) are presented to each network and each combination is repeated twemy times for averaging the results. It, therefore, takes approximately 2000 computational hours (84 days) for an 80486 based PC to obtain the rearlts of one S-digit DMNN classifier. The computational time for an ANN sinmlation is almost one half of the DMNN simulation. Eight similar PCs were engaged for 15 days to obtain the results of this experiment. The training with 10% noisy patterns was attempted but appreciable results could not be obtained within a reasonable time. 104 The second level involves classifying 10 digits fiom 0 to 9. The simulation entails a much larger data set: a pattern size of 5x7 bit matrix to represent each digit and a network configuration of 36x9x10. The network data segment becomes too large for a PC to handle. This experiment was, therefore, abandoned. A more powerful machine is the only solution to solve classification problems beyond an ideal 5-digit. Input data in the form of network configuration, training patterns and test patterns for each problem discussed above, is given in Appendix B. 5.5 DMNN Simulation Analysis The Digital Multilayer Neural NetWork was implemented using software sinmlation on a 80486 based PC with 8MB RAM. Kim [1] performed the simulation work on the Sun- SPARC stations, a much more powerfiil machine compared to a PC. This section analyses the simulation performance of Kim’s DMNN on a PC. The need to incorporate improvements in the network is discussed and performance of different variations to the backpropagation algorithm is presented. 5.5.1 Kim’s DMNN The backpropagation algorithm variant used by Kim [1] is implemented first. The problems encountered during this phase are discussed in the following sections. 105 5.5.1.1 Program Compatibility Kim developed a software simulation on Sun- SPARC workstations. Sun workstation operate under the UNIX operating system The program is therefore not compatible with the DOS/windows environment. It was to be appropriately modified to work on a PC. In addition, many logical improvements were needed in the program to make it more eficient. 5.5.1.2 Invalid Synaptic Weights Either the network would not converge, or it would generate a number of invalid synaptic weights (w > :tl.0). Once the synaptic weight vahres become invalid, they would increase without bormd thus resulting in the overflow error. 5.5.1.3 Longer Convergence Cycles In an attempt to relegate the problem of invalid weights, the values of learning rate (17) and momentum factor ( a) are reduced in steps. The convergence starts at lower values of learning rate ( n < 0.2), which makes the learning process very slow. 5.5.1.4 Implementation of Momentum Factor Another aspect of Kim’s network is that the effect of past weight change of one pattern is added in the momentum term for calculation of the weight change for the next 106 pattern, thus resulting in a longer convergence cycle. The details of the procedure for implementing the momentum factor are adequately covered in Section 4.4.1.1. 5.5.2 Variations and Improvements In view of the above mentioned problems, there was a need i to improve the performance of the program especially in the PC environment. A number of modifications are incorporated, both in the program structure and in the implementation technique, to make the network more efficient. The major modifications are discussed in the subsequent sections. 5.5.2.1 Program Compatibility The program was made compatible with DOS for the PC. To compensate for the PC DOS’s limitation of a 64K stack segment, most of the local variables are converted into global variables thus making them available via the ‘heap’. All variables in the program are converted into double precision numbers (64-bit wide). The program is compiled with a Borland CH (Ver. 3.1) compiler utilizing its options for speed optimization. To make most of the RAM available to the network, a DOS-Extender utility is employed. The PC is upgraded to a 486DX2-66MI-Iz processor with an additional 8MB of RAM. In addition, eight PCs with similar capabilities are engaged in an effort to obtain the required results within a reasonable time. 107 5.5.2.2 Generation of Random Weights Slight modifications were made in the pseudo random number generator. Instead of taking the seed from the user, a seed is now generated internally using the facility of Borland C. The new design is code efficient. 5.5.2.3 Redefining the Implementation of Momentum Factor The momentum factor is implemented by allocating separate registers (equal to the number of patterns) for the weight change in each synaptic weights. The details of this procedure is narrated in Section 4.4.1.2. This modification appreciably reduced the number of iterations and improved the network learning. 5.5.2.4 Varying the Learning Rate (11) The learning rate (11) is varied according to the performance index. Section 4.4.3 discusses the procedure for implementing this heuristic technique. A separate network (modified DMNN) is implemented and is subjected to the defined problems. The network converges much faster since the learning rate changes automatically, according to the topography of the error surface. 108 5.6 Summary Three difi‘erent networks: the DMNN, modified DMNN and ANN, simulated in C, are subjected to application problems. The networks are trained for a variety of learning rates (77) and momentum factors (a). The learning pattern of each network is drawn and its performance is compared/discussed. The software program used by Kim [1] is discussed. Various improvements, suggested in Chapter 4, are implemented and their effect is evaluated. CHAPTER 6 CONCLUSION A fimctional corroboration of the statistical model of DMNN developed by Kim [1] has been performed in this thesis. The statistical model is used in the feedforward multilayer neural network utilizing the backpropagation training algorithm The stochastic fimction is compared with the standard sigmoidal fimction by subjecting the two networks to test problems. It has been found that while Kim’s architecture has the desirable properties of high speed and high density on a chip, the simulated DMNN generally produces competitive results when compared with an ordinary ANN. An improved backpropagation algorithm is implemented in an effort to improve the training of the network As a result a small scale pattern classification has been achieved. This chapter summarizes the research work, identifies the major contributions made by this work and concludes with the outlining of possible future research tasks. 6.1 Summary Presently, research is imderway to develop a fast, space eficient and programmable architecture for dedicated VLSI ANN s which can be applied efi‘ectively to the real engineering problems. The development of a pulse mode digital multilayer neural 109 110 network by Kim [1] is a step towards the achievement of this goal. Kim developed a DMNN architecture based on stochastic computing techniques which can be implemented by using simple logic gates. Kim modeled the DMNN using VHDL code and analyzed its performance in data compression and character classification problems. The aim of this research was to validate the results of Kim’s model with a software simulation technique using the C programming language. The stochastic fimction developed by Kim [1] was employed in the backpropagation algorithm A two layer DMNN was simulated using a C program The baclqiropagation algorithm was analyzed, its weaknesses identified and various techniques were suggested for accelerating the convergence process. The DMNN and its improved variants are subjected to the problems of XOR, data encoding (8x3) and character classification (5-digit and lO-digit). The simulation results indicate that the performance of the stochastic fimction is competitive, in this computing environment, with that of a simoidal fimction. An 80486 DX2 processor based personal computer was used for the training simulations. Due to the intensive computational nature of the character classification problems, these could not be represented adequately. In an efl‘ort to solve these problems, the program structure and logic was improved appropriately. To reduce the number of iterations required for the convergence, three modifications are proposed in the backpropagation algorithm: 111 1. The weight change, (Aw[n]), is calculated separately for each pattern and is used for calculation of the next weight change, (Aw[n+1]), for the same pattern. This method is more logical and has proved to reduce the training cycles. 2. Instead of wasting the effect of past weight change for each pattern, it is saved and used in the next iteration. This change not only proved to decrease the number of iterations but also assisted in reducing the number of invalid weight values. 3. The learning rate (77) and the momentum factor (a) are varied dynamically so that the system utilizes a near-optimum 1), determined by the local optimization topography. Moreover, the momentum factor (a) is set to zero whenever its efi‘ect is misleading rather than being beneficial. The DOS extender mility is used in an attempt to flee maximum memory for the program Eight systems are utilized to achieve the efi‘ect of parallel processing for the pattern classification problems. As a result of these changes, the 5-digit pattern classifier has been successfully trained and tested for the ideal digits (noise free). The DMNN training involves computations which are generally beyond the capabilities of a personal computer. Even with all the possible eflons to enhance the capabilities of the PC, it was only possible to solve the S-digit classification problem When the number of processing elements are increased beyond a total of 34 and/or the number of patterns are more than five, the network is unable to converge within a reasonable time. Even with five patterns, a 5-digit classifier takes more than 8-10 hours for one training cycle. The misclassification error 112 shows that the network needs more training, in terms of number of patterns and in terms of the number of iterations, to improve its performance. 6.2 Contributions The major contributions of this research are: 1. The DMNN based on the stochastic random processes, developed by Kim [1], has been sinmlated using C program and a fimctional validation of his results has been performed. 2. The same set of problems have been solved by an ANN using a sigmoidal fimction and the results of the two are compared. 3. Modifications to the backpropagation algorithm, incorporating the stochastic fimction, have been presented and their efi‘ectiveness has been confirmed. 6.3 Future Research The DMNN proposed by Kim has been partially validated, using a software simulation technique, on a PC. A powerfiil system using multiprocessing can not only validate the results of a lO-digit character classifier, but it can fiirther be used for solving more complex pattern classification problems. Furthermore, instead of C, an object- 113 oriented programming language (C++) can be applied which is better suited to such problems. The backpropagation technique can be improved finther by introducing the concept of “expected source values”. Considering a more general form, the parameter adjustment between a node a and a node b can be given as W.b[n+1] = W.,,[n] + 775.,S. + aAWasz] (6.1) where S., symbolizing source, is the output of node a or an external input to the network and d. is an associated error at the node b. T. Samad suggests that instead of using the current value of the source, S., use the following expected value of S. so = S. + ,6 5a (6.2) where ,6 is any appropriate positive constant and 6.. is the error associated with node a. The modified learning rule has proven to be rmrch faster [51]. Furthermore, recent advancements in firzzy logic set theory [52,53] can be employed to vary 1], fl and a based on variation in the output error and the training time (number of iterations) that has elapsed For instance, the learning rate, r], can have following fiizzy rules: 1. ”(e, n) can be set to a large value when the output error, e, is large. This rule will indicate that weights are far away fi'om the optimal value. 114 2. At other instances, 77(e,r2) can be set to a small value when the output error, e, is small. Thus indicating that weights are getting closer or are in the neighborhood of optimal values. 3. 77(e,n) can be set to a large value if training time (number of iterations), is fairly small indicating the beginning of training cycle. 4. ”(e,n) can be set to a small value for better convergence if the training time (number of iterations) is significantly large indicating the final stages of the training process, and so forth. Table 6.1 shows the possible firzzy associative memory (FAM) rules for n(e, n). Similar rules can be developed for other parameters, ,6 and (I. These rules are based on the relevant knowledge about the learning behavior. Table 6.1. The Fuzzy Associative Memory (FAM) Rules for ”(e,n). Output Error Time (Iterations) Very small Small Large Very large Less small large very large very large Moderate very small small large very large Large very small small large large APPENDICES APPENDIX A Flowcharts APPENDIX A F lowcharts A.1 DMNN O eration C sTART D READ NETWORK CONFIGURATION INPUT LEARN FLAG LEARN FLAG "° 1, ( READ WEIGHTS 1 F INPUT ETA, ALPHA, MAX__SS_ERROR FEEDFORWARD 6 + READ INPUT ( PATTERN, TARGET< C DISPLAY PATTERN RESULTS + GENERATE RANDOM WEIGHTS WHILE MAX NO ITERATIONS OR y... F MAX_SS__ERROR s REACHED DISPLAY TOTAL D O W FEE F R AR” ITERATIONS, e FINAL WEGHTS, FEEDBACK OUTPUTS I, UPDATE WEIGHTS, C STOP 3 INCREASE ITERATION COUNT I 115 116 A.2 Feedforward Operation ( START ) DEFINE LOCAL VARIABLES / LOOP FOR ALL \ PATTERNS \ LOOP FOR ALL NEURONS NO . net; =1-fi(l+w,;v,.) net}=1—1=I(l-W,jv,-) i=0 i=0 vj =netj(1—net;) ‘ @ 117 A.3 Feedback Operation I START } DEFINE LOCAL VARIABLES i INCREASE rrER_CNT. RESET DELTA WEIGHTS 6 ( LOOP FOR ALL PATTERNH v / LOOPFORALLOUTPUT \ math“) LAYER NEURONS 0 ERR = (z, — v,) i LOOP FOR ALL NEURONS \ CONNECTED ,WITH THIS NEURON YES NO {__— ___-T 5; = (t, - v, )(net; ) 8; = (I, -v,)(l -net;) l v l . .. - - + _ o _ o 1 AW! [n+1] = 115.0 - mum Awp["+1] -‘I81(1 "“3 ) I __ ”a"! l i Aw} [n +1] = Aw;Ir [n + l] +aAw; [n] Aw; [n +1] = Aw; [n +1] +aAw;[n] l l Aw}[n] = Aw;[n+l] Aw} [n] = Aw; [n+1] I T I i ABS_ERR-= ABS_ERR + IERRI ss_ERR -= ss_ERR + (ERR? I I ss_ERR-s ss_ERR/ 2 ] (Next Page ) 118 LOOP FOR ALL HIDDEN / LAYER NEURONS CONNECTED WITH THIS LOOP FOR ALL NEURONS \ NEURON n. W a, * 6;:netj’E8;(1-mt;)l—+-év—J 5;:(1-mtJEBIQ-netfi)lf% l - l - — - _ _ v v Aw” [n+1]— 1151(1 net} ,F—L—l-I-wgv, Aw;[n+ I] = 115:0- net; )_1_ WW yr I l Aw§[n+ 1] = Awfln+ 1]+ aAwy'DI] Awfln-I-I]: Awfln+ll+aAwflnl l Aw',[n]=AW,'[ n+1 l l A wfln] = Aw;[n+1] W @ APPENDIX B Input Data for DMNN Binary Classifiers APPENDIX B Input Data for DMNN Binary Classifiers Input data in the form of network configuration, training patterns and testing patterns used for XOR 8x3 encoding and S-digit classification problems are listed here. For network configuration, the value in row 1' and column j represents the connectivity between neurons 1' and j respectively. The ith element of the training and test pattern vectors is the value applied to the ith neuron in the input layer. Input data for the lO-digit classification problem has been omitted since the problem could not be represented. The data is available in [1]. B.l XOR Problem B.l.1 Network Configuration (xor.cnf) B.l.2 Training Patterns (xor.pat) 0 0 B.l.3 Test Patterns (xor.tgt) 0 119 120 B.2 8x3 Encoding Problem B.2.l Network Configuration (8s3.cnl) 0 0 0 B.2.2 Training Patterns (8x3.pat) B.2.3 Test Patterns (8x3.tgt) 0 121 B.3 5-Digit Classification Problem B.3.1 Network Configuration (5d.cnf) l000000000000000000000000000000000 0l00000000000000000000000000000000 0010000000000000000000000000000000 0001000000000000000000000000000000 0000100000000000000000000000000000 0000010000000000000000000000000000 0000001000000000000000000000000000 0000000l00000000000000000000000000 0000000010000000000000000000000000 0000000001000000000000000000000000 0 0 0000000000001000000000000000000000 0000000000000100000000000000000000 00000000000000]0000000000000000000 0000000000000001000000000000000000 0000000000000000100000000000000000 0000000000000000010000000000000000 0000000000100000000000000 0000000000010000000000000 0000000000000000001000000000000000 0000000000000000000100000000000 00000 0 0 0 0 0 0 0 0 0 0 000000000000000000001000000000 00000000000000000000001000000000 000000000000000000000100000000 00000000000000000000000100000000 llllllllll111111111111110000000001 lllllllllllllllllll11111000000000] lllllllllllllllllll111110000000001 llllllllllll1111111111110000000001 000000000000000000000000l111000001 000000000000000000000000l111000001 0000000000000000000000001111000001 0000000000000000000000001111000001 000000000000000000000000l11.1000001 000000000000000000000000000000000l B.3.2 Training Patterns for Ideal Digits (5d.pat) 0.0 1.0 1.0 0.0 1.0 0.0 0.0 1.0 1.0 0.0 0.0 1.0 1.0 0.0 0.0 1.0 1.0 0.0 0.0 1.0 0.0 1.0 1.0 0.0 0.0 0.0 0.0 0.0 0.45 0.45 0.45 0.45 0.55 1.0 0.0 1.0 1.0 0.0 1.0 0.0 1.0 0.0 0.0 0.0 1.0 0.0 0.0 0.0 1.0 0.0 0.0 0.0 1.0 0.0 1.0 1.0 1.0 1.0 0.0 0.0 0.0 0.0 0.45 0.45 0.45 0.55 0.45 1.0 0.0 1.0 1.0 0.0 1.0 0.0 0.0 1.0 0.0 0.0 0.0 1.0 0.0 0.0 1.0 0.0 0.0 1.0 0.0 0.0 1.0 1.0 1.0 1.0 0.0 0.0 0.0 0.0 0.45 0.45 0.55 0.45 0.45 1.0 1.0 1.0 1.0 0.0 0.0 0.0 0.0 1.0 0.0 1.0 1.0 1.0 0.0 1.0 1.0 1.0 0.0 0.0 0.0 1.0 1.0 1.0 1.0 0.0 0.0 0.0 0.0 0.0 0.45 0.55 0.45 0.45 0.45 1.0 1.0 0.0 1.0 0.0 1.0 0.0 1.0 0.0 1.0 1.0 1.0 1.0 0.0 0.0 1.0 0.0 0.0 0.0 1.0 0.0 0.0 0.0 1.0 0.0 0.0 0.0 0.0 0.0 0.55 0.45 0.45 0.45 0.45 1.0 0.0 0.0 0.0 1.0 0.0 0.0 1.0 0.0 1.0 1.0 1.0 1.0 1.0 0.0 1.0 0.0 1.0 1.0 0.0 0.0 1.0 0.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 0.0 0.0 1.0 1.0 0.0 1.0 0.0 0.0 1.0 1.0 0.0 0.0 0.0 1.0 1.0 1.0 0.0 1.0 0.0 0.0 1.0 1.0 0.0 1.0 0.0 0.0 1.0 1.0 1.0 0.0 1.0 0.0 1.0 1.0 1.0 0.0 1.0 0.0 1.0 0.0 1.0 1.0 1.0 0.0 1.0 1.0 1.0 0.0 1.0 0.0 1.0 1.0 0.0 0.0 1.0 0.0 1.0 1.0 0.0 0.0 1.0 0.0 0.0 1.0 1.0 0.0 1.0 0.0 1.0 1.0 0.0 0.0 1.0 1.0 1.0 1.0 0.0 1.0 1.0 1.0 1.0 1.0 0.0 1.0 1.0 1.0 1.0 1.0 0.0 1.0 0.0 1.0 1.0 0.0 0.0 1.0 1.0 1.0 0.0 1.0 1.0 0.0 0.0 1.0 0.0 0.0 1.0 0.0 0.0 0.0 1.0 0.0 1.0 0.0 1.0 0.0 1.0 0.0 0.0 0.0 1.0 0.0 1.0 0.0 1.0 0.0 1.0 0.0 1.0 0.0 1.0 0.0 1.0 0.0 0.0 0.0 1.0 0.0 1.0 0.0 1.0 0.0 1.0 0.0 1.0 0.0 1.0 0.0 1.0 0.0 1.0 0.0 0.0 0.0 0.0 1.0 1.0 1.0 1.0 1.0 1.0 0.0 1.0 1.0 1.0 0.0 0.0 0.0 0.0 0.0 1.0 l.0 0.0 0.0 0.0 1.0 0.0 0.0 0.0 0.0 0.0 0.0 1.0 0.0 0.0 1.0 0.0 0.0 1.0 0.0 0.0 1.0 0.0 0.0 1.0 0.0 0.0 1.0 0.0 0.0 1.0 0.0 0.0 1.0 0.0 0.0 1.0 0.0 0.0 1.0 0.0 1.0 1.0 1.0 0.0 0.0 0.0 1.0 1.0 1.0 0.0 1.0 0.0 1.0 1.0 1.0 0.0 1.0 1.0 1.0 1.0 1.0 0.0 1.0 0.0 0.0 1.0 1.0 0.0 1.0 0.0 0.0 1.0 1.0 0.0 1.0 0.0 0.0 1.0 1.0 0.0 1.0 0.0 1.0 1.0 1.0 0.0 0.0 0.0 0.0 1.0 0.0 0.0 1.0 0.0 0.0 1.0 0.0 0.0 0.0 0.0 1.0 1.0 0.0 0.0 0.0 0.0 0.0 1.0 0.0 0.0 1.0 0.0 1.0 1.0 0.0 0.0 1.0 0.0 1.0 122 B.3.3 Test Patterns for Ideal Digits (5d.tgt) 0.0 0.0 1.0 0.0 0.0 0.0 1.0 0.0 1.0 0.0 1.0 0.0 0.0 0.0 1.0 0.0 1.0 0.0 1.0 0.0 1.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 1.0 0.0 0.0 0.0 0.0 0.0 0.0 1.0 0.0 1.0 0.0 1.0 0.0 0.0 0.0 1.0 0.0 1.0 1.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 1.0 0.0 0.0 1.0 1.0 0.0 0.0 1.0 1.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 1.0 0.0 0.0 0.0 1.0 1.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 1.0 1.0 0.0 0.0 1.0 1.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 1.0 1.0 0.0 0.0 1.0 1.0 0.0 0.0 0.00.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 1.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 1.0 0.0 1.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 1.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 1.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 1.0 0.0 0.0 0.0 1.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 1.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 1.0 0.0 0.0 0.0 1.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 1.0 0.0 0.0 0.0 1.0 0.0 1.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 1.0 0.0 1.0 1.0 1.0 0.0 1.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 1.0 1.0 1.0 0.0 1.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 1.0 0.0 1.0 1.0 1.0 0.0 1.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 1.0 0.0 1.0 1.0 1.0 0.0 1.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 1.0 1.0 1.0 1.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 1.0 1.0 1.0 1.0 1.0 0.0 0.0 0.0 1.0 0.0 1.0 0.0 1.0 0.0 1.0 0.0 0.0 0.0 0.0 0.0 1.0 0.0 1.0 0.0 1.0 0.0 1.0 0.0 1.0 0.0 1.0 0.0 1.0 0.0 1.0 0.0 1.0 0.0 1.0 0.0 1.0 0.0 1.0 0.0 1.0 0.0 1.0 0.0 1.0 0.0 1.0 1.0 1.0 0.0 1.0 0.0 1.0 1.0 1.0 0.0 1.0 1.0 0.0 0.0 0.0 0.0 1.0 0.0 0.0 0.0 0.0 B.3.4 Test Patterns for Ideal Digits + 10% Noise (three noisy patterns per digit) 1.0 1.0 0.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 0.0 1.0 0.0 0.0 1.0 1.0 0.0 0.0 0.0 1.0 0.0 0.0 0.0 1.0 0.0 0.0 1.0 1.0 1.0 1.0 1.0 1.0 0.0 0.0 1.0 1.0 0.0 0.0 1.0 1.0 1.0 0.0 0.0 1.0 1.0 0.0 0.0 1.0 1.0 0.0 0.0 1.0 0.0 0.0 0.0 1.0 0.0 0.0 0.0 1.0 0.0 0.0 0.0 1.0 0.0 1.0 0.0 1.0 1.0 0.0 0.0 1.0 1.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 1.0 123 1.0 0.0 1.0 0.0 1.0 0.0 1.0 0.0 0.0 1.0 1.0 1.0 0.0 0.0 1.0 1.0 0.0 0.0 1.0 0.0 0.0 0.0 1.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 1.0 1.0 0.0 1.0 0.0 0.0 0.0 1.0 0.0 1.0 1.0 1.0 1.0 0.0 0.0 1.0 0.0 0.0 0.0 1.0 0.0 0.0 1.0 1.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 1.0 B.3.5 Training Patterns for Ideal Digits + 10% Noise (three noisy patterns per digit) 0.0 1.0 1.0 0.0 1.0 0.0 0.0 l.0 1.0 0.0 0.0 1.0 1.0 0.0 0.0 1.0 l.0 0.0 0.0 1.0 0.0 1.0 1.0 0.0 0.0 0.0 0.0 0.0 0.45 0.45 0.45 0.45 0.55 1.0 0.0 1.0 1.0 0.0 1.0 0.0 0.0 1.0 0.0 0.0 0.0 1.0 1.0 0.0 0.0 1.0 1.0 0.0 0.0 0.0 0.0 1.0 1.0 0.0 0.0 0.0 0.0 0.0 0.45 0.45 0.45 0.45 0.55 1.0 1.0 1.0 1.0 0.0 1.0 0.0 0.0 1.0 1.0 0.0 0.0 1.0 1.0 0.0 1.0 1.0 1.0 0.0 0.0 1.0 0.0 1.0 1.0 0.0 0.0 0.0 0.0 0.0 0.45 0.45 0.45 0.45 0.55 1.0 0.0 1.0 1.0 0.0 1.0 0.0 0.0 1.0 1.0 0.0 0.0 1.0 1.0 0.0 1.0 1.0 1.0 1.0 0.0 1.0 0.0 1.0 1.0 0.0 0.0 0.0 0.0 0.0 0.45 0.45 0.45 0.45 0.55 1.0 0.0 1.0 l.0 0.0 0.0 1.0 1.0 0.0 1.0 0.0 1.0 0.0 1.0 0.0 0.0 0.0 1.0 0.0 0.0 0.0 1.0 0.0 0.0 0.0 1.0 1.0 1.0 1.0 0.0 0.0 0.0 0.0 0.45 0.45 0.45 0.55 0.45 1.0 1.0 0.0 0.0 0.0 1.0 0.0 1.0 0.0 1.0 0.0 0.0 0.0 1.0 0.0 0.0 1.0 1.0 1.0 1.0 1.0 0.0 0.0 0.0 0.0 0.45 0.45 0.45 0.55 0.45 1.0 0.0 1.0 0.0 0.0 1.0 0.0 1.0 0.0 0.0 0.0 1.0 0.0 0.0 0.0 1.0 0.0 0.0 0.0 1.0 1.0 0.0 1.0 1.0 0.0 0.0 1.0 0.0 0.0 0.0 1.0 0.0 0.0 0.0 0.0 1.0 0.0 0.0 1.0 0.0 0.0 1.0 1.0 0.0 1.0 1.0 1.0 0.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 0.0 0.0 0.0 0.0 0.45 0.45 0.45 0.55 0.45 1.0 1.0 0.0 1.0 1.0 1.0 0.0 0.00.0 1.0 0.0 0.0 0.0 1.0 0.0 0.0 0.0 1.0 0.0 1.0 1.0 0.0 0.0 0.0 0.0 0.45 0.45 0.45 0.55 0.45 1.0 1.0 0.0 1.0 0.0 0.0 1.0 0.0 0.0 0.0 1.0 0.0 0.0 1.0 0.0 0.0 1.0 1.0 1.0 1.0 1.0 0.0 0.0 0.0 0.0 0.45 0.45 0.55 0.45 0.45 1.0 1.0 0.0 1.0 0.0 0.0 1.0 0.0 0.0 0.0 0.0 0.0 0.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 0.0 0.0 0.0 0.0 0.45 0.45 0.55 0.45 0.45 1.0 1.0 0.0 1.0 0.0 0.0 1.0 0.0 0.0 0.0 1.0 0.0 0.0 1.0 0.0 0.0 1.0 1.0 1.0 1.0 1.0 0.0 0.0 0.0 0.0 0.45 0.45 0.55 0.45 0.45 1.0 1.0 0.0 1.0 0.0 1.0 1.0 0.0 0.0 0.0 1.0 0.0 1.0 1.0 0.0 0.0 1.0 1.0 1.0 1.0 1.0 0.0 0.0 0.0 0.0 0.45 0.45 0.55 0.45 0.45 1.0 1.0 0.0 0.0 0.0 0.0 1.0 0.0 1.0 1.0 1.0 0.0 1.0 1.0 1.0 0.0 0.0 1.0 1.0 1.0 0.0 0.0 0.0 0.0 0.0 0.45 0.55 0.45 0.45 0.45 1.0 1.0 0.0 1.0 0.0 0.0 0.0 0.0 1.0 1.0 1.0 0.0 1.0 1.0 1.0 0.0 0.0 1.0 1.0 1.0 0.0 0.0 0.0 0.0 0.0 0.45 0.55 0.45 0.45 0.45 1.0 1.0 0.0 0.0 0.0 1.0 1.0 0.0 1.0 1.0 1.0 0.0 1.0 1.0 1.0 0.0 0.0 0.0 1.0 0.0 1.0 1.0 0.0 0.0 0.0 0.0 0.0 0.45 0.55 0.45 0.45 0.45 1.0 1.0 1.0 0.0 0.0 0.0 0.0 0.0 1.0 0.0 1.0 1.0 1.0 0.0 1.0 1.0 0.0 0.0 0.0 0.0 1.0 1.0 0.0 1.0 1.0 1.0 0.0 0.0 0.0 0.0 0.0 0.45 0.55 0.45 0.45 0.45 1.0 1.0 0.0 1.0 0.0 1.0 0.0 1.0 1.0 1.0 1.0 0.0 0.0 1.0 0.0 0.0 0.0 1.0 0.0 0.0 0.0 1.0 0.0 0.0 0.0 0.0 0.0 0.55 0.45 0.45 0.45 0.45 1.0 1.0 0.0 0.0 0.0 1.0 0.0 1.0 0.0 1.0 1.0 1.0 1.0 1.0 0.0 1.0 0.0 0.0 0.0 1.0 0.0 0.0 0.0 1.0 1.0 0.0 0.0 0.0 0.0 0.55 0.45 0.45 0.45 0.45 1.0 1.0 0.0 1.0 0.0 1.0 0.0 1.0 0.0 0.0 1.0 1.0 1.0 0.0 0.0 1.0 1.0 0.0 0.0 1.0 0.0 0.0 0.0 1.0 0.0 0.0 0.0 0.0 0.0 0.55 0.45 0.45 0.45 0.45 1.0 1.0 0.0 1.0 0.0 0.0 0.0 1.0 0.0 1.0 1.0 1.0 1.0 0.0 0.0 1.0 0.0 0.0 0.0 1.0 0.0 0.0 1.0 1.0 0.0 0.0 0.0 0.0 0.0 0.55 0.45 0.45 0.45 0.45 1.0 124 B.3.6 Test Patterns for Ideal Digits + 10% Noise (three noisy patterns per digit) + 20% Noise (three noisy patterns per digit) 0.0 0.0 0.0 0.0 1.0 0.0 0.0 0.0 0.0 1.0 1.0 1.0 0.0 1.0 1.0 1.0 1.0 1.0 1.0 0.0 1.0 0.0 1.0 0.0 1.0 0.0 1.0 0.0 1.0 0.0 1.0 0.0 1.0 0.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 1.0 0.0 1.0 0.0 0.0 0.0 1.0 0.0 0.0 0.0 1.0 0.0 0.0 0.0 1.0 0.0 0.0 0.0 1.0 1.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 1.0 0.0 0.0 0.0 1.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 1.0 0.0 1.0 0.0 1.0 0.0 1.0 0.0 1.0 1.0 0.0 1.0 0.0 1.0 0.0 1.0 0.0 1.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 1.0 0.0 1.0 0.0 0.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 0.0 1.0 0.0 1.0 0.0 1.0 1.0 1.0 0.0 0.0 1.0 0.0 1.0 0.0 1.0 0.0 1.0 0.0 1.0 0.0 1.0 0.0 1.0 0.0 1.0 0.0 1.0 0.0 1.0 0.0 1.0 0.0 1.0 0.0 1.0 0.0 0.0 0.0 1.0 0.0 1.0 0.0 1.0 0.0 1.0 0.0 1.0 1.0 1.0 1.0 1.0 1.0 0.0 1.0 0.0 1.0 0.0 1.0 0.0 1.0 0.0 1.0 0.0 1.0 0.0 1.0 0.0 1.0 1.0 1.0 0.0 1.0 0.0 1.0 1.0 1.0 0.0 1.0 0.0 1.0 1.0 1.0 1.0 1.0 1.0 0.0 1.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 1.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 1.0 0.0 1.0 0.0 0.0 0.0 1.0 0.0 1.0 0.0 1.0 0.0 1.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 1.0 1.0 1.0 0.0 0.0 1.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 1.0 1.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 1.0 0.0 0.0 0.0 1.0 1.0 0.0 0.0 0.0 0.0 0.0 0.0 1.0 0.0 1.0 0.0 1.0 1.0 0.0 0.0 0.0 0.0 0.0 0.0 1.0 0.0 0.0 0.0 1.0 1.0 0.0 0.0 0.0 0.0 0.0 1.0 1.0 0.0 0.00.0 1.0 1.0 0.0 0.0 0.0 0.0 0.0 0.0 1.0 0.0 1.0 0.0 1.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 1.0 1.0 0.0 0.0 0.0 1.0 0.0 0.0 1.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 1.0 1.0 0.0 0.0 0.0 1.0 1.0 0.0 1.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 1.0 0.0 0.0 0.0 1.0 0.0 1.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 1.0 0.0 0.0 0.0 0.0 0.0 1.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 1.0 0.0 0.0 0.0 1.0 0.0 1.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 1.0 1.0 0.0 0.0 0.0 1.0 0.0 1.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 1.0 0.0 0.0 1.0 0.0 0.0 0.0 0.0 0.0 1.0 1.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 1.0 0.0 0.0 1.0 0.0 0.0 0.0 1.0 1.0 0.0 1.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 1.0 0.0 1.0 1.0 0.0 0.0 0.0 1.0 0.0 1.0 1.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 1.0 0.0 1.0 1.0 1.0 0.0 1.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 1.0 0.0 0.0 0.0 0.0 1.0 1.0 1.0 0.0 1.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 1.0 1.0 0.0 0.0 0.0 1.0 1.0 0.0 1.0 1.0 1.0 0.0 1.0 0.0 1.0 1.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 1.0 1.0 0.0 0.0 0.0 0.0 0.0 1.0 0.0 1.0 1.0 1.0 0.0 1.0 1.0 0.0 1.0 1.0 1.0 1.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 1.0 1.0 1.0 0.0 1.0 1.0 1.0 1.0 0.0 1.0 1.0 1.0 1.0 1.0 0.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 0.0 1.0 0.0 1.0 1.0 1.0 1.0 1.0 1.0 0.0 1.0 1.0 1.0 1.0 1.0 0.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 0.0 1.0 1.0 0.0 0.0 0.0 1.0 1.0 0.0 1.0 1.0 1.0 0.0 1.0 1.0 1.0 1.0 0.0 1.0 0.0 1.0 0.0 1.0 0.0 1.0 0.0 1.0 0.0 0.0 0.0 1.0 0.0 0.0 1.0 0.0 0.0 1.0 0.0 1.0 0.0 1.0 0.0 1.0 0.0 1.0 1.0 1.0 0.0 0.0 0.0 0.0 1.0 0.0 1.0 0.0 1.0 0.0 1.0 1.0 1.0 0.0 1.0 0.0 1.0 0.0 0.0 0.0 0.0 0.0 1.0 0.0 0.0 0.0 0.0 1.0 1.0 0.0 1.0 1.0 0.0 1.0 0.0 1.0 0.0 1.0 0.0 0.0 1.0 1.0 1.0 1.0 1.0 0.0 1.0 0.0 1.0 1.0 1.0 0.0 1.0 0.0 1.0 1.0 1.0 0.0 1.0 0.0 1.0 0.0 0.0 0.0 1.0 0.0 1.0 0.0 0.0 0.0 1.0 0.0 1.0 1.0 1.0 1.0 0.0 1.0 1.0 0.0 0.0 0.0 1.0 0.0 0.0 0.0 0.0 0.0 1.0 0.0 1.0 0.0 1.0 0.0 1.0 1.0 1.0 0.0 1.0 0.0 1.0 1.0 1.0 0.0 1.0 0.0 1.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 1.0 0.0 0.0 0.0 0.0 0.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 1.0 1.0 1.0 0.0 0.0 0.0 1.0 0.0 0.0 0.0 1.0 0.0 1.0 0.0 1.0 0.0 0.0 1.0 0.0 0.0 0.0 1.0 0.0 0.0 1.0 0.0 1.0 1.0 1.0 0.0 0.0 1.0 0.0 1.0 0.0 1.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 1.0 1.0 0.0 0.0 1.0 0.0 1.0 0.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 0.0 1.0 1.0 0.0 0.0 0.0 1.0 0.0 0.0 0.0 0.0 0.0 1.0 0.0 0.0 0.0 0.0 0.0 1.0 0.0 1.0 0.0 0.0 0.0 1.0 0.0 1.0 0.0 1.0 0.0 1.0 0.0 0.0 0.0 1.0 0.0 1.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 125 0.0 0.0 0.0 0.0 0.0 0.0 1.0 0.0 1.0 0.0 0.0 0.0 1.0 0.0 1.0 0.0 0.0 1.0 1.0 0.0 0.0 0.0 0.0 1.0 0.0 1.0 0.0 0.0 1.0 1.0 0.0 0.0 1.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 1.0 1.0 0.0 1.0 1.0 1.0 1.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 1.0 0.0 1.0 0.0 1.0 0.0 1.0 0.0 1.0 0.0 1.0 0.0 1.0 1.0 0.0 1.0 0.0 1.0 0.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 0.0 1.0 0.0 1.0 0.0 1.0 0.0 1.0 0.0 1.0 0.0 1.0 0.0 1.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 1.0 0.0 0.0 0.0 0.0 0.0 1.0 0.0 0.0 0.0 1.0 0.0 1.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 1.0 0.0 0.0 0.0 0.0 0.0 1.0 0.0 0.0 1.0 0.0 0.0 0.0 0.0 1.0 0.0 0.0 0.0 1.0 0.0 1.0 0.0 1.0 0.0 1.0 0.0 1.0 0.0 1.0 0.0 1.0 0.0 1.0 0.0 0.0 0.0 1.0 1.0 0.0 0.0 1.0 0.0 0.0 0.0 1.0 0.0 0.0 0.0 1.0 0.0 0.0 0.0 1.0 1.0 0.0 0.0 1.0 0.0 0.0 0.0 1.0 0.0 0.0 0.0 1.0 1.0 0.0 0.0 1.0 0.0 0.0 0.0 1.0 LIST OF REFERENCES [1] [2] [3] [4] [5] [6] [7] [3] [9] [10] LIST OF REFERENCES Y.C. Kim, “Architecture and Statistical Model of a Pulse Mode Digital Multilayer Neural Network,” Ph. D. Dissertation, Michigan State University, 1993. RP. Lipmann, “An Introduction to Computing with Neural Nets,” IEEE ASSP Magazine, pp. 4-22, April 1987. DE. Rumelhart, G.E. Hilton, and RJ. Williams, “Learning Internal Representations by Error Propagation,” Parallel Distributed Processing, Cambridge, MA, MIT Press, Vol. 1, pp. 318-362, 1986. RC. Hoppensteadt, “An Introduction ' to the Mathematics of Neurons,” Cambridge University Press, 1986. C. Mead, M. Sivilotti, and M Emerling, “A Novel Associative Memory Implemented using Collective Computation,” Chapel Hill conference on VLSI, pp. 329-342, 1985. HP. Graf, L.D. Jackel, W.E. Hubbard, “VLSI Implementation of a Neural Network Model,” IEEE Computer, pp. 41-49, March 1988. HP. Grail L.D. Jackel, “Analog Electronic Neural Network Circuits,” IEEE Circuits and Devices, pp. 44-49, July 1989. W. Wike, D.E. van den Bout and T.Miller III, “The VLSI Implementation of STONN,” Proc. UCNN, Vol. 111, pp. 443-452, 1987. DE. van den Bout and T.K Miller, “A Digital Architecture employing Stochasticisim for the Sinnrlation of Hopfield Neural Nets,” IEEE Trans. on Circuits and Systems, Vol. 36(5), pp. 732-738, May 1989. MS. Tomlinson Jr, D]. Walker, MA. Sivilotti, “A Digital Neural Network Architecture for VLSI,” Proc. UCNN, Vol. II, pp. 545-556, 1990. 126 127 [11] A. Moopenn, A.P. Thakoor, T. Duong, and SK Khanna, “A Neurocomputer based on an Analog-Digital Hybrid Architecture,” Proc. IEEE ICNN, Vol. III, pp. 479- 486, 1987. [12] Y. Tsividis, and S. Satyanarayama, “Analogue Circuits for Variable Synapse Electronic Neural Networks,” Electronic Letter, 1987. [13] D. Nguyen and F. Holt, “Stochastic Processing in a Neural Network Application,” Proc. IEEE ICNN, Vol 111. pp. 281-291, 1987. [14] W. James, “Psychology: Briefer Course,” Holt, New York, 1980. [15] W.C. McCulloch and W. Pitts, “A Logical Calculus of the Ideas Immanent in Nervous Activity,” Bulletin of Mathematical Biophysics, Vol 5, pp. 115-133, 1943. [16] DO. Hebb, “The Organization of Behavior,” John Wiely, New York, NY, 1949. [17] IV. Neumann, "The Computer and the Brain,” Yale University Press, 1958. [18] F. Rosenblatt, "The Perceptron: A Probablistic Model for Information Storage and Organization in the Brain,” Psychological Review, Vol 65, pp. 386-408, 1958. [19] B. Widrow and ME. Hofi; “Adaptive Switching Circuits,” 1960 IRE WESCON Convention Record, Part 4, Computers:, Man-Machine Systems, Los Angles, pp. 96-104, 1940. [20] M. Minsky and S. Papert, “Perceptron,” MIT Press, Cambridge, MA, 1969. [21] IA Anderson, “A Simple Neural Network Generating on Interactive Memory,” Mathematical Biosciences, Vol 14, pp. 197-220, 1972. [22] K. Fukushima, “Neocognitron: A Self-Organizing Neural Network for a Mechanism of Pattern Recognition Unaffected by a Shift in Position,” Biological Cybernetics, pp. 36-193, 1980. [23] T. Kohonen, “Associative Memory-A System Theoretical Approach,” Springer Press, New York, 1977. [24] J.J. Hopfield, “Neural Networks and Physical Systems with Emergent Collective Computational Abilities,” Proc. Natl. Acad Sci, Vol 79, pp. 2554-2558, 1982. [2 5] J.J. Hopfield, “Neurons with Graded Response have Collective Computational Properties like Those of TWO-State Neurons,” Proc. Natl. Acad. Sci, Vol 81, pp. 3088-3092, 1984. 128 [26] C. Mead, “Analog VLSI and Neural Systems,” Addison-Wesley, MA, 1989. [27] J. Darnell, H. Lodish and D. Baltimore, “Molecular Cell Biology,” Scientific American Books, New York, pp. 715-765, 1986. [28] B. Widrow, and S. Steams, “Adaptive Signal Processing,” Pren. Hall, 1985. [29] T. Kohonen, “Self Organization and Associative Memory,” Springer-Verlag, Second Edition, Berlin, 1987. ‘ ‘ [30] D.W. Tank and J.J. Hopfield, “Simple Neural Optimization Neural Networks: An AID Converter, Signal Decision Circuit, and Linear Programming Circuits,” IEEE Trans. on Circuits and Systems, Vol. CAS-33, No. 5, pp. 533-541, May 1986. [31] D.W. Tank and J.J. Hopfield, “Collective Computation in Neuronlike Circuits,” Scientific American, pp. 104-114, Dec. 1987. [32] F. Rosenblatt, “Principles ofNeurodynamics,” New York, Spartan, 1962. [33] A Guez and Z. Ahmad, “Sohrtion to the Inverse Kinematics Problem in Robotics by Neural Networ ” Proc. UCNN, Vol. II, pp. 617-624, 1988. [34] M. Knperstein and J. Wang, “Neural Controller for Adaptive Movements with Unforeseen Payloads,” IEEE Trans. on Neural Networks, Vol. 1, No. 1, March 1990. [35] GE. Hinton and TJ. Sejnowski, “Optimal Perceptual Inference,” Proc. IEEE Conference on Computer Vision and Pattern Recognition, Washington, pp. 448-453, 1983. [36] GE. Hinton and T.J. Sejnowski, “Learning and Releaming in Boltzmann Machines,” Parallel Distributed Procesing, Vol. 111, pp. 721-734, 1987. [3 7] Y. Tsividis and S. Satyanarayama, “Analog Circuits for Variable- Synapse Electronic Neural Networks,” Electronic Letters, Vol. 23, pp. 1312-1313, 1987. [38] Y. Suzuki and L. Atlas, “A Study of Regular Architecture for Digital Implementation of Neural Networks,” Proc. IEEE Int. Symposium on Circuits and Systems, Portland, May 1989. [39] Y. Suzuki and L. Atlas, “A Comparison of Processor Topologies for a Fast Trainable Neural Network for Speech Recognition,” Proc. IEEE Int. Conf on Acoustic, Speech, and Signal Processing, Glassgow, May 1989. 129 [40] DA Pomerleau, G.L. Gusciora, D.S. Touretzky, H.T. Kung, “Neural Network Simulation at Warp Speed: How We Got 17 Million Connections Per Second,” Proc. IJCNN, Vol. II, pp. 143-150, 1988. [41] BM. Forrest, D. Roweth, N. Stround, D.J. Wallace, and G.V. Wilson, “Implementing Neural Network Models on Parallel Computers,” Computer Journal, Vol. 30(5), pp. 32-41, 1989. [42] R Kuczewsk, M. Myers, and W. Crawford, “Neurocomputer Workstations and Processors: Approaches and Applications,” Proc. UCNN, Vol. III, pp. 487-500, 1988. [43] Y.C. Kim and M.A Shanblatt, “An Implementable Digital Multilayer Neural Network (DMNN),” Proc. IJCNN, Vol. H, pp. 594-600, 1992. [44] B.R Gaines, “Stochastic Computing,” Spring Joint Computer Conf, Vol. 30, pp. 149-156, 1966. [45] ST. Ribeiro, “Random Pulse Machine,” IEEE Trans. on Electronic Computers, Col EC-l6, No. 3, pp. 261-276, JIme 1967. [46] A Design & Test, “Behavioral Description Languages,” IEEE Design & Test of Computer, pp. 56-58, 1990. [47] OK. Keshavachandra and MA Shanblatt, “Modelling Artificial Neural Networks using VHDL,” A technical Report, MSU-ENGR-90-009, 1990. [48] B. Widrow and S. Steams, “Adaptive Sigral Processing,” Prentice-Hall, 1985. [49] M. Jurik, “Back Error Propagation: A Critique,” Proc., IEEE, May 1988. [50] TR Vogl, J.K Mangis, AK Rigler, W.T. link, and BL. Alkon, “Accelerating the Convergence of the Backpropagation Method,” Biol Cybem. 59, pp. 257-263, 1988. [51] T. Samad, “Backpropagation with Expected Source Values,” Neural Networks, Vol. 4, pp. 615-618, 1991. [52] CC. Lee, “Fuzzy Logic in Control Systems: Fuzzy Logic Controller-Part 1,” IEEE Trans. on Systems, Man and Cyber. Vol. 20, No. 2, pp. 404-418, March/April 1990. [53] CC. Lee, “Fuzzy Logic in Control Systems: Fuzzy Logic Controller-Part II,” IEEE Trans. on Systems, Man and Cyber. Vol. 20, No. 2, pp. 419-432, March/April 1990.