SCALABLE PULSED MODE COMPUTATION ARCHITECTURE USING INTEGRATE AND FIRE STRUCTURE BASED ON MARGIN PROPAGATION

By

Thamira Hindo

A DISSERTATION

Submitted to Michigan State University in partial fulfillment of the requirements for the degree of

Electrical Engineering - Doctor of Philosophy

2014

ABSTRACT

SCALABLE PULSED MODE COMPUTATION ARCHITECTURE USING INTEGRATE AND FIRE STRUCTURE BASED ON MARGIN PROPAGATION

By

Thamira Hindo

Neuromorphic computing architectures mimic the brain in order to implement efficient computations for sensory applications, in a manner different from that of the traditional Von Neumann architecture. The goal of neuromorphic computing systems is to implement sensory devices and systems that operate as efficiently as their biological equivalents. Neuromorphic computing rests on several ingredients: parallel processing instead of synchronous processing, hybrid (pulse) computation instead of purely digital computation, neuron models as the basic processing cores instead of arithmetic logic units, and analog VLSI design instead of digital VLSI design. In this work a new neuromorphic computing architecture is proposed and investigated for the implementation of algorithms in pulsed mode using a neuron-based circuit. The goal of the proposed architecture is to implement approximate non-linear functions, which are important components of signal processing algorithms; among the most important such algorithms are those that mimic biological systems such as hearing, sight and touch. The designed architecture operates in pulse mode and maps the functions onto an algorithm called margin propagation. The designed structure is a special network of integrate-and-fire neuron-based circuits that implements the margin propagation algorithm using the integration and threshold operations embedded in the transfer function of the neuron model. The integrate-and-fire neuron units in the network are connected through excitatory and inhibitory paths that impose constraints on the network firing rate. The advantages of the pulse-based, integrate-and-fire margin propagation (IFMP) algorithmic unit are that it can implement complex non-linear and dynamic programming functions in a scalable way; that it can implement functions through cascaded designs in parallel or serial architectures; that its modules can be realized as low-power, small-area analog VLSI circuits; and that it achieves a wide dynamic range, since the input parameters of the IFMP module are mapped in the logarithmic domain. The proposed IFMP algorithmic unit is investigated both theoretically and experimentally. The IFMP algorithmic unit is implemented as a low-power analog circuit, simulated using computer-aided design tools, and fabricated in a 0.5 micron CMOS process; the hardware performance of the fabricated IFMP architecture is also measured. The application of the IFMP architecture is investigated for three signal processing algorithms: sequence recognition and trace recognition using hidden Markov models, and binary classification using a support vector machine. Additionally, the IFMP architecture is investigated for the winner-take-all algorithm, which is important for hearing, sight and touch sensor systems.

Copyright by Thamira Hindo 2014

To my husband Sabah Almasraf and my daughters Nora, Basma and Nagham

TABLE OF CONTENTS

LIST OF TABLES
LIST OF FIGURES

Chapter 1 Introduction
  1.1 Problem and key solution
  1.2 Pulse computation
    1.2.1 What and why pulse computation
    1.2.2 Example of a pulse computational system
      1.2.2.1 Technical specifications
      1.2.2.2 Functional features
    1.2.3 Survey of pulse computation
      1.2.3.1 Pulse computation based on stochastic computations
      1.2.3.2 Pulse computation based on finite state machine
      1.2.3.3 Pulse computation based on neural integration
    1.2.4 Thesis objective
  1.3 Log-sum-exponent and margin propagation
  1.4 Integrate and fire structure
  1.5 Tasks and methodology
  1.6 Thesis organization

Chapter 2 Integrate-Fire Model and Synaptic Plasticity
  2.1 Pulse based IF
  2.2 Spike time dependency plasticity
  2.3 Summary

Chapter 3 IFMP
  3.1 IFMP: concepts and analysis
  3.2 IFMP properties
  3.3 Examples of basic computations using IFMP
  3.4 IFMP: circuit, layout and tested measurements
  3.5 Summary

Chapter 4 IFMP Applications
  4.1 Introduction to the Hidden Markov Model
  4.2 Sequence detection
    4.2.1 Chip simulation
  4.3 Trajectory detection
  4.4 Support vector machine
  4.5 Classification using support vector machine
  4.6 Summary

Chapter 5 Significance of IFMP
  5.1 Dynamic range enhancement in pulse computation
    5.1.1 Applications of stochastic computations
    5.1.2 Significant difference between IFMP and SC
  5.2 Novel pulse-mode winner-take-all (WTA)
    5.2.1 Importance of WTA
    5.2.2 Background of WTA (continuous input/output variables)
    5.2.3 Case study to design WTA (spiking networks)
    5.2.4 Contribution and significance of IFMP as WTA
    5.2.5 Experiment
  5.3 Summary

Chapter 6 Summary and Future Work
  6.1 Objectives
  6.2 Implemented tasks
  6.3 Significance of the IFMP algorithmic module
  6.4 Future work
  6.5 Publications

BIBLIOGRAPHY

LIST OF TABLES

Table 1.1: Comparison between analog and digital systems.
Table 1.2: Comparison of parametric specifications between the brain and the computer for simulating the human cortex. The computational energy cost in the brain is 10^8 times smaller (more efficient) than in computers.
Table 1.3: Comparison of functional specifications between the brain and the computer for simulating the human cortex.
Table 5.1: Differences in implementation between the pulse based stochastic computation structures and the pulse based IFMP computation modules.
Table 5.2: Survey of winner-take-all designs, from theoretical analysis to spiking populations of neural networks.
Table 5.3: Comparison between the temporal spiking network in Oster's architecture and the proposed work.
Table 6.1: Pulse rates obtained in the theoretical ("theo") and Cadence ("cad") simulations for the three neurons N1, N2 and N3, where z is the output rate of neuron N3, gamma = 0.2 and L2 = 0.3.

LIST OF FIGURES

Figure 1.1: Main concepts of the proposed module.
Figure 1.2: Stochastic computation. (a): Multiplication of stochastic bit streams with an AND gate. (b): Scaled addition on stochastic bit streams with a multiplexer (MUX). (c): Stochastic division [17, 21].
Figure 1.3: Pulse computation using state machines. (a): matched and unmatched pulses. (b): simplified state diagram for the Matcher. (c): simplified state diagram for the Differencer. (d): Matcher circuit [22]. (For interpretation of the references to color in this and all other figures, the reader is referred to the electronic version of this dissertation.)
Figure 1.4: (a): Oscilloscope traces from the Matcher: P1 and P2 are the inputs; O2 and O1 are the outputs of states S1 and S2 of the Matcher state diagram. (b): Block diagram of the peak detector application. (c): Oscilloscope traces showing peak detection: P1, P2 and P3 are the inputs; P4 is the output [22].
Figure 1.5: (a): Resistor array in a Hopfield network as a fixed-function analog design. (b): Processor node in digital VLSI.
Figure 1.6: Some pulse codings: A is pulse amplitude modulation (PAM), B is pulse frequency or rate modulation (PFM), and C is pulse duration modulation (PDM).
Figure 1.7: (a): Schematic architecture for a pulse-stream neural network; neurons are denoted as circles and synaptic operators as squares [36]. (b): A single synapse block. (c): The neuron with excitatory and inhibitory inputs.
Figure 1.8: Neuron and synapse circuits in (a) and (b) respectively [36].
Figure 1.9: (a): Output pulse generated from input current. (b): Synapse connection. (c): Pulse-coded winner-take-all block diagram [37, 27].
Figure 1.10: Factor graph representation showing that p(x3) = pa(x3) pb(x3).
Figure 1.11: Water level filling constraint.
Figure 1.12: Mechanism of spike generation and signal propagation through synapses and neurons.
Figure 1.13: (a): Simplest representation of the IF neuron model. The basic circuit is the module inside the dashed circle on the right-hand side: a current I(t) charges the capacitor C and the voltage across the capacitance is compared to a threshold vth. Left part: a presynaptic representation as a low-pass filter. (c): The dynamic equation of the membrane voltage in a population of spiking neurons in (b).
Figure 1.14: Conceptual schematic of a circuit that generates pulses through positive feedback via capacitor cfb across a high-gain buffer. The lower transistor adjusts the pulse width during the discharge phase of the capacitors.
Figure 2.1: (a): Integrate-and-fire schematic circuit. (b): Integrator and cascoded inverter. (c): Transfer function of the IF unit. (d): The output voltage Vout, the membrane voltage Vm and the input pin of the integrator.
Figure 2.2: IF model rate convergence.
Figure 2.3: The STDP modification function: the synaptic change with dT, where dT = tpre - tpost.
Figure 2.4: Floating gate memory.
Figure 2.5: STDP learning using a neuronal architecture, showing the connectivity between the synapses, neurons, post-shape, pre-shape and shift register modules.
Figure 2.6: (a): Learning curve of STDP as the change in weight versus the time interval between tpre and tpost. (b): The synapse circuit, showing the injection and tunneling needed in the floating gate to adjust the current in pMOS M1 and M2. (c): Timing sequence for the injection and tunneling in the synapse floating gate of (b). (d): Layout of the synapse of (b).
Figure 2.7: Post pulse shape circuit.
Figure 2.8: Pre pulse shape circuit.
Figure 2.9: Transient simulation of the STDP circuit.
Figure 2.10: Layout of the STDP architecture. The two chips are identical but differ in density.
Figure 3.1: Organization of a generic neurobiological sensory system; image source: Wikipedia and Refs. [5, 66].
Figure 3.2: Schematic of the proposed IFMP unit comprising three integrate-and-fire modules.
Figure 3.3: (a): Error bars between the theoretical and practical rates with respect to iterations, for iterations greater than 3000. (b): Error bars for iterations greater than 100.
Figure 3.4: Spike rates for neurons N1, N2 and N3 when the spike rate of L1 is monotonically increased.
Figure 3.5: Dynamic characteristics of the IFMP unit for different values of gamma, for MP and IFMP.
Figure 3.6: Approximation of the rate z between the log-sum-exp rate (z_LSE), the margin propagation rate (z_MP) and the integrate-and-fire margin propagation rate (z_IFMP).
Figure 3.7: (a): Block diagram showing the approximation between LSE and IFMP. (b): Cascading of the IFMP structure.
Figure 3.8: Implementation results of the pulsed serial cascaded and parallel IFMP module approximations compared to the log-sum-exp function and to real-valued serial and parallel MP. The y-axis represents the output rate (z) for the different cases shown in the legend; the x-axis represents the change of rate of one input while the other inputs are kept at constant rates.
Figure 3.9: Scaling property for spiking IFMP compared to the continuous values of margin propagation.
Figure 3.10: Monotonicity property for spiking IFMP compared to the continuous values of margin propagation.
Figure 3.11: Convexity property for spiking IFMP compared to the continuous values of margin propagation.
Figure 3.12: Margin propagation property for spiking IFMP compared to the continuous values of margin propagation.
Figure 3.13: Superposition property for spiking IFMP compared to the continuous values of margin propagation.
Figure 3.14: Input/output stages in the pulsed computational unit.
Figure 3.15: Basic computational architecture using IFMP.
Figure 3.16: Computational functions for addition using IFMP_sw, MP, LSE and IFMP_circuit. The right-hand figures represent the error between the LSE computation and IFMP_sw (Error1) and IFMP_circuit (Error2).
Figure 3.17: Computational functions for multiplication using IFMP_sw, MP, LSE and IFMP_circuit. The right-hand figure represents the error between the LSE computation and IFMP_sw (Error1) and IFMP_circuit (Error2).
Figure 3.18: Computational functions for power using IFMP_sw, MP, LSE and IFMP_circuit. The right-hand figure represents the error between the LSE computation and IFMP_sw (Error1) and IFMP_circuit (Error2).
Figure 3.19: Computational functions for the inner product using IFMP_sw, MP, LSE and IFMP_circuit. The right-hand figure represents the error between the LSE computation and IFMP_sw (Error1) and IFMP_circuit (Error2).
Figure 3.20: Computational functions for a polynomial using IFMP_sw, MP, LSE and IFMP_circuit. The right-hand figure represents the error between the LSE computation and IFMP_sw (Error1) and IFMP_circuit (Error2).
Figure 3.21: Output rates and error relative to the theoretical computation for both the circuit simulation (a) and the software simulation (b), evaluating the multiplication function using pulse computation.
Figure 3.22: Output rates and error relative to the theoretical computation for both the circuit simulation (a) and the software simulation (b), evaluating the polynomial function using pulse computation.
Figure 3.23: Output rates and error relative to the theoretical computation for both the circuit simulation (a) and the software simulation (b), evaluating the inner product function using pulse computation.
Figure 3.24: (a): Two-input IFMP module. (b): Analog circuit for the IFMP module.
Figure 3.25: (a): Interfacing boards for testing the IFMP chip. The PC and FPGA generate random pulses for, and retrieve the output pulses from, the IFMP circuit; the National Instruments interfacing card generates the voltage biases for the transistors of the IFMP and initiates the stack memory to pull the stored pulses to the PC. (b): The IFMP chip, comprising an 8x8 array of IFMP units and decoders.
Figure 3.26: Sample tests of the dynamic characteristics of IFMP when gamma = 0.2 (a), and the difference between the dynamic characteristics of the measured values and the theoretical rates (b).
Figure 3.27: Sample tests of the dynamic characteristics of IFMP when gamma = 0.4 (a), and the difference between the dynamic characteristics of the measured values and the theoretical rates (b).
Figure 3.28: Sample tests of the dynamic characteristics of IFMP when gamma = 0.6 (a), and the difference between the dynamic characteristics of the measured values and the theoretical rates (b).
Figure 3.29: (a): Uncoupled connection between neurons and the corresponding raster plot. (b): Coupled connection between neurons and the corresponding raster plot, showing the uniform and non-uniform occurrence of pulses at the bottom of the raster plots.
Figure 3.30: Power spectrum showing that noise is suppressed in the coupled connection of neurons compared with the uncoupled connection of Fig. 3.29.
Figure 4.1: (a): HMM states (s) and emission events (x). (b): Three-state HMM and its trellis representation, with a four-symbol input sequence to be recognized (1,3,3,2).
Figure 4.2: Three-state HMM cascaded using the IFMP structure to decode a sequence, with the analog circuit (below) for each cascaded state.
Figure 4.3: (Left): Rates of the states in Fig. 4.2 for two sequence recognitions with both matched and unmatched observations, using pulse-based HMM in software and schematic simulation. Time scale: 0.4 ms/step. (a): output rates of HMM H1, software simulation. (b): output rates of HMM H1, Cadence simulation. (c): output rates of HMM H2, software simulation. (d): output rates of HMM H2, Cadence simulation. (Right): A window of the output pulses of the three states of HMM H2, showing the sparseness of pulses between state s1 (rate Lp1), state s2 (rate Lp2) and state s3 (rate Lp3).
Figure 4.4: Instantaneous firing rates for neurons N1 to N3 using the IFMP network implementing HMM H1 (left) and HMM H2 (right) for the observation sequence {1,3,3,2}.
Figure 4.5: Interfacing boards for testing the IFMP chip. The PC and FPGA generate random pulses for, and retrieve the output pulses from, the IFMP circuit; the National Instruments interfacing card generates the voltage biases for the transistors of the IFMP and initiates the stack memory to pull the stored pulses to the PC.
Figure 4.6: Flowchart of the supporting software needed to implement applications using the FPGA board and the on-chip array of IFMP modules.
Figure 4.7: Timing simulation of the Verilog program that generates random pulses for the three IFMP units in time-sharing mode, where col1 and col2 select the IFMP unit on the chip.
Figure 4.8: The moving-average window for the three output states of the HMM with two observations, for HMM-H1 and HMM-H2 respectively.
Figure 4.9: The output pulses and moving-average window for the three states of HMM-H1.
Figure 4.10: The output pulses and moving-average window for the three states of HMM-H2.
Figure 4.11: Traces for eight trajectories starting from state 3, using three HMM states per path.
Figure 4.12: Four detection experiments for different trajectory paths of Fig. 4.11. The x-axis and y-axis represent the eight traces and the probabilities of each trace path in the experiment.
Figure 4.13: (a, b): Two examples of mapping non-separable data (left) into linearly separable data (right) by mapping the lower-dimensional data into a higher dimension. (c): Binary classification based on a linear-hyperplane SVM.
Figure 4.14: Classification using IFMP and a support vector machine, comparing the differential outputs.
Figure 4.15: SVM-based classification using parallel cascading of IFMP modules. k1:k9 are the kernels. The difference between the firing rates of the outputs, z+ - z-, determines the classification of the input data. B1: input vector (y) and support vector (x); B2: kernel function (k) and output decision (z+); B3: kernel function (k) and output decision (z-).
Figure 4.16: Results obtained from the SVM-based classification using the IFMP network. (a): Two-dimensional Iris data set; (b): raster plot of the spikes generated by the neurons in the network; (c): difference between the firing rates of the outputs z+ and z-.
Figure 4.17: Continued: difference between the firing rates of the outputs z+ and z-.
Figure 4.18: Classification for the serial cascaded architecture using IFMP. k1:k9 are the kernels. The biases and weights (b and alpha) are in differential form to find the outputs z+ and z- that decide the classification of the input data.
Figure 4.19: (a): Classification of two classes of the Iris data set with the serial cascaded architecture. The negative and positive values of (z+ - z-) represent the Setosa and Versicolor types respectively, using two models: MP (red marker) and IFMP (plus marker). (b): Raster plot for the spikes of neurons 30 and 81, which decide the classification of two samples of the input Iris data set in the first and second halves of the raster plot respectively.
Figure 5.1: (a): Multiplication on stochastic bit streams with an AND gate. (b): Scaled addition on stochastic bit streams with a multiplexer (MUX).
Figure 5.2: Stochastic computation implementing the forward HMM algorithm.
Figure 5.3: (Upper): Instantaneous firing rates of neurons N1 to N3 for the observation sequence {1,3,3,2} using the IFMP architecture. (Lower): the same observation sequence using the stochastic computation architecture.
Figure 5.4: (a): Input spikes, membrane potentials and output spikes of neuron 1 and neuron 2, showing how the winner, neuron 1, inhibits the membrane of neuron 2. (b): Full connectivity between 20 neurons for processing the input pulses shown in (a).
Figure 5.5: More than one winner in Oster's architecture when three random input streams are applied to three neurons.
Figure 5.6: Raster plot for 20 neurons based on Oster's spiking neuronal architecture. Two types of connectivity are implemented between the IF neurons: in the first part (below 400 time units) there are no weights between the neurons; the second part (above 400 time units) is the fully connected implementation of Oster's architecture, showing that there is only one winner neuron.
Figure 5.7: Raster plots of four experiments on Oster's architecture with stochastic inputs.
Figure 5.8: (a): Connections between 20 neurons in the IFMP architecture. (b): Inhibitory connections between neurons in the IFMP architecture, similar to the basic sensory receptive field shown in (c).
Figure 5.9: Pulse output response of a three-input IFMP architecture, showing that the output rate equals the highest input rate.
Figure 5.10: (a): Spikes of the neurons with respect to time using the IFMP architecture with inhibitory connections between the IF neurons. (b): Spikes of the neurons with respect to time without the connections between the IF neurons.
Figure 6.1: Convergence of one IF neuron with an input rate of 0.2, showing that the output rate (R) equals the input pulse rate (R1).
Figure 6.2: Convergence of one IF neuron with three inputs of rates 0.2, 0.1 and 0.3, showing that the output rate converges to the sum of the input pulse rates (R1, R2 and R3).
Figure 6.3: Convergence of the output rates of the three neurons, N1 = 0.25, N2 = 0.05 and N3 = 0.45, in the IFMP structure when the input rates gamma, L1 and L2 equal 0.3, 0.7 and 0.5 respectively.
Figure 6.4: Dynamic characteristics of a two-input IFMP module, showing the convergence and the output rates of the three neurons N1, N2 and N3 in both the theoretical and the circuit (Cadence) simulations.
Chapter 1

Introduction

1.1 Problem and key solution

Recognition and classification are the basic sensing applications of biological sensory organs. Extensive research has been devoted to implementing these applications using artificial neural network (ANN) algorithms and man-machine interface technology. Yet the processing time and the architectures of these implementations remain far from the "real time" processing and "scalable structures" found in biology. To achieve real-time processing, algorithms executed in software on a Von Neumann architecture have to be replaced by hardware implementations; in addition, the computational algorithms themselves have to be scalable if the hardware implementation is to be efficient. Real-time processing and scalability are the goals of implementing signal processing on a chip in general, and of implementing neuromorphic systems in particular. The goal of neuromorphic systems is to build a silicon chip that mimics the brain, so as to implement sensory devices as efficiently as biology does [1, 2, 3, 4, 5]. The architecture of such systems therefore has to resemble neuronal architectures rather than the traditional Von Neumann architecture: asynchronous parallel processing instead of synchronous single processing, hybrid computation (pulse mode, i.e. mixed analog and digital) instead of purely digital computation, a neuron model as the basic processing core instead of the arithmetic logic unit, and analog VLSI design instead of digital VLSI.

As a contribution to this large effort, a novel and scalable algorithm, together with its hardware implementation, is proposed for computing non-linear functions, an essential step in implementing signal processing algorithms for sensory applications. To meet this objective, the proposed algorithm is designed in pulse mode and maps the functions onto an algorithm called margin propagation using an integrate-and-fire structure. Figure 1.1 shows the three concepts on which the proposed module is based: first, pulsed computation, which is equivalent to hybrid computation; second, margin propagation (MP) as an approximation to the log-sum-exp (LSE) function; and third, an integrate-and-fire (IF) structure, inspired by the neuron cell, as the core unit.

[Figure 1.1: Main concepts of the proposed module.]

This chapter is organized as follows. Section 1.2 starts with the definition of pulse computation, how its variables are represented, and the reasons for using a pulse computation module; it then surveys pulse computation architectures in the research literature and states the objective of this thesis. The following two sections present the concepts that serve this objective: Section 1.3 covers the log-sum-exp function and the margin propagation algorithm, and Section 1.4 gives a brief description of the integrate-and-fire structure as the core unit of the proposed algorithm. Section 1.5 describes the tasks and methodology used to verify the concept. Finally, Section 1.6 states the organization of the subsequent chapters.
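Since margin propagation is the computational core of everything that follows, a brief numerical sketch may help fix ideas before Section 1.3 develops it formally. As defined there, MP computes the value z satisfying the water-filling constraint sum_i max(L_i - z, 0) = gamma (the constraint of Fig. 1.11), and serves as a piecewise-linear approximation to the log-sum-exp function in the normalized log-likelihood domain. The Python sketch below solves the constraint by bisection; the solver and the numerical values are illustrative assumptions only, not the integrate-and-fire implementation developed in this thesis.

```python
import numpy as np

def margin_propagation(L, gamma, iters=60):
    """Solve sum_i max(L_i - z, 0) = gamma for z by bisection.
    Illustrative solver only; the thesis computes the same fixed
    point with an integrate-and-fire network."""
    L = np.asarray(L, dtype=float)
    lo, hi = L.min() - gamma, L.max()        # constraint is monotone in z
    for _ in range(iters):
        z = 0.5 * (lo + hi)
        if np.sum(np.maximum(L - z, 0.0)) > gamma:
            lo = z                           # "water level" z still too low
        else:
            hi = z
    return z

L, gamma = [0.7, 0.5, 0.2], 0.3
z = margin_propagation(L, gamma)
print(z)                                            # ~0.45 here
print(np.sum(np.maximum(np.array(L) - z, 0.0)))     # ~gamma, as required
```

Note how, as gamma is made small, z approaches max_i L_i; this limiting behavior is what later lets the same unit act as a winner-take-all element (Chapter 5).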
1.2 Pulse computation

The pulse computation algorithm is introduced to combine the advantages of both digital and analog design. The proposed algorithm is mapped into hardware that is digital in its amplitude levels and analog in the variable time between those levels. The main reason for using pulse computation is that the computed parameters propagate easily between successive modules, because the input/output signals are digital. Before explaining the details of the computational module, it is important to define pulse computation, describe with an example the importance of pulse computation systems, survey the background of pulse computation, and state the objective of this thesis together with the key issues it raises.

1.2.1 What and why pulse computation

Pulse mode computation is an analog/digital (hybrid) computation which is digital (binary) in voltage level and analog (continuously variable) in the time intervals between the digital levels. In this mode, data are represented by streams of pulses that are asynchronous and arrive at arbitrary times. An example of such data is the stream of pulses generated by a neuron driven by a continuously varying current.

A pulse stream is represented as a stochastic stream of ones and zeros in which the probability of generating a one is the pulse rate. These pulse rates are the input/output variables of the pulse computation mode. In other words, each input/output variable is encoded as the probability of ones in a random sequence of zeros and ones, as the following derivation shows. If x is a random input drawn uniformly from [0, 1] (x ~ u[0, 1]) and y is the generated pulse stream, then a stream y of rate L_1 is generated as

y = 0.5\,[\mathrm{sgn}(x - 1 + L_1) + 1] \implies 2y - 1 = \mathrm{sgn}(x - 1 + L_1),

so that

p[y = 1] = p[\mathrm{sgn}(x - 1 + L_1) = 1], \quad p[y = 0] = p[\mathrm{sgn}(x - 1 + L_1) = -1].

Therefore the probability that the stream y is at logic 1 equals the rate L_1:

p[y = 1] = p[x > 1 - L_1] = \int_{1-L_1}^{\infty} u(0,1)\,dx = \int_{1-L_1}^{1} dx = L_1.

Pulse computation is a promising research topic because it combines the advantages of analog and digital design [6, 7]. The noise accumulation of analog stages can be eliminated by digital noise immunity, while analog design offers small area, low cost and low power, especially when the computational units operate in the weak inversion mode of complementary metal oxide semiconductor (CMOS) devices. Table 1.1 shows the main differences between digital and analog systems; the first three rows show the benefits of analog devices in computation, while the last two show the advantages of digital systems.

Table 1.1: Comparison between analog and digital systems.

Analog | Digital
Efficient computations use continuous values (currents and voltages). | Computations use discrete values (logic 0 and 1).
Computations arise from the physics of the computing devices (capacitors, resistors, etc.). | Computations arise from the mathematics of Boolean logic (AND, OR, etc.).
Designs have high density, small size and low power (addition requires only wires; multiplication requires 7 transistors). | Designs have lower density, larger size and higher power (addition requires 18 transistors; 2-bit multiplication requires 60 transistors).
Noise is due to thermal fluctuations in physical devices and accumulates in multi-stage designs. | Noise is due to round-off error and does not accumulate.
The signal is not restored at each stage of the computation. | The signal is restored at each stage of the computation.
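To make the rate encoding concrete, the following minimal sketch (Python; the stream length is an arbitrary illustrative choice) implements the generator y = 0.5[sgn(x - 1 + L_1) + 1] described above and checks that the empirical rate matches L_1. In the experiments of Chapter 3, streams of this kind are produced by a PC and an FPGA.

```python
import numpy as np

def pulse_stream(rate, n, seed=0):
    """Generate n pulse samples whose probability of being 1 is `rate`,
    via y = 0.5*(sgn(x - 1 + rate) + 1) with x ~ uniform[0, 1]."""
    x = np.random.default_rng(seed).uniform(0.0, 1.0, n)
    return 0.5 * (np.sign(x - 1.0 + rate) + 1.0)

y = pulse_stream(0.3, 100_000)
print(y.mean())   # empirical pulse rate, close to 0.3
```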
1.2.2 Example of a pulse computational system

The brain is an example of hybrid computation: the signals transmitted between neurons are spikes (pulses), while the processing of the signals inside the neuron is analog. The following discussion compares the power and efficiency of the brain's processing with that of computers.

Computers are no real competition for the human brain in areas such as vision, hearing, pattern recognition, and learning; computers cannot, for instance, match our ability to recognize a friend from a distance merely by the way he walks. And when it comes to operational efficiency, there is no contest at all. The 100 trillion synapses and 100 billion neurons in the brain process information faster and at higher density (the data are stored in the synapses) than software simulations of neural networks using mathematical models. For example, two dual-core chips are needed to simulate only thirty neurons, a 32-chip board is needed to simulate 302 neurons, and a 64-board rack of supercomputers is needed to simulate one million neurons. For the supercomputer, the cost runs to millions of dollars and the power bill to thousands of dollars, merely to simulate the brain of a bee (about one million neurons). A comparison of data processing between the brain and computers, based on technical parameters and functionality, is given below [8].

1.2.2.1 Technical specifications

Energy cost: The computational energy, measured in Joules per operation, is 10^8 times smaller in the brain than in a microprocessor. To see this, consider an example showing the difference in energy between the brain and a processor, treating each synaptic junction between neurons as the counterpart of an instruction operation in a microprocessor [9, 10]. Suppose that neurons fire at an average of 10 Hz in a human brain that has about 10^12 neurons and 10^15 synapses and uses approximately 10 W of power, and assume that the dominant energy consumer, and hence the basic operation, is a synaptic activation (the synapse being the junction between a pre- and a post-synaptic neuron). The energy cost of this activation, computed in Joules per operation, can then be compared with the cost per multiply-accumulate (MAC) of a contemporary microprocessor, e.g. 100 W running for 1 s at 10^9 operations per second [10].

For the human brain:
Energy cost = Energy / Number of synaptic activations = (10 W)(0.1 s) / 10^15 = 10^-15 J/operation.

For a multiply-accumulate (MAC) in a microprocessor:
Energy cost = Energy / Number of operations = (100 W)(1 s) / 10^9 = 10^-7 J/operation.

Therefore the energy cost for the human brain (10^-15 J/operation) is less than the energy cost in the microprocessor (10^-7 J/operation) by eight orders of magnitude.

Power: The power consumption of the brain is much lower than that of a computer processor: the brain consumes about 25 W, while a Pentium needs 130 W, and the power needed by a software simulation mimicking the brain's computation is about 10^10 W [3].

Frequency: Spike frequencies in the brain are between 1 and 100 Hz, while the clock frequency of a Pentium is in the GHz range.

Size: The brain occupies just 1 liter, while in computer technology a typical room-size supercomputer weighs roughly 1,000 times more and occupies 10,000 times more space than the cantaloupe-size lump of neural tissue that makes up the brain [3].

Cross connections: A synaptic event in neurons is equivalent to executing an instruction in a microprocessor. Synaptic activity amounts to about 10^16 neural connections per second [3]. It would take one million Intel Pentium processors to match that rate, besides tens of megawatts (a cost in time, power and energy) to process this information in software.
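The energy-cost comparison above is simple enough to check mechanically; the lines below merely reproduce the arithmetic of [9, 10] (energy per operation = power x time / number of operations):

```python
# Energy per operation = (power * time) / number of operations
brain_j_per_op = (10 * 0.1) / 1e15    # 10 W over one 10 Hz period, 1e15 synapses
mac_j_per_op   = (100 * 1.0) / 1e9    # 100 W for 1 s at 1e9 MAC operations/s
print(brain_j_per_op)                 # 1e-15 J/operation
print(mac_j_per_op)                   # 1e-07 J/operation
print(mac_j_per_op / brain_j_per_op)  # ratio: 1e8
```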
1.2.2.2 Functional features

The computer is a powerful machine for calculating and processing data, yet it is limited in the tasks that the brain is capable of performing. The attractive features that support such tasks in the brain differ functionally from those of the processor, as follows:

Parallel processing: Neurons work in parallel rather than sequentially as in processors; in other words, there is no central processing unit to schedule tasks serially. The access time of input/output information is the same in silicon memory and in the sensory/motor biological neural system, being measured in nanoseconds; however, the parallel architecture of the brain enhances the processing of information compared with computers, which is why computers are no real competition for the brain in vision, hearing, pattern recognition, and learning.

Clock orientation: Neurons are self-timed, while computers are clock-oriented systems that run on a global clock.

Memory: In neurons, memory is available during computation within the synapse connections, while in computers the data are distant from the computation.

Adaptation and modification: In the brain, learning and adaptation are continuous, while processors are fixed as designed and fabricated.

Determination of signal state: The brain's signal is stochastic and hybrid (digital-analog-digital), while the processor's is digitally deterministic.

Tables 1.2 and 1.3 summarize the technical and functional differences between the brain and the computer.

Table 1.2: Comparison of parametric specifications between the brain and the computer for simulating the human cortex. The computational energy cost in the brain is 10^8 times smaller (more efficient) than in computers.

Specification | Computer simulation of human cortex | Brain
Power | 10^10 W consumed in running the software simulation, in addition to the power needed by the system hardware. | 25 W.
Size | A typical room-size supercomputer weighs roughly 1,000 times more and occupies 10,000 times more space. | 1 liter.
Energy cost | 10^-7 J/operation, for 10^9 operations in 1 s at a power of 100 W. | 10^-15 J/operation, for a firing rate of 10 Hz, a power of about 10 W, and a total of 10^15 synapses.

Table 1.3: Comparison of functional specifications between the brain and the computer for simulating the human cortex.

Computer | Brain
It is a global-clock-oriented system, running at GHz rates. | It is a self-timed system, running at 1-100 Hz.
It has 1-8 processor cores and is based on serial computation. | It has distributed computational units and is based on parallel computation.
The buses are shared by several components. | There are dedicated local point-to-point connections.
The signal is digital and time-discrete. | The signal is digital in level (spikes) and analog in time.
The memory is distant from the computation. | The memory is available at the computation location.
The memory and processor are separated. | Storage and computation happen at the same time and in the same place.
Devices are fixed as fabricated; they are programmed devices. | It continuously adapts, modifies and processes information.
It is sensitive to noise. | Noise is a beneficial parameter [5].

These specifications motivate the design of new architectures for neuromorphic systems, in which the design principles and architecture are neuronally inspired. Scientists in information theory, computer science, communication, mathematics, biology and engineering are working together to mimic the human brain on a silicon chip. The primary reason for implementing neural models is the attempt to attain brain-like performance. Since the signals in the brain are spike or pulse based, processing this information requires methods of pulse-mode processing. Pulse system research is concentrated in two directions: first, the development of computational algorithms for encoding information [11, 12, 13, 14, 15]; and second, the morphing of these algorithms onto silicon elements that encode the input stimulus and then process the spikes so as to mimic biological sensory systems [16, 4, 3]. This thesis combines the two research directions: it introduces a novel pulse-based computation algorithm and verifies the algorithm at the circuit level. But what pulse computation algorithms exist in the research field, and what is the contribution of this thesis? These questions are explored in the next section.

1.2.3 Survey of pulse computation

The goal of a pulse computation algorithm is to provide an efficient hardware architecture for implementing signal processing tasks such as recognition and classification, where efficiency means low power and a compact, scalable architecture.
But, what are the pulse computation algorithms in the research field and what is the contribution of the proposed thesis? The answers for these questions are explored in the next section. 1.2.3 Survey of pulse computation The goal of pulse computation algorithm is to introduce efficient hardware architecture to implement signal processing tasks such as recognition and classification. The efficiency means 9 Table 1.3: Comparison of functional specification between the brain and the computer for simulating human cortex. Computer Brain It is a global clock oriented system, in GHz. It is a self timed system, 1-100 Hz. It has a 1- 8 core processors and it is based on serial computation. It has distributed computational units and it is based on parallel computation. The buses share several components. There are dedicated local point to point connections. The signal is digital and it is a timediscrete. The signal is digital in levels (spikes) and it is analog in time. The memory is distant from computation. The memory is available at computation location. The memory and processor are separated. The storage and computation happen at the same time and in the same place. Devices are fixed as fabricated. They are programmed devices. It continuously adapts, modifies and processes information. It is sensitive to noise. Noise is a beneficial parameter [5]. low power, compact and scalable architecture. The architecture of pulse computational algorithms are introduced in different categories: • Pulse computation is based on digital gates using stochastic computations. • Pulse computation is based on analog circuits using finite state machine. • Pulse computation is based on analog circuits using neural integration. 10 1.2.3.1 Pulse computation based on stochastic computations : Stochastic computation (SC) was introduced in the 1960’s as a method to design low precision digital circuits. The essential aspect of the SC is the possibility of performing complex computations using only simple circuitry. In SC, the input variables are represented as a probability of logic one in a stream of inputs of zeros and ones. The multiplication, scaled addition and division can be implemented using ”AND”, ”Multiplexers” and ”JK-flip flops” respectively as shown in Fig.1.2. For the AND gate shown in Fig. 1.2-(a) for example , if the input rates are 6/8 and 4/8, then the output rate is 6/8 × 4/8 = 3/8, as expected for the multiplication operation. Another example for the multiplexer shown in 1.2-(b), if the input rates are 1/8, 5/8, and 2/8, then the output rate is 2/8 × 1/8 + (1 − 2/8) × 5/8 = 4/8, as expected for the scaled addition. Experiments on polynomial functions used in image processing show that SC method produces circuits that are highly tolerant of input errors. The accuracy degrades gracefully with the error rate [17, 18, 19, 20]. SC has recently shown to be able to provide near optimal decoding performance for decoding with respect to sum product algorithms with small coding and specific applications, but it fails to build huge decoding systems due to SC weakness in computational precision and dynamic range. SC fails for some of the computation process, for example, the multiplication of two independent streams (p) such as p2 = p × p using logical bitwise AND operation lead to misleading computation. 
In addition to that, SC fails because the parameters used in computations should be scaled up or down depending on the math operator, for example, the addition cannot be performed directly since the addition of two stream rates might be greater than one, therefore scaled addition must be used to implement the addition operator. 11 a= 6/8 A B 1,1,0,1,0,1,1,1 1,1,0,0,1,0,1,0 AND b= 4/8 c= 3/8 1,1,0,0,0,0,1,0 C (a) Pa J Q Pb K Pc =Pa / (Pa + Pb ) CLK a = 1/8 A 0,1,0,0,0,0,0,0 c= 4/8 1 b = 5/8 B 1,0,1,1,0,1,1,0 0 S 1,0,0,1,0,1,1,0 C 0,0,1,0,0,0,0,1 s= 2/8 (b) J K Q 1 0 1 0 1 0 0 0 No change 1 1 Reverse (c) Figure 1.2: Stochastic computation (a): Multiplication stochastic bit streams with an AND gate. (b): Scaled addition on stochastic bit streams, with a multiplexer (MUX). (c): Stochastic division [17, 21] Another limitation of the SC is the low dynamic range of the computation in the sum -product algorithms. In such algorithms, the multiplication of two probabilities (represented as two bit streams) tends into a smaller quantity and hence multiple multiplications tend to much smaller value which results into an under flow problem in computation. In SC, the normalization step could be implemented using JK flip flops as division operator and multiplexer as scaled addition to realize the data in computations. But this process tends to be a difficult problem as the design size of the applications is huge (for example, in the case of hundred states, the implementation needs hundred JK flip flops and hundred to 12 one multiplexer). In addition to that, the huge number of states and the number of input stream need a storage media such as registers. This will definitely increases the size of the design. Therefore, the low dynamic range of computations, the under flow problem and the normalization difficulties lead into false computation, low precision and long time to reach convergence ( or never reach the convergence stage) in factor graph computations. 1.2.3.2 Pulse computation based on finite state machine : A new technique was proposed to introduce pulse computation using finite state machine. This computational domain may hold the possibility of discovering silicon analogues to the enormously complex computational operations. In [22], elementary computational units named as ”Macher” and ”Differencer” are implemented in hardware using CMOS 2µm. These units are the basic units to form more complex processing structures. The foundation of the pulse computation using state machine is the operation of pulse matching and difference. Fig.1.3 (a) illustrates matched and unmatched pulses, given two signal lines P 1 and P 2, then the signals are matched by pairing a pulse on P 1 with a pulse on P 2. A pulse on P 1 is unmatched, by contrast, if another pulse arrives on it and there are no pulses on P 2 during the successive time period. Fig. 1.3 (b) shows the state transition diagram of the ”Matcher” where the matched input pulses produce pulse outputs on line O2. if the machine is in P 1 mode (state S1) and receives an input pulse on P 1, no output is produced; but, if the pulse arrives instead on P 2, an output pulse is produced on O2 and the machine transitions to P 2 mode (state S2). Fig. 1.3(c) shows the state machine of the ”Differencer”. If the machine is in ’P 1 mode’ (state Sl), it will transition to ’P 2-mode’ (state S2) if a pulse received on input P 2, and remain in S1 either if nothing happens, or if a pulse is received on input P 1. 
Output (a 13 unmatched P1 P1 P1 S1 P2 matched matched (a) P2 [O1] [O2] P2 S2 P1 P1 S1 P2 [O1] (b) P2 S2 [O2] (c) (d) Figure 1.3: Pulse computation using state machines. (a): matched and unmatched pulses. (b): simplified state diagram for the matcher. (c): simplified state diagram for the differencer. (d): matched circuit [22]. (For interpretation of the references to color in this and all other figures, the reader is referred to the electronic version of this dissertation) pulse on line O1 or O2) is shown as [O1] or [O2]; so if the machine is in S1 and receives a pulse on P 1, it will output a pulse on O1. Therefore, this device outputs pulses only when unmatched input pulses are received. The analog circuit and the oscilloscope traces from matcher is shown in Fig. 1.3(d) and Fig. 1.4(a) respectively. The circuit of the matcher is an SRAM cell constructed from crosscoupled inverters (transistors M P 1, M P 2, M N 1, M N 2) to hold the state. States (Sl) and (S2) are marked on the diagram; (S2) high means the circuit is in P 2-mode. In P 2-mode, a pulse arriving on P 1 first opens, then closes the left-side pass-gate (M P 11, M N 11). This 14 charges the dynamic node A high. When P 1 falls low again (at the end of the incoming pulse,) the output of the NAND gate formed by ( M P 1a, M P 1b, M N 1a, M N 1b) then drops low; this signal is inverted by (M P 12, M N 12) onto O1. An output high on O1 discharges the dynamic node through M N 13, causing the system, including O1, to reset. The result is a brief voltage pulse, [O1], which also (through M N 14) sets the SRAM to state S1. If the incoming pulse had instead been on P 2, the right-side pass-gate (M P 21, M N 21) would have opened; but since the dynamic node B would not have been raised high no output nor change of state would have been produced. Oscilloscope trace for matcher state diagram P1 P1 P2 D P2 O1 O2 P3 D O2 O1 O2 M O1 Out Time (a) Oscilloscope trace for the peak detector (b) P1 P2 P3 Out Time (c) Figure 1.4: (a): Oscilloscope traces from matcher. P1 and P2, inputs; O2 and O1 outputs of states S1 and S2 for the matcher state diagram. (b): Block diagram of the peak detector application. (b) Oscilloscope traces from shows peak detection, P1, P2, P3,are the inputs; P4 is the output state diagram [22] 15 The basic elementary circuits are used to built a more complex application, for example, peak detector is designed using mixed units of differencer and matcher shown in the block diagram of Fig. 1.4 (b). The design considers the conversion of the signal amplitude into frequency and uses the concept of edge detection to find the highest frequency or the highest amplitude signal. In the figure, the outputs of the two differencers are the edge detectors between inputs P 1, P 2 and between P 2, P 3 respectively. The matcher generates an output pulse, with an average frequency which reflects the strength of the peak as shown in Fig. 1.4 (c) where the first three traces are the inputs and the last trace is the output. The occurrence of the output pulses indicates the input trace with the highest frequency. Although the above method is attractive in accuracy and programmability but the limitation here is in mapping the mathematical expressions needed in the applications of the signal processing algorithms. Also, the above computation unit is not scalable which means that any new application to design in hardware requires a new architecture of state machine and new analog circuits. 
1.2.3.3 Pulse computation based on neural integration

Pulse stream computation was first introduced in the context of neural integration by Murray in 1987 [23]; since then it has been used by a number of other groups [24, 25, 26, 27]. Before introducing the concept of neural pulse computation, the neuron itself is explored as a pulse computation unit, followed by an explanation of where pulse computation stands among the digital and analog computation methods.

Pulse computation in the biological neuron was first noticed, modeled and verified in the experiments of Hodgkin and Huxley in 1952 [28, 29, 30]. Neurons communicate with each other through pulses: the pulses travelling along the axon release neurotransmitters from the pre- to the post-synapse, generating pulses in the post-synaptic neuron. As the amount of neurotransmitter increases, the rate of the output pulses increases; in biological experiments, the increase in neurotransmitter is represented by increasing the value of the current injected into the neuron membrane. It was shown in biological experiments on the giant squid axon, compared with a simulated neuron model, that the transfer function of a neuron between the activating input current and the output spike rate is close to a logarithmic function [28]. This means that the addition of two input rates in one neuron represents the multiplication of the two underlying quantities [31, 29]. Such neural multiplication of input pulses realizes specific tasks, for example motion detection based on the correlation principle, which multiplies an input signal by a delayed version of the same or another input signal [30].

On the other hand, the decision made by one neuron is a function of the weighted sum of all the inputs in the neural population. If V_j is the input pulse stream from neuron j to neuron i, and the weight of the synapse between these two neurons is denoted T_ij, then the output pulse stream V_i is a function of the weighted-sum activity x_i:

V_i = f(x_i) = f\left( \sum_{j=1}^{n} T_{ij} V_j \right)    (1.1)

This weighted-sum equation is the core of artificial neural network (ANN) algorithms, and it has been realized in hardware as digital, analog and hybrid VLSI computation. Digital techniques are used in implementations of the weighted sum; the hardware platform can be a special-purpose chip, for example the Connected Network of Adaptive Processors (CNAPS) chip, which is built around the processor node shown in Fig. 1.5(b). The node comprises a multiplier (to form T_ij V_j), an adder to perform the summation, a RAM memory device, input/output control and an arithmetic shifter.
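Before continuing with the CNAPS architecture, here is a minimal software rendering of Eq. (1.1), the operation that each such processor node implements in hardware (the sigmoid activation f and the numerical values are illustrative assumptions):

```python
import numpy as np

def neuron_output(T_i, V, f=lambda x: 1.0 / (1.0 + np.exp(-x))):
    """Eq. (1.1): V_i = f(sum_j T_ij * V_j) for a single neuron i,
    using a sigmoid as the activation function f."""
    return f(np.dot(T_i, V))

V   = np.array([0.2, 0.9, 0.4])    # presynaptic states (pulse rates)
T_i = np.array([0.5, -1.0, 2.0])   # synaptic weights into neuron i
print(neuron_output(T_i, V))
```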
Although analog architecture sacrifices the computation precision, the low power encourages a massive dense network to be built. Yet, the Hopfield network was abandoned for a decade because of limitations in programmability. In the Hopfield network, the computation of the synapse weight is represented by the resistors and the op-amp which imply limited programmability. Once the resistors are fabricated within the CMOS process, the weight value and the network functionality are fixed which causes a lack in the dynamic change of the weight. However, introducing an array of resistors and switches to select the proper weight, increases the complexity and the size of hardware. Later on, the resistors in the above design are replaced by weak inversion MOSFET operation which take small space on the hardware chip. Mead [16] was the pioneer in introducing that since he found the similarity in operation between the transistors and nerve cell in which both has exponential transfer function. Yet, the computation in analog has limited programmability and storage. Many researchers later on used the floating gate that can dynamically change the weight. However, cascading modules of computational networks as well as the storage capability in analog hardware computation are still an open research topic. 18 Resistors (Synapse) V0 Rf R23 = V .T added to input of neuron N 3 23 2 V1 N2 (a) Output unit Input unit 32 x16 bits V2 V3 V2 =.. + (R f /R23).V3 Logic 4K bytes Logic (b) Figure 1.5: (a): Resistor array in Hopfield as fixed function analog as (b): Processor node in digital VLSI To summarize the above, digital circuitry offer advantages to compute Tij Vj in terms of simple design and accuracy, but does not allow more than few hundred parallel multiply-add operations on a chip. However, analog VLSI allow massively parallel arrays of interconnected neurons in terms of thousands synapse that can be integrated on a single chip. These considerations have led to adapt a pulse stream computation technique which perform analog multiplication under digital control. 19 The underlying principles for using pulse coded stream in neural computations are the followings: 1. Analog computation is attractive in neural VLSI, for reasons of compactness, potential speed, asynchronous, and lack of quantization effects. 2. Analog signals are far from robust against noise and interference, are susceptible to process variations between devices and are not robust in the inter chip communication. 3. Digital silicon processing is more readily available than analog. 4. Digital signals are robust, easily transmitted and regenerated. 5. Digital consume more power and area than analog. Figure 1.6: Some of the pulse coding, A is the pulse amplitude modulation (PAM), B is the pulse frequency or rate modulation (PFM),and C is the pulse duration modulation (PDM) These considerations encourage a hybrid approach, seeking to blend the merits of both digital and analog technology. The pulse stream technique uses digital signals to carry information and control analog circuitry, while storing further analog information on the time axis. A number of possible techniques exist for coding a neural signals shown in Fig. 20 1.6 such as Pulse Amplitude (PAM), Pulse Width (PDM), and Pulse Frequency (PFM). The PFM is defined as the number of pulses per unit time corresponding to the amplitude of the analog signal. PFM is the coding mode used in the proposed computation model and it is referred as the rate or density modulation. 
In the following, the pulse computation approach to mapping a dense computation of the weighted sum in a network architecture is introduced [35]. Fig. 1.7 (a) shows the architecture of a single network of n totally interconnected neurons. Neurons, represented by circles, signal their states {Vi} upward into a matrix of synaptic operators. The state signals are connected to an n-bit horizontal bus running through this synaptic array, with a connection to one synaptic operator in every column. Each column consists, therefore, of n operators, denoted by squares, each adding a new contribution Tij Vj to the running total of activity for the neuron i at the foot of the column. The function of the neuron is therefore to apply a sigmoid function to this activity to determine a neural state Vi. The neuron function is illustrated in Fig. 1.7 (c). The neuron is shown as receiving excitatory and inhibitory inputs, and producing a state output. The synaptic function is to multiply a neural state Vp by a synaptic weight Tip (stored in memory local to the synaptic operator) and add the result to a running total; it is also straightforward at the functional level. Fig. 1.7 (b) shows a single synapse block. The (positive or negative) synaptic weight Tip is stored in digital memory. To form the product Tip Vp, the presynaptic neural state is gated according to chopping signals derived from Tip. The resultant product Tip Vp is added to the running total propagating down either the excitatory or the inhibitory activity channel. One binary bit (the MSB) of the stored Tij determines whether the contribution is excitatory or inhibitory.

Figure 1.7: (a): Schematic architecture for a pulse-stream neural network. Neurons are denoted by circles, shown in (c), and synaptic operators by squares, shown in (b) [36].

The circuit of the neuron that implements the sigmoid function and the circuit of the synapse that computes the weighted sum are shown in Fig. 1.8 (a, b) respectively [36, 34].

Figure 1.8: Neuron and synapse circuits in (a) and (b) respectively [36].

Another computational model of a pulse coded neural network is introduced in [27, 37]. The levels of the input pulses, which represent the output from the pre-neuron Vi and the weight Ti, introduce current into the synapse-neuron circuit, which encodes the input current into pulse density coding as shown in Fig. 1.9 (a, b). The output of the neuron in Fig. 1.9 (b) is transmitted to other neurons via synapses. The weight inputs represent the connection strength (synaptic weight) from a particular neuron to another. Currents from all synapses connected to a neuron are summed at the input. The net current modulates the pulse train emanating from the neuron. As an application of this weighted-sum circuit, a pulse coded winner-take-all network (PWTA) has been implemented in a 2-micron CMOS process [27]. Fig. 1.9 (c) shows a block diagram of the PWTA. The three neurons receive weighted inputs from V1 and V2. These three neurons in turn feed their outputs (Y1, Y2, and Y3) to a global inhibition generator.
This global inhibition generator feeds back inhibition signals to all the neurons in the network. Thus, if the output of one neuron is high, the Iinh inputs of all the neurons will be forced high. This will in turn discharge all the integrating capacitors in the network, thereby effecting the inhibition of all the neurons.

Figure 1.9: (a): Output pulse generated from the input current. (b): Synapse connection. (c): Pulse coded winner-take-all block diagram [37, 27].

Many applications are implemented in signal processing and ANN in digital VLSI, analog VLSI and pulse computation, but there is no scalable computation algorithm that implements more than the "weighted sum function". The limitation is that these designs are specific to certain applications (algorithms based on the weighted-sum function), which means that any change in the application requires changing the hardware structure and building a new system design. To address this limitation, an array of scalable pulse computational modules is introduced to implement nonlinear functions (not restricted to computing only the weighted sum) and to map the algorithm into analog VLSI hardware. In order to achieve that, the proposed algorithm has to meet the items discussed in the next section.

1.2.4 Thesis objective

In this thesis, a scalable pulse computation algorithm and its hardware implementation, based on the integrate and fire structure and margin propagation, is developed to meet the following:

1. Convergence property: As a stochastic pulse stream computation, the algorithm has to meet the convergence and stability requirements for the local unit and for the whole structure of the module.

2. Cascading and scalability property: The module can be cascaded in serial and in parallel. Serial cascading supports the scalability property, which means that the basic unit of the module can be cascaded to handle a growing amount of work in a capable manner.

3. Computation properties: As a new algorithm, the module has to meet the basic mathematical properties: scaling, monotonicity, convexity, superposition and offset.

4. Basic computation: The proposed module must be able to map the basic computations (addition, subtraction, multiplication and division) and mathematical expressions (power, polynomial and inner product) in pulse mode.

5. Computation capabilities: Computation capability means that the module is not restricted to computing only weighted-sum terms; the unit can be mapped onto other types of computations, such as the dynamic equations of the hidden Markov model and the recursion equations of the support vector machine. In addition, the IFMP algorithm generates an output that is equal to the maximum rate among the inputs with an adjustment of certain parameters.

6. Hardware realization: A basic premise of the thesis is the capability of mapping the algorithm into analog circuits.

7. Power and size efficiency: Since the algorithm can be realized in analog VLSI circuits, the design can be made compact (occupying a small area) and low power, as the circuits are biased in the subthreshold (weak inversion) regime.

8. Significance of the new pulse computation design: Because of the importance of spiking networks with real-time, random input pulses, it is important to show the impact and significance of this design in the research space.
To meet the above objectives and implement the pulse computation algorithms, the equations of the algorithm have to be first mapped into the log-sum-exp function, then mapped into the margin propagation procedure, and finally implemented using the integrate and fire structure as the basic core of the algorithm. In the next two sections (sections 1.3 and 1.4), the reason for using log-sum-exp is explained, along with how this function is related to both margin propagation and the integrate and fire structure. In section 1.5, the methodology and tasks to achieve the thesis objectives are listed.

1.3 Log-sum-exponent and margin propagation

To meet the objectives, the proposed algorithm and hardware architecture use log-sum-exp and margin propagation. The first term, "log-sum-exp", abbreviated as LSE, is an important function for the following reasons:

1. LSE is used in factor graph and signal processing applications such as Bayesian belief propagation [38]. In factor graph algorithms, the basic functions used are sums of product terms (probability terms), which are used to evaluate the probabilities or the marginal functions of messages passed between the nodes and variables of the factor graph. Since a product of probability terms tends to decrease as the number of probability terms increases, a problem of underflow arises and causes false computations. Therefore, such algorithms use log-likelihood computation to eliminate the underflow problem.

2. LSE increases the dynamic range of the variables in the computation process, since it depends on logarithms of the input parameters.

3. For n input parameters, the LSE function evaluates approximately the maximum value of the input parameters, as shown in the following derivation. Let P1, P2, P3, ..., Pn represent probability variables and let

f = \sum_k P_{1k} P_{2k} \cdots P_{nk}   (1.2)

Then the LSE expression is represented as

\log(f) = \log\left( \sum_k e^{\log(P_{1k}) + \log(P_{2k}) + \cdots + \log(P_{nk})} \right)   (1.3)

Now let a_k = \log(P_{1k}) + \log(P_{2k}) + \cdots + \log(P_{nk}) and set b = \max_k(a_k); then

\log(f) = \log\left( e^{b} \sum_k e^{a_k - b} \right) = b + \log\left( \sum_k e^{a_k - b} \right) \approx b   (1.4)

The last term is approximately equal to b if all the terms a_k (except the maximum value) are much smaller than b. The trick above is to shift the terms a_k by the maximum value b, evaluate the log-sum-exp, and then shift back to the original position.

In this work, the LSE expression is the first step in mapping non-linear functions. It can be said that if a linear or non-linear function can be mapped into LSE, then the function can be implemented in the proposed architecture. An example below demonstrates the mapping of a function into LSE.

Example: Let a, b, c and d ∈ R and let f = ab + cd. Then

\log(f) = \log(ab + cd)   (1.5)

z = \log(f) = \log\left[ e^{\log(a) + \log(b)} + e^{\log(c) + \log(d)} \right]   (1.6)

The last term of the example is called the LSE expression. As mentioned earlier, the LSE expression is used in factor graph algorithms. Next, the factor graph algorithm is defined, along with how it can be represented in LSE. A factor graph is a graphical representation that shows how a function of several variables can be factorized into a product of smaller functions; it is useful because it turns complicated computations into efficient algorithms. In a factor graph, the nodes (rectangles) represent the factors or functions, while the edges represent the variables [39]. The proposed model can be applied to factor graph problems such as inference.
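The shift-by-the-maximum trick of equation 1.4 is easy to verify numerically. A minimal Python sketch (the function name is hypothetical; the values a, b, c, d are arbitrary) applies it to the example of equation 1.6:

    import math

    def log_sum_exp(a):
        # Shift by b = max(a): exp() then operates on non-positive arguments,
        # avoiding overflow/underflow (the trick of equation 1.4).
        b = max(a)
        return b + math.log(sum(math.exp(a_k - b) for a_k in a))

    a, b2, c, d = 2.0, 3.0, 4.0, 5.0
    z = log_sum_exp([math.log(a) + math.log(b2), math.log(c) + math.log(d)])
    print(z, math.log(a * b2 + c * d))   # both ~ log(2*3 + 4*5) = log(26)

When one shifted term dominates, the logarithm of the sum is close to zero and z is close to b, which is the max-selecting behavior noted in point 3 above.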
In such problems, the probability of the message propagated between two independent graph sets can be evaluated based on belief propagation. An example shows the concept of factorization by evaluating the probability along any path of the graph using the sum-product rule [39]. The main idea of using the factor graph is to marginalize out some variables by sum operators, which reduces the computation of the messages passing through nodes and edges. Take a function f(x1, x2, x3, ..., xn), where f is the probability mass function (PMF) of the discrete random variables x1, x2, x3, ..., xn. In order to compute fk(xk) (for example, n = 4 and k = 3 in Fig. 1.10),

f_k(x_k) = \sum_{x_1 \ldots x_n \text{ except } x_k} f(x_1, x_2, x_3, \ldots, x_n)   (1.7)

where fk(xk) is the PMF of xk. If fk(xk) can be divided into two independent graph sets, then it can be inferred that the probability of the message propagated between the two sets is calculated by multiplying the probabilities of the two sets. This can be calculated using the sum-product algorithm, in which the large marginalization is split into a sequence of small marginalizations. For example, assume that

f_k(x_k) = \sum_{x_1 \ldots x_n \text{ except } x_k} f(x_1, \ldots, x_k) \, f(x_k, \ldots, x_n)   (1.8)

and the marginal probabilities pa(xk) and pb(xk) are denoted as

p_a(x_k) = \sum_{x_1 \ldots x_k \text{ except } x_k} f(x_1, \ldots, x_k)   (1.9)

p_b(x_k) = \sum_{x_k \ldots x_n \text{ except } x_k} f(x_k, \ldots, x_n)   (1.10)

Then the probability of the message passing in path k is

p(x_k) = p_a(x_k) \, p_b(x_k)   (1.11)

The above equations are expressed in log-sum-exp as follows:

\log(p_a(x_k)) = \log\left( \sum_{x_1 \ldots x_k \text{ except } x_k} e^{\log(f(x_1, \ldots, x_k))} \right)   (1.12)

\log(p_b(x_k)) = \log\left( \sum_{x_k \ldots x_n \text{ except } x_k} e^{\log(f(x_k, \ldots, x_n))} \right)   (1.13)

\log(p(x_k)) = \log(p_a(x_k)) + \log(p_b(x_k))   (1.14)

Figure 1.10: Factor graph representation showing that p(x3) = pa(x3) · pb(x3).

In factor graphs, the algorithms are applied extensively using analog input parameters. In this work, however, the input parameters are represented as streams of random pulses and applied in a scalable architecture. Since it is difficult to implement modularity in LSE, an approximation to LSE is needed; therefore, the margin propagation method is introduced as an approximation to LSE. In previous work [40], it was proven that margin propagation is an approximation to LSE. The input/output variables in that work were represented as currents, and the computation procedure was implemented using Kirchhoff's current law. Margin propagation was implemented in those circuits to achieve scalability in decoding algorithms. In this work, a pulse computation mechanism for margin propagation as an approximation to LSE is introduced.

The core of the margin propagation (MP) algorithm is based on the concept of the reverse water-filling (RWF) algorithm [41]. Given a set of random inputs (scores) Li ∈ R, i = 1 : N, the RWF algorithm computes the solution z according to the constraint

\sum_{i=1}^{N} [L_i - z]_+ = \gamma   (1.15)

where [·]+ = max(·, 0) denotes a threshold operation and γ ≥ 0 represents a parameter of the algorithm. Note that z is in the log domain. The solution of equation 1.15 can be visualized using Fig. 1.11, where the cumulative scores beyond z (shown by the shaded area) equal γ.

Figure 1.11: Water-level filling constraint: the log-sum-exp z = log(e^{L1} + e^{L2} + ...) is approximated by the margin propagation solution z = M(L1, L2, ..., Li, γ) satisfying Σ[Li − z]+ = γ.
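Since the left-hand side of equation 1.15 is piecewise linear and monotonically decreasing in z, the constraint can be solved numerically, for instance by bisection. The sketch below is one such solver (a minimal illustration, not the circuit mechanism developed in this thesis, which computes z through spiking dynamics instead):

    def margin_prop(L, gamma, iters=60):
        # Solve sum_i [L_i - z]+ = gamma (equation 1.15) for z by bisection:
        # the left-hand side decreases monotonically in z and reaches 0 at z = max(L).
        lo, hi = min(L) - gamma, max(L)
        for _ in range(iters):
            z = 0.5 * (lo + hi)
            if sum(max(Li - z, 0.0) for Li in L) > gamma:
                lo = z          # constraint still exceeded: raise the "water level"
            else:
                hi = z
        return 0.5 * (lo + hi)

    print(margin_prop([0.2, 0.9, 0.5], 0.3))   # ~0.6, since [0.9 - 0.6]+ = 0.3 = gamma

Note that the returned z never exceeds max(L), consistent with the limit stated next.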
The limit of z is such that z ≤ max_i L_i. Equation 1.15 can be rewritten in the form

z = M(L_1, L_2, \ldots, L_N, \gamma)   (1.16)

where M denotes the MP function and N denotes the number of operands in the MP function. Then LSE is approximated as

z = \log\left( \sum_{i=1}^{N} e^{L_i} \right) \approx M(L_1, L_2, L_3, \ldots, L_N, \gamma)   (1.17)

To summarize the above, LSE is a useful function for mapping signal processing equations and factor graph algorithms. Yet it is difficult to realize a cascaded LSE in analog circuits. In this work, the LSE is approximated by the MP algorithm, which has been verified to implement scalable computation. The approximation from LSE to MP is realized in hardware using a novel design based on integration and threshold operations. These operations are realized using the integrate and fire structure (explained in the next section) as the core unit of the computational module.

1.4 Integrate and fire structure

In order to realize the threshold operation used in the MP algorithm discussed in the previous section, it is important to implement this operation at the circuit design level. The integrate and fire (IF) structure is introduced to compute the threshold operation. The IF model is a simple representation of a neuron; it is extensively used as a neuron model in spiking neural networks [42, 43, 44] and neuromorphic systems [45, 4]. The neuron model is an important computational structure, and it is considered the basic core in designing ANNs, which have numerous applications in biological, statistical and engineering processing. To understand the operation of the IF neuron model, the structure of the neuron is explained briefly. The neuron has three parts, as shown in Fig. 1.12:

1. Dendrites: These branch-like projections of the cell make connections to other cells and allow the neuron to communicate with other cells.

2. Soma: It is the place that joins the signals from other neurons via the dendrites. The soma and nucleus do not play a role in the transmission of neural signals, but they maintain the cell's functionality.

3. Axon: It is located at the end of the soma. It controls the firing of the neuron; if the total potential reaches a certain threshold, the neuron fires. It propagates the spikes to other cells.

Figure 1.12: Mechanism of spike generation and signal propagation through synapses and neurons.

Nerve cells communicate with one another using spikes (pulses) passed through specialized junctional structures called synapses. Spikes are generated when pulses cause neurotransmitters to be released from the presynapse, which in turn release ions into the postsynapse, as shown in Fig. 1.12. The underlying mechanism of spike or action-potential generation is the unbalanced movement of ions across the membrane, which alters the potential difference between the inside and the outside of the neuron. In the absence of any stimulus to the neuron, the potential of the membrane with respect to the potential outside the membrane is about −65 mV, also referred to as the resting potential. This potential is increased by the influx of sodium ions (Na+) into the cell, causing depolarization, whereas the potential is decreased by the efflux of potassium ions (K+) out of the cell, causing hyperpolarization.
Once the action potential is generated, the Na+ ion channels are unable to reopen immediately until a built-up potential is formed across the membrane. The delay in reopening the sodium channels results in a time period called the refractory period, shown in Fig. 1.12, during which the neuron cannot spike. There are two types of synapses typically encountered in neurobiology: excitatory synapses and inhibitory synapses. For excitatory synapses, the membrane potential of the post-synaptic neuron (referred to as the excitatory post-synaptic potential or EPSP) increases, whereas for inhibitory synapses, the membrane potential of the post-synaptic neuron (referred to as the inhibitory post-synaptic potential or IPSP) decreases. It is important to note that the underlying dynamics of the EPSP, the IPSP and the action potential are complicated, and several books have been written just to discuss the mathematical models [46, 43]. Therefore, for the sake of brevity, a simple integrate-and-fire neuron model is described, which is used in the design of neuromorphic sensors [4].

A schematic diagram of the IF model is shown in Fig. 1.13 (a). The basic circuit inside the dashed circle represents the IF model. The IF structure is described as follows: a current I(t) charges the capacitor C, representing the membrane of a neuron, so that a voltage vm(t) across the capacitance is built up and compared to a threshold. If vm(t) = vth at firing time t_i^f, then an output delta-type pulse δ(t − t_i^f) is generated. The generated pulse resets the capacitor voltage to its initial value, and hence pulses are continuously generated as long as the capacitor keeps charging and discharging. In the same figure (left), the synapse is represented as a low-pass filter whose input pulses are δ(t − t_j^f) and whose output pulses are α(t − t_j^f).

Figure 1.13: (a): Simplest representation of the IF neuron model. The basic circuit is the module inside the dashed circle on the right-hand side; a current I(t) charges the capacitance C and the voltage across the capacitance is compared to a threshold vth. The left part shows a presynaptic representation as a low-pass filter. (b): A population of spiking neurons. (c): The dynamic equation of the membrane voltage in the population of (b).

The membrane voltage vm is simply described by the following equation for the charging of the membrane capacitance C:

C \frac{dv_m}{dt} = I(t)   (1.18)

The above equation is written in discrete time, at time index n with sampling period ts, as

v_m[n] - v_m[n-1] = \frac{I[n] \, t_s}{C}   (1.19)

In Fig. 1.13 (a), the input current I to the postsynaptic neuron i is due to the spikes of the presynaptic neurons j. In the simplest model of a synapse, each presynaptic spike arrival evokes a postsynaptic current with a standard time course α. The total input to neuron i is denoted as

I_i = \sum_{j,f} w_{ij} \, \alpha(t - t_j^f)   (1.20)

where the sum runs over all firing times t_j^f of all presynaptic neurons. The factor wij is the synaptic efficacy of the connection from a presynaptic neuron j to a postsynaptic neuron i. In Fig. 1.13 (b), the neuronal population is shown, and the membrane potential for neuron i in Fig.
1.13 (c) is denoted as [23]

\frac{dv_{mi}}{dt} = -\frac{v_{mi}}{\tau} + \sum_{j=1}^{N} \sum_{f} w_{ij} \, \delta(t - t_j^f) + x_i(t)   (1.21)

where vmi is the membrane voltage of neuron i, wij is the weight from neuron j to neuron i, which can be positive or negative depending on the excitatory or inhibitory connection of the presynapse, t_j^f is a firing time of neuron j, δ(t − t_j^f) are the presynaptic pulses, τ is the neuronal time constant, and xi(t) is an external exciting potential to the neuron cell. Different mathematical models have been introduced and implemented in hardware to simulate the dynamic characteristics of the IF model [43, 42]. In this work, the simple mathematical equation (1.19) is used to represent the IF model, since it implements the integration and threshold operations.

In analog circuits, the first analog IF model was introduced by Carver Mead in 1989 as a core for a dense layout in a VLSI neuromorphic chip. The designed circuit, shown in Fig. 1.14, is simple and easy to implement in hardware [1]. It consists of an integrating capacitor (representing the membrane capacitance of a neuron), a simple high-gain amplifier (a double inverter) that switches when the integrated input current exceeds its threshold voltage, and a feedback capacitor that adds extra charge to the membrane capacitor to stabilize the firing state. The reset is achieved through a leakage that is turned on while the output is high; that leakage current determines the length of the output pulse. When the output voltage is high, it resets the membrane voltage, and the output pulse returns to its starting point. At this point, the input current charges the membrane capacitor until the input voltage of the amplifier reaches the threshold level, at which it turns the output Vout to logic high. The feedback capacitor Cfb and the membrane capacitor Cm act here as a capacitive voltage divider, which changes the timing duration of the pulses and adjusts the change in slope of the membrane voltage Vmem, which in turn decides the output pulse state Vout of the amplifier.

Figure 1.14: Conceptual schematic of a circuit that generates pulses due to positive feedback through the capacitor Cfb across a high-gain buffer, with ΔVmem = Cfb·Vdd/(Cm + Cfb). The lower transistor adjusts the pulse width during the discharge phase of the capacitors.

Conceptually, this model can be considered in two ways. The first is as an analog-to-analog converter that transforms an analog current into an analog time interval; the pulse frequency is proportional to the analog input current. One advantage of this representation in the time domain is that one can convey analog data with a digital signal, for example for transmission over a digital wireless link. The second view is essentially the same but under a different name: the IF model can be viewed as an analog-to-digital converter, where the output pulses of the network faithfully encode different features of the input analog sensory stimuli into a time-encoded signal or into a train of spikes [47, 48]. The IF circuit is asynchronous, meaning it has no clock, in contrast to conventional ADCs. The asynchronous property leads to a significantly lower output data rate and lower power consumption compared to conventional ADCs [49]. In this work, as explored in chapter 2, a denser IF circuit is designed to implement the threshold operation. The designed circuit has only one small feedback capacitor equal to 200 fF.
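As a numerical companion to equation 1.19, the following minimal Python sketch simulates an idealized discrete-time IF neuron (the values of ts, C and the threshold are arbitrary placeholders, and the linear rate-versus-current behavior of this idealized model does not capture the logarithmic transfer function of the fabricated circuit):

    def integrate_and_fire(currents, ts=1e-3, C=1e-9, v_th=1.0):
        # Discrete-time IF neuron of equation 1.19: the membrane voltage grows
        # by I[n]*ts/C per step; a spike is emitted when it crosses v_th and
        # the membrane is reset to its initial value.
        v, spikes = 0.0, []
        for I_n in currents:
            v += I_n * ts / C
            if v >= v_th:
                spikes.append(1)
                v = 0.0          # reset of the membrane after firing
            else:
                spikes.append(0)
        return spikes

    out = integrate_and_fire([0.2e-6] * 1000)   # constant 0.2 uA input current
    print(sum(out) / len(out))                  # firing rate 0.2 pulses per step

Doubling the input current halves the time to threshold, which is the analog-to-time conversion view described above.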
1.5 Tasks and Methodology

The methodology for achieving the proposed pulse computational model and its hardware implementation is to design the analog circuit of the "IF" unit as the core unit of the computational model, verify its dynamic characteristics and convergence property in the Cadence tool, and then apply it in a novel design to implement spike time dependency. The "IF" unit is then simulated in Matlab to verify and analyze the main concept of the "IFMP algorithm". For the hardware implementation of the IFMP algorithm, the simulated "IFMP" is mapped to and verified in an analog circuit. The convergence of the "IF" unit and of the "IFMP" algorithm are proven mathematically. In order to show that the IFMP algorithm is applicable to computations and applications, IFMP structures are simulated in Matlab to verify the basic computations (addition, multiplication, power, inner product and polynomial) and applications (winner-take-all, decoding and classification). An array of IFMP units is fabricated on a chip using 0.5 µm CMOS circuitry to verify the concept in hardware. The tasks to achieve the proposed computational model include the following:

1. Design the analog circuit of the integrate and fire structure (IF), which is the basic unit of the pulse computational algorithm IFMP. The analog IF design implements the integration and threshold operations, which refer to integrating the currents caused by the input pulses and then comparing the voltage produced by the integration with a certain threshold to produce output pulses. The first step in a successful IF design is to verify the logarithmic transfer function of the IF, i.e., the output pulse rate versus the input current. The second step is to verify the convergence characteristic of the spiking IF structure, which implies that the expected value of a train of input pulses converges to the expected value of the output pulses. Here the expected value of pulses is defined as the average of the pulse levels over a time period.

2. Verify the convergence of one IF structure for more than one input, in which the sum of the expected values of the input pulses converges to the expected value of the output pulses. If the input pulses are denoted as Li, then E(L1) + E(L2) + ... + E(Ln) → E(output), where i is the index of the input pulses, E is the expected (average) value of the pulses, and n is the number of inputs.

3. Verify the concept of the IF structure as a computational unit in its own right in spike time dependency plasticity, which is the learning mechanism of the synaptic junctions of neurons.

4. Simulate steps 1, 2 and 3 in Matlab and verify the concept of approximating the log-sum-exp by the margin propagation algorithm in pulse mode. The technique is implemented using a network of IF structures (IFMP) and simulated to verify the convergence of the margin equation for all values of the adjusting parameter γ.

5. Derive and simulate the difference between the IFMP algorithm and the log-sum-exp function.

6. Using MOSFET transistors, design, simulate and verify the dynamic characteristics of an analog circuit implementation of the basic pulse computational algorithm IFMP.

7. Prove mathematically the convergence of one IF structure and of the IFMP algorithm. Specifically, it is shown by induction that the convergence is valid if the integration of the step voltage is bounded between two limits.
8. Prove and simulate the concept of cascading multiple two-input IFMP structures, which is equivalent to applying the inputs in parallel to one IFMP. The importance here is to build scalable arrays of IFMP units on silicon in order to implement computational functions.

9. Simulate in Matlab and in circuits the basic computation of nonlinear functions such as multiplication, power, polynomial and inner product, by mapping the functions onto the margin propagation method.

10. Simulate in Matlab and in circuits the applications of sequence and trace detection using the hidden Markov model (HMM), by mapping the HMM algorithm onto margin propagation.

11. Simulate in Matlab and in circuits binary classification using a support vector machine (SVM), by mapping the SVM algorithm onto margin propagation.

12. Show the significance of the IFMP algorithm in dynamic range capability compared with stochastic pulse computation, using the HMM decoding algorithm.

13. Show the significance of the IFMP algorithm in easily implementing a spiking winner-take-all network, compared with other architectures.

14. Design an array of analog IFMPs in a 0.5 µm CMOS process for testing and verifying the dynamic characteristics of a single IFMP.

15. Design an array of analog IFMPs in a 0.5 µm CMOS process for testing and verifying the simulation procedure of the HMM.

1.6 Thesis organization

The thesis is organized as follows. Chapter 2 introduces the circuit diagram of an integrate and fire model, the convergence performance of one neuron, and an application of a single integrate and fire neuron model. In the application, a floating gate transistor is used as a storage device to modify the weight in a novel architecture for synapse plasticity (modification of the weight strength between synapses). However, that design is limited to a specific task; the goal here is to build a scalable computational unit that can be plugged into many applications. Therefore, the main topic of the thesis is introduced in chapter 3: a scalable computational unit, the integrate-fire structure with the margin propagation method (IFMP). The chapter includes a proof of the convergence of the dynamic performance, the IFMP parameters and cascading, the properties of MP, and examples of basic computations using IFMP. Chapter 4 introduces three examples of the proposed pulse computation algorithm implementing non-linear functions of signal processing algorithms. As circuit simulations and a layout design on a chip, the first example is a novel implementation of the hidden Markov model (HMM) algorithm to recognize sequences in pulse based computation. The first example is based on discrete probability observations, while the second example is based on continuous probability observations. The third example is an implementation of a support vector machine to classify two sets of data. Chapter 5 explores the significant roles of the IFMP algorithmic module in the research space; in other words, how the IFMP modules can enhance performance and add specific properties to applications. Finally, chapter 6 concludes the thesis with a summary and recommendations for future work.

Chapter 2

Integrate-Fire Model and Synaptic Plasticity

A new design of the integrate and fire (IF) model is introduced with two concepts:

1. First concept: The generated pulses are proportional to the current injected into the neuron.

2. Second concept: The generated pulses are proportional to the input pulses, which is referred to as a convergence status.
Convergence status means that the output pulse rate converges in value to the input pulse rate. It is shown in this chapter that the first concept of the IF structure has computational capability in changing the synapse weight, yet limited capability in implementing other functions. Therefore, the second concept of the IF model is introduced, in the next chapter, as the core of a scalable pulse computational algorithmic module, which is the main topic of this thesis. In section 2.1, the new design of the IF model and the proof of the convergence status are discussed. In section 2.2, the first concept of the IF model is used to change the synapse weight in a mechanism called spike time dependency plasticity (STDP). Section 2.3 is the summary of the chapter.

2.1 Pulse based IF

The concept of the integrate and fire (IF) neuron model is to integrate the injected currents, changing the membrane voltage until it reaches a threshold; the output of the IF model then fires and resets the membrane voltage to its initial value. The membrane voltage starts changing again according to the values of the injected currents. This concept is translated into an analog circuit using an integrator and an inverter as a comparator circuit to compare the membrane voltage Vm with the threshold value (Fig. 2.1-b).

Figure 2.1: (a): Integrate and fire schematic circuit. (b): Integrator and cascoded inverter. (c): Transfer function of the IF unit. (d): The output voltage Vout, the membrane voltage Vm and the input node 'in' of the integrator.

The lower boundary is adjusted through the biasing voltages Vb1 and Vb2. The upper bound is equal to Vdd − 2Veff, where Vdd = 3.3 V and Veff ≈ 0.5 V (note that Veff = Vgs − Vth). The biases V1, V2 and V3 are used to keep the integrator in the high-gain region. In this circuit, the biasing is adjusted so that the integration is bounded between 2.3 V and 0.9 V. Here, the lower bound is the threshold value and the upper limit is the reset value of the membrane. In other words, the integration of currents causes the membrane voltage to decrease from its restart value of 2.3 V until it reaches the lower bound, the threshold (0.9 V). Initially, if the input of the integrator is zero, the outputs of the integrator and of the cascoded inverter are 3.3 V and zero volts respectively. If the input voltage increases and reaches the high-gain region of the integrator amplifier, the integration phase starts to build up, which is the discharging phase of the feedback capacitor. When the output voltage of the integrator Vm reaches the lower bound, the output of the inverter Vout turns to logic one (3.3 V). At this point, the logic-one output of the inverter resets the output voltage of the integrator to the upper bound (the charging phase of the capacitor) through the feedback to the NMOS shown in Fig. 2.1(a). The cycle of charging and discharging the feedback capacitor C repeats according to the amount of current injected into the input ('in' node) of the integrator, as shown in Fig. 2.1(d). The currents injected into the integrator are applied, respectively, during the off and on states of the input pulses L for the excitatory path (PMOS transistors) and through the feedback (output) pulses in the inhibitory path (NMOS transistors). Fig. 2.1(c) shows the relation between the injected current and the output rate.
For a single neuron, it has been observed that the relation between external stimulus and neural response is approximately logarithmic [50, 51], and this behavior has been noticed in the designed IF model too. This means that the addition of two neural signals performs multiplication of the input signals. For example, if x1 is the input to neuron A, x2 is the input to neuron B, and the response of a neuron is taken to be exactly logarithmic, then the sum of the outputs of neurons A and B is a multiplicative relation in the log domain: log(x1) + log(x2) = log(x1 x2). Thus multiplication and division can be implemented by adding and subtracting the outputs of neurons with a logarithmic transfer function [29]. The importance of the above is to show the computational capability of the IF model to encode two separate inputs into one output, which is a detection process (i.e., detecting the occurrence of temporally simultaneous yet spatially separate input signals).

Next, the concept of the IF structure's pulse response with respect to pulse rate is introduced and analyzed. The change in the current applied to the integrator occurs by applying a stream of input pulses to the gates of the NMOS and PMOS switches at the input paths of the integrator. The change in the input rate is caused by changing the stream of input pulses L. The circuit is designed to fulfill the convergence property for the pulse rate. It is important to set the parameters of Fig. 2.1(a) (sampling time of integration, capacitance, currents) such that the expected rate of the output pulses converges to the expected rate of the input pulses. In order to do that, the currents of both the excitatory and inhibitory inputs are set to the same value. The convergence of the IF means that the expected value of the output spikes d[n] is equal to the expected value of the input spikes L[n] over all samples:

E_n\{L[n]\} = E_n\{d[n]\}   (2.1)

To prove the convergence of the IF unit, the equation for the change in the step voltage shown in Fig. 2.1(d) can be written as

V[n] = V[n-1] - \frac{t_s I}{C} \, (L[n] - d[n])   (2.2)

where V is the membrane voltage, n is the time index, C is the capacitance that represents the membrane, ts is the sampling time, I is the biasing current in the NMOS and PMOS paths, L[n] is the Bernoulli random input variable, and d[n] is the output spike taking values {0, 1} according to the threshold voltage Vth and the membrane voltage V, where

d[n] = 0.5 \, [\text{sgn}(V_{th} - V[n]) + 1]   (2.3)

Figure 2.2: IF model rate convergence: the step voltage is V[n-1] − V[n] = (I·ts/C)(L[n] − d[n]), and in one neuron E{d[n]} → E{L[n]}.

Then it must be proven that when the membrane voltage V[n] is bounded, the expected value of the output d[n] is equal to the expected value of the input L[n]: if |V[n]| ≤ c, then E_n{L[n]} = E_n{d[n]}, where c is a constant.
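Before the formal proof, the convergence claim of equations 2.1-2.3 can be checked numerically. A minimal Python sketch (the step size alpha, threshold and stream length are arbitrary placeholders; the update order follows equation 2.10, where d[n] is computed from V[n-1]):

    import random

    def if_rate_convergence(p_in, n=200000, alpha=0.05, v_th=0.0, seed=0):
        # Equations 2.2 and 2.10: d[n] is computed from V[n-1], then
        # V[n] = V[n-1] - alpha*(L[n] - d[n]) with alpha = ts*I/C.
        rng = random.Random(seed)
        V, fired = 0.0, 0
        for _ in range(n):
            L = 1 if rng.random() < p_in else 0
            d = 1 if V < v_th else 0   # threshold: the sgn form of equation 2.3
            V -= alpha * (L - d)       # input pulses push V down, output pulses push it back
            fired += d
        return fired / n

    print(if_rate_convergence(0.3))    # output rate converges toward 0.3

Because the feedback keeps V oscillating in a small band around the threshold, the boundedness assumption of the proof holds, and the output rate tracks the input rate.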
To deduce the expectation relation between input and output, write the recursion of equation 2.2 for the membrane potential and sum the terms, letting α = ts I / C:

V[n] = V[n-1] - \alpha (L[n] - d[n])   (2.4)

V[n-1] = V[n-2] - \alpha (L[n-1] - d[n-1])   (2.5)

V[n-2] = V[n-3] - \alpha (L[n-2] - d[n-2])   (2.6)

down to

V[1] = V[0] - \alpha (L[1] - d[1])   (2.7)

When these equations are added, divided by the number of samples N, and the limit N → ∞ is taken on both sides, the resulting equation is

\lim_{N \to \infty} \frac{V[n]}{N} = \lim_{N \to \infty} \frac{V[0]}{N} - \alpha \lim_{N \to \infty} \left( \frac{1}{N} \sum_{n=1}^{N} L[n] - \frac{1}{N} \sum_{n=1}^{N} d[n] \right)   (2.8)

If |V[n]| is bounded (|V[n]| ≤ constant), then equation 2.8 reduces to

\lim_{N \to \infty} \frac{1}{N} \sum_{n=1}^{N} L[n] = \lim_{N \to \infty} \frac{1}{N} \sum_{n=1}^{N} d[n]   (2.9)

Now, return to equation 2.4, V[n] = V[n-1] − α(L[n] − d[n]), where the output d[n] is equal to

d[n] = 0.5 \, [\text{sgn}(V_{th} - V[n-1]) + 1]   (2.10)

Let V'[n] = Vth − V[n] and V'[n-1] = Vth − V[n-1]. Substituting these into equations 2.4 and 2.10 gives

d[n] = 0.5 \, [\text{sgn}(V'[n-1]) + 1]   (2.11)

V'[n] = V'[n-1] + \alpha (L[n] - d[n])   (2.12)

Now let

d'[n] = \text{sgn}(V'[n-1])   (2.13)

d[n] = 0.5 \, [d'[n] + 1]   (2.14)

Substituting equation 2.13 into 2.11,

V'[n] = V'[n-1] + \alpha (L[n] - 0.5 \, d'[n] - 0.5)   (2.15)

and substituting equation 2.14 into 2.12,

2V'[n] = 2V'[n-1] + \alpha (2L[n] - 1 - d'[n])   (2.16)

Now let

L'[n] = 2L[n] - 1, \quad V''[n] = 2V'[n], \quad V''[n-1] = 2V'[n-1]   (2.17)

so that

V''[n] = V''[n-1] + \alpha (L'[n] - d'[n])   (2.18)

Now substitute 2.13 and 2.17 to find d'[n], such that d'[n] = sgn(V''[n-1]) as well:

V''[n] = V''[n-1] + \alpha (L'[n] - \text{sgn}(V''[n-1]))

If V''[n-1] > 0, then V''[n] = V''[n-1] + α(L'[n] − 1) and V''[n] ≤ V''[n-1]; so if V''[n-1] ≤ c, then V''[n] ≤ V''[n-1] ≤ c. If V''[n-1] < 0, then V''[n] = V''[n-1] + α(L'[n] + 1) and V''[n] ≥ V''[n-1]; so if V''[n-1] ≥ −c, then V''[n] ≥ −c. Therefore V''[n] is bounded, |V''[n]| ≤ c, and hence

E_n\{L[n]\} = E_n\{d[n]\}

2.2 Spike time dependency plasticity

In this section, the designed spiking IF neuron model is used as a computational model in its own right, to change the weights of synapses. The mechanism of changing the strength is called spike time dependency plasticity (STDP); it is implemented by changing the amount of current injected into the neuron depending on the time difference between the pre and post pulses of the neurons. The change of the injected current depends on the previous weight of the synapse, which must be stored as a reference. For that purpose, the weight change is implemented with a storage mechanism using floating-gate MOSFETs. The concept of synapse plasticity and a novel mechanism for changing the plasticity using floating-gate transistors are discussed next. STDP is a process that adjusts the strength of the connections between neurons in the brain. The adjustment of the connection strength is based on the relative timing of a particular neuron's output and input action potentials (spikes). The STDP process is a tentative candidate for a hypothesis that partially explains the development of an individual's brain, especially with regard to long-term potentiation and long-term depression [52, 53]. While different models have been reported for describing STDP, all of them agree on the causal relationship between the pre- and post-synaptic spikes.
The increase or decrease in the strength of the synapse (W) depends on whether the pre-synaptic spike arrives before or after the post-synaptic spike, and on the time duration between the pre- and post-synaptic spikes. If the pre-synaptic spike was generated at time tj and the post-synaptic spike was generated at time ti, then one form of STDP can be mathematically expressed as

\Delta W_{ij}(\Delta t) \propto \text{sgn}(\Delta t) \cdot A \cdot e^{-|\Delta t|/\tau}   (2.19)

where Δt = tj − ti, sgn(·) denotes the sign of Δt, and the parameters A and τ are the amplitude and the time constant of the synapse potential respectively. Depending on the retention time of the synapse, an increase in the weight value is referred to as short-term potentiation (STP) or long-term potentiation (LTP); a decrease of the weight value is referred to as short-term depression (STD) or long-term depression (LTD). Figs. 2.3 and 2.6 (a) show the behavior of the weight change as a function of the time difference between the pre and the post pulses.

Figure 2.3: The STDP modification function: the synaptic change versus ∆T, where ∆T = tpre − tpost.

In this section, a novel synapse circuit is shown that changes the strength of the synaptic connectivity using a nonvolatile memory of the floating gate type. Before introducing the synapse circuit, an overview of the floating gate is given below. The floating-gate MOSFET (FGMOS) is a field-effect transistor whose structure is similar to a conventional MOSFET. The FGMOS is used as a storage element in digital EPROM and flash memories, and as an analog element in neuronal computational networks [54]. The gate of the FGMOS is electrically isolated, creating a floating node. A simplified implementation of the conventional FG memory cell is shown in Fig. 2.4, where the gate of a PMOS transistor M1 is used for storing charge. Because the gate is completely insulated by high-quality silicon dioxide, any charge trapped on the gate is retained for a long period of time (8-bit retention accuracy for about 8 years) [55, 56]. The charge on the floating gate can be modified either by inducing a large electric field at the drain of the transistor for hot-electron injection, or by Fowler-Nordheim (FN) tunneling of electrons through a parasitic capacitor Ctun [57]. The total amount of charge on the floating gate and the magnitude of the applied control gate voltage Vcg determine the floating-gate voltage Vfg. The drain current of M1, which is a function of Vfg, can be copied and used for biasing or for auxiliary analog signal processing functions (shown in Fig. 2.4).

Figure 2.4: Floating gate memory, showing the injection and tunneling configurations.

Usually, Fowler-Nordheim tunneling and hot-carrier injection mechanisms are used to modify the amount of charge stored on the FG [58]. The charge on the floating gate is modified using a hot-electron injection voltage (4.2 V in a 0.5 µm process) and a tunneling voltage (15 V in a 0.5 µm process). Since the first reported floating-gate structure in 1967 [59], floating-gate transistors have been widely used to store digital information for long periods in structures such as EPROMs and EEPROMs. Recently, floating-gate devices have found applications as analog memories, as analog and digital circuit elements, and as adaptive processing elements in neuronal computational networks.
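The weight-update rule of equation 2.19 can be sketched directly in Python (the amplitude A and time constant τ below are arbitrary placeholders, and the sign convention follows the equation as written; which branch maps to LTP and which to LTD is set by the definition of Δt in Fig. 2.3):

    import math

    def stdp_delta_w(t_pre, t_post, A=1.0, tau=20e-3):
        # Equation 2.19: dW ~ sgn(dt) * A * exp(-|dt|/tau), dt = t_pre - t_post.
        # The magnitude of the weight change decays exponentially with |dt|;
        # one sign of dt potentiates the synapse, the other depresses it.
        dt = t_pre - t_post
        if dt == 0:
            return 0.0
        return math.copysign(A * math.exp(-abs(dt) / tau), dt)

    # Nearly coincident spikes change the weight strongly; distant spikes barely do.
    print(stdp_delta_w(0.010, 0.012), stdp_delta_w(0.010, 0.100))

In the floating-gate circuit described next, this update is not computed digitally; it emerges from the overlap of the shaped pre and post pulses, which gates injection or tunneling onto the floating node.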
In neuronal networks, the rule of choice for changing the weight during learning is the STDP rule. Recent implementations of STDP learning in VLSI spiking networks demonstrate the use of these networks in classification and computational tasks [60, 61, 62]. However, the slowest weight update rate and time constant in most of these implementations are limited by the transistor leakage currents of the chosen fabrication process. To obtain longer update time constants for the synaptic weights, the use of floating-gate technology to update the weight is described as another implementation of the STDP learning rule. The synaptic weight is set by the current flowing through a pFET transistor whose gate is a floating node. The charge on the floating node is removed or added using tunneling and injection. There are two main reasons to use floating gates in the STDP learning mechanism in neuronal networks: the storage facility for the weight, and good energy efficiency [53, 54, 57, 63, 64, 65].

Toward the above goals, a synapse design is suggested for the STDP learning rule. The circuit uses the floating gate as a storage device for the weight. The adjustment of the weight in the designed circuit is implemented differently, in that the injection and tunneling processes change the current into the IF neuron discussed earlier. The adjustment of the weight is implemented instantaneously by applying the following pulses: the pre synapse pulse, the post synapse pulse, the pre shape pulse and the post shape pulse. The pre synapse and post synapse pulses are the input and output pulses of a neuron respectively. The pre shape and post shape pulses impose a delay on the incoming signals in order to allow weight modification before the occurrence of these pulses is lost. The details of the designed circuit are listed below, with the circuit description and the simulation result of each part. The synapse design is fabricated on a 1.5 mm × 1.5 mm chip as an array of 12 synapses × 8 neurons, and also on a 3 mm × 3 mm chip as an array of 25 synapses × 30 neurons. The connectivity between the neurons on the fabricated chip can be implemented using a look-up table and a state machine program configured in a field-programmable gate array (FPGA). The role of the FPGA is to process the output of the chip and route the pulses between the rows and columns of synapses, which are connected directly as a column vector to the neurons. Fig. 2.5 shows the architecture for STDP learning, which includes the following units:

1. Synapse unit (S): It represents the junction between neurons. The inputs to the synapse unit are the pre pulse signal, the post pulse signal, the pre shape signal and the post shape signal. The output of the synapse is the current applied to the neuron; this current either increases or decreases the pulse rate of the IF neuron. Note that more than one synapse is connected to the input of a neuron. The pulses are applied in time multiplexing using a high-speed FPGA, so that the pulses behave as if they were generated simultaneously. This is the concept of distributed parallel processing used in neuromorphic design.

2. Neuron unit (N): It represents the IF unit discussed in the previous section.
3. Post shape pulse unit: It receives the neuron output pulses and introduces a delay so that the synapse unit keeps track of the timing sequence between the post pulse and the pre pulse. This process is necessary for the tunneling phase in the FG, because tunneling happens when the post pulse comes before the pre pulse. The post pulse therefore has the chance to persist longer than the pulse itself, so that the voltage difference between the pre pulse and the post shape signal is high enough for tunneling. The duration of the tunneling phase and the amount of the tunneling voltage decide the increment of the floating gate voltage, which decreases the current applied to the neuron and eventually decreases the pulse rate. The change of the weight depends on the time difference between tpre and tpost, shown in the LTD part of Fig. 2.6 (a). In the tunneling mode of the floating gate, the voltage VT required to transfer the electron out of the gate is greater than 15 V. The post shape circuit, its layout and a simulation of the pulse shaping are shown in Fig. 2.7.

Figure 2.5: STDP learning architecture showing the connectivity between the synapses, neurons, post shape, pre shape and shift register modules; pre synapse pulses arrive from the FPGA and post synapse pulses return to the FPGA.

Figure 2.6: (a): Learning curve of STDP as the change in the weight versus the time interval between tpre and tpost. (b): The synapse circuit, showing the injection and tunneling needed in the floating gate to adjust the current in the pMOS transistors M1 and M2. (c): Timing sequence for the injection and tunneling in the synapse floating gate of (b). (d): The layout of the synapse of (b).

4. Row selector: It represents the shift register that enables and selects the rows to input the pre pulse from the FPGA into the pre shape stage.

5. Pre shape pulse unit: It first includes a level shifter to raise the voltage level to a required level (from 3.3 V to 6 V) to provide the voltages required for injection and tunneling of electrons; in a 0.5 µm process, hot-electron injection requires a voltage difference of 4.2 V between the source and drain. This unit also introduces a delay to the pre pulse in order to keep track of the timing sequence between the pre pulse and the post pulse. This process is necessary for the injection phase in the FG, because injection happens when the pre pulse comes before the post pulse. The pre pulse therefore has the chance to persist longer than the pulse itself, so that the voltage difference between the pre pulse and the post shape signal is high enough for injection. The duration of the injection phase and the amount of the injection voltage decide the decrement of the floating gate voltage, which increases the current applied to the neuron and eventually increases the pulse rate. The change of the weight also depends on the time difference between tpre and tpost in the potentiation of the weight (the "pre before post" part of Fig. 2.6 (a)). The pre shape circuit, its layout and a simulation of the pulse shaping are shown in Fig. 2.8.
An example of a circuit simulation of the STDP design is shown in Fig. 2.9. In this experiment, ten synapse circuits are connected to one IF neuron model. The simulation result shows that the floating gate voltage (vfg) decreases whenever there is a pre pulse, at which point the integration step occurs. Notice that as vfg decreases further, the current through the PMOS transistors M1 and M2 in Fig. 2.6 (b) increases. Thus the integrated current in the IF capacitor increases and causes the membrane voltage of the integrator to change faster. The membrane potential therefore reaches the threshold level earlier and generates the output pulse earlier as well. Hence the time between the output pulses generated by the IF decreases (shorter intervals between the post pulses); in other words, the pulse rate of the IF increases. The figure shows two time intervals: in the first, 70 ms, tpre − tpost is smaller than in the second interval, 100 ms. This means that the integration occurs earlier in the first case, so the time to reach the spiking threshold is shorter and the rate is faster. Note that the first case has almost simultaneous "pre shape" and "post" pulses, which realizes the injection mode of the floating gate, while in the second interval the post pulse comes before the pre pulse, which realizes the tunneling mode of the floating gate. The integration on the membrane capacitor vmem is shown correspondingly in the same figure.

Figure 2.7: Post pulse shape circuit: schematic, layout and transient response.

Figure 2.8: Pre pulse shape circuit: schematic, layout and transient response.

Figure 2.9: Transient simulation of the STDP circuit.

The layouts of the fabricated chips are shown in Fig. 2.10 for 12 synapses × 8 neurons and for 25 synapses × 30 neurons. The above implementation shows the computational power of the IF-synapse module in the learning field at the level of one neuron. The synapse unit and the IF structure modify the weight by finding the time difference between the post and pre pulses, decide the change of the weight, which is translated into a certain amount of injected current, and modify the membrane potential in the IF unit. However, the computational complexity increases in a population of the neuronal network. In the next chapter, a network of IF units is designed as a basic novel pulse based computational unit to encode, compute, recognize input events, and map and implement non-linear functions.

2.3 Summary

In this chapter, a new model of the integrate and fire neuron was designed. The model was applied to change the synapse weight using a storage transistor (floating gate) in a novel synapse circuit. The example discussed in the chapter shows the computational capability of the integrate and fire model. Yet the integrate and fire model is limited in computing other functions. Therefore, the integrate and fire model is reconsidered to verify the convergence capability, in order to use this model as the core of the basic pulse computational algorithmic model designed in this thesis.
The convergence capability of the integrate and fire neuron means that the expected rate of the output pulses approaches the expected rate of the input pulses. The applied pulses are stochastic, and the input/output parameters are quantified by the average number of ones over the time period.

Figure 2.10: Layout of the STDP architecture for 12 synapses × 8 neurons and for 25 synapses × 30 neurons. The two chips are identical but differ in density.

Chapter 3

IFMP

In this chapter, a scalable pulse computational module using the Integrate-Fire structure and Margin Propagation (IFMP), the main topic of this thesis, is discussed in detail. Section 3.1 includes a description of the dynamic performance, the analysis of the different parameters of the module and the cascading concept of the IFMP. Section 3.2 includes the proofs and simulations of the margin propagation properties. Section 3.3 demonstrates examples of basic computations using IFMP, including the addition, multiplication, division, power, polynomial, and inner product functions. Section 3.4 shows the analog circuit of the IFMP module and the measurement tests of the IFMP chip. The final section summarizes the chapter.

3.1 IFMP: concepts and analysis

The strategy of processing information and the connectivity between neurons in a population of neurons in the brain is not yet known. There are different types of connectivity schemes between neurons in populations of spiking neural networks. Some of these neurons act as excitatory and others as inhibitory neurons. If excitatory or inhibitory neurons are connected to a particular neuron in the population through a synapse, then the spike rate of that neuron will increase or decrease accordingly. The proposed model is inspired by the neuronal computation of the receptive field in the biological sensory system. In the receptive field, the afferent neurons are excited by the external stimuli. The excited spot of the input stimulus causes a high spike rate in the afferent neurons of the receptive field. These neurons inhibit the surrounding afferent neurons through inhibitory synapses in order to maximize their effect and transmit the spikes to the higher neuronal level, as shown in Fig. 3.1 [5]. This observation matches the log-sum-exp function discussed earlier, which approximately selects the maximum input value.

Figure 3.1: Organization of a generic neurobiological sensory system; image source: Wikipedia and Ref. [5, 66].

It is well known that the simplest form of neural coding is rate-based encoding [11], which computes the instantaneous spiking rate Ri(t) of the i-th neuron according to

R_i(t) = \frac{1}{T} \int_{t}^{t+T} \rho_i(\tau) \, d\tau   (3.1)
In this proposal, an architecture is presented that uses the integrate-and-fire structure to encode analog input currents passing through switches controlled by a stream of input pulses. These neuronal structures are connected with lateral inhibition, which prevents them from spiking simultaneously and has the effect of distributing their sampling uniformly in time. The new structure has N feedback inhibition connections, which is fewer than the full connectivity of N(N-1) inhibition connections used, for example, in [68]. Fig. 3.2 shows a schematic diagram of the proposed spiking network. The network is referred to as an IFMP model and consists of three integrate-and-fire structures N1, N2 and N3. The excitatory (inhibitory) inputs are represented by black (white) triangles, and only modules N1 and N2 have self-inhibitory feedback connections. Given the rates of the input spike trains L_1[n] and L_2[n], with n being a discrete time index, it can be shown that the firing rate of the output L_z[n] (denoted by E(L_z) = \lim_{T\to\infty}\frac{1}{T}\sum_{n=1}^{T}L_z[n]) asymptotically satisfies

[E(L_1[n]) - E(L_z[n])]_+ + [E(L_2[n]) - E(L_z[n])]_+ \longrightarrow E(\gamma[n]),   (3.3)

and, in general, for P inputs,

\sum_{i=1}^{P}[E(L_i[n]) - E(L_z[n])]_+ \longrightarrow E(\gamma[n]).   (3.4)

Figure 3.2: Schematic of the proposed IFMP unit comprising three integrate-and-fire modules; each module contains an integrator, a threshold [.]_+ and a feedback current. N1 and N2 compute [L_1 - z]_+ and [L_2 - z]_+, and N3 is constrained by the inhibitory rate γ.

Note that equation 3.4 converges only in probability. The difference between the left- and right-hand sides of the equation decreases as time increases (that is, as the number n of samples in the random input stream increases). Hence the summation of the expected values of the input streams converges to the expected value of the output stream. Fig. 3.3(a) shows the absolute error bars versus the number of iterations of the input stream, where the error is calculated as the difference between the theoretical and the simulated output rate. The figure shows that the absolute error bars decrease as the iteration count increases from 100 to 30,000. The nine bars at each iteration count represent the error when the rate of one input varies over [0.1:0.9]; the bound on the absolute error is less than 0.003. Fig. 3.3(b) zooms in on iterations from 3,000 to 30,000 and shows how significantly the error decreases.

Figure 3.3: (a) Error bars between the theoretical and simulated rates for iteration counts from 100 to 30,000. (b) Error bars for iteration counts from 3,000 to 30,000.

Fig. 3.4 shows the plot of the instantaneous spiking rates of N1, N2 and N3 when the input rates are varied as shown in Fig. 3.4 (bottom). In this experiment, γ = 0.3 and the input rate is L_2 = 0.5 for N2, while the input rate L_1 for N1 increases from 0 to 1. The dynamics in the figure follow the IFMP equation 3.4, i.e. [L_1 - z]_+ + [L_2 - z]_+ = γ.
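Before walking through the example of Fig. 3.4, the behavioral sketch below (a simplified software model with unit thresholds and Bernoulli streams, not the transistor-level circuit of Section 3.4) simulates the three integrate-and-fire structures of Fig. 3.2. The bounded membrane of N3 acts as the constraint integrator, so the output rate is expected to approach the margin-propagation solution of Eq. (3.4) approximately; the exact accuracy depends on modeling details (bounds, reset behavior) that are illustrative assumptions here.

import numpy as np

def ifmp_rate(L_rates, gamma, T=200_000, seed=1):
    L_rates = np.asarray(L_rates, dtype=float)
    rng = np.random.default_rng(seed)
    P = len(L_rates)
    V = np.zeros(P)          # membranes of N1..NP (floored to limit divergence)
    V3, z = 0.0, 0           # bounded membrane of N3 and its output pulse
    z_sum = 0
    for n in range(T):
        L = rng.random(P) < L_rates      # input pulses L_i[n]
        g = rng.random() < gamma         # constraint pulses gamma[n]
        fired = 0
        for i in range(P):
            V[i] = max(V[i] + L[i] - z, -2.0)   # excitatory L_i, inhibitory z
            if V[i] >= 1.0:                     # threshold: emit T_i and reset
                V[i] -= 1.0
                fired += 1
        V3 = min(max(V3 + fired - g, 0.0), 2.0) # bounded constraint integrator
        z = 1 if V3 >= 1.0 else 0               # N3 output pulse
        z_sum += z
    return z_sum / T

# For L = (0.7, 0.5) and gamma = 0.3, the solution of
# [0.7 - z]+ + [0.5 - z]+ = 0.3 is z = 0.45, so the returned
# rate is expected to hover near 0.45.
print(ifmp_rate([0.7, 0.5], 0.3))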
Initially, when L_1 is between 0 and 0.3, the output rates of N1, N2 and N3 are 0, 0.3 and 0.2 respectively. When L_1 is 0.35, the output rates of units N1, N2 and N3 are 0.05, 0.25 and 0.25 respectively. When L_1 is 0.6, the output rates of units N1, N2 and N3 are 0.2, 0.1 and 0.4 respectively, and so on: the sum of the output rates of the first two IF units converges at all times to a rate equal to γ, so the dynamics of the IFMP satisfy equation 3.4, as shown in Fig. 3.4.

Figure 3.4: Spike rates of neurons N1, N2 and N3 when the spike rate of L_1 is monotonically increased. For interpretation of the references to color in this figure and all other figures, the reader is referred to the electronic version of this thesis.

To prove the convergence of equation 3.4, it is necessary to first prove the convergence of each neuron and then the convergence of the IFMP network. The convergence of one neuron was proven in chapter 2. The convergence equation for the IFMP in Fig. 3.2 is

\sum_{i=1}^{P}[E(L_i[n]) - E(L_z[n])]_+ \longrightarrow E(\gamma[n]), \quad\text{or}\quad E_n\{T_1[n]\} + E_n\{T_2[n]\} = E\{\gamma[n]\},   (3.5)

where T_1, T_2 and z denote the output spike trains of structures N1, N2 and N3 respectively, and γ is the inhibitory input rate of unit N3. The equations for the membrane voltages of neurons 1, 2 and 3 can be written as

V_1[n] = V_1[n-1] - \alpha(L_1[n] - T_1[n] - z[n])   (3.6)

V_2[n] = V_2[n-1] - \alpha(L_2[n] - T_2[n] - z[n])   (3.7)

V_3[n] = V_3[n-1] - \alpha(T_1[n] + T_2[n] - \gamma[n])   (3.8)

The membrane potential V_3[n] in the IFMP is designed to be bounded between the threshold voltage and the restart (initial) voltage. The convergence proof proceeds by writing the membrane equation for N1 recursively, starting from the first recursion:

V_1[n-1] = V_1[n-2] - \alpha(L_1[n-1] - T_1[n-1] - z[n-1])   (3.9)

V_1[n-2] = V_1[n-3] - \alpha(L_1[n-2] - T_1[n-2] - z[n-2])   (3.10)

Writing the recursive equations for the previous time steps eventually yields the initial recursion

V_1[1] = V_1[0] - \alpha(L_1[1] - T_1[1] - z[1]).   (3.11)

Adding all the above recursive equations and dividing by N gives

\frac{V_1[n]}{N} = \frac{V_1[0]}{N} - \alpha\Big(\frac{1}{N}\sum_{i=1}^{N}L_1[i] - \frac{1}{N}\sum_{i=1}^{N}T_1[i] - \frac{1}{N}\sum_{i=1}^{N}z[i]\Big).   (3.12)

Since |V_1[n]| is bounded, taking the limit N \to \infty of the above equation gives

E_n\{L_1[n]\} - E_n\{z[n]\} = E_n\{T_1[n]\}.   (3.13)

Similarly, for N2,

E_n\{L_2[n]\} - E_n\{z[n]\} = E_n\{T_2[n]\}.   (3.14)

For N3, the membrane potential is

V_3[n] = V_3[n-1] - \alpha(T_1[n] + T_2[n] - \gamma[n]).   (3.15)

Since V_3 is bounded, the expected values satisfy

E_n\{T_1[n]\} + E_n\{T_2[n]\} = E_n\{\gamma[n]\}.   (3.16)

Substituting equations (3.13) and (3.14) into (3.16), the convergence equation in expectation is

[E_n\{L_1[n]\} - E_n\{z[n]\}] + [E_n\{L_2[n]\} - E_n\{z[n]\}] = E_n\{\gamma[n]\}.   (3.17)

Figure 3.5: Dynamic characteristics of the IFMP unit for different values of γ, for MP and IFMP (L_2 = 0.5, γ = 0.01:0.05:0.56).

Fig. 3.5 shows the instantaneous spiking rate of N3 when the input rate L_1 is varied from 0.01 to 0.99. For this result, the spiking rate of input L_2 is kept constant at 0.5 while γ changes over 0.01:0.05:0.56. The plot shows that the spiking rate of N3 increases according to a piecewise-linear approximation of the margin propagation function. It is proven in [40] that margin propagation (MP) is an approximation of the log-sum-exp.
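The value that the IFMP rate converges to, z = M(L, γ), can also be computed deterministically as the root of \sum_i [L_i - z]_+ = γ by a water-filling pass over the sorted scores. The helper below is a software sketch used for checking the pulse-mode results (the function name margin is ours, not the thesis's).

def margin(L, gamma):
    """Deterministic solve of sum_i [L_i - z]+ = gamma (water-filling)."""
    Ls = sorted(L, reverse=True)
    csum = 0.0
    for k, Lk in enumerate(Ls, start=1):
        csum += Lk
        z = (csum - gamma) / k          # candidate using the top-k scores
        next_L = Ls[k] if k < len(Ls) else float("-inf")
        if next_L <= z:                 # remaining scores fall below z
            return z

print(margin([0.7, 0.5], 0.3))   # 0.45, consistent with Fig. 3.4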
But how good is the approximation, what parameters are involved in it, and how successful is the cascading of the approximated model? To answer these questions, consider the following. Let z_{MP} be the output pulse rate of an IFMP unit with m inputs L_1, L_2, ..., L_m, and let z_{LSE} = \log(\sum_{i}^{m} e^{L_i}) be the solution to

\sum_{i}^{m} e^{[L_i - z_{LSE}]} = 1,   (3.18)

where z_{LSE}: \mathbb{R}^m \to \mathbb{R}. Then, using the fact that e^{L_i - z_{LSE}} \ge [1 + L_i - z_{LSE}]_+,

\sum_{i}^{m}[1 + L_i - z_{LSE}]_+ \ge 1.   (3.19)

To make the normalization ideal, the above inequality is equated to one:

\sum_{i}^{m}[1 + L_i - z_{LSE}]_+ = 1   (3.20)

\sum_{i}^{m}[L_i - z_{MP}]_+ = 1,   (3.21)

where z_{MP} is the MP approximation of the log-sum-exp with z_{MP} = z_{LSE} - 1. If the normalization factor changes to z_{LSE}/\gamma = \log(\sum_{i}^{m} e^{L_i/\gamma}) as a solution to

\sum_{i}^{m} e^{[L_i - z_{LSE}]/\gamma} = 1,   (3.22)

then MP is approximated as follows (see Fig. 3.7-a):

\sum_{i}^{m}[1 + (L_i - z_{LSE})/\gamma]_+ \ge 1   (3.23)

\sum_{i}^{m}[\gamma + L_i - z_{LSE}]_+ \ge \gamma   (3.24)

\sum_{i}^{m}[\gamma + L_i - z_{LSE}]_+ = \gamma   (3.25)

\sum_{i}^{m}[L_i - z_{MP}]_+ = \gamma,   (3.26)

such that z_{MP} = z_{LSE} - \gamma. Fig. 3.6 shows this approximation, which is equal to γ between z_{LSE} and z_{MP} = z_{IFMP}.

The MP formulation can be mapped onto a cascaded topology by rewriting the LSE equation z_{LSE} = \log(\sum_{i}^{n} e^{L_i}) in recursive form:

z_{LSE}[n] = \log\Big(\sum_{i}^{n-1} e^{L_i} + e^{L_n}\Big)   (3.27)

z_{LSE}[n] = \log\Big(e^{\log(\sum_{i}^{n-1} e^{L_i})} + e^{L_n}\Big)   (3.28)

z_{LSE}[n] = \log\big(e^{z_{LSE}[n-1]} + e^{L_n}\big)   (3.29)

The above is the equation for cascading the MP units. The margin of the first unit (Fig. 3.7-b) is

[L_1 - z_{MP1}]_+ + [L_2 - z_{MP1}]_+ = \gamma.   (3.30)

Figure 3.6: Approximation of the rate z among the log-sum-exp rate (z_{LSE}), the margin propagation rate (z_{MP}) and the integrate-and-fire margin propagation rate (z_{IFMP}), for L_2 = 0.1 and γ = 0.1.

Figure 3.7: (a) Block diagram showing the approximation between LSE and IFMP. (b) Cascading of the IFMP structure.

Let L_2 = -\infty for the sake of clarity, to show the cascading property. It was shown earlier that z_{LSE} = z_{MP} + \gamma; then, for the second block,

[z_{LSE1} - z_{MP2}]_+ + [L_2 - z_{MP2}]_+ = \gamma   (3.31)

[z_{LSE1} - (z_{LSE2} - \gamma)]_+ + [L_2 - (z_{LSE2} - \gamma)]_+ = \gamma   (3.32)

[z_{LSE1} - z_{LSE2} + \gamma]_+ + [L_2 - z_{LSE2} + \gamma]_+ = \gamma   (3.33)

Denoting z_{LSE}[n] and z_{LSE}[n-1] by z_m[n] and z_m[n-1], therefore

[z_m[n-1] - z_m[n] + \gamma]_+ + [L_n - z_m[n] + \gamma]_+ = \gamma.   (3.34)

Equation 3.34 can now be implemented using only two-input IFMP (2-IFMP) units, which can be extended to implement a complex spiking neural network as shown in Fig. 3.7(b). An array of 2-IFMP units is integrated on silicon, and the connectivity of the array is potentially achieved using an FPGA. The FPGA could also implement the synapses by storing the synaptic weights in a high-density digital memory. Fig. 3.8 shows the response of five inputs for the log-sum-exp function, which is nearly identical to that of the cascaded margin in series, the cascaded margin in parallel, the cascaded pulsed margin (IFMP) in series and the cascaded pulsed margin (IFMP) in parallel. In [40], it is shown that the approximation between log-sum-exp and MP improves as the number of inputs increases. In this work, the relation between the γ constraint and the number of inputs is derived for an exact match.
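Before deriving that relation, the recursion (3.29) and its MP counterpart (3.34) can be sketched in software as below (reusing margin() from the earlier sketch; lse() and cascaded_mp() are hypothetical helper names added here). The cascade tracks the log-sum-exp only up to the γ-dependent offset discussed above, so the printed values agree in trend rather than exactly (cf. Fig. 3.8).

import math

def lse(L):
    # exact log-sum-exp, computed recursively as in Eqs. (3.27)-(3.29)
    z = L[0]
    for Ln in L[1:]:
        z = math.log(math.exp(z) + math.exp(Ln))
    return z

def cascaded_mp(L, gamma):
    # chain of two-input MP stages implementing Eq. (3.34)
    z = margin([L[0], L[1]], gamma)
    for Ln in L[2:]:
        z = margin([z + gamma, Ln + gamma], gamma)
    return z

L = [0.1, 0.35, 0.6, 0.2, 0.45]
print(lse(L))                 # exact LSE
print(cascaded_mp(L, 0.1))    # piecewise-linear MP cascade
print(margin(L, 0.1))         # single multi-input MP unit, for comparison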
Let L_1, L_2, ..., L_n be the input scores to the log-sum-exp formula, and let these inputs be of equal value. Then

z_{LSE} = \log(e^{L_1} + e^{L_2} + \cdots + e^{L_n})   (3.35)

Figure 3.8: Implementation results of the pulsed serial-cascaded and parallel IFMP approximations compared to the log-sum-exp function and the real-valued serial and parallel MP. The y-axis represents the output rate z for the different cases shown in the legend, and the x-axis represents the rate of one input, the other inputs being kept at constant rates.

z_{LSE} = \log(n\,e^{L})   (3.36)

For MP with the same assumption on the input scores, and assuming the inputs are greater than z_{MP} (see Fig. 3.8),

n[L - z_{MP}]_+ = \gamma   (3.37)

n[L - z_{LSE} + \gamma]_+ = \gamma   (3.38)

Substituting equation 3.36 into the above equation gives

nL - n\log(n\,e^{L}) + n\gamma = \gamma.   (3.39)

The γ needed for exact matching therefore depends on the number of input scores n as

\gamma = -n\log(n)/(1 - n).   (3.40)

For two different input scores L_1 and L_2, the value of γ that matches the exact log-sum-exp value z_{LSE} to the approximated value z_{MP} is obtained by substituting into the log-sum-exp and margin equations,

z_{LSE} = \log(e^{L_1} + e^{L_2})   (3.41)

[L_1 - z_{MP}]_+ + [L_2 - z_{MP}]_+ = \gamma.   (3.42)

Substituting z_{MP} = z_{LSE} - \gamma into the margin equation gives

[L_1 - z_{LSE} + \gamma]_+ + [L_2 - z_{LSE} + \gamma]_+ = \gamma.   (3.43)

If both input scores are greater than z_{MP}, then the value of γ is

\gamma = 2z_{LSE} - L_1 - L_2.   (3.44)

In general, for n input scores L > z_{MP} and m input scores L < z_{MP}, γ is calculated for the best match between the exact log-sum-exp and the approximated margin propagation values as follows. The log-sum-exp value z_{LSE} is

z_{LSE} = \log\Big(\sum_{i=1}^{N} e^{L_i}\Big),   (3.45)

where N = n + m, with n and m the numbers of inputs greater and less than z_{MP} respectively. For the n inputs above the threshold, the margin equation becomes

\sum_{i=1}^{n} L_i - n z_{LSE} + n\gamma = \gamma   (3.46)

z_{LSE} = \frac{\sum_{i=1}^{n} L_i + (n-1)\gamma}{n}.   (3.47)

Equating equations 3.45 and 3.47 gives

\log\Big(\sum_{i=1}^{N} e^{L_i}\Big) = \frac{\sum_{i=1}^{n} L_i + (n-1)\gamma}{n}   (3.48)

\gamma = \frac{n\,z_{LSE} - \sum_{i=1}^{n} L_i}{n-1}.   (3.49)

The above derivations show the convergence and approximation characteristics of the IFMP with respect to the constraint γ. In the next section, the properties of the IFMP are presented with proofs and simulation plots.
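The exact-match conditions (3.40) and (3.44) can be spot-checked numerically. The sketch below reuses margin() and lse() from the earlier sketches and assumes, as the derivation does, that all scores end up above the MP threshold; with γ chosen this way, z_{MP} = z_{LSE} - γ holds exactly.

import math

# Equal scores: gamma = -n*log(n)/(1 - n)   (Eq. 3.40)
n, Lval = 4, 0.3
g_eq = -n * math.log(n) / (1 - n)
print(margin([Lval] * n, g_eq) + g_eq, lse([Lval] * n))   # both ~1.6863

# Two distinct scores: gamma = 2*z_LSE - L1 - L2   (Eq. 3.44)
L1, L2 = 0.5, 0.45
g2 = 2 * lse([L1, L2]) - L1 - L2
print(margin([L1, L2], g2) + g2, lse([L1, L2]))           # both ~1.1685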
3.2 IFMP properties

In [69], the properties and proofs were given for analog margin propagation. In this work, the same properties are tested in pulse computation mode. Experiments are implemented for both the analog and the pulse-based margin propagation and show the similarity between the two modes. The properties and proofs from [69] are listed below. Since the margin approximation uses the threshold [x]_+ = \max(x, 0), two lemmas are first listed as facts to be used in proving some of the properties.

Lemma 1: \forall a, b \in \mathbb{R}, [a]_+ - [b]_+ \le [a - b]_+.

Lemma 2: \forall a, b \in \mathbb{R}, [a]_+ + [b]_+ \ge [a + b]_+.

Property 1 (P1, Scaling Property): For any \alpha \in \mathbb{R}, \alpha > 0, and a set of scores L = \{L_i\}, i = 1,...,N, M(\alpha L, \alpha\gamma) = \alpha M(L, \gamma), where \alpha L = \{\alpha L_i\}, i = 1,...,N. The proof is simple: if the condition \sum_{i=1}^{N}[L_i - z]_+ = \gamma is satisfied, then the condition \sum_{i=1}^{N}[\alpha L_i - \alpha z]_+ = \alpha\gamma is also satisfied. This property indicates that the threshold z scales with the scaling of the log-likelihood scores and the parameter γ.

Figure 3.9: Scaling property P1 for spiking IFMP compared to the continuous values of margin propagation (\alpha = 2.4, L_2 = 0.2).

Property 2 (P2, Monotonicity): Given a set of scores L = \{L_i\}, i = 1,...,N, if \gamma_1 \ge \gamma_2 \ge 0 then M(L, \gamma_1) \le M(L, \gamma_2). One important implication of monotonicity is the asymptotic property when \gamma \to 0, given by \lim_{\gamma\to 0} M(L, \gamma) = \max(L).

Proof: Given

\sum_{i=1}^{N}[L_i - z_1]_+ = \gamma_1, \qquad \sum_{i=1}^{N}[L_i - z_2]_+ = \gamma_2, \qquad \gamma_1 \ge \gamma_2 \ge 0,   (3.50)

it must be shown that z_1 \le z_2. Subtracting the two equations in (3.50) gives

\sum_{i=1}^{N}\big([L_i - z_1]_+ - [L_i - z_2]_+\big) = \gamma_1 - \gamma_2 \ge 0.   (3.51)

Based on Lemma 1, \sum_{i=1}^{N}[z_2 - z_1]_+ \ge \sum_{i=1}^{N}\big([L_i - z_1]_+ - [L_i - z_2]_+\big); combined with (3.51), \sum_{i=1}^{N}[z_2 - z_1]_+ \ge \gamma_1 - \gamma_2 \ge 0. Thus z_1 \le z_2.

Figure 3.10: Monotonicity property P2 for spiking IFMP compared to the continuous values of margin propagation (L_2 = 0.2).

Property 3 (P3, Convexity Property): Given a set of coefficients \{\alpha_k\} satisfying 0 \le \alpha_k \le 1 and \sum_k \alpha_k = 1, and a set of hyper-parameters \{\gamma_k\}, M(L, \sum_k \alpha_k\gamma_k) \ge \sum_k \alpha_k M(L, \gamma_k).

Proof: Given the same set of scores L_i and a set of \gamma_k, k = 1,...,n,

\sum_{i=1}^{N}[L_i - z_1]_+ = \gamma_1, \;\ldots,\; \sum_{i=1}^{N}[L_i - z_n]_+ = \gamma_n.   (3.52)

Based on Property 1, the equations in (3.52) transform into \sum_{i=1}^{N}[\alpha_k L_i - \alpha_k z_k]_+ = \alpha_k\gamma_k for k = 1,...,n. Adding up these equations and, according to Lemma 2,

\sum_{i=1}^{N}\Big[L_i - \sum_{k=1}^{n}\alpha_k z_k\Big]_+ \ge \sum_{k=1}^{n}\alpha_k\gamma_k.   (3.53)

Let z satisfy \sum_{i=1}^{N}[L_i - z]_+ = \sum_{k=1}^{n}\alpha_k\gamma_k. Based on (3.53) and the monotonicity of the margin function, z \ge \sum_{k=1}^{n}\alpha_k z_k. Therefore M(L, \sum_k \alpha_k\gamma_k) \ge \sum_k \alpha_k M(L, \gamma_k).

Figure 3.11: Convexity property P3 for spiking IFMP compared to the continuous values of margin propagation (L_2 = 0.6, γ = 0.3:0.1:0.6, \alpha = 0.1:0.1:0.4).
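As quick numerical spot-checks of P1 and P2 (again reusing the margin() helper from the earlier sketch; the values are arbitrary illustrations, not the thesis's test points):

L, gamma, alpha = [0.3, 0.25, 0.2], 0.3, 2.4

# P1 (scaling): alpha * M(L, gamma) == M(alpha*L, alpha*gamma)
print(alpha * margin(L, gamma))
print(margin([alpha * x for x in L], alpha * gamma))

# P2 (monotonicity): gamma1 >= gamma2 implies M(L, gamma1) <= M(L, gamma2)
print(margin(L, 0.6) <= margin(L, 0.3))   # True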
Property 4 (P4, Superposition Property): Given two sets of scores L and G of size N with a well-defined ordering, and with L + G representing element-by-element scalar addition, M(L + G, \gamma) \le M(L, \gamma) + M(G, \gamma).

Proof: Given

\sum_{i=1}^{N}[L_i - z_1]_+ = \gamma, \qquad \sum_{i=1}^{N}[g_i - z_2]_+ = \gamma,   (3.54)

what needs to be proven is: if

\sum_{i=1}^{N}[L_i + g_i - z_3]_+ = \gamma,   (3.55)

then z_3 \le z_1 + z_2. All the L_i are sorted and rearranged in decreasing order so that L_1 \ge L_2 \ge \cdots \ge L_N, and N_1 denotes the number of L_i above the threshold z_1. The same is done for the g_i, sorted decreasingly as g_1 \ge g_2 \ge \cdots \ge g_N, with N_2 the number of g_i above the threshold z_2. Equation (3.54) is rewritten as

\sum_{i=1}^{N_1} L_i - N_1 z_1 = \gamma, \qquad \sum_{i=1}^{N_2} g_i - N_2 z_2 = \gamma.   (3.56)

Figure 3.12: Superposition property P4 for spiking IFMP compared to the continuous values of margin propagation (L_2 = 0.3, L_3 = 0.3, L_4 = 0.2).

Assume N_2 \ge N_1; then, from the latter equation in (3.56),

0 \le \sum_{i=1}^{N_1} g_i - N_1 z_2 \le \gamma.   (3.57)

Thus, by adding the two equations in (3.56),

\gamma \le \sum_{i=1}^{N_1} L_i + \sum_{i=1}^{N_1} g_i - N_1 z_1 - N_1 z_2 \le 2\gamma.   (3.58)

Equation (3.58) can be transformed into \sum_{i=1}^{N_1}[L_i + g_i - (z_1 + z_2)] \ge \gamma, from which it can be deduced that

\sum_{i=1}^{N}[L_i + g_i - (z_1 + z_2)]_+ \ge \gamma.   (3.59)

Given (3.55) and (3.59), based on Property 2 it can be deduced that z_3 \le z_1 + z_2, and vice versa if N_1 \ge N_2 is assumed. Written as an MP function, this is M(L + G, \gamma) \le M(L, \gamma) + M(G, \gamma).

Property 5 (P5, Offset Property): Given a set of scores L of size N and a scalar g \in \mathbb{R}, M(L + g, \gamma) = M(L, \gamma) + g. Property 5 implies that a constant offset to all the elements of the input set leads to an equivalent offset in the output of the margin approximation function.

Proof: Based on the proof of Property 4, given \sum_{i=1}^{N}[L_i - z]_+ = \gamma, if G is a scalar set with each g_i = g, g \in \mathbb{R}, equation (3.55) can be rewritten as \sum_{i=1}^{N}[L_i + g - (z + g)]_+ = \gamma. Written in MP function format, this becomes M(L + g, \gamma) = M(L, \gamma) + g.

Figure 3.13: Offset property P5 for spiking IFMP compared to the continuous values of margin propagation.

The computational properties above apply successfully to spiking networks based on the proposed IFMP model. Figs. 3.9, 3.10, 3.11, 3.12 and 3.13 show, respectively, the close match between the approximated continuous MP and the pulse-mode margin propagation (IFMP) for all of the above properties. Because of the limitation of representing probabilities within [0,1], the above properties, especially the scaling and offset properties, are used to map the input parameters when approximating the nonlinear computational functions. In the next section, examples of nonlinear functions approximated using IFMP pulse computations are given.

3.3 Examples of basic computations using IFMP

Before applying the model to algorithmic applications, it is necessary to show how the IFMP unit is used to approximate basic computations such as addition, subtraction, multiplication, division, power, inner product and polynomial evaluation. The inputs to these units are probabilities and must be converted into logarithmic format. The resulting negative values are then scaled using the scaling and offset properties discussed above, in order to keep the input values between zero and one. To retrieve the correct computational values, the output of the computational function is scaled back in the reverse order of the input scaling, as shown in Fig. 3.14. The basic computations are implemented using log-sum-exp, analog margin propagation and pulsed margin propagation as approximated operations, and the error of each computation is shown graphically for different values of the input probability rates.

Figure 3.14: Input/output stages of the pulsed computational unit: inputs are converted to the log domain and scaled (digital), processed by the pulsed IFMP module (analog/digital), and the output z is scaled back and exponentiated to give F = e^z (digital).

Addition: Let f = a + b = e^{\log(a+b)}; then the exponent z can be represented in log-sum-exp and margin format as follows (see Fig. 3.15-a):

z = \log(f) = \log(e^{\log(a)} + e^{\log(b)})   (3.60)

z = \log(e^{L_a} + e^{L_b})   (3.61)

z = M(L_a, L_b, \gamma),   (3.62)

where L stands for the log function.
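The full pipeline of Fig. 3.14 for the addition mapping (3.60)-(3.62) can be sketched in software as below (reusing margin() and lse() from the earlier sketches; the offset value and the exact-match γ are illustrative choices, not circuit parameters).

import math

a, b = 0.3, 0.4
La, Lb = math.log(a), math.log(b)      # log-domain scores (negative)
offset = 1.0 - max(La, Lb)             # P5: shift scores toward [0, 1]
gamma = 2 * lse([La, Lb]) - La - Lb    # exact-match gamma (Eq. 3.44)
z = margin([La + offset, Lb + offset], gamma) - offset
print(math.exp(z + gamma))             # ~0.7 = a + b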
Multiplication: Let f = a \cdot b = e^{\log(a\cdot b)}; then the exponent z can be represented in log-sum-exp and margin format as follows (see Fig. 3.15-b):

z = \log(e^{L_a + L_b})   (3.63)

z = M(L_a + L_b, \gamma)   (3.64)

Division: Let f = a/b and c = 1/b, so that f = a \cdot c. This can then be represented as a multiplication operation in IFMP form:

z = M(L_a + L_c, \gamma)   (3.65)

Figure 3.15: Basic computational architectures using IFMP: (a) addition, (b) multiplication, (c) power, (d) subtraction, (e) polynomial, (f) inner product.

Figure 3.16: Computational functions for addition using IFMP (software), MP, LSE and IFMP (circuit), for op2 = 0.3. The right panel shows the error rate between the LSE computation and IFMP software (Error 1) and IFMP circuit (Error 2).

Power: Let f = a_1 a_2 a_3 \cdots = e^{\log(a_1 a_2 a_3 \cdots)}; then the exponent z can be represented in log-sum-exp and margin format as follows (see Fig. 3.15-c):

z = \log(e^{L_{a_1} + L_{a_2} + \cdots})   (3.66)

z = M(L_{a_1} + L_{a_2} + \cdots, \gamma)   (3.67)

Subtraction: For the subtraction function, the differential form is used: f = a - b can be written as f = (a^+ - a^-) - (b^+ - b^-) = (a^+ + b^-) - (b^+ + a^-), i.e.

f = e^{\log(a^+ + b^-)} - e^{\log(b^+ + a^-)}.   (3.68)

The two exponents z^+ and z^- are then evaluated using two IFMPs, as shown in Fig. 3.15(d):

z^+ = M(L_{a^+}, L_{b^-}, \gamma)   (3.69)

z^- = M(L_{b^+}, L_{a^-}, \gamma)   (3.70)

Polynomial: Let f = a_0 + a_1 b_1 + a_2 b_2^2 + a_3 b_3^3; then the exponent z can be represented in log-sum-exp and margin format as follows (see Fig. 3.15-e):

z = \log\big(e^{L_{a_0}} + e^{L_{a_1} + L_{b_1}} + e^{L_{a_2} + 2L_{b_2}} + \cdots\big)   (3.71)

z = M(L_{a_0},\; L_{a_1} + L_{b_1},\; L_{a_2} + 2L_{b_2},\; \ldots,\; \gamma)   (3.72)

Inner product: Let f = a_0 b_0 + a_1 b_1 + a_2 b_2 + \cdots; then the exponent z can be represented in log-sum-exp and margin format as follows (see Fig. 3.15-f):

z = \log\big(e^{L_{a_0} + L_{b_0}} + e^{L_{a_1} + L_{b_1}} + e^{L_{a_2} + L_{b_2}} + \cdots\big)   (3.73)

z = M(L_{a_0} + L_{b_0},\; L_{a_1} + L_{b_1},\; L_{a_2} + L_{b_2},\; \ldots,\; \gamma)   (3.74)

Figure 3.17: Computational functions for multiplication using IFMP (software), MP, LSE and IFMP (circuit), for op2 = 0.3. The right panel shows the error between the LSE computation and IFMP software (Error 1) and IFMP circuit (Error 2).

The above computations are simulated using log-sum-exp (LSE), margin propagation with real values, and pulsed IFMP. The left parts of Figs. 3.16, 3.17, 3.18, 3.19 and 3.20 show the computational functions for addition, multiplication, power, inner product and polynomial for the pulse-based IFMP software simulation, the continuous-valued MP, the LSE and the pulse-based hybrid IFMP circuit simulation, while the right parts show the error between the LSE and the pulse-based IFMP values for both the software and the hybrid circuit simulations.

Figure 3.18: Computational functions for power using IFMP (software), MP, LSE and IFMP (circuit). The right panel shows the error between the LSE computation and IFMP software (Error 1) and IFMP circuit (Error 2).
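The differential subtraction of Eqs. (3.68)-(3.70) can be sketched as two MP units followed by a subtraction of the exponentiated outputs (a software sketch reusing margin() and lse(); mp_sum and the operand split values are our illustrative choices).

import math

def mp_sum(x, y):
    """log(x + y) via one MP unit with the exact-match gamma."""
    Lx, Ly = math.log(x), math.log(y)
    g = 2 * lse([Lx, Ly]) - Lx - Ly
    return margin([Lx, Ly], g) + g

a_p, a_m = 0.5, 0.1            # a = a+ - a- = 0.4
b_p, b_m = 0.3, 0.1            # b = b+ - b- = 0.2
z_plus  = mp_sum(a_p, b_m)     # log(a+ + b-)
z_minus = mp_sum(b_p, a_m)     # log(b+ + a-)
print(math.exp(z_plus) - math.exp(z_minus))   # ~0.2 = a - b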
The absolute errors between the software simulation and the theoretical output rates for all computations are less than 0.03, where the number of iterations in these cases is 10,000. On the other hand, the circuit simulations in Cadence are run for 3,000 iterations because of the storage space allocated for research purposes; the absolute errors in the circuit-simulation cases are therefore higher, though still less than 0.07. The rates in the above experiments are chosen so that the output computational results remain within the probability limits [0:1]. If the first operand in the addition case, for example, is 0.3, then the second operand in the experiment lies in [0:0.6]. In the multiplication and power cases, the computational results have no conflict with the probability limit. As the experimental procedure for the multiplication operation, one operand is set to 0.3 and the other varies over [0.1:0.9]. For the power function, the operand range in the experiment is [0.3:0.7]. For the inner-product function, there are four operands as discussed in the mapping above, (op1, op2, op3, op4); three of them are fixed at 0.3, 0.4 and 0.5 respectively, and the fourth operand varies over [0.1:0.8]. For the polynomial function, there are three operands as discussed in the mapping above, (op1, op2, op3); two of them are fixed at 0.3 and 0.4 and the third varies over [0.1:0.6].

Figure 3.19: Computational functions for the inner product using IFMP (software), MP, LSE and IFMP (circuit), for op2 = 0.3, op3 = 0.4, op4 = 0.5. The right panel shows the error between the LSE computation and IFMP software (Error 1) and IFMP circuit (Error 2).

Figure 3.20: Computational functions for the polynomial using IFMP (software), MP, LSE and IFMP (circuit), for op2 = 0.3, op3 = 0.4. The right panel shows the error between the LSE computation and IFMP software (Error 1) and IFMP circuit (Error 2).

For a more comprehensive display of the pulse-based computations explained above, experiments are implemented in which both of the IFMP inputs change. For example, experiments evaluate the output rates of the pulse-based IFMP circuits and the theoretical modules to show the dynamic characteristics of the multiplication, polynomial and inner-product operations, as in Figures 3.21, 3.22 and 3.23 respectively.

Figure 3.21: Output rates and the error relative to the theoretical computation for the circuit simulation in (a) and the software simulation in (b), evaluating the multiplication function using pulse computation.
In the multiplication function, the ranges of the first and second operands are [0.1, 0.9]. In the polynomial function, one operand is fixed at 0.3 and the other two operands are in the ranges [0.1:0.3] and [0.1:0.6] respectively. In the inner-product function, two operands are fixed at 0.4 and 0.5 and the other two operands are in the ranges [0.1:0.6] and [0.1:0.7] respectively. The absolute errors in the above cases are less than 0.1. The errors are due either to the number of random pulses in the software and circuit simulations or to the accuracy of the balance between the excitatory and inhibitory paths of the IFMP circuit.

Figure 3.22: Output rates and the error relative to the theoretical computation for the circuit simulation in (a) and the software simulation in (b), evaluating the polynomial function using pulse computation (op3 = 0.3).

3.4 IFMP: circuit, layout and tested measurements

The analog circuit of the IFMP module of Fig. 3.24(a) is shown in Fig. 3.24(b). Each neuron in the IFMP module implements the integration and threshold operations, designed between two bounds (2.34 V, 0.9 V). Initially, if the input of the integrator is zero, the outputs of the integrator and the cascoded inverter are 3.3 V and 0 V respectively. If the input voltage increases and reaches the high-gain region of the integrator amplifier, the integration phase builds up, which corresponds to the discharging phase of the capacitor. The input current is integrated, and the output voltage of the integrator discharges to the lower bound. At this point, the output of the cascoded inverter switches to logic one, which returns the output voltage of the integrator to the upper bound (the charging phase of the capacitor). The cycle of charging and discharging the capacitor C repeats according to the amount of current injected into the integrator input (the 'in' node). The currents injected into the three integrators are applied during the off and on states of the input pulses through the excitatory path (PMOS transistors) and the inhibitory path (NMOS transistors), respectively. Modules N1 and N2 have excitatory inputs (PMOS path), one self-feedback inhibitory input (NMOS path) and one output-feedback inhibitory input (NMOS path), while module N3 has two excitatory inputs (PMOS paths) and one inhibitory input. The last inhibitory input is set by the adjustable constraint rate γ explained earlier. The values of the capacitance, current, W/L and biasing voltages in the circuit of Fig. 3.24(b) are: C = 200 fF, I = 1 pA, W/L = 6 µm / 3 µm, Vb1 = 0 V, Vb2 = 3.3 V, V3 = 0.7 V, V2 = 1.37 V, V1 = 1.4 V.
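The charge/discharge cycle described above can be sketched behaviorally (this is not a SPICE model; the time step, run length and test currents are illustrative, while the bounds and capacitor value are those quoted above). The pulse rate grows linearly with the net injected current, which is the rate-encoding mechanism of the module.

C = 200e-15                 # integration capacitor (200 fF, as above)
V_HI, V_LO = 2.34, 0.9      # integrator bounds quoted in the text
dt = 1e-4                   # simulation time step, s

def spike_rate(i_net, t_total=2.0):
    """Count reset events for a constant net injected current i_net."""
    v, spikes = V_HI, 0
    for _ in range(int(t_total / dt)):
        v -= (i_net / C) * dt    # discharging phase of the capacitor
        if v <= V_LO:            # lower bound reached: the cascoded
            v = V_HI             # inverter recharges the node; one pulse
            spikes += 1
    return spikes / t_total

for i in (1e-12, 2e-12, 4e-12):  # picoampere-level currents, as in the text
    print(f"I = {i:.0e} A -> {spike_rate(i):.1f} pulses/s")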
Figure 3.23: Output rates and the error relative to the theoretical computation for the circuit simulation in (a) and the software simulation in (b), evaluating the inner-product function using pulse computation (op3 = 0.5, op4 = 0.4).

The dynamic characteristics of the IFMP module are verified in theory and in a layout design on a standard 0.5 µm CMOS technology, shown in Fig. 3.25(b). In the experiments using the setup in Fig. 3.25(a), the chip is biased to ensure the balance between the excitatory and inhibitory parts using National Instruments data-acquisition input/output. The biasing voltages are applied using a Matlab program that initializes the ADC and DAC converters.

Figure 3.24: (a) Two-input IFMP module. (b) Analog circuit of the IFMP module.

The USB link is supported with functions to communicate the input/output between the PC and the FPGA. The inputs are applied as random pulses with specified rates, generated using the Verilog hardware description language (Verilog HDL). The HDL program is converted into a bit file and reconfigured into hardware components on the FPGA using the Xilinx Integrated Software Environment (ISE 9.2) software.

Figure 3.25: (a) Interfacing boards for testing the IFMP chip. The PC and FPGA are used to generate random pulses and to retrieve the output pulses from the IFMP circuit. The National Instruments interfacing card generates the biasing voltages for the transistors of the IFMP and triggers the stack memory to pull the stored pulses to the PC. (b) The IFMP chip, which includes an 8×8 array of IFMP units and decoders.

Experiments are implemented, shown in Figures 3.26, 3.27 and 3.28, to test the convergence of the dynamic characteristics of the IFMP chip against the theoretical convergence rates. The figures show the measured output rates, the theoretical rates, and the difference between the two when the input rates vary between 0 and 1 for three gamma rates (0.2, 0.4 and 0.6).

Figure 3.26: Sample tests of the dynamic characteristics of the IFMP when γ = 0.2 in (a), and the difference between the measured and theoretical dynamic characteristics in (b).
The experiments are implemented for the three values of gamma, with the input rates L1 and L2 chosen in the ranges [0.2:0.6] and [0.3:0.8] respectively. The absolute errors for the three experiments are less than 0.1.

Figure 3.27: Sample tests of the dynamic characteristics of the IFMP when γ = 0.4 in (a), and the difference between the measured and theoretical dynamic characteristics in (b).

In addition to the convergence error caused by the number of iterations assigned to the stream inputs, there is another source of error in the IFMP layout. The main difficulty in the IFMP layout is maintaining the current balance between the excitatory and inhibitory paths of the IFMP circuit. The balance is achieved by fixing the biasing voltages to accurate values. In circuit simulation there is no problem in fixing these values, but a problem appears in the fabricated chip because the biasing voltages are common to all paths due to the pin limitations of the chip. This would not cause a problem if there were no mismatch in the physical behavior of the transistors on a chip, but that is not the case in the real world. Therefore, the number of pulses used to converge the input/output rates and the transistor mismatch in hardware are the main sources of error in the measured results of the IFMP simulation and circuit layout.

Figure 3.28: Sample tests of the dynamic characteristics of the IFMP when γ = 0.6 in (a), and the difference between the measured and theoretical dynamic characteristics in (b).

It is well known that low-current biasing in subthreshold is important in low-power applications, particularly in neuromorphic applications that simulate various aspects of brain function [1], [70]. Since the newly designed modules discussed above are verified in low-power circuits, which imply low current and/or voltage levels, the designed circuits are more susceptible to noise. It is therefore important to suppress the effect of noise in the design, since noise due to thermal effects cannot be avoided. In this design, the type of connectivity of the IFMP network reduces the effect of noise, because the inhibitory connections suppress the noise to maintain the desired dynamic characteristics, as discussed next. The main source of noise in the weak-inversion mode is the thermal effect. A formula for subthreshold noise in MOS transistors has been derived by Enz [71] and Vittoz [72] from the consideration that the channel of the transistor is composed of a series of resistors; the integrated thermal noise of all these resistors yields the net thermal noise of the transistor. In [73], the white noise of subthreshold MOS transistors is derived, including the thermal effect as the sum of fluctuation components from the forward and reverse currents. However, it has been shown in neuronal populations that inhibitory connections between neurons shape or reduce the noise and thus increase the signal-to-noise ratio (SNR).
The SNR is improved if either the output signal is increased or the noise is reduced. In [74], [75], the authors showed that a method called noise shaping can improve the SNR in a neuronal population. In this approach, the inhibitory couplings between neurons are used to shift the noise power from one part of the spectrum to another, thereby shaping the spectrum and, more importantly, lowering the noise at the frequencies of interest. It is therefore possible with noise shaping to suppress the noise power within the signal bandwidth and thus improve the SNR. In [76], the author shows that noise shaping reduces the effect of noise on the performance of an analog circuit design fabricated in a 0.5 µm CMOS process. In [5], the noise-shaping mechanism is discussed using the integrate-and-fire model. Consider a neuronal network consisting of N integrate-and-fire neurons. Each neuron is characterized by its intrinsic voltage v_i(t), i \in [1, N], and the neuron fires whenever v_i exceeds a threshold V_{th}. Between consecutive firings, the dynamics of the membrane potential can be expressed using the integrate-and-fire model as [74]

\frac{dv_i(t)}{dt} = -\frac{v_i(t)}{\tau_m} - \sum_{j=1}^{N} W_{ij}\exp\big(-(t - t_j^m)/\tau_s\big) + \alpha x_i(t).   (3.75)

Here, t_j^m are the firing times of the j-th neuron, and \tau_m denotes the time constant of the neuron, capturing the leaky nature of the integration through the leak term v_i(t)/\tau_m. The exponential term and the related time constant \tau_s model the pre-synaptic filter h(t) in Eq. (4.8) and the time constant of the pre-synaptic spike train. The parameter set W_{ij} denotes the synaptic weights between the i-th and j-th neurons, and also constitutes the set of learning parameters for this integrate-and-fire neural network. To show how the synaptic weights W_{ij} influence noise shaping, consider two specific cases: (a) W_{ij} = 0, implying there is no coupling between the neurons and each neuron fires independently of the others; and (b) W_{ij} = W, implying that the coupling between the neurons is inhibitory and constant. For a simple demonstration, \tau_m is set to 1 ms and N is set to 50 neurons. When the input x_i(t) is applied, a raster plot indicates the firing of the 50 neurons for both the uncoupled case and the coupled case, as shown in Fig. 3.29(a) and (b) respectively. In the raster plots, the y-axis represents the neuron N_i that fires with respect to time, where i is the neuron index. The lower parts of the raster plots present the firing times of the neurons. For the uncoupled case, the population firing shows clustered behavior, where multiple neurons can fire in close proximity, whereas for the coupled case the firing times are relatively uniform, indicating that the inhibitory coupling reduces clustering by introducing temporal correlations between the neuronal firings. In the frequency domain, these correlations shift the noise power from one part of the spectrum to another, thereby "shaping" the spectrum and, more importantly, increasing the SNR.

Figure 3.29: (a) Uncoupled connections between neurons and the corresponding raster plot. (b) Coupled connections between neurons and the corresponding raster plot, showing the uniform and non-uniform occurrence of pulses at the bottom of the raster plots.
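A minimal Euler-integration sketch of this experiment is given below (the drive level, noise amplitude, coupling strength and bin width are illustrative assumptions, not the values of [74]). Instead of the spectrum of Fig. 3.30, it compares the variance of the population spike count over coarse time bins, a time-domain proxy for in-band noise: with global inhibitory coupling, the windowed count is expected to fluctuate less relative to its mean.

import numpy as np

def binned_counts(W, N=50, T=20000, dt=1e-4, tau_m=1e-3, tau_s=2e-3,
                  drive=1100.0, bin_steps=100, seed=2):
    """Euler simulation of Eq. (3.75) with a constant inhibitory weight W;
    returns the population spike count summed over 10 ms bins."""
    rng = np.random.default_rng(seed)
    v = rng.random(N)                 # random initial membrane voltages
    syn = 0.0                         # decaying trace of recent population spikes
    counts = np.zeros(T)
    for n in range(T):
        x = 1.0 + 0.2 * rng.standard_normal(N)        # noisy common drive
        v += (-v / tau_m - W * syn + drive * x) * dt   # leaky integration
        fired = v >= 1.0              # threshold V_th = 1 (arbitrary units)
        v[fired] = 0.0                # reset after a spike
        syn += (-syn / tau_s) * dt + fired.sum()
        counts[n] = fired.sum()
    return counts.reshape(-1, bin_steps).sum(axis=1)

for W in (0.0, 5.0):                  # uncoupled vs. inhibitory coupling
    c = binned_counts(W)
    print(f"W = {W}: mean = {c.mean():.1f}, var/mean = {c.var()/c.mean():.2f}")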
Figure 3.30: Power spectrum showing that the in-band noise is suppressed for the coupled connection of neurons compared with the uncoupled connection of Fig. 3.29 (single neuron, uncoupled and coupled spectra versus frequency).

To understand the implication of the inhibitory coupling for noise shaping, a sinusoidal input at frequency f_0 = 1 kHz is applied to all the neurons, and the population firing rates are analyzed in the frequency domain using a short-time Fourier transform. Fig. 3.30 compares the power spectra of a single neuron, a neuron in an uncoupled population, and a neuron in a coupled population. The spectrum of the single neuron shows that it is unable to track the input signal, since the signal bandwidth (1 kHz) is much larger than the firing rate of the neuron, whereas for the uncoupled and coupled populations the input signal can easily be seen. For the uncoupled case the noise floor is flat, whereas for the coupled case the noise floor within the signal band is shifted to the higher frequency range. The shaping of the in-band noise floor enhances the SNR of a large network [74]. It can therefore be deduced that the inhibitory connections enhance the performance of the IFMP network in the presence of thermal noise.

3.5 Summary

To summarize this chapter, the characteristics of the newly designed pulse computational IFMP module are verified in software simulation and are mapped and verified in an analog circuit, both in simulation and as a chip layout. The results of all the above simulations confirm the convergence equation of the IFMP, verifying the dynamic characteristics and the approximation of nonlinear functions. The convergence of the IFMP module depends on three main points:

1- The first parameter is the number of iterations of random pulses per unit time representing the probability, needed to satisfy the convergence. This effect is verified in software simulation, which shows that the convergence improves as the number of random input pulses representing the rate increases above 3,000. Convergence can be achieved with fewer than 3,000 iterations (e.g., 1,000 iterations), yet these lower iteration counts are not stable when the experiments are run repeatedly. The reason is that the input pulses are random, so the number of pulses per time period is not constant when an experiment is repeated. As a result of this randomness, the chance of the membrane potential reaching the threshold and generating pulses is not fixed across repeated experiments; hence the average number of pulses per unit time might not satisfy the convergence.

2- The second parameter is the accuracy of the current balance between the excitatory and inhibitory paths of the analog IFMP module. This effect is very important in the circuit and layout design. If the biasing current, which is measured in picoamperes (20 pA), is balanced accurately, the convergence tends to be stable and close to the software simulation. It is therefore recommended, for future work, to design fixed biasing techniques to stabilize the analog circuit biasing voltages. This second parameter is the reason for the error difference between the measured and simulated results discussed in this chapter.

3- The mismatch of the transistors in the hardware implementation causes different biasing in the excitatory and inhibitory paths.
This mismatch is caused by the common biasing of all the neuron modules in the IFMP array, due to the limitation of the pins on the chip package. This last point, besides the two reasons above, is the cause of the error in the hardware implementation.

Chapter 4

IFMP Applications

This chapter shows three applications of the newly designed pulse computational architecture. The first two applications concern factor-graph algorithms for the recognition of discrete sequences and trajectory traces using the Hidden Markov Model (HMM), while the third is binary classification using a support vector machine (SVM). The purpose of these applications is to show that pulse computations are applicable to complicated dynamic algorithms. If the newly designed pulsed computational architecture is applied and verified on a silicon chip, then this unit could be the core of sensory designs for neuromorphic applications. The chapter includes an introduction to the HMM, sequence recognition with an HMM, trajectory detection with an HMM, an introduction to the SVM, binary classification using the SVM, and finally a summary of the chapter.

4.1 Introduction to the Hidden Markov Model

In this section, a brief explanation of the Markov model and the hidden Markov model algorithm is introduced. A Markov model is a probabilistic model of symbol sequences in which the probability of the current event is conditioned only on the previous event. Consider a sequence of random variables s_1, s_2, ..., s_N in which the subscript indicates the word position in a sentence. Here, each random variable is a function whose range is the vocabulary of the language. The probability of words s_1, s_2, ..., s_t according to Bayes' formula is

p(s_1 = w_1, s_2 = w_2, \ldots, s_t = w_t) = p(s_t = w_t \mid s_1 = w_1, \ldots, s_{t-1} = w_{t-1})\; p(s_1 = w_1, \ldots, s_{t-1} = w_{t-1})   (4.1)

p(s_1 = w_1, s_2 = w_2, \ldots, s_t = w_t) = p(s_t = w_t \mid s_1 = w_1, \ldots, s_{t-1} = w_{t-1})\; p(s_{t-1} = w_{t-1} \mid s_1 = w_1, \ldots, s_{t-2} = w_{t-2}) \cdots p(s_1 = w_1)   (4.2)

Under the Markov assumption, the conditional probability of the last state reduces to

p(s_t = w_t \mid s_1 = w_1, s_2 = w_2, \ldots, s_{t-1} = w_{t-1}) = p(s_t = w_t \mid s_{t-1} = w_{t-1}).   (4.3)

The Hidden Markov Model (HMM), on the other hand, is a probabilistic algorithm used to estimate the probabilities of unobserved events (in order to decode or recognize them) from the probabilities of observed events. For example, in speech recognition the observed data is the acoustic signal, the spoken sequence of words constitutes the hidden parameters, and the goal is to find the maximum probability over all possible word sequences. In handwriting recognition, the observed data is the image of the words, and the handwritten words to be decoded or recognized are the hidden parameters. The probabilities of unobserved events can be estimated using the forward recursion algorithm. The following shows how to calculate the probability using the HMM forward algorithm [77]. Given the following parameters for the HMM states s_1 : s_n in Fig. 4.1(a):

1. The initial state distribution for the first state.
2. The transition probabilities from one state to the next or to the same state: p(s_k \mid s_{k-1}).
3. The emission or observation probabilities: x_{1:k}.

Figure 4.1: (a) HMM states (s) and emission events (x), with transition probabilities a11 = 0.8, a12 = 0.2, a22 = 0.8, a23 = 0.2, a33 = 1.
(b) A three-state HMM and its trellis representation, with a four-item input sequence (1, 3, 3, 2) to be recognized.

The goal is to compute the probability of s_k given the observations x_{1:n}. According to the marginalization property, p(s_k, x_{1:k}) = \sum_{s_{k-1}=1}^{m} p(s_k, s_{k-1}, x_{1:k}); the probability is then factorized and normalized as

p(s_k, x_{1:k}) = \sum_{s_{k-1}=1}^{m} p(x_k \mid s_k, s_{k-1}, x_{1:k-1})\; p(s_k \mid s_{k-1}, x_{1:k-1})\; p(s_{k-1} \mid x_{1:k-1})\; p(x_{1:k-1}).   (4.4)

In the first term, p(x_k \mid s_k, s_{k-1}, x_{1:k-1}), x_k is independent of both s_{k-1} and x_{1:k-1}. Similarly, in the second term, p(s_k \mid s_{k-1}, x_{1:k-1}), s_k is independent of x_{1:k-1}. The product p(s_{k-1} \mid x_{1:k-1})\, p(x_{1:k-1}) is equivalent to p(s_{k-1}, x_{1:k-1}); therefore,

p(s_k, x_{1:k}) = \sum_{s_{k-1}=1}^{m} p(x_k \mid s_k)\; p(s_k \mid s_{k-1})\; p(s_{k-1}, x_{1:k-1}).   (4.5)

Let \alpha_{k-1}(s_{k-1}) = p(s_{k-1}, x_{1:k-1}) and \alpha_k(s_k) = p(s_k, x_{1:k}); then

\alpha_k(s_k) = \sum_{s_{k-1}=1}^{m} p(x_k \mid s_k)\; p(s_k \mid s_{k-1})\; \alpha_{k-1}(s_{k-1}).   (4.6)

The above is the HMM recursion, where \alpha_{k-1}(s_{k-1}) is the previous (or initial) state distribution, p(s_k \mid s_{k-1}) is the transition probability and p(x_k \mid s_k) is the emission probability. The forward algorithm thus evaluates the probabilities recursively, building each state's probability from those of the previous time step. The HMM is feasible for recognition, but it has scalability limits when large networks are implemented in analog circuits using current-mode or voltage-mode techniques [78]. In the next section, it is shown how to implement an asynchronous analog HMM circuit using log-likelihood decoding directly in the time domain with binary pulses. The core of the implementation is an integrate-and-fire margin propagation circuit that approximates a soft-max function and can be cascaded to build large, complex HMMs. The circuit-level simulation results show that the asynchronous HMM implementation is equivalent to the results obtained from a software implementation.

4.2 Sequence detection

In this application, the sequences to be detected might be slices that make up a character, letters in a certain order that make up a word, or a particular order in a DNA sequence. In such applications, a sequence of numbers is to be recognized. If a specific discrete number in a sequence is to be detected, then the required observations for that sequence are trained to yield the highest probability of occurrence for successful detection. The resulting probabilities are collected in a matrix called the observation probability matrix. The observation probability, as discussed in the previous section, is one of the main parameters of an HMM. The dimension of the observation matrix depends on the number of HMM states and on the set of numbers to be detected. For example, if the number of HMM states is three and the discrete numbers to be detected form the set {1, 2, 3, ..., 10}, then the size of the observation matrix is 3 × 10, where each row corresponds to a state and each column to a sequence number. Note that the sum of the observation probabilities in each state (row) of the matrix is equal to 1. In this section, the number of states is chosen as three, and the numbers to be detected in a sequence are chosen from the set {1, 2, 3}. Therefore, the dimension of the observation probability matrix is 3 × 3. Another parameter in HMM sequence detection is the transition probability matrix, which includes two types of probabilities: 1- the transition probability from one state to the next state.
2- the probability of remaining in the same state. Since there are three states in the HMM model, the size of the transition matrix is 3 × 3, where both the rows and the columns are indexed by the state number. The same condition on the sum of the probabilities in each row (state) of the observation matrix holds for the transition matrix too (i.e., the sum of the probabilities in a row is equal to one). Next, the HMM model for detecting a sequence of numbers using the newly designed pulse computational IFMP algorithm is discussed in detail.

A generic architecture of an HMM is similar to that of a finite-state machine with multiple states s_1, ..., s_k, where the transitions between the states occur with finite probabilities and each state generates random observations or symbols according to a well-defined probability distribution [79]. As an example, a three-state HMM is shown in Fig. 4.1(b), which labels the transition probability between states s_i and s_j. Each state is associated with a probability distribution based on which an observation is generated. If the probability distribution is continuous, the observations are elements of a real-valued vector; if it is discrete, the observations are symbols drawn from a finite alphabet. In the example, the three-state HMM shown in Fig. 4.1(b) can be mapped onto a trellis of three states to detect a four-item sequence over time, and the discrete probability distribution over three symbols is compactly represented by an observation probability matrix. An example of the observation matrix is

O = \begin{pmatrix} 0.9 & 0.1 & 0.0 \\ 0.0 & 0.2 & 0.8 \\ 0.0 & 0.9 & 0.1 \end{pmatrix}

where the element O_{j,x} represents the observation probability of generating symbol x \in \{1, 2, 3\} while being in state S_j. The transition and observation probabilities of an HMM (discrete or continuous) are typically estimated using a training procedure based on a pre-collected sequence of observations. During the decoding procedure, the trained HMM model H_1 is used to estimate the posterior probability p(x|H_1) of generating an observation sequence x of symbols from {1, 2, 3} given the HMM. In practice, the estimation of p(x|H_1) is implemented using a forward recursion algorithm, which can be expressed as

p_j[n] = \sum_{i\in C_j} p_i[n-1]\, a_{ij}\, O_{j,x[n]},   (4.7)

where p_j[n] represents the probability of being in state S_j at time instant n, a_{ij} represents the transition probability from state S_i to state S_j, O_{j,x[n]} is the emission probability of an observation x in state S_j at instant n, and C_j is the connection index set of state S_j. For example, in Fig. 4.1, the recursion for state S_2 simplifies to

p_2[n] = p_1[n-1]\, a_{12}\, O_{2,x[n]} + p_2[n-1]\, a_{22}\, O_{2,x[n]}.   (4.8)

Practical implementations of HMM decoding algorithms map equation 4.7 into the log domain so that the product of probabilities does not lead to precision or truncation errors. Defining L(p_j[n]) = \log(p_j[n]), L(a_{ij}) = \log(a_{ij}) and L(O_{j,x[n]}) = \log(O_{j,x[n]}), equation 4.7 can be expressed in the log domain as

L(p_j[n]) = \log\Big(\sum_{i\in C_j}\exp\big(L(p_i[n-1]) + L(a_{ij}) + L(O_{j,x[n]})\big)\Big).   (4.9)

Equation 4.9 involves a logarithm of a sum of exponents, abbreviated log-sum-exp, for which the piecewise-linear approximation called margin propagation was discussed earlier.
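As a numerical cross-check of the recursion (4.7), the software sketch below (not the pulse-mode implementation) uses the transition values of Fig. 4.1, the H1 observation matrix above, and an initial distribution assumed to be concentrated on s1.

import numpy as np

A = np.array([[0.8, 0.2, 0.0],    # a11, a12 (left-to-right model, Fig. 4.1)
              [0.0, 0.8, 0.2],    # a22, a23
              [0.0, 0.0, 1.0]])   # a33
O = np.array([[0.9, 0.1, 0.0],    # H1 observation matrix given above
              [0.0, 0.2, 0.8],
              [0.0, 0.9, 0.1]])

alpha = np.array([1.0, 0.0, 0.0])     # assumed: decoding starts in state s1
for x in (1, 3, 3, 2):                # observation sequence {1, 3, 3, 2}
    alpha = O[:, x - 1] * (alpha @ A) # forward recursion, Eq. (4.7)
print(alpha)      # alpha[2] ~ 0.015: likelihood of ending in state s3
# Repeating this with the H2 observation matrix of this section drives
# alpha to zero at the first symbol '3', so H2 does not recognize it.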
Given the general form of the log-sum-exp function

z_{log} = \log\Big(\sum_{i=1}^{P}\exp(L_i)\Big),   (4.10)

where L_i, i = 1 : P, represents the log-likelihood scores as in equation 4.9, margin propagation generates an approximation z \approx z_{log}, where z is computed as the solution of

\sum_{i=1}^{P}[L_i - z]_+ = \gamma,   (4.11)

where γ is an adjustable parameter chosen to improve the quality of the approximation. The operation [.]_+ denotes a threshold operation that passes its input if positive and otherwise outputs zero. Using equation 4.11 to approximate equation 4.9, the following recursion is obtained:

\sum_{i\in C_j}\big[L(p_i[n-1]) + L(a_{ij}) + L(O_{j,x[n]}) - L(p_j[n])\big]_+ = \gamma   (4.12)

A basic margin propagation (IFMP) circuit is designed for two inputs, P = 2, as indicated in equation 4.11. Each of the three states in Fig. 4.2 (lower) is an IFMP model and consists of three integrate-and-fire structures. The first structures of each state (N1, N2, N4, N5, N7, N8) have three excitatory inputs (black triangles) and two inhibitory inputs (white triangles): the self-feedback and the feedback from the output of the IFMP structure, where the output here is fed from the third structure of the local state. The third structure of each state (N3, N6, N9) has two excitatory inputs and one inhibitory input. Denoting by L_1[n] and L_2[n], with n a discrete time index, the rates of the input spike trains of the first two structures, it can be shown that the firing rate of the output L_z[n] (denoted by E(L_z) = \lim_{T\to\infty}\frac{1}{T}\sum_{n=1}^{T}L_z[n]) asymptotically satisfies

\sum_{i=1}^{P}[E(L_i[n]) - E(L_z[n])]_+ \longrightarrow E(\gamma[n]).   (4.13)

Note that equation 4.13 converges only in probability. The parameters in the above equations represent the expected values of binary (pulse) sequences. For example, a rate of L = 0.5 means that half of the equally spaced bits in a stream are equal to logic 1.

Figure 4.2: Three-state HMM cascaded using the IFMP structure to decode a sequence (lower), and the analog circuit of each cascaded state (upper). For state S1 the inputs are in1: L(O_{1,x[n]}) and in2: L(a_{11}); for S2, in3: L(O_{2,x[n]}), in4: L(a_{12}), in5: L(a_{22}); for S3, in6: L(O_{3,x[n]}), in7: L(a_{23}), in8: L(a_{33}).

The analog circuit of the IFMP module shown in Fig. 4.2 (upper), explained earlier in Chapter 3, is mapped to each state of the HMM shown in Fig. 4.2 (lower). The inputs of the analog circuit shown in the figure represent the input terms for state S2. Each of the HMM states is represented by a two-input IFMP unit. One input is fed by the log-sum of the self-feedback pulses of the state, the observation probability of the input sequence, and the self-transition probability. The second input is fed by the log-sum of the previous state's pulses, the observation probability of the input sequence, and the transition probability from the previous state to the present state. These inputs are applied to the IFMP as expected values of pulses, E(L(O_{j,x[n]})) + E(L(p_i[n-1])) + E(L(a_{ij})), where j represents the present state, i ranges over the previous and present states (giving the two terms), and x[n] is the input sequence to the HMM states. As an example of the inputs to state S2, j and i in the above expression are states 2 and 1 respectively; the two terms are then E(L(O_{2,x[n]})) + E(L(p_1[n-1])) + E(L(a_{12})) and E(L(O_{2,x[n]})) + E(L(p_2[n-1])) + E(L(a_{22})) respectively.
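A software sketch of the MP-approximated forward recursion (4.12) is shown below (reusing the margin() helper from Chapter 3; the NEG stand-in for log(0), the γ value and the three-input formulation are our simplifications, whereas the hardware realizes each state with two-input IFMP units).

import math

def forward_mp(A, O, seq, gamma=0.1, NEG=-30.0):
    # log-domain scores; NEG stands in for log(0)
    La = [[math.log(a) if a > 0 else NEG for a in row] for row in A]
    Lo = [[math.log(o) if o > 0 else NEG for o in row] for row in O]
    Lp = [0.0, NEG, NEG]              # assumed log initial distribution
    for x in seq:
        Lp = [margin([Lp[i] + La[i][j] + Lo[j][x - 1] for i in range(3)],
                     gamma)           # MP replaces log-sum-exp, Eq. (4.12)
              for j in range(3)]
    return Lp

A  = [[0.8, 0.2, 0.0], [0.0, 0.8, 0.2], [0.0, 0.0, 1.0]]
O1 = [[0.9, 0.1, 0.0], [0.0, 0.2, 0.8], [0.0, 0.9, 0.1]]
print(forward_mp(A, O1, (1, 3, 3, 2)))  # last entry approximates log p(x|H1)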
Then, the above terms are represented as $E(L(O_{2,x[n]})) + E(L(p_1[n-1])) + E(L(a_{12}))$ and $E(L(O_{2,x[n]})) + E(L(p_2[n-1])) + E(L(a_{22}))$ respectively. In the next section, the procedure and results of HMM sequence decoding are demonstrated when the proposed model in Fig. 4.2 maps the equation below, such that the expected value of the output spikes is proportional to the expected values of the input spikes:

$$\sum_{i \in C_j} [E(L(O_{j,x[n]})) + E(L(p_i[n-1])) + E(L(a_{ij})) - E(L(p_j[n]))]_+ = \gamma \qquad (4.14)$$

Procedure and results: The IFMP module discussed above represents one state machine of the three HMM states. Here, the circuit of the three-state HMM is implemented to find the output rates $p_1, p_2, p_3$ of the states $s_1, s_2, s_3$ of Fig. 4.2 (lower), which follow from the recursive equations 4.9 and 4.12 as

$$L(p_1[n]) = L\big[e^{L(O_{1,x[n]}) + L(p_0)} + e^{L(O_{1,x[n]}) + L(p_1[n-1]) + L(a_{11})}\big] \qquad (4.15)$$
$$L(p_2[n]) = L\big[e^{L(O_{2,x[n]}) + L(p_1[n-1]) + L(a_{12})} + e^{L(O_{2,x[n]}) + L(p_2[n-1]) + L(a_{22})}\big] \qquad (4.16)$$
$$L(p_3[n]) = L\big[e^{L(O_{3,x[n]}) + L(p_2[n-1]) + L(a_{23})} + e^{L(O_{3,x[n]}) + L(p_3[n-1]) + L(a_{33})}\big] \qquad (4.17)$$

The mapping of the parameters in the above equations is shown in Fig. 4.2, where $L(p_0)$ is the log of the initial probability of the first state. As the figure shows, each IFMP unit has two terms, the previous-state unit and the self-state unit, in addition to the related transition and observation probabilities of each state. Two HMMs ($H_1$ and $H_2$) are tested to observe the sequence $\{1,3,3,2\}$, where the rows of the observation probability matrix represent the states and the columns represent the sequence item as a digit $\in \{1,2,3\}$. HMM $H_1$ is trained to maximize the probability of the third state according to the trained observation. The second HMM $H_2$ has transition probabilities identical to those of $H_1$ but a different observation matrix, one that does not recognize the above sequence. The second observation matrix is

$$O = \begin{bmatrix} 0.9 & 0.1 & 0.0 \\ 0.1 & 0.9 & 0.0 \\ 0.9 & 0.1 & 0.0 \end{bmatrix}$$

For both observation matrices, the excitatory inputs corresponding to each state $S_j$ are the recursive feedback term $L(p_i[n-1])$, the transition probability score $L(a_{ij})$ and the observation probability score $L(O_{j,x[n]})$. At the end of the HMM decoding, after $N$ sequence items have been received (here $N = 4$), $p(x|H_1)$ can be estimated according to

$$\log(p(x|H_1)) \approx L(p_3[N]) \qquad (4.18)$$

which is then compared against the likelihood scores generated by other HMMs before the decision is made.

Figure 4.3: (Left): Rates of the states in Fig. 4.2 for sequence recognition with both matched and unmatched observations using the pulse-based HMM in software and schematic simulation; time scale 0.4 ms/step. (a): output rates of HMM $H_1$, software simulation. (b): output rates of HMM $H_1$, Cadence simulation. (c): output rates of HMM $H_2$, software simulation. (d): output rates of HMM $H_2$, Cadence simulation. (Right): A window of the output pulses of the three states of HMM $H_2$, showing the sparsity of pulses across state $s_1$ (rate $L_{p1}$), state $s_2$ (rate $L_{p2}$) and state $s_3$ (rate $L_{p3}$).

Fig. 4.3 shows the instantaneous firing rate computed for each of the modules $S_1, S_2, S_3$ in HMMs $H_1$ and $H_2$ when the observation sequence $x = \{1,3,3,2\}$ is received. For this particular experiment, only $H_1$ is trained to recognize this sequence (which can easily be verified by inspecting the observation matrix).
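Before turning to the measured results, the pulse-domain recursion can be made concrete in software. The following hypothetical sketch runs one decoding step of equations 4.15 to 4.17 with the log-sum-exp replaced by the `margin_prop()` solver sketched above; the connectivity list and the folding of the initial probability into the first state are illustrative assumptions.

```python
# Assumes margin_prop() from the previous sketch is in scope.

def hmm_step_mp(Lp, logA, logO_col, gamma, conn):
    # One decoding step of equation 4.14: each state j pools its connected
    # states C_j through the margin-propagation approximation.
    # Lp[i]       : current L(p_i)
    # logO_col[j] : L(O_{j,x[n]}) for the current symbol x[n]
    return [margin_prop([Lp[i] + logA[i][j] + logO_col[j] for i in conn[j]],
                        gamma)
            for j in range(len(conn))]

# Illustrative left-to-right connectivity of Fig. 4.2; state 1 keeps only
# its self term once the initial probability p0 is folded into Lp[0].
conn = [[0], [0, 1], [1, 2]]
```

Iterating `hmm_step_mp` over the four symbols of the sequence and reading off the last state's score mirrors the decision rule of equation 4.18.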
As described in equation 4.18, the final likelihood score can be estimated using $L(p_3[N])$, which is the firing rate of the third IF module of the third state. As shown in Fig. 4.3, the HMM $H_1$, which is trained to recognize the observation sequence, produces a higher probability (output $p_3$) than HMM $H_2$ in both the software and the schematic simulations. In other words, the output rate of the third state of the recognizing HMM $H_1$ is higher than that of HMM $H_2$. Fig. 4.3 (a, b, c, d) shows successively the output rates of the HMM $H_1$ states in software simulation, the $H_1$ states in Cadence simulation, the $H_2$ states in software simulation, and the $H_2$ states in Cadence simulation. In the same figure (right), a 60 ms window of the output pulses (circuit simulation) is shown for the three states of HMM $H_2$ and $H_1$.

Another experiment, implemented for the same input sequence {1,3,3,2}, shows a successful detection; its result is shown in Fig. 4.4. The figure plots a moving-average window of the state probabilities over the operation time, showing that the probability of the third state is higher in HMM $H_1$ than in HMM $H_2$. The probability of the third state is the accumulated sum of products over the observation probabilities, the transition probabilities and the iterations.

Figure 4.4: Instantaneous firing rates for neurons N1 to N3 using the IFMP network implementing HMM $H_1$ (left) and HMM $H_2$ (right) for the observation sequence {1,3,3,2}.

Figure 4.5: Interfacing boards for testing the IFMP chip. The PC and FPGA are used to generate random input pulses and to get the output pulses back from the IFMP circuit. The National Instruments data-acquisition card is used to generate the bias voltages for the transistors of the IFMP and to initiate the stack memory so that the stored pulses can be pulled back to the PC.

4.2.1 Chip simulation

For the hardware verification of the IFMP pulse-computational modules, the sequence-detection example is implemented using arrays of IFMPs on a chip. The IFMP units in the array are chosen using row and column decoders. The IFMP chip is interfaced to the FPGA and National Instruments (NI) ADC/DAC boards shown in Figure 4.5 to test and implement the HMM algorithm. The NI instrument applies the bias voltages to the IFMP chip, while the FPGA system applies inputs to the IFMP units, routes the pulses between the IFMP units, and stores the output pulses of the IFMP units for analysis and verification of the results. In this experiment, the input parameters are applied in parallel to the three-state HMM machine discussed earlier (the three IFMP modules). The inputs are generated using pseudo-random generators written in the Verilog hardware description language (Verilog HDL). While the inputs are applied in a time-sharing strategy within each input pulse period, the output pulses from the three states are taken as feedback signals according to the forward HMM equations.

The flowchart of Figure 4.6 summarizes the supporting software needed to implement applications using the FPGA board and the array of IFMP modules on a chip:

• Map the algorithmic procedure of the application into IFMP equations: scale the input parameters and decide the number of IFMP units required to implement the algorithm.
• Assign the required input/output ports for the FPGA: the ports between the IFMP chip and the FPGA, and the ports between the FPGA and the PC.
• Write a Verilog program to configure the connectivity between IFMP units through the FPGA: generate the random pulses for the parameters; decide the timing clock required to generate the random pulses (see the timing chart); apply the input pulses to all IFMPs in parallel; design a time-multiplexing mechanism to output the pulses; and decide the feedback required from the chip to modify the input pulses.
• Write an interfacing program so that the PC can communicate with the FPGA; convert the Verilog program to a bit file and download the bit file into the RAM of the FPGA to configure the connectivity between IFMP units.
• Analyze the rates generated by the IFMP stages: use the pipeline facility of the USB interface with the FPGA to save the pulses into RAM, then load these pulses from RAM into the PC using the interfacing program for quantifying the results.

Figure 4.6: Flowchart of the supporting software needed to implement applications using the FPGA board and the array of IFMP modules on a chip.

The flowchart explains the procedure needed to implement applications using the IFMP chip supported by the FPGA board, which selects the connectivity and routes the pulses between the IFMP units on the chip according to the particular application. The flowchart starts by scaling the input parameters and assigning the iteration count of the random input pulses. Then the interfacing ports are assigned between the IFMP chip and the FPGA, as well as between the FPGA and the PC. Next, a Verilog program is written to generate the pseudo-random pulses with time-share multiplexing. The time step of the generated pulses depends on the frequency at which they are applied; the step time used in the software simulation is the same as in the hardware implementation, namely 0.2 ms, so an input frequency of 5 kHz is needed for the HDL program that generates the pseudo-random pulses.

Figure 4.7: Timing simulation of the Verilog program generating random pulses for the three IFMP units in time-sharing mode, where col1 and col2 select the IFMP unit on the chip.

Simulating the HDL program produces the signals shown in Figure 4.7, which are explained as follows:

1. Generated pulse streams for the IFMP states (a1, b1, c1, a2, b2, c2, a3, b3 and c3): in the timing diagram, three random input streams are generated per IFMP state.
Two of these inputs (a1, b1) are the excitatory inputs of the first two IF models of the IFMP, while the third input (c1) is the gamma constraint applied to the inhibitory input of the third neuron of the IFMP. The other inputs (a2, b2, c2, a3, b3 and c3) are applied in the same way to the other two IFMP states. For the experimental purpose of verifying the HMM operation in pulse mode, these nine inputs are generated using a pseudo-random generator. They are applied in parallel to the three IFMP units in time-share multiplexing mode, where each pulse width represents a time of 0.2 ms, the step time used in the software and circuit simulations.

2. The output pulses from the IFMP states (P1, P2 and P3): these outputs are fed back and ORed with the input pulses to the IFMP units according to the HMM state algorithm.

3. The selection signals k, col1 and col2: these signals select a specified IFMP unit in the IFMP array.

4. The clock pulse: this input signal is chosen as 5 kHz, the same frequency used in the software and circuit simulations.

5. The iteration signal: this signal repeats the pseudo-random input pulses.

Figure 4.8: The moving-average window of the three output states of the HMM for the two observations, HMM $H_1$ and HMM $H_2$ respectively.

Figure 4.9: The output pulses and moving-average window of the three states for HMM $H_1$.

The last steps in the flowchart are to write an interfacing program in Matlab so that the PC can communicate with the FPGA, configure the HDL program on the FPGA, and analyze the rates generated by the IFMP stages. The final step uses the pipeline facility of the software functions (designed by Opal Kelly to interface the PC with the FPGA through USB) to save the pulses into the RAM of the FPGA and then load them from RAM into the PC with the interfacing program for quantifying the results.

The example of detecting a sequence of numbers succeeds, as seen by evaluating the rates of the third state for the two observations HMM $H_1$ and HMM $H_2$. As expected from simulation, a higher probability (rate) is observed using the input parameters of the first observation HMM $H_1$ (the matched observation for the input sequence [1,3,3,2]) than using the parameters of the second observation HMM $H_2$, as shown in Figure 4.8. In the figure, the P3 trace represents the probability of the third state of the Markov state machine, while the P1 and P2 traces represent the rates of the first and second states respectively. In the experiment, the number of samples per input equals 1500 steps, each with a duration of 0.2 ms, the same step time used in the Matlab software simulation and the Cadence circuit simulation (discussed in the previous section). For the two observations HMM $H_1$ and HMM $H_2$, Figures 4.9 and 4.10 respectively show the output pulses of the third HMM state and the moving-average window from which the three state traces clarify the matching detection of the HMM circuit.
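As a software cross-check of the pulse encoding used in these hardware experiments, the following hypothetical sketch mirrors the role of the FPGA's pseudo-random generators (it is not the Verilog itself): it draws a Bernoulli pulse train at a target rate and recovers the rate from the stream.

```python
import numpy as np

rng = np.random.default_rng(0)

def pulse_stream(rate, n_steps):
    # Bernoulli pulse train: each 0.2 ms slot carries a 1 with probability `rate`.
    return (rng.random(n_steps) < rate).astype(np.uint8)

def estimate_rate(pulses):
    # Empirical rate E(L): the fraction of ones over the time window.
    return pulses.mean()

s = pulse_stream(0.3, 1500)        # 1500 steps of 0.2 ms, as in the experiment
print(estimate_rate(s))            # ~0.3, with sampling noise ~ sqrt(p(1-p)/n)
```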
To summarize this section, the implementation of the HMM sequence-decoding task has been demonstrated in pulse mode, using the analog circuit in Cadence simulation and the chip hardware, realizing the modular characteristics of IFMP. Achieving MP-based computation in the spike domain implies the possibility of implementing hybrid neuromorphic architectures in which large-scale machine learning algorithms can now be integrated on spiking neural networks.

Figure 4.10: The output pulses and moving-average window of the three states for HMM $H_2$.

The observation probability matrix in the above application assigns discrete, fixed probabilities to the observed input items. In the next section, another detection application of the HMM is verified using the IFMP pulse-computation module, where the observation probabilities are real values obtained from a Gaussian distribution.

4.3 Trajectory detection

In this application, the forward algorithm of the HMM discussed in the sequence-detection application is used to detect trajectory traces. Since a trajectory contains infinitely many points, each change in the curvature of the trace is considered a separate state. Each state is trained by selecting the mean and variance of the neighboring points using a Gaussian distribution. Fig. 4.11 shows the states $s_j$, $j = 1:17$, of the HMM used to detect eight trajectories $P_i$, $i = 1:8$. The model is a left-to-right Markov model with three states per path. Each state $j$ is characterized by the following.

1- A state-transition vector $a_j$ with components $a_{ji}$ = probability of making a transition to state $i$ (at the next transition instant), given that the system is currently in state $j$.

2- A continuous state-observation density $b_j(O)$ of the form

$$b_j(O) = \frac{\prod_{d=1}^{D} \exp\big[-(O(d) - \mu_{jd})^2 / (2\sigma_{jd}^2)\big]}{(2\pi)^{D/2}\big(\prod_{d=1}^{D} \sigma_{jd}^2\big)^{1/2}} \qquad (4.19)$$

where $O$ is the observation vector, $\mu_{jd}$ is the mean vector, $\sigma_{jd}$ is the standard deviation in state $j$, and $D$ ($D = 2$ in this experiment) is the dimension of $\mu_{jd}$. (A software sketch of this density follows the path list below.)

In this application, the sets of means and standard deviations for each trace are trained in order to find standard deviations that lie close to the trace. The mean, the standard deviation and the observed dimensions of the event are used to evaluate the observation probability. In the experiment, the standard deviation used for training the neighboring trace is chosen as 0.2 units so that there is no conflict with other traces.

Figure 4.11: Traces for eight trajectories starting from state 3, using three HMM states per path.

The mean values in the two-dimensional graph of Fig. 4.11 are chosen for the eight paths as follows:

1. Path1: {(1.5,1.5); (1.0,1.0); (0.5,0.5)}
2. Path2: {(1.5,1.5); (1.0,1.5); (0.5,1.5)}
3. Path3: {(1.5,1.5); (1.0,2.0); (0.5,2.5)}
4. Path4: {(1.5,1.5); (1.5,2.0); (1.5,2.5)}
5. Path5: {(1.5,1.5); (2.0,2.0); (2.5,2.5)}
6. Path6: {(1.5,1.5); (2.0,1.5); (2.5,1.5)}
7. Path7: {(1.5,1.5); (2.0,1.0); (2.5,0.5)}
8. Path8: {(1.5,1.5); (1.5,1.0); (0.5,0.5)}
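A minimal sketch of how the continuous observation probability of equation 4.19 can be evaluated in software; the trace point below is illustrative, and $\sigma = 0.2$ matches the training choice above.

```python
import numpy as np

def obs_prob(O, mu, sigma):
    # Diagonal-Gaussian observation density of equation 4.19, with D = len(O).
    D = len(O)
    expo = np.exp(-np.sum((O - mu) ** 2 / (2.0 * sigma ** 2)))
    norm = (2.0 * np.pi) ** (D / 2.0) * np.prod(sigma)
    return expo / norm

# Example: a trace point near the second state of Path1 (mean (1.0, 1.0))
point = np.array([1.05, 0.95])
print(obs_prob(point, np.array([1.0, 1.0]), np.array([0.2, 0.2])))
```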
To check which path a given set of points or traces belongs to, the points are applied as observation values in the Gaussian equation (4.19) to calculate the observation probability, given the variance $\sigma_{jd}$ and the mean values of the paths listed above. Once the observation probability and the transition probability are known, the forward HMM algorithm can be applied to calculate the probability of being in any state given the input observation of the trace point. Thus, the observation and transition probabilities for all traces are mapped into the recursive HMM-IFMP formula to decide the highest probability for the observed event or trace among the eight traces, as shown in Fig. 4.12.

Figure 4.12: Four detection experiments for different trajectory paths of Fig. 4.11. The x-axis and y-axis represent the eight traces and the probability of each trace path in the experiment.

In this application, four experiments are implemented to find the paths of four different input traces. In the first experiment, the sequence of trajectory points near the states 3, 12, 13, 13 is to be detected and the closest path decided. For this purpose, the HMM model is implemented to find the probabilities for the eight paths. In this experiment, the maximum probability matches at path 6, which is the closest path to the input trace points, as shown in Figures 4.11 and 4.12. The same procedure is applied in the other three experiments to detect, respectively, the trajectory points near the states [3,11,10,10], [3,16,17,17] and [3,2,1,1]. The figure shows that the maximum probabilities for these traces correspond to paths 3, 8 and 1 respectively.

4.4 Support vector machine

Support vector machines (SVMs) are among the most popular supervised-learning methods. Supervised learning comprises two distinct phases, training and classification [80]. The SVM training phase is responsible for identifying the data points, called support vectors, that best build a separation model for the classes. These vectors are then used to predict the class of any future data point during the classification phase. In this section, the general idea of the SVM learning machine and a brief description of learning and classification are presented.

The general idea of classifying data using a support vector machine is to map the data from the original input space into some higher-dimensional feature space where the training set is separable. For example, in Fig. 4.13(a) the data is mapped from a one-dimensional into a two-dimensional space, and in Fig. 4.13(b) from a two-dimensional into a three-dimensional space. Every data point is mapped into the high-dimensional space via some transformation $\phi: x \to \phi(x)$ such that

$$K(x_i, x_j) = \phi(x_i)^T \phi(x_j)$$

where $K(x_i, x_j)$ is a function, called a kernel function, that corresponds to an inner product in the expanded feature space.
Example: for two-dimensional vectors $x = [x_1 \; x_2]$, let $K(x_i, x_j) = (1 + x_i^T x_j)^2$. Starting from this second-order polynomial kernel, it can be shown that $K(x_i, x_j) = \phi(x_i)^T \phi(x_j)$ as follows:

$$K(x_i, x_j) = (1 + x_i^T x_j)^2 = 1 + x_{i1}^2 x_{j1}^2 + 2 x_{i1} x_{j1} x_{i2} x_{j2} + x_{i2}^2 x_{j2}^2 + 2 x_{i1} x_{j1} + 2 x_{i2} x_{j2}$$
$$= [1 \;\; x_{i1}^2 \;\; \sqrt{2}\,x_{i1} x_{i2} \;\; x_{i2}^2 \;\; \sqrt{2}\,x_{i1} \;\; \sqrt{2}\,x_{i2}]^T \ast [1 \;\; x_{j1}^2 \;\; \sqrt{2}\,x_{j1} x_{j2} \;\; x_{j2}^2 \;\; \sqrt{2}\,x_{j1} \;\; \sqrt{2}\,x_{j2}] = \phi(x_i)^T \phi(x_j) \qquad (4.20)$$

where $\phi(x) = [1 \;\; x_1^2 \;\; \sqrt{2}\,x_1 x_2 \;\; x_2^2 \;\; \sqrt{2}\,x_1 \;\; \sqrt{2}\,x_2]$.

The kernel function types listed below are used in SVM training and classification:

• Linear: $K(x_i, x_j) = x_i^T x_j$
• Polynomial of power $p$: $K(x_i, x_j) = (1 + x_i^T x_j)^p$
• Gaussian (radial-basis function network): $K(x_i, x_j) = \exp(-\|x_i - x_j\|^2 / (2\sigma^2))$
• Sigmoid: $K(x_i, x_j) = \tanh(\beta_0 x_i^T x_j + \beta_1)$

In the IFMP application for classification, a second-order polynomial kernel is implemented, as discussed in the next section.

Figure 4.13: (a, b): Two examples of mapping non-separable data (left) into linearly separable data (right) by lifting lower-dimensional data into higher dimensions. (c): Binary classification based on a linear-hyperplane SVM.

The SVM training builds a model that is able to distinguish the class of any future data point based on the support vectors obtained from the training data set. For a two-class classification problem, the SVM objective is the construction of a separating hyperplane $w \cdot x - b = 0$ that attains maximum separation between the classes, as shown in Fig. 4.13(c). The class hyperplanes are parallel to the separating one, lying on each of its sides. The Euclidean distance between the two class hyperplanes is $\frac{2}{\|w\|}$, so the objective of the SVM is to maximize the distance between the class hyperplanes or, in other words, to solve

$$\min_w \frac{1}{2}\|w\|^2, \quad \text{s.t. } y_i(w \cdot x_i - b) \geq 1, \; 1 \leq i \leq N \qquad (4.21)$$

where $(\cdot)$ denotes an inner product, $x_i$ is the training data, the label $y_i \in \{-1, 1\}$ denotes the class of $x_i$, $w$ is a vector perpendicular to the hyperplane, $b$ is the offset from the origin, and $N$ is the training set size. The SVM training phase focuses on identifying the support vectors, the training samples that lie closest to the hyperplane and determine its direction. Many optimization algorithms have been proposed to find the parameters of the SVM [81, 82, 83]. Whenever the training data are not linearly separable in the input space, the input space is mapped to a higher-dimensional one, where a linear separation may be feasible. The support vector machine employs the kernel functions $K(x_i, x_j)$ discussed above to replace the inner products in the optimization problem 4.21:

$$\min \frac{1}{2}\|w\|^2, \quad \text{s.t. } y_i(k(w, x_i) - b) \geq 1, \; 1 \leq i \leq N \qquad (4.22)$$

In the classification phase, any new data point $x$ is classified according to the output of the decision function

$$F(x) = \mathrm{sgn}\Big(\sum_{i=1}^{N_{sv}} y_i \alpha_i k(x_i, x) + b\Big) \qquad (4.23)$$

where $\alpha_i$ and $b$ are the optimization parameters and $N_{sv}$ is the set of support vectors identified in the training phase. It is easily derived that the computational time of the nonlinear SVM classification task depends linearly on the size of the classification dataset, on the support-vector population $N_{sv}$, and on the problem's dimensionality. In the next section, SVM classification using the IFMP computation modules is shown.
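A small numerical check of the expansion in equation 4.20 and of the decision rule in equation 4.23. This is a hypothetical sketch; the vectors and parameter names are arbitrary.

```python
import numpy as np

def phi(x):
    # Explicit feature map of the second-order polynomial kernel (eq. 4.20).
    x1, x2 = x
    return np.array([1.0, x1**2, np.sqrt(2)*x1*x2, x2**2,
                     np.sqrt(2)*x1, np.sqrt(2)*x2])

def K(a, b):
    # Polynomial kernel of power 2: K(a, b) = (1 + a.b)^2.
    return (1.0 + a @ b) ** 2

xi, xj = np.array([0.3, -1.2]), np.array([2.0, 0.5])
print(K(xi, xj), phi(xi) @ phi(xj))    # equal values, verifying equation 4.20

def F(x, sv, y, alpha, b):
    # Decision function of equation 4.23 over the support vectors.
    return np.sign(sum(yi * ai * K(v, x) for v, yi, ai in zip(sv, y, alpha)) + b)
```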
4.5 Classification using support vector machine

A multi-class SVM classifier computes matching scores (kernels) between the input vector $y$ and a set of template vectors (support vectors) $x_i$. The kernels of the two vectors $x$ and $y$ are then combined linearly to produce a score corresponding to class $j$ according to

$$f_j(y) = \sum_{i=1}^{N_{sv}} \alpha_{ij}\, k(x_i, y) + b_j \qquad (4.24)$$

where $b_j$ is a class-specific bias parameter and $\alpha_{ij}$ are the weights corresponding to support vector $x_i$ and class $j$. In this work, a quadratic kernel $K(x_i, y) = (x_i \cdot y)^2$ is implemented, satisfying the Mercer condition. [The Mercer condition only tells whether the kernel function is actually an inner product in some space, and therefore admissible for use in a support vector machine; it tells nothing about how to find $\phi(x)$. To find the support vectors, different training packages train the input data based on the optimization problem of maximizing the margin distance from the hyperplane.] [84]. The SVM training procedure automatically selects support-vector templates from the training examples and derives values for the weights $\alpha_{ij}$ and the bias parameter $b_j$ accordingly. During the training procedure, several hardware-related constraints can be imposed. For instance, all the weights, biases and support vectors are kept positive so that they can be mapped directly onto an IFMP architecture using the inner product in the kernel $(x_i \cdot y)$. The polynomial operation $(\cdot)^2$ in the kernel is mapped onto multiplication because IFMP computations operate in a logarithmic domain; for a spiking architecture, the mapping simply results in multiple fan-outs (copies) of the spike train. The multiplication of the weights $\alpha_{ij}$ with the kernels according to equation 4.24 can be viewed as an inner-product computation, and hence the mapping to IFMP described in chapter 2 also applies.

As an example of applying IFMP as a pulse-computational unit, a binary classification task has been implemented using the SVM algorithm. The following procedure shows how to derive the parameters applied in the IFMP-based SVM experiment. Let $y$ denote a vector drawn from the input space, and let $\{\phi(y_j)\}_{j=1}^{m_1}$ denote a set of nonlinear transformations from the input space to a feature space of dimension $m_1$. For such a set of nonlinear transformations, a hyperplane defining the decision surface is

$$\sum_{j=1}^{m_1} \bar{\alpha}_j\, \phi(y_j) + b = 0 \qquad (4.25)$$

where $\bar{\alpha}$ denotes the vector of nonlinear weights connecting the feature space to the output space, and $b$ is the bias. In this example, a second-order polynomial kernel of the inner product between the input vector $y$ and the support vectors $x_i$ decides the hyperplane as

$$z = \sum_{i=1}^{N} \alpha_i (x_i \cdot \bar{y})^2 + b \qquad (4.26)$$

To find the parameters for the IFMP units, the logarithm of both sides of the above equation is taken as follows:

$$L_z = L\Big(\sum_{i=1}^{N} \alpha_i (x_i \cdot \bar{y})^2 + b\Big) \qquad (4.27)$$
$$L_z = L\Big(\sum_{i=1}^{N} e^{\log(\alpha_i) + 2\log(x_i \cdot \bar{y})} + e^{\log(b)}\Big) \qquad (4.28)$$

where $N$ represents the number of support vectors. The parameters for the margin unit $M_1$ are expressed as
$$M_1\big(\{\log(\alpha_i) + 2\log(x_i \cdot \bar{y})\},\, \log(b),\, \gamma\big) \qquad (4.29)$$

Now, the kernels $K_i$ are calculated as the margin parameters $M_2$ for the inner-product term, such that

$$K_i = \log\Big(\sum_{j=1}^{m} x_{i,j} \cdot y_j\Big) \qquad (4.30)$$
$$K_i = \log\Big(\sum_{j=1}^{m} e^{\log(x_{i,j}) + \log(y_j)}\Big) \qquad (4.31)$$

Therefore, the parameters for the margin unit $M_2$ are the set of kernels $K_i$, such that

$$K_i = M_2\big(\{\log(x_{i,j}) + \log(y_j)\},\, \gamma\big) \qquad (4.32)$$

Since the values of the bias $b$ and the optimization factors $\alpha_i$ might be negative, these values are simulated in differential form ($b^+ - b^-$ and $\alpha_i^+ - \alpha_i^-$). Therefore, equation 4.27 is split into two differential forms:

$$L_z^+ = L\Big(\sum_{i=1}^{N} \alpha_i^+ (x_i \cdot \bar{y})^2 + b^+\Big) \qquad (4.33)$$
$$L_z^- = L\Big(\sum_{i=1}^{N} \alpha_i^- (x_i \cdot \bar{y})^2 + b^-\Big) \qquad (4.34)$$

The difference between the above two equations is used to decide the classification of the data, as shown in Fig. 4.14.

Figure 4.14: Classification using IFMP and a support vector machine, comparing the differential outputs $F = e^{F_1} - e^{F_2}$ with $F_1 = L_z^+$ and $F_2 = L_z^-$.

As an example, the IFMP-based SVM network is applied to a binary classification task in which the two-dimensional Iris data (shown in Fig. 4.16(a)) are used for training and validation. A GiniSVM training algorithm is used to obtain the weights $\alpha_{ij}$ and identify the support vectors $x_i$ [84]. The complexity of the IFMP depends on the number of support vectors and the number of sparse data. In this application, nine support vectors are obtained from the GiniSVM training program, and 22 units of the IF neural model are needed for the classification architecture. For this application, the architecture shown in Fig. 4.15 is designed for two-class classification using a spiking SVM network based on IFMP modules. The architecture follows a modular approach, computing first the kernel inner products and second the inner product between the weights and the kernels. The first computation shown in the architecture is implemented using nine IFMP modules that compute the kernel inner products with the nine support vectors, as in equation 4.32. These nine units ($k_1 : k_9$) are common to the two differential hyperplane equations (4.33) and (4.34). The input parameters for the positive hyperplane are $b_1^+$, $\alpha^+$ and $k_1 : k_9$, where $k$ represents the kernels, i.e., the inner products of the support vectors with the input data to be classified. In the same way, the input parameters for the negative hyperplane are $b_1^-$, $\alpha^-$ and $k_1 : k_9$. For this architecture, note the following:

1. In the IFMP architecture, a quadratic operation is implemented using a simple fan-out that replicates the spike train twice.

2. The two outputs $z^+$ and $z^-$ in the figure are in the log domain and correspond to the outputs of the IFMP networks for each class.

3. All the inputs are quantified as random pulse rates.

Fig. 4.16(b) shows the raster plot of the spikes generated by the neurons in the SVM network [a raster plot shows the occurrence of spikes for a population of neurons with respect to time]. The spike train at the top of the plot corresponds to one of the outputs of the network (neurons 11 and 22 in Fig. 4.15). Visually, it can be inferred that when the vector corresponding to each of the classes is presented, the output spiking rate illustrates the network's discrimination capability.
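A reference sketch of the differential log-domain score of equations 4.28 and 4.33 to 4.34, with an exact log-sum-exp standing in for the MP approximation; all parameter values below are hypothetical.

```python
import numpy as np

def log_score(log_alpha, log_K, log_b):
    # L_z = log( sum_i exp(log(alpha_i) + 2*K_i) + exp(log(b)) ), eq. 4.28;
    # the IFMP hardware replaces this exact log-sum-exp by margin propagation.
    terms = np.append(log_alpha + 2.0 * log_K, log_b)
    m = terms.max()
    return m + np.log(np.exp(terms - m).sum())

# Hypothetical differential parameters (made non-negative by the +/- split):
log_K     = np.log([0.4, 0.9, 0.2])              # K_i = log(x_i . y), eq. 4.30
log_a_pos = np.log([0.5, 0.1, 0.3])
log_a_neg = np.log([0.2, 0.6, 0.1])
F1 = log_score(log_a_pos, log_K, np.log(0.05))   # L_z+ (eq. 4.33)
F2 = log_score(log_a_neg, log_K, np.log(0.10))   # L_z- (eq. 4.34)
print(np.sign(np.exp(F1) - np.exp(F2)))          # F = e^F1 - e^F2 (Fig. 4.14)
```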
Fig. 4.17 shows the difference between the firing rates $z^+ - z^-$ computed for the input data. The result shows that the network correctly classifies both types of Iris petal data, and hence validates the proof of concept of an IFMP-based spiking SVM.

Figure 4.15: SVM-based classification using parallel cascading of IFMP modules. $k_1 : k_9$ are the kernels. The difference between the firing rates of the outputs, $z^+ - z^-$, determines the classification of the input data. B1: input vector ($y$) and support vectors ($x$); B2: kernel functions ($k$) and output decision ($z^+$); B3: kernel functions ($k$) and output decision ($z^-$).

On the other hand, the classification has also been implemented with the cascading architecture of two-input IFMP units shown in Fig. 4.18, where the support vectors are applied to a series of IFMP units with positive and negative values of the alpha and bias parameters. Fig. 4.19 shows, successively, the classification of the data and the raster plot of the cascaded IFMP units, as in the first case, where neurons 30 and 81 are the output neurons of the two differential parts of the IFMP-based SVM implementation. In this case, 81 neurons are used to simulate the SVM with nine support vectors, nine alpha values and one bias value, all in differential form, to test the classification of the input data.

Figure 4.16: Results obtained from the SVM-based classification using the IFMP network. (a): Two-dimensional Iris data set; (b): raster plot of spikes generated by the neurons in the network.

Figure 4.17: Difference between the firing rates of the outputs $z^+$ and $z^-$.

4.6 Summary

In this chapter, it is shown that the newly designed pulse-computational unit not only approximates basic nonlinear functions but can also be cascaded in serial or parallel architectures to approximate more complicated algorithms. Three applications are implemented: sequence decoding, trace detection and data classification. The first two applications are implemented using the forward recursion algorithm of the hidden Markov model, and the third application is implemented using the support vector machine algorithm. These applications are not intended to replace artificial neural networks, but rather to show that the newly designed pulse-computation module is applicable to complicated algorithms that include state-machine designs and recursive equations in their procedures. Moreover, all the parameters in these applications are represented by the rates of real-time random pulses. Another important goal of this chapter is to verify that such applications can be realized in hardware. In the hardware implementation, the designed IFMP algorithm is mapped and verified as an analog circuit, and an architecture composed of an IFMP array is then designed to implement procedural applications.

Figure 4.18: Classification with the serial cascaded IFMP architecture. $k_1 : k_9$ are the kernels. The biases and weights ($b$ and $\alpha$) are in differential form to find the outputs $z^+$ and $z^-$ that decide the classification of the input data.
The implementation of these applications is possible because, first, the IFMP units are scalable modules and, second, the connectivity between modules is implemented using a high-speed field-programmable gate array (FPGA). The FPGA helps to apply high-speed random pulses, to feed the output pulses of the IFMP modules back into the processing, and finally to analyze the output rates according to the required algorithm.

Figure 4.19: (a): Classification of two classes of the Iris data set with the serial cascaded architecture. Negative and positive values of $(z^+ - z^-)$ represent the Setosa and Versicolor types respectively, using two models: MP (red marker) and IFMP (plus marker). (b): Raster plot of the spikes of neurons 30 and 81, which decide the classification of two samples of the input Iris data set, in the first and second half of the raster plot respectively.

Chapter 5

Significance of IFMP

The newly designed IFMP module plays two significant roles in pulse-computation algorithms. The first significant role is the enhancement of the dynamic range of the computations compared with existing pulse-based stochastic-computation applications. The second significant role is the direct and simple implementation of the winner-take-all design compared with other existing designs. For the first role, it is shown with an example that a factor-graph application is reported to fail in pulse-based stochastic computation regardless of the simplicity of the hardware structure; in section one, a solution to this failure using the IFMP pulse-computation module is suggested. For the second role, the IFMP module is compared with the existing winner-take-all designs, which are important in designing neuromorphic architectures such as the receptive-field structures of the retina and cochlea. A background on winner-take-all design is presented, starting from the continuous-time design presented by Lazzaro, 1989 [85] and reaching the spiking-network design presented by Oster, 2009 [86]. Section 2 shows with an example the significant difference in spiking-network design between Oster's design and the new IFMP design.

5.1 Dynamic range enhancement in pulse computation

5.1.1 Applications of stochastic computations

Stochastic computations are pulse-computational structures in which probabilities are represented as streams of random digital bits using Bernoulli sequences, and the information is contained in the statistics of the bit stream. Using this representation, complex operations on probabilities such as multiplication and division are converted into operations on bits, which can easily be manipulated using simple stochastic gates. Stochastic computation has been reported to implement hardware structures for some applications of factor-graph algorithms. Factor graphs, introduced in chapter one, originate in coding theory, but they offer an attractive notation for a wide variety of signal processing problems. These problems can be viewed as instances of the summary-product algorithm, which operates by message passing in a graphical model.
Specific instances of such algorithms include Kalman filtering and smoothing; the forward-backward algorithm for hidden Markov models; probability propagation in Bayesian networks; and decoding algorithms for error-correcting codes, such as the Viterbi algorithm, the BCJR algorithm, and the iterative decoding of turbo codes, low-density parity-check (LDPC) codes and similar codes [39][17]. However, stochastic decoding methods have worked only on very simple short codes or for decoding some specific error-correcting codes on trellis graphs. The reason is that stochastic computations suffer from false computations and may not be successful for other decoding applications. The false stochastic computations are caused by correlated pulsed-input codes in the multiplication operation and by implementing scaled addition for the addition operation. In addition, the main problems of stochastic computation are the long convergence time and the low dynamic range of the processed parameters. To address the low-dynamic-range problem, the new pulse-computational module IFMP is introduced, which maps the input/output parameters into the logarithmic domain. In the next section, the basic units of stochastic computation are introduced in order to build the required dynamic or recursive equations for algorithmic applications. An application of sequence decoding using the hidden Markov model is implemented in both pulse-based stochastic computation and pulse-based IFMP modules to compare the performance in the two cases.

5.1.2 Significant difference between IFMP and SC

In the stochastic representation, the multiplication function is implemented simply with an AND logic gate. In Fig. 5.1(a), the multiplication is applied to two input values represented by stochastic bit streams; with bit streams of length 8, the values have a resolution of 1/8. Assuming that the two input stochastic bit streams A and B are independent, the rate $c$ represented by the output stochastic bit stream C is $P(C = 1)$, such that

$$c = P(C = 1) = P(A = 1 \text{ and } B = 1) = P(A = 1)\,P(B = 1) = a \cdot b$$

Figure 5.1: (a): Multiplication of stochastic bit streams with an AND gate. (b): Scaled addition of stochastic bit streams with a multiplexer (MUX).

In stochastic computation, it is not possible to add two probability values directly, because the output result might be greater than one, which cannot be represented as a probability value. Therefore, normal addition is replaced by scaled addition, realized using a multiplexer (MUX). The multiplexer shown in Fig. 5.1(b) is a digital construct that selects one of its two input values as the output value, based on the selecting input value S. When S = 1, the output C = A; otherwise, when S = 0, the output C = B. The Boolean function implemented by the multiplexer is $C = (A \wedge S) \vee (B \wedge \bar{S})$.
With the assumption that the three input stochastic bit streams A, B and S are independent, the number represented by the output stochastic bit stream C is

$$c = P(C = 1) = P(S = 1 \wedge A = 1) + P(S = 0 \wedge B = 1) = P(S = 1)P(A = 1) + P(S = 0)P(B = 1) = s \cdot a + (1 - s) \cdot b$$

Thus, with this stochastic representation, the computation performed by a multiplexer is the scaled addition of the two input values $a$ and $b$, with a scaling factor of $s$ for $a$ and $(1 - s)$ for $b$. Since the multiplication and addition operations are available in stochastic pulse computation, it is possible to implement more complicated equations such as recursive equations. In this section, the forward HMM recursion algorithm (equations 4.7 and 4.8 in chapter 4) is mapped into stochastic computation to verify the sequence-detection application. Recall that the HMM recursion equation is

$$p_j[n] = \sum_{i \in C_j} p_i[n-1]\, a_{ij}\, O_{j,x[n]}$$

which expands into the following three equations:

$$p_1[n] = p_0[n-1]\, O_{1,x[n]} + p_1[n-1]\, a_{11}\, O_{1,x[n]}$$
$$p_2[n] = p_1[n-1]\, a_{12}\, O_{2,x[n]} + p_2[n-1]\, a_{22}\, O_{2,x[n]}$$
$$p_3[n] = p_2[n-1]\, a_{23}\, O_{3,x[n]} + p_3[n-1]\, a_{33}\, O_{3,x[n]}$$

Figure 5.2: Stochastic computation implementing the forward HMM decoding algorithm.

These three equations are simulated in stochastic computation using AND logic units for the multiplication operations and multiplexer units for the addition operations, as shown in Figure 5.2. This pulse-based architecture is compared with the architecture of the pulse-based IFMP computational modules discussed in chapter four and shown in Figure 4.2. The implemented example is to maximize a sequence that matches the HMM observations. The lower plots of Fig. 5.3 show the output rates (probabilities) of the three states of HMM $H_1$ and HMM $H_2$ for the stochastic simulation, which cannot recognize the simple sequence [1,3,3,2]. The figure shows that the output rates of the HMM states are so low that it is impossible to distinguish whether the input sequence matches the given observations HMM $H_1$ or HMM $H_2$. The low rates of the stochastic-computation HMM are caused by the multiplication of the following probabilities in the forward HMM equation: the observation probability, the transition probability and the previous-time state probability. Multiplying these probabilities shrinks the result further and reduces the dynamic range of the computations, making it difficult to distinguish the posterior probabilities of different HMM observations. On the other hand, mapping the HMM recursive equation to log-sum-exp, i.e., margin propagation, enhances the dynamic range and makes it possible to distinguish the different HMM observation probabilities, as shown in the upper plots of Fig. 5.3. In the IFMP implementation, the output rate of the third state P3 for observation HMM $H_1$ is higher than for observation HMM $H_2$. In the case of HMM $H_1$, the figure shows how the rate of P3 increases over time as the input sequence [1,3,3,2] is applied.
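The underflow effect described above is easy to reproduce in software. The sketch below (hypothetical bit-stream lengths and rates) implements stochastic multiplication as an AND over Bernoulli streams and scaled addition as a multiplexer, and then shows how repeated multiplication collapses the rate below what the stream can resolve.

```python
import numpy as np

rng = np.random.default_rng(1)

def stream(p, n):
    # Bernoulli bit stream whose rate of ones encodes probability p.
    return rng.random(n) < p

n = 10_000
a, b = stream(0.6, n), stream(0.4, n)
print((a & b).mean())            # AND gate: ~0.24 = 0.6 * 0.4

s = stream(0.5, n)               # selecting stream for the 2-to-1 MUX
c = np.where(s, a, b)            # scaled addition: s*a + (1 - s)*b
print(c.mean())                  # ~0.5*0.6 + 0.5*0.4 = 0.5

# Repeated multiplication, as in the forward HMM product of probabilities,
# drives the rate toward zero faster than the stream can resolve it:
p = stream(0.2, n)
for _ in range(4):
    p = p & stream(0.2, n)
print(p.mean())                  # ~0.2**5 = 3.2e-4: a handful of ones in 10^4 bits
```

In the log domain, the same quantity is simply $5\log(0.2) \approx -8.05$, a value the IFMP representation carries without loss, matching the dynamic-range argument above.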
Figure 5.3: Instantaneous firing rates for neurons N1 to N3 for the observation sequence {1,3,3,2}. (Upper): IFMP architecture. (Lower): stochastic-computation architecture.

In addition to the dynamic-range difference between the pulse-based stochastic computations and the pulse-based IFMP architecture, there are other differences in implementation, such as hardware density, power consumption and the convergence rate of the HMM states. Table 5.1 lists the implementation differences between the pulse-based stochastic computation structures and the pulse-based IFMP computation modules.

Table 5.1: Differences in implementation between the pulse-based stochastic computation (SC) structures and the pulse-based IFMP computation modules

• Transistor count: IFMP uses about 100 transistors (denser in the layout chip design); SC uses about 300 (less dense).
• Power: IFMP performs analog computations with a low power dissipation of 1.6 µW at an operating current of 200 nA; SC performs digital computations with a higher power dissipation of hundreds of mW at operating currents of hundreds of µA.
• Dynamic range: IFMP implements the computations on the logs of the input rates (probabilities), introducing a wide range (e.g., log 0.000001 = -6); SC prohibits the decoding process, since the multiplication of probabilities shrinks the values and causes underflow.
• Convergence: in IFMP, increasing the number of pulses that represent a rate improves the convergence equation and eventually enhances the decoding applications; in SC, increasing the iterations does not improve the decoding output.
• Scalability: the IFMP module is scalable, meaning that the IFMP units share the same structure and each implements a partial computation of any mapped algorithm (e.g., the HMM and SVM algorithms); SC hardware is simple, but each algorithm requires a different structure.

5.2 Novel pulse-mode winner-take-all (WTA)

The second significance of the newly designed IFMP module is the simplicity and directness of its implementation of the winner-take-all network. The definition of the winner-take-all concept, the background of its design, and the contribution of the new pulse-mode IFMP architecture are introduced in the following sections.

Winner-take-all networks are a case of competitive learning in recurrent neural networks. Output nodes in the network mutually inhibit each other while simultaneously activating themselves through reflexive connections. After some time, only one node in the output layer will be active, namely the one corresponding to the strongest input. The network thus uses nonlinear inhibition to pick out the largest of a set of inputs. Winner-take-all is a general computational primitive that can be implemented using different types of neural network models, including both continuous-time and spiking networks (Grossberg, 1973; Oster, 2009) [87][86].

5.2.1 Importance of WTA

1. WTA has a role in the sensing of external stimuli in the receptive fields of the cochlea, retina and skin [88] [Zaghlol 2006; Wen 2003; Dungem 2005]. A winner-take-all circuit finds the maximum among a set of inputs, shutting off the other inputs.
This 152 is a useful function in many sensory processing tasks that try to focus attention on the most interesting salient of an array of sensory inputs. Solving this task on-chip with dedicated parallel computing elements is much faster than processing the raw sensor data serially on a computer. 2. WTA has a role in cortical processing. Decision processes in the brain are not localized in one specific region but evolve in a distributed manner when different brain regions cooperate to reach a consistent interpretation. The winner-take-all circuit with both cooperative and competitive properties is a main building block that contributes to this distributed decision process in [89] [90]. 3. WTA has a role in a hierarchical model of vision in cortex. Computational visual attention is the process of selecting a part of the available visual information for localization, identification and understanding of objects in an environment or visual scene. This process allows the visual system to capture visual input preferentially by shifting attention about an image, giving more attention to salient locations and less attention to unimportant regions [91] [92] [93]. 4. WTA circuit conceived as part of an imager chip to process current input from a motion detection array, thus detecting the row and column of maximum change of illumination. The fact that this WTA processes analog input and produces spike output is most convenient for the address event interface (AER) that conveys the WTA output off-chip. 153 5.2.2 Background of WTA (continuous input/output variables) Because of the importance of WTA networks, these networks have been of great interest to researchers. Yuille and Grzywacz (1989) [94] and Ermentrout (1992) [95] are classical references to theoretical analyses. In these early models, the neurons are non spiking, that is, they receive an analog input and have an analog output. The analog WTA computation can be efficiently implemented in very-large scale integrated (VLSI) transistor circuits. With initial circuits described in [96], a whole series of analog models [97, 98, 99] and implementations has been developed in [100, 101, 102, 103, 104, 105, 106]. 5.2.3 Case study to design WTA ( spiking networks) In the past decade, spiking neuron models and their electronic counterparts have gained increasing interest. Spike-based networks capture the asynchronous and time-continuous computation inherent in biological nervous systems. Neuron models with analog inputs and analog outputs can be converted into models with spiking output if a thresholding operating is introduced to the neuron. Coultrip, Granger, and Lynch (1992) is an early theoretical analysis, with further descriptions in Jin and Seung (2002) [107] and Yu, Giese, and Poggio (2002) [108] and VLSI implementations in Chicca, Indiveri, and Douglas (2004), Abrahamsen, Hafliger, and Lande (2004) [109], and Oster, Wang, Douglas, and Liu (2008) [110]. The next theoretical considerations are models with both spiking input and spiking output. Previous theoretical studies focused on population models (e.g., Lumer, 2000) [111], where the population firing represents a graded analog value. Indiveri, Horiuchi, Niebur, and Douglas (2001) [104] and Chicca, Lichtsteiner, Delbruck, Indiveri,and Douglas (2006) [112] are VLSI implementations that use the firing rate as an analog input and output encoding. 
A special case is presented in Carota, Indiveri and Dante (2004) [113] and Bartolozzi and Indiveri (2004) [114], in which the core of the winner-take-all is analog but the signals are converted to spike rates for communication with the outside world. Table 5.2 summarizes this background of WTA design.

5.2.4 Contribution and significance of IFMP as WTA

No analysis until now has considered the effect of single spikes and spike timing on the winner-take-all computation with random spiking inputs and outputs. Gautrais and Thorpe (1998) [115] start their analysis from a similar point of view, namely how the network decides which of two input spike rates is higher, but they do not consider the sampling of this estimation in the output spikes (their analysis could be classified as spiking input and analog output). The emergence of multi-chip spiking systems that incorporate the WTA as part of their decision process (Serrano-Gotarredona et al., 2005 [116]; Choi, Merolla, Arthur, Boahen and Shi, 2005 [117]; Chicca, Lichtsteiner, Delbruck, Indiveri and Douglas, 2006 [118]; Vogelstein, Mallik, Culurciello, Cauwenberghs and Etienne-Cummings, 2007 [119]) highlights the necessity of a theoretical quantification of the WTA network in terms of different network parameters (e.g., CAVIAR), especially if these systems are to be used in different applications. To address this need, a framework was developed by Oster et al. (2001-2009) [86] for quantifying the performance of a spiking WTA network with regular spiking inputs. Our contribution is to develop a new pulse-computational architecture that implements the WTA computation as one application of our model. The new aspect here is that the input pulses to the network need not be regular; regularity is the basis of Oster's architecture, in which the recognition of the highest rate is based on receiving the earliest signal in time among the set of incoming spike sources.

Table 5.2: Survey of winner-take-all designs, from theoretical analyses to spiking populations of neural networks

• Yuille and Grzywacz (1989); Ermentrout (1992): models as theoretical analyses; in these early models, the input and output signals of the neuron models are analog rather than spiking signals.
• Lazzaro, Ryckebusch, Mahowald and Mead (1989); Kaski and Kohonen (1994); Barnden and Srinivas (1993); Hahnloser, Sarpeshkar, Mahowald, Douglas and Seung (2000): WTA circuits initially described using analog computation.
• He and Sanchez-Sinencio (1993); Starzyk and Fang (1993); Serrano-Gotarredona and Linares-Barranco (1995); Kincaid, Cohen and Fang (1996); Indiveri (1997, 2001); Moller, Maris, Tomes and Mojaev (1998); Hahnloser, Sarpeshkar, Mahowald, Douglas and Seung (2000); Liu (2000, 2002): WTA efficiently implemented in very-large-scale integrated (VLSI) transistor circuits.
• Coultrip, Granger and Lynch (1992); Jin and Seung (2002); Yu, Giese and Poggio (2002); Chicca, Indiveri and Douglas (2004); Abrahamsen, Hafliger and Lande (2004); Oster, Wang, Douglas and Liu (2008); Lumer (2000); Indiveri, Horiuchi, Niebur and Douglas (2001); Chicca, Lichtsteiner, Delbruck, Indiveri and Douglas (2006): early theoretical analyses and VLSI implementations of spiking neuron models with a threshold operating point introduced into the neuron; the studies focus on spiking population models, where the population firing represents a graded analog value.
• Oster et al. (2001-2009): quantified the performance of a spiking WTA network with regular spiking inputs.

If the input spikes are random, Oster's architecture will not recognize the highest rate, as will be explained in the next section. In the newly designed IFMP architecture, the inputs to the computing architecture are quantified as the rates of random spikes, and a simple procedure recognizes the highest rate even if the input spikes are random in time.

Figure 5.4: (a): Input spikes, membrane potentials and output spikes of neuron 1 and neuron 2, showing how the winning neuron 1 inhibits the membrane of neuron 2. (b): Full connectivity between 20 neurons for processing the input pulses shown in (a); the number of connections is n(n-1) = 20 x 19 = 380.

5.2.5 Experiment

Oster et al. (2001-2009) developed a framework for quantifying the performance of a spiking WTA network with regular spiking inputs, where the winner is the source of the earliest spike; as noted above, with random input spikes this scheme fails to recognize the highest rate. Experiments are therefore implemented to simulate both Oster's architecture and the IFMP architecture for a network of 20 neurons. In each case, the neuronal population is simulated both with no connections between neurons and with the specific connections of the respective architecture. To understand the difference in architecture and operation between the newly designed IFMP and Oster's design, both architectures are explained next.

1. To explore the concept of Oster's architecture, Fig. 5.4(b) shows the full connectivity between the neurons in Oster's design. The number of connections in this architecture is simply calculated as n(n-1), where n is the number of neurons. The operation of this network is explained using Fig. 5.4(a), which demonstrates the membrane behavior of a pair of neurons when the input spikes are applied. The input spikes have different frequencies, and therefore the winning neuron is the one with the highest input frequency, i.e., the shortest period, as explained in the following:

(a) Neuron 1 (the winning neuron) spikes when its membrane potential exceeds the threshold. After every spike, the neuron is reset to $V_{self}$.

(b) As soon as neuron 1 spikes, no other neuron is able to spike, because the inhibitory connections from neuron 1 reset the membrane potentials of the other neurons (neuron 2 in the figure).

(c) When input spikes are applied to the neurons, the membrane potential of each neuron increases by a voltage step $V_E$. As the membrane potential reaches the threshold, the neuron generates a spike and resets its membrane voltage to $V_{self}$. The above steps then repeat, and therefore the input spike train with the shortest period determines the winning neuron.
(d) If the duty cycle of the input spikes is not equal to 50% or the input spikes are irregular, there will be more than one winner. Fig. 5.5 shows that when irregular input pulses are applied to three neurons (for example), more than one winner emerges. It can therefore be deduced that Oster's network cannot recognize the highest rate for streams of random pulses.

Figure 5.5: More than one winner in Oster's architecture when three random input streams are applied to three neurons.

The above procedure is simulated and implemented with two types of connectivity between the IF neurons: no weights between the neurons (for times below 400 units) and full connectivity (for times above 400 units). Two experiments are implemented, the first with regular input spikes and the second with irregular (random) input spikes. As a result of the first experiment, the raster plot in Fig. 5.6 shows that the neurons spike according to their regular inputs when there is no connectivity, and that only one neuron (the winner) spikes when there is full connectivity. The second experiment, with the raster plots of four simulations shown in Fig. 5.7, shows that the neurons spike when there is no connectivity but that different winners appear with the full-connectivity architecture.

2. The concept of pulse-based computation in the IFMP architecture was discussed earlier in detail. Fig. 5.8(a) shows the network architecture between n neurons in the IFMP design, where the number of connections is 3n. The concept of inhibitory connections between neurons is similar in the IFMP network and in the receptive fields of the sensory system: in Fig. 5.8(c), the neurons with the highest rate inhibit the neighboring neurons in order to propagate the highest input stimulus (highest rate). For the three-input IFMP module shown in Fig. 5.8(b), the output of N4 inhibits neurons N1, N2 and N3 according to the convergence equation $[L_1 - z]_+ + [L_2 - z]_+ + [L_3 - z]_+ = \gamma$. The output neuron N4 in this case propagates the highest input rate when the constraint $\gamma$ is approximately zero. Fig. 5.9 shows the membrane potentials of neurons N1, N2 and N3 for a three-input IFMP where the output rate is the highest of the input rates.

Figure 5.6: Raster plot for 20 neurons based on Oster's spiking neuronal architecture, with two types of connectivity between the IF neurons. In the first part of the simulation (times below 400 units) there are no connections and the neurons spike according to their regular inputs; in the second part (times above 400 units) the full inhibitory connectivity of Oster's design is applied and there is only one winner, neuron 1, which has the highest rate (the shortest period), while neurons 2-20 are inhibited from spiking.
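The 20-neuron experiments above can be imitated with a toy simulation. The sketch below uses hypothetical parameters (threshold 1.0, excitatory step 0.4, three neurons) to implement a hard, fully connected WTA in the style of Oster's scheme, applying first regular and then random (Bernoulli) pulse trains of comparable rates; with regular inputs the highest-rate neuron is the only winner, while with random inputs spurious winners appear, as in Fig. 5.5.

```python
import numpy as np

rng = np.random.default_rng(2)

def hard_wta(spikes_in, v_th=1.0, v_exc=0.4):
    # Fully connected WTA: the first neuron over threshold fires, and the
    # inhibitory feedback resets every membrane (cf. Fig. 5.4).
    n, T = spikes_in.shape
    v = np.zeros(n)
    out = np.zeros_like(spikes_in)
    for t in range(T):
        v += v_exc * spikes_in[:, t]       # excitatory step V_E per input spike
        w = np.argmax(v)
        if v[w] >= v_th:
            out[w, t] = 1                  # the winner spikes ...
            v[:] = 0.0                     # ... and inhibits/resets the others
    return out

rates, T = np.array([0.2, 0.5, 0.8]), 600
regular = np.zeros((3, T), dtype=np.uint8)
for i, r in enumerate(rates):
    regular[i, ::int(round(1 / r))] = 1    # evenly spaced pulses, period ~ 1/rate
random_in = (rng.random((3, T)) < rates[:, None]).astype(np.uint8)

for name, s in (("regular", regular), ("random", random_in)):
    print(name, hard_wta(s).sum(axis=1))   # output spike counts per neuron
```

With regular inputs only the last neuron fires; with random inputs the lower-rate neurons occasionally accumulate three input spikes first and steal the win, which is exactly the failure mode the IFMP WTA avoids by operating on rates through the margin equation.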
Finally, Table 5.3 lists the comparison between the temporal spiking network in Oster's architecture [86] and the IFMP architecture.

Table 5.3: Comparison between the temporal spiking network in Oster's architecture (Oster, Douglas and Shih-Chii Liu, 2001-2012) and the proposed work (IFMP).

Input pulses. IFMP: the activated input pulses are random, and the probability of generating a pulse equals the rate of logic ones over a time window. Oster: the activated input pulses are assumed to arrive at a regular rate, which is not the real case in biological neuronal populations.

Activating variables. IFMP: currents injected by the random input/output pulses. Oster: currents injected by the regular input/output pulses.

Competition. IFMP: decided through a global neuron, as shown in Fig. 5.8(a); for example, to detect the maximum input among N neurons, the number of connections equals 3N. Oster: decided through full connectivity, as shown in Fig. 5.4(b); for example, to detect the maximum input among N = 100 neurons, the number of connections equals N(N - 1).

Procedure. IFMP: simple detection of the maximum input by the excitatory synapses and the inhibitory feedback to/from a global neuron using the margin equation. Oster: complicated detection of the maximum input through self-excitatory connections and inhibition of all the neurons in the network.

5.3 Summary

In this chapter, two significant roles of the newly designed pulse-mode computation are explored. First, it is shown that the designed pulse-mode computation enhances the dynamic range of signal processing applications that are based on probabilities, or on the rates of input/output parameters. The reason for the wide range enhancement is the mapping of the computational functions into log-sum-exp form, which eliminates the underflow problem caused by repeated multiplication of probabilities. Using a simple sequence-detection factor-graph application, it is shown that stochastic computation fails to detect a simple sequence of numbers, while the newly designed pulse computational IFMP modules recognize the sequence directly.

The second significant role of the designed module is the direct and simple implementation of an important structure in biological sensory systems known as winner-take-all. The background of WTA networks has been discussed in detail, showing that no prior work has designed a winner-take-all circuit as a spiking network whose inputs and outputs are streams of random pulses. In this section, the successful application of winner-take-all to random input pulses is compared with the single prior study in this area, which shows only a limited capability in designing spiking winner-take-all networks.
Chapter 6

Summary and Future work

In chapter one, it is stated that real-time processing and scalability are the goals of implementing signal processing applications on a chip in general, and neuromorphic systems in particular. To achieve real-time processing, the computations are processed in parallel and implemented in pulse mode. Pulse-mode computation combines the advantages of analog and digital hardware design and propagates the computed parameters easily, in real time, between successive modules. To achieve a scalable design, it is important to introduce a computational algorithm that can be cascaded so that partial stages of algorithms and procedures build on one another. As a contribution to this large effort, a novel and scalable pulse computation algorithm and its hardware implementation are designed to approximate nonlinear functions, an essential step in implementing signal processing algorithms. The designed module is based on the integrate-and-fire structure and the margin propagation algorithm (IFMP). Chapter one states the objectives and the tasks for implementing the designed pulse computational module, and chapters two through five analyze these tasks in detail. In this chapter, the objectives of the scalable IFMP algorithmic module, the tasks and methodology used to meet them, and the significant roles of the designed algorithm are summarized, with recommendations for future work.

6.1 Objectives

In order to design a scalable pulse computation algorithm and its hardware implementation, it is important to meet the following objectives:

1. Convergence property: since the computations are implemented with stochastic pulse computation, the designed algorithm has to satisfy the convergence property for each unit of the designed module.

2. Scalability property: scalability means that the basic designed module can be cascaded to handle a growing amount of work in a capable manner.

3. Computational properties: as a new computation algorithm, the module has to satisfy computational properties such as scaling, monotonicity, convexity, superposition and the offset property.

4. Pulse computation verification: the pulse-mode computational module has to verify the approximation of basic and nonlinear functions such as addition, multiplication, power, polynomials and the inner product.

5. Pulse computational applications: it is important to implement procedural algorithms to verify that the pulse computation module is capable of implementing signal processing applications such as decoding and classification.

6. Hardware realization: the basic concept of the designed module is the capability of mapping the algorithm into analog circuits.

7. Power and size efficiency: in order to design a dense array of the pulse-mode algorithm on a chip, it is recommended to map the algorithm into compact, low-power analog circuits.
8. Significant role: the designed pulse module plays a role in dynamic-range enhancement and provides a unique design for selecting the highest input rate in spiking networks.

6.2 Implemented tasks

To meet the above objectives, the following tasks and methodologies are applied to verify the scalable pulse computational module.

Design the core unit of the IFMP: The basic unit of the IFMP module is designed using the concept of the integrate-and-fire structure. The goal here is to verify the convergence of the input/output pulses such that E(L[n]) = E(d[n]), where E(L[n]) and E(d[n]) are the expected values of the input pulses L[n] and output pulses d[n] over a time period T, estimated as (1/T) Σ_{n=1}^{T} L[n] and (1/T) Σ_{n=1}^{T} d[n], respectively.

Figure 6.1: Convergence of one IF neuron with input rate equal to 0.2, showing that the output rate (R = 0.2) is equal to the input pulse rate (R1 = 0.2); time axis in 0.2 msec/step.

Figure 6.2: Convergence of one IF neuron with three inputs of rates 0.2, 0.1 and 0.3, showing that the output rate (0.6) converges to the sum of the input pulse rates (R1, R2 and R3); time axis in 0.2 msec/step.

The above convergence is proven theoretically and verified experimentally. If the IF neuron has one input, the output rate (OUT) converges to the input rate (IN1), as shown in Fig. 6.1, where both the input and output rates equal 0.2. The convergence is also verified for more than one input according to the equation Σ_{i=1}^{P} E(L_i[n]) = E(d[n]), where P is the number of input signals to the neuron. In Fig. 6.2, an experiment with three inputs (P = 3) shows that the sum of the expected values of these inputs is equal to the expected value of the output pulses.
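A minimal software sketch of this convergence property follows. The model is an illustrative assumption, not the thesis circuit: each input pulse deposits a unit of charge, and the neuron emits at most one output pulse per time step while subtracting the threshold, so the output rate tracks the summed input rate whenever that sum is below one.

import numpy as np

rng = np.random.default_rng(1)
T, rates = 100_000, [0.2, 0.1, 0.3]   # Bernoulli input pulse rates

acc, out = 0.0, 0
for t in range(T):
    # number of input pulses arriving this step (one Bernoulli draw each)
    acc += sum(rng.random() < r for r in rates)
    if acc >= 1.0:                    # integrate-and-fire with threshold 1
        out += 1                      # emit one output pulse ...
        acc -= 1.0                    # ... and keep the residual charge
print(out / T)                        # ~0.6 = 0.2 + 0.1 + 0.3

The printed output rate converges to the sum of the three input rates, matching the experiment of Fig. 6.2.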
IF application: As an application of the IF structure, a novel circuit is designed to adjust the weight of the synapse by changing the input current to the neuron. The injected current is changed according to the pre- and post-synaptic pulses of the neuron. The synapse weight is adjusted using storage transistors known as floating gates. This synapse-weight adjustment is designed with ten synapses connected to an IF neuron, and the experimental results verify the spike-time-dependency plasticity behavior that represents the mechanism of the learning process in neuronal populations.

IFMP pulse computational unit: A special network of IF neurons is designed to approximate nonlinear functions in pulse mode. The designed algorithmic module is based on margin propagation, which approximates the log-sum-exp function. The importance of log-sum-exp is that nonlinear functions map into it easily; the importance of margin propagation is that these functions can then be mapped into analog circuits. In order to verify the algorithmic operation of IFMP, the following convergence equation must be satisfied: Σ_{i=1}^{P} [E(L_i[n]) - E(L_z[n])]+ → E(γ[n]), where L_i, P, z, γ and n are the i-th input, the number of inputs, the output, the constraint and the time step, respectively. The inputs, the output and γ are quantified as the rate of ones in a stream of random pulses. The convergence of the above equation is proven theoretically and verified in software and circuit simulation, as explained in chapter 3. In an experiment with input rates γ = 0.3, L1 = 0.7 and L2 = 0.5, the output pulse rates of the neurons converge to N1 = 0.25, N2 = 0.05 and N3 = 0.45, as shown in Fig. 6.3.

Figure 6.3: Convergence of the output rates of the three neurons (N1 = 0.25, N2 = 0.05, N3 = 0.45) in the IFMP structure when the input rates γ, L1 and L2 are 0.3, 0.7 and 0.5, respectively; time axis in 0.2 msec/step.

Figure 6.4: Dynamic characteristics of a two-input IFMP module, showing the convergence of the output rates of the three neurons N1, N2 and N3 in both theoretical and circuit (Cadence) simulation, as L1 is swept with L2 = 0.3 and γ = 0.2.

Table 6.1: Pulse rates obtained in the theoretical ("theo") and Cadence ("cad") simulations for the three neurons N1, N2 and N3, where z is the output rate of neuron N3, γ = 0.2 and L2 = 0.3.

L1    N1cad   N2cad   zcad    N1theo   N2theo   ztheo
0.1   0.21    0.02    0.12    0.2      0        0.1
0.2   0.16    0.07    0.17    0.15     0.05     0.15
0.3   0.14    0.14    0.2     0.1      0.1      0.2
0.4   0.1     0.2     0.25    0.05     0.15     0.25
0.5   0.4     0.24    0.33    0        0.2      0.3
0.6   0       0.27    0.41    0        0.2      0.4
0.7   0       0.26    0.54    0        0.2      0.5
0.8   0       0.25    0.64    0        0.2      0.6
0.9   0       0.27    0.73    0        0.2      0.7

Another experiment shows the convergence of the IFMP algorithmic module in software and circuit simulation. Fig. 6.4 and Table 6.1 show the dynamic behavior of the output rates of the three neurons as the input rate L1 is swept, with L2 = 0.3 and γ = 0.2. The figure shows a difference between the theoretical and practical rates, as indicated in the legend and in the corresponding table of numerical values. The source of the difference in the circuit simulation is the accurate current balance required for IFMP convergence, namely fixing the input currents to the inhibitory and excitatory paths of the IF units. In addition, the accuracy of the output rates depends on the number of iterations of the input pulses. The implementation of IFMP as an analog circuit is important because it shows that the IFMP concept can be mapped onto a chip, achieving denser, faster and lower-power operation than software simulation.
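As a quick numerical check (my sketch, not the thesis code), the theoretical columns of Table 6.1 can be reproduced directly from the margin constraint [L1 - z]+ + [L2 - z]+ = γ, which for two inputs has a closed form:

def margin2(L1, L2, gamma):
    """Closed-form solution of [L1-z]+ + [L2-z]+ = gamma for two inputs."""
    z = 0.5 * (L1 + L2 - gamma)      # assume both hinge terms are active
    if z <= min(L1, L2):
        return z
    return max(L1, L2) - gamma       # only the larger input contributes

for L1 in [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9]:
    z = margin2(L1, 0.3, 0.2)        # L2 = 0.3, gamma = 0.2 as in Table 6.1
    print(f"L1={L1:.1f}  [L1-z]+={max(L1 - z, 0):.2f}  "
          f"[L2-z]+={max(0.3 - z, 0):.2f}  z={z:.2f}")

The printed z values match the ztheo column (0.1, 0.15, 0.2, ..., 0.7), and the two hinge residues match the N1theo and N2theo columns up to the labeling of the two neurons.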
Cascading and approximation concept of IFMP: The importance of the cascading (scalability) property is to handle a growing amount of work and to implement different algorithms on the same hardware array with different connectivity and settings. The connectivity and settings in this project are implemented using a field-programmable gate array (FPGA). In order to achieve the cascading property for the IFMP modules, the approximation between margin propagation and log-sum-exp must first be estimated in order to achieve correct computations. In chapter 3, it is proven that the deviation between these functions equals the rate γ. The simulations also show successful approximation for both parallel and serial cascading.

Properties, computation and applications: As a pulse computation algorithmic model, it is important to identify and verify the properties of these computations. In chapter 4, the scaling, offset, monotonicity, convexity and superposition properties were discussed in detail, proven, and simulated, showing that they hold for the designed computational algorithm in pulse mode. The scaling and offset properties are used frequently in the computations because the inputs, the output and γ are probabilistic, i.e., pulse-rate, terms. In order to map the IFMP to nonlinear functions, all parameters of the equation are expressed as logarithmic values. Applying the logarithm to these terms yields negative numbers, which require biasing adjustments: an offset is added and the rates are then scaled to values within the probability range (0, 1). With these properties, the IFMP module can be used to approximate mathematical functions such as multiplication, addition, power, the inner product and polynomial functions. These functions are verified in both software and circuit simulation, as explained in chapter 4. Moreover, more complicated algorithms are verified using cascaded IFMP modules, such as recursive dynamic equations and support vector machine algorithms for decoding and classification applications, respectively.

6.3 Significance of the IFMP algorithmic module

1. Dynamic range improvement. To identify the difference that the pulse computational IFMP module makes compared to other pulse computational modules, a sequence-detection application is implemented with two pulse computation techniques: stochastic pulse computation and the IFMP algorithmic module. Stochastic computation is an alternative approach for iterative computation on factor graphs in which information is represented by the statistics of a bit stream, resulting in simple, high-speed hardware implementations of graph-based algorithms. Stochastic arithmetic was introduced to design low-precision digital circuits, and an important motivation for it was the possibility of performing complex computations using only simple circuitry. In stochastic computation, as in IFMP pulse computation, probabilities are represented as streams of random digital bits. Using this representation, complex operations on probabilities, such as multiplication and division, are converted into operations on bits that can be manipulated by simple stochastic gates. Although stochastic computation seems feasible, it fails to implement even a simple detection application. In chapter 5, a simulation of the "sequence detection" factor-graph application is implemented using stochastic computation. The simulation shows that the low range of the processing parameters causes a failure to recognize an input sequence to the hidden Markov model with a matching observation, while the IFMP computation detects the same sequence. This indicates that the logarithmic representation of the computational parameters extends the range of the small probability values in the computations. A small illustration of this underflow effect follows.
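This sketch (with illustrative factor values, not the thesis testbench) multiplies small probabilities the stochastic-computation way, by ANDing independent Bernoulli bit streams, and contrasts the result with the log-domain representation on which IFMP operates:

import numpy as np

rng = np.random.default_rng(2)
T = 10_000                               # bit-stream window length
probs = [0.05, 0.04, 0.03, 0.05]         # small HMM-like factors

# Stochastic computation: each probability is a Bernoulli bit stream,
# and the product is the AND of the streams. The true product is 3e-6,
# far below the 1/T = 1e-4 resolution of the window, so the result is
# (almost surely) an all-zero stream: the value underflows.
streams = [rng.random(T) < p for p in probs]
print("stochastic product estimate:", np.logical_and.reduce(streams).mean())

# Log domain: the same product is a moderate, well-represented number.
print("log-domain sum:", sum(np.log(p) for p in probs))   # about -12.7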
2. Novel design of a spiking winner-take-all network. In chapter 5, the background of winner-take-all (WTA) designs in continuous-time systems and in spiking networks is discussed. Many attempts are listed in the background survey, reflecting the importance of WTA in the receptive fields of sensory systems. Many researchers are interested in the design of spiking networks, but most treat the spiking WTA as a circuit that processes continuous signals, with encoders and decoders for spike-to-continuous and continuous-to-spike conversion. The only network that directly addresses a spiking WTA is Oster's architecture, in which the input and output signals are quantified as spike rates. Yet these input spikes must be regular in order to identify the winner among the input pulse streams; if that architecture is driven by random input pulses, it fails to identify the winner. The significance and novelty of the pulse computational IFMP unit is the identification of the winning input even when the input streams are random pulses (variable time period between the pulses of a stream). In chapter 5, it is shown that the designed IFMP module realizes the winner-take-all (WTA) concept in pulsed computation and that the idea of WTA is embedded in the IFMP algorithm: the IFMP approximates the maximum input if the inhibiting parameter γ is chosen as a small rate (for example, 0.001), so IFMP propagates the maximum input rate. The novelty of the designed winner-take-all is that it works when the input pulses are stochastic, in contrast with the single design found in the survey, which implements a spiking WTA only for regular input pulses.

3. Chip test verification. A further significant role of the designed pulse computational module is the successful realization of the algorithmic concept as an analog circuit and layout. The designed pulse computation algorithm is realized on a chip as an array of 64 two-input IFMP units. A pair of 3-to-8 decoders selects the IFMP unit in the array, and 16 pull-up transistors propagate the output pulses to identify the x-y coordinates of the IFMP unit that generates a spike. Once the IFMP units are verified as stable units in the silicon layout, as in simulation, the IFMP array is ready to approximate the basic functions and the algorithmic applications. In order to build an architecture for any application, a state machine is designed in a hardware description language (HDL) and configured on an interfacing field-programmable gate array (FPGA). The HDL program generates random pulses that are applied to the IFMP units in time-multiplexed mode. The number of IFMP units required for an application depends on the complexity of the algorithm. At each time step, the outputs of the IFMP units are detected and routed using a lookup table that represents the connectivity of the IFMP network (a small software model of this routing step follows the verification list below). The output pulses are stored in FPGA registers for the analysis and verification of the applied algorithmic application. Following this procedure, the IFMP array has been tested and verified as follows:

(a) Testing and verification of the convergence property of the IFMP algorithmic module on a chip, as discussed in chapter 3.

(b) Testing and verification of a sequence-decoder application using the HMM algorithm, as discussed in chapter 4.
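The following toy model (the lookup-table format and all names are my assumptions, standing in for the HDL state machine) illustrates the routing step described above:

from collections import defaultdict

# Hypothetical connectivity table: unit 0's output feeds unit 1's
# input 0; unit 1's output feeds both inputs of unit 2; unit 2 is final.
lut = {0: [(1, 0)], 1: [(2, 0), (2, 1)], 2: []}

def route(array_outputs):
    """array_outputs: {unit: 0/1 pulse this time step}.
    Returns the input pulses routed to each (unit, input) pair."""
    routed = defaultdict(int)
    for unit, pulse in array_outputs.items():
        if pulse:
            for dest_unit, dest_input in lut[unit]:
                routed[(dest_unit, dest_input)] = 1
    return dict(routed)

print(route({0: 1, 1: 0, 2: 1}))   # {(1, 0): 1}: only unit 0's pulse routes

In the real system this loop runs on the FPGA: the same IFMP array serves different applications simply by loading a different connectivity table.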
The same chip can be used for other applications by the same procedure. This is the scalability property, which means that there is no need to redesign the hardware for different applications. Such scalability is essential in designing neuromorphic architectures that require dense neuron modules to implement signal processing applications using an FPGA or any other interfacing programmable controller.

6.4 Future work

Since the newly designed module is significant for spiking networks and is scalable for detection and classification algorithms, the following recommendations are made:

1. Build speech recognition and image processing systems at the hardware level. A pre-stage is required for both applications to encode/decode images and speech into/from sequences of pulses.

2. In neuromorphic applications, use the designed arrays as a computational model for mimicking biological sensory systems, especially since induced noise is considered beneficial for enhancing performance.

3. In order to obtain accurate convergence of each neuron and each IFMP unit of the array at the hardware level, fix the biasing voltages using floating-gate transistors. This is a substantial project, requiring row and column shift registers to address each biasing voltage.

6.5 Publications

Book chapters

1. Thamira Hindo and Shantanu Chakrabartty, "Noise-Exploitation and Adaptation in Neuromorphic Sensors", in Biomimetics and Bioreplication, edited by Akhlesh Lakhtakia, Elsevier, 2012.

2. Thamira Hindo, "Basic Computations using a Novel Scalable Pulse-Mode Modules", invited paper, IAENG Transactions on Engineering Technologies, Special Issue of the World Congress on Engineering and Computer Science 2012.

Conferences

1. Thamira Hindo and Shantanu Chakrabartty, "Noise-Exploitation and Adaptation in Neuromorphic Sensors", in Bioinspiration, Biomimetics, and Bioreplication, edited by Akhlesh Lakhtakia, Proceedings of the SPIE, volume 8339, pp. 833905-1 to 833905-13, 2012.

2. Thamira Hindo, "An Asynchronous, Time-Domain Analog Hidden Markov Model Circuit Based on Integrate and Fire Margin Propagation", 5th International Conference on Computer and Electrical Engineering (ICCEE), Hong Kong, Oct. 2012.

3. Thamira Hindo, "Scalable Pulsed Computational Module Using Integrate and Fire Structure and Margin Propagation Algorithm", World Congress on Engineering and Computer Science, San Francisco, Oct. 2012.

4. Thamira Hindo and Timothy Grotjohn, "Design of Scalable Pulse Computational Module Using Integrate and Fire Structure", Engineering Graduate Research Symposium, Michigan State University, East Lansing, Nov. 2012.

5. Thamira Hindo and Timothy Grotjohn, "Design of pulse-based winner take all network using integrate and fire neuron and margin propagation algorithm", Fifth Annual Graduate Academic Conference, Michigan State University, Feb. 2013.

Journals

1. Thamira Hindo, "Classification Using Novel Spiking Computation Module", Electrical and Electronics Engineering: An International Journal (EEEIJ), ISSN 2200-5846, accepted 6 November 2012.

2. Thamira Hindo, "Properties and Applications of Scalable Pulse Computational Module Using Integration and Threshold Operation", submitted to Neurocomputing on 9/2/2012.

BIBLIOGRAPHY

[1] C. A. Mead. Analog VLSI and Neural Systems. Addison-Wesley, 1989.

[2] M. Mahowald. VLSI Analogs of Neuronal Visual Processing: A Synthesis of Form and Function. Technical report, California Institute of Technology, Pasadena, 1992.

[3] K. Boahen. Neuromorphic microchips.
Scientific American, 2005.

[4] S.-C. Liu and T. Delbruck. Neuromorphic sensory systems. Current Opinion in Neurobiology, 20:1–8, 2010.

[5] T. Hindo and S. Chakrabartty. Noise-exploitation and adaptation in neuromorphic sensors. In Proceedings of SPIE, Bioinspiration, Biomimetics, and Bioreplication, volume 8339, March 2012.

[6] R. Sarpeshkar. Analog versus digital: Extrapolating from electronics to neurobiology. Neural Computation, 10(7):1601–1638, 1998.

[7] Y. W. Li, K. L. Shepard, and Y. P. Tsividis. Continuous-time digital signal processors. In Asynchronous Circuits and Systems, 2005. ASYNC 2005. Proceedings. 11th IEEE International Symposium on, pages 138–143, March 2005.

[8] K. Boahen. "Googling the brain on a chip", Stanford University, YouTube, 2009.

[9] S.-C. Liu. Introduction to Neuromorphic and Bio-inspired Mixed Signal VLSI. KLAB.Caltech.edu/Shih, 2006.

[10] Shih-Chii Liu, Tobias Delbruck, Jörg Kramer, Giacomo Indiveri, and Rodney Douglas. Analog VLSI: Circuits and Principles. MIT Press, Cambridge, MA, USA, 2002.

[11] W. Gerstner and W. M. Kistler. Spiking Neuron Models: Single Neurons, Populations, Plasticity. Cambridge University Press, 1st edition, 2002.

[12] A. A. Lazar and E. A. Pnevmatikakis. Video time encoding machines. Neural Networks, IEEE Transactions on, 22(3):461–473, March 2011.

[13] P. A. Cariani. Temporal codes and computations for sensory representation and scene analysis. Neural Networks, IEEE Transactions on, 15(5):1100–1111, Sept. 2004.

[14] John V. Arthur and Kwabena Boahen. Learning in silicon: Timing is everything. In NIPS, 2005.

[15] H. Paugam-Moisy and S. M. Bohte. Computing with spiking neuron networks. (to appear), 2011.

[16] C. A. Mead and M. A. Mahowald. A silicon model of early visual processing. Neural Networks, pages 91–97, 1988.

[17] S. S. Tehrani, S. Mannor, and W. J. Gross. Survey of stochastic computation on factor graphs. In Multiple-Valued Logic, 2007. ISMVL 2007. 37th International Symposium on, page 54, May 2007.

[18] S. Sharifi Tehrani, W. J. Gross, and S. Mannor. Stochastic decoding of LDPC codes. Communications Letters, IEEE, 10(10):716–718, Oct. 2006.

[19] S. Sharifi Tehrani. Digital yet deliberately random: Synthesizing logical computation on stochastic bit streams. 2011.

[20] B. D. Brown and H. C. Card. Stochastic neural computation I: Computational elements. Computers, IEEE Transactions on, 50(9):891–905, Sept. 2001.

[21] Weikang Qian, Xin Li, Marc D. Riedel, Kia Bazargan, and David J. Lilja. An architecture for fault-tolerant computation with stochastic logic. IEEE Transactions on Computers, 60:93–105, 2011.

[22] J. Miller and W. Yang. Simple pulse asynchronous state machines. In Circuits and Systems, 1996. ISCAS '96., Connecting the World., 1996 IEEE International Symposium on, volume 3, pages 405–409, May 1996.

[23] A. F. Murray and A. V. W. Smith. Asynchronous VLSI neural networks using pulse-stream arithmetic. Solid-State Circuits, IEEE Journal of, 23(3):688–697, June 1988.

[24] J. L. Meador, A. Wu, C. Cole, N. Nintunze, and P. Chintrakulchai. Programmable impulse neural circuits. Neural Networks, IEEE Transactions on, 2(1):101–109, Jan. 1991.

[25] A. Hamilton, A. F. Murray, D. J. Baxter, S. Churcher, H. M. Reekie, and L. Tarassenko. Integrated pulse stream neural networks: results, issues, and pointers. Neural Networks, IEEE Transactions on, 3(3):385–393, May 1992.

[26] L. M. Reyneri. Theoretical and implementation aspects of pulse streams: an overview.
In Proceedings of the 7th International Conference on Microelectronics for Neural, Fuzzy and Bio-Inspired Systems, MICRONEURO '99, pages 78–, Washington, DC, USA, 1999. IEEE Computer Society.

[27] N. Shaikh-Husin and Chang Wooi Po. Pulse coded neural network implementation in VLSI. In TENCON 2000. Proceedings, volume 3, pages 237–241, 2000.

[28] A. L. Hodgkin and A. F. Huxley. A quantitative description of membrane current and its application to conduction and excitation in nerve. The Journal of Physiology, 1952.

[29] D. Tal, E. L. Schwartz, and M. Greis. Computing with the leaky integrate and fire neuron: Logarithmic computation and multiplication. Neural Computation, 19(2):305–318, 1997.

[30] G. Bugmann. Biologically plausible neural computation. Biosystems, 40:11–19, 1997.

[31] Christof Koch and Tomaso Poggio. Multiplying with synapses and neurons. In Thomas McKenna, Joel Davis, and Steven F. Zornetzer, editors, Single Neuron Computation, Neural Nets: Foundations to Applications, pages 315–345. Academic Press, 1992.

[32] D. Hammerstrom. A highly parallel digital architecture for neural network emulation. In J. G. Delgado-Frias and W. R. Moore, editors, VLSI for Artificial Intelligence and Neural Networks, pages 357–366. Plenum Press, 1991.

[33] J. Cabestany, P. Ienne, J. M. Moreno, and J. Madrenas. Is there a future for ANN hardware?, 1996.

[34] A. Murray and L. Tarassenko. Analogue Neural VLSI: A Pulse Stream Approach. Chapman and Hall, 1994.

[35] J. J. Hopfield. Neural networks and physical systems with emergent collective computational abilities. Proceedings of the National Academy of Sciences of the United States of America, 79(8):2554–2558, 1982.

[36] A. F. Murray and A. V. W. Smith. Asynchronous VLSI neural networks using pulse-stream arithmetic. Solid-State Circuits, IEEE Journal of, 23(3):688–697, June 1988.

[37] P. Hylander, J. Meader, and E. Frie. VLSI implementation of pulse coded winner take all networks. In Circuits and Systems, 1993. Proceedings of the 36th Midwest Symposium on, pages 758–761, Aug. 1993.

[38] R. P. N. Rao. Neural models of Bayesian belief propagation. In The Bayesian Brain: Probabilistic Approaches to Neural Coding, K. Doya, S. Ishii, A. Pouget, and R. P. N. Rao, editors. Cambridge, MA: MIT Press, 2006.

[39] H. A. Loeliger. An introduction to factor graphs. Signal Processing Magazine, IEEE, 21(1):28–41, January 2004.

[40] Ming Gu and S. Chakrabartty. Synthesis of bias-scalable CMOS analog computational circuits using margin propagation. Circuits and Systems I: Regular Papers, IEEE Transactions on, 59(2):243–254, Feb. 2012.

[41] C. Kong and S. Chakrabartty. Analog iterative LDPC decoder based on margin propagation. Circuits and Systems II: Express Briefs, IEEE Transactions on, 54(12):1140–1144, Dec. 2007.

[42] E. M. Izhikevich. Simple model of spiking neurons. Neural Networks, IEEE Transactions on, 14(6):1569–1572, Nov. 2003.

[43] Wulfram Gerstner and Werner M. Kistler. Spiking Neuron Models: Single Neurons, Populations, Plasticity. Cambridge University Press, 1st edition, August 2002.

[44] Bruce Segee. Methods in neuronal modeling: From ions to networks, 2nd edition. Computing in Science and Engineering, 1:81, 1999.

[45] G. Indiveri et al. Neuromorphic silicon neuron circuits. Frontiers in Neuroscience, 5:1–23, 2011.

[46] E. M. Izhikevich. Dynamical Systems in Neuroscience: The Geometry of Excitability and Bursting. MIT Press, 2007.

[47] A. A. Lazar and L. T. Toth. Time encoding and perfect recovery of bandlimited signals.
In Acoustics, Speech, and Signal Processing, 2003. Proceedings. (ICASSP '03). 2003 IEEE International Conference on, volume 6, pages VI-709–712, April 2003.

[48] M. Rastogi, A. S. Alvarado, J. G. Harris, and J. C. Principe. Integrate and fire circuit as an ADC replacement. In Circuits and Systems (ISCAS), 2011 IEEE International Symposium on, pages 2421–2424, May 2011.

[49] H. Y. Yang and R. Sarpeshkar. A bio-inspired ultra-energy-efficient analog-to-digital converter for biomedical applications. Circuits and Systems I: Regular Papers, IEEE Transactions on, 53(11):2349–2356, Nov. 2006.

[50] F. Ratliff. Mach bands: quantitative studies on neural networks in the retina. MIT Press, 1965.

[51] A. L. Hodgkin and A. F. Huxley. A quantitative description of membrane current and its application to conduction and excitation in nerve. The Journal of Physiology, 117(4):500–544, 1952.

[52] Shih-Chii Liu. Analog VLSI circuits for short-term dynamic synapses. EURASIP Journal on Applied Signal Processing, 7:620–628, Jan. 2003.

[53] J. M. Cruz-Albrecht, M. W. Yung, and N. Srinivasa. Energy-efficient neuron, synapse and STDP integrated circuits. Biomedical Circuits and Systems, IEEE Transactions on, 6(3):246–256, June 2012.

[54] Paul E. Hasler, Bradley A. Minch, and Chris Diorio. Floating-gate devices: they are not just for digital memories any more. In ISCAS (2), pages 388–391. IEEE, 1999.

[55] V. Srinivasan, G. Serrano, C. M. Twigg, and P. Hasler. A floating-gate-based programmable CMOS reference. Circuits and Systems I: Regular Papers, IEEE Transactions on, 55(11):3448–3456, Dec. 2008.

[56] T. Morie, O. Fujita, and K. Uchimura. Self-learning neural network LSI with high-resolution non-volatile analog memory. In Neural Networks, 1995. Proceedings., IEEE International Conference on, volume 4, pages 1628–1632, Nov./Dec. 1995.

[57] F. Tenore, R. J. Vogelstein, R. Etienne-Cummings, G. Cauwenberghs, and P. Hasler. A floating-gate programmable array of silicon neurons for central pattern generating networks. In Circuits and Systems, 2006. ISCAS 2006. Proceedings. 2006 IEEE International Symposium on, 4 pp., May 2006.

[58] M. Lenzlinger. Fowler–Nordheim tunneling into thermally grown SiO2. Journal of Applied Physics, 40(1):278, 1969.

[59] D. Kahng and S. M. Sze. A floating-gate and its applications to memory devices. The Bell System Technical Journal, XLVI(6):1288–1295, July 1967.

[60] Giacomo Indiveri. Neuromorphic bistable VLSI synapses with spike-timing-dependent plasticity. In Advances in Neural Information Processing Systems, volume 2002, pages 1091–1098, 2002.

[61] John V. Arthur and Kwabena Boahen. Learning in silicon: Timing is everything. In Advances in Neural Information Processing Systems 18 [Neural Information Processing Systems, NIPS 2005, December 5-8, 2005, Vancouver, British Columbia, Canada], 2005.

[62] A. Bofill and A. Murray. Synchrony detection by analogue VLSI neurons with bimodal STDP synapses. In Advances in Neural Information Processing Systems. MIT Press, 2003.

[63] C. Diorio, P. Hasler, B. A. Minch, and C. A. Mead. A floating-gate MOS learning array with locally computed weight updates. Electron Devices, IEEE Transactions on, 44(12):2281–2289, Dec. 1997.

[64] P. Hafliger and C. Rasche. Floating gate analog memory for parameter and variable storage in a learning silicon neuron. In Circuits and Systems, 1999. ISCAS '99. Proceedings of the 1999 IEEE International Symposium on, volume 2, pages 416–419, July 1999.

[65] Shih-Chii Liu and R. Mockel.
Temporally learning floating-gate VLSI synapses. In Circuits and Systems, 2008. ISCAS 2008. IEEE International Symposium on, pages 2154–2157, May 2008.

[66] D. Robert and M. C. Göpfert. Novel schemes for hearing and orientation in insects. Current Opinion in Neurobiology, 12(6):715–720, December 2002.

[67] T. Hindo. Scalable pulsed computational module using integrate and fire structure and margin propagation algorithm. In World Congress on Engineering and Computer Science 2012, San Francisco, pages 10–16, 2012.

[68] J. Tapson and A. van Schaik. An asynchronous parallel neuromorphic ADC architecture. IEEE, ISCAS, 2012.

[69] Ming Gu, K. Misra, H. Radha, and S. Chakrabartty. Sparse decoding of low density parity check codes using margin propagation. In Global Telecommunications Conference, 2009. GLOBECOM 2009. IEEE, pages 1–6, Nov. 30–Dec. 4, 2009.

[70] M. D. Godfrey. CMOS device modeling for subthreshold circuits. Circuits and Systems II: Analog and Digital Signal Processing, IEEE Transactions on, 39(8):532–539, Aug. 1992.

[71] C. C. Enz. High Precision CMOS Micropower Amplifiers. PhD thesis, 1989. No. 802.

[72] E. Vittoz. MOS Transistor. Intensive Summer Course on CMOS VLSI Design, Swiss Federal Institute of Technology, 2007.

[73] R. Sarpeshkar, T. Delbruck, and C. Mead. White noise in MOS transistors and resistors. IEEE Circuits and Devices, 9(6):23–29, June 1993.

[74] D. J. Mar, C. C. Chow, W. Gerstner, R. W. Adams, and J. J. Collins. Noise shaping in populations of coupled model neurons. Proceedings of the National Academy of Sciences, 96(18):10450–10455, August 1999.

[75] M. J. Chacron, B. Lindner, L. Maler, A. Longtin, and J. Bastian. Experimental and theoretical demonstration of noise shaping by interspike interval correlations. In Fluctuations and Noise in Biological, Biophysical, and Biomedical Systems III, page 150. SPIE 5841, 2005.

[76] R. K. Shaga. Noise-Shaping Optimization and Online Learning with Applications to Digitally Assisted Analog Circuits. Thesis, Michigan State University, 2012.

[77] L. R. Rabiner. A tutorial on hidden Markov models and selected applications in speech recognition. Proceedings of the IEEE, 77(2):257–286, Feb. 1989.

[78] J. Lazzaro, J. Wawrzynek, and R. P. Lippmann. A micropower analog circuit implementation of hidden Markov model state decoding. Solid-State Circuits, IEEE Journal of, 32(8):1200–1209, Aug. 1997.

[79] T. Hindo. An asynchronous, time-domain analog hidden Markov model circuit based on integrate and fire margin propagation. In 5th International Conference on Computer and Electrical Engineering (ICCEE), Hong Kong, pages 10–16, 2012.

[80] Vladimir N. Vapnik. The Nature of Statistical Learning Theory. Springer-Verlag New York, Inc., New York, NY, USA, 1995.

[81] John C. Platt. Fast training of support vector machines using sequential minimal optimization. In Advances in Kernel Methods, pages 185–208. MIT Press, Cambridge, MA, USA, 1999.

[82] Thorsten Joachims. Transductive inference for text classification using support vector machines. In Proceedings of the Sixteenth International Conference on Machine Learning, ICML '99, pages 200–209, San Francisco, CA, USA, 1999. Morgan Kaufmann Publishers Inc.

[83] Thorsten Joachims. Training linear SVMs in linear time. In Proceedings of the 12th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD '06, pages 217–226, New York, NY, USA, 2006. ACM.

[84] S. Chakrabartty and G. Cauwenberghs.
Gini-support vector machine: quadratic entropy based multi-class probability regression. Journal of Machine Learning Research, 8:813–839, 2007.

[85] J. Lazzaro, S. Ryckebusch, M. A. Mahowald, and C. A. Mead. Winner-take-all networks of O(n) complexity. In Advances in Neural Information Processing Systems, pages 703–711. Morgan Kaufmann, 1989.

[86] Matthias Oster, Rodney Douglas, and Shih-Chii Liu. Computation with spikes in a winner-take-all network. Neural Computation, 21(9):2437–2465, September 2009.

[87] S. Grossberg. Contour enhancement, short term memory, and constancies in reverberating neural networks. Studies in Applied Mathematics, 52(3):213–257, 1973.

[88] G. Indiveri, C. Rasche, and R. Douglas. Neuromorphic sensory processing using analog very large-scale integration. In Frontiers of Life. Academic Press, San Diego, CA, 2002.

[89] Shin-Ichi Amari and Michael A. Arbib. Competition and Cooperation in Neural Nets. Springer, Berlin, 1982.

[90] Rodney J. Douglas and Kevan A. Martin. Neuronal circuits of the neocortex. Annual Review of Neuroscience, 27(1):419–451, 2004.

[91] M. Riesenhuber and T. Poggio. Hierarchical models of object recognition in cortex, 1999.

[92] Laurent Itti, Christof Koch, and Ernst Niebur. A model of saliency-based visual attention for rapid scene analysis. IEEE Transactions on Pattern Analysis and Machine Intelligence, 20(11):1254–1259, 1998.

[93] L. Carota, G. Indiveri, and V. Dante. A software-hardware selective attention system. Neurocomputing, 58-60:647–653, June 2004.

[94] A. L. Yuille and N. M. Grzywacz. A winner-take-all mechanism based on presynaptic inhibition feedback. Neural Computation, 1(3):334–347, 1989.

[95] Bard Ermentrout. Complex dynamics in winner-take-all neural nets with slow inhibition. Neural Networks, 5(3):415–431, 1992.

[96] John Lazzaro, Sylvie Ryckebusch, Misha Mahowald, and Carver Mead. Winner-take-all networks of O(n) complexity. In David S. Touretzky, editor, NIPS, pages 703–711. Morgan Kaufmann, 1988.

[97] Samuel Kaski and Teuvo Kohonen. Winner-take-all networks for physiological models of competitive learning. Neural Networks, 1994.

[98] J. A. Barnden and K. Srinivas. Temporal winner-take-all networks: a time-based mechanism for fast selection in neural networks. Neural Networks, IEEE Transactions on, 4(5):844–853, September 1993.

[99] R. Hahnloser, R. Sarpeshkar, M. Mahowald, R. J. Douglas, and S. Seung. Digital selection and analog amplification co-exist in an electronic circuit inspired by neocortex. Nature, 405(6789):947–951, 2000.

[100] Y. He and E. Sanchez-Sinencio. Min-net winner-take-all CMOS implementation. Electronics Letters, 29(14):1237–1239, July 1993.

[101] J. A. Starzyk and X. Fang. CMOS current mode winner-take-all circuit with both excitatory and inhibitory feedback. Electronics Letters, 29(10):908–910, May 1993.

[102] R. Serrano-Gotarredona, M. Oster, P. Lichtsteiner, A. Linares-Barranco, R. Paz, F. Gómez-Rodríguez, H. Kolle Riis, T. Delbrück, S.-C. Liu, S. Zahnd, A. M. Whatley, R. Douglas, P. Häfliger, G. Jiménez-Moreno, and A. Civit. AER building blocks for multi-layer multi-chip neuromorphic vision systems. In Advances in Neural Information Processing Systems, pages 1217–1224. MIT Press, 2005.

[103] Yuguang Fang, Michael A. Cohen, and Thomas G. Kincaid. Dynamics of a winner-take-all neural network. Neural Networks, 9(7):1141–1154, 1996.

[104] G. Indiveri, T. Horiuchi, E. Niebur, and R. J. Douglas. A competitive network of spiking VLSI neurons. In F. Rattay, editor, World Congress on Neuroinformatics, ARGESIM Report no.
20, pages 443–455, Vienna, 2001. ARGESIM / ASIM-Verlag.

[105] R. Möller, M. Maris, J. Tomes, and A. Mojaev. A strong winner-take-all neural network in analogue hardware, 1998.

[106] S.-C. Liu and the Institute of Neuroinformatics (Zürich). A Winner-Take-All Circuit with Controllable Soft Max Property. INI posters, Institute of Neuroinformatics, ETH/UNIZ, 2001.

[107] Dezhe Z. Jin and H. Sebastian Seung. Fast computation with spikes in a recurrent neural network. Physical Review E, 65(5):051922, May 2002.

[108] Angela J. Yu, Martin A. Giese, and Tomaso A. Poggio. Biophysiologically plausible implementations of the maximum operation. Neural Computation, 14(12):2857–2881, December 2002.

[109] Jens Petter Abrahamsen, Philipp Häfliger, and Tor Sverre Lande. A time domain winner-take-all network of integrate-and-fire neurons. In ISCAS (5), pages 361–364, 2004.

[110] Matthias Oster, Yingxue Wang, Rodney Douglas, and Shih-Chii Liu. Quantification of a spike-based winner-take-all VLSI network, 2008.

[111] Erik D. Lumer. Effects of spike timing on winner-take-all competition in model cortical circuits. Neural Computation, 12(1):181–194, January 2000.

[112] E. Chicca, P. Lichtsteiner, T. Delbruck, G. Indiveri, and R. J. Douglas. Modeling orientation selectivity using a neuromorphic multi-chip system. In International Symposium on Circuits and Systems, ISCAS 2006, pages 1235–1238. IEEE, 2006.

[113] L. Carota, G. Indiveri, and V. Dante. A software-hardware selective attention system, 2004.

[114] C. Bartolozzi and G. Indiveri. A neuromorphic selective attention architecture with dynamic synapses and integrate-and-fire neurons. In Brain Inspired Cognitive Systems, BICS 2004, volume BIS2.2, pages 1–6, Aug. 2004.

[115] J. Gautrais and S. Thorpe. Rate coding versus temporal order coding: a theoretical approach. Biosystems, 48(1-3):57–65, 1998.

[116] Teresa Serrano-Gotarredona and Bernabé Linares-Barranco. A modular current-mode high-precision winner-take-all circuit. In ISCAS, pages 557–560, 1994.

[117] T. Y. W. Choi, P. A. Merolla, J. V. Arthur, K. A. Boahen, and B. E. Shi. Neuromorphic implementation of orientation hypercolumns. IEEE Transactions on Circuits and Systems I: Regular Papers, 52(6):1049–1060, June 2005.

[118] E. Chicca, P. Lichtsteiner, T. Delbruck, G. Indiveri, and R. J. Douglas. Modeling orientation selectivity using a neuromorphic multi-chip system. In International Symposium on Circuits and Systems, ISCAS 2006, pages 1235–1238. IEEE, 2006.

[119] R. Jacob Vogelstein, Udayan Mallik, Eugenio Culurciello, Gert Cauwenberghs, and Ralph Etienne-Cummings. A multichip neuromorphic system for spike-based visual information processing, 2007.