NOISE-SHAPING STOCHASTIC OPTIMIZATION AND ONLINE LEARNING WITH APPLICATIONS TO DIGITALLY-ASSISTED ANALOG CIRCUITS

By Ravi Krishna Shaga

A THESIS Submitted to Michigan State University in partial fulfillment of the requirements for the degree of MASTER OF SCIENCE, Electrical Engineering, 2011

ABSTRACT

NOISE-SHAPING STOCHASTIC OPTIMIZATION AND ONLINE LEARNING WITH APPLICATIONS TO DIGITALLY-ASSISTED ANALOG CIRCUITS

By Ravi Krishna Shaga

Analog circuits that use on-chip digital-to-analog converters for calibration rely on DSP-based algorithms for optimizing and calibrating the system parameters. However, the performance of traditional online gradient-descent based optimization and calibration algorithms suffers from artifacts due to quantization noise, which adversely affect real-time and precise convergence to the desired parameters. This thesis proposes and analyzes a novel class of online learning algorithms that can noise-shape the effect of quantization noise during the adaptation procedure and, in the process, achieve faster spectral convergence compared to the conventional quantized gradient-descent approach. We extend the proposed framework to higher-order noise-shaping and derive criteria for achieving optimal system performance. The thesis also explores the application of stochastic perturbative gradient-descent techniques to the proposed noise-shaping online learning framework, where we show that the performance of the stochastic algorithm can be improved in the spectral domain. The thesis applies the proposed optimization method to online calibration of subthreshold analog circuits, where artifacts like mismatch and non-linearity are more pronounced. Using measured results obtained from a prototype fabricated in a 0.5µm CMOS process, we demonstrate the robustness of the proposed algorithm for the tasks of: (a) compensating for and tracking offset parameters; and (b) calibrating the center frequency of a sub-threshold gm-C biquad filter.

Dedicated to my parents, Narsaiah and Pushpa Latha, for their unconditional love and support

ACKNOWLEDGMENT

The completion of my graduate studies has been one of the most significant academic challenges that I have undertaken. This endeavour would not have been possible without the help and support of several people. I wish to express my gratitude to all of them. First and foremost, I would like to take this opportunity to express my deepest appreciation to Prof. Shantanu Chakrabartty for his support, patience and guidance during my graduate studies. As an academic advisor, he was always approachable and ready to help in every way possible. My special thanks to him for helping me in some of my difficult times, both academically and personally. I would also like to thank Prof. Jack R. Deller and Prof. Selin Aviyente for being on my MS thesis committee and guiding me through the process. I would like to extend my special thanks to my labmates Chenling Huang, Amin Fazel, Ming Gu, Amit Gore and Yang Liu, with whom I have spent a considerable amount of time, and I am grateful for their invaluable guidance during the early stages of my research. I have been fortunate enough to be in the company of a lot of good friends who made my journey through graduate school really memorable. They were indeed like a family to me here, helping me in every step of the way. Lastly, I owe my deepest regards and gratitude to my parents, Narsaiah and Pushpa Latha, and my brothers, Murali and Shivaram, for their unconditional love and support, which has carried me to where I am today.
TABLE OF CONTENTS

List of Tables
List of Figures

1 Introduction

2 Gradient Descent Optimization
2.1 Optimization problem
2.1.1 Gradient Descent Algorithm
2.1.2 Noisy Gradient Descent Algorithm
2.1.3 Noise Shaping Gradient Descent Algorithm
2.2 Online Learning Example
2.2.1 Gradient Descent Tracking
2.2.2 Effect of Quantization
2.2.3 Σ∆ Gradient Descent Algorithm

3 Generalized Noise Shaping Algorithm
3.1 Higher Order Σ∆ Modulation
3.2 Batch Gradient Descent Optimization

4 Gradient-Free Stochastic Optimization
4.1 Finite Difference Stochastic Approximation
4.2 Simultaneous Perturbation Stochastic Approximation

5 Application to Analog Circuit Calibration
5.1 Analog Circuit Design of a Spectral Feature Extractor
5.1.1 Bandpass Filter Realization
5.1.2 Transconductor
5.1.3 Current DAC
5.1.4 Halfwave Rectifier
5.1.5 Σ∆ Modulator
5.2 Test Station Setup
5.3 Calibration of Sigma Delta Modulator
5.4 Calibration of a bandpass filter

6 Conclusion

Bibliography

LIST OF TABLES

2.1 SNR (in dB) for different noise levels of the ideal, noisy and Σ∆ gradient descent algorithms for a sample speech utterance '26 81 57' from the YOHO database
2.2 SNR (in dB) for different quantization levels of the different optimization algorithms on a sample speech utterance "26 81 57"
5.1 Measured specifications of the fabricated single-channel bandpass filter chip
5.2 Variation of the bandpass filter characteristics across two different chips operating under similar conditions

LIST OF FIGURES

1.1 Schematic diagram of a generic online learning system
2.1 Block diagram of the noisy gradient descent optimization algorithm
2.2 Block diagram of the first order noise-shaping gradient descent algorithm
2.3 Frequency domain response of the original sinusoid, ideal, noisy and noise-shaped gradient descent algorithms. For interpretation of the references to color in this and all other figures, the reader is referred to the electronic version of this thesis.
2.4 Convergence of the noise shaping gradient descent algorithm to the global minimum
2.5 Tracking a sinusoid signal with ideal, noisy and Σ∆ gradient descent algorithms
2.6 Rate of convergence of the ideal, noisy and Σ∆ gradient descent algorithms
2.7 Error magnitude in the case of ideal (in black), noisy (in blue) and Σ∆ (in green) gradient descent algorithms for a noise floor of (a) 30dB, (b) 10dB, (c) -10dB and (d) -30dB
2.8 FFT of the sample speech utterance "26 81 57" (in red) along with the noisy (in blue) and Σ∆ (in green) learning algorithm
2.9 Real-time tracking of a sinusoid signal with ideal gradient descent algorithm for ε = 0.01, 0.1 and 1.0
2.10 Frequency dependency of the relative error E(jω)/Y(jω) in gradient descent optimization for different values of the adaptation rate α
2.11 Variation of the relative error w.r.t. signal magnitude and quantization error for a quantized gradient descent algorithm for α = 0.5
2.12 Variation of noise floor (in dB) with increase in no. of quantization bits
2.13 Variation of the relative error w.r.t. normalized frequency for a quantized (in red) and Σ∆ (in green) gradient descent algorithm for different values of α
2.14 Error (in dB) vs normalized frequency of ideal gradient descent (in black), quantized gradient descent (in blue) and 1st order Σ∆ gradient descent (in red) optimization algorithms with a 1-bit quantizer for (a) ε = 0.1, (b) ε = 0.2, (c) ε = 0.5 and (d) ε = 1.0
2.15 Error (in dB) vs normalized frequency of ideal gradient descent (in black), quantized gradient descent (in blue) and 1st order Σ∆ gradient descent (in red) optimization algorithms with a 4-bit quantizer for (a) ε = 0.1, (b) ε = 0.2, (c) ε = 0.5 and (d) ε = 1.0
2.16 Error (in dB) vs normalized frequency of ideal gradient descent (in black), quantized gradient descent (in blue) and 1st order Σ∆ gradient descent (in red) optimization algorithms for ε = 0.5, with an n-bit quantizer for (a) n=1, (b) n=2, (c) n=4 and (d) n=8
2.17 SNR (in dB) vs adaptation parameter ε of gradient descent, quantized gradient descent and 1st order Σ∆ gradient descent optimization algorithms with an n-bit quantizer for (a) n=1, (b) n=2, (c) n=4 and (d) n=8
2.18 SNR (in dB) vs no. of quantization bits of gradient descent, quantized gradient descent and 1st order Σ∆ gradient descent optimization algorithms for (a) ε=0.2, (b) ε=0.5, (c) ε=0.8 and (d) ε=0.95
3.1 Generalized block diagram of the Σ∆ gradient descent algorithm
3.2 Block diagram of a second order sigma-delta modulator
3.3 Noise transfer function (NTF) of a first, second and third order Σ∆ modulator
3.4 Error (in dB) vs normalized frequency of the gradient descent algorithm with higher order (1st, 2nd and 4th order) Σ∆ modulation
3.5 Variation of the relative quantization error w.r.t. the normalized frequency for quantized gradient descent and batch gradient descent algorithms
3.6 Error (in dB) vs normalized frequency for gradient descent, first order Σ∆ and Σ∆ gradient descent with feedback filter F(z) for ε = 0.5
4.1 Block diagram of the Σ∆ SPSA gradient descent algorithm
4.2 The loss function along with the optimized parameters θ1, θ2 for a 2-variable optimization problem using (a) true gradient descent optimization and (b) Σ∆ SPSA gradient descent optimization
4.3 Error magnitude (in dB) of the tracking system optimized with Σ∆ SPSA gradient approximation (in blue) and quantized SPSA gradient approximation (in red) methods
5.1 Block diagram of the feature extraction unit used in speaker verification system
5.2 Schematic diagram of the gm-C biquad filter with the bandpass filter output
5.3 Schematic diagram of the transconductor circuit used in the bandpass filter
5.4 DC analysis of the transconductor output current for different values of the bias currents (in subthreshold region)
5.5 Voltage attenuator to increase the linear range of operation of the bandpass filter
5.6 Schematic diagram of the 10-bit current DAC providing the bias current for each transconductor
5.7 Output current vs. input bit of a 6-bit DAC showing the non-monotonic nature of the current DAC in subthreshold region of operation
5.8 The bandpass characteristics of the filter biased with subthreshold current DACs at unity gain and Q = 1
5.9 Schematic diagram of the half-wave rectifier used in the feature extractor
5.10 The schematic diagram of the Σ∆ modulator
5.11 System level diagram of the test station setup
5.12 Test station setup showing the interface between the Opalkelly XEM3010 FPGA, NI data acquisition card, motherboard and the test chip
5.13 Micrograph of (a) single channel feature extractor with bandpass filter, rectifier and Σ∆ modulator stages and (b) the 10-bit current DAC fabricated on a 0.5µm CMOS process
5.14 Offset mismatch between individual channels of an array of Σ∆ modulators for constant input voltage across all the channels
5.15 Figure showing (a) the loss function (1/2)Î²in as a variation of the 10-bit current DAC sweep, (b) the convergence of the modulator output to zero
5.16 Convergence of the modulator output due to In adaptation from the bandpass filter chip
5.17 Mismatch between the performance of bandpass filters from two different chips with same inputs under similar operating conditions
5.18 Block diagram of a single channel feature extraction unit with center frequency calibration
5.19 Frequency response of the bandpass and the lowpass stages of the biquad filter showing the magnitude of the residual signal for input tone of different frequencies
5.20 Oscilloscope figure showing the input tones separated by 90° phase and the residual signal from the filter
5.21 Moving average of the Σ∆ modulator showing the minimum when the center frequency reaches the frequency of the input tone

Chapter 1

Introduction

Optimization problems in real life are often dynamic in nature, i.e., they keep changing with time. Therefore, there is a need to continuously monitor the system and adapt to the variations so that the system performance can be maximized or the losses minimized. From a system-level perspective, the block diagram of a generic online learning system is given in Fig. 1.1. The system is characterized by a performance index L, called the 'cost function' or the 'loss function', which can usually be controlled using a set of adjustable parameters θ. An online learning system makes use of the previous experience of the system output and tries to adapt to the changing input by learning and correcting itself accordingly. The objective of the learning system is to be able to predict the output of the system when a previously unseen input stimulus occurs.

Figure 1.1: Schematic diagram of a generic online learning system

A learning algorithm can usually be modelled as an optimization problem where the set of adjustable parameters θ needs to be adapted in such a way that the loss function L is minimized. Several different classes of optimization algorithms are present in the literature [1], [2]. The choice of the optimization method depends on the system characteristics, the performance index of the system and the constraints of the optimization. Zeroth-order optimization algorithms, like direct search algorithms, make use of direct measurements of the loss function by moving the system adaptation parameter in the direction in which the loss function decreases. First-order optimization algorithms [3] depend on the first-order gradient of the loss function to adapt the system parameters in the direction of steepest descent. Second-order optimization algorithms, like Newton's method, make use of the information present in the Hessian of the loss function to adapt the optimization parameter. As the order of the optimization algorithm increases, the rate of convergence increases, but the requirement of higher-order gradient information puts an additional cost on higher-order optimization algorithms. Several other heuristic and stochastic optimization techniques have been proposed in the literature, including genetic algorithm optimization [4], simulated annealing [5], particle swarm optimization [6] and reactive search algorithms [7], [8]. One of the most popular optimization techniques is the gradient descent learning algorithm. It makes use of the fact that the loss function decreases rapidly along the direction of the slope, imitating a greedy search algorithm in following the steepest descent. One of the major drawbacks of the gradient descent algorithm is that it converges to a local minimum point for a small enough adaptation rate. So, the optimization process is highly susceptible to the choice of the initial point used for the iterative updates. Digitally-assisted analog circuits make use of an A/D conversion step in the calibration of the analog circuits.
Conventional A/D converters, such as Nyquist-rate converters, add quantization noise to the system, making the adaptation process susceptible to noise. This results in a dramatic decrease in the performance of the learning algorithm with respect to the convergence rate. In this thesis, we propose a novel Σ∆ gradient-descent optimization algorithm which can be used for online tracking of system parameters in real-time. The proposed algorithm has superior performance compared to the quantized gradient-descent algorithm, as the noise-shaping characteristics of the Σ∆ modulator suppress the in-band quantization noise and push it into the higher frequency range. The out-of-band quantization noise can then be eliminated by using an appropriate lowpass filter. Also, if the input signal is sufficiently random, the quantization stage of the modulator can be modeled as additive white noise with a flat power spectral density. The randomness in the quantization noise actually helps the optimization process by allowing the optimization parameter to escape a local-minimum neighborhood, depending on the noise floor level. This enables a larger search space for the optimization process. Although global optimization is not guaranteed, the algorithm is more robust than the ideal gradient descent, which is known to converge to a local minimum.

The remaining part of the thesis is organized as follows. In Chapter 2, a mathematical framework for the quantized version of a first-order gradient-descent optimization algorithm is introduced. A novel Σ∆ gradient descent algorithm is proposed which makes use of the noise-shaping properties of the A/D modulator to achieve better convergence properties. The effect of the adaptation parameter on the error rate is thoroughly studied. The performance enhancement of the proposed algorithm is validated through simulations on a speech sample from the YOHO database. Chapter 3 generalizes the proposed algorithm to include a higher-order loop filter and learning mechanism. Chapter 4 deals with gradient-free optimization algorithms (finite-difference and simultaneous perturbation techniques) and the extension of the noise-shaping algorithm to these cases. In Chapter 5, we introduce the practical problems (non-linearities, mismatches, temperature variation) in the calibration of analog circuits operating in the subthreshold region. A single-channel analog feature extractor implemented using a bandpass filter is studied. The Σ∆ learning algorithm is validated on hardware by (a) calibrating an unbalanced first-order Σ∆ modulator for offset cancellation and (b) calibrating the center frequency of a gm-C bandpass filter. Chapter 6 concludes the thesis with some final remarks and observations.

Chapter 2

Gradient Descent Optimization

2.1 Optimization problem

Consider a system where a stochastic parameter X introduces randomness in the measurement of the objective function L(X; θ). Let θ be an adjustable parameter which can be varied as a response to the change in the parameter X. The optimization problem refers to finding the optimal point θ* where the loss function L(X; θ) attains its minimum, i.e.,

\[ \theta^{*} = \arg\min_{\theta} L(X;\theta) \qquad (2.1) \]

Assuming that L(X; θ) is continuous and differentiable w.r.t. θ, the partial derivative g(θ) is given by

\[ g(\theta) = \frac{\partial L(X;\theta)}{\partial \theta} \qquad (2.2) \]

For the remainder of this chapter, the analysis assumes that the gradient of the loss function exists and is readily available.
The case when the gradient is not available or L(θ) is not differentiable is discussed in Chapter 4. Also, the performance analysis is carried out for a single-variable system, but it can be readily extended to a multi-dimensional optimization problem with θ ∈ R^p by using a vectorial representation.

2.1.1 Gradient Descent Algorithm

One of the most popular techniques for solving the above problem is the gradient descent algorithm, where at each step the parameter θ is moved in the direction of the negative slope of the loss function L(X; θ). The iterative update

\[ \theta_n = \theta_{n-1} - \epsilon\, g(\theta_{n-1}) \qquad (2.3) \]

converges the loss function to the minimum point given in eq. (2.1). Taking the summation of the above equation to ∞, the optimal point θ* can be obtained as

\[ \theta^{*} = -\epsilon \sum_{k=1}^{\infty} g(\theta_k) \qquad (2.4) \]

The effect of the choice of the learning rate ε on the convergence of the gradient descent algorithm has been extensively studied in [9], [3]. If the choice of ε is too small, the optimization takes a longer time to converge. On the other hand, if the learning rate is too high, the convergence is affected as the solution oscillates about the optimum point. From the first-order Taylor series approximation of g(θ) about the optimum point θ*, we get

\[ g(\theta) = g(\theta^{*}) + (\theta - \theta^{*})\, g'(\theta^{*}) \qquad (2.5) \]

But at the optimal point θ*, g(θ*) = 0. Comparing the above equation with eq. (2.3), we have

\[ \epsilon_n = \frac{1}{g'(\theta)} \qquad (2.6) \]

which shows that the optimal learning rate is given by the inverse of the Hessian of the loss function. One of the major drawbacks of the gradient descent approach is that, depending on the initial choice of the parameter θ, the algorithm converges to a local minimum even in the presence of a global minimum. It has been observed that by careful injection of noise into the learning algorithm, the standard gradient descent can converge to the global optimal point. The idea behind deliberately adding noise comes from the fact that the stochastic nature of the noise in the recursion allows the algorithm to escape the θ neighborhood corresponding to the local minimum.

2.1.2 Noisy Gradient Descent Algorithm

When an intentional noise q_n is added to the gradient estimate, the gradient descent algorithm takes the form:

\[ \theta_n \leftarrow \theta_{n-1} - \epsilon\,[g(\theta_{n-1}) + q_n] \qquad (2.7) \]

Taking the summation of the above equation to ∞, the function L(θ) converges to its minimum value at θ* given by:

\[ \theta^{*} = -\epsilon \left( \sum_{k=1}^{\infty} g(\theta_k) + \sum_{k=1}^{\infty} q_k \right) \qquad (2.8) \]

Figure 2.1: Block diagram of the noisy gradient descent optimization algorithm

We consider the case when q_n is additive white noise with a flat power spectral density. For zero-mean white noise, the second term in eq. (2.8) goes to zero, thereby converging to the same optimal point as in the case of the noiseless gradient descent algorithm in eq. (2.4). The conditions for convergence of the above algorithm have been studied in [13] and [16], which prove the almost sure convergence of the algorithm when q_n follows certain statistical properties. Fig. 2.1 shows the block diagram of the noisy gradient descent algorithm. Depending on the choice of the learning parameter, the final value of θ oscillates randomly about the optimal point θ*. In order to reduce the effect of the noise on the optimization, a simulated annealing technique ([11], [12]) has been studied in the literature. A large value of the adaptation parameter ε is initially chosen and is successively damped, allowing the initial iterations to move θ out of the local-minimum neighborhood. As the number of iterations increases, the adaptation parameter goes to zero, thereby decreasing the effect of the additive noise. The choice of the noise floor level and annealing rate depends on the loss function that is to be minimized. Care has to be taken to make sure that the adaptation parameter does not die down to zero before reaching the global minimum point. The conditions for global minimum optimization of the stochastic gradient descent algorithm have been studied in [10]-[12].
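To make the recursion of eq. (2.7) concrete, the following is a minimal NumPy sketch (not taken from the thesis) of the noisy gradient-descent update with a simulated-annealing style decay of the adaptation rate; the quadratic example loss, noise level and decay schedule are illustrative assumptions.

```python
import numpy as np

def noisy_gradient_descent(grad, theta0, eps0=1.0, noise_std=0.5,
                           decay=0.999, n_iter=5000, seed=0):
    """Noisy gradient descent, eq. (2.7): theta_n = theta_{n-1} - eps*(g(theta_{n-1}) + q_n),
    with the adaptation rate eps annealed towards zero."""
    rng = np.random.default_rng(seed)
    theta, eps = theta0, eps0
    for _ in range(n_iter):
        q = rng.normal(0.0, noise_std)           # additive white noise q_n
        theta = theta - eps * (grad(theta) + q)  # eq. (2.7)
        eps *= decay                             # annealing of the adaptation rate
    return theta

# Example: quadratic loss L(theta) = 0.5*(theta - 2)^2, so g(theta) = theta - 2
print(noisy_gradient_descent(lambda t: t - 2.0, theta0=10.0))  # approaches 2
```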
2.1.3 Noise Shaping Gradient Descent Algorithm

In this thesis, we propose a new optimization algorithm which makes use of the noise-shaping properties of a Σ∆ modulator in finding the optimum point. The iterative updates for the optimization of θ through this new algorithm are given by:

\[ \omega_n = \omega_{n-1} + g(\theta_{n-1}) - d_n \qquad (2.9) \]
\[ d_n = \omega_{n-1} + q_n \qquad (2.10) \]
\[ \theta_n = \theta_{n-1} - \epsilon\, d_n \qquad (2.11) \]

The proposed algorithm has better convergence properties compared to the noisy gradient-descent algorithm. The architecture introduces a Σ∆ loop which shapes the quantization noise q_n before it is used in the gradient-descent iteration. Fig. 2.2 shows the block diagram of the first-order noise-shaped gradient descent algorithm.

Figure 2.2: Block diagram of the first order noise-shaping gradient descent algorithm

For a large number of iterations N, by taking the summation of eq. (2.11), we get

\[ \theta_N = \theta_0 - \epsilon \sum_{k=1}^{N} d_k \qquad (2.12) \]

But from the Σ∆ update of eq. (2.9), if the initial state is ω_0, the summation to N iterations yields

\[ \sum_{k=1}^{N} d_k = \sum_{k=1}^{N} g(\theta_k) + \omega_N - \omega_0 \qquad (2.13) \]

As N tends to ∞, from the stability considerations of a first-order Σ∆ modulator we get

\[ \Big| \sum_{k=1}^{\infty} d_k - \sum_{k=1}^{\infty} g(\theta_k) \Big| \leq 2\, \| g(\theta) \|_{\infty} \qquad (2.14) \]

which shows the asymptotic convergence of the algorithm to the optimal point θ*. We compare the performance of the above optimization to that of the noisy gradient descent by studying the convergence properties of both algorithms for a signal tracking problem. Consider the loss function in the case of the least-mean-square optimization problem given by

\[ L(t;\theta) = \lim_{T\to\infty} \frac{1}{T} \int_0^T \frac{1}{2}\,(y(t) - \theta)^2\, dt \qquad (2.15) \]

where the signal to be tracked is a sinusoid, i.e., y(t) = sin(ωt). Fig. 2.3 shows the noise-shaping characteristics of the proposed algorithm, which lead to better convergence properties compared to the noisy gradient descent learning algorithm. In this simulation, the gradient of the loss function g(θ) is assumed to be available for computation. The value of ε = 0.5 is chosen for the simulation.

Figure 2.3: Frequency domain response of the original sinusoid, ideal, noisy and noise-shaped gradient descent algorithms. For interpretation of the references to color in this and all other figures, the reader is referred to the electronic version of this thesis.

The magnitude response of the noisy gradient descent algorithm shows that a noise floor of about 5dB corrupts the tracked signal θ. But in the case of the Σ∆ modulator, the noise at lower frequencies is clearly shaped out of the signal band of interest. By using a suitable lowpass filter, the high-frequency noise can be eliminated from the output.
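The following is a minimal simulation sketch of this comparison (an illustrative reconstruction, not the thesis code): both recursions track a slow sinusoid, the same white-noise sequence q_n plays the role of the quantization error in each, and for the tracking loss of eq. (2.15) the instantaneous gradient is taken as g(θ_n) = θ_n − y_n; the sampling setup, noise level and ε = 0.5 are assumptions.

```python
import numpy as np

def noisy_gd(y, eps, q):
    """Noisy gradient descent, eq. (2.7), for the tracking loss of eq. (2.15)."""
    theta, out = 0.0, np.empty(len(y))
    for n, yn in enumerate(y):
        theta -= eps * ((theta - yn) + q[n])   # g(theta) = theta - y_n, plus noise
        out[n] = theta
    return out

def sigma_delta_gd(y, eps, q):
    """First-order noise-shaping gradient descent, eqs. (2.9)-(2.11)."""
    theta, w, out = 0.0, 0.0, np.empty(len(y))
    for n, yn in enumerate(y):
        d = w + q[n]                 # eq. (2.10): noisy integrator output d_n
        w = w + (theta - yn) - d     # eq. (2.9): Sigma-Delta loop update
        theta = theta - eps * d      # eq. (2.11): parameter update
        out[n] = theta
    return out

N, eps = 1 << 14, 0.5
t = np.arange(N)
y = np.sin(2 * np.pi * 1e-3 * t)                  # slow tone (high oversampling)
q = np.random.default_rng(0).normal(0.0, 0.3, N)  # white "quantization" noise q_n
E_noisy = np.abs(np.fft.rfft(y - noisy_gd(y, eps, q)))
E_sd    = np.abs(np.fft.rfft(y - sigma_delta_gd(y, eps, q)))
# At low frequencies the error floor of E_sd sits well below that of E_noisy:
# the injected noise is shaped out of the signal band.
```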
The proposed Σ∆ algorithm is also more robust to traps introduced by local-minimum neighborhoods. Fig. 2.4 shows the convergence of the algorithm to the global minimum even when the ideal gradient descent algorithm converges to a local minimum. The cost function under consideration is L(θ) = −e^{−θ} sin(θ⁴)/θ³. The same initial point θ_0 = 1.45 is used for both the ideal and the Σ∆ gradient descent algorithms. The highly non-linear form of the loss function causes the ideal gradient descent optimization to converge to a local minimum, whereas the addition of the noise q_n enables the Σ∆ approach to avoid the local minimum and converge to the global optimal solution. The global convergence of the algorithm is not guaranteed, and depends highly on the loss function under consideration and the initial conditions. But the addition of the noise randomizes the search process, enabling more robust global search properties.

Figure 2.4: Convergence of the noise shaping gradient descent algorithm to the global minimum

Fig. 2.5 shows the real-time tracking performance of the algorithms with ε = 0.9. It can be clearly seen that the Σ∆ learning has larger high-frequency spikes compared to the noisy gradient-descent case, showing that q_n is shaped. But at lower frequencies, the Σ∆ algorithm output follows the input signal more closely than the noisy gradient-descent case. Fig. 2.6 shows that, because of the addition of noise to the learning algorithm, there is no penalty w.r.t. the speed of convergence to the optimal point. The loss function under consideration here is the same as in eq. (2.15), where y(t) = 1, a constant function.

Figure 2.5: Tracking a sinusoid signal with ideal, noisy and Σ∆ gradient descent algorithms

Figure 2.6: Rate of convergence of the ideal, noisy and Σ∆ gradient descent algorithms

Fig. 2.7 shows that the convergence of the noisy gradient descent algorithm depends greatly on the noise floor of the added q_n. As the magnitude of the noise level q_n increases, the error in the estimation of θ increases, whereas the highpass characteristics of the noise shaping in Σ∆ learning reduce the noise q_n in the signal band of interest, thereby improving the convergence. Figs. 2.7(a) and (b) show that as the noise floor level increases beyond the signal strength, the noisy gradient descent algorithm no longer converges to the original signal. But the Σ∆ learning approach shows better convergence characteristics, as the error is closer to the case of the ideal gradient descent algorithm.

Figure 2.7: Error magnitude in the case of ideal (in black), noisy (in blue) and Σ∆ (in green) gradient descent algorithms for a noise floor of (a) 30dB, (b) 10dB, (c) -10dB and (d) -30dB
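A small helper of the following kind could be used to reproduce this sort of in-band comparison numerically (a sketch under assumptions: the signal band is taken as the fraction 1/OSR of the sampling rate, the error is measured against the target, and both signals are assumed equal in length).

```python
import numpy as np

def inband_snr_db(target, estimate, osr=128):
    """SNR (in dB) computed over the signal band [0, f_s/(2*OSR)),
    i.e. after an ideal lowpass that removes the shaped out-of-band noise."""
    n = len(target)
    band = np.fft.rfftfreq(n) < 0.5 / osr         # in-band FFT bins
    T = np.fft.rfft(target)
    E = np.fft.rfft(np.asarray(estimate) - target)
    p_sig = np.sum(np.abs(T[band]) ** 2)
    p_err = np.sum(np.abs(E[band]) ** 2) + 1e-30  # guard against divide-by-zero
    return 10.0 * np.log10(p_sig / p_err)
```

Used together with the tracking sketches above, this gives SNR numbers comparable in spirit to those reported in Table 2.1.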
In the next set of simulations, the performance of the noise-induced algorithms is validated on a sample speech utterance “26 81 57” taken from the YOHO speaker verification data set. The speech sample at 8KHz is oversampled by OSR = 128. Fig. 2.8 shows the FFT of the original speech sample along with the outputs of the noisy gradient descent and Σ∆ gradient descent algorithms. It can be clearly seen that in the signal band of interest (ω_c = 8KHz), the performance of the Σ∆ algorithm outweighs the performance of the noisy gradient descent algorithm. Table 2.1 summarizes the signal-to-noise ratio of the output speech signal for different noise floor levels in the case of the ideal, noisy and noise-shaped gradient descent algorithms.

Figure 2.8: FFT of the sample speech utterance “26 81 57” (in red) along with the noisy (in blue) and Σ∆ (in green) learning algorithm

Table 2.1: SNR (in dB) for different noise levels of the ideal, noisy and Σ∆ gradient descent algorithms for a sample speech utterance ‘26 81 57’ from the YOHO database.

| Optimization Algorithm | Noise level (in dB) | SNR (in dB), ε = 0.1 | SNR (in dB), ε = 0.5 | SNR (in dB), ε = 0.9 |
| Ideal Gradient Descent | −   | 20.622 | 61.71  | 105.24 |
| Noisy Gradient Descent | -30 | 19.998 | 61.07  | 87.41  |
|                        | -10 | 16.241 | 41.36  | 44.31  |
|                        | 10  | -2.61  | -1.91  | -2.13  |
|                        | 30  | -46.09 | -48.15 | -46.15 |
| Σ∆ Gradient Descent    | -30 | 20.01  | 61.42  | 101.74 |
|                        | -10 | 19.98  | 60.81  | 95.767 |
|                        | 10  | 19.58  | 52.42  | 57.52  |
|                        | 30  | 10.25  | 10.45  | 13.43  |

2.2 Online Learning Example

In this section, we study in detail the performance of the gradient descent algorithms for an online tracking system. The loss function under consideration is

\[ L(x) = \lim_{T\to\infty} \frac{1}{T} \int_0^T \frac{1}{2}\,\{y(t) - x\}^2\, dt \qquad (2.16) \]

where y(t) is the signal that is being tracked by the parameter x. The loss function is optimized with respect to x in real-time. The gradient in this case is given by

\[ g(x) = -\lim_{T\to\infty} \frac{1}{T} \int_0^T \{y(t) - x\}\, dt \qquad (2.17) \]

The mathematical model for the error is derived for each of the algorithms.

2.2.1 Gradient Descent Tracking

The gradient descent algorithm attempts to track the input y(t) by moving in the direction of the negative gradient g(x). At time index n, the update for the tracking signal x is given by

\[ x_n = x_{n-1} + \epsilon\,(y_n - x_{n-1}) \qquad (2.18) \]

where ε is the step size. Taking the Z-transform of both sides of eq. (2.18), we have

\[ X(z) = z^{-1}X(z) + \epsilon\,Y(z) - \epsilon\,z^{-1}X(z) \qquad (2.19) \]

which gives

\[ X(z) = \frac{\epsilon}{1 - (1-\epsilon)z^{-1}}\, Y(z) \qquad (2.20) \]

The error function E(z) becomes

\[ E(z) = Y(z) - X(z) \qquad (2.21) \]
\[ E(z) = \frac{(1-\epsilon)(1 - z^{-1})}{1 - (1-\epsilon)z^{-1}}\, Y(z) \qquad (2.22) \]

Replacing 1 − ε by α, we have

\[ E(z) = \frac{\alpha\,(1 - z^{-1})}{1 - \alpha z^{-1}}\, Y(z) \qquad (2.23) \]

For frequencies ω ≪ ω_0, replacing z^{-1} ≈ 1 − jω/ω_0,

\[ \frac{E(j\omega)}{Y(j\omega)} = \frac{j\alpha\,\omega/\omega_0}{(1-\alpha) + j\alpha\,\omega/\omega_0} \qquad (2.24) \]

Taking the magnitude of the relative error, we have

\[ \frac{|E(j\omega)|}{|Y(j\omega)|} = \frac{\alpha\,\omega/\omega_0}{\left[ (1-\alpha)^2 + \alpha^2\,\omega^2/\omega_0^2 \right]^{1/2}} \qquad (2.25) \]

Defining ω_0' = ((1 − α)/α) ω_0, we get

\[ \frac{|E(j\omega)|}{|Y(j\omega)|} = \frac{\omega/\omega_0'}{\left[ 1 + \omega^2/\omega_0'^2 \right]^{1/2}} \qquad (2.26) \]

The choice of the adaptation parameter ε affects the error magnitude of the gradient descent learning. Fig. 2.9 shows the real-time learning results for the tracking problem when the input signal is a tone given by y(t) = sin(ωt). For ε = 0.01 the adaptation is not fast enough to track the input y(t); as the value of ε decreases, the learning becomes less efficient. Eq. (2.6) shows that the optimal value of ε is given by the inverse of the Hessian. In this case of the tracking problem, the Hessian of the loss function is ∂²L(x)/∂x² = ∂g(x)/∂x = 1. Therefore, the optimal value of the learning parameter is ε = 1, in which case the output x perfectly tracks the input signal y(t).

Figure 2.9: Real-time tracking of a sinusoid signal with the ideal gradient descent algorithm for ε = 0.01, 0.1 and 1.0
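The closed-form relative error of eq. (2.26) is easy to check numerically; the sketch below (illustrative, not from the thesis) evaluates |E/Y| from eq. (2.23) on a frequency grid for a few values of α, which is essentially the quantity plotted in Fig. 2.10.

```python
import numpy as np

def relative_error_mag(alpha, w):
    """|E/Y| from eq. (2.23): alpha*(1 - z^-1)/(1 - alpha*z^-1), z = exp(j*w),
    with w the frequency normalized to the sampling rate (radians/sample)."""
    z_inv = np.exp(-1j * w)
    return np.abs(alpha * (1 - z_inv) / (1 - alpha * z_inv))

w = np.logspace(-3, 0, 200) * np.pi          # normalized frequency grid
for alpha in (0.01, 0.1, 0.5, 0.95):
    mag = relative_error_mag(alpha, w)
    # The error rises roughly linearly with frequency and flattens near the
    # cutoff w0' = (1 - alpha)/alpha * w0 predicted by eqs. (2.26)-(2.27).
    print(f"alpha={alpha:4.2f}  |E/Y| at lowest/highest grid point: "
          f"{mag[0]:.2e} / {mag[-1]:.2f}")
```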
Fig. 2.10 gives a quantitative estimate of the relative error |E(jω)|/|Y(jω)| in the estimate of x with respect to the variation of α. For a given value of the adaptation parameter ε, the error E(z) increases linearly with the input signal frequency ω before saturating at a cutoff frequency given by

\[ \omega_0' = \frac{1-\alpha}{\alpha}\,\omega_0 \qquad (2.27) \]

where ω_0 is the sampling rate. It can be seen that for α → 0, the error magnitude decreases to zero.

Figure 2.10: Frequency dependency of the relative error E(jω)/Y(jω) in gradient descent optimization for different values of the adaptation rate α

2.2.2 Effect of Quantization

In online learning of most analog circuits, the gradient is digitized using an analog-to-digital converter (ADC) before it is used for updating the learning parameter. The digitization of the input signal introduces an additional quantization error in the estimate of the system parameter. The quantized gradient descent algorithm can be modeled by adding the quantization noise q_n to the ideal gradient estimate according to

\[ x_n = x_{n-1} + \epsilon\,[(y_n - x_{n-1}) + q_n] \qquad (2.28) \]

Although the quantization noise q_n is a stochastic random process, we consider the case of one particular realization of q_n. Taking the z-transform of the above equation, we get

\[ X(z) = \frac{\epsilon}{1 - (1-\epsilon)z^{-1}}\,\big(Y(z) + Q(z)\big) \qquad (2.29) \]

where Q(z) is the z-transform of the realization of the quantization noise q_n. The error function E(z) is then given by:

\[ E(z) = \frac{\alpha\,(1 - z^{-1})}{1 - \alpha z^{-1}}\,Y(z) + \frac{1-\alpha}{1 - \alpha z^{-1}}\,Q(z) \qquad (2.30) \]

Comparing it to eq. (2.23), we see that quantization of the input signal adds an additional error component E_q(z) to the gradient descent case, given by

\[ E_q(z) = \frac{1-\alpha}{1 - \alpha z^{-1}}\,Q(z) \qquad (2.31) \]

As in the case of the ideal gradient descent, replacing z^{-1} ≈ 1 − jω/ω_0 for frequencies ω ≪ ω_0, and taking the power spectral density (PSD) of the error, we get

\[ \frac{|E_q(j\omega)|}{|Q(j\omega)|} = \frac{1-\alpha}{\left[ (1-\alpha)^2 + \alpha^2\,\omega^2/\omega_0^2 \right]^{1/2}} \qquad (2.32) \]

where the PSD is defined as |a| = (a a*)^{1/2}. The additional error E_q(z) added because of the quantization is therefore

\[ \frac{|E_q(j\omega)|}{|Q(j\omega)|} = \frac{1}{\left[ 1 + \omega^2/\omega_0'^2 \right]^{1/2}} \qquad (2.33) \]

where ω_0' = ((1 − α)/α) ω_0.

Figure 2.11: Variation of the relative error w.r.t. signal magnitude and quantization error for a quantized gradient descent algorithm for α = 0.5

Fig. 2.11 shows the variation of the relative error due to the quantization process, E_q(z), and also the signal error E_y(z) from the ideal gradient descent, for a particular value of α. Eq. (2.31) shows that for a given α the error component E_q(z) has lowpass characteristics with respect to the quantization error q_n. The cutoff frequency ω_0' is the same as in the ideal gradient descent case, and the quantization error decreases beyond ω_0'. Depending on the statistics of the input signal, a particular value of α can be chosen in order to reduce the overall error in the quantized gradient case.
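For reference, a simple uniform quantizer of the kind assumed in these experiments can be modelled as below (a sketch; the full-scale range and bit widths are assumptions), and feeding the quantized gradient into eq. (2.28) reproduces the roughly 6 dB-per-bit drop of the quantization noise floor seen in Fig. 2.12.

```python
import numpy as np

def quantize(x, n_bits, full_scale=2.0):
    """Uniform mid-rise quantizer with 2**n_bits levels spanning [-full_scale, +full_scale]."""
    step = 2.0 * full_scale / (2 ** n_bits)
    q = np.floor(x / step) * step + step / 2.0
    return np.clip(q, -full_scale + step / 2.0, full_scale - step / 2.0)

def quantized_gd(y, eps, n_bits):
    """Quantized gradient descent, eq. (2.28): the gradient (y_n - x_{n-1})
    passes through an n-bit converter before the update."""
    x, out = 0.0, np.empty(len(y))
    for n, yn in enumerate(y):
        x = x + eps * quantize(yn - x, n_bits)
        out[n] = x
    return out

t = np.arange(1 << 14)
y = np.sin(2 * np.pi * 1e-3 * t)
for b in (1, 2, 4, 8):
    e = y - quantized_gd(y, eps=0.5, n_bits=b)
    print(b, "bits -> error power %.2e" % np.mean(e ** 2))  # roughly 6 dB less per extra bit
```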
The error q_n introduced by the quantization of the input signal can be assumed to be uncorrelated with the input y(t) if the sampling rate ω_0 is much higher than ω. Under this condition, the error q_n can be approximated as white noise with a uniform power spectral density. Fig. 2.12 shows the variation of the noise floor of the quantization error q_n with increase in the number of quantization bits, for both a random input signal and a tone. For every one-bit increment in the quantizer, the noise floor due to quantization reduces by 6dB. As the number of quantization bits increases, the relative error due to the quantization process decreases.

Figure 2.12: Variation of noise floor (in dB) with increase in no. of quantization bits

2.2.3 Σ∆ Gradient Descent Algorithm

The mathematical model for the Σ∆ learning system in the case of the tracking problem is given by

\[ w_n = w_{n-1} + (y_n - x_n) - d_n \qquad (2.34) \]
\[ d_n = w_{n-1} + q_n \qquad (2.35) \]
\[ x_{n+1} = x_n + \epsilon\, d_n \qquad (2.36) \]

Replacing n by n − 1 in the above iteration, we get

\[ w_{n-1} = w_{n-2} + (y_{n-1} - x_{n-1}) - d_{n-1} \qquad (2.37) \]

which gives

\[ d_n = (y_{n-1} - x_{n-1}) + (q_n - q_{n-1}) \qquad (2.38) \]

Substituting the above equation back into eq. (2.36), we get

\[ x_n = x_{n-1} + \epsilon\,(y_{n-1} - x_{n-1}) + \epsilon\,(q_n - q_{n-1}) \qquad (2.39) \]

For one particular realization of the quantization noise q_n, taking the z-transform of the above equation we have

\[ X(z) = \frac{\epsilon\, z^{-1}}{1 - \alpha z^{-1}}\, Y(z) + \frac{\epsilon\,(1 - z^{-1})}{1 - \alpha z^{-1}}\, Q(z) \qquad (2.40) \]

The error function E(z) is given by

\[ E(z) = \frac{1 - z^{-1}}{1 - \alpha z^{-1}}\, Y(z) - \frac{\epsilon\,(1 - z^{-1})}{1 - \alpha z^{-1}}\, Q(z) \qquad (2.41) \]
\[ E(z) = \frac{1 - z^{-1}}{1 - \alpha z^{-1}}\, \big( Y(z) - \epsilon\, Q(z) \big) \qquad (2.42) \]

The contribution of the error from quantization of the signal, E_q(z), in the Σ∆ gradient descent case is given by

\[ E_q(z) = \frac{\epsilon\,(1 - z^{-1})}{1 - \alpha z^{-1}}\, Q(z) \qquad (2.43) \]

As in the case of the ideal gradient descent, replacing z^{-1} ≈ 1 − jω/ω_0 for frequencies ω ≪ ω_0, and taking the power spectral density (PSD) of the error, we get

\[ \frac{|E_q(j\omega)|}{|Q(j\omega)|} = \frac{(1-\alpha)\,\omega/\omega_0}{\left[ (1-\alpha)^2 + \alpha^2\,\omega^2/\omega_0^2 \right]^{1/2}} \qquad (2.44) \]

Defining ω_0' = ((1 − α)/α) ω_0, we have

\[ \frac{|E_q(j\omega)|}{|Q(j\omega)|} = \frac{1-\alpha}{\alpha}\cdot\frac{\omega/\omega_0'}{\left[ 1 + \omega^2/\omega_0'^2 \right]^{1/2}} \qquad (2.45) \]

Figure 2.13: Variation of the relative error w.r.t. normalized frequency for a quantized (in red) and Σ∆ (in green) gradient descent algorithm for different values of α

Fig. 2.13 shows the comparison of the error due to quantization in the case of the quantized gradient descent (in red) and the Σ∆ gradient descent algorithm (in green). Eq. (2.43) shows that for a given value of the adaptation parameter α, the quantization error has highpass characteristics with a cutoff frequency ω_0'. Also, it can be clearly seen that the quantization error noise floor decreases with an increase in the value of α. For an input signal with known characteristics, a proper choice of α can be made to reduce the overall error due to the signal and quantization error components. Figs. 2.14 and 2.15 compare the error convergence of the three algorithms with the variation of the adaptation rate ε for a 1-bit and 4-bit quantizer, respectively. The simulation results validate the error analysis described in the previous sections. For the quantized gradient descent learning algorithm, the overall error increases as the value of ε increases. On the other hand, for the Σ∆ gradient descent approach, the noise-shaping characteristics can be clearly seen and the overall error actually decreases with increase in the value of ε.
In Figs. 2.14(d) and 2.15(d) the error in the case of the ideal gradient descent is zero and is not plotted. Fig. 2.16 shows the variation of the error convergence of all three optimization algorithms with a change in the number of quantization bits for a constant ε = 0.5. As the number of quantization bits increases, we see that the error decreases in both the quantized and the Σ∆ algorithms. This is because the noise floor due to the quantization error decreases with an increase in the number of bits of quantization, as shown in Fig. 2.12.

In the next set of simulations, the signal-to-noise ratio (SNR) of each of the algorithms is calculated by using a tone at ω = 0.01ω_0. The output from each optimization algorithm is lowpass filtered and the SNR is calculated. Fig. 2.17 shows how the SNR varies as a function of the adaptation parameter ε for different values of the quantization bits. It can be observed that the SNR of the Σ∆ learning algorithm is much closer in performance to the ideal gradient descent case. The variation of the SNR with respect to the number of quantization bits is shown in Fig. 2.18 for different values of ε.

In the next set of simulations, the performance of the quantized gradient descent and Σ∆ algorithms is validated on the same sample speech utterance “26 81 57” used in the previous section. The oversampling ratio is still set at OSR = 128. Table 2.2 summarizes the signal-to-noise ratio of the output speech signal for different levels of quantization in the case of the ideal, quantized and noise-shaped gradient descent algorithms.

Table 2.2: SNR (in dB) for different quantization levels of the different optimization algorithms on a sample speech utterance “26 81 57”.

| Optimization Algorithm | Quantization bits | SNR (in dB), ε = 0.1 | SNR (in dB), ε = 0.5 | SNR (in dB), ε = 0.9 |
| Ideal Gradient Descent     | − | 20.622 | 61.39  | 105.24 |
| Quantized Gradient Descent | 1 | -9.87  | -44.13 | -56.27 |
|                            | 2 | 6.03   | -29.59 | -41.93 |
|                            | 4 | 37.60  | 0.734  | -12.51 |
|                            | 8 | 22.27  | 63.08  | 51.17  |
| Σ∆ Gradient Descent        | 1 | 21.54  | 25.80  | 12.40  |
|                            | 2 | 20.50  | 43.93  | 28.26  |
|                            | 4 | 20.15  | 60.16  | 69.51  |
|                            | 8 | 20.04  | 61.19  | 102.5  |
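A compact way to reproduce this style of experiment in simulation is sketched below (illustrative code, not the thesis implementation): the Σ∆ tracking loop of eqs. (2.34)–(2.36) is run with a hard n-bit quantizer in place of the additive-noise model, and its in-band SNR is compared against the quantized gradient descent of eq. (2.28); the tone frequency, band edge and quantizer range are assumptions.

```python
import numpy as np

def quantize(x, n_bits, full_scale=2.0):
    # Uniform mid-rise quantizer spanning [-full_scale, +full_scale]
    step = 2.0 * full_scale / (2 ** n_bits)
    q = np.floor(x / step) * step + step / 2.0
    return float(np.clip(q, -full_scale + step / 2.0, full_scale - step / 2.0))

def sd_gd_quantized(y, eps, n_bits):
    """Sigma-Delta gradient descent, eqs. (2.34)-(2.36), with d_n = Q(w_{n-1})."""
    x, w, out = 0.0, 0.0, np.empty(len(y))
    for n, yn in enumerate(y):
        d = quantize(w, n_bits)       # hard quantizer replaces w_{n-1} + q_n
        w = w + (yn - x) - d          # eq. (2.34)
        x = x + eps * d               # eq. (2.36)
        out[n] = x
    return out

def quant_gd(y, eps, n_bits):
    """Quantized gradient descent, eq. (2.28)."""
    x, out = 0.0, np.empty(len(y))
    for n, yn in enumerate(y):
        x = x + eps * quantize(yn - x, n_bits)
        out[n] = x
    return out

def snr_db(y, xhat, osr=128):
    band = np.fft.rfftfreq(len(y)) < 0.5 / osr
    Y, E = np.fft.rfft(y), np.fft.rfft(xhat - y)
    return 10 * np.log10(np.sum(np.abs(Y[band])**2) / (np.sum(np.abs(E[band])**2) + 1e-30))

y = np.sin(2 * np.pi * 1e-3 * np.arange(1 << 15))
for b in (1, 2, 4, 8):
    print(b, "bits:", round(snr_db(y, quant_gd(y, 0.5, b)), 1), "dB (quantized GD) vs",
          round(snr_db(y, sd_gd_quantized(y, 0.5, b)), 1), "dB (Sigma-Delta GD)")
```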
Figure 2.14: Error (in dB) vs normalized frequency of ideal gradient descent (in black), quantized gradient descent (in blue) and 1st order Σ∆ gradient descent (in red) optimization algorithms with a 1-bit quantizer for (a) ε = 0.1, (b) ε = 0.2, (c) ε = 0.5 and (d) ε = 1.0

Figure 2.15: Error (in dB) vs normalized frequency of ideal gradient descent (in black), quantized gradient descent (in blue) and 1st order Σ∆ gradient descent (in red) optimization algorithms with a 4-bit quantizer for (a) ε = 0.1, (b) ε = 0.2, (c) ε = 0.5 and (d) ε = 1.0

Figure 2.16: Error (in dB) vs normalized frequency of ideal gradient descent (in black), quantized gradient descent (in blue) and 1st order Σ∆ gradient descent (in red) optimization algorithms for ε = 0.5, with an n-bit quantizer for (a) n=1, (b) n=2, (c) n=4 and (d) n=8

Figure 2.17: SNR (in dB) vs adaptation parameter ε of gradient descent, quantized gradient descent and 1st order Σ∆ gradient descent optimization algorithms with an n-bit quantizer for (a) n=1, (b) n=2, (c) n=4 and (d) n=8

Figure 2.18: SNR (in dB) vs no. of quantization bits of gradient descent, quantized gradient descent and 1st order Σ∆ gradient descent optimization algorithms for (a) ε=0.2, (b) ε=0.5, (c) ε=0.8 and (d) ε=0.95

Chapter 3

Generalized Noise Shaping Algorithm

The Σ∆ gradient descent learning algorithm introduced in the previous chapter made use of a simple first-order integrator as the loop filter. Also, a simple first-order gradient descent approximation method was used in the estimation of the updated optimization parameter. In this chapter, we generalize the same algorithm by using a higher-order loop filter for the Σ∆ modulation stage and a generalized learning filter for the optimization parameter update. The loss function L(θ) under consideration is assumed to be continuous and differentiable, with g(θ) being the partial derivative of L(θ) with respect to the optimization parameter θ. Fig. 3.1 shows the block diagram of the generalized noise-shaping learning algorithm. The generalized loop filter for the Σ∆ modulation stage is H(z) and the feedback learning filter is F(z).
The output Y(z) of the modulator stage is given by

\[ Y(z) = H(z)\,[g(\theta) - Y(z)] + Q(z) \qquad (3.1) \]

Figure 3.1: Generalized block diagram of the Σ∆ gradient descent algorithm

which gives

\[ Y(z) = \frac{H(z)}{1 + H(z)}\, g(\theta) + \frac{1}{1 + H(z)}\, Q(z) \qquad (3.2) \]

When the parameter estimate θ is close enough to the optimal solution θ*, the Taylor series expansion of g(θ) about the root θ* gives

\[ g(\theta) = g(\theta^{*}) + (\theta - \theta^{*})\, g'(\theta^{*}) \qquad (3.3) \]

As θ* is the root of g(θ), g(θ*) = 0. Denoting (θ − θ*) by a new error variable Θ, we get

\[ g(\theta) = \Theta\, g'(\theta^{*}) \qquad (3.4) \]

Substituting eq. (3.4) into eq. (3.2),

\[ Y(z) = \frac{H(z)\,g'(\theta^{*})\,\Theta(z)}{1 + H(z)} + \frac{Q(z)}{1 + H(z)} \qquad (3.5) \]

The feedback path in Fig. 3.1 shows that the learning update of Θ can be written as

\[ \Theta(z) = F(z)\, Y(z) \qquad (3.6) \]

Substituting this back into eq. (3.5), we have

\[ \Theta(z) = \frac{g'(\theta^{*})\,H(z)\,F(z)\,\Theta(z) + F(z)\,Q(z)}{1 + H(z)} \qquad (3.7) \]

which gives

\[ \Theta(z) = \frac{F(z)\,Q(z)}{1 + H(z) - g'(\theta^{*})\,H(z)\,F(z)} \qquad (3.8) \]

The above equation shows the dependency of the error estimate Θ on the quantization error Q(z) in the generalized form of the noise-shaping algorithm. The error transfer function w.r.t. the quantization noise, Θ_q(z), is given by

\[ \Theta_q(z) = \frac{\Theta(z)}{Q(z)} \qquad (3.9) \]
\[ \Theta_q(z) = \frac{F(z)}{1 + H(z) - g'(\theta^{*})\,H(z)\,F(z)} \qquad (3.10) \]

The magnitude of the error is given by

\[ |\Theta(z)| = \left| \frac{F(z)}{1 + H(z) - g'(\theta^{*})\,H(z)\,F(z)} \right|\, |Q(z)| \qquad (3.11) \]

From eq. (3.11), depending on the statistics of the loss function being minimized, a proper choice of the loop filter H(z) and the feedback learning filter F(z) can be made.

3.1 Higher Order Σ∆ Modulation

The fundamental noise-shaping ideas of the Σ∆ modulator used in the gradient descent algorithms can be extended to higher-order modulation. The schematic diagram of a second-order modulator is shown in Fig. 3.2. The structure makes use of two integrators H_1(z) and H_2(z), as opposed to just one in first-order Σ∆ modulation. The error signal at the end of the first integration stage is used as the input for the second integrator to achieve higher resolution than a simple first-order modulator.

Figure 3.2: Block diagram of a second order sigma-delta modulator

From the z-domain analysis of the second-order Σ∆ modulator, the output of the first integrator stage w_1(z) can be written as

\[ w_1(z) = \big( X(z) - z^{-1}Y(z) \big)\, H_1(z) \qquad (3.12) \]

and the output w_2(z) of the second integrator can be written as

\[ w_2(z) = \big( w_1(z) - z^{-1}Y(z) \big)\, H_2(z) \qquad (3.13) \]

The quantization at the output stage is modelled by an additive noise Q(z). The output of the modulator Y(z) is given by

\[ Y(z) = w_2(z) + Q(z) \qquad (3.14) \]

From the above three equations, eliminating the variables w_1(z) and w_2(z), we get

\[ Y(z) = \frac{H_1(z)H_2(z)}{1 + z^{-1}H_2(z)\,(1 + H_1(z))}\, X(z) + \frac{1}{1 + z^{-1}H_2(z)\,(1 + H_1(z))}\, Q(z) \qquad (3.15) \]

Replacing H_1(z) and H_2(z) by the integrator z-domain transfer function

\[ H_1(z) = H_2(z) = \frac{1}{1 - z^{-1}} \qquad (3.16) \]

in eq. (3.15), we have

\[ Y(z) = z^{-1}X(z) + (1 - z^{-1})^2\, Q(z) \qquad (3.17) \]

which gives the noise transfer function (NTF) as (1 − z^{-1})². The second-order modulator provides more quantization noise suppression over the low-frequency signal band than the first-order modulator, thereby giving a higher resolution of the output signal. The second-order modulator shown in Fig. 3.2 can be extended to a higher-order modulator by using L integrators in succession. The noise transfer function of such a modulator can be derived to be of the form

\[ NTF = \frac{Y(z)}{Q(z)} = (1 - z^{-1})^{L} \qquad (3.18) \]
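A behavioural simulation of this structure is straightforward; the sketch below (illustrative, with an assumed 1-bit quantizer and a small DC input) implements the two-integrator loop of Fig. 3.2 and checks that the bitstream average approaches the input while the quantization error is shaped as (1 − z⁻¹)².

```python
import numpy as np

def second_order_sd(x):
    """Second-order Sigma-Delta modulator (Fig. 3.2) with a 1-bit quantizer.
    Two cascaded integrators; the delayed output is fed back to both stages."""
    y = np.empty(len(x))
    w1 = w2 = 0.0
    y_prev = 0.0
    for n, xn in enumerate(x):
        w1 = w1 + (xn - y_prev)          # first integrator, input x - z^-1*y
        w2 = w2 + (w1 - y_prev)          # second integrator, input w1 - z^-1*y
        y[n] = 1.0 if w2 >= 0 else -1.0  # 1-bit quantizer: y = w2 + q
        y_prev = y[n]
    return y

x = np.full(1 << 14, 0.3)              # small DC input (keeps the loop well-behaved)
y = second_order_sd(x)
print("bitstream mean:", y.mean())     # close to 0.3
# The spectrum of (y - x) rises roughly 40 dB/decade at low frequency,
# i.e. the (1 - z^-1)^2 shaping of eq. (3.17).
```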
Figure 3.3: Noise transfer function (NTF) of a first, second and third order Σ∆ modulator

Figure 3.3 shows the improvement in the noise-shaping characteristics of a Σ∆ modulator as the order of the modulator increases. It can be shown that, for a given oversampling ratio, the resolution of the modulator increases by 6dB, or 1 bit, for every increase in the order. The noise-shaping characteristics of the higher-order modulators can be exploited in the optimization of the learning system introduced in Section 2.2. Fig. 3.4 shows the performance of the Σ∆ gradient descent algorithm when the loop filter H(z) in Fig. 3.1 is replaced by a higher-order Σ∆ modulator. The figure clearly shows that the tracking error decreases as the order of the Σ∆ modulator increases. For a given oversampling ratio, the SNR of the output increases by 6dB, or 1 bit, when the order of the Σ∆ modulator increases by one.

Figure 3.4: Error (in dB) vs normalized frequency of the gradient descent algorithm with higher order (1st, 2nd and 4th order) Σ∆ modulation

3.2 Batch Gradient Descent Optimization

The online gradient descent algorithm introduced in eq. (2.3) uses only the instantaneous value of the gradient for optimization. In the case of noise-corrupted signals, a better estimate of the gradient direction can be obtained by using the knowledge of the gradient direction from previous iterations [17]. More precisely, for the standard first-order Σ∆ optimization, the feedback learning transfer function F(z) from Fig. 3.1 can be used to accommodate previous values of the output pulse d in the gradient estimate. We consider the case of batch gradient descent where the learning parameter θ is updated using the previous two values of the Σ∆ modulator output. The update for the parameter θ using the gradient descent algorithm in this case is given by

\[ \theta_n = \theta_{n-1} - \epsilon\,(d_n + d_{n-1}) \qquad (3.19) \]

Taking the z-transform of the above equation, we have

\[ (1 - z^{-1})\,\Theta(z) = \epsilon\,(1 + z^{-1})\,D(z) \qquad (3.20) \]

which gives

\[ \Theta(z) = \frac{\epsilon\,(1 + z^{-1})}{1 - z^{-1}}\, D(z) \qquad (3.21) \]

Comparing this to eq. (3.6), the feedback learning transfer function F(z) is given by

\[ F(z) = \frac{\epsilon\,(1 + z^{-1})}{1 - z^{-1}} \qquad (3.22) \]

For a first-order sigma-delta modulator, the loop transfer function H(z) is given by

\[ H(z) = \frac{z^{-1}}{1 - z^{-1}} \qquad (3.23) \]

Substituting eqs. (3.22) and (3.23) back into the generalized error equation (3.10), the error transfer function becomes

\[ \Theta(z) = \frac{1 - z^{-2}}{(1-\epsilon) - (1+\epsilon)\,z^{-1}}\, Q(z) \qquad (3.24) \]

Figure 3.5: Variation of the relative quantization error w.r.t. the normalized frequency for quantized gradient descent and batch gradient descent algorithms.

Fig. 3.5 shows the noise-shaping characteristics of the above batch optimization technique compared to the standard quantized gradient descent approach. The quantization noise level at lower frequencies is attenuated in the batch gradient descent algorithm, whereas for the quantized gradient descent case the noise floor is fairly constant throughout the frequency range.
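In time-domain form the batch update of eq. (3.19) only requires remembering the previous modulator output; the sketch below (an illustrative reconstruction under the same tracking setup and 1-bit quantizer assumed earlier) adds this two-tap feedback to the first-order Σ∆ gradient-descent loop.

```python
import numpy as np

def sd_batch_gd(y, eps=0.25):
    """First-order Sigma-Delta gradient descent with the batch update of eq. (3.19):
    theta_n = theta_{n-1} - eps*(d_n + d_{n-1})."""
    theta, w, d_prev = 0.0, 0.0, 0.0
    out = np.empty(len(y))
    for n, yn in enumerate(y):
        d = 1.0 if w >= 0 else -1.0         # 1-bit quantizer output d_n
        w = w + (theta - yn) - d            # first-order Sigma-Delta loop, input g = theta - y
        theta = theta - eps * (d + d_prev)  # eq. (3.19): uses the two latest outputs
        d_prev = d
        out[n] = theta
    return out

y = np.sin(2 * np.pi * 1e-3 * np.arange(1 << 14))
theta = sd_batch_gd(y)
# The (1 + z^-1) factor in the feedback filter suppresses the highest-frequency
# component of the bitstream before it reaches theta (cf. Fig. 3.5).
```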
Figure 3.6: Error (in dB) vs normalized frequency for gradient descent, first order Σ∆ and Σ∆ gradient descent with feedback filter F(z) for ε = 0.5

Fig. 3.6 shows the results when the batch gradient descent algorithm is used for optimization of the tracking problem discussed in Section 2.2. The figure clearly shows that at lower signal frequencies, the batch gradient descent has a much lower error compared to the Σ∆ version of the algorithm.

Chapter 4

Gradient-Free Stochastic Optimization

In many real-life applications, the functional relationship between the optimization parameters and the loss function measurements is not readily available. The gradient-based optimization algorithms discussed in Chapter 2 assume that such a relationship exists and that the gradient can be readily computed from it. But in many complex systems, such as problems dealing with adaptive control, image restoration from noisy data, training of neural networks and discrete-event systems, the gradient measurement is either not possible or computationally expensive. The models representing such complex systems may be highly inaccurate and undependable. In such cases, several gradient-less algorithms have been proposed in the literature which make use of direct loss function measurements instead of the gradient measurement. Direct search algorithms refer to the class of optimization techniques which do not require the gradient value in the estimate of the optimal value. The pattern search algorithm [1], the Nelder-Mead simplex algorithm [14] and the Rosenbrock algorithm [15] are some of the popular zeroth-order optimization techniques which make use of only the loss function measurement in the optimization. The general idea behind these algorithms is that the optimization parameter is moved in the direction in which the loss function is decreasing. Another class of gradient-free algorithms, like the Kiefer-Wolfowitz finite-difference stochastic approximation (FDSA) and simultaneous perturbation stochastic approximation (SPSA), attempts to approximate the gradient using only loss function measurements. These gradient-approximation algorithms lead to asymptotic convergence of the parameter estimate to the optimal solution. The problem with such algorithms, though, lies in the fact that their convergence is usually slower than that of gradient-based algorithms [18]. But depending on the loss function being considered, the cost of evaluating the gradient may outweigh the speed of the gradient-based algorithms, in which case gradient-approximation algorithms become handy. In this chapter, we extend the noise-shaping optimization technique introduced in Chapter 2 to the case when only loss function measurements can be made and the gradient is not available. We introduce the Σ∆ optimization framework into the gradient-free algorithms discussed above.

4.1 Finite Difference Stochastic Approximation

Consider the optimization of a system with p adjustable parameters and let the objective function be L(θ). Here the optimization parameter θ is a p-dimensional vector such that θ ∈ R^p, where p > 1.
If ĝ(θ) denotes the approximation of the gradient obtained from the loss-function measurements, then this approximation can be used in the iterative learning of the optimization parameter θ by gradient descent as follows:

θ^k_{n+1} = θ^k_n - ε_n ĝ^k(θ_n)    (4.1)

Here the index n represents the iteration number and the index k represents the k-th element of the vector θ, k ∈ {1, 2, ..., p}. In the finite-difference approximation method, each component ĝ^k of the gradient with respect to the k-th element is calculated by perturbing the parameter θ along the direction of the unit vector u_k representing the direction of θ^k. The approximation can be either uni-directional or bi-directional with respect to the unit vector u_k. In the case of bi-directional estimation, the finite-difference approximation is given by

ĝ^k(θ_n) = \frac{L(θ_n + c_n u_k) - L(θ_n - c_n u_k)}{2c_n}    (4.2)

In the case of the finite-difference approximation of ĝ(θ), 2p evaluations of the loss function are needed at each iteration to obtain the complete gradient estimate along all p dimensions.

4.2 Simultaneous Perturbation Stochastic Approximation

In the case of the simultaneous perturbation approximation of the gradient ĝ(θ), the vector θ is perturbed simultaneously along all p dimensions instead of evaluating the loss function individually along each dimension. The SPSA approximation of the gradient makes use of the fact that a properly generated random change of the vector θ contains the same amount of information about the gradient as p separate perturbations of the same vector θ along each dimension. The gradient estimate in the case of SPSA is given by

ĝ^k(θ_n) = \frac{L(θ_n + c_n ∆_n) - L(θ_n - c_n ∆_n)}{2c_n ∆^k_n}    (4.3)

where ∆_n is the perturbation vector along all p dimensions. Under certain statistical conditions on c_n, ∆_n and the gain sequence ε_n [20], the SPSA algorithm is known to converge asymptotically to the optimal solution, θ → θ*. One particular choice of ∆_n known to aid the convergence is a Bernoulli sequence {±1}. The SPSA algorithm gains heavily over the finite-difference method, as only two measurements of the loss function are required to estimate the gradient along all p dimensions, while the asymptotic convergence rate of both algorithms is the same.
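As an illustration of the difference in measurement cost between the two estimators, the sketch below implements the bi-directional finite-difference estimate of eq. (4.2) and the SPSA estimate of eq. (4.3) in Python; the toy quadratic loss, the perturbation size c and all function names are assumptions made for the example only.

import numpy as np

def fdsa_gradient(L, theta, c):
    """Bi-directional finite-difference gradient estimate (eq. 4.2):
    2p loss evaluations for a p-dimensional parameter vector."""
    p = theta.size
    g = np.zeros(p)
    for k in range(p):
        u = np.zeros(p)
        u[k] = 1.0                                   # unit vector along dimension k
        g[k] = (L(theta + c * u) - L(theta - c * u)) / (2.0 * c)
    return g

def spsa_gradient(L, theta, c, rng):
    """SPSA gradient estimate (eq. 4.3): only two loss evaluations,
    using a Bernoulli {+1, -1} simultaneous perturbation."""
    delta = rng.choice([-1.0, 1.0], size=theta.size)
    diff = L(theta + c * delta) - L(theta - c * delta)
    return diff / (2.0 * c * delta)                  # element-wise division by delta_k

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    target = np.array([0.5, 1.0])
    L = lambda th: 0.5 * np.sum((th - target) ** 2)  # toy quadratic loss (assumed)
    theta = np.zeros(2)
    print("FDSA estimate:", fdsa_gradient(L, theta, c=0.01))
    print("SPSA estimate:", spsa_gradient(L, theta, c=0.01, rng=rng))

Averaged over many Bernoulli perturbations, the SPSA estimate approaches the same gradient that FDSA measures, which is what makes the two-measurement scheme attractive when loss evaluations are expensive.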
In this chapter, we extend the Σ∆ gradient-approximation framework of Chapter 2 to the case where the gradient is estimated using SPSA. As the true gradient g(θ) is not available, the approximation ĝ(θ) generated by either SPSA (eq. (4.3)) or FDSA (eq. (4.2)) can be used in the noise-shaping optimization. Fig. 4.1 shows the block diagram of the SPSA algorithm introduced into the Σ∆ optimization framework.

Figure 4.1: Block diagram of the Σ∆ SPSA gradient descent algorithm

The mathematical model for the stochastic optimization is given by

ω_n ← ω_{n-1} + [ĝ_n(θ_n) - d_n]    (4.4)
d_n ← Q(ω_{n-1})    (4.5)
θ_n ← θ_{n-1} - ε d_n    (4.6)

where all the variables are vectors of dimension p. For a single-bit quantizer, the quantization function Q(·) is the sign function, Q(ω) = sgn(ω), in which case d_n becomes a Bernoulli sequence {±1} and we can replace ∆_n by d_n. Since d_n is the output of a single-bit quantizer, eq. (4.4) can be written as

ω_n = ω_{n-1} + \frac{L(θ_n + c_n d_n) - L(θ_n - c_n d_n)}{2c_n d^k_n} - d_n    (4.7)

which gives

ω_n = ω_{n-1} + \frac{d_n}{2c_n}\left[L(θ_n + c_n d_n) - L(θ_n - c_n d_n) - 2c_n\right]    (4.8)

The above equation shows that there is no need to generate the random direction vector ∆_n, since the output bit-stream of the modulator itself can be used as the direction of perturbation. Fig. 4.2 shows that, in fact, for a simple enough loss function the Σ∆ SPSA gradient descent algorithm performs as well as an ideal gradient descent algorithm with respect to the rate of convergence. The loss function under consideration is

L(t; θ_1, θ_2) = \frac{1}{2}\left(y(t) - θ_1 x_1(t) - θ_2 x_2(t)\right)^2    (4.9)

where the input signal y(t) = a_1 sin(ω_1 t) + a_2 sin(ω_2 t) was chosen as a mixture of two different tones at ω_1 and ω_2. The optimization aims at finding the magnitude of a particular frequency component present in the input y(t).

Figure 4.2: The loss function along with the optimized parameters θ_1, θ_2 for a two-variable optimization problem using (a) true gradient descent optimization and (b) Σ∆ SPSA gradient descent optimization

Figure 4.3: Error magnitude (in dB) of the tracking system optimized with the Σ∆ SPSA gradient approximation (in blue) and the quantized SPSA gradient approximation (in red)

From the simulations in Fig. 4.2 it can indeed be seen that the Σ∆ SPSA gradient algorithm converges the values of θ_1, θ_2 to a_1, a_2 respectively. Fig. 4.3 shows the performance of the Σ∆ gradient descent algorithm with the gradient estimated by SPSA, as opposed to the quantized version of the SPSA gradient. The noise-shaping characteristic of the Σ∆ learning algorithm discussed in the previous chapters is preserved even when the gradient is approximated through stochastic perturbation methods.
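The following is a small Python sketch of the Σ∆ SPSA loop of eqs. (4.4)–(4.8), applied to a two-parameter tone-matching loss in the spirit of eq. (4.9); the step sizes, tone frequencies, amplitudes and all names are illustrative assumptions, and the loop is a simplified software model rather than the thesis's simulation code.

import numpy as np

def sigma_delta_spsa(loss, theta0, eps=0.005, c=0.01, steps=5000):
    """Sigma-delta SPSA gradient descent (eqs. 4.4-4.8): the 1-bit modulator
    output d_n doubles as the Bernoulli perturbation direction."""
    theta = np.array(theta0, dtype=float)
    omega = np.zeros_like(theta)                   # per-dimension integrator state
    for n in range(steps):
        d = np.where(omega >= 0, 1.0, -1.0)        # d_n = sgn(omega_{n-1}), entries in {+1, -1}
        diff = loss(theta + c * d, n) - loss(theta - c * d, n)
        omega += d * (diff - 2.0 * c) / (2.0 * c)  # eq. (4.8): g_hat_n - d_n folded together
        theta -= eps * d                           # eq. (4.6): update driven by the bit stream
    return theta

if __name__ == "__main__":
    a1, a2 = 0.5, 1.0
    w1, w2 = 2 * np.pi * 0.01, 2 * np.pi * 0.03    # normalized tone frequencies (assumed)
    def loss(th, n):
        # eq. (4.9)-style instantaneous loss: match a two-tone input with theta1, theta2
        y = a1 * np.sin(w1 * n) + a2 * np.sin(w2 * n)
        return 0.5 * (y - th[0] * np.sin(w1 * n) - th[1] * np.sin(w2 * n)) ** 2
    print(sigma_delta_spsa(loss, [0.0, 0.0]))      # expected to approach (a1, a2), up to dither

Because the quantizer always emits ±1, the estimates dither around (a_1, a_2) with an amplitude set by ε rather than settling exactly.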
Chapter 5

Application to Analog Circuit Calibration

As opposed to digital circuits, the performance of fabricated prototypes of analog circuits usually shows a high degree of variation from the expected behavior. This effect is even more pronounced in subthreshold analog circuits. However, low-power operation and direct interfacing to real-world signals make analog circuits indispensable in any practical design. The dependence of transistor characteristics on temperature, process variations and mismatches between different components on the die makes the calibration of analog circuits even more difficult [23]. Digitally-assisted calibration techniques have been proposed in the literature to calibrate analog circuits [21], [22]. However, the A/D conversion they rely on introduces unwanted quantization noise into the calibration process, which makes them poorly suited to calibrating analog circuits operating in the subthreshold region, where non-linearity and mismatch are large.

In this chapter, we present the hardware realization of the noise-shaping optimization framework discussed in the previous chapters. We introduce it as an online learning algorithm to calibrate a feature extraction unit that was developed for speaker verification purposes. The feature extraction unit consists of a bandpass filter stage, a rectification stage and a Σ∆ modulator that is used to measure the magnitude envelope of the speech signal in each frequency band. Each of the stages is described in detail, and the problems arising in the calibration of the Σ∆ modulator and the bandpass filter stages are discussed. The variability of the circuit performance, especially in the subthreshold region of operation, due to mismatch in device parameters is also studied. The optimization framework is validated on the fabricated chip for two different cases: (a) balancing an unbalanced Σ∆ modulator, and (b) calibrating the center frequency of a bandpass filter.

5.1 Analog Circuit Design of a Spectral Feature Extractor

The block diagram of a single channel of the analog spectral feature extraction unit, along with the calibration circuit, is shown in Figure 5.1. It consists of a bandpass filter stage, a rectification stage and a Σ∆ modulator. The bandpass filter captures only the in-band frequency component of the input speech signal. The rectification stage gives the magnitude envelope of the filtered speech signal. The modulator in each channel generates a bit-stream proportional to the energy of the speech signal in that particular frequency band. This bit-stream is used as features for speaker verification. Each of the above blocks is programmed with the help of serial-in current DACs.

Figure 5.1: Block diagram of the feature extraction unit used in the speaker verification system

5.1.1 Bandpass Filter Realization

Figure 5.2 shows the realization of the bandpass filter stage using a gm-C filter design. The transconductors used for the realization of the filter are described in the next section.

Figure 5.2: Schematic diagram of the gm-C biquad filter with the bandpass filter output

The output currents of the transconductors in Fig. 5.2 are given by:

i_1 = g_{m1} V_{in}    (5.1)
i_2 = g_{m2}(V_{lp} - V_{bp})    (5.2)
i_3 = -g_{m3} V_{bp}    (5.3)

The currents through the capacitors C_1, C_2 are given by

i_1 + i_2 = C_1 s V_{bp}    (5.4)
i_3 = C_2 s V_{lp}    (5.5)

Putting together eqns. (5.1)–(5.5) and solving for the bandpass filter output V_{bp}(s), we get

V_{bp}(s) = \frac{\frac{g_{m1}}{C_1} s}{s^2 + \frac{g_{m2}}{C_1} s + \frac{g_{m2} g_{m3}}{C_1 C_2}} V_{in}(s)    (5.6)

The transfer function of a generic second-order bandpass filter with center frequency ω_0, quality factor Q and gain G is given by

H(s) = \frac{G \frac{ω_0}{Q} s}{s^2 + \frac{ω_0}{Q} s + ω_0^2}    (5.7)

Comparing eqns. (5.6) and (5.7), we have the following expressions for the center frequency ω_0, quality factor Q and gain G:

ω_0 = \sqrt{\frac{g_{m2} g_{m3}}{C_1 C_2}}    (5.8)

Q = \sqrt{\frac{g_{m3} C_1}{g_{m2} C_2}}    (5.9)

G = \frac{g_{m1}}{g_{m2}}    (5.10)

Equations (5.8)–(5.10) show that there is a non-linear relationship between the biasing parameters g_{m1}, g_{m2}, g_{m3} and the desired performance of the bandpass filter given by ω_0, Q and G. This makes it difficult to program the bandpass filter to its desired operating point. Considering the problems with subthreshold operation of the filters and of the DACs used to calibrate them, this non-linear relationship further increases the complexity of programming.
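As a numerical illustration of this non-linear mapping, the sketch below evaluates eqs. (5.8)–(5.10) and their inverse in Python; the capacitor values, the target specification and the function names are assumptions chosen for the example, not component values from the fabricated chip.

import math

def biquad_params(gm1, gm2, gm3, C1, C2):
    """Map transconductances and capacitances to (w0, Q, G) per eqs. (5.8)-(5.10)."""
    w0 = math.sqrt(gm2 * gm3 / (C1 * C2))
    Q = math.sqrt(gm3 * C1 / (gm2 * C2))
    G = gm1 / gm2
    return w0, Q, G

def biquad_biases(w0, Q, G, C1, C2):
    """Invert the mapping: the required gm values depend non-linearly on the
    target (w0, Q, G), which is what complicates open-loop programming."""
    gm2 = w0 * C1 / Q
    gm3 = w0 * Q * C2
    gm1 = G * gm2
    return gm1, gm2, gm3

if __name__ == "__main__":
    C1 = C2 = 10e-12                              # example capacitances (assumed)
    target = (2 * math.pi * 2e3, 1.0, 1.0)        # w0 = 2*pi*2 kHz, Q = 1, G = 1 (assumed)
    gm1, gm2, gm3 = biquad_biases(*target, C1, C2)
    print("required gm values (S):", gm1, gm2, gm3)
    print("round-trip check:", biquad_params(gm1, gm2, gm3, C1, C2))

In the subthreshold region each g_m also depends on its bias in a way that is sensitive to mismatch, so even this closed-form inversion does not translate reliably into DAC codes; this is part of what motivates the on-chip learning approach used later in the chapter.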
5.1.2 Transconductor

Fig. 5.3 shows the schematic diagram of the transconductor circuit used in the realization of the bandpass filter described above. The output current of the transconductor is directly proportional to the differential input signal (V_{in} - V_{ref}) and the bias current of the transconductor (I_{bias}). The output stage uses a cascoded structure (M4, M6, M7 and M8) in order to increase the output impedance of the transconductor. This helps in reducing the loading effect on the output current of the transconductor. The range of operation of the bandpass filter described in Fig. 5.2 directly depends on the linear range of operation of the transconductor. Several topologies have been studied in the literature [24] to improve the linearity of the transconductor. In our circuit, the bump circuit proposed in [24] has been used. The bump circuit (formed by transistors MB1–MB4) draws part of the input bias current away from the input transistors (M3–M4), thereby reducing their g_m. This improves the linear range of the transconductor. The linear range of this transconductor has been shown to be around 300 mV in the saturation region [26]. The transconductance and the linear range of a FET-OTA have a square-root relationship with the bias current.

For the center-frequency calibration of the bandpass filter (case (b) above), when the center frequency of the filter is not equal to the input-tone frequency ω_0, there is a non-zero residual signal at V_{out}. The minimum point at ω_0 can be reached by using a Σ∆ gradient descent algorithm with the loss function described in eq. (5.34). The on-chip calibration method described above has been verified on hardware using the fabricated prototype. For this experiment, the bandpass filter is biased at unity gain G and quality factor Q. The input V_{in1} to the filter is a tone whose frequency is the same as the center frequency ω_0 to which the bandpass filter is to be calibrated.

Figure 5.21: Moving average of the Σ∆ modulator output showing the minimum when the center frequency reaches the frequency of the input tone (averaged modulator output vs. time)

Equation (5.32) shows that the bandpass filter has a zero at the origin, thereby introducing a phase difference of 90° at the output compared to V_{in1}. In order to facilitate the correct cancellation of the signal at the output due to the lowpass filter stage, the second input V_{in2} is phase shifted by the same 90° with respect to V_{in1}. Figure 5.20 shows both inputs to the filter and the non-zero residual filter output V_{out} for an uncalibrated bandpass filter with center frequency not equal to the input tone ω_0.

Figure 5.21 shows the results of the calibration from the fabricated chip. The Σ∆ modulator was run at a sampling frequency of 250 kHz. The input tone was set at ω = 2 kHz and the bandpass filter was initially programmed to around 1 kHz using voltage biasing. The center frequency of the filter was adjusted according to the learning algorithm by an FPGA, which tuned the voltage biasing of the transconductors. The modulator output was filtered by taking a 1024-sample moving average of the bit-stream. Fig. 5.21 shows that the modulator output does not go to zero after optimization. This is because of a constant offset current from the rectifier due to imperfect cancellation of the inputs at the filter stage. Also, once the minimum point ω_0 is reached, the modulator does not remain at that point, because the rectifier does not estimate the true gradient of the loss function defined in eq. (5.34) but only gives its magnitude, which is always positive. The actual calibration point can be obtained by taking the summation of the bit-stream up until the minimum point ω_0 is reached.
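This post-processing step can be sketched in Python as follows; the synthetic bit-stream merely stands in for the measured modulator output, and the window length, sweep values and function names are illustrative assumptions.

import numpy as np

def locate_calibration_point(bitstream, bias_codes, window=1024):
    """Smooth the 1-bit modulator output with a moving average (as in Fig. 5.21)
    and report the bias setting at which the averaged residual is smallest,
    i.e. where the filter's center frequency best matches the input tone."""
    bits = np.asarray(bitstream, dtype=float)
    kernel = np.ones(window) / window
    avg = np.convolve(bits, kernel, mode="valid")     # 1024-sample moving average
    idx = int(np.argmin(avg))                         # minimum of the averaged residual
    center = min(idx + window // 2, len(bias_codes) - 1)
    return bias_codes[center], avg

if __name__ == "__main__":
    # Synthetic example (assumed data, not chip measurements): the averaged
    # residual dips as the swept center frequency passes the 2 kHz input tone.
    n = 50000
    codes = np.linspace(1e3, 3e3, n)                  # swept center frequency (Hz)
    residual = 0.2 + np.abs(codes - 2e3) / 2e3        # offset floor plus tuning error
    bits = (np.random.default_rng(1).random(n) < residual).astype(float)
    best, avg = locate_calibration_point(bits, codes)
    print("estimated calibration point (Hz):", round(float(best), 1))

The constant 0.2 floor in the synthetic residual plays the role of the rectifier offset noted above, which is why the averaged output reaches a minimum near ω_0 but does not go to zero.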
Chapter 6

Conclusion

A novel Σ∆ gradient descent optimization algorithm has been presented in this work. The proposed learning algorithm has been shown to have better precision than the traditionally used noisy gradient descent algorithm, by using the noise-shaping characteristics of the Σ∆ modulator in real-time tracking of system parameters. The stochastic nature of the noise in the quantization stage has been shown to be helpful for global convergence, as opposed to the ideal gradient descent algorithm, which can converge to a local minimum. The proposed algorithm was then extended to include higher-order noise-shaping and learning mechanisms. The performance improvement of the algorithm was shown through simulation on a sample speech utterance from the YOHO database. The Σ∆ optimization framework was further extended to accommodate gradient-free optimization techniques such as finite-difference stochastic approximation and simultaneous perturbation stochastic approximation.

The performance of the Σ∆ gradient descent algorithm was then validated on hardware by modeling the calibration of analog circuits as an optimization problem. The Σ∆ learning was shown to be robust enough to account for mismatches and non-linearities even in the subthreshold region of operation. The self-calibration of an unbalanced Σ∆ modulator converged to the global minimum even in the presence of several local minima caused by the non-monotonicity of the calibration DACs in the subthreshold region. The algorithm was also used to calibrate the center frequency of a gm-C biquad filter by exploiting the lowpass and bandpass filter stages present within the biquad filter.

BIBLIOGRAPHY

[1] R. Hooke and T. A. Jeeves, "Direct search solution of numerical and statistical problems," J. ACM, vol. 8, pp. 212-229, 1961.

[2] M. J. D. Powell, "Direct search algorithms for optimization calculations," Acta Numerica, vol. 7, pp. 287-336, 1998.

[3] R. Battiti, "First- and second-order methods for learning: Between steepest descent and Newton's method," Neural Computation, vol. 4, pp. 141-156, 1992.

[4] D. E. Goldberg, Genetic Algorithms in Search, Optimization and Machine Learning, Addison-Wesley, p. 41, ISBN 0201157675.

[5] D. Bertsimas and J. Tsitsiklis, "Simulated annealing," Statistical Science, vol. 8, no. 1, pp. 10-15, 1993.

[6] J. Kennedy and R. Eberhart, "Particle swarm optimization," IEEE International Conference on Neural Networks, vol. 4, pp. 1942-1948, Nov. 1995.

[7] R. Battiti, "Reactive search: Toward self-tuning heuristics," John Wiley & Sons, Chichester, pp. 61-83, 1996.

[8] R. Battiti and M. Brunato, "The reactive tabu search," ORSA Journal on Computing, vol. 6, no. 2, pp. 126-140, 1994.

[9] Z. Luo, "On the convergence of the LMS algorithm with adaptive learning rate for linear feedforward networks," Neural Computation, vol. 3, pp. 226-245, 1991.

[10] S. Gelfand and S. K. Mitter, "Recursive stochastic algorithms for global optimization," SIAM J. Control and Optimization, vol. 29, pp. 999-1018, Sept. 1991.

[11] H. Fang, G. Gong, and M. Qian, "Annealing of iterative stochastic schemes," SIAM J. Control and Optimization, vol. 35, pp. 1886-1907, 1997.

[12] S. Gelfand and S. K. Mitter, "Simulated annealing with noisy or imprecise energy measurements," J. Optim. Theory Appl., vol. 62, pp. 49-62, 1989.

[13] H. Robbins and S. Monro, "A stochastic approximation method," Ann. Math. Statist., vol. 22, pp. 400-407, 1951.

[14] J. A. Nelder and R. Mead, "A simplex method for function minimization," Comput. J., vol. 7, pp. 308-313, 1965.

[15] H. H. Rosenbrock, "An automatic method for finding the greatest or least value of a function," Comput. J., vol. 3, pp. 175-184, 1960.

[16] J. R. Blum, "Approximation methods which converge with probability one," Ann. Math. Statist., vol. 25, pp. 382-386, 1954.
[17] L. Bottou, "Stochastic learning," Advanced Lectures on Machine Learning, pp. 146-168, Springer, 2003.

[18] J. Kiefer and J. Wolfowitz, "Stochastic estimation of the maximum of a regression function," Ann. Math. Statist., vol. 23, pp. 462-466, 1952.

[19] J. C. Spall, "A stochastic approximation technique for generating maximum likelihood parameter estimates," Proc. Amer. Control Conf., pp. 1161-1167, 1987.

[20] J. C. Spall, "Multivariate stochastic approximation using a simultaneous perturbation gradient approximation," IEEE Trans. Automat. Control, vol. 37, pp. 332-341, 1992.

[21] B. Murmann, "Digitally assisted analog circuits," IEEE Micro, vol. 26, pp. 38-47, Mar. 2006.

[22] B. Murmann and B. Boser, "A 12-bit 75-MS/s pipelined ADC using open-loop residue amplification," IEEE J. Solid-State Circuits, vol. 38, no. 12, pp. 2040-2050, Dec. 2003.

[23] P. R. Kinget, "Device mismatch and tradeoffs in the design of analog circuits," IEEE Journal of Solid-State Circuits, vol. 40, no. 6, pp. 1212-1224, Jun. 2005.

[24] P. M. Furth and H. A. Ommani, "Low-voltage highly-linear transconductor design in subthreshold CMOS," Proceedings of the 40th Midwest Symposium on Circuits and Systems, vol. 1, pp. 156-159, Aug. 1997.

[25] A. Gratz, "Operational Transconductance Amplifiers."

[26] A. Gore, "Design of High-Dimensional Oversampling Data Converters with On-chip Learning: Theory, Algorithm and Hardware Realization."

[27] A. Gore, S. Chakrabartty, S. Pal and E. C. Alocilja, "A multichannel femtoampere-sensitivity potentiostat array for biosensing applications," IEEE Transactions on Circuits and Systems I, vol. 53, no. 11, Nov. 2006.

[28] S. C. Wong et al., "A CMOS mismatch model and scaling effects," IEEE Electron Device Lett., vol. 18, pp. 261-263, June 1997.

[29] S. Lovett et al., "Characterizing the mismatch of submicron MOS transistors," IEEE Int. Conf. Microelectronic Test Structures, vol. 9, pp. 39-42, Mar. 1996.

[30] G. Cauwenberghs, "Neuromorphic Learning VLSI Systems: A Survey."

[31] Y. P. Tsividis and K. Suyama, "MOSFET modeling for analog circuit CAD: Problems and prospects," IEEE Journal of Solid-State Circuits, vol. 29, no. 3, March 1994.