NOISE-SHAPING STOCHASTIC OPTIMIZATION AND ONLINE LEARNING WITH APPLICATIONS TO DIGITALLY-ASSISTED ANALOG CIRCUITS

By Ravi Krishna Shaga

A THESIS Submitted to Michigan State University in partial fulfillment of the requirements for the degree of MASTER OF SCIENCE, Electrical Engineering, 2011

ABSTRACT

NOISE-SHAPING STOCHASTIC OPTIMIZATION AND ONLINE LEARNING WITH APPLICATIONS TO DIGITALLY-ASSISTED ANALOG CIRCUITS

By Ravi Krishna Shaga

Analog circuits that use on-chip digital-to-analog converters for calibration rely on DSP-based algorithms for optimizing and calibrating the system parameters. However, the performance of traditional online gradient-descent based optimization and calibration algorithms suffers from artifacts due to quantization noise, which adversely affect real-time and precise convergence to the desired parameters. This thesis proposes and analyzes a novel class of online learning algorithms that can noise-shape the effect of quantization noise during the adaptation procedure and, in the process, achieve faster spectral convergence compared to the conventional quantized gradient-descent approach. We extend the proposed framework to higher-order noise-shaping and derive criteria for achieving optimal system performance. The thesis also explores the application of stochastic perturbative gradient-descent techniques to the proposed noise-shaping online learning framework, where we show that the performance of the stochastic algorithm can be improved in the spectral domain. The thesis applies the proposed optimization method to online calibration of subthreshold analog circuits, where artifacts like mismatch and non-linearity are more pronounced. Using measured results obtained from a prototype fabricated in a 0.5µm CMOS process, we demonstrate the robustness of the proposed algorithm for the tasks of: (a) compensating for and tracking offset parameters; and (b) calibrating the center frequency of a sub-threshold gm-C biquad filter.

Dedicated to my parents, Narsaiah and Pushpa Latha, for their unconditional love and support

ACKNOWLEDGMENT

The completion of my graduate studies has been one of the most significant academic challenges that I have undertaken. This endeavour would not have been possible without the help and support of several people. I wish to express my gratitude to all of them. First and foremost, I would like to take this opportunity to express my deepest appreciation to Prof. Shantanu Chakrabartty for his support, patience and guidance during my graduate studies. As an academic advisor, he was always approachable and ready to help in every way possible. My special thanks to him for helping me in some of my difficult times, both academically and personally. I would also like to thank Prof. Jack R. Deller and Prof. Selin Aviyente for being on my MS thesis committee and guiding me through the process. I would like to extend my special thanks to my labmates Chenling Huang, Amin Fazel, Ming Gu, Amit Gore and Yang Liu, with whom I have spent a considerable amount of time, and I am grateful for their invaluable guidance during the early stages of my research. I have been fortunate enough to be in the company of a lot of good friends who made my journey through graduate school really memorable. They were indeed like a family to me here, helping me in every step of the way. Lastly, I owe my deepest regards and gratitude to my parents, Narsaiah and Pushpa Latha, and my brothers, Murali and Shivaram, for their unconditional love and support, which has carried me to where I am today.
TABLE OF CONTENTS

List of Tables
List of Figures

1 Introduction

2 Gradient Descent Optimization
2.1 Optimization problem
2.1.1 Gradient Descent Algorithm
2.1.2 Noisy Gradient Descent Algorithm
2.1.3 Noise Shaping Gradient Descent Algorithm
2.2 Online Learning Example
2.2.1 Gradient Descent Tracking
2.2.2 Effect of Quantization
2.2.3 Σ∆ Gradient Descent Algorithm

3 Generalized Noise Shaping Algorithm
3.1 Higher Order Σ∆ Modulation
3.2 Batch Gradient Descent Optimization

4 Gradient-Free Stochastic Optimization
4.1 Finite Difference Stochastic Approximation
4.2 Simultaneous Perturbation Stochastic Approximation

5 Application to Analog Circuit Calibration
5.1 Analog Circuit Design of a Spectral Feature Extractor
5.1.1 Bandpass Filter Realization
5.1.2 Transconductor
5.1.3 Current DAC
5.1.4 Halfwave Rectifier
5.1.5 Σ∆ Modulator
5.2 Test Station Setup
5.3 Calibration of Sigma Delta Modulator
5.4 Calibration of a bandpass filter

6 Conclusion

Bibliography

LIST OF TABLES

2.1 SNR (in dB) for different noise levels of the ideal, noisy and Σ∆ gradient descent algorithms for a sample speech utterance '26 81 57' from the YOHO database
2.2 SNR (in dB) for different quantization levels of the different optimization algorithms on a sample speech utterance "26 81 57"
5.1 Measured specifications of the fabricated single-channel bandpass filter chip
5.2 Variation of the bandpass filter characteristics across two different chips operating under similar conditions

LIST OF FIGURES

1.1 Schematic diagram of a generic online learning system
2.1 Block diagram of the noisy gradient descent optimization algorithm
2.2 Block diagram of the first order noise-shaping gradient descent algorithm
2.3 Frequency domain response of the original sinusoid, ideal, noisy and noise-shaped gradient descent algorithms. For interpretation of the references to color in this and all other figures, the reader is referred to the electronic version of this thesis.
2.4 Convergence of the noise shaping gradient descent algorithm to the global minimum
2.5 Tracking a sinusoid signal with ideal, noisy and Σ∆ gradient descent algorithms
2.6 Rate of convergence of the ideal, noisy and Σ∆ gradient descent algorithms
2.7 Error magnitude in the case of ideal (in black), noisy (in blue) and Σ∆ (in green) gradient descent algorithms for a noise floor of (a) 30dB, (b) 10dB, (c) -10dB and (d) -30dB
2.8 FFT of the sample speech utterance "26 81 57" (in red) along with the noisy (in blue) and Σ∆ (in green) learning algorithm
2.9 Real-time tracking of a sinusoid signal with ideal gradient descent algorithm for ε = 0.01, 0.1 and 1.0
2.10 Frequency dependency of the relative error E(jω)/Y(jω) in gradient descent optimization for different values of the adaptation rate α
2.11 Variation of the relative error w.r.t. signal magnitude and quantization error for a quantized gradient descent algorithm for α = 0.5
2.12 Variation of noise floor (in dB) with increase in no. of quantization bits
2.13 Variation of the relative error w.r.t. normalized frequency for a quantized (in red) and Σ∆ (in green) gradient descent algorithm for different values of α
2.14 Error (in dB) vs normalized frequency of ideal gradient descent (in black), quantized gradient descent (in blue) and 1st order Σ∆ gradient descent (in red) optimization algorithms with a 1-bit quantizer for (a) ε = 0.1, (b) ε = 0.2, (c) ε = 0.5 and (d) ε = 1.0
2.15 Error (in dB) vs normalized frequency of ideal gradient descent (in black), quantized gradient descent (in blue) and 1st order Σ∆ gradient descent (in red) optimization algorithms with a 4-bit quantizer for (a) ε = 0.1, (b) ε = 0.2, (c) ε = 0.5 and (d) ε = 1.0
2.16 Error (in dB) vs normalized frequency of ideal gradient descent (in black), quantized gradient descent (in blue) and 1st order Σ∆ gradient descent (in red) optimization algorithms for ε = 0.5, with an n-bit quantizer for (a) n=1, (b) n=2, (c) n=4 and (d) n=8
2.17 SNR (in dB) vs adaptation parameter ε of gradient descent, quantized gradient descent and 1st order Σ∆ gradient descent optimization algorithms with an n-bit quantizer for (a) n=1, (b) n=2, (c) n=4 and (d) n=8
2.18 SNR (in dB) vs no. of quantization bits of gradient descent, quantized gradient descent and 1st order Σ∆ gradient descent optimization algorithms for (a) ε=0.2, (b) ε=0.5, (c) ε=0.8 and (d) ε=0.95
3.1 Generalized block diagram of the Σ∆ gradient descent algorithm
3.2 Block diagram of a second order sigma-delta modulator
3.3 Noise transfer function (NTF) of a first, second and third order Σ∆ modulator
3.4 Error (in dB) vs normalized frequency of the gradient descent algorithm with higher order (1st, 2nd and 4th order) Σ∆ modulation
3.5 Variation of the relative quantization error w.r.t. the normalized frequency for quantized gradient descent and batch gradient descent algorithms
3.6 Error (in dB) vs normalized frequency for gradient descent, first order Σ∆ and Σ∆ gradient descent with feedback filter F(z) for ε = 0.5
4.1 Block diagram of the Σ∆ SPSA gradient descent algorithm
4.2 The loss function along with the optimized parameters θ1, θ2 for a 2-variable optimization problem using (a) true gradient descent optimization and (b) Σ∆ SPSA gradient descent optimization
4.3 Error magnitude (in dB) of the tracking system optimized with Σ∆ SPSA gradient approximation (in blue) and quantized SPSA gradient approximation (in red) methods
5.1 Block diagram of the feature extraction unit used in speaker verification system
5.2 Schematic diagram of the gm-C biquad filter with the bandpass filter output
5.3 Schematic diagram of the transconductor circuit used in the bandpass filter
5.4 DC analysis of the transconductor output current for different values of the bias currents (in subthreshold region)
5.5 Voltage attenuator to increase the linear range of operation of the bandpass filter
5.6 Schematic diagram of the 10-bit current DAC providing the bias current for each transconductor
5.7 Output current vs. input bit of a 6-bit DAC showing the non-monotonic nature of the current DAC in subthreshold region of operation
5.8 The bandpass characteristics of the filter biased with subthreshold current DACs at unity gain and Q = 1
5.9 Schematic diagram of the half-wave rectifier used in the feature extractor
5.10 The schematic diagram of the Σ∆ modulator
5.11 System level diagram of the test station setup
5.12 Test station setup showing the interface between the Opalkelly XEM3010 FPGA, NI data acquisition card, motherboard and the test chip
5.13 Micrograph of (a) single channel feature extractor with bandpass filter, rectifier and Σ∆ modulator stages and (b) the 10-bit current DAC fabricated on a 0.5µm CMOS process
5.14 Offset mismatch between individual channels of an array of Σ∆ modulators for constant input voltage across all the channels
5.15 Figure showing (a) the loss function (1/2)Î²in as a variation of the 10-bit current DAC sweep, (b) the convergence of the modulator output to zero
5.16 Convergence of the modulator output due to In adaptation from the bandpass filter chip
5.17 Mismatch between the performance of bandpass filters from two different chips with same inputs under similar operating conditions
5.18 Block diagram of a single channel feature extraction unit with center frequency calibration
5.19 Frequency response of the bandpass and the lowpass stages of the biquad filter showing the magnitude of the residual signal for input tone of different frequencies
5.20 Oscilloscope figure showing the input tones separated by 90° phase and the residual signal from the filter
5.21 Moving average of the Σ∆ modulator showing the minimum when the center frequency reaches the frequency of the input tone

Chapter 1

Introduction

Optimization problems in real life are often dynamic in nature, i.e., they keep changing with time. Therefore, there is a need to continuously monitor the system and adapt to the variations so that the system performance can be maximized or the losses minimized. From a system-level perspective, the block diagram of a generic online learning system is given in Fig. 1.1. The system is characterized by a performance index L, called the 'cost function' or the 'loss function', which can usually be controlled using a set of adjustable parameters θ. An online learning system makes use of the previous experience of the system output and tries to adapt to the changing input by learning and correcting itself accordingly. The objective of the learning system is to be able to predict the output of the system when a previously unseen input stimulus occurs.

Figure 1.1: Schematic diagram of a generic online learning system

A learning algorithm can usually be modelled as an optimization problem where the set of adjustable parameters θ needs to be adapted in such a way that the loss function L is minimized. Several different classes of optimization algorithms are present in the literature [1], [2]. The choice of the optimization method depends on the system characteristics, the performance index of the system and the constraints of the optimization. Zeroth-order optimization algorithms, like direct search algorithms, make use of direct measurements of the loss function by moving the system adaptation parameter in the direction in which the loss function decreases. First-order optimization algorithms [3] depend on the first-order gradient of the loss function to adapt the system parameters in the direction of steepest descent. Second-order optimization algorithms, like Newton's method, make use of the information present in the Hessian of the loss function to adapt the optimization parameter. As the order of the optimization algorithm increases, the rate of convergence increases, but the requirement of higher-order gradient information puts an additional cost on higher-order optimization algorithms. Several other heuristic and stochastic optimization techniques have been proposed in the literature, including genetic algorithm optimization [4], simulated annealing [5], particle swarm optimization [6] and reactive search algorithms [7], [8]. One of the most popular optimization techniques is the gradient descent learning algorithm. It makes use of the fact that the loss function decreases rapidly along the direction of the slope, imitating a greedy search algorithm in following the steepest descent. One of the major drawbacks of the gradient descent algorithm is that it converges to a local minimum point for a small enough adaptation rate. So, the optimization process is highly susceptible to the choice of the initial point used for the iterative updates. Digitally-assisted analog circuits make use of an A/D conversion step in the calibration of the analog circuits.
Conventional A/D converters, such as Nyquist-rate converters, add quantization noise to the system, making the adaptation process susceptible to noise. This results in a dramatic decrease in the performance of the learning algorithm with respect to the convergence rate. In this thesis, we propose a novel Σ∆ gradient-descent optimization algorithm which can be used for online tracking of system parameters in real-time. The proposed algorithm has superior performance compared to the quantized gradient-descent algorithm, as the noise-shaping characteristics of the Σ∆ modulator suppress the in-band quantization noise and push it into the higher frequency range. The out-of-band quantization noise can then be eliminated by using an appropriate lowpass filter. Also, if the input signal is sufficiently random, the quantization stage of the modulator can be modeled as additive white noise with a flat power spectral density. The randomness in the quantization noise actually helps the optimization process by allowing the optimization parameter to escape a local-minimum neighborhood, depending on the noise floor level. This enables a larger search space for the optimization process. Although global optimization is not guaranteed, the algorithm is more robust than the ideal gradient descent, which is known to converge to a local minimum.

The remaining part of the thesis is organized as follows. In Chapter 2, a mathematical framework for the quantized version of a first-order gradient-descent optimization algorithm is introduced. A novel Σ∆ gradient descent algorithm is proposed which makes use of the noise-shaping properties of the A/D modulator to achieve better convergence properties. The effect of the adaptation parameter on the error rate is thoroughly studied. The performance enhancement of the proposed algorithm is validated through simulations on a speech sample from the YOHO database. Chapter 3 generalizes the proposed algorithm to include a higher-order loop filter and learning mechanism. Chapter 4 deals with gradient-free optimization algorithms (finite-difference and simultaneous perturbation techniques) and the extension of the noise-shaping algorithm to these cases. In Chapter 5, we introduce the practical problems (non-linearities, mismatches, temperature variation) in the calibration of analog circuits operating in the subthreshold region. A single-channel analog feature extractor implemented using a bandpass filter is studied. The Σ∆ learning algorithm is validated on hardware by (a) calibrating an unbalanced first-order Σ∆ modulator for offset cancellation and (b) calibrating the center frequency of a gm-C bandpass filter. Chapter 6 concludes the thesis with some final remarks and observations.

Chapter 2

Gradient Descent Optimization

2.1 Optimization problem

Consider a system where a stochastic parameter X introduces randomness in the measurement of the objective function L(X; θ). Let θ be an adjustable parameter which can be varied as a response to the change in the parameter X. The optimization problem refers to finding the optimal point θ* where the loss function L(X; θ) attains its minimum, i.e.,

\[ \theta^{*} = \arg\min_{\theta} L(X;\theta) \qquad (2.1) \]

Assuming that L(X; θ) is continuous and differentiable w.r.t. θ, the partial derivative g(θ) is given by

\[ g(\theta) = \frac{\partial L(X;\theta)}{\partial \theta} \qquad (2.2) \]

For the remainder of this chapter, the analysis assumes that the gradient of the loss function exists and is readily available.
The case when the gradient is not available or L(θ) is not differentiable is discussed in Chapter 4. Also, the performance analysis is carried out for a single-variable system, but it can be readily extended to a multi-dimensional optimization problem with θ ∈ R^p by using a vectorial representation.

2.1.1 Gradient Descent Algorithm

One of the most popular techniques for solving the above problem is the gradient descent algorithm, where at each step the parameter θ is moved in the direction of the negative slope of the loss function L(X; θ). The iterative update

\[ \theta_n = \theta_{n-1} - \epsilon\, g(\theta_{n-1}) \qquad (2.3) \]

converges the loss function to the minimum point given in eq. (2.1). Taking the summation of the above equation to ∞, the optimal point θ* can be obtained as

\[ \theta^{*} = -\epsilon \sum_{k=1}^{\infty} g(\theta_k) \qquad (2.4) \]

The effect of the choice of the learning rate ε on the convergence of the gradient descent algorithm has been extensively studied in [9], [3]. If the choice of ε is too small, the optimization takes a longer time to converge. On the other hand, if the learning rate is too high, the convergence is affected as the solution oscillates about the optimum point. From the first-order Taylor series approximation of g(θ) about the optimum point θ*, we get

\[ g(\theta) = g(\theta^{*}) + (\theta - \theta^{*})\, g'(\theta^{*}) \qquad (2.5) \]

But at the optimal point θ*, g(θ*) = 0. Comparing the above equation with eq. (2.3), we have

\[ \epsilon_n = \frac{1}{g'(\theta)} \qquad (2.6) \]

which shows that the optimal learning rate is given by the inverse of the Hessian of the loss function. One of the major drawbacks of the gradient descent approach is that, depending on the initial choice of the parameter θ, the algorithm converges to a local minimum even in the presence of a global minimum. It has been observed that by careful injection of noise into the learning algorithm, the standard gradient descent can converge to the global optimal point. The idea behind deliberately adding noise comes from the fact that the stochastic nature of the noise in the recursion allows the algorithm to escape the θ neighborhood corresponding to the local minimum.

2.1.2 Noisy Gradient Descent Algorithm

When an intentional noise q_n is added to the gradient estimate, the gradient descent algorithm takes the form:

\[ \theta_n \leftarrow \theta_{n-1} - \epsilon\,[g(\theta_{n-1}) + q_n] \qquad (2.7) \]

Taking the summation of the above equation to ∞, the function L(θ) converges to its minimum value at θ* given by:

\[ \theta^{*} = -\epsilon \left( \sum_{k=1}^{\infty} g(\theta_k) + \sum_{k=1}^{\infty} q_k \right) \qquad (2.8) \]

Figure 2.1: Block diagram of the noisy gradient descent optimization algorithm

We consider the case when q_n is additive white noise with a flat power spectral density. For zero-mean white noise, the second term in eq. (2.8) goes to zero, thereby converging to the same optimal point as in the case of the noiseless gradient descent algorithm in eq. (2.4). The conditions for convergence of the above algorithm have been studied in [13] and [16], which prove the almost sure convergence of the algorithm when q_n follows certain statistical properties. Fig. 2.1 shows the block diagram of the noisy gradient descent algorithm. Depending on the choice of the learning parameter, the final value of θ oscillates randomly about the optimal point θ*. In order to reduce the effect of the noise on the optimization, a simulated annealing technique ([11], [12]) has been studied in the literature. A large value of the adaptation parameter ε is initially chosen and is successively damped, allowing the initial iterations to move θ out of the local-minimum neighborhood. As the number of iterations increases, the adaptation parameter goes to zero, thereby decreasing the effect of the additive noise. The choice of the noise floor level and annealing rate depends on the loss function that is to be minimized. Care has to be taken to make sure that the adaptation parameter does not die down to zero before reaching the global minimum point. The conditions for global minimum optimization of the stochastic gradient descent algorithm have been studied in [10]-[12].
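To make the recursion of eq. (2.7) concrete, the following is a minimal NumPy sketch (not taken from the thesis) of the noisy gradient-descent update with a simulated-annealing style decay of the adaptation rate; the quadratic example loss, noise level and decay schedule are illustrative assumptions.

```python
import numpy as np

def noisy_gradient_descent(grad, theta0, eps0=1.0, noise_std=0.5,
                           decay=0.999, n_iter=5000, seed=0):
    """Noisy gradient descent, eq. (2.7): theta_n = theta_{n-1} - eps*(g(theta_{n-1}) + q_n),
    with the adaptation rate eps annealed towards zero."""
    rng = np.random.default_rng(seed)
    theta, eps = theta0, eps0
    for _ in range(n_iter):
        q = rng.normal(0.0, noise_std)           # additive white noise q_n
        theta = theta - eps * (grad(theta) + q)  # eq. (2.7)
        eps *= decay                             # annealing of the adaptation rate
    return theta

# Example: quadratic loss L(theta) = 0.5*(theta - 2)^2, so g(theta) = theta - 2
print(noisy_gradient_descent(lambda t: t - 2.0, theta0=10.0))  # approaches 2
```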
2.1.3 Noise Shaping Gradient Descent Algorithm

In this thesis, we propose a new optimization algorithm which makes use of the noise-shaping properties of a Σ∆ modulator in finding the optimum point. The iterative updates for the optimization of θ through this new algorithm are given by:

\[ \omega_n = \omega_{n-1} + g(\theta_{n-1}) - d_n \qquad (2.9) \]
\[ d_n = \omega_{n-1} + q_n \qquad (2.10) \]
\[ \theta_n = \theta_{n-1} - \epsilon\, d_n \qquad (2.11) \]

The proposed algorithm has better convergence properties compared to the noisy gradient-descent algorithm. The architecture introduces a Σ∆ loop which shapes the quantization noise q_n before it is used in the gradient-descent iteration. Fig. 2.2 shows the block diagram of the first-order noise-shaped gradient descent algorithm.

Figure 2.2: Block diagram of the first order noise-shaping gradient descent algorithm

For a large number of iterations N, by taking the summation of eq. (2.11), we get

\[ \theta_N = \theta_0 - \epsilon \sum_{k=1}^{N} d_k \qquad (2.12) \]

But from the Σ∆ update of eq. (2.9), if the initial state is ω_0, the summation to N iterations yields

\[ \sum_{k=1}^{N} d_k = \sum_{k=1}^{N} g(\theta_k) + \omega_N - \omega_0 \qquad (2.13) \]

As N tends to ∞, from the stability considerations of a first-order Σ∆ modulator we get

\[ \Big| \sum_{k=1}^{\infty} d_k - \sum_{k=1}^{\infty} g(\theta_k) \Big| \leq 2\, \| g(\theta) \|_{\infty} \qquad (2.14) \]

which shows the asymptotic convergence of the algorithm to the optimal point θ*. We compare the performance of the above optimization to that of the noisy gradient descent by studying the convergence properties of both algorithms for a signal tracking problem. Consider the loss function in the case of the least-mean-square optimization problem given by

\[ L(t;\theta) = \lim_{T\to\infty} \frac{1}{T} \int_0^T \frac{1}{2}\,(y(t) - \theta)^2\, dt \qquad (2.15) \]

where the signal to be tracked is a sinusoid, i.e., y(t) = sin(ωt). Fig. 2.3 shows the noise-shaping characteristics of the proposed algorithm, which lead to better convergence properties compared to the noisy gradient descent learning algorithm. In this simulation, the gradient of the loss function g(θ) is assumed to be available for computation. The value of ε = 0.5 is chosen for the simulation.

Figure 2.3: Frequency domain response of the original sinusoid, ideal, noisy and noise-shaped gradient descent algorithms. For interpretation of the references to color in this and all other figures, the reader is referred to the electronic version of this thesis.

The magnitude response of the noisy gradient descent algorithm shows that a noise floor of about 5dB corrupts the tracked signal θ. But in the case of the Σ∆ modulator, the noise at lower frequencies is clearly shaped out of the signal band of interest. By using a suitable lowpass filter, the high-frequency noise can be eliminated from the output.
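The following is a minimal simulation sketch of this comparison (an illustrative reconstruction, not the thesis code): both recursions track a slow sinusoid, the same white-noise sequence q_n plays the role of the quantization error in each, and for the tracking loss of eq. (2.15) the instantaneous gradient is taken as g(θ_n) = θ_n − y_n; the sampling setup, noise level and ε = 0.5 are assumptions.

```python
import numpy as np

def noisy_gd(y, eps, q):
    """Noisy gradient descent, eq. (2.7), for the tracking loss of eq. (2.15)."""
    theta, out = 0.0, np.empty(len(y))
    for n, yn in enumerate(y):
        theta -= eps * ((theta - yn) + q[n])   # g(theta) = theta - y_n, plus noise
        out[n] = theta
    return out

def sigma_delta_gd(y, eps, q):
    """First-order noise-shaping gradient descent, eqs. (2.9)-(2.11)."""
    theta, w, out = 0.0, 0.0, np.empty(len(y))
    for n, yn in enumerate(y):
        d = w + q[n]                 # eq. (2.10): noisy integrator output d_n
        w = w + (theta - yn) - d     # eq. (2.9): Sigma-Delta loop update
        theta = theta - eps * d      # eq. (2.11): parameter update
        out[n] = theta
    return out

N, eps = 1 << 14, 0.5
t = np.arange(N)
y = np.sin(2 * np.pi * 1e-3 * t)                  # slow tone (high oversampling)
q = np.random.default_rng(0).normal(0.0, 0.3, N)  # white "quantization" noise q_n
E_noisy = np.abs(np.fft.rfft(y - noisy_gd(y, eps, q)))
E_sd    = np.abs(np.fft.rfft(y - sigma_delta_gd(y, eps, q)))
# At low frequencies the error floor of E_sd sits well below that of E_noisy:
# the injected noise is shaped out of the signal band.
```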
The proposed Σ∆ algorithm is also more robust to traps introduced by local-minimum neighborhoods. Fig. 2.4 shows the convergence of the algorithm to the global minimum even when the ideal gradient descent algorithm converges to a local minimum. The cost function under consideration is L(θ) = −e^{−θ} sin(θ⁴)/θ³. The same initial point θ_0 = 1.45 is used for both the ideal and the Σ∆ gradient descent algorithms. The highly non-linear form of the loss function causes the ideal gradient descent optimization to converge to a local minimum, whereas the addition of the noise q_n enables the Σ∆ approach to avoid the local minimum and converge to the global optimal solution. The global convergence of the algorithm is not guaranteed, and depends highly on the loss function under consideration and the initial conditions. But the addition of the noise randomizes the search process, enabling more robust global search properties.

Figure 2.4: Convergence of the noise shaping gradient descent algorithm to the global minimum

Fig. 2.5 shows the real-time tracking performance of the algorithms with ε = 0.9. It can be clearly seen that the Σ∆ learning has larger high-frequency spikes compared to the noisy gradient-descent case, showing that q_n is shaped. But at lower frequencies, the Σ∆ algorithm output follows the input signal more closely than the noisy gradient-descent case. Fig. 2.6 shows that, because of the addition of noise to the learning algorithm, there is no penalty w.r.t. the speed of convergence to the optimal point. The loss function under consideration here is the same as in eq. (2.15), where y(t) = 1, a constant function.

Figure 2.5: Tracking a sinusoid signal with ideal, noisy and Σ∆ gradient descent algorithms

Figure 2.6: Rate of convergence of the ideal, noisy and Σ∆ gradient descent algorithms

Fig. 2.7 shows that the convergence of the noisy gradient descent algorithm depends greatly on the noise floor of the added q_n. As the magnitude of the noise level q_n increases, the error in the estimation of θ increases, whereas the highpass characteristics of the noise shaping in Σ∆ learning reduce the noise q_n in the signal band of interest, thereby improving the convergence. Figs. 2.7(a) and (b) show that as the noise floor level increases beyond the signal strength, the noisy gradient descent algorithm no longer converges to the original signal. But the Σ∆ learning approach shows better convergence characteristics, as the error is closer to the case of the ideal gradient descent algorithm.

Figure 2.7: Error magnitude in the case of ideal (in black), noisy (in blue) and Σ∆ (in green) gradient descent algorithms for a noise floor of (a) 30dB, (b) 10dB, (c) -10dB and (d) -30dB
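A small helper of the following kind could be used to reproduce this sort of in-band comparison numerically (a sketch under assumptions: the signal band is taken as the fraction 1/OSR of the sampling rate, the error is measured against the target, and both signals are assumed equal in length).

```python
import numpy as np

def inband_snr_db(target, estimate, osr=128):
    """SNR (in dB) computed over the signal band [0, f_s/(2*OSR)),
    i.e. after an ideal lowpass that removes the shaped out-of-band noise."""
    n = len(target)
    band = np.fft.rfftfreq(n) < 0.5 / osr         # in-band FFT bins
    T = np.fft.rfft(target)
    E = np.fft.rfft(np.asarray(estimate) - target)
    p_sig = np.sum(np.abs(T[band]) ** 2)
    p_err = np.sum(np.abs(E[band]) ** 2) + 1e-30  # guard against divide-by-zero
    return 10.0 * np.log10(p_sig / p_err)
```

Used together with the tracking sketches above, this gives SNR numbers comparable in spirit to those reported in Table 2.1.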
In the next set of simulations, the performance of the noise-induced algorithms is validated on a sample speech utterance “26 81 57” taken from the YOHO speaker verification data set. The speech sample at 8KHz is oversampled by OSR = 128. Fig. 2.8 shows the FFT of the original speech sample along with the outputs of the noisy gradient descent and Σ∆ gradient descent algorithms. It can be clearly seen that in the signal band of interest (ω_c = 8KHz), the performance of the Σ∆ algorithm outweighs the performance of the noisy gradient descent algorithm. Table 2.1 summarizes the signal-to-noise ratio of the output speech signal for different noise floor levels in the case of the ideal, noisy and noise-shaped gradient descent algorithms.

Figure 2.8: FFT of the sample speech utterance “26 81 57” (in red) along with the noisy (in blue) and Σ∆ (in green) learning algorithm

Table 2.1: SNR (in dB) for different noise levels of the ideal, noisy and Σ∆ gradient descent algorithms for a sample speech utterance ‘26 81 57’ from the YOHO database.

| Optimization Algorithm | Noise level (in dB) | SNR (in dB), ε = 0.1 | SNR (in dB), ε = 0.5 | SNR (in dB), ε = 0.9 |
| Ideal Gradient Descent | −   | 20.622 | 61.71  | 105.24 |
| Noisy Gradient Descent | -30 | 19.998 | 61.07  | 87.41  |
|                        | -10 | 16.241 | 41.36  | 44.31  |
|                        | 10  | -2.61  | -1.91  | -2.13  |
|                        | 30  | -46.09 | -48.15 | -46.15 |
| Σ∆ Gradient Descent    | -30 | 20.01  | 61.42  | 101.74 |
|                        | -10 | 19.98  | 60.81  | 95.767 |
|                        | 10  | 19.58  | 52.42  | 57.52  |
|                        | 30  | 10.25  | 10.45  | 13.43  |

2.2 Online Learning Example

In this section, we study in detail the performance of the gradient descent algorithms for an online tracking system. The loss function under consideration is

\[ L(x) = \lim_{T\to\infty} \frac{1}{T} \int_0^T \frac{1}{2}\,\{y(t) - x\}^2\, dt \qquad (2.16) \]

where y(t) is the signal that is being tracked by the parameter x. The loss function is optimized with respect to x in real-time. The gradient in this case is given by

\[ g(x) = -\lim_{T\to\infty} \frac{1}{T} \int_0^T \{y(t) - x\}\, dt \qquad (2.17) \]

The mathematical model for the error is derived for each of the algorithms.

2.2.1 Gradient Descent Tracking

The gradient descent algorithm attempts to track the input y(t) by moving in the direction of the negative gradient g(x). At time index n, the update for the tracking signal x is given by

\[ x_n = x_{n-1} + \epsilon\,(y_n - x_{n-1}) \qquad (2.18) \]

where ε is the step size. Taking the Z-transform of both sides of eq. (2.18), we have

\[ X(z) = z^{-1}X(z) + \epsilon\,Y(z) - \epsilon\,z^{-1}X(z) \qquad (2.19) \]

which gives

\[ X(z) = \frac{\epsilon}{1 - (1-\epsilon)z^{-1}}\, Y(z) \qquad (2.20) \]

The error function E(z) becomes

\[ E(z) = Y(z) - X(z) \qquad (2.21) \]
\[ E(z) = \frac{(1-\epsilon)(1 - z^{-1})}{1 - (1-\epsilon)z^{-1}}\, Y(z) \qquad (2.22) \]

Replacing 1 − ε by α, we have

\[ E(z) = \frac{\alpha\,(1 - z^{-1})}{1 - \alpha z^{-1}}\, Y(z) \qquad (2.23) \]

For frequencies ω ≪ ω_0, replacing z^{-1} ≈ 1 − jω/ω_0,

\[ \frac{E(j\omega)}{Y(j\omega)} = \frac{j\alpha\,\omega/\omega_0}{(1-\alpha) + j\alpha\,\omega/\omega_0} \qquad (2.24) \]

Taking the magnitude of the relative error, we have

\[ \frac{|E(j\omega)|}{|Y(j\omega)|} = \frac{\alpha\,\omega/\omega_0}{\left[ (1-\alpha)^2 + \alpha^2\,\omega^2/\omega_0^2 \right]^{1/2}} \qquad (2.25) \]

Defining ω_0' = ((1 − α)/α) ω_0, we get

\[ \frac{|E(j\omega)|}{|Y(j\omega)|} = \frac{\omega/\omega_0'}{\left[ 1 + \omega^2/\omega_0'^2 \right]^{1/2}} \qquad (2.26) \]

The choice of the adaptation parameter ε affects the error magnitude of the gradient descent learning. Fig. 2.9 shows the real-time learning results for the tracking problem when the input signal is a tone given by y(t) = sin(ωt). For ε = 0.01 the adaptation is not fast enough to track the input y(t); as the value of ε decreases, the learning becomes less efficient. Eq. (2.6) shows that the optimal value of ε is given by the inverse of the Hessian. In this case of the tracking problem, the Hessian of the loss function is ∂²L(x)/∂x² = ∂g(x)/∂x = 1. Therefore, the optimal value of the learning parameter is ε = 1, in which case the output x perfectly tracks the input signal y(t).

Figure 2.9: Real-time tracking of a sinusoid signal with the ideal gradient descent algorithm for ε = 0.01, 0.1 and 1.0
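The closed-form relative error of eq. (2.26) is easy to check numerically; the sketch below (illustrative, not from the thesis) evaluates |E/Y| from eq. (2.23) on a frequency grid for a few values of α, which is essentially the quantity plotted in Fig. 2.10.

```python
import numpy as np

def relative_error_mag(alpha, w):
    """|E/Y| from eq. (2.23): alpha*(1 - z^-1)/(1 - alpha*z^-1), z = exp(j*w),
    with w the frequency normalized to the sampling rate (radians/sample)."""
    z_inv = np.exp(-1j * w)
    return np.abs(alpha * (1 - z_inv) / (1 - alpha * z_inv))

w = np.logspace(-3, 0, 200) * np.pi          # normalized frequency grid
for alpha in (0.01, 0.1, 0.5, 0.95):
    mag = relative_error_mag(alpha, w)
    # The error rises roughly linearly with frequency and flattens near the
    # cutoff w0' = (1 - alpha)/alpha * w0 predicted by eqs. (2.26)-(2.27).
    print(f"alpha={alpha:4.2f}  |E/Y| at lowest/highest grid point: "
          f"{mag[0]:.2e} / {mag[-1]:.2f}")
```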
Fig. 2.10 gives a quantitative estimate of the relative error |E(jω)|/|Y(jω)| in the estimate of x with respect to the variation of α. For a given value of the adaptation parameter ε, the error E(z) increases linearly with the input signal frequency ω before saturating at a cutoff frequency given by

\[ \omega_0' = \frac{1-\alpha}{\alpha}\,\omega_0 \qquad (2.27) \]

where ω_0 is the sampling rate. It can be seen that for α → 0, the error magnitude decreases to zero.

Figure 2.10: Frequency dependency of the relative error E(jω)/Y(jω) in gradient descent optimization for different values of the adaptation rate α

2.2.2 Effect of Quantization

In online learning of most analog circuits, the gradient is digitized using an analog-to-digital converter (ADC) before it is used for updating the learning parameter. The digitization of the input signal introduces an additional quantization error in the estimate of the system parameter. The quantized gradient descent algorithm can be modeled by adding the quantization noise q_n to the ideal gradient estimate according to

\[ x_n = x_{n-1} + \epsilon\,[(y_n - x_{n-1}) + q_n] \qquad (2.28) \]

Although the quantization noise q_n is a stochastic random process, we consider the case of one particular realization of q_n. Taking the z-transform of the above equation, we get

\[ X(z) = \frac{\epsilon}{1 - (1-\epsilon)z^{-1}}\,\big(Y(z) + Q(z)\big) \qquad (2.29) \]

where Q(z) is the z-transform of the realization of the quantization noise q_n. The error function E(z) is then given by:

\[ E(z) = \frac{\alpha\,(1 - z^{-1})}{1 - \alpha z^{-1}}\,Y(z) + \frac{1-\alpha}{1 - \alpha z^{-1}}\,Q(z) \qquad (2.30) \]

Comparing it to eq. (2.23), we see that quantization of the input signal adds an additional error component E_q(z) to the gradient descent case, given by

\[ E_q(z) = \frac{1-\alpha}{1 - \alpha z^{-1}}\,Q(z) \qquad (2.31) \]

As in the case of the ideal gradient descent, replacing z^{-1} ≈ 1 − jω/ω_0 for frequencies ω ≪ ω_0, and taking the power spectral density (PSD) of the error, we get

\[ \frac{|E_q(j\omega)|}{|Q(j\omega)|} = \frac{1-\alpha}{\left[ (1-\alpha)^2 + \alpha^2\,\omega^2/\omega_0^2 \right]^{1/2}} \qquad (2.32) \]

where the PSD is defined as |a| = (a a*)^{1/2}. The additional error E_q(z) added because of the quantization is therefore

\[ \frac{|E_q(j\omega)|}{|Q(j\omega)|} = \frac{1}{\left[ 1 + \omega^2/\omega_0'^2 \right]^{1/2}} \qquad (2.33) \]

where ω_0' = ((1 − α)/α) ω_0.

Figure 2.11: Variation of the relative error w.r.t. signal magnitude and quantization error for a quantized gradient descent algorithm for α = 0.5

Fig. 2.11 shows the variation of the relative error due to the quantization process, E_q(z), and also the signal error E_y(z) from the ideal gradient descent, for a particular value of α. Eq. (2.31) shows that for a given α the error component E_q(z) has lowpass characteristics with respect to the quantization error q_n. The cutoff frequency ω_0' is the same as in the ideal gradient descent case, and the quantization error decreases beyond ω_0'. Depending on the statistics of the input signal, a particular value of α can be chosen in order to reduce the overall error in the quantized gradient case.
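For reference, a simple uniform quantizer of the kind assumed in these experiments can be modelled as below (a sketch; the full-scale range and bit widths are assumptions), and feeding the quantized gradient into eq. (2.28) reproduces the roughly 6 dB-per-bit drop of the quantization noise floor seen in Fig. 2.12.

```python
import numpy as np

def quantize(x, n_bits, full_scale=2.0):
    """Uniform mid-rise quantizer with 2**n_bits levels spanning [-full_scale, +full_scale]."""
    step = 2.0 * full_scale / (2 ** n_bits)
    q = np.floor(x / step) * step + step / 2.0
    return np.clip(q, -full_scale + step / 2.0, full_scale - step / 2.0)

def quantized_gd(y, eps, n_bits):
    """Quantized gradient descent, eq. (2.28): the gradient (y_n - x_{n-1})
    passes through an n-bit converter before the update."""
    x, out = 0.0, np.empty(len(y))
    for n, yn in enumerate(y):
        x = x + eps * quantize(yn - x, n_bits)
        out[n] = x
    return out

t = np.arange(1 << 14)
y = np.sin(2 * np.pi * 1e-3 * t)
for b in (1, 2, 4, 8):
    e = y - quantized_gd(y, eps=0.5, n_bits=b)
    print(b, "bits -> error power %.2e" % np.mean(e ** 2))  # roughly 6 dB less per extra bit
```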
The error q_n introduced by the quantization of the input signal can be assumed to be uncorrelated with the input y(t) if the sampling rate ω_0 is much higher than ω. Under this condition, the error q_n can be approximated as white noise with a uniform power spectral density. Fig. 2.12 shows the variation of the noise floor of the quantization error q_n with increase in the number of quantization bits, for both a random input signal and a tone. For every one-bit increment in the quantizer, the noise floor due to quantization reduces by 6dB. As the number of quantization bits increases, the relative error due to the quantization process decreases.

Figure 2.12: Variation of noise floor (in dB) with increase in no. of quantization bits

2.2.3 Σ∆ Gradient Descent Algorithm

The mathematical model for the Σ∆ learning system in the case of the tracking problem is given by

\[ w_n = w_{n-1} + (y_n - x_n) - d_n \qquad (2.34) \]
\[ d_n = w_{n-1} + q_n \qquad (2.35) \]
\[ x_{n+1} = x_n + \epsilon\, d_n \qquad (2.36) \]

Replacing n by n − 1 in the above iteration, we get

\[ w_{n-1} = w_{n-2} + (y_{n-1} - x_{n-1}) - d_{n-1} \qquad (2.37) \]

which gives

\[ d_n = (y_{n-1} - x_{n-1}) + (q_n - q_{n-1}) \qquad (2.38) \]

Substituting the above equation back into eq. (2.36), we get

\[ x_n = x_{n-1} + \epsilon\,(y_{n-1} - x_{n-1}) + \epsilon\,(q_n - q_{n-1}) \qquad (2.39) \]

For one particular realization of the quantization noise q_n, taking the z-transform of the above equation we have

\[ X(z) = \frac{\epsilon\, z^{-1}}{1 - \alpha z^{-1}}\, Y(z) + \frac{\epsilon\,(1 - z^{-1})}{1 - \alpha z^{-1}}\, Q(z) \qquad (2.40) \]

The error function E(z) is given by

\[ E(z) = \frac{1 - z^{-1}}{1 - \alpha z^{-1}}\, Y(z) - \frac{\epsilon\,(1 - z^{-1})}{1 - \alpha z^{-1}}\, Q(z) \qquad (2.41) \]
\[ E(z) = \frac{1 - z^{-1}}{1 - \alpha z^{-1}}\, \big( Y(z) - \epsilon\, Q(z) \big) \qquad (2.42) \]

The contribution of the error from quantization of the signal, E_q(z), in the Σ∆ gradient descent case is given by

\[ E_q(z) = \frac{\epsilon\,(1 - z^{-1})}{1 - \alpha z^{-1}}\, Q(z) \qquad (2.43) \]

As in the case of the ideal gradient descent, replacing z^{-1} ≈ 1 − jω/ω_0 for frequencies ω ≪ ω_0, and taking the power spectral density (PSD) of the error, we get

\[ \frac{|E_q(j\omega)|}{|Q(j\omega)|} = \frac{(1-\alpha)\,\omega/\omega_0}{\left[ (1-\alpha)^2 + \alpha^2\,\omega^2/\omega_0^2 \right]^{1/2}} \qquad (2.44) \]

Defining ω_0' = ((1 − α)/α) ω_0, we have

\[ \frac{|E_q(j\omega)|}{|Q(j\omega)|} = \frac{1-\alpha}{\alpha}\cdot\frac{\omega/\omega_0'}{\left[ 1 + \omega^2/\omega_0'^2 \right]^{1/2}} \qquad (2.45) \]

Figure 2.13: Variation of the relative error w.r.t. normalized frequency for a quantized (in red) and Σ∆ (in green) gradient descent algorithm for different values of α

Fig. 2.13 shows the comparison of the error due to quantization in the case of the quantized gradient descent (in red) and the Σ∆ gradient descent algorithm (in green). Eq. (2.43) shows that for a given value of the adaptation parameter α, the quantization error has highpass characteristics with a cutoff frequency ω_0'. Also, it can be clearly seen that the quantization error noise floor decreases with an increase in the value of α. For an input signal with known characteristics, a proper choice of α can be made to reduce the overall error due to the signal and quantization error components. Figs. 2.14 and 2.15 compare the error convergence of the three algorithms with the variation of the adaptation rate ε for a 1-bit and 4-bit quantizer, respectively. The simulation results validate the error analysis described in the previous sections. For the quantized gradient descent learning algorithm, the overall error increases as the value of ε increases. On the other hand, for the Σ∆ gradient descent approach, the noise-shaping characteristics can be clearly seen and the overall error actually decreases with increase in the value of ε.
In Figs. 2.14(d) and 2.15(d) the error in the case of the ideal gradient descent is zero and is not plotted. Fig. 2.16 shows the variation of the error convergence of all three optimization algorithms with a change in the number of quantization bits for a constant ε = 0.5. As the number of quantization bits increases, we see that the error decreases in both the quantized and the Σ∆ algorithms. This is because the noise floor due to the quantization error decreases with an increase in the number of bits of quantization, as shown in Fig. 2.12.

In the next set of simulations, the signal-to-noise ratio (SNR) of each of the algorithms is calculated by using a tone at ω = 0.01ω_0. The output from each optimization algorithm is lowpass filtered and the SNR is calculated. Fig. 2.17 shows how the SNR varies as a function of the adaptation parameter ε for different values of the quantization bits. It can be observed that the SNR of the Σ∆ learning algorithm is much closer in performance to the ideal gradient descent case. The variation of the SNR with respect to the number of quantization bits is shown in Fig. 2.18 for different values of ε.

In the next set of simulations, the performance of the quantized gradient descent and Σ∆ algorithms is validated on the same sample speech utterance “26 81 57” used in the previous section. The oversampling ratio is still set at OSR = 128. Table 2.2 summarizes the signal-to-noise ratio of the output speech signal for different levels of quantization in the case of the ideal, quantized and noise-shaped gradient descent algorithms.

Table 2.2: SNR (in dB) for different quantization levels of the different optimization algorithms on a sample speech utterance “26 81 57”.

| Optimization Algorithm | Quantization bits | SNR (in dB), ε = 0.1 | SNR (in dB), ε = 0.5 | SNR (in dB), ε = 0.9 |
| Ideal Gradient Descent     | − | 20.622 | 61.39  | 105.24 |
| Quantized Gradient Descent | 1 | -9.87  | -44.13 | -56.27 |
|                            | 2 | 6.03   | -29.59 | -41.93 |
|                            | 4 | 37.60  | 0.734  | -12.51 |
|                            | 8 | 22.27  | 63.08  | 51.17  |
| Σ∆ Gradient Descent        | 1 | 21.54  | 25.80  | 12.40  |
|                            | 2 | 20.50  | 43.93  | 28.26  |
|                            | 4 | 20.15  | 60.16  | 69.51  |
|                            | 8 | 20.04  | 61.19  | 102.5  |
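A compact way to reproduce this style of experiment in simulation is sketched below (illustrative code, not the thesis implementation): the Σ∆ tracking loop of eqs. (2.34)–(2.36) is run with a hard n-bit quantizer in place of the additive-noise model, and its in-band SNR is compared against the quantized gradient descent of eq. (2.28); the tone frequency, band edge and quantizer range are assumptions.

```python
import numpy as np

def quantize(x, n_bits, full_scale=2.0):
    # Uniform mid-rise quantizer spanning [-full_scale, +full_scale]
    step = 2.0 * full_scale / (2 ** n_bits)
    q = np.floor(x / step) * step + step / 2.0
    return float(np.clip(q, -full_scale + step / 2.0, full_scale - step / 2.0))

def sd_gd_quantized(y, eps, n_bits):
    """Sigma-Delta gradient descent, eqs. (2.34)-(2.36), with d_n = Q(w_{n-1})."""
    x, w, out = 0.0, 0.0, np.empty(len(y))
    for n, yn in enumerate(y):
        d = quantize(w, n_bits)       # hard quantizer replaces w_{n-1} + q_n
        w = w + (yn - x) - d          # eq. (2.34)
        x = x + eps * d               # eq. (2.36)
        out[n] = x
    return out

def quant_gd(y, eps, n_bits):
    """Quantized gradient descent, eq. (2.28)."""
    x, out = 0.0, np.empty(len(y))
    for n, yn in enumerate(y):
        x = x + eps * quantize(yn - x, n_bits)
        out[n] = x
    return out

def snr_db(y, xhat, osr=128):
    band = np.fft.rfftfreq(len(y)) < 0.5 / osr
    Y, E = np.fft.rfft(y), np.fft.rfft(xhat - y)
    return 10 * np.log10(np.sum(np.abs(Y[band])**2) / (np.sum(np.abs(E[band])**2) + 1e-30))

y = np.sin(2 * np.pi * 1e-3 * np.arange(1 << 15))
for b in (1, 2, 4, 8):
    print(b, "bits:", round(snr_db(y, quant_gd(y, 0.5, b)), 1), "dB (quantized GD) vs",
          round(snr_db(y, sd_gd_quantized(y, 0.5, b)), 1), "dB (Sigma-Delta GD)")
```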
Figure 2.14: Error (in dB) vs normalized frequency of ideal gradient descent (in black), quantized gradient descent (in blue) and 1st order Σ∆ gradient descent (in red) optimization algorithms with a 1-bit quantizer for (a) ε = 0.1, (b) ε = 0.2, (c) ε = 0.5 and (d) ε = 1.0

Figure 2.15: Error (in dB) vs normalized frequency of ideal gradient descent (in black), quantized gradient descent (in blue) and 1st order Σ∆ gradient descent (in red) optimization algorithms with a 4-bit quantizer for (a) ε = 0.1, (b) ε = 0.2, (c) ε = 0.5 and (d) ε = 1.0

Figure 2.16: Error (in dB) vs normalized frequency of ideal gradient descent (in black), quantized gradient descent (in blue) and 1st order Σ∆ gradient descent (in red) optimization algorithms for ε = 0.5, with an n-bit quantizer for (a) n=1, (b) n=2, (c) n=4 and (d) n=8

Figure 2.17: SNR (in dB) vs adaptation parameter ε of gradient descent, quantized gradient descent and 1st order Σ∆ gradient descent optimization algorithms with an n-bit quantizer for (a) n=1, (b) n=2, (c) n=4 and (d) n=8

Figure 2.18: SNR (in dB) vs no. of quantization bits of gradient descent, quantized gradient descent and 1st order Σ∆ gradient descent optimization algorithms for (a) ε=0.2, (b) ε=0.5, (c) ε=0.8 and (d) ε=0.95

Chapter 3

Generalized Noise Shaping Algorithm

The Σ∆ gradient descent learning algorithm introduced in the previous chapter made use of a simple first-order integrator as the loop filter. Also, a simple first-order gradient descent approximation method was used in the estimation of the updated optimization parameter. In this chapter, we generalize the same algorithm by using a higher-order loop filter for the Σ∆ modulation stage and a generalized learning filter for the optimization parameter update. The loss function L(θ) under consideration is assumed to be continuous and differentiable, with g(θ) being the partial derivative of L(θ) with respect to the optimization parameter θ. Fig. 3.1 shows the block diagram of the generalized noise-shaping learning algorithm. The generalized loop filter for the Σ∆ modulation stage is H(z) and the feedback learning filter is F(z).
The output Y(z) of the modulator stage is given by

\[ Y(z) = H(z)\,[g(\theta) - Y(z)] + Q(z) \qquad (3.1) \]

Figure 3.1: Generalized block diagram of the Σ∆ gradient descent algorithm

which gives

\[ Y(z) = \frac{H(z)}{1 + H(z)}\, g(\theta) + \frac{1}{1 + H(z)}\, Q(z) \qquad (3.2) \]

When the parameter estimate θ is close enough to the optimal solution θ*, the Taylor series expansion of g(θ) about the root θ* gives

\[ g(\theta) = g(\theta^{*}) + (\theta - \theta^{*})\, g'(\theta^{*}) \qquad (3.3) \]

As θ* is the root of g(θ), g(θ*) = 0. Denoting (θ − θ*) by a new error variable Θ, we get

\[ g(\theta) = \Theta\, g'(\theta^{*}) \qquad (3.4) \]

Substituting eq. (3.4) into eq. (3.2),

\[ Y(z) = \frac{H(z)\,g'(\theta^{*})\,\Theta(z)}{1 + H(z)} + \frac{Q(z)}{1 + H(z)} \qquad (3.5) \]

The feedback path in Fig. 3.1 shows that the learning update of Θ can be written as

\[ \Theta(z) = F(z)\, Y(z) \qquad (3.6) \]

Substituting this back into eq. (3.5), we have

\[ \Theta(z) = \frac{g'(\theta^{*})\,H(z)\,F(z)\,\Theta(z) + F(z)\,Q(z)}{1 + H(z)} \qquad (3.7) \]

which gives

\[ \Theta(z) = \frac{F(z)\,Q(z)}{1 + H(z) - g'(\theta^{*})\,H(z)\,F(z)} \qquad (3.8) \]

The above equation shows the dependency of the error estimate Θ on the quantization error Q(z) in the generalized form of the noise-shaping algorithm. The error transfer function w.r.t. the quantization noise, Θ_q(z), is given by

\[ \Theta_q(z) = \frac{\Theta(z)}{Q(z)} \qquad (3.9) \]
\[ \Theta_q(z) = \frac{F(z)}{1 + H(z) - g'(\theta^{*})\,H(z)\,F(z)} \qquad (3.10) \]

The magnitude of the error is given by

\[ |\Theta(z)| = \left| \frac{F(z)}{1 + H(z) - g'(\theta^{*})\,H(z)\,F(z)} \right|\, |Q(z)| \qquad (3.11) \]

From eq. (3.11), depending on the statistics of the loss function being minimized, a proper choice of the loop filter H(z) and the feedback learning filter F(z) can be made.

3.1 Higher Order Σ∆ Modulation

The fundamental noise-shaping ideas of the Σ∆ modulator used in the gradient descent algorithms can be extended to higher-order modulation. The schematic diagram of a second-order modulator is shown in Fig. 3.2. The structure makes use of two integrators H_1(z) and H_2(z), as opposed to just one in first-order Σ∆ modulation. The error signal at the end of the first integration stage is used as the input for the second integrator to achieve higher resolution than a simple first-order modulator.

Figure 3.2: Block diagram of a second order sigma-delta modulator

From the z-domain analysis of the second-order Σ∆ modulator, the output of the first integrator stage w_1(z) can be written as

\[ w_1(z) = \big( X(z) - z^{-1}Y(z) \big)\, H_1(z) \qquad (3.12) \]

and the output w_2(z) of the second integrator can be written as

\[ w_2(z) = \big( w_1(z) - z^{-1}Y(z) \big)\, H_2(z) \qquad (3.13) \]

The quantization at the output stage is modelled by an additive noise Q(z). The output of the modulator Y(z) is given by

\[ Y(z) = w_2(z) + Q(z) \qquad (3.14) \]

From the above three equations, eliminating the variables w_1(z) and w_2(z), we get

\[ Y(z) = \frac{H_1(z)H_2(z)}{1 + z^{-1}H_2(z)\,(1 + H_1(z))}\, X(z) + \frac{1}{1 + z^{-1}H_2(z)\,(1 + H_1(z))}\, Q(z) \qquad (3.15) \]

Replacing H_1(z) and H_2(z) by the integrator z-domain transfer function

\[ H_1(z) = H_2(z) = \frac{1}{1 - z^{-1}} \qquad (3.16) \]

in eq. (3.15), we have

\[ Y(z) = z^{-1}X(z) + (1 - z^{-1})^2\, Q(z) \qquad (3.17) \]

which gives the noise transfer function (NTF) as (1 − z^{-1})². The second-order modulator provides more quantization noise suppression over the low-frequency signal band than the first-order modulator, thereby giving a higher resolution of the output signal. The second-order modulator shown in Fig. 3.2 can be extended to a higher-order modulator by using L integrators in succession. The noise transfer function of such a modulator can be derived to be of the form

\[ NTF = \frac{Y(z)}{Q(z)} = (1 - z^{-1})^{L} \qquad (3.18) \]
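A behavioural simulation of this structure is straightforward; the sketch below (illustrative, with an assumed 1-bit quantizer and a small DC input) implements the two-integrator loop of Fig. 3.2 and checks that the bitstream average approaches the input while the quantization error is shaped as (1 − z⁻¹)².

```python
import numpy as np

def second_order_sd(x):
    """Second-order Sigma-Delta modulator (Fig. 3.2) with a 1-bit quantizer.
    Two cascaded integrators; the delayed output is fed back to both stages."""
    y = np.empty(len(x))
    w1 = w2 = 0.0
    y_prev = 0.0
    for n, xn in enumerate(x):
        w1 = w1 + (xn - y_prev)          # first integrator, input x - z^-1*y
        w2 = w2 + (w1 - y_prev)          # second integrator, input w1 - z^-1*y
        y[n] = 1.0 if w2 >= 0 else -1.0  # 1-bit quantizer: y = w2 + q
        y_prev = y[n]
    return y

x = np.full(1 << 14, 0.3)              # small DC input (keeps the loop well-behaved)
y = second_order_sd(x)
print("bitstream mean:", y.mean())     # close to 0.3
# The spectrum of (y - x) rises roughly 40 dB/decade at low frequency,
# i.e. the (1 - z^-1)^2 shaping of eq. (3.17).
```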
Figure 3.3: Noise transfer function (NTF) of a first, second and third order Σ∆ modulator

Figure 3.3 shows the improvement in the noise-shaping characteristics of a Σ∆ modulator as the order of the modulator increases. It can be shown that, for a given oversampling ratio, the resolution of the modulator increases by 6dB, or 1 bit, for every increase in the order. The noise-shaping characteristics of the higher-order modulators can be exploited in the optimization of the learning system introduced in Section 2.2. Fig. 3.4 shows the performance of the Σ∆ gradient descent algorithm when the loop filter H(z) in Fig. 3.1 is replaced by a higher-order Σ∆ modulator. The figure clearly shows that the tracking error decreases as the order of the Σ∆ modulator increases. For a given oversampling ratio, the SNR of the output increases by 6dB, or 1 bit, when the order of the Σ∆ modulator increases by one.

Figure 3.4: Error (in dB) vs normalized frequency of the gradient descent algorithm with higher order (1st, 2nd and 4th order) Σ∆ modulation

3.2 Batch Gradient Descent Optimization

The online gradient descent algorithm introduced in eq. (2.3) uses only the instantaneous value of the gradient for optimization. In the case of noise-corrupted signals, a better estimate of the gradient direction can be obtained by using the knowledge of the gradient direction from previous iterations [17]. More precisely, for the standard first-order Σ∆ optimization, the feedback learning transfer function F(z) from Fig. 3.1 can be used to accommodate previous values of the output pulse d in the gradient estimate. We consider the case of batch gradient descent where the learning parameter θ is updated using the previous two values of the Σ∆ modulator output. The update for the parameter θ using the gradient descent algorithm in this case is given by

\[ \theta_n = \theta_{n-1} - \epsilon\,(d_n + d_{n-1}) \qquad (3.19) \]

Taking the z-transform of the above equation, we have

\[ (1 - z^{-1})\,\Theta(z) = \epsilon\,(1 + z^{-1})\,D(z) \qquad (3.20) \]

which gives

\[ \Theta(z) = \frac{\epsilon\,(1 + z^{-1})}{1 - z^{-1}}\, D(z) \qquad (3.21) \]

Comparing this to eq. (3.6), the feedback learning transfer function F(z) is given by

\[ F(z) = \frac{\epsilon\,(1 + z^{-1})}{1 - z^{-1}} \qquad (3.22) \]

For a first-order sigma-delta modulator, the loop transfer function H(z) is given by

\[ H(z) = \frac{z^{-1}}{1 - z^{-1}} \qquad (3.23) \]

Substituting eqs. (3.22) and (3.23) back into the generalized error equation (3.10), the error transfer function becomes

\[ \Theta(z) = \frac{1 - z^{-2}}{(1-\epsilon) - (1+\epsilon)\,z^{-1}}\, Q(z) \qquad (3.24) \]

Figure 3.5: Variation of the relative quantization error w.r.t. the normalized frequency for quantized gradient descent and batch gradient descent algorithms.

Fig. 3.5 shows the noise-shaping characteristics of the above batch optimization technique compared to the standard quantized gradient descent approach. The quantization noise level at lower frequencies is attenuated in the batch gradient descent algorithm, whereas for the quantized gradient descent case the noise floor is fairly constant throughout the frequency range.
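In time-domain form the batch update of eq. (3.19) only requires remembering the previous modulator output; the sketch below (an illustrative reconstruction under the same tracking setup and 1-bit quantizer assumed earlier) adds this two-tap feedback to the first-order Σ∆ gradient-descent loop.

```python
import numpy as np

def sd_batch_gd(y, eps=0.25):
    """First-order Sigma-Delta gradient descent with the batch update of eq. (3.19):
    theta_n = theta_{n-1} - eps*(d_n + d_{n-1})."""
    theta, w, d_prev = 0.0, 0.0, 0.0
    out = np.empty(len(y))
    for n, yn in enumerate(y):
        d = 1.0 if w >= 0 else -1.0         # 1-bit quantizer output d_n
        w = w + (theta - yn) - d            # first-order Sigma-Delta loop, input g = theta - y
        theta = theta - eps * (d + d_prev)  # eq. (3.19): uses the two latest outputs
        d_prev = d
        out[n] = theta
    return out

y = np.sin(2 * np.pi * 1e-3 * np.arange(1 << 14))
theta = sd_batch_gd(y)
# The (1 + z^-1) factor in the feedback filter suppresses the highest-frequency
# component of the bitstream before it reaches theta (cf. Fig. 3.5).
```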
Figure 3.6: Error (in dB) vs normalized frequency for gradient descent, first order Σ∆ and Σ∆ gradient descent with feedback filter F(z) for ε = 0.5

Fig. 3.6 shows the results when the batch gradient descent algorithm is used for optimization of the tracking problem discussed in Section 2.2. The figure clearly shows that at lower signal frequencies, the batch gradient descent has a much lower error compared to the Σ∆ version of the algorithm.

Chapter 4

Gradient-Free Stochastic Optimization

In many real-life applications, the functional relationship between the optimization parameters and the loss function measurements is not readily available. The gradient-based optimization algorithms discussed in Chapter 2 assume that such a relationship exists and that the gradient can be readily computed from it. But in many complex systems, such as problems dealing with adaptive control, image restoration from noisy data, training of neural networks and discrete-event systems, the gradient measurement is either not possible or computationally expensive. The models representing such complex systems may be highly inaccurate and undependable. In such cases, several gradient-less algorithms have been proposed in the literature which make use of direct loss function measurements instead of the gradient measurement. Direct search algorithms refer to the class of optimization techniques which do not require the gradient value in the estimate of the optimal value. The pattern search algorithm [1], the Nelder-Mead simplex algorithm [14] and the Rosenbrock algorithm [15] are some of the popular zeroth-order optimization techniques which make use of only the loss function measurement in the optimization. The general idea behind these algorithms is that the optimization parameter is moved in the direction in which the loss function is decreasing. Another class of gradient-free algorithms, like the Kiefer-Wolfowitz finite-difference stochastic approximation (FDSA) and simultaneous perturbation stochastic approximation (SPSA), attempts to approximate the gradient using only loss function measurements. These gradient-approximation algorithms lead to asymptotic convergence of the parameter estimate to the optimal solution. The problem with such algorithms, though, lies in the fact that their convergence is usually slower than that of gradient-based algorithms [18]. But depending on the loss function being considered, the cost of evaluating the gradient may outweigh the speed of the gradient-based algorithms, in which case gradient-approximation algorithms become handy. In this chapter, we extend the noise-shaping optimization technique introduced in Chapter 2 to the case when only loss function measurements can be made and the gradient is not available. We introduce the Σ∆ optimization framework into the gradient-free algorithms discussed above.

4.1 Finite Difference Stochastic Approximation

Consider the optimization of a system with p adjustable parameters and let the objective function be L(θ). Here the optimization parameter θ is a p-dimensional vector such that θ ∈ R^p, where p > 1.
If ĝ(θ) denotes the approximation of the gradient obtained from the loss-function measurements, then this approximation can be used in the iterative learning of the optimization parameter θ by gradient descent as follows:

θ^k_{n+1} = θ^k_n - ε_n ĝ^k(θ_n)    (4.1)

Here the index n represents the iteration number and the index k represents the k-th element of the vector θ, k ∈ {1, 2, ..., p}. In the finite-difference approximation method, each component ĝ^k of the gradient with respect to the k-th element is calculated by perturbing the parameter θ along the direction of the unit vector u_k representing the direction of θ^k. The approximation can be either uni-directional or bi-directional with respect to the unit vector u_k. In the case of bi-directional estimation, the finite-difference approximation is given by

ĝ^k(θ_n) = \frac{L(θ_n + c_n u_k) - L(θ_n - c_n u_k)}{2c_n}    (4.2)

In the case of the finite-difference approximation of ĝ(θ), 2p evaluations of the loss function are needed at each iteration to obtain the complete gradient estimate along all p dimensions.

4.2 Simultaneous Perturbation Stochastic Approximation

In the case of the simultaneous perturbation approximation of the gradient ĝ(θ), the vector θ is perturbed simultaneously along all p dimensions instead of evaluating the loss function individually along each dimension. The SPSA approximation of the gradient makes use of the fact that a properly generated random change of the vector θ contains the same amount of information about the gradient as p separate perturbations of the same vector θ along each dimension. The gradient estimate in the case of SPSA is given by

ĝ^k(θ_n) = \frac{L(θ_n + c_n ∆_n) - L(θ_n - c_n ∆_n)}{2c_n ∆^k_n}    (4.3)

where ∆_n is the perturbation vector along all p dimensions. Under certain statistical conditions on c_n, ∆_n and the gain sequence ε_n [20], the SPSA algorithm is known to converge asymptotically to the optimal solution, θ → θ*. One particular choice of ∆_n known to aid the convergence is a Bernoulli sequence {±1}. The SPSA algorithm gains heavily over the finite-difference method, as only two measurements of the loss function are required to estimate the gradient along all p dimensions, while the asymptotic convergence rate of both algorithms is the same.
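As an illustration of the difference in measurement cost between the two estimators, the sketch below implements the bi-directional finite-difference estimate of eq. (4.2) and the SPSA estimate of eq. (4.3) in Python; the toy quadratic loss, the perturbation size c and all function names are assumptions made for the example only.

import numpy as np

def fdsa_gradient(L, theta, c):
    """Bi-directional finite-difference gradient estimate (eq. 4.2):
    2p loss evaluations for a p-dimensional parameter vector."""
    p = theta.size
    g = np.zeros(p)
    for k in range(p):
        u = np.zeros(p)
        u[k] = 1.0                                   # unit vector along dimension k
        g[k] = (L(theta + c * u) - L(theta - c * u)) / (2.0 * c)
    return g

def spsa_gradient(L, theta, c, rng):
    """SPSA gradient estimate (eq. 4.3): only two loss evaluations,
    using a Bernoulli {+1, -1} simultaneous perturbation."""
    delta = rng.choice([-1.0, 1.0], size=theta.size)
    diff = L(theta + c * delta) - L(theta - c * delta)
    return diff / (2.0 * c * delta)                  # element-wise division by delta_k

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    target = np.array([0.5, 1.0])
    L = lambda th: 0.5 * np.sum((th - target) ** 2)  # toy quadratic loss (assumed)
    theta = np.zeros(2)
    print("FDSA estimate:", fdsa_gradient(L, theta, c=0.01))
    print("SPSA estimate:", spsa_gradient(L, theta, c=0.01, rng=rng))

Averaged over many Bernoulli perturbations, the SPSA estimate approaches the same gradient that FDSA measures, which is what makes the two-measurement scheme attractive when loss evaluations are expensive.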
In this chapter, we extend the Σ∆ gradient-approximation framework of Chapter 2 to the case where the gradient is estimated using SPSA. As the true gradient g(θ) is not available, the approximation ĝ(θ) generated by either SPSA (eq. (4.3)) or FDSA (eq. (4.2)) can be used in the noise-shaping optimization. Fig. 4.1 shows the block diagram of the SPSA algorithm introduced into the Σ∆ optimization framework.

Figure 4.1: Block diagram of the Σ∆ SPSA gradient descent algorithm

The mathematical model for the stochastic optimization is given by

ω_n ← ω_{n-1} + [ĝ_n(θ_n) - d_n]    (4.4)
d_n ← Q(ω_{n-1})    (4.5)
θ_n ← θ_{n-1} - ε d_n    (4.6)

where all the variables are vectors of dimension p. For a single-bit quantizer, the quantization function Q(·) is the sign function, Q(ω) = sgn(ω), in which case d_n becomes a Bernoulli sequence {±1} and we can replace ∆_n by d_n. Since d_n is the output of a single-bit quantizer, eq. (4.4) can be written as

ω_n = ω_{n-1} + \frac{L(θ_n + c_n d_n) - L(θ_n - c_n d_n)}{2c_n d^k_n} - d_n    (4.7)

which gives

ω_n = ω_{n-1} + \frac{d_n}{2c_n}\left[L(θ_n + c_n d_n) - L(θ_n - c_n d_n) - 2c_n\right]    (4.8)

The above equation shows that there is no need to generate the random direction vector ∆_n, since the output bit-stream of the modulator itself can be used as the direction of perturbation. Fig. 4.2 shows that, in fact, for a simple enough loss function the Σ∆ SPSA gradient descent algorithm performs as well as an ideal gradient descent algorithm with respect to the rate of convergence. The loss function under consideration is

L(t; θ_1, θ_2) = \frac{1}{2}\left(y(t) - θ_1 x_1(t) - θ_2 x_2(t)\right)^2    (4.9)

where the input signal y(t) = a_1 sin(ω_1 t) + a_2 sin(ω_2 t) was chosen as a mixture of two different tones at ω_1 and ω_2. The optimization aims at finding the magnitude of a particular frequency component present in the input y(t).

Figure 4.2: The loss function along with the optimized parameters θ_1, θ_2 for a two-variable optimization problem using (a) true gradient descent optimization and (b) Σ∆ SPSA gradient descent optimization

Figure 4.3: Error magnitude (in dB) of the tracking system optimized with the Σ∆ SPSA gradient approximation (in blue) and the quantized SPSA gradient approximation (in red)

From the simulations in Fig. 4.2 it can indeed be seen that the Σ∆ SPSA gradient algorithm converges the values of θ_1, θ_2 to a_1, a_2 respectively. Fig. 4.3 shows the performance of the Σ∆ gradient descent algorithm with the gradient estimated by SPSA, as opposed to the quantized version of the SPSA gradient. The noise-shaping characteristic of the Σ∆ learning algorithm discussed in the previous chapters is preserved even when the gradient is approximated through stochastic perturbation methods.
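The following is a small Python sketch of the Σ∆ SPSA loop of eqs. (4.4)–(4.8), applied to a two-parameter tone-matching loss in the spirit of eq. (4.9); the step sizes, tone frequencies, amplitudes and all names are illustrative assumptions, and the loop is a simplified software model rather than the thesis's simulation code.

import numpy as np

def sigma_delta_spsa(loss, theta0, eps=0.005, c=0.01, steps=5000):
    """Sigma-delta SPSA gradient descent (eqs. 4.4-4.8): the 1-bit modulator
    output d_n doubles as the Bernoulli perturbation direction."""
    theta = np.array(theta0, dtype=float)
    omega = np.zeros_like(theta)                   # per-dimension integrator state
    for n in range(steps):
        d = np.where(omega >= 0, 1.0, -1.0)        # d_n = sgn(omega_{n-1}), entries in {+1, -1}
        diff = loss(theta + c * d, n) - loss(theta - c * d, n)
        omega += d * (diff - 2.0 * c) / (2.0 * c)  # eq. (4.8): g_hat_n - d_n folded together
        theta -= eps * d                           # eq. (4.6): update driven by the bit stream
    return theta

if __name__ == "__main__":
    a1, a2 = 0.5, 1.0
    w1, w2 = 2 * np.pi * 0.01, 2 * np.pi * 0.03    # normalized tone frequencies (assumed)
    def loss(th, n):
        # eq. (4.9)-style instantaneous loss: match a two-tone input with theta1, theta2
        y = a1 * np.sin(w1 * n) + a2 * np.sin(w2 * n)
        return 0.5 * (y - th[0] * np.sin(w1 * n) - th[1] * np.sin(w2 * n)) ** 2
    print(sigma_delta_spsa(loss, [0.0, 0.0]))      # expected to approach (a1, a2), up to dither

Because the quantizer always emits ±1, the estimates dither around (a_1, a_2) with an amplitude set by ε rather than settling exactly.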
Chapter 5

Application to Analog Circuit Calibration

As opposed to digital circuits, the performance of fabricated prototypes of analog circuits usually shows a high degree of variation from the expected behavior. This effect is even more pronounced in subthreshold analog circuits. However, low-power operation and direct interfacing to real-world signals make analog circuits indispensable in any practical design. The dependence of transistor characteristics on temperature, process variations and mismatches between different components on the die makes the calibration of analog circuits even more difficult [23]. Digitally-assisted calibration techniques have been proposed in the literature to calibrate analog circuits [21], [22]. However, the A/D conversion they rely on introduces unwanted quantization noise into the calibration process, which makes them poorly suited to calibrating analog circuits operating in the subthreshold region, where non-linearity and mismatch are large.

In this chapter, we present the hardware realization of the noise-shaping optimization framework discussed in the previous chapters. We introduce it as an online learning algorithm to calibrate a feature extraction unit that was developed for speaker verification purposes. The feature extraction unit consists of a bandpass filter stage, a rectification stage and a Σ∆ modulator that is used to measure the magnitude envelope of the speech signal in each frequency band. Each of the stages is described in detail, and the problems arising in the calibration of the Σ∆ modulator and the bandpass filter stages are discussed. The variability of the circuit performance, especially in the subthreshold region of operation, due to mismatch in device parameters is also studied. The optimization framework is validated on the fabricated chip for two different cases: (a) balancing an unbalanced Σ∆ modulator, and (b) calibrating the center frequency of a bandpass filter.

5.1 Analog Circuit Design of a Spectral Feature Extractor

The block diagram of a single channel of the analog spectral feature extraction unit, along with the calibration circuit, is shown in Figure 5.1. It consists of a bandpass filter stage, a rectification stage and a Σ∆ modulator. The bandpass filter captures only the in-band frequency component of the input speech signal. The rectification stage gives the magnitude envelope of the filtered speech signal. The modulator in each channel generates a bit-stream proportional to the energy of the speech signal in that particular frequency band. This bit-stream is used as features for speaker verification. Each of the above blocks is programmed with the help of serial-in current DACs.

Figure 5.1: Block diagram of the feature extraction unit used in the speaker verification system

5.1.1 Bandpass Filter Realization

Figure 5.2 shows the realization of the bandpass filter stage using a gm-C filter design. The transconductors used for the realization of the filter are described in the next section.

Figure 5.2: Schematic diagram of the gm-C biquad filter with the bandpass filter output

The output currents of the transconductors in Fig. 5.2 are given by:

i_1 = g_{m1} V_{in}    (5.1)
i_2 = g_{m2}(V_{lp} - V_{bp})    (5.2)
i_3 = -g_{m3} V_{bp}    (5.3)

The currents through the capacitors C_1, C_2 are given by

i_1 + i_2 = C_1 s V_{bp}    (5.4)
i_3 = C_2 s V_{lp}    (5.5)

Putting together eqns. (5.1)–(5.5) and solving for the bandpass filter output V_{bp}(s), we get

V_{bp}(s) = \frac{\frac{g_{m1}}{C_1} s}{s^2 + \frac{g_{m2}}{C_1} s + \frac{g_{m2} g_{m3}}{C_1 C_2}} V_{in}(s)    (5.6)

The transfer function of a generic second-order bandpass filter with center frequency ω_0, quality factor Q and gain G is given by

H(s) = \frac{G \frac{ω_0}{Q} s}{s^2 + \frac{ω_0}{Q} s + ω_0^2}    (5.7)

Comparing eqns. (5.6) and (5.7), we have the following expressions for the center frequency ω_0, quality factor Q and gain G:

ω_0 = \sqrt{\frac{g_{m2} g_{m3}}{C_1 C_2}}    (5.8)

Q = \sqrt{\frac{g_{m3} C_1}{g_{m2} C_2}}    (5.9)

G = \frac{g_{m1}}{g_{m2}}    (5.10)

Equations (5.8)–(5.10) show that there is a non-linear relationship between the biasing parameters g_{m1}, g_{m2}, g_{m3} and the desired performance of the bandpass filter given by ω_0, Q and G. This makes it difficult to program the bandpass filter to its desired operating point. Considering the problems with subthreshold operation of the filters and of the DACs used to calibrate them, this non-linear relationship further increases the complexity of programming.
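As a numerical illustration of this non-linear mapping, the sketch below evaluates eqs. (5.8)–(5.10) and their inverse in Python; the capacitor values, the target specification and the function names are assumptions chosen for the example, not component values from the fabricated chip.

import math

def biquad_params(gm1, gm2, gm3, C1, C2):
    """Map transconductances and capacitances to (w0, Q, G) per eqs. (5.8)-(5.10)."""
    w0 = math.sqrt(gm2 * gm3 / (C1 * C2))
    Q = math.sqrt(gm3 * C1 / (gm2 * C2))
    G = gm1 / gm2
    return w0, Q, G

def biquad_biases(w0, Q, G, C1, C2):
    """Invert the mapping: the required gm values depend non-linearly on the
    target (w0, Q, G), which is what complicates open-loop programming."""
    gm2 = w0 * C1 / Q
    gm3 = w0 * Q * C2
    gm1 = G * gm2
    return gm1, gm2, gm3

if __name__ == "__main__":
    C1 = C2 = 10e-12                              # example capacitances (assumed)
    target = (2 * math.pi * 2e3, 1.0, 1.0)        # w0 = 2*pi*2 kHz, Q = 1, G = 1 (assumed)
    gm1, gm2, gm3 = biquad_biases(*target, C1, C2)
    print("required gm values (S):", gm1, gm2, gm3)
    print("round-trip check:", biquad_params(gm1, gm2, gm3, C1, C2))

In the subthreshold region each g_m also depends on its bias in a way that is sensitive to mismatch, so even this closed-form inversion does not translate reliably into DAC codes; this is part of what motivates the on-chip learning approach used later in the chapter.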
5.1.2 Transconductor

Fig. 5.3 shows the schematic diagram of the transconductor circuit used in the realization of the bandpass filter described above. The output current of the transconductor is directly proportional to the differential input signal (V_{in} - V_{ref}) and the bias current of the transconductor (I_{bias}). The output stage uses a cascoded structure (M4, M6, M7 and M8) in order to increase the output impedance of the transconductor. This helps in reducing the loading effect on the output current of the transconductor. The range of operation of the bandpass filter described in Fig. 5.2 directly depends on the linear range of operation of the transconductor. Several topologies have been studied in the literature [24] to improve the linearity of the transconductor. In our circuit, the bump circuit proposed in [24] has been used. The bump circuit (formed by transistors MB1–MB4) draws part of the input bias current away from the input transistors (M3–M4), thereby reducing their g_m. This improves the linear range of the transconductor. The linear range of this transconductor has been shown to be around 300 mV in the saturation region [26]. The transconductance and the linear range of a FET-OTA have a square-root relationship with the bias current.

For the center-frequency calibration of the bandpass filter (case (b) above), when the center frequency of the filter is not equal to the input-tone frequency ω_0, there is a non-zero residual signal at V_{out}. The minimum point at ω_0 can be reached by using a Σ∆ gradient descent algorithm with the loss function described in eq. (5.34). The on-chip calibration method described above has been verified on hardware using the fabricated prototype. For this experiment, the bandpass filter is biased at unity gain G and quality factor Q. The input V_{in1} to the filter is a tone whose frequency is the same as the center frequency ω_0 to which the bandpass filter is to be calibrated.

Figure 5.21: Moving average of the Σ∆ modulator output showing the minimum when the center frequency reaches the frequency of the input tone (averaged modulator output vs. time)

Equation (5.32) shows that the bandpass filter has a zero at the origin, thereby introducing a phase difference of 90° at the output compared to V_{in1}. In order to facilitate the correct cancellation of the signal at the output due to the lowpass filter stage, the second input V_{in2} is phase shifted by the same 90° with respect to V_{in1}. Figure 5.20 shows both inputs to the filter and the non-zero residual filter output V_{out} for an uncalibrated bandpass filter with center frequency not equal to the input tone ω_0.

Figure 5.21 shows the results of the calibration from the fabricated chip. The Σ∆ modulator was run at a sampling frequency of 250 kHz. The input tone was set at ω = 2 kHz and the bandpass filter was initially programmed to around 1 kHz using voltage biasing. The center frequency of the filter was adjusted according to the learning algorithm by an FPGA, which tuned the voltage biasing of the transconductors. The modulator output was filtered by taking a 1024-sample moving average of the bit-stream. Fig. 5.21 shows that the modulator output does not go to zero after optimization. This is because of a constant offset current from the rectifier due to imperfect cancellation of the inputs at the filter stage. Also, once the minimum point ω_0 is reached, the modulator does not remain at that point, because the rectifier does not estimate the true gradient of the loss function defined in eq. (5.34) but only gives its magnitude, which is always positive. The actual calibration point can be obtained by taking the summation of the bit-stream up until the minimum point ω_0 is reached.
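This post-processing step can be sketched in Python as follows; the synthetic bit-stream merely stands in for the measured modulator output, and the window length, sweep values and function names are illustrative assumptions.

import numpy as np

def locate_calibration_point(bitstream, bias_codes, window=1024):
    """Smooth the 1-bit modulator output with a moving average (as in Fig. 5.21)
    and report the bias setting at which the averaged residual is smallest,
    i.e. where the filter's center frequency best matches the input tone."""
    bits = np.asarray(bitstream, dtype=float)
    kernel = np.ones(window) / window
    avg = np.convolve(bits, kernel, mode="valid")     # 1024-sample moving average
    idx = int(np.argmin(avg))                         # minimum of the averaged residual
    center = min(idx + window // 2, len(bias_codes) - 1)
    return bias_codes[center], avg

if __name__ == "__main__":
    # Synthetic example (assumed data, not chip measurements): the averaged
    # residual dips as the swept center frequency passes the 2 kHz input tone.
    n = 50000
    codes = np.linspace(1e3, 3e3, n)                  # swept center frequency (Hz)
    residual = 0.2 + np.abs(codes - 2e3) / 2e3        # offset floor plus tuning error
    bits = (np.random.default_rng(1).random(n) < residual).astype(float)
    best, avg = locate_calibration_point(bits, codes)
    print("estimated calibration point (Hz):", round(float(best), 1))

The constant 0.2 floor in the synthetic residual plays the role of the rectifier offset noted above, which is why the averaged output reaches a minimum near ω_0 but does not go to zero.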
Chapter 6

Conclusion

A novel Σ∆ gradient descent optimization algorithm has been presented in this work. The proposed learning algorithm has been shown to have better precision than the traditionally used noisy gradient descent algorithm, by using the noise-shaping characteristics of the Σ∆ modulator in real-time tracking of system parameters. The stochastic nature of the noise in the quantization stage has been shown to be helpful for global convergence, as opposed to the ideal gradient descent algorithm, which can converge to a local minimum. The proposed algorithm was then extended to include higher-order noise-shaping and learning mechanisms. The performance improvement of the algorithm was shown through simulation on a sample speech utterance from the YOHO database. The Σ∆ optimization framework was further extended to accommodate gradient-free optimization techniques such as finite-difference stochastic approximation and simultaneous perturbation stochastic approximation.

The performance of the Σ∆ gradient descent algorithm was then validated on hardware by modeling the calibration of analog circuits as an optimization problem. The Σ∆ learning was shown to be robust enough to account for mismatches and non-linearities even in the subthreshold region of operation. The self-calibration of an unbalanced Σ∆ modulator converged to the global minimum even in the presence of several local minima caused by the non-monotonicity of the calibration DACs in the subthreshold region. The algorithm was also used to calibrate the center frequency of a gm-C biquad filter by exploiting the lowpass and bandpass filter stages present within the biquad filter.

BIBLIOGRAPHY

[1] R. Hooke and T. A. Jeeves, "Direct search solution of numerical and statistical problems," J. ACM, vol. 8, pp. 212-229, 1961.

[2] M. J. D. Powell, "Direct search algorithms for optimization calculations," Acta Numerica, vol. 7, pp. 287-336, 1998.

[3] R. Battiti, "First- and second-order methods for learning: Between steepest descent and Newton's method," Neural Computation, vol. 4, pp. 141-156, 1992.

[4] D. E. Goldberg, Genetic Algorithms in Search, Optimization and Machine Learning, Addison-Wesley, p. 41, ISBN 0201157675.

[5] D. Bertsimas and J. Tsitsiklis, "Simulated annealing," Statistical Science, vol. 8, no. 1, pp. 10-15, 1993.

[6] J. Kennedy and R. Eberhart, "Particle swarm optimization," IEEE International Conference on Neural Networks, vol. 4, pp. 1942-1948, Nov. 1995.

[7] R. Battiti, "Reactive search: Toward self-tuning heuristics," John Wiley & Sons, Chichester, pp. 61-83, 1996.

[8] R. Battiti and M. Brunato, "The reactive tabu search," ORSA Journal on Computing, vol. 6, no. 2, pp. 126-140, 1994.

[9] Z. Luo, "On the convergence of the LMS algorithm with adaptive learning rate for linear feedforward networks," Neural Computation, vol. 3, pp. 226-245, 1991.

[10] S. Gelfand and S. K. Mitter, "Recursive stochastic algorithms for global optimization," SIAM J. Control and Optimization, vol. 29, pp. 999-1018, Sept. 1991.

[11] H. Fang, G. Gong, and M. Qian, "Annealing of iterative stochastic schemes," SIAM J. Control and Optimization, vol. 35, pp. 1886-1907, 1997.

[12] S. Gelfand and S. K. Mitter, "Simulated annealing with noisy or imprecise energy measurements," J. Optim. Theory Appl., vol. 62, pp. 49-62, 1989.

[13] H. Robbins and S. Monro, "A stochastic approximation method," Ann. Math. Statist., vol. 22, pp. 400-407, 1951.

[14] J. A. Nelder and R. Mead, "A simplex method for function minimization," Comput. J., vol. 7, pp. 308-313, 1965.

[15] H. H. Rosenbrock, "An automatic method for finding the greatest or least value of a function," Comput. J., vol. 3, pp. 175-184, 1960.

[16] J. R. Blum, "Approximation methods which converge with probability one," Ann. Math. Statist., vol. 25, pp. 382-386, 1954.
[17] L. Bottou, "Stochastic learning," Advanced Lectures on Machine Learning, pp. 146-168, Springer, 2003.

[18] J. Kiefer and J. Wolfowitz, "Stochastic estimation of the maximum of a regression function," Ann. Math. Statist., vol. 23, pp. 462-466, 1952.

[19] J. C. Spall, "A stochastic approximation technique for generating maximum likelihood parameter estimates," Proc. Amer. Control Conf., pp. 1161-1167, 1987.

[20] J. C. Spall, "Multivariate stochastic approximation using a simultaneous perturbation gradient approximation," IEEE Trans. Automat. Control, vol. 37, pp. 332-341, 1992.

[21] B. Murmann, "Digitally assisted analog circuits," IEEE Micro, vol. 26, pp. 38-47, Mar. 2006.

[22] B. Murmann and B. Boser, "A 12-bit 75-MS/s pipelined ADC using open-loop residue amplification," IEEE J. Solid-State Circuits, vol. 38, no. 12, pp. 2040-2050, Dec. 2003.

[23] P. R. Kinget, "Device mismatch and tradeoffs in the design of analog circuits," IEEE Journal of Solid-State Circuits, vol. 40, no. 6, pp. 1212-1224, Jun. 2005.

[24] P. M. Furth and H. A. Ommani, "Low-voltage highly-linear transconductor design in subthreshold CMOS," Proceedings of the 40th Midwest Symposium on Circuits and Systems, vol. 1, pp. 156-159, Aug. 1997.

[25] A. Gratz, "Operational Transconductance Amplifiers."

[26] A. Gore, "Design of High-Dimensional Oversampling Data Converters with On-chip Learning: Theory, Algorithm and Hardware Realization."

[27] A. Gore, S. Chakrabartty, S. Pal and E. C. Alocilja, "A multichannel femtoampere-sensitivity potentiostat array for biosensing applications," IEEE Transactions on Circuits and Systems I, vol. 53, no. 11, Nov. 2006.

[28] S. C. Wong et al., "A CMOS mismatch model and scaling effects," IEEE Electron Device Lett., vol. 18, pp. 261-263, June 1997.

[29] S. Lovett et al., "Characterizing the mismatch of submicron MOS transistors," IEEE Int. Conf. Microelectronic Test Structures, vol. 9, pp. 39-42, Mar. 1996.

[30] G. Cauwenberghs, "Neuromorphic Learning VLSI Systems: A Survey."

[31] Y. P. Tsividis and K. Suyama, "MOSFET modeling for analog circuit CAD: Problems and prospects," IEEE Journal of Solid-State Circuits, vol. 29, no. 3, March 1994.