III II II II ‘I IIIIIII‘ II 2“; ‘III
III» , I w «I»
HHIII IIIII I W III

I
I I
I

I

I

II‘I
[I II
II

"Io—A I‘
look) Il
mac-1:. II

A NEW DYNAMIC STOCHASTIC APPROXIMATION
PROCEDURE

Dissertation for the Degree of Ph. D.
MICHIGAN STATE UNIVERSITY
DAVID RUPPERT
1977

L.—

IIIIIIIIIIIIIIIIIIIIIIIIIII mg; J

University

 

This is to certify that the
thesis entitled

A New*Dynamic Stochabtic

Appnoximatéon Pnoecdune

presented by

David Rappeat

has been accepted towards fulﬁllment
of the requirements for

 

 

“an M‘m

Major professor

Date W If lIQI‘I

0-7639

 

.«

3 00 "I 2.2:?

ABSTRACT
A NEW DYNAMIC STOCHASTIC APPROXIMATION PROCEDURE
By
David Ruppert

This paper considers Robbins-Monro stochastic approximation
when the regression function changes with time. At time n, one
can select Xn and observe an unbiased estimator of the regression
function evaluated at X". Let an be the root of the regression
function at time n. Our goal is to select the sequence Xn so
that Xn - 9n converges to 0. It is assumed that an = f(sn)
for sn known at time n and f an unknown element of a class of
functions. Under certain conditions on this class andcuithe sequence
of regression functions, we obtain a random sequence Xn such that
IXn - enl converges to 0 in Cesaro mean with probability 1. Under

more stringent conditions, Xn - en converges to 0 with probability 1.

A NEW DYNAMIC STOCHASTIC APPROXIMATION PROCEDURE

By
David Ruppert

A DISSERTATION
Submitted to
Michigan State University
in partial fulfillment of the requirements
for the degree of
DOCTOR OF PHILOSOPHY
Department of Statistics and Probability

1977

L: ‘1 .0

To

Patricia

ii

ACKNOWLEDGEMENTS

I wish to thank Professor Vaclav Fabian for having suggested
this area of dynamic stochastic approximation to me, for giving
guidance to the research as it progressed, and finally for his valu-
able criticism of earlier drafts of this thesis.

Also, I want to thank Professor Jan Marik for a suggestion
which led to theorem 3.ll.

I believe that each member of the faculty of the Department
of Statistics and Probability has helped me in some way during my
four years of graduate study at Michigan State University and to
them I am grateful.

Mrs. Noralee Burkhardt deserves thanks for typing my thesis
excellently and for her cheerfulness and the patience while correct-

ing my errors.

Chapter
l

#0)“)

TABLE OF CONTENTS

INTRODUCTION
NOTATION AND ASSUMPTIONS
GENERAL RESULTS

RESTRICTION T0 H A FINITE DIMENSIONAL
VECTOR SPACE

BIBLIOGRAPHY

iv

Page

18
30

CHAPTER I
INTRODUCTION

This study has been motivated by practical situations in
which a process is controlled by a variable X and it is desirable
to choose X in such a manner that the response, Rn(X), at time n
is close to 0. If en satisfies Rn(en) = 0, it would be enough
to choose Xn’ the value of X at time n, equal or close to en.
The basic information is provided by the process itself; for any
choice of Xn we can obtain an unbiased estimate of Rn(xn).

If Rn’ or at least an, is independent of n and some
regularity conditions are satisfied, then the stochastic approxima-
tion procedure of Robbins and Monro (1951) provides a method of
selecting a sequence {Xn} such that Xn + 61 almost surely.

We are concerned here with situations where 6n does change
with n. Dupac (1965, l966) and Uosaki (I974) studied such situa-
tions, but their model is substantially different from ours; both
models shall be compared later.

In our model we assume that 9n = f(sn) for an f in a
family 8 of functions on a set S and for a sequence {Sn} in
S. Initially, only 3 is known, not f and not {Sn}° At time n,
the value 5 becomes known, and, after Xn is selected, an unbiased

n
estimate of Rn(Xn) is observed.

The interpretation is that sn summarizes the knowledge about
the process at time n. For example, in the case of a process in-

volving a chemical reactor, 5 can describe the age of the filter,

n
the quality of the catalyzer, and the impurities of the input. In
another example we may have 5n = n and then the assumption concern-
ing an means simply that the function n~+en is in 3.

We propose an approximation method, for which Xn - on
approaches 0 in a certain sense, for some families 3. For example,
3 can be the family of all functions f on [0,1] such that, for

some K and a > %, depending on f,
|f(x) - f(y)l .<._. le - yl“

for all x,y in [0,1] (cf. Theorem 3.9).

Another example, admittedly simpler, yet of considerable
practical importance, is the case when S is the family of all
'linear combinations of k functions f],f2,...,fk.

Both these examples are special cases of the more general con-
dition (see assumption 2.3) that there exists an inner product space
H and a function U on the set S into H such that
S c {f8; 8 E H} where fB denotes the function defined on S and
assigning to each s in S the value <8, U(s)>.

Under certain additional regularity conditions we shall show
that the proposed approximation procedure yields {Xn} for which
X - on + O in Cesaro mean with probability one; under more stringent

n
conditions Xn - on + O with probability one.

Dupac (1965, 1966) considered Robbins-Monro type stochastic
approximation methods when the root changes during the approximation
process and Uosaki (1974) generalized his work. In these papers,
the basic assumption is that en+1 is equal to gn(en), with gn
known, plus an unknown but small Vn' The procedure then is similar
to the original Robbins-Monro procedure except that where the latter
obtains the estimate Xn+1 by adjusting Xn’ the former adjusts
gn+](xn) (and neglects ,Vn)'

In our model, the procedure estimates the function fB by
estimating B. If H is infinite dimensional the procedure allows
us to keep the estimates finite-dimensional in order that the pro-
cedure can be practically realizable.

In addition to the above problems we also consider, in
theorem 4.5, the case where U(sn) is a random variable with values
in Rk.

In summary we will show that under conditions similar to
those used to prove the convergence of the Robbins-Monro method,

Xn - 6n + O in Cesaro mean with probability one, where an is the
unique root of Rn(X) = O and Xn is our estimate of on. Of
practical importance are similar generalizations of the Kiefer—
Holfowitz (1952) method of maximization (or minimization) of func-
tions on R and Blum's (1954) multi-dimensional version of the
Kiefer-Nolfowitz method. One can expect that the methods obtained

by such generalizations would have a property analogous to the

almost sure Cesaro mean convergence of Xn - on to O.

CHAPTER 2
NOTATION AND ASSUMPTIONS

2.1 Notation. The conventions introduced here hold throughout. Let

 

Rk be k—dimensional Euclidean space. The space R1 will be denoted

simply as R. Denote the transpose of the matrix A by AT. Then

k

the inner product on R is defined by

<x,y> = xTy for x,y E Rk.

If A and B are sets, then AB

is the set of all functions
from B to A.

Let (Q,F,P). be a probability space. If F e F, then IF
is the indicator of F.

If V is a normed vector space, then let !_ be the smallest

o-algebra containing all open balls, that is all sets of the form
{X e V: “X + a“ < e} for e > O and a e V .

A map T in V9 is called a measurable transformation into V if

T'](V)<: F. If T1,...,T are measurable transformations into V.

n
then o{T],...,Tn} is the smallest o-algebra on 9 containing

-1
1 1,. (1) .

"C:

i

All relations between measurable transformations are meant

to hold with probability one.

If hn is a sequence of numbers, then 0(hn) denotes a

sequence gn of numbers such that for some K
|h']g | < K for all n
n n —- '

2.2 Assumption. (i) Let S be a set and suppose Sc: RS. Suppose

 

f 6 3.

(ii) Let Rn 6 RR, en 6 R, and A > 0. Suppose

(I) (X - 6n)Rn(X) : 0
and
(2) IRn(X)I :AHX-enl +1)

for all X E R. Let sn 6 S and suppose
(3) e = f(s ).

2.3 Assumption. (i) Assumption 2.2(i) holds. Let H be a real
vector space and suppose <-,-> is an inner product on H, i.e.

<-,-> is a map from H x H to R such that if x,y,z e H and

a E R then

<ax + y,z> = a<x,z> + <y,z>,

<x,y> <y,x>,

<x,x> O,

| v

and

<x,x> = 0 implies x = O .

~ 1’ . .
For x E H define “x“ = <x,x>2. Suppose there is a function U in

S such that for each f in 3 there exists a B in H satisfying

H
f(s) = <8, U(s)> for all s 6 S.

(ii) Assumption 2.2(ii) holds. Let Un = U(sn).

2.4 Remark. He shall now consider the problem of estimating the

 

sequence {en}. The experimenter knows H, <-,->, and U and he
knows that assumption 2.3 holds. At time n he estimates 8 by an
estimate 8". Also at this time he learns the value of sn and

therefore of U"; he uses U to estimate 6n by Xn = <8", Un>.

n
He can also observe a random variable Yn’ an unbiased (conditionally,

given the past) estimator of Rn(Xn). He then forms his next estimate

*
= B - a Y U

B n nnn

n+1

*
with an a suitably chosen non-negative number and Un either equal

to Un or a suitable approximation to Un' For example, if H = 12,
then the experimenter may wish to use a finite dimensional approximation,
u;, to a Un in 22.

We shall reformulate the construction of the 8n in the follow-
ing assumption, where Fn is the o-algebra associated with the "past"
at time n.

2.5 Assumption. (i) Assumption 2.3 holds.

 

(ii) Let Fn be an increasing sequence of o-algebras contained in F.

*
Suppose {8n}, {Yn}, and {Un} are sequences of measurable

transformations into H, random variables, and elements of H,

respectively, such that with Xn = <Bn,Un>

*
(1) 8n+1 = 8n - anYnUn for some an.: O,

0(B],...,Bn)<: F”, and

Fn Fn 2 2
(2) E Yn = Rn(xn) and E (Yn - Rn(Xn)) :_o for some 0 .

(iii) Assume that

*
(3) <8", Un - Un> = 0 .

 

2.6 Remark. Suppose we wish to choose u: not equal to Un' Then
(2.5.3) will still hold if for an increasing sequence of subspaces,

*
{Hn}, 8] e H] and Un is the projection of Un onto H for

n+l’
then by (2.5.1) 8n E Hn for all n.

CHAPTER 3
GENERAL RESULTS

We will be interested in the convergence to O of the sequences
{“8n - 8H} and {Xn - 0"} defined in assumptions 2.2, 2.3, and 2.5.
These two sequences are closely connected for under these assumptions
Xn - on = <8n - B, Un>. Forpractical purposes {Xn - en} is of
primary importance since Xn would be the value of the control
variable at time n while an would be a value of the control vari-

able which for our intentions would be optimal at time n.

3.1 Lemma. Suppose assumption 2.5 holds and

 

(1) 2 an(1 + nunn)|<e, on - u:>| < a
and

(2) z aﬁuu:“2(1 + “UnU2) < m .

Then,

(3) ( th - 8H2 has a finite limit
and

(4) 2 aan(Xn)(Xn - en) < w ,

Proof: By (2.5.l) and (2.5.2)

F

(5) E n B - 8H2 < I8 - BIZ - 2a R (x )<B - s u*>
H ._ I n l n n n n . n

n+I
2 * 2 2 2
+ anUUnU (Rn(Xn) + o ) .
Now by (2.2.2),

<6) IRn(XnII:A(|<Bn - e.u,,>l + 1) :AIIIBn - en nun“ + I).

Also by (2.5.3)

II
A
X

*
«8n - B,Un>
Therefore

(7) Rn(Xn)<Bn - B .U:> )

|v

Rn(xn)(xn ' en

* 2
- |<8,Un - Un>|A((HBn - 8H + 1)HUnH + 1).

Substituting (6) and (7) into (5), one obtains

F

n 2 2 -
(8) E hen.] - 8H .1 H8" - 8h <1 + tn) - ZaanIXnIIXn ' 9n) + 9n

2. 2 * 2 *
where fn = 0(anhUn“ nun“ + an|<B,Un - Un>|“Un“) and

n
By (1) and (2), z fn + 9n < m. Thus (3) and (4) hold by

2 * * ..
9n = 0(anHUn“ + anl<B, U - Un>|(1 + “Unn)) and fn,gn 3_O.

Theorem 1 of Robbins and Siegmund (1971).

3.2 Remark. Condition (3.1.1) involves 3 which, of course, is

 

unknown. However 8 depends on f and f is known to be in the
class 3. Thus it may be possible to verify condition (3.1.1) by

using properties of S.

IO

3.3 Theorem. Let assumption 2.5 hold. Let a, y, and e be numbers

 

satisfying

and

a + e > I.

Suppose for a > O,

‘0
an = an ,

* Y
hUnH + HUnH = 0(n ).
and g
* -
(I + Hunn>l<s. Un - Un>| = o<n 5).
If for all n > 0
(9) inf inf IRn(X)| > 0

n n§IX-6n|

or if Y = 0 and for all n > 0

(10) inf inf _] an(X)| > o
n njlx-enljn
then
-1 n
n 2 IX - O I + 0.
k=l k k

Proof: First, (3.1.1) holds since

an(] + “UnH)I<Bs Un - U;>l = 0(n-(a+€))

and a + e > 1. Next (3.1.2) holds for

ll
aﬁHUZHZII + nunnz) = 0(n"2“‘4YI)

and 2a - 4y > 1. Thus by lemma 3.1, 2 aan(Xn)(Xn - on) < m and
lim “an - 8“ exists and is finite. From now until the end of the
proof we look at an m for which the two properties hold and write
5 instead of E(w) for any random variable a. For every n > 0

there is a 6(n) > 0 such that
(11) an - 6n] 3_n implies IRn(Xn)| > 6(n).

This follows directly from (9); if (10) holds and y = 0 then
IXn - en] 5 “on“ “B" - B“ is a bounded sequence and (11) holds again.

Let n > 0, set In = 1 if |Xn - onl 3_n and 0 otherwise. Then the

finiteness of z a R (X )(X

n n n n - en) and (11) imply

n n ' enl < m’

By Kronecker's lemma (see Loéve (1963), p. 238)

n
'0
n 2 IX - a II + O.
k=l k k k

Since oi: l,

o -1 n . -1 n
11m sup n 2 IX - e §_n + 11m Sup n z Ix - 9 II = n
n+w k=l n n n+m kz] n n "

for all n > O.

n

3.4 Remarks. The conclusion n"1

 

Z le - ekl + O is of practical
1 n l

importance since if n' z IXk - ekl is small then the process would
I

have run at near optimal conditions for most of the first n runs.

12

Without additional assumptions, the conclusion of theorem 3.3
cannot be strengthened to (Xn - 6") + O, as can be seen in example
4.7 below. Moreover, example 4.8 below shows that under the hypo-
theses of theorem 3.3 (Bn - 8) + 0 may fail even if (Xn - on) + 0.

Since u: is intended to be an approximation to Un we can

' *
expect that “Unu = 0(nUn“) and in that case the condition
* Y
m nun“ + nun“ = on >

would be known to hold with y = 0 if U is bounded. If U is
unbounded then it might be difficult to verify that (1) holds; how-
ever, the theorem has been formulated to allow y.> O.

3.5 Assumption. Let D be a countable set. Define the real vector

space as and the inner product <-,->D on 23 by

 

£6 = {g 6 RD: 2 92(d) < 00}
den
and
<g,h>D = 2 g(d)h(d) for g,h E is .

deD
2 . . _ B
For f E to define “fHD — <f,f>D.

Let {0”} be a sequence of finite subsets of D with

Suppose assumption 2.2(i) holds. Let U be a map from S

2 1 _ . *
to 20, et Un — U(sn), and define Un by

*
Un(d) Un(d) if d e on+1

o if d e on+].

13

Suppose assumption 2.5(ii) holds with an = an'“ for some a > 0

and a > %. Suppose
81(d) = 0 if d i 0].

3.6 Remark. If assumption 3.5 holds, then it can be easily shown
by induction (see Remark 2.6) that assumption 2.5(iii) holds.

3.7 Assumption. Assumption 3.5 holds with the following choices

 

of S, S, D, Dn’ and U.

S = [0,1].
_ [0,11,
3 - {f e R . for some K > O and y > %,
|f(x) - f(y)| §_K|x - y|Y whenever x,y E [0,1]}.
0 = {(k,m): m = k = 0 or m and k are integers satisfying
mic and lik12m}.
0n = {(k,m): (k,m) e D and 2m.: n}.

U(x)(k,m) = 1 if (k,m) = (0,0).

For (k,m) f (0,0),

 

U(x)(k,m) = hum" if x e ("—,},l, “"21
m
2 2
-1 . k-» k
= -(m+l) 1f’ x 6 (——3,-—0
2m 2'1n

=0 if xe("—‘-‘—.“—) with 2fk and 1<2<2‘“.

2m 2'" ‘ ”

As a function of x, U(x)(k,m) is continuous at 0 and l and at points
of discontinuity it equals the arithmetic mean of its left and right

limits.

14

3.8 Remark. Note that U(-)(k,m) is a multiple of the Haar func-
tion with indices k and m as defined by Alexits (1961), page 46.
3.9 Theorem. Let assumption 3.7 hold. Then,

1 n

n- 2 IX - 0 I + 0 .
k=l k k
Proof: We need only show that the hypotheses of theorem 3.3 hold.

First we will show that assumption 2.3 holds with H = 2
D

2
0' Let

B E R be defined by

B(k,m) = 2'"(m+1)2 I; f(x)U(x)(k,m)dx

for (k,m) E D. By the definition of S we can and shall choose
a g > a such that |f(x) - f(y)| 5_K|x - ng for some K and all
x,y 6 [0,1]. Then,
-(m+l) _
|B(k.m)| = (m + 1)2m1r§ f(z m<k - a) - x)

- f(2'm(k - g) + x)dx|
-(m+l)

:_k(m+1)2m+€fé ngx = 0((m+1)2'€m) .
Therefore 8 E 23 since
m 2m m
z z (8(k.m))2 = z 0((m+1)2)2m(1-2€))
m=0 k=l m=0

and 1 - 25 < 0.

Now f(x) = <B,U(x)>D for x 6 [0,1] by Alexits(1961),
theorem 1.6.2. Thus assumption 2.3 holds; therefore assumption

2.5 ho1ds.

15
2m

For xe [0.11, 1: (U(x)(k,m))2_<_ (m+1)‘2. Thus
k=l

8

m'2 < m and therefore “UnUD + “u:“0 = 0(1).

ll [‘1

sun 11161135. 1 +
x m 1

Finally <3. un - ”h’o = 0(n'E) by Alexits (1961), 4.6.1
Therefore the hypotheses of theorem 3.3 are satisfied with y = 0
and e = E.

3.10 Assumption. Assumption 3.5 holds with the following choices

 

of S, S, 0, Dn’ and U.

S = [0, 2n],
K = {g E REO’ZNJ: g is Lebesgue measurable,
f8"g(u)du = 0 and x§"(g(u112du < m}.
and
S = {h e REO’ZNJ: h(x) = f3h'(u)du + c where c E R and
h' e K}.
D={(l,0)}U{(i,k): i=l,2 and kill,
0n = {(i,k) 6 D: k §_n} for n 3_0,
and
U(x)(i,k) = 1 k = O
= k.1 cos kx i = l and k 3_l
=k" sin kx i=2 and kil.

3.11 Theorem. Let assumption 3.10 hold. Then

 

n 2 IX - e I + 0 .
k=l k k

16

Proof: Define B 6 RD by
8(l.0) =-%; 13" f(x)dx
and
_ k 2n
B(l,k) - T'fo cos (kx)f(x)dx
B(2,k) = g-rg“ Sln (kx)f(x)dx

By the definition of S, f is the indefinite integral of

Zn

f' and f' E K. Since f0 f'(x)dx = O, f(0) = f(2n). Thus

integration by parts shows that for k :_1,

II
#

8(i,k) - - [8" sin (kx)f‘(x)dx if i

=I—a

f2"

0 cos (kx)f'(x)dx if i = 2 .

:1_:

Since f8“(f'(x))2dx < w, B € 23 by the Bessel inequality.

Since f is an indefinite integral, it is continuous and

of bounded variation. Therefore

f(x) = <8, U(x)>D

by, e.g., Akhieser (1956), section III, 53. Then assumption 2.3
holds and therefore assumption 2.5 holds.

Note that sup |IU(x)IID < m.

X€[0,2n]
Finally
* 2 m 2 2
IIU - U [I = >3 (U (NO) + (U (2,k))
n n D k=m+l k k
5_2 2 k'2 = 0(r:x’2dx) = 0(n' ),

k=n+l

17

so by the Cauchy-Schwarz inequality I<8, Un - UN>D| = 0(n'g).
Therefore the hypotheses of theorem 3.3 are satisfied with y = 0

and e = 15.

f CHAPTER 4
RESTRICTION T0 H A FINITE DIMENSIONAL VECTOR SPACE

4.1. Foreword. If assumption 2.2 holds with 8 equal to the vector

 

space spanned by k functions f],...,fk, then assumption 2.3 holds

k

with H = R and U the map

13(5)
5 + .
fk(SI
In this case,since inner products in Rk are easily computed,it is
reasonable to suppose that in assumption 2.4 u: and Un have been

*
chosen so Un = U (see Remark 2.4).

n
Suppose assumption 2.5 or assumption 4.4, the analogue of

assumption 2.5 when Un is random, hold. AS will be seen,

U8n+l - 8n“ converges to zero. Therefore it is possible to find con-

ditions on Un so that, roughly speaking, 8n - 8 cannot be almost

perpendicular to Un too Often and under these conditions,

“8" - B“ + 0'

k _ *
and Un — U .

4.2. Theorem. Let assumption 2.5 hold with H = R n

 

Fix p.: k and let N" be the k x p matrix whose ith column is

U for i = l,...,p. Let an and on be the minimum

n+i-l ,max
T

and maximum eigenvalues, respectively, of wnwn.

,min

Suppose the sequence {a } satisfies

n

18

19

2 2 2
(1) 2 an “Uh“ (l + “Uh“ ) < w and
(2) z ( min aj) = w
k=l nKEJEFk+p-1

for a sequence of integers {nk} such that
(3) nk+1 3_nk + p for all k and
for some 6, A > 0.

(4) 6 :,6 < A for all k.

. < 6

Suppose that for all e > 0,

(5) inf( _1 inf IRn(x)I) > 0 .
n e >Ix-OnI>e

Then .“Bn - B“ + 0. If in addition sgp “Unu < m then Xn - on + 0.
Proof: All the assumptions Of lemma 3.1 hold SO (3.1.3) and (3.1.4)

hold.
By (1)
E(X (anIIUnII(Yn - Rn(Xn))2))_<_ 02 z aﬁnunnz < .. .
Thus
(6) annunmvn - Rn(Xn)) + 0 .
Also by (1)

m annunno +11un1n + o.

20

*
Then since (2.5.1) holds with Un = Un

(8) New - Bull : anIIUnlUAU + I18n - 811111.11) + IYn - Rn(xn)l).

By (3-1-3), (6). (7), and (8)

,hB - 8"“ + 0.

n+1
For any k.: 1 define

A = {16‘

k < limIIBn - BII < k} n {IIB

n+1 ' 8n“ + 0}.

Since except for a set of probability 0

{11m ”an - on > 0} = A

"c: 8

kl k’

to prove IIBn - B“ + 0 we need only show that for any k, P(Ak) = 0.
We now fix k and fix w 6 Ak' Until the end of the proof
we write g instead of g(w) for any random variable 5. Now

choose L1 such that

118,, - 8“ > k"

whenever I > L].
2 _.

Then for all 2 3_L],

p-l

Z (<U

2 T 2
.. B - 13>) = “w (B - BIII
i=0 "2+1 "2 "I "I

_ T _T _
- (8n - 8) Mn N" (8" B)

R. R. 9. 2.

2 -2
2.6 -IIB -BII:6k-
n£,m1n n2

Here we used (4) and the result that if A is a positive definite

T

k x k matrix with minimum eigenvalue, A, then x Ax 3_X-"x“2 for

21

k

all x e R (see Rao (1973) (lf.2.l)).

Thus there exists a sequence {mx} such that mg is in the

set {n£, n£+l,...,n£+p-l} and
-2
B - B>)2,:-§5-

9 <U
()(21 p

whenever 2 3_L].

By (3), m2+1 :m2 for all I. Also by (4),

(<Um , X>)2 < “NT x“2 < AIIXII2 for x 6 RK. Since “B - 8
2 - n£ - m

B 1

mil 1

2 ‘IB. - e. ‘I,
._ L 1 1-1
1—n£.l

n +p-l
(10) (<U B - B >)2 < AI 12 H3 ‘ 3 U2)
m1 m2 "2 "' i=n£+l 1 1']
Since IIBn+1 - 8n“ + 0 we have by (10) that for a number L2
(11) (<U B - B >)2 < §£:E- hene e 2 > L
m ’ m n —- 4p w V r -— 2'

IL 9. 2.

By (9) and (11), if we let L = max{L],L2} then

 

IX -el=l<Uis -B>I
m2 m2 m2 m2
._: l<U , B - e>| - I<U , B - s >|
me "2 m2 me "I
> k-1 é- whenever I > L
_2 p _°
Also IX - e I = I<é - B , U >I 5_A sup{“e - 8“} < m.
mﬁ mg mg m2 m2 n n

Thus by (5) there exists r > 0 such that

R (X )(X - e
mI m2 m2 m2

) 3_r whenever R 3_L.

22

Then Since

"MB

aan(xn)(xn - en) 1

n 1

oo

2 a R (X )(X - 9 )
I=L me me m2 m2 m2

3_ 2 (min a.)r
I=L n£5j§n£+p-l J

u
8

it follows from (3.1.4) that P(Ak) 0.

Finally Since Xn - 6n = <8“ -8],Un>, Xn - 9n + 0 1f

supHUh“ < m.

4.3. Remark. Until now we have assumed that sn is a fixed

 

element of S. Assumption 4.4 is an analogue Of assumption 2.5 when
sn 6 SS2 and on is a random variable.

At time n, the expected Output of the process, given the
past, depends on both Xn and on. When on was non-random we
wrote the expected output as Rn(Xn); the dependence Of the output
on on is implicit in this expression. When an is random it is
more convenient to denote the expected Output as Rn(Xn,en) where
Rn is a mapping of R2 into R.

4.4. Assumption. Assumption 2.2(i) holds. Let Rn be a Borel map
2

 

from R to R such that

(x - y)Rn(x,y) 3_0 for all x,y 6 R.

IRn(x.y)I :.A(1 + Ix - yI) for A > 0.

Let sn E S9 and define

23

on = f(sn).
Suppose assumption (2.3)(i) holds with H = Rk

Un is a measurable transformation into Rk.

and with Un = U(sn),

k

Let {8“} and {Yn} be random sequences in R and R,

respectively, such that with

F = o{B],...,Bn, U],...,U }

n n
we have
8n+1 = 8n - anYnUn for some an :_0,
Fn
E Yn = Rn(xn,en),
and
F

n 2
E (Yn - Rn(Xn,en)) §_o < m.

4.5. Theorem. Let assumption 4.4 hold. Define

 

*
Fn = O{B1’°°"Bn’ U1""’Un-1 .

Suppose r, K > 0. Assume

*
. Fn -1
Wk P (FIIXII : l<UnaX>| 5. I" . IIXII) : 1‘
xeR
and
*

F
E "(nunifu + 1111,1121) < K.

If

(l) 2 an = m and z a < m
and for all e > 0

(2) inf inf IRn(x,y)| >‘0,

n e-]>IX-yI>e

then
IIBn - on + 0.

If

an = an'a with a > 0 and k < a < l,

2
EIIBIII < °°2

and for some c > 0
(3) IRn(x,y)| 3_c|x - yI for all x,y E R,

then

2

sup nO‘IIBn - 8“ < w and sup n“ EIXn - On < m.

n
Proof: First,

* 'k 'k

F F F
n 2 _ , 2 n 2 n 2
(4) E U8n+l - 8H - hen - B“ - Zan E Yn(Xn — on) + an E (YnUUnU) .

If we define h(x) for x 3_0 by
h(x) = inf inf _]IRn(y,Z)I
n XI‘_<_Iy-zI_<_XI‘

then,

25

'k 'k

F F F
E "(Yn(Xn - e )) E n((x - en)E "1")

E:
E (xn - on)Rn(xn,en)
I *
Fn -1
_>. 1118,, - 811M118n - 81111’ (r IIBn-BIBI<Un:Bn-B>Izrllﬁn-BII)

n

Thus,
*
Fn 2
(5) E (Yn(xn - en))_: r.“sn - BHn(HBn - 8“).
Next,
F* F* F
n 2 n 2 n 2
E (vnuunh = E (1111,11 E 1,.)
F*
1 E "nunuzmﬁixnap + 021
and since
2 2 2 2
Rn<xn.en1_<_ 2A (118,, - 811 11U-n|1 + 11
*
(6) EF"(Y [U 12 = 0(Is - e 2 + 1)
"I nII- I n II '

By using (4) - (6) we obtain
*
F'

(7) E "“8 2

2
n+1 - 811 :118n - 811 (1 + fn)

- zanriisn - 811N118. - 8111 + 9n

with fn,gn 3_0 and fn,gn = 0(ag). Then by theorem 1 of Robbins and

Siegmund (1971), lim “8n - 8“ exists and is finite and

z anUBn ‘ BUUINBn ' 8“) < m .

26

Since by (l) and (2), z anxnn(xn) = m if {xn} is any sequence of

numbers satisfying xn + x with x f 0, we have IIBn - 8H + 0.

Moreover, (3) implies

h(x) 3_c r IxI

and this with (7), EHB1HZ < m, and an - an‘“ implies

' BIIZ : EIIBn ' 8112(1 + in) - M E |an - EII2 n'“ + a

.5118 -n

n+1

. _ -2a
for some M > 0 and Wlth fn’gn :_0 and fn,gn - 0(n ).

Then by a lemma of Chung (see Fabian (1971), lemma 3.1)

8

sup {naEIIBn - enz} <
n

1
T and

. . ~ , 2 , 2
Slnce E|xn - enl 5_E(H8n - thUnH) (Ehen - B“ EhUnH )

I A

W%V:K

sup {nu EIX - enI} <
n

8

4.7. Example. With this example we Show that the assumptions of

 

theorem 3.3 imply neither Xn - On + 0 nor “an - B“ + 0.
Let H = R2. Suppose e1 and e2 are the standard unit
vectors in R2, i.e., eI = (1,0) and e; = (0,1). Suppose B is

the zero vector, Rn(x) = x for all n, B] = e] and an = an']

for 0 < a < 1.

Let G be a subsequence Of the integers such that 2 an < m.
nEG
Assume that Un is e] or e2 according as n 6 G or n ¢ G.

Assume the process is deterministic, i.e. Yn = Rn(Xn).

27

For E 6 R2 let g(i) be the ith coordinate of g,
i = 1,2.
If n e G, then UII) = 0 and therefore

(1) = (1)

(l) Bn+1 8n if n i G.
If n 6 G, then Yn = Xn = <8n,Un> = 83]) and Us]) = 1, SO
(2) BAII = 8£1)(l - an) if n 6 G .

Since 8(1) = l, we have by (l) and (2) that

1

85]) = H (1 - ak) for n > 1 .
k<n
kEG
Since a < l, (l - an) f 0 for all n. Then since

2 an < w, there exists d > 0 such that

nEG
lim ( H (1 - ak)) = d .
n-wo kEG
k<n

83]) whenever n e G

Therefore séI) /»0 = 8(1). Moreover Xn =

and therefore Xn - 6n /»0.

4.8. Example. Here we have another example satisfying the con-

 

ditions of theorem 3.3 but for which 8n />B. However, in this case

Xn - on + 0.
Let H = R2. Elements of H will be represented as complex
numbers. Suppose 8 = 0 and
Rn(x) = l for x > 0

-l for x :_O.

28

Suppose c1 = l and

n-1
1 .2 3']
cn = e 3:] for n 3.2.
~ in-1
Let an - Ie - lI and
. -l
. _ 1n _ -1
Un - (e l)a cn

Also assume 8] = l and Yn = Rn(xn)’ Then for all n

(l) 8n = cn and

- ‘1 L
(21 x - e = xn = -(‘ C35 " 12 .

 

whence Xn - en + 0 but “an - B“ = l for all n.

To prove the last statement, first note that

. -1 _ . -1
a2 = (eln _ ])(e-1n

n - l) = 2(1 - cos n'l).

Next, with Re ¢ denoting the real part of the complex number ¢,

<cn,Un> = Re(cnUn)

 

Thus if (1) holds for n = k, so does (2). Moreover (l) and (2)

with n = k imply (l) for n = k+l by the following calculation:

 

'1
_ l - cos k P
Bk+1 ‘ Ck ‘ ak(Rk(-( 2 )2))Uk

II
0
x.
+
A
(D
I
nun-J
V
O
$-

By Observing that (1) holds for n = 1 the proof is completed.
Note that by Taylor‘s theorem

aﬁ = 2(1 - cos n']) 5 n'

2 + 0(n-4)

with 0 < dn < n']. It is then easy to see that the assumptions of

theorem 3.3 hold if the theorem is trivially generalized by replacing

the assumption an = an'a by an = cnn'a with 0 < m 5-Cn 5_M < m

for some m,M.

BIBLIOGRAPHY

BIBLIOGRAPHY

Akhiezer, N.I. (1956). Theory of Approximation. Ungar, New York.

 

Alexits, G. (1961). Convergence Problems gf_0rthogonal Series.
Pergamon Press, New York.

 

 

Blum, J.R. (1954). Multidimensional stochastic approximation methods.
Ann. Math. Statist. 2_5_ 737-744.

 

Dupac, V. (1965). A dynamic stochastic approximation method. 5mm,
Math. Statist. 36_ 1695-1702.

 

Dupac, V. (1966). Stochastic approximations in the presence Of
trend. Czech. Math. J, l§_(9l) 454-461.

 

Fabian, V. (1971). Stochastic approximation. Optimizing Methods
jm_Statistics (J.S. Rustagi, ed.) 439-470. Academic Press,
New York.

 

 

Kiefer, J. and Wolfowitz, J. (1952). Stochastic estimation Of the
maximum of a regression function. Ann. Math. Statist. g§_ 462-466.

 

 

Loéve, M. (1963). Probabilitylheory(3rd edition). Van Nostrand,
New York.

Rao, C.R. (1973). Linear Statistical Inference and ItS Applications.
John Wiley, New York.

Robbins, H. and Monro, S. (1951). A stochastic approximation method.
Ann. Math. Statist. 22_ 400-407.

 

Robbins, H. and Siegmund, D. (1971). A convergence theorem for non
negative almost supermartingales and some applications.
thimizimngethods jm_Statistics (J.S. Rustagi, ed.) 233-257.
Academic Press, New York.

 

Uosaki, K. (1974). Some generalizations Of dynamic stochastic
approximation processes. Ann. Statist. 2_ 1042-1048.

 

30

 

MICHIGAN STATE UNIV. LIBRARIES
IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII
31293102512823