5‘1
H

 

A KlEFER - WOLFOWITZ TYPE STOCHASTIC
APPROXIMATION PROCEDURE

Dissertation for the Degree of Ph. D.
MICHIGAN STATE UNIVERSITY
THOMAS EDWARD DBREMSK!

1976

,lllttw

I l

 

This is to certify that the

thesis entitled

A KIEFER-WOLFOWITZ TYPE
STOCHASTIC APPROXIMATION PROCEDURE

presented by

Thomas Edward Obremski

has been accepted towards fulﬁllment

of the requirements for
Ph.D. Statistics and

Probability

 

 

degree in

\ch cLoe v totD‘owx

Major professor

 

Dam April 30, 1976

 

0-7639

 

-~N‘\"
\
1..

5.3 9; A; 7

§;
V A!
wt}-

 

it§7

ABSTRACT

A KIEFER-WOLFOWITZ TYPE
STOCHASTIC APPROXIMATION PROCEDURE

By

Thomas Edward Obremski

In considering both the Robbins-Monro (RM) and
Kiefer-Wolfowitz (KW) stochastic approximation procedures,
Abdelhamid (1973) has shown that if the density g of
the errors in estimating function values (RM case), and
differences of function values (KW case) is known, then
a transformation of observations leads to methods which
under mild conditions have desirable asymptotic properties.
Fabian (1973) obtained the same asymptotic results in the
(RM) case without assuming knowledge of the density g. We
study the analogous problem in the (KW) case, and obtain

the same asymptotic results as Abdelhamid.

A KIEFER-WOLFOWITZ TYPE
STOCHASTIC APPROXIMATION PROCEDURE

By

Thomas Edward Obremski

A DISSERTATION
Submitted to
Michigan State University
in partial fulfillment of the requirements
for the degree of
DOCTOR OF PHILOSOPHY
Department of Statistics and Probability

1976

To Kacey, who
(to the best of my
knowledge) never
doubted I could do it.

ii

ACKNOWLEDGEMENTS

I am sincerely grateful to Professor Véclav Fabian
for the guidance he gave me during my thesis work. His
patience and valuable suggestions were always a source of
encouragement for me. I feel that I have learned from him
not only about the content area of the thesis, but also a

good deal about writing style.

I would also like to thank David Ruppert for reading
the material and discussing it with me, and Mrs. Noralee

Burkhardt for her excellent typing of the thesis.

iii

Chapter
1
2

4
REFERENCES

TABLE OF CONTENTS

INTRODUCTION

ALMOST SURE CONVERGENCE OF THE
PROPOSED STOCHASTIC APPROXIMATION
PROCEDURE . .

ASYMPTOTIC NORMALITY OF THE PROPOSED

PROCEDURE
THE MAIN RESULT .

iv

Page

17
21
26

CHAPTER ONE
INTRODUCTION

Consider the stochastic approximation procedure

given by

(1.1) Xn+1 = Xn - ancn Yn , n = 1,2,...,

where Xh, Yn are random variables and an, cn are
positive numbers. Included in (1.1) are both the Robbins-
Monro (1951) procedure (RM), and the Kiefer-Wolfowitz
(1952) procedure (KW).

Abdelhamid (1973) and, independently, Anbar (1973)
investigated the possible effect that transforming the
observed random variables Yn might have on the almost
sure convergence and the asymptotic normality. Abdelhamid's
investigation included both the RM and KW cases, Anbar's
only the RM case.

Specifically they studied the asymptotic behavior

of the procedure given by
- -1 _
(1.2) _ Xn+1 — Xn - anon h(Yn) , n - l,2,...,

where h was assumed to belong to a class C of Borel
measurable functions which preserve both the almost sure
convergence of Xn to 6 and the asymptotic normality of

1

nB(Xn - e), where in the RM case, 6 is the unknown root
of the function f and 8 = k, and in the KW case, 6 is
the point of minimum of f and 8 lies in the interval

[1/4, 1/3], depending on the assumptions on f. Denote

by Fn the sigma-algebra generated by X1,X2,...,Xn. The
following conditions were assumed for the random variables
F
_ _ n . . . _
Vn - Yn E (Yn)' Vn are conditionally (given Fn) dis

tributed according to a symmetric distribution function
G admitting density g; g has derivative almost every-
where with respect to G; O < I(g) =
I(g'(V)/g(V))2dG(V) < +w.

Within the class C, they sought that function
h* which would minimize the second moment of the asymptotic
distribution. It was known that in the case where g is
normal, such an h was given by the identity function,
that is, when g is known to be normal, (1.1) cannot be
improved upon by transformation of observations. They
found in general that within the class C, h*(v) =
(-g'/g)(v), unique up to multiplicative constant. So for
example if g is double exponential, then (-g'/g)(v) =
C sign(v) with a constant C > 0, and the optimal pro-

cedure is

_r '1
(1'3) Xn+1 _ kn ' a cn

n sign(Yn), n = 1,2,...,

first suggested by Fabian (1960 and 1964).

Abdelhamid also suggested improvements in some
cases where G is known but fails to satisfy all of the
assumptions above.

Without assuming knowledge of the distribution G,
Fabian (1973a) constructed a RM-type procedure which per-
forms asymptotically as well as the transformed RM pro-

-1

cedure (1.2) does when G is known. With an = an ,

CD = l in the RM case and cn-Y in the KW case, a and

c positive numbers, and y in the interval [1/6, 1/4],
Abdelhamid had derived values of a and c optimal in

the sense of minimizing the second moment of the asymptotic
distribution. In the RM case the optimal choice of a is
(f'(6)I(g))-1. Fabian suggested methods of estimating
I(g), -g'/g, and f'(6) and pointed out some of the prob-
lems inherent in such estimation.

The main purpose of this paper is to achieve
asymptotic results in the KW case with. G unknown which
are as strong as those obtained by Abdelhamid. Much of
the direction for this present paper is provided by
Fabian's 1973 paper which we shall refer to henceforth as
I. In some places we were able to apply results obtained
in I directly to the KW case. These places are indicated
in the text. .Much of the actual estimation of unknown

parameters that is outlined in our main result, Theorem

(4.1) is carried out as in I.

But, as in previous cases, when properties were
obtained first for the RM case, the proofs for the
analogous properties in the KW case had to be changed at
several points to cover the more difficult situation. The
paper gives detailed treatment only to the new parts of
the proof and refers to I for the other parts, to an
extent that leaves the paper readable.

The same speeds of convergence (Theorem (2.11))
and asymptotic normality result (Theorem 3.1)) as those
obtained by Abdelhamid (Theorems (4.4) and (4.5)) are
achieved. The main result, Theorem (4.1), is a realiza-
tion of the procedure suggested, indicating how to estimate
the optimal values of a and c, as well as I(g),

-g'/g, f"(6), and f"‘(6) . As in the RM case there is
more freedom in estimating -g'/g than in estimating

‘B
I(g). This is reflected in the conditions an > n 1

‘8 '-
in (4.2.vi) and an 3 (log n) 0 in (4.2.111).
In Abdelhamid's treatment of the KW case, and
in many of the earlier treatments, the following two

assumptions have appeared: First, there exist constants

A and B such that
(1.4) |f(x+l) - f(x)| < Alx - 6| + B, for every x in R,

and, secondly,

' Fn 2 2
(1.5) E (Vn) i o , for every natural number n,

F
for a number 0 and Vn = Yn - E n(Yn). The latter

assumption may be omitted here if in the truncation of

the Yn giveE-ig (2.6.4), yn are chosen to be
(log (n v 2)) 1. The use of truncation in (2.6.4)
also enables us to weaken the assumption (1.4) to f
being bounded on bounded intervals. Without the
truncated term in the recursion relation (2.6.3) we would

need to assume not only (1.4) but also a similar type of

condition for E n(-g'/g)(Yn).

CHAPTER TWO

ALMOST SURE CONVERGENCE OF THE
PROPOSED STOCHASTIC APPROXIMATION PROCEDURE

2.1 Basic Notation

 

A11 random variables are assumed to be defined on
a probability space (Q,F,P). Relations between random
variables, including convergence, are meant to hold almost
surely, unless specified otherwise. The set of real
numbers is denoted by R, positive reals by RT, and the
class of all Borel subsets of R by B. The indicator

function of a set S is denoted by X Let E denote

S'
expectation, and EF conditional expectation, given the
o-algebra F. If 21,...,Zn are random variables, then
F(Zl,...,Zn) denotes the o-algebra induced by 21,...,Zn.

If {bn} is a sequence of numbers and {Zn} a
sequence of random variables, then we write 2n = 0(bn)
if lim sup Ib;12n(w)| < +m for almost all m. Similarly
we write Zn = 0u(bn) if there exists a K in R and
an integer n0 with Ibglan i K, for all n 3 no.

“If m is a function on R and a is in Rf,
then for each x in R, ¢a(x) denotes the difference-
w(x+a) - ¢(x-a); if k is a natural number, ka(x) de-

notes the kth derivative of m at x.

6

2.2 Remark

The following assumptions are listed for reference
later. Assumptions (2.6) and (2.7) appear in the con-
vergence results in this chapter, Assumption (2.8) in
the asymptotic normality result, Theorem (3.1). Only
Assumptions (2.3) and (2.4) appear in the main result,
Theorem (4.1).

2.3 Assumption

 

Both 6 and y belong to R. We assume that f

is a function on R such that either

sz exists, is continuous in a neighborhood of e, and

Y = 1/4.

or

D3f exists, is continuous in a neighborhood of e, and

Y = 1/6.

We further assume that f is bounded on bounded intervals,

, that D2f(6) = M > O, and that for every natural number k,

(2.3.1) sup 1 D f(x) < 0; 1 inf 2 f(x) > 0,
-k<x-6<- k E<x-e<k

where D f(x) and 2 f(x) denote respectively the upper

and lower derivatives of f at x.

2.4 Assumption

 

Assumption (2.3) holds. We assume that X1,X2,...

and Y1,Y2,... are random variables, that Fn is a

non-decreasing sequence of o-algebras such that for each
n, Fn contains the o-algebra F(X1,...,Xn,Y1,...,Yn_1).
For each n, Cn is a positive Fn-measurable random

. . = -y n - - _
variable w1th cn Cnn , and f (Xn) IS the Fn measur

able random variable whose value at w is
f(Xn(w) + cn(w)) - f(Xn(w) - cn(w))

For each n, Yn - fcn(Xn) is conditionally, given Fn’
distributed according to a distribution function G which
is symmetric, has zero expectation, has a density g which
has a continuous derivative Dg everywhere on R. The
density g is non-increasing on [0,w) and 0 < I(g) =
f(g-1D(g))2dG < +m.
2.5 Remark

The assumption of symmetry of G is a natural one
in a Kiefer-Wolfowitz type of procedure, where Yn is an
unbiased estimator of an(Xn). This requirement is
satisfied, for example, if the errors in estimating
f(Xn + CD) and f(Xn - cm), respectively, are independent
and identically distributed, given n.

2.6 Assumption

 

Assumption (2.4) holds and hn are measurable
functions on‘ (O x R, Fn x B) such that for each m,
hn(w,-) are odd, and are non-negative on [0,+m). ‘For
each 'n, DD is a non-negative Fn-measurable random

variable and

9

E
(2.6.1) lhn(w.t)| : n 1x(_n,n)(t).

‘60 El

(lo n) < C < (lo n) O D < n81
g _n_ g ’ n—

with numbers €0.61 satisfying 0 < 51 < y/2 and
0 < to < 51/2. Note that for both possible values of

y, we can (and will) select a u such that

(2.6.2) 1/2 - y - 251 > ”Y > 0.

We shall write hn(t) for hn(-,t), and hn(Yn)
for hn(-,Yn(-)).

The random variables X1,X2,... satisfy

1 -l+e

1~
[Dnhn(Yn) + log(n v 2) Y 1

(2.6.3) xn+1 = xn - (men) n

where

(2.6.4) in = (Yn v <-yn)) A yn

c

with yn = n 1 if G has finite second moment and
l-2€

l

Yn = (log(n V 2)) otherwise.

2.7 Assumption

 

Assumption (2.6) holds. For almost all m,
hn(w,-) + -g'l(Dg) on the set it; g(t) > O} and

(2.7.1), an + (2M I(g))‘l.

2.8 Assumption

 

Assumption (2.7) holds and

(2.8.1) [[hn(t + nn(c)) + g'1(Dg)]2dG + o

10

for every sequence {nu} of functions on O x R with
c
Innl g If n(xn)| and such that, for almost all m,

hn(w, t + nn(w,t)) are Borel measurable with respect to

t. The random variables Cl,C2,... satisfy

t<§><n3ﬂe))'21‘1<g)Jl/6 if v = 1/6
(2.8.2) cn s and D3f(e) + o

C otherwise,

where c is in R+.

2.9 Remark

Suppose Assumption (2.6) holds. In the proof of

Theorem (2.10) we shall require expressions for
F
E nhn(Yn) and E nYn. If k is a Borel measurable func-

F
tion,tﬂunithe conditional expectation E nk(Yn), pro-

vided it exists, is equal to K(E nYn) where K(A) =
[k(t + A)g(t)dt. For k = hn and k(t) = (t v (-yn)) v yn’
several properties of K were established in I under
the same conditions on g and on hn as we assumed in

(2.6). So using the results (I3.1.l), (13.1.2), and
(13.1.9) we have

Fn an FTP” Cn
(2.9.1)- E hn(Yn) = wn(f (xn)), a Yn = f (xn).<n

where Wn are functions satisfying

(2.9.2) Awn(A) 3 O for all A in R,

11

E
(2.9.3) h'lwn(h) 3 kn 1 with a k in R1, for all A ¢ 0,

and Kn (equal to ¢$(An) in I) are non-negative
Fn-measurable random variables with Kn + l on the set
of all m for which {f n(X'n(w))} is a bounded sequence.
2.10 Theorem

If Assumption (2.6) holds, then (log n)B(Xn - e) + 0,
for every 8 > 0.

Assume without loss of generality that e = 0.
Let s > 0. It is easy to see that there is a function m
on R such that ¢(x) = ¢(-x) for all x, m = 0 on
[0,5] and m > 0 on (e,+w), m has a bounded second
derivative and first derivative 9 satisfying
|D(x)| 3 |x| for all x, and g(x) = x for x > 25.

€1'Y

We have c = C n"Y i n by (2.6.1) and it

n n
suffices to consider n so large that cn < 5/2. Then

by (2.9.1)
F

n~
(2.10.1) g(xn)E Yn 3 0,
c
since Kn 3 0 and sign f n(x) = sign x for |x| > cn
F
= -1 n y
by (2.3.1). Define Bn (cn g(Xn)E Yn)2. Write (2.6.3)
F
- _ = n .
as Xn+l - Xn - Un’ and Nn E Un' Then With

'1+€1 2
n , we have Q(Xn)Nn = An + aan,

where by (2.9.2), An 3 0. So

a = n-1(log n)

12

2
(2.10.2) 12(Xn)Nn 3 01an

Also, by (2.6.1), (2.6.2), and (2.6.4)

F 2 -l-2u

n = y . '
(2.10.3) E Un 0u(n ) With “Y > 0.

Relations (2.10.2) and (2.10.3) show that conditions (2),

(3), and (4) of Lemma (3.3), Fabian (1971), are satisfied
280 -l-2u
with yn = En = o, and an = k (log n) (n 7) with a

k in R+. Hence a subsequence {Bn } of {Bn} converges
i

to 0 and the sequence {¢(Xn)} converges to a random
variable.
Let m be a point at which both properties hold.

Since ¢(Xn.(w)) converges, Xn (w) is bounded, and then

c i i
n
so is f i(Xn (m)). Therefore K (m) + 1 and so
1 Hi
cn.
c-1(w)D(X (w))f 1(X (m)) + 0. The latter convergence,
r1i ” ni ni

the properties of D, and (2.3.1) imply that
lim sup IXni(w)| i 6. But, since m(Xn(w)) converges,
¢(Xn(w)) + 0 and lim sup lxn(w)| i e. The final relation
holds for all m in a set of probability one. Since 8
was chosen arbitrary and positive, Xn + 0. (Note that as
a consequence, Kn + 1.)

Now suppose that N is a neighborhood of x = 0
in which sz exists and is continuous if y = 1/4, and

in which D3f exists and is continuous if y = 1/6.

13

c
Expanding f n(Xn) in powers of cn in N we obtain,

with a proper choice of an, that

C
- n _
(2.10.4) cn f (X) - a X + 5 CH ,

where a and En are Fn-measurable random variables,

n
with
2 0 if y = 1/4
(2.10.5) on + M, in + 50 = 1
L3 f (0) if y = 1/6 .
Using expression (2.10.4) we obtain
_1 -1+ 21—;
(2.10.6) Nn = n (aan + gncn ).
C11 -1 cn -1+€1
[Dn(f (Xn)) Wn(f (Xn)) + (log n) Kn].

For 50 as in (2.6.1), we obtain from (2.10.6) and (2.9.3)

that
_1 -1+€O -1+ %7
(2.10.7) Nn = n (log n) (an)!n + Encn )dn,
with
(2.10.8) 0 i 5n = 0u((1og n) n ), 6n + +w.
_1 -1+eO

Then X.n - Nn = Xn(l - n (log n) anén) - Rn’

_1 ‘l+€o ‘1‘1” %"
where Rn = n (log n) Encn st. Note that from

14

(2.10.5), (2.10.8), (2.6.1) and (2.6.2) we have that

-1-u

= Y
(2.10.9) Rn 0(n ), DY > 0.

Now, eventually, depending on w,

_ '1+€ - —1+€
0 g 1 - n 1(log n) Oanén : 1, (l-n 1(log n) Osman)2 :
-1+e -1+€
1 - n-1(1og n) Oandn i l - n-1(log n) O, and
-l+e
(2.10.10) (xn - Nn)2 3 x§(1 - n-1(1og n) O)

+ 21X R l + R2 .
nn n

Writing Xn+1 = (Xn - Nn) - (Un - Nn) we obtain
from (2.10.10)

2 2
(2.10.11) Xn+1 ; (l - An)X.n - 2Vn + Wh + Tn
with
_1 -1+e0
(2.10.12) A ; n (log n) ,

n

(2.10.13) vn = (xn - Nn)(Un - Nn), wn = (Un - Nn)2

_ 2
Tn - ZIXanl + Rn .

Suppose now 8n are positive numbers satisfying

(eventually)

-1 - = Y-
(2.10.14) an 8n+l(l An) ; 1, Bn+1xn 0(n ).

2n -n
BnénY forann>0.

15

We shall show that under these conditions

(2.10.15) 2 8 < +m

8n+1Tn
Bn+1Vn < m

This, (2.10.14),and (2.10.11) then easily imply

2 _
(2.10.16) ann - 0(1)

The first relation in (2.10.15) follows from

(2.10.3) since EWh ; EUﬁ. The second relation follows
2 -2-2U
since by (2.10.9), Rn = 0(n Y) and 8n+1|Xn||Rn| =
u -n -1-u
0(n Y )0(n Y). 1 2
F - - u
n 2 _ 2 y
From (2.10. 3), E Vn - (Xn - Nn) 0 (n )

2 2
But (Xh - Nn) ; Xn + Tn by (2.10.10) and

m 2 T l0 < 'l'qual
Z 8 n 3
=1 n+1 n u — n 1

"M8

n Bn+lTn < +m as we have

already shown. Concerning the other term, we have

-1-2u
2 2 y _ -l-n
8n +1xnoum ) - o (n ) .

co

F
2
E nVn < +m and nil Bn+lvn

This shows that nil éh+l

converges by the generalized Borel-Cantelli Lemma
(Lemma 10, Dubins and Freedman (1965)). Thus (2.10.15)
holds.

16

Choose now 8n = (log n)b so that (2.10.14) is
satisfied and (2.10.16) holds. This proves the theorem.
2.11 Theorem

If Assumption (2.7) holds, then nB(Xn - 0) + 0
for every 0 < B < k - y - 281.

Relation (I3.1.l6) can be rewritten as
(2.11.1) lim inf ”n 3 I(g)

with “n = f‘1(xn)9n(f(xn)). This relation holds also in

c c
. _ n -1 n .
our case With “n — [f (Xn)] WnEf (Xn)]. So we obtain
from (2.10.6) and (2.7.1) a strengthening of (2.10.7) to
_1 ”1+ %—'-Y-
(2.11.2) Nn = n kn(anXn + ancn )

2E
with lim inf kn 3 (2M)'1, kn = 0u(n 1). Then, (2 10.11)

holds with (2.10.12) strengthened to

-la k'

(2.11.3) A ; 2n n n

n

with kn - kn + 0.

> 0 f . 0
1n .

We know this is true at least for 80 = 0. Choose a B

B+BO
in (Bo,uy) and set 8 = n . These 8n satisfy

n

(2.10.14) and thus, also, (2.10.16). Thus ann + o for
every 8 < “y; but since “Y can be chosen as any number
less than k - y - 261 (see (2.6.2)), the assertion of

the theorem holds.

CHAPTER THREE
ASYMPTOTIC NORMALITY OF THE PROPOSED PROCEDURE

To obtain the following asymptotic normality
result for the procedure proposed in (2.6.3) we use a
one—dimensional version of Theorem (2.2), Fabian (1968).
3.1 Asymptotic Normality Theorem

If Assumption (2.8) holds, then ng-YOCn - 9)

is asymptotically normal with

mean = 0,
variance = [6 I(g)M C ]
Fmean = 0, if y = 1/6

and
variance = [(16/3)1(g)M20‘-1'1 D3f(e) = 0,

(11}

mean = -(Q/128)1/3, if y = 1/6

and
Lvariance = (0/211/2)2/3, 03£(e) a 0,

 

where Q = 3D3f(e)M-3I'1(g).

"Assume without loss of generality that e = 0.
Suppose Assumption (2 8) holds. As in I, proof of
Theorem (3.1,iii), use (2.8.1) and the Schwarz inequality

17

 

18

c c
to obtain lim sup [f n(Xn)]-1‘Pn[f n(Xn)] i I(g). This,

along with (2 11.1) gives

c c
n -1 n +
(3.1.1) . [f (Xn)] WHEf (Xn)] I(g),

and from (2.10 6)

1 + l—
_ -1 ' ZY
(3.1.2) Nn - n (aan + Encn )An,
. cn _1 cn -l+e1
Wlth An = DnEf (Xn)] WHEf (Kn)] + (log n) Kn,

where in the proof of Theorem (2.10), it was shown that

Kn + 1. So by (3.1.1) and (2.7.1) we have

(3.1.3) in s (210'1 .

F
Denoting conditional variance, given Fn’ by Var n,

we have

r- c c
, n _ 2 n _ 2 n +
Var [hn(Yn)] - fhn(t + f (Xn))dG(t) intf (Xn)1 I(g)

C
by (2.8.1) and since WnEf n(xn)1 + 0. Therefore

F
2 n -2 -1
(3.1.4) Dn Var [hn(Yn)] + (2M) I (g).
Fn -l+e1~
Now consider Var [(log n) Yn]. If yn in (2.6.4)
- 1-261 251

are (log n) , this variance is bounded by (log n)

a
0n the other hand, if yn = n 1 then G has finite

19

second moment, say 02, and Var nYn i E nYﬁ i E nYﬁ i
2 Cn 2 Fn~
o + [f (Xn)] . So on the set {Xn + 0}, Var Yn + 0.
In either case then we have
Fn -1+e1~
(3.1.5) Var [(1og n) Yn] + 0

The random variables hn(Yn) and Yn are not independent,
but by the Schwarz inequality it follows from (3.1.4)
and (3 1.5) that

2

F
(3.1.6) (ncn)2E n(un - N ) s 1'1(g)(2M)'2 .

11

Now we set 2n = ncn(Un - Nn) and suppose r

e 25
1).

is in R+. By (2.6 1), 2n = 0u((log n) 0n So

{2: > rn} is eventually empty and

2
(3.1.7) E an 2 + 0 .

{Zn 1 rn}
Writing Xn+1 as (Xn - Nn) - (Un - Nn) we obtain
1 -1c-l+l/(27)

= -1 -1 -
Xn+1 (1 ' n anAn)Xn ' n 0n 2n ' n n gnxn'

Using this, (3.1.3), (3.1.6), (3.1.7), and the measurability

properties of an, A and an we obtain the desired

n,
result by applying Theorem (2.2), Fabian (1968) with

Un in Theorem (2.2) replaced by Xn here, Pn by
-l
anAn, Vn by Zn, 0n by - Cn

_ -1+1/(27)
Cn gnx

, and Tn by

20
For the case y = 1/4, by (2.8.2), (2.10.5), and

(3.1.3) we have, 0 = -c"1 and T = 0. Similarly for the

case when 7 = 1/6 and D3f(0) = 0. Finally, if

1/6 and D3f(0) # 0, 9 is -[(2/3)[D3f(0)]21(g)]1/6

Y:
and T is -(6M)‘1[(3/2)[03f(0)11'1(g)11/3.

In all cases, F = l, a = l, B = 8+ = 1 - 2y,

and by (3.1.6), 2 = I‘1(g)(2M)‘2.

CHAPTER FOUR
THE MAIN RESULT

In this chapter we state and prove the main result,
a realization of the procedure given in (2.6.3). Only
(2.3) and (2.4) are assumed to hold. This result is given
in the following theorem.
4.1 Theorem

Suppose Assumptions (2.3) and (2.4) hold with Fn
as defined below. Let {kl} be an increasing sequence
of positive integers such that 2/k£ + 0. Suppose {U1}

and {V } are sequences of random variables such that

l
with
Fn = F({X1,Yl,...,Yn_l} U {U,; k, < n} U {V,; kg < n}>»
we have
r’r
kg -2 d2 d2
E U - (2d ) (f ) (X ) .
1 1 k,
Fk Fk
E 2(U - E 2U )2 - 0 (d'4)
‘ 2 2 u 2 ’
(4.1.1)~ Fkl 3 d, d1 d2
E v2 = <2d,) ((f > > (xk1)’
rk Fk
E IL(v - E 2V )2 = 0 (d-6)
k l l U l ’

 

21

22
with d2 of the form

(4.1.2) d, = d1' , d in R+, 0 < a < 1/6 .

Then the sequence {Kn} as defined in (4.2) below con-
verges to 6 and tE'Y(Xh - 6) is asymptotically normal
with mean and variance as given in Theorem (3.1), (i)

and (ii), where 2tn = 2n + 7 card {2; kg < n} is the
number of observations needed to construct Xn.

4.2 The Procedure

 

(i) Estimation of D2f(e):

 

Set Uh equal to the arithmetic mean of all

U2 with kl < n. Then set

(4.2.1) un = (0 v Un).

(ii) Estimation of D3f(8):
Set vn equal to the arithmetic mean of all

V2 Wlth k2 < n.

(iii) Estimation of I(g):

 

-B
Let 80 > 0 and choose an 3 (log n) 0. Then this

estimation is carried out precisely as it is in (I4.2.b),
that is by a sequence {wh} with

- _ o 2
(4.2.2) “n - f(hn) dGn-1

where Gn is the empirical distribution function of
o
Y1,Y2,...,Yn, and hn is defined in (14.2.3).

 

 

 

23
(iv) The sequence On:
Set
-e
(4.2.3) Cn = [(3/2)v:12wI-11)1/6 v (log n) O] A (log n)
if y = 1/6 and D3f(e) = 0
C , otherwise
with 60 as in (2.6.1), C as in (2.8.2), vn as in
(4.2.ii) and'wn as in (4.2.2).
(v) The sequence Dn:
Set
(4 2 4) 0 = (2u w )’1 A n61
‘ ' n n n '
(vi) The functions hn:
-81
Choose a 3 n for n = 2,... Then the

n

80

estimators are constructed precisely as in (I4.2.c), but

 

for O < 81 < % - y - 261, with y
(2.3) and 51 as (2.6.1).
(vii) The sequence Xn:

The recursion relation for
(2.6.3).

Proof of Theorem (4.1):

We shall prove the theorem by verifying Assump-

tions (2.3), (2.4), (2.6),

as in Assumption

Xn is given in

(2.7), and (2.8).

24

First, Assumptions (2.3) and (2.4) are assumed to
hold in the theorem. The measurability conditions on

Cn' Dn' and hn and condition (2.6.1) are obvious from

their definitions. Relation (2.6.3) holds by assumption.
Thus Assumption (2.6) holds, and by Theorem (2.10),

(log n)8(Xn - e) + 0, for every 8 > 0.

2

To show that un converges to D f(e), it suffices

—- _ 2
to show that Un does. Let Wﬁ — U,L - E U2. Then

W2 is an orthogonal sequence,
2 (log 2)2£-2E WE i Cld-4 2 (log 1)
i=1 i=1

for a C1 in R+. So by Theorem (33.1.B.ii), Loeve, we

2 Fk2 2

X Wj + 0. Also E U2 = D f(xkl + 0%), where

2£-2+46 < +m

have 1-1

|v 1 2di' 80 eventually, depending on m, we obtain

F
k2.

using Assumption (2.3) that E U2 + D2f(e) and

Un+D2ﬂ6). The convergence of Wu to I(g) follows

Q I

from Theorem (2.2), Fabian (1973b). Verification of the
assumptions of this theorem are given in (14.3.ii).
Therefore Assumption (2.7) holds, and by our Theorem
(2.11), nB(Xﬁ - e) + o, for every 0 < 3 < s - y - 2c1.
,Finally, (2.8.1) follows from.Extension (2.3),
Fabian (1973b). Details and verification of the assump-

tions of this extension are given in (14.3.iii). The

25

convergence of vn to D3f(6) follows by an argument
similar to that used to show an +D2f(6). Therefore
Assumption (2.8) holds, and by our Theorem (3.1), Xn
has the properties asserted in Theorem (4.1) since

tn/n + 1 because i/kl + 0.

J 1AA
.‘l.ll A

REFERENCES

REFERENCES

ABDELHAMID, S.N. (1973). Transformation of observations
in stochastic approximation. Ann. Statist. 1
1158-1174.

ANBAR, D. (1973). On optimal estimation methods using

stochastic approximation procedures. Ann.
Statist. 1 1175-1184.

BLUM, J.R. (1954). Approximation methods which converge
with probability one. Ann. Math. Statist. 25
737-744.

DUBINS, L.E. and FREEDMAN, D.A. (1965). A sharper form
of the Borel-Cantelli lemma and the strong law.
Ann. Math. Statist. 36 800-807.

FABIAN, V. (1960). Stochastic approximation methods.
Czechoslovak Math. J. 10(85) 123-159.

FABIAN, V. (1964). A new one-dimensional stochastic
approximation method for finding a local minimum
of a function. Trans. Third Prague Conf.
Information Theory, Statistical Decision Func-
tions, Random Processes, Czechoslovak Academy of
Sciences, Prague, 85-105.

FABIAN, V. (1968). On asymptotic normality in stochastic
approximation. Ann. Math. Statist. 39 1327-
1332.

FABIAN, V. (1971). Stochastic Approximation. Optimizing
Methods in Statistics (J.S. Rustagi, ed.). .
439-470. Academic Press, New York.

FABIAN, V. (1973a) (Referred to in the text as I).
“Asymptotically efficient stochastic approximation;
the RM case. Ann. Statist. 1 486-495.

FABIAN, V. (1973b). Estimation of the derivative of the
logarithm of a density. Ann. Statist. 1 557-561.

26

27

KIEFER, J. and WOLFOWITZ, J. (1952). Stochastic estima-
tion of the maximum of a regression function.
Ann. Math. Statist. 23 462-466.

LOEVE, MICHEL (1963). Probability Theory, Third Edition.
D. van Nostrand Company, Inc.

ROBBINS, H. and MONRO, S. (1951). A stochastic approxima-
tion method. Ann. Math. Statist. 22 400-407.

VENTER, J.H. (1967). An extension of the Robbins-Monro
procedure. Ann. Math. Statist. 38 181-190.

   

nrcnrcnu STATE UNIV. LIBRARIES
1|111111111111111111111111111111111
31293107864591