5‘1 H A KlEFER - WOLFOWITZ TYPE STOCHASTIC APPROXIMATION PROCEDURE Dissertation for the Degree of Ph. D. MICHIGAN STATE UNIVERSITY THOMAS EDWARD DBREMSK! 1976 ,lllttw I l This is to certify that the thesis entitled A KIEFER-WOLFOWITZ TYPE STOCHASTIC APPROXIMATION PROCEDURE presented by Thomas Edward Obremski has been accepted towards fulfillment of the requirements for Ph.D. Statistics and Probability degree in \ch cLoe v totD‘owx Major professor Dam April 30, 1976 0-7639 -~N‘\" \ 1.. 5.3 9; A; 7 §; V A! wt}- it§7 ABSTRACT A KIEFER-WOLFOWITZ TYPE STOCHASTIC APPROXIMATION PROCEDURE By Thomas Edward Obremski In considering both the Robbins-Monro (RM) and Kiefer-Wolfowitz (KW) stochastic approximation procedures, Abdelhamid (1973) has shown that if the density g of the errors in estimating function values (RM case), and differences of function values (KW case) is known, then a transformation of observations leads to methods which under mild conditions have desirable asymptotic properties. Fabian (1973) obtained the same asymptotic results in the (RM) case without assuming knowledge of the density g. We study the analogous problem in the (KW) case, and obtain the same asymptotic results as Abdelhamid. A KIEFER-WOLFOWITZ TYPE STOCHASTIC APPROXIMATION PROCEDURE By Thomas Edward Obremski A DISSERTATION Submitted to Michigan State University in partial fulfillment of the requirements for the degree of DOCTOR OF PHILOSOPHY Department of Statistics and Probability 1976 To Kacey, who (to the best of my knowledge) never doubted I could do it. ii ACKNOWLEDGEMENTS I am sincerely grateful to Professor Véclav Fabian for the guidance he gave me during my thesis work. His patience and valuable suggestions were always a source of encouragement for me. I feel that I have learned from him not only about the content area of the thesis, but also a good deal about writing style. I would also like to thank David Ruppert for reading the material and discussing it with me, and Mrs. Noralee Burkhardt for her excellent typing of the thesis. iii Chapter 1 2 4 REFERENCES TABLE OF CONTENTS INTRODUCTION ALMOST SURE CONVERGENCE OF THE PROPOSED STOCHASTIC APPROXIMATION PROCEDURE . . ASYMPTOTIC NORMALITY OF THE PROPOSED PROCEDURE THE MAIN RESULT . iv Page 17 21 26 CHAPTER ONE INTRODUCTION Consider the stochastic approximation procedure given by (1.1) Xn+1 = Xn - ancn Yn , n = 1,2,..., where Xh, Yn are random variables and an, cn are positive numbers. Included in (1.1) are both the Robbins- Monro (1951) procedure (RM), and the Kiefer-Wolfowitz (1952) procedure (KW). Abdelhamid (1973) and, independently, Anbar (1973) investigated the possible effect that transforming the observed random variables Yn might have on the almost sure convergence and the asymptotic normality. Abdelhamid's investigation included both the RM and KW cases, Anbar's only the RM case. Specifically they studied the asymptotic behavior of the procedure given by - -1 _ (1.2) _ Xn+1 — Xn - anon h(Yn) , n - l,2,..., where h was assumed to belong to a class C of Borel measurable functions which preserve both the almost sure convergence of Xn to 6 and the asymptotic normality of 1 nB(Xn - e), where in the RM case, 6 is the unknown root of the function f and 8 = k, and in the KW case, 6 is the point of minimum of f and 8 lies in the interval [1/4, 1/3], depending on the assumptions on f. Denote by Fn the sigma-algebra generated by X1,X2,...,Xn. The following conditions were assumed for the random variables F _ _ n . . . _ Vn - Yn E (Yn)' Vn are conditionally (given Fn) dis tributed according to a symmetric distribution function G admitting density g; g has derivative almost every- where with respect to G; O < I(g) = I(g'(V)/g(V))2dG(V) < +w. Within the class C, they sought that function h* which would minimize the second moment of the asymptotic distribution. It was known that in the case where g is normal, such an h was given by the identity function, that is, when g is known to be normal, (1.1) cannot be improved upon by transformation of observations. They found in general that within the class C, h*(v) = (-g'/g)(v), unique up to multiplicative constant. So for example if g is double exponential, then (-g'/g)(v) = C sign(v) with a constant C > 0, and the optimal pro- cedure is _r '1 (1'3) Xn+1 _ kn ' a cn n sign(Yn), n = 1,2,..., first suggested by Fabian (1960 and 1964). Abdelhamid also suggested improvements in some cases where G is known but fails to satisfy all of the assumptions above. Without assuming knowledge of the distribution G, Fabian (1973a) constructed a RM-type procedure which per- forms asymptotically as well as the transformed RM pro- -1 cedure (1.2) does when G is known. With an = an , CD = l in the RM case and cn-Y in the KW case, a and c positive numbers, and y in the interval [1/6, 1/4], Abdelhamid had derived values of a and c optimal in the sense of minimizing the second moment of the asymptotic distribution. In the RM case the optimal choice of a is (f'(6)I(g))-1. Fabian suggested methods of estimating I(g), -g'/g, and f'(6) and pointed out some of the prob- lems inherent in such estimation. The main purpose of this paper is to achieve asymptotic results in the KW case with. G unknown which are as strong as those obtained by Abdelhamid. Much of the direction for this present paper is provided by Fabian's 1973 paper which we shall refer to henceforth as I. In some places we were able to apply results obtained in I directly to the KW case. These places are indicated in the text. .Much of the actual estimation of unknown parameters that is outlined in our main result, Theorem (4.1) is carried out as in I. But, as in previous cases, when properties were obtained first for the RM case, the proofs for the analogous properties in the KW case had to be changed at several points to cover the more difficult situation. The paper gives detailed treatment only to the new parts of the proof and refers to I for the other parts, to an extent that leaves the paper readable. The same speeds of convergence (Theorem (2.11)) and asymptotic normality result (Theorem 3.1)) as those obtained by Abdelhamid (Theorems (4.4) and (4.5)) are achieved. The main result, Theorem (4.1), is a realiza- tion of the procedure suggested, indicating how to estimate the optimal values of a and c, as well as I(g), -g'/g, f"(6), and f"‘(6) . As in the RM case there is more freedom in estimating -g'/g than in estimating ‘B I(g). This is reflected in the conditions an > n 1 ‘8 '- in (4.2.vi) and an 3 (log n) 0 in (4.2.111). In Abdelhamid's treatment of the KW case, and in many of the earlier treatments, the following two assumptions have appeared: First, there exist constants A and B such that (1.4) |f(x+l) - f(x)| < Alx - 6| + B, for every x in R, and, secondly, ' Fn 2 2 (1.5) E (Vn) i o , for every natural number n, F for a number 0 and Vn = Yn - E n(Yn). The latter assumption may be omitted here if in the truncation of the Yn giveE-ig (2.6.4), yn are chosen to be (log (n v 2)) 1. The use of truncation in (2.6.4) also enables us to weaken the assumption (1.4) to f being bounded on bounded intervals. Without the truncated term in the recursion relation (2.6.3) we would need to assume not only (1.4) but also a similar type of condition for E n(-g'/g)(Yn). CHAPTER TWO ALMOST SURE CONVERGENCE OF THE PROPOSED STOCHASTIC APPROXIMATION PROCEDURE 2.1 Basic Notation A11 random variables are assumed to be defined on a probability space (Q,F,P). Relations between random variables, including convergence, are meant to hold almost surely, unless specified otherwise. The set of real numbers is denoted by R, positive reals by RT, and the class of all Borel subsets of R by B. The indicator function of a set S is denoted by X Let E denote S' expectation, and EF conditional expectation, given the o-algebra F. If 21,...,Zn are random variables, then F(Zl,...,Zn) denotes the o-algebra induced by 21,...,Zn. If {bn} is a sequence of numbers and {Zn} a sequence of random variables, then we write 2n = 0(bn) if lim sup Ib;12n(w)| < +m for almost all m. Similarly we write Zn = 0u(bn) if there exists a K in R and an integer n0 with Ibglan i K, for all n 3 no. “If m is a function on R and a is in Rf, then for each x in R, ¢a(x) denotes the difference- w(x+a) - ¢(x-a); if k is a natural number, ka(x) de- notes the kth derivative of m at x. 6 2.2 Remark The following assumptions are listed for reference later. Assumptions (2.6) and (2.7) appear in the con- vergence results in this chapter, Assumption (2.8) in the asymptotic normality result, Theorem (3.1). Only Assumptions (2.3) and (2.4) appear in the main result, Theorem (4.1). 2.3 Assumption Both 6 and y belong to R. We assume that f is a function on R such that either sz exists, is continuous in a neighborhood of e, and Y = 1/4. or D3f exists, is continuous in a neighborhood of e, and Y = 1/6. We further assume that f is bounded on bounded intervals, , that D2f(6) = M > O, and that for every natural number k, (2.3.1) sup 1 D f(x) < 0; 1 inf 2 f(x) > 0, -k ”Y > 0. We shall write hn(t) for hn(-,t), and hn(Yn) for hn(-,Yn(-)). The random variables X1,X2,... satisfy 1 -l+e 1~ [Dnhn(Yn) + log(n v 2) Y 1 (2.6.3) xn+1 = xn - (men) n where (2.6.4) in = (Yn v <-yn)) A yn c with yn = n 1 if G has finite second moment and l-2€ l Yn = (log(n V 2)) otherwise. 2.7 Assumption Assumption (2.6) holds. For almost all m, hn(w,-) + -g'l(Dg) on the set it; g(t) > O} and (2.7.1), an + (2M I(g))‘l. 2.8 Assumption Assumption (2.7) holds and (2.8.1) [[hn(t + nn(c)) + g'1(Dg)]2dG + o 10 for every sequence {nu} of functions on O x R with c Innl g If n(xn)| and such that, for almost all m, hn(w, t + nn(w,t)) are Borel measurable with respect to t. The random variables Cl,C2,... satisfy t<§> 0. Assume without loss of generality that e = 0. Let s > 0. It is easy to see that there is a function m on R such that ¢(x) = ¢(-x) for all x, m = 0 on [0,5] and m > 0 on (e,+w), m has a bounded second derivative and first derivative 9 satisfying |D(x)| 3 |x| for all x, and g(x) = x for x > 25. €1'Y We have c = C n"Y i n by (2.6.1) and it n n suffices to consider n so large that cn < 5/2. Then by (2.9.1) F n~ (2.10.1) g(xn)E Yn 3 0, c since Kn 3 0 and sign f n(x) = sign x for |x| > cn F = -1 n y by (2.3.1). Define Bn (cn g(Xn)E Yn)2. Write (2.6.3) F - _ = n . as Xn+l - Xn - Un’ and Nn E Un' Then With '1+€1 2 n , we have Q(Xn)Nn = An + aan, where by (2.9.2), An 3 0. So a = n-1(log n) 12 2 (2.10.2) 12(Xn)Nn 3 01an Also, by (2.6.1), (2.6.2), and (2.6.4) F 2 -l-2u n = y . ' (2.10.3) E Un 0u(n ) With “Y > 0. Relations (2.10.2) and (2.10.3) show that conditions (2), (3), and (4) of Lemma (3.3), Fabian (1971), are satisfied 280 -l-2u with yn = En = o, and an = k (log n) (n 7) with a k in R+. Hence a subsequence {Bn } of {Bn} converges i to 0 and the sequence {¢(Xn)} converges to a random variable. Let m be a point at which both properties hold. Since ¢(Xn.(w)) converges, Xn (w) is bounded, and then c i i n so is f i(Xn (m)). Therefore K (m) + 1 and so 1 Hi cn. c-1(w)D(X (w))f 1(X (m)) + 0. The latter convergence, r1i ” ni ni the properties of D, and (2.3.1) imply that lim sup IXni(w)| i 6. But, since m(Xn(w)) converges, ¢(Xn(w)) + 0 and lim sup lxn(w)| i e. The final relation holds for all m in a set of probability one. Since 8 was chosen arbitrary and positive, Xn + 0. (Note that as a consequence, Kn + 1.) Now suppose that N is a neighborhood of x = 0 in which sz exists and is continuous if y = 1/4, and in which D3f exists and is continuous if y = 1/6. 13 c Expanding f n(Xn) in powers of cn in N we obtain, with a proper choice of an, that C - n _ (2.10.4) cn f (X) - a X + 5 CH , where a and En are Fn-measurable random variables, n with 2 0 if y = 1/4 (2.10.5) on + M, in + 50 = 1 L3 f (0) if y = 1/6 . Using expression (2.10.4) we obtain _1 -1+ 21—; (2.10.6) Nn = n (aan + gncn ). C11 -1 cn -1+€1 [Dn(f (Xn)) Wn(f (Xn)) + (log n) Kn]. For 50 as in (2.6.1), we obtain from (2.10.6) and (2.9.3) that _1 -1+€O -1+ %7 (2.10.7) Nn = n (log n) (an)!n + Encn )dn, with (2.10.8) 0 i 5n = 0u((1og n) n ), 6n + +w. _1 -1+eO Then X.n - Nn = Xn(l - n (log n) anén) - Rn’ _1 ‘l+€o ‘1‘1” %" where Rn = n (log n) Encn st. Note that from 14 (2.10.5), (2.10.8), (2.6.1) and (2.6.2) we have that -1-u = Y (2.10.9) Rn 0(n ), DY > 0. Now, eventually, depending on w, _ '1+€ - —1+€ 0 g 1 - n 1(log n) Oanén : 1, (l-n 1(log n) Osman)2 : -1+e -1+€ 1 - n-1(1og n) Oandn i l - n-1(log n) O, and -l+e (2.10.10) (xn - Nn)2 3 x§(1 - n-1(1og n) O) + 21X R l + R2 . nn n Writing Xn+1 = (Xn - Nn) - (Un - Nn) we obtain from (2.10.10) 2 2 (2.10.11) Xn+1 ; (l - An)X.n - 2Vn + Wh + Tn with _1 -1+e0 (2.10.12) A ; n (log n) , n (2.10.13) vn = (xn - Nn)(Un - Nn), wn = (Un - Nn)2 _ 2 Tn - ZIXanl + Rn . Suppose now 8n are positive numbers satisfying (eventually) -1 - = Y- (2.10.14) an 8n+l(l An) ; 1, Bn+1xn 0(n ). 2n -n BnénY forann>0. 15 We shall show that under these conditions (2.10.15) 2 8 < +m 8n+1Tn Bn+1Vn < m This, (2.10.14),and (2.10.11) then easily imply 2 _ (2.10.16) ann - 0(1) The first relation in (2.10.15) follows from (2.10.3) since EWh ; EUfi. The second relation follows 2 -2-2U since by (2.10.9), Rn = 0(n Y) and 8n+1|Xn||Rn| = u -n -1-u 0(n Y )0(n Y). 1 2 F - - u n 2 _ 2 y From (2.10. 3), E Vn - (Xn - Nn) 0 (n ) 2 2 But (Xh - Nn) ; Xn + Tn by (2.10.10) and m 2 T l0 < 'l'qual Z 8 n 3 =1 n+1 n u — n 1 "M8 n Bn+lTn < +m as we have already shown. Concerning the other term, we have -1-2u 2 2 y _ -l-n 8n +1xnoum ) - o (n ) . co F 2 E nVn < +m and nil Bn+lvn This shows that nil éh+l converges by the generalized Borel-Cantelli Lemma (Lemma 10, Dubins and Freedman (1965)). Thus (2.10.15) holds. 16 Choose now 8n = (log n)b so that (2.10.14) is satisfied and (2.10.16) holds. This proves the theorem. 2.11 Theorem If Assumption (2.7) holds, then nB(Xn - 0) + 0 for every 0 < B < k - y - 281. Relation (I3.1.l6) can be rewritten as (2.11.1) lim inf ”n 3 I(g) with “n = f‘1(xn)9n(f(xn)). This relation holds also in c c . _ n -1 n . our case With “n — [f (Xn)] WnEf (Xn)]. So we obtain from (2.10.6) and (2.7.1) a strengthening of (2.10.7) to _1 ”1+ %—'-Y- (2.11.2) Nn = n kn(anXn + ancn ) 2E with lim inf kn 3 (2M)'1, kn = 0u(n 1). Then, (2 10.11) holds with (2.10.12) strengthened to -la k' (2.11.3) A ; 2n n n n with kn - kn + 0. > 0 f . 0 1n . We know this is true at least for 80 = 0. Choose a B B+BO in (Bo,uy) and set 8 = n . These 8n satisfy n (2.10.14) and thus, also, (2.10.16). Thus ann + o for every 8 < “y; but since “Y can be chosen as any number less than k - y - 261 (see (2.6.2)), the assertion of the theorem holds. CHAPTER THREE ASYMPTOTIC NORMALITY OF THE PROPOSED PROCEDURE To obtain the following asymptotic normality result for the procedure proposed in (2.6.3) we use a one—dimensional version of Theorem (2.2), Fabian (1968). 3.1 Asymptotic Normality Theorem If Assumption (2.8) holds, then ng-YOCn - 9) is asymptotically normal with mean = 0, variance = [6 I(g)M C ] Fmean = 0, if y = 1/6 and variance = [(16/3)1(g)M20‘-1'1 D3f(e) = 0, (11} mean = -(Q/128)1/3, if y = 1/6 and Lvariance = (0/211/2)2/3, 03£(e) a 0, where Q = 3D3f(e)M-3I'1(g). "Assume without loss of generality that e = 0. Suppose Assumption (2 8) holds. As in I, proof of Theorem (3.1,iii), use (2.8.1) and the Schwarz inequality 17 18 c c to obtain lim sup [f n(Xn)]-1‘Pn[f n(Xn)] i I(g). This, along with (2 11.1) gives c c n -1 n + (3.1.1) . [f (Xn)] WHEf (Xn)] I(g), and from (2.10 6) 1 + l— _ -1 ' ZY (3.1.2) Nn - n (aan + Encn )An, . cn _1 cn -l+e1 Wlth An = DnEf (Xn)] WHEf (Kn)] + (log n) Kn, where in the proof of Theorem (2.10), it was shown that Kn + 1. So by (3.1.1) and (2.7.1) we have (3.1.3) in s (210'1 . F Denoting conditional variance, given Fn’ by Var n, we have r- c c , n _ 2 n _ 2 n + Var [hn(Yn)] - fhn(t + f (Xn))dG(t) intf (Xn)1 I(g) C by (2.8.1) and since WnEf n(xn)1 + 0. Therefore F 2 n -2 -1 (3.1.4) Dn Var [hn(Yn)] + (2M) I (g). Fn -l+e1~ Now consider Var [(log n) Yn]. If yn in (2.6.4) - 1-261 251 are (log n) , this variance is bounded by (log n) a 0n the other hand, if yn = n 1 then G has finite 19 second moment, say 02, and Var nYn i E nYfi i E nYfi i 2 Cn 2 Fn~ o + [f (Xn)] . So on the set {Xn + 0}, Var Yn + 0. In either case then we have Fn -1+e1~ (3.1.5) Var [(1og n) Yn] + 0 The random variables hn(Yn) and Yn are not independent, but by the Schwarz inequality it follows from (3.1.4) and (3 1.5) that 2 F (3.1.6) (ncn)2E n(un - N ) s 1'1(g)(2M)'2 . 11 Now we set 2n = ncn(Un - Nn) and suppose r e 25 1). is in R+. By (2.6 1), 2n = 0u((log n) 0n So {2: > rn} is eventually empty and 2 (3.1.7) E an 2 + 0 . {Zn 1 rn} Writing Xn+1 as (Xn - Nn) - (Un - Nn) we obtain 1 -1c-l+l/(27) = -1 -1 - Xn+1 (1 ' n anAn)Xn ' n 0n 2n ' n n gnxn' Using this, (3.1.3), (3.1.6), (3.1.7), and the measurability properties of an, A and an we obtain the desired n, result by applying Theorem (2.2), Fabian (1968) with Un in Theorem (2.2) replaced by Xn here, Pn by -l anAn, Vn by Zn, 0n by - Cn _ -1+1/(27) Cn gnx , and Tn by 20 For the case y = 1/4, by (2.8.2), (2.10.5), and (3.1.3) we have, 0 = -c"1 and T = 0. Similarly for the case when 7 = 1/6 and D3f(0) = 0. Finally, if 1/6 and D3f(0) # 0, 9 is -[(2/3)[D3f(0)]21(g)]1/6 Y: and T is -(6M)‘1[(3/2)[03f(0)11'1(g)11/3. In all cases, F = l, a = l, B = 8+ = 1 - 2y, and by (3.1.6), 2 = I‘1(g)(2M)‘2. CHAPTER FOUR THE MAIN RESULT In this chapter we state and prove the main result, a realization of the procedure given in (2.6.3). Only (2.3) and (2.4) are assumed to hold. This result is given in the following theorem. 4.1 Theorem Suppose Assumptions (2.3) and (2.4) hold with Fn as defined below. Let {kl} be an increasing sequence of positive integers such that 2/k£ + 0. Suppose {U1} and {V } are sequences of random variables such that l with Fn = F({X1,Yl,...,Yn_l} U {U,; k, < n} U {V,; kg < n}>» we have r’r kg -2 d2 d2 E U - (2d ) (f ) (X ) . 1 1 k, Fk Fk E 2(U - E 2U )2 - 0 (d'4) ‘ 2 2 u 2 ’ (4.1.1)~ Fkl 3 d, d1 d2 E v2 = <2d,) ((f > > (xk1)’ rk Fk E IL(v - E 2V )2 = 0 (d-6) k l l U l ’ 21 22 with d2 of the form (4.1.2) d, = d1' , d in R+, 0 < a < 1/6 . Then the sequence {Kn} as defined in (4.2) below con- verges to 6 and tE'Y(Xh - 6) is asymptotically normal with mean and variance as given in Theorem (3.1), (i) and (ii), where 2tn = 2n + 7 card {2; kg < n} is the number of observations needed to construct Xn. 4.2 The Procedure (i) Estimation of D2f(e): Set Uh equal to the arithmetic mean of all U2 with kl < n. Then set (4.2.1) un = (0 v Un). (ii) Estimation of D3f(8): Set vn equal to the arithmetic mean of all V2 Wlth k2 < n. (iii) Estimation of I(g): -B Let 80 > 0 and choose an 3 (log n) 0. Then this estimation is carried out precisely as it is in (I4.2.b), that is by a sequence {wh} with - _ o 2 (4.2.2) “n - f(hn) dGn-1 where Gn is the empirical distribution function of o Y1,Y2,...,Yn, and hn is defined in (14.2.3). 23 (iv) The sequence On: Set -e (4.2.3) Cn = [(3/2)v:12wI-11)1/6 v (log n) O] A (log n) if y = 1/6 and D3f(e) = 0 C , otherwise with 60 as in (2.6.1), C as in (2.8.2), vn as in (4.2.ii) and'wn as in (4.2.2). (v) The sequence Dn: Set (4 2 4) 0 = (2u w )’1 A n61 ‘ ' n n n ' (vi) The functions hn: -81 Choose a 3 n for n = 2,... Then the n 80 estimators are constructed precisely as in (I4.2.c), but for O < 81 < % - y - 261, with y (2.3) and 51 as (2.6.1). (vii) The sequence Xn: The recursion relation for (2.6.3). Proof of Theorem (4.1): We shall prove the theorem by verifying Assump- tions (2.3), (2.4), (2.6), as in Assumption Xn is given in (2.7), and (2.8). 24 First, Assumptions (2.3) and (2.4) are assumed to hold in the theorem. The measurability conditions on Cn' Dn' and hn and condition (2.6.1) are obvious from their definitions. Relation (2.6.3) holds by assumption. Thus Assumption (2.6) holds, and by Theorem (2.10), (log n)8(Xn - e) + 0, for every 8 > 0. 2 To show that un converges to D f(e), it suffices —- _ 2 to show that Un does. Let Wfi — U,L - E U2. Then W2 is an orthogonal sequence, 2 (log 2)2£-2E WE i Cld-4 2 (log 1) i=1 i=1 for a C1 in R+. So by Theorem (33.1.B.ii), Loeve, we 2 Fk2 2 X Wj + 0. Also E U2 = D f(xkl + 0%), where 2£-2+46 < +m have 1-1 |v 1 2di' 80 eventually, depending on m, we obtain F k2. using Assumption (2.3) that E U2 + D2f(e) and Un+D2fl6). The convergence of Wu to I(g) follows Q I from Theorem (2.2), Fabian (1973b). Verification of the assumptions of this theorem are given in (14.3.ii). Therefore Assumption (2.7) holds, and by our Theorem (2.11), nB(Xfi - e) + o, for every 0 < 3 < s - y - 2c1. ,Finally, (2.8.1) follows from.Extension (2.3), Fabian (1973b). Details and verification of the assump- tions of this extension are given in (14.3.iii). The 25 convergence of vn to D3f(6) follows by an argument similar to that used to show an +D2f(6). Therefore Assumption (2.8) holds, and by our Theorem (3.1), Xn has the properties asserted in Theorem (4.1) since tn/n + 1 because i/kl + 0. J 1AA .‘l.ll A REFERENCES REFERENCES ABDELHAMID, S.N. (1973). Transformation of observations in stochastic approximation. Ann. Statist. 1 1158-1174. ANBAR, D. (1973). On optimal estimation methods using stochastic approximation procedures. Ann. Statist. 1 1175-1184. BLUM, J.R. (1954). Approximation methods which converge with probability one. Ann. Math. Statist. 25 737-744. DUBINS, L.E. and FREEDMAN, D.A. (1965). A sharper form of the Borel-Cantelli lemma and the strong law. Ann. Math. Statist. 36 800-807. FABIAN, V. (1960). Stochastic approximation methods. Czechoslovak Math. J. 10(85) 123-159. FABIAN, V. (1964). A new one-dimensional stochastic approximation method for finding a local minimum of a function. Trans. Third Prague Conf. Information Theory, Statistical Decision Func- tions, Random Processes, Czechoslovak Academy of Sciences, Prague, 85-105. FABIAN, V. (1968). On asymptotic normality in stochastic approximation. Ann. Math. Statist. 39 1327- 1332. FABIAN, V. (1971). Stochastic Approximation. Optimizing Methods in Statistics (J.S. Rustagi, ed.). . 439-470. Academic Press, New York. FABIAN, V. (1973a) (Referred to in the text as I). “Asymptotically efficient stochastic approximation; the RM case. Ann. Statist. 1 486-495. FABIAN, V. (1973b). Estimation of the derivative of the logarithm of a density. Ann. Statist. 1 557-561. 26 27 KIEFER, J. and WOLFOWITZ, J. (1952). Stochastic estima- tion of the maximum of a regression function. Ann. Math. Statist. 23 462-466. LOEVE, MICHEL (1963). Probability Theory, Third Edition. D. van Nostrand Company, Inc. ROBBINS, H. and MONRO, S. (1951). A stochastic approxima- tion method. Ann. Math. Statist. 22 400-407. VENTER, J.H. (1967). An extension of the Robbins-Monro procedure. Ann. Math. Statist. 38 181-190. nrcnrcnu STATE UNIV. LIBRARIES 1|111111111111111111111111111111111 31293107864591