III II II II ‘I IIIIIII‘ II 2“; ‘III III» , I w «I» HHIII IIIII I W III I I I I I I II‘I [I II II "Io—A I‘ look) Il mac-1:. II A NEW DYNAMIC STOCHASTIC APPROXIMATION PROCEDURE Dissertation for the Degree of Ph. D. MICHIGAN STATE UNIVERSITY DAVID RUPPERT 1977 L.— IIIIIIIIIIIIIIIIIIIIIIIIIII mg; J University This is to certify that the thesis entitled A New*Dynamic Stochabtic Appnoximatéon Pnoecdune presented by David Rappeat has been accepted towards fulfillment of the requirements for “an M‘m Major professor Date W If lIQI‘I 0-7639 .« 3 00 "I 2.2:? ABSTRACT A NEW DYNAMIC STOCHASTIC APPROXIMATION PROCEDURE By David Ruppert This paper considers Robbins-Monro stochastic approximation when the regression function changes with time. At time n, one can select Xn and observe an unbiased estimator of the regression function evaluated at X". Let an be the root of the regression function at time n. Our goal is to select the sequence Xn so that Xn - 9n converges to 0. It is assumed that an = f(sn) for sn known at time n and f an unknown element of a class of functions. Under certain conditions on this class andcuithe sequence of regression functions, we obtain a random sequence Xn such that IXn - enl converges to 0 in Cesaro mean with probability 1. Under more stringent conditions, Xn - en converges to 0 with probability 1. A NEW DYNAMIC STOCHASTIC APPROXIMATION PROCEDURE By David Ruppert A DISSERTATION Submitted to Michigan State University in partial fulfillment of the requirements for the degree of DOCTOR OF PHILOSOPHY Department of Statistics and Probability 1977 L: ‘1 .0 To Patricia ii ACKNOWLEDGEMENTS I wish to thank Professor Vaclav Fabian for having suggested this area of dynamic stochastic approximation to me, for giving guidance to the research as it progressed, and finally for his valu- able criticism of earlier drafts of this thesis. Also, I want to thank Professor Jan Marik for a suggestion which led to theorem 3.ll. I believe that each member of the faculty of the Department of Statistics and Probability has helped me in some way during my four years of graduate study at Michigan State University and to them I am grateful. Mrs. Noralee Burkhardt deserves thanks for typing my thesis excellently and for her cheerfulness and the patience while correct- ing my errors. Chapter l #0)“) TABLE OF CONTENTS INTRODUCTION NOTATION AND ASSUMPTIONS GENERAL RESULTS RESTRICTION T0 H A FINITE DIMENSIONAL VECTOR SPACE BIBLIOGRAPHY iv Page 18 30 CHAPTER I INTRODUCTION This study has been motivated by practical situations in which a process is controlled by a variable X and it is desirable to choose X in such a manner that the response, Rn(X), at time n is close to 0. If en satisfies Rn(en) = 0, it would be enough to choose Xn’ the value of X at time n, equal or close to en. The basic information is provided by the process itself; for any choice of Xn we can obtain an unbiased estimate of Rn(xn). If Rn’ or at least an, is independent of n and some regularity conditions are satisfied, then the stochastic approxima- tion procedure of Robbins and Monro (1951) provides a method of selecting a sequence {Xn} such that Xn + 61 almost surely. We are concerned here with situations where 6n does change with n. Dupac (1965, l966) and Uosaki (I974) studied such situa- tions, but their model is substantially different from ours; both models shall be compared later. In our model we assume that 9n = f(sn) for an f in a family 8 of functions on a set S and for a sequence {Sn} in S. Initially, only 3 is known, not f and not {Sn}° At time n, the value 5 becomes known, and, after Xn is selected, an unbiased n estimate of Rn(Xn) is observed. The interpretation is that sn summarizes the knowledge about the process at time n. For example, in the case of a process in- volving a chemical reactor, 5 can describe the age of the filter, n the quality of the catalyzer, and the impurities of the input. In another example we may have 5n = n and then the assumption concern- ing an means simply that the function n~+en is in 3. We propose an approximation method, for which Xn - on approaches 0 in a certain sense, for some families 3. For example, 3 can be the family of all functions f on [0,1] such that, for some K and a > %, depending on f, |f(x) - f(y)l .<._. le - yl“ for all x,y in [0,1] (cf. Theorem 3.9). Another example, admittedly simpler, yet of considerable practical importance, is the case when S is the family of all 'linear combinations of k functions f],f2,...,fk. Both these examples are special cases of the more general con- dition (see assumption 2.3) that there exists an inner product space H and a function U on the set S into H such that S c {f8; 8 E H} where fB denotes the function defined on S and assigning to each s in S the value <8, U(s)>. Under certain additional regularity conditions we shall show that the proposed approximation procedure yields {Xn} for which X - on + O in Cesaro mean with probability one; under more stringent n conditions Xn - on + O with probability one. Dupac (1965, 1966) considered Robbins-Monro type stochastic approximation methods when the root changes during the approximation process and Uosaki (1974) generalized his work. In these papers, the basic assumption is that en+1 is equal to gn(en), with gn known, plus an unknown but small Vn' The procedure then is similar to the original Robbins-Monro procedure except that where the latter obtains the estimate Xn+1 by adjusting Xn’ the former adjusts gn+](xn) (and neglects ,Vn)' In our model, the procedure estimates the function fB by estimating B. If H is infinite dimensional the procedure allows us to keep the estimates finite-dimensional in order that the pro- cedure can be practically realizable. In addition to the above problems we also consider, in theorem 4.5, the case where U(sn) is a random variable with values in Rk. In summary we will show that under conditions similar to those used to prove the convergence of the Robbins-Monro method, Xn - 6n + O in Cesaro mean with probability one, where an is the unique root of Rn(X) = O and Xn is our estimate of on. Of practical importance are similar generalizations of the Kiefer— Holfowitz (1952) method of maximization (or minimization) of func- tions on R and Blum's (1954) multi-dimensional version of the Kiefer-Nolfowitz method. One can expect that the methods obtained by such generalizations would have a property analogous to the almost sure Cesaro mean convergence of Xn - on to O. CHAPTER 2 NOTATION AND ASSUMPTIONS 2.1 Notation. The conventions introduced here hold throughout. Let Rk be k—dimensional Euclidean space. The space R1 will be denoted simply as R. Denote the transpose of the matrix A by AT. Then k the inner product on R is defined by = xTy for x,y E Rk. If A and B are sets, then AB is the set of all functions from B to A. Let (Q,F,P). be a probability space. If F e F, then IF is the indicator of F. If V is a normed vector space, then let !_ be the smallest o-algebra containing all open balls, that is all sets of the form {X e V: “X + a“ < e} for e > O and a e V . A map T in V9 is called a measurable transformation into V if T'](V)<: F. If T1,...,T are measurable transformations into V. n then o{T],...,Tn} is the smallest o-algebra on 9 containing -1 1 1,. (1) . "C: i All relations between measurable transformations are meant to hold with probability one. If hn is a sequence of numbers, then 0(hn) denotes a sequence gn of numbers such that for some K |h']g | < K for all n n n —- ' 2.2 Assumption. (i) Let S be a set and suppose Sc: RS. Suppose f 6 3. (ii) Let Rn 6 RR, en 6 R, and A > 0. Suppose (I) (X - 6n)Rn(X) : 0 and (2) IRn(X)I :AHX-enl +1) for all X E R. Let sn 6 S and suppose (3) e = f(s ). 2.3 Assumption. (i) Assumption 2.2(i) holds. Let H be a real vector space and suppose <-,-> is an inner product on H, i.e. <-,-> is a map from H x H to R such that if x,y,z e H and a E R then = a + , , O, | v and = 0 implies x = O . ~ 1’ . . For x E H define “x“ = 2. Suppose there is a function U in S such that for each f in 3 there exists a B in H satisfying H f(s) = <8, U(s)> for all s 6 S. (ii) Assumption 2.2(ii) holds. Let Un = U(sn). 2.4 Remark. He shall now consider the problem of estimating the sequence {en}. The experimenter knows H, <-,->, and U and he knows that assumption 2.3 holds. At time n he estimates 8 by an estimate 8". Also at this time he learns the value of sn and therefore of U"; he uses U to estimate 6n by Xn = <8", Un>. n He can also observe a random variable Yn’ an unbiased (conditionally, given the past) estimator of Rn(Xn). He then forms his next estimate * = B - a Y U B n nnn n+1 * with an a suitably chosen non-negative number and Un either equal to Un or a suitable approximation to Un' For example, if H = 12, then the experimenter may wish to use a finite dimensional approximation, u;, to a Un in 22. We shall reformulate the construction of the 8n in the follow- ing assumption, where Fn is the o-algebra associated with the "past" at time n. 2.5 Assumption. (i) Assumption 2.3 holds. (ii) Let Fn be an increasing sequence of o-algebras contained in F. * Suppose {8n}, {Yn}, and {Un} are sequences of measurable transformations into H, random variables, and elements of H, respectively, such that with Xn = * (1) 8n+1 = 8n - anYnUn for some an.: O, 0(B],...,Bn)<: F”, and Fn Fn 2 2 (2) E Yn = Rn(xn) and E (Yn - Rn(Xn)) :_o for some 0 . (iii) Assume that * (3) <8", Un - Un> = 0 . 2.6 Remark. Suppose we wish to choose u: not equal to Un' Then (2.5.3) will still hold if for an increasing sequence of subspaces, * {Hn}, 8] e H] and Un is the projection of Un onto H for n+l’ then by (2.5.1) 8n E Hn for all n. CHAPTER 3 GENERAL RESULTS We will be interested in the convergence to O of the sequences {“8n - 8H} and {Xn - 0"} defined in assumptions 2.2, 2.3, and 2.5. These two sequences are closely connected for under these assumptions Xn - on = <8n - B, Un>. Forpractical purposes {Xn - en} is of primary importance since Xn would be the value of the control variable at time n while an would be a value of the control vari- able which for our intentions would be optimal at time n. 3.1 Lemma. Suppose assumption 2.5 holds and (1) 2 an(1 + nunn)|| < a and (2) z afiuu:“2(1 + “UnU2) < m . Then, (3) ( th - 8H2 has a finite limit and (4) 2 aan(Xn)(Xn - en) < w , Proof: By (2.5.l) and (2.5.2) F (5) E n B - 8H2 < I8 - BIZ - 2a R (x ) H ._ I n l n n n n . n n+I 2 * 2 2 2 + anUUnU (Rn(Xn) + o ) . Now by (2.2.2), <6) IRn(XnII:A(|l + 1) :AIIIBn - en nun“ + I). Also by (2.5.3) II A X * «8n - B,Un> Therefore (7) Rn(Xn) ) |v Rn(xn)(xn ' en * 2 - |<8,Un - Un>|A((HBn - 8H + 1)HUnH + 1). Substituting (6) and (7) into (5), one obtains F n 2 2 - (8) E hen.] - 8H .1 H8" - 8h <1 + tn) - ZaanIXnIIXn ' 9n) + 9n 2. 2 * 2 * where fn = 0(anhUn“ nun“ + an||“Un“) and n By (1) and (2), z fn + 9n < m. Thus (3) and (4) hold by 2 * * .. 9n = 0(anHUn“ + anl|(1 + “Unn)) and fn,gn 3_O. Theorem 1 of Robbins and Siegmund (1971). 3.2 Remark. Condition (3.1.1) involves 3 which, of course, is unknown. However 8 depends on f and f is known to be in the class 3. Thus it may be possible to verify condition (3.1.1) by using properties of S. IO 3.3 Theorem. Let assumption 2.5 hold. Let a, y, and e be numbers satisfying and a + e > I. Suppose for a > O, ‘0 an = an , * Y hUnH + HUnH = 0(n ). and g * - (I + Hunn>l| = o 0 (9) inf inf IRn(X)| > 0 n n§IX-6n| or if Y = 0 and for all n > 0 (10) inf inf _] an(X)| > o n njlx-enljn then -1 n n 2 IX - O I + 0. k=l k k Proof: First, (3.1.1) holds since an(] + “UnH)Il = 0(n-(a+€)) and a + e > 1. Next (3.1.2) holds for ll afiHUZHZII + nunnz) = 0(n"2“‘4YI) and 2a - 4y > 1. Thus by lemma 3.1, 2 aan(Xn)(Xn - on) < m and lim “an - 8“ exists and is finite. From now until the end of the proof we look at an m for which the two properties hold and write 5 instead of E(w) for any random variable a. For every n > 0 there is a 6(n) > 0 such that (11) an - 6n] 3_n implies IRn(Xn)| > 6(n). This follows directly from (9); if (10) holds and y = 0 then IXn - en] 5 “on“ “B" - B“ is a bounded sequence and (11) holds again. Let n > 0, set In = 1 if |Xn - onl 3_n and 0 otherwise. Then the finiteness of z a R (X )(X n n n n - en) and (11) imply n n ' enl < m’ By Kronecker's lemma (see Loéve (1963), p. 238) n '0 n 2 IX - a II + O. k=l k k k Since oi: l, o -1 n . -1 n 11m sup n 2 IX - e §_n + 11m Sup n z Ix - 9 II = n n+w k=l n n n+m kz] n n " for all n > O. n 3.4 Remarks. The conclusion n"1 Z le - ekl + O is of practical 1 n l importance since if n' z IXk - ekl is small then the process would I have run at near optimal conditions for most of the first n runs. 12 Without additional assumptions, the conclusion of theorem 3.3 cannot be strengthened to (Xn - 6") + O, as can be seen in example 4.7 below. Moreover, example 4.8 below shows that under the hypo- theses of theorem 3.3 (Bn - 8) + 0 may fail even if (Xn - on) + 0. Since u: is intended to be an approximation to Un we can ' * expect that “Unu = 0(nUn“) and in that case the condition * Y m nun“ + nun“ = on > would be known to hold with y = 0 if U is bounded. If U is unbounded then it might be difficult to verify that (1) holds; how- ever, the theorem has been formulated to allow y.> O. 3.5 Assumption. Let D be a countable set. Define the real vector space as and the inner product <-,->D on 23 by £6 = {g 6 RD: 2 92(d) < 00} den and D = 2 g(d)h(d) for g,h E is . deD 2 . . _ B For f E to define “fHD — D. Let {0”} be a sequence of finite subsets of D with Suppose assumption 2.2(i) holds. Let U be a map from S 2 1 _ . * to 20, et Un — U(sn), and define Un by * Un(d) Un(d) if d e on+1 o if d e on+]. 13 Suppose assumption 2.5(ii) holds with an = an'“ for some a > 0 and a > %. Suppose 81(d) = 0 if d i 0]. 3.6 Remark. If assumption 3.5 holds, then it can be easily shown by induction (see Remark 2.6) that assumption 2.5(iii) holds. 3.7 Assumption. Assumption 3.5 holds with the following choices of S, S, D, Dn’ and U. S = [0,1]. _ [0,11, 3 - {f e R . for some K > O and y > %, |f(x) - f(y)| §_K|x - y|Y whenever x,y E [0,1]}. 0 = {(k,m): m = k = 0 or m and k are integers satisfying mic and lik12m}. 0n = {(k,m): (k,m) e D and 2m.: n}. U(x)(k,m) = 1 if (k,m) = (0,0). For (k,m) f (0,0), U(x)(k,m) = hum" if x e ("—,},l, “"21 m 2 2 -1 . k-» k = -(m+l) 1f’ x 6 (——3,-—0 2m 2'1n =0 if xe("—‘-‘—.“—) with 2fk and 1<2<2‘“. 2m 2'" ‘ ” As a function of x, U(x)(k,m) is continuous at 0 and l and at points of discontinuity it equals the arithmetic mean of its left and right limits. 14 3.8 Remark. Note that U(-)(k,m) is a multiple of the Haar func- tion with indices k and m as defined by Alexits (1961), page 46. 3.9 Theorem. Let assumption 3.7 hold. Then, 1 n n- 2 IX - 0 I + 0 . k=l k k Proof: We need only show that the hypotheses of theorem 3.3 hold. First we will show that assumption 2.3 holds with H = 2 D 2 0' Let B E R be defined by B(k,m) = 2'"(m+1)2 I; f(x)U(x)(k,m)dx for (k,m) E D. By the definition of S we can and shall choose a g > a such that |f(x) - f(y)| 5_K|x - ng for some K and all x,y 6 [0,1]. Then, -(m+l) _ |B(k.m)| = (m + 1)2m1r§ f(z mD for x 6 [0,1] by Alexits(1961), theorem 1.6.2. Thus assumption 2.3 holds; therefore assumption 2.5 ho1ds. 15 2m For xe [0.11, 1: (U(x)(k,m))2_<_ (m+1)‘2. Thus k=l 8 m'2 < m and therefore “UnUD + “u:“0 = 0(1). ll [‘1 sun 11161135. 1 + x m 1 Finally <3. un - ”h’o = 0(n'E) by Alexits (1961), 4.6.1 Therefore the hypotheses of theorem 3.3 are satisfied with y = 0 and e = E. 3.10 Assumption. Assumption 3.5 holds with the following choices of S, S, 0, Dn’ and U. S = [0, 2n], K = {g E REO’ZNJ: g is Lebesgue measurable, f8"g(u)du = 0 and x§"(g(u112du < m}. and S = {h e REO’ZNJ: h(x) = f3h'(u)du + c where c E R and h' e K}. D={(l,0)}U{(i,k): i=l,2 and kill, 0n = {(i,k) 6 D: k §_n} for n 3_0, and U(x)(i,k) = 1 k = O = k.1 cos kx i = l and k 3_l =k" sin kx i=2 and kil. 3.11 Theorem. Let assumption 3.10 hold. Then n 2 IX - e I + 0 . k=l k k 16 Proof: Define B 6 RD by 8(l.0) =-%; 13" f(x)dx and _ k 2n B(l,k) - T'fo cos (kx)f(x)dx B(2,k) = g-rg“ Sln (kx)f(x)dx By the definition of S, f is the indefinite integral of Zn f' and f' E K. Since f0 f'(x)dx = O, f(0) = f(2n). Thus integration by parts shows that for k :_1, II # 8(i,k) - - [8" sin (kx)f‘(x)dx if i =I—a f2" 0 cos (kx)f'(x)dx if i = 2 . :1_: Since f8“(f'(x))2dx < w, B € 23 by the Bessel inequality. Since f is an indefinite integral, it is continuous and of bounded variation. Therefore f(x) = <8, U(x)>D by, e.g., Akhieser (1956), section III, 53. Then assumption 2.3 holds and therefore assumption 2.5 holds. Note that sup |IU(x)IID < m. X€[0,2n] Finally * 2 m 2 2 IIU - U [I = >3 (U (NO) + (U (2,k)) n n D k=m+l k k 5_2 2 k'2 = 0(r:x’2dx) = 0(n' ), k=n+l 17 so by the Cauchy-Schwarz inequality I<8, Un - UN>D| = 0(n'g). Therefore the hypotheses of theorem 3.3 are satisfied with y = 0 and e = 15. f CHAPTER 4 RESTRICTION T0 H A FINITE DIMENSIONAL VECTOR SPACE 4.1. Foreword. If assumption 2.2 holds with 8 equal to the vector space spanned by k functions f],...,fk, then assumption 2.3 holds k with H = R and U the map 13(5) 5 + . fk(SI In this case,since inner products in Rk are easily computed,it is reasonable to suppose that in assumption 2.4 u: and Un have been * chosen so Un = U (see Remark 2.4). n Suppose assumption 2.5 or assumption 4.4, the analogue of assumption 2.5 when Un is random, hold. AS will be seen, U8n+l - 8n“ converges to zero. Therefore it is possible to find con- ditions on Un so that, roughly speaking, 8n - 8 cannot be almost perpendicular to Un too Often and under these conditions, “8" - B“ + 0' k _ * and Un — U . 4.2. Theorem. Let assumption 2.5 hold with H = R n Fix p.: k and let N" be the k x p matrix whose ith column is U for i = l,...,p. Let an and on be the minimum n+i-l ,max T and maximum eigenvalues, respectively, of wnwn. ,min Suppose the sequence {a } satisfies n 18 19 2 2 2 (1) 2 an “Uh“ (l + “Uh“ ) < w and (2) z ( min aj) = w k=l nKEJEFk+p-1 for a sequence of integers {nk} such that (3) nk+1 3_nk + p for all k and for some 6, A > 0. (4) 6 :,6 < A for all k. . < 6 Suppose that for all e > 0, (5) inf( _1 inf IRn(x)I) > 0 . n e >Ix-OnI>e Then .“Bn - B“ + 0. If in addition sgp “Unu < m then Xn - on + 0. Proof: All the assumptions Of lemma 3.1 hold SO (3.1.3) and (3.1.4) hold. By (1) E(X (anIIUnII(Yn - Rn(Xn))2))_<_ 02 z afinunnz < .. . Thus (6) annunmvn - Rn(Xn)) + 0 . Also by (1) m annunno +11un1n + o. 20 * Then since (2.5.1) holds with Un = Un (8) New - Bull : anIIUnlUAU + I18n - 811111.11) + IYn - Rn(xn)l). By (3-1-3), (6). (7), and (8) ,hB - 8"“ + 0. n+1 For any k.: 1 define A = {16‘ k < limIIBn - BII < k} n {IIB n+1 ' 8n“ + 0}. Since except for a set of probability 0 {11m ”an - on > 0} = A "c: 8 kl k’ to prove IIBn - B“ + 0 we need only show that for any k, P(Ak) = 0. We now fix k and fix w 6 Ak' Until the end of the proof we write g instead of g(w) for any random variable 5. Now choose L1 such that 118,, - 8“ > k" whenever I > L]. 2 _. Then for all 2 3_L], p-l Z () = “w (B - BIII i=0 "2+1 "2 "I "I _ T _T _ - (8n - 8) Mn N" (8" B) R. R. 9. 2. 2 -2 2.6 -IIB -BII:6k- n£,m1n n2 Here we used (4) and the result that if A is a positive definite T k x k matrix with minimum eigenvalue, A, then x Ax 3_X-"x“2 for 21 k all x e R (see Rao (1973) (lf.2.l)). Thus there exists a sequence {mx} such that mg is in the set {n£, n£+l,...,n£+p-l} and -2 B - B>)2,:-§5- 9 )2 < “NT x“2 < AIIXII2 for x 6 RK. Since “B - 8 2 - n£ - m B 1 mil 1 2 ‘IB. - e. ‘I, ._ L 1 1-1 1—n£.l n +p-l (10) ()2 < AI 12 H3 ‘ 3 U2) m1 m2 "2 "' i=n£+l 1 1'] Since IIBn+1 - 8n“ + 0 we have by (10) that for a number L2 (11) ()2 < §£:E- hene e 2 > L m ’ m n —- 4p w V r -— 2' IL 9. 2. By (9) and (11), if we let L = max{L],L2} then IX -el=lI m2 m2 m2 m2 ._: l| - I| me "2 m2 me "I > k-1 é- whenever I > L _2 p _° Also IX - e I = I<é - B , U >I 5_A sup{“e - 8“} < m. mfi mg mg m2 m2 n n Thus by (5) there exists r > 0 such that R (X )(X - e mI m2 m2 m2 ) 3_r whenever R 3_L. 22 Then Since "MB aan(xn)(xn - en) 1 n 1 oo 2 a R (X )(X - 9 ) I=L me me m2 m2 m2 3_ 2 (min a.)r I=L n£5j§n£+p-l J u 8 it follows from (3.1.4) that P(Ak) 0. Finally Since Xn - 6n = <8“ -8],Un>, Xn - 9n + 0 1f supHUh“ < m. 4.3. Remark. Until now we have assumed that sn is a fixed element of S. Assumption 4.4 is an analogue Of assumption 2.5 when sn 6 SS2 and on is a random variable. At time n, the expected Output of the process, given the past, depends on both Xn and on. When on was non-random we wrote the expected output as Rn(Xn); the dependence Of the output on on is implicit in this expression. When an is random it is more convenient to denote the expected Output as Rn(Xn,en) where Rn is a mapping of R2 into R. 4.4. Assumption. Assumption 2.2(i) holds. Let Rn be a Borel map 2 from R to R such that (x - y)Rn(x,y) 3_0 for all x,y 6 R. IRn(x.y)I :.A(1 + Ix - yI) for A > 0. Let sn E S9 and define 23 on = f(sn). Suppose assumption (2.3)(i) holds with H = Rk Un is a measurable transformation into Rk. and with Un = U(sn), k Let {8“} and {Yn} be random sequences in R and R, respectively, such that with F = o{B],...,Bn, U],...,U } n n we have 8n+1 = 8n - anYnUn for some an :_0, Fn E Yn = Rn(xn,en), and F n 2 E (Yn - Rn(Xn,en)) §_o < m. 4.5. Theorem. Let assumption 4.4 hold. Define * Fn = O{B1’°°"Bn’ U1""’Un-1 . Suppose r, K > 0. Assume * . Fn -1 Wk P (FIIXII : l| 5. I" . IIXII) : 1‘ xeR and * F E "(nunifu + 1111,1121) < K. If (l) 2 an = m and z a < m and for all e > 0 (2) inf inf IRn(x,y)| >‘0, n e-]>IX-yI>e then IIBn - on + 0. If an = an'a with a > 0 and k < a < l, 2 EIIBIII < °°2 and for some c > 0 (3) IRn(x,y)| 3_c|x - yI for all x,y E R, then 2 sup nO‘IIBn - 8“ < w and sup n“ EIXn - On < m. n Proof: First, * 'k 'k F F F n 2 _ , 2 n 2 n 2 (4) E U8n+l - 8H - hen - B“ - Zan E Yn(Xn — on) + an E (YnUUnU) . If we define h(x) for x 3_0 by h(x) = inf inf _]IRn(y,Z)I n XI‘_<_Iy-zI_<_XI‘ then, 25 'k 'k F F F E "(Yn(Xn - e )) E n((x - en)E "1") E: E (xn - on)Rn(xn,en) I * Fn -1 _>. 1118,, - 811M118n - 81111’ (r IIBn-BIBIIzrllfin-BII) n Thus, * Fn 2 (5) E (Yn(xn - en))_: r.“sn - BHn(HBn - 8“). Next, F* F* F n 2 n 2 n 2 E (vnuunh = E (1111,11 E 1,.) F* 1 E "nunuzmfiixnap + 021 and since 2 2 2 2 Rn 0 and Wlth fn’gn :_0 and fn,gn - 0(n ). Then by a lemma of Chung (see Fabian (1971), lemma 3.1) 8 sup {naEIIBn - enz} < n 1 T and . . ~ , 2 , 2 Slnce E|xn - enl 5_E(H8n - thUnH) (Ehen - B“ EhUnH ) I A W%V:K sup {nu EIX - enI} < n 8 4.7. Example. With this example we Show that the assumptions of theorem 3.3 imply neither Xn - On + 0 nor “an - B“ + 0. Let H = R2. Suppose e1 and e2 are the standard unit vectors in R2, i.e., eI = (1,0) and e; = (0,1). Suppose B is the zero vector, Rn(x) = x for all n, B] = e] and an = an'] for 0 < a < 1. Let G be a subsequence Of the integers such that 2 an < m. nEG Assume that Un is e] or e2 according as n 6 G or n ¢ G. Assume the process is deterministic, i.e. Yn = Rn(Xn). 27 For E 6 R2 let g(i) be the ith coordinate of g, i = 1,2. If n e G, then UII) = 0 and therefore (1) = (1) (l) Bn+1 8n if n i G. If n 6 G, then Yn = Xn = <8n,Un> = 83]) and Us]) = 1, SO (2) BAII = 8£1)(l - an) if n 6 G . Since 8(1) = l, we have by (l) and (2) that 1 85]) = H (1 - ak) for n > 1 . k 0 such that nEG lim ( H (1 - ak)) = d . n-wo kEG kB. However, in this case Xn - on + 0. Let H = R2. Elements of H will be represented as complex numbers. Suppose 8 = 0 and Rn(x) = l for x > 0 -l for x :_O. 28 Suppose c1 = l and n-1 1 .2 3'] cn = e 3:] for n 3.2. ~ in-1 Let an - Ie - lI and . -l . _ 1n _ -1 Un - (e l)a cn Also assume 8] = l and Yn = Rn(xn)’ Then for all n (l) 8n = cn and - ‘1 L (21 x - e = xn = -(‘ C35 " 12 . whence Xn - en + 0 but “an - B“ = l for all n. To prove the last statement, first note that . -1 _ . -1 a2 = (eln _ ])(e-1n n - l) = 2(1 - cos n'l). Next, with Re ¢ denoting the real part of the complex number ¢, = Re(cnUn) Thus if (1) holds for n = k, so does (2). Moreover (l) and (2) with n = k imply (l) for n = k+l by the following calculation: '1 _ l - cos k P Bk+1 ‘ Ck ‘ ak(Rk(-( 2 )2))Uk II 0 x. + A (D I nun-J V O $- By Observing that (1) holds for n = 1 the proof is completed. Note that by Taylor‘s theorem afi = 2(1 - cos n']) 5 n' 2 + 0(n-4) with 0 < dn < n']. It is then easy to see that the assumptions of theorem 3.3 hold if the theorem is trivially generalized by replacing the assumption an = an'a by an = cnn'a with 0 < m 5-Cn 5_M < m for some m,M. BIBLIOGRAPHY BIBLIOGRAPHY Akhiezer, N.I. (1956). Theory of Approximation. Ungar, New York. Alexits, G. (1961). Convergence Problems gf_0rthogonal Series. Pergamon Press, New York. Blum, J.R. (1954). Multidimensional stochastic approximation methods. Ann. Math. Statist. 2_5_ 737-744. Dupac, V. (1965). A dynamic stochastic approximation method. 5mm, Math. Statist. 36_ 1695-1702. Dupac, V. (1966). Stochastic approximations in the presence Of trend. Czech. Math. J, l§_(9l) 454-461. Fabian, V. (1971). Stochastic approximation. Optimizing Methods jm_Statistics (J.S. Rustagi, ed.) 439-470. Academic Press, New York. Kiefer, J. and Wolfowitz, J. (1952). Stochastic estimation Of the maximum of a regression function. Ann. Math. Statist. g§_ 462-466. Loéve, M. (1963). Probabilitylheory(3rd edition). Van Nostrand, New York. Rao, C.R. (1973). Linear Statistical Inference and ItS Applications. John Wiley, New York. Robbins, H. and Monro, S. (1951). A stochastic approximation method. Ann. Math. Statist. 22_ 400-407. Robbins, H. and Siegmund, D. (1971). A convergence theorem for non negative almost supermartingales and some applications. thimizimngethods jm_Statistics (J.S. Rustagi, ed.) 233-257. Academic Press, New York. Uosaki, K. (1974). Some generalizations Of dynamic stochastic approximation processes. Ann. Statist. 2_ 1042-1048. 30 MICHIGAN STATE UNIV. LIBRARIES IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII 31293102512823