.7 TRANSFORMATION OF OBSERVATIONS m STOCHASTIC APPROXIMATION , Thesis for the Degree of Ph. D. MICHIGAN STATE UNIVERSITY SAM! NAGUiB ABDELHAMID 1921 1|llllll’lllflllflllll Em fl _ MiCbI'335. State Unive it)! Mm‘ This is to certify that the thesis entitled TRANSFORMATION OF OBSERVATIONS IN 1 STOCHASTIC APPROXIMATION | i presented by L Sami Naguib Ab de lhamid l has been accepted towards fulfillment of the requirements for Ph.D. degreein Statistics and Probability 1 \v’doiw "filvh‘mt Major professor Date October 11, 1971 0-7639 {we 2 b 19:? 1 "3 “ H‘s ‘- d ABSTRACT TRANSFORMATION OF OBSERVATIONS IN STOCHASTIC APPROXIMATION By Sand Naguib Abdelhamid Let X1 be a random variable. We consider the following general stochastic approximation procedure: (1) xn+1 = xn - ancnlhan) , n = 1,2,... where Yn are random variables, an and cn are positive numbers, and h is a Borel measurable transformation. With the choice h = the identity, (1) includes both the Robbins-Monro (RM) pro- cedure and the Kiefer-Wolfowitz (KW) procedure. Fabian (1960, 1964) considered (1) with h = sign; we shall call (1) with h = sign procedure (F). We study the asymptotic preperties, the a.s. convergence and the asymptotic normality, of this prOposed generalized procedure under some mild requirements on h and on the random variables V :1 (Yu - Mnu'n) with Mnu’n) = E1 [Yn] where In == [X1,X2,...,Xn]. We confine our analysis to the case Share Vn are conditionally, given I“, distributed according to a distribution function G which is symmetric around 0 and admits a density g. It is shown that nB(Xn - e) is asymptotically distributed as a normal random variable g; 9 is the parameter to be approximated. The effect of using h in (l) is pointed out for the RM.and the KW situations. Sami Nagu ib Abde lhamid We consider a transformation h Optimal if it minimizes E g2 and we show, under some regularity conditions, that h is Optimal if and only if it is equal to -C(g'/g) (a.e. with respect to G) for a C > 0. With such an optimal transformation h, the surprising result, despite the very simple recurrence relation in (1), is that our optimal procedure is not only Optimal within the class of stochastic approximation procedures considered but also it is an asymptotically efficient estimator within the general class of regular unbiased estimators. We also show that the RM procedure as well as the KW procedure are Optimal if and only if the error random variables are normally distributed. As for pro- cedure (F) we show it is Optimal if and only if the error random variables have a double exponential distribution. For distributions which do not satisfy the regularity conditions, we show how one can design transformations that yield improved procedures. Our results make it possible to study the asymptotic relative efficiency (A.R.E.) for different choices of h, and in particular we show that the A.R.E. of procedure (F) relative to the optimal procedure is the same as the A.R.E. of the sequential sign test relative to the sequential probability ratio test (SPRT) (Cf- Groeneve 1d (1971)) . TRANSFORMATION OF OBSERVATIONS IN STOCHASTIC APPROXIMATION By Sand.Naguib Abdelhamid A THESIS Submitted to Michigan State University in partial fulfillment of the requirements for the degree Of DOCTOR OF PHILOSOPHY Department Of Statistics and Probability 1971 ©C0pyrlght by SAMI NAGUIB ABDELHAMID 1972 TO MY WIFE AND SONS Mona, Hishan and Tariq ii Acmomancamrs I wish to express my sincere gratitude to Professor Véclav Fabian, who introduced me to the area of stochastic approximation and guided me through this research. His careful criticism and valuable comments helped me avoid many wrong turnings. His suggestions have been the source of many of the results presented here and improved virtually all results, either in substance or style, almost beyond recognition. In addition, among many others who helped, I would like to thank Professor Dennis Gilliland for his encouragement and the time he gave for some discussions and reading the material of this re- search; Mrs. Noralee Barnes, who excellently typed both the rough and final versions of this thesis, and whose speed, accuracy, cheerfulness and patience leave me, at times, unnerved; my wife. for her great understanding, continuous encouragement, heroic patience and endurance, and the ideal care of me and my two sons while I was too busy with my studies and research. Finally, I would like to thank the people of my country, the United Arab Republic, and its Government for the financial support during the first five years of my graduate study. Also I would like to thank the Department of Statistics and Probability at Michigan State University for the generous support, financial and otherwise, that I have enjoyed over the period of my studies for the Ph.D. degree. iii TABEE OF CONTENTS Chapter Page 1 INTROWCTION AND SWRY I O O O O O O O O O O O O O O O O O O O O O O O 1 2 AIMDST SURE CONVERGENCE OF THE MODIFIED STOCHASTIC APPROXIMATION PROCEDURES ....... ...... O‘ 1 Basic Assumptions and Notations ............ 2 Remark ...................... ..... .......... 3 RobbinséMonro (RM) Situation ............... 4 Kiefer-Wolfowitz (KW) Situation ............ .5 Almost Sure (a.s.) Convergence Theorem ..... 6 7 8 9 Remrk 0....0.00.0.........OOOOOOOIOOOOOO0.0 QOQVO‘O‘O‘ Assumtion O O O O C O O O I O O O O O O O O O O O O O O I O O I I O O O O 0 Assumption 0 O I O O O C I O O I O I O O 0 O O O O O O O O O O O O D O O O O 10 Lew O O O O O O O O O O O O O C O O O O O I O C O O O O O O O O O O O I O O 0 O 10 NNNNNNNNNNNNN O 010 Lem coco...cocoo-oooooooooooooooooooooooo. 12 011 Remrk ooooooooooooooooooooooooooooooooooso. 12 012- .14 Examples .... ...... ooooooooooooooooooooooooo 13 3 ASYMPTOTIC NORMALITY OF THE MODIFIED PROCEDURES . 15 . Theorem .................................... 15 . Asymptotic Normality Theorem ............... l6 . Remark ........................... ..... ..... 19 Remark ...................... ..... .......... 20 Theorem (KW Situation) ...... ..... .......... 20 Theorem (RM Situation) ............... . ..... 21 Theorem ... ...... ... ........................ 22 Theorem (RM Situation) ..................... 24 uuwuuwwu O (”VGU‘J-‘WNH 4 SPECIAL CASES AND RESULTS ........ ........ ....... 25 Introduction ............................... 25 The a.s. Convergence and the Asymptotic Normality Of Procedure (F) ................ 25 Result (KW Situation) ..................... 26 Result (KW Situation) ..................... 26 Result (RM Situation) ..................... 26 o 0 NH ##### ## \lO‘U’lbw . Remark .................................... 27 . The Optimal Choice of (a,c) in the KW Situation ................................ 27 4.8 The Optimal Choice of a in the RM Situation ................................ 28 4.9 Remark ................................... 29 4.10 Effect of Taking m Observations at Each Stage ............................... 29 iv Chapter Page 5 OPTIMAL TRANSFORMATIONS ......................... 32 5.1 Introduction ............................... 32 5.2 Theorem .................................... 33 5.3 Remark .............................. ....... 34 5.4 Definition ................................. 34 5.5 Remark ..................................... 34 5.6 .Asymptotic Efficiency of Optimal Stochastic Approximation Procedures; the RM Situation . 35 5.7 The Straight Line Case ..................... 35 5.8 A Case of a Sequence of Functions .......... 36 5.9 Theorem .................................... 37 5.10 Theorem .................................... 37 5.11 Some Examples of New Optimal Procedures 37 §::§7Examples ................................... 38 5.16 Modified Procedures by Means of Suitable TranSformtions oooooooo00.000000000000000oo 40 5.17 A Case in which g' = 0 a.e. (G) .......... 40 5.18 A Case in which g' = constant a.e. (G) .... 41 6 ASYMPTOTIC RELATIVE EFFICIENCY OF THE MODIFIED PRmEDURES ..........OOOOOOOOOOOOOOOO0.0.0.000... 43 6.1 Definition of the Asymptotic Relative EffiCiency (A.R.E.) ......OOOOOOOOOOOOOOOOOO 43 6.2 Comparison of Some Transformations ......... 44 6.3 Remrk .0.........OCOOOOO......COOOOOOOOOOOO 46 6.4 Computations of the A.R.E. of Certain Transformations Relative to the Optimal Procedure for Some Given Known Distributions 46 REFERENCES 1.........OOOOOOOOOOOOOO......OOOOOOOOO 48 LIST OF TABLES Tab 1e Pa ge I 45 II 47 vi CHAPTER 1 INTRODUCTION AND SUMMARY Consider the stochastic approximation procedures of the form (101) x =xn -anC;IYn’ n=1,2,ooo , n+1 where X“, Yn are random variables, and an, cn are positive numbers. This includes both the Robbins-Monro (1951) procedure (RM), and the Kiefer-Wolfowitz (1952) procedure (KW). We shall study the effect of transforming the random variables Yn in (1.1) into h(Yn) by a Borel measurable transformation h. Fabian (1960 and 1964) prOposed that Yn in (1.1) be replaced by sign (Yn). He studied the almost sure convergence of that modified procedure in both the RM and the KW situations (See §2.3, §2.4), and he also discussed analoguous modifications in the multidimensional case. He indicated that those modified procedures behave better in some practical applications. Motivated by this idea can we transform Yn by means Of a Borel measurable transformation h and improve thereby the speed of convergence? The answer is yes. Therefore instead of (1.1), we shall consider the following general stochastic approximation procedure: - '1 = (102) xffil - Xn anCn ha“), [1 1,2,... , where h is a Borel measurable transformation. We shall establish the asymptotic prOperties of this prOposed generalized procedure, and then we characterize the optimal trans- formation h. With the choice h = sign we will refer to (1.2) as procedure (F). Let us denote In = [x1,x2,...,xn], Mnan) = ExnUnJ’ vn = Yn - Mn(In) . We shall confine our analysis to the case where the random vari- ables Vn are conditionally (given In) distributed according to a symmetric distribution function G admitting a density func- tion g. This requirement of symmetry is natural in the KW situation (See §2.7 below). To be more specific let f be a Borel measurable function defined on the real line. The exact analytic form of f may be unknown, but it is assumed that f belongs to a rather general family of functions. The only available information about f is that at any level x, we can Observe f(x) subject to a random error; that is, we can obtain an unbiased observation of f(x). In the RM situation, we try to approximate the unknown root of the equation f(9) = O; and in the KW situation we want to approximate a, the unknown point of minimum (or maximum) of a function f. In the RM situation (cf., e.g., Chung (1954), Hodges and Lehmann (1956),Burkholder (1956)) it is known, under some “’5 regularity conditions, that (Xn - e) is asymptotically dis- tributed as a normal random variable g with 2 2 (1.3) , E g = O, and Var g = a a , 2a f’(9) - l where 02 = Variance of G. Also in the KW situation (cf. Fabian (1968b) and also the references there) it is known, under some regularity conditions, that n1/3(Xn - e) is asymptotically dis- tributed as a normal random variable g with 2 2 -2 O a c 2 __ (2/3)a c f”’(&1 (1-4) E§-- 4af”(e) -2/3 ° 4 a f”(9) _ 2/3 , and Var g = The effect Of transforming Yn into hCYn) is that in- stead Of using estimators Yn of f(Xn), we are using estimators h(Yn) of another function f(xn). If then f has the same root as f and if the conditions guaranteeing the asymptotic normality I are preserved the effect of h is to change f’(e) into f (e) in (1.3) and 02 into another variance 32. Similarly in the KW situation instead of using estimators Yn of [f(xn + cn) - f(Xn - cn)], we are using estimators h(Yn) of f(Xn,cn), say. If then c-1f(x ,c ) has the same behavior as c'1(£(x + c ) - n n n n n n f(Xn - cn)) and if conditions guaranteeing the asymptotic normality are also preserved the effect of h is to change (f”(e).fm(9)) into (f”(e).§m(9)) in (1-4). and 02 into ~2 another variance 0 . ~2 In the RM situation E g2 will be minimized if Q-T——‘—‘§ (I (6)) is minimized within the class of all h's for which the asymptotic normality is preserved. Similarly in the KW situation E :2 will be minimized if ~” 2 is minimized within the class of all (f (9)) such h's which preserve the asymptotic normality. The derivatives of f at 9 can be easily determined and we obtain Two) = 15“) (9)1101) with (1,5) H(h) = g—tj‘h(t+v)c(dv)]t___0 for i = l in the RM- and i = 2,3 in the KW situation. We shall state conditions on h under which both the almost sure convergence to e and the asymptotic normality are preserved. Within the class Cv of such h's we consider the Optimal trans- formation which mdnimizes the second moment of the asymptotic dis- tribution of nB(Xn - 9). This will be shown to be equivalent to finding h E C! which maximizes H(h). This leads, under some regularity conditions, to h = -(g’/g) (or any positive constant multiple of -(g’/g)). The surprising fact is that with such an Optimal h, the stochastic approximation procedure is not only Optimal within the class of stoachstic approximation procedures considered but also, in the RM case, Xn is an asymptotically efficient estimator of 9 within the class of all regular unbiased estimators. Knowing the Optimal transformation, we show that the RM procedure as well as the KW procedure are Optimal if and only if the error random variables are normally distributed. As for pro- cedure (F), we show it is Optimal if and only if the error random variables have a double exponential distribution. One of the regularity conditions is 0 < I(g) = I(g’(v)/g(v))zG(dv) < m; if it is not satisfied, we show for some particular distribution (e.g., uniform and triangular) how one can design transformations which yield improved procedures. f(1)(9) = f(i)(e)H(h) with (1,5) H(h) = [Ell—t fh O , door-90; ffx'kk for an (unknown) number 9, and every natural number k. Further- more, there exist constants A, B such that (2.3.2) |f(x)\ SA‘x - 9| + B, for all x e R. (an):=1 is a positive sequence of numbers satisfying (203.3) 2 an = Q ’ z 82 < m 0 n=1 n=1 n 2 2 (2.3.4) EIhEVn] S c for a number a and every natural number n. 2.4 Kiefer-Wolfowitg (KW) Situation: We assume that f is a Borel measurable function on R satisfying (2.4.1) sup 1 '13 f(x) < o , 1 inf Q f(x) > o, "ka'k'i Ker-«k for an unknown number 9, and every natural number k; where 2 f(x), and D f(x) denote the lower and upper derivative, respectively, of f at x. Furthermore, there exist constants A, B such that (2.4.2) \£(x+1) - f(x)‘ < A‘x - 9| + B for all x e R . The relation (1.1) holds with positive an, cn satisfying 243 o m = ”2'2 . (- 0) Cn" 9 Ban “’3 Zancn <99 9 n=1 n=1 and the random variables Yn satisfy (2.4.4) Mn(1n) = f(Xn + en) - f(Xn - cm). and 2 2 (2.4.5) EIfiFvn] S O for a number a and every natural number n. Burkholder ((1956); Theorem 1) proved a convergence theorem which contains the convergence results for both the RM and the KW situations as special cases. The following is a rewriting of that theorem in terms of the procedure (1.2). 2.5 Almost Sure (a.s.) Convergence Theorem: Let 9 E R and 02 be a positive number. Let (1.2) hold 2 and Mn’ Sn be Borel measurable functions; Mn(Xn) = E1 [hCYn)], 2 ~ 2 n Sn(xn) = EIh[h(Yn) - Mh(Xn)] . Suppose that (2.5.1) if g > O, ‘x - 9‘ > e and n > n0(g) then (x - 9)Mn(x) > 0 ; m -1 ~ (2.5.2) if 0 < 61.< 52 < m then E ancn [ inf [‘Mn(x)‘]] = m; n-1 615\x-9|$62 \finbc)‘ (2.5.3) sup [ ]< co ; n,x 1 + ‘x‘ (2.5.4) sup 82(x) s 02 ; n,x n (2.5.5) 2 azc'z < m . nn Then (2.5.6) X a e . 2.6 Remark: Given h, the a.s. convergence of the procedure (1.2) holds if the conditions of the preceding theorem are satisfied irrespective of whether they are also satisfied for h equal to the identity. But since we are interested in the Optimal Choice of h it seems useful to investigate conditions on h, which guarantee that con- ditions of Theorem 2.5 are satisfied with this h if they are satisfied for h = the identity. Henceforth the following two assumptions will be assumed to hold. 2.7 Assumption: In the general procedure (1.2), let Yn = Mn(xn) + Vn where Vn are random variables conditionally (given In) dis- tributed according to a distribution function G which is symmetric around 0 and admits a density g. The functions Mn are Borel measurable. Here, we may remark that the requirement of symmetry is natural in the KW situation where Yn is an unbiased estimator of [f(xn + cn) - f(Xn - cn)]. The requirement of symmetry is then satisfied if the errors in estimating f(Xn + cn) and f(Xn - cm) are independent and identically distributed. 10 2.8 Assumption: We assume that h is an odd Borel measurable transformation defined on R and nonnegative on [0,m). In addition we assume that Y(t) = jh(t+v)g(v)dv = fh(v)g(v-t)dv exists for all t E R. 2.9 Lemma: Assume that (2.9.1) (1) 1im inf t‘lw(c) > o , tiO and either (ii) h is nondecreasing; or (iii) g is continuous and nonincreasing on [0,m), and h is bounded and continuous; furthermore h(v) >0 for all v>O. Then (2.5.1) and (2.5.2) hold for h if (2.5.1), (2.5.2), and (2.5.3) hold for the identity transformation. Proof: From (i) we obtain, for some positive numbers A and go, that (2.9.2) c’ly(c) 2 p0 for all 0 < t < A . If h is nondecreasing then so is Y and thus (2.9.2) implies that inf{Y(t), t > to} > O for every t0 > 0. Therefore (1) and (ii) imply that if 0 < T < T1 < m, then 0 (2.9.3) inf{Y(t); c e [T0,T1]} > o . 11 Now suppose (iii) holds; we shall prove that (2.9.3) holds in this case, too. Since h is odd and g is symmetric, Y(t) can be written in the form: on (2.9.4) Y(t) ==£h(v)[g(v-t) - g(v+t)]dv, t e R . The integrand is nonnegative for t 2 0, since h(v) 2 0 for v 2 O, by.Assumption 2.8, and g(v-t) - g(v+t) 2 0 for v 2 O and every t 2 0. The latter is obvious from (iii) if 0 s t s v; if 0 s v1< t then g(v-t) - g(v+t) = g(t-v) - g(v+t) which is again nonnegative by (iii). In particular Y(t) 2 O for t 2 0. But suppose Y(t) = 0 for some t > 0. Then, since h(v) > O for all v > O, g(v-t) - g(v+t) = 0 for almost all (Lebesgue) v 2 O. The function F(v) = g(v-t) - g(v+t), v 2 0, is then con- tinuous and thus identically zero. Moreover g is periodic with period 2t; but since g is nonincreasing for v > 0, then g = constant on R. This is a contradiction to the fact that g is a density function. Hence Y(t) > O for all t > 0. Further- more since h is bounded and continuous, then by the dominated convergence theorem Y is continuous on [T0,T1], 0 < T0 < T1 < m. But [T0,T1] is compact, then Y achieves its minimum on [T0,T1] and thus (2.9.3) holds. Since Y is odd, (2.9.3) implies Y(t) sign (t) 2 0. (2.5.1), (2.5.2) and (2.5.3) hold for M.n since (2.5.1) - (2.5.3) hold when h = the identity. fin(xn) = Y(Mn(Xn)) and thus (2.5.1) for Mn implies (2.5.1) for flu. Further (2.9.3) and (2.9.2) imply that for every T >,0 there is an n > 0 such that 12 (2.9.5) ‘Y(t)‘ 2 n\t\ for all |t\ < T . Suppose 0 < 511< 62. Then using (2.5.3) for Mn we obtain ‘Mh(x)‘ < T for some T > 0 and all ‘x‘ s 6 Thus 1. ‘Mn(x)‘ = ‘YCMh(x))‘ 2 nan(x)l for some n > o, and (2.5.2) for flu follows from (2.5.2) for Mn. Q.E.D. 2.10 Lemma: Assume there exist constants K1, K2 such that (2.10.1) ‘Y(t)‘ s Kl‘t‘ +1x2 for 211 t e R . Then (2.5.3) holds for fin if it does for Mn. Proof: Since Mn(x) = YCMn(X))a [‘Mn(x)‘] 5 Kl‘Mn(x)| +K2 1 + |x‘ 1 + ‘x‘ ’ thus (2.5.3) is satisfied since Mn satisfies (2.5.3). Q.E.D. 2.11 Remark: Concerning (2.5.4) we notice that it is satisfeid if h is a bounded transformation. We also add this remark: if h is bounded by a straight line, Mn is bounded, and G has a bounded second moment then it is easy to verify that (2.5.4) holds; further- more (2.5.3) also holds provided that Mn satisfies (2.5.3). In the following we shall show, under some additional regularity conditions, that (2.5.1) - (2.5.4) of Theorem 2.5 hold for some particular choices of h in (1.2). 13 2.12 Example: Let h(v) = sign (v), v E R. If g is continuous at 0, g(O) f 0, and (2.5.1) - (2.5.3) hold for M“, then (2.5.1) - (2.5.4) hold for Mn. 21:22.2: First Of all the sign function clearly satisfies Assumption 2.8. Conditions (2.5.3) and (2.5.4) follow from the boundedness Of h. Since h is nondecreasing, Lemma 2.9 will imply (2.5.1) and (2.5.2) if lim t-1Y(t) > 0. But from the continuity of g at 0 we obtaintlo - -10 t 1Me) = 2t j g(v)dv 2 2g(0) > o; g(O) # o. Q.E.D. -12 2.13 Example: Let h be a truncation function; that is for a positive number T0 let h(v) = v if ‘v‘ < T0 and \h(v)‘ = T0 if 4v‘ 2,T0. Assume that (2.5.1) - (2.5.3) hold for Mn and let 0 f g(v)dv > 0. Then (2.5.1) - (2.5.4) hold for fin. Again it is obvious that h satisfies Assumption 2.8, and since h is nondecreasing; Lemma 2.9 will imply (2.5.1) and (2.5.2) if we show lim inf t-1Y(t) > 0. But ttO To Y(t) = [T (t+v)g(v)dv + TOD - C(TO-t) - G(-TO-t)] ; 0 that is Y(t) = t[c(T0) - G(-T0)] + TO[G(TO+t) - G(TO-t)]. 14 Thus for t > 0, we obtain t'1\y(t) 2 G(T0) - euro) = 2G(T0) - 1 . Since 2G(T0) - l > 0, then lim inf t'1\y(t) 2 2G(T0) - 1 > o. th Concerning (2.5.3) and (2.5.4), they follow from the boundedness of h. Q.E.D. 2.14 Example: Let h and h' be bounded and fh'(v)g(v)dv > 0. In addition let g be continuous and nonincreasing on [0,«0 and h(v) > 0 for all v > 0. If M.n satisfies (2.5.1) - (2.5.3) then (2.5.1) - (2.5.4) hold for fin. 2:222: Since h is bounded, (2.5.3) and (2.5.4) hold. Also since (2.5.1) - (2.5.3) hold for Mn and (2.9.1)-(iii) is satisfied, Lemma 2.9 will imply (2.5.1) and (2.5.2) if we show that llm t'l‘Ht) 0. liocmmc of tho. hmmdodnmm of h', H fullnwn th that 1"(0) = j‘h'(v)g(v)dv > 0 which implies the required result. Q.E.D. CHAPTER 3 ASYMPTOTIC NORMALITY OF THE MODIFIED PROCEDURES In this chapter we shall use the following l-dimensional version of a theorem in Fabian (1968b). 3.1 Theorem: Let 3n be a non-decreasing sequence Of o-fields, 3n C 3. Suppose Un’ Vn, Tn, Fn, on are random variables, 00, P, Q E R, and F > 0. Suppose Fn, ‘n-l’ Vn_1 are Sn-measurable, C, a, B 6 R, and (3.1.1) I‘n-oI‘, en-oe,Tn«T, or 1&:[\Tn -T\]..o . 2 2 (3.1.2) Eynwn] - o, c > 1%;an - °o\ -» o , and with 2 2 (3.1.3) o =E[x V-\ l : jsr [‘Vj‘zzrja] J let either 2 (3.1.4) lim 0 r = 0 for every r > 0 , or M j. 1 n 2 (3.1.5) a = 1, 1im.;' 2 O r = 0 for every r > 0 . n—m j=1j’ Suppose 3+ = B if a = l, and 8+ = 0 if a # 1. (3.1.6) 0 0 ; = - ___3_ (3.2.2) B 1 2y and a > 2a0H(h) . Set 2 2 2 2 (3.2.3) 3 (t) = j[h(t+v) - 11m] g(v)dv, 80(h) = s (0); and assume that (3.2.4) the function 82 is bounded by a number 02 and is continuous at 0. In addition, let an, fin be Ih-measurable random variables, and with s = B/(Zy) if y # 0; s = 0 otherwise, assume that * )This will be satisfied if an satisfies (2.5.1)-(2.5.3), since (2.5.4) will be implied by (3.2.4) (cf. Theorem 2.5). 17 - 8 (3.2.5) CnIMn(xn) - an(xn - e) + gncn , (3.2.6) an "0 do and C11 --0 CO . e/z ' Then the asymptotic distribution of n (Xn - e) is normal with (3.2.7) mean = -2a eSH(h)go[2s aOHCh) - 53'1 and (3.2.8) variance = azc-283(h)[28 aoH(h) - B]-1 . Proof: The proof will be established by verifying the conditions of Theorem 3.1. We have (3.2.9) (Xn+1 - e) = (Xn - e) - anc;1Mn(Xn) + snot-112n ; where (3.2.10) Zn = -(h(Yn) - fln(xn)) Define the following Ih-measurable random variables (3.2.11) Hn = H(h) if Mn(Xn) = 0 , -1 ~ .. Mn (xn)Mn(xn) if Mn(Xn) # o, n — 1,2,... Then the term anc;1Mn(Xn) in (3.29) can be written as aancgan(Xn); and using (3.2.5) we obtain a e-gfi (X ) = a a H (X - e) + a csH g . nnnn nnnn nnnn Since anc: = a csnml-B/2 and a c-1 = a c-ln-g-Blz, we can 1'1 n rewrite (3.2.9) as 18 (3.2.12) (xn+1 - e) = (xn - e)(1 - a adHnn-l) +~s e'ln'(1+'3)/Zzn sn-l-B/Z '86 Hngn 0 Apply Theorem 3.1 with a = 1’ 3n = I'n’ Ian = a O[an’ Un = (xn - 9), Qn = a/c, vn = zn’ and T =‘8 CSHg o n n n Now cngn(Xn) a 0 by (3.2.5) since Xn a e, and this with (3.2.1) implies that Hn a H(h). Thus 3 I‘In - a a0H(h), Tn —. -a c H(h)g0, en -. e = a/c . Also from (3.2.3) and the continuity of S2 at 0 we obtain E [Z]=SZ 3. Hence the con- clusion of the theorem follows by Theorem 3.1. Q.E.D. 3.3 Remark: Because of our interest in the question of Optimality of the transformation h in the modified procedure (1.2), the pre- ceding theorem is stated in terms of the behavior of Mn’ (see conditions (3.2.5) and (3.2.6)), with sufficient conditions on h in order to guarantee the asymptotic normality of (1.2). But the theorem can also be applied to the modified procedure directly because this procedure can be written as the original procedure 20 with a change of the meaning of Yn's and with h equal to the identity transformation. 3.4 Remark: A sufficient condition for (3.2.4) is that (h(t+v) - ‘l’(t))2 are uniformly integrable for t in a neighborhood of O. 3.5 Theorem: (KW situation) Let f, Yn satisfy the conditions Of the KW situation in §2.4 with possibly the exception of (2.4.5). Let f” exist and be continuous in a neighborhood Of 9 with f”(e) = M > 0. Con- sider the modified procedure with a transformation h which is continuous a.e. with respect to G; further assume that (2.5.1) - (2.5.3) are satisfied for Mn if they are for Mn' Let Y' exiSt at 0 with Y'(0) = H(h) > 0 and (3.5.1) a = c = n 9 2. n nY elm ,y=4l,a>[8MH(h)]-1. In addition let 82(t) 2 by a constant a and continuous at 0. Then Xn a e and the % j[h(t+v) - Y(t)]2g(v)dv, t E R, be bounded asymptotic distribution of n (Xn - G) is normal with (3.5.2) mean = 0, and variance = a2c-283(h)[4M a H(h) - %]-l . Proof: It is known (cf. Burkholder (1956)) and can be easily shown that ‘Mn(Xn) = f(Xn + cn) - f(Xn - en) satisfy conditions (2.5.1) - (2.5.3) and thus by Theorem 2.5 we conclude that Km 4 9, since donditions (2.5.4) and (2.5.5) are satisfied, too. The rest of the conclusion of the theorem follows simply by verifying the 21 conditions of Theorem 3.2. The only conditions of Theorem 3.2 which are not directly assumed in our theorem.are (3.2.5), (3.2.6) and (3.2.2). Let A(x,cn) = f(x+cn) - f(x-cfi), x E R. Let I = [e - e, e +‘e] be an interval on which f” exists and is con- t inuous . Define (3.5.3) cp(x) = (x-e)'lt' (x) - f”(e) for x e 1 with (9(9) = o, 0 otherwise. Then by eXpanding A(x,cn) as a function of cn and substituting for f’(x) from (3.5.3), it follows that (3.5.4) c;1A(x,cn) = 2(x-9)[f”(9) + (900] + 'n(x,cn)cn where (p(x) -0 as x-oe and Tl(x,cn) -+0 if cn-oo and x a 6. Obviously p and fl(-,cn), the latter as defined by (3.5.4), are Borel measurable functions. Then cnan(xn) O’n(xn - 9) + gncn with In-measurable an 2[f”(9) + qun)] and flu = “(xn’cn); further an a 2f”(e) = 2M and 5n a 0. Thus (3.2.5) and (3.2.6) hold with do = 2M, Q0 = 0 and s = 1, since 5 = %. Condition (3.2.2) then follows from (3.5.1). Hence conditions of Theorem 3.2 are satisfied. This completes the proof. Q.EJD. 3.6 Theorem: (KW situation) In Theorem 3.5 let (3.5.1) be replaced by a __c__ __1. -1 (3.6.1) an = n’ c — nY , y - 6 and a > [6M H(h)] . 22 Moreover let f” exist and be continuous in a neighborhood Of 9. Then Xn a 9 and the asymptotic distribution of n1/3(Xn - e) is normal with (3.6.2) mean = -(2/3)ac2H(h)f”’(e)[4M a H(h) - g-j'l, and variance = a2c-233(h)[4M a H(h) - $3-1 . 2.2M: It is again easy to conclude, by Theorem 2.5, that Xn a 9, and then it remains to verify (3.2.5), (3.2.6) and (3.2.2) of Theorem 3.2 in order to complete the proof Of the theorem. Let A(x,cn) = f(x+cn) - f(x-cn), x E R. Let I = [e - e. e + e] be an interval on which f” exists and is continuous. Similarly as in the proof of the preceding theorem and with the same p, we obtain that - 2 (3.6.3) enanOin) - sum“ - e) + gnen II .. 1 I” . with Ih-measurable ah = 2[f (e) + h(xn)] and g“ - §{f (9) + H(chn)], further an « 2f”(9) = 2M and £0 = %’fm(e). Thus (3.2.5) and (3.2.6) hold with so = 2M, Co = % r”'(e), B =% and s = 2. (Ion- dition (3.2.2) follows from (3.6.1). Hence conditions of Theorem 3.2 are satisfied. This completes the proof. Q.E.D. The following theorem is presented here to cover a case treated in Albert and Gardner (1967) and the results below will be used later in Chapter 5 of this paper. 3.7 Theorem: Let fn be a sequence of Borel measurable functions defined on R such that (2.5.1) - (2.5.3) are satisfied for Mn = fn with 23 en = l and fn(e) = 0. Let d > 0 and (3.7.1) Dn(x) = (x-e)'1tn(x) if x # e, d if x = e; n = 1,2,... be continuously convergent at e to d. Consider the modified procedure (1.2) with a transformation h which is continuous a.e. with respect to G, and let h be such that (2.5.1) - (2.5.3) are satisfied for Mn if they are for Mn. Also let Y' exist at 0, Y'(0) = H(h) > 0 and (3.7.2) a = -’, a > [2d H(h)]‘l. n I[h(t+v) - Y(t)]2g(v)dv, t E R, be bounded In addition let 82(t) 2 by a constant a and continuous at 0. Then Xn a e and the ’5 (X - e) is normal with asymptotic distribution of n n (3.7.3) mean = 0 and variance = a233(h)[2ad H(h) - 1]-1 . 2:222: Under the given conditions we obtain, by applying Theorem 2.5, that Xn d e. To obtain the rest of the conclusion of the theorem, we apply Theorem 3.2 for which we need only to verify (3.2.5), (3.2.6) and (3.2.2). We have Mam“) = fnmn) = Dn (xn - 9) Since v = 0, then B = l and s = 0. Thus (3.2.5) and (3.2.6) hold with Ih-measurable an = Dn(Xn) and gn = g0 = 0; further d-d 11‘” an a 0 - , since ( n)n=1 to d. Condition (3.2.2) follows from (3.7.2). Hence the is continuously convergent at e 24 conclusion follows by Theorem 3.2. Q.E.D. 3.8 Theorem: (RM situation) Let f, Yn satisfy the conditions of the RM.situation in §2.3 with possibly the exception of (2.3.4); further let f’ exist at 6 with f’(e) > 0. Consider the modified procedure (1.2) with a transformation h which is continuous a.e. with respect to G and for which (2.5.1) - (2.5.3) are satisfied for Mn if they are for Mn' Also let Y' exist at 0, Y'(0) = H(h) > 0 and (3.8.1) c = 1, a n n =§ , a > [212' (emmn'1 . 2 In addition let S (t) = ([h(t+v) - Y(t)]2g(v)dv, t E R, be bounded by a constant 02 and continuous at 0. Then Xu 4 9 and the asymptotic distribution of n%(Xn - e) is normal with (3.8.2) mean - 0 and variance = a233(h)[2a f’(e)H(h) - 1]-1 . grate: It is known (cf. Burkholder (1956)) that Mn satisfies (2.5.1) - (2.5.3). In Theorem 3.7 let fn = f, then it follows that conditions of Theorem 3.7 are satisfied with d = f’(e) > 0. Hence the conclusion of the theorem follows by Theorem 3.7. Q.E.D. CHAPTER 4 SPECIAL CASES AND RESULTS 4.1 Introduction: In this chapter we show that the a.s. convergence (cf. Fabian (1960)) and the asymptotic normality of procedure (F) follow as a special case of the corresponding results for the modified procedure. Then we give the Optimal choice of a and c for the modified procedure in both the KW and the RM situation. Also in this chapter we point out the effect of taking more observations at each stage. Throughout the rest of this paper, we write N(p1,p§) to denote a normal random variable with mean = p1 and variance = pg, :4 and we also write Tn g if Tn is .asymptotically g-distributed. 4.2 The a.s. convergence and the asymptotic normality 9£_procedure (F): In §4.3, §4.4 and §4.5 we consider the modified procedure (1.2) with the choice h = the sign transformation, and we assume that g is continuous at 0 and g(O) i 0. Hence H(h) = 2g(0) > 0; furthermore, the sign function is continuous except at 0 and 32(t) = l - Y2(t) < m for all t E R, Since Y is continuous at 0, then so is 82 with 83(h) = 1. ‘Moreover if (2.5.1) - (2.5.3) are satisfied for Mn’ then (see Example 2.12) fin satisfy 25 26 (2.5.1) - (2.5.4). Thus the statements in Results 4.3, 4.4, and 4.5 follow from Theorems 3.5, 3.6, and 3.8, respectively. 4.3 Result: (KW situation) Let f, Yn satisfy the conditions of the KW situation in §2.4 with possibly the exception of (2.4.5). Let f“ exist and be continuous in a neighborhood of 9 with f”(e) = M >.0. Let (43-1) 8,, = in en = 3;. v = 1/4, and a > [16M g(on'l, n Then Xu 4 e and t .2 2 -2 -1 (4.3.2) n (xn - e) 4 N(0, a c [8Ma g(O) - g] ), 4.4 Result: (KW situation) In Result 4.3 let (4.3.1) be replaced by a c _ -1 (4.4.1) a = :3 c = -;, v - 1/6 and a > [12M g(0)] . n In addition let f” exist and be continuous in a neighborhood of 6. Then Xn a e and 2 I” 2 -2 (4.4.2) 1/3(X _ 91:15 Sit/3%? C 2(0)f (6) a c n n a N(- 8M8 8(0) - 2/3 ’ 8Ma g(O) - 2/3 )° 4.5 Result: (RM situation) Let f, Yn satisfy the conditions Of the RM situation in §2.3 with possibly the exception of (2.3.4). In addition let f’ exist at e, f’(e) > 0 and (4.5.1) cn = 1, an = é, a > [4f'(e)g(o)]'1 . :3 27 Then X a e and n (4.5.2) nk n - 9) 211(0, a2[4f’ (e)a g(0) - 11'1) . 4.6 Remark: With the choice h = the identity transformation, the reader may easily check that the a.s. convergence and the asymptotic normality of the original RM and KW procedures follow as special cases of the corresponding results for the modified procedure pro- vided Assumption 2.7 holds and Iv2g(v)dV«< m. 4.7 The optimal choice pf (a,c) i the Kfl_situation: Let g be a normal random variable with mean and variance given by (3.6.2). Then 8282 (h) -2 (4.7.1) E «La/9)€12 H sz’fl‘e) e4 + (4Ma H(h) - 2/3)2 (4Ma H(h) - 2/3) We shall find the Optimal values Of (a,c), i.e. values for which E g2 is minimized with other quantities being fixed. We assume that fm(e) # 0. TO find the value c(a) of c > 0 that minimizes the R.H.S. of (4.7.1) for a fixed a > [6M H(h)]-l, let 8283(h) ‘4Ma H(h) - 2/3 ’ B = (4/9)a2u2(h)t”'2(a) (4Ma H(h) - 2/3)2 Y(c) = A c"2 +-B c4 . A = and Differentiating Y with reSpect to c we get Y'(c) = -2A c-3 +’4B c3 . Thus [c(a)]6 = gg'; that is 28 g_(4Ma H(h) - 2/3) (4.7.2) [c(a)]6 = 2 2 H (h>f” (e) 2 So(h) . on With c = c(a), (4.7.1) becomes 2 1/3 a [33(h)]2/3 (4M8 H(h) - 2/3) (4.7.3) E t2 = [3H2f’”21 473 - Now the R.H.S. is minimized by the choice 1 (4.7.5) a = 2M_H(h) , which can be easily verified. Therefore the Optimal choice of (a,c) is given by 1 (4-7-6) (me) = (m)- . [g-f”’2(e)[S(2,(h)/H2(h)]]1/6)- With this Optimal choice of (a,c), (4.7.1) becomes 1/3 (4.7.7) E g2 = (3/16M2)[(9/4)fm2(9)] [SS(h)/H2(h)]2/3 . 4.8 The optimal choice pf a i the RM.situation: Let g be a normal random variable with mean and variance given by (3.8.2), then 2 2 2 _ a 30(h) (4.8.1) E g 2a f’(e)H(h) - 1 * The optimal choice of a, say a , is the value of a which minimizes E g2 with other quantities fixed. Thus one can easily check that * 1 (4.8.2) a = fr(6)H(h)- * With a = a , (4.8.1) becomes 29 (4.8.3) E g2 = S§(h)/(f’2(e)H2(h)) . 4.9 Remark: We notice that the unpleasant feature of the optimal values of a and c is that they depend on values, (f’(9),f”(e), fm(e)), which are, in general, unknown. But the value of a in the RM situation and the value of (a,c) in the KW situation can be estimated during the approximation process and fed back into the procedure. For the original RM procedure, Venter (1967) used a pro- cedure, (later generalized by Fabian (1968b)), which estimates the Optimal value of a. Recently in Fabian (1971) a procedure was described which itself estimates the optimal value of a for a modified version Of the original KW procedure. The same ideas can be used to obtain a procedure which estimates the Optimal value of a or both (a,c) for the modified procedure. 4.10 Effect p§_taking m Observations §£_each stage: Suppose that an experimenter observes m random variables Y Y instead of one, Yn’ at stage n such that these n,1’°"’ n,m m random variables are conditionally, given In, independently distributed according to G. Suppose he then uses pl" glhan’j) instead of h(Yn) in the modified procedure (1.2). Tge condi- tional expectation of the average will be the same as that Of h(Yn) and the conditional variance will only be changed by a factor of (l/uo and it is easy to see, under the conditions of Theorem 3.2, that this will result in changing the variance of the 30 asymptotic distribution by the factor (l/m). Thus, in the KW situation, under the conditions of Theorem 3.6 we obtain 2 2 2 2 a c 30(1)) 1/3 at 12/3)a c #1911191) (4'10” “ (xn ' 9) " N" 411a H(h) - 2/3 ’ m(4Ma H(h) - 2/3)) ° 0n the other hand if the experimenter wishes to continue the modified procedure for nm stages rather than using averages, then by Theorem 3.6 we have 1/3 ,5 (2/32a c2f’”(§)HQ!) “2 c2320”) (4'10'2) (mu) (xnm ' 9) I N(' 4Ma H(h) - 2/3 ’ 4Ma H(h) - 2/3 ) ° Let fm(9) # 0. Then in (4.10.2) the optimal choice of (a,c) (see (4.7.6)), is given by 82 0(h) (4.10.3) 0 we must have K > 0 and to satisfy 83(h) = l we have to have K = %5 This completes the proof. Q.E.D. 5.3 Remark: A sufficient condition for H(h) to be equal to fh(v)(-g'(v))dv is that for a certain neighborhood, N, of 0 the family {h(x) gsgigéifileq t E N-{0}} be uniformly integrable with respect to G. 5.4 Definition: Suppose that 0 < I(g) < m, h* = - %’(g'/g) a.e. with respect to G, and h* 6 NZ In addition suppose that the modified procedure is used with h = h* for which Km 4 e and nB/ZOKn - 9) has asymptotic distribution as given in Theorem 3.2. Then we shall * call h the Optimal transforpption. 5.5 Remark: An optimal transformation h has the following property. For any competitor h €.fl' to which Theorem.3.2 also applies, I'II‘IIII'IIIIilliA 1" All 35 the second moment of the asymptotic distribution is larger than * * that for h unless h = h a.e. with respect to G. A modified procedure using an Optimal transformation will be called optimal procedure. 5.6 Asygptotic efficiency 2; optimal stochastic approximation procedpres; the RM,situation: The surprising fact is that, in the RM situation, the optimal stochastic approximation procedures are not only optimal within the class Of approximation procedures considered but also they are asymptotically efficient within the general class of regular unbiased estimators of e, the parameter to be estimated. This is true despite the very simple recurrence relation that generates the approximation sequence Xn. In more detail, we show that the variance of the asymptotic distribution of nk(xn - 6) corresponds to the Cramer-Rao lower bound for the variance of an unbiased estimator based on the first n observations. We shall consider the following two cases: (i) a straight line case; f(x) = d(x-9) with d > 0 known, (ii) a case of sequence of functions fn(e) which are known except for 9. This second case was considered and studied in Albert and Gardner (1967). 5.7 The straight line case: Let (5.7.1) Yn = f(Xn) + Vn’ n = 1,2,... 36 where f(x) I d(x-e) with 9 unknown but assumed to belong to some interval, @; 6 positive and known. In addition assume that Vn =‘Yn - f(Xn) are independent and distributed according to G which satisfies the conditions stated in Theorem 5.2. The information contained in [Y1,Y2,...,Yn] is the same as that contained in [21,ZZ,...,Zn], where 23 =‘Yj + dxj, j = 1,2,...,n. Then the Cramer-Rao lower bound for the variance of any unbiased regular estimator of 9, based on the first n observations, is given (cf. Theorem 4.1.1 of Zacks (1971), p. 186) -l by In - n-1(dF)-2. Thus the asymptotic efficiency of Xn is 1, since n80!n - 8) 3-€'N(0,(<11")-2)« 5.8 A.case _£.g seguence p£_functions: Let (5.8.1) Yn = £n(e) + Vn , n = 1,2,... be observations on known functions fn except for 9 which is assumed to belong to some interval, @. Let the error random variables Vn = Yn — fn(e) be independent and distributed accord- ing to G which satisfies the conditions of Theorem 5.2. Further- more for each n let fn have the same unique root 9 E O, fn exist at e and f;(e) 2 d where d is positive and known. Also let fn satisfy the conditions stated in Theorem 3.7. Albert and Gardner have shown (cf. Theorem 5.2 of Albert and Gardner (1967), p. 68) that in this case Xn will be asymptotically efficient among regular unbiased estimators of e ’5 if the variance Of the asymptotic distribution of n (Xn - e) is 37 (dI‘)-2 which is true for our Xn generated by the Optimal pro- cedure. Hence our Optimal procedure is asymptotically efficient within the class of regular unbiased estimators Of 6. Albert and Gardner (1967; see Chapter 5 there) tried to increase the efficiency of the RM type procedure which they used in their monograph by making transformation of the parameter space @. Their procedure stayed less efficient except when the error random variables are normally distributed. 5.9 Theorem: The KW procedure as well as the RM procedure are optimal if and only if the error random variables are normally distributed. treat: The procedure is optimal if and only if (-g'/g)(v) = Cv with a constant C > 0, and this is true if and only if G is a normal distribution. Q.E.D. 5.10 Theorem: Procedure (F) is Optimal if and only if the error random variables have a double exponential distribution. 29.292: Procedure (F) is optimal if and only if (-g'/g)(v) = C sign (v) with a constant C > 0 and this is true if and only if G is a double exponential distribution. Q.E.D. 5.11 S exagples‘_§,ppg_optimal procedures: In the following we give four examples of new optimal procedures which are different from the original RM and KW 38 procedures. The first three examples fall under the case of Example 2.14. 5.12 Exapple: Let G have a Cauchy density function 1 v E R . l+v (Recall that random errors with this density function are not allowed by the RM procedure as well as the KW procedure, since both of them require the existence of the second moment of C.) One can check that P2 = 5/2. Let 1 . k v h1(v) = - f'g (v)/g(v) = 2(2/5) --§-, v E R. l+v This is an odd bounded monotone transformation with a bounded first derivative. Hence one can easily check that h1 6 W3 and h1 maximizes H(h). Furthermore h1 preserves the a.s. con- vergence (see Example 2.14) and it satisfies the conditions of Theorem 3.2. Thus h1 is an optimal transformation. 5.13 Exagple: Let C have a Student's t-density function given by .4 _1_ Pawn/2) 2 '(1‘1'V)/2 g(t) f3“,- F(\)/2) (1 + t IV) , t E R, V > 0, where v here stands for the degrees of freedom. With some manipulation one can check that 1 t h (t) = - - (g/(t)/s(t)) = C . t e R. 2 I‘ v V1132 where Cv is a positive constant for which 39 m 2 c 2 C (--) g(t)dt = 1 . L V \rl'tz Again this transformtion, 112’ has the same properties as h1 in §5.12, above. Hence h2 is an optimal transformation. 5 .14 Exar_n_ple: Let C have a logistic density function given by: 1 g(v) ’ 2(1 + Eosh(v)) ’ V E R' We leave it to the reader to check (see §2.l4) that the optimal transformation is given by s inh (v) 113“) = C3 l+cosh (v) SVERS a where C3 = 2 . 5 .15 Exagple: Let G have a density function given by: 2 g(v) = % e”v /2 if ‘v‘ < T 0 e-T‘v|+(T2/2) f2}? where Co and T are positive constants. , if \v‘ 2 T , This g behaves like a normal density for small v, and then like a double exponential for large v. It follows that A 4Tsign(v) if M 2T, - I1: (g'(v)/g(v)) .. c v if M < T C 40 where C4 also depends on T. Denoting this transformation by h4 we see that h4 is an odd, bounded and nondecreasing trans- formation which satisfies Assumption 2.8. Thus h4 6 fl' and h4 - - %'g'/g a.e. maximizes H(h). Also h4 preserves the a.s. convergence (see Lemmas 2.9 and 2.10) and it satisfies the con- ditions of Theorem 3.2. Hence h4 is an optimal transformation. 5.16 'Modified procedures pypmeans p£_ppitable transformations: In Theorem 5.2 we characterized the Optimal transformation for a given density function g which has a derivative a.e. with respect to G and for which 0 < 1(8) = f(s'(V)/g(V))sz(V) < 0°. What about those distributions for which I(g) = 0 or ad For example, if g is a constant symmetric density function then g' - 0 a.e. and thus I(g) = 0. Also if g' itself is con- stant we have I(g) = a (let g be triangular on (-l,l)). In the following we give two cases to show how to design transformations that yield better behaved procedures; but a unique optimal transformation does not exist in these cases. 5.17 A_case in which g' 8 0 a.e. (G): Let G be the uniform distribution on (-b,b), b > 0; that is g(v) = 1/-"-b . -b 1, and consider a transformation of the form: hr(v) I ,/2r+l \v/b‘rsign(v) , ‘v‘ < b [214-1 sign(v) , ‘v‘ 2 b . This is a bounded nondecreasing transformation which satisfies Assumption 2.8 and one can check (see Lemmas 2.9 and 2.10) that this transformation preserves the a.s. convergence Of the pro- cedure (1.2) as well as its asymptotic normality. Let f, an be as in Theorem 3.8 with a > [2f’(e)(2r+l)%]-1. Then for the RM procedure, by using (4.8.3) with the choice h = the identity we obtain 1 92 f’2(9) 3 while with the choice h = hr’ for r > 1, (4.8.3) gives (5.17.1) E g2 = l b2 f'2(9) (29+1) (5.17.2) E g2 = Therefore we see that the ratio Of (5.17.1) to (5.17.2) is always larger than 1 for r > 1. Taking, e.g. r = 10, will result in decreasing E g2 by the factor %‘. 5.18 A case i which g' = constant a.e. (G): In particular we consider the density g(v) = 1 -\v\ , -l [2f’(9)H(hr)]-1, then using (4.8.3) with the choice h = hr’ it follows that (5.18.2) E g2 = -—% (1") (9) “2“) ' while with the choice h = the identity (4.8.3) gives l ,2 (5.18.3) E g2 .1. r (9)6 To see the effect of this transformation in the KW situation, let f, an, c satisfy the conditions stated in Theorem 3.6 with n a > [6M H(hr)]-1' Then with the optimal choice of (a,c) provided that fm(9) * 0, it follows (see §4.7), with the choice h = hr’ that 2 3 fl 2 3 (5.19.4) E F. - -- [(9/4)f’18)]1/3[1—'——J ’ 2 4(2-r) 16M while for the KW procedure (h = the identity) we have (5.18.5) E g2 = -——-[(9/4)£MT9)]1/3[:]2/3 16M2 Thus we are able to make the ratio of (5.18.5) to (5.18.4) (or (5.18.3) to (5.18.2)) as large as we wish by choosing r close to 1. CHAPTER 6 ASYMPTOTIC RELATIVE EFFICIENCY OF THE MODIFIED PROCEDURES In Chapter 3 we have shown, under some regularity conditions, 2 Bl (Xn - 9) 34 § where g is normally distributed that, for a B > O, n with certain mean and variance. As a reasonable measure of com- parison between different procedures we use E 52 either with a and c chosen optimally or with c fixed and a chosen to minimize E g2. Our results in Chapters 3, 4 and 5 make it possible to compare different transformations, and we do so for the identity transformation and the sign transformation. We also compare the sign transformation to the Optimal transformation. To the knowledge of the present author there have not been any study of the asymptotic relative efficiency (A.R.E.) of the RM and KW procedures relative to procedure (F) except some simulation study by Springer (1969). 6.1 Definition g£_the asymptotic relative efficiency (A.R.E.); Let 0% and 9% denote two modified procedures with 1 2 e/Z 1. transformations h1 and h2 such that n (Xn - e) a gh and l nB/2(X - e) 14+; , respectively. Then the A.R.E. of 0 “ h2 h1 relative to 9k is taken to be 2 43 44 2 E[§h2] e(9 :9 )=‘——2—' h1 h2 E[§h1] 9 is called more efficient than 9 if e(9 ;9 ) > l. h1 h2 h1 h2 6.2 Comparison 9; some transformations: In Table I, we list, for four different cases, the values of the second moment of the asymptotic distribution for the choices h = the identity, h = sign and h = the optimal transformation (see Definition 5.4). Also we list the A.R.E. of h = sign relative to h = the identity, and the A.R.E. of h = sign relative to the optimal transformation. In cases (1) and (2) g is assumed to be continuous at O with g(O) # 0 and 02 = Iv2g(v)dv < a; further in cases (3) and (4) g is assumed to satisfy the conditions of Theorem 5.2 and to be continuous at 0 with g(O) # 0. Moreover in cases (1) and (3) let f, satisfy the conditions of the RM situation in , and in addition let f’ exist at e and :Im a :3 §2.3 with a = n f’(e) > 0. Then with h = the identity (provided a > [2f’(e)]-1) * - and also with h = sign ) (provided a > [4f’(e)g(0)] 1), con- ditions of Theorem 3.8 are satisfied. In cases (2) and (4) let f, Yn satisfy the conditions of the KW situation in §2.4 with a = E; c = £—-, y = 1/6 and in addition let f“ exist and be n n n nY 7 continuous in a neighborhood of 9, and f”(e) = M > 0. Then - * with h = the identity (provided a > [6M] 1) and h = sign ) * ) See §4.2 and Results 4.4 and 4.5. 45 .Na can H s mooHoSo «no you muasmmu wchcoammuuoo onu mo :oHuwaschme uoouwv mp vmusaaoo mum mosao> ammo mHSu :H .s.m coHuHaHHua «mm H« Ask N.H H N a Na NEH H as Nws NEH N\Nfl~owNu¢g u A s. new N\N HaszHN\mvT m\N HavsmHN\NHu m Hey Np a fiH: Neva m pawv up onv wva new Amv . NoHNus H- N N. H- N N. h m H N H8 Nws NEH a NEH .O I m 0 O m. l NoHovNNs u HHemNeVo :fiHov w NzNosNH NzNos\No MHW-HNV .N Nofiovas a HH a. new H-fionNwHovN.mqu HmvN.m\No HHV Aw}va u I q: NS H: As Huaauno swam zuflucmvH .u.e.< ammo cowusnauumav oHuougahmm «no mo uamfioa caooom onu mo mo=Hm> H MAQ [12M.g(0)]-1), conditions of Theorem 3.6 are satisfied. In case (2)-(1) f”(e) = o, and in case (2)-(11) f”(e) ¢ 0. Optimal constants a,c are chosen except in case (2)-(i) where c is fixed. Thus the values of the second moment of the asymptotic distribution are obtained from (4.8.3) in cases (1) and (3) and from (4.7.7) in cases (2)-(ii) and (4). 6.3 Remark: From case (3) in Table I, the A.R.E. of procedure (F) relative to the optimal procedure is given by e(h2;h0) = 4g2(0)/I(g'2(v)/g(v))dv; furthermore if c is fixed and h0 = - %'(g'/g) is an optimal transformation, then from case 2 in Table I we also obtain that e(h2;h0) = 4g2(0)/I(g'2(v)/g(v))dv. Recently Groeneveld (1971) has shown that the A.R.E. of the sequential sign test relative to the sequential probability ratio (SPR) is also given by 4g2(0)/I(g'2(v)/g(v))dv. This value is also the A.R.E. of the sign test relative to the most powerful test for with 9 testing H: e = 90 vs. H1: 9 = e > 90, (where observa- 1 1 tions are drawn from G with a symmetric density function g(x-B), x 6 R), if competing tests are considered based on a fixed sample size (cf. Héjek and gidak (1967), Chapter 7). 6.4 Computations of the A.R.E. Qf_certain transformations relative £g_the optimal procedure for some given known distributions of the error random variables: In Table II, for the sake of easy computations (eSpecially in the KW situation), we take the A.R.E. of a modified procedure 47 * 0h relative to the Optimal procedure 9 to be cow") = [foo/(for(v>/g2gdv>) . TABLE II a --.. _ ...- “"“7 Value of A.R.E. 2 2 [H (h)/(j'(g'(V)/8(V)) 8(V)dV)] \ Transformation . h (v) = v h (v) = sign(v) e(h ;h ) Distribu- 1 2 2 1 tion of (RM or Kw (Procedure (F)) errors procedure) Normal 1 a Z n 11 Double exponential -%- l 2 Uniform Unde f ined Unde f ine d 31 Triangular Undefined Undefined €- Cauchy Undefined '8—2 = .810 Undefined ‘1 2 . . 9 n Logistic "—2 .75 3— 4n REFERENCES 10. ll. 12. 13. REFERENCES Albert, Arthur E. and Gardner, LeLand.A. (1967): Stochastic Approximation and Nonlinear Regression, Research Monograph No. 42, The M.I.T. Press, Cambridge, Massachusetts. Blum, J.R. (1954): Approximation Methods which Converge with Probability One, Ann. Math. Statist., 22, pp. 737-744. Burkholder, D.L. (1956): On a Class of Stochastic Approxima- tion Processes, Ann. Math. Statist., 21, pp. 1044-1059. Chung, Kai Lai (1968): A Course in Probability Theory, Harcourt, Brace and World, Inc. (1954): On a Stochastic Approximation Method, Ann. Math. Statist., 51, 3, pp. 463-483. Derman, C. and Sacks, J. (1954): On Dvoretzky's Stochastic Approximation Theorem,.Ann.'Math. Statist., 39, 2, pp. 601-606. Dubins, Lester E. and Freedman, David (1965): A Sharper Form of the Borel-Cantelli Lemma, and the Strong Law, Ann. Math. Statist., 363 3, pp. 800-818. DupaE, V. (1957): On Kiefer-Wolfowitz Approximation Method, Casopis Pest. Mat. 82, pp. 47-75. Dvoretzky, Aryeh (1956): On Stochastic Approximation, Pro- ceedings of Third Berkeley Symposium on Math. Stat. and Probability, Vol. 1, pp. 39-59, University of California Press. Fabian, V. (1960): Stochastic Approximation Methods, Czech. Math. Journal 10, pp. 123-159. (1964): A New One-dimensional Stochastic Approxima- tion Method for Finding a Local Minimum of a Function, Trans. Third Prague Conf. Information Theory, Statistical Decision Functions, Random Processes, CzechoslovakHAcademy of Sciences, Prague, 85-105. (1967): Stochastic Approximation of Minima with Improved Asymptotic Speed, Ann. Math. Statist., 28, 1, pp. 191-200. (1968a): 0n the Choice of Design in Stochastic Approximation Methods, Ann. Math. Statist., 32, 2, pp. 457-465. 48 14. 15. 16. l7. l8. 19. 20. 21. 22. 23. 24. 25. 26. 27. 28. 49 Fabian, V. (1968b): On.Asymptotic'Normality in Stochastic Approximation, Ann. Math. Statist., 32, 4, pp. 1327-1332. (1971): Stochastic Approximation, Proc. Symp. on Optimdzing Methods in Statistics, Ohio State University, June 1971 (to appear). RM-280, Michigan State University. Graves, L.M. (1956): The Theory of Functions of Real Vari- ab les , MoGraw-Hi ll . Groeneveld, RJA. (1971): A Note on the Sequential Sign Test, The American Statistician, 33, 2. Hajek, J. and Sidak, Z. (1967): Theory of Rank Tests, Academic Press. Hodges, J.L., Jr. and Lehmann, E.L. (1956): Two Approximations to the Robbins-Monro Process, Proceedings of the Third Berkeley Symposium on Mathematical Statistics and Probability, Berkeley and Los Angeles, University of California Press, 3, pp. 95-104. Kesten, Harry (1958): Accelerated Stochastic Approximation, Ann. Math. Statist., 32, pp. 41-59. Kiefer, J. and Wolfowitz, J. (1952): Stochastic Estimation of the Maximum of a Regression Function, Ann. Math. Statist., 33, pp. 462-466. Lo‘eve, Michel (1963): Probability Theory, Third Edition, D. van Nostrand Company, Inc. Lukacs, Eugene (1968): Stochastic Convergence, D.C. Heath and Company. Pontryagin, L.S. et al (1963): The Mathematical Theory of Optimal Process, Interscience Publishers, a division of John Wiley & Sons, Inc. Rao, C.R. (1965): Linear Statistical Inference and its Applications, John Wiley & Sons, Inc. Robbins, H. and Monro, S. (1951): A Stochastic Approximation Method, Ann. Math. Statist., 33, pp. 400-407. Rudin, Walter (1966): Real and Complex Analysis, McGraw-Hill Book.Company. Springer, B.G.F. (1969): Numerical Optimization in the Pre- sence of Random Variability. The Single Factor Case, Biometrika, 33, pp. 65-74. 29. 30. 31. 50 Ventor, J.R. (1967): An Extension of the Robbins-Monro Pro- cedure, Ann. Math. Statist., 33, 1, pp. 181-190. Wolfowitz, J. (1956): On Stochastic Approximation Methods, Ann. Math. Statist., 31, pp. 1151-1155. Zacks, S. (1971): The Theory of Statistical Inference, John Wiley and Sons, Inc.