.7

 

 

TRANSFORMATION OF OBSERVATIONS m
STOCHASTIC APPROXIMATION ,

Thesis for the Degree of Ph. D.
MICHIGAN STATE UNIVERSITY
SAM! NAGUiB ABDELHAMID
1921

1|llllll’lllflllﬂllll Em

ﬂ _ MiCbI'335. State
Unive it)!

Mm‘

This is to certify that the

thesis entitled

TRANSFORMATION OF OBSERVATIONS IN 1
STOCHASTIC APPROXIMATION |
i

presented by L

Sami Naguib Ab de lhamid l

has been accepted towards fulﬁllment
of the requirements for

Ph.D. degreein Statistics and
Probability 1

 

 

\v’doiw "ﬁlvh‘mt

Major professor

Date October 11, 1971

 

0-7639

 

 

{we 2 b 19:?

1 "3 “
H‘s ‘- d

 

ABSTRACT
TRANSFORMATION OF OBSERVATIONS IN STOCHASTIC APPROXIMATION
By

Sand Naguib Abdelhamid

Let X1 be a random variable. We consider the following

general stochastic approximation procedure:

(1) xn+1 = xn - ancnlhan) , n = 1,2,...

where Yn are random variables, an and cn are positive numbers,
and h is a Borel measurable transformation. With the choice
h = the identity, (1) includes both the Robbins-Monro (RM) pro-
cedure and the Kiefer-Wolfowitz (KW) procedure. Fabian (1960,
1964) considered (1) with h = sign; we shall call (1) with
h = sign procedure (F).

We study the asymptotic preperties, the a.s. convergence
and the asymptotic normality, of this prOposed generalized procedure
under some mild requirements on h and on the random variables

V

:1 (Yu - Mnu'n) with Mnu’n) = E1 [Yn] where In == [X1,X2,...,Xn].
We confine our analysis to the case Share Vn are conditionally,
given I“, distributed according to a distribution function G

which is symmetric around 0 and admits a density g. It is shown
that nB(Xn - e) is asymptotically distributed as a normal random

variable g; 9 is the parameter to be approximated. The effect of

using h in (l) is pointed out for the RM.and the KW situations.

Sami Nagu ib Abde lhamid

We consider a transformation h Optimal if it minimizes
E g2 and we show, under some regularity conditions, that h is
Optimal if and only if it is equal to -C(g'/g) (a.e. with respect
to G) for a C > 0. With such an optimal transformation h, the
surprising result, despite the very simple recurrence relation in
(1), is that our optimal procedure is not only Optimal within the
class of stochastic approximation procedures considered but also
it is an asymptotically efficient estimator within the general
class of regular unbiased estimators. We also show that the RM
procedure as well as the KW procedure are Optimal if and only if
the error random variables are normally distributed. As for pro-
cedure (F) we show it is Optimal if and only if the error random
variables have a double exponential distribution. For distributions
which do not satisfy the regularity conditions, we show how one
can design transformations that yield improved procedures.

Our results make it possible to study the asymptotic
relative efficiency (A.R.E.) for different choices of h, and in
particular we show that the A.R.E. of procedure (F) relative to
the optimal procedure is the same as the A.R.E. of the sequential
sign test relative to the sequential probability ratio test (SPRT)

(Cf- Groeneve 1d (1971)) .

TRANSFORMATION OF OBSERVATIONS IN STOCHASTIC APPROXIMATION

By

Sand.Naguib Abdelhamid

A THESIS
Submitted to
Michigan State University
in partial fulfillment of the requirements
for the degree Of
DOCTOR OF PHILOSOPHY
Department Of Statistics and Probability

1971

©C0pyrlght by
SAMI NAGUIB ABDELHAMID
1972

TO MY WIFE AND SONS

Mona, Hishan and Tariq

ii

Acmomancamrs

I wish to express my sincere gratitude to Professor Véclav
Fabian, who introduced me to the area of stochastic approximation
and guided me through this research. His careful criticism and
valuable comments helped me avoid many wrong turnings. His
suggestions have been the source of many of the results presented
here and improved virtually all results, either in substance or
style, almost beyond recognition.

In addition, among many others who helped, I would like to
thank Professor Dennis Gilliland for his encouragement and the time
he gave for some discussions and reading the material of this re-
search; Mrs. Noralee Barnes, who excellently typed both the rough
and final versions of this thesis, and whose speed, accuracy,
cheerfulness and patience leave me, at times, unnerved; my wife.
for her great understanding, continuous encouragement, heroic
patience and endurance, and the ideal care of me and my two sons
while I was too busy with my studies and research.

Finally, I would like to thank the people of my country,
the United Arab Republic, and its Government for the financial
support during the first five years of my graduate study. Also
I would like to thank the Department of Statistics and Probability
at Michigan State University for the generous support, financial
and otherwise, that I have enjoyed over the period of my studies

for the Ph.D. degree.
iii

TABEE OF CONTENTS
Chapter Page
1 INTROWCTION AND SWRY I O O O O O O O O O O O O O O O O O O O O O O O 1

2 AIMDST SURE CONVERGENCE OF THE MODIFIED
STOCHASTIC APPROXIMATION PROCEDURES ....... ......

O‘

1 Basic Assumptions and Notations ............
2 Remark ...................... ..... ..........
3 RobbinséMonro (RM) Situation ...............
4 Kiefer-Wolfowitz (KW) Situation ............
.5 Almost Sure (a.s.) Convergence Theorem .....
6
7
8
9

Remrk 0....0.00.0.........OOOOOOOIOOOOOO0.0

QOQVO‘O‘O‘

Assumtion O O O O C O O O I O O O O O O O O O O O O O O I O O I I O O O O 0
Assumption 0 O I O O O C I O O I O I O O 0 O O O O O O O O O O O O D O O O O 10
Lew O O O O O O O O O O O O O C O O O O O I O C O O O O O O O O O O O I O O 0 O 10

NNNNNNNNNNNNN
O

010 Lem coco...cocoo-oooooooooooooooooooooooo. 12
011 Remrk ooooooooooooooooooooooooooooooooooso. 12
012-

.14 Examples .... ...... ooooooooooooooooooooooooo 13

3 ASYMPTOTIC NORMALITY OF THE MODIFIED PROCEDURES . 15

. Theorem .................................... 15
. Asymptotic Normality Theorem ............... l6
. Remark ........................... ..... ..... 19

Remark ...................... ..... .......... 20

Theorem (KW Situation) ...... ..... .......... 20
Theorem (RM Situation) ............... . ..... 21

Theorem ... ...... ... ........................ 22
Theorem (RM Situation) ..................... 24

uuwuuwwu
O
(”VGU‘J-‘WNH

4 SPECIAL CASES AND RESULTS ........ ........ ....... 25

Introduction ............................... 25
The a.s. Convergence and the Asymptotic

Normality Of Procedure (F) ................ 25
Result (KW Situation) ..................... 26
Result (KW Situation) ..................... 26
Result (RM Situation) ..................... 26

o 0
NH

##### ##
\lO‘U’lbw

. Remark .................................... 27
. The Optimal Choice of (a,c) in the KW
Situation ................................ 27
4.8 The Optimal Choice of a in the RM
Situation ................................ 28
4.9 Remark ................................... 29
4.10 Effect of Taking m Observations at
Each Stage ............................... 29

iv

Chapter Page

5 OPTIMAL TRANSFORMATIONS ......................... 32
5.1 Introduction ............................... 32
5.2 Theorem .................................... 33
5.3 Remark .............................. ....... 34
5.4 Definition ................................. 34
5.5 Remark ..................................... 34
5.6 .Asymptotic Efficiency of Optimal Stochastic

Approximation Procedures; the RM Situation . 35
5.7 The Straight Line Case ..................... 35
5.8 A Case of a Sequence of Functions .......... 36
5.9 Theorem .................................... 37
5.10 Theorem .................................... 37
5.11 Some Examples of New Optimal Procedures 37
§::§7Examples ................................... 38
5.16 Modified Procedures by Means of Suitable

TranSformtions oooooooo00.000000000000000oo 40
5.17 A Case in which g' = 0 a.e. (G) .......... 40
5.18 A Case in which g' = constant a.e. (G) .... 41

6 ASYMPTOTIC RELATIVE EFFICIENCY OF THE MODIFIED
PRmEDURES ..........OOOOOOOOOOOOOOOO0.0.0.000... 43

6.1 Definition of the Asymptotic Relative
EffiCiency (A.R.E.) ......OOOOOOOOOOOOOOOOOO 43

6.2 Comparison of Some Transformations ......... 44
6.3 Remrk .0.........OCOOOOO......COOOOOOOOOOOO 46
6.4 Computations of the A.R.E. of Certain

Transformations Relative to the Optimal
Procedure for Some Given Known Distributions 46

REFERENCES 1.........OOOOOOOOOOOOOO......OOOOOOOOO 48

LIST OF TABLES

Tab 1e Pa ge
I 45
II 47

vi

 

 

CHAPTER 1

INTRODUCTION AND SUMMARY

Consider the stochastic approximation procedures of the form

(101) x =xn -anC;IYn’ n=1,2,ooo ,

n+1

where X“, Yn are random variables, and an, cn are positive numbers.
This includes both the Robbins-Monro (1951) procedure (RM), and the
Kiefer-Wolfowitz (1952) procedure (KW).

We shall study the effect of transforming the random variables
Yn in (1.1) into h(Yn) by a Borel measurable transformation h.
Fabian (1960 and 1964) prOposed that Yn in (1.1) be replaced by
sign (Yn). He studied the almost sure convergence of that modified
procedure in both the RM and the KW situations (See §2.3, §2.4), and
he also discussed analoguous modifications in the multidimensional
case. He indicated that those modified procedures behave better in
some practical applications.

Motivated by this idea can we transform Yn by means Of a
Borel measurable transformation h and improve thereby the speed
of convergence? The answer is yes. Therefore instead of (1.1),

we shall consider the following general stochastic approximation

procedure:

- '1 =
(102) xfﬁl - Xn anCn ha“), [1 1,2,... ,

where h is a Borel measurable transformation.

We shall establish the asymptotic prOperties of this prOposed
generalized procedure, and then we characterize the optimal trans-
formation h. With the choice h = sign we will refer to (1.2)
as procedure (F).

Let us denote
In = [x1,x2,...,xn], Mnan) = ExnUnJ’ vn = Yn - Mn(In) .
We shall confine our analysis to the case where the random vari-
ables Vn are conditionally (given In) distributed according
to a symmetric distribution function G admitting a density func-
tion g. This requirement of symmetry is natural in the KW
situation (See §2.7 below).

To be more specific let f be a Borel measurable function
defined on the real line. The exact analytic form of f may be
unknown, but it is assumed that f belongs to a rather general
family of functions. The only available information about f is
that at any level x, we can Observe f(x) subject to a random
error; that is, we can obtain an unbiased observation of f(x).

In the RM situation, we try to approximate the unknown root of
the equation f(9) = O; and in the KW situation we want to
approximate a, the unknown point of minimum (or maximum) of a
function f.

In the RM situation (cf., e.g., Chung (1954), Hodges and
Lehmann (1956),Burkholder (1956)) it is known, under some
“’5

regularity conditions, that (Xn - e) is asymptotically dis-

tributed as a normal random variable g with

2 2
(1.3) , E g = O, and Var g = a a ,
2a f’(9) - l

 

where 02 = Variance of G. Also in the KW situation (cf. Fabian
(1968b) and also the references there) it is known, under some
regularity conditions, that n1/3(Xn - e) is asymptotically dis-
tributed as a normal random variable g with

2 2 -2
O a c

2
__ (2/3)a c f”’(&1
(1-4) E§-- 4af”(e) -2/3 °

4 a f”(9) _ 2/3 , and Var g =

The effect Of transforming Yn into hCYn) is that in-
stead Of using estimators Yn of f(Xn), we are using estimators
h(Yn) of another function f(xn). If then f has the same root
as f and if the conditions guaranteeing the asymptotic normality

I

are preserved the effect of h is to change f’(e) into f (e)
in (1.3) and 02 into another variance 32. Similarly in the
KW situation instead of using estimators Yn of [f(xn + cn) -
f(Xn - cn)], we are using estimators h(Yn) of f(Xn,cn), say.
If then c-1f(x ,c ) has the same behavior as c'1(£(x + c ) -

n n n n n n
f(Xn - cn)) and if conditions guaranteeing the asymptotic
normality are also preserved the effect of h is to change

(f”(e).fm(9)) into (f”(e).§m(9)) in (1-4). and 02 into

~2
another variance 0 .

~2
In the RM situation E g2 will be minimized if Q-T——‘—‘§
(I (6))

is minimized within the class of all h's for which the asymptotic

normality is preserved. Similarly in the KW situation E :2 will

be minimized if ~” 2 is minimized within the class of all
(f (9))

such h's which preserve the asymptotic normality. The derivatives

of f at 9 can be easily determined and we obtain

Two) = 15“) (9)1101) with
(1,5) H(h) = g—tj‘h(t+v)c(dv)]t___0

for i = l in the RM- and i = 2,3 in the KW situation.

We shall state conditions on h under which both the almost
sure convergence to e and the asymptotic normality are preserved.
Within the class Cv of such h's we consider the Optimal trans-
formation which mdnimizes the second moment of the asymptotic dis-
tribution of nB(Xn - 9). This will be shown to be equivalent to
finding h E C! which maximizes H(h). This leads, under some
regularity conditions, to h = -(g’/g) (or any positive constant
multiple of -(g’/g)).

The surprising fact is that with such an Optimal h, the
stochastic approximation procedure is not only Optimal within the
class of stoachstic approximation procedures considered but also,
in the RM case, Xn is an asymptotically efficient estimator of
9 within the class of all regular unbiased estimators.

Knowing the Optimal transformation, we show that the RM
procedure as well as the KW procedure are Optimal if and only if
the error random variables are normally distributed. As for pro-
cedure (F), we show it is Optimal if and only if the error random
variables have a double exponential distribution.

One of the regularity conditions is 0 < I(g) =
I(g’(v)/g(v))zG(dv) < m; if it is not satisfied, we show for some
particular distribution (e.g., uniform and triangular) how one can

design transformations which yield improved procedures.

f(1)(9) = f(i)(e)H(h) with
(1,5) H(h) = [Ell—t fh<t+v)c(dV)]t___o

for i = 1 in the RM- and i = 2,3 in the KW situation.

We shall state conditions on h under which both the almost
sure convergence to e and the asymptotic normality are preserved.
Within the class Cv of such h's we consider the Optimal trans-
formation which minimizes the second moment of the asymptotic dis-
tribution of nB(Xn - 9). This will be shown to be equivalent to
finding h G Cv which maximizes H(h). This leads, under some
regularity conditions, to h = ~(g’/g) (or any positive constant
multiple of -(g'/g)).

The surprising fact is that with such an optimal h, the
stochastic approximation procedure is not only Optimal within the
class of stoachstic approximation procedures considered but also,
in the RM case, Xn is an asymptotically efficient estimator of
9 within the class of all regular unbiased estimators.

Knowing the Optimal transformation, we show that the RM
procedure as well as the KW procedure are optimal if and only if
the error random variables are normally distributed. As for pro-
cedure (F), we show it is Optimal if and only if the error random
variables have a double exponential distribution.

One of the regularity conditions is 0 < I(g) =
I(g’(v)/g(v))2G(dv) < m; if it is not satisfied, we show for some
particular distribution (e.g., uniform and triangular) how one can

design transformations which yield improved procedures.

Our results also make it possible to compare different trans-
formations, and we do so for the "sign" transformation (which yields
procedure (F)), and the identity transformation (which yields either
the RM procedure or the KW procedure). We then study the asymptotic
relative efficiency of procedure (F) relative to the optimal pro-
cedure and show it has the same asymptotic erlative efficiency of
the sequential sign test relative to the sequential probability

ratio test (SPRT), (cf. Groeneveld (1971)).

CHAPTER 2

ALMOST SURE CONVERGENCE OF THE MODIFIED
STOCHASTIC APPROXIMATION PROCEDURES

2.1 Basic Assumptions and Notations:

A11 random variables are supposed to be defined on a proba-
bility space (n,?,P). Relations between random variables, includ-
ing convergence, are meant with probability one. E Vdenotes the
expectation and ET the conditional expectation given a random
vector T. The real line is denoted by R and the indicator
function of a set A by XA° If gm is a sequence of random
variables (or in particular numbers) we use the notation O(§n),

and o(§n) for denoting sequences of random variables such that

there is a constant K and a number sequence rH 4 0 such that

\O(§n)\ s K|§n\, and \o(§n)\ s rn\§n\.
2.2 Remark:

The original results by Robbins and Monro (1951) and Kiefer
and Wolfowitz (1952) were generalized and strengthened by Blum (1954).
He proved Xn a e in both the RM and KW situation, as described

below.

2.3 Robbins-Monro (RM) Situation:

Here we assume that (1.1) holds with cn = 1, Mh(In) = f(Xn),

where f is a Borel measurable function on R such that

1'3.

(2.3.1) sup 1 f(x) < O, 1 inf f(x) > O ,
door-90; ffx'kk

for an (unknown) number 9, and every natural number k. Further-

more, there exist constants A, B such that

(2.3.2) |f(x)\ SA‘x - 9| + B, for all x e R.

(an):=1 is a positive sequence of numbers satisfying

(203.3) 2 an = Q ’ z 82 < m 0
n=1 n=1 n
2 2
(2.3.4) EIhEVn] S c

for a number a and every natural number n.

2.4 Kiefer-Wolfowitg (KW) Situation:

We assume that f is a Borel measurable function on R
satisfying
(2.4.1) sup 1 '13 f(x) < o , 1 inf Q f(x) > o,
"ka'k'i Ker-«k
for an unknown number 9, and every natural number k; where
2 f(x), and D f(x) denote the lower and upper derivative,
respectively, of f at x. Furthermore, there exist constants

A, B such that
(2.4.2) \£(x+1) - f(x)‘ < A‘x - 9| + B for all x e R .

The relation (1.1) holds with positive an, cn satisfying

243 o m = ”2'2 .
(- 0) Cn" 9 Ban “’3 Zancn <99 9
n=1 n=1

and the random variables Yn satisfy

(2.4.4) Mn(1n) = f(Xn + en) - f(Xn - cm).
and

2 2
(2.4.5) EIﬁFvn] S O

for a number a and every natural number n.

Burkholder ((1956); Theorem 1) proved a convergence theorem
which contains the convergence results for both the RM and the KW
situations as special cases. The following is a rewriting of that

theorem in terms of the procedure (1.2).

2.5 Almost Sure (a.s.) Convergence Theorem:

Let 9 E R and 02 be a positive number. Let (1.2) hold

2

and Mn’ Sn be Borel measurable functions; Mn(Xn) = E1 [hCYn)],

2 ~ 2 n
Sn(xn) = EIh[h(Yn) - Mh(Xn)] . Suppose that
(2.5.1) if g > O, ‘x - 9‘ > e and n > n0(g) then
(x - 9)Mn(x) > 0 ;
m -1 ~
(2.5.2) if 0 < 61.< 52 < m then E ancn [ inf [‘Mn(x)‘]] = m;
n-1 615\x-9|$62
\ﬁnbc)‘
(2.5.3) sup [ ]< co ;
n,x 1 + ‘x‘
(2.5.4) sup 82(x) s 02 ;
n,x n
(2.5.5) 2 azc'z < m .

nn

Then

(2.5.6) X a e .

2.6 Remark:

Given h, the a.s. convergence of the procedure (1.2) holds
if the conditions of the preceding theorem are satisfied irrespective
of whether they are also satisfied for h equal to the identity.

But since we are interested in the Optimal Choice of h it seems
useful to investigate conditions on h, which guarantee that con-
ditions of Theorem 2.5 are satisfied with this h if they are
satisfied for h = the identity.

Henceforth the following two assumptions will be assumed

to hold.

2.7 Assumption:

In the general procedure (1.2), let
Yn = Mn(xn) + Vn

where Vn are random variables conditionally (given In) dis-
tributed according to a distribution function G which is symmetric
around 0 and admits a density g. The functions Mn are Borel
measurable.

Here, we may remark that the requirement of symmetry is
natural in the KW situation where Yn is an unbiased estimator of
[f(xn + cn) - f(Xn - cn)]. The requirement of symmetry is then
satisfied if the errors in estimating f(Xn + cn) and f(Xn - cm)

are independent and identically distributed.

10

2.8 Assumption:

We assume that h is an odd Borel measurable transformation
defined on R and nonnegative on [0,m). In addition we assume

that Y(t) = jh(t+v)g(v)dv = fh(v)g(v-t)dv exists for all t E R.

2.9 Lemma:

Assume that
(2.9.1) (1) 1im inf t‘lw(c) > o ,
tiO
and either
(ii) h is nondecreasing;
or
(iii) g is continuous and nonincreasing on [0,m),
and h is bounded and continuous; furthermore
h(v) >0 for all v>O.
Then (2.5.1) and (2.5.2) hold for h if (2.5.1), (2.5.2), and

(2.5.3) hold for the identity transformation.

Proof:

From (i) we obtain, for some positive numbers A and go,
that
(2.9.2) c’ly(c) 2 p0 for all 0 < t < A .

If h is nondecreasing then so is Y and thus (2.9.2) implies

that inf{Y(t), t > to} > O for every t0 > 0. Therefore (1)

and (ii) imply that if 0 < T < T1 < m, then

0

(2.9.3) inf{Y(t); c e [T0,T1]} > o .

11

Now suppose (iii) holds; we shall prove that (2.9.3) holds
in this case, too. Since h is odd and g is symmetric, Y(t)

can be written in the form:
on
(2.9.4) Y(t) ==£h(v)[g(v-t) - g(v+t)]dv, t e R .

The integrand is nonnegative for t 2 0, since h(v) 2 0 for

v 2 O, by.Assumption 2.8, and g(v-t) - g(v+t) 2 0 for v 2 O

and every t 2 0. The latter is obvious from (iii) if 0 s t s v;
if 0 s v1< t then g(v-t) - g(v+t) = g(t-v) - g(v+t) which is
again nonnegative by (iii). In particular Y(t) 2 O for t 2 0.
But suppose Y(t) = 0 for some t > 0. Then, since h(v) > O

for all v > O, g(v-t) - g(v+t) = 0 for almost all (Lebesgue)

v 2 O. The function F(v) = g(v-t) - g(v+t), v 2 0, is then con-
tinuous and thus identically zero. Moreover g is periodic with
period 2t; but since g is nonincreasing for v > 0, then

g = constant on R. This is a contradiction to the fact that g
is a density function. Hence Y(t) > O for all t > 0. Further-
more since h is bounded and continuous, then by the dominated
convergence theorem Y is continuous on [T0,T1], 0 < T0 < T1 < m.
But [T0,T1] is compact, then Y achieves its minimum on

[T0,T1] and thus (2.9.3) holds.

Since Y is odd, (2.9.3) implies Y(t) sign (t) 2 0.
(2.5.1), (2.5.2) and (2.5.3) hold for M.n since (2.5.1) - (2.5.3)
hold when h = the identity. ﬁn(xn) = Y(Mn(Xn)) and thus (2.5.1)
for Mn implies (2.5.1) for ﬂu. Further (2.9.3) and (2.9.2)

imply that for every T >,0 there is an n > 0 such that

12
(2.9.5) ‘Y(t)‘ 2 n\t\ for all |t\ < T .

Suppose 0 < 511< 62. Then using (2.5.3) for Mn we obtain

‘Mh(x)‘ < T for some T > 0 and all ‘x‘ s 6 Thus

1.
‘Mn(x)‘ = ‘YCMh(x))‘ 2 nan(x)l for some n > o, and (2.5.2) for

ﬂu follows from (2.5.2) for Mn. Q.E.D.
2.10 Lemma:

Assume there exist constants K1, K2 such that
(2.10.1) ‘Y(t)‘ s Kl‘t‘ +1x2 for 211 t e R .

Then (2.5.3) holds for ﬁn if it does for Mn.

 

Proof:
Since Mn(x) = YCMn(X))a
[‘Mn(x)‘] 5 Kl‘Mn(x)| +K2
1 + |x‘ 1 + ‘x‘ ’
thus (2.5.3) is satisfied since Mn satisfies (2.5.3). Q.E.D.

2.11 Remark:

Concerning (2.5.4) we notice that it is satisfeid if h
is a bounded transformation. We also add this remark: if h is
bounded by a straight line, Mn is bounded, and G has a bounded
second moment then it is easy to verify that (2.5.4) holds; further-
more (2.5.3) also holds provided that Mn satisfies (2.5.3).

In the following we shall show, under some additional
regularity conditions, that (2.5.1) - (2.5.4) of Theorem 2.5 hold

for some particular choices of h in (1.2).

13

2.12 Example:

Let h(v) = sign (v), v E R. If g is continuous at 0,
g(O) f 0, and (2.5.1) - (2.5.3) hold for M“, then (2.5.1) - (2.5.4)
hold for Mn.
21:22.2:

First Of all the sign function clearly satisfies Assumption
2.8. Conditions (2.5.3) and (2.5.4) follow from the boundedness
Of h. Since h is nondecreasing, Lemma 2.9 will imply (2.5.1)
and (2.5.2) if lim t-1Y(t) > 0. But from the continuity of g
at 0 we obtaintlo

- -10
t 1Me) = 2t j g(v)dv 2 2g(0) > o; g(O) # o. Q.E.D.
-12

2.13 Example:

Let h be a truncation function; that is for a positive

number T0 let h(v) = v if ‘v‘ < T0 and \h(v)‘ = T0 if

4v‘ 2,T0. Assume that (2.5.1) - (2.5.3) hold for Mn and let
0

f g(v)dv > 0. Then (2.5.1) - (2.5.4) hold for ﬁn.
Again it is obvious that h satisfies Assumption 2.8, and
since h is nondecreasing; Lemma 2.9 will imply (2.5.1) and (2.5.2)

if we show lim inf t-1Y(t) > 0. But
ttO
To
Y(t) = [T (t+v)g(v)dv + TOD - C(TO-t) - G(-TO-t)] ;
0

that is

Y(t) = t[c(T0) - G(-T0)] + TO[G(TO+t) - G(TO-t)].

14
Thus for t > 0, we obtain
t'1\y(t) 2 G(T0) - euro) = 2G(T0) - 1 .
Since 2G(T0) - l > 0, then
lim inf t'1\y(t) 2 2G(T0) - 1 > o.
th

Concerning (2.5.3) and (2.5.4), they follow from the boundedness

of h. Q.E.D.

2.14 Example:

Let h and h' be bounded and fh'(v)g(v)dv > 0. In
addition let g be continuous and nonincreasing on [0,«0 and
h(v) > 0 for all v > 0. If M.n satisfies (2.5.1) - (2.5.3)
then (2.5.1) - (2.5.4) hold for ﬁn.

2:222:

Since h is bounded, (2.5.3) and (2.5.4) hold. Also since
(2.5.1) - (2.5.3) hold for Mn and (2.9.1)-(iii) is satisfied,
Lemma 2.9 will imply (2.5.1) and (2.5.2) if we show that
llm t'l‘Ht) 0. liocmmc of tho. hmmdodnmm of h', H fullnwn

th
that

1"(0) = j‘h'(v)g(v)dv > 0

which implies the required result. Q.E.D.

CHAPTER 3

ASYMPTOTIC NORMALITY OF THE MODIFIED PROCEDURES

In this chapter we shall use the following l-dimensional

version of a theorem in Fabian (1968b).

3.1 Theorem:

Let 3n be a non-decreasing sequence Of o-fields, 3n C 3.
Suppose Un’ Vn, Tn, Fn, on are random variables, 00, P, Q E R,

and F > 0. Suppose Fn, ‘n-l’ Vn_1 are Sn-measurable, C, a, B 6 R,

and
(3.1.1) I‘n-oI‘, en-oe,Tn«T, or 1&:[\Tn -T\]..o .
2 2
(3.1.2) Eynwn] - o, c > 1%;an - °o\ -» o ,
and with
2 2
(3.1.3) o =E[x V-\ l :
jsr [‘Vj‘zzrja] J
let either
2
(3.1.4) lim 0 r = 0 for every r > 0 , or
M j.
1 n 2
(3.1.5) a = 1, 1im.;' 2 O r = 0 for every r > 0 .
n—m j=1j’
Suppose 3+ = B if a = l, and 8+ = 0 if a # 1.
(3.1.6) 0<asl,Ose,a+<2I‘, and

15

16

n

-(CY‘-B)/2 -a_B/2
(3.1.7) Un+1 = (1 - n‘drn)Un +'n qun + n T

Then the asymptotic distribution of nB/ZUn is normal with

(3.1.8) mean - 2T(2F - 5+)-1, and variance = 0362(2F - 3+)-1.

(For the proof see Fabian (1968b).)

3.2 .Asymptotic Normality Theorem:

Let a0, a, c and 5 be positive numbers, y be a non-
negative number and go 6 R. Consider the modified procedure (1.2)

with

an = a/n, cn = c/nY, 0 s y < a, n = 1,2,... ,

*)

and suppose that Xn a e .
(3.2.1) Let h be continuous a.e. with respect to G,
Y' exist at 0 and Y'(O) = H(h) > 0 ;

= - ___3_
(3.2.2) B 1 2y and a > 2a0H(h) .

Set

2 2 2 2
(3.2.3) 3 (t) = j[h(t+v) - 11m] g(v)dv, 80(h) = s (0);
and assume that

(3.2.4) the function 82 is bounded by a number 02 and

is continuous at 0.

In addition, let an, ﬁn be Ih-measurable random variables, and

with s = B/(Zy) if y # 0; s = 0 otherwise, assume that

 

*
)This will be satisfied if an satisfies (2.5.1)-(2.5.3), since
(2.5.4) will be implied by (3.2.4) (cf. Theorem 2.5).

17

- 8
(3.2.5) CnIMn(xn) - an(xn - e) + gncn ,
(3.2.6) an "0 do and C11 --0 CO .

e/z '

Then the asymptotic distribution of n (Xn - e) is normal with
(3.2.7) mean = -2a eSH(h)go[2s aOHCh) - 53'1
and
(3.2.8) variance = azc-283(h)[28 aoH(h) - B]-1 .
Proof:

The proof will be established by verifying the conditions

of Theorem 3.1. We have

(3.2.9) (Xn+1 - e) = (Xn - e) - anc;1Mn(Xn) + snot-112n ;
where
(3.2.10) Zn = -(h(Yn) - ﬂn(xn))

Define the following Ih-measurable random variables

(3.2.11) Hn = H(h) if Mn(Xn) = 0 ,

-1 ~ ..
Mn (xn)Mn(xn) if Mn(Xn) # o, n — 1,2,...

Then the term anc;1Mn(Xn) in (3.29) can be written as

aancgan(Xn); and using (3.2.5) we obtain

a e-gﬁ (X ) = a a H (X - e) + a csH g .

nnnn nnnn nnnn

Since anc: = a csnml-B/2 and a c-1 = a c-ln-g-Blz,

we can
1'1 n

rewrite (3.2.9) as

18
(3.2.12) (xn+1 - e) = (xn - e)(1 - a adHnn-l) +~s e'ln'(1+'3)/Zzn

sn-l-B/Z

'86 Hngn 0

Apply Theorem 3.1 with

a = 1’ 3n = I'n’ Ian = a O[an’ Un = (xn - 9), Qn = a/c, vn = zn’ and
T =‘8 CSHg o

n n n

Now cngn(Xn) a 0 by (3.2.5) since Xn a e, and this with (3.2.1)

implies that Hn a H(h). Thus
3
I‘In - a a0H(h), Tn —. -a c H(h)g0, en -. e = a/c .
Also from (3.2.3) and the continuity of S2 at 0 we obtain
E [Z]=SZ<M (X)) 2320,)
1h n n n 0 ’

So we have shown that (3.1.1) and (3.1.2) of Theorem 3.1 are

satisfied with P = a aoH(h), 9 = a/c, T = -a csH(h)Q0, C = oz

2 2 2
and 0O S (0) So(h).

Concerning (3.1.4) we have

(3.2.13) a? = s[x

2 _ 2 .
’r [zizmzjJ 43:15, 2 2133 ’

j [zjzrj]

the conditional expectations form a uniformly integrable sequence

since they are dominated by E 22 = 82(Mj(xj)) S 02, by (3.2.4).

I J
J
Thus (3.1.4) will be verified if we show that the conditional

expectations converge to 0. But the j-th conditional expectation

is equal to Q (M

j j(Xj)) where

l9

(3-2-14) Qj(t) = 15(t’v)x[§(t,v)2rj]c(dv) s

with
g(t,v) = [h(t+v) - Y(t)]2 for all t,v e R.

The integrands in (3.2.14) are uniformly integrable as t a 0;
since they are dominated by g(t,v) for all v E R and

f§(t,v)G(dv) = 82(t) a S§(h) as t.2 0. Thus to show' Q (t) #10

J
as t a 0 and j a m, it is enough to show for almost all (with
respect to G) v E R, g(t’v)X[§(t,v)2rj] a 0 as t ~ 0 and

j a a; and for this it is enough to show for almost all (with
respect to G) v E R, g(t,v) a h2(v) as t a 0. Let A be the
set of points at which h is discontinuous. A has a probability
zero under G. By the continuity of h on R - A, and since by
(3.2.1) Y is continuous at 0, then g(t,v) « h2(v) as t a 0
for all v 6 R - A. This completes the proof of (3.1.4). The
measurability condition follows from the definition of Tn, Qn-l
and Vn_1. Also, by (3.2.2), 2a aoH(h) > 3. Hence the con-

clusion of the theorem follows by Theorem 3.1. Q.E.D.

3.3 Remark:

Because of our interest in the question of Optimality of
the transformation h in the modified procedure (1.2), the pre-
ceding theorem is stated in terms of the behavior of Mn’ (see
conditions (3.2.5) and (3.2.6)), with sufficient conditions on h
in order to guarantee the asymptotic normality of (1.2). But the
theorem can also be applied to the modified procedure directly

because this procedure can be written as the original procedure

20

with a change of the meaning of Yn's and with h equal to the

identity transformation.

3.4 Remark:

A sufficient condition for (3.2.4) is that (h(t+v) - ‘l’(t))2

are uniformly integrable for t in a neighborhood of O.

3.5 Theorem: (KW situation)

Let f, Yn satisfy the conditions Of the KW situation in
§2.4 with possibly the exception of (2.4.5). Let f” exist and
be continuous in a neighborhood Of 9 with f”(e) = M > 0. Con-
sider the modified procedure with a transformation h which is
continuous a.e. with respect to G; further assume that (2.5.1) -
(2.5.3) are satisfied for Mn if they are for Mn' Let Y'

exiSt at 0 with Y'(0) = H(h) > 0 and

(3.5.1) a =

c =
n 9

2.
n
nY

elm

,y=4l,a>[8MH(h)]-1.

In addition let 82(t)

2
by a constant a and continuous at 0. Then Xn a e and the

%

j[h(t+v) - Y(t)]2g(v)dv, t E R, be bounded

asymptotic distribution of n (Xn - G) is normal with

(3.5.2) mean = 0, and variance = a2c-283(h)[4M a H(h) - %]-l .

Proof:

It is known (cf. Burkholder (1956)) and can be easily
shown that ‘Mn(Xn) = f(Xn + cn) - f(Xn - en) satisfy conditions
(2.5.1) - (2.5.3) and thus by Theorem 2.5 we conclude that Km 4 9,
since donditions (2.5.4) and (2.5.5) are satisfied, too. The rest

of the conclusion of the theorem follows simply by verifying the

21

conditions of Theorem 3.2. The only conditions of Theorem 3.2
which are not directly assumed in our theorem.are (3.2.5), (3.2.6)
and (3.2.2). Let A(x,cn) = f(x+cn) - f(x-cﬁ), x E R. Let

I = [e - e, e +‘e] be an interval on which f” exists and is con-

t inuous . Define

(3.5.3) cp(x) = (x-e)'lt' (x) - f”(e) for x e 1 with (9(9) = o,

0 otherwise.

Then by eXpanding A(x,cn) as a function of cn and substituting

for f’(x) from (3.5.3), it follows that
(3.5.4) c;1A(x,cn) = 2(x-9)[f”(9) + (900] + 'n(x,cn)cn

where (p(x) -0 as x-oe and Tl(x,cn) -+0 if cn-oo and
x a 6. Obviously p and ﬂ(-,cn), the latter as defined by

(3.5.4), are Borel measurable functions. Then

cnan(xn) O’n(xn - 9) + gncn

with In-measurable an 2[f”(9) + qun)] and ﬂu = “(xn’cn);
further an a 2f”(e) = 2M and 5n a 0. Thus (3.2.5) and (3.2.6)
hold with do = 2M, Q0 = 0 and s = 1, since 5 = %. Condition
(3.2.2) then follows from (3.5.1). Hence conditions of Theorem

3.2 are satisfied. This completes the proof. Q.EJD.

3.6 Theorem: (KW situation)

In Theorem 3.5 let (3.5.1) be replaced by

a __c__ __1. -1
(3.6.1) an = n’ c — nY , y - 6 and a > [6M H(h)] .

22

Moreover let f” exist and be continuous in a neighborhood Of 9. Then
Xn a 9 and the asymptotic distribution of n1/3(Xn - e) is

normal with

(3.6.2) mean = -(2/3)ac2H(h)f”’(e)[4M a H(h) - g-j'l, and

variance = a2c-233(h)[4M a H(h) - $3-1 .

2.2M:

It is again easy to conclude, by Theorem 2.5, that Xn a 9,
and then it remains to verify (3.2.5), (3.2.6) and (3.2.2) of Theorem
3.2 in order to complete the proof Of the theorem. Let A(x,cn) =
f(x+cn) - f(x-cn), x E R. Let I = [e - e. e + e] be an interval
on which f” exists and is continuous. Similarly as in the proof

of the preceding theorem and with the same p, we obtain that

- 2
(3.6.3) enanOin) - sum“ - e) + gnen
II .. 1 I” .
with Ih-measurable ah = 2[f (e) + h(xn)] and g“ - §{f (9) + H(chn)],
further an « 2f”(9) = 2M and £0 = %’fm(e). Thus (3.2.5) and
(3.2.6) hold with so = 2M, Co = % r”'(e), B =% and s = 2. (Ion-

dition (3.2.2) follows from (3.6.1). Hence conditions of Theorem

3.2 are satisfied. This completes the proof. Q.E.D.
The following theorem is presented here to cover a case

treated in Albert and Gardner (1967) and the results below will

be used later in Chapter 5 of this paper.

3.7 Theorem:

Let fn be a sequence of Borel measurable functions defined

on R such that (2.5.1) - (2.5.3) are satisfied for Mn = fn with

23

en = l and fn(e) = 0. Let d > 0 and

(3.7.1) Dn(x) = (x-e)'1tn(x) if x # e,

d if x = e; n = 1,2,...

be continuously convergent at e to d. Consider the modified
procedure (1.2) with a transformation h which is continuous a.e.
with respect to G, and let h be such that (2.5.1) - (2.5.3) are
satisfied for Mn if they are for Mn. Also let Y' exist at 0,

Y'(0) = H(h) > 0 and
(3.7.2) a = -’, a > [2d H(h)]‘l.

n

I[h(t+v) - Y(t)]2g(v)dv, t E R, be bounded

In addition let 82(t)
2

by a constant a and continuous at 0. Then Xn a e and the

’5

(X - e) is normal with

asymptotic distribution of n n

(3.7.3) mean = 0 and variance = a233(h)[2ad H(h) - 1]-1 .

2:222:

Under the given conditions we obtain, by applying Theorem
2.5, that Xn d e. To obtain the rest of the conclusion of the
theorem, we apply Theorem 3.2 for which we need only to verify

(3.2.5), (3.2.6) and (3.2.2). We have
Mam“) = fnmn) = Dn<xn> (xn - 9)

Since v = 0, then B = l and s = 0. Thus (3.2.5) and (3.2.6)

hold with Ih-measurable an = Dn(Xn) and gn = g0 = 0; further
d-d 11‘”

an a 0 - , since ( n)n=1

to d. Condition (3.2.2) follows from (3.7.2). Hence the

is continuously convergent at e

24
conclusion follows by Theorem 3.2. Q.E.D.

3.8 Theorem: (RM situation)

Let f, Yn satisfy the conditions of the RM.situation in
§2.3 with possibly the exception of (2.3.4); further let f’ exist
at 6 with f’(e) > 0. Consider the modified procedure (1.2) with
a transformation h which is continuous a.e. with respect to G
and for which (2.5.1) - (2.5.3) are satisfied for Mn if they are

for Mn' Also let Y' exist at 0, Y'(0) = H(h) > 0 and

(3.8.1) c = 1, a

n n =§ , a > [212' (emmn'1 .

2
In addition let S (t) = ([h(t+v) - Y(t)]2g(v)dv, t E R, be bounded
by a constant 02 and continuous at 0. Then Xu 4 9 and the

asymptotic distribution of n%(Xn - e) is normal with

(3.8.2) mean - 0 and variance = a233(h)[2a f’(e)H(h) - 1]-1 .

grate:
It is known (cf. Burkholder (1956)) that Mn satisfies

(2.5.1) - (2.5.3). In Theorem 3.7 let fn = f, then it follows

that conditions of Theorem 3.7 are satisfied with d = f’(e) > 0.

Hence the conclusion of the theorem follows by Theorem 3.7. Q.E.D.

CHAPTER 4

SPECIAL CASES AND RESULTS

4.1 Introduction:

In this chapter we show that the a.s. convergence (cf.
Fabian (1960)) and the asymptotic normality of procedure (F) follow
as a special case of the corresponding results for the modified
procedure.

Then we give the Optimal choice of a and c for the
modified procedure in both the KW and the RM situation. Also in
this chapter we point out the effect of taking more observations
at each stage.

Throughout the rest of this paper, we write N(p1,p§) to
denote a normal random variable with mean = p1 and variance = pg,

:4

and we also write Tn g if Tn is .asymptotically g-distributed.

4.2 The a.s. convergence and the asymptotic normality 9£_procedure (F):

In §4.3, §4.4 and §4.5 we consider the modified procedure
(1.2) with the choice h = the sign transformation, and we assume
that g is continuous at 0 and g(O) i 0. Hence H(h) = 2g(0) > 0;
furthermore, the sign function is continuous except at 0 and
32(t) = l - Y2(t) < m for all t E R, Since Y is continuous
at 0, then so is 82 with 83(h) = 1. ‘Moreover if (2.5.1) -

(2.5.3) are satisfied for Mn’ then (see Example 2.12) ﬁn satisfy

25

26

(2.5.1) - (2.5.4). Thus the statements in Results 4.3, 4.4, and

4.5 follow from Theorems 3.5, 3.6, and 3.8, respectively.

4.3 Result: (KW situation)

Let f, Yn satisfy the conditions of the KW situation in
§2.4 with possibly the exception of (2.4.5). Let f“ exist and

be continuous in a neighborhood of 9 with f”(e) = M >.0. Let

(43-1) 8,, = in en = 3;. v = 1/4, and a > [16M g(on'l,
n
Then Xu 4 e and
t .2 2 -2 -1
(4.3.2) n (xn - e) 4 N(0, a c [8Ma g(O) - g] ),

4.4 Result: (KW situation)

In Result 4.3 let (4.3.1) be replaced by
a c _ -1
(4.4.1) a = :3 c = -;, v - 1/6 and a > [12M g(0)] .
n

In addition let f” exist and be continuous in a neighborhood of

6. Then Xn a e and

2 I” 2 -2
(4.4.2) 1/3(X _ 91:15 Sit/3%? C 2(0)f (6) a c

n n a N(- 8M8 8(0) - 2/3 ’ 8Ma g(O) - 2/3 )°

4.5 Result: (RM situation)

Let f, Yn satisfy the conditions Of the RM situation in
§2.3 with possibly the exception of (2.3.4). In addition let f’

exist at e, f’(e) > 0 and

(4.5.1) cn = 1, an = é, a > [4f'(e)g(o)]'1 .

:3

27
Then X a e and
n

(4.5.2) nk n - 9) 211(0, a2[4f’ (e)a g(0) - 11'1) .

4.6 Remark:

With the choice h = the identity transformation, the
reader may easily check that the a.s. convergence and the asymptotic
normality of the original RM and KW procedures follow as special
cases of the corresponding results for the modified procedure pro-

vided Assumption 2.7 holds and Iv2g(v)dV«< m.

4.7 The optimal choice pf (a,c) i the Kﬂ_situation:

Let g be a normal random variable with mean and variance
given by (3.6.2). Then

8282 (h) -2

 

(4.7.1) E «La/9)€12 H sz’ﬂ‘e) e4 +
(4Ma H(h) - 2/3)2 (4Ma H(h) - 2/3)

We shall find the Optimal values Of (a,c), i.e. values for which
E g2 is minimized with other quantities being fixed. We assume
that fm(e) # 0. TO find the value c(a) of c > 0 that
minimizes the R.H.S. of (4.7.1) for a fixed a > [6M H(h)]-l, let

8283(h)
‘4Ma H(h) - 2/3 ’

 

B = (4/9)a2u2(h)t”'2(a)
(4Ma H(h) - 2/3)2

Y(c) = A c"2 +-B c4 .

A = and

Differentiating Y with reSpect to c we get

Y'(c) = -2A c-3 +’4B c3 .

Thus [c(a)]6 = gg'; that is

28

g_(4Ma H(h) - 2/3)

(4.7.2) [c(a)]6 = 2 2
H (h>f” (e)

2
So(h) .

on

With c = c(a), (4.7.1) becomes

2
1/3 a [33(h)]2/3

(4M8 H(h) - 2/3)

 

(4.7.3) E t2 = [3H2<h>f’”2<e>1 473 -

Now the R.H.S. is minimized by the choice

1

(4.7.5) a = 2M_H(h) ,

which can be easily verified. Therefore the Optimal choice of

(a,c) is given by

1
(4-7-6) (me) = (m)- . [g-f”’2(e)[S(2,(h)/H2(h)]]1/6)-

With this Optimal choice of (a,c), (4.7.1) becomes

1/3

(4.7.7) E g2 = (3/16M2)[(9/4)fm2(9)] [SS(h)/H2(h)]2/3 .

4.8 The optimal choice pf a i the RM.situation:

Let g be a normal random variable with mean and variance

given by (3.8.2), then

2 2
2 _ a 30(h)

(4.8.1) E g 2a f’(e)H(h) - 1

 

*
The optimal choice of a, say a , is the value of a which

minimizes E g2 with other quantities fixed. Thus one can easily
check that

* 1

(4.8.2) a = fr(6)H(h)-

*
With a = a , (4.8.1) becomes

29

(4.8.3) E g2 = S§(h)/(f’2(e)H2(h)) .

4.9 Remark:

We notice that the unpleasant feature of the optimal values
of a and c is that they depend on values, (f’(9),f”(e), fm(e)),
which are, in general, unknown. But the value of a in the RM
situation and the value of (a,c) in the KW situation can be
estimated during the approximation process and fed back into
the procedure.

For the original RM procedure, Venter (1967) used a pro-
cedure, (later generalized by Fabian (1968b)), which estimates
the Optimal value of a. Recently in Fabian (1971) a procedure
was described which itself estimates the optimal value of a
for a modified version Of the original KW procedure. The same
ideas can be used to obtain a procedure which estimates the Optimal

value of a or both (a,c) for the modified procedure.

4.10 Effect p§_taking m Observations §£_each stage:

Suppose that an experimenter observes m random variables

Y Y instead of one, Yn’ at stage n such that these

n,1’°"’ n,m

m random variables are conditionally, given In, independently
distributed according to G. Suppose he then uses pl" glhan’j)
instead of h(Yn) in the modified procedure (1.2). Tge condi-
tional expectation of the average will be the same as that Of
h(Yn) and the conditional variance will only be changed by a

factor of (l/uo and it is easy to see, under the conditions of

Theorem 3.2, that this will result in changing the variance of the

30

asymptotic distribution by the factor (l/m). Thus, in the KW
situation, under the conditions of Theorem 3.6 we obtain
2

2 2
2 a c 30(1))
1/3 at 12/3)a c #1911191)
(4'10” “ (xn ' 9) " N" 411a H(h) - 2/3 ’ m(4Ma H(h) - 2/3)) °

 

0n the other hand if the experimenter wishes to continue the
modified procedure for nm stages rather than using averages,

then by Theorem 3.6 we have

 

1/3 ,5 (2/32a c2f’”(§)HQ!) “2 c2320”)
(4'10'2) (mu) (xnm ' 9) I N(' 4Ma H(h) - 2/3 ’ 4Ma H(h) - 2/3 ) °

Let fm(9) # 0. Then in (4.10.2) the optimal choice of (a,c)

(see (4.7.6)), is given by

82 0(h)
(4.10.3) <a.c)= (-,——M m) .1-3- f"’2<e)[H° 7—.111’6)
H(h)
while the Optimal choice of (a,c) in (4.10.1) is given by
2
s (h)
(4.10.4) (a,c) = (517711713 . 1% f’”2(e)[—o"2——]]
m H (h)

Let §I,§2 be the normal random variables on the R.H.S. of
(4.10.1) and (4.10.2), respectively. Then with the corresponding

optimal choice of (a,c), one can easily check that
2 2

Therefore using average of m independent observations at each
stage is asymptotically equivalent to continuation of the modified
procedure for nm stages. The only effect is in decreasing the
optimal value of c by a factor Of (l/m)1/6. Also, in the RM

situation, under the conditions of Theorem 3.8, and by using

average we Obtain

31

(4.10.5) (an)"‘(xn - 9) émo, a2s§(h)/(2a f’ (9)H(h) - 1)),

while if the modified procedure is continued for nm stages,

then from Theorem 3.8 we Obtain
(4.19.9) (nn)”(xnm - 9) 49(o,azs§(h)/(2£' (9)9 H(h) - 1)) .

Hence from (4.10.5) and (4.10.6) we see that, in the RM situation,

asymptotically there is no effect.

CHAPTER 5

OPTIMAL TRANSFORMATIONS

5.1 Introduction:

We have seen, in Chapter 3, that the asymptotic results
for the modified procedure (1.2) in both the RM situation and the
KW situation are special cases of the situation described in
Theorem 3.2. The second moment of the asymptotic distribution of
nB/2(Xn - 9) can be written (see (3.2.7) and (3.2.8)) as

4a2c28g§ a2c-283(h)

+ r .
(Zaao - (B/H(h)))2 (28“0H(h) ' B)

If g0 # 0 then, with the optimal choice of (a,c), (5.1.1) becomes

 

 

(5.1.1)

2
3 2 1/ +1 (+2/s+1 2 2
(5.1.2) 95 ((s+1)/s) [4s 591 9 [91335738 [80(h)/H (n)]

“o

s/s+l

Both (5.1.1) and (5.1.2) cover the KW situation, but in the RM
situation c = 1, go = 0, and B = 1, thus (5.1.1) reduces to
a283(h)

Zaa0H(h) - 1 ’

 

(5.1.3)

which with the optimal choice of a becomes
2 2 2
(5 .1 .4) [30 (h) /aOH (11)] .

All the above expressions depend on h only through H(h) and

33(h). We notice that H(bh) = bH(h) and 83(bh) = b283(h) for

32

33

any positive number b, and so a change from h to bh does not
affect (5.1.2) and (5.1.4). But if we change a to (a/b), then
the stochastic approximation procedure will not change and (5.1.1),
(5.1.3) will be unchanged, too. This shows that it is enough to
consider transformations with S§(h) = 1. Then any one of the
above expressions is minimized by the choice h which maximizes
H(h).

Let H’ be the family of all Borel measurable transformations
h such that h satisfies Assumption 2.8, S§(h) = l, and H(h)

can be computed by differentiating under the integral sign; that is

0 < 11(1)) = Y'(0) = fh(V)(-g'(V))dv -

*)
5.2 Theorem :

Let the density g have a derivative a.e. with respect

to G. In addition let
, 2
(5-2-1) 0 < 1(3) = ((3 (V)/g(V)) dG(V) < m .

and set F = [I(g)]g. Suppose that h =-%'(g'/g) a.e. with respect

0
*
to G and h0 61/. Then within )1, H(h) is maximized by h if
*
and only if h = h0 a.e. with respect to G.

Proof:

We have

110)) = fMV) (~8'(V))dv .

 

*

) The author wishes to thank Professor Peter Bickel for calling his
attention to the possibility of using the Schwarz inequality in
proving the theorem after it was proved by a different method.

34
By Schwarz inequality, we obtain

(5-2-2) 1120)) S (fh2(V)s(V)dV)-f(s'(V)/s(V))Zg(V)dv = f(g‘2(V)/s(V))dV-

' 2
Since 0 < f(g'(v)/g(v))2g(v)dv < co and J‘h (v)g(v)dv = 1 then
equality holds in (5.2.2) if and only if (cf. Theorem 3.5 of Rudin

(1966, p. 61)) there exists a non zero constant K such that
h = K(-g'/g) a.e. with respect to G.

To have H(h) > 0 we must have K > 0 and to satisfy 83(h) = l

we have to have K = %5 This completes the proof. Q.E.D.

5.3 Remark:

A sufficient condition for H(h) to be equal to
fh(v)(-g'(v))dv is that for a certain neighborhood, N, of 0 the
family {h(x) gsgigéiﬁleq t E N-{0}} be uniformly integrable with

respect to G.

5.4 Definition:

Suppose that 0 < I(g) < m, h* = - %’(g'/g) a.e. with
respect to G, and h* 6 NZ In addition suppose that the modified
procedure is used with h = h* for which Km 4 e and nB/ZOKn - 9)
has asymptotic distribution as given in Theorem 3.2. Then we shall

*
call h the Optimal transforpption.

5.5 Remark:

An optimal transformation h has the following property.

For any competitor h €.ﬂ' to which Theorem.3.2 also applies,

 

I'II‘IIII'IIIIilliA 1" All

35

the second moment of the asymptotic distribution is larger than
* *
that for h unless h = h a.e. with respect to G.

A modified procedure using an Optimal transformation will

be called optimal procedure.

5.6 Asygptotic efficiency 2; optimal stochastic approximation

procedpres; the RM,situation:

The surprising fact is that, in the RM situation, the
optimal stochastic approximation procedures are not only optimal
within the class Of approximation procedures considered but also
they are asymptotically efficient within the general class of
regular unbiased estimators of e, the parameter to be estimated.
This is true despite the very simple recurrence relation that
generates the approximation sequence Xn.

In more detail, we show that the variance of the asymptotic
distribution of nk(xn - 6) corresponds to the Cramer-Rao lower
bound for the variance of an unbiased estimator based on the first
n observations. We shall consider the following two cases:

(i) a straight line case; f(x) = d(x-9) with d > 0 known,
(ii) a case of sequence of functions fn(e) which are known
except for 9. This second case was considered and studied in

Albert and Gardner (1967).

5.7 The straight line case:

Let

(5.7.1) Yn = f(Xn) + Vn’ n = 1,2,...

36

where f(x) I d(x-e) with 9 unknown but assumed to belong to
some interval, @; 6 positive and known. In addition assume that
Vn =‘Yn - f(Xn) are independent and distributed according to G
which satisfies the conditions stated in Theorem 5.2.

The information contained in [Y1,Y2,...,Yn] is the same
as that contained in [21,ZZ,...,Zn], where 23 =‘Yj + dxj,
j = 1,2,...,n. Then the Cramer-Rao lower bound for the variance
of any unbiased regular estimator of 9, based on the first n
observations, is given (cf. Theorem 4.1.1 of Zacks (1971), p. 186)

-l

by In - n-1(dF)-2. Thus the asymptotic efficiency of Xn is 1,

since n80!n - 8) 3-€'N(0,(<11")-2)«

5.8 A.case _£.g seguence p£_functions:

 

Let
(5.8.1) Yn = £n(e) + Vn , n = 1,2,...

be observations on known functions fn except for 9 which is
assumed to belong to some interval, @. Let the error random
variables Vn = Yn — fn(e) be independent and distributed accord-
ing to G which satisfies the conditions of Theorem 5.2. Further-
more for each n let fn have the same unique root 9 E O, fn
exist at e and f;(e) 2 d where d is positive and known. Also
let fn satisfy the conditions stated in Theorem 3.7.

Albert and Gardner have shown (cf. Theorem 5.2 of Albert
and Gardner (1967), p. 68) that in this case Xn will be
asymptotically efficient among regular unbiased estimators of e

’5

if the variance Of the asymptotic distribution of n (Xn - e) is

37

(dI‘)-2 which is true for our Xn generated by the Optimal pro-

cedure. Hence our Optimal procedure is asymptotically efficient
within the class of regular unbiased estimators Of 6.

Albert and Gardner (1967; see Chapter 5 there) tried to
increase the efficiency of the RM type procedure which they used
in their monograph by making transformation of the parameter
space @. Their procedure stayed less efficient except when the

error random variables are normally distributed.

5.9 Theorem:

The KW procedure as well as the RM procedure are optimal
if and only if the error random variables are normally distributed.
treat:

The procedure is optimal if and only if (-g'/g)(v) = Cv
with a constant C > 0, and this is true if and only if G is a

normal distribution. Q.E.D.

5.10 Theorem:

Procedure (F) is Optimal if and only if the error random
variables have a double exponential distribution.
29.292:

Procedure (F) is optimal if and only if (-g'/g)(v) =
C sign (v) with a constant C > 0 and this is true if and only

if G is a double exponential distribution. Q.E.D.

5.11 S exagples‘_§,ppg_optimal procedures:

In the following we give four examples of new optimal

procedures which are different from the original RM and KW

38

procedures. The first three examples fall under the case of

Example 2.14.

5.12 Exapple:

Let G have a Cauchy density function

1 v E R .
l+v
(Recall that random errors with this density function are not
allowed by the RM procedure as well as the KW procedure, since both
of them require the existence of the second moment of C.) One
can check that P2 = 5/2. Let
1 . k v
h1(v) = - f'g (v)/g(v) = 2(2/5) --§-, v E R.
l+v
This is an odd bounded monotone transformation with a bounded
first derivative. Hence one can easily check that h1 6 W3 and
h1 maximizes H(h). Furthermore h1 preserves the a.s. con-

vergence (see Example 2.14) and it satisfies the conditions of

Theorem 3.2. Thus h1 is an optimal transformation.

5.13 Exagple:

Let C have a Student's t-density function given by

.4 _1_ Pawn/2) 2 '(1‘1'V)/2
g(t) f3“,- F(\)/2) (1 + t IV) , t E R, V > 0,

where v here stands for the degrees of freedom. With some

manipulation one can check that

 

1 t
h (t) = - - (g/(t)/s(t)) = C . t e R.
2 I‘ v V1132

where Cv is a positive constant for which

39

m 2 c 2
C (--) g(t)dt = 1 .
L V \rl'tz

Again this transformtion, 112’ has the same properties as

h1 in §5.12, above. Hence h2 is an optimal transformation.

5 .14 Exar_n_ple:

Let C have a logistic density function given by:

1

g(v) ’ 2(1 + Eosh(v)) ’ V E R'

We leave it to the reader to check (see §2.l4) that the optimal
transformation is given by

s inh (v)

113“) = C3 l+cosh (v)

SVERS

a

where C3 = 2 .

5 .15 Exagple:

Let G have a density function given by:

2
g(v) = % e”v /2 if ‘v‘ < T

0 e-T‘v|+(T2/2)

f2}?

where Co and T are positive constants.

, if \v‘ 2 T ,
This g behaves like a normal density for small v,
and then like a double exponential for large v. It follows that

A
4Tsign(v) if M 2T,

- I1: (g'(v)/g(v)) .. c v if M < T

C

40

where C4 also depends on T. Denoting this transformation by

h4 we see that h4 is an odd, bounded and nondecreasing trans-
formation which satisfies Assumption 2.8. Thus h4 6 ﬂ' and

h4 - - %'g'/g a.e. maximizes H(h). Also h4 preserves the a.s.

convergence (see Lemmas 2.9 and 2.10) and it satisfies the con-

ditions of Theorem 3.2. Hence h4 is an optimal transformation.

5.16 'Modified procedures pypmeans p£_ppitable transformations:

In Theorem 5.2 we characterized the Optimal transformation
for a given density function g which has a derivative a.e. with

respect to G and for which

0 < 1(8) = f(s'(V)/g(V))sz(V) < 0°.

What about those distributions for which I(g) = 0 or ad For
example, if g is a constant symmetric density function then
g' - 0 a.e. and thus I(g) = 0. Also if g' itself is con-
stant we have I(g) = a (let g be triangular on (-l,l)).

In the following we give two cases to show how to design
transformations that yield better behaved procedures; but a

unique optimal transformation does not exist in these cases.

5.17 A_case in which g' 8 0 a.e. (G):

 

 

Let G be the uniform distribution on (-b,b), b > 0;

that is

g(v) = 1/-"-b . -b<v< b,

0 otherwise.

41

Let r > 1, and consider a transformation of the form:

hr(v) I ,/2r+l \v/b‘rsign(v) , ‘v‘ < b

[214-1 sign(v) , ‘v‘ 2 b .

This is a bounded nondecreasing transformation which satisfies
Assumption 2.8 and one can check (see Lemmas 2.9 and 2.10) that
this transformation preserves the a.s. convergence Of the pro-
cedure (1.2) as well as its asymptotic normality. Let f, an

be as in Theorem 3.8 with a > [2f’(e)(2r+l)%]-1. Then for the
RM procedure, by using (4.8.3) with the choice h = the identity
we obtain

1 92

f’2(9) 3

while with the choice h = hr’ for r > 1, (4.8.3) gives

 

(5.17.1) E g2 =

l b2

f'2(9) (29+1)

 

 

(5.17.2) E g2 =

Therefore we see that the ratio Of (5.17.1) to (5.17.2) is always
larger than 1 for r > 1. Taking, e.g. r = 10, will result in

decreasing E g2 by the factor %‘.

5.18 A case i which g' = constant a.e. (G):

 

 

In particular we consider the density

g(v) = 1 -\v\ , -l<v<l

0 otherwise .

Let 0 < r < l, and consider the transformation

42

(5.18.1) hr(v) - cr[—1-—; - 1]sign(v). -1 < v < 1 ,
(1-1V\)
Cr sign(v) otherwise ,

where Cr - %'[(l-r)(2-r)]%. This transformation satisfies

Assumption 2.8 and it can be checked by direct calculations that

. _ = 2-r
Y (0) - H(h) 2 l-r .

 

Also we can verify that this transformation preserves the a.s.
convergence and satisfies conditions of Theorem 3.2. Let f,
an satisfy the conditions in Theorem 3.8 with a > [2f’(9)H(hr)]-1,

then using (4.8.3) with the choice h = hr’ it follows that

(5.18.2) E g2 = -—% (1")

(9) “2“) '

while with the choice h = the identity (4.8.3) gives

 

l
,2

 

(5.18.3) E g2

.1.
r (9)6
To see the effect of this transformation in the KW situation, let

f, an, c satisfy the conditions stated in Theorem 3.6 with

n
a > [6M H(hr)]-1' Then with the optimal choice of (a,c) provided
that fm(9) * 0, it follows (see §4.7), with the choice h = hr’
that
2 3 ﬂ 2 3
(5.19.4) E F. - -- [(9/4)f’18)]1/3[1—'——J ’
2 4(2-r)

16M
while for the KW procedure (h = the identity) we have
(5.18.5) E g2 = -——-[(9/4)£MT9)]1/3[:]2/3

16M2
Thus we are able to make the ratio of (5.18.5) to (5.18.4) (or

(5.18.3) to (5.18.2)) as large as we wish by choosing r close to 1.

CHAPTER 6

ASYMPTOTIC RELATIVE EFFICIENCY OF THE MODIFIED PROCEDURES

In Chapter 3 we have shown, under some regularity conditions,

2
Bl (Xn - 9) 34 § where g is normally distributed

that, for a B > O, n
with certain mean and variance. As a reasonable measure of com-
parison between different procedures we use E 52 either with a
and c chosen optimally or with c fixed and a chosen to
minimize E g2.

Our results in Chapters 3, 4 and 5 make it possible to
compare different transformations, and we do so for the identity
transformation and the sign transformation. We also compare
the sign transformation to the Optimal transformation.

To the knowledge of the present author there have not
been any study of the asymptotic relative efficiency (A.R.E.)

of the RM and KW procedures relative to procedure (F) except some

simulation study by Springer (1969).

6.1 Definition g£_the asymptotic relative efficiency (A.R.E.);

 

Let 0% and 9% denote two modified procedures with
1 2 e/Z 1.
transformations h1 and h2 such that n (Xn - e) a gh and
l
nB/2(X - e) 14+; , respectively. Then the A.R.E. of 0
“ h2 h1
relative to 9k is taken to be
2

43

44

2
E[§h2]
e(9 :9 )=‘——2—'
h1 h2 E[§h1]

9 is called more efficient than 9 if e(9 ;9 ) > l.
h1 h2 h1 h2

6.2 Comparison 9; some transformations:

In Table I, we list, for four different cases, the values
of the second moment of the asymptotic distribution for the choices
h = the identity, h = sign and h = the optimal transformation
(see Definition 5.4). Also we list the A.R.E. of h = sign
relative to h = the identity, and the A.R.E. of h = sign
relative to the optimal transformation.

In cases (1) and (2) g is assumed to be continuous at
O with g(O) # 0 and 02 = Iv2g(v)dv < a; further in cases (3)
and (4) g is assumed to satisfy the conditions of Theorem 5.2
and to be continuous at 0 with g(O) # 0. Moreover in cases (1)
and (3) let f,

satisfy the conditions of the RM situation in

, and in addition let f’ exist at e and

:Im a
:3

§2.3 with a =
n
f’(e) > 0. Then with h = the identity (provided a > [2f’(e)]-1)
* -
and also with h = sign ) (provided a > [4f’(e)g(0)] 1), con-
ditions of Theorem 3.8 are satisfied. In cases (2) and (4) let

f, Yn satisfy the conditions of the KW situation in §2.4 with

a = E; c = £—-, y = 1/6 and in addition let f“ exist and be
n n n nY 7

continuous in a neighborhood of 9, and f”(e) = M > 0. Then

- *
with h = the identity (provided a > [6M] 1) and h = sign )

 

*
) See §4.2 and Results 4.4 and 4.5.

45

.Na can H

s mooHoSo «no you muasmmu
wchcoammuuoo onu mo :oHuwaschme uoouwv mp vmusaaoo mum mosao> ammo mHSu :H

.s.m coHuHaHHua «mm H«

Ask

 

 

 

 

 

 

 

 

 

 

 

 

 

N.H H N a Na NEH H as Nws NEH
N\Nﬂ~owNu¢g u A s. new N\N HaszHN\mvT m\N HavsmHN\NHu m Hey
Np a ﬁH: Neva m pawv up onv wva new Amv
. NoHNus H- N N. H- N N.
h m H N H8 Nws NEH a NEH

.O I m 0 O m. l

NoHovNNs u HHemNeVo :ﬁHov w NzNosNH NzNos\No MHW-HNV
.N
Noﬁovas a HH a. new H-ﬁonNwHovN.mqu HmvN.m\No HHV
Aw}va u I q: NS H:
As Huaauno swam zuﬂucmvH

.u.e.< ammo

 

cowusnauumav oHuougahmm «no mo uamﬁoa caooom onu mo mo=Hm>

 

 

H MAQ<H

 

46

(provided a > [12M.g(0)]-1), conditions of Theorem 3.6 are satisfied.
In case (2)-(1) f”(e) = o, and in case (2)-(11) f”(e) ¢ 0. Optimal
constants a,c are chosen except in case (2)-(i) where c is fixed.
Thus the values of the second moment of the asymptotic distribution
are obtained from (4.8.3) in cases (1) and (3) and from (4.7.7)

in cases (2)-(ii) and (4).

6.3 Remark:

From case (3) in Table I, the A.R.E. of procedure (F)
relative to the optimal procedure is given by e(h2;h0) =
4g2(0)/I(g'2(v)/g(v))dv; furthermore if c is fixed and
h0 = - %'(g'/g) is an optimal transformation, then from case 2
in Table I we also obtain that e(h2;h0) = 4g2(0)/I(g'2(v)/g(v))dv.
Recently Groeneveld (1971) has shown that the A.R.E. of the sequential
sign test relative to the sequential probability ratio (SPR) is
also given by 4g2(0)/I(g'2(v)/g(v))dv. This value is also the
A.R.E. of the sign test relative to the most powerful test for
with 9

testing H: e = 90 vs. H1: 9 = e > 90, (where observa-

1 1

tions are drawn from G with a symmetric density function
g(x-B), x 6 R), if competing tests are considered based on a fixed

sample size (cf. Héjek and gidak (1967), Chapter 7).

6.4 Computations of the A.R.E. Qf_certain transformations relative
£g_the optimal procedure for some given known distributions

of the error random variables:

In Table II, for the sake of easy computations (eSpecially

in the KW situation), we take the A.R.E. of a modified procedure

47

*
0h relative to the Optimal procedure 9 to be

cow") = [foo/(for(v>/g<v)>2g<v>dv>) .

 

 

 

 

 

TABLE II
a --.. _ ...- “"“7
Value of A.R.E.
2 2
[H (h)/(j'(g'(V)/8(V)) 8(V)dV)]
\ Transformation .
h (v) = v h (v) = sign(v) e(h ;h )
Distribu- 1 2 2 1
tion of (RM or Kw (Procedure (F))
errors procedure)
Normal 1 a Z
n 11
Double exponential -%- l 2
Uniform Unde f ined Unde f ine d 31
Triangular Undefined Undefined €-
Cauchy Undefined '8—2 = .810 Undefined
‘1 2
. . 9 n
Logistic "—2 .75 3—
4n

 

 

 

 

 

 

REFERENCES

10.

ll.

12.

13.

REFERENCES

Albert, Arthur E. and Gardner, LeLand.A. (1967): Stochastic
Approximation and Nonlinear Regression, Research Monograph
No. 42, The M.I.T. Press, Cambridge, Massachusetts.

Blum, J.R. (1954): Approximation Methods which Converge with
Probability One, Ann. Math. Statist., 22, pp. 737-744.

Burkholder, D.L. (1956): On a Class of Stochastic Approxima-
tion Processes, Ann. Math. Statist., 21, pp. 1044-1059.

Chung, Kai Lai (1968): A Course in Probability Theory,
Harcourt, Brace and World, Inc.

(1954): On a Stochastic Approximation Method, Ann.
Math. Statist., 51, 3, pp. 463-483.

Derman, C. and Sacks, J. (1954): On Dvoretzky's Stochastic
Approximation Theorem,.Ann.'Math. Statist., 39, 2, pp. 601-606.

Dubins, Lester E. and Freedman, David (1965): A Sharper Form
of the Borel-Cantelli Lemma, and the Strong Law, Ann. Math.
Statist., 363 3, pp. 800-818.

DupaE, V. (1957): On Kiefer-Wolfowitz Approximation Method,
Casopis Pest. Mat. 82, pp. 47-75.

Dvoretzky, Aryeh (1956): On Stochastic Approximation, Pro-
ceedings of Third Berkeley Symposium on Math. Stat. and
Probability, Vol. 1, pp. 39-59, University of California Press.

Fabian, V. (1960): Stochastic Approximation Methods, Czech.
Math. Journal 10, pp. 123-159.

(1964): A New One-dimensional Stochastic Approxima-
tion Method for Finding a Local Minimum of a Function, Trans.
Third Prague Conf. Information Theory, Statistical Decision
Functions, Random Processes, CzechoslovakHAcademy of Sciences,
Prague, 85-105.

(1967): Stochastic Approximation of Minima with
Improved Asymptotic Speed, Ann. Math. Statist., 28, 1, pp. 191-200.

(1968a): 0n the Choice of Design in Stochastic
Approximation Methods, Ann. Math. Statist., 32, 2, pp. 457-465.

48

14.

15.

16.

l7.

l8.

19.

20.

21.

22.

23.

24.

25.

26.

27.

28.

49

Fabian, V. (1968b): On.Asymptotic'Normality in Stochastic
Approximation, Ann. Math. Statist., 32, 4, pp. 1327-1332.

(1971): Stochastic Approximation, Proc. Symp. on
Optimdzing Methods in Statistics, Ohio State University, June
1971 (to appear). RM-280, Michigan State University.

Graves, L.M. (1956): The Theory of Functions of Real Vari-
ab les , MoGraw-Hi ll .

Groeneveld, RJA. (1971): A Note on the Sequential Sign Test,
The American Statistician, 33, 2.

Hajek, J. and Sidak, Z. (1967): Theory of Rank Tests, Academic
Press.

Hodges, J.L., Jr. and Lehmann, E.L. (1956): Two Approximations
to the Robbins-Monro Process, Proceedings of the Third Berkeley
Symposium on Mathematical Statistics and Probability, Berkeley

and Los Angeles, University of California Press, 3, pp. 95-104.

Kesten, Harry (1958): Accelerated Stochastic Approximation,

Ann. Math. Statist., 32, pp. 41-59.

Kiefer, J. and Wolfowitz, J. (1952): Stochastic Estimation of
the Maximum of a Regression Function, Ann. Math. Statist., 33,
pp. 462-466.

Lo‘eve, Michel (1963): Probability Theory, Third Edition,
D. van Nostrand Company, Inc.

Lukacs, Eugene (1968): Stochastic Convergence, D.C. Heath
and Company.

Pontryagin, L.S. et al (1963): The Mathematical Theory of
Optimal Process, Interscience Publishers, a division of John
Wiley & Sons, Inc.

Rao, C.R. (1965): Linear Statistical Inference and its
Applications, John Wiley & Sons, Inc.

Robbins, H. and Monro, S. (1951): A Stochastic Approximation
Method, Ann. Math. Statist., 33, pp. 400-407.

Rudin, Walter (1966): Real and Complex Analysis, McGraw-Hill
Book.Company.

Springer, B.G.F. (1969): Numerical Optimization in the Pre-
sence of Random Variability. The Single Factor Case,
Biometrika, 33, pp. 65-74.

29.

30.

31.

50
Ventor, J.R. (1967): An Extension of the Robbins-Monro Pro-
cedure, Ann. Math. Statist., 33, 1, pp. 181-190.

Wolfowitz, J. (1956): On Stochastic Approximation Methods,
Ann. Math. Statist., 31, pp. 1151-1155.

Zacks, S. (1971): The Theory of Statistical Inference, John
Wiley and Sons, Inc.