_THS

“-1“

10:22

LIBRARY
Michigan State
University

This is to certify that the
dissertation entitled

STRATEGIES OF REPEATED GAMES

presented by

MINGFEI Ll

has been accepted towards fulﬁllment
of the requirements for the

PhD. degree in Applied Mathematics

 

 

 

4‘" ,/"
/ M ﬂ 1/ I

 

Major Professor's Signature

0% /7, 2&0 2?
W (J r

Date

MSU is an aﬁinnative—action, equal-opportunity employer

-... -.—.—‘--.—.—.- an.-.—-na—-o —A—.-.—--.-.--.~.c—._-— —

 

 

PLACE IN RETURN BOX to remove this checkout from your record.
To AVOID FINES return on or before date due.
MAY BE RECALLED with earlier due date if requested.

 

DATE DUE DATE DUE DATE DUE

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

5/08 KthroleccaPres/C lRC/DateDue. indd

 

 

 

 

STRATEGIES IN REPEATED GAMES
By
Mingfei Li

A DISSERTATION
Submitted to
Michigan State University
in partial fulfillment of the requirements
for the degree of
DOCTOR OF PHILOSOPHY

Applied M athematics

2008

ABSTRACT
STRATEGIES IN REPEATED GAMES
By
Mingfei Li

In games that. are repeated, the players have the opportunity to use information on
opponents‘ past moves in selecting a move for the current stage. Strategies for Player
II are considered in this thesis. In particular, the Play Against the Past strategy
(PAP), the Play Against the Past plus Present strategy (PAP+), the Play Against
the Random Past strategy (PARP), and Hannan-type strategies are investigated.
especially in the repeated play of the two-person game called matching pennies. The
effectiveness of a strategy is measured in terms of difference in average loss and an
envelope loss: this difference is called regret. In some cases. exact expressions for
regret are derived; more often, asymptotic properties are derived.

The PAP strategy for Player II is not effective against all Player I move sequences.
Hannan (1957) used a Bayes response to random perturbations of Player I's empiri-
cal distribution of past moves as a strategy and established good asymptotic regret
properties uniform in Player I move sequences for the repeated play of a variety of
games. Gilliland (2004) and Gilliland and Jung (2006) introduced the PARP strategy
where the randomization comes through bootstrap sampling of Player I‘s past moves
and established results for the repeated play of matching pennies.

The PAP, PAP+. PARP and Hannan-type strategies are deﬁned in Chapter 2.
The adaptation of PARP to achieve regret results relative to k-extended envelr.)pes
is demonstrated in Chapter 3 for matching pennies. Chapter 4 documents cases
where strategies published following Hannan's seminal (1957) paper are unrecognized,
special cases of his work. PARP is discussed in the context. of the expert. selection
problem in Chapter 5. and regret asymptotics are derived for certain classes of Player

I move sequences.

DEDICATION

This dissertation is dedicated to my wonderful parents, J iequn and Shumay who
raised me, supported me, taught me and loved me. You have been with me every
step of the way, through good times and bad.

Also, this dissertation is dedicated to my grandparents, aunts and uncles, who
have been great sources of motivation and inspiration.

Thank you for all the unconditional love, guidance, and support that you have
always given me, helping me to succeed and instilling in me the conﬁdence that I am

capable of doing anything I put my mind to. Thank you for everything. I love you!

iii

ACKNOWLEDGMENT

From the formative stages of this dissertation, to the final draft, I owe an immense
debt of gratitude to my PhD supervisor. Dr. Dennis Gilliland. His sound advice and
careful guidance were invaluable. I have learned so much, and without you, this would
not have been possible.

For their efforts, assistance and support, a special thanks as well to the Dr. Zhou.
Dr. Yan, Dr. Weil, Dr. Page, Dr. Dass, Dr. Vorro, Dr. Bush, Barbara Miller and
all my good friends and students colleagues in Michigan State University.

Finally, I would be remiss without. mentioning Dr. Mark Hurwitz, Mr. and Mrs.
Mawhorter, whose extreme generosity will be remembered always.

To each of the above. I extend my deepest appreciation.

iv

TABLE OF CONTENTS

List of Tables

List of Figures

Introduction

1.1 Game theory ...............................
1.2 Repeated play ...............................
1.3 Summary of Thesis ............................

The PAP, PAP+, PARP and Hannan-Type Strategies

2.1
2.2
2.3
2.4

Play Against the Past (PAP) and Past plus Present (PAP+) .....
Play Against the Random Past (PARP) ................
Hannan-Type Strategies (H) .......................
Need for Fresh Randomizations .....................

PARP Strategy for k-extended Envelope Problem

3.1
3.2
3.3
3.4
3.5
3.6

Introduction ............................ ‘. . . .
Envelopes including Extended Envelopes ................
Two-extended envelope problem .....................
PAP Strategy in two-extended envelope problem ............
PAP+ Strategy in two-extended envelope problem ...........
PARP Strategy in. two-extended envelope problem ...........

Discovering Hannan

4.1
4.2
4.3

Foster and Vohra: Selecting Forecasters .................
Feder, Merhav and Gutmaanniversal Predictitm ............
Minimax Regret ..............................

Expert Selection Problem

5.1
5.2

5.3

Introduction and Review .........................
Focus Forecasting .............................
5.2.1 Introduction ............................
5.2.2 Methodology ...........................
Two experts selection problem ......................
5.3.1 Reduction to Z—problem .....................
5.3.2 Worst case discussion .......................
5.3.3 Hannan consistency of PARP for Certain classes ........
Examples of the application of PARP strategy .............
Future work ................................

V?

vii

viii

cube—*l-i

12
18
18
22
32
32

3 (i

~J \I x]
CH 4: A; to

(DOOOO\1\1N
Leer—doom

95

Bibliography

vi

96

4.1
4.2
4.3

5.1

LIST OF TABLES

Player I and Player II’s dynamic play to N 2 NO + 3 .........
Player I and Player II‘s dynamic play to N 2 N0 + +7711 + 3 .....
Non refresh randomness example with U 20.7. .............
An examine of Player l‘s actual moves in N213 trials .........
'I‘wo-extci'ided envelope problem .....................
Convergence of minimax regret .....................
Player I’s play sequence with N25. ...................
Player I's play sequence with N210 ....................
Forward Rate prediction: ARMA vs Simple Average ..........
Forward Rate prediction: ARMA vs GARCH(1.1) ...........

vii

1.1

2.2
2.3

4.1
4.2

.01 .0! s“ .61 .01
C)! pr:- 00 [\3 t—‘

01
C33

p—a

LIST OF FIGURES

Cesaro Loss vs Bayes Envelope .....................
PAP vs PAP-l— vs Envelope for i.i.d Bernoulli sequence ........
simplex ..................................
Non refresh randomness U207. ....................
Hannan, PARP and Minimax probability for N25 ...........
Hannan, PARP and Minimax probability for N210 ..........
Hannan, PARP and Minimax probability for N2800 ..........
Simulation .................................
Forward Rate History Data. .......................
Forward Rate prediction: True data vs ARMA vs Simple Average . .

Forward Rate: Bayes Envelope vs PARP Average Loss for cycle 5
Forward Rate: True data vs ARMA vs GARCH(1,1) .........

Forward Rate: Bayes Envelope vs PARP Average Loss for cycle 2

 

1Images in this dissertation are presented in color

viii

66

Chapter 1

Introduction

1. 1 Game theory

Game theory is the theory of rational behavior for interactive decision problems. In
a game. participants strive to maximize their expected gain by choosing particular
courses of action, and each participant‘s final payoff depends on the profiles of the
courses of action chosen by all participants. The interactive situation, specified by
the set of participants. the information ﬂow. the possible courses of actiml of each
participant. and the set of all possible payoffs. is called a game. Participants, i.e..
those who are ‘playing’ the game, are called the players.

In a game. if the goal of each player is to achieve the largest possible individual
gain (profit or payoff). the game is called a noncoopcratz've game. Games in which
the actions of the players are directed to maximized the gains of coalitions without
subsequent subdivision of the gain among the players within the coalition are called
cooperative games. In. this thesis. we focus on some I‘ioncooperative games.

The basic objects of interest. in noncooperative games are players” strategies. A
player’s strategy is a complete plan of action, i.e.. the n'ioves to be taken when the
game is played: it. must be completely specified before the actual play of the game

starts. and it prescribes the course of play for each move that a player might be called

upon to take, for each possible piece of information. that the player may have at. each

time where he or she might be called upon to act.

In simple form, a. two-person game is a triple (.4. B, L) where A is the set of moves
for Player I, B is the set of moves for Player 11, and L is a. nonnegative function defined
on A x B with Mr, y) denoting loss to Player II when Player I plays x and Player II
plays y. With a a—ﬁeld of subsets deﬁned for .4, suitable integrability conditions for
L, and A“ denoting the class of probability distributions on the a—field, the domain
of L is extended to .4“ x B by L(7r.y) 2 f L(.:r,y)d7r(;r). If the class of prol;)al.)ility
distributions includes all degenerate probability distributions for the points in .4, then
(.4", B, L) formally extends the game (AB. L) to include randomized strategies for
Player I. Under suitable assumptions, the game extends to (.4*,B*, L) where both
players have randomized strategies 77 E A", 7' E B“. For the extension, the loss

function is an expectation (expected loss). but it will still be called the loss function.

Our focus is on moves or strategies for Player II and generally Player I’s utility or
inutility are not. defined. For a. zero-sum game. it is understood that Player I's gain
is Player II‘s loss. If Player 11 uses the distribution 7' to generate his/ her move, we
refer to this as randomization. Here the move y is determined as the realization of a

random variable with a probability distribution 7' speciﬁed by the player.

A minimax strategy for Player 11 is any move 7mM such that maxW L(7r, 7m“) 2
min? max” L(7r, 7'), the upper value of the game. A maxmin strategy for Player I is
any move WM", such that. ming L(7rMm, 7') 2 max7r min, L(7r, 7"), the lower value of the
game. If the upper value is equal to the lower value, the common value is called the

value of the game.

A Bayes rule for Player II versus the distribution 77 is any 7" such that L(n, 7) 2
min? L(7r, 7) The minimum is denoted as 13(7) and called minimum Bayes risk. R()
is called the Bayes envelope for the game. A minimizer exists in the set B of pure

moves. Any function a on A’“ with range in B“ and such that L-(7r. 0(a)) 2 R(7r) for

[\D

all 7r 6 A“ is called a Bayes response. Harman (1957) took (Jr to be B-nal'ued.

Consider a zero-sum game where Player I selects a move from the set A 2
{al.ag, . . . ,am} and Player 11 selects a move from set B 2 {121. ()2. . . . ,b,,} with loss
L(a,-, bj) to Player 11 if I chooses a,- and II chooses bj. This is called a ﬁnite m X n
game. Here A“ is the prol;)a.bility simplex in R", 13* is the probability simplex in R"
and all of the ex1‘.)ecta.tions are inner products.

Much of our study concerns the simple 2 X 2 game where each player selects from
{0,1} and the loss function is L(i.j) 2 Ii —j| 2 i-(l —j) + (1 — i) ~j, i,j 2 0.1. This
is the game of matching pennies (or matching binary bits) with Player II’s ob jective
to match Player I. In matching l;)inary bits, 1 — 2L is the gain for Player II while.
2L — 1 is Player I's gain.

Suppose that Player 11 generates his/her move in the matching pennies with a
Bernoulli distribution B(1.p), i.e. prob(j 2 1) 2 p and prob(j 2 0) 2 1 — {2. Then
Player II’s expected loss is seen to be L(i,p) 2 i - (1 — p) + (1 — i) -p 2 li — pl,
i 6 {0,1}. p 6 [0, 1]. Thus the loss function extends to expected loss on the domain
{0. 1} X [0, 1] and we will call it simply loss where there is no chance of confusion.

Applying the extended loss to a weather forecast of rain with prol;)al‘)ility p, the
forecaster (Player II) suffers “loss” 1 — p if it. rains (i 2 1) and “loss” p if it does

1

not rain (i 2 0). The choice p 2 5 in the n'iinimax choice for II. it minimizes the

1

maximum possible expected “loss”, i. e.. 1 2 5 minimizes (1 — [2) V p. Notice that the

weather forecaster is not required to actually generate the Bernoulli random varialﬂe
to serve as his / her move. Rather he / she simply specifies a. probalnlity p. If “Nature”

flips an unbiased coin to determine whether it rains or not, then the equilibrium “loss"

1

2 is achieved, the value of the game.

1.2 Repeated play

If a fixed group of players plays a given game repeatedly, we say this is a rcpmtai game
or is repmtcd play. In another words, a repeated game is the same simultaneous game
played repeatedly. The payoffs add across repeated play. In repeated play, rules will
specify what information generated in the repeated play is made available to what
players and when. We will assume that all players will be fully informed of the rules
that govern the game that is being repeated together with the history of moves of all
players at all stages of the repeated play. Thus, the player may use strategies. i.e.,
sequences of functions that map the history of past moves into a move for the current
stage.

The repeated play is of two types:

(1) Finite Horizon where there is to be a. sequence of N plays where N is finite,
specified and known to the players in. advance. See Hannah's weak sequence game
(1957. Sec 3). In finite horizon play, the players’ strategies can depend on N. Con-
ceptually, the finite horizon repeated play game is another example of a simultaneous
move game where the strategies are finite sequences of recursive functions. Player
11 strategies are evaluated in terms of the average loss over the N games. We will
consider ﬁnite horizon repeated play only in Chapter 4, Section 3.

(2) Inﬁnite Horizon where there is an infinite sequence of plays and the, players
know this. See. Hannans strong sequence game (1957, Sec 3). A player 11 strategy is
evaluated in terms of the sequence of average loss over initial segments.

Our study concerns the review of and the development of "good” strategies for
Player 11 in the repeated play of a two-person game. Generally. results uniform in
sequences of Player I moves are sought and obtained. With such results. the ﬁndings
extend to results uniform in Player I strategies and show that. the motivation for
Player I is irrelevant (Hannan, 1957). The two-person construct is not as restrictive

as it seems since the term Player I may be taken to name a coalition or collectimi of

players.

Now consider the repeated play of matching pennies. We let g and Q denote infinite
sequences of moves for the respective players and letg, and 1) denote initial sequences.
t. : 1.2.... A deterministic sti ate (pure stratum) for Player 11 has as components
recursive functions Qt((_1.4_1) taking values in {0. 1}. t : 2.3.. . . with ()1 E {0, 1}. The
associated average (Cesaro) loss to Player 11 across N plays at the Player I sequence

g_.i:

’1‘)

N

CLN( ((_t. 1)): 2L( (1... b,( ((1.1,_ ))/N— — Zlat —bt(_ _1')|/N

t: 1

As is rather obvious and Ifierhzqis first recorded by Cover (1967).

N
1nax{Z la, — bt(a,_l)| I QN E {0.1}N} : N

1:1

for every b1 and sequence of £54 - measuralgile functions Qt(-.) t z 2. .3, . . . . Thus. no

deterministic strategy for Player II can produce the uniform convergence of average
[V . . . . .

loss to zero. :1 la. — [)1] IS the Hamming distance between the binary sequences

(1,1.(12. . . . .aN and b1.b2. . ..bN.

A stochastic strategy (mired strategy) for Player II has as components recursive
functions Eth—i) taking values in [0.1]. t : 23. with pl 6 [0.1]. Identifying
1 with 7 2 1 and 0 with p : 0. the stochastic strategies include the deterministic
strategies as a subclass. The. associated average (Cesaro) loss to Player II across N
plays is

N

N
Ghee) : Z L<ai.p.(a.._.>>,/N = 2: la. — p.<g._.)|/N

Note that
A?

Ina-NZ lat — Mat—1H law 6 {(1. 1}”) 2 N/‘Z-
t=l

for every 721 and sequence of 9,4 - measurable functimis 3(a) t : 2. 3, . . . . Thus. no

stochastic strategy for Player II can produce the uniform convergemite of average loss

to zero.

Hannan (1957, Sec 3. (11)) introduced what he called modiﬁed regret for the
evaluation of Player 11 strategies (using the scale of total loss). We use the term regret
to denote the difference between the average loss for a strategy and the minimum
average loss (envelope loss) across a specified set of (often simple) strategies.

We will illustrate regret in the repeated play of matching pennies where Player I
selects a E {0, 1} and Player II selects a probability p E [0. 1] with loss L(a. p) : Ia—p|
to Player 11.

Example 1.1 The Simple Envelope and Regret for Repeated Play of Matching
Pennies.

Consider the two strategies 3(0) and p“) where
rim = 0 and 12.5.0) (at—1) = 0
for t : 2. 3. . .. (i.e., always play a 0) and
pi” = 1 and PMs—1) = 1

for t : 2, 3. . .. (i.e., always play a 1). Let S : {p‘olgjll}. The simple envelope is
defined as

N N
R(1)(aN) : min{CLN(_a.p)lp E S} : min{Z at,/N, 1 — Z: ail/N}.
tzl t:l
Dropping the superscript. and letting gN : Z? a, / N . N z 1. '2. . . . . the simple enve-

lope evaluated at gN can be written by:

ngNl : 9N/\(1— (IN)-

This is the Bayes envelope of the component game evaluated at the empirical prob-

6

ability distribution gN of {a1,a2,...,aN}. The regret sequence associated with a

strategy 13 relative to the simple envelope is
Dwtag) = CLMLLQ) — 3(9le N = 12...

Remarkably, Harman (1957) and Blackwell (1956ab) independently developed strate-
gies of a. very different structure for which DN(Q. p) is O(. 7 ‘1/2) uniformly in a. In
each case the development was for repeated play of general finite games and more.
Hannan (1957) worked to get tight bounds and. therefore, good constants in his

0(N‘1/2) demonstrations. Of course. Player II’s concern is
lim sup DN(Q. p) S 0
N _

and this limit condition may have first been referred to as Hannan consistency (at g)

in Hart. and Mas-Colell (2001. p.27). We will also refer to the sufficient condition
11131 Bids. 3) = 0

as Hannan consistency (at. g). This thesis has elaborations on various strategies

demonstrating Hannan consistency.

Example 1.1 (continued) Figure 1.1 below is a plot of the simple envelope R
and the Cesaro loss for a hypothetical strategy with Hannan consistency. Note this
interpretation: Player 11 using the Harman consistent strategy does almost as well on
average through horizon N as if he/ she were told in advance what was to be Player
1‘s majority move in the stages t : 1. 2, . . . . N and he/ she simply played that choice

in attempting to match Player I. The Hannan consistent strategy p has the property

lim sup max{CLN(g.p)|(.zN E {0.1}N} g 1/2,
N —

\l

i.e., it is asymptotically subminimazc. In the limit, Player II loses at most one—half the

time and does better if gN stays away from %.

Figure 1.1: Cesaro Loss vs Bayes Envelope

 

I

0.9

0.8 .
Cesaro Loss
——+— Simple Envelope -

 

 

0.7

 

 

 

I

0.6 _

0.5

I
l

I

0.4 -

I

0.3 . _

I
l

0.2

0.1 r .

 

 

 

1.3 Summary of Thesis

In this section we summarize the thesis and the results herein.

Chapter 2 introduces the play against the past strategy which we label as the
PAP strategy. With this strategy. Player II at stage t plays component Bayes versus
the empirical distribution of Player I's past moves {(L1.ag....(tt_1}. t. : 2.3. . . .. In
matching pennies. this has Player 11 playing the majority move found in Player I‘s past
moves. We also consider the unrealizable strategy called play against the past plus
present (PAP+ strategy). In matching pennies, this has Player 11 1.11aying the majority
move found in Player I ‘s past and present moves {(11, a2, . . . .at_1, at}. Hannan (1957)
used these strategies and their properties both in motivation and in proofs for Hannan
consistent strategies for repeated 1:)lay. We develop exact expressions for the simple
regrets of the PAP and PAP+ strategies in matching pennies. In Chapter 2. we
introduce the. play against the random past: strategy (PARP strategy) first. considered
by Gilliland (2004) and Gilliland and Jung (2006). W ith this strategy. Player II
plays component Bayes versus a random sample drawn with repla-icement from the
past. moves {0.1. oz. . . . (11.1}. A goal of this thesis work was to demonstrate Hannan
consistency for this strategy for the repeated play of the expert. selection problem,
a goal only partially reached (Chapter 5). In Chapter 2 we define Hannan-type
strategies for later reference in Chapter 4. Essentially, a Hannan-type strategy plays
component Bayes versus either a controlled random perturbation of the empirical
distribution of Player I‘s past moves or the expectation of such. We conclude Chapter
2 by illustrating the need for fresh randomization across stages when imI.)l(—>menting a
strategy for matching pennies. We base the example on a Harman-type strategy.

In Chapter 3. we examine extended envelopes for matching pennies. These en-
velopes are called k-eattended envelopes and are more stringent than the simple en-
velope. \-’\«"'hereas the simple envelope is the Bayes envelope of the component game

evaluated at the empirical distribution of {a1. a2. . . . .aN_1. aN}. the 2-extended en-

velope is a Bayes envelope evaluated at. the empirical distribution of pairs

{(a1.a2). ((12.0.3). . . . , (aN_1.aN)}. If Player I‘s moves exhibit. Markov structure, for
example. a tendency to follow a 0 with a 1, then the 2—extended envelope can be
considerably less than the simple envelope. In matching pennies, we develop exact.
expressions for the 2-extended regrets for the PAP and PAP+ strategies and establish

Hannan-consistency for a PARP strategy.

Chapter 4 reports on a. literature search to document specific theorems and results
published by others after I-Iannans (1957) seminal paper. results that are found in
or are direct. consequences of Hannan (1957) results. Because of the cryptic style
and possibly the notations used in Hannan (1957). it is understam‘lable that other
researchers failed to recognize the. speciﬁc results therein. The style and notations
makes the documentations rather challenging in some cases. The literature includes
Cover (1967), Peder. Merhav and Gutman (1992). Foster and Vohra (1993), Chung
(1994), and Cesa-Bianchi and Lugosi (1999). This search was motivated in part by
the Gina Kolta (2006) New York Times article Pity the Scientist Who Discovers the

Discovered in which Hannan is I'nentioned.

Chapter 5 introduces the expert selection problem, which has gotten considerable
attention in the game theory and computer science research communities. Here Player
II must select from a class of experts and assume whatever loss is incurred by that
expert in a speciﬁed game. This problem is often cast. in terms of a forecasting
problem. For example. consider a set of K weather forecasters (experts). Player
II must make a weather forecast for tomorrow: rather than do his / her own analysis,
Player II examines the records of accuracy for the K experts and selects the fOI‘GCaSt of
the one who has the best record of past accuracy. As described, this would be a PAP
strategy. PAP strategies here and in general are not Hannan consistent on all Player I
sequences g. In repeated games, the set of experts could be a set of strategies. Player

II uses the performance record of the strategies to select one to implement in the

10

current stage. Chapter 5 starts by discussing focus forecasting (Smith. 1978) which
can be described as PAP where the tests of the forecasting strategies in the pool
are over recent perforn'lance. not the conmlete past. In practice, this is a criticized
methodology since the pool of experts seems to have grown in a rather ad hoc fashion.
For example, see Gardner, Anderson-Flether and Wicks (2001). Smith's con'lpany
(Focus Forecastingcom) continues to serve custmners. In Chapter 5. we investigate
the use of the PARP strategy in expert selection. We examine the case. the pool has
only two experts and show the problem to be reducible to a one-dimensional problem.
This problem is examined and a class of sequences Q where PARP is Hannan consistent
is identiﬁed. We conclude Chapter 5 with empirical tests of the PARP strategy for

selecting from competing time series models for prediction.

1]

Chapter 2

The PAP, PAP+, PARP and

Hannah-Type Strategies

2.1 Play Against the Past (PAP) and Past plus

Present (PAP+)

Play against the past in the repeated play of a two-person game denotes the strategy
for Player II in which II at each stage t : 2,3, . .. plays component game Bayes
versus the empirical distribution of 1's past moves. The study of this strategy in
general settings is undertaken in Hannan (1957) where basic inequalities (Sec 8 (11))
show the possible in'lportance of the study to the construction of good strategies for
Player II in repeated games. Gilliland (1972) continues the discussion of play agaii‘ist
the past strategies in sequences of statistical decision problems. Play against the
past is a one-sided version of what is called ﬁctitious play in the repeated play of a
two-person. zero-sum game (Robinson. 1951).

Recall that a Bayes rule for Player II versus a prior distribution over the pos-
sible moves by Player I is any move that minimizes the expected loss to Player II.

For example, in matching pennies, a Bayes rule versus the prol:)a..l>ility distrilimtion

12

Prob( = 1) : 7r. Prob(a : 0) : 1 — 7:. is any rule where p : 1 (Player II plays

b = 1) if it > g and p : 0 (Player II plays I) : 0) if 77 < 5111 our study. we will
usually take the determination p : % when 7r : %. Formally. the Bayes response we

consider in our analyses of matching pennies is denoted by a(-. -). where

Here and throughout this thesis. square brackets denote indicatm functions. More-
over. it is convenient for future use to extend the domain of the Bayes response to

0(w1.w2) E [0, 0C)“2 — (0,0) by
(7(td1.u12) :2 0(a21/(a21+ wg),WQ/(w‘1 + w2)).
The PAP strategy for Player II in matching pennies is denoted by and defined by

1 1 1 1
PAP: papl : —2-. papi(g_t_1) : [gr—1 > 2] + glgr—i : 3]

d

where recall from Chapter 1 that g,_1 denotes the proportion of 1's in the sequence
(_4_1. With the PAP strategy for matching pennies. Player II starts with a coin toss
(assumed to be a fair coin) and sul')sequently plays the majority choice in Player I
past moves with a coin toss in the event of a tie

Hannan (1957, Sec 8, (11)) also considered the u1‘1realizable strategy for Player II
that in the context of matching pennies is

I 1 I 1
13,413+: [)(L[)+1: :2“ pap +1 (gt) 2 [gt > 5] + TZ—[gt : 3].

d

This can be thought of as play against. the past including present. Note that this
strategy has Player II‘s move at stage t to be the Bayes response versus the empirical

distribution of {a1, (1.2 ..... at_1, at}. Hannan (1957) established for the repeated play

13

of a general game that the average loss from PAP+ in no greater than the simple
envelope loss and that the average loss from PAP is no less than the simple envelope

loss.

The evaluations for PAP and PAP+ are simple and illustrative in the case of
matching pennies. In developing a new strategy PARP for matching binary bits
(matching pennies) Gilliland and Jung (2006) proved the following proposition in

regard to PAP.

Proposition 2.1.1. In Matching Pennies, the Cesaro loss sequence for the PAP

strategy is given by

CLN(Q.[_)(l_p):gN/\(1— 9N) + 0.51/N/N + 0.5-[gN7ll/2]/1’V, N :1.2....
where I/N is the number 0f 9,, visit to 1/2, t : 1, ‘2, 3. . . . , N

Note that the excess average loss over the simple envelope loss gN /\ (1 — gas) is
positive and is maximized at g : (0. 1, 0. 1.. . . ) or (0. 1. O. 1. . . . ) with the maximum
being 0.25 asymptotically. Here we have limN DN(_q. pap) : 0.25. That. PAP is not

Harman consistent at all in Player I move sequences is a well known result.

we now turn to the the unrealizable strategy PAP+ in matching pennies.

Proposition 2.1.2. The Cesaro loss sequence for the PAP+ strategy is given by

CLN(Q.])(I[)+) : gN /\ (1 — gN) — 0.51/N/N.

where VA! is the number of gt visit to 1/2, t : 1, ‘2, 3. . . . . N. Furthermore,

max{CLN(g_.pap+)lfi.red NgN} : gN /\ (1 — ,(IN)

1-1

and

min{CLN(Q,pap+)lfi.red Air/N} = 9N /\ (1 - gN)

— 0.5(greatest integer in N/2)/N.

Proof: Let N > 0 be fixed and take a1 : 1 without loss of generality. Suppose
that gt returns to 1/2 at stages i1. 732.. . . .ik. where 1 < i,- < i; ..... < it E N. Consider
the ﬁrst epoch 1 S t S i1. Note that gt > 1/2 on 1 S t < i1 and 911 : 1/2 so that
Player II plays 1 on 1 S t < i1 and 1/2 at t : i1. Player I has played 7:1/‘2 0s
including the 0 at stage. i1 and il/‘2 1‘s on epoch 1 S t S i1. Thus, the total loss to
Player II on the ﬁrst epoch is (7.1/2 — 1) + 1/2 : (i1/2 — 1/2). The total loss across
all epochs is (i1/2 —1/‘2)+(i2 — i1)/‘2 —1/2 + - - - + (it. — ik_1)/'2 —1/‘2 : ik/‘2 - lie/‘2.

If it. : N, then gN : 1 / ‘2 and the average loss is
CLN(Q.pUP+) : gN — O'SI/N/JV : 9N A (1 — 9N) — 0,51/N/1'N’Y

where I/N :: {number of gt visits to 1/‘2It : 1, ‘2. 3. . . . , N}.

Now suppose that it. < N. Let. (lik+1 : 1 without of loss of generality so that
gN > 1 / 2. Then Player II plays 1 on the N — it. stages it < t g N. On these stages.
Player I plays a total of (NgN — ik / 2) 1s and therefore, (N —— it.) — (NgN — it./ 2) 0s.
which is the total loss for Player 11. Thus. the average loss for Player I I across all N

stages is
CLN(Q.])(Z[)+) :— (7k/2 — I/N/2)/;7V + (I - gN) — tk/2N Z {M} /\(1— gN) — 0.51/N/A‘Y.

For the fixed total number NgN of 1's in the sequence QN. we see that the Cesaro

loss is maximized when I/N : 0 and minimized by alternating 1's and 0s.
Proof is doneEl

15

It follows from Propositions 2.1.1 and 2.1.2 that.

l/ (r 12

2 N 2 N

and
VN

DN((_1..T)(L[)+) :— —3K.

Suppose that. Player I generates moves a1,a2, . . .as independent, identically dis-
tributed B (1, 77). we show that. PAP is Hannan consistent at g. It. squices to show
that I/N/N + [gm 75 1/ 2] /N ——> O. The second term is bounded by l/N so we need
only consider the first term I/N / N. VA; is the number of visits of the random walk
St : 22(204- — 1) to 0 across t : 1.2,...,1\", N : 1,2 ..... The strong law of
large numbers shows that SN/N —> 27." — 1 as. Thus. if 7: 75 1/2, SN is 0 only
finitely often as, which 1111101638 that z/N/N —+ 0 as. Thus. where PAP is not,
Hannan consistent at all sequences 9. it is Harman consistent as. if Player I re-
peatedly and independently generates his/her moves by a coin toss that has prob-
ability 77 of turning up Heads ((1 : 1) provided 77 74 1/2. Since g»; ——> 77 as. and
R(gN) : gN /\ (1 — gN) -—> 71' /\ (1 — 77) g 1/2 as. with equality if and only if 7: : 1/‘2,
Player 11 is sure to win more than 5070 of the time in the limit if 7.- 7£ 1/2, i.e.. the
coin is biased. If 77 : 1 / 2 (the coin is unlg)iased). there is simple expression for E(I/N).

Ptom Grinstead and Snell (2008. p. 481),

E(l/2N) : (IN -1

where
(2N + 1)!

GIN" : ' Y T
4A A LN!

. . . , . . y . v /——T
Wlll appear again 1n chapter 4. section 3. Since ox.- ~ 4J\/7r. E(1/2N/2.’\ ) ~ l/v NA

and using 112 NH : u; N it follows that E (I/N / N) ——> 0 in L1.

16

Figure 2.1 shows the result of a. simulation where the at are i.i.d Bernoulli (1,1/ 2),

t=1,2,...,100.

Figure 2.1: PAP vs PAP+ vs Envelope for i.i.d Bernoulli sequence

 

 

 

 

 

 

 

 

 

0.8 j , f
I - + - PAP
_ -l-‘ - * - PAP+ .
0'7 l Bayes Envelope
at
0.6
ll
0.5 4 i .
I II ....
l
0.4
0.3 1* i . .o’o':
:* 3
Gen 3:,
0.1 ﬂ
0 l l P
0 50 100 150 200
N

17

2.2 Play Against the Random Past (PARP)

Gilliland (2004) announced the result that play against the random past in match-

T.—

ing pennies is Harman consistent with uniform rate O(]\‘ 1/2). Proof was given in
Gilliland and Jung (2006). The PARP strategy for Player II in matching pennies is
denoted by and defined by

1

. 1 1 .
narrate—1): [gt—1 > 3] + §[9,_1 : —

PARP: parpl : ‘2]

Kali—I

where g{_1 is the proportion of 1's in a random sample of size t— 1 drawn with
replacement from Player I's moves {(1.1 , a2, . . . ,a,_1}. It is assumed that the bootstrap
samples are independent across the stages t. i.e.. that. fresh samples are drawn at each
stage t :— 2. 3 . .. Study of PARP in matching pennies requires the analysis of the

half-binomial probabilities

, 1 1 , 1
Pt—l,gt_1 :: E(lgt—l > El + Elf/t—l : El)

Gilliland and Jung (2006) show that there exist constants A and B such that

 

IDMerarpH S (A + B \/ N - (9N /\ (1 - 9M) )/Ns

thus establishing uniform Harman consistency for PARP with rate 0(N‘1/2).
In Chapter 5 we explore the PARP approach for repeated play of an infinite

component game that. is motivated by the expert selection problem.

2.3 Hannan—Type Strategies (H)

Hannan-type (1957) strategies overcome the weakness in PAP by playing Bayes re-
sponses or the expectations of Bayes responses to properly scaled random perturba-

tions of the en‘ipirical distributions GP]. Specifically, with a component game where

18

Player I has m moves {1, 2. . . . .m}. the empirical probability distribution of Player

t—l 1

1’5 moves through time t— 1 is the vector GP] 2: (72.1 .721; , . . . .n:;I)/(t — 1) where

iii—1

7.

: num{aj = ilj : 1.2.....t — 1}, i : 1,2,....m. We deﬁne a Hannah-type

strategy as any Player I I strategy that. at stage t plays

0(Gt—1 +II.1,_1Zt,_1) . t: 2,3,... (2.1)
()T‘
E(0'(G1_1‘i ht_1' 2)) . t: 2. 3. . . . (2.2)

where {ht_1} is a sequence of positive real numbers. Zt_1 and Z are random vectors
take values in (0, oc)"", and E is expectation over Z. To sin‘iplify proofs. Hannan
extends the domain of the Bayes response a from the probability simplex in Hm to all
of [0, oc)m with a being positive homogeneous of order 0. that is, 0(eur) : 0(u') for
all c > 0, 21? E [0. oc)m. Harman (1957) for repeated play of a variety of component
games, including the ﬁnite two-person game and the S-game, imposes conditions on
the sequence of constants {ht} and the distribution of Z to achieve uniform Hannan
consistency for the strategy (2.2) with rates.

In matching pennies where in = 2, we have labeled the pure moves as 0 and 1 and
let 9,11 = num{aj : 1| j = 1.2.. . . ,t— 1} /(t — 1) denote the empirical proportions of
1‘s in Player I’s initial move sequences. Since. 1 — g,,_1 : nu7n{aj : 0|j : 1. 2. . . . .t—
1}/(t — 1). the empirical probability distribution is Gt.) : (1 — gt_1.gt_1).

Chapter 4 includes a survey of published results that are subsumed by Hannan
(1957). It appears that many of the authors were unaware of the specific results
contained in Hannan (1957). Since a positive homogeneous (order 0) Bayes response
function plays a key role in proofs for Harman-type strategies. we con<.:lude this sec-
tion with examples to illustrate 0, its properties and the notations that are used.
Hopefully, these examples will help make the proofs in Chapter 4 of connectirms of

Hannan-type strategies to other work understandable.

19

Example 2.2 (Matching pennies)

Here Player I and Player 11 have m = n : 2 pure moves which we have denoted
as {0,1}. Suppose that Player I selects his/ her move with the (prior) prol.)al_)ility
distribution Prob(O) : 1 — 77. Prob(1) : 77. Consider Player II selects his/her move
with Prob(0) : 1 — p. Prob(1) : p. The expected loss to Player 11 is L(7r.p) :
(1 — 7r)p + 77(1 — p). Any p that minimizes L(7r.p) is Bayes versus 77. A Bayes
response is any function 0 on the 1:)rol.)al:)ility simplex in R? such that p : 0(1 — 7r, 7:)
is Bayes versus 7r. For each 7: > 1 / 2. p = 1 is uniquely Bayes versus 7:; for each
71' < 1/2, p : 0 is uniquely Bayes versus 7."; for 7r : 1/2, all p E [0. 1] are Bayes versus
77. To specify a Bayes response one must select a minimizer when the minimizer is

not unique. Here is the example given earlier in section 2.1:

0 0_<_7r<%.
0'(l-7T,71’): % 77:?
l. é<7r_<_1.

When in : 2, there is the notational convenience derived from identifying (1—77, 7:)
by 7? . However, this identiﬁcaticm hides features. including the positive homogeneous
property of order 0 imposed by Hannan on the Bayes response. As noted in section 2.1,
once a Bayes response is deﬁned on the prolialiility simplex in H2, the domain is easily
extended to all of [0, oc)2 — {(0,0)} by 0(u-'1,ur2) : 0(u21/(u11 + 117-2).u12/(u'1 + 219))
and then to all of [0. (>0)2 by deﬁning 0(0, 0) to be any specific move. Then. a(-. -)
is a positive homogeneous function of order 0 deﬁned on [0. co)? Then for matching
pennies. 0(4.7) = a(4/11,7/1]) : 0(2 - 4,2 - 7) : 1:0(12, 12) : 0(1/2.1/2) 2:
0(7. 7) : 1/2. Note, for example, that. with Z : (21.22) and h a positive constant.
0(1 — 77 + hZ1.7r + hZ-z) : 1 if and only if (22 — Z1) > (1 — 27r)/h. In the Hannan-
type strategy (2.2). a random perturbation is used, in particular, (Z 1. Zg) is a random
vector. Thus the expected Bayes response (2.2) is a probability distribution on Player

II’s pure moves. speciﬁcally, P(1) : Prob(Z-2—Z1 > (1—27r)/h)+0.5-Prob(Z2— Z1 :

20

(1 —27r)/h) and P(O) : PTOb(Z2—Zl < (1—27r)/h)+0.5-Pr0b(Zg—Z1:(1—27r)/h).
Example 2.3 (Matching 3-sicied pennies) Here Player I and Player 11 have m :
n = 3 pure moves which we denote as {1. 2. 3}. Consider the Player II loss matrix

shown below

 

Player 11

 

side 1 side 2 side 3

 

side 1 0 1 1
Player I side 2 1 0 1
side 3 1 1 0

 

 

 

 

Suppose that Player I selects his / her moves with the (prior) probability distribu-
tion P(1) : 7:1. P(L) : 7:2. P(3) : 773. A Bayes response for Player II deﬁned on the

prolaability simplex in R3 must satisfy

1, 7T1> 71'2V7T3,
0(771a7722773): 2 7T2>771V7T3,
3, 7T3 > 771V772.

These (7:1. n2. 7:3)-sets are the interiors of the convex regions shown in the probability
simplex in ﬁgure 2.2.

The domain of a can be extended to the boundaries by any choices of probability
distributions supported on the maximizing coordinates. For example. 0(025. 0.40, 0.35) :
2 : (0,1,0) and one could take 0(0.35,0.35,0.30) : (1/2,1/2.0) and 0(1/3.1/3.1/3) 2:
(1/3,1/3.1/3) Note. for example. that if "the domain of the function 0 is extended
to all of [0. OC)3 as a positive homogeneous function of order 0, then 0(771 + hZ 1.772 +
1722. 773+hZ3) : (1,0. 0) if (Z1 — Z2) > (772—771)/h and (21 —Z3) > (7r3—n1)/h. where
h. is a positive constant. In the Harman-type strategy (2.2), a random perturbation
is used, in particular, Z = (Z 1, 22, Z3). The expected Bayes response (2.2) is then a

probability distribution on Player II‘s pure moves.

91

d

Figure 2.2: simplex

(0.0.1)

 

 

 

 

 

 

 

 

 

(1.0.0) (0.1.0)

 

 

 

 

 

 

2.4 Need for Fresh Randomizations

Consider the situation where Player II's moves are 1,)robabilities p (the weatherman
example) or. more generally, prol:)ability distributions or expectations. Contrast this
with the situation where Player II is forced to play the rca‘ilization of his / her random-
ization. For example, in matching pennies Player II is required to select. a 0 or a 1.
albeit. he/ she may generate the move with a probability distribution. Because the.
histories of Player II‘s past moves are availa.l.)le to Player 1. Player II must be con-
cerned about the joint distriliiution of the random variables he/ she generates across

the stages of the sequence.

Hannan (1957) did not deal with this issue since his concern was with repeated
play where II’s moves were 1‘)ro}.)a.l:)ility distributions or expectations and cmnponent

loss was measured following expectation over the randomization. To be more speciﬁc.

22

a single random variable Z ~ F (serving like a dummy variable of integration) is used
in describing the Hannan moves E(0(G1 +h1 - Z)) E(0(C2 +112 - Z)), E(0’(G3 + M-
Z )) . . in his theorems.

Suppose that Player II must play a move b2 at stage t = 2 determined by the
Bayes response 0(Gl + h] - Z) a move b3 at stage t = 3 detern'iined by the Bayes
response 0(02 + 122 - Z) . . . In this case, information on the realization of Z passes
to Player I through the the sequence ()2. b3, . . .. (Player 11 applying 0(G1 + In ' Z1),
0(02 + 122 - Z2). 0(G3 + h - Z3).. . . with iid Z,- ~ F removes this possibility for Player
1.) However, we take the matching pennies example to show how Player 11 can be

trapped if employing a Harman-type strategy based on a single randomization Z.

Recall our matching pennies example and consider the Harman-type strategy-2

1 h, (Z — Z.
0'(l-{]1_1+ht_121.gt_1+Il-t_1ZQ):[gt_1> ‘23 + t 1‘ 1; 2)]

 

where gt_1 is the proportion of 1's in the sequence of Player I moves from stage 1 to
stage t-1. In our example we take the scale factor ht : 2 / \/f and let U : Z1 — Z2

where (Z1. Z2) is uniformly distributed in the unit square [0.1]? Then

1 U
+—_:], f:2,3....

bz:[gt—1 >5 \/(;-1 .

where U E [—1.1].
Consider this strategy for Player I: (1.1 : 0. (12 : 0. (1.3 : 1 and
(It—1 if bt—l : 1“ (It—1.

(It I
1- (It—1 if bt—l : (It—1-

Thus, Player I continuous to play the same move until he / she observes that Player
II has matched his / her move. Assume that. Player 11 generates his / her ﬁrst move In

and the randomization (Z1,Z2) in(_le[_)endently with P(bl _: 0) > 0 so that P(bl :

23

0.. U > 0) > 0. We will show that on the event ((21 : 0, U > 0) that Player I wins at

least 3/5ths of the time in the limit so that mnNDMQQ) 2 g — é : 11—0. Since the
event has positive probability, there are move sequences Q where the strategy _b is not
Hannan consistent. (If P(1)] : 0) : 0, then P(b] : 1) : 1 and analysis of the Player

I strategy starting with 1 1 0 will lead to a similar conclusion.)

Assume that. U 2 0. Let NO + 2 be the maximum of stage before Player I switches
his / her play to begin to play the opposite, i.e
NO —— 2 U . f0 —— 1 U

< 0.5 + but gN0+l : —,—— > 0.5 +

( r : —,— _ . , —,—-.
JAG 1N0 V 1V0 [NO + 1 V Afo + 1

 

N0 —— 2
N0

 

From the inequality of g No and 9N0 : , we have

U N0 — 2
, /——A'O NO

0.5+ >0

 

 

i.e.,

0<N0<2(/’2+4+2U-\/U?+ :(U+\/U2+4)2

where N0 is the maximum integer number such that NO_ < 2U2 + 4 + 2U \/U~’ + 4.

Since U E [0, 1], the maximum of N0 is achieved when U 2: 1, and minimum at U : 0.

max A0: 10 a hen U: 1; min N0 : 4 when U : 0.

i.e., 4 g NO S 10, for all U E [0. 1].

From stage 3 to stage NO. NO + 1 and No + 2, Player I still plays 1. And

Player II plays 0 from stage 1 to stage N0. Then at. stage N0 + 1, Player II ob-

 

N . .
serve 9N0+1 Z IVE—1 H > 0.5 + ,————0+ 1, so Player II sw1tch lns play from 0 to 1 at stage

N0 + 1. Then at stage N0 + 2, Player I begins to play 0.

Player 11 switches his play at stage N0 + 2, then Player I switches his play from 1

24

Table 2.1: Player I and Player II‘s dynamic play to N : NO + 3

 

 

 

 

Stage 1 2 3 5 N0 N0 + 1 NO + 2 N0 + 3
Player I 0 0 1 1 1 1 1 1 0
Player 11 0 0 0 0 0 1 0

 

 

 

 

 

 

 

 

 

 

 

 

 

Table 2.2: Player I and Player II‘s dynamic play to N : N0 + +7711 + 3

 

Stage Player I Player II

 

NO
NO + 1
NO + 2
N0 + 3

Or—ar—ib—A'
i—ti—aoo:

N0 + m1
N0 + mi + 1
N70 + m1 + 2
fvo + m1 + 3

HOOD:

 

 

 

 

 

to 0 at stage N0 + 3.
Let m] be the number of stages needed for Player II to switch his play back to 1

after stage N0 + 2. i.e.

As Player I and Player II’s play are listed above, the total number of 1‘s in Player

. . N . . .
I 8 play from stage 1 to stage Ag + 7721. thus gNO+m1 : T043171 And by the deﬁnition

of ml and NO,

NO U
m. = .—— > 0.5 + —.—
gN0+ 1 1V0 + 777.1 ) V N0 + 771-1
ngO _ S 0 O +

 

 

U NO — 2 U NO

-- . - 0.: + —-——, - —, S 0
VA’TO N0 ) ( L) V ./\"0 + 7711 NO + 7771 )

 

i.e., ml is the largest integer such that

U N
O + —,—— — —.—-—-0—- _<_ O.
\/ AD + 7711 AG + ml

therefore, we have
\/ N0 + 7m_ —U + on? + 2%

i.e.,

m1 3 2U2 + NO — 2U - x/2N0 + W.

Lemma 2.4.1. With. the results about NO above, we have the following bounds for
2 3 ml _<_ 4, for all U E [0, 1]. And m1 reaches its minimum at. U = 1, and

maximum at U : 0. Similarly. we have the same conclusion for m3, m5. 772.7,. . . ,i.e

2972,34, j:15)7

Proof: For m1, (U 2 + 4) < A730 (U + x/U2 +4 2,NO is the maximum integer to

satisfy this inequality.

|/\

m (U — \/'3No + U2)? — I No)?
: —\/2N0+U‘2+\/N0 '(f1—\/2N0+U—2 \/No)

S (2U + m — \/2No + U?) - (— 2N0 + U2 + (U — x/U'Z + 4))

 

Obviously —\/2N0 + U 2 + (U —— \/ U2 + 4) < 0, with the fact that m1 > 0.

x/ZNO + U2 2 2U + «U? + 4

26

then

ml _<_ 2U2 + N0 — 2U - (2U + v’U? + 4)
2 N0 — 2U2 — 2U - U2 + 4
: U2 + U2 + 4 + 2U . \/U2 + 4 -— 2U? — 2U - t/U‘2 + 4

:4.

For the lower bound of ml, assume 7721 < 2. i.e O < m] g 1, then

 

2U2 + N0 — 2U - — 1 < ml 3 1.

Thus,

(2U2 + NO — 2)? _<_ ‘ZNO + U2

A7,? + (it/'2 — c) . N0 + 4U4 — 9U‘2 + 4 < 0

N0 3 3 — 2U2 + «5 — 3m

Then, we have No 3 2.4 when U : 1. Contradiction with N0 : 10 when U : 1 from
previous discussion on NO. Therefore. ml 2 2. Together with the ﬁrst part proof.

2 3 m1 3 4. With similar argument, for mj, j : 1, 3. 5. 7, .., we all have
2 S 7713‘ g 4.

(for example, to prove 7773, one just need to replace NO by A72, which equals N0 + ml +

m2) Proof is done. 1:]

Let N1 : N0 +m1, propm‘tion of 1 from stage 1 to stage A] has property that

9N1 > 05 N11“ Let 7722 be the number of stages needed to switch play back to 0.

[‘0

N — + .. -2 -
l "‘1 "2 S 0;) + ——f— Therefore

 

Then, 92'1“th :

 

1V1 +7112 W
U U N — . ' , — 2
(O.()+— g—N )(0.5+ 1 7771+an )<0

N] 1 w/Ni + 1712 N1 + m2 _—
and 7772 is the largest integer such that

, U N1 — ml + mg — 2
0.0 + —,——— — , > 0.
x/Ni + 7722 A1 + 7722

 

 

Then. solve for 7722. with the property of m1, m1 — N0 3 2U 2 — 2U -

 

mg S 4le + 4 - 2U - V 2N0 + U2 + 2U - N/U'2 + 27m + 4.

Lemma 2.4.2. With. the discussion of m1 and No, we claim that 3 g m; g 4.

Similarly, for 777.4, mg, mg“ . . we also have 3 g m]- _<_ 4, j : 2, 4,6. 8,. ..

Proof: For m2, claim that 7722 S 4.

If m2 > 4, i.e. mg _>_ 5, since 7772 is integer and satisfy

 

2 + 4 — 2U- y/‘ZNO + U‘2 + 2U . N/U? + 2m.1 + 4 2 m2 2 5

Then

 

4U? + 2U° x/U2 + 2m + 4 _>_ 1 + 2U. W

i.e when we assume U 74 0.

 

\/U2 + 27721 + 4 > — — 2U + \/ 2N0 + (.12.

21U

By previous discussion. V2Ab + U2 > 2U + \/U'2 + 4. thus the right hand side is > 0.

28

Then ,

2777.] Z (— — 2f] + V 2A’T0 + U2)2
2 (—+2U+\/U'2+ +4-21}? —U2—4

 

21f}

: E+U2+4+l VLI2+4 ”U2
1 V T2

: ..+ b.+4——>oc U——>0.
402 U

Contradiction with 2 3 ml 3 4 from previous discussion. Thus, we have the conclu-

sion. that 7712 S 4.

If m2 < 3, i.e. mg 5 2, since mg must be a integer.

 

4U2 + 4 — 2U - ,/2N0 + U2 + 2U - N/U2 + 2m1 + 4 — 1 g 2

4U2 — 2U - y/2N0 + U2 + 2U- N/U2 + 2m1+ 4 g —1

 

for all U E [0,1].
However, when U : 0, we have left side=0 g —1. Contradiction!

Therefore, based on all the discussion we have above, we have 3 S m _<_ 4. Similar
, 2

proof for 7114., m5, m8,. .In another words, we have 3 < 7m 3 4, for j- — 2. 4. 6, 8 .....
Proof is doneL-J

According to the two propositions above, we can consider the Player II‘s total
loss. Since Player II only wins twice in each cycle (each. cycle contains 7T)2k-1 plus

m2], stages which is of length at least. 5) and since Player I plays 0 at stage 1, Player

N—_f\l_—0 INF—NO

IIs win< 2 +

 

-2 — 2. Thus, Player II‘s total loss is Z N— 2, i. e Player

2N0

5

11’s total loss is 2 3N + — 2. Therefore the Cesaro loss is
O

 

3 1
CLNfQ) : 3 + 0f?)

29

At the other side, the Bayes envelOpe gN /\ (1 — gN) g

%. Therefore.

1 1
Div = CLN(Q) — 9N /\(1— 9N) > — + (NW)

Example 2.3 (Simulation of PARP without refreshing randomness )

_10

Suppose the randomness used in Player II’s strategy U = 0.7. Player I and Player

II play as we describe at the beginning of this section. Then, the Player II’s average

loss sequence and Bayes envelope at each stage are showed by the graph below. and

the simulation of the ﬁrst 27 stages of Player I and Player II are listed by the following

table.

Figure 2.3: Non refresh randomness U=0.7.

 

 

 

 

 

 

 

 

 

_H
+++ + Average Loss of Player II
0.2 _+ + Bayes Envelope _
0.1 ~ ‘
G 41 l l 41
0 100 200 300 400

30

500

Table 2.3: Non refresh randomness example with [7:07.

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

stage Player I g, barta: Player 11 Loss,_
1 0 0 1.2000 0 0
2 0 0 0.9950 0 0
3 1 0.3333 0.9041 0 0
4 1 0.5000 0.8500 0 1
5 1 0.6000 0.8130 0 1
6 1 0.6667 0.7858 0 1
7 1 0.7143 0.7646 0 1
8 1 0.7500 0.7475 0 1
9 1 0.7778 0.7333 1 0
10 0 0.7000 0.7214 1 1
11 0 0.6364 0.7111 0 0
12 1 0.6667 0.7021 0 1
13 1 0.6923 0.6941 0 1
14 1 0.7143 0.6871 0 1
15 1 0.7333 0.6807 1 0
16 0 0.6875 0.6750 1 1
17 0 0.6471 0.6698 1 1
18 0 0.6111 0.6650 0 0
19 1 0.6316 0.6606 0 1
20 1 0.6667 0.6528 0 1
21 1 0.6818 0.6492 1 0
22 0 0.6522 0.6460 1 1
23 0 0.6250 0.6429 1 1
24 0 0.6000 0.6400 0 0
25 1 0.6154 0.6373 0 1
26 1 0.6296 0.6347 0 1
27 1 0.6429 0.6323 0 1
28 1 0.6552 0.6300 1 0
29 0 0.6333 0.6278 1 1

 

 

>1: _ ~ U
ban — 0.5 + x/I‘)

is the threshold for decision based on 9,.

 

All the discussion and simulation example show that it is necessary to refresh the

randomness at each stage when we use Hannan type strategy to make decision.

31

Chapter 3

PARP Strategy for k-extended

Envelope Problem

3. 1 Introduction

Practical forecasting problems are of great variety. Sometimes we suspect that Nature.
or the market (as our Player I) makes its decision by some patterns. The decision
on one stage may be affected by the previous k stage decisions. For example, the
market gives rise to a certain stock price. This may raise investors confidence and

this coi‘ifidence or followup may make the market give another increase the next day.

Therefore. we are motivated to study such kinds of pattern behavior. Suppose
the Player I's moves on a, are affected by (1,4... (114“, .. . . (14-1, then with this situ-
ation, our Bayes envelope is called k-ertended Bayes envelope. and the corresponding
forecasting problem is called k-crtendcd Bayes envelope problem.

In this chapter. we will give definitions and extensions of PAP strategy and PARP
strategy for the two-extended envelope problem. Although we focus on the two-
extended envelope problem. it is easy to generalize the two-extended envelope case

to k-extended cases.

32

3.2 Envelopes including Extended EnveIOpes

we have already introduced the simple envelope R for the evaluation of average loss
in the repeated play of matching pennies. Hannan (1957, Sec 3) defines the simple
envelope at stage N as the total loss N - CL N to Player II had 11 known the empirical
distribution of 1's moves (1.]. (1.2. . . . ,aN and played Bayes against this distribution at

each stage t : 1, 2, . . . . N.

We consider what. are called extended envelopes for repeated play. first introduced
by Swain (1965) and Johns (1967) for the repeated play of a statistical decision
problem and first analyzed and ordered in general terms in Gilliland and Hannan
(1969). Extended envelopes can be defined as minimum average loss across specified
sets of strategies including those chosen to take advantage of possible Markov-type

structure in the empirical distribution of Is moves.

Example 3.2.1 Repeated Play of Matching Pennies.
Consider the repeated play of matching pennies and the collection of strategies 5 :

{2(0),])(1),p(2l.p(3l} where 2(0) and 2(1) were defined in example 1.1 and

12)?) : (1N and pi?) : (n.4, t

l l
1"
o:

A?
I
'3

and

[233):1— (1N and ;)§3)21— (n-1, t: 2.3.... ,N

These may be thought of as a stay strategy and switch. strategy, respectively. although
the moves by strategy pm at stage 1 are not possible given the rules for the repeated

play. The extended envelope of order 2 is given by

17mm“) 2: 7711'Ilr1.{CLN(gN.[_)/\,)I]_) E S}

33

As the minimum over a larger set of strategies,
New) S R‘l’taw) = May.) for all 9N

Thus, 2-extended envelope Hi2) is a more stringent envelope against which to conmare
the average loss of a Player 11 strategy.

For the explicit evaluation of the extended envelope 17(2) it is useful to consider all
consecutive pairs contained in the sequence QN understood to be wrapped in a circle
so that aN precedes (11. There are N consecutive, overlapping pairs. Let. 71,-]- denote
the count of the pairs with first component 73 and second component j, i. j 2 0,1.
Then it follows that n] z: 7110 +7111 : NgN :number of 1's in the sequence QN and

no :: n00 + 71.01 : [\"(1 — gN) :number of 0’s in the sequence 91w and 71.01 : 7710.

Proposition 3.2.1. If nij are deﬁned as above, then
AFRQNQN) I (7101 /\ 7100) + (71,11 /\71]())

Proof:

From the definition of 71,-]- and C LN, we have

.v 0‘
(V 'CLN(_QNstN)) : 711 Z 77-10 + 77-11
N ' CLNfﬂM Pk?) : 7'er : "01 + 7100
N - (IL/(fawn?) : 7101 + 7710 : 21101

A" ' CL1\-'(QN._]3§3)) : 77,00 ‘1" 7711

Proof can be completed by considering the four situations: (nm 3 12.00 and 7101 S 7211),
(nm S 7200 and no] 2 7711). (7201 2 72.00 and no] 3 71.11) and (71.01 2 n00 and nm 2 7111).

Example 3.2.2 Let. N : 17. and £173 (0,1.1.1.0.1.0.1,1.0.0.0.1.1.010). In this

34

case 7100 = 3, 71.01 : n10 : 5.77.11 : 4.

Then 2-extended envelope 17'R(2)(g_17) : (3A5)+(4/\5) : 7. \Vhile. 17-R(9/17) :
17 - 11mm”) 2 (3 + 5) /\ (4 + 5) : 8.

Consider another example where the sequence of Player I moves shows greater
second order dependency. I\-’larkov structure. Take N : 17, and
9173 (0.1.0.1.0.1.0.1.1.0.0.1.1.1.0.1.0). In this case 11.00 : 2, 72.01 2 7110 2: 6n“ 2 3.

Then 2-extended envelope 17-R(2)(g17) : (2A6)+(3/\6) : 5. \Vhile, 17-H.(9/17) =
17- R‘Wn) = (2 + 6) /\ (3 + 6): 8.

The extended envelope idea can be based on three-tuples, four-tuples. etc. and

leads to an ordering for a fan'iily of k envelopes
War) 5 ("ft—”(41.) s 3 Wm) = 9.4 A (1 — 9N).
Regret relative to the 2-extended envelope is
new) : cat-(447.2,...) - Wat).

This Chapter includes the adaptation of the PARP strategy (defined in Chapter 2)
to matching pennies that achieves uniform Hamian consistency with the 2-extended
envelope, 1. e.,

—] u a
D)3)(a_.parp) : 0(N ‘17) amformly 272 a.

Remark:

The wrapping of the sequence (_LN gives an ordering to the resulting envelopes.
an idea exploited in Gilliland and Hannah (1969). However, in developing Ilannan
consistent strategies in repeated play with the k-extended envelope. there are only
t — k past k-tuples of Player I consecutive moves available to Planter II for basing a

move at stage t. t : k+1. kt+ 2, . . . . Thus. the regrets studied in the following sections

are relative to envelopes based on the empirical distribution of the N — k+ 1(11ot A") k-

35

tuples (a1,a2. . . . .ak.), (a2, (1.3.. . . ,a.k+1),. . .. (aN_k+1,aN_k+2. . . . ,aN). The difference
in regrets compared to those relative to envelopes based on the N k-tuples from the
wrapped sequences is for fixed I: of order 0(1/N) since only 1: of the k-tuples are
omitted at each stage N.

In the rest of this chapter we consider the repeated play of matching pennies.
In section 3.3 we continue discussion of the 2-extended envelope. In section 3.4 and
3.5 we derive exact expressions for the Cesaro loss from PAP and PAP + applied to
empirical distribution of past. In section 3.6, we show how PARP applies to give

Hannan consistency for the 2-extended problem.

3.3 Two-extended envelope problem
Let‘s consider Player I‘s moves in an actual game in N : 13 trials.

Table 3.1: An example of Player I’s actual moves in N213 trials

 

 

stage12345678910111213
Payer10101101010010

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

Now as Player 11, we want to predict Player l‘s move on the (N + I)“ trial.

In order to do the prediction. we have considered PAP and also discussed the
strategy based on random past (PARP) which is randomized from the empirical
distribution in previous chapters. However, another question arises: Does Player I
more likely play 1 following a 0 or more likely play 0 following 0. In another words.
is it possible that Player I is playing on a certain ‘pair’ pattern.

Here, we explore a strategy to deal with such kind of 'pair’ pattern moves of Player
I, i.e. our Bayes responses will be based on the past ‘pairs’.

In general. Player I ‘s move is s<~~quence g with .N trials:

QNIals a2. a3r~~~ (IN—1» (1N-

By pairing Player I's move on each trial and its next trial. we have a set EN:

EN 2 {((LI, (12), (0.2. (13), ((1.3. (14)., . . ., ((1.1\.'_1, 01(1)}.

For each pair in EN. we take the ﬁrst. coordinates as the condition. Then, we can get
two partition sets of EN with respect to the condition of each pair 0 or 1. and during

this partition procedure. we keep the order of these pairs.

Therefore. by partition with the first coordinate as the condition, we have:
AN : {(a.,-.a.j)| a,- : 0, .27 : 1.2....,N— 1, j : 1.2,...,N}

BN5ffm.a-j)l 04:1, .i:I,2.....N—1, i=1,2,...,N}

Suppose Player I‘s move on the N t" is aN : 0. Our Bayes response bN+1 is the move
under the condition, preceding move. is 0. Naturally, our prediction should be based

on Player I‘s pair move under the same condition, i.e. on set AN.

In set. AN. all the second coordinates of each pair are Player I‘s move with condi-
tion: the preceding move is 0. Therefore, we take out the second coordinates of each
pair, keep their orders and put them together to form a. new sequence .410. The new
sequence AN contains all the behavior of Player I with the condition: the preceding

move is 0.

Take the example at the beginning of this section:

We have:

.4N:{(0,1), (0,1), (0,1). (0,1), (0,0)}

, BN:{(1.0), (1.1). (1.0). (1.0). (1.0). (1.0)}

37

Since aN : 0, like what we discussed above, we consider AN, and take out all the

second coordinates of each pairs in AN to form a new sequence:

.41.;1. 1, 1, 1. 0, at“.

where a N+1 is the move we want to predict.

In sequence AN, each term is an individual play with the same condition: proceed-
ing move is 0. In another word, we can consider each term of AN as an individual
move. Therefore, we can use the strategies we have discussed already in previous

chapter and the PARP from Gilliland and Jung (2006).

3.4 PAP Strategy in two-extended enveIOpe prob-

lem

We apply PAP strategy on the sets AN and 310 which we have constructed in last

section.

Theorem: I11 this kind of two~extended envelope problem, when the PAP strategy
is used to do the (N + 1) stage prediction, the Cesaro expected loss sequence for his

PAP strategy p is given by:

 

 

(2) 0.5 1 , 0.5
CLN : 7V + N ‘ ("00w A ”01w + 77,10,111 A "’11.N) + N ‘ (an + #7212.)
0.5 , , 0.5 , ,
+ N ‘ l"’00.N 7f "01.Nl + N ‘ [7110.N 75 7111.Nl

38

where

710W : number of (a)...ak+1) : (1,1) .for k : 1, 2.3, . . . . N — 1:
771 : (”bow + ”bi..N)~

14,1 : {number of times of gk : 0.5 Ik : 1.2.3,... .711}

g). is the enniirical proportion of 1 as the second coordinate through stage k in the

sequence AN.

772 : (77"10.N+77’11.I\')

pug : {number of times of gt. : 0.5 I]: : 1,2,3. . . . ,712}

91,. is the empirical proportion of 1 as the second coordinate through stage k in the

sequence B N. Furthermore.

min { C'Lg)(g,\y,pw)|fi.red 11:14.,“ 13: 0,1, ,j : 0.1}
9N _, .

0'5 + i . (. I A I + I A“, )
— N 7V "'00.N "OLN 7710.N “11.N

 

 

 

and
max { CLﬁ)(gN.pN)| f'II.redII./,j..vs i: 0.1, .j = 0.1}
9N —‘
0.5 1
: 17V '1‘ 1.5 ' W ' (“CON A 7761mm? + ”II/10.1w A 7731“.)
0'5 I I ‘ _I _ I
+ N ' ([7100N 75 "OLNl + l"'10,N 7% 7711,1vl)
Proof:

a \'..‘,( 2c) .r.217:()‘ ' 1 .o) )a w A e _.,_ 74.))3'1‘11e‘.
For 11,, se 116111 g”,V '(r Idmgt fcur tterns 1nt o-(\t(nd (1(11\(l( ( MI] 111

39

(0.0).(O.1).(1.0).(1.1), we can always create set a] from sequence QN where

EN 2 {(01. (1,2), ((1.2. (13). ((1.3. (14). . . . ._ ((1-N_1. (1N)}.

Since the first coordinate in a pair is 0 or 1, E; can be divided into two subse-

quences:

AN: {(a.,aj)1a.:0. .2T:1.2.....N—1.,j=2~~~~Nl

BN3{(CZ¢.(lj)|(1,z-:L .2? : 1.2.....N _ 1. .- : 2 ...... A r}

Therefore. we notice that AN H B N : (ll and AN U B N : ﬂ. Thus.

,. "2 .
A - C Liv) : Loss on a1 + Loss on second coordznates AN

+ Loss on second coordinates B N

Since we always ﬂip a coin on b1 as the start, Loss on a1 = 0.5.

For sequence AN, since we only consider loss on second coordinate in each pair.
we can form all the second coordinates into a sequence. By using past strategy )9.

according to Gilliland and Jung (200(3).

 

I I
7’ 00.N "011v \

 

Loss on second coordinates AN : nl - ( , I . , , .
"now + 7701.N "’00.N + "OLN

 

I
- .. "oaN -
: 0.0 - 11,11 + 0.0 -[ , I # 0.0]
"ODJV + "01.1w"

Since 771 is the the number of the pairs in the sequence AN, i.e. n1 : (7160.5. +

"()er: tl‘lerefore we can simplify the formula above.

Loss on second coordinates AN : ("bow /\ n'OLN) + 0.5 - V,”
v" I I
+ 0.1) ' [II-OOJV ?Z- 7701Nl

40

where.
V,” : {number of times of gt. : 0.5 | k : 1. 2. 3. . . . .771}.

.(II: is the empirical proportion of 1 as the second coordinate through stage k in the

sequence .4 N.

Similar for sequence B N.

. I I —
Loss on second coordinates B N : ("10w /\ “-11.10 + 0.0 - ung

+ 0.5 ' [77"10.N 7e "”11.Nl
where
722 :3 ("SUN + 77"11.N)

ung : {number of times of gk : 0.5 | k = 1. 2, 3. . . . .712}

gr. is the empirical proportion of 1 as the second coordinate through stage It in the

sequence BN.

Thus. the total loss can be written as the following form:

. (2 .— —
.N oC'LN) : 0.0 + ("bow /\ ngLN + 71’10‘NA "’11,!” + 0.0 - (11,11 + ung)

+ 0-5 ' [7100.N 5‘4 77'0er + 0-5 ' [niaN ¢ 7",11.Nl

i.e..

.2 0.5 1 0;)
CLEO : T + T. - ('Iigow /\ 7701A" + "’10.N /\ 77,11.N) + T ' (I/nl + H.712)
a. A 1
0.5 0.5
+ ,— ' l"bo.~ 75 716er + _N— ' l'N-iow 5‘ '"iml

41

I
n I
By Gilliland and Jung (2006). for ﬁxed 111 and n1 - , 00‘!) ., mininnnn loss on
‘ "00 N “‘01 N

the second coordinates 011 AN is achieved by the minimum number of Um. And

min{loss on the second coordinates on AN}

I I
”oaN "OLA" ) + 0 5

I 771 . (n’ + n’ n’ + n’
'OO.N '01.N 'OO.N OLN

Similar for the sequence BN.

min{loss on the second coordinates on B N}

I I

: 'Il.‘2'(I +,/ I + I .0
”MN ’7'11,N 7how "11.N

i.e.

fired Ngéj‘”, vi = 0115 7' = 0-1}

 

min { CLEE)(QN~PN>
a . _

— ”00w ”01w "'10.N "luv

A N

. . . . . . . ’2
Sunllarly. we can easlly derive for the explicit form for the maxn'num of C'LfN') ,

mar{toss on the second coordinates on AN}

"be N ”in N I I
l i ) + 05’ l'n-OOJV if 770er

 

: nil . (72’ + n’ 72’ + n’ .
‘00.N ’01.N 'OO.N '01.»;

ma:r{loss on the second coordinates on B N}

””10 N ”/11 N - I I
l H ) + 0-0' l"00.N 7£ "01.1Vl

 

 

__ , '9 .
n... (n’ + n’ n’ + 77’
'10.N ’11.N ’10.N -’11.N

Therefore.

1.

niax{CL}3)(_qN.pN) I fired NglIM. i: 0.1, ,j : 0.1}

 

9A7
0.5 _ 1
z T + 1.0- N ' ("bow A "brN + "’10.N A "Inc/v)
0.5 ’
+ TV ' ([72’0O.N 7g 71’01.Nl + [In/10.0" # ”ILNl)

Proof is doneC]

w- . 2 . .
Comments 3.4.1: Slnee the 111ax1mu1n of the CLiv) IS aclneved by the sequence
with maximum of UN. in another word. max loss on the second coordinate on .4 N /nl

is achieved by the maximum of UN. i.e.

I __ I
"00.N * 7?01.N

as many times as they can in the sequence AN.

This indicate that the sequence AN : {(0.0) (0.1). (0.0). (0. 1). (0. 0). . . . }
or conversely AN : {(0.1). (0.0). (0.1). (0.0). (0.1)....}.

Similar for the sequence BN.
the maximum case is the sequence: BN : {(1. 0). (1. 1). (1. 0). (1. 1). (1.0)... . }

orconverselyBN:{(1.1). (1.0). (1.1). (1.0). (1.1)....}

If we transfer the two sequence back to original sequence form. the maximum case

aN-:00110011001100....

01‘

gN211001100110011....

. . . . . (2 . .
For the nnnnnum case. Since. the mnnmum of C LN) 18 obtalned l.:>y the sequence

43

with the minimum of UN i.e.

I _ ,I . I _ I
”01w — 7"10.1w ”10.N — "11w

As few times they can. i.e. AN : {(0.1). (0.1). (0.1). (0.1). (0.1)....} or
BN : {(1.0). (1.0), (1.0). (1.0). (1.0)....}. In another word. the original

sequence is:

g..,.;01010101010110.....

or reverse the position of 0 and 1.

Corollary 3.4.1: When we take the dimension of the envelope as a higher di-
mension k. k is a ﬁxed positive integer. we can use the same idea as the theorem we
proved above. Here we can take k:3 as an example to show the possibility of this

generalization.

For k:3. we have 23 : 8 triple patterns which is the combination of the three.

dimension with each coordinates has 0 and 1 two choices:

(0.0.0) (0.0.1) (0.1.0) (0.1.1) (1.0.0) (1.0.1) (1.1.0) (1.1.1)

to transform original sequence a into

EN : {(a1.a2.a3). (ag.a3.a4). (a3.a4.a5). . . .. (a.,,_2.(1.;(.-_1.a)(:)}.

44

Then. we can construct the sets .41. .42. A3 and A4:

ALN
142‘N
443w

AM

As we do in

. (Ij. Gk)!

'.(lj.(lk)|
:.aj.ak.)|

-.a.J-.ak)|

ai:03aj:0¢7::19....]\7—2~jZQe-Hsfv—ls A:3‘7\v}
(1.-:00):1..2::1.....N—2.j=2....,N—1. k=31---J\’}
(Li:1.aj:03i:1§---91\7_25j:2""?]v_17k:3‘.'.]\7}

a.:1.aj:1.i:1.....N—2.j:2.....N—1,k:3....A'}

the theorem we proved above:

. 3
A - CL)? : Loss on a1 + Loss on a2

4
+ E Loss on 3rd coordinates on 14.; N

1:]

 

 

 

And
Loss on 3"d coordinates on. 441w
n’ n’
. 000.1v 001 .N .
: 711 ( )+0.0-1/,,.1
n’ + n’ n’ + n’
-000.N 001.N '000.N 001.1:
I
- ”(101w -
+ 0.0 - [ I I # 0.0]
"000w + "DOLN
_ , I , I — . , I I
—- (”0000' A "001.N) + 0-0 ”:11 + l”000.N 7e “DOLNl
where ng‘I-J is the number of (ak.ak+1.ak+2) : (i.j. l) for k? : 1. 2. 3. . . . . A" — 2; 711

is the number of the pairs in the sequence Amie. n1 : (71600.1(, + 77001.10-

Apply the similar ideas on .42. 143 and .44. and plug them into the formula of

40

C1153). we have:

CL)?

Furthermore.

and

mm CLEV)

max CL)?

1 11’ A, ,, A»
N N "000w 77001.N+”010.N "011w

 

"ioow A ”I101.N + ”[110.N A 7"',111.N)
— ' (”711 + ,un2 + W713 + 7nd)
W ' (lnboow 7g "beLNl + lnbiko 74 “011.Nl

l"i00.N ré 71,101.1vl + l"7',110.N # ninwl)

3

I I II I
—- W + W ' ("OWN A "001$" + 71010.1v A "011w

I I I I I I
+ "100.1v A ”-101.1v + ”new A "111.1vl

1 1 _1_ I /\ I I I /\ I
N + - ' N ' ("000w ”'001.N+77010.N "011.11;

A

O!

 

I I I I I
77100.0" A ”101w + "110.0r A 72'111.N)

0.5
—A/_ ' (lnboow 5‘é 71001.1vl + lnblow ié 77011.04

[71,100.2v 5’5 niorrvl + l"'-,110.Iv # 77i11.Nl)

With the discussion in theorem. comments and corollary above. we can see that

using PAP strategy which does not involve any random process. just makes the fore-

casting decision based on the exact empirical data. will be trapped in some special

cases like. the maximum example we just showed in comments after the theorem.

Therefore. reasonably we would like to more innovative idea to avoid the trapping

situations.

46

3.5 PAP+ Strategy in two-extended enve10pe prob-

lem

Similar with what we discussed in chapter 2. after we construct and studied the PAP
strategy in 2-extended envelope problem. we come to consider to play against past
plus present (PAP+) strategy.

Theorem: In this kind of two-extended envelope problem. when the PAP+ strat-
egy is used to do the (N) stage prediction. the Cesaro expected loss sequence for his

PAP+ strategy 3 is Criven by:

 

2. 0.5 1 0." .
01;. 0) 7 N + 7.70.11 A + A 721.10 — w 7 0... + 11....)
where
ngIN :: number of (ak.(1.1.+1) : (i.j) ,fork : .2. 3. . . . . N — 1;
771 I (”00w + ”01.01%
M,” : {number of times of g). : 0.5 II: : 1.2.3... . .711}

(J). is the empirical proportion of 1 as the second coordinate through stage k in the

sequence AN.

"2 : ('77'Il0..~'+"',11.1v)

 

11.1.2 : {number of times of g1. : 0.5 It : 1. 2. 3. . . . .n2}

Here. gt is the empirical proportion of 1 as the second coordinate through stage k in

the sequence B N.

Proof:

Similarly in the proof of PAP strategyfor any sequence a... according to four
patterns in two-extended envelope problem. (0.0). (0.1). (1.0). (1.1). we defined the
four proportions with respect to these four patterns. and also we can always create

set _(LN from sequence aN. where

$1001.02), (020-3). (03.04)...-, (aw—1.611(1)}.

By ﬁrst coordinate in a )air is O or 1. av, can be divided into two subse uences:
1 —J\
ANZ{((LI.CLJ‘)| 04:0. .t:1.2.....]V—1. j:1,2.....]\7}

BN={(ataj)l (11:1, «22:1.2......,N—1. j:l.2....,N}

Therefore. we notice that .4 N (1 EN : (b and AN U B N :_—_ EN.

Sinee any a.. i : 2.3.... .N and the second coordinates in a pair appear once
and only once either in AN or B N. and decision on a.- only depend on {(aj. (1)..) I aj :
a._1.j : 1. . . . .i — 2. A, : ‘ .3. . . . .i — 1} i.e. loss on decision of a. in gNis equal to

the loss on decision of a.- in AN.

Thus.

(2 .
N - CL N) : Loss on a1 and (12 + Loss on second coordinates A 1(-

+ Loss on second coordinates BN

Since we always flip a coin on b1 as the start. Loss on a1:O.5.

Loss on. second coordinates AN : (7160‘ N /\ 72.61. N) — 0.5 - 14.1

48

where.
11,” : {number of times of g1. : 0.5 I I: : 1.2.3... . .711}.

and here gt is the empirical proportion of 1 as the second coordinate through stage

k in the. sequence AN. Similarly. for set B N

\‘

I n b D D ‘ . - , i —’ I I f 0
Loss on. second coordinates BN — (71001.. /\ 71.01. N) — 0.0 pm

where.
,ung : {number of times of g). : 0.5 I 11' : 1.2. 3. . . . .712}.

and here g). is the en'1pirical proportion of 1 as the second coordinate through stage
k in the sequence B N.

In this way. for PAP + strategy. the total loss is

N - CLIENII) : 0.5 + (7160ij 716”.) + "low /\ 7231“,.) — 0.5 - (11,11 + ung)

i.e.

I (2) 0. 1 ,. 0.5
CLIV' (B) :1" 7V— ’l” RFC/IBOJV A 72.61.10 + IIIIIO‘N' A 7I'III.IV) — T . (”111 + “will

C51

Proof is done. [I
The explicit form of the expected loss of PAP + strategy above shows that when

74... and It"? reach their maximum 711/ 2 and n2 / 2. the expected loss approach to its

49

minimum. which is

0 5 1
' I I I I 1 '
N + Efﬂoow A ”01w + ”low A "111,N) " 0-23

 

min CLI3)(I_)) :

and when when 14.1. and [Ln-2 reach their minimum 0. the expected loss approach to

its maximum. which is

r (2) 0-5 1 I I I I
max CLN (B) : TV— + NWWW A "01w + "10.1w" A 77'11.N)-

3.6 PARP Strategy in two-extended enve10pe prob-

lem

As we did in last section for PAPast strategy. we partition the original sequence
a... into two sets AN and BN. according to the previous stage value as the condition.
Then. we extract all the second coordinates of each pair pattern within these two con-
dition subsets. to form two subsequence correspondingly. On these two subsequences.

instead of PAPast strategy. we use PARP strategy ideas.

For each ﬁxed 11. when we forecast the next stage i.e. the (N + 1) stage. in sequence
am. we are going to do the forecast base on the Nth stage aN’s value. If (1N : O. we
will use the subsequence AN from the empirical data set AN. By PARP strategy
idea. we will do bootstrap sampling in this subsequence. and use the majority from
the bootstrap sampling result as our forecast for aN+1. DiHerent from the PARP
strategy we introduced in previous chapters. this is an conditional PARP strategy.
The object of applying PARP strategy in not the original sequence a N any more. but
the conditional subsequence. With this idea. we have to investigate the asymptotic

properties of this new methodology.
Theorem: If we apply PARP strategy on two-extended enveloI:)e prol;)le1n. let p"

50

be the PARP decision. then

 

 

mncotama—30angA+rr(Ir.An',+ n',Aw )
N A 00A OLA 10.A 11.A

T i I ‘ u ' . - (2) , __ I I I I ' . I P' .
v. here A and B a1e constants. RN (p) — 7700.1vA7701. N+7710 NA77‘11.N is the tvx o-extended

Bayes envelope.
Proof:

W'e partition the origi1'1al sequence a into two subsets:

‘4N:{(ai’aj)l (“:09 3i:1$23-..71\r—'13 j:1,2.....x)\’7}
lﬁrﬂmﬂm m=ﬂ,.r=LGqN—L jzta WAG

Furthermore. we take out the second coordinates of each subsets to form two new
subsequences AN and EN. Thus. applying PARP strategy on original sequence con-

ditional on the previous stage. is equivalent to using PARP strategy on A N and 1310.

By Gilliland and Jung (2006), when we use PARP strategy on IN to predict (110+1,

we will have two constant A1 and B1 such that.

 

~ 1
CLn] (14:17)”) .<. _
n1

, N N
(‘41+B1 I nl'gnlA(1—gnl))
where CLn1(/I. p") is the Cesaro loss on the random past strategy for sequence AN.
I
"‘01.N
/ , I 1
"01 ,N ”00. N

 

N _
gnl —

7161.1.) : number of (0. 1) in. [I N-
7160..., : number of (0.0) in 47.1.

r ... , I I ' , g r. . ~
and 721 — ("01w + 7100.10). 1.e. the number of pairs 1n AN.

Similarly. if aN : 1. i.e. our Bayes response would be based on the set. BN instead

of AN. and when we use the random past. strategy on the sequence B N. \N’e have A2.

B2 such that,

 

~ 1 , r
CL..2(B.p*) s 7012 + B2 - 712-932 A (1 - 932))
‘II-

I
"11.N
/ ../ ‘
"10.I\7+”11.N

 

7
where 9,122 :

77,10.N : number of (1.0) in By.

"7"11.N :2 number of (1. 1) in BIN.

_ I .-I
and n2 — (nlLN + 72.0.1.7).

Since CLn1(.Zf.p*) is the Cesaro loss with condition aN : 0. and C Ln2( B .p") is

the Cesaro loss with condition a N = 1. the total Cesaro loss should be:

CLN(_a. 19*) : P(aN : 0) - CL.,..(.?1’. p*) + P(aN : 1) - 013.2(5. p")

I I \
("01.J\'+"11.!\7)
I\'—1

I . I
("00.N+"01.N)
N — 1

 

 

By empirical distribution. we use to estimate P (aN : 0). and

to estimate P ((110 : 1).

Thus.

N - CLN(_(_1_,.p*) : (726m; + 71.60.10) - CL.,.1(A.p*) + (“/1001 +"1i1.A’)'CL'vz2(§-I’*)

52

7 2) .
Since the extended envelope RI. (7)) : 7700.1v /\ 7161. N + 77,10.N /\ "’11. N. We have

7 t (2 ~ .1
1w cam > — at» 7 (tn—ti + It...) 012.11.71.17 > — A
+ ("liarxr + ﬂirty) ' Cva2IEsI)* — nian A ”itNl
= KNEW + 060.10 - (CLMAJI‘) - 93'} /\ (1 - 931))

+ (In/10$} + ”ILA’) ° (CL,;2(B,I)* — 971:2 A (1 _ 971:2)“

 

l/\

l ,- I
("01.N+ ”baN) ' Eff“ + Bl ' 771 ‘95:] A (1 " 921))

 

I , _
+ ("iaN +77'i1.1v) ' 5042 + 32' \ "2 ‘ 93:12 A (1 "' 92-2))

 

|/\

(741 + Bl ' "bow A "-01.10

 

+ (.42 + B2 - "iaN /\ 72/11...)

Let .4 : .41 + .42. and B : max{B1.B2}. then

 

 

 

N - CLA’(Q~I)*) ‘ RISRPII S A + B ' f\/’"00.N A "brN + "iaN A Him?)

for all Player I move sequence a and all N (N > 1).

Proof is done.l:l

. '2 .
Comments 3.6.1: The theorem above shows that ICLN((_1. p") — HEN-KP} /.\’I has
a uniform bound which is ()(N “1”?) in Player 1 sequence of move.i.e.. the Cesaro loss
of PARP strategy in two-extended envelope problem converges to the Bayes envelope

with convergence rate 0(N’1/2).

Example 3.6.1 Suppose Player I is playing 0 or 1 for each time. we suspect
that Player I‘s move at each stage is effected by his move on last stage. Therefore.
it is reasonable for us to form this forecasting problem as a two—extended envelope

problem .

Table 3.2: Two-extended envelope problem

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

stage k a). 7);. C'LI?)(_a.p) p; CLII2)(a.p*)

1 1 0.5 0.5 0.5 0.5

2 1 0.5 0.5 0.5 0.5

3 0 1 0.667 1. 0.667
4 1 0.5 0.625 0.5 0.625
5 0 0.5 0.6 0.5 0.6

6 0 1 0.667 1 0.667
7 1 0.5 0.643 0.5 0.643
8 0 0.3 0.604 0.26 0.595
9 1 0.67 0.574 0.74 0.557
10 1 0.25 0.592 0.16 0.586
11 1 0.4 0.592 0.32 0.595
12 0 0.5 0.585 0.5 0.587
13 1 0.75 0.559 0.84 0.553

We observed Player I ’s moves through 13 stages:
gN : 1 1 0 1 0 0 1 0 1 1 1 0 1

We apply both PAP and PARP strategy after each stage to make the decision for
the next step. After the 13"" stage. we collect all the results in the table 3.1 showing
where Player I move a)... the PAP decision from Player I’s past move is p. the PARP

decision is 1 *. and the ex )6Ct-Gd Cesaro losses for both PAP and PARP strategv are
P I 0.

also listed in this table.

 

Chapter 4

Discovering Hannan

This chapter gives the results of a search of the literature that had the goal of ﬁnding
published results that were in Hannan (1957) and not recognized as being there.
we ﬁnd sexeral examples. To some extent. the cryptic style and notations of I-Iannan
(1957) explain the failure of other researchers to fully exploit the Ilannan work and to
recognize the speciﬁc theorems that he proved. Motivation for this search is provided
in part by Gina Kolta’s New York Times article Pity the Scientist li’ho Discovers the

Discovered. February 5. 2006.

In Section 4.1. we discuss some of the Kolta (2006) article and mention that Chen
(1997) established the direct connection of the Foster and Vohra (1993) result to
Hannan (1957). In Section. 4.2 we show that the main result in Feder. l\/Ierhav and
Gutman (1992) is contained in Hannan (1957). In Section 4.3 we consider for the ﬁrst
and only time the ﬁnite horizon version of repeated play (see Section 1.2) and connect
results on minmax regret found in Cover (1967). Chung (1994) and Cesa-Bianchi and

Lugosi (1999) to Harman (1957).

4.1 Foster and Vohra: Selecting Forecasters

Kolta's lead paragraph mentions the Foster and Vohra (1993) paper “A Randomized
Rule for Selecting Forecasters.” The strategy proposed in the Foster and Vohra paper
has the structure of a Hannan strategy. much in 21.}_)1_)earance like those covered by
his Theorems 3. 4 and 6. (The structure of Hannan-type strategies was explained in
Chapter 2. Section 3.)

Hannans theorems claimed and proved the conclusions for strategies built on be-
ing Bayes versus random perturbations of the nmltinomial empirical counts (t— 1)G,7_1
of Player I’s pure moves in repeated play of a. game where Player I has 771 possible
pure moves. Chen (1997) reexamined the Harman theorems and proofs and showed
that the proofs actually cover the case where the empirical distributions 01-1 are re-
placed by the empirical distributions of randomization distributions taking values in
the probability simplex in R’". A randomization distribution x is a probability distri-
bution over the m pure moves and (t— 1)GI_1 is replaced by Xt_1 2 111+172 +° - -+.rt_1.
Then Chen (1997. Section 4.3) shows how the Foster and Vohra strategy is a I-Iannan
strategy so that bounds on its regret and asynmtotics are a direct. consequence of
her reinterpretation of the Hannan theorems. Following Chen's work. Gilliland and
Hannan (1999. 2008) improved on Chen. mainly through the demonstration of good
bounds and Harman consistency for strategies in the repeated play of the dual of the
S-game. This component easily subsumes the expert selection problem considered by
Foster and Vohra without a. weakening of bounds (larger constants) that is inherent
in Chen's approach.

Vohra. is quoted in the following paragraph from Kolta (2006) in which Ilannans

name. is misspelled:

In 1957. for example. a statistician named James Hanna called his the-
orem Bayesian Regret. He had been preceded by David Blackwell. also
a statistician. who called his theorem Controlled Random Walks. Other.

56

later papers had titles like “On Pseudo Games”. “How to Play an Un-
known Game”. “Universal Coding” and “Universal Portfolios” Dr. Vohra
said. adding. “Its not obvious how you do a literature search for this
result.”

As mentioned previously. Hannan and Blackwell used different approaches in the
construction of their strategies. and it is likely that neither one named his theorems.
Moreover. the term “Bayesian Regret” was probably not ever used by Hannan. “Con-
trolled Random Walks” is the title of a talk and a subsequent proceedings paper by
Blackwell (1956) that give a general result that can be applied to produce Hannan-
consistent strategies. “On Pseudo Games”. “How to Play an Unknown Game”. “Uni-
versal Coding” and “Universal Portfolios” denote different. general but related topics.
The quoted paragraph might leave the false impression that exactly one result. or

theorem has been given the different names.

4.2 Feder, Merhav and Gutmaanniversal Predic-

tion

Feder. Merhav and Gutman (1992) considered the problem of predicting the next
stage of an individual binary sequence using ﬁnite memory. And in the section III
of this paper. they gave out the deﬁnition of this predictor. which is called “S—State

Universal Sequential Predictor.” in the following way:

A. “0”. with probability 0503(0)).
At+1 :-
“1”. with probability q)(ﬁt(1)) : 1 — (7.003(0)).

57

where (b(-) is given by

 

f
0. Ogo‘<%—e.
(..- 1
(MOI—i EIo—éI—tﬁ. é—cgog%+c
I1 é+c<o£1

They allow 6 to depend on t. i.e.. c 2 6.7,. and use 771(1’1‘) to represent the expected
fraction errors made by this scheme over the sequence 1*" where the expectation is
with respect. to the randomization in the deﬁnition of .7177. 773L141?) is our Cesaro loss.

In the major theorem. they proved that. for c 2 c. I 1/(2\/t + 2).

where 61(71) : 0(1/\/n).7.'1(.r’f) : 1/n ~ min{Nn(0).N..(1)}. and N..(()) and .’\-'.,_(1)
are count of zeros and ones. respectively. along the sequence 21:. Thus in their major
theorem. they showed that the expected error converges to the simple Bayes envelope
with convergence rate ()(1/ V5).

However. we can show that their strategy is equivalent to a special case of Han-

naii’s strategy. Recall a Hannan—type strategy in matching pennies problem. which

is discussed in chapter 2 with Player II's predictor:

Os Z2 — Z] _<_ 1—2:_11
0(1 — 97—1791.) : 1—29.‘
1., Z2 - Z > —_h_:-:—1-

where 97-1 is the proportion of 1‘s in Player 1‘s play from stage 1 to stage t—1. In

another words. Player II's predictor is

.. .. . . 1—2 _ .
£7 “0". With probability P(Zg —— Z; S ——Tg:——1)).
. 1+1 :

1 9-, . . . 1—2 _
“1". With probability P(Z2 — Z] > —I:’—1-).

Now we can show that Feder. I\:'Ierhav and Gutinaiis strategys probability of

58

playing 1. 0502(1)) : 1 — 0702(0)). is the same as probability of playing 1 in Hannans

1—2
strategy P(Zg — Z1 > Iii—1 ) with a speciﬁed distribution for Z : (21.22).

 

Take Z1 to be a random variable from uniform distribution on [0.1]. let Z; be
degenerate at 1/ 2. and let o :- 1171(1) the proportion of 1 in empirical distribution of

Player I moves from stage 1 to t-1. Then. we have

0. 0_<__o<%—
'(')—P(Z<1+1('—l))— 1 1 1 1 1
(P(l— 1 g 2_:‘0 2 — zIo:—§I+-2-. 5—ESOS§+€.
1. 5+c<a<1
and letting a = 91-1. ~21: : 32;. i.e. h. 2 45. we have
1 1 1 1 2 1
PZ<- --'—-— :PZ__<_:__
(1 2+2€fa 2)) f 1 2 htfa 2))
1
: PZ ——<—2 _ —1
(1 2 htfgtl I

1
: P(Z1 _ Z2 < ~2—(2g._1 — 1) where Zg =
't

IQIH role-7

1
: P(ZQ — 21> I—(l — 291—1) ”(Uh-6TB Z2
I-t

This shows that probability of playing 1 in the Feder. l\«'Ierliav and Gutman's strat—
egy is the same as the probability of playing 1 in Hannan’s strategy where 21 ~
U nif ormIO. 1] and Z2 is degenerated at 1/ 2. Furthermore. since probability of play—
ing 0, Prob(playing 0):1-I-’rob(playing 1). we also have (1902(0)) : P(Zg 7- 21 _<_

i“ — 2gt_1). which is probability of playing 0 in Hannan‘s strategy.

Therefore. the Feder. Merhav and Gutman(1992) strategy is the same as a. Han-

nan’s strategy (1957).

59

4.3 Minimax Regret

In this section and only in this section we consider repeated play with a ﬁnite horizon
N known to Player II in advance of the repeated play. Player II‘s concern is with
regret at stage N and 11’s sequence of recursive functions 7710 can depend on N. There
is a considerable literature on predicting individual sequences in the context. of ﬁnite
horizon repeated play and minimax regret strategies. A minimax regret strategy can

be constructed from backward reduction starting at. N. Since

N—l
NDN(a.7_I) : Z L((L1-P1(Q¢_1)+ {L’(aN-pA’(QA’—1)I— NEIGNII

i=1

p10 (a N_1) can be chosen to minimize
max{L(aN.p;(r (gN_1)) — NR(GN)IaN E A}.

Then the resulting max is added to L(aN_1.pN_1(gN_2)) and pN_1(gN_2) chosen
to minimize the maximum possible total over all aN_1. Continuing in this way. a

777.11! mAI , rah/I

minimax regret strategy pm“ : 091 . p2 ..... p1. ) is determined. The minimax

regret strategy results in constant. regret across all sequences _a and that common

value is denoted bv DWM and is called minimax regret.
u A o

Hannan (1957. Section 4) constructed minimax strategies for the ﬁnite horizon
repeated play of a game where Player I has m moves. He illustrates his results with
an example of “matching Iii-sided pennies.” The translation of his minimax regret to

redictin a binar r set i - r 'r s t' kii 0" ' — ‘7 his —- — l and notincr tl"'t
p. ,. g. .y..111eice equie. a 10771—2. .pl—p2—2. - gia,
his loss of 2 comes from a match not a mismatch. Then the translation of Hannan

(1957. (8). p. 115) gives.

A7
NDIGM : “; — E min{Y. N — Y} (4.1)

60

whereY ~Binomial(A %)

PrOposition 4.3.1. For the repeated play of matching pennies with a ﬁnite horizon.

mr‘l! 771.~'\I_
2k+1 — _2D-

 

: iii)” :1), (4.2)

where Y‘k ~ Binomiat (211:. 1/ 2) Furthermore. DmM is a decreasing sequence with

limitO. V2ADmM~1/\/2r. and

«I oT’._1 +od) am
J27; Ii

Proof: Hannan (1957. Theorem 2. p. 111) develops an asymptotic lower bound
for DIEM in the general case. that he considers. that. is. in the repeated play of an
m x 77 game. The constant h in his Theorem 2 bound can be shown to be 1 for
matching l\K()-Rl(1(‘d pennies with the loss matrix we use. so that Theorem2 implies

that 111111an N1/2D70M > (2 7)l/1_/2
“e develop an expression for DmM. Since min{Y.A —Y }— — N — max{Y A —Y }.

(4.1) can be expressed by:

0T4: W Emmﬂ N—i}— (in

[\DIH

A calculation using the syn’imetry of the distribution B (N %) about A / 2 shows that

with N : 211? + 1.

2k+1. ifY:0. 2k+1
211:. if Y : 1. 2k.
2k — I. if Y :

I‘Q

2k — 1.
niax{Y. 211‘ + 1 — Y} = <

k+1. ifY:k. k+1.

 

61

Therefore.

 

 

 

 

 

 

1—
, , , 1 2k+1
Emax{1'.2Ar+1-—l'} : QIWZ< j )‘(2k+1—J)
j:0
k
1 2A? 1 l
:— 2. A Jr) (2k+1—_])
22H1 , ]!(2A ——J)l
3:0
_ 2k +1 A (2A)!
7 2k ,1 - ,._ t
1 1 u
— (2A‘+1)-(§ +— 2’P(} — —A)).
where Y" ~ Binmniat(2k. %) i.e.
r _ ,9" I, __ .,. 1 1(2A')!
E111ax{1,,.A+ 1 —l } —— (2A + 1) - {5 + 24kklk'}
And with N : 2A7, similarly we have
, , , A , (2A7!)
E111ax{1.2A —1}: 2A {% + 24*A‘A'}
It follows that minimax regret is given, by
1 (2A.) 1
mAI mA/I__ /* __ .. '
D2A+1:D2k :5 4‘Ak'k -§P(3 —Al» (4-0)
where Y” ~ Binwnial(2k, é). A". : 1. 2. 3, . . ..
Furthermore, since for all A: : 1. 2 DmM > 0. and
WM 1 , 2(k-H)!
D2<k+1)__ 2 4Z’(k+1)!(k+1)! __. 2117+] < 1
D7271)” — _1_ - 2M“)! _ 2L + 2

2 472(k)!(k)i
This shows Dim“ is smaller than DT’ﬁM i.e.. DL’";M is a decreasing se uence.
2(A+1) 2A 0

62

From the Stirling Formula it follows that

1
VnAI ,
DN 27." N ' (46)

we deduce (4.3) from the well-known result. on the rate of convergence of the

Walli's product sequence to 7r/2, i.e..

7r 7.“ 1
sz————+0— , ask—+00
2 8A". (A?)
, 77- .. . 1 . _. ,
where Vlk : 7}, m. 2 4113—2, ak : %. (For example, see H1rscl'1l1orn (2000).) The
(1" ' ". .
A

sequence {GA} was encountered at the end of section 2.1 and was studied extensively

in Frame and Gilliland (1985) where a continued fraction representation is found.

Note that.

 

 

Then «2w - V21: . 03;.“ -1: 53’s; - 7% — 1.
And we claim that

1 A 1
\/2A- DE’EM — — S —, where A : 0(Z).

 

Since
2A”? 1 7r
‘5 — 1 g —— — 1
ZAC-l-l 77k 77k
- - . . E _ l ' -1.. — 1
Proof IS done. If we show that. J77 —— 1 + 0(k). i.e. T; —— 1 + 0(k).

6 3

Since Wallis’ sequence,

 

 

. 1
l’l-"k : g — g + o(;) as A ~ 30
and Wk. : 7—3— then we have
77 1
77A 71' — 21—1: + 0(Z)
i.e.,
Wk 1 1
— — 1 — — — .
7.“ 4A‘ + 0(Af)
In another form,
7." _ 1
m. _ 1—1/(4k)+ 0(1/k)
1 — 1
I 1 + at 1
4A“ — 1 + 0(1)
1
: 1 + O(—)
A".
Proof is done. El
Here is a table of initial values of minimax regret. Recall that. D322: 1 2 $1M . and

note that 1 /\/ 27r : 0.39894.

We have given. a simple expression for the minimax regret. There is interest. in
the minimax regret strategy for Player II, that is. the strategy pmM , that minimizes

the maximum regret. for the finite horizon N.

For the simple case. that. we are considering, the strategy can be deduced by
. . tr- , . '.» :9 : :21
specializlng Hannan (1)01. (4). p. 114) to the bmarv case 711 .. vs 1111 pl 7); .2.
Also, the Hannan loss is 2 for a match. so that. his Bayes envelope and procedures

must be reinterpreted. His y} is our 1737”“. Harman (19.37, (4). p. 114) written for the

64

Table 4.1: Convergence of minimax regret

 

 

 

 

 

 

 

 

 

 

 

N DTM WET”
1 ;— 0.5
2 31 0.35355
4 1% 0.375
T
S :3: 332233
256 '
10 561—9; 0.38911
12 3% 0.39073
14 1%; 0.39139
16 565% 0.39275
00 0 0.39894

 

 

 

 

 

binary sequence case that we are considering is

j—l j—l

10mg.-.) = $11 + EH<<Z a. + 11'1/N — Em: a. + 1" +Y1/N111 (4.7)

t2] 121

where Y ~ Binomial (N — 13%); and Y“ ~ Binomial(1, %) are independent. Expec—

tation over Y" results in

I)?” (91—1) = éll + ER“: at. + Y)/N) — R“: at + 1 +1”)/N))l {4.8)

Since R(7r) 2 n1in{7r.1 — 7r}. we see that

‘ ’ ' i '—1
j—l 13 If )I Z [\l/Z "" 22:10,!)

j—l
R((Za1+Y)/N)—R((Zat+1+YVN)= 0, ifYZN/‘Z— {32-1
t—l

2'.
{.21 .
—1, if 1" g N/2 — J_1(I¢- 1.,

tzl

65

It follows that

. 1 1 1‘1 1“
PTM(ﬂJ-_1l = -2- + 51130” 2 N/2 — a.) — W s N/2 — 1 — Zn.) (4.9)
i=1 i=1

where Y ~ Bin.0mial(N — j, %) ifj = 1.2, . . ..N — 1 and Y is taken as 0 ifj = N.
Note that p713“ (gN_ 1) is simply the PAP strategy, that is, at the last. stage N, the
Player II minimax regret strategy plays the Player I majority choice in the ﬁrst N — 1
stages.

In the minimax strategy, is the probability used for playing 1 is larger than the
probability in Hannan’s strategy and in the PARP strategy. Figure 4.1 and 4.2 show

these three probabilities for N =5 and N210.

Table 4.2: Player I’s play sequence with N25.

 

stage12345
(1.11110

 

 

 

 

 

 

 

 

 

Figure 4.1: Hannan, PARP and Minimax probability for N =5

 

0.9 -

0.8 -

 

 

Harman Probabilty
0.7L —— PARP Probability
Minimax Probability

 

 

 

 

0.6 ~
0.5
Ml
0.3 -
0.2 ~

0.1}

 

 

 

4.5 5

—L
_L
or
re
01
co
co
m
bk

66

Figure 4.2: Hannah, PARP and Minimax probability for N=10

 

   
      

 

 

0.9 —
0.8 -
0.7 t
Harman Probabilty
0.6 — PARP Probability ~

 

 

Minimax Probability

 

0.5

0.4 -

0.3 -

0.2 ~

0.1 ~

 

 

 

Asymptotically Hannah’s probability is smoother than probability in PARP strategy,
and Minimax probability is bigger than both of them as shown by Figure 4.3. We
generate a binary sequence with N=100 from Bernoulli(1,1 / 2) as Player I’s play se-
quence and the asymptotic behavior of Harman, PARP and Minimax probability are
illustrated by ﬁgure 4.3.

Cover (1967) develops many interesting results concerning strategies for predicting
binary sequences. Cover measures the performance of strategies by gain through the
number of matches (not. by loss through number of misses) so his regret. is the negative
of the one we consider. The display following his (4.13) is of the minimax regret value

4.4) with the asymptotic result. DmM ~ 1 v27rN noted. Cover 1967, 4.11)) gives
N

Table 4.3: Player I’s play sequence with N=10.

 

stage12345678910
Q1111001101

 

 

 

 

 

 

 

 

 

 

 

 

 

 

67

Figure 4.3: Hannan, PARP and Minimax probability for N=800

 

i

 

 

 

 

 

 

 

 

 

0.5 Hannan Probability ‘
PARP Probability

0-4 ’ Minimax Probabilty ‘

0.3 .

0.2 ~ 1

0.1 . ‘

 

 

l 1 l l l l

1 00 200 300 400 500 600 700 800
N

 

the minimax regret strategy, which we will show is the same as derived by Hannan

(1957).

His predictor achieving 3‘ is:

 

And it follows that Cover (1967) (4.13), s can be speciﬁed as the simple Bayes enve-
lope, i.e.

9(7)) = max{r),1— n}, 0 < 7] <1.

Let i = A? — 1, then

N—k

A 1 1 _. Ak+j Aka—1 N—A‘
.,7_1:_ N._N’~+1 .____ __ .
pill 1 2+ (,1 [or N) s( N 1)( j )

i=1

68

where
. A+j . N .. , A
..k+-7 .. ifg—ASJSA—A

 

1—5ﬂ, if0§j_<_%'—k,

. k+'—1 - N+l a ,' 7 ,.
§(A"+j-l.): ‘13?“ lfT—ASJSJV—A
A 1—ﬁii rogjg%§—t,

Since j is an integer, based 011 the fact. that

Ak+j—1 L ﬁ%¥—k3jgN—k
A.l—a—jV—ﬁ—

 

.‘ . N ,.
—11 lfOS]:-§—A,

we have

A . _ __ 1 1 ,, N—n 11M “ N—It 1.1%.
mil 11—2+§l§; (j)(2l (j)<211

 

where Y ~ Binomial (N — A, 1/2).

This shows that Cover‘s strategy fir.(i) is the same. as Harman’s strategy (4.9), if

the envelope is the simple Bayes envelope.

Chung (1994.) and Cesa—Bianchi and Lugosi (1999) studied sequential randomized

prediction for an arbitrary binary sequence. In latter with n replaced by N, the

prediction at each time t : 1.2.3... . , N, it. is given by
t—l 1 1 . - t—l 'IN-t ' ,I,—l XIV—t ‘
My ) = 3 + 3El111fLrtg 0L ) —11;fLF(y_ 12.. )l (14-10)

t—11}/N—t

_ _ , I_ . .
3 IS the set. of experts, and yt IOLA t and 1; represent. the followmg sequcl'lces

69

respectively:

7

ylsygiu'1yI—1i052’2‘°~-32,N

and

y1,y2.....yt_1.1,1"t,...,1’N

where the 1j are independent random variables which follow Ber'n,ou.lli(1/2).

In Chung (1994) with T : N, it was expressed as

__ 1 1 _, _ , r _ ,_ , 7
1?er 1) = + §L[n1;ax (1&1th 1,0,1'til)-111;;iix<I>,-yy(yt 1:112'r1i1ll (4.11)

ml

where 111ax,-<I>,-,N(g) gives the maximum pay-off for sequence s among i experts given

the total stages N.

Since these two predictions are essentially the same, we take Cesa-Bianclii and
Lugosi (1.999) strategy as an example and specialize it to the simple Bayes envelope.

Then

1, ifSZN/2—(t—1)-g,_1.
ilgf .Lp(y“10YN“)—ilgf LAT—113W“) = 0, if S 2 M2 -— (t — 1).q,._1—1/2.
15' a t ,
—1, ifSZN/Q—(t—I)'gt—1—l.

where S : :1; ~ BinomiaMN—t. 1/2), since Y,- ~ Bernoulli(1/2), i : t+1. . . . , N,

and gt_1 is the proportion of 1 from stage 1 to stage t-1.

In this way. (4.10) is converted to
_ 1 1 , . , .
PinM(?_/t 1) : §+§IPI5 _>_ A"/2—(t—1)-g,_1)—P(S S AI/2—1’At‘1l'gt—1) (4.12)

where S ~ Binomial(N — t, %).

This shows that if we specify the envelope to be the simple Bayes envelope. mini-

max regret results i11 both Chung (1994) and Cesa-Bianchi and Lugosi (1999) are the

70

same as found in (4.9), Hannan (1957).

71

Chapter 5

Expert Selection Problem

5.1 Introduction and Review

Nowadays. all kinds of consulting services are booming. especially in ﬁnancial services.
There are many ﬁnancial companies and agencies giving advice everyday to all kinds
of investors. They are using different and complicated system or algorithms to analyze
the ﬁnancial market of different ﬁnancial products and to forecast the market of these
products. As experts with experience and knowledge in ﬁnance, each of them is trying
to persuade the individual investors to take his / her advice. However, surrounded by
so many experts” advice, as an investor. how can one make a decision? This is called
erpert problem.

Littlestone (1988) generalized the earlier researchers idea to an arbitrary set of
experts. However. in his strategy. randomness is not included in the forecasting
process. His strategy concerns picking the the expert whose forecasting record is the
best, as the best expert in the set of experts. and using this best expert‘s prediction as
the ﬁnal forecast. He showed that as long as there exist one expert whose forecasting is
correct in all stages. the ﬁnal decision maker will not make more than tog/2 N mistakes.
where N is the total nu11‘1ber stages.

To remove the restriction in Littlestone(1988). i.e.. to consider the case that among

72

all experts there is no expert always correct. Littlestone and \N’armuth (1989) intro-
duced the idea of weighted majority algorithm. In this strategy, they assign a weight
to each expert. Once an expert makes a mistake in forecasting. he will receive a
penalty: his new weight is old weight multiplied by k, 0 < A" < 1. i.e.. to reduce. the
weight. on his advice in our ﬁnal decision. Furthermore, \N’armuth and Haussler, et
a1 (1993) considered the weighted majority strategy in the situation that each ex-
pert’s forecaster is a probability distribution 011 set. {0,1}. 111 all these strategies.

randomness is not involved in selecting the experts actual forecast.

Hannan in 1957 ﬁrst. proposed the idea of bringing randomness into sequential
forecasting proble111s. In Hannans strategy. a random factor is added to the empirical
distribution. and a predictor based 011 this adjusted empirical distribution is used as
the forecaster for next stage of the play in a repeated game problem, as we have
introduced in previous chapters. This idea can be introduced into expert. selection

problem.

Foster and Vorha (1993). proposed a. randomized rule for selecting experts. They
ﬁrst proposed this expert actual selection problem instead of predicting a probability
distribution of experts set. or combining the experts" advices. However. the random-
ized strategy they proposed is equivalent to Hannans strategy, which was proved by

Chen (1997) and improved in Gilliland and Hannan (1999. 2008).

By using bootstrap sampling, we introduce the PARP strategy to the expert selec-
tion problem, especially the two experts selection problem. which will be discussed in
section 5.3. Section 5.2 will introduce an example of usage of expert selection in the
real world. a 111ethodology called focus forecasting. And at the end of this ("l1a.j.‘1t(-*r. a

simulation example of using PARP strategy in ﬁnancial forecasting is given.

73

5.2 Focus Forecasting

5.2.1 Introduction

In inventory management. forecasting is essential. As a new concept of forecasting.
the term focus forecast was raised by Bernard Smith (1978). With around 30 years of
usage so far, this method which is described as a heuristic 111ethodology and it is used
widely in industrial area. Over 800 companies in 47 countries worldwide are using
Demand Solution which is designed around focus forecasting in their i1‘1ventory man-
agement request. And this method is described to be a simple simulation approaches
to 01_)ti111ization. to be more practical. more easy to understand and a simple system
to work.

Focus forecasting constructs a pool of alternative decision rules for forecasting one
stage ahead. At. every stage. all the decision rules or models in the pool, are tested by
the empirical data generated before this stage, and the rule with the smallest error
in selected for the decision.

Therefore. focus forecasting simulates every time it forecasts. It is a dynamic
simulation. It uses a computer to simulate every time, and compares the errors of all
the rules, to pick one to use in the current forecast. Regardless the seasonal or trend
type of time series data, focus f(_)recasting itself just picks the one best strategy l11ased
on the e111pirical test against recent history data.

In inventory management. the traditional method is exponential smoothing, which
is taught to almost every student in inventory management and is still the most widely
used forecasting method in the world today. However, focus forecasting doesn't use
the exponential smoothing to approximate moving average. The reason Smith states

in Bernard Smith (1978):

In those early computers, storing a twelve moth inventory history to calcu-
late a moving average was expensive, inaccurate, and dangerous. So Bob

74

Brown used exponential smoothing. . . Focus Forecasting doesn’t use expo-
nential smoothing to approximate moving average. Why? Well, comput-
ers today don"t make mistakes. They are nearly 100 percent. accurate. ..

However, Gardner and Anderson (2001), compared the focus forecasting and expo-
nential smoothing showing that exponential smoothing is substantially more accurate
than the Demand Solutions approach. Although there are some criticism on focus
forecasting in academic ﬁeld, we still can notice some interesting ideas in focus fore-

casting, which is Flay Against Past strategy’s idea.

5.2.2 Methodology

Focus forecasting constructs a pool of decision rules or strategies. Some of these
rules are designed for recognizing trend. some of them are designed for recognizing
seasonality.

For example, ‘whatever the demand was in the past three months will probably be
the demand in the next three months‘, this would be a rule to recognize trend instead
of seasonality. Vy'hile if a simple rule as ‘whatever percentage increase or decrease we
had over last year in the last three months will probably be the percentage increase
or decrease over last year in the next three months“. would be a rule of recognizing
seasonality.

Gardner, Anderson—Flether and Wicks (2001) listed the seventeen decision rules
included in Demand Solutions. And there rules are functions of the previous quarterly

data:

1. Next quarter will equal last quarter.

‘2. Next quarter will equal last quarter plus a growth factor.

3. Next quarter will equal the same quarter a year ago.

4. Next quarter will equal the same quarter a year ago plus a growth
factor.

5. Next quarter will equal the average of the last two quarters.

6. Next quarter will equal the average of the last. two quarters. plus a

75

growth factor.

7. Next quarter will equal the average of the last two quarters with the
last quarter double weighted.

8. Next quarter will equal the last quarter plus the difference of the
corresponding quarters last year.

9. Next quarter will equal the average of the last three quarters, with the
last quarter double-weighted, and which seasonal adjustment.

10. Next quarter will equal the average of the same quarter in the last.
two years plus a growth factor.

11. N ext quarter will equal the average of the last quarter of the current
year plus the difference of the corresponding quarters from the last year
plus the difference of the corresponding quarters from two years ago.

12. Next quarter will equal the average quarter of the last year.

13. Next quarter will equal the average quarter of the last year plus a
growth factor.

14. Next quarter will equal the average quarter of the last two years.

15. Next quarter will equal the average quarter of the last two years with
seasonal adjustment.

16. Next quarter will equal the average quarter of the last year plus the
change from the average quarter two years ago.

17. Next quarter will equal the average quarter last year, plus the change
from the average quarter two years ago. with seasonal adjustment.

Then, during the simulation procedure. an error of measurement. for each strategy

for each time will be computed. and the final forecast strategy is selected among these

decision rules.

From these decision rules’ definition. we. can easily notice that although the final

decision. is selected among these rules, ﬁnal decision is a function of the history data.

i.e. past data. since all the decision rules are function of past data. In another words,

focus forecasting is using Play Against Past strategy to make the decision.

From our discussion about. PAP strategy and its failure. we could see that there

are some situations in which focus forecasting fails.

76

5.3 Two experts selection problem

Suppose there are two experts who give out 1.)redi(i~.tions against the market each day.

Their errors probabilities are recorded at the end of the day:

Expert 12X]. X2. X3,... Xn_1, Xn....

Ecrpcrt 2:3”1. Y2. Y3.... Yn_1. 3%....

where we assume the errors are bounded. Without. loss of generality we take {A1}

and {3’2} 6 [0.1].

Selecting an expert for each stage is repeated play of the (:ton'iponent game where
Player I selects a pure 0. : (2", y) E [01]2
Player II selects a coordinate b E {1, 2}

and the loss function for Player 11 is
L(a.b) : X[b : 1] + i"[b : 2].
If 77 is a probability distribution on {0, 1]2, then the Bayes risk of b is

2].

L(7r.b) : /L(a,b)d7r(a) : E7..(X)[b : 1] + E7,(Y)[b
Bayes risk is any choice I) to minimize thus.

1‘ If EWLX) < E74)?
5 : arbitrary. if wax) : E43,)

:2. if EN(X) > 13.0?)

77

and minimum Bayes risk is

In repeated play. Player If uses b,,(g,_1) for stage t, t : 2.3.... and the simple
envelope is YA,- /\ FA.- where (TY—Nine) is the average of the (X1,Y1), (X203). . ..

(XN a YN )-

5.3.1 Reduction to Z-problem

Let Z,- : X, — 3'2, then the Bayes envelope of two-experts selection problem RN :
XN /\ YN is equivalent to EN = Z N /\ 0 + YN. Thus. two experts’ forecasting error

SEQUGIICGS are equivalent t0:
£221, 22, Z3,... ZN_1, ZN~°--

and the zero sequence since 3'2; are fixed and given by the past history data. In this
way, original two experts problem is transformed into a one-dimensional game.
Each term of this Z-sequence can be arbitrary number from -1 to 1, then by

previous result,

N N
N ' DN(Z.I)(ITI)) 2 :(Pk ' Zk) — (Z Zk) /\ 0.
L21

k=1
where B is the probability sequence P1, P2...., PN. Pk : Prob(7;_1 S 0). for
If : 2,3....,N, P1 :arbitrary number from [0. '1], and 72.4 is the average of the
random sample from {21, Zg, . . . ,Zk_1}.

Let 0 S AN T and 0 _<_ BN T be such that.

—AN S 1V ° DN S EN.

78

If we remove all the 0 terms from this Z—sequence to form a new sequence Z which is
the subsequence of original Z-sequence. let mN _<_ N be the number of the non-zero
terms, also be the number of the terms in sequence Z.

Then. consider P is the probability through the past summation, we have

~

—4 :N-DN<.Z_..B>:mN-D (2.13338

4 ﬂlAr m.]\.' mAr-

Therefore. we have a stronger bound for original regret:

'24 S N ° DNLZ-E.) S B

77 .
I ‘f\_f

m]\.' '

When {A}, {3’2} 6 {0.1}. Z,- 6 {—1,1} which is equivalent to {0,1} matching
binary bits problem. In fact. any two state game.i.e. two Players’ action set is {(1. b},

is equivalent to {0. 1} matching binary bits problem.

Lemma 5.3.1. [fin a repeated game Zk E {—l,l}, Pk E [0.1], L(Z. P) : Z . P,

then the regret of this game is equal to the regret of matching binary bits, i.e.,
DMZ-B) : Didi-.3)

where X 6 {0,1}, P E [0, 1] is the matching binary bits problem.

Proof: By the deﬁnition of the loss function.

—Pk if 2,, : —1

kazk~ Pk) :
Pk if 2;, : 1.

Where Pk : Prob(7k_1 S 0). And from matching binary bits (Tame, X E {0.1}.

P E [0.1], and

Pk if Xk : 0

L:.(Xk.Pk) : {
1— Pk if Xk : 1.

79

 

where P). : Prob(fhl 2 é). Then, let qk : 1 — Pk : Prob(Xk_1 g é).
[4“st Pk) : Xi- ' (1k +f1— Xk.) ' f1 ‘ (1k) : 2X1.- ' (1k +1— (1k + Xk-
Since there exist the one to one transformation mapping:
. 1 , ,
A :72-(Z+1) or Z:2.X—1.

Then 7;, : 217k — 1, i.e. Prob(—Z_k_1 S 0) : Prob(2fk — 1 S 0) : qt.
Thus.

mama) = 2X- —1) P

%N (32.31 —1)-P.-+1—-X,~—(1—X,)

2
LN)“:

(IN—l (1‘ XN)

 

Therefore. since 7;. : 27X]. —- 1 the Bayes envelope of matching binary bits game:
J—X—N /\ (I — PTA! : (2Yk - I) /\ 0 + (l - KN).
the regret of Z-sequence is:

DN(Z,P) .: LNfZN:£N)_—Z_NA0
I [IR/'(ZQANQAJ—(l—YN)—(2:}Tk_DAD
: *N‘(XN qN)—((2Xk_1)/\O+(1-TN))

—_- DMX. P).

Proof is done. El

Comments 5.3.1. With proof in. lemma, study of tuio-erperts selection syste7n.,

80

is

equivalent to the study on sequence Z, where Z; = X.- — iii. And this is used in

following sections.

5.3.2 Worst case discussion

In Gilliland and Jung (2006), asymptotic convergence property was proved by con-
sidering the worst case g for the strategy. For hilatching Binary Bits problem, the

worst case for both PAP strategy and PARP is:

19
O
i—0
O
i—s
O
s—-
C
i—-'
O
i—4
O
i—I
O
I“

By establishing a. bound for the modified regret of this worst. case. a uniform bound
of the regret for all situations of Player I’s play sequences g was developed.
To study the asymptotic convergence property of PARP strategy in the two-

experts selection problem. it is reasonable to seek and analyze a worst case.

Lemma 5.3.2. The worst case of modiﬁed regret of PARP strategy is not achieved

at boundary. i.e. maxg DN(g_. b) is not achieved on the boundary.

Proof: Suppose n:3. so two experts system is:
(XL Y1 ): (X23 Y2). (X3, Y3)
the modified regret of Play Against Random Past strategy is defined as:

3 ' Dstgbl (X1 + Y1) + X2le .<_ Y1] + X3 ‘ P2 + 3'3 ' (1 —- P2)

_ l
T 2
— (4X71 + .XQ ‘i‘ k'3) /\ (3’3 + Y; + Y3),

where the probability P2 : Prob(Y; S 7;) and [] is an indicator function.

By discussion of last section, two—experts selection problem is isomorphic to Z-

81

problem. Let Z, : X, — 1'}, then

I
3 ' Dsfﬂtél : 5(21+ 2Y1) + Z2lX1 S Y1] + Y2 + Z3 ' P2 + Y3

— (21+ Z2 + Z3) /\0 — ( ”1+I"2+Y3),
where the density function P2 : Prob(—Z—; g 0). i.e.
1
3'D3(g,b) : 3214—22-[21 < 0] + Zg-Pg — (Z1 + ZQ+Z3) /\0

According to Play Against Random Past strategy, the proli)ability mass function of

7; from bootstrap sample {Z1, Z2} is:

when choose Z1 twice

1%; when choose once 21 and once Z2.

when Z2 twice

rblt-‘Nlt-‘rbl’d

Suppose (Z1 + Z? + Z3) < 0 i.e. Bayes envelope R3 : 21 + 22 + Z3.

Further more, assume 21 > 0. then

I
3‘D3stQ) :- 5Z1+Z2°0+Zs'P2—(ZI+Z2+Z3)

l
: —72'Z1—Z2—Z.3'(1-P2)

Since Z1 > 0. to achieve maximum of the regret

max 3 - D3 (ab).
21 >0'Zl +ZQ+Z3<O

82

22 and Z3 should be negative. When 21 > 0, Z2 < 0.

I
— when Z1 + 22 > 0
P __ 4
2“ 3
X when 21 + Z2 S 0.
If P2 3: ﬁ,
3Dtb)- 122(1)(1 1)
21>0.ZIIT}dZJ\2+Z3<0' 3 a._ _ ‘2 1 2 4

z 1.25

where Z1 : 1, Z2 : —0.9999. and Z3 :: —1. since Z1 > 0. Z2 < 0 and Z] + 22 > 0.

The nearer to -1 Z2 is. the better. but Z2 can not, be -1.

If P2 = 3,
3D( b)‘ 1Z Z (I)(13)
Zl>O.Z?:llf174k-‘_)+Z3<O 3 Q7 — 2 '1 2 4
z 1.25

where Z : 0.0001, 22 : —1. and Z3 2 —1, since Z1 > 0, 22 < 0 and 21+ 22 S 0.
The nearer to 0 Z1 is. the better, but 21 can not be 0.

For the case of Z1 S 0.

1
3193(9le 3Z1+Z2'1+Z3'P2—(Z1+Z2+Z3l

1
= 721—2341 —P2>

Base. on the deﬁnition of probability mass function P2, we have

when Z1 + Z; > 0, since Z S 0. Z2 > 0
P2 :
when Z] + 22 S 0.

phlwib-lr—i

83

W hen P2 : i,

1 1
2130.21111§%2+23<03.D3(Q~Q) — —§Z1 "' (—1)(1—E)
m 1.25

where 21 2 —0.9999. Z2 : 1, and Z3 : —1, since 21 S 0, Z2 > 0 and Z1 + 22 > 0.
The nearer to -1 21 is, the better, but. Z1 can not. be -1.
\Vhen P2 : 3,

I
max 3 - D3((_i.b) : —+-
Z130.Z1+Z2+Z3<0 2

where Z1: —1,Z2 : {any value 6 [—1,1]]21 + 22 S 0}, and Z3 : —1.
All the calculations above, shows even for n:3, the maximum of the modified
regret of PARP Strategy is not achieved at. the boundary of the problem domain

[0. 1]" x [0, 1]" which is equivalent to the domain [—1, 1]" for the Z-problcm.
Proof is doneC]

Therefore, the proof of Hannan consistency of PARP strategy for two-experts

selection problem can not be studied through the worst case idea.

5.3.3 Hannan consistency of PARP for Certain classes

We concern the asymptotic convergence property of PARP strategys regret under
different classes of sequences. With discussion in Z-problem, we notice that the orig-
inal problem is equivalent to Z-problem, i.e. we only need to discuss the convergence

property of Z-sequence.

Z3 Z1, 22, 23,... ZN_1, ZN....

84

The most easy case is the one expert is superior to the other one. i.e. in sequence

_Z_. all Zt has the same sign. Let Z > 0 without. loss of generality. Then, for all

N > 0,

A7
N DN( Z p(__z_,_rp) —: Zk P(ZL <0)— NZ. /\ 0 I Z 2.. - P(‘Z;_1 _<_ 0) — 0 z 0.
1:21

—* . .
where Z k_1 IS the sample mean of bootstrap sample in PARP strategy.

This means if one expert's prediction is always better than the other. the regret

of PARP strategy is always 0.
More difﬁcult situation is the two experts are competing with each other.

Theorem: For any sequence Z. Z. 6 [—1. 1]. if at 2 C1 > 0. the regret of PARP

strategy converges to 0 with order (Xvi—1;) i.e.,

1
I)N(;parp)—+O with, mic 0(7).

where C 1 is a constant and of. is the variance of sequence {21. . . . , Zk}.

Proof:

By Bermr-Esseen theorem in Loeve (1963, pp 288),

 

, —Z_k_1 ° V A — l l I. C Pk
m-DN(;,;.*) : E :Zk.(<1>(— )) + E :Zk-—— . —.—
k:1 0k_1 k=1 W a}:

where C 1s a constant. pk.— —— EIZ k — Zk|3 .

Now we consider the term:

 

IV __

Z-_ - ’k—l
Zz.-<<1><—“ V >>
k=1

Uk—l

 

LGt Ak—l : —0A‘ 1 0 (71.4 ~ V1: — 1), and since Zk : 1;? ~21. — (I: — 1) -7k-1. we can

write the term as:

It follows that

:(k' - 7k —

1V
k=l

Therefore,

N - DN(_Z_».parp)

A!

2]ka — (k— 1)-Z._1).<1>(.4._1).

k2]

(A — l) ' 7k—1)‘ q)(.4k_1)

2 1.2,, <<I>(.4._1>— «A.» — o + Na - MAN)

0

 

N A - ..
Z k . 7k ./ Al—l 1 - 6-3::de + 4N7_Z-[\r - (I)( 4N)
V27? 4

 

N _ AI—l 1 1'2
: 2192p] ‘ -e_7d;I‘-+NZN ‘I’Mtl
k=1 Al: 27‘.
N
C PL
Z ,_____.__

 

r — A . _.-'~ .—
Let Part 1:222] k. Zk. - fAl‘_1 ,1— - e 317cm". Part 11:.7\"'ZN - (P(AN). and

N ,'
Part IIIZZkzl Zk ' \/k(‘T1- ' 373’.

 

k- w 27?
Pk

k

For Part. I, suppose 0k._1 < 0;. without loss of generality. Then

[14k—1— Akl S l ‘-

 

'(—Z_k_1'VA7- l—7k'\/Z:)l

Uk—i

 

 

_ Ok—lﬁ

~(7k_1-\/kr—1\/: — 7M —1.)+ min, _ 1) — 7w;

1 _ _
: _lz. +(k'—1)Zk_1 — Z._1-¢k:—1-x/El
O'k_1 A
1

 

Uk—n/A:

(IZkIWkt— 17 M— 1— Vim

86

Since x/k' — 1— f:

>Cl>0

     

 

 

 

 

 

 

 

 

1 1 _
At... -Ak S Z}; +0 Zk_ )< O( — —— ° Zk_ .
l 1 l Uk_1\/]:(l l I ll) (w) +O(\/—'T)0k—l I ll
Thus,
I')
Alf—1 1 _fF2 1 _44:
/ -e deg |.4k_1—.4k|- ~e 7A
A]; 27r V277
42
. ‘ [‘~_
if we assume MAI.) : J12?— -e——"} > d)(.4k_1) : ﬁr:— ‘2 without loss of generality,
then _9
N —2 _LZJ)‘:
P t 1<Zo<1)+0(1) 1 A 2* 30}:
ar _ — - . 'T- -e
12:1 x/Z x/L— 277 0A 1

_2
since kr- a] -e 0:]: _<_ C and 0 < C1 < 0;. < C we have

Part 13 2,11 onA/E) : oc/Iv).

 

For Part II:N7N - (P(AN),
if EN 3 0, then Part II_<_ 0;

if 7N > 0, then

 

 

_ __ 77 7V) 7' IVY.
NzN-q)(AN)———NZ~-<I>(— if :m M/ 2o <I>(— ’” )-
A?

0N

Let (me : x/N - 7N. then by Feller (1964, pp 166),

71’ 7V

11 7V ) “‘2'
[:0 #3 /1‘(1_/:/3_T‘ u \/§ €_'_4;\_' < w
U‘A’ —— €- - (1.7 I .‘1‘ N "N' . . ’ __ /

87

 

Thus,

Part II = W-uw-q>(——'—’)s N-uw-‘I>(——-)

9
- _ 1 _‘_
: W-uw'f V2 --eI'-?_d;r
_oo 27?
S N-C
i.e. Part II S 0(W).
For Part 111: 21:12,. - 7);: - 3’3
”1.»

b\ the deﬁnition of Z*k and |Zk| E [—1.1].

k
1
S C
Then,
12.9%3 c.
0k

i.e. Part. III 3 O(\/]\—7).

With discussion on Part I, Part II, and Part III, we have D N ( _Z_ . parp) S ()(1 /\/T).

D “Z , parp) ——> 0 with rate

1
O ———- .
( r—N)
Proof is doneD

The group of three ﬁgures below shows the simulation of PARP strategy for se-
quence 5;, where :r E [—1. 1] and 0;. > 0.1, which is shown in the ﬁgure in row3. The
figure of row 1 is actual Cesaro loss-Bayes Envelope (in blue) vs expected Cesaro loss
- Bayes Envelope (in red) and the function 1 /\/—IV. The ﬁgure in row 2 is S” (in blue)

vs S (in green).

88

Figure 5.1: Simulation

 

 

Actual Regret
1 . . . 1/sqrtN
Estimated Expected Regret

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

_1 l l L 1 l l l
0 50 100 150 200 250 300 350 4 O
Bootstrap Sample Sum
50 I . . 1 Data Sum
0 . ‘ n, .,- I 3'“, 51." H‘ ‘1' T" I" If 1‘ " ‘ ' 1' 1'31" " 1| “In ‘ 'v“"("“1h”1"l"b I'll-“H‘JS
—50 '

 

 

50 1 00 1 50 200 250 300 350 400
N=400, sequence Z from —1 to 1

1 . . l . . . . ~. 1
. , l l , ‘ . ' ll I":

0 ﬁll 11)} A ll )

40 5'0 100 1&0 260 250 360 ago i 400

 

 

 

5.4 Examples of the application of PARP strategy

As a statistical decision strategy in expert selection problem, PARP strategy can be
applied to many kinds of practical problems. Here we only give an example of the

application of this strategy in two—expert system.

By Hull (2002) in ﬁnance, there is a constant effort to predict future or forward
prices of stocks, bonds, options and commodities; the ability to predict future behavior

provides important information about the underlying structure of these securities.

In interest rate market, many different types of interest rates are regularly quoted.
These include mortgage rates, deposit rates, prime borrowing rates, and so on. As
a member of interest rate market, the n—year zero rate or spot rate is defined as the
rate of interest earned on an investment that starts today and last for 11 years. All

the interest and principal is realized at the end of n years. There are no intermediate

89

payments. Forward rates or forward interest rates are the rates of interest implied by

current zero rates for periods of time in the future. The graph shows the movements

Figure 5.2: Forward Rate History Data

Forward Rate Historical Data
1 4 l I I

 

' I

 

 

 

 

a - ‘3' I ; " a
g I a I h .‘i 'I
E VI ’ ‘ ~
g lovA l "H! 1
. , ‘ , II . ‘
IE 6- .MI “I 'lfl-‘l .1- Jul/[15" '. ‘ ......
‘ ' £4341 ;- . ’ 94131
' .h‘ I ”‘1‘ I‘d».
,. Ill"
4'- \"|:'hl‘ . _ _
”ﬁll
. t‘..
2 .. .... . . AI." ...
c l I I I I
0 1 000 2000 3000 4000 5000 6000

day

of various forward rates of US market data from 1983 to 2003. It includes forward
rates for 3 months, 6 months, 1 year, 2 years, 3 years, 4 years, 5 years, 7 years, 10
years and 30 years.

We are going to use forward rate for 3 month as an example to show how PARP
strategy is applied on it. In our two expert system, Expert I is ARMA model and
Expert 11 is just the average rate for the recent last 3 days. Blue line represent the
true data; red line is Expert I, i.e. ARMA model; Green line is Expert II which is
average rate of past 3 days.

We take 1 cycle=120 work days, then list two experts’ errors and the PARP

strategy’s error in the table 5.1. From the table above, it is easily to observe that

90

Figure 5.3: Forward Rate prediction: True data vs ARMA vs Simple Average
Forward Rate: True data vs ARMA vs Simple average

 

 

 

 

 

 

 

 

10.5 f y 1 u
r } ‘ § — True data
' g ——l— ARMA
10' ...... . , . . . . * SimpleAverage‘
l

9.5-1.... . ........... -d
2
g 9 .. ....... r . ....... ,
'9
G I) :l r. : . I .
g 8.35)..“ J ..... 1 ..... a ”.... ...... .. ... ................... ..
LL ~ (1 ,1 : .1.. ‘ . ' . - .

I! I (l i I II I' I ll
I ll: — " ll ,
8 .. . 1%, ll ........ l, .............................. .
l. ,1 it ll : l
l'
7,5 .. - .......... g .......
7 L J i I
0 50 100 150 200 250
day

PARP strategy automatically choose the better model,i.e. Expert I which has more
precise predictions.

In this example, since Expert I the ARMA model is superior to Expert 11 at most
of time, it is reasonable that PARP strategy converges to Expert I’s decision, and
the graph 5.4 also shows that average loss of PARP strategy converges to the Bayes
Envelopes, which agrees with the theoretical proof in previous section with the sample

standard deviation 0,, > 0.01 for all k = 1,2,. . . , 120 in this example.

Situation is more complicated if the two experts’ forecast are quite close. For
example, Expert I is still ARMA model, but Expert II is GARCH(1,1) model with

their prediction showed in the graph.

We still keep 1 cycle=120 work days,then the comparison between experts and

PARP strategy’s errors are showed in Table 5.2.

The simulation shows the PARP strategy works well. The graph 5.6 also shows

91

Table 5.1: Forward Rate prediction: ARMA vs Simple Average

 

Number of Cycle

Ave Loss Expert I

Ave loss Expert II

Ave Loss of PARP

 

 

 

 

 

 

 

 

 

cycle 1 0.0759 0.1083 0.0776
cycle 2 0.0711 0.0820 0.0699
cycle 3 0.0698 0.0830 0.0749
cycle 4 0.0499 0.0719 0.0522
cycle 5 0.0434 0.0591 0.0440

 

0.06

Figure 5.4: Forward Rate: Bayes Envelope vs PARP Average Loss for cycle 5
Forward Rate: Bayes Envelope vs PARP Average Loss in cycle 5

 

0.05

0.04 r

0.03

0.02 -

 

0.01 -

 

 

1

I

 

 

—— Bayes Envelope
—-+— PARP Average Loss

 

 

 

 

 

0.1

20

40

60

80

100 120

Table 5.2: Forward Rate prediction: ARMA vs GARCH(1,1)

 

 

 

 

 

 

 

 

 

Number of cycles Ave loss of Expert I Ave loss Expert II Ave Loss of PARP
cycle 1 0.0707 0.0719 0.0706
cycle 2 0.0500 0.0503 0.0500
cycle 3 0.0697 0.0714 0.0689
cycle 4 0.0635 0.0640 0.0635
cycle 5 0.0673 0.0676 0.0666

 

 

92

 

 

Forward Rate

Figure 5.5: Forward Rate: True data vs ARMA vs GARCH(1,1)
Forward Rate: True data vs ARMA vs GARCH(1,1)

 

 

 

10.5 1 r 1 1
3 g 2 True data
I, : : . —+—- ARMA
1o 3 ~ , ' , 5 -

 

 

 

 

+ GARCH(1,1)

 

 

 

0 50 1 00 1 50 200 250

93

Figure 5.6: Forward Rate: Bayes Envelope vs PARP Average Loss for cycle 2

Forward Rate: Bayes Envelope vs PARP Average Loss in cycle 2
0.08 r . . r .
Bayes Envelope
—-+— PARP Average Loss

 

 

 

1?—

l

0.07

 

.-—
n.-

‘L
s.-
l

0.06 -

0.05 t -

0.04 -

 

_ 0.03 - ~

0.02 - ‘ .

I
l

0.01

 

 

 

 

0 20 40 60 80 1 00 1 20

94

that average loss of PARP strategy converges to the Bayes Envelopes, which agrees
with the theoretical proof in previous section with 0;, > 0.001 for all I.” = 1. 2. . . . . 120

in this example.

5.5 Future work

Future work will include work on the two-expert selection problem and the k-expert
selection problem. For the two-expert selection problem. the goal is to extend proofs of
Hannan consistency for the PARP strategy to the general case covering all sequences
g E [—1,1]°° . To accomplish this, we need to get approxin‘iations of P(ZZ. S 0) for
the general case.

We are looking forward to understanding and discovering more properties about
the distribution of Z; in the future. One possible approach may be creating some bins
on the domain of Zk, i.e., make [—1, 1] in to several categories. in order to make the
domain of Z. a discrete set. Another one may be considering the change of P(Z: S 0)
from stage 2' : I: to stage i : k + 1. These ideas will be worked on and discussed in
the future.

There are still a lot of open problems in this field as well. For example, since
sometimes more recent past moves are more important to the decision, time-weighted
PARP strategy can be constructed and its Hannan consistency can. be studied in
the future. Also, in non-symmetric repeated game, construction of PAP and PARP
strategies and their asymptotic. properties are very interesting and can investigate in

the future as well.

(O
O]

Bibliography

[ll

[‘21

[101

[111

N. Cesa—Bianchi, G. Lugosi, On prediction of individual sequences, The Annals
of Statistics, Vol. 27, No.6, pp. 1865—1895, 1999.

L. Chen, Application of play against past strategies in repititions of a game.
PhD Dissertation, Department of Statistics and Probability, Michigan State Uni-
iier'isty, 1997.

T. H. Chung, M inimax learning in iterated games via distributional ma jorization,
Ph. D dissertation, Stanford University, 1994.

T. M. Cover, Behavior of sequential predictors of binary sequences. Transactions
of the Fourth Prague Conference on Information Theory, Statistical Decision
Functions, Random Processes, Publishing House of the Czechoslovak Academy
of Sciences, pp. 263—272, 1967.

M. Feder. N. Merhav and M. Gutman, Universal prediction of individual se-
quences, IEEE Trans. Inform. Theory 38 (1992) (4), pp. 1258-1270.

W. Feller, An introduction to probability theory and its applications. Vol. 1,
Second Edition: Wiley and Sons, NY, 1964

D. P. Foster, R. V. Vohra, A randomization Rule for selection forecasters Oper-
ations Reserach, 41, No. 4, pp. 704—709.

J. S. Frame, Dennis C. Gilliland, Votes and a half-binomial, Discrete Applied
Mathematics, 12, pp. 31—40, 1985.

E. Gardner, E. Anderson-Flether, A. Wicks, Further results on focus forecasting
vs. exponential smoothing, International Journal of Forecating, 17, pp. 287—293,
2001.

D. Gilliland, I. Jung, Play against the random past for matching binary bits, J.
Statistical Theory and Applications, 5, No.3, pp. 282—291, 2006.

D. Gilliland and J. Harman, Play against past. strategies in repeated play of
(T, S, -) game, 1999, unpublished.

96

(121

[13]

[Ml

118l

1191
[‘20]

D. Gilliland and J. Harman, Play against past strategies in repeated play of a
S-game and its dual, Michigan State University. Department of Statistics and
Probability, RM—603, 2008.

D. Gilliland. Bootstrap fictitious play in repeated games, Proc. Joint Stattistical
Meetings, Toronto, Aug, 8—12, 2004.

D. Gilliland, Asymptotic risk stability resulting from play against the past in a.
sequence of decision problems, IEEE Transcations on information theory. VollT-
18, No.5, pp. 614—617, September 1972.

D. Gilliland. J. Hannan, On an extended compound decision problem, The An.-
nals of Mathematical Statistics. Vol 40, No.50, pp. 1536—1541, 1969.

C. M. Grinstead. J. L. Snell, Introduction to Probalnlity, 2008.
uncut.dartmouth.edu/ chance /teachingaids /boohs.articles

J. F. Harman, Approximation to Bayes risk in repeated play Contributions to
the Theory of Games, Princeton University Press. 1957.

M. D. Hirschhorn. Comments on the paper “Wallis sequence...” by Lampret.
Aust. Math. Soc. Gazette 32.194, 2005.

J. Hull, Options. Futures &: other derivatives, fifth edition, 2002.

M. V. Johns Jr., Two-action compound decision problems. Proc. Fifth Berkley
Symp. Math. Statist. Prob. 1 463-478, University of California Press, 1967.

N. Littlestone, Learing quickly when irrelavent attributes abound: a new linear-
threshold algorithm, Machine Learning. 2:285-318, 1988.

N. Littlestone and M. K. Warmuth, The weighted majority algaritlnn. Sympo-
sium on Foudations of Computer Science, 30: 256—261, November 1989.

M. Loeve, Probability Theory, Third Edition. D. Van Nostrand Company. Inc,
NJ, 1963.

J. Robinson. An iterative. n’iethod of solving a game. Annals of lilathematics. 54.
pp. 296—301, 1951.

B. Smith, Focus corecasting computer tecl'iniques for inventory control: revised
for the twenty-ﬁrst centrury, 1997.

D. Swain, Bounds and rates of convergence for the extended compound estimate
problem in the sequence case. Tech. Report No. 81, Department of Statistics,
Standford Unviersity, 1965.

N. N. Vorob’ev, Game theory lectures for ecnomitsts and systems scientists,
. , . .
Sprmger—Verlag.

   

llillryyyll(yr