This is to certify that the dissertation entitled STRATEGIES IN REPEATED GAMES presented by MINGFEI LI has been accepted towards fulfillment of the requirements for the Ph.D. degree in Applied Mathematics.

Major Professor's Signature

Date

STRATEGIES IN REPEATED GAMES

By

Mingfei Li

A DISSERTATION

Submitted to Michigan State University in partial fulfillment of the requirements for the degree of

DOCTOR OF PHILOSOPHY

Applied Mathematics

2008

ABSTRACT

STRATEGIES IN REPEATED GAMES

By Mingfei Li

In games that are repeated, the players have the opportunity to use information on opponents' past moves in selecting a move for the current stage. Strategies for Player II are considered in this thesis. In particular, the Play Against the Past strategy (PAP), the Play Against the Past plus Present strategy (PAP+), the Play Against the Random Past strategy (PARP), and Hannan-type strategies are investigated, especially in the repeated play of the two-person game called matching pennies. The effectiveness of a strategy is measured in terms of the difference between average loss and an envelope loss; this difference is called regret. In some cases, exact expressions for regret are derived; more often, asymptotic properties are derived.

The PAP strategy for Player II is not effective against all Player I move sequences. Hannan (1957) used a Bayes response to random perturbations of Player I's empirical distribution of past moves as a strategy and established good asymptotic regret properties, uniform in Player I move sequences, for the repeated play of a variety of games. Gilliland (2004) and Gilliland and Jung (2006) introduced the PARP strategy, where the randomization comes through bootstrap sampling of Player I's past moves, and established results for the repeated play of matching pennies.

The PAP, PAP+, PARP and Hannan-type strategies are defined in Chapter 2. The adaptation of PARP to achieve regret results relative to k-extended envelopes is demonstrated in Chapter 3 for matching pennies. Chapter 4 documents cases where strategies published following Hannan's seminal (1957) paper are unrecognized special cases of his work. PARP is discussed in the context of the expert selection problem in Chapter 5, and regret asymptotics are derived for certain classes of Player I move sequences.

DEDICATION

This dissertation is dedicated to my wonderful parents, Jiequn and Shumay, who raised me, supported me, taught me, and loved me. You have been with me every step of the way, through good times and bad. Also, this dissertation is dedicated to my grandparents, aunts, and uncles, who have been great sources of motivation and inspiration. Thank you for all the unconditional love, guidance, and support that you have always given me, helping me to succeed and instilling in me the confidence that I am capable of doing anything I put my mind to. Thank you for everything. I love you!

ACKNOWLEDGMENT

From the formative stages of this dissertation to the final draft, I owe an immense debt of gratitude to my PhD supervisor, Dr. Dennis Gilliland. His sound advice and careful guidance were invaluable.
I have learned so much, and without you, this would not have been possible. For their efforts, assistance, and support, a special thanks as well to Dr. Zhou, Dr. Yan, Dr. Weil, Dr. Page, Dr. Dass, Dr. Vorro, Dr. Bush, Barbara Miller, and all my good friends and student colleagues at Michigan State University. Finally, I would be remiss without mentioning Dr. Mark Hurwitz and Mr. and Mrs. Mawhorter, whose extreme generosity will be remembered always. To each of the above, I extend my deepest appreciation.

TABLE OF CONTENTS

List of Tables
List of Figures
1 Introduction
  1.1 Game theory
  1.2 Repeated play
  1.3 Summary of Thesis
2 The PAP, PAP+, PARP and Hannan-Type Strategies
  2.1 Play Against the Past (PAP) and Past plus Present (PAP+)
  2.2 Play Against the Random Past (PARP)
  2.3 Hannan-Type Strategies (H)
  2.4 Need for Fresh Randomizations
3 PARP Strategy for k-extended Envelope Problem
  3.1 Introduction
  3.2 Envelopes including Extended Envelopes
  3.3 Two-extended envelope problem
  3.4 PAP Strategy in two-extended envelope problem
  3.5 PAP+ Strategy in two-extended envelope problem
  3.6 PARP Strategy in two-extended envelope problem
4 Discovering Hannan
  4.1 Foster and Vohra: Selecting Forecasters
  4.2 Feder, Merhav and Gutman: Universal Prediction
  4.3 Minimax Regret
5 Expert Selection Problem
  5.1 Introduction and Review
  5.2 Focus Forecasting
    5.2.1 Introduction
    5.2.2 Methodology
  5.3 Two experts selection problem
    5.3.1 Reduction to Z-problem
    5.3.2 Worst case discussion
    5.3.3 Hannan consistency of PARP for certain classes
  5.4 Examples of the application of PARP strategy
  5.5 Future work
Bibliography

LIST OF TABLES

2.1 Player I and Player II's dynamic play to N = N0 + 3
2.2 Player I and Player II's dynamic play to N = N0 + m1 + 3
2.3 Non-refreshed randomness example with U = 0.7
3.1 An example of Player I's actual moves in N = 13 trials
4.1 Two-extended envelope problem
4.2 Convergence of minimax regret
4.3 Player I's play sequence with N = 5
4.4 Player I's play sequence with N = 10
5.1 Forward Rate prediction: ARMA vs Simple Average
5.2 Forward Rate prediction: ARMA vs GARCH(1,1)

LIST OF FIGURES

1.1 Cesaro Loss vs Bayes Envelope
2.1 PAP vs PAP+ vs Envelope for i.i.d. Bernoulli sequence
2.2 Simplex
2.3 Non-refreshed randomness, U = 0.7
4.1 Hannan, PARP and Minimax probability for N = 5
4.2 Hannan, PARP and Minimax probability for N = 10
4.3 Hannan, PARP and Minimax probability for N = 800
5.1 Simulation
5.2 Forward Rate History Data
5.3 Forward Rate prediction: True data vs ARMA vs Simple Average
5.4 Forward Rate: Bayes Envelope vs PARP Average Loss for cycle 5
5.5 Forward Rate: True data vs ARMA vs GARCH(1,1)
5.6 Forward Rate: Bayes Envelope vs PARP Average Loss for cycle 2

Images in this dissertation are presented in color.

Chapter 1

Introduction

1.1 Game theory

Game theory is the theory of rational behavior for interactive decision problems. In a game, participants strive to maximize their expected gain by choosing particular courses of action, and each participant's final payoff depends on the profiles of the courses of action chosen by all participants. The interactive situation, specified by the set of participants, the information flow, the possible courses of action of each participant, and the set of all possible payoffs, is called a game. Participants, i.e., those who are 'playing' the game, are called the players. In a game, if the goal of each player is to achieve the largest possible individual gain (profit or payoff), the game is called a noncooperative game. Games in which the actions of the players are directed to maximizing the gains of coalitions, without subsequent subdivision of the gain among the players within the coalition, are called cooperative games. In this thesis, we focus on some noncooperative games.

The basic objects of interest in noncooperative games are players' strategies. A player's strategy is a complete plan of action, i.e., the moves to be taken when the game is played: it must be completely specified before the actual play of the game starts, and it prescribes the course of play for each move that a player might be called upon to take, for each possible piece of information that the player may have at each time where he or she might be called upon to act.

In simple form, a two-person game is a triple (A, B, L) where A is the set of moves for Player I, B is the set of moves for Player II, and L is a nonnegative function defined on A × B with L(x, y) denoting the loss to Player II when Player I plays x and Player II plays y. With a σ-field of subsets defined for A, suitable integrability conditions for L, and A* denoting the class of probability distributions on the σ-field, the domain of L is extended to A* × B by L(π, y) = ∫ L(x, y) dπ(x). If the class of probability distributions includes all degenerate probability distributions for the points in A, then (A*, B, L) formally extends the game (A, B, L) to include randomized strategies for Player I. Under suitable assumptions, the game extends to (A*, B*, L) where both players have randomized strategies π ∈ A*, τ ∈ B*. For the extension, the loss function is an expectation (expected loss), but it will still be called the loss function. Our focus is on moves or strategies for Player II, and generally Player I's utility or inutility is not defined. For a zero-sum game, it is understood that Player I's gain is Player II's loss.

If Player II uses the distribution τ to generate his/her move, we refer to this as randomization. Here the move y is determined as the realization of a random variable with a probability distribution τ specified by the player. A minimax strategy for Player II is any move τ_MM such that max_π L(π, τ_MM) = min_τ max_π L(π, τ), the upper value of the game. A maxmin strategy for Player I is any move π_Mm such that min_τ L(π_Mm, τ) = max_π min_τ L(π, τ), the lower value of the game. If the upper value is equal to the lower value, the common value is called the value of the game.
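As a concrete check of these definitions, the following minimal Python sketch (illustrative and not part of the thesis) approximates the upper and lower values of a finite two-person zero-sum game by a grid search over each player's mixed strategies; it is specialized here to the 2×2 loss L(i, j) = |i − j| used throughout the later chapters.

```python
# A minimal sketch (illustrative, not from the thesis): approximate the
# upper and lower values of a finite zero-sum game by grid search over
# mixed strategies, specialized to the matching-pennies loss matrix.

L = [[0.0, 1.0],   # loss to Player II: L[i][j] = |i - j|
     [1.0, 0.0]]

def exp_loss(pi: float, p: float) -> float:
    """Expected loss when Player I mixes (1-pi, pi) and Player II mixes (1-p, p)."""
    return sum(L[i][j] * (pi if i else 1 - pi) * (p if j else 1 - p)
               for i in (0, 1) for j in (0, 1))

grid = [k / 100 for k in range(101)]
upper = min(max(exp_loss(pi, p) for pi in grid) for p in grid)   # min_p max_pi
lower = max(min(exp_loss(pi, p) for p in grid) for pi in grid)   # max_pi min_p
print(upper, lower)   # both 0.5: the game has value 1/2
```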
A Bayes rule for Player II versus the distribution π is any τ̂ such that L(π, τ̂) = min_τ L(π, τ). The minimum is denoted R(π) and is called the minimum Bayes risk; R(·) is called the Bayes envelope for the game. A minimizer exists in the set B of pure moves. Any function σ on A* with range in B* and such that L(π, σ(π)) = R(π) for all π ∈ A* is called a Bayes response. Hannan (1957) took σ to be B-valued.

Consider a zero-sum game where Player I selects a move from the set A = {a₁, a₂, ..., a_m} and Player II selects a move from the set B = {b₁, b₂, ..., b_n}, with loss L(a_i, b_j) to Player II if I chooses a_i and II chooses b_j. This is called a finite m × n game. Here A* is the probability simplex in R^m, B* is the probability simplex in R^n, and all of the expectations are inner products.

Much of our study concerns the simple 2 × 2 game where each player selects from {0, 1} and the loss function is

L(i, j) = |i − j| = i(1 − j) + (1 − i)j, i, j = 0, 1.

This is the game of matching pennies (or matching binary bits), with Player II's objective to match Player I. In matching binary bits, 1 − 2L is the gain for Player II while 2L − 1 is Player I's gain.

Suppose that Player II generates his/her move in matching pennies with a Bernoulli distribution B(1, p), i.e., prob(j = 1) = p and prob(j = 0) = 1 − p. Then Player II's expected loss is seen to be

L(i, p) = i(1 − p) + (1 − i)p = |i − p|, i ∈ {0, 1}, p ∈ [0, 1].

Thus the loss function extends to expected loss on the domain {0, 1} × [0, 1], and we will call it simply loss where there is no chance of confusion. Applying the extended loss to a weather forecast of rain with probability p, the forecaster (Player II) suffers "loss" 1 − p if it rains (i = 1) and "loss" p if it does not rain (i = 0). The choice p = 1/2 is the minimax choice for II; it minimizes the maximum possible expected "loss", i.e., p = 1/2 minimizes (1 − p) ∨ p. Notice that the weather forecaster is not required to actually generate the Bernoulli random variable to serve as his/her move; rather, he/she simply specifies a probability p. If "Nature" flips an unbiased coin to determine whether it rains or not, then the equilibrium "loss" 1/2 is achieved, the value of the game.

1.2 Repeated play

If a fixed group of players plays a given game repeatedly, we say this is a repeated game or repeated play. In other words, a repeated game is the same simultaneous game played repeatedly. The payoffs add across repeated play. In repeated play, rules will specify what information generated in the repeated play is made available to which players and when. We will assume that all players are fully informed of the rules that govern the game being repeated, together with the history of moves of all players at all stages of the repeated play. Thus, a player may use strategies, i.e., sequences of functions that map the history of past moves into a move for the current stage. The repeated play is of two types:

(1) Finite Horizon, where there is to be a sequence of N plays, with N finite, specified, and known to the players in advance. See Hannan's weak sequence game (1957, Sec 3). In finite horizon play, the players' strategies can depend on N. Conceptually, the finite horizon repeated play game is another example of a simultaneous move game where the strategies are finite sequences of recursive functions. Player II strategies are evaluated in terms of the average loss over the N games.
We will consider finite horizon repeated play only in Chapter 4, Section 3.

(2) Infinite Horizon, where there is an infinite sequence of plays and the players know this. See Hannan's strong sequence game (1957, Sec 3). A Player II strategy is evaluated in terms of the sequence of average losses over initial segments.

Our study concerns the review of and the development of "good" strategies for Player II in the repeated play of a two-person game. Generally, results uniform in sequences of Player I moves are sought and obtained. With such results, the findings extend to results uniform in Player I strategies and show that the motivation for Player I is irrelevant (Hannan, 1957). The two-person construct is not as restrictive as it seems, since the term Player I may be taken to name a coalition or collection of players.

Now consider the repeated play of matching pennies. We let a and b denote infinite sequences of moves for the respective players and let a_t and b_t denote initial segments, t = 1, 2, .... A deterministic strategy (pure strategy) for Player II has as components recursive functions b_t(a_{t−1}) taking values in {0, 1}, t = 2, 3, ..., with b₁ ∈ {0, 1}. The associated average (Cesaro) loss to Player II across N plays at the Player I sequence a is

CL_N(a, b) = Σ_{t=1}^N L(a_t, b_t(a_{t−1}))/N = Σ_{t=1}^N |a_t − b_t(a_{t−1})|/N.

As is rather obvious and perhaps first recorded by Cover (1967),

max{ Σ_{t=1}^N |a_t − b_t(a_{t−1})| : a_N ∈ {0, 1}^N } = N

for every b₁ and every sequence of functions b_t(·), t = 2, 3, .... Thus, no deterministic strategy for Player II can produce uniform convergence of average loss to zero. Σ_{t=1}^N |a_t − b_t| is the Hamming distance between the binary sequences a₁, a₂, ..., a_N and b₁, b₂, ..., b_N.

A stochastic strategy (mixed strategy) for Player II has as components recursive functions p_t(a_{t−1}) taking values in [0, 1], t = 2, 3, ..., with p₁ ∈ [0, 1]. Identifying 1 with p = 1 and 0 with p = 0, the stochastic strategies include the deterministic strategies as a subclass. The associated average (Cesaro) loss to Player II across N plays is

CL_N(a, p) = Σ_{t=1}^N L(a_t, p_t)/N = Σ_{t=1}^N |a_t − p_t|/N.

1.3 Summary of Thesis

Chapter 2 defines the PAP, PAP+, PARP and Hannan-type strategies and demonstrates the need for fresh randomizations when implementing a strategy for matching pennies. We base the example on a Hannan-type strategy.

In Chapter 3, we examine extended envelopes for matching pennies. These envelopes are called k-extended envelopes and are more stringent than the simple envelope. Whereas the simple envelope is the Bayes envelope of the component game evaluated at the empirical distribution of {a₁, a₂, ..., a_{N−1}, a_N}, the 2-extended envelope is a Bayes envelope evaluated at the empirical distribution of pairs {(a₁, a₂), (a₂, a₃), ..., (a_{N−1}, a_N)}. If Player I's moves exhibit Markov structure, for example, a tendency to follow a 0 with a 1, then the 2-extended envelope can be considerably less than the simple envelope. In matching pennies, we develop exact expressions for the 2-extended regrets for the PAP and PAP+ strategies and establish Hannan consistency for a PARP strategy.

Chapter 4 reports on a literature search to document specific theorems and results published by others after Hannan's (1957) seminal paper, results that are found in or are direct consequences of Hannan (1957). Because of the cryptic style and possibly the notations used in Hannan (1957), it is understandable that other researchers failed to recognize the specific results therein. The style and notations make the documentation rather challenging in some cases.
The literature includes Cover (1967), Feder, Merhav and Gutman (1992), Foster and Vohra (1993), Chung (1994), and Cesa-Bianchi and Lugosi (1999). This search was motivated in part by the Gina Kolata (2006) New York Times article Pity the Scientist Who Discovers the Discovered, in which Hannan is mentioned.

Chapter 5 introduces the expert selection problem, which has gotten considerable attention in the game theory and computer science research communities. Here Player II must select from a class of experts and assume whatever loss is incurred by that expert in a specified game. This problem is often cast in terms of a forecasting problem. For example, consider a set of K weather forecasters (experts). Player II must make a weather forecast for tomorrow; rather than do his/her own analysis, Player II examines the records of accuracy for the K experts and selects the forecast of the one who has the best record of past accuracy. As described, this would be a PAP strategy. PAP strategies here and in general are not Hannan consistent on all Player I sequences a. In repeated games, the set of experts could be a set of strategies; Player II uses the performance record of the strategies to select one to implement in the current stage. Chapter 5 starts by discussing focus forecasting (Smith, 1978), which can be described as PAP where the tests of the forecasting strategies in the pool are over recent performance, not the complete past. In practice, this is a criticized methodology, since the pool of experts seems to have grown in a rather ad hoc fashion; for example, see Gardner, Anderson-Fletcher and Wicks (2001). Smith's company (FocusForecasting.com) continues to serve customers. In Chapter 5, we investigate the use of the PARP strategy in expert selection. We examine the case where the pool has only two experts and show the problem to be reducible to a one-dimensional problem. This problem is examined, and a class of sequences a where PARP is Hannan consistent is identified. We conclude Chapter 5 with empirical tests of the PARP strategy for selecting from competing time series models for prediction.

Chapter 2

The PAP, PAP+, PARP and Hannan-Type Strategies

2.1 Play Against the Past (PAP) and Past plus Present (PAP+)

Play against the past in the repeated play of a two-person game denotes the strategy for Player II in which II at each stage t = 2, 3, ... plays component game Bayes versus the empirical distribution of I's past moves. The study of this strategy in general settings is undertaken in Hannan (1957), where basic inequalities (Sec 8 (11)) show the possible importance of the study to the construction of good strategies for Player II in repeated games. Gilliland (1972) continues the discussion of play against the past strategies in sequences of statistical decision problems. Play against the past is a one-sided version of what is called fictitious play in the repeated play of a two-person, zero-sum game (Robinson, 1951).

Recall that a Bayes rule for Player II versus a prior distribution over the possible moves by Player I is any move that minimizes the expected loss to Player II.
square brackets denote indicatm functions. More- over. it is convenient for future use to extend the domain of the Bayes response to 0(w1.w2) E [0, 0C)“2 — (0,0) by (7(td1.u12) :2 0(a21/(a21+ wg),WQ/(w‘1 + w2)). The PAP strategy for Player II in matching pennies is denoted by and defined by 1 1 1 1 PAP: papl : —2-. papi(g_t_1) : [gr—1 > 2] + glgr—i : 3] d where recall from Chapter 1 that g,_1 denotes the proportion of 1's in the sequence (_4_1. With the PAP strategy for matching pennies. Player II starts with a coin toss (assumed to be a fair coin) and sul')sequently plays the majority choice in Player I past moves with a coin toss in the event of a tie Hannan (1957, Sec 8, (11)) also considered the u1‘1realizable strategy for Player II that in the context of matching pennies is I 1 I 1 13,413+: [)(L[)+1: :2“ pap +1 (gt) 2 [gt > 5] + TZ—[gt : 3]. d This can be thought of as play against. the past including present. Note that this strategy has Player II‘s move at stage t to be the Bayes response versus the empirical distribution of {a1, (1.2 ..... at_1, at}. Hannan (1957) established for the repeated play 13 of a general game that the average loss from PAP+ in no greater than the simple envelope loss and that the average loss from PAP is no less than the simple envelope loss. The evaluations for PAP and PAP+ are simple and illustrative in the case of matching pennies. In developing a new strategy PARP for matching binary bits (matching pennies) Gilliland and Jung (2006) proved the following proposition in regard to PAP. Proposition 2.1.1. In Matching Pennies, the Cesaro loss sequence for the PAP strategy is given by CLN(Q.[_)(l_p):gN/\(1— 9N) + 0.51/N/N + 0.5-[gN7ll/2]/1’V, N :1.2.... where I/N is the number 0f 9,, visit to 1/2, t : 1, ‘2, 3. . . . , N Note that the excess average loss over the simple envelope loss gN /\ (1 — gas) is positive and is maximized at g : (0. 1, 0. 1.. . . ) or (0. 1. O. 1. . . . ) with the maximum being 0.25 asymptotically. Here we have limN DN(_q. pap) : 0.25. That. PAP is not Harman consistent at all in Player I move sequences is a well known result. we now turn to the the unrealizable strategy PAP+ in matching pennies. Proposition 2.1.2. The Cesaro loss sequence for the PAP+ strategy is given by CLN(Q.])(I[)+) : gN /\ (1 — gN) — 0.51/N/N. where VA! is the number of gt visit to 1/2, t : 1, ‘2, 3. . . . . N. Furthermore, max{CLN(g_.pap+)lfi.red NgN} : gN /\ (1 — ,(IN) 1-1 and min{CLN(Q,pap+)lfi.red Air/N} = 9N /\ (1 - gN) — 0.5(greatest integer in N/2)/N. Proof: Let N > 0 be fixed and take a1 : 1 without loss of generality. Suppose that gt returns to 1/2 at stages i1. 732.. . . .ik. where 1 < i,- < i; ..... < it E N. Consider the first epoch 1 S t S i1. Note that gt > 1/2 on 1 S t < i1 and 911 : 1/2 so that Player II plays 1 on 1 S t < i1 and 1/2 at t : i1. Player I has played 7:1/‘2 0s including the 0 at stage. i1 and il/‘2 1‘s on epoch 1 S t S i1. Thus, the total loss to Player II on the first epoch is (7.1/2 — 1) + 1/2 : (i1/2 — 1/2). The total loss across all epochs is (i1/2 —1/‘2)+(i2 — i1)/‘2 —1/2 + - - - + (it. — ik_1)/'2 —1/‘2 : ik/‘2 - lie/‘2. If it. : N, then gN : 1 / ‘2 and the average loss is CLN(Q.pUP+) : gN — O'SI/N/JV : 9N A (1 — 9N) — 0,51/N/1'N’Y where I/N :: {number of gt visits to 1/‘2It : 1, ‘2. 3. . . . , N}. Now suppose that it. < N. Let. (lik+1 : 1 without of loss of generality so that gN > 1 / 2. Then Player II plays 1 on the N — it. stages it < t g N. On these stages. Player I plays a total of (NgN — ik / 2) 1s and therefore, (N —— it.) 
— (NgN — it./ 2) 0s. which is the total loss for Player 11. Thus. the average loss for Player I I across all N stages is CLN(Q.])(Z[)+) :— (7k/2 — I/N/2)/;7V + (I - gN) — tk/2N Z {M} /\(1— gN) — 0.51/N/A‘Y. For the fixed total number NgN of 1's in the sequence QN. we see that the Cesaro loss is maximized when I/N : 0 and minimized by alternating 1's and 0s. Proof is doneEl 15 It follows from Propositions 2.1.1 and 2.1.2 that. l/ (r 12 2 N 2 N and VN DN((_1..T)(L[)+) :— —3K. Suppose that. Player I generates moves a1,a2, . . .as independent, identically dis- tributed B (1, 77). we show that. PAP is Hannan consistent at g. It. squices to show that I/N/N + [gm 75 1/ 2] /N ——> O. The second term is bounded by l/N so we need only consider the first term I/N / N. VA; is the number of visits of the random walk St : 22(204- — 1) to 0 across t : 1.2,...,1\", N : 1,2 ..... The strong law of large numbers shows that SN/N —> 27." — 1 as. Thus. if 7: 75 1/2, SN is 0 only finitely often as, which 1111101638 that z/N/N —+ 0 as. Thus. where PAP is not, Hannan consistent at all sequences 9. it is Harman consistent as. if Player I re- peatedly and independently generates his/her moves by a coin toss that has prob- ability 77 of turning up Heads ((1 : 1) provided 77 74 1/2. Since g»; ——> 77 as. and R(gN) : gN /\ (1 — gN) -—> 71' /\ (1 — 77) g 1/2 as. with equality if and only if 7: : 1/‘2, Player 11 is sure to win more than 5070 of the time in the limit if 7.- 7£ 1/2, i.e.. the coin is biased. If 77 : 1 / 2 (the coin is unlg)iased). there is simple expression for E(I/N). Ptom Grinstead and Snell (2008. p. 481), E(l/2N) : (IN -1 where (2N + 1)! GIN" : ' Y T 4A A LN! . . . , . . y . v /——T Wlll appear again 1n chapter 4. section 3. Since ox.- ~ 4J\/7r. E(1/2N/2.’\ ) ~ l/v NA and using 112 NH : u; N it follows that E (I/N / N) ——> 0 in L1. 16 Figure 2.1 shows the result of a. simulation where the at are i.i.d Bernoulli (1,1/ 2), t=1,2,...,100. Figure 2.1: PAP vs PAP+ vs Envelope for i.i.d Bernoulli sequence 0.8 j , f I - + - PAP _ -l-‘ - * - PAP+ . 0'7 l Bayes Envelope at 0.6 ll 0.5 4 i . I II .... l 0.4 0.3 1* i . .o’o': :* 3 Gen 3:, 0.1 fl 0 l l P 0 50 100 150 200 N 17 2.2 Play Against the Random Past (PARP) Gilliland (2004) announced the result that play against the random past in match- T.— ing pennies is Harman consistent with uniform rate O(]\‘ 1/2). Proof was given in Gilliland and Jung (2006). The PARP strategy for Player II in matching pennies is denoted by and defined by 1 . 1 1 . narrate—1): [gt—1 > 3] + §[9,_1 : — PARP: parpl : ‘2] Kali—I where g{_1 is the proportion of 1's in a random sample of size t— 1 drawn with replacement from Player I's moves {(1.1 , a2, . . . ,a,_1}. It is assumed that the bootstrap samples are independent across the stages t. i.e.. that. fresh samples are drawn at each stage t :— 2. 3 . .. Study of PARP in matching pennies requires the analysis of the half-binomial probabilities , 1 1 , 1 Pt—l,gt_1 :: E(lgt—l > El + Elf/t—l : El) Gilliland and Jung (2006) show that there exist constants A and B such that IDMerarpH S (A + B \/ N - (9N /\ (1 - 9M) )/Ns thus establishing uniform Harman consistency for PARP with rate 0(N‘1/2). In Chapter 5 we explore the PARP approach for repeated play of an infinite component game that. is motivated by the expert selection problem. 
2.3 Hannan-Type Strategies (H)

Hannan-type (1957) strategies overcome the weakness in PAP by playing Bayes responses, or the expectations of Bayes responses, to properly scaled random perturbations of the empirical distributions G_{t−1}. Specifically, with a component game where Player I has m moves {1, 2, ..., m}, the empirical probability distribution of Player I's moves through time t − 1 is the vector G_{t−1} := (n₁^{t−1}, n₂^{t−1}, ..., n_m^{t−1})/(t − 1), where n_i^{t−1} = num{a_j = i | j = 1, 2, ..., t − 1}, i = 1, 2, ..., m. We define a Hannan-type strategy as any Player II strategy that at stage t plays

σ(G_{t−1} + h_{t−1} Z_{t−1}), t = 2, 3, ...   (2.1)

or

E(σ(G_{t−1} + h_{t−1} Z)), t = 2, 3, ...   (2.2)

where {h_t} is a sequence of positive real numbers, Z_{t−1} and Z are random vectors taking values in (0, ∞)^m, and E is expectation over Z. To simplify proofs, Hannan extends the domain of the Bayes response σ from the probability simplex in R^m to all of [0, ∞)^m with σ being positive homogeneous of order 0, that is, σ(cw) = σ(w) for all c > 0, w ∈ [0, ∞)^m. Hannan (1957), for the repeated play of a variety of component games, including the finite two-person game and the S-game, imposes conditions on the sequence of constants {h_t} and on the distribution of Z to achieve uniform Hannan consistency for the strategy (2.2), with rates.

In matching pennies, where m = 2, we have labeled the pure moves as 0 and 1 and let g_{t−1} = num{a_j = 1 | j = 1, 2, ..., t − 1}/(t − 1) denote the empirical proportion of 1's in Player I's initial move sequence. Since 1 − g_{t−1} = num{a_j = 0 | j = 1, 2, ..., t − 1}/(t − 1), the empirical probability distribution is G_{t−1} = (1 − g_{t−1}, g_{t−1}).

Chapter 4 includes a survey of published results that are subsumed by Hannan (1957). It appears that many of the authors were unaware of the specific results contained in Hannan (1957). Since a positive homogeneous (order 0) Bayes response function plays a key role in proofs for Hannan-type strategies, we conclude this section with examples to illustrate σ, its properties, and the notations that are used. Hopefully, these examples will help make the proofs in Chapter 4 of the connections of Hannan-type strategies to other work understandable.

Example 2.2 (Matching pennies) Here Player I and Player II have m = n = 2 pure moves, which we have denoted as {0, 1}. Suppose that Player I selects his/her move with the (prior) probability distribution Prob(0) = 1 − π, Prob(1) = π, and consider Player II selecting his/her move with Prob(0) = 1 − p, Prob(1) = p. The expected loss to Player II is L(π, p) = (1 − π)p + π(1 − p). Any p that minimizes L(π, p) is Bayes versus π. A Bayes response is any function σ on the probability simplex in R² such that p = σ(1 − π, π) is Bayes versus π. For each π > 1/2, p = 1 is uniquely Bayes versus π; for each π < 1/2, p = 0 is uniquely Bayes versus π; for π = 1/2, all p ∈ [0, 1] are Bayes versus π. To specify a Bayes response, one must select a minimizer when the minimizer is not unique. Here is the example given earlier in Section 2.1:

σ(1 − π, π) = 0 for 0 ≤ π < 1/2; σ(1 − π, π) = 1/2 for π = 1/2; σ(1 − π, π) = 1 for 1/2 < π ≤ 1.

When m = 2, there is the notational convenience derived from identifying (1 − π, π) by π. However, this identification hides features, including the positive homogeneous property of order 0 imposed by Hannan on the Bayes response.
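The following sketch (illustrative names) implements this Bayes response with the tie convention of Section 2.1, extends it to [0, ∞)² as a positive homogeneous function of order 0, and forms one randomized Hannan-type move of the form (2.1) with a fresh uniform perturbation; the printed values anticipate the worked extension below.

```python
# A minimal sketch of the matching-pennies Bayes response, extended to
# [0, inf)^2 as a positive homogeneous function of order 0, plus one
# Hannan-type move of the form (2.1); names here are illustrative.

import random

def sigma(w1: float, w2: float) -> float:
    """Bayes response: weight w1 on Player I's move 0, w2 on move 1."""
    if w2 > w1:
        return 1.0
    if w2 < w1:
        return 0.0
    return 0.5                          # tie: the 1/2 convention

def hannan_move(g: float, h: float, rng: random.Random) -> float:
    """Play sigma(G + h*Z) with G = (1-g, g) and Z uniform on [0,1]^2."""
    z1, z2 = rng.random(), rng.random()   # a fresh perturbation
    return sigma((1.0 - g) + h * z1, g + h * z2)

rng = random.Random(1)
print(sigma(4, 7), sigma(12, 12))   # 1.0 and 0.5, matching the text below
print(hannan_move(0.45, 0.3, rng))  # a perturbed Bayes response
```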
As noted in Section 2.1, once a Bayes response is defined on the probability simplex in R², the domain is easily extended to all of [0, ∞)² − {(0, 0)} by σ(w₁, w₂) = σ(w₁/(w₁ + w₂), w₂/(w₁ + w₂)) and then to all of [0, ∞)² by defining σ(0, 0) to be any specific move. Then σ(·, ·) is a positive homogeneous function of order 0 defined on [0, ∞)². For matching pennies, σ(4, 7) = σ(4/11, 7/11) = σ(2·4, 2·7) = 1, and σ(12, 12) = σ(1/2, 1/2) = σ(7, 7) = 1/2.

Note, for example, that with Z = (Z₁, Z₂) and h a positive constant, σ(1 − π + hZ₁, π + hZ₂) = 1 if and only if (Z₂ − Z₁) > (1 − 2π)/h. In the Hannan-type strategy (2.2), a random perturbation is used; in particular, (Z₁, Z₂) is a random vector. Thus the expected Bayes response (2.2) is a probability distribution on Player II's pure moves, specifically,

P(1) = Prob(Z₂ − Z₁ > (1 − 2π)/h) + 0.5·Prob(Z₂ − Z₁ = (1 − 2π)/h)

and

P(0) = Prob(Z₂ − Z₁ < (1 − 2π)/h) + 0.5·Prob(Z₂ − Z₁ = (1 − 2π)/h).

Example 2.3 (Matching 3-sided pennies) Here Player I and Player II have m = n = 3 pure moves, which we denote as {1, 2, 3}. Consider the Player II loss matrix shown below.

                      Player II
               side 1  side 2  side 3
Player I side 1   0       1       1
         side 2   1       0       1
         side 3   1       1       0

Suppose that Player I selects his/her moves with the (prior) probability distribution P(1) = π₁, P(2) = π₂, P(3) = π₃. A Bayes response for Player II defined on the probability simplex in R³ must satisfy

σ(π₁, π₂, π₃) = 1 if π₁ > π₂ ∨ π₃; = 2 if π₂ > π₁ ∨ π₃; = 3 if π₃ > π₁ ∨ π₂.

These (π₁, π₂, π₃)-sets are the interiors of the convex regions shown in the probability simplex in Figure 2.2. The domain of σ can be extended to the boundaries by any choices of probability distributions supported on the maximizing coordinates. For example, σ(0.25, 0.40, 0.35) = 2 = (0, 1, 0), and one could take σ(0.35, 0.35, 0.30) = (1/2, 1/2, 0) and σ(1/3, 1/3, 1/3) = (1/3, 1/3, 1/3).

Note, for example, that if the domain of the function σ is extended to all of [0, ∞)³ as a positive homogeneous function of order 0, then σ(π₁ + hZ₁, π₂ + hZ₂, π₃ + hZ₃) = (1, 0, 0) if (Z₁ − Z₂) > (π₂ − π₁)/h and (Z₁ − Z₃) > (π₃ − π₁)/h, where h is a positive constant. In the Hannan-type strategy (2.2), a random perturbation is used, in particular Z = (Z₁, Z₂, Z₃). The expected Bayes response (2.2) is then a probability distribution on Player II's pure moves.

Figure 2.2: The probability simplex in R³, with vertices (1,0,0), (0,1,0), (0,0,1), partitioned into the three Bayes-response regions.

2.4 Need for Fresh Randomizations

Consider the situation where Player II's moves are probabilities p (the weatherman example) or, more generally, probability distributions or expectations. Contrast this with the situation where Player II is forced to play the realization of his/her randomization. For example, in matching pennies Player II is required to select a 0 or a 1, albeit he/she may generate the move with a probability distribution. Because the histories of Player II's past moves are available to Player I, Player II must be concerned about the joint distribution of the random variables he/she generates across the stages of the sequence.

Hannan (1957) did not deal with this issue, since his concern was with repeated play where II's moves were probability distributions or expectations and component loss was measured following expectation over the randomization. To be more specific, a single random variable Z ~ F (serving like a dummy variable of integration) is used in describing the Hannan moves E(σ(G₁ + h₁·Z)), E(σ(G₂ + h₂·Z)), E(σ(G₃ + h₃·Z)), ... in his theorems.
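Returning for a moment to Example 2.3, here is a minimal sketch (illustrative names) of its Bayes response on the extended domain, with ties split evenly over the maximizing coordinates, which is one admissible convention of the kind noted above.

```python
# A minimal sketch of the Example 2.3 Bayes response for matching
# 3-sided pennies: loss 0 for a match and 1 otherwise, so a Bayes
# response puts its mass on Player I's most probable move, splitting
# evenly over ties; homogeneity of order 0 comes for free.

def sigma3(w):
    """Bayes response on [0, inf)^3; w need not sum to 1."""
    m = max(w)
    winners = [i for i, wi in enumerate(w) if wi == m]
    return tuple(1.0 / len(winners) if i in winners else 0.0
                 for i in range(3))

print(sigma3((0.25, 0.40, 0.35)))   # (0.0, 1.0, 0.0), i.e. side 2
print(sigma3((0.35, 0.35, 0.30)))   # (0.5, 0.5, 0.0)
print(sigma3((8.0, 8.0, 8.0)))      # (1/3, 1/3, 1/3): a three-way tie
```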
Suppose that Player II must play a move b₂ at stage t = 2 determined by the Bayes response σ(G₁ + h₁·Z), a move b₃ at stage t = 3 determined by the Bayes response σ(G₂ + h₂·Z), and so on. In this case, information on the realization of Z passes to Player I through the sequence b₂, b₃, .... (Player II applying σ(G₁ + h₁·Z₁), σ(G₂ + h₂·Z₂), σ(G₃ + h₃·Z₃), ... with i.i.d. Z_t ~ F removes this possibility for Player I.) However, we take the matching pennies example to show how Player II can be trapped if employing a Hannan-type strategy based on a single randomization Z.

Recall our matching pennies example and consider the Hannan-type strategy

σ(1 − g_{t−1} + h_{t−1}Z₁, g_{t−1} + h_{t−1}Z₂) = [g_{t−1} > 1/2 + h_{t−1}(Z₁ − Z₂)/2],

where g_{t−1} is the proportion of 1's in the sequence of Player I moves from stage 1 to stage t − 1. In our example we take the scale factor h_t = 2/√t and let U = Z₁ − Z₂, where (Z₁, Z₂) is uniformly distributed in the unit square [0, 1]². Then

b_t = [g_{t−1} > 1/2 + U/√(t − 1)], t = 2, 3, ...,

where U ∈ [−1, 1]. Consider this strategy for Player I: a₁ = 0, a₂ = 0, a₃ = 1, and

a_t = a_{t−1} if b_{t−1} = 1 − a_{t−1}, a_t = 1 − a_{t−1} if b_{t−1} = a_{t−1}.

Thus, Player I continues to play the same move until he/she observes that Player II has matched it. Assume that Player II generates his/her first move b₁ and the randomization (Z₁, Z₂) independently, with P(b₁ = 0) > 0, so that P(b₁ = 0, U > 0) > 0. We will show that on the event (b₁ = 0, U > 0), Player I wins at least 3/5ths of the time in the limit, so that lim_N D_N(a, b) ≥ 3/5 − 1/2 = 1/10. Since the event has positive probability, there are move sequences a where the strategy b is not Hannan consistent. (If P(b₁ = 0) = 0, then P(b₁ = 1) = 1, and analysis of the Player I strategy starting with 1 1 0 leads to a similar conclusion.)

Assume that U ≥ 0. Let N₀ + 2 be the last stage before Player I first switches his/her play to the opposite move, i.e.,

g_{N₀} = (N₀ − 2)/N₀ ≤ 0.5 + U/√N₀, but g_{N₀+1} = (N₀ − 1)/(N₀ + 1) > 0.5 + U/√(N₀ + 1).

From the first inequality, 0.5 + U/√N₀ − (N₀ − 2)/N₀ ≥ 0, equivalently N₀ − 2U√N₀ − 4 ≤ 0, so that N₀ is the largest integer with √N₀ ≤ U + √(U² + 4). Since g_{N₀+1} exceeds the threshold, Player II switches his play from 0 to 1 at stage N₀ + 2. Player I, having been matched at stage N₀ + 2, then switches his play from 1 to 0 at stage N₀ + 3.

Table 2.1: Player I and Player II's dynamic play to N = N₀ + 3

Stage      1  2  3  ...  N₀  N₀+1  N₀+2  N₀+3
Player I   0  0  1  ...  1   1     1     0
Player II  0  0  0  ...  0   0     1     0

Let m₁ be the number of stages needed for Player II to switch his play back to 0 after stage N₀ + 2.

Table 2.2: Player I and Player II's dynamic play to N = N₀ + m₁ + 3

Stage       Player I  Player II
N₀          1         0
N₀+1        1         0
N₀+2        1         1
N₀+3        0         1
...         ...       ...
N₀+m₁       0         1
N₀+m₁+1     0         1
N₀+m₁+2     0         0
N₀+m₁+3     1         0

As Player I's and Player II's plays are listed above, the total number of 1's in Player I's play from stage 1 to stage N₀ + m₁ is N₀; thus g_{N₀+m₁} = N₀/(N₀ + m₁). By the definitions of m₁ and N₀, m₁ is the largest integer such that

0.5 + U/√(N₀ + m₁) − N₀/(N₀ + m₁) < 0.

Therefore √(N₀ + m₁) ≤ −U + √(U² + 2N₀), i.e., m₁ ≤ 2U² + N₀ − 2U√(2N₀ + U²).
Lemma 2.4.1. With the results about N₀ above, we have the bounds 2 ≤ m₁ ≤ 4 for all U ∈ [0, 1]; m₁ reaches its minimum at U = 1 and its maximum at U = 0. Similarly, we have the same conclusion for m₃, m₅, m₇, ..., i.e., 2 ≤ m_j ≤ 4, j = 1, 3, 5, 7, ....

Proof: For m₁: N₀ is the largest integer satisfying √N₀ ≤ U + √(U² + 4), so

N₀ ≤ (U + √(U² + 4))² = 2U² + 4 + 2U√(U² + 4),

and since N₀ > (U + √(U² + 4))² − 1, we also have √(2N₀ + U²) ≥ 2U + √(U² + 4). Then

m₁ ≤ 2U² + N₀ − 2U√(2N₀ + U²) ≤ 2U² + (2U² + 4 + 2U√(U² + 4)) − 2U(2U + √(U² + 4)) = 4.

For the lower bound of m₁, assume m₁ < 2, i.e., 0 < m₁ ≤ 1. Since m₁ > 2U² + N₀ − 2U√(2N₀ + U²) − 1, this gives 2U² + N₀ − 2 ≤ 2U√(2N₀ + U²), and thus

(2U² + N₀ − 2)² ≤ 4U²(2N₀ + U²).

Solving this quadratic inequality in N₀ yields N₀ ≤ 2.4 when U = 1, contradicting N₀ = 10 at U = 1 from the previous discussion of N₀. Therefore m₁ ≥ 2. Together with the first part of the proof, 2 ≤ m₁ ≤ 4. A similar argument gives 2 ≤ m_j ≤ 4 for j = 3, 5, 7, ... (for example, to treat m₃ one need only replace N₀ by N₂, which equals N₀ + m₁ + m₂). The proof is done. □

Let N₁ = N₀ + m₁; the proportion of 1's from stage 1 to stage N₁ has the property g_{N₁} = N₀/N₁ > 0.5 + U/√N₁. Let m₂ be the number of stages needed for Player II to switch his play back to 1. Then

g_{N₁+m₂} = (N₁ − m₁ + m₂ − 2)/(N₁ + m₂),

and m₂ is the largest integer such that

0.5 + U/√(N₁ + m₂) − (N₁ − m₁ + m₂ − 2)/(N₁ + m₂) > 0.

Solving for m₂, and using the property m₁ − N₀ ≤ 2U² − 2U√(2N₀ + U²),

m₂ ≤ 4U² + 4 − 2U√(2N₀ + U²) + 2U√(U² + 2m₁ + 4).

Lemma 2.4.2. With the discussion of m₁ and N₀, we claim that 3 ≤ m₂ ≤ 4. Similarly, for m₄, m₆, m₈, ..., we also have 3 ≤ m_j ≤ 4, j = 2, 4, 6, 8, ....

Proof: For m₂, claim first that m₂ ≤ 4. If m₂ > 4, i.e., m₂ ≥ 5, then since m₂ is an integer satisfying 4U² + 4 − 2U√(2N₀ + U²) + 2U√(U² + 2m₁ + 4) ≥ m₂ ≥ 5, we get 4U² + 2U√(U² + 2m₁ + 4) ≥ 1 + 2U√(2N₀ + U²); i.e., assuming U ≠ 0,

√(U² + 2m₁ + 4) ≥ 1/(2U) − 2U + √(2N₀ + U²).

By the previous discussion, √(2N₀ + U²) ≥ 2U + √(U² + 4), so the right-hand side is positive. Then

2m₁ ≥ (1/(2U) − 2U + √(2N₀ + U²))² − U² − 4 ≥ (1/(2U) + √(U² + 4))² − U² − 4 = 1/(4U²) + √(U² + 4)/U → ∞ as U → 0,

a contradiction with 2 ≤ m₁ ≤ 4 from the previous discussion. Thus m₂ ≤ 4. If m₂ < 3, i.e., m₂ ≤ 2, then since m₂ must be an integer,

4U² + 4 − 2U√(2N₀ + U²) + 2U√(U² + 2m₁ + 4) − 1 ≤ 2,

i.e., 4U² − 2U√(2N₀ + U²) + 2U√(U² + 2m₁ + 4) ≤ −1 for all U ∈ [0, 1]. However, when U = 0 the left side equals 0 > −1. Contradiction! Therefore, based on all the discussion above, 3 ≤ m₂ ≤ 4. Similar proofs hold for m₄, m₆, m₈, ...; in other words, 3 ≤ m_j ≤ 4 for j = 2, 4, 6, 8, .... The proof is done. □

According to the two lemmas above, we can bound Player II's total loss. Player II wins only twice in each cycle (each cycle contains m_{2k−1} plus m_{2k} stages and so has length at least 5), and since Player I plays 0 at stage 1, Player II's number of wins is at most 2 + 2(N − N₀)/5. Thus Player II's total loss is at least N − 2 − 2(N − N₀)/5 = 3N/5 + 2N₀/5 − 2, and the Cesaro loss satisfies

CL_N(a) ≥ 3/5 + O(1/N).

On the other side, the Bayes envelope satisfies g_N ∧ (1 − g_N) ≤ 1/2. Therefore,

D_N = CL_N(a) − g_N ∧ (1 − g_N) ≥ 1/10 + O(1/N).

Example 2.4 (Simulation of a Hannan-type strategy without refreshed randomization) Suppose the randomness used in Player II's strategy is U = 0.7.
Player I and Player II play as we described at the beginning of this section. Player II's average loss sequence and the Bayes envelope at each stage are shown in Figure 2.3, and the first 30 stages of the play are listed in Table 2.3.

Figure 2.3: Non-refreshed randomness, U = 0.7 (Player II's average loss and the Bayes envelope plotted over 500 stages).

Table 2.3: Non-refreshed randomness example with U = 0.7

stage  Player I  g_t     bar_t*  Player II  Loss_t
1      0         0       1.2000  0          0
2      0         0       0.9950  0          0
3      1         0.3333  0.9041  0          1
4      1         0.5000  0.8500  0          1
5      1         0.6000  0.8130  0          1
6      1         0.6667  0.7858  0          1
7      1         0.7143  0.7646  0          1
8      1         0.7500  0.7475  0          1
9      1         0.7778  0.7333  1          0
10     0         0.7000  0.7214  1          1
11     0         0.6364  0.7111  0          0
12     1         0.6667  0.7021  0          1
13     1         0.6923  0.6941  0          1
14     1         0.7143  0.6871  0          1
15     1         0.7333  0.6807  1          0
16     0         0.6875  0.6750  1          1
17     0         0.6471  0.6698  1          1
18     0         0.6111  0.6650  0          0
19     1         0.6316  0.6606  0          1
20     1         0.6500  0.6565  0          1
21     1         0.6667  0.6528  0          1
22     1         0.6818  0.6492  1          0
23     0         0.6522  0.6460  1          1
24     0         0.6250  0.6429  1          1
25     0         0.6000  0.6400  0          0
26     1         0.6154  0.6373  0          1
27     1         0.6296  0.6347  0          1
28     1         0.6429  0.6323  0          1
29     1         0.6552  0.6300  1          0
30     0         0.6333  0.6278  1          1

*bar_t = 0.5 + U/√t is the threshold; Player II's decision at stage t + 1 is based on comparing g_t with bar_t.

All the discussion and the simulation example show that it is necessary to refresh the randomization at each stage when a Hannan-type strategy is used to make decisions.
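The dynamic of Example 2.4 is easy to reproduce. The sketch below is a reconstruction under the stated rules (not the thesis's own code): it freezes U = 0.7, never refreshing the randomization, and plays the two strategies against each other; the printed average loss stays bounded away from the Bayes envelope.

```python
# A minimal sketch of the frozen-randomization trap: Player II plays
# b_t = [g_{t-1} > 0.5 + U/sqrt(t-1)] with U fixed at 0.7 forever;
# Player I repeats a move until matched and then switches.

from math import sqrt

U, N = 0.7, 500
a_hist, b_hist = [], []
ones, loss = 0, 0.0
for t in range(1, N + 1):
    if t <= 3:
        a_t = [0, 0, 1][t - 1]            # opening moves 0, 0, 1
    else:                                  # repeat until matched, then switch
        a_t = a_hist[-1] if b_hist[-1] != a_hist[-1] else 1 - a_hist[-1]
    if t == 1:
        b_t = 0                            # the event b_1 = 0 analyzed above
    else:
        g = ones / (t - 1)
        b_t = 1 if g > 0.5 + U / sqrt(t - 1) else 0
    loss += abs(a_t - b_t)
    a_hist.append(a_t)
    b_hist.append(b_t)
    ones += a_t

g_N = ones / N
print(loss / N, min(g_N, 1 - g_N))  # average loss stays well above the envelope
```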
Chapter 3

PARP Strategy for k-extended Envelope Problem

3.1 Introduction

Practical forecasting problems are of great variety. Sometimes we suspect that Nature, or the market (as our Player I), makes its decisions according to some pattern. The decision at one stage may be affected by the previous k stage decisions. For example, the market gives rise to a certain stock price increase; this may raise investors' confidence, and this confidence or follow-up may make the market give another increase the next day. Therefore, we are motivated to study such kinds of pattern behavior. Suppose Player I's move a_t is affected by a_{t−k}, a_{t−k+1}, ..., a_{t−1}; in this situation, our Bayes envelope is called the k-extended Bayes envelope, and the corresponding forecasting problem is called the k-extended Bayes envelope problem.

In this chapter, we give definitions and extensions of the PAP strategy and the PARP strategy for the two-extended envelope problem. Although we focus on the two-extended envelope problem, it is easy to generalize the two-extended case to k-extended cases.

3.2 Envelopes including Extended Envelopes

We have already introduced the simple envelope R for the evaluation of average loss in the repeated play of matching pennies. Hannan (1957, Sec 3) defines the simple envelope at stage N as the total loss N·CL_N to Player II had II known the empirical distribution of I's moves a₁, a₂, ..., a_N and played Bayes against this distribution at each stage t = 1, 2, ..., N. We consider what are called extended envelopes for repeated play, first introduced by Swain (1965) and Johns (1967) for the repeated play of a statistical decision problem and first analyzed and ordered in general terms in Gilliland and Hannan (1969). Extended envelopes can be defined as minimum average loss across specified sets of strategies, including those chosen to take advantage of possible Markov-type structure in the empirical distribution of I's moves.

Example 3.2.1 (Repeated Play of Matching Pennies). Consider the repeated play of matching pennies and the collection of strategies S = {p⁽⁰⁾, p⁽¹⁾, p⁽²⁾, p⁽³⁾}, where p⁽⁰⁾ and p⁽¹⁾ were defined in Example 1.1 and

p⁽²⁾₁ = a_N and p⁽²⁾_t = a_{t−1}, t = 2, 3, ..., N,

p⁽³⁾₁ = 1 − a_N and p⁽³⁾_t = 1 − a_{t−1}, t = 2, 3, ..., N.

These may be thought of as a stay strategy and a switch strategy, respectively, although the moves by strategies p⁽²⁾ and p⁽³⁾ at stage 1 are not possible given the rules for the repeated play. The extended envelope of order 2 is given by

R⁽²⁾(a_N) := min{CL_N(a_N, p) | p ∈ S}.

As the minimum over a larger set of strategies,

R⁽²⁾(a_N) ≤ R⁽¹⁾(a_N) = R(g_N) for all a_N.

Thus, the 2-extended envelope R⁽²⁾ is a more stringent envelope against which to compare the average loss of a Player II strategy.

For the explicit evaluation of the extended envelope R⁽²⁾, it is useful to consider all consecutive pairs contained in the sequence a_N, understood to be wrapped in a circle so that a_N precedes a₁. There are N consecutive, overlapping pairs. Let n_ij denote the count of the pairs with first component i and second component j, i, j = 0, 1. Then it follows that n₁ := n₁₀ + n₁₁ = N g_N = the number of 1's in the sequence a_N, n₀ := n₀₀ + n₀₁ = N(1 − g_N) = the number of 0's in the sequence a_N, and n₀₁ = n₁₀.

Proposition 3.2.1. If the n_ij are defined as above, then

N·R⁽²⁾(a_N) = (n₀₁ ∧ n₀₀) + (n₁₁ ∧ n₁₀).

Proof: From the definitions of n_ij and CL_N, we have

N·CL_N(a_N, p⁽⁰⁾) = n₁ = n₁₀ + n₁₁
N·CL_N(a_N, p⁽¹⁾) = n₀ = n₀₁ + n₀₀
N·CL_N(a_N, p⁽²⁾) = n₀₁ + n₁₀ = 2n₀₁
N·CL_N(a_N, p⁽³⁾) = n₀₀ + n₁₁

The proof can be completed by considering the four situations: (n₀₁ ≤ n₀₀ and n₀₁ ≤ n₁₁), (n₀₁ ≤ n₀₀ and n₀₁ ≥ n₁₁), (n₀₁ ≥ n₀₀ and n₀₁ ≤ n₁₁), and (n₀₁ ≥ n₀₀ and n₀₁ ≥ n₁₁). □

Example 3.2.2. Let N = 17 and a₁₇ = (0,1,1,1,0,1,0,1,1,0,0,0,1,1,0,1,0). In this case n₀₀ = 3, n₀₁ = n₁₀ = 5, n₁₁ = 4. Then the 2-extended envelope is 17·R⁽²⁾(a₁₇) = (3∧5) + (4∧5) = 7, while 17·R(9/17) = 17·R⁽¹⁾(a₁₇) = (3+5) ∧ (4+5) = 8.

Consider another example where the sequence of Player I moves shows greater second-order dependency, i.e., Markov structure. Take N = 17 and a₁₇ = (0,1,0,1,0,1,0,1,1,0,0,1,1,1,0,1,0). In this case n₀₀ = 2, n₀₁ = n₁₀ = 6, n₁₁ = 3. Then 17·R⁽²⁾(a₁₇) = (2∧6) + (3∧6) = 5, while 17·R(9/17) = 17·R⁽¹⁾(a₁₇) = (2+6) ∧ (3+6) = 8.

The extended envelope idea can be based on three-tuples, four-tuples, etc., and leads to an ordering for a family of k envelopes:

R⁽ᵏ⁾(a_N) ≤ R⁽ᵏ⁻¹⁾(a_N) ≤ ... ≤ R⁽¹⁾(a_N) = g_N ∧ (1 − g_N).

Regret relative to the 2-extended envelope is

D⁽²⁾_N(a, p) = CL_N(a, p) − R⁽²⁾(a_N).

This chapter includes the adaptation of the PARP strategy (defined in Chapter 2) to matching pennies that achieves uniform Hannan consistency with the 2-extended envelope, i.e.,

D⁽²⁾_N(a, parp) = O(N^{−1/2}) uniformly in a.

Remark: The wrapping of the sequence a_N gives an ordering to the resulting envelopes, an idea exploited in Gilliland and Hannan (1969). However, in developing Hannan consistent strategies in repeated play with the k-extended envelope, there are only t − k past k-tuples of Player I's consecutive moves available to Player II for basing a move at stage t, t = k+1, k+2, .... Thus, the regrets studied in the following sections are relative to envelopes based on the empirical distribution of the N − k + 1 (not N) k-tuples (a₁, a₂, ..., a_k), (a₂, a₃, ..., a_{k+1}), ..., (a_{N−k+1}, a_{N−k+2}, ..., a_N). The difference in regrets compared to those relative to envelopes based on the N k-tuples from the wrapped sequences is, for fixed k, of order O(1/N), since only k of the k-tuples are omitted at each stage N.

In the rest of this chapter we consider the repeated play of matching pennies. In Section 3.3 we continue the discussion of the 2-extended envelope. In Sections 3.4 and 3.5 we derive exact expressions for the Cesaro loss from PAP and PAP+ applied to the empirical distribution of the past. In Section 3.6 we show how PARP applies to give Hannan consistency for the 2-extended problem.

3.3 Two-extended envelope problem

Let's consider Player I's moves in an actual game of N = 13 trials.

Table 3.1: An example of Player I's actual moves in N = 13 trials

stage     1  2  3  4  5  6  7  8  9  10  11  12  13
Player I  0  1  0  1  1  0  1  0  1  0   0   1   0

Now, as Player II, we want to predict Player I's move on the (N+1)st trial. In order to do the prediction, we have considered PAP and have also discussed the strategy based on the random past (PARP), which is randomized from the empirical distribution, in previous chapters. However, another question arises: is Player I more likely to play 1 following a 0, or more likely to play 0 following a 0? In other words, is it possible that Player I is playing a certain 'pair' pattern? Here we explore a strategy to deal with such 'pair' pattern moves of Player I, i.e., our Bayes responses will be based on the past 'pairs'.

In general, Player I's move sequence a over N trials is

a_N: a₁, a₂, a₃, ..., a_{N−1}, a_N.

By pairing Player I's move on each trial with its next trial, we have a set E_N:

E_N = {(a₁, a₂), (a₂, a₃), (a₃, a₄), ..., (a_{N−1}, a_N)}.

For each pair in E_N, we take the first coordinate as the condition. Then we obtain two partition sets of E_N with respect to the condition of each pair, 0 or 1, and during this partition procedure we keep the order of the pairs. Therefore, partitioning with the first coordinate as the condition, we have:

A_N = {(a_i, a_{i+1}) | a_i = 0, i = 1, 2, ..., N−1}
B_N = {(a_i, a_{i+1}) | a_i = 1, i = 1, 2, ..., N−1}

Suppose Player I's move on the Nth trial is a_N = 0. Our Bayes response b_{N+1} is the move under the condition that the preceding move is 0. Naturally, our prediction should be based on Player I's pair moves under the same condition, i.e., on the set A_N. In the set A_N, all the second coordinates of the pairs are Player I's moves under the condition that the preceding move is 0. Therefore, we take the second coordinates of each pair, keep their order, and put them together to form a new sequence Ā_N. The new sequence Ā_N contains all the behavior of Player I under the condition that the preceding move is 0.
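The conditioning construction just described is short to implement. A minimal sketch (illustrative names) is checked here on the Table 3.1 sequence; the last line evaluates the unwrapped-count version of the 2-extended envelope of Proposition 3.2.1 from the conditional counts.

```python
# A minimal sketch of the conditioning construction: split the pairs
# (a_t, a_{t+1}) by their first coordinate and keep the second
# coordinates in order, giving the conditional subsequences used by
# the two-extended strategies.

def conditional_subsequences(a):
    sub = {0: [], 1: []}
    for t in range(len(a) - 1):
        sub[a[t]].append(a[t + 1])    # follow-up to a 0 or to a 1
    return sub

a13 = [0, 1, 0, 1, 1, 0, 1, 0, 1, 0, 0, 1, 0]   # the Table 3.1 sequence
sub = conditional_subsequences(a13)
print(sub[0])   # [1, 1, 1, 1, 0, 1]: Player I's moves after a 0
print(sub[1])   # [0, 1, 0, 0, 0, 0]: Player I's moves after a 1
print(min(sub[0].count(0), sub[0].count(1)) +
      min(sub[1].count(0), sub[1].count(1)))    # n'01∧n'00 + n'11∧n'10 = 2
```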
The difference in regrets compared to those relative to envelopes based on the N k-tuples from the wrapped sequences is for fixed I: of order 0(1/N) since only 1: of the k-tuples are omitted at each stage N. In the rest of this chapter we consider the repeated play of matching pennies. In section 3.3 we continue discussion of the 2-extended envelope. In section 3.4 and 3.5 we derive exact expressions for the Cesaro loss from PAP and PAP + applied to empirical distribution of past. In section 3.6, we show how PARP applies to give Hannan consistency for the 2-extended problem. 3.3 Two-extended envelope problem Let‘s consider Player I‘s moves in an actual game in N : 13 trials. Table 3.1: An example of Player I’s actual moves in N213 trials stage12345678910111213 Payer10101101010010 Now as Player 11, we want to predict Player l‘s move on the (N + I)“ trial. In order to do the prediction. we have considered PAP and also discussed the strategy based on random past (PARP) which is randomized from the empirical distribution in previous chapters. However, another question arises: Does Player I more likely play 1 following a 0 or more likely play 0 following 0. In another words. is it possible that Player I is playing on a certain ‘pair’ pattern. Here, we explore a strategy to deal with such kind of 'pair’ pattern moves of Player I, i.e. our Bayes responses will be based on the past ‘pairs’. In general. Player I ‘s move is s<~~quence g with .N trials: QNIals a2. a3r~~~ (IN—1» (1N- By pairing Player I's move on each trial and its next trial. we have a set EN: EN 2 {((LI, (12), (0.2. (13), ((1.3. (14)., . . ., ((1.1\.'_1, 01(1)}. For each pair in EN. we take the first. coordinates as the condition. Then, we can get two partition sets of EN with respect to the condition of each pair 0 or 1. and during this partition procedure. we keep the order of these pairs. Therefore. by partition with the first coordinate as the condition, we have: AN : {(a.,-.a.j)| a,- : 0, .27 : 1.2....,N— 1, j : 1.2,...,N} BN5ffm.a-j)l 04:1, .i:I,2.....N—1, i=1,2,...,N} Suppose Player I‘s move on the N t" is aN : 0. Our Bayes response bN+1 is the move under the condition, preceding move. is 0. Naturally, our prediction should be based on Player I‘s pair move under the same condition, i.e. on set AN. In set. AN. all the second coordinates of each pair are Player I‘s move with condi- tion: the preceding move is 0. Therefore, we take out the second coordinates of each pair, keep their orders and put them together to form a. new sequence .410. The new sequence AN contains all the behavior of Player I with the condition: the preceding move is 0. Take the example at the beginning of this section: We have: .4N:{(0,1), (0,1), (0,1). (0,1), (0,0)} , BN:{(1.0), (1.1). (1.0). (1.0). (1.0). (1.0)} 37 Since aN : 0, like what we discussed above, we consider AN, and take out all the second coordinates of each pairs in AN to form a new sequence: .41.;1. 1, 1, 1. 0, at“. where a N+1 is the move we want to predict. In sequence AN, each term is an individual play with the same condition: proceed- ing move is 0. In another word, we can consider each term of AN as an individual move. Therefore, we can use the strategies we have discussed already in previous chapter and the PARP from Gilliland and Jung (2006). 3.4 PAP Strategy in two-extended enveIOpe prob- lem We apply PAP strategy on the sets AN and 310 which we have constructed in last section. 
Theorem: I11 this kind of two~extended envelope problem, when the PAP strategy is used to do the (N + 1) stage prediction, the Cesaro expected loss sequence for his PAP strategy p is given by: (2) 0.5 1 , 0.5 CLN : 7V + N ‘ ("00w A ”01w + 77,10,111 A "’11.N) + N ‘ (an + #7212.) 0.5 , , 0.5 , , + N ‘ l"’00.N 7f "01.Nl + N ‘ [7110.N 75 7111.Nl 38 where 710W : number of (a)...ak+1) : (1,1) .for k : 1, 2.3, . . . . N — 1: 771 : (”bow + ”bi..N)~ 14,1 : {number of times of gk : 0.5 Ik : 1.2.3,... .711} g). is the enniirical proportion of 1 as the second coordinate through stage k in the sequence AN. 772 : (77"10.N+77’11.I\') pug : {number of times of gt. : 0.5 I]: : 1,2,3. . . . ,712} 91,. is the empirical proportion of 1 as the second coordinate through stage k in the sequence B N. Furthermore. min { C'Lg)(g,\y,pw)|fi.red 11:14.,“ 13: 0,1, ,j : 0.1} 9N _, . 0'5 + i . (. I A I + I A“, ) — N 7V "'00.N "OLN 7710.N “11.N and max { CLfi)(gN.pN)| f'II.redII./,j..vs i: 0.1, .j = 0.1} 9N —‘ 0.5 1 : 17V '1‘ 1.5 ' W ' (“CON A 7761mm? + ”II/10.1w A 7731“.) 0'5 I I ‘ _I _ I + N ' ([7100N 75 "OLNl + l"'10,N 7% 7711,1vl) Proof: a \'..‘,( 2c) .r.217:()‘ ' 1 .o) )a w A e _.,_ 74.))3'1‘11e‘. For 11,, se 116111 g”,V '(r Idmgt fcur tterns 1nt o-(\t(nd (1(11\(l( ( MI] 111 39 (0.0).(O.1).(1.0).(1.1), we can always create set a] from sequence QN where EN 2 {(01. (1,2), ((1.2. (13). ((1.3. (14). . . . ._ ((1-N_1. (1N)}. Since the first coordinate in a pair is 0 or 1, E; can be divided into two subse- quences: AN: {(a.,aj)1a.:0. .2T:1.2.....N—1.,j=2~~~~Nl BN3{(CZ¢.(lj)|(1,z-:L .2? : 1.2.....N _ 1. .- : 2 ...... A r} Therefore. we notice that AN H B N : (ll and AN U B N : fl. Thus. ,. "2 . A - C Liv) : Loss on a1 + Loss on second coordznates AN + Loss on second coordinates B N Since we always flip a coin on b1 as the start, Loss on a1 = 0.5. For sequence AN, since we only consider loss on second coordinate in each pair. we can form all the second coordinates into a sequence. By using past strategy )9. according to Gilliland and Jung (200(3). I I 7’ 00.N "011v \ Loss on second coordinates AN : nl - ( , I . , , . "now + 7701.N "’00.N + "OLN I - .. "oaN - : 0.0 - 11,11 + 0.0 -[ , I # 0.0] "ODJV + "01.1w" Since 771 is the the number of the pairs in the sequence AN, i.e. n1 : (7160.5. + "()er: tl‘lerefore we can simplify the formula above. Loss on second coordinates AN : ("bow /\ n'OLN) + 0.5 - V,” v" I I + 0.1) ' [II-OOJV ?Z- 7701Nl 40 where. V,” : {number of times of gt. : 0.5 | k : 1. 2. 3. . . . .771}. .(II: is the empirical proportion of 1 as the second coordinate through stage k in the sequence .4 N. Similar for sequence B N. . I I — Loss on second coordinates B N : ("10w /\ “-11.10 + 0.0 - ung + 0.5 ' [77"10.N 7e "”11.Nl where 722 :3 ("SUN + 77"11.N) ung : {number of times of gk : 0.5 | k = 1. 2, 3. . . . .712} gr. is the empirical proportion of 1 as the second coordinate through stage It in the sequence BN. Thus. the total loss can be written as the following form: . (2 .— — .N oC'LN) : 0.0 + ("bow /\ ngLN + 71’10‘NA "’11,!” + 0.0 - (11,11 + ung) + 0-5 ' [7100.N 5‘4 77'0er + 0-5 ' [niaN ¢ 7",11.Nl i.e.. .2 0.5 1 0;) CLEO : T + T. - ('Iigow /\ 7701A" + "’10.N /\ 77,11.N) + T ' (I/nl + H.712) a. A 1 0.5 0.5 + ,— ' l"bo.~ 75 716er + _N— ' l'N-iow 5‘ '"iml 41 I n I By Gilliland and Jung (2006). for fixed 111 and n1 - , 00‘!) ., mininnnn loss on ‘ "00 N “‘01 N the second coordinates 011 AN is achieved by the minimum number of Um. And min{loss on the second coordinates on AN} I I ”oaN "OLA" ) + 0 5 I 771 . 
(n’ + n’ n’ + n’ 'OO.N '01.N 'OO.N OLN Similar for the sequence BN. min{loss on the second coordinates on B N} I I : 'Il.‘2'(I +,/ I + I .0 ”MN ’7'11,N 7how "11.N i.e. fired Ngéj‘”, vi = 0115 7' = 0-1} min { CLEE)(QN~PN> a . _ — ”00w ”01w "'10.N "luv A N . . . . . . . ’2 Sunllarly. we can easlly derive for the explicit form for the maxn'num of C'LfN') , mar{toss on the second coordinates on AN} "be N ”in N I I l i ) + 05’ l'n-OOJV if 770er : nil . (72’ + n’ 72’ + n’ . ‘00.N ’01.N 'OO.N '01.»; ma:r{loss on the second coordinates on B N} ””10 N ”/11 N - I I l H ) + 0-0' l"00.N 7£ "01.1Vl __ , '9 . n... (n’ + n’ n’ + 77’ '10.N ’11.N ’10.N -’11.N Therefore. 1. niax{CL}3)(_qN.pN) I fired NglIM. i: 0.1, ,j : 0.1} 9A7 0.5 _ 1 z T + 1.0- N ' ("bow A "brN + "’10.N A "Inc/v) 0.5 ’ + TV ' ([72’0O.N 7g 71’01.Nl + [In/10.0" # ”ILNl) Proof is doneC] w- . 2 . . Comments 3.4.1: Slnee the 111ax1mu1n of the CLiv) IS aclneved by the sequence with maximum of UN. in another word. max loss on the second coordinate on .4 N /nl is achieved by the maximum of UN. i.e. I __ I "00.N * 7?01.N as many times as they can in the sequence AN. This indicate that the sequence AN : {(0.0) (0.1). (0.0). (0. 1). (0. 0). . . . } or conversely AN : {(0.1). (0.0). (0.1). (0.0). (0.1)....}. Similar for the sequence BN. the maximum case is the sequence: BN : {(1. 0). (1. 1). (1. 0). (1. 1). (1.0)... . } orconverselyBN:{(1.1). (1.0). (1.1). (1.0). (1.1)....} If we transfer the two sequence back to original sequence form. the maximum case aN-:00110011001100.... 01‘ gN211001100110011.... . . . . . (2 . . For the nnnnnum case. Since. the mnnmum of C LN) 18 obtalned l.:>y the sequence 43 with the minimum of UN i.e. I _ ,I . I _ I ”01w — 7"10.1w ”10.N — "11w As few times they can. i.e. AN : {(0.1). (0.1). (0.1). (0.1). (0.1)....} or BN : {(1.0). (1.0), (1.0). (1.0). (1.0)....}. In another word. the original sequence is: g..,.;01010101010110..... or reverse the position of 0 and 1. Corollary 3.4.1: When we take the dimension of the envelope as a higher di- mension k. k is a fixed positive integer. we can use the same idea as the theorem we proved above. Here we can take k:3 as an example to show the possibility of this generalization. For k:3. we have 23 : 8 triple patterns which is the combination of the three. dimension with each coordinates has 0 and 1 two choices: (0.0.0) (0.0.1) (0.1.0) (0.1.1) (1.0.0) (1.0.1) (1.1.0) (1.1.1) to transform original sequence a into EN : {(a1.a2.a3). (ag.a3.a4). (a3.a4.a5). . . .. (a.,,_2.(1.;(.-_1.a)(:)}. 44 Then. we can construct the sets .41. .42. A3 and A4: ALN 142‘N 443w AM As we do in . (Ij. Gk)! '.(lj.(lk)| :.aj.ak.)| -.a.J-.ak)| ai:03aj:0¢7::19....]\7—2~jZQe-Hsfv—ls A:3‘7\v} (1.-:00):1..2::1.....N—2.j=2....,N—1. k=31---J\’} (Li:1.aj:03i:1§---91\7_25j:2""?]v_17k:3‘.'.]\7} a.:1.aj:1.i:1.....N—2.j:2.....N—1,k:3....A'} the theorem we proved above: . 3 A - CL)? : Loss on a1 + Loss on a2 4 + E Loss on 3rd coordinates on 14.; N 1:] And Loss on 3"d coordinates on. 441w n’ n’ . 000.1v 001 .N . : 711 ( )+0.0-1/,,.1 n’ + n’ n’ + n’ -000.N 001.N '000.N 001.1: I - ”(101w - + 0.0 - [ I I # 0.0] "000w + "DOLN _ , I , I — . , I I —- (”0000' A "001.N) + 0-0 ”:11 + l”000.N 7e “DOLNl where ng‘I-J is the number of (ak.ak+1.ak+2) : (i.j. l) for k? : 1. 2. 3. . . . . A" — 2; 711 is the number of the pairs in the sequence Amie. n1 : (71600.1(, + 77001.10- Apply the similar ideas on .42. 143 and .44. and plug them into the formula of 40 C1153). we have: CL)? Furthermore. and mm CLEV) max CL)? 
3.5 PAP+ Strategy in two-extended envelope problem

Similar to the discussion in Chapter 2, after constructing and studying the PAP strategy in the two-extended envelope problem, we consider the play against the past plus present (PAP+) strategy.

Theorem: In the two-extended envelope problem, when the PAP+ strategy $\tilde p$ is used to make each stage's prediction, the Cesaro expected loss through stage $N$ is given by

$$CL_N^{(2)}(\tilde p) = \frac{0.5}{N} + \frac{1}{N}\big(n'_{00,N} \wedge n'_{01,N} + n'_{10,N} \wedge n'_{11,N}\big) - \frac{0.5}{N}\big(\mu_{n_1} + \mu_{n_2}\big)$$

where $n'_{ij,N}$ = number of pairs $(a_k, a_{k+1}) = (i, j)$ for $k = 1, 2, 3, \ldots, N-1$; $n_1 = n'_{00,N} + n'_{01,N}$; $\mu_{n_1}$ = number of times $\bar g_k = 0.5$ for $k = 1, 2, 3, \ldots, n_1$, with $\bar g_k$ the empirical proportion of 1 as the second coordinate through stage $k$ in the sequence $\bar A_N$; $n_2 = n'_{10,N} + n'_{11,N}$; and $\mu_{n_2}$ is defined in the same way for the sequence $\bar B_N$.

Proof: As in the proof for the PAP strategy, for any sequence $a_N$ we group the pairs according to the four patterns $(0,0), (0,1), (1,0), (1,1)$ of the two-extended envelope problem, define the four proportions with respect to these patterns, and create the set $\bar a_N = \{(a_1, a_2), (a_2, a_3), \ldots, (a_{N-1}, a_N)\}$ from the sequence $a_N$. By the first coordinate of a pair being 0 or 1, $\bar a_N$ is divided into the two subsequences $A_N$ and $B_N$ of the previous section, with $A_N \cap B_N = \emptyset$ and $A_N \cup B_N = \bar a_N$. Every $a_i$, $i = 2, 3, \ldots, N$, appears once and only once as a second coordinate, either in $A_N$ or in $B_N$, and the decision on $a_i$ depends only on $\{(a_j, a_k) \mid a_j = a_{i-1},\ j = 1, \ldots, i-2,\ k = 2, \ldots, i-1\}$; that is, the loss on the decision of $a_i$ in $a_N$ equals the loss on the decision of $a_i$ within its conditional subsequence. Thus

$$N \cdot CL_N^{(2)}(\tilde p) = \text{Loss on } a_1 + \text{Loss on second coordinates of } A_N + \text{Loss on second coordinates of } B_N$$

Since we always flip a coin on $b_1$ at the start, the loss on $a_1$ is 0.5. For the PAP+ strategy,

$$\text{Loss on second coordinates of } A_N = \big(n'_{00,N} \wedge n'_{01,N}\big) - 0.5 \cdot \mu_{n_1}$$

where $\mu_{n_1}$ counts the ties $\bar g_k = 0.5$, $k = 1, 2, 3, \ldots, n_1$, in the sequence $\bar A_N$, and similarly

$$\text{Loss on second coordinates of } B_N = \big(n'_{10,N} \wedge n'_{11,N}\big) - 0.5 \cdot \mu_{n_2}$$

In this way,
for the PAP+ strategy the total loss is

$$N \cdot CL_N^{(2)}(\tilde p) = 0.5 + \big(n'_{00,N} \wedge n'_{01,N} + n'_{10,N} \wedge n'_{11,N}\big) - 0.5\big(\mu_{n_1} + \mu_{n_2}\big)$$

which is the displayed formula. Proof is done. □

The explicit form of the expected loss of the PAP+ strategy above shows that when $\mu_{n_1}$ and $\mu_{n_2}$ reach their maxima $n_1/2$ and $n_2/2$, the expected loss approaches its minimum

$$\min CL_N^{(2)}(\tilde p) = \frac{0.5}{N} + \frac{1}{N}\big(n'_{00,N} \wedge n'_{01,N} + n'_{10,N} \wedge n'_{11,N}\big) - \frac{n_1 + n_2}{4N}$$

and when $\mu_{n_1}$ and $\mu_{n_2}$ reach their minimum 0, the expected loss reaches its maximum

$$\max CL_N^{(2)}(\tilde p) = \frac{0.5}{N} + \frac{1}{N}\big(n'_{00,N} \wedge n'_{01,N} + n'_{10,N} \wedge n'_{11,N}\big)$$

3.6 PARP Strategy in two-extended envelope problem

As in the last section for the PAP strategy, we partition the original sequence $a_N$ into the two sets $A_N$ and $B_N$, using the value at the previous stage as the condition, and extract the second coordinates of the pairs in these two conditional subsets to form the two subsequences $\bar A_N$ and $\bar B_N$. On these subsequences, instead of the PAP strategy, we use the PARP idea. For each fixed $N$, to forecast the next stage, i.e., stage $N+1$ of the sequence, we base the forecast on the value $a_N$ at stage $N$. If $a_N = 0$, we use the subsequence $\bar A_N$ from the empirical data set $A_N$: following the PARP idea, we draw a bootstrap sample from this subsequence and use the majority of the bootstrap sample as our forecast for $a_{N+1}$ (and similarly with $\bar B_N$ if $a_N = 1$). Unlike the PARP strategy introduced in the previous chapters, this is a conditional PARP strategy: PARP is applied not to the original sequence $a_N$ but to the conditional subsequence. With this idea, we have to investigate the asymptotic properties of the new methodology; a code sketch follows the theorem and comments below.

Theorem: If we apply the PARP strategy to the two-extended envelope problem and let $p^*$ be the PARP decision, then

$$N \cdot CL_N(a, p^*) - R_N^{(2)}(a) \le A + B\left(\sqrt{n'_{00,N} \wedge n'_{01,N}} + \sqrt{n'_{10,N} \wedge n'_{11,N}}\right)$$

for all Player I move sequences $a$ and all $N > 1$, where $A$ and $B$ are constants and $R_N^{(2)}(a) = n'_{00,N} \wedge n'_{01,N} + n'_{10,N} \wedge n'_{11,N}$ is the two-extended Bayes envelope.

Proof: We partition the original sequence $a$ into the two subsets

$$A_N = \{(a_i, a_j) \mid a_i = 0,\ i = 1, 2, \ldots, N-1,\ j = 2, \ldots, N\}$$
$$B_N = \{(a_i, a_j) \mid a_i = 1,\ i = 1, 2, \ldots, N-1,\ j = 2, \ldots, N\}$$

and take out the second coordinates of each subset to form the two subsequences $\bar A_N$ and $\bar B_N$. Applying the PARP strategy to the original sequence conditionally on the previous stage is equivalent to using the PARP strategy on $\bar A_N$ and $\bar B_N$. By Gilliland and Jung (2006), when we use the PARP strategy on $\bar A_N$ to predict, there are constants $A_1$ and $B_1$ such that

$$CL_{n_1}(\bar A, p^*) - \bar g_{n_1} \wedge (1 - \bar g_{n_1}) \le \frac{1}{n_1}\left(A_1 + B_1\sqrt{n_1 \cdot \bar g_{n_1} \wedge (1 - \bar g_{n_1})}\right)$$

where $CL_{n_1}(\bar A, p^*)$ is the Cesaro loss of the random past strategy on the sequence $\bar A_N$,

$$\bar g_{n_1} = \frac{n'_{01,N}}{n'_{01,N} + n'_{00,N}}$$

$n'_{01,N}$ = number of $(0,1)$ in $A_N$, $n'_{00,N}$ = number of $(0,0)$ in $A_N$, and $n_1 = n'_{01,N} + n'_{00,N}$, the number of pairs in $A_N$. Similarly, if $a_N = 1$ the Bayes response is based on $B_N$ instead of $A_N$, and when we use the random past strategy on $\bar B_N$ there are constants $A_2, B_2$ such that

$$CL_{n_2}(\bar B, p^*) - \bar g_{n_2} \wedge (1 - \bar g_{n_2}) \le \frac{1}{n_2}\left(A_2 + B_2\sqrt{n_2 \cdot \bar g_{n_2} \wedge (1 - \bar g_{n_2})}\right)$$

where $\bar g_{n_2} = n'_{11,N}/(n'_{10,N} + n'_{11,N})$, $n'_{10,N}$ = number of $(1,0)$ in $B_N$, $n'_{11,N}$ = number of $(1,1)$ in $B_N$, and $n_2 = n'_{10,N} + n'_{11,N}$.

Since $CL_{n_1}(\bar A, p^*)$ is the Cesaro loss conditional on $a_N = 0$ and $CL_{n_2}(\bar B, p^*)$ is the Cesaro loss conditional on $a_N = 1$, the total Cesaro loss is

$$CL_N(a, p^*) = P(a_N = 0) \cdot CL_{n_1}(\bar A, p^*) + P(a_N = 1) \cdot CL_{n_2}(\bar B, p^*)$$

Using the empirical distribution, we estimate $P(a_N = 0)$ by $(n'_{00,N} + n'_{01,N})/(N-1)$ and $P(a_N = 1)$ by $(n'_{10,N} + n'_{11,N})/(N-1)$. Thus

$$N \cdot CL_N(a, p^*) = \big(n'_{00,N} + n'_{01,N}\big) \cdot CL_{n_1}(\bar A, p^*) + \big(n'_{10,N} + n'_{11,N}\big) \cdot CL_{n_2}(\bar B, p^*)$$

Since the extended envelope is $R_N^{(2)}(a) = n'_{00,N} \wedge n'_{01,N} + n'_{10,N} \wedge n'_{11,N}$, and $n_1 \cdot \bar g_{n_1} \wedge (1 - \bar g_{n_1}) = n'_{00,N} \wedge n'_{01,N}$ and $n_2 \cdot \bar g_{n_2} \wedge (1 - \bar g_{n_2}) = n'_{10,N} \wedge n'_{11,N}$, we have

$$N \cdot CL_N(a, p^*) - R_N^{(2)}(a) = \big(n'_{00,N} + n'_{01,N}\big)\big(CL_{n_1}(\bar A, p^*) - \bar g_{n_1} \wedge (1 - \bar g_{n_1})\big) + \big(n'_{10,N} + n'_{11,N}\big)\big(CL_{n_2}(\bar B, p^*) - \bar g_{n_2} \wedge (1 - \bar g_{n_2})\big)$$
$$\le A_1 + B_1\sqrt{n'_{00,N} \wedge n'_{01,N}} + A_2 + B_2\sqrt{n'_{10,N} \wedge n'_{11,N}}$$

Let $A = A_1 + A_2$ and $B = \max\{B_1, B_2\}$; then

$$N \cdot CL_N(a, p^*) - R_N^{(2)}(a) \le A + B\left(\sqrt{n'_{00,N} \wedge n'_{01,N}} + \sqrt{n'_{10,N} \wedge n'_{11,N}}\right)$$

for all Player I move sequences $a$ and all $N > 1$. Proof is done. □

Comments 3.6.1: The theorem above shows that $|CL_N(a, p^*) - R_N^{(2)}(a)/N|$ has a uniform bound of order $O(N^{-1/2})$ over Player I move sequences; i.e., the Cesaro loss of the PARP strategy in the two-extended envelope problem converges to the Bayes envelope at rate $O(N^{-1/2})$.
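A minimal Python sketch of the conditional PARP forecaster and the two-extended envelope (the function names are my own; the bootstrap size and tie-breaking follow the description above):

    import random

    def parp_two_extended(a, rng=random.Random(0)):
        """Conditional PARP forecast of the next move given the prefix a:
        bootstrap-resample the second coordinates of past pairs whose
        first coordinate matches the last observed move, then play the
        majority of the bootstrap sample (coin flip on ties or no data)."""
        if not a:
            return rng.randint(0, 1)
        sub = [a[k + 1] for k in range(len(a) - 1) if a[k] == a[-1]]
        if not sub:
            return rng.randint(0, 1)
        boot = [rng.choice(sub) for _ in sub]  # bootstrap sample, size len(sub)
        ones = sum(boot)
        if 2 * ones == len(boot):
            return rng.randint(0, 1)
        return 1 if 2 * ones > len(boot) else 0

    def two_extended_envelope(a):
        """R_N^(2)(a) = min(n'00, n'01) + min(n'10, n'11)."""
        counts = {(i, j): 0 for i in (0, 1) for j in (0, 1)}
        for k in range(len(a) - 1):
            counts[(a[k], a[k + 1])] += 1
        return (min(counts[(0, 0)], counts[(0, 1)])
                + min(counts[(1, 0)], counts[(1, 1)]))

Comparing the simulated Cesaro loss of parp_two_extended with two_extended_envelope(a)/len(a) over long sequences illustrates the $O(N^{-1/2})$ convergence of Comments 3.6.1.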
Example 3.6.1: Suppose Player I plays 0 or 1 each time and we suspect that Player I's move at each stage is affected by his move at the last stage. It is therefore reasonable to formulate this forecasting problem as a two-extended envelope problem. We observed Player I's moves through $N = 13$ stages:

$$a_N = 1\ 1\ 0\ 1\ 0\ 0\ 1\ 0\ 1\ 1\ 1\ 0\ 1$$

We apply both the PAP and PARP strategies after each stage to make the decision for the next step. After the 13th stage we collect all the results in Table 3.2, which lists Player I's move $a_k$, the PAP decision $p_k$ based on Player I's past moves, the PARP decision $p^*_k$ (both reported as probabilities of forecasting 1), and the expected Cesaro losses for the PAP and PARP strategies.

Table 3.2: Two-extended envelope problem

    stage k   a_k   p_k    CL^(2)(a,p)   p*_k   CL^(2)(a,p*)
    1         1     0.5    0.5           0.5    0.5
    2         1     0.5    0.5           0.5    0.5
    3         0     1      0.667         1      0.667
    4         1     0.5    0.625         0.5    0.625
    5         0     0.5    0.6           0.5    0.6
    6         0     1      0.667         1      0.667
    7         1     0.5    0.643         0.5    0.643
    8         0     0.3    0.604         0.26   0.595
    9         1     0.67   0.574         0.74   0.557
    10        1     0.25   0.592         0.16   0.586
    11        1     0.4    0.592         0.32   0.595
    12        0     0.5    0.585         0.5    0.587
    13        1     0.75   0.559         0.84   0.553

Chapter 4

Discovering Hannan

This chapter gives the results of a search of the literature with the goal of finding published results that were in Hannan (1957) and not recognized as being there. We find several examples. To some extent, the cryptic style and notation of Hannan (1957) explain the failure of other researchers to fully exploit the Hannan work and to recognize the specific theorems that he proved. Motivation for this search is provided in part by Gina Kolata's New York Times article "Pity the Scientist Who Discovers the Discovered," February 5, 2006. In Section 4.1 we discuss some of the Kolata (2006) article and mention that Chen (1997) established the direct connection of the Foster and Vohra (1993) result to Hannan (1957). In Section 4.2 we show that the main result in Feder, Merhav and Gutman (1992) is contained in Hannan (1957). In Section 4.3 we consider, for the first and only time, the finite horizon version of repeated play (see Section 1.2) and connect results on minimax regret found in Cover (1967), Chung (1994) and Cesa-Bianchi and Lugosi (1999) to Hannan (1957).
4.1 Foster and Vohra: Selecting Forecasters

Kolata's lead paragraph mentions the Foster and Vohra (1993) paper "A Randomized Rule for Selecting Forecasters." The strategy proposed in the Foster and Vohra paper has the structure of a Hannan strategy, much in appearance like those covered by his Theorems 3, 4 and 6. (The structure of Hannan-type strategies was explained in Chapter 2, Section 3.)

Hannan's theorems claimed and proved the conclusions for strategies built on being Bayes versus random perturbations of the multinomial empirical counts $(t-1)\bar G_{t-1}$ of Player I's pure moves in repeated play of a game where Player I has $m$ possible pure moves. Chen (1997) reexamined the Hannan theorems and proofs and showed that the proofs actually cover the case where the empirical distributions $\bar G_{t-1}$ are replaced by the empirical distributions of randomization distributions taking values in the probability simplex in $R^m$. A randomization distribution $x$ is a probability distribution over the $m$ pure moves, and $(t-1)\bar G_{t-1}$ is replaced by $X_{t-1} = x_1 + x_2 + \cdots + x_{t-1}$. Chen (1997, Section 4.3) then shows how the Foster and Vohra strategy is a Hannan strategy, so that bounds on its regret and its asymptotics are a direct consequence of her reinterpretation of the Hannan theorems. Following Chen's work, Gilliland and Hannan (1999, 2008) improved on Chen, mainly through the demonstration of good bounds and Hannan consistency for strategies in the repeated play of the dual of the S-game. This work easily subsumes the expert selection problem considered by Foster and Vohra without the weakening of bounds (larger constants) that is inherent in Chen's approach.

Vohra is quoted in the following paragraph from Kolata (2006), in which Hannan's name is misspelled:

    In 1957, for example, a statistician named James Hanna called his theorem
    Bayesian Regret. He had been preceded by David Blackwell, also a statistician,
    who called his theorem Controlled Random Walks. Other, later papers had titles
    like "On Pseudo Games", "How to Play an Unknown Game", "Universal Coding" and
    "Universal Portfolios", Dr. Vohra said, adding, "It's not obvious how you do a
    literature search for this result."

As mentioned previously, Hannan and Blackwell used different approaches in the construction of their strategies, and it is likely that neither one named his theorems. Moreover, the term "Bayesian Regret" was probably never used by Hannan. "Controlled Random Walks" is the title of a talk and a subsequent proceedings paper by Blackwell (1956) that gives a general result that can be applied to produce Hannan-consistent strategies. "On Pseudo Games", "How to Play an Unknown Game", "Universal Coding" and "Universal Portfolios" denote different, general but related topics. The quoted paragraph might leave the false impression that exactly one result or theorem has been given the different names.

4.2 Feder, Merhav and Gutman: Universal Prediction

Feder, Merhav and Gutman (1992) considered the problem of predicting the next stage of an individual binary sequence using finite memory. In Section III of their paper they define the predictor, called the "S-State Universal Sequential Predictor," as follows:

$$\hat a_{t+1} = \begin{cases} 0 & \text{with probability } \phi(\hat p_t(0)) \\ 1 & \text{with probability } \phi(\hat p_t(1)) = 1 - \phi(\hat p_t(0)) \end{cases}$$

where $\phi(\cdot)$ is given by

$$\phi(\alpha) = \begin{cases} 0, & 0 \le \alpha < \tfrac12 - \epsilon \\ \tfrac{1}{2\epsilon}\big(\alpha - \tfrac12\big) + \tfrac12, & \tfrac12 - \epsilon \le \alpha \le \tfrac12 + \epsilon \\ 1, & \tfrac12 + \epsilon < \alpha \le 1 \end{cases}$$

and $\bar g_{t-1}$ is the proportion of 1's in Player I's play from stage 1 to stage $t-1$.
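A minimal Python sketch of this predictor (the function names are mine), assuming the argument of $\phi$ is the empirical proportion of the corresponding symbol:

    import random

    def phi(alpha, eps):
        """FMG piecewise-linear map from an empirical proportion to a
        prediction probability, as displayed above."""
        if alpha < 0.5 - eps:
            return 0.0
        if alpha > 0.5 + eps:
            return 1.0
        return (alpha - 0.5) / (2 * eps) + 0.5

    def fmg_predict(past_bits, eps=0.1, rng=random.Random(0)):
        """Predict the next bit: 1 with probability phi(proportion of 1's)."""
        g = sum(past_bits) / len(past_bits) if past_bits else 0.5
        return 1 if rng.random() < phi(g, eps) else 0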
In other words, Player II's predictor is

$$\hat a_{t+1} = \begin{cases} 0 & \text{with probability } P\left(Z_2 - Z_1 \le \dfrac{1 - 2\bar g_{t-1}}{4\epsilon}\right) \\[1mm] 1 & \text{with probability } P\left(Z_2 - Z_1 > \dfrac{1 - 2\bar g_{t-1}}{4\epsilon}\right) \end{cases}$$

Now we can show that the Feder, Merhav and Gutman strategy's probability of playing 1, $\phi(\hat p_t(1)) = 1 - \phi(\hat p_t(0))$, is the same as the probability of playing 1 in Hannan's strategy, $P\big(Z_2 - Z_1 > (1 - 2\bar g_{t-1})/(4\epsilon)\big)$, with a specified distribution for $Z = (Z_1, Z_2)$. Take $Z_1$ to be a random variable with the uniform distribution on $[0, 1]$, let $Z_2$ be degenerate at $1/2$, and let $\alpha = \bar g_{t-1}$, the proportion of 1 in the empirical distribution of Player I's moves from stage 1 to $t-1$. Then we have

$$\phi(\alpha) = P\left(Z_1 \le \frac12 + \frac{1}{2\epsilon}\Big(\alpha - \frac12\Big)\right) = \begin{cases} 0, & 0 \le \alpha < \tfrac12 - \epsilon \\ \tfrac{1}{2\epsilon}\big(\alpha - \tfrac12\big) + \tfrac12, & \tfrac12 - \epsilon \le \alpha \le \tfrac12 + \epsilon \\ 1, & \tfrac12 + \epsilon < \alpha \le 1 \end{cases}$$

so the Feder, Merhav and Gutman predictor is exactly a Hannan strategy with this perturbation distribution.

4.3 Minimax Regret

We develop an expression for the minimax regret $D_N^{mM}$, where $Y$ denotes the number of 1's in Player I's $N$ moves and, for the minimax evaluation, $Y \sim B(N, \tfrac12)$. Since $\min\{Y, N-Y\} = N - \max\{Y, N-Y\}$, the minimax regret can be expressed as

$$D_N^{mM} = \frac{1}{N}\,E\max\{Y, N-Y\} - \frac12$$

A calculation using the symmetry of the $B(N, \tfrac12)$ distribution about $N/2$ shows that, with $N = 2k+1$,

$$\max\{Y, 2k+1-Y\} = \begin{cases} 2k+1, & Y = 0 \text{ or } 2k+1 \\ 2k, & Y = 1 \text{ or } 2k \\ 2k-1, & Y = 2 \text{ or } 2k-1 \\ \quad\vdots \\ k+1, & Y = k \text{ or } k+1 \end{cases}$$

Therefore

$$E\max\{Y, 2k+1-Y\} = \frac{2}{2^{2k+1}} \sum_{j=0}^{k} \binom{2k+1}{j}(2k+1-j) = (2k+1)\left(\frac12 + \frac12\,P(Y^* = k)\right)$$

where $Y^* \sim \text{Binomial}(2k, \tfrac12)$, i.e.,

$$E\max\{Y, 2k+1-Y\} = (2k+1)\left\{\frac12 + \frac{(2k)!}{2 \cdot 4^k\,k!\,k!}\right\}$$

And with $N = 2k$, similarly,

$$E\max\{Y, 2k-Y\} = 2k\left\{\frac12 + \frac{(2k)!}{2 \cdot 4^k\,k!\,k!}\right\}$$

It follows that the minimax regret is given by

$$D_{2k+1}^{mM} = D_{2k}^{mM} = \frac12 \cdot \frac{(2k)!}{4^k\,k!\,k!} = \frac12\,P(Y^* = k) \qquad (4.9)$$

where $Y^* \sim \text{Binomial}(2k, \tfrac12)$, $k = 1, 2, 3, \ldots$. Furthermore, since $D_{2k}^{mM} > 0$ for all $k = 1, 2, \ldots$ and

$$\frac{D_{2(k+1)}^{mM}}{D_{2k}^{mM}} = \frac{2k+1}{2k+2} < 1$$

the minimax regret decreases monotonically to 0.

Chung (1994) and Cesa-Bianchi and Lugosi (1999) give minimax strategies and regret expressions, (4.10) and (4.11), where $\max_{i \le N}(s)$ gives the maximum payoff for sequence $s$ among $i$ experts given the total stages $N$. Since these two predictions are essentially the same, we take the Cesa-Bianchi and Lugosi (1999) strategy as an example and specialize it to the simple Bayes envelope. Then

$$\inf_p L_p(y^{t-1}0, Y^{N-t}) - \inf_p L_p(y^{t-1}1, Y^{N-t}) = \begin{cases} 1, & S \ge N/2 - (t-1)\,\bar g_{t-1} \\ 0, & S = N/2 - (t-1)\,\bar g_{t-1} - 1/2 \\ -1, & S \le N/2 - (t-1)\,\bar g_{t-1} - 1 \end{cases}$$

where $S = \sum_{i=t+1}^{N} Y_i \sim \text{Binomial}(N-t, \tfrac12)$, since $Y_i \sim \text{Bernoulli}(\tfrac12)$, $i = t+1, \ldots, N$, and $\bar g_{t-1}$ is the proportion of 1 from stage 1 to stage $t-1$. In this way, (4.10) is converted to

$$P_t^{mM}(\bar g_{t-1}) = \frac12 + \frac12\Big[P\big(S \ge N/2 - (t-1)\,\bar g_{t-1}\big) - P\big(S \le N/2 - 1 - (t-1)\,\bar g_{t-1}\big)\Big] \qquad (4.12)$$

where $S \sim \text{Binomial}(N-t, \tfrac12)$. This shows that if we specify the envelope to be the simple Bayes envelope, the minimax regret results in both Chung (1994) and Cesa-Bianchi and Lugosi (1999) are the same as found in (4.9), Hannan (1957).
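As a quick numeric check of (4.9), here is a short Python computation (the helper name is mine); by Stirling's formula the value behaves like $1/\sqrt{2\pi N}$:

    from math import comb, pi, sqrt

    def minimax_regret(N):
        """D^mM_N = (1/2) * P(Y* = k) with Y* ~ Binomial(2k, 1/2), k = N // 2;
        by (4.9) the value is the same for N = 2k and N = 2k + 1."""
        k = N // 2
        return 0.5 * comb(2 * k, k) / 4 ** k

    for N in (2, 10, 100, 1000):
        print(N, minimax_regret(N), 1 / sqrt(2 * pi * N))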
Chapter 5

Expert Selection Problem

5.1 Introduction and Review

Nowadays, all kinds of consulting services are booming, especially in financial services. Many financial companies and agencies give advice every day to all kinds of investors, using different and complicated systems or algorithms to analyze the financial market for different financial products and to forecast the market for these products. As experts with experience and knowledge in finance, each of them tries to persuade individual investors to take his or her advice. However, surrounded by so many experts' advice, how can an investor make a decision? This is called the expert problem.

Littlestone (1988) generalized the earlier researchers' ideas to an arbitrary set of experts. However, in his strategy, randomness is not included in the forecasting process. His strategy picks the expert whose forecasting record is the best as the best expert in the set of experts, and uses this best expert's prediction as the final forecast. He showed that as long as there exists one expert whose forecast is correct in all stages, the final decision maker will not make more than $\log_2 N$ mistakes, where $N$ is the total number of stages.

To remove the restriction in Littlestone (1988), i.e., to consider the case that among all experts there is no expert who is always correct, Littlestone and Warmuth (1989) introduced the weighted majority algorithm. In this strategy they assign a weight to each expert. Once an expert makes a mistake in forecasting, he receives a penalty: his new weight is his old weight multiplied by $k$, $0 < k < 1$, i.e., the weight on his advice in the final decision is reduced (a sketch of this update appears at the end of this section). Furthermore, Warmuth and Haussler et al. (1993) considered the weighted majority strategy in the situation where each expert's forecast is a probability distribution on the set $\{0, 1\}$. In all these strategies, randomness is not involved in selecting the expert's actual forecast.

Hannan in 1957 first proposed the idea of bringing randomness into sequential forecasting problems. In Hannan's strategy, a random factor is added to the empirical distribution, and a predictor based on this adjusted empirical distribution is used as the forecaster for the next stage of play in a repeated game, as introduced in previous chapters. This idea can be carried over to the expert selection problem. Foster and Vohra (1993) proposed a randomized rule for selecting experts. They first posed this actual expert selection problem, instead of predicting a probability distribution over the set of experts or combining the experts' advice. However, the randomized strategy they proposed is equivalent to Hannan's strategy, as proved by Chen (1997) and improved in Gilliland and Hannan (1999, 2008).

By using bootstrap sampling, we introduce the PARP strategy to the expert selection problem, especially the two-expert selection problem, which is discussed in Section 5.3. Section 5.2 introduces an example of the use of expert selection in the real world, a methodology called focus forecasting. At the end of this chapter, a simulation example of using the PARP strategy in financial forecasting is given.
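A minimal sketch of one round of the weighted majority update just described (a toy version with my own function name; $k$ is the penalty multiplier):

    def weighted_majority_round(weights, predictions, outcome, k=0.5):
        """One stage of weighted majority: predict by weighted vote over the
        experts' 0/1 predictions, then multiply the weight of every expert
        who was wrong by k (0 < k < 1)."""
        vote1 = sum(w for w, p in zip(weights, predictions) if p == 1)
        vote0 = sum(w for w, p in zip(weights, predictions) if p == 0)
        decision = 1 if vote1 >= vote0 else 0
        new_weights = [w * (k if p != outcome else 1.0)
                       for w, p in zip(weights, predictions)]
        return decision, new_weights

Note the contrast with the PARP approach developed below: the weighted majority update is deterministic given the histories, while PARP injects fresh randomness through bootstrap sampling.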
5.2 Focus Forecasting

5.2.1 Introduction

In inventory management, forecasting is essential. The term focus forecasting, a new concept of forecasting, was coined by Bernard Smith (1978). With around 30 years of usage so far, this method, described as a heuristic methodology, is used widely in industry. Over 800 companies in 47 countries worldwide use Demand Solutions, which is designed around focus forecasting, in their inventory management. The method is described as a simple simulation approach to optimization: more practical, easier to understand, and a simple system to work with.

Focus forecasting constructs a pool of alternative decision rules for forecasting one stage ahead. At every stage, all the decision rules or models in the pool are tested against the empirical data generated before this stage, and the rule with the smallest error is selected for the decision. Therefore focus forecasting simulates every time it forecasts; it is a dynamic simulation. It uses a computer to simulate every time, comparing the errors of all the rules to pick one to use in the current forecast. Regardless of the seasonal or trend type of the time series data, focus forecasting just picks the one best strategy based on the empirical test against recent history data.

In inventory management the traditional method is exponential smoothing, which is taught to almost every student in inventory management and is still the most widely used forecasting method in the world today. However, focus forecasting does not use exponential smoothing to approximate a moving average. The reason Smith states in Bernard Smith (1978):

    In those early computers, storing a twelve month inventory history to
    calculate a moving average was expensive, inaccurate, and dangerous. So Bob
    Brown used exponential smoothing... Focus Forecasting doesn't use exponential
    smoothing to approximate moving average. Why? Well, computers today don't
    make mistakes. They are nearly 100 percent accurate...

However, Gardner and Anderson (2001) compared focus forecasting and exponential smoothing, showing that exponential smoothing is substantially more accurate than the Demand Solutions approach. Although there has been some criticism of focus forecasting in the academic field, we can still notice an interesting idea in focus forecasting, namely the Play Against the Past strategy's idea.

5.2.2 Methodology

Focus forecasting constructs a pool of decision rules or strategies. Some of these rules are designed to recognize trend, some to recognize seasonality. For example, 'whatever the demand was in the past three months will probably be the demand in the next three months' would be a rule recognizing trend instead of seasonality, while a simple rule such as 'whatever percentage increase or decrease we had over last year in the last three months will probably be the percentage increase or decrease over last year in the next three months' would be a rule recognizing seasonality. Gardner, Anderson-Fletcher and Wicks (2001) list the seventeen decision rules included in Demand Solutions; the rules are functions of the previous quarterly data (a selection sketch follows the list):

1. Next quarter will equal last quarter.
2. Next quarter will equal last quarter plus a growth factor.
3. Next quarter will equal the same quarter a year ago.
4. Next quarter will equal the same quarter a year ago plus a growth factor.
5. Next quarter will equal the average of the last two quarters.
6. Next quarter will equal the average of the last two quarters, plus a growth factor.
7. Next quarter will equal the average of the last two quarters with the last quarter double-weighted.
8. Next quarter will equal the last quarter plus the difference of the corresponding quarters last year.
9. Next quarter will equal the average of the last three quarters, with the last quarter double-weighted, and with seasonal adjustment.
10. Next quarter will equal the average of the same quarter in the last two years plus a growth factor.
11. Next quarter will equal the average of the last quarter of the current year plus the difference of the corresponding quarters from the last year plus the difference of the corresponding quarters from two years ago.
12. Next quarter will equal the average quarter of the last year.
13. Next quarter will equal the average quarter of the last year plus a growth factor.
14. Next quarter will equal the average quarter of the last two years.
15. Next quarter will equal the average quarter of the last two years with seasonal adjustment.
16. Next quarter will equal the average quarter of the last year plus the change from the average quarter two years ago.
17. Next quarter will equal the average quarter last year, plus the change from the average quarter two years ago, with seasonal adjustment.

Then, during the simulation procedure, an error of measurement for each strategy at each time is computed, and the final forecast strategy is selected among these decision rules. From these decision rules' definitions we can easily notice that although the final decision is selected among these rules, the final decision is a function of the history data, i.e., past data, since all the decision rules are functions of past data. In other words, focus forecasting uses a Play Against the Past strategy to make its decision. From our discussion of the PAP strategy and its failure, we can see that there are situations in which focus forecasting fails.
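As a small illustration of the selection loop, here is a Python sketch with a few toy rules in the spirit of the list above (my own simplified stand-ins, not the actual Demand Solutions formulas; scoring uses only the most recent quarter):

    def focus_forecast(history, rules):
        """Score each rule by its error on the most recent quarter, then use
        the best rule to forecast the next quarter (pure play-against-past)."""
        def recent_error(rule):
            return abs(rule(history[:-1]) - history[-1])
        best_rule = min(rules, key=recent_error)
        return best_rule(history)

    rules = [
        lambda h: h[-1],                            # rule 1: repeat last quarter
        lambda h: h[-4] if len(h) >= 4 else h[-1],  # rule 3: same quarter a year ago
        lambda h: (h[-1] + h[-2]) / 2,              # rule 5: average of last two
    ]
    print(focus_forecast([10, 12, 11, 13, 12, 14, 13, 15], rules))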
5.3 Two experts selection problem

Suppose there are two experts who give predictions against the market each day. Their error probabilities are recorded at the end of the day:

$$\text{Expert 1: } X_1, X_2, X_3, \ldots, X_{n-1}, X_n, \ldots$$
$$\text{Expert 2: } Y_1, Y_2, Y_3, \ldots, Y_{n-1}, Y_n, \ldots$$

where we assume the errors are bounded; without loss of generality we take $\{X_i\}$ and $\{Y_i\} \in [0, 1]$. Selecting an expert for each stage is repeated play of the component game where Player I selects a pure move $a = (x, y) \in [0,1]^2$, Player II selects a coordinate $b \in \{1, 2\}$, and the loss function for Player II is

$$L(a, b) = X[b = 1] + Y[b = 2]$$

If $\pi$ is a probability distribution on $[0,1]^2$, then the Bayes risk of $b$ is

$$L(\pi, b) = \int L(a, b)\,d\pi(a) = E_\pi(X)[b = 1] + E_\pi(Y)[b = 2]$$

A Bayes response is any choice $b$ minimizing the Bayes risk; thus

$$b = \begin{cases} 1, & E_\pi(X) < E_\pi(Y) \\ \text{arbitrary}, & E_\pi(X) = E_\pi(Y) \\ 2, & E_\pi(X) > E_\pi(Y) \end{cases}$$

and the minimum Bayes risk is $E_\pi(X) \wedge E_\pi(Y)$. In repeated play, Player II uses $b_t(a_{t-1})$ for stage $t$, $t = 2, 3, \ldots$, and the simple envelope is $\bar X_N \wedge \bar Y_N$, where $(\bar X_N, \bar Y_N)$ is the average of $(X_1, Y_1), (X_2, Y_2), \ldots, (X_N, Y_N)$.

5.3.1 Reduction to Z-problem

Let $Z_i = X_i - Y_i$. Then the Bayes envelope of the two-expert selection problem, $R_N = \bar X_N \wedge \bar Y_N$, is equivalent to $R_N = \bar Z_N \wedge 0 + \bar Y_N$. Thus the two experts' forecasting error sequences are equivalent to

$$Z:\ Z_1, Z_2, Z_3, \ldots, Z_{N-1}, Z_N, \ldots$$

and the zero sequence, since the $Y_i$ are fixed and given by the past history data. In this way the original two-expert problem is transformed into a one-dimensional game. Each term of this Z-sequence can be an arbitrary number from $-1$ to $1$; then, by the previous result,

$$N \cdot D_N(Z, p(\text{parp})) = \sum_{k=1}^{N} P_k \cdot Z_k - \left(\sum_{k=1}^{N} Z_k\right) \wedge 0$$

where $P$ is the probability sequence $P_1, P_2, \ldots, P_N$, with $P_k = \text{Prob}(\bar Z^*_{k-1} \le 0)$ for $k = 2, 3, \ldots, N$, $P_1$ an arbitrary number from $[0, 1]$, and $\bar Z^*_{k-1}$ the average of the random (bootstrap) sample from $\{Z_1, Z_2, \ldots, Z_{k-1}\}$; a code sketch is given below.

Let $0 \le A_N \uparrow$ and $0 \le B_N \uparrow$ be such that $-A_N \le N \cdot D_N \le B_N$. If we remove all the 0 terms from this Z-sequence to form a new sequence $\tilde Z$, a subsequence of the original Z-sequence, and let $m_N \le N$ be the number of non-zero terms (also the number of terms in $\tilde Z$), then, since $P$ depends on the past only through the past summation, we have

$$N \cdot D_N(Z, P) = m_N \cdot D_{m_N}(\tilde Z, \tilde P)$$

Therefore we have a stronger bound for the original regret:

$$-A_{m_N} \le N \cdot D_N(Z, P) \le B_{m_N}$$

When $\{X_i\}, \{Y_i\} \in \{0, 1\}$, the non-zero $Z_i \in \{-1, 1\}$, which is equivalent to the $\{0, 1\}$ matching binary bits problem. In fact, any two-state game, i.e., one where the two players' action sets are $\{a, b\}$, is equivalent to the $\{0, 1\}$ matching binary bits problem.
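A minimal sketch of these Z-problem quantities in Python (names are mine; $P_k$ is estimated by Monte Carlo over bootstrap samples rather than computed exactly):

    import random

    def parp_prob(z_past, B=500, rng=random.Random(1)):
        """Monte Carlo estimate of P_k = Prob(bootstrap mean of the past
        Z-terms <= 0); returns 0.5 when there is no past (P_1 arbitrary)."""
        if not z_past:
            return 0.5
        hits = 0
        for _ in range(B):
            sample = [rng.choice(z_past) for _ in z_past]
            hits += sum(sample) <= 0
        return hits / B

    def z_regret(z):
        """N * D_N(Z, P) = sum_k P_k * Z_k - min(sum_k Z_k, 0)."""
        total = sum(parp_prob(z[:k]) * z[k] for k in range(len(z)))
        return total - min(sum(z), 0.0)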
Lemma 5.3.1: If in a repeated game $Z_k \in \{-1, 1\}$, $P_k \in [0, 1]$, and $L(Z, P) = Z \cdot P$, then the regret of this game is equal to the regret of matching binary bits, i.e.,

$$D_N(Z, P) = D_N(X, P)$$

where $X \in \{0, 1\}$, $P \in [0, 1]$ is the matching binary bits problem.

Proof: By the definition of the loss function,

$$L_k(Z_k, P_k) = \begin{cases} -P_k, & Z_k = -1 \\ P_k, & Z_k = 1 \end{cases}$$

where $P_k = \text{Prob}(\bar Z^*_{k-1} \le 0)$. And from the matching binary bits game, $X \in \{0, 1\}$, $P \in [0, 1]$, and

$$L_k(X_k, P_k) = \begin{cases} P_k, & X_k = 0 \\ 1 - P_k, & X_k = 1 \end{cases}$$

where $P_k = \text{Prob}(\bar X^*_{k-1} \ge \tfrac12)$. Let $q_k = 1 - P_k = \text{Prob}(\bar X^*_{k-1} \le \tfrac12)$. Then

$$L_k(X_k, P_k) = X_k q_k + (1 - X_k)(1 - q_k) = 2X_k q_k + 1 - q_k - X_k$$

Since there is the one-to-one transformation $X = \tfrac12(Z + 1)$, or $Z = 2X - 1$, we have $\bar Z^*_k = 2\bar X^*_k - 1$, i.e., $\text{Prob}(\bar Z^*_{k-1} \le 0) = \text{Prob}(2\bar X^*_{k-1} - 1 \le 0) = q_k$. Thus

$$L_N(Z, q) = \sum_{k=1}^{N} (2X_k - 1)\,q_k = \sum_{k=1}^{N} \big(L_k(X_k, P_k) - 1 + X_k\big) = N \cdot \bar L_N(X, P) - N(1 - \bar X_N)$$

Therefore, since $\bar Z_N = 2\bar X_N - 1$ and the Bayes envelope of the matching binary bits game satisfies

$$\bar X_N \wedge (1 - \bar X_N) = (2\bar X_N - 1) \wedge 0 + (1 - \bar X_N)$$

the regret of the Z-sequence is

$$D_N(Z, P) = \bar L_N(Z, q) - \bar Z_N \wedge 0 = \bar L_N(X, P) - (1 - \bar X_N) - (2\bar X_N - 1) \wedge 0 = \bar L_N(X, P) - \big(\bar X_N \wedge (1 - \bar X_N)\big) = D_N(X, P)$$

Proof is done. □

Comments 5.3.1: With the lemma proved, the study of the two-expert selection system is equivalent to the study of the sequence $Z$, where $Z_i = X_i - Y_i$. This is used in the following sections.

5.3.2 Worst case discussion

In Gilliland and Jung (2006), the asymptotic convergence property was proved by considering the worst case $a$ for the strategy. For the matching binary bits problem, the worst case for both the PAP and PARP strategies is

$$a:\ 0\ 1\ 0\ 1\ 0\ 1\ 0\ 1\ 0\ 1\ \ldots$$

By establishing a bound for the modified regret of this worst case, a uniform bound of the regret for all Player I play sequences $a$ was developed. To study the asymptotic convergence of the PARP strategy in the two-expert selection problem, it is reasonable to seek and analyze a worst case.

Lemma 5.3.2: The worst case of the modified regret of the PARP strategy is not achieved at the boundary, i.e., $\max_a D_N(a, b)$ is not achieved on the boundary.

Proof: Suppose $n = 3$, so the two-expert system is $(X_1, Y_1), (X_2, Y_2), (X_3, Y_3)$. The modified regret of the Play Against Random Past strategy is

$$3 \cdot D_3(a, b) = \frac{X_1 + Y_1}{2} + X_2[X_1 \le Y_1] + Y_2[X_1 > Y_1] + X_3 \cdot P_2 + Y_3 \cdot (1 - P_2) - (X_1 + X_2 + X_3) \wedge (Y_1 + Y_2 + Y_3)$$

where $P_2 = \text{Prob}(\bar Z^*_2 \le 0)$ and $[\cdot]$ is an indicator function. By the discussion of the last section, the two-expert selection problem is isomorphic to the Z-problem. Let $Z_i = X_i - Y_i$; then

$$3 \cdot D_3(a, b) = \frac12(Z_1 + 2Y_1) + Z_2[X_1 \le Y_1] + Y_2 + Z_3 \cdot P_2 + Y_3 - (Z_1 + Z_2 + Z_3) \wedge 0 - (Y_1 + Y_2 + Y_3)$$

i.e.,

$$3 \cdot D_3(a, b) = \frac12 Z_1 + Z_2 \cdot [Z_1 \le 0] + Z_3 \cdot P_2 - (Z_1 + Z_2 + Z_3) \wedge 0$$

According to the Play Against Random Past strategy, the probability mass function of $\bar Z^*_2$ from the bootstrap sample of $\{Z_1, Z_2\}$ is

$$\bar Z^*_2 = \begin{cases} Z_1, & \text{w.p. } 1/4 \text{ (choose } Z_1 \text{ twice)} \\ (Z_1 + Z_2)/2, & \text{w.p. } 1/2 \text{ (choose } Z_1 \text{ once and } Z_2 \text{ once)} \\ Z_2, & \text{w.p. } 1/4 \text{ (choose } Z_2 \text{ twice)} \end{cases}$$

Suppose $Z_1 + Z_2 + Z_3 < 0$, i.e., the Bayes envelope is $R_3 = Z_1 + Z_2 + Z_3$. Furthermore, assume $Z_1 > 0$. Then

$$3 \cdot D_3(a, b) = \frac12 Z_1 + Z_2 \cdot 0 + Z_3 \cdot P_2 - (Z_1 + Z_2 + Z_3) = -\frac12 Z_1 - Z_2 - Z_3(1 - P_2)$$

Since $Z_1 > 0$, to approach the maximum of the regret over $\{Z_1 > 0,\ Z_1 + Z_2 + Z_3 < 0\}$ we need $Z_3 < 0$ and $Z_2 < 0$, and then

$$P_2 = \begin{cases} 1/4, & Z_1 + Z_2 > 0 \\ 3/4, & Z_1 + Z_2 \le 0 \end{cases}$$

If $P_2 = 1/4$,

$$\sup_{Z_1 > 0,\ Z_1 + Z_2 + Z_3 < 0} 3 \cdot D_3(a, b) = -\frac12 Z_1 - Z_2 - Z_3\Big(1 - \frac14\Big) \approx 1.25$$

approached with $Z_1 = 1$, $Z_2 = -0.9999$ and $Z_3 = -1$, since here $Z_1 > 0$, $Z_2 < 0$ and $Z_1 + Z_2 > 0$. The nearer $Z_2$ is to $-1$ the better, but $Z_2$ cannot be $-1$.
If $P_2 = 3/4$,

$$\sup_{Z_1 > 0,\ Z_1 + Z_2 + Z_3 < 0} 3 \cdot D_3(a, b) = -\frac12 Z_1 - Z_2 - Z_3\Big(1 - \frac34\Big) \approx 1.25$$

approached with $Z_1$ near 0, $Z_2 = -1$ and $Z_3 = -1$, since here $Z_1 > 0$, $Z_2 < 0$ and $Z_1 + Z_2 \le 0$. The nearer $Z_1$ is to 0 the better, but $Z_1$ cannot be 0.

For the case $Z_1 \le 0$,

$$3 \cdot D_3(a, b) = \frac12 Z_1 + Z_2 \cdot 1 + Z_3 \cdot P_2 - (Z_1 + Z_2 + Z_3) = -\frac12 Z_1 - Z_3(1 - P_2)$$

Based on the definition of the probability mass function of $\bar Z^*_2$, with $Z_1 \le 0$ and $Z_2 > 0$ we have

$$P_2 = \begin{cases} 1/4, & Z_1 + Z_2 > 0 \\ 3/4, & Z_1 + Z_2 \le 0 \end{cases}$$

When $P_2 = 1/4$,

$$\sup_{Z_1 \le 0,\ Z_1 + Z_2 + Z_3 < 0} 3 \cdot D_3(a, b) = -\frac12 Z_1 - Z_3\Big(1 - \frac14\Big) \approx 1.25$$

approached with $Z_1 = -0.9999$, $Z_2 = 1$ and $Z_3 = -1$, since here $Z_1 \le 0$, $Z_2 > 0$ and $Z_1 + Z_2 > 0$. The nearer $Z_1$ is to $-1$ the better, but $Z_1$ cannot be $-1$.

When $P_2 = 3/4$,

$$\max_{Z_1 \le 0,\ Z_1 + Z_2 + Z_3 < 0} 3 \cdot D_3(a, b) = \frac12 + \frac14 = \frac34$$

achieved at $Z_1 = -1$, $Z_3 = -1$, and $Z_2$ any value in $[-1, 1]$ with $Z_1 + Z_2 \le 0$.

All the calculations above show that, even for $n = 3$, the supremum of the modified regret of the PARP strategy is not achieved at the boundary of the problem domain $[0,1]^n \times [0,1]^n$, which is equivalent to the domain $[-1,1]^n$ for the Z-problem. Proof is done. □

Therefore the proof of Hannan consistency of the PARP strategy for the two-expert selection problem cannot be carried out through the worst-case idea.

5.3.3 Hannan consistency of PARP for certain classes

We are concerned with the asymptotic convergence of the PARP strategy's regret for different classes of sequences. By the discussion of the Z-problem, the original problem is equivalent to the Z-problem, i.e., we only need to discuss the convergence property of the Z-sequence

$$Z:\ Z_1,\ Z_2,\ Z_3,\ \ldots,\ Z_{N-1},\ Z_N,\ \ldots$$

The easiest case is that one expert is superior to the other, i.e., all $Z_t$ in the sequence $Z$ have the same sign. Let $Z > 0$ without loss of generality. Then, for all $N > 0$,

$$N \cdot D_N(Z, p(\text{parp})) = \sum_{k=1}^{N} Z_k \cdot P(\bar Z^*_{k-1} \le 0) - N\bar Z_N \wedge 0 = \sum_{k=1}^{N} Z_k \cdot P(\bar Z^*_{k-1} \le 0) \approx 0$$

where $\bar Z^*_{k-1}$ is the sample mean of the bootstrap sample in the PARP strategy (for $k \ge 2$ all bootstrap draws are positive, so $P(\bar Z^*_{k-1} \le 0) = 0$). This means that if one expert's prediction is always better than the other's, the regret of the PARP strategy is essentially 0. The more difficult situation is when the two experts compete with each other.

Theorem: For any sequence $Z$ with $Z_i \in [-1, 1]$, if $\sigma_k \ge C_1 > 0$, the regret of the PARP strategy converges to 0 with order $O(1/\sqrt{N})$, i.e.,

$$D_N(Z, \text{parp}) \to 0 \quad \text{with rate } O\Big(\frac{1}{\sqrt{N}}\Big)$$

where $C_1$ is a constant and $\sigma_k^2$ is the variance of the sequence $\{Z_1, \ldots, Z_k\}$.

Proof: By the Berry-Esseen theorem in Loeve (1963, p. 288), the bootstrap probability $P_k = P(\bar Z^*_{k-1} \le 0)$ differs from $\Phi\big(-\bar Z_{k-1}\sqrt{k-1}/\sigma_{k-1}\big)$ by at most $C\,\rho_k/(\sigma_k^3\sqrt{k})$ for a constant $C$, where $\rho_k = E|Z_k - \bar Z_k|^3$, so that

$$N \cdot D_N(Z, p^*) \le \sum_{k=1}^{N} Z_k\,\Phi\Big(-\frac{\bar Z_{k-1}\sqrt{k-1}}{\sigma_{k-1}}\Big) - N\bar Z_N \wedge 0 + \sum_{k=1}^{N} |Z_k|\,\frac{C}{\sqrt{k}} \cdot \frac{\rho_k}{\sigma_k^3}$$

Now we consider the first term.
: J12?— -e——"} > d)(.4k_1) : fir:— ‘2 without loss of generality, then _9 N —2 _LZJ)‘: P t 1 0, then _ __ 77 7V) 7' IVY. NzN-q)(AN)———NZ~-(— if :m M/ 2o (— ’” )- A? 0N Let (me : x/N - 7N. then by Feller (1964, pp 166), 71’ 7V 11 7V ) “‘2' [:0 #3 /1‘(1_/:/3_T‘ u \/§ €_'_4;\_' < w U‘A’ —— €- - (1.7 I .‘1‘ N "N' . . ’ __ / 87 Thus, Part II = W-uw-q>(——'—’)s N-uw-‘I>(——-) 9 - _ 1 _‘_ : W-uw'f V2 --eI'-?_d;r _oo 27? S N-C i.e. Part II S 0(W). For Part 111: 21:12,. - 7);: - 3’3 ”1.» b\ the definition of Z*k and |Zk| E [—1.1]. k 1 S C Then, 12.9%3 c. 0k i.e. Part. III 3 O(\/]\—7). With discussion on Part I, Part II, and Part III, we have D N ( _Z_ . parp) S ()(1 /\/T). D “Z , parp) ——> 0 with rate 1 O ———- . ( r—N) Proof is doneD The group of three figures below shows the simulation of PARP strategy for se- quence 5;, where :r E [—1. 1] and 0;. > 0.1, which is shown in the figure in row3. The figure of row 1 is actual Cesaro loss-Bayes Envelope (in blue) vs expected Cesaro loss - Bayes Envelope (in red) and the function 1 /\/—IV. The figure in row 2 is S” (in blue) vs S (in green). 88 Figure 5.1: Simulation Actual Regret 1 . . . 1/sqrtN Estimated Expected Regret _1 l l L 1 l l l 0 50 100 150 200 250 300 350 4 O Bootstrap Sample Sum 50 I . . 1 Data Sum 0 . ‘ n, .,- I 3'“, 51." H‘ ‘1' T" I" If 1‘ " ‘ ' 1' 1'31" " 1| “In ‘ 'v“"("“1h”1"l"b I'll-“H‘JS —50 ' 50 1 00 1 50 200 250 300 350 400 N=400, sequence Z from —1 to 1 1 . . l . . . . ~. 1 . , l l , ‘ . ' ll I": 0 fill 11)} A ll ) 40 5'0 100 1&0 260 250 360 ago i 400 5.4 Examples of the application of PARP strategy As a statistical decision strategy in expert selection problem, PARP strategy can be applied to many kinds of practical problems. Here we only give an example of the application of this strategy in two—expert system. By Hull (2002) in finance, there is a constant effort to predict future or forward prices of stocks, bonds, options and commodities; the ability to predict future behavior provides important information about the underlying structure of these securities. In interest rate market, many different types of interest rates are regularly quoted. These include mortgage rates, deposit rates, prime borrowing rates, and so on. As a member of interest rate market, the n—year zero rate or spot rate is defined as the rate of interest earned on an investment that starts today and last for 11 years. All the interest and principal is realized at the end of n years. There are no intermediate 89 payments. Forward rates or forward interest rates are the rates of interest implied by current zero rates for periods of time in the future. The graph shows the movements Figure 5.2: Forward Rate History Data Forward Rate Historical Data 1 4 l I I ' I a - ‘3' I ; " a g I a I h .‘i 'I E VI ’ ‘ ~ g lovA l "H! 1 . , ‘ , II . ‘ IE 6- .MI “I 'lfl-‘l .1- Jul/[15" '. ‘ ...... ‘ ' £4341 ;- . ’ 94131 ' .h‘ I ”‘1‘ I‘d». ,. Ill" 4'- \"|:'hl‘ . _ _ ”fill . t‘.. 2 .. .... . . AI." ... c l I I I I 0 1 000 2000 3000 4000 5000 6000 day of various forward rates of US market data from 1983 to 2003. It includes forward rates for 3 months, 6 months, 1 year, 2 years, 3 years, 4 years, 5 years, 7 years, 10 years and 30 years. We are going to use forward rate for 3 month as an example to show how PARP strategy is applied on it. In our two expert system, Expert I is ARMA model and Expert 11 is just the average rate for the recent last 3 days. Blue line represent the true data; red line is Expert I, i.e. ARMA model; Green line is Expert II which is average rate of past 3 days. 
We take 1 cycle = 120 work days and list the two experts' errors and the PARP strategy's error in Table 5.1.

Table 5.1: Forward Rate prediction: ARMA vs Simple Average

    Number of Cycles   Ave Loss Expert I   Ave Loss Expert II   Ave Loss of PARP
    cycle 1            0.0759              0.1083               0.0776
    cycle 2            0.0711              0.0820               0.0699
    cycle 3            0.0698              0.0830               0.0749
    cycle 4            0.0499              0.0719               0.0522
    cycle 5            0.0434              0.0591               0.0440

Figure 5.3: Forward Rate prediction: True data vs ARMA vs Simple Average

From the table it is easy to observe that the PARP strategy automatically chooses the better model, i.e., Expert I, which gives the more precise predictions. In this example, since Expert I (the ARMA model) is superior to Expert II most of the time, it is reasonable that the PARP strategy converges to Expert I's decisions, and Figure 5.4 also shows that the average loss of the PARP strategy converges to the Bayes envelope, which agrees with the theoretical proof of the previous section, with the sample standard deviation $\sigma_k > 0.01$ for all $k = 1, 2, \ldots, 120$ in this example.

Figure 5.4: Forward Rate: Bayes Envelope vs PARP Average Loss for cycle 5

The situation is more complicated if the two experts' forecasts are quite close. For example, let Expert I still be the ARMA model, but let Expert II be a GARCH(1,1) model, with their predictions shown in Figure 5.5. We again take 1 cycle = 120 work days; the comparison between the experts' and the PARP strategy's errors is shown in Table 5.2.

Table 5.2: Forward Rate prediction: ARMA vs GARCH(1,1)

    Number of Cycles   Ave Loss Expert I   Ave Loss Expert II   Ave Loss of PARP
    cycle 1            0.0707              0.0719               0.0706
    cycle 2            0.0500              0.0503               0.0500
    cycle 3            0.0697              0.0714               0.0689
    cycle 4            0.0635              0.0640               0.0635
    cycle 5            0.0673              0.0676               0.0666

Figure 5.5: Forward Rate: True data vs ARMA vs GARCH(1,1)

Figure 5.6: Forward Rate: Bayes Envelope vs PARP Average Loss for cycle 2

The simulation shows that the PARP strategy works well. Figure 5.6 also shows that the average loss of the PARP strategy converges to the Bayes envelope, which agrees with the theoretical proof of the previous section, with $\sigma_k > 0.001$ for all $k = 1, 2, \ldots, 120$ in this example.
We are looking forward to understanding and discovering more properties about the distribution of Z; in the future. One possible approach may be creating some bins on the domain of Zk, i.e., make [—1, 1] in to several categories. in order to make the domain of Z. a discrete set. Another one may be considering the change of P(Z: S 0) from stage 2' : I: to stage i : k + 1. These ideas will be worked on and discussed in the future. There are still a lot of open problems in this field as well. For example, since sometimes more recent past moves are more important to the decision, time-weighted PARP strategy can be constructed and its Hannan consistency can. be studied in the future. Also, in non-symmetric repeated game, construction of PAP and PARP strategies and their asymptotic. properties are very interesting and can investigate in the future as well. (O O] Bibliography [ll [‘21 [101 [111 N. Cesa—Bianchi, G. Lugosi, On prediction of individual sequences, The Annals of Statistics, Vol. 27, No.6, pp. 1865—1895, 1999. L. Chen, Application of play against past strategies in repititions of a game. PhD Dissertation, Department of Statistics and Probability, Michigan State Uni- iier'isty, 1997. T. H. Chung, M inimax learning in iterated games via distributional ma jorization, Ph. D dissertation, Stanford University, 1994. T. M. Cover, Behavior of sequential predictors of binary sequences. Transactions of the Fourth Prague Conference on Information Theory, Statistical Decision Functions, Random Processes, Publishing House of the Czechoslovak Academy of Sciences, pp. 263—272, 1967. M. Feder. N. Merhav and M. Gutman, Universal prediction of individual se- quences, IEEE Trans. Inform. Theory 38 (1992) (4), pp. 1258-1270. W. Feller, An introduction to probability theory and its applications. Vol. 1, Second Edition: Wiley and Sons, NY, 1964 D. P. Foster, R. V. Vohra, A randomization Rule for selection forecasters Oper- ations Reserach, 41, No. 4, pp. 704—709. J. S. Frame, Dennis C. Gilliland, Votes and a half-binomial, Discrete Applied Mathematics, 12, pp. 31—40, 1985. E. Gardner, E. Anderson-Flether, A. Wicks, Further results on focus forecasting vs. exponential smoothing, International Journal of Forecating, 17, pp. 287—293, 2001. D. Gilliland, I. Jung, Play against the random past for matching binary bits, J. Statistical Theory and Applications, 5, No.3, pp. 282—291, 2006. D. Gilliland and J. Harman, Play against past. strategies in repeated play of (T, S, -) game, 1999, unpublished. 96 (121 [13] [Ml 118l 1191 [‘20] D. Gilliland and J. Harman, Play against past strategies in repeated play of a S-game and its dual, Michigan State University. Department of Statistics and Probability, RM—603, 2008. D. Gilliland. Bootstrap fictitious play in repeated games, Proc. Joint Stattistical Meetings, Toronto, Aug, 8—12, 2004. D. Gilliland, Asymptotic risk stability resulting from play against the past in a. sequence of decision problems, IEEE Transcations on information theory. VollT- 18, No.5, pp. 614—617, September 1972. D. Gilliland. J. Hannan, On an extended compound decision problem, The An.- nals of Mathematical Statistics. Vol 40, No.50, pp. 1536—1541, 1969. C. M. Grinstead. J. L. Snell, Introduction to Probalnlity, 2008. uncut.dartmouth.edu/ chance /teachingaids /boohs.articles J. F. Harman, Approximation to Bayes risk in repeated play Contributions to the Theory of Games, Princeton University Press. 1957. M. D. Hirschhorn. Comments on the paper “Wallis sequence...” by Lampret. Aust. Math. Soc. 
Gazette 32.194, 2005. J. Hull, Options. Futures &: other derivatives, fifth edition, 2002. M. V. Johns Jr., Two-action compound decision problems. Proc. Fifth Berkley Symp. Math. Statist. Prob. 1 463-478, University of California Press, 1967. N. Littlestone, Learing quickly when irrelavent attributes abound: a new linear- threshold algorithm, Machine Learning. 2:285-318, 1988. N. Littlestone and M. K. Warmuth, The weighted majority algaritlnn. Sympo- sium on Foudations of Computer Science, 30: 256—261, November 1989. M. Loeve, Probability Theory, Third Edition. D. Van Nostrand Company. Inc, NJ, 1963. J. Robinson. An iterative. n’iethod of solving a game. Annals of lilathematics. 54. pp. 296—301, 1951. B. Smith, Focus corecasting computer tecl'iniques for inventory control: revised for the twenty-first centrury, 1997. D. Swain, Bounds and rates of convergence for the extended compound estimate problem in the sequence case. Tech. Report No. 81, Department of Statistics, Standford Unviersity, 1965. N. N. Vorob’ev, Game theory lectures for ecnomitsts and systems scientists, . , . . Sprmger—Verlag. llillryyyll(yr