OPTIMAL CONTROL OF DYNAMICAL SYSTEMS WITH JUMP MARKOV PERTURBATIONS

This is to certify that the dissertation entitled "Optimal Control of Dynamical Systems with Jump Markov Perturbations," presented by Alexey G. Stepanov, has been accepted towards fulfillment of the requirements for the Ph.D. degree in Statistics. Professor Anatoli V. Skorokhod (Major Professor), Professor Habib Salehi (Major Professor). Michigan State University.

OPTIMAL CONTROL OF DYNAMICAL SYSTEMS WITH JUMP MARKOV PERTURBATIONS

By Alexey G. Stepanov

A DISSERTATION

Submitted to Michigan State University in partial fulfillment of the requirements for the degree of

DOCTOR OF PHILOSOPHY

Department of Statistics & Probability

2003

ABSTRACT

OPTIMAL CONTROL OF DYNAMICAL SYSTEMS WITH JUMP MARKOV PERTURBATIONS

By Alexey G. Stepanov

The control problem of a dynamical system with jump Markov perturbations is considered. The partial differential equation of dynamic programming for the value function $V(t,x,y)$ is derived. Also, an integral equation for the value function $V(t,x,y)$ is obtained and is used to construct a sequence of functions that converge uniformly to the value function $V(t,x,y)$ from above.

TABLE OF CONTENTS

INTRODUCTION
CHAPTER 1. PRELIMINARIES
CHAPTER 2. DYNAMIC PROGRAMMING
CHAPTER 3. REDUCTION TO A FAMILY OF DETERMINISTIC CONTROL PROBLEMS
CHAPTER 4. FAMILY OF CONTROL PROBLEMS WITH "STOPPED" y(s)
CONCLUSION
BIBLIOGRAPHY

INTRODUCTION

Optimal control theory has seen tremendous growth over the past four decades, with important applications in finance, networks, manufacturing, medicine, operations research, and other areas of science and engineering. Optimal control theory is crucial to the design and operation of complicated modern systems, since it ensures that vital variables are kept in check regardless of the disturbances the system undergoes. In a variety of naturally occurring problems, we wish to control a system governed by a set of differential equations in order to minimize (or maximize) a given performance criterion.

In this dissertation, we discuss an optimal control problem for a system with a jump Markov disturbance. The evolution of the system is described by a system of differential equations, and there is a running (instantaneous) cost and a terminal cost associated with the process. A control is chosen in order to minimize the total cost of the process. However, due to the random jump Markov disturbance in the system, the position of the system at any given time is random, and so is the total cost associated with the process. Thus, the control is chosen in order to minimize the expected value of the total cost of the process; the control itself is also random, depending on the present state of the process, and possibly also on its past states.
The two main approaches used in control problems are the dynamic programming approach and Pontryagin's maximum principle. In the dynamic programming approach, the minimum (infimum) value of the performance criterion is considered as a function of the initial (starting) point. This function is called the value function, and in many ways it holds the key to solving the optimal control problem. R. Bellman [1] applied dynamic programming to the optimal control of discrete-time systems, demonstrating that the natural direction for solving optimal control problems is backwards in time. That is, we start solving an optimal control problem by first finding the optimal control policy on the very last step. Then, armed with that information, we search for the optimal control on the second-to-last step, and so on, each time using the fact that we already have the optimal control policy for the steps that follow the one currently being considered. The dynamic programming approach decomposes the optimal control problem into a sequence of minimization problems that are carried out over the space of controls and are far simpler than the original problem.

To illustrate the idea of dynamic programming, suppose the state of a system at time $k$ is described by a process $x_k \in R^n$ satisfying
$$x_{k+1} = f(x_k, u_k), \quad 0 \le k \le N-1,$$
where $u_k$ is the control at time $k$, whose value is chosen from the set of admissible controls $U_k \subset R^m$. Suppose that the performance criterion is given by $J_0(x, u_\cdot)$, where
$$J_i(x, u_\cdot) = \sum_{k=i}^{N} g(x_k, u_k), \quad 0 \le i \le N,\ x \in R^n,$$
is the performance of the process between time $i$ and time $N$, initiating from $x = x_i$ at time $i$, when the control policy $u_\cdot = (u_0, u_1, \dots, u_N)$ is used. Let
$$V_k(x) = \inf_{u_\cdot} \{J_k(x, u_\cdot)\}, \quad 0 \le k \le N,\ x \in R^n.$$
Then
$$V_N(x) = \inf_{u_N \in U_N} \{g(x, u_N)\}, \quad x \in R^n,$$
and
$$V_k(x) = \inf_{u_k \in U_k} \{g(x, u_k) + V_{k+1}(f(x, u_k))\}, \quad 0 \le k \le N-1,\ x \in R^n.$$
The problem of minimizing the performance criterion $J_0(x, u_\cdot)$ over the choice of the entire control policy $u_\cdot = (u_0, u_1, \dots, u_N)$ splits up into a sequence of smaller and simpler problems of finding the optimal choice of each $u_k$ separately, working backwards in time. Furthermore, the values of $u_k$ where the infimum is achieved on each step, $u_k^*$, $0 \le k \le N$, form the optimal (overall) control policy $u_\cdot^* = (u_0^*, u_1^*, \dots, u_N^*)$.
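To make the backward recursion concrete, here is a minimal numerical sketch. The finite state grid, the dynamics $f$, the stage cost $g$, and the control set below are illustrative stand-ins chosen for this example, not objects from the dissertation.

```python
# A minimal sketch of the backward recursion above, on a finite state set S
# and a finite control set U.  The dynamics f and stage cost g are
# illustrative placeholders, not the dissertation's model.

N = 10
S = range(-20, 21)                       # finite state grid
U = [-1, 0, 1]                           # admissible controls at every stage

def f(x, u):                             # transition x_{k+1} = f(x_k, u_k)
    return max(-20, min(20, x + u))

def g(x, u):                             # stage cost g(x_k, u_k)
    return x * x + u * u

V = {x: min(g(x, u) for u in U) for x in S}   # V_N(x) = inf_{u_N} g(x, u_N)
policy = [None] * N

for k in reversed(range(N)):             # stages N-1, ..., 1, 0
    Vk, pk = {}, {}
    for x in S:
        # Bellman step: V_k(x) = inf_u { g(x,u) + V_{k+1}(f(x,u)) }
        cost, best = min((g(x, u) + V[f(x, u)], u) for u in U)
        Vk[x], pk[x] = cost, best
    V, policy[k] = Vk, pk

print(V[5], policy[0][5])                # optimal cost and first action from x = 5
```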
For continuous-time systems, the dynamic programming approach uses the same idea of working backwards in time, and uses the value function as a tool in the analysis of the optimal control problem. At time $s$, we choose the control $u(s)$ for our process $x(s)$, based on the assumption that the optimal control will be used after time $s$. Here, whenever the value function is differentiable, it satisfies a first-order partial differential equation called the partial differential equation of dynamic programming.

Suppose that the state of a system at time $s$ is described by a process $x(s) \in R^n$ satisfying the system of differential equations
$$\frac{dx(s)}{ds} = a(s, x(s), u(s)), \quad 0 \le t \le s \le T,$$
with initial condition $x(t) = x \in R^n$, where $u(s)$ is the control process whose value at time $s$ can be chosen from the set of admissible controls $U(s,x) \subset R^m$. Suppose that the performance criterion is given by
$$J(t,x,u(\cdot)) = \int_t^T \Phi(s, x(s), u(s))\,ds + \Psi(x(T)),$$
where $\Phi$ is the running cost and $\Psi$ is the terminal cost, and let
$$V(t,x) = \inf_{u(\cdot)} J(t,x,u(\cdot))$$
be the value function. Wherever $V$ is differentiable, it satisfies the dynamic programming equation
$$V_t(t,x) + \inf_{v \in U(t,x)} H(t, x, V_x(t,x), v) = 0, \qquad V(T,x) = \Psi(x),$$
where $H(t,x,p,v) = \Phi(t,x,v) + p \cdot a(t,x,v)$. $H(t,x,p,v)$ is generally called the Hamiltonian, in analogy with a corresponding quantity occurring in classical mechanics. Solving the (Hamilton-Jacobi-)Bellman equation gives us the value function. Moreover, the dynamic programming equation can be used to find the optimal control policy. Similarly to the discrete-time case, the values of $v \in U(t,x)$ where the infimum is achieved are the values of the optimal control policy $u^*(t)$.

If the value function fails to be differentiable at some points $(t,x)$, it does not satisfy the dynamic programming equation everywhere. Thus, we need to consider a generalized solution to the dynamic programming equation if we wish to use the dynamic programming approach. However, we may encounter a serious lack of uniqueness when dealing with generalized solutions. Crandall and Lions [2] introduced the concept of the viscosity solution. For a large class of optimal control problems, the value function is the unique viscosity solution of the related dynamic programming equation in the case when the value function is not smooth enough to be a classical solution. For more on viscosity solutions, see Fleming and Soner [8].

The optimal control policy may not exist in some optimal control problems. Then, for some points $(t,x)$, the infimum will not be achieved in the dynamic programming equation. In this case, the dynamic programming approach can be used to obtain an $\varepsilon$-optimal (almost optimal) control policy.

In general, the dynamic programming equation is hard to solve. In some cases one has to resort to numerical solution of the dynamic programming equations. Typically, the state space and the control space are discretized, and the minimization is carried out for the finite number of states. However, the computational difficulties may be too restrictive for complex multidimensional problems.

Pontryagin's maximum principle, developed by L. S. Pontryagin [15], gives a necessary condition that must hold on an optimal trajectory. Simply stated, if $u^*(s)$ and $x^*(s)$ represent the optimal control and the state trajectory, then there exists an $R^n$-valued function $P(s)$, called an adjoint variable, such that together $u^*(s)$, $x^*(s)$ and $P(s)$ satisfy
$$\frac{dx^*(s)}{ds} = H_p(s, x^*(s), P(s), u^*(s)), \qquad \frac{dP(s)}{ds} = -H_x(s, x^*(s), P(s), u^*(s)),$$
and for all $s$, $0 \le s \le T$, the optimal control $u^*(s)$ is the value of $v$ maximizing $H(s, x^*(s), P(s), v)$, i.e., for all $v \in U(s,x)$,
$$H(s, x^*(s), P(s), v) \le H(s, x^*(s), P(s), u^*(s)).$$
If $V$ is differentiable at each point $(s, x^*(s))$ of the optimal trajectory, then a candidate for an adjoint variable is $P(s) = V_x(s, x^*(s))$. Under certain conditions, Pontryagin's maximum principle can also be a sufficient condition for optimality.

Let $y(s)$ be a jump Markov process, and suppose that the state of a system at time $s$ is described by a stochastic process $x(s) \in R^n$ satisfying the system of differential equations

(1) $\quad \dfrac{dx(s)}{ds} = a(s, x(s), y(s), u(s)), \quad 0 \le t \le s \le T,$

with initial condition

(2) $\quad x(t) = x \in R^n.$

In (1), $u(s)$ is a parameter whose value we can choose at any instant in a set $U \subset R^m$ in order to control the process $x(s)$. Thus the control $u(s) = u(s,\omega)$ is a stochastic process. Since our decision at time $s$ must be based upon what has happened up to time $s$, the function $\omega \to u(s,\omega)$ must be measurable with respect to $\mathcal{F}_s = \sigma\{(x(r), y(r)),\ r \le s\}$. The Markovian nature of the problem suggests that it might suffice to consider control processes of the form $u(s) = \gamma(s, x(s), y(s))$.

Suppose that the performance criterion is given by

(3) $\quad J(t,x,y,u(\cdot)) = E_{t,x,y}\Big\{ \displaystyle\int_t^T \Phi(s, x(s), y(s), u(s))\,ds + \Psi(x(T), y(T)) \Big\},$

where $E_{t,x,y}$ denotes the expectation given that the process starts at time $t$ from the state $(x,y)$. The value function of the problem is
$$V(t,x,y) = \inf_{u(\cdot)} J(t,x,y,u(\cdot)).$$
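Before turning to the precise assumptions, a short simulation sketch may help fix the structure of the model (1)-(3). Everything in it is an illustrative assumption, not the dissertation's setting: a two-state $y(s)$ with constant jump intensities, scalar $x$, a particular drift $a$, a particular feedback rule $\gamma$, and simple costs.

```python
import random

# A simulation sketch of the controlled system (1)-(3).  Assumptions (all
# illustrative, none from the dissertation): Y = {0, 1} with constant jump
# intensities, scalar x, drift a(s,x,y,u), Markov feedback gamma, and simple
# running and terminal costs.

T, dt = 1.0, 1e-3
lam = {0: 2.0, 1: 3.0}                    # jump intensity of y(s) in each state

def a(s, x, y, u):                        # dynamics in (1)
    return (1.0 if y == 0 else -1.0) * x + u

def gamma(s, x, y):                       # a control of the form u(s) = gamma(s, x(s), y(s))
    return -x                             # illustrative choice, certainly not optimal

def Phi(s, x, y, u):                      # running cost
    return x * x + u * u

def Psi(x, y):                            # terminal cost
    return abs(x)

def one_run(x0, y0):
    s, x, y, cost = 0.0, x0, y0, 0.0
    next_jump = random.expovariate(lam[y])
    while s < T:
        while next_jump < min(s + dt, T): # process jumps of y(s) inside this step
            y = 1 - y
            next_jump += random.expovariate(lam[y])
        u = gamma(s, x, y)
        cost += Phi(s, x, y, u) * dt      # accumulate the running cost
        x += a(s, x, y, u) * dt           # Euler step between jumps
        s += dt
    return cost + Psi(x, y)

# Monte Carlo estimate of the performance criterion J(0, x, y, u(.)) in (3)
runs = [one_run(0.5, 0) for _ in range(2000)]
print(sum(runs) / len(runs))
```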
In this dissertation, we derive the partial differential equation of dynamic programming for the value function $V(t,x,y)$. We also obtain an integral equation for the value function and use it to construct a sequence of functions that converge uniformly to $V(t,x,y)$ from above. In fact, each function in the sequence is a performance function of a control that is selected from a class of controls $\mathcal{M}$ that is more general than $\mathcal{U}$. We will also show that if there is an optimal control in the class $\mathcal{U}$, then that control is also optimal in the class $\mathcal{M}$. Therefore, while constructing the sequence of functions that converge uniformly to the value function $V(t,x,y)$ from above, we also construct a sequence of controls that are $\varepsilon$-optimal in the class $\mathcal{M}$.

CHAPTER 1

PRELIMINARIES

Let $(\Omega, \mathcal{F}, P)$ be a probability space. Let $Y$ be a complete separable metric space, and let $\mathcal{B}_Y$ denote the Borel $\sigma$-algebra on $Y$. Let $y(s)$ be a jump Markov process with values in $Y$, defined on $[0,T]$, with transition probabilities $P(t,y,s,A)$, $0 \le t \le s$, $y \in Y$, $A \in \mathcal{B}_Y$. Suppose that for all $t \in [0,T)$, $y \in Y$, $A \in \mathcal{B}_Y$,
$$\frac{1}{s-t}\big[ P(t,y,s,A) - 1_A(y) \big] \to \Pi(t,y,A)$$
uniformly in $(t,y,A)$ as $s \to t$, $s > t$. Suppose that for fixed $(y,A)$ ($y \in Y$, $A \in \mathcal{B}_Y$), $\Pi(t,y,A)$ is continuous in $t$ uniformly in $(y,A)$. Let
$$\lambda(t,y) = \Pi(t,y, Y \setminus \{y\}) = -\Pi(t,y,\{y\}),$$
and suppose that for some $K$, $\lambda(t,y) \le K$ for all $t \in [0,T)$, $y \in Y$.

Suppose that the values of the controls are restricted to a closed subset $U$ of $R^m$. Let $Q$ denote $[0,T) \times R^n \times Y$ and $\bar{Q}$ denote $[0,T] \times R^n \times Y$. Let the vector function $a(t,x,y,u): \bar{Q} \times U \to R^n$ and the running (instantaneous) cost $\Phi(t,x,y,u): \bar{Q} \times U \to R$ be continuous in $(t,x,u)$ uniformly in $y$ and differentiable in $(t,x,u)$ for each $y \in Y$. Suppose also that $a_t(t,x,y,u)$, $a_x(t,x,y,u)$, $a_u(t,x,y,u)$, $\Phi_t(t,x,y,u)$, $\Phi_x(t,x,y,u)$, $\Phi_u(t,x,y,u)$ are all continuous in $(t,x,u)$ uniformly in $y$, and $a_u(t,x,y,u)$ and $\Phi_u(t,x,y,u)$ are continuously differentiable in $(t,x,u)$ for each $y \in Y$. In addition, suppose that for all $(s,x,y,u) \in \bar{Q} \times U$:

a) for suitable $B_1$, $|a(s,x,y,u)| \le B_1(1 + |x|)$;

b) for suitable $B_2(R)$, $\|a_x(s,x,y,u)\| \le B_2(R)$ whenever $|x| \le R$;

c) there exists $u_0 \in U$ such that for suitable $C_1$, $\Phi(s,x,y,u) \ge -C_1$ and $\Phi(s,x,y,u_0) \le C_1$;

d) for suitable $C_2(R)$, $|\Phi_x(s,x,y,u)| \le C_2(R)$ whenever $|x| \le R$.

Let the terminal cost $\Psi(x,y): R^n \times Y \to R$ be continuous in $x$ uniformly in $y$ and differentiable in $x$ for each $y \in Y$, with $\Psi_x(x,y)$ continuous in $x$ uniformly in $y$. Suppose also that for all $(x,y) \in R^n \times Y$:

e) for suitable $D_1$, $|\Psi(x,y)| \le D_1$;

f) for suitable $D_2(R)$, $|\Psi_x(x,y)| \le D_2(R)$ whenever $|x| \le R$.

Let $\tau_t^0(\omega) = t$, and for $k = 1, 2, 3, \dots$, let
$$\tau_t^k(\omega) = \inf\{ s > \tau_t^{k-1}(\omega) : y(s,\omega) \ne y(\tau_t^{k-1}(\omega), \omega) \} \wedge T,$$
i.e., $\tau_t^k(\omega)$ is the time of the $k$-th jump of the process $y(s,\omega)$ after time $t$.

With this notation, (1) becomes
$$\frac{dx(s)}{ds} = a(s, x(s), y(\tau_t^k), u(s)), \quad \tau_t^k \le s < \tau_t^{k+1}.$$
We consider the class of controls
$$\mathcal{U} = \{u(\cdot) : u(s) = u(s; \tau_t^k, x(\tau_t^k), y(\tau_t^k)),\ \tau_t^k \le s < \tau_t^{k+1},\ u(\cdot;t,x,y) \in PC([t,T);U),\ k = 0, 1, 2, \dots\},$$
where $PC([t,T);U)$ denotes the set of all piecewise continuous functions $u(s): [t,T) \to U$ continuous from the right; after each jump of $y(s)$, the intrajump control function restarts from the current state.

For $t \le s \le T$, let
$$\Lambda(t,s,y) = P(\tau_t^1 > s \mid y(t) = y) = \exp\Big\{ -\int_t^s \lambda(r,y)\,dr \Big\}.$$
Then
$$P(\tau_t^1 \le s,\ y(\tau_t^1) \in A \mid y(t) = y) = \int_t^s \Lambda(t,r,y)\, \Pi(r,y,A \setminus \{y\})\,dr.$$
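Since $\lambda(t,y) \le K$, the first jump time $\tau_t^1$, whose survival function is $\Lambda(t,s,y)$, can be sampled exactly by thinning a rate-$K$ Poisson clock. The sketch below illustrates this; the particular intensity used is an assumed example, not one appearing in the text.

```python
import math, random

# A sketch of sampling the first jump time tau_t^1 by thinning, valid because
# lambda(t, y) <= K.  The intensity below is an assumed example (with K = 3),
# not one appearing in the text.

K = 3.0

def lam(t, y):                            # time-dependent intensity, lam <= K
    return 1.0 + 2.0 * math.sin(math.pi * t) ** 2

def first_jump(t, y, T):
    """Sample tau_t^1 (returns T if no jump before T).

    Candidates come from a rate-K Poisson clock and are accepted with
    probability lam(s, y) / K, so the accepted time has survival function
    P(tau_t^1 > s | y(t) = y) = Lambda(t, s, y) = exp(-int_t^s lam(r, y) dr).
    """
    s = t
    while True:
        s += random.expovariate(K)
        if s >= T:
            return T
        if random.random() < lam(s, y) / K:
            return s

# Empirical check of Lambda(0, 1/2, y): here int_0^{1/2} lam(r, y) dr = 1
samples = [first_jump(0.0, 0, 1.0) for _ in range(20000)]
emp = sum(tau > 0.5 for tau in samples) / len(samples)
print(emp, math.exp(-1.0))
```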
Let $D_A$ be the set of all functions $f(t,x,y): \bar{Q} \to R$ that are bounded and continuous in $(t,x)$ uniformly in $y$, and such that $f_x(t,x,y)$ exists for all $(t,x,y) \in \bar{Q}$, and $f_x(t,x,y)$ is also bounded and continuous in $(t,x)$ uniformly in $y$. For $v \in U$, define

(7) $\quad A^v f(t,x,y) = f_x(t,x,y) \cdot a(t,x,y,v) - G_t f(t,x,y)$

for functions $f(t,x,y) \in D_A$, where
$$G_t \phi(y) = -\int_Y [\phi(y') - \phi(y)]\, \Pi(t,y,dy').$$

Lemma 1.
a) For every $(t,x,y) \in \bar{Q}$, $|V(t,x,y)| \le C_1(T-t) + D_1$.
b) For every $t \in [0,T]$, $y \in Y$, and every $x, x' \in R^n$,
$$|V(t,x,y) - V(t,x',y)| \le M_R\, |x - x'|$$
whenever $|x| \le R$, $|x'| \le R$, where
$$M_R = \Big( \frac{C_2(R_1)}{B_2(R_1)} + D_2(R_1) \Big) e^{B_2(R_1)(T-t)} - \frac{C_2(R_1)}{B_2(R_1)}, \qquad R_1 = (1+R)\,e^{B_1(T-t)}.$$

Proof: a) Since $\Phi(s,x,y,u) \ge -C_1$ and $|\Psi(x,y)| \le D_1$, we have $J(t,x,y,u(\cdot)) \ge -[C_1(T-t) + D_1]$ for every control $u(\cdot)$, while for the control identically equal to $u_0$, condition (c) gives $J \le C_1(T-t) + D_1$. Hence $|V(t,x,y)| \le C_1(T-t) + D_1$.

b) For $\varepsilon > 0$, there exists a control $u_\varepsilon(\cdot) \in \mathcal{U}$ such that $J(t,x,y,u_\varepsilon(\cdot)) \le V(t,x,y) + \varepsilon$ for all $(t,x,y) \in Q$. Let $|x| \le R$, $|x'| \le R$, and let $x(s)$ be the solution of the system of differential equations
$$\frac{dx(s)}{ds} = a(s, x(s), y(s), u_\varepsilon(s)), \quad t \le s \le T, \qquad x(t) = x \in R^n.$$
Let $u'(\cdot)$ be such that $u'(r; s, x, y) = u_\varepsilon(r; s, x(s), y)$ for all $(s,x,y) \in Q$; that is, along the primed trajectory, $u'(\cdot)$ uses the intrajump functions that $u_\varepsilon(\cdot)$ generates along the unprimed one. Let $x'(s)$ be the solution of the system of differential equations
$$\frac{dx'(s)}{ds} = a(s, x'(s), y(s), u'(s)), \quad t \le s \le T, \qquad x'(t) = x' \in R^n.$$
Then
$$|x(s) - x'(s)| \le |x - x'| + \int_t^s B_2(R_1)\,|x(r) - x'(r)|\,dr, \quad t \le s \le T,$$
and therefore, by Gronwall's inequality,
$$|x(s) - x'(s)| \le |x - x'|\, e^{B_2(R_1)(s-t)}, \quad t \le s \le T.$$
Hence
$$|J(t,x,y,u_\varepsilon(\cdot)) - J(t,x',y,u'(\cdot))| \le \int_t^T C_2(R_1)\,|x(s) - x'(s)|\,ds + D_2(R_1)\,|x(T) - x'(T)| \le M_R\,|x - x'|,$$
so that
$$V(t,x',y) \le J(t,x',y,u'(\cdot)) \le J(t,x,y,u_\varepsilon(\cdot)) + M_R|x-x'| \le V(t,x,y) + \varepsilon + M_R|x-x'|.$$
Interchanging the roles of $x$ and $x'$ and letting $\varepsilon \to 0$ yields the assertion.

CHAPTER 2

DYNAMIC PROGRAMMING

Lemma 2. For every $(t,x,y) \in Q$ and every $s$, $t \le s \le T$,
$$V(t,x,y) = \inf_{u(\cdot) \in \mathcal{U}} E_{t,x,y}\Big\{ \int_t^{s \wedge \tau_t^1} \Phi(r, x(r), y(r), u(r))\,dr + V(s \wedge \tau_t^1,\, x(s \wedge \tau_t^1),\, y(s \wedge \tau_t^1)) \Big\}.$$

Proof: For $\varepsilon > 0$, there exists a control $u_\varepsilon(\cdot) \in \mathcal{U}$ such that for every $(t',x',y') \in (t,T) \times R^n \times Y$,
$$J(t',x',y',u_\varepsilon(\cdot)) \le V(t',x',y') + \varepsilon.$$
Given $u(\cdot) \in \mathcal{U}$ and $t \le s \le T$, define
$$\tilde{u}_{\varepsilon,s}(r; t',x',y') = u(r; t',x',y') \cdot 1_{\{r < s \wedge \tau_t^1\}} + u_\varepsilon(r; t',x',y') \cdot (1 - 1_{\{r < s \wedge \tau_t^1\}}),$$
i.e., $\tilde{u}_{\varepsilon,s}(\cdot)$ follows $u(\cdot)$ up to time $s \wedge \tau_t^1$ and switches to the almost optimal control afterwards. Then
$$V(t,x,y) \le J(t,x,y,\tilde{u}_{\varepsilon,s}(\cdot)) \le E_{t,x,y}\Big\{ \int_t^{s \wedge \tau_t^1} \Phi(r,x(r),y(r),u(r))\,dr + V(s \wedge \tau_t^1, x(s \wedge \tau_t^1), y(s \wedge \tau_t^1)) \Big\} + \varepsilon.$$
Since $\varepsilon > 0$ is arbitrary, we get
$$V(t,x,y) \le E_{t,x,y}\Big\{ \int_t^{s \wedge \tau_t^1} \Phi(r,x(r),y(r),u(r))\,dr + V(s \wedge \tau_t^1, x(s \wedge \tau_t^1), y(s \wedge \tau_t^1)) \Big\}$$
for every $u(\cdot) \in \mathcal{U}$. On the other hand, for every $\varepsilon > 0$ there exists $u_\varepsilon(\cdot) \in \mathcal{U}$ for which
$$\varepsilon + V(t,x,y) \ge J[t,x,y,u_\varepsilon(\cdot)] = E_{t,x,y}\Big\{ \int_t^{s \wedge \tau_t^1} \Phi(r,x(r),y(r),u_\varepsilon(r))\,dr + J(s \wedge \tau_t^1, x(s \wedge \tau_t^1), y(s \wedge \tau_t^1), u_\varepsilon(\cdot)) \Big\}$$
$$\ge E_{t,x,y}\Big\{ \int_t^{s \wedge \tau_t^1} \Phi(r,x(r),y(r),u_\varepsilon(r))\,dr + V(s \wedge \tau_t^1, x(s \wedge \tau_t^1), y(s \wedge \tau_t^1)) \Big\},$$
which gives the reverse inequality.

Remark. Only $u(\cdot; t,x,y)$ is used on $[t, s \wedge \tau_t^1)$; no transition to the next intrajump control occurs. For $u(\cdot;t,x,y) \in PC([t,T);U)$, denote by $x_{t,y}^{u(\cdot)}(s,x)$ the solution of
$$\frac{dx(s)}{ds} = a(s, x(s), y, u(s;t,x,y)), \qquad x(t) = x,$$
i.e., the trajectory of the system before the first jump, with $y$ held fixed. Then the assertion of Lemma 2 can be restated in the following way:
$$V(t,x,y) = \inf_{u(\cdot;t,x,y) \in PC([t,T);U)} \Big\{ \int_t^s \Lambda(t,r,y)\,\Phi(r, x_{t,y}^{u(\cdot)}(r,x), y, u(r;t,x,y))\,dr + \Lambda(t,s,y)\, V(s, x_{t,y}^{u(\cdot)}(s,x), y)$$
$$+ \int_t^s \Lambda(t,r,y) \int_{Y \setminus \{y\}} V(r, x_{t,y}^{u(\cdot)}(r,x), y')\,\Pi(r,y,dy')\,dr \Big\}.$$
For any $u(\cdot;t,x,y) \in PC([t,T);U)$, as $s \to t$,
$$\frac{1}{s-t}\int_t^s \Lambda(t,r,y)\,\Phi(r, x_{t,y}^{u(\cdot)}(r,x), y, u(r;t,x,y))\,dr \to \Phi(t,x,y,u(t;t,x,y)),$$
$$\frac{1}{s-t}\Big[\Lambda(t,s,y)\,V(s, x_{t,y}^{u(\cdot)}(s,x), y) - V(t,x,y)\Big] \to V_t(t,x,y) + V_x(t,x,y) \cdot a(t,x,y,u(t;t,x,y)) - \lambda(t,y)V(t,x,y),$$
$$\frac{1}{s-t}\int_t^s \Lambda(t,r,y) \int_{Y \setminus \{y\}} V(r, x_{t,y}^{u(\cdot)}(r,x), y')\,\Pi(r,y,dy')\,dr \to -G_t V(t,x,y) + \lambda(t,y)V(t,x,y).$$
Combining these limits (the $\lambda V$ terms cancel), we conclude that $V_t(t,x,y)$ exists and
$$V_t(t,x,y) + \inf_{v \in U}\big\{\Phi(t,x,y,v) + A^v V(t,x,y)\big\} = 0,$$
the partial differential equation of dynamic programming. The boundary (terminal) condition $V(T,x,y) = \Psi(x,y)$ immediately follows from the definition of $V(t,x,y)$.

In the easier case when $y(s)$ is a finite-state Markov chain, the dynamic programming equation has the form
$$V_t(t,x,y) + \inf_{v \in U}\big\{\Phi(t,x,y,v) + V_x(t,x,y) \cdot a(t,x,y,v)\big\} + \sum_{y' \ne y} [V(t,x,y') - V(t,x,y)]\,\Pi(t,y,\{y'\}) = 0,$$
a system of partial differential equations indexed by $y \in Y$ (note that $Y$ is a finite set now). Fleming and Rishel [7] derive the continuous-time dynamic programming equation of optimal stochastic control theory
$$V_t(t,y) + \inf_{v}\big\{\Phi(t,y,v) + A^v(t)V(t,y)\big\} = 0$$
for a somewhat more general case, where $A^v(s)$ is the generator of the controlled Markov process $y(s)$. In our problem, the controlled process $x(s)$ is not a Markov process, but the two-component process $(x(s), y(s))$ is a Markov process, and $A^v$ is its generator.
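When $Y$ is finite, the system above can be attacked numerically by marching backwards in time on a grid in $x$. The following sketch, for two states, scalar $x$, a three-point control grid, and illustrative data ($a$, $\Phi$, $\Psi$, $\Pi$ all assumed here), uses an explicit upwind scheme; it is a rough illustration of the structure of the system, not a production solver.

```python
import numpy as np

# A finite-difference sketch of the finite-state dynamic programming system
# above: Y = {0, 1}, scalar x, a three-point control grid, explicit backward
# time-stepping with upwind differences for V_x.  All data (a, Phi, Psi, Pi)
# are assumed for illustration.

T, nt, nx = 1.0, 2000, 101
dt = T / nt
xs = np.linspace(-2.0, 2.0, nx)
dx = xs[1] - xs[0]
U = (-1.0, 0.0, 1.0)
Pi = {0: 2.0, 1: 3.0}                     # Pi(t, y, {y'}) for the other state y'

def a(x, y, u):   return (1.0 if y == 0 else -1.0) * x + u
def Phi(x, y, u): return x**2 + u**2
def Psi(x, y):    return np.abs(x)

V = np.stack([Psi(xs, 0), Psi(xs, 1)])    # terminal condition V(T, x, y) = Psi(x, y)

for n in range(nt):                       # march from s = T down to s = 0
    Vnew = np.empty_like(V)
    for y in (0, 1):
        Dp = np.empty(nx); Dm = np.empty(nx)          # one-sided differences
        Dp[:-1] = (V[y, 1:] - V[y, :-1]) / dx; Dp[-1] = Dp[-2]
        Dm[1:]  = (V[y, 1:] - V[y, :-1]) / dx; Dm[0]  = Dm[1]
        best = np.full(nx, np.inf)
        for u in U:
            drift = a(xs, y, u)
            Vx = np.where(drift > 0, Dp, Dm)          # upwind choice for V_x
            best = np.minimum(best, Phi(xs, y, u) + drift * Vx)
        jump = Pi[y] * (V[1 - y] - V[y])              # sum over y' != y
        Vnew[y] = V[y] + dt * (best + jump)           # V_t = -(inf{...} + jump)
    V = Vnew

print(V[0, nx // 2])                      # approximation to V(0, 0, y = 0)
```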
Lemma 3. (Dynkin's formula) Let $f(t,x,y) \in D_A$, and suppose $f_t(t,x,y)$ exists and is bounded and continuous in $(t,x)$ uniformly in $y$. Let $u(\cdot) \in \mathcal{U}$, and let $x(s)$ be the corresponding trajectory of the system. Then for $0 \le t \le s \le T$,
$$E_{t,x,y}\{f(s,x(s),y(s))\} - f(t,x,y) = E_{t,x,y}\Big\{ \int_t^s \big[ f_r(r,x(r),y(r)) + A^{u(r)} f(r,x(r),y(r)) \big]\,dr \Big\}.$$

Proof:
$$f(s,x(s),y(s)) - f(t,x,y) = \sum_{k \ge 0} \big[ f(s \wedge \tau_t^{k+1}, x(s \wedge \tau_t^{k+1}), y(s \wedge \tau_t^k)) - f(s \wedge \tau_t^k, x(s \wedge \tau_t^k), y(s \wedge \tau_t^k)) \big]$$
$$+ \sum_{k \ge 0} \big[ f(s \wedge \tau_t^{k+1}, x(s \wedge \tau_t^{k+1}), y(s \wedge \tau_t^{k+1})) - f(s \wedge \tau_t^{k+1}, x(s \wedge \tau_t^{k+1}), y(s \wedge \tau_t^k)) \big] = \Sigma_1 + \Sigma_2.$$
From the Fundamental Theorem of Calculus,
$$E_{t,x,y}\{\Sigma_1\} = E_{t,x,y}\Big\{ \int_t^s \big[ f_r(r,x(r),y(r)) + f_x(r,x(r),y(r)) \cdot a(r,x(r),y(r),u(r)) \big]\,dr \Big\},$$
and, compensating the jumps of $y(s)$ by their intensities,
$$E_{t,x,y}\{\Sigma_2\} = E_{t,x,y}\Big\{ \int_t^s \int_Y \big[ f(r,x(r),y') - f(r,x(r),y(r)) \big]\,\Pi(r,y(r),dy')\,dr \Big\} = -E_{t,x,y}\Big\{ \int_t^s G_r f(r,x(r),y(r))\,dr \Big\}.$$
Adding the two expressions gives the assertion.

Theorem 2. (Verification theorem) Let $W(t,x,y) \in D_A$ be such that $W_t(t,x,y)$ exists and is bounded and continuous in $(t,x)$ uniformly in $y$, and $W(T,x,y) = \Psi(x,y)$ for all $(x,y)$.

a) If for all $(t,x,y) \in Q$ and all $v \in U$,
$$W_t(t,x,y) + \Phi(t,x,y,v) + A^v W(t,x,y) \ge 0,$$
then $W(t,x,y) \le V(t,x,y)$ on $Q$.

b) If, in addition, we can find $u^*(t,x,y) \in U$ such that
$$W_t(t,x,y) + \Phi(t,x,y,u^*(t,x,y)) + A^{u^*(t,x,y)} W(t,x,y) = 0,$$
and if $u^*(\cdot) = \{u^*(s;t,x,y) = u^*(s, x_{t,y}^{u^*(\cdot)}(s,x), y),\ (t,x,y) \in Q,\ t \le s < T\} \in \mathcal{U}$, then $u^*(\cdot)$ is optimal.

c) If, in addition, we can find $\gamma_\varepsilon(t,x,y) \in U$ such that
$$W_t(t,x,y) + \Phi(t,x,y,\gamma_\varepsilon(t,x,y)) + A^{\gamma_\varepsilon(t,x,y)} W(t,x,y) \le \frac{\varepsilon}{T},$$
and if $u_\varepsilon(\cdot) = \{u_\varepsilon(s;t,x,y) = \gamma_\varepsilon(s, x_{t,y}^{u_\varepsilon(\cdot)}(s,x), y),\ (t,x,y) \in Q,\ t \le s < T\} \in \mathcal{U}$, then $u_\varepsilon(\cdot)$ is $\varepsilon$-optimal, i.e., $J(t,x,y,u_\varepsilon(\cdot)) \le V(t,x,y) + \varepsilon$ for all $(t,x,y) \in Q$.

Proof: a) Let $u(\cdot) \in \mathcal{U}$. Since for all $(t,x,y) \in Q$,
$$W_t(t,x,y) + A^{u(t;t,x,y)} W(t,x,y) \ge -\Phi(t,x,y,u(t;t,x,y)),$$
Dynkin's formula (Lemma 3) yields
$$E_{t,x,y}\{W(T,x(T),y(T))\} = W(t,x,y) + E_{t,x,y}\Big\{ \int_t^T \big[W_s(s,x(s),y(s)) + A^{u(s)}W(s,x(s),y(s))\big]\,ds \Big\}$$
$$\ge W(t,x,y) - E_{t,x,y}\Big\{ \int_t^T \Phi(s,x(s),y(s),u(s))\,ds \Big\}.$$
Since $W(T,x,y) = \Psi(x,y)$, this gives $W(t,x,y) \le J(t,x,y,u(\cdot))$. Taking the infimum over $u(\cdot) \in \mathcal{U}$, $W(t,x,y) \le V(t,x,y)$.

b) Applying Dynkin's formula to $u^*(\cdot)$, the inequality above becomes an equality, so $W(t,x,y) = J(t,x,y,u^*(\cdot)) \ge V(t,x,y)$; together with (a), $J(t,x,y,u^*(\cdot)) = V(t,x,y)$, and $u^*(\cdot)$ is optimal.

c) Applying Dynkin's formula to $u_\varepsilon(\cdot)$ gives $J(t,x,y,u_\varepsilon(\cdot)) \le W(t,x,y) + \varepsilon(T-t)/T \le V(t,x,y) + \varepsilon$.

CHAPTER 3

REDUCTION TO A FAMILY OF DETERMINISTIC CONTROL PROBLEMS

Let $\mathcal{B}$ be the set of functions $f(t,x,y): \bar{Q} \to R$ such that $|f(t,x,y)| \le C_1(T-t) + D_1$ for all $(t,x,y) \in \bar{Q}$, and let
$$\mathcal{U}_0 = \{u(\cdot) \in \mathcal{U} : J(t,x,y,u(\cdot)) \in \mathcal{B}\}.$$
Notice that $\mathcal{U}_0$ is non-empty, since $u_0(\cdot)$, for which $u_0(s;t,x,y) = u_0$ for all $(t,x,y) \in Q$, $t \le s < T$, is in $\mathcal{U}_0$; and if there is an optimal control in the class $\mathcal{U}$, then that control is in $\mathcal{U}_0$. Lemma 1(a) states that $V(t,x,y) \in \mathcal{B}$.

For $u(\cdot) \in \mathcal{U}$ and $f \in \mathcal{B}$, define
$$T^{u(\cdot)} f(t,x,y) = \int_t^T \Lambda(t,s,y)\, \Phi(s, x_{t,y}^{u(\cdot)}(s,x), y, u(s;t,x,y))\,ds + \Lambda(t,T,y)\, \Psi(x_{t,y}^{u(\cdot)}(T,x), y)$$
$$+ \int_t^T \Lambda(t,s,y) \int_{Y \setminus \{y\}} f(s, x_{t,y}^{u(\cdot)}(s,x), y')\, \Pi(s,y,dy')\,ds,$$
and
$$T f(t,x,y) = \inf_{u(\cdot;t,x,y) \in PC([t,T);U)} T^{u(\cdot)} f(t,x,y).$$

Lemma 4. Let $u(\cdot) \in \mathcal{U}_0$. Then:
a) $J(t,x,y,u(\cdot)) = T^{u(\cdot)} J(t,x,y,u(\cdot))$;
b) if $W \in \mathcal{B}$ and $W = T^{u(\cdot)} W$, then $W(t,x,y) = J(t,x,y,u(\cdot))$;
c) $J(t,x,y,u(\cdot))$ is the unique solution in $\mathcal{B}$ of the equation $f = T^{u(\cdot)} f$.

Proof: a) Splitting the expectation according to whether the first jump after $t$ occurs before $T$,
$$J(t,x,y,u(\cdot)) = E_{t,x,y}\Big[ 1_{\{\tau_t^1 > T\}} \Big\{ \int_t^T \Phi(s,x(s),y(s),u(s))\,ds + \Psi(x(T),y(T)) \Big\} \Big]$$
$$+ E_{t,x,y}\Big[ 1_{\{\tau_t^1 \le T\}} \Big\{ \int_t^T \Phi(s,x(s),y(s),u(s))\,ds + \Psi(x(T),y(T)) \Big\} \Big] = E_1 + E_2.$$
Straightforward calculations yield
$$E_1 = \Lambda(t,T,y) \Big[ \int_t^T \Phi(r, x_{t,y}^{u(\cdot)}(r,x), y, u(r;t,x,y))\,dr + \Psi(x_{t,y}^{u(\cdot)}(T,x), y) \Big]$$
and
$$E_2 = \int_t^T \Lambda(t,s,y) \int_{Y \setminus \{y\}} \Big[ \int_t^s \Phi(r, x_{t,y}^{u(\cdot)}(r,x), y, u(r;t,x,y))\,dr + J(s, x_{t,y}^{u(\cdot)}(s,x), y', u(\cdot)) \Big] \Pi(s,y,dy')\,ds;$$
integrating by parts in the running-cost terms, the assertion immediately follows.

b) Suppose $\sup_{(t,x,y)} |W(t,x,y) - J(t,x,y,u(\cdot))| = \eta > 0$, and let $(t,x,y) \in Q$ be such that
$$|W(t,x,y) - J(t,x,y,u(\cdot))| \ge \eta - e^{-KT}\,\frac{\eta}{4}.$$
Then
$$\eta - e^{-KT}\frac{\eta}{4} \le |W(t,x,y) - J(t,x,y,u(\cdot))| = |T^{u(\cdot)}W(t,x,y) - T^{u(\cdot)}J(t,x,y,u(\cdot))|$$
$$\le \int_t^T \Lambda(t,s,y) \int_{Y \setminus \{y\}} \big| W(s, x_{t,y}^{u(\cdot)}(s,x), y') - J(s, x_{t,y}^{u(\cdot)}(s,x), y', u(\cdot)) \big|\,\Pi(s,y,dy')\,ds$$
$$\le (1 - \Lambda(t,T,y))\,\eta \le (1 - e^{-KT})\,\eta.$$
Contradiction. Therefore $W(t,x,y) = J(t,x,y,u(\cdot))$.

c) Immediately follows from (a) and (b).

Corollary. Finding the optimal $u(\cdot) \in \mathcal{U}$ is equivalent to finding, for each $(t,x,y) \in Q$, an optimal deterministic open-loop control function $u(\cdot; t,x,y)$ minimizing
$$\int_t^T \Lambda(t,s,y)\, \tilde{\Phi}(s, x_{t,y}^{u(\cdot)}(s,x), y, u(s;t,x,y))\,ds + \Lambda(t,T,y)\, \Psi(x_{t,y}^{u(\cdot)}(T,x), y),$$
where
$$\tilde{\Phi}(s,x,y,u) = \Phi(s,x,y,u) + \int_{Y \setminus \{y\}} V(s,x,y')\, \Pi(s,y,dy').$$

Controls in the class $\mathcal{U}$ apply the same intrajump rule $u(\cdot;\cdot,\cdot,\cdot)$ after every jump. Consider the wider class
$$\mathcal{M} = \{u(\cdot) : u(s) = u_k(s; \tau_t^k, x(\tau_t^k), y(\tau_t^k)),\ \tau_t^k \le s < \tau_t^{k+1},\ u_k(\cdot;t,x,y) \in PC([t,T);U),\ k = 0, 1, 2, \dots\}.$$
Define $\nu(t)$ = number of jumps of $y(s)$ on $[0,t]$. The intrajump control function $u_{\nu(t)}(\cdot;t,x,y)$ is used from time $t$ up to the terminal time $T$ or the time of the first jump of $y(s)$ after $t$, whichever comes first. Now the control functions, and thus the state of the system $x(s)$, depend on the past of $y(s)$. Notice that controls $u(\cdot) \in \mathcal{U}$ are also in the class $\mathcal{M}$; they are constant in the index $k$. It can be seen from the Corollary that if there is an optimal control in the class $\mathcal{U}$, then that control is also optimal in the class $\mathcal{M}$, and therefore no advantage can be gained by going to the wider class $\mathcal{M}$.
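The reduction can be exercised numerically for a fixed control: by Lemma 4, $J(t,x,y,u_0(\cdot))$ is the unique fixed point of $T^{u_0}$ in $\mathcal{B}$, and iterating the operator contracts at rate at most $1 - e^{-KT}$. The sketch below does this by quadrature for an assumed two-state model with constant intensities and a constant control $u_0$; every model ingredient here is illustrative.

```python
import numpy as np

# A quadrature sketch of Lemma 4: for a fixed constant control u0, the
# performance function J(t, x, y, u0(.)) is the unique fixed point of T^{u0}
# in B, and the iteration f <- T^{u0} f contracts at rate at most 1 - e^{-KT}.
# Assumed model (all illustrative): Y = {0, 1}, constant intensities, scalar x.

T, nt, nx = 1.0, 60, 81
ts = np.linspace(0.0, T, nt)
xs = np.linspace(-3.0, 3.0, nx)
lam = (2.0, 3.0)
u0 = 0.0

def a(s, x, y):   return (1.0 if y == 0 else -1.0) * x + u0
def Phi(s, x, y): return x**2 + u0**2
def Psi(x, y):    return np.abs(x)

def flow(i, y):
    """Euler approximation of x_{t,y}(s, x): frozen-y trajectories started
    from every grid point x at time ts[i], at all grid times s >= ts[i]."""
    out = np.empty((nt - i, nx))
    out[0] = xs
    for j in range(1, nt - i):
        h = ts[i + j] - ts[i + j - 1]
        out[j] = out[j - 1] + h * a(ts[i + j - 1], out[j - 1], y)
    return out

def apply_T(f):                           # one application of T^{u0}; f: (nt, 2, nx)
    g = np.empty_like(f)
    for i in range(nt):
        for y in (0, 1):
            X = flow(i, y)
            s = ts[i:]
            Lam = np.exp(-lam[y] * (s - ts[i]))[:, None]   # Lambda(ts[i], s, y)
            # continuation value f(s, x_{t,y}(s,x), y') at the single y' != y,
            # by linear interpolation in x (clamped at the grid ends)
            cont = np.array([np.interp(X[j], xs, f[i + j, 1 - y])
                             for j in range(len(s))])
            integrand = Lam * (Phi(s[:, None], X, y) + lam[y] * cont)
            ds = np.diff(s)[:, None]                       # trapezoid rule in s
            g[i, y] = (0.5 * (integrand[1:] + integrand[:-1]) * ds).sum(axis=0) \
                      + Lam[-1] * Psi(X[-1], y)
    return g

f = np.zeros((nt, 2, nx))
for m in range(10):
    fn = apply_T(f)
    print(m, np.max(np.abs(fn - f)))      # geometric decay, rate <= 1 - e^{-KT}
    f = fn                                # f converges to J(., ., ., u0(.))
```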
Theorem 3. (Integral equation for the value function $V(t,x,y)$)
a) $V(t,x,y)$ satisfies the integral equation

(10) $\quad V(t,x,y) = TV(t,x,y),$

i.e.,
$$V(t,x,y) = \inf_{u(\cdot;t,x,y) \in PC([t,T);U)} \Big\{ \int_t^T \Lambda(t,s,y)\,\Phi(s, x_{t,y}^{u(\cdot)}(s,x), y, u(s;t,x,y))\,ds + \Lambda(t,T,y)\,\Psi(x_{t,y}^{u(\cdot)}(T,x), y)$$
$$+ \int_t^T \Lambda(t,s,y) \int_{Y \setminus \{y\}} V(s, x_{t,y}^{u(\cdot)}(s,x), y')\,\Pi(s,y,dy')\,ds \Big\}.$$
b) If $W \in \mathcal{B}$ and $W = TW$, then $W = V$.
c) $V(t,x,y)$ is the unique solution of (10) in $\mathcal{B}$.

Proof: a) For $\varepsilon > 0$, there exists a control $u_{\varepsilon,t}(\cdot) \in \mathcal{U}$ such that for every $(t',x',y') \in (t,T) \times R^n \times Y$,
$$J(t',x',y',u_{\varepsilon,t}(\cdot)) \le V(t',x',y') + \varepsilon.$$
Given an arbitrary $u(\cdot;t,x,y) \in PC([t,T);U)$, define
$$\tilde{u}_{\varepsilon,t}(r; t',x',y') = u(r; t',x',y') \cdot 1_{\{t' \le t\}} + u_{\varepsilon,t}(r; t',x',y') \cdot 1_{\{t' > t\}}$$
for all $(t',x',y') \in Q$ and $t \le r < T$; that is, $\tilde{u}_{\varepsilon,t}(\cdot)$ uses the given intrajump function until the first jump after $t$ and the almost optimal control afterwards. Then, by Lemma 4,
$$V(t,x,y) \le J(t,x,y,\tilde{u}_{\varepsilon,t}(\cdot)) = \int_t^T \Lambda(t,s,y)\,\Phi(s, x_{t,y}^{u(\cdot)}(s,x), y, u(s;t,x,y))\,ds + \Lambda(t,T,y)\,\Psi(x_{t,y}^{u(\cdot)}(T,x), y)$$
$$+ \int_t^T \Lambda(t,s,y)\Big\{ \int_{Y \setminus \{y\}} J(s, x_{t,y}^{u(\cdot)}(s,x), y', u_{\varepsilon,t}(\cdot))\,\Pi(s,y,dy') \Big\}\,ds$$
$$\le T^{u(\cdot)} V(t,x,y) + \varepsilon\,(1 - \Lambda(t,T,y)).$$
Since $\varepsilon > 0$ is arbitrary, $V(t,x,y) \le T^{u(\cdot)}V(t,x,y)$, and taking the infimum over $u(\cdot;t,x,y)$ gives $V(t,x,y) \le TV(t,x,y)$. On the other hand, since $J(s,\cdot,\cdot,u(\cdot)) \ge V(s,\cdot,\cdot)$, Lemma 4 gives, for any $u(\cdot) \in \mathcal{U}$,
$$J(t,x,y,u(\cdot)) = T^{u(\cdot)} J(t,x,y,u(\cdot)) \ge T^{u(\cdot)} V(t,x,y) \ge TV(t,x,y);$$
taking the infimum over $u(\cdot) \in \mathcal{U}$ yields $V(t,x,y) \ge TV(t,x,y)$, and the assertion follows.

b) Suppose $\sup_{(t,x,y)} |W(t,x,y) - V(t,x,y)| = \eta > 0$, and let $(t,x,y) \in Q$ be such that
$$|W(t,x,y) - V(t,x,y)| \ge \eta - e^{-KT}\,\frac{\eta}{4}.$$
Without loss of generality, $W(t,x,y) > V(t,x,y)$. There exists a control $u(\cdot) \in \mathcal{U}$ such that
$$T^{u(\cdot)}V(t,x,y) \le TV(t,x,y) + e^{-KT}\,\frac{\eta}{4}.$$
Then
$$\eta - e^{-KT}\frac{\eta}{4} \le W(t,x,y) - V(t,x,y) = TW(t,x,y) - TV(t,x,y) \le T^{u(\cdot)}W(t,x,y) - T^{u(\cdot)}V(t,x,y) + e^{-KT}\frac{\eta}{4}$$
$$\le \int_t^T \Lambda(t,s,y)\Big\{ \int_{Y \setminus \{y\}} \big| W(s, x_{t,y}^{u(\cdot)}(s,x), y') - V(s, x_{t,y}^{u(\cdot)}(s,x), y') \big|\,\Pi(s,y,dy') \Big\}\,ds + e^{-KT}\frac{\eta}{4}$$
$$\le (1 - \Lambda(t,T,y))\,\eta + e^{-KT}\frac{\eta}{4} \le \eta - e^{-KT}\,\frac{3\eta}{4} < \eta - e^{-KT}\,\frac{\eta}{4}.$$
Contradiction. Therefore $W(t,x,y) = V(t,x,y)$.

c) Immediately follows from (a) and (b).

Consider $u_0(\cdot) \in \mathcal{U}_0$ for which $u_0(s;t,x,y) = u_0$ for all $(t,x,y) \in Q$, $t \le s < T$. Let $x_0(s)$ be the solution of the system of differential equations
$$\frac{dx_0(s)}{ds} = a(s, x_0(s), y(s), u_0), \quad 0 \le t \le s \le T,$$
with initial condition $x_0(t) = x \in R^n$. Define

(11) $\quad W_0(t,x,y) = J(t,x,y,u_0(\cdot)) = E_{t,x,y}\Big\{ \displaystyle\int_t^T \Phi(s, x_0(s), y(s), u_0)\,ds + \Psi(x_0(T), y(T)) \Big\},$

and for $k = 1, 2, 3, \dots$, let

(12) $\quad W_k(t,x,y) = T W_{k-1}(t,x,y).$

Theorem 4. $W_k(t,x,y)$ converges to $V(t,x,y)$ from above uniformly on $\bar{Q}$.

Proof: It is easy to see that $W_k(t,x,y) \ge V(t,x,y)$ for all $k = 0, 1, 2, \dots$. Since $W_0(t,x,y) \in \mathcal{B}$ and $V(t,x,y) \in \mathcal{B}$,
$$\|W_0 - V\| = \sup_{(t,x,y)} \{W_0(t,x,y) - V(t,x,y)\} \le 2\,[C_1 T + D_1].$$
Suppose that for some $k$,
$$\|W_k - V\| \le 2\,[C_1 T + D_1]\,(1 - e^{-KT})^k.$$
Then
$$W_{k+1}(t,x,y) = T W_k(t,x,y) \le TV(t,x,y) + \int_t^T \Lambda(t,s,y)\Big\{ \int_{Y \setminus \{y\}} 2\,[C_1T + D_1]\,(1-e^{-KT})^k\, \Pi(s,y,dy') \Big\}\,ds$$
$$\le V(t,x,y) + 2\,[C_1T + D_1]\,(1 - e^{-KT})^{k+1}.$$
The assertion of the theorem follows by mathematical induction.
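Theorem 4 also yields an a priori estimate of the number of applications of $T$ needed for a prescribed accuracy, since $\|W_k - V\| \le 2[C_1T + D_1](1 - e^{-KT})^k$. Here is a small worked instance with assumed constants:

```python
import math

# A small worked instance of Theorem 4's error estimate: with
# ||W_0 - V|| <= 2[C_1 T + D_1] and one factor (1 - e^{-KT}) per application
# of T, the iteration count for a target accuracy follows directly.  The
# constants C_1, D_1, K, T are assumed for illustration.

C1, D1, K, T = 1.0, 1.0, 3.0, 1.0
rho = 1.0 - math.exp(-K * T)              # contraction factor of T
bound0 = 2.0 * (C1 * T + D1)              # initial bound on ||W_0 - V||

eps = 1e-3
k = math.ceil(math.log(eps / bound0) / math.log(rho))
print(rho, k)                             # rho ~ 0.95: roughly 163 iterations

for i in range(0, k + 1, 40):             # the geometric envelope of ||W_i - V||
    print(i, bound0 * rho ** i)
```

Note that the contraction is slow when $KT$ is large, since $\rho = 1 - e^{-KT}$ is then close to 1.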
CHAPTER 4

FAMILY OF CONTROL PROBLEMS WITH "STOPPED" y(s)

Let
$$z_t^0(s) = y(t) = y(\tau_t^0), \quad t \le s \le T,$$
and for $k = 1, 2, \dots$ let
$$z_t^k(s) = y(s) \ \text{ for } t \le s < \tau_t^k, \qquad z_t^k(s) = y(\tau_t^k) \ \text{ for } \tau_t^k \le s \le T.$$
($z_t^k(s)$ is $y(s)$ stopped at the time of the $k$-th jump after $t$.) Note that for $k = 1, 2, \dots$ and $l = 0, 1, \dots, k$,
$$z_t^k(s) = z_{\tau_t^l}^{k-l}(s), \quad \tau_t^l \le s \le T.$$

For $k = 0, 1, 2, \dots$, consider the control problem where the state of a system is described by the system of differential equations
$$\frac{dx_k(s)}{ds} = a(s, x_k(s), z_t^k(s), u_k(s)), \quad t \le s \le T,$$
with initial condition $x_k(t) = x \in R^n$, and the performance criterion is given by
$$J_k(t,x,y,u_k(\cdot)) = E_{t,x,y}\Big\{ \int_t^T \Phi(s, x_k(s), z_t^k(s), u_k(s))\,ds + \Psi(x_k(T), z_t^k(T)) \Big\}.$$
Since, for $k = 0, 1, 2, \dots$, $z_t^k(s)$ has at most $k$ jumps on $[t,T]$, let us consider controls $u_k(\cdot)$ of the form
$$u_k(s) = u_{k,l}(s) = \tilde{u}_{k,l}(s; \tau_t^l, x_k(\tau_t^l), z_t^k(\tau_t^l)), \quad \tau_t^l \le s < \tau_t^{l+1},\ l = 0, 1, \dots, k-1,$$
$$u_k(s) = u_{k,k}(s) = \tilde{u}_{k,k}(s; \tau_t^k, x_k(\tau_t^k), z_t^k(\tau_t^k)), \quad \tau_t^k \le s \le T.$$
Let $\mathcal{U}_k$ be the set of all controls $u_k(\cdot)$ of this form. Note that $\bigcup_k \mathcal{U}_k = \mathcal{M}$. For $k = 0, 1, 2, \dots$, consider the value function
$$V_k(t,x,y) = \inf_{u_k(\cdot)} \{ J_k(t,x,y,u_k(\cdot)) \}.$$
Using arguments similar to the ones used earlier, we can prove the following lemma.

Lemma 5. If $J_k(t,x,y,u_k^*(\cdot)) = V_k(t,x,y)$ and $J_{k-1}(t,x,y,u_{k-1}^*(\cdot)) = V_{k-1}(t,x,y)$, then
a) $u_{k,l}^*(\cdot) = u_{k-1,l-1}^*(\cdot)$, $l = 1, 2, \dots, k$;
b) $u_{k,0}^*(\cdot)$ minimizes
$$\int_t^T \Phi_k(s, x_{t,y}^{u_{k,0}(\cdot)}(s,x), y, u_{k,0}(s))\,ds + \Psi_k(x_{t,y}^{u_{k,0}(\cdot)}(T,x), y),$$
where
$$\Phi_0(s,x,y,u) = \Phi(s,x,y,u), \qquad \Psi_0(x,y) = \Psi(x,y),$$
$$\Phi_k(s,x,y,u) = \Lambda(t,s,y)\Big\{ \Phi(s,x,y,u) + \int_{Y \setminus \{y\}} V_{k-1}(s,x,y')\,\Pi(s,y,dy') \Big\}, \qquad \Psi_k(x,y) = \Lambda(t,T,y)\,\Psi(x,y), \quad k = 1, 2, \dots$$

For every $y \in Y$, consider the deterministic control problem where the state of a system is described by the system of differential equations

(13) $\quad \dfrac{dx(s)}{ds} = a(s, x(s), y, u(s)), \quad t \le s \le T,$

with initial condition

(14) $\quad x(t) = x \in R^n,$

and the performance criterion is given by
$$F_0(t,x,y,u(\cdot)) = \int_t^T \Phi_k(s, x(s), y, u(s))\,ds + \Psi_k(x(T), y).$$
Suppose that $(\Phi_k)_x(s,x,y,u)$ exists for all $(s,x,y,u) \in (t,T) \times R^n \times Y \times U$. Suppose that $(\hat{x}(s), \hat{u}(s))$, $t \le s \le T$, is the optimal solution. Let $v(s): [t,T] \to R^m$ be a piecewise continuous function and set
$$u(s,\delta) = \hat{u}(s) + \delta\, v(s), \quad t \le s \le T.$$
Then (13) and (14) with $u(s) = u(s,\delta)$ have a one-parameter family of solutions $x(s,\delta)$, $t \le s \le T$, for $|\delta| < \delta_0$, which contains $\hat{x}(s)$ for $\delta = 0$. The functions $x(s,\delta)$ are continuous and have continuous derivatives with respect to $\delta$. Let
$$\Delta(s) = \frac{\partial x}{\partial \delta}(s, 0), \quad t \le s \le T.$$
Then $\Delta(s)$ satisfies the system of differential equations
$$\frac{d\Delta(s)}{ds} = a_x(s, \hat{x}(s), y, \hat{u}(s)) \cdot \Delta(s) + a_u(s, \hat{x}(s), y, \hat{u}(s)) \cdot v(s), \quad t \le s \le T,$$
with initial condition $\Delta(t) = 0 \in R^n$. Denote by $A(s)$, $t \le s \le T$, the $n \times n$ matrix-valued function satisfying the differential equation
$$\frac{dA(s)}{ds} = a_x(s, \hat{x}(s), y, \hat{u}(s)) \cdot A(s), \quad t \le s \le T,$$
with initial condition $A(t) = I_n$. Then
$$\Delta(s) = A(s) \int_t^s A^{-1}(r)\, a_u(r, \hat{x}(r), y, \hat{u}(r))\, v(r)\,dr, \quad t \le s \le T.$$
Set $f(\delta; v(\cdot)) = F_0(t,x,y,u(\cdot,\delta))$ for $|\delta| < \delta_0$. Then $f'(\delta; v(\cdot))$ exists and
$$f'(0; v(\cdot)) = (\Psi_k)_x(\hat{x}(T), y) \cdot \Delta(T) + \int_t^T \big[ (\Phi_k)_x(s, \hat{x}(s), y, \hat{u}(s)) \cdot \Delta(s) + (\Phi_k)_u(s, \hat{x}(s), y, \hat{u}(s)) \cdot v(s) \big]\,ds = \int_t^T \varphi(s) \cdot v(s)\,ds,$$
where
$$\varphi(s) = \Big( (\Psi_k)_x(\hat{x}(T), y)\,A(T) + \int_s^T (\Phi_k)_x(r, \hat{x}(r), y, \hat{u}(r))\,A(r)\,dr \Big) A^{-1}(s)\, a_u(s, \hat{x}(s), y, \hat{u}(s)) + (\Phi_k)_u(s, \hat{x}(s), y, \hat{u}(s)), \quad t \le s \le T.$$
If $(\hat{x}(s), \hat{u}(s))$, $t \le s \le T$, is the optimal solution, then for any piecewise continuous $v(s)$, $f(\delta; v(\cdot))$ has a local minimum at $\delta = 0$, so that $f'(0; v(\cdot)) = 0$ and $f''(0; v(\cdot)) \ge 0$. This proves the following lemma:

Lemma 6. If $(\hat{x}(s), \hat{u}(s))$, $t \le s \le T$, is the optimal solution, then $f'(0; v(\cdot)) = 0$ and $f''(0; v(\cdot)) \ge 0$ for any piecewise continuous $v(\cdot)$.

The relation $f'(0; v(\cdot)) = 0$ holds for any piecewise continuous $v(\cdot)$ if and only if

(16) $\quad \varphi(s) = 0 \in R^m$ almost everywhere on $[t,T]$.

Set
$$P(s) = \Big( (\Psi_k)_x(\hat{x}(T), y)\,A(T) + \int_s^T (\Phi_k)_x(r, \hat{x}(r), y, \hat{u}(r))\,A(r)\,dr \Big) \cdot A^{-1}(s), \quad t \le s \le T.$$
Then, since
$$\frac{d}{ds} A^{-1}(s) = -A^{-1}(s) \cdot a_x(s, \hat{x}(s), y, \hat{u}(s)),$$

(17) $\quad \dfrac{dP(s)}{ds} = -(\Phi_k)_x(s, \hat{x}(s), y, \hat{u}(s)) - P(s) \cdot a_x(s, \hat{x}(s), y, \hat{u}(s)), \quad t \le s \le T,$

with terminal condition $P(T) = (\Psi_k)_x(\hat{x}(T), y)$.
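The first-variation formula $f'(0; v(\cdot)) = \int_t^T \varphi(s) \cdot v(s)\,ds$ can be checked by finite differences. The sketch below does so for scalar $x$ and $u$ with frozen $y$, assembling $\varphi$ from the adjoint $P(s)$ of (17); the dynamics and costs standing in for $\Phi_k$ and $\Psi_k$ are assumed for this example only.

```python
import numpy as np

# A finite-difference check of the first-variation formula above,
# f'(0; v(.)) = int_t^T phi(s) . v(s) ds, with phi assembled from the adjoint
# P(s) of (17).  Scalar x and u, frozen y, and assumed costs standing in for
# Phi_k and Psi_k; none of this is the dissertation's model.

t, T, n = 0.0, 1.0, 4001
s = np.linspace(t, T, n)
h = s[1] - s[0]
x0 = 0.5

def trap(f):                              # trapezoid rule on the uniform grid
    return h * (f.sum() - 0.5 * (f[0] + f[-1]))

def a(s, x, u):   return -x + np.sin(u)   # a_x = -1, a_u = cos(u)
def Phi(s, x, u): return x**2 + u**2      # (Phi)_x = 2x, (Phi)_u = 2u
def Psi(x):       return x**2             # Psi_x = 2x

uhat = 0.3 * np.cos(2 * np.pi * s)        # a fixed control u^(s) (not optimal)
v = np.sin(np.pi * s)                     # perturbation direction v(s)

def solve_x(u):                           # Euler solution of (13)-(14)
    x = np.empty(n); x[0] = x0
    for i in range(n - 1):
        x[i + 1] = x[i] + h * a(s[i], x[i], u[i])
    return x

def F(u):                                 # the functional F_0(t, x, y, u(.))
    x = solve_x(u)
    return trap(Phi(s, x, u)) + Psi(x[-1])

xh = solve_x(uhat)

P = np.empty(n)                           # adjoint (17): dP/ds = -(Phi)_x - P a_x
P[-1] = 2.0 * xh[-1]                      # P(T) = Psi_x(x(T))
for i in range(n - 1, 0, -1):
    P[i - 1] = P[i] + h * (2.0 * xh[i] - P[i])

phi = P * np.cos(uhat) + 2.0 * uhat       # phi = P a_u + (Phi)_u
lhs = trap(phi * v)

d = 1e-5                                  # central difference in delta
rhs = (F(uhat + d * v) - F(uhat - d * v)) / (2 * d)
print(lhs, rhs)                           # agreement up to discretization error
```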
In this notation, (16) becomes

(16') $\quad \varphi(s) = P(s) \cdot a_u(s, \hat{x}(s), y, \hat{u}(s)) + (\Phi_k)_u(s, \hat{x}(s), y, \hat{u}(s)) = 0$ almost everywhere on $[t,T]$.

Suppose that

(18) $\quad \dfrac{\partial}{\partial u}\Big[ P(s) \cdot a_u(s, \hat{x}(s), y, u) + (\Phi_k)_u(s, \hat{x}(s), y, u) \Big]\Big|_{u = \hat{u}(s)} \ne 0, \quad t \le s \le T.$

Then, since $\hat{x}(s)$ and $P(s)$ are absolutely continuous, (16') implies that $\hat{u}(s)$ is continuous. Then (13) (with $u = \hat{u}$) and (17) imply that $\hat{x}(s)$ and $P(s)$ are of class $C^1$, and (16') in turn implies that $\hat{u}(s)$ is also of class $C^1$.

BIBLIOGRAPHY

[1] R. Bellman, Dynamic Programming, Princeton Univ. Press, Princeton, NJ (1957).

[2] M. G. Crandall and P.-L. Lions, Viscosity solutions of Hamilton-Jacobi equations, Trans. Amer. Math. Soc., 277 (1983), pp. 1--42.

[3] M. H. A. Davis, Piecewise-deterministic Markov processes: A general class of non-diffusion stochastic models, J. Roy. Statist. Soc. Ser. B, 46 (1984), pp. 353--388.

[4] M. H. A. Davis, Control of piecewise-deterministic processes via discrete-time dynamic programming, Stochastic Differential Systems (Bad Honnef, 1985), pp. 140--150, Lecture Notes in Control and Inform. Sci., 78, Springer, Berlin-New York, 1986.

[5] M. A. H. Dempster and J. J. Ye, Necessary and sufficient optimality conditions for control of piecewise deterministic Markov processes, Stochastics Stochastics Rep., 40 (1992), no. 3-4, pp. 125--145.

[6] M. A. H. Dempster and J. J. Ye, Generalized Bellman-Hamilton-Jacobi optimality conditions for a control problem with a boundary condition, Appl. Math. Optim., 33 (1996), no. 3, pp. 211--225.

[7] W. H. Fleming and R. W. Rishel, Deterministic and Stochastic Optimal Control, Applications of Mathematics, 1, Springer-Verlag, New York, 1975.

[8] W. H. Fleming and H. M. Soner, Controlled Markov Processes and Viscosity Solutions, Applications of Mathematics, 25, Springer-Verlag, New York, 1993.

[9] J. J. Florentin, Optimal control of systems with generalized Poisson inputs, J. Basic Engineering, 85 (1963), pp. 217--221.

[10] R. M. Goor, Existence of an optimal control for systems with jump Markov disturbances, SIAM J. Control Optimization, 14 (1976), no. 5, pp. 899--918.

[11] N. N. Krassovskii and E. A. Lidskii, Analytical design of controllers in stochastic systems with velocity-limited controlling action, Appl. Math. Mech., 25 (1961), pp. 627--643.

[12] N. V. Krylov, Introduction to the Theory of Diffusion Processes, Amer. Math. Soc., Providence, R.I. (1995).

[13] H. J. Kushner, On the stochastic maximum principle: Fixed time of control, J. Math. Anal. Appl., 11 (1965), pp. 78--92.

[14] E. A. Lidskii, Optimal control of systems with random properties, Appl. Math. Mech., 27 (1963), pp. 42--59.

[15] L. S. Pontryagin, V. G. Boltyanskii, R. V. Gamkrelidze and E. F. Mishchenko, The Mathematical Theory of Optimal Processes, John Wiley, New York (1962).

[16] R. W. Rishel, Dynamic programming and minimum principles for systems with jump Markov disturbances, SIAM J. Control, 13 (1975), pp. 338--371.

[17] H. M. Soner, Optimal control with state-space constraint. II, SIAM J. Control Optim., 24 (1986), no. 6, pp. 1110--1122.

[18] D. D. Sworder, Feedback control of a class of linear systems with jump parameters, IEEE Trans. Automatic Control, AC-14 (1969), pp. 9--14.

[19] D. Vermes, Optimal control of piecewise deterministic Markov process, Stochastics, 14 (1985), no. 3, pp. 165--207.

[20] W. M. Wonham, Random differential equations in control theory, Probabilistic Methods in Applied Mathematics, vol. 2, A. T. Bharucha-Reid, ed., Academic Press, New York, 1970, pp. 191--199.