OPTIMAL CONTROL OF DYNAMICAL SYSTEMS WITH JUMP MARKOV PERTURBATIONS

This is to certify that the dissertation entitled "Optimal Control of Dynamical Systems with Jump Markov Perturbations," presented by Alexey G. Stepanov, has been accepted towards fulfillment of the requirements for the Ph.D. degree in Statistics. Professor Anatoli V. Skorokhod (Major Professor), Professor Habib Salehi (Major Professor). Michigan State University.

OPTIMAL CONTROL OF DYNAMICAL SYSTEMS WITH JUMP MARKOV PERTURBATIONS

By Alexey G. Stepanov

A DISSERTATION

Submitted to Michigan State University in partial fulfillment of the requirements for the degree of

DOCTOR OF PHILOSOPHY

Department of Statistics & Probability

2003

ABSTRACT

OPTIMAL CONTROL OF DYNAMICAL SYSTEMS WITH JUMP MARKOV PERTURBATIONS

By Alexey G. Stepanov

The control problem of a dynamical system with jump Markov perturbations is considered. The partial differential equation of dynamic programming for the value function $V(t,x,y)$ is derived. Also, an integral equation for the value function $V(t,x,y)$ is obtained and is used to construct a sequence of functions that converge uniformly to the value function $V(t,x,y)$ from above.

TABLE OF CONTENTS

INTRODUCTION
CHAPTER 1. PRELIMINARIES
CHAPTER 2. DYNAMIC PROGRAMMING
CHAPTER 3. REDUCTION TO A FAMILY OF DETERMINISTIC CONTROL PROBLEMS
CHAPTER 4. FAMILY OF CONTROL PROBLEMS WITH "STOPPED" y(s)
CONCLUSION
BIBLIOGRAPHY

INTRODUCTION

Optimal control theory has seen tremendous growth over the past four decades, with important applications in finance, networks, manufacturing, medicine, operations research, and other areas of science and engineering. Optimal control theory is crucial to the design and operation of complicated modern systems, since it ensures that vital variables are kept in check regardless of the disturbances the system undergoes. In a variety of naturally occurring problems, we wish to control a system governed by a set of differential equations in order to minimize (or maximize) a given performance criterion.

In this dissertation, we discuss an optimal control problem for a system with a jump Markov disturbance. The evolution of the system is described by a system of differential equations, and there is a running (instantaneous) cost and a terminal cost associated with the process. A control is chosen in order to minimize the total cost of the process. However, due to the random jump Markov disturbance in the system, the position of the system at any given time is random, and so is the total cost associated with the process. Thus, the control is chosen in order to minimize the expected value of the total cost of the process; the control itself is also random, depending on the present state of the process, and possibly also on its past states.
The two main approaches used in control problems are the dynamic programming approach and Pontryagin's maximum principle. In the dynamic programming approach, the minimum (infimum) value of the performance criterion is considered as a function of the initial (starting) point. This function is called the value function, and in many ways it holds the key to solving the optimal control problem. R. Bellman [1] applied dynamic programming to the optimal control of discrete-time systems, demonstrating that the natural direction for solving optimal control problems is backwards in time. That is, we start solving an optimal control problem by first finding the optimal control policy on the very last step. Then, armed with that information, we search for the optimal control on the second-to-last step, and so on, each time using the fact that we already have the optimal control policy for the steps that follow the one currently being considered. The dynamic programming approach decomposes the optimal control problem into a sequence of minimization problems that are carried out over the space of controls and are far simpler than the original problem.

To illustrate the idea of dynamic programming, suppose the state of a system at time $k$ is described by a process $x_k \in R^n$ satisfying
$$x_{k+1} = f(x_k, u_k), \quad 0 \le k \le N-1,$$
where $u_k$ is the control at time $k$, whose value is chosen from the set of admissible controls $U_k \subset R^m$. Suppose that the performance criterion is given by $J_0(x, u_\cdot)$, where
$$J_i(x, u_\cdot) = \sum_{k=i}^{N} g(x_k, u_k), \quad 0 \le i \le N,\ x \in R^n,$$
is the performance of the process between time $i$ and time $N$, initiating from $x = x_i$ at time $i$, when the control policy $u_\cdot = (u_0, u_1, \dots, u_N)$ is used. Let
$$V_k(x) = \inf_{u_\cdot} \{J_k(x, u_\cdot)\}, \quad 0 \le k \le N,\ x \in R^n.$$
Then
$$V_N(x) = \inf_{u_N \in U_N} \{g(x, u_N)\}, \quad x \in R^n,$$
and
$$V_k(x) = \inf_{u_k \in U_k} \{g(x, u_k) + V_{k+1}(f(x, u_k))\}, \quad 0 \le k \le N-1,\ x \in R^n.$$
The problem of minimizing the performance criterion $J_0(x, u_\cdot)$ over the choice of the entire control policy $u_\cdot = (u_0, u_1, \dots, u_N)$ splits up into a sequence of smaller and simpler problems of finding the optimal choice of each $u_k$ separately, working backwards in time. Furthermore, the values of $u_k$ where the infimum is achieved on each step, $u_k^*$, $0 \le k \le N$, form the optimal (overall) control policy $u_\cdot^* = (u_0^*, u_1^*, \dots, u_N^*)$.
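To make the backward recursion concrete, here is a minimal numerical sketch. The finite state grid, the dynamics $f$, the stage cost $g$, and the control set below are illustrative stand-ins chosen for this example, not objects from the dissertation.

```python
# A minimal sketch of the backward recursion above, on a finite state set S
# and a finite control set U.  The dynamics f and stage cost g are
# illustrative placeholders, not the dissertation's model.

N = 10
S = range(-20, 21)                       # finite state grid
U = [-1, 0, 1]                           # admissible controls at every stage

def f(x, u):                             # transition x_{k+1} = f(x_k, u_k)
    return max(-20, min(20, x + u))

def g(x, u):                             # stage cost g(x_k, u_k)
    return x * x + u * u

V = {x: min(g(x, u) for u in U) for x in S}   # V_N(x) = inf_{u_N} g(x, u_N)
policy = [None] * N

for k in reversed(range(N)):             # stages N-1, ..., 1, 0
    Vk, pk = {}, {}
    for x in S:
        # Bellman step: V_k(x) = inf_u { g(x,u) + V_{k+1}(f(x,u)) }
        cost, best = min((g(x, u) + V[f(x, u)], u) for u in U)
        Vk[x], pk[x] = cost, best
    V, policy[k] = Vk, pk

print(V[5], policy[0][5])                # optimal cost and first action from x = 5
```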
For continuous-time systems, the dynamic programming approach uses the same idea of working backwards in time, and uses the value function as a tool in the analysis of the optimal control problem. At time $s$, we choose the control $u(s)$ for our process $x(s)$, based on the assumption that the optimal control will be used after time $s$. Here, whenever the value function is differentiable, it satisfies a first-order partial differential equation called the partial differential equation of dynamic programming.

Suppose that the state of a system at time $s$ is described by a process $x(s) \in R^n$ satisfying the system of differential equations
$$\frac{dx(s)}{ds} = a(s, x(s), u(s)), \quad 0 \le t \le s \le T,$$
with initial condition $x(t) = x \in R^n$, where $u(s)$ is the control process whose value at time $s$ can be chosen from the set of admissible controls $U(s,x) \subset R^m$. Suppose that the performance criterion is given by
$$J(t,x,u(\cdot)) = \int_t^T \Phi(s, x(s), u(s))\,ds + \Psi(x(T)),$$
where $\Phi$ is the running cost and $\Psi$ is the terminal cost, and let
$$V(t,x) = \inf_{u(\cdot)} J(t,x,u(\cdot))$$
be the value function. Wherever $V$ is differentiable, it satisfies the dynamic programming equation
$$V_t(t,x) + \inf_{v \in U(t,x)} H(t, x, V_x(t,x), v) = 0, \qquad V(T,x) = \Psi(x),$$
where $H(t,x,p,v) = \Phi(t,x,v) + p \cdot a(t,x,v)$. $H(t,x,p,v)$ is generally called the Hamiltonian, in analogy with a corresponding quantity occurring in classical mechanics. Solving the (Hamilton-Jacobi-)Bellman equation gives us the value function. Moreover, the dynamic programming equation can be used to find the optimal control policy. Similarly to the discrete-time case, the values of $v \in U(t,x)$ where the infimum is achieved are the values of the optimal control policy $u^*(t)$.

If the value function fails to be differentiable at some points $(t,x)$, it does not satisfy the dynamic programming equation everywhere. Thus, we need to consider a generalized solution to the dynamic programming equation if we wish to use the dynamic programming approach. However, we may encounter a serious lack of uniqueness when dealing with generalized solutions. Crandall and Lions [2] introduced the concept of the viscosity solution. For a large class of optimal control problems, the value function is the unique viscosity solution of the related dynamic programming equation in the case when the value function is not smooth enough to be a classical solution. For more on viscosity solutions, see Fleming and Soner [8].

The optimal control policy may not exist in some optimal control problems. Then, for some points $(t,x)$, the infimum will not be achieved in the dynamic programming equation. In this case, the dynamic programming approach can be used to obtain an $\varepsilon$-optimal (almost optimal) control policy.

In general, the dynamic programming equation is hard to solve. In some cases one has to resort to numerical solution of the dynamic programming equations. Typically, the state space and the control space are discretized, and the minimization is carried out for the finite number of states. However, the computational difficulties may be too restrictive for complex multidimensional problems.

Pontryagin's maximum principle, developed by L. S. Pontryagin [15], gives a necessary condition that must hold on an optimal trajectory. Simply stated, if $u^*(s)$ and $x^*(s)$ represent the optimal control and the state trajectory, then there exists an $R^n$-valued function $P(s)$, called an adjoint variable, such that together $u^*(s)$, $x^*(s)$ and $P(s)$ satisfy
$$\frac{dx^*(s)}{ds} = H_p(s, x^*(s), P(s), u^*(s)), \qquad \frac{dP(s)}{ds} = -H_x(s, x^*(s), P(s), u^*(s)),$$
and for all $s$, $0 \le s \le T$, the optimal control $u^*(s)$ is the value of $v$ maximizing $H(s, x^*(s), P(s), v)$, i.e., for all $v \in U(s,x)$,
$$H(s, x^*(s), P(s), v) \le H(s, x^*(s), P(s), u^*(s)).$$
If $V$ is differentiable at each point $(s, x^*(s))$ of the optimal trajectory, then a candidate for an adjoint variable is $P(s) = V_x(s, x^*(s))$. Under certain conditions, Pontryagin's maximum principle can also be a sufficient condition for optimality.

Let $y(s)$ be a jump Markov process, and suppose that the state of a system at time $s$ is described by a stochastic process $x(s) \in R^n$ satisfying the system of differential equations

(1) $\quad \dfrac{dx(s)}{ds} = a(s, x(s), y(s), u(s)), \quad 0 \le t \le s \le T,$

with initial condition

(2) $\quad x(t) = x \in R^n.$

In (1), $u(s)$ is a parameter whose value we can choose at any instant in a set $U \subset R^m$ in order to control the process $x(s)$. Thus the control $u(s) = u(s,\omega)$ is a stochastic process. Since our decision at time $s$ must be based upon what has happened up to time $s$, the function $\omega \to u(s,\omega)$ must be measurable with respect to $\mathcal{F}_s = \sigma\{(x(r), y(r)),\ r \le s\}$. The Markovian nature of the problem suggests that it might suffice to consider control processes of the form $u(s) = \gamma(s, x(s), y(s))$.

Suppose that the performance criterion is given by

(3) $\quad J(t,x,y,u(\cdot)) = E_{t,x,y}\Big\{ \displaystyle\int_t^T \Phi(s, x(s), y(s), u(s))\,ds + \Psi(x(T), y(T)) \Big\},$

where $E_{t,x,y}$ denotes the expectation given that the process starts at time $t$ from the state $(x,y)$. The value function of the problem is
$$V(t,x,y) = \inf_{u(\cdot)} J(t,x,y,u(\cdot)).$$
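Before turning to the precise assumptions, a short simulation sketch may help fix the structure of the model (1)-(3). Everything in it is an illustrative assumption, not the dissertation's setting: a two-state $y(s)$ with constant jump intensities, scalar $x$, a particular drift $a$, a particular feedback rule $\gamma$, and simple costs.

```python
import random

# A simulation sketch of the controlled system (1)-(3).  Assumptions (all
# illustrative, none from the dissertation): Y = {0, 1} with constant jump
# intensities, scalar x, drift a(s,x,y,u), Markov feedback gamma, and simple
# running and terminal costs.

T, dt = 1.0, 1e-3
lam = {0: 2.0, 1: 3.0}                    # jump intensity of y(s) in each state

def a(s, x, y, u):                        # dynamics in (1)
    return (1.0 if y == 0 else -1.0) * x + u

def gamma(s, x, y):                       # a control of the form u(s) = gamma(s, x(s), y(s))
    return -x                             # illustrative choice, certainly not optimal

def Phi(s, x, y, u):                      # running cost
    return x * x + u * u

def Psi(x, y):                            # terminal cost
    return abs(x)

def one_run(x0, y0):
    s, x, y, cost = 0.0, x0, y0, 0.0
    next_jump = random.expovariate(lam[y])
    while s < T:
        while next_jump < min(s + dt, T): # process jumps of y(s) inside this step
            y = 1 - y
            next_jump += random.expovariate(lam[y])
        u = gamma(s, x, y)
        cost += Phi(s, x, y, u) * dt      # accumulate the running cost
        x += a(s, x, y, u) * dt           # Euler step between jumps
        s += dt
    return cost + Psi(x, y)

# Monte Carlo estimate of the performance criterion J(0, x, y, u(.)) in (3)
runs = [one_run(0.5, 0) for _ in range(2000)]
print(sum(runs) / len(runs))
```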
In this dissertation, we derive the partial differential equation of dynamic programming for the value function $V(t,x,y)$. We also obtain an integral equation for the value function and use it to construct a sequence of functions that converge uniformly to $V(t,x,y)$ from above. In fact, each function in the sequence is a performance function of a control that is selected from a class of controls $\mathcal{M}$ that is more general than $\mathcal{U}$. We will also show that if there is an optimal control in the class $\mathcal{U}$, then that control is also optimal in the class $\mathcal{M}$. Therefore, while constructing the sequence of functions that converge uniformly to the value function $V(t,x,y)$ from above, we also construct a sequence of controls that are $\varepsilon$-optimal in the class $\mathcal{M}$.

CHAPTER 1

PRELIMINARIES

Let $(\Omega, \mathcal{F}, P)$ be a probability space. Let $Y$ be a complete separable metric space, and let $\mathcal{B}_Y$ denote the Borel $\sigma$-algebra on $Y$. Let $y(s)$ be a jump Markov process with values in $Y$, defined on $[0,T]$, with transition probabilities $P(t,y,s,A)$, $0 \le t \le s$, $y \in Y$, $A \in \mathcal{B}_Y$. Suppose that for all $t \in [0,T)$, $y \in Y$, $A \in \mathcal{B}_Y$,
$$\frac{1}{s-t}\big[ P(t,y,s,A) - 1_A(y) \big] \to \Pi(t,y,A)$$
uniformly in $(t,y,A)$ as $s \to t$, $s > t$. Suppose that for fixed $(y,A)$ ($y \in Y$, $A \in \mathcal{B}_Y$), $\Pi(t,y,A)$ is continuous in $t$ uniformly in $(y,A)$. Let
$$\lambda(t,y) = \Pi(t,y, Y \setminus \{y\}) = -\Pi(t,y,\{y\}),$$
and suppose that for some $K$, $\lambda(t,y) \le K$ for all $t \in [0,T)$, $y \in Y$.

Suppose that the values of the controls are restricted to a closed subset $U$ of $R^m$. Let $Q$ denote $[0,T) \times R^n \times Y$ and $\bar{Q}$ denote $[0,T] \times R^n \times Y$. Let the vector function $a(t,x,y,u): \bar{Q} \times U \to R^n$ and the running (instantaneous) cost $\Phi(t,x,y,u): \bar{Q} \times U \to R$ be continuous in $(t,x,u)$ uniformly in $y$ and differentiable in $(t,x,u)$ for each $y \in Y$. Suppose also that $a_t(t,x,y,u)$, $a_x(t,x,y,u)$, $a_u(t,x,y,u)$, $\Phi_t(t,x,y,u)$, $\Phi_x(t,x,y,u)$, $\Phi_u(t,x,y,u)$ are all continuous in $(t,x,u)$ uniformly in $y$, and $a_u(t,x,y,u)$ and $\Phi_u(t,x,y,u)$ are continuously differentiable in $(t,x,u)$ for each $y \in Y$. In addition, suppose that for all $(s,x,y,u) \in \bar{Q} \times U$:

a) for suitable $B_1$, $|a(s,x,y,u)| \le B_1(1 + |x|)$;

b) for suitable $B_2(R)$, $\|a_x(s,x,y,u)\| \le B_2(R)$ whenever $|x| \le R$;

c) there exists $u_0 \in U$ such that for suitable $C_1$, $\Phi(s,x,y,u) \ge -C_1$ and $\Phi(s,x,y,u_0) \le C_1$;

d) for suitable $C_2(R)$, $|\Phi_x(s,x,y,u)| \le C_2(R)$ whenever $|x| \le R$.

Let the terminal cost $\Psi(x,y): R^n \times Y \to R$ be continuous in $x$ uniformly in $y$ and differentiable in $x$ for each $y \in Y$, with $\Psi_x(x,y)$ continuous in $x$ uniformly in $y$. Suppose also that for all $(x,y) \in R^n \times Y$:

e) for suitable $D_1$, $|\Psi(x,y)| \le D_1$;

f) for suitable $D_2(R)$, $|\Psi_x(x,y)| \le D_2(R)$ whenever $|x| \le R$.

Let $\tau_t^0(\omega) = t$, and for $k = 1, 2, 3, \dots$, let
$$\tau_t^k(\omega) = \inf\{ s > \tau_t^{k-1}(\omega) : y(s,\omega) \ne y(\tau_t^{k-1}(\omega), \omega) \} \wedge T,$$
i.e., $\tau_t^k(\omega)$ is the time of the $k$-th jump of the process $y(s,\omega)$ after time $t$.

With this notation, (1) becomes
$$\frac{dx(s)}{ds} = a(s, x(s), y(\tau_t^k), u(s)), \quad \tau_t^k \le s < \tau_t^{k+1}.$$
We consider the class of controls
$$\mathcal{U} = \{u(\cdot) : u(s) = u(s; \tau_t^k, x(\tau_t^k), y(\tau_t^k)),\ \tau_t^k \le s < \tau_t^{k+1},\ u(\cdot;t,x,y) \in PC([t,T);U),\ k = 0, 1, 2, \dots\},$$
where $PC([t,T);U)$ denotes the set of all piecewise continuous functions $u(s): [t,T) \to U$ continuous from the right; after each jump of $y(s)$, the intrajump control function restarts from the current state.

For $t \le s \le T$, let
$$\Lambda(t,s,y) = P(\tau_t^1 > s \mid y(t) = y) = \exp\Big\{ -\int_t^s \lambda(r,y)\,dr \Big\}.$$
Then
$$P(\tau_t^1 \le s,\ y(\tau_t^1) \in A \mid y(t) = y) = \int_t^s \Lambda(t,r,y)\, \Pi(r,y,A \setminus \{y\})\,dr.$$
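Since $\lambda(t,y) \le K$, the first jump time $\tau_t^1$, whose survival function is $\Lambda(t,s,y)$, can be sampled exactly by thinning a rate-$K$ Poisson clock. The sketch below illustrates this; the particular intensity used is an assumed example, not one appearing in the text.

```python
import math, random

# A sketch of sampling the first jump time tau_t^1 by thinning, valid because
# lambda(t, y) <= K.  The intensity below is an assumed example (with K = 3),
# not one appearing in the text.

K = 3.0

def lam(t, y):                            # time-dependent intensity, lam <= K
    return 1.0 + 2.0 * math.sin(math.pi * t) ** 2

def first_jump(t, y, T):
    """Sample tau_t^1 (returns T if no jump before T).

    Candidates come from a rate-K Poisson clock and are accepted with
    probability lam(s, y) / K, so the accepted time has survival function
    P(tau_t^1 > s | y(t) = y) = Lambda(t, s, y) = exp(-int_t^s lam(r, y) dr).
    """
    s = t
    while True:
        s += random.expovariate(K)
        if s >= T:
            return T
        if random.random() < lam(s, y) / K:
            return s

# Empirical check of Lambda(0, 1/2, y): here int_0^{1/2} lam(r, y) dr = 1
samples = [first_jump(0.0, 0, 1.0) for _ in range(20000)]
emp = sum(tau > 0.5 for tau in samples) / len(samples)
print(emp, math.exp(-1.0))
```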
Let $D_A$ be the set of all functions $f(t,x,y): \bar{Q} \to R$ that are bounded and continuous in $(t,x)$ uniformly in $y$, and such that $f_x(t,x,y)$ exists for all $(t,x,y) \in \bar{Q}$, and $f_x(t,x,y)$ is also bounded and continuous in $(t,x)$ uniformly in $y$. For $v \in U$, define

(7) $\quad A^v f(t,x,y) = f_x(t,x,y) \cdot a(t,x,y,v) - G_t f(t,x,y)$

for functions $f(t,x,y) \in D_A$, where
$$G_t \phi(y) = -\int_Y [\phi(y') - \phi(y)]\, \Pi(t,y,dy').$$

Lemma 1.
a) For every $(t,x,y) \in \bar{Q}$, $|V(t,x,y)| \le C_1(T-t) + D_1$.
b) For every $t \in [0,T]$, $y \in Y$, and every $x, x' \in R^n$,
$$|V(t,x,y) - V(t,x',y)| \le M_R\, |x - x'|$$
whenever $|x| \le R$, $|x'| \le R$, where
$$M_R = \Big( \frac{C_2(R_1)}{B_2(R_1)} + D_2(R_1) \Big) e^{B_2(R_1)(T-t)} - \frac{C_2(R_1)}{B_2(R_1)}, \qquad R_1 = (1+R)\,e^{B_1(T-t)}.$$

Proof: a) Since $\Phi(s,x,y,u) \ge -C_1$ and $|\Psi(x,y)| \le D_1$, we have $J(t,x,y,u(\cdot)) \ge -[C_1(T-t) + D_1]$ for every control $u(\cdot)$, while for the control identically equal to $u_0$, condition (c) gives $J \le C_1(T-t) + D_1$. Hence $|V(t,x,y)| \le C_1(T-t) + D_1$.

b) For $\varepsilon > 0$, there exists a control $u_\varepsilon(\cdot) \in \mathcal{U}$ such that $J(t,x,y,u_\varepsilon(\cdot)) \le V(t,x,y) + \varepsilon$ for all $(t,x,y) \in Q$. Let $|x| \le R$, $|x'| \le R$, and let $x(s)$ be the solution of the system of differential equations
$$\frac{dx(s)}{ds} = a(s, x(s), y(s), u_\varepsilon(s)), \quad t \le s \le T, \qquad x(t) = x \in R^n.$$
Let $u'(\cdot)$ be such that $u'(r; s, x, y) = u_\varepsilon(r; s, x(s), y)$ for all $(s,x,y) \in Q$; that is, along the primed trajectory, $u'(\cdot)$ uses the intrajump functions that $u_\varepsilon(\cdot)$ generates along the unprimed one. Let $x'(s)$ be the solution of the system of differential equations
$$\frac{dx'(s)}{ds} = a(s, x'(s), y(s), u'(s)), \quad t \le s \le T, \qquad x'(t) = x' \in R^n.$$
Then
$$|x(s) - x'(s)| \le |x - x'| + \int_t^s B_2(R_1)\,|x(r) - x'(r)|\,dr, \quad t \le s \le T,$$
and therefore, by Gronwall's inequality,
$$|x(s) - x'(s)| \le |x - x'|\, e^{B_2(R_1)(s-t)}, \quad t \le s \le T.$$
Hence
$$|J(t,x,y,u_\varepsilon(\cdot)) - J(t,x',y,u'(\cdot))| \le \int_t^T C_2(R_1)\,|x(s) - x'(s)|\,ds + D_2(R_1)\,|x(T) - x'(T)| \le M_R\,|x - x'|,$$
so that
$$V(t,x',y) \le J(t,x',y,u'(\cdot)) \le J(t,x,y,u_\varepsilon(\cdot)) + M_R|x-x'| \le V(t,x,y) + \varepsilon + M_R|x-x'|.$$
Interchanging the roles of $x$ and $x'$ and letting $\varepsilon \to 0$ yields the assertion.

CHAPTER 2

DYNAMIC PROGRAMMING

Lemma 2. For every $(t,x,y) \in Q$ and every $s$, $t \le s \le T$,
$$V(t,x,y) = \inf_{u(\cdot) \in \mathcal{U}} E_{t,x,y}\Big\{ \int_t^{s \wedge \tau_t^1} \Phi(r, x(r), y(r), u(r))\,dr + V(s \wedge \tau_t^1,\, x(s \wedge \tau_t^1),\, y(s \wedge \tau_t^1)) \Big\}.$$

Proof: For $\varepsilon > 0$, there exists a control $u_\varepsilon(\cdot) \in \mathcal{U}$ such that for every $(t',x',y') \in (t,T) \times R^n \times Y$,
$$J(t',x',y',u_\varepsilon(\cdot)) \le V(t',x',y') + \varepsilon.$$
Given $u(\cdot) \in \mathcal{U}$ and $t \le s \le T$, define
$$\tilde{u}_{\varepsilon,s}(r; t',x',y') = u(r; t',x',y') \cdot 1_{\{r < s \wedge \tau_t^1\}} + u_\varepsilon(r; t',x',y') \cdot (1 - 1_{\{r < s \wedge \tau_t^1\}}),$$
i.e., $\tilde{u}_{\varepsilon,s}(\cdot)$ follows $u(\cdot)$ up to time $s \wedge \tau_t^1$ and switches to the almost optimal control afterwards. Then
$$V(t,x,y) \le J(t,x,y,\tilde{u}_{\varepsilon,s}(\cdot)) \le E_{t,x,y}\Big\{ \int_t^{s \wedge \tau_t^1} \Phi(r,x(r),y(r),u(r))\,dr + V(s \wedge \tau_t^1, x(s \wedge \tau_t^1), y(s \wedge \tau_t^1)) \Big\} + \varepsilon.$$
Since $\varepsilon > 0$ is arbitrary, we get
$$V(t,x,y) \le E_{t,x,y}\Big\{ \int_t^{s \wedge \tau_t^1} \Phi(r,x(r),y(r),u(r))\,dr + V(s \wedge \tau_t^1, x(s \wedge \tau_t^1), y(s \wedge \tau_t^1)) \Big\}$$
for every $u(\cdot) \in \mathcal{U}$. On the other hand, for every $\varepsilon > 0$ there exists $u_\varepsilon(\cdot) \in \mathcal{U}$ for which
$$\varepsilon + V(t,x,y) \ge J[t,x,y,u_\varepsilon(\cdot)] = E_{t,x,y}\Big\{ \int_t^{s \wedge \tau_t^1} \Phi(r,x(r),y(r),u_\varepsilon(r))\,dr + J(s \wedge \tau_t^1, x(s \wedge \tau_t^1), y(s \wedge \tau_t^1), u_\varepsilon(\cdot)) \Big\}$$
$$\ge E_{t,x,y}\Big\{ \int_t^{s \wedge \tau_t^1} \Phi(r,x(r),y(r),u_\varepsilon(r))\,dr + V(s \wedge \tau_t^1, x(s \wedge \tau_t^1), y(s \wedge \tau_t^1)) \Big\},$$
which gives the reverse inequality.

Remark. Only $u(\cdot; t,x,y)$ is used on $[t, s \wedge \tau_t^1)$; no transition to the next intrajump control occurs. For $u(\cdot;t,x,y) \in PC([t,T);U)$, denote by $x_{t,y}^{u(\cdot)}(s,x)$ the solution of
$$\frac{dx(s)}{ds} = a(s, x(s), y, u(s;t,x,y)), \qquad x(t) = x,$$
i.e., the trajectory of the system before the first jump, with $y$ held fixed. Then the assertion of Lemma 2 can be restated in the following way:
$$V(t,x,y) = \inf_{u(\cdot;t,x,y) \in PC([t,T);U)} \Big\{ \int_t^s \Lambda(t,r,y)\,\Phi(r, x_{t,y}^{u(\cdot)}(r,x), y, u(r;t,x,y))\,dr + \Lambda(t,s,y)\, V(s, x_{t,y}^{u(\cdot)}(s,x), y)$$
$$+ \int_t^s \Lambda(t,r,y) \int_{Y \setminus \{y\}} V(r, x_{t,y}^{u(\cdot)}(r,x), y')\,\Pi(r,y,dy')\,dr \Big\}.$$
For any $u(\cdot;t,x,y) \in PC([t,T);U)$, as $s \to t$,
$$\frac{1}{s-t}\int_t^s \Lambda(t,r,y)\,\Phi(r, x_{t,y}^{u(\cdot)}(r,x), y, u(r;t,x,y))\,dr \to \Phi(t,x,y,u(t;t,x,y)),$$
$$\frac{1}{s-t}\Big[\Lambda(t,s,y)\,V(s, x_{t,y}^{u(\cdot)}(s,x), y) - V(t,x,y)\Big] \to V_t(t,x,y) + V_x(t,x,y) \cdot a(t,x,y,u(t;t,x,y)) - \lambda(t,y)V(t,x,y),$$
$$\frac{1}{s-t}\int_t^s \Lambda(t,r,y) \int_{Y \setminus \{y\}} V(r, x_{t,y}^{u(\cdot)}(r,x), y')\,\Pi(r,y,dy')\,dr \to -G_t V(t,x,y) + \lambda(t,y)V(t,x,y).$$
Combining these limits (the $\lambda V$ terms cancel), we conclude that $V_t(t,x,y)$ exists and
$$V_t(t,x,y) + \inf_{v \in U}\big\{\Phi(t,x,y,v) + A^v V(t,x,y)\big\} = 0,$$
the partial differential equation of dynamic programming. The boundary (terminal) condition $V(T,x,y) = \Psi(x,y)$ immediately follows from the definition of $V(t,x,y)$.

In the easier case when $y(s)$ is a finite-state Markov chain, the dynamic programming equation has the form
$$V_t(t,x,y) + \inf_{v \in U}\big\{\Phi(t,x,y,v) + V_x(t,x,y) \cdot a(t,x,y,v)\big\} + \sum_{y' \ne y} [V(t,x,y') - V(t,x,y)]\,\Pi(t,y,\{y'\}) = 0,$$
a system of partial differential equations indexed by $y \in Y$ (note that $Y$ is a finite set now). Fleming and Rishel [7] derive the continuous-time dynamic programming equation of optimal stochastic control theory
$$V_t(t,y) + \inf_{v}\big\{\Phi(t,y,v) + A^v(t)V(t,y)\big\} = 0$$
for a somewhat more general case, where $A^v(s)$ is the generator of the controlled Markov process $y(s)$. In our problem, the controlled process $x(s)$ is not a Markov process, but the two-component process $(x(s), y(s))$ is a Markov process, and $A^v$ is its generator.
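When $Y$ is finite, the system above can be attacked numerically by marching backwards in time on a grid in $x$. The following sketch, for two states, scalar $x$, a three-point control grid, and illustrative data ($a$, $\Phi$, $\Psi$, $\Pi$ all assumed here), uses an explicit upwind scheme; it is a rough illustration of the structure of the system, not a production solver.

```python
import numpy as np

# A finite-difference sketch of the finite-state dynamic programming system
# above: Y = {0, 1}, scalar x, a three-point control grid, explicit backward
# time-stepping with upwind differences for V_x.  All data (a, Phi, Psi, Pi)
# are assumed for illustration.

T, nt, nx = 1.0, 2000, 101
dt = T / nt
xs = np.linspace(-2.0, 2.0, nx)
dx = xs[1] - xs[0]
U = (-1.0, 0.0, 1.0)
Pi = {0: 2.0, 1: 3.0}                     # Pi(t, y, {y'}) for the other state y'

def a(x, y, u):   return (1.0 if y == 0 else -1.0) * x + u
def Phi(x, y, u): return x**2 + u**2
def Psi(x, y):    return np.abs(x)

V = np.stack([Psi(xs, 0), Psi(xs, 1)])    # terminal condition V(T, x, y) = Psi(x, y)

for n in range(nt):                       # march from s = T down to s = 0
    Vnew = np.empty_like(V)
    for y in (0, 1):
        Dp = np.empty(nx); Dm = np.empty(nx)          # one-sided differences
        Dp[:-1] = (V[y, 1:] - V[y, :-1]) / dx; Dp[-1] = Dp[-2]
        Dm[1:]  = (V[y, 1:] - V[y, :-1]) / dx; Dm[0]  = Dm[1]
        best = np.full(nx, np.inf)
        for u in U:
            drift = a(xs, y, u)
            Vx = np.where(drift > 0, Dp, Dm)          # upwind choice for V_x
            best = np.minimum(best, Phi(xs, y, u) + drift * Vx)
        jump = Pi[y] * (V[1 - y] - V[y])              # sum over y' != y
        Vnew[y] = V[y] + dt * (best + jump)           # V_t = -(inf{...} + jump)
    V = Vnew

print(V[0, nx // 2])                      # approximation to V(0, 0, y = 0)
```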
Lemma 3. (Dynkin's formula) Let $f(t,x,y) \in D_A$, and suppose $f_t(t,x,y)$ exists and is bounded and continuous in $(t,x)$ uniformly in $y$. Let $u(\cdot) \in \mathcal{U}$, and let $x(s)$ be the corresponding trajectory of the system. Then for $0 \le t \le s \le T$,
$$E_{t,x,y}\{f(s,x(s),y(s))\} - f(t,x,y) = E_{t,x,y}\Big\{ \int_t^s \big[ f_r(r,x(r),y(r)) + A^{u(r)} f(r,x(r),y(r)) \big]\,dr \Big\}.$$

Proof:
$$f(s,x(s),y(s)) - f(t,x,y) = \sum_{k \ge 0} \big[ f(s \wedge \tau_t^{k+1}, x(s \wedge \tau_t^{k+1}), y(s \wedge \tau_t^k)) - f(s \wedge \tau_t^k, x(s \wedge \tau_t^k), y(s \wedge \tau_t^k)) \big]$$
$$+ \sum_{k \ge 0} \big[ f(s \wedge \tau_t^{k+1}, x(s \wedge \tau_t^{k+1}), y(s \wedge \tau_t^{k+1})) - f(s \wedge \tau_t^{k+1}, x(s \wedge \tau_t^{k+1}), y(s \wedge \tau_t^k)) \big] = \Sigma_1 + \Sigma_2.$$
From the Fundamental Theorem of Calculus,
$$E_{t,x,y}\{\Sigma_1\} = E_{t,x,y}\Big\{ \int_t^s \big[ f_r(r,x(r),y(r)) + f_x(r,x(r),y(r)) \cdot a(r,x(r),y(r),u(r)) \big]\,dr \Big\},$$
and, compensating the jumps of $y(s)$ by their intensities,
$$E_{t,x,y}\{\Sigma_2\} = E_{t,x,y}\Big\{ \int_t^s \int_Y \big[ f(r,x(r),y') - f(r,x(r),y(r)) \big]\,\Pi(r,y(r),dy')\,dr \Big\} = -E_{t,x,y}\Big\{ \int_t^s G_r f(r,x(r),y(r))\,dr \Big\}.$$
Adding the two expressions gives the assertion.

Theorem 2. (Verification theorem) Let $W(t,x,y) \in D_A$ be such that $W_t(t,x,y)$ exists and is bounded and continuous in $(t,x)$ uniformly in $y$, and $W(T,x,y) = \Psi(x,y)$ for all $(x,y)$.

a) If for all $(t,x,y) \in Q$ and all $v \in U$,
$$W_t(t,x,y) + \Phi(t,x,y,v) + A^v W(t,x,y) \ge 0,$$
then $W(t,x,y) \le V(t,x,y)$ on $Q$.

b) If, in addition, we can find $u^*(t,x,y) \in U$ such that
$$W_t(t,x,y) + \Phi(t,x,y,u^*(t,x,y)) + A^{u^*(t,x,y)} W(t,x,y) = 0,$$
and if $u^*(\cdot) = \{u^*(s;t,x,y) = u^*(s, x_{t,y}^{u^*(\cdot)}(s,x), y),\ (t,x,y) \in Q,\ t \le s < T\} \in \mathcal{U}$, then $u^*(\cdot)$ is optimal.

c) If, in addition, we can find $\gamma_\varepsilon(t,x,y) \in U$ such that
$$W_t(t,x,y) + \Phi(t,x,y,\gamma_\varepsilon(t,x,y)) + A^{\gamma_\varepsilon(t,x,y)} W(t,x,y) \le \frac{\varepsilon}{T},$$
and if $u_\varepsilon(\cdot) = \{u_\varepsilon(s;t,x,y) = \gamma_\varepsilon(s, x_{t,y}^{u_\varepsilon(\cdot)}(s,x), y),\ (t,x,y) \in Q,\ t \le s < T\} \in \mathcal{U}$, then $u_\varepsilon(\cdot)$ is $\varepsilon$-optimal, i.e., $J(t,x,y,u_\varepsilon(\cdot)) \le V(t,x,y) + \varepsilon$ for all $(t,x,y) \in Q$.

Proof: a) Let $u(\cdot) \in \mathcal{U}$. Since for all $(t,x,y) \in Q$,
$$W_t(t,x,y) + A^{u(t;t,x,y)} W(t,x,y) \ge -\Phi(t,x,y,u(t;t,x,y)),$$
Dynkin's formula (Lemma 3) yields
$$E_{t,x,y}\{W(T,x(T),y(T))\} = W(t,x,y) + E_{t,x,y}\Big\{ \int_t^T \big[W_s(s,x(s),y(s)) + A^{u(s)}W(s,x(s),y(s))\big]\,ds \Big\}$$
$$\ge W(t,x,y) - E_{t,x,y}\Big\{ \int_t^T \Phi(s,x(s),y(s),u(s))\,ds \Big\}.$$
Since $W(T,x,y) = \Psi(x,y)$, this gives $W(t,x,y) \le J(t,x,y,u(\cdot))$. Taking the infimum over $u(\cdot) \in \mathcal{U}$, $W(t,x,y) \le V(t,x,y)$.

b) Applying Dynkin's formula to $u^*(\cdot)$, the inequality above becomes an equality, so $W(t,x,y) = J(t,x,y,u^*(\cdot)) \ge V(t,x,y)$; together with (a), $J(t,x,y,u^*(\cdot)) = V(t,x,y)$, and $u^*(\cdot)$ is optimal.

c) Applying Dynkin's formula to $u_\varepsilon(\cdot)$ gives $J(t,x,y,u_\varepsilon(\cdot)) \le W(t,x,y) + \varepsilon(T-t)/T \le V(t,x,y) + \varepsilon$.

CHAPTER 3

REDUCTION TO A FAMILY OF DETERMINISTIC CONTROL PROBLEMS

Let $\mathcal{B}$ be the set of functions $f(t,x,y): \bar{Q} \to R$ such that $|f(t,x,y)| \le C_1(T-t) + D_1$ for all $(t,x,y) \in \bar{Q}$, and let
$$\mathcal{U}_0 = \{u(\cdot) \in \mathcal{U} : J(t,x,y,u(\cdot)) \in \mathcal{B}\}.$$
Notice that $\mathcal{U}_0$ is non-empty, since $u_0(\cdot)$, for which $u_0(s;t,x,y) = u_0$ for all $(t,x,y) \in Q$, $t \le s < T$, is in $\mathcal{U}_0$; and if there is an optimal control in the class $\mathcal{U}$, then that control is in $\mathcal{U}_0$. Lemma 1(a) states that $V(t,x,y) \in \mathcal{B}$.

For $u(\cdot) \in \mathcal{U}$ and $f \in \mathcal{B}$, define
$$T^{u(\cdot)} f(t,x,y) = \int_t^T \Lambda(t,s,y)\, \Phi(s, x_{t,y}^{u(\cdot)}(s,x), y, u(s;t,x,y))\,ds + \Lambda(t,T,y)\, \Psi(x_{t,y}^{u(\cdot)}(T,x), y)$$
$$+ \int_t^T \Lambda(t,s,y) \int_{Y \setminus \{y\}} f(s, x_{t,y}^{u(\cdot)}(s,x), y')\, \Pi(s,y,dy')\,ds,$$
and
$$T f(t,x,y) = \inf_{u(\cdot;t,x,y) \in PC([t,T);U)} T^{u(\cdot)} f(t,x,y).$$

Lemma 4. Let $u(\cdot) \in \mathcal{U}_0$. Then:
a) $J(t,x,y,u(\cdot)) = T^{u(\cdot)} J(t,x,y,u(\cdot))$;
b) if $W \in \mathcal{B}$ and $W = T^{u(\cdot)} W$, then $W(t,x,y) = J(t,x,y,u(\cdot))$;
c) $J(t,x,y,u(\cdot))$ is the unique solution in $\mathcal{B}$ of the equation $f = T^{u(\cdot)} f$.

Proof: a) Splitting the expectation according to whether the first jump after $t$ occurs before $T$,
$$J(t,x,y,u(\cdot)) = E_{t,x,y}\Big[ 1_{\{\tau_t^1 > T\}} \Big\{ \int_t^T \Phi(s,x(s),y(s),u(s))\,ds + \Psi(x(T),y(T)) \Big\} \Big]$$
$$+ E_{t,x,y}\Big[ 1_{\{\tau_t^1 \le T\}} \Big\{ \int_t^T \Phi(s,x(s),y(s),u(s))\,ds + \Psi(x(T),y(T)) \Big\} \Big] = E_1 + E_2.$$
Straightforward calculations yield
$$E_1 = \Lambda(t,T,y) \Big[ \int_t^T \Phi(r, x_{t,y}^{u(\cdot)}(r,x), y, u(r;t,x,y))\,dr + \Psi(x_{t,y}^{u(\cdot)}(T,x), y) \Big]$$
and
$$E_2 = \int_t^T \Lambda(t,s,y) \int_{Y \setminus \{y\}} \Big[ \int_t^s \Phi(r, x_{t,y}^{u(\cdot)}(r,x), y, u(r;t,x,y))\,dr + J(s, x_{t,y}^{u(\cdot)}(s,x), y', u(\cdot)) \Big] \Pi(s,y,dy')\,ds;$$
integrating by parts in the running-cost terms, the assertion immediately follows.

b) Suppose $\sup_{(t,x,y)} |W(t,x,y) - J(t,x,y,u(\cdot))| = \eta > 0$, and let $(t,x,y) \in Q$ be such that
$$|W(t,x,y) - J(t,x,y,u(\cdot))| \ge \eta - e^{-KT}\,\frac{\eta}{4}.$$
Then
$$\eta - e^{-KT}\frac{\eta}{4} \le |W(t,x,y) - J(t,x,y,u(\cdot))| = |T^{u(\cdot)}W(t,x,y) - T^{u(\cdot)}J(t,x,y,u(\cdot))|$$
$$\le \int_t^T \Lambda(t,s,y) \int_{Y \setminus \{y\}} \big| W(s, x_{t,y}^{u(\cdot)}(s,x), y') - J(s, x_{t,y}^{u(\cdot)}(s,x), y', u(\cdot)) \big|\,\Pi(s,y,dy')\,ds$$
$$\le (1 - \Lambda(t,T,y))\,\eta \le (1 - e^{-KT})\,\eta.$$
Contradiction. Therefore $W(t,x,y) = J(t,x,y,u(\cdot))$.

c) Immediately follows from (a) and (b).

Corollary. Finding the optimal $u(\cdot) \in \mathcal{U}$ is equivalent to finding, for each $(t,x,y) \in Q$, an optimal deterministic open-loop control function $u(\cdot; t,x,y)$ minimizing
$$\int_t^T \Lambda(t,s,y)\, \tilde{\Phi}(s, x_{t,y}^{u(\cdot)}(s,x), y, u(s;t,x,y))\,ds + \Lambda(t,T,y)\, \Psi(x_{t,y}^{u(\cdot)}(T,x), y),$$
where
$$\tilde{\Phi}(s,x,y,u) = \Phi(s,x,y,u) + \int_{Y \setminus \{y\}} V(s,x,y')\, \Pi(s,y,dy').$$

Controls in the class $\mathcal{U}$ apply the same intrajump rule $u(\cdot;\cdot,\cdot,\cdot)$ after every jump. Consider the wider class
$$\mathcal{M} = \{u(\cdot) : u(s) = u_k(s; \tau_t^k, x(\tau_t^k), y(\tau_t^k)),\ \tau_t^k \le s < \tau_t^{k+1},\ u_k(\cdot;t,x,y) \in PC([t,T);U),\ k = 0, 1, 2, \dots\}.$$
Define $\nu(t)$ = number of jumps of $y(s)$ on $[0,t]$. The intrajump control function $u_{\nu(t)}(\cdot;t,x,y)$ is used from time $t$ up to the terminal time $T$ or the time of the first jump of $y(s)$ after $t$, whichever comes first. Now the control functions, and thus the state of the system $x(s)$, depend on the past of $y(s)$. Notice that controls $u(\cdot) \in \mathcal{U}$ are also in the class $\mathcal{M}$; they are constant in the index $k$. It can be seen from the Corollary that if there is an optimal control in the class $\mathcal{U}$, then that control is also optimal in the class $\mathcal{M}$, and therefore no advantage can be gained by going to the wider class $\mathcal{M}$.
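The reduction can be exercised numerically for a fixed control: by Lemma 4, $J(t,x,y,u_0(\cdot))$ is the unique fixed point of $T^{u_0}$ in $\mathcal{B}$, and iterating the operator contracts at rate at most $1 - e^{-KT}$. The sketch below does this by quadrature for an assumed two-state model with constant intensities and a constant control $u_0$; every model ingredient here is illustrative.

```python
import numpy as np

# A quadrature sketch of Lemma 4: for a fixed constant control u0, the
# performance function J(t, x, y, u0(.)) is the unique fixed point of T^{u0}
# in B, and the iteration f <- T^{u0} f contracts at rate at most 1 - e^{-KT}.
# Assumed model (all illustrative): Y = {0, 1}, constant intensities, scalar x.

T, nt, nx = 1.0, 60, 81
ts = np.linspace(0.0, T, nt)
xs = np.linspace(-3.0, 3.0, nx)
lam = (2.0, 3.0)
u0 = 0.0

def a(s, x, y):   return (1.0 if y == 0 else -1.0) * x + u0
def Phi(s, x, y): return x**2 + u0**2
def Psi(x, y):    return np.abs(x)

def flow(i, y):
    """Euler approximation of x_{t,y}(s, x): frozen-y trajectories started
    from every grid point x at time ts[i], at all grid times s >= ts[i]."""
    out = np.empty((nt - i, nx))
    out[0] = xs
    for j in range(1, nt - i):
        h = ts[i + j] - ts[i + j - 1]
        out[j] = out[j - 1] + h * a(ts[i + j - 1], out[j - 1], y)
    return out

def apply_T(f):                           # one application of T^{u0}; f: (nt, 2, nx)
    g = np.empty_like(f)
    for i in range(nt):
        for y in (0, 1):
            X = flow(i, y)
            s = ts[i:]
            Lam = np.exp(-lam[y] * (s - ts[i]))[:, None]   # Lambda(ts[i], s, y)
            # continuation value f(s, x_{t,y}(s,x), y') at the single y' != y,
            # by linear interpolation in x (clamped at the grid ends)
            cont = np.array([np.interp(X[j], xs, f[i + j, 1 - y])
                             for j in range(len(s))])
            integrand = Lam * (Phi(s[:, None], X, y) + lam[y] * cont)
            ds = np.diff(s)[:, None]                       # trapezoid rule in s
            g[i, y] = (0.5 * (integrand[1:] + integrand[:-1]) * ds).sum(axis=0) \
                      + Lam[-1] * Psi(X[-1], y)
    return g

f = np.zeros((nt, 2, nx))
for m in range(10):
    fn = apply_T(f)
    print(m, np.max(np.abs(fn - f)))      # geometric decay, rate <= 1 - e^{-KT}
    f = fn                                # f converges to J(., ., ., u0(.))
```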
Theorem 3. (Integral equation for the value function $V(t,x,y)$)
a) $V(t,x,y)$ satisfies the integral equation

(10) $\quad V(t,x,y) = TV(t,x,y),$

i.e.,
$$V(t,x,y) = \inf_{u(\cdot;t,x,y) \in PC([t,T);U)} \Big\{ \int_t^T \Lambda(t,s,y)\,\Phi(s, x_{t,y}^{u(\cdot)}(s,x), y, u(s;t,x,y))\,ds + \Lambda(t,T,y)\,\Psi(x_{t,y}^{u(\cdot)}(T,x), y)$$
$$+ \int_t^T \Lambda(t,s,y) \int_{Y \setminus \{y\}} V(s, x_{t,y}^{u(\cdot)}(s,x), y')\,\Pi(s,y,dy')\,ds \Big\}.$$
b) If $W \in \mathcal{B}$ and $W = TW$, then $W = V$.
c) $V(t,x,y)$ is the unique solution of (10) in $\mathcal{B}$.

Proof: a) For $\varepsilon > 0$, there exists a control $u_{\varepsilon,t}(\cdot) \in \mathcal{U}$ such that for every $(t',x',y') \in (t,T) \times R^n \times Y$,
$$J(t',x',y',u_{\varepsilon,t}(\cdot)) \le V(t',x',y') + \varepsilon.$$
Given an arbitrary $u(\cdot;t,x,y) \in PC([t,T);U)$, define
$$\tilde{u}_{\varepsilon,t}(r; t',x',y') = u(r; t',x',y') \cdot 1_{\{t' \le t\}} + u_{\varepsilon,t}(r; t',x',y') \cdot 1_{\{t' > t\}}$$
for all $(t',x',y') \in Q$ and $t \le r < T$; that is, $\tilde{u}_{\varepsilon,t}(\cdot)$ uses the given intrajump function until the first jump after $t$ and the almost optimal control afterwards. Then, by Lemma 4,
$$V(t,x,y) \le J(t,x,y,\tilde{u}_{\varepsilon,t}(\cdot)) = \int_t^T \Lambda(t,s,y)\,\Phi(s, x_{t,y}^{u(\cdot)}(s,x), y, u(s;t,x,y))\,ds + \Lambda(t,T,y)\,\Psi(x_{t,y}^{u(\cdot)}(T,x), y)$$
$$+ \int_t^T \Lambda(t,s,y)\Big\{ \int_{Y \setminus \{y\}} J(s, x_{t,y}^{u(\cdot)}(s,x), y', u_{\varepsilon,t}(\cdot))\,\Pi(s,y,dy') \Big\}\,ds$$
$$\le T^{u(\cdot)} V(t,x,y) + \varepsilon\,(1 - \Lambda(t,T,y)).$$
Since $\varepsilon > 0$ is arbitrary, $V(t,x,y) \le T^{u(\cdot)}V(t,x,y)$, and taking the infimum over $u(\cdot;t,x,y)$ gives $V(t,x,y) \le TV(t,x,y)$. On the other hand, since $J(s,\cdot,\cdot,u(\cdot)) \ge V(s,\cdot,\cdot)$, Lemma 4 gives, for any $u(\cdot) \in \mathcal{U}$,
$$J(t,x,y,u(\cdot)) = T^{u(\cdot)} J(t,x,y,u(\cdot)) \ge T^{u(\cdot)} V(t,x,y) \ge TV(t,x,y);$$
taking the infimum over $u(\cdot) \in \mathcal{U}$ yields $V(t,x,y) \ge TV(t,x,y)$, and the assertion follows.

b) Suppose $\sup_{(t,x,y)} |W(t,x,y) - V(t,x,y)| = \eta > 0$, and let $(t,x,y) \in Q$ be such that
$$|W(t,x,y) - V(t,x,y)| \ge \eta - e^{-KT}\,\frac{\eta}{4}.$$
Without loss of generality, $W(t,x,y) > V(t,x,y)$. There exists a control $u(\cdot) \in \mathcal{U}$ such that
$$T^{u(\cdot)}V(t,x,y) \le TV(t,x,y) + e^{-KT}\,\frac{\eta}{4}.$$
Then
$$\eta - e^{-KT}\frac{\eta}{4} \le W(t,x,y) - V(t,x,y) = TW(t,x,y) - TV(t,x,y) \le T^{u(\cdot)}W(t,x,y) - T^{u(\cdot)}V(t,x,y) + e^{-KT}\frac{\eta}{4}$$
$$\le \int_t^T \Lambda(t,s,y)\Big\{ \int_{Y \setminus \{y\}} \big| W(s, x_{t,y}^{u(\cdot)}(s,x), y') - V(s, x_{t,y}^{u(\cdot)}(s,x), y') \big|\,\Pi(s,y,dy') \Big\}\,ds + e^{-KT}\frac{\eta}{4}$$
$$\le (1 - \Lambda(t,T,y))\,\eta + e^{-KT}\frac{\eta}{4} \le \eta - e^{-KT}\,\frac{3\eta}{4} < \eta - e^{-KT}\,\frac{\eta}{4}.$$
Contradiction. Therefore $W(t,x,y) = V(t,x,y)$.

c) Immediately follows from (a) and (b).

Consider $u_0(\cdot) \in \mathcal{U}_0$ for which $u_0(s;t,x,y) = u_0$ for all $(t,x,y) \in Q$, $t \le s < T$. Let $x_0(s)$ be the solution of the system of differential equations
$$\frac{dx_0(s)}{ds} = a(s, x_0(s), y(s), u_0), \quad 0 \le t \le s \le T,$$
with initial condition $x_0(t) = x \in R^n$. Define

(11) $\quad W_0(t,x,y) = J(t,x,y,u_0(\cdot)) = E_{t,x,y}\Big\{ \displaystyle\int_t^T \Phi(s, x_0(s), y(s), u_0)\,ds + \Psi(x_0(T), y(T)) \Big\},$

and for $k = 1, 2, 3, \dots$, let

(12) $\quad W_k(t,x,y) = T W_{k-1}(t,x,y).$

Theorem 4. $W_k(t,x,y)$ converges to $V(t,x,y)$ from above uniformly on $\bar{Q}$.

Proof: It is easy to see that $W_k(t,x,y) \ge V(t,x,y)$ for all $k = 0, 1, 2, \dots$. Since $W_0(t,x,y) \in \mathcal{B}$ and $V(t,x,y) \in \mathcal{B}$,
$$\|W_0 - V\| = \sup_{(t,x,y)} \{W_0(t,x,y) - V(t,x,y)\} \le 2\,[C_1 T + D_1].$$
Suppose that for some $k$,
$$\|W_k - V\| \le 2\,[C_1 T + D_1]\,(1 - e^{-KT})^k.$$
Then
$$W_{k+1}(t,x,y) = T W_k(t,x,y) \le TV(t,x,y) + \int_t^T \Lambda(t,s,y)\Big\{ \int_{Y \setminus \{y\}} 2\,[C_1T + D_1]\,(1-e^{-KT})^k\, \Pi(s,y,dy') \Big\}\,ds$$
$$\le V(t,x,y) + 2\,[C_1T + D_1]\,(1 - e^{-KT})^{k+1}.$$
The assertion of the theorem follows by mathematical induction.
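Theorem 4 also yields an a priori estimate of the number of applications of $T$ needed for a prescribed accuracy, since $\|W_k - V\| \le 2[C_1T + D_1](1 - e^{-KT})^k$. Here is a small worked instance with assumed constants:

```python
import math

# A small worked instance of Theorem 4's error estimate: with
# ||W_0 - V|| <= 2[C_1 T + D_1] and one factor (1 - e^{-KT}) per application
# of T, the iteration count for a target accuracy follows directly.  The
# constants C_1, D_1, K, T are assumed for illustration.

C1, D1, K, T = 1.0, 1.0, 3.0, 1.0
rho = 1.0 - math.exp(-K * T)              # contraction factor of T
bound0 = 2.0 * (C1 * T + D1)              # initial bound on ||W_0 - V||

eps = 1e-3
k = math.ceil(math.log(eps / bound0) / math.log(rho))
print(rho, k)                             # rho ~ 0.95: roughly 163 iterations

for i in range(0, k + 1, 40):             # the geometric envelope of ||W_i - V||
    print(i, bound0 * rho ** i)
```

Note that the contraction is slow when $KT$ is large, since $\rho = 1 - e^{-KT}$ is then close to 1.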
CHAPTER 4

FAMILY OF CONTROL PROBLEMS WITH "STOPPED" y(s)

Let
$$z_t^0(s) = y(t) = y(\tau_t^0), \quad t \le s \le T,$$
and for $k = 1, 2, \dots$ let
$$z_t^k(s) = y(s) \ \text{ for } t \le s < \tau_t^k, \qquad z_t^k(s) = y(\tau_t^k) \ \text{ for } \tau_t^k \le s \le T.$$
($z_t^k(s)$ is $y(s)$ stopped at the time of the $k$-th jump after $t$.) Note that for $k = 1, 2, \dots$ and $l = 0, 1, \dots, k$,
$$z_t^k(s) = z_{\tau_t^l}^{k-l}(s), \quad \tau_t^l \le s \le T.$$

For $k = 0, 1, 2, \dots$, consider the control problem where the state of a system is described by the system of differential equations
$$\frac{dx_k(s)}{ds} = a(s, x_k(s), z_t^k(s), u_k(s)), \quad t \le s \le T,$$
with initial condition $x_k(t) = x \in R^n$, and the performance criterion is given by
$$J_k(t,x,y,u_k(\cdot)) = E_{t,x,y}\Big\{ \int_t^T \Phi(s, x_k(s), z_t^k(s), u_k(s))\,ds + \Psi(x_k(T), z_t^k(T)) \Big\}.$$
Since, for $k = 0, 1, 2, \dots$, $z_t^k(s)$ has at most $k$ jumps on $[t,T]$, let us consider controls $u_k(\cdot)$ of the form
$$u_k(s) = u_{k,l}(s) = \tilde{u}_{k,l}(s; \tau_t^l, x_k(\tau_t^l), z_t^k(\tau_t^l)), \quad \tau_t^l \le s < \tau_t^{l+1},\ l = 0, 1, \dots, k-1,$$
$$u_k(s) = u_{k,k}(s) = \tilde{u}_{k,k}(s; \tau_t^k, x_k(\tau_t^k), z_t^k(\tau_t^k)), \quad \tau_t^k \le s \le T.$$
Let $\mathcal{U}_k$ be the set of all controls $u_k(\cdot)$ of this form. Note that $\bigcup_k \mathcal{U}_k = \mathcal{M}$. For $k = 0, 1, 2, \dots$, consider the value function
$$V_k(t,x,y) = \inf_{u_k(\cdot)} \{ J_k(t,x,y,u_k(\cdot)) \}.$$
Using arguments similar to the ones used earlier, we can prove the following lemma.

Lemma 5. If $J_k(t,x,y,u_k^*(\cdot)) = V_k(t,x,y)$ and $J_{k-1}(t,x,y,u_{k-1}^*(\cdot)) = V_{k-1}(t,x,y)$, then
a) $u_{k,l}^*(\cdot) = u_{k-1,l-1}^*(\cdot)$, $l = 1, 2, \dots, k$;
b) $u_{k,0}^*(\cdot)$ minimizes
$$\int_t^T \Phi_k(s, x_{t,y}^{u_{k,0}(\cdot)}(s,x), y, u_{k,0}(s))\,ds + \Psi_k(x_{t,y}^{u_{k,0}(\cdot)}(T,x), y),$$
where
$$\Phi_0(s,x,y,u) = \Phi(s,x,y,u), \qquad \Psi_0(x,y) = \Psi(x,y),$$
$$\Phi_k(s,x,y,u) = \Lambda(t,s,y)\Big\{ \Phi(s,x,y,u) + \int_{Y \setminus \{y\}} V_{k-1}(s,x,y')\,\Pi(s,y,dy') \Big\}, \qquad \Psi_k(x,y) = \Lambda(t,T,y)\,\Psi(x,y), \quad k = 1, 2, \dots$$

For every $y \in Y$, consider the deterministic control problem where the state of a system is described by the system of differential equations

(13) $\quad \dfrac{dx(s)}{ds} = a(s, x(s), y, u(s)), \quad t \le s \le T,$

with initial condition

(14) $\quad x(t) = x \in R^n,$

and the performance criterion is given by
$$F_0(t,x,y,u(\cdot)) = \int_t^T \Phi_k(s, x(s), y, u(s))\,ds + \Psi_k(x(T), y).$$
Suppose that $(\Phi_k)_x(s,x,y,u)$ exists for all $(s,x,y,u) \in (t,T) \times R^n \times Y \times U$. Suppose that $(\hat{x}(s), \hat{u}(s))$, $t \le s \le T$, is the optimal solution. Let $v(s): [t,T] \to R^m$ be a piecewise continuous function and set
$$u(s,\delta) = \hat{u}(s) + \delta\, v(s), \quad t \le s \le T.$$
Then (13) and (14) with $u(s) = u(s,\delta)$ have a one-parameter family of solutions $x(s,\delta)$, $t \le s \le T$, for $|\delta| < \delta_0$, which contains $\hat{x}(s)$ for $\delta = 0$. The functions $x(s,\delta)$ are continuous and have continuous derivatives with respect to $\delta$. Let
$$\Delta(s) = \frac{\partial x}{\partial \delta}(s, 0), \quad t \le s \le T.$$
Then $\Delta(s)$ satisfies the system of differential equations
$$\frac{d\Delta(s)}{ds} = a_x(s, \hat{x}(s), y, \hat{u}(s)) \cdot \Delta(s) + a_u(s, \hat{x}(s), y, \hat{u}(s)) \cdot v(s), \quad t \le s \le T,$$
with initial condition $\Delta(t) = 0 \in R^n$. Denote by $A(s)$, $t \le s \le T$, the $n \times n$ matrix-valued function satisfying the differential equation
$$\frac{dA(s)}{ds} = a_x(s, \hat{x}(s), y, \hat{u}(s)) \cdot A(s), \quad t \le s \le T,$$
with initial condition $A(t) = I_n$. Then
$$\Delta(s) = A(s) \int_t^s A^{-1}(r)\, a_u(r, \hat{x}(r), y, \hat{u}(r))\, v(r)\,dr, \quad t \le s \le T.$$
Set $f(\delta; v(\cdot)) = F_0(t,x,y,u(\cdot,\delta))$ for $|\delta| < \delta_0$. Then $f'(\delta; v(\cdot))$ exists and
$$f'(0; v(\cdot)) = (\Psi_k)_x(\hat{x}(T), y) \cdot \Delta(T) + \int_t^T \big[ (\Phi_k)_x(s, \hat{x}(s), y, \hat{u}(s)) \cdot \Delta(s) + (\Phi_k)_u(s, \hat{x}(s), y, \hat{u}(s)) \cdot v(s) \big]\,ds = \int_t^T \varphi(s) \cdot v(s)\,ds,$$
where
$$\varphi(s) = \Big( (\Psi_k)_x(\hat{x}(T), y)\,A(T) + \int_s^T (\Phi_k)_x(r, \hat{x}(r), y, \hat{u}(r))\,A(r)\,dr \Big) A^{-1}(s)\, a_u(s, \hat{x}(s), y, \hat{u}(s)) + (\Phi_k)_u(s, \hat{x}(s), y, \hat{u}(s)), \quad t \le s \le T.$$
If $(\hat{x}(s), \hat{u}(s))$, $t \le s \le T$, is the optimal solution, then for any piecewise continuous $v(s)$, $f(\delta; v(\cdot))$ has a local minimum at $\delta = 0$, so that $f'(0; v(\cdot)) = 0$ and $f''(0; v(\cdot)) \ge 0$. This proves the following lemma:

Lemma 6. If $(\hat{x}(s), \hat{u}(s))$, $t \le s \le T$, is the optimal solution, then $f'(0; v(\cdot)) = 0$ and $f''(0; v(\cdot)) \ge 0$ for any piecewise continuous $v(\cdot)$.

The relation $f'(0; v(\cdot)) = 0$ holds for any piecewise continuous $v(\cdot)$ if and only if

(16) $\quad \varphi(s) = 0 \in R^m$ almost everywhere on $[t,T]$.

Set
$$P(s) = \Big( (\Psi_k)_x(\hat{x}(T), y)\,A(T) + \int_s^T (\Phi_k)_x(r, \hat{x}(r), y, \hat{u}(r))\,A(r)\,dr \Big) \cdot A^{-1}(s), \quad t \le s \le T.$$
Then, since
$$\frac{d}{ds} A^{-1}(s) = -A^{-1}(s) \cdot a_x(s, \hat{x}(s), y, \hat{u}(s)),$$

(17) $\quad \dfrac{dP(s)}{ds} = -(\Phi_k)_x(s, \hat{x}(s), y, \hat{u}(s)) - P(s) \cdot a_x(s, \hat{x}(s), y, \hat{u}(s)), \quad t \le s \le T,$

with terminal condition $P(T) = (\Psi_k)_x(\hat{x}(T), y)$.
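The first-variation formula $f'(0; v(\cdot)) = \int_t^T \varphi(s) \cdot v(s)\,ds$ can be checked by finite differences. The sketch below does so for scalar $x$ and $u$ with frozen $y$, assembling $\varphi$ from the adjoint $P(s)$ of (17); the dynamics and costs standing in for $\Phi_k$ and $\Psi_k$ are assumed for this example only.

```python
import numpy as np

# A finite-difference check of the first-variation formula above,
# f'(0; v(.)) = int_t^T phi(s) . v(s) ds, with phi assembled from the adjoint
# P(s) of (17).  Scalar x and u, frozen y, and assumed costs standing in for
# Phi_k and Psi_k; none of this is the dissertation's model.

t, T, n = 0.0, 1.0, 4001
s = np.linspace(t, T, n)
h = s[1] - s[0]
x0 = 0.5

def trap(f):                              # trapezoid rule on the uniform grid
    return h * (f.sum() - 0.5 * (f[0] + f[-1]))

def a(s, x, u):   return -x + np.sin(u)   # a_x = -1, a_u = cos(u)
def Phi(s, x, u): return x**2 + u**2      # (Phi)_x = 2x, (Phi)_u = 2u
def Psi(x):       return x**2             # Psi_x = 2x

uhat = 0.3 * np.cos(2 * np.pi * s)        # a fixed control u^(s) (not optimal)
v = np.sin(np.pi * s)                     # perturbation direction v(s)

def solve_x(u):                           # Euler solution of (13)-(14)
    x = np.empty(n); x[0] = x0
    for i in range(n - 1):
        x[i + 1] = x[i] + h * a(s[i], x[i], u[i])
    return x

def F(u):                                 # the functional F_0(t, x, y, u(.))
    x = solve_x(u)
    return trap(Phi(s, x, u)) + Psi(x[-1])

xh = solve_x(uhat)

P = np.empty(n)                           # adjoint (17): dP/ds = -(Phi)_x - P a_x
P[-1] = 2.0 * xh[-1]                      # P(T) = Psi_x(x(T))
for i in range(n - 1, 0, -1):
    P[i - 1] = P[i] + h * (2.0 * xh[i] - P[i])

phi = P * np.cos(uhat) + 2.0 * uhat       # phi = P a_u + (Phi)_u
lhs = trap(phi * v)

d = 1e-5                                  # central difference in delta
rhs = (F(uhat + d * v) - F(uhat - d * v)) / (2 * d)
print(lhs, rhs)                           # agreement up to discretization error
```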
In this notation, (16) becomes

(16') $\quad \varphi(s) = P(s) \cdot a_u(s, \hat{x}(s), y, \hat{u}(s)) + (\Phi_k)_u(s, \hat{x}(s), y, \hat{u}(s)) = 0$ almost everywhere on $[t,T]$.

Suppose that

(18) $\quad \dfrac{\partial}{\partial u}\Big[ P(s) \cdot a_u(s, \hat{x}(s), y, u) + (\Phi_k)_u(s, \hat{x}(s), y, u) \Big]\Big|_{u = \hat{u}(s)} \ne 0, \quad t \le s \le T.$

Then, since $\hat{x}(s)$ and $P(s)$ are absolutely continuous, (16') implies that $\hat{u}(s)$ is continuous. Then (13) (with $u = \hat{u}$) and (17) imply that $\hat{x}(s)$ and $P(s)$ are of class $C^1$, and (16') in turn implies that $\hat{u}(s)$ is also of class $C^1$.

BIBLIOGRAPHY

[1] R. Bellman, Dynamic Programming, Princeton Univ. Press, Princeton, NJ (1957).

[2] M. G. Crandall and P.-L. Lions, Viscosity solutions of Hamilton-Jacobi equations, Trans. Amer. Math. Soc., 277 (1983), pp. 1--42.

[3] M. H. A. Davis, Piecewise-deterministic Markov processes: A general class of non-diffusion stochastic models, J. Roy. Statist. Soc. Ser. B, 46 (1984), pp. 353--388.

[4] M. H. A. Davis, Control of piecewise-deterministic processes via discrete-time dynamic programming, Stochastic Differential Systems (Bad Honnef, 1985), pp. 140--150, Lecture Notes in Control and Inform. Sci., 78, Springer, Berlin-New York, 1986.

[5] M. A. H. Dempster and J. J. Ye, Necessary and sufficient optimality conditions for control of piecewise deterministic Markov processes, Stochastics Stochastics Rep., 40 (1992), no. 3-4, pp. 125--145.

[6] M. A. H. Dempster and J. J. Ye, Generalized Bellman-Hamilton-Jacobi optimality conditions for a control problem with a boundary condition, Appl. Math. Optim., 33 (1996), no. 3, pp. 211--225.

[7] W. H. Fleming and R. W. Rishel, Deterministic and Stochastic Optimal Control, Applications of Mathematics, 1, Springer-Verlag, New York, 1975.

[8] W. H. Fleming and H. M. Soner, Controlled Markov Processes and Viscosity Solutions, Applications of Mathematics, 25, Springer-Verlag, New York, 1993.

[9] J. J. Florentin, Optimal control of systems with generalized Poisson inputs, J. Basic Engineering, 85 (1963), pp. 217--221.

[10] R. M. Goor, Existence of an optimal control for systems with jump Markov disturbances, SIAM J. Control Optimization, 14 (1976), no. 5, pp. 899--918.

[11] N. N. Krassovskii and E. A. Lidskii, Analytical design of controllers in stochastic systems with velocity-limited controlling action, Appl. Math. Mech., 25 (1961), pp. 627--643.

[12] N. V. Krylov, Introduction to the Theory of Diffusion Processes, Amer. Math. Soc., Providence, R.I. (1995).

[13] H. J. Kushner, On the stochastic maximum principle: Fixed time of control, J. Math. Anal. Appl., 11 (1965), pp. 78--92.

[14] E. A. Lidskii, Optimal control of systems with random properties, Appl. Math. Mech., 27 (1963), pp. 42--59.

[15] L. S. Pontryagin, V. G. Boltyanskii, R. V. Gamkrelidze and E. F. Mishchenko, The Mathematical Theory of Optimal Processes, John Wiley, New York (1962).

[16] R. W. Rishel, Dynamic programming and minimum principles for systems with jump Markov disturbances, SIAM J. Control, 13 (1975), pp. 338--371.

[17] H. M. Soner, Optimal control with state-space constraint. II, SIAM J. Control Optim., 24 (1986), no. 6, pp. 1110--1122.

[18] D. D. Sworder, Feedback control of a class of linear systems with jump parameters, IEEE Trans. Automatic Control, AC-14 (1969), pp. 9--14.

[19] D. Vermes, Optimal control of piecewise deterministic Markov process, Stochastics, 14 (1985), no. 3, pp. 165--207.

[20] W. M. Wonham, Random differential equations in control theory, Probabilistic Methods in Applied Mathematics, vol. 2, A. T. Bharucha-Reid, ed., Academic Press, New York, 1970, pp. 191--199.