
Principle of Optimality Continuous Time Process

Alexander S. Poznyak , in Advanced Mathematical Tools for Automatic Control Engineers: Deterministic Techniques, Volume 1, 2008

22.7 Dynamic programming

The dynamic programming method (DPM) is another powerful approach to solving optimal control problems. It provides sufficient conditions for testing whether a given control is optimal. The basic idea of this approach is to consider a family of optimal control problems with different initial conditions (times and states) and to obtain relationships among them via the so-called Hamilton–Jacobi–Bellman (HJB) equation, a nonlinear first-order partial differential equation. The optimal control is designed by maximizing (or minimizing) the generalized Hamiltonian involved in this equation. If the HJB equation is solvable (analytically or even numerically), the corresponding optimal controllers turn out to be given by a nonlinear feedback depending on the optimized plant nonlinearity as well as on the solution of the corresponding HJB equation. This approach actually provides the solutions to the whole family of optimization problems and, in particular, to the original problem. Such a technique is called "invariant embedding". The major drawback of the classical HJB method is that it requires the partial differential equation to admit a smooth enough solution. Unfortunately, this is not the case even in some very simple situations. To overcome this problem, the so-called viscosity solutions have been introduced (Crandall & Lions 1983). These are a sort of nonsmooth solutions whose key feature is to replace the conventional derivatives by set-valued super/sub-differentials while maintaining the uniqueness of solutions under very mild conditions. This approach not only saves the DPM as a mathematical method, but makes it a powerful tool for tackling optimal control. In this section we do not touch on viscosity solutions; instead, we discuss the gap between the necessary (MP) and sufficient (DPM) conditions of optimality.

22.7.1 Bellman's principle of optimality

Claim 22.1. (Bellman's principle (BP) of optimality) "Any tail of an optimal trajectory is optimal too."


In other words, if some trajectory in the phase space connects the initial point $x(0)$ and the terminal point $x(T)$ and is optimal in the sense of some cost functional, then the sub-trajectory connecting any intermediate point $x(t')$ of the same trajectory with the same terminal point $x(T)$ should also be optimal (see Fig. 22.2).

Fig. 22.2. Illustration of Bellman's principle of optimality.
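The principle is easiest to see in a discrete setting. The following sketch (a made-up layered graph, not from the text) finds the cheapest path to a terminal node by exhaustive search and then checks that the tail of the optimal path, starting from any intermediate point, is itself the optimal path from that point.

```python
# Discrete illustration of Bellman's principle: on a layered graph, every
# tail of an optimal (cheapest) path is itself optimal. The graph and edge
# costs below are invented for the example.

# cost[t][i][j] = cost of going from node i in layer t to node j in layer t+1;
# the last layer consists of a single terminal node.
cost = [
    [[1, 4]],           # layer 0 (1 node) -> layer 1 (2 nodes)
    [[2, 6], [1, 1]],   # layer 1 -> layer 2
    [[3], [2]],         # layer 2 -> terminal layer
]

def all_paths(layer, node):
    """Enumerate every (path, cost) pair from (layer, node) to the terminal node."""
    if layer == len(cost):
        return [([node], 0)]
    paths = []
    for nxt in range(len(cost[layer][node])):
        for tail, c in all_paths(layer + 1, nxt):
            paths.append(([node] + tail, cost[layer][node][nxt] + c))
    return paths

def optimal(layer, node):
    """Cheapest path from (layer, node) onward, by exhaustive search."""
    return min(all_paths(layer, node), key=lambda pc: pc[1])

path, total = optimal(0, 0)
print(path, total)  # the optimal trajectory and its cost

# Bellman's principle: the sub-path from any intermediate point of the
# optimal trajectory to the terminal point is also optimal.
for t in range(1, len(path) - 1):
    tail_path, tail_cost = optimal(t, path[t])
    assert tail_path == path[t:]
```

Note that the converse is false: gluing an optimal head to an optimal tail need not produce an optimal path, which is why the principle is stated for tails only.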

22.7.2 Sufficient conditions for BP fulfilling

Theorem 22.16. (Sufficient condition for BP fulfilling)

Let

1.

the performance index (a cost functional) $J(u(\cdot))$ with $u(\cdot) \in U_{\mathrm{admis}}[0,T]$ be separable for any time $t' \in (0,T)$, such that

(22.115) $J(u(\cdot)) = J_1\big(u_1(\cdot),\, J_2(u_2(\cdot))\big)$

where $u_1(\cdot)$ is the control on the time interval $[0,t')$, called the initial control strategy, and $u_2(\cdot)$ is the control on the time interval $[t',T]$, called the terminal control strategy;

2.

the functional $J_1\big(u_1(\cdot),\, J_2(u_2(\cdot))\big)$ be monotonically nondecreasing with respect to its second argument $J_2(u_2(\cdot))$, that is,

(22.116) $J_1\big(u_1(\cdot),\, J_2(u_2(\cdot))\big) \le J_1\big(u_1(\cdot),\, J_2(u_2'(\cdot))\big) \quad \text{if} \quad J_2(u_2(\cdot)) \le J_2(u_2'(\cdot))$

Then Bellman's principle of optimality holds for this functional.

Proof

For any admissible control strategies $u_1(\cdot)$, $u_2(\cdot)$ the following inequality holds:

(22.117) $J^* := \inf_{u(\cdot) \in U_{\mathrm{admis}}[0,T]} J(u(\cdot)) = \inf_{u_1(\cdot) \in U_{\mathrm{admis}}[0,t'],\; u_2(\cdot) \in U_{\mathrm{admis}}[t',T]} J_1\big(u_1(\cdot),\, J_2(u_2(\cdot))\big) \le J_1\big(u_1(\cdot),\, J_2(u_2(\cdot))\big)$

Select

(22.118) $u_2(\cdot) = \arg\inf_{u_2(\cdot) \in U_{\mathrm{admis}}[t',T]} J_2(u_2(\cdot))$

Then (22.117) and (22.118) imply

(22.119) $J^* \le J_1\Big(u_1(\cdot),\, \inf_{u_2(\cdot) \in U_{\mathrm{admis}}[t',T]} J_2(u_2(\cdot))\Big)$

So,

(22.120) $u_1(\cdot) = \arg\inf_{u_1(\cdot) \in U_{\mathrm{admis}}[0,t']} J_1\Big(u_1(\cdot),\, \inf_{u_2(\cdot) \in U_{\mathrm{admis}}[t',T]} J_2(u_2(\cdot))\Big)$

leads to

(22.121) $J^* \le \inf_{u_1(\cdot) \in U_{\mathrm{admis}}[0,t']} J_1\Big(u_1(\cdot),\, \inf_{u_2(\cdot) \in U_{\mathrm{admis}}[t',T]} J_2(u_2(\cdot))\Big)$

Since $J_1\big(u_1(\cdot),\, J_2(u_2(\cdot))\big)$ is monotonically nondecreasing with respect to its second argument, from (22.121) we obtain

(22.122) $\inf_{u_1(\cdot) \in U_{\mathrm{admis}}[0,t']} J_1\Big(u_1(\cdot),\, \inf_{u_2(\cdot) \in U_{\mathrm{admis}}[t',T]} J_2(u_2(\cdot))\Big) \le \inf_{u_1(\cdot) \in U_{\mathrm{admis}}[0,t']} \inf_{u_2(\cdot) \in U_{\mathrm{admis}}[t',T]} J_1\big(u_1(\cdot),\, J_2(u_2(\cdot))\big) = \inf_{u(\cdot) \in U_{\mathrm{admis}}[0,T]} J(u(\cdot)) = J^*$

Combining (22.121) and (22.122), we finally derive that

(22.123) $J^* = \inf_{u_1(\cdot) \in U_{\mathrm{admis}}[0,t']} J_1\Big(u_1(\cdot),\, \inf_{u_2(\cdot) \in U_{\mathrm{admis}}[t',T]} J_2(u_2(\cdot))\Big)$

This proves the desired result.

Summary 22.3

In strict mathematical form this fact may be expressed as follows: under the assumptions of the theorem above, for any time $t' \in (0,T)$

(22.124) $\inf_{u(\cdot) \in U_{\mathrm{admis}}[0,T]} J(u(\cdot)) = \inf_{u_1(\cdot) \in U_{\mathrm{admis}}[0,t']} J_1\Big(u_1(\cdot),\, \inf_{u_2(\cdot) \in U_{\mathrm{admis}}[t',T]} J_2(u_2(\cdot))\Big)$

Corollary 22.7

For the cost functional

$J(u(\cdot)) := h_0(x(T)) + \int_{t=0}^{T} h(x(t), u(t), t)\,dt$

given in the Bolza form (22.41), Bellman's principle holds.

Proof

For any $t' \in (0,T)$, from (22.41) it obviously follows that

(22.125) $J(u(\cdot)) = J_1(u_1(\cdot)) + J_2(u_2(\cdot))$

where

(22.126) $J_1(u_1(\cdot)) := \int_{t=0}^{t'} h(x(t), u_1(t), t)\,dt, \qquad J_2(u_2(\cdot)) := h_0(x(T)) + \int_{t=t'}^{T} h(x(t), u_2(t), t)\,dt$

The representation (22.125) evidently yields the validity of (22.115) and (22.116) for this functional.
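As a quick numerical illustration (not from the text), the following sketch fixes placeholder trajectories $x(t) = \sin t$, $u(t) = \cos t$ and an invented running cost, and checks that the discretized Bolza cost splits additively at $t'$ as in (22.125). Additivity makes the monotonicity condition (22.116) immediate, since $J_1 + J_2$ is trivially nondecreasing in $J_2$.

```python
import math

# Numerical check (illustrative, not from the text) that the Bolza cost
# J(u) = h0(x(T)) + integral of h over [0,T] is separable at any t':
# J = J1(u1) + J2(u2), with J1 the integral over [0,t') and J2 the rest.

def trapz(f, a, b, n):
    """Composite trapezoidal rule for the integral of f over [a, b]."""
    step = (b - a) / n
    s = 0.5 * (f(a) + f(b)) + sum(f(a + i * step) for i in range(1, n))
    return s * step

T, tp = 1.0, 0.4              # horizon T and splitting time t'
x = math.sin                  # a fixed state trajectory (placeholder)
u = math.cos                  # a fixed control trajectory (placeholder)

def h(t):                     # running cost h(x(t), u(t), t), made up
    return x(t) ** 2 + 0.5 * u(t) ** 2

h0 = x(T) ** 2                # terminal cost h0(x(T))

J = h0 + trapz(h, 0.0, T, 1000)    # whole-horizon cost
J1 = trapz(h, 0.0, tp, 400)        # initial-strategy part over [0, t')
J2 = h0 + trapz(h, tp, T, 600)     # terminal cost + integral over [t', T]

print(abs(J - (J1 + J2)))          # essentially zero: J = J1 + J2
```

The sub-interval grids are chosen to share the node at $t' = 0.4$, so the split reproduces the full integral up to floating-point rounding.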

22.7.3 Invariant embedding

22.7.3.1 System description and basic assumptions

Let $(s,y) \in [0,T) \times \mathbb{R}^n$ be "an initial time and state pair" for the following controlled system over $[s,T]$:

(22.127) $\dot{x}(t) = f(x(t), u(t), t) \ \ \text{a.e.}\ t \in [s,T], \qquad x(s) = y$

where $x \in \mathbb{R}^n$ is its state vector and $u \in \mathbb{R}^r$ is the control that may run over a given control region $U \subseteq \mathbb{R}^r$, with the cost functional in the Bolza form

(22.128) $J(s,y;u(\cdot)) := h_0(x(T)) + \int_{t=s}^{T} h(x(t), u(t), t)\,dt$

containing the integral term as well as the terminal one, and with the terminal set $\mathcal{M} \subseteq \mathbb{R}^n$ given by the inequalities (22.42). Here, as before, $u(\cdot) \in U_{\mathrm{admis}}[s,T]$. For $s = 0$ and $y = x_0$ this plant coincides with the original one given by (22.40).

Suppose also that assumption (A1) is accepted and that, instead of (A2), the following slight modification of it holds:

(A2′) The maps

(22.129) $f : \mathbb{R}^n \times U \times [0,T] \to \mathbb{R}^n, \quad h : \mathbb{R}^n \times U \times [0,T] \to \mathbb{R}, \quad h_0 : \mathbb{R}^n \to \mathbb{R}, \quad g_l : \mathbb{R}^n \to \mathbb{R} \ \ (l = 1, \ldots, L)$

are uniformly continuous in $(x, u, t)$, including $t$ (before, in (A2), they were assumed to be only measurable in $t$), and there exists a constant $L$ such that for $\varphi = f(x,u,t),\ h(x,u,t),\ h_0(x),\ g_l(x)\ (l = 1, \ldots, L)$ the following inequalities hold:

(22.130) $\|\varphi(x,u,t) - \varphi(\hat{x},u,t)\| \le L\,\|x - \hat{x}\| \quad \forall t \in [0,T],\ x, \hat{x} \in \mathbb{R}^n,\ u \in U; \qquad \|\varphi(0,u,t)\| \le L \quad \forall (u,t) \in U \times [0,T]$

It is evident that under assumptions (A1)–(A2′), for any $(s,y) \in [0,T) \times \mathbb{R}^n$ and any $u(\cdot) \in U_{\mathrm{admis}}[s,T]$, the optimization problem

(22.131) $J(s,y;u(\cdot)) \to \inf_{u(\cdot) \in U_{\mathrm{admis}}[s,T]}$

formulated for the plant (22.127) and the cost functional $J(s,y;u(\cdot))$ (22.128) makes sense: the plant admits a unique solution $x(\cdot) := x(\cdot, s, y; u(\cdot))$, so that the functional (22.128) is well defined.

Definition 22.8. (The value function)

The function $V(s,y)$ defined for any $(s,y) \in [0,T) \times \mathbb{R}^n$ by

(22.132) $V(s,y) := \inf_{u(\cdot) \in U_{\mathrm{admis}}[s,T]} J(s,y;u(\cdot)), \qquad V(T,y) = h_0(y)$

is called the value function of the optimization problem (22.131).

22.7.3.2 Dynamic programming equation in the integral form
Theorem 22.17

Under assumptions (A1)–(A2′), for any $(s,y) \in [0,T) \times \mathbb{R}^n$ the following relation holds:

(22.133) $V(s,y) = \inf_{u(\cdot) \in U_{\mathrm{admis}}[s,T]} \Big\{ \int_{t=s}^{\hat{s}} h\big(x(t,s,y,u(\cdot)), u(t), t\big)\,dt + V\big(\hat{s}, x(\hat{s},s,y,u(\cdot))\big) \Big\} \quad \forall \hat{s} \in [s,T]$

Proof

The result follows directly from the Bellman principle of optimality (22.124) but, in view of its great importance, we present the proof again, using the concrete form of the Bolza cost functional (22.128). Denoting the right-hand side of (22.133) by $\bar{V}(s,y)$ and taking into account the definition (22.132), for any $u(\cdot) \in U_{\mathrm{admis}}[s,T]$ we have

$V(s,y) \le J(s,y;u(\cdot)) = \int_{t=s}^{\hat{s}} h\big(x(t,s,y,u(\cdot)), u(t), t\big)\,dt + J\big(\hat{s}, x(\hat{s}); u(\cdot)\big)$

and, taking the infimum over $u(\cdot) \in U_{\mathrm{admis}}[s,T]$, it follows that

(22.134) $V(s,y) \le \bar{V}(s,y)$

Hence, for any $\varepsilon > 0$ there exists a control $u^{\varepsilon}(\cdot) \in U_{\mathrm{admis}}[s,T]$ such that for $x^{\varepsilon}(\cdot) := x(\cdot, s, y; u^{\varepsilon}(\cdot))$

(22.135) $V(s,y) + \varepsilon \ge J(s,y;u^{\varepsilon}(\cdot)) \ge \int_{t=s}^{\hat{s}} h\big(x^{\varepsilon}(t), u^{\varepsilon}(t), t\big)\,dt + V\big(\hat{s}, x^{\varepsilon}(\hat{s})\big) \ge \bar{V}(s,y)$

Letting $\varepsilon \to 0$, the inequalities (22.134) and (22.135) imply the result (22.133) of the theorem.

Finding a solution $V(s,y)$ to equation (22.133), we would be able to solve the original optimal control problem by putting $s = 0$ and $y = x_0$. Unfortunately, this equation is very difficult to handle because of the complicated operations involved on its right-hand side. That is why, in the next subsection, we explore this equation further, trying to obtain another equation for the function $V(s,y)$ with a simpler and more practical form.
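To make relation (22.133) concrete, here is a small discrete-time sketch (the dynamics, costs, and numbers are all invented for illustration): the cost-to-go from $(0, y)$, minimized over the whole horizon at once, coincides with minimizing over $[0, \hat{s}]$ only and charging the value function $V(\hat{s}, x(\hat{s}))$ for the tail — this is the invariant embedding at work.

```python
from itertools import product

dt = 0.1
U = (-1.0, 0.0, 1.0)          # a finite admissible control set (made up)
N, M = 8, 4                   # horizon N*dt; intermediate index, s_hat = M*dt

def step(x, u):               # Euler step of the plant dx/dt = -x + u
    return x + dt * (-x + u)

def h(x, u):                  # running cost (made up)
    return x * x + 0.1 * u * u

def h0(x):                    # terminal cost
    return x * x

def V(k, x):
    """Value function at step k: exhaustive minimum over controls on steps k..N-1."""
    best = float("inf")
    for us in product(U, repeat=N - k):
        xi, c = x, 0.0
        for u in us:
            c += h(xi, u) * dt
            xi = step(xi, u)
        best = min(best, c + h0(xi))
    return best

y = 1.0
direct = V(0, y)              # minimize over the whole horizon at once

# Invariant embedding: minimize only over [0, s_hat], then charge V(s_hat, .)
embedded = float("inf")
for us in product(U, repeat=M):
    xi, c = y, 0.0
    for u in us:
        c += h(xi, u) * dt
        xi = step(xi, u)
    embedded = min(embedded, c + V(M, xi))

print(direct, embedded)       # the two quantities coincide
```

Exhaustive search is used only to keep the check exact; for real problems the recursion is evaluated backward over a state grid instead.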

22.7.4 Hamilton–Jacobi–Bellman equation

To simplify the subsequent calculations, and following Young & Zhou (1999), we will consider the original optimization problem without any terminal set, that is, $\mathcal{M} = \mathbb{R}^n$. This may be expressed with the constraint function equal to

(22.136) $g(x) := 0 \cdot \|x\|^2 - \varepsilon \le 0 \quad (\varepsilon > 0)$

which is true for any $x \in \mathbb{R}^n$. Slater's condition (21.88) is evidently valid (also for any $x \in \mathbb{R}^n$), so we deal here with the regular case. Denote by $C^1([0,T) \times \mathbb{R}^n)$ the set of all continuously differentiable functions $v : [0,T) \times \mathbb{R}^n \to \mathbb{R}$.

Theorem 22.18. (The HJB equation)

Suppose that under assumptions (A1)–(A2′) the value function $V(s,y)$ (22.132) is continuously differentiable, that is, $V \in C^1([0,T) \times \mathbb{R}^n)$. Then $V(s,y)$ is a solution of the following terminal value problem for a first-order partial differential equation, named below the Hamilton–Jacobi–Bellman (HJB) equation, associated with the original optimization problem (22.131) without terminal set ($\mathcal{M} = \mathbb{R}^n$):

(22.137) $-\dfrac{\partial}{\partial t} V(t,x) + \sup_{u \in U} H\Big(-\dfrac{\partial}{\partial x} V(t,x), x, u, t\Big) = 0, \quad (t,x) \in [0,T) \times \mathbb{R}^n; \qquad V(T,x) = h_0(x), \quad x \in \mathbb{R}^n$

where

(22.138) $H(\psi, x, u, t) := \psi^{\top} f(x,u,t) - h(x,u,t) \quad \forall\, x \in \mathbb{R}^n,\ u \in U,\ t \in [0,T],\ \psi \in \mathbb{R}^n$

is the same generalized Hamiltonian as in (22.81) with $\mu = 1$, corresponding to the regular optimization problem.

Proof

Fixing $u(t) \equiv u \in U$, by (22.133) with $\hat{s} \downarrow s$ we obtain

$\dfrac{V(s,y) - V\big(\hat{s}, x(\hat{s},s,y,u(\cdot))\big)}{\hat{s}-s} - \dfrac{1}{\hat{s}-s} \int_{t=s}^{\hat{s}} h\big(x(t,s,y,u(\cdot)), u(t), t\big)\,dt \le 0$

which implies

$-\dfrac{\partial}{\partial s} V(s,y) - \Big(\dfrac{\partial}{\partial y} V(s,y)\Big)^{\top} f(y,u,s) - h(y,u,s) \le 0$

resulting in the following inequality

(22.139) $0 \ge -\dfrac{\partial}{\partial s} V(s,y) + \sup_{u \in U} H\Big(-\dfrac{\partial}{\partial y} V(s,y), y, u, s\Big)$

On the other hand, for any $\varepsilon > 0$ and any $\hat{s}$ close to $s$, there exists a control $u(\cdot) := u^{\varepsilon,\hat{s}}(\cdot) \in U_{\mathrm{admis}}[s,T]$ for which

(22.140) $V(s,y) + \varepsilon(\hat{s}-s) \ge \int_{t=s}^{\hat{s}} h\big(x(t,s,y,u(\cdot)), u(t), t\big)\,dt + V\big(\hat{s}, x(\hat{s})\big)$

Since $V \in C^1([0,T) \times \mathbb{R}^n)$, the last inequality leads to the following:

(22.141)
$-\varepsilon \le \dfrac{V(s,y) - V(\hat{s}, x(\hat{s}))}{\hat{s}-s} - \dfrac{1}{\hat{s}-s} \int_{t=s}^{\hat{s}} h\big(x(t,s,y,u(\cdot)), u(t), t\big)\,dt$
$= \dfrac{1}{\hat{s}-s} \int_{t=s}^{\hat{s}} \Big[ -\dfrac{\partial}{\partial t} V\big(t, x(t,s,y,u(\cdot))\big) - \Big(\dfrac{\partial}{\partial x} V\big(t, x(t,s,y,u(\cdot))\big)\Big)^{\top} f\big(x(t,s,y,u(\cdot)), u(t), t\big) - h\big(x(t,s,y,u(\cdot)), u(t), t\big) \Big]\,dt$
$= \dfrac{1}{\hat{s}-s} \int_{t=s}^{\hat{s}} \Big[ -\dfrac{\partial}{\partial t} V\big(t, x(t,s,y,u(\cdot))\big) + H\Big(-\dfrac{\partial}{\partial x} V\big(t, x(t,s,y,u(\cdot))\big), x(t,s,y,u(\cdot)), u(t), t\Big) \Big]\,dt$
$\le \dfrac{1}{\hat{s}-s} \int_{t=s}^{\hat{s}} \Big[ -\dfrac{\partial}{\partial t} V\big(t, x(t,s,y,u(\cdot))\big) + \sup_{u \in U} H\Big(-\dfrac{\partial}{\partial x} V\big(t, x(t,s,y,u(\cdot))\big), x(t,s,y,u(\cdot)), u, t\Big) \Big]\,dt$

which for $\hat{s} \downarrow s$ gives

(22.142) $-\varepsilon \le -\dfrac{\partial}{\partial s} V(s,y) + \sup_{u \in U} H\Big(-\dfrac{\partial}{\partial y} V(s,y), y, u, s\Big)$

Here the uniform continuity of the functions $f$ and $h$ has been used, namely,

(22.143) $\lim_{t \to s} \sup_{x \in \mathbb{R}^n,\, u \in U} \|\varphi(x,u,t) - \varphi(x,u,s)\| = 0, \quad \varphi = f, h$

Combining (22.139) and (22.142) and letting $\varepsilon \to 0$, we obtain (22.137).
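For a concrete check of Theorem 22.18, one can take a scalar linear-quadratic problem, where the HJB equation can be verified numerically (the system and weights below are made up for illustration): with $\dot{x} = ax + bu$, $h = qx^2 + ru^2$, $h_0 = q_T x^2$, the ansatz $V(t,x) = p(t)x^2$ reduces (22.137) to the Riccati ODE $\dot{p} = -2ap + (b^2/r)p^2 - q$, $p(T) = q_T$, and the supremum in (22.137) is attained at $u^* = -(bp/r)x$.

```python
# Scalar LQR sketch (illustrative data, not from the text): verify that
# V(t,x) = p(t) x^2, with p solving the Riccati ODE, satisfies the HJB
# equation -V_t + sup_u H(-V_x, x, u, t) = 0, H(psi,x,u,t) = psi*f - h.
a, b, q, r, qT, T = -0.5, 1.0, 1.0, 0.5, 2.0, 1.0

def ric(p):                   # Riccati RHS: dp/dt = -2ap + (b^2/r)p^2 - q
    return -2.0 * a * p + (b * b / r) * p * p - q

N = 2000
dt = T / N
p = [0.0] * (N + 1)
p[N] = qT                     # terminal condition V(T,x) = h0(x) = qT*x^2
for k in range(N, 0, -1):     # integrate backward in time with RK4
    k1 = ric(p[k])
    k2 = ric(p[k] - 0.5 * dt * k1)
    k3 = ric(p[k] - 0.5 * dt * k2)
    k4 = ric(p[k] - dt * k3)
    p[k - 1] = p[k] - dt * (k1 + 2 * k2 + 2 * k3 + k4) / 6.0

def H(psi, x, u):             # generalized Hamiltonian (22.138)
    return psi * (a * x + b * u) - (q * x * x + r * u * u)

# HJB residual at an interior point, with V_t from a central difference of p
k, x = N // 2, 0.7
Vt = (p[k + 1] - p[k - 1]) / (2.0 * dt) * x * x
Vx = 2.0 * p[k] * x
u_star = -(b * p[k] / r) * x  # maximizer of H(-V_x, x, ., t)

residual = -Vt + H(-Vx, x, u_star)
print(residual)               # close to zero

# u_star indeed maximizes: nearby controls give a smaller Hamiltonian
assert H(-Vx, x, u_star) >= H(-Vx, x, u_star + 0.1)
assert H(-Vx, x, u_star) >= H(-Vx, x, u_star - 0.1)
```

The residual is limited only by the finite-difference approximation of $\partial V/\partial t$, which confirms that the quadratic ansatz solves (22.137) for this problem.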

The theorem below, representing the sufficient conditions of optimality, is known as the verification rule.

Theorem 22.19. (The verification rule)

Accept the following assumptions:

1.

Let $u^*(\cdot) := u^*\big(t, x, \frac{\partial}{\partial x} V(t,x)\big)$ be a solution of the following optimization problem:

(22.144) $H\Big(-\dfrac{\partial}{\partial x} V(t,x), x, u, t\Big) \to \sup_{u \in U}$

with the values $x$, $t$ and $\frac{\partial}{\partial x} V(t,x)$ fixed;

2.

Suppose that we can find the solution $V(t,x)$ of the HJB equation

(22.145) $-\dfrac{\partial}{\partial t} V(t,x) + H\Big(-\dfrac{\partial}{\partial x} V(t,x), x, u^*(\cdot), t\Big) = 0, \qquad V(T,x) = h_0(x), \quad (t,x) \in [0,T) \times \mathbb{R}^n$

which for any $(t,x) \in [0,T) \times \mathbb{R}^n$ is unique and smooth, that is, $V \in C^1([0,T) \times \mathbb{R}^n)$;

3.

Suppose that for any $(s,x) \in [0,T) \times \mathbb{R}^n$ there exists (for a.e. $t \in [s,T]$) a solution $x^*(\cdot) = x^*(\cdot; s, x)$ of the following ODE (ordinary differential equation):

(22.146) $\dot{x}^*(t) = f\Big(x^*(t),\, u^*\big(t, x^*(t), \tfrac{\partial}{\partial x} V(t, x^*(t))\big),\, t\Big), \qquad x^*(s) = x$

Then, with $(s,x) = (0,x_0)$, the pair

(22.147) $\Big( x^*(t),\, u^*\big(t, x^*(t), \tfrac{\partial}{\partial x} V(t, x^*(t))\big) \Big)$

is optimal, that is, $u^*\big(t, x^*(t), \frac{\partial}{\partial x} V(t, x^*(t))\big)$ is an optimal control.

Proof

The relations (22.138) and (22.145) imply

(22.148) $\dfrac{d}{dt} V(t, x^*(t)) = \dfrac{\partial}{\partial t} V(t, x^*(t)) + \Big(\dfrac{\partial}{\partial x} V(t, x^*(t))\Big)^{\top} f\Big(x^*(t),\, u^*\big(t, x^*(t), \tfrac{\partial}{\partial x} V(t, x^*(t))\big),\, t\Big) = -h\Big(x^*(t),\, u^*\big(t, x^*(t), \tfrac{\partial}{\partial x} V(t, x^*(t))\big),\, t\Big)$

Integrating this equality in $t$ over $[s,T]$ leads to the relation

$V(T, x^*(T)) - V(s, x^*(s)) = -\int_{t=s}^{T} h\Big(x^*(t),\, u^*\big(t, x^*(t), \tfrac{\partial}{\partial x} V(t, x^*(t))\big),\, t\Big)\,dt$

which, in view of the identity $V(T, x^*(T)) = h_0(x^*(T))$, is equivalent to

(22.149) $V(s, x^*(s)) = h_0(x^*(T)) + \int_{t=s}^{T} h\Big(x^*(t),\, u^*\big(t, x^*(t), \tfrac{\partial}{\partial x} V(t, x^*(t))\big),\, t\Big)\,dt$

By the definition (22.132) of the value function, this last equation means exactly that

$\Big( x^*(t),\, u^*\big(t, x^*(t), \tfrac{\partial}{\partial x} V(t, x^*(t))\big) \Big)$

is an optimal pair and $u^*\big(t, x^*(t), \frac{\partial}{\partial x} V(t, x^*(t))\big)$ is an optimal control.
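As a sketch of the verification rule on the same kind of scalar linear-quadratic example (all data invented for illustration): solving the HJB equation via the Riccati ansatz $V(t,x) = p(t)x^2$, closing the loop with the maximizing control $u^*(t,x) = -(bp(t)/r)\,x$, and simulating (22.146) yields a trajectory whose Bolza cost reproduces the value $V(0, x_0)$, exactly as the theorem asserts.

```python
# Verification-rule sketch for a scalar LQR problem (illustrative data,
# not from the text): the closed-loop cost equals the value V(0, x0).
a, b, q, r, qT, T = -0.5, 1.0, 1.0, 0.5, 2.0, 1.0
N = 4000
dt = T / N

# Step 2 of the theorem: solve the HJB equation; the ansatz V(t,x) = p(t)x^2
# reduces it to the Riccati ODE dp/dt = -2ap + (b^2/r)p^2 - q, p(T) = qT.
def ric(p):
    return -2.0 * a * p + (b * b / r) * p * p - q

p = [0.0] * (N + 1)
p[N] = qT
for k in range(N, 0, -1):     # backward RK4 integration
    k1 = ric(p[k])
    k2 = ric(p[k] - 0.5 * dt * k1)
    k3 = ric(p[k] - 0.5 * dt * k2)
    k4 = ric(p[k] - dt * k3)
    p[k - 1] = p[k] - dt * (k1 + 2 * k2 + 2 * k3 + k4) / 6.0

# Steps 1 and 3: maximizing control u*(t,x) = -(b p(t)/r) x and the
# closed-loop ODE (22.146), integrated here with a simple Euler scheme.
x0 = 1.0
x, J = x0, 0.0
for k in range(N):
    u = -(b * p[k] / r) * x
    J += (q * x * x + r * u * u) * dt   # accumulate the running cost
    x += dt * (a * x + b * u)           # closed-loop Euler step
J += qT * x * x                         # add the terminal cost h0(x(T))

print(J, p[0] * x0 * x0)                # achieved cost vs. value V(0, x0)
```

The small gap between the two printed numbers comes only from the Euler discretization; refining the grid drives it to zero, which is the numerical face of the verification rule.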


Source: https://www.sciencedirect.com/topics/engineering/principle-of-optimality
