Principle of Optimality: Continuous-Time Process
Alexander S. Poznyak , in Advanced Mathematical Tools for Automatic Control Engineers: Deterministic Techniques, Volume 1, 2008
22.7 Dynamic programming
The dynamic programming method (DPM) is another powerful approach to solving optimal control problems. It provides sufficient conditions for testing whether a given control is optimal. The basic idea of the approach is to consider a family of optimal control problems with different initial conditions (times and states) and to obtain relationships among them via the so-called Hamilton–Jacobi–Bellman (HJB) equation, a nonlinear first-order partial differential equation. The optimal control can be designed by maximizing (or minimizing) the generalized Hamiltonian involved in this equation. If the HJB equation is solvable (analytically or even numerically), the corresponding optimal controllers turn out to be given by a nonlinear feedback that depends on the plant nonlinearity as well as on the solution of the HJB equation. This approach actually provides the solutions to the whole family of optimization problems and, in particular, to the original problem. Such a technique is called "invariant embedding". The major drawback of the classical HJB method is that it requires the partial differential equation to admit a sufficiently smooth solution. Unfortunately, this is not the case even in some very simple situations. To overcome this problem, the so-called viscosity solutions were introduced (Crandall & Lions 1983). These are nonsmooth solutions whose key feature is to replace the conventional derivatives with set-valued super-/sub-differentials while maintaining the uniqueness of solutions under very mild conditions. This approach not only saves the DPM as a mathematical method, but makes it a powerful tool for tackling optimal control. In this section we do not touch on viscosity solutions; instead, we discuss the gap between the necessary conditions (the maximum principle, MP) and the sufficient ones (DPM).
22.7.1 Bellman's principle of optimality
Claim 22.1. (Bellman's principle (BP) of optimality) "Any tail of an optimal trajectory is optimal too."
In other words, if some trajectory in the phase space connects the initial and terminal points and is optimal in the sense of some cost functional, then the sub-trajectory connecting any intermediate point of this trajectory with the same terminal point must also be optimal (see Fig. 22.2).
Fig. 22.2. Illustration of Bellman's principle of optimality.
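The principle is easiest to check on a discrete analogue. Below is a minimal sketch (the weighted graph is a made-up example, not data from the text): the shortest-path cost plays the role of the cost functional, and every tail of an optimal path must itself be an optimal path to the same terminal node.

```python
import heapq

def shortest_path(graph, source, target):
    """Dijkstra's algorithm; returns (cost, path) from source to target."""
    dist, prev = {source: 0.0}, {}
    pq, done = [(0.0, source)], set()
    while pq:
        d, node = heapq.heappop(pq)
        if node in done:
            continue
        done.add(node)
        if node == target:
            break
        for nxt, w in graph.get(node, []):
            if d + w < dist.get(nxt, float("inf")):
                dist[nxt] = d + w
                prev[nxt] = node
                heapq.heappush(pq, (d + w, nxt))
    path = [target]
    while path[-1] != source:
        path.append(prev[path[-1]])
    return dist[target], path[::-1]

# Hypothetical example graph: edge lists of (neighbor, weight) pairs.
graph = {
    "a": [("b", 1.0), ("c", 4.0)],
    "b": [("c", 1.0), ("d", 5.0)],
    "c": [("d", 1.0)],
}
cost, path = shortest_path(graph, "a", "d")

# Bellman's principle: the tail of the optimal path starting at any
# intermediate node is the optimal path from that node to the target.
tails_optimal = all(
    shortest_path(graph, node, "d")[1] == path[i:]
    for i, node in enumerate(path)
)
```

Here the optimal route a→b→c→d has cost 3, and each of its tails (from b, from c, from d) is again a shortest path to d.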
22.7.2 Sufficient conditions for BP to hold
Theorem 22.16. (Sufficient conditions for BP to hold)
Let
- 1.
-
the performance index (a cost functional) be separable for any time such that
(22.115)
where is the control within the time interval called the initial control strategy and is the control within the time interval called the terminal control strategy;
- 2.
-
the functional is monotonically nondecreasing with respect to its second argument , that is,
(22.116)
Then Bellman's principle of optimality holds for this functional.
Proof
For any admissible control strategies the following inequality holds
(22.117)
Select
(22.118)
Then (22.117) and (22.118) imply
(22.119)
So,
(22.120)
leads to
(22.121)
Since is monotonically nondecreasing with respect to the second argument, from (22.121) we obtain
(22.122)
Combining (22.121) and (22.122), we finally derive that
(22.123)
This proves the desired result.
Summary 22.3
In strict mathematical form this fact may be expressed as follows: under the assumptions of the theorem above, for any time
(22.124)
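Equation (22.124) did not survive extraction. Under the separability condition (22.115) and the monotonicity condition (22.116), the strict form of Bellman's principle is usually written as follows (a hedged reconstruction: the symbols $I$, $\Theta$, $I_1$, $I_2$, $u_1$, $u_2$ are assumed to match the notation of the theorem, with the intermediate state at time $t$ held fixed):

```latex
% If I(u) = \Theta(I_1(u_1), I_2(u_2)) with \Theta nondecreasing in its
% second argument, then the terminal strategy may be optimized first:
\inf_{u \in U[t_0,T]} I(u)
  \;=\; \inf_{u_1 \in U[t_0,t]}
        \Theta\Bigl( I_1(u_1),\; \inf_{u_2 \in U[t,T]} I_2(u_2) \Bigr)
```

That is, the tail control on $[t, T]$ can be chosen optimally without re-examining the initial strategy, which is exactly the claim of BP.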
Corollary 22.7
For the cost functional
given in the Bolza form (22.41) Bellman's principle holds.
Proof
For any from (22.41) it obviously follows that
(22.125)
where
(22.126)
The representation (22.125) evidently yields the validity of (22.115) and (22.116) for this functional.
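Equations (22.125)–(22.126) were also lost in extraction; for the Bolza functional the split is the standard additive one (a hedged reconstruction, with $t_0$, $T$, the running cost $h$, and the terminal cost $h_0$ assumed from the notation of (22.41)):

```latex
I(u) \;=\; \int_{t_0}^{T} h\bigl(x(s),u(s),s\bigr)\,ds + h_0\bigl(x(T)\bigr)
      \;=\; \underbrace{\int_{t_0}^{t} h\,ds}_{I_1(u_1)}
        \;+\; \underbrace{\int_{t}^{T} h\,ds + h_0\bigl(x(T)\bigr)}_{I_2(u_2)}
```

so the separability (22.115) holds with $\Theta(a,b) = a + b$, which is monotonically nondecreasing in $b$, giving (22.116).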
22.7.3 Invariant embedding
22.7.3.1 System description and basic assumptions
Let be an "initial time and state" pair for the following controlled system over :
(22.127)
where is its state vector, and is the control that may run over a given control region with the cost functional in the Bolza form
(22.128)
containing the integral term as well as the terminal one and with the terminal set given by the inequalities (22.42). Here, as before, . For and this plant coincides with the original one given by (22.40).
Suppose also that assumption (A1) is accepted and, instead of (A2), its small modification holds:
(A2′) The maps
(22.129)
are uniformly continuous in (x, u, t), including in t (previously, in (A2), they were assumed to be only measurable in t), and there exists a constant L such that for the following inequalities hold:
(22.130)
It is evident that under assumptions (A1)–(A2′) for any and any the optimization problem
(22.131)
formulated for the plant (22.127) and for the cost functional (22.128), admits a unique solution and the functional (22.128) is well defined.
Definition 22.8. (The value function)
The function defined for any
(22.132)
is called the value function of the optimization problem (22.131).
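As a finite-grid illustration of Definition 22.8, the value function of a toy problem can be computed by backward induction (the dynamics ẋ = u, running cost x² + u², terminal cost x², and all discretization parameters below are assumptions of this sketch, not data from the text):

```python
import numpy as np

# Toy problem: xdot = u, running cost x^2 + u^2, terminal cost x^2.
dt, T = 0.1, 1.0
xs = np.linspace(-2.0, 2.0, 81)        # state grid
us = np.linspace(-2.0, 2.0, 41)        # control grid
n_steps = int(round(T / dt))

V = xs ** 2                            # V(T, x) = terminal cost h0(x)
for _ in range(n_steps):               # backward induction in time
    Q = np.empty((xs.size, us.size))
    for j, u in enumerate(us):
        x_next = np.clip(xs + u * dt, xs[0], xs[-1])   # Euler step of xdot = u
        Q[:, j] = (xs ** 2 + u ** 2) * dt + np.interp(x_next, xs, V)
    V = Q.min(axis=1)                  # optimize over the control grid

# V now approximates the value function at the initial time.
```

Each sweep implements one step of the dynamic programming recursion: running cost over a short interval plus the cost-to-go, minimized over the control.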
22.7.3.2 Dynamic programing equation in the integral form
Theorem 22.17
Under assumptions (A1)–(A2′), for any the following relation holds
(22.133)
Proof
The result follows directly from BP of optimality (22.124), but, in view of the great importance of this result, we present the proof again, using the concrete form of the Bolza cost functional (22.128). Denoting the right-hand side of (22.133) by and taking into account the definition (22.132), for any we have
and, taking infimum over , it follows that
(22.134)
Hence, for any there exists a control such that for
(22.135)
Passing to the limit, the inequalities (22.134) and (22.135) imply the result (22.133) of this theorem.
Having found a solution to equation (22.133), we would be able to solve the original optimal control problem by putting and . Unfortunately, this equation is very difficult to handle because of the complicated operations involved on its right-hand side. That is why, in the next subsection, we explore this equation further, trying to obtain another equation for the function in a simpler and more practical form.
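The body of equation (22.133) itself was lost in extraction. In standard notation, the dynamic programming equation in integral form reads as follows (a hedged reconstruction, assuming the Bolza data $h$, $h_0$ of (22.128) and the value function $V$ of (22.132)):

```latex
V(t,x) \;=\; \inf_{u(\cdot)\,\in\, U[t,s]}
  \left\{ \int_{t}^{s} h\bigl(x(r),u(r),r\bigr)\,dr \;+\; V\bigl(s,\,x(s)\bigr) \right\},
\qquad t \le s \le T,
```

with the boundary condition $V(T,x) = h_0(x)$. The infimum on the right is taken over controls on the short interval $[t,s]$ only; the cost of the remaining tail is summarized by $V(s, x(s))$.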
22.7.4 Hamilton–Jacobi–Bellman equation
To simplify the subsequent calculations, and following Young & Zhou (1999), we consider the original optimization problem without any terminal set, that is, . This may be expressed with the constraint function equal to
(22.136)
which is true for any . Slater's condition (21.88) is evidently valid (also for any ). So, we deal here with the regular case. Denote by the set of all continuously differentiable functions .
Theorem 22.18. (The HJB equation)
Suppose that, under assumptions (A1)–(A2′), the value function (22.132) is continuously differentiable, that is, . Then is a solution to the following terminal-value problem for a first-order partial differential equation, named below the Hamilton–Jacobi–Bellman (HJB) equation associated with the original optimization problem (22.131) without terminal set :
(22.137)
where
(22.138)
is the same as in (22.81) with corresponding to the regular optimization problem.
Proof
Fixing , by (22.133) with we obtain
which implies
resulting in the following inequality
(22.139)
On the other hand, for any and s close to , there exists a control for which
(22.140)
Since , the last inequality leads to the following
(22.141)
which for gives
(22.142)
Here the uniform continuity property of the functions f and h has been used, namely,
(22.143)
Combining (22.139) and (22.142) when we obtain (22.137).
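Since (22.137)–(22.138) did not survive extraction, here is the usual minimization form of the HJB terminal-value problem (a hedged reconstruction; the book's sup-of-Hamiltonian form of (22.138) differs only by the sign convention $H(\psi,x,u,t) = \psi^{\top} f - h$ with $\psi = -\partial V / \partial x$):

```latex
-\,\frac{\partial V}{\partial t}(t,x)
  \;=\; \inf_{u \in U}\left\{ h(x,u,t)
        + \left(\frac{\partial V}{\partial x}(t,x)\right)^{\!\top} f(x,u,t) \right\},
\qquad V(T,x) = h_0(x)
```

This is the infinitesimal version of the integral relation (22.133): dividing by $s - t$ and letting $s \downarrow t$ turns the running integral into $h$ and the increment of $V$ into its total derivative along the trajectory.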
The theorem below, representing the sufficient conditions of optimality, is known as the verification rule.
Theorem 22.19. (The verification rule)
Accept the following assumptions:
- 1.
-
Let be a solution to the following optimization problem
(22.144)
with fixed values x, t and ;
- 2.
-
Suppose that we can obtain the solution to the HJB equation
(22.145)
which for any is unique and smooth, that is, ;
- 3.
-
Suppose that for any there exists a solution to the following ODE (ordinary differential equation)
(22.146)
Then with the pair
(22.147)
is optimal, that is, is an optimal control.
Proof
The relations (22.138) and (22.145) imply
(22.148)
Integrating this equality in t over [s, T] leads to the following relation
which, in view of the identity , is equal to the following one
(22.149)
By (22.133), this last equation means exactly that
is an optimal pair and is an optimal control.
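A minimal numerical sketch of the verification rule for a scalar linear-quadratic problem follows. All plant and cost data are illustrative assumptions: for dynamics ẋ = a x + b u and cost ∫(q x² + r u²) dt + p_T x(T)², the ansatz V(t, x) = P(t) x² reduces the HJB equation to a Riccati ODE, and the minimizer yields a linear feedback; the realized closed-loop cost should then equal V at the initial pair.

```python
import numpy as np

# Hypothetical scalar LQ data: xdot = a x + b u,
# J = integral of (q x^2 + r u^2) dt + p_T x(T)^2.
# With V(t,x) = P(t) x^2 the HJB equation gives the Riccati ODE
#   -P' = 2 a P + q - (b^2 / r) P^2,   P(T) = p_T,
# and the minimizing feedback u*(t,x) = -(b P(t) / r) x.
a, b, q, r, p_T = -0.5, 1.0, 1.0, 1.0, 0.5
T, n = 2.0, 4000
dt = T / n

# Step 1: integrate the Riccati ODE backward in time (explicit Euler).
P = np.empty(n + 1)
P[n] = p_T
for k in range(n, 0, -1):
    P[k - 1] = P[k] + dt * (2 * a * P[k] + q - (b ** 2 / r) * P[k] ** 2)

# Step 2: close the loop with u* and accumulate the realized cost.
x0 = 1.0
x, J = x0, 0.0
for k in range(n):
    u = -(b * P[k] / r) * x
    J += (q * x ** 2 + r * u ** 2) * dt
    x += (a * x + b * u) * dt
J += p_T * x ** 2

# For comparison: the cost of a non-optimal control (u = 0 throughout).
y, J0 = x0, 0.0
for k in range(n):
    J0 += q * y ** 2 * dt
    y += a * y * dt
J0 += p_T * y ** 2
# The verification rule predicts J ≈ V(0, x0) = P(0) x0^2, and J < J0.
```

The three steps mirror the theorem: solve the HJB equation (here, its Riccati reduction), take the minimizing control as feedback, and integrate the resulting closed-loop ODE; the pair so obtained attains the value function.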
Source: https://www.sciencedirect.com/topics/engineering/principle-of-optimality