J Syst Sci Complex (2018) 31: 173–187
Pareto Efficiency of Finite Horizon Switched Linear Quadratic Differential Games∗ HUANG Yabing · ZHAO Jun
DOI: 10.1007/s11424-018-7439-7 Received: 22 December 2017 / Revised: 27 December 2017 c The Editorial Office of JSSC & Springer-Verlag GmbH Germany 2018 Abstract A switched linear quadratic (LQ) differential game over finite-horizon is investigated in this paper. The switching signal is regarded as a non-conventional player, afterwards the definition of Pareto efficiency is extended to dynamics switching situations to characterize the solutions of this multi-objective problem. Furthermore, the switched differential game is equivalently transformed into a family of parameterized single-objective optimal problems by introducing preference information and auxiliary variables. This transformation reduces the computing complexity such that the Pareto frontier of the switched LQ differential game can be constructed by dynamic programming. Finally, a numerical example is provided to illustrate the effectiveness. Keywords game.
1
Dynamic optimization, linear quadratic problems, Pareto efficiency, switched differential
Introduction
Game theory, as a branch of applied mathematics, has been applied to many real-world directions due to its superiority in modeling the complexly interactive decision making process consists of multiple players who are equipped with their own objectives respectively, such as cyber-physical systems[1, 2] , microgrids[3–5] , wind power systems[6] and chemical engineering processes[7] . As a powerful tool handling multi-player decision making over time, the concept of differential games is introduced to deal with the situations where the environment affecting and being affected by players is described by differential equations[8, 9] . As a result, the history of the environment state and that of each player’s decision should be taken into consideration before making the current action. The introduction of the differential equations makes it possible to HUANG Yabing · Zhao Jun (Corresponding author) College of Information Science and Engineering, Northeastern University, Shenyang 110819, China; State Key Laboratory of Synthetical Automation for Process Industries, Northeastern University, Shenyang 110819, China. Email: yabing
[email protected];
[email protected]. ∗ This work was supported by the National Natural Science Foundation of China under Grant No. 61773098 and the 111 Project under Grant No. B16009. This paper was recommended for publication by Guest Editor LIU Tengfei.
174
HUANG YABING · ZHAO JUN
handle much more sophisticated situations, such as well-known pursuit-evasion problems[10, 11] , multi-agent collision avoidance problems[12, 13] , and cooperative control problems[14, 15] . Moreover, some classical control problems can also be converted to differential game problems, and further be solved from the perspective of game theory. In [16], based on differential game theory, the H2 /H∞ control problem of stochastic systems is solved by being formulated as a particular case of non-zero sum game with two players (a controller and an external disturbance) and two pay-off functions representing H2 and H∞ criteria. Taking a step back to look at the macroscopical picture, it is reasonable to say that differential game theory can be regarded as a generalization of optimal control. In optimal control, there is only one player attempting to optimize its performance criterion by seeking the best strategy subject to a dynamic constraint of environment state variable, while in differential game theory, there exist several players, each attempting to optimize its own performance criterion subject to the common dynamic constraint. This closely connected relationship makes it possible to take full advantage of mature techniques in the field of optimal control, such as Pontryagin’s maximum principle and dynamic programming. The above mentioned works in [8–16] make a contribution to the development of differential game theory, however, they considered only the non-switched environment dynamics situations. It is worth pointing out that the environment dynamics may vary according to certain external parameters, or just switch from one operating mode to another. The piecewise deterministic differential game is considered in [17] where the state equation switches randomly among different dynamics at random instants, which is caused by the machine’s breakdown and repair of manufacturing systems. Taking pursuit-evasion game for another example, the evader in [18] can change its operating mode to become a pursuer if the switching criteria based on the current and future state of controlled vehicle is satisfied. Nevertheless, under some circumstances, the switching in system dynamics is neither stochastic nor internal forced (i.e., the switchings are generated implicity when the state trajectory intersects some switching sets), but is another instrument of control to be designed. Then it is reasonable to model the switching signal as an extra player who also makes an attempt to optimize its strategy by taking an active participation in the differential game consisting of conventional players. The fact that the switching signal is a continuous-in-time but discrete-invalue function, makes it quite difficult to optimize the switching signal even in the unconstrained scenarios, let alone the coupled constrained problems in the framework of differential game. In order to handle this special character of the switching signal, the parameterization approach of switching instants proposed in [19] transcribes the undetermined switching instants into extended states. Then, the conventional optimal control techniques can be adopted to solve the optimal switching instants problem of the extended systems on assumption that the switching mode sequence is prescribed. Even though the result is in presentation of two subsystems with only one switching, the same procedure can be repeated such that the corresponding number of auxiliary states can be introduced to parameterize multiple switching instants. Under the same assumption, a novel method proposed in [20], without state extension, directly regards multiple switching instants as a decision vector. In these two papers, the optimal control prob-
PARETO EFFICIENCY OF SWITCHED DIFFERENTIAL GAME
175
lems of switched systems are composed of only one controller and a switching signal, where the switching signal is not taken as a player, therefore does not have its own performance index. In this paper, in contrast, besides the presentation of multiple controllers, a special form of performance criterion is constructed to evaluate switching law’s quality, and to act as a guideline for the design of the switching signal. Motivated by the analysis above, a switched LQ differential game over finite horizon is considered in this paper. The contributions of this paper are threefold: • Compared with the conventional differential games, the dynamics of state evolution are generalized to switched scenarios, which makes it possible to handle much sophisticated real-world situations. • The switching signal, being another measure of control rather than being a stochastic variable nor being internal forced, greatly enlarges the freedom in the controller design procedure. • Paralleling the conventionally modelling control inputs as players, the switching signal is also regarded as a player pursuing the optimization of its own performance criterion. Then the cooperative game among players, involving the controllers and the switching signal, is considered to characterizethe interactive decision making process, which provides a new method for the simultaneous design of the multiple controllers and the switching signal. Throughout this paper some standard notations are utilized. The structure of this paper is organized as follows. The control problem is posed in Section 2, and the main result is derived in Section 3. In Section 4, a numerical example is given to validate the effectiveness of our results and we make a concluding remark in Section 5.
2
Problem Statement
Consider N -person finite-horizon switched LQ differential game whose environment dynamics are described by the switched linear differential equation as below: x˙ = Aσ x +
N
Bi,σ ui ,
(1)
i=1
where x ∈ Rn and ui ∈ Rmi denote the state of system and ith player’s action, respectively. Aj , Bi,j , ∀j ∈ {1, 2, · · · , M } are dynamical matrices of appropriate dimensions, with M denoting the number of subsystems. σ(t) : R+ → {1, 2, · · · , M } is a piecewise constant switching signal orchestrating the activation among subsystems. Assumption 1
σ has finite number of switchings on any finite interval of time.
This is a standard assumption in the switched systems literatures [21–23] to rule out Zeno behavior. How to solve the optimization of the switching signal with Zeno behavior is a challenging topic which is beyond the scope of this paper. Under this non-Zeno assumption, a
HUANG YABING · ZHAO JUN
176
switching sequence can be written as σ(t) = (t0 , i0 ), (t1 , i1 ), · · · , (tNs , iNs ) , where the subscript Ns is a prescribed number of switchings. Also, we assume ik−1 = ik for any k ∈ {1, 2, · · · , Ns }, then the admissible strategy set for the switching signal is defined as Γσ (Ns ) = (t0 , i0 ), (t1 , i1 ), · · · , (tNs , iNs ) | t0 < t1 < · · · < tNs , ij ∈ {1, 2, · · · , M }, ∀j = 0, 1, · · · , Ns . Each player is equipped with its own performance criterion over finite horizon which it would like to minimize. Here the LQ form is adopted as follows: tf T min Ji = min x (t)Qi x(t) + uT (2) i (t)Ri ui (t) dt, ui ∈Ui
ui ∈Ui
t0
where Qi ≥ 0 and Ri > 0 are ith player’s weighting matrices, which describes its preference between running state x(t) and its own control input ui (t), and that between the components of them. The admissible set Ui of ui is defined as Ui {ui | ui ∈ Cp [t0 , tf ], u(t) ∈ Rmi }, that is to say, ith player can only choose strategy from the set of piecewise continuous functions for t ∈ [t0 , tf ] that take value in Rmi . Compared with the conventional differential game theory, the introduction of the switching signal not only leads to a sophisticated dynamical behavior of state evolution, but has a direct influence on the process that each player pursues its objective, so the switching signal is also modelled as a player who only cares the scrap value, i.e., min
σ∈Γσ (Ns )
Jσ =
min
σ∈Γσ (Ns )
xT (tf )QT x(tf ) ,
(3)
where QT ≥ 0 is the weighting matrix of the switching signal. In summary, the objective of this paper is to solve Problem 1 stated below. Problem 1 Multi-objective switched cooperative game. Consider the switched dynamical systems (1) and players’ performance criteria (2)–(3), given the a priori time instants t0 , tf and initial condition x(t0 ) = x0 , the objective is to solve the interactive decision making problem with the help of cooperative game theory by finding suitable control laws ui , i = 1, 2, · · · , N and the switching signal σ(t).
3 3.1
Main Results Pareto Efficiency
Section 2 portraits a situation where there is more than one player affecting the system and these players decide to coordinate their actions in order to minimize their costs. By cooperation, in general, the cost a specific player incurs is not uniquely determined anymore. Therefore, the concept of Pareto efficient solution is introduced to appraise the multiple objectives brought by multi-player’s respective performance criterion.
PARETO EFFICIENCY OF SWITCHED DIFFERENTIAL GAME
177
Definition 3.1 (Pareto efficiency) An (N + 1)-tuple of strategies (σ ∗ , u∗1 , u∗2 , · · · , u∗N ) is Pareto efficient if the set of inequalities Ji (σ, u1 , u2 , · · · , uN ) ≤ Ji (σ ∗ , u∗1 , u∗2 , · · · , u∗N ),
i = 1, 2, · · · , N,
Jσ (σ, u1 , u2 , · · · , uN ) ≤ Jσ (σ ∗ , u∗1 , u∗2 , · · · , u∗N ),
(4) (5)
where at least one of the inequalities is strict, does not allow for any solution (σ, u1 , u2 , · · · , uN ) ∈ Γσ (Ns )×U1 ×U2 ×· · ·×UN . The corresponding point Jσ (σ ∗ , u∗1 , u∗2 , · · · , u∗N ), J1 (σ ∗ , u∗1 , u∗2 , · · · , u∗N ), · · · , JN (σ ∗ , u∗1 , u∗2 , · · · , u∗N ) ∈ RN +1 is called a Pareto solution, and the set of all Pareto solutions is called the Pareto frontier. Remark 3.2 As a generalization of conventional Pareto efficiency[24] into dynamics switching scenarios, Definition 3.1, taking the switching signal’s cost into consideration, defines a partial order relationship between any two aggregate objective vectors, which allows us only focus on solutions that cannot be improved upon by all players simultaneously. Lemma 3.3 An (N + 1)-tuple of strategies (σ ∗ , u∗1 , u∗2 , · · · , u∗N ) is Pareto efficient if there
N +1 exists αi ∈ (0, 1) with i=1 αi = 1 such that J(σ ∗ , u∗1 , u∗2 , · · · , u∗N ) ≤ J(σ, u1 , u2 , · · · , uN ), ∀(σ, u1 , u2 , · · · , uN ) ∈ Γσ (Ns ) × Γ1 × Γ2 × · · · × ΓN , where J(σ, u1 , u2 , · · · , uN ) =
N
i=1
(6)
αi Ji + αN +1 Jσ .
Proof Inspired by the works in [25] on the non-switching situations, this proof can be undertaken by contradiction. Assume (σ ∗ , u∗1 , u∗2 , · · · , u∗N ) minimizing J(σ, u1 , u2 , · · · , uN ) under a given set of convex combination coefficients (α1 , α2 , · · · , αN +1 ) is not Pareto efficient, then there exists another (N + 1)-tuple of strategies denoting as ( σ, u 1 , u 2 , · · · , u N ), such that Ji ( σ, u 1 , u 2 , · · · , u N ) ≤ Ji (σ ∗ , u∗1 , u∗2 , · · · , u∗N ),
i = 1, 2, · · · , N,
Jσ ( σ, u 1 , u 2 , · · · , u N ) ≤ Jσ (σ ∗ , u∗1 , u∗2 , · · · , u∗N ), where at least one inequality is strict. Then J( σ, u 1 , u 2 , · · · , u N ) =
N
αi Ji ( σ, u 1 , u 2 , · · · , u N ) + αN +1 Jσ ( σ, u 1 , u 2 , · · · , u N )
i=1
<
N
αi Ji (σ ∗ , u∗1 , u∗2 , · · · , u∗N ) + αN +1 Jσ (σ ∗ , u∗1 , u∗2 , · · · , u∗N )
i=1
= J(σ ∗ , u∗1 , u∗2 , · · · , u∗N ),
(7)
which is in contrast with the fact that (σ ∗ , u∗1 , u∗2 , · · · , u∗N ) minimizing the weighted sum of all players’ cost functions. Thus, (σ ∗ , u∗1 , u∗2 , · · · , u∗N ) is Pareto efficient. Remark 3.4 Lemma 3.3 states that every strategy tuple minimizing the weighted sum of all players’ cost functions is a Pareto solution of the original cooperative game. The reason
178
HUANG YABING · ZHAO JUN
why the original multi-objective interactive decision making problem can be converted into a family of parameterized single-objective problems lies in the cooperation among all the players including a switching signal and N controllers, that they agree to coordinate their strategies in view of optimizing a joint cost function. Even though each player has its own cost function which differs more or less from that of other player, players in cooperative game, modelled as a group rather than individuals, seek for the strategies that lead to the best outcome for the whole group, which makes it possible to transform the decision problem by integrating the whole group’s preference information among multiple cost functions. For the sake of simplicity, we concentrate on the case of only two subsystems where the switching only occurs once, then without loss of generality, it is supposed that subsystem 1 is active on the interval t ∈ [t0 , ts ) and subsystem 2 is active on the interval t ∈ [ts , tf ], where ts denotes the switching instant. According to Lemma 3.3, the original Problem 1 can be parameterized as a family of single-objective optimal control problems. Problem 2 Equivalent single-objective optimal control problem. For the switched system x˙ = A1 x + B 1 u,
t ∈ [t0 , ts ),
(8)
x˙ = A2 x + B 2 u,
t ∈ [ts , tf ],
(9)
where B j := [B1,j , B2,j , · · · , BN,j ] for j = 1, 2 denotes the augmented control coefficient matrix T T T and u := [uT 1 , u2 , · · · , uN ] denotes aggregated control input, find a switching instant ts and N control inputs ui (t), i = 1, 2, · · · , N, t ∈ [t0 , tf ] such that the weighted sum of cost functions under the given combination coefficients (α1 , α2 , · · · , αN +1 ), J(ts , u1 , u2 , · · · , uN ) =
N
αi Ji + αN +1 Jσ
i=1 tf
= t0
xT Qx + uT Ru dt + xT (tf )QT x(tf )
(10)
is minimized, where Q := N i=1 αi Qi , R := diag(α1 R1 , α2 R2 , · · · , αN RN ) and QT := αN +1 QT are weighting matrices. Here t0 , tf and x(t0 ) = x0 are given a priori. 3.2
Parameterization of the Switching Instant
Following the parameterization approach of switching instants proposed by [19], an auxiliary state variable xn+1 is introduced corresponding to the switching instant ts , which satisfies dxn+1 = 0, dt xn+1 (0) = ts .
(11) (12)
Next, a new independent time variable τ is introduced to establish the piecewise linear mapping relationship between t and τ as τ ∈ [0, 1] , t0 + (xn+1 − t0 ) · τ, t= (13) xn+1 + (tf − xn+1 ) · (τ − 1), τ ∈ [1, 2] .
PARETO EFFICIENCY OF SWITCHED DIFFERENTIAL GAME
179
With the aim of auxiliary state variable xn+1 and auxiliary time variable τ , Problem 2 can be transcribed into the following equivalent problem. Problem 3 Parameterized equivalent single-objective optimal control problem. For the switched dynamical systems dx = (xn+1 − t0 ) · (A1 x + B 1 u), dτ dxn+1 = 0, dτ
(14) (15)
when τ ∈ [0, 1] and dx = (tf − xn+1 ) · (A2 x + B 2 u), (16) dτ dxn+1 = 0, (17) dτ when τ ∈ [1, 2], find a switching instant ts and an augmented control input u(τ ), τ ∈ [0, 2] such that the cost functional 1 Ji = (xn+1 − t0 ) · (xT Qx + uT Ru) dτ 0
+ 1
2
(tf − xn+1 ) · (xT Qx + uT Ru) dτ + xT (tf )QT x(tf )
(18)
is minimized. Here, t0 , tf and x(0) = x0 are given a priori. Remark 3.5 The transformation from Problem 2 into the equivalent Problem 3 makes it possible to eliminate the undetermined switching instant from the range of integration, which means that it no longer has a varying switching instant in the τ -domain but with an unknown constant xn+1 representing the varying switching instant through τ ∈ [0, 2]. Then Problem 3 is regarded as an optimal control problem parameterized by the switching instant xn+1 . Although here in this paper only one switching case is considered, in the case of more than one switching, if we apply similar transcriptions, the original problem could still be transformed into a family of parameterized optimal control problems by introducing corresponding number of auxiliary variables. Assume that a fixed xn+1 is given, according to Bellman’s principle of optimality[26] , the HJB equations are obtained as follows: ∂V ∗ (x, τ, xn+1 ) ∂τ
T ∂V ∗ (x, τ, xn+1 ) (A1 x + B 1 u) = min (xn+1 − t0 ) · xT Qx + uT Ru + ∂x u(τ ) −
(19)
when τ ∈ [0, 1), and ∂V ∗ (x, τ, xn+1 ) ∂τ
T ∂V ∗ T T (x, τ, xn+1 ) (A2 x + B 2 u) = min (tf − xn+1 ) · x Qx + u Ru + ∂x u(τ ) −
(20)
HUANG YABING · ZHAO JUN
180
when τ ∈ [1, 2]. Taking xn+1 as a parameter, we can obtain the parameterized Hamiltonian function as ∂V ∗ H x, τ, xn+1 , u, ∂x ⎧ T ∂V ∗ ⎪ T T ⎪ ⎪ ⎨ (xn+1 − t0 ) · x Qx + u Ru + ∂x (x, τ, xn+1 ) (A1 x + B 1 u) , τ ∈ [0, 1), (21) T ∗ ⎪ ⎪ ∂V T T ⎪ (t − x ⎩ (x, τ, xn+1 ) (A2 x + B 2 u) , τ ∈ [1, 2]. f n+1 ) · x Qx + u Ru + ∂x According to the stationarity conditions, associated feedback control policies are given by 1 −1 T ∂V ∗ ∂H = 0 ⇒ u = − R Bj · (x, τ, xn+1 ), ∂u 2 ∂x
j = 1, 2,
(22)
and substituting (22) into (19)–(20), one can obtain the parameterized HJB equations as follows: ∂V ∗ (x, τ, xn+1 ) ∂τ T T ∗ ∂V ∗ 1 ∂V ∗ −1 T ∂V =(xn+1 − t0 ) xT Qx + A1 x − B1R B1 ∂x 4 ∂x ∂x −
(23)
in the interval τ ∈ [0, 1) and ∂V ∗ (x, τ, xn+1 ) ∂τ T T ∗ ∂V ∗ 1 ∂V ∗ −1 T ∂V T =(tf − xn+1 ) x Qx + A2 x − B2R B2 ∂x 4 ∂x ∂x −
(24)
in the interval τ ∈ [1, 2]. From the observation that the switched environment dynamics are linear in the state and control variables, and that the weighted performance criterion is in linear quadratic form, it is reasonable to assume that the value function also takes the linear quadratic form, i.e., V (x, τ, xn+1 ) = xT P (τ, xn+1 )x,
(25)
where P (τ, xn+1 ) is a positive definite symmetric matrix, i.e., P T (τ, xn+1 ) = P (τ, xn+1 ). Then the partial differential equations can be degenerated into parameterized Riccati equations as follows: −
∂P −1 T = (xn+1 − t0 )(AT B1 P ) 1 P + P A1 + Q − P B 1 R ∂τ
(26)
−
∂P −1 T = (tf − xn+1 )(AT B2 P ) 2 P + P A2 + Q − P B 2 R ∂τ
(27)
for τ ∈ [0, 1), and
for τ ∈ [1, 2]. Along with the boundary conditions P (2, xn+1 ) = QT ,
(28)
PARETO EFFICIENCY OF SWITCHED DIFFERENTIAL GAME
181
we can solve for a fixed xn+1 backward in τ and obtain the parameterized optimal value at τ = 0, i.e., the optimal J under fixed switching instant xn+1 , as J(xn+1 ) = xT 0 P (0, xn+1 )x0 .
(29)
From (29), we can get the gradient information of J(xn+1 ) with respect to the switching instant xn+1 as dJ ∂P (0, xn+1 ) (xn+1 ) = xT · x0 , 0 · dxn+1 ∂xn+1
(30)
at (0, xn+1 ) has been calculated first. Differentiating (26)–(27) as long as the value of ∂x∂P n+1 with respect to xn+1 leads to the following evolution dynamics of ∂x∂P by using the fact that n+1 ∂ ∂ ∂ ∂ ( ) = ( ): ∂xn+1 ∂τ ∂τ ∂xn+1 −
∂ ∂τ
∂P (τ, xn+1 ) ∂xn+1
−1
T
=(AT B1 P ) 1 P + P A1 + Q − P B 1 R ∂P ∂P −1 T −1 T ∂P T ∂P + A1 − B1R B1 P − P B1R B1 + (xn+1 − t0 ) A1 ∂xn+1 ∂xn+1 ∂xn+1 ∂xn+1
(31)
when τ ∈ [0, 1), and ∂ ∂P (τ, xn+1 ) − ∂τ ∂xn+1 −1
T
= − (AT B2 P ) 2 P + P A2 + Q − P B 2 R ∂P ∂P −1 T −1 T ∂P T ∂P + A2 − B2R B2 P − P B2R B2 + (tf − xn+1 ) A2 ∂xn+1 ∂xn+1 ∂xn+1 ∂xn+1
(32)
when τ ∈ [1, 2). Equations (26)–(27) and (31)–(32) together with the following boundary conditions P (2, xn+1 ) = QT ,
(33)
∂P (2, xn+1 ) = 0, ∂xn+1
(34)
and the following continuity condition P (1− , xn+1 ) = P (1+ , xn+1 ),
(35)
even though are almost impossible to obtain the analytic solution, can be numerically solved by using Runge-Kutta method. Remark 3.6 In this paper, we consider two-subsystems case only for the sake of brevity. As for the case with several subsystems and more than one switchings, the above procedure is still theoretically valid, except for the increasing complexity of computation.
182
HUANG YABING · ZHAO JUN
Remark 3.7 We do not consider the aspect of selecting a particular solution on the Pareto frontier. This is generally posed as a function optimization problem, see bargaining in game theory[9] and compromise solutions in multi-objective or vector-valued optimization[27] . In summary, the calculation of Problem 1 can be realized by the following algorithm. Algorithm 1 Step 1 Initialization. Specify the information given a priori including: The dynamical matrices of subsystems, the weighting matrices of each player’s performance criterion Qi , Ri , QT , the initial point x0 and the prescribed switching order among subsystems. Step 2 Generating the convex weighting coefficient tuples (α1 , α2 , · · · , αN +1 ). Step 3 Optimization under the given convex coefficient. 1) Compute the weighted matrices Qi , Ri and QT for each given convex coefficient. 2) Make an initial guess of xn+1 ∈ [t0 , tf ]. 3) Calculate the performance criterion’s value J and gradient with respect to xn+1 by solving equations (26)–(27) and (31)–(32) along with (33)–(35). 4) Update the value of xn+1 with the gradient information obtained in 3). 5) Repeat 3)–4) until the termination condition is satisfied, and then calculate and output the optimal switching instant ts based on (13), together with values of performance index for each player. Step 4 Constructing the approximate Pareto frontier. Remark 3.8 The reason why the Pareto frontier constructed in Step 4 is only an approximation is that the solutions obtained in Step 3 are only of finite number due to the limited computation capacity, and there always exists approximating error in the procedure of depicting a surface with only a finite number of points. Intuitively, the more solutions are calculated, the less approximating error it will be, if the time-complexity on the computation level is ignored.
4
Numerical Example
In this section, a numerical example is presented to validate the effectiveness of the method proposed in Section 3. Consider the switched system consisting of ⎡ ⎤ ⎡ ⎤ ⎡ ⎤ 1 0.5 1 1 ⎦ x + ⎣ ⎦ u1 + ⎣ ⎦ u2 , Σ1 : x˙ = ⎣ 0.5 −1 0 1 ⎡ ⎤ ⎡ ⎤ ⎡ ⎤ −1 0.5 0 1 ⎦ x + ⎣ ⎦ u1 + ⎣ ⎦ u2 . Σ2 : x˙ = ⎣ 0.5 1 1 1 Assume t0 = 0, tf = 3 and the system switches once at t = ts ∈ [t0 , tf ] from subsystem 1 to
PARETO EFFICIENCY OF SWITCHED DIFFERENTIAL GAME
183
subsystem 2. The performance criteria for u1 and u2 are ⎡ ⎤ tf 2 0.5 T 2 ⎣ ⎦ J1 = x (t) x(t) + u1 (t) dt , t0 0.5 1 ⎡ ⎤ tf 1 0.5 T 2 ⎣ ⎦ x (t) x(t) + 2u1 (t) dt, J2 = t0 0.5 2 respectively, and the performance criterion for the switching signal is ⎡ ⎤ 1 ⎦ x(tf ). Jσ = x(tf ) 2 = xT (tf ) ⎣ 1 The objective is to find the Pareto efficient switching instant ts and control inputs u1 (t), u2 (t). We choose 240 different weighting coefficient tuples (α1 , α2 , α3 ), then following the Algorithm 1 in Subsection 3.2, the corresponding solutions are obtained by solving a family of parameterized equivalent single-objective optimal control problems and plotted in Figure 1, and the projections on plane J1 -J2 , J1 -J3 and J2 -J3 are given in Figures 2–4, respectively. It can be seen from Figure 1 that the magnitude of the switching signal’s performance index is notably less than that of other two players’, which is caused by the special form of the switching signal’s performance index, i.e., the switching signal cares only the scrap value of the state trajectory over the time interval [t0 , tf ]. Players u1 and u2 are concerned partially with the accumulation of weighted sum of state components, then intuitively the scarp value would, to a certain extent, reduce along with the process u1 and u2 pursue the minimization of their own performance indices. It is easy to verify that the solutions are all non-dominated, therefore are optimal in the sense of Pareto. Here 4 typical solutions are chosen and marked in Figures 1–4 with different symbols to illustrate the non-dominant relationship between any pair of Pareto efficient solutions: 3 minimum points (Points 1–3) along each axis direction respectively, and 1 solution (Point 4) giving consideration to all performance criteria simultaneously and equally. The corresponding switching instants and values of different performance indices are listed in Table 1. Table 1 vividly portraits the feature of Pareto efficiency: Pareto efficient solutions are a subset of solutions that describe the best trade-off between objectives in multi-objective decision making problems. Take Point 1 for example. Due to the largish weight allocated to J1 , the whole group consisting of u1 , u2 and σ pays almost all attention to the minimization of J1 . Meanwhile, the values of the other two performance criteria, J2 and Jσ , occupy a much small portion of the weighted performance criterion J, thus the increases in J2 and Jσ only have little influence on the value of J. Then Point 1 has the minimum value of J1 but relatively large values of J2 and Jσ , which can also be validated by Figures 2–3. The similar analysis also applies to Point 2 and Point 3. As for Point 4, since the weighting coefficients are chosen equally as α1 = α2 = α3 = 13 , the corresponding Pareto efficient solution demonstrate the impartial trade-off among the 3 conflicting objectives.
HUANG YABING · ZHAO JUN
184
x 10
−3
1.4
1.2 Pareto efficient solutions Point 1 Point 2 Point 3 Point 4
1
Jσ
0.8
0.6
0.4
0.2
0 0 5 10 15 20 J
Figure 1
4.5
3.5
4
3
J 2
2
2.5
1.5
0.5
1
1
The Pareto efficient solutions obtained by Algorithm 1
20 Pareto efficient solutions Point 1 Point 2 Point 3 Point 4
18
16
J
2
14
12
10
8
6
4 0.5
Figure 2
1
1.5
2
2.5 J1
3
3.5
4
4.5
The projections of Pareto efficient solutions on J1 -J2 plane
PARETO EFFICIENCY OF SWITCHED DIFFERENTIAL GAME
1.4
x 10
−3
1.2
1
J
σ
0.8
0.6
Pareto efficient solutions Point 1 Point 2 Point 3 Point 4
0.4
0.2
0 0.5
1
Figure 3 1.4
x 10
1.5
2
2.5 J1
3
3.5
4
4.5
The projections of Pareto efficient solutions on J1 -J3 plane
−3
Pareto efficient solutions Point 1 Point 2 Point 3 Point 4
1.2
1
J
σ
0.8
0.6
0.4
0.2
0
4
6
8
10
12 J
14
16
18
20
2
Figure 4
The projections of Pareto efficient solutions on J2 -J3 plane
185
HUANG YABING · ZHAO JUN
186 Table 1
Switching instants and performance values of 4 typical solutions ts
J1
J2
Jσ
Point 1
2.3912
0.7060
19.1133
0.0005
Point 2
2.7006
4.1355
4.2988
0.0013
Point 3
2.0242
0.9586
11.9312
0.0001
Point 4
2.6385
2.8457
5.1511
0.0006
Even though only 240 points are presented here, it is still obvious that Pareto efficient solutions could constitute a convex surface if the amount of Pareto efficient solutions trends to infinity, which is in accord with the fact that the original switched cooperative game is a convex optimization problem.
5
Conclusion
In this paper, the cooperative differential game under switched dynamics was considered. The definition of Pareto efficiency is extended to switched scenarios, which lays the foundation of tackling cooperative game under the dynamics switching situation. The original multiobjective cooperative game problem is transformed equivalently into a family of single-objective optimal problems by introducing the preference information, i.e., convex combination coefficient tuples, and furthermore transcribed into a family of parameterized single-objective optimal problems with the aim of auxiliary variables representing unknown switching instants. Dynamic programming is adopted afterwards to tackle the optimization process and the Pareto frontier can be constructed accordingly. Finally, the effectiveness of the methods is illustrated by a numerical example.
References [1] [2] [3] [4]
[5] [6]
Li Y, Quevedo D E, Dey S, et al., A game-theoretic approach to fake-acknowledgment attack on cyber-physical systems, IEEE Trans. Signal Inform. Process. Netw., 2017, 3(1): 1–11. Liu X, Dong M, Ota K, et al., Service pricing decision in cyber-physical systems: Insights from game theory, IEEE Trans. Serv. Comput., 2016, 9(2): 186–198. Mylvaganam T and Astolfi A, Control of microgrids using a differential game theoretic framework, IEEE Conf. Decis. Control, Osaka, Japan, 2015. Saad W, Han Z, Poor H V, et al., Game-theoretic methods for the smart grid: An overview of microgrid systems, demand-side management, and smart grid communications, IEEE Signal Process. Mag., 2012, 29(5): 86–105. Srikantha P and Kundur D, A DER attack-mitigation differential game for smart grid security analysis, IEEE Trans. Smart Grid, 2016, 7(3): 1476–1485. Mei S, Zhang D, Wang Y, et al., Robust optimization of static reserve planning with large-scale
PARETO EFFICIENCY OF SWITCHED DIFFERENTIAL GAME
[7] [8] [9] [10] [11] [12] [13] [14] [15] [16] [17]
[18]
[19] [20] [21] [22] [23] [24] [25] [26] [27]
187
integration of wind power: A game theoretic approach, IEEE Trans. Sustainable Enery, 2014, 5(2): 535–545. Li Y, Mu Y, Yuan S, et al., The game theoretical approach for multi-phase complex systems in chemical engineering, Journal of Systems Science & Complexity, 2017, 30(1): 4–19. Ba¸sar T and Olsder G J, Dynamic Noncooperative Game Theory, SIAM, Philadelphia, 1998. Engwerda J, LQ Dynamic Optimization and Differential Games, John Wiley & Sons, Chichester, 2005. Ho Y C, Bryson A, and Baron S, Differential games and optimal pursuit-evasion strategies, IEEE Trans. Autom. Control, 1965, 10(4): 385–389. Garcia E, Casbeer D W, and Pachter M, Active target defence differential game: Fast defender case, IET Contr. Theory Appl., 2017, 11(17): 2985–2993. Mylvaganam T, Sassano M, and Astolfi A, A differential game approach to multi-agent collision avoidance, IEEE Trans. Autom. Control, 2017, 62(8): 4229–4235. Li S and Hou C, On the collision detection problem of two moving objects described by algebraic sets, Journal of Systems Science & Complexity, 2007, 20(1): 108–118. Semsar-Kazerooni E and Khorasani K, Multi-agent team cooperation: A game theory approach, Automatica, 2009, 45(10): 2205–2213. Gao H, Petrosyan L, Qiao H, et al., Cooperation in two-stage games on undirected networks, Journal of Systems Science & Complexity, 2017, 30(3): 680–693. Zhang Q, H2 /H∞ control problems of backward stochastic systems, Journal of Systems Science & Complexity, 2014, 27(5): 899–910. Sethi S P and Zhang Q, Asymptotic optimal controls in stochastic manufacturing systems with machine failures dependent on production rates, Stochastics and Stochastic Reports, 1994, 48(1– 2): 97–121. Eklund J M, Sprinkle J, and Sastry S S, Switched and symmetric pursuit/evasion games using online model predictive control with application to autonomous aircraft, IEEE Trans. Control Syst. Technol., 2012, 20(3): 604–620. Xu X and Antsaklis P J, Optimal control of switched systems based on parameterization of the switching instants, IEEE Trans. Autom. Control, 2004, 49(1): 2–16. Kamgarpour M and Tomlin C, On optimal control of non-autonomous switched systems with a fixed mode sequence, Automatica, 2012, 48(6): 1177–1181. Liberzon D, Switching in Systems and Control, Birkhuser, Boston, 2003. Zhao J and Hill D J, On stability, L2 -gain and H∞ control for switched systems, Automatica, 2008, 44(5): 1220–1232. Long L, Wang Z, and Zhao J, Switched adaptive control of switched nonlinearly parameterized systems with unstable subsystems, Automatica, 2015, 54(4): 217–228. Engwerda J, Necessary and sufficient conditions for pareto optimal solutions of cooperative differential games, SIAM J. Control Optim., 2010, 48(6): 3859–3881. Leitmann G, Cooperative and Non-Cooperative Many Players Differential Games, Springer, Vienna, 1974. Bertsekas D P, Dynamic Programming and Optimal Control, Athena Scientific Belmont, Belmont, 1995. Salukvadze M, Vector-Valued Optimization Problems in Control Theory, Academic Press, New York, 1979.