Pareto Efficiency of Finite Horizon Switched Linear Quadratic Differential Games

A switched linear quadratic (LQ) differential game over finite-horizon is investigated in this paper. The switching signal is regarded as a non-conven...

0 downloads 14 Views 365KB Size

Download PDF

J Syst Sci Complex (2018) 31: 173–187

Pareto Eﬃciency of Finite Horizon Switched Linear Quadratic Diﬀerential Games∗ HUANG Yabing · ZHAO Jun

DOI: 10.1007/s11424-018-7439-7 Received: 22 December 2017 / Revised: 27 December 2017 c The Editorial Oﬃce of JSSC & Springer-Verlag GmbH Germany 2018 Abstract A switched linear quadratic (LQ) diﬀerential game over ﬁnite-horizon is investigated in this paper. The switching signal is regarded as a non-conventional player, afterwards the deﬁnition of Pareto eﬃciency is extended to dynamics switching situations to characterize the solutions of this multi-objective problem. Furthermore, the switched diﬀerential game is equivalently transformed into a family of parameterized single-objective optimal problems by introducing preference information and auxiliary variables. This transformation reduces the computing complexity such that the Pareto frontier of the switched LQ diﬀerential game can be constructed by dynamic programming. Finally, a numerical example is provided to illustrate the eﬀectiveness. Keywords game.

1

Dynamic optimization, linear quadratic problems, Pareto eﬃciency, switched diﬀerential

Introduction

Game theory, as a branch of applied mathematics, has been applied to many real-world directions due to its superiority in modeling the complexly interactive decision making process consists of multiple players who are equipped with their own objectives respectively, such as cyber-physical systems[1, 2] , microgrids[3–5] , wind power systems[6] and chemical engineering processes[7] . As a powerful tool handling multi-player decision making over time, the concept of diﬀerential games is introduced to deal with the situations where the environment aﬀecting and being aﬀected by players is described by diﬀerential equations[8, 9] . As a result, the history of the environment state and that of each player’s decision should be taken into consideration before making the current action. The introduction of the diﬀerential equations makes it possible to HUANG Yabing · Zhao Jun (Corresponding author) College of Information Science and Engineering, Northeastern University, Shenyang 110819, China; State Key Laboratory of Synthetical Automation for Process Industries, Northeastern University, Shenyang 110819, China. Email: yabing [email protected]; [email protected]. ∗ This work was supported by the National Natural Science Foundation of China under Grant No. 61773098 and the 111 Project under Grant No. B16009. This paper was recommended for publication by Guest Editor LIU Tengfei.

174

HUANG YABING · ZHAO JUN

handle much more sophisticated situations, such as well-known pursuit-evasion problems[10, 11] , multi-agent collision avoidance problems[12, 13] , and cooperative control problems[14, 15] . Moreover, some classical control problems can also be converted to diﬀerential game problems, and further be solved from the perspective of game theory. In [16], based on diﬀerential game theory, the H2 /H∞ control problem of stochastic systems is solved by being formulated as a particular case of non-zero sum game with two players (a controller and an external disturbance) and two pay-oﬀ functions representing H2 and H∞ criteria. Taking a step back to look at the macroscopical picture, it is reasonable to say that diﬀerential game theory can be regarded as a generalization of optimal control. In optimal control, there is only one player attempting to optimize its performance criterion by seeking the best strategy subject to a dynamic constraint of environment state variable, while in diﬀerential game theory, there exist several players, each attempting to optimize its own performance criterion subject to the common dynamic constraint. This closely connected relationship makes it possible to take full advantage of mature techniques in the ﬁeld of optimal control, such as Pontryagin’s maximum principle and dynamic programming. The above mentioned works in [8–16] make a contribution to the development of diﬀerential game theory, however, they considered only the non-switched environment dynamics situations. It is worth pointing out that the environment dynamics may vary according to certain external parameters, or just switch from one operating mode to another. The piecewise deterministic diﬀerential game is considered in [17] where the state equation switches randomly among different dynamics at random instants, which is caused by the machine’s breakdown and repair of manufacturing systems. Taking pursuit-evasion game for another example, the evader in [18] can change its operating mode to become a pursuer if the switching criteria based on the current and future state of controlled vehicle is satisﬁed. Nevertheless, under some circumstances, the switching in system dynamics is neither stochastic nor internal forced (i.e., the switchings are generated implicity when the state trajectory intersects some switching sets), but is another instrument of control to be designed. Then it is reasonable to model the switching signal as an extra player who also makes an attempt to optimize its strategy by taking an active participation in the diﬀerential game consisting of conventional players. The fact that the switching signal is a continuous-in-time but discrete-invalue function, makes it quite diﬃcult to optimize the switching signal even in the unconstrained scenarios, let alone the coupled constrained problems in the framework of diﬀerential game. In order to handle this special character of the switching signal, the parameterization approach of switching instants proposed in [19] transcribes the undetermined switching instants into extended states. Then, the conventional optimal control techniques can be adopted to solve the optimal switching instants problem of the extended systems on assumption that the switching mode sequence is prescribed. Even though the result is in presentation of two subsystems with only one switching, the same procedure can be repeated such that the corresponding number of auxiliary states can be introduced to parameterize multiple switching instants. Under the same assumption, a novel method proposed in [20], without state extension, directly regards multiple switching instants as a decision vector. In these two papers, the optimal control prob-

PARETO EFFICIENCY OF SWITCHED DIFFERENTIAL GAME

175

lems of switched systems are composed of only one controller and a switching signal, where the switching signal is not taken as a player, therefore does not have its own performance index. In this paper, in contrast, besides the presentation of multiple controllers, a special form of performance criterion is constructed to evaluate switching law’s quality, and to act as a guideline for the design of the switching signal. Motivated by the analysis above, a switched LQ diﬀerential game over ﬁnite horizon is considered in this paper. The contributions of this paper are threefold: • Compared with the conventional diﬀerential games, the dynamics of state evolution are generalized to switched scenarios, which makes it possible to handle much sophisticated real-world situations. • The switching signal, being another measure of control rather than being a stochastic variable nor being internal forced, greatly enlarges the freedom in the controller design procedure. • Paralleling the conventionally modelling control inputs as players, the switching signal is also regarded as a player pursuing the optimization of its own performance criterion. Then the cooperative game among players, involving the controllers and the switching signal, is considered to characterizethe interactive decision making process, which provides a new method for the simultaneous design of the multiple controllers and the switching signal. Throughout this paper some standard notations are utilized. The structure of this paper is organized as follows. The control problem is posed in Section 2, and the main result is derived in Section 3. In Section 4, a numerical example is given to validate the eﬀectiveness of our results and we make a concluding remark in Section 5.

2

Problem Statement

Consider N -person ﬁnite-horizon switched LQ diﬀerential game whose environment dynamics are described by the switched linear diﬀerential equation as below: x˙ = Aσ x +

N

Bi,σ ui ,

(1)

i=1

where x ∈ Rn and ui ∈ Rmi denote the state of system and ith player’s action, respectively. Aj , Bi,j , ∀j ∈ {1, 2, · · · , M } are dynamical matrices of appropriate dimensions, with M denoting the number of subsystems. σ(t) : R+ → {1, 2, · · · , M } is a piecewise constant switching signal orchestrating the activation among subsystems. Assumption 1

σ has ﬁnite number of switchings on any ﬁnite interval of time.

This is a standard assumption in the switched systems literatures [21–23] to rule out Zeno behavior. How to solve the optimization of the switching signal with Zeno behavior is a challenging topic which is beyond the scope of this paper. Under this non-Zeno assumption, a

HUANG YABING · ZHAO JUN

176

switching sequence can be written as σ(t) = (t0 , i0 ), (t1 , i1 ), · · · , (tNs , iNs ) , where the subscript Ns is a prescribed number of switchings. Also, we assume ik−1 = ik for any k ∈ {1, 2, · · · , Ns }, then the admissible strategy set for the switching signal is deﬁned as Γσ (Ns ) = (t0 , i0 ), (t1 , i1 ), · · · , (tNs , iNs ) | t0 < t1 < · · · < tNs , ij ∈ {1, 2, · · · , M }, ∀j = 0, 1, · · · , Ns . Each player is equipped with its own performance criterion over ﬁnite horizon which it would like to minimize. Here the LQ form is adopted as follows: tf T min Ji = min x (t)Qi x(t) + uT (2) i (t)Ri ui (t) dt, ui ∈Ui

ui ∈Ui

t0

where Qi ≥ 0 and Ri > 0 are ith player’s weighting matrices, which describes its preference between running state x(t) and its own control input ui (t), and that between the components of them. The admissible set Ui of ui is deﬁned as Ui {ui | ui ∈ Cp [t0 , tf ], u(t) ∈ Rmi }, that is to say, ith player can only choose strategy from the set of piecewise continuous functions for t ∈ [t0 , tf ] that take value in Rmi . Compared with the conventional diﬀerential game theory, the introduction of the switching signal not only leads to a sophisticated dynamical behavior of state evolution, but has a direct inﬂuence on the process that each player pursues its objective, so the switching signal is also modelled as a player who only cares the scrap value, i.e., min

σ∈Γσ (Ns )

Jσ =

min

σ∈Γσ (Ns )

xT (tf )QT x(tf ) ,

(3)

where QT ≥ 0 is the weighting matrix of the switching signal. In summary, the objective of this paper is to solve Problem 1 stated below. Problem 1 Multi-objective switched cooperative game. Consider the switched dynamical systems (1) and players’ performance criteria (2)–(3), given the a priori time instants t0 , tf and initial condition x(t0 ) = x0 , the objective is to solve the interactive decision making problem with the help of cooperative game theory by ﬁnding suitable control laws ui , i = 1, 2, · · · , N and the switching signal σ(t).

3 3.1

Main Results Pareto Eﬃciency

Section 2 portraits a situation where there is more than one player aﬀecting the system and these players decide to coordinate their actions in order to minimize their costs. By cooperation, in general, the cost a speciﬁc player incurs is not uniquely determined anymore. Therefore, the concept of Pareto eﬃcient solution is introduced to appraise the multiple objectives brought by multi-player’s respective performance criterion.

PARETO EFFICIENCY OF SWITCHED DIFFERENTIAL GAME

177

Deﬁnition 3.1 (Pareto eﬃciency) An (N + 1)-tuple of strategies (σ ∗ , u∗1 , u∗2 , · · · , u∗N ) is Pareto eﬃcient if the set of inequalities Ji (σ, u1 , u2 , · · · , uN ) ≤ Ji (σ ∗ , u∗1 , u∗2 , · · · , u∗N ),

i = 1, 2, · · · , N,

Jσ (σ, u1 , u2 , · · · , uN ) ≤ Jσ (σ ∗ , u∗1 , u∗2 , · · · , u∗N ),

(4) (5)

where at least one of the inequalities is strict, does not allow for any solution (σ, u1 , u2 , · · · , uN ) ∈ Γσ (Ns )×U1 ×U2 ×· · ·×UN . The corresponding point Jσ (σ ∗ , u∗1 , u∗2 , · · · , u∗N ), J1 (σ ∗ , u∗1 , u∗2 , · · · , u∗N ), · · · , JN (σ ∗ , u∗1 , u∗2 , · · · , u∗N ) ∈ RN +1 is called a Pareto solution, and the set of all Pareto solutions is called the Pareto frontier. Remark 3.2 As a generalization of conventional Pareto eﬃciency[24] into dynamics switching scenarios, Deﬁnition 3.1, taking the switching signal’s cost into consideration, deﬁnes a partial order relationship between any two aggregate objective vectors, which allows us only focus on solutions that cannot be improved upon by all players simultaneously. Lemma 3.3 An (N + 1)-tuple of strategies (σ ∗ , u∗1 , u∗2 , · · · , u∗N ) is Pareto eﬃcient if there

N +1 exists αi ∈ (0, 1) with i=1 αi = 1 such that J(σ ∗ , u∗1 , u∗2 , · · · , u∗N ) ≤ J(σ, u1 , u2 , · · · , uN ), ∀(σ, u1 , u2 , · · · , uN ) ∈ Γσ (Ns ) × Γ1 × Γ2 × · · · × ΓN , where J(σ, u1 , u2 , · · · , uN ) =

N

i=1

(6)

αi Ji + αN +1 Jσ .

Proof Inspired by the works in [25] on the non-switching situations, this proof can be undertaken by contradiction. Assume (σ ∗ , u∗1 , u∗2 , · · · , u∗N ) minimizing J(σ, u1 , u2 , · · · , uN ) under a given set of convex combination coeﬃcients (α1 , α2 , · · · , αN +1 ) is not Pareto eﬃcient, then there exists another (N + 1)-tuple of strategies denoting as ( σ, u 1 , u 2 , · · · , u N ), such that Ji ( σ, u 1 , u 2 , · · · , u N ) ≤ Ji (σ ∗ , u∗1 , u∗2 , · · · , u∗N ),

i = 1, 2, · · · , N,

Jσ ( σ, u 1 , u 2 , · · · , u N ) ≤ Jσ (σ ∗ , u∗1 , u∗2 , · · · , u∗N ), where at least one inequality is strict. Then J( σ, u 1 , u 2 , · · · , u N ) =

N

αi Ji ( σ, u 1 , u 2 , · · · , u N ) + αN +1 Jσ ( σ, u 1 , u 2 , · · · , u N )

i=1

<

N

αi Ji (σ ∗ , u∗1 , u∗2 , · · · , u∗N ) + αN +1 Jσ (σ ∗ , u∗1 , u∗2 , · · · , u∗N )

i=1

= J(σ ∗ , u∗1 , u∗2 , · · · , u∗N ),

(7)

which is in contrast with the fact that (σ ∗ , u∗1 , u∗2 , · · · , u∗N ) minimizing the weighted sum of all players’ cost functions. Thus, (σ ∗ , u∗1 , u∗2 , · · · , u∗N ) is Pareto eﬃcient. Remark 3.4 Lemma 3.3 states that every strategy tuple minimizing the weighted sum of all players’ cost functions is a Pareto solution of the original cooperative game. The reason

178

HUANG YABING · ZHAO JUN

why the original multi-objective interactive decision making problem can be converted into a family of parameterized single-objective problems lies in the cooperation among all the players including a switching signal and N controllers, that they agree to coordinate their strategies in view of optimizing a joint cost function. Even though each player has its own cost function which diﬀers more or less from that of other player, players in cooperative game, modelled as a group rather than individuals, seek for the strategies that lead to the best outcome for the whole group, which makes it possible to transform the decision problem by integrating the whole group’s preference information among multiple cost functions. For the sake of simplicity, we concentrate on the case of only two subsystems where the switching only occurs once, then without loss of generality, it is supposed that subsystem 1 is active on the interval t ∈ [t0 , ts ) and subsystem 2 is active on the interval t ∈ [ts , tf ], where ts denotes the switching instant. According to Lemma 3.3, the original Problem 1 can be parameterized as a family of single-objective optimal control problems. Problem 2 Equivalent single-objective optimal control problem. For the switched system x˙ = A1 x + B 1 u,

t ∈ [t0 , ts ),

(8)

x˙ = A2 x + B 2 u,

t ∈ [ts , tf ],

(9)

where B j := [B1,j , B2,j , · · · , BN,j ] for j = 1, 2 denotes the augmented control coeﬃcient matrix T T T and u := [uT 1 , u2 , · · · , uN ] denotes aggregated control input, ﬁnd a switching instant ts and N control inputs ui (t), i = 1, 2, · · · , N, t ∈ [t0 , tf ] such that the weighted sum of cost functions under the given combination coeﬃcients (α1 , α2 , · · · , αN +1 ), J(ts , u1 , u2 , · · · , uN ) =

N

αi Ji + αN +1 Jσ

i=1 tf

= t0

xT Qx + uT Ru dt + xT (tf )QT x(tf )

(10)

is minimized, where Q := N i=1 αi Qi , R := diag(α1 R1 , α2 R2 , · · · , αN RN ) and QT := αN +1 QT are weighting matrices. Here t0 , tf and x(t0 ) = x0 are given a priori. 3.2

Parameterization of the Switching Instant

Following the parameterization approach of switching instants proposed by [19], an auxiliary state variable xn+1 is introduced corresponding to the switching instant ts , which satisﬁes dxn+1 = 0, dt xn+1 (0) = ts .

(11) (12)

Next, a new independent time variable τ is introduced to establish the piecewise linear mapping relationship between t and τ as τ ∈ [0, 1] , t0 + (xn+1 − t0 ) · τ, t= (13) xn+1 + (tf − xn+1 ) · (τ − 1), τ ∈ [1, 2] .

PARETO EFFICIENCY OF SWITCHED DIFFERENTIAL GAME

179

With the aim of auxiliary state variable xn+1 and auxiliary time variable τ , Problem 2 can be transcribed into the following equivalent problem. Problem 3 Parameterized equivalent single-objective optimal control problem. For the switched dynamical systems dx = (xn+1 − t0 ) · (A1 x + B 1 u), dτ dxn+1 = 0, dτ

(14) (15)

when τ ∈ [0, 1] and dx = (tf − xn+1 ) · (A2 x + B 2 u), (16) dτ dxn+1 = 0, (17) dτ when τ ∈ [1, 2], ﬁnd a switching instant ts and an augmented control input u(τ ), τ ∈ [0, 2] such that the cost functional 1 Ji = (xn+1 − t0 ) · (xT Qx + uT Ru) dτ 0

+ 1

2

(tf − xn+1 ) · (xT Qx + uT Ru) dτ + xT (tf )QT x(tf )

(18)

is minimized. Here, t0 , tf and x(0) = x0 are given a priori. Remark 3.5 The transformation from Problem 2 into the equivalent Problem 3 makes it possible to eliminate the undetermined switching instant from the range of integration, which means that it no longer has a varying switching instant in the τ -domain but with an unknown constant xn+1 representing the varying switching instant through τ ∈ [0, 2]. Then Problem 3 is regarded as an optimal control problem parameterized by the switching instant xn+1 . Although here in this paper only one switching case is considered, in the case of more than one switching, if we apply similar transcriptions, the original problem could still be transformed into a family of parameterized optimal control problems by introducing corresponding number of auxiliary variables. Assume that a ﬁxed xn+1 is given, according to Bellman’s principle of optimality[26] , the HJB equations are obtained as follows: ∂V ∗ (x, τ, xn+1 ) ∂τ

T ∂V ∗ (x, τ, xn+1 ) (A1 x + B 1 u) = min (xn+1 − t0 ) · xT Qx + uT Ru + ∂x u(τ ) −

(19)

when τ ∈ [0, 1), and ∂V ∗ (x, τ, xn+1 ) ∂τ

T ∂V ∗ T T (x, τ, xn+1 ) (A2 x + B 2 u) = min (tf − xn+1 ) · x Qx + u Ru + ∂x u(τ ) −

(20)

HUANG YABING · ZHAO JUN

180

when τ ∈ [1, 2]. Taking xn+1 as a parameter, we can obtain the parameterized Hamiltonian function as ∂V ∗ H x, τ, xn+1 , u, ∂x ⎧ T ∂V ∗ ⎪ T T ⎪ ⎪ ⎨ (xn+1 − t0 ) · x Qx + u Ru + ∂x (x, τ, xn+1 ) (A1 x + B 1 u) , τ ∈ [0, 1), (21) T ∗ ⎪ ⎪ ∂V T T ⎪ (t − x ⎩ (x, τ, xn+1 ) (A2 x + B 2 u) , τ ∈ [1, 2]. f n+1 ) · x Qx + u Ru + ∂x According to the stationarity conditions, associated feedback control policies are given by 1 −1 T ∂V ∗ ∂H = 0 ⇒ u = − R Bj · (x, τ, xn+1 ), ∂u 2 ∂x

j = 1, 2,

(22)

and substituting (22) into (19)–(20), one can obtain the parameterized HJB equations as follows: ∂V ∗ (x, τ, xn+1 ) ∂τ T T ∗ ∂V ∗ 1 ∂V ∗ −1 T ∂V =(xn+1 − t0 ) xT Qx + A1 x − B1R B1 ∂x 4 ∂x ∂x −

(23)

in the interval τ ∈ [0, 1) and ∂V ∗ (x, τ, xn+1 ) ∂τ T T ∗ ∂V ∗ 1 ∂V ∗ −1 T ∂V T =(tf − xn+1 ) x Qx + A2 x − B2R B2 ∂x 4 ∂x ∂x −

(24)

in the interval τ ∈ [1, 2]. From the observation that the switched environment dynamics are linear in the state and control variables, and that the weighted performance criterion is in linear quadratic form, it is reasonable to assume that the value function also takes the linear quadratic form, i.e., V (x, τ, xn+1 ) = xT P (τ, xn+1 )x,

(25)

where P (τ, xn+1 ) is a positive deﬁnite symmetric matrix, i.e., P T (τ, xn+1 ) = P (τ, xn+1 ). Then the partial diﬀerential equations can be degenerated into parameterized Riccati equations as follows: −

∂P −1 T = (xn+1 − t0 )(AT B1 P ) 1 P + P A1 + Q − P B 1 R ∂τ

(26)

−

∂P −1 T = (tf − xn+1 )(AT B2 P ) 2 P + P A2 + Q − P B 2 R ∂τ

(27)

for τ ∈ [0, 1), and

for τ ∈ [1, 2]. Along with the boundary conditions P (2, xn+1 ) = QT ,

(28)

PARETO EFFICIENCY OF SWITCHED DIFFERENTIAL GAME

181

we can solve for a ﬁxed xn+1 backward in τ and obtain the parameterized optimal value at τ = 0, i.e., the optimal J under ﬁxed switching instant xn+1 , as J(xn+1 ) = xT 0 P (0, xn+1 )x0 .

(29)

From (29), we can get the gradient information of J(xn+1 ) with respect to the switching instant xn+1 as dJ ∂P (0, xn+1 ) (xn+1 ) = xT · x0 , 0 · dxn+1 ∂xn+1

(30)

at (0, xn+1 ) has been calculated ﬁrst. Diﬀerentiating (26)–(27) as long as the value of ∂x∂P n+1 with respect to xn+1 leads to the following evolution dynamics of ∂x∂P by using the fact that n+1 ∂ ∂ ∂ ∂ ( ) = ( ): ∂xn+1 ∂τ ∂τ ∂xn+1 −

∂ ∂τ

∂P (τ, xn+1 ) ∂xn+1

−1

T

=(AT B1 P ) 1 P + P A1 + Q − P B 1 R ∂P ∂P −1 T −1 T ∂P T ∂P + A1 − B1R B1 P − P B1R B1 + (xn+1 − t0 ) A1 ∂xn+1 ∂xn+1 ∂xn+1 ∂xn+1

(31)

when τ ∈ [0, 1), and ∂ ∂P (τ, xn+1 ) − ∂τ ∂xn+1 −1

T

= − (AT B2 P ) 2 P + P A2 + Q − P B 2 R ∂P ∂P −1 T −1 T ∂P T ∂P + A2 − B2R B2 P − P B2R B2 + (tf − xn+1 ) A2 ∂xn+1 ∂xn+1 ∂xn+1 ∂xn+1

(32)

when τ ∈ [1, 2). Equations (26)–(27) and (31)–(32) together with the following boundary conditions P (2, xn+1 ) = QT ,

(33)

∂P (2, xn+1 ) = 0, ∂xn+1

(34)

and the following continuity condition P (1− , xn+1 ) = P (1+ , xn+1 ),

(35)

even though are almost impossible to obtain the analytic solution, can be numerically solved by using Runge-Kutta method. Remark 3.6 In this paper, we consider two-subsystems case only for the sake of brevity. As for the case with several subsystems and more than one switchings, the above procedure is still theoretically valid, except for the increasing complexity of computation.

182

HUANG YABING · ZHAO JUN

Remark 3.7 We do not consider the aspect of selecting a particular solution on the Pareto frontier. This is generally posed as a function optimization problem, see bargaining in game theory[9] and compromise solutions in multi-objective or vector-valued optimization[27] . In summary, the calculation of Problem 1 can be realized by the following algorithm. Algorithm 1 Step 1 Initialization. Specify the information given a priori including: The dynamical matrices of subsystems, the weighting matrices of each player’s performance criterion Qi , Ri , QT , the initial point x0 and the prescribed switching order among subsystems. Step 2 Generating the convex weighting coeﬃcient tuples (α1 , α2 , · · · , αN +1 ). Step 3 Optimization under the given convex coeﬃcient. 1) Compute the weighted matrices Qi , Ri and QT for each given convex coeﬃcient. 2) Make an initial guess of xn+1 ∈ [t0 , tf ]. 3) Calculate the performance criterion’s value J and gradient with respect to xn+1 by solving equations (26)–(27) and (31)–(32) along with (33)–(35). 4) Update the value of xn+1 with the gradient information obtained in 3). 5) Repeat 3)–4) until the termination condition is satisﬁed, and then calculate and output the optimal switching instant ts based on (13), together with values of performance index for each player. Step 4 Constructing the approximate Pareto frontier. Remark 3.8 The reason why the Pareto frontier constructed in Step 4 is only an approximation is that the solutions obtained in Step 3 are only of ﬁnite number due to the limited computation capacity, and there always exists approximating error in the procedure of depicting a surface with only a ﬁnite number of points. Intuitively, the more solutions are calculated, the less approximating error it will be, if the time-complexity on the computation level is ignored.

4

Numerical Example

In this section, a numerical example is presented to validate the eﬀectiveness of the method proposed in Section 3. Consider the switched system consisting of ⎡ ⎤ ⎡ ⎤ ⎡ ⎤ 1 0.5 1 1 ⎦ x + ⎣ ⎦ u1 + ⎣ ⎦ u2 , Σ1 : x˙ = ⎣ 0.5 −1 0 1 ⎡ ⎤ ⎡ ⎤ ⎡ ⎤ −1 0.5 0 1 ⎦ x + ⎣ ⎦ u1 + ⎣ ⎦ u2 . Σ2 : x˙ = ⎣ 0.5 1 1 1 Assume t0 = 0, tf = 3 and the system switches once at t = ts ∈ [t0 , tf ] from subsystem 1 to

PARETO EFFICIENCY OF SWITCHED DIFFERENTIAL GAME

183

subsystem 2. The performance criteria for u1 and u2 are ⎡ ⎤ tf 2 0.5 T 2 ⎣ ⎦ J1 = x (t) x(t) + u1 (t) dt , t0 0.5 1 ⎡ ⎤ tf 1 0.5 T 2 ⎣ ⎦ x (t) x(t) + 2u1 (t) dt, J2 = t0 0.5 2 respectively, and the performance criterion for the switching signal is ⎡ ⎤ 1 ⎦ x(tf ). Jσ = x(tf ) 2 = xT (tf ) ⎣ 1 The objective is to ﬁnd the Pareto eﬃcient switching instant ts and control inputs u1 (t), u2 (t). We choose 240 diﬀerent weighting coeﬃcient tuples (α1 , α2 , α3 ), then following the Algorithm 1 in Subsection 3.2, the corresponding solutions are obtained by solving a family of parameterized equivalent single-objective optimal control problems and plotted in Figure 1, and the projections on plane J1 -J2 , J1 -J3 and J2 -J3 are given in Figures 2–4, respectively. It can be seen from Figure 1 that the magnitude of the switching signal’s performance index is notably less than that of other two players’, which is caused by the special form of the switching signal’s performance index, i.e., the switching signal cares only the scrap value of the state trajectory over the time interval [t0 , tf ]. Players u1 and u2 are concerned partially with the accumulation of weighted sum of state components, then intuitively the scarp value would, to a certain extent, reduce along with the process u1 and u2 pursue the minimization of their own performance indices. It is easy to verify that the solutions are all non-dominated, therefore are optimal in the sense of Pareto. Here 4 typical solutions are chosen and marked in Figures 1–4 with diﬀerent symbols to illustrate the non-dominant relationship between any pair of Pareto eﬃcient solutions: 3 minimum points (Points 1–3) along each axis direction respectively, and 1 solution (Point 4) giving consideration to all performance criteria simultaneously and equally. The corresponding switching instants and values of diﬀerent performance indices are listed in Table 1. Table 1 vividly portraits the feature of Pareto eﬃciency: Pareto eﬃcient solutions are a subset of solutions that describe the best trade-oﬀ between objectives in multi-objective decision making problems. Take Point 1 for example. Due to the largish weight allocated to J1 , the whole group consisting of u1 , u2 and σ pays almost all attention to the minimization of J1 . Meanwhile, the values of the other two performance criteria, J2 and Jσ , occupy a much small portion of the weighted performance criterion J, thus the increases in J2 and Jσ only have little inﬂuence on the value of J. Then Point 1 has the minimum value of J1 but relatively large values of J2 and Jσ , which can also be validated by Figures 2–3. The similar analysis also applies to Point 2 and Point 3. As for Point 4, since the weighting coeﬃcients are chosen equally as α1 = α2 = α3 = 13 , the corresponding Pareto eﬃcient solution demonstrate the impartial trade-oﬀ among the 3 conﬂicting objectives.

HUANG YABING · ZHAO JUN

184

x 10

−3

1.4

1.2 Pareto efficient solutions Point 1 Point 2 Point 3 Point 4

1

Jσ

0.8

0.6

0.4

0.2

0 0 5 10 15 20 J

Figure 1

4.5

3.5

4

3

J 2

2

2.5

1.5

0.5

1

1

The Pareto eﬃcient solutions obtained by Algorithm 1

20 Pareto efficient solutions Point 1 Point 2 Point 3 Point 4

18

16

J

2

14

12

10

8

6

4 0.5

Figure 2

1

1.5

2

2.5 J1

3

3.5

4

4.5

The projections of Pareto eﬃcient solutions on J1 -J2 plane

PARETO EFFICIENCY OF SWITCHED DIFFERENTIAL GAME

1.4

x 10

−3

1.2

1

J

σ

0.8

0.6

Pareto efficient solutions Point 1 Point 2 Point 3 Point 4

0.4

0.2

0 0.5

1

Figure 3 1.4

x 10

1.5

2

2.5 J1

3

3.5

4

4.5

The projections of Pareto eﬃcient solutions on J1 -J3 plane

−3

Pareto efficient solutions Point 1 Point 2 Point 3 Point 4

1.2

1

J

σ

0.8

0.6

0.4

0.2

0

4

6

8

10

12 J

14

16

18

20

2

Figure 4

The projections of Pareto eﬃcient solutions on J2 -J3 plane

185

HUANG YABING · ZHAO JUN

186 Table 1

Switching instants and performance values of 4 typical solutions ts

J1

J2

Jσ

Point 1

2.3912

0.7060

19.1133

0.0005

Point 2

2.7006

4.1355

4.2988

0.0013

Point 3

2.0242

0.9586

11.9312

0.0001

Point 4

2.6385

2.8457

5.1511

0.0006

Even though only 240 points are presented here, it is still obvious that Pareto eﬃcient solutions could constitute a convex surface if the amount of Pareto eﬃcient solutions trends to inﬁnity, which is in accord with the fact that the original switched cooperative game is a convex optimization problem.

5

Conclusion

In this paper, the cooperative diﬀerential game under switched dynamics was considered. The deﬁnition of Pareto eﬃciency is extended to switched scenarios, which lays the foundation of tackling cooperative game under the dynamics switching situation. The original multiobjective cooperative game problem is transformed equivalently into a family of single-objective optimal problems by introducing the preference information, i.e., convex combination coeﬃcient tuples, and furthermore transcribed into a family of parameterized single-objective optimal problems with the aim of auxiliary variables representing unknown switching instants. Dynamic programming is adopted afterwards to tackle the optimization process and the Pareto frontier can be constructed accordingly. Finally, the eﬀectiveness of the methods is illustrated by a numerical example.

References [1] [2] [3] [4]

[5] [6]

Li Y, Quevedo D E, Dey S, et al., A game-theoretic approach to fake-acknowledgment attack on cyber-physical systems, IEEE Trans. Signal Inform. Process. Netw., 2017, 3(1): 1–11. Liu X, Dong M, Ota K, et al., Service pricing decision in cyber-physical systems: Insights from game theory, IEEE Trans. Serv. Comput., 2016, 9(2): 186–198. Mylvaganam T and Astolﬁ A, Control of microgrids using a diﬀerential game theoretic framework, IEEE Conf. Decis. Control, Osaka, Japan, 2015. Saad W, Han Z, Poor H V, et al., Game-theoretic methods for the smart grid: An overview of microgrid systems, demand-side management, and smart grid communications, IEEE Signal Process. Mag., 2012, 29(5): 86–105. Srikantha P and Kundur D, A DER attack-mitigation diﬀerential game for smart grid security analysis, IEEE Trans. Smart Grid, 2016, 7(3): 1476–1485. Mei S, Zhang D, Wang Y, et al., Robust optimization of static reserve planning with large-scale

PARETO EFFICIENCY OF SWITCHED DIFFERENTIAL GAME

[7] [8] [9] [10] [11] [12] [13] [14] [15] [16] [17]

[18]

[19] [20] [21] [22] [23] [24] [25] [26] [27]

187

integration of wind power: A game theoretic approach, IEEE Trans. Sustainable Enery, 2014, 5(2): 535–545. Li Y, Mu Y, Yuan S, et al., The game theoretical approach for multi-phase complex systems in chemical engineering, Journal of Systems Science & Complexity, 2017, 30(1): 4–19. Ba¸sar T and Olsder G J, Dynamic Noncooperative Game Theory, SIAM, Philadelphia, 1998. Engwerda J, LQ Dynamic Optimization and Diﬀerential Games, John Wiley & Sons, Chichester, 2005. Ho Y C, Bryson A, and Baron S, Diﬀerential games and optimal pursuit-evasion strategies, IEEE Trans. Autom. Control, 1965, 10(4): 385–389. Garcia E, Casbeer D W, and Pachter M, Active target defence diﬀerential game: Fast defender case, IET Contr. Theory Appl., 2017, 11(17): 2985–2993. Mylvaganam T, Sassano M, and Astolﬁ A, A diﬀerential game approach to multi-agent collision avoidance, IEEE Trans. Autom. Control, 2017, 62(8): 4229–4235. Li S and Hou C, On the collision detection problem of two moving objects described by algebraic sets, Journal of Systems Science & Complexity, 2007, 20(1): 108–118. Semsar-Kazerooni E and Khorasani K, Multi-agent team cooperation: A game theory approach, Automatica, 2009, 45(10): 2205–2213. Gao H, Petrosyan L, Qiao H, et al., Cooperation in two-stage games on undirected networks, Journal of Systems Science & Complexity, 2017, 30(3): 680–693. Zhang Q, H2 /H∞ control problems of backward stochastic systems, Journal of Systems Science & Complexity, 2014, 27(5): 899–910. Sethi S P and Zhang Q, Asymptotic optimal controls in stochastic manufacturing systems with machine failures dependent on production rates, Stochastics and Stochastic Reports, 1994, 48(1– 2): 97–121. Eklund J M, Sprinkle J, and Sastry S S, Switched and symmetric pursuit/evasion games using online model predictive control with application to autonomous aircraft, IEEE Trans. Control Syst. Technol., 2012, 20(3): 604–620. Xu X and Antsaklis P J, Optimal control of switched systems based on parameterization of the switching instants, IEEE Trans. Autom. Control, 2004, 49(1): 2–16. Kamgarpour M and Tomlin C, On optimal control of non-autonomous switched systems with a ﬁxed mode sequence, Automatica, 2012, 48(6): 1177–1181. Liberzon D, Switching in Systems and Control, Birkhuser, Boston, 2003. Zhao J and Hill D J, On stability, L2 -gain and H∞ control for switched systems, Automatica, 2008, 44(5): 1220–1232. Long L, Wang Z, and Zhao J, Switched adaptive control of switched nonlinearly parameterized systems with unstable subsystems, Automatica, 2015, 54(4): 217–228. Engwerda J, Necessary and suﬃcient conditions for pareto optimal solutions of cooperative differential games, SIAM J. Control Optim., 2010, 48(6): 3859–3881. Leitmann G, Cooperative and Non-Cooperative Many Players Diﬀerential Games, Springer, Vienna, 1974. Bertsekas D P, Dynamic Programming and Optimal Control, Athena Scientiﬁc Belmont, Belmont, 1995. Salukvadze M, Vector-Valued Optimization Problems in Control Theory, Academic Press, New York, 1979.

Pareto Efficiency of Finite Horizon Switched Linear Quadratic Differential Games

Recommend Documents