J Optim Theory Appl (2011) 150:251–274 DOI 10.1007/s10957-011-9836-0
State-Feedback, Finite-Horizon, Cost Density-Shaping Control for the Linear Quadratic Gaussian Framework M.J. Zyskowski · M.K. Sain · R.W. Diersing
Published online: 13 April 2011 © Springer Science+Business Media, LLC 2011
Abstract  A Multiple-Cumulant Cost Density-Shaping (MCCDS) control is proposed for the case when the system is linear and the cost is quadratic. This linear optimal control results from the minimization of an analytic, convex, non-negative function of cost cumulants and target cost cumulants. The MCCDS control allows the designer to shape the initial cost density with respect to a target density approximated by target cost cumulants. A numerical experiment shows that MCCDS control compares favorably with competing control paradigms in terms of official performance measures for inter-story drifts and per-story accelerations used in the first-generation structure benchmark for seismically excited buildings.

Keywords  Stochastic optimal control · Dynamic programming · Cost cumulant control · Cost density-shaping · Earthquake engineering benchmark

Communicated by Jose B. Cruz.

This work was completed under the guidance of a great friend, advisor, and colleague, Dr. Michael Sain, who unfortunately passed away before this manuscript could be published.

M.J. Zyskowski, Systems Engineering, Hamilton Sundstrand–Electric Systems, Rockford, IL, USA. e-mail: [email protected]
M.K. Sain, Department of Electrical Engineering, University of Notre Dame, Notre Dame, IN, USA
R.W. Diersing, Department of Engineering, University of Southern Indiana, Evansville, IN, USA. e-mail: [email protected]

1 Introduction

In stochastic optimal control, the complete specification of the probability density function for a random cost functional J might be considered the most a designer can
do, when formulating an optimal control law with respect to that cost. Indeed, it has previously been shown empirically that the stability and performance of a two-cost-cumulant (2CC) control are embedded in the first two cost cumulants resulting from that control law [1]. This suggests, more generally, that desirable features of control-system behavior are directly related to the shape and location of the cost density. Unfortunately, existing cost cumulant controls leave the specification of the probability density function of the associated random cost J outside the designer's direct influence and control. In particular, Kalman's widely used Linear Quadratic Gaussian (LQG) control minimizes only the expected value of the cost, $E\{J\}$, so that the density's mean lies as close to the origin as possible. Other features of the cost density's "shape," encapsulated in the higher-order cost cumulants, are beyond the designer's direct specification under this control strategy. For example, the dispersion of cost values about the mean (the variance, or second cost cumulant), the skew or asymmetry of the cost density (related to the third cost cumulant), and the heaviness of its tails (associated with the fourth cost cumulant, related to the kurtosis) are not influenced by the designer. Research postdating Kalman's has produced control laws that result from minimizing functions of higher-order cost cumulants. These controls account for additional attributes of the cost density's appearance, albeit in ways whose effects are not entirely understood. For instance, Risk-Sensitive (RS) control [2] gives the designer only a single parameter with which to weight all the cost cumulants in the optimization, but how this parameter corresponds to the cost density achieved under RS control is ambiguous.
Analogously, while k-cost-cumulant (kCC) control [3] allows a weighted linear combination of cost cumulants to be minimized, no clear direction is provided to the designer for choosing the weightings to achieve cost density-shaping objectives known a priori. In light of these observations, it is fair to say that the LQG, kCC, and RS paradigms do not give the designer precise influence over higher-order cost statistics. There is a clear need for a design paradigm that permits control laws to be formulated from the desired shape and location of the cost density. This proposition raises at least two questions. First, if cost cumulant optimization yields control solutions that do not precisely constrain higher-order cost statistics, then what function of cost cumulants should be optimized? Second, do the cumulants provide a clear advantage over the moments for cost density-shaping? In response to the first question, we propose that probability distance measures are ideal performance indices for a more general sort of cost cumulant optimization. Indeed, probability distance measures between Gaussian approximations to densities have explicit representations in terms of the mean and the variance of the density arguments. This is easily verified for the Kullback–Leibler Divergence (KLD), the Hellinger Distance, and the Bhattacharyya Distance. Furthermore, expressions for the KLD in terms of the higher-order cumulants of its density arguments have been derived in [4]. Certain probability distance measures, such as the KLD, are also convex and positive. The totality of existing results challenges us to find the control solution that optimizes a smooth, convex, positive-definite function of initial cost cumulants and analogous target statistics.
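The closed-form expressions alluded to above are easily stated for one-dimensional Gaussian approximations parameterized by their first two cumulants (mean $m$, variance $v$). The sketch below is illustrative only; the function names are ours, not the paper's.

```python
import math

# Closed-form probability distance measures between two Gaussian densities
# N(m0, v0) and N(m1, v1), parameterized by their first two cumulants
# (mean m, variance v > 0).

def kld_gauss(m0, v0, m1, v1):
    """Kullback-Leibler divergence KLD(N0 || N1)."""
    return 0.5 * (v0 / v1 + (m1 - m0) ** 2 / v1 - 1.0 + math.log(v1 / v0))

def bhattacharyya_gauss(m0, v0, m1, v1):
    """Bhattacharyya distance between N0 and N1."""
    return (0.25 * (m0 - m1) ** 2 / (v0 + v1)
            + 0.5 * math.log((v0 + v1) / (2.0 * math.sqrt(v0 * v1))))

def hellinger2_gauss(m0, v0, m1, v1):
    """Squared Hellinger distance between N0 and N1."""
    bc = (math.sqrt(2.0 * math.sqrt(v0 * v1) / (v0 + v1))
          * math.exp(-0.25 * (m0 - m1) ** 2 / (v0 + v1)))
    return 1.0 - bc
```

Each measure vanishes exactly when the two cumulant pairs coincide and grows as they separate, which is the qualitative behavior a cost density-shaping index should have.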
On the second question, we contend that the cumulants are better suited for cost density-shaping than the moments. For many densities, higher-order moments are not known to correspond to any feature of the cost density's appearance, and it is not known how many moments are needed to closely approximate a cost density. Conversely, Sain and Liberty [5] found that changing cumulants beyond the fourth has little or no effect on the cost's density function, which suggests that the first four cumulants of a cost nearly constrain its entire density function. Moreover, cost cumulant optimizations for LQ systems and costs have linear solutions, whereas cost moment optimizations for this problem class have yielded non-linear control laws. Perhaps the largest advantage of the cumulants, however, is the family of cost cumulant-generating differential equations of the LQG framework, which has no known counterpart for the moments. This paper is organized as follows. We present some preliminary results, our notation, and the MCCDS optimization formulation, followed by its solution using dynamic programming techniques. After the development of the MCCDS control solution, we validate the theory using the first-generation structure benchmark for seismically excited buildings. Here a comparison is made between the performance of an MCCDS control and the kCC controller used in the numerical experiment of [3]. A summary and conclusion ensue.
2 Background

2.1 Problem Class

This work pertains to the process defined on $t \in [t_0, t_f]$ with linear dynamics and additive noise,
\[
dx(t) = A(t)x(t)\,dt + B(t)u(t)\,dt + G(t)\,dw(t), \qquad x_0 = E\{x(t_0)\}, \tag{1}
\]
where $A \in C([t_0,t_f]; \mathbb{R}^{n\times n})$, $B \in C([t_0,t_f]; \mathbb{R}^{n\times m})$, $G \in C([t_0,t_f]; \mathbb{R}^{n\times p})$, and $w(t)$ is a $p$-dimensional stationary Wiener process having a correlation of increments defined by
\[
E\big\{\big(w(\tau_1)-w(\tau_2)\big)\big(w(\tau_1)-w(\tau_2)\big)^T\big\} = W|\tau_1 - \tau_2|, \qquad W \succ 0_{p\times p}. \tag{2}
\]
The cost $J[x,u;t_0,x_0]$ is an integral-quadratic form defined by
\[
J := \int_{t_0}^{t_f} \big[x^T(t)Q(t)x(t) + u^T(t)R(t)u(t)\big]\,dt + x^T(t_f)Q_f x(t_f), \tag{3}
\]
where $Q = Q^T \in C([t_0,t_f]; S^n_+)$, $R = R^T \in C([t_0,t_f]; S^m_{++})$, and $Q_f = Q_f^T \in S^n_+$. Here $S^n_+$ denotes the set of real-valued, symmetric, positive semi-definite $n\times n$ matrices, and similarly $S^n_{++}$ denotes the set of real-valued, symmetric, positive-definite $n\times n$ matrices. Under this membership, we have $Q_f, Q \succeq 0_{n\times n}$ and $R \succ 0_{m\times m}$; these are typical conditions that ensure well-posedness for the associated stochastic optimal control problem.
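The process (1) and cost (3) can be sketched in the scalar case ($n = m = p = 1$) with an Euler–Maruyama sample path under a linear state feedback; each simulated path yields one sample of the random cost $J$. All parameter values below are illustrative choices, not taken from the paper.

```python
import math
import random

# Minimal scalar sketch of the process (1) under a linear state feedback
# u(t) = k*x(t), with the integral-quadratic cost (3) accumulated along an
# Euler-Maruyama sample path.  Parameter values are hypothetical.

def sample_cost(a=-1.0, b=1.0, g=0.5, k=-1.0, q=1.0, r=1.0, qf=1.0,
                x0=1.0, t0=0.0, tf=2.0, n_steps=400, rng=random):
    dt = (tf - t0) / n_steps
    x, J = x0, 0.0
    for _ in range(n_steps):
        u = k * x
        J += (q * x * x + r * u * u) * dt       # running cost of (3)
        dw = rng.gauss(0.0, math.sqrt(dt))      # Wiener increment
        x += (a * x + b * u) * dt + g * dw      # dynamics (1)
    J += qf * x * x                             # terminal cost of (3)
    return J

# J is a random variable; its empirical statistics feed the cumulant
# machinery of the next subsection.
random.seed(0)
costs = [sample_cost() for _ in range(2000)]
mean_J = sum(costs) / len(costs)
```

Note that every sample of $J$ is non-negative by construction, consistent with $Q, Q_f \succeq 0$ and $R \succ 0$.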
2.2 Cost Cumulants

For linear state-feedback control inputs, it is shown in [3] that $J$ is a finite $\chi^2$ random variable on a probability space $(\Omega, \mathcal{F}, P)$. The finiteness of $J$ stems from the fact that, for linear state-feedback controls, the "running cost" and "terminal cost" functions of (3) always satisfy the polynomial growth conditions given in [6] that are necessary for boundedness of the expectation $E\{J\}$ of the cost functional. For this class of control inputs, we know a finite number $r$ of cumulants exist for $J$. The initial cumulants of (3) are given explicitly by the following recursive relationship:
\[
\kappa_1(t_0) := E\{J\}, \qquad
\kappa_r(t_0) := E\{J^r\} - \sum_{i=1}^{r-1} \binom{r-1}{i-1}\,\kappa_i(t_0)\,E\{J^{r-i}\}, \quad r \ge 2.
\]
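The recursion above can be sketched directly; the helper name below is ours. As a sanity check, a unit-mean exponential variable has moments $E\{J^r\} = r!$ and cumulants $\kappa_r = (r-1)!$.

```python
from math import comb, factorial

# Sketch of the moment-to-cumulant recursion: kappa_1 = E{J}, and for r >= 2,
# kappa_r = E{J^r} - sum_{i=1}^{r-1} C(r-1, i-1) * kappa_i * E{J^{r-i}}.

def cumulants_from_moments(moments):
    """moments[r-1] = E{J^r}; returns [kappa_1, ..., kappa_r]."""
    kappas = []
    for r in range(1, len(moments) + 1):
        k = moments[r - 1]
        for i in range(1, r):
            k -= comb(r - 1, i - 1) * kappas[i - 1] * moments[r - i - 1]
        kappas.append(k)
    return kappas

# Unit-mean exponential check: E{J^r} = r!  =>  kappa_r = (r-1)!.
moms = [factorial(r) for r in range(1, 6)]
kaps = cumulants_from_moments(moms)
```

The first two outputs are the mean and variance; the third and fourth relate to the skew and tail weight discussed in the introduction.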
The work of Liberty and Hartwig [7, 8] established that when the system and cost take the LQG form, and when the control is a linear state-feedback input $u(t) = K(t)x(t)$, the $r$ cumulants of (3) are quadratic in the known initial state and are composed of functions that satisfy a family of coupled Lyapunov matrix equations. In particular, when $\alpha = t_0$, the initial cost cumulants are given by
\[
\kappa_i(\alpha) = x_0^T H_i(\alpha) x_0 + D_i(\alpha), \qquad 1 \le i \le r, \tag{4}
\]
where the functions $H_i(\alpha)$ satisfy the following system of backwards-in-time matrix differential equations, and the dynamics of the functions $D_i(\alpha)$ depend on the $H_i(\alpha)$ functions:
\[
\begin{aligned}
\frac{dH_1(\alpha)}{d\alpha} &= -\big(A(\alpha)+B(\alpha)K(\alpha)\big)^T H_1(\alpha) - H_1(\alpha)\big(A(\alpha)+B(\alpha)K(\alpha)\big) \\
&\quad\; - Q(\alpha) - K^T(\alpha)R(\alpha)K(\alpha) := \mathcal{F}_1\big(\mathcal{H}(\alpha), K(\alpha)\big), \\
\frac{dH_i(\alpha)}{d\alpha} &= -\big(A(\alpha)+B(\alpha)K(\alpha)\big)^T H_i(\alpha) - H_i(\alpha)\big(A(\alpha)+B(\alpha)K(\alpha)\big) \\
&\quad\; - 2\sum_{j=1}^{i-1} \binom{i}{j} H_j(\alpha)G(\alpha)WG^T(\alpha)H_{i-j}(\alpha) := \mathcal{F}_i\big(\mathcal{H}(\alpha), K(\alpha)\big), \quad 2 \le i \le r, \\
\frac{dD_j(\alpha)}{d\alpha} &= -\mathrm{Tr}\big(H_j(\alpha)G(\alpha)WG^T(\alpha)\big) := \mathcal{G}_j\big(\mathcal{H}(\alpha)\big), \qquad \alpha \in [t_0, t_f],\ 1 \le j \le r.
\end{aligned} \tag{5}
\]
These functions satisfy the terminal conditions
\[
H_1(t_f) = Q_f, \quad H_i(t_f) = 0_{n\times n},\ i \ge 2; \qquad
D_1(t_f) = 0, \quad D_2(t_f) = 1, \quad D_j(t_f) = 0,\ j \ge 3. \tag{6}
\]
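The equations (5)-(6) can be sketched numerically in the scalar case ($n = m = p = 1$) for $r = 2$ with an explicit Euler scheme stepping backwards from $t_f$. The parameter values are illustrative, and the terminal condition $D_2(t_f) = 1$ follows the modified conditions stated in (6).

```python
# Scalar sketch of integrating the cumulant-generating equations (5)
# backwards from the terminal conditions (6), for r = 2.

def cumulant_odes_scalar(a=-1.0, b=1.0, k=-1.0, g=1.0, W=1.0,
                         q=1.0, r=1.0, qf=1.0, t0=0.0, tf=2.0, n_steps=2000):
    a_cl = a + b * k                      # closed-loop A + B K
    dt = (tf - t0) / n_steps
    # Terminal conditions (6): H1(tf)=Qf, H2(tf)=0, D1(tf)=0, D2(tf)=1.
    H1, H2, D1, D2 = qf, 0.0, 0.0, 1.0
    gwg = g * W * g                       # G W G^T, scalar
    for _ in range(n_steps):              # step from tf back to t0
        dH1 = -2.0 * a_cl * H1 - q - k * r * k          # F1(H, K)
        dH2 = -2.0 * a_cl * H2 - 4.0 * H1 * gwg * H1    # F2(H, K): -2*C(2,1)*H1*GWG*H1
        dD1 = -H1 * gwg                                 # G1(H)
        dD2 = -H2 * gwg                                 # G2(H)
        H1, H2 = H1 - dt * dH1, H2 - dt * dH2           # backward Euler step
        D1, D2 = D1 - dt * dD1, D2 - dt * dD2
    return H1, H2, D1, D2

H1, H2, D1, D2 = cumulant_odes_scalar()
x0 = 1.0
kappa1 = x0 * H1 * x0 + D1            # first cost cumulant, as in (4)
kappa2 = x0 * H2 * x0 + D2            # second cost cumulant
```

Integrating backwards, $H_1$ relaxes toward the steady value $(q + k^2 r)/(-2(a+bk))$ while $H_2$, $D_1$, $D_2$ accumulate the diffusion terms, giving the initial cumulants via (4).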
2.3 Notation

We introduce the notation used by Pham to make restatements of the above equations more concise in the development. The state variables $\mathcal{H}(\alpha) \in \mathbb{R}^{rn\times n}$ and $\mathcal{D}(\alpha) \in \mathbb{R}^r$ are defined as
\[
\mathcal{H}(\alpha) := \big(H_1(\alpha), \ldots, H_r(\alpha)\big), \qquad
\mathcal{D}(\alpha) := \big(D_1(\alpha), \ldots, D_r(\alpha)\big).
\]
Using these state variables, define the functions
\[
\mathcal{F}\big(\mathcal{H}(\alpha), K(\alpha)\big) := \big(\mathcal{F}_1(\mathcal{H}(\alpha), K(\alpha)), \ldots, \mathcal{F}_r(\mathcal{H}(\alpha), K(\alpha))\big), \qquad
\mathcal{G}\big(\mathcal{H}(\alpha)\big) := \big(\mathcal{G}_1(\mathcal{H}(\alpha)), \ldots, \mathcal{G}_r(\mathcal{H}(\alpha))\big),
\]
where $\mathcal{F}_i(\cdot)$ and $\mathcal{G}_i(\cdot)$ are defined as beforehand in (5). We also introduce a condensed form for the terminal conditions:
\[
\mathcal{H}_f := \big(Q_f, 0_{n\times n}, \ldots, 0_{n\times n}\big), \qquad
\mathcal{D}_f := (0, 1, \underbrace{0, \ldots, 0}_{(r-2)\ \text{zeros}}).
\]
Finally, let $\kappa(t_0) \in \mathbb{R}^r$ denote the vector of cost cumulants (4),
\[
\kappa(t_0) := \big(\kappa_1(t_0), \ldots, \kappa_r(t_0)\big).
\]

2.4 Target Cost Statistics

Given matrices for a system characterization $(A, B, G)$, an integral-quadratic cost characterization $(Q, R, Q_f)$, and the second-order statistics of the noise ($W$), consider the cost cumulants that result from an alternative (and unknown) linear state-feedback control $\tilde{u}(t) = \tilde{K}(t)\tilde{x}(t)$, where $\tilde{K} \in C([t_0,t_f]; \mathbb{R}^{m\times n})$. When $\alpha = t_0$, the initial cost cumulants are given by
\[
\tilde{\kappa}_i(\alpha) = x_0^T \tilde{H}_i(\alpha) x_0 + \tilde{D}_i(\alpha), \qquad 1 \le i \le r. \tag{7}
\]
Let this set of numbers be regarded as target cost cumulants; these quantities encapsulate the shape of the target cost density. Here, the functions $\tilde{H}_i(\alpha)$ are determined by the same system of backwards-in-time matrix differential equations as (5). The dynamics of $\tilde{D}_i(\alpha)$ will also be as before. These are
\[
\frac{d\tilde{\mathcal{H}}(\alpha)}{d\alpha} = \mathcal{F}\big(\tilde{\mathcal{H}}(\alpha), \tilde{K}(\alpha)\big), \qquad
\frac{d\tilde{\mathcal{D}}(\alpha)}{d\alpha} = \mathcal{G}\big(\tilde{\mathcal{H}}(\alpha)\big), \qquad \alpha \in [t_0, t_f],
\qquad \tilde{\mathcal{H}}(t_f) = \mathcal{H}_{f;E^*}, \quad \tilde{\mathcal{D}}(t_f) = \mathcal{D}_{f;\epsilon^*}, \tag{8}
\]
with the terminal conditions
\[
\tilde{H}_1(t_f) = Q_f + E^*, \quad \tilde{H}_i(t_f) = 0_{n\times n},\ i \ge 2; \qquad
\tilde{D}_1(t_f) = \epsilon^*, \quad \tilde{D}_2(t_f) = 1, \quad \tilde{D}_j(t_f) = 0,\ j \ge 3. \tag{9}
\]
Above, we use the short-hand notation
\[
\mathcal{H}_{f;E^*} := \big(Q_f + E^*, 0_{n\times n}, \ldots, 0_{n\times n}\big), \qquad
\mathcal{D}_{f;\epsilon^*} := (\epsilon^*, 1, \underbrace{0, \ldots, 0}_{(r-2)\ \text{zeros}}).
\]
Here $\epsilon^* > 0$ is a small perturbation constant, and $E^* \succ 0_{n\times n}$ is a positive-definite perturbation matrix.

It should be noted that (8) must be integrable under $\tilde{K}(\alpha)$ for $\tilde{\mathcal{H}}(\alpha)$ and $\tilde{\mathcal{D}}(\alpha)$ to exist, and hence for $\tilde{\kappa}(\alpha)$ to exist. This requirement means that an $r$-cumulant approximation for the target cost density exists, and consequently that the MCCDS problem is well-posed. We make the important assumption of integrability of (8) for some $\tilde{K}(\alpha)$ in our development of the MCCDS control theory.

2.5 Perturbations to Terminal Conditions

The terminal conditions of (5) have been modified from those established in [7, 8]. In particular, the terminal condition $D_2(t_f) = 1$ is used instead of $D_2(t_f) = 0$. For (8), the terminal conditions have also been modified; $\tilde{D}_2(t_f) = 1$ replaces $\tilde{D}_2(t_f) = 0$, and the perturbations $(\epsilon^*, E^*)$ are used in $\tilde{D}_1(t_f)$ and $\tilde{H}_1(t_f)$. The motivation for these changes stems from technical difficulties encountered by the authors when numerically integrating (5) and (8) under MCCDS controls with the original, unmodified terminal conditions for the cost cumulant-generating equations of the LQG framework. In particular, a singularity is encountered at the start of computing the solution of (5) and (8) only for $i = 1, 2$; thus perturbations to the terminal conditions of the higher-order variables have not been introduced.

The effects of all the aforementioned perturbations can be well understood on the infinite horizon $[t_0, \infty)$ by considering the process (1) with an exponentially stabilizing control $\tilde{u}(t) = \tilde{K}(t)\tilde{x}(t)$ in the absence of disturbances, $w(t) = 0$, $\forall t \in [t_0, \infty)$. We assume the control underlying the target cost cumulants possesses this quality, and we dedicate this portion of the manuscript to two theorems associated with the terminal-condition perturbations.

Theorem 2.1 (Effects of Matrix Perturbation) Assume that the function $\tilde{H}_1(\alpha)$, determined by (8) for $i = 1$ with $E^* \succ 0_{n\times n}$, is defined on $[t_0, \infty)$. Similarly, consider the function $\tilde{H}_1'(\alpha)$ determined by (8) for $i = 1$, but instead with $E^* = 0_{n\times n}$; let this solution also be defined on $[t_0, \infty)$. Let the deterministic system below be uniformly, exponentially stable:
\[
\frac{d\tilde{x}(t)}{dt} = \big(A(t) + B(t)\tilde{K}(t)\big)^T \tilde{x}(t), \qquad \tilde{x}(t_0) = x_0, \quad t \in [t_0, \infty).
\]
Under these assumptions, the effect of the perturbation $E^*$ on the terminal condition $\tilde{H}_1(t_f)$ exponentially and asymptotically vanishes,
\[
\lim_{t_f \to \infty} \big\|\tilde{H}_1(t_f + t_0 - t) - \tilde{H}_1'(t_f + t_0 - t)\big\| = 0. \tag{10}
\]
Remark This work focuses on the finite horizon ($t_f < \infty$), whereas the above result applies as $t_f \to \infty$. Perturbations therefore do have an effect on the solutions of the cumulant-generating equations on a finite horizon. However, the above conditions guarantee exponential convergence, so for $t_f$ suitably large, the solutions for perturbed and unperturbed terminal conditions will share approximately equal initial values on $[t_0, t_f]$. Hence, controls constructed from matrices that result from perturbed versus unperturbed terminal conditions in the controller design equations should not differ appreciably.

Theorem 2.2 (Effects of Scalar Perturbation) For the finite horizon $[t_0, t_f]$, let the following be satisfied for the solutions $D_2(\alpha)$, $\tilde{D}_2(\alpha)$, and $\tilde{D}_1(\alpha)$ of (5) and (8):
\[
\kappa_2(t_0) \ge D_2(t_0) \gg 1, \qquad
\tilde{\kappa}_2(t_0) \ge \tilde{D}_2(t_0) \gg 1, \qquad
\tilde{\kappa}_1(t_0) \ge \tilde{D}_1(t_0) \gg \epsilon^*.
\]
When these conditions are satisfied, the unity perturbations to $D_2(t_f)$ and $\tilde{D}_2(t_f)$ become negligible asymptotically. Furthermore, under the above conditions, the $\epsilon^*$ perturbation to $\tilde{D}_1(t_f)$ is also of negligible effect.

Remark The unity perturbations to $D_2(t_f)$ and $\tilde{D}_2(t_f)$ have been chosen somewhat arbitrarily; perturbations other than unity are easily qualified and may be preferable to the control designer. Perturbations should be chosen as small as possible, so as to remove any singularities in the state derivatives at $t = t_0$ under cost density-shaping control and to permit the integration of (5) and (8) on $[t_0, t_f]$.
3 Problem Formulation

The goal of this section is to pose a novel cost density-shaping optimization, termed the MCCDS problem hereafter. Before we proceed to the problem statement, two important definitions must be introduced: the target set and the admissible feedback gains. The target set is a space in which the end values of the state trajectories must lie. Put somewhat loosely, the admissible control gains are those control gains that can steer the state variables into the target set. The following definitions formalize these ideas.

Definition 3.1 (Target Set) Let $(t_0, \mathcal{H}(t_0), \mathcal{D}(t_0), \tilde{\mathcal{H}}(t_0), \tilde{\mathcal{D}}(t_0)) \in \mathcal{M}$, where $\mathcal{M}$ denotes the target set, which is a closed subset of
\[
[t_0, t_f] \times \underbrace{\mathbb{R}^{n\times n} \times \cdots \times \mathbb{R}^{n\times n}}_{r\ \text{times}} \times \mathbb{R}^r \times \underbrace{\mathbb{R}^{n\times n} \times \cdots \times \mathbb{R}^{n\times n}}_{r\ \text{times}} \times \mathbb{R}^r.
\]

Remark The goal of tracking target cumulants suggests that the trajectories of $\tilde{\mathcal{H}}(\alpha)$ and $\tilde{\mathcal{D}}(\alpha)$, being predetermined by $\tilde{K}(\alpha)$, will inherently have initial values in $\mathcal{M}$.

Definition 3.2 (Admissible Feedback Gains) Denote the allowable set of control gain values by $\bar{K} \subset \mathbb{R}^{m\times n}$ and let this set be compact. For fixed $r \in \mathbb{N}$, let $\mathcal{K}(t_f) := \mathcal{K}_{t_f, \mathcal{H}(t_f), \mathcal{D}(t_f), \tilde{\mathcal{H}}(t_f), \tilde{\mathcal{D}}(t_f)}$ characterize a class of $C([t_0, t_f]; \mathbb{R}^{m\times n})$ such that for any $K \in \mathcal{K}_{t_f, \mathcal{H}(t_f), \mathcal{D}(t_f), \tilde{\mathcal{H}}(t_f), \tilde{\mathcal{D}}(t_f)}$ the solutions to
\[
\frac{d\mathcal{H}(\alpha)}{d\alpha} = \mathcal{F}\big(\mathcal{H}(\alpha), K(\alpha)\big), \qquad
\frac{d\tilde{\mathcal{H}}(\alpha)}{d\alpha} = \mathcal{F}\big(\tilde{\mathcal{H}}(\alpha), \tilde{K}(\alpha)\big),
\]
\[
\frac{d\mathcal{D}(\alpha)}{d\alpha} = \mathcal{G}\big(\mathcal{H}(\alpha)\big), \qquad
\frac{d\tilde{\mathcal{D}}(\alpha)}{d\alpha} = \mathcal{G}\big(\tilde{\mathcal{H}}(\alpha)\big), \qquad \alpha \in [t_0, t_f],
\]
\[
\mathcal{H}(t_f) = \mathcal{H}_f, \qquad \mathcal{D}(t_f) = \mathcal{D}_f, \qquad
\tilde{\mathcal{H}}(t_f) = \mathcal{H}_{f;E^*}, \qquad \tilde{\mathcal{D}}(t_f) = \mathcal{D}_{f;\epsilon^*}
\]
exist, and the initial values of the state trajectories satisfy
\[
\big(t_0, \mathcal{H}(t_0), \mathcal{D}(t_0), \tilde{\mathcal{H}}(t_0), \tilde{\mathcal{D}}(t_0)\big) \in \mathcal{M}.
\]

Remark The MCCDS performance index is a function of both cumulant and target variables; hence the optimal control will depend on $\mathcal{H}(\alpha)$, $\mathcal{D}(\alpha)$ as well as $\tilde{\mathcal{H}}(\alpha)$, $\tilde{\mathcal{D}}(\alpha)$. Given this fact, the space $\mathcal{K}(t_f)$ has implicit dependence on the target variables, which is not evidenced explicitly in (5).

Consider a scalar function $g : \mathbb{R}^r \times \mathbb{R}^r \to \mathbb{R}$ with vector arguments as a general performance index, which we denote by $g(\kappa, \tilde{\kappa})$. For fixed $\tilde{\kappa}$, the function becomes $g_{\tilde{\kappa}} : \mathbb{R}^r \to \mathbb{R}$; analogously, for fixed $\kappa$, the function becomes $g_{\kappa} : \mathbb{R}^r \to \mathbb{R}$. We impose the following restrictions on $g_{\tilde{\kappa}}(\kappa)$ and $g_{\kappa}(\tilde{\kappa})$ to ensure that the ensuing optimization problem is well-posed. In the following, the notation $\mathrm{dom}\,f$ denotes the domain of the function $f(\cdot)$. Suppose the function $g_{\tilde{\kappa}}$ is analytic on its domain $\mathrm{dom}\,g_{\tilde{\kappa}}$; likewise, let $g_{\kappa}$ be analytic on its domain $\mathrm{dom}\,g_{\kappa}$. Further assume that the function $g_{\tilde{\kappa}}$ is convex in $\kappa$ and that its domain $\mathrm{dom}\,g_{\tilde{\kappa}}$ is a convex set. Finally, let the function $g_{\tilde{\kappa}}$ be non-negative in $\kappa$ on some neighborhood of $\tilde{\kappa}$. When the smoothness and convexity conditions hold and, in addition, $g_{\tilde{\kappa}}(\tilde{\kappa}) = 0$ and $\nabla_{\kappa}\, g_{\tilde{\kappa}}(\tilde{\kappa}) = 0_{1\times r}$, then the function $g_{\tilde{\kappa}}$ is positive in some neighborhood of a fixed vector $\tilde{\kappa}$. It is henceforth reasonable to use a function satisfying these conditions as our optimization performance index.

Consider the optimization problem of minimizing the function $g(\kappa(t_0), \tilde{\kappa}(t_0))$ of the initial cost cumulants, for a given set of target initial cost cumulants, over the admissible space of controls. Ideally, the function $g(\kappa(t_0), \tilde{\kappa}(t_0))$ will be a positive measure between initial cost cumulants and target initial cost cumulants, so that a control input which minimizes this function will inherently drive the initial cost cumulants closer to the targets, and in effect will drive the initial cost density closer to the target initial cost density.

Definition 3.3 (MCCDS Performance Index) Let the MCCDS performance index be defined as the function
\[
\phi\big(\mathcal{H}(t_0), \mathcal{D}(t_0), \tilde{\mathcal{H}}(t_0), \tilde{\mathcal{D}}(t_0)\big) = g\big(\kappa(t_0), \tilde{\kappa}(t_0)\big),
\]
where $\kappa(t_0)$ and $\tilde{\kappa}(t_0)$ denote the vectors
\[
\kappa(t_0) = \big(\kappa_1(t_0), \ldots, \kappa_r(t_0)\big), \qquad
\tilde{\kappa}(t_0) = \big(\tilde{\kappa}_1(t_0), \ldots, \tilde{\kappa}_r(t_0)\big).
\]
Definition 3.4 (MCCDS Optimization) For every $\tilde{\kappa}(t_0)$, let $g(\kappa(t_0), \tilde{\kappa}(t_0))$ be an analytic function, convex in $\kappa(t_0)$, defined for positive values of its vector-valued arguments, such that it is non-negative on some neighborhood of $\tilde{\kappa}(t_0)$. Let $r \in \mathbb{N}$ be a fixed positive integer, where $\kappa(t_0), \tilde{\kappa}(t_0) \in \mathbb{R}^r$ are the vectors of initial cost cumulants and target initial cost cumulants, respectively. Then the MCCDS optimization can be formulated as
\[
\min_{K \in \mathcal{K}_{t_f, \mathcal{H}(t_f), \mathcal{D}(t_f), \tilde{\mathcal{H}}(t_f), \tilde{\mathcal{D}}(t_f)}} \phi\big(\mathcal{H}(t_0), \mathcal{D}(t_0), \tilde{\mathcal{H}}(t_0), \tilde{\mathcal{D}}(t_0)\big)
\]
subject to
\[
\frac{d\mathcal{H}(\alpha)}{d\alpha} = \mathcal{F}\big(\mathcal{H}(\alpha), K(\alpha)\big), \qquad
\frac{d\tilde{\mathcal{H}}(\alpha)}{d\alpha} = \mathcal{F}\big(\tilde{\mathcal{H}}(\alpha), \tilde{K}(\alpha)\big),
\]
\[
\frac{d\mathcal{D}(\alpha)}{d\alpha} = \mathcal{G}\big(\mathcal{H}(\alpha)\big), \qquad
\frac{d\tilde{\mathcal{D}}(\alpha)}{d\alpha} = \mathcal{G}\big(\tilde{\mathcal{H}}(\alpha)\big), \qquad \alpha \in [t_0, t_f],
\]
\[
\mathcal{H}(t_f) = \mathcal{H}_f, \qquad \mathcal{D}(t_f) = \mathcal{D}_f, \qquad
\tilde{\mathcal{H}}(t_f) = \mathcal{H}_{f;E^*}, \qquad \tilde{\mathcal{D}}(t_f) = \mathcal{D}_{f;\epsilon^*},
\]
where $\phi(\cdot) = g(\cdot)$ and the initial values of the state trajectories satisfy
\[
\big(t_0, \mathcal{H}(t_0), \mathcal{D}(t_0), \tilde{\mathcal{H}}(t_0), \tilde{\mathcal{D}}(t_0)\big) \in \mathcal{M}.
\]
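A simple index satisfying the conditions of Definition 3.4 is a weighted squared distance between the cumulant vector and the target vector, $g(\kappa, \tilde{\kappa}) = \sum_i w_i (\kappa_i - \tilde{\kappa}_i)^2$. It is analytic, convex and non-negative in $\kappa$, and vanishes exactly at the target. The sketch below is illustrative; the weights are a hypothetical design choice.

```python
# An admissible MCCDS performance index (a sketch, not the paper's choice):
#     g(kappa, kappa_t) = sum_i w_i * (kappa_i - kappa_t_i)^2 .

def g_quadratic(kappa, kappa_t, weights=None):
    if weights is None:
        weights = [1.0] * len(kappa)
    return sum(w * (k - kt) ** 2 for w, k, kt in zip(weights, kappa, kappa_t))

# Hypothetical target cumulants (mean, variance, third cumulant):
kappa_target = [2.0, 1.5, 0.4]
assert g_quadratic(kappa_target, kappa_target) == 0.0   # zero at the target
```

Minimizing such an index drives each initial cost cumulant toward its target simultaneously, which is precisely the cost density-shaping objective.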
4 Problem Solution

We now seek a solution to the MCCDS optimization by employing the traditional techniques of dynamic programming for Mayer-form problems, as described in Fleming and Rishel [6], following closely along the same lines as Pham. This development specifically adapts the Hamilton–Jacobi–Bellman (HJB) Verification Lemma to the formulation of the MCCDS problem.

The key idea in our approach is to embed the optimization problem in a larger class of problems, where the terminal time $t_f$ is displaced to some $\varepsilon < t_f$. We introduce dynamic programming variables $\mathcal{Y}(\varepsilon), \tilde{\mathcal{Y}}(\varepsilon) \in \mathbb{R}^{rn\times n}$ and $\mathcal{Z}(\varepsilon), \tilde{\mathcal{Z}}(\varepsilon) \in \mathbb{R}^r$ thus:
\[
\mathcal{Y}(\varepsilon) := \big(\mathcal{Y}_1(\varepsilon), \ldots, \mathcal{Y}_r(\varepsilon)\big), \qquad
\tilde{\mathcal{Y}}(\varepsilon) := \big(\tilde{\mathcal{Y}}_1(\varepsilon), \ldots, \tilde{\mathcal{Y}}_r(\varepsilon)\big),
\]
\[
\mathcal{Z}(\varepsilon) := \big(\mathcal{Z}_1(\varepsilon), \ldots, \mathcal{Z}_r(\varepsilon)\big), \qquad
\tilde{\mathcal{Z}}(\varepsilon) := \big(\tilde{\mathcal{Z}}_1(\varepsilon), \ldots, \tilde{\mathcal{Z}}_r(\varepsilon)\big),
\]
where
\[
\mathcal{Y}_i(\varepsilon) := H_i(\varepsilon), \quad \mathcal{Z}_i(\varepsilon) := D_i(\varepsilon), \quad
\tilde{\mathcal{Y}}_i(\varepsilon) := \tilde{H}_i(\varepsilon), \quad \tilde{\mathcal{Z}}_i(\varepsilon) := \tilde{D}_i(\varepsilon), \qquad 1 \le i \le r.
\]
We are now ready to define the reachable set $\mathcal{Q}$ and the value function $\mathcal{V}(\varepsilon, \mathcal{Y}(\varepsilon), \mathcal{Z}(\varepsilon), \tilde{\mathcal{Y}}(\varepsilon), \tilde{\mathcal{Z}}(\varepsilon))$, which are core ideas in our dynamic programming formulation.
Definition 4.1 (Reachable Set) Define the reachable set as the set of displaced terminal values from which there exists a control that can take the system to the target set. More formally,
\[
\mathcal{Q} := \big\{\big(\varepsilon, \mathcal{Y}(\varepsilon), \mathcal{Z}(\varepsilon), \tilde{\mathcal{Y}}(\varepsilon), \tilde{\mathcal{Z}}(\varepsilon)\big) :
\mathcal{K}_{\varepsilon, \mathcal{Y}(\varepsilon), \mathcal{Z}(\varepsilon), \tilde{\mathcal{Y}}(\varepsilon), \tilde{\mathcal{Z}}(\varepsilon)} \neq \emptyset\big\}.
\]

Definition 4.2 (Value Function) Let $(\varepsilon, \mathcal{Y}(\varepsilon), \mathcal{Z}(\varepsilon), \tilde{\mathcal{Y}}(\varepsilon), \tilde{\mathcal{Z}}(\varepsilon)) \in [t_0, t_f] \times (S^n)^r \times \mathbb{R}^r \times (S^n)^r \times \mathbb{R}^r$, and let $\mathcal{V}(\varepsilon, \mathcal{Y}(\varepsilon), \mathcal{Z}(\varepsilon), \tilde{\mathcal{Y}}(\varepsilon), \tilde{\mathcal{Z}}(\varepsilon))$ be a scalar function $\mathcal{V} : [t_0, t_f] \times (S^n)^r \times \mathbb{R}^r \times (S^n)^r \times \mathbb{R}^r \to \mathbb{R}$ such that
\[
\mathcal{V}\big(\varepsilon, \mathcal{Y}(\varepsilon), \mathcal{Z}(\varepsilon), \tilde{\mathcal{Y}}(\varepsilon), \tilde{\mathcal{Z}}(\varepsilon)\big) =
\begin{cases}
\inf_{K \in \mathcal{K}(\varepsilon)} \phi\big(\mathcal{H}(t_0), \mathcal{D}(t_0), \tilde{\mathcal{H}}(t_0), \tilde{\mathcal{D}}(t_0)\big), & \text{if } \mathcal{K}(\varepsilon) \neq \emptyset, \\
+\infty, & \text{if } \mathcal{K}(\varepsilon) = \emptyset,
\end{cases}
\]
where $\mathcal{K}(\varepsilon)$ denotes $\mathcal{K}_{\varepsilon, \mathcal{Y}(\varepsilon), \mathcal{Z}(\varepsilon), \tilde{\mathcal{Y}}(\varepsilon), \tilde{\mathcal{Z}}(\varepsilon)}$.

Remark The symbol $S^n$ above denotes the set of real-valued, symmetric $n \times n$ matrices.

The value function $\mathcal{V}(\varepsilon, \mathcal{Y}(\varepsilon), \mathcal{Z}(\varepsilon), \tilde{\mathcal{Y}}(\varepsilon), \tilde{\mathcal{Z}}(\varepsilon))$ is a function of the displaced terminal conditions, and it possesses two innate properties. First, the value function is non-increasing when evaluated at terminal conditions taken from anywhere along a non-optimal trajectory of the state equations, i.e., one resulting from a non-optimal control $K \neq K^*$. Second, the value function is constant when evaluated at terminal conditions taken from anywhere along the optimal trajectory of the state equations resulting from the control $K^*$. These necessary conditions characterize the value function.

Interestingly enough, these necessary conditions are also sufficient, meaning that if we can find a function $\mathcal{W}(\varepsilon, \mathcal{Y}(\varepsilon), \mathcal{Z}(\varepsilon), \tilde{\mathcal{Y}}(\varepsilon), \tilde{\mathcal{Z}}(\varepsilon))$ that exhibits both of the properties above, and that satisfies a boundary condition, then that function agrees with the value function along trajectories of the state equations under any control selection. A rigorous proof of the sufficiency of the aforementioned necessary conditions can be found in [9].

The sufficient conditions as stated are quite difficult to verify in practice. For example, how does one verify that a function $\mathcal{W}(\varepsilon, \mathcal{Y}(\varepsilon), \mathcal{Z}(\varepsilon), \tilde{\mathcal{Y}}(\varepsilon), \tilde{\mathcal{Z}}(\varepsilon))$ is non-increasing for all trajectories associated with control gains $K \neq K^*$? Fortunately, the HJB verification lemma provides an easy and effective way to check this particular quality in a candidate value function.

Before proceeding to the statement of the lemma, it is important to note that our equations of motion $\dot{\mathcal{H}}(\alpha) = \mathcal{F}(\mathcal{H}(\alpha), K(\alpha))$ are matrix equations and do not correspond directly to the standard vector-type evolution equations (e.g., $\dot{x}(t) = f(x(t), u(t))$) that are prevalent in dynamic systems theory. The latter equations fit the classical framework for dynamic programming and HJB verification, whereas our equations of motion necessitate the same adaptation utilized by Pham et al. in
[3]. Essentially, the approach makes use of the $\mathrm{vec} : \mathbb{R}^{m\times n} \to \mathbb{R}^{mn}$ operator, which gives an isomorphic relationship between the space of real-valued matrices and the space of real-valued vectors. Through the $\mathrm{vec}(\cdot)$ operator, $\mathbb{R}^{m\times n}$ and $\mathbb{R}^{mn}$ are algebraically and topologically equivalent, so that relationships holding in one space must hold in the other, and vice versa. In this way, $\mathrm{vec}(\mathcal{F}(\mathcal{H}(\alpha), K(\alpha)))$ can be thought of as $f(x(t), u(t))$ in the sequel.

We provide the important definitions of $\mathrm{vec}(\cdot)$ and $\mathrm{Tr}(\cdot)$ here. To begin, define the trace operator $\mathrm{Tr}(\cdot)$ for square matrices $U \in \mathbb{R}^{n\times n}$ with components $[u_{ij}]_{i,j=1}^n$ as
\[
\mathrm{Tr}(U) = \sum_{i=1}^{n}\sum_{j=1}^{n} u_{ij}\,\delta_{ij} = \sum_{i=1}^{n} u_{ii}.
\]
The $\mathrm{vec}(\cdot)$ operator is defined as the $n^2 \times 1$ column vector formed by stacking the columns of $U$,
\[
\mathrm{vec}(U) = [\,u_{11}\ u_{21}\ \cdots\ u_{n1}\ \cdots\ u_{1n}\ u_{2n}\ \cdots\ u_{nn}\,]^T.
\]
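The two operators, and the trace identities used repeatedly in the sequel ($x^T A x = \mathrm{Tr}(A x x^T)$ and, for symmetric $B$, $\mathrm{vec}(B)^T \mathrm{vec}(A) = \mathrm{Tr}(AB) = \mathrm{Tr}(BA)$), can be checked directly; this is a minimal pure-Python sketch with nested lists.

```python
# Sketch of Tr and vec, and a numerical check of the identities used below.

def tr(U):
    return sum(U[i][i] for i in range(len(U)))

def vec(U):
    n = len(U)
    return [U[i][j] for j in range(n) for i in range(n)]  # stack columns

def matmul(A, B):
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

A = [[1.0, 2.0], [3.0, 4.0]]
B = [[2.0, -1.0], [-1.0, 5.0]]          # symmetric, as in the paper's setting
x = [1.0, -2.0]

xxT = [[xi * xj for xj in x] for xi in x]
quad = sum(x[i] * A[i][j] * x[j] for i in range(2) for j in range(2))
```

With these definitions, `quad` equals `tr(matmul(A, xxT))`, and `dot(vec(B), vec(A))` equals `tr(matmul(A, B))`, which is the mechanism by which the matrix equations of motion are treated as vector equations.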
We are now ready to give the HJB verification lemma.

Lemma 4.1 (HJB Verification) Let $(\varepsilon, \mathcal{Y}(\varepsilon), \mathcal{Z}(\varepsilon), \tilde{\mathcal{Y}}(\varepsilon), \tilde{\mathcal{Z}}(\varepsilon))$ be an interior point of the reachable set $\mathcal{Q}$. Assume the scalar function $\mathcal{W}(\varepsilon, \mathcal{Y}(\varepsilon), \mathcal{Z}(\varepsilon), \tilde{\mathcal{Y}}(\varepsilon), \tilde{\mathcal{Z}}(\varepsilon))$ satisfies the boundary condition
\[
\mathcal{W}\big(\varepsilon, \mathcal{Y}(\varepsilon), \mathcal{Z}(\varepsilon), \tilde{\mathcal{Y}}(\varepsilon), \tilde{\mathcal{Z}}(\varepsilon)\big) = \phi\big(\mathcal{H}(t_0), \mathcal{D}(t_0), \tilde{\mathcal{H}}(t_0), \tilde{\mathcal{D}}(t_0)\big), \qquad \big(\varepsilon, \mathcal{Y}(\varepsilon), \mathcal{Z}(\varepsilon), \tilde{\mathcal{Y}}(\varepsilon), \tilde{\mathcal{Z}}(\varepsilon)\big) \in \mathcal{M}, \tag{11}
\]
and also the HJB inequality of dynamic programming (with the arguments $(\varepsilon, \mathcal{Y}(\varepsilon), \mathcal{Z}(\varepsilon), \tilde{\mathcal{Y}}(\varepsilon), \tilde{\mathcal{Z}}(\varepsilon))$ of $\mathcal{W}$ suppressed),
\[
-\frac{\partial \mathcal{W}}{\partial \varepsilon} \ge
\sum_{i=1}^{r} \frac{\partial \mathcal{W}}{\partial\,\mathrm{vec}(\mathcal{Y}_i(\varepsilon))}\,\mathrm{vec}\big(\mathcal{F}_i(\mathcal{Y}(\varepsilon), K(\varepsilon))\big)
+ \sum_{i=1}^{r} \frac{\partial \mathcal{W}}{\partial\,\mathrm{vec}(\tilde{\mathcal{Y}}_i(\varepsilon))}\,\mathrm{vec}\big(\mathcal{F}_i(\tilde{\mathcal{Y}}(\varepsilon), \tilde{K}(\varepsilon))\big)
+ \sum_{i=1}^{r} \frac{\partial \mathcal{W}}{\partial \mathcal{Z}_i(\varepsilon)}\,\mathcal{G}_i\big(\mathcal{Y}(\varepsilon)\big)
+ \sum_{i=1}^{r} \frac{\partial \mathcal{W}}{\partial \tilde{\mathcal{Z}}_i(\varepsilon)}\,\mathcal{G}_i\big(\tilde{\mathcal{Y}}(\varepsilon)\big),
\qquad \forall K \in \mathcal{K}(\varepsilon). \tag{12}
\]
If, further, for a given control $K^* \in \mathcal{K}(\varepsilon)$ this function satisfies the equality
\[
-\frac{\partial \mathcal{W}}{\partial \varepsilon} =
\min_{K \in \bar{K}} \bigg\{
\sum_{i=1}^{r} \frac{\partial \mathcal{W}}{\partial\,\mathrm{vec}(\mathcal{Y}_i(\varepsilon))}\,\mathrm{vec}\big(\mathcal{F}_i(\mathcal{Y}(\varepsilon), K(\varepsilon))\big)
+ \sum_{i=1}^{r} \frac{\partial \mathcal{W}}{\partial\,\mathrm{vec}(\tilde{\mathcal{Y}}_i(\varepsilon))}\,\mathrm{vec}\big(\mathcal{F}_i(\tilde{\mathcal{Y}}(\varepsilon), \tilde{K}(\varepsilon))\big)
+ \sum_{i=1}^{r} \frac{\partial \mathcal{W}}{\partial \mathcal{Z}_i(\varepsilon)}\,\mathcal{G}_i\big(\mathcal{Y}(\varepsilon)\big)
+ \sum_{i=1}^{r} \frac{\partial \mathcal{W}}{\partial \tilde{\mathcal{Z}}_i(\varepsilon)}\,\mathcal{G}_i\big(\tilde{\mathcal{Y}}(\varepsilon)\big)
\bigg\}, \tag{13}
\]
then the control $K^*$ is optimal and $\mathcal{W}(\varepsilon, \mathcal{Y}(\varepsilon), \mathcal{Z}(\varepsilon), \tilde{\mathcal{Y}}(\varepsilon), \tilde{\mathcal{Z}}(\varepsilon)) = \mathcal{V}(\varepsilon, \mathcal{Y}(\varepsilon), \mathcal{Z}(\varepsilon), \tilde{\mathcal{Y}}(\varepsilon), \tilde{\mathcal{Z}}(\varepsilon))$.

We now propose a function $\mathcal{W}(\varepsilon, \mathcal{Y}(\varepsilon), \mathcal{Z}(\varepsilon), \tilde{\mathcal{Y}}(\varepsilon), \tilde{\mathcal{Z}}(\varepsilon))$ characterized by an unknown function $\eta(\varepsilon)$ that is to be chosen so that $\mathcal{W} = \mathcal{V}$.

Definition 4.3 (Candidate Value Function) Consider a solution of the form
\[
\mathcal{W}\big(\varepsilon, \mathcal{Y}(\varepsilon), \mathcal{Z}(\varepsilon), \tilde{\mathcal{Y}}(\varepsilon), \tilde{\mathcal{Z}}(\varepsilon)\big) = \eta(\varepsilon) - \phi\big(\mathcal{Y}(\varepsilon), \mathcal{Z}(\varepsilon), \tilde{\mathcal{Y}}(\varepsilon), \tilde{\mathcal{Z}}(\varepsilon)\big), \tag{14}
\]
where the function $\eta(\tau) \in C^1([t_0, t_f], \mathbb{R})$ is to be determined.

For any point $(\varepsilon, \mathcal{Y}(\varepsilon), \mathcal{Z}(\varepsilon), \tilde{\mathcal{Y}}(\varepsilon), \tilde{\mathcal{Z}}(\varepsilon))$ in the reachable set $\mathcal{Q}$ at which the candidate value function is differentiable, we may differentiate it directly. The following lemma establishes the form of the total time derivative $d\mathcal{W}/d\varepsilon$, which essentially is constrained by the HJB equations of dynamic programming as given in (12) and (13).

Lemma 4.2 (Derivative of Candidate Value Function) Let $r \in \mathbb{N}$ be fixed, and let $(\varepsilon, \mathcal{Y}(\varepsilon), \mathcal{Z}(\varepsilon), \tilde{\mathcal{Y}}(\varepsilon), \tilde{\mathcal{Z}}(\varepsilon)) \in \mathcal{Q}$ be an interior point of the set at which the candidate value function (14) is differentiable. Then its total time derivative takes the form
\[
\frac{d\mathcal{W}(\varepsilon, \mathcal{Y}(\varepsilon), \mathcal{Z}(\varepsilon), \tilde{\mathcal{Y}}(\varepsilon), \tilde{\mathcal{Z}}(\varepsilon))}{d\varepsilon} =
- \sum_{i=1}^{r} \frac{\partial g(\kappa(\varepsilon), \tilde{\kappa}(\varepsilon))}{\partial \kappa_i(\varepsilon)}\, x_0^T \mathcal{F}_i\big(\mathcal{Y}(\varepsilon), K(\varepsilon)\big) x_0
- \sum_{i=1}^{r} \frac{\partial g(\kappa(\varepsilon), \tilde{\kappa}(\varepsilon))}{\partial \tilde{\kappa}_i(\varepsilon)}\, x_0^T \mathcal{F}_i\big(\tilde{\mathcal{Y}}(\varepsilon), \tilde{K}(\varepsilon)\big) x_0
- \sum_{i=1}^{r} \frac{\partial g(\kappa(\varepsilon), \tilde{\kappa}(\varepsilon))}{\partial \kappa_i(\varepsilon)}\, \mathcal{G}_i\big(\mathcal{Y}(\varepsilon)\big)
- \sum_{i=1}^{r} \frac{\partial g(\kappa(\varepsilon), \tilde{\kappa}(\varepsilon))}{\partial \tilde{\kappa}_i(\varepsilon)}\, \mathcal{G}_i\big(\tilde{\mathcal{Y}}(\varepsilon)\big)
+ \frac{d\eta(\varepsilon)}{d\varepsilon}. \tag{15}
\]
Proof We proceed by differentiating (14) to obtain the following formal expression, which follows from the definition of the candidate value function. Throughout the proof we abbreviate $\phi := \phi(\mathcal{Y}(\varepsilon), \mathcal{Z}(\varepsilon), \tilde{\mathcal{Y}}(\varepsilon), \tilde{\mathcal{Z}}(\varepsilon))$ and $g := g(\kappa(\varepsilon), \tilde{\kappa}(\varepsilon))$:
\[
\frac{d\mathcal{W}(\varepsilon, \mathcal{Y}(\varepsilon), \mathcal{Z}(\varepsilon), \tilde{\mathcal{Y}}(\varepsilon), \tilde{\mathcal{Z}}(\varepsilon))}{d\varepsilon} =
- \sum_{i=1}^{r} \frac{\partial \phi}{\partial\,\mathrm{vec}(\mathcal{Y}_i(\varepsilon))}\,\mathrm{vec}\big(\mathcal{F}_i(\mathcal{Y}(\varepsilon), K(\varepsilon))\big)
- \sum_{i=1}^{r} \frac{\partial \phi}{\partial\,\mathrm{vec}(\tilde{\mathcal{Y}}_i(\varepsilon))}\,\mathrm{vec}\big(\mathcal{F}_i(\tilde{\mathcal{Y}}(\varepsilon), \tilde{K}(\varepsilon))\big)
- \sum_{i=1}^{r} \frac{\partial \phi}{\partial \mathcal{Z}_i(\varepsilon)}\,\mathcal{G}_i\big(\mathcal{Y}(\varepsilon)\big)
- \sum_{i=1}^{r} \frac{\partial \phi}{\partial \tilde{\mathcal{Z}}_i(\varepsilon)}\,\mathcal{G}_i\big(\tilde{\mathcal{Y}}(\varepsilon)\big)
+ \frac{d\eta(\varepsilon)}{d\varepsilon}, \qquad \forall K \in \mathcal{K}(\varepsilon). \tag{16}
\]
Note the cumulant and target cost cumulant forms when the cost is integral-quadratic and the control input is of the linear state-feedback type. In particular, these are
\[
\kappa_i(\varepsilon) = x_0^T \mathcal{Y}_i(\varepsilon) x_0 + \mathcal{Z}_i(\varepsilon), \qquad
\tilde{\kappa}_i(\varepsilon) = x_0^T \tilde{\mathcal{Y}}_i(\varepsilon) x_0 + \tilde{\mathcal{Z}}_i(\varepsilon).
\]
Note also the following properties of the $\mathrm{vec}(\cdot)$ and $\mathrm{Tr}(\cdot)$ operators for $x \in \mathbb{R}^n$ and symmetric $A, B \in \mathbb{R}^{n\times n}$:
\[
x^T A x = \mathrm{Tr}\big(A x x^T\big) = \mathrm{Tr}\big(x x^T A\big), \qquad
\mathrm{vec}(B)^T \mathrm{vec}(A) = \mathrm{Tr}(AB) = \mathrm{Tr}(BA).
\]
Using the cumulant forms, the chain rule of differentiation, and the properties above, we write
\[
\frac{\partial \phi}{\partial\,\mathrm{vec}(\mathcal{Y}_i(\varepsilon))}\,\mathrm{vec}\big(\mathcal{F}_i(\mathcal{Y}(\varepsilon), K(\varepsilon))\big)
= \frac{\partial g}{\partial \kappa_i(\varepsilon)}\,\frac{\partial \kappa_i(\varepsilon)}{\partial\,\mathrm{vec}(\mathcal{Y}_i(\varepsilon))}\,\mathrm{vec}\big(\mathcal{F}_i(\mathcal{Y}(\varepsilon), K(\varepsilon))\big)
= \frac{\partial g}{\partial \kappa_i(\varepsilon)}\,\mathrm{vec}\big(x_0 x_0^T\big)^T \mathrm{vec}\big(\mathcal{F}_i(\mathcal{Y}(\varepsilon), K(\varepsilon))\big)
= \frac{\partial g}{\partial \kappa_i(\varepsilon)}\, x_0^T \mathcal{F}_i\big(\mathcal{Y}(\varepsilon), K(\varepsilon)\big) x_0; \tag{17}
\]
\[
\frac{\partial \phi}{\partial\,\mathrm{vec}(\tilde{\mathcal{Y}}_i(\varepsilon))}\,\mathrm{vec}\big(\mathcal{F}_i(\tilde{\mathcal{Y}}(\varepsilon), \tilde{K}(\varepsilon))\big)
= \frac{\partial g}{\partial \tilde{\kappa}_i(\varepsilon)}\,\frac{\partial \tilde{\kappa}_i(\varepsilon)}{\partial\,\mathrm{vec}(\tilde{\mathcal{Y}}_i(\varepsilon))}\,\mathrm{vec}\big(\mathcal{F}_i(\tilde{\mathcal{Y}}(\varepsilon), \tilde{K}(\varepsilon))\big)
= \frac{\partial g}{\partial \tilde{\kappa}_i(\varepsilon)}\,\mathrm{vec}\big(x_0 x_0^T\big)^T \mathrm{vec}\big(\mathcal{F}_i(\tilde{\mathcal{Y}}(\varepsilon), \tilde{K}(\varepsilon))\big)
= \frac{\partial g}{\partial \tilde{\kappa}_i(\varepsilon)}\, x_0^T \mathcal{F}_i\big(\tilde{\mathcal{Y}}(\varepsilon), \tilde{K}(\varepsilon)\big) x_0; \tag{18}
\]
\[
\frac{\partial \phi}{\partial \mathcal{Z}_i(\varepsilon)}
= \frac{\partial g}{\partial \kappa_i(\varepsilon)}\,\frac{\partial \kappa_i(\varepsilon)}{\partial \mathcal{Z}_i(\varepsilon)}
= \frac{\partial g}{\partial \kappa_i(\varepsilon)}; \tag{19}
\]
\[
\frac{\partial \phi}{\partial \tilde{\mathcal{Z}}_i(\varepsilon)}
= \frac{\partial g}{\partial \tilde{\kappa}_i(\varepsilon)}\,\frac{\partial \tilde{\kappa}_i(\varepsilon)}{\partial \tilde{\mathcal{Z}}_i(\varepsilon)}
= \frac{\partial g}{\partial \tilde{\kappa}_i(\varepsilon)}.
\]
Inserting the relations (17), (18), and (19) into the expression (16) gives the desired form (15), and the proof is complete. We are now ready to state the main result of this paper. Theorem 4.1 (State-Feedback Solution to MCCDS Optimization) Consider the LQG stochastic optimal control problem involving the process (1) and the cost (3). Then the linear state-feedback, finite-horizon, optimal control solution to the MCCDS optimization is characterized by the optimal gain ∗
K (α) = −R
−1
T
(α)B (α)
H1∗ (α) +
∂g(κ ∗ (α),κ(α)) ˜ r ∂κ (α) i
i=2
˜ ∂g(κ ∗ (α),κ(α)) ∂κ1 (α)
Hi∗ (α)
,
(20)
where the optimal cost cumulants and target cost cumulants are defined by κi∗ (α) = x0T Hi∗ (α)x0 + Di∗ (α)
and κ˜ i (α) = x0T H˜ i (α)x0 + D˜ i (α),
1 ≤ i ≤ r.
˜ ˜ D(α) follow the equations of The optimal state variables H∗ (α), D∗ (α) and H(α), motion ˜ dH∗ (α) d H(α) ˜ ˜ = F H∗ (α), K ∗ (α) , = F H(α), K(α) , dα dα ˜ d D(α) dD∗ (α) ˜ = G H∗ (α) , = G H(α) , α ∈ [t0 , tf ], dα dα
J Optim Theory Appl (2011) 150:251–274
265
˜ f ) = Hf ;E ∗ , H(t
H∗ (tf ) = Hf ,
D∗ (tf ) = Df ,
˜ f ) = Df ; ∗ D(t
where the initial values of the state trajectories satisfy ˜ 0 ), D(t ˜ 0 ) ∈ M. t0 , H(t0 ), D(t0 ), H(t Proof The objective is to identify a control gain K ∗ and a function η() such that r ∂ T dη() ˜ = min g κ(), κ() x0 Fi Y(), K() x0 ¯ d ∂κ () K∈K i i=1
+
r i=1
+
r i=1
+
r i=1
∂ ˜ g κ(), κ() Gi Y() ∂κi () T ∂ ˜ ˜ ˜ K() x0 g κ(), κ() x0 Fi Y(), ∂ κ˜ i () ∂ ˜ ˜ g κ(), κ() Gi Y() . ∂ κ˜ i ()
(21)
We begin with the minimization on the right-hand side of (21). In the following ˜ ˜ differentiation, we denote the partial derivatives of φ(Y(), Z(), Y(), Z()) as shown below to simplify the notation, ci () =
∂ ˜ g κ(), κ() , ∂κi ()
1 ≤ i ≤ r.
We differentiate the expression in braces in (21) with respect to K and set the resulting form equal to a zero matrix with the appropriate dimension. This is the necessary condition for the expression to take an extremal value on the interior of its domain. −2B () T
r
ci ()Yi () x0 x0T − 2c1 ()R()K() x0 x0T = 0m×n .
i=1
Assume that c1 () = 0, ∀ ∈ [t0 , tf ]. Since x0 x0T is an fixed rank-one matrix, we must have r ci∗ () ∗ ∗ −1 T ∗ Y () K () = −R ()B () Y1 () + c1∗ () i i=2
= −R
−1
T
()B ()
Y1∗ () +
r i=2
∂g(κ ∗ (),κ()) ˜ ∂κi () ˜ ∂g(κ ∗ (),κ()) ∂κ1 ()
Yi∗ ()
.
(22)
Let Y*(τ) and Z*(τ) denote the solutions of the equations of motion under this control selection:
\[
\frac{dY^{*}(\tau)}{d\tau} = F\bigl(Y^{*}(\tau), K^{*}(\tau)\bigr), \qquad
\frac{dZ^{*}(\tau)}{d\tau} = G\bigl(Y^{*}(\tau)\bigr), \tag{23}
\]
\[
Y_{i}^{*}(t_{f}) = H_{i}(t_{f}), \qquad Z_{i}^{*}(t_{f}) = D_{i}(t_{f}), \qquad \tau \in [\varepsilon, t_{f}],\; 1 \le i \le r.
\]
We assume that the solutions to the above equations exist. In addition, let κ*(ε) be the vector of cost cumulants determined using the Y*(ε) and Z*(ε) above, according to the relations
\[
\kappa_{i}^{*}(\varepsilon) = x_{0}^{T} Y_{i}^{*}(\varepsilon) x_{0} + Z_{i}^{*}(\varepsilon), \qquad 1 \le i \le r.
\]
Return again to the minimized expression on the right-hand side of (21), inserting the controller (22):
\[
\frac{d\eta(\varepsilon)}{d\varepsilon}
= \sum_{i=1}^{r} \frac{\partial g(\kappa^{*}(\varepsilon),\tilde{\kappa}(\varepsilon))}{\partial\kappa_{i}^{*}(\varepsilon)}\, x_{0}^{T} F_{i}\bigl(Y^{*}(\varepsilon), K^{*}(\varepsilon)\bigr) x_{0}
+ \sum_{i=1}^{r} \frac{\partial g(\kappa^{*}(\varepsilon),\tilde{\kappa}(\varepsilon))}{\partial\kappa_{i}^{*}(\varepsilon)}\, G_{i}\bigl(Y^{*}(\varepsilon)\bigr)
+ \sum_{i=1}^{r} \frac{\partial g(\kappa^{*}(\varepsilon),\tilde{\kappa}(\varepsilon))}{\partial\tilde{\kappa}_{i}(\varepsilon)}\, x_{0}^{T} F_{i}\bigl(\tilde{Y}(\varepsilon), \tilde{K}(\varepsilon)\bigr) x_{0}
+ \sum_{i=1}^{r} \frac{\partial g(\kappa^{*}(\varepsilon),\tilde{\kappa}(\varepsilon))}{\partial\tilde{\kappa}_{i}(\varepsilon)}\, G_{i}\bigl(\tilde{Y}(\varepsilon)\bigr)
\]
\[
= \sum_{i=1}^{r} \frac{\partial\phi(Y^{*}(\varepsilon), Z^{*}(\varepsilon), \tilde{Y}(\varepsilon), \tilde{Z}(\varepsilon))}{\partial \operatorname{vec}(Y_{i}^{*}(\varepsilon))}\, \operatorname{vec}\bigl(F_{i}(Y^{*}(\varepsilon), K^{*}(\varepsilon))\bigr)
+ \sum_{i=1}^{r} \frac{\partial\phi(Y^{*}(\varepsilon), Z^{*}(\varepsilon), \tilde{Y}(\varepsilon), \tilde{Z}(\varepsilon))}{\partial Z_{i}^{*}(\varepsilon)}\, G_{i}\bigl(Y^{*}(\varepsilon)\bigr)
+ \sum_{i=1}^{r} \frac{\partial\phi(Y^{*}(\varepsilon), Z^{*}(\varepsilon), \tilde{Y}(\varepsilon), \tilde{Z}(\varepsilon))}{\partial \operatorname{vec}(\tilde{Y}_{i}(\varepsilon))}\, \operatorname{vec}\bigl(F_{i}(\tilde{Y}(\varepsilon), \tilde{K}(\varepsilon))\bigr)
+ \sum_{i=1}^{r} \frac{\partial\phi(Y^{*}(\varepsilon), Z^{*}(\varepsilon), \tilde{Y}(\varepsilon), \tilde{Z}(\varepsilon))}{\partial \tilde{Z}_{i}(\varepsilon)}\, G_{i}\bigl(\tilde{Y}(\varepsilon)\bigr)
\]
\[
= \frac{d\phi(Y^{*}(\varepsilon), Z^{*}(\varepsilon), \tilde{Y}(\varepsilon), \tilde{Z}(\varepsilon))}{d\varepsilon}.
\]
Note that (ε, Y(ε), Z(ε), Ỹ(ε), Z̃(ε)) ∈ Q for some K ∈ K(ε). Recall that, by assumption, K* is admissible on [t0, ε]; thus (ε, Y*(ε), Z*(ε), Ỹ(ε), Z̃(ε)) ∈ Q, since K* ∈ K(ε). Consider now that any displaced terminal conditions at τ < ε along the respective state trajectories resultant from K* will also be in Q, because the restriction of K* to [t0, τ] lies in K(τ). Clearly then, we can similarly argue that (τ, Y*(τ), Z*(τ), Ỹ(τ), Z̃(τ)) ∈ Q for all τ ∈ [t0, ε]. We wish to determine η(ε) so as to enforce the above relationship for all possible displaced terminal times τ < ε and the associated terminal conditions, taken from anywhere along the optimal trajectory of the state equations:
\[
\frac{d\eta(\tau)}{d\tau} = \frac{d\phi(Y^{*}(\tau), Z^{*}(\tau), \tilde{Y}(\tau), \tilde{Z}(\tau))}{d\tau}, \qquad \tau \in [t_{0}, \varepsilon].
\]
This differential equation can be integrated over a reduced time horizon, since the state equations for Y_i*(τ), Z_i*(τ), Ỹ_i(τ), and Z̃_i(τ), 1 ≤ i ≤ r, are continuously differentiable. By the Fundamental Theorem we have
\[
\phi\bigl(Y^{*}(\varepsilon), Z^{*}(\varepsilon), \tilde{Y}(\varepsilon), \tilde{Z}(\varepsilon)\bigr) - \phi\bigl(Y^{*}(t_{0}), Z^{*}(t_{0}), \tilde{Y}(t_{0}), \tilde{Z}(t_{0})\bigr)
= -\int_{\varepsilon}^{t_{0}} \frac{d}{d\tau}\,\phi\bigl(Y^{*}(\tau), Z^{*}(\tau), \tilde{Y}(\tau), \tilde{Z}(\tau)\bigr)\, d\tau,
\]
\[
\eta(\varepsilon) - \eta(t_{0}) = -\int_{\varepsilon}^{t_{0}} \frac{d\eta(\tau)}{d\tau}\, d\tau,
\]
from which it is immediate that
\[
\eta(\varepsilon) = \eta(t_{0}) - \phi\bigl(Y^{*}(t_{0}), Z^{*}(t_{0}), \tilde{Y}(t_{0}), \tilde{Z}(t_{0})\bigr) + \phi\bigl(Y^{*}(\varepsilon), Z^{*}(\varepsilon), \tilde{Y}(\varepsilon), \tilde{Z}(\varepsilon)\bigr).
\]
Since (t0, Y*(t0), Z*(t0), Ỹ(t0), Z̃(t0)) ∈ M, the target set, we can rewrite the above equation as
\[
\eta(\varepsilon) = \eta(t_{0}) - \phi\bigl(H^{*}(t_{0}), D^{*}(t_{0}), \tilde{H}(t_{0}), \tilde{D}(t_{0})\bigr) + \phi\bigl(Y^{*}(\varepsilon), Z^{*}(\varepsilon), \tilde{Y}(\varepsilon), \tilde{Z}(\varepsilon)\bigr).
\]
The function η(ε) is determined up to its initial condition, which remains to be constrained for this problem. To achieve this end, the Mayer-form boundary condition (11) of the verification lemma is used, which is
\[
W\bigl(t_{0}, Y(t_{0}), Z(t_{0}), \tilde{Y}(t_{0}), \tilde{Z}(t_{0})\bigr)
= \eta(t_{0}) - \phi\bigl(Y(t_{0}), Z(t_{0}), \tilde{Y}(t_{0}), \tilde{Z}(t_{0})\bigr)
= \eta(t_{0}) - \phi\bigl(H(t_{0}), D(t_{0}), \tilde{H}(t_{0}), \tilde{D}(t_{0})\bigr)
= \phi\bigl(H(t_{0}), D(t_{0}), \tilde{H}(t_{0}), \tilde{D}(t_{0})\bigr). \tag{24}
\]
This condition requires η(t0) = 2φ(H(t0), D(t0), H̃(t0), D̃(t0)), and thus η(ε) is determined completely as
\[
\eta(\varepsilon) = 2\phi\bigl(H(t_{0}), D(t_{0}), \tilde{H}(t_{0}), \tilde{D}(t_{0})\bigr) - \phi\bigl(H^{*}(t_{0}), D^{*}(t_{0}), \tilde{H}(t_{0}), \tilde{D}(t_{0})\bigr) + \phi\bigl(Y^{*}(\varepsilon), Z^{*}(\varepsilon), \tilde{Y}(\varepsilon), \tilde{Z}(\varepsilon)\bigr). \tag{25}
\]
Our candidate value function becomes
\[
W\bigl(\varepsilon, Y(\varepsilon), Z(\varepsilon), \tilde{Y}(\varepsilon), \tilde{Z}(\varepsilon)\bigr)
= 2\phi\bigl(H(t_{0}), D(t_{0}), \tilde{H}(t_{0}), \tilde{D}(t_{0})\bigr) - \phi\bigl(H^{*}(t_{0}), D^{*}(t_{0}), \tilde{H}(t_{0}), \tilde{D}(t_{0})\bigr)
+ \phi\bigl(Y^{*}(\varepsilon), Z^{*}(\varepsilon), \tilde{Y}(\varepsilon), \tilde{Z}(\varepsilon)\bigr) - \phi\bigl(Y(\varepsilon), Z(\varepsilon), \tilde{Y}(\varepsilon), \tilde{Z}(\varepsilon)\bigr). \tag{26}
\]
The selection of K* per (22) and of η(ε) per (25) makes W(ε, Y(ε), Z(ε), Ỹ(ε), Z̃(ε)) satisfy the requirements of the verification lemma, which we briefly verify in the following. First, recall that our selection of η(t0) in (24) achieves (11). To verify (12) and (13), consider the following full expansion of the total time derivative of W(·):
\[
\frac{dW}{d\varepsilon}
= \frac{\partial W}{\partial\varepsilon}
+ \frac{\partial W}{\partial \operatorname{vec}(Y(\varepsilon))}\, \operatorname{vec}\bigl(F(Y(\varepsilon), K(\varepsilon))\bigr)
+ \frac{\partial W}{\partial Z(\varepsilon)}\, G\bigl(Y(\varepsilon)\bigr)
+ \frac{\partial W}{\partial \operatorname{vec}(\tilde{Y}(\varepsilon))}\, \operatorname{vec}\bigl(F(\tilde{Y}(\varepsilon), \tilde{K}(\varepsilon))\bigr)
+ \frac{\partial W}{\partial \tilde{Z}(\varepsilon)}\, G\bigl(\tilde{Y}(\varepsilon)\bigr). \tag{27}
\]
A slight manipulation of the right-hand side of (27) makes it easier to see that (28) below is equivalent to (12), and (29) below is equivalent to (13):
\[
\frac{dW}{d\varepsilon} \le 0, \qquad K \ne K^{*}, \tag{28}
\]
\[
\frac{dW}{d\varepsilon} = 0, \qquad K = K^{*}. \tag{29}
\]
Condition (28) requires W(·) to be non-increasing along non-optimal state trajectories, and (29) requires W(·) to be constant along optimal state trajectories. We now confirm that both conditions hold for the determined candidate value function (26). A brief check reveals that (26) is non-increasing when evaluated along terminal conditions from trajectories resultant from any and all K ≠ K*:
\[
\frac{dW(\varepsilon, Y(\varepsilon), Z(\varepsilon), \tilde{Y}(\varepsilon), \tilde{Z}(\varepsilon))}{d\varepsilon}
= \frac{d\phi(Y^{*}(\varepsilon), Z^{*}(\varepsilon), \tilde{Y}(\varepsilon), \tilde{Z}(\varepsilon))}{d\varepsilon}
- \frac{d\phi(Y(\varepsilon), Z(\varepsilon), \tilde{Y}(\varepsilon), \tilde{Z}(\varepsilon))}{d\varepsilon}
= \min_{K \in \bar{\mathcal{K}}} \frac{d\phi(Y(\varepsilon), Z(\varepsilon), \tilde{Y}(\varepsilon), \tilde{Z}(\varepsilon))}{d\varepsilon}
- \frac{d\phi(Y(\varepsilon), Z(\varepsilon), \tilde{Y}(\varepsilon), \tilde{Z}(\varepsilon))}{d\varepsilon}
\le 0.
\]
Further, using the displaced terminal condition (Y*(ε), Z*(ε), Ỹ(ε), Z̃(ε)) and then choosing K = K* over [t0, ε] makes H(t0) = H*(t0), D(t0) = D*(t0) and also Y(ε) = Y*(ε), Z(ε) = Z*(ε), so that (26) assumes the constant value
\[
W\bigl(\varepsilon, Y^{*}(\varepsilon), Z^{*}(\varepsilon), \tilde{Y}(\varepsilon), \tilde{Z}(\varepsilon)\bigr) = \phi\bigl(H^{*}(t_{0}), D^{*}(t_{0}), \tilde{H}(t_{0}), \tilde{D}(t_{0})\bigr),
\]
and clearly dW(ε, Y*(ε), Z*(ε), Ỹ(ε), Z̃(ε))/dε = 0.
By the verification lemma, W(ε, Y(ε), Z(ε), Ỹ(ε), Z̃(ε)) = V(ε, Y(ε), Z(ε), Ỹ(ε), Z̃(ε)), and the derived control (22) is optimal.
5 Ties to Existing Theory

When a performance index is considered that is a linear combination of cost cumulants with positive weights, and all target statistics are set to zero, it can be directly verified that the performance index is non-negative, convex, and analytic. Hence, such selections lead to optimizations that are well-suited to the MCCDS formulation, and we can apply the form of the MCCDS control solution directly. Three particular cases are considered.

5.1 Linear Quadratic Gaussian

Assume a performance index of the form
\[
g_{\mathrm{LQG}}\bigl(\kappa_{1}(t_{0}), \tilde{\kappa}_{1}(t_{0})\bigr) = \kappa_{1}(t_{0}) - \tilde{\kappa}_{1}(t_{0}). \tag{30}
\]
Assume the target κ̃_1(α) = 0, α ∈ [t0, tf]. We apply the form of the MCCDS control solution directly. It is immediate that the MCCDS controller form yields the desired LQG controller; for r > 1,
\[
K_{\mathrm{LQG}}^{\mathrm{MCCDS}}(\alpha)
= -R^{-1}(\alpha)B^{T}(\alpha)H_{1}(\alpha)
- R^{-1}(\alpha)B^{T}(\alpha) \sum_{i=2}^{r}
\underbrace{\frac{\partial g_{\mathrm{LQG}}(\kappa_{1}(\alpha), 0)/\partial\kappa_{i}(\alpha)}{\partial g_{\mathrm{LQG}}(\kappa_{1}(\alpha), 0)/\partial\kappa_{1}(\alpha)}}_{=\,0/1}\, H_{i}(\alpha)
= -R^{-1}(\alpha)B^{T}(\alpha)H_{1}(\alpha).
\]

5.2 k Cost Cumulants
Now assume a performance index that is a weighted sum of the first k cost cumulants,
\[
g_{k\mathrm{CC}}\bigl(\kappa(t_{0}), \tilde{\kappa}(t_{0})\bigr) = \sum_{i=1}^{k} \mu_{i}\bigl(\kappa_{i}(t_{0}) - \tilde{\kappa}_{i}(t_{0})\bigr), \qquad \mu_{1} > 0,\; \mu_{i} \ge 0,\; 2 \le i \le k,\; k \in \mathbb{N}. \tag{31}
\]
Assume the targets κ̃_i(α) = 0, α ∈ [t0, tf], 1 ≤ i ≤ k. We again apply the form of the MCCDS control solution. It is immediate that the MCCDS controller form yields the desired kCC controller, that is,
\[
K_{k\mathrm{CC}}^{\mathrm{MCCDS}}(\alpha)
= -R^{-1}(\alpha)B^{T}(\alpha)H_{1}(\alpha)
- R^{-1}(\alpha)B^{T}(\alpha) \sum_{i=2}^{k}
\underbrace{\frac{\partial g_{k\mathrm{CC}}(\kappa(\alpha), 0_{k\times 1})/\partial\kappa_{i}(\alpha)}{\partial g_{k\mathrm{CC}}(\kappa(\alpha), 0_{k\times 1})/\partial\kappa_{1}(\alpha)}}_{=\,\mu_{i}/\mu_{1}}\, H_{i}(\alpha)
\]
\[
= -R^{-1}(\alpha)B^{T}(\alpha)\Biggl[ H_{1}(\alpha) + \sum_{i=2}^{k} \frac{\mu_{i}}{\mu_{1}} H_{i}(\alpha) \Biggr]
= -R^{-1}(\alpha)B^{T}(\alpha) \sum_{i=1}^{k} \hat{\mu}_{i} H_{i}(\alpha), \qquad \hat{\mu}_{i} = \frac{\mu_{i}}{\mu_{1}},\; 1 \le i \le k.
\]
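As a numerical sanity check, the two forms of the kCC gain above coincide, and setting μ_i = 0 for i ≥ 2 recovers the LQG gain of Sect. 5.1. The matrices B, R, H_i here are random placeholders; the weight vector mimics the 4CC-style selections used later in Sect. 6:

```python
import numpy as np

rng = np.random.default_rng(1)
n, p, k = 4, 2, 4                      # states, inputs, cumulants used

def sym(S):
    return (S + S.T) / 2

B = rng.standard_normal((n, p))
M = rng.standard_normal((p, p))
R = M @ M.T + p * np.eye(p)
H = [sym(rng.standard_normal((n, n))) for _ in range(k)]
mu = [1.0, 1.0e-5, 9.0e-12, 2.0e-20]   # mu_1 > 0, mu_i >= 0

Rinv_Bt = np.linalg.solve(R, B.T)
# Form 1: -R^{-1}B^T [H_1 + sum_{i>=2} (mu_i/mu_1) H_i]
K_sum = -Rinv_Bt @ (H[0] + sum((mu[i] / mu[0]) * H[i] for i in range(1, k)))
# Form 2: -R^{-1}B^T sum_i mu_hat_i H_i with mu_hat_i = mu_i/mu_1
mu_hat = [w / mu[0] for w in mu]
K_hat = -Rinv_Bt @ sum(mu_hat[i] * H[i] for i in range(k))
assert np.allclose(K_sum, K_hat)

# Zeroing the higher weights collapses the sum to the LQG gain:
K_lqg = -Rinv_Bt @ H[0]
assert np.allclose(-Rinv_Bt @ sum([1.0, 0, 0, 0][i] * H[i] for i in range(k)), K_lqg)
```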
5.3 Risk-Sensitive Control

Assume that k → ∞ presents no problems with the convergence of the performance index given in (31) when the weights are chosen to be μ_i = θ^i/i! and the targets are zero. That is, we assume the function below is well-defined:
\[
g_{\mathrm{RS}}\bigl(\kappa(t_{0}), \tilde{\kappa}(t_{0})\bigr) = \sum_{i=1}^{\infty} \frac{\theta^{i}}{i!}\bigl(\kappa_{i}(t_{0}) - \tilde{\kappa}_{i}(t_{0})\bigr), \qquad \theta > 0. \tag{32}
\]
Assume the targets κ̃_i(α) = 0, α ∈ [t0, tf], i ∈ ℕ. Consider a solution fitting the MCCDS form, assuming that the series (32) can be differentiated term by term. The RS control shown in [2] follows:
\[
K_{\mathrm{RS}}^{\mathrm{MCCDS}}(\alpha)
= -R^{-1}(\alpha)B^{T}(\alpha)H_{1}(\alpha)
- R^{-1}(\alpha)B^{T}(\alpha) \sum_{i=2}^{\infty}
\underbrace{\frac{\partial g_{\mathrm{RS}}(\kappa(\alpha), 0, 0, \ldots)/\partial\kappa_{i}(\alpha)}{\partial g_{\mathrm{RS}}(\kappa(\alpha), 0, 0, \ldots)/\partial\kappa_{1}(\alpha)}}_{=\,\theta^{i-1}/i!}\, H_{i}(\alpha)
\]
\[
= -R^{-1}(\alpha)B^{T}(\alpha)\Biggl[ H_{1}(\alpha) + \sum_{i=2}^{\infty} \frac{\theta^{i-1}}{i!}\, H_{i}(\alpha) \Biggr]
= -R^{-1}(\alpha)B^{T}(\alpha)\,\frac{1}{\theta} \sum_{i=1}^{\infty} \frac{\theta^{i}}{i!}\, H_{i}(\alpha)
= -\bigl[\theta R(\alpha)\bigr]^{-1} B^{T}(\alpha) \sum_{i=1}^{\infty} \frac{\theta^{i}}{i!}\, H_{i}(\alpha).
\]
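Truncating the series at N terms, a quick numerical check (again with random placeholder B, R, H_i, none of which come from the paper) confirms that the bracketed form and the final form with the [θR]^{-1} prefactor agree:

```python
import numpy as np
from math import factorial

rng = np.random.default_rng(2)
n, p, N = 4, 2, 12                     # truncation order N for the series
theta = 0.3

def sym(S):
    return (S + S.T) / 2

B = rng.standard_normal((n, p))
M = rng.standard_normal((p, p))
R = M @ M.T + p * np.eye(p)
H = [sym(rng.standard_normal((n, n))) for _ in range(N)]   # H[i-1] plays H_i

# Ratio of partials: (theta^i/i!) / (theta^1/1!) = theta^(i-1)/i!
inner = H[0] + sum((theta**(i - 1) / factorial(i)) * H[i - 1] for i in range(2, N + 1))
series = sum((theta**i / factorial(i)) * H[i - 1] for i in range(1, N + 1))

K1 = -np.linalg.solve(R, B.T) @ inner              # bracketed form
K2 = -np.linalg.solve(theta * R, B.T) @ series     # -[theta R]^{-1} B^T form
assert np.allclose(K1, K2)
```

The agreement is just the factorization (1/θ)·Σ θ^i/i! = H_1-term plus the θ^{i-1}/i! tail, applied term by term to the truncated series.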
6 Application to Earthquake Engineering

The first-generation benchmark serves to validate the MCCDS theory. This problem involves a 3-story test structure that is subjected to 1-dimensional ground motion in order to simulate the effects of seismic disturbances. In particular, the structure sits on a shaking table that is excited according to historical data from the El Centro and Hachinohe earthquakes. The test structure frame is constructed of steel, and has a mass of 77 kg and a height of 158 cm. The floor masses for the three floors are distributed evenly, and sum to a total mass of 227 kg. On each floor of the structure, accelerometers are mounted to record accelerations. For control purposes, a representative Active Mass Driver (AMD) has been deployed on the third story of the structure; it consists of a single hydraulic actuator with steel masses attached to the end of a piston rod. For this experiment, the moving mass of the AMD was 5.2 kg, which amounted to 1.7% of the total mass of the structure. Separate sensors are used to record the displacement and acceleration of the AMD device. For further details on the experiment's setup, the system model, and controller evaluations, consult [3] and [10].
The goal of the initial study was to design a control that optimizes consumption of control resources while best attenuating disturbances according to 10 performance measures {J_i}, 1 ≤ i ≤ 10. These measures pertain to the structural dynamics of the test building when protected by the AMD under a given control actuation, and to the quality and costs associated with the control effort. In particular, J1, J6 pertain to maximum, normalized inter-story drifts, and J2, J7 pertain to maximum, normalized per-story accelerations of the structure. Respectively, the performance measures J3, J4, J5 as well as J8, J9, J10 deal with the physical size of the actuator, the control energy expended by a given control law, and the magnitude of control forces generated by the actuator. The first five performance measures, J1 through J5, measure the rms response of the structure when subjected to excitations from the Kanai–Tajimi (K-T) spectrum. On the other hand, J6 through J10 assess peak response when the structure is excited with historical data from the El Centro and Hachinohe earthquakes. In general, lower values of J1 and J2 are a good tradeoff for increased values of J3, J4, and J5, within acceptable limits on the control effort. Likewise, decreased J6 and J7 are considered a good tradeoff for increased values of J8, J9, and J10. For a full description of these performance criteria and the restrictions on control implementation, control energy, and control force, see [3].

A baseline LQG compensation for the AMD benchmark is described in [10], and all controllers in the following are computed using the weighting matrices of the proposed cost for the LQG design, in addition to the system matrices of the linear time-invariant, state-space model for the structure and the matrices used in the computation of the output. We use cost statistics generated from controls of the form
\[
\tilde{K}(\alpha) = -R^{-1}(\alpha)B^{T}(\alpha)\bigl[\tilde{H}_{1}(\alpha) + \mu_{2}\tilde{H}_{2}(\alpha) + \mu_{3}\tilde{H}_{3}(\alpha) + \mu_{4}\tilde{H}_{4}(\alpha)\bigr]. \tag{33}
\]
A family of MCCDS controls is now evaluated based upon the amount of reduction in J1, J6 and J2, J7 in comparison to the levels of these metrics achieved with the 4CC control of [3]. Our approach follows the Statistical Target Selection (STS) method introduced in [1], which enables the control designer to formulate a linear control that is optimal with respect to a family of target cost densities, as approximated by the first r cumulants of the associated variate. Such might be regarded as a new approach to robust LQG design. For this numerical experiment, r = 4, and the target cost statistics are chosen as the expectation of a family of 4-point probability mass functions, as described in the following.

Consider now random quantities κ̃_i^ξ(α) that assume the values κ̃_{i,LQG}(α), κ̃_{i,2CC}(α), κ̃_{i,3CC}(α), and κ̃_{i,4CC}(α) at specified frequencies. Use as the ith target the expectation of κ̃_i^ξ(α), where 1 ≤ i ≤ 4:
\[
\tilde{\kappa}_{i}(\alpha) = E_{P_{\theta}}\bigl\{\tilde{\kappa}_{i}^{\xi}(\alpha)\bigr\}
= \theta_{1}\,\tilde{\kappa}_{i,\mathrm{LQG}}(\alpha) + \theta_{2}\,\tilde{\kappa}_{i,\mathrm{2CC}}(\alpha) + \theta_{3}\,\tilde{\kappa}_{i,\mathrm{3CC}}(\alpha) + \theta_{4}\,\tilde{\kappa}_{i,\mathrm{4CC}}(\alpha),
\qquad \sum_{j=1}^{4} \theta_{j} = 1. \tag{34}
\]
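The targets (34) are convex combinations. Sweeping θ over a grid with the increment Δθ = 0.1 used in the experiment below yields 286 parameter sets, consistent with the "over 250" controllers reported; the cumulant values in this sketch are hypothetical placeholders, not data from the paper:

```python
# Enumerate (theta_1,...,theta_4) >= 0 summing to 1 in steps of 0.1.
# Working in integer tenths avoids floating-point accumulation issues.
grids = [(a / 10, b / 10, c / 10, (10 - a - b - c) / 10)
         for a in range(11)
         for b in range(11 - a)
         for c in range(11 - a - b)]
assert len(grids) == 286               # "over 250" candidate parameter sets

# One target per (34): a convex combination of the ith cumulants obtained
# under the LQG, 2CC, 3CC, 4CC baseline controls (placeholder values).
kappa_i = {"LQG": 3.10, "2CC": 2.95, "3CC": 2.90, "4CC": 2.80}
theta = (0.1, 0.0, 0.6, 0.3)           # the best performer found in Sect. 6
assert theta in grids
target = sum(w * v for w, v in zip(theta, kappa_i.values()))
assert abs(sum(theta) - 1.0) < 1e-12
```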
Of course, κ̃_{i,LQG}(α) refers to the ith cost cumulant resultant from numerical integration of (8) under the baseline LQG control. Above, κ̃_{i,2CC}(α) refers to the ith cost
cumulant (7) that results from integration of (8) under the control (33) with parameters (μ1, μ2, μ3, μ4) = (1.0, 1.0 × 10^{-5}, 0.0, 0.0). Analogously, let κ̃_{i,3CC}(α) and κ̃_{i,4CC}(α) refer to the ith cost cumulant (7) that results from integration of (8) under the control (33) with parameters (μ1, μ2, μ3, μ4) = (1.0, 1.0 × 10^{-5}, 9.0 × 10^{-12}, 0.0) and (μ1, μ2, μ3, μ4) = (1.0, 1.0 × 10^{-5}, 9.0 × 10^{-12}, 2.0 × 10^{-20}), respectively. These parameter selections for the controllers stem from those Pham used in [3]. The perturbation constants E* = 0_{10×10} and ε* = 1.0 × 10^{-9} are chosen arbitrarily. For the MCCDS performance index φ(·) = g(·), consider a "hybrid" version of the KLD that has been derived using the techniques described in Lin and Saito [4], and is shown below:
\[
g\bigl(\kappa(t_{0}), \tilde{\kappa}(t_{0})\bigr)
= \frac{1}{\tilde{\kappa}_{2}^{4}(t_{0})}\Bigl[\tfrac{1}{12}\kappa_{3}^{2}(t_{0}) + \tfrac{1}{12}\tilde{\kappa}_{3}^{2}(t_{0}) - \tfrac{1}{6}\kappa_{3}(t_{0})\tilde{\kappa}_{3}(t_{0})\Bigr]
+ \frac{1}{\tilde{\kappa}_{2}^{5}(t_{0})}\Bigl[\tfrac{1}{48}\kappa_{4}^{2}(t_{0}) + \tfrac{1}{48}\tilde{\kappa}_{4}^{2}(t_{0}) - \tfrac{1}{24}\kappa_{4}(t_{0})\tilde{\kappa}_{4}(t_{0})\Bigr]
\]
\[
+ \frac{1}{\tilde{\kappa}_{2}^{6}(t_{0})}\Bigl[\tfrac{1}{4}\kappa_{3}(t_{0})\tilde{\kappa}_{3}(t_{0})\tilde{\kappa}_{4}(t_{0}) - \tfrac{1}{4}\tilde{\kappa}_{3}^{2}(t_{0})\tilde{\kappa}_{4}(t_{0}) + \tfrac{1}{8}\tilde{\kappa}_{3}^{2}(t_{0})\kappa_{4}(t_{0}) - \tfrac{1}{8}\kappa_{3}^{2}(t_{0})\kappa_{4}(t_{0})\Bigr]
\]
\[
+ \frac{1}{\tilde{\kappa}_{2}^{7}(t_{0})}\Bigl[\tfrac{707}{1296}\tilde{\kappa}_{3}^{4}(t_{0}) + \tfrac{329}{1296}\kappa_{3}^{4}(t_{0}) - \tfrac{7}{12}\kappa_{3}(t_{0})\tilde{\kappa}_{3}^{3}(t_{0}) - \tfrac{35}{162}\kappa_{3}^{2}(t_{0})\tilde{\kappa}_{3}^{2}(t_{0})\Bigr]
\]
\[
+ \frac{1}{2}\biggl[\frac{\kappa_{2}(t_{0})}{\tilde{\kappa}_{2}(t_{0})} - 1 - \log\frac{\kappa_{2}(t_{0})}{\tilde{\kappa}_{2}(t_{0})} + \frac{(\kappa_{1}(t_{0}) - \tilde{\kappa}_{1}(t_{0}))^{2}}{\tilde{\kappa}_{2}(t_{0})}\biggr]. \tag{35}
\]
It is assumed that (35) is convex, since it stems from the KLD and the KLD is convex. Under the assumption of convexity, the function g(·) is suitable for use in an MCCDS optimization, since the conditions g_{κ̃(t0)}(κ̃(t0)) = 0 and ∇_{κ(t0)} g_{κ̃(t0)}(κ̃(t0)) = 0_{1×r} can be verified.

We now use (35) to compute the controller gain (20) for each target (34) and make a comparison among more than 250 stabilizing MCCDS controllers, considering parameter sets {θ_j}, 1 ≤ j ≤ 4, in which each θ_j is a multiple of the increment Δθ = 0.1. Relative to the 4CC control (20) with parameters (μ1, μ2, μ3, μ4) = (1.0, 1.0 × 10^{-5}, 9.0 × 10^{-12}, 2.0 × 10^{-20}), it can be verified that the MCCDS (θ1 = 0.1, θ2 = 0.0, θ3 = 0.6, θ4 = 0.3) control has better peak response and rms response, and stays within the control limits as given in [10]. For this MCCDS control, there is a 2.11% (respectively, 6.89%) reduction in peak inter-story drift and a 2.24% (respectively, 8.39%) reduction in peak acceleration for the Hachinohe (respectively, El Centro) earthquake. As for the K-T spectrum case, J1 and J2 are lowered by 7.60% and 6.64%, respectively.

Table 1 shows data for four additional MCCDS (θ1, θ2, θ3, θ4) controls that reduce J1, J6 and J2, J7 more than 4CC does while meeting the established control constraints (see Table 2). Indicated is the percentage change to Ji, 1 ≤ i ≤ 10, achieved with respect to 4CC.

Table 1  Control perf./costs, top-five high-performing MCCDS (θ1, θ2, θ3, θ4) controls

θ1   θ2   θ3   θ4    J1     J2     J3    J4    J5    J6     J7     J8     J9     J10
0.1  0.0  0.6  0.3  −7.60  −6.64  8.85  8.55  7.45  −2.11  −2.24  17.51  14.65  7.62
0.1  0.0  0.3  0.6  −7.61  −6.64  8.87  8.56  7.45  −2.10  −2.15  17.58  14.70  8.82
0.3  0.0  0.0  0.7  −6.08  −5.09  6.90  6.73  5.39  −1.65  −1.56  13.68  11.92  5.64
0.5  0.1  0.3  0.1  −5.69  −5.04  6.51  6.27  5.55  −1.54  −1.54  12.70  10.70  6.41
0.4  0.1  0.4  0.1  −5.46  −4.70  6.19  6.00  5.02  −1.46  −1.41  12.16  10.49  6.05

Table 2  Control limits, top-five high-performing MCCDS (θ1, θ2, θ3, θ4) controls

θ1   θ2   θ3   θ4   max_t u  max_t x_m  max_t a_m  σ_u     σ_{x_m}  σ_{a_m}
0.1  0.0  0.6  0.3  1.2656   4.5626     5.9733     0.2901  1.1340   1.5215
0.1  0.0  0.3  0.6  1.2656   4.5634     5.9377     0.2902  1.1341   1.5216
0.3  0.0  0.0  0.7  1.2422   4.4756     5.8238     0.2846  1.1136   1.4924
0.5  0.1  0.3  0.1  1.2319   4.4444     5.8418     0.2827  1.1095   1.4947
0.4  0.1  0.4  0.1  1.2305   4.4335     5.8194     0.2821  1.1062   1.4872

This family of controls is rich in tradeoffs between rms/peak performance and controller implementation costs. Our experiment illustrates that, given the statistical characterizations of the costs resulting from stabilizing linear state-feedback compensations (here the LQG, 2CC, 3CC, and 4CC controls), it is possible to derive controls for alternative statistical profiles of the random cost J and to investigate the resulting closed-loop system behavior.
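The two conditions cited above for the hybrid KLD, namely g_{κ̃(t0)}(κ̃(t0)) = 0 and a vanishing gradient at the target, can be checked numerically. The coefficients below follow our reading of the printed form of (35), which is badly typeset in the original, and the cumulant values are arbitrary test inputs:

```python
from math import log

def g_hybrid(k, kt):
    """Hybrid-KLD index per our reading of (35); k = (k1..k4), kt = targets."""
    k1, k2, k3, k4 = k
    t1, t2, t3, t4 = kt
    return ((k3**2 / 12 + t3**2 / 12 - k3 * t3 / 6) / t2**4
            + (k4**2 / 48 + t4**2 / 48 - k4 * t4 / 24) / t2**5
            + (k3 * t3 * t4 / 4 - t3**2 * t4 / 4
               + t3**2 * k4 / 8 - k3**2 * k4 / 8) / t2**6
            + (707 / 1296 * t3**4 + 329 / 1296 * k3**4
               - 7 / 12 * k3 * t3**3 - 35 / 162 * k3**2 * t3**2) / t2**7
            + 0.5 * (k2 / t2 - 1 - log(k2 / t2) + (k1 - t1)**2 / t2))

kt = (10.0, 4.0, 1.5, 2.0)             # arbitrary target cumulants, kt2 > 0
assert abs(g_hybrid(kt, kt)) < 1e-12   # g vanishes at the target

h = 1e-6                               # central-difference gradient check
for j in range(4):
    up = list(kt); up[j] += h
    dn = list(kt); dn[j] -= h
    partial = (g_hybrid(tuple(up), kt) - g_hybrid(tuple(dn), kt)) / (2 * h)
    assert abs(partial) < 1e-6         # gradient vanishes at the target
```

Both checks pass with the coefficients as read here; note in particular that the quartic group satisfies 707/1296 + 329/1296 = 7/12 + 35/162, which is exactly what forces g to vanish at κ = κ̃.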
7 Conclusions

For the LQG framework, the MCCDS control minimizes a smooth, convex function of initial cost cumulants and target initial cost cumulants. The MCCDS control has been derived under the assumptions that the initial cost cumulants and targets are orders of magnitude larger than unity, and that an r-cumulant approximation to the target cost density exists under some exponentially stabilizing linear control law. Existence requires that the generating equations of the target cost cumulants be integrable for the aforementioned stabilizing control. Given these conditions, perturbations to the terminal conditions of the design equations for the initial cost cumulants and the target initial cost cumulants are asymptotically negligible, and the MCCDS dynamic optimization is well-posed. Our result generalizes the LQG, kCC, and RS controls when linear performance indices with zero target cost statistics are considered in the MCCDS optimization. These observations suggest that, for a problem fitting the LQG form, minimization of a general quadratic form of cost cumulants will yield an optimal controller bearing the structure shared by all the optimal linear controls and possessing a unique dependence on the cost cumulant-generating equations.
The MCCDS control theory has been applied to the non-trivial first-generation benchmark problem. Using STS and parametric targets formed by taking convex combinations of cost cumulants resultant from nominal 2CC, 3CC, and 4CC controls and also the baseline LQG controller, a family of over 250 stabilizing MCCDS controllers is computed, each minimizing a hybrid variant of the KLD probability distance measure. Of these, a control is identified that outperforms a nominal 4CC control that previously compared well with LQG, multi-objective, covariance, probabilistic, and fuzzy controls for the first-generation benchmark [3]. The MCCDS control paradigm enables the control designer to translate the approximate shape of a target cost density into a linear control law, and this benefits control algorithm development tremendously. As demonstrated in this work and [1], STS facilitates the deliberate specification of target cost cumulants that encapsulate both performance objectives and desirable closed-loop behavior of the system through iterative MCCDS controller computations. These early results on MCCDS control are very encouraging, and show promise that the continued development of the theory will make the paradigm itself a valuable contribution to the family of existing stochastic optimal controls for the LQG framework.
References

1. Zyskowski, M.J., Sain, M.K., Diersing, R.W.: Maximum Bhattacharyya coefficient, cost density-shaping: a new cumulant-based control paradigm with applications to seismic protection. In: 5th World Conference on Structural Control and Monitoring, 5WCSCM-10402 (2010)
2. Mou, L., Liberty, S.R., Pham, K.D., Sain, M.K.: Linear cumulant control and its relationship to risk-sensitive control. In: Proceedings of the 38th Annual Allerton Conference on Communication, Control, and Computing, pp. 422–430 (2000)
3. Pham, K.D., Sain, M.K., Liberty, S.R.: Cost cumulant control: state-feedback, finite-horizon paradigm with application to seismic protection. J. Optim. Theory Appl. 115(3), 685–710 (2002)
4. Lin, J.J., Saito, N., Levine, R.A.: On approximation of the Kullback–Leibler information by Edgeworth expansion. Technical report, Dept. of Statistics, University of California-Davis (2001)
5. Sain, M.K., Liberty, S.R.: Performance-measure densities for a class of LQG control systems. IEEE Trans. Autom. Control AC-16(5), 431–439 (1971)
6. Fleming, W.H., Rishel, R.W.: Deterministic and Stochastic Optimal Control. Springer, Berlin (1975)
7. Liberty, S.R., Hartwig, R.C.: Design-performance-measure statistics for stochastic linear control systems. IEEE Trans. Autom. Control AC-23(6), 1085–1090 (1978)
8. Liberty, S.R., Hartwig, R.C.: On the essential quadratic nature of LQG control-performance measure cumulant. Inf. Control 32(3), 276–305 (1976)
9. Zyskowski, M.J.: Cost density-shaping for stochastic optimal control. Ph.D. thesis, University of Notre Dame (2010)
10. Spencer, B.F., Dyke, S., Deoskar, H.: Benchmark problems in structural control: Part I—Active mass driver system. Earthquake Eng. Struct. Dyn. 27, 1127–1139 (1998)