Math Meth Oper Res (2013) 77:15–32 DOI 10.1007/s00186-012-0413-6 ORIGINAL ARTICLE
On the construction of component importance measures for semi-Markov systems Mario Hellmich · Heinz-Peter Berg
Received: 26 January 2012 / Accepted: 28 September 2012 / Published online: 10 October 2012 © Springer-Verlag Berlin Heidelberg 2012
Abstract In this paper we consider semi-Markov reliability models of systems with discrete state space in a setup general enough to cover systems with maintenance and repair. The systems are assumed to consist of several components which can either be up or down in each state. In this framework we propose two different types of component importance measures which are based on transition rates and interval availability, respectively. For these importance measures we study both the time-dependent and the steady state situation, and express them in terms of quantities easily calculated from the building blocks of the semi-Markov process. Keywords Multicomponent system · Reliability · Importance measure · Semi-Markov process 1 Introduction In the analysis of complex multicomponent systems, component importance measures play a prominent role to identify those components which are most detrimental to system performance. System performance may have various interpretations depending on the nature of the system and its mission, such as safety, reliability, maintainability, availability, etc. Moreover, importance measures are used in various stages of the system life cycle for various purposes: during system design, for optimization of operation or for regulatory purposes in the context of safety reviews or risk informed decision
M. Hellmich (B) · H.-P. Berg Bundesamt für Strahlenschutz (Federal Office for Radiation Protection), Willy-Brandt-Straße 5, 38226 Salzgitter, Germany e-mail:
[email protected] H.-P. Berg e-mail:
[email protected]
123
16
M. Hellmich, H.-P. Berg
making. Accordingly one cannot expect to find one importance measure that is universally best for all needs, and not surprisingly a large number of such measures are in use in the field of probabilistic safety analysis, see Aven and Nøkland (2010), Cheok et al. (1998), and Boland and El-Neweihi (1995) for an overview. Efficient algorithms for their computation are available as well (Dutuit and Rauzy 2001). A common feature of these measures is that they are defined and studied in the framework of binary coherent systems, or in engineering terms, in the context of fault tree analysis. For binary coherent systems with independent components there is by now a copious and mathematically satisfying theory of importance measures, which has been recently generalized to include systems with repairable as well as multistate components, see the references provided in Sect. 2.2. An alternative and in some (though not in all) aspects more general way to model systems with component repair and maintenance is by means of a Markov or semiMarkov process. One of the earliest references to the use of semi-Markov models in reliability seems to be Barlow (1960), see also Chapter 5 of Barlow and Proschan (1965), as well as Korolyuk and Tomusyak (1965). A large class of semi-Markov models in reliability analysis employs a partitioning of the state space in two subsets of up (working) and down (failure) states (Csenki 1994, 1995; Limnios 1997; Limnios and Opri¸san 2001; Ouhbi and Limnios 2002; Limnios 2011), and we will adhere to this setup in the present work. Applications of semi-Markov models include areas as diverse as nuclear power plant safety and reliability (Jung and Cho 1995; Vulpe and Carausu 2005) and telecommunication network dependability (Gupta and Dharmaraja 2011). However, the problem of constructing and studying component importance measures in the semi-Markov framework remains open. In Do Van and Barros (2008), and Do Van et al. (2008, 2010) importance measures for Markov systems in the time dependent and steady state case were constructed and explicitly calculated for some examples by a perturbation method, but for semi-Markov systems importance measures have never been constructed or investigated. In view of the significance semi-Markov models have gained in reliability modeling it is crucial to have component importance measures in this framework at one’s disposal as well. Therefore it is our purpose to propose and study different kinds of importance measures which will hopefully turn out be useful for practical applications in reliability and safety analysis. Our importance measures are based on an entirely different approach than those in Do Van and Barros (2008), and Do Van et al. (2008, 2010). Our mathematical setup rests on the following assumptions: The multicomponent systems are supposed to have a time evolution given by a homogeneous semi-Markov process on a finite state space. It is assumed that in all states the functioning or failure of each component is specified. Moreover, the conventional partitioning of the state space in two disjoint sets, corresponding to the functioning or failure of the system, is assumed. An example of such a model appears in Csenki (1995). In this framework we obtain results which express our importance measures in terms of quantities easily determined numerically from the building blocks of the underlying semi-Markov process, such as transition rates, mean recurrence times, mean sojourn times, etc. The paper is organized as follows. In Sect. 2.1 we introduce some basic notions concerning semi-Markov and Markov renewal processes, along with the notation and terminology used throughout the paper. In order to make contact to the framework of
123
Semi-Markov systems
17
binary coherent systems, Sect. 2.2 provides the most basic notions concerning them. In the next two sections we propose two types of component importance measures. The basic measure in Sect. 3 is defined as the expected rate of system failures caused by (i.e. associated with) the failure of a specific component at a particular instant. The corresponding normalized version of this measure is shown to be equal to the probability that system failure at a particular instant is associated with the failure of the component, given that system failure occurs. The measures defined in Sect. 4 rely on the notion of interval availability, i.e. they are equal to the expected fraction of time the semi-Markov process spends in certain subsets of the state space. All measures are studied in both the time dependent and steady state situation. The final Sect. 5 provides some discussion about the importance measures proposed. 2 Mathematical setup and assumptions In the following we introduce our mathematical setup and necessary notations as well as some frequently used results concerning semi-Markov processes. General references for the theory of semi-Markov and Markov renewal processes are Korolyuk et al. (1975), Çinlar (1969), and Limnios and Opri¸san (2001). 2.1 Multicomponent systems and semi-Markov processes We consider some repairable or maintained system with a finite state space E = {1, . . . , d}. The system is assumed to consist of n components and the set of components is denoted by C = {1, . . . , n}. We suppose that in each system state i ∈ E a component a ∈ C can either be up (functioning) or down (failed, undergoing maintenance or repair, etc.). The up or down state of each component is assumed to be prescribed by maps ca : E −→ {0, 1} such that ca (i) =
0 : component a is down in state i , i ∈ E, a ∈ C . 1 : component a is up in state i
(1)
Moreover, we suppose that there are two distinguished subsets U, D ⊆ E with E = U ∪ D, U ∩ D = ∅, with the interpretation that U contains the states in which the system is up and D the states of system failure. In the following we shall frequently need the sets of states for which the system as well as some component a ∈ C are up and down, respectively, so we introduce the following notation: Ua = U ∩ {i ∈ E : ca (i) = 1}, Da = D ∩ {i ∈ E : ca (i) = 0}.
(2)
The time evolution is assumed to be given by a homogeneous semi-Markov process Z = {Z (t)}t≥0 with state space E. In the following we introduce the necessary terminology to describe this process. We start from some complete probability space (, F , P). In order to define a semi-Markov process we introduce the
123
18
M. Hellmich, H.-P. Berg
notion of an (homogeneous) Markov renewal process (MRP), which is a sequence {(Jn , Sn )}n∈N0 of random variables defined on (, F , P), with values in E × R+ , such that S0 ≤ S1 ≤ · · ·. Moreover, we assume that it satisfies the following Markov property P{Jn+1 = j, Sn+1 ≤ t|J0 , . . . , Jn , S0 , . . . , Sn } = P{Jn+1 = j, Sn+1 ≤ t|Jn , Sn } =: Q Jn j (t − Sn ), (3) where t ≥ 0, j ∈ E, n ∈ N0 , and Q i j , i, j ∈ E, is the semi-Markov kernel associated with the MRP. For each i, j ∈ E, the function t → Q i j (t) is nondecreasing and right-continuous, and Q i j (0) = 0. Moreover, for pi j := Q i j (∞) = limt→∞ Q i j (t) we have j∈E pi j = 1, thus it follows that a bounded measure Q i j (dt) on (R+ , B + ) is uniquely determined. In the following we shall use the matrix notation Q(t) = [Q i j (t)] and P = [ pi j ]. Conversely, it can be shown that for a matrix Q(t) with the above properties there exists a probability space and a sequence of random variables {(Jn , Sn )}n∈N0 such that (3) and P{J0 = i, S0 = s} = 1 for given i ∈ E, s ≥ 0, hold true. Upon introducing the random variables X 0 = S0 , X n = Sn − Sn−1 , n ∈ N, we can write (3) in the form Q i j (t) = P{Jn = j, X n ≤ t|Jn−1 = i}, i, j ∈ E, t ≥ 0, n ∈ N,
(4)
independently of n. The semi-Markov process corresponding to the MRP is defined by Z (t) = Jn if Sn ≤ t + S0 < Sn+1 , t ≥ 0, n ∈ N0 , or equivalently Z (t) = J N (t) , t ≥ 0, where
N (t) =
(5)
sup{n ∈ N : X 1 + · · · + X n ≤ t} : X 1 ≤ t 0 : X1 > t
counts the number of jumps of the MRP. This means that Z starts in state J0 and is observed only after the random time S0 ; in the following we shall always take S0 = 0. Occasionally we write Pi {·} as an abbreviation for P{·|J0 = i}. The initial distribution of the MRP will be denoted by pi := P{J0 = i}, i ∈ E. We further introduce the notation Fi j (t) = Q i j (t)/ pi j if pi j = 0 and Fi j (t) = 0 if pi j = 0; it can easily be shown that Fi j is the distribution of the sojourn time in state i when the next transition is to state j. It follows that the distribution of the sojourn time in state i is given by Q i j (t) = pi j Fi j (t), (6) Pi (t) = P{X n ≤ t|Jn−1 = i} = j∈E
j∈E
the mean ∞ sojourn time, i.e. the first moment of Pi , is denoted by m i , i.e. m i = 0 t Pi (dt). The n-fold matrix convolution power of the semi-Markov kernel Q is defined by (0) (1) Q i j (t) = δi j 1{t≥0} , Q i j (t) = Q i j (t), and
123
Semi-Markov systems
19
Q i(n) j (t)
= 1{t≥0}
t
Q ik (ds)Q (n−1) (t − s) kj
k∈E 0
= 1{t≥0}
t (n−1)
Q ik
(7) (ds)Q k j (t − s)
k∈E 0
= Pi {Jn = j, Sn ≤ t}, j
j
where δi j denotes the Kronecker symbol. Let S1 , S2 , . . . be the recurrence times of the j j state j ∈ E, then the random variables Sn+1 − Sn , n ∈ N, are i. i. d. with distribution j
denoted by G j j . Moreover, the distribution of S1 conditional on {J0 = i} is denoted by G i j , and its first moment by μi j . We write N j (t) = n∈N0 1{Jn = j,Sn ≤t} for the associated counting process. We then define the Markov renewal function of Z ,
ψi j (t) = Ei (N j (t)) =
∞ n=0
Pi {Jn = j, Sn ≤ t} =
∞
Q i(n) j (t).
(8)
n=0
The matrix ψ(t) = [ψi j (t)] is called the Markov renewal matrix, it satisfies a Markov renewal equation ψ(t) = 1{t≥0} I + (Q ψ)(t), t ≥ 0, where I is the unit matrix and the matrix convolution as in (7). It can be shown that J = {Jn }n∈N0 is a Markov chain, the so-called embedded Markov chain of the MRP, and P is its transition matrix. Recall that the semi-Markov process Z is irreducible if and only if J is irreducible. Similarly, a state i of Z is recurrent (i.e. G ii (∞) = 1) if and only if it is recurrent for J . A state i recurrent for Z is called positive recurrent if μii < ∞, otherwise it is called null recurrent. Recall that i is positive recurrent for an irreducible semi-Markov process if and only if it is recurrent for J and m i < ∞. The transition function of the semi-Markov process Z at time t is defined by Pi j (t) = P{Z (t) = j|J0 = i}. In the following we will be interested in the limiting behavior of Pi j (t) as t → ∞. To this end we quote the following result (Nollau 1980; Limnios and Opri¸san 2001) which gives sufficient conditions for a semi-Markov process to approach a steady state: Proposition 1 If Z is irreducible and positive recurrent, then lim Pi j (t) =
t→∞
m jπj =: ω j , i, j ∈ E, k∈E m k πk
(9)
of the embedded Markov chain J Here πi , i ∈ E, is the steady state distribution (which exists in this case), i.e. πi ≥ 0, i∈E πi = 1, and the vector π = [πi ] is a left eigenvector of P, π P = π . Notice that the limit distribution (9) is independent of the initial state.
123
20
M. Hellmich, H.-P. Berg
2.2 Binary coherent systems The traditional framework for discussing the reliability of complex multicomponent systems is the theory of binary coherent systems, which was introduced in Birnbaum et al. (1961), see Barlow and Proschan (1965, 1975) for textbook accounts. A number of different component importance measures have been extensively studied in this framework, the most important of them are connected with the names of Birnbaum (1969), Barlow–Proschan (1975) and Natvig (1979, 1982, 1985); they have been generalized to repairable binary coherent as well as multistate strongly coherent systems (Natvig and Gåsemyr 2009; Natvig 2011). For further importance measures see also the reviews (Boland and El-Neweihi 1995; Huseby 2004). In order to exhibit the relation of our mathematical setup to the theory of binary coherent systems we shall introduce some relevant notions here. Again we consider some system consisting of n components C = {1, . . . , n}, and assume that each component can either be in a working or failed state. The state space of the system is taken to be E = {0, 1}×n , with the interpretation that (i 1 , . . . , i n ) ∈ E means that component a ∈ C is working if i a = 1 and failed if i a = 0, thus ca (i 1 , . . . , i n ) = i a . A structure function is a map φ : E −→ {0, 1} such that the following properties are satisfied: 1. φ(0, . . . , 0) = 0 and φ(1, . . . , 1) = 1. 2. φ(i 1 , . . . , i n ) ≤ φ( j1 , . . . , jn ) if i k ≤ jk for all k = 1, . . . , n. 3. No component is irrelevant to the system, i.e. for any a ∈ C there exists some i ∈ E such that φ(1a , i) = φ(0a , i). Here we have employed the usual notation (1a , i) = (i 1 , . . . , i a−1 , 1, i a+1 , . . . , i n ) if i = (i 1 , . . . , i n ) and (0a , i) = (i 1 , . . . , i a−1 , 0, i a+1 , . . . , i n ). The functioning or failure of the system is determined by the structure function φ by way of U = {i ∈ E : φ(i) = 1} and D = {i ∈ E : φ(i) = 0}. Let us now turn to the time evolution of the system. We assume that it is given by some stochastic process Z = {Z (t)}t≥0 with values in E, where we write Z (t) = (χ1 (t), . . . , χn (t)), hence the process χa (t) ∈ {0, 1} indicates the condition of component a at time t. In general, the process Z will not be semi-Markov, even if we assume the components and hence χ1 , . . . , χn to be independent. This is due to the regenerative property of semi-Markov processes which excludes component life distributions with aging or improvement (i.e. with nonconstant failure rate), because the component ages are set to zero after each jump of Z . An example of a semi-Markov evolution is the case of independent components with constant failure rate. Specifically, let us assume that component a has an exponential life distribution Fa (t) = (1 − e−λa t )1{t≥0} with failure rate λa , and that the component is undergoing repair after failure with a repair time distribution G a (t) = (1 − e−μa t )1{t≥0} , where μa is the repair rate. Then Z is a semi-Markov process with semi-Markov kernel Q (1a ,i)(0a ,i) (t) = Q (0a ,i)(1a ,i) (t) =
λa (δib 1 λb + δib 0 μb ) b∈C μa
b∈C
123
(δib 1 λb + δib 0 μb )
(1 − e−λa t ), (1 − e−μa t ),
t ≥ 0,
(10)
Semi-Markov systems
21
for any i ∈ E, where i = (i 1 , . . . , i n ), and with all other entries of Q equal to 0 (in particular, not more than one component fails at a time). It is well known that a semi-Markov process with a kernel of the above form is a Markov jump process. 3 Importance based on transition rates In this section we define a component importance measure which is based on transition rates. Let i, j ∈ E be two states and consider the following three conditions: C1 On transiting from i to j the system fails, i.e. i ∈ U and j ∈ D. C2 On transiting from i to j the component a fails, i.e. ca (i) − ca ( j) = 1. C3 On transiting from i to j no components other than a fail, i.e. cb (i) − cb ( j) = 0 for all b ∈ C \{a}. We say that a component a ∈ C is critical for the transition from i to j if conditions C1–C3 are satisfied. We regard condition C3 as optional, it may be considered dropped in certain applications as discussed below. The set of all components which are critical for the transition from i to j will be denoted by Ci j . Clearly, in the presence of condition C3 it consists of at most one component. We now introduce a counting process Na (t) =
∞
1{a∈C Jn−1 Jn ,Sn ≤t} = #{n ∈ N : a ∈ C Jn−1 Jn , Sn ≤ t},
(11)
n=1
which counts the number of transitions until t for which component a is critical, i.e. the number of system failures coinciding with the failure of component a, but of no other component. In order to arrive at an importance measure one may try to adapt the idea behind the Barlow–Proschan importance measure (Barlow and Proschan 1975) for binary coherent systems with independent components, which is constructed by means of the probability that system failure is caused by (i.e. is associated with) the failure of component a, given that the system fails at time t. In the present approach this probability can be written as P{Na (t) = 1|τ D = t}, where τ D is the hitting time of D. However, since the semi-Markov framework is capable of dealing with repairable and maintained systems as well, the restriction to {Na (t) = 1} is unnatural. Instead we propose to define a time-dependent component importance measure by Ir (a, t) =
d E(Na (t)), t ≥ 0, a ∈ C , dt
provided the derivative exists, or equivalently IE(Na (t)) = we define a normalized form of Ir (a, t) by
t
0 Ir (a, s) ds.
(12) Moreover,
123
22
M. Hellmich, H.-P. Berg
Ir∗ (a, t) =
Ir (a, t) , t ≥ 0, a ∈ C , b∈C Ir (b, t)
(13)
with the property that a∈C Ir∗ (a, t) = 1 for all t ≥ 0. Thus Ir (a, t) is the expected rate of transitions at time t for which component a is critical. The index r indicates the relation of this importance measure to this transition rate. In other words, Ir (a, t)dt is the probability that within the time interval (t, t + dt] system failure occurs together with failure of component a, but of no other component, i.e. that system failure is “caused” by the failure of a (more precisely associated with the failure of a). Moreover, provided that system failure always coincides with the failure of precisely one component, b∈C Ir (b, t)dt is the probability of system failure during (t, t + dt]. For example, this is the case for a binary coherent system as defined in Sect. 2.2 if the probability for the simultaneous failure of two or more components is zero. Under this premise definition (13) implies that Ir∗ (a, t) can be interpreted as the probability that a caused system failure, given that the system fails at time t. Notice that in the present setup we cannot attribute in general a causal relationship between the system failure and that of a, as in the case of binary coherent systems. However, we can contend that in realistic models the association of a component with system failure has a bearing on its importance; this is the underlying idea of the importance measure (12). The notion of component criticality as defined by conditions C1– C3 should be subjected to critical scrutiny in concrete applications. For example, the underlying notion of association of system and component failure makes the importance measure blind for, e.g., common cause failures, which take out more than one component at a time. If this is not tolerable and common cause failures are to be taken into account by the importance measure, condition C3 may be dropped from the list. After having introduced the component importance measure our present goal is to show that the derivative in (12) exists under appropriate conditions and to prove a formula which expresses Ir (a, t) in terms of the building blocks of Z . To this end we use a method which closely parallels the one introduced in Ouhbi and Limnios (2002). We start with a lemma which is a result implicit in the proof of Theorem 1 of Ouhbi and Limnios (2002). Lemma 2 Suppose that Q i j (dt) = qi j (t)dt for all i, j ∈ E, and that lim qi j (t) = 0.
t→0
(14)
Then it follows that ∞
kP{Na (t + h) − Na (t) = k} = o(h)
k=2
as h ↓ 0 for all t ≥ 0. Proof We fix some k ∈ N and h, t ≥ 0, then we have P {Na (t + h) − Na (t) = k} ≤ P{N (t + h) − N (t) ≥ k} ≤ P{S N (t)+k − S N (t)+1 ≤ h}
123
(15)
Semi-Markov systems
=
23
P{J N (t)+1 = i}P{J N (t)+k = j, S N (t)+k − S N (t)+1 ≤ h|J N (t)+1 = i}
i, j∈E
≤
(k−1)
Qi j
(h) ≤ d(dhq(h))k−1
i, j∈E (n)
where we have used d = #E together with Q i j (t) ≤ d n−1 q(t)n t n , where q(t) = sup{qi j (s) : i, j ∈ E, 0 ≤ s ≤ t}, which may be established by induction using (7). Thus we conclude that ∞ ∞ kP{Na (t + h) − Na (t) = k} ≤ d k(dhq(h))k−1 k=2
k=2
≤d
1 −1 . (1 − dhq(h))2
Since (14) implies q(h) → 0 as h ↓ 0 it can be shown that ∞ 1 lim kP{Na (t + h) − Na (t) = k} = 0 h↓0 h k=2
as required.
The next lemma is a special case of a result contained in Limnios (2011), for the reader’s convenience we provide a proof here. Lemma 3 Assuming Q i j (dt) = qi j (t)dt then for any pair of states i, j ∈ E we have 1 lim P{Z (t) = i, Z (t + h) = j} = pk h→0 h k∈E
t ψki (ds)qi j (t − s).
(16)
0
Proof Fix some k ∈ E. Then for h ↓ 0 we have ∞ t Pk {Z (t) = i, Z (t + h) = j} = Pk {Z (t) = i, Sn ∈ ds, N (t) = n, Z (t + h) = j} n=0 0
=
∞
t
Pk {Jn = i, Sn ∈ ds}
n=0 0
Pk {N (t) = n, Z (t + h) = j|Jn = i, Sn = s} =
∞ t
Q (n) ki (ds)Q i j ((t − s, t − s + h])
n=0 0 t
=
ψki (ds)Q i j ((t − s, t − s + h]). 0
The assertion now follows from the dominated convergence theorem.
123
24
M. Hellmich, H.-P. Berg
We can now establish an explicit formula for Ir (a, t). Theorem 4 Assume that Q i j (dt) = qi j (t)dt and that (14) is satisfied. Then we have
Ir (a, t) =
t 1{a∈Ci j } pk
i, j,k∈E
ψki (ds)qi j (t − s), t ≥ 0, a ∈ C .
(17)
0
Proof By the definition (12) of Ir (a, t) we have 1 Ir (a, t) = lim E(Na (t + h) − Na (t)) h↓0 h
∞ 1 kP{Na (t + h) − Na (t) = k} = lim P{Na (t + h) − Na (t) = 1} + h↓0 h k=2
1 = lim P{Na (t + h) − Na (t) = 1}, h↓0 h where we have used Lemma 2. Furthermore, we calculate P{Na (t+h) − Na (t) = 1} = P{Na (t+h) − Na (t) = 1, N (t+h) − N (t) = 1}+o(h) = 1{a∈Ci j } P{Z (t + h) = j, Z (t) = i} + o(h), i, j∈E
where an argument similar to Lemma 2 was used. Substituting this result in the previous equation and applying Lemma 3 yields the assertion. Notice that if a is critical for the transition from i to j, but pi j = 0, then it follows that qi j = 0 a. e., hence this transition does not figure in (17). With the help of (17) we may directly calculate Ir (a, t), provided the Markov renewal matrix ψ is known explicitly. Since in most situations of practical interest a semi-Markov process is given in terms of its semi-Markov kernel and its initial distribution, an additional effort is required to obtain ψ. One possibility is to exploit the fact that ψ satisfies a Markov renewal equation and use standard methods of solution, e.g. in terms of Laplace transforms. However, we can expect a considerable simplification if the process approaches a steady state. This is exemplified by the next corollary, which is similar to a result contained in Ouhbi and Limnios (2002). Corollary 5 Suppose the assumptions of Theorem 4 are satisfied. Moreover, suppose that Z is irreducible and recurrent, and that qi j , i, j ∈ E, is direct Riemann integrable. Then pi j 1{a∈Ci j } , (18) Irst (a) := lim Ir (a, t) = t→∞ μii i, j∈E
for any a ∈ C , where μii is the mean recurrence time of state i.
123
Semi-Markov systems
25
Proof We first notice that Z is aperiodic, i.e. G ii is nonarithmetic for all i ∈ E (i.e. not concentrated on {nδ : n ∈ N} for some δ > 0). Assume for a contradiction that G ii is arithmetic with period δ. Then by irreducibility G i j are arithmetic with (i j) the same period for all i, j ∈ E, i.e. of the form G i j (t) = ∞ n=1 pn 1{nδ≤t} , with (n) ∞ some pi(n) n=1 pi j = 1. Then we calculate j ≥ 0 such that
(G i j ∗ Q k )(t) =
∞
(i j)
t−nδ
pn 1{nδ≤t}
n=1
qk (s) ds, k, ∈ E, 0
and by uniform convergence we see that t → (G i j ∗ Q k )(t) is continuous. Using the connection between G = [G i j ] and Q (see e.g. Pyke 1961), G i j (t) =
(G j ∗ Q i )(t) + [(1 − G j j ) ∗ Q i j ](t), i, j ∈ E, t ≥ 0,
∈E
we see that t → G i j (t) is continuous. This gives a contradiction, and we conclude that Z is aperiodic. Thus the key renewal theorem for MRPs (Limnios and Opri¸san 2001) applies and we obtain
lim Ir (a, t) =
t→∞
i, j,k∈E
pk 1{a∈Ci j } μii
∞ qi j (s) ds, 0
and (18) follows.
The assumptions of Theorem 4 and Corollary 5, in particular (14), may be too restrictive for certain applications. However, in the steady state case, we can prove a result similar to Corollary 5 under weaker assumptions. Recall that a function f : [0, ∞) −→ R is called asymptotically differentiable at +∞ with asymptotic derivative L if limt→∞ f (t)/t = L exists. Then if f is differentiable with derivative f , it follows by l’Hôpital’s rule that L = limt→∞ f (t), provided that this limit exists. The following result shows that the derivative in (12) exists asymptotically under weaker conditions, and thus generalizes Corollary 5. Proposition 6 Suppose that Z is irreducible. Then, for any a ∈ C , the function t → E(Na (t)) is asymptotically differentiable at +∞ with asymptotic derivative still given by the right hand side of (18). We will denote this asymptotic derivative by Irst (a) as well, i.e. pi j E(Na (t)) = 1{a∈Ci j } , a ∈ C, t→∞ t μii
Irst (a) := lim
(19)
i, j∈E
in this case.
123
26
M. Hellmich, H.-P. Berg (n)
Proof Let i, j ∈ E and denote by Si j the time of the n-th transition from i to j
(the so-called n-th i, j-renewal), and put Si(0) j = 0. Then by the regenerative property (n)
of Z , {Si j }n∈N0 is a renewal process. Let Ni j be the associated counting process, denote by Hi j the waiting time distribution between successive i, j-renewals, i.e. Hi j (t) = P{Si(1) j ≤ t|J0 = j}, and let νi j be the expectation of Hi j . Now since Na (t) = 1{a∈Ci j } Ni j (t), i, j∈E
we obtain E(Ni j ) E(Na (t)) 1 1{a∈Ci j } 1{a∈Ci j } = lim = t→∞ t→∞ t t νi j lim
i, j∈E
i, j∈E
by the elementary renewal theorem, hence t → E(Na (t)) is asymptotically differentiable. Noticing that νi j = μii / pi j (by combining equations (4.40) and (4.51) from Störmer 1970) and substituting in the last equation concludes the proof. In this way we have found a time independent importance measure. To calculate Irst (a) explicitly from (18) or (19) we have to determine the mean recurrence times μii , i ∈ E. They may be obtained as the unique bounded solution of the following system of linear equations, see Limnios and Opri¸san (2001) for proofs: μi j = m i + pik μk j , (20) k∈E k = j
or alternatively from μii =
1 πjm j, πi
(21)
j∈E
where πi , i ∈ E, is the stationary distribution of the embedded Markov chain. We shall define a normalized version of Irst (a) as well by Irst (a) . st b∈C Ir (b)
Ir∗st (a) =
(22)
In the following we discuss the relation of Irst (a) to the Barlow–Proschan importance measure for binary coherent systems, which was introduced in Barlow and Proschan (1975). In Section 6 of that work, a generalization of the standard Barlow–Proschan importance measure for binary coherent systems to repairable binary coherent systems in the steady state is given. It is shown there that this generalization can be expressed as lim
t→∞
E(Na (t)) b∈C E(Nb (t))
in our notation [see also Natvig and Gåsemyr 2009, equation (11)]. We can now record the following result.
123
Semi-Markov systems
27
Proposition 7 Suppose that Z is irreducible. Then E(Na (t)) , a ∈ C. b∈C E(Nb (t))
Ir∗st (a) = lim t→∞
(23)
Proof Using (19), it follows that I st (a) E(Na (t)) E(Na (t))/t = lim = r st = Ir∗st (a), t→∞ b∈C E(Nb (t)) b∈C E(Nb (t))/t b∈C Ir (b)
lim
t→∞
which is the claim.
In particular, Proposition 7 applies to binary coherent systems with a Markov time evolution as introduced in Sect. 2.2, Eq. (10), provided λa , μa > 0 for all a ∈ C . Thus we find that for a repairable binary coherent system in the sense of Sect. 2.2, with exponential failure and repair distributions, the generalized steady state Barlow–Proschan importance measure of Barlow and Proschan (1975) coincides with Ir∗st (a). 4 Importance based on interval availability The notion of interval availability is well known in semi-Markov reliability modeling, see e.g. Limnios and Opri¸san (2001), Limnios (2011), and Rubino and Sericola (1992). It can be used in the following way to define a component importance measure: We can contend that a component a is more important than another component b if during a time interval (t, t + τ ] of length τ the system spends a larger fraction of the time τ in Da than in Db . Since the total amount of time spent in Da is given by total t+τ 1{Z (s)∈Da } ds we define, in analogy to the interval availability (sometimes called t average availability) the importance of component a as 1 Iav (a, t, t + τ ) = τ
t+τ P{Z (s) ∈ Da } ds.
(24)
t
In order to evaluate the right hand side of (24) we introduce a quantity similar to the point availability, Ai (t) = Pi {Z (t) ∈ Da }, i ∈ E, t ≥ 0, as well as the vector A(t) = [Ai (t)]. It has been shown in Limnios (2011) that A(t) satisfies the following Markov renewal equation, Ai (t) = 1{i∈Da } P¯i (t) +
t
Q i j (ds)A j (t − s),
j∈E 0
or A(t) = 1{·∈Da } P¯· (t) + (Q A)(t) in matrix notation, where P¯i (t) = 1 − Pi (t). A solution of this equation (see Limnios and Opri¸san (2001)) is given by A(t) = (ψ 1{·∈Da } P¯· )(t), or
123
28
M. Hellmich, H.-P. Berg
Ai (t) =
t
ψi j (ds)(1 − Q jk (t − s)).
j∈Da k∈E 0
In order to obtain a simple formula for the right hand side of (24) in the steady state we prove the following result, which is closely related to Limnios (2011), Proposition 3.3. Proposition 8 Suppose that Z is irreducible, recurrent and aperiodic (this is the case if Q i j (dt) = qi j (t)dt). Then st Iav (a) := lim Iav (a, 0, t) = t→∞
ωj,
(25)
j∈Da
where the notation of (9) was used. Proof Under the present assumptions the key renewal theorem applies and leads to lim Ai (t) =
t→∞
j∈Da
1 μjj
∞
P¯ j (s) ds =
mj = ωj, μjj
j∈Da
0
j∈Da
where we have used (9) and (21). Now by a basic property of the Cesàro mean we obtain ωj, lim Iav (a, 0, t) = lim Ai (t) = t→∞
as was to be shown.
t→∞
j∈Da
st (a) by I ∗st (a) = I st (a)/ st Again we define a normalized version of Iav av av b∈C Iav (b). st From (25) we see that a calculation of Iav (a) is particularly simple: it suffices to know the steady state distribution ωi , i ∈ E, which can be obtained from Proposition 1 by way of the mean sojourn times and the stationary distribution of the embedded Markov chain J . The importance measure just introduced should be used with caution. If the assost can be very misciation of component downtime with system downtime is weak, Iav leading. Indeed, consider a component a that is assumed to be always in the up state, st (a) = 0. However, if a is always in the down state, then D = D then Da = ∅ and Iav a st and in general Iav (a) = 0, even though the component is irrelevant to the system. In order to avoid this unfortunate phenomenon we propose an importance measure which is somewhat akin to the one introduced in Huseby (2004), based on system uptime. For a component a ∈ C we define
Ca = {i ∈ E : there exists some j ∈ E such that a ∈ Ci j , and pi j > 0}, i.e. Ca is the set of all states i such that component a is critical for some transition t+τ starting in i. Thus τ −1 t 1{Z (s)∈Ca } ds is the fraction of time for which component a
123
Semi-Markov systems
29
critical for some transition during the time interval (t, t + τ ]; as before the notion of criticality is based on conditions C1–C3 in Sect. 3. Now in analogy to (24) we define a component importance measure by 1 Icr (a, t, t + τ ) = τ
t+τ P{Z (s) ∈ Ca } ds.
(26)
t
If Ca = ∅ then Icr (a, t, t + τ ) = 0. Note that since Ca ⊆ Ua , this measure addresses system uptime, unlike Iav . For the importance measure based on system uptime introduced in Huseby (2004), there is a corresponding measure based on system downtime. Similarly, we can define a dual version of Icr as follows. For a ∈ C we write Ca = { j ∈ E : there exists some i ∈ E such that a ∈ Ci j , and pi j > 0}, i.e. Ca is the set of all states j such that component a is critical for some transition ending in j. We define analogously Icr (a, t, t
1 + τ) = τ
t+τ
P{Z (s) ∈ Ca } ds,
(27)
t
Since Ca ⊆ Da , this importance measure is more in line with the others defined in this work. In the steady state we obtain a result analogous to Proposition 8. Proposition 9 If Z is irreducible, recurrent and aperiodic then Icrst (a) := lim Icr (a, 0, t) = t→∞
ωj,
(28)
j∈Ca
and we have a similar formula for I st cr (a) = lim t→∞ Icr (a, 0, t), with C a replaced by Ca in (28).
Corresponding normalized versions Icr∗st (a) and I ∗st cr (a) of (28) can be defined as well. We remark that Icrst (a) is the probability that after a sufficiently large time the system is up and in a state such that component a critical for some transition. Similarly, I st cr (a) is the probability that after a sufficiently large time the system is in a state which can be reached by a transition for which component a is critical. 5 Discussion After having introduced and studied two different types of component importance measures for semi-Markov systems, both in the time-dependent and the steady state situation, we close this paper with some remarks concerning their prospective practical applications.
123
30
M. Hellmich, H.-P. Berg
We start with the importance measure based on transition rates introduced in Sect. 3. We have found that (under an additional assumption) its normalized version Ir∗ (a, t) can be interpreted as the probability that system failure is caused by component a, given that the system fails at t. From (13) it follows that component a is important with respect to this measure if the rate of occurrence of system failures associated with a is large. Hence this importance measure suggests itself in applications when the number of system failures per time is to be kept under control. Since Ir∗ (a, t) is time-dependent it is left to the analyst to decide at which points of time it is to be evaluated. This problem does not arise when its steady state version as introduced in Corollary 5 is used. The explicit expression of this measure in (18) can be understood very intuitively: in order to obtain Irst (a) we have to sum over all transitions for which a is critical, and each summand is proportional to the corresponding transition probability and inversely proportional to the mean recurrence time of the state in which the transition starts. Moreover, in Proposition 7 we have shown that Ir∗st (a) is closely related to the well known steady state Barlow–Proschan importance measure. Hence we have obtained a version of this importance measure for semi-Markov systems along with an explicit formula for it. Finally we refer to the discussion of the Barlow–Proschan importance measure for repairable systems contained in Natvig and Gåsemyr (2009) and to the objections raised against the Barlow–Proschan measure in the case of parallel and series systems. The three importance measures given in Sect. 4 along with their steady state versions rely on the notion of interval availability; both of them are defined as the fraction of time the system spends in a particular set of states during some bounded time interval. Hence these importance measures are expected to be of value in situations when the importance measure is required to reflect a component’s contribution to system unavailability. However, both Iav (a, s, t) as well as Icr (a, s, t) and Icr (a, s, t) cannot be considered entirely satisfactory. As was already explained in Sect. 4, the measure Iav (a, s, t) requires a certain amount of correlation between the failure of component a and system failure. The measures Icr (a, s, t) and Icr (a, s, t) are defined as the fraction of time the system spends in sets of states such that a is critical for transitions starting or ending in these states, however, the probabilities of these transitions are not incorporated in the definitions (26) and (27). Nevertheless, we expect that Icr (a, s, t) and Icr (a, s, t) have its merits in certain applications. In the present paper we have not said anything about the importance of groups of components, although this is an important point in some applications. As a first approximation to the solution of this problem one may resort to simply adding the importances of the components in the group: For example, for two components a, b ∈ C , we see that Ir (a, t) + Ir (b, t) is the expected rate of system failures caused by some component in the group {a, b}. However, more complicated interactions between components cannot be detected in this way. It remains a point for further research to develop importance measures for groups of components with properties like the joint reliability importance (Hong et al. 2000) in the context of binary coherent systems. In conclusion we state that importance measures for semi-Markov systems warrant further investigation. In particular, the measures provided in the present paper should be studied in concrete situations in order to evaluate their merits and deficiencies,
123
Semi-Markov systems
31
and experience thereby gained should be used to better understand their range of applicability as well as to develop improved importance measures. Acknowledgments the paper.
Thanks are due to the referees whose comments led to a significant improvement of
References Aven T, Nøkland TE (2010) On the use of uncertainty importance measures in reliability and risk analysis. Reliab Eng Syst Safety 95:127–133 Barlow RE (1960) Applications of semi-Markov processes to counter and reliability problems. Stanford University, Thesis Barlow RE, Proschan F (1965) Mathematical theory of reliability. Wiley, New York Barlow RE, Proschan F (1975) Statistical theory of reliability and life testing. Probability models. Holt, Rinehart and Winston, New York Barlow RE, Proschan F (1975) Importance of system components and fault tree events. Stoch Proc Appl 3:153–173 Birnbaum ZW, Esary JD, Saunders SC (1961) Multi-component systems and structures and their reliability. Technometrics 3:55–77 Birnbaum ZW (1969) On the importance of different components in a multicomponent system. In: Krishnaiah PR (ed) Multivariate analysis II. Academic Press, New York Boland PJ, El-Neweihi E (1995) Measures of component importance in reliability theory. Comput Oper Res 22:455–463 Cheok MC, Parry GW, Sherry RR (1998) Use of importance measures in risk-informed regulatory applications. Reliab Eng Syst Safety 60:213–226 Çinlar E (1969) Markov renewal theory. Adv Appl Prob 69:123–187 Csenki A (1994) Dependability for systems with a partitioned state space: Markov and semi-Markov theory and computational implementation. Lecture Notes in Statistics, vol 90. Springer, New York Csenki A (1995) An integral equation approach to the interval reliability of systems modelled by a finite semi-Markov process. Reliab Eng Syst Safety 47:37–45 Do Van P, Barros A (2008) Reliability importance analysis of Markovian systems at steady state using perturbation analysis. Reliab Eng Syst Safety 93:1605–1615 Do Van P, Barros A, Bérenguer C (2008) Importance measure on finite time horizon and application to Markovian multistate production systems. Proc Inst Mech Eng Part O J Risk Reliab 222:449–461 Do Van P, Barros A, Bérenguer C (2010) From differential to difference importance measures for Markov reliability models. Eur J Oper Res 204:513–521 Dutuit Y, Rauzy A (2001) Efficient algorithms to assess component and gate importance in fault tree analysis. Reliab Eng Syst Safety 72:213–222 Gupta V, Dharmaraja S (2011) Semi-Markov modeling of dependability of VoIP network in the presence of resource degradation and security attacks. Reliab Eng Syst Safety 47:1627–1636 Hong JS, Koo HY, Lie CH (2000) Computation of joint reliability importance of two gate events in a fault tree. Reliab Eng Syst Safety 68:1–5 Huseby AB (2004) Importance measures for multicomponent binary systems. Statistical research report no. 11, Department of Mathematics, University of Oslo Jung WS, Cho NZ (1995) Semi-Markov reliability analysis of three test/repair policies for standby safety systems in a nuclear power plant. Reliab Eng Syst Safety 31:1–30 Korolyuk VS, Tomusyak AA (1965) Description of the functioning of reserve systems by means of semiMarkov processes. Kibernetika 5:55–59 Korolyuk VS, Brodi SM, Turbin AF (1975) Semi-Markov processes and their applications. J Math Sci 4:244–280 Limnios N (1997) Dependability analysis of semi-Markov systems. Reliab Eng Syst Safety 55:203–207 Limnios N (2011) Reliability measures of semi-Markov systems with general state space. Methodol Comput Appl Probab. doi:10.1007/s11009-011-9211-5 Limnios N, Opri¸san G (2001) Semi-Markov processes and reliability. Birkhäuser, Boston
123
32
M. Hellmich, H.-P. Berg
Natvig B (1979) A suggestion of a new measure of importance of system components. Stoch Proc Appl 9:319–330 Natvig B (1982) On the reduction of the remaining system lifetime due to the failure of a specific component. J Appl Prob 19:642–652 (Correction: J Appl Prob 20:713, 1983) Natvig B (1985) New light on measures of importance of system components. J Scand Stat 12:43–54 Natvig B, Gåsemyr J (2009) New results on the Barlow–Proschan and Natvig measures of component importance in nonrepairable and repairable systems. Methodol Comput Appl Probab 11:603–620 Natvig B (2011) Measures of component importance in nonrepairable and repairable multistate systems. Methodol Comput Appl Probab 13:523–547 Nollau V (1980) Semi-Markovsche Prozesse. Thun, Harri Deutsch (in German) Ouhbi B, Limnios N (2002) The rate of occurrence of failures for semi-Markov processes and estimation. Stat Probab Lett 59:245–255 Pyke R (1961) Markov renewal processes with finitely many states. Ann Math Stat 32:1243–1259 Rubino G, Sericola B (1992) Interval availability analysis using operational periods. Perf Eval 14:257–272 Störmer H (1970) Semi-Markoff-Prozesse mit endlich vielen Zuständen. Lecture Notes in Operations Research and Mathematical Systems, vol 34. Springer, Berlin (in German) Vulpe A, Carausu A (2005) Markovian and semi-Markov models for availability evaluation of NPP subsystems and equipment. In: Proceedings of the 18th international conference on structural mechanics in reactor technology. SMiRT 18-M02-3, pp 3833–3842
123