Distrib. Comput. (2013) 26:141–158 DOI 10.1007/s00446-012-0169-5
Anonymous asynchronous systems: the case of failure detectors François Bonnet · Michel Raynal
Received: 8 February 2011 / Accepted: 22 March 2012 / Published online: 17 May 2012 © Springer-Verlag 2012
Abstract Due to the multiplicity of loci of control, a main issue distributed systems have to cope with lies in the uncertainty on the system state created by the adversaries that are asynchrony, failures, dynamicity, mobility, etc. Considering message-passing systems, this paper considers the uncertainty created by the net effect of asynchrony and process crash failures in systems where the processes are anonymous (i.e., processes have no identity and locally execute the same algorithm). Trivially, agreement problems such as consensus, that cannot be solved in non-anonymous asynchronous systems prone to process failures, cannot be solved either if the system is anonymous. The paper investigates failure detectors that allow processes to circumvent this impossibility. It has several contributions. It first presents four failure detectors (denoted A P, A P, AΩ, and AΣ) and show that they are the “identity-free” counterparts of perfect failure detectors, eventual leader failure detectors, and quorum failure detectors, respectively. AΣ is new and showing that AΣ and Σ have the same computability power in a non-anonymous system is not trivial. The paper also shows that the notion of failure detector reduction is related to the computation model. Then, the paper presents and proves correct a uniform anonymous consensus algorithm based on the failure detector pair (AΩ, AΣ) (“uniform” means here that not
F. Bonnet (B) School of Information Science, JAIST, 1-1 Asahidai, Nomi, Ishikawa 923-1292, Japan e-mail:
[email protected] M. Raynal Institut Universitaire de France and IRISA, Université de Rennes 1, Campus de Beaulieu, 35042 Rennes, France e-mail:
[email protected]
only processes have no identity, but no process is aware of the total number of processes). This new algorithm is not a simple “straightforward extension” of an algorithm designed for non-anonymous systems. To benefit from AΣ, it uses a novel broadcast facility which encapsulates an AΣ-based message exchange pattern that provides the processes with an interesting intersection property on the set of messages they have exchanged. Finally, the paper discusses the notions of failure detector hierarchy, weakest failure detector for anonymous consensus, and the implementation of identity-free failure detectors in anonymous systems. Keywords Anonymous system · Asynchronous system · Communication abstraction · Distributed computability · Failure detector · Fault-tolerance · Message-passing system · Modularity · Process crash
1 Introduction Anonymous systems One of the main issues faced by distributed computing lies in mastering the uncertainty created by adversaries such as asynchrony and failures. As a simple example, the net effect of these adversaries makes impossible for a process to know if another process has crashed or is only very slow. Recently, new facets of uncertainty (e.g., dynamicity, mobility) have appeared and made distributed computing even more challenging. Among the many adversaries that distributed computing has to cope with, anonymity is particularly important. It occurs when the computing entities (processes, agents, sensors, etc.) have no name, and consequently cannot distinguish the ones from the others. It is worth noticing that, from a practical point of view, anonymity is a first class property as soon as one is interested in guaranteeing privacy. As an
123
142
example, some peer-to-peer file-sharing systems assume the peers are anonymous [11]. In the same vein, not all the sensor networks assume that each sensor has a proper identity [2,16]. One of the very first works (to our knowledge) that addressed anonymous systems is the work of Angluin [1]. In that paper, considering message passing systems, Angluin was mainly interested in computability issues, namely answering the question “which functions can be computed in presence of asynchrony and anonymity?” The leader election problem is a simple example of a problem that is unsolvable in such a setting (intuitively, this is because symmetry cannot be broken in presence of both asynchrony and anonymity). Other works have then addressed anonymity in particular settings such as ring networks [3], or networks with a regular structure [21]. Failure-free message passing anonymous systems have also been investigated in [28,29] where is given a characterization of problems solvable in this context according to which amount on information about network attributes are initially known by the processes. Enriching a system with a failure detector Failure detectors [10] are one of the most popular approaches to circumvent impossibility results in non-anonymous failure-prone asynchronous systems. Roughly speaking, a failure detector is a device that provides each process with failure-related information. According to the quality of this information, several types of failure detectors can be defined. As an example, let us consider the consensus problem. This problem, that cannot be solved in a pure asynchronous message-passing system prone to even a single process crash [17], is defined as follows. Each process proposes a value, and every process that does not crash has to decide a value (termination), such that a decided value is a proposed value (validity), and no two processes decide different values (agreement). It has been shown that the eventual leader failure detector denoted Ω is the weakest failure detector that allows consensus to be solved in message-passing asynchronous systems where a majority of processes never crash [9]. It has also been shown that the pair (Σ, Ω), where Σ is a quorum failure detector [13], is the weakest failure detector to solve consensus in non-anonymous systems when any number of processes may crash [13,14]. (These failure detectors are precisely defined later in the paper.) Failure detectors and anonymous systems The local output of a failure detector Ω is a process identity. Similarly, the local output of a failure detector P (perfect failure detector) [10] or Σ [13] is a set of process identities. While these failure detectors can be added to an anonymous distributed system, their outputs cannot be directly used by the anonymous processes for the simple reason that there is no “process identity” notion inside the system. This means that (the
123
F. Bonnet, M. Raynal
output of) Ω is useless in an anonymous system. As far as the output of P or Σ is concerned, an anonymous process can only exploit the cardinality of the identity set that is currently output. As we will see, this cardinal value can be exploited if the failure detector is P, while it cannot if it is Σ. Differently from Ω, P, or Σ, failure detectors have been proposed that, while used in non-anonymous systems, neither output process identities nor associate values with process identities. We call them identity-free failure detectors. As an example, the failure detector L, that outputs a Boolean value at every process has been introduced in [15], where it is shown to be the weakest failure detector for the (n − 1)set agreement problem in n-process asynchronous messagepassing systems prone to any number of crashes. This failure detector has been generalized in [5]. A failure detector, denoted here A P, that outputs an approximation of the number of crashed processes has been proposed in [23,24]. This failure detector has been used in [6] to solve consensus in anonymous systems prone to any number t < n of process crashes. It has also been shown in [6] that, in an anonymous system enriched with such a failure detector, 2t + 1 is a lower bound on the number of rounds for consensus (in a non-anonymous system enriched with a perfect failure detector, this lower bound is t + 1). Content of the paper This paper is on failure detectors suited to anonymous asynchronous message-passing systems prone to any number of process crashes. It has several contributions. – It first introduces a new failure detector, denoted AΣ, and shows that it is the identity-free counterpart of the failure detector Σ (quorum failure detector). While Σ provides each process with a set of process identities that satisfies an intersection property, the main issue encountered in defining its identity-free counterpart AΣ lies in capturing properties on set cardinals from which an intersection property can be extracted. This capture is not trivial, as demonstrated by the construction of Σ from AΣ in a non-anonymous system which is particularly subtle. (Contrarily, the construction of AΣ from Σ in a non-anonymous system is simple.) The paper also presents the failure detectors A P, A P and AΩ which are the identity-free counterparts of P (perfect failure detector), P (a variant of P) and Ω (eventual leader failure detector), respectively. A novel (bounded and quiescent) construction of a failure detector of P from A P in a nonanonymous system is presented. – The paper presents then a communication abstraction which encapsulates the use of AΣ. It consists in a novel message exchange pattern composed of a finite number of asynchronous communication phases (the number of these internal communication phases depends on
Anonymous asynchronous systems
the output of AΣ). Hence, this abstraction hides an AΣbased sophisticated communication pattern to the upper layer while providing it with well-defined properties. The most important of these properties is an intersection property on the messages received by the processes. – The paper presents then an algorithm, based on the pair of failure detectors AΣ and AΩ, that solves the consensus problem in an anonymous system whatever the number t of processes that may crash. In addition of being anonymous, the algorithm is also strongly anonymous in the sense that no process is required to know the total number n of processes. The fact that Σ and Ω are useless in anonymous systems has motivated the design of their identity-free counterparts. The fact that (Σ, Ω) is the weakest failure detector to solve consensus in a non-anonymous system despite any number of process crashes [14] has motivated the design of an (AΣ, AΩ)-based anonymous consensus algorithm and poses the question of the weakest failure detector for anonymous consensus. This algorithm adopts the structure of the non-anonymous consensus algorithm presented in [26]: processes execute asynchronous rounds and each round is made up of three communication phases. While in a non-anonymous system, the use of quorums [25] allows a process to broadcast a message and then wait for messages from a given set of processes, this is no longer possible in an anonymous system. To solve this problem, the proposed algorithm uses the previously defined communication abstraction. – The paper finally discusses the “weakest failure detector” notion for anonymous consensus. As an example, while P is strictly stronger than Ω in a non-anonymous system, their identity-free counterparts A P and AΩ are incomparable. The paper addresses also the implementation of identity-free failure detectors in anonymous synchronous systems. Both A P and AΣ can be implemented in such systems and are consequently realistic [12].
Roadmap The paper is made up of 8 sections. The anonymous distributed computation model is presented in Sect. 2. Section 3 presents the four identity-free failure detectors A P, A P, AΩ, and AΣ. Section 4 addresses failure detector reductions and shows that AΣ and Σ are equivalent in non-anonymous systems. This section also discusses the notion of failure detector hierarchy and the implementability of anonymous failure detectors. Section 5 presents the communication abstraction which encapsulates an AΣ-based communication pattern. Section 6 presents an (AΣ, AΩ)-based consensus algorithm for strongly anonymous asynchronous systems. Section 7 discusses the “weakest failure detector” issue for anonymous consensus. Finally Sect. 8 concludes the paper.
143
2 Anonymous asynchronous message-passing systems Process model The system is made up of a fixed number n of processes, denoted p1 , . . . , pn . Π = {1, . . . , n} denotes the set of process identities (also called indexes). Processes are anonymous in the sense that no process knows the existence of indexes and all processes execute the same algorithm. This means that indexes can only be used from an external observer point of view: they do not belong to the system as perceived by processes. The processes are asynchronous in the sense that there is no assumption on their respective speeds. Anonymous versus strongly anonymous When not explicitly indicated, the anonymous model does not prevent the processes from knowing n (the total number of processes). In order to prevent confusion, we say that the model is strongly anonymous when no process is aware of the value of n. Time model The underlying time model is the set of positive integers (denoted N). Time instants are denoted τ , τ , etc. Similarly to indexes, this time notion is not accessible to the processes. It is only used from an external observer point of view to state or prove properties. More generally, but for process identities, the computation model is the same as in [10]. Failure model A process executes correctly its algorithm until it possibly crashes. A crash is a premature stop; after it has crashed, a process executes no step. A process that does not crash in a run is correct in that run. Otherwise, it is faulty in that run. Until it crashes (if ever it does), a process is alive. An environment is a set of failure patterns, where a failure pattern [10] is a function F : N → 2Π such that F(τ ) denotes the set of processes that have crashed by time τ . For a given failure pattern F, we define the set of crashed processes crashed(F) = ∪τ ∈N F(τ ) and the set of correct processes corr ect (F) = Π \ crashed(F). We consider here failure patterns in which all (but one) processes may crash in a run. This set of failure patterns is called wait-free environment. Communication The processes communicate by exchanging messages through reliable channels, i.e., there is no creation, duplication or alteration of messages. These channels are asynchronous, which means that there is no assumption on message transit delays, except that they are positive and finite (every message eventually arrives). The processes are provided with a broadcast() communication primitive that allows the invoking process to send the same message to all the processes (including itself). When it receives a message, a process cannot determine which process is its sender. If the process pi that issues an invocation of broadcast() does not crash during its invocation, the message is received by all the processes (possibly at different
123
144
F. Bonnet, M. Raynal
time instants). If it crashes during its invocation, the message is received by an arbitrary subset of processes. Such a broadcast primitive is sometimes called best effort broadcast.
such that it (1) never contains the index of a process before it crashes and (2) eventually contains the indexes of all faulty processes. Formally, ∀F ∀H ∈ P(F)
Notations The previous model is denoted AAS[∅]. AAS is an acronym for Anonymous Asynchronous System; ∅ means that there is no additional assumption. AS[∅] is then used to denote the non-anonymous counterpart of AAS[∅], i.e., an Asynchronous message-passing System prone to any number of crash failures and where each process has a distinct identity and knows all process identities [4,22].
– Validity. ∀τ ∀i ∈ Π H (i, τ ) ⊆ Π . – Safety. ∀τ ∀i, j ∈ Π (i ∈ F(τ )) ⇒ (i ∈ H ( j, τ )).2 – Liveness. ∃τ, ∀i ∈ crashed(F) ∀ j ∈ corr ect (F) ∀τ ≥ τ i ∈ H ( j, τ ).
3 Failure detectors 3.1 Definition of failure detectors The following definitions, based on Π and the set N of time instants, are from [10]. A failure detector history H with range R describes the behavior of a failure detector during a run. It is a function H : Π × N → R where H (i, τ ) describes the value of the failure detector at pi at time τ . A failure detector D with range R is a function that maps each failure pattern F to a set of failure detector histories with range R: D(F) is the set of failure detector histories that D can exhibit when the failure pattern is F. Let A and B be two failure detectors. In the following AAS[A, B] denotes the system AAS[∅] enriched with both failure detectors A and B. This means that a process can additionally read the local variables provided by these failure detectors. AAS[A] denotes a system where only the failure detector A can be accessed. 3.2 A few classical failure detectors This section recall the definition of three well-known failure detectors, namely, P, Ω, and Σ and also the definition of P (a simple variant of P). While they have been designed for non-anonymous systems, nothing prevents us from enriching an anonymous system with any of them (but, as shown below, it is possible that such an “enrichment” does not add computational power to the anonymous system). In the following definitions, we simplify some logical terms, when there is no ambiguity; for example ∀τ means ∀τ ∈ N. The perfect failure detector P The perfect failure detector [10] provides each process pi with a local set variable denoted suspectedi 1 that contains process indexes and is 1
The read-only local variable suspectedi corresponds to the output H (i, −) of the failure detector P. In all formal definitions, we keep the classical formalism H (i, τ ). The notation suspectedi is however used in the code of algorithms (and in their proofs), where it is needed to distinguish different failure detectors. The same remark applies to the
123
When considering AAS[P], it is important to notice that while the values that are currently in suspectedi are meaningless for pi (e.g., pi cannot use such a value to send a message to a process and only to it), this process can use the integer |suspectedi | that provides it with a lower bound on the number of crashed processes. The perfect failure detector P This failure detector provides each process pi with a local set variable denoted alivei that contains process indexes and is such that it (1) contains at least the indexes of the processes that are currently alive and (2) eventually contains only the indexes of correct processes. Intuitively, the output of P corresponds to the complement set of the output of P with respect to Π the set of indexes. Formally, ∀F ∀H ∈ P(F) – Validity. ∀τ ∀i ∈ Π H (i, τ ) ⊆ Π . – Safety. ∀τ ∀i, j ∈ Π (i ∈ F(τ )) ⇒ (i ∈ H ( j, τ )). – Liveness. ∃τ, ∀i ∈ crashed(F) ∀ j ∈ corr ect (F) ∀τ ≥ τ i ∈ H ( j, τ ). The eventual leader failure detector Ω The eventual leader failure detector Ω [9] provides each process pi with a local variable leaderi that contains a process index such that eventually (1) the variables leaderi of the non-faulty processes contain forever the same index and (2) this index is the one of a non-faulty process. Formally, ∀F ∀H ∈ Ω(F) – Validity. ∀τ ∀i ∈ Π H (i, τ ) ∈ Π . – Liveness. ∃τ, ∃i ∈ corr ect (F), ∀ j ∈ corr ect (F) ∀τ ≥ τ H ( j, τ ) = i. The same remark as the one done for AAS[P] applies to AAS[Ω]. More precisely, the output of Ω is useless in AAS[Ω] in the sense that it cannot be used by a process to send a message only (i.e., without sending copies to all processes) to the process whose “identity” is output by Ω. Footnote 1 continued output of all the failure detectors defined in the paper. Their outputs are denoted alivei , leaderi , sigmai , ancpi , anapi , a_leaderi , and a_sigmai , respectively. 2
This property and the next one are called “safety” and “liveness” because they are used to prove the safety (resp., liveness) of the algorithm that uses this failure detector.
Anonymous asynchronous systems
The quorum failure detector Σ The notion of quorum has been introduced in [18] (and explicitly used to solve consensus in [25]). The quorum failure detector Σ [13] provides each process with a local set variable denoted sigmai that contains process indexes (such a set is called quorum) and is such that (1) any two quorum values do intersect (whatever the time instants at which these quorum values have been output) and (2) eventually, any quorum contains only correct processes. Formally, ∀F ∀H ∈ Σ(F) – Validity. ∀τ ∀i ∈ Π H (i, τ ) ⊆ Π . – Safety. ∀τ1 , τ2 ∀i 1 , i 2 ∈ Π H (i 1 , τ1 ) ∩ H (i 2 , τ2 ) = ∅. – Liveness. ∃τ, ∀i ∈ corr ect (F) ∀τ ≥ τ H (i, τ ) ⊆ corr ect (F). It is shown in [13] that Σ is the weakest failure detector to implement a register in an asynchronous message-passing system prone to any number of crashes. A simple proof of this result appears in [7]. 3.3 Identity-free failure detectors As seen in the Introduction, some failure detectors do not output process identities (or values associated with process identities) but Boolean values, integers, etc. whose “meaning” is on the entire system. As already indicated, we call them identity-free failure detectors. This section recalls the definition of three of them ( A P, A P, and AΩ) and introduces a new one (AΣ) that is the identity-free counterpart of Σ. As we will see later, “counterpart” means that they have the same computational power in a non-anonymous system. However AΣ is meaningful in an anonymous system, whereas Σ is not. The identity-free perfect failure detector A P Such a failure detector (a variant of a failure detector introduced in [23,24]) provides each process pi with a local integer variable ancpi (approximate number of crashed processes) that (1) is never greater than the number of crashed processes and (2) is eventually equal to the number of faulty processes. Intuitively A P satisfies the same properties as P except that, instead of returning a set of indexes, it simply returns the cardinal of this set. Formally, ∀F ∀H ∈ A P(F) – Validity. ∀τ ∀i ∈ Π H (i, τ ) ∈ N. – Safety. ∀τ ∀i ∈ Π H (i, τ ) ≤ |F(τ )|. – Liveness. ∃τ, ∀i ∈ corr ect (F) ∀τ ≥ τ H (i, τ ) = |crashed(F)|. The identity-free perfect failure detector A P Such a failure detector provides each process pi with an integer anapi (approximate number of alive processes) that (1) is never
145
smaller than the number of alive processes and (2) is eventually equal to the number of correct processes. Intuitively A P satisfies the same properties as P except that, instead of returning a set of indexes, it simply returns the cardinal of this set. Formally, ∀F ∀H ∈ A P(F) – Validity. ∀τ ∀i ∈ Π H (i, τ ) ∈ N. – Safety. ∀τ ∀i ∈ Π H (i, τ ) ≥ |Π \ F(τ )|. – Liveness. ∃τ, ∀i ∈ corr ect (F) ∀τ ≥ τ H (i, τ ) = |corr ect (F)|. The identity-free eventual leader failure detector AΩ Such a failure detector [19] provides each process pi with a local Boolean variable denoted a_leaderi such that eventually (1) there is one non-faulty process (say p ) whose Boolean variable remains forever true and (2) the Boolean variables of the other non-faulty processes remain forever false. Intuitively AΩ satisfies the same properties as Ω. They differ in that eventually all but one Boolean a_leaderi become false forever, while Ω eventually provides them with the same process index. Formally, ∀F ∀H ∈ AΩ(F) – Validity. ∀τ ∀i ∈ Π H (i, τ ) ∈ {tr ue, f alse}. τ – Liveness. ∃τ, ∃∈ corr ect (F), ∀τ ≥ H (, τ ) = tr ue ∧ ∀i = H (i, τ ) = f alse . The identity-free quorum failure detectors AΣ Such a failure detector outputs a set of pairs at each process. Each pair is composed of a label x and an integer y, and each label appears at most once in a given output. Without loss of generality, the set of labels is assumed to be a subset of N. If (x, y) appears in the output of AΣ at process pi at time τ , then a pair of the form (x, y ) with y ≤ y is guaranteed to appear in all future times at pi until pi crashes. Intuitively, (x, y) denotes to pi that label x exists and at least y processes know about x (let us notice that this information may be unreliable). Eventually, the output at each correct process stabilizes with respect to label x by having a pair (x, z) in the output and we are guaranteed that the failure detector output at at least z processes eventually contains the label x. Finally, given two pairs (x1, y1) and (x2, y2) in the outputs of the failure detector in a given history, we are guaranteed that every subset T 1 of y1 processes at which label x1 appears and every subset T 2 of y2 processes at which label x2 appears, have a non-empty intersection. As, we will see, a quorum is a set of processes that know the same label. Formally, the behavior of the failure detector is defined by the following properties. The first two properties (validity and monotonicity) are well-formedness properties, while the last two properties (safety and liveness) are behavioral properties. Definition 1 S(x) = {i | ∃τ ∈ N : (x, −) ∈ H (i, τ )}.
123
146
Formally, ∀F ∀H ∈ AΣ(F) – Validity. ∀τ ∀i ∈ Π H (i, τ ) = {(x1 , y1 ), . . . , (x p , y p )} where ∀a, b ∈ {1, . . . , p} (xa , xb ∈ N) ∧ (a = b) ⇒ (xa = xb ) . – Monotonicity. ∀τ ∀i ∈ Π (x, y) ∈ H (i, τ ) ⇒ ∀τ ≥ τ ∃y ≤ y, (x, y ) ∈ H (i, τ ) . – Liveness. ∃τ ∀i ∈ corr ect (F) ∃(x, y), ∀τ ≥ τ (x, y) ∈ H (i, τ ) ∧ |S(x) ∩ corr ect (F)| ≥ y . – Safety. ∀ τ1 , τ2 ∀i 1 , i 2 ∈ Π ∀ (x1 , y1 ) ∈ H (i 1 , τ1 ) ∀ (x2 , y2 ) ∈ H (i 2 , τ2 ) ∀ T1 ⊆ S(x1 ) ∀ T2 ⊆ S(x2 ) (|T1 | = y1 ) ∧ (|T2 | = y2 ) ⇒ (T1 ∩ T2 = ∅). Interpretation The validity property expresses the fact that, at any time, the output of AΣ is a non-empty set of pairs (x, y) where x is a label and y a number of processes associated with this label (those are processes assumed to know the label x). For any process pi , at any time τ and any label x, x can appear at most once, but any finite number of distinct labels can appear. The monotonicity property states that the number y of processes associated with a label x, as known by pi , can only decrease. This requirement is not necessary but makes things simpler. Not considering this monotonicity property will not change our results but would make them more difficult to understand and proofs more technical.3 Hence, this property has to be seen as a “comfort” property, and not as a “computability” property. S(x) is the set of all processes that know the label x. While a process pi knows it belongs to S(x), it does not know the value of S(x). Moreover S(x) can not be used by algorithms; it is only used to define AΣ. The next property is called liveness because it is used to prove liveness of AΣ-based algorithms (and similarly for the safety property). It captures the fact that, after some time, a quorum contains only correct processes, thereby preventing a correct process from blocking forever if it uses that quorum. To that end, this property states that, for any correct process pi , there is eventually a label x such that its associated number y of processes remains always smaller than or equal to the number of correct processes in S(x). (The underlying intuition is that any correct process will eventually know a label that is associated with a set of correct processes only.) The safety property is a little bit more involved. It captures the intersection property associated with quorums. Let x1 and x2 be two labels known by pi1 and pi2 respectively, T1
3
Basically, if not provided by the failure detector, each process can locally achieve this property using a local history of the output of AΣ.
123
F. Bonnet, M. Raynal
any subset of S(x1 ), T2 any subset of S(x2 ) (let us remember that S(x) is the set of all the processes that know label x). The safety property states the following: if |T1 | = y1 and |T2 | = y2 , where (x1 , y1 ) ∈ H (i 1 , τ1 ) and (x2 , y2 ) ∈ H (i 2 , τ2 ), then T1 ∩ T2 = ∅. Let us remember that y1 is the number of processes associated with label x1 as known by pi1 (and similarly for y2 ). The intuition is that the y1 processes that know label x1 and y2 processes that know label x2 do intersect.
4 Reductions between failure detectors Definitions The following definitions are a straightforward generalization of definitions given in [10]. They add the notion of “system model”. Given two failure detectors D1 and D2, and a system model C (AAS or AS), D1 is weaker than D2 in C (denoted D1 C D2) if there is an algorithm that emulates the output of the failure detector D1 in C[D2]. If reductions exist in both direction, (i.e., D1 C D2 and D2 C D1), D1 and D2 are equivalent in C (denoted D1 C D2). Finally the notation D1 ≺C D2 means that D1 is strictly weaker than D2 (i.e., D1 C D2 and D2 C D1). Similarly, if D1 is (strictly) weaker than D2 in C, D2 is said to be (strictly) stronger than D1 in C. If D1 C D2 and D2 C D1, D1 and D2 are said to be not comparable in C. It is important to notice that the existence of reductions between failure detectors depends on the system model. Given any two failure detectors D1 and D2, as AAS is AS without the notion of process identities, we have (D1 AAS D2) ⇒ (D1 AS D2), but we do not have (D1 AS D2) ⇒ (D1 AAS D2). As simple examples (see the proofs below) we have: ⎧ ⎨ X AS AX for X = P, Σ, or Ω, A P ≺AAS P, ⎩ (AΩ AS Ω) ∧ (AΩ AAS Ω). 4.1 Reductions in the non-anonymous model AS Directly from definitions, one can easily see that P AS P and A P AS A P: for the first equivalence it is sufficient to compute the complement set of the output set with respect to Π the set of all indexes, whereas for the latter it is sufficient to subtract the output integer to n, the total number of processes. In the following we prove the three equivalences P AS A P, Ω AS AΩ, and AΣ AS Σ. Moreover it is wellknown that Ω ≺AS P [9,27], Σ ≺AS P [13,27], and Ω and Σ are not comparable in AS [13]. The reductions among the failure detectors are summarized in Fig. 5 and explained in Sect. 4.3.
Anonymous asynchronous systems
Fig. 1 Building P in AS [A P]: a bounded quiescent transformation (code for pi )
4.1.1 P and A P are equivalent in AS The interested reader can have a look to [8] where is proved a more general equivalence in the model AS n,t which is the classical non-anonymous model where at most t < n processes may crash. We consider here the case of the so-called wait-free environment (t = n − 1). Building A P in AS[P] The transformation in that direction is trivial. The reader can easily check that, defining ancpi as |suspectedi | constructs the failure detector A P. Building P in AS[A P] A transformation that builds P in AS[A P] is described in Fig. 1. Interestingly, this transformation is bounded (be the execution finite or infinite, the local memory of each process requires only a bounded number of bits). Moreover, (1) the transformation is quiescent (i.e., there is a finite time after which no more messages are exchanged), and (2) the algorithm terminates in the runs where n − 1 processes crash. In order to compute the value of suspectedi (that is initialized to ∅), each process pi manages two local variables: – An integer ki , initialized to 0, that represents its current knowledge on the number of processes that have crashed. – An array ansi [1.. n], initialized to [0, . . . , 0], such that ansi [ j] = k means that k is the greatest inquiry number for which pi has received the corresponding answer alive(k). The behavior of pi is defined by four tasks. First, when pi discovers that more than ki processes have crashed, it updates accordingly ki , and broadcasts an inquiry message inquiry (ki ) to all the processes. Let us notice that this task can stop when ki = n − 1 as, due to the model definition, no more crashes can occur. Let us also observe that the messages inquiry(ki ) are sent by pi with increasing values, and due to the safety property of A P, pi knows that there are at most n − ki alive processes.
147
When pi receives an inquiry(k) message from a process p j it sends back to p j an alive(k) message to indicate that it is still alive. When it receives an answer alive(k) from a process p j , pi learns that p j has answered up to its k-th inquiry, and consequently updates ansi [ j]. The core of the transformation is the task T 4 that sets the current value of suspectedi . It is made up of a repeat statement that is executed until n − 1 processes are locally suspected. (When n − 1 processes have crashed, no more processes can crash and the task can terminate. If less than n − 1 processes crash, the task becomes quiescent -no more messages are sent- but does not terminate.) The body of the repeat statement is as follows. First, pi sets a local variable m to ki (the number of processes that, to the best of its knowledge, have crashed). Then, pi computes the set X made up of the processes that have not yet answered its m-th inquiry or a more recent one. If the predicate |X | = m is true, pi can safely conclude that the m processes that have not answered its m-th inquiry have crashed (let us recall that, while the tasks T 1 and T 4 proceed asynchronously, pi broadcasts inquiry(m) only after it knows that m processes have crashed). Theorem 1 The algorithm described in Fig. 1 is a bounded quiescent construction of P in AS[A P]. Proof Proof of the liveness property P. Let us consider an execution with a given failure pattern F and pi a non-faulty process in this execution. We have to show that if a process p j crashes, after some finite time, j permanently belongs to suspectedi . Let f = |crashed(F)|. There is a finite time τ , after which the f faulty processes have crashed and we have permanently ancpi = f (due to the liveness property of A P), which means that, after some finite time, pi broadcasts a message inquiry( f ). Due to the safety property of A P, this message is sent after the f processes have crashed. Consequently, no crashed process can answer this inquiry message. It follows that, when task T 4 executes with m = ki = f , the set X can only contain the f faulty processes when |X | = f , which concludes the proof of the liveness property. Proof of the safety property of P. Let pi be any process in an execution with a given failure pattern F. We have to show that no process is added to suspectedi before crashing. Let i 1 , . . . , i m be the m process identities that are placed in suspectedi during an iteration of task T 4. It follows from the query/response mechanism (implemented by the messages inquiry/alive) used when ki = m, and the safety property of A P, that each of the n −m other processes has answered after these m processes have crashed. Consequently, none of these n − m processes can be part of the m crashed processes. Hence, the set of processes that defines the value ofsuspectedi contains only crashed processes.
123
148
Proof of boundedness and quiescence. The fact that the construction is bounded and quiescent follows directly from the text of the algorithm: a process broadcast at most one inquiry(k) message for every value of k, and k can take a bounded number of values. 4.1.2 Ω and AΩ are equivalent in AS Ω and AΩ are equivalent in AS[∅]. The two directions of the equivalence are explained below. The proofs are straightforward and left to the reader. Building AΩ in AS[Ω] For any process pi , the current value of the Boolean variable a_leaderi of AΩ is computed by the test leaderi = i where leaderi is the output of Ω. Building Ω in AS[AΩ] The reduction consists in two tasks executed by all processes: (1) Each process pi checks periodically its Boolean a_leaderi and, if its value is true, broadcasts a message leader(i) (note that this reduction is done in the non-anonymous model, hence pi knows its identity). (2) When pi receives a message leader(k), it updates its leaderi to k. 4.1.3 Σ and AΣ are equivalent in AS: From Σ to AΣ The construction As AS[Σ] is not anonymous, it is possible for the processes to (statically) enumerate all the possible subsets Q, such that Q = ∅ and Q ⊆ Π . There are 2n − 1 such subsets. Moreover, the processes are provided with a deterministic function denoted name(). That function associates a label with each set Q, and satisfies the following properties: (a) ∀Q name(Q) ∈ {1, . . . , 2n − 1}, and (b) ∀Q, Q (Q = Q ) ⇒ name(Q) = name(Q ) . The algorithm building AΣ in AS[Σ] is described in Fig. 2. Interestingly, this construction is bounded and quiescent. It builds, at each process pi , a set a_sigmai that contains pairs of integers, and ensures that these sets of pairs satisfy the properties defining AΣ. The algorithm is made up of two tasks that are executed at each process pi .
F. Bonnet, M. Raynal
– Task T 1 repeatedly broadcasts the local output sigmai of the underlying failure detector Σ. In order to ensure the boundedness and quiescence properties, a local set variable senti (a set of sets) is used to prevent the same quorum to be sent several times. – Task T 2 is associated with the reception of messages quorum(quor um). When such a message is received, pi adds the pair (name(quor um), |quor um|) to the set a_sigmai if i ∈ quor um. Otherwise, pi discards the message. Theorem 2 The algorithm described in Fig. 2 is a bounded quiescent construction of AΣ in AS[Σ]. Proof The validity property follows immediately from the definition of the function name(). Moreover since name() is a one-to-one function, there is at most one pair (x, −) associated with a given x. The monotonicity property is then obvious. Proof of the liveness property of AΣ. Let us consider an execution with a given failure pattern F. We have to show that ∀i ∈ corr ect (F) ∃ (x, y), ∃ τ, ∀τ ≥ τ ((x, y) ∈ a_sigmaiτ ) ∧ (|S(x) ∩ corr ect (F)| ≥ y).4 To that end, let τ0 be the time instant at which all faulty processes have crashed and their messages have been received and processed. Moreover, let τ1 be a time instant, such that, ∀τ1 ≥ τ1 , τ
sigmai 1 contains only correct processes (due to the liveness property of Σ, τ1 does exist). Finally, let τ ≥ max(τ0 , τ1 ). All the processes that execute after τ are correct. Let i ∈ corr ect (F). Let quor um be any set obtained by pi after τ . As τ ≥ τ1 , quor um contains only correct processes, i.e., quor um ⊆ corr ect (F) (Observation O1). As pi is a correct process, any correct process p j receives quorum(quor um) and deal with it (if not yet done). From observation O1, all processes of the set quor um receive quorum(quor um). Let us consider a process p j such that j ∈ quor um. Each such p j adds (name(quor um), |quor um|) to a_sigma j if it has not done so yet. It follows that j ∈ S(x). Moreover, it follows from the text of the algorithm that no process pk outside the set quor um adds the pair (name(quor um), |quor um|) to a_sigmak . Consequently we have S(name(quor um)) = quor um (Observation O2). It follows from both observations O1 and O2 that, for any correct process pi , there is a pair (x, y) such that (x, y) = (name(quor um), |quor um|) such that |S(x) ∩ corr ect (F)| = |quor um| = y, which completes the proof of the liveness property. Proof of the safety property of AΣ.
Fig. 2 Building AΣ in AS [Σ]: a bounded quiescent transformation (code for pi )
123
The notation xiτ refers to the value of the variable x for the process pi at the time instant τ .
4
Anonymous asynchronous systems
149
Let quor um i1 and quor um i2 be two quorums obtained (at line 2) by two processes pi1 and pi2 at the time instants τi1 and τi2 , respectively. Let (x1 , y1 ) = (name(quor um i1 ), |quor um i1 |). As previously stated, only processes that can belong to S(x1 ) (by adding the pair (x1 , y1 ) to their set a_sigma) are processes of quor um i1 , which means that S(x1 ) ⊆ quor um i1 . As defined in the safety property of AΣ, let T1 ⊆ S(x1 ) such that |T1 | = y1 . Let us notice that, if such a set T1 exists, we have T1 = S(x1 ) = quor um i1 . With similar definitions and observations for quor um i2 , if a set T2 exists, we have T2 = S(x2 ) = quor um i2 . τi Thus we have T1 = quor um i1 = sigmai11 , and T2 = τi
quor um i2 = sigmai22 . It follows from the safety property of τi
τi
Σ that sigmai11 ∩sigmai22 = ∅. Consequently, T1 ∩T2 = ∅, which concludes the proof of the safety property. Proof of boundedness and quiescence. Since there is a bounded number of processes, the number of possible quorums output by a failure detector Σ is clearly bounded. It follows that both all the sets a_sigma and sent are bounded. Moreover, since each process broadcasts a quorum at most once, not only the content but also the number of messages is bounded, from which follows the quiescence property. The AS[Σ] model assumes that the communication channels are reliable in the sense that there is no loss, no duplication, and no creation of messages. Actually, the reader can check that the previous construction is not only bounded and quiescent, but remains correct when messages are finitely duplicated. 4.1.4 Σ and AΣ are equivalent in AS: From AΣ to Σ The construction The algorithm that builds failure detector Σ in AS[AΣ] is described in Fig. 3. It relies on two main data structures at each process pi . – alivei is a queue, always containing all the process indexes, that is managed as follows. When pi receives a message from p j , it reorders j and places it at the head of that queue. In that way, the processes that are alive (i.e., those that send messages) appear at the head of alivei , while the processes that have crashed are progressively moved at its tail. – queuei is an array of queues; queuei [x] contains the identities of the processes that, from pi ’s point of view, know the quorum whose name is x. The quorum names x are obtained from the local output a_sigmai (underlying failure detector AΣ). queuei [x][ j] denotes the jth element of queuei [x].
Fig. 3 Building Σ in AS [AΣ] (code for pi )
According to these data structures, the behavior of a process pi is made up of three tasks T 1, T 2 and T 3. – T 1 is an infinite loop in which pi repeatedly broadcasts a message alive(i, labelsi ) that contains the names of the quorums it knows (i.e., those that are in a_sigmai ). – T 2 is the matching task of T 1. When process pi receives alive( j, labels), it first updates alivei accordingly (line 6). Then, for each quorum name it knows (line 7), it updates its current view of the processes that know x (i.e., the processes p j that have (x, −) in their a_sigma j , lines 8–9). – T 3 is the core of the construction. It is an infinite loop whose aim is to define the current value of sigmai (the local output of Σ). pi first computes a set candidates that contains all the pairs (x, y) ∈ a_sigmai such that |queuei [x]| ≥ y (line 12). Those are the pairs such that pi has received alive(−, {· · · , x, · · · }) from at least y distinct processes (i.e., y processes know the label x). If the set candidates is empty, pi cannot compute a nontrivial value for sigmai . It consequently sets sigmai to Π (line 14). Otherwise, pi computes a non-trivial value for sigmai from the set candidates (lines 15–18). To that end, rank() is defined as the position of the identity in the queue alivei (line 16). The aim is to assign to sigmai the y processes that are at the head of queuei [x] (line 18), where the corresponding pair (x, y) ∈ candidates is determined as follows. Using an array-like notation, the identities in the prefix queuei [x][1..y] “globally appear in alivei before” the identities in the other prefixes queuei [x ][1..y ]. “Globally appear before” means that there is an identity in queuei [x ][1..y ] whose rank in alivei is after the rank of any identity in queuei [x][1..y]. (This is formally
123
150
F. Bonnet, M. Raynal
expressed by lines 15–17.) Let us notice that several prefixes queuei [x][1..y] can globally appear as being the “first” in alivei . If it is the case, any of them can be selected. To fix the idea, let us consider the following example. ⎧ ⎪ ⎪ alivei = [7, 1, 3, 9, 4, 8, 2, 5, 6], ⎪ ⎪ ⎨ a_sigmai = {(5, 4), (7, 3), (2, 5)}, queuei [5] = [1, 3, 4, 2, 5], ⎪ ⎪ ⎪ queue i [7] = [1, 8, 5], ⎪ ⎩ queuei [2] = [1, 5]. Considering queuei [5], queuei [7] and queuei [2], we have candidates = {(5, 4), (7, 3)}. Since |queuei [2]| < 5, the pair (2, 5) does not belong to the set of candidates. As queuei [5][4] = 2, and queuei [7][3] = 5, we have:
rank(queuei [5][4]) = rank(2) = 7, rank(queuei [7][3]) = rank(5) = 8.
Hence, r _min = 7 and (x, y) = (5, 4) defines the queue prefix whose identities are “first” in alivei . Consequently sigmai is set to queuei [5][1..4] = {1, 3, 4, 2}. Theorem 3 The algorithm described in Fig. 3 builds Σ in AS[AΣ]. Proof Proof of the safety property of Σ. We have to show that ∀ τ1 , τ2 ∀i 1 , i 2 ∈ Π sigmaiτ11 ∩ sigmaiτ22 = ∅. Let us first observe that a set assigned to sigmai is never empty (lines 14 and 18). When the set Π is the value of sigmai (line 14), the safety property is trivially satisfied. Hence, let us consider two processes pi1 and pi2 , such that the values of sigmai1 and sigmai2 have been computed at any time instants τ1 and τ2 , respectively (at lines 15–18). We have then the following. The value of sigmaiτ11 , obtained from some pair (x1 , y1 ), is the value of some set T1 = queuei [x1 ][1..y1 ]. It follows from the definition of S(x1 ), line 2, and lines 7–9 that T1 = queuei [x1 ][1..y1 ] ⊆ S(x1 ). Similarly, the value of sigmaiτ22 is obtained from some pair (x2 , y2 ), is the value of some set T2 = queuei [x2 ][1..y2 ], and it follows from the definition of S(x2 ), line 2, and lines 7– 9 that T2 = queuei [x2 ][1..y2 ] ⊆ S(x2 ). It then follows directly from the safety property of AΣ that T1 ∩ T2 = ∅, and consequently, sigmaiτ11 ∩ sigmaiτ22 = ∅, which concludes the proof of the safety property of Σ. Proof of the liveness property of Σ. Let us consider an execution with a given failure pattern F. We have to show that ∃τ, ∀i ∈ corr ect (F) ∀τ ≥ τ sigmaiτ ⊆ corr ect (F). Let τ0 be the time instant at which all faulty processes have crashed, all messages alive(−, −) sent by faulty pro-
123
Fig. 4 Variables alivei , queuei [x] = [q1 , q2 , . . . , q y , . . .], and queuei [x ] = [q1 , q2 , . . . , q y , . . .]
cesses have been received and processed, and each correct process has received a message alive(−, −) from each correct process after it has received all the messages from faulty processes. It follows from lines 3 and 6 that, from time τ0 , the correct processes are always before the faulty processes in alivei (for any correct process pi ). Moreover, due to the monotonicity and liveness properties of AΣ, there is a time τ1 from which there is a pair (x, y) ∈ a_sigmai such that |S(x) ∩ corr ect (F)| ≥ y. This means that there are at least y correct processes in S(x). All the time instants considered in the rest of the proof of the liveness property are time instants after max(τ0 , τ1 ). Let us consider such a pair (x, y) ∈ a_sigmai (as defined above). Each process p j with j ∈ S(x)∩corr ect (F) broadcasts forever alive( j, labels) with x ∈ labels (line 3). As each process pi such that i ∈ S(x) ∩ corr ect (F) receives these messages, it executes lines 8–9, hence the processes in S(x) ∩ corr ect (F) remain forever in queuei [x], which means that we eventually have forever |queuei [x]| ≥ |S(x) ∩ corr ect (F)| ≥ y and then (x, y) ∈ candidates, which means that the predicate of line 12, (candidates = ∅), remains forever false. As, after τ = max(τ0 , τ1 ), the faulty processes remain forever at the tail of alivei (see Fig. 4), it follows that the pair (x , y ) that is selected at lines 15–17 to define the current value of sigmai , is such that the processes that belong to queuei [x ][1..y ] are not “globally after” the processes in queuei [x][1..y] with respect to alivei . This means that the y -th process of queuei [x ] (q y in Fig. 4) is, in the list alivei , before the y-th process of queuei [x] (q y in Fig. 4). As all processes in queuei [x][1..y] are correct, it follows from the structure of alivei , that queuei [x ][1..y ] includes only correct processes. Hence, there is a time instant after which sigmai always contains correct processes, which concludes the proof of the liveness property of Σ. 4.2 Reductions in the anonymous model AAS The impossibility to build a failure detector D1 from a failure detector D2 in AS remains obviously true in AAS (this is because, when looking to an anonymous system, anonymity does provide processes with additional information). Formally, as stated in Sect. 4 (under its contrapositive form), for
Anonymous asynchronous systems
any two failure detectors D1 and D2 we have: (D1 AS D2) ⇒ (D1 AAS D2). It follows that we only need to check if the reductions that exist in a non-anonymous system still exist in its anonymous counterpart. 4.2.1 A list of reductions Most of the reductions in the standard model do not hold in the anonymous model. The following proofs are often simple and thus briefly explained in the following. As an example, the reduction between A P and AΩ is proved in details. In AAS: – P and P are incomparable. There is no reduction from one to the other since there is no way for processes, in the anonymous model, to discover indexes of alive (resp. faulty) processes when their failure detectors provide them only with indexes of faulty (resp. alive) processes. – A P and A P are equivalent. The reductions proposed in Sect. 4.1 for non-anonymous model remain valid since they do not use indexes. It is sufficient to subtract from n the output of one of these detector to emulate the output of the other. – Ω and AΩ are incomparable. On the one side, there exists no algorithm that can build Ω in AAS[AΩ] since it is not possible to associate identities with processes. On the other side, there exists no algorithm that builds AΩ in AAS[Ω] since the processes being anonymous, none of them can discover it is the eventual leader. Σ and AΣ cannot be compared for the same reasons. – A P (resp. A P) is stronger than AΣ. Since n is known, the reduction consists in permanently outputting the pair (0, n − ancpi ) (resp. the pair (0, anapi )) where ancpi (resp. anapi ) is pi ’s local output of A P (resp. A P). – P is not stronger than Σ and Ω. Indeed there is no way for processes, in the anonymous model, to discover indexes of correct processes when their failure detectors provide them only with indexes of faulty processes. – Due to the absence of identities, A P (resp. A P) are not stronger than P, P, Σ, and Ω. – P is stronger than A P (take the cardinality of the output set) and thus also stronger than AΣ by transitivity since AΣ ≺AAS A P and A P ≺AAS P.
(a)
151
– P is stronger than Σ (take the output of P), stronger than Ω (take the smallest identity output by P), stronger than A P (take the cardinal of the output of P), and thus also stronger than AΣ by transitivity since AΣ ≺AAS A P and A P ≺AAS P. – P (resp. P) and AΩ cannot be compared. Indeed, as the system is anonymous, there is no way for the processes to break asymmetry and elect a leader. 4.2.2 A P and AΩ are incomparable in AAS Theorem 4 It is impossible to construct A P in AAS[AΩ], and it is impossible to construct AΩ in AAS[A P]. Proof From AP to A : impossibility. Let us remember that all the processes execute the same code. Whatever the code they execute, there is a run in which all the processes are correct and they all proceed at the same speed and read exactly the same value from their failure detector variable ancpi . In such a run, there is no way to break the symmetry in order to distinguish a process from the other processes. It follows that AΩ cannot be built. From A to AP : impossibility. The proof is by contradiction. Let us suppose that there is an algorithm T that builds A P in AAS[AΩ]. By construction T does not rely on the process identities. Moreover, in a nonanonymous system, it is possible to transform Ω into AΩ (algorithm T ), and it is also possible to transform A P into P (algorithm T ). It is then possible to build P in AS[Ω] as follows. (All algorithms are executed in the non-anonymous model AS.) – First, use T to transform the failure detector Ω into the failure detector AΩ, – Then, use T to transform AΩ into A P, – Finally, use T to transform A P into P. This construction contradicts the fact that it is impossible to build P in AS[Ω]. It follows that T cannot exist. 4.3 Hierarchy and implementability of failure detectors The reductions among the failure detectors that have been previously described are summarized in Fig. 5. The reduc-
(b)
Fig. 5 Hierarchy of failure detectors. a in AS , b in AAS
123
152
tions in non-anonymous systems are described on the left, while the reductions for anonymous systems are described on the right. An arrow from D1 to D2 means that D2 is weaker than D1 in the corresponding system model. Differently, the absence of an arrow from D1 to D2 means that D2 cannot be built from D1. Failure detectors are introduced to capture the additional power required to solve a problem that is otherwise unsolvable in the considered system. While a (non-trivial) failure detector cannot be implemented in a pure asynchronous system, it is interesting to investigate if it can be implemented in a synchronous system. When such an implementation does exist, the failure detector is realistic [12]. Considering Fig. 5, a square indicates that the associated failure detector can be implemented in the corresponding synchronous system, while an ellipses denotes it cannot. As P can be easily implemented in a non-anonymous synchronous system, by reduction all the proposed failure detectors are realistic in AS. As far anonymous synchronous systems are concerned we have the following. – As there is no notion of process identity, P, P, Σ and Ω cannot be implemented in an anonymous synchronous system. – A P, A P and AΣ can be implemented in an anonymous synchronous system. At every round r , any alive process broadcasts a heartbeat message and counts the number h r of heartbeats received during that round. This number defines the current output of A P, the integer n−h r defines the current output of A P and the pair (0, h r ) defines the current output of AΣ. – AΩ cannot be implemented in an anonymous synchronous system. Indeed even if the system is synchronous there is no deterministic solution for processes to break the symmetry between them. The important point is that AΩ is not a realistic failure detector in an anonymous system. Despite this fact, AΩ remains important from a theoretical point of view as it is relevant in the search for the “weakest failure detector” for anonymous consensus (see Sect. 7). 5 An AΣ-based communication abstraction This section defines and presents an implementation of a communication abstraction that allows the upper layer to benefit from AΣ while hiding the communication pattern required to benefit from it. This abstraction, which will be used in Sect. 6 to obtain a simple consensus algorithm based on the pair of failure detectors AΣ, AΩ, could also be used to solve other problems addressed in the context of anonymous asynchronous systems enriched with AΣ.
123
F. Bonnet, M. Raynal
5.1 Definition of an AΣ-based communication abstraction This abstraction provides the processes with an operation denoted a_sigma_broadcast(). This operation has two input parameters tag and est, and outputs a set of values s_r ec. The parameter est denotes the value that the invoking process wants to broadcast. The parameter tag is the identity of the corresponding invocations of a_sigma_broadcast(). This operation is a one-shot operation which means that, given a value of tag, it is assumed that a process invokes at most once a_sigma_broadcast(tag, −). Hence, a given value tag identifies the set (of at most n) strongly related invocations of a_sigma_broadcast(tag, −). Definition The behavior of the invocations a_sigma_ broadcast(tag, −) is defined by the following properties. – Termination. Let us assume that all the correct processes invoke a_sigma_broadcast(tag, −). Any of these invocations terminates. – Validity. Let SB[tag] be the set of values broadcast by the invocations a_sigma_broadcast(tag, −) and s_r eci be the set of values returned by the invocation issued by pi . Then s_r eci ⊆ SB[tag]. – Intersection. For any two processes pi and p j , whose invocations of a_sigma_broadcast(tag, −) terminate, let s_r eci and s_r ec j be the sets of values returned to pi and p j respectively. Then s_r eci ∩ s_r ec j = ∅. 5.2 An algorithm implementing a_sigma_broadcast() Description of the implementation An implementation of a_sigma_broadcast(tag, est) is described in Fig. 6. As previously indicated, the parameter tag identifies a set of invocations while est denotes the value that the invoking process wants to broadcast. This implementation uses sequence numbers locally denoted sn i at process pi . The invoking process pi first reads the current value of a_sigmai to extract the set of labels labelsi . It then broadcasts the message msg(tag, sn i , labelsi , est) (line 1) in which sn i denotes the current sequence number. Then pi enters a repeat loop which encapsulates a communication pattern (lines 2–12). Each time it enters the loop, pi checks if an early termination is possible. This happens when pi has received a message early(tag, s_r ec) (line 3). If it is the case, pi forwards this early termination message and returns the corresponding set s_r ec to pi . If there is no early termination, pi checks the predicate of line 4 and returns a set s_r eci if this predicate is satisfied (lines 5–6).
Anonymous asynchronous systems
153
Fig. 6 The AΣ-based communication operation a_sigma_broadcast() (code for pi )
This predicate states that there is currently a pair (x, y) in a_sigmai such that pi has received “enough” (namely y) message msg(tag, sn, labels j , −) and – all these message carry the same sub-round number sn (which can be different from sn i ), and – x ∈ label j for the field label j of each of these messages. The set returned by pi at line 6 contains the estimate values carried by these y messages. Moreover, before returning this set, pi broadcast a message early(tag, s_r eci ) to allow other processes to benefit from its termination. If the predicate of line 4 is not satisfied, pi checks (line 7) if the set of labels (as defined by a_sigmai ) has changed or if it is late with respect to its current sequence number sn i . If it is the case, pi progresses to the next sequence number, recomputes the current set of labels labeli (line 8) and broadcasts a new message msg(tag, sn i , labelsi , est) (line 9). Finally, whether the predicate of line 7 is satisfied or not, pi enters again the repeat loop. Theorem 5 The algorithm described in Fig. 6 is an implementation of the AΣ-based communication abstraction. Proof Proof of the termination property. Assuming that all the correct processes invoke the operation a_sigma_broadcast() with the same parameter tag, we have to show that they all terminate their invocation, i.e., no correct process loops forever in the repeat loop (lines 02–12). If a process loops forever, it follows from line 3 that, no correct process has sent an early(tag, −) message which means that all correct processes are looping forever in their invocations of a_sigma_broadcast(tag, −). It follows from the liveness property of AΣ that there is a finite time τ after which, for each correct process pi , there is a pair (x, y) ∈ a_sigmai such that |S(x)∩corr ect (F)| ≥ y. Due to (a) the definition of S(x) = { j | ∃τ ∈ N, (x, −) ∈ a_sigma τj }, (b) the fact that, after a finite time, S(x) contains at least y correct processes, and (c) the repeated broadcast by the correct processes of msg(tag, −, −, −) messages with
increasing sequence numbers, it follows that there is necessarily a sequence sn during which pi receives y messages msg(tag, sn, labelsj , −) with x ∈ labelsj , and the predicate of line 4 is then satisfied, from which the termination property follows. Proof of the validity property. This proof follows directly from the observation that any message msg(tag, −, −, est) broadcast by pi (at lines 1 or 10) is such that est is the value of the input parameter of its invocation of a_sigma_broadcast(tag, −). Proof of the intersection property. Considering the invocations a_sigma_broadcast(tag, −), let s_r eci be the set returned by a process pi at line 6. This means that the predicate of line 4 is satisfied, i.e., ∃ (x1 , y1 ) ∈ a_sigmai such that the process pi has received y1 messages msg(tag, sn1, labelsk , −) carrying the same sequence number sn1 and such that x1 ∈ labelsk (each message carrying its own value labelsk ). Let T1 ⊆ S(x1 ) be the set of processes that have sent these y1 messages, hence |T1 | = y1 . In a similar way, let p j be a process whose invocation a_sigma_broadcast(tag, −) returns the set s_r ec j at line 6. Hence, the predicate of line 4 is satisfied, i.e., there is pair (x2 , y2 ) ∈ a_sigma j such that p j has received y2 messages msg(tag, sn2, labelsk , −) carrying the same sequence number sn2 and such that x2 ∈ labelsk (each message carrying its own value labelsk ). Let T2 ⊆ S(x2 ) be the set of processes that have sent these y2 messages, hence |T2 | = y2 . It follows from the safety of AΣ that T1 ∩ T2 = ∅. Hence, there is a process p , such that ∈ T1 ∩ T2 , that has sent the message msg(tag, sn1, labels , v) to pi (and we have v ∈ s_r eci ) and the message msg(tag, sn2, labels , v ) to p j (and we have v ∈ s_r ec j ). But p does not change its estimate value est while executing the repeat loop from which follows that v = v = est. Consequently, est ∈ s_r eci and est ∈ s_r ec j , which concludes the proof of the theorem. 6 A consensus algorithm for AAS[AΣ, AΩ] Why (AΣ, AΩ)? As indicated in the Introduction, the fact that (1) (Σ, Ω) is the weakest failure detector to solve
123
154
F. Bonnet, M. Raynal
consensus in a non-anonymous system, (2) (Σ, Ω) are useless in anonymous systems, and (3) (AΣ, AΩ) is the identity-free counterpart of (Σ, Ω) was one of our motivations for designing an (AΣ, AΩ)-based uniform anonymous consensus algorithm. 6.1 The consensus problem The consensus problem has been defined in the Introduction. Each process proposes a value and (at least) the correct processes have to decide a value. The problem can be defined more precisely by the following properties (which means that, to be correct, any run of any algorithm that pretends to solve consensus has to satisfy these properties). – – – –
Termination. Every correct process decides on a value. Integrity. A process decides at most once. Validity. A decided value is a proposed value. Agreement. No two processes decide on different values.
6.2 Notation A process pi uses 5 local variables denoted ri , est1i , s_r ec1i , est2i and s_r ec2i (their meaning is described below). Each of them is written exactly once during a round but est1i which can be written 0, 1 or two times during a round. Let lvari be any of s_r ec1i , est2i , or s_r ec2i . The notation lvari [r ] is used to denote the value of lvari after it has been assigned by pi during round r . As each of the concerned local variables is assigned exactly once during a round, there is no ambiguity. 6.3 Description of the (AΣ, AΩ)-based consensus algorithm This algorithm, which works in strongly anonymous systems (i.e., systems where n is not known by the processes) borrows its “three-phase per round” structure from the (non-anonymous) consensus algorithm presented in [26]. Differently from that algorithm, the message exchange patterns used inside the second and third phases are encapsulated inside the previously defined a_sigma_broadcast() communication abstraction. The algorithm is described in Fig. 7. As already indicated, it is round-based: each process executes a sequence of asynchronous rounds until it decides. A process pi invokes the operation propose(vi ) (where vi is the value it proposes). It decides when it executes the statement return(v) (line 11 or 17, where v is the value it decides). As in other nonanonymous consensus algorithms, when a process decides it stops participating in the consensus algorithm. Consequently,
123
Fig. 7 A Consensus algorithm for AAS [AΣ, AΩ] (code for pi )
before deciding, a process pi broadcasts a message decide(v) in order to prevent the other processes from blocking forever waiting for a message that pi will never send. The three main local variables associated with a round are ri (the local round number), and a pair of estimates of the decision value denoted est1i and est2i . The variable est1i contains pi ’s current estimate of the decision value when a new round starts while est2i , whose value is computed during every round, contains either a new estimate of the decision value or a default value ⊥. In a round r , a process pi executes three phases, denoted phase 0, 1, and 2 which are as follows. The first phase of a round is the only one where AΩ is used, while AΣ is used only in the two other phases through invocations of the communication abstraction a_sigma_broadcast(). The two instances of a_sigma_broadcast() invoked during a round r use unambiguously the tag values 2 × r and 2 × r + 1, respectively. – First phase of round r . In the first phase of a round, a process pi that considers itself as the leader broadcasts a message phase0(ri , v). If a_leaderi is false, pi waits for a message phase0(r, v), adopts v as its new estimate and forwards phase0(ri , v) to all (this forwarding is used to prevent other processes from blocking forever in that phase of round r ). – Second phase of round r . Similarly to [25], the aim of the second phase of a round r is to assign a value to the variables est2i in such a way that the following round property denoted P(r ) is always satisfied:
(est2i [r ] = ⊥) ∧ (est2 j [r ] = ⊥) ⇒ est2i [r ] = est2 j [r ] .
Anonymous asynchronous systems
To attain this goal in an anonymous system, each process invokes a_sigma_broadcast(2r, est1i ) and saves its result in s_r ec1i . If all the elements of the set s_r ec1i are equal to the same value v, est2i is set to that value v, otherwise est2i is set to the default value ⊥. Intuitively, the previous property P(r ) is a consequence of the intersection property of a_sigma_broadcast (2r, −) which states that any two sets s_r ec1i [r ] and s_r ec1 j [r ] have a non-empty intersection. – Third phase of round r . The aim of this phase is to allow a process to decide when it discovers that “enough” processes have the same non-⊥ value v in their estimates est2i . To that end, each process pi invokes (line 10) a_sigma_broadcast(2r + 1, est2i ) and saves into s_r ec2i the set that is returned. Let us observe that, due to the property P(r ), the estimate values est2i that are sent are equal either to the same value v = ⊥ or ⊥. As s_r ec2i is the set of values received by a process pi , we have then s_r ec2i [r ] = {v}, s_r ec2i [r ] = {v, ⊥} or s_r ec2i [r ] = {⊥}. As we will see in the proof, the intersection property of the invocations a_sigma_broadcast(2r + 1, −) ensures that s_r ec2i [r ] = {v} and s_r ec2 j [r ] = {⊥} are mutually exclusive. The behavior of pi is then governed by the value of s_r ec2i . If s_r ec2i = {v}, pi decides v after having broadcast a deadlock-prevention message indicating that it is about to decide v (line 11). If s_r ec2i = {v, ⊥}, pi adopts v as the new estimate value of est1i and starts the next round (line 12). Finally, if s_r ec2i = {⊥}, pi proceeds to the next round without modifying its estimate est1i (line 13).
6.4 Proof of correctness of the consensus algorithm Lemma 1 If no process decides, the correct processes execute an infinite number of rounds. Proof The proof is by contradiction. Assuming that no process decides and all correct processes block forever, let r be the smallest round number at which a process blocks forever and let pi be such a correct process. It can be blocked in the wait until statement in phase 0 (line 5), during the invocation of a_sigma_broadcast(2r, −) (line 8) or the invocation of a_sigma_broadcast(2r + 1, −) (line 10). Phase 0. If pi is the eventual leader it cannot block forever in phase 0. So, let p , = i, be the eventual leader. Process p cannot be blocked forever at phase 0 of round r either because a_leader becomes eventually true, or because p receives a message phase0(r, −). Whatever the case, p broadcasts phase0(r, −), and eventually pi receives it. Hence, no correct process can block forever in phase 0 of round r .
155
Phase 1. As r is the first round during which a correct process blocks forever and no correct process blocks forever at line 8, let us assume that a correct process blocks forever at line 8. It follows that all the correct processes eventually enter round r and invoke a_sigma_broadcast(2r, −). Consequently, the assumption required by the termination property of a_sigma_broadcast() is satisfied. It then follows from that property that all the correct processes terminate their invocation of a_sigma_broadcast(2r, −) and no correct process blocks forever at line 8. Phase 2. The proof that no correct process can block at line 10 is the same as the previous one (after having replaced the tag value 2r by 2r + 1). Lemma 2 Every correct process eventually decides. Proof Before deciding at line 11, a process broadcasts a message decide(−). Hence, if a process decides, all correct processes decide. Hence, let us assume, by contradiction, that no process decides. Due to the definition of AΩ, there is a time τ0 from which there is exactly one correct process (say p ) whose Boolean variable a_leader remains forever true, while all other a_leaderi Boolean variables remain forever false. Let τ1 be a time after which all faulty processes have crashed. Finally, let τ ≥ max(τ0 , τ1 ). Due to Lemma 1, this means that there is a round r , entered by the correct processes after τ , from which p is the only process such that a_leader = true. Process p is consequently the only process to broadcast phase0(r, v) with v = est1 (line 7) and each correct process receives this message (either directly from p or after forwarding by another process). The important point here is that each correct process pi is such that est1i = v after it has executed line 6. Hence, they all broadcast phase0(r, v) at line 7. It follows that each correct process executes round r and invokes a_sigma_broadcast(2r, v) (line 8). Due to Lemma 1 no correct process remains blocked at line 8. It then follows from the validity property of a_sigma_broadcast (2r, −) that each correct process pi obtains s_r ec1i = {v} and invokes a_sigma_broadcast(2r + 1, v). Finally, using the same arguments as before, each correct process pi obtains s_r ec2i = {v}, executes line 11 and decides contradicting the initial assumption and thereby completing the proof of the consensus termination property. Lemma 3 ∀ r : (est2i [r ] = ⊥) ∧ (est2 j [r ] = ⊥) ⇒ (est2i [r ] = est2 j [r ]). Proof Let us assume that est2i [r ] = v, i.e., s_r ec1i [r ] = {v}. If follows from the intersection property of the invocations a_sigma_broadcast(2r, −) that, for any p j , we have v ∈ s_r ec1 j [r ]. It then follows from line 9 that est2 j [r ] = v if s_r ec1 j [r ] = {v} and est2 j [r ] = ⊥ if s_r ec1 j [r ] = {v}.
123
156
Lemma 4 No two processes decide on different values. Proof If no process decides or a single process decides at line 11, the consensus agreement property is trivially satisfied. Moreover, if a process decides at line 17, it decides a value that has been sent at line 11. Hence, let us consider that two processes decide at line 11 and let r be the first round at which a process decides. Let pi be a process that decides at that round, and v the value it decides. Let p j be another process that decides at round r ≥ r . We consider two cases. Case 1: p j decides at round r = r . As pi decides v during round r , we have s_r ec2i [r ] = {v} (line 11). It follows from the intersection property of a_sigma_broadcast(2r +1, −) that s_r ec2i [r ] ∩ s_r ec2 j [r ] = ∅ from which we conclude that v ∈ s_r ec2 j [r ]. As p j decides, s_r ec2 j [r ] contains a single value which (due to the previous argument) is necessarily v and the consensus agreement property follows. Case 2: p j decides at round r > r . Hence, p j proceeds from r to r + 1. In that case, during round r , we have s_r ec2 j [r ] = {v}. As s_r ec2i [r ] = {v} and s_r ec2i [r ] ∩ s_r ec2 j [r ] = ∅, we necessarily have v ∈ s_r ec2 j [r ]. Moreover, it follows from Lemma 3, line 9 and the validity property of a_sigma_broadcast(2r +1, −) that only v and ⊥ have been broadcast by the invocations of a_sigma_broadcast(2r + 1, −) at line 10. Hence, we have s_r ec2 j [r ] = {v, ⊥}. Consequently, any process p j that proceeds to the round r + 1 executes line 12 and we have est1 j = v when it starts the round r +1. Said another way, v is the only estimate value present in round r +1, from which follows that no other value can be decided at line 11 in a round r > r , which completes the proof of the consensus agreement property. Theorem 6 The algorithm described in Fig. 7 solves the consensus problem in AAS[AΣ, AΩ]. Proof The consensus integrity property follows directly from the observation that a process stops participating in the algorithm as soon as it has invoked the statement return(). The consensus validity property (a decided value is a proposed value) follows directly from the validity property of a_sigma_omega() and the following observations O1 and O2. All the est1i variables are initialized to proposed values (O1). A decided value is a non-⊥ value of an est2i local variable, which has been assigned the value of an est1 j variable (O2). The proof of consensus agreement follows from Lemma 4. The proof of consensus termination follows from Lemma 2. 7 On the weakest FD for anonymous consensus Failure detector-based consensus algorithms The previous section has presented an (AΣ, AΩ)-based anonymous
123
F. Bonnet, M. Raynal
consensus algorithm. A (non-uniform) A P-based consensus algorithm has been presented in [6]. A natural question is then: “which of (AΣ, AΩ) and A P is the weakest to solve consensus in anonymous systems ?” Unfortunately (AΣ, AΩ) and A P cannot be compared in AAS (the proof is similar to the ones that appear in Sect. 4.1). Notion of weakest failure detector for a given problem [9] Given a problem P and a failure detector D, D is the weakest failure detector for P in X X [∅] (where X X stands for AS or AAS) if (a) there is an algorithm that solves P in X X [D], and (b) for any failure detector D such that P can be solved in X X [D ], we have D D . It is shown in [20] that, in AS[∅], any problem has a weakest failure detector. New failure detectors Given two failure detectors D1 and D2, let us define a new failure detector D1 ⊕ D2 as follows. During an arbitrary but finite period of time, D1 ⊕ D2 outputs ⊥ at every process, and then behaves either as D1 or as D2 at all processes. Let us observe that, if D1 and D2 cannot be compared, D1 (resp.,D2) is strictly stronger that D1 ⊕ D2. This is because D1 ⊕ D2 can trivially be built in X X [D1] (resp., X X [D2]), while D1 (resp.,D2) cannot be built in X X [D1 ⊕ D2]. Weakest failure detector for consensus in AAS Whereas (Σ, Ω) is the weakest failure detector for consensus in nonanonymous systems, (AΣ, AΩ) is not the weakest failure detector for anonymous consensus, due to the absence of reduction between A P and AΩ. Hence, let us introduce the new failure detector (AΣ, AΩ) ⊕ A P which is strictly weaker than both (AΣ, AΩ) and A P. Interestingly, there is a simple algorithm that solves anonymous consensus in the system AAS[(AΣ, AΩ) ⊕ A P]. This algorithm is as follows. Each process pi waits until the output of (AΣ, AΩ) ⊕ A P is different from ⊥. Then, according to the actual output of the failure detector (that is non-deterministic), it executes either the (AΣ, AΩ)-based algorithm presented in Sect. 6 or the A P-based algorithm described in [6]. A question Is (AΣ, AΩ)⊕ A P the weakest failure detector for solving anonymous consensus. This question is motivated by that in a non-anonymous system we have the observation (Σ, Ω) ⊕ P AS (Σ, Ω) which is the weakest failure detector for consensus.
8 Conclusion This paper was on failure detectors in anonymous systems. It has presented three main contributions. The first is the introduction of AΣ the identity-free quorum failure detector. The paper has shown that it is the anonymous counterpart of Σ the quorum failure detector (which means that they
Anonymous asynchronous systems
are equivalent in non-anonymous systems). The paper has also investigated the identity-free perfect detector A P and presented a quiescent bounded construction that builds the failure detector P in non-anonymous asynchronous systems enriched with A P. The paper has also introduced a new communication abstraction that provides processes with a quorum-like intersection property on the sets of delivered messages. This abstraction, which favors modularity, hides an AΣ-based communication pattern which involves a finite number of internal asynchronous communication phases. The paper has then presented and proved a consensus algorithm for strongly anonymous systems enriched with the failure detector AΩ (the identity-free eventual leader failure detector) and the failure detector AΣ. The consensus algorithm that is obtained is not trivial and does not require the knowledge of total number of participating processes. Its design relies on the new communication abstraction introduced in the paper. Finally, the paper has discussed the hierarchy notion of failure detectors in both anonymous and non-anonymous systems. It has also discussed the notion of “weakest failure detector” for anonymous consensus, and shown that, differently from their non-anonymous counterparts P and (Ω, Σ), the identity-free failure detectors A P and (AΩ, AΣ) cannot be compared. Acknowledgments We would like to thank the anonymous referees whose suggestions helped us to improve both the presentation and the content of the paper. F. Bonnet was partially supported by the JSPS Postdoctoral Fellowship for Foreign Researchers and by MEXT KAKENHI (No.22-00720). M. Raynal was partially supported by the French ANR project DISPLEXITY devoted to computability and complexity of distributed computing. A preliminary version of this paper has been presented at the 24th International Symposium on Distributed Computing (DISC 2010) (pages 206–220 of volume 6343 of Springer-Verlag LNCS).
References 1. Angluin, D.: Local and global properties in networks of processes. In: Proc. 12th Symposium on Theory of Computing (STOC’80). ACM Press, pp. 82–93 (1980) 2. Angluin, D., Aspnes, J., Diamadi, Z., Fischer, M.J., Peralta, R.: Computation in networks of passively mobile finite-state sensors. Distrib. Comput. 18(4), 235–253 (2006) 3. Attiya, H., Snir, M., Warmuth, M.K.: Computing on an anonymous ring. J. ACM. 35(4), 845–875 (1988) 4. Attiya, H. Welch, J.: Distributed computing, fundamentals, simulation and advanced topics, 2nd edn. Wiley Series on Parallel and Distributed Computing, p. 414 (2004) 5. Biely, M., Robinson, P., Schmid, U.: Weak synchrony models and failure detectors for message-passing (k) set agreement. In: Proc. 13th Int’l Conference on Principles of Distributed Systems (OPODIS’09), Springer-Verlag LNCS #5923, pp. 285–299 (2009) 6. Bonnet, F., Raynal, M.: The price of anonymity: optimal consensus despite asynchrony, crash and anonymity. ACM Trans. Auton. Adapt. Syst. (TAAS). 6(4):Article 23 (2011)
157 7. Bonnet, F., Raynal, M.: A simple proof of the necessity of the failure detector Σ to implement an atomic register in asynchronous message-passing systems. Inf. Process. Lett. 110(4), 153–157 (2010) 8. Bonnet, F., Raynal, M.: Anonymous asynchronous systems: the case of failure detectors. Tech Report PI 1945, IRISA, Rennes (2010) 9. Chandra, T., Hadzilacos, V., Toueg, S.: The weakest failure detector for solving consensus. J. ACM. 43(4), 685–722 (1996) 10. Chandra, T., Toueg, S.: Unreliable failure detectors for reliable distributed systems. J. ACM. 43(2), 225–267 (1996) 11. Chothia, T., Chatzikokolakis, K.: A survey of anonymous peerto-peer file-sharing. In: Proc. Satellite workshop of the Int’l Conference on Embedded and Ubiquitous Systems (EUS’05), pp. 744–755 (2005) 12. Delporte-Gallet, C., Fauconnier, H., Guerraoui, R.: A realistic look at failure detectors. In: Proc. Int’l Conference International on Dependable Systems and Networks (DSN’02). IEEE Computer Press, pp. 345–353 (2002) 13. Delporte-Gallet, C., Fauconnier, H., Guerraoui R.: Tight failure detectors bounds on atomic object implementations. J. ACM. 57(4) (2010). http://dx.doi.org/10.1145/1734213.1734216 14. Delporte-Gallet, C., Fauconnier, H., Guerraoui, R., Hadzilacos, V., Kouznetsov, P., Toueg, S.: The weakest failure detectors to solve certain fundamental problems in distributed computing. In: Proc. 23th ACM Symposium on Principles of Distributed Computing (PODC’04) ACM Press, pp. 338–346 (2004) 15. Delporte-Gallet, C., Fauconnier, H., Guerraoui, R., Tielmann, A.: The weakest failure detector for message passing set-agreement. In: Proc. 22th Int’l Symposium on Distributed Computing (DISC’08), Springer-Verlag LNCS #5218, pp. 109–120 (2008) 16. Durresi, A., Paruchuri, V., Durresi, M., Barolli, L.: A hierarchical anonymous communication protocol for sensor networks. In: Proc. Int’l Conference on Embedded and Ubiquitous Systems (EUS’05), Springer Verlag LNCS #3824, pp. 1123–1132 (2005) 17. Fischer, M.J., Lynch, N.A., Paterson, M.S.: Impossibility of distributed consensus with one faulty process. J. ACM. 32(2), 374– 382 (1985) 18. Gifford, D.K.: Weighted voting for replicated data. In: Proc. 7th ACM Symposium on Operating System Principles (SOSP’79), ACM Press, pp. 150–172 (1979) 19. Guerraoui, R., Raynal, M.: The alpha of indulgent consensus. Comput. J. 50(1), 53–67 (2007) 20. Jayanti, P., Toueg, S.: Every problem has a weakest failure detector. In: Proc. 27th ACM Symposium on Principles of Distributed Computing (PODC’08), pp. 75–84 (2008) 21. Lakshman, T.V., Wei, V.K.: Distributed computing on regular networks with anonymous nodes. IEEE Trans. Comput. 43(2), 211– 218 (1994) 22. Lynch, N.A.: Distributed Algorithms. Morgan Kaufmann Pub., San Francisco, CA, p. 872 (1996) 23. Mostefaoui, A., Rajsbaum, S., Raynal, M., Travers, C.: On the computability power and the robustness of set agreement-oriented failure detector classes. Distrib. Comput. 21(3), 201–222 (2008) 24. Mostefaoui, A., Rajsbaum, S., Raynal, M., Travers, C.: The combined power of conditions and information on failures to solve asynchronous set agreement. SIAM J. Comput. 38(4), 1574–1601 (2008). http://dx.doi.org/10.1137/050645580 25. Mostefaoui, A., Raynal, M.: Solving consensus using Chandra–Toueg’s unreliable failure detectors: a general quorum-based approach. In: Proc. 13th Int’l Symposium on Distributed Computing (DISC’99), Springer-Verlag LNCS #1693, pp. 49–63 (1999) 26. Mostefaoui, A., Raynal, M.: Leader-based consensus. Parallel Process. Lett. 11(1), 95–107 (2001) 27. Raynal, M.: Communication and Agreement Abstractions for Fault-Tolerant Asynchronous Distributed Systems. Morgan &
123
158 Claypool Publishers, San Francisco, CA, p. 251, ISBN 978-160845-293-4 (2010) 28. Yamashita, M., Kameda, T.: Computing on anonymous networks: part I-characterizing the solvable cases. IEEE Trans. Parallel Distrib. Syst. 7(1), 69–89 (1996)
123
F. Bonnet, M. Raynal 29. Yamashita, M., Kameda, T.: Computing on anonymous networks: part II-decision and membership problems. IEEE Trans. Parallel Distrib. Syst. 7(1), 90–96 (1996)