Theory Comput Syst (2011) 48: 343–373 DOI 10.1007/s00224-009-9247-x
A Hierarchy of Monotone Deterministic Non-Forgetting Restarting Automata Hartmut Messerschmidt · Friedrich Otto
Published online: 21 November 2009 © Springer Science+Business Media, LLC 2009
Abstract The non-forgetting restarting automaton is an extension of the restarting automaton that is obtained by combining the restart operation with a change of the internal state just like the other operations. Thus, when executing a restart operation a non-forgetting restarting automaton is not forced to reset its internal state to its initial state. We analyze the expressive power of the deterministic variants of this model. As our main result we establish a hierarchy of language classes that are characterized by various types of monotone deterministic non-forgetting restarting automata. This hierarchy ranges from the deterministic context-free languages to the so-called leftto-right regular languages. In particular, we show that for monotone deterministic non-forgetting restarting automata, the RRWW-model is strictly more powerful than the RWW-model. This is the first time that for a particular type of restarting automaton the expressive power of the RRWW-variant is separated from the expressive power of the corresponding RWW-variant. Keywords Restarting automata · Nonforgetting · Hierarchy of language classes 1 Introduction The restarting automaton was introduced by Janˇcar et al. [6] to model the analysis by reduction, which is a technique used in linguistics to analyze sentences of natural Some of the results of this paper have been announced at CSR 2006 at St. Petersburg. An extended abstract appeared in the proceedings of that conference [18]. H. Messerschmidt AG Künstliche Intelligenz, Technologie-Zentrum Informatik, Universität Bremen, 28357 Bremen, Germany e-mail:
[email protected] F. Otto () Fachbereich Elektrotechnik/Informatik, Universität Kassel, 34109 Kassel, Germany e-mail:
[email protected]
344
Theory Comput Syst (2011) 48: 343–373
languages. According to this technique a sentence is processed by performing local simplifications, in each step preserving the syntactic correctness or incorrectness of the sentence. In Czech and German linguistics already several programs exploit the basic ideas of restarting automata [15, 24]. A two-way restarting automaton, RLWW-automaton for short, consists of a finitestate control, a finite tape alphabet that consists of input symbols and auxiliary symbols, a flexible tape with endmarkers, and a read/write window of a fixed finite size. Its computations proceed in cycles. Starting with the finite-state control in the initial state and with the read/write window at the left endmarker, an RLWW-automaton moves its window along the tape by performing move-right and move-left steps until it reaches a position at which a rewrite step is applicable. Such a rewrite step performs a local simplification of the tape contents by replacing the string currently inside the window by a string that is strictly shorter than the original string. At the same time the (flexible) tape is shortened appropriately. Thereafter the window is shifted again along the tape by move-right and move-left steps until finally a restart step is executed, which repositions the window to the left endmarker and resets the finite-state control to the initial state, in this way completing the current cycle. The part of a computation that follows after the last application of a restart step is called the tail of the computation. It either ends with an accept instruction, which means that the current computation is successful, or with a reject, which means that the current computation fails. An input word w is accepted by a restarting automaton M, if M has an accepting computation that begins with tape contents w. By L(M) we denote the language consisting of all input words that are accepted by M. It is called the input language of M. Since 1995 many variants of the restarting automaton have been introduced and studied in the literature, and it has been shown that many well studied language classes can be characterized by variants of the restarting automaton (see, e.g., [25, 27]). For example, the class DCFL of deterministic context-free languages is characterized by several types of deterministic one-way restarting automata that are monotone, the class CFL of context-free languages is characterized by nondeterministic monotone restarting automata with auxiliary symbols, and the class CRL of Church-Rosser languages of McNaughton et al. [16, 21] is characterized by deterministic one-way restarting automata with auxiliary symbols. Obviously, the restarting automaton can be seen as a restricted type of linearbounded automaton. Hence, the input language L(M) of a restarting automaton M is necessarily context-sensitive. In fact, the restarting automaton is more restricted than a linear-bounded automaton in two ways: – each rewrite operation of a restarting automaton strictly reduces the length of the tape (inscription), and – between any two rewrite operations a restarting automaton must execute a restart step, which has two effects: (a) the automaton forgets the place of its latest rewrite operation, as its window is moved back to the left end of its tape, and (b) the automaton forgets that it has at all done some rewrite steps, as its internal state is reset to its initial state.
Theory Comput Syst (2011) 48: 343–373
345
Thus, a restarting automaton must encode this information in the tape inscription, but it can do so only in a very restricted way, as it can only perform a single rewrite operation between any two restart steps, and, in addition, this rewrite operation is length-reducing. In [17, 18] the current authors have initiated an investigation of the influence of the second effect of the restart operation listed above on the expressive power of the restarting automaton. For this they introduced a more general model, the so-called non-forgetting restarting automaton. As the standard model it is required to perform exactly one rewrite operation between any two restart steps, but the restart operation is more powerful than in the standard model as it involves a change of the internal state just like any of the other operations. Hence, when executing a restart step, such an automaton does not necessarily forget the information about the current computation that it has collected internally in its finite-state control; it just forgets the position at which the latest rewrite operation was performed. How much does this influence the expressive power of the various types of restarting automata? In the current paper we restrict our attention to non-forgetting restarting automata that are deterministic. In fact, we study several variants of these restarting automata, denoted by class names like R, RW, RWW, RR, RRW, and RRWW, to name a few (See Sect. 2 for the definitions of these and some other classes of restarting automata). On the expressive power of various types of these restarting automata we present three main results, of which the first two have already been announced at CSR 2006 (see [18]). First of all we show that already deterministic non-forgetting R-automata, that is, one-way restarting automata that, when executing a rewrite step, can only delete symbols, and that restart immediately after executing a rewrite step, are quite expressive. In fact, from a single-tape Turing machine T one can construct a deterministic non-forgetting R-automaton M for (a simple encoding of) the language of valid computations of T . Accordingly the emptiness problem for the language L(T ) accepted by T reduces to the emptiness problem for the language L(M). Hence, already for deterministic non-forgetting R-automata, the emptiness problem for input languages is undecidable, in contrast to the situation for standard (that is, forgetting) R-automata. Secondly, we show that deterministic non-forgetting R(W)(W)-automata that are monotone are only as expressive as the corresponding forgetting types of restarting automata, as they just yield further characterizations for the class of deterministic context-free languages. On the other hand, the monotone deterministic non-forgetting RR-, RRW-, and RRWW-automata form a strict hierarchy above DCFL. In particular, this means that the monotone deterministic non-forgetting RWW-automaton is strictly weaker in expressive power than the corresponding RRWW-automaton, which represents the first separation result in the literature between an RWW- and an RRWWautomaton of the same restricted form. Thirdly, we will see that the monotone deterministic non-forgetting RRWWautomaton is as expressive as the monotone deterministic non-forgetting (shrinking) RLWW-automaton. Further, the class of languages that are accepted by these automata coincides with the class of languages that are accepted by monotone deterministic RLautomata, which is known to coincide with the class of left-to-right regular languages LRR of Culik and Cohen [3, 26].
346
Theory Comput Syst (2011) 48: 343–373
This paper is structured as follows. In Sect. 2 we define the various types of restarting automata considered in this paper in detail, summarize a few fundamental results on the expressive power of various types of deterministic (forgetting) restarting automata from the literature, and present a useful normalization result for restarting automata in general. Also we present the notion of metainstructions for (forgetting and non-forgetting) one-way restarting automata. Section 3 begins with a few simple examples of languages that are accepted by deterministic non-forgetting one-way restarting automata. In particular, an example of a language is given that is accepted by a deterministic non-forgetting RW-automaton, but which is not even growing context-sensitive [4]. Then the aforementioned result on the language of valid computations of a single-tape Turing machine is established. In Sect. 4 it is shown that deterministic non-forgetting RWW-automata that are monotone can only accept deterministic context-free languages. This is done by presenting a simulation of a restarting automaton of this form through a deterministic pushdown automaton, which is an adaptation of the simulation of a monotone deterministic (forgetting) RWW-automaton by a deterministic pushdown automaton given in [8]. Then in Sect. 5, which can be seen as the main part of this paper, the announced hierarchy result on monotone deterministic non-forgetting RR-, RRW-, and RRWWautomata is established. A language Lp is presented that is easily seen to be accepted by a monotone deterministic non-forgetting RRWW-automaton, but that cannot be accepted by any monotone deterministic non-forgetting RRW-automaton. For establishing the latter lower bound result we use the notion of Kolmogorov complexity for words over a three-letter alphabet and a result on the behaviour of deterministic finite-state acceptors on random words. Then from Lp we easily obtain a language separating the monotone deterministic non-forgetting RRW-automata from the corresponding RR-automata, and a variant of the palindrome language is used to separate the latter from the deterministic context-free languages. Finally, in Sect. 6 we first show that monotone deterministic non-forgetting (shrinking) RLWW-automata are not more expressive than monotone deterministic (forgetting) RL-automata, which extends a corresponding result of [11] to nonforgetting restarting automata. Then we prove that monotone deterministic nonforgetting RRWW-automata have the same expressive power, too, by presenting a simulation of a monotone deterministic RL-automaton through a monotone deterministic non-forgetting RRWW-automaton. The crucial point of this simulation is the fact that a monotone deterministic RL-automaton can be simulated by a monotone nondeterministic RR-automaton that is correctness-preserving. The latter can then be simulated by a monotone deterministic non-forgetting RRWW-automaton. The paper closes with Sect. 7 which contains a short summary of the main results obtained in this paper and a number of interesting open problems. Most of the results presented here are taken from the Doctoral Dissertation of the first author [17]. In the current paper, however, the results presented are derived in a more concise, but still self-contained fashion, and the proofs for the lower bound results in Sect. 5 are new.
Theory Comput Syst (2011) 48: 343–373
347
2 Definitions Here we describe in short the type of restarting automaton we will be dealing with. More details on restarting automata in general can be found in [25, 27]. A non-forgetting two-way restarting automaton, nf-RLWW-automaton for short, is a nondeterministic machine M = (Q, , , c, $, q0 , k, δ) with a finite set of internal states Q, a flexible tape, and a read/write window of a fixed size k ≥ 1. The work space is limited by the left sentinel c and the right sentinel $, which cannot be removed from the tape. The tape alphabet consists of the finite input alphabet and a finite number of so-called auxiliary symbols. The behaviour of M is described by the transition relation δ that associates a finite set of transition steps to each pair (q, u) consisting of a state q and a possible content u of the read/write window. There are five types of transition steps: 1. A move-right step (MVR) causes M to shift the read/write window one position to the right and to change the state. However, the window cannot be shifted across the right delimiter $. 2. A move-left step (MVL) causes M to shift the read/write window one position to the left and to change the state. Again, if the window already contains the left delimiter c, then no move-left step is possible. 3. A rewrite step causes M to replace the content u of the read/write window by a shorter string v, thereby shortening the tape, and to change the state. Further, the window is placed immediately to the right of the string v. However, if u ends with the symbol $, then so does v, and in this case the window is placed on the $-symbol. 4. A restart step causes M to place its read/write window over the left end of the tape, so that the first symbol it sees is the left sentinel c, and to change the state. 5. An accept step causes M to halt and accept. If δ(q, u) = ∅ for some pair (q, u), then M necessarily halts, and we say that M rejects in this situation. Further, it is required that, when ignoring move operations, rewrite and restart steps alternate in each computation of M, with a rewrite step coming first. In general, the automaton M is nondeterministic, that is, there can be two or more instructions with the same left-hand side (q, u). If that is not the case, the automaton is deterministic. By the prefix det- we denote deterministic types of restarting automata. A configuration of M is a string αqβ where q is a state, and either α = λ (the empty string) and β ∈ {c} · ∗ · {$} or α ∈ {c} · ∗ and β ∈ ∗ · {$}; here q represents the current state, αβ is the current content of the tape, and it is understood that the window contains the first k symbols of β or all of β when |β| ≤ k. A restarting configuration is a configuration of the form qcw$, where q ∈ Q and w ∈ ∗ , which is reached by execution of a restart step; if q is the initial state q0 and w ∈ ∗ is an input word, then q0 cw$ is an initial configuration. We observe that any finite computation of a non-forgetting two-way restarting automaton M consists of certain phases. A phase, called a cycle, starts in a restarting configuration. The window is moved along the tape by executing MVR and MVL operations and a single rewrite operation until a restart operation is performed and thus
348
Theory Comput Syst (2011) 48: 343–373
a new restarting configuration is reached. The part after the last restart operation is called a tail. By (q, w) cM (p, w ) we denote a cycle of M that transforms the restart∗ ing configuration qcw$ into the restarting configuration pcw $. By cM we denote the reflexive and transitive closure of cM . A word w ∈ ∗ is accepted by M, if there is a computation which, starting with the configuration q0 cw$, finishes by executing an accept instruction. By LC (M) we denote the language consisting of all words accepted by M; this is the characteristic language of M, and L(M) = LC (M) ∩ ∗ is the (input) language recognized (accepted) by M. We are also interested in various restricted types of restarting automata. They are obtained by combining three types of restrictions: (a) A restarting automaton is called forgetting, if each of its restart operations resets the internal state to the initial state q0 . This is the type of restarting automaton introduced originally by Janˇcar et al. [6], and therefore we usually delete the prefix ‘forgetting’ when talking about a restarting automaton of this type. (b) Restrictions on the movement of the read/write window (expressed by the first part of the class name): RL- denotes no restriction, RR- means that no MVL operations are available, and R- means that no MVL operations are available and that each rewrite step is combined with a restart step. Restarting automata without MVL operations are called one-way. (c) Restrictions on the rewrite instructions (expressed by the second part of the class name): -WW denotes no restriction, -W means that no auxiliary symbols are available (that is, = ), and the empty suffix means that no auxiliary symbols are available and that each rewrite step is simply a deletion (that is, if (q , v) ∈ δ(q, u) is a rewrite instruction, then v is obtained from u by deleting some symbols). Also some generalizations of restarting automata have been introduced and studied. Here we are interested in the so-called shrinking restarting automaton. A shrinking nf-RLWW-automaton M = (Q, , , c, $, q0 , k, δ) has the same components as an nf-RLWW-automaton with the exception that it is not required that |v| < |u| holds for each rewrite instruction (q , v) ∈ δ(q, u) of M. Instead there must exist a weight function ϕ : ∪ {c, $} → N+ such that, for each rewrite step (q , v) ∈ δ(q, u), ϕ(u) > ϕ(v) holds. Here ϕ is extended to a morphism ϕ : ( ∪ {c, $})∗ → N by taking ϕ(λ) := 0 and ϕ(wa) := ϕ(w) + ϕ(a) for all w ∈ ( ∪ {c, $})∗ and all a ∈ ∪ {c, $}. We use the prefix s- to denote shrinking types of restarting automata. Observe that the weight function ϕ is not part of the specification of a shrinking nf-RLWW-automaton M, as M is shrinking with respect to many different weight functions. However, if M is shrinking, then one can construct a weight function ϕ in polynomial time from M such that M is shrinking with respect to ϕ (see, e.g., [1] proof of Satz IX.10). Concerning the expressive power of deterministic restarting automata the following results have been obtained. Here CRL denotes the class of Church-Rosser languages of [16, 21], which can be seen as the deterministic variants of the growing context-sensitive languages [23]. Further, by L(X) we denote the class of input languages of restarting automata of type X.
Theory Comput Syst (2011) 48: 343–373
349
Theorem 2.1 ([10, 22]) CRL = L(det-RWW) = L(det-RRWW) = L(det-s-RRWW) L(det-RLWW). Here the strictness of the inclusion of CRL in L(det-RLWW) follows from the observation that there exists a deterministic RLWW-automaton for the copy language Lcopy := { w#w | w ∈ {a, b}∗ }, which is not even growing context-sensitive [2, 13]. Finally, we need two monotonicity conditions for restarting automata. Each cycle C of a restarting automaton M contains a unique configuration αqβ in which a rewrite step is applied. Then |β| is called the right distance of C, denoted as Dr (C), and |α| is called the left distance of C, denoted as Dl (C). A sequence of cycles S = (C1 , C2 , . . . , Cn ) of M is called monotone (or rightmonotone) if Dr (C1 ) ≥ Dr (C2 ) ≥ · · · ≥ Dr (Cn ), and it is called left-monotone if Dl (C1 ) ≥ Dl (C2 ) ≥ · · · ≥ Dl (Cn ). A computation of M is called monotone (leftmonotone) if the corresponding sequence of cycles is monotone (left-monotone). Finally, M itself is called monotone (left-monotone) if all its computations that start from an initial configuration are monotone (left-monotone). We use the prefix mon(left-mon-) to denote monotone (left-monotone) types of restarting automata. The following results have been obtained on monotone and left-monotone deterministic two-way restarting automata. Here w R denotes the reversal (or mirror image) of a word w, LR denotes the reversal { w R | w ∈ L } of a language L, and for a language class L, LR = { LR | L ∈ L }. Theorem 2.2 ([11, 12]) (a) For all X ∈ {λ, W, WW}, L(det-left-mon-RLX) = (L(det-mon-RLX))R . (b) L(det-mon-RL) = L(det-mon-RLWW). (c) L(det-left-mon-RL) = L(det-left-mon-RLWW) = L(det-left-mon-s-RLWW) = L(det-left-mon-RWW) = L(det-left-mon-s-RRWW). To simplify the proof details in the following sections we state a normalization result on restarting automata in general. Proposition 2.3 For each restarting automaton M = (Q, , , c, $, q0 , k, δ), there exists a restarting automaton M = (Q, , , c, $, q0 , k , δ ) of the same type as M such that • all rewrite operations (p, v) ∈ δ (q, u) of M satisfy the restriction that |u| = k and v ∈ {λ, c, $}, • k ∈ {k, k + 1}, • the relations cM and cM coincide for all words of length at least k , • LC (M) = LC (M ). Proof outline M is obtained from M as follows. If M contains a rewrite operation of the form (q, u) → (p, v) such that v ∈ {λ, c, $}, then we take k = k + 1; otherwise, k = k can be taken. Now a deterministic finite-state acceptor for the language c · { w ∈ LC (M) | |w| < k } · $ is incorporated in the finite-state control of M . This implies that these words are accepted in tail computations, while words of the form
350
Theory Comput Syst (2011) 48: 343–373
w ∈ LC (M), |w| < k , are rejected in tail computations. Next each rewrite operation (q, u$) → (p, v$) of M, where |u$| < k , is extended by attaching all possible prefixes x of length k − |u$| to u and to v and by changing q and p to the corresponding states. Finally, if k = k + 1, then also all rewrite operations of M of the form (q, u) → (p, v), |u| = k, are extended by attaching all possible suffixes of length 1 to the right of u and v. For a nf-RRWW-automaton M, the first phase of each cycle, that is, the part preceding the application of the corresponding rewrite step, consists only of MVR steps, that is, during this phase M behaves just like a (one-way) finite-state acceptor. Also the part after the execution of the rewrite step consists only of MVR steps. Accordingly, the transition relation of M can be described more compactly by so-called meta-instructions. A rewriting meta-instruction for a nf-RRWW-automaton M is of the form (p1 , E1 , u → v, E2 , p2 ), where p1 and p2 are internal states of M, E1 and E2 are regular expressions, and u, v ∈ ∗ are words satisfying k ≥ |u| > |v|, which stand for a rewrite step of M. To execute a cycle M chooses a rewriting meta-instruction (p1 , E1 , u → v, E2 , p2 ). On trying to execute this meta-instruction M will get stuck (and so reject) starting from a configuration p1 cw$, if w does not admit a factorization of the form w = w1 uw2 such that cw1 ∈ L(E1 ) and w2 $ ∈ L(E2 ). On the other hand, if w does have factorizations of this form, then one such factorization is chosen nondeterministically, and p1 cw$ is transformed into p2 cw1 vw2 $. In order to describe the tails of accepting computations we use accepting meta-instructions of the form (p1 , E, Accept), where E is a regular expression. Starting from a configuration p1 cw$, this meta-instruction accepts if cw$ ∈ L(E) holds. As a nf-RWW-automaton restarts immediately after executing a rewrite step, it can be described by rewriting meta-instructions of the form (p1 , E1 , u → v, ∗ $, p2 ), which are more compactly written as (p1 , E1 , u → v, p2 ). Further, for forgetting RRWW- or RWW-automata, the internal states used in meta-instructions all coincide with the initial state q0 , and accordingly they need not be given explicitly. Observe, however, that meta-instructions are just a convenient way for describing restarting automata. In particular, such a description is essentially nondeterministic. Hence, meta-instructions are only used as a shortform for restarting automata. When it comes to properties like determinism, then one first has to construct the real transition relation of the restarting automaton considered.
3 Deterministic Non-Forgetting Restarting Automata Here we study the expressive power of deterministic non-forgetting restarting automata. We will see that already the most restricted variant, the deterministic nonforgetting R-automaton, is surprizingly expressive. We begin with some examples illustrating the way in which non-forgetting restarting automata work.
Theory Comput Syst (2011) 48: 343–373
351 n
Example 3.1 The language Lexpo := { a 2 | n ∈ N } is accepted by the det-nf-RWWautomaton M with one auxiliary symbol B that is given through the following metainstructions: (1) (2) (3)
(q0 , c(aa)∗ , aaaa$ → Baa$, q1 ), (q1 , c, B → λ, q0 ), (q0 , ca$, Accept),
(4) (5)
(q1 , c(aa)∗ , aaB → Ba, q1 ), (q0 , caa$, Accept).
In fact, from meta-instructions (3) and (5) we see that M accepts in tail computations only the words a 1 and a 2 . Further, from (1) we see that M will get stuck on an input of the form a n , if n is odd. If, however, n = 2m ≥ 4, then (1), (2), and (4) show that by executing m cycles M will reach the restarting configuration q0 ca m $. Inductively it now follows that M accepts on input a n if and only if n is a power of two. Thus, L(M) = Lexpo . Example 3.2 The language Lexpo := { a 2 b | n ∈ N } is accepted by the det-nf-RWautomaton M that is given through the following meta-instructions: n
(1) (q0 , c(aa)+ , aab$ → ba$, q1 ), (2) (q1 , c, b → λ, q1 ), (3) (q1 , caa$, Accept), (4) (q0 , caab$, Accept),
(5) (6) (7)
(q1 , c(aa)∗ , aab → ba, q1 ), (q1 , c(aa)∗ , aaaa$ → baa$, q1 ), (q0 , cab$, Accept).
From meta-instructions (4) and (7) we see that M accepts in tail computations only the words a 1 b and a 2 b. For all other words we see from (1) that M will get stuck if the given input is not of the form a 2m b, where m ≥ 2. A word of this form, however, is reduced by a sequence of m + 1 cycles to the word a m by meta-instructions (1), (2), and (5). Finally, we see from meta-instructions (3), (5), and (6) that such a word will lead to acceptance only if m is a power of two. Thus, L(M ) = Lexpo . Observe that the language Lexpo is also accepted by a (forgetting) deterministic
RWW-automaton, while the language Lexpo cannot be accepted by any (forgetting) deterministic RW-automaton. To further illustrate the expressive power of det-nf-RW-automata, we now con-
sider the iterated copy language Lcopy∗ := { (w#)+ | w ∈ {a, b}∗ }. Observe that this language is not even growing context-sensitive, as the language Lcopy · # = { w#w# | w ∈ {a, b}∗ } = Lcopy∗ ∩ ({a, b}∗ · # · {a, b}∗ · #) is not growing context-sensitive [2]. Lemma 3.3 Lcopy∗ ∈ L(det-nf-RW).
Proof A det-nf-RW-automaton M for the language Lcopy∗ proceeds as follows. Given an input of the form w ∈ {a, b, #}∗ , M checks in its first cycle that w is of the form w = w1 #w2 # · · · #wm #, where w1 , . . . , wm ∈ {a, b}∗ and m ≥ 1. In the negative M halts and rejects immediately, while in the affirmative it deletes the last occurrence of the symbol # and restarts in a special state q0 . In the next cycle the last two symbols
352
Theory Comput Syst (2011) 48: 343–373
of w1 are replaced by a new copy of the symbol #, thus creating an occurrence of the factor ##, and stored in the state of M, and in the following cycle M compares these stored symbols to the last two symbols of w2 . If they do not agree, then M halts and rejects, otherwise the last two symbols of w2 are also replaced by the symbol #. This continues until all syllables w2 , . . . , wm have been processed. Then M enters a new restart state, which causes M to replace each factor of the form ## by the symbol #, one at a time, proceeding from left to right, and finally to replace the final factor #$ by $. Once this task is completed, every syllable wi has been shortened by two symbols, which have been verified to agree for all syllables. Now M restarts in the special state q0 and proceeds to process the now shortened word in the same way. This process continues until either a disagreement between some factors is found, or until all factors have been shortened to length at most 2, and their agreement can be checked simply by scanning the tape from left to right. The method from the proof of Lemma 3.3 can be extended to even design a det-nf-RW-automaton for the language VALC(T ) of all valid computations of a
single-tape Turing machine T . Let T = (QT , , , , q0 , F, δT ) be a deterministic single-tape Turing machine, where QT is a finite set of internal states, is a finite input alphabet, is a finite tape alphabet containing and the blank symbol , q0 ∈ QT is the initial state, F ⊆ QT is the set of final states, and δT : QT × → QT × ( {}) × {L, R} is the transition function. A 5-tuple (p, a, q, b, ρ) ∈ δT expresses the fact that T performs the following action when it is in state p reading symbol a: this occurrence of a is replaced by the symbol b, the head is moved one step to the left or to the right, depending on whether ρ = L or ρ = R, and the internal state is changed to q. Observe that we require that b = , that is, T cannot write blank symbols. Without loss of generality we may also assume that the tape of T is infinite only to the right, and that T only starts and halts with its head in the leftmost position. A configuration of T is presented by a string of the form upvm , where u, v ∈ ( {})∗ , m ≥ 0, and p ∈ QT , expressing the fact that M is in state p, that the current tape content is the word uv (while all other tape squares contain occurrences of the blank symbol), and the head is currently on the first symbol of v or on the first blank symbol if v = λ. A valid computation of T is a finite sequence of configurations (u0 p0 v0 , u1 p1 v1 , . . . , un pn vn ) satisfying the following conditions: 1. |ui pi vi | = |u0 p0 v0 | for all i = 1, . . . , n, 2. u0 p0 v0 is an initial configuration, that is, u0 = λ, p0 is the initial state q0 , and v0 = wm for an input word w ∈ ∗ and an integer m ≥ 0, 3. un pn vn is an accepting configuration, that is, un = λ and pn ∈ F , and vn ∈ ( {})∗ , that is, it does not contain any blank symbols, and 4. for each i = 1, . . . , n, ui pi vi is the immediate successor configuration of ui−1 pi−1 vi−1 . Thus, a valid computation is a description of an accepting computation of T , in which all configurations have the same length. As we require that the last configura-
Theory Comput Syst (2011) 48: 343–373
353
tion does not end with a blank symbol, we see that the valid computation corresponding to a given accepting computation of T is unique. Now the language VALC(T ) of all valid computations of T consists of encodings of all valid computations of T , that is, it consists of all words of the form u0 p0 v0 #u1 p1 v1 # · · · #un pn vn #, where (u0 p0 v0 , u1 p1 v1 , . . . , un pn vn ) is a valid computation of T , and # is a new symbol. Lemma 3.4 For each single-tape Turing machine T , VALC(T ) ∈ L(det-nf-RW). Proof outline To accept VALC(T ) a det-nf-RW-automaton M proceeds as follows. Given an input w ∈ (QT ∪ ∪ {#})∗ , it first checks that w is of the form w = w0 #w1 # · · · #wn #, where wi ∈ ( {})∗ · QT · ( {})∗ · {}∗ for all i = 0, 1, . . . , n, that is, w is actually an encoding of a sequence of configurations of T . This is done in the first cycle. Also M checks in this cycle whether w0 is an encoding of an initial configuration of T , that is, whether w0 ∈ q0 · ∗ · {}∗ , whether wn is an encoding of an accepting configuration of T , that is, whether wn ∈ F · ( {})∗ , and whether, for all 0 ≤ i ≤ n − 1, wi+1 is a possible successor configuration of wi . This means the following: if wi contains the factor tj −1 ptj tj +1 , where p is a state of T , and tj −1 , tj , tj +1 ∈ , and wi+1 contains the factor tl−1 qtl tl+1 , where q is a state of T , and tl−1 , tl , tl+1 ∈ , then either • δT (p, tj ) = (q, tl−1 , R) and tl = tj +1 , or • δT (p, tj ) = (q, tl+1 , L) and tl = tj −1 hold. When M reaches the end of the tape, the last occurrence of the symbol # is removed, and M restarts in a non-initial state, if all these tests were positive. In this case the tape contents is now of the form w0 #w1 # · · · #wn , where w0 encodes an initial configuration of T , for each i = 1, . . . , n, wi encodes a configuration that is possibly a successor configuration of the configuration encoded by wi−1 , and wn encodes an accepting configuration of T . In this first cycle M cannot possibly check whether the various configurations are really consistent with each other. This is done in the next phases, where a variant of the method to accept the language Lcopy∗ is used. Accordingly, in the next cycle M stores the last two symbols, say cd, of w0 in its state, and replaces these two symbols by a new copy of the symbol #, thus creating an occurrence of the factor ##. In the following cycle M compares the two symbols stored in its state to the last two symbols ef of w1 . If they agree, or if the transition of T applied to w0 will result in w1 ending with these symbols, then M stores the symbols ef in its state and replaces them on the tape by another copy of the symbol #; otherwise, it halts and rejects. In the next cycle M compares the symbols ef to the last two symbols of w2 , and this process continues until the last two symbols of all syllables w0 , . . . , wn have been compared. If all these tests are positive, then M restarts in a special state that causes M to replace each of the factors ## by the symbol #. Then the next two symbols are considered. This continues until either a mismatch is found, and then M halts and rejects, or until all factors w0 , . . . , wn have been erased completely, and then M halts and accepts.
354
Theory Comput Syst (2011) 48: 343–373
Actually, by using a properly chosen encoding (see, e.g., [27]) each det-nf-RWautomaton can be simulated by a det-nf-R-automaton. Thus, we obtain the following result. Lemma 3.5 For each single-tape Turing machine T , there exists an injective morphism ϕ such that ϕ(VALC(T )) ∈ L(det-nf-R). From this observation we obtain the following undecidablility results, as the deterministic nf-R-automaton for the language ϕ(VALC(T )) can actually be constructed from the Turing machine T . Theorem 3.6 The following problems are in general undecidable: INSTANCE QUESTION 1 QUESTION 2 QUESTION 3 QUESTION 4
: : : : :
A det-nf-R-automaton M. Is the language L(M) non-empty? Is the language L(M) finite? Is the language L(M) regular? Is the language L(M) context-free?
Proof Let T be a single-tape Turing machine, and let M be a deterministic nf-Rautomaton for the language ϕ(VALC(T )). Then the language L(M) = ϕ(VALC(T )) is non-empty iff T has valid computations iff L(T ) is non-empty. Also the language L(M) = ϕ(VALC(T )) is finite iff T has only finitely many valid computations iff L(T ) is finite. Further, it is easily seen from the corresponding pumping lemmas that the language L(M) = ϕ(VALC(T )) is regular (context-free) iff it is finite. As emptiness and finiteness of L(T ) are undecidable in general, it follows that the four problems above are undecidable in general for deterministic nf-R-automata. Observe that for a det-R-automaton, emptiness of the language accepted is easily decidable. From the results above we see that (shrinking) deterministic non-forgetting R(R)WW-automata have a much larger expressive power than (shrinking) deterministic R(R)WW-automata (cf. Theorem 2.1). In [10] it is shown that in the nondeterministic case shrinking RRWW-automata have the same expressive power as shrinking non-forgetting RRWW-automata. It is easily seen that the proof of this result carries over to deterministic restarting automata that can also execute move-left steps, that is, to shrinking deterministic RLWW-automata. This yields the following equality. Corollary 3.7 L(det-s-RLWW) = L(det-nf-s-RLWW).
4 Monotone Deterministic Non-Forgetting RWW-Automata All types of monotone deterministic R(W)(W)- and RR(W)(W)-automata accept exactly the deterministic context-free languages [8]. As we will see below for nonforgetting restarting automata this result only carries over to R(W)(W)-automata. Theorem 4.1 For each X ∈ {λ, W, WW}, L(det-mon-nf-RX) = DCFL.
Theory Comput Syst (2011) 48: 343–373
355
Proof As DCFL = L(det-mon-R), the inclusion from right to left is obvious. It remains to show that L(det-mon-nf-RWW) ⊆ DCFL holds. To establish this inclusion we present a simulation of a monotone deterministic nf-RWW-automaton by a deterministic pushdown automaton, which is a generalization of the corresponding simulation of monotone deterministic RWW-automata given in [7]. So let M = (Q, , , c, $, q0 , k, δ) be a det-mon-nf-RWW-automaton, where Q = {q0 , q1 , . . . , qn−1 }. We will simulate M by a deterministic pushdown automaton P that, in its finite control, realizes a buffer for storing a word w ∈ ( ∪ {c, $})∗ of length up to 2k symbols and n + 1 states of M, and that uses n + 1 tracks on its pushdown store. On the first of these tracks P will store a prefix of the actual tape content of M, so that the content of this track together with the word stored in the buffer, and the remaining content of the input tape yields the current tape content of M. On track i, 2 ≤ i ≤ n + 1, P will simulate the state transitions of M that correspond to a restart in state qi−2 . Accordingly, P has pushdown alphabet = × Qn . The simulation works as following. Let w ∈ ∗ be the given input. Then P needs to simulate the computation of M that begins with the restarting configuration q0 cw$. In the corresponding initial configuration of P , the pushdown store contains the bottom marker [c, q0 , . . . , qn−1 ], P is in its initial state, and its input tape contains the word w. Let w = w1 uw2 , where u is the factor of w that is rewritten in the first cycle of the accepting computation of M on input w, and assume that |w1 | ≥ 2k. Then w1 = x1 x2 x3 , where |x1 | = k − 1 and |x2 | = k + 1. In this situation P proceeds as follows. In the first 2k steps, P reads x1 x2 letter by letter, storing it in its internal buffer. Now the letter c together with the word x1 stored in the buffer form the word that M currently sees in its read/write window. Thus, from M’s transition function P can determine the transitions δ(qi , cx1 ) of M for all i = 0, . . . , n−1. P now stores the states q0 , q0,1 , . . . , qn−1,1 in its buffer, where q0 represents the state with which the cycle of M being currently simulated begins, and qi,1 is the state that M would enter on executing the transition δ(qi , cx1 ) (1 ≤ i ≤ n − 1). Thereafter P reads the word x3 u letter by letter. Each letter is appended to the word stored in the buffer, the first letter of this word is pushed onto the pushdown store together with the n-tuple of states [q0,1 , . . . , qn−1,1 ], and a new n-tuple of states of M is computed from this n-tuple, the letter stored in the topmost symbol on the pushdown store, and the first k − 1 letters in the buffer. This continues until x3 u has been read completely. At this point P simulates the rewrite step of the current cycle of M as explained below for the general case. Let (p, xuy) cM (qν , xvy) be a cycle of a computation of M, that is, this cycle has been completed by executing a combined rewrite/restart operation of the form δ(p , u) = (qν , v). Then the restarting configuration qν cxvy$ of M will be encoded by the following configuration of P , where x = x1 x2 · · · xm , x1 , . . . , xm ∈ , |v| = s < k, and qi,j is the state that M reaches from the restarting configuration qi cxuy$ after executing j move-right steps (0 ≤ i ≤ n − 1, 1 ≤ j ≤ m − 2k + s + 1): • The content of the pushdown store is [c, q0 , . . . , qn−1 ][x1 , q0,1 , . . . , qn−1,1 ] · · · [xm−2k+s , q0,m−2k+s , . . . , qn−1,m−2k+s ], • the current state is [qν , [q0,m−2k+s+1 , . . . , qn−1,m−2k+s+1 ], xm−2k+s+1 · · · xm v],
356
Theory Comput Syst (2011) 48: 343–373
• and the input tape contains the suffix y. Because of the monotonicity of M, no rewrite step can be executed on a proper prefix of xv. Hence, starting from the configuration qν cxvy$, M will perform |x| + s + 1 − k move-right steps, by which it reaches the configuration cx1 · · · xm−k+s qν,m−k+s+1 xm−k+s+1 · · · xm vy$. If δ(qν,m−k+s+1 , xm−k+s+1 · · · xm v) is a rewrite step, then P simulates this rewrite step within its finite-state control, refilling the buffer from the left with symbols from the top of the pushdown store; otherwise it pushes the auxiliary symbol [xm−2k+s+1 , q0,m−2k+s+1 , . . . , qn−1,m−2k+s+1 ] onto its pushdown store and refills the buffer by reading the next input symbol. In addition, it updates the list of states stored in its buffer by replacing, for each i = 0, . . . , n − 1, the state qi,m−2k+s+1 by the state that is reached by the transition δ(qi,m−2k+s+1 , xm−k+s+1 · · · xm v). Continuing in this way P simulates the computation of M cycle by cycle. It now follows easily that L(P ) = L(M) holds.
5 Monotone Deterministic Non-Forgetting RRWW-Automata Here we establish the aforementioned proper hierarchy of language classes for nonforgetting RRWW-automata that are monotone and deterministic, in this way separating the language class L(det-mon-nf-RRWW) from the class L(det-mon-nf-RWW). Definition 5.1 Let 0 = {a, b} and 1 = {a, b, c}, and let ϕ : 1∗ → 0∗ be the morphism that is induced by a → aa, b → bb, and c → ab. Then ϕ : 1∗ → 0∗ is an encoding. Now let Lp be the following language: Lp := { wc(ϕ(w))R | w ∈ 1∗ , |w| ≥ 2, and |w| ≡ 0 mod 2}. The language Lp is easily seen to be accepted by a monotone deterministic nonforgetting restarting automaton that uses auxiliary symbols. Lemma 5.2 Lp is accepted by a monotone deterministic nf-RRWW-automaton. Proof Let M be the non-forgetting RRWW-automaton with input alphabet 1 and tape alphabet := 1 ∪ , where := { [x, y] | x, y ∈ 1 }, that is given by the following sequence of meta-instructions: (1) (2) (3) (4)
(q0 , c · ∗ , xy → [x, y], (12 )+ · c · 0∗ · $, q0 ) for all x, y ∈ 1 , for all x, y ∈ 1 , (q0 , c · ∗ , xy → [x, y], c · 0∗ · $, q1 ) (q1 , c · ∗ , [x, y]c(ϕ(xy))R → c, 0∗ · $, q1 ) for all x, y ∈ 1 , (q1 , cc$, Accept).
Given an input z ∈ 1∗ , M works as follows. If |z| < 3, then M rejects immediately, as neither meta-instruction (1) nor (2) is applicable to the corresponding initial tape content cz$. Thus, assume that z = xyz1 , where x, y ∈ 1 and z1 ∈ 1+ . If z1 does not contain an occurrence of the symbol c, then again it follows that M will reject
Theory Comput Syst (2011) 48: 343–373
357
immediately. Hence, we can assume that z1 = z2 cz3 , where z2 ∈ (12 )∗ and z3 ∈ 0∗ . Then M rewrites its tape content cz$ = cxyz2 cz3 $ into c[x, y]z2 cz3 $, and it restarts in state q1 , if |z2 | = 0, and it restarts in state q0 , if |z2 | = 2m ≥ 2. Continuing with the latter case, it follows by induction that M rewrites z2 into a sequence of auxiliary symbols of length m, and that it then restarts in state q1 . From meta-instructions (3) and (4) we see that now M compares the mirror image of the (encoded version of the) prefix xyz2 to the suffix z3 , and that it accepts if and only if z3 = (ϕ(xyz2 ))R . Thus, we see that L(M) = Lp holds. Finally, it is easily seen from the definition above that M is deterministic, and it follows from the above description of the way in which M works that it is also monotone. Thus, M is a monotone deterministic nf-RRWW-automaton for the language Lp . Now the main part of the work in this section consists in establishing the following lower bound result for the language Lp . Theorem 5.3 The language Lp is not accepted by any monotone deterministic nonforgetting RRW-automaton. To establish this result we will make use of the notion and techniques of Kolmogorov complexity [9, 14]. Here we use K(x) to denote the Kolmogorov complexity of a word x over a finite alphabet of cardinality at least two. A word x ∈ 1+ is called incompressible if K(x) ≥ |x| holds. It is c-incompressible for a constant c ∈ N+ , if K(x) ≥ |x| − c holds. Finally, x is called random if K(x) > |x| − 4 · log3 |x| holds. Here log3 denotes the logarithm to base 3. 1 n Lemma 5.4 For each c ≥ 0 and each n ≥ 1, there are more than (1 − 2·3 c ) · 3 words of length n over 1 that are c-incompressible.
Proof There are 3n words of length n over 1 . On the other hand, there are only n−c−1 i n−c 3 = 3 2 −1 many words of length at most n − c − 1 over 1 . Thus, at i=0 most these many words of length n have a description of length at most n − c − 1. n−c 1 n Thus, at least 3n − 3 2 −1 > (1 − 2·3 c ) · 3 many words of length n over 1 are c-incompressible. Not all factors of incompressible words are themselves incompressible. However, we have the following result saying that sufficiently long suffixes of c-incompressible words are random. Proposition 5.5 For each m ∈ N+ , there exists an integer n0 ∈ N+ such that the following statement holds for all n ≥ n0 and all c-incompressible words u ∈ 1n : If u is factored as u = u1 u2 · · · um uˆ such that |ui | = log3 (n)2 , 1 ≤ i ≤ m, then the suffix u(i) = ui+1 · · · um uˆ of u is random for each 1 ≤ i ≤ m. Proof Let i ∈ {1, . . . , m}, and let u(i) = ui+1 · · · um u, ˆ that is, u = u1 · · · ui u(i) . Then (i) 2 |u | = |u| − |u1 · · · ui | = n − i · log3 (n) . We have to prove that K(u(i) ) > |u(i) | − 4 · log3 |u(i) | holds.
358
Theory Comput Syst (2011) 48: 343–373
For r ∈ N+ , let bin(r) denote the binary representation of r. If p (i) is a (shortest) description for u(i) , then p := 0|bin(|u1 ···ui |)| · 1 · bin(|u1 · · · ui |) · u1 · · · ui · p (i) is a description for the word u. Indeed from p a program can first extract the information |bin(|u1 · · · ui |)|, which it can use to determine bin(|u1 · · · ui |) and then u1 · · · ui . Finally, from the remaining suffix of p, which is p (i) , it can determine the suffix u(i) of u. Thus, we see that |p| ≥ K(u) ≥ n − c holds. As |p| = 2 · |bin(|u1 · · · ui |)| + 1 + |u1 · · · ui | + |p (i) | ≤ 2 · (log2 (i · log3 (n)2 ) + 1) + 1 + i · log3 (n)2 + |p (i) | ≤ 2 · log2 (i · log3 (n)2 ) + 3 + i · log3 (n)2 + |p (i) | = 2 · log2 (i) + 4 · log2 (log3 (n)) + 3 + i · log3 (n)2 + |p (i) |, it follows that |p (i) | ≥ n − c − 2 · log2 (i) − 4 · log2 (log3 (n)) − 3 − i · log3 (n)2 = n − i · log3 (n)2 − (c + 3 + 2 · log2 (i)) − 4 · log2 (log3 (n)). On the other hand, we have |u(i) | − 4 · log3 |u(i) | = n − i · log3 (n)2 − 4 · log3 (n − i · log3 (n)2 ). Since i ≤ m, where m is a fixed constant, we see that for all sufficiently large values of n, 4 · log3 (n − i · log3 (n)2 ) > (c + 3 + 2 · log2 (i)) + 4 · log2 (log3 (n)) holds, which implies that K(u(i) ) = |p (i) | > |u(i) | − 4 · log3 |u(i) | holds. Thus, u(i) is random. The next proposition, which is taken from [9], is concerned with the behaviour of a deterministic finite-state acceptor on a random word of sufficient length. Proposition 5.6 Let A be a deterministic finite-state acceptor with tape alphabet ⊇ = ∅. Then there exists a constant n0 ∈ N+ such that, for each integer n > n0 and each random word w ∈ n , the following condition is satisfied for each word v ∈ + : Assume that A is in state q when it enters the factor wv on its tape from the left, and that it reaches state q when its head is located inside the factor v. Then A already encounters state q while its head is still inside the prefix of w of length logs n2 , where s = ||. Assume that M = (Q, 1 , 1 , c, $, q0 , k, δ) is a monotone deterministic nonforgetting RRW-automaton such that L(M) = Lp . We will now derive a contradiction through a sequence of technical lemmas. The nf-RRW-automaton M can be described through a finite sequence of rewriting meta-instructions Ij = (pj , Ej , uj → vj , Fj , qj ), 1 ≤ j ≤ rM , and a finite sequence (a) of accepting meta-instruction Ij = (pj , Gj , Accept), rM + 1 ≤ j ≤ rM + aM . As M is deterministic, the above meta-instructions are used as follows. Assume that M
Theory Comput Syst (2011) 48: 343–373
359
is in the restarting configuration qcw$. Then M scans the tape from left to right until it detects the shortest prefix w1 of w such that q = pj , w1 = w3 uj , and cw3 ∈ L(Ej ) for some j ∈ {1, . . . , rM }. It then rewrites uj into vj and checks whether the corresponding suffix w2 of w satisfies the condition that w2 $ ∈ L(Fj ). At the same time it checks whether the original tape content cw$ belongs to the language L(Gs ) for some s satisfying rM + 1 ≤ s ≤ rM + aM and q = ps . If the latter holds, then M halts and accepts; if the latter does not hold, but w2 $ ∈ L(Fj ), then M restarts, which yields the restarting configuration qj cw3 vj w2 $. Finally, if w2 $ ∈ L(Fj ) does not hold, either, then M halts and rejects. If no prefix of the above form is found, then M halts and rejects as well, unless cw$ ∈ L(Gs ) holds for some s satisfying rM + 1 ≤ s ≤ rM + aM and q = ps , in which case M halts and accepts. With M we can now associate deterministic finite-state acceptors Aj such that L(Aj ) = L(Ej ) · uj , 1 ≤ j ≤ rM . Finally we define a deterministic finite-state acceptor (without initial state) A as the disjoint union of the finite-state acceptors Aj , 1 ≤ j ≤ rM . For the following discussion we distinguish between inner rewrites and suffix rewrites of M. A rewrite step uj → vj is a suffix rewrite, if uj (and therewith also vj ) ends with the right delimiter $; otherwise it is called an inner rewrite. By ρsuf we denote the number of suffix rewrite operations contained in the definition of M. As M is monotone, we see that each (accepting) computation of M consists of two parts: The first part consists of a number of cycles in which inner rewrite steps are performed, and the second part consists only of cycles in which suffix rewrite steps are performed. Concerning the first part we have the following result. Proposition 5.7 For each m ∈ N+ , there exists an integer n0 ∈ N+ such that the following statement holds for all even n ≥ n0 and all c-incompressible words u ∈ 1n : If M starts from an initial configuration corresponding to the input z = uc(ϕ(u))R , then in the resulting computation of M the first m inner rewrites are executed within the prefix of u of length m · log3 (n)2 . Proof Let m ∈ N+ be a constant. For m we obtain a constant n(1) 0 ∈ N+ from Proposi(2) tion 5.5, and for the deterministic finite-state acceptor A, we get a constant n0 ∈ N+ (2) from Proposition 5.6. Thus, we can choose n0 := max{n(1) 0 , n0 }. n Now let n ≥ n0 be an even number, and let u ∈ 1 be a c-incompressible word. We can factor u as u = u1 u2 · · · um uˆ such that |ui | = log3 (n)2 , 1 ≤ i ≤ m. Then we know from Proposition 5.5 that the suffix u(i) = ui+1 · · · um uˆ of u is random for each i, 1 ≤ i ≤ m. Consider a computation of M starting from the initial configuration q0 cuc(ϕ(u))R $. If in the first cycle an inner rewrite is executed by applying metainstruction Ij , then A must reach a corresponding state from the state that corresponds to the initial state of Aj . By Proposition 5.6 this happens already inside the prefix u1 . Continuing with the computation of M, the next inner rewrite is either also executed on (the successor of) the prefix u1 , or the same argument can be used to show that it is executed on the factor u2 . Inductively it follows that the first m inner rewrites are executed on the prefix u1 · · · um of u of length m · log3 (n)2 .
360
Theory Comput Syst (2011) 48: 343–373
For all j , 1 ≤ j ≤ rM , if the rewrite step uj → vj is an inner rewrite, then let Bj be a finite-state acceptor for the language L(Fj ), and if uj → vj is a suffix rewrite, then let Bj be a finite-state acceptor for the language L(Ej ). Assume that Bj has bj ≥ 1 internal states, and let β := rjM=1 bj . For the following considerations we use n0 (m) to denote the constant from Proposition 5.7 that corresponds to a given value of m, and we take n1 (m) := max{n0 (m), (k − 1) · ρsuf · β}. The next result shows that the first part of the computation of M on input uc(ϕ(u))R consists of at least m cycles. Proposition 5.8 Let m ≥ 1, and let u ∈ 1n be a c-incompressible word of even length n ≥ n1 (m). Then the accepting computation of M on input uc(ϕ(u))R contains at least m applications of inner rewrite steps. Proof Assume that the accepting computation of M on input uc(ϕ(u))R consists of only r < m cycles in which inner rewrite steps are executed followed by a sequence of cycles that only involve applications of suffix rewrite steps. Then it follows from Proposition 5.7 that in the first part of the computation considered, uc(ϕ(u))R is rewritten into a word of the form u u2 u3 c(ϕ(u))R = u u2 u3 c(ϕ(u3 ))R (ϕ(u2 ))R (ϕ(u1 ))R , where u = u1 u2 u3 , |u1 | = r · log3 (n)2 , |u1 u2 | = |u3 |, and u is obtained from u1 through the execution of the above-mentioned r inner rewrite steps. In the second part of the computation considered a sequence of cycles is executed that only involves suffix rewrite steps, and that leads to a restarting configuration from which M accepts. A suffix rewrite x → y has the form x = x1 $ and y = y1 $ for some words x1 ∈ 1k−1 and y1 ∈ 1∗ of length |y1 | ≤ k − 2. The next suffix rewrite x → y that follows after executing the suffix rewrite x → y necessarily satisfies the condition that x = k−1−|y1 | . In particular, a sequence of suffix rewrites xy ˆ 1 $ for some word xˆ ∈ 1 replaces a suffix of length at most (k − 1) · of (ϕ(u))R by the right-hand side y of the last rewrite in that sequence. As M must compare the factor u3 to its image (ϕ(u3 ))R before it can accept, and as |(ϕ(u1 u2 ))R | = 2 · |u1 u2 | = |u| = n ≥ (k − 1) · ρsuf · β, the sequence of suffix rewrites is of length α > ρsuf · β. M contains only ρsuf many different suffix rewrite operations, and so at least one of them, say x1 $ → y1 $, is used at least β + 1 many times in this sequence. Accordingly, this computation contains a subcomputation of the following form: (qi , u u2 u3 c(ϕ(u3 ))R (ϕ(u2 ))R (ϕ(u1 ))R ) = (qi , u u2 u3 c(ϕ(u3 ))R w0 w1 · · · wβ wβ+1 ) cM0 (qj , u u2 u3 c(ϕ(u3 ))R w0 w1 · · · wβ y1 ) s
cM1 (qj , u u2 u3 c(ϕ(u3 ))R w0 w1 · · · wβ−1 y1 ) s s
cM2 · · · sβ−1
cM
s cβ
(qj , u u2 u3 c(ϕ(u3 ))R w0 w1 y1 )
M (qj , u u2 u3 c(ϕ(u3 ))R w0 y1 ).
Theory Comput Syst (2011) 48: 343–373
361
In each of the cycles preceding this subcomputation the factor w0 w1 · · · wβ is scanned by M, which means that one of the finite-state acceptors Bj is used to read this factor. Consider the positions that correspond to the first letter of w1 , w2 , . . . , wβ and wβ+1 , respectively. As these are β + 1 many positions, we see from the definition of β that there are indices 1 ≤ r < s ≤ β + 1 such that each of the Bj used is in the same state when reading the first letter of wr and the first letter of ws . Now consider the input uˆ := uc(ϕ(u3 ))R w0 w1 · · · wr (wr+1 · · · ws )2 ws+1 · · · wβ wβ+1 that is obtained from uc(ϕ(u))R by repeating the factor wr+1 · · · ws . Then uˆ ∈ Lp , as the second syllable is too long. However, given uˆ as input, M will execute the following computation: (q0 , uc(ϕ(u3 ))R w0 w1 · · · wr (wr+1 · · · ws )2 ws+1 · · · wβ wβ+1 ) ∗
cM (qi , u u2 u3 c(ϕ(u3 ))R w0 w1 · · · wr (wr+1 · · · ws )2 ws+1 · · · wβ wβ+1 ) cM0 (qj , u u2 u3 c(ϕ(u3 ))R w0 w1 · · · wr (wr+1 · · · ws )2 ws+1 · · · wβ y1 ) s
∗
cM (qj , u u2 u3 c(ϕ(u3 ))R w0 w1 · · · wr wr+1 · · · ws wr+1 · · · ws y1 ) ∗
cM (qj , u u2 u3 c(ϕ(u3 ))R w0 w1 · · · wr wr+1 · · · ws y1 ) ∗
cM (qj , u u2 u3 c(ϕ(u3 ))R w0 y1 ). As M will eventually accept starting from the configuration (qj , u u2 u3 c(ϕ(u3 ))R w0 y1 ), this contradicts the so-called Error Preserving Property for M (see, e.g., [17, 27]). Thus, in the computation considered M must execute at least m inner rewrite steps. After performing an inner rewrite step inside the prefix of length m · log3 (n)2 of a c-incompressible word of even length n ≥ n1 (m), M scans the corresponding suffix u2 c(ϕ(u))R of the tape content, and we can assume that at the right delimiter it either accepts, executes a restart operation Restart(q) for some q ∈ Q, or rejects. Thus, with the word u2 c(ϕ(u))R we can associate a mapping ψu : Q → ({Accept, Reject} ∪ {Restart(q) | q ∈ Q}). Obviously, there are only η := |Q||Q|+2 such mappings. Accordingly, we choose a constant m ∈ N+ such that 3m−2 > η · |Q|, and let n ≥ n1 (m) be an even integer. Proposition 5.9 There exists a word u ∈ 1∗ that has the following properties: • n − m · k ≤ |u | ≤ n − m; 1 m−1 different c-incompressible words u of length n • there are at least (1 − 2·3 c)·3 such that, for each of these words u, M has an accepting computation on input uc(ϕ(u))R that rewrites the prefix u into the word u through the first m inner rewrites of this computation; ˆ where |u| ˆ = n − m · log3 (n)2 , and each of the c• u has the form u = u1 u, incompressible words u above has the same suffix u. ˆ
362
Theory Comput Syst (2011) 48: 343–373
1 n Proof According to Lemma 5.4 there are more than (1 − 2·3 c ) · 3 words of length n over 1 that are c-incompressible. On the other hand, by executing m rewrite operations inside the prefix of length m · log3 (n)2 of a c-incompressible word u of length n, M reduces the word u to a word u ∈ 1∗ of length n such that n−m i i n−m+1 words n − m · k ≤ n ≤ n − m. As there are only n−m i=n−m·k 3 ≤ i=0 3 ≤ 3 1 3n = of this length, we see that there exists a word u such that at least (1 − 2·3c ) · 3n−m+1 1 m−1 (1 − 2·3c ) · 3 different c-incompressible words u of length n are reduced to this particular word u . As for each of these c-incompressible words of length n the rewrite operations are executed inside the prefix of length m · log3 (n)2 , we see that they all have the same suffix uˆ of length n − m · log3 (n)2 , and that u = u1 uˆ holds for some word u1 ∈ 1∗ .
Now we can complete the proof of Theorem 5.3. Proof of Theorem 5.3 By Proposition 5.9 there exists a word u ∈ 1∗ such that there 1 m−1 different c-irreducible words u ∈ n with the properties are at least (1 − 2·3 c)·3 1 detailed in Proposition 5.9. As 1 1 1− · 3m−1 ≥ · 3m−1 > 3m−2 > η · |Q|, c 2·3 2 as there are at most η different mappings ψu , and as there are at most |Q| different restarting states of M, there are (at least) two different c-incompressible words u, U ∈ 1n , and a restarting state q ∈ Q such that the following conditions are all satisfied simultaneously: • u = u1 uˆ and U = U1 u, ˆ where |u1 | = |U1 | = m · log3 (n)2 , • the mapping ψu and ψU are identical, • through the first m inner rewrite steps in the accepting computation of M on input uc(ϕ(u))R , the prefix u1 of u is rewritten into the word u , and after these m cycles M restarts in the restarting state q, • through the first m inner rewrite steps in the accepting computation of M on input U c(ϕ(U ))R , the prefix U1 of U is rewritten into the word u , and after these m cycles M restarts in the restarting state q. Now consider the input zmix := U c(ϕ(u))R . As u = U , and as the mapping ϕ : 1∗ → 0∗ is injective, we see that zmix ∈ Lp holds. Thus, there must not be an accepting computation of M on input zmix . There is an accepting computation of M on input U c(ϕ(U ))R that begins with ∗ an initial segment (q0 , U c(ϕ(U ))R ) cM (q, u uc(ϕ(U ˆ ))R ). Also there is an acceptR ing computation of M on input uc(ϕ(u)) that begins with an initial segment ∗ R ). In both cases q is the restarting state with (q0 , uc(ϕ(u))R ) cM (q, u uc(ϕ(u)) ˆ which M starts the next cycle. As the mappings ψu and ψU that are induced by the R and uc(ϕ(U factors uc(ϕ(u)) ˆ ˆ ))R , respectively, coincide, it follows that in the former of these accepting computations we can replace the factor (ϕ(U ))R by (ϕ(u))R , which yields the following computation: ∗
R R (q0 , U c(ϕ(u))R ) = (q0 , U1 uc(ϕ(u)) ˆ ) cM (q, u uc(ϕ(u)) ˆ ).
Theory Comput Syst (2011) 48: 343–373
363
Also in this case q is the restarting state with which M starts the next cycle. Hence, with the word uc(ϕ(u))R , M will also accept the word zmix . This, however, contradicts our assumption that L(M) = Lp holds. It follows that Lp is not accepted by any monotone deterministic non-forgetting RRW-automaton. Together with Lemma 5.2 this yields the following separation result. Corollary 5.10 L(det-mon-nf-RRW) L(det-mon-nf-RRWW). In the proof of Lemma 5.2 we have presented a monotone deterministic nf-RRWWautomaton M with tape alphabet = 1 ∪ for the language Lp . Next we consider the characteristic language Lp := LC (M) of M. We claim that this language is accepted by a monotone deterministic nf-RRW-automaton, but that it is not accepted by any monotone deterministic nf-RR-automaton. Lemma 5.11 Lp is accepted by a monotone deterministic nf-RRW-automaton. Proof Let MC be the deterministic nf-RRW-automaton that is obtained from the aforementioned nf-RRWW-automaton M by declaring all tape symbols to input symbols. Then it is immediate that L(MC ) = LC (M) = Lp holds. It remains to show that MC is monotone, that is, for each restarting configuration of MC , the computation of MC starting from that configuration is monotone. However, this follows immediately from the definition of the nf-RRWW-automaton M, as its meta-instructions ensure that a cycle can be completed only if the current tape content is of the form cUycz$ or cU cz$, where U ∈ ∗ , y ∈ (12 )+ , and z ∈ 0∗ . In the former case the prefix of length 2 of y is rewritten into a symbol from , and in the latter case some symbols surrounding the symbol c are deleted. It remains to establish the announced lower bound. Theorem 5.12 The language Lp is not accepted by any monotone deterministic nonforgetting RR-automaton. Proof Assume to the contrary that M = (Q, , , c, $, q0 , k, δ) is a monotone deterministic nf-RR-automaton such that L(M ) = Lp holds. Define a deterministic nfRR-automaton Mr as a restricted variant of M by taking Mr := (Q, 1 , 1 , c, $, q0 , k, δr ). Here δr is simply the restriction of δ to words that only contain letters from the subalphabet 1 of , that is, δr (q, u) := δ(q, u) for all q ∈ Q and all words u ∈ (c · 1k−1 ) ∪ 1k ∪ (1≤k−1 · $) ∪ (c · 1≤k−2 · $). As M is an RR-automaton, all its rewrite operations are just deletions. Thus, for each input w ∈ 1∗ , the computation of Mr on input w is identical to the computation of M on input w. Hence, we see that Mr is monotone, and that L(Mr ) = L(M ) ∩ 1∗ = Lp ∩ 1∗ = LC (M) ∩ 1∗ = Lp . This, however, contradicts Theorem 5.3, which states that the language Lp is not accepted by any monotone deterministic nf-RRW-automaton. This shows that the language Lp is indeed not accepted by any monotone deterministic non-forgetting RR-automaton.
364
Theory Comput Syst (2011) 48: 343–373
Together with Lemma 5.11 this yields the following separation result. Corollary 5.13 L(det-mon-nf-RR) L(det-mon-nf-RRW). Finally we will see that monotone deterministic nf-RR-automata are strictly more powerful than the corresponding standard variant. For this we consider the example language Lpal := {wcφ(w)R | w ∈ {a, b, c}∗ , |w|c ≥ 1}, where φ is the morphism defined by φ(a) := a, φ(b) := b, and φ(c) := λ. Lemma 5.14 Lpal ∈ L(det-mon-nf-RR) DCFL. Proof The language Lpal is accepted by the det-mon-nf-RR-automaton Mpal that is given through the following meta-instructions: (1) (2) (3) (4) (5)
(q0 , c · {a, b}∗ , c → λ, ({a, b}∗ · c)2 · {a, b, c}∗ · $, q0 ), (q0 , c · {a, b}∗ , c → λ, {a, b}∗ · c · {a, b}∗ · $, q1 ), (q1 , c · {a, b}∗ , aca → c, {a, b}∗ · $, q1 ), (q1 , c · {a, b}∗ , bcb → c, {a, b}∗ · $, q1 ), (q1 , cc$, Accept).
Let w ∈ {a, b, c}∗ be the given input. We see from meta-instructions (1) and (2) that w will be rejected immediately, that is, in a tail computation, if |w|c ≤ 1 holds. So assume that w = w1 cw2 , where w1 ∈ {a, b, c}∗ , |w1 |c ≥ 1, and w2 ∈ {a, b}∗ . Proceeding from left to right all occurrences of the symbol c are erased from w1 , and on completing this task the state q1 is entered. Now w ∈ Lpal holds if and only if w2 = φ(w1 )R . The latter condition, however, is now checked by meta-instructions (3) to (5) by deleting matching symbols surrounding the symbol c. It follows that L(Mpal ) = Lpal . It remains to show that the language Lpal is not deterministic context-free. As DCFL = L(det-mon-R), it suffices to prove that this language is not accepted by any monotone deterministic R-automaton. So assume to the contrary that M is a monotone deterministic R-automaton that accepts the language Lpal . For a sufficiently large integer m, the word w := ca m ca m ∈ Lpal cannot be accepted by M in a tail computation. Hence, starting from the initial configuration q0 cca m ca m $, M will execute a cycle (q0 , w) cM (q0 , w ). As M is deterministic, w ∈ Lpal , implying that w = ca m−i ca m−i for some integer i > 0. As M restarts immediately after executing a rewrite step, M will apply the same rewrite operation when starting from the initial configuration q0 cca m ca m ca m−i a m−i $, that is, it will execute the cycle (q0 , ca m ca m ca m−i a m−i ) cM (q0 , ca m−i ca m−i ca m−i a m−i ). This contradicts the Error Preserving Property, as ca m ca m ca m−i a m−i ∈ Lpal , while ca m−i ca m−i ca m−i a m−i ∈ Lpal . Together with Theorem 4.1 this yields the following proper inclusion. Corollary 5.15 DCFL = L(det-mon-nf-R) L(det-mon-nf-RR).
Theory Comput Syst (2011) 48: 343–373
365
6 Monotone Deterministic Non-Forgetting RLWW-Automata As we have seen monotone deterministic non-forgetting RRWW-automata are strictly more expressive than monotone deterministic RRWW-automata. We will now see that they are actually as expressive as monotone deterministic non-forgetting (shrinking) RLWW-automata. In fact, they yield another characterization for the class of left-toˇ right regular languages of Culik and Cohen [3]. We begin this section by showing that all the different types of monotone deterministic RL(W)(W)-automata are of the same expressive power. Theorem 6.1 L(det-mon-nf-s-RLWW) = L(det-mon-RL). Proof According to the main result of [11], L(det-mon-RLWW) = L(det-mon-RL). This equality is easily extended to monotone deterministic shrinking RLWWautomata as follows. Obviously, L(det-mon-RL) ⊆ L(det-mon-s-RLWW) holds. Conversely, assume that L ∈ L(det-mon-s-RLWW). By Lemma 1 of [11], a language is accepted by a monotone deterministic RLWW-automaton if and only if its mirror image is accepted by a left-monotone deterministic RLWW-automaton. The proof of this result, and therewith the result itself, carries over to shrinking restarting automata, that is, we have LR ∈ L(det-left-mon-s-RLWW). By Theorem 5.6 of [12], L(det-left-mon-s-RLWW) = L(det-left-mon-RLWW), which in turn implies that LR ∈ L(det-left-mon-RLWW). Thus, by applying Lemma 1 of [11] again we see that L is accepted by a monotone deterministic RLWW-automaton, that is, L(det-mon-s-RLWW) = L(det-mon-RL) follows. Thus, in order to complete the proof of Theorem 6.1 it remains to show that each det-mon-nf-s-RLWW-automaton can be simulated by some det-mon-sRLWW-automaton. To this end we use a modification of the proof of the corresponding result for nondeterministic shrinking RRWW-automata from [10]. So let M = (Q, , , c, $, q0 , k, δ) be a monotone deterministic non-forgetting s-RLWWautomaton accepting a language L ⊆ ∗ . To simplify the discussion below we assume that each rewrite transition of M has the form u → v for some word v ∈ {λ, c, $} (see Proposition 2.3). We describe an s-RLWW-automaton M = (Q , , , c, $, q0 , k, δ ) that simulates M. In fact, it will take M two cycles to simulate a single cycle of M: first M simulates the first part of the actual cycle of M, which is the part from the corresponding restarting configuration to the execution of the corresponding rewrite step, and then in a second cycle M will simulate the second part of the cycle of M from after the execution of the rewrite step to the restart step. In the first of these cycles M stores the state that M reaches by executing the corresponding rewrite step on its tape, and in the second of these cycles M replaces this state of M by the restart state for the next cycle of M. Accordingly, we take = ∪ ( × Q × {0, 1}), and let π be the morphism defined by a → a and [a, q, i] → a for all a ∈ , all q ∈ Q, and all i ∈ {0, 1}. An initial configuration q0 cw$ of M, where w ∈ ∗ , is encoded by the initial configuration q0 cw$ of M . A general restarting configuration qcw$ of M is encoded by restarting configurations of M of the form q0 cu[a, q, 0]v$, where
366
Theory Comput Syst (2011) 48: 343–373
u ∈ ∗ , a ∈ , and v ∈ ∗ such that w = π(uav), that is, the actual state q in which M restarts is encoded on the tape within the rightmost symbol from . Finally, restarting configurations of M of the form q0 cu[a, q, 1]v$, where u ∈ ∗ , a ∈ , and v ∈ ∗ , are used to encode the configuration cπ(u)aqv$ of M that is reached by M by performing a rewrite operation on a suffix of π(u)a that puts M into state q. Assume that M is in a restarting configuration of the form q0 cx$. As M is an s-RLWW-automaton, it can begin the current cycle by first checking whether there is a code symbol [a, q, i] on its tape. In the negative it knows that the current restarting configuration corresponds to the restarting configuration q0 cx$ of M. In the affirmative, that is, if x = u[a, q, i]v for some u ∈ ∗ , a ∈ , and v ∈ ∗ , it determines the rightmost occurrence of a symbol [a, q, i] ∈ on the tape. If i = 0, then it knows that it has to simulate the first part of a cycle of M that begins with the restarting configuration qc π(x)$. Finally, if i = 1, then it has to simulate the second part of a cycle of M starting from the configuration cπ(u)aqv$. If M has to simulate the first part of a cycle of M, then it simulates the step-bystep behaviour of M on the tape content cπ(x)$. When it reaches the application of a rewrite operation (p, u) → (q , vb), then it rewrites the factor of x encoding u by the word v[b, q , 1] and restarts. If M has to simulate the second part of a cycle of M, then it simulates the step-by-step behaviour of M on the tape content cπ(x)$ starting from the configuration cπ(u)aqv$ of M. When it reaches the application of a restart ˆ then it returns to the symbol [a, q, 1] and replaces it operation (p, u) → (Restart, q), by the symbol [a, q, ˆ 0]. Of course, if M would accept during the first or second part of the cycle being simulated, then so does M . As M is monotone, each computation of M is monotone as well. Indeed monotonicity of M ensures that the rightmost code symbol [a, q, i] contains the correct information about the state of the cycle of M being simulated. It now follows easily that L(M ) = L(M). Also from a weight function witnessing that M is shrinking one can easily construct a weight function with respect to which M is shrinking. Next we extend Theorem 6.1 by showing that monotone deterministic nonforgetting RRWW-automata are just as expressive as monotone deterministic nonforgetting shrinking RLWW-automata. Theorem 6.2 L(det-mon-nf-RRWW) = L(det-mon-RL). Proof Obviously, L(det-mon-nf-RRWW) ⊆ L(det-mon-nf-s-RLWW), which yields the inclusion L(det-mon-nf-RRWW) ⊆ L(det-mon-RL) by Theorem 6.1. For establishing the converse inclusion, we consider a monotone deterministic RL-automaton M accepting a language L ⊆ ∗ . Then there exists a monotone (non(1) deterministic) RR-automaton M1 = (Q1 , , , c, $, q0 , k, δ1 ) such that the cycle c c relations M and M1 coincide, and L(M1 ) = L (see [28]). Even though it is nondeterministic, M1 is correctness preserving, that is, if w ∈ L(M1 ) and w cM1 w hold, then necessarily w ∈ L(M1 ) follows. This is an immediate consequence of the fact that M1 executes exactly the same cycles as M, and the assumption that M is deterministic. As M1 is an RR-automaton, it can be described by a finite set of
Theory Comput Syst (2011) 48: 343–373
367
rewriting meta-instructions (Ei , ui → vi , Fi ), i = 1, . . . , m, and an accepting metainstruction (G, Accept). By Proposition 2.3 we may assume that vi ∈ {λ, c, $} for all i = 1, . . . , m. We now describe a monotone deterministic non-forgetting RRWW-automaton (2) M2 = (Q2 , , , c, $, q0 , k, δ2 ) that simulates M1 . The automaton M2 will systematically search for a factor of the current tape content to which a rewriting metainstruction of M1 applies. It will use auxiliary symbols to mark the position at which to look for the lefthand side of a rewrite step ui → vi , and it will use its ability to restart in a non-initial state to carry information on the suffix of the current tape content to the next cycle. Actually, as all rewrite steps of M2 must be length-reducing, M2 will encode the prefix up to the position of the next application of a rewrite step of M1 by replacing pairs of tape symbols by new code symbols. Accordingly, we choose := 2 ∪ , where 2 := { [a, b] | a, b ∈ }. By ϕ we denote the morphism from ∗ onto ∗ that is induced by the mapping a → a and [a, b] → ab for all a, b ∈ . Each restarting configuration of M2 will be of the form qc[a1 , a2 ][a3 , a4 ] · · · [a2s−1 , a2s ]b1 b2 · · · bn $, where [a2i−1 , a2i ] ∈ 2 , 1 ≤ i ≤ s, and bj ∈ , 1 ≤ j ≤ n. This is to express the fact that no rewrite step of M1 applies to any proper prefix of a1 · · · a2s . However, there are some technical problems with this approach that we must overcome: • a rewrite operation ui → vi may rewrite a factor at an odd position; • it may remove an even or an odd number of symbols; • in fact, it may remove a single symbol only. To simplify the presentation we first assume that each rewrite operation ui → vi of M1 shortens the tape content by at least two symbols, that is, |ui | − |vi | ≥ 2 for all i = 1, . . . , m. Afterwards we will consider the case that |ui | − |vi | = 1 holds for some of the rewrite operations of M1 . Let qcx ay$ be the current restarting configuration of M2 , where x ∈ 2∗ , a ∈ , and y ∈ ∗ . Starting from this configuration M2 will proceed as follows. We distinguish three cases. Case 1. First assume that x = [a1 , a2 ] · · · [a2s−1 , a2s ] ∈ 2∗ , that y = bdz for some letters b, d ∈ , and that no rewrite step of M1 applies to the suffix of the word (2) ϕ(x ) or ϕ(x )a. Then q = q0 . In this situation M2 moves its read/write window to the right, simultaneously simulating finite-state acceptors for the regular languages L(Ej ) · uj , j = 1, . . . , m, and L(G) until its window contains the suffix x2 of x of length k/2 and the factor ab of length 2. Then M2 replaces the factor ab by the symbol [a, b], and it stores all pairs of the form (0, l) in its finite control for which cϕ(x )ab ∈ L(El ) · ul holds. Next it moves its head one more step to the right, thereby simulating finite-state acceptors for those languages L(Fl ) for which the pair (0, l) has been stored, and continuing to simulate the finite-state acceptors for the regular languages L(Ej ) · uj , j = 1, . . . , m, and L(G). Then it stores all pairs of the form (1, l) in its finite control for which cϕ(x )abd ∈ L(El ) · ul holds. Thereafter M2 moves its read/write window further to the right, thereby simulating finite-state acceptors for those languages L(Fl ) for which the pair (0, l) or (1, l) has been stored
368
Theory Comput Syst (2011) 48: 343–373
in the finite control, and it keeps on simulating the finite-state acceptor for L(G). At the right end of the tape it halts and accepts, if cϕ(x )ay$ ∈ L(G) holds, that is, if r M1 would accept at this point. Otherwise, M2 restarts in state q(0,j ) , if it has verified r that cϕ(x )ab ∈ L(Ej ) · uj and dz$ ∈ L(Fj ) hold, it restarts in state q(1,j ) , if it has verified that cϕ(x )abd ∈ L(Ej ) · uj and z$ ∈ L(Fj ) hold, or it restarts in state q0(2) if no such index exists. If there are more than one index with the above properties, then the smallest of these indices is chosen. Recall that M1 is correctness preserving, which implies that any of the applicable meta-instructions can be used in an accepting computation. Case 2. Assume that x = [a1 , a2 ] · · · [a2j −1 , a2j ], where a1 a2 · · · a2j −1 a2j = x1 ui such that cx1 ∈ L(Ei ) and ay$ ∈ L(Fi ) for some i ∈ {1, . . . , m}, that is, the i-th rewriting meta-instruction of M1 is applicable at the border between the factors x1 ui r . In this situation M moves its read/write window and ay. Then q is of the form q0,i 2 to the suffix of x , simultaneously simulating finite-state acceptors for the regular languages L(Ej ) · uj , j = 1, . . . , m, and L(G) until its window contains the suffix x2 of x of length k/2, and applies the rewrite operation ui → vi . This rewrite operation is executed by replacing the shortest suffix of x containing the encoded form of ui by an encoding of vi . Here we must again distinguish between several cases depending on the parities of |ui | and |vi |. If both |ui | and |vi | are even, that is, |ui | = 2r and |vi | = 2r for some 1 ≤ r < r, then the encoded form of ui is a suffix [c1 , c2 ] · · · [c2r−1 , c2r ] of x , which is then simply replaced by the corresponding encoded form of vi of length r . If both |ui | and |vi | are odd, that is, |ui | = 2r + 1 and |vi | = 2r + 1 for some 1 ≤ r < r, then there is a letter c0 ∈ such that the suffix [c0 , c1 ][c2 , c3 ] · · · [c2r , c2r+1 ] of x is the encoded form of c0 ui . This suffix of length r + 1 is then replaced by the encoded form [c0 , b1 ][b2 , b3 ] · · · [b2r , b2r +1 ] of c0 vi of length r + 1. If |ui | is even and |vi | is odd, that is, |ui | = 2r and |vi | = 2r − 1 for some 1 ≤ r < r, then the encoded form of ui is a suffix [c1 , c2 ] · · · [c2r−1 , c2r ] of x , which is then replaced by the encoded form [b1 , b2 ] · · · [b2r −3 , b2r −2 ]b2r −1 of vi of length r . Finally, if |ui | is odd and |vi | is even, that is, |ui | = 2r + 1 and |vi | = 2r for some 1 ≤ r < r, then there is a letter c0 ∈ such that the suffix [c0 , c1 ][c2 , c3 ] · · · [c2r , c2r+1 ] of x is the encoded form of c0 ui . This suffix of length r + 1 is then replaced by the encoded form [c0 , b1 ][b2 , b3 ] · · · [b2r −2 , b2r −1 ]b2r of c0 vi of length r + 1. In the first two subcases M2 stores all pairs of the form (0, l) in its finite control for which the rewritten form x of x satisfies the condition that cϕ(x ) ∈ L(El ) · ul holds. Then it moves its head one more step to the right, thereby simulating finitestate acceptors for those languages L(Fl ) for which the pair (0, l) has been stored, and continuing to simulate the finite-state acceptors for the regular languages L(Ej ) · uj , j = 1, . . . , m, and L(G). Next it stores all pairs of the form (1, l) in its finite control for which cϕ(x )a ∈ L(El ) · ul holds. In the latter two subcases M2 only stores all pairs of the form (1, l) in its finite control for which the rewritten form x b2r −1 of x satisfies the condition that cϕ(x ) ∈ L(El ) · ul holds. Thereafter M2 moves its read/write window further to the right, thereby simulating finite-state acceptors for
Theory Comput Syst (2011) 48: 343–373
369
those languages L(Fl ) for which the pair (0, l) or (1, l) has been stored in its finite control, and it keeps on simulating the finite-state acceptor for L(G). At the right end of the tape M2 halts and accepts, if cϕ(x )ay$ ∈ L(G) holds; otherwise, it restarts r in state q(0,j ) , if it has verified that cϕ(x ) ∈ L(Ej ) · uj and ay$ ∈ L(Fj ) hold, it r restarts in state q(1,j ) , if it has verified that cϕ(x )a ∈ L(Ej ) · uj and y$ ∈ L(Fj ) or cϕ(x )b2r −1 ∈ L(Ej ) · uj and ay$ ∈ L(Fj ) hold, and it restarts in state q0 , if no such index exists. (2)
Case 3. Assume that x = [a1 , a2 ] · · · [a2j −1 , a2j ], where a1 a2 · · · a2j −1 a2j a = x1 ui such that cx1 ∈ L(Ei ) and y$ ∈ L(Fi ) for some i ∈ {1, . . . , m}, that is, the i-th rewriting meta-instruction of M1 is applicable at the border between the factors x1 ui and y. r . In this situation M moves its read/write window to the sufThen q is of the form q1,i 2 fix of x a, simultaneously simulating finite-state acceptors for the regular languages L(Ej ) · uj , j = 1, . . . , m, and L(G) until its window contains the suffix x2 of x of length k/2, and applies the rewrite operation ui → vi . This rewrite operation is executed by replacing the shortest suffix of x a containing the encoded form of ui by an encoding of vi . Here we must again distinguish between several cases depending on the parities of |ui | and |vi |. If both |ui | and |vi | are odd, that is, |ui | = 2r + 1 and |vi | = 2r + 1 for some 1 ≤ r < r, then the encoded form of ui is a suffix [c1 , c2 ] · · · [c2r−1 , c2r ]a of x a of length r + 1, which is then simply replaced by the corresponding encoded form [b1 , b2 ] · · · [b2r −1 , b2r ]b2r +1 of vi of length r + 1. If both |ui | and |vi | are even, that is, |ui | = 2r and |vi | = 2r for some 1 ≤ r < r, then there is a letter c0 ∈ such that the suffix [c0 , c1 ][c2 , c3 ] · · · [c2r , c2r+1 ]a of x a is the encoded form of c0 ui . This suffix of length r +1 is then replaced by the encoded form [c0 , b1 ][b2 , b3 ] · · · [b2r −2 , b2r ]b2r +1 of c0 vi of length r + 1. If |ui | is odd and |vi | is even, that is, |ui | = 2r + 1 and |vi | = 2r for some 1 ≤ r < r, then the encoded form of ui is a suffix [c1 , c2 ] · · · [c2r−1 , c2r ]a of x a of length r + 1, which is then replaced by the encoded form [b1 , b2 ] · · · [b2r −1 , b2r ] of vi of length r . Finally, if |ui | is even and |vi | is odd, that is, |ui | = 2r and |vi | = 2r − 1 for some 1 ≤ r < r, then there is a letter c0 ∈ such that the suffix [c0 , c1 ][c2 , c3 ] · · · [c2r , c2r+1 ]a of x a is the encoded form of c0 ui . This suffix of length r + 1 is then replaced by the encoded form [c0 , b1 ][b2 , b3 ] · · · [b2r −2 , b2r −1 ] of c0 vi of length r . In the latter two subcases M2 stores all pairs of the form (0, j ) in its finite control for which the rewritten form x of x satisfies the condition that cϕ(x ) ∈ L(Ej ) · uj holds. Then it moves its head one more step to the right, thereby simulating finitestate acceptors for those languages L(Fl ) for which the pair (0, l) has been stored, and continuing to simulate the finite-state acceptors for the regular languages L(Ej ) · uj , j = 1, . . . , m, and L(G). Next it stores all pairs of the form (1, l) in its finite control for which cϕ(x )b ∈ L(Ej ) · uj holds, where y = bz. In the former two subcases M2 only stores all pairs of the form (1, l) in its finite control for which the rewritten form x b2r +1 of x a satisfies the condition that cϕ(x )b2r +1 ∈ L(El ) · ul holds. Thereafter M2 moves its read/write window further to the right, thereby simulating
370
Theory Comput Syst (2011) 48: 343–373
finite-state acceptors for those languages L(Fl ) for which the pair (0, l) or (1, l) has been stored in its finite control, and it keeps on simulating the finite-state acceptor for L(G). At the right end of the tape M2 halts and accepts, if cϕ(x )ay$ ∈ L(G) r holds; otherwise, it restarts in state q(0,l) , if it has verified that cϕ(x ) ∈ L(El ) · ul and r , if it has verified that cϕ(x )a ∈ L(El ) · ul ay$ ∈ L(Fl ) hold, it restarts in state q(1,l) and y$ ∈ L(Fl ) or cϕ(x )b2r +1 ∈ L(El ) · ul and y$ ∈ L(Fl ) hold, or it restarts in state q0(2) , if no such index exists. As M1 is monotone and correctness preserving, it follows from the above description that M2 simulates accepting computations of M1 cycle by cycle. Thus, L(M2 ) = L(M1 ) holds. It remains to deal with the case that |ui | − |vi | = 1 for some of the rewrite operations of M1 . In some of the cases considered above the realization of the rewrite operation ui → vi on the encoded part of the tape content is length-preserving, if |ui | − |vi | = 1 holds. Thus, for these rewrite operations we need to ensure that either they are combined with an additional encoding of a still unencoded prefix of y, or that they are combined with a subsequent rewrite operation that applies at the same position or at the next position to the right. However, this can be realized by simulating the meta-instructions of M1 concurrently for the current tape content ϕ(x )ay and for the new tape content ϕ(x )ay, respectively ϕ(x )y. In this way a sequence of two rewrite operations at the same (or at neighbouring positions) can be detected, which is then simulated by a single rewrite operation of M2 . Thus, we see that each monotone correctness preserving RR-automaton can be simulated by a monotone deterministic non-forgetting RRWW-automaton, which completes the proof of Theorem 6.2. In [26] it has been shown that the class LRR of left-to-right regular languages ˇ of Culik and Cohen [3] is characterized by monotone deterministic RL-automata, and that this language class is properly contained in the intersection of the class of Church-Rosser languages with the class of context-free languages. Thus, in summary we obtain the following results. Corollary 6.3 LRR = L(det-mon-RL) = L(det-mon-nf-s-RLWW) = L(det-mon-nf-RRWW) CRL ∩ CFL.
7 Conclusion In this paper we have derived several results on the expressive power of some variants of deterministic non-forgetting restarting automata. The diagram in Fig. 1 summarizes the inclusion and equality results obtained. As we see, deterministic nonforgetting restarting automata are in general more powerful than their standard counterparts. In fact, equality in expressive power between a type of deterministic nonforgetting restarting automaton and the corresponding type of (forgetting) restarting automaton only holds in three of the cases considered here: for monotone deter-
Theory Comput Syst (2011) 48: 343–373
371
Fig. 1 Inclusion relations between language classes defined by various types of deterministic restarting automata. An arrow denotes a proper inclusion, and a dotted arrow denotes an inclusion that is not known to be proper
ministic R(W)(W)-automata, which form the bottom level of our hierarchy, for deterministic shrinking RLWW-automata, which form the top level of our hierarchy, and for monotone deterministic RLWW-automata, which characterize the class of leftto-right regular languages. Thus, deterministic non-forgetting restarting automata are certainly a very interesting variant of restarting automata, although, as seen in Sect. 3, already for det-nf-R-automata many algorithmic problems are undecidable. The most interesting results about these types of non-forgetting restarting automata are the separation of L(det-mon-nf-RWW) from L(det-mon-nf-RRWW), and the fact that the latter class coincides with the class of left-to-right regular languages. For deterministic non-forgetting restarting automata that are not monotone some open questions remain. Currently it is open whether or not deterministic nonforgetting RRWW-automata are more expressive than the corresponding RWWautomata. Further, it remains open whether the inclusions of L(det-RLWW) and of L(det-nf-RRWW) in L(det-nf-RLWW) are proper. Also the relationship between the class L(det-RLWW) on the one hand and the classes L(det-nf-RWW) and L(det-nf-RRWW) on the other hand remains open. Finally, it is open whether deterministic (non-forgetting or forgetting) shrinking RLWW-automata have more expressive power than deterministic non-forgetting RLWW-automata. Observe that the technique used in the proof of Theorem 6.2 cannot be applied to simulate a deterministic shrinking RLWW-automaton by a deterministic non-forgetting RLWW-automaton, as the restrictions in locality of subsequent applications of rewrite steps enforced by monotonicity play a crucial role in that proof.
372
Theory Comput Syst (2011) 48: 343–373
Non-forgetting restarting automata can also be seen as a particular case of cooperating distributed systems of restarting automata, where a finite number of restarting automata of a certain type work together to analyse a sentence [17, 19]. This cooperation is coordinated by a mode of operation just as in CD systems of grammars (see, e.g., [5]). In particular, it has been shown in [20] that deterministic non-forgetting restarting automata correspond to globally deterministic CD systems of restarting automata. Accordingly, the results obtained in the present paper can also be formulated in terms of these CD systems.
References 1. Buntrock, G.: Wachsende kontext-sensitive Sprachen. Habilitationsschrift, Fakultät für Mathematik und Informatik, Universität Würzburg (1996) 2. Buntrock, G., Otto, F.: Growing context-sensitive languages and Church-Rosser languages. Inf. Comput. 141, 1–36 (1998) ˇ 3. Culik, K., II, Cohen, R.: LR-regular grammars—an extension of LR(k) grammars. J. Comput. Syst. Sci. 7, 66–96 (1973) 4. Dahlhaus, E., Warmuth, M.: Membership for growing context-sensitive grammars is polynomial. J. Comput. Syst. Sci. 33, 456–472 (1986) 5. Dassow, J., P˘aun, G., Rozenberg, G.: Grammar systems. In: Rozenberg, G., Salomaa, A. (eds.) Handbook of Formal Languages, vol. 2, pp. 155–213. Springer, Berlin (1997), 6. Janˇcar, P., Mráz, F., Plátek, M., Vogel, J.: Restarting automata. In: Reichel, H. (ed.) FCT 1995, Proc. Lecture Notes in Computer Science, vol. 965, pp. 283–292. Springer, Berlin (1995) 7. Janˇcar, P., Mráz, F., Plátek, M., Vogel, J.: Restarting automata with rewriting. In: Jeffery, K.G., Král, J., Bartošek, M. (eds.) Theory and Practice of Informatics, SOFSEM’96, Proc. Lecture Notes in Computer Science, vol. 1175, pp. 401–408. Springer, Berlin (1996) 8. Janˇcar, P., Mráz, F., Plátek, M., Vogel, J.: On monotonic automata with a restart operation. J. Autom. Lang. Comb. 4, 287–311 (1999) 9. Jurdzi´nski, T., Otto, F.: Restarting automata with restricted utilization of auxiliary symbols. Theor. Comput. Sci. 363, 162–181 (2006) 10. Jurdzi´nski, T., Otto, F.: Shrinking restarting automata. Int. J. Found. Comput. Sci. 18, 361–385 (2007) 11. Jurdzi´nski, T., Mráz, F., Otto, F., Plátek, M.: Monotone deterministic RL-automata don’t need auxiliary symbols. In: De Felice, C., Restivo, A. (eds.) DLT 2005, Proc. Lecture Notes in Computer Science, vol. 3572, pp. 284–295. Springer, Berlin (2005) 12. Jurdzi´nski, T., Mráz, F., Otto, F., Plátek, M.: Degrees of non-monotonicity for restarting automata. Theor. Comput. Sci. 369, 1–34 (2006) 13. Lautemann, C.: One pushdown and a small tape. In: Wagner, K.W. (ed.) Dirk Siefkes zum 50. Geburtstag, pp. 42–47. Technische Universität Berlin and Universität Augsburg (1988) 14. Li, M., Vitanyi, P.: An Introduction to Kolmogorov Complexity and Its Applications. Springer, Berlin (1997) 15. Lopatková, M., Plátek, M., Sgall, P.: Towards a formal model for functional generative description: Analysis by reduction and restarting automata. Prague Bull. Math. Linguist. 87, 7–26 (2007) 16. McNaughton, R., Narendran, P., Otto, F.: Church-Rosser Thue systems and formal languages. J. Assoc. Comput. Mach. 35, 324–344 (1988) 17. Messerschmidt, H.: CD-systems of restarting automata. Doctoral Dissertation, Fachbereich Elektrotechnik/Informatik, Universität Kassel (2008) 18. Messerschmidt, H., Otto, F.: On non-forgetting restarting automata that are deterministic and/or monotone. In: Grigoriev, D., Harrison, J., Hirsch, E.A. (eds.) CSR 2006, Proc. Lecture Notes in Computer Science, vol. 3967, pp. 247–258. Springer, Berlin (2006) 19. Messerschmidt, H., Otto, F.: Cooperating distributed systems of restarting automata. Int. J. Found. Comput. Sci. 18, 1333–1342 (2007) 20. Messerschmidt, H., Otto, F.: On deterministic CD-systems of restarting automata. Int. J. Found. Comput. Sci. 20, 185–209 (2009) 21. Narendran, P.: Church-Rosser and related Thue systems. PhD thesis, Rensselaer Polytechnic Institute, Troy, New York (1984)
Theory Comput Syst (2011) 48: 343–373
373
22. Niemann, G., Otto, F.: Further results on restarting automata. In: Ito, M., Imaoka, T. (eds.) Words, Languages and Combinatorics III, Proc. pp. 353–369. World Scientific, Singapore (2003) 23. Niemann, G., Otto, F.: The Church-Rosser languages are the deterministic variants of the growing context-sensitive languages. Inf. Comput. 197, 1–21 (2005) 24. Oliva, K., Kvˇetoˇn, P., Ondruška, R.: The computational complexity of rule-based part-of-speech tagging. In: Matoušek, V., Mautner, P. (eds.) TSD 2003, Proc. Lecture Notes in Computer Science, vol. 2807, pp. 82–89. Springer, Berlin (2003) 25. Otto, F.: Restarting automata and their relations to the Chomsky hierarchy. In: Ésik, Z., Fülöp, Z. (eds.) DLT 2003, Proc. Lecture Notes in Computer Science, vol. 2710, pp. 55–74. Springer, Berlin (2003) 26. Otto, F.: Left-to-right regular languages and two-way restarting automata. RAIRO Theor. Inf. Appl. 43, 653–665 (2009) 27. Otto, F.: Restarting automata. In: Ésik, Z., Martin-Vide, C., Mitrana, V. (eds.) Recent Advances in Formal Languages and Applications. Studies in Computational Intelligence, vol. 25, pp. 269–303. Springer, Berlin (2006) 28. Plátek, M.: Two-way restarting automata and j-monotonicity. In: Pacholski, L., Ružiˇcka, P. (eds.) SOFSEM’01, Proc. Lecture Notes in Computer Science, vol. 2234, pp. 316–325. Springer, Berlin (2001)