Machine Translation 14: 217–230, 1999. © 2001 Kluwer Academic Publishers. Printed in the Netherlands.
Anaphora and Translation Discrepancies in Russian–German MT
STEFANIE GELDBACH
SAIL Labs GmbH, Grünbergerstr. 54, 10245 Berlin, Germany (E-mail: [email protected])
Abstract. Anaphora resolution in machine translation involves two aspects: (1) the identification of the antecedent, i.e., the determination of co-reference relations between anaphor and antecedent; and (2) the translation of the anaphor, i.e., the selection of the appropriate target-language equivalent. The identification of the antecedent is essentially a monolingual, language-pair independent problem which is usually solved during analysis. The selection of the target-language equivalent, on the other hand, can be regarded as a language-pair dependent task which has to be tackled during transfer and generation. In this paper, the problems of anaphora translation are discussed for the language pair Russian–German. Although in most cases source-language anaphoric pronouns correspond to target-language anaphoric pronouns, in some cases this straightforward equation does not hold. Two cases of such translation discrepancies are treated here: zero anaphora and pronominal PPs. The differences in the distribution of zero anaphora and pronominal PPs in Russian and German are described, and solutions to these translation problems based on the Russian–German MT system T1 are presented.
Key words: anaphora resolution, Russian–German MT, zero anaphora
1. Introduction
Over the last few years, anaphora resolution has received considerable attention in various fields of natural language processing such as machine translation (MT) or information extraction (for a survey of recent work see Mitkov and Boguraev 1997). In MT, the translation of anaphoric pronouns consists of two steps: the identification of the antecedent (i.e., anaphora resolution proper) and the lexical transfer of the pronoun. Whereas the resolution of the anaphoric link can be regarded as a language-pair independent task whose outcome is not influenced by the respective target language, the translation of the anaphor is language-pair dependent. The anaphora-resolution algorithm will produce the same results irrespective of whether the target language is German, English or French. The actual translation of the anaphor, however, will have to take language-pair-specific peculiarities into account. Such translation discrepancies, as they are termed by Mitkov and Schmidt (1998), may be caused by general differences in the pronominal systems of source and target language or by typological differences such as the presence or absence of elliptical zero subject constructions. Much of the work in MT has focused on
the development of operational anaphora-resolution algorithms which identify the antecedents of pronominal anaphora. Difficulties with the lexical transfer of pronominal anaphora have hardly been addressed in the MT literature. This might be due to the fact that, in comparison to the highly complex task of resolving anaphoric links, the actual translation seems rather trivial. However, even if the anaphoric link has been successfully resolved, the translation of anaphoric pronouns is not always a straightforward task.
This paper deals with the lexical transfer of pronominal anaphora for the language pair Russian–German. A detailed discussion of the anaphora-resolution algorithm for Russian which has been designed at SAIL Labs GmbH, Berlin for the Russian analysis module of T1 (T1RuDe) can be found in Geldbach (1997).¹ The algorithm can be classified as a knowledge-based approach which draws on various anaphora-resolution factors, partly implemented as constraints (gender and number agreement, c-command constraints), partly implemented as preferences (e.g., selectional restrictions, syntactic parallelism, semantic hierarchy, distance, functional sentence perspective, weight of syntactic roles). These factors are used in order to select the best match among a number of antecedent candidates (for similar algorithms see for example Rich and LuperFoy 1988; Hauenschild et al. 1993; or Lappin and Leass 1994). Here we presuppose that the anaphora identification has been carried out successfully during Russian analysis and deal only with transfer problems. In the following sections, we will discuss cases where the Russian anaphora on, ona, ono cannot be simply equated with their German counterparts er, sie, es and/or their inflectional forms. The inflectional paradigms of the Russian and German anaphoric pronouns are reproduced in Tables I and II.
Table I. Inflectional paradigm of the Russian anaphoric pronouns
               Masculine   Feminine   Neuter    Plural
Nominative     on          ona        ono       oni
Genitive       (n)ego      (n)ee      (n)ego    (n)ih
Dative         (n)emu      (n)ej      (n)emu    (n)im
Accusative     (n)ego      (n)ee      (n)ego    (n)ih
Instrumental   (n)im       (n)ej      (n)im     (n)imi
Prepositive    nem         nej        nem       nih
At first glance, there is a striking one-to-one correspondence between Russian and German: both languages distinguish three genders in the singular and neutralize this gender distinction in the plural. It will become clear, however, that identical form does not necessarily entail identical function. We will consider two cases of translation discrepancies in Russian–German anaphora translation, namely Russian zero anaphora (Section 2) and pronominal prepositional phrases (PPs) (Section 3)
Table II. Inflectional paradigm of the German anaphoric pronouns
              Masculine   Feminine   Neuter   Plural
Nominative    er          sie        es       sie
Genitive      seiner      ihrer      seiner   ihrer
Dative        ihm         ihr        ihm      ihnen
Accusative    ihn         sie        es       sie
and will suggest practical solutions to these translation problems which are based on the system T1RuDe but should be applicable to other systems as well.
2. Russian Zero Anaphora and their Translation into German
In contrast to other Slavonic languages such as Czech or Polish, Russian is not a typical pro-drop language. Although at least in the present tense Russian verbs are unambiguously marked for person and number, pronominal subjects are rarely dropped in written Russian texts. Nevertheless, there are some constructions and text types where the omission of pronominal subjects is accepted by the Russian literary norm as well. For example, first-person pronouns are regularly omitted in letters, especially with performative verbs as in (1).² The elliptical subject is marked by φ.
(1)
φ Nadeemsja (1st pers. pl.), čto naša pros'ba budet vypolnena.
Wir hoffen, daß unsere Bitte erfüllt wird.
‘We hope that our request will be met.’
Elliptical subjects are also quite common if the zero subject of the subclause is co-referent with the subject of the main clause as in (2) or (3).
(2)
Vy_i uvereny, čto φ_i hotite (2nd pers. pl.) uničtožit' perevod?
Sind Sie_i überzeugt, daß Sie_i die Übersetzung vernichten wollen?
‘Are you_i sure that you_i want to delete the translation?’
(3)
Nastojaščee rukovodstvo_i javljaetsja universal'nym soprovoždeniem programmy, tak kak φ_i opisyvaet (3rd pers. sg.) vse osnovnye ėlementy.
Das vorliegende Handbuch_i ist eine universelle Begleitung des Programms, da es_i alle grundlegenden Elemente beschreibt.
‘The included manual_i is a universal accompaniment to the program, since it_i describes all basic elements.’
In the present tense, the grammatical person of Russian verbs can be determined by the inflectional endings. In the past tense, however, Russian verbs are
marked only for gender and number. As a result, elliptical constructions in the past tense are often highly ambiguous, which explains their low frequency in written texts. Without the context it is not even possible to determine whether personal or anaphoric pronouns have been omitted. Since in German pronominal subjects are obligatory, Russian zero subjects have to be replaced with an overt pronoun in the German translation regardless of whether the missing pronoun is a personal pronoun as in (1) and (2) or an anaphoric pronoun as in (3). In the following, we will discuss how zero anaphora can be handled in MT and in what respect the treatment of zero anaphora differs from the treatment of overt anaphora. Intra- and intersentential zero anaphora will be considered separately.
2.1. INTRASENTENTIAL ZERO ANAPHORA
In T1RuDe, the resolution of intrasentential zero anaphora differs slightly from the resolution of pronominal anaphora (Geldbach and Höser 1997). In contrast to pronominal anaphora, where the antecedent is selected from a list of antecedent candidates, the antecedent of intrasentential zero anaphora as in (3) is not determined by the anaphora-resolution algorithm briefly described above. As Russian zero anaphora normally occur only in clearly definable syntactic environments (e.g., subject identity in main clause and subclause), it is safe to propose the subject of the main clause as antecedent of the elliptical subject. We therefore simply insert a copy of the matrix subject into the empty subject slot of the subclause. This is illustrated in the analysis structure of (3) given in Figure 1, where the structure of the subclause (CLS-SUB:173) contains a copy of the subject of the main clause (NP:135).
Figure 1. Analysis structure for (3).
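The copying step can be pictured with a small sketch. It is not the T1RuDe implementation: the `Node` class, the `role` attribute and the function names are assumptions made for illustration, and only subclauses that are direct daughters of the main clause are handled.

```python
import copy
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Node:
    label: str                      # category, e.g. "CLS", "CLS-SUB", "NP", "VP"
    text: str = ""                  # surface form, for display only
    role: Optional[str] = None      # grammatical function, e.g. "SUBJ" (assumed attribute)
    children: List["Node"] = field(default_factory=list)


def find_subject(clause: Node) -> Optional[Node]:
    """Return the subject NP that is a direct daughter of the clause, if any."""
    for child in clause.children:
        if child.label == "NP" and child.role == "SUBJ":
            return child
    return None


def fill_zero_subjects(main_clause: Node) -> int:
    """Copy the matrix subject into each subject-less subclause; return the count."""
    matrix_subject = find_subject(main_clause)
    if matrix_subject is None:
        return 0
    filled = 0
    for child in main_clause.children:
        if child.label == "CLS-SUB" and find_subject(child) is None:
            # A deep copy keeps the inserted NP independent of the original node.
            child.children.insert(0, copy.deepcopy(matrix_subject))
            filled += 1
    return filled


# Strongly simplified structure for example (3).
subclause = Node("CLS-SUB", children=[Node("VP", text="opisyvaet")])
main = Node("CLS", children=[
    Node("NP", text="rukovodstvo", role="SUBJ"),
    Node("VP", text="javljaetsja"),
    subclause,
])
fill_zero_subjects(main)
print(find_subject(subclause).text)
```

Because the inserted node is a copy, a later generation step could pronominalize it without touching the matrix subject.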
In the corresponding German transfer tree (Figure 2) the second occurrence of NP:135 has been replaced by a pronoun. The pronominalization of the inserted NP takes place after lexical transfer, i.e., during German generation when the target-language agreement features of the antecedent have already been determined. Thus it is ensured that the appropriate German agreement features can be generated for the German pronoun.
Figure 2. Transfer structure for (3).
The situation is slightly more complicated in anaphoric chains, i.e., when the immediate antecedent of the zero anaphor is also a pronoun as in (4).
(4)
Izučaja ėtu stat'ju_i, nužno imet' jasnoe predstavlenie o ee funkcii. Ona_i ves'ma udobna dlja načinajuščih pol'zovatelej, poskol'ku φ_i pozvoljaet (3rd pers. sg.) srazu perejti k praktičeskoj rabote.
Während man diesen Artikel_i studiert, muß man eine klare Vorstellung von seiner Funktion haben. Er_i ist für die angehenden Nutzer überaus günstig, da er_i erlaubt, sofort zu einer praktischen Arbeit überzugehen.
‘When studying this article_i one has to have a clear idea about its purpose. It_i is very useful for beginning users because it_i allows [them] to start working immediately.’
In this case, the zero pronoun is at first replaced by the pronoun ona, which is the subject of the second sentence. As a result, the syntactic structure of (4) contains two occurrences of the anaphoric pronoun ona, whose antecedent stat'ja ‘article’ will then be determined by regular anaphora resolution. In T1RuDe, intrasentential zero anaphora resolution takes place before the resolution of pronominal anaphora. This offers the additional advantage that the result of zero anaphora resolution can contribute important information to pronominal anaphora resolution, especially to the application of c-command constraints, as illustrated in (5). The analysis structure for (5) is given in Figure 3.
(5)
Glagol_i imeet raznye perevody, kogda φ_i upotrebljaetsja s prjamym dopolneniem_j ili bez nego_j.
Das Verb_i hat verschiedene Übersetzungen, wenn es_i mit einem direkten Objekt_j oder ohne es_j verwendet wird.
‘The verb_i has various translations, when it_i is used with a direct object_j or without one_j.’
Figure 3. Analysis structure for (5).
During anaphora resolution, c-command constraints are used in order to define certain domains in the syntactic structure, the so-called “minimal governing categories” (MGCs), where the anaphor cannot be bound by its antecedent. In the analysis structure of T1RuDe, this domain is usually marked by the first clausal node (CLS, CLS-SUB or CLS-REL) dominating the anaphor. In (5), the MGC of the anaphoric pronoun nego is restricted by the CLS-SUB:167 node. The potential antecedent of nego has to be outside this domain. As can be seen in Figure 3, the first occurrence of the NP glagol ‘verb’ in the syntactic structure is immediately dominated by the CLS:192 node and is therefore in a different MGC than the anaphor nego. Thus, it is not possible to discard this occurrence of glagol from the list of potential antecedents for nego on the basis of syntactic constraints. However, the second occurrence of glagol in the analysis tree, which was inserted during zero anaphora resolution, is in the same MGC as the pronoun nego (NP:135) and therefore violates the c-command constraints. This makes it possible to delete all co-referent NPs (among them the first occurrence of glagol in the tree) from the antecedent candidate list. Actually, (5) illustrates an interesting exception to the c-command constraint. As can be seen in Figure 3, the anaphor nego (NP:135) and its antecedent dopolnenie ‘object’ (NP:132) are both dominated by the same CLS-SUB node. If we take the definition of MGC as it has just been given above, this would mean that the NP dopolnenie is not a legitimate antecedent for the pronoun nego. Obviously, the formulation of c-command constraints has to be modified for coordinated structures. Possibly, this deviation can be explained by the fact that
(5) can be regarded as synonymous with (6) where dopolnenie and nego are in different MGCs.
(6)
Glagol_i imeet raznye perevody, kogda φ_i upotrebljaetsja s prjamym dopolneniem_j ili upotrebljaetsja bez nego_j.
Das Verb_i hat verschiedene Übersetzungen, wenn es_i mit einem direkten Objekt_j verwendet wird oder ohne es_j verwendet wird.
‘The verb_i has various translations, when it_i is used with a direct object_j or used without one_j.’
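The filtering of antecedent candidates discussed for (5) can be approximated as follows. This is a sketch under stated assumptions rather than the T1RuDe algorithm: the `Mention` record, the chain labels and in particular the relaxation for coordinated constituents (a candidate in a different conjunct from the anaphor is not treated as sharing its domain) are hypothetical, since the text only notes that some modification for coordination is needed.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Mention:
    lemma: str
    mgc: str                        # nearest dominating clausal node (the MGC)
    chain: Optional[str] = None     # known co-reference chain, e.g. from zero anaphora resolution
    conjunct: Optional[int] = None  # index of the coordinated constituent, if any (assumption)


def same_domain(anaphor: Mention, cand: Mention) -> bool:
    """True if the candidate sits in the anaphor's MGC and is therefore excluded."""
    if cand.mgc != anaphor.mgc:
        return False
    # Assumed relaxation: different conjuncts of one coordination do not count
    # as the same domain, mirroring the paraphrase of (5) as (6).
    if (anaphor.conjunct is not None and cand.conjunct is not None
            and anaphor.conjunct != cand.conjunct):
        return False
    return True


def filter_candidates(anaphor: Mention, candidates: List[Mention]) -> List[Mention]:
    """Drop candidates in the anaphor's MGC and every NP co-referent with them."""
    blocked_chains = {c.chain for c in candidates
                      if c.chain is not None and same_domain(anaphor, c)}
    return [c for c in candidates
            if not same_domain(anaphor, c) and c.chain not in blocked_chains]


# Example (5): the copy of "glagol" was inserted by zero anaphora resolution.
nego = Mention("nego", mgc="CLS-SUB", conjunct=2)
glagol = Mention("glagol", mgc="CLS", chain="i")
glagol_copy = Mention("glagol", mgc="CLS-SUB", chain="i")
dopolnenie = Mention("dopolnenie", mgc="CLS-SUB", conjunct=1)

with_copy = filter_candidates(nego, [glagol, glagol_copy, dopolnenie])
without_copy = filter_candidates(nego, [glagol, dopolnenie])
print([m.lemma for m in with_copy], [m.lemma for m in without_copy])
```

With the inserted copy, only dopolnenie survives; without it, glagol stays in the list and would have to be ruled out by preference scores alone.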
Although anaphora resolution in (5) is further complicated by certain modifications of the c-command constraint for coordinated constituents, (5) also shows that the insertion of elliptical subjects can be crucial for the correct selection of the antecedent. Without zero anaphora resolution the NP glagol would not have violated the c-command constraints. In that case this NP eventually would have been proposed as antecedent for nego due to a high score for such anaphora resolution factors as weight of syntactic role (subject) and functional sentence perspective (topic).
2.2. INTERSENTENTIAL ZERO ANAPHORA
Now let us turn to the treatment of intersentential zero anaphora. In contrast to intrasentential zero anaphora, the antecedent of intersentential zero anaphora is not directly copied into the subject position during syntactic analysis. Instead, a zero pronoun is inserted into the empty subject slot in order to obtain an analysis structure where all obligatory argument slots of the verb are filled. This is illustrated in the analysis structure of (7) given in Figure 4.
(7)
Ivan_i rabotaet v ministerstve. φ_i Často ezdit (3rd pers. sg.) v stolicu.
Ivan_i arbeitet im Ministerium. Er_i fährt häufig in die Hauptstadt.
‘Ivan_i is working in the ministry. He_i often drives to the capital.’
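A minimal sketch of this slot filling, assuming a flat argument frame represented as a dictionary. The frame layout and function names are hypothetical. The candidate forms follow the verb morphology described in Section 2 (person and number in the present tense, gender and number in the past tense); only the singular past forms and the pronoun inventory of standard Russian are covered.

```python
from typing import Dict, Optional, Tuple

# Pronoun forms compatible with a present-tense verb, keyed by (person, number).
PRESENT_FORMS: Dict[Tuple[int, str], Tuple[str, ...]] = {
    (1, "sg"): ("ja",), (2, "sg"): ("ty",), (3, "sg"): ("on", "ona", "ono"),
    (1, "pl"): ("my",), (2, "pl"): ("vy",), (3, "pl"): ("oni",),
}

# Past-tense singular verbs show gender but not person.
PAST_SG_FORMS: Dict[str, Tuple[str, ...]] = {
    "masc": ("ja", "ty", "on"),
    "fem": ("ja", "ty", "ona"),
}


def zero_pronoun_forms(tense: str, person: Optional[int] = None,
                       number: str = "sg", gender: Optional[str] = None) -> Tuple[str, ...]:
    """Return the overt pronouns a zero subject could stand for."""
    if tense == "present" and (person, number) in PRESENT_FORMS:
        return PRESENT_FORMS[(person, number)]
    if tense == "past" and number == "sg" and gender in PAST_SG_FORMS:
        return PAST_SG_FORMS[gender]
    raise ValueError("combination not covered by this sketch")


def fill_subject_slot(frame: dict, **verb_features) -> dict:
    """Return a copy of the frame with a zero pronoun [e] in an empty subject slot."""
    filled = dict(frame)
    if filled.get("SUBJ") is None:
        filled["SUBJ"] = {"form": "[e]", "candidates": zero_pronoun_forms(**verb_features)}
    return filled


# Second sentence of example (7): "Často ezdit v stolicu."
frame = fill_subject_slot({"PRED": "ezdit", "SUBJ": None},
                          tense="present", person=3, number="sg")
print(frame["SUBJ"])
```

The inserted placeholder can then be queued for regular anaphora resolution like an overt pronoun.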
In this case, the zero pronoun, which is marked [e] in the syntactic structure, has the canonical form on/ona/ono ‘he’/‘she’/‘it’. If the verb had been in the past tense, the inserted pronoun would have the canonical form ja/ty/on ‘I’/‘you’/‘he’ or ja/ty/ona ‘I’/‘you’/‘she’ depending on the gender of the verb. These zero pronouns, which are also included in the Russian–German transfer dictionary, can be treated like their overt counterparts on, ona and ono, which means that they are included in the list of anaphoric pronouns to be resolved during regular anaphora resolution. Because of the general syntactic constraints which hold for the use of subject zero anaphora in Russian, zero pronouns are basically easier to resolve than
Figure 4. Analysis structure for (7).
overt ones. The antecedent of intersentential zero anaphora is mostly the subject of the preceding sentence. Although subject candidates also score high during the resolution of overt anaphora, the choice of the antecedent does not depend solely on the syntactic function of the antecedent candidate but is influenced by other anaphora resolution factors as well.
3. The Case of Russian Pronominal PPs
In this section, we will discuss the translation of Russian pronominal PPs. In Russian, pronominalized PPs are much more common than in German where other proforms, notably the so-called “pronominal adverbs” (Pronominaladverbien), may be used instead. Pronominal adverbs are made up of the adverbs da(r), hier or wo(r), and the prepositions an, auf, aus, bei, durch, für, gegen, hinter, in, mit, nach, neben, über, um, unter, von, vor, zu and zwischen. These words are used as substitutes for PPs instead of anaphoric pronouns, demonstrative pronouns, interrogative pronouns and relative pronouns. The collocation preposition + anaphoric pronoun competes with pronominal adverbs of the da-type. The distribution of pronominal adverbs and pronominalized PPs is more or less complementary (Engel 1996). Pronominal PPs are predominantly used in order to refer to animate nouns and pronominal adverbs in order to refer to non-animate nouns, although there are considerable overlaps where both forms are possible. The following rules hold:
− If the antecedent is an individual person, only pronominal PPs are used (8a).
− If the antecedent denotes a group of persons, both forms are possible (8b).
− Usually, pronominal adverbs are used if the antecedent is non-animate. This holds especially for abstract or deverbal nouns (8c).
− Pronominal adverbs are also used if the antecedent is a noun of neuter gender (8d).
− If the speaker has a specific object in mind, however, PPs are possible for non-animates as well (8e).
(8)
a. Anna vermißt ihren Bruder_i sehr und denkt oft an ihn_i (*daran_i).
‘Anna misses her brother_i a lot and often thinks of him_i.’
b. Die Teilnehmer_i kamen aus ganz Europa. Unter ihnen_i (darunter_i) waren auch fünf Russen.
‘The participants_i came from all over Europe. Among them_i were five Russians.’
c. Anna interessiert sich für Politik_i und diskutiert gerne darüber_i (??über sie_i).
‘Anna is interested in politics_i and likes discussing it_i.’
d. Hast du ein Geschenk_i für Anna gekauft? – Nein, ich habe nicht daran_i (??an es_i) gedacht.
‘Did you buy a present_i for Anna? – No, I didn’t think about it_i.’
e. Die neue Klimaanlage_i funktioniert tadellos. Sie können sich auf sie_i (darauf_i) verlassen.
‘The new air-conditioning_i works perfectly. You can rely on it_i.’
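The distribution rules illustrated in (8) can be written down as a small decision function. This is only a sketch: the `Antecedent` record and its three-way `kind` classification are assumptions, the neuter rule (8d) is folded into the non-animate case because its example is non-animate, and the graded judgements (??) are reduced to a yes/no choice.

```python
from dataclasses import dataclass
from typing import Set


@dataclass(frozen=True)
class Antecedent:
    kind: str               # "person", "group" or "inanimate" (assumed classification)
    specific: bool = False  # the speaker has a specific object in mind (8e)


def german_proforms(antecedent: Antecedent) -> Set[str]:
    """Return the acceptable proforms: "PP" (preposition + pronoun) and/or "ADV" (da-adverb)."""
    if antecedent.kind == "person":
        return {"PP"}                 # (8a): an ihn, *daran
    if antecedent.kind == "group":
        return {"PP", "ADV"}          # (8b): unter ihnen / darunter
    if antecedent.kind == "inanimate":
        options = {"ADV"}             # (8c), (8d): darüber, daran
        if antecedent.specific:
            options.add("PP")         # (8e): auf sie / darauf
        return options
    raise ValueError(f"unknown antecedent kind: {antecedent.kind!r}")


print(german_proforms(Antecedent("inanimate", specific=True)))
```

Returning a set rather than a single form reflects the overlaps noted above, where both proforms are possible.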
Russian (just like English) has no forms corresponding directly to the German pronominal adverbs, and the use of pronominalized PPs is not restricted to certain semantic classes. As a result, Russian anaphoric pronouns may be used in contexts where they are not acceptable or at least somewhat awkward in German. The situation is further complicated by the fact that in some cases Russian pronominal PPs can also be translated with local or directional proadverbs like hier ‘here’, dort ‘there’ or dorthin ‘thither’ (9).
(9)
Posle revoljucii osobnjak_i byl nacionalizirovan. V 1921 godu v nem_i Ajsedora Dunkan. (Sputnik 5/98, 128).
Nach der Revolution wurde die Villa_i verstaatlicht. 1921 zog hier_i die amerikanische Tänzerin Isadora Duncan ein.
‘After the revolution the villa_i was nationalized. In 1921, the American dancer Isadora Duncan moved to it_i.’
These distinctions in the use of pronominal PPs and competing proforms should be – at least to some extent – reflected in machine translations as well. A Russian–German system should be able to make a choice between PPs, pronominal adverbs and proadverbs like hier. The actual implementation raises two questions. First, under which conditions should pronominal PPs be replaced by other proforms? And second, at which stage of the translation process should this substitution be carried out? It follows from the examples given above that the translation of PPs with animate antecedents is straightforward. In this case the default translation with a PP is always possible and often the only solution (10).
(10)
Včera Ivan poznakomilsja s ėtoj devuškoj_i. On srazu vljubilsja v nee_i.
Gestern hat Ivan dieses schöne Mädchen_i kennengelernt. Er hat sich sofort in sie_i verliebt.
‘Yesterday, Ivan got to know this beautiful girl_i. He fell in love with her_i immediately.’
In the case of non-animate antecedents the picture is more complex. Here, a translator or a translation system has to choose between different options. Consider examples (11)–(14), which are partly machine translated and partly taken from human translations. The Russian PPs and their alternative translations are each printed in bold face.
(11)
Tret'ja glava_i dast Vam vozmožnost' bystro pristupit' k rabote s utilitoj. V nej_i na konkretnyh primerah pokazyvaetsja posledovatel'nost' dejstvij.
Das dritte Kapitel_i wird Ihnen die Möglichkeit geben, schnell an die Arbeit mit dem Tool zu gehen. ??In ihm_i/darin_i/dort_i wird an konkreten Beispielen eine Sequenz der Handlungen gezeigt.
‘The third chapter_i will provide you with the possibility to go quickly to work with the tool. In it_i concrete examples show the order of actions.’
(12)
To okno_i, kotoroe nahoditsja sverhu, javljaetsja aktivnym v nastojaščij moment, i Vy možete rabotat' v nem_i.
Jenes Fenster_i, das sich oben befindet, ist zum gegenwärtigen Moment aktiv, und Sie können in ihm_i/darin_i/dort_i arbeiten.
‘That window_i which is in the upper part is currently active and you can work in it_i.’
(13)
Vodka – prosto voda_i, a esli vy v nej_i rastvorite «Jupi», sojdet za ljuboe maročnoe vino. (Sputnik 5/98, 89).
Als Wodka nehmen wir einfach Wasser_i, und wenn man ??in ihm_i/darin_i/??dort_i ein Päckchen Instant-Juice auflöst, erhält man jeden beliebigen Markenwein.
‘Instead of vodka we simply take water_i, and if you dissolve “Yupi” in it_i, it will pass for any quality wine.’
(14)
V Monte Karlo pokazyvala nomer_i «Boginja zmej», – rasskazyvaet Irina, – v nem_i učastvovalo bol'še desjatka reptilij. (Sputnik 5/98, 99).
“In Monte Carlo trat ich mit der Nummer_i ‘Die Göttin der Schlangen’ auf”, berichtet die Schlangenbändigerin. “??An ihr_i/daran_i/??dort_i nahmen mehr als zehn Reptilien teil.”
‘“In Monte Carlo I appeared in the show_i ‘The Goddess of Snakes’,” says Irina. “More than ten reptiles participated in it_i.”’
If the PP in question is the sentence topic as in (11), a translation with a pronominal adverb or local proadverb is usually preferable. With different word order as in
(12), PPs gain in acceptability, however. The choice between pronominal adverbs (e.g., darin) and local adverbs (e.g., dort) also depends on the semantic features of the antecedent, which can be seen by comparing the human translations of (13) and (9). In (9), where the Russian PP v nem refers to a noun with the semantic feature “location”, the adverb hier was chosen as translation, whereas in (13), where the antecedent is a mass noun (voda ‘water’), the PP was translated with the pronominal adverb darin. In contrast to adverbials, prepositional objects usually cannot be replaced by proadverbs like dort. In this case, the human translator or the MT system just has to make a choice between pronominal adverbs and pronominal PPs. This is illustrated in (14), where the PP v nem is an obligatory argument of the verb učastvovat' ‘participate’. In the human translation given here, the Russian PP v nem was translated with the German pronominal adverb daran. These examples demonstrate quite clearly that the selection of the German proform is influenced by several parameters (i.e., semantic features of the antecedent but also word order and syntactic function of the PP in question), which makes its formalization rather difficult. Therefore, the specification of transfer tests requires considerable attention from the lexicographer. Once it has been decided under which conditions Russian PPs should be replaced by pronominal adverbs or proadverbs, the structural transformation shown in Figure 5 has to be carried out.
[Figure: a PP node with the daughters Prep and NP is mapped onto an ADVP node.]
Figure 5. Structural transformation of Russian pronominal PPs.
Russian PPs have to be mapped onto German AdvPs. In a transfer system like T1RuDe, this transformation can take place either during lexical transfer or during generation. During lexical transfer the transformation PP → AdvP can be performed in the transfer entries of Russian prepositions in the Russian–German transfer dictionary using the schematic entry (15) as a template. (15)
Prep → Adv
Tests:            CONDITIONS
Transformations:  DELETE-NP-DAUGHTER
                  MAP-SLPP-ONTO-TLADVP
This entry contains the following information: a Russian preposition (e.g., v) is translated with a German adverb (e.g., dort) if all the conditions (e.g., specific semantic features of a pronominal NP daughter) specified in the test part are met. Then, the NP daughter, i.e., the anaphoric pronoun, will be deleted from the syntactic structure and the SL syntactic category PP will be mapped onto the TL category AdvP. In principle, additional entries of this format are required for all Russian prepositions which can be translated with one of the 19 German prepositions mentioned above. Due to the polysemy of many Russian prepositions this approach would quickly result in dozens of new transfer entries. The situation is even further complicated by prepositional objects. As the prepositions of prepositional objects are often semantically empty, the selection of the equivalent TL preposition is basically unpredictable and depends on the subcategorization of the respective TL verb. For example, the German translation teilnehmen of the Russian verb učastvovat' ‘take part’ does not govern the preposition in, but an. In T1RuDe, the transfer entry of the verb učastvovat' therefore contains a transformation which maps the Russian preposition v directly onto the German preposition an. If the preposition has already been translated in the entry of its governing verb, however, a further dictionary look-up of prepositional transfer entries is unnecessary. As a consequence, structural transformations of the type PP→AdvP contained in Prep entries will not be carried out for pronominal PPs which are prepositional objects. This has to be compensated for, for example by including analogous transformations in the respective verb entries. It is obvious that this approach is rather labour-intensive because dozens of new transfer entries have to be written by the lexicographer.
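To make the shape of such an entry concrete, the schematic entry (15) can be modelled as a record with a test part and a fixed pair of transformations. The class names, the feature labels "pronoun" and "location", and the idea that the pronoun NP carries semantic features inherited from its antecedent are assumptions for illustration; the actual T1RuDe dictionary format is not shown in the text.

```python
from dataclasses import dataclass, field
from typing import Callable, FrozenSet, List, Optional


@dataclass
class Node:
    label: str
    lemma: str = ""
    features: FrozenSet[str] = frozenset()
    children: List["Node"] = field(default_factory=list)


@dataclass
class PrepToAdvEntry:
    source_prep: str                       # Russian preposition, e.g. "v"
    target_adv: str                        # German adverb, e.g. "dort"
    tests: List[Callable[[Node], bool]]    # conditions on the NP daughter

    def apply(self, pp: Node) -> Optional[Node]:
        """Return the target ADVP if the entry matches the source PP, else None."""
        if pp.label != "PP" or len(pp.children) != 2:
            return None
        prep, np = pp.children
        if prep.lemma != self.source_prep:
            return None
        if not all(test(np) for test in self.tests):
            return None
        # DELETE-NP-DAUGHTER: the pronoun is not carried over.
        # MAP-SLPP-ONTO-TLADVP: the result is an ADVP headed by the adverb.
        return Node("ADVP", children=[Node("ADV", lemma=self.target_adv)])


def is_pronoun(np: Node) -> bool:
    return "pronoun" in np.features


def refers_to_location(np: Node) -> bool:
    return "location" in np.features


entry = PrepToAdvEntry("v", "dort", [is_pronoun, refers_to_location])
pp = Node("PP", children=[
    Node("PREP", lemma="v"),
    Node("NP", lemma="nem", features=frozenset({"pronoun", "location"})),
])
result = entry.apply(pp)
print(result.label, result.children[0].lemma)
```

One such entry is needed per preposition reading and target adverb, which gives a sense of how quickly the number of entries grows.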
Therefore, it is more efficient to carry out the transformation PP→AdvP after lexical transfer by including a rule or procedure into the German generation grammar which converts German PPs into German AdvPs if necessary. Whenever the system comes across a pronominal PP whose lexical head is one of the 19 prepositions listed above, it has to check whether the respective PP should be replaced by another proform or left unchanged. This strategy has two main advantages: first, all the information which is needed for the treatment of pronominal PPs is located in one procedure instead of being distributed across dozens of transfer entries. This makes testing and consistency checks a lot easier. Secondly, if the category conversion PP→AdvP is included in the German generation module, it can easily be used in other language pairs with similar translation discrepancies as well.
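A generation-side procedure of this kind might look as follows. The sketch covers only the da-type adverbs and leaves the decision itself to a caller-supplied flag; the function names are hypothetical, and the choice of local proadverbs such as dort or hier is not modelled. The morphological rule (dar- before a vowel, da- otherwise) is standard German.

```python
PREPOSITIONS = (
    "an", "auf", "aus", "bei", "durch", "für", "gegen", "hinter", "in", "mit",
    "nach", "neben", "über", "um", "unter", "von", "vor", "zu", "zwischen",
)
VOWELS = "aeiouäöü"


def da_adverb(prep: str) -> str:
    """Build the da-type pronominal adverb for one of the 19 listed prepositions."""
    if prep not in PREPOSITIONS:
        raise ValueError(f"no pronominal adverb listed for {prep!r}")
    linker = "dar" if prep[0] in VOWELS else "da"
    return linker + prep


def realize_pronominal_pp(prep: str, pronoun: str, replace_with_adverb: bool) -> str:
    """Return either the pronominal adverb or the unchanged preposition + pronoun."""
    if replace_with_adverb and prep in PREPOSITIONS:
        return da_adverb(prep)
    return f"{prep} {pronoun}"


print(realize_pronominal_pp("in", "ihm", replace_with_adverb=True))
print(realize_pronominal_pp("in", "sie", replace_with_adverb=False))
```

Keeping the preposition list and the conversion in one place is what makes the consistency checks mentioned above straightforward.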
4. Summary
This paper has attempted to show with the example of Russian zero pronouns and pronominal PPs which translation discrepancies may arise in translating Russian anaphora into German and how these discrepancies can be treated in MT. It should be emphasized, however, that these translation discrepancies are by no
means limited to the translation direction Russian–German. The complexity of anaphora transfer is a recurrent theme in many language pairs. Basically, zero anaphora are problematic in all language pairs which combine pro-drop languages (such as Czech, Polish or Italian) with non-pro-drop languages (such as English or German). Similarly, problems with pronominal PPs will arise with German as target language as soon as the source language has no proforms which correspond to German pronominal adverbs, as is true for Russian or English. These considerations are especially important for multilingual MT environments. Although the translation of anaphora is language-pair dependent, in contrast to anaphora resolution proper, which is truly language-pair independent, it is nevertheless desirable to implement solutions which are as language-pair unspecific as possible in order to enhance the reusability of the system components. Ideally, the treatment of zero anaphora developed for one language pair should be transferable to other language pairs with minor adaptations to the system components. It is also important to note that the two translation discrepancies discussed in this paper have different priorities. If a Russian–German MT system does not attempt to resolve zero anaphora, the resulting German translation will definitely be ungrammatical because pronominal subjects cannot be dropped in German. The substitution of pronominal PPs, however, primarily improves the stylistic quality of the German translation: the understandability of the translation is not affected if PPs remain unchanged in the text, which might be sufficient for raw translations. High-quality translations, however, require more sophisticated solutions. Thus, the resolution of zero anaphora is an absolute necessity whereas the treatment of pronominal PPs is more of a luxury for an MT system.
Acknowledgements
Thanks to Ulrich Schwarz for helpful comments and help with the typesetting. I also wish to express my thanks to the anonymous referees who have encouraged me to rewrite parts of this paper. This notwithstanding, I alone, of course, am solely responsible for any shortcomings in this work.
Notes
1. Recent developments in the linguistic software of the language pair Russian–German have been described in volume 21, issue 2 (1997) of Sprache und Datenverarbeitung, which contains articles on the treatment of multiwords, Russian compounds, elliptical and coordinated structures and Russian modals. Höser and Klimonow (1994) give a brief introduction to the overall system architecture.
2. Literal glosses for the Russian and German sentences have been omitted here, since the purpose of the examples is in all cases made explicit. The English translations follow as closely as possible the phrasing of the Russian and German. All Russian examples (with the exception of the three examples taken from the magazine Sputnik) are machine translated.
References
Engel, Ulrich: 1996, Deutsche Grammatik [German Grammar], Heidelberg, Julius Groos.
Geldbach, Stefanie: 1997, ‘Pronominale Referenzen in der maschinellen Übersetzung: Ein Verfahren zur Anaphernresolution’ [Pronominal Reference in Machine Translation: An Experiment in Anaphora Resolution], Sprache und Datenverarbeitung 21(2), 76–85.
Geldbach, Stefanie and Iris Höser: 1997, ‘Zur Behandlung subjektloser Sätze’ [On the Treatment of Subjectless Sentences], Sprache und Datenverarbeitung 21(2), 49–57.
Hauenschild, Christa, Bernd Mahr, Susanne Preuß, Birte Schmitz, Carla Umbach, Wilhelm Weisweber, Lone Behesty, Guido Dunker, Rickard Matthew, Christian Werner-Maier, and Erich Ziegler: 1993, Anapherninterpretation in der maschinellen Übersetzung: Schlußbericht des Berliner Projektes der EUROTRA-D-Begleitforschung [Anaphora Interpretation in Machine Translation: Final Report of the Berlin Project of the EUROTRA-D Supplementary Research], KIT-Report 108, TU Berlin.
Höser, Iris and Gerda Klimonow: 1994, ‘Russisch–Deutsch: ein neues METAL-Sprachpaar’ [Russian–German: A New METAL Language Pair], Sprache und Datenverarbeitung 18(1), 53–64.
Lappin, Shalom and Herbert Leass: 1994, ‘An Algorithm for Pronominal Anaphora Resolution’, Computational Linguistics 20, 535–561.
Mitkov, Ruslan and Branimir Boguraev (eds): 1997, Proceedings of the ACL’97/EACL’97 Workshop on Operational Factors in Practical, Robust Anaphora Resolution, Madrid, Spain.
Mitkov, Ruslan and Paul Schmidt: 1998, ‘On the Complexity of Pronominal Anaphora Resolution in Machine Translation’, in Carlos Martín-Vide (ed.), Mathematical and Computational Analysis of Natural Language, Amsterdam, John Benjamins, pp. 207–222.
Rich, Elaine and Susann LuperFoy: 1988, ‘An Architecture for Anaphora Resolution’, Second Conference on Applied Natural Language Processing, Austin, Texas, pp. 18–24.