A natural-language paraphrase generator for on-line monitoring and commenting incremental sentence construction by L2 learners of German

2008 
representation of the content, often called logical form (see Reiter & Dale, 2000, for an authoritative overview of sentence and text generation technology). In the case of paraphrase generation, the generator delivers all possible ways of linguistically realizing the input logical form, given the lexicon and the grammar rules. However, virtually all natural language generation systems work in a best-first manner and produce only one output sentence rather than the set of all paraphrases. Because the control structure of such a system is not easy to change, the choice of generators is very limited.

Zamorano Mansilla's (2004) project is the only one that applied a sentence generator (KPML; Bateman, 1997) to the recognition and diagnosis of writing errors (“fill-in-the-blank” exercises). Zock & Quint (2004) converted an electronic dictionary into a drill tutor. Exercises were produced by a goal-driven, template-based sentence generator, with Japanese as the target language.

3. Incremental sentence production based on natural-language generation

In this section, we first sketch the grammar formalism of COMPASS-II and its graphical user interface. Then, we describe the paraphraser and run a stepwise demo illustrating the system’s feedback for a sentence under construction.

3.1 The Performance Grammar formalism

COMPASS-II is based on the Performance Grammar (PG) formalism, which is well suited to express fine-grained word-order rules in Dutch and German. Moreover, these rules can easily be tailored to other languages, in particular English. PG is a declarative syntax formalism where the hierarchical structure of a sentence is kept separate from its linear structure. PG’s key operation is typed feature unification. Figure 1 illustrates an elementary treelet (also called lexical frame) for the wordform Junge ‘boy’. The second layer represents grammatical functions (e.g., “hd” for head). Phrasal leaf nodes (e.g.
“ADJP” for adjectival phrase in the function of modifier) can be expanded by an appropriate treelet whose root node carries the same label (Figure 2).

Figure 1. Elementary treelet for the lexical anchor Junge. The box associated with the wordclass of the head shows a subset of this node’s morpho-syntactic features. Slashes represent alternative options.

[Feature box in Figure 1: lemma=Junge gender=masculine person=3rd case=nom number=singular ...]

Figure 2. Well-formed tree for der kleine Junge ‘the little boy’: Appropriate DP and ADJP treelets have been unified at two leaves of the Junge treelet (cf. Figure 1). The Quantifier Phrase (QP) has no unification partner. Word order is not yet defined (see remainder of Section 3.1).

Associated with every treelet is a topology. Topologies serve to assign a left-to-right order to the branches of lexical frames. Here, we only illustrate the topologies for verb frames (clauses).

(1) Was will der kleine Junge dass ich sage?
    what wants the little boy that I say
    ‘What does the little boy want me to say?’

[Topology diagram for (1): slots F1, M1, M2 ... M6, E1, E2. The main-clause topology contains will and der kleine Junge, with a dot in F1 and a dot in E2. A single arrow (↑) links Was in the embedded topology to the dot in F1; a double arrow (⇑) links dass ich sage to the dot in E2.]

The slot labeled F1 makes up the Forefield (from German Vorfeld), M1-M6 the Midfield (Mittelfeld), and E1 and E2 the Endfield (Nachfeld). Every constituent (subject, head, direct object, complement, etc.) has a small number of placement options, i.e. slots in the topology associated with its “own” clause. For instance, the finite verb of a main clause goes to M2 whereas in subordinate clauses it goes to M6.

How is the Direct Object NP was ‘what’ “extracted” from the complement clause and “promoted” into the main clause? Movement of phrases between clauses is due to lateral topology sharing. If a sentence contains more than one verb, each of their lexical frames instantiates its own topology. This applies to verbs of any type: main, auxiliary or copula. In such cases, the topologies are allowed to share identically labeled lateral (i.e.
left- and/or right-peripheral) slots, subject to several restrictions (not explained here; see Harbusch & Kempen, 2002). After two slots have been shared, they are no longer distinguishable; in fact, they are unified and become the same object. In example (1), the embedded topology shares its F1 slot with the F1 slot of the matrix clause. This is indicated by the dashed borders of the bottom F1 slot. Sharing the F1 slots effectively causes the embedded Direct Object was to be preposed into the main clause (black dot in F1 above the single arrow in (1)). The dot in E2 above the double arrow marks the position selected by the finite complement clause (cf. Figure 3).

3.2 “Scaffolded” writing with COMPASS-II

The paraphrase generator of COMPASS-II can produce all linear order variants licensed by the most important word order rules of German. The generator takes as input tentative syntactic trees constructed by the student through a graphical direct-manipulation (“drag [...] the confusion arises from the correct ein kleiner Junge ‘a little boy’.).

With respect to word order, malrules refer to typical differences between L1 and L2. For instance, one rule “allows” ungrammatical verb-second word order in German subordinate clauses (most of which place the finite verb clause-finally rather than in second position), but it triggers an error message if the student-produced sentence conforms to it. Figure 4 displays the overall system at work for clause (2), where the student uses an English word-order rule.

(2) Heute Anja baut eine Rakete
    Today Anja builds a rocket

Figure 4. Snapshot of the first system response to the incorrect word order in sentence (2). The left window shows the system’s word list. Lexical frames (treelets) corresponding to selected words appear in the big window in the middle column. This window is the workspace where the student can combine and edit the selected treelets. The resulting leaf strings are automatically shown in the small window at the top.
The student can edit these strings by typing or cut [...] cf. (2)).
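
The interplay between licensed order variants and a word-order malrule, as applied to sentence (2), can be sketched as follows. This is a minimal illustration in Python, not the COMPASS-II implementation: the names (CANONICAL, licensed_variants, malrule_variants, diagnose), the flat list of constituents and the feedback strings are all assumptions made for this example, and the verb-second rule is reduced to "exactly one constituent precedes the finite verb".

```python
# Hypothetical, flattened constituents of sentence (2), in an assumed
# canonical midfield order. The real system works on treelets and topologies.
CANONICAL = [("subject", "Anja"), ("adverbial", "heute"), ("object", "eine Rakete")]
FINITE_VERB = "baut"


def normalize(sentence):
    """Lowercase, drop final punctuation, and collapse whitespace."""
    return " ".join(sentence.lower().rstrip(".?!").split())


def licensed_variants():
    """Verb-second paraphrases: one constituent is fronted, the finite verb
    comes second, and the remaining constituents keep their canonical order."""
    variants = set()
    for i, (_, fronted) in enumerate(CANONICAL):
        rest = [text for j, (_, text) in enumerate(CANONICAL) if j != i]
        variants.add(normalize(" ".join([fronted, FINITE_VERB] + rest)))
    return variants


def malrule_variants():
    """Assumed malrule: English-style order adverbial, subject, verb, object,
    which puts the finite verb in third position."""
    parts = dict(CANONICAL)
    words = [parts["adverbial"], parts["subject"], FINITE_VERB, parts["object"]]
    return {normalize(" ".join(words))}


def diagnose(sentence):
    """Compare the student's string with licensed and malrule variants."""
    candidate = normalize(sentence)
    if candidate in licensed_variants():
        return "ok"
    if candidate in malrule_variants():
        return "error: the finite verb must be in second position"
    return "unknown"
```

Under these assumptions, diagnose("Heute Anja baut eine Rakete") matches the malrule variant and returns the error message, whereas diagnose("Heute baut Anja eine Rakete") returns "ok". Generating the malrule variants alongside the licensed ones lets a learner's error be recognized by the same mechanism that recognizes correct paraphrases.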