A Single Layer Network Model of Sentential Recursive Patterns

2009 
Lei Ding (dinglei@cse.ohio-state.edu)
Department of Computer Science & Engineering, The Ohio State University, 2015 Neil Ave, Columbus, OH 43210, USA

Simon Dennis (simon.dennis@gmail.com)
Department of Psychology, The Ohio State University, 1835 Neil Ave, Columbus, OH 43210, USA

Dennis N. Mehay (mehay@ling.ohio-state.edu)
Department of Linguistics, The Ohio State University, 1712 Neil Ave, Columbus, OH 43210, USA

Abstract

Recurrent connectionist models, such as the simple recurrent network (SRN; Elman, 1991), have been shown to account for people's ability to process sentences with center embedded structures of limited depth without recourse to a competence grammar that allows unbounded recursion (Christiansen & Chater, 1999). To better understand connectionist approaches to recursive structures, we analyze the performance of a single layer network architecture that employs a decaying lexical context representation on three kinds of recursive structures (i.e., right branching, cross serial and center embedding). We show that with one input bank the model can capture one and two levels of right branching recursion and one level of center embedded recursion, but cannot capture two levels of center embedding or even a single level of cross serial recursion. If one adds a second bank of input units with a different decay rate, the model can capture one and two levels of both center embedded and cross serial recursion. Furthermore, with this model the interclass distance of doubly cross serial patterns is greater than it is for center embedded patterns, which may explain why people rate the former as easier to process (Bach, Brown, & Marslen-Wilson, 1986).

Keywords: sentence processing; network model; decaying lexical contexts; linear separability.

Introduction

Chomsky (1957) argued that the presence of recursive structures such as center embedded clauses rules out associative explanations of the language processing mechanism. This argument has been challenged in many ways, both by disputing the empirical claim that humans are capable of processing recursive structures (Reich, 1969) and by disputing the computational claim that associative mechanisms, particularly associative mechanisms that employ hidden unit representations in the connectionist tradition, are unable to process recursive structures, at least of the depth observed in human performance (Christiansen & Chater, 1999).

Early attempts to investigate the capabilities of connectionist networks to capture linguistic structure fell into two approaches (Christiansen & Chater, 1999). In the first approach, networks were provided with tagged datasets that supplied information about the extent and identity of constituents (Chalmers, 1990; Pollack, 1988) and were required to generalize these mappings. While these models demonstrated the representational abilities of networks, the fact that they required labeled training data of a kind that is unlikely to be available to human learners meant that their relevance to the question of how linguistic structure is acquired was limited. A second approach involved learning simplified tasks, such as identifying the aⁿbⁿ language from raw input strings using small networks (Wiles & Elman, 1995). This work demonstrated that recursive generalization was possible to a significant degree, but it remained unclear whether these results would apply to other kinds of recursion and with larger vocabularies.

Christiansen and Chater (1999) expanded previous work significantly by demonstrating that the simple recurrent network (Elman, 1991) was capable of capturing the three main kinds of recursive structures that were proposed by Chomsky (1957) as problematic for finite state systems. These were counting recursion (e.g., ab, aabb, aaabbb) of the kind studied by Wiles and Elman (1995); identity recursion (e.g., abbabb, aabbaabb), which captures the cross serial structures found in Swiss German and in Dutch; and mirror recursion (e.g., abba, aabbaa, abbbba), which can be interpreted as center embedding. In addition, Christiansen and Chater (1999) showed that the SRN predicted that cross serial structures should be easier to process than center embedded structures, as has been demonstrated by Bach et al. (1986).
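To make the three pattern classes concrete, the following minimal sketch (ours, not taken from the paper) generates strings of each type over the two-symbol vocabulary used in the examples above.

def counting(n):
    # Counting recursion: a^n b^n, e.g. ab, aabb, aaabbb.
    return "a" * n + "b" * n

def identity(w):
    # Identity recursion (cross serial): the string repeated,
    # e.g. identity("abb") == "abbabb".
    return w + w

def mirror(w):
    # Mirror recursion (center embedding): the string followed by its
    # reversal, e.g. mirror("abb") == "abbbba".
    return w + w[::-1]

print([counting(n) for n in (1, 2, 3)])  # ['ab', 'aabb', 'aaabbb']
print(identity("abb"), mirror("abb"))    # abbabb abbbba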
While the SRN provides a compelling proof of the capabilities of connectionist networks, its structure is not, in general, easy to analyze formally. Consequently, one must rely upon simulation results, and it is not feasible to sample parameters such as initial weight vectors comprehensively. In this paper, we propose a single layer architecture that employs decaying input activations, and we analyze the linear separability and interclass distance of the patterns in each of the recursion classes. We take linear separability as an indication of which patterns the model predicts can be processed, and interclass distance as an indication of the ease with which that processing might occur. We start by outlining the model and then explore its performance at each level of recursion.

A Single-layer Network Model

Our network model, as seen in Figure 1, is a softmax single layer neural network (Bridle, 1990). For each word w in a sentence, the input x fed to the network has a nonzero value at the i-th position only when a word token of the i-th type in a given vocabulary V appears to its left. The strength of a context word's input is given by the probability density function of an exponential distribution, taking the distance between the words as its argument. The exponential function we use takes into account all the words occurring in the left context of a word.
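As a concrete illustration of this encoding, the sketch below reconstructs the decaying context representation under our own assumptions: contributions of the exponential density λ·exp(−λd) are summed over repeated occurrences of a word type, and the decay rates in two_bank_input are illustrative placeholders rather than the paper's values.

import numpy as np

def context_vector(sentence, vocab, lam):
    # For each position t, build an input whose i-th entry is nonzero only
    # when vocabulary type i occurs to the left of t; the strength of each
    # occurrence is the exponential density lam * exp(-lam * d) of the
    # distance d between the two words.
    index = {w: i for i, w in enumerate(vocab)}
    vectors = []
    for t in range(len(sentence)):
        x = np.zeros(len(vocab))
        for j in range(t):              # every word in the left context
            d = t - j                   # distance between the words
            x[index[sentence[j]]] += lam * np.exp(-lam * d)
        vectors.append(x)
    return np.array(vectors)

def two_bank_input(sentence, vocab, lam_fast=1.0, lam_slow=0.2):
    # Second bank of input units with a different decay rate, concatenated
    # with the first, as in the two-bank variant of the model.
    return np.hstack([context_vector(sentence, vocab, lam_fast),
                      context_vector(sentence, vocab, lam_slow)])

Feeding these vectors through the softmax single layer network then amounts to a single linear map followed by normalized exponentials over the vocabulary.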
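The linear separability analysis itself can be posed as a feasibility problem: two pattern classes are linearly separable exactly when some hyperplane satisfies y_i(w·x_i + b) ≥ 1 for every example. The sketch below is our own way of checking this with a linear program, not the procedure used in the paper.

import numpy as np
from scipy.optimize import linprog

def linearly_separable(X_pos, X_neg):
    # Feasibility LP over variables [w, b]: require y_i * (x_i.w + b) >= 1
    # for all i, i.e. -y_i * (x_i.w + b) <= -1. Feasible => separable.
    X = np.vstack([X_pos, X_neg])
    y = np.concatenate([np.ones(len(X_pos)), -np.ones(len(X_neg))])
    n, d = X.shape
    A_ub = -y[:, None] * np.hstack([X, np.ones((n, 1))])
    res = linprog(c=np.zeros(d + 1), A_ub=A_ub, b_ub=-np.ones(n),
                  bounds=[(None, None)] * (d + 1), method="highs")
    return res.status == 0

On the same vectors, one simple proxy for interclass distance is the distance between class means, though other measures are possible.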