Phonetic variation and the recognition of words with pronunciation variants

Meghan Sumner,Chigusa Kurumada,Roey Gafter,Marisa Casillas

2013

Meghan Sumner (sumner@stanford.edu), Chigusa Kurumada (kurumada@stanford.edu), Roey J. Gafter (gafter@stanford.edu), Marisa Casillas (middyp@stanford.edu)
Department of Linguistics, Margaret Jacks Hall, Bldg. 460, Stanford, CA 94305-2150 USA

Abstract

Studies on the effects of pronunciation variants on spoken word recognition have seemingly contradictory results: some find support for a lexical representation that contains a frequent variant, others for an infrequent (but idealized) variant. We argue that this paradox is resolved by appealing to the phonetics of the overall word. In two phoneme categorization studies, we examined the categorization of the initial sounds of words that contain either tap or [t]. Listeners identified the initial sound of items along a voiced-voiceless continuum (e.g., bottom–pottom, produced with word-medial [t] or tap). No preference for word-forming responses was found for either variant. However, a bias toward voiced responses was found for words with [t]. We suggest this reflects a categorization bias dependent on speaking style, and claim that the difference in responses to words with different variants is best attributed to the phonetic composition of the word, not to a particular pronunciation variant.

Keywords: phonetic variation, pronunciation variation, speech perception, phoneme categorization, lexical representation

Introduction

As listeners, we face a speech signal that is riddled with variation, with countless acoustic realizations of any given word. Words stream by listeners at a rate of about 5–7 syllables per second, further complicating the listener’s task. How listeners understand spoken words despite this variation is an issue central to linguistic theory. The finding that lexical representations are rich with phonetic detail, along with associated theories of representation and lexical access, has greatly advanced our understanding of this process (e.g., Goldinger, 1998; Johnson, 2006). Incorporating variation into theory was a major step toward a full explanation of spoken language understanding. [1]

[1] This is a move often discussed, but still largely absent from theories of spoken word recognition; see McLennan & Luce (2005) for related discussion.

But claims made by lexical-representation-based accounts are becoming increasingly difficult to validate or falsify. Studies that examine the effects of pronunciation variants on spoken word recognition highlight this point. Two different realizations of a sound are considered pronunciation variants. For example, one can produce the word baiting with a [t], sounding like bay-ting, or with a tap [ɾ], sounding more like bay-ding. Or, one can produce the word center with a [t], sounding like sen-ter, or without [n_] (though some acoustic residue is likely to remain), sounding like sen-ner. Studies that examine the recognition of words with pronunciation variants typically compare a frequent (commonly produced) variant (e.g., [ɾ] or [n_]) to a canonical, but infrequent, variant (e.g., [t] or [nt]). Interestingly, in this area of research, two conceptually identical studies have found evidence for lexical representations that are specified for a particular pronunciation variant. In one case, though, the data suggest that the frequent variant is stored (Connine, 2004). In the other case, the data suggest that the canonical variant is stored (Pitt, 2009). We call this the representation paradox. Specifically, these studies found:

(1) Frequency bias: A cost for words produced with [t], like baiting produced like bay-ting (Connine, 2004), compared to those produced with the more common tap ([ɾ]) variant, and

(2) Canonical bias: A benefit for words with [t], like center produced sounding like sen-ter (Pitt, 2009), compared to those produced with the more common post-nasal deletion variant ([n_]) (sounding like sen-ner).

In this paper, we suggest that this paradox has resulted for two reasons. First, pronunciation variants are typically examined independent of the phonetic composition of the entire word (see also Andruski et al., 1994). While it is true that we may produce [t] or [ɾ] in a word like baiting, it is also true that each variant co-varies with a different set of acoustic correlates across the word. Second, in the examples in (1) and (2), it is not clear that listener responses are driven by stored lexical forms in this task, and not by these co-present acoustic cues. It is undoubtedly the case that detailed representations exist. But it is also the case that (1) listeners are highly sensitive to acoustic fluctuations in speech (Clayards, Tanenhaus, Aslin, & Jacobs, 2008; Green, Tomiak, & Kuhl, 1997; McMurray & Aslin, 2005; McMurray, Tanenhaus, & Aslin, 2009), (2) low-level acoustic mismatches result in major perceptual costs, either from manipulations resulting in incongruent cues (Gaskell & Marslen-Wilson, 1996) or from intentionally mispronounced sounds (Gow, 2001, 2003; Sumner & Samuel, 2005), and (3) acoustic cues not only inform a listener about linguistic units, but also provide expectations about the style of a speech event (Labov, 1966; among many others).

In this paper, we ground ourselves broadly in a phonetic perspective and make two suggestions. First, we suggest that different pronunciation variants are processed equally
