Voices and Variants: Effects of Voice on the Form-Based Processing of Words with Different Phonological Variants

2014 
Sharese King (sharese@stanford.edu)
Department of Linguistics, Margaret Jacks Hall, Bldg. 460
Stanford, CA 94301-2150 USA

Meghan Sumner (sumner@stanford.edu)
Department of Linguistics, Margaret Jacks Hall, Bldg. 460
Stanford, CA 94301-2150 USA

Abstract

Spoken words have robust acoustic variation. How listeners understand spoken words despite this variation remains an issue central to theories of speech perception. Current models predict listener behavior based on the frequency of a variant in production. A phonological variant, though, is often investigated independent of phonetic variation that provides listeners with information about talkers. In this study, we investigate whether standard variants in words produced by a talker with a standard voice are recognized more quickly than standard variants in words produced by a talker with a non-standard voice. Conversely, we investigate whether non-standard variants in words produced by a talker with a standard voice are recognized more slowly than standard variants in words produced by a talker with a non-standard voice. These comparisons enable us to assess limitations of current theory, illuminating the understudied influence of talker voice in the understanding of spoken words with different phonological variants.

Keywords: spoken-word recognition; speech perception; variation; dialect; African American Vernacular English

Introduction

Speech varies across speakers based on a variety of social and linguistic factors. While variation was viewed as problematic noise (e.g., Verbrugge, Strange, Shankweiler, & Edman, 1976), researchers have turned to investigating the potential contribution of phonetic variation to listeners' quick and adept ability to understand spoken words.
For example, listeners are highly sensitive to variation in speech (Bradlow & Bent, 2008; Bradlow & Pisoni, 1999; Clopper & Pisoni, 2004; Johnson, 2006; Sumner & Samuel, 2009), use this information to process upcoming words (e.g., Beddor, McGowan, Boland, Coetzee, & Brasher, 2013; Salverda, Kleinschmidt, & Tanenhaus, 2014), store detailed talker-based acoustic detail in memory (Goldinger, 1998; Nygaard, Sommers, & Pisoni, 1994), and depend on acoustic patterns in speech to activate acoustically-similar representations (Johnson, 2006).

Many contemporary theories oriented toward accommodating phonetic variation in speech perception are episodic in nature. Such theories posit that a listener's ability to access a lexical item is contingent upon the encoding of detailed episodes of spoken words (Goldinger, 1998; Johnson, 2006). Incoming speech is perceived against the clusters arising from the storage of phonetically-rich lexical representations. This leads to an activation benefit for more frequently experienced acoustic patterns, as a common structure benefits from the shared activation of a rich, dense cluster of stored word forms, making up the form component of a form-meaning lexical representation. In the most simplistic and extreme interpretation of such a theory, listeners recognize frequent word forms faster and/or more accurately than less frequent word forms.

The bulk of studies that have supported this view have investigated talker-specific variation and its effect on the recall and recognition of spoken words. For example, Johnson (2006) found that words produced by women with more typical female voices are recognized more quickly than words produced by women with less typical female voices. Nygaard and colleagues (Nygaard et al., 1994; Nygaard & Pisoni, 1998) have shown that words are recognized better and recalled more accurately upon second presentation when the first presentation matched in talker voice and speech rate.
While these studies have provided evidence for specificity in form at the lexical level and for a benefit for more typical or frequent forms, word forms vary phonologically, too. For example, speakers of General American English (GA) may produce the word center with a medial [nt] sequence, or with a medial [n_] sequence, stemming from a post-nasal t-deletion process common in GA, and across regional and ethnic varieties of American English more broadly.

Recent work has investigated the composition of lexical form-based representations of words with different pronunciation variants, as well. Typically, these studies compare the effects of words produced with one variant to those of words produced with a different variant. In other words, the comparisons are typically purely phonological and categorical. The common thread tying this work to episodic approaches to variation has been the link to frequency. From a representational standpoint, researchers have asked whether one variant is dominant compared to another. They have also asked whether representations are tied to production frequency or to a canonical, or idealized, form of a word.