A study was conducted to examine the effects of sentence context and speaking rate on the perceptual assimilation of non-native speech sounds. Native speakers of American English were presented with two voicing contrasts, [k]-[g] and [tʃ]-[dʒ], produced by two Hindi speakers. The consonants appeared word-initially before [a] and within a short sentence frame. These sentences were produced at three speaking rates (slow, normal, and fast). Listeners were administered a categorial AXB discrimination test and a forced-choice identification test. The results were compared with those of a previous study that employed the same contrasts, produced by the same talkers, in isolated words. Discrimination of these speech sounds in a sentence frame was significantly poorer than in the isolation context. Moreover, the modal native consonant used to classify the Hindi stops varied with sentence context: in isolation, more discriminable uncategorizable assimilation types were elicited, whereas in a sentence frame the Hindi contrasts tended to be classified as a single English (voiced) consonant. Speaking rate showed only modest effects on the proportion of responses represented by the modal native consonants. These results highlight the sensitivity of perceptual patterns elicited in the laboratory to experimental variables.
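To make the classification measures above concrete, here is a minimal sketch, assuming trial-level identification data with hypothetical column names, of how the modal native consonant and the proportion of responses it accounts for could be computed. This is an illustration only, not the study's actual analysis.

```python
# Hypothetical sketch: deriving modal-consonant measures from identification data.
# Assumed columns: 'context' (isolation/sentence), 'rate' (slow/normal/fast),
# 'hindi_category', and 'response' (English consonant chosen on each trial).
import pandas as pd

trials = pd.read_csv("identification_responses.csv")  # assumed file layout

def modal_summary(group: pd.DataFrame) -> pd.Series:
    """Return the most frequent English response and its share of trials."""
    counts = group["response"].value_counts()
    return pd.Series({
        "modal_consonant": counts.idxmax(),
        "modal_proportion": counts.max() / counts.sum(),
    })

summary = (
    trials
    .groupby(["context", "rate", "hindi_category"])
    .apply(modal_summary)
    .reset_index()
)
print(summary)
```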
This study aims to bridge the gap between previous research on the impact of phonological neighborhood density on word identification, which did not consider phoneme confusability, and studies on phoneme identification and discrimination, which did not account for the effects of phonological neighborhood density. Native Japanese listeners were tasked with identifying monosyllabic and disyllabic English words. The words varied in both phonological neighborhood density and word frequency and were produced by native speakers of American English and Japanese. Participants responded by typing the words they heard. The findings indicated no significant effects of phonological neighborhood density on word identification. However, frequent words were identified with greater accuracy than infrequent words. Moreover, words produced by native speakers of American English were more accurately identified than those produced by Japanese speakers. Japanese listeners' inaccurate perception of individual phonemes led to word identification errors in both dense and sparse phonological neighborhoods. For example, they often misidentified /l/ as /r/ and vice versa, and the low and central vowels /æ, ɑ, ʌ/ were frequently confused with one another. These confusions are commonly observed among Japanese learners of English.
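As an illustration of how the reported density, frequency, and talker effects could be tested on trial-level accuracy data, the sketch below fits a simple logistic regression. The data file, column names, and coding are hypothetical, and the study's actual analysis may have differed.

```python
# Hypothetical sketch: testing neighborhood density, word frequency, and talker L1
# as predictors of word identification accuracy (1 = correct transcription, 0 = error).
import pandas as pd
import statsmodels.formula.api as smf

trials = pd.read_csv("word_identification_trials.csv")  # assumed layout
# Assumed columns: 'correct' (0/1), 'density' (dense/sparse),
# 'frequency' (high/low), 'talker_l1' (English/Japanese).

model = smf.logit(
    "correct ~ C(density) + C(frequency) + C(talker_l1)",
    data=trials,
).fit()
print(model.summary())
```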
In this article, we explored the pitch contour patterns of the French discourse marker donc in realizing different pragmatic functions, drawing on native and non-native oral corpora of French. Statistical analyses using generalized additive mixed modeling revealed that even though L1 Mandarin Chinese speakers learning French also used the pitch cue to realize pragmatic functions, their prosodic patterns differed from the native pattern. Their L1 Chinese appeared to significantly influence their use of the pitch cue. In addition, women were better than men at using the pitch cue to convey pragmatic functions, with patterns closer to the native pattern. Overall, our study sheds new light on the relationship between speakers’ L1 and L2 regarding the interaction between pragmatic and prosodic features. It also provides new reflections on the acquisition of socio-pragmatic competence.
The effects of stimulus presentation contexts on cross-language consonant category identification and category goodness rating were examined in two experiments. In Experiment 1, native Korean listeners’ identification and goodness ratings of Thai stop consonants were obtained under two conditions: ‘single’ and ‘triadic’ stimulus presentations. In the ‘single’ presentation, each target Thai stop consonant was presented in isolation for categorical identification and goodness rating, whereas in the ‘triadic’ presentation the target stimulus (X) was presented between two other stimuli (A and B). Korean listeners’ identification data obtained under both presentation contexts were then used to generate ‘predicted’ discrimination scores. In addition, Korean listeners’ ‘actual’ AXB discrimination scores for the Thai stop consonant contrasts were obtained. The results indicated that the two stimulus presentation conditions (i.e., ‘triadic’ and ‘single’) did not affect the choice of modal categories with which the Thai consonants were identified and that native Korean listeners had no perceptual difficulty discriminating among Thai stop consonants. However, ‘actual’ discrimination scores showed a slightly better fit with ‘predicted’ scores derived from the ‘triadic’ identification data. On the other hand, the identification of Korean stop consonants obtained from the Thai listeners in Experiment 2 showed a strong effect of stimulus presentation contexts. Specifically, the identification of Korean lax /p/, /t/, and /k/ varied depending on whether they were presented in the context of an aspirated or a tense stop in the ‘triadic’ stimulus presentation format. Nonetheless, like the Korean listeners in Experiment 1, Thai listeners had no difficulty discriminating among Korean stops. In contrast with Experiment 1, however, a much stronger correlation between ‘actual’ discrimination scores and ‘predicted’ scores derived from Thai listeners’ ‘triadic’ identification data was observed.
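The abstract does not specify how the ‘predicted’ discrimination scores were derived from the identification data. Purely for illustration, the sketch below uses one common overlap-based approximation and compares it with placeholder ‘actual’ AXB scores; the formula, contrast labels, and numbers are all assumptions, not the authors' procedure.

```python
# Hypothetical sketch: overlap-based 'predicted' discrimination from identification
# proportions, correlated with observed AXB discrimination scores.
import numpy as np
from scipy.stats import pearsonr

# Assumed identification proportions for each member of four Thai contrasts
# (columns = Korean response categories; each row sums to 1).
identification = {
    "contrast_1": (np.array([0.80, 0.15, 0.05]), np.array([0.10, 0.75, 0.15])),
    "contrast_2": (np.array([0.60, 0.30, 0.10]), np.array([0.55, 0.35, 0.10])),
    "contrast_3": (np.array([0.90, 0.05, 0.05]), np.array([0.05, 0.05, 0.90])),
    "contrast_4": (np.array([0.70, 0.20, 0.10]), np.array([0.25, 0.60, 0.15])),
}

def predicted_discrimination(p_a, p_b):
    """One minus the categorization overlap between the two contrast members."""
    return 1.0 - np.minimum(p_a, p_b).sum()

predicted = np.array([predicted_discrimination(*p) for p in identification.values()])
actual = np.array([0.92, 0.64, 0.98, 0.81])  # placeholder AXB proportion-correct scores

r, p_value = pearsonr(predicted, actual)
print(f"Pearson r = {r:.2f} (p = {p_value:.3f})")
```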
Some researchers claim that intonation can be used to express specific emotions, while others argue against the existence of emotion-specific intonation patterns. In addition, languages differ in how they use intonation patterns to convey similar emotions, and L2 learners tend to rely on L1 knowledge when producing intonation. Previous research shows that a falling successive-addition boundary tone was used to express “disgust” or “anger,” while a rising successive-addition tone was used to convey “surprise” and “happiness” in Mandarin. In this study, we compare intonation patterns used to express five emotions (anger, disgust, surprise, joy, and neutral) by 10 Mandarin and 10 English speakers in 1-, 2-, and 5-word utterances in English. Mandarin speakers were also asked to produce all five emotions in 1-, 2-, and 5-word utterances in Mandarin. Preliminary analyses of one Mandarin speaker showed that the mean F0 of utterances produced with different emotions differed significantly at all three utterance lengths in both Mandarin and English. Inconsistent with previous research, a falling successive-addition tone was used for all five emotions in Mandarin and for four emotions (all except disgust) in English.
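As a sketch of how the preliminary mean-F0 analysis could be carried out, the code below estimates mean F0 per utterance with the pYIN algorithm and compares emotions with a one-way ANOVA. The file names, F0 bounds, and choice of test are assumptions, not necessarily the analysis used in the study.

```python
# Hypothetical sketch: estimating mean F0 per utterance and testing for an effect of emotion.
import librosa
import numpy as np
from scipy.stats import f_oneway

def mean_f0(path, fmin=75.0, fmax=400.0):
    """Mean F0 (Hz) over voiced frames, estimated with pYIN."""
    y, sr = librosa.load(path, sr=None)
    f0, voiced_flag, _ = librosa.pyin(y, fmin=fmin, fmax=fmax, sr=sr)
    return np.nanmean(f0[voiced_flag])

# Assumed structure: a few recordings per emotion for one speaker and utterance length.
recordings = {
    "anger":    ["anger_01.wav", "anger_02.wav"],
    "disgust":  ["disgust_01.wav", "disgust_02.wav"],
    "surprise": ["surprise_01.wav", "surprise_02.wav"],
    "joy":      ["joy_01.wav", "joy_02.wav"],
    "neutral":  ["neutral_01.wav", "neutral_02.wav"],
}

groups = [[mean_f0(f) for f in files] for files in recordings.values()]
stat, p = f_oneway(*groups)
print(f"F = {stat:.2f}, p = {p:.3f}")
```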
Intoxication has a well-known effect on speech production. Lester and Skousen (1974) reported that the place of articulation for /s/ is retracted and that /tʃ/ and /dʒ/ are deaffricated (i.e., substituted by a non-affricate segment) in drunken speech. Zihlmann (2017) further established the robustness of deaffrication, showing that it cannot be consciously suppressed under intoxication. Using these prevalent speech errors as test cases, this study extends a phonologically informed neural network approach to the study of intoxicated speech. The approach has previously been successful in measuring pathological speech and lenition patterns in healthy speakers. Degrees of place retraction for /s/ and of deaffrication for /tʃ/ and /dʒ/ are estimated from posterior probabilities computed by recurrent neural networks trained to recognize the [anterior], [continuant], and [strident] features. When applied to a corpus of alcohol-affected English speech, preliminary results suggested that sober versus drunken state could be reliably predicted from the three posterior probabilities. The directions of the effects are largely in line with previous studies. For example, /tʃ/ and /dʒ/ are more fricated (higher strident and continuant probabilities) and /s/ is more retracted (lower anterior probability) in drunken compared with sober speech. The results suggest that intoxicated speech can be reliably quantified by this new approach.
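The classification step described above, predicting sober versus drunken state from the three posterior probabilities, could look roughly like the following once per-segment posteriors have been extracted. The file layout, column names, and use of cross-validated logistic regression are illustrative assumptions rather than the authors' pipeline.

```python
# Hypothetical sketch: predicting sober vs. drunken state from per-segment
# posterior probabilities of [anterior], [continuant], and [strident].
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

segments = pd.read_csv("feature_posteriors.csv")  # assumed layout
# Assumed columns: 'p_anterior', 'p_continuant', 'p_strident',
# and 'state' ('sober' or 'drunk') for each /s/, /tʃ/, or /dʒ/ token.

X = segments[["p_anterior", "p_continuant", "p_strident"]]
y = (segments["state"] == "drunk").astype(int)

clf = LogisticRegression(max_iter=1000)
scores = cross_val_score(clf, X, y, cv=5, scoring="roc_auc")
print(f"Mean cross-validated AUC: {scores.mean():.2f}")
```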
This study applies an automated procedure, the Amplitude Envelope Modulation Spectrum (AEMS), to probe rhythmic differences between L1 Japanese and Japanese-accented English. AEMS directly and automatically quantifies temporal regularities in the amplitude envelope of the speech waveform within specified frequency bands and has been shown to successfully differentiate types of dysarthria as well as utterances with and without code-switches. Ten native speakers of Japanese (1 male, 9 female) produced the Rainbow Passage in both Japanese and English. The passages will be segmented into sentences before the AEMS is applied. The AEMS consists of the slow-rate (up to 10 Hz) amplitude modulations of the full signal and of 7 octave bands with center frequencies ranging from 125 to 8000 Hz. Six variables relating to peak frequency, peak amplitude, and relative energy above, below, and in the region of 4 Hz will be calculated from each frequency band. Discriminant function analyses (DFA) will then be performed to determine which sets of predictor variables best discriminate between the two passages.
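As a concrete illustration of the AEMS computation described above, the sketch below band-pass filters one octave band, extracts its amplitude envelope, and summarizes the low-rate (up to 10 Hz) modulation spectrum with a few peak and relative-energy metrics. The band edges, smoothing choices, and exact definitions of the variables (including what counts as "the region of 4 Hz") are assumptions based on this description, not the authors' implementation.

```python
# Hypothetical sketch: low-rate amplitude envelope modulation spectrum for one octave band.
import numpy as np
from scipy.io import wavfile
from scipy.signal import butter, sosfiltfilt, hilbert

def band_envelope_modulation(path, low_hz, high_hz, max_mod_hz=10.0):
    sr, x = wavfile.read(path)                 # assumed mono WAV file
    x = x.astype(float)
    x = x / (np.max(np.abs(x)) + 1e-12)        # normalize amplitude

    # Octave-band filtering, then amplitude envelope via the Hilbert transform.
    sos_band = butter(4, [low_hz, high_hz], btype="bandpass", fs=sr, output="sos")
    envelope = np.abs(hilbert(sosfiltfilt(sos_band, x)))

    # Smooth the envelope, keeping only slow modulations.
    sos_env = butter(4, max_mod_hz, btype="lowpass", fs=sr, output="sos")
    envelope = sosfiltfilt(sos_env, envelope)

    # Modulation spectrum of the mean-removed envelope, restricted to <= 10 Hz.
    spectrum = np.abs(np.fft.rfft(envelope - envelope.mean()))
    freqs = np.fft.rfftfreq(len(envelope), d=1.0 / sr)
    keep = freqs <= max_mod_hz
    freqs, spectrum = freqs[keep], spectrum[keep]

    energy = spectrum ** 2
    peak = np.argmax(spectrum)
    return {
        "peak_freq_hz": freqs[peak],
        "peak_amplitude": spectrum[peak],
        # Assumed definition of the "region of 4 Hz": 3-6 Hz.
        "energy_below_4hz": energy[freqs < 3.0].sum() / energy.sum(),
        "energy_near_4hz": energy[(freqs >= 3.0) & (freqs <= 6.0)].sum() / energy.sum(),
        "energy_above_4hz": energy[freqs > 6.0].sum() / energy.sum(),
    }

# Example call for the octave band centered at 125 Hz (assumed edges 88-176 Hz).
print(band_envelope_modulation("sentence_01.wav", low_hz=88.0, high_hz=176.0))
```

In a full analysis, such measures would be computed for the full signal and each of the seven octave bands and then entered as predictors into the discriminant function analyses.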