    Functional Organization of the Ventral Auditory Pathway
    Abstract:
    The fundamental problem in audition is determining the mechanisms required by the brain to transform an unlabelled mixture of auditory stimuli into coherent perceptual representations. This process is called auditory-scene analysis. The perceptual representations that result from auditory-scene analysis are formed through a complex interaction of perceptual grouping, attention, categorization and decision-making. Despite a great deal of scientific energy devoted to understanding these aspects of hearing, we still do not understand (1) how sound perception arises from neural activity and (2) the causal relationship between neural activity and sound perception. Here, we review the role of the "ventral" auditory pathway in sound perception. We hypothesize that, in the early parts of the auditory cortex, neural activity reflects the auditory properties of a stimulus. However, in later parts of the auditory cortex, neurons encode the sensory evidence that forms an auditory decision and are causally involved in the decision process. Finally, in the prefrontal cortex, which receives input from the auditory cortex, neural activity reflects the actual perceptual decision. Together, these studies indicate that the ventral pathway contains hierarchical circuits that are specialized for auditory perception and scene analysis.
    Keywords:
    Auditory imagery
    Auditory scene analysis
    Stimulus (psychology)
    Auditory perception
    Selective auditory attention
    Auditory System
    To recognize and understand the auditory environment, the listener must first separate the sounds that arise from different sources and capture each auditory event. This process is known as auditory scene analysis. The aim of this thesis is to investigate whether and how visual information can influence auditory scene analysis. The thesis consists of four chapters. First, I reviewed the literature to provide a clear framework for the impact of visual information on the analysis of complex acoustic environments. In Chapter II, I examined psychophysically whether temporal coherence between auditory and visual stimuli was sufficient to promote auditory stream segregation in a mixture. I found that listeners were better able to report brief deviants in an amplitude-modulated target stream when a visual stimulus changed in size in a temporally coherent manner than when the visual stream was coherent with the non-target auditory stream. This work demonstrates that temporal coherence between auditory and visual features can influence the way people analyse an auditory scene. In Chapter III, the integration of auditory and visual features in auditory cortex was examined by recording neuronal responses in awake and anaesthetised ferret auditory cortex in response to the modified stimuli used in Chapter II. I demonstrated that temporal coherence between auditory and visual stimuli enhances the neural representation of a sound and influences which sound a neuron represents in a sound mixture. Visual stimuli elicited reliable changes in the phase of the local field potential, which provides mechanistic insight into this finding. Together, these findings provide evidence that early cross-modal integration underlies the behavioural effects in Chapter II. Finally, in Chapter IV, I investigated whether training can influence the ability of listeners to utilize visual cues for auditory stream analysis, and showed that this ability improved when listeners were trained to detect auditory-visual temporal coherence.
    Auditory scene analysis
    Selective auditory attention
    Stimulus (psychology)
    Auditory System
    Auditory perception
    Auditory imagery
    Citations (0)
    Real-world sound is a mixture of different sources. The sound scene of a busy coffeehouse, for example, usually consists of several conversations, music playing, laughter, maybe a baby crying, a door being slammed, and different machines operating in the background. When humans are confronted with these sounds, they rapidly and automatically adjust themselves to this complex sound environment, paying attention to the sound source of interest. This ability has been labeled in psychoacoustics as Auditory Scene Analysis (ASA). The counterpart to ASA in machine listening is called Computational Auditory Scene Analysis (CASA): the effort to build computer models that perform auditory scene analysis. Research on CASA has led to great advances in machine systems capable of analyzing complex sound scenes, such as audio source separation and multiple-pitch estimation. Such systems often fail, however, in the presence of corrupted or incomplete sound scenes. In a real-world sound scene, different sounds overlap in time and frequency, interfering with and canceling each other. Sometimes the sound of interest may have critical information missing altogether; examples include an old recording from a scratched CD or a band-limited telephone speech signal. In a world filled with incomplete sounds, the human auditory system has the ability, known as Auditory Scene Induction (ASI), to estimate the missing parts of a continuous auditory scene briefly covered by noise or other interference, and to perceptually resynthesize them. Since humans are able to infer the missing elements of an auditory scene, it is desirable for machine systems to have the same capability. However, there have been very few efforts in computer audition to realize this ability computationally. This thesis focuses on the computational realization of auditory scene induction: Computational Auditory Scene Induction (CASI). More specifically, the goal of my research is to build computer models capable of resynthesizing the missing information of an audio scene. Building upon existing statistical models (NMF, PLCA, HMM and N-HMM) for audio representation, I formulate this ability as a model-based spectrogram analysis and inference problem under the expectation-maximization (EM) framework with missing data in the observation. Various sources of information, including the spectral and temporal structure of audio and top-down knowledge about speech, are incorporated into the proposed models to produce accurate reconstructions of the missing information in an audio scene. The effectiveness of the proposed machine systems is demonstrated on three audio signal processing tasks: singing melody extraction, audio imputation and audio bandwidth expansion. Each system is assessed through experiments on real-world audio data and compared to the state of the art. Although far from perfect, the proposed systems show many advantages and significant improvements over existing systems. In addition, this thesis shows that different applications related to missing audio data can be considered under the unified framework of CASI, opening a new avenue of research in the computer audition community.
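    The core idea, inferring a spectrogram model from incomplete observations and filling the gaps from the model, can be sketched in a toy form. The sketch below uses plain NMF with multiplicative updates restricted to observed cells; this is an illustrative stand-in, not the thesis' actual models (PLCA, HMM, N-HMM), and the data are synthetic.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy nonnegative "spectrogram" with low-rank structure (freq bins x frames).
F, T, K = 20, 30, 3
V = rng.random((F, K)) @ rng.random((K, T))

# Observation mask: True where the spectrogram is known, False where missing.
M = rng.random((F, T)) > 0.2
Vm = np.where(M, V, 0.0)           # missing cells zeroed out

# Weighted NMF: multiplicative updates computed only on the observed cells,
# i.e. fitting the factors W, H from incomplete data.
W = rng.random((F, K)) + 0.1
H = rng.random((K, T)) + 0.1
eps = 1e-9
for _ in range(500):
    W *= (Vm @ H.T) / ((M * (W @ H)) @ H.T + eps)
    H *= (W.T @ Vm) / (W.T @ (M * (W @ H)) + eps)

# Impute: keep observed values, fill the missing cells from the model W @ H.
V_hat = np.where(M, V, W @ H)
err_missing = np.abs(V_hat - V)[~M].mean()
```

    Because the true spectrogram here is low-rank by construction, the masked cells are recovered accurately; real audio only approximately satisfies this assumption, which is why the thesis adds temporal structure and top-down speech knowledge.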
    Auditory scene analysis
    Psychoacoustics
    Selective auditory attention
    Auditory System
    Scene statistics
    Citations (0)
    Speech is normally heard in the presence of other interfering sounds, a fact which has plagued speech technology research. A technique for segregating speech from an arbitrary noise source is described. The approach is based on a model of human auditory processing. The auditory system has an extraordinary ability to group together acoustic components that belong to the same sound source, a phenomenon named auditory scene analysis by Bregman (1990). Models of auditory scene analysis could provide a robust front end for speech recognition in noisy environments, and may also have applications in automatic music transcription. Additionally, the authors hope that models of this type will contribute to the understanding of hearing and hearing impairment.
    Auditory scene analysis
    Auditory System
    Selective auditory attention
    Transcription
    Citations (1)
    The ability to parse a complex auditory scene into perceptual objects is facilitated by a hierarchical auditory system. Successive stages in the hierarchy transform an auditory scene of multiple overlapping sources, from peripheral tonotopically based representations in the auditory nerve, into perceptually distinct auditory-object-based representations in the auditory cortex. Here, using magnetoencephalography recordings from men and women, we investigate how a complex acoustic scene consisting of multiple speech sources is represented in distinct hierarchical stages of the auditory cortex. Using systems-theoretic methods of stimulus reconstruction, we show that the primary-like areas in the auditory cortex contain dominantly spectrotemporal-based representations of the entire auditory scene. Here, both attended and ignored speech streams are represented with almost equal fidelity, and a global representation of the full auditory scene with all its streams is a better candidate neural representation than that of individual streams being represented separately. We also show that higher-order auditory cortical areas, by contrast, represent the attended stream separately and with significantly higher fidelity than unattended streams. Furthermore, the unattended background streams are more faithfully represented as a single unsegregated background object rather than as separated objects. Together, these findings demonstrate the progression of the representations and processing of a complex acoustic scene up through the hierarchy of the human auditory cortex. SIGNIFICANCE STATEMENT Using magnetoencephalography recordings from human listeners in a simulated cocktail party environment, we investigate how a complex acoustic scene consisting of multiple speech sources is represented in separate hierarchical stages of the auditory cortex. 
We show that the primary-like areas in the auditory cortex use a dominantly spectrotemporal-based representation of the entire auditory scene, with both attended and unattended speech streams represented with almost equal fidelity. We also show that higher-order auditory cortical areas, by contrast, represent an attended speech stream separately from, and with significantly higher fidelity than, unattended speech streams. Furthermore, the unattended background streams are represented as a single undivided background object rather than as distinct background objects.
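    The stimulus-reconstruction comparison behind these fidelity claims can be sketched with a toy linear backward model: decode each stream's envelope from a simulated multichannel response and compare reconstruction correlations. Everything below is simulated for illustration (white-noise "envelopes", arbitrary gains, no time lags); the paper's actual MEG pipeline uses lagged temporal response functions on real recordings.

```python
import numpy as np

rng = np.random.default_rng(1)
n, C = 2000, 16                    # time samples, simulated sensor channels

att = rng.standard_normal(n)       # "attended" speech envelope (toy)
ign = rng.standard_normal(n)       # "ignored" speech envelope (toy)

# Simulated cortical response: the attended stream drives the sensors more
# strongly than the ignored one (gains 1.0 vs 0.3 are arbitrary choices).
resp = (np.outer(att, rng.standard_normal(C))
        + 0.3 * np.outer(ign, rng.standard_normal(C))
        + rng.standard_normal((n, C)))

def reconstruction_fidelity(stim, X, lam=1.0):
    """Fit a ridge-regression backward model X @ w ~ stim and return the
    correlation between the reconstructed and the actual stimulus."""
    w = np.linalg.solve(X.T @ X + lam * np.eye(X.shape[1]), X.T @ stim)
    pred = X @ w
    return np.corrcoef(pred, stim)[0, 1]

r_att = reconstruction_fidelity(att, resp)   # higher fidelity
r_ign = reconstruction_fidelity(ign, resp)   # lower fidelity
```

    In this toy setup the attended stream is reconstructed with reliably higher fidelity than the ignored one, mirroring the qualitative pattern the paper reports for higher-order auditory areas.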
    Magnetoencephalography
    Auditory scene analysis
    Auditory System
    Auditory perception
    Auditory imagery
    Stimulus (psychology)
    Selective auditory attention
    Advances and directions of research in auditory scene analysis (ASA), including psychological auditory scene analysis and computational auditory scene analysis (CASA), are reviewed. Psychological ASA uncovers the perceptual procedures and principles by which human audition detects and separates multiple acoustic streams. The goal of CASA is to simulate the processing mechanisms of the human auditory system with computers that can extract the desired information from a noisy environment, aiming at creating a machine with auditory intelligence.
    Auditory scene analysis
    Auditory perception
    Auditory System
    Selective auditory attention
    Citations (1)
    Abstract The ability to parse a complex auditory scene into perceptual objects is facilitated by a hierarchical auditory system. Successive stages in the hierarchy transform an auditory scene of multiple overlapping sources, from peripheral tonotopically based representations in the auditory nerve, into perceptually distinct auditory-object-based representations in auditory cortex. Here, using magnetoencephalography (MEG) recordings from human subjects, both men and women, we investigate how a complex acoustic scene consisting of multiple speech sources is represented in distinct hierarchical stages of auditory cortex. Using systems-theoretic methods of stimulus reconstruction, we show that the primary-like areas in auditory cortex contain dominantly spectrotemporal-based representations of the entire auditory scene. Here, both attended and ignored speech streams are represented with almost equal fidelity, and a global representation of the full auditory scene with all its streams is a better candidate neural representation than that of individual streams being represented separately. In contrast, we also show that higher-order auditory cortical areas represent the attended stream separately, and with significantly higher fidelity, than unattended streams. Furthermore, the unattended background streams are more faithfully represented as a single unsegregated background object rather than as separated objects. Taken together, these findings demonstrate the progression of the representations and processing of a complex acoustic scene up through the hierarchy of human auditory cortex. Significance Statement Using magnetoencephalography (MEG) recordings from human listeners in a simulated cocktail party environment, we investigate how a complex acoustic scene consisting of multiple speech sources is represented in separate hierarchical stages of auditory cortex. We show that the primary-like areas in auditory cortex use a dominantly spectrotemporal-based representation of the entire auditory scene, with both attended and ignored speech streams represented with almost equal fidelity. In contrast, we show that higher-order auditory cortical areas represent an attended speech stream separately from, and with significantly higher fidelity than, unattended speech streams. Furthermore, the unattended background streams are represented as a single undivided background object rather than as distinct background objects.
    Magnetoencephalography
    Auditory scene analysis
    Auditory System
    Selective auditory attention
    Representation
    Auditory imagery
    Stimulus (psychology)
    Auditory perception
    Citations (6)
    The psychophysically based modeling approach of computational auditory scene analysis helps to understand the human auditory system and contributes to the improvement of technical acoustic systems, e.g. hearing aids and hands-free telephony. In the present paper, primitive auditory scene analysis (Bregman 1990) is characterized as a cluster analysis problem. This leads to a system based on a temporal fuzzy cluster analysis capable of reproducing psychoacoustical streaming experiments. Moreover, it is possible to effectively combine monaural and binaural features to produce a robust segmentation of auditory scenes. This also facilitates the separation of the original sound source signals.
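    The clustering view of streaming can be illustrated with a toy sketch: plain fuzzy c-means run on (time, frequency) points from a galloping ABA_ tone sequence. The feature choice, axis scaling and parameters here are illustrative assumptions, not the paper's temporal fuzzy clustering system, which combines richer monaural and binaural cues.

```python
import numpy as np

rng = np.random.default_rng(2)

# Galloping ABA_ tone sequence as (time, frequency) feature points, one per
# tone. Axis scaling is an illustrative choice.
f_a, f_b = 5.0, 10.0               # tone frequencies, in units of 100 Hz
pts, true_stream = [], []
t = 0.0
for _ in range(10):                # ten ABA_ triplets
    for f, s in ((f_a, 0), (f_b, 1), (f_a, 0)):
        pts.append((t, f)); true_stream.append(s)
        t += 0.125
    t += 0.125                     # the silent "_" gap closing each triplet
X = np.array(pts)
true_stream = np.array(true_stream)

def fuzzy_c_means(X, c=2, m=2.0, iters=100, restarts=5, rng=rng):
    """Plain Bezdek fuzzy c-means; returns the lowest-cost membership matrix."""
    best_U, best_cost = None, np.inf
    for _ in range(restarts):
        U = rng.random((len(X), c)); U /= U.sum(1, keepdims=True)
        for _ in range(iters):
            centers = (U**m).T @ X / (U**m).sum(0)[:, None]
            d2 = ((X[:, None, :] - centers[None, :, :])**2).sum(2) + 1e-12
            inv = 1.0 / d2         # closed-form membership update for m = 2
            U = inv / inv.sum(1, keepdims=True)
        cost = (U**m * d2).sum()
        if cost < best_cost:
            best_U, best_cost = U, cost
    return best_U

U = fuzzy_c_means(X)
# With this large frequency separation, the two fuzzy clusters line up with
# the two tone streams (A vs B), i.e. the sequence segregates.
assign = U.argmax(1)
agree = max(np.mean(assign == true_stream), np.mean(assign != true_stream))
```

    Shrinking the frequency separation makes the within-cluster cost of a frequency split approach that of a temporal split, so the cluster structure, like perception, no longer favors segregation into two streams.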
    Auditory scene analysis
    Monaural
    Auditory System
    Selective auditory attention
    Citations (7)
    It has been more than a decade since Al Bregman and other authors brought the challenge of auditory scene analysis back to the attention of auditory science. While a lot of research has been done on and around this topic, an accepted theory of auditory scene analysis has not evolved. Auditory science has little, if any, information about how the nervous system solves this problem, and there have not been any major successes in developing computational methods that solve the problem for most real-world auditory scenes. I will argue that the major reason more has not been accomplished is that auditory scene analysis is a really hard problem. If one starts with a single sound source and tries to understand how the auditory system determines this single source, the problem is already very complicated, even without adding other sources that occur at the same time, as in the typical depiction of the auditory scene. In this paper I illustrate some of the challenges for determining the auditory scene that have not received much attention, as well as some of the more widely discussed aspects of the challenge. [Work supported by NIDCD.]
    Auditory scene analysis
    Auditory System
    Selective auditory attention
    Depiction
    Citations (0)