    A really complicated problem: Auditory scene analysis
    Abstract:
    It has been more than a decade since Al Bregman and other authors brought the challenge of auditory scene analysis back to the attention of auditory science. While a lot of research has been done on and around this topic, an accepted theory of auditory scene analysis has not evolved. Auditory science has little, if any, information about how the nervous system solves this problem, and there have not been any major successes in developing computational methods that solve the problem for most real-world auditory scenes. I will argue that the major reason that more has not been accomplished is that auditory scene analysis is a really hard problem. If one starts with a single sound source and tries to understand how the auditory system determines this single source, the problem is already very complicated, even before adding other sources that occur at the same time, as is typical in depictions of the auditory scene. In this paper I will illustrate some of the challenges that exist for determining the auditory scene that have not received a lot of attention, as well as some of the more discussed aspects of the challenge. [Work supported by NIDCD.]
    Keywords:
    Auditory scene analysis
    Auditory System
    Selective auditory attention
    Depiction
    Real-world sound is a mixture of different sources. The sound scene of a busy coffeehouse, for example, usually consists of several conversations, music playing, laughter and maybe a baby crying, the door being slammed, different machines operating in the background and more. When humans are confronted with these sounds, they rapidly and automatically orient themselves in this complex sound environment, paying attention to the sound source of interest. This ability is known in psychoacoustics as Auditory Scene Analysis (ASA). The counterpart to ASA in machine listening is Computational Auditory Scene Analysis (CASA), the effort to build computer models that perform auditory scene analysis. Research on CASA has led to great advances in machine systems capable of analyzing complex sound scenes, for tasks such as audio source separation and multiple pitch estimation. However, such systems often fail in the presence of corrupted or incomplete sound scenes. In a real-world sound scene, different sounds overlap in time and frequency, interfering with and canceling each other. Sometimes the sound of interest has critical information missing entirely, for example an old recording from a scratched CD or a band-limited telephone speech signal. In a real world filled with incomplete sounds, the human auditory system has the ability, known as Auditory Scene Induction (ASI), to estimate the missing parts of a continuous auditory scene briefly covered by noise or other interference, and to perceptually resynthesize them. Since humans are able to infer the missing elements in an auditory scene, it is important for machine systems to have the same function. However, there have been very few efforts in computer audition to realize this ability computationally. This thesis focuses on the computational realization of auditory scene induction, which it calls Computational Auditory Scene Induction (CASI).
More specifically, the goal of my research is to build computer models that are capable of resynthesizing the missing information of an audio scene. Building upon existing statistical models (NMF, PLCA, HMM and N-HMM) for audio representation, I will formulate this ability as a model-based spectrogram analysis and inference problem under the expectation–maximization (EM) framework with missing data in the observation. Various sources of information, including the spectral and temporal structure of audio and top-down knowledge about speech, are incorporated into the proposed models to produce accurate reconstruction of the missing information in an audio scene. The effectiveness of the proposed machine systems is demonstrated on three audio signal processing tasks: singing melody extraction, audio imputation and audio bandwidth expansion. Each system is assessed through experiments on real-world audio data and compared to the state of the art. Although far from perfect, the proposed systems show many advantages and significant improvement over existing systems. In addition, this thesis shows that different applications related to missing audio data can be considered under the unified framework of CASI. This opens a new avenue of research in the Computer Audition community.
    Auditory scene analysis
    Psychoacoustics
    Selective auditory attention
    Auditory System
    Scene statistics
    Citations (0)
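The thesis abstract above frames reconstruction as spectrogram inference with missing observations. As a rough illustration of that idea only, the sketch below fits a non-negative factorization to the observed entries of a toy matrix and reads the missing entries off the fitted model. It is not the thesis's method: it swaps in plain Euclidean NMF with a binary mask and multiplicative updates instead of the EM-based PLCA/N-HMM models the abstract names, and every name in it (`masked_nmf`, `V_obs`, `M`, the toy data) is hypothetical.

```python
import random

def matmul(A, B):
    """Product of two matrices stored as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def transpose(A):
    return [list(col) for col in zip(*A)]

def hadamard(A, B):
    """Element-wise product of two equally sized matrices."""
    return [[a * b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]

def update(F, num, den, eps):
    """Multiplicative update F <- F * num / den, element-wise."""
    return [[f * n / (d + eps) for f, n, d in zip(fr, nr, dr)]
            for fr, nr, dr in zip(F, num, den)]

def masked_nmf(V, M, rank, n_iter=200, seed=0, eps=1e-12):
    """Fit V ~ W H using only entries where M == 1; return the full product W H.

    V: magnitude "spectrogram" (rows = frequency bins, columns = frames).
    M: binary mask, 1.0 = observed, 0.0 = missing (assumed known in advance).
    """
    rng = random.Random(seed)
    W = [[rng.uniform(0.1, 1.0) for _ in range(rank)] for _ in V]
    H = [[rng.uniform(0.1, 1.0) for _ in V[0]] for _ in range(rank)]
    MV = hadamard(M, V)  # observed data, with missing entries zeroed out
    for _ in range(n_iter):
        Ht = transpose(H)
        W = update(W, matmul(MV, Ht), matmul(hadamard(M, matmul(W, H)), Ht), eps)
        Wt = transpose(W)
        H = update(H, matmul(Wt, MV), matmul(Wt, hadamard(M, matmul(W, H))), eps)
    return matmul(W, H)

# Toy rank-1 "spectrogram": one spectral shape scaled by a gain per frame.
a = [1.0, 2.0, 3.0, 4.0]          # spectral shape (hypothetical)
b = [1.0, 0.5, 2.0, 1.5, 3.0]     # per-frame gain (hypothetical)
V_true = [[x * y for y in b] for x in a]

M = [[1.0] * len(b) for _ in a]
M[1][2] = 0.0                      # pretend these two bins were lost
M[3][0] = 0.0
V_obs = hadamard(M, V_true)

X = masked_nmf(V_obs, M, rank=1)
print("imputed:", X[1][2], X[3][0], "true:", V_true[1][2], V_true[3][0])
```

Because the mask removes the missing bins from both the numerator and the denominator of each update, those bins never influence the fit; the model's value at those positions is the imputation. Real audio would need a higher rank and, as the abstract notes, temporal structure and prior knowledge that this sketch leaves out.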
    Auditory scene analysis
    Auditory System
    Auditory perception
    Selective auditory attention
    Auditory feedback
    Timbre
    Auditory scene analysis
    Auditory System
    Auditory masking
    Speech is normally heard in the presence of other interfering sounds, a fact that has plagued speech technology research. A technique for segregating speech from an arbitrary noise source is described. The approach is based on a model of human auditory processing. The auditory system has an extraordinary ability to group together acoustic components that belong to the same sound source, a phenomenon named auditory scene analysis by Bregman (1989). Models of auditory scene analysis could provide a robust front-end for speech recognition in noisy environments, and may also have applications in automatic music transcription. Additionally, the authors hope that models of this type will contribute to the understanding of hearing and hearing impairment.
    Auditory scene analysis
    Auditory System
    Selective auditory attention
    Transcription
    Citations (1)
    Advances and directions of research in auditory scene analysis (ASA), including psychological auditory scene analysis and computational auditory scene analysis (CASA), are reviewed. Psychological ASA seeks to uncover the perceptual processes and principles by which human listeners detect and separate multiple acoustic streams. The goal of CASA is to simulate the processing mechanisms of the human auditory system on computers so that desired information can be extracted from noisy environments, with the aim of creating a machine with auditory intelligence.
    Auditory scene analysis
    Auditory perception
    Auditory System
    Selective auditory attention
    Citations (1)
    The "cocktail party effect" refers to the ability of human listeners to separate the acoustic signal reaching their ears into its individual components, corresponding to individual sound sources in the environment. Despite this phenomenon appearing trivial for humans, implementing the cocktail party effect computationally remains an ambitious challenge. The approach used in this paper takes inspiration from human strategies for separating an acoustic environment into distinct perceptual auditory streams. A series of time-frequency-based features, analogous to those thought to emerge at various stages in the human auditory processing pathway, are derived from binaural auditory inputs. These feature vectors are used as inputs to an unsupervised cluster analysis that groups feature values assumed to correspond to the same object. Reconstructed auditory streams are then correlated with the original components used to create the auditory scene. Our model is capable of reconstructing streams that correlate with the original components (r = 0.3-0.7) used to create the complex auditory scene. The success of the reconstructions is largely dependent on the signal-to-noise ratio of the components of the auditory scene.
    Auditory scene analysis
    Auditory System
    SIGNAL (programming language)
    Feature (linguistics)
    Auditory perception
    Source Separation
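The grouping step described in the abstract above (feature vectors per time-frequency bin, clustered without supervision into putative sources) can be sketched roughly as follows. The abstract does not say which clustering algorithm or which features were used, so this is an assumption throughout: plain k-means stands in for the clustering, and each bin carries two made-up binaural cues, an interaural time difference in milliseconds and an interaural level difference in decibels.

```python
def dist2(p, q):
    """Squared Euclidean distance between two feature vectors."""
    return sum((a - b) ** 2 for a, b in zip(p, q))

def kmeans(points, k, n_iter=20):
    """Plain k-means with a deterministic farthest-point initialisation."""
    centers = [list(points[0])]
    while len(centers) < k:
        # Next centre: the point farthest from all centres chosen so far.
        far = max(points, key=lambda p: min(dist2(p, c) for c in centers))
        centers.append(list(far))
    labels = [0] * len(points)
    for _ in range(n_iter):
        # Assignment step: each bin joins its nearest centre.
        labels = [min(range(k), key=lambda j: dist2(p, centers[j])) for p in points]
        # Update step: each centre moves to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centers[j] = [sum(x) / len(members) for x in zip(*members)]
    return labels, centers

# Hypothetical (ITD in ms, ILD in dB) cues for six time-frequency bins,
# alternating between a source on the left and a source on the right.
features = [
    (-0.42, -6.1), (0.51, 5.2),
    (-0.38, -5.7), (0.47, 4.8),
    (-0.40, -6.3), (0.55, 5.5),
]

labels, centers = kmeans(features, k=2)
print(labels)
```

In a fuller pipeline the labels would act as a time-frequency mask: bins sharing a label are kept together when a stream is resynthesized, and the resynthesized stream can then be correlated with the original component, as the abstract describes.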
    Auditory scene analysis
    Auditory System
    Auditory perception
    Perceptual system
    Categorical Perception
    Auditory pathways
    The psychophysically based modeling approach of computational auditory scene analysis helps to understand the human auditory system and contributes to the improvement of technical acoustical systems, e.g. hearing aids and hands-free telephony. In the present paper, primitive auditory scene analysis (Bregman 1990) is characterized as a cluster analysis problem. This leads to a system based on a temporal fuzzy cluster analysis capable of reproducing psychoacoustical streaming experiments. Moreover, it is possible to effectively combine monaural and binaural features to produce a robust segmentation of auditory scenes. This also facilitates the separation of the original sound source signals.
    Auditory scene analysis
    Monaural
    Auditory System
    Selective auditory attention
    Citations (7)
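To make the "fuzzy cluster analysis" view of streaming in the abstract above more concrete, here is a minimal sketch of standard fuzzy c-means applied to a toy alternating-tone sequence. It is an illustration under assumptions, not the paper's system: the temporal component of the authors' method is not modeled, the single feature (tone pitch in semitones relative to an arbitrary reference) is made up, and the function names are hypothetical.

```python
def memberships(xs, centers, m=2.0, eps=1e-12):
    """Fuzzy c-means membership of each value in xs to each centre."""
    power = 2.0 / (m - 1.0)
    U = []
    for x in xs:
        d = [abs(x - c) + eps for c in centers]  # eps avoids division by zero
        U.append([1.0 / sum((d[j] / d[k]) ** power for k in range(len(centers)))
                  for j in range(len(centers))])
    return U

def fuzzy_cmeans_1d(xs, init_centers, m=2.0, n_iter=50):
    """Alternate membership and centre updates for a fixed number of rounds."""
    centers = list(init_centers)
    for _ in range(n_iter):
        U = memberships(xs, centers, m)
        for j in range(len(centers)):
            w = [u[j] ** m for u in U]
            centers[j] = sum(wi * x for wi, x in zip(w, xs)) / sum(w)
    return memberships(xs, centers, m), centers

# Alternating low/high tones, as in an ABAB streaming sequence
# (pitch in semitones relative to an arbitrary reference; values are made up).
tones = [0.0, 7.0, 0.2, 7.1, -0.1, 6.9]

U, centers = fuzzy_cmeans_1d(tones, init_centers=[min(tones), max(tones)])
for tone, u in zip(tones, U):
    print(tone, [round(v, 3) for v in u])
```

Unlike a hard clustering, each tone receives a graded membership in both streams. With a wide pitch separation the memberships are close to 0 or 1, which corresponds to hearing two separate streams; shrinking the separation pulls the memberships toward an ambiguous split.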