LISTEN: a system for locating and tracking individual speakers

1996 
Both visual and acoustical informations provide effective means of telecommunication between persons. In this context, the face is the most important part of the person both visually and acoustically. We describe how the cooperation of image and audio processing allows to track a person's face and to collect the audio information it produces. We present detection techniques of regions of interest (e.g. Moving regions of skin color), coupled with a neural network based face detector with a low false alarm rate, to locate and track faces. The system is connected to a nine microphone array adaptive beam forming which performs immediate beam forming. Visual and acoustical informations from the speaker face are thus obtained in real time.
    • Correction
    • Source
    • Cite
    • Save
    • Machine Reading By IdeaReader
    20
    References
    31
    Citations
    NaN
    KQI
    []