Polyphonic pitch perception in rooms using deep learning networks with data rendered in auditory virtual environments

2019 
This paper proposes methods for generation and implementation of uniform, large-scale data from auralized MIDI music files for use with deep learning networks for polyphonic pitch perception and impulse response recognition. This includes synthesis and sound source separation of large batches of multitrack MIDI files in non-real time, convolution with artificial binaural room impulse responses, and techniques for neural network training. Using ChucK, individual tracks for each MIDI file, containing the ground truth for pitch and other parameters, are processed concurrently with variable Synthesis ToolKit (STK) instruments, and the audio output is written to separate wave files in order to create multiple incoherent sound sources. Then, each track is convolved with a measured or synthetic impulse response that corresponds to the virtual position of the instrument in the room before all tracks are digitally summed. The database then contains both the symbolic description in the form of MIDI commands and the auralized music performances. A polyphonic pitch model based on an array of autocorrelation functions for individual frequency bands is used to train a neural network and analyze the data. [Work supported by IBM AIRC grant and NSF BCS-1539276.]
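The per-track auralization step described above can be sketched as follows. This is a minimal illustrative sketch, not the paper's implementation: the paper renders the dry tracks with ChucK and STK instruments, whereas this Python/NumPy fragment (with hypothetical `tracks` and `brirs` arrays) only shows the final convolution-and-sum stage, in which each dry track is convolved with the binaural room impulse response for its virtual source position and all wet tracks are digitally summed into one stereo mix.

```python
import numpy as np

def auralize(tracks, brirs):
    """Convolve each dry instrument track with its binaural room impulse
    response (left/right pair) and sum all wet tracks into a stereo mix.

    tracks: list of 1-D float arrays, one dry mono render per MIDI track
    brirs:  list of (2, K) float arrays, the measured or synthetic BRIR
            for that instrument's virtual position in the room
    """
    # Length of the longest wet (convolved) track determines the mix length.
    n = max(len(t) + b.shape[1] - 1 for t, b in zip(tracks, brirs))
    mix = np.zeros((2, n))
    for track, brir in zip(tracks, brirs):
        for ch in range(2):  # left ear, right ear
            wet = np.convolve(track, brir[ch])
            mix[ch, :len(wet)] += wet  # digital sum of incoherent sources
    return mix
```

Because each source keeps its own impulse response until the final sum, the same dry renders can be re-auralized for many room and position combinations without re-synthesizing the MIDI tracks.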
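The pitch front end, "an array of autocorrelation functions for individual frequency bands," is essentially a correlogram. The sketch below is an assumption-laden illustration of that idea (not the paper's code): it uses SciPy Butterworth band-pass filters as a stand-in for whatever filterbank the authors used, and the `band_edges` and `max_lag` parameters are hypothetical.

```python
import numpy as np
from scipy.signal import butter, sosfilt

def correlogram(x, sr, band_edges, max_lag):
    """Array of autocorrelation functions, one per frequency band.

    x:          1-D float array, auralized audio signal
    sr:         sample rate in Hz
    band_edges: list of (low_hz, high_hz) band-pass edges
    max_lag:    number of autocorrelation lags to keep, in samples
    """
    feats = []
    for lo, hi in band_edges:
        # Band-pass the signal into one frequency channel.
        sos = butter(4, [lo, hi], btype="bandpass", fs=sr, output="sos")
        band = sosfilt(sos, x)
        # Autocorrelation for lags 0..max_lag-1, normalized by lag-0 energy.
        full = np.correlate(band, band, mode="full")
        ac = full[len(band) - 1 : len(band) - 1 + max_lag]
        feats.append(ac / (ac[0] + 1e-12))
    return np.stack(feats)  # shape: (n_bands, max_lag)
```

Peaks across the lag axis of this (n_bands, max_lag) array mark candidate periods of the concurrent pitches; such a fixed-size feature map is a natural input for neural network training.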