Mask Optimisation for Neural Network Monaural Source Separation

2017 
An ideal binary mask is a means of separating multiple sound sources within a single audio file. Previous work has shown that a deep neural network can be trained to approximate the ideal mask, but at a substantial computational cost. We present a method to assess the impact of reducing the resolution of the mask by averaging over time and frequency bins, so that the computational cost can be significantly reduced. Our work uses the original separate musical channels mask as a ground truth and compares this against an ideal binary mask and an ideal "soft" or proportional mask. The ideal soft mask is then compared against masks produced by a range of averaging levels. We find that averaging could reduce the number of weights in the neural network by a factor of 16 (and thus significantly improve computation time), while still achieving plausible source separation results.
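The mask types named in the abstract can be illustrated with a minimal NumPy sketch. Everything here is an assumption made for illustration, not the authors' implementation: the function names, the two-source setup, the random stand-in spectrograms, and the 4 x 4 averaging block (chosen only because it gives a 16-fold reduction in mask entries; the abstract does not state which averaging levels were used).

```python
import numpy as np

def ideal_binary_mask(target_mag, other_mag):
    """1.0 where the target source dominates a time-frequency bin, else 0.0."""
    return (target_mag > other_mag).astype(float)

def ideal_soft_mask(target_mag, other_mag, eps=1e-12):
    """Proportion of each bin's magnitude that belongs to the target source."""
    return target_mag / (target_mag + other_mag + eps)

def average_mask(mask, f_factor, t_factor):
    """Average non-overlapping f_factor x t_factor blocks of a (freq, time) mask."""
    n_freq, n_time = mask.shape
    # Trim edges so the shape divides evenly (a simplification of this sketch).
    n_freq -= n_freq % f_factor
    n_time -= n_time % t_factor
    blocks = mask[:n_freq, :n_time].reshape(
        n_freq // f_factor, f_factor, n_time // t_factor, t_factor
    )
    return blocks.mean(axis=(1, 3))

def expand_mask(coarse, f_factor, t_factor):
    """Repeat each coarse value so the mask can be applied at full resolution."""
    return np.repeat(np.repeat(coarse, f_factor, axis=0), t_factor, axis=1)

# Hypothetical magnitude spectrograms for two sources: 64 frequency x 32 time bins.
rng = np.random.default_rng(0)
vocals = rng.random((64, 32))
accompaniment = rng.random((64, 32))

binary = ideal_binary_mask(vocals, accompaniment)
soft = ideal_soft_mask(vocals, accompaniment)
coarse = average_mask(soft, 4, 4)      # 16 x 8 mask values instead of 64 x 32
restored = expand_mask(coarse, 4, 4)   # back to 64 x 32 for comparison

# Mean absolute error between the full soft mask and its averaged version.
error = np.abs(soft - restored).mean()
print(soft.size // coarse.size, error)
```

A network that predicts the coarse mask needs correspondingly fewer output units, which is the kind of saving the abstract refers to. How this maps onto the actual weight count depends on the network architecture, which the abstract does not describe.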