Mask Optimisation for Neural Network Monaural Source Separation

Richard J. Cant, Caroline S. Langensiepen, William Metcalf

2017

An ideal binary mask is a means of separating multiple sound sources within a single audio file. Previous work has shown that a deep neural network can be trained to approximate the ideal mask, but at a substantial computational cost. We present a method to assess the impact of reducing the mask by averaging over time and frequency bins, so that the computational cost can be significantly reduced. Our work uses the original separate musical channels mask as a ground truth and compares this against an ideal binary mask and an ideal "soft" or proportional mask. The ideal soft mask is then compared against masks produced by a range of averaging levels. We find that averaging could reduce the number of weights in the neural network by a factor of 16 (and thus significantly improve computation time), while still achieving plausible results in terms of source separation.
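
The masks discussed above can be illustrated with a minimal sketch. It assumes magnitude spectrograms of shape (frequency bins, time frames) for a target source and for everything else in the mix; the function names, the 4x4 block size, and the random test data are hypothetical and are not taken from the paper. A 4x4 block reduces the number of mask values by a factor of 16, which is one way such a reduction might arise, not necessarily the configuration the authors used.

```python
import numpy as np

def ideal_binary_mask(target_mag, other_mag):
    """1 where the target source dominates a time-frequency bin, else 0."""
    return (target_mag > other_mag).astype(float)

def ideal_soft_mask(target_mag, other_mag, eps=1e-12):
    """Proportion of each bin's magnitude that belongs to the target."""
    return target_mag / (target_mag + other_mag + eps)

def average_mask(mask, f_block, t_block):
    """Reduce a mask by averaging non-overlapping f_block x t_block tiles.

    Rows or columns that do not fill a whole tile are trimmed
    (an assumption made here for simplicity).
    """
    n_freq, n_time = mask.shape
    nf = (n_freq // f_block) * f_block
    nt = (n_time // t_block) * t_block
    tiles = mask[:nf, :nt].reshape(nf // f_block, f_block,
                                   nt // t_block, t_block)
    return tiles.mean(axis=(1, 3))

def expand_mask(reduced, f_block, t_block):
    """Repeat each averaged value so the mask regains full resolution."""
    return np.repeat(np.repeat(reduced, f_block, axis=0), t_block, axis=1)

# Hypothetical magnitude spectrograms: 64 frequency bins, 32 time frames.
rng = np.random.default_rng(0)
target = rng.random((64, 32))
other = rng.random((64, 32))

binary = ideal_binary_mask(target, other)
soft = ideal_soft_mask(target, other)
reduced = average_mask(soft, 4, 4)      # 16x fewer mask values
restored = expand_mask(reduced, 4, 4)   # back to 64 x 32 for comparison

# Mean absolute difference between the full and the averaged soft mask.
error = np.abs(soft - restored).mean()
```

Comparing `restored` against `soft` for several block sizes gives a rough sense of how much mask detail each averaging level discards before any network is trained.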
