Nonnegative Matrix Factorization With Basis Clustering Using Cepstral Distance Regularization

2018 
One successful approach for audio source separation involves applying nonnegative matrix factorization (NMF) to a magnitude spectrogram regarded as a nonnegative matrix. This can be interpreted as approximating the observed spectrum at each time frame as a linear sum of basis spectra scaled by time-varying amplitudes. This paper deals with the problem of unsupervised instrument-wise source separation of polyphonic signals based on an extension of the NMF approach. We focus on the fact that each piece of music is typically played on a handful of musical instruments, which allows us to assume that the spectra of the underlying audio events in a polyphonic signal can be grouped into a reasonably small number of clusters in the mel-frequency cepstral coefficient (MFCC) domain. Based on this assumption, we propose formulating factorization of a magnitude spectrogram and clustering of the basis spectra in the MFCC domain as a joint optimization problem and derive a novel optimization algorithm based on the majorization–minimization principle. Experimental results revealed that our method outperformed a two-stage algorithm that performs factorization and then clusters the basis spectra, thus showing the advantage of the joint optimization approach.
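To make the NMF interpretation above concrete, the following is a minimal sketch of plain NMF applied to a nonnegative matrix standing in for a magnitude spectrogram. It is not the joint factorization and clustering method proposed in the paper: the cepstral distance regularizer and the MFCC-domain clustering are omitted. The function names `nmf_kl` and `kl_divergence`, the choice of the generalized Kullback-Leibler divergence with standard multiplicative updates, and the synthetic input are all assumptions made for illustration.

```python
import numpy as np


def kl_divergence(Y, X, eps=1e-12):
    """Generalized KL divergence between nonnegative matrices Y and X."""
    return float(np.sum(Y * np.log((Y + eps) / (X + eps)) - Y + X))


def nmf_kl(Y, n_bases, n_iter=200, eps=1e-12, seed=0):
    """Plain NMF, Y ~= H @ U, via multiplicative updates (illustrative only).

    Y : (n_freq, n_frames) nonnegative matrix, e.g. a magnitude spectrogram.
    H : (n_freq, n_bases) basis spectra, one per column.
    U : (n_bases, n_frames) time-varying amplitudes, one row per basis.
    """
    rng = np.random.default_rng(seed)
    n_freq, n_frames = Y.shape
    H = rng.random((n_freq, n_bases)) + eps
    U = rng.random((n_bases, n_frames)) + eps
    for _ in range(n_iter):
        X = H @ U + eps  # current model of the spectrogram
        H *= ((Y / X) @ U.T) / (U.sum(axis=1) + eps)
        X = H @ U + eps  # refresh the model after updating H
        U *= (H.T @ (Y / X)) / (H.sum(axis=0)[:, None] + eps)
    return H, U


if __name__ == "__main__":
    # Synthetic stand-in for a spectrogram: 3 hidden bases, 40 bins, 60 frames.
    rng = np.random.default_rng(1)
    Y = rng.random((40, 3)) @ rng.random((3, 60))
    H, U = nmf_kl(Y, n_bases=3)
    print("KL divergence after fitting:", kl_divergence(Y, H @ U))
```

Each column of `H` plays the role of a basis spectrum and each row of `U` is its amplitude over time, so frame `t` is modeled as `H @ U[:, t]`. In the two-stage baseline mentioned in the abstract, the columns of `H` would be clustered after a factorization of this kind, whereas the proposed method optimizes both jointly.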