Improving Musical Tag Annotation with Stacking and Convolutional Neural Networks

2020 
Personalized music systems usually rely on manual song annotations (tags) as a mechanism for querying and navigating large music collections. However, manual annotation is a hard task given the large amount of music available today. Automatic song annotation based on content analysis is a potential solution to this problem and has recently been gaining attention. In this work, we propose to extend the Stacking prediction framework with Convolutional Neural Networks (CNNs) in order to improve the music tag annotation task. In general, Stacking is a technique in which the output of a first learning stage is used as input to a second stage. We investigate two extensions. The first uses the weights learned by the CNN in the first training stage as input to the second stage. The second uses an autoencoder, built with the weights learned by the CNN in the first stage, to generate images 50% smaller than those used as input to the first stage; the weights and the images obtained in the first stage are then used as input to the second stage. We evaluated our proposals with five different CNN models on three datasets well known in the literature (FMA, MillionSong, and MagnaTagATune), obtaining interesting results.
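To make the two-stage idea concrete, below is a minimal PyTorch sketch of stacked prediction with CNNs. The layer sizes, the 96×1366 spectrogram input shape, and the 50-tag output are illustrative assumptions, not the models evaluated in the paper: a first-stage CNN produces tag logits and a learned representation, and a second-stage model is then trained on those first-stage outputs.

```python
# Minimal sketch of stacked prediction with CNNs (illustrative only;
# architectures, input shape, and tag count are assumptions, not the
# paper's actual models).
import torch
import torch.nn as nn

N_TAGS = 50  # hypothetical number of tags


class StageOneCNN(nn.Module):
    """First stage: CNN over a (1 x 96 x 1366) mel-spectrogram 'image'."""

    def __init__(self, n_tags=N_TAGS):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(), nn.MaxPool2d(4),
            nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),
        )
        self.head = nn.Linear(32, n_tags)

    def forward(self, x):
        z = self.features(x).flatten(1)  # learned representation
        return self.head(z), z           # tag logits + features


class StageTwo(nn.Module):
    """Second stage: learns from first-stage outputs (the stacking step)."""

    def __init__(self, n_tags=N_TAGS, feat_dim=32):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(n_tags + feat_dim, 64), nn.ReLU(),
            nn.Linear(64, n_tags),
        )

    def forward(self, logits, feats):
        # Concatenate first-stage tag probabilities with learned features.
        return self.net(torch.cat([logits.sigmoid(), feats], dim=1))


stage1 = StageOneCNN()
stage2 = StageTwo()
x = torch.randn(8, 1, 96, 1366)                      # batch of spectrograms
logits1, feats = stage1(x)
logits2 = stage2(logits1.detach(), feats.detach())   # second-stage refinement
print(logits2.shape)                                 # torch.Size([8, 50])
```

Under the paper's second proposal, the second stage would additionally receive spectrograms compressed to half size by an autoencoder built from the first-stage weights; the sketch above covers only the basic stacking step.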