2015 | Original Paper | Book Chapter
Multimodal Music Mood Classification by Fusion of Audio and Lyrics
Authors: Hao Xue, Like Xue, Feng Su
Published in: MultiMedia Modeling
Publisher: Springer International Publishing
Mood analysis of music data has attracted increasing research and application attention in recent years. In this paper, we propose a novel multimodal approach to music mood classification that incorporates audio and lyric information and consists of three key components: 1) lyric feature extraction with a recursive hierarchical deep learning model, preceded by lyric filtering via discriminative vocabulary reduction and synonymous lyric expansion; 2) saliency-based audio feature extraction; 3) a Hough-forest-based fusion and classification scheme that fuses the two modalities at the finer-grained sentence level, exploiting the time alignment across modalities. The effectiveness of the proposed model is verified by experiments on a real dataset containing more than 3000 minutes of music.
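The sentence-level fusion idea can be illustrated with a minimal voting sketch: each time-aligned sentence produces a mood prediction with a confidence, and the song-level label is obtained by accumulating these votes, in the spirit of Hough-forest voting. This is a hypothetical simplification, not the paper's actual model; the mood labels and confidence values below are invented for illustration.

```python
from collections import Counter

def classify_song(sentence_votes):
    """Accumulate per-sentence mood votes into a song-level label.

    sentence_votes: list of (mood_label, confidence) pairs, one per
    time-aligned sentence. Each sentence casts a confidence-weighted
    vote, and the mood with the largest accumulated vote mass wins
    (a Hough-style voting aggregation, simplified for illustration).
    """
    tally = Counter()
    for mood, conf in sentence_votes:
        tally[mood] += conf  # weighted vote for this sentence's mood
    return tally.most_common(1)[0][0]

# Hypothetical per-sentence predictions for one song:
votes = [("happy", 0.9), ("calm", 0.4), ("happy", 0.7), ("sad", 0.2)]
print(classify_song(votes))  # → happy
```

In the paper's full scheme, the per-sentence predictions would themselves come from a Hough forest trained on fused audio and lyric features for each aligned sentence, rather than being supplied directly as above.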