2017 | Original Paper | Book Chapter

Graph-Based Multimodal Music Mood Classification in Discriminative Latent Space

Authors: Feng Su, Hao Xue

Published in: MultiMedia Modeling

Publisher: Springer International Publishing

Abstract

Automatic music mood classification is an important and challenging problem in music information retrieval (MIR) and has attracted growing attention from various research areas. In this paper, we propose a novel multimodal method for music mood classification that exploits the complementarity of the lyrics and audio of a piece of music to improve classification accuracy. We first extract descriptive sentence-level lyrics and audio features from the music. Then, we project the paired low-level features of the two modalities into a learned common discriminative latent space, which not only eliminates between-modality heterogeneity but also increases the discriminability of the resulting descriptions. On the basis of this latent representation, we employ a graph-learning-based multimodal classification model for music mood that accounts for the cross-modal similarity between local audio and lyrics descriptions, thereby exploiting the correlations between the two modalities. The resulting mood predictions for the individual sentences of a song are then aggregated by a simple voting scheme. The effectiveness of the proposed method is demonstrated by experiments on a real dataset comprising more than 3,000 minutes of music and the corresponding lyrics.
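
The abstract describes a three-stage pipeline: sentence-level feature extraction, projection of the paired audio and lyrics features into a shared discriminative latent space, and graph-based classification followed by per-song voting. The Python sketch below only illustrates the overall shape of such a pipeline under loose assumptions; it is not the authors' implementation. In particular, scikit-learn's CCA is used merely as a stand-in for the learned discriminative projection, the propagation step uses the standard local-and-global-consistency closed form F = (I - alpha*S)^(-1) Y, and all function names and parameters are illustrative.

import numpy as np
from sklearn.cross_decomposition import CCA

def project_to_latent(audio_feats, lyric_feats, n_components=32):
    # Learn a shared latent space from paired sentence-level audio/lyrics features.
    # CCA is a simple placeholder for the discriminative projection described above.
    cca = CCA(n_components=n_components)
    cca.fit(audio_feats, lyric_feats)
    z_audio, z_lyrics = cca.transform(audio_feats, lyric_feats)
    return z_audio, z_lyrics

def propagate_labels(Z, y, n_classes, alpha=0.9, sigma=1.0):
    # Graph-based label propagation over latent descriptors: F = (I - alpha*S)^(-1) Y.
    # Labeled rows have y >= 0; unlabeled rows are marked with y == -1.
    d2 = np.square(Z[:, None, :] - Z[None, :, :]).sum(axis=-1)  # pairwise squared distances
    W = np.exp(-d2 / (2.0 * sigma ** 2))                        # Gaussian affinities
    np.fill_diagonal(W, 0.0)
    d_inv_sqrt = 1.0 / np.sqrt(W.sum(axis=1) + 1e-12)
    S = W * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]           # symmetric normalization
    Y = np.zeros((len(y), n_classes))
    Y[np.flatnonzero(y >= 0), y[y >= 0]] = 1.0                  # one-hot seed labels
    F = np.linalg.solve(np.eye(len(y)) - alpha * S, Y)
    return F.argmax(axis=1)

def vote_per_song(sentence_preds, song_ids):
    # Majority vote over the sentence-level mood predictions of each song.
    per_song = {}
    for pred, sid in zip(sentence_preds, song_ids):
        per_song.setdefault(sid, []).append(pred)
    return {sid: int(np.bincount(preds).argmax()) for sid, preds in per_song.items()}

One way to make the cross-modal similarities mentioned in the abstract explicit in this sketch is to stack the two views as separate graph nodes, e.g. Z = np.vstack([z_audio, z_lyrics]) with labels and song ids duplicated accordingly, so that audio-to-lyrics edges contribute to the propagation before the per-song vote.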

Metadata
Title
Graph-Based Multimodal Music Mood Classification in Discriminative Latent Space
Authors
Feng Su
Hao Xue
Copyright Year
2017
DOI
https://doi.org/10.1007/978-3-319-51811-4_13
