
2019 | Original Paper | Book Chapter

Multimodel Music Emotion Recognition Using Unsupervised Deep Neural Networks

Authors: Jianchao Zhou, Xiaoou Chen, Deshun Yang

Published in: Proceedings of the 6th Conference on Sound and Music Technology (CSMT)

Publisher: Springer Singapore


Abstract

In most studies on multimodal music emotion recognition, the different modalities are combined in a simple way and used for supervised training. The resulting improvement in performance indicates that the modalities are correlated, yet few studies explicitly model the relationships between the modal data. In this paper, we propose to model the relationships between modalities (i.e., lyric and audio data) with deep learning methods for multimodal music emotion recognition. Several deep networks are first applied to perform unsupervised feature learning over the multiple modalities. We then design a series of music emotion recognition experiments to evaluate the learned features. The results show that the deep networks perform well at unsupervised feature learning for multimodal data and model the cross-modal relationships effectively. In addition, we demonstrate a unimodal enhancement experiment, in which better features for one modality (e.g., lyrics) are learned by the proposed deep network when the other modality (e.g., audio) is also present during unsupervised feature learning.
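
As a rough illustration of the kind of unsupervised multimodal feature learning described in the abstract, the sketch below trains a bimodal autoencoder with a shared hidden layer on audio and lyric feature vectors. This is a minimal sketch under assumed settings, not the authors' architecture: the names (BimodalAutoencoder, train_step), the feature dimensions (AUDIO_DIM, LYRIC_DIM, SHARED_DIM), and the modality-dropout trick used to mimic the unimodal enhancement setting are all illustrative choices.

# Minimal bimodal autoencoder sketch (illustrative only; assumed
# dimensions and training details, not the paper's exact setup).
import torch
import torch.nn as nn

AUDIO_DIM, LYRIC_DIM, SHARED_DIM = 128, 300, 64  # hypothetical feature sizes

class BimodalAutoencoder(nn.Module):
    def __init__(self):
        super().__init__()
        # Modality-specific encoders feeding one shared representation.
        self.enc_audio = nn.Sequential(nn.Linear(AUDIO_DIM, 128), nn.ReLU())
        self.enc_lyric = nn.Sequential(nn.Linear(LYRIC_DIM, 128), nn.ReLU())
        self.shared = nn.Sequential(nn.Linear(256, SHARED_DIM), nn.ReLU())
        # Decoders reconstruct both modalities from the shared code.
        self.dec_audio = nn.Linear(SHARED_DIM, AUDIO_DIM)
        self.dec_lyric = nn.Linear(SHARED_DIM, LYRIC_DIM)

    def forward(self, audio, lyric):
        h = self.shared(torch.cat([self.enc_audio(audio),
                                   self.enc_lyric(lyric)], dim=1))
        return self.dec_audio(h), self.dec_lyric(h), h

def train_step(model, opt, audio, lyric, p_drop=0.5):
    # Randomly zeroing a modality at the input while still reconstructing
    # both pushes cross-modal information into the shared code, which is
    # what makes the unimodal-enhancement setting possible.
    a_in = audio * (torch.rand(audio.size(0), 1) > p_drop).float()
    l_in = lyric * (torch.rand(lyric.size(0), 1) > p_drop).float()
    rec_a, rec_l, _ = model(a_in, l_in)
    loss = (nn.functional.mse_loss(rec_a, audio)
            + nn.functional.mse_loss(rec_l, lyric))
    opt.zero_grad()
    loss.backward()
    opt.step()
    return loss.item()

# Usage with random stand-in data; the learned shared code h would then be
# fed to an emotion classifier or regressor in a separate supervised stage.
model = BimodalAutoencoder()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
audio_batch, lyric_batch = torch.randn(32, AUDIO_DIM), torch.randn(32, LYRIC_DIM)
print(train_step(model, opt, audio_batch, lyric_batch))

In the unimodal enhancement setting, the trained shared encoder would be applied at test time with only one modality available (e.g., lyric features), the missing modality being fed as zeros.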


Metadata
Title
Multimodel Music Emotion Recognition Using Unsupervised Deep Neural Networks
Authors
Jianchao Zhou
Xiaoou Chen
Deshun Yang
Copyright Year
2019
Publisher
Springer Singapore
DOI
https://doi.org/10.1007/978-981-13-8707-4_3
