
2015 | OriginalPaper | Chapter

Fusion of Text and Audio Semantic Representations Through CCA

Abstract

Humans are natural multimedia processing machines. Multimedia is a domain of multiple modalities, including audio, text and images. A central aspect of multimedia processing is the coherent integration of media from different modalities into a single entity. Multimodal information fusion architectures become a necessity when not all information channels are available at all times. In this paper, we introduce a multimodal fusion of audio signals and lyrics in a shared semantic space through canonical correlation analysis (CCA). We propose an audio retrieval system based on extended semantic analysis of audio signals, and we combine this model with a tf-idf representation of lyrics to obtain a multimodal retrieval system. We use CCA and supervised learning methods as a basis for relating audio and lyrics information. Our experimental evaluation indicates that the proposed model outperforms prior approaches based on simple canonical correlation methods. Finally, the efficiency of the proposed method makes it possible to handle large music and lyrics collections, enabling users to explore relevant lyrics for music datasets.
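The central mechanism described in the abstract can be sketched in Python with NumPy. A tf-idf lyrics representation and an audio representation are projected into a shared space with CCA, which finds directions a, b maximizing a^T Cxy b / sqrt(a^T Cxx a · b^T Cyy b), and lyrics are then ranked by their similarity to an audio query. This is a minimal sketch under stated assumptions, not the authors' implementation. The audio vectors are stand-in dense features rather than the chapter's extended semantic analysis representation. The tf-idf variant, the ridge term `reg`, cosine ranking, and the helper names `tfidf_matrix`, `fit_cca` and `retrieve_lyrics` are all hypothetical choices. The supervised learning component mentioned in the abstract is not modeled.

```python
import math
from collections import Counter

import numpy as np


def tfidf_matrix(docs):
    """Return an (n_docs x n_terms) tf-idf matrix and the sorted vocabulary.

    Assumed variant: term frequency normalized by document length times the
    unsmoothed idf log(N / df). Tokenization is a plain lowercase whitespace split.
    """
    tokenized = [doc.lower().split() for doc in docs]
    vocab = sorted({t for toks in tokenized for t in toks})
    col = {t: j for j, t in enumerate(vocab)}
    n = len(docs)
    df = Counter(t for toks in tokenized for t in set(toks))  # document frequency
    X = np.zeros((n, len(vocab)))
    for i, toks in enumerate(tokenized):
        for t, c in Counter(toks).items():
            X[i, col[t]] = (c / len(toks)) * math.log(n / df[t])
    return X, vocab


def _inv_sqrt(C):
    """Inverse square root V diag(w^-1/2) V^T of a symmetric positive-definite matrix."""
    w, V = np.linalg.eigh(C)
    return (V / np.sqrt(w)) @ V.T


def fit_cca(X, Y, k, reg=1e-3):
    """Regularized CCA between views X (n x dx) and Y (n x dy).

    Returns projections Wx (dx x k) and Wy (dy x k), the top-k canonical
    correlations, and the column means used for centering.
    """
    mx, my = X.mean(axis=0), Y.mean(axis=0)
    Xc, Yc = X - mx, Y - my
    n = X.shape[0]
    # Ridge term keeps the covariances invertible (sparse tf-idf makes Cyy singular).
    Cxx = Xc.T @ Xc / (n - 1) + reg * np.eye(X.shape[1])
    Cyy = Yc.T @ Yc / (n - 1) + reg * np.eye(Y.shape[1])
    Cxy = Xc.T @ Yc / (n - 1)
    Kx, Ky = _inv_sqrt(Cxx), _inv_sqrt(Cyy)
    # Singular vectors of the whitened cross-covariance give the canonical directions;
    # the singular values are the canonical correlations, in descending order.
    U, s, Vt = np.linalg.svd(Kx @ Cxy @ Ky)
    return Kx @ U[:, :k], Ky @ Vt[:k].T, s[:k], mx, my


def retrieve_lyrics(audio_query, Wx, mx, lyrics_shared, top=3):
    """Rank lyrics already projected into the shared space by cosine similarity
    to a single audio query; returns the indices of the best `top` matches."""
    q = (audio_query - mx) @ Wx
    norms = np.linalg.norm(lyrics_shared, axis=1) * np.linalg.norm(q)
    sims = lyrics_shared @ q / (norms + 1e-12)
    return np.argsort(-sims)[:top]


if __name__ == "__main__":
    # Toy data (not from the chapter): both views are noisy linear maps of a shared latent Z.
    rng = np.random.default_rng(0)
    Z = rng.normal(size=(300, 3))
    audio = Z @ rng.normal(size=(3, 8)) + 0.1 * rng.normal(size=(300, 8))
    lyrics = Z @ rng.normal(size=(3, 10)) + 0.1 * rng.normal(size=(300, 10))
    Wx, Wy, corrs, mx, my = fit_cca(audio, lyrics, k=3)
    lyrics_shared = (lyrics - my) @ Wy
    print("canonical correlations:", np.round(corrs, 3))
    print("lyrics ranked for song 0:", retrieve_lyrics(audio[0], Wx, mx, lyrics_shared))
```

In a real pipeline, the `lyrics` matrix would come from `tfidf_matrix` over the song texts. The ridge term then matters: when the vocabulary is larger than the number of songs, the unregularized lyrics covariance is singular and `_inv_sqrt` would produce non-finite values. Cosine similarity is one common ranking choice in the shared space; Euclidean distance between projections would be an equally plausible alternative.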

Metadata
Title: Fusion of Text and Audio Semantic Representations Through CCA
Authors: Kamelia Aryafar, Ali Shokoufandeh
Copyright Year: 2015
DOI: https://doi.org/10.1007/978-3-319-14899-1_7