2015 | OriginalPaper | Chapter

Fusion of Text and Audio Semantic Representations Through CCA

Authors: Kamelia Aryafar, Ali Shokoufandeh

Published in: Multimodal Pattern Recognition of Social Signals in Human-Computer-Interaction

Publisher: Springer International Publishing

Abstract

Humans are natural multimedia processing machines. Multimedia is a domain of multiple modalities, including audio, text and images. A central aspect of multimedia processing is the coherent integration of media from different modalities as a single identity. Multimodal information fusion architectures become a necessity when not all information channels are available at all times. In this paper, we introduce a multimodal fusion of audio signals and lyrics in a shared semantic space through canonical correlation analysis. We propose an audio retrieval system based on extended semantic analysis of audio signals, and we combine this model with a tf-idf representation of lyrics to obtain a multimodal retrieval system. We use canonical correlation analysis and supervised learning methods as a basis for relating audio and lyrics information. Our experimental evaluation indicates that the proposed model outperforms prior approaches based on simple canonical correlation methods. Finally, the efficiency of the proposed method makes it possible to handle large music and lyrics collections, enabling users to explore relevant lyrics information for music datasets.
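The fusion idea summarized in the abstract can be illustrated with a minimal sketch, assuming NumPy is available. It is not the chapter's implementation: the function names (`tfidf`, `fit_cca`, `rank_lyrics`), the smoothed idf variant, the regularization constant `reg` and the cosine ranking are all illustrative assumptions, and the audio feature matrix stands in for whatever semantic audio representation is actually used.

```python
import numpy as np

def tfidf(docs):
    """Toy tf-idf matrix: rows are documents, columns are sorted vocabulary terms."""
    vocab = sorted({w for d in docs for w in d.split()})
    index = {w: j for j, w in enumerate(vocab)}
    tf = np.zeros((len(docs), len(vocab)))
    for i, d in enumerate(docs):
        for w in d.split():
            tf[i, index[w]] += 1.0
    df = (tf > 0).sum(axis=0)            # number of documents containing each term
    idf = np.log(len(docs) / df) + 1.0   # assumption: one common smoothed idf variant
    return tf * idf, vocab

def _inv_sqrt(C):
    """Inverse matrix square root of a symmetric positive definite matrix."""
    w, V = np.linalg.eigh(C)
    return V @ np.diag(w ** -0.5) @ V.T

def fit_cca(X, Y, k, reg=1e-3):
    """Regularized CCA via whitening and SVD (X: audio features, Y: lyrics features)."""
    mx, my = X.mean(axis=0), Y.mean(axis=0)
    Xc, Yc = X - mx, Y - my
    n = X.shape[0]
    Cxx = Xc.T @ Xc / (n - 1) + reg * np.eye(X.shape[1])
    Cyy = Yc.T @ Yc / (n - 1) + reg * np.eye(Y.shape[1])
    Cxy = Xc.T @ Yc / (n - 1)
    Kx, Ky = _inv_sqrt(Cxx), _inv_sqrt(Cyy)
    # Singular values of the whitened cross-covariance are the canonical correlations.
    U, s, Vt = np.linalg.svd(Kx @ Cxy @ Ky)
    Wx = Kx @ U[:, :k]      # projection for the audio view
    Wy = Ky @ Vt.T[:, :k]   # projection for the lyrics view
    return mx, my, Wx, Wy, s[:k]

def rank_lyrics(audio_query, lyrics, model):
    """Rank lyrics rows by cosine similarity to an audio query in the shared space."""
    mx, my, Wx, Wy, _ = model
    q = (audio_query - mx) @ Wx
    L = (lyrics - my) @ Wy
    sims = (L @ q) / (np.linalg.norm(L, axis=1) * np.linalg.norm(q) + 1e-12)
    return np.argsort(-sims)  # indices of lyrics rows, best match first
```

In use, the rows of `Y` would come from something like `tfidf(lyrics_texts)` and the rows of `X` from per-track audio descriptors, with row `i` of both matrices referring to the same song; `rank_lyrics(X[i], Y, model)` then returns lyrics indices ordered by similarity in the shared space.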

Title: Fusion of Text and Audio Semantic Representations Through CCA
Authors: Kamelia Aryafar, Ali Shokoufandeh
Publisher: Springer International Publishing
Book: Multimodal Pattern Recognition of Social Signals in Human-Computer-Interaction
Print ISBN: 978-3-319-14898-4
Electronic ISBN: 978-3-319-14899-1

Copyright Year: 2015
DOI: https://doi.org/10.1007/978-3-319-14899-1_7
