2017 | OriginalPaper | Chapter

Musical Query-by-Semantic-Description Based on Convolutional Neural Network

Authors: Jing Qin, Hongfei Lin, Dongyu Zhang, Shaowu Zhang, Xiaocong Wei

Published in: Information Retrieval

Publisher: Springer International Publishing

Abstract

We present a new music retrieval system based on query by semantic description (QBSD), in which a novel song can be used as a query and transformed into a semantic vector by a convolutional neural network. The method is based on supervised multi-class labeling (SML), by which a song is annotated with semantically meaningful tags and relevant songs are retrieved from a semantically annotated database. The CAL500 data set is used in the experiments, and a deep learning model is learned for each tag in the semantic space. To improve annotation performance, a loss function adjustment algorithm and the SMOTE algorithm are employed. The experimental results show that the model can retrieve songs with high semantic similarity and provides a more natural way to retrieve music.
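The retrieval step described in the abstract can be illustrated with a minimal sketch. It assumes the network has already mapped each song to a vector of per-tag scores, and it ranks database songs by cosine similarity to the query vector. The abstract does not state which similarity measure the authors use, so cosine similarity is an assumption here, and the tag names, song names, and scores are hypothetical values chosen only for illustration.

```python
import math


def cosine_similarity(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def retrieve(query_vector, database, top_k=2):
    """Return the titles of the top_k songs most similar to the query vector."""
    scored = [
        (cosine_similarity(query_vector, vector), title)
        for title, vector in database.items()
    ]
    # Highest similarity first.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [title for _, title in scored[:top_k]]


# Hypothetical tag vocabulary; one score per tag, as a per-tag model might output.
TAGS = ["happy", "sad", "acoustic guitar", "fast tempo"]

# Hypothetical semantically annotated database: song title -> semantic vector.
DATABASE = {
    "song_a": [0.9, 0.1, 0.8, 0.3],
    "song_b": [0.1, 0.9, 0.2, 0.1],
    "song_c": [0.8, 0.2, 0.1, 0.9],
}

# Hypothetical semantic vector predicted for a novel query song.
QUERY = [0.85, 0.15, 0.7, 0.4]

RESULTS = retrieve(QUERY, DATABASE, top_k=2)
print(RESULTS)
```

In this toy database the query is closest to song_a, followed by song_c. Any other vector similarity or distance could be substituted in `retrieve` without changing the overall structure.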

Metadata
Title
Musical Query-by-Semantic-Description Based on Convolutional Neural Network
Authors
Jing Qin
Hongfei Lin
Dongyu Zhang
Shaowu Zhang
Xiaocong Wei
Copyright Year
2017
DOI
https://doi.org/10.1007/978-3-319-68699-8_19