2017 | Original Paper | Book Chapter

Musical Query-by-Semantic-Description Based on Convolutional Neural Network

Authors: Jing Qin, Hongfei Lin, Dongyu Zhang, Shaowu Zhang, Xiaocong Wei

Published in: Information Retrieval

Publisher: Springer International Publishing

Abstract

We present a new music retrieval system based on query by semantic description (QBSD), in which a novel song can be used as a query and transformed into a semantic vector by a convolutional neural network. The method is based on supervised multi-class labeling (SML), in which a song is annotated with semantically meaningful tags and relevant songs are retrieved from a semantically annotated database. Using the CAL500 data set in our experiments, we learn a deep learning model for each tag in the semantic space. To improve annotation performance, a loss function adjustment algorithm and the SMOTE algorithm are employed. The experimental results show that this model can retrieve songs with high semantic similarity and provides a more natural approach to music retrieval.
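
The retrieval step described in the abstract can be illustrated with a minimal sketch. It assumes each song has already been mapped to a semantic vector of per-tag scores (in the paper, by the convolutional neural network) and ranks database songs by cosine similarity to the query vector. The abstract does not name the similarity measure, so cosine similarity is an assumption here, and the tag names, song identifiers, scores, and function names are all hypothetical.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def retrieve(
    query_vec: Sequence[float],
    database: Dict[str, Sequence[float]],
    top_k: int = 3,
) -> List[Tuple[str, float]]:
    """Return the top_k (song_id, similarity) pairs, most similar first."""
    scored = [
        (song_id, cosine_similarity(query_vec, vec))
        for song_id, vec in database.items()
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


# Hypothetical tag vocabulary; each vector holds one score per tag, in this order.
TAGS = ["happy", "sad", "guitar", "piano"]

# Hypothetical semantically annotated database (assumed values, not CAL500 data).
DATABASE = {
    "song_a": [0.9, 0.1, 0.8, 0.1],
    "song_b": [0.1, 0.9, 0.1, 0.8],
    "song_c": [0.7, 0.2, 0.6, 0.3],
}

# Semantic vector of a novel query song (assumed to come from the tag models).
QUERY = [0.8, 0.1, 0.9, 0.2]

RESULTS = retrieve(QUERY, DATABASE, top_k=2)
for song_id, score in RESULTS:
    print(f"{song_id}: {score:.3f}")
```

With these made-up vectors, the two songs whose tag profiles resemble the query ("happy", "guitar") rank ahead of the one that does not. Any other vector similarity or distance could be substituted in `retrieve` without changing the overall structure.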

Metadata
Title
Musical Query-by-Semantic-Description Based on Convolutional Neural Network
Authors
Jing Qin
Hongfei Lin
Dongyu Zhang
Shaowu Zhang
Xiaocong Wei
Copyright Year
2017
DOI
https://doi.org/10.1007/978-3-319-68699-8_19
