Abstract
The rise of the Internet has led the music industry to shift from physical media to online products and services. As a consequence, online music collections now store millions of songs and are constantly enriched with new content. This has created a need for music technologies that allow users to interact with these extensive collections efficiently and effectively. Music search and discovery may be carried out using tags, by matching user interests, and by exploiting content-based acoustic similarity. One major issue in music information retrieval is how to combine such noisy and heterogeneous information sources in order to improve retrieval effectiveness. With this aim in mind, the article explores a novel music retrieval framework that combines tags and acoustic similarity through a probabilistic graph-based representation of a collection of songs. The retrieval function highlights the path across the graph that most likely observes a user query, and it is used to improve state-of-the-art music search and discovery engines by delivering more relevant ranking lists. An empirical evaluation shows that the proposed approach performs better than retrieval strategies that rank songs according to individual information sources alone or that use a combination of them.
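To make the idea of a "most likely path across the graph" concrete, the following is a minimal sketch, not the article's actual model. It assumes a hidden-Markov-style reading of the abstract: each song is a graph node, transition probabilities are derived from acoustic similarity, each song emits tags with some probability, and a Viterbi-style search finds the best path for the query tags. The function name, the uniform starting prior, and the toy numbers are all hypothetical.

```python
import math


def _log(p):
    """Log of a probability, mapping zero to negative infinity."""
    return math.log(p) if p > 0 else float("-inf")


def most_likely_path(transitions, emissions, query_tags, path_len):
    """Viterbi-style search for the best-scoring path of `path_len` songs.

    transitions[i][j]: assumed probability of moving from song i to song j
                       (standing in for acoustic similarity).
    emissions[i]:      dict mapping a tag to the assumed probability that
                       song i is described by that tag.
    """
    n = len(transitions)

    def emit(i):
        # Log-probability that song i "observes" every tag in the query.
        return sum(_log(emissions[i].get(tag, 0.0)) for tag in query_tags)

    # Uniform prior over the starting song (an assumption of this sketch).
    score = [_log(1.0 / n) + emit(i) for i in range(n)]
    backpointers = []

    for _ in range(path_len - 1):
        new_score, pointers = [], []
        for j in range(n):
            # Best predecessor of song j at this step.
            best_i = max(range(n), key=lambda i: score[i] + _log(transitions[i][j]))
            new_score.append(score[best_i] + _log(transitions[best_i][j]) + emit(j))
            pointers.append(best_i)
        score = new_score
        backpointers.append(pointers)

    # Trace back from the best final song to recover the whole path.
    path = [max(range(n), key=lambda i: score[i])]
    for pointers in reversed(backpointers):
        path.append(pointers[path[-1]])
    path.reverse()
    return path


# Toy collection of three songs; all numbers are invented for illustration.
T = [
    [0.0, 0.8, 0.2],
    [0.7, 0.0, 0.3],
    [0.5, 0.5, 0.0],
]
E = [
    {"rock": 0.9, "calm": 0.1},
    {"rock": 0.7, "calm": 0.3},
    {"rock": 0.1, "calm": 0.9},
]

print(most_likely_path(T, E, ["rock"], 2))  # prints [0, 1]
```

In this sketch the returned path is read as a ranking list: songs that both match the query tags and sound similar to their neighbours in the path score highest. Plain Viterbi decoding may revisit a song on longer paths, so a practical ranker would need an extra rule to avoid repeats.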