A Probabilistic Model to Combine Tags and Acoustic Similarity for Music Retrieval

Published: 01 May 2012

Abstract

The rise of the Internet has led the music industry to shift from physical media to online products and services. As a consequence, current online music collections store millions of songs and are constantly being enriched with new content. This has created a need for music technologies that allow users to interact with these extensive collections efficiently and effectively. Music search and discovery may be carried out by using tags, matching user interests, and exploiting content-based acoustic similarity. One major issue in music information retrieval is how to combine such noisy and heterogeneous information sources in order to improve retrieval effectiveness. With this aim in mind, this article explores a novel music retrieval framework that combines tags and acoustic similarity through a probabilistic graph-based representation of a collection of songs. The retrieval function highlights the path across the graph that is most likely to observe a user query, and it is used to improve state-of-the-art music search and discovery engines by delivering more relevant ranked lists. In an empirical evaluation, we show that the proposed approach performs better than retrieval strategies that rank songs according to individual information sources alone or that use a combination of them.
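
The idea of a "most likely path across the graph" can be illustrated with a small sketch. This is not the article's implementation. It assumes, purely for illustration, that each song is a state, that transition probabilities are derived from acoustic similarity, that each song has a single probability of "observing" the query derived from its tags, and that the path is found with standard Viterbi decoding. The function name `viterbi_path` and all numbers are hypothetical.

```python
import math


def _log(p):
    """Logarithm that maps zero probability to negative infinity."""
    return math.log(p) if p > 0 else float("-inf")


def viterbi_path(prior, transition, emission, steps):
    """Return the most probable sequence of `steps` songs (as indices).

    prior[i]         -- probability of starting the path at song i
    transition[i][j] -- probability of moving from song i to song j
                        (assumed here to reflect acoustic similarity)
    emission[i]      -- probability that song i observes the query
                        (assumed here to reflect tag relevance)
    """
    n = len(prior)

    # score[i]: best log-probability of any path that ends at song i.
    score = [_log(prior[i]) + _log(emission[i]) for i in range(n)]
    back = []  # back[t][j]: best predecessor of song j at step t + 1

    for _ in range(steps - 1):
        new_score, pointers = [], []
        for j in range(n):
            best_i = max(range(n),
                         key=lambda i: score[i] + _log(transition[i][j]))
            new_score.append(score[best_i]
                             + _log(transition[best_i][j])
                             + _log(emission[j]))
            pointers.append(best_i)
        score = new_score
        back.append(pointers)

    # Trace the best path backwards from the best final song.
    path = [max(range(n), key=lambda i: score[i])]
    for pointers in reversed(back):
        path.append(pointers[path[-1]])
    return path[::-1]


# Hypothetical toy collection of three songs.
prior = [1 / 3, 1 / 3, 1 / 3]
emission = [0.7, 0.2, 0.1]          # tag-based relevance to the query
transition = [[0.0, 0.9, 0.1],      # acoustic-similarity transitions
              [0.5, 0.0, 0.5],
              [0.2, 0.8, 0.0]]
ranking = viterbi_path(prior, transition, emission, steps=2)
```

With these toy numbers the path starts at song 0, the best tag match, and then moves to song 1, its closest acoustic neighbour, so `ranking` is `[0, 1]`. Plain Viterbi decoding may revisit a song on longer paths, so a real ranking function would need an additional rule to keep results distinct.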

Published in

ACM Transactions on Information Systems, Volume 30, Issue 2
May 2012
245 pages
ISSN: 1046-8188
EISSN: 1558-2868
DOI: 10.1145/2180868

Copyright © 2012 ACM

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery
New York, NY, United States

Publication History

• Published: 1 May 2012
• Accepted: 1 December 2011
• Revised: 1 September 2011
• Received: 1 February 2011
