research-article

A novel framework for efficient automated singer identification in large music databases

Published: 19 May 2009

Abstract

Over the past decade, there has been explosive growth in the availability of multimedia data, particularly image, video, and music. Because of this, content-based music retrieval has attracted attention from the multimedia database and information retrieval communities. Content-based music retrieval requires us to be able to automatically identify particular characteristics of music data. One such characteristic, useful in a range of applications, is the identification of the singer in a musical piece. Unfortunately, existing approaches to this problem suffer from either low accuracy or poor scalability. In this article, we propose a novel scheme, called Hybrid Singer Identifier (HSI), for efficient automated singer recognition. HSI uses multiple low-level features extracted from both vocal and nonvocal music segments to enhance the identification process; it achieves this via a hybrid architecture that builds profiles of individual singer characteristics based on statistical mixture models. An extensive experimental study on a large music database demonstrates the superiority of our method over state-of-the-art approaches in terms of effectiveness, efficiency, scalability, and robustness.
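The profile-based idea in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes each singer profile holds one model per feature group, and it substitutes a single diagonal Gaussian (a one-component mixture) for the statistical mixture models the article describes. The feature group names ("vocal", "nonvocal"), the function names, and the toy vectors are all hypothetical.

```python
import math
from statistics import fmean, pvariance


def fit_gaussian(samples):
    """Fit a per-dimension mean and variance (a one-component diagonal mixture)."""
    dims = list(zip(*samples))
    means = [fmean(d) for d in dims]
    # Floor the variance so a constant dimension does not cause division by zero.
    variances = [max(pvariance(d), 1e-6) for d in dims]
    return means, variances


def log_likelihood(x, model):
    """Log density of vector x under a diagonal Gaussian model."""
    means, variances = model
    return sum(
        -0.5 * (math.log(2 * math.pi * v) + (xi - m) ** 2 / v)
        for xi, m, v in zip(x, means, variances)
    )


def build_profile(training):
    """Map {feature_group: [vectors]} to {feature_group: fitted model}."""
    return {name: fit_gaussian(vectors) for name, vectors in training.items()}


def identify(song, profiles):
    """Return the singer whose profile best explains the song's feature vectors.

    Assumption: scores from the feature groups are combined by summing the
    average log-likelihood of each group, with equal weight.
    """
    def score(profile):
        return sum(
            fmean([log_likelihood(x, profile[name]) for x in frames])
            for name, frames in song.items()
        )
    return max(profiles, key=lambda singer: score(profiles[singer]))


# Toy data: two hypothetical singers, each with vocal and nonvocal features.
profiles = {
    "singer_a": build_profile({
        "vocal": [[1.0, 2.0], [1.2, 1.8], [0.9, 2.1]],
        "nonvocal": [[5.0], [5.5], [4.8]],
    }),
    "singer_b": build_profile({
        "vocal": [[3.0, 0.5], [3.2, 0.4], [2.9, 0.7]],
        "nonvocal": [[1.0], [1.2], [0.7]],
    }),
}

query = {"vocal": [[1.1, 1.9], [1.0, 2.0]], "nonvocal": [[5.1]]}
print(identify(query, profiles))
```

Equal weighting of the feature groups is only one possible way to combine them; the abstract does not say how HSI merges evidence from vocal and nonvocal segments.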

          Reviews

          Rosa Michaelson

          It is very difficult to automate the recognition of music and musicians by content rather than by bibliographical detail. Recent work on singer identification uses voice data, often abstracted from a more complicated musical source, as the most representative feature, and applies some form of statistical modeling and machine learning to the singer's characteristics so that further examples can be tested and categorized. Shen et al. regard other information, such as beat and timbre, as being of equal and additional importance to vocal data, as well as what they call the genre of the piece, that is, the accompanying instrumental music. In this paper, the authors present a new method, the Hybrid Singer Identifier (HSI), which they claim is more robust than previous techniques. HSI is a multi-faceted method in which the four specific features noted above are abstracted from a piece of music, and a statistical profile for a particular singer is constructed for each feature from sample performances. This profile is then used to classify further songs. Several assumptions are made in the creation of each profile, namely, that singers tend to play with the same backing band and that the type of instrumentation does not change from recording to recording. These assumptions are not realistic, since session musicians are used extensively in recordings of major artists, who may also change style, and hence instrumentation, across a range of musical genres. Problems also occur with the first stage of profiling: using datasets from complete albums biases the "learning" toward the style of an album, not a singer, although HSI attempts to overcome this bias. The authors provide a useful overview of a range of associated research that is often conducted on small datasets, introduce us to a benchmarking system for such research set up in 2005, and proceed to demonstrate that HSI performs better than comparable methods on a number of factors, such as robustness and scalability.

          Their experimental work uses a large dataset of commercial popular singers of the late 20th century; it would be interesting to see how HSI fares when applied to different types of music and to what extent it can apply to other forms of multimedia.

          Online Computing Reviews Service

          • Published in

            ACM Transactions on Information Systems, Volume 27, Issue 3
            May 2009
            206 pages
            ISSN: 1046-8188
            EISSN: 1558-2868
            DOI: 10.1145/1508850

            Copyright © 2009 ACM

            Publisher

            Association for Computing Machinery

            New York, NY, United States

            Publication History

            • Published: 19 May 2009
            • Accepted: 1 September 2008
            • Revised: 1 May 2007
            • Received: 1 September 2006

            Qualifiers

            • research-article
            • Research
            • Refereed
