ACM Multimedia (MM) conference proceedings · DOI: 10.1145/500141.500201
Article

Hierarchical filtering method for content-based music retrieval via acoustic input

Published: 01 October 2001

ABSTRACT

This paper presents an implementation of a content-based music retrieval system that takes a user's acoustic input (an 8-second clip of singing or humming) via a microphone and retrieves the intended song from a database of over 3000 candidate songs. The system, known as Super MBox, demonstrates the feasibility of real-time music retrieval with a high success rate. Super MBox first converts the user's acoustic input into a pitch vector. A hierarchical filtering method (HFM) then filters out the 80% of candidates that are least likely to match and compares the query in detail with the remaining 20%. The output of Super MBox is a list of songs ranked by the computed similarity scores. The paper gives a brief mathematical analysis of the two-step HFM to show how to derive the optimal parameters of the comparison engine. The proposed HFM and its analysis framework can be applied directly to other multimedia information retrieval systems. We have tested Super MBox extensively and found that the top-20 success rate is over 85%, based on a dataset of about 2000 singing/humming clips from people with mediocre singing skills. Our studies demonstrate the feasibility of using Super MBox as a prototype for music search engines over the Internet and/or query engines in digital music libraries.
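The two-step filtering idea can be illustrated with a small sketch. This is a minimal illustration under stated assumptions, not the paper's implementation. Pitch vectors are assumed to be lists of semitone values (e.g., MIDI note numbers, one per frame). The query is assumed to correspond to the beginning of the song. The cheap first-stage score is assumed to be a Euclidean distance on short, mean-normalized vectors, and the detailed second-stage score is assumed to be dynamic time warping (DTW). The function names `hierarchical_filter`, `coarse_distance`, `dtw_distance`, and `top_k_hit` and the parameter `keep_ratio` are hypothetical; `keep_ratio=0.2` mirrors the 80%/20% split described above.

```python
import math
from typing import Dict, List, Sequence, Tuple


def normalize(pitch: Sequence[float]) -> List[float]:
    """Subtract the mean pitch so a query sung in another key can still match."""
    mean = sum(pitch) / len(pitch)
    return [p - mean for p in pitch]


def resample(pitch: Sequence[float], length: int) -> List[float]:
    """Nearest-index resampling to a fixed length (assumes a non-empty input)."""
    n = len(pitch)
    return [pitch[min(n - 1, i * n // length)] for i in range(length)]


def coarse_distance(query: Sequence[float], song: Sequence[float], length: int = 16) -> float:
    """Cheap stage-1 score: Euclidean distance between short, key-normalized vectors.

    Assumption (hypothetical): the query is compared with the song prefix of equal length.
    """
    q = normalize(resample(query, length))
    s = normalize(resample(song[: len(query)], length))
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(q, s)))


def dtw_distance(query: Sequence[float], song: Sequence[float]) -> float:
    """Detailed stage-2 score: classic dynamic time warping on key-normalized vectors."""
    q = normalize(query)
    s = normalize(song[: len(query)])
    n, m = len(q), len(s)
    inf = float("inf")
    d = [[inf] * (m + 1) for _ in range(n + 1)]
    d[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(q[i - 1] - s[j - 1])
            # Allow a step that advances the query, the song, or both.
            d[i][j] = cost + min(d[i - 1][j], d[i][j - 1], d[i - 1][j - 1])
    return d[n][m]


def hierarchical_filter(
    query: Sequence[float],
    database: Dict[str, Sequence[float]],
    keep_ratio: float = 0.2,
) -> List[Tuple[str, float]]:
    """Two-step retrieval: cheap filter over all songs, then DTW on the survivors."""
    # Stage 1: rank every song by the cheap coarse distance.
    by_coarse = sorted(database, key=lambda name: coarse_distance(query, database[name]))
    # Keep the best keep_ratio fraction (at least one song) and discard the rest.
    survivors = by_coarse[: max(1, int(len(by_coarse) * keep_ratio))]
    # Stage 2: re-score only the survivors with the more expensive DTW distance.
    scored = [(name, dtw_distance(query, database[name])) for name in survivors]
    # Lower distance means higher similarity, so sort ascending.
    return sorted(scored, key=lambda item: item[1])


def top_k_hit(ranking: List[Tuple[str, float]], target: str, k: int = 20) -> bool:
    """True if the intended song appears in the top k of the ranked list."""
    return target in [name for name, _ in ranking[:k]]
```

In this sketch, the saving comes from running the DTW comparison, which costs O(nm) per song for vectors of lengths n and m, on only the `keep_ratio` fraction of the database that survives the cheap stage. Lowering `keep_ratio` trades the risk of discarding the intended song for speed. Averaging `top_k_hit` over a set of labeled queries gives a top-k success rate of the kind reported in the abstract.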

