ABSTRACT
This paper presents an implementation of a content-based music retrieval system that takes a user's acoustic input (an 8-second clip of singing or humming) via a microphone and retrieves the intended song from a database of over 3000 candidate songs. The system, known as Super MBox, demonstrates the feasibility of real-time music retrieval with a high success rate. Super MBox first converts the microphone input into a pitch vector. A hierarchical filtering method (HFM) then filters out the 80% of candidates that are unlikely matches and compares the query in detail with the remaining 20%. The output of Super MBox is a list of songs ranked by their computed similarity scores. The paper gives a brief mathematical analysis of the two-step HFM to explain how to derive the optimum parameters of the comparison engine. The proposed HFM and its analysis framework can be directly applied to other multimedia information retrieval systems. We have tested Super MBox extensively and found that the top-20 success rate is over 85%, based on a dataset of about 2000 singing/humming clips from people with mediocre singing skills. Our studies demonstrate the feasibility of using Super MBox as a prototype for music search engines over the Internet and/or query engines in digital music libraries.
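The two-step filtering idea can be illustrated with a minimal Python sketch. The abstract does not say which comparison measures Super MBox uses, so everything specific here is an assumption for illustration: the coarse stage is a mean-removed average absolute difference, the detailed stage is a plain dynamic time warping (DTW) distance, pitch vectors are lists of semitone values, and the names `coarse_distance`, `dtw_distance`, and `hierarchical_filter` are hypothetical. Only the 20% survival ratio and the top-20 ranked output come from the abstract.

```python
import math


def _remove_mean(pitch):
    """Shift a pitch vector to zero mean so the comparison ignores the singer's key."""
    mean = sum(pitch) / len(pitch)
    return [p - mean for p in pitch]


def coarse_distance(query, candidate):
    """Cheap first-stage measure (assumed): mean absolute difference, no time warping."""
    n = min(len(query), len(candidate))
    q = _remove_mean(query[:n])
    c = _remove_mean(candidate[:n])
    return sum(abs(a - b) for a, b in zip(q, c)) / n


def dtw_distance(query, candidate):
    """Costly second-stage measure (assumed): classic DTW with absolute-difference cost."""
    q = _remove_mean(query)
    c = _remove_mean(candidate)
    prev = [math.inf] * (len(c) + 1)
    prev[0] = 0.0
    for i in range(1, len(q) + 1):
        curr = [math.inf] * (len(c) + 1)
        for j in range(1, len(c) + 1):
            cost = abs(q[i - 1] - c[j - 1])
            curr[j] = cost + min(prev[j], curr[j - 1], prev[j - 1])
        prev = curr
    return prev[len(c)]


def hierarchical_filter(query, database, keep_ratio=0.2, top_k=20):
    """Return up to top_k song names, ranked by the detailed distance.

    database maps song name -> pitch vector. Stage 1 keeps only the
    keep_ratio fraction of songs closest under the cheap measure;
    stage 2 ranks those survivors with the expensive measure.
    """
    by_coarse = sorted(database, key=lambda name: coarse_distance(query, database[name]))
    n_keep = max(1, math.ceil(keep_ratio * len(by_coarse)))
    survivors = by_coarse[:n_keep]
    ranked = sorted(survivors, key=lambda name: dtw_distance(query, database[name]))
    return ranked[:top_k]
```

The design trade-off in such a scheme is that a smaller `keep_ratio` reduces the number of expensive comparisons but raises the chance that the intended song is discarded in the first stage. That is presumably the kind of parameter choice the paper's analysis addresses.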
Index Terms
- Hierarchical filtering method for content-based music retrieval via acoustic input