Abstract
The acoustic environment provides a rich source of information about the types of activity, modes of communication, and people involved in many situations, and it can be classified accurately from recordings made with the microphones commonly found in PDAs and other consumer devices. We describe a prototype HMM-based acoustic environment classifier that incorporates an adaptive learning mechanism and a hierarchical classification model. Experimental results show that a wide variety of everyday environments can be classified accurately. We also obtain good results when classifying individual sounds, although accuracy depends on the granularity of the classification.
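The abstract does not specify the classifier's structure, features, or training procedure, so the following is only a generic illustrative sketch of the maximum-likelihood decision rule used in HMM-based acoustic classification: one Gaussian-emission HMM per environment, with a test sequence assigned to the model giving the highest log-likelihood via the forward algorithm. The model parameters and the placeholder feature frames below are invented for illustration; a real system would extract MFCC-style features from audio and train each model with Baum-Welch.

```python
import numpy as np

def log_gauss(x, mu, var):
    # Log-density of frame x under a diagonal-covariance Gaussian.
    return -0.5 * np.sum(np.log(2 * np.pi * var) + (x - mu) ** 2 / var)

def forward_loglik(X, pi, A, means, variances):
    # Log-space forward algorithm: log p(X | model) for a feature
    # sequence X of shape (T, D) under one Gaussian-emission HMM.
    T, S = len(X), len(pi)
    log_b = np.array([[log_gauss(x, means[s], variances[s]) for s in range(S)]
                      for x in X])
    alpha = np.log(pi) + log_b[0]
    for t in range(1, T):
        alpha = log_b[t] + np.logaddexp.reduce(alpha[:, None] + np.log(A), axis=0)
    return np.logaddexp.reduce(alpha)

def classify(X, models):
    # Pick the environment whose HMM assigns X the highest likelihood.
    return max(models, key=lambda name: forward_loglik(X, *models[name]))

# Two toy 2-state models with hand-picked parameters (pi, A, means, variances);
# real systems would estimate these from labelled recordings.
models = {
    "office": (np.array([0.5, 0.5]),
               np.array([[0.9, 0.1], [0.1, 0.9]]),
               np.array([[0.0, 0.0], [1.0, 1.0]]),
               np.array([[1.0, 1.0], [1.0, 1.0]])),
    "street": (np.array([0.5, 0.5]),
               np.array([[0.9, 0.1], [0.1, 0.9]]),
               np.array([[5.0, 5.0], [6.0, 6.0]]),
               np.array([[1.0, 1.0], [1.0, 1.0]])),
}

features = np.zeros((10, 2))  # stand-in for feature frames near the "office" means
print(classify(features, models))  # → office
```

A hierarchical scheme of the kind the abstract mentions would simply repeat this decision at each level of a class tree (e.g., first indoor vs. outdoor, then finer classes within the winner).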
Index Terms
- Acoustic environment classification