
Acoustic environment classification

Published: 01 July 2006

Abstract

The acoustic environment provides a rich source of information about the types of activity, communication modes, and people involved in many situations. It can be accurately classified using recordings from the microphones commonly found in PDAs and other consumer devices. We describe a prototype HMM-based acoustic environment classifier incorporating an adaptive learning mechanism and a hierarchical classification model. Experimental results show that we can accurately classify a wide variety of everyday environments. We also show good results in classifying single sounds, although accuracy is influenced by the granularity of the classification scheme.
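As context for the approach the abstract describes, the sketch below shows one common form of HMM-based environment classification: train one Gaussian HMM per environment class on MFCC feature frames, then assign a new clip to the class whose model gives the highest log-likelihood. This is an illustration only, not the authors' implementation; the choice of MFCC features, the hmmlearn and librosa libraries, and all file names and parameters here are assumptions.

    # Minimal sketch of HMM-based acoustic environment classification.
    # All libraries, names, and parameters are illustrative assumptions,
    # not the toolchain used in this article.
    import numpy as np
    import librosa                      # audio loading + MFCC extraction
    from hmmlearn import hmm            # Gaussian hidden Markov models

    def mfcc_features(path, n_mfcc=13):
        """Load an audio file and return an (n_frames, n_mfcc) MFCC matrix."""
        y, sr = librosa.load(path, sr=None)
        return librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc).T

    def train_models(clips_by_class, n_states=4):
        """clips_by_class maps an environment label to a list of audio file
        paths; returns one trained Gaussian HMM per label."""
        models = {}
        for label, paths in clips_by_class.items():
            feats = [mfcc_features(p) for p in paths]
            X = np.vstack(feats)                   # stacked observation frames
            lengths = [f.shape[0] for f in feats]  # frame count per clip
            model = hmm.GaussianHMM(n_components=n_states,
                                    covariance_type="diag", n_iter=25)
            model.fit(X, lengths)
            models[label] = model
        return models

    def classify(models, path):
        """Label a clip with the class whose HMM scores it highest."""
        X = mfcc_features(path)
        return max(models, key=lambda label: models[label].score(X))

    # Hypothetical usage:
    # models = train_models({"office": ["office_01.wav"],
    #                        "street": ["street_01.wav"]})
    # print(classify(models, "unknown_clip.wav"))

A hierarchical variant of the kind the abstract mentions could be layered on top of this scheme: one set of models first selects a coarse group (for example, indoor versus outdoor), and a second, per-group set of models then discriminates among the environments within that group.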

Index Terms

  • Acoustic environment classification

• Published in

  ACM Transactions on Speech and Language Processing, Volume 3, Issue 2 (July 2006), 94 pages
  ISSN: 1550-4875
  EISSN: 1550-4883
  DOI: 10.1145/1149290

  Copyright © 2006 ACM

  Publisher: Association for Computing Machinery, New York, NY, United States
