DOI: 10.1145/1555816.1555834
research-article

SoundSense: scalable sound sensing for people-centric applications on mobile phones

Published: 22 June 2009

ABSTRACT

Top-end mobile phones include a number of specialized (e.g., accelerometer, compass, GPS) and general-purpose sensors (e.g., microphone, camera) that enable new people-centric sensing applications. Perhaps the most ubiquitous and unexploited sensor on mobile phones is the microphone, a powerful sensor that is capable of making sophisticated inferences about human activity, location, and social events from sound. In this paper, we exploit this untapped sensor not in the context of human communications but as an enabler of new sensing applications. We propose SoundSense, a scalable framework for modeling sound events on mobile phones. SoundSense is implemented on the Apple iPhone and represents the first general-purpose sound sensing system specifically designed to work on resource-limited phones. The architecture and algorithms are designed for scalability, and SoundSense uses a combination of supervised and unsupervised learning techniques both to classify general sound types (e.g., music, voice) and to discover novel sound events specific to individual users. The system runs solely on the mobile phone with no back-end interactions. Through the implementation and evaluation of two proof-of-concept people-centric sensing applications, we demonstrate that SoundSense is capable of recognizing meaningful sound events that occur in users' everyday lives.

Published in

MobiSys '09: Proceedings of the 7th international conference on Mobile systems, applications, and services
June 2009
370 pages
ISBN: 9781605585666
DOI: 10.1145/1555816

Copyright © 2009 ACM

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States
