ABSTRACT
Top-end mobile phones include a number of specialized (e.g., accelerometer, compass, GPS) and general-purpose (e.g., microphone, camera) sensors that enable new people-centric sensing applications. Perhaps the most ubiquitous and unexploited sensor on mobile phones is the microphone, a powerful sensor that is capable of making sophisticated inferences about human activity, location, and social events from sound. In this paper, we exploit this untapped sensor not in the context of human communications but as an enabler of new sensing applications. We propose SoundSense, a scalable framework for modeling sound events on mobile phones. SoundSense is implemented on the Apple iPhone and represents the first general-purpose sound sensing system specifically designed to work on resource-limited phones. The architecture and algorithms are designed for scalability, and SoundSense uses a combination of supervised and unsupervised learning techniques both to classify general sound types (e.g., music, voice) and to discover novel sound events specific to individual users. The system runs solely on the mobile phone with no back-end interactions. Through the implementation and evaluation of two proof-of-concept people-centric sensing applications, we demonstrate that SoundSense is capable of recognizing meaningful sound events that occur in users' everyday lives.
Index Terms
- SoundSense: scalable sound sensing for people-centric applications on mobile phones