Abstract
This paper examines sensor fusion techniques for modeling opportunities for proactive speech-based in-car interfaces. We leverage the Is Now a Good Time (INAGT) dataset, which consists of automotive, physiological, and visual data collected from drivers who self-annotated responses to the question "Is now a good time?", indicating the opportunity to receive non-driving information during a 50-minute drive. We augment this original driver-annotated data with third-party annotations of perceived safety in order to explore potential driver overconfidence. We show that fusing automotive, physiological, and visual data allows us to predict driver labels of availability, achieving a 0.874 F1-score by extracting statistically relevant features and training with our proposed deep neural network, PazNet. Using the same data and network, we achieve a 0.891 F1-score for predicting third-party labeled safe moments. We train these models to avoid false positives (determinations that it is a good time to interrupt when it is not), since false positives may cause driver distraction or lead the driver to deactivate the service. Our analyses show that conservative models still leave many moments for interaction and that most inopportune moments are short. This work lays a foundation for using sensor fusion models to predict when proactive speech systems should engage with drivers.
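To make the notion of a conservative model concrete, the following is a minimal sketch of one way to trade recall for fewer false positives: pick the lowest decision threshold whose precision meets a chosen floor. It is an illustration only, not the training procedure used for PazNet; the function names, the `min_precision` parameter, and the toy scores are all assumptions introduced here.

```python
def precision_recall_f1(y_true, y_pred):
    """Return (precision, recall, F1) for binary labels, 1 = 'good time'."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


def conservative_threshold(y_true, scores, min_precision=0.9):
    """Hypothetical helper: lowest score threshold whose precision meets
    min_precision. Recall never increases as the threshold rises, so the
    lowest qualifying threshold keeps the most opportune moments.
    Returns (threshold, precision, recall, f1), or None if no threshold
    satisfies the precision floor."""
    for thr in sorted(set(scores)):
        y_pred = [1 if s >= thr else 0 for s in scores]
        p, r, f1 = precision_recall_f1(y_true, y_pred)
        if p >= min_precision:
            return thr, p, r, f1
    return None


if __name__ == "__main__":
    # Toy labels and model scores, made up for illustration.
    y_true = [1, 0, 1, 1, 0, 1]
    scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.3]
    print(conservative_threshold(y_true, scores, min_precision=0.75))
```

Raising `min_precision` pushes the threshold up, so fewer moments are flagged as good times to interrupt, but a larger share of the flagged moments are genuinely opportune.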
Supplemental Material
Supplemental movie, appendix, image, and software files for Learning When Agents Can Talk to Drivers Using the INAGT Dataset and Multisensor Fusion