
2021 | Book

Artificial Intelligence for Human Computer Interaction: A Modern Approach

About this Book

This edited book explores the many interesting questions that lie at the intersection between AI and HCI. It covers a comprehensive set of perspectives, methods and projects that present the challenges and opportunities that modern AI methods bring to HCI researchers and practitioners. The chapters take a clear departure from traditional HCI methods and leverage data-driven and deep learning methods to tackle HCI problems that were previously challenging or impossible to address.

It starts by addressing classic HCI topics, including human behavior modeling and input, and then dedicates a section to data and tools, two technical pillars of modern AI methods. These chapters exemplify how state-of-the-art deep learning methods infuse new directions and allow researchers to tackle long-standing and newly emerging HCI problems alike. Artificial Intelligence for Human Computer Interaction: A Modern Approach concludes with a section on Specific Domains, which covers a set of emerging HCI areas where modern AI methods are starting to show real impact, such as personalized medicine, design, and UI automation.

Table of Contents

Frontmatter

Modeling

Frontmatter
Human Performance Modeling with Deep Learning
Abstract
Predicting human performance in interaction tasks allows designers or developers to understand the expected performance of a target interface without actually testing it with real users. In this chapter, we discuss how deep learning methods can aid human performance prediction in the context of HCI through three case studies. In the first case study, we discuss deep models for goal-driven human visual search on arbitrary web pages. In the second, we show that deep learning models can successfully capture human learning effects from repetitive interaction with vertical menus. In the third, we describe how deep models can be combined with analytical understanding to capture high-level interaction strategies and low-level behaviors in touchscreen grid interfaces on mobile devices. Across these studies, we show that deep learning provides great capacity for modeling complex interaction behaviors that would be extremely difficult for traditional heuristic-based models. Furthermore, we showcase different ways to analyze a learned deep model to obtain better interpretability and a deeper understanding of human behaviors, advancing the science.
Arianna Yuan, Ken Pfeuffer, Yang Li
Optimal Control to Support High-Level User Goals in Human-Computer Interaction
Abstract
With emerging technologies like robots, mixed-reality systems, or mobile devices, machine-provided capabilities are increasing, and so is the complexity of their control and display mechanisms. To address this tension, we propose optimal control as a framework to support users in achieving their high-level goals in human-computer tasks. We reason that it will improve user support over common approaches for adaptive interfaces, as its formalism implicitly captures the iterative nature of human-computer interaction. We conduct two case studies to test this hypothesis. First, we propose a model-predictive-control-based optimization scheme that supports end-users in planning and executing robotic aerial videos. Second, we introduce a reinforcement-learning-based method to adapt mixed-reality augmentations based on users’ preferences or tasks learned from their gaze interactions with a UI. Our results show that optimal control can better support users’ high-level goals in human-computer tasks than common approaches. Optimal control models human-computer interaction as a sequential decision problem, which reflects its iterative nature and hence yields better predictability of user behavior than other methods. In addition, our work highlights that optimization- and learning-based optimal control have complementary strengths with respect to interface adaptation.
Christoph Gebhardt, Otmar Hilliges
Modeling Mobile Interface Tappability Using Crowdsourcing and Deep Learning
Abstract
Tapping is an immensely important gesture in mobile touchscreen interfaces, yet people still frequently have to learn which elements are tappable through trial and error. Predicting human behavior for this everyday gesture can help mobile app designers understand an important aspect of the usability of their apps without having to run a user study. In this chapter, we present an approach for modeling the tappability of mobile interfaces at scale. We conducted a large-scale data collection of interface tappability over a rich set of mobile apps using crowdsourcing and computationally investigated a variety of signifiers that people use to distinguish tappable from not-tappable elements. Based on the dataset, we developed and trained a deep neural network that predicts how likely a user is to perceive an interface element as tappable. To demonstrate the capability of the trained tappability model, we developed TapShoe, a tool that automatically diagnoses mismatches between the tappability of each element as perceived by a human user (predicted by our model) and the intended or actual tappable state of the element specified by the developer or designer. Our model achieved reasonable accuracy, with a mean precision of 90.2% and recall of 87.0% in matching human perception of tappable UI elements. The tappability model and TapShoe were well received in an informal evaluation with 7 professional interaction designers.
Amanda Swearngin, Yang Li

Input

Frontmatter
Eye Gaze Estimation and Its Applications
Abstract
Human eye gaze is an important non-verbal cue that can unobtrusively provide information about a user's intention and attention to enable intelligent interactive systems. Eye gaze can also serve as input to systems, replacing the conventional mouse and keyboard, and can be indicative of the user's cognitive state. However, estimating and applying gaze in real-world applications poses significant challenges. In this chapter, we first review the development of gaze estimation methods in recent years. We especially focus on learning-based gaze estimation methods, which benefit from the large-scale data and deep learning methods that have recently become available. Second, we discuss the challenges of using gaze estimation in real-world applications and our efforts toward making these methods easily usable for the Human-Computer Interaction community. Finally, we provide two application examples demonstrating the use of eye gaze to enable attentive and adaptive interfaces.
Xucong Zhang, Seonwook Park, Anna Maria Feit
AI-Driven Intelligent Text Correction Techniques for Mobile Text Entry
Abstract
Current text correction processes on mobile touch devices are laborious: users either extensively use backspace, or navigate the cursor to the error position, make a correction, and navigate back, usually by employing multiple taps or drags over small targets. In this chapter, we present two techniques, Type, Then Correct and JustCorrect, that utilize the power of artificial intelligence to improve the text correction experience on mobile devices. Both techniques skip the error-deletion and cursor-positioning procedures, and instead allow the user to type the correction first and then apply that correction to a previously committed error. We evaluated these techniques, and the results show that correction with the new techniques was faster than the de facto cursor- and backspace-based correction.
Mingrui Ray Zhang, He Wen, Wenzhe Cui, Suwen Zhu, H. Andrew Schwartz, Xiaojun Bi, Jacob O. Wobbrock
Deep Touch: Sensing Press Gestures from Touch Image Sequences
Abstract
Capacitive touch sensors capture a sequence of images of a finger’s interaction with a surface that contain information about its contact shape, posture, and biomechanical structure. These images are typically reduced to two-dimensional points, with the remaining data discarded—restricting the expressivity that can be captured to discriminate a user’s touch intent. We develop a deep touch hypothesis that (1) the human finger performs richer expressions on a touch surface than simple pointing; (2) such expressions are manifested in touch sensor image sequences due to finger-surface biomechanics; and (3) modern neural networks are capable of discriminating touch gestures using these sequences. In particular, a press gesture based on an increase in a finger’s force can be sensed without additional hardware, and reliably discriminated from other common expressions. This work demonstrates that combining capacitive touch sensing with modern neural network algorithms is a practical direction to improve the usability and expressivity of touch-based user interfaces.
Philip Quinn, Wenxin Feng, Shumin Zhai
Deep Learning-Based Hand Posture Recognition for Pen Interaction Enhancement
Abstract
This chapter examines how digital pen interaction can be expanded by detecting different hand postures formed primarily by the hand while it grips the pen. Three systems using different types of sensors are considered: an EMG armband, the raw capacitive image of the touchscreen, and a pen-top fisheye camera. In each case, deep neural networks are used to perform classification or regression to detect hand postures and gestures. Additional analyses are provided to demonstrate the benefit of deep learning over conventional machine-learning methods, as well as explore the impact on model accuracy resulting from the number of postures to be recognised, user-dependent versus user-independent models, and the amount of training data. Examples of posture-based pen interaction in applications are discussed and a number of usability aspects resulting from user evaluations are identified. The chapter concludes with perspectives on the recognition and design of posture-based pen interaction for future systems.
Fabrice Matulic, Daniel Vogel

Data and Tools

Frontmatter
An Early Rico Retrospective: Three Years of Uses for a Mobile App Dataset
Abstract
The Rico dataset, containing design data from more than 9.7k Android apps spanning 27 categories, was released in 2017. It exposes visual, textual, structural, and interactive design properties of more than 72k unique UI screens. In the years since its release, the original paper has been cited nearly 100 times according to Google Scholar, and the dataset has been used as the basis for numerous research projects. In this chapter, we describe the creation of Rico using a system that combined crowdsourcing and automation to scalably mine design and interaction data from Android apps at runtime. We then describe two projects that we conducted using the dataset: the training of an autoencoder to identify similarity between UI designs, and an exploration of the use of Google’s Material Design within the dataset using machine-learned models. We conclude with an overview of other work that has used Rico to understand our mobile UI world and build data-driven models that assist users, designers, and developers.
Biplab Deka, Bardia Doosti, Forrest Huang, Chad Franzen, Joshua Hibschman, Daniel Afergan, Yang Li, Ranjitha Kumar, Tao Dong, Jeffrey Nichols
Visual Intelligence through Human Interaction
Abstract
Over the last decade, Computer Vision, the branch of Artificial Intelligence aimed at understanding the visual world, has evolved from simply recognizing objects in images to describing pictures, answering questions about images, helping robots maneuver around physical spaces, and even generating novel visual content. As these tasks and applications have modernized, so too has the reliance on more data, either for model training or for evaluation. In this chapter, we demonstrate that novel interaction strategies can enable new forms of data collection and evaluation for Computer Vision. First, we present a crowdsourcing interface that speeds up paid data collection by an order of magnitude, feeding the data-hungry nature of modern vision models. Second, we explore a method to increase volunteer contributions using automated social interventions. Third, we develop a system to ensure that human evaluations of generative vision models are reliable, affordable, and grounded in psychophysics theory. We conclude with future opportunities for Human-Computer Interaction to aid Computer Vision.
Ranjay Krishna, Mitchell Gordon, Li Fei-Fei, Michael Bernstein
ML Tools for the Web: A Way for Rapid Prototyping and HCI Research
Abstract
Machine learning (ML) has become a powerful tool with the potential to enable new interactions and user experiences. Although the use of ML in HCI research is growing, the process of prototyping and deploying ML remains challenging. We argue that ML tools designed for the Web are well suited to fast prototyping and HCI research. In this chapter, we review the literature, current technologies, and use cases of ML tools for the Web. We also provide a case study, using TensorFlow.js, a major Web ML library, to demonstrate how to prototype with Web ML tools in different prototyping scenarios. Finally, we discuss challenges and future directions of designing tools for fast prototyping and research.
Na Li, Jason Mayes, Ping Yu
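
To make the in-browser prototyping idea concrete, here is a minimal sketch using TensorFlow.js with the pre-trained MobileNet model from the @tensorflow-models/mobilenet package. It is not code from the chapter; the image element id and page setup are illustrative assumptions.

```typescript
// Minimal in-browser prototype: classify an image with a pre-trained model.
// Assumes the page bundles @tensorflow/tfjs and @tensorflow-models/mobilenet
// and contains <img id="photo" src="...">; these details are illustrative.
import '@tensorflow/tfjs';
import * as mobilenet from '@tensorflow-models/mobilenet';

async function classifyPhoto(): Promise<void> {
  const img = document.getElementById('photo') as HTMLImageElement;
  const model = await mobilenet.load();          // downloads weights once, then cached
  const predictions = await model.classify(img); // top classes with probabilities
  for (const p of predictions) {
    console.log(`${p.className}: ${p.probability.toFixed(3)}`);
  }
}

classifyPhoto();
```

Because everything runs client-side, a prototype like this can be shared as a URL and iterated on without server-side infrastructure, which is the rapid-prototyping advantage the chapter discusses.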
Interactive Reinforcement Learning for Autonomous Behavior Design
Abstract
Reinforcement Learning (RL) is a machine learning approach based on how humans and animals learn new behaviors by actively exploring an environment that provides them with positive and negative rewards. The interactive RL approach incorporates a human in the loop who can guide a learning RL-based agent to personalize its behavior and/or speed up its learning process. To enable HCI researchers to make advances in this area, we introduce an interactive RL framework that outlines HCI challenges in the domain. By following this taxonomy, HCI researchers can (1) design new interaction techniques and (2) propose new applications. To support (1), we describe how different types of human feedback can adapt an RL model to perform as users intend. To support (2), we propose generic design principles for creating effective RL applications. Finally, we list the current open challenges in interactive RL and what we consider the most promising research directions in this area.
Christian Arzate Cruz, Takeo Igarashi
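
As a rough illustration of the human-in-the-loop idea (a sketch, not code from the chapter), one common way to incorporate human feedback is to add an evaluative signal to the environment reward before a standard Q-learning update. The environment interface, feedback scale, and hyperparameters below are all assumptions for illustration.

```typescript
// Sketch: tabular Q-learning where a human's evaluative feedback (-1, 0, +1)
// is added to the environment reward before the update (reward shaping).
// Env, getHumanFeedback, and the hyperparameters are illustrative assumptions.
interface Env {
  reset(): number;                                  // returns the initial state id
  step(state: number, action: number): { next: number; reward: number; done: boolean };
  numStates: number;
  numActions: number;
}

function trainInteractive(env: Env, getHumanFeedback: (s: number, a: number) => number,
                          episodes = 200, alpha = 0.1, gamma = 0.95, epsilon = 0.1): number[][] {
  // Q-table initialized to zero.
  const q: number[][] = Array.from({ length: env.numStates },
    () => new Array(env.numActions).fill(0));

  for (let ep = 0; ep < episodes; ep++) {
    let s = env.reset();
    let done = false;
    while (!done) {
      // Epsilon-greedy action selection.
      const a = Math.random() < epsilon
        ? Math.floor(Math.random() * env.numActions)
        : q[s].indexOf(Math.max(...q[s]));
      const { next, reward, done: d } = env.step(s, a);
      // Human feedback is treated as an additional shaping reward.
      const shaped = reward + getHumanFeedback(s, a);
      // Standard Q-learning target and update.
      const target = d ? shaped : shaped + gamma * Math.max(...q[next]);
      q[s][a] += alpha * (target - q[s][a]);
      s = next;
      done = d;
    }
  }
  return q;
}
```

Other common feedback channels (e.g., demonstrations or action advice) would plug into the same loop at different points, for instance by overriding action selection rather than shaping the reward.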

Specific Domains

Frontmatter
Sketch-Based Creativity Support Tools Using Deep Learning
Abstract
Sketching is a natural and effective visual communication medium commonly used in creative processes. Recent developments in deep-learning models have drastically improved machines’ ability to understand and generate visual content. An exciting line of work explores deep-learning approaches to modeling human sketches, opening opportunities for creative applications. This chapter describes three fundamental steps in developing deep-learning-driven creativity support tools that consume and generate sketches: (1) a data collection effort that produced a new paired dataset of sketches and mobile user interfaces; (2) a sketch-based user interface retrieval system adapted from state-of-the-art computer vision techniques; and (3) a conversational sketching system that supports a novel natural-language-based sketch and critique authoring process. We survey relevant prior work in both the deep-learning and human-computer interaction communities, document the data collection process and the systems’ architectures in detail, present qualitative and quantitative results, and outline several future research directions in this exciting area.
Forrest Huang, Eldon Schoop, David Ha, Jeffrey Nichols, John Canny
Generative Ink: Data-Driven Computational Models for Digital Ink
Abstract
Digital ink promises to combine the flexibility of pen-and-paper interaction with the versatility of digital devices. Computational models of digital ink often focus on recognizing content via discriminative techniques such as classification, albeit at the cost of ignoring or losing personalized style. In this chapter, we propose augmenting the digital ink framework with generative modeling to achieve a holistic understanding of the ink content. Our focus lies in developing novel generative models that provide fine-grained control while preserving user style. To this end, we model the inking process and learn to create ink samples similar to a user's own. We first present how digital handwriting can be disentangled into style and content to implement editable digital ink, enabling content synthesis and editing. Second, we address the more complex setup of free-form sketching and propose a novel approach for modeling stroke-based data efficiently. Generative ink promises novel functionalities, leading to compelling applications that enhance the inking experience for users in an interactive and collaborative manner.
Emre Aksan, Otmar Hilliges
Bridging Natural Language and Graphical User Interfaces
Abstract
“Language as symbolic action” (https://en.wikipedia.org/wiki/Kenneth_Burke) has a natural connection with direct-manipulation interaction (e.g., via GUIs or physical appliances) that is common for modern computers such as smartphones. In this chapter, we present our efforts to bridge the gap between natural language and graphical user interfaces, which can potentially enable a broad category of interaction scenarios. Specifically, we develop datasets and deep learning models that can ground natural language instructions or commands into executable actions on GUIs, and, in the other direction, generate natural language descriptions of user interfaces so that a user knows how to control them using language. These projects represent research efforts at the intersection of Natural Language Processing (NLP) and HCI, and produce datasets and open-source code that lay a foundation for future research in the area.
Yang Li, Xin Zhou, Gang Li
Demonstration + Natural Language: Multimodal Interfaces for GUI-Based Interactive Task Learning Agents
Abstract
We summarize our past five years of work on designing, building, and studying Sugilite, an interactive task learning agent that can learn new tasks and relevant concepts interactively from the user’s natural language instructions and demonstrations, leveraging the graphical user interfaces (GUIs) of third-party mobile apps. Through its multi-modal and mixed-initiative approaches to human-AI interaction, Sugilite has made important contributions to improving the usability, applicability, generalizability, flexibility, robustness, and shareability of interactive task learning agents. Sugilite also represents a new human-AI interaction paradigm for interactive task learning, in which existing app GUIs serve as a medium for users to communicate their intents to an AI agent rather than merely as interfaces for interacting with the underlying computing services. In this chapter, we describe the Sugilite system, explain the design and implementation of its key features, and show a prototype in the form of a conversational assistant on Android.
Toby Jia-Jun Li, Tom M. Mitchell, Brad A. Myers
Human-Centered AI for Medical Imaging
Abstract
Medical imaging is the primary data source most physicians refer to when making a diagnosis. However, examining medical imaging data, due to its density and uncertainty, can be time-consuming and error-prone. The recent advent of data-driven artificial intelligence (AI) provides a promising solution, yet the adoption of AI in medicine is often hindered by its ‘black box’ nature. This chapter reviews how AI can distill new insights from medical imaging data and how a human-centered approach can transform AI’s role into one that engages patients with self-assessment and personalized models, and one that allows physicians to comprehend and control how AI performs a diagnosis so that they can collaborate with AI in making it.
Yuan Liang, Lei He, Xiang ‘Anthony’ Chen
3D Spatial Sound Individualization with Perceptual Feedback
Abstract
Designing an interactive system tailored appropriately to each user’s physical and cognitive characteristics is important for providing an optimal user experience. In this chapter, we discuss how such problems can be addressed by leveraging modern interactive machine learning techniques. As a case study, we introduce a method to individualize 3D spatial sound rendering with perceptual feedback. 3D spatial sound rendering has traditionally required time-consuming measurements of each individual user with expensive equipment. By taking a data-driven approach, one can replace such expensive measurements with a simple calibration. We first describe how to train a generic deep learning model with an existing measured dataset. We then describe how to adapt the model to a specific user with a simple calibration process consisting of pairwise comparisons. Through this case study, readers will gain insight into how to adapt an interactive system to a specific user’s characteristics by taking advantage of the high expressiveness of modern machine learning techniques.
Kazuhiko Yamamoto, Takeo Igarashi
Metadata
Title
Artificial Intelligence for Human Computer Interaction: A Modern Approach
Edited by
Yang Li
Otmar Hilliges
Copyright Year
2021
Electronic ISBN
978-3-030-82681-9
Print ISBN
978-3-030-82680-2
DOI
https://doi.org/10.1007/978-3-030-82681-9
