
2021 | Book

Artificial Intelligence for Human Computer Interaction: A Modern Approach

About this Book

This edited book explores the many interesting questions that lie at the intersection between AI and HCI. It covers a comprehensive set of perspectives, methods and projects that present the challenges and opportunities that modern AI methods bring to HCI researchers and practitioners. The chapters take a clear departure from traditional HCI methods and leverage data-driven and deep learning methods to tackle HCI problems that were previously challenging or impossible to address.

It starts by addressing classic HCI topics, including human behavior modeling and input, and then dedicates a section to data and tools, two technical pillars of modern AI methods. These chapters exemplify how state-of-the-art deep learning methods infuse new directions and allow researchers to tackle long-standing and newly emerging HCI problems alike. Artificial Intelligence for Human Computer Interaction: A Modern Approach concludes with a section on Specific Domains, which covers a set of emerging HCI areas where modern AI methods are starting to show real impact, such as personalized medicine, design, and UI automation.

Table of Contents

Frontmatter

Modeling

Frontmatter
Human Performance Modeling with Deep Learning
Abstract
Predicting human performance in interaction tasks allows designers or developers to understand the expected performance of a target interface without actually testing it with real users. In this chapter, we discuss how deep learning methods can aid human performance prediction in the context of HCI through three case studies. In the first case study, we discuss deep models for goal-driven human visual search on arbitrary web pages. In the second, we show that deep learning models can successfully capture human learning effects from repetitive interaction with vertical menus. In the third, we describe how deep models can be combined with analytical understanding to capture high-level interaction strategies and low-level behaviors in touchscreen grid interfaces on mobile devices. Across these studies, we show that deep learning provides great capacity for modeling complex interaction behaviors that would be extremely difficult for traditional heuristic-based models. Furthermore, we showcase different ways to analyze a learned deep model to obtain better interpretability and a deeper understanding of human behaviors, advancing the science.
Arianna Yuan, Ken Pfeuffer, Yang Li
Optimal Control to Support High-Level User Goals in Human-Computer Interaction
Abstract
With emerging technologies like robots, mixed-reality systems, or mobile devices, machine-provided capabilities are increasing, and so is the complexity of their control and display mechanisms. To address this tension, we propose optimal control as a framework to support users in achieving their high-level goals in human-computer tasks. We reason that it will improve user support over common approaches for adaptive interfaces, as its formalism implicitly captures the iterative nature of human-computer interaction. We conduct two case studies to test this hypothesis. First, we propose a model-predictive-control-based optimization scheme that supports end-users in planning and executing robotic aerial videos. Second, we introduce a reinforcement-learning-based method to adapt mixed-reality augmentations based on users’ preferences or tasks learned from their gaze interactions with a UI. Our results show that optimal control can better support users’ high-level goals in human-computer tasks than common approaches. Optimal control models human-computer interaction as a sequential decision problem, which reflects its iterative nature and hence yields better predictability of user behavior than other methods. In addition, our work highlights that optimization- and learning-based optimal control have complementary strengths with respect to interface adaptation.
Christoph Gebhardt, Otmar Hilliges
Modeling Mobile Interface Tappability Using Crowdsourcing and Deep Learning
Abstract
Tapping is an immensely important gesture in mobile touchscreen interfaces, yet people still frequently have to learn which elements are tappable through trial and error. Predicting human behavior for this everyday gesture can help mobile app designers understand an important aspect of the usability of their apps without having to run a user study. In this chapter, we present an approach for modeling the tappability of mobile interfaces at scale. We conducted a large-scale data collection of interface tappability over a rich set of mobile apps using crowdsourcing and computationally investigated a variety of signifiers that people use to distinguish tappable from not-tappable elements. Based on the dataset, we developed and trained a deep neural network that predicts how likely a user is to perceive an interface element as tappable. To demonstrate the capability of the trained tappability model, we developed TapShoe, a tool that automatically diagnoses mismatches between the tappability of each element as perceived by a human user (predicted by our model) and the intended or actual tappable state of the element specified by the developer or designer. Our model achieved reasonable accuracy, with a mean precision of 90.2% and recall of 87.0% in matching human perception of tappable UI elements. The tappability model and TapShoe were well received in an informal evaluation with 7 professional interaction designers.
Amanda Swearngin, Yang Li

Input

Frontmatter
Eye Gaze Estimation and Its Applications
Abstract
Human eye gaze is an important non-verbal cue that can unobtrusively provide information about a user's intention and attention to enable intelligent interactive systems. Eye gaze can also serve as input to systems, replacing the conventional mouse and keyboard, and can be indicative of the user's cognitive state. However, estimating and applying gaze in real-world applications poses significant challenges. In this chapter, we first review the development of gaze estimation methods in recent years. We especially focus on learning-based gaze estimation methods, which benefit from the large-scale data and deep learning methods that have recently become available. Second, we discuss the challenges of using gaze estimation in real-world applications and our efforts toward making these methods easily usable for the Human-Computer Interaction community. Finally, we provide two application examples demonstrating the use of eye gaze to enable attentive and adaptive interfaces.
Xucong Zhang, Seonwook Park, Anna Maria Feit
AI-Driven Intelligent Text Correction Techniques for Mobile Text Entry
Abstract
Current text correction processes on mobile touch devices are laborious: users either extensively use backspace, or navigate the cursor to the error position, make a correction, and navigate back, usually by employing multiple taps or drags over small targets. In this chapter, we present two techniques, Type, Then Correct and JustCorrect, that utilize the power of artificial intelligence to improve the text correction experience on mobile devices. Both techniques skip the error-deletion and cursor-positioning procedures, and instead allow the user to type the correction first and then apply that correction to a previously committed error. We evaluated these techniques, and the results show that correction with the new techniques was faster than the de facto cursor- and backspace-based correction.
Mingrui Ray Zhang, He Wen, Wenzhe Cui, Suwen Zhu, H. Andrew Schwartz, Xiaojun Bi, Jacob O. Wobbrock
Deep Touch: Sensing Press Gestures from Touch Image Sequences
Abstract
Capacitive touch sensors capture a sequence of images of a finger’s interaction with a surface that contain information about its contact shape, posture, and biomechanical structure. These images are typically reduced to two-dimensional points, with the remaining data discarded—restricting the expressivity that can be captured to discriminate a user’s touch intent. We develop a deep touch hypothesis that (1) the human finger performs richer expressions on a touch surface than simple pointing; (2) such expressions are manifested in touch sensor image sequences due to finger-surface biomechanics; and (3) modern neural networks are capable of discriminating touch gestures using these sequences. In particular, a press gesture based on an increase in a finger’s force can be sensed without additional hardware, and reliably discriminated from other common expressions. This work demonstrates that combining capacitive touch sensing with modern neural network algorithms is a practical direction to improve the usability and expressivity of touch-based user interfaces.
Philip Quinn, Wenxin Feng, Shumin Zhai
Deep Learning-Based Hand Posture Recognition for Pen Interaction Enhancement
Abstract
This chapter examines how digital pen interaction can be expanded by detecting different hand postures formed primarily by the hand while it grips the pen. Three systems using different types of sensors are considered: an EMG armband, the raw capacitive image of the touchscreen, and a pen-top fisheye camera. In each case, deep neural networks are used to perform classification or regression to detect hand postures and gestures. Additional analyses are provided to demonstrate the benefit of deep learning over conventional machine-learning methods, as well as explore the impact on model accuracy resulting from the number of postures to be recognised, user-dependent versus user-independent models, and the amount of training data. Examples of posture-based pen interaction in applications are discussed and a number of usability aspects resulting from user evaluations are identified. The chapter concludes with perspectives on the recognition and design of posture-based pen interaction for future systems.
Fabrice Matulic, Daniel Vogel

Data and Tools

Frontmatter
An Early Rico Retrospective: Three Years of Uses for a Mobile App Dataset
Abstract
The Rico dataset, containing design data from more than 9.7k Android apps spanning 27 categories, was released in 2017. It exposes visual, textual, structural, and interactive design properties of more than 72k unique UI screens. In the years since its release, the original paper has been cited nearly 100 times according to Google Scholar, and the dataset has been used as the basis for numerous research projects. In this chapter, we describe the creation of Rico using a system that combined crowdsourcing and automation to scalably mine design and interaction data from Android apps at runtime. We then describe two projects that we conducted using the dataset: the training of an autoencoder to identify similarity between UI designs, and an exploration of the use of Google’s Material Design within the dataset using machine-learned models. We conclude with an overview of other work that has used Rico to understand our mobile UI world and build data-driven models that assist users, designers, and developers.
Biplab Deka, Bardia Doosti, Forrest Huang, Chad Franzen, Joshua Hibschman, Daniel Afergan, Yang Li, Ranjitha Kumar, Tao Dong, Jeffrey Nichols
Visual Intelligence through Human Interaction
Abstract
Over the last decade, Computer Vision, the branch of Artificial Intelligence aimed at understanding the visual world, has evolved from simply recognizing objects in images to describing pictures, answering questions about images, helping robots maneuver around physical spaces, and even generating novel visual content. As these tasks and applications have modernized, so too has the reliance on more data, either for model training or for evaluation. In this chapter, we demonstrate that novel interaction strategies can enable new forms of data collection and evaluation for Computer Vision. First, we present a crowdsourcing interface that speeds up paid data collection by an order of magnitude, feeding the data-hungry nature of modern vision models. Second, we explore a method to increase volunteer contributions using automated social interventions. Third, we develop a system to ensure that human evaluations of generative vision models are reliable, affordable, and grounded in psychophysics theory. We conclude with future opportunities for Human-Computer Interaction to aid Computer Vision.
Ranjay Krishna, Mitchell Gordon, Li Fei-Fei, Michael Bernstein
ML Tools for the Web: A Way for Rapid Prototyping and HCI Research
Abstract
Machine learning (ML) has become a powerful tool with the potential to enable new interactions and user experiences. Although the use of ML in HCI research is growing, the process of prototyping and deploying ML remains challenging. We argue that ML tools designed for the Web are well suited to fast prototyping and HCI research. In this chapter, we review the literature, current technologies, and use cases of ML tools for the Web. We also provide a case study, using TensorFlow.js, a major Web ML library, to demonstrate how to prototype with Web ML tools in different prototyping scenarios. Finally, we discuss challenges and future directions of designing tools for fast prototyping and research.
Na Li, Jason Mayes, Ping Yu
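
To make the in-browser prototyping idea concrete, here is a minimal sketch using TensorFlow.js with the pre-trained MobileNet model from the @tensorflow-models/mobilenet package. It is not code from the chapter; the image element id and page setup are illustrative assumptions.

```typescript
// Minimal in-browser prototype: classify an image with a pre-trained model.
// Assumes the page bundles @tensorflow/tfjs and @tensorflow-models/mobilenet
// and contains <img id="photo" src="...">; these details are illustrative.
import '@tensorflow/tfjs';
import * as mobilenet from '@tensorflow-models/mobilenet';

async function classifyPhoto(): Promise<void> {
  const img = document.getElementById('photo') as HTMLImageElement;
  const model = await mobilenet.load();          // downloads weights once, then cached
  const predictions = await model.classify(img); // top classes with probabilities
  for (const p of predictions) {
    console.log(`${p.className}: ${p.probability.toFixed(3)}`);
  }
}

classifyPhoto();
```

Because everything runs client-side, a prototype like this can be shared as a URL and iterated on without server-side infrastructure, which is the rapid-prototyping advantage the chapter discusses.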
Interactive Reinforcement Learning for Autonomous Behavior Design
Abstract
Reinforcement Learning (RL) is a machine learning approach based on how humans and animals learn new behaviors by actively exploring an environment that provides them with positive and negative rewards. The interactive RL approach incorporates a human in the loop who can guide a learning RL-based agent to personalize its behavior and/or speed up its learning process. To enable HCI researchers to make advances in this area, we introduce an interactive RL framework that outlines HCI challenges in the domain. By following this taxonomy, HCI researchers can (1) design new interaction techniques and (2) propose new applications. To support (1), we describe how different types of human feedback can adapt an RL model to perform as users intend. To support (2), we propose generic design principles for creating effective RL applications. Finally, we list the current open challenges in interactive RL and what we consider the most promising research directions in this area.
Christian Arzate Cruz, Takeo Igarashi
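
As a rough illustration of the human-in-the-loop idea (a sketch, not code from the chapter), one common way to incorporate human feedback is to add an evaluative signal to the environment reward before a standard Q-learning update. The environment interface, feedback scale, and hyperparameters below are all assumptions for illustration.

```typescript
// Sketch: tabular Q-learning where a human's evaluative feedback (-1, 0, +1)
// is added to the environment reward before the update (reward shaping).
// Env, getHumanFeedback, and the hyperparameters are illustrative assumptions.
interface Env {
  reset(): number;                                  // returns the initial state id
  step(state: number, action: number): { next: number; reward: number; done: boolean };
  numStates: number;
  numActions: number;
}

function trainInteractive(env: Env, getHumanFeedback: (s: number, a: number) => number,
                          episodes = 200, alpha = 0.1, gamma = 0.95, epsilon = 0.1): number[][] {
  // Q-table initialized to zero.
  const q: number[][] = Array.from({ length: env.numStates },
    () => new Array(env.numActions).fill(0));

  for (let ep = 0; ep < episodes; ep++) {
    let s = env.reset();
    let done = false;
    while (!done) {
      // Epsilon-greedy action selection.
      const a = Math.random() < epsilon
        ? Math.floor(Math.random() * env.numActions)
        : q[s].indexOf(Math.max(...q[s]));
      const { next, reward, done: d } = env.step(s, a);
      // Human feedback is treated as an additional shaping reward.
      const shaped = reward + getHumanFeedback(s, a);
      // Standard Q-learning target and update.
      const target = d ? shaped : shaped + gamma * Math.max(...q[next]);
      q[s][a] += alpha * (target - q[s][a]);
      s = next;
      done = d;
    }
  }
  return q;
}
```

Other common feedback channels (e.g., demonstrations or action advice) would plug into the same loop at different points, for instance by overriding action selection rather than shaping the reward.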

Specific Domains

Frontmatter
Sketch-Based Creativity Support Tools Using Deep Learning
Abstract
Sketching is a natural and effective visual communication medium commonly used in creative processes. Recent developments in deep-learning models have drastically improved machines’ ability to understand and generate visual content. An exciting line of work explores deep-learning approaches to modeling human sketches, opening opportunities for creative applications. This chapter describes three fundamental steps in developing deep-learning-driven creativity support tools that consume and generate sketches: (1) a data collection effort that produced a new paired dataset of sketches and mobile user interfaces; (2) a sketch-based user interface retrieval system adapted from state-of-the-art computer vision techniques; and (3) a conversational sketching system that supports a novel natural-language-based sketch and critique authoring process. We survey relevant prior work in both the deep-learning and human-computer interaction communities, document the data collection process and the systems’ architectures in detail, present qualitative and quantitative results, and outline several future research directions in this exciting area.
Forrest Huang, Eldon Schoop, David Ha, Jeffrey Nichols, John Canny
Generative Ink: Data-Driven Computational Models for Digital Ink
Abstract
Digital ink promises to combine the flexibility of pen-and-paper interaction with the versatility of digital devices. Computational models of digital ink often focus on recognizing content via discriminative techniques such as classification, albeit at the cost of ignoring or losing personalized style. In this chapter, we propose augmenting the digital ink framework with generative modeling to achieve a holistic understanding of the ink content. Our focus lies in developing novel generative models that provide fine-grained control while preserving user style. To this end, we model the inking process and learn to create ink samples similar to a user's own. We first present how digital handwriting can be disentangled into style and content to implement editable digital ink, enabling content synthesis and editing. Second, we address the more complex setup of free-form sketching and propose a novel approach for modeling stroke-based data efficiently. Generative ink promises novel functionalities, leading to compelling applications that enhance the inking experience for users in an interactive and collaborative manner.
Emre Aksan, Otmar Hilliges
Bridging Natural Language and Graphical User Interfaces
Abstract
“Language as symbolic action” (https://en.wikipedia.org/wiki/Kenneth_Burke) has a natural connection with direct-manipulation interaction (e.g., via GUIs or physical appliances) that is common for modern computers such as smartphones. In this chapter, we present our efforts to bridge the gap between natural language and graphical user interfaces, which can potentially enable a broad category of interaction scenarios. Specifically, we develop datasets and deep learning models that can ground natural language instructions or commands into executable actions on GUIs, and, in the other direction, generate natural language descriptions of user interfaces so that a user knows how to control them using language. These projects represent research efforts at the intersection of Natural Language Processing (NLP) and HCI, and produce datasets and open-source code that lay a foundation for future research in the area.
Yang Li, Xin Zhou, Gang Li
Demonstration + Natural Language: Multimodal Interfaces for GUI-Based Interactive Task Learning Agents
Abstract
We summarize our past five years of work on designing, building, and studying Sugilite, an interactive task learning agent that can learn new tasks and relevant concepts interactively from the user’s natural language instructions and demonstrations, leveraging the graphical user interfaces (GUIs) of third-party mobile apps. Through its multi-modal and mixed-initiative approaches to human-AI interaction, Sugilite has made important contributions to improving the usability, applicability, generalizability, flexibility, robustness, and shareability of interactive task learning agents. Sugilite also represents a new human-AI interaction paradigm for interactive task learning, in which existing app GUIs serve as a medium for users to communicate their intents to an AI agent rather than merely as interfaces for interacting with the underlying computing services. In this chapter, we describe the Sugilite system, explain the design and implementation of its key features, and show a prototype in the form of a conversational assistant on Android.
Toby Jia-Jun Li, Tom M. Mitchell, Brad A. Myers
Human-Centered AI for Medical Imaging
Abstract
Medical imaging is the primary data source most physicians refer to when making a diagnosis. However, examining medical imaging data, due to its density and uncertainty, can be time-consuming and error-prone. The recent advent of data-driven artificial intelligence (AI) provides a promising solution, yet the adoption of AI in medicine is often hindered by its ‘black box’ nature. This chapter reviews how AI can distill new insights from medical imaging data and how a human-centered approach can transform AI’s role into one that engages patients with self-assessment and personalized models, and one that allows physicians to comprehend and control how AI performs a diagnosis so that they can collaborate with AI in making it.
Yuan Liang, Lei He, Xiang ‘Anthony’ Chen
3D Spatial Sound Individualization with Perceptual Feedback
Abstract
Designing an interactive system tailored appropriately to each user’s physical and cognitive characteristics is important for providing an optimal user experience. In this chapter, we discuss how such problems can be addressed by leveraging modern interactive machine learning techniques. As a case study, we introduce a method to individualize 3D spatial sound rendering with perceptual feedback. 3D spatial sound rendering has traditionally required time-consuming measurements of each individual user with expensive equipment. By taking a data-driven approach, one can replace such expensive measurements with a simple calibration. We first describe how to train a generic deep learning model with an existing measured dataset. We then describe how to adapt the model to a specific user with a simple calibration process consisting of pairwise comparisons. Through this case study, readers will gain insight into how to adapt an interactive system to a specific user’s characteristics by taking advantage of the high expressiveness of modern machine learning techniques.
Kazuhiko Yamamoto, Takeo Igarashi
Metadata
Title
Artificial Intelligence for Human Computer Interaction: A Modern Approach
Edited by
Yang Li
Otmar Hilliges
Copyright Year
2021
Electronic ISBN
978-3-030-82681-9
Print ISBN
978-3-030-82680-2
DOI
https://doi.org/10.1007/978-3-030-82681-9
