Conceptual spatial representations for indoor mobile robots

https://doi.org/10.1016/j.robot.2008.03.007

Abstract

We present an approach for creating conceptual representations of human-made indoor environments using mobile robots. The concepts refer to spatial and functional properties of typical indoor environments. Following different findings in spatial cognition, our model is composed of layers representing maps at different levels of abstraction. The complete system is integrated in a mobile robot endowed with laser and vision sensors for place and object recognition. The system also incorporates a linguistic framework that actively supports the map acquisition process, and which is used for situated dialogue. Finally, we discuss the capabilities of the integrated system.

Introduction

Recently, there has been an increasing interest in service robots, such as domestic or elderly care robots, whose aim is to assist people in human-made environments. In such situations, the robots will no longer be operated by trained personnel but instead have to interact with people from the general public. Thus, an important challenge lies in facilitating the communication between robots and humans.

One of the most intuitive and powerful ways for humans to communicate is spoken language. It is therefore desirable to design robots that can speak with people and understand their words and expressions. If a dialogue between robots and humans is to be successful, the robots must use the same concepts to refer to things and phenomena as a person would. For this, the robot needs to perceive the world similarly to a human.

An important aspect of human-like perception of the world is the robot’s understanding of the spatial and functional properties of human-made environments, while still being able to act safely in them. One of the robot’s first tasks is to learn the environment the way a person does, sharing common concepts such as corridor or living room. These terms are used not only as labels, but as semantic expressions that relate a place to complex objects or objective situations. For example, the term living room usually implies a place with a particular structure, and which includes objects like a couch or a television set. Representing space in a human-like way thus also requires accounting for the way linguistic references to spatial entities are established in situated natural language dialogues. In addition, a spatial knowledge representation for robotic assistants must address the issues involved in safe and reliable navigation control. Only then can robots be deployed in semi-structured environments, such as offices, where they have to interact with humans in everyday situations.
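To make this notion of concept semantics concrete, the following minimal sketch relates room concepts to the objects that typically occur in them; the concept table, function name, and matching rule are illustrative assumptions, not the representation developed in this article.

    # Hypothetical sketch: grounding room concepts in typical objects.
    # The concept table and matching rule are illustrative assumptions.

    TYPICAL_OBJECTS = {
        "living room": {"couch", "television"},
        "kitchen": {"stove", "refrigerator"},
        "office": {"desk", "computer"},
    }

    def matching_concepts(observed_objects):
        """Return room concepts whose typical objects overlap the observations."""
        return [concept for concept, objects in TYPICAL_OBJECTS.items()
                if objects & observed_objects]

    print(matching_concepts({"couch", "plant"}))  # ['living room']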

The specific problem we focus on in this article is how, given innate (possibly human-like) concepts a robot may have of spatial organization, the robot can autonomously build an internal representation of the environment by combining these concepts with different low-level sensory systems. This is done by creating a conceptual representation of the environment, in which the concepts represent spatial and functional properties of typical human-made indoor environments.

In order to meet both of the aforementioned requirements, robust robot control and human-like conceptualization, we propose a spatial representation that contains maps at different levels of abstraction. This stepwise abstraction from raw sensor input not only produces maps that are suitable for reliable robot navigation, but also yields a level of representation that is similar to a human conceptualization of spatial organization. Furthermore, this model provides a richer semantic view of an environment, permitting the robot to perform spatial categorization rather than mere instantiation.

Our approach has been integrated into a system running on a mobile robot. This robot is capable of conceptual spatial mapping in an indoor environment, perceiving the world through typical sensors such as a laser range finder and a camera. Moreover, the robot is endowed with the abilities necessary to conduct a situated dialogue about its environment.

The rest of the paper is organized as follows. In Section 2 we present related work. Section 3 gives an overview of the components of our robotic system. After explaining the individual techniques that are used for evaluating the sensory input in Section 4, we describe our approach to a multi-layered conceptual spatial representation that bridges the gap between sensory input and human spatial concepts in Section 5. Then, the general principles of our robot’s situated dialogue capabilities are introduced in Section 6. In Section 7, we discuss the integration of the complete system in a mobile robot. Finally, concluding remarks are given in Section 8.


Related work

An approach to endowing autonomous robots with a human-like conceptualization of space inherently needs to take into account research in sensor-based mapping and localization for robots as well as findings about human spatial cognition.

Research in cognitive psychology addresses the inherently qualitative nature of human spatial knowledge. Experimental studies support the now generally accepted view that humans adopt a partially hierarchical representation of spatial organization [1], [2].

System overview

Following the research in spatial cognition and qualitative spatial reasoning on the one hand, and in mobile robotics and artificial intelligence on the other hand, we propose a spatial representation for indoor mobile robots that is divided into layers. These layers represent different levels of abstraction from sensory input to human-like spatial concepts.

This multi-layered spatial representation is the centerpiece of our integrated robotic system. It is created using information coming from the robot’s sensors as well as from situated dialogue with a human user.

Perception

The perception subsystem gathers information from the laser range scanner and from a camera. Different techniques are used for evaluation of the sensory input. The laser data is processed and used to create the low-level layers of the spatial representation. At the same time the input from the laser scanner is used by a component for detecting and following people [25]. Finally, the images acquired by the camera are analyzed by a computer vision component for object recognition.
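To give a flavor of how the laser data can be evaluated, here is a minimal sketch of simple geometric features computed from a single 2D range scan, in the spirit of feature-based place classification; the particular features and names are illustrative assumptions, not the exact feature set used by the system.

    import math

    def scan_features(ranges):
        """Simple geometric features of a 2D laser scan (illustrative subset).

        'ranges' is a list of beam lengths from a full 360-degree scan,
        assumed equally spaced in angle.
        """
        n = len(ranges)
        mean = sum(ranges) / n
        # Corridors tend to show high variance: long beams along the
        # corridor axis, short beams towards the walls.
        variance = sum((r - mean) ** 2 for r in ranges) / n
        # Area of the polygon spanned by consecutive beam endpoints:
        # each pair of adjacent beams forms a triangle at the sensor.
        step = 2.0 * math.pi / n
        area = 0.5 * sum(ranges[i] * ranges[(i + 1) % n] * math.sin(step)
                         for i in range(n))
        return {"mean_range": mean, "range_variance": variance,
                "covered_area": area}

A classifier trained on labeled scans can then map such feature vectors to place categories such as corridor or room.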

Multi-layered spatial representation

The sensors that a robot has are very different from the human sensory modalities. Yet if a robot is to act in a human-populated environment, and to interact with users that are not expert roboticists, it needs to understand its surroundings in terms of human spatial concepts. We propose a layered model of space at different levels of abstraction, ranging from low-level metric maps for robot localization and navigation to a conceptual layer that provides a human-like decomposition and categorization of space.
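A minimal sketch of what such a layered model might look like as a data structure appears below; the layer granularity follows the abstraction levels described in the text, but all class and field names are illustrative assumptions rather than the article’s actual data model.

    from dataclasses import dataclass, field

    # Illustrative sketch of a multi-layered spatial representation.
    # Layer contents are simplified assumptions, not the paper's data model.

    @dataclass
    class MetricMap:
        """Lowest layer: metric map of geometric features for localization."""
        lines: list = field(default_factory=list)  # (x1, y1, x2, y2) segments

    @dataclass
    class NavNode:
        """Navigation layer: nodes placed along the robot's path."""
        node_id: int
        x: float
        y: float

    @dataclass
    class Area:
        """Topological layer: navigation nodes grouped into areas."""
        area_id: int
        nodes: list = field(default_factory=list)  # NavNode instances

    @dataclass
    class ConceptualInstance:
        """Conceptual layer: an area linked to a human spatial concept
        (e.g. 'corridor', 'living room') and the objects found in it."""
        area: Area
        concept: str
        objects: list = field(default_factory=list)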

Situated dialogue

In this section, we discuss the functionality which enables a robot to carry out a natural language dialogue with a human.

A core characteristic of our approach is that the robot builds up a semantic representation for each utterance. The robot interprets it against the dialogue context, relating it to previously mentioned objects and events, and to previous utterances in terms of “speech acts” (dialogue moves). Since dialogues in human–robot interaction are inherently situated, the robot also relates what is said to its spatial representation of the current situation.
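The following sketch illustrates the kind of bookkeeping this implies: an utterance is paired with its semantic representation and dialogue move, and referring expressions are resolved against previously mentioned entities. All names and the resolution rule are hypothetical simplifications, not the linguistic framework actually used.

    from dataclasses import dataclass, field

    @dataclass
    class Utterance:
        """An utterance paired with its semantic interpretation."""
        text: str
        speech_act: str      # e.g. 'assert', 'question', 'command'
        referents: list      # entities the utterance mentions

    @dataclass
    class DialogueContext:
        """Tracks previously mentioned entities so that later referring
        expressions ('it', 'that room') can be resolved against them."""
        salient_entities: list = field(default_factory=list)

        def interpret(self, utt):
            resolved = []
            for ref in utt.referents:
                if ref == "it" and self.salient_entities:
                    resolved.append(self.salient_entities[-1])  # most recent
                else:
                    resolved.append(ref)
                    self.salient_entities.append(ref)
            return resolved

    ctx = DialogueContext()
    ctx.interpret(Utterance("This is the living room.", "assert", ["living room"]))
    print(ctx.interpret(Utterance("It has a couch.", "assert", ["it", "couch"])))
    # ['living room', 'couch']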

System integration

Our approach has been implemented as an integrated system, running on an ActivMedia PeopleBot mobile robot platform. In this section, we discuss the integration of the components presented in the earlier sections. We focus on what integration brings us in terms of achieving a better understanding of sensory signals, i.e. one that is more complete and more appropriate for interacting with humans; particularly, given that sensory information usually only provides a partial, potentially noisy view of the environment.
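As an illustration of what such integration can buy, the sketch below combines noisy place-classification scores from laser data with evidence from recognized objects to decide on an area’s category; the weighted-sum fusion scheme is an assumption for illustration, not the mechanism prescribed by the system.

    # Illustrative sketch: fusing noisy evidence about an area's category.
    # The weighting scheme is an assumption for illustration only.

    def fuse_beliefs(laser_scores, object_scores, w_laser=0.5):
        """Combine place-classification scores from laser data with
        scores implied by recognized objects; return the best category."""
        categories = set(laser_scores) | set(object_scores)
        combined = {
            c: w_laser * laser_scores.get(c, 0.0)
               + (1.0 - w_laser) * object_scores.get(c, 0.0)
            for c in categories
        }
        return max(combined, key=combined.get)

    # Laser geometry alone is ambiguous between kitchen and office ...
    laser = {"kitchen": 0.4, "office": 0.45}
    # ... but a recognized stove tips the balance.
    objects = {"kitchen": 0.9}
    print(fuse_beliefs(laser, objects))  # 'kitchen'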

Conclusions

We presented an integrated approach for creating conceptual representations of human-made environments, in which the concepts represent spatial and functional properties of typical indoor office environments. Our representation is based on multiple maps at different levels of abstraction. The information needed for each level stems from different modalities, including a laser sensor, a camera, and a natural language processing system. The complete system was integrated and tested on a mobile robot.


References (46)

  • A. Stevens et al., Distortions in judged spatial relations, Cognitive Psychology (1978).
  • T. McNamara, Mental representations of spatial relations, Cognitive Psychology (1986).
  • B. Kuipers, The Spatial Semantic Hierarchy, Artificial Intelligence (2000).
  • R. Siegwart, Robox at expo.02: A large scale installation of personal robots, Robotics and Autonomous Systems (2003).
  • A.G. Cohn et al., Qualitative spatial representation and reasoning: An overview, Fundamenta Informaticae (2001).
  • S.C. Hirtle et al., Evidence for hierarchies in cognitive maps, Memory and Cognition (1985).
  • R. Brown, How shall a thing be called?, Psychological Review (1958).
  • E. Rosch, Principles of categorization.
  • B. Krieg-Brückner et al., A taxonomy of spatial knowledge for navigation and its application to the Bremen autonomous wheelchair.
  • S. Vasudevan, S. Gachter, M. Berger, R. Siegwart, Cognitive maps for mobile robots: an object based approach, in: Proc. ...
  • C. Galindo, A. Saffiotti, S. Coradeschi, P. Buschka, J. Fernández-Madrigal, J. González, Multi-hierarchical semantic ...
  • P. Beeson, M. MacMahon, J. Modayil, A. Murarka, B. Kuipers, B. Stankiewicz, Integrating multiple representations of ...
  • A. Diosi, G. Taylor, L. Kleeman, Interactive SLAM using laser and advanced sonar, in: Proc. of the IEEE Int. Conference ...
  • O. Martínez Mozos, A. Rottmann, R. Triebel, P. Jensfelt, W. Burgard, Semantic labeling of places using information ...
  • S. Friedman, H. Pasula, D. Fox, Voronoi random fields: Extracting the topological structure of indoor environments via ...
  • W. Burgard, A. Cremers, D. Fox, D. Hähnel, G. Lakemeyer, D. Schulz, W. Steiner, S. Thrun, Experiences with an ...
  • H. Ishiguro et al., Robovie: An interactive humanoid robot, Int. J. Industrial Robotics (2001).
  • A. Haasch et al., BIRON - the Bielefeld robot companion.
  • J. Bos, E. Klein, T. Oka, Meaningful conversation with a mobile robot, in: Proceedings of the Research Note Sessions of ...
  • O. Lemon, A. Bracy, A. Gruenstein, S. Peters, A multi-modal dialogue system for human–robot conversation, in: ...
  • C. Sidner, C. Kidd, C. Lee, N. Lesh, Where to look: A study of human–robot engagement, in: Proceedings of the ACM ...
  • G. Kruijff, P. Lison, T. Benjamin, H. Jacobsson, N. Hawes, Incremental, multi-level processing for comprehending ...
  • D. Traum et al., The information state approach to dialogue management.

    H. Zender is a PhD student researcher at the Language Technology Lab of the German Research Center for Artificial Intelligence (DFKI). His research interests are linguistic aspects of spatial cognition and spatial knowledge representations for human–robot interaction. He received his Diploma degree in Computational Linguistics from Saarland University in 2006.

O. Martínez Mozos is a Ph.D. student at the lab for Autonomous Intelligent Systems headed by Wolfram Burgard at the University of Freiburg in Germany. His areas of interest lie in mobile robotics, artificial intelligence, and pattern recognition. In 2005, he received an M.Sc. in Applied Computer Science at the University of Freiburg. In 1997 he completed an M.Eng. in Computer Science at the University of Alicante in Spain.

    P. Jensfelt is an assistant professor at the Centre for Autonomous Systems at the Royal Institute of Technology, Stockholm, Sweden. He received his M.Sc. in Engineering Physics in 1996 and Ph.D. in Automatic Control in 2001. His research interests include mapping and localization, mobile robotics, and system integration.

G.-J. Kruijff is a Senior Researcher at the DFKI Language Technology Lab, where he leads efforts in the area of “cognitive systems”. His research focuses on developing “talking robots”: theories and implemented architectures that enable cognitive robots to understand and produce situated dialogue with human users. He has over 90 refereed conference papers and articles in human–robot interaction and in formal and computational linguistics. He is a member of IEEE.

    W. Burgard is a professor at the Department of Computer Science at the University of Freiburg, where he heads the lab for Autonomous Intelligent Systems. He studied Computer Science at the University of Dortmund and received his Ph.D. degree in Computer Science from the University of Bonn in 1991. His research focuses on mobile robotics and system integration.

    This work was supported by the EU FP6 IST Cognitive Systems Integrated Project “CoSy” FP6-004250-IP.
