Visual search for an object in a 3D environment using a mobile robot

https://doi.org/10.1016/j.cviu.2009.06.010

Abstract

Consider the problem of visually finding an object in a mostly unknown space with a mobile robot. It is clear that all possible views and images cannot be examined in a practical system. Visual attention is a complex phenomenon; we view it as a mechanism that optimizes the search processes inherent in vision [1], [2]. Here, we describe a particular example of a practical robotic vision system that employs some of these attentive processes. We cast this as an optimization problem: maximize the probability of finding the target given a fixed cost limit on the total number of robotic actions. Due to the inherent intractability of this problem, we present an approximate solution and investigate its performance and properties. We conclude that our approach is sufficient to solve this problem and has additional desirable empirical characteristics.

Introduction

Attention is one of those visual phenomena that has been easy to ignore in computer vision and robotics but is now emerging as an important issue. Visual attention has been of interest to many disciplines for over a hundred years, with an enormous literature and thousands upon thousands of experiments investigating the vast range of its manifestations. Theoretical and computational models have been proposed since the 1950s in an attempt to explain how this phenomenon comes about and how it contributes to our perception of the real world (for a review see [3]). The first formal proof of the necessity of attentive processes appeared in [4] (see also [5], [6], [7], [8]). There, the problem of visual matching – the task of determining whether or not an instance of a particular model exists in a given image, without the use of any knowledge whatsoever – was shown to be NP-complete: its time complexity is exponential in the size of the image, and the result is independent of implementation. In addition to other mechanisms, attention contributes to changing this problem into one with linear time complexity in both worst-case and median-case analyses [4], [9].

The breadth and variety of attentive phenomena as they relate to computer vision was described in [7]. There, a spectrum of problems requiring attention was laid out including: selection of objects, events or tasks relevant for a domain; selection of world model; selection of visual field; selection of detailed sub-regions for analysis; selection of spatial and feature dimensions of interest; and the selection of operating parameters for low level operations. Most computer vision research makes assumptions that reduce the combinatorial problems inherent in the above tasks, or better yet, eliminate the need for attention, using strategies such as:

  • fixed camera systems negate the need for selection of visual field or selection of best viewpoints;

  • pre-segmentation eliminates the need to select a region of interest;

  • ‘clean’ backgrounds ameliorate the segmentation problem;

  • assumptions about relevant features and their values reduce their search ranges;

  • knowledge of task domain negates the need to search a stored set of all domains;

  • knowledge of objects appearing in scenes eliminates search of a stored set of objects;

  • knowledge of which events are of interest eliminates search of a stored set of events.

In this way, the extent of the search space is drastically reduced before any visual processing takes place, and often even before the solution algorithms are designed. However, it is clear that in everyday vision, and certainly in order to understand vision, these assumptions cannot be made. Real vision requirements, for humans as well as robots, are not so cooperative, and attentive processes need to play a central role throughout.

The example robotic system described in this paper addresses the viewpoint and visual field selection issues in the context of search for a given, known object in an unknown 3D world. As such, it is an instance of the active vision approach [10]. Bajcsy argued that rather than simply analyzing a set of prerecorded images, the observer should actively control its image acquisition process so that the acquired images are relevant and useful for the task at hand. In the case of region segmentation problems, the camera could be moved to a viewpoint in which, for example, the projection of an object in the image plane yields a higher-contrast region, or an object’s edge projects to a stronger gradient in the image. If a particular view of an object (or one of its parts) is ambiguous, the camera can be moved to disambiguate the object. For example, Wilkes and Tsotsos [11] proposed a system that drives a camera to a standard viewpoint with respect to an unknown object. From such a viewpoint, the object recognition task is reduced to a two-dimensional pattern recognition problem. The authors define a standard view as a position at which the lengths of two non-parallel object line segments are maximized and the longer line has a specified length in the image. The standard view is achieved by moving a camera mounted on the end of a robot arm. From the standard viewing position, the extracted line segments are used to index into a database to find a matching, stored object. In a different strategy for the same problem, Dickinson et al. [12] combine an attention mechanism with a viewpoint control strategy to perform active object recognition. Their representation scheme is called the aspect prediction graph. Given an ambiguous view of an object, this representation can inform the algorithm whether there is a more discriminating view of the object; if there is, it indicates in which direction the camera should be moved to capture that view. Finally, it specifies what visual events (appearances or disappearances of object features) one should encounter while moving the camera to the new viewpoint. In both cases, the image interpretation process is tightly coupled to the viewpoint selection and data acquisition process, as Bajcsy suggested. The success of these works lies in the fact that no assumptions about viewpoint were needed, and attentive processes – selection processes – provided the reduction in the combinatorics of search that would otherwise cripple a brute-force, blind search.

This paper focuses on the problem of visual search for an object in a 3D environment using a mobile robot, providing a description of the solution strategy, an example of the robot’s performance, and an empirical performance evaluation.

Section snippets

A robot that searches: previous work

Suppose one wishes a robot to search for and locate a particular object in a 3D world. A direct search certainly suffices as a solution: assuming that the target may lie with equal probability at any location, the viewpoint selection problem is resolved by moving a camera to image the previously unviewed portions of the full 3D space. However, this kind of exhaustive, brute-force approach is both computationally and mechanically prohibitive. As an …

The object search problem

Ye and Tsotsos [30], [31] define object search as the problem of maximizing the probability of detecting the target within a given cost constraint. Their formulation combines the influence of a search agent’s initial knowledge with the influence of the performance of the available recognition algorithms. For a practical search strategy, the search region is characterized by a probability distribution over the presence of the target. The control of the sensing parameters depends on the current state of …
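For concreteness, the formulation can be sketched as follows, with notation adapted from [30], [31]; the exact symbols here are illustrative. Discretize the search region into cells c_1, …, c_n, let p(c_i) be the probability that the target is in cell c_i, and let a sensing action f (a viewpoint together with camera parameters) detect a target located in c_i with probability b(c_i, f), at cost t(f). The probability that action f detects the target is

    P(f) = \sum_{i=1}^{n} p(c_i) \, b(c_i, f)

and object search asks for an ordered set of actions F = (f_1, …, f_k) maximizing the overall probability of detection

    P(F) = 1 - \prod_{j=1}^{k} \left( 1 - P(f_j) \right)

subject to the cost constraint

    \sum_{j=1}^{k} t(f_j) \le K,

with p updated by Bayes’ rule after each unsuccessful action. This maximization is intractable in general, which motivates greedy approximations that repeatedly select the action maximizing the ratio P(f)/t(f).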

Implementation

Basic requirements for the successful implementation of a search agent include having a method for determining depth, a method for detecting the target, and means to control sensor parameters and mobility.
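To make the interaction of these components concrete, the following is a minimal Python sketch of a greedy search agent’s decision loop in the spirit of the formulation above; the interfaces (detect_prob, cost, execute) are hypothetical placeholders, not the authors’ implementation.

    def greedy_search(actions, prior, detect_prob, cost, execute, budget):
        """Greedy approximation to object search: repeatedly apply the
        sensing action with the best detection-probability-to-cost ratio
        until the target is found or the cost budget K runs out.

        actions:     candidate sensing actions (viewpoint + parameters)
        prior:       dict mapping cell -> P(target in cell)
        detect_prob: detect_prob(cell, action) -> P(detect | target in cell)
        cost:        cost(action) -> cost t(f) of the action
        execute:     execute(action) -> True iff the target was detected
        budget:      total cost limit K
        """
        p = dict(prior)
        spent = 0.0
        while True:
            # Expected detection probability P(f) of a candidate action.
            def utility(f):
                return sum(p[c] * detect_prob(c, f) for c in p)
            affordable = [f for f in actions if spent + cost(f) <= budget]
            if not affordable:
                return None                      # budget exhausted
            f = max(affordable, key=lambda f: utility(f) / cost(f))
            spent += cost(f)
            if execute(f):                       # move, image, run detector
                return f                         # the action that found it
            # Bayesian update after a failed action: cells covered by f
            # lose mass in proportion to their detection probability.
            for c in p:
                p[c] *= 1.0 - detect_prob(c, f)
            total = sum(p.values())
            if total > 0.0:
                for c in p:
                    p[c] /= total

The update after each failed action concentrates probability mass in the regions not yet reliably imaged, which is what steers the agent toward unexplored viewpoints.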

Our search agent is implemented on a Pioneer 3 robot, a four-wheel, differentially steered mobile platform from ActivMedia Robotics. The platform is equipped with a Point Grey Research Bumblebee camera mounted on a Directed Perception pan-tilt unit. The Bumblebee is a two-lens stereo vision camera that …

A typical full search

Fig. 1 shows the search region, which is part of our laboratory. During the experiments the environment is static; there are no dynamic obstacles. The region’s dimensions are 9 m × 5 m × 2.5 m (width × length × height). We divide the floor plane into a 1 m × 1 m grid, and each vertex of the grid is a potential robot location. The search agent knows its initial position with respect to the region’s boundaries, but has no prior knowledge of the internal configuration of the room. The search region is also divided …
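As a small illustration of the bookkeeping this setup implies, the sketch below builds a uniform prior over a discretized version of the region; the 9 m × 5 m × 2.5 m dimensions and the 1 m floor grid come from the text, while the 3D cell size is an assumed value.

    import itertools

    # Region from the text: 9 m x 5 m footprint, 2.5 m high.
    WIDTH, LENGTH, HEIGHT = 9.0, 5.0, 2.5
    CELL = 0.5  # 3D cell edge length in metres (assumed, not from the paper)

    nx, ny, nz = int(WIDTH / CELL), int(LENGTH / CELL), int(HEIGHT / CELL)
    cells = list(itertools.product(range(nx), range(ny), range(nz)))

    # With no prior knowledge of the room's interior, the target is taken
    # to be equally likely to lie in any cell.
    prior = {c: 1.0 / len(cells) for c in cells}

    # Candidate robot positions: the vertices of the 1 m x 1 m floor grid.
    positions = [(x, y) for x in range(int(WIDTH) + 1)
                        for y in range(int(LENGTH) + 1)]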

Object-centred viewpoint-dependent detectability function

Whether or not a camera system plus detection function can actually be put in a position where detection is possible depends on a variety of interacting factors. Detectability depends on viewpoint, image size, distance, scale, rotation in 3D, and occlusion, not to mention lighting conditions and surface reflectance. Here, the only dimensions of this problem that we address are those dependent on viewpoint selection and object pose. A target object may be at any pose in the environment; it may even be hidden.
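The detectability function used in the experiments is characterized empirically; the toy model below only illustrates the kind of viewpoint dependence at issue, with made-up field-of-view and range constants.

    import math

    def detectability(cell, cam_pos, cam_axis,
                      fov_deg=60.0, r_min=0.5, r_max=4.0):
        """Toy viewpoint-dependent detection probability: zero outside the
        field of view or the usable depth range, decaying with range and
        with angle off the optical axis. cam_axis must be a unit vector;
        all constants are illustrative, not measured values."""
        d = [c - p for c, p in zip(cell, cam_pos)]
        dist = math.sqrt(sum(x * x for x in d))
        if not r_min <= dist <= r_max:
            return 0.0  # too close or too far for the detector to resolve
        cos_a = sum(x * a for x, a in zip(d, cam_axis)) / dist
        angle = math.degrees(math.acos(max(-1.0, min(1.0, cos_a))))
        if angle > fov_deg / 2.0:
            return 0.0  # outside the camera's field of view
        # Detection degrades towards the image periphery and with distance.
        return (1.0 - angle / (fov_deg / 2.0)) * (r_min / dist)

Occlusion and target pose, which the paper also considers, are omitted from this toy model for brevity.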

Conclusions

The search for an object in a 3D space benefits greatly from attentive mechanisms that limit the search space in a principled manner. We presented a solution to this problem with an effective implementation using a robotic agent.

Our solution performs search for a target object in an unknown 3D environment. No assumptions are made about the configuration of the environment, other than the location of its exterior boundary, or about the position of the target object. Since our search agent generates …

Acknowledgments

The authors wish to thank Dr. Ehud Rivlin for providing the code for the TangentBug algorithm. Research support was gratefully received from the Natural Sciences and Engineering Research Council of Canada, the Canada Foundation for Innovation, the Ontario Innovation Trust, and the Canada Research Chairs Program.

References (45)

  • J.K. Tsotsos, On the relative complexity of active vs. passive visual search, Int. J. Comput. Vis. (1992)
  • R. Rensink, A new proof of the NP-completeness of visual match, Technical report, Computer Science Department, …
  • R. Bajcsy, Active perception vs. passive perception, in: Proceedings of the IEEE Workshop on Computer Vision: …
  • D. Wilkes, J.K. Tsotsos, Active object recognition, in: CVPR’92, 1992, pp. …
  • T.D. Garvey, Perceptual strategies for purposive vision, Technical report, SRI International, 117, …
  • L. Wixson et al., Using intermediate objects to improve the efficiency of visual search, Int. J. Comput. Vis. (1994)
  • D.A. Reece, Selective perception for robot driving, Technical report, Carnegie Mellon Computer Science, …
  • Y. Ye et al., A complexity level analysis of the sensor planning task for object search, Comput. Intell. (2001)
  • C.I. Connolly, The determination of next best views, in: Proceedings of the IEEE International Conference on Robotics …
  • J. Maver, R. Bajcsy, Occlusions as a guide for planning the next view, IEEE Transactions on Pattern Analysis and …
  • H. Kim, R. Jain, R. Volz, Object recognition using multiple views, in: Proceedings of the IEEE International Conference …
  • C.K. Cowan, P.D. Kovesi, Automatic sensor placement from vision task requirements, IEEE Transactions on Pattern …