Computers & Graphics

Volume 35, Issue 4, August 2011, Pages 810-822

Mobile Augmented Reality
Indirect augmented reality

https://doi.org/10.1016/j.cag.2011.04.010

Abstract

Developing augmented reality (AR) applications for mobile devices and outdoor environments has historically required a number of technical trade-offs related to tracking. One approach is to rely on computer vision, which provides very accurate tracking but can be brittle and limits the generality of the application. Another approach is to rely on sensor-based tracking, which enables widespread use but at the cost of generally poor tracking performance. In this paper we present and evaluate a new approach, which we call Indirect AR, that enables perfect alignment of virtual content in a much greater number of application scenarios.

To achieve this improved performance we replace the live camera view used in video see-through AR with a previously captured panoramic image. By doing this we improve the perceived quality of the tracking while still maintaining a similar overall experience. There are some limitations of this technique, however, related to the use of panoramas. We evaluate these boundary conditions on both a performance and experiential basis through two user studies. The results of these studies indicate that users preferred Indirect AR over traditional AR in most conditions, and that when conditions do degrade to the point that the experience changes, Indirect AR can still be a very useful tool in many outdoor application scenarios.

Highlights

► We present a new type of mixed reality using panoramas called Indirect AR.
► Using panoramas in place of a live camera view reduces perceptual tracking error.
► Indirect AR improves user experience by enabling correct occlusions and lighting.
► Users prefer Indirect AR to traditional AR in many conditions.
► Indirect AR maintains a similar user experience to AR in non-perfect conditions.

Introduction

Outdoor AR has become very popular in a number of application spaces in recent years with the growth in popularity of smartphones and other portable hand-held devices. There are several commercially available AR browsers that display both point-of-interest (POI) information and, increasingly, basic 3D content. There are also an increasing number of AR games available for these platforms. Academically, there has also been an increase in the number of experience- and game-focused projects that make use of outdoor AR in some way. However, many of these projects suffer from poor registration because they rely primarily on built-in sensors (GPS, compass, and sometimes gyroscopes) for tracking. These sensors, especially those used in commodity products, do not have nearly the accuracy required for convincing tracking in AR. This both limits the scope of the applications that are possible and decreases the quality of the user experience for the applications that do exist. On the plus side, smartphone platforms allow AR applications to reach a much broader audience. It is possible to have well-registered AR on a smartphone [20], but this generally requires vision-based tracking, which frequently relies on previously known textures and can often be brittle in unknown outdoor environments.

The goal of this paper is to enable compelling experiences in these unprepared outdoor environments, where traditional AR is nearly impossible. Our approach is not to perfect computer-vision-based tracking, but instead to minimize the visual disturbances in the experience when using a given, existing tracking approach. We have named this technique Indirect AR because we make the entire scene inside the device virtual, using panoramic images instead of a live camera view (see the example in Fig. 1). The user is no longer looking directly at the scene through a live camera view, as in video see-through AR, but is instead looking at the scene indirectly, through a previously captured image of it. This moves visible registration error from essentially inside the device (between the real world and virtual content) to the border of the device itself, between the view on the device screen and the real world around the device. Within the device, between the virtual content and the panoramic representation of the real world, there is therefore no registration error. Fig. 2 shows a somewhat abstracted example illustrating how AR and Indirect AR each look with the same amount of registration error. Moving registration error to the edge of the screen is better in part because it is a more difficult place to detect error, due to both the bezel around the screen and the altered field-of-view parameters of the on-screen image. People are also already trained to believe that a view of the real world on the screen of a mobile device lines up with the world behind it, largely due to the proliferation of digital cameras that use the screen as a viewfinder. The viewfinder image people are used to seeing has already been manipulated in many ways, though: it does not show the same view as a piece of glass in its place would, since the lens system modifies the field of view, depth of focus, and so on. Even with these changes, people understand that they are looking "through" the camera. We take this one step further by adding tracking error, essentially as another image modifier. Using built-in sensors, the portion of the panorama that is shown might be 5° or 10° off from what is directly behind the device, but people are unlikely to notice this among all the other image modifiers.
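The paper describes this rendering idea at the level of concepts rather than code, but the core step can be sketched. The following is a minimal illustration, not the authors' implementation: it assumes an equirectangular panorama and a simple pinhole camera model, and the names (panorama_view, yaw_deg, pitch_deg) are ours. The key property is that a sensor error only changes which crop of the panorama is shown; augmentations rendered in panorama coordinates stay pixel-aligned with the imagery.

```python
import numpy as np

def panorama_view(pano, yaw_deg, pitch_deg, fov_deg=60.0, out_w=640, out_h=480):
    """Sample a perspective view from an equirectangular panorama.

    `yaw_deg`/`pitch_deg` come from the device's compass and accelerometer.
    A sensor error shifts the whole view (panorama and augmentations
    together), so no misalignment is visible on screen.
    """
    h, w = pano.shape[:2]
    f = (out_w / 2) / np.tan(np.radians(fov_deg) / 2)  # pinhole focal length

    # Build a viewing ray for every output pixel (camera looks down +z).
    xs, ys = np.meshgrid(np.arange(out_w) - out_w / 2,
                         np.arange(out_h) - out_h / 2)
    rays = np.stack([xs, ys, np.full(xs.shape, f)], axis=-1)
    rays /= np.linalg.norm(rays, axis=-1, keepdims=True)

    # Rotate rays by the device orientation: yaw about y, then pitch about x.
    cy, sy = np.cos(np.radians(yaw_deg)), np.sin(np.radians(yaw_deg))
    cp, sp = np.cos(np.radians(pitch_deg)), np.sin(np.radians(pitch_deg))
    r_yaw = np.array([[cy, 0, sy], [0, 1, 0], [-sy, 0, cy]])
    r_pitch = np.array([[1, 0, 0], [0, cp, -sp], [0, sp, cp]])
    rays = rays @ (r_yaw @ r_pitch).T

    # Convert each ray to equirectangular (longitude, latitude) texel coords.
    lon = np.arctan2(rays[..., 0], rays[..., 2])       # -pi .. pi
    lat = np.arcsin(np.clip(rays[..., 1], -1.0, 1.0))  # -pi/2 .. pi/2
    u = ((lon / (2 * np.pi) + 0.5) * (w - 1)).astype(int)
    v = ((lat / np.pi + 0.5) * (h - 1)).astype(int)
    return pano[v, u]
```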

While we focus on using panoramas as the representation of the real world in this paper, as the complexity and detail of the virtual representation of the real world increases, Indirect AR could become an even more powerful approach. The extreme conclusion of this increase in complexity might be a real environment filled with a large array of video cameras and other sensors that capture the environment in real time and permit a perfect reconstruction of it, also in real time, as seen from any arbitrary viewpoint. If this were possible, an Indirect AR experience would become virtually indistinguishable from an ideal traditional AR experience. This vision may not be practical today; however, it is useful to keep in mind this ultimate expression of Indirect AR, much as Sutherland's Ultimate Display provided a vision of the ultimate expression of Virtual Reality.

In this paper, we focus on an implementation of Indirect AR that uses a more practical model of the real world that still delivers a very high-quality, albeit temporally static, experience: panoramic images. Several companies, including Navteq, are collecting detailed models of urban environments by driving special vehicles that collect panoramic imagery and other data such as 3D point clouds. These are captured every few meters as the car drives down a road. Our approach assumes that a user stands at one location outdoors, downloads the nearest panoramic image, and then rotates in place to examine the real world and the added augmentations. While this implementation is constrained compared to the ultimate vision, it still provides a very similar experience to traditional AR, and can potentially scale well today due to the wide availability of panoramic imagery. Since such imagery and related data are rapidly becoming available for most major urban areas across the world, Indirect AR could become a practical method for generating high-quality AR experiences in urban environments. Using panoramas as a representation of the real world is somewhat limited in that panoramas are captured infrequently, meaning dynamic elements of the scene, such as traffic, weather, and lighting, may not be represented correctly. As we will see throughout the rest of this paper, though, a high-quality experience can still be had even when these conditions are not perfectly met.
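As a concrete illustration of the panorama-selection step described above (a sketch under our own assumptions; the record layout and the nearest_panorama helper are not from the paper), picking which panorama to download reduces to a nearest-neighbour query on GPS coordinates:

```python
import math

def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two GPS fixes."""
    r = 6371000.0  # mean Earth radius in metres
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp = math.radians(lat2 - lat1)
    dl = math.radians(lon2 - lon1)
    a = (math.sin(dp / 2) ** 2
         + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2)
    return 2 * r * math.asin(math.sqrt(a))

def nearest_panorama(user_lat, user_lon, panoramas):
    """Pick the closest pre-captured panorama to the user's position.

    `panoramas` is an iterable of (lat, lon, image) records, e.g. the
    street-level captures taken every few metres along a road.
    """
    return min(panoramas,
               key=lambda p: haversine_m(user_lat, user_lon, p[0], p[1]))
```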

In Section 3 we will first define our research questions in more detail. We will then discuss some of the pure performance comparisons between AR and Indirect AR in Section 4, showing how orientation error can have a huge negative effect on traditional AR while having a much more minor effect on Indirect AR. Next, in Section 5, we will examine a component of the common scenario in Indirect AR in which the panorama is not colocated with the user. We will examine how well users can align their view of the real world with the view presented on the device by studying user performance at pointing out real-world objects that are highlighted in an image taken from a different location. We will show that even in difficult conditions users are very good at correlating their real-world view with the alternate view displayed on the device screen. This does not, however, determine how similar the actual experience is to traditional AR. We explore that question with a second, more in-depth study, presented in Section 6, examining user perception of the Indirect AR experience. This study looks at many of the boundary cases of Indirect AR and how these boundary cases change the type of experience for the user. It also examines the user experience in "good" conditions, comparing Indirect AR to traditional AR with several different types of content. In these good conditions we found Indirect AR was superior to traditional AR regardless of the type of content, and gave users the same type of experience. When the panorama was not colocated with the user the Indirect AR experience eventually degraded, but this was not instantaneous; there was a region around the user where the same type of experience was preserved.

Section snippets

Related work

Our work builds on many magic lens techniques in both VR and AR. Magic lenses were first used in AR as part of HMD-based systems and affected the way virtual content was viewed [10]. Some examples of magic lens use closer to ours are Brown and Hua's [2] work in VR and Quarles et al.'s [13], [14] work in AR. Both of these techniques allow the user to see a virtual version of the world (real or virtual) around them on their hand-held screen that is roughly registered to the world. Quarles et …

Using panoramas as an AR substitute

One thing we feel is important for a compelling AR application is for the virtual content to be convincingly placed in the AR scene. By this we mean that the technology for combining the virtual and physical components should be as hidden as possible. The user should not have to make additional assumptions about what the AR scene "should" look like if something worked better; the AR scene should be convincing as it is. There are many ways to do this, from having pixel-accurate tracking, …

Registration

One of the biggest benefits of using Indirect AR in place of traditional AR is the greater accuracy in registration. In traditional AR, any error in registration is visible directly between the physical object and the virtual annotation. In Indirect AR the same registration error is only visible between the device and its surroundings. That means that the registration between virtual annotations and the panorama that represents the real world is always perfect, even if the registration between the …
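This cancellation can be made concrete with a small numerical sketch (our own illustration; the screen_offset_px helper and the 8° error figure are assumptions, not the paper's). Under a pinhole projection, a yaw error ε displaces a traditional AR annotation from its physical target by roughly f·tan ε pixels, whereas in Indirect AR the same erroneous yaw selects both the panorama crop and the annotation, so their relative offset is zero.

```python
import math

def screen_offset_px(angle_deg, fov_deg=60.0, screen_w=640):
    """Horizontal screen displacement, in pixels, of a direction that is
    `angle_deg` away from the view centre under a pinhole projection."""
    f = (screen_w / 2) / math.tan(math.radians(fov_deg) / 2)
    return f * math.tan(math.radians(angle_deg))

yaw_error_deg = 8.0  # an illustrative compass error

# Traditional AR: the annotation is drawn with the erroneous yaw while the
# live camera shows the true scene, so the two visibly disagree.
print(screen_offset_px(yaw_error_deg))  # ~78 px at 60 deg FOV, 640 px wide

# Indirect AR: the erroneous yaw shifts the panorama and the annotation by
# the same amount, so their relative offset on screen is identically zero;
# the error appears only between the device and the world around it.
print(screen_offset_px(yaw_error_deg) - screen_offset_px(yaw_error_deg))  # 0.0
```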

Localization from disparate viewpoints

One of the biggest drawbacks of our current implementation of Indirect AR is the reliance on pre-captured panoramas. For the naïve, ideal experience it would be necessary to have a panorama located exactly where the user was standing. In the case of deploying Indirect AR generally, this would mean having panoramas everywhere. While we do not have panoramas everywhere, we can approximate this by using panoramas collected by Navteq as they drive down most streets. There are two possible problems …
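One obvious consequence of a panorama that is not colocated with the user is parallax, whose magnitude can be bounded with a back-of-the-envelope calculation (our own sketch; parallax_deg is a hypothetical helper): for a capture point offset o perpendicular to the line of sight and an object at distance d, the angular disparity is about arctan(o/d).

```python
import math

def parallax_deg(offset_m, distance_m):
    """Worst-case angular disparity between the user's view of an object
    and the view from a panorama captured `offset_m` away (offset taken
    perpendicular to the line of sight)."""
    return math.degrees(math.atan2(offset_m, distance_m))

# A panorama captured 10 m from the user barely perturbs a distant facade
# but noticeably displaces nearby objects.
print(parallax_deg(10, 200))  # ~2.9 deg for a building 200 m away
print(parallax_deg(10, 20))   # ~26.6 deg for an object 20 m away
```

This is consistent with the pattern the studies examine: distant landmarks remain easy to correlate with a nearby panorama, while close-range content degrades first.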

User experience vs. AR

Now that we have established both that the link between augmentations and a representation of the real world is tighter in Indirect AR than in traditional AR, and that panoramas from a variety of locations can at least be recognized by a user, we can ask how the actual experience compares to AR. We are particularly interested in whether two related experiential components of traditional AR are also present in Indirect AR. In well-done traditional AR there is a very strong visual …

Conclusions

In this paper we have presented a new type of Mixed Reality experience that is similar to AR experientially, but can be used with lower-quality tracking while still maintaining a good overall user experience. Using panoramas in place of the live camera view enables pixel-accurate matching between the virtual content and the representation of the real world. Through the work presented in this paper we also showed that even when conditions for Indirect AR degrade to the point that the experience …

References (24)

  • T. Aoki et al.

Virtual 3D world construction by inter-connecting photograph-based 3D models

  • L.D. Brown et al.

    Magic lenses for augmented virtual environments

    IEEE Computer Graphics and Applications

    (2006)
  • M. Gandy et al.

Experiences with an AR evaluation test bed: presence, performance, and physiological measurement

  • B. Goldiez et al.

    Is presence present in augmented reality systems?

  • A. Hill et al.

KHARMA: an open KML/HTML architecture for mobile augmented reality applications

  • T. Iachini et al.

    The role of perspective in locating position in a real-world, unfamiliar environment

    Applied Cognitive Psychology

    (2003)
  • G. Liestol

Augmented reality and digital genre design—situated simulations on the iPhone

  • G. Liestol et al.

In the presence of the past: a field trial evaluation of a situated simulation design reconstructing a Viking burial scene

  • M. Livingston et al.

    The effect of registration error on tracking distant augmented objects

  • J. Looser et al.

    Through the looking glass: the use of lenses as an interface tool for augmented reality interfaces

  • T. Miyashita et al.

    An augmented reality museum guide

  • G. Papagiannakis et al.

    Believability and presence in mobile mixed reality environments
