DOI: 10.1145/1088463.1088485
Article

Analyzing and predicting focus of attention in remote collaborative tasks

Published: 04 October 2005

ABSTRACT

To overcome the limitations of current technologies for remote collaboration, we propose a system that changes a video feed based on task properties, people's actions, and message properties. First, we examined how participants manage different visual resources in a laboratory experiment using a collaborative task in which one partner (the helper) instructs another (the worker) how to assemble online puzzles. We analyzed helpers' eye gaze as a function of the aforementioned parameters. Helpers gazed at the set of alternative pieces more frequently when it was harder for workers to differentiate these pieces, and less frequently over repeated trials. The results further suggest that a helper's desired focus of attention can be predicted based on task properties, his/her partner's actions, and message properties. We propose a conditional Markov model classifier to explore the feasibility of predicting gaze based on these properties. The accuracy of the model ranged from 65.40% for puzzles with easy-to-name pieces to 74.25% for puzzles with more difficult-to-name pieces. The results suggest that we can use our model to automatically manipulate video feeds to show what helpers want to see when they want to see it.
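The abstract describes the classifier only at a high level, so the following minimal Python sketch illustrates one plausible reading of a conditional Markov model classifier for gaze prediction: the predicted gaze target at each time step depends on the previous target and on the current observable context (task properties, partner actions, message properties). This is an assumption-laden illustration, not the authors' implementation; the class name, gaze-target labels, and context-feature tuple are all hypothetical.

```python
# Hedged sketch of a conditional Markov model classifier for gaze prediction.
# NOT the paper's implementation; all names and feature choices are assumptions.

from collections import defaultdict

TARGETS = ["pieces_bay", "workspace", "other"]  # assumed gaze targets


class ConditionalMarkovClassifier:
    def __init__(self, alpha=1.0):
        self.alpha = alpha  # Laplace smoothing for unseen (state, context) pairs
        # counts[(previous_target, context)][target] -> observed frequency
        self.counts = defaultdict(lambda: defaultdict(float))

    def fit(self, sequences):
        """sequences: list of trials, each a list of (context, target) pairs.
        context is a hashable tuple, e.g. (piece_difficulty, worker_action, msg_type)."""
        for seq in sequences:
            prev = "<start>"
            for context, target in seq:
                self.counts[(prev, context)][target] += 1.0
                prev = target

    def predict(self, contexts):
        """Greedy decoding: at each step, choose the most probable target
        given the previously predicted target and the current context."""
        preds, prev = [], "<start>"
        for context in contexts:
            row = self.counts[(prev, context)]
            scores = {t: row.get(t, 0.0) + self.alpha for t in TARGETS}
            best = max(scores, key=scores.get)
            preds.append(best)
            prev = best
        return preds
```

Under this reading, per-trial accuracy would be computed by comparing the predicted target sequence against the helper's actual gaze sequence, which is consistent with the accuracy figures reported in the abstract, though the exact features and decoding used in the paper may differ.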


Published in

ICMI '05: Proceedings of the 7th international conference on Multimodal interfaces
October 2005
344 pages
ISBN: 1595930280
DOI: 10.1145/1088463

        Copyright © 2005 ACM

        Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

        Publisher

        Association for Computing Machinery

        New York, NY, United States

        Publication History

        • Published: 4 October 2005


        Acceptance Rates

Overall Acceptance Rate: 453 of 1,080 submissions, 42%
