2021 | OriginalPaper | Chapter

Egomap: Hierarchical First-Person Semantic Mapping

Authors: Tamas Suveges, Stephen McKenna

Published in: Pattern Recognition. ICPR International Workshops and Challenges

Publisher: Springer International Publishing

Abstract

We consider unsupervised learning of semantic, user-specific maps from first-person video. The task we address can be thought of as a semantic, non-geometric form of simultaneous localisation and mapping, differing in significant ways from formulations typical in robotics. Locations, termed stations, typically correspond to rooms or areas in which a user spends time, places to which they might refer in spoken conversation. Our maps are modelled as a hierarchy of probabilistic station graphs and view graphs. View graphs capture an aspect of user behaviour within stations. Visits are temporally segmented based on qualitative visual motion and used to update the map, either by updating an existing map station or by adding a new one. We contribute a labelled dataset suitable for evaluation of this novel SLAM task. Experiments compare mapping performance with and without view graphs and show that our online mapping outperforms offline clustering.
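The abstract describes the map as a hierarchy of probabilistic station graphs and view graphs that is updated online, one temporally segmented visit at a time, by either merging the visit into an existing station or creating a new one. The Python sketch below illustrates one way such a structure and update rule could be organised; the class names (Station, EgoMap), the cosine-similarity scoring of visits against stations, and the fixed new-station threshold are illustrative assumptions, not the method used in the paper.

from collections import defaultdict

import numpy as np


class Station:
    """A map station (e.g. a room), holding a view graph over its visual views."""

    def __init__(self, views):
        self.views = list(views)                  # view descriptors, e.g. bag-of-words vectors
        self.view_transitions = defaultdict(int)  # counts of transitions between consecutive views

    def add_visit(self, visit_views):
        """Fold a new visit's views and their temporal transitions into the view graph."""
        start = len(self.views)
        self.views.extend(visit_views)
        for i in range(start, len(self.views) - 1):
            self.view_transitions[(i, i + 1)] += 1

    def similarity(self, visit_views):
        """Mean best-match cosine similarity between the visit's views and the station's views."""
        scores = []
        for v in visit_views:
            best = max(np.dot(v, w) / (np.linalg.norm(v) * np.linalg.norm(w) + 1e-9)
                       for w in self.views)
            scores.append(best)
        return float(np.mean(scores))


class EgoMap:
    """Station graph: nodes are stations, edges count observed transitions between them."""

    def __init__(self, new_station_threshold=0.4):
        self.stations = []
        self.station_transitions = defaultdict(int)  # (from_station, to_station) -> count
        self.threshold = new_station_threshold       # illustrative value, not from the paper
        self.last_station = None

    def update(self, visit_views):
        """Localise one temporally segmented visit, then update or extend the station graph."""
        if self.stations:
            scores = [s.similarity(visit_views) for s in self.stations]
            best = int(np.argmax(scores))
            if scores[best] >= self.threshold:
                self.stations[best].add_visit(visit_views)  # revisit of a known station
                idx = best
            else:
                idx = self._add_station(visit_views)        # unfamiliar place: new station
        else:
            idx = self._add_station(visit_views)
        if self.last_station is not None and self.last_station != idx:
            self.station_transitions[(self.last_station, idx)] += 1
        self.last_station = idx
        return idx

    def _add_station(self, visit_views):
        self.stations.append(Station(visit_views))
        return len(self.stations) - 1

A stream of visits, each a list of view descriptors, would be fed to EgoMap.update in order of occurrence. In the paper the view graphs also contribute to mapping (the experiments compare performance with and without them); in this sketch they are merely accumulated, to keep the example short.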

Metadata
Title
Egomap: Hierarchical First-Person Semantic Mapping
Authors
Tamas Suveges
Stephen McKenna
Copyright Year
2021
DOI
https://doi.org/10.1007/978-3-030-68796-0_25
