
2017 | Original Paper | Book Chapter

Object Triggered Egocentric Video Summarization

Authors: Samriddhi Jain, Renu M. Rameshan, Aditya Nigam

Published in: Computer Analysis of Images and Patterns

Publisher: Springer International Publishing

Abstract

Egocentric videos are usually of long duration and contain a lot of redundancy, which makes summarization an essential task for such videos. In this work we target object triggered egocentric video summarization, which aims at extracting all occurrences of a given object in a video in near real time. We propose a modular pipeline that first limits the redundant information and then uses a Convolutional Neural Network (CNN) and LSTM based approach for object detection. Following this, we represent the video as a dictionary that captures its semantic information. Matching a query object then reduces to an And-Or tree traversal followed by the deepmatching algorithm for fine-grained matching. Frames containing the object that would otherwise be missed at the pruning stage are recovered by running a tracker on the frames selected by the pipeline. The modular design allows any module to be replaced by a more efficient version. Performance tests run on the overall pipeline for egocentric datasets, namely the EDUB dataset and personally recorded videos, give an average recall of 0.76.
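
As an illustration only (not the authors' code), the following minimal Python sketch shows how such a dictionary representation, mapping detected object labels to frame indices, can support coarse retrieval of candidate frames before fine-grained matching. All function names and the toy per-frame labels below are hypothetical placeholders.

    from collections import defaultdict

    def build_dictionary(frame_labels):
        # Map each detected semantic label to the frames in which it appears.
        index = defaultdict(list)
        for frame_id, labels in frame_labels.items():
            for label in labels:
                index[label].append(frame_id)
        return index

    def coarse_query(index, query_label):
        # Coarse retrieval: candidate frames for the query object; a fine-grained
        # matcher (e.g. deepmatching) would then verify each candidate.
        return sorted(index.get(query_label, []))

    # Toy usage with per-frame labels as a CNN/LSTM detector might produce them.
    frame_labels = {0: ["cup", "laptop"], 5: ["cup"], 9: ["book"]}
    index = build_dictionary(frame_labels)
    print(coarse_query(index, "cup"))  # -> [0, 5]

In the pipeline described above, the simple lookup would correspond to the And-Or tree traversal, and the verified candidates would seed the tracker that recovers occurrences discarded at the pruning stage.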

Metadata
Title
Object Triggered Egocentric Video Summarization
Authors
Samriddhi Jain
Renu M. Rameshan
Aditya Nigam
Copyright Year
2017
DOI
https://doi.org/10.1007/978-3-319-64698-5_36