DOI: 10.1145/3123266.3123287
research-article

Pedestrian Path Forecasting in Crowd: A Deep Spatio-Temporal Perspective

Published: 19 October 2017

ABSTRACT

Predicting the walking path of a pedestrian in a crowd is a pivotal step towards understanding his or her behavior. It is a recently emerging computer vision task that has scarcely been addressed to date. In this paper, we put forth a deep spatio-temporal learning-forecasting approach composed of two modules. First, displacement information from pedestrians' walking history is extracted and fed into a convolutional layer to learn the underlying motion patterns and produce high-level representations. Second, unlike the mainstream literature, which learns the temporal or the spatial dynamics among pedestrians separately, we embed both components into a single framework via a Long Short-Term Memory (LSTM) based architecture that takes as input the previously extracted high-level motion cues and outputs the potential future walking routes of all pedestrians in one shot. We evaluate our approach on three large benchmark datasets and show that it yields large-margin improvements over recent works in the literature, in both short- and long-term forecasting scenarios.
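As a rough illustration of the first module described in the abstract, the sketch below extracts frame-to-frame displacements from each pedestrian's walking history and passes them through a single 1-D temporal convolution. It is a minimal sketch under stated assumptions, not the authors' implementation: the function names, the toy tracks, and the fixed averaging kernel are all hypothetical (in the paper the convolutional weights are learned), and the LSTM forecasting stage is omitted entirely.

```python
from typing import Dict, List, Tuple

Point = Tuple[float, float]


def displacements(track: List[Point]) -> List[Point]:
    """Frame-to-frame displacement (dx, dy) along one pedestrian's track."""
    return [(x1 - x0, y1 - y0) for (x0, y0), (x1, y1) in zip(track, track[1:])]


def temporal_conv(seq: List[Point], kernel: List[float]) -> List[Point]:
    """Valid-mode 1-D convolution over time, applied to dx and dy independently.

    The kernel is fixed here for illustration; a real model would learn it.
    """
    k = len(kernel)
    out = []
    for t in range(len(seq) - k + 1):
        fx = sum(w * seq[t + i][0] for i, w in enumerate(kernel))
        fy = sum(w * seq[t + i][1] for i, w in enumerate(kernel))
        out.append((fx, fy))
    return out


# Hypothetical toy data: two pedestrians observed over four frames.
tracks: Dict[str, List[Point]] = {
    "ped_a": [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (3.0, 0.0)],
    "ped_b": [(5.0, 5.0), (5.0, 4.0), (5.0, 2.0), (5.0, 1.0)],
}
kernel = [0.5, 0.5]  # assumed two-tap averaging filter

# Motion features for every pedestrian, computed in one pass.
features = {pid: temporal_conv(displacements(tr), kernel) for pid, tr in tracks.items()}
print(features)
```

In the approach the abstract describes, features of this kind for all pedestrians would then be fed jointly to the LSTM-based module, which outputs the future routes in one shot.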


Published in

MM '17: Proceedings of the 25th ACM International Conference on Multimedia
October 2017, 2028 pages
ISBN: 9781450349062
DOI: 10.1145/3123266

Copyright © 2017 ACM

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery, New York, NY, United States


Acceptance Rates

MM '17 paper acceptance rate: 189 of 684 submissions, 28%
Overall acceptance rate: 995 of 4,171 submissions, 24%
