Pedestrian Path Forecasting in Crowd: A Deep Spatio-Temporal Perspective

Author:
Yuke Li

Wuhan University, Wuhan, China


MM '17: Proceedings of the 25th ACM international conference on Multimedia, October 2017, Pages 235–243
https://doi.org/10.1145/3123266.3123287

Published: 19 October 2017

ABSTRACT

Predicting the walking path of a pedestrian in a crowd is a pivotal step towards understanding their behavior. It is a recently emerging computer vision task that has scarcely been addressed to date. In this paper, we put forth a deep spatio-temporal learning-forecasting approach composed of two modules. First, displacement information is extracted from pedestrians' walking histories and fed into a convolutional layer, which learns the underlying motion patterns and produces high-level representations. Second, unlike the mainstream literature, which learns the temporal and spatial dynamics among pedestrians separately, we embed both components into a single framework via a Long Short-Term Memory (LSTM) based architecture that takes the previously extracted high-level motion cues as input and outputs the potential future walking routes of all pedestrians in one shot. We evaluate our approach on three large benchmark datasets and show that it yields large-margin improvements over recent works in the literature, in both short-term and long-term forecasting scenarios.
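To make the first module's input concrete, the sketch below shows one plausible way to turn walking histories into displacement information. It is an illustration only, not the paper's implementation: the abstract does not define the displacement encoding, so the frame-to-frame difference used here, the `(x, y)` track layout, and the helper names `displacements` and `displacement_volume` are all assumptions.

```python
from typing import List, Tuple

Point = Tuple[float, float]  # assumed (x, y) position of a pedestrian in one frame


def displacements(track: List[Point]) -> List[Point]:
    """Frame-to-frame displacement (dx, dy) along one pedestrian's track."""
    return [(x1 - x0, y1 - y0) for (x0, y0), (x1, y1) in zip(track, track[1:])]


def displacement_volume(tracks: List[List[Point]]) -> List[List[Point]]:
    """Stack per-pedestrian displacements as [num_pedestrians][T - 1][2].

    Hypothetical helper: a layout like this could be handed to a
    convolutional layer, but the paper's actual encoding may differ.
    """
    if len({len(t) for t in tracks}) > 1:
        raise ValueError("all tracks must cover the same number of frames")
    return [displacements(t) for t in tracks]


# Two pedestrians observed over three frames (made-up coordinates).
tracks = [
    [(0.0, 0.0), (1.0, 0.5), (2.0, 1.0)],
    [(5.0, 5.0), (5.0, 4.0), (4.5, 3.0)],
]
volume = displacement_volume(tracks)
```

Here `volume` holds two displacement vectors per pedestrian, one for each pair of consecutive frames. Working with displacements rather than absolute positions removes the dependence on where in the scene a pedestrian happens to be, which is one common motivation for this kind of preprocessing.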

Index Terms

Computing methodologies > Artificial intelligence > Computer vision > Computer vision tasks > Activity recognition and understanding

Published in
MM '17: Proceedings of the 25th ACM international conference on Multimedia
October 2017
2028 pages
ISBN:9781450349062
DOI:10.1145/3123266
General Chairs: Qiong Liu (FXPAL, USA), Rainer Lienhart (Universität Augsburg, Germany), Haohong Wang (TCL America, USA)
Program Chairs: Sheng-Wei "Kuan-Ta" Chen (Academia Sinica, Taiwan), Susanne Boll (University of Oldenburg, Germany), Phoebe Chen (La Trobe University, Australia), Gerald Friedland (Lawrence Livermore National Lab, USA), Jia Li (Google, USA), Shuicheng Yan (Qihoo 360, China)
Copyright © 2017 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]
Publisher
Association for Computing Machinery
New York, NY, United States
Author Tags
computer vision
crowd motion dynamics
deep learning
path forecasting
Qualifiers
- research-article
Acceptance Rates
MM '17 Paper Acceptance Rate: 189 of 684 submissions, 28%
Overall Acceptance Rate: 995 of 4,171 submissions, 24%