ABSTRACT
Background learning is a pre-processing step for motion detection, which is itself a basic step in video analysis. For static backgrounds, previous work already achieves good performance, but results on learning dynamic backgrounds still leave considerable room for improvement. To address this challenge, this paper proposes a novel and practical method based on deep auto-encoder networks. First, dynamic background images are extracted from video frames containing moving objects by a deep auto-encoder network (called the Background Extraction Network). Then, a dynamic background model is learned by a second deep auto-encoder network (called the Background Learning Network), which takes the extracted background images as input. For flexibility, the background model can be updated online to absorb more training samples. Our main contributions are: 1) a cascade of two deep auto-encoder networks that separates dynamic background from foreground very efficiently; and 2) an online learning method adopted to accelerate the training of the Background Extraction Network. Compared with previous algorithms, our approach obtains the best performance on six benchmark data sets. In particular, the experiments show that our algorithm handles backgrounds with large variations very well.
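The cascade described in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the abstract gives no architecture, loss, or training details, so everything below is an assumption made for illustration. Each network is reduced to a single hidden layer (the paper's networks are deep), the frames are synthetic, and the names `TinyAutoEncoder`, `make_frames`, `run_cascade`, and `foreground_mask` are hypothetical. The final thresholding step is likewise an assumed way of turning the learned background into a foreground mask.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


class TinyAutoEncoder:
    """One-hidden-layer auto-encoder (illustrative stand-in for a deep network)."""

    def __init__(self, n_visible, n_hidden, rng):
        self.W1 = rng.normal(0.0, 0.1, (n_visible, n_hidden))
        self.b1 = np.zeros(n_hidden)
        self.W2 = rng.normal(0.0, 0.1, (n_hidden, n_visible))
        self.b2 = np.zeros(n_visible)

    def reconstruct(self, x):
        h = sigmoid(x @ self.W1 + self.b1)
        return sigmoid(h @ self.W2 + self.b2)

    def train_step(self, x, lr=0.5):
        """One full-batch gradient step; returns the loss measured before the step."""
        n = x.shape[0]
        h = sigmoid(x @ self.W1 + self.b1)
        y = sigmoid(h @ self.W2 + self.b2)
        err = y - x
        # Loss: half the squared error summed over pixels, averaged over frames.
        loss = float(0.5 * np.mean(np.sum(err ** 2, axis=1)))
        d_out = err * y * (1.0 - y) / n
        d_hid = (d_out @ self.W2.T) * h * (1.0 - h)
        self.W2 -= lr * (h.T @ d_out)
        self.b2 -= lr * d_out.sum(axis=0)
        self.W1 -= lr * (x.T @ d_hid)
        self.b1 -= lr * d_hid.sum(axis=0)
        return loss


def make_frames(n_frames=40, side=8, seed=1):
    """Synthetic flattened frames: an oscillating background plus a bright strip."""
    rng = np.random.default_rng(seed)
    t = np.arange(n_frames)[:, None]
    p = np.arange(side * side)[None, :]
    frames = 0.4 + 0.1 * np.sin(0.5 * t + 0.3 * p)  # dynamic background in [0.3, 0.5]
    for i in range(n_frames):
        start = rng.integers(0, side * side - 4)
        frames[i, start:start + 4] = 1.0  # 4-pixel strip standing in for a moving object
    return frames


def run_cascade(frames, n_hidden=16, epochs=200, seed=0):
    rng = np.random.default_rng(seed)
    n_pixels = frames.shape[1]
    extraction_net = TinyAutoEncoder(n_pixels, n_hidden, rng)  # "Background Extraction Network" stand-in
    learning_net = TinyAutoEncoder(n_pixels, n_hidden, rng)    # "Background Learning Network" stand-in
    # Stage 1: train on raw frames that contain moving objects.
    losses_1 = [extraction_net.train_step(frames) for _ in range(epochs)]
    # Stage 2: train the second network on the first network's reconstructions.
    backgrounds = extraction_net.reconstruct(frames)
    losses_2 = [learning_net.train_step(backgrounds) for _ in range(epochs)]
    return extraction_net, learning_net, losses_1, losses_2


def foreground_mask(frames, extraction_net, learning_net, threshold=0.3):
    """Assumed post-processing: threshold the gap between frame and modeled background."""
    background = learning_net.reconstruct(extraction_net.reconstruct(frames))
    return np.abs(frames - background) > threshold


if __name__ == "__main__":
    frames = make_frames()
    ext, lrn, l1, l2 = run_cascade(frames)
    print("stage 1 loss:", l1[0], "->", l1[-1])
    print("stage 2 loss:", l2[0], "->", l2[-1])
    print("foreground pixels flagged:", int(foreground_mask(frames, ext, lrn).sum()))
```

In this sketch, an online update amounts to calling `train_step` again on newly arrived samples instead of retraining from scratch. Whether that matches the paper's online learning scheme cannot be determined from the abstract.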
Index Terms
- Dynamic Background Learning through Deep Auto-encoder Networks