DOI: 10.1145/2647868.2654914
research-article

Dynamic Background Learning through Deep Auto-encoder Networks

Published: 03 November 2014

ABSTRACT

Background learning is a pre-processing step for motion detection, which is itself a basic step of video analysis. For static backgrounds, many previous works have already achieved good performance. However, results on learning dynamic backgrounds still leave much room for improvement. To address this challenge, this paper proposes a novel and practical method based on deep auto-encoder networks. First, dynamic background images are extracted from video frames containing moving objects through a deep auto-encoder network (called the Background Extraction Network). Then, a dynamic background model is learned by another deep auto-encoder network (called the Background Learning Network), using the extracted background images as input. For flexibility, the background model can be updated online to absorb more training samples. Our main contributions are: 1) a cascade of two deep auto-encoder networks that separates dynamic background from foreground very efficiently; 2) an online learning method that accelerates the training of the Background Extraction Network. Compared with previous algorithms, our approach obtains the best performance on six benchmark data sets. In particular, the experiments show that our algorithm handles backgrounds with large variation very well.
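The cascade described in the abstract can be illustrated with a toy sketch. The abstract does not give the network architectures, loss functions, or update rules, so everything below is an assumption made for illustration: one-hidden-layer auto-encoders, a squared reconstruction error, the layer sizes, the learning rates, the threshold, and the synthetic one-dimensional "frames". The class `AutoEncoder` and the names `extract_net` and `learn_net` are hypothetical stand-ins, not the authors' implementation.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


class AutoEncoder:
    """One-hidden-layer auto-encoder (an assumed, simplified architecture)."""

    def __init__(self, n_in, n_hidden, rng):
        self.W1 = rng.normal(0.0, 0.1, (n_in, n_hidden))
        self.b1 = np.zeros(n_hidden)
        self.W2 = rng.normal(0.0, 0.1, (n_hidden, n_in))
        self.b2 = np.zeros(n_in)

    def reconstruct(self, x):
        h = sigmoid(x @ self.W1 + self.b1)
        return sigmoid(h @ self.W2 + self.b2)

    def step(self, x, lr=0.5):
        """One gradient step on a batch x of shape (n, n_in).

        Returns the mean squared reconstruction error measured before the update.
        """
        h = sigmoid(x @ self.W1 + self.b1)
        y = sigmoid(h @ self.W2 + self.b2)
        err = y - x
        loss = float(np.mean(err ** 2))
        # Back-propagated error terms (gradient up to a constant factor).
        d2 = err * y * (1.0 - y) / len(x)
        d1 = (d2 @ self.W2.T) * h * (1.0 - h)
        self.W2 -= lr * h.T @ d2
        self.b2 -= lr * d2.sum(axis=0)
        self.W1 -= lr * x.T @ d1
        self.b1 -= lr * d1.sum(axis=0)
        return loss


rng = np.random.default_rng(0)
n_pixels, n_frames = 16, 60

# Synthetic "dynamic background": a fixed intensity ramp plus a small ripple.
base = np.linspace(0.1, 0.9, n_pixels)
phase = np.arange(n_frames)[:, None] * 0.5
frames = base + 0.05 * np.sin(phase + np.arange(n_pixels))
# A bright 2-pixel "object" appears at a shifting position in every third frame.
for t in range(0, n_frames, 3):
    start = t % (n_pixels - 1)
    frames[t, start:start + 2] = 1.0
frames = np.clip(frames, 0.0, 1.0)

# Stage 1 (stand-in for the Background Extraction Network): trained on raw
# frames; the narrow bottleneck is assumed to favour the recurring background.
extract_net = AutoEncoder(n_pixels, 4, rng)
extract_losses = [extract_net.step(frames) for _ in range(300)]
backgrounds = extract_net.reconstruct(frames)

# Stage 2 (stand-in for the Background Learning Network): trained on the
# background estimates produced by stage 1.
learn_net = AutoEncoder(n_pixels, 4, rng)
learn_losses = [learn_net.step(backgrounds) for _ in range(300)]

# Online update: one extra gradient step when a new background estimate arrives.
learn_net.step(backgrounds[-1:], lr=0.1)

# Foreground: pixels that deviate from the modelled background (arbitrary threshold).
test_frame = frames[0]
modelled = learn_net.reconstruct(test_frame[None, :])[0]
mask = np.abs(test_frame - modelled) > 0.25
```

In this sketch the second network never sees raw frames, only the first network's output, which mirrors the cascade order given in the abstract. The single call to `step` on a new sample is one simple way to read "updated online"; the paper's actual online rule may differ.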


Published in

MM '14: Proceedings of the 22nd ACM International Conference on Multimedia
November 2014
1310 pages
ISBN: 9781450330633
DOI: 10.1145/2647868

Copyright © 2014 ACM


Publisher: Association for Computing Machinery, New York, NY, United States


Acceptance Rates

MM '14 paper acceptance rate: 55 of 286 submissions, 19%. Overall acceptance rate: 995 of 4,171 submissions, 24%.
