Published in: International Journal of Computer Vision 3/2016

1 May 2016

Learning from Multiple Sources for Video Summarisation

Authors: Xiatian Zhu, Chen Change Loy, Shaogang Gong


Abstract

Many visual surveillance tasks, e.g. video summarisation, are conventionally accomplished by analysing imagery-based features. Relying solely on visual cues for understanding public surveillance video is unreliable, since visual observations obtained from public space CCTV video data are often not sufficiently trustworthy and events of interest can be subtle. We believe that non-visual data sources such as weather reports and traffic sensory signals can be exploited to complement visual data for video content analysis and summarisation. In this paper, we present a novel unsupervised framework that learns jointly from both visual and independently drawn non-visual data sources to discover meaningful latent structure in surveillance video data. In particular, we investigate ways to cope with discrepant dimensions and representations whilst associating these heterogeneous data sources, and derive an effective mechanism to tolerate missing and incomplete data from different sources. We show that the proposed multi-source learning framework not only achieves better video content clustering than state-of-the-art methods, but is also capable of accurately inferring missing non-visual semantics from previously unseen videos. In addition, a comprehensive user study is conducted to validate the quality of the video summarisation generated using the proposed multi-source model.

Footnotes
1
Spatio-temporal combinations of human activity or interaction patterns, e.g. gathering, or environmental state changes, e.g. raining.
 
2
Also known as the heteroscedasticity problem (Duin and Loog 2004).
 
3
Conventional random forests can use missing-data filling algorithms: e.g. when one feature value is missing for a sample of one class, the median value (continuous) or the most frequent category (discrete) of this feature over that class can be used as the estimate (Breiman 2003). Whilst a similar strategy could be applied to our MSC-Forest, we instead propose an effective adaptive weighting algorithm, so as not to introduce further noisy training data.
 
4
It is worth noting that the purpose of this clustering step is completely different from that of the multi-source data clustering during model training, as presented in Sect. 3.3. The latter is a component of our multi-source model training pipeline (Fig. 2), whilst the former aims at revealing the latent structure of the testing data for video summarisation.
 
6
No vehicle detection on the ERCe dataset.
 
7
Evaluating a forest that takes only non-visual inputs is not possible, since non-visual data are not available for previously unseen video footage.
 
8
VNV-MSC-Forest-hard shares the same clusters as VNV-MSC-Forest.
 
9
The event of interest is analogous to the important objects/regions in Lee et al. (2012).
 
10
The inferred non-visual tags include weather, traffic conditions, and typicality. The typicality tag of each clip, i.e. 'usual' or 'interesting', is computed from the size of its assigned cluster (Fig. 4c). Clips assigned to the top \(20\,\%\) smallest clusters are treated as 'interesting'.
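The typicality rule in footnote 10 can be illustrated with a minimal Python sketch. The function name `typicality_tags`, the `fraction` parameter, the rounding rule (ceiling of 20% of the number of clusters) and the tie-breaking by cluster id are assumptions made for illustration; the source only states that clips in the top 20% smallest clusters are treated as 'interesting'.

```python
import math
from collections import Counter


def typicality_tags(cluster_ids, fraction=0.2):
    """Tag each clip 'interesting' if its cluster is among the smallest
    `fraction` of clusters, and 'usual' otherwise.

    cluster_ids: one cluster id per clip (ids must be sortable, e.g. ints).
    """
    sizes = Counter(cluster_ids)  # cluster id -> number of clips
    # Rank clusters from smallest to largest.
    # Assumption: ties in size are broken by cluster id.
    ranked = sorted(sizes, key=lambda c: (sizes[c], c))
    # Assumption: the number of 'small' clusters is rounded up.
    n_small = math.ceil(fraction * len(ranked))
    small = set(ranked[:n_small])
    return ["interesting" if c in small else "usual" for c in cluster_ids]


# Five clusters with 5, 4, 3, 2 and 1 clips respectively.
clips = [0] * 5 + [1] * 4 + [2] * 3 + [3] * 2 + [4] * 1
tags = typicality_tags(clips)
```

With five clusters, one cluster (20%) is selected, so only the single clip in the smallest cluster is tagged 'interesting' here.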
 
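As an illustration of the conventional missing-data filling strategy mentioned in footnote 3 (class-wise median for continuous features, most frequent category for discrete ones), the following is a minimal Python sketch. It is not the adaptive weighting algorithm of the MSC-Forest, which the footnote proposes as an alternative. The function name `impute_by_class`, the row-of-dicts data layout and the use of `None` for missing values are assumptions made for this example.

```python
from statistics import median, mode


def impute_by_class(rows, labels, feature, continuous=True):
    """Return a copy of `rows` in which missing (None) values of `feature`
    are filled with an estimate taken from samples of the same class."""
    filled = [dict(r) for r in rows]  # copy rows so the input stays untouched
    for cls in set(labels):
        idx = [i for i, y in enumerate(labels) if y == cls]
        observed = [rows[i][feature] for i in idx
                    if rows[i][feature] is not None]
        if not observed:
            continue  # no observed value in this class; leave the gaps
        # Median for continuous features, most frequent value for discrete.
        estimate = median(observed) if continuous else mode(observed)
        for i in idx:
            if filled[i][feature] is None:
                filled[i][feature] = estimate
    return filled


# Hypothetical toy data: a continuous feature with two gaps.
rows = [{"temp": 10.0}, {"temp": None}, {"temp": 14.0},
        {"temp": 30.0}, {"temp": None}]
labels = ["a", "a", "a", "b", "b"]
filled = impute_by_class(rows, labels, "temp")
```

Filling gaps this way adds estimated values to the training data, which is the source of noise that footnote 3 seeks to avoid.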
References
Boccaletti, S., Latora, V., Moreno, Y., Chavez, M., & Hwang, D. U. (2006). Complex networks: Structure and dynamics. Physics Reports (pp. 175–308).
Bosch, A., Zisserman, A., & Munoz, X. (2007). Image classification using random forests and ferns. In IEEE international conference on computer vision.
Breiman, L. (2003). Rf/tools: A class of two-eyed algorithms. In SIAM Workshop, Statistics Department, UC Berkeley.
Breiman, L., Friedman, J., Stone, C., & Olshen, R. (1984). Classification and regression trees. New York: Chapman & Hall/CRC.
Cai, X., Nie, F., Huang, H., & Kamangar, F. (2011). Heterogeneous image feature integration via multi-modal spectral clustering. In IEEE conference on computer vision and pattern recognition.
Caruana, R., Karampatziakis, N., & Yessenalina, A. (2008). An empirical evaluation of supervised learning in high dimensions. In International conference on machine learning.
Chan, A. B., & Vasconcelos, N. (2008). Modeling, clustering, and segmenting video with mixtures of dynamic textures. IEEE Transactions on Pattern Analysis and Machine Intelligence, 30, 909–926.
Chu, W. S., Song, Y., & Jaimes, A. (2015). Video co-summarization: Video summarization by visual co-occurrence. In IEEE conference on computer vision and pattern recognition (pp. 3584–3592).
Cong, Y., Yuan, J., & Luo, J. (2012). Towards scalable summarization of consumer videos via sparse dictionary selection. IEEE Transactions on Multimedia, 14(1), 66–75.
Criminisi, A., & Shotton, J. (2012). Decision forests: A unified framework. Foundations and Trends in Computer Graphics and Vision (pp. 81–227).
Duin, R., & Loog, M. (2004). Linear dimensionality reduction via a heteroscedastic extension of LDA: The Chernoff criterion. IEEE Transactions on Pattern Analysis and Machine Intelligence, 26, 732–739.
Felzenszwalb, P. F., Girshick, R. B., McAllester, D. A., & Ramanan, D. (2010). Object detection with discriminatively trained part-based models. IEEE Transactions on Pattern Analysis and Machine Intelligence, 32, 1627–1645.
Feng, S., Lei, Z., Yi, D., & Li, S. Z. (2012). Online content-aware video condensation. In IEEE conference on computer vision and pattern recognition.
Fu, Y., Hospedales, T., Xiang, T., & Gong, S. (2013). Learning multi-modal latent attributes. IEEE Transactions on Pattern Analysis and Machine Intelligence, 36, 303–316.
Gall, J., Yao, A., Razavi, N., Gool, L. J. V., & Lempitsky, V. S. (2011). Hough forests for object detection, tracking, and action recognition. IEEE Transactions on Pattern Analysis and Machine Intelligence (pp. 2188–2202).
Gong, S., Loy, C. C., & Xiang, T. (2011). Security and surveillance. Visual Analysis of Humans (pp. 455–472). Berlin: Springer.
Gong, Y. (2003). Summarizing audiovisual contents of a video program. EURASIP Journal on Advances in Signal Processing, 2003, 160–169.
Gygli, M., Grabner, H., & Van Gool, L. (2015). Video summarization by learning submodular mixtures of objectives. In IEEE conference on computer vision and pattern recognition (pp. 3090–3098).
Gygli, M., Grabner, H., Riemenschneider, H., & Van Gool, L. (2014). Creating summaries from user videos. In European conference on computer vision (pp. 505–520).
Heer, J., & Chi, E. H. (2001). Identification of web user traffic composition using multi-modal clustering and information scent. In Proceedings of the workshop on web mining, SIAM conference on data mining (pp. 51–58).
Hospedales, T. M., Li, J., Gong, S., & Xiang, T. (2011). Identifying rare and subtle behaviors: A weakly supervised joint topic model. IEEE Transactions on Pattern Analysis and Machine Intelligence, 33, 2451–2464.
Huang, H. C., Chuang, Y. Y., & Chen, C. S. (2012). Affinity aggregation for spectral clustering. In IEEE conference on computer vision and pattern recognition.
Jain, A. K. (2010). Data clustering: 50 years beyond k-means. Pattern Recognition Letters, 31(8), 651–666.
Kang, H., Chen, X., Matsushita, Y., & Tang, X. (2006). Space-time video montage. In IEEE conference on computer vision and pattern recognition.
Karydis, I., Nanopoulos, A., Gabriel, H. H., & Spiliopoulou, M. (2009). Tag-aware spectral clustering of music items. In The international society for music information retrieval (pp. 159–164).
Khalidov, V., Forbes, F., & Horaud, R. (2011). Conjugate mixture models for clustering multimodal data. Neural Computation, 23, 517–557.
Khosla, A., Hamid, R., Lin, C. J., & Sundaresan, N. (2013). Large-scale video summarization using web-image priors. In IEEE conference on computer vision and pattern recognition (pp. 2698–2705).
Kim, C., & Hwang, J. N. (2002). Object-based video abstraction for video surveillance systems. IEEE Transactions on Circuits and Systems for Video Technology, 12, 1128–1138.
Kim, G., Sigal, L., & Xing, E. P. (2014). Joint summarization of large-scale collections of web images and videos for storyline reconstruction. In IEEE conference on computer vision and pattern recognition (pp. 4225–4232).
Kratz, L., & Nishino, K. (2012). Going with the flow: Pedestrian efficiency in crowded scenes. In European conference on computer vision.
Lee, Y. J., Ghosh, J., & Grauman, K. (2012). Discovering important people and objects for egocentric video summarization. In IEEE conference on computer vision and pattern recognition.
Li, W., Mahadevan, V., & Vasconcelos, N. (2013). Anomaly detection and localization in crowded scenes. IEEE Transactions on Pattern Analysis and Machine Intelligence, 36, 18–32.
Liu, B., Xia, Y., & Yu, P. S. (2000). Clustering through decision tree construction. In Conference on information and knowledge management.
Loy, C. C., Xiang, T., & Gong, S. (2012). Incremental activity modeling in multiple disjoint cameras. IEEE Transactions on Pattern Analysis and Machine Intelligence, 34, 1799–1813.
Lu, Z., & Grauman, K. (2013a). Story-driven summarization for egocentric video. In IEEE conference on computer vision and pattern recognition.
Lu, Z., & Grauman, K. (2013b). Story-driven summarization for egocentric video. In IEEE conference on computer vision and pattern recognition (pp. 2714–2721).
Mairal, J., Bach, F., Ponce, J., & Sapiro, G. (2010). Online learning for matrix factorization and sparse coding. The Journal of Machine Learning Research, 11, 19–60.
Martin, J. K. (1997). An exact probability metric for decision tree splitting and stopping. Machine Learning, 28, 257–291.
Money, A. G., & Agius, H. (2008). Video summarisation: A conceptual framework and survey of the state of the art. Journal of Visual Communication and Image Representation, 19, 121–143.
Moosmann, F., Nowak, E., & Jurie, F. (2008). Randomized clustering forests for image classification. IEEE Transactions on Pattern Analysis and Machine Intelligence, 30, 1632–1646.
Ojala, T., Pietikainen, M., & Maenpaa, T. (2002). Multiresolution gray-scale and rotation invariant texture classification with local binary patterns. IEEE Transactions on Pattern Analysis and Machine Intelligence, 24, 971–987.
Oliva, A., & Torralba, A. (2001). Modeling the shape of the scene: A holistic representation of the spatial envelope. International Journal of Computer Vision, 42, 145–175.
Perbet, F., Stenger, B., & Maki, A. (2009). Random forest clustering and application to video segmentation. In British machine vision conference.
Potapov, D., Douze, M., Harchaoui, Z., & Schmid, C. (2014). Category-specific video summarization. In European conference on computer vision (pp. 540–555).
Pritch, Y., Rav-Acha, A., Gutman, A., & Peleg, S. (2007). Webcam synopsis: Peeking around the world. In IEEE international conference on computer vision.
Pritch, Y., Rav-Acha, A., & Peleg, S. (2008). Nonchronological video synopsis and indexing. IEEE Transactions on Pattern Analysis and Machine Intelligence, 30, 1971–1984.
Schulter, S., Leistner, C., Wohlhart, P., Roth, P. M., & Bischof, H. (2013a). Alternating regression forests for object detection and pose estimation. In IEEE international conference on computer vision.
Schulter, S., Wohlhart, P., Leistner, C., Saffari, A., Roth, P. M., & Bischof, H. (2013b). Alternating decision forests. In IEEE conference on computer vision and pattern recognition.
Shi, T., & Horvath, S. (2006). Unsupervised learning with random forest predictors. Journal of Computational and Graphical Statistics, 15, 118–138.
Shotton, J., Fitzgibbon, A., Cook, M., Sharp, T., Finocchio, M., Moore, R., Kipman, A., & Blake, A. (2011). Real-time human pose recognition in parts from single depth images. In IEEE conference on computer vision and pattern recognition.
Strehl, A., & Ghosh, J. (2003). Cluster ensembles: A knowledge reuse framework for combining multiple partitions. The Journal of Machine Learning Research, 3, 583–617.
Sun, M., Farhadi, A., & Seitz, S. (2014). Ranking domain-specific highlights by analyzing edited videos. In European conference on computer vision (pp. 787–802).
Taskiran, C., Pizlo, Z., Amir, A., Ponceleon, D., & Delp, E. (2006). Automated video program summarization using speech transcripts. IEEE Transactions on Multimedia, 8, 775–791.
Toderici, G., Aradhye, H., Pasca, M., Sbaiz, L., & Yagnik, J. (2010). Finding meaning on YouTube: Tag recommendation and category discovery. In IEEE conference on computer vision and pattern recognition.
Topchy, A., Jain, A. K., & Punch, W. (2005). Clustering ensembles: Models of consensus and weak partitions. IEEE Transactions on Pattern Analysis and Machine Intelligence, 27(12), 1866–1881.
Truong, B. T., & Venkatesh, S. (2007). Video abstraction: A systematic review and classification. ACM Transactions on Multimedia Computing, Communications, and Applications.
Wagstaff, K., Cardie, C., Rogers, S., Schrödl, S., et al. (2001). Constrained k-means clustering with background knowledge. International Conference on Machine Learning, 1, 577–584.
Wang, M., Hong, R., Li, G., Zha, Z., Yan, Z. J., Yan, S., et al. (2012). Event driven web video summarization by tag localization and key-shot identification. IEEE Transactions on Multimedia, 14, 975–985.
Wang, X., Ma, X., & Grimson, W. E. L. (2009). Unsupervised activity perception in crowded and complicated scenes using hierarchical Bayesian models. IEEE Transactions on Pattern Analysis and Machine Intelligence, 31, 539–555.
Wang, Z., Zhao, M., Song, Y., Kumar, S., & Li, B. (2010). YouTubeCat: Learning to categorize wild web videos. In IEEE conference on computer vision and pattern recognition.
Wolf, W. (1996). Keyframe selection by motion analysis. In IEEE international conference on acoustics, speech, and signal processing.
Wu, S., Moore, B. E., & Shah, M. (2010). Chaotic invariants of Lagrangian particle trajectories for anomaly detection in crowded scenes. In IEEE conference on computer vision and pattern recognition (pp. 2054–2060).
Xing, E. P., Jordan, M. I., Russell, S., & Ng, A. Y. (2002). Distance metric learning with application to clustering with side-information. In Advances in neural information processing systems (pp. 505–512).
Zelnik-Manor, L., & Perona, P. (2004). Self-tuning spectral clustering. In Advances in neural information processing systems.
Zhang, D. Q., Lin, C. Y., Chang, S. F., & Smith, J. R. (2004). Semantic video clustering across sources using bipartite spectral clustering. In IEEE international conference on multimedia and expo.
Zhang, H., Wu, J., Zhong, D., & Smoliar, S. W. (1997). An integrated system for content-based video retrieval and browsing. Pattern Recognition, 30, 643–658.
Zhao, B., & Xing, E. P. (2014). Quasi real-time summarization for consumer videos. In IEEE conference on computer vision and pattern recognition (pp. 2513–2520).
Zhao, Y., & Karypis, G. (2004). Empirical and theoretical comparisons of selected criterion functions for document clustering. Machine Learning (pp. 311–331).
Zhu, X., Loy, C. C., & Gong, S. (2014). Constructing robust affinity graphs for spectral clustering. In Proceedings of the 27th IEEE conference on computer vision and pattern recognition (pp. 1450–1457).
Metadata
Title
Learning from Multiple Sources for Video Summarisation
Authors
Xiatian Zhu
Chen Change Loy
Shaogang Gong
Publication date
1 May 2016
Publisher
Springer US
Published in
International Journal of Computer Vision / Issue 3/2016
Print ISSN: 0920-5691
Electronic ISSN: 1573-1405
DOI
https://doi.org/10.1007/s11263-015-0864-3
