Skip to main content
Erschienen in: Multimedia Systems 5/2013

01.10.2013 | Regular Paper

State-of-the-art and future challenges in video scene detection: a survey

verfasst von: Manfred Del Fabro, Laszlo Böszörmenyi

Erschienen in: Multimedia Systems | Ausgabe 5/2013

Einloggen

Aktivieren Sie unsere intelligente Suche, um passende Fachinhalte oder Patente zu finden.

search-config
loading …

Abstract

In the last 15 years much effort has been made in the field of segmentation of videos into scenes. We give a comprehensive overview of the published approaches and classify them into seven groups based on three basic classes of low-level features used for the segmentation process: (1) visual-based, (2) audio-based, (3) text-based, (4) audio-visual-based, (5) visual-textual-based, (6) audio-textual-based and (7) hybrid approaches. We try to make video scene detection approaches better assessable and comparable by making a categorization of the evaluation strategies used. This includes size and type of the dataset used as well as the evaluation metrics. Furthermore, in order to let the reader make use of the survey, we list eight possible application scenarios, including an own section for interactive video scene segmentation, and identify those algorithms that can be applied to them. At the end, current challenges for scene segmentation algorithms are discussed. In the appendix the most important characteristics of the algorithms presented in this paper are summarized in table form.

Sie haben noch keine Lizenz? Dann Informieren Sie sich jetzt über unsere Produkte:

Springer Professional "Wirtschaft+Technik"

Online-Abonnement

Mit Springer Professional "Wirtschaft+Technik" erhalten Sie Zugriff auf:

  • über 102.000 Bücher
  • über 537 Zeitschriften

aus folgenden Fachgebieten:

  • Automobil + Motoren
  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Elektrotechnik + Elektronik
  • Energie + Nachhaltigkeit
  • Finance + Banking
  • Management + Führung
  • Marketing + Vertrieb
  • Maschinenbau + Werkstoffe
  • Versicherung + Risiko

Jetzt Wissensvorsprung sichern!

Springer Professional "Technik"

Online-Abonnement

Mit Springer Professional "Technik" erhalten Sie Zugriff auf:

  • über 67.000 Bücher
  • über 390 Zeitschriften

aus folgenden Fachgebieten:

  • Automobil + Motoren
  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Elektrotechnik + Elektronik
  • Energie + Nachhaltigkeit
  • Maschinenbau + Werkstoffe




 

Jetzt Wissensvorsprung sichern!

Springer Professional "Wirtschaft"

Online-Abonnement

Mit Springer Professional "Wirtschaft" erhalten Sie Zugriff auf:

  • über 67.000 Bücher
  • über 340 Zeitschriften

aus folgenden Fachgebieten:

  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Finance + Banking
  • Management + Führung
  • Marketing + Vertrieb
  • Versicherung + Risiko




Jetzt Wissensvorsprung sichern!

Anhänge
Nur mit Berechtigung zugänglich
Fußnoten
5
If an approach has been evaluated with multiple video types, it is counted once for each corresponding genre. For the total number of approaches, such approaches are counted multiple times, once for each type of video. Therefore, the sum of all percentages in the chart in Fig. 12 is 100 %.
 
Literatur
1.
Zurück zum Zitat Adams, B., Dorai, C., Venkatesh, S.: Toward automatic extraction of expressive elements from motion pictures: tempo. IEEE Trans. Multimed. 4(4), 472–481 (2002)CrossRef Adams, B., Dorai, C., Venkatesh, S.: Toward automatic extraction of expressive elements from motion pictures: tempo. IEEE Trans. Multimed. 4(4), 472–481 (2002)CrossRef
2.
Zurück zum Zitat Aner, A., Kender, J.: Video Summaries through mosaic-based shot and scene clustering. In: Heyden, A., Sparr, G., Nielsen, M., Johansen P. (eds.) Computer Vision ECCV 2002, Lecture Notes in Computer Science, vol. 2353, Chap. 26, pp. 45–49. Springer, Berlin (2006) Aner, A., Kender, J.: Video Summaries through mosaic-based shot and scene clustering. In: Heyden, A., Sparr, G., Nielsen, M., Johansen P. (eds.) Computer Vision ECCV 2002, Lecture Notes in Computer Science, vol. 2353, Chap. 26, pp. 45–49. Springer, Berlin (2006)
3.
Zurück zum Zitat Arifin, S., Cheung, P.Y.K.: Affective level video segmentation by utilizing the Pleasure-Arousal-dominance information. IEEE Trans. Multimed. 10(7), 1325–1341 (2008)CrossRef Arifin, S., Cheung, P.Y.K.: Affective level video segmentation by utilizing the Pleasure-Arousal-dominance information. IEEE Trans. Multimed. 10(7), 1325–1341 (2008)CrossRef
4.
Zurück zum Zitat Ariki, Y., Kumano, M., Tsukada, K.: Highlight scene extraction in real time from baseball live video. In: Proceedings of the 5th ACM SIGMM International Workshop on Multimedia Information Retrieval, MIR ’03, pp. 209–214. ACM, New York, NY, USA (2003) Ariki, Y., Kumano, M., Tsukada, K.: Highlight scene extraction in real time from baseball live video. In: Proceedings of the 5th ACM SIGMM International Workshop on Multimedia Information Retrieval, MIR ’03, pp. 209–214. ACM, New York, NY, USA (2003)
5.
Zurück zum Zitat Benini, S., Xu, L.Q., Leonardi, R.: Identifying video content consistency by vector quantization. In: Proceedings of the 2005 International Workshop on Image Analysis for Multimedia Interactive Services (WIAMIS 2005) (2005) Benini, S., Xu, L.Q., Leonardi, R.: Identifying video content consistency by vector quantization. In: Proceedings of the 2005 International Workshop on Image Analysis for Multimedia Interactive Services (WIAMIS 2005) (2005)
6.
Zurück zum Zitat Bredin, H.: Segmentation of tv shows into scenes using speaker diarization and speech recognition. In: IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2012, pp. 2377–2380 (2012) Bredin, H.: Segmentation of tv shows into scenes using speaker diarization and speech recognition. In: IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2012, pp. 2377–2380 (2012)
7.
Zurück zum Zitat Cao, J.R.: Algorithm of scene segmentation based on svm for scenery documentary. In: Third International Conference on Natural Computation, 2007 (ICNC 2007), vol. 3, pp. 95–98 (2007) Cao, J.R.: Algorithm of scene segmentation based on svm for scenery documentary. In: Third International Conference on Natural Computation, 2007 (ICNC 2007), vol. 3, pp. 95–98 (2007)
8.
Zurück zum Zitat Chaisorn, L., Chua, T.S., Lee, C.H.: The segmentation of news video into story units. In: IEEE International Conference on Multimedia and Expo, 2002. ICME ’02, 2002, vol. 1, pp. 73–76 (2002) Chaisorn, L., Chua, T.S., Lee, C.H.: The segmentation of news video into story units. In: IEEE International Conference on Multimedia and Expo, 2002. ICME ’02, 2002, vol. 1, pp. 73–76 (2002)
9.
Zurück zum Zitat Chasanis, V.T., Likas, A.C., Galatsanos, N.P.: Scene detection in videos using shot clustering and sequence alignment. IEEE Trans. Multimed. 11(1), 89–100 (2009)CrossRef Chasanis, V.T., Likas, A.C., Galatsanos, N.P.: Scene detection in videos using shot clustering and sequence alignment. IEEE Trans. Multimed. 11(1), 89–100 (2009)CrossRef
10.
Zurück zum Zitat Chen, L., Ozsu, M.: Rule-based scene extraction from video. In: Proceedings of 2002 International Conference on Image Processing (2002) Chen, L., Ozsu, M.: Rule-based scene extraction from video. In: Proceedings of 2002 International Conference on Image Processing (2002)
11.
Zurück zum Zitat Chen, L.H., Lai, Y.C., Mark Liao, H.Y.: Movie scene segmentation using background information. Pattern Recognit. 41, 1056–1065 (2008)CrossRefMATH Chen, L.H., Lai, Y.C., Mark Liao, H.Y.: Movie scene segmentation using background information. Pattern Recognit. 41, 1056–1065 (2008)CrossRefMATH
12.
Zurück zum Zitat Chen, S.C., Shyu, M.L., Liao, W., Zhang, C.: Scene change detection by audio and video clues, pp. 365–368 Chen, S.C., Shyu, M.L., Liao, W., Zhang, C.: Scene change detection by audio and video clues, pp. 365–368
13.
Zurück zum Zitat Cheng, W., Lu, J.: Video scene oversegmentation reduction by tempo analysis. In: Fourth International Conference on Natural Computation, 2008 (ICNC ’08), vol. 4, pp. 296–300 (2008) Cheng, W., Lu, J.: Video scene oversegmentation reduction by tempo analysis. In: Fourth International Conference on Natural Computation, 2008 (ICNC ’08), vol. 4, pp. 296–300 (2008)
14.
Zurück zum Zitat Chu, W.T., Li, C.J., Tseng, S.C.: Travelmedia: an intelligent management system for media captured in travel. J. Vis. Commun. Image Represent. 22(1), 93–104 (2011)CrossRef Chu, W.T., Li, C.J., Tseng, S.C.: Travelmedia: an intelligent management system for media captured in travel. J. Vis. Commun. Image Represent. 22(1), 93–104 (2011)CrossRef
15.
Zurück zum Zitat Chu, W.T., Lin, C.C., Yu, J.Y.: Using cross-media correlation for scene detection in travel videos. In: Proceedings of the ACM International Conference on Image and Video Retrieval, CIVR ’09. ACM, New York, NY, USA (2009) Chu, W.T., Lin, C.C., Yu, J.Y.: Using cross-media correlation for scene detection in travel videos. In: Proceedings of the ACM International Conference on Image and Video Retrieval, CIVR ’09. ACM, New York, NY, USA (2009)
16.
Zurück zum Zitat Cour, T., Jordan, C., Miltsakaki, E., Taskar, B.: Movie/script: alignment and parsing of video and text transcription. In: Forsyth, D., Torr, P., Zisserman, A. (eds.) Computer Vision ECCV 2008, Lecture Notes in Computer Science, vol. 5305, Chap. 12, pp. 158–171. Springer, Berlin (2008) Cour, T., Jordan, C., Miltsakaki, E., Taskar, B.: Movie/script: alignment and parsing of video and text transcription. In: Forsyth, D., Torr, P., Zisserman, A. (eds.) Computer Vision ECCV 2008, Lecture Notes in Computer Science, vol. 5305, Chap. 12, pp. 158–171. Springer, Berlin (2008)
17.
Zurück zum Zitat Del Fabro, M., Böszörmenyi, L.: Video scene detection based on recurring motion patterns. In: Second International Conferences on Advances in Multimedia (MMEDIA), pp. 113–118 (2010) Del Fabro, M., Böszörmenyi, L.: Video scene detection based on recurring motion patterns. In: Second International Conferences on Advances in Multimedia (MMEDIA), pp. 113–118 (2010)
18.
Zurück zum Zitat Del Fabro, M., Böszörmenyi, L.: Summarization and presentation of real-life events using community-contributed content. In: Schoeffmann, K., Merialdo, B., Hauptmann, A., Ngo, C.W., Andreopoulos, Y., Breiteneder, C. (eds.) Advances in Multimedia Modeling, Lecture Notes in Computer Science, vol. 7131, pp. 630–632. Springer, Berlin (2012) Del Fabro, M., Böszörmenyi, L.: Summarization and presentation of real-life events using community-contributed content. In: Schoeffmann, K., Merialdo, B., Hauptmann, A., Ngo, C.W., Andreopoulos, Y., Breiteneder, C. (eds.) Advances in Multimedia Modeling, Lecture Notes in Computer Science, vol. 7131, pp. 630–632. Springer, Berlin (2012)
19.
Zurück zum Zitat Del Fabro, M., Sobe, A., Böszörmenyi, L.: Summarization of real-life events based on community-contributed content. In: The Fourth International Conferences on Advances in Multimedia, pp. 119–126. IARIA (2012) Del Fabro, M., Sobe, A., Böszörmenyi, L.: Summarization of real-life events based on community-contributed content. In: The Fourth International Conferences on Advances in Multimedia, pp. 119–126. IARIA (2012)
20.
Zurück zum Zitat Ellouze, M., Boujemaa, N., Alimi, A.: Scene pathfinder: unsupervised clustering techniques for movie scenes extraction. Multimed. Tools Appl. 47(2), 325–346 (2010)CrossRef Ellouze, M., Boujemaa, N., Alimi, A.: Scene pathfinder: unsupervised clustering techniques for movie scenes extraction. Multimed. Tools Appl. 47(2), 325–346 (2010)CrossRef
21.
Zurück zum Zitat Ercolessi, P., Bredin, H., Sénac, C., Joly, P.: Segmenting TV series into scenes using speaker diarization. In: WIAMIS 2011: 12th International Workshop on Image Analysis for Multimedia Interactive Services. Delft, The Netherlands (2011) Ercolessi, P., Bredin, H., Sénac, C., Joly, P.: Segmenting TV series into scenes using speaker diarization. In: WIAMIS 2011: 12th International Workshop on Image Analysis for Multimedia Interactive Services. Delft, The Netherlands (2011)
22.
Zurück zum Zitat Friedland, G., Gottlieb, L., Janin, A.: Joke-o-mat: browsing sitcoms punchline by punchline. In: Proceedings of the Seventeen ACM International Conference on Multimedia, MM ’09, pp. 1115–1116. ACM, New York, NY, USA (2009) Friedland, G., Gottlieb, L., Janin, A.: Joke-o-mat: browsing sitcoms punchline by punchline. In: Proceedings of the Seventeen ACM International Conference on Multimedia, MM ’09, pp. 1115–1116. ACM, New York, NY, USA (2009)
23.
Zurück zum Zitat Gatica-Perez, D., Loui, A., Sun, M.T.: Finding structure in home videos by probabilistic hierarchical clustering. IEEE Trans. Circuits Syst. Video Technol. 13(6), 539– 548 (2003)CrossRef Gatica-Perez, D., Loui, A., Sun, M.T.: Finding structure in home videos by probabilistic hierarchical clustering. IEEE Trans. Circuits Syst. Video Technol. 13(6), 539– 548 (2003)CrossRef
24.
Zurück zum Zitat Goela, N., Wilson, K., Niu, F., Divakaran, A., Otsuka, I.: An SVM framework for Genre-Independent scene change detection. In: IEEE International Conference on Multimedia and Expo, pp. 532–535 (2007) Goela, N., Wilson, K., Niu, F., Divakaran, A., Otsuka, I.: An SVM framework for Genre-Independent scene change detection. In: IEEE International Conference on Multimedia and Expo, pp. 532–535 (2007)
25.
Zurück zum Zitat Gu, Z., Mei, T., Hua, X.S., Wu, X., Li, S.: EMS: Energy Minimization Based Video Scene Segmentation. In: IEEE International Conference on Multimedia and Expo, pp. 520–523 (2007) Gu, Z., Mei, T., Hua, X.S., Wu, X., Li, S.: EMS: Energy Minimization Based Video Scene Segmentation. In: IEEE International Conference on Multimedia and Expo, pp. 520–523 (2007)
26.
Zurück zum Zitat Han, B., Wu, W.: Video scene segmentation using a novel boundary evaluation criterion and dynamic programming. In: IEEE International Conference on Multimedia and Expo (ICME), 2011, pp. 1–6 (2011) Han, B., Wu, W.: Video scene segmentation using a novel boundary evaluation criterion and dynamic programming. In: IEEE International Conference on Multimedia and Expo (ICME), 2011, pp. 1–6 (2011)
27.
Zurück zum Zitat Hanjalic, A., Lagendijk, R.L., Biemond, J.: Automated high-level movie segmentation for advanced video-retrieval systems. IEEE Trans. Circuits Syst. Video Technol. 9(4), 580–588 (1999)CrossRef Hanjalic, A., Lagendijk, R.L., Biemond, J.: Automated high-level movie segmentation for advanced video-retrieval systems. IEEE Trans. Circuits Syst. Video Technol. 9(4), 580–588 (1999)CrossRef
28.
Zurück zum Zitat Hauptmann, A., Witbrock, M.: Story segmentation and detection of commercials in broadcast news video. In: Proceedings. IEEE International Forum on Research and Technology Advances in Digital Libraries, 1998. ADL 98, pp. 168–179 (1998) Hauptmann, A., Witbrock, M.: Story segmentation and detection of commercials in broadcast news video. In: Proceedings. IEEE International Forum on Research and Technology Advances in Digital Libraries, 1998. ADL 98, pp. 168–179 (1998)
29.
Zurück zum Zitat Hsu, W.H.M., Chang, S.F.: Generative, discriminative, and ensemble learning on multi-modal perceptual fusion toward news video story segmentation. In: IEEE International Conference on Multimedia and Expo, 2004. ICME ’04, vol. 2, pp. 1091–1094 (2004) Hsu, W.H.M., Chang, S.F.: Generative, discriminative, and ensemble learning on multi-modal perceptual fusion toward news video story segmentation. In: IEEE International Conference on Multimedia and Expo, 2004. ICME ’04, vol. 2, pp. 1091–1094 (2004)
30.
Zurück zum Zitat Huang, J., Liu, Z., Wang, Y.: Joint scene classification and segmentation based on hidden markov model. IEEE Trans. Multimed. 7(3), 538–550 (2005)CrossRef Huang, J., Liu, Z., Wang, Y.: Joint scene classification and segmentation based on hidden markov model. IEEE Trans. Multimed. 7(3), 538–550 (2005)CrossRef
31.
Zurück zum Zitat Huang, J., Liu, Z., Yao, W.: Integration of audio and visual information for content-based video segmentation. In: International Conference on Image Processing, ICIP 98, vol. 3, pp. 526–529 (1998) Huang, J., Liu, Z., Yao, W.: Integration of audio and visual information for content-based video segmentation. In: International Conference on Image Processing, ICIP 98, vol. 3, pp. 526–529 (1998)
32.
Zurück zum Zitat Janin, A., Gottlieb, L., Friedland, G.: Joke-o-Mat HD: browsing sitcoms with human derived transcripts. In: Proceedings of the International Conference on Multimedia, MM ’10, pp. 1591–1594. ACM, New York, NY, USA (2010) Janin, A., Gottlieb, L., Friedland, G.: Joke-o-Mat HD: browsing sitcoms with human derived transcripts. In: Proceedings of the International Conference on Multimedia, MM ’10, pp. 1591–1594. ACM, New York, NY, USA (2010)
33.
Zurück zum Zitat Javed, O., Rasheed, Z., Shah, M.: A framework for segmentation of talk and game shows. In: Eighth IEEE International Conference on Computer Vision, ICCV 2001, (2001) Javed, O., Rasheed, Z., Shah, M.: A framework for segmentation of talk and game shows. In: Eighth IEEE International Conference on Computer Vision, ICCV 2001, (2001)
35.
Zurück zum Zitat Kender, J., Yeo, B.L.: Video scene segmentation via continuous video coherence. In: Proceedings of IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pp. 367–373 (1998) Kender, J., Yeo, B.L.: Video scene segmentation via continuous video coherence. In: Proceedings of IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pp. 367–373 (1998)
36.
37.
Zurück zum Zitat Kwon, Y.M., Song, C.J., Kim, I.J.: A new approach for high level video structuring. In: IEEE International Conference on Multimedia and Expo, ICME 2000. (2000) Kwon, Y.M., Song, C.J., Kim, I.J.: A new approach for high level video structuring. In: IEEE International Conference on Multimedia and Expo, ICME 2000. (2000)
38.
Zurück zum Zitat Kyperountas, M., Kotropoulos, C., Pitas, I.: Enhanced Eigen-Audioframes for audiovisual scene change detection. IEEE Trans. Multimed. 9(4), 785–797 (2007)CrossRef Kyperountas, M., Kotropoulos, C., Pitas, I.: Enhanced Eigen-Audioframes for audiovisual scene change detection. IEEE Trans. Multimed. 9(4), 785–797 (2007)CrossRef
39.
Zurück zum Zitat Liang, C., Zhang, Y., Cheng, J., Xu, C., Lu, H.: A novel role-based movie scene segmentation method. In: Muneesawang, P., Wu, F., Kumazawa, I., Roeksabutr, A., Liao, M., Tang, X. (eds.) Advances in Multimedia Information Processing—PCM 2009, Lecture Notes in Computer Science, vol. 5879, Chap. 82, pp. 917–922. Springer, Berlin (2009) Liang, C., Zhang, Y., Cheng, J., Xu, C., Lu, H.: A novel role-based movie scene segmentation method. In: Muneesawang, P., Wu, F., Kumazawa, I., Roeksabutr, A., Liao, M., Tang, X. (eds.) Advances in Multimedia Information Processing—PCM 2009, Lecture Notes in Computer Science, vol. 5879, Chap. 82, pp. 917–922. Springer, Berlin (2009)
40.
Zurück zum Zitat Lienbart, R., Pfeiffer, S., Effelsberg, W.: Scene determination based on video and audio features. In: IEEE International Conference on Multimedia Computing and Systems, vol. 1, pp. 685–690 (1999) Lienbart, R., Pfeiffer, S., Effelsberg, W.: Scene determination based on video and audio features. In: IEEE International Conference on Multimedia Computing and Systems, vol. 1, pp. 685–690 (1999)
41.
Zurück zum Zitat Lin, T., Zhang, H.J., Shi, Q.Y.: Video scene extraction by force competition. In: IEEE International Conference on Multimedia and Expo, p. 192 (2001) Lin, T., Zhang, H.J., Shi, Q.Y.: Video scene extraction by force competition. In: IEEE International Conference on Multimedia and Expo, p. 192 (2001)
42.
Zurück zum Zitat Liu, C., Huang, Q., Jiang, S., Xing, L., Ye, Q., Gao, W.: A framework for flexible summarization of racquet sports video using multiple modalities. Comput. Vis. Image Underst. 113(3), 415–424 (2009)CrossRef Liu, C., Huang, Q., Jiang, S., Xing, L., Ye, Q., Gao, W.: A framework for flexible summarization of racquet sports video using multiple modalities. Comput. Vis. Image Underst. 113(3), 415–424 (2009)CrossRef
43.
Zurück zum Zitat Lu, L., Cai, R., Hanjalic, A.: Audio elements based auditory scene segmentation. In: IEEE International Conference on Acoustics, Speech and Signal Processing, 2006. ICASSP 2006 Proceedings, vol. 5, p. V (2006) Lu, L., Cai, R., Hanjalic, A.: Audio elements based auditory scene segmentation. In: IEEE International Conference on Acoustics, Speech and Signal Processing, 2006. ICASSP 2006 Proceedings, vol. 5, p. V (2006)
44.
Zurück zum Zitat Lu, L., Zhang, H.J., Jiang, H.: Content analysis for audio classification and segmentation. IEEE Trans. Speech Audio Process. 10(7), 504–516 (2002)CrossRef Lu, L., Zhang, H.J., Jiang, H.: Content analysis for audio classification and segmentation. IEEE Trans. Speech Audio Process. 10(7), 504–516 (2002)CrossRef
45.
Zurück zum Zitat Mitrović, D., Hartlieb, S., Zeppelzauer, M., Zaharieva, M.: Scene segmentation in artistic archive documentaries. In: Leitner, G., Hitz, M., Holzinger, A. (eds.) HCI in Work and Learning, Life and Leisure, Lecture Notes in Computer Science, vol. 6389, Chap. 27, pp. 400–410. Springer, Berlin (2010) Mitrović, D., Hartlieb, S., Zeppelzauer, M., Zaharieva, M.: Scene segmentation in artistic archive documentaries. In: Leitner, G., Hitz, M., Holzinger, A. (eds.) HCI in Work and Learning, Life and Leisure, Lecture Notes in Computer Science, vol. 6389, Chap. 27, pp. 400–410. Springer, Berlin (2010)
46.
Zurück zum Zitat Monaco, J.: How to Read a Film: The World of Movies, Media, Multimedia: Language, History, Theory, 3 edn. Oxford University Press, USA (2000) Monaco, J.: How to Read a Film: The World of Movies, Media, Multimedia: Language, History, Theory, 3 edn. Oxford University Press, USA (2000)
47.
Zurück zum Zitat Ngo, C.W., Ma, Y.F., Zhang, H.J.: Video summarization and scene detection by graph modeling. IEEE Trans. Circuits Syst. Video Technol. 15(2), 296–305 (2005)CrossRef Ngo, C.W., Ma, Y.F., Zhang, H.J.: Video summarization and scene detection by graph modeling. IEEE Trans. Circuits Syst. Video Technol. 15(2), 296–305 (2005)CrossRef
48.
Zurück zum Zitat Ngo, C.W., Pong, T.C., Zhang, H.J.: Motion-based video representation for scene change detection. Int. J. Comput. Vis. 50(2), 127–142 (2002)CrossRefMATH Ngo, C.W., Pong, T.C., Zhang, H.J.: Motion-based video representation for scene change detection. Int. J. Comput. Vis. 50(2), 127–142 (2002)CrossRefMATH
49.
Zurück zum Zitat Nitanda, N., Haseyama, M., Kitajima, H.: Audio signal segmentation and classification for scene-cut detection. In: IEEE International Symposium on Circuits and Systems, 2005. ISCAS 2005, Vol. 4, pp. 4030– 4033 (2005) Nitanda, N., Haseyama, M., Kitajima, H.: Audio signal segmentation and classification for scene-cut detection. In: IEEE International Symposium on Circuits and Systems, 2005. ISCAS 2005, Vol. 4, pp. 4030– 4033 (2005)
50.
Zurück zum Zitat Niu, F., Goela, N., Divakaran, A., Abdel-Mottaleb, M.: Audio scene segmentation for video with generic content. In: Society of Photo-Optical Instrumentation Engineers (SPIE) Conference Series. Presented at the Society of Photo-Optical Instrumentation Engineers (SPIE) Conference, vol. 6820 (2008) Niu, F., Goela, N., Divakaran, A., Abdel-Mottaleb, M.: Audio scene segmentation for video with generic content. In: Society of Photo-Optical Instrumentation Engineers (SPIE) Conference Series. Presented at the Society of Photo-Optical Instrumentation Engineers (SPIE) Conference, vol. 6820 (2008)
51.
Zurück zum Zitat Odobez, J.M., Gatica-Perez, D., Guillemot, M.: Spectral structuring of home videos. In: Bakker, E., Lew, M., Huang, T., Sebe, N., Zhou, X. (eds.) Image and Video Retrieval, Lecture Notes in Computer Science, vol. 2728, Chap. 31, pp. 85–90. Springer, Berlin (2003) Odobez, J.M., Gatica-Perez, D., Guillemot, M.: Spectral structuring of home videos. In: Bakker, E., Lew, M., Huang, T., Sebe, N., Zhou, X. (eds.) Image and Video Retrieval, Lecture Notes in Computer Science, vol. 2728, Chap. 31, pp. 85–90. Springer, Berlin (2003)
52.
Zurück zum Zitat Over, P., Awad, G., Fiscus, J., Antonishek, B., Michel, M., Smeaton, A.F., Kraaij, W., Quenot, G.: Trecvid 2010—an overview of the goals, tasks, data, evaluation mechanisms and metrics. In: Proceedings of TRECVID 2010. NIST, USA (2010) Over, P., Awad, G., Fiscus, J., Antonishek, B., Michel, M., Smeaton, A.F., Kraaij, W., Quenot, G.: Trecvid 2010—an overview of the goals, tasks, data, evaluation mechanisms and metrics. In: Proceedings of TRECVID 2010. NIST, USA (2010)
53.
Zurück zum Zitat Parshin, V., Paradzinets, A., Chen, L.: Multimodal data fusion for video scene segmentation. In: Bres, S., Laurini, R. (eds.) Visual Information and Information Systems, Lecture Notes in Computer Science, vol. 3736, pp. 279–289. Springer, Berlin (2006) Parshin, V., Paradzinets, A., Chen, L.: Multimodal data fusion for video scene segmentation. In: Bres, S., Laurini, R. (eds.) Visual Information and Information Systems, Lecture Notes in Computer Science, vol. 3736, pp. 279–289. Springer, Berlin (2006)
54.
Zurück zum Zitat Petersohn, C.: Temporal video structuring for preservation and annotation of video content. In: 16th IEEE International Conference on Image Processing (ICIP), 2009, pp. 93–96 (2009) Petersohn, C.: Temporal video structuring for preservation and annotation of video content. In: 16th IEEE International Conference on Image Processing (ICIP), 2009, pp. 93–96 (2009)
55.
Zurück zum Zitat Poulisse, G., Moens, M.: Unsupervised scene detection in olympic video using multi-modal chains. In: 9th International Workshop on Content-Based Multimedia Indexing (CBMI), 2011, pp. 103–108 (2011) Poulisse, G., Moens, M.: Unsupervised scene detection in olympic video using multi-modal chains. In: 9th International Workshop on Content-Based Multimedia Indexing (CBMI), 2011, pp. 103–108 (2011)
56.
Zurück zum Zitat Rasheed, Z., Shah, M.: Scene Detection in Hollywood Movies and TV Shows. IEEE Computer Society, Los Alamitos, CA, USA, p. 343 (2003) Rasheed, Z., Shah, M.: Scene Detection in Hollywood Movies and TV Shows. IEEE Computer Society, Los Alamitos, CA, USA, p. 343 (2003)
57.
Zurück zum Zitat Rasheed, Z., Shah, M.: Detection and representation of scenes in videos. IEEE Trans. Multimed. 7(6), 1097–1105 (2005)CrossRef Rasheed, Z., Shah, M.: Detection and representation of scenes in videos. IEEE Trans. Multimed. 7(6), 1097–1105 (2005)CrossRef
58.
Zurück zum Zitat Rui, Y., Huang, T.S., Mehrotra, S.: Constructing table-of-content for videos. Multimed. Syst. 7(5), 359–368 (1999)CrossRef Rui, Y., Huang, T.S., Mehrotra, S.: Constructing table-of-content for videos. Multimed. Syst. 7(5), 359–368 (1999)CrossRef
59.
Zurück zum Zitat Sakarya, U., Telatar, Z.: Graph-based multilevel temporal video segmentation. Multimed. Syst. 14(5), 277–290 (2008)CrossRef Sakarya, U., Telatar, Z.: Graph-based multilevel temporal video segmentation. Multimed. Syst. 14(5), 277–290 (2008)CrossRef
60.
Zurück zum Zitat Sakarya, U., Telatar, Z.: Video scene detection using dominant sets. In: 15th IEEE International Conference on Image Processing, 2008. ICIP 2008, pp. 73–76 (2008) Sakarya, U., Telatar, Z.: Video scene detection using dominant sets. In: 15th IEEE International Conference on Image Processing, 2008. ICIP 2008, pp. 73–76 (2008)
61.
Zurück zum Zitat Sakarya, U., Telatar, Z.: Video scene detection using graph-based representations. Signal Process. Image Commun. 25(10), 774–783 (2010)CrossRef Sakarya, U., Telatar, Z.: Video scene detection using graph-based representations. Signal Process. Image Commun. 25(10), 774–783 (2010)CrossRef
62.
Zurück zum Zitat Sang, J., Xu, C.: Character-based movie summarization. In: Proceedings of the International Conference on Multimedia, MM ’10, pp. 855–858. ACM, New York, NY, USA (2010) Sang, J., Xu, C.: Character-based movie summarization. In: Proceedings of the International Conference on Multimedia, MM ’10, pp. 855–858. ACM, New York, NY, USA (2010)
63.
Zurück zum Zitat Schoeffmann, K., Lux, M., Taschwer, M., Boeszoermenyi, L.: Visualization of video motion in context of video browsing. In: Proceedings of the IEEE International Conference on Multimedia and Expo. IEEE, New York, USA (2009) Schoeffmann, K., Lux, M., Taschwer, M., Boeszoermenyi, L.: Visualization of video motion in context of video browsing. In: Proceedings of the IEEE International Conference on Multimedia and Expo. IEEE, New York, USA (2009)
64.
Zurück zum Zitat Schoeffmann, K., Taschwer, M., Boeszoermenyi, L.: The video explorer: a tool for navigation and searching within a single video based on fast content analysis. In: MMSys 10: Proceedings of the First Annual ACM SIGMM Conference on Multimedia Systems, p. 247–258. ACM, New York, NY, USA (2010) Schoeffmann, K., Taschwer, M., Boeszoermenyi, L.: The video explorer: a tool for navigation and searching within a single video based on fast content analysis. In: MMSys 10: Proceedings of the First Annual ACM SIGMM Conference on Multimedia Systems, p. 247–258. ACM, New York, NY, USA (2010)
65.
Zurück zum Zitat Shi, J., Malik, J.: Normalized cuts and image segmentation. IEEE Trans. Pattern Anal. Mach. Intell. 22(8), 888–905 (2000)CrossRef Shi, J., Malik, J.: Normalized cuts and image segmentation. IEEE Trans. Pattern Anal. Mach. Intell. 22(8), 888–905 (2000)CrossRef
66.
Zurück zum Zitat Sidiropoulos, P., Mezaris, V., Kompatsiaris, I., Kittler, J.: Differential edit distance: a metric for scene segmentation evaluation. IEEE Transa. Circuits Syst. Video Technol. 22(6), 904–914 (2012)CrossRef Sidiropoulos, P., Mezaris, V., Kompatsiaris, I., Kittler, J.: Differential edit distance: a metric for scene segmentation evaluation. IEEE Transa. Circuits Syst. Video Technol. 22(6), 904–914 (2012)CrossRef
67.
Zurück zum Zitat Sidiropoulos, P., Mezaris, V., Kompatsiaris, I., Meinedo, H., Bugalho, M., Trancoso, I.: Temporal video segmentation to scenes using High-Level audiovisual features. IEEE Trans. Circuits Syst. Video Technol. 21(8), 1163–1177 (2011)CrossRef Sidiropoulos, P., Mezaris, V., Kompatsiaris, I., Meinedo, H., Bugalho, M., Trancoso, I.: Temporal video segmentation to scenes using High-Level audiovisual features. IEEE Trans. Circuits Syst. Video Technol. 21(8), 1163–1177 (2011)CrossRef
68.
Zurück zum Zitat Sidiropoulos, P., Mezaris, V., Kompatsiaris, I., Meinedo, H., Trancoso, I.: Multi-modal scene segmentation using scene transition graphs. In: Proceedings of the Seventeen ACM International Conference on Multimedia, MM ’09, pp. 665–668. ACM, New York, NY, USA (2009) Sidiropoulos, P., Mezaris, V., Kompatsiaris, I., Meinedo, H., Trancoso, I.: Multi-modal scene segmentation using scene transition graphs. In: Proceedings of the Seventeen ACM International Conference on Multimedia, MM ’09, pp. 665–668. ACM, New York, NY, USA (2009)
69.
Zurück zum Zitat Song, Y., Ogawa, T., Haseyama, M.: MCMC-based scene segmentation method using structure of video. In: IEEE International Symposium on Communications and Information Technologies (ISCIT), pp. 862–866 (2010) Song, Y., Ogawa, T., Haseyama, M.: MCMC-based scene segmentation method using structure of video. In: IEEE International Symposium on Communications and Information Technologies (ISCIT), pp. 862–866 (2010)
70.
Zurück zum Zitat Sundaram, H., Chang, S.F.: Video scene segmentation using video and audio features. In: IEEE International Conference on Multimedia and Expo, 2000. ICME 2000 (2000) Sundaram, H., Chang, S.F.: Video scene segmentation using video and audio features. In: IEEE International Conference on Multimedia and Expo, 2000. ICME 2000 (2000)
71.
Zurück zum Zitat Sundaram, H., Chang, S.F.: Computable scenes and structures in films. IEEE Trans. Multimed. 4(4), 482–491 (2002)CrossRef Sundaram, H., Chang, S.F.: Computable scenes and structures in films. IEEE Trans. Multimed. 4(4), 482–491 (2002)CrossRef
72.
Zurück zum Zitat Surowiecki, J.: The Wisdom of Crowds. Anchor, New York (2005) Surowiecki, J.: The Wisdom of Crowds. Anchor, New York (2005)
73.
Zurück zum Zitat Tavanapong, W., Zhou, J.: Shot Clustering Techniques for Story Browsing. IEEE Trans. Multimed. 6(4), 517–527 (2004)CrossRef Tavanapong, W., Zhou, J.: Shot Clustering Techniques for Story Browsing. IEEE Trans. Multimed. 6(4), 517–527 (2004)CrossRef
74.
Zurück zum Zitat Truong, B.T., Venkatesh, S.: Video abstraction: a systematic review and classification. ACM Trans. Multimed. Comput. Commun. Appl. 3(1), 3+ (2007)CrossRef Truong, B.T., Venkatesh, S.: Video abstraction: a systematic review and classification. ACM Trans. Multimed. Comput. Commun. Appl. 3(1), 3+ (2007)CrossRef
75.
Zurück zum Zitat Truong, B.T., Venkatesh, S., Dorai, C.: Scene extraction in motion pictures. IEEE Trans. Circuits Syst. Video Technol. 13(1), 5–15 (2003)CrossRef Truong, B.T., Venkatesh, S., Dorai, C.: Scene extraction in motion pictures. IEEE Trans. Circuits Syst. Video Technol. 13(1), 5–15 (2003)CrossRef
76.
Zurück zum Zitat Velivelli, A., Ngo, C.W., Huang, T.S.: Detection of documentary scene changes by Audio-Visual fusion image and video retrieval. In: Bakker, E.M., Lew, M.S., Huang, T.S., Sebe, N., Zhou, X.S. (eds.) Image and Video Retrieval, Lecture Notes in Computer Science, vol. 2728, Chap. 23, pp. 227–238. Springer, Berlin (2003) Velivelli, A., Ngo, C.W., Huang, T.S.: Detection of documentary scene changes by Audio-Visual fusion image and video retrieval. In: Bakker, E.M., Lew, M.S., Huang, T.S., Sebe, N., Zhou, X.S. (eds.) Image and Video Retrieval, Lecture Notes in Computer Science, vol. 2728, Chap. 23, pp. 227–238. Springer, Berlin (2003)
77.
Zurück zum Zitat Vendrig, J., Worring, M.: Systematic evaluation of logical story unit segmentation. IEEE Trans. Multimed. 4(4), 492–499 (2002)CrossRef Vendrig, J., Worring, M.: Systematic evaluation of logical story unit segmentation. IEEE Trans. Multimed. 4(4), 492–499 (2002)CrossRef
78.
Zurück zum Zitat Vinciarelli, A., Favre, S.: Broadcast news story segmentation using social network analysis and hidden markov models. In: Proceedings of the 15th International Conference on Multimedia, MULTIMEDIA ’07, pp. 261–264. ACM, New York, NY, USA (2007) Vinciarelli, A., Favre, S.: Broadcast news story segmentation using social network analysis and hidden markov models. In: Proceedings of the 15th International Conference on Multimedia, MULTIMEDIA ’07, pp. 261–264. ACM, New York, NY, USA (2007)
79.
Zurück zum Zitat Wang, J., Duan, L., Liu, Q., Lu, H., Jin, J.S.: A multimodal scheme for program segmentation and representation in broadcast video streams. IEEE Trans. Multimed. 10(3), 393–408 (2008)CrossRef Wang, J., Duan, L., Liu, Q., Lu, H., Jin, J.S.: A multimodal scheme for program segmentation and representation in broadcast video streams. IEEE Trans. Multimed. 10(3), 393–408 (2008)CrossRef
80.
Zurück zum Zitat Wang, X., Wang, S., Xuejun, S., Gabbouj, M.: A shot clustering based algorithm for scene segmentation. In: International Conference on Computational Intelligence and Security Workshops, CISW 2007, pp. 259–252 (2007) Wang, X., Wang, S., Xuejun, S., Gabbouj, M.: A shot clustering based algorithm for scene segmentation. In: International Conference on Computational Intelligence and Security Workshops, CISW 2007, pp. 259–252 (2007)
81.
Zurück zum Zitat Weng, C.Y., Chu, W.T., Wu, J.L.: RoleNet: Movie analysis from the perspective of social networks. IEEE Trans. Multimed. 11(2), 256–271 (2009)CrossRef Weng, C.Y., Chu, W.T., Wu, J.L.: RoleNet: Movie analysis from the perspective of social networks. IEEE Trans. Multimed. 11(2), 256–271 (2009)CrossRef
82.
Zurück zum Zitat Wengang, C., De, X.: A novel approach of generating video scene structure. In: TENCON 2003. Conference on Convergent Technologies for Asia-Pacific Region, vol. 1, pp. 350– 353 (2003) Wengang, C., De, X.: A novel approach of generating video scene structure. In: TENCON 2003. Conference on Convergent Technologies for Asia-Pacific Region, vol. 1, pp. 350– 353 (2003)
83.
Zurück zum Zitat Wilson, K.W., Divakaran, A.: Discriminative genre-independent audio-visual scene change detection. SPIE, p. 725502 (2009) Wilson, K.W., Divakaran, A.: Discriminative genre-independent audio-visual scene change detection. SPIE, p. 725502 (2009)
84.
Zurück zum Zitat Xie, L.: Structure analysis of soccer video with domain knowledge and hidden markov models. Pattern Recognit. Lett. 25(7), 767–775 (2004)CrossRef Xie, L.: Structure analysis of soccer video with domain knowledge and hidden markov models. Pattern Recognit. Lett. 25(7), 767–775 (2004)CrossRef
85.
Zurück zum Zitat Yaşaroğlu, Y., Alatan, A.: Summarizing video: Content, features, and HMM topologies. In: García, N., Salgado, L., Martínez, J.M. (eds.) Visual Content Processing and Representation, Lecture Notes in Computer Science, vol. 2849, Chap. 15, pp. 101–110. Springer, Berlin (2003) Yaşaroğlu, Y., Alatan, A.: Summarizing video: Content, features, and HMM topologies. In: García, N., Salgado, L., Martínez, J.M. (eds.) Visual Content Processing and Representation, Lecture Notes in Computer Science, vol. 2849, Chap. 15, pp. 101–110. Springer, Berlin (2003)
86.
Zurück zum Zitat Yeung, M., Yeo, B.L., Liu, B.: Segmentation of video by clustering and graph analysis. Comput. Vis. Image Underst. 71(1), 94–109 (1998)CrossRef Yeung, M., Yeo, B.L., Liu, B.: Segmentation of video by clustering and graph analysis. Comput. Vis. Image Underst. 71(1), 94–109 (1998)CrossRef
87.
Zurück zum Zitat Zhai, Y., Shah, M.: Video scene segmentation using markov chain monte carlo. IEEE Trans. Multimed. 8(4), 686–697 (2006)CrossRef Zhai, Y., Shah, M.: Video scene segmentation using markov chain monte carlo. IEEE Trans. Multimed. 8(4), 686–697 (2006)CrossRef
88.
Zurück zum Zitat Zhai, Y., Yilmaz, A., Shah, M.: Story segmentation in news videos using visual and text cues. In: Leow, W.K., Lew, M., Chua, T.S., Ma, W.Y., Chaisorn, L., Bakker, E. (eds.) Image and Video Retrieval, Lecture Notes in Computer Science, vol. 3568, Chap. 13, pp. 92–102. Springer, Berlin (2005) Zhai, Y., Yilmaz, A., Shah, M.: Story segmentation in news videos using visual and text cues. In: Leow, W.K., Lew, M., Chua, T.S., Ma, W.Y., Chaisorn, L., Bakker, E. (eds.) Image and Video Retrieval, Lecture Notes in Computer Science, vol. 3568, Chap. 13, pp. 92–102. Springer, Berlin (2005)
89.
Zurück zum Zitat Zhang, Z., Li, B., Lu, H., Xue, X.: Scene segmentation based on video structure and spectral methods. In: 10th International Conference on Control, Automation, Robotics and Vision, 2008. ICARCV 2008, pp. 1093–1096 (2008) Zhang, Z., Li, B., Lu, H., Xue, X.: Scene segmentation based on video structure and spectral methods. In: 10th International Conference on Control, Automation, Robotics and Vision, 2008. ICARCV 2008, pp. 1093–1096 (2008)
90.
Zurück zum Zitat Zhao, L., Yang, S.Q., Feng, B.: Video scene detection using slide windows method based on temporal constrain shot similarity. In: IEEE International Conference on Multimedia and Expo, ICME 2001, pp. 1171– 1174 (2001) Zhao, L., Yang, S.Q., Feng, B.: Video scene detection using slide windows method based on temporal constrain shot similarity. In: IEEE International Conference on Multimedia and Expo, ICME 2001, pp. 1171– 1174 (2001)
91.
Zurück zum Zitat Zhao, Y., Wang, T., Wang, P., Hu, W., Du, Y., Zhang, Y., Xu, G.: Scene segmentation and categorization using ncuts. In: IEEE Conference on Computer Vision and Pattern Recognition, CVPR ’07, pp. 1–7 (2007) Zhao, Y., Wang, T., Wang, P., Hu, W., Du, Y., Zhang, Y., Xu, G.: Scene segmentation and categorization using ncuts. In: IEEE Conference on Computer Vision and Pattern Recognition, CVPR ’07, pp. 1–7 (2007)
92.
Zurück zum Zitat Zhou, J., Tavanapong, W.: Shot Weave: A shot clustering technique for story browsing for large video databases. In: Chaudhri, A., Unland, R., Djeraba, C., Lindner, W. (eds.) XML-Based Data Management and Multimedia Engineering EDBT 2002 Workshops, Lecture Notes in Computer Science, vol. 2490, Chap. 17, pp. 529–533. Springer, Berlin (2002) Zhou, J., Tavanapong, W.: Shot Weave: A shot clustering technique for story browsing for large video databases. In: Chaudhri, A., Unland, R., Djeraba, C., Lindner, W. (eds.) XML-Based Data Management and Multimedia Engineering EDBT 2002 Workshops, Lecture Notes in Computer Science, vol. 2490, Chap. 17, pp. 529–533. Springer, Berlin (2002)
93.
Zurück zum Zitat Zhu, S., Liu, Y.: Video scene segmentation and semantic representation using a novel scheme. Multimed. Tools Appl. 42(2), 183–205 (2009)CrossRef Zhu, S., Liu, Y.: Video scene segmentation and semantic representation using a novel scheme. Multimed. Tools Appl. 42(2), 183–205 (2009)CrossRef
Metadaten
Titel
State-of-the-art and future challenges in video scene detection: a survey
verfasst von
Manfred Del Fabro
Laszlo Böszörmenyi
Publikationsdatum
01.10.2013
Verlag
Springer Berlin Heidelberg
Erschienen in
Multimedia Systems / Ausgabe 5/2013
Print ISSN: 0942-4962
Elektronische ISSN: 1432-1882
DOI
https://doi.org/10.1007/s00530-013-0306-4

Neuer Inhalt