2014 | OriginalPaper | Book chapter

10. Audio-Visual Fusion for Film Database Retrieval and Classification

Authors: Paisarn Muneesawang, Ning Zhang, Ling Guan

Published in: Multimedia Database Retrieval

Publisher: Springer International Publishing


Abstract

This chapter presents techniques for the characterization and fusion of audio and visual content in videos, and demonstrates their application to movie database retrieval. In the audio domain, a study is conducted on the peaky nature of the distribution of wavelet coefficients of an audio signal, which cannot be effectively modeled by a single distribution. Thus, a new modeling method based on a Laplacian mixture model is studied for analyzing audio content and extracting audio features. The dimension of the indexed features is low, which is important for the retrieval efficiency of the system in terms of response time. Together with the audio feature, the visual feature is extracted by template frequency modeling. Both features are referred to as perceptual features. Then, a learning algorithm for audiovisual fusion is presented. Specifically, the two features are fused at the late fusion stage and input into a support vector machine to learn semantic concepts from a given video database. Based on the experimental results, the current system implementing the support vector machine-based fusion technique achieves high classification accuracy when applied to a large-volume database containing Hollywood movies.
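To make the audio modeling idea concrete, the following is a minimal sketch of fitting a Laplacian mixture to wavelet coefficients. It is not the chapter's implementation: the one-level Haar transform, the choice of two zero-mean components, the EM initialization, the synthetic signal, and the names `haar_detail`, `fit_laplacian_mixture`, and `feature` are all assumptions made for illustration.

```python
import math
import random


def haar_detail(signal):
    """One-level Haar wavelet detail coefficients (assumed transform)."""
    return [(signal[i] - signal[i + 1]) / math.sqrt(2.0)
            for i in range(0, len(signal) - 1, 2)]


def fit_laplacian_mixture(coeffs, n_iter=100):
    """EM fit of a two-component, zero-mean Laplacian mixture.

    Model: p(w) = sum_i alpha_i / (2 * b_i) * exp(-|w| / b_i).
    Returns (alphas, scales) with scales = [b_1, b_2].
    """
    mags = [abs(w) for w in coeffs]
    mean_mag = sum(mags) / len(mags)
    if mean_mag == 0.0:
        raise ValueError("coefficients are all zero")
    alphas = [0.5, 0.5]
    # Start with one narrow and one wide component.
    scales = [0.5 * mean_mag, 2.0 * mean_mag]
    for _ in range(n_iter):
        # E-step: responsibility of each component for each coefficient.
        resp = []
        for m in mags:
            dens = [a / (2.0 * b) * math.exp(-m / b)
                    for a, b in zip(alphas, scales)]
            total = sum(dens)
            if total == 0.0:
                # Numerical underflow: assign the point to the wide component.
                resp.append([0.0, 1.0])
            else:
                resp.append([d / total for d in dens])
        # M-step: weighted maximum-likelihood updates.
        for i in range(2):
            weight = max(sum(r[i] for r in resp), 1e-300)
            alphas[i] = weight / len(mags)
            weighted_mag = sum(r[i] * m for r, m in zip(resp, mags))
            scales[i] = max(weighted_mag / weight, 1e-12)
    return alphas, scales


# Synthetic stand-in for audio: mostly quiet samples with occasional bursts.
rng = random.Random(0)
signal = [rng.gauss(0.0, 0.05 if rng.random() < 0.8 else 1.0)
          for _ in range(4096)]

coeffs = haar_detail(signal)
alphas, scales = fit_laplacian_mixture(coeffs)

# Hypothetical low-dimensional descriptor: four numbers for this subband.
feature = alphas + scales
print(feature)
```

The narrow component captures the sharp peak of near-zero coefficients and the wide component captures the heavy tails, which is the behavior a single distribution struggles to represent. Because the fitted parameters summarize an entire subband in a handful of numbers, a descriptor built this way stays low-dimensional regardless of clip length.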

Metadata
Title
Audio-Visual Fusion for Film Database Retrieval and Classification
Authors
Paisarn Muneesawang
Ning Zhang
Ling Guan
Copyright year
2014
DOI
https://doi.org/10.1007/978-3-319-11782-9_10