2014 | OriginalPaper | Book Chapter

10. Audio-Visual Fusion for Film Database Retrieval and Classification

Written by: Paisarn Muneesawang, Ning Zhang, Ling Guan

Published in: Multimedia Database Retrieval

Publisher: Springer International Publishing

Abstract

This chapter presents techniques for characterizing and fusing the audio and visual content of videos, and demonstrates their application to movie database retrieval. In the audio domain, a study is conducted on the peaky nature of the distribution of wavelet coefficients of an audio signal, which cannot be effectively modeled by a single distribution. Thus, a new modeling method based on a Laplacian mixture model is studied for analyzing audio content and extracting audio features. The dimension of the indexed features is low, which is important for the retrieval efficiency of the system in terms of response time. Together with the audio feature, the visual feature is extracted by template frequency modeling. Both features are referred to as perceptual features. Then, a learning algorithm for audiovisual fusion is presented. Specifically, the two features are fused at the late fusion stage and input into a support vector machine to learn semantic concepts from a given video database. In the experiments, the current system implementing the support vector machine-based fusion technique achieves high classification accuracy when applied to a large-volume database containing Hollywood movies.
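As a rough illustration of the Laplacian mixture idea mentioned in the abstract, the sketch below fits a mixture of Laplacian densities to a set of coefficients with expectation-maximization and keeps the fitted parameters as a short feature vector. It is a minimal sketch under stated assumptions, not the chapter's implementation: the two-component, zero-mean form, the helper names `laplace_pdf`, `fit_laplacian_mixture`, and `sample_laplace`, the iteration count, and the synthetic data standing in for one wavelet subband are all assumptions made for the example.

```python
import math
import random


def laplace_pdf(x, b):
    """Density of a zero-mean Laplacian with scale b > 0."""
    return math.exp(-abs(x) / b) / (2.0 * b)


def fit_laplacian_mixture(coeffs, n_iter=100):
    """Fit a two-component, zero-mean Laplacian mixture with EM.

    Returns (weights, scales). Hypothetical helper for illustration only.
    """
    n = len(coeffs)
    mean_abs = max(sum(abs(x) for x in coeffs) / n, 1e-12)
    weights = [0.5, 0.5]
    scales = [0.5 * mean_abs, 2.0 * mean_abs]  # narrow and wide starting guesses
    for _ in range(n_iter):
        # E-step: posterior probability of each component for each coefficient.
        resp = []
        for x in coeffs:
            p = [w * laplace_pdf(x, b) for w, b in zip(weights, scales)]
            total = sum(p)
            resp.append([pk / total for pk in p] if total > 0 else [0.5, 0.5])
        # M-step: the responsibility-weighted mean of |x| is the
        # maximum-likelihood scale of a zero-mean Laplacian.
        for k in range(2):
            r_k = max(sum(r[k] for r in resp), 1e-12)
            weights[k] = r_k / n
            weighted_abs = sum(r[k] * abs(x) for r, x in zip(resp, coeffs))
            scales[k] = max(weighted_abs / r_k, 1e-12)
    return weights, scales


def sample_laplace(rng, b):
    """Draw one zero-mean Laplacian sample: random sign times an exponential."""
    return rng.choice([-1.0, 1.0]) * rng.expovariate(1.0 / b)


# Synthetic stand-in for one wavelet subband (assumed data): mostly small
# coefficients plus a minority of large ones, giving a peaky histogram.
rng = random.Random(0)
coeffs = [sample_laplace(rng, 0.2) for _ in range(2100)]
coeffs += [sample_laplace(rng, 3.0) for _ in range(900)]

weights, scales = fit_laplacian_mixture(coeffs)

# Four numbers per subband: a low-dimensional descriptor.
feature = [weights[0], scales[0], weights[1], scales[1]]
print(feature)
```

Under these assumptions, each subband contributes only its mixture weights and scales, which is one way an indexed feature can stay low-dimensional regardless of the clip length.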

Title: Audio-Visual Fusion for Film Database Retrieval and Classification
Written by: Paisarn Muneesawang, Ning Zhang, Ling Guan
Publisher: Springer International Publishing
Book: Multimedia Database Retrieval
Print ISBN: 978-3-319-11781-2
Electronic ISBN: 978-3-319-11782-9
Copyright year: 2014
DOI: https://doi.org/10.1007/978-3-319-11782-9_10
