
Journal on Multimodal User Interfaces

13 July 2018 | Original Paper

An approach for exploring a video via multimodal feature extraction and user interactions

Authors: Fahim A. Salim, Fasih Haider, Owen Conlan, Saturnino Luz

Published in: Journal on Multimodal User Interfaces | Issue 4/2018


Abstract

Exploring the content of a video is typically inefficient because of the linear, streamed nature of the medium and its lack of interactivity. A video may be seen as a combination of features: the visual track, the audio track, the transcription of the spoken words, etc. These features may be viewed as a set of temporally bounded parallel modalities. We contend that these modalities and the features derived from them can be presented individually or in discrete combinations to allow deeper and more effective interactive exploration of the content within different parts of a video. We propose a novel system for video exploration that offers video content in an alternative representation. The proposed system represents the extracted multimodal features as an automatically generated interactive multimedia webpage. This paper also presents a user study conducted to learn the usage patterns of the proposed system. The learned usage patterns may be used to build a template-driven representation engine that uses the features to offer a multimodal synopsis of a video, which may lead to more efficient exploration of video content.
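The idea of a video as a set of temporally bounded parallel modalities can be made concrete with a small data model. What follows is a minimal sketch under stated assumptions, not the authors' implementation. The `Feature` and `VideoFeatures` classes, the modality names, and the `render_segment` HTML layout are all hypothetical. That includes the `data-start` attribute, which a player script could use to seek the video. Each modality is stored as its own list of time-bounded features. The sketch then selects the features of any subset of modalities that overlap a time window, which corresponds to presenting modalities individually or in combination. Finally, it renders the selection as a plain HTML fragment.

```python
# Minimal sketch; all class, function, and modality names are hypothetical,
# not taken from the paper.
from dataclasses import dataclass, field
from html import escape
from typing import Dict, Iterable, List, Optional


@dataclass(frozen=True)
class Feature:
    """One temporally bounded feature on a single modality track (hypothetical schema)."""
    start: float  # seconds from the beginning of the video
    end: float    # seconds; assumed to satisfy end > start
    value: str    # e.g. a transcript sentence, a keyword, or a keyframe path


@dataclass
class VideoFeatures:
    """A video viewed as parallel, time-aligned modality tracks."""
    tracks: Dict[str, List[Feature]] = field(default_factory=dict)

    def add(self, modality: str, feature: Feature) -> None:
        # Create the track on first use, then append the feature to it.
        self.tracks.setdefault(modality, []).append(feature)

    def window(
        self, start: float, end: float, modalities: Optional[Iterable[str]] = None
    ) -> Dict[str, List[Feature]]:
        """Return features of the selected modalities overlapping [start, end).

        Uses half-open intervals: a feature ending exactly at `start`, or
        starting exactly at `end`, is excluded. With no selection, all
        modalities are returned.
        """
        selected = modalities if modalities is not None else list(self.tracks)
        return {
            m: [f for f in self.tracks.get(m, []) if f.start < end and f.end > start]
            for m in selected
        }


def render_segment(features: Dict[str, List[Feature]]) -> str:
    """Render one segment as an HTML fragment with one <section> per modality."""
    parts = []
    for modality, items in features.items():
        # Escape feature text so transcript content cannot inject markup.
        entries = "".join(
            f'<li data-start="{f.start:g}">{escape(f.value)}</li>' for f in items
        )
        parts.append(f"<section><h3>{escape(modality)}</h3><ul>{entries}</ul></section>")
    return "\n".join(parts)
```

For example, `video.window(60, 120, modalities=["transcript", "keywords"])` returns only the transcript sentences and keywords that overlap the second minute. `render_segment` can then turn that selection into one block of a generated page. Keeping each modality on its own track means a page template can choose which modalities to show for each part of the video without re-extracting anything.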




Title: An approach for exploring a video via multimodal feature extraction and user interactions
Authors: Fahim A. Salim, Fasih Haider, Owen Conlan, Saturnino Luz
Publication date: 13 July 2018
Publisher: Springer International Publishing
Published in: Journal on Multimodal User Interfaces, Issue 4/2018
Print ISSN: 1783-7677
Electronic ISSN: 1783-8738
DOI: https://doi.org/10.1007/s12193-018-0268-0
