Universal Access in the Information Society

1 August 2009 | Long Paper

Towards computer-vision software tools to increase production and accessibility of video description for people with vision loss

Authors: Langis Gagnon, Samuel Foucher, Maguelonne Heritier, Marc Lalonde, David Byrns, Claude Chapdelaine, James Turner, Suzanne Mathieu, Denis Laurendeau, Nath Tan Nguyen, Denis Ouellet

Published in: Universal Access in the Information Society | Issue 3/2009

Abstract

This paper presents the status of an R&D project targeting the development of computer-vision tools to assist humans in generating and rendering video description for people with vision loss. Three principal issues are discussed: (1) production practices, (2) needs of people with vision loss, and (3) current system design, core technologies and implementation. The paper provides the main conclusions of consultations with producers of video description regarding their practices and with end users regarding their needs, as well as an analysis of described productions that led to a proposed video description typology. It also presents the current status of a prototype software tool (audio-vision manager) that uses many computer-vision technologies (shot transition detection, key-frame identification, key-face recognition, key-text spotting, visual motion, gait/gesture characterization, key-place identification, key-object spotting and image categorization) to automatically extract visual content, associate textual descriptions and add them to the audio track with a synthetic voice. Finally, a proof of concept is briefly described for a first adaptive video description player, which allows end users to select various levels of video description.
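
To make the first two technologies named in the abstract more concrete, the following is a minimal sketch of shot transition detection followed by key-frame selection. It is not the authors' method: the abstract does not say how either step is implemented, so this sketch assumes a common baseline (thresholding the grey-level histogram difference between consecutive frames, then taking the middle frame of each shot). The frame representation, the function names `grey_histogram`, `histogram_distance`, `detect_cuts` and `key_frames`, and the threshold of 0.5 are all illustrative assumptions.

```python
from typing import List, Sequence

# Assumption: a frame is a flat sequence of 8-bit grey levels (0..255).
Frame = Sequence[int]


def grey_histogram(frame: Frame, bins: int = 16) -> List[float]:
    """Return the normalised grey-level histogram of one frame."""
    counts = [0] * bins
    for value in frame:
        counts[min(value * bins // 256, bins - 1)] += 1
    total = len(frame)
    if total == 0:
        return [0.0] * bins
    return [count / total for count in counts]


def histogram_distance(h1: Sequence[float], h2: Sequence[float]) -> float:
    """Half the L1 distance between two normalised histograms, in [0, 1]."""
    return 0.5 * sum(abs(a - b) for a, b in zip(h1, h2))


def detect_cuts(frames: Sequence[Frame], threshold: float = 0.5,
                bins: int = 16) -> List[int]:
    """Return each index i where a hard cut is assumed between frames i-1 and i."""
    cuts: List[int] = []
    previous = None
    for index, frame in enumerate(frames):
        current = grey_histogram(frame, bins)
        if previous is not None and histogram_distance(previous, current) > threshold:
            cuts.append(index)
        previous = current
    return cuts


def key_frames(frame_count: int, cuts: Sequence[int]) -> List[int]:
    """Pick the middle frame of each shot as a naive key-frame index."""
    boundaries = [0] + list(cuts) + [frame_count]
    return [(start + end - 1) // 2
            for start, end in zip(boundaries, boundaries[1:])
            if end > start]


if __name__ == "__main__":
    # Three synthetic shots: dark, bright, mid-grey (64 pixels per frame).
    video = [[10] * 64] * 3 + [[200] * 64] * 4 + [[90] * 64] * 2
    cuts = detect_cuts(video)
    print("cuts at frames:", cuts)
    print("key frames:", key_frames(len(video), cuts))
```

A fixed global threshold of this kind only catches abrupt cuts; gradual transitions such as fades and dissolves would need a different test, which this sketch does not attempt.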



Title: Towards computer-vision software tools to increase production and accessibility of video description for people with vision loss
Authors: Langis Gagnon, Samuel Foucher, Maguelonne Heritier, Marc Lalonde, David Byrns, Claude Chapdelaine, James Turner, Suzanne Mathieu, Denis Laurendeau, Nath Tan Nguyen, Denis Ouellet
Publication date: 1 August 2009
Publisher: Springer-Verlag
Published in: Universal Access in the Information Society / Issue 3/2009
Print ISSN: 1615-5289
Electronic ISSN: 1615-5297
DOI: https://doi.org/10.1007/s10209-008-0141-0
