Published in: Universal Access in the Information Society 3/2009

1 August 2009 | Long Paper

Towards computer-vision software tools to increase production and accessibility of video description for people with vision loss

Authors: Langis Gagnon, Samuel Foucher, Maguelonne Heritier, Marc Lalonde, David Byrns, Claude Chapdelaine, James Turner, Suzanne Mathieu, Denis Laurendeau, Nath Tan Nguyen, Denis Ouellet


Abstract

This paper presents the status of an R&D project targeting the development of computer-vision tools that assist humans in generating and rendering video description for people with vision loss. Three principal issues are discussed: (1) production practices, (2) the needs of people with vision loss, and (3) current system design, core technologies and implementation. The paper reports the main conclusions of consultations with video description producers regarding their practices and with end users regarding their needs, as well as an analysis of described productions that led to a proposed video description typology. It also presents the current status of a prototype software application, the audio-vision manager, which uses many computer-vision technologies (shot transition detection, key-frame identification, key-face recognition, key-text spotting, visual motion, gait/gesture characterization, key-place identification, key-object spotting and image categorization) to automatically extract visual content, associate textual descriptions with it and add those descriptions to the audio track with a synthetic voice. Finally, a proof of concept is briefly described for a first adaptive video description player, which lets end users select among various levels of video description.
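To make the first two technologies in that list more concrete, here is a minimal sketch of a common baseline for shot transition detection: compute a color histogram for each frame and declare a hard cut when the distance between consecutive histograms exceeds a threshold. It is not necessarily the method used in the audio-vision manager. Frames are modeled as flat lists of RGB tuples, and the function names (`color_histogram`, `histogram_distance`, `detect_cuts`, `middle_key_frames`), the 4-bins-per-channel quantization and the 0.5 threshold are all illustrative assumptions. Picking the middle frame of each shot as its key-frame is likewise a naive placeholder for key-frame identification.

```python
from typing import List, Sequence, Tuple

# Illustrative sketch (assumption): not the prototype's actual implementation.
# A frame is modeled as a flat sequence of (r, g, b) pixels with channels in 0..255.
Pixel = Tuple[int, int, int]


def color_histogram(frame: Sequence[Pixel], bins: int = 4) -> List[float]:
    """Normalized joint RGB histogram with `bins` quantization levels per channel."""
    hist = [0.0] * (bins ** 3)
    for r, g, b in frame:
        # c * bins // 256 maps each channel value 0..255 to a bin index 0..bins-1.
        idx = (r * bins // 256) * bins * bins + (g * bins // 256) * bins + (b * bins // 256)
        hist[idx] += 1.0
    total = float(len(frame))
    return [h / total for h in hist]


def histogram_distance(h1: Sequence[float], h2: Sequence[float]) -> float:
    """Half the L1 distance between two normalized histograms; lies in [0, 1]."""
    return 0.5 * sum(abs(a - b) for a, b in zip(h1, h2))


def detect_cuts(frames: Sequence[Sequence[Pixel]], threshold: float = 0.5,
                bins: int = 4) -> List[int]:
    """Return indices i where a hard cut is declared between frame i-1 and frame i."""
    cuts: List[int] = []
    prev = None
    for i, frame in enumerate(frames):
        hist = color_histogram(frame, bins)
        if prev is not None and histogram_distance(prev, hist) > threshold:
            cuts.append(i)
        prev = hist
    return cuts


def middle_key_frames(num_frames: int, cuts: List[int]) -> List[int]:
    """Naive key-frame rule (assumption): the middle frame index of each shot."""
    bounds = [0] + cuts + [num_frames]
    return [(start + end - 1) // 2 for start, end in zip(bounds, bounds[1:])]


# Usage: three dark frames followed by three bright red frames.
dark = [(10, 10, 10)] * 16
red = [(240, 20, 20)] * 16
frames = [dark, dark, dark, red, red, red]
cuts = detect_cuts(frames)
print(cuts)                                  # → [3]
print(middle_key_frames(len(frames), cuts))  # → [1, 4]
```

Because a single global histogram ignores spatial layout and only adjacent frames are compared, this baseline catches abrupt cuts but will typically miss gradual transitions such as dissolves and fades, which need additional logic.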


Metadata
Publication date
1 August 2009
Publisher
Springer-Verlag
Published in
Universal Access in the Information Society / Issue 3/2009
Print ISSN: 1615-5289
Electronic ISSN: 1615-5297
DOI
https://doi.org/10.1007/s10209-008-0141-0