Skip to main content
Erschienen in: International Journal of Computer Vision 1/2017

31.03.2017

Guest Editorial: Image and Language Understanding

verfasst von: Margaret Mitchell, John C. Platt, Kate Saenko

Erschienen in: International Journal of Computer Vision | Ausgabe 1/2017

Einloggen

Aktivieren Sie unsere intelligente Suche, um passende Fachinhalte oder Patente zu finden.

search-config
loading …

Excerpt

We are pleased to present this special issue of IJCV on combined image and language understanding. It contains some of the latest work in a long line of research into problems at the intersection of computer vision and natural language processing. …

Sie haben noch keine Lizenz? Dann Informieren Sie sich jetzt über unsere Produkte:

Springer Professional "Wirtschaft+Technik"

Online-Abonnement

Mit Springer Professional "Wirtschaft+Technik" erhalten Sie Zugriff auf:

  • über 102.000 Bücher
  • über 537 Zeitschriften

aus folgenden Fachgebieten:

  • Automobil + Motoren
  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Elektrotechnik + Elektronik
  • Energie + Nachhaltigkeit
  • Finance + Banking
  • Management + Führung
  • Marketing + Vertrieb
  • Maschinenbau + Werkstoffe
  • Versicherung + Risiko

Jetzt Wissensvorsprung sichern!

Springer Professional "Wirtschaft"

Online-Abonnement

Mit Springer Professional "Wirtschaft" erhalten Sie Zugriff auf:

  • über 67.000 Bücher
  • über 340 Zeitschriften

aus folgenden Fachgebieten:

  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Finance + Banking
  • Management + Führung
  • Marketing + Vertrieb
  • Versicherung + Risiko




Jetzt Wissensvorsprung sichern!

Springer Professional "Technik"

Online-Abonnement

Mit Springer Professional "Technik" erhalten Sie Zugriff auf:

  • über 67.000 Bücher
  • über 390 Zeitschriften

aus folgenden Fachgebieten:

  • Automobil + Motoren
  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Elektrotechnik + Elektronik
  • Energie + Nachhaltigkeit
  • Maschinenbau + Werkstoffe




 

Jetzt Wissensvorsprung sichern!

Literatur
Zurück zum Zitat Antol, S., Agrawal, A., Lu, J., Mitchell, M., Batra, D., Zitnick, C. L., & Parikh, D. (2015). VQA: Visual question answering. In Proceedings of the IEEE international conference on computer vision (pp. 2425–2433). Antol, S., Agrawal, A., Lu, J., Mitchell, M., Batra, D., Zitnick, C. L., & Parikh, D. (2015). VQA: Visual question answering. In Proceedings of the IEEE international conference on computer vision (pp. 2425–2433).
Zurück zum Zitat Devlin, J., Cheng, H., Fang, H., Gupta, S., Deng, L., He, X., Zweig, G., & Mitchell, M. (2015). Language models for image captioning: The quirks and what works. In Proceedings of the 53rd annual meeting of the association for computational linguistics and the 7th international joint conference on natural language processing (Vol. 2: short papers, pp. 100–105). Association for Computational Linguistics, Beijing, China. http://www.aclweb.org/anthology/P15-2017. Devlin, J., Cheng, H., Fang, H., Gupta, S., Deng, L., He, X., Zweig, G., & Mitchell, M. (2015). Language models for image captioning: The quirks and what works. In Proceedings of the 53rd annual meeting of the association for computational linguistics and the 7th international joint conference on natural language processing (Vol. 2: short papers, pp. 100–105). Association for Computational Linguistics, Beijing, China. http://​www.​aclweb.​org/​anthology/​P15-2017.
Zurück zum Zitat Donahue, J., Anne Hendricks, L., Guadarrama, S., Rohrbach, M., Venugopalan, S., Saenko, K., & Darrell, T. (2015). Long-term recurrent convolutional networks for visual recognition and description. In Proceedings of the IEEE conference on computer vision and pattern recognition (pp. 2625–2634). Donahue, J., Anne Hendricks, L., Guadarrama, S., Rohrbach, M., Venugopalan, S., Saenko, K., & Darrell, T. (2015). Long-term recurrent convolutional networks for visual recognition and description. In Proceedings of the IEEE conference on computer vision and pattern recognition (pp. 2625–2634).
Zurück zum Zitat Fang, H., Gupta, S., Iandola, F., Srivastava, R. K., Deng, L., Dollár, P., Gao, J., He, X., Mitchell, M., & Platt, J. C., et al. (2015). From captions to visual concepts and back. In Proceedings of the IEEE conference on computer vision and pattern recognition (pp. 1473–1482). Fang, H., Gupta, S., Iandola, F., Srivastava, R. K., Deng, L., Dollár, P., Gao, J., He, X., Mitchell, M., & Platt, J. C., et al. (2015). From captions to visual concepts and back. In Proceedings of the IEEE conference on computer vision and pattern recognition (pp. 1473–1482).
Zurück zum Zitat Guadarrama, S., Krishnamoorthy, N., Malkarnenkar, G., Venugopalan, S., Mooney, R., Darrell, T., & Saenko, K. (2013). Youtube2text: Recognizing and describing arbitrary activities using semantic hierarchies and zero-shot recognition. In 2013 IEEE international conference on computer vision (ICCV) (pp. 2712–2719). IEEE. Guadarrama, S., Krishnamoorthy, N., Malkarnenkar, G., Venugopalan, S., Mooney, R., Darrell, T., & Saenko, K. (2013). Youtube2text: Recognizing and describing arbitrary activities using semantic hierarchies and zero-shot recognition. In 2013 IEEE international conference on computer vision (ICCV) (pp. 2712–2719). IEEE.
Zurück zum Zitat Jabri, A., Joulin, A., & van der Maaten, L. (2016). Revisiting visual question answering baselines. In European conference on computer vision (pp. 727–739). Berlin: Springer. Jabri, A., Joulin, A., & van der Maaten, L. (2016). Revisiting visual question answering baselines. In European conference on computer vision (pp. 727–739). Berlin: Springer.
Zurück zum Zitat Karpathy, A., & Fei-Fei, L. (2015). Deep visual-semantic alignments for generating image descriptions. In Proceedings of the IEEE conference on computer vision and pattern recognition (pp. 3128–3137). Karpathy, A., & Fei-Fei, L. (2015). Deep visual-semantic alignments for generating image descriptions. In Proceedings of the IEEE conference on computer vision and pattern recognition (pp. 3128–3137).
Zurück zum Zitat Krizhevsky, A., Sutskever, I., & Hinton, G. E. (2012). Imagenet classification with deep convolutional neural networks. In Advances in neural information processing systems (pp. 1097–1105). Krizhevsky, A., Sutskever, I., & Hinton, G. E. (2012). Imagenet classification with deep convolutional neural networks. In Advances in neural information processing systems (pp. 1097–1105).
Zurück zum Zitat Kulkarni, G., Premraj, V., Ordonez, V., Dhar, S., Li, S., Choi, Y., et al. (2013). Babytalk: Understanding and generating simple image descriptions. IEEE Transactions on Pattern Analysis and Machine Intelligence, 35(12), 2891–2903.CrossRef Kulkarni, G., Premraj, V., Ordonez, V., Dhar, S., Li, S., Choi, Y., et al. (2013). Babytalk: Understanding and generating simple image descriptions. IEEE Transactions on Pattern Analysis and Machine Intelligence, 35(12), 2891–2903.CrossRef
Zurück zum Zitat Lawrence, S., Giles, C. L., Tsoi, A. C., & Back, A. D. (1997). Face recognition: A convolutional neural-network approach. IEEE Transactions on Neural Networks, 8(1), 98–113.CrossRef Lawrence, S., Giles, C. L., Tsoi, A. C., & Back, A. D. (1997). Face recognition: A convolutional neural-network approach. IEEE Transactions on Neural Networks, 8(1), 98–113.CrossRef
Zurück zum Zitat LeCun, Y., Boser, B., Denker, J. S., Henderson, D., Howard, R. E., Hubbard, W., et al. (1989). Backpropagation applied to handwritten zip code recognition. Neural Computation, 1(4), 541–551.CrossRef LeCun, Y., Boser, B., Denker, J. S., Henderson, D., Howard, R. E., Hubbard, W., et al. (1989). Backpropagation applied to handwritten zip code recognition. Neural Computation, 1(4), 541–551.CrossRef
Zurück zum Zitat Lin, T. Y., Maire, M., Belongie, S., Hays, J., Perona, P., Ramanan, D., Dollár, P., & Zitnick, C. L. (2014). Microsoft COCO: Common objects in context. In European conference on computer vision (pp. 740–755). Berlin: Springer. Lin, T. Y., Maire, M., Belongie, S., Hays, J., Perona, P., Ramanan, D., Dollár, P., & Zitnick, C. L. (2014). Microsoft COCO: Common objects in context. In European conference on computer vision (pp. 740–755). Berlin: Springer.
Zurück zum Zitat Malinowski, M., & Fritz, M. (2014). A multi-world approach to question answering about real-world scenes based on uncertain input. In Advances in neural information processing systems (pp. 1682–1690). Malinowski, M., & Fritz, M. (2014). A multi-world approach to question answering about real-world scenes based on uncertain input. In Advances in neural information processing systems (pp. 1682–1690).
Zurück zum Zitat Malisiewicz, T., & Efros, A. (2009). Beyond categories: The visual memex model for reasoning about object relationships. In Y. Bengio, D. Schuurmans, J. D. Lafferty, C. K. I. Williams, A. Culotta (Eds.), Advances in neural information processing systems (Vol. 22, pp. 1222–1230). Malisiewicz, T., & Efros, A. (2009). Beyond categories: The visual memex model for reasoning about object relationships. In Y. Bengio, D. Schuurmans, J. D. Lafferty, C. K. I. Williams, A. Culotta (Eds.), Advances in neural information processing systems (Vol. 22, pp. 1222–1230).
Zurück zum Zitat Mikolov, T., Karafiát, M., Burget, L., Cernockỳ, J., & Khudanpur, S. (2010). Recurrent neural network based language model. In Interspeech (Vol. 2, pp. 1045–1048). Makuhari, Chiba: ISCA. Mikolov, T., Karafiát, M., Burget, L., Cernockỳ, J., & Khudanpur, S. (2010). Recurrent neural network based language model. In Interspeech (Vol. 2, pp. 1045–1048). Makuhari, Chiba: ISCA.
Zurück zum Zitat Mitchell, M., Han, X., Dodge, J., Mensch, A., Goyal, A., Berg, A., Yamaguchi, K., Berg, T., Stratos, K., & Daumé III, H. (2012). Midge: Generating image descriptions from computer vision detections. In Proceedings of the 13th conference of the European chapter of the association for computational linguistics (pp. 747–756). Association for Computational Linguistics. Mitchell, M., Han, X., Dodge, J., Mensch, A., Goyal, A., Berg, A., Yamaguchi, K., Berg, T., Stratos, K., & Daumé III, H. (2012). Midge: Generating image descriptions from computer vision detections. In Proceedings of the 13th conference of the European chapter of the association for computational linguistics (pp. 747–756). Association for Computational Linguistics.
Zurück zum Zitat Nowlan, S. J., & Platt, J. C. (1995). A convolutional neural network hand tracker. In Advances in neural information processing systems (pp. 901–908). Nowlan, S. J., & Platt, J. C. (1995). A convolutional neural network hand tracker. In Advances in neural information processing systems (pp. 901–908).
Zurück zum Zitat Parikh, D., Zitnick, C. L., & Chen, T. (2008). From appearance to context-based recognition: Dense labeling in small images. In IEEE conference on computer vision and pattern recognition, 2008. CVPR 2008 (pp. 1–8). IEEE. Parikh, D., Zitnick, C. L., & Chen, T. (2008). From appearance to context-based recognition: Dense labeling in small images. In IEEE conference on computer vision and pattern recognition, 2008. CVPR 2008 (pp. 1–8). IEEE.
Zurück zum Zitat Ren, M., Kiros, R., & Zemel, R. (2015). Exploring models and data for image question answering. In Advances in neural information processing systems (pp. 2953–2961). Ren, M., Kiros, R., & Zemel, R. (2015). Exploring models and data for image question answering. In Advances in neural information processing systems (pp. 2953–2961).
Zurück zum Zitat Roberts, L. G. (1963). Machine perception of three-dimensional solids. Ph.D. thesis, MIT. Roberts, L. G. (1963). Machine perception of three-dimensional solids. Ph.D. thesis, MIT.
Zurück zum Zitat Sadeghi, M. A., & Farhadi, A. (2011). Recognition using visual phrases. In 2011 IEEE conference on computer vision and pattern recognition (CVPR) (pp. 1745–1752). IEEE. Sadeghi, M. A., & Farhadi, A. (2011). Recognition using visual phrases. In 2011 IEEE conference on computer vision and pattern recognition (CVPR) (pp. 1745–1752). IEEE.
Zurück zum Zitat Vinyals, O., Toshev, A., Bengio, S., & Erhan, D. (2015). Show and tell: A neural image caption generator. In Proceedings of the IEEE conference on computer vision and pattern recognition (pp. 3156–3164). Vinyals, O., Toshev, A., Bengio, S., & Erhan, D. (2015). Show and tell: A neural image caption generator. In Proceedings of the IEEE conference on computer vision and pattern recognition (pp. 3156–3164).
Zurück zum Zitat Zitnick, C. L., Agrawal, A., Antol, S., Mitchell, M., Batra, D., & Parikh, D. (2016). Measuring machine intelligence through visual question answering. AI Magazine, 37(1). Zitnick, C. L., Agrawal, A., Antol, S., Mitchell, M., Batra, D., & Parikh, D. (2016). Measuring machine intelligence through visual question answering. AI Magazine, 37(1).
Metadaten
Titel
Guest Editorial: Image and Language Understanding
verfasst von
Margaret Mitchell
John C. Platt
Kate Saenko
Publikationsdatum
31.03.2017
Verlag
Springer US
Erschienen in
International Journal of Computer Vision / Ausgabe 1/2017
Print ISSN: 0920-5691
Elektronische ISSN: 1573-1405
DOI
https://doi.org/10.1007/s11263-017-0993-y

Weitere Artikel der Ausgabe 1/2017

International Journal of Computer Vision 1/2017 Zur Ausgabe

OriginalPaper

Movie Description