
2019 | OriginalPaper | Book Chapter

An Image Captioning Method for Infant Sleeping Environment Diagnosis

Abstract

This paper presents a new image captioning method that generates a textual description of an image. We apply the method to infant sleeping environment analysis and diagnosis, describing the infant's sleeping position, sleeping surface, and bedding condition, which requires recognizing and representing body pose, activity, and the surrounding environment. In this challenging setting, visual attention, an essential part of human visual perception, is employed to process the visual input efficiently, and texture analysis is used to give a precise diagnosis of the sleeping surface. The encoder-decoder model was trained on the Microsoft COCO dataset combined with our own annotated dataset containing relevant information. The results show that the system can generate a description of the image, identify potential risk factors in it, and then give corresponding advice based on the generated caption. This demonstrates its ability to assist humans in infant care-giving and its potential in other human-assistive systems.
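
As an illustration of the pipeline outlined in the abstract, the following is a minimal sketch, assuming a PyTorch implementation, of an attention-based encoder-decoder captioning step: CNN features over image regions are weighted by a soft-attention module and fed, together with the previous word, to an LSTM decoder that emits the next caption word. The layer sizes, class names, and toy inputs are illustrative assumptions, not the authors' implementation.

```python
# Minimal sketch (not the authors' code) of attention-based encoder-decoder captioning:
# a CNN encoder yields a grid of region features, soft attention weights them,
# and an LSTM decoder emits one caption word per step.
import torch
import torch.nn as nn
import torch.nn.functional as F

class SoftAttention(nn.Module):
    def __init__(self, feat_dim, hidden_dim, attn_dim):
        super().__init__()
        self.feat_proj = nn.Linear(feat_dim, attn_dim)
        self.hidden_proj = nn.Linear(hidden_dim, attn_dim)
        self.score = nn.Linear(attn_dim, 1)

    def forward(self, feats, hidden):
        # feats: (batch, regions, feat_dim), hidden: (batch, hidden_dim)
        e = self.score(torch.tanh(self.feat_proj(feats) + self.hidden_proj(hidden).unsqueeze(1)))
        alpha = F.softmax(e, dim=1)              # attention weights over image regions
        context = (alpha * feats).sum(dim=1)     # weighted sum = attended visual context
        return context, alpha.squeeze(-1)

class CaptionDecoder(nn.Module):
    def __init__(self, vocab_size, feat_dim=512, embed_dim=256, hidden_dim=512):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.attention = SoftAttention(feat_dim, hidden_dim, attn_dim=256)
        self.lstm = nn.LSTMCell(embed_dim + feat_dim, hidden_dim)
        self.out = nn.Linear(hidden_dim, vocab_size)

    def step(self, prev_word, feats, h, c):
        context, alpha = self.attention(feats, h)
        h, c = self.lstm(torch.cat([self.embed(prev_word), context], dim=1), (h, c))
        return self.out(h), h, c, alpha

# Toy usage: 49 spatial regions (e.g. a 7x7 CNN feature map) and an assumed vocabulary size.
vocab_size, batch = 1000, 2
decoder = CaptionDecoder(vocab_size)
feats = torch.randn(batch, 49, 512)              # encoder output (stand-in for CNN features)
h = torch.zeros(batch, 512)
c = torch.zeros(batch, 512)
prev_word = torch.zeros(batch, dtype=torch.long)  # <start> token index (assumed 0)
logits, h, c, alpha = decoder.step(prev_word, feats, h, c)
print(logits.shape, alpha.shape)                  # logits: (2, 1000), attention weights: (2, 49)
```

At training time such a decoder would typically be driven by the ground-truth words (teacher forcing); at inference the predicted word is fed back in, and the attention weights can be inspected to see which image regions support each generated word.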

Metadata
Title
An Image Captioning Method for Infant Sleeping Environment Diagnosis
Authors
Xinyi Liu
Mariofanna Milanova
Copyright year
2019
DOI
https://doi.org/10.1007/978-3-030-20984-1_2
