
2019 | OriginalPaper | Book Chapter

An Image Captioning Method for Infant Sleeping Environment Diagnosis

Abstract

This paper presents a new image captioning method that generates a textual description of an image. We apply the method to infant sleeping environment analysis and diagnosis, describing the infant's sleeping position, sleeping surface, and bedding condition, which requires recognizing and representing body pose, activity, and the surrounding environment. In this challenging setting, visual attention, an essential part of human visual perception, is employed to process the visual input efficiently, and texture analysis is used to give a precise diagnosis of the sleeping surface. The encoder-decoder model was trained on the Microsoft COCO dataset combined with our own annotated dataset containing relevant information. The results show that the system can generate a description of the image, identify potential risk factors in it, and then give corresponding advice based on the generated caption. This demonstrates its ability to assist humans in infant care-giving and its potential in other human-assistive systems.
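
As an illustration of the pipeline outlined in the abstract, the following is a minimal sketch, assuming a PyTorch implementation, of an attention-based encoder-decoder captioning step: CNN features over image regions are weighted by a soft-attention module and fed, together with the previous word, to an LSTM decoder that emits the next caption word. The layer sizes, class names, and toy inputs are illustrative assumptions, not the authors' implementation.

```python
# Minimal sketch (not the authors' code) of attention-based encoder-decoder captioning:
# a CNN encoder yields a grid of region features, soft attention weights them,
# and an LSTM decoder emits one caption word per step.
import torch
import torch.nn as nn
import torch.nn.functional as F

class SoftAttention(nn.Module):
    def __init__(self, feat_dim, hidden_dim, attn_dim):
        super().__init__()
        self.feat_proj = nn.Linear(feat_dim, attn_dim)
        self.hidden_proj = nn.Linear(hidden_dim, attn_dim)
        self.score = nn.Linear(attn_dim, 1)

    def forward(self, feats, hidden):
        # feats: (batch, regions, feat_dim), hidden: (batch, hidden_dim)
        e = self.score(torch.tanh(self.feat_proj(feats) + self.hidden_proj(hidden).unsqueeze(1)))
        alpha = F.softmax(e, dim=1)              # attention weights over image regions
        context = (alpha * feats).sum(dim=1)     # weighted sum = attended visual context
        return context, alpha.squeeze(-1)

class CaptionDecoder(nn.Module):
    def __init__(self, vocab_size, feat_dim=512, embed_dim=256, hidden_dim=512):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.attention = SoftAttention(feat_dim, hidden_dim, attn_dim=256)
        self.lstm = nn.LSTMCell(embed_dim + feat_dim, hidden_dim)
        self.out = nn.Linear(hidden_dim, vocab_size)

    def step(self, prev_word, feats, h, c):
        context, alpha = self.attention(feats, h)
        h, c = self.lstm(torch.cat([self.embed(prev_word), context], dim=1), (h, c))
        return self.out(h), h, c, alpha

# Toy usage: 49 spatial regions (e.g. a 7x7 CNN feature map) and an assumed vocabulary size.
vocab_size, batch = 1000, 2
decoder = CaptionDecoder(vocab_size)
feats = torch.randn(batch, 49, 512)              # encoder output (stand-in for CNN features)
h = torch.zeros(batch, 512)
c = torch.zeros(batch, 512)
prev_word = torch.zeros(batch, dtype=torch.long)  # <start> token index (assumed 0)
logits, h, c, alpha = decoder.step(prev_word, feats, h, c)
print(logits.shape, alpha.shape)                  # logits: (2, 1000), attention weights: (2, 49)
```

At training time such a decoder would typically be driven by the ground-truth words (teacher forcing); at inference the predicted word is fed back in, and the attention weights can be inspected to see which image regions support each generated word.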

Metadata
Title
An Image Captioning Method for Infant Sleeping Environment Diagnosis
Authors
Xinyi Liu
Mariofanna Milanova
Copyright year
2019
DOI
https://doi.org/10.1007/978-3-030-20984-1_2
