Published in: International Journal on Document Analysis and Recognition (IJDAR) 1-2/2021

26.04.2021 | Original Paper

Knowledge-driven description synthesis for floor plan interpretation

Authors: Shreya Goyal, Chiranjoy Chattopadhyay, Gaurav Bhatnagar


Abstract

Image captioning is a well-known problem in AI. Caption generation from floor plan images has applications in indoor path planning, real estate, and architectural design. Several methods have been explored in the literature for generating captions or semi-structured descriptions from floor plan images. Since a caption alone is insufficient to capture fine-grained details, researchers have also proposed generating descriptive paragraphs from images. However, these descriptions have a rigid structure and lack flexibility, making them difficult to use in real-time scenarios. This paper proposes two models for text generation from floor plan images: description synthesis from image cue (DSIC) and transformer-based description generation (TBDG). Both models take advantage of modern deep neural networks for visual feature extraction and text generation; they differ in how they take input from the floor plan image. DSIC uses only visual features automatically extracted by a deep neural network, while TBDG learns from textual captions extracted from input floor plan images together with their paragraphs. The specific keywords generated in TBDG, and the way they are grounded in the paragraphs, make it more robust on general, unseen floor plan images. Experiments were carried out on a large-scale publicly available dataset and compared with state-of-the-art techniques to demonstrate the superiority of the proposed models.
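The abstract describes a common pattern: a deep network encodes the floor plan image into visual features, and a language model decodes those features into text. Below is a minimal PyTorch sketch of that generic encoder-decoder pipeline; the class names, layer sizes, and ResNet-18 backbone are illustrative assumptions for exposition, not the authors' DSIC or TBDG architecture.

```python
# A minimal sketch of a generic visual-encoder / text-decoder pipeline,
# assuming a ResNet-18 backbone and an LSTM decoder. It illustrates the
# pattern the abstract describes; it is NOT the authors' DSIC or TBDG model.
import torch
import torch.nn as nn
import torchvision.models as models


class VisualEncoder(nn.Module):
    """Encodes a floor plan image into a fixed-size feature vector."""

    def __init__(self, embed_dim: int = 256):
        super().__init__()
        backbone = models.resnet18(weights=None)  # any pretrained CNN would do
        self.cnn = nn.Sequential(*list(backbone.children())[:-1])  # drop classifier head
        self.proj = nn.Linear(backbone.fc.in_features, embed_dim)

    def forward(self, images: torch.Tensor) -> torch.Tensor:
        feats = self.cnn(images).flatten(1)  # (B, 512)
        return self.proj(feats)              # (B, embed_dim)


class DescriptionDecoder(nn.Module):
    """LSTM language model conditioned on the visual embedding."""

    def __init__(self, vocab_size: int, embed_dim: int = 256, hidden_dim: int = 512):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.lstm = nn.LSTM(embed_dim, hidden_dim, batch_first=True)
        self.out = nn.Linear(hidden_dim, vocab_size)

    def forward(self, visual_embed: torch.Tensor, captions: torch.Tensor) -> torch.Tensor:
        tokens = self.embed(captions)  # (B, T, embed_dim)
        # Prepend the image embedding as the first "token" of the sequence.
        inputs = torch.cat([visual_embed.unsqueeze(1), tokens], dim=1)
        hidden, _ = self.lstm(inputs)
        return self.out(hidden)        # (B, T+1, vocab_size) next-token logits


# Illustrative usage with random tensors (teacher forcing during training):
encoder, decoder = VisualEncoder(), DescriptionDecoder(vocab_size=5000)
images = torch.randn(2, 3, 224, 224)
caption_ids = torch.randint(0, 5000, (2, 12))
logits = decoder(encoder(images), caption_ids)  # (2, 13, 5000)
```

A TBDG-style variant would, per the abstract, replace the recurrent decoder with a transformer that expands keywords extracted from the plan into a full paragraph, which is what the authors credit for its robustness on unseen plans.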


Metadata
Title
Knowledge-driven description synthesis for floor plan interpretation
Authors
Shreya Goyal
Chiranjoy Chattopadhyay
Gaurav Bhatnagar
Publication date
26.04.2021
Publisher
Springer Berlin Heidelberg
Published in
International Journal on Document Analysis and Recognition (IJDAR) / Issue 1-2/2021
Print ISSN: 1433-2833
Electronic ISSN: 1433-2825
DOI
https://doi.org/10.1007/s10032-021-00367-3
