Published in: International Journal on Document Analysis and Recognition (IJDAR) 1-2/2021

26.04.2021 | Original Paper

Knowledge-driven description synthesis for floor plan interpretation

Authors: Shreya Goyal, Chiranjoy Chattopadhyay, Gaurav Bhatnagar


Abstract

Image captioning is a well-known problem in AI. Caption generation from floor plan images has applications in indoor path planning, real estate, and architectural design. Several methods have been explored in the literature for generating captions or semi-structured descriptions from floor plan images. Since a caption alone is insufficient to capture fine-grained details, researchers have also proposed generating descriptive paragraphs from images. However, these descriptions have a rigid structure and lack flexibility, making them difficult to use in real-time scenarios. This paper proposes two models for text generation from floor plan images: description synthesis from image cue (DSIC) and transformer-based description generation (TBDG). Both models take advantage of modern deep neural networks for visual feature extraction and text generation; they differ in how they take input from the floor plan image. DSIC uses only visual features automatically extracted by a deep neural network, while TBDG learns from textual captions extracted from input floor plan images together with their paragraphs. The specific keywords generated in TBDG, and the way they are grounded in the paragraphs, make it more robust on general, unseen floor plan images. Experiments were carried out on a large-scale publicly available dataset and compared with state-of-the-art techniques to demonstrate the superiority of the proposed models.
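The abstract describes a common pattern: a deep network encodes the floor plan image into visual features, and a language model decodes those features into text. Below is a minimal PyTorch sketch of that generic encoder-decoder pipeline; the class names, layer sizes, and ResNet-18 backbone are illustrative assumptions for exposition, not the authors' DSIC or TBDG architecture.

```python
# A minimal sketch of a generic visual-encoder / text-decoder pipeline,
# assuming a ResNet-18 backbone and an LSTM decoder. It illustrates the
# pattern the abstract describes; it is NOT the authors' DSIC or TBDG model.
import torch
import torch.nn as nn
import torchvision.models as models


class VisualEncoder(nn.Module):
    """Encodes a floor plan image into a fixed-size feature vector."""

    def __init__(self, embed_dim: int = 256):
        super().__init__()
        backbone = models.resnet18(weights=None)  # any pretrained CNN would do
        self.cnn = nn.Sequential(*list(backbone.children())[:-1])  # drop classifier head
        self.proj = nn.Linear(backbone.fc.in_features, embed_dim)

    def forward(self, images: torch.Tensor) -> torch.Tensor:
        feats = self.cnn(images).flatten(1)  # (B, 512)
        return self.proj(feats)              # (B, embed_dim)


class DescriptionDecoder(nn.Module):
    """LSTM language model conditioned on the visual embedding."""

    def __init__(self, vocab_size: int, embed_dim: int = 256, hidden_dim: int = 512):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.lstm = nn.LSTM(embed_dim, hidden_dim, batch_first=True)
        self.out = nn.Linear(hidden_dim, vocab_size)

    def forward(self, visual_embed: torch.Tensor, captions: torch.Tensor) -> torch.Tensor:
        tokens = self.embed(captions)  # (B, T, embed_dim)
        # Prepend the image embedding as the first "token" of the sequence.
        inputs = torch.cat([visual_embed.unsqueeze(1), tokens], dim=1)
        hidden, _ = self.lstm(inputs)
        return self.out(hidden)        # (B, T+1, vocab_size) next-token logits


# Illustrative usage with random tensors (teacher forcing during training):
encoder, decoder = VisualEncoder(), DescriptionDecoder(vocab_size=5000)
images = torch.randn(2, 3, 224, 224)
caption_ids = torch.randint(0, 5000, (2, 12))
logits = decoder(encoder(images), caption_ids)  # (2, 13, 5000)
```

A TBDG-style variant would, per the abstract, replace the recurrent decoder with a transformer that expands keywords extracted from the plan into a full paragraph, which is what the authors credit for its robustness on unseen plans.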


Metadata
Title
Knowledge-driven description synthesis for floor plan interpretation
Authors
Shreya Goyal
Chiranjoy Chattopadhyay
Gaurav Bhatnagar
Publication date
26.04.2021
Publisher
Springer Berlin Heidelberg
Published in
International Journal on Document Analysis and Recognition (IJDAR) / Issue 1-2/2021
Print ISSN: 1433-2833
Electronic ISSN: 1433-2825
DOI
https://doi.org/10.1007/s10032-021-00367-3
