Published in: International Journal on Document Analysis and Recognition (IJDAR) 4/2015

01-12-2015 | Original Paper

g-DICE: graph mining-based document information content exploitation

Author: K. C. Santosh

Abstract

In this paper, we present a technique for extracting document information content (i.e. text fields) via graph mining. Real-world users first select a set of key text fields from the document image that they consider important. These fields are used to initialise a graph in which nodes are labelled with the field names along with other features such as size, type and number of words, and edges are attributed with the relative positioning between the nodes. This attributed relational graph is then used to mine similar graphs from document images, and each newly extracted graph is used to update the initial graph iteratively, producing a graph model. Graph models can therefore be employed in the absence of users. We have validated the proposed technique on a real-world industrial problem, where it achieves 86.64 % precision and 90.80 % recall when all zones are considered, viz. header, body and footer. More specifically, the proposed technique is well suited to table processing (i.e. extracting repeated patterns from a table), where it outperforms the state-of-the-art method by more than 3 %.
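
To make the graph initialisation described above concrete, the following is a minimal Python sketch, not the author's implementation. The names `Field`, `content_type`, `relative_position` and `build_arg` are hypothetical, the three content types and the four-way left/right/above/below relation are simplifying assumptions, and bounding boxes are assumed to use image coordinates with y increasing downwards.

```python
from dataclasses import dataclass
from itertools import permutations


@dataclass(frozen=True)
class Field:
    """A user-selected text field with its bounding box (hypothetical layout)."""
    name: str   # field label supplied by the user
    text: str   # recognised text content
    x: float    # left edge of the bounding box
    y: float    # top edge of the bounding box
    w: float    # box width
    h: float    # box height


def content_type(text: str) -> str:
    """Coarse content type; the three classes are an assumption of this sketch."""
    stripped = text.replace(" ", "")
    if stripped.replace(".", "").replace(",", "").isdigit():
        return "numeric"
    if stripped.isalpha():
        return "alphabetic"
    return "alphanumeric"


def relative_position(a: Field, b: Field) -> str:
    """Position of b relative to a, decided by the dominant centre offset."""
    dx = (b.x + b.w / 2) - (a.x + a.w / 2)
    dy = (b.y + b.h / 2) - (a.y + a.h / 2)
    if abs(dx) >= abs(dy):
        return "right" if dx > 0 else "left"
    return "below" if dy > 0 else "above"


def build_arg(fields):
    """Build an attributed relational graph as (node attributes, edge attributes)."""
    nodes = {
        f.name: {
            "size": (f.w, f.h),
            "type": content_type(f.text),
            "n_words": len(f.text.split()),
        }
        for f in fields
    }
    # One directed edge per ordered pair of distinct fields.
    edges = {
        (a.name, b.name): relative_position(a, b)
        for a, b in permutations(fields, 2)
    }
    return nodes, edges


# Illustrative input only; the field names and coordinates are made up.
fields = [
    Field("invoice_no", "INV 2015", 400, 20, 80, 12),
    Field("total", "1,250.00", 400, 700, 60, 12),
]
nodes, edges = build_arg(fields)
```

The sketch covers only the initial graph. Mining similar graphs from further documents and updating the model iteratively, as the abstract describes, would build on this representation.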

Metadata
Title
g-DICE: graph mining-based document information content exploitation
Author
K. C. Santosh
Publication date
01-12-2015
Publisher
Springer Berlin Heidelberg
Published in
International Journal on Document Analysis and Recognition (IJDAR) / Issue 4/2015
Print ISSN: 1433-2833
Electronic ISSN: 1433-2825
DOI
https://doi.org/10.1007/s10032-015-0253-z
