ABSTRACT
In this paper, we propose a precise, comprehensive model of table processing that aims to remedy some of the problems in how the literature discusses table processing. The model targets application-independent, end-to-end table processing and thus encompasses a large subset of the work in the area. The model can aid the design of table processing systems (we provide an example of such a system), serve as a reference framework for evaluating the performance of such systems, and help clarify terminological differences in the table processing literature.
Index Terms
- TEXUS: A Task-based Approach for Table Extraction and Understanding