research-article

SXPath: extending XPath towards spatial querying on web documents

Published: 01 November 2010

Abstract

Querying data from presentation formats such as HTML, for purposes such as information extraction, requires considering both tree structures and the spatial relationships between laid-out elements. The underlying rationale is that the tree structures being rendered are frequently very involved and undergo more frequent updates than the resulting layout structure. In this paper, we therefore present Spatial XPath (SXPath), an extension of XPath 1.0 that adds spatial navigation primitives to the language, resulting in conceptually simpler queries on Web documents. The SXPath language is based on a combination of a spatial algebra with formal descriptions of XPath navigation, and it maintains polynomial-time combined complexity. Practical experiments demonstrate the usability of SXPath.
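To make the idea of spatial navigation concrete, here is a minimal sketch. It is not SXPath's actual syntax or semantics. It assumes that each rendered element is reduced to an axis-aligned bounding box, and that a spatial relation between two boxes is expressed, as in the rectangle algebra, as a pair of Allen interval relations, one per axis. The names `Box`, `allen`, `rect_relation` and `right_of` are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Box:
    """Hypothetical model of a rendered element: its name and bounding box.

    Screen coordinates are assumed, so y grows downward."""
    name: str
    x1: float
    x2: float
    y1: float
    y2: float


def allen(a1: float, a2: float, b1: float, b2: float) -> str:
    """Allen's interval relation of [a1, a2] with respect to [b1, b2].

    Assumes non-degenerate intervals (a1 < a2 and b1 < b2)."""
    if a2 < b1:
        return "before"
    if a2 == b1:
        return "meets"
    if b2 < a1:
        return "after"
    if b2 == a1:
        return "met_by"
    if a1 == b1 and a2 == b2:
        return "equals"
    if a1 == b1:
        return "starts" if a2 < b2 else "started_by"
    if a2 == b2:
        return "finishes" if a1 > b1 else "finished_by"
    if b1 < a1 and a2 < b2:
        return "during"
    if a1 < b1 and b2 < a2:
        return "contains"
    # Only partial overlaps remain at this point.
    return "overlaps" if a1 < b1 else "overlapped_by"


# Relations under which two intervals share no interior point.
DISJOINT = {"before", "meets", "after", "met_by"}


def rect_relation(a: Box, b: Box) -> Tuple[str, str]:
    """Rectangle-algebra relation of a w.r.t. b: (x-axis relation, y-axis relation)."""
    return (allen(a.x1, a.x2, b.x1, b.x2), allen(a.y1, a.y2, b.y1, b.y2))


def right_of(context: Box, boxes: List[Box]) -> List[Box]:
    """Hypothetical spatial axis: boxes lying entirely to the right of the
    context box whose vertical extent overlaps the context's."""
    result = []
    for b in boxes:
        if b is context:
            continue
        rx, ry = rect_relation(b, context)
        if rx in ("after", "met_by") and ry not in DISJOINT:
            result.append(b)
    return result


if __name__ == "__main__":
    label = Box("label", 0, 50, 0, 20)
    value = Box("value", 60, 100, 0, 20)
    footer = Box("footer", 0, 100, 30, 40)
    page = [label, value, footer]
    print([b.name for b in right_of(label, page)])
```

In a query language, an axis like `right_of` could be used in place of, or combined with, tree axes such as `following-sibling`. A query could then select the value printed next to a label regardless of how deeply the two elements are nested in the DOM. This is the kind of robustness to tree-structure changes that the abstract motivates.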

Published in: Proceedings of the VLDB Endowment, Volume 4, Issue 2 (November 2010), 105 pages
Publisher: VLDB Endowment


