research-article

Structured data on the web

Published: 1 February 2011
Abstract

Google's Web Tables and Deep Web Crawler identify and deliver this otherwise inaccessible resource directly to end users.

Published in

Communications of the ACM, Volume 54, Issue 2
February 2011
115 pages
ISSN: 0001-0782
EISSN: 1557-7317
DOI: 10.1145/1897816

Copyright © 2011 ACM

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery
New York, NY, United States
