Abstract
Google's Web Tables and Deep Web Crawler identify and deliver this otherwise inaccessible resource directly to end users.
- Barbosa, L. and Freire, J. Siphoning Hidden-Web data through keyword-based interfaces. In Proceedings of the Brazilian Symposium on Databases, 2004, 309--321.Google Scholar
- Bergman. M.K. The Deep Web: Surfacing hidden value. Journal of Electronic Publishing 7, 1 (2001).Google ScholarCross Ref
- Cafarella, M.J., Halevy, A.Y., and Khoussainova, N. Data integration for the relational Web. Proceedings of the VLDB Endowment 2, 1 (2009), 1090--1101. Google ScholarDigital Library
- Cafarella, M.J., Halevy, A.Y., Wang, D.Z., Wu, E., and Zhang, Y. WebTables: Exploring the power of tables on the Web. Proceedings of the VLDB Endowment 1, 1 (Aug. 2008), 538--549. Google ScholarDigital Library
- Cafarella, M.J., Halevy, A.Y., Zhang, Y., Wang, D.Z., and Wu, E. Uncovering the relational Web. In Proceedings of the 11th International Workshop on the Web and Databases (Vancouver, B.C., June 13, 2008).Google Scholar
- Callan, J.P. and Connell, M.E. Query-based sampling of text databases. ACM Transactions on Information Systems 19, 2 (2001), 97--130. Google ScholarDigital Library
- Cars.com (faq); http://siy.cars.com/siy/qsg/faqgeneralinfo.jsp#howmanyadsGoogle Scholar
- Cazoodle apartment search; http://apartments.cazoodle.com/Google Scholar
- Chang, K.C.-C., He, B., and Zhang, Z. Toward large-scale integration: Building a metaquerier over databases on the Web. In Proceedings of the Conference on Innovative Data Systems Research (Asilomar, CA, Jan. 2005).Google Scholar
- Chen, H., Tsai, S., and Tsai, J. Mining tables from large-scale html texts. In Proceedings of the 18th International Conference on Computational Linguistics (Saarbrucken, Germany, July 31--Aug. 4, 2000), 166--172. Google ScholarDigital Library
- Elmeleegy, H., Madhavan, J., and Halevy, A. Harvesting relational tables from lists on the Web. Proceedings of the VLDB Endowment 2, 1 (2009), 1078--1089. Google ScholarDigital Library
- Gatterbauer, W., Bohunsky, P., Herzog, M., Krüupl, B., and Pollak, B. Towards domain-independent information extraction from Web tables. In Proceedings of the 16th International World Wide Web Conference (Banff, Canada, May 8--12, 2007), 71--80. Google ScholarDigital Library
- Gonzalez, H., Halevy, A., Jensen, C., Langen, A., Madhavan, J., Shapley, R., Shen, W., and Goldberg-Kidon, J. Google Fusion Tables: Web-centered data management and collaboration. In Proceedings of the SIGMOD ACM Special Interest Group on Management of Data (Indianapolis, 2010). ACM Press, New York, 2010, 1061--1066. Google ScholarDigital Library
- He, B., Patel, M., Zhang, Z., and Chang, K.C.-C. Accessing the Deep Web. Commun. ACM 50, 5 (May 2007), 94--101. Google ScholarDigital Library
- Ipeirotis, P.G. and Gravano, L. Distributed search over the Hidden Web: Hierarchical database sampling and selection. In Proceedings of the 28th International Conference on Very Large Databases (Hong Kong, Aug. 20--23, 2002), 394--405. Google ScholarDigital Library
- Limaye, G., Sarawagi, S., and Chakrabarti, S. Annotating and searching Web tables using entities, types, and relationships. Proceedings of the VLDB Endowment 3, 1 (2010), 1338--1347. Google ScholarDigital Library
- Madhavan, J., Ko, D., Kot, L., Ganapathy, V., Rasmussen, A., and Halevy, A.Y. Google's Deep Web Crawl. Proceedings of the VLDB Endowment 1, 1 (2008), 1241--1252. Google ScholarDigital Library
- Madhavan, J., Cohen, S., Dong, X.L., Halevy, A.Y., Jeffery, S.R., Ko, D., and Yu, C. Web-scale data integration: You can afford to pay as you go. In Proceedings of the Second Conference on Innovative Data Systems Research (Asilomar, CA, Jan. 7--10, 2007). 342--350.Google Scholar
- Ntoulas, A., Zerfos, P., and Cho, J. Downloading textual Hidden Web content through keyword queries. In Proceedings of the Joint Conference on Digital Libraries (Denver, June 7--11, 2005), 100--109. Google ScholarDigital Library
- Raghavan, S. and Garcia-Molina, H. Crawling the Hidden Web. In Proceedings of the 27th International Conference on Very Large Databases (Rome, Italy, Sept. 11--14, 2001), 129--138. Google ScholarDigital Library
- Trulia; http://www.trulia.com/Google Scholar
- Wang, Y. and Hu, J. A machine-learning-based approach for table detection on the Web. In Proceedings of the 11th International World Wide Web Conference (Honolulu, 2002), 242--250. Google ScholarDigital Library
- Zanibbi, R., Blostein, D., and Cordy, J. A survey of table recognition: Models, observations, transformations, and inferences. International Journal on Document Analysis and Recognition 7, 1 (2004), 1--16. Google ScholarDigital Library
Recommendations
Extracting structured data from Web pages
SIGMOD '03: Proceedings of the 2003 ACM SIGMOD international conference on Management of dataMany web sites contain large sets of pages generated using a common template or layout. For example, Amazon lays out the author, title, comments, etc. in the same way in all its book pages. The values used to generate the pages (e.g., the author, title,...
Keyword search on structured and semi-structured data
SIGMOD '09: Proceedings of the 2009 ACM SIGMOD International Conference on Management of dataEmpowering users to access databases using simple keywords can relieve the users from the steep learning curve of mastering a structured query language and understanding complex and possibly fast evolving data schemas. In this tutorial, we give an ...
Bigtable: A Distributed Storage System for Structured Data
Bigtable is a distributed storage system for managing structured data that is designed to scale to a very large size: petabytes of data across thousands of commodity servers. Many projects at Google store data in Bigtable, including web indexing, Google ...
Comments