Abstract
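Google's WebTables and Deep Web Crawler identify structured data on the Web, found in HTML tables and in databases hidden behind HTML forms (the Deep Web), and deliver this otherwise inaccessible resource directly to end users.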
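One way to make form-gated content reachable is to "surface" it: precompute form submissions as ordinary URLs that a crawler can fetch and index. The sketch below assumes a GET form whose input names and candidate values are already known. The function `surface_urls`, the fields `make` and `zip`, and the URL `https://example.com/search` are hypothetical. It enumerates every combination of values and makes no attempt to decide which submissions are worth indexing, which a production crawler would need to do.

```python
from itertools import product
from urllib.parse import urlencode


def surface_urls(action_url, inputs, max_urls=100):
    """Enumerate GET-form submissions as crawlable URLs (hypothetical helper).

    inputs maps each form input name to a list of candidate values.
    Combinations are generated in a fixed order (input names sorted)
    and the result is capped at max_urls.
    """
    names = sorted(inputs)
    urls = []
    # Cartesian product of candidate values, one value per input.
    for combo in product(*(inputs[name] for name in names)):
        query = urlencode(list(zip(names, combo)))
        urls.append(f"{action_url}?{query}")
        if len(urls) >= max_urls:
            break
    return urls


if __name__ == "__main__":
    form = {"make": ["honda", "ford"], "zip": ["94043"]}
    for url in surface_urls("https://example.com/search", form):
        print(url)
```

Each generated URL is a plain link, so form results can be indexed like any other page. The number of combinations grows multiplicatively with the number of inputs, which is why the sketch caps output with `max_urls`.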
Index Terms
- Structured data on the web