- 1.K.Bharat,A.Broder,M.Henzinger,P.Kumar,and S.Venkatasubramanian.The connecti ity ser er:Fast access to linkage information on the web.In Proceedings of the 7th International Worl d Wide Web Conference pages 469 -477,Brisbane,Australia,April 1998.Elsevier Science. Google ScholarDigital Library
- 2.S.Brin and L.Page.The anatomy of a large-scale hypertextual web search engine.In Proceedings of the 7th International World Wide Web Conference pages 107 -117,Brisbane,Australia,April 1998.Elsevier Science. Google ScholarDigital Library
- 3.M.Burner.Crawling towards eternity:Building an archive of the world wide web.Web Techniques Magazine 2(5):37 -40,May 1997.Google Scholar
- 4.J.Cho,H.Garcia-Molina,and L.Page.E .cient crawling through URL ordering.In Proceedings of the 7th International World Wide Web Conference pages 161 -172,Brisbane,Australia,April 1998.Elsevier Science. Google ScholarDigital Library
- 5.Google Inc.Press release:"Google launches world 's largest search engine."June 26,2000.A ailable at http://www.google.com/press/pressrel/pressrelease26.htmlGoogle Scholar
- 6.M.Henzinger,A.Heydon,M.Mitzenmacher,and M.Najork.On near-uniform URL sampling.In Proceedings of the 9th International Worl d Wide Web Conference pages 295 -308,Amsterdam,Netherlands, May 2000.Elsevier Science. Google ScholarDigital Library
- 7.A.Heydon and M.Najork.Mercator:A scalable, extensible web crawler.World Wide Web 2(4):219 -229,Dec.1999. Google ScholarDigital Library
- 8.J.Kleinberg.Authoritati e sources in a hyperlinked en ironment.In Proceedings of the 9th ACM-SIAM Symposium on Discrete Algorithms pages 668 -677, San Francisco,CA,Jan.1998. Google ScholarDigital Library
- 9.P.Lyman,H.Varian,J.Dunn,A.Strygin,and K.Swearingen.How much information?School of Information Management and Systems,Uni .of California at Berkeley,2000.A ailable at http://www.sims.berkeley.edu/how-much-infoGoogle Scholar
- 10.Mercator Home Page. http://www.research.digital.com/SRC/mercatorGoogle Scholar
- 11.J.L.Wiener,R.Wickremesinghe,M.Burrows, K.Randall,and R.Stata.Better link compression. Manuscript in progress.Compaq Systems Research Center,2001.Google Scholar
Index Terms
- Breadth-first crawling yields high-quality pages
Recommendations
Random web crawls
WWW '07: Proceedings of the 16th international conference on World Wide WebThis paper proposes a random Web crawl model. A Web crawl is a (biased and partial) image of the Web. This paper deals with the hyperlink structure, i.e. a Web crawl is a graph, whose vertices are the pages and whose edges are the hypertextual links. Of ...
Incorporating the surfing behavior of web users into pagerank
CIKM '13: Proceedings of the 22nd ACM international conference on Information & Knowledge ManagementIn large-scale commercial web search engines, estimating the importance of a web page is a crucial ingredient in ranking web search results. So far, to assess the importance of web pages, two different types of feedback have been taken into account, ...
Intelligent crawling of web applications for web archiving
WWW '12 Companion: Proceedings of the 21st International Conference on World Wide WebThe steady growth of the World Wide Web raises challenges regarding the preservation of meaningful Web data. Tools used currently by Web archivists blindly crawl and store Web pages found while crawling, disregarding the kind of Web site currently ...
Comments