Article

Breadth-first crawling yields high-quality pages

Authors:
Marc Najork, Compaq Systems Research Center, 130 Lytton Avenue, Palo Alto, CA
Janet L. Wiener, Compaq Systems Research Center, 130 Lytton Avenue, Palo Alto, CA

WWW '01: Proceedings of the 10th international conference on World Wide Web, May 2001, Pages 114–118
https://doi.org/10.1145/371920.371965

Published: 01 April 2001

References

1. K. Bharat, A. Broder, M. Henzinger, P. Kumar, and S. Venkatasubramanian. The connectivity server: Fast access to linkage information on the web. In Proceedings of the 7th International World Wide Web Conference, pages 469-477, Brisbane, Australia, April 1998. Elsevier Science.
2. S. Brin and L. Page. The anatomy of a large-scale hypertextual web search engine. In Proceedings of the 7th International World Wide Web Conference, pages 107-117, Brisbane, Australia, April 1998. Elsevier Science.
3. M. Burner. Crawling towards eternity: Building an archive of the world wide web. Web Techniques Magazine, 2(5):37-40, May 1997.
4. J. Cho, H. Garcia-Molina, and L. Page. Efficient crawling through URL ordering. In Proceedings of the 7th International World Wide Web Conference, pages 161-172, Brisbane, Australia, April 1998. Elsevier Science.
5. Google Inc. Press release: "Google launches world's largest search engine." June 26, 2000. Available at http://www.google.com/press/pressrel/pressrelease26.html
6. M. Henzinger, A. Heydon, M. Mitzenmacher, and M. Najork. On near-uniform URL sampling. In Proceedings of the 9th International World Wide Web Conference, pages 295-308, Amsterdam, Netherlands, May 2000. Elsevier Science.
7. A. Heydon and M. Najork. Mercator: A scalable, extensible web crawler. World Wide Web, 2(4):219-229, Dec. 1999.
8. J. Kleinberg. Authoritative sources in a hyperlinked environment. In Proceedings of the 9th ACM-SIAM Symposium on Discrete Algorithms, pages 668-677, San Francisco, CA, Jan. 1998.
9. P. Lyman, H. Varian, J. Dunn, A. Strygin, and K. Swearingen. How much information? School of Information Management and Systems, Univ. of California at Berkeley, 2000. Available at http://www.sims.berkeley.edu/how-much-info
10. Mercator Home Page. http://www.research.digital.com/SRC/mercator
11. J. L. Wiener, R. Wickremesinghe, M. Burrows, K. Randall, and R. Stata. Better link compression. Manuscript in progress. Compaq Systems Research Center, 2001.

Published in
WWW '01: Proceedings of the 10th international conference on World Wide Web
May 2001
770 pages
ISBN: 1581133480
DOI: 10.1145/371920
Chairmen:
Vincent Y. Shen, Hong Kong Univ. of Science and Technology
Nobuo Saito, Keio Univ., Japan
Michael R. Lyu, Chinese Univ. of Hong Kong, HK
Mary Ellen Zurko, Iris Associates, USA
Copyright © 2001 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee.
Publisher
Association for Computing Machinery
New York, NY, United States
Author Tags
PageRank
breadth-first search
crawl order
crawling
metric
page quality
Qualifiers
- Article
Conference

Acceptance Rates
Overall Acceptance Rate: 1,899 of 8,196 submissions, 23%
Article Metrics
- Total Citations: 207
- Total Downloads: 1,728
- Downloads (Last 12 months): 25
- Downloads (Last 6 weeks): 3