Article

Outlink estimation for pagerank computation under missing data

Authors:
Sreangsu Acharyya, University of Texas, Austin, TX
Joydeep Ghosh, University of Texas, Austin, TX

WWW Alt. '04: Proceedings of the 13th international World Wide Web conference on Alternate track papers & posters, May 2004, Pages 486–487
https://doi.org/10.1145/1013367.1013538

Published: 19 May 2004

ABSTRACT

The enormity and rapid growth of the web graph force quantities such as its pagerank to be computed under missing information, namely the outlinks of pages that have not yet been crawled. This paper examines the role played by the size and distribution of this missing data in determining the accuracy of the computed pagerank, focusing on questions such as (i) the accuracy of pageranks under missing information, (ii) the size at which a crawl process may be aborted while still ensuring reasonable accuracy of pageranks, and (iii) algorithms to estimate pageranks under such missing information. The first two questions are addressed on the basis of certain simple bounds relating the expected distance between the true and computed pageranks to the size of the missing data. The third question is explored by devising algorithms to predict the pageranks when full information is not available. A key feature of the proposed "dangling link estimation" and "clustered link estimation" algorithms is that they do not need to run the pagerank iteration afresh once the outlinks have been estimated.
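
The setting described in the abstract can be illustrated with a minimal sketch. It is not the paper's "dangling link estimation" or "clustered link estimation" algorithm. It is only a standard power-iteration pagerank in which pages with no known outlinks spread their rank uniformly, which is a common convention assumed here. The toy graph, the damping value 0.85, and the function names `pagerank` and `l1_distance` are all hypothetical. The sketch computes pageranks on a full graph and on a partial crawl with some outlinks missing, then measures the L1 distance between the two vectors.

```python
def pagerank(outlinks, n, damping=0.85, iters=100):
    """Power iteration over n pages.

    outlinks maps a page index to the list of pages it links to.
    Pages with no known outlinks (dangling or uncrawled) are assumed
    to spread their rank uniformly over all pages.
    """
    rank = [1.0 / n] * n
    for _ in range(iters):
        # Total rank currently held by pages without known outlinks.
        dangling = sum(rank[p] for p in range(n) if not outlinks.get(p))
        # Teleportation mass plus the uniformly redistributed dangling mass.
        base = (1.0 - damping) / n + damping * dangling / n
        new = [base] * n
        for p, targets in outlinks.items():
            if targets:
                share = damping * rank[p] / len(targets)
                for q in targets:
                    new[q] += share
        rank = new
    return rank


def l1_distance(a, b):
    """L1 distance between two rank vectors of equal length."""
    return sum(abs(x - y) for x, y in zip(a, b))


# Hypothetical 5-page graph (toy data, not from the paper).
full_graph = {0: [1, 2], 1: [2], 2: [0, 3], 3: [4], 4: [0]}

# Simulate an aborted crawl: pages 3 and 4 are known to exist
# but their outlinks have not been fetched yet.
partial_graph = {p: t for p, t in full_graph.items() if p not in (3, 4)}

true_rank = pagerank(full_graph, 5)
partial_rank = pagerank(partial_graph, 5)
error = l1_distance(true_rank, partial_rank)
print("L1 distance between true and partial pageranks:", error)
```

Growing or shrinking the set of uncrawled pages in `partial_graph` gives a rough feel for how the distance depends on the amount of missing data, which is the relationship the abstract's bounds address.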

Index Terms

Outlink estimation for pagerank computation under missing data
1. Information systems
  1. Information retrieval
    1. Retrieval tasks and goals
      1. Clustering and classification
  2. Information systems applications
    1. Data mining
      1. Clustering

Published in
WWW Alt. '04: Proceedings of the 13th international World Wide Web conference on Alternate track papers & posters
May 2004
532 pages
ISBN:1581139128
DOI:10.1145/1013367
Conference Chairs: Stuart Feldman (IBM Research), Mike Uretsky (New York University)
Program Chairs: Marc Najork (Microsoft Research), Craig Wills (Worcester Polytechnic Institute)
Copyright © 2004 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]
Publisher
Association for Computing Machinery
New York, NY, United States
Acceptance Rates
Overall Acceptance Rate: 1,899 of 8,196 submissions, 23%