ABSTRACT
The pages and hyperlinks of the World-Wide Web may be viewed as nodes and edges in a directed graph. This graph has about a billion nodes today, several billion links, and appears to grow exponentially with time. There are many reasons—mathematical, sociological, and commercial—for studying the evolution of this graph. We first review a set of algorithms that operate on the Web graph, addressing problems from Web search, automatic community discovery, and classification. We then recall a number of measurements and properties of the Web graph. Noting that traditional random graph models do not explain these observations, we propose a new family of random graph models.
- 1.S. Abiteboul, D. Quass, J. McHugh, J. Widom, and J. Wiener. The Lorel Query language for semistructured data. Intl. J. on Digital Libraries, 1(1):68-88, 1997.Google ScholarCross Ref
- 2.R. Albert, H. Jeong, and A.-L. Barabasi. Diameter of the World Wide Web. Nature 401:130-131, 1999.Google ScholarCross Ref
- 3.W. Aiello, P. Chung, and L. Lu. A random graph model for massive graphs. Proc. A CM Syrup. on Theory of Computing, 2000. To appear. Google ScholarDigital Library
- 4.G. O. Arocena, A. O. Mendelzon, and G. A. Mihaila. Applications of a Web query language. Proc. 6th WWW Conf., 1997. Google ScholarDigital Library
- 5.K. Bharat and A. Broder. A technique for measuring the relative size and overlap of public Web search engines. Proc. 7th WWW Conf., 1998. Google ScholarDigital Library
- 6.K. Bharat and M. R. Henzinger. Improved algorithms for topic distillation in a hyperlinked environment. Proc. A CM SIGIR, 1998. Google ScholarDigital Library
- 7.S. Brin and L. Page. The anatomy of a large-scale hypertextual Web search engine. Proc. 7th WWW Conf., 1998. Google ScholarDigital Library
- 8.B. Bollob~s. Random Graphs. Academic Press, 1985.Google Scholar
- 9.A. Z. Broder, S. R. Kumar, F. Maghoul, P. Raghavan, S. Rajagopalan, R. Stata, A. Tomkins, and J. Wiener. Graph structure in the web: experiments and models. Proc. 9th WWW Conf., 2000. To appear. Google ScholarDigital Library
- 10.J. Carri~re and R. Kazman. WebQuery: Searching and visualizing the Web through connectivity. Proc. 6th WWW Conf., 1997. Google ScholarDigital Library
- 11.S. Chakrabarti, B. Dora, D. Gibson, J. Kleinberg, P. Raghavan, and S. Rajagopalan. Automatic resource compilation by analyzing hyperlink structure and associated text. Proc. 7th WWW Conf., 1998. Google ScholarDigital Library
- 12.S. Chakrabarti, B. Dora, and M. van den Berg. Focused crawling: A new approach for topic-specific resource discovery. Proc. 8th WWW Conf., 1999. Google ScholarDigital Library
- 13.S. Chakrabarti, B. Dora, S. R. Kumar, P. Raghavan, S. Rajagopalan, and A. Tomkins. Experiments in topic distillation. SIGIR workshop on Hypertext IR, 1998.Google Scholar
- 14.S. Chakrabarti, B. Dora, and P. Indyk. Enhanced hypertext classification using hyperlinks. Proc. A CM SIGMOD, 1998. Google ScholarDigital Library
- 15.H. T. Davis. The Analysis of Economic Time Series. Principia Press, 1941.Google Scholar
- 16.R. Downey and M. Fellows. Parametrized computational feasibility. In Feasible Mathematics II, P. Clote and J. Remmel, eds., Birkhauser, 1994.Google Scholar
- 17.L. Egghe and R. Rousseau. Introduction toGoogle Scholar
- 18.D. Florescu, A. Levy, and A. Mendelzon. Database techniques for the World Wide Web: A survey. SIGMOD Record, 27(3): 59-74, 1998. Google ScholarDigital Library
- 19.E. Garfield. Citation analysis as a tool in journal evaluation. Science, 178:471-479, 1972.Google ScholarCross Ref
- 20.N. Gilbert. A simulation of the structure of academic science. Sociological Research Online, 2(2), 1997.Google Scholar
- 21.G. Golub and C. F. Van Loan. Matrix Oomputations. Johns Hopkins University Press, 1989.Google Scholar
- 22.M. M. Kessler. Bibliographic coupling between scientific papers. American Documentation, 14:10-25, 1963.Google ScholarCross Ref
- 23.J. Kleinberg. Authoritative sources in a hyperlinked environment. J. o} the A CM, 1999, to appear. Also appears as IBM Research Report RJ 10076(91892) May 1997. Google ScholarDigital Library
- 24.D. Konopnicki and O. Shmueli. Information gathering on the World Wide Web: the W3QL query language and the W3QS system. Trans. on Database Systems, 1998. Google ScholarDigital Library
- 25.S. R. Kumar, P. Raghavan, S. Rajagopalan, and A. Tomkins. Trawling emerging cyber-communities automatically. Proc. 8~h WWW ConI., 1999. Google ScholarDigital Library
- 26.S. R. Kumar, P. Raghavan, S. Rajagopalan, and A. Tomkins. Extracting large-scale knowledge bases from the web. Proc. VLDB, 1999. Google ScholarDigital Library
- 27.L. V. S. Lakshmanan, F. Sadri, and I. N. Subramanian. A declarative approach to querying and restructuring the World Wide Web. Post-ICDE Workshop on RIDE, 1996. Google ScholarDigital Library
- 28.R. Larson. Bibliometrics of the World Wide Web: An exploratory analysis of the intellectual structure of cyberspace. Ann. Meeting o} the American Soc. In}o. Sci., 1996.Google Scholar
- 29.A. J. Lotka. The frequency distribution of scientific productivity. J. o} the Washington Acad. o} Sci., 16:317, 1926.Google Scholar
- 30.A. Mendelzon, G. Mihaila, and T. Milo. Querying the World Wide Web. J. o} Digital Libraries, 1(1):68-88, 1997.Google Scholar
- 31.A. Mendelzon and P. Wood. Finding regular simple paths in graph databases. SIAM J. Oomp., 24(6): 1235-1258, 1995. Google ScholarDigital Library
- 32.E. Spertus. ParaSite: Mining structural information on the Web. Proc. 6th WWW Con}., 1997. Google ScholarDigital Library
- 33.G. K. Zipf. Human behavior and the principle of least effort. New York: Ha}her, 1949.Google Scholar
Index Terms
- The Web as a graph
Recommendations
Web graph analyzer tool
valuetools '06: Proceedings of the 1st international conference on Performance evaluation methodolgies and toolsWe present the software tool "Web Graph Analyzer". This tool is designed to perform a comprehensive analysis of the Web Graph structure. By Web Graph we mean a graph whose vertices are Web pages and whose edges are hyper-links. With the help of the Web ...
Graph structure in the web: aggregated by pay-level domain
WebSci '14: Proceedings of the 2014 ACM conference on Web sciencePrevious research on the overall graph structure of the World Wide Web mostly focused on the page level, meaning that the graph that directly results from hyperlinks between individual web pages was analyzed. This paper aims to provide additional ...
Stochastic models for the Web graph
FOCS '00: Proceedings of the 41st Annual Symposium on Foundations of Computer ScienceThe Web may be viewed as a directed graph each of whose vertices is a static HTML Web page, and each of whose edges corresponds to a hyperlink from one Web page to another. We propose and analyze random graph models inspired by a series of empirical ...
Comments