skip to main content
10.1145/335168.335170acmconferencesArticle/Chapter ViewAbstractPublication PagespodsConference Proceedingsconference-collections
Article
Free Access

The Web as a graph

Published:01 May 2000Publication History

ABSTRACT

The pages and hyperlinks of the World-Wide Web may be viewed as nodes and edges in a directed graph. This graph has about a billion nodes today, several billion links, and appears to grow exponentially with time. There are many reasons—mathematical, sociological, and commercial—for studying the evolution of this graph. We first review a set of algorithms that operate on the Web graph, addressing problems from Web search, automatic community discovery, and classification. We then recall a number of measurements and properties of the Web graph. Noting that traditional random graph models do not explain these observations, we propose a new family of random graph models.

References

  1. 1.S. Abiteboul, D. Quass, J. McHugh, J. Widom, and J. Wiener. The Lorel Query language for semistructured data. Intl. J. on Digital Libraries, 1(1):68-88, 1997.Google ScholarGoogle ScholarCross RefCross Ref
  2. 2.R. Albert, H. Jeong, and A.-L. Barabasi. Diameter of the World Wide Web. Nature 401:130-131, 1999.Google ScholarGoogle ScholarCross RefCross Ref
  3. 3.W. Aiello, P. Chung, and L. Lu. A random graph model for massive graphs. Proc. A CM Syrup. on Theory of Computing, 2000. To appear. Google ScholarGoogle ScholarDigital LibraryDigital Library
  4. 4.G. O. Arocena, A. O. Mendelzon, and G. A. Mihaila. Applications of a Web query language. Proc. 6th WWW Conf., 1997. Google ScholarGoogle ScholarDigital LibraryDigital Library
  5. 5.K. Bharat and A. Broder. A technique for measuring the relative size and overlap of public Web search engines. Proc. 7th WWW Conf., 1998. Google ScholarGoogle ScholarDigital LibraryDigital Library
  6. 6.K. Bharat and M. R. Henzinger. Improved algorithms for topic distillation in a hyperlinked environment. Proc. A CM SIGIR, 1998. Google ScholarGoogle ScholarDigital LibraryDigital Library
  7. 7.S. Brin and L. Page. The anatomy of a large-scale hypertextual Web search engine. Proc. 7th WWW Conf., 1998. Google ScholarGoogle ScholarDigital LibraryDigital Library
  8. 8.B. Bollob~s. Random Graphs. Academic Press, 1985.Google ScholarGoogle Scholar
  9. 9.A. Z. Broder, S. R. Kumar, F. Maghoul, P. Raghavan, S. Rajagopalan, R. Stata, A. Tomkins, and J. Wiener. Graph structure in the web: experiments and models. Proc. 9th WWW Conf., 2000. To appear. Google ScholarGoogle ScholarDigital LibraryDigital Library
  10. 10.J. Carri~re and R. Kazman. WebQuery: Searching and visualizing the Web through connectivity. Proc. 6th WWW Conf., 1997. Google ScholarGoogle ScholarDigital LibraryDigital Library
  11. 11.S. Chakrabarti, B. Dora, D. Gibson, J. Kleinberg, P. Raghavan, and S. Rajagopalan. Automatic resource compilation by analyzing hyperlink structure and associated text. Proc. 7th WWW Conf., 1998. Google ScholarGoogle ScholarDigital LibraryDigital Library
  12. 12.S. Chakrabarti, B. Dora, and M. van den Berg. Focused crawling: A new approach for topic-specific resource discovery. Proc. 8th WWW Conf., 1999. Google ScholarGoogle ScholarDigital LibraryDigital Library
  13. 13.S. Chakrabarti, B. Dora, S. R. Kumar, P. Raghavan, S. Rajagopalan, and A. Tomkins. Experiments in topic distillation. SIGIR workshop on Hypertext IR, 1998.Google ScholarGoogle Scholar
  14. 14.S. Chakrabarti, B. Dora, and P. Indyk. Enhanced hypertext classification using hyperlinks. Proc. A CM SIGMOD, 1998. Google ScholarGoogle ScholarDigital LibraryDigital Library
  15. 15.H. T. Davis. The Analysis of Economic Time Series. Principia Press, 1941.Google ScholarGoogle Scholar
  16. 16.R. Downey and M. Fellows. Parametrized computational feasibility. In Feasible Mathematics II, P. Clote and J. Remmel, eds., Birkhauser, 1994.Google ScholarGoogle Scholar
  17. 17.L. Egghe and R. Rousseau. Introduction toGoogle ScholarGoogle Scholar
  18. 18.D. Florescu, A. Levy, and A. Mendelzon. Database techniques for the World Wide Web: A survey. SIGMOD Record, 27(3): 59-74, 1998. Google ScholarGoogle ScholarDigital LibraryDigital Library
  19. 19.E. Garfield. Citation analysis as a tool in journal evaluation. Science, 178:471-479, 1972.Google ScholarGoogle ScholarCross RefCross Ref
  20. 20.N. Gilbert. A simulation of the structure of academic science. Sociological Research Online, 2(2), 1997.Google ScholarGoogle Scholar
  21. 21.G. Golub and C. F. Van Loan. Matrix Oomputations. Johns Hopkins University Press, 1989.Google ScholarGoogle Scholar
  22. 22.M. M. Kessler. Bibliographic coupling between scientific papers. American Documentation, 14:10-25, 1963.Google ScholarGoogle ScholarCross RefCross Ref
  23. 23.J. Kleinberg. Authoritative sources in a hyperlinked environment. J. o} the A CM, 1999, to appear. Also appears as IBM Research Report RJ 10076(91892) May 1997. Google ScholarGoogle ScholarDigital LibraryDigital Library
  24. 24.D. Konopnicki and O. Shmueli. Information gathering on the World Wide Web: the W3QL query language and the W3QS system. Trans. on Database Systems, 1998. Google ScholarGoogle ScholarDigital LibraryDigital Library
  25. 25.S. R. Kumar, P. Raghavan, S. Rajagopalan, and A. Tomkins. Trawling emerging cyber-communities automatically. Proc. 8~h WWW ConI., 1999. Google ScholarGoogle ScholarDigital LibraryDigital Library
  26. 26.S. R. Kumar, P. Raghavan, S. Rajagopalan, and A. Tomkins. Extracting large-scale knowledge bases from the web. Proc. VLDB, 1999. Google ScholarGoogle ScholarDigital LibraryDigital Library
  27. 27.L. V. S. Lakshmanan, F. Sadri, and I. N. Subramanian. A declarative approach to querying and restructuring the World Wide Web. Post-ICDE Workshop on RIDE, 1996. Google ScholarGoogle ScholarDigital LibraryDigital Library
  28. 28.R. Larson. Bibliometrics of the World Wide Web: An exploratory analysis of the intellectual structure of cyberspace. Ann. Meeting o} the American Soc. In}o. Sci., 1996.Google ScholarGoogle Scholar
  29. 29.A. J. Lotka. The frequency distribution of scientific productivity. J. o} the Washington Acad. o} Sci., 16:317, 1926.Google ScholarGoogle Scholar
  30. 30.A. Mendelzon, G. Mihaila, and T. Milo. Querying the World Wide Web. J. o} Digital Libraries, 1(1):68-88, 1997.Google ScholarGoogle Scholar
  31. 31.A. Mendelzon and P. Wood. Finding regular simple paths in graph databases. SIAM J. Oomp., 24(6): 1235-1258, 1995. Google ScholarGoogle ScholarDigital LibraryDigital Library
  32. 32.E. Spertus. ParaSite: Mining structural information on the Web. Proc. 6th WWW Con}., 1997. Google ScholarGoogle ScholarDigital LibraryDigital Library
  33. 33.G. K. Zipf. Human behavior and the principle of least effort. New York: Ha}her, 1949.Google ScholarGoogle Scholar

Index Terms

  1. The Web as a graph

    Recommendations

    Comments

    Login options

    Check if you have access through your login credentials or your institution to get full access on this article.

    Sign in
    • Published in

      cover image ACM Conferences
      PODS '00: Proceedings of the nineteenth ACM SIGMOD-SIGACT-SIGART symposium on Principles of database systems
      May 2000
      281 pages
      ISBN:158113214X
      DOI:10.1145/335168

      Copyright © 2000 ACM

      Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Publication History

      • Published: 1 May 2000

      Permissions

      Request permissions about this article.

      Request Permissions

      Check for updates

      Qualifiers

      • Article

      Acceptance Rates

      PODS '00 Paper Acceptance Rate26of119submissions,22%Overall Acceptance Rate642of2,707submissions,24%

    PDF Format

    View or Download as a PDF file.

    PDF

    eReader

    View online with eReader.

    eReader