ABSTRACT
Recent research on link analysis has shown the existence of numerous web communities on the Web. A web community is a collection of web pages created by individuals or associations of any kind that share a common interest in a specific topic. In this paper, we propose a technique for creating a web community chart, which connects related web communities, from thousands of seed pages. The chart allows the user to navigate through related web communities, and can be used for a `What's Related Community' service that provides not only the web community that includes a given page but also related web communities. Our technique is based on a related page algorithm that finds pages related to a given page using only link analysis. We show that the algorithm can be used to create the chart by applying it to each seed, then using similarities among the results to classify the seeds into clusters and to deduce their relationships. We perform experiments to create a web community chart of companies and organizations from thousands of seed pages. First, we improve the precision of an existing related page algorithm, Companion, and evaluate the improved version, Companion-, in a user study. Then we create the chart using Companion-. The resulting chart consists of web communities made up of related pages, and of paths between related web communities. From the chart, we can find many web communities of companies classified by their category of business, as well as relationships between the communities.
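The clustering step described above can be illustrated with a minimal sketch. It assumes the related-page results for each seed are already available as sets of URLs; the Jaccard measure, the threshold value, the connected-component grouping, and the function names `jaccard`, `cluster_seeds`, and `chart_edges` are all illustrative assumptions and are not taken from the paper.

```python
from itertools import combinations


def jaccard(a, b):
    """Jaccard similarity of two sets (0.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def cluster_seeds(related, threshold=0.5):
    """Group seeds whose related-page sets are similar.

    `related` maps each seed URL to the set of pages a related page
    algorithm returned for it (hypothetical input; no link analysis
    is performed here). Two seeds are linked when their Jaccard
    similarity reaches `threshold` (an assumed value), and clusters
    are the connected components of the resulting link graph.
    """
    parent = {seed: seed for seed in related}

    def find(x):
        # Union-find lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for s, t in combinations(related, 2):
        if jaccard(related[s], related[t]) >= threshold:
            parent[find(s)] = find(t)

    clusters = {}
    for seed in related:
        clusters.setdefault(find(seed), set()).add(seed)
    return list(clusters.values())


def chart_edges(clusters, related):
    """Connect two clusters when their pooled related pages overlap."""
    pooled = [set().union(*(related[s] for s in c)) for c in clusters]
    return [
        (i, j)
        for i, j in combinations(range(len(clusters)), 2)
        if pooled[i] & pooled[j]
    ]
```

In this sketch a cluster stands in for a web community and an edge stands in for a path between related communities; a real chart would need the weaker notion of relatedness between communities to be defined more carefully than a simple overlap test.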