ABSTRACT
This paper addresses the problem of automatically structuring linked document collections by using clustering. In contrast to traditional clustering, we study the clustering problem in the light of available link structure information for the data set (e.g., hyperlinks among web documents or co-authorship among bibliographic data entries). Our approach is based on iterative relaxation of cluster assignments, and can be built on top of any clustering algorithm. This technique yields higher cluster purity and better overall accuracy, and makes self-organization more robust.
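The abstract does not spell out the relaxation procedure, so the following is only a minimal sketch of the general idea, not the authors' algorithm. It assumes each document already has a per-cluster content score from some base clusterer (for example, similarity to a k-means centroid) and blends that score with the share of linked neighbors currently assigned to each cluster. The function name `relax_assignments`, the mixing weight `alpha`, and the toy data are all hypothetical.

```python
from collections import Counter


def relax_assignments(initial, neighbors, content_scores, alpha=0.5, max_iters=20):
    """Iteratively re-assign documents using content scores and neighbor labels.

    initial        -- list of cluster ids, one per document (from any base clusterer)
    neighbors      -- list of lists; neighbors[d] holds the documents linked to d
    content_scores -- list of dicts; content_scores[d][c] is d's content score for cluster c
    alpha          -- hypothetical weight given to the neighborhood vote (0 = content only)
    """
    labels = list(initial)
    for _ in range(max_iters):
        new_labels = []
        for doc, nbrs in enumerate(neighbors):
            # Count how many linked neighbors currently sit in each cluster.
            votes = Counter(labels[n] for n in nbrs)
            total = sum(votes.values())
            # Pick the cluster with the best blend of content score and neighbor share.
            best = max(
                content_scores[doc],
                key=lambda c: (1 - alpha) * content_scores[doc][c]
                + (alpha * votes[c] / total if total else 0.0),
            )
            new_labels.append(best)
        if new_labels == labels:  # assignments are stable, stop early
            break
        labels = new_labels
    return labels


# Toy example: documents 0-2 link to each other, documents 3-4 link to each other.
# Document 2 weakly prefers cluster 1 by content, but both of its neighbors are in cluster 0.
neighbors = [[1, 2], [0, 2], [0, 1], [4], [3]]
content_scores = [
    {0: 0.9, 1: 0.1},
    {0: 0.8, 1: 0.2},
    {0: 0.45, 1: 0.55},
    {0: 0.1, 1: 0.9},
    {0: 0.2, 1: 0.8},
]
initial = [0, 0, 1, 1, 1]
print(relax_assignments(initial, neighbors, content_scores))  # [0, 0, 0, 1, 1]
```

In this sketch all documents are updated synchronously from the previous round's labels, which keeps the result independent of document order; `max_iters` bounds the loop in case the assignments oscillate instead of settling.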
Index Terms
- A neighborhood-based approach for clustering of linked document collections