The dynamic Web, where thousands of pages are updated every second, is growing rapidly. Retrieving the required Web documents quickly is therefore becoming a challenging task for present-day search engines. Clustering, an important data mining technique, can help address this problem, and association techniques from data mining play a vital role in clustering Web documents. This paper is an effort in that direction and proposes the following techniques:
- A new feature selection technique named term-term correlation, which reduces the size of the corpus by eliminating noisy and redundant features.
- A novel technique named Support Based Count (SBC), which is combined with the traditional Apriori approach to cluster the Web documents.
Empirical results on two benchmark datasets show that the proposed approach is more promising than traditional clustering approaches.
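For readers unfamiliar with the traditional Apriori approach mentioned above, the following is a minimal sketch of level-wise frequent term-set mining, treating each document as a set of terms. It is not the authors' implementation: the function name `apriori`, the toy `docs` corpus and the `min_support` value are all illustrative assumptions, and the SBC step is not shown because the abstract does not specify it.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Return {frozenset(terms): support count} for every frequent term set.

    Generic textbook Apriori; `min_support` is an absolute document count.
    """
    transactions = [frozenset(t) for t in transactions]

    # Level 1: count how many documents contain each single term.
    counts = {}
    for t in transactions:
        for item in t:
            key = frozenset([item])
            counts[key] = counts.get(key, 0) + 1
    current = {s: c for s, c in counts.items() if c >= min_support}
    frequent = dict(current)

    k = 2
    while current:
        prev = set(current)
        # Join step: unions of frequent (k-1)-sets that have exactly k terms.
        candidates = {a | b for a in prev for b in prev if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must itself be frequent.
        candidates = {
            c for c in candidates
            if all(frozenset(s) in prev for s in combinations(c, k - 1))
        }
        # Count support of the surviving candidates.
        current = {}
        for c in candidates:
            n = sum(1 for t in transactions if c <= t)
            if n >= min_support:
                current[c] = n
        frequent.update(current)
        k += 1
    return frequent


# Hypothetical toy corpus: each document reduced to its set of terms.
docs = [
    {"web", "search", "engine"},
    {"web", "search", "cluster"},
    {"web", "cluster", "apriori"},
    {"search", "engine"},
]
frequent = apriori(docs, min_support=2)

# Documents sharing a frequent term pair can serve as candidate groups.
groups = {
    tuple(sorted(s)): [i for i, d in enumerate(docs) if s <= d]
    for s in frequent if len(s) == 2
}
```

In this sketch, documents that share a frequent term set are treated as a candidate group. How the paper actually turns frequent term sets into clusters, and how SBC modifies that step, is not stated in the abstract.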
- Combining Apriori Approach with Support-Based Count Technique to Cluster the Web Documents
Rajendra Kumar Roul
Sanjay Kumar Sahay
- Springer Singapore