CC BY-NC-ND 3.0 license. Open Access. Published by De Gruyter Open Access, June 29, 2013.

Evaluation of text document clustering approach based on particle swarm optimization

Stuti Karol and Veenu Mangat
From the journal Open Computer Science

Abstract

Clustering, an important technique in data mining, is an automatic learning technique that groups a set of objects into subsets, or clusters. The goal is to create clusters that are internally coherent but substantially different from each other. Text document clustering groups related text documents according to their content. It is a fundamental operation in unsupervised document organization, text data mining, automatic topic extraction, and information retrieval. Fast, high-quality document clustering algorithms play an important role in navigating, summarizing, and organizing information effectively. The documents to be clustered can be web news articles, abstracts of research papers, and so on. This paper proposes two techniques for efficient document clustering that apply a soft computing approach: intelligent hybrid algorithms based on Particle Swarm Optimization (PSO). The proposed approach hybridizes each of two partitioning algorithms, Fuzzy C-Means and K-Means, with PSO. The performance of these hybrid algorithms is evaluated against the traditional partitioning techniques, K-Means and Fuzzy C-Means.
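
The abstract does not spell out how the PSO and K-Means stages are combined. One common arrangement, assumed here, runs PSO first to search for a good set of cluster centroids and then uses the best PSO solution to seed K-Means. The Python sketch below illustrates that arrangement under further assumptions that are not taken from the paper: documents are already term-weight vectors (for example TF-IDF), similarity is cosine, the fitness is the average intra-cluster cosine distance, and the PSO settings (w = 0.72, c1 = c2 = 1.49) are common defaults. All function names are hypothetical.

```python
import math
import random


def cosine_distance(a, b):
    """1 - cosine similarity; treats a zero vector as maximally distant (1.0)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    if na == 0 or nb == 0:
        return 1.0
    return 1.0 - dot / (na * nb)


def assign(docs, centroids):
    """Label each document with the index of its nearest centroid."""
    return [min(range(len(centroids)), key=lambda j: cosine_distance(d, centroids[j]))
            for d in docs]


def fitness(docs, centroids):
    """Assumed fitness: average intra-cluster cosine distance over non-empty clusters (lower is better)."""
    labels = assign(docs, centroids)
    scores = []
    for j, c in enumerate(centroids):
        members = [d for d, lab in zip(docs, labels) if lab == j]
        if members:
            scores.append(sum(cosine_distance(d, c) for d in members) / len(members))
    return sum(scores) / len(scores)


def pso_centroids(docs, k, n_particles=10, iters=50, w=0.72, c1=1.49, c2=1.49, seed=0):
    """Basic global-best PSO; each particle encodes a full set of k centroids."""
    rng = random.Random(seed)
    dim = len(docs[0])
    # Initialise every particle with k distinct documents as its centroids.
    pos = [[list(d) for d in rng.sample(docs, k)] for _ in range(n_particles)]
    vel = [[[0.0] * dim for _ in range(k)] for _ in range(n_particles)]
    pbest = [[c[:] for c in p] for p in pos]
    pbest_fit = [fitness(docs, p) for p in pos]
    g = min(range(n_particles), key=lambda i: pbest_fit[i])
    gbest, gbest_fit = [c[:] for c in pbest[g]], pbest_fit[g]
    for _ in range(iters):
        for i, p in enumerate(pos):
            for j in range(k):
                for t in range(dim):
                    r1, r2 = rng.random(), rng.random()
                    vel[i][j][t] = (w * vel[i][j][t]
                                    + c1 * r1 * (pbest[i][j][t] - p[j][t])
                                    + c2 * r2 * (gbest[j][t] - p[j][t]))
                    p[j][t] += vel[i][j][t]
            f = fitness(docs, p)
            if f < pbest_fit[i]:  # update personal best, then global best
                pbest[i], pbest_fit[i] = [c[:] for c in p], f
                if f < gbest_fit:
                    gbest, gbest_fit = [c[:] for c in p], f
    return gbest


def kmeans(docs, centroids, iters=20):
    """K-Means refinement: cosine-based assignment, mean-based centroid update."""
    centroids = [c[:] for c in centroids]
    for _ in range(iters):
        labels = assign(docs, centroids)
        new = []
        for j, c in enumerate(centroids):
            members = [d for d, lab in zip(docs, labels) if lab == j]
            # Keep the previous centroid if its cluster is empty.
            new.append([sum(col) / len(members) for col in zip(*members)] if members else c)
        if new == centroids:
            break
        centroids = new
    return centroids, assign(docs, centroids)


def pso_kmeans(docs, k, **pso_kwargs):
    """Hybrid: PSO searches for initial centroids, K-Means refines them."""
    return kmeans(docs, pso_centroids(docs, k, **pso_kwargs))


# Toy term-weight vectors: docs 0-1 share terms 0-1, docs 2-3 share terms 2-3.
docs = [
    [1.0, 0.9, 0.0, 0.0],
    [0.8, 1.0, 0.1, 0.0],
    [0.0, 0.1, 1.0, 0.9],
    [0.0, 0.0, 0.9, 1.0],
]
centroids, labels = pso_kmeans(docs, k=2)
print(labels)
```

On these toy vectors, documents 0 and 1 should end up in one cluster and documents 2 and 3 in the other; which numeric label each group receives is not fixed. A Fuzzy C-Means hybrid along the same lines would replace the `kmeans` refinement with fuzzy membership and centroid updates; that variant is not shown here.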

Published Online: 2013-6-29
Published in Print: 2013-6-1

© 2013 Versita Warsaw

This work is licensed under the Creative Commons Attribution-NonCommercial-NoDerivatives 3.0 License.

Downloaded on 25.4.2024 from https://www.degruyter.com/document/doi/10.2478/s13537-013-0104-2/html