Published in: The Journal of Supercomputing 11/2017

11 April 2017

Unsupervised text feature selection technique based on hybrid particle swarm optimization algorithm with genetic operators for the text clustering

Authors: Laith Mohammad Abualigah, Ahamad Tajudin Khader

Abstract

Text clustering is a suitable method for partitioning a large collection of text documents into groups. The size of the documents affects text clustering by degrading its performance. Moreover, text documents contain sparse and uninformative features, which reduce the performance of the underlying text clustering algorithm and increase the computational time. Feature selection is a fundamental unsupervised learning technique used to select a new subset of informative text features, improving the performance of text clustering and reducing the computational time. This paper proposes a hybrid of the particle swarm optimization algorithm with genetic operators for the feature selection problem. The k-means clustering algorithm is used to evaluate the effectiveness of the obtained feature subsets. The experiments were conducted on eight common text datasets with varying characteristics. The results show that the proposed hybrid algorithm (H-FSPSOTC) improved the performance of the clustering algorithm by generating a new subset of more informative features. The proposed algorithm is compared with other algorithms published in the literature. Finally, the feature selection technique helps the clustering algorithm obtain accurate clusters.
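
To make the idea of combining particle swarm optimization with genetic operators concrete, the following is a minimal sketch of one way such a hybrid could select features from a document-term weight matrix. It is not the authors' H-FSPSOTC implementation: the function names `fitness` and `hybrid_pso_ga`, all parameter values, the sigmoid-based binary position update, the one-point crossover and bit-flip mutation, and the placeholder objective (mean absolute deviation of the selected term weights) are assumptions chosen for illustration only.

```python
import math
import random


def fitness(mask, docs):
    """Placeholder objective (an assumption, not taken from the paper):
    mean absolute deviation of the selected term weights, averaged over documents."""
    selected = [j for j, bit in enumerate(mask) if bit]
    if not selected:
        return 0.0
    total = 0.0
    for row in docs:
        values = [row[j] for j in selected]
        mean = sum(values) / len(values)
        total += sum(abs(v - mean) for v in values) / len(values)
    return total / len(docs)


def hybrid_pso_ga(docs, n_particles=10, n_iter=30, w=0.7, c1=1.5, c2=1.5,
                  p_cross=0.8, p_mut=0.02, seed=0):
    """Binary PSO followed by crossover and mutation in every iteration.
    Returns the best feature mask found (list of 0/1) and its fitness."""
    rng = random.Random(seed)
    n_feat = len(docs[0])
    # Each particle is a binary mask: 1 keeps a feature, 0 drops it.
    pos = [[rng.randint(0, 1) for _ in range(n_feat)] for _ in range(n_particles)]
    vel = [[0.0] * n_feat for _ in range(n_particles)]
    pbest = [p[:] for p in pos]
    pbest_fit = [fitness(p, docs) for p in pos]
    g = max(range(n_particles), key=lambda i: pbest_fit[i])
    gbest, gbest_fit = pbest[g][:], pbest_fit[g]

    for _ in range(n_iter):
        # PSO step: update velocities, then sample bits via a sigmoid.
        for i in range(n_particles):
            for j in range(n_feat):
                v = (w * vel[i][j]
                     + c1 * rng.random() * (pbest[i][j] - pos[i][j])
                     + c2 * rng.random() * (gbest[j] - pos[i][j]))
                vel[i][j] = max(-6.0, min(6.0, v))  # clamp to keep exp() stable
                prob = 1.0 / (1.0 + math.exp(-vel[i][j]))
                pos[i][j] = 1 if rng.random() < prob else 0

        # Genetic operator 1: one-point crossover between neighbouring particles.
        for i in range(0, n_particles - 1, 2):
            if n_feat > 1 and rng.random() < p_cross:
                cut = rng.randint(1, n_feat - 1)
                pos[i][cut:], pos[i + 1][cut:] = pos[i + 1][cut:], pos[i][cut:]

        # Genetic operator 2: bit-flip mutation.
        for p in pos:
            for j in range(n_feat):
                if rng.random() < p_mut:
                    p[j] = 1 - p[j]

        # Update personal and global bests.
        for i in range(n_particles):
            f = fitness(pos[i], docs)
            if f > pbest_fit[i]:
                pbest[i], pbest_fit[i] = pos[i][:], f
                if f > gbest_fit:
                    gbest, gbest_fit = pos[i][:], f

    return gbest, gbest_fit
```

Calling `hybrid_pso_ga(docs)` on a list of equal-length weight rows (for example TF-IDF vectors) returns a 0/1 mask over the features. In a pipeline like the one the abstract describes, the columns kept by that mask would then be passed to a clustering algorithm such as k-means to judge the quality of the selected subset.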

Metadata
Title
Unsupervised text feature selection technique based on hybrid particle swarm optimization algorithm with genetic operators for the text clustering
Authors
Laith Mohammad Abualigah
Ahamad Tajudin Khader
Publication date
11 April 2017
Publisher
Springer US
Published in
The Journal of Supercomputing / Issue 11/2017
Print ISSN: 0920-8542
Electronic ISSN: 1573-0484
DOI
https://doi.org/10.1007/s11227-017-2046-2