
2019 | OriginalPaper | Chapter

An Evolutionary Algorithm-Based Text Categorization Technique

Authors: Ajit Kumar Das, Asit Kumar Das, Apurba Sarkar

Published in: Computational Intelligence in Data Mining

Publisher: Springer Singapore

Abstract

Most organizations generate unstructured data, from which extracting meaningful information is difficult. Preprocessing such data before mining improves the efficiency of the mining algorithms. In this paper, text data is first preprocessed by tokenization, stop-word removal, and stemming, and a bag-of-words is identified to characterize the text dataset. Next, a genetic algorithm based on the improved strength Pareto evolutionary algorithm is applied to determine a more compact set of informative words for clustering the text documents efficiently. It is a bi-objective genetic algorithm that approximates the Pareto-optimal front while exploring the search space for an optimal solution. An external clustering index and the number of words used to describe the documents serve as the two objective functions. The chromosomes in the population are evaluated on these functions, and the best chromosome in the non-dominated Pareto front of the final population gives the optimal set of words sufficient for categorization of the text dataset.
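The two stages named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the stop-word list, the `crude_stem` suffix stripper (a stand-in for a real stemmer such as Porter's), and all function names are assumptions made for illustration. The genetic operators and the external clustering index are not reproduced here. The sketch only shows how a bag-of-words might be built and how candidate word subsets, each scored as a hypothetical pair (clustering index, number of words), could be filtered down to the non-dominated Pareto front.

```python
import re
from collections import Counter

# Tiny illustrative stop-word list (assumption; real lists are far longer).
STOP_WORDS = {"the", "a", "an", "of", "and", "is", "are", "in", "to", "on"}


def crude_stem(token):
    """Rough suffix stripping; a placeholder for a real stemming algorithm."""
    for suffix in ("ing", "ed", "es", "s"):
        # Keep at least three characters so short words are left alone.
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def preprocess(text):
    """Tokenize, drop stop words, then stem the remaining tokens."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return [crude_stem(t) for t in tokens if t not in STOP_WORDS]


def bag_of_words(docs):
    """Return the sorted vocabulary and one term-count Counter per document."""
    counts = [Counter(preprocess(d)) for d in docs]
    vocab = sorted(set().union(*counts))
    return vocab, counts


def dominates(a, b):
    """True if score a dominates score b.

    Scores are (clustering_index, n_words): a higher index is better,
    a smaller word count is better.
    """
    no_worse = a[0] >= b[0] and a[1] <= b[1]
    strictly_better = a[0] > b[0] or a[1] < b[1]
    return no_worse and strictly_better


def non_dominated(scores):
    """Keep only the scores that no other score dominates."""
    return [s for s in scores if not any(dominates(o, s) for o in scores)]
```

In a full bi-objective genetic algorithm, each chromosome would encode one subset of the vocabulary, and a filter like `non_dominated` would be applied to the scored population in every generation. The selection, crossover, and mutation steps are omitted here.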

Metadata
Title
An Evolutionary Algorithm-Based Text Categorization Technique
Authors
Ajit Kumar Das
Asit Kumar Das
Apurba Sarkar
Copyright Year
2019
Publisher
Springer Singapore
DOI
https://doi.org/10.1007/978-981-10-8055-5_75
