Published in: Soft Computing 2/2021

3 August 2020 | Methodologies and Application

Automatic text classification using machine learning and optimization algorithms

Authors: R. Janani, S. Vijayarani

Abstract

In recent years, the volume of digital text documents has grown enormously. Consequently, there is a need to automatically organize and classify documents based on their content. The main goal of text classification is to partition an unstructured set of documents into their respective categories based on their content. The aim of this research is to automatically classify the documents stored on a personal computer into their relevant categories. The work has two phases: in the first, the important features are selected for classification; in the second, the text documents are classified. For selecting the optimal features, this work proposes a new algorithm, the optimization technique for feature selection (OTFS). To assess its effectiveness, OTFS was compared with existing approaches: artificial bee colony, firefly algorithm, ant colony optimization, and particle swarm optimization. For the second phase, this work proposes the machine learning-based automatic text classification (MLearn-ATC) algorithm, which was compared with widely used classification techniques: probabilistic neural network, support vector machine, K-nearest neighbor, and Naïve Bayes. The output of the first phase is used as the input to the classification phase. The results show that the proposed algorithms achieve better accuracy in optimizing the features and classifying text documents based on their content.
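
The two-phase structure described in the abstract (select features first, then classify using only those features) can be illustrated with a minimal sketch. The abstract does not specify how OTFS or MLearn-ATC work internally, so this example does not implement either. As stand-ins it uses a simple document-frequency filter for phase 1 and a multinomial Naïve Bayes classifier (one of the baselines named in the abstract) for phase 2. All function names, the scoring rule, and the toy documents are assumptions made for illustration only.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    # Simplistic whitespace tokenizer (assumption; real pipelines also stem and remove stop words).
    return text.lower().split()


def select_features(docs, labels, k):
    """Phase 1 stand-in (NOT OTFS): keep the k terms whose
    per-class document frequency differs most between classes."""
    class_sizes = Counter(labels)
    doc_freq = defaultdict(Counter)  # term -> class -> number of documents containing the term
    for text, label in zip(docs, labels):
        for term in set(tokenize(text)):
            doc_freq[term][label] += 1

    def score(term):
        rates = [doc_freq[term][c] / class_sizes[c] for c in class_sizes]
        return max(rates) - min(rates)

    # Sort by descending score, breaking ties alphabetically for determinism.
    ranked = sorted(doc_freq, key=lambda t: (-score(t), t))
    return set(ranked[:k])


def train_naive_bayes(docs, labels, features):
    """Phase 2 stand-in (NOT MLearn-ATC): multinomial Naive Bayes
    trained only on the features selected in phase 1."""
    class_docs = Counter(labels)
    term_counts = defaultdict(Counter)  # class -> term -> occurrences
    for text, label in zip(docs, labels):
        for term in tokenize(text):
            if term in features:
                term_counts[label][term] += 1
    return class_docs, term_counts, features


def classify(model, text):
    class_docs, term_counts, features = model
    total_docs = sum(class_docs.values())
    terms = [t for t in tokenize(text) if t in features]
    best_label, best_logp = None, -math.inf
    for label in sorted(class_docs):
        total_terms = sum(term_counts[label].values())
        logp = math.log(class_docs[label] / total_docs)  # class prior
        for t in terms:
            # Laplace (add-one) smoothing over the selected vocabulary.
            logp += math.log((term_counts[label][t] + 1) / (total_terms + len(features)))
        if logp > best_logp:
            best_label, best_logp = label, logp
    return best_label


# Toy corpus (invented for illustration, not data from the paper).
docs = [
    "invoice payment due bank transfer",
    "bank statement payment received",
    "match score goal team win",
    "team coach match training goal",
]
labels = ["finance", "finance", "sports", "sports"]

features = select_features(docs, labels, k=5)      # phase 1
model = train_naive_bayes(docs, labels, features)  # phase 2 consumes the phase 1 output
print(sorted(features))
print(classify(model, "payment sent to the bank"))
print(classify(model, "the team scored a late goal"))
```

The point of the sketch is the data flow: the classifier never sees terms outside the selected feature set, which mirrors the abstract's statement that the output of the first phase is the input to the second.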

Metadata
Title
Automatic text classification using machine learning and optimization algorithms
Authors
R. Janani
S. Vijayarani
Publication date
3 August 2020
Publisher
Springer Berlin Heidelberg
Published in
Soft Computing / Issue 2/2021
Print ISSN: 1432-7643
Electronic ISSN: 1433-7479
DOI
https://doi.org/10.1007/s00500-020-05209-8
