Published in: Progress in Artificial Intelligence 4/2021

3 June 2021 | Regular Paper

Text categorization based on a new classification by thresholds

Authors: Walid Cherif, Abdellah Madani, Mohamed Kissi

Abstract

Automated text categorization offers an effective response to today’s unprecedented growth of textual data. Because it can organize large and varied collections of text from which valuable insights can be drawn, it has become an emerging field of research. Several mathematical approaches have been studied to formalize the main components of a text categorization system (text representation, feature extraction, and the classification process), yet such systems still face many difficulties, owing both to the complex nature of text databases and to the high dimensionality of text representations. This paper introduces an alternative way to address this problem. First, it reduces the original feature set using a newly proposed metric. Second, it classifies a text automatically without necessarily processing all of its features. Moreover, some standard preprocessing steps, such as stemming, can be dropped with this approach. Experimental results show that the new method outperforms state-of-the-art methods: the F-measures obtained on the 20 Newsgroups, BBC News, Reuters, and AG News datasets were 95.06%, 98.21%, 88.44%, and 95.70%, respectively, while standard approaches returned considerably lower scores.
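
The idea of classifying a text without reading all of its features can be illustrated with a small sketch. The abstract does not describe the proposed metric or the threshold rule, so everything below is an assumption made for illustration only: the `WEIGHTS` table, the `margin` parameter, and the stopping rule (stop as soon as the leading class is ahead of the runner-up by at least `margin`) are hypothetical and are not the authors' method.

```python
# Hypothetical per-class term weights (illustrative values only; the paper's
# actual feature metric is not given in the abstract).
WEIGHTS = {
    "sport": {"match": 2.0, "goal": 1.5, "team": 1.0},
    "tech": {"software": 2.0, "chip": 1.5, "team": 0.2},
}


def classify_with_early_stop(tokens, weights, margin):
    """Return (label, tokens_read) for a token sequence.

    Scores are accumulated token by token. Reading stops as soon as the
    best-scoring class leads the second-best by at least `margin`
    (an assumed stopping rule). Requires at least two classes.
    """
    scores = {label: 0.0 for label in weights}
    read = 0
    for read, token in enumerate(tokens, start=1):
        # Add this token's weight to every class that knows the token.
        for label, table in weights.items():
            scores[label] += table.get(token, 0.0)
        ranked = sorted(scores.values(), reverse=True)
        if ranked[0] - ranked[1] >= margin:
            break  # confident enough: skip the remaining tokens
    best = max(scores, key=scores.get)
    return best, read


if __name__ == "__main__":
    doc = ["the", "match", "goal", "software", "chip"]
    print(classify_with_early_stop(doc, WEIGHTS, margin=3.0))
```

In this toy setup the loop stops after the third token of `doc`, so the last two tokens are never scored. A larger `margin` trades speed for more evidence before committing to a label.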

Metadata
Title: Text categorization based on a new classification by thresholds
Authors: Walid Cherif, Abdellah Madani, Mohamed Kissi
Publication date: 3 June 2021
Publisher: Springer Berlin Heidelberg
Published in: Progress in Artificial Intelligence, Issue 4/2021
Print ISSN: 2192-6352
Electronic ISSN: 2192-6360
DOI: https://doi.org/10.1007/s13748-021-00247-1
