Soft Computing

9 November 2020 | Methodologies and Application

Text document classification using fuzzy rough set based on robust nearest neighbor (FRS-RNN)

Authors: Bichitrananda Behera, G. Kumaravelan

Published in: Soft Computing | Issue 15/2021

Abstract

The fuzzy rough set (FRS) is a powerful mathematical tool for dealing with uncertain data, with many applications in feature selection, dimensionality reduction and classification. The fuzzy rough set based on robust nearest neighbor (FRS-RNN) is an important classifier that has been successfully applied to real-valued datasets. The literature, however, shows no previous attempt to apply FRS-RNN to text document classification. Generally, the document classification process consists of two crucial phases: feature extraction and classifier model construction. TF-IDF and convolutional neural network (CNN)-based techniques are the main approaches used for efficient feature extraction. The CNN provides the best feature engineering by effectively preprocessing the documents so that they are better represented using pre-trained word embeddings. In this paper, we propose a modified CNN structure for both text document classification and feature extraction. Both FRS and FRS-RNN are then implemented for text document classification on the benchmark datasets 20 Newsgroups and Reuters-21578, using both TF-IDF and modified CNN-based feature extraction techniques. The classification performance of FRS, CNN and FRS-RNN is evaluated and compared using the standard metrics accuracy, precision, recall and F1-measure. Finally, the classification performance of FRS-RNN is compared with established classification models such as SVM, KNN, Naïve Bayes, DNN, CNN and RNN, and with some recently developed classification models. The experimental results and empirical evaluation show that the proposed FRS-RNN outperforms all the aforementioned classification models.
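The abstract gives no implementation details, so the following is only a minimal sketch of the kind of pipeline it describes: TF-IDF features, a fuzzy-rough nearest-neighbour decision rule, and macro-averaged evaluation metrics. It assumes a plain fuzzy-rough nearest-neighbour scoring rule (cosine similarity as the fuzzy relation, lower and upper approximations taken over the k most similar training documents), not the robust variant or the modified CNN features evaluated in the paper. The function names, the value k = 3 and the four-document toy corpus are all hypothetical.

```python
import math
from collections import Counter


def fit_idf(docs):
    """Smoothed inverse document frequency for each term in the training docs."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    return {term: math.log((1 + n) / (1 + count)) + 1.0 for term, count in df.items()}


def tfidf(doc, idf):
    """L2-normalised TF-IDF vector (as a dict); terms unseen in training are dropped."""
    tf = Counter(term for term in doc if term in idf)
    vec = {term: freq * idf[term] for term, freq in tf.items()}
    norm = math.sqrt(sum(w * w for w in vec.values())) or 1.0
    return {term: w / norm for term, w in vec.items()}


def cosine(u, v):
    """Cosine similarity of two unit-length sparse vectors with non-negative weights."""
    return sum(w * v.get(term, 0.0) for term, w in u.items())


def frnn_predict(x, train_vecs, train_labels, k=3):
    """Score each class by the mean of its fuzzy lower and upper approximation."""
    neighbours = sorted(
        ((cosine(x, v), y) for v, y in zip(train_vecs, train_labels)),
        key=lambda pair: pair[0],
        reverse=True,
    )[:k]
    scores = {}
    for c in sorted(set(train_labels)):
        # Lower approximation: how dissimilar x is to neighbours outside class c.
        lower = min((1.0 - s for s, y in neighbours if y != c), default=1.0)
        # Upper approximation: similarity of x to its closest neighbour in class c.
        upper = max((s for s, y in neighbours if y == c), default=0.0)
        scores[c] = (lower + upper) / 2.0
    return max(scores, key=scores.get), scores


def macro_metrics(y_true, y_pred):
    """Accuracy plus macro-averaged precision, recall and F1."""
    labels = sorted(set(y_true))
    precisions, recalls, f1s = [], [], []
    for c in labels:
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        fp = sum(t != c and p == c for t, p in zip(y_true, y_pred))
        fn = sum(t == c and p != c for t, p in zip(y_true, y_pred))
        prec = tp / (tp + fp) if tp + fp else 0.0
        rec = tp / (tp + fn) if tp + fn else 0.0
        precisions.append(prec)
        recalls.append(rec)
        f1s.append(2 * prec * rec / (prec + rec) if prec + rec else 0.0)
    return {
        "accuracy": sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true),
        "precision": sum(precisions) / len(labels),
        "recall": sum(recalls) / len(labels),
        "f1": sum(f1s) / len(labels),
    }


# Hypothetical toy corpus, invented purely for illustration.
train_docs = [
    "the team won the hockey game".split(),
    "the pitcher threw a fast ball in the baseball game".split(),
    "the rocket reached orbit around the moon".split(),
    "nasa launched a new space shuttle".split(),
]
train_labels = ["sport", "sport", "space", "space"]
test_docs = ["the hockey team lost the game".split(), "the shuttle entered orbit".split()]
test_labels = ["sport", "space"]

idf = fit_idf(train_docs)
train_vecs = [tfidf(doc, idf) for doc in train_docs]
predictions = [frnn_predict(tfidf(doc, idf), train_vecs, train_labels)[0] for doc in test_docs]
metrics = macro_metrics(test_labels, predictions)
print(predictions, metrics)
```

The scoring rule uses the Kleene-Dienes implicator and the minimum t-norm with crisp class labels, which is why the lower approximation reduces to one minus the similarity to the closest neighbour of another class. A robust variant would replace the plain minimum and maximum with statistics that are less sensitive to a single noisy neighbour.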

Title: Text document classification using fuzzy rough set based on robust nearest neighbor (FRS-RNN)
Authors: Bichitrananda Behera, G. Kumaravelan
Publication date: 9 November 2020
Publisher: Springer Berlin Heidelberg
Published in: Soft Computing / Issue 15/2021
Print ISSN: 1432-7643
Electronic ISSN: 1433-7479
DOI: https://doi.org/10.1007/s00500-020-05410-9
