Published in: Soft Computing 15/2021

9 November 2020 | Methodologies and Application

Text document classification using fuzzy rough set based on robust nearest neighbor (FRS-RNN)

Authors: Bichitrananda Behera, G. Kumaravelan

Abstract

The fuzzy rough set (FRS) is a powerful mathematical tool for dealing with uncertain data, with many applications in feature selection, dimensionality reduction and classification. The fuzzy rough set based on robust nearest neighbor (FRS-RNN) is an important classifier that has been applied successfully to real-valued datasets. According to the literature, however, FRS-RNN has not yet been applied to text document classification. The document classification process generally consists of two crucial phases: feature extraction and classifier model construction. TF-IDF and convolutional neural network (CNN)-based techniques are the main approaches used for efficient feature extraction. The CNN supports feature engineering by preprocessing the documents into a better representation using pre-trained word embeddings. In this paper, we propose a modified CNN structure for both text document classification and feature extraction. Both FRS and FRS-RNN are then implemented for text document classification on the benchmark datasets 20 Newsgroups and Reuters-21578, using both TF-IDF and modified CNN-based feature extraction. The classification performance of FRS, CNN and FRS-RNN is evaluated and compared using accuracy, precision, recall and F1-measure. Finally, the classification performance of FRS-RNN is compared with established classification models such as SVM, KNN, Naïve Bayes, DNN, CNN and RNN, and with some recently developed classification models. The experimental results show that the proposed FRS-RNN outperforms all of the aforementioned classification models.
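
To make the TF-IDF plus fuzzy rough pipeline described above more concrete, the following is a minimal sketch in plain Python. It is not the authors' implementation, since the abstract does not give the exact FRS-RNN definition. The assumptions are: cosine similarity of TF-IDF vectors serves as the fuzzy relation R, class membership is the fuzzy rough lower approximation min(1 - R(x, y)) over training documents y from other classes, and the "robust" variant averages the k smallest of those values instead of taking the single minimum. All function names and the toy documents are hypothetical.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Return L2-normalised TF-IDF vectors (dict: term -> weight) for tokenised docs."""
    n = len(docs)
    # Document frequency: number of documents containing each term.
    df = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        tf = Counter(doc)
        # Smoothed IDF, log((1 + n) / (1 + df)) + 1, so no weight is zero.
        vec = {t: (c / len(doc)) * (math.log((1 + n) / (1 + df[t])) + 1)
               for t, c in tf.items()}
        norm = math.sqrt(sum(w * w for w in vec.values())) or 1.0
        vectors.append({t: w / norm for t, w in vec.items()})
    return vectors


def cosine(u, v):
    """Cosine similarity of two L2-normalised sparse vectors (assumed fuzzy relation R)."""
    if len(u) > len(v):
        u, v = v, u
    return sum(w * v.get(t, 0.0) for t, w in u.items())


def lower_membership(x, label, train, k=1):
    """Fuzzy rough lower-approximation membership of x in class `label`.

    k = 1: classical form, the minimum of 1 - R(x, y) over y outside the class.
    k > 1: mean of the k smallest such values (an assumed robust variant).
    """
    dists = sorted(1.0 - cosine(x, y) for y, c in train if c != label)
    if not dists:
        return 1.0
    k = min(k, len(dists))
    return sum(dists[:k]) / k


def classify(x, train, k=1):
    """Assign x to the class with the largest lower-approximation membership."""
    labels = sorted({c for _, c in train})
    return max(labels, key=lambda c: lower_membership(x, c, train, k))


# Toy corpus (hypothetical): four labelled documents and one query document.
docs = [
    "the team won the football match".split(),
    "the striker scored in the match".split(),
    "the bank raised interest rates".split(),
    "the market fell as rates rose".split(),
    "the goalkeeper saved the match".split(),  # query, no label
]
labels = ["sport", "sport", "finance", "finance"]

vecs = tfidf_vectors(docs)
train = list(zip(vecs[:4], labels))
predicted = classify(vecs[4], train, k=2)
print(predicted)
```

With k = 1 a single mislabelled or outlying training document from another class can pull the membership down on its own. Averaging over several nearest other-class documents dampens that effect, which is the general motivation behind robust fuzzy rough models. For brevity the sketch uses only the lower approximation and computes IDF over the training and query documents together.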

Metadata
Title
Text document classification using fuzzy rough set based on robust nearest neighbor (FRS-RNN)
Authors
Bichitrananda Behera
G. Kumaravelan
Publication date
9 November 2020
Publisher
Springer Berlin Heidelberg
Published in
Soft Computing / Issue 15/2021
Print ISSN: 1432-7643
Electronic ISSN: 1433-7479
DOI
https://doi.org/10.1007/s00500-020-05410-9
