Published in: Evolutionary Intelligence 4/2021

12-07-2020 | Research Paper

Stochastic gradient-CAViaR-based deep belief network for text categorization

Authors: V. Srilakshmi, K. Anuradha, C. Shoba Bindu

Abstract

Text categorization is the process of assigning tags to text according to its content. Applications of text classification include document organization, spam email filtering, and news grouping. This paper introduces a stochastic gradient-CAViaR (SG-CAV)-based deep belief network for text categorization. The proposed approach involves four steps: pre-processing, feature extraction, feature selection, and text categorization. First, the input data are pre-processed by stemming and stop-word removal, and features are then extracted using a vector space model. Next, feature selection is carried out based on entropy, and the selected features are passed to the text categorization step, which uses the proposed SG-CAV-based deep belief network (SG-CAV-based DBN). The DBN is trained with the proposed SG-CAV algorithm, which combines conditional autoregressive value at risk (CAViaR) and stochastic gradient descent (SGD). The performance of the proposed SGCAV + DBN is evaluated in terms of recall, precision, F-measure, and accuracy, and is compared with existing methods: Naive Bayes, K-nearest neighbours, support vector machine, and deep belief network (DBN). The analysis shows that the proposed SGCAV + DBN method achieves a maximal precision of 0.78, a maximal recall of 0.78, a maximal F-measure of 0.78, and a maximal accuracy of 0.95. Among the existing methods, DBN achieves the highest precision, recall, F-measure, and accuracy on both the 20 Newsgroups and Reuters databases.
The precision, recall, F-measure, and accuracy of the proposed system are, respectively, 10.98%, 11.54%, 11.538%, and 18.33% higher than those of the DBN on the 20 Newsgroups database, and 2.38%, 2.38%, 2.37%, and 0.21% higher on the Reuters database.
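
The pre-processing and feature-extraction steps can be sketched in Python as below. This is a minimal sketch under stated assumptions, not the authors' implementation: the stop-word list is a tiny illustrative set, the suffix-stripping `stem` function is a crude hypothetical stand-in for a real stemmer such as Porter's, and the vector space model is assumed to use TF-IDF term weights, a common choice that the abstract does not confirm. The helper names `preprocess` and `tfidf_vectors` are hypothetical.

```python
import math
import re
from collections import Counter

# Tiny illustrative stop-word list (assumption; real systems use a much fuller list).
STOP_WORDS = {"a", "an", "the", "is", "are", "of", "and", "to", "in", "on", "for"}

# Hypothetical suffix list for a crude stemmer standing in for e.g. Porter's algorithm.
SUFFIXES = ("ing", "ed", "es", "s")


def stem(word: str) -> str:
    """Strip the first matching suffix, keeping a stem of at least three letters."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def preprocess(text: str) -> list[str]:
    """Lower-case and tokenize, drop stop words, then stem the remaining tokens."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return [stem(t) for t in tokens if t not in STOP_WORDS]


def tfidf_vectors(docs: list[list[str]]) -> tuple[list[str], list[list[float]]]:
    """Build a sorted vocabulary and one TF-IDF vector (raw tf * log(N/df)) per document."""
    vocab = sorted({term for doc in docs for term in doc})
    n_docs = len(docs)
    df = Counter(term for doc in docs for term in set(doc))  # document frequency
    vectors = []
    for doc in docs:
        tf = Counter(doc)  # term frequency within this document
        vectors.append([tf[t] * math.log(n_docs / df[t]) for t in vocab])
    return vocab, vectors
```

For example, `preprocess("The engines are running")` returns `["engin", "runn"]`: the stop words "the" and "are" are dropped and the remaining tokens are stemmed. Under this weighting, a term that occurs in every document gets an IDF of zero and contributes nothing to the vectors.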
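
The abstract states only that feature selection is "based on entropy". One common entropy-based criterion is information gain, IG(t) = H(C) - [P(t) H(C | t) + P(not t) H(C | not t)], the reduction in class entropy obtained by splitting the documents on whether term t occurs. The sketch below ranks terms by it and keeps the top `k`. Treat it as an assumed stand-in: the paper's exact criterion may differ, and `entropy`, `information_gain`, and `select_features` are hypothetical names.

```python
import math
from collections import Counter


def entropy(counts) -> float:
    """Shannon entropy in bits of a distribution given as raw counts."""
    total = sum(counts)
    return -sum((c / total) * math.log2(c / total) for c in counts if c > 0)


def information_gain(docs: list[set[str]], labels: list[str], term: str) -> float:
    """H(C) minus the expected class entropy after splitting on presence of `term`."""
    with_term = [y for d, y in zip(docs, labels) if term in d]
    without_term = [y for d, y in zip(docs, labels) if term not in d]
    n = len(labels)
    h_classes = entropy(Counter(labels).values())
    h_split = sum(
        (len(part) / n) * entropy(Counter(part).values())
        for part in (with_term, without_term)
        if part  # skip an empty side of the split
    )
    return h_classes - h_split


def select_features(docs: list[set[str]], labels: list[str], k: int) -> list[str]:
    """Return the k terms with the highest information gain (ties keep alphabetical order)."""
    vocab = sorted({t for d in docs for t in d})
    ranked = sorted(vocab, key=lambda t: information_gain(docs, labels, t), reverse=True)
    return ranked[:k]
```

With two "sport" documents that both contain "goal" and two "finance" documents that both contain "stock", each of those terms separates the classes perfectly and earns the maximal gain of 1 bit, so `select_features(docs, labels, 2)` keeps `["goal", "stock"]`.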
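
For reference, the two ingredients that SG-CAV is said to combine have the standard forms below. The abstract does not specify how they are merged, so this block only restates them: the symmetric absolute value CAViaR recursion for the $\theta$-quantile $f_t$ of a series $y_t$, with its regression-quantile objective (Engle and Manganelli, 2004), and a generic SGD step with learning rate $\eta$ on one randomly drawn training example $(x_i, y_i)$. The notation follows those standard conventions and is not taken from the paper.

```latex
% Symmetric absolute value CAViaR: theta-quantile f_t of the series y_t
f_t(\beta) = \beta_1 + \beta_2 \, f_{t-1}(\beta) + \beta_3 \, \lvert y_{t-1} \rvert

% Regression-quantile objective used to estimate beta
\hat{\beta} = \arg\min_{\beta} \; \frac{1}{T} \sum_{t=1}^{T}
  \bigl[\theta - \mathbb{I}\bigl(y_t < f_t(\beta)\bigr)\bigr] \bigl[y_t - f_t(\beta)\bigr]

% Stochastic gradient descent step on a loss \ell for one sample (x_i, y_i)
w^{(k+1)} = w^{(k)} - \eta \, \nabla_{w} \, \ell\bigl(w^{(k)}; x_i, y_i\bigr)
```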
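
The four evaluation metrics can be computed as in the sketch below. The abstract does not say how per-class scores are aggregated over the 20 Newsgroups or Reuters categories, so this assumes macro-averaging (an unweighted mean over classes, with the F-measure computed per class and then averaged). It also includes a hypothetical `relative_gain` helper for percentage improvements of the kind quoted in the abstract; because the abstract does not state whether such gains are divided by the baseline score or by the proposed score, both conventions are offered.

```python
from collections import Counter


def classification_metrics(y_true, y_pred):
    """Accuracy plus macro-averaged precision, recall and F-measure (assumed averaging)."""
    labels = sorted(set(y_true) | set(y_pred))
    correct = Counter(t for t, p in zip(y_true, y_pred) if t == p)  # true positives per class
    predicted = Counter(y_pred)  # TP + FP per class
    actual = Counter(y_true)  # TP + FN per class
    precisions, recalls, f_scores = [], [], []
    for c in labels:
        p = correct[c] / predicted[c] if predicted[c] else 0.0
        r = correct[c] / actual[c] if actual[c] else 0.0
        f = 2 * p * r / (p + r) if p + r else 0.0
        precisions.append(p)
        recalls.append(r)
        f_scores.append(f)
    n = len(labels)
    return {
        "precision": sum(precisions) / n,
        "recall": sum(recalls) / n,
        "f_measure": sum(f_scores) / n,
        "accuracy": sum(correct.values()) / len(y_true),
    }


def relative_gain(proposed, baseline, relative_to="baseline"):
    """Percentage by which `proposed` exceeds `baseline`.

    relative_to="baseline" divides by the baseline score; relative_to="proposed"
    divides by the proposed score. Which one the paper uses is not stated (assumption).
    """
    if relative_to not in ("baseline", "proposed"):
        raise ValueError("relative_to must be 'baseline' or 'proposed'")
    denominator = baseline if relative_to == "baseline" else proposed
    return 100.0 * (proposed - baseline) / denominator
```

For `y_true = ["a", "a", "b", "b"]` and `y_pred = ["a", "b", "b", "b"]`, accuracy is 0.75 and macro recall is 0.75 (class "a" is recalled at 0.5, class "b" at 1.0), while macro precision is about 0.833. The two `relative_gain` conventions give different percentages for the same pair of scores, which is why the denominator matters when comparing reported improvements.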

Metadata
Title: Stochastic gradient-CAViaR-based deep belief network for text categorization
Authors: V. Srilakshmi, K. Anuradha, C. Shoba Bindu
Publication date: 12-07-2020
Publisher: Springer Berlin Heidelberg
Published in: Evolutionary Intelligence, Issue 4/2021
Print ISSN: 1864-5909
Electronic ISSN: 1864-5917
DOI: https://doi.org/10.1007/s12065-020-00449-x
