Published in: International Journal of Machine Learning and Cybernetics 4/2020

19-07-2019 | Original Article

Combination of loss functions for deep text classification

Authors: Hamideh Hajiabadi, Diego Molla-Aliod, Reza Monsefi, Hadi Sadoghi Yazdi



Abstract

Ensemble methods have been shown to improve the results of statistical classifiers by combining multiple single learners into a strong one. In this paper, we explore the use of ensemble methods at the level of the objective function of a deep neural network. We propose a novel objective function that is a linear combination of single losses, and we integrate it into a deep neural network so that the weights of the linear combination are learned by backpropagation during the training stage. We study the impact of this ensemble loss function on state-of-the-art convolutional neural networks for text classification and show the effectiveness of our approach through comprehensive experiments. The experimental results demonstrate a significant improvement over conventional state-of-the-art methods in the literature.
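The core idea described above — a combination of individual losses whose mixing weights are learned jointly with the network during training — can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: the function names, the choice of component losses, and the softmax parameterization that keeps the weights positive and summing to one are all assumptions for the sake of the sketch.

```python
import numpy as np

def softmax(z):
    """Map unconstrained parameters to positive weights summing to 1."""
    e = np.exp(z - np.max(z))
    return e / e.sum()

def cross_entropy(p_true):
    """Cross-entropy loss given the predicted probability of the true class."""
    return -np.log(p_true)

def hinge(margin):
    """Hinge loss given the classification margin y * f(x)."""
    return max(0.0, 1.0 - margin)

def ensemble_loss(losses, alpha):
    """Convex combination of single losses.

    `alpha` holds unconstrained parameters; in the paper's setting they
    would be updated by backpropagation alongside the network weights.
    """
    w = softmax(alpha)
    return float(np.dot(w, losses))

# Example: equal alpha gives equal weights, so the ensemble loss is the mean.
alpha = np.array([0.0, 0.0])
losses = [cross_entropy(0.7), hinge(0.4)]
value = ensemble_loss(losses, alpha)
```

In a real training loop, the gradient of `ensemble_loss` with respect to `alpha` would flow through the softmax, letting the model shift weight toward whichever component loss is most useful on the data at hand.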

Metadata
Title
Combination of loss functions for deep text classification
Authors
Hamideh Hajiabadi
Diego Molla-Aliod
Reza Monsefi
Hadi Sadoghi Yazdi
Publication date
19-07-2019
Publisher
Springer Berlin Heidelberg
Published in
International Journal of Machine Learning and Cybernetics / Issue 4/2020
Print ISSN: 1868-8071
Electronic ISSN: 1868-808X
DOI
https://doi.org/10.1007/s13042-019-00982-x
