Published in: International Journal of Machine Learning and Cybernetics 4/2020

19-07-2019 | Original Article

Combination of loss functions for deep text classification

Authors: Hamideh Hajiabadi, Diego Molla-Aliod, Reza Monsefi, Hadi Sadoghi Yazdi



Abstract

Ensemble methods have been shown to improve the results of statistical classifiers by combining multiple single learners into a strong one. In this paper, we explore the use of ensemble methods at the level of the objective function of a deep neural network. We propose a novel objective function that is a linear combination of single losses, and we integrate it into a deep neural network so that the weights of the linear combination are learned by backpropagation during the training stage. We study the impact of this ensemble loss function on state-of-the-art convolutional neural networks for text classification and show the effectiveness of our approach through comprehensive experiments. The experimental results demonstrate a significant improvement over conventional state-of-the-art methods in the literature.
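The core idea described above — a combination of individual losses whose mixing weights are learned jointly with the network during training — can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: the function names, the choice of component losses, and the softmax parameterization that keeps the weights positive and summing to one are all assumptions for the sake of the sketch.

```python
import numpy as np

def softmax(z):
    """Map unconstrained parameters to positive weights summing to 1."""
    e = np.exp(z - np.max(z))
    return e / e.sum()

def cross_entropy(p_true):
    """Cross-entropy loss given the predicted probability of the true class."""
    return -np.log(p_true)

def hinge(margin):
    """Hinge loss given the classification margin y * f(x)."""
    return max(0.0, 1.0 - margin)

def ensemble_loss(losses, alpha):
    """Convex combination of single losses.

    `alpha` holds unconstrained parameters; in the paper's setting they
    would be updated by backpropagation alongside the network weights.
    """
    w = softmax(alpha)
    return float(np.dot(w, losses))

# Example: equal alpha gives equal weights, so the ensemble loss is the mean.
alpha = np.array([0.0, 0.0])
losses = [cross_entropy(0.7), hinge(0.4)]
value = ensemble_loss(losses, alpha)
```

In a real training loop, the gradient of `ensemble_loss` with respect to `alpha` would flow through the softmax, letting the model shift weight toward whichever component loss is most useful on the data at hand.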

Metadata
Title
Combination of loss functions for deep text classification
Authors
Hamideh Hajiabadi
Diego Molla-Aliod
Reza Monsefi
Hadi Sadoghi Yazdi
Publication date
19-07-2019
Publisher
Springer Berlin Heidelberg
Published in
International Journal of Machine Learning and Cybernetics / Issue 4/2020
Print ISSN: 1868-8071
Electronic ISSN: 1868-808X
DOI
https://doi.org/10.1007/s13042-019-00982-x
