
24-10-2016

Semi-supervised Learning for Affective Common-Sense Reasoning

Authors: Luca Oneto, Federica Bisio, Erik Cambria, Davide Anguita

Published in: Cognitive Computation | Issue 1/2017


Abstract

Background

Big social data analysis is the area of research focusing on collecting, examining, and processing large multi-modal and multi-source datasets in order to discover patterns/correlations and extract information from the Social Web. This is usually accomplished through supervised and unsupervised machine learning algorithms that learn from the available data. However, these algorithms are usually highly computationally expensive, either in the training or in the prediction phase, and are often unable to handle current data volumes. Parallel approaches have been proposed to boost processing speeds, but this clearly requires technologies that support distributed computations.

Methods

Extreme learning machines (ELMs) are an emerging learning paradigm that provides an efficient, unified solution for generalized feed-forward neural networks. ELMs offer significant advantages such as fast learning speed, ease of implementation, and minimal human intervention. However, ELMs cannot be easily parallelized, because of the pseudo-inverse calculation at their core. This paper therefore aims to find a reliable way to realize a parallel implementation of ELM that can be applied to the large datasets typical of Big Data problems, employing the most recent technology for parallel in-memory computation, i.e., Spark, which is designed to efficiently handle iterative procedures that recursively perform operations over the same data. Moreover, this paper shows how to exploit the most recent advances in statistical learning theory (SLT) to select the ELM hyperparameters that give the best generalization performance. This involves assessing the performance of such model selection strategies (i.e., resampling methods and in-sample methods) by adapting the most recent SLT results to the Big Data framework. The proposed approach has been tested on two affective analogical reasoning datasets. Affective analogical reasoning can be defined as the intrinsically human capacity to interpret the cognitive and affective information associated with natural language. In particular, we employed two benchmarks, each composed of 21,743 common-sense concepts; each concept is represented according to two models of a semantic network in which common-sense concepts are linked to a hierarchy of affective domain labels.
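To make concrete why the pseudo-inverse is the bottleneck, and how it can nonetheless be distributed, consider the minimal sketch below (plain NumPy; the function names, the tanh activation, and the ridge term `lam` are illustrative assumptions, not the authors' exact formulation). Once the output-weight problem is written through the normal equations, the expensive quantities H^T H and H^T y are plain sums over samples, so they can be accumulated block by block:

```python
import numpy as np

def elm_fit(X, y, n_hidden=500, lam=1e-3, n_blocks=4, seed=0):
    """Fit a single-output ELM: random hidden layer, then a regularized
    least-squares solve for the output weights beta."""
    rng = np.random.default_rng(seed)
    d = X.shape[1]
    W = rng.standard_normal((d, n_hidden))  # random input weights, never trained
    b = rng.standard_normal(n_hidden)       # random biases

    # The expensive step of ELM is the pseudo-inverse of the hidden matrix H.
    # Via the normal equations, H^T H and H^T y are sums over samples: each
    # block contributes a local term ("map") and the terms are summed
    # ("reduce") -- the kind of pattern a Spark job can distribute.
    HtH = np.zeros((n_hidden, n_hidden))
    Hty = np.zeros(n_hidden)
    for Xb, yb in zip(np.array_split(X, n_blocks), np.array_split(y, n_blocks)):
        Hb = np.tanh(Xb @ W + b)  # hidden activations for one block
        HtH += Hb.T @ Hb
        Hty += Hb.T @ yb

    # The final solve is n_hidden x n_hidden: its cost does not grow with n.
    beta = np.linalg.solve(HtH + lam * np.eye(n_hidden), Hty)
    return W, b, beta

def elm_predict(X, W, b, beta):
    return np.tanh(X @ W + b) @ beta
```

In an actual Spark version, the block loop would become a map over the data partitions followed by a reduce of the two accumulators, with only the final small solve running on the driver; this is one standard way to distribute the computation, offered here as a sketch rather than the paper's exact algorithm.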

Results

The labeled data have been split into two sets: the first 20,000 samples were used to build the model with the ELM under the different SLT strategies, while the remaining 1743 labeled samples were kept apart as a reference set to test the performance of the learned model. The splitting process was repeated 30 times in order to obtain statistically relevant results. The experiments were run on the Google Cloud Platform, in particular on Google Compute Engine, with NM = 4 machines, each with two cores and 1.8 GB of RAM (machine type n1-highcpu-2) and a 30 GB HDD, equipped with Spark. Results on the two affective datasets both show the effectiveness of the proposed parallel approach and indicate the most suitable SLT strategies for this specific Big Data problem.
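As a concrete restatement of this evaluation protocol, the sketch below (an assumption-laden illustration: it presumes binary labels in {-1, +1} and a `train` helper that returns a prediction function, neither of which is specified in the paper) repeats the 20,000/1,743 split 30 times and aggregates the test error:

```python
import numpy as np

def repeated_split_eval(X, y, train, n_train=20_000, n_repeats=30, seed=0):
    """Repeat a random split n_repeats times: fit on the first n_train
    samples of each permutation, test on the held-out reference set."""
    rng = np.random.default_rng(seed)
    errors = []
    for _ in range(n_repeats):
        perm = rng.permutation(len(y))
        tr, te = perm[:n_train], perm[n_train:]  # 20,000 / 1,743 here
        predict = train(X[tr], y[tr])            # returns a prediction function
        errors.append(np.mean(np.sign(predict(X[te])) != y[te]))
    return float(np.mean(errors)), float(np.std(errors))

# Hypothetical usage with the ELM sketch above:
# def train(Xt, yt):
#     W, b, beta = elm_fit(Xt, yt)
#     return lambda Xs: elm_predict(Xs, W, b, beta)
# mean_err, std_err = repeated_split_eval(X, y, train)
```

Reporting the mean and standard deviation over the 30 repetitions is what makes the comparison between SLT strategies statistically meaningful rather than an artifact of one lucky split.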

Conclusion

In this paper we showed how to build an ELM model with a novel scalable approach and how to carefully assess its performance, using the most recent results from SLT, on a sentiment analysis problem. Thanks to recent technologies and methods, the computational requirements of these methods have been reduced enough to allow scaling to the large datasets typical of Big Data applications.


Footnotes
1. In this paper, we deal with a frequentist approach, which derives confidence intervals for quantities of interest, but the credible intervals of the Bayesian approach can be addressed equally well in the parametric setting [30].
2. We have exploited the property \(\sqrt{2ab} \le \frac{a}{2} + b\) in order to remove all the constant terms which do not depend on \({\widehat{\beta }}_{\text{loo}}({\mathscr{A}}_{\mathscr{H}}, {\sqrt{n}}/{2})\).
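For completeness, this inequality is the arithmetic-geometric mean inequality in disguise (a short check, assuming \(a, b \ge 0\)):
\[
\sqrt{2ab} \;=\; \sqrt{a \cdot 2b} \;\le\; \frac{a + 2b}{2} \;=\; \frac{a}{2} + b,
\]
i.e., \(\sqrt{xy} \le (x+y)/2\) applied with \(x = a\) and \(y = 2b\).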
 
Literature
1. Cambria E. Affective computing and sentiment analysis. IEEE Intell Syst. 2016;31(2):102–7.
2. Saif H, He Y, Fernandez M, Alani H. Contextual semantics for sentiment analysis of Twitter. Inf Process Manag. 2016;52(1):5–19.
3. Xia R, Xu F, Yu J, Qi Y, Cambria E. Polarity shift detection, elimination and ensemble: a three-stage model for document-level sentiment analysis. Inf Process Manag. 2016;52(1):36–45.
4. Balahur A, Jacquet G. Sentiment analysis meets social media: challenges and solutions of the field in view of the current information sharing context. Inf Process Manag. 2015;51(4):428–32.
6. Roy RS, Agarwal S, Ganguly N, Choudhury M. Syntactic complexity of web search queries through the lenses of language models, networks and users. Inf Process Manag. 2016;52(5):923–48.
7. Abainia K, Ouamour S, Sayoud H. Effective language identification of forum texts based on statistical approaches. Inf Process Manag. 2016;52(4):491–512.
8. Sun J, Wang G, Cheng X, Fu Y. Mining affective text to improve social media item recommendation. Inf Process Manag. 2015;51(4):444–57.
9. Cambria E, Hussain A. Sentic computing: a common-sense-based framework for concept-level sentiment analysis. Cham: Springer; 2015.
10. Poria S, Cambria E, Howard N, Huang G-B, Hussain A. Fusing audio, visual and textual clues for sentiment analysis from multimodal content. Neurocomputing. 2016;174:50–9.
11. Wang Q, Cambria E, Liu C, Hussain A. Common sense knowledge for handwritten Chinese recognition. Cogn Comput. 2013;5(2):234–42.
12. Cambria E, Hussain A, Durrani T, Havasi C, Eckl C, Munro J. Sentic computing for patient centered application. In: IEEE ICSP, Beijing; 2010. p. 1279–82.
13. Cambria E, Gastaldo P, Bisio F, Zunino R. An ELM-based model for affective analogical reasoning. Neurocomputing. 2015;149:443–55.
14. Cambria E, Fu J, Bisio F, Poria S. AffectiveSpace 2: enabling affective intuition for concept-level sentiment analysis. In: AAAI, Austin; 2015. p. 508–14.
15. Cambria E, Wang H, White B. Guest editorial: big social data analysis. Knowl Based Syst. 2014;69:1–2.
16. Chakraborty M, Pal S, Pramanik R, Chowdary CR. Recent developments in social spam detection and combating techniques: a survey. Inf Process Manag. 2016;52(6):1053–73.
17. Kranjc J, Smailović J, Podpečan V, Grčar M, Žnidaršič M, Lavrač N. Active learning for sentiment analysis on data streams: methodology and workflow implementation in the ClowdFlows platform. Inf Process Manag. 2015;51(2):187–203.
18. Fersini E, Messina E, Pozzi FA. Expressive signals in social media languages to improve polarity detection. Inf Process Manag. 2016;52(1):20–35.
19. Cambria E, Livingstone A, Hussain A. The hourglass of emotions. In: Esposito A, Esposito AM, Vinciarelli A, Hoffmann R, Müller CC, editors. Cognitive behavioural systems. Berlin Heidelberg: Springer; 2012. p. 144–57.
20. Huang G-B, Wang DH, Lan Y. Extreme learning machines: a survey. Int J Mach Learn Cybern. 2011;2(2):107–22.
21. Huang G, Song S, Gupta JND, Wu C. Semi-supervised and unsupervised extreme learning machines. IEEE Trans Cybern. 2014;44(12):2405–17.
22. Cambria E, Huang G-B, et al. Extreme learning machines. IEEE Intell Syst. 2013;28(6):30–59.
23. Huang G-B, Cambria E, Toh K-A, Widrow B, Xu Z. New trends of learning in computational intelligence. IEEE Comput Intell Mag. 2015;10(2):16–7.
24. Chapelle O, Schölkopf B, Zien A, editors. Semi-supervised learning. Cambridge: MIT Press; 2006.
25. Zhu X. Semi-supervised learning literature survey. Madison: University of Wisconsin; 2005.
26. Habernal I, Ptáček T, Steinberger J. Supervised sentiment analysis in Czech social media. Inf Process Manag. 2014;50(5):693–707.
27. Guo Z, Zhang ZM, Xing EP, Faloutsos C. Semi-supervised learning based on semiparametric regularization, vol. 8. In: SDM, SIAM; 2008. p. 132–42.
28. Belkin M, Niyogi P, Sindhwani V. Manifold regularization: a geometric framework for learning from labeled and unlabeled examples. J Mach Learn Res. 2006;7:2399–434.
29. Draper NR, Smith H, Pownell E. Applied regression analysis. New York: Wiley; 1966.
30.
31. Breiman L. Statistical modeling: the two cultures (with comments and a rejoinder by the author). Stat Sci. 2001;16(3):199–231.
32.
33. Vapnik VN. An overview of statistical learning theory. IEEE Trans Neural Netw. 1999;10(5):988–99.
34. Wolpert DH. The lack of a priori distinctions between learning algorithms. Neural Comput. 1996;8(7):1341–90.
35.
36. Vapnik VN. Statistical learning theory. New York: Wiley-Interscience; 1998.
37.
38. Bartlett PL, Boucheron S, Lugosi G. Model selection and error estimation. Mach Learn. 2002;48(1–3):85–113.
39. Langford J. Tutorial on practical prediction theory for classification. J Mach Learn Res. 2006;6(1):273.
40. Anguita D, Ghio A, Oneto L, Ridella S. In-sample and out-of-sample model selection and error estimation for support vector machines. IEEE Trans Neural Netw Learn Syst. 2012;23(9):1390–406.
41. Kohavi R, et al. A study of cross-validation and bootstrap for accuracy estimation and model selection. In: International joint conference on artificial intelligence; 1995.
42. Efron B, Tibshirani RJ. An introduction to the bootstrap. London: Chapman & Hall; 1993.
43. Oneto L, Ghio A, Ridella S, Anguita D. Fully empirical and data-dependent stability-based bounds. IEEE Trans Cybern. 2015;45(9):1913–26.
44. Anguita D, Ghio A, Oneto L, Ridella S. A deep connection between the Vapnik–Chervonenkis entropy and the Rademacher complexity. IEEE Trans Neural Netw Learn Syst. 2014;25(12):2202–11.
45. Oneto L, Ghio A, Ridella S, Anguita D. Global Rademacher complexity bounds: from slow to fast convergence rates. Neural Process Lett. 2016;43(2):567–602.
46. Bartlett PL, Bousquet O, Mendelson S. Local Rademacher complexities. Ann Stat. 2005;33(4):1497–537.
47. Oneto L, Ghio A, Ridella S, Anguita D. Local Rademacher complexity: sharper risk bounds with and without unlabeled samples. Neural Netw. 2015 (in press).
49. McAllester DA. Some PAC-Bayesian theorems. In: Proceedings of the eleventh annual conference on computational learning theory. ACM; 1998. p. 230–4.
50. Lever G, Laviolette F, Shawe-Taylor J. Tighter PAC-Bayes bounds through distribution-dependent priors. Theoret Comput Sci. 2013;473:4–28.
51. Germain P, Lacasse A, Laviolette F, Marchand M, Roy JF. Risk bounds for the majority vote: from a PAC-Bayesian analysis to a learning algorithm. J Mach Learn Res. 2015;16(4):787–860.
52. Bégin L, Germain P, Laviolette F, Roy JF. PAC-Bayesian bounds based on the Rényi divergence. In: International conference on artificial intelligence and statistics; 2016.
53. Floyd S, Warmuth M. Sample compression, learnability, and the Vapnik–Chervonenkis dimension. Mach Learn. 1995;21(3):269–304.
54. Langford J, McAllester DA. Computable shell decomposition bounds. In: Proceedings of the eleventh annual conference on computational learning theory; 2000. p. 25–34.
55. Bousquet O, Elisseeff A. Stability and generalization. J Mach Learn Res. 2002;2:499–526.
56. Poggio T, Rifkin R, Mukherjee S, Niyogi P. General conditions for predictivity in learning theory. Nature. 2004;428(6981):419–22.
57. Guyon I, Saffari A, Dror G, Cawley G. Model selection: beyond the Bayesian/frequentist divide. J Mach Learn Res. 2010;11:61–87.
58. Huang GB. What are extreme learning machines? Filling the gap between Frank Rosenblatt's dream and John von Neumann's puzzle. Cogn Comput. 2015;7(3):263–78.
60. Huang GB, Bai Z, Kasun LLC, Vong CM. Local receptive fields based extreme learning machine. IEEE Comput Intell Mag. 2015;10(2):18–29.
61. Huang G-B, Zhou H, Ding X, Zhang R. Extreme learning machine for regression and multiclass classification. IEEE Trans Syst Man Cybern B Cybern. 2012;42(2):513–29.
62. Bisio F, Decherchi S, Gastaldo P, Zunino R. Inductive bias for semi-supervised extreme learning machine, vol. 1. In: Proceedings of ELM-2014; 2015.
63. Dinuzzo F, Schölkopf B. The representer theorem for Hilbert spaces: a necessary and sufficient condition. In: Advances in neural information processing systems; 2012. p. 189–96.
64. Schölkopf B, Herbrich R, Smola AJ. A generalized representer theorem. In: International conference on computational learning theory. Berlin Heidelberg: Springer; 2001. p. 416–26.
65. Erhan D, Bengio Y, Courville A, Manzagol P-A, Vincent P, Bengio S. Why does unsupervised pre-training help deep learning? J Mach Learn Res. 2010;11:625–60.
66. Salakhutdinov R, Hinton G. An efficient learning procedure for deep Boltzmann machines. Neural Comput. 2012;24(8):1967–2006.
67. Arlot S, Celisse A. A survey of cross-validation procedures for model selection. Stat Surv. 2010;4:40–79.
68. McAllester DA. PAC-Bayesian stochastic model selection. Mach Learn. 2003;51(1):5–21.
69. Anguita D, Ghio A, Oneto L, Ridella S. In-sample model selection for support vector machines. In: International joint conference on neural networks; 2011.
70. Koltchinskii V. Rademacher penalties and structural risk minimization. IEEE Trans Inf Theory. 2001;47(5):1902–14.
71. Inoue A, Kilian L. In-sample or out-of-sample tests of predictability: which one should we use? Econom Rev. 2005;23(4):371–402.
72. Cheng F, Yu J, Xiong H. Facial expression recognition in JAFFE dataset based on Gaussian process classification. IEEE Trans Neural Netw. 2010;21(10):1685–90.
73. Shalev-Shwartz S, Ben-David S. Understanding machine learning: from theory to algorithms. Cambridge: Cambridge University Press; 2014.
74. Hoeffding W. Probability inequalities for sums of bounded random variables. J Am Stat Assoc. 1963;58(301):13–30.
75. Anguita D, Ghio A, Ridella S, Sterpi D. K-fold cross validation for error rate estimate in support vector machines. In: International conference on data mining; 2009.
76. Vapnik VN, Kotz S. Estimation of dependences based on empirical data, vol. 41. New York: Springer; 1982.
77. Shawe-Taylor J, Bartlett PL, Williamson RC, Anthony M. Structural risk minimization over data-dependent hierarchies. IEEE Trans Inf Theory. 1998;44(5):1926–40.
78. Boucheron S, Lugosi G, Massart P. A sharp concentration inequality with applications. Random Struct Algorithms. 2000;16(3):277–92.
79. Boucheron S, Lugosi G, Massart P. Concentration inequalities: a nonasymptotic theory of independence. Oxford: Oxford University Press; 2013.
80. Bartlett PL, Mendelson S. Rademacher and Gaussian complexities: risk bounds and structural results. J Mach Learn Res. 2003;3:463–82.
81. Laviolette F, Marchand M. PAC-Bayes risk bounds for stochastic averages and majority votes of sample-compressed classifiers. J Mach Learn Res. 2007;8(7):1461–87.
82. Lacasse A, Laviolette F, Marchand M, Germain P, Usunier N. PAC-Bayes bounds for the risk of the majority vote and the variance of the Gibbs classifier. In: Advances in neural information processing systems; 2006. p. 769–76.
83. Breiman L. Bagging predictors. Mach Learn. 1996;24(2):123–40.
85. Schapire RE, Freund Y, Bartlett P, Lee WS. Boosting the margin: a new explanation for the effectiveness of voting methods. Ann Stat. 1998;26(5):1651–86.
86. Schapire RE, Singer Y. Improved boosting algorithms using confidence-rated predictions. Mach Learn. 1999;37(3):297–336.
87. Gelman A, Carlin JB, Stern HS, Rubin DB. Bayesian data analysis, vol. 2. London: Taylor & Francis; 2014.
88. Rakhlin A, Mukherjee S, Poggio T. Stability results in learning theory. Anal Appl. 2005;3(4):397–417.
89. Devroye L, Györfi L, Lugosi G. A probabilistic theory of pattern recognition. Berlin: Springer; 1996.
90. Dietrich R, Opper M, Sompolinsky H. Statistical mechanics of support vector networks. Phys Rev Lett. 1999;82(14):2975.
91. Li M, Vitányi P. An introduction to Kolmogorov complexity and its applications. New York: Springer; 2013.
92. Grünwald PD. The minimum description length principle. Cambridge: MIT Press; 2007.
93. Tikhonov AN, Arsenin VI. Solutions of ill-posed problems. New York: V.H. Winston; 1977.
94. Boyd S, Vandenberghe L. Convex optimization. Cambridge: Cambridge University Press; 2004.
95. Serfling RJ. Probability inequalities for the sum in sampling without replacement. Ann Stat. 1974;2(1):39–48.
96. Zhu X, Goldberg AB. Introduction to semi-supervised learning. Synth Lect Artif Intell Mach Learn. 2009;3(1):1–130.
97. Anguita D, Ghio A, Oneto L, Ridella S. In-sample model selection for trimmed hinge loss support vector machine. Neural Process Lett. 2012;36(3):275–83.
98. Bartlett PL, Long PM, Williamson RC. Fat-shattering and the learnability of real-valued functions. In: Proceedings of the seventh annual conference on computational learning theory. ACM; 1994. p. 299–310.
99. Zhou D-X. The covering number in learning theory. J Complex. 2002;18(3):739–67.
100. Massart P. Some applications of concentration inequalities to statistics. Ann Fac Sci Toulouse Math. 2000;9(2):245–303.
101. Ivanov VV. The theory of approximate methods and their applications to the numerical solution of singular integral equations. US: Springer Science & Business Media; 1976.
102. Pelckmans K, Suykens JA, De Moor B. Morozov, Ivanov and Tikhonov regularization based LS-SVMs. In: International conference on neural information processing. Berlin Heidelberg: Springer; 2004. p. 1216–22.
103. Oneto L, Anguita D, Ghio A, Ridella S. The impact of unlabeled patterns in Rademacher complexity theory for kernel classifiers. In: Advances in neural information processing systems; 2011. p. 585–93.
104. Anguita D, Ghio A, Oneto L, Ridella S. Unlabeled patterns to tighten Rademacher complexity error bounds for kernel classifiers. Pattern Recognit Lett. 2014;37:210–9.
105. Eckart C, Young G. The approximation of one matrix by another of lower rank. Psychometrika. 1936;1(3):211–8.
Metadata
Title
Semi-supervised Learning for Affective Common-Sense Reasoning
Authors
Luca Oneto
Federica Bisio
Erik Cambria
Davide Anguita
Publication date
24-10-2016
Publisher
Springer US
Published in
Cognitive Computation / Issue 1/2017
Print ISSN: 1866-9956
Electronic ISSN: 1866-9964
DOI
https://doi.org/10.1007/s12559-016-9433-5
