Published in: Progress in Artificial Intelligence 3/2017

16.02.2017 | Regular Paper

The effect of human thought on data: an analysis of self-reported data in supervised learning and neural networks

Authors: Justin Lovinger, Iren Valova

Abstract

While scientific applications can gather consistent data from the natural world, psychological, sociological, and even economic applications rely on data provided by people. Since the majority of machine learning is aimed at improving people’s lives, human input is essential for useful results. In this paper, we explore datasets in which both input and target attributes are provided by people taking surveys. Every survey dataset, generated from human input, is reliable and self-consistent according to Cronbach’s alpha. One would expect a reliable questionnaire to provide effective data for learning. Our analysis finds this expectation false when applied to supervised learning. We use both statistical analysis and several supervised learning architectures, with a focus on neural networks, to provide insight into data gathered through human input.
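The abstract relies on Cronbach’s alpha as its measure of questionnaire reliability. As a minimal sketch of how that statistic is commonly computed (the standard textbook formula, not necessarily the authors' exact procedure), alpha for k items is k/(k-1) times one minus the ratio of summed item variances to the variance of respondents' total scores. The function name `cronbach_alpha` and the sample responses are hypothetical and used only for illustration.

```python
from statistics import variance  # sample variance (n - 1 denominator)


def cronbach_alpha(responses):
    """Cronbach's alpha for a list of respondents' item scores.

    `responses` holds one row per respondent; each row lists that
    respondent's numeric score on each of the k questionnaire items.
    """
    if len(responses) < 2:
        raise ValueError("need at least two respondents")
    k = len(responses[0])
    if k < 2:
        raise ValueError("need at least two items")
    if any(len(row) != k for row in responses):
        raise ValueError("every respondent must answer every item")

    # Variance of each item (column) across respondents.
    item_variance_sum = sum(variance(column) for column in zip(*responses))
    # Variance of each respondent's total score.
    total_variance = variance([sum(row) for row in responses])
    if total_variance == 0:
        raise ValueError("total scores have zero variance")

    return k / (k - 1) * (1 - item_variance_sum / total_variance)


# Hypothetical survey: four respondents, two items.
sample = [[1, 2], [2, 1], [3, 4], [4, 3]]
print(cronbach_alpha(sample))
```

When every respondent gives the same score on all items, the statistic equals 1 up to floating-point rounding. A high alpha of this kind indicates internal consistency only. The abstract's point is that such consistency does not by itself guarantee that the data are effective for supervised learning.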

Metadata
Title
The effect of human thought on data: an analysis of self-reported data in supervised learning and neural networks
Authors
Justin Lovinger
Iren Valova
Publication date
16.02.2017
Publisher
Springer Berlin Heidelberg
Published in
Progress in Artificial Intelligence / Issue 3/2017
Print ISSN: 2192-6352
Electronic ISSN: 2192-6360
DOI
https://doi.org/10.1007/s13748-017-0118-4
