
2018 | OriginalPaper | Book chapter

Active Learning for Improving Machine Learning of Student Explanatory Essays

Authors: Peter Hastings, Simon Hughes, M. Anne Britt

Published in: Artificial Intelligence in Education

Publisher: Springer International Publishing


Abstract

There is increasing emphasis, especially in STEM areas, on students' ability to create explanatory descriptions. Holistic evaluations of explanations can be performed relatively easily with shallow language processing, by humans or by computers. However, such evaluations provide little information about an essential element of explanation quality: the structure of the explanation, i.e., how it connects causes to effects. Because feedback on explanation structure is difficult to provide, teachers may either avoid this type of assignment or give only shallow feedback on it. Using machine learning techniques, we have developed successful computational models for analyzing explanatory essays. A major cost of developing such models is the time and effort required for human annotation of the essays. As part of a large project studying students' reading processes, we collected a large number of explanatory essays, annotated them thoroughly, and used the annotated essays to train our machine learning models. In this paper, we focus on how to get the best payoff from the expensive annotation process in such an educational context, and we evaluate a method called Active Learning.
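The abstract does not state which selection strategy is used. As a hedged illustration of the general idea only, the sketch below shows pool-based active learning with least-confidence uncertainty sampling, one common strategy and not necessarily the one evaluated in this chapter: from the pool of unlabeled essays, it picks those whose predicted probability of carrying a given annotation is closest to 0.5, next to a random-sampling baseline. The essay IDs, the probabilities, and the function names `select_uncertain` and `select_random` are hypothetical.

```python
import random
from typing import Dict, List


def select_uncertain(pool_probs: Dict[str, float], k: int) -> List[str]:
    """Least-confidence uncertainty sampling (illustrative sketch).

    pool_probs maps a hypothetical essay ID to the current model's
    probability that the essay carries the target annotation.
    Returns the k IDs whose probability is closest to 0.5.
    """
    # The model is least certain when p is near 0.5, so rank by |p - 0.5|.
    # Ties are broken by ID so the selection is deterministic.
    ranked = sorted(pool_probs, key=lambda eid: (abs(pool_probs[eid] - 0.5), eid))
    return ranked[:k]


def select_random(pool_ids: List[str], k: int, seed: int = 0) -> List[str]:
    """Random-sampling baseline: choose k pool items uniformly at random."""
    rng = random.Random(seed)
    return rng.sample(sorted(pool_ids), k)


if __name__ == "__main__":
    # Hypothetical model outputs for five unlabeled essays.
    probs = {"e1": 0.95, "e2": 0.52, "e3": 0.10, "e4": 0.45, "e5": 0.70}
    print(select_uncertain(probs, 2))  # ['e2', 'e4']
    print(select_random(list(probs), 2))
```

In an active-learning loop, the selected essays would be sent to human annotators, added to the training set, and the model retrained before the next selection; the random baseline indicates how much of any improvement is due to the selection strategy rather than simply to more data.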


Footnotes
1
The percentages are all parameters to the model. These were selected because they allowed us to see the performance of the models at a reasonable granularity. It should be noted, however, that in our case, 10% of the total set represents over 100 additional essays. In real-world settings, a smaller increment would likely be used due to the cost of annotation.
 
2
We used a validation or holdout set to provide a consistent basis on which to judge the performance of the models.
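Footnotes 1 and 2 describe a simulation in which a fixed holdout set is kept aside for evaluation while the training set grows in increments, each a fixed percentage of the total set, drawn from a remainder pool. A minimal sketch of that bookkeeping follows. It assumes a hypothetical corpus of 1,200 essay IDs, a 20% holdout, a 10% initial training set, and 10% increments; the chapter's actual split sizes are not given here, and `split_corpus`, `run_schedule`, and the `select` callback are illustrative names.

```python
import random
from typing import Callable, List, Tuple

# A selection strategy takes the current pool and a batch size k.
Selector = Callable[[List[str], int], List[str]]


def split_corpus(ids: List[str], holdout_pct: float, initial_pct: float,
                 seed: int = 0) -> Tuple[List[str], List[str], List[str]]:
    """Shuffle once, then carve off a fixed holdout set and an initial
    training set; everything else forms the remainder pool."""
    rng = random.Random(seed)
    shuffled = ids[:]
    rng.shuffle(shuffled)
    n_hold = round(holdout_pct * len(ids))
    n_init = round(initial_pct * len(ids))
    holdout = shuffled[:n_hold]
    train = shuffled[n_hold:n_hold + n_init]
    pool = shuffled[n_hold + n_init:]
    return holdout, train, pool


def run_schedule(ids: List[str], holdout_pct: float, initial_pct: float,
                 step_pct: float, select: Selector) -> List[int]:
    """Grow the training set by step_pct of the *total* corpus per round
    until the pool is empty; return the training-set size after each round."""
    holdout, train, pool = split_corpus(ids, holdout_pct, initial_pct)
    step = max(1, round(step_pct * len(ids)))
    sizes = [len(train)]
    while pool:
        # In a real run, a model would be retrained on `train` here and
        # scored on the unchanged `holdout` set before each selection.
        batch = select(pool, min(step, len(pool)))
        chosen = set(batch)
        train.extend(batch)
        pool = [i for i in pool if i not in chosen]
        sizes.append(len(train))
    return sizes


if __name__ == "__main__":
    essays = [f"essay{i}" for i in range(1200)]  # hypothetical corpus
    take_first = lambda pool, k: pool[:k]         # placeholder strategy
    print(run_schedule(essays, 0.2, 0.1, 0.1, take_first))
    # [120, 240, 360, 480, 600, 720, 840, 960]
```

Because the increment is a share of the whole corpus rather than of the shrinking pool, every round adds the same number of essays (here 120). A smaller `step_pct` gives finer granularity at a lower annotation cost per round, which is the trade-off footnote 1 points out.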
 
3
For what it’s worth, these are analogous to the U.S. House of Representatives and Senate, respectively, with one giving more weight to more “populous” (i.e., frequent) entities, and the other giving “equal representation” to each entity.
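Footnote 3 does not name the two measures, but the weighting it describes matches the usual distinction between micro-averaging, which pools counts over all entities so that frequent ones dominate (the "House"), and macro-averaging, which scores each entity separately and takes the unweighted mean (the "Senate"). Assuming that reading, and that the underlying score is F1, the sketch below uses made-up per-entity counts to show how the two averages can diverge.

```python
from typing import Dict, Tuple

# Hypothetical per-entity counts: (true positives, false positives, false negatives).
Counts = Dict[str, Tuple[int, int, int]]


def f1(tp: int, fp: int, fn: int) -> float:
    """F1 = 2TP / (2TP + FP + FN); defined as 0.0 when the denominator is 0."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def micro_f1(counts: Counts) -> float:
    """'House'-style: pool counts over all entities, so frequent ones weigh more."""
    tp = sum(c[0] for c in counts.values())
    fp = sum(c[1] for c in counts.values())
    fn = sum(c[2] for c in counts.values())
    return f1(tp, fp, fn)


def macro_f1(counts: Counts) -> float:
    """'Senate'-style: each entity gets one equal vote in the average."""
    return sum(f1(*c) for c in counts.values()) / len(counts)


if __name__ == "__main__":
    # One frequent entity recognized well, one rare entity recognized poorly.
    counts = {"frequent": (90, 10, 10), "rare": (1, 4, 4)}
    print(f"micro F1 = {micro_f1(counts):.3f}")  # micro F1 = 0.867
    print(f"macro F1 = {macro_f1(counts):.3f}")  # macro F1 = 0.550
```

Here the poorly recognized rare entity barely affects the micro average but pulls the macro average well below it, which is why reporting both gives a fuller picture when entity frequencies are imbalanced.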
 
4
Alternatively, we could have used the frequencies from the training set. We used frequencies from the remainder pool because they would be more accurate, especially at the earlier stages. In a real-life setting where the items in the remainder pool would be unlabeled, those frequencies would, of course, be unknown.
 
Metadata
Copyright year: 2018
DOI: https://doi.org/10.1007/978-3-319-93843-1_11
