Published in: Journal of Science Education and Technology 3/2021

19.11.2020

A Meta-Analysis of Machine Learning-Based Science Assessments: Factors Impacting Machine-Human Score Agreements

Authors: Xiaoming Zhai, Lehong Shi, Ross H. Nehm



Abstract

Machine learning (ML) has been increasingly employed in science assessment to facilitate automatic scoring efforts, although with varying degrees of success (i.e., magnitudes of machine-human score agreements [MHAs]). Little work has empirically examined the factors that impact MHA disparities in this growing field, thus constraining the improvement of machine scoring capacity and its wide applications in science education. We performed a meta-analysis of 110 studies of MHAs in order to identify the factors most strongly contributing to scoring success (i.e., high Cohen's kappa [\(\kappa\)]). We empirically examined six factors proposed as contributors to MHA magnitudes: algorithm, subject domain, assessment format, construct, school level, and machine supervision type. Our analyses of 110 MHAs revealed substantial heterogeneity in \(\kappa\) (mean = .64; range = .09–.97, taking weights into consideration). Using three-level random-effects modeling, MHA score heterogeneity was explained by the variability both within publications (i.e., the assessment task level: 82.6%) and between publications (i.e., the individual study level: 16.7%). Our results also suggest that all six factors have significant moderator effects on scoring success magnitudes. Among these, algorithm and subject domain had significantly larger effects than the other factors, suggesting that technical features and assessment external features might be primary targets for improving MHAs and ML-based science assessments.
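The abstract's agreement metric, Cohen's kappa, corrects raw machine-human agreement for the agreement expected by chance given each rater's label frequencies. A minimal sketch in pure Python (the score vectors below are hypothetical illustrations, not data from the studies reviewed):

```python
from collections import Counter

def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa: chance-corrected agreement between two raters."""
    assert len(rater_a) == len(rater_b) and rater_a
    n = len(rater_a)
    # Observed agreement: fraction of items both raters scored identically.
    p_o = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement expected from each rater's marginal label frequencies.
    freq_a, freq_b = Counter(rater_a), Counter(rater_b)
    p_e = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    return (p_o - p_e) / (1 - p_e)

# Hypothetical human vs. machine scores on ten responses (0/1/2 rubric).
human   = [0, 1, 2, 2, 1, 0, 1, 2, 0, 1]
machine = [0, 1, 2, 1, 1, 0, 1, 2, 0, 2]
print(round(cohens_kappa(human, machine), 2))  # → 0.7
```

Here raw agreement is .80, but chance agreement from the marginals is .34, so kappa drops to roughly .70; this gap is why the meta-analysis aggregates kappa rather than percent agreement.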


References
Altman, D. G. (1991). Mathematics for kappa. Practical statistics for medical research, 406–407.
Anderson, C. W., de los Santos, E. X., Bodbyl, S., Covitt, B. A., Edwards, K. D., & Hancock, J. B. (2018). Designing educational systems to support enactment of the Next Generation Science Standards. Journal of Research in Science Teaching, 55(7), 1026–1052.
Bartolucci, A. A., & Hillegass, W. B. (2010). Overview, strengths, and limitations of systematic reviews and meta-analyses. In F. Chiappelli (Ed.), Evidence-based practice: Toward optimizing clinical outcomes (pp. 17–33). Berlin Heidelberg: Springer.
*Beggrow, E. P., Ha, M., Nehm, R. H., Pearl, D., & Boone, W. J. (2014). Assessing scientific practices using machine-learning methods: How closely do they match clinical interview performance? Journal of Science Education and Technology, 23(1), 160–182.
Borenstein, M., Hedges, L. V., Higgins, J. P., & Rothstein, H. R. (2011). Introduction to meta-analysis. John Wiley & Sons.
Bridgeman, B., Trapani, C., & Attali, Y. (2012). Comparison of human and machine scoring of essays: Differences by gender, ethnicity, and country. Applied Measurement in Education, 25(1), 27–40.
Castelvecchi, D. (2016). Can we open the black box of AI? Nature, 538(7623), 20–23.
*Chanijani, S. S. M., Klein, P., Al-Naser, M., Bukhari, S. S., Kuhn, J., & Dengel, A. (2016). A study on representational competence in physics using mobile eye tracking systems. Paper presented at the International Conference on Human-Computer Interaction with Mobile Devices and Services Adjunct.
*Chen, C.-K. (2010). Curriculum assessment using artificial neural network and support vector machine modeling approaches: A case study. IR Applications, Volume 29. Association for Institutional Research.
*Chen, C.-M., Wang, J.-Y., & Yu, C.-M. (2017). Assessing the attention levels of students by using a novel attention aware system based on brainwave signals. British Journal of Educational Technology, 48(2), 348–369.
Chen, J., Zhang, Y., Wei, Y., & Hu, J. (2019). Discrimination of the contextual features of top performers in scientific literacy using a machine learning approach. Research in Science Education, 1–30.
Cohen, J. (1960). A coefficient of agreement for nominal scales. Educational and Psychological Measurement, 20(1), 37–46.
Cohen, J. (1968). Weighted kappa: Nominal scale agreement provision for scaled disagreement or partial credit. Psychological Bulletin, 70(4), 213.
Cohen, J. (2013). Statistical power analysis for the behavioral sciences. New York: Lawrence Erlbaum Associates.
Cooper, H., Valentine, J. C., Charlton, K., & Melson, A. (2003). The effects of modified school calendars on student achievement and on school and community attitudes. Review of Educational Research, 73(1), 1–52.
Donnelly, D. F., Vitale, J. M., & Linn, M. C. (2015). Automated guidance for thermodynamics essays: Critiquing versus revisiting. Journal of Science Education and Technology, 24(6), 861–874.
Dusseldorp, E., Li, X., & Meulman, J. (2016). Which combinations of behaviour change techniques are effective? Assessing interaction effects in meta-analysis. European Health Psychologist, 18, 563.
Duval, S., & Tweedie, R. (2000). A nonparametric "trim and fill" method of accounting for publication bias in meta-analysis. Journal of the American Statistical Association, 95(449), 89–98.
*Elluri, S. (2017). A machine learning approach for identifying the effectiveness of simulation tools for conceptual understanding (Unpublished master's thesis, 10686333). Purdue University, West Lafayette, Indiana.
Everitt, B. (1968). Moments of the statistics kappa and weighted kappa. British Journal of Mathematical and Statistical Psychology, 21(1), 97–103.
Fleiss, J., Levin, B., & Paik, M. (2013). Statistical methods for rates and proportions. John Wiley & Sons.
Gane, B., Zaidi, S., Zhai, X., & Pellegrino, J. (2020). Using machine learning to score tasks that assess three-dimensional science learning. Paper presented at the 2020 annual conference of the American Educational Research Association, California.
Gerard, L. F., & Linn, M. C. (2016). Using automated scores of student essays to support teacher guidance in classroom inquiry. Journal of Science Teacher Education, 27(1), 111–129.
Gerard, L., Matuk, C., McElhaney, K., & Linn, M. C. (2015). Automated, adaptive guidance for K-12 education. Educational Research Review, 15, 41–58.
Gerard, L. F., Ryoo, K., McElhaney, K. W., Liu, O. L., Rafferty, A. N., & Linn, M. C. (2016). Automated guidance for student inquiry. Journal of Educational Psychology, 108(1), 60–81.
*Ghali, R., Frasson, C., & Ouellet, S. (2016, June). Using electroencephalogram to track learner's reasoning in serious games. In International Conference on Intelligent Tutoring Systems (pp. 382–388). Springer, Cham.
*Ghali, R., Ouellet, S., & Frasson, C. (2016). LewiSpace: An exploratory study with a machine learning model in an educational game. Journal of Education and Training Studies, 4(1), 192–201.
*Gobert, J. D., Baker, R. S., & Wixon, M. B. (2015). Operationalizing and detecting disengagement within online science microworlds. Educational Psychologist, 50(1), 43–57.
*Gobert, J. D., Sao Pedro, M., Raziuddin, J., & Baker, R. S. (2013). From log files to assessment metrics: Measuring students' science inquiry skills using educational data mining. Journal of the Learning Sciences, 22(4), 521–563.
Goubeaud, K. (2010). How is science learning assessed at the postsecondary level? Assessment and grading practices in college biology, chemistry and physics. Journal of Science Education and Technology, 19(3), 237–245.
Gwet, K. L. (2014). Handbook of inter-rater reliability: The definitive guide to measuring the extent of agreement among raters. Advanced Analytics, LLC.
*Ha, M. (2013). Assessing scientific practices using machine learning methods: Development of automated computer scoring models for written evolutionary explanations (Doctoral dissertation, The Ohio State University).
*Ha, M., & Nehm, R. H. (2016a). The impact of misspelled words on automated computer scoring: A case study of scientific explanations. Journal of Science Education and Technology, 25(3), 358–374.
*Ha, M., & Nehm, R. (2016b). Predicting the accuracy of computer scoring of text: Probabilistic, multi-model, and semantic similarity approaches. Paper presented at the annual conference of the National Association for Research in Science Teaching, Baltimore, MD, April 14–17.
*Ha, M., Nehm, R. H., Urban-Lurain, M., & Merrill, J. E. (2011). Applying computerized-scoring models of written biological explanations across courses and colleges: Prospects and limitations. CBE Life Sciences Education, 10(4), 379–393.
Huang, C.-J., Wang, Y.-W., Huang, T.-H., Chen, Y.-C., Chen, H.-M., & Chang, S.-C. (2011). Performance evaluation of an online argumentation learning assistance agent. Computers & Education, 57(1), 1270–1280.
Hunt, R. J. (1986). Percent agreement, Pearson's correlation, and kappa as measures of inter-examiner reliability. Journal of Dental Research, 65(2), 128–130.
Hutter, F., Kotthoff, L., & Vanschoren, J. (2019). Automated machine learning. Springer.
Jordan, M. I., & Mitchell, T. M. (2015). Machine learning: Trends, perspectives, and prospects. Science, 349(6245), 255–260.
Jovic, A., Brkic, K., & Bogunovic, N. (2014, May). An overview of free software tools for general data mining. In 2014 37th International Convention on Information and Communication Technology, Electronics and Microelectronics (MIPRO) (pp. 1112–1117). IEEE.
*Kim, K. J., Pope, D. S., Wendel, D., & Meir, E. (2017). WordBytes: Exploring an intermediate constraint format for rapid classification of student answers on constructed response assessments. Journal of Educational Data Mining, 9(2), 45–71.
*Klebanov, B., Burstein, J., Harackiewicz, J. M., Priniski, S. J., & Mulholland, M. (2017). Reflective writing about the utility value of science as a tool for increasing STEM motivation and retention – Can AI help scale up? International Journal of Artificial Intelligence in Education, 27(4), 791–818.
Konstantopoulos, S. (2011). Fixed effects and variance components estimation in three-level meta-analysis. Research Synthesis Methods, 2(1), 61–76.
Koo, T. K., & Li, M. Y. (2016). A guideline of selecting and reporting intraclass correlation coefficients for reliability research. Journal of Chiropractic Medicine, 15(2), 155–163.
Landis, J. R., & Koch, G. G. (1977). The measurement of observer agreement for categorical data. Biometrics, 159–174.
Leacock, C., Messineo, D., & Zhang, X. (2013). Issues in prompt selection for automated scoring of short answer questions. Paper presented at the annual conference of the National Council on Measurement in Education, San Francisco, CA.
Lee, H.-S., Pallant, A., Pryputniewicz, S., Lord, T., Mulholland, M., & Liu, O. L. (2019). Automated text scoring and real-time adjustable feedback: Supporting revision of scientific arguments involving uncertainty. Science Education, 103(3), 590–622.
*Lintean, M., Rus, V., & Azevedo, R. (2012). Automatic detection of student mental models based on natural language student input during metacognitive skill training. International Journal of Artificial Intelligence in Education, 21(3), 169–190.
Liu, O. L., Brew, C., Blackmore, J., Gerard, L., Madhok, J., & Linn, M. C. (2014). Automated scoring of constructed-response science items: Prospects and obstacles. Educational Measurement: Issues and Practice, 33(2), 19–28.
Liu, O. L., Rios, J. A., Heilman, M., Gerard, L., & Linn, M. C. (2016). Validation of automated scoring of science assessments. Journal of Research in Science Teaching, 53(2), 215–233.
*Lottridge, S., Wood, S., & Shaw, D. (2018). The effectiveness of machine score-ability ratings in predicting automated scoring performance. Applied Measurement in Education, 31(3), 215–232.
*Mao, L., Liu, O. L., Roohr, K., Belur, V., Mulholland, M., Lee, H.-S., & Pallant, A. (2018). Validation of automated scoring for a formative assessment that employs scientific argumentation. Educational Assessment, 23(2), 121–138.
*Mason, R. A., & Just, M. A. (2016). Neural representations of physics concepts. Psychological Science, 27(6), 904–913.
McGraw-Hill Education CTB. (2014). Smarter Balanced Assessment Consortium field test: Automated scoring research studies (in accordance with Smarter Balanced RFP 17).
*Moharreri, K., Ha, M., & Nehm, R. H. (2014). EvoGrader: An online formative assessment tool for automatically evaluating written evolutionary explanations. Evolution: Education and Outreach, 7(1), 15.
Mohri, M., Rostamizadeh, A., & Talwalkar, A. (2018). Foundations of machine learning. MIT Press.
Montalvo, O., Baker, R. S., Sao Pedro, M. A., Nakama, A., & Gobert, J. D. (2010). Identifying students' inquiry planning using machine learning. Paper presented at Educational Data Mining 2010.
*Muldner, K., Burleson, W., Van de Sande, B., & VanLehn, K. (2011). An analysis of students' gaming behaviors in an intelligent tutoring system: Predictors and impacts. User Modeling and User-Adapted Interaction, 21(1–2), 99–135.
Nakamura, C. M., Murphy, S. K., Christel, M. G., Stevens, S. M., & Zollman, D. A. (2016). Automated analysis of short responses in an interactive synthetic tutoring system for introductory physics. Physical Review Physics Education Research, 12(1), 010122.
National Research Council. (2012). A Framework for K-12 Science Education: Practices, Crosscutting Concepts, and Core Ideas. Committee on a Conceptual Framework for New K-12 Science Education Standards. Board on Science Education, Division of Behavioral and Social Sciences and Education. Washington, DC: The National Academies Press.
National Research Council. (2014). Developing Assessments for the Next Generation Science Standards. Committee on Developing Assessments of Science Proficiency in K-12. Board on Testing and Assessment and Board on Science Education, J.W. Pellegrino, M.R. Wilson, J.A. Koenig, and A.S. Beatty, Editors. Division of Behavioral and Social Sciences and Education. Washington, DC: The National Academies Press.
*Nehm, R. H., Ha, M., & Mayfield, E. (2012). Transforming biology assessment with machine learning: Automated scoring of written evolutionary explanations. Journal of Science Education and Technology, 21(1), 183–196.
*Nehm, R. H., & Haertig, H. (2012). Human vs. computer diagnosis of students' natural selection knowledge: Testing the efficacy of text analytic software. Journal of Science Education and Technology, 21(1), 56–73.
NGSS Lead States. (2013). Next Generation Science Standards: For States, By States. Washington, DC: The National Academies Press.
*Okoye, I., Sumner, T., & Bethard, S. (2013). Automatic extraction of core learning goals and generation of pedagogical sequences through a collection of digital library resources. Paper presented at the 13th ACM/IEEE-CS Joint Conference on Digital Libraries.
*Okoye, I. U. (2015). Building an educational recommender system based on conceptual change learning theory to improve students' understanding of science concepts (Doctoral dissertation, AAI3704786). University of Colorado at Boulder.
*Opfer, J. E., Nehm, R. H., & Ha, M. (2012). Cognitive foundations for science assessment design: Knowing what students know about evolution. Journal of Research in Science Teaching, 49(6), 744–777.
Parsons, S. (2016). Authenticity in Virtual Reality for assessment and intervention in autism: A conceptual review. Educational Research Review, 19, 138–157.
Powers, D. M. (2011). Evaluation: From precision, recall and F-measure to ROC, informedness, markedness and correlation. Journal of Machine Learning Technologies, 2(1), 37–63.
Rosenthal, R. (1979). The file drawer problem and tolerance for null results. Psychological Bulletin, 86(3), 638–641.
Rothstein, H. R. (2008). Publication bias as a threat to the validity of meta-analytic results. Journal of Experimental Criminology, 4(1), 61–81.
*Ryoo, K., & Linn, M. C. (2016). Designing automated guidance for concept diagrams in inquiry instruction. Journal of Research in Science Teaching, 53(7), 1003–1035.
Samuel, A. L. (1959). Some studies in machine learning using the game of checkers. IBM Journal of Research and Development, 3(3), 210–229.
*Sao Pedro, M., Baker, R. S., Montalvo, O., Nakama, A., & Gobert, J. D. (2010). Using text replay tagging to produce detectors of systematic experimentation behavior patterns. Paper presented at Educational Data Mining 2010.
*Shermis, M. D. (2015). Contrasting state-of-the-art in the machine scoring of short-form constructed responses. Educational Assessment, 20(1), 46–65.
*Shermis, M. D., & Burstein, J. C. (2003). Automated essay scoring: A cross-disciplinary perspective. Routledge.
Zurück zum Zitat *Steele, M. M., Merrill, J., Haudek, K., & Urban-Lurain, M. (2016). The development of constructed response astronomy assessment items. Paper presented at the National Association for Research in Science Teaching (NARST), Baltimore, MD. *Steele, M. M., Merrill, J., Haudek, K., & Urban-Lurain, M. (2016). The development of constructed response astronomy assessment items. Paper presented at the National Association for Research in Science Teaching (NARST), Baltimore, MD.
Zurück zum Zitat Sun, S. (2011). Meta-analysis of Cohen’s kappa. Health Services and Outcomes Research Methodology, 11(3–4), 145–163. Sun, S. (2011). Meta-analysis of Cohen’s kappa. Health Services and Outcomes Research Methodology, 11(3–4), 145–163.
Zurück zum Zitat *Tansomboon, C., Gerard, L. F., Vitale, J. M., & Linn, M. C. (2017). Designing automated guidance to promote the productive revision of science explanations. International Journal of Artificial Intelligence in Education, 27(4), 729–757. *Tansomboon, C., Gerard, L. F., Vitale, J. M., & Linn, M. C. (2017). Designing automated guidance to promote the productive revision of science explanations. International Journal of Artificial Intelligence in Education, 27(4), 729–757.
Zurück zum Zitat Tufféry, S. (2011). Data mining and statistics for decision making. John Wiley & Sons. Tufféry, S. (2011). Data mining and statistics for decision making. John Wiley & Sons.
Zurück zum Zitat Viechtbauer, W. (2010). Conducting meta-analyses in R with the metafor package. Journal of statistical software, 36(3), 1–48. Viechtbauer, W. (2010). Conducting meta-analyses in R with the metafor package. Journal of statistical software, 36(3), 1–48.
Zurück zum Zitat *Vitale, J., Lai, K., & Linn, M. (2015). Taking advantage of automated assessment of student-constructed graphs in science. Journal of Research in Science Teaching, 52(10), 1426–1450. *Vitale, J., Lai, K., & Linn, M. (2015). Taking advantage of automated assessment of student-constructed graphs in science. Journal of Research in Science Teaching, 52(10), 1426–1450.
Zurück zum Zitat *Wang, H. C., Chang, C. Y., & Li, T. Y. (2008). Assessing creative problem-solving with automated text grading. Computers & Education, 51(4), 1450–1466. *Wang, H. C., Chang, C. Y., & Li, T. Y. (2008). Assessing creative problem-solving with automated text grading. Computers & Education, 51(4), 1450–1466.
Zurück zum Zitat Wiley, J., Hastings, P., Blaum, D., Jaeger, A. J., Hughes, S., & Wallace, P. (2017). Different approaches to assessing the quality of explanations following a multiple-document inquiry activity in science. International Journal of Artificial Intelligence in Education, 27(4), 758–790. Wiley, J., Hastings, P., Blaum, D., Jaeger, A. J., Hughes, S., & Wallace, P. (2017). Different approaches to assessing the quality of explanations following a multiple-document inquiry activity in science. International Journal of Artificial Intelligence in Education, 27(4), 758–790.
Zurück zum Zitat Williamson, D. M., Xi, X., & Breyer, F. J. (2012). A framework for evaluation and use of automated scoring. Educational measurement: issues and practice, 31(1), 2–13. Williamson, D. M., Xi, X., & Breyer, F. J. (2012). A framework for evaluation and use of automated scoring. Educational measurement: issues and practice, 31(1), 2–13.
Zurück zum Zitat *Yan, J. (2014). A computer-based approach for identifying student conceptual change (Unpublished master’s dissertation). West Lafayette, Indiana: Purdue University. *Yan, J. (2014). A computer-based approach for identifying student conceptual change (Unpublished master’s dissertation). West Lafayette, Indiana: Purdue University.
Zurück zum Zitat Yeh, S. S. (2009). Class size reduction or rapid formative assessment?: A comparison of cost-effectiveness. Educational Research Review, 4(1), 7–15. Yeh, S. S. (2009). Class size reduction or rapid formative assessment?: A comparison of cost-effectiveness. Educational Research Review, 4(1), 7–15.
Zurück zum Zitat *Yoo, J., & Kim, J. (2014). Can online discussion participation predict group project performance? Investigating the roles of linguistic features and participation patterns. International Journal of Artificial Intelligence in Education, 24(1), 8–32. *Yoo, J., & Kim, J. (2014). Can online discussion participation predict group project performance? Investigating the roles of linguistic features and participation patterns. International Journal of Artificial Intelligence in Education, 24(1), 8–32.
Zurück zum Zitat *Zehner, F., Sälzer, C., & Goldhammer, F. (2016). Automatic coding of short text responses via clustering in educational assessment. Educational and psychological measurement, 76(2), 280–303. *Zehner, F., Sälzer, C., & Goldhammer, F. (2016). Automatic coding of short text responses via clustering in educational assessment. Educational and psychological measurement, 76(2), 280–303.
Zhai, X. (in press). Advancing automatic guidance in virtual science inquiry: From ease of use to personalization. Educational Technology Research and Development.
Zhai, X., Yin, Y., Pellegrino, J., Haudek, K., & Shi, L. (2020b). Applying machine learning in science assessment: A systematic review. Studies in Science Education, 56(1), 111–151.
Zhu, M., Lee, H. S., Wang, T., Liu, O. L., Belur, V., & Pallant, A. (2017). Investigating the impact of automated feedback on students' scientific argumentation. International Journal of Science Education, 39(12), 1648–1668.
Zhu, M., Liu, O. L., & Lee, H. S. (2020). The effect of automated feedback on revision behavior and learning gains in formative assessment of scientific argument writing. Computers & Education, 143, 103668.
Metadata
Title
A Meta-Analysis of Machine Learning-Based Science Assessments: Factors Impacting Machine-Human Score Agreements
Authors
Xiaoming Zhai
Lehong Shi
Ross H. Nehm
Publication date
19.11.2020
Publisher
Springer Netherlands
Published in
Journal of Science Education and Technology / Issue 3/2021
Print ISSN: 1059-0145
Electronic ISSN: 1573-1839
DOI
https://doi.org/10.1007/s10956-020-09875-z
