The online version of this article (https://doi.org/10.1007/s11092-018-9285-5) contains supplementary material, which is available to authorized users.
This study examines the association between two measures of teaching effectiveness, a student survey measure and a classroom observation measure, to determine whether their correlation depends on the study design. The sample includes 160 classroom observations of 56 teachers across 15 classes, in which students also rated the teachers with a survey. Dependencies are examined using generalizability theory. Results suggest that the correlation between the survey and observation measures depends on the number of classroom observations, the number of student ratings, and whether the designs are nested or partially nested. The effect is substantial: predicted correlations range from 0.10 to 0.80 for the same variables under different study designs. The number of classroom observations has a particularly notable influence: across all investigated scenarios, the correlation doubles when observers visit three lessons instead of one. Correlations also tend to be positively biased when research designs are nested.
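To make the design dependence concrete, the following is a minimal sketch of why averaging over more lessons or more student ratings raises the expected correlation between the two measures. It assumes a simplified one-facet design for each measure and combines a generalizability-style coefficient with Spearman's attenuation formula. The function names, the variance components, and the true correlation are hypothetical illustration values, not estimates from the study, and the sketch does not model nesting, so it is not expected to reproduce the magnitudes reported above.

```python
import math


def g_coefficient(var_teacher: float, var_residual: float, n_conditions: int) -> float:
    """Coefficient for a teacher's mean score over n_conditions
    (lessons observed or students rating) in a one-facet design."""
    return var_teacher / (var_teacher + var_residual / n_conditions)


def expected_correlation(true_corr: float, rel_a: float, rel_b: float) -> float:
    """Spearman's attenuation formula: observed r = true r * sqrt(rel_a * rel_b)."""
    return true_corr * math.sqrt(rel_a * rel_b)


# Hypothetical values chosen for illustration only (NOT taken from the study).
TRUE_CORR = 0.80
OBS_VAR_TEACHER, OBS_VAR_RESIDUAL = 0.30, 0.70        # classroom observation measure
SURVEY_VAR_TEACHER, SURVEY_VAR_RESIDUAL = 0.20, 0.80  # student survey measure


def design_correlation(n_lessons: int, n_students: int) -> float:
    """Expected survey-observation correlation for a given study design."""
    rel_obs = g_coefficient(OBS_VAR_TEACHER, OBS_VAR_RESIDUAL, n_lessons)
    rel_survey = g_coefficient(SURVEY_VAR_TEACHER, SURVEY_VAR_RESIDUAL, n_students)
    return expected_correlation(TRUE_CORR, rel_obs, rel_survey)


# Compare a few designs: more lessons or more raters gives a higher expected correlation.
for n_lessons in (1, 3):
    for n_students in (5, 25):
        r = design_correlation(n_lessons, n_students)
        print(f"lessons={n_lessons}, students={n_students}: expected r = {r:.2f}")
```

Under these assumptions the same pair of variables yields different expected correlations purely because the number of lessons and raters changes the reliability of each teacher-level mean.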
Title: On the “association between two things”: the case of student surveys and classroom observations of teaching quality
Author: Rikkert M. van der Lans
Publisher: Springer Netherlands
Journal: Educational Assessment, Evaluation and Accountability
Print ISSN: 1874-8597
Electronic ISSN: 1874-8600