
Generalizability of Multidimensional Student Ratings of University Instruction Across Courses and Teachers

Research in Higher Education

Abstract

Course quality is multifaceted, being determined by the instructor, the students, and external conditions. Consequently, any attempt at measurement should reflect this diversity, so that stable evaluations can be made that capture both personal (instructor) and situational (student and external conditions) variables. This study extends previous research by examining the stability of both sets of variables across different courses, student populations, and universities. In addition, the sample (N = 692 courses) was drawn from six traditional and technical German universities, which have a different ethos of student interaction with academic staff than universities in many other Western countries. Using the Heidelberg Inventory, it was found that instructor variables were reliable across courses given by the same instructor, whereas student scales and background variables were less consistent across courses in which the content was identical. It was concluded that the instrument is both reliable and valid for student evaluations of both teaching performance and course quality within a European context.
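
For the stability question raised here (are mean ratings a property of the instructor or of the particular course?), one standard quantification is an intraclass correlation of the kind described by Shrout and Fleiss (1979), cited in the reference list below. The following Python sketch is purely illustrative and is not the authors' analysis or data: it computes a one-way random-effects ICC(1) for hypothetical instructor-level mean ratings obtained in two different courses per instructor; all rating values are invented.

```python
# Illustrative sketch only: a one-way random-effects intraclass correlation,
# ICC(1), as one way to quantify how consistent instructor-level mean ratings
# are across different courses taught by the same instructor.
# The data below are invented for demonstration; they are not from the study.
import numpy as np

def icc1(scores: np.ndarray) -> float:
    """ICC(1) for an (instructors x courses) matrix of course-mean ratings."""
    n, k = scores.shape                     # n instructors, k courses each
    grand_mean = scores.mean()
    instructor_means = scores.mean(axis=1)
    # One-way ANOVA mean squares: between instructors and within instructors.
    ms_between = k * np.sum((instructor_means - grand_mean) ** 2) / (n - 1)
    ms_within = np.sum((scores - instructor_means[:, None]) ** 2) / (n * (k - 1))
    return (ms_between - ms_within) / (ms_between + (k - 1) * ms_within)

# Hypothetical course-mean ratings (1-5 scale): five instructors, two courses each.
ratings = np.array([
    [4.2, 4.0],
    [3.1, 3.4],
    [4.8, 4.6],
    [2.9, 3.2],
    [3.7, 3.9],
])
print(f"ICC(1) across courses: {icc1(ratings):.2f}")  # high value = stable ratings
```

An ICC close to 1 would indicate that mean ratings generalize across courses and thus characterize the instructor rather than the specific course; a value near 0 would indicate the opposite.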


REFERENCES

  • Asendorpf, J., and Wallbott, H. G. (1979). Maße der Beobachterübereinstimmung: ein systematischer Vergleich [Measures of observer agreement: a systematic comparison]. Zeitschrift für Sozialpsychologie 10: 243–252.

  • Bausell, R., Schwartz, S., and Purohit, A. (1975). An examination of the conditions under which various student rating parameters replicate across time. Journal of Educational Measurement 12: 273–280.

  • Biggs, J. B. (1990). Asian students' approaches to learning: implications for teaching overseas students. Keynote discussion paper. Proceedings of the 8th Australasian Tertiary Learning Skills and Language Conference (pp. 1–51). Brisbane: Q.U.T.

  • Bortz, J., and Döring, N. (1995). Forschungsmethoden und Evaluation [Research methods and evaluation]. Berlin: Springer.

  • Bortz, J., Lienert, G. A., and Boehnke, K. (1990). Verteilungsfreie Methoden in der Biostatistik [Distribution-free methods in biostatistics]. Berlin: Springer.

  • Cashin, W. E. (1990). Student ratings of teaching: recommendations for use. Manhattan, KS: Center for Faculty Evaluation and Development (IDEA No. 22).

  • Cashin, W. E. (1996). Developing an effective faculty evaluation system. Manhattan, KS: Center for Faculty Evaluation and Development (IDEA No. 33).

  • Cronbach, L. J., Gleser, G. C., Nanda, H., and Rajaratnam, N. (1972). The Dependability of Behavioral Measurements: Theory of Generalizability for Scores and Profiles. New York: John Wiley.

  • Cruickshank, D. R., and Kennedy, J. J. (1986). Teacher clarity. Teaching & Teacher Education 2: 43–67.

  • Daniel, H.-D. (1996). Evaluierung der universitären Lehre durch Studenten und Absolventen [Evaluation of university teaching through students and graduates]. Zeitschrift für Sozialisationsforschung und Erziehungssoziologie 16: 149–164.

  • Dunkin, M. J., and Barnes, J. (1986). Research on teaching in higher education. In M. C. Wittrock (ed.), Handbook of research on teaching, pp. 754–777. New York: Macmillan.

  • Emmer, E. T., Evertson, C. M., and Brophy, J. E. (1979). Stability of teacher effects in junior high classrooms. American Educational Research Journal 16: 71–75.

  • Feger, H. (1992). Vergleichende Bewertung von Lehrveranstaltungen—Anmerkungen zur Methodik [Comparative evaluation of courses—notes on methodology]. In D. Grühn and H. Gattwinkel (eds.), Evaluation von Lehrveranstaltungen. Überfrachtung eines sinnvollen Instrumentes? [Evaluation of courses. The overloading of a meaningful instrument?], pp. 127–142. Berlin: FU-Dokumentationsreihe.

  • Feldman, K. A. (1977). Consistency and variability among college students in rating their teachers and courses: a review and analysis. Research in Higher Education 6: 223–274.

  • Feldman, K. A. (1978). Course characteristics and college students' ratings of their teachers: What we know and what we don't. Research in Higher Education 9: 199–242.

  • Frey, P. W. (1978). A two-dimensional analysis of student ratings of instruction. Research in Higher Education 9: 69–91.

  • Gillmore, G. M. (1977). How large is the course effect? A note on Romney's course effect vs. teacher effect on students' ratings of teacher competence. Research in Higher Education 7: 187–189.

  • Gillmore, G. M., Kane, M. T., and Naccarato, R. W. (1978). The generalizability of student ratings of instruction: estimation of the teacher and course components. Journal of Educational Measurement 15: 1–13.

  • Greenwald, A. G. (1997). Validity concerns and usefulness of student ratings of instruction. American Psychologist 52: 1182–1186.

  • Hage, N. el (1996). Lehrevaluation und studentische Veranstaltungskritik [Teaching evaluation and students' course criticism]. Bonn: Bundesministerium für Bildung, Wissenschaft, Forschung, und Technologie.

  • Hanges, P. J., Schneider, B., and Niles, K. (1990). Stability of performance: an interactionist perspective. Journal of Applied Psychology 75: 658–667.

  • Hogan, T. P. (1973). Similarity of student ratings across instructors, courses, and time. Research in Higher Education 1: 149–154.

  • Holloway, S. D. (1988). Concepts of ability and effort in Japan and the United States. Review of Educational Research 58: 327–345.

  • Kane, M. T., Gillmore, G. M., and Crooks, T. J. (1976). Student evaluations of teaching: the generalizability of class means. Journal of Educational Measurement 13(3): 171–183.

  • Lienert, G. A., and Raatz, U. (1994). Testaufbau und Testanalyse [Test construction and test analysis]. Weinheim: Beltz.

  • Marsh, H. W. (1982). The use of path analysis to estimate teacher and course effects in student ratings of instructional effectiveness. Applied Psychological Measurement 6: 47–60.

  • Marsh, H. W. (1983). Multidimensional ratings of teaching effectiveness by students from different academic settings and their relation to student/course/instructor characteristics. Journal of Educational Psychology 75: 150–166.

  • Marsh, H. W., and Bailey, H. W. (1993). Multidimensional students' evaluations of teaching effectiveness: A profile analysis. Journal of Higher Education 64: 1–18.

  • Marsh, H. W., and Roche, L. A. (1997). Making students' evaluations of teaching effectiveness effective. American Psychologist 52: 1187–1197.

  • McKeachie, W. J. (1997). Student ratings. American Psychologist 52: 1218–1225.

  • Meredith, G. M. (1975). Structure of student-based evaluation ratings. Journal of Psychology 91: 3–9.

  • Murray, H. G., Rushton, J. Ph., and Paunonen, S. V. (1990). Teacher personality traits and student instructional ratings in six types of university courses. Journal of Educational Psychology 82: 250–261.

  • Preißer, R. (1993). Abschlußbericht zur ersten Phase des Studienreformprojekts “Evaluation der Lehre” an der Technischen Universität Berlin [Final report on the first phase of the educational reform project “Evaluation of Teaching” at the Technical University of Berlin]. Berlin: TU-Bericht.

  • Rindermann, H. (1996a). Untersuchungen zur Brauchbarkeit studentischer Lehrevaluationen [Investigation into the usefulness of student course evaluations]. Landau: Empirische Pädagogik.

  • Rindermann, H. (1996b). Zur Qualität studentischer Lehrveranstaltungsevaluationen: Eine Antwort auf Kritik an der Lehrevaluation [Quality of student course evaluations: An answer to criticism of the teaching evaluations]. Zeitschrift für Pädagogische Psychologie 10: 129–145.

  • Rindermann, H. (1997). Die studentische Beurteilung von Lehrveranstaltungen: Forschungsstand und Implikationen für den Einsatz von Lehrevaluationen [The student judgement of courses: The state of the current research and implications for the use of teaching evaluations]. Tests und Trends (Jahrbuch der Pädagogischen Diagnostik 11), pp. 12–53. Weinheim: Beltz.

  • Rindermann, H. (1998). Übereinstimmung und Divergenz bei der studentischen Beurteilung von Lehrveranstaltungen: Methoden zu ihrer Berechnung und Konsequenzen für die Lehrevaluation [Agreement and divergence in the student judgement of courses: Methods of computation and consequences for teaching evaluation]. Zeitschrift für Differentielle und Diagnostische Psychologie 19: 73–92.

  • Rindermann, H. (1999). Bedingungs- und Effektvariablen in der Lehrevaluationsforschung [Condition and effect variables in research on the evaluation of teaching]. Unterrichtswissenschaft 27: 357–380.

  • Rindermann, H., and Amelang, M. (1994). Das Heidelberger Inventar zur Lehrveranstaltungs-Evaluation (HILVE). Handanweisung [The Heidelberg Inventory for Course Evaluation (HILVE): Manual]. Heidelberg: Asanger.

  • Romney, D. (1976). Course effect vs. teacher effect on students' ratings of teaching competence. Research in Higher Education 5: 345–350.

  • Rosenthal, R. (1987). Judgment Studies. Design, Analysis, and Meta-Analysis. Cambridge: Cambridge University Press.

  • Rosenthal, R. (1991). Some indices of the reliability of peer review. Behavioral and Brain Sciences 14: 160–161.

  • Seiler, L. H., Weybright, L. D., and Stang, D. J. (1977). How useful are published evaluations ratings to students selecting courses and instructors? Teaching of Psychology 4: 174–177.

  • Shavelson, R., and Russo, N. A. (1977). Generalizability of measures of teacher effectiveness. Educational Research 19: 171–183.

  • Shrout, P. E., and Fleiss, J. L. (1979). Intraclass correlations: uses in assessing rater reliability. Psychological Bulletin 86: 420–428.

  • Süllwold, F. (1992). Welche Realität wird bei der Beurteilung von Hochschullehrern durch Studierende erfaßt? [What reality is expressed through the judgement of university instructors by students?] Mitteilungen des Hochschulverbandes 40: 34–35.

  • Terry, R. L., and McIntosh, D. E. (1988). Do students' expectancies affect their course evaluations? Educational and Psychological Measurement 48: 787–798.


Cite this article

Rindermann, H., Schofield, N. Generalizability of Multidimensional Student Ratings of University Instruction Across Courses and Teachers. Research in Higher Education 42, 377–399 (2001). https://doi.org/10.1023/A:1011050724796
