When Rater Reliability Is Not Enough: Teacher Observation Systems and a Case for the Generalizability Study

Authors

Heather C. Hill, Charalambos Y. Charalambous, Matthew A. Kraft

Year of publication

2012

Publication

Educational Researcher

Volume/Issue

41(2)

Pages

56–64

https://doi.org/10.3102/0013189X12437203

In recent years, interest has grown in using classroom observation as a means to several ends, including teacher development, teacher evaluation, and impact evaluation of classroom-based interventions. Although education practitioners and researchers have developed numerous observational instruments for these purposes, many developers fail to specify important criteria regarding instrument use. In this article, the authors argue that for classroom observation to succeed in its aims, improved observational systems must be developed. These systems should include not only observational instruments but also scoring designs capable of producing reliable and cost-efficient scores and processes for rater recruitment, training, and certification. To illustrate how such a system might be developed and improved, the authors provide an empirical example that applies generalizability theory to data from a mathematics observational instrument.

Educator Preparation and Development

Center topics

How educators learn and develop throughout the career

Center

Center on the Study of Educators
