When Rater Reliability Is Not Enough: Teacher Observation Systems and a Case for the Generalizability Study

Authors
Heather C. Hill,
Charalambos Y. Charalambous,
Matthew A. Kraft
Year of publication
2012
Publication
Educational Researcher
Volume/Issue
41(2)
Pages
56–64
In recent years, interest has grown in using classroom observation as a means to several ends, including teacher development, teacher evaluation, and impact evaluation of classroom-based interventions. Although education practitioners and researchers have developed numerous observational instruments for these purposes, many developers fail to specify important criteria regarding instrument use. In this article, the authors argue that for classroom observation to succeed in its aims, improved observational systems must be developed. These systems should include not only observational instruments but also scoring designs capable of producing reliable and cost-efficient scores, as well as processes for rater recruitment, training, and certification. To illustrate how such a system might be developed and improved, the authors provide an empirical example that applies generalizability theory to data from a mathematics observational instrument.
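For readers unfamiliar with generalizability theory, the sketch below shows the general idea behind a G-study and the follow-up decision (D) study used to compare scoring designs. It assumes the simplest fully crossed design, in which every teacher is scored by every rater once, and estimates variance components from ANOVA mean squares. This is not the authors' design or analysis: the function names `g_study_p_by_r` and `d_study` and the toy score matrix are hypothetical and made up for illustration only.

```python
import numpy as np


def g_study_p_by_r(scores):
    """Estimate variance components for a crossed teacher (p) x rater (r) design.

    Rows are teachers, columns are raters, one score per cell (hypothetical layout).
    """
    x = np.asarray(scores, dtype=float)
    n_p, n_r = x.shape
    grand = x.mean()
    p_means = x.mean(axis=1)  # mean score per teacher
    r_means = x.mean(axis=0)  # mean score per rater

    # Sums of squares for the two main effects; the residual holds p x r interaction + error.
    ss_p = n_r * np.sum((p_means - grand) ** 2)
    ss_r = n_p * np.sum((r_means - grand) ** 2)
    ss_tot = np.sum((x - grand) ** 2)
    ss_res = ss_tot - ss_p - ss_r

    ms_p = ss_p / (n_p - 1)
    ms_r = ss_r / (n_r - 1)
    ms_res = ss_res / ((n_p - 1) * (n_r - 1))

    # Solve the expected-mean-square equations; negative estimates are set to 0 by convention.
    return {
        "p": max((ms_p - ms_res) / n_r, 0.0),   # true differences among teachers
        "r": max((ms_r - ms_res) / n_p, 0.0),   # rater leniency/severity
        "pr,e": ms_res,                          # teacher x rater interaction + residual error
    }


def d_study(components, n_raters):
    """Return (relative E(rho^2), absolute Phi) if each teacher's score averages n_raters ratings."""
    var_p = components["p"]
    rel_err = components["pr,e"] / n_raters
    abs_err = (components["r"] + components["pr,e"]) / n_raters
    return var_p / (var_p + rel_err), var_p / (var_p + abs_err)


# Made-up scores: 5 teachers (rows) x 3 raters (columns).
toy_scores = [
    [3, 4, 3],
    [2, 2, 1],
    [4, 4, 3],
    [1, 2, 1],
    [3, 3, 2],
]
components = g_study_p_by_r(toy_scores)
for n in (1, 2, 3, 4):
    rel, phi = d_study(components, n)
    print(f"{n} rater(s): E(rho^2) = {rel:.3f}, Phi = {phi:.3f}")
```

The relative coefficient treats only the teacher-by-rater interaction as error, which suits rank-ordering teachers; the absolute coefficient also counts rater severity as error, which matters when scores are compared against a fixed cut point. A real observation system would add further facets (for example, lessons or occasions), which this two-way sketch does not model.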

Suggested Citation:

Hill, H. C., Charalambous, C. Y., & Kraft, M. A. (2012). When rater reliability is not enough: Teacher observation systems and a case for the generalizability study. Educational Researcher, 41(2), 56–64.