Study Tests Using Teacher Observation Data for Evaluation of EPPs

A new study finds that observational ratings of beginning teachers may offer a viable alternative, or a useful complement, to relying solely on controversial “value-added” modeling (VAM) to evaluate educator preparation providers (EPPs).

The study, by Matthew Ronfeldt and Shanyce Campbell of the University of Michigan School of Education, is published in the journal Educational Evaluation and Policy Analysis and is now available online.

The authors describe it as the first study to investigate the use of teachers’ observational ratings to evaluate their preparation programs and institutions, and its results are compelling.

“The demands for teacher preparation accountability continue to grow, from the proposed federal regulations to new accreditation standards,” said Ronfeldt, who was also the 2016 recipient of AACTE’s Outstanding Journal of Teacher Education Article Award. “We sorely need better ways to assess program quality. Although VAM makes an important contribution to our understanding of program outcomes, we likely need multiple measures to capture something as complex as preparation quality. We are excited to find that teacher observational ratings could be a viable supplement.”

Some potential advantages of observational ratings are that they are available for most teachers, are direct measures of instruction, and can provide useful feedback to teachers and programs about performance in specific areas of instruction. Currently, VAM is widely used to measure teachers’ impact on student achievement, but the evidence on its reliability is mixed. Moreover, the method’s dependence on standardized test scores has faced especially heavy criticism in the current backlash against overtesting. Extending this model to judge the effectiveness of teachers’ EPPs has also been widely criticized for unfairly penalizing programs that send teachers into low-performing schools and for failing to capture the performance of graduates teaching in nontested subjects and grade levels, among other concerns.

Tennessee is the birthplace of VAM, which originated in the Tennessee Value-Added Assessment System (TVAAS), developed in the 1990s and still in use today. The state also employs observational rubrics that collect numerical ratings for each teacher, usually several times per year, across a number of domains and indicators. Ronfeldt and Campbell drew on both statewide data sources for their study, identifying nearly 9,500 teachers who completed in-state preparation programs between 2009-10 and 2012-13 and were employed in more than 1,500 Tennessee public schools during the 2011-12 through 2013-14 academic years. These teachers came from 183 programs at 44 different providers around the state.

Venturing into uncharted territory by connecting teachers’ observational ratings to their specific preparation programs, Ronfeldt and Campbell first investigated which modeling approach would work best. They found that using simple averages would likely penalize programs that supply teachers to schools serving more historically marginalized students, because raw averages conflate program quality with the school contexts in which graduates teach. All of the regression approaches they tested were preferable to simple averages, though they argue that “school fixed effects” and hierarchical linear models do a better job of adjusting for differences between schools than ordinary least squares models. The authors are careful to note that a few programs in the study received very different rankings under the different models (as disparate as the top quartile in one and the bottom quartile in another), making the choice of model profoundly significant for at least that small percentage of programs.
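To make the modeling distinction concrete, here is a minimal sketch, in Python, of the two simplest approaches on toy data: raw program averages versus an ordinary least squares regression with school fixed effects. This is an illustration only, not the authors’ code, and the data and column names are hypothetical.

    import pandas as pd
    import statsmodels.formula.api as smf

    # Toy data: one row per teacher, with the preparation program,
    # the school of employment, and an average observation rating.
    df = pd.DataFrame({
        "program": ["A", "A", "B", "B", "C", "C"],
        "school":  ["s1", "s2", "s1", "s3", "s2", "s3"],
        "rating":  [3.1, 3.4, 2.8, 3.9, 3.0, 3.6],
    })

    # Approach 1: simple program averages. These conflate program
    # quality with school context, penalizing programs whose graduates
    # teach in schools where ratings run lower overall.
    print(df.groupby("program")["rating"].mean())

    # Approach 2: OLS with school fixed effects. Program coefficients
    # are estimated from within-school comparisons, so school-level
    # differences in ratings are adjusted away.
    fe = smf.ols("rating ~ C(program) + C(school)", data=df).fit()
    print(fe.params[fe.params.index.str.contains("program")])

The study’s preferred approaches, school fixed effects and hierarchical linear models, pursue this same adjustment with different assumptions about how school-level variation should be handled.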

In the end, the authors found that observational ratings of EPP graduates did indeed reveal significant and meaningful differences among providers and programs. The difference in average ratings between graduates of top-quartile and bottom-quartile programs was roughly equivalent to the effect of a full year of teaching experience. Additionally, more than half of the programs in the study were ranked in a different quartile than their host institution, suggesting that assessing institutions alone may not be sufficient; doing so would risk, for example, penalizing a high-performing program within a low-performing institution.

In addition, assessing providers by their graduates’ observational ratings produced results moderately similar to evaluations based on those graduates’ student achievement gains as measured by TVAAS. For some programs, though, the choice of outcome measure (observation ratings vs. VAM scores) made a significant difference in their rankings.

If these measures are used for high-stakes policy purposes such as program approval and accreditation, Ronfeldt and Campbell suggest combining observation-based and VAM-based approaches as complementary “checks and balances” for capturing program outcomes, alongside other indicators such as graduates’ retention data. Just as teacher evaluation needs multiple measures to capture the varied dimensions of teacher quality, so must assessment of EPP quality reflect the complexity of the enterprise.

