“In other words, when you hold people accountable using a numerical measure–vehicle emissions, scores on a test, whatever–two things generally happen: they do things you don’t want them to do, and the measure itself becomes inflated, painting too optimistic a view of whatever it is that the system is supposed to measure.” (Koretz, page 38.)
Score inflation is not a problem limited to education. Koretz takes pains to demonstrate how it permeates any endeavor where too much focus is placed upon a score. He describes how British hospital emergency rooms were scored on how quickly they saw newly arriving patients, but the response was anything but good. Some hospitals hit their target by queueing the ambulances in the parking lot and not allowing patients to be brought in until staff could see them!
In other cases, hospitals were scored on how long it took to admit ER patients to the building. They got around that by declaring a gurney in the hall a hospital bed, so patients lined the hallways because no rooms were available.
We haven’t been immune to that in the U.S. When New York began publishing mortality rates for cardiac patients, many physicians refused to treat the most seriously endangered patients.
When the measure becomes the target, it distorts the behavior of the people involved and produces undesired outcomes.
This is known as Campbell’s Law and it rears up anywhere numerical measures are used to judge the performance of people. Decision-making inevitably skews towards whatever will make the number better.
So it is in education. Koretz shows how the results of high-stakes tests do not match those of low-stakes tests. He demonstrates how the dramatic improvement in 8th grade math scores on the New York state test is not confirmed by NAEP, which showed little improvement in the same grade. New York is not alone. The phenomenon of state tests showing increasing gains while low-stakes tests do not is widespread and has been going on for decades.
What makes the difference? In the case of New York, the one area where NAEP showed increases was algebraic thinking. Not coincidentally, that is the area heavily emphasized on the New York state test. That is the area, therefore, that teachers focused on, to the detriment of the rest of the mathematical content.
Koretz documents the reluctance of states and school districts to allow researchers to study this. It seems they simply do not want to find evidence that the proclaimed success stories, the gains in learning, are an illusion.
Why do scores inflate? Whenever a new test is introduced, scores drop. But as time goes on, scores move upward as teachers become more familiar with what is being tested and how it is being tested. As a result, they are better able to predict what students will see and prepare them to be successful.
The predictability of the test items is a large factor in inflating scores. Teachers can ignore what is not being tested; teachers can give students strategies to recognize test questions and answer them correctly whether or not the students actually know what to do.
In other words, because the tests are predictable, teachers are able to teach to the test and ignore the rest.
Why would they do that? Even ignoring value-added model (VAM) ratings and the career-ending and maybe life-destroying consequences of not focusing on getting the highest scores possible, the unrelenting pressure on teachers, administrators, and schools pushes them to make test scores the purpose of education. Good scores? Golden. Bad scores? Gehenna for you! The pressure began before VAM became the vogue among policymakers.
Few can withstand it. Koretz discusses cheating in Chapter Six, a chapter he allowed a colleague to write and share. The scandals mentioned are well known. What is interesting is that, in passing the chapter to someone else, he subtly downplays cheating as a chief concern. Yes, cheating is wrong and yes, cheating should be stopped. Cheating should be punished. But even if all the cheating ends, test results will still be corrupted and we will still have a false picture of school improvement.
Next: Test prep, good and bad; unrealistic targets; the sham of VAM and the Common Core.