How schools that obsess about standardized tests ruin them as measures of success

Students sit politely, in uniforms, waiting for their teacher, at Harlem Success Academy.
Students at the Success Academy charter schools regularly outperform their peers on standardized tests. But does it matter?
Chris Hondros/Getty Images

The video changed everything. The surreptitious recording of a teacher berating a first-grader for failing to explain a math problem correctly, ripping up her paper, and then inexplicably telling the subdued child to go sit in the "calm down chair" went viral and jump-started a long-running national debate over the "no excuses" charter school model — a pedagogical approach marked by heavy workloads, rigid zero-tolerance discipline, and a relentless focus on metrics, particularly standardized tests.

The Success Academy Charter Schools are arguably the best-known institutions that follow this model. The storyline the model's proponents push is that no-excuses schools may be tough but they prepare students well: They represent a kind of educational tough love.

In real life, though, the data seems to show that Success Academy thrives by a combination of kicking out poorly performing students and training the remainder to perform well on tests that kids at other schools don’t really care about — or don’t care as much about.

Other critics have focused on how Success Academy excludes students who are not likely to perform well on tests, an option public schools don’t have. A parent of a kindergartener with a speech disability complained to the New York Daily News that the academy tried to force her son back into the public schools by framing his frustration in class as a disciplinary problem and repeatedly suspending him. The New York Times revealed that one principal at a Success Academy school had a list of low-performing students labeled "Got to Go."

The statistical problem with test score obsession

When a school uses selection and attrition policies that effectively filter out many of the extremely poor, students speaking English as a second language, and the learning disabled, that clearly calls into question test score advantages that such a school might have over an ordinary public school.

But the problems run even deeper than most critics realize: A look at the data combined with some basic principles of social science suggests that the practices of no-excuses charters are undermining the very foundation of data-based education reform.

As statisticians with experience teaching at the high school and college level, we recognize a familiar problem: A test that overshadows the ultimate outcomes it is intended to measure turns into an invalid test.

Back in the old Soviet Union, factories would produce masses of unusable products as a result of competition to meet unrealistic production quotas. Analogously, many charter schools, under pressure to deliver unrealistic gains in test scores, are contorting themselves to get the numbers they've promised. They're being rewarded for doing so. But that monomaniacal focus on test scores undermines the correlation between test scores and academic accomplishment that originally existed.

Thus, when there is a policy of teachers berating poorly performing students, it’s not tough love in the service of preparing students for future academic success. Rather, it’s consistent with a metrics-driven strategy that’s about looking good but not necessarily performing well.

In short, a statistical perspective can help reveal how behavior that seems like an aberration can actually be a natural response to perverse incentives.

It's important to grasp that the standardized tests used to evaluate schools, such as the PARCC (Partnership for Assessment of Readiness for College and Careers) exam, are proxies. (New York does not use the PARCC but has its own comparable test.) Despite the reference to "college and careers" in the test’s name, there is no way it can directly measure which kids will get college degrees and good jobs. (To study the question, you'd have to wait 20 years to get your results.)

Instead, these tests are designed with the expectation that the scores correlate with later academic and economic success. The tests are proxy measures. And as is the case with all proxies, if the underlying relationships break down, they can become misleading.

If a school can figure out a way to prevent homeless kids from enrolling, for example, and the school's test scores go up, there is no reason to assume we have improved anyone's economic prospects. Likewise, if a school makes sure that all its kids are well-rested and motivated to answer every question on test day, we would not expect that school's improvement in scores to translate to other areas.
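The selection effect described above can be sketched with a few lines of arithmetic. This is a toy illustration, not an analysis of any real school: the roster, the scores, the "later outcome" numbers, and the cutoff of 60 are all invented, and the function names are hypothetical.

```python
from statistics import mean

# Hypothetical students as (test_score, later_outcome) pairs.
# Every number here is invented for illustration only.
students = [
    (40, 42), (50, 48), (55, 57), (60, 58),
    (70, 72), (75, 73), (85, 88), (90, 86),
]

def school_average(roster):
    """Mean test score of the students currently enrolled."""
    return mean(score for score, _ in roster)

def filter_out_low_scorers(roster, cutoff):
    """Keep only students at or above the cutoff (the assumed selection policy)."""
    return [student for student in roster if student[0] >= cutoff]

before = school_average(students)
kept = filter_out_low_scorers(students, cutoff=60)
after = school_average(kept)

# The students who remain have exactly the outcomes they had before:
# the school's average moved, but no individual's prospects did.
outcomes_unchanged = all(student in students for student in kept)

print(f"average before selection: {before}")
print(f"average after selection:  {after}")
print(f"remaining students' outcomes unchanged: {outcomes_unchanged}")
```

In this made-up roster the school's average rises by more than 10 points purely because three students are no longer counted. Nothing about teaching or learning appears anywhere in the calculation.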

These tests work under normal teaching conditions — but Success Academy is not "normal"

As social psychologist Donald T. Campbell put it in 1976:

[A]chievement tests may well be valuable indicators of general school achievement under conditions of normal teaching aimed at general competence. But when test scores become the goal of the teaching process, they both lose their value as indicators of educational status and distort the educational process in undesirable ways.

The greatest threat to any data-based system is corruption of the data. Education reform advocates are correct in arguing that competition, market forces, and metric-based incentives can accomplish great things, but only if the system is properly designed and subjected to vigilant oversight to guard against the corruption. And if you were to set out to try to come up with a model that would inevitably and irretrievably undermine the data, it would look very much like the test-centric no-excuses charter school model.

First, the focus on standardized test scores is relentless. And it takes place in a context in which, thanks to lobbying by leaders of and advocates for these schools, there is very little official oversight.

High scores on the practice test are rewarded with praise and even toys, purchased by the school as rewards; low scores are punished with public shaming. The behavior on the aforementioned video is an extreme example. More routine examples include posting on classroom walls every student’s scores, even when parents complain this embarrasses and pains their children.

Worse yet, in terms of data corruption, many of the practices — such as calling parents the night before a test, to make sure the kids get enough sleep — have no conceivable effect on students' mastery of the material. They are entirely designed to improve the conditions under which the test is taken.

A red-flag test score discrepancy

You might be inclined to excuse some of the focus on test prep if higher test scores were meaningful indicators of student achievement (and teacher effectiveness). But we already have one troubling piece of evidence that Success Academy practices are undermining the value of the test scores in that context.

According to the tests that New York uses to evaluate schools, Success Academy ranks at the top of the state (the top 0.3 percent in math and the top 1.5 percent in English), according to its founder, Eva Moskowitz. That rivals or exceeds the performance of public schools in districts where homes sell for millions of dollars.

But it took three years before any Success Academy students were accepted into New York City's elite high school network — and not for lack of trying. After two years of zero-percent acceptance rates, the figure rose to 11 percent this year, still considerably short of the 19 percent citywide average.

News coverage of those figures emphasized that the 11 percent acceptance rate was still higher than the average for students of color (the population Success Academy mostly serves). But from a statistical standpoint, we would expect extremely high scores on the state exam to go along with extremely high scores on the high school application exams. It's not clear why race should be a factor when interpreting one and not the other.

The explanation for the discrepancy would appear to be that in high school admissions, everybody is trying hard, so the motivational tricks and obsessive focus on tests at Success Academy schools have less of an effect. Routine standardized tests are, by contrast, high stakes for schools but low stakes for students. Unless prodded by teachers and anxious administrators, the typical student may be indifferent about his or her performance.
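A minimal sketch of that explanation, under loudly stated assumptions: suppose an observed score is a student's underlying skill minus a "coasting penalty" that applies only when the student has no reason to try. The additive model, the penalty of 10 points, and the skill level of 70 are all hypothetical choices made for illustration.

```python
# Toy model (an assumption, not an estimate): score = skill - coasting penalty.
COASTING_PENALTY = 10  # hypothetical points lost when a student is indifferent

def observed_score(skill, is_trying):
    """Score under the toy model: full skill shows only if the student tries."""
    return skill if is_trying else skill - COASTING_PENALTY

skill = 70  # two hypothetical students with identical underlying skill

# State test: high stakes for the school, low stakes for the student.
# Only the prodded student is assumed to try.
state_prodded = observed_score(skill, is_trying=True)
state_typical = observed_score(skill, is_trying=False)

# Admissions exam: high stakes for the student, so both are assumed to try.
admissions_prodded = observed_score(skill, is_trying=True)
admissions_typical = observed_score(skill, is_trying=True)

state_gap = state_prodded - state_typical
admissions_gap = admissions_prodded - admissions_typical

print(f"gap on state test:      {state_gap}")
print(f"gap on admissions exam: {admissions_gap}")
```

Under these assumptions the prodded student looks 10 points better on the state test and identical on the admissions exam, even though the two students know exactly the same material. An advantage that comes from effort on test day disappears once everyone has a reason to try.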

In general, competition is good, as are market forces and data-based incentives, but they aren't magic. They require careful thought and oversight to prevent gaming and what statisticians call model decay. Without these, reform initiatives can be left defenseless against fast operators and unintended (but foreseeable) consequences.

What went wrong with Success Academy is, paradoxically, what also seems to have gone right. Success Academy schools have excelled at selecting out students who will perform poorly on state tests and then preparing their remaining students to test well. But their students do not do so well on tests that matter to the students themselves.

Like those Soviet factories, Success Academy and other charter schools have been under pressure to perform on a particular measure, and they are reminding us once again of what Donald Campbell told us 40 years ago: Tampering with the speedometer won't make the car go faster.

We want to stress that our interest is not the quality of Success Academy but the quality of the data. For disadvantaged students who are focused and disciplined, who thrive in highly structured settings, who can handle high-pressure situations and who have highly supportive families, Success Academy schools may be really good environments.

Our concern is that under the current system of flawed metrics and perverse incentives, the Success Academy model can lead to more data corruption, which can lead to increasingly bad decisions, maldistribution of resources, and apples-to-oranges comparisons.

Whenever you make huge decisions about complex situations based on one or two numbers, you're headed for disaster — especially when those numbers can be gamed.

Mark Palko is a marketing statistician and former math teacher who blogs at West Coast Stat Views. Andrew Gelman is a professor of statistics and political science and director of the Applied Statistics Center at Columbia University. He blogs at Statistical Modeling.
