Are New York’s New Tests Better, or Just Harder?

The big drop in scores on New York’s new Common Core exams has sparked discussion about whether the tests are different in the *right* way. Did the tests actually evaluate the “higher-order” thinking skills kids are now supposed to be learning, or are the low scores the result of questions that were harder or just plain bad, but not conceptually different? In other words, did the tests do what they were supposed to do?

At this point there’s no way to get a definitive answer, but the data on how each grade’s performance changed relative to the other grades may be of some use. In theory, younger students have spent less time learning complex math and taking high-stakes exams, and so they should have an easier time adjusting to change. Look at the test question below. It’s not hard, but it initially makes your brain hurt.

If you went through elementary school without encountering “weird” questions like that, you might be a little thrown off. So older kids should have a harder time adjusting to exams that test for new higher-order skills. But what does the data say?

Below are the math proficiency rates of New York City 3rd through 8th graders over the last 4 years. (I’ve focused only on math because it seems like students are more likely to have set ways of thinking about math problems, and thus the shift to “Common Core math” ought to present a bigger adjustment. But I could be wrong.)

As you can see, it’s hard to make out any kind of trend because of the large across-the-board drop this year.

[Chart 1: NYC math proficiency rates by grade (3rd-8th), 2010-2013]

The chart below, however, presents the same data with each grade’s proficiency rate replaced by its performance relative to the overall 3rd-8th grade (i.e. “All Grades”) proficiency rate. Note that the numbers don’t represent raw percentage-point changes, but each grade’s proficiency rate relative to the 3rd-8th grade average, expressed as a percent difference. For example, in 2012 the 4th grade proficiency rate of 65.7% was 9.6% higher than the 3rd-8th grade rate of 60%; in 2013, the 4th grade rate of 35.2% was 18.9% higher than the 3rd-8th grade rate of 29.6%.
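For concreteness, here’s a minimal Python sketch of that relative-rate calculation, using the 4th grade figures quoted above. (The function is my own illustration, not the author’s actual workflow; the tiny gap from the post’s 9.6% figure presumably reflects unrounded rates.)

```python
# A minimal sketch of the relative-rate calculation described above,
# using only the 4th grade figures quoted in the post.

def relative_rate(grade_rate, all_grades_rate):
    """Percent difference between a grade's proficiency rate and the
    overall 3rd-8th grade ("All Grades") proficiency rate."""
    return (grade_rate / all_grades_rate - 1) * 100

# 2012: 4th grade at 65.7% vs. 60% for all grades
print(round(relative_rate(65.7, 60.0), 1))   # 9.5 (the post reports 9.6, likely from unrounded rates)

# 2013: 4th grade at 35.2% vs. 29.6% for all grades
print(round(relative_rate(35.2, 29.6), 1))   # 18.9
```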

Overall, what you see are large jumps for the two youngest grades and large declines for the two oldest grades (although the drop for 8th graders puts them at a level similar to 2010).

[Chart 2: Each grade’s proficiency rate relative to the 3rd-8th grade average, NYC]

Below is the same data, but with 3rd & 4th grade and 7th & 8th grade averaged into single measures to emphasize the change. Once again, the zero line represents the proficiency rate of all 3rd-8th grade students.

Between 2012 and 2013, 3rd and 4th graders went from being 2.3% better than the 3rd-8th grade average to being 15.4% better.
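Here’s one plausible way the combined measure could be built, assuming a simple unweighted average of the two grades’ relative rates; the post doesn’t say whether it instead weights by enrollment, so treat this as a sketch rather than the author’s method.

```python
# Sketch of a combined "3rd & 4th grade" (or "7th & 8th grade") measure:
# average the grades' relative rates. Assumption: unweighted average; an
# enrollment-weighted version would be another reasonable reading.

def relative_rate(grade_rate, all_grades_rate):
    """Percent difference between a grade's rate and the all-grades rate."""
    return (grade_rate / all_grades_rate - 1) * 100

def combined_relative_rate(grade_rates, all_grades_rate):
    """Unweighted average of several grades' relative rates."""
    rels = [relative_rate(r, all_grades_rate) for r in grade_rates]
    return sum(rels) / len(rels)

# Usage: combined_relative_rate([grade3_rate, grade4_rate], all_grades_rate)
# with the grade-level proficiency rates would give the "3rd & 4th grade" line.
```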

[Chart 3: Combined 3rd & 4th grade and 7th & 8th grade rates relative to the 3rd-8th grade average, NYC]

Finally, below is the same chart as above, but with data from all of New York State rather than just New York City. The pattern is the same.

[Chart 4: Same as Chart 3, but for all of New York State]

It would be interesting to know whether the same thing occurred last year in Kentucky, the first state to pilot new exams based on the Common Core. I poked around a bit but couldn’t find any handy state-level aggregates. (Feel free to let me know or leave a comment if you have them.)

Of course, these numbers don’t necessarily mean that the tests successfully evaluated new skills, or that the younger kids did better because they weren’t acclimated to the old tests. It’s possible that, for whatever reason, the 3rd and 4th grade tests were objectively easier. It’s also possible that this year’s 3rd and 4th grades were particularly strong cohorts, although the fact that this year’s top-performing 4th graders were pedestrian 3rd graders last year suggests that might not be the case. Most importantly, it’s only one year of data, so it might just be statistical noise.

Nevertheless, it’s worth thinking about what it would mean if the “newness” of the test had a non-uniform impact across the six grade levels. One takeaway is that scores should rise more than they otherwise would, because kids will either get acclimated to the new tests or be replaced by kids who were never exposed to the old ones. In other words, even if the 2019 cohort of 8th graders has exactly the same raw knowledge as this year’s cohort, we should expect the 2019 cohort to do better. That means Mr. or Mrs. New Mayor will get a little credit they probably don’t deserve.

More broadly, if familiarity does have an impact, it would be yet another reason to be patient when evaluating new standards, tests, or instructional paradigms. In addition to all the obvious moving parts, there is psychological inertia to overcome, and it will take time for everybody to get mentally up to speed with the new system. Teachers have already made it clear they don’t feel they’re there yet, and perhaps certain students also need more time to adjust before we can properly judge them. In that sense the charts above merely reinforce what the smart people have been saying all along: if you’re looking to draw sweeping conclusions after one or two years of testing, you’re out of luck.
