

Score wars: What to make of state v. NAEP tests

It’s a safe bet that newly released test scores will get a lot of media coverage. If anything, the spotlight has grown brighter since federal and state requirements went into place linking test scores to school accountability. Unfortunately, the focus of attention has not exactly brought, well, focus.

The latest results of the National Assessment of Educational Progress are a case in point. NAEP, often called “the nation’s report card,” is periodically administered to find out how well American students perform in various subjects. Depending on who’s talking, the 2005 NAEP scores are good news, bad news or business as usual. To some, they’re evidence that the federal No Child Left Behind (NCLB) Act is working to raise achievement, while to others, they’re proof that the law is holding students back.

The chatter is further complicated by comparisons between NAEP scores and student performance on tests given by individual states. What does it mean, for example, when 80 percent of your state’s 4th graders are proficient on the state test, but only 30 percent of them are proficient according to NAEP?  How can performance on a state test go up while the same state’s NAEP scores go down?  It’s enough to make even the most dedicated data geek’s brain hurt.

But before you despair, you should ask these few, semi-simple questions to help you understand NAEP and your state scores. These may or may not give you the news you want, but at least you’ll have a better sense of what that news is.

How is proficient defined?

The most common — and most widely reported — discrepancy between NAEP and state test scores concerns the number of students reported to be proficient. Most of the time, the state test results show that more students are proficient than NAEP does. Sometimes the difference is quite large, as much as 60 percentage points or more.

Testing proponents like former Department of Education official Diane Ravitch claim these discrepancies between NAEP and state tests show that some states are making it too easy to be proficient. [See, for example, Ravitch's Nov. 7, 2005 commentary on NAEP.] Testing critics like education researcher Gerald Bracey counter that, no, NAEP’s proficiency standards are too high. [See "Bracey on NAEP," Nov. 7, 2005.]

Nonetheless, proponents and critics alike point to the same reason for the discrepancy: 

Across the U.S. there is no common definition of proficient. NCLB requires states to make sure all their students are at least proficient in math and reading by the year 2014. But the law leaves it to each state to decide for itself what students have to do to hit that target. Thus the nation now has 51 different ways of saying, for example, that all 4th-graders have met math standards, and these definitions may or may not bear much resemblance to what NAEP says about it.

To be sure, there could be other reasons the numbers don’t agree.

Your state standards could have some content that’s out of sync with NAEP. This is more likely to be a problem in topic-driven subjects, like history, because the content can be taught at almost any grade level (for example, the American Revolution). On the other hand, subjects that build on skills at each grade, such as reading and math, tend to follow a similar progression across states. Some curricular mismatch could still result, but the effect on NAEP scores would likely be modest.

Students’ motivation to do well on the test may differ. NAEP has no consequences for students, while state assessments can, for example, influence decisions on grade promotion or high school graduation. The research on motivation is by no means conclusive, but it does suggest that the effect of student motivation (or more precisely, the lack of it) is most significant in high school. Because state NAEP results are only produced for 4th- and 8th-graders, motivation may contribute to lower NAEP scores, but it’s probably not a major factor.

Your state is not shooting as high as NAEP. If your state reports that 80 percent of its 4th graders are proficient and NAEP says only 30 percent are, there’s a very good chance this is the case.  If so, you might see closer alignment between your state’s proficient numbers and those reported at NAEP's basic level, as in Florida (see Exhibit 1).

Exhibit 1
Florida, 2005, 4th grade reading scores
(as a percent of students)

Performance Level          NAEP    FCAT1
Proficient and above       30%     71%
Basic and above            65%     (not reported)

What are the trends?

NAEP and state tests can also report different trends: that is, how scores have changed over time. Are they going up, going down, or staying the same?

NAEP, along with many state tests, reports scores in two ways: performance levels, such as proficient or basic, and scale scores. Trends are documented using both methods.

Performance levels. NCLB requires states to establish at least three levels of student performance to report to the public: proficient, which essentially means meeting standards; basic, or performance below proficient; and advanced, or performance above proficient. NAEP and many states also report a level equivalent to below basic, which helps to track the progress of the lowest-performing students.

(To confuse things, states can establish their own names for these levels, for example, Level 1, 2 or 3. They can also have more than three performance levels; some states have as many as six.)

A few canny observers have documented instances of NAEP and state proficient scores moving in opposite directions, prompting one New York Times commentator to write, “Are 4th graders getting smarter or dumber?” (Michael Winerip, “Are Schools Passing or Failing? Now, There’s a Third Choice … Both,” New York Times, November 2, 2005) 

So which is it? The first thing you need when looking at trends with performance levels is a complete picture.

In the example cited by Winerip, the percentage of New York 4th-graders reading at the proficient level declined by 1 percentage point on NAEP, but increased by 6 points on the state test: a seemingly contradictory trend.

But let’s look at the whole picture (see Exhibit 2). The percentage of advanced students did not change on NAEP. But on the state test it declined by 7 points. This means that New York’s gains at proficient were brought about by falling scores at the advanced level. This is not good news. But both NAEP and New York agree. They both show a 1 percentage point decline among students who are at least proficient (proficient plus advanced).

Exhibit 2
New York State, Grade 4: change in percentage points

Performance Level          NAEP reading    English language arts (ELA)
                           2003-2005       2003-2004

Advanced                    0              -7
Proficient                 -1              +6
At least proficient        -1              -1
(proficient + advanced)
Basic                      +2              +2
Below basic                -2               0
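
To make the arithmetic behind the “whole picture” explicit, here is a minimal sketch in Python using only the changes reported in Exhibit 2 (the variable and function names are ours, for illustration): the change at “at least proficient” is simply the change at proficient plus the change at advanced, and on that combined measure NAEP and the state test agree.

```python
# Changes in percentage points, taken from Exhibit 2.
naep_change = {"advanced": 0, "proficient": -1}    # NAEP reading, grade 4, 2003-2005
state_change = {"advanced": -7, "proficient": +6}  # New York ELA, grade 4, 2003-2004

def at_least_proficient(change):
    """The change at 'proficient or above' is the sum of the changes
    at proficient and at advanced."""
    return change["proficient"] + change["advanced"]

print(at_least_proficient(naep_change))   # -1
print(at_least_proficient(state_change))  # -1: the +6 at proficient is offset by the -7 at advanced
```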

Scale scores. NAEP and many state assessments also report student performance in scale scores. These scores are based on a continuous numerical scale that measures skills from the lowest to the highest level. The NAEP scale, for example, ranges from 0 to 500, and the same scale is used to compute scores for students at 4th, 8th and 12th grades, as shown in Exhibit 3.

Exhibit 3
NAEP reading performance, 2002

Grade Level       Average Scale Score
4th grade         219
8th grade         264
12th grade        287

 What’s important to know is that test designers work their psychometric magic to create these scales so that we can reliably compare student performance from one year to the next irrespective of performance levels. For this reason, looking at scale scores is a better method for monitoring trends. 

In Tennessee, for example, the percentage of 8th-graders who were proficient or better in math was unchanged between 2003 and 2005. Yet the average scale score for these students increased by 3 points (from 268 to 271), reflecting gains that would have been missed if looking only at performance levels.
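
To see how a flat proficiency rate can coexist with a rising average, here is a minimal sketch with entirely made-up scores and a made-up cutoff (these are illustrative numbers, not actual Tennessee or NAEP data): when students below the proficient cutoff improve without crossing it, the average scale score rises while the percent proficient stays put.

```python
# Hypothetical scale scores for the same grade in two years. The cutoff of 280
# and all of the individual scores are invented for illustration only.
CUTOFF = 280
year1 = [250, 260, 270, 285, 300]
year2 = [258, 268, 278, 285, 300]   # the three lowest scorers improved but stayed below the cutoff

def pct_proficient(scores):
    return 100 * sum(s >= CUTOFF for s in scores) / len(scores)

def average(scores):
    return sum(scores) / len(scores)

print(pct_proficient(year1), pct_proficient(year2))  # 40.0 40.0 -> percent proficient unchanged
print(average(year1), average(year2))                # 273.0 277.8 -> average scale score rose
```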

When should I worry?

The bottom line for policymakers and the public is figuring out whether their schools are moving along the right track or headed for disaster. Should they declare victory with a 4 point spike? Is a 1 point decline a catastrophe?

It depends. On large-scale assessments, a change of 1 or 2 points may or may not be meaningful. Statistical significance is a technical term meaning that a numerical result is unlikely to have happened by chance and instead reflects a real change. Statisticians will often flag changes that are significant, as well as changes that are not and should thus be interpreted with caution. Even if this information is not provided, a 1-point change is slight, so it’s probably safe to pay attention but not get overwrought.
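
For readers who want to see what flagging a significant change involves, here is a minimal sketch of one common approach (the standard errors below are assumed values chosen for illustration; actual standard errors are published with NAEP results): the observed change is compared with the combined uncertainty in the two years' averages.

```python
import math

def is_significant(change, se_year1, se_year2, z_crit=1.96):
    """Rough two-sample z-test on a change in average scale scores.

    change     -- observed difference between the two years' averages
    se_year1/2 -- standard errors of each year's average (assumed here)
    z_crit     -- 1.96 corresponds to the usual 95 percent confidence level
    """
    se_of_change = math.sqrt(se_year1 ** 2 + se_year2 ** 2)
    return abs(change) > z_crit * se_of_change

# A 1-point change with assumed standard errors of 0.8 points in each year:
print(is_significant(1.0, 0.8, 0.8))   # False -- too small to call a real change
# A 4-point change with the same assumed standard errors:
print(is_significant(4.0, 0.8, 0.8))   # True
```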

A longer trend line will provide a better sense of how meaningful a change in test scores is. Because NCLB went into effect in 2002, a lot of reporting on the latest NAEP results has focused on the change in scores between 2003 and 2005 as a way to gauge NCLB’s influence. As tempting as it is to make pronouncements (and indeed, they’re coming from both camps), two years of data probably won’t produce enough evidence to send the jury out yet, particularly when the change is negligible.

In an online article for the National Review, Fordham Foundation leaders Chester Finn and Michael Petrilli addressed a concern simmering in some quarters that “by focusing on students at the lower end of the achievement range, [NCLB] will give educators incentives to ignore the top-performing pupils.” They find “disturbing evidence” in the 2005 NAEP reading results that indeed our top students “lost ground at both the 4th and 8th grades since 2003.”

But let’s look at the data. For the period 2003-2005, the performance of 4th- and 8th-graders at the 90th percentile — or the top students — declined by 1 point on NAEP reading. Only at the 8th-grade level is this decline statistically significant. In addition, the scores for this same group of students have fluctuated up and down by 1 point over the last 13 years (see the NAEP reading charts). In 1992, the top 4th-graders scored 261 on the NAEP scale; after a decade of 1-point zigs and zags, they managed to increase their score to 263 by 2005. Likewise, top 8th-graders began at 305 in 1992 and scored 305 again in 2005.

Stagnant scores are clearly nothing to celebrate, especially at a time when schools are so focused on improvement. But based on this data, it’s hard to make the case that our top students have declined in reading since NCLB came into effect.

A good rule of thumb for interpreting NAEP scale scores is that 10 points translates roughly into one year’s worth of learning.
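
Applied to figures that appear earlier in this guide, here is a minimal sketch of the conversion (remember that the rule of thumb is only approximate, not an exact equivalence):

```python
POINTS_PER_YEAR = 10  # rule of thumb: roughly 10 NAEP scale points per year of learning

def years_of_learning(point_change):
    return point_change / POINTS_PER_YEAR

# Tennessee 8th-grade math, 2003-2005: 271 - 268 = 3 points
print(years_of_learning(271 - 268))  # 0.3 -> roughly a third of a year's growth
# Gap between 4th- and 8th-grade reading averages in Exhibit 3: 264 - 219 = 45 points
print(years_of_learning(264 - 219))  # 4.5 -> on the order of the four grade levels separating them
```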

Changes in NAEP as well as your state’s scores should be tracked and analyzed for information about progress in your schools. At the same time, it’s important to look at where your students are compared to NAEP and your state standards for proficient so that you keep your eyes focused on the goal.

Are state standards too low? Are NAEP standards too high?

There’s been an ongoing debate in the research community about the appropriateness of NAEP performance levels, and it is not likely to be resolved anytime soon. Critics of NAEP point to a 1998 report by the National Academy of Sciences (NAS), which found “the current process for setting NAEP achievement levels … fundamentally flawed.” On the other hand, NAEP’s overseers at the National Center for Education Statistics claim that NAS’s evaluations “relied on prior studies … rather than carrying out new evaluations” and are therefore inconclusive. Policymakers, educators, parents and the public who want to understand the process for setting NAEP levels can refer to the NAEP frameworks for a description.

Performance levels notwithstanding, to many educators and policymakers, NAEP represents the gold standard in testing for its ability to assess both content and critical thinking.

But use your own judgment. Anyone can download released items from NAEP. Look at these alongside examples from your own state tests and ask yourselves:

  • Would I expect my child to be able to perform well on these tests?
  • Would I want this for the children in my school and community?

When all stakeholders are engaged in such conversations, it not only brings clarity to where your students are, it gives direction to where you want them to be.

To find out more about your state, see our Directory of State Departments of Education for the link to your state's Department of Education web site.

Visit the NAEP web site for more information about NAEP, including state results, sample test items, performance levels, and more.

1 Florida Comprehensive Assessment Test


This guide was prepared by Patte Barth, director of the Center for Public Education.

Posted: March 29, 2006

©2006 Center for Public Education
