
The proficiency debate: A guide to NAEP achievement levels

If you handed in the same assignment to two different teachers and received an “A” from one and a “D” from the other, wouldn’t you have some questions?

Although the difference isn’t always that dramatic, the discrepancy between the “grades” students receive on the National Assessment of Educational Progress (NAEP), a nationwide test often called The Nation’s Report Card, and on states’ own assessments is raising questions for many people.

Recently, numerous reports have highlighted the fact that most state assessments find more students “proficient” than the NAEP assessment does. Sometimes, the gap can be substantial. On average, there is nearly a 40 percentage point difference between state and NAEP assessments in scoring how many fourth-graders are proficient in reading (Table 1).

Such gaps immediately suggest two options:

  • Either NAEP standards are unrealistically high, or 
  • State standards are woefully low.

Those who are inclined to believe the latter option have called for NAEP scores to be displayed next to state scores or for NAEP to be used as a national standard. Those who lean toward the former have increased their criticism of NAEP, often citing a National Academy of Sciences (NAS) study that called NAEP's standard for proficiency “fundamentally flawed.”

So is NAEP unrealistic? Unfortunately, the answer isn’t that simple. NAEP and state assessments have different objectives and they measure different things. If you are facing a gap between your state’s scores and NAEP scores, it’s important to know how NAEP is developed, its purpose, and how “proficient” is defined. This guide will walk you through how NAEP scores are determined and help you uncover what it means if proficient scores in your state differ dramatically from NAEP scores.

This guide answers the following questions:

How big is the discrepancy?

About NAEP
NAEP, also known as The Nation's Report Card, is the only nationwide test that can compare student achievement across states. Although state participation was originally voluntary, No Child Left Behind now requires states to participate in the 4th and 8th grade NAEP assessments in reading and math as a condition of obtaining Title I funds. The first NAEP assessments date back to 1969 and provided educators information on how well U.S. students were performing nationally over time. It wasn't until 1990 that State NAEP, also known as Main NAEP, was created to enable comparisons across states. This version of NAEP introduced criterion-based achievement levels (basic, proficient, and advanced) (NAGB 2005).

Recently, numerous reports have highlighted that most state assessments find more students proficient than NAEP does (Hall and Kennedy 2006, Peterson and Hess 2006, Workforce 2007). These reports have been used to make two arguments:

  • Most state standards are not as rigorous as NAEP.
  • States are not being truthful when reporting how many students are actually proficient (Peterson and Hess 2006).

These two arguments lead to our first question: How many more students do states deem proficient compared to NAEP?

Table 1 provides a sample comparison. It shows that on the 2005 fourth-grade reading test, 89 percent of Mississippi’s students scored proficient on the state assessment, while only 18 percent did on NAEP. That’s a difference of 71 percentage points. Although this is an extreme example, across the country, states’ fourth-grade reading proficiency rates averaged almost 40 percentage points higher than their NAEP rates. In eighth-grade math (Table 2), the average difference was smaller, at 26 points. West Virginia had the largest gap, at 53 percentage points, while in several states (Maine, Massachusetts, South Carolina, Louisiana, Missouri, and New Hampshire) the pattern was reversed. These states actually had more students reaching proficiency on NAEP than on their state assessments.
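As a quick illustration of the arithmetic behind these comparisons, the gap is simply the state's proficiency rate minus the NAEP proficiency rate, expressed in percentage points. The sketch below uses the Mississippi figures quoted above; the function name and the second, reversed example are hypothetical and included only to show what a negative gap looks like.

```python
def proficiency_gap(state_pct, naep_pct):
    """Gap in percentage points.

    Positive: the state deems more students proficient than NAEP does.
    Negative: the pattern is reversed.
    """
    return state_pct - naep_pct

# Mississippi, 2005 fourth-grade reading (figures quoted in the text)
mississippi_gap = proficiency_gap(state_pct=89, naep_pct=18)
print(mississippi_gap)  # 71

# Hypothetical state where NAEP finds more students proficient (not real data)
reversed_gap = proficiency_gap(state_pct=30, naep_pct=35)
print(reversed_gap)  # -5
```

Keep in mind that a gap computed this way compares percentages only. It says nothing about whether the two tests cover the same content or define proficiency the same way.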

A more scientific comparison of state and NAEP proficiency levels was conducted by the National Center for Education Statistics (NCES), an agency within the U.S. Department of Education. Rather than simply comparing proficiency percentages, this report placed the scores of each state’s 2005 assessment onto NAEP's scoring scale using a statistical linking process. As a result, the study provided a better “apples to apples” perspective on how states’ cutoff scores for proficiency differ from NAEP's cutoff scores (NCES 2007). The report found that two states, Massachusetts and Wyoming, set their cutoff scores for proficiency on fourth-grade mathematics at a level similar to NAEP’s.
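To get a feel for what placing a state cutoff on the NAEP scale can mean, here is a deliberately simplified sketch. It assumes an equipercentile-style idea: if a state reports a given share of its students as proficient, look for the NAEP score that the same share of that state's NAEP test-takers reach or exceed. This guide does not describe the details of the NCES procedure, so treat the function, its name, and the ten sample scores as hypothetical illustrations rather than the report's actual method.

```python
def naep_equivalent_cutoff(naep_scores, state_pct_proficient):
    """Return the NAEP score reached or exceeded by the same share of students
    that the state reports as proficient (simplified equipercentile idea)."""
    if not 0 < state_pct_proficient <= 100:
        raise ValueError("state_pct_proficient must be in (0, 100]")
    ranked = sorted(naep_scores, reverse=True)  # highest score first
    n_at_or_above = round(len(ranked) * state_pct_proficient / 100)
    if n_at_or_above == 0:
        raise ValueError("sample too small for this percentage")
    return ranked[n_at_or_above - 1]  # lowest score within the top share

# Hypothetical NAEP scores for ten students in one state (illustration only)
sample_scores = [180, 190, 200, 210, 220, 230, 240, 250, 260, 270]

lenient_cutoff = naep_equivalent_cutoff(sample_scores, 80)  # state calls 80% proficient
strict_cutoff = naep_equivalent_cutoff(sample_scores, 20)   # state calls 20% proficient
print(lenient_cutoff, strict_cutoff)  # 200 260
```

The point of the sketch is the direction of the relationship: for the same group of students, the more students a state labels proficient, the lower its cutoff lands on the NAEP scale.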


How to Read Figures 1–4

The circles in the graphs represent where each state's cutoff score for proficiency would be if placed on NAEP's scoring scale. Since placing state scores on the NAEP scale is a statistical process, there is a certain amount of error. The lines extending from the circles represent the range of NAEP scores within which a state's cutoff score for proficiency would likely fall, and the circles are at the center of that range.

Placing each state's cutoff score for proficiency on the NAEP scale provides a way of comparing the rigor of state performance standards to NAEP. States whose cutoff scores fall higher on the NAEP scale likely have more rigorous performance standards for proficiency on their state assessments. For example, Figure 1 shows that 4th grade students in Massachusetts who score at or above proficient on their state math assessment would also score at or above NAEP's proficient achievement level, while students in Tennessee may score proficient on their state assessment but score below basic on NAEP. The graph shows that Massachusetts has a higher performance standard for proficiency than Tennessee.

In eighth-grade mathematics, Missouri, South Carolina, and Massachusetts were the only states to set their proficiency cutoff scores similar to NAEP’s. All other states set their cutoff scores for proficiency well below NAEP's. In some cases, state cutoff scores for proficiency were set within NAEP's “below basic” level (NCES 2007).


Most states also had lower cutoff scores for proficiency in reading than they did in mathematics. Not one state’s fourth or eighth grade cutoff score for reading proficiency was as high as NAEP's (NCES 2007). At the fourth grade level only 10 out of 32 states set cutoff scores for reading proficiency above NAEP’s cutoff score for “basic” achievement (NCES 2007). This means that students in 22 states can be deemed proficient in reading on their state assessment, yet be characterized as below basic on NAEP.


Why is there a discrepancy?

How can this be? How can one test find a student proficient, while another test of similar skills finds the same student to be below basic? Everyone, including politicians, education advocates, and researchers, has a different opinion. Some argue that NAEP sets the bar too high (Hunter 2007), while others claim that most states expect too little of their students (Peterson and Hess 2006). Still others say that comparing state and NAEP proficiency rates is misleading, because they are not measuring the same thing (Stoneberg 2007b). Some are advocating for the development of national standards based on NAEP benchmarks so there is a uniform definition of proficiency (Thompson and Barnes 2007). And then there are those who want states to increase their academic rigor for defining proficiency to better align with the NAEP assessment (Dodd and Ehlers 2007). No wonder there’s confusion! The answers to the following questions may help you navigate between all these arguments in your own state.

Do states and NAEP define proficiency the same way?
Although NAEP and states label a certain level of achievement “proficient,” they are not necessarily defining proficiency the same way. Nor are they testing the same knowledge and skills. For example, the National Assessment Governing Board (NAGB), which oversees NAEP policies, says, "In particular, it is important to understand clearly that the Proficient achievement level does not refer to 'at grade' performance” (Loomis and Bourque 2001). The report goes on to say "…students who may be considered proficient in a subject, given the common usage of the term, might not satisfy the requirements for performance at the NAEP achievement level” (Loomis and Bourque 2001). The report also says that the basic level is less than mastery but more than minimal competency (Loomis and Bourque 2001).

Additionally, NAGB’s Mathematics Framework for the 2007 NAEP says, "Proficient represents solid academic performance for each grade assessed. Students reaching this level have demonstrated competency over challenging subject matter, including subject-matter knowledge, application of such knowledge to real-world situations, and analytical skills appropriate to the subject matter” (NAGB 2006). The basic level, the report says, "…denotes partial mastery of prerequisite knowledge and skills that are fundamental for proficient work at the 4th, 8th, and 12th grades” (NAGB 2006).

Using NAEP “standards”
Often, people get confused when the discussion turns to NAEP “standards.” Most of the time when people talk about standards, they are talking about content standards, which answer the questions “What should be learned and when?” 

NAEP does not dictate what should be taught. It develops frameworks for what knowledge and skills the tests will cover, but its main focus is on evaluating students’ performance.  When people talk about “NAEP standards,” they are talking about the standards for performance NAEP has set – the four achievement levels of Below Basic, Basic, Proficient and Advanced. So when the debate is about “whether NAEP standards are higher than state standards,” the discussion is really about whether NAEP sets the cutoff scores for those four achievement levels higher than states do.

Regardless of how NAEP defines basic and proficient, regulations contained within the 2001 No Child Left Behind Act (NCLB) require states to move all children to the proficient level in reading and math by the year 2014. However, NCLB gives each state leeway to define proficiency for itself. So 50 states have 50 separate definitions for proficiency in a particular subject, in a particular grade. This variability among the states has led to much confusion about what proficiency actually means.

In contrast to NAEP, NCLB defines proficiency as being on grade level. A report developed by the U.S. Department of Education specifically states that "The proficient achievement level [for NCLB] represents attainment of grade-level expectations for that academic content area” (U.S. Department of Education 2007). As Bert Stoneberg, NAEP State Coordinator for the Idaho State Board of Education, says, "These NAGB publications make it clear that NAEP ‘proficient’ is not restricted to grade-level performance” (Stoneberg 2007a).

A report commissioned by the NAEP Validity Studies Panel found that the most appropriate NAEP level for gauging state assessment results should be the percentage of students at or above the basic achievement level (see Table 3 and Table 4) (Mosquin and Chromy 2004). Even U.S. Secretary of Education Margaret Spellings urged reporters to compare state proficiency rates to NAEP's basic level (Dillon 2005). Moreover, James Pellegrino, lead author of the National Academy of Sciences' 1999 evaluation of NAEP, called Grading the Nation's Report Card, suggested that perhaps proficiency for accountability purposes lies somewhere between NAEP's basic and proficient levels because NAEP's proficiency level was not developed with accountability in mind (Pellegrino 2007).

Do NAEP and states assess the same content?
NAEP and state assessments are not always aligned, so in many cases they do not. For example, almost half (49 percent) of NAEP's eighth grade science standards are not aligned with what is assessed in the Texas state assessment (Timms, et al. 2007b). In Louisiana, where there was little difference in results (see Table 1), only seven percent of NAEP’s science content was not addressed in the state's assessment (Timms, et al. 2007a). In Michigan, roughly 22 percent of NAEP's eighth grade math standards are not aligned with the content on the state assessment (Stemmer and Hodges 2008), which may partially explain why there is a 33-point difference between the percentage of students the state deems proficient and the percentage NAEP does (see Table 2). These are just a few examples, but they do show that states may expect students to know different things than NAEP does. These differences in scope may partially explain the gap.

Are NAEP achievement levels set too high?

Even if the tests are measuring different things, does the large gap mean that NAEP achievement levels are set too high? Or, conversely, are state standards too low?

Those who argue that NAEP standards are too high often point to the 1999 evaluation of NAEP by the National Academy of Sciences (NAS), which concluded that NAEP achievement levels were “fundamentally flawed” (Pellegrino, Jones, and Mitchell 1999). Others argue that the achievement levels are not reasonable when compared with some other assessments.

NAEP vs. high school achievement: NCES investigated how student performance at a particular NAEP achievement level relates to high school achievement by using data from the National Education Longitudinal Study (NELS), which records a wealth of information about students, including students’ high school grades and assessment test scores. Although these students did not take the NAEP math assessment, they did take a similar math assessment in twelfth grade. Scores from this NELS assessment were then linked  to NAEP's scoring scale to determine how these students would have performed on NAEP. The report found that just over half (56 percent) of “A” students in math would have scored at or above the NAEP proficient level, while just 20 percent of “B” students would have done so (Scott, Ingels and Owings 2007). Just over two-thirds (68 percent) of seniors who had completed calculus reached the proficient level on NAEP, 13 percent of whom scored at the advanced level (Scott, Ingels and Owings 2007).

NAEP and the ACT: Only 21 percent of seniors who had scored between 21 and 25 on the ACT mathematics assessment performed at NAEP’s proficient level (Scott, Ingels, and Owings 2007). According to ACT, a score of 22 on the math portion of its college admissions test indicates that a student is ready for college algebra (Allen and Sconing 2005). Therefore, NAEP’s standard for proficiency seems to be set higher than the level of knowledge entering college freshmen need.

NAEP and college degrees: As expected, almost all (91 percent) high school seniors who scored at NAEP's advanced level in math earned a bachelor's degree. The percentage remained relatively high—79 percent of seniors scoring at the proficient level and 50 percent of seniors scoring at the basic level received bachelor’s degrees (Scott, Ingels, and Owings 2007).

These percentages are relatively high, considering that just about one-third of adults receive a four-year degree. Even at NAEP’s below basic level, 18 percent of seniors received bachelor's degrees (Scott, Ingels, and Owings 2007). Taken together, the data show that many students scoring at NAEP's basic level are prepared for postsecondary education and are well-equipped to go on to obtain a four-year degree.

When interpreting NAEP scores, however, keep in mind that NAEP is a low-stakes assessment, which means that neither the school nor the students face consequences if performance is poor. In contrast, state assessments are high stakes because consequences for the school and sometimes students themselves are tied to how well students perform. Researchers speculate that without consequences, students may not be motivated to do their best on NAEP. If so, scores on NAEP may understate what students truly know and can do.

One study found that while financial rewards can improve the performance of eighth-graders, they had no effect at the twelfth grade level (O'Neil 1992). Dr. Mark Reckase, an expert in high stakes testing, found that high school students were more motivated to do well on college entrance exams (such as the SAT and ACT) than on NAEP (Reckase 2001). In particular, he found that students were less likely to do their best on more demanding questions such as open-ended questions, noting that these questions are more likely to be left blank (Reckase 2001). Researchers from the National Center for Research on Evaluation, Standards, and Student Testing (CRESST) also found that NAEP scores underestimate student achievement, but by only a small amount (Kiplinger and Linn 1993).

NAEP and international achievement: No country, even those performing highest on international assessments, would have 100 percent of its students reach NAEP’s proficiency level. This is according to a report by the American Institutes for Research that linked scores from the 2003 Trends in International Mathematics and Science Study (TIMSS) to NAEP (Phillips 2007). Furthermore, no country would have 100 percent of its students reach NAEP’s basic level (Phillips 2007).

In math, only about three-quarters (73 percent) of top-performing Singapore’s eighth graders would reach NAEP’s proficient level (see Table 5). Two-thirds of the students in the next highest performing countries (such as Korea, Hong Kong, Chinese Taipei, and Japan) would also reach proficiency on NAEP (Phillips 2007). Moreover, only five of the 46 countries that took the TIMSS (Singapore, Hong Kong, Korea, Chinese Taipei, and Japan) would even have half their eighth-graders reach NAEP's proficient level (Phillips 2007). On the other hand, 26 percent of U.S. eighth graders reached proficiency, which is not significantly different from most European countries such as the Netherlands, England, Italy, and Sweden (Phillips 2007).

To be fair, NAEP achievement levels are not specifically designed to be predictors of any postsecondary or any other outcomes, so it is not surprising the results don't align (Hambleton, et al. 2000). Furthermore, there is no clear consensus on the kinds of external outcomes NAEP should be linked to (Pellegrino, Jones, and Mitchell 1999).

While the data do suggest that NAEP standards are set high, the important question is this: “Are they too high?” But even that question is misleading because setting achievement levels comes down to the judgment of individuals. Remember that NAEP achievement levels come from the collective judgment of experts, including teachers and other educators, who have determined that these are the knowledge and skills students need at grades four, eight, and twelve. In addition, NAEP standards are aspirational, whereas state standards are set for minimal competency.

Is NAEP constructed well?

Some have charged that NAEP’s achievement levels should be disregarded altogether because the test is not constructed well. Examining how NAEP and its achievement levels are constructed, what they are designed for, and how they should be used sheds light on this controversy.

NAEP Achievement Levels       
Basic: This level denotes partial mastery of prerequisite knowledge and skills that are fundamental for proficient work at each grade.

Proficient: This level represents solid academic performance for each grade assessed. Students reaching this level have demonstrated competency over challenging subject matter, including subject-matter knowledge, application of such knowledge to real world situations, and analytical skills appropriate to the subject matter.

Advanced: This level signifies superior performance.

The process of setting NAEP achievement levels was developed through numerous research studies and pilot tests taking place over many years (Hambleton, et al. 2000). Although decisions by panelists are based on individual judgment, the process is scientific, with each aspect documented and replicable (Hambleton, et al. 2000, Pellegrino 2007). Setting NAEP achievement levels starts well before assessments are developed. The foundation for both assessment and achievement levels is the frameworks.

The Frameworks

The assessment framework for each subject describes what students should know and be able to do at grades four, eight, and twelve (NAGB 2008a). After reviewing several evaluations of NAEP's frameworks, Robert Forsyth from the University of Iowa concluded: "In general, NAEP frameworks have been highly regarded by educators” (Forsyth 1997). And NAGB stresses we should keep in mind that these frameworks are not a national curriculum; rather, they are an outline for what NAEP should test (NAGB 2008a).

Each subject's framework is developed through a comprehensive process that involves teachers, curriculum specialists, policymakers, business representatives, and members of the general public (NAGB 2008a). When considering the knowledge and skills to be assessed, the frameworks account for factors such as state and local curricula and assessments and international standards and assessments (NAGB 2002). Each framework includes preliminary achievement levels, which provide guidance on the range of items to be included in the assessments and to begin setting achievement levels (NAGB 2002).

Once the frameworks have been approved by NAGB, they are used as the basis for developing the assessments and typically remain in place for at least ten years (NAGB 2008a). The debate over NAEP typically revolves around achievement levels and not the frameworks themselves (Linn 2001).

Setting Achievement Levels

After the frameworks have been developed and after students have taken the assessments, panelists are selected to determine cutoff scores that will divide results into three achievement levels (basic, proficient, and advanced). [See SideBar: NAEP Achievement Level Descriptions] Like the frameworks, panels of experts for setting achievement levels consist of teachers, curriculum specialists, and members of the public who have specific knowledge in the subject for which they are developing cutoff scores (Pellegrino, Jones, and Mitchell 1999). Although NAGB typically accepts the panelists’ recommendations, it does retain final say on where achievement levels are set (Loomis and Bourque 2001).

Panelists are carefully chosen for their knowledge of subject content, familiarity with the student population, and reputation within the education community (Hambleton, et al. 2000, Reckase 1998). There are approximately 30 panelists for each assessment, who proportionally represent all geographic regions of the country. They also represent racial/ethnic and gender groups (Hambleton, et al. 2000, Loomis and Bourque 2001). In general, 55 percent of the panel members are teachers, 15 percent are curriculum specialists or other educators, and 30 percent are members of the general public (Loomis and Bourque 2001). 

The panelists come together for approximately five days after students have completed the NAEP assessment (Reckase 2001). During the first two days panelists train on NAEP policy and the achievement level-setting process (Loomis and Bourque 2001). The training includes, for example, estimating how well borderline students will perform on NAEP items (Reckase 1998). To ensure valid and reliable results, extensive processes are in place to determine if panelists are making logical and informed judgments (Loomis and Bourque 2001). These processes include dividing panelists into two separate groups, each of which develops a cut score for the same assessment (Reckase 1998). The cut scores are then compared to help determine if panelists' ratings are consistent (Reckase 1998). Furthermore, to ensure that they are comfortable with the process, panelists are asked to fill out questionnaires about their understanding of the process and their level of confidence in the results (Reckase 1998).

Unfortunately, even with all of these processes in place, there is still no way of knowing whether or not panelists have set the right standard since there is no true standard to measure against (Loomis and Bourque 2001).

Are the achievement levels “fundamentally flawed”?

The biggest charge against NAEP achievement levels came from the 1999 NAS evaluation, Grading the Nation's Report Card, which called them “fundamentally flawed.” Several other evaluations of NAEP have also found fault with NAEP's achievement levels. It should be noted that the 1999 NAS evaluation used evaluations from the early 1990s (when achievement levels were first introduced) as the basis for its conclusion instead of conducting a completely new evaluation (Hambleton, et al. 2000). It should also be noted that in 2005 NAGB implemented a new method for setting achievement levels (NAGB 2008b). However, the new process has yielded results similar to those of the process NAS called "fundamentally flawed" (NAGB 2008b).

NAS’s three primary arguments for claiming that NAEP achievement levels are fundamentally flawed were as follows.

Argument #1: Results are not believable because too few students were judged to be at the advanced level compared to Advanced Placement (AP) coursework (Pellegrino, Jones, and Mitchell 1999). The NAS report claims that the small percentage of students reaching the advanced level on the NAEP science assessment does not correspond to other measures of advanced achievement like the AP test.

Examining the argument: Ronald Hambleton, member of NAEP's Technical Advisory Committee on Standard Setting (TACSS), found that just 1.93 percent of the 1996 grade twelve cohort scored at or above three (the score colleges typically accept for college credit) on any AP science test (Hambleton, et al. 2000). In comparison, three percent of twelfth-graders scored at the advanced level on the 1996 NAEP science assessment (Grigg, Lauko and Brockway 2006). So, contrary to the NAS report, it actually appears that the NAEP standard for advanced is well aligned with the percent of students passing an AP science test.

Argument #2: Panelists typically rated constructed-response questions (e.g., open-ended and short answer questions) at a higher difficulty level than multiple choice questions (Linn 1998, Pellegrino, Jones and Mitchell 1999). The NAS report argues that the cut scores obtained from constructed-response items are much higher than cut scores from multiple-choice items, thereby providing unreliable achievement levels (Pellegrino, Jones and Mitchell 1999). For example, based only on the cut score of multiple choice questions, Robert Linn from CRESST found that 78 percent of fourth-graders would have scored at or above the basic level on the NAEP math assessment (Linn 1998). However, just three percent of fourth-graders would reach the basic level if the cut scores were based on constructed-response questions alone. We see here that NAS and Linn both argue that the panelists are not consistent in their ratings of certain items. If this is true, then the percentage of students reaching a particular achievement level depends more on the mix of items within the assessment than on the students’ knowledge and skill.

Examining the argument: Many educators argue that there are too many standardized tests that don’t truly measure students' abilities. Their reasoning is that standardized tests rely too heavily on multiple-choice questions. To answer constructed response questions successfully, they argue, students need higher-order skills. However, the NAS report took issue with the fact that panelists consistently rated constructed response questions as more difficult to answer than multiple choice questions. This, they said, was evidence that NAEP levels were flawed. But consideration must be given to the likelihood that constructed response questions and multiple choice questions are measuring different levels of achievement (Brennan 1998, Hambleton, et al. 2000). It could be that students are just having a difficult time answering constructed response questions.

Argument #3: Panelists have difficulty estimating whether or not a borderline student will answer a question correctly (Pellegrino, Jones and Mitchell 1999). Calculating cut scores requires that panelists rate each item to determine the probability that students at each achievement level will answer specific questions correctly (Reckase 1998). The NAS report argues that panelists are not able to determine accurately whether a borderline student (for example, one on the edge between proficient and advanced) will answer a specific item correctly. Such decisions greatly affect the cut scores for NAEP achievement levels, and the NAS report argues that they are just too difficult and confusing for panelists to judge. The NAS report also argues that panelists have a difficult time rating test items because the process is too complex.

Examining the argument: NAEP panelists consistently indicated that they understood the achievement level-setting process (Hambleton, et al. 2000), which is probably due to the extensive training they received before and during the standard-setting process. Furthermore, until 2005 the achievement level-setting method used by NAEP (the Angoff Method) had been used for decades (Hambleton, et al. 2000). Currently NAEP is using a “Mapmarking” method that produces very similar results (NAGB 2008b). NAEP checks that panelists are consistent in developing cut scores by having two sets of panelists in each grade work independently. A comparison of their results shows that they provide similar cut scores, suggesting that their results are not erratic or inconsistent (Hambleton, et al. 2000).
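To make the item-rating idea more concrete, here is a minimal sketch of a textbook Angoff-style calculation together with the two-group consistency check described above. It assumes the simplest raw-score version: each panelist estimates the probability that a borderline student answers each item correctly, the estimates are averaged per item, and the averages are summed. NAEP's actual procedure is more involved and reports cut scores on the NAEP scale, so the function name, the ratings, and the four-item test are all hypothetical.

```python
from statistics import mean

def angoff_cut_score(ratings):
    """Simplified raw-score Angoff cut.

    ratings[p][i] is panelist p's estimated probability that a borderline
    student answers item i correctly. The cut score is the sum, over items,
    of the panel's average estimate.
    """
    n_items = len(ratings[0])
    item_means = [mean(panelist[i] for panelist in ratings) for i in range(n_items)]
    return sum(item_means)

# Hypothetical ratings from two independent groups on the same four items
group_a = [[0.6, 0.4, 0.8, 0.5],
           [0.7, 0.5, 0.7, 0.4]]
group_b = [[0.6, 0.5, 0.7, 0.5],
           [0.5, 0.4, 0.8, 0.5]]

cut_a = angoff_cut_score(group_a)
cut_b = angoff_cut_score(group_b)

# A small difference between the two groups' cuts suggests consistent ratings
difference = abs(cut_a - cut_b)
print(round(cut_a, 2), round(cut_b, 2), round(difference, 2))
```

In this made-up example the two groups land within a fraction of a raw-score point of each other, which is the kind of agreement the consistency check is looking for.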

While the NAS report contends that the NAEP achievement level setting process is fundamentally flawed, it fails to offer a proven alternative. The report does cite its own model as a better choice for determining achievement levels, but its model is untried (Bourque and Byrd 2000, Pellegrino, Jones, and Mitchell 1999). However, the report does point out that looking at changes in achievement levels over time is a useful way of measuring change in student achievement (Pellegrino, Jones, and Mitchell 1999).

Although NAGB has not implemented every recommendation from NAS and other evaluations, it has implemented numerous changes to the achievement level setting process and studied various alternatives to the current system (Bourque and Byrd 2000, Hambleton, et al. 2000). It has now switched to a process similar to the Angoff Method called Mapmarking, which it believes is just as reliable, is easier to implement, and requires less time for computing cut scores while providing results similar to those of the Angoff Method (NAGB 2008b). Congress has mandated that every NAEP report released should clearly state that the NAEP achievement levels are to be used on a developmental basis until they are proven to be valid and reliable. Also, Congress warns that any inferences based on the achievement levels should be made with caution (Yeager 2007). As of May 2008, however, an evaluation to assess validity and reliability has yet to be conducted.

NAEP achievement levels are set consistently, logically, and scientifically. There is little evidence that they are “fundamentally flawed,” but they do appear to be high.

Are NAEP achievement levels unrealistic?

NAEP is generally considered the gold standard of student assessments. But it is not immune to criticisms, which have become more persistent as some advocate for NAEP to become the basis of national standards. Moreover, those advocating for national standards point to large discrepancies between states and NAEP when identifying students as proficient (i.e., cases where a student may score proficient on the state test but below basic on NAEP).

It appears that NAEP's proficient level is a good goal but not easy to achieve. As studies have shown, not even the highest performing nations would get 100 percent of their students above NAEP's proficient level. Most would not even get half. And within the United States, many high school seniors, who by other measures are high-performing, do not score above NAEP's proficiency bar.

However, NAEP's proficiency bar is not set with the expectation that all students will be able to clear it. Proficiency is set by teachers, other educators, and subject-matter professionals who use their knowledge and expertise to set an aspirational goal for what they believe students should know and be able to do at each grade level. In contrast, state assessments are mandated by NCLB to set the proficiency bar at the level all students are expected to clear at their grade level.

When establishing cut scores on their state assessments, policymakers need to weigh what makes a good goal for educational purposes against what is appropriate for accountability. Many experts suggest that NAEP's basic level is a better gauge for the latter.

What should you do?

In this guide we have helped educators, parents, board members, and other policymakers to understand what NAEP is actually measuring, how states compare to NAEP, and why results may differ. We have also explained how NAEP fits into the national standards debate so you can be more effective advocates for schoolchildren across the country. Below are some specific things you can do to better understand the issues surrounding state and NAEP assessments.

  • Ask questions if there’s a large gap between NAEP scores and state scores, but be careful about the conclusions you draw. The size of the gap can be misleading. Large proficiency gaps between NAEP and state assessments can be a red flag, indicating that your state’s standards aren’t providing the knowledge and skills students need for success. But if your state has a slightly smaller gap than another state, that doesn’t mean it necessarily has more rigorous standards.

  • Review assessment questions NAEP releases that exemplify each achievement level (Barth 2006). If you think these are questions your own child should be able to answer, then they are probably at the level all students in your district would be expected to reach.

  • Compare the percentage of students rated proficient on your state's assessment to the percentage at NAEP’s proficient level (Table 1 and Table 2). Then compare it to the percentage at NAEP’s basic level (Table 3 and Table 4) as well. Determine if the gap in your state is larger than the gap in other states.

  • Examine whether the percentage of students reaching each level of achievement has changed over time.

  • Examine the differences in cut scores between your state and others (see Figures 1, 2, 3, and 4). Data show that if state cut scores for proficiency were placed on a NAEP score scale, they would vary anywhere from higher than NAEP's proficiency cut score to within NAEP's below basic level. In some cases, state cutoff scores vary as much as 90 points. Again, don’t make snap judgments; be careful to only treat differences as an indication that more investigation is needed.

  • Encourage your state to analyze whether your state’s standards and curriculum are misaligned with NAEP frameworks. If so, examine whether or not your standards are addressed at a different grade level or not addressed at all. District leaders should also examine their curriculum alignment with NAEP's frameworks.

The steps above can help you determine for yourself whether or not students in your state and district are learning the knowledge and skills necessary to succeed throughout school and beyond.


1 Full description of the score linking process is in Section 1 of National Center for Education Statistics (2007). Mapping 2005 State Proficiency Standards Onto the NAEP Scales (NCES 2007-482). U.S. Department of Education. Retrieved on December 20, 2007, from http://nces.ed.gov/nationsreportcard/pdf/studies/2007482.pdf

2 A panel of testing experts and education researchers formed by the American Institutes for Research (AIR) in 1995 through a contract with NCES to provide technical review of NAEP plans and products and to identify technical concerns.

3 Full description of the score linking process is in Scott, L. A., Ingels, S. J., and Owings, J. A. (2007). Interpreting 12th-Graders' NAEP-Scaled Mathematics Performance and Postsecondary Outcomes From the National Education Longitudinal Study of 1988 (NELS:88) (NCES 2007-328). National Center for Education Statistics, Institute of Education Sciences, U.S. Department of Education. Retrieved on October 17, 2007, from http://nces.ed.gov/pubs2007/2007328.pdf

4 Full description of the score linking process is in Phillips, G. W. (2007). Linking NAEP Achievement Levels to TIMSS. American Institutes for Research. Retrieved on November 11, 2007, from http://www.air.org/news/documents/naep-timss.pdf

5 Below Basic is not an official NAEP achievement level, so there is no description of what students should know and be able to do at this level. It is simply the percent of students who did not reach the Basic achievement level.

This guide was written by Jim Hull, Policy Analyst, Center for Public Education

Special thanks to Susan Loomis, Assistant Director for Psychometrics, National Assessment Governing Board (NAGB); Ray Fields, Assistant Director for Policy and Research, NAGB; and Donald Rock, Educational Researcher, ETS for their insightful feedback and suggestions. However, the opinions and any errors found within the paper are solely those of the author.

Posted: June 17, 2008

© 2008 Center for Public Education
