

The proficiency debate: At a glance

If you handed in the same assignment to two different teachers and received an "A" from one and a "D" from the other, wouldn’t you have some questions?

Although the difference isn't always that dramatic, the discrepancy between the "grades" reported by the National Assessment of Educational Progress (NAEP), a nationwide test often called The Nation's Report Card, and those reported by states' own assessments is prompting many people to ask a few questions of their own.

Recently, numerous reports have highlighted the fact that most state assessments find more students "proficient" than the NAEP assessment does. Sometimes the gap can be substantial: on average, the percentage of fourth-graders deemed proficient in reading differs by nearly 40 percentage points between state assessments and NAEP (see Table 1).

Such gaps immediately suggest two possibilities:

  • Either NAEP standards are unrealistically high, or
  • State standards are woefully low.

Those who believe the latter have called for NAEP scores to be displayed next to state scores or for NAEP to be used as a national standard. Those who lean toward the former have increased their criticism of NAEP, often citing a National Academy of Sciences (NAS) study that called NAEP's standard for proficiency “fundamentally flawed.”

So is NAEP unrealistic? Unfortunately, the answer isn't that simple. NAEP and state assessments have different objectives and can measure different things. If you are facing a gap between your state's scores and NAEP scores, it's important to know how NAEP is developed, what its purpose is, and how "proficient" is defined. This At a glance briefly walks you through how NAEP scores are determined and helps you uncover what it means if proficiency rates in your state differ dramatically from NAEP's. For a fuller discussion, see The proficiency debate: A guide to NAEP achievement levels.


Takeaway points

  • NAEP standards are high. For instance, not even the highest-performing countries, such as Singapore, would have 100 percent of their students reach NAEP’s “basic” level.
  • States vary widely in how they define proficiency. If state cut scores for proficiency were placed on a NAEP score scale, they would range from above NAEP's proficiency cut score to within NAEP's "below basic" level. In some cases, state cut scores differ by as much as 90 points.
  • NAEP is developed scientifically, logically, and consistently. There is little evidence that the test is “fundamentally flawed,” as some critics charge, but there is evidence that its standards are set high.
  • NAEP and No Child Left Behind define "proficient" differently. According to NAEP, "proficient does not refer to 'at grade' performance." Rather, proficient is "the overall achievement goal for American students," but NAEP acknowledges that "the average performance score on NAEP in most subjects falls within the Basic achievement level" (Loomis and Bourque 2001). The U.S. Department of Education, on the other hand, is very explicit in stating that for NCLB purposes, "the Proficient achievement level [for state assessments] represents attainment of grade-level expectations for that academic content area" (U.S. Department of Education 2007). Simply put, NAEP's standard for proficiency is set at a level we want every student to reach, while states set their standard for proficiency at a level we expect every student to reach. Some have suggested that NAEP's "basic" standard is a better comparison for state "proficient" percentages.

How big is the discrepancy?

  • On average across the country, states' fourth grade reading proficiency rates are almost 40 percentage points higher than the rates NAEP reports for them. In eighth grade math, the gap is somewhat smaller at 26 points (see Table 1).
  • An NCES report found that in eighth grade mathematics (see Figure 2), Missouri, South Carolina, and Massachusetts were the only states to set their proficiency cutoff scores at levels similar to NAEP's (NCES 2007).
 
  • Not one state's fourth or eighth grade cutoff score for reading proficiency was as high as NAEP's (NCES 2007) (see Figures 3 and 4). As a matter of fact, fourth grade students in 22 states can be deemed proficient in reading on their state assessment, yet be characterized as below basic in terms of NAEP.
 
 

How to Read Figures 1, 3, and 4

The circles in the graphs represent where each state's cutoff score for proficiency would be if placed on NAEP's scoring scale. Because placing state scores on the NAEP scale is a statistical process, there is a certain amount of error. The lines extending from the circles represent the range of NAEP scores within which each state's cutoff score for proficiency would likely fall, with the circle at the center of that range.

Placing each state's cutoff score for proficiency on the NAEP scale provides a way of comparing the rigor of state performance standards to NAEP's. States whose cutoff scores map to higher NAEP scores likely have more rigorous performance standards for proficiency on their state assessments. For example, Figure 1 shows that 4th grade students in Massachusetts who score at or above proficient on their state math assessment would also score at or above NAEP's proficient achievement level, while students in Tennessee may score proficient on their state assessment but below basic on NAEP. The graph shows that Massachusetts has a higher performance standard for proficiency than Tennessee.
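To get a feel for the idea behind such mappings, consider a simplified, equipercentile-style illustration: if a given percentage of a state's students are reported proficient on the state test, the implied cut score on the NAEP scale is the score that the same percentage of those students reach or exceed on NAEP. The Python sketch below uses simulated data and a hypothetical implied_naep_cut helper to show only this basic concept; it is not the NCES mapping procedure, which also quantifies the statistical error noted above.

    # Simplified, equipercentile-style illustration (hypothetical data).
    # Not the NCES procedure; it only shows the basic mapping concept.
    import numpy as np

    def implied_naep_cut(naep_scores, pct_proficient_on_state_test):
        """Return the NAEP score implied by a state's percent-proficient.

        naep_scores: NAEP scale scores for a sample of the state's students.
        pct_proficient_on_state_test: percent of the same students deemed
            proficient on the state assessment.
        """
        # If P percent are proficient on the state test, the implied cut is
        # the NAEP score below which (100 - P) percent of students fall.
        return np.percentile(naep_scores, 100.0 - pct_proficient_on_state_test)

    # Hypothetical example: simulated NAEP scores, state reports 75% proficient
    rng = np.random.default_rng(0)
    sample = rng.normal(loc=238, scale=32, size=10_000)  # made-up distribution
    print(round(implied_naep_cut(sample, 75)))  # implied cut near the 25th percentile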


Why is there a discrepancy?

Just because NAEP and states label a level of achievement “proficient” does not mean they are defining proficiency the same way. Nor are they necessarily testing the same knowledge and skills.

National Assessment Governing Board
National Assessment Governing Board is an independent, bipartisan group whose members include governors, state legislators, local and state school officials, educators, business representatives, and members of the general public. Congress created the 26-member Governing Board in 1988 to set policy for the National Assessment of Educational Progress (NAEP), commonly known as "The Nation's Report Card."

  • For example, the National Assessment Governing Board (NAGB), the board that oversees NAEP policies, states that "In particular, it is important to understand clearly that the Proficient achievement level does not refer to 'at grade' performance" (Loomis and Bourque 2001).
  • Unlike NAEP, NCLB requires states to define proficiency in terms of being on grade level. A report developed by the U.S. Department of Education specifically states that "The proficient achievement level [for NCLB] represents attainment of grade-level expectations for that academic content area" (U.S. Department of Education 2007).
  • A report commissioned by the NAEP Validity Studies Panel [1] found that the most appropriate NAEP benchmark for gauging state assessment results is the percentage of students at or above the basic achievement level (Mosquin and Chromy 2004) (see Table 3 and Table 4). Even U.S. Secretary of Education Margaret Spellings urged reporters to compare state proficiency rates to NAEP's basic level (Dillon 2005). Moreover, James Pellegrino, lead author of the National Academy of Sciences' 1999 evaluation of NAEP, Grading the Nation's Report Card, suggested that proficiency for accountability purposes perhaps lies somewhere between NAEP's basic and proficient levels, because NAEP's proficiency level was not developed with accountability in mind (Pellegrino 2007).
  • What NAEP and states assess is not always aligned. For example, almost half (49 percent) of NAEP's eighth grade science standards are not covered in TAKS, the Texas state assessment (Timms, et al. 2007b). In Louisiana, where there was little difference in results (see Table 1), only seven percent of NAEP's science content was not addressed by the state assessment (Timms, et al. 2007a).

Are NAEP achievement levels set too high?

Even if the tests are measuring different things, does the large gap mean that NAEP achievement levels are just set too high? Or, conversely, are state standards too low?

  • One report found that:
    • Just over half (56 percent) of “A” students in math scored at or above the NAEP proficient level, while just 20 percent of “B” students did so (Scott, Ingels, and Owings 2007).
    • About two-thirds (68 percent) of seniors who had completed calculus reached the proficient level on NAEP, 13 percent of whom scored at the advanced level (Scott, Ingels, and Owings 2007).
  • Only 21 percent of seniors who had scored between 21 and 25 on the ACT mathematics assessment performed at NAEP’s proficient level (Scott, Ingels, and Owings 2007). According to ACT, a score of 22 on the math portion of its college admissions test indicates that a student is ready for college algebra (Allen and Sconing 2005).
  • No country, not even those performing highest on international assessments, would have 100 percent of its students reach NAEP’s proficiency level (Phillips 2007). Furthermore, no country would have 100 percent of its students reach NAEP’s basic level (Phillips 2007).

While the data do suggest that NAEP standards are set high, the important question is this: "Are they too high?" Even that question is misleading, because setting achievement levels is ultimately a matter of expert judgment. Remember that NAEP achievement levels come from the collective judgment of experts, including teachers and other educators, who have determined the knowledge and skills students should have at grades four, eight, and twelve.

Is NAEP constructed well?

Knowing how NAEP assessments are created, what they are designed for, and how they should be used sheds additional light on the question of whether or not the levels are too high.

About NAEP 
NAEP, also known as The Nation's Report Card, is the only nationwide test that can compare student achievement across states. Although state participation was originally voluntary, No Child Left Behind now requires states to participate in the fourth and eighth grade NAEP assessments in reading and math [3] as a condition of receiving Title I funds.

The first NAEP assessments date back to 1969 and were designed to give educators information on how well U.S. students were performing nationally over time. It wasn't until 1990 that State NAEP, also known as Main NAEP, was created to enable comparisons across states. This version of NAEP introduced criterion-based achievement levels (basic, proficient, and advanced) (NAGB 2005).

  • For each subject, an assessment framework describes what students should know and be able to do at grades four, eight, and twelve (NAGB 2008). Keep in mind, NAGB stresses that these frameworks are not a national curriculum, but only an outline of what NAEP should test (NAGB 2008).
  • Each subject's framework is developed through a comprehensive process involving teachers, curriculum specialists, policymakers, business representatives, and members representing the general public (NAGB 2008).
  • Once the frameworks have been developed and after students have taken the assessments, panels of teachers, curriculum specialists, and members of the public with knowledge of the subject area are selected to determine the cutoff scores that divide scores into four achievement levels (below basic [2], basic, proficient, and advanced).
  • Although the achievement level setting process is based on the judgment of teachers, curriculum specialists, and other experts, the process is scientific, with each aspect documented and replicable (Hambleton, et al. 2000, Pellegrino 2007). (A simplified sketch of how such judgments become cut scores follows this list.)
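For a sense of how expert judgments are turned into cut scores, one widely used standard-setting approach, the modified Angoff procedure that NAEP's level-setting panels have historically drawn on, asks each panelist to estimate the probability that a borderline student would answer each item correctly and then aggregates those estimates. The Python sketch below uses made-up ratings and a hypothetical angoff_cut_score function; it shows only the basic arithmetic, not NAEP's actual process.

    # Basic arithmetic of an Angoff-style standard setting (hypothetical numbers).
    # This is NOT NAEP's actual procedure; it only illustrates how probability
    # judgments about a "borderline" student can become a cut score.

    def angoff_cut_score(panelist_ratings):
        """panelist_ratings: one list per panelist, each holding the estimated
        probability that a borderline student answers each item correctly."""
        # Each panelist's implied cut score is the expected number of items correct.
        per_panelist = [sum(ratings) for ratings in panelist_ratings]
        # The panel's cut score is typically an average across panelists.
        return sum(per_panelist) / len(per_panelist)

    # Two hypothetical panelists rating a five-item block
    ratings = [
        [0.9, 0.7, 0.6, 0.4, 0.3],  # panelist A
        [0.8, 0.8, 0.5, 0.5, 0.2],  # panelist B
    ]
    print(angoff_cut_score(ratings))  # cut score on the raw (number-correct) scale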

Are the achievement levels “fundamentally flawed”?

The most serious charge against NAEP achievement levels came from the 1999 National Academy of Sciences (NAS) evaluation, Grading the Nation's Report Card, which called them "fundamentally flawed." Other analysts disagreed with the NAS findings. NAS's principal arguments and the counter-arguments were these:

NAS Argument #1: Results are not believable because too few students were judged to be at the advanced level compared to Advanced Placement course work (Pellegrino, Jones, and Mitchell 1999). The NAS report claims that the small percentage of students reaching the advanced level on the NAEP science assessment does not correspond to other measures of advanced achievement, such as the Advanced Placement test.

Examining the argument: Just 1.93 percent of the 1996 grade 12 cohort scored a 3 or higher, the minimum score colleges typically accept for credit, on any AP science test (Hambleton, et al. 2000). In comparison, three percent of twelfth graders scored at the advanced level on the 1996 NAEP science assessment (Grigg, Lauko and Brockway 2006).

NAS Argument #2: Panelists typically rated open-ended questions (also called "constructed-response" questions) at a higher difficulty level than multiple-choice questions (Linn 1998, Pellegrino, Jones, and Mitchell 1999). The NAS report finds fault with the way NAEP gives more weight to open-ended questions, such as short-answer questions, than it does to multiple-choice questions when determining cut scores. The NAS report implies that since these questions measure the same content, they should count the same. Because NAEP instead treats open-ended questions as more difficult to answer, NAS argues, NAEP winds up with lower scores and unreliable achievement levels.

Examining the argument: Many educators argue that too many standardized tests don’t truly assess students’ abilities because they rely too heavily on multiple-choice questions. They argue that answering open-ended questions successfully requires higher-order thinking skills. So it’s possible that open-ended questions do measure a higher level of achievement than multiple-choice questions (Brennan 1998, Hambleton et al. 2000).

NAS Argument #3: The NAEP review panelists have difficulty estimating whether a borderline student will answer a question correctly (Pellegrino, Jones, and Mitchell 1999). The NAS report argues that the panelists who set NAEP achievement levels cannot provide an accurate probability that a student at the borderline between proficient and advanced, for example, will answer a specific item correctly. These judgments greatly affect the cut scores for NAEP's achievement levels, and the NAS report contends they are too difficult and confusing for panelists to make.

Examining the argument: NAEP panelists consistently indicated that they understood the achievement level setting process (Hambleton, et al. 2000). NAEP checks that panelists are consistent in developing cut scores by having two sets of panelists in each grade work independently. Comparisons of their results show that the panels arrive at similar cut scores, suggesting that they are not producing erratic or inconsistent results (Hambleton, et al. 2000).

NAEP achievement levels are set consistently, logically, and scientifically. There is little evidence that they are “fundamentally flawed,” but they do appear to be high. 

Are NAEP achievement levels unrealistic?

  • It appears that NAEP's proficient level is a good goal but not easy to achieve. As studies have shown, not even the highest performing nations would get 100 percent of their students above NAEP's proficient level. Most would not even get half. And within the United States, many high school seniors, who by other measures are high-performing, do not score above NAEP's proficiency bar.
  • However, NAEP's proficiency bar is not set with the expectation that all students will be able to clear it. Proficiency is set by teachers, other educators, and subject-matter professionals who use their knowledge and expertise to set an aspirational goal for what they believe students should know and be able to do at each grade level. In contrast, NCLB mandates that state assessments set the proficiency bar at the level all students are expected to clear at their grade level.
  • When establishing cut scores on their state assessments, policymakers need to weigh what makes a good educational goal against what is appropriate for accountability. Many experts suggest that NAEP's basic level is the better gauge for the latter.

Just remember: how proficiency is defined depends on how proficiency is used. Considering the different purposes of NAEP and state assessments, it is not surprising that each produces different results. What is most concerning is the wide variation in how states define proficiency.

What should you do?

There are several specific things you can do to better understand the issues surrounding state and NAEP assessments.

  • Ask questions if there’s a large gap between NAEP scores and state scores, but be careful about the conclusions you draw.
  • Encourage your state to analyze whether its standards and curriculum are aligned with NAEP frameworks. District leaders should also examine how their curriculum aligns with NAEP's frameworks.
  • Compare the percentage of students at proficient in your state to the percentage at NAEP's proficient level (see Table 1 and Table 2). Then compare it to NAEP's basic level (see Table 3 and Table 4). Determine whether the gap in your state is larger than the gap in other states. (A simple worked comparison follows this list.)
  • Examine if there has been a change in the percentage of students reaching each level of achievement. Movement from “below basic” to “basic” is a positive change, but the gains may not show up when looking only at the percentage of students who reach “proficient.”
  • Look at the differences in cut scores between your state and others (Mapping state cut scores against NAEP). Data show that state cut scores range from above NAEP's proficiency cut score to within NAEP's below basic level. Again, don't make snap judgments; treat differences only as an indication that more investigation is needed.
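As a simple worked comparison, the sketch below uses hypothetical percentages (not figures from the tables cited above) to compute the gaps described in the list and to show why tracking every achievement level matters.

    # Hypothetical achievement-level percentages; substitute your state's figures.
    state_test = {"below_basic": 20, "basic": 35, "proficient": 35, "advanced": 10}
    naep       = {"below_basic": 35, "basic": 35, "proficient": 25, "advanced": 5}

    state_proficient_or_above = state_test["proficient"] + state_test["advanced"]
    naep_proficient_or_above = naep["proficient"] + naep["advanced"]
    naep_basic_or_above = 100 - naep["below_basic"]

    print("Gap vs. NAEP proficient:", state_proficient_or_above - naep_proficient_or_above)
    print("Gap vs. NAEP basic:", state_proficient_or_above - naep_basic_or_above)

    # A drop in "below basic" is real progress even if "proficient or above" barely
    # moves, so track the share at each achievement level, not just the proficient cut.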

Footnotes

1. A panel of testing experts and education researchers formed by the American Institutes for Research (AIR) in 1995 through a contract with NCES to provide technical review of NAEP plans and products and to identify technical concerns.
2. Below basic is not an official NAEP achievement level, so there is no description of what students at that level should know and be able to do. It simply represents the percentage of students who do not reach NAEP's basic achievement level.
3. A full description of the score linking process is in Scott, L. A., Ingels, S. J., and Owings, J. A. (2007). Interpreting 12th-Graders' NAEP-Scaled Mathematics Performance and Postsecondary Outcomes From the National Education Longitudinal Study of 1988 (NELS:88) (NCES 2007-328). National Center for Education Statistics, Institute of Education Sciences, U.S. Department of Education. Retrieved October 17, 2007, from http://nces.ed.gov/pubs2007/2007328.pdf


This guide was written by Jim Hull, Policy Analyst, Center for Public Education

Special thanks to Susan Loomis, Assistant Director for Psychometrics, National Assessment Governing Board (NAGB); Ray Fields, Assistant Director for Policy and Research, NAGB; and Donald Rock, Educational Researcher, ETS, for their insightful feedback and suggestions. However, the opinions and any errors found within the paper are solely those of the author.

Posted: June 17, 2008

© 2008 Center for Public Education 
