Learn About: Evaluating Performance | Common Core


High-stakes testing and effects on instruction: Research review

Because tests are a critical reporting mechanism for the Adequate Yearly Progress (AYP) measurements mandated by the No Child Left Behind Act of 2001, this new century brings with it a vastly intensified focus on assessment in U.S. public education. Testing has provoked a spate of articles and books passionately defending and opposing assessment and its effects on teaching and learning. Formative assessments, otherwise known as classroom assessments, generally escape criticism; indeed they are actively encouraged as an integral part of teaching (Educational Leadership, 2005; McIntire, 2005a). High-stakes tests, on the other hand, provoke most of the controversy. In an effort to uncover the issues associated with high-stakes tests, this research review will focus solely on this type of test and its effects on instruction.

Although there are many articles on high-stakes testing, only a few qualified for our consideration because most did not report empirical research. As is often the case with research on educational topics, the research on the responses to high-stakes tests needs to be approached with judgment and caution. Above all it needs to be approached with an open mind. Research does not give us the definitive answers we seek; rather, it provides us with tools to arrive at our own conclusions. Keeping this in mind, we have selected research that is rigorous about answering the questions below.

Research On The Effects Of Testing

Plenty has been written about the negative effects of high-stakes tests on instruction, but little is based on hard evidence. Although rigorous studies are emerging showing that the effects of high-stakes testing can be beneficial if the right conditions are met, we need to recognize a problem presented by much of the current literature: most is opinion that does not report empirical research. This problem is well documented by Sandra Cimbricz, who set out to examine the relationship between state-mandated testing and teachers’ beliefs and practice. She writes, "Most of the professional literature I was able to locate was theoretical rather than empirical…. The exclusion of 'non-empirical' works (e.g., essays, anecdotal reports, testimonials) reduced an extensive list of citations to a small body of work." Cimbricz then selected studies that met professional standards for qualitative and quantitative research, which, she found, reduced the eligible studies to "a handful" (Cimbricz, 2004).

Research on the relationship of assessment to teaching has grown since 2002, but it remains outweighed by essays, anecdotal reports, testimonials, and protests appearing in educational publications. This literature divides quite sharply between support for and arguments against high-stakes testing. Many school board members and educators will be familiar with the literature describing the deleterious effects of testing and the authors who write it: George Madaus, Alfie Kohn, Monty Neill, Linda McNeil, Deborah Meier, Walt Haney, Harold Berlak, W. James Popham, Susan Harman, and Susan Ohanian.

According to Stuart Yeh (2005), critics of high-stakes testing generally report four negative classroom effects produced by testing.

  • Narrowing the curriculum by excluding from it subject matter not tested.
  • Excluding topics either not tested or not likely to appear on the test even within tested subjects.
  • Reducing learning to the memorization of facts easily recalled for multiple-choice testing.
  • Devoting too much classroom time to test preparation rather than learning.

For the most part, these effects have not been subjected to the rigors of research. Instead they have been reported anecdotally or have been forecast to be the inevitable bad effects of high-stakes testing. Pointing out that large assumptions have been made from tiny studies about the negative effects of testing, Gregory J. Cizek says, "Evidence appears to have become even more skimpy in support of conclusions that seem even more confident" (Cizek, 2001).

A new large-scale survey of education officials in 50 states and 299 representative districts, however, begins to provide some insight into the magnitude of the effects of testing. Responding to questions about the impact of the No Child Left Behind Act, a large majority of districts (71 percent) reported that NCLB's testing requirements have led them to increase curricular time spent on reading and math for students at risk of failing, and to decrease time for other subjects. Districts were divided, though, about whether this was a negative or positive effect. Some thought that this narrowing was "shortchanging students from learning important subjects" while others saw it as "necessary to help low-achieving students catch up" (Center on Education Policy, 2006).

The literature we consulted for this question looks at the effects of high-stakes testing from two vantage points.

  • How does high-stakes testing affect classroom instruction now? Several testing experts have published cautions about the possible negative effects of high-stakes testing and many articles cite anecdotal evidence to describe how this is currently happening in classrooms. But there is little research about the extent of such practices, even though it's clear that they occur at least sometimes. We looked at public opinion polls, teacher surveys, and a large-scale survey of state and district education officials to try to understand how testing is affecting classrooms and students and how widespread the effects are. These surveys presented a mixed picture of drawbacks and benefits as reported by both the public and educators.
  • What is the effect of classroom instruction on high-stakes test scores? The emerging research is more rigorous about documenting practices that produce better test results than it is at describing the extent of "teaching to the test" in schools and classrooms. This new research shows that teaching a curriculum aligned to state standards and using test data as feedback produces higher test scores than an instructional emphasis on memorization and test-taking skills.

The General Public's Support Of High-Stakes Testing

While a majority of the public supports testing and accountability, it is also worried that an emphasis on testing could go too far. A major concern is reliance on a single test for making high-stakes decisions. According to public opinion polls, the general public cautiously supports testing, believing that it has a role in accountability, but that it should not affect teaching. Indeed, confidence that it will not cause negative teaching to the test may be growing.


Research into attitudes toward assessment is conducted by surveys, either of sample populations or large-scale. In 2003, Public Agenda, a nonpartisan opinion research and civic engagement organization, published a report by Jean Johnson and Ann Duffett. The report assembled results from several surveys of varying extent and concluded that most people in the United States support testing. According to the report, 71 percent of parents, teachers, and the general public said they support annual mandatory testing as a check on the performance of schools. Slightly fewer (67 percent) would like to see students in 3rd through 8th grade tested annually to track their academic progress. Fifty-four percent of parents and 58 percent of teachers would like high school students to pass a basic skills test in reading, writing, and mathematics before graduating.

Similar results were found in Phi Delta Kappan’s 2005 annual poll, authored by Lowell C. Rose and Alec M. Gallup. In this poll, 57 percent of the general public support the amount of achievement testing in schools, with 40 percent saying the amount of testing is about right and 17 percent believing there is not enough testing. A larger percentage (67 percent) supports a proposal to extend testing for NCLB from its current standard of testing once in high school to testing yearly in grades 9-11.

However, both Public Agenda and Phi Delta Kappan (PDK) found concern over the amount of testing and its effect on teaching. In the polls summarized by Public Agenda, respondents worried that testing can go too far, therefore having negative rather than positive consequences. Dependence on a single test was the major source of worry. Respondents believed that standardized test scores should be accompanied by teacher evaluations for high-stakes decisions, such as graduation. Two-thirds of them also believed that a single test would not provide a fair picture about a school's need for improvement.

In the PDK poll, Rose and Gallup asked two questions about testing's effect.

  • Will the current emphasis on standardized tests encourage teachers to teach to the test?
  • If the current emphasis on results is encouraging teachers to teach to the tests, do you think this will be a good thing or a bad thing?

A majority of respondents replied that emphasizing tests will encourage teaching to the test and this would be a bad thing; however, in each case the percentage had declined from 2003. In that year's poll, 66 percent said that tests would provoke teaching to the test, but 58 percent thought so in 2005 (see Figure 1). In 2003, 60 percent thought this would be a bad thing, but 54 percent said so in 2005. Administrators of this poll did not comment on the trend, but perhaps it may be attributed to increasing familiarity with testing.

Attitudes of Teachers and Counselors Toward Testing

Teachers report complicated and sometimes contradictory views of how high-stakes tests affect instruction. While a large majority of teachers believe that testing will have a negative impact on instruction, an equally large majority said that it has not affected their own teaching.

Sandra Cimbricz found in her 2002 literature review that state-mandated testing influences teachers' beliefs and practice, but how and how much was not clear. Other factors such as teachers' knowledge of their subject matter, their views of teaching and learning, and the context in which teachers worked affected their perception of how much state-mandated testing influenced their beliefs and practice. Cimbricz concluded that more research is needed to tease out the influence of these other factors.

Pamela J. Carter conducted a small Tennessee study in 2003 that found conflicting attitudes among teachers toward testing. In this case, Carter refers to the Tennessee Comprehensive Assessment Program (TCAP), which is used for both state and NCLB accountability. The study included 96 highly effective teachers from the Hamilton County school district. Carter found that while 47 percent of the respondents thought the TCAP was sound and useful for measuring growth and achievement, 40 percent thought that the test was not a sound assessment device. Twenty-nine percent thought that it should not be the only measure of success.

A similarly complicated picture emerged from survey data collected by Public Agenda. Conducted by Jean Johnson and Ann Duffett, the 2003 survey found that although teachers thought too much emphasis was placed on testing, they exhibited contradictory attitudes about the influence of testing on teaching. On the one hand, 79 percent of teachers thought their colleagues would teach to the test rather than ensure that real learning takes place. On the other hand, they reported they would not do so themselves. A large majority—73 percent—said their teaching is not distorted by testing pressures. Only 26 percent said they spend so much time on test preparation that learning is neglected (see Figure 2).



There was no such contradiction in school counselors' negative response to high-stakes testing, at least not in North Carolina. Duane Brown and colleagues surveyed counselors in the state and published their findings in 2004. Eighty percent of the respondents said they act as test coordinators and that this role reduces the time available to provide services to and work with students, teachers, and administrators. Despite their unhappiness with the time required by North Carolina's high-stakes tests, the counselors perceived that testing had some positive effects on student attitudes and achievement, such as encouragement to achieve and pursue a consistent course of study. But by far, the majority of these counselors thought their state's high-stakes testing increased stress in teachers, students, and themselves.

Attitudes of Students Toward Testing

Gregory Cizek reviewed the literature on testing and observed that anecdotes abound "describing how testing…produces gripping anxiety in even the brightest students and makes young children vomit, or cry, or both" (Cizek, 2001). However, one empirical study conducted in Arkansas in 2001 by Sean Mulvenon and colleagues provides evidence that such reactions are the exception.

Mulvenon's study combined results from two specific tests with student survey data on attitudes and perceptions of standardized testing. The two tests used included an off-the-shelf norm-referenced test administered to students in the spring of 4th grade and the state's criterion-referenced test given to the same students as 5th graders the following fall. The state criterion-referenced exam was used as part of the Arkansas accountability system.

All 5th grade students from an Arkansas school district consisting of 10 elementary schools were asked to participate, and 283 did so. The questionnaire given to the students contained 24 questions designed to measure such constructs as test anxiety, pressure, and the students' perception of their ability to read and do mathematics. The researchers performed statistical analyses correlating the students’ scores with answers on the questionnaires.

Anxiety, school climate, pressure from teachers and parents, and school rewards for good scores were not significantly related to performance on standardized tests. Mulvenon and his colleagues concluded that while some students experienced anxiety and pressure, most students experienced "little or no negative effects from testing" (Mulvenon, et al., 2001).

This study was small and concerned children in a single grade in one school district, but the researchers obtained information directly from the students, a strategy that has not been widely adopted in research about the effects of testing.

High-Stakes Tests as Educational Policy

A major, widely reported study concluded that high-stakes testing is "a failed policy initiative" (Amrein & Berliner, 2001). But subsequent researchers are finding the opposite: that accountability measures linked to test scores improve student performance.

Although the general public and national and state legislators support high-stakes testing as policy directed toward improving student achievement, educational experts frequently do not.

In numerous articles, experts in psychometrics (the study, design, and administration of tests) repudiate the value of tests when they become policy. In his 1999 report, Richard Phelps writes, "The most curious aspect of this debate is the special animus that many testing 'experts' hold for tests." For example, psychometrician W. James Popham writes criticisms of tests in various educational journals (2001, 2003a, 2003b, 2004) and tells parents, "Educational tests are much less accurate than most parents believe" (Popham, 2003a). Similarly, in a widely quoted passage, noted researcher Robert Linn writes, "Assessment systems that are useful monitors [of student performance] lose much of their dependability and credibility…when high-stakes are attached to them" (Linn, 2000).

There is no doubt—and on this experts agree, whether they oppose or support large-scale high-stakes testing—that accountability systems and the tests on which they depend are in their infancy and will need a great deal of refinement as they develop (McAdams, 2002).

In order to establish the validity of this educational policy—one that depends on high-stakes tests to demonstrate improved student achievement—researchers use social science research techniques to look at the effects of testing. Comparative studies are often used to identify relationships between different variables, and to make generalizations based on the analysis. Researchers caution—and readers should keep in mind—that showing a statistical relationship is not the same as proving a cause and effect. Nonetheless, generalizations can be made with some degree of confidence when the methodology is sound.

In their 2001 study, Audrey Amrein and David Berliner used comparative data to answer the question: Does higher achievement on high-stakes state tests transfer to other tests? If learning does transfer, the authors argue, then educational policy based on the implementation of high-stakes tests is justified.

They compared state test results in 18 states with those states' scores on national tests (i.e., NAEP, SAT, ACT, AP), which have few or no stakes for schools. The assumption was that scores on the national tests would rise if learning occurred. Their statistical analysis seemed to show that in most cases there were no gains on the national tests after the implementation of high-stakes state tests. (One case was an exception. On NAEP reading, the student cohort followed from 4th to 8th grade was observed to have made gains in nine of the 18 states studied. Amrein and Berliner speculate that this anomaly may have been related to an instructional emphasis on reading found across the country from 1994 to 1998.)

As a result of their analysis, Amrein and Berliner concluded that high-stakes testing "is a failed policy initiative." Because of their prestige as educational experts, their results were widely reported in the national press.

But research is open to technical criticism by other researchers. In 2003, Margaret Raymond and Eric Hanushek looked at Amrein and Berliner's original study and a later one that expanded the number of states using high-stakes testing to 28. After examining Amrein and Berliner's data and methodology, Raymond and Hanushek refuted Amrein and Berliner's claim, declaring, "Our results are astonishing: If basic statistical techniques are applied to [Amrein and Berliner's] data, it reverses nearly every one of their conclusions."

Raymond and Hanushek first demonstrated the technical flaws in Amrein and Berliner's analysis, then showed that Amrein and Berliner did not make the obvious comparison between performance on NAEP in states with high-stakes tests and those without. Finally, Raymond and Hanushek reported on their own research using NAEP scores in states where consequences fall on schools rather than students. On the one hand, they concluded that student performance on state tests improves after accountability measures are introduced; on the other hand, they found that abuses do occur: students are excluded from testing for fear that they will drag down the scores, and cheating is also found.

Meanwhile, Barak Rosenshine of the University of Illinois carried out the analysis that seemed obvious to Raymond and Hanushek and other researchers: comparing NAEP scores in states with high-stakes tests against states without such tests. In his 2003 study, Rosenshine showed that states with high-stakes tests scored higher on NAEP, which suggests that learning transferred from statewide testing to the NAEP tests. Amrein and Berliner responded to Rosenshine's criticisms by reanalyzing some of their data. While staunchly maintaining the soundness of their findings, they acknowledged that in states with high-stakes tests, 4th grade math scores are higher than in those without. However, they emphasized the exclusion of students from NAEP in high-stakes testing states, a problem they insist is getting worse (Amrein & Berliner, 2003).

The most sophisticated criticism was yet to come. In 2004, Henry Braun of the Educational Testing Service performed a highly technical statistical analysis of Amrein and Berliner's claims about NAEP mathematics achievement at 4th and 8th grade by comparing high-stakes testing states against those low-stakes states that participated in NAEP. In his report, Braun found that his comparisons "strongly favor the high-stakes testing states" (Braun, 2004), and that these results cannot be accounted for by student exclusions.

Moving the discussion to the international scene, John H. Bishop of Cornell University also compared tests to see the effects of high stakes. He analyzed data from several international examinations (i.e., 1995 TIMSS and IAEP) by comparing the results of what he calls "curriculum-based" exams in Europe, Asia, Canada, and two U.S. states (New York and North Carolina) with those of U.S. states that had minimum-competency exams for high school graduation. In his 2001 analysis Bishop showed that students who were required to pass curriculum-based external graduation exams—like those used internationally and in New York and North Carolina—learn more than their peers who do not take such exams. However, he found the effects of minimum competency exams on test scores to be "positive but small and mainly insignificant" (Bishop, 2001).

A 2006 study by Education Week contributes more evidence of a positive relationship between high-stakes state tests and achievement gains. To commemorate the 10th anniversary of Quality Counts, Education Week's annual state-by-state report card on public education, the publishers sponsored an analysis to find out if standards-based reform is having an effect on student achievement. Education Week's definition of standards-based reform comprises four key strands:

  • State standards in core academic subjects
  • Aligned assessments
  • Accountability
  • Policies to improve teacher quality

According to the study’s author, Christopher Swanson, there was "strong evidence that implementing a solid program of standards-based education policies has been associated with significant gains in mathematics achievement over the past decade, as measured by NAEP. Positive but less dramatic results are also found for achievement in reading" (Swanson, 2006).

Quality Counts grades each state on the strength of its education policies in the four standards-based strands. Analysts examined change in states’ grades for how they relate to change in their NAEP math and reading scores over the same period. As noted, the combined effect of the four strands was positive. Moreover, when Education Week looked at each strand separately, NAEP gains in math were most strongly related to the strength of state assessment policies, followed by standards and accountability. (Interestingly, teacher-quality policies were found to have a negative relationship to NAEP scores, a puzzle that the author says requires further examination.)


"Teaching To The Test": Harmful or Not?

Emerging studies suggest that teaching to the test can be good or bad: Good if it means teaching a focused and aligned curriculum; bad if it reduces instruction to the memorization of test items.

Unfortunately, the empirical research on teaching to tests is not as extensive as needed to make generalizations with complete authority. Most of the studies are confined to a single state. Of those that include collections of states, none is national in scope. These studies, similar to the Arkansas student survey conducted by Sean Mulvenon, require replication on a large scale and in other locations, so the answer to our question must be provisional until more thorough research is completed. But we can consider some results as useful pointers toward sound policy.

As Stuart Yeh notes, critics of high-stakes testing assert the tests will lead to narrowing the curriculum to the subjects and topics that appear on the test, sacrifice higher order skills to rote memorization, and emphasize test preparation at the expense of "real learning." All of these could be regarded as aspects of what is often called "teaching to the test" (Yeh, 2005).

The phrase can also imply that, having found out by fair or foul means what items will be on the test, the teacher drills the students to fill in the appropriate bubbles on the answer sheet. This is clearly cheating, although not unknown in a climate of high anxiety about the consequences of poor results for both students and teacher. But narrowing the curriculum is also cheating, because it cheats students of the full extent of the curriculum.

In his 2001 article "Teaching to the Test," W. James Popham provides useful vocabulary by distinguishing between "item-teaching" and "curriculum-teaching." He writes, "In item-teaching, teachers organize their instruction either around the actual items found on a test or around a set of look-alike items." Curriculum-teaching, on the other hand, means teaching to the knowledge and skills prescribed in the curriculum. A good curriculum will cover everything students should know (aligned with state standards) so they are prepared to answer questions on any part of it. Curriculum-teaching, writes Popham, "will elevate students' scores on high-stakes tests and, more important, will elevate students' mastery of the knowledge or skills on which the test items are based."

Stuart Yeh acknowledges studies showing that item-teaching has happened in some cases. But in his own study in Minnesota, Yeh showed that teachers believed that the quality of the curriculum did not suffer under the pressure of Minnesota’s two state tests. Teachers also thought that testing improved the quality of their instruction and made both the students and themselves more accountable for learning. Everyone involved put in greater efforts to ensure that all children succeeded.

According to Yeh's study, the curriculum was not so much narrowed as focused. Eighth-grade teachers reported rewriting the curriculum to align with the state test because the eight strands of math it assesses are, as one teacher said, "Necessary skills for kids." A principal told Yeh that "some of those fluffy extraneous things" were eliminated from the curriculum. Math and science teachers embraced the opportunity to teach reading in their subject areas. One said, "Drilling is very little help, because I see the math test is essentially a reading test." In general, by a two-to-one margin, the Minnesota teachers thought that the impact of state testing was positive (Yeh, 2005).

Other research seems to confirm the value of curriculum-teaching. Jay P. Greene, Marcus Winters, and Greg Forster, of the Manhattan Institute for Policy Research, established the credibility of high-stakes testing by comparing results on high-stakes state tests and on "low-stakes" tests in two states and seven school districts that administered both kinds of tests. Low-stakes tests were defined as tests that are given to monitor student progress but to whose results no major decisions, such as grade promotion or school accountability, are attached. Greene and colleagues assumed that if the scores on high-stakes and low-stakes tests were highly correlated, then high-stakes tests had not distorted the curriculum, as detractors claim. They report that "the generally strong correlations between score levels on high- and low-stakes tests in all the school systems we examined suggest that teaching to the test, cheating, or other manipulations are not causing high-stakes tests to produce results that look very different from tests where there are no incentives for distortion" (Greene, et al., 2003).
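The kind of comparison Greene and colleagues describe rests on a correlation between two sets of scores. The following is only a minimal sketch of that idea, not the authors' actual method: it computes a Pearson correlation coefficient in Python, and the function name `pearson_r` and the school-level scores are hypothetical, invented purely for illustration.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences with at least 2 values")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Sum of cross-products of deviations from each mean.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined when one sequence is constant")
    return cov / sqrt(var_x * var_y)


# Hypothetical mean scores for five schools (NOT data from any study cited here).
high_stakes = [610, 640, 655, 700, 720]
low_stakes = [48, 52, 55, 61, 63]

r = pearson_r(high_stakes, low_stakes)
print(f"r = {r:.2f}")
```

A value of r near 1 means schools that rank high on one test also tend to rank high on the other. As the review cautions elsewhere, such a relationship is not by itself proof of cause and effect.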

In her small-scale observational study, Julie Wollman-Bonilla of Rhode Island College provided detail about how curriculum-teaching works. She observed both a 3rd and a 4th grade teacher instructing their students on how to write effective persuasive prose. The teachers were fully aware that persuasive writing is one of the genres tested at the 4th grade level. Wollman-Bonilla concluded that, "Findings from this study suggest that teachers needn't teach to the test in a narrow, evaluation-focused manner. Rather, they can develop tools that move students toward test-readiness while keeping writing process principles in focus" (Wollman-Bonilla, 2004).

Researchers' advice to parents echoes the same theme: succeeding at tests means knowing the curriculum, not acquiring a few tricks. In his article, Ronald Dietel of the National Center for Research on Evaluation, Standards, and Student Testing told parents that "getting good at format and knowing the tricks of test taking only takes you so far if you don’t know the relevant content and skills" (Dietel, 2001).

Tests, Assessments, and Student Learning

When teachers and administrators use the opportunities that tests offer them, assessments do help students to learn. There are two main points of intervention:

  • Alignment of curriculum and tests with standards.
  • Use of test results to target instruction on areas needing improvement.

According to Eva Baker's study on aligning the curriculum with standards and assessments, "Alignment is the ether in which float the component parts of RBR [results-based reform]," meaning that standards, curriculum, and assessments must all be aligned if the system is to work so that students achieve at high levels. Says Baker, "Without a semblance of alignment, nothing hangs together."

In a 2005 California study, alignment contributed significantly to the success of some California elementary schools that scored as much as 250 points higher on the state's academic performance index than schools with similar populations. Teachers in the higher performing schools reported that curriculum materials in mathematics and language arts were aligned with the California standards, that classroom instruction was guided by the state’s academic standards, and that the curriculum was aligned grade by grade.

In her large-scale, five-year study of teaching literacy, Judith Langer finds that alignment of curriculum was a factor in student success. By comparing high- and low-performing schools, Langer finds that infusion into the curriculum of needed literacy skills and knowledge seems to have made a difference in the higher performing schools. She reports that teachers in the higher performing schools use tests as an opportunity to revise and reformulate their curriculum. While they do practice format before a test, not much teaching time is devoted to it. Rather, infusion is the key. Langer finds a direct contrast in lower-performing schools. There, she says, teachers "treated tests as an additional hurdle, separated from their literacy curriculum. In these schools," Langer reports, "the test-taking focus seems to be on teaching to the test-taking skills rather than gaining skills and knowledge" (Langer, 2001).

Aligning the curriculum before the test and studying the data afterward seems to be a formula for success. The California study found that schools that did well used assessment data to improve student achievement and instruction. The report says, "Strongly correlated with a higher academic performance index was the extensive use of student assessment data by the district and the principal in an effort to improve instruction and student learning" (Williams, et al., 2005).

The teachers in Stuart Yeh’s Minnesota study used data to assign additional reading help for 4th grade students in need. An 8th grade teacher reported to Yeh that "the first year we took the test, we spent that summer rewriting our curriculum to align with it. We broke it down. We took the eight [math] strands and we analyzed which strands [our district's] 8th graders did the poorest on" (Yeh, 2005).

The sparseness of research on the use of data may be attributable to the relative newness of NCLB and its consequent demand for Adequate Yearly Progress in 2002—it takes a few years for researchers to devise strategies and even raise funds to conduct basic research. In many cases, helping teachers to look at test scores by specific subgroups of students (disaggregated data) and then design focused curriculum to overcome weaknesses has become an integral part of professional development. However, there is no empirical research on the efficacy of this strategy to date.
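To make the idea of disaggregated data concrete, the sketch below groups test scores by subgroup and reports each subgroup's mean. It is an illustrative assumption about how such a breakdown might be computed, not a procedure drawn from any study or guide cited in this review; the function name `mean_by_subgroup`, the subgroup labels, and the scores are all hypothetical.

```python
from collections import defaultdict


def mean_by_subgroup(records):
    """Return {subgroup: mean score} for an iterable of (subgroup, score) pairs."""
    totals = defaultdict(lambda: [0.0, 0])  # subgroup -> [running sum, count]
    for subgroup, score in records:
        totals[subgroup][0] += score
        totals[subgroup][1] += 1
    return {group: total / count for group, (total, count) in totals.items()}


# Hypothetical student records (invented for illustration only).
records = [
    ("English learners", 58),
    ("English learners", 62),
    ("Students with disabilities", 55),
    ("All other students", 74),
    ("All other students", 82),
]

for group, mean_score in sorted(mean_by_subgroup(records).items()):
    print(f"{group}: {mean_score:.1f}")
```

An overall average can hide a subgroup that is falling behind; breaking the scores out this way is what lets teachers see where a focused curriculum change might be needed.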

A few publications that cannot be defined as research demonstrate emphasis on the use of data (e.g., Boudett, et al., 2005; Carr, 2003; Mertler, 2002). Todd McIntire provides a three-stage, step-by-step guide for teachers to analyze their student data. Stage three culminates in best practices, when schools establish structures and systems that "change the mindset of the school from one where some or most students can meet the standards to one where all students can" (McIntire, 2005b).

Conclusion

In response to the questions posed for this research review, we find that a majority of the public, including teachers, supports high-stakes testing, although they worry about its effect on teaching and learning; that counselors in North Carolina find testing affects their ability to do their jobs; and that most students in the Arkansas study experienced little or no negative effect from testing.

By looking at empirical evidence, we also find that high-stakes testing increases the amount of learning, as evidenced by performance on other tests. Additionally, we see that teachers can both prepare students for tests and teach them what they need to know, if curriculum is aligned with standards and tests and if data from the tests is used to refine curriculum.

As is often the case with research on educational topics, research on responses to high-stakes testing needs to be approached with judgment and caution, and above all with an open mind. Research does not give us the definitive answers we seek; rather it provides us the tools to arrive at our own conclusions.

More rigorous studies will continue to emerge that shed valuable light on the various ways, both good and bad, that high-stakes tests can affect instruction. The Center will continue to explore this issue and provide updates as new research becomes available.


This document was prepared by Ruth Mitchell for the Center for Public Education. Mitchell, a consultant and writer, specializes in education research and practice.

Posted: February 10, 2006

Updated: March 30, 2006
