What is the role of high-stakes testing in school improvement?
Testing has been part and parcel of school reform policy for over a decade. Nearly every state has its own approach to standards-based reform, but they share three key components:
- high standards for what all students should know and do;
- tests aligned to the standards to gauge student progress; and
- accountability for schools based on the results.
While the particulars vary from state to state, the goal is the same: to make sure our schools are providing all students with the education they need to lead meaningful, productive lives in the new century.
Each state, with the exception of Iowa, has developed a set of standards that defines this education for its students. States administer tests as a check on the academic program in order to assure the public that schools are meeting their obligation to teach the state standards. Teachers’ grades are valuable indicators for students and their parents, but what grades mean can vary a lot from school to school and even from classroom to classroom. Research has shown, for example, that student performance earning an “A” in high-poverty schools would be given a “C-” or “D” in low-poverty schools. Standardized state tests, therefore, provide a valid, external measure that can be compared across schools and districts.
High stakes is a relative term because some states established higher stakes than others. Consequences for schools or districts range from public reporting—with its attendant possibility for public praise or censure—to financial rewards for good performance, to a complete state takeover for persistent bad performance. NCLB has extended federal accountability measures based on test scores to all schools and districts that accept Title I dollars, which are intended to supplement the educational program for students from low-income families.
Standards-based reform, including testing for results, enjoys widespread support as a school improvement strategy among policymakers, educators, and the general public. Even so, many observers express concern that an over-reliance on standardized tests will skew classroom instruction toward content and skills that can be most easily tested at the expense of critical thinking and creativity. Some critics further maintain that there is a limit to how much schools can compensate for distressed family circumstances, charging that the current high-stakes environment is unrealistic and unfair to schools.
Still, many others support high-stakes testing, arguing that it casts a needed spotlight on the underachievement of many American students, in particular low-income and minority youth whose low achievement has previously been hidden behind school and district averages. Proponents point to the gains such students are showing on these tests as proof that the strategy is working to close achievement gaps. They maintain that anything less than a system that enforces high standards for all is unfair to students.
It’s important for school districts and their communities to be aware of both the promise and the pitfalls of high-stakes testing. Even though districts are bound to state and NCLB requirements, they still have some latitude in determining how to best prepare their students. That preparation will include passing the state test, of course, because the results tell the community whether their children are able to compete with their peers across the state. But in the end, students’ success will be defined not by the test alone but also by other attributes the community and the outside world value.
For more information, see “What is standards-based reform?” and “A guide to the No Child Left Behind (NCLB) Act.”
What are standardized tests?
Standardized tests are large-scale tests that are administered to students and scored in the same manner. Students take the same test under the same conditions and, if possible, at the same time so that results can be attributed to student performance and not to differences in the test or the way it is given. Because of this, the results of standardized tests can be compared across schools and districts.
Standardized tests are typically developed and administered by commercial test publishers who assure that the test is a fair and valid measure of student knowledge and skills.
The term “standardized testing” is sometimes used as a shorthand expression for machine-scorable, multiple-choice tests. But standardized tests, or assessments, can take many forms and can include open-ended short answer questions or longer essays.
What makes a standardized test high stakes?
A high-stakes test has consequences attached to the results. For example, students may be promoted to the next grade, graduate from high school, or be admitted to college based on their scores on standardized tests. Standardized tests can also have high stakes for schools and districts. NCLB requires all schools and districts to make "adequate yearly progress" (AYP) as measured by their students’ scores on the state tests—standardized tests aligned to state standards. Failure to make AYP can result in various sanctions, from allowing students to transfer to a higher-scoring school to a complete restructuring of the academic program and staff.
Not all standardized tests have high stakes. In addition to their state test, many districts use other standardized tests to monitor student progress, diagnose areas of strength and weakness, and provide feedback that educators can use to reflect on their practices and that parents can use to learn more about their child’s progress. Such "low-stakes" tests can be an important part of good instruction.
What is teaching to the test?
When high stakes are attached to the results of standardized tests, the normal reaction of schools is to make sure their students are prepared to do well on them. In its simplest sense, teaching to the test means precisely what it says: adjusting classroom instruction to produce high scores on a particular test.
Is teaching to the test a good or bad thing?
The practices that could be characterized as teaching to the test take many forms. At its worst, the phrase can imply that, having found out by fair or foul means what items will be on the test, the teacher drills the students to fill in the appropriate bubbles on the answer sheet. This is clearly cheating and there is no excusing it.
When educators talk about teaching to the test, however, they are more commonly referring to the various ways high-stakes testing drives classroom instruction, content, and activities, such as:
- Emphasizing content and procedures that are likely to appear on the test.
- Writing essays according to the scoring criteria used on the test.
- Minimizing or eliminating curriculum content that is not tested.
- Practicing test-taking strategies.
Research is beginning to show that teaching to the test can be either bad or good depending on how administrators and teachers approach it.
Observers generally report four negative classroom effects produced by high-stakes testing:
- Narrowing the curriculum by excluding from it subject matter not tested.
- Eliminating topics either not tested or not likely to appear on the test even within tested subjects.
- Reducing learning to the memorization of facts easily recalled for multiple-choice testing at the expense of in-depth learning and critical thinking.
- Devoting too much classroom time to test preparation rather than learning.
Other observers find that teaching informed by the state test can have positive effects:
- Focusing the curriculum on essential content and skills.
- Eliminating activities that don’t produce learning gains.
- Motivating both teachers and students to exert more effort.
Clearly some of the conflict between these statements is a difference of interpretation. What “narrowing the curriculum” is to some is “focusing” to others. So which is it? The answer will likely vary locally, whether by district, school, or classroom.
If there’s concern in your community that the test is narrowing instruction, first you need to define what this means. Are untested subjects—for example, social studies or the arts—being pushed out? Are enrichment activities like field trips being eliminated? Are favorite lesson plans falling by the wayside if they can’t be shown to advance learning?
Second, you need to look at your data. Are your schools meeting district, state, and NCLB goals? Are there gaps in performance? Are some students—or lots of them—seriously lagging behind their peers?
Most communities want a full educational experience for their young people—one that includes the arts, physical education, field trips, and a rich array of extracurricular activities. At the same time, the public consistently names teaching the basics as the first priority of schools. If students are persistently failing state tests in math and reading, educators and administrators really need to consider what content they should concentrate on the most to help these young people. The answer to this question will likely be the difference between "narrowing" and "focusing."
New research is emerging that seeks to sort out which practices under the rubric "teaching to the test" are more effective than others, and how much is too much. For example, some research suggests that introducing students to the test format can be helpful so they won’t be surprised on test day, but that it requires only a small investment in class time and should not detract from other, arguably more important instruction. Too much test prep, in fact, may produce diminishing returns and actually hurt test scores.
School leaders need to monitor how tests are affecting instruction in their classrooms, using their data and the goals of the community to inform their analysis. When these elements are brought together, "teaching to the test" is more likely to be a good thing than a bad one.
How are high-stakes tests affecting classrooms?
Several testing experts have published cautions about the possible negative effects of high-stakes testing, including such things as a narrow, test-driven curriculum and highly stressed students. Most of these articles cite anecdotal evidence to describe the negative impact tests are having in classrooms. But there is little research about the extent of such practices, even though it’s clear that they occur at least sometimes.
Although lacking the rigor of empirical research, public opinion polls and teacher surveys can provide some insight into how testing is affecting classrooms and students, and how widespread the effects are. Several polls present a mixed picture of drawbacks and benefits as reported by both the public and teachers. In one survey, teachers reported by large margins (79 percent) that high-stakes testing will lead to teaching to the test instead of real learning, yet a similarly large proportion (73 percent) said that testing has not caused real learning to be neglected in their own classrooms.
The general public likewise shows ambivalence about testing. Sizable majorities of the public have expressed support for mandatory testing in numerous polls and surveys. At the same time, they report concern that the emphasis on testing could go too far, although this concern appears to be declining in more recent polls.
Researchers are beginning to take a harder look at the impact of high-stakes tests on students. One small empirical study analyzed this effect on 283 elementary students in an Arkansas district. The researchers concluded that while some students experienced anxiety and pressure, most students experienced “little or no negative effects from testing.”
What classroom practices produce high test scores?
New research is showing that teaching a curriculum aligned to state standards and using test data as feedback produces higher test scores than an instructional emphasis on memorization and test-taking skills. This means that many of the negative responses to testing that experts have warned against—notably, the drill ’n’ kill curriculum and so-called test prep—aren’t likely to produce the test score gains these reductive practices are intended to achieve.
These findings should be of great importance to the many teaching professionals who have agonized over sacrificing rich learning experiences in order to do what they believed would prepare their students for the state test. This research affirms their judgment about what constitutes good instruction and shows that such instruction pays off in higher test scores.
This is not to say that teachers have carte blanche. The key word in these findings is alignment: both aligning curriculum to state standards and aligning instruction in response to data. A 5th grade teacher’s beloved lesson on butterfly cycles may not be helpful when the state standards call for students to be learning about the transfer of energy. But some lesson plans are too good to just toss, and with some adjustment this unit could be better aligned, for example, by using butterflies to show how food is converted to energy. By adding reading and writing tasks, the butterfly unit can also bolster skills described in the state language arts standards. The data will ultimately show the effect of these lessons and point the way toward continuous improvement.
What are the characteristics of good standardized tests?
Standardized tests are developed by publishers who follow an exhaustive vetting process to produce tests that are valid, reliable, and free of cultural bias. Therefore, psychometric soundness should probably be assumed. Beyond this, however, are a few considerations that can make some standardized tests better than others.
First is format, of which there are basically two kinds: multiple-choice and open-ended. Some standardized tests combine both.
Most readers are familiar with the multiple-choice test. These fill-in-the-bubble tests provide test-takers with a choice of possible answers from which students select the best one and record their choice on an answer sheet with a number two pencil. Because answers are controlled, multiple-choice tests generally have higher reliability ratings than other formats. Reliability means that the test is so internally consistent that test-takers could take the test repeatedly and score about the same. Multiple-choice tests are scored by machine, which also makes them less expensive to administer.
Multiple-choice tests have long been criticized for reducing learning to facts and procedures that can be easily recalled. However, testing technology has improved over the last several years and tests now feature multiple-choice items that demand more critical thinking.
Open-ended items ask students to write a few sentences in a short answer or an extended response essay. They have the advantage of allowing students to display knowledge and apply critical thinking skills. It is difficult to assess writing ability, in particular, without an essay or writing sample.
The disadvantage is that open-ended items require human readers, although attempts are being made to develop computer programs to score essays. Open-ended items don’t share the same high reliability ratings as multiple-choice, although again, improvements in testing have substantially improved the reliability of such items and they appear on many state tests. Open-ended items are also more expensive to administer and score.
A state test, obviously, should be aligned to state standards. While a few states use off-the-shelf tests, most state tests were developed by test publishers according to specifications provided by the state based on its standards to assure this alignment. But how well the test is aligned could be a matter of interpretation.
The best way for non-technicians to judge the qualities of a test is to take it. Many states publicly release whole tests or parts of them after they have been administered so all can see what is being demanded of schools and students. Developing new test items is expensive so some states keep their tests locked up and secure so they can reuse them. However, even these states provide sample items and specifications to the public.
Reviewing released tests and test items with parent and community groups, teachers and administrators, school board members, and community leaders can build a better understanding of how the test relates to the school program. Consider these questions:
- Would I expect my child to do well on this test?
- Would I expect the students in my community to do well on this test?
- Does the test represent all or part of what we want for our students?
- What else would we want to know about school and student performance?
How can school districts use tests effectively?
Research shows that when schools teach a curriculum aligned to state standards and use test score data to reflect on their practices, students will produce higher scores.
School leaders should therefore make sure their teachers have an aligned curriculum. They should also provide ample professional development opportunities so that all of their teachers are comfortable using data to inform their instruction in addition to being able to adjust curriculum so that it aligns with state standards. These leaders should use the data themselves to see if their decisions are producing the results they want.
State tests are administered at most once a year. In order to get more frequent feedback, more and more districts are administering periodic tests called benchmark assessments. These benchmark assessments are aligned to state standards and are typically given district-wide at various points in the school year. Benchmark assessments are usually low stakes—they do not count for school accountability. Rather, they are used primarily for monitoring the effect of instruction and identifying students who need extra support before they face the state test and risk failure when it counts. Some states have developed benchmark assessments that they offer to districts for this purpose, but districts have also developed their own.
Districts can also assess students in subjects not tested by the state but deemed important by the community. Moreover, because the scale is smaller, it’s easier for district-level tests to be richer and feature more open-ended items, thereby encouraging students to apply what they have learned.
Remember, assessments and tests are means to an end, not the end in itself. Assessment is a dipstick dropped into the academic program to obtain information about what has been learned, to produce data about student learning. It’s how schools and districts use the information that will make the difference.
Are test scores the only measure of school performance?
Test scores clearly command the lion’s share of attention in the press and in policy discussions. Their primary advantage is that they are quantifiable and standardized, making valid apples-to-apples comparisons possible. As such, they appeal to our American love affair with ranking the competition.
Scores on state tests have additional benefits, too. They are aligned to state standards that were defined through a public process and reflect a consensus about what’s important for students to know and be able to do. The public can look to the results of their state tests to find out if schools are meeting these expectations. State test scores also apply to all students so they present a more complete picture of school performance than other tests, such as the SAT or ACT, which are primarily taken by the students intending to apply to four-year colleges and thus may not be representative of the whole school.
Their merits notwithstanding, test scores will not tell communities everything they will want to know about their schools. Many of these other indicators are, like scores, fairly easy to quantify and disaggregate by student group so the community can monitor how well their schools are performing and how equitable the outcomes are.
At the top of the list is high school graduation. No one would reasonably judge their local high school to be successful if it produced high test scores by virtue of having high dropout rates. Communities want to make sure all their students leave with a diploma in their hands.
A particular priority for parents is knowing that schools are safe and orderly. Policies about school safety and discipline are a matter of public record. But it is also possible to establish measurable goals for safe schools and report these to the public, for example, the number of incidents per school and the number of disciplinary actions taken.
Other performance indicators to look for:
- Enrollments in high-level high school courses, such as Advanced Placement, International Baccalaureate, and other college prep courses.
- Student participation in the arts and community service.
- College-going rates, including scholarships.
- College remediation rates of recent high school graduates (the proportion of entering freshmen required to take remedial courses in math, reading or writing).
- Evidence of community support, such as voter turnout on bond referenda and school board elections.
This document was prepared by Patte Barth, director of the Center for Public Education, and Ruth Mitchell, education consultant and writer.
Posted: February 16, 2006
©2006 Center for Public Education