Learn About: Evaluating Performance | Common Core


More than a horse race: A guide to international tests of student achievement

Few education stories get as much attention as the periodic ranking of U.S. students on international tests. The headlines are by now familiar: "U.S. Kids Mediocre in Math and Science"[1]; "4th and 8th Graders in U.S. Still Lag Many Peers"[2]. Surely the media's fascination with these stories is partly driven by our national desire to be number one. But according to many policymakers, business leaders, and analysts, more is at stake than American bragging rights. These individuals argue that the nation's economic future depends directly on our ability to raise our present academic standing, particularly in math and science (Business Roundtable 2005; National Research Council 2005; White House 2006).

Others aren't so sure. These observers assert that the reported failure of American students is exaggerated, claiming that the differences among countries aren't so large. Besides, they say, our top students do just fine compared with their top-scoring peers in other countries (Bracey 1998).

Still others point to inherent difficulties in trying to make apples-to-apples comparisons across countries and argue that international rankings are not meaningful (Rotberg 1995).

Are we "losing our edge," as one cover story asks?[3] Or is this, as one observer claims, a "manufactured crisis" (Berliner and Biddle 1995)?

Yes, and yes. American students aren't "failing," as some overwrought headlines suggest.[4] But they don't win, place, or show on any international test of knowledge and skills, either. In the vast space in between, U.S. performance varies considerably depending on the subject being tested and the age of the test-takers.

Some highlights:

  • American kids are good readers compared with many of their peers around the globe. Only three countries significantly outscored the United States at the elementary level (PIRLS 2001), and only three did so at the high school level (PISA 2000). The reading performance of U.S. fourth graders was particularly strong: they scored significantly above the international average (PIRLS 2001), while our fifteen-year-olds scored near the international average (PISA 2000).
  • U.S. math performance is mediocre. American fourth graders performed above the international average but were significantly outscored by students in eleven of the twenty-five participating nations (TIMSS 2003). U.S. eighth graders' standing was similar: above the international average, but significantly outscored by nine countries (TIMSS 2003). By high school, our students' performance falls below the international average; only eleven of the thirty-nine participating nations did significantly worse than the United States (PISA 2003).
  • U.S. science performance is a study of contrasts. On one hand, both American fourth and eighth graders scored above the international average (TIMSS 2003). Only three countries did significantly better than the United States at the fourth grade level, and American fourth graders outperformed their counterparts in sixteen other countries (TIMSS 2003). On the other hand, as in math, our high school students scored below the international average in science and were significantly outscored by their peers in eighteen of the thirty-eight participating countries (PISA 2003).
  • The gap between affluent and poor students in the United States is near the international average. When students' performance was compared by parents' educational level, parents' occupation, and number of books in the home, Canada, Finland, and Iceland had smaller achievement gaps than the United States, while Germany had larger gaps on two of the three measures (Hampden-Thompson and Johnston 2006). The results are similar when students are compared by immigrant status and by the first language spoken at home.
  • The American adult population (ages sixteen to sixty-five) performed near the bottom on a six-nation assessment of literacy and numeracy. U.S. performance exceeded only Italy's; Norway, Bermuda, Canada, and Switzerland all outscored us (ALL 2003).

These are facts, but facts without context are of limited use. This guide is the first in a series that aims to explain what international assessments can teach us about improving student learning in the United States.

First we describe the various assessments (what they test, how they are administered, by whom, and to whom) and discuss test administrators' efforts to produce closer apples-to-apples comparisons. Next we provide a detailed overview of U.S. performance. We look at which countries are actually performing higher or lower than the United States, not just in terms of the horse race, but in terms of statistically valid differences. Throughout this guide, the term significant difference always refers to a statistically significant difference, meaning that the difference between scores is unlikely to have occurred by chance alone. (It does not describe the size or educational importance of the difference, however.)

Finally, we offer some tips for what can be learned from international comparisons by using additional data provided by each of the assessments. This includes information about policies and practices that may relate to high performance in various countries—for example, their approach to curriculum, school time, and resources—that educators may find instructive for the United States.

Like any assessment, international tests have their limits, but they should not be dismissed. Viewed properly, they provide another way of identifying where American youth are and a gauge for determining where we want them to be. In the end, the real trophy for the United States to claim is being able to guarantee that all of our students have the knowledge and skills they will need in the world beyond the schoolhouse door.

This guide is organized as follows:

An overview of the four international assessments

  • Progress in International Reading Literacy Study (PIRLS)
  • Trends in International Math and Science Study (TIMSS)
  • Programme for International Student Assessment (PISA)
  • Adult Literacy and Lifeskills Survey (ALL)

What can we learn from international assessments?

  • Can systems from different countries be meaningfully compared?
  • Can similar groups of students be compared across countries?
  • What does the achievement gap analysis say?
  • Can results from different tests be meaningfully compared?
  • What do rankings mean?

How does the United States compare?

  • Reading
  • Math
  • Science

Are we losing the race?

Overview of International Assessments

The United States participates in four international assessments of knowledge and skills. The first step in making use of the findings from these tests is knowing the basics: which countries participate, which groups are tested and in what subjects, and what types of knowledge and skills are assessed. (For example, some tests assess concepts learned in school, while others focus on the application of knowledge and skills in work and life.) This information helps readers better understand what the test results mean.

The international tests in which the United States participates and their areas of focus are as follows:

  • Progress in International Reading Literacy Study (PIRLS) assesses reading literacy of fourth graders.
  • Trends in International Math and Science Study (TIMSS) assesses how well fourth and eighth graders understand math and science concepts they've been taught in school.
  • Programme for International Student Assessment (PISA) assesses how well fifteen-year-olds apply knowledge and skills in reading, math, and science.
  • Adult Literacy and Lifeskills Survey (ALL) assesses how well adults ages sixteen to sixty-five apply reading and math skills in life and at work.

Each of the assessments shares the following characteristics:

  • The samples assessed are nationally representative of all students/adults within the targeted age group or grade.
  • The samples include all types of students/adults (e.g., native and non-native language speakers, students with disabilities) within all types of schools (e.g., charter schools, private schools).
  • The assessments contain both multiple choice and constructed-response (e.g., short answer) items designed to measure higher-order skills. 
  • The test questions were developed by experts from a wide cross-section of nations and were tested in various countries to minimize the chance any particular item would be biased toward one country or cultural group. 
  • The results for each assessment are reported both as average scores and by achievement level for each country (e.g., advanced, proficient, intermediate, low). (See descriptions of achievement levels for PIRLS, TIMSS, PISA, and ALL.)
  • Test administrators also collect background information on students, teachers, schools, and adults that provides insight into how and what students learn.

Table 1 summarizes the key characteristics of each test.

What can we learn from international assessments?

Designing high-quality tests is always a complex undertaking and one that is made even more challenging when different countries, cultures, and education systems are involved.

The quality, and therefore the usefulness, of international assessments has improved greatly over the years, but some concerns persist about how meaningful the findings are. In this section, we look at how the tests are constructed and administered and consider issues of comparability.

Can systems from different countries be meaningfully compared?

Getting an accurate picture of overall achievement

While sampling techniques are not perfect, PIRLS, TIMSS, PISA, and ALL go to great lengths to achieve an accurate national representation of the population being tested.

Each of the assessments has strict guidelines on how samples are to be drawn, and the sampling process is overseen by independent auditors to ensure that each country complies.

Furthermore, each assessment releases detailed data on the sampling process so that the public knows who took the assessment in each country and who did not. These data include the percentage of schools that participated and the percentage of students in those schools who took the test.

Countries must also report the percentage of students who were excluded from the assessment.

Each exclusion must meet a strict set of criteria and be clearly documented. Students can be excluded only in rare instances, such as when a disability would prevent the student from taking the assessment or when the student is new to the country.

If exclusion rates exceed established levels (typically 10 percent), the country's scores risk being dropped from the results.

Although participation and exclusion rates vary by country, the differences are not large.

Another commonly heard criticism is that education systems vary too much to compare them. There is an element of truth to this statement.

Not all students start school at the same age or are required to attend school for the same number of years. However, the differences are not great, particularly among the countries we compete with economically. (See Table 2: Who's in school?)

Most countries have followed the United States' lead and now provide universal schooling with compulsory attendance laws that include students with disabilities. In fact, most countries that participate in these assessments have almost all of their five- to fourteen-year-olds in formal education, as well as the vast majority of their fifteen- to nineteen-year-olds (Sen, Partelow, and Miller 2005).

A common criticism of international assessments has been that they cannot support valid comparisons because many countries assess only their best and brightest students, while the United States assesses its entire student population (Bracey 1998; Rotberg 1990, 1995, and 1998).

Indeed, early attempts at international assessment did suffer from weak methods for capturing a representative sample of students. However, thanks to greatly improved sampling techniques (Porter and Gamoran 2002) and the widespread embrace of universal schooling (Garrett 2004), these flaws have largely been overcome, and every effort is made to ensure that scores represent the average achievement of the overall population in each country. (See sidebar, "Getting an Accurate Picture of Overall Achievement.")

Can similar groups of students be compared between countries?

International test administrators collect information about students and schools to refine comparisons as much as possible. One useful comparison is to look at the highest and lowest achieving students across countries, as we do in the discussion of results later in this guide. This comparison enables one to see possibilities for top achievers and gauge effectiveness at raising the floor.

It's also useful to look at the performance of students who were born outside the country in which they took the assessment. This comparison sheds light on the effects of immigration.

Many researchers, policymakers, and educators would like to compare racial and ethnic subgroups across countries, as the National Assessment of Educational Progress (NAEP) does within the United States, but this isn't possible because these groups differ vastly from country to country. However, these assessments do report data that allow us to compare racial and ethnic subgroups within the United States.

Some researchers have also tried to compare students of different socioeconomic status (SES) because research has shown a relationship between students' economic status and achievement. These comparisons are difficult, even among developed countries, because economic status is defined very differently in each country. However, researchers have identified some characteristics that serve as useful substitutes, or proxies, for SES. These are discussed below.

Achievement gap analysis

In April 2006, the U.S. National Center for Education Statistics (NCES) released an analysis of achievement gaps across countries using six proxies, or alternate indicators, for socioeconomic status (SES) and family characteristics (Hampden-Thompson and Johnston 2006). The proxies for SES were:

  • Highest level of education attained by either parent.
  • Highest occupational status of either parent.
  • Number of books in the home.

The proxies for family characteristics were:

  • Speaking the country's native language at home.
  • Students' immigrant status.
  • Students' family structure (e.g., single-parent family).

The data for SES and family characteristics came from the 2003 PISA math literacy assessment and student questionnaires. To make comparisons more valid, U.S. gaps were compared to the twenty highest income countries based on the World Bank Classification 2005 (Hampden-Thompson and Johnston 2006).

The results should be interpreted with some caution because both SES and family characteristics are measured through student questionnaires, and self-reported information is not always accurate. Furthermore, the study did not limit the comparisons to countries with a minimum percentage of the population in each group. Nonetheless, the results provide reasonable estimates for the broad comparisons made here.

The analysis found that for all three proxies for SES the average achievement gap between high SES and low SES students in the United States was not measurably different from the international average (Hampden-Thompson and Johnston 2006). It also found that Canada, Finland, and Iceland had smaller gaps than the United States for all three proxies, while Germany had larger gaps for educational attainment and occupational status of the parent and a smaller gap for number of books in the home (Hampden-Thompson and Johnston 2006). (See Figure 1.)

The findings related to family characteristics showed somewhat similar patterns. The analysis did not find a measurable difference between the United States and the international average on the achievement gap between test-takers who were native speakers of the test language and those who were not, or on the achievement gap between foreign-born and native-born students (Hampden-Thompson and Johnston 2006). However, the U.S. achievement gap between students from two-parent homes and those from other family structures was significantly larger than the international average (Hampden-Thompson and Johnston 2006). (See Figure 2.)

What is most interesting about the analysis of the three family characteristic proxies is that U.S. achievement gaps shrink significantly when students with the same SES are compared. This indicates that SES has more to do with the U.S. achievement gap than family characteristics themselves (Hampden-Thompson and Johnston 2006). In addition, the analysis finds that in most other countries, the achievement gaps related to family characteristics are significant regardless of students' SES.

Can results from different tests be meaningfully compared?

Because each assessment measures different subjects, different groups of students, and different types of knowledge (plus different countries participate), it is difficult to say definitively where the United States stands in comparison to other countries. In this section we highlight what to look for and questions to ask that will help you interpret the results.

What is being assessed?

Links to examples of test questions

Adult Literacy and Lifeskills Survey (ALL)

Programme for International Student Assessment (PISA)

Additional sample questions for PISA

Progress in International Reading Literacy Study (PIRLS)

Trends in International Math and Science Study (TIMSS) 

The first thing you want to know is which assessments the United States participated in and which subjects and grade levels were tested. Second, you want to look at what each assessment measures. For example, both PISA and TIMSS test mathematics, but PISA measures students' ability to apply mathematical knowledge and skills to real-life situations, while TIMSS measures the mathematical skills students have acquired from the school curriculum. (See links to examples of test questions in the sidebar.) The lessons we take away from these results will therefore have different implications.

What do rankings mean?

Results from each international test present a different snapshot of the knowledge and skills of each nation's population from fourth grade through adulthood. Taken together, the results from each assessment provide a broader perspective of performance across nations.

Much of the reporting on international results ranks countries' relative performance. To make sense of these rankings, you need to know how many countries, and which ones, participated in each assessment. You also need to know which differences in scores are statistically significant.

No two international assessments are administered to the same combination of countries. Some tests include a wide range of countries at various stages of educational and economic development, so it is sometimes useful to focus on countries similar to the United States. In this report we chose the Group of Eight (G8), a forum of eight major industrialized nations: Japan, the United Kingdom (reported in this guide as England), Germany, Italy, France, Russia, Canada, and the United States. We chose the G8 because the organization of their education systems is similar to ours and because they compete with the United States economically (Sen, Partelow, and Miller 2005). At this writing, no international assessment includes data for India or China, and neither is scheduled to participate in upcoming assessments. Given the rapidly rising position of these nations in the global economy, we hope that test administrators will continue to make every attempt to include them in the future.

Whether you compare the United States to a select group of countries or to all of the countries that participated in an assessment, it is essential to determine whether the differences in scores are statistically significant. A statistically significant difference is one that is unlikely to be due to chance, that is, to the particular sample of students who happened to be tested. In practical terms, if the assessment were given again to new samples of students, the countries that scored significantly higher or lower than the United States would very likely do so again, even though their exact scores might vary slightly.

All of the international assessments in which the United States participates currently report relative rankings in terms of statistically significant differences. However, some media reports continue to cite straight numeric rankings, which can be misleading. For example, it is accurate to report that U.S. fifteen-year-olds ranked fifteenth out of twenty-seven countries on the 2000 PISA reading assessment. However, although fourteen countries had numerically higher scores than the United States, only three were significantly higher, and nineteen countries' scores were statistically no different. The remaining four countries scored significantly below the United States (see Table 3: Where the U.S. ranks internationally in reading, math, and science).
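To make the difference between a raw numeric rank and a significance-based comparison concrete, the following Python sketch sorts countries by average score and then classifies each one relative to the United States using a simple two-sided z-test on the difference between two independent means. The country names, scores, and standard errors are hypothetical, invented for illustration only. The actual assessments use more elaborate procedures (for example, replicate-based standard errors and adjustments for multiple comparisons), so treat this as a simplified sketch rather than the method any of these studies uses.

```python
import math
from dataclasses import dataclass


@dataclass
class Result:
    country: str
    mean: float  # average scale score (hypothetical)
    se: float    # standard error of that average (hypothetical)


def compare_to(reference: Result, other: Result, z_crit: float = 1.96) -> str:
    """Classify `other` relative to `reference` with a two-sided z-test.

    Simplifying assumption: the two national samples are independent, so the
    standard error of the difference is sqrt(se1**2 + se2**2). A critical
    value of 1.96 corresponds to a 5 percent significance level.
    """
    diff = other.mean - reference.mean
    se_diff = math.sqrt(reference.se ** 2 + other.se ** 2)
    z = diff / se_diff
    if z > z_crit:
        return "significantly higher"
    if z < -z_crit:
        return "significantly lower"
    return "not significantly different"


def summarize(results: list[Result], reference_name: str) -> dict[str, list[str]]:
    """Group every other country by its significance-based comparison."""
    reference = next(r for r in results if r.country == reference_name)
    groups: dict[str, list[str]] = {
        "significantly higher": [],
        "not significantly different": [],
        "significantly lower": [],
    }
    for r in results:
        if r.country != reference_name:
            groups[compare_to(reference, r)].append(r.country)
    return groups


# Hypothetical data, NOT actual assessment results.
results = [
    Result("Country A", 534, 1.6),
    Result("Country B", 512, 2.8),
    Result("Country C", 507, 3.6),
    Result("United States", 504, 7.0),
    Result("Country D", 497, 2.4),
    Result("Country E", 474, 5.0),
]

# Straight numeric rank: position in the list sorted by average score.
ranked = sorted(results, key=lambda r: r.mean, reverse=True)
us_rank = [r.country for r in ranked].index("United States") + 1
print(f"Numeric rank: {us_rank} of {len(results)}")
for label, countries in summarize(results, "United States").items():
    print(f"{label}: {countries}")
```

With these made-up numbers, the United States ranks fourth of six numerically, yet only Country A is significantly higher and only Country E is significantly lower; Countries B, C, and D are statistically indistinguishable from the United States. This mirrors the pattern in the PISA 2000 reading example: a numeric rank alone hides how many of the countries "ahead" are not actually ahead.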

How does the United States compare?

In this guide we compare results on international tests in two ways. First we report the overall average score for all countries that participated in the particular assessment.  Comparisons are presented in terms of the number of countries that scored significantly higher, lower, or no different than the United States. Second, we look at student performance by achievement level—focusing on the highest and lowest levels—and compare U.S. performance to G8 nations that participated in each particular subject area.

As you will see, the United States performs better in some areas than in others. The causes are not fully known, although ongoing analyses of the background and other data are providing insights. The Center will look more closely at proposed explanations in future guides. In the meantime, educators and policymakers can explore the data themselves for possible explanations.

In addition to overall subject area scores, PIRLS, TIMSS, and PISA report scores in subscales, also called domains or content areas, that make up each of those subjects. For example, eighth grade TIMSS assesses and reports scores in five distinct math subscales: number, algebra, measurement, geometry, and data.

These subscores can help educators identify the areas to concentrate on to improve overall performance in a subject. For example, compared to their peers, U.S. eighth graders scored low on the algebra subscale, so educators may want to compare the algebra curriculum typically provided to middle schoolers in this country with what their peers in high-scoring countries receive.

Where appropriate, we also include subscale scores (e.g., reading to acquire and use information and reading for literary experience), as well as scores for different groups of students.

Reading performance

The United States has participated in three assessments that measure reading skills: PIRLS, PISA, and ALL. PIRLS assessed fourth graders in 2001, PISA assessed fifteen-year-olds in 2000 and 2003, and ALL assessed sixteen- to sixty-five-year-olds in 2003. U.S. performance on the reading tests was mixed but positive among the school-aged population. Generally speaking, when average scores were compared with those of all other participating countries, the United States ranked high in PIRLS, about average in PISA, and close to the bottom among the six nations participating in ALL.

PIRLS 2001

Thirty-four countries participated in PIRLS 2001, including all of the G8 countries except Japan. (See Figure 3.)

Overview 

  • American fourth graders' overall average was significantly higher in reading than the international average and significantly higher than twenty-three of the thirty-five participating countries (Martin et al. 2003).
  • Of the twenty-three countries the United States outperformed, two were G8 countries (Russia and France).
  • England was the only G8 country to significantly outperform the United States.
  • Canada, Italy, and Germany performed about the same as the United States.

High-end performance

Top U.S. students compared favorably to the top students in other countries: nineteen percent of U.S. fourth graders scored at the highest achievement level in PIRLS. (See Figure 4.)

Only three other countries had a greater percentage of students scoring at the highest level, and England (24 percent) was the only G8 country among them. At this level, students are able to contrast the actions, traits, and feelings of characters and recognize the major purposes of different types of text. (See Description of PIRLS Achievement Levels.)

Low-end performance

Eleven percent of U.S. fourth graders scored in the lowest achievement category, a larger percentage than in fourteen other countries. (Students at this level were not able to demonstrate that they could retrieve explicitly stated details about a character or locate explicitly stated facts about people, places, and animals in a simple text.) The United States also had a greater percentage of students at this level than any of the other six participating G8 countries. Canada and Germany had the smallest percentage, at seven percent each.

Other findings

U.S. fourth graders did well in the two subscale areas on PIRLS: Reading for literary experience and reading to acquire and use information. The United States significantly outperformed all other countries except Sweden in reading for literary experience and significantly outperformed all but five countries in reading to acquire and use information (Martin et al. 2003).

PISA 2000, 2003

Twenty-eight countries participated, including all of the G8 nations. (See Figure 5.)

Overview

  • American fifteen-year-olds scored at the international average in overall reading literacy in 2000, the year reading literacy was the focus.
  • When looking at statistically significant differences, the United States performed as well as or better than all but three of the twenty-eight participating OECD (Organisation for Economic Cooperation and Development) countries.
  • The only G8 country to significantly outperform the United States was Canada.
  • The United States performed as well as England, Japan, France, Italy, and Germany. It significantly outperformed Russia. The results were similar in 2003 (Lemke 2004).

High-end performance

Top-performing American students did fairly well: twelve percent of U.S. fifteen-year-olds scored at Level 5, the highest PISA reading literacy level, at which students demonstrate, for example, that they can complete sophisticated reading tasks. (See Description of PISA Achievement Levels.) Six countries had a higher percentage, including G8 members Canada (17 percent) and England (16 percent).

Low-end performance

Eighteen percent of U.S. fifteen-year-olds scored at or below Level 1; students at this level can complete only the least complex reading tasks. Fifteen countries had a lower percentage of their students at this level, including the G8 countries Canada (9 percent), Japan (10 percent), England (13 percent), and France (15 percent). The remaining G8 countries, Italy (19 percent), Germany (23 percent), and Russia (27 percent), had similar or higher percentages of students at this level. (See Figure 6.)

Other findings

U.S. fifteen-year-olds also scored at the international average on the three reading literacy subscales (retrieving information, interpreting texts, and reflecting on texts). Only five countries significantly outperformed the United States in retrieving information, only two in interpreting texts, and four in reflecting on texts (Lemke et al. 2001).

ALL 2003

Six countries participated, including three G8 countries (the United States, Canada, and Italy). (See Figure 7 and Figure 8.) On literacy measures, U.S. adults ages sixteen to sixty-five were significantly outperformed by four of the other five participating countries. Only Italy scored significantly lower than the United States in the two literacy skill areas: prose and document. When looking at the highest and lowest performers from each country, the United States compares poorly. A smaller share of U.S. adults scored at the highest levels (4 or 5) in prose literacy (13 percent) than in three other countries, including the G8 country Canada (20 percent) (ALL 2005). At these levels, adults are capable of making medium- to high-level text-based inferences by integrating or contrasting abstract pieces of information in relatively lengthy texts (ALL 2005). (See Description of ALL Achievement Levels.) Moreover, adults at these literacy levels tend to have high-skilled jobs and are unlikely to be unemployed (ALL 2005).

At the other end of the scale, a larger share of American adults scored at the lowest level (20 percent) than in four of the other five countries. At this level, test-takers on average demonstrate only the ability to read a simple text and retrieve explicit information from it (ALL 2005). Furthermore, adults at this level are likely to be unemployed and, when employed, tend to have low-skilled jobs.

The results are just as discouraging when looking at prose literacy over time. Between the 1994/1998 International Adult Literacy Survey (IALS) and the 2003 ALL, the percentage of U.S. adults scoring at the highest levels (4 and 5) declined by nine percentage points, the largest decrease among the four countries that participated in both assessments. Results were similar for document literacy. (See Figure 9, "How the Generations Measure Up.")

Math performance

The United States has participated in three assessments that measure the mathematical skills of students or adults: TIMSS, PISA, and ALL. TIMSS assessed fourth graders in 1995 and 2003 and eighth graders in 1995, 1999, and 2003. PISA assessed fifteen-year-olds in 2000 and 2003, and ALL assessed the numeracy of sixteen- to sixty-five-year-olds in 2003. Across assessments, overall U.S. performance in mathematics was mixed.

TIMSS

On TIMSS, U.S. fourth and eighth graders ranked in the middle of the pack; on PISA, U.S. fifteen-year-olds ranked below most countries; and on ALL, U.S. adults again ranked near the bottom.

TIMSS, Fourth Grade, 1995, 2003

  • U.S. fourth graders scored significantly above the international average in mathematics on TIMSS 2003.
  • The U.S. average score was good enough to significantly outperform thirteen of the twenty-five participating countries but was significantly below eleven countries, including the G8 countries of Japan, Russia, and England.
  • The U.S. average score did not change significantly between 1995 and 2003. However, of the fourteen countries that took both assessments, more countries outperformed the United States in 2003 than in 1995, so the U.S. ranking fell (Gonzales et al. 2004).

(See Figure 10.)

High-end performance

Seven percent of U.S. fourth graders scored at the Advanced level, which is not significantly different from the international average. Students at this level understand fractions and decimals, solve multi-step word problems, and can organize, interpret, and represent data to solve problems. (See Description of TIMSS Achievement Levels.) Three of the other participating G8 countries, Japan (21 percent), England (14 percent), and Russia (11 percent), were among the eleven countries that had a higher percentage of fourth graders at the Advanced level.

Low-end performance

When comparing low-performing students, eleven countries, including Japan (11 percent), England (25 percent), and Russia (24 percent), had a smaller percentage of students scoring at the lowest level than the United States (28 percent). At best, students at this level understand whole numbers and can do simple computations with them. They can also read information from a simple bar graph. The only G8 country with more students at the lowest level was Italy (35 percent). (See Figure 11.)

U.S. fourth graders did fairly well on the five mathematics subscales: number, patterns and relationships, measurement, geometry, and data. They scored above the international average on all but measurement and scored highest in data.

TIMSS, Eighth Grade, 1995, 1999, 2003

  • Like U.S. fourth graders, U.S. eighth graders scored significantly above the international average in mathematics.
  • They also performed significantly higher than twenty-five of the forty-five participating countries, including the G8 country Italy, but were significantly outperformed by nine countries, including Japan.
  • The only other G8 country in the eighth-grade assessment was Russia, which did not score significantly differently from the United States.
  • Of the twenty-one countries taking both the 1995 and 2003 assessments, the United States significantly outperformed more countries in 2003 than in 1995 by improving its average score (Gonzales et al. 2004).

(See Figure 12.)

High-end performance

As in fourth grade, seven percent of U.S. eighth graders scored at the Advanced Level, which is not significantly different from the international average but is a higher percentage than in two of the other three participating G8 countries: Russia (6 percent) and Italy (3 percent). At this level, eighth graders can compute percent changes and apply algebraic concepts to solve problems. (See Description of TIMSS Achievement Levels.)

Low-end performance

When comparing low-performing students, fourteen countries, including the G8 countries Japan (12 percent) and Russia (32 percent), had a smaller percentage of students at the lowest level than the United States (36 percent). Students at the lowest level can demonstrate only some basic mathematical knowledge. Italy, the only other participating G8 country, had a higher percentage of low performers (44 percent) than the United States. (See Figure 13.)

Other findings

U.S. eighth graders had mixed results on the five mathematics subscales. They were significantly outperformed by ten of the forty-four countries in number, eight in algebra, fourteen in measurement, nine in data, and more than twenty in geometry (Mullis et al. 2004).

PISA 2000, 2003

  • Unlike their fourth- and eighth-grade peers, who scored above the international average in math on TIMSS, U.S. fifteen-year-olds scored below the international average on the PISA mathematics literacy section.
  • The U.S. average score was significantly higher than those of five of the thirty participating OECD countries but significantly lower than those of twenty OECD countries.
  • Of the twenty countries scoring above the United States, four were G8 countries (Japan, Canada, France, and Germany). The only other participating G8 countries, Italy and Russia, scored significantly below the United States.
  • Comparisons between 2000 and 2003 are possible for only two of the math subscales: space and shape, and change and relationships. U.S. average scores in those two content areas did not change measurably from 2000 to 2003, and in both years about two-thirds of the countries that took both assessments outperformed the United States (Lemke et al. 2004).

(See Figure 14.)

High-end performance

Top-performing U.S. fifteen-year-olds did not score as high on PISA as top-performing students in other countries. Only two percent of our fifteen-year-olds scored at Level 6, the highest level in PISA math. Nineteen other countries had higher percentages, including the G8 countries Canada (6 percent), France (4 percent), Germany (4 percent), and Japan (8 percent). At this level, students are capable of advanced mathematical thinking and reasoning. The other two participating G8 countries (Italy and Russia) had less than two percent of their students at Level 6.

(See Figure 15.)

Low-end performance

Twenty-three countries also had a smaller percentage of students scoring at the lowest level than the United States, including the G8 countries Canada (10 percent), France (17 percent), Germany (22 percent), and Japan (13 percent). Students scoring at this level can solve simple mathematical problems if given explicit instructions and information. The United States had a lower percentage than the other two participating G8 countries: Italy (32 percent) and Russia (30 percent).

Other findings

The mathematics subscale results are similar to the overall mathematics scores. On each of the four subscales (space and shape, change and relationships, quantity, and uncertainty), U.S. fifteen-year-olds scored significantly below the international average. On all but the uncertainty subscale, Canada, Japan, France, and Germany scored significantly higher than the United States.

ALL 2003

  • The performance of U.S. adults on ALL numeracy was similar to their performance on the literacy section. U.S. sixteen- to sixty-five-year-olds were significantly outperformed by four of the five other participating countries.
  • Only two other G8 countries participated in ALL numeracy: Italy, which performed significantly lower than the United States, and Canada, which performed significantly higher.

(See Figure 16.)

High-end performance

When comparing the highest performers from each country, the United States does not fare well. Only 13 percent of U.S. adults scored at the highest levels (4 or 5), a smaller share than in four participating countries, including the G8 country Canada (17 percent) (ALL 2005). Adults at these levels are able to solve complex, abstract problems using multiple steps. (See Description of ALL Achievement Levels.)

As in the literacy assessment, adults at these high levels typically have high-skilled jobs and are more likely to be employed (ALL 2005).

Low-end performance

The results were similar when comparing the lowest performers from each country. Only the G8 country Italy had a higher percentage of adults scoring at Level 1 (44 percent) than the United States (27 percent). The remaining four countries, including Canada (20 percent), had lower percentages of adults at Level 1. Adults at this level can perform only simple tasks, such as counting and simple arithmetic operations. These adults are less likely to be employed, and when they are employed, their jobs are typically low-skilled (ALL 2005).

Science performance

TIMSS, Fourth Grade, 1995, 2003

  • Twenty-five countries participated, including five of the G8 countries: the United States, Russia, England, Japan, and Italy.
  • U.S. fourth graders scored significantly above the international average in science.
  • The United States also significantly outperformed sixteen of the twenty-five participating countries.
  • Russia, England, and the United States all had roughly the same average scores.
  • Japan significantly outperformed the United States, and the United States significantly outperformed Italy.
  • The U.S. average score did not change measurably between 1995 and 2003. However, of the fourteen other countries that took the science assessment in both years, more outperformed the United States in 2003 than in 1995, because U.S. performance held steady while other countries improved (Gonzales et al. 2004).

(See Figure 17.)

High-end performance

Top-performing U.S. fourth graders did quite well compared with their top-performing counterparts in other countries. Thirteen percent of U.S. fourth graders scored at the Advanced Level, above the seven percent international average. Only three countries had a greater percentage, and the United States outpaced the G8 countries Japan (12 percent), Russia (11 percent), and Italy (9 percent). The only G8 country with a higher percentage was England (15 percent). At this level, students can apply knowledge in beginning scientific inquiry. (See Description of TIMSS Achievement Levels.) However, the U.S. figure represents a decline from nineteen percent in 1995 (Mullis et al. 2004).

(See Figure 18.)

Low-end performance

The United States had more low-performing students (22 percent) than eight other countries, including the G8 countries England (21 percent) and Japan (16 percent). At this level, fourth graders have some basic knowledge of the earth, life, and physical sciences. Italy (30 percent) and Russia (26 percent) are the only G8 countries with more low performers than the United States.

Other findings

On the separate science subscales (life science, physical science, and earth science), the United States scored significantly above the international average. We also significantly outperformed all but two countries in life science, all but five countries in physical science, and all but one country in earth science (Mullis et al. 2004).

TIMSS, Eighth Grade, 1995, 2003

U.S. eighth graders had an average score significantly higher than the international average. The U.S. score was also significantly higher than those of thirty-two of the forty-five participating countries, including the G8 countries Russia and Italy; Japan scored significantly higher. Of the twenty-one countries that took both the 1995 and 2003 assessments, the United States outperformed more countries in 2003 than in 1995 by improving its average score (Mullis et al. 2004).

(See Figure 19.)

High-end performance

U.S. eighth graders had a higher percentage of students scoring at the Advanced Level (11 percent) than the international average (6 percent) and a higher percentage than all but seven countries. That share was also higher than in the G8 countries Russia (6 percent) and Italy (4 percent); the only G8 country with a higher percentage was Japan, at 15 percent. Students at this level grasp some complex and abstract science concepts. (See Description of TIMSS Achievement Levels.) Unlike the fourth-grade percentage, which declined, the percentage of eighth graders at the Advanced Level is unchanged from 1995.

(See Figure 20.)

Low-end performance

The United States does relatively well at the low end, too. Only ten countries had a smaller share of students scoring at the lowest level. At this level, students can recognize only some basic facts from the life and physical sciences. Among the G8 countries, only Japan had a lower percentage (14 percent) than the United States, while Russia (30 percent) and Italy (41 percent) had higher percentages.

Other findings

TIMSS breaks eighth-grade science into five content areas: life science, chemistry, physics, earth science, and environmental science. In all five areas, the United States scored significantly higher than most countries. It outperformed all but six countries in life science, all but twelve in chemistry, all but ten in physics, all but six in earth science, and all but four in environmental science (Mullis et al. 2004).

PISA 2000, 2003

PISA offers less data on science literacy than on reading and math literacy. PISA has been administered only twice, in 2000 and 2003. Science literacy was the main focus of PISA for the first time in 2006, so we will know much more when those results are published in 2007.

(See Figure 21.)

Based on the overall results from PISA 2003, U.S. fifteen-year-olds scored significantly below the international average in science and were significantly outperformed by fifteen of the thirty OECD countries, including four G8 countries (Japan, Canada, France, and Germany). The other two participating G8 countries, Italy and Russia, did not score significantly differently from the United States.

More countries outperformed the United States in 2003 than in 2000, although our average score was not measurably different (Lemke et al. 2004).

Are we losing the race?

As we have seen, there is no clear answer to how well U.S. schools are doing compared with those in other countries. Contrary to what some observers claim, U.S. students are not failing. But the scores do show that other countries are doing better than the United States, even with similar groups of students.

Many analysts have noted that American fourth graders do relatively well on international tests, our eighth graders perform about average, and our older students fall below the international average. They suggest that this shows a relative decline in American school effectiveness (see, for example, National Science Foundation, Science and Engineering Indicators 2000).

In 2005, the American Institutes for Research (AIR) issued a report that challenges this idea. AIR researchers analyzed data from the same subset of twelve countries that, along with the United States, had participated in fourth- and eighth-grade TIMSS and in PISA mathematics. By holding the group of countries constant, the analysts aimed for a closer apples-to-apples comparison. Based on data for these twelve core countries, AIR found that the position of the United States did not decline as students moved up in grade level; it remained relatively constant, with U.S. students in the lower third. The researchers further found that the rankings were fairly stable for most participating countries. (See Table 4.)
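The idea behind AIR's approach, holding the comparison group fixed, can be sketched in a few lines. This is a hypothetical illustration, not AIR's actual procedure: the country labels and scores are invented, and countries are ranked by raw average score, whereas the published studies also consider whether score differences are statistically significant.

```python
# Hypothetical illustration only: country labels and scores are invented,
# not TIMSS, PISA, or AIR results.
scores = {
    "grade4_timss": {"US": 518, "A": 540, "B": 530, "C": 495, "D": 480, "E": 470},
    "grade8_timss": {"US": 504, "A": 530, "B": 515, "C": 480, "F": 520, "G": 470},
    "age15_pisa":   {"US": 483, "A": 520, "B": 500, "C": 470, "H": 495, "I": 490},
}

def common_countries(results):
    """Countries that took part in every assessment."""
    return set.intersection(*(set(r) for r in results.values()))

def rank_within(scores_for_test, countries, target):
    """1-based rank of target among the given countries (higher score ranks first)."""
    ordered = sorted(countries, key=lambda c: scores_for_test[c], reverse=True)
    return ordered.index(target) + 1

core = common_countries(scores)
for test, result in scores.items():
    full_rank = rank_within(result, result.keys(), "US")
    core_rank = rank_within(result, core, "US")
    print(f"{test}: rank {full_rank}/{len(result)} overall, "
          f"{core_rank}/{len(core)} among common countries")
```

With these invented numbers, the target's rank against all participants slips from 3rd to 5th of six, while its rank among the four countries common to every assessment stays 3rd. The apparent decline comes from a changing set of comparison countries, which is the kind of artifact a fixed comparison group is meant to remove.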

As we have seen, identifying the relative position of American students compared with their peers in other countries is not a completely straightforward process. It depends on which countries we are compared to (or choose to compare ourselves to) and on whether score differences are statistically significant. But in general, the comparisons have shown that the United States is far from last in the world, although there is room to improve, more so in math and science than in reading, and more so at the high school level. The race isn't lost, but we do need to pay attention.

The tests just tell us where we are. We also want to know why. When comparing countries, even similar ones like the G8, consider the questions below. Many of the answers are collected as part of the background data. This information can help explain some of the differences, and cast a light on practices and policies that can further improve the achievement of American students.

  • What does the curriculum of other countries look like? For example, what is the scope and sequence? What are the topics covered and what is the breadth versus depth of coverage?
  • How much instructional time do students receive per year in each subject? What is the average number of years of schooling the test takers have received at the time the test is administered?
  • Does the country offer universal pre-school programs?
  • What funds and other resources are typically provided?
  • How are teachers prepared and certified?
  • What academic services or supports are provided to students outside the schools?
  • What is the demographic makeup of each country's student population? What proportion is foreign born? How many languages are spoken? What proportion comes from low socioeconomic status (SES) families? How does their performance compare to that of similar groups in the United States?

Also, analyze the scores and subscales within each subject and, when possible, compare similar students in similar countries.

Researchers have looked into these questions and have found clues to the relationship between the answers and student performance. The Center will be summarizing their findings in the coming weeks.

Once you understand what each of the assessments is actually measuring, who they are measuring, and what the results mean, you can make well-informed decisions about how the United States stacks up internationally. Without this knowledge you will not know what to believe, whether it's the headlines that claim American education is in crisis or the tests' critics who say international comparisons are irrelevant. More important, you could be overlooking a wealth of data on students, teachers, and schools that could be extremely helpful in better preparing your students for success in a rapidly changing world.

1 USA Today, November 21, 1996

2 New York Times, December 15, 2004

3 Time, 2006, http://www.time.com/time/magazine/article/0,9171,1156575,00.html

4 For example, John Stossel's broadcast, "Stupid in America" (http://abcnews.go.com/2020/Stossel/story?id=1500338)

The document was written by Jim Hull, policy analyst, Center for Public Education.

Posted: January 17, 2007

© 2007 Center for Public Education
