Science Achievement in Grades 8 and 9
Exhibit 2.2.1a presents the estimated growth in students’ science achievement between the eighth and ninth grade for the three participating countries.¹ The numerical results tab displays each country’s average achievement at each grade, along with a growth estimate and its standard error (in parentheses). The graphical results tab shows the distribution of changes in achievement scores, including percentiles, and provides confidence intervals² associated with average growth.
On average, students in all three countries showed growth in science achievement between Grades 8 and 9. The interactive table can be sorted by different criteria. When sorted by magnitude of growth, the exhibit shows average growth of +10, +8, and +6 scale score points for Jordan, Korea, and Sweden, respectively.
Note that the estimates of average growth are provided with a margin of error, given in the form of a standard error for each country. For example, Jordan’s estimated growth of 10 points has a 95% confidence interval ranging from 4 to 16 points (10 plus or minus twice the standard error of 3.2, rounded). Besides the standard error, the percentiles of the change distribution also help show how much individual students’ scores can vary between the first and second assessments.
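The interval arithmetic for Jordan can be reproduced with a minimal Python sketch. It uses only the growth estimate (10) and standard error (3.2) quoted above; the function name and the default multiplier of 2 (the "twice the standard error" rule of thumb) are illustrative assumptions, not part of the TIMSS methodology.

```python
def confidence_interval(estimate, std_error, multiplier=2.0):
    """Return (lower, upper) for estimate +/- multiplier * std_error.

    `multiplier=2.0` mirrors the "twice the standard error" rule used in
    the text; it is an approximation to the usual 1.96 for a 95% interval.
    """
    margin = multiplier * std_error
    return estimate - margin, estimate + margin


# Jordan's values as quoted in the text: growth 10, standard error 3.2.
lower, upper = confidence_interval(10, 3.2)

print(f"unrounded: {lower:.1f} to {upper:.1f}")      # unrounded: 3.6 to 16.4
print(f"rounded:   {round(lower)} to {round(upper)}")  # rounded:   4 to 16
```

The unrounded bounds (3.6 and 16.4) round to the 4 and 16 points reported above.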
The graphical results tab of the exhibit provides a more comprehensive look at patterns in growth within and across countries. The variation in standard deviations between grades and countries, in interquartile ranges, and in the 5th–95th percentile difference in the growth distribution all indicate that variability in achievement growth exists both between and within countries. Thus, while one country’s average growth may exceed another’s, all three countries include students with varying growth trajectories between the two years. Overall, most students showed a gain in achievement, some showed very little change, and fewer showed a decline. For instance, despite lower average growth in Sweden (+6 scale score points), some Swedish students with substantial growth in science likely outpaced students in Jordan who showed less growth, even though Jordan’s average growth was higher. Despite this within-country variability, the 95% confidence intervals for average growth across the two assessments lie entirely above zero in all three countries, indicating that the observed gains are statistically significant³ at the 5% level.
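The link between a confidence interval lying above zero and significance at the 5% level can be illustrated with a short Python sketch. It uses only Jordan’s growth estimate (10) and standard error (3.2) quoted earlier in this section; the critical value of 1.96, the normal approximation, and the function names are assumptions made for illustration, not a description of how the TIMSS estimates were produced.

```python
def z_statistic(estimate, std_error):
    """Ratio of the estimate to its standard error."""
    return estimate / std_error


def excludes_zero(estimate, std_error, critical=1.96):
    """True when the interval estimate +/- critical * std_error excludes zero.

    Assumption: a normal approximation with a two-sided 5% level, for which
    1.96 is the conventional critical value.
    """
    return abs(z_statistic(estimate, std_error)) > critical


# Jordan's values as quoted in the text: growth 10, standard error 3.2.
jordan_z = z_statistic(10, 3.2)
print(f"z is about {jordan_z:.1f}")
print("interval excludes zero:", excludes_zero(10, 3.2))
```

Checking whether the ratio exceeds the critical value is equivalent to checking whether the corresponding interval excludes zero, which is the criterion described above.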
Exhibit 2.2.1b complements Exhibit 2.2.1a and presents the average science achievement results for the three TIMSS 2023 Longitudinal countries participating in Grades 8 (2023) and 9 (2024). For each country and grade, the numerical results tab shows the average scale score, accompanied by its standard error in parentheses, the 95% confidence interval for the average achievement, and the standard deviation of student scores with its standard error. As noted above, all countries show some gains in average science achievement between the two years. Higher average achievement in Grade 8 did not necessarily translate into larger growth in achievement between the two assessment years. Jordan had the lowest average science achievement at both grades (414 in Grade 8 and 424 in Grade 9) but showed the largest average growth among the three countries.
The exhibit also provides information about within-country score variability and shows that standard deviations increased from Grade 8 to Grade 9 in Korea (from 88 to 93) and Sweden (from 99 to 107). This suggests that achievement distributions within these countries widen over time. In contrast, Jordan showed a consistently high standard deviation of 98 in both years.
¹See the TIMSS 2023 International Results for more information about the science assessment.
²One can think of a confidence interval as a “net” we cast to catch the true average of a whole population. A 95% confidence interval means that if we were to repeat our study 100 times, the nets cast in those 100 studies would catch the true average about 95 times. It doesn’t mean there’s a 95% chance our specific net caught the true value; it either did or it didn’t. Any specific 95% confidence interval is one such application of a method that works 95% of the time.
³Statistical significance means nothing more than that the difference found is big enough that we would expect to see one at least this large by random chance less than 5% of the time if there were truly no difference. It’s surprising enough for us to look closer but does not imply practical significance. Also, in very large studies, such as TIMSS, where many students take the test, there is reason to be cautious. One can occasionally find differences that are technically statistically significant but are so small they don’t actually matter much in terms of real-world effects (e.g., Berkson, 1938).