Small class sizes for improving student achievement in primary and secondary schools
- Authors: Trine Filges, Christoffer Scavenius Sonne-Schmidt, Bjørn Christian Viinholt Nielsen
- Published date: 2018-10-11
- Coordinating group(s): Education
- Type of document: Title, Protocol, Review, Plain language summary, Data
- See the full review: https://onlinelibrary.wiley.com/doi/10.4073/csr.2018.10
About this systematic review
This Campbell systematic review examines the impact of class size on academic achievement. The review summarises findings from 148 reports from 41 countries. Ten studies were included in the meta-analysis.
What are the main results?
Overall, the evidence suggests at best a small effect on reading achievement. There is a negative, but statistically non-significant, effect on mathematics.
For the non-STAR studies the primary study effect sizes for reading were close to zero but the weighted average was positive and statistically significant. There was some inconsistency in the direction of the primary study effect sizes for mathematics and the weighted average effect was negative and statistically non-significant.
The STAR results are more positive, but do not change the overall finding. All reported results from the studies analysing STAR data indicated a positive effect of smaller class sizes for both reading and maths, but the average effects are small.
Increasing class size is one of the key levers that policy makers can use to control spending on education. Reducing class size to increase student achievement is an approach that has been tried, debated, and analysed for several decades. Despite the important policy and practice implications of the topic, the research literature on the educational effects of class-size differences has not been clear.
The consensus among many in education research that smaller classes are effective in improving student achievement has led to a policy of class size reductions in a number of U.S. states, the United Kingdom, and the Netherlands. This policy is disputed by those who argue that the effects of class size reduction are only modest and that there are other, more cost-effective strategies for improving educational standards.
The purpose of this review is to systematically uncover relevant studies in the literature that measure the effects of class size on academic achievement. We synthesise the effects in a transparent manner and, where possible, investigate the extent to which the effects differ among different groups of students, such as high/low performers, high/low income families, or members of minority/non-minority groups, and whether timing, intensity, and duration have an impact on the magnitude of the effect.
Relevant studies were identified through electronic searches of bibliographic databases, internet search engines and hand searching of core journals. Searches were carried out to February 2017. We searched to identify both published and unpublished literature. The searches were international in scope. Reference lists of included studies and relevant reviews were also searched.
The intervention of interest was a reduction in class size. We included children from kindergarten to grade 12 (or the equivalent in European countries) in general education. The primary focus was on measures of academic achievement. All study designs that used a well-defined control group were eligible for inclusion. Studies that used qualitative approaches were not included.
Data collection and analysis
The searches yielded 8,128 potentially relevant hits. A total of 127 studies, consisting of 148 papers, met the inclusion criteria and were critically appraised by the review authors. The 127 studies analysed 55 different populations from 41 different countries.
A large number of studies (45) analysed data from the STAR experiment (class size reduction in grade K-3) and its follow up data.
Of the 82 studies not analysing data from the STAR experiment, only six could be used in the data synthesis. Fifty-eight studies could not be used in the data synthesis as they were judged to have too high a risk of bias, due either to confounding (51), other sources of bias (4) or selective reporting of results (3). Eighteen studies did not provide enough information to calculate an effect size and standard error, or did not report results in a form that could be used in the data synthesis.
Meta-analysis was used to examine the effects of class size on student achievement in reading and mathematics. Random effects models were used to pool data across the studies not analysing STAR data. Pooled estimates were weighted using inverse variance methods, and 95% confidence intervals were estimated. Effect sizes were measured as standardised mean differences (SMD). Meta-analysis was only possible for outcomes measured at the end of the treatment year (the end of the school year).
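The pooling step described above can be illustrated with a minimal sketch of an inverse-variance random effects model. The summary does not say which between-study variance estimator the review used, so the DerSimonian-Laird estimator here is an assumption, and the function name and input values are hypothetical and are not data from the review.

```python
import math

def random_effects_pool(effects, std_errors):
    """Pool effect sizes with inverse-variance random effects weights.

    Uses the DerSimonian-Laird estimate of between-study variance (tau^2),
    which is an assumption; the review summary does not name its estimator.
    """
    k = len(effects)
    if k < 2:
        raise ValueError("need at least two studies to estimate tau^2")
    variances = [se ** 2 for se in std_errors]

    # Fixed-effect inverse-variance weights and pooled mean.
    w = [1.0 / v for v in variances]
    fixed_mean = sum(wi * yi for wi, yi in zip(w, effects)) / sum(w)

    # Cochran's Q and the DerSimonian-Laird tau^2 (truncated at zero).
    q = sum(wi * (yi - fixed_mean) ** 2 for wi, yi in zip(w, effects))
    c = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
    tau2 = max(0.0, (q - (k - 1)) / c)

    # Random effects weights add tau^2 to each study's sampling variance.
    w_star = [1.0 / (v + tau2) for v in variances]
    pooled = sum(wi * yi for wi, yi in zip(w_star, effects)) / sum(w_star)
    se = math.sqrt(1.0 / sum(w_star))
    ci = (pooled - 1.96 * se, pooled + 1.96 * se)  # 95% confidence interval
    return pooled, se, ci, tau2

# Hypothetical SMDs and standard errors, for illustration only.
pooled, se, ci, tau2 = random_effects_pool(
    [-0.08, 0.02, 0.05, 0.10, 0.14],
    [0.10, 0.06, 0.05, 0.04, 0.07],
)
print(f"pooled SMD = {pooled:.3f}, 95% CI = ({ci[0]:.3f}, {ci[1]:.3f})")
```

When the estimated between-study variance is zero, the random effects weights reduce to the ordinary inverse-variance (fixed-effect) weights.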
Four of the studies analysing STAR data provided effect estimates that could be used in the data synthesis. The four studies differed in terms of both the chosen comparison condition and the decision rules for selecting a sample for analysis. It was not obvious which of these four studies’ effect estimates should be included in the data synthesis, because the decision rule described in the protocol (concerning studies using the same data set) could not be applied. Contrary to usual practice, we therefore report the results of all four studies and do not pool them with the studies not analysing STAR data, except in the sensitivity analysis. We took the intraclass correlation coefficient (ICC) into account in the results reported for the STAR experiment and corrected the effect sizes and standard errors using ρ=0.22. No adjustment due to clustering was necessary for the studies not analysing STAR data.
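To give a sense of why clustering matters at ρ=0.22, the sketch below applies the textbook design effect, 1 + (m - 1)ρ, to a standard error. This is not necessarily the correction the review authors applied (the summary says effect sizes as well as standard errors were corrected but gives no formula). The average cluster size of 15 and the naive standard error of 0.05 are hypothetical values chosen for illustration.

```python
import math

def design_effect(avg_cluster_size, icc):
    """Variance inflation factor for clustered data: 1 + (m - 1) * rho."""
    return 1.0 + (avg_cluster_size - 1.0) * icc

def cluster_adjusted_se(naive_se, avg_cluster_size, icc):
    """Inflate a standard error that ignored clustering by sqrt(design effect)."""
    return naive_se * math.sqrt(design_effect(avg_cluster_size, icc))

# Hypothetical example: classes of about 15 students, ICC of 0.22 as in the review.
deff = design_effect(15, 0.22)
adjusted = cluster_adjusted_se(0.05, 15, 0.22)
print(f"design effect = {deff:.2f}, adjusted SE = {adjusted:.3f}")
```

Under these assumed inputs the standard error roughly doubles, which shows how ignoring classroom clustering can overstate precision.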
Sensitivity analysis was used to evaluate whether the pooled effect sizes were robust across components of methodological quality, in relation to inclusion of a primary study result with an unclear sign, inclusion of effect sizes from the STAR experiment and to using a one-student reduction in class size in studies using class size as a continuous variable.
All studies not analysing STAR data reported outcomes only at the end of the treatment year (the end of the school year). The STAR experiment was a four-year longitudinal study with outcomes reported at the end of each school year. The experiment was conducted to assess the effectiveness of small classes compared with regular-sized classes, and of teachers’ aides in regular-sized classes, in improving cognitive achievement in kindergarten and in the first, second, and third grades. The goal of the STAR experiment was to have approximately 100 small classes with 13-17 students (S), 100 regular classes with 22-25 students (R), and 100 regular classes with an aide and 22-25 students (RA).
Of the six studies not analysing STAR data, only five were used in the meta-analysis, as the direction of the effect size in one study was unclear. The studies were from the USA, the Netherlands and France; one was a randomised controlled trial (RCT) and five were non-randomised studies (NRS). The grades investigated spanned kindergarten to grade 3, and one study investigated grade 10. The sample sizes varied; the smallest study investigated 104 students and the largest investigated 11,567 students. The class size reductions varied from a minimum of one student in four studies, to a minimum of seven students in another study, to a minimum of eight students in the last study.
All outcomes were scaled such that a positive effect size favours the students in small classes, i.e. when an effect size is positive a class size reduction improves the students’ achievement.
Primary study effect sizes for reading lay in the range -0.08 to 0.14. Three of the study-level effects were statistically non-significant. The weighted average was positive and statistically significant. The random effects weighted standardised mean difference was 0.11 (95% CI 0.05 to 0.16), which may be characterised as small. There was some inconsistency in the direction of the effect sizes between the primary studies.
Primary study effect sizes for mathematics lay in the range -0.41 to 0.11. Two of the study-level effects were statistically non-significant. The weighted average was negative and statistically non-significant. The random effects weighted standardised mean difference was -0.03 (95% CI -0.22 to 0.16). There was some inconsistency in the direction as well as the magnitude of the effect sizes between the primary studies.
All reported results from the four studies analysing STAR data indicated a positive effect favouring the treated; all of the study-level effects were statistically significant. The study-level effect sizes for reading varied between 0.17 and 0.34 and the study-level effect sizes for mathematics varied between 0.15 and 0.33.
There were no appreciable changes in the results when we included the extremes of the range of effect sizes from the STAR experiment. The reading outcome lost statistical significance when the effect size from the primary study reporting a result with an unclear direction was included with a negative sign and when the results from the studies using class size as a continuous variable were included with a one student reduction in class size instead of a standard deviation reduction in class size.
Otherwise, there were no appreciable changes in the results.
There is some evidence to suggest that there is an effect of reducing class size on reading achievement, although the effect is very small. We found a statistically significant positive effect of reducing the class size on reading. The effect on mathematics achievement was not statistically significant, thus it is uncertain if there may be a negative effect.
The overall reading effect corresponds to a 53 per cent chance that a randomly selected score of a student from the treated population of small classes is greater than the score of a randomly selected student from the comparison population of larger classes. The overall effect on mathematics achievement corresponds to a 49 per cent chance that a randomly selected score of a student from the treated population of small classes is greater than the score of a randomly selected student from the comparison population of larger classes.
Class size reduction is costly and the available evidence points to no or only very small effect sizes of small classes in comparison to larger classes. Taking the individual variation in effects into consideration, we cannot rule out the possibility that small classes may be counterproductive for some students. It is therefore crucial to know more about the relationship between class size and achievement and how it influences what teachers and students do in the classroom in order to determine where money is best allocated.
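The 53 and 49 per cent figures can be reproduced from the pooled standardised mean differences (0.11 for reading, -0.03 for mathematics) with the common "probability of superiority" conversion, Φ(d/√2). The sketch below assumes normally distributed scores with equal variance in both groups; the summary does not state which conversion the authors used, so this is one plausible reading, and the function name is hypothetical.

```python
import math

def probability_of_superiority(smd):
    """Chance that a random treated score exceeds a random comparison score.

    Assumes normal scores with equal variances, giving Phi(d / sqrt(2)).
    Since Phi(x) = 0.5 * (1 + erf(x / sqrt(2))), this simplifies to
    0.5 * (1 + erf(d / 2)).
    """
    return 0.5 * (1.0 + math.erf(smd / 2.0))

reading = probability_of_superiority(0.11)   # pooled SMD for reading
maths = probability_of_superiority(-0.03)    # pooled SMD for mathematics
print(f"reading: {reading:.1%}, mathematics: {maths:.1%}")
```

A standardised mean difference of zero gives exactly 50 per cent, so both pooled estimates sit close to a coin flip.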