By Howard White, CEO, The Campbell Collaboration
The policy usefulness of evidence synthesis is often undermined by findings of ‘mixed evidence’, from which all we can conclude is that ‘more research is needed’. But very often the evidence is not mixed at all; it is just poorly synthesized. Here are the top three reasons why so-called mixed evidence is not mixed at all.
The most obvious reason for mixed evidence is impact heterogeneity. The impact of an intervention can be affected by context, treatment population, what happens to the comparison group, and intervention design and implementation. What works in the US may not work in the UK, as in the case of the Nurse Family Partnership. What benefits people overall may not benefit the most disadvantaged. And so on.
‘What works?’ is often too generic a question. Rather: for whom does it work, in what settings, and with what design features and implementation fidelity? A good systematic review will address these questions. But there have to be enough included studies for such analysis to be possible. If there are few included studies, then ‘more research is needed’ will be one – but hopefully just one – of the review’s conclusions. A good review will be specific about what research would be most helpful.
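The subgroup logic behind ‘for whom does it work?’ can be sketched with a fixed-effect (inverse-variance) pooling of made-up effect sizes. The numbers below are purely illustrative and are not drawn from any real review; they simply show how a real effect in one setting and a null effect in another can masquerade as ‘mixed evidence’ if everything is pooled together.

```python
import math

# Illustrative only: hypothetical effect sizes (standardized mean
# differences) and standard errors, tagged by setting. Not real data.
studies = [
    ("US", 0.45, 0.10),
    ("US", 0.38, 0.12),
    ("US", 0.50, 0.15),
    ("UK", 0.05, 0.11),
    ("UK", -0.02, 0.13),
]

def pool(group):
    """Fixed-effect (inverse-variance) pooled estimate and standard error."""
    weights = [1 / se ** 2 for _, _, se in group]
    pooled = sum(w * est for (_, est, _), w in zip(group, weights)) / sum(weights)
    return pooled, math.sqrt(1 / sum(weights))

for setting in ("US", "UK"):
    subgroup = [s for s in studies if s[0] == setting]
    est, se = pool(subgroup)
    print(f"{setting}: pooled effect {est:.2f} (SE {se:.2f})")

# Pooling all five studies together averages a real effect with a null
# one, producing an in-between answer. The subgroup split shows the
# evidence is heterogeneous, not mixed.
```

Here the US studies pool to a clear positive effect and the UK studies to roughly zero: a moderator analysis turns an apparently mixed picture into a finding about context.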
The second reason for mixed evidence is mixing evidence of different quality. A couple of years ago the BBC reported a new study claiming that, contrary to government advice to eat five pieces of fresh fruit and vegetables a day, we should eat seven or more to maximize health benefits. But shortly afterwards, the BBC featured another study saying that five a day was sufficient. Isn’t this a case of mixed evidence?
No, it isn’t. The seven-pieces study is simply a correlational analysis of observational data, so there is a clear problem of selection bias: we know that people who eat more fresh fruit and veg are more health conscious, wealthier, better educated and so on, all of which leads to better health outcomes. The five-pieces study is a systematic review of 16 high-quality studies which take account of the confounders that cause selection bias.
Lower-quality studies are more likely to find an impact; Rossi calls this the ‘Stainless Steel Law of Evaluation’. Systematic reviews should therefore only include high-quality evidence. If they lower their quality threshold, the evidence may appear mixed. But it isn’t. This mixing of evidence of different quality was one of the reasons why two reviews of payment for environmental services came to apparently contrary findings. The other reason was vote counting…
The third reason for mixed evidence is vote counting: tallying how many studies find a significant effect and how many do not. Especially when studies are underpowered, vote counting makes the evidence look mixed. Meta-analysis, which pools the effect sizes across studies, is the right way to summarize them.
The best-known example is the Cochrane logo. The logo is a stylized forest plot from the review of the effect of corticosteroid injections, given to women going into labour prematurely, on infant deaths. Seven studies are summarized in the review, each shown in the figure by a horizontal line. Five of the seven lines cross the vertical line of ‘no effect’. So a vote counter would say the score is five-two against: the balance of the evidence suggests the intervention doesn’t work. But suppose the studies are underpowered, so each is likely to miss an effect even when there is one. Meta-analysis pools all the data, using the larger combined sample to gain statistical power. And what the meta-analysis shows is that the corticosteroid injection leads to a 30-50 per cent reduction in infant deaths. Incorrectly synthesizing this evidence by vote counting would literally kill babies.
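The arithmetic can be made concrete. The sketch below uses seven hypothetical studies (log risk ratios and standard errors chosen for illustration, not the actual Cochrane trial data): each study is first ‘vote counted’ by checking whether its confidence interval excludes no effect, and then all seven are pooled with a fixed-effect inverse-variance meta-analysis.

```python
import math

# Seven hypothetical studies: (log risk ratio, standard error).
# Illustrative numbers only, NOT the actual Cochrane corticosteroid data.
studies = [
    (-0.9, 0.35), (-0.7, 0.30), (-0.3, 0.35), (-0.5, 0.40),
    (-0.2, 0.30), (-0.4, 0.45), (-0.1, 0.50),
]

def ci95(est, se):
    """95 per cent confidence interval for a log risk ratio."""
    return est - 1.96 * se, est + 1.96 * se

# Vote counting: a study counts as a 'win' only if its whole CI
# lies below the line of no effect (log RR = 0).
significant = sum(1 for est, se in studies if ci95(est, se)[1] < 0)
print(f"Individually significant: {significant} of {len(studies)}")

# Fixed-effect (inverse-variance) meta-analysis pools all the data instead.
weights = [1 / se ** 2 for _, se in studies]
pooled = sum(w * est for (est, _), w in zip(studies, weights)) / sum(weights)
pooled_se = math.sqrt(1 / sum(weights))
lo, hi = ci95(pooled, pooled_se)

print(f"Pooled log RR {pooled:.2f}, 95% CI ({lo:.2f}, {hi:.2f})")
print(f"Pooled risk ratio {math.exp(pooled):.2f}")
```

With these numbers only two of the seven studies are individually significant, so a vote counter scores it five-two against; yet the pooled confidence interval sits comfortably below zero. The studies agree with each other; they are just individually underpowered.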
So very often there is no mixed evidence, only poorly synthesized evidence. As the Campbell Collaboration’s vision statement says, we need better evidence for a better world. We aren’t doing evidence synthesis as an academic exercise. We are doing it to raise the quality of life for the poor and disadvantaged, to allow people to live lives with opportunity, to make a better world. So let’s do it right.