By Susannah Hume, King’s College London and Behavioural Insights Team
This is Part 2 of a 2-part blog series on using RCTs to evaluate the effectiveness of access interventions. In the first post I described the rationale for using an RCT and why it is considered the ‘gold standard’ of evaluation. In this post, I look at the cases where RCTs may not be appropriate, and the ethics of running RCTs.
When not to run an RCT
As outlined, RCTs are the ‘gold standard’ of causal evaluation, and in my view it is always worth seriously considering whether it is possible to run one. However, in some cases the answer is no. Here are some key considerations for judging whether an RCT is appropriate.
The sample size is too small
As a rule of thumb, a field RCT should have at least 1,000 participants (across treatment and control) to justify the assumption that randomisation has produced truly balanced groups. In addition, the more subtle the intervention, the larger the sample size required to detect it–a text messaging intervention will likely require multiple thousands of participants to confidently detect an effect, whereas a summer school is likely to require fewer. There are things we can do that reduce the sample size needed and increase our confidence in randomisation, but ultimately, an RCT is not going to be a sensible way to evaluate a one-off programme with 30 participants.
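To make the relationship between effect size and sample size concrete, here is a rough sketch of the arithmetic using the standard normal-approximation formula for comparing two proportions. The specific numbers (a 30% baseline progression rate, 5 vs 15 percentage-point effects, 5% significance, 80% power) are illustrative assumptions, not figures from this post.

```python
from math import ceil
from statistics import NormalDist

def n_per_group(p_control, p_treatment, alpha=0.05, power=0.80):
    """Approximate participants needed per arm to detect a difference
    in proportions, via the standard normal-approximation formula."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided significance
    z_power = NormalDist().inv_cdf(power)          # desired power
    variance = p_control * (1 - p_control) + p_treatment * (1 - p_treatment)
    effect = p_treatment - p_control
    return ceil((z_alpha + z_power) ** 2 * variance / effect ** 2)

# A subtle intervention (e.g. text messages): 30% -> 35% progression
print(n_per_group(0.30, 0.35))  # 1374 per arm, i.e. ~2,750 in total
# A more intensive intervention (e.g. a summer school): 30% -> 45%
print(n_per_group(0.30, 0.45))  # 160 per arm, i.e. ~320 in total
```

On these illustrative assumptions, a 5-percentage-point effect needs several thousand participants to detect, while a 15-point effect needs a few hundred, which is the intuition behind the rule of thumb above.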
You don’t have the data required to evaluate it properly
RCTs require a great deal of information about participants–before and after the intervention–to be run well. If you don’t have consistent, reliable data, running an RCT (or any quantitative evaluation) will be more difficult. For example, if we’re relying on a survey measure to gauge attitudes towards university, it is problematic if it isn’t administered to all students, in the same way, at around the same time, and with the same instructions given by teachers, across different schools.
You don’t have control and visibility over the implementation of the intervention
RCTs require high fidelity to the implementation protocol–if a student is in the treatment group, then they need to receive the intervention, and we need to be confident they’ve received it in the way we expect. Likewise, we need to be confident the control group hasn’t received anything they shouldn’t have.
You can’t run it in a way that’s externally valid
External validity refers to the extent to which the effect you observe in the trial is likely to persist if the intervention is rolled out more broadly. It can be undermined if, for example, during the trial all students are mandated into something that would usually be voluntary, or classroom teachers are ‘blind’ to students’ assignment (that is, they don’t know which students are in the intervention group) when ordinarily teachers would know who was receiving the support. In these cases, students and teachers might behave differently in the trial than they would in ‘real life’. In addition, sometimes the nature of the intervention means that participants are highly aware that they are in a trial, which changes their behaviour (known as the ‘Hawthorne effect’).
The Ethics of RCTs
A commonly-cited ethical concern with RCTs is that they require the withholding of the treatment or intervention from an arbitrarily chosen subset of the population, who would have received it if it had simply been rolled out.
The ethical underpinning of RCTs is driven by the concept of ‘equipoise’, which refers to a state in which there is a lack of consensus in the expert community about whether an intervention is effective. Note that equipoise does not require a particular individual to be unsure about the impact of the intervention, only that there is a lack of consensus among those who have some expertise on the topic. The individual developing the intervention, such as the learning plans we discussed in the previous post, may be confident (and hopefully is!) that it will deliver a benefit. But if there is a lack of (a) consistent, high-quality evidence backing this up, and (b) consensus among practitioners that learning plans make a difference, then it can be said that the community is in equipoise about the effectiveness of learning plans in lifting achievement.
Where we are genuinely unsure whether something works, it is not unethical to withhold it from some people; in fact, it is ethical to test whether it is effective before rolling it out. However, once a consensus has developed, then not acting on that consensus becomes unethical.
There are many cases of interventions that should have worked, and that those delivering them believed were effective, that turned out to have null or negative effects. For example:
- The use of corticosteroids to treat traumatic brain injury, which was common practice until recently, has been found to increase the likelihood of death.
- The popular ‘Scared Straight’ initiative, where juvenile delinquents and children at risk of becoming delinquent take part in prison visits. While the intention is to deter young people from criminal behaviour, RCT evaluation has shown that participants are actually more likely to offend after taking part.
- Infant simulators (dolls that mimic the behaviour of real babies), which are used to discourage teenage pregnancy, were found by one RCT to increase the likelihood that participants would have experienced a pregnancy before they were 20.
We believe that in the vast majority of cases, testing via RCTs is ethical, for three reasons:
- A great many things that ‘should’ work, don’t, as above;
- Even an intervention with a null or only small positive effect may not be the best use of students’ time, and it is important to know that; and
- Generating strong evidence that an intervention is effective (and cost-effective) can help to embed that intervention in practice in a way that ensures that it is available to future beneficiaries.
Alternatives to RCTs
There are a range of other ‘quasi-experimental’ methods that can be used to try to identify causal effects, including matching each treated individual to similar individuals from a larger pool of untreated individuals; exploiting natural experiments such as eligibility cut-offs; and measuring the effect of ‘intending’ to treat an individual (whether or not they were ultimately treated). It’s worth noting that all of these methods require more data and stronger assumptions than a well-designed RCT to identify a causal impact.
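As a flavour of what matching involves, here is a minimal nearest-neighbour matching sketch on hand-made toy data: each treated student is paired with the untreated student whose prior test score is closest, and the average outcome difference across pairs is taken as the estimate. The records and scores are entirely invented for illustration; real matching would use many covariates and careful diagnostics.

```python
# Toy records: (prior_test_score, outcome). All numbers are illustrative.
treated = [(55, 62), (70, 78), (82, 88)]
untreated = [(50, 54), (58, 60), (68, 72), (80, 83), (90, 92)]

def match_and_estimate(treated, untreated):
    """Nearest-neighbour matching on a single covariate (prior score):
    average the treated-minus-matched-control outcome differences."""
    diffs = []
    for score, outcome in treated:
        # Match each treated student to the untreated student
        # with the closest prior score.
        _, control_outcome = min(untreated, key=lambda u: abs(u[0] - score))
        diffs.append(outcome - control_outcome)
    return sum(diffs) / len(diffs)

print(match_and_estimate(treated, untreated))  # ~4.33 points
```

The estimate is only credible if the matched untreated students are genuinely comparable on everything that matters, which is exactly the stronger assumption that randomisation would have bought us for free.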
In addition, there are a range of other evaluation methods that make use of participant experience and voice, such as interviews, focus groups and surveys, and methods that make more use of existing data. These approaches are valuable in their own right, and will be the topic of another blog post in future.
RCTs: the verdict
RCTs are an immensely valuable form of evaluation, coming as close as it is possible to come to identifying the causal impact of an intervention on an outcome of interest. They are not right for every situation, but when considering implementing a new programme (or revisiting an existing one), it is always worth weighing up whether an RCT evaluation is feasible.