This post is based on an article recently published in a special issue of the International Journal of Research and Method in Education, on ‘What Works’.
The RSA report Creative Public Leadership, released last month, advocated that school system leaders think of themselves as designers, prepared to rethink the fundamental structures and conditions of schooling in order to create improvement. Two years ago, Licensed to Create set out a vision for a teaching profession with the freedom and preparation to continuously improve and adapt its practice in the face of new demands. A creative, design mindset is fundamental to breaking through some of the structural, organisational, and cultural barriers that constrain improvement in education, particularly at secondary level. But as the reports themselves acknowledge, leaders and teachers will see more progress the more their redesigning is based on accurate, well-founded knowledge (or, where we do not have agreed knowledge, its murky shadow: evidence). So where should that knowledge and evidence come from?
In recent years, the favoured approach to generating evidence in education has been through experimental evaluations – the approach known as ‘What Works’, funded in the UK by the Education Endowment Foundation (EEF). In this strategy, promising approaches are packaged up and evaluated, where possible by a randomized controlled trial (RCT). The results of evaluations are aggregated to provide an estimate of which approaches have the biggest effects on learning, all collected in the Teaching and Learning Toolkit. Teachers and school leaders are then expected to select the most effective approaches and drop them into their practice and decision-making.
Of course, no one thinks it is that simple. David Halpern, National Advisor to the What Works Centres, has said that the phrase ‘what works’ should only ever be thought of as shorthand for ‘what works, for whom, when and where’, and EEF guidance always urges that any intervention selected from the toolkit needs to be adapted to fit its context. The problem I see, however, is that this view is not followed through in the way evaluations are commissioned, designed and reported.
The key weakness of What Works as it currently stands is that trials generate evidence relevant only to single either-or decisions: should I teach my students in groups or individually? Should I adopt this phonics program or that one? Should I hire more teaching assistants or reduce class sizes? These are all worthwhile questions, but the interaction of the factors which determine progress in learning means that the answer to all of them is likely to be ‘it depends’. The fundamental feature of an RCT is that it provides an estimate of the effect of an intervention when all other things are held the same. That means that trial outcomes tell us very little about effects if we were then to change other features of school organisation, system conditions or teaching staff. (For more on the high cost of forgetting this fact, see this example of the excellent work of philosopher Nancy Cartwright.)
This might be a demoralizing story, were it not even more depressing to think that trialling programs and policies until we find enough things that ‘work’ across the board is the only path to improving education outcomes. If that were the case, ten years of RCTs in the United States, of which over 90% have had ‘null effects’, might leave us thinking the goal is simply too hard. But we need only draw that conclusion if we are wedded to changing one thing at a time.
In the U.S., the limited pay-off of What Works means that federal research funding and the major foundations have been turning towards alternative strategies. Funders and system leaders are now investing in research-practice partnerships, improvement science, teacher-led inquiry and design thinking. These approaches are all united by a disposition towards continuous iteration and improvement. They are all much closer to the spirit of Licensed to Create: that the decisions necessary to create real improvement in schools can only be made in context, because each combination is going to be slightly different. As noted above, however, these decisions still need some basis, and the popularity of the likes of researchED indicates that teachers are eager for knowledge and evidence to help inform them. So how could What Works be reconfigured to provide that?
The mistake of What Works 1.0 in Education was to base its practices on the medical model of RCTs. This method requires a standardised definition of a treatment to ensure that it can be delivered in the same way to all patients. If the treatment is proven successful and packaged for scaling, the emphasis is then on implementing it with fidelity. What you end up giving people is something like the toys that used to be pushed on neat little girls (think Polly Pocket): everything about it is lovely – as long as you don’t want to move anything. The dismal track record of programs that show impact in efficacy trials only to produce null effects in ‘scaling’ trials is an indicator that fidelity is an inappropriate expectation in public education: children are too different, teachers are too different, and contexts are too different.
The alternative approach would be to design trials as some economists in international development try to do: around a generalizable mechanism. Mechanisms are hard to identify. But if some human and social factors hold across contexts, and if a ‘treatment’ is relatively simple and relies on the application of just one mechanism, then trialling that mechanism should produce a more consistent effect, with less of the heterogeneity created by differences in context and conditions. If a mechanism holds across many different kinds of context, what we have is not the fixed Polly Pocket, where we cannot move anything because we do not know which bit is doing the ‘work’, but instead a portable, neat little principle that can be applied and combined in different ways: a LEGO brick. And the more LEGO bricks we have, the more confident we can be in what we are building.
Designing mechanism experiments is not straightforward, but an advantage in education is that we already know a considerable amount about the kinds of principles that might be most portable, derived from decades of applied education research such as the former Teaching and Learning Research Programme, the work of CUREE, and summaries from the Institute for Effective Education. The challenge in moving from research outputs to classrooms is that research uses concepts that then have to be operationalized – what does it actually look like, for instance, to provide effective feedback at different developmental stages? This is where social psychology and the learning sciences should be called on to help. The closer we can get to testing the most granular element of a practice we can identify, across a range of structural and organizational contexts, the closer we will get to having truly portable knowledge of teaching and learning.
The full article addresses other starting points, but the most important one is teachers and the ideas they have developed through practice. This is essentially the work of researchED, which has built what appears to be a fantastic community around exactly that subject. Feeding some of those ideas into more formal evaluation processes could refine and strengthen certain kernels of practice, making them more suitable for integration into the new school designs that aim to overcome some of the organisational barriers to improving education. So for any teachers reading: what are the most granular elements of your practice that feel like they ‘work’ – for whom, when and why?