Once exams were cancelled, Ofqual, the body charged with regulating examination bodies and hence exams, was inevitably faced with a very difficult, if not impossible, set of choices.
With such scant data, there could be no fair outcome. Its challenge was to devise the least unfair method of grade assessment that it could. The only confusing aspect of this situation is why, having seen the about-turn in Scotland, the Government and Ofqual didn’t change course. Now they have bowed to the inevitable and the algorithm is no more. But why did we end up in this situation? There is a deeper set of issues at play: the dangers of poorly regulated algorithmic decision-making across society.
Fundamentally, the issue seems to be that Ofqual started with the data rather than the ethics. The critical question it asked was the same question as every year: what is the best assessment we can get of exam grades given the data we have? That data normally includes, most obviously, exam performance but also relative performance of pupils against their peers and peers from previous years. The aim is to get a fair assessment of pupil performance based on their verified performance on a normal distribution curve, calibrated against current and previous candidates.
Little of the data that would normally be available to make these assessments was available this year. So, like its counterpart in Scotland, the SQA, Ofqual chose to rely on the data it had. And that is where things started to get tricky. This may be where the error took place: a failure to adjust the question. At this point it might have been better to ask: can we adjust grades in a way that will be fair given the limitations of the data that we have? The answer, as the Government and Ofqual have belatedly concluded, is no.
You can construct an algorithm to moderate exam grades but, unless you have accounted for system biases, programmer bias and inadequacy of data, you aren’t going to get very far. In fact, you are going to cause severe damage.
And this is not just a challenge for exam grading. Algorithms are used and trained with biased data within criminal justice and policing, insurance and finance, welfare adjudication, recruitment and even warfare. The RSA amongst many others has considered the issue of deep algorithmic bias in its technology and society work. Initially, the response to clear instances of unfairness in exam grading was to say that the algorithm needs ‘re-calibrating’. You can’t simply re-calibrate a fundamentally biased system. There isn’t a technical fix for every flawed system; sometimes you need an ethical fix.
Underneath these systems of algorithmic decision-making we usually find a deep ethical code which will discriminate along lines of race, class and gender. How so? The ethical choice we make is one which focuses on moral hazard. In other words, large-scale algorithmic decision-making systems are set to prevent people getting something to which they aren’t entitled: an exam grade, a cash benefit, a loan, freedom to walk or drive without harassment, an interview for a job. Algorithms are used to screen to prevent free-riding because that is the default ethical setting of our systems of governance.
Good, one might say. The problem, though, is that when they have in-built biases, many who are deserving, entitled, and of merit lose out. And they may be impaired in their lives, potentially beyond repair, if they aren’t treated fairly. Those who are advantaged will thrive pretty much whatever the system. Those who are more precarious are constantly facing life risks that can be forks in the road. If it goes wrong, they may not only miss the opportunity to progress or to be secure in that moment; they may also be left demoralised, impaired and understandably mistrustful. The impacts are not simply immediate; they are quite possibly long-term.
When it was decided a few months ago that teacher assessments would replace examinations, the biggest concern was bias against students who may have been assumed to have lower performance, and bias in that process against certain backgrounds. We don’t yet know the degree to which that may have been a factor in relative scores, where such pupils may have gained from teacher assessment in absolute terms but finished lower in the pecking order in relative terms. Yet the overall biases in the system, such as that created by leaving smaller class sizes unadjusted, benefiting public schools over FE and Sixth Form Colleges, seem likely to have swamped this risk.
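To make the class-size effect concrete, here is a deliberately simplified sketch. It is not Ofqual’s actual model: the function name `moderate`, the grade list, the small-cohort threshold of 5 and the example figures are all assumptions chosen purely for illustration. It shows only the general shape of the problem, where large cohorts are fitted to their school’s past grade distribution while small cohorts keep their teacher-assessed grades.

```python
# Illustrative only: a toy moderation rule, NOT the model Ofqual used.
GRADES = ["A*", "A", "B", "C", "D", "E", "U"]  # best to worst


def moderate(teacher_grades, historical_shares, small_cohort=5):
    """Return moderated grades for one class.

    teacher_grades: teacher-assessed grades, best-ranked pupil first.
    historical_shares: hypothetical past share of each grade at the school.
    small_cohort: assumed threshold at or below which grades are left alone.
    """
    n = len(teacher_grades)
    if n <= small_cohort:
        # Small classes are left unadjusted: teacher grades stand.
        return list(teacher_grades)

    # Larger classes: hand out grades by rank so that the class
    # matches the school's historical distribution.
    slots = []
    cumulative = 0.0
    for grade in GRADES:
        cumulative += historical_shares.get(grade, 0.0)
        target = round(cumulative * n)
        slots.extend([grade] * (target - len(slots)))
    # Guard against shares that do not sum to exactly 1.
    slots.extend([GRADES[-1]] * (n - len(slots)))
    return slots[:n]


past = {"A": 0.1, "B": 0.3, "C": 0.4, "D": 0.2}  # made-up history
small_class = ["A", "A", "A", "B"]
large_class = ["A", "A", "A", "B", "B", "B", "B", "C", "C", "C"]

print(moderate(small_class, past))  # teacher grades unchanged
print(moderate(large_class, past))  # most pupils pulled down by past results
```

In this toy case the small class keeps three As, while in the large class two of the three pupils assessed at A are moved down and the lowest-ranked pupils fall to D, regardless of their individual merit.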
Was the risk of social and demographic biases assessed before the algorithms were run? Was any ethical analysis undertaken of the impossible task of evaluating individual performance from past group (school) performance? If these assessments of bias risk and ethical deficits were not undertaken, then that says something far more concerning than anything about Ofqual or the SQA alone. It says something about the unregulated and unsystematic way in which algorithms are applied despite potentially deleterious consequences. It is often claimed that algorithmic systems are made safe by having ‘a human in the loop’. Not this time. In fact, humans were looped in every which way; humans are very much part of the problem. The issue is that regulation, ethics and equality were not looped in.
Technical fixes, such as limiting variation through moderation to one grade rather than up to three and a robust pre-appeal system, might have helped in advance of the publication of grades. But the moment has passed: confidence in the system couldn’t be restored, and for very good reason. This does mean some grade inflation and we shouldn’t be happy with that. However, as things stand, we simply do not have a way of moderating that inflation that is ethically just from an individual standpoint, or from the perspective of disadvantaged groups. So today’s U-turn is welcome. The in-built ethical bias to counter free-riding has now been reversed to give the benefit of any doubt instead: we will all benefit from the positive impacts on individuals in the long run. The other way around and the damage would have been incalculable.
Universities may well struggle to cope from a capacity point of view as student numbers massively increase. However, every employer across the land has had to adapt at short notice to distance working. It doesn’t seem beyond the realms of possibility that universities can develop distance learning and tuition for those students happy to work in that way at short notice for the first year at least. This cohort will be used to it. It’s way beyond time that universities moved away from one standard product structure anyhow.
There is a bigger debate that is needed. Algorithms will be a critical part of life wherever a large volume of decisions needs to be taken. They already are: almost no interaction with large-scale systems such as energy, retail, transport, social media, finance, and often work, is without algorithmic mediation. The lack of regulation to ensure transparency, assessment of bias, and ethical discussion is deeply concerning. We urgently need to consider how to properly regulate the algorithmic world in which we find ourselves, so it can work to our collective advantage without the deep failures we have seen.