With algorithmic decision making likely to become a bigger feature of government, Gavin Freeguard urges the government to learn from the shambolic handling of A-level results
The discussion and use of algorithms in government have been growing steadily in recent years, but until this week public awareness of the role algorithms actually play had not kept pace. The fiasco over A-level, BTEC and GCSE results has changed that.
Some 40% of A-level grades across England, Wales and Northern Ireland have come out lower than those estimated by teachers, with high-performing students in under-performing schools, and students in fast-improving schools, particularly affected. After initially insisting there would be no U-turn, the government has now said teacher-assessed grades will be used instead for A-level and GCSE results. Chaos continues as universities deal with the fallout.
The government is now trying to solve the problems caused by its approach to exams, but – with No.10 clearly focused on the more extensive use of data in government – these problems could be replicated elsewhere. Here are four key lessons government needs to draw.
Blaming The Algorithm, as if it were some shadowy, omniscient actor in its own right, is too easy and completely wrong. It is certainly not omniscient: Ofqual’s own analysis suggests that even at its most accurate – for History grades – the algorithm got one in every three grades wrong.
But more importantly, algorithms are made by humans. Humans decided what the rules should be and what data should be used. Humans should also have been well aware of the limitations of the data available, what the problems were likely to be and the need to mitigate them (government was warned of this, by the Royal Statistical Society and the Education Select Committee, among others). Like anything made by humans, there will be good and bad algorithms, used well or badly.
Focusing on the algorithm also elevates what could have been part of a system to being the system itself. The algorithm could have been useful in informing a wider approach; it should never have been treated as some magical mathematical, statistical, technological solution to the extremely difficult problem of not being able to test students properly. The fanfare around the NHS contact tracing app points to similar ‘tech solutionism’. The fallout also highlights the importance of being informed by data and not blindly, relentlessly driven by it. This will also be key in the public accepting the use of new technology and techniques – according to government polling, 80% of the public would be comfortable if artificial intelligence (which relies heavily on data and algorithms) were used to help a human doctor: this falls to just 19% if used instead of a doctor.
Bias is always a risk when any decision is made. Human beings are obviously susceptible to it. But there are particular concerns about algorithms using historical data, and entrenching and embedding existing bias. The UK government’s own data ethics framework identifies this risk.
Using data on schools’ performance from the last three years meant that high-performing students from under-performing schools, and schools that had improved their performance rapidly in recent years, were penalised by the algorithm. The small class sizes typical of private schools meant teacher-assessed grades were likely to be given more weight (or used outright) in awarding final grades. Existing social and educational inequalities were therefore exacerbated in this year’s A-level results. The current government has committed to ‘levelling up’ the UK and tackling regional inequalities – the original A-level results instead 'levelled down' those areas already suffering from inequality.
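The mechanism behind this penalty is worth making concrete. The following is a deliberately simplified sketch – not Ofqual’s actual model, and the student names, grade scale and function are invented for illustration – of how moderating this year’s cohort to fit a school’s historical grade distribution caps a high-performing student at whatever the school achieved in the past:

```python
# Hypothetical sketch (NOT Ofqual's actual model): moderate a cohort's
# results to match the school's historical grade distribution.
# Grades on a numeric scale: A* = 6, A = 5, B = 4, C = 3, D = 2, E = 1, U = 0.

def moderate(teacher_ranks, historical_grades):
    """Assign this year's students (ranked best-first by their teachers)
    the grades the school achieved historically, in descending order."""
    assert len(teacher_ranks) == len(historical_grades)
    sorted_grades = sorted(historical_grades, reverse=True)
    return dict(zip(teacher_ranks, sorted_grades))

# A school whose past cohort mostly achieved Cs and Ds: one B, two Cs, two Ds.
historical = [4, 3, 3, 2, 2]
# This year's students in teacher rank order (names are invented).
students = ["Asha", "Ben", "Cara", "Dev", "Ema"]

result = moderate(students, historical)
# Asha's teacher may have predicted an A* (6), but the school's best
# historical grade was a B (4) - the moderation caps her there.
print(result["Asha"])  # 4
```

However good Asha's teacher assessment, no grade above the school's historical best can survive this kind of moderation, which is exactly how existing school-level inequality becomes individual-level penalty.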
Proper assessment of such algorithms, by human beings, could force us to confront the potential for bias early in the development of a new system, and design in protections from and mitigations of harm from the start.
After the immediate controversy about the results came a second: the cost of schools appealing some of the grades, and the fact that this too could have hit the already disadvantaged hardest. Where algorithms are used across government, there should be a fair and robust appeals process.
In order to be able to appeal, we should understand how the algorithm came to make such a decision – there should be openness about the data used, the rules and processes applied, even the software involved, and how the output of that process was used. Governance should be clear – who is responsible for deciding how algorithms are used? But we currently lack some very basic transparency about algorithms in government. The Commons Science and Technology Committee has recommended that government "produce, publish, and maintain a list of where algorithms with significant impacts are being used". The government rejected both this proposal and a ‘right to explanation’ (allowing people to understand how decisions about them were made) in September 2018. It should revisit those decisions.
All of these factors add up to something larger: if the government continues to make serious mistakes in its use of data and technology, it will squander public trust and with it the opportunity to benefit from these new tools and techniques (a forthcoming Institute for Government report explores the opportunities for policy making from new technology). Algorithms can, if used properly, improve the speed and accuracy of decision making – HMRC uses an algorithm to help identify instances of tax evasion, for example, saving time and money by more effectively targeting raids on businesses.
The government acknowledges the need to get the balance right. In the foreword to its 2019 report on algorithmic decision making, the Centre for Data Ethics and Innovation (CDEI) noted that "data-driven technology" like artificial intelligence and algorithmic systems:
"has the potential to improve lives and benefit society but it also brings ethical challenges which need to be carefully navigated if we are to make full use of it…"
That foreword was written by the chair of CDEI, Roger Taylor. He also happens to be the chair of Ofqual, one of the organisations responsible for the A-level farrago.
We have said before that government needs to earn the public’s trust when it comes to using personal data, and the same is true when using new technology and techniques to make decisions about people. It starts on the back foot now that ‘algorithms’ will forever be associated with downgrading and disappointment.
Technologists, ethicists, journalists and others have been warning of the dangers of algorithmic decision making for years – and both the Committee on Standards in Public Life and the CDEI have recently published reports on the subject. But many of the case studies and examples have not felt directly relevant to most people (such as Amazon’s recruitment algorithm, which favoured men over women, or ProPublica’s groundbreaking work on racial bias in criminal sentencing in the US). The A-level farce will bring many of the issues home to a UK audience, while also providing a world-beating example of the tangible effects of algorithmic bias.