
Insight paper

Policy making in the era of artificial intelligence

Technology is changing and so should the civil service.

Keir Starmer has said he wants the UK to be an 'AI superpower'. Its greater use in policy making has the potential to transform how the civil service works.

The rapid rise of generative AI has the potential to fundamentally reshape how the civil service works, bringing with it both benefits and risks. Policy making is one of the areas where the change could be most impactful.

Introduction

The 1854 Northcote-Trevelyan Report, in effect the civil service’s founding document, envisaged one of the central roles of an “efficient body of permanent officers” as being to “advise, assist, and, to some extent, influence” ministers in designing government policy. In the intervening 170 years the civil service, like the process of policy making that remains one of its core functions, has changed considerably. Today, amid Sir Keir Starmer’s call for the “complete re-wiring of the British state”, it once again needs to change to match the demands placed upon it by rapid technological advances – most notably the rise of generative artificial intelligence.

The Central Digital and Data Office has described generative AI as having “the greatest level of immediate application in government”. 31 Central Digital and Data Office, Generative AI Framework for HMG, Gov.UK, 18 January 2024, https://www.gov.uk/government/publications/generative-ai-framework-for-hmg There is little doubt that it will become firmly embedded in the policy making process – as Cabinet Office minister Pat McFadden recently argued, “we are on the cusp of the next technological revolution… AI is set to transform the way people work”. 32 P McFadden, “Reform of the state has to deliver for the people”, speech at UCL Stratford, 9 December 2024, https://www.gov.uk/government/speeches/reform-of-the-state-has-to-deliver-for-the-people  But questions remain as to how generative AI, and particularly large language models (LLMs), can be most effectively used and what that means for the human policy maker.

Large language models are already here

Policy work across Whitehall is already being usefully augmented by LLMs. 37 Central Digital and Data Office, Generative AI Framework for HMG, Gov.UK, 18 January 2024, https://www.gov.uk/government/publications/generative-ai-framework-for-hmg The tools available include Redbox, which can summarise the policy recommendations in submissions and other policy documents and has more than 1,000 users across the Cabinet Office and Department for Science, Innovation and Technology; Consult, which the government says summarises and groups responses to public consultations a thousand times faster than human analysts; and Parlex, which helps bill teams develop parliamentary management plans. 38 Incubator for Artificial Intelligence, “AI in the policy profession”, presentation at civil service policy festival, November 2024, www.youtube.com/watch?v=_xLrvDD1oOw

These tools have been customised for use in government by the Incubator for Artificial Intelligence (now part of the Government Digital Service) and are more effective than the publicly available, free-to-use version of ChatGPT, which remains many people’s mental model of what LLMs can do. 40 For example, Redbox is a “retrieval augmented generation” app, which allows users to upload their own documents to provide the tool with additional context, improving its output. In the future it is highly likely more sophisticated tools will emerge, including AI ‘agents’ that can perform multiple tasks to achieve a specified goal (for example, searching the internet for high-quality information about benefits reform, then compiling the findings into the first draft of a submission).  
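The footnote’s description of Redbox as a “retrieval augmented generation” app can be illustrated with a toy sketch. Redbox’s internals are not public, so everything below is an assumption for illustration only: a real system would use neural embeddings and an actual LLM call, where this sketch uses simple bag-of-words similarity to decide which uploaded passages to prepend to the prompt.

```python
from collections import Counter
import math

def embed(text):
    # Toy embedding: bag-of-words term counts.
    # Real RAG systems use dense neural embeddings instead.
    return Counter(text.lower().split())

def cosine(a, b):
    # Cosine similarity between two term-count vectors.
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def retrieve(query, documents, k=2):
    # Rank the user's uploaded documents by similarity to the query.
    q = embed(query)
    ranked = sorted(documents, key=lambda d: cosine(q, embed(d)), reverse=True)
    return ranked[:k]

def build_prompt(query, documents):
    # Prepend the most relevant passages so the model answers from them,
    # not just from its training data.
    context = "\n".join(retrieve(query, documents))
    return f"Context:\n{context}\n\nQuestion: {query}"

docs = [
    "The submission recommends reforming hospital discharge procedures.",
    "Annex B lists stakeholder responses to the consultation.",
    "Unrelated minutes from a finance committee meeting.",
]
prompt = build_prompt("What does the submission recommend?", docs)
```

The point of the pattern is that the model is steered to answer from the user’s own documents rather than its training data alone, which is what improves the output.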

So, LLMs are already changing parts of the policy making process and are likely to continue improving. But what does that mean for the role of the policy maker? 

Not if or when: policy making is changing now

The question is not if or when AI will change how policy is made, but how policy makers can use it to improve outcomes for citizens. The impact will be extensive but not total. There are some parts of the policy making process where, for now, the role of the policy maker is relatively unaffected – like officials using their judgement to navigate the competing interests and idiosyncrasies of Whitehall to get things done. 

If an LLM were asked what steps a policy maker should take to ensure a policy was introduced, it could provide a useful top-level playbook. But it would have limited strategic insight into the specifics of how to navigate often unspoken power dynamics, like which officials and ministers are making the key decisions, what their views are, how those views make their way through the system both formally and informally – and what actually needs to happen to make sure the policy gets through the Whitehall machine. 

In other areas, the effect of LLMs will be more apparent and immediate. First, they will change how Whitehall itself works. For example, tools like Redbox can dramatically reduce the time it takes for a minister to learn about a new topic – alongside commissioning an official, they can ask an LLM. This challenges the traditional ways officials manage the flow of information to ministers. 

That could be extremely positive. More informed ministers should be better able to constructively challenge officials. But it could also arm them with superficial talking points on any conceivable topic, accessed at the touch of a button, to which they become overly attached. This has the potential to make it harder for officials to give good advice and reduce the ability of private offices to strategically prioritise the information ministers receive.

Second, LLMs will change the intellectual process by which policy is constructed. In particular, they are increasingly useful (and so increasingly being used) to synthesise existing evidence and suggest a policy intervention to achieve a goal. A live demonstration of Redbox at the 2024 civil service Policy Festival showed it analysing a document outlining problems with the operation of the National Grid and summarising ideas from an Ofgem report on how to improve it. 46 Incubator for Artificial Intelligence, “AI in the policy profession”, presentation at civil service policy festival, November 2024, www.youtube.com/watch?v=_xLrvDD1oOw  

Similar tools are already used abroad. The Singaporean government has Pair Chat, which helps officials with “writing emails, conducting research and generating ideas”, and Ideathon, which goes “beyond the ChatGPT-based model that powers Pair Chat… built on a set of multiple models [it] enables users to ideate freely” with features including the ability to adjust how creative its answers are. 47 GovTech Singapore, “Productivity and Marketing”, Gov.SG, (no date), www.tech.gov.sg/products-and-services/for-government-agencies/productivity-and-marketing/  

Comparable tasks are also being augmented in the private sector. Goldman Sachs CEO David Solomon recently described how 95% of the work of drafting an S-1 – a registration form for a company preparing to be listed on a US stock exchange – can be done by AI in minutes. Previously, he said, it would take “a six-person team two weeks to complete”. 48 G Hammond, “Goldman Sachs chief David Solomon questions start-ups’ need to list”, Financial Times, 16 January 2025, www.ft.com/content/4f20fbb9-a10f-4a08-9a13-efa1b55dd38a McKinsey has a custom LLM called Lilli, which its director of product has described as “saving an average of up to 30% of consultants’ time by streamlining information gathering and synthesis” while meaning “the quality of their insights is substantially better.” 49 K Lakner and E Roth, “We spent nearly a year building a generative AI tool. These are the 5 (hard) lessons we learned”, Fast Company, 6 November 2024, https://www.fastcompany.com/91138609/we-spent-nearly-a-year-building-a-gen-ai-tool-these-are-the-5-hard-lessons-we-learned  

Interviews with business leaders conducted while producing the AI Opportunities Action Plan, 50 M Clifford, AI Opportunities Action Plan, Gov.UK, 13 January 2025, https://www.gov.uk/government/publications/ai-opportunities-action-plan/ai-opportunities-action-plan launched in January, found that many of their organisations are “using AI assistants to do repetitive tasks better and faster, freeing up to 20% of an employee’s time”. 

If LLMs can adequately get to grips with a topic and produce a written output on it in seconds rather than hours or days – rapidly performing the sort of task that has traditionally been core to a policy maker’s day-to-day existence – they could have the potential to reshape the policy maker’s role in a more fundamental way than is often realised.

Even the most advanced LLMs have limits

However, while LLMs are advancing quickly – with ‘reasoning models’ like OpenAI’s o3 and DeepSeek-R1 being released during the drafting of this paper – and some of their current shortcomings might prove temporary, there remain limits to what they can do.

First, while LLMs can synthesise a wide range of sophisticated information, their subsequent output can be wrong, occasionally wildly so (known as ‘hallucination’). Problematically, inaccurate answers are often presented in ways which sound plausible and require domain knowledge to detect. LLM outputs might also contain biases which officials need to correct, with some experts particularly concerned about them incorporating unfair assumptions about certain demographic groups. 

Second, because LLMs are trained on available written information, their outputs can lack the nuance and context that human experience can provide. For example, designing new policy to increase the efficiency with which hospitals are run requires possessing advanced knowledge about healthcare policy, of the sort LLMs are increasingly capable of summarising. But it also requires ‘insider’ insight into the way hospitals actually work – to provide vital context like what parts of the system are currently being gamed and how, and an understanding of how doctors, nurses and administrative staff will respond to any changes. To achieve its desired effects, any policy needs to take into account both the specific institutional dynamics at play and the potential second-order effects which might arise from them. These remain areas which AI can struggle to grasp.

LLMs also tend to provide ‘standard’ answers, struggling to capture information at the cutting edge of a field or to provide novel ideas. There tends to be less written information about a field’s frontier developments for models to base answers on, and what does exist is usually newer and so potentially after an LLM’s knowledge cutoff. 53 Some models, for example the publicly available version of ChatGPT, are trained on data up to a certain date and do not possess knowledge of anything after that. Unless stretched by the user, they are unlikely to suggest more radical answers, and this has consequences, particularly in fast-moving areas of policy. 

Ironically, AI policy is one such area where the cutting edge is moving extremely quickly, and a policy maker with deep domain knowledge and a wide professional network, including at frontier firms 54 Organisations like OpenAI, Google DeepMind and Anthropic. and in communities where AI is being deployed on the ground, will have a sharper sense of how government policy should change than an LLM would usually be capable of suggesting.

Finally, over-credulously incorporating LLM outputs into the policy making process can be dangerous. Evidence, whether scientific, social or other, rarely points in one direction. Policy making involves officials weighing up competing claims and coming to a view about what to do. Senior officials and, at the point of decision, ministers, typically interrogate why policy makers have come to the conclusion they have, both as a form of quality control and to surface the trade-offs inherent in a recommended approach.

If done badly, incorporating LLM-generated outputs into the policy making process can risk hiding those trade-offs. If a policy maker using an LLM to generate a first draft of a submission fails to properly interrogate whether it has implicitly elevated some political principles over others, they risk building assumptions into their recommendations which run contrary to their minister’s views – ultimately meaning they will provide worse advice.

The role of policy makers will increasingly be to add expertise and new insights to LLMs

These are all good reasons for caution. But the potential benefits of using LLMs are so large that ignoring them is not an option. Humans are going to use LLMs to co-create policy. In an AI-augmented policy making process, the policy maker’s key role will be to introduce the knowledge that an LLM cannot.

Policy makers’ added value could manifest in two main ways. The first is in using their expertise to edit and shape LLM ‘first drafts’ – checking for and correcting hallucinations and untoward biases, or curating a long-list of LLM-generated policy options. It also means interrogating the trade-offs implied by an LLM’s output and checking it is aligned with the elected government’s values. This is not dissimilar to what the best policy makers currently do – humans, too, make mistakes and betray biases in their work, and checking an LLM’s output is much like checking another official’s.

The second is by layering policy makers’ ideas and judgement on top of LLM outputs, sometimes being prepared to push them in a more radical direction. This could involve an interactive process, in which an LLM is asked to provide feedback on ideas produced by a policy maker. The time freed up by using LLMs to perform traditionally time-intensive tasks could give policy makers the opportunity to gather and deploy new types of information which can help craft better policy but might typically have been missing from the policy making process.

Particularly important will be the kind of hyper-specific or real-time insider insights which LLMs struggle to capture, which could be acquired in new and creative ways – spending time immersed on the frontline, building a professional network which can give real-time reactions to new developments, or something different entirely. 

This represents a change of emphasis. As Tom Westgarth, an AI policy adviser at the Tony Blair Institute, put it, it means that “a core value add for policy makers will be – what valuable information do I have that is not written down on the internet?”. Understanding how the complex and relational systems the government oversees actually work – including the hidden ways in which they are dysfunctional, because every system has some failures hidden from public view – will be particularly important.

Crucially, doing this well should mean developing a richer policy making process. By freeing up time typically spent on repetitive and low-creativity tasks, LLMs could give policy makers the opportunity to explore beyond the traditional constraints of the role, acquiring new insights and so developing new capabilities, leading to more complete policy and better results for citizens.

Of course, it will also require officials to be proficient at using LLMs. Getting the best out of these tools will require skills like ‘prompt engineering’ – designing instructions to maximise the chance of an LLM giving a useful and accurate response. Officials will also need to know when to trust and when to be sceptical of an LLM’s output, and how to use these tools in different scenarios – some use cases will demand more caution than others. 
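As a minimal sketch of what ‘prompt engineering’ can look like in practice – the template, field names and constraints below are invented for illustration, not an official government format – structured instructions are composed around the task before the source material is appended:

```python
def build_policy_prompt(task, source_text, audience="a government minister"):
    # Compose a structured prompt: role, task, explicit constraints, then the
    # source. Constraints like these aim to reduce hallucination and surface
    # trade-offs, though they cannot guarantee either.
    instructions = [
        f"You are assisting a civil service policy team. Audience: {audience}.",
        f"Task: {task}",
        "Constraints:",
        "- Base every claim on the source text below; do not invent facts.",
        "- Flag any point where the source is ambiguous or evidence is contested.",
        "- Present trade-offs explicitly rather than a single recommendation.",
    ]
    return "\n".join(instructions) + "\n\nSource:\n" + source_text

prompt = build_policy_prompt(
    "Summarise the policy options in no more than five bullet points.",
    "Consultation responses on grid connection reform...",
)
```

The design choice worth noting is that the constraints ask the model to expose uncertainty and trade-offs rather than hide them – directly addressing the risk, discussed above, of LLM outputs smuggling assumptions into advice.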

LLMs might make it harder for policy makers to acquire important skills 

For this vision to be realised, policy makers will need the capability to make it work. If domain expertise and insider insights are the things for which policy makers are increasingly valued, they must possess the commensurate skills.

But this presents something of a paradox: LLM adoption might not only make domain expertise even more important to possess, but also harder to acquire. It is precisely the activities that LLMs are so efficient at performing – gathering and synthesising existing evidence, and using it as the basis for actionable policy solutions – that policy makers have tended to use to acquire their first building blocks of expertise.

This also has consequences for policy makers’ ability to gather insider insights. It is all very well freeing up time for policy makers to collect information in new ways, but if they do not have a baseline level of expertise they will find it hard to know where to look for it and how to interpret it.

This leaves the civil service with two options. The first is to preserve some basic tasks for more junior officials so they can build the domain expertise needed to intelligently use LLMs. The second is to reinvent the way policy makers acquire expertise, reducing reliance on the (now AI-augmented) traditional methods. For example, just as in leading law firms partners take substantial responsibility for passing down knowledge to trainees, perhaps the civil service will need to institute a more apprenticeship-based approach to learning, in which senior officials take more responsibility for passing knowledge to junior ones. Or the type of official who is currently a junior policy maker could instead be deployed to the frontline, giving them personal experience of the operation of the state which they can use in a more conventional policy role in Whitehall once they get more senior.

Both options have pros and cons. Ringfencing some tasks might be a less efficient approach but is more easily implementable because it fits within the existing paradigm. Perhaps the most sensible approach would be for the civil service to start by ringfencing, but actively commission pilots and other ‘test and learn’ projects to explore more imaginative approaches, and scale those where they work. 

Supplementing this with more traditional solutions might help too. In particular, external experts who have worked in sectors outside government would enter the civil service with existing insider insights about that field. In the future, it might be even more important for the civil service to be ambitious about external recruitment. 56 J Urban and A Thomas, Opening Up, Institute for Government, 1 December 2022, https://www.instituteforgovernment.org.uk/publication/civil-service-external-recruitment

Conclusion

Policy making is among the most important and hardest jobs the civil service does, and improving how it is done is a substantial prize. The advice ministers receive helps determine the country’s trajectory, and improving the process by which that advice is generated would make the UK a better governed country. 

The adoption of AI is not a ‘nice to have’ but core to ensuring the civil service provides a good service to ministers and citizens. A policy making process which blends human expertise with LLMs will not just be more efficient, but more insightful and connected to citizens’ concerns.

The question is not whether LLMs should be used – they are useful, as shown by the fact that policy makers and equivalent workers in other industries are already using them. It is instead how to channel their adoption in the most productive way possible, maximising the benefits – which, for all the justifiable caution, are substantial – while mitigating the risks. Failing to do so means they will instead be used haphazardly, increasing those risks. 

Just letting change happen should not be an option. The civil service must proactively shape it.
 

Acknowledgements

Particular thanks to Tom Westgarth, Gavin Freeguard, Ben Warner, Dylan Rogers, Vernie Oliveiro, Andrew Bennett and Peter Kemp for giving feedback on earlier drafts of this paper. As always, their help does not imply endorsement and all views are the author’s alone. Thank you also to all IfG staff who helped in the writing and launch of this report – particularly Alex Thomas, Teodor Grama, Will Driscoll and Sam Macrory. 
