Scientists, and increasingly data journalists, are being asked to publish the data behind their work. What about think tanks? Petr Bouchal thinks they should follow suit.
There has been a boom in 'explainer journalism' over the past year. Several projects – such as Vox, FiveThirtyEight and The Upshot from The New York Times – have promised to explain the news to us, and many of them rely heavily on data to do so.

In some sense, think tanks have played this role for some time. They can build on their areas of expertise with a deeper, longer-term investment in understanding and analysing data than many media outlets can afford, and some have been increasing their presence in this space. Examples include the IFS's work on income distribution, the King's Fund's Quarterly Monitoring Report and the Resolution Foundation's feature on living standards. It is also what we at the Institute for Government are doing with Whitehall Monitor, which aims to explain how government works, what it looks like, how it has changed and how well it is performing. We aim to make our work accessible to a range of audiences, which is why we publish short blogposts, more detailed research reports and a comprehensive annual report on government performance, and why we also make our content available in smaller chunks on the web.

This type of data journalism has been criticised for the relative lack of transparency about the data and analysis behind what is published. The criticism applies to think tanks too, and it resonates with how we think about our own work: as we improve how we present data, particularly through better visuals, we are keen not to lose sight of the need to be transparent about our analysis. We have benefited from a plethora of guides, blogs and examples of how to visualise data, but there is far less thinking out there about how think tanks can, or should, publish the data and analysis behind their work. The best examples might be Shelter's Housing Databank and the IFS's Fiscal Facts resource. We have published our data since we started work on Whitehall Monitor, but we know we could do better.
Getting this right is particularly important for us because we want to set an example for government. We have repeatedly encouraged government to make its data better, more accessible, and easier to use and understand. We also want to make our data as useful as possible for others. We typically create our datasets by collating or mashing up data that comes in bite-size chunks and messy formats. Publishing our complete, clean datasets should allow a range of users inside and outside government to generate new insights and add to our work. To this end, we have debated a range of issues, from the technical to the more conceptual:
- What formats are useful?
- In what 'shape' should the data be published?
- What is the best way to publish metadata, or documentation around datasets more generally?
- Is there a particular standard or set of rules – whether around formats, documentation or making data open – that would be especially valuable to follow, or that others have experience with?
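To make the 'shape' question concrete, here is a minimal sketch of the kind of transformation involved, using Python's pandas library. The department names and figures are invented for illustration only; the point is the conversion from a wide, spreadsheet-style table (one column per year) into the long 'tidy' format that is usually easiest for others to filter, join and chart.

```python
import io

import pandas as pd

# A wide, spreadsheet-style table: one column per year.
# Department names and figures are invented for illustration.
wide_csv = """department,2012,2013,2014
Cabinet Office,2100,2050,1980
HM Treasury,1300,1280,1250
"""
wide = pd.read_csv(io.StringIO(wide_csv))

# Reshape to long ('tidy') format: one row per department-year observation.
tidy = wide.melt(id_vars="department", var_name="year", value_name="staff")
tidy["year"] = tidy["year"].astype(int)

# Publishing the long version as plain CSV keeps it machine-readable
# for users of any tool, not just spreadsheets.
tidy.to_csv("staff_numbers_tidy.csv", index=False)
```

A published dataset in this shape can be re-pivoted into any wide layout a reader needs, whereas the reverse is harder to automate; that asymmetry is one argument for releasing the long form.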
Naturally, we may not be able to build an all-singing, all-dancing online tool for cutting and analysing our data on the fly, but we certainly want to improve on what we do at the moment. We would welcome thoughts from others who may be grappling with similar issues, and from anyone who has used, or is interested in using, our data. Please feel free to leave a comment or drop us an email if you want to discuss this. We also believe that think tanks would benefit from an open discussion of these topics. This is our modest attempt at a start.