Over the last three months the Government has collected a whopping 405,000 responses to a single question: “How likely are you to recommend our ward/A&E department to friends and family if they needed similar care or treatment?” Responses are collected on a six-point scale from ‘extremely likely’ to ‘extremely unlikely’. This question is currently posed to all A&E and inpatient NHS users, but the plan is to expand it across the NHS. The data is then provided to hospitals to allow management teams to assess their performance and is displayed in a simplified form on the NHS choices website, allowing individual users to compare the performance of a particular department (Urology, for example) across the nearest few hospitals.
So will it work? The evidence is beginning to mount up that ‘naming and shaming’ poorly performing hospitals and schools does improve performance (see here for a summary of two natural experiments). We also know from rigorous econometric studies that the 2006 ‘choose and book’ reforms, which extended choice in hospitals, led to patients becoming more responsive to information on hospital quality. Publishing information on hospital performance seems a worthwhile aim, then. But, as we have learnt from the long and troubled attempts to get this right in schools and social care, getting the metrics right is often devilishly difficult.
Three Challenges for the Friends and Family Test (FFT)
Do people understand what they are being asked? The question is based on the Net Promoter Score (NPS). Researchers at the National Patient Survey Co-ordination Centre interviewed people as they responded to the NPS and found that some people interpreted it as recommending getting treatment for your illness, rather than the particular hospital. They also found that many people objected to the very idea of recommending something like a trip to hospital to their family. The researchers concluded that “Given these issues and academic doubts about the quality and validity of the NPS we are unable to recommend it for use in surveys of NHS patients.” Remarkably, the Government has stuck to the NPS.
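For readers unfamiliar with how an NPS-style score is built, the sketch below shows the general shape of the calculation: the percentage of ‘promoters’ minus the percentage of ‘detractors’. Note that the mapping of the FFT’s six verbal categories to promoters and detractors here is an illustrative assumption, not the official FFT methodology.

```python
# Illustrative NPS-style score on the FFT's six-point scale.
# The category-to-promoter/detractor mapping is an assumption for
# illustration only, not the official scoring rules.

def fft_style_score(responses):
    """Percentage of promoters minus percentage of detractors."""
    n = len(responses)
    promoters = sum(r == "extremely likely" for r in responses)
    detractors = sum(
        r in ("neither likely nor unlikely", "unlikely",
              "extremely unlikely", "don't know")
        for r in responses
    )
    return 100.0 * (promoters - detractors) / n

# Hypothetical ward: 60 'extremely likely', 20 'likely', 20 'unlikely'
sample = (["extremely likely"] * 60 + ["likely"] * 20 + ["unlikely"] * 20)
print(fft_style_score(sample))  # 60% promoters - 20% detractors = 40.0
```

One consequence of this design, visible in the code, is that the middle categories count fully against a hospital under some mappings and not at all under others — which is part of why the choice of scale and mapping matters so much.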
Does it give an accurate picture? On the NHS Choices website hospitals are ranked as either ‘better than expected’ (in the top fifth nationally), ‘among the worst’ (in the bottom fifth), or ‘in the normal range’ (everything in between). Because not everybody fills out the survey, however, the scores are only an estimate of the ‘true’ score (the one we would get if everyone filled in the survey). Ipsos MORI researchers have calculated that, if the survey hits its target response rate, we can only be sure this ‘true’ score falls within 8 percentage points either side of the published score. A quick look at the data released today shows that many hospitals sit within this margin of error on either side of the ‘amongst the worst’ cut-off point. In simple terms: many of the ‘worst’ hospitals will accidentally be labelled ‘OK’, and many of the ‘OK’ hospitals will accidentally be labelled ‘amongst the worst’.
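To see where a margin of error like this comes from, here is a rough sketch using the standard normal approximation for a proportion. The exact Ipsos MORI method and sample size are not given here, so the figures used (a score near 50% and roughly 150 responses) are assumptions chosen purely to illustrate the scale of the problem.

```python
import math

def margin_of_error(p, n, z=1.96):
    """95% margin of error (in percentage points) for an observed
    proportion p based on n responses, via the normal approximation."""
    return 100 * z * math.sqrt(p * (1 - p) / n)

# Illustrative only: a score near 50% with ~150 responses gives a
# margin of roughly +/- 8 percentage points.
print(round(margin_of_error(0.5, 150), 1))  # 8.0

# Quadrupling the responses roughly halves the margin:
print(round(margin_of_error(0.5, 600), 1))  # 4.0
```

The second call shows why pushing up response rates matters: the margin of error shrinks with the square root of the number of responses, so it takes four times as many responses to halve the uncertainty.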
Lastly, will hospitals just ‘game’ the system and manipulate the statistics? Very few hospital managers want to resort to this sort of behaviour, but the sharp, high-stakes cut-off point between ‘OK’ and ‘amongst the worst’ hospitals will put them under severe pressure. In fact, the better the policy works, the more pressure they will be under. Hospitals could presumably game the system by ‘merging’ badly performing wards with well-performing ones, to lift them above the cut-off point. Given that hospitals choose how to collect the data, they could also exploit this to manipulate their score. If a hospital manager saw, for example, that older people were more likely to give a bad rating, then the hospital could switch to a computer-based system for filling out the survey, to try to reduce the number of older people that respond.
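The merging tactic works through nothing more sophisticated than a weighted average. The numbers and the cut-off below are entirely hypothetical, but they show how a failing ward disappears from view once its responses are pooled with a stronger one:

```python
def merged_score(score_a, n_a, score_b, n_b):
    """Response-weighted average score of two merged wards."""
    return (score_a * n_a + score_b * n_b) / (n_a + n_b)

CUT_OFF = 60  # hypothetical 'amongst the worst' threshold

ward_a = (45, 100)  # (score, responses) -- well below the cut-off
ward_b = (80, 150)  # comfortably above it

combined = merged_score(*ward_a, *ward_b)
print(combined, combined >= CUT_OFF)  # 66.0 True
```

After the merge the reported unit scores 66 — above the hypothetical cut-off — even though one of its constituent wards would, on its own, have been flagged as among the worst.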
Publishing performance metrics has the potential to improve public services, but they must be accurate and reliable. Institute for Government research on public service markets shows just how difficult this is to get right. No metric is perfect, and it may be a matter of finding the least-bad option through gradual development and refinement. Pushing up response rates, for example, would help by making the scores more accurate. Heeding the research when developing metrics, and thinking carefully about how to reduce the opportunities for gaming, would also go a long way.