Low data literacy is an important barrier to big data and open data projects realising their promise. Building on the experience of engaging Warrington in the RSA’s recent Heritage Index, I suggest five steps towards responsible data publishing.
Are you getting tired of hearing about ‘big data’ and ‘open data’? Or are you confused and still trying to figure out what these terms mean? Or perhaps you are both tired and confused?
Let me tell you a story about Warrington. The story shows the stumbling blocks facing those who want to use big and open data to help understand the world in a new way. In a nutshell, data literacy is low among the public, among journalists who report the news, and among many professionals. Given this reality, researchers and those reporting on research need to take on certain basic responsibilities to realise the promise of big data (huge datasets: read this) and open data (free and unrestricted for the public: e.g. Wikipedia). At the end of this blog, I suggest five steps towards responsible data publishing.
So what happened in Warrington?
We’re working on an ambitious project in partnership with Heritage Lottery Fund. We want to help places make the most of their heritage. In an era of increasing local powers in the UK, it’s increasingly important that councils, community groups and citizens shape their future in a way that reflects local identity - making them locally distinct. Heritage is the USP of a place.
Our first tool published was the Heritage Index, which pulled together over 100 indicators from many different datasets. The Index scored each local council district on its density of heritage assets and activities (per person and per square mile). We put the data into a series of maps, as well as publishing a data explorer (and technical report) with all the ‘raw’ data, detailed tables, and a feature that allows the weightings to be changed across the different components of heritage.
As you’d imagine, we put out a press release at time of launch. The easiest way to understand a very complex set of calculations is to express the results as rankings, which basically means all scores are simply reported as relative to other places. Someone has to come last. Luton came last. We prepared for this, writing a blog in advance of the launch.
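To make the link between scores, weightings and rankings concrete, here is a minimal sketch in Python. It is not the Heritage Index method (that is set out in the technical report). It assumes each indicator is rescaled to a 0-1 range and then combined as a weighted average, and the district names, indicator names and numbers are all made up for illustration.

```python
def min_max(values):
    """Rescale values to the 0-1 range; all-equal inputs map to 0.0."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def rank_districts(districts, weights):
    """Return (name, score, rank) tuples, best score first.

    districts: dict of name -> dict of indicator -> density value
    weights:   dict of indicator -> non-negative weight (need not sum to 1)
    """
    total_weight = sum(weights.values())
    if total_weight <= 0:
        raise ValueError("at least one weight must be positive")
    names = list(districts)
    scores = dict.fromkeys(names, 0.0)
    for indicator, weight in weights.items():
        scaled = min_max([districts[n][indicator] for n in names])
        for name, value in zip(names, scaled):
            scores[name] += weight * value / total_weight
    ordered = sorted(names, key=lambda n: scores[n], reverse=True)
    return [(n, scores[n], i + 1) for i, n in enumerate(ordered)]


# Made-up densities for three placeholder districts (not Heritage Index data).
districts = {
    "District A": {"assets": 4.0, "activities": 1.5},
    "District B": {"assets": 1.0, "activities": 3.0},
    "District C": {"assets": 2.0, "activities": 1.0},
}

# Same data, two different weightings.
equal = rank_districts(districts, {"assets": 1, "activities": 1})
assets_only = rank_districts(districts, {"assets": 1, "activities": 0})

for label, table in [("equal weights", equal), ("assets only", assets_only)]:
    print(label, [(name, rank) for name, _, rank in table])
```

With equal weights the placeholder District C comes last; counting assets only, District B does. The data has not changed, only the weighting, which is why a single headline ranking can mislead.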
The Times and then the Daily Mail quickly wrote stories under a one-dimensional headline about Warrington, which had the lowest score in England if you just measure heritage assets (the overall Heritage Index score for Warrington was in the bottom 10 scores among English districts – the overall score being a combination of assets and activities). The two newspapers took the results of a heritage data project and reported them as being about culture. Most other media outlets followed suit as the story mushroomed. Even the self-regulated commons of Wikipedia.
Many people in Warrington were understandably upset. Digging into those 100 indicators, we were quickly alerted to two shortcomings of the source data. First, there are canals that don’t show up in the government’s own published dataset. While listed simply as ‘canals’, it turns out that data is originally sourced from the Canal & River Trust (one of many organisations that own and/or manage canals). I said in the Warrington Guardian that this omission misrepresents Warrington, that it underestimates the true reality and that it is therefore a shortcoming. But this is inevitable when undertaking an exercise of this scale for the first time. Visiting each district to verify each statistic used in the Heritage Index would be impossible. Second, the data source we used to understand the location of accredited museums also missed one of two in Warrington.
So what did we learn? The media thrives on disputes. If people or organisations are in disagreement, that is news. Even better if they are angry at one another. It is as if the only acceptable resolutions to such arguments are apologies or resignations.
We knew the Heritage Index would be provocative, and guessed it might be controversial. The first step in prompting action is to get people talking. But what has been surprising is the tone of many reactions. In summary, they often come down to: “admit you are wrong”.
And so the story runs and runs: now Councillors are joining the debate to request responses from one another. I spoke to a packed room of heritage enthusiasts in Warrington on November 21st. Fewer than one in five of those attending had looked directly at our research, but they’d almost all read the papers. This shows the value to us in using the media to get visibility, but their frustration also reflects the risk that well-intentioned research will be ‘used’ by the media.
At the session many took on board my view that while the Heritage Index wasn’t able to perfectly reflect every asset in Warrington, this was due to the quality of published data available. I don’t think we were irresponsible; but we were constrained. Other people use the same data sources and therefore inherit the same limitations. Changing the Index next year will mean working with the original data owners rather than simply overriding the numbers when local knowledge comes to light.
For Warrington, rather than waiting to see how the numbers change with the inclusion of overlooked canals, some locally see the opportunity to channel attention into action: “this can be a springboard for us” was one of the positive contributions from Warrington’s gathering.
Wider lessons for data publishing projects
As Vic Reeves joked, 88.2% of statistics are made up on the spot. “Lies, damned lies, and statistics” we often sneer, especially at moments when the complexity of numbers exceeds our skills. Combine this with low and declining trust in the UK media (here’s the data) and it’s a recipe for data disaster. We need to support initiatives (like https://fullfact.org/) and appreciate the scrutiny of independent voices (like Jonathan Portes).
As the data shows, data is more frequently becoming something we look for (at least online). Ironically, I can’t find much data on data literacy. The closest related figures are on numeracy, and this might be the most shocking statistic you read today. 49% of UK adults lack basic functional numeracy (Level 1; source here: Table 8.15). This means they (you?) might struggle with basic percentages, or to understand the calculations used on a pay slip.
Data is really important. We can use it to hold each other to account. But data alone is not enough. Data becomes information when people generate insights to understand and interpret differences in data. A recent RSA release allows people to easily understand complex health data on how people with serious mental health conditions are faring in their local area. While we increasingly crave the numbers behind the story, this only becomes meaningful if we have the story behind the numbers as well. As I wrote in the current edition of the RSA Journal, the promise of ‘giving’ power to local communities will only be realised if people can access and use information on an equal footing to those more powerful.
In Warrington, as with everywhere else, a world of open data and big data is promising, but only if we know how to use numbers to tell stories for ourselves. Any new information, released to the world, will serve to advantage some people and disadvantage others. If we don’t invest in greater data literacy for all, then simply making things more transparent can mean a new tool sits only in the hands of those with the power to use it.
Here are five principles to keep in mind when publishing data:
- Reference all your sources and publish all your workings. Just like your teacher used to say in school, it is important that others can follow the decisions that you have taken in analysing data. Making data open means being open to critique. With the Heritage Index, people were more often upset with an error which could be traced back to the original source data, which we made available. In fact, as much as people didn’t like the results for their area, no one challenged us on the calculations and weightings we applied.
- Turn numbers into visuals. Remember: many people struggle with numbers, but most people are good with visuals. Use the speed and power of the human brain: almost everyone who engages with your work will be pressed for time when they do so. Of course, visuals can be manipulated and can often be hard to understand. Your choices are important. The best way to make the right choice is simply to test some alternative options, early on, with people unfamiliar with the project. Increasingly you can get this free, online (see here for graphs, and here for our favoured mapping tool).
- Answer the ‘where am I?’ question. Structure your data so that your audience is able to find themselves – their home postcode, their age bracket, their occupation. How are they doing compared to other people like them? How do the figures in one locality stack up against the rest of the country? Good data projects are similar to good storytelling or great journalism: within one minute of engaging, you can understand how your own experience fits into something bigger.
- Answer the ‘so what?’ question. This is what the media will focus on. Think back to when you began your data project…why are you publishing data or analysing it in a certain way? What do you want to happen after people view your data? Often, a better understanding of data inspires action. Show people the way to the actions they might want to take, and recruit volunteers or partners who might be better placed than you to help.
- Listen to feedback and use it to strengthen your project. Data projects don’t end when you publish. Plan for this. The reactions you provoke, positive and negative, are hugely valuable. Admit that no methodology is perfect and no data is sacred. Make the methods open and invite people to improve upon them. Collect impact stories of where your data is helping people, and be generous in lending a hand to those who feel your data project will disadvantage them.
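For anyone building their own data explorer, the ‘where am I?’ principle above can be sketched in a few lines of Python. This is an illustration under stated assumptions rather than how our explorer works: the function name `where_am_i`, the district names and the scores are all hypothetical.

```python
def where_am_i(scores, place):
    """Describe how one place compares with every other place in `scores`.

    scores: dict of place name -> score (higher is better)
    """
    if place not in scores:
        raise KeyError(f"No data for {place!r}")
    own = scores[place]
    others = [s for name, s in scores.items() if name != place]
    # Rank counts only places that score strictly higher, so ties share a rank.
    rank = 1 + sum(1 for s in others if s > own)
    below = sum(1 for s in others if s < own)
    share_below = below / len(others) if others else 0.0
    return {
        "place": place,
        "score": own,
        "rank": rank,
        "out_of": len(scores),
        "share_below": share_below,
    }


# Made-up scores for placeholder districts (not real results).
scores = {
    "District A": 62.0,
    "District B": 48.5,
    "District C": 48.5,
    "District D": 30.0,
}

me = where_am_i(scores, "District B")
print(
    f"{me['place']} ranks {me['rank']} of {me['out_of']} "
    f"and scores higher than {me['share_below']:.0%} of other places."
)
```

The design choice is to answer in a sentence, not a table: the reader names their place and gets back their position relative to everyone else.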
You can follow Jonathan @JSchifferes