The Center for Data Innovation spoke with Sarah Telford, Chief of Data Services at the United Nations Office for the Coordination of Humanitarian Affairs (OCHA). Telford discussed the value of OCHA’s open data platform to the humanitarian sector and the obstacles humanitarian organizations face when it comes to sharing data.
This interview has been lightly edited.
Joshua New: At OCHA you run the Humanitarian Data Exchange (HDX), which the UN first deployed during the 2014 Ebola outbreak in Africa. What function does HDX serve that other kinds of open data portals don’t?
Sarah Telford: At the time, there was no platform that brought data together across organizations, locations, and sectors for humanitarian purposes. I was working on reporting for OCHA, which develops a variety of information products in its coordination role, like situation reports and bulletins. A lot of these were very narrative-based and back in 2013 we decided we wanted them to be more analytical, such as by using more maps and graphs, but we just couldn’t find the data to do this. We asked around if anyone knew where this data was but it became clear that if we wanted to solve this problem, we’d have to bring this data together in one place.
There were a lot of other initiatives at OCHA that focused on single crises: a data repository developed for an earthquake, another for a catastrophe in West Africa, and so on. This data wasn’t being maintained and we would lose track of it over time. We really wanted to figure out how we could make data easy to both find and use. Data use was the bigger challenge: was the data being maintained? Was it clean? Could we build a visual from it? The idea was to try to bring all these resources together and maintain the data so that we wouldn’t have to continuously create one-off solutions, which create a lot of inefficiencies.
New: What kind of data is considered “humanitarian”? Are these types of data only relevant after a major disaster, or does this kind of data have more sustained relevance?
Telford: We define humanitarian data in three ways. First, there’s baseline data that exists in a country before a crisis, such as vaccination rates or geospatial data. This pre-crisis data can paint a valuable picture of a country’s health. Then, there’s data about a crisis itself—information about where it happened, how many people were affected, and where assistance is needed. Third, there’s data about who is responding to the crisis, what kind of aid they are providing, and their impact. We started out with a lot of baseline data in HDX, and then over time we’re adding more and more data about crises themselves and data about who is doing what, and where they are doing it.
We now have over 4,000 datasets on HDX, and their applications are pretty diverse. For example, we just got satellite data about Aleppo, but we also have data from the Red Cross which is conducting surveys in Kenya about sanitation.
New: Other than making sure data is relevant, what factors are important for ensuring the data on HDX is as valuable a resource as possible to humanitarian workers?
Telford: There are a lot of challenges with dirty data, and the hardest kind to make use of quickly is survey data, which can tell us about the needs of a community after a crisis. This data needs to be processed before it can be shared so we’re increasingly looking at how we can improve the infrastructure that exists in between data collection and data sharing to make it simpler and quicker.
We have a data standard we created to help with this called the Humanitarian Exchange Language (HXL). We’re making a big bet on these standard formats: the more humanitarian data that uses HXL, the easier it will be to combine and share data about a crisis with humanitarian workers right when they need it. For example, we built a tool called the Map Explorer that can draw on a wide variety of data sources, ranging from conflict events to food prices, and organize this data around geography and time. We need to clean every single dataset and bring it together, which right now takes a few weeks. The tool is definitely useful, but we really want to be able to use it at the height of a crisis, meaning we need to make this process a lot faster.
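To make the idea behind HXL concrete, here is a minimal sketch in Python. It assumes the HXL convention of a row of hashtags (such as #adm1, #date, and #affected) placed between a spreadsheet’s headers and its data. The two sample datasets, their values, and the helper names read_hxl and combine are hypothetical and are not part of HDX or any official HXL tooling.

```python
import csv
import io


def read_hxl(text):
    """Return data rows as dicts keyed by hashtag rather than by header text."""
    rows = list(csv.reader(io.StringIO(text)))
    for i, row in enumerate(rows):
        cells = [c.strip() for c in row if c.strip()]
        # Treat the first row whose non-empty cells all start with '#'
        # as the hashtag row (an assumption of this sketch).
        if cells and all(c.startswith("#") for c in cells):
            tags = [c.strip() for c in row]
            return [dict(zip(tags, r)) for r in rows[i + 1:] if r]
    raise ValueError("no hashtag row found")


# Two invented datasets with different headers and column order,
# but the same hashtags.
SURVEY_A = """Province,Date,People affected
#adm1,#date,#affected
Region A,2016-11-01,1200
Region B,2016-11-01,800
"""

SURVEY_B = """Report date,Region name,Number in need
#date,#adm1,#affected
2016-11-02,Region A,300
"""


def combine(*datasets):
    """Merge datasets into records organized by geography, then time."""
    records = []
    for text in datasets:
        for row in read_hxl(text):
            records.append((row["#adm1"], row["#date"], int(row["#affected"])))
    return sorted(records)


if __name__ == "__main__":
    for record in combine(SURVEY_A, SURVEY_B):
        print(record)
```

Because the code looks up columns by hashtag, the differing header wording and column order in the two files do not matter, which is the kind of time saving a shared standard is meant to provide.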
New: How has HDX changed since its launch? Has it just grown by dataset count, or has the platform as a whole improved?
Telford: In the beginning, HDX was primarily just a collection of datasets. We would have useful data, such as health indicators in a country, but we’d just have it on the platform as a data file. Over time, we’ve been able to visualize more of these individual datasets and increase their utility, as well as do more across multiple datasets, like combining multiple data sources into a single tool.
The visualizations and increasing the usability of this data have been our most valuable changes, but we also now have over 250 organizations contributing data. The more we can incentivize these organizations to share data on the platform, such as by providing data visualization and data cleaning, the more we’ll be able to make humanitarian data useful for the non-technical crowd. There’s a pretty big opportunity gap between technical and non-technical users. During the Ebola crisis, we had about 30 datasets shared by different organizations working in West Africa, but it was basically just a list of data on our website. A non-technical person coming to HDX wouldn’t have had any way to even know what was in these datasets, let alone use this data. We wanted to build a product layer on top of the raw data, which is how we came up with our crisis pages feature. A disease crisis page, for example, can have a map of cases that displays their severity, a graph of deaths over time, and then the data itself. As HDX becomes more sophisticated, we’re able to do more of these kinds of data transformations.
New: Unless a company or organization is already supportive of the idea of open data, they might be hesitant to share data that they invested a lot of resources in collecting. Does this happen in the humanitarian sector? What other reasons might make a humanitarian group unwilling to share their data?
Telford: Fortunately, this doesn’t happen much in the humanitarian community. The organizations that are collecting data about crises, such as UNICEF or Save the Children, don’t do it for commercial reasons, so they already aren’t charging for this data. So while we fortunately don’t face that challenge, there is a big challenge when it comes to data quality, which might not arise in a commercial sector. Often, collected data needs a lot of processing before an organization is comfortable sharing it, which means that an organization sometimes isn’t prepared to share its data. So if we can come in and make the data processing step easier, we get more groups that want to share their data and we also can get data sharing to happen on an institutional level, rather than just between individuals.
There are a number of other data challenges as well, and our new Centre for Humanitarian Data will focus on challenges that fall into four areas: data services, data policy, data literacy, and network engagement.