Doctors and aid organizations have complained that poor-quality data on the West African Ebola outbreak has made their jobs more difficult. Much of the detailed case data they would expect to work with in a Western country is indeed unavailable, but the affected countries do release some useful data, including confirmed cases by administrative district. This data has been hard to analyze because the countries in question (Liberia, Sierra Leone, and Guinea) publish it in Portable Document Format (PDF), which software cannot easily parse into structured tables. Caitlin Rivers, a PhD student in computational epidemiology at Virginia Tech, has set out to digitize these records by hand, posting the data from Liberia and Sierra Leone on her GitHub account and promising to post the Guinea data soon. Rivers is also writing a blog series analyzing and visualizing the data, and she hopes the data will help other researchers do the same.
Ebola Data, Machine-Readable at Last
Travis Korte is a research analyst at the Center for Data Innovation specializing in data science applications and open data. He has a background in journalism, computer science and statistics. Prior to joining the Center for Data Innovation, he launched the Science vertical of The Huffington Post and served as its Associate Editor, covering a wide range of science and technology topics. He has worked on data science projects with HuffPost and other organizations. Before this, he graduated with highest honors from the University of California, Berkeley, having studied critical theory and completed coursework in computer science and economics. His research interests are in computational social science and using data to engage with complex social systems. You can follow him on Twitter @traviskorte.