10 Bits: The Data News Hotlist
This week’s list of data news highlights covers June 13-19, 2015 and includes articles about how researchers trained a machine learning system with news articles and how a university is mining student records to boost graduation rates.
The Food and Drug Administration (FDA) has announced a research partnership with PatientsLikeMe, an online patient community, to explore the potential for patient-generated health data to identify dangerous drugs. The FDA already collects data about instances of dangerous drug side effects, called adverse events, though its access to this information is limited to reports from patients who voluntarily report adverse events directly to the FDA and to aggregated data from insurers and electronic medical records. The FDA hopes this new partnership, which grants it access to 110,000 adverse event reports related to 1,000 different medications from PatientsLikeMe users who voluntarily share their data, will allow for a more detailed understanding of how patients take medicine.
India’s Income Tax Department is developing a new database designed to make it easier to mine personal and commercial financial records to crack down on tax evasion. The database will combine financial transaction data and human intelligence to help India improve the poor level of tax compliance in the country. Officials say the database will be fully operational this year.
Researchers from the University of Cambridge have developed algorithms they call the Corruption Risk Index (CRI) that can comb publicly available data on government procurement to identify potential signs of corruption. For example, rapid turnaround times for contract awards, a low number of bidders in an otherwise competitive industry, or unusually complicated procurement documents would all cause CRI to signal a higher probability of corrupt practices that warrant further investigation. The researchers hope to make CRI widely available to journalists and civil society groups trying to sift through large amounts of public data to fight corruption in government.
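To make the red-flag idea concrete, here is a minimal Python sketch of how the three signals named above could be combined into a single score. It is not the Cambridge researchers' algorithm: the `Tender` fields, the thresholds, and the equal weighting are all hypothetical, chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass
class Tender:
    # Hypothetical fields; real procurement datasets will differ.
    days_to_award: int    # time between call for bids and contract award
    bidder_count: int     # number of bids received
    document_pages: int   # crude proxy for how complicated the documents are


# Illustrative thresholds only, not values taken from the CRI research.
FAST_AWARD_DAYS = 7
FEW_BIDDERS = 2
LONG_DOCUMENT_PAGES = 200


def red_flags(tender: Tender) -> dict:
    """Return which of the three warning signs a tender triggers."""
    return {
        "rapid_award": tender.days_to_award < FAST_AWARD_DAYS,
        "few_bidders": tender.bidder_count < FEW_BIDDERS,
        "complex_documents": tender.document_pages > LONG_DOCUMENT_PAGES,
    }


def risk_score(tender: Tender) -> float:
    """Share of red flags triggered, from 0.0 (none) to 1.0 (all)."""
    flags = red_flags(tender)
    return sum(flags.values()) / len(flags)
```

A higher score here would only mark a contract for closer human review; it says nothing on its own about whether corruption actually occurred.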
The UK Environment Agency (EA) has announced it will make its valuable light detection and ranging (lidar) datasets available to the public. Organizations use lidar data to create precise, large-scale 3D models of terrain, vegetation, buildings, powerlines, and other infrastructure, for use in flood monitoring, asset management, and other urban planning issues. EA has lidar data covering 60 percent of England and Wales but has only made this partially available to non-commercial entities for the past two years. EA will make this data, as well as data from the Ordnance Survey, the UK’s mapping agency, freely available to all in September.
Researchers at Google DeepMind, an artificial intelligence company based in London, have developed a technique to train deep learning systems by feeding them articles from the Daily Mail and CNN. The researchers recognized that Daily Mail and CNN articles use an annotation format uniquely suited for training neural networks that otherwise require vast amounts of carefully curated data. The DeepMind team was able to train a neural network on hundreds of thousands of news articles to teach it to answer certain queries. The researchers expect this method will be useful in teaching machines to comprehend what they read, as existing training techniques typically require prohibitively large and labor-intensive databases.
Google has updated its Trends tool, which tracks the usage of search keywords, to provide users a real-time look into the over 100 billion searches made on Google per month. Trends also now pulls search information from Google News and YouTube to paint a more accurate picture of how people look for information on the Internet. Journalists and researchers can break down this trend data by geographic location and time period to study public awareness of particular issues.
Facebook has launched a computer vision-powered feature called Moments that can rapidly recognize faces across different photographs with a 98 percent accuracy rate. Moments does not identify users, but rather recognizes whether the same face appears in multiple photographs even if the face is partially obscured. Facebook expects Moments will make sharing large numbers of pictures with multiple people substantially easier.
Virginia Commonwealth University (VCU) is analyzing student records to identify students at risk of not graduating and get them the extra help they need. While students who fail classes are easy to identify, VCU’s analysis aims to identify at-risk students who withdraw from classes or have poor, but not alarmingly bad, grade point averages. VCU offers these students, who would otherwise fall through the cracks, tutoring or degree counseling. While it may be too early to accurately gauge the effectiveness of this strategy, after just one semester of using the new data analysis system, VCU saw a 16 percent increase in the number of students completing courses.
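The distinction between obvious and easily missed at-risk students can be sketched as a simple rule. The Python below is only an illustration under stated assumptions: the GPA cutoffs, the function name `classify`, and its inputs are hypothetical and do not describe VCU's actual system.

```python
# Hypothetical cutoffs, chosen only to illustrate the idea.
LOW_GPA = 2.0        # below this, a student is already clearly struggling
MARGINAL_GPA = 2.5   # poor, but not alarmingly bad


def classify(gpa: float, withdrawals: int, failed_courses: int) -> str:
    """Sort a student into one of three illustrative groups."""
    # Failing students are easy to spot without any extra analysis.
    if failed_courses > 0 or gpa < LOW_GPA:
        return "visible_risk"
    # Withdrawals or a marginal GPA are the quieter warning signs.
    if withdrawals > 0 or gpa < MARGINAL_GPA:
        return "quiet_risk"
    return "on_track"
```

In this sketch, the "quiet_risk" group corresponds to the students who would be offered tutoring or degree counseling.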
Capital Area Food Bank, a D.C.-based nonprofit, has turned to data and maps to better reach populations in need of food. By combining disparate datasets from the U.S. Census, national nonprofit records, and internal reports, the food bank was able to identify that, surprisingly, suburban areas had substantial need for food aid. For example, it was able to map a yearly unmet need of 150,000 pounds of food in Reston, Virginia, a wealthy suburb. Capital Area Food Bank now uses maps to influence every decision it makes to better target its efforts, and the nonprofit hopes to share this technology to combat hunger on a national level.
CityFarm, a research group at the Massachusetts Institute of Technology, has built a “personal food computer”: a small, climate-controlled box that uses a technique called aeroponics, in which plants grow suspended in air, to grow produce. As plants grow, an array of sensors collects data on light levels, humidity, and other environmental conditions, and the computer makes this data publicly available to help users create optimal growing conditions. CityFarm’s founder hopes this combination of advanced farming technology and data sharing will help pave the way for widespread urban farming, which would have to be much less resource intensive than traditional farming to be viable.
Image: flickr user chipmunk_1.