10 Bits: the Data News Hotlist
This week’s list of data news highlights covers April 8 – 14, 2017, and includes articles about a challenge to make it easier to track health data and a genomics program for cattle.
An AI system named Lengpudashi has resoundingly beaten a human team of engineers and computer scientists at poker, winning $290,000 over 36,000 hands. Lengpudashi is an improved version of Libratus, a poker-playing AI developed by researchers at Carnegie Mellon University that had previously beaten some of the world’s best poker players. This time, the system’s opponents were familiar with how AI works and tried strategies designed to exploit the weaknesses of machine learning systems, but they were still unable to beat it.
The U.S. Office of the National Coordinator for Health Information Technology (ONC) has launched the Health Data Provenance Challenge to encourage private sector healthcare technology providers to get better at identifying where and when health data originates, which would make this data more accurate. Knowing the origin of health data and tracking how it is modified and shared can help healthcare providers more easily identify incorrect information, which can improve patient safety. ONC will award $180,000 in prizes to firms that develop and test the best strategies for improving health data provenance.
Online health community PatientsLikeMe has partnered with Shire Pharmaceuticals to create more opportunities to research rare diseases, which have limited treatments available due to the financial challenges of developing drugs for only very small populations. The partnership will allow users of PatientsLikeMe with rare conditions to share data with Shire Pharmaceuticals about their symptoms and treatments, which Shire Pharmaceuticals will use to guide the development of new treatments.
The United Kingdom’s Pensions Regulator (TPR) has developed a machine learning tool that can analyze data about pension schemes and predict whether a scheme is likely to submit a return on time. TPR trained the system on several years of historical data about pension schemes, labelled according to whether or not they delivered a return in compliance with TPR rules. Using the system’s predictions, TPR can tailor its interactions with different groups according to how likely their pension scheme is to be non-compliant.
New Hampshire’s Department of State’s Division of Vital Records Administration has developed a mobile app called electronic Cause of Death (eCOD) that allows physicians to quickly submit data about a patient’s cause of death to the U.S. Centers for Disease Control and Prevention’s (CDC) National Center for Health Statistics. In most states, physicians report cause of death data to the CDC manually, a process that can introduce a month-long lag and limits the CDC’s ability to track disease trends. With eCOD, physicians can submit this data daily, and eCOD links with a CDC service that audits death certificates to ensure that the data is accurate.
Google has developed a machine learning tool called AutoDraw that can predict what users are attempting to draw in real time and automatically replace their doodles with a high-quality drawing. AutoDraw is based on a neural network Google designed for character recognition in a tool it launched last year called Quick, Draw!, which attempted to guess what users were trying to draw in under 20 seconds.
Ireland’s Department of Agriculture, Food, and the Marine is re-opening its Beef Data and Genomics Programme (BDGP), which offers subsidies to cattle farmers who use genomics to improve the genetic diversity of their herds and make the beef industry more sustainable. BDGP participants submit genetic and other data about the cows in their herds in return for the subsidy, the amount of which is determined by how much of their land is devoted to cattle. Participants use this data to make more informed breeding decisions, which can boost beef production and lower carbon emissions.
The Indian state of Telangana has become the first in the country to create a database of university graduates to allow employers to verify whether or not potential hires are telling the truth about their education history. The database contains basic information about graduates from state universities from the past five years, including student name and what courses a student actually took, so employers can crack down on applicants lying about their backgrounds, which is a common problem. Telangana is working on expanding the database with an additional five years of graduate data.
A startup called Luminar has developed a new LIDAR sensor, a type of sensor that uses reflected laser light to power computer vision systems such as those in self-driving cars, that can “see” substantially farther than existing LIDAR sensors. This improvement could give self-driving cars much more time to analyze and react to their surroundings on the road. Luminar’s sensor can detect an object with 10 percent reflectivity (comparable to dark objects such as pavement) from 200 meters away, while the leading LIDAR sensor on the market can detect an object with the same reflectivity at just 50 meters. Luminar’s sensor uses a longer wavelength of light in its lasers, which allows it to use more power without posing a risk to eye safety, and it uses a system of mirrors to actively focus its lasers on different objects, whereas other sensors typically emit lasers in a set pattern.
Researchers at Institut de Recerca Germans Trias i Pujol, a Spanish health research institute, have developed software called Truke that can identify and correct corrupted genomics data in research databases. A recent study of genomics journals found that 20 percent of papers contained errors in their genomic data, such as genes named “SEPT2” being automatically changed to the date “September 2, 2016,” by the software used to analyze this data. Truke analyzes genomic data in research databases, compares it against a library of error-prone gene symbols developed by the U.S. National Center for Biotechnology Information, and reverts corrupted entries to their original state.
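To illustrate the kind of correction Truke performs, here is a minimal sketch that maps date-like values back to the gene symbols they were likely converted from. It is not Truke’s actual method: the `PREFIX_BY_MONTH` table, the two date layouts, and the function names `revert_symbol` and `clean_column` are hypothetical, and a real tool would check candidates against a full reference list such as the NCBI-derived library described above.

```python
import re

# Hypothetical, illustrative subset: month abbreviation -> gene-symbol prefix
# that spreadsheet software may misread as a date. Not Truke's NCBI-based library.
PREFIX_BY_MONTH = {"mar": "MARCH", "sep": "SEPT"}

# Assumed corrupted layouts: "2-Sep" and "September 2, 2016".
SHORT_DATE = re.compile(r"^(\d{1,2})-([A-Za-z]{3})$")
LONG_DATE = re.compile(r"^([A-Za-z]+) (\d{1,2}), \d{4}$")


def revert_symbol(value: str) -> str:
    """Return the gene symbol a date-like value was probably converted from,
    or the value unchanged if it does not look like a corrupted symbol."""
    text = value.strip()
    match = SHORT_DATE.match(text)
    if match:
        day, month = match.group(1), match.group(2)
    else:
        match = LONG_DATE.match(text)
        if not match:
            return value  # not date-like, so leave it alone
        month, day = match.group(1), match.group(2)
    prefix = PREFIX_BY_MONTH.get(month[:3].lower())
    if prefix is None:
        return value  # a date, but not one that matches a known gene family
    return f"{prefix}{int(day)}"


def clean_column(values: list[str]) -> list[str]:
    """Apply revert_symbol to every entry in a gene-symbol column."""
    return [revert_symbol(v) for v in values]


if __name__ == "__main__":
    print(clean_column(["September 2, 2016", "2-Sep", "1-Mar", "TP53"]))
```

Because the lookup only fires for month names tied to a known gene family, ordinary symbols such as TP53 and unrelated dates pass through unchanged.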
Image: Martin Abbegglen.