10 Bits: the Data News Hotlist
This week’s list of data news highlights covers May 7-13, 2016 and includes articles about a neural network that can recreate art styles and a do-it-yourself artificial pancreas.
Researchers from Columbia University, NewYork-Presbyterian Hospital, and the New York City Department of Health and Mental Hygiene have demonstrated that better data exchange between health-care providers and city agencies can substantially improve pediatric vaccination rates. Previously, pediatricians had to manually search a patient’s immunization record through a municipal or state portal, but the New York City Immunization Registry now allows them to import the data into their patient’s electronic health record. The researchers analyzed the effectiveness of this program and found that up-to-date vaccination rates increased from 75 to 82 percent, and over-immunization rates for adolescents—which can occur when clinicians do not have access to timely immunization data—decreased substantially.
Seventy-six cities and regions are now using crowdsourced transit data from cycling app Strava to better understand how well municipal infrastructure meets commuters’ transit needs. Strava allows users to track their routes and speeds as they bike and then anonymizes and aggregates the data to illustrate how cyclists navigate a city. City planners typically have limited data about bicycle traffic since they must rely on people to count the number of cyclists that pass through an intersection in a given period. But by analyzing Strava data, which accounts for up to 10 percent of bicycle traffic in a city, planners can gain much greater insights into how to better design roads for cyclists and encourage more people to bike.
Researchers at the University of Washington have developed a robotic system that pairs an advanced robotic hand with machine learning algorithms that allow it to teach itself to perform new tasks. Robotic systems typically rely on simple grasping mechanisms, such as a claw, rather than hands because programming five fingers to work together to manipulate objects is incredibly challenging. The researchers first trained their algorithm with advanced physics simulations. Then, as the hand attempted to perform tasks, such as typing on a keyboard, the algorithm analyzed data from sensors on the hand and videos of its attempts to continuously improve.
IBM has launched a new initiative to use its cognitive computing platform Watson to analyze and protect against cyberattacks. Using machine learning systems to aid cybersecurity efforts is increasingly common due to their ability to sift through massive amounts of technical data, but Watson is unique in that it will also analyze unstructured data sources, including cybersecurity research reports and blog posts, to consistently stay abreast of the latest developments and threats in the space. IBM will begin to train Watson with this data, exposing it to 15,000 such cybersecurity documents per month, before releasing a commercial product.
Researchers at the University of Freiburg in Germany have trained a deep neural network (a type of machine learning system designed to replicate the mechanisms of the brain) to recreate images and videos in the form of famous artistic styles, such as Picasso’s cubism and Van Gogh’s impressionism. The neural network uses machine vision techniques to analyze the relationships between particular characteristics of an art style and then modify images to replicate these styles. Though other research has attempted to accomplish a similar task for videos, the neural network is capable of recognizing subtle differences between successive frames and generating certain difficult-to-analyze portions of the image from scratch, greatly improving the visual coherence of the final result.
The health-care division of analytics firm SAS has partnered with the Duke University Clinical Research Institute to make the Duke Databank for Cardiovascular Disease—the largest of its kind—freely available to researchers around the world. The database contains de-identified health data of 50,000 patients with heart disease and 100,000 cardiovascular procedures, which researchers can use to test hypotheses, improve clinical trials, and analyze trends in heart disease and treatment. The partnership is the result of a Duke University-led initiative that aims to improve patient care and accelerate research by increasing access to clinical trial and other health data.
Researchers at the University of Washington have developed a method called PaperID that makes it possible to integrate inexpensive radio frequency identification (RFID) transmitters into paper. These transmitters, which cost 10 cents each, can be placed, printed, or drawn onto a paper surface. Different movements, such as someone swiping a hand over the paper, interrupt the signal to a nearby reader, which can then trigger a programmed command. The low cost of PaperID and the ubiquity and flexibility of paper could allow for widespread deployment of simple, on-demand sensor interfaces.
Google has made its natural language processing algorithms available as open source, which could substantially improve the ability of independent developers to create programs that reliably understand humans. Interpreting written and spoken language is valuable for human-computer interaction, but complicated grammar and syntax make it challenging for these applications to actually understand the underlying meaning of language and respond to users appropriately. One of the tools, called SyntaxNet, learns to understand the contextual meaning of language, rather than relying on literal interpretation. Another tool, called Parsey McParseface, automatically breaks down spoken language into specific components such as nouns and verbs to make it easier for applications to analyze.
A growing number of patients with type 1 diabetes are turning to open source software and sensor technology to develop portable, automatic insulin pumps to better manage their treatment. The system relies on freely available software called OpenAPS, a modified computerized insulin pump, and a glucose monitor that can regularly monitor a patient’s blood glucose levels. As a patient’s glucose levels change throughout the day, the system’s algorithms will automatically adjust the level of insulin it administers. The concept of an artificial pancreas has been pursued by medical device companies for years, but advances in sensor technology and declining costs have made it feasible for patients to take treatment into their own hands as device manufacturers await regulatory approval.
A campaign by scientific journal Psychological Science is attempting to develop a culture of data sharing in the scientific community by awarding token badges to research papers that make their underlying data publicly available. A 17-month study of the campaign revealed that after Psychological Science began publicly displaying these badges for compliant papers in 2014, 40 percent of the journal’s papers made their data publicly available by early 2015, up from the sub-10 percent average of other similar journals. By sharing underlying research data and materials, the scientific community can better scrutinize research results, which can help address replicability concerns and lead to new findings.
Image: The Bridgeman Art Library.