This week’s list of data news highlights covers November 26 – December 2, 2016 and includes articles about an AI system that can write catchy songs and Virginia’s new open data dashboard evaluating school performance.
The U.S. Department of Veterans Affairs (VA) has partnered with precision medicine software company Flow Health to develop a knowledge graph—a repository of information that makes it possible for algorithms to identify semantic relationships between data points—to identify factors that make some people more susceptible to diseases. The knowledge graph will consist of 30 petabytes of genetic and clinical data from 22 million veterans. The VA and Flow Health will use this information to study how different gene variants correspond to the risk of contracting different diseases and develop more personalized treatment plans for its patients.
Researchers at Google have developed an AI system that generates natural-sounding sequences of notes after being taught what makes songs “catchy.” The researchers began with Google’s existing music-generating system, Magenta, and used deep learning to train its algorithms to incorporate basic principles of music theory that make music sound catchier, such as avoiding repeating certain patterns too frequently or playing too slowly. By giving the system positive feedback every time it generated a sequence of notes that adhered to these rules, the researchers enabled Magenta to create much more natural-sounding music than the flat, mechanical output it produced before.
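The feedback loop described above can be illustrated with a toy sketch. The scoring rules and function names here are invented for illustration only; Magenta's actual approach applies reinforcement learning to a trained note-prediction model, not random sampling:

```python
import random

NOTES = list(range(60, 72))  # MIDI pitches C4 through B4

def catchiness(seq):
    """Toy reward: penalize immediate note repeats, reward small melodic steps."""
    score = 0.0
    for a, b in zip(seq, seq[1:]):
        if a == b:
            score -= 1.0   # repeating the same note too often sounds flat
        elif abs(a - b) <= 4:
            score += 1.0   # small intervals tend to sound more melodic
    return score

def generate(length=16, candidates=200, rng=random.Random(0)):
    """Sample many candidate sequences and keep the one with the highest reward."""
    return max(
        ([rng.choice(NOTES) for _ in range(length)] for _ in range(candidates)),
        key=catchiness,
    )
```

The point of the sketch is the shape of the loop: candidate sequences that follow the music-theory rules receive higher scores, so the system is steered away from flat, repetitive output.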
Pfizer has partnered with IBM’s Watson Health division to use the Watson cognitive computing platform to help research potential new drugs for immunotherapy—treatments that involve using the body’s immune system to attack cancer cells. Pfizer will use Watson to analyze large amounts of medical literature and its own pharmaceutical research to identify potential new drug targets and combinations of therapies that could be effective cancer treatments. Watson will also help Pfizer identify patients to tap for clinical trials.
Facebook is developing algorithms that can analyze live video streams and automatically detect and flag offensive content, such as nudity or violence, that violates Facebook’s terms of service. Facebook has traditionally relied on user reports to flag offensive content and then used human reviewers to evaluate the content and remove it if they determine it violates Facebook policies. By automating the process of identifying potentially offensive content, Facebook’s reviewers can respond to these reports and take corrective action more quickly.
Japan’s National Institute of Advanced Industrial Science and Technology has begun work on building the AI Bridging Cloud computer (AIBC), which will be the fastest supercomputer in the world when completed at the end of 2017. AIBC will have a processing power of 130 petaflops, while the current fastest supercomputer, China’s Sunway TaihuLight, has a maximum processing power of 125 petaflops. Such large amounts of processing power make these systems adept at quickly analyzing massive datasets and could be useful for advancing AI research.
Researchers at the Aravind Medical Research Foundation in India and Google have developed a deep learning algorithm capable of detecting signs of diabetic retinopathy—damage to the eye’s blood vessels caused by diabetes—in retinal scans with accuracy comparable to or higher than that of human experts. The researchers trained the algorithm on 128,000 retinal scans of healthy patients and patients with diabetic retinopathy, and unlike other diagnostic software, which is trained to identify specific signs of the condition, the researchers had the algorithm learn the difference between a healthy and an affected eye on its own. Early detection of diabetic retinopathy is important because the condition is treatable if caught early, but can otherwise cause blindness.
The Virginia Department of Education has launched an open data portal called School Quality Profiles to serve as an easily accessible dashboard for information about how schools across the state perform. The portal aggregates data about student assessments, enrollment, teacher quality, school finances, and other indicators of school quality for both individual schools and school districts and provides easy-to-interpret charts and graphs to convey this information.
Researchers at the U.S. Centers for Disease Control and Prevention and the Georgia Institute of Technology have developed a methodology for foodborne disease surveillance that uses genomic analysis to rapidly identify pathogens, which can help public health officials respond more quickly to outbreaks of foodborne diseases. Identifying pathogens typically involves growing the bacteria in a sample in cultures to produce enough of the bacteria for easy detection; however, this process can be time-consuming and delay response times. The new approach relies on metagenomics, which involves sequencing the DNA of all bacteria present in a sample and comparing this data against a database of known microbial genomes to rapidly identify any matches.
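The database-matching step at the heart of metagenomics can be sketched in miniature. The reference names and sequences below are toy strings invented for illustration, and real pipelines use far longer k-mers and full genome databases:

```python
K = 6  # k-mer length (illustrative; real tools use much longer k-mers)

def kmers(seq, k=K):
    """All overlapping substrings of length k in a DNA sequence."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}

# Hypothetical reference database: pathogen name -> genome fragment (toy data)
REFERENCE = {
    "pathogen_A": "ACGTGGCTAACGTGGCTTACGGA",
    "pathogen_B": "TTGACCATGGTTGACCAATGGCA",
}

def identify(reads):
    """Score each reference genome by how many k-mers it shares with the sample."""
    sample = set().union(*(kmers(r) for r in reads))
    scores = {name: len(sample & kmers(g)) for name, g in REFERENCE.items()}
    return max(scores, key=scores.get), scores
```

Because sequenced reads from a pathogen share many exact substrings with that pathogen's reference genome, counting shared k-mers lets the method flag a match without ever culturing the sample.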
Researchers at Lancaster University have created a machine learning system called REx that can combine portions of computer code to create functional programs that meet specifications set by human developers. REx works in three steps. First, it searches for and modifies code components based on the desired function. Next, it assembles these pieces of code and measures the resulting program’s performance. Finally, it evaluates this performance data to identify the best possible way to assemble the code. The researchers expect this kind of self-assembling software could be used to operate complex computer systems, such as those used to manage data centers and robots, more efficiently and with less need for human input.
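The three-step loop can be sketched with a toy search over a component library. Everything here—the component names, the specification, and the exhaustive search—is invented for illustration; REx itself searches far larger component spaces with more sophisticated evaluation:

```python
from itertools import permutations

# Hypothetical library of reusable code components (names are illustrative)
COMPONENTS = {
    "double": lambda x: x * 2,
    "increment": lambda x: x + 1,
    "square": lambda x: x * x,
}

# Developer-supplied specification as input -> output examples: f(x) = 2x + 2
SPEC = [(1, 4), (2, 6), (3, 8)]

def assemble(names):
    """Step 2 (part one): chain the chosen components into one program."""
    def pipeline(x):
        for name in names:
            x = COMPONENTS[name](x)
        return x
    return pipeline

def score(names):
    """Step 2 (part two): measure how many spec examples the program satisfies."""
    program = assemble(names)
    return sum(1 for x, y in SPEC if program(x) == y)

def search(depth=2):
    """Steps 1 and 3: enumerate candidate assemblies and keep the best one."""
    best = max(permutations(COMPONENTS, depth), key=score)
    return best, score(best)
```

Here the search discovers that chaining `increment` then `double` computes 2x + 2, satisfying every example in the specification without a human writing the program directly.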
Apple has announced plans to use drones to quickly collect geospatial and other data to make its maps more useful and up to date. Apple intends to fly the drones over roads to monitor changes in traffic patterns, examine street signs, and gather other useful data to supplement the use of sensor-laden mapping vehicles, which can be ineffective for gathering timely data.
Image: National Eye Institute.