10 Bits: the Data News Hotlist
This week’s list of data news highlights covers October 29 – November 4, 2016 and includes articles about how NASA is learning more from its data and a startup using lasers to process data.
Researchers at Harvard University, the University of Toronto, and the University of Cambridge have developed a pharmaceutical research program that uses AI to generate models of potential new drugs. Human researchers traditionally attempt to identify drug molecules with particular properties and have software simulate how they could be combined; however, this process can be time-consuming and resource-intensive. The researchers trained their system on data about 250,000 drug molecules so that it can predict plausible combinations of these molecules based on their pharmaceutical properties without the need to simulate their structures.
Airbnb has made its data science sharing platform “Knowledge Repository” publicly available as open source. While platforms such as GitHub make it easy for users to share code and data, it can be difficult for users to share data analysis in a useful format. Knowledge Repository takes the same approach that platforms like GitHub use to let users collaborate on code and combines it with templates written in Markdown, a markup language developers can use to convey explanatory text, so users can share both code and analysis in a single format.
The Associated Press (AP) is developing machine learning software that can automatically convert print news stories to a format suitable for broadcast. Unlike print stories, broadcast news relies on shorter and more concise stories, rounded numbers, and different attribution standards, which requires AP journalists to collectively spend 800 hours per week rewriting stories to make them suitable for broadcast. AP has developed a prototype that can automatically identify all the items in a print story that differ from broadcast standards, and it is working on using machine learning to automatically make the necessary changes for up to 80 percent of its stories by 2020.
NASA has implemented a graph database—a specialized database designed to link related records—to store information for its “lessons learned” database, which houses writeups and analysis of past agency projects. NASA engineers previously could only navigate the lessons learned database with basic keywords to try to find relevant information. With the graph database, which uses machine learning to map connections between topics and NASA’s writeups, users can more easily find relevant data and uncover new connections between projects.
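To make the idea of linking related records concrete, here is a minimal sketch in Python of a graph that connects writeups to topics and then finds related writeups through shared topics. It is only an illustration: the `LessonGraph` class, the lesson names, and the topics are hypothetical, the links are added by hand rather than learned, and nothing here reflects the database NASA actually uses.

```python
from collections import defaultdict


class LessonGraph:
    """Toy graph linking lessons to topics (hypothetical, for illustration)."""

    def __init__(self):
        self.topics_by_lesson = defaultdict(set)
        self.lessons_by_topic = defaultdict(set)

    def link(self, lesson, topic):
        # Store the edge in both directions so it can be walked either way.
        self.topics_by_lesson[lesson].add(topic)
        self.lessons_by_topic[topic].add(lesson)

    def related(self, lesson):
        """Return other lessons sharing a topic, most shared topics first."""
        shared = defaultdict(int)
        for topic in self.topics_by_lesson.get(lesson, ()):
            for other in self.lessons_by_topic[topic]:
                if other != lesson:
                    shared[other] += 1
        # Ties are broken alphabetically so the order is deterministic.
        return sorted(shared, key=lambda name: (-shared[name], name))


graph = LessonGraph()
# Hand-labeled, made-up links; a real system would derive these from the text.
for lesson, topic in [
    ("lesson-A", "valves"), ("lesson-A", "fuel"),
    ("lesson-B", "valves"), ("lesson-B", "fuel"), ("lesson-B", "corrosion"),
    ("lesson-C", "corrosion"),
]:
    graph.link(lesson, topic)

print(graph.related("lesson-B"))  # ['lesson-A', 'lesson-C']
```

A keyword search for "corrosion" would never surface lesson-A, but walking the graph from lesson-C to lesson-B to lesson-A does, which is the kind of indirect connection the item describes.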
Researchers at the Massachusetts Institute of Technology have developed a deep learning system capable of demonstrating a rationale for its decisions. For example, when analyzing online beer reviews, the system can identify which part of the review justifies the rating. This system proved to be nearly as accurate as humans who attempted the same task. The researchers applied the same technique to analyze pathology reports to detect breast biopsies and explain diagnoses.
Researchers at Google’s DeepMind have developed a machine learning algorithm capable of recognizing objects, such as the contents of images or handwriting, from just a single example, a capability known as “one-shot learning.” Machine learning algorithms typically require training on thousands of similar examples to reliably recognize similar patterns in new data. DeepMind’s system, after being trained on hundreds of categories of images, can recognize new objects after analyzing just one example, with accuracy similar to that of heavily trained systems.
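One common way to picture one-shot classification is as a nearest-example lookup: a new item is compared against a single labeled example per category and takes the label of the closest one. The Python sketch below assumes each item has already been turned into a feature vector. The vectors, labels, and function names are made up for illustration, and this is not a description of DeepMind's algorithm, in which the features would be learned from the earlier training categories.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def classify_one_shot(query, support):
    """Pick the label whose single example is most similar to the query.

    `support` maps each label to exactly one example feature vector.
    """
    return max(support, key=lambda label: cosine_similarity(query, support[label]))


# Hypothetical hand-made feature vectors, one example per category.
support = {
    "cat": [0.9, 0.1, 0.2],
    "car": [0.1, 0.8, 0.7],
}

print(classify_one_shot([0.8, 0.2, 0.1], support))  # cat
```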
U.K. startup Optalysys has developed a method for processing data using lasers that can remove the need for powerful traditional computational hardware to analyze large amounts of data. The method relies on a mathematical function called the Fourier transform to represent data as particular patterns of light waves in a laser beam. Then, because the physics of light waves can be predictably manipulated, making lasers interfere with each other can “process” data, and a sensor can interpret the new patterns created by this interference to produce the desired computation. Optical data processing in this fashion is not new, but the high cost of the technology involved meant it was not commercially viable. Now, thanks to the consumer electronics industry driving down component costs, Optalysys was able to make a more cost-effective version that can be useful for computing-intensive applications such as genomics.
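The reason the Fourier transform is useful here can be shown in software. Searching for a pattern in data by sliding it across every position takes many multiply-and-add steps, but in the Fourier domain the same search reduces to a single element-by-element multiplication. The Python sketch below demonstrates that equivalence with a deliberately naive transform; the function names and sample data are illustrative assumptions, and it is not a model of Optalysys's hardware or method.

```python
import cmath


def dft(signal, inverse=False):
    """Naive O(n^2) discrete Fourier transform (illustrative, not optimized)."""
    n = len(signal)
    sign = 1 if inverse else -1
    out = []
    for k in range(n):
        total = 0j
        for t, value in enumerate(signal):
            total += value * cmath.exp(sign * 2j * cmath.pi * k * t / n)
        out.append(total / n if inverse else total)
    return out


def circular_correlation_direct(data, pattern):
    """Slide the pattern over the data (with wrap-around) and sum products."""
    n = len(data)
    return [sum(data[(shift + i) % n] * pattern[i] for i in range(n))
            for shift in range(n)]


def circular_correlation_fourier(data, pattern):
    """Same result via the Fourier domain, for real-valued inputs."""
    data_f = dft(data)
    pattern_f = dft(pattern)
    # One elementwise multiply replaces the sliding sum above.
    product = [d * p.conjugate() for d, p in zip(data_f, pattern_f)]
    return [value.real for value in dft(product, inverse=True)]


data = [0, 0, 1, 2, 1, 0, 0, 0]     # made-up signal with a bump at index 2
pattern = [1, 2, 1, 0, 0, 0, 0, 0]  # the bump we are searching for

scores = circular_correlation_fourier(data, pattern)
print(scores.index(max(scores)))  # 2, the shift where the pattern matches best
```

Both functions return the same scores up to floating-point rounding. The software transform is the expensive part here, whereas the item above describes an approach in which that step is carried out by light.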
Job search website Glassdoor has developed a new method for estimating wage growth in the United States, using its own data rather than the Labor Department’s monthly employment report, which primarily provides national averages instead of granular estimates. Glassdoor’s estimates, which it will publish monthly in its new Local Pay Reports, rely on crowdsourced salary information that its users voluntarily provide for different occupations and industries in specific cities. Though several efforts attempt to supplement the Labor Department’s estimates with analysis of job postings or social media, salary data is much harder to obtain, making previous efforts less useful. Glassdoor’s first Local Pay Report suggests that base salaries have grown 2.8 percent since last year, which is comparable to the Labor Department’s findings.
China’s National Research Center of Parallel Computer Engineering and Technology (NRCPC) has announced it has begun development of what will be the world’s most powerful supercomputer, roughly 10 times more powerful than the current fastest supercomputer, which China completed in June 2016. The current leading supercomputer, the Sunway TaihuLight, can achieve 93 petaflops, or 93 quadrillion floating-point operations per second (flops), while the new supercomputer will be able to reach 1,000 petaflops. China currently has 16 of the world’s top 500 supercomputers.
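As a quick check on those figures, the short Python calculation below converts both speeds to flops and compares them, using only the 93 and 1,000 petaflops numbers quoted above. The variable names are illustrative.

```python
PETA = 10**15  # one quadrillion

taihulight_flops = 93 * PETA   # Sunway TaihuLight, as quoted above
planned_flops = 1000 * PETA    # planned system, as quoted above

ratio = planned_flops / taihulight_flops
print(round(ratio, 2))         # 10.75

# 1,000 petaflops is the same as one exaflop (10**18 flops).
print(planned_flops == 10**18)  # True
```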
Austrian startup SmaXtec has developed an Internet-connected device designed to safely lodge itself inside a cow’s stomach, where its sensors report real-time data about the cow’s health, such as body temperature, stomach pH, movement, and hydration. The devices use Wi-Fi to automatically transmit data about the cows to a base station on the farm, and a cloud-based service sends a warning text message to the farmer if the data indicates a cow might be getting sick, which can reduce the need for antibiotics. The system can also use the data to predict when a cow will give birth with 95 percent accuracy, which can help farmers make more informed planning decisions about milk production.
Image: Brian Johnson and Dane Kantner.