10 Bits: the Data News Hotlist
This week’s list of data news highlights covers March 25-31, 2017, and includes articles about a new bill to codify U.S. federal open data requirements and a machine learning system that can help diagnose depression.
A bipartisan group of members of Congress has reintroduced the Open, Permanent, Electronic, and Necessary (OPEN) Government Data Act to make publishing open data an official responsibility of the federal government. Sponsored by Reps. Kilmer (D-WA) and Farenthold (R-TX) and Sens. Schatz (D-HI) and Sasse (R-NE), the bill would amend the U.S. Code to require federal agencies to treat their data as open by default and publish it online in open and machine-readable formats with open licenses. The OPEN Government Data Act passed the Senate unanimously in 2016, but the 114th Congress concluded before the House could vote on it.
Computer scientists at Nanyang Technological University in Singapore have developed an algorithm that can design traffic flows to reduce the likelihood that spontaneous traffic jams will occur. The algorithm uses machine learning to predict how a traffic network would break down if an influx of cars were added at different points, and then evaluates potential routes to determine which one minimizes the likelihood that this increase in cars would cause a traffic jam. If even just 10 percent of cars in a network were to follow the optimized routes, there would be a measurable positive effect on the network as a whole.
The UK’s Office for National Statistics (ONS) has launched a data science research center to focus on developing new methods for measuring the UK economy. The research will include analyzing data from traffic sensors to estimate economic activity, mobile phone data to understand commuting patterns, and satellite imagery to gauge an area’s population. ONS will partner with universities, nonprofits, businesses, and other organizations that could help advance its data science research.
IBM has developed a machine learning system that can automatically classify websites based on the likelihood that they contain phishing attacks, and can do so 250 percent faster than standard methods. The system analyzes website data, including a site’s URL, hosted images, and text, to identify any signs of a phishing attack, and it analyzes a site’s logo to determine if the site is legitimate.
Neuroscientists from the University of Texas at Austin have trained a machine learning algorithm to identify shared traits in brain scans, genomic data, and other data sources from people diagnosed with depression and anxiety. By looking for these traits in new patients, the system can determine whether a patient has major depressive disorder with 75 percent accuracy.
Researchers from the Technical University of Denmark in Copenhagen have developed an algorithm that can make it easier to compare the colors of different objects, which can help make computer vision applications more useful. While computer vision algorithms can easily identify and extract colors from images, it can be difficult for them to compare different color palettes because there has not been a standard method for putting colors in order. The algorithm assesses an image to plot the colors in its palette in a three-dimensional vector space, and then analyzes the distances between the points in the palette. Using this approach, an application could manipulate an image’s color palette with only minimal distortion.
The U.S. Senate has unanimously passed the Weather Research and Forecasting Innovation Act of 2017, which would authorize the National Oceanic and Atmospheric Administration (NOAA) to take several steps to improve its ability to collect and analyze weather data. The bill directs NOAA to research methods of generating weather forecasts for time periods of up to two years, improve tornado and hurricane forecasting, and deploy tsunami sensors around federal and commercial undersea cables. It also allows NOAA to supplement its satellite data with data purchased from the private sector, rather than rely exclusively on its own satellites.
The UK’s Defence Science and Technology Laboratory (DSTL) has launched a data science challenge to encourage members of the public to develop machine learning solutions to two problems facing the defense community. Participants will compete to develop systems that can classify documents based on their topics, and systems that can classify different vehicles in satellite imagery. Winning participants will receive £20,000 (USD $25,082).
Researchers at the University of California, San Diego have developed a machine learning system called Dance Dance Convolution that can learn choreographed routines from the dancing video game Dance Dance Revolution and generate choreography for new songs. The researchers trained the system on a crowdsourced database of Dance Dance Revolution choreography to have it learn how different steps can be combined with each other to match a song’s rhythm.
Researchers at Sandia National Laboratories have developed a method of diagnosing Zika and other viruses carried by mosquitos in less than 30 minutes by using a smartphone and an inexpensive attachment. The attachment uses a smartphone’s camera sensors to perform a diagnostic method called loop-mediated isothermal amplification, which can detect the presence of Zika in a biological sample, such as blood or urine, without needing to process the sample first.
Image: Gemma Longman.