10 Bits: The Data News Hot List
This week’s list of data news highlights covers May 24-30 and includes articles about Chicago’s planned environmental sensor network and an eBay initiative to use data to predict aesthetically pleasing clothing combinations.
This summer, Chicago will begin rolling out a next-generation data collection system that uses sensors mounted on light posts to collect data on temperature, humidity, air quality, and other environmental variables. The initiative, a collaboration with industry and the Urban Center for Computation and Data, will begin with 30 to 50 sensor nodes, and may add up to 500 more installations in the future. The data will be available through the city’s open data portal, where city departments can use it to study how environmental factors affect various aspects of city governance. The city is also working on an open data analytics platform, which it intends to make available to other cities.
The National Collegiate Athletic Association (NCAA) and the U.S. Department of Defense announced this week that they will collaborate to launch the most comprehensive concussion database ever created. The $30 million effort will initially study NCAA athletes from 10 universities, as well as the entire student bodies at military service academies. The organizers hope the effort will aid researchers in conducting longitudinal research into sports-related concussions, a kind of study that has traditionally been difficult due to small sample sizes.
eBay Research Labs wants to create a predictive algorithm to determine what clothing items make good combinations and recommend those combinations when a user is shopping. Researchers developed two algorithms to recommend outfits. The first suggests matches based on clothing items’ color, while the second detects whether or not the clothing item is patterned and recommends a solid-colored item if it is. The researchers then surveyed human subjects to rate the algorithms’ performance. The ongoing research is not yet being deployed on eBay’s website, but may be piloted in the future.
Researchers at Stanford University mined a unique Reddit data set to learn which strategies work best for asking favors. On Reddit’s Random Acts of Pizza section, users leave messages asking for pizza, which other users can donate if they find the message convincing. The whimsical concept produced interesting data, with a total of 5,738 requests falling into five general categories, including having a financial need, being a student, and merely having a craving for pizza. The researchers used machine learning techniques to analyze various aspects of the posts and determine what might be responsible for the success of a request. The resulting algorithm, which had a 70 percent success rate, found that narratives about jobs, money, and family increase the probability that a request will be fulfilled, while narratives expressing a craving reduce the chances of success.
Google is using artificial neural networks to analyze how its vast data centers perform and make adjustments accordingly. These techniques, loosely inspired by the human brain’s system of neurons, can recognize patterns in large data sets much faster than human analysts alone. The company uses neural networks to model how a data center is expected to perform under certain conditions, such as energy usage and weather, and make targeted changes if actual performance deviates from expectation.
Using light concentrations from satellite images to measure economic activity in areas with poor official data sources is a popular emerging area of research. One recent study found that satellite data is most useful in the developing world, where it can be used to estimate the density of economic activity and output per person. In developed countries, where reliable economic data already exists, that data can be used to test the performance of satellite data analysis and improve methods for use in other countries. Another study used Sweden’s comprehensive economic data in conjunction with satellite imagery to show that light concentration correlates particularly well with population levels and population density, but is less useful for estimating wages.
By the end of this year, Microsoft Research will launch real-time speech translation in Skype. Using a variety of artificial intelligence techniques, Microsoft’s researchers have reduced the error rates of this traditionally difficult task. Skype boss Gurdeep Pall showed off the technology this week to demonstrate its accuracy. Although still imperfect, the translations were reportedly accurate enough to be useful.
The winners of Vodafone’s Wireless Innovation Project Competition, announced this week, created a variety of tools that use mobile technology to make previously costly tasks possible in the developing world. The first-place winner was MobileOCT, a startup that developed tools to turn a smartphone’s digital camera into a cervical cancer detection device that is approximately 1/25th as expensive as current technologies. Another winner created a mobile phone attachment capable of retinal imaging, which may someday help health workers detect retinal diseases that cause blindness and other conditions.
Music industry analytics company Next Big Sound this week announced a new division called Next Big Book, a partnership with publishing company Macmillan. The companies will work together to develop a tool that looks at data on sales, publicity, events, social media, and web traffic to help publishers determine which factors have the greatest impact on book sales and adjust marketing efforts accordingly. One insight that has already surprised publishing executives is the strong correlation between traffic to an author’s Wikipedia page and book sales.
Los Angeles Mayor Eric Garcetti is making another push for open data in his city, following underwhelming initial efforts last fall to provide public access to government performance information. The city’s new offering will include data sets on crime, car crashes, business licenses, and other civic information. Los Angeles has struggled to match other cities’ efforts around opening data and providing a usable data platform, but mayoral aides hope the new push will encourage civic hackers and other technologists to jump-start a movement toward open data in the city by creating useful apps based on the information.