10 Bits: The Data News Hotlist
This week’s list of data news highlights covers January 10-16, 2015 and includes articles about how Congress is approaching the Internet of Things and how Uber data can improve traffic in Boston.
The Defense Advanced Research Projects Agency (DARPA) granted Carnegie Mellon University $3.6 million to develop machine learning algorithms to identify sex traffickers. The initiative is part of DARPA’s Memex program, which is devoted to analyzing portions of the internet associated with human trafficking. The Carnegie Mellon team will develop algorithms that index online sex ads to identify their authors, and that use computer vision techniques to extract useful information from the accompanying images.
2. The Internet of Things Gets a Congressional Caucus
Representative Darrell Issa (R-CA) and Representative Suzan DelBene (D-WA) created the Congressional Caucus on the Internet of Things to educate Congress on the development and implementation of this technology. Issa and DelBene cited the fast pace at which the Internet of Things is developing, and the potential for a federal role in ensuring that such innovation is encouraged rather than restricted, as their reasons for establishing the caucus.
Physicists at the Stanford Solar Observations Group are using machine learning techniques to automate the analysis of the largest-ever dataset of solar observations and to predict future solar flares. The data comes from the Solar Dynamics Observatory, which provides researchers with an almost continuous stream of solar magnetic field data. The researchers found that machine learning significantly reduces the time needed to analyze such large volumes of data and could provide earlier warning of potentially dangerous solar flares.
Open Addresses UK, a London-based open data advocacy group, launched an open database of addresses in the United Kingdom. The group called on the public and private sectors to contribute address data to the database in order to create the largest open dataset of UK addresses, freely accessible to the public. Open Addresses UK hopes that making this basic yet important data, which currently exists only behind paywalls, publicly available will be a boon to the economy, as it was for Denmark in 2010, when opening address data created €14 million in economic value.
Researchers from the University of Alberta developed an algorithm that can beat human players at poker. Known as Cepheus, the algorithm is available to play against online, and while it is not perfect, it will almost always break even or come out ahead. The researchers chose poker as a test for their artificial intelligence algorithms because it involves many complex strategic decisions, and success at poker could translate to other problem-solving applications.
DrivenData, a startup devoted to crowdsourcing data science solutions to social challenges, pitted 800 data scientists against each other to find the best way to manage financial data from school districts. The competition sought the best automated method for converting school district spending data into interoperable formats. The winning solutions, which earned prizes of up to $7,500, are expected to save 400 man-hours on each project that requires this data. Given the success of this competition, DrivenData sees potential for future competitions in public health and public policy.
A number of tech startups are relying on data science to help shoppers navigate sizing when buying clothes online. Virtusize, Clothes Horse, and LoveThatFit are using a variety of techniques to help shoppers find the right fit, with the aim of reducing the 50 to 80 percent return rates caused by non-standard sizing across the industry. From mining customer data to building data-driven recommendation engines, data science techniques are increasingly being used to improve the online clothes shopping experience.
A new partnership between Boston and Uber will give the city access to Uber’s valuable transportation data to help address traffic problems. The data, which will be anonymized before it is delivered to the city, is expected to provide insight into how traffic flows around the city and between neighborhoods, and how it varies by time of day. This data is expected to be valuable in planning everything from traffic light timing to major construction projects.
A new online registry developed by Australia’s Randwick Hospitals Campus and the University of New South Wales aims to use treatment data to help cancer patients with family planning. Known as the Australasian Oncofertility Registry, the website will compile cancer and fertility data from participating healthcare facilities around the world to give cancer patients information about their fertility potential before and after treatment. The developers of the registry hope the insights from this data will educate cancer specialists about the impact of treatment on fertility and help patients make more informed decisions about their treatment.
Several startups are attempting to help consumers make better decisions when purchasing health care by making data about treatment effectiveness, pricing, and quality easily available. Companies like Castlight Health, Better, and Vital are making costs more transparent, simplifying the complex process of purchasing health care, and informing consumers about the cost-effectiveness of different treatments.