10 Bits: The Data News Hotlist
This week’s list of data news highlights covers May 30 – June 5, 2015 and includes articles about how San Jose is connecting itself to the Internet of Things and how machine learning is helping Airbnb hosts set prices more accurately.
The Centers for Medicare & Medicaid Services (CMS) is making data, which since 2013 has been exclusively reserved for researchers and academics, available to entrepreneurs and developers to help them create commercial healthcare tools. The data includes de-identified information on Medicare claims, chronic conditions, and healthcare provider analysis. Commercial developers will be vetted to ensure they handle the data securely and should have access by September 2015. CMS also announced that it will update the data on a quarterly basis, rather than annually, and will require more of this data to be published in machine-readable formats.
New York City’s Education Department will unveil a new system for public schools to share student data with parents, replacing the Achievement Reporting and Innovation System (ARIS) that expired at the end of 2014. ARIS, despite costing the Education Department $95 million to build and maintain from 2007 to 2014, saw little use from parents and educators, and since its expiration, parents have not had a method of accessing their children’s education data online. The new system, which cost just $2 million to build, will give parents access to data on attendance, grades, and standardized-testing scores, though it lacks some of the student performance analysis capabilities of the previous system.
San Jose has finalized a partnership with anyCOMM, a startup that makes Internet-connected sensors, to pilot the company’s technology and integrate the Internet of Things into city services. The city will install small wireless sensor devices along main roads to collect and report traffic data. The sensors are capable of recording audio and video, acting as Wi-Fi hotspots, and turning off streetlights when sidewalks and roads are empty. The company will provide the sensors to San Jose and other cities for free, with plans to eventually lease portions of the devices’ Wi-Fi networks to businesses as a means of funding the project.
Code for America, a nonprofit focused on improving how the public sector uses technology, has launched the Police Open Data Census to serve as a resource on open law enforcement data around the country. The project’s website allows users to contribute information on the existence and quality of open police data for cities around the United States across eight categories, including officer-involved shootings, traffic stops, and the use of force. Users can indicate whether the data is current, stored in machine-readable formats, published online, and available for download. So far, 21 cities have committed to opening their data for the census, and the site currently has information on 32 cities.
The Missouri Highways and Transportation Commission has announced the Road to Tomorrow project, which aims to rebuild a 200-mile portion of Interstate 70 as the first smart highway in the United States. The project is currently in the planning stages, and Missouri officials are seeking public input on how the road can be rebuilt to accommodate modern and emerging transportation technologies, such as autonomous vehicles, global positioning systems (GPS), and computerized road-management systems.
The National Cancer Institute is launching a trial program called NCI-Match that will attempt to individually match 1,000 cancer patients with drugs targeted to their tumors’ specific genetic mutations. Ten pharmaceutical companies will provide more than 20 treatments to be studied in the program, which will begin screening patients in July. These treatments fight cancers based on their genetic code, rather than on the location of the tumor in the body. The program will help further research into how treatments based on DNA sequencing could help cancer patients with targeted drugs that are not specifically approved for their type of cancer, but could still prove beneficial.
Travel company Airbnb, which connects travelers to homeowners who have spare bedrooms, has announced a tool called Price Tips that uses machine learning to help hosts set the best possible price on a daily basis. Price Tips analyzes data that could affect demand and informs hosts if their price on a given day is too high, which could reduce the likelihood of securing a booking, or too low, which means they miss out on potential profits. In testing, Airbnb found that a host is four times more likely to book their property if their price is within five percent of the amount Price Tips recommends.
Researchers from Tufts University have developed a machine-learning technique to solve the mysterious biological phenomenon of how a flatworm’s genes allow it to regenerate to form new organisms after being sliced into multiple pieces. Biologists have observed this phenomenon for 120 years, but have until now been unable to define the genetic mechanisms that make this process possible. The researchers’ algorithms were able to create a genetic model that explained the flatworm’s regenerative abilities, as well as discover two previously unknown proteins, by analyzing a flatworm’s genetic network—too tangled for a human to understand—in just three days.
A research consortium called Marine Mammals Exploring Oceans Pole to Pole has unveiled a publicly accessible database with information from some of the coldest and most remote points of the world’s oceans. The data comes from waterproof sensors attached to seals, which regularly travel portions of the ocean that humans cannot. The sensors collect and transmit data such as ocean temperature and salinity. The data has resulted in nearly 400,000 environmental profiles of the polar oceans, one of the largest resources of its kind. Researchers are using the data to better understand a wide range of environmental issues, such as how marine life in polar regions is adapting to climate change.
Data scientists at the University of Cambridge have developed a method of mining social media data to determine how and where cities smell. The scientists analyzed 17 million Flickr images, 436,000 Instagram posts, and 1.7 million tweets geotagged in London and Barcelona that contained references to smells in English and Spanish, ranging from manure to lavender, from 2010 to 2014. With this data, the scientists were able to map the presence of pleasant and foul odors and identify a high correlation between bad smells and areas with poor air quality. The scientists hope this information will be useful in urban planning and influence how policymakers consider the effects of polluting emissions in urban areas.
Image: flickr user Liam Quinn.