10 Bits: The Data News Hot List
This week’s list of data news highlights covers July 12-18 and includes articles about the U.S. Department of Commerce’s plan to hire a Chief Data Officer and an effort to track gas leaks using Google Maps.
U.S. Secretary of Commerce Penny Pritzker announced this week that the department would hire a chief data officer for the first time. The department, which houses the Census Bureau, the Patent and Trademark Office, and the National Oceanic and Atmospheric Administration, oversees an extremely broad range of data collection, storage, and analysis initiatives. The chief data officer will be responsible for developing and implementing plans for the future of the department’s data.
The United Kingdom’s registrar of businesses, Companies House, announced this week that it will make all its company accounts data freely available to the public for the first time. The move follows the 2013 G8 agreement, which contained provisions to fight tax evasion and publish data openly and in machine-readable formats. Companies House holds 130 million documents on 3 million UK companies in its database, and accessing accounts data currently costs £1 for each file downloaded. The government hopes access to this data will improve accountability and spur the development of new applications that use the data.
The European Union issued a request for proposals for a pan-European open data portal this week. The portal, which would likely be the world’s largest and eclipse Data.gov.uk and Data.gov, will feature visualization tools along with data download and manipulation capabilities. The request emphasizes using open source software to remove license fees and reduce the risk of vendor lock-in.
Conservation group Environmental Defense Fund and researchers at Colorado State University are using Google Maps to track natural gas leaks in urban pipelines. The team, which has collected over 15 million readings across thousands of miles of roadway and verified its findings with utilities, launched a website displaying the results this week. The site features a variety of visualizations from Staten Island, New York; Boston; and Indianapolis, showing how Boston’s older pipes make for many more leaks than Indianapolis’ newer system. The project’s organizers hope to track more cities in the future and use the data and visualizations as an advocacy tool to encourage cities to build better infrastructure that suffers fewer potentially dangerous leaks.
Software developer Pete Warden wants to lower the barriers to using deep learning technology on mobile devices with an open source project called DeepBelief. The technology integrates deep learning, an offshoot of machine learning that is loosely based on the action of neurons in a human brain, and it already underlies some systems at his travel image analysis company Jetpac. It was also recently used in an iPhone app called Spotter, which uses DeepBelief to identify objects a user points the camera toward. Warden hopes the technology will demystify deep learning for smaller app developers.
Airbnb is building language processing software to extract words from the company’s various sources of rental reviews and descriptions in order to create a kind of automated travel agent. The company uses the descriptive words to create a set of attributes matched with each city and hopes to offer travelers recommendations for cities or rentals similar to previous requests but which they might not have thought of. In the future, the company also hopes to recommend travel options based on a user’s location and other information, along with the travel preferences of similar users.
Microsoft Research showed off a new machine learning system this week designed to demonstrate the feasibility of running massive machine learning jobs on a distributed network of commodity hard drives. Researchers presented the system, called “Project Adam,” identifying dogs in images and even distinguishing between photos of two different breeds of Welsh corgis. The team focused on a machine vision demonstration because that field offered the largest dataset, but in principle the computing power could be applied to all kinds of machine learning tasks. The team says its system is 50 times faster and more than twice as accurate as similar efforts have been in the past.
Bay Area startup BloomSky has designed a personal weather probe that can link consumers’ yards to a larger climate network and hopes eventually to deliver hyperlocal weather reports in real time. The probe collects data on temperature, barometric pressure, rain, humidity, and ultraviolet light levels, and features a camera that can take time-lapse footage. The device can be controlled via Wi-Fi by in-home control units or an accompanying app.
Pennsylvania-based startup True Fit uses data analytics to ensure that online shoppers find clothes that fit them. The company collects proprietary fit data from more than 1,000 brands to determine which brands’ clothing runs large or small or has unusual proportions. Users can create accounts on the site by inputting their height and weight, along with the size and brand of their favorite piece of clothing, and receive recommendations based on their inferred fit. On average, True Fit has helped its clients, such as Macy’s, Nordstrom, and Guess, reduce online return rates by 10 percent so far.
Data for Good, a news site devoted to socially minded data projects, launched this week. The site, modeled after popular computer science news site Hacker News and its data science clone Datatau, was built at a startup weekend in Hamburg, Germany. Ultimately, the site’s founders plan to offer a listing for all socially beneficial data science projects and a collaboration tool to connect data providers, civic hackers, and government administrators interested in deploying projects.
Photo: Flickr user Daniel Stockman