10 Bits: The Data News Hotlist
This week’s list of data news highlights covers February 7-13, 2015 and includes articles about the first Senate hearing on the Internet of Things and how a genomics database is being used to identify the source of foodborne illnesses.
The Senate Commerce, Science, and Transportation Committee held the first-ever hearing on the Internet of Things. Senators and witnesses discussed how lawmakers can build a regulatory environment for the Internet of Things that does not restrict innovation in this emerging field, which all agreed offered enormous economic benefits. Committee Chairman Sen. John Thune (R-SD) kicked off the hearing by arguing, “Let’s not stifle the Internet of Things before we and consumers have a chance to understand its real promise and implications.”
The Food and Drug Administration (FDA) released final rules about how it will regulate mobile apps and medical device data systems, meaning hardware or software that transfers, stores, or displays data from medical devices. The FDA will not enforce compliance requirements such as premarket review and postmarket reporting for such technologies so long as they do not modify data or control the function of a connected medical device.
The U.S. government will publish large indexes of federal agency data that has been previously unavailable to the public. These indexes, known as Enterprise Data Inventories, describe the data collected and stored by the government. President Obama’s 2013 open data executive order required federal agencies to build and maintain these indexes, but did not require they be made publicly available. Publishing this information will allow the public to get a better understanding of what data the government has collected but not published.
The Dragon Master Foundation, a U.S.-based charity devoted to developing the technology needed to fight cancer, has partnered with five hospitals to accelerate research about pediatric cancer with data analytics. Though genomic data is increasingly used in cancer research, there has been a lack of progress in using this data, as well as demographic data, for certain types of childhood cancers. The partnership will focus on gathering more biological samples for study and will bolster the Dragon Master Foundation’s efforts to build a database of at least 50,000 human genomes for researchers working towards a cure for these diseases.
A research team from the University of Rochester and Adobe Research has developed a process to train a “deep convolutional neural network,” a machine learning algorithm that can interpret sentiments from images. The researchers are attempting to make sentiment analysis easier for computers, as people often express themselves online with images rather than only with text, which is easier to analyze. To develop the algorithm, the researchers used images from the photo-sharing website Flickr that were already labeled with sentiment tags. They hope it could eventually be used to perform large-scale sentiment analysis of social media data, which frequently includes mixed media.
Internet of Things data analytics company Space-Time Insight is piloting a virtual reality project to make better sense of data. Using the virtual reality headset Oculus Rift, the company’s pilot lets users visualize the data coming from something like a faulty transformer with 3D models to help them address problems. Space-Time Insight hopes its pilot will develop virtual reality as a platform to let users more easily interact with offsite connected devices and improve decision making.
A collaboration between the Food and Drug Administration and federal and state public health laboratories has developed GenomeTrakr, a database of bacterial pathogen genomes, to help identify the sources of foodborne disease outbreaks. GenomeTrakr relies on technology that can sequence the complete genome of an organism at one time and can identify unique features of pathogens much more accurately than previous methods.
The Department of Energy and researchers from the Lawrence Berkeley National Laboratory have developed a series of pilot projects to demonstrate the benefits of data infrastructure designed to facilitate specialized research. For example, one of the projects built a “data pipeline” to more easily share data between supercomputing sites. The goal of the pilots is to make better use of existing tools and develop new ones as needed for scientists to perform real-time analysis of their data.
Data visualization company Stamen Design has developed CaliParks, an app that relies on open data to let users discover nearby parks, to help support California’s park system, which is plagued by budget troubles. The app pulls data from government sources as well as images from Flickr and Instagram to let users find parks by location or interests, such as fishing or hiking. CaliParks’ creators expect the app to increase the popularity of the park system by connecting the social nature of parks, such as picture taking and group activities, with data about the options available to the public.
MassMutual Financial Group has created a Women in Data Science program at Smith and Mount Holyoke Colleges—women’s colleges in Massachusetts—to encourage women to enter the field. Funding from MassMutual will be used to hire faculty and develop a four-year data science curriculum that students at both colleges can participate in. Though the schools already offer some data science courses, classes like statistics are oversubscribed.