10 Bits: The Data News Hotlist
This week’s list of data news highlights covers January 3-9 and includes articles about how machine learning is being used to find genetic links to autism and how the FDA is finding dangerous drugs with data analytics.
The 2015 Consumer Electronics Show (CES) took place this week in Las Vegas, and companies making products for the Internet of Things took center stage with a wide variety of new consumer devices and innovations. A new generation of connected wearables featured everything from solar-charging fitness trackers to a fatigue detector that can predict that a driver will fall asleep up to five minutes before it happens. Smart cars, Internet-connected appliances, and sensor-laden clothes were also on display, indicating a promising year for the Internet of Things.
In a surprise vote this week on H.R. 37, the Promoting Job Creation and Reducing Small Business Burdens Act, detractors blocked an effort to undo progress on opening financial data, and the bill failed to pass. The bill would have exempted 60 percent of public companies from reporting financial statements to the SEC in the open, machine-readable XBRL format, requiring them to file only outdated, text-based documents. The SEC has required public companies to report financial data in the XBRL format since 2009, but it only recently began enforcing data quality standards for such reporting, a delay that left financial stakeholders frustrated about the usefulness of this data.
By feeding a computer algorithm disparate datasets about 9,000 stars, including their size, temperature, and amount of heavy elements, collectively known as spectra data, NASA researchers trained the algorithm to produce original predictions about new stars without the need for new spectra data. This machine learning technique will help NASA discover and analyze new types of stars without first having to obtain spectra data, which can be an expensive and time-consuming process. The researchers responsible for the project hope to use their algorithm to help analyze the more than 50 million stars that NASA is projected to observe in its upcoming Large Synoptic Survey Telescope project.
A team of engineers is developing machine learning techniques to find autism clues in genomics data. Their efforts have already helped identify several genes that could help scientists better understand what causes autism and how it develops. The developers of the program credit the machine learning techniques as critical for combing through the troves of data stored in the MSSNG genetic database, a joint effort by Autism Speaks and Google to create the world’s largest genomic database on autism. So far the program has identified 39 previously unrecognized genetic clues linked to autism, in addition to clues related to certain cancers and spinal muscular atrophy.
The U.S. Food and Drug Administration (FDA) will expand its Sentinel Initiative, which is devoted to identifying drug safety issues through analysis of electronic health records (EHRs). The initiative, one of the first large-scale monitoring programs focused on patient safety, will move beyond its pilot phase to scan millions of EHRs for unsafe drug-related events. The Sentinel Initiative’s pilot program analyzed the records of 178 million patients across the country to allow the FDA to more quickly identify problems caused by certain drugs and act to improve patient safety.
A partnership between New York health insurer Healthfirst and health data company eCaring will start using real-time patient data to reduce the time patients spend in the hospital. Cloud-based monitoring software will collect and analyze data generated by participants in their homes in real time to alert Healthfirst about patients’ needs. Healthfirst expects that this data will help it keep better tabs on its members’ health, intervene sooner, and avoid unnecessary trips to the hospital.
Maine’s Health Information Exchange (HIE) will begin rolling out predictive analytics services to its members, which include hospitals and ambulatory services. The predictive analytics service will help hospitals cut costs by identifying which patients are most likely to return to the emergency room or be readmitted in the next month. Case workers and primary care providers can then work with these patients to address the potential reasons for repeated trips to the hospital, which can be costly. Developers of the platform hope their analytics software will help keep physicians better informed about their patients’ health and identify previously unknown at-risk patients.
Disney Research has developed a machine learning algorithm that can analyze photographs on a wide variety of factors, ranging from basic information such as time and location to more advanced attributes such as texture and aesthetic quality. The algorithm can then group and organize pictures into albums, using preferences learned from an analysis of Flickr data about which types of images belong in an album and in what order. The model also learned not to use similar photographs more than once when generating an album. This machine learning technique offers interesting possibilities for automation and optimization for consumers interested in using online platforms for album creation.
Twitter released software on GitHub, an online repository for open source code, to invite developers to help in the fight against hacking and spamming. The software tool, called AnomalyDetection, uses an algorithm to identify actions that deviate from the norm, which might indicate malicious activity. Twitter hopes that opening the AnomalyDetection code to the developer community will improve the tool as well as help others crack down on things like online spam.
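To give a sense of what flagging activity that deviates from the norm can look like, here is a minimal Python sketch. It is not the AnomalyDetection algorithm itself; it assumes a much simpler approach, a modified z-score built on the median and the median absolute deviation (MAD), and the function name, threshold, and sample login counts are all hypothetical.

```python
from statistics import median


def find_anomalies(values, threshold=3.5):
    """Return the indices of values that sit far from the typical level.

    Uses a modified z-score based on the median and the median absolute
    deviation (MAD). This is an illustrative stand-in, not the method
    used by Twitter's tool.
    """
    med = median(values)
    mad = median(abs(v - med) for v in values)
    if mad == 0:
        # At least half the values equal the median; no usable scale.
        return []
    # 0.6745 rescales the MAD so the score is comparable to a standard
    # z-score when the data is roughly normal.
    return [
        i for i, v in enumerate(values)
        if 0.6745 * abs(v - med) / mad > threshold
    ]


# Hypothetical hourly login counts with one sudden spike.
hourly_logins = [102, 98, 101, 97, 103, 99, 100, 480, 101, 96]
anomalies = find_anomalies(hourly_logins)
print(anomalies)
```

The median and MAD are used here instead of the mean and standard deviation because a single large spike barely moves them, so the spike cannot hide itself by inflating the baseline it is measured against.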
A new deal between Genentech, a biotech company, and 23andMe, a personal genetics company, makes large amounts of genetic and health data available for research. 23andMe sells inexpensive DNA test kits to its customers, most of whom then volunteer to contribute their genetic data to 23andMe’s database. The deal gives Genentech access to the genetic database to improve genetics research and help develop treatments for diseases like Parkinson’s. 23andMe’s current database is populated with genetic information volunteered by 600,000 of its 800,000 customers.