10 Bits: the Data News Hotlist
This week’s list of data news highlights covers October 15-21, 2016 and includes articles about a new pollution-sensing network in Baltimore’s harbor and an initiative to share economic data from the United States and Europe.
Vice President Biden has issued a new report detailing progress on the Cancer Moonshot, an initiative to accelerate cancer research and improve care through genomics, and has announced a series of new projects to support it. For example, the National Cancer Institute will work with Amazon Web Services and Microsoft to develop a genomic data management platform for cancer research. Additionally, the Department of Defense will work to link its cancer registry database of 250,000 biological samples to a database managed by the Environmental Protection Agency so researchers can study the relationship between environmental conditions and cancer progression.
Tesla has announced that it is now manufacturing all of its cars with hardware capable of fully autonomous driving, a capability it will eventually enable through an over-the-air software update. The hardware includes an updated suite of sensors to help the cars monitor and navigate their environments, including eight cameras that together provide 360-degree visibility, 12 ultrasonic sensors, and a forward-facing radar. Tesla has also developed an update for its self-driving software so that it can analyze the large volume of data the new hardware produces.
Researchers at Microsoft have developed an AI speech recognition system that can transcribe conversations from audio as accurately as, or slightly more accurately than, humans. The researchers trained their system on 2,000 hours of recorded human speech, and in testing, the system transcribed conversations with error rates between 5.9 and 11.1 percent, depending on the test data. By comparison, a human team consisting of a transcriber and an error checker had error rates between 5.9 and 11.3 percent.
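Speech recognition error rates like these are conventionally reported as word error rate (WER): the minimum number of word substitutions, deletions, and insertions needed to turn a system's transcript into a reference transcript, divided by the number of words in the reference. The minimal sketch below computes WER with a standard word-level edit distance. It illustrates the metric only and is not Microsoft's evaluation code; the function name and example sentences are hypothetical.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Return (substitutions + deletions + insertions) / number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")

    # prev[j] is the edit distance between the reference words seen so far
    # (excluding the current one) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,         # deletion: reference word missing from hypothesis
                curr[j - 1] + 1,     # insertion: extra word in hypothesis
                prev[j - 1] + cost,  # substitution (or exact match when cost is 0)
            )
        prev = curr
    return prev[len(hyp)] / len(ref)


# One dropped word out of six reference words gives a WER of 1/6, about 16.7 percent.
print(word_error_rate("the cat sat on the mat", "the cat sat on mat"))
```

Read this way, a 5.9 percent error rate means roughly six word errors for every 100 words in the reference transcript.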
The U.S. Geological Survey (USGS) and the Environmental Protection Agency (EPA) have announced a program called Village Blue that will install a network of connected sensors in Baltimore’s Inner Harbor to collect detailed data about water pollution. Beginning early next year, the agencies will install sensors that measure water temperature, salinity, oxygen, pH, and nitrogen and transmit their readings to a public-facing Village Blue website. With this data, the public and researchers will be able to monitor and analyze changes in water quality in real time.
Researchers at the University of Washington have developed a method for connecting devices that lack their own power sources to the Internet by reflecting ambient radio frequency signals, a technique known as backscatter. This method could connect large numbers of inexpensive, disposable devices to the Internet of Things. The researchers have developed several prototypes to demonstrate the potential of their method, including a contact lens that can connect to Wi-Fi, a skin patch that can monitor temperature and respiration rates, and a concert poster that can broadcast music samples over radio.
Swiss startup Privately SA has developed a smartphone app called Oyoty that uses an AI chatbot to help children make smarter decisions about what they share on social media. Oyoty links with Facebook, Instagram, and Twitter and can analyze a user’s public posts to flag worrisome content, such as personal contact information or revealing pictures, start a text chat explaining why the post might be cause for concern, and offer to help delete or modify the post. Privately SA developed Oyoty to make children more aware of safe online practices rather than strictly filtering content, an approach that research suggests is ineffective for helping children make better decisions about their online activities.
UK-based healthcare technology company Oxehealth has developed a system that can monitor a patient’s pulse, blood oxygenation, and respiration rate using only data from a single, standard camera rather than a suite of sensors. Oxehealth’s system analyzes subtle changes between video frames, such as a patient’s chest movement and skin coloration, and can trigger an alarm if a patient’s vital signs move outside a safe range. Oxehealth is piloting its system with the London Metropolitan Police Service and the Broadmoor high-security psychiatric hospital to test how effectively it can monitor patients’ vital signs in secure environments without invasive equipment and machinery.
A University of California project called SierraNet will install a network of sensors throughout California’s Sierra Nevada mountains to collect environmental data that can help local utilities better manage their water supplies. The sensors will collect data on snow depth, air temperature and humidity, soil temperature and moisture, and solar radiation, and wirelessly transmit this data to a base station for analysis. By monitoring environmental conditions in the mountains, utility providers, especially hydroelectric plant operators, can better estimate snow levels and melt rates; snowmelt is a crucial water supply for many areas.
Researchers at Columbia University and the University of California at Berkeley have developed open source software called ActiveClean that uses machine learning to identify mistakes in a dataset, such as typos or missing values, and prioritizes which mistakes should be corrected first to make the data as usable as possible. Identifying and correcting mistakes by hand can be time-consuming and error-prone, and it can be difficult to determine which errors in a dataset make a model built on that data unreliable. ActiveClean analyzes both a model and its underlying data to target the errors that most affect the model’s accuracy, so that usable data can be produced quickly.
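One way to estimate which errors "most affect the model" is to measure how strongly each record pulls on the model's parameters, that is, the size of that record's loss gradient. The sketch below illustrates that general idea under simplifying assumptions: a one-feature linear model y ≈ w*x + b with squared-error loss and fixed, hypothetical parameters and records. It is not ActiveClean's code or API, and the function names are invented for illustration.

```python
import math


def gradient_norm(w: float, b: float, x: float, y: float) -> float:
    """Size of the squared-error gradient for one record under the model y ≈ w*x + b."""
    residual = (w * x + b) - y
    grad_w = 2 * residual * x  # partial derivative of residual**2 with respect to w
    grad_b = 2 * residual      # partial derivative of residual**2 with respect to b
    return math.hypot(grad_w, grad_b)


def prioritize_records(w: float, b: float, records: list[tuple[float, float]]) -> list[int]:
    """Return record indices ordered from most to least influential on the current model."""
    return sorted(
        range(len(records)),
        key=lambda i: gradient_norm(w, b, *records[i]),
        reverse=True,
    )


# Hypothetical data: record 1 contains a typo (4.0 entered as 40.0).
records = [(1.0, 2.0), (2.0, 40.0), (3.0, 6.5)]
print(prioritize_records(2.0, 0.0, records))
```

With these inputs the mistyped record comes first, followed by the slightly noisy one, and the record the model already fits exactly comes last. Ranking by gradient size is only one plausible way to define "impact on the model"; it favors records whose correction would move the fitted parameters the most.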
The U.S. Department of Commerce has partnered with the European Commission Directorate General for Communications Networks, Content, and Technology (DG Connect) to develop a public repository of economic data. Though both the United States and the European Union publish economic data, it can be difficult to compare their datasets with each other or to track a single dataset over time. The database, which will launch as an alpha version in November, will provide access to large amounts of economic statistics along with tools that make it easier for the public to compare and analyze economic data from the United States and the European Union.