10 Bits: the Data News Hotlist
This week’s list of data news highlights covers January 13-19, 2018, and includes articles about how Slack is using AI to make work messages less annoying and an AI system that can make bespoke shampoo.
Health technology company Kinsa claims that, by using data from its smart thermometers, it can track the spread of the flu in the United States faster and more accurately than public health authorities such as the Centers for Disease Control and Prevention (CDC). The CDC tracks the flu by analyzing data from hospitals and health clinics, which report the number of cases with flu-like symptoms they treat, but this data is not always timely. Kinsa receives approximately 25,000 body temperature readings per day from its smartphone-linked ear and oral thermometers, which the company says are in over 500,000 households across the country. It maps this data to track where fevers are on the rise, which could indicate flu outbreaks.
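As a rough illustration of the mapping idea, the sketch below groups temperature readings by region and flags regions where the share of fever-level readings has risen from one week to the next. It is not Kinsa's method: the 38.0 °C fever threshold, the region labels, the `min_increase` cutoff, and both function names are assumptions made for this example.

```python
from collections import defaultdict

# Assumption: readings at or above 38.0 °C count as a fever.
FEVER_THRESHOLD_C = 38.0


def fever_share_by_region(readings):
    """Return the fraction of fever-level readings per region.

    `readings` is an iterable of (region, temperature_celsius) pairs.
    """
    totals = defaultdict(int)
    fevers = defaultdict(int)
    for region, temp_c in readings:
        totals[region] += 1
        if temp_c >= FEVER_THRESHOLD_C:
            fevers[region] += 1
    return {region: fevers[region] / totals[region] for region in totals}


def rising_regions(last_week, this_week, min_increase=0.05):
    """List regions whose fever share grew by at least `min_increase`."""
    before = fever_share_by_region(last_week)
    after = fever_share_by_region(this_week)
    return sorted(
        region
        for region in after
        if region in before and after[region] - before[region] >= min_increase
    )


# Hypothetical sample data, not real measurements.
last_week = [("A", 37.0), ("A", 38.5), ("B", 36.8), ("B", 37.1)]
this_week = [("A", 38.2), ("A", 38.9), ("B", 37.0), ("B", 36.9)]
print(rising_regions(last_week, this_week))
```

A real system would also need to account for how many thermometers report from each region, since a handful of readings can swing the share sharply.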
Workplace messaging software company Slack is developing machine learning tools that analyze users’ unread messages and distinguish the messages users are likely to consider important from low-priority ones. Slack says its users send 70 messages per day on average, and it wants to keep that volume from becoming distracting while ensuring users can easily access important communications. To build these tools, Slack creates models called work graphs, which chart relationships between users based on whom they interact with and what topics they discuss, similar to the social graphs that social media companies like Facebook build of their users. These graphs can help identify whether a particular message is important to a specific user.
Microsoft and Alibaba, working separately, have both developed AI systems that score higher than humans on a reading-comprehension test, marking the first time a computer system has done so. The AI systems analyzed excerpts from over 500 Wikipedia articles on a wide variety of topics and then attempted to answer 100,000 questions related to these topics from the Stanford Question Answering Dataset (SQuAD), a common benchmark for measuring the performance of natural language processing systems. Outperforming humans at reading comprehension is a significant milestone, but many AI researchers have noted that while Microsoft’s and Alibaba’s systems are impressive, they are only effective in very narrowly controlled tests, and would have to improve significantly to achieve a general level of meaningful reading comprehension comparable to humans.
Michigan State University has launched an initiative called “Enslaved: The People of the Historic Slave Trade” to develop a public database linking data from a variety of sources about enslaved people in the Americas, which could help researchers and members of the public identify enslaved ancestors and track their lineage. Data about enslaved people comes from baptismal records, plantation inventories, and other sources that are frequently handwritten and damaged, making them difficult for researchers to analyze on a large scale. Many digitization projects have made such data available online and in machine-readable formats, but this data is fragmented and spread across multiple databases. The “Enslaved” database will allow users to search for records about enslaved people from many of these repositories simultaneously.
Researchers at Boston University have developed a prototype AI system that can analyze images of kidney biopsies and outperform human experts in six different disease classification tasks, such as predicting which stage of chronic kidney disease a biopsy indicates. The researchers believe their system could help develop both diagnostic and prognostic applications, which could help alleviate the health-care system’s shortage of nephrologists by automating portions of their work.
New York City-based startup Prose has developed an AI system that generates personalized shampoo formulas for customers based on data about their hair, lifestyle, and environment. The system uses 85 different data points, such as whether a customer is vegan, lives in an area with a lot of air pollution, or has dry hair, to determine which combination of 76 different potential ingredients would best suit that customer. Prose has already designed 1,000 different shampoo formulas for customers and can keep refining its system as more people use it.
Researchers at Johns Hopkins University have developed a test that can detect signs of eight common cancers from just a blood sample and could cost as little as $500, substantially less than other, more invasive cancer screenings. The test searches for the presence of eight cancer proteins and 16 cancer-related mutations that can appear in a patient’s blood before the patient ever displays symptoms, and then uses a machine learning algorithm to determine the location of a tumor based on these signs. In a trial, the test could detect ovarian cancer with 98 percent accuracy and determine the location of a tumor 83 percent of the time.
Italian oil company Eni has developed a supercomputer called HPC4 to identify underground oil and gas reserves. With a peak performance of 18.6 petaflops, HPC4 is the most powerful commercially owned supercomputer and one of the 10 most powerful supercomputers in the world. HPC4 will analyze data from prospecting drones to help pinpoint the location of new oil and gas deposits, which can help Eni avoid spending large amounts of money to drill for oil in the wrong place. HPC4 will also analyze data from sensors that monitor staff in dangerous work environments.
NASA researchers have successfully demonstrated a navigation system that lets a spacecraft automatically identify its position by calculating its distance from different pulsars, which are ultra-dense stars that emit regular pulses of radiation. Spacecraft normally rely on a global network of radio antennas to navigate in space, but that approach becomes less accurate the farther a spacecraft is from Earth. The researchers developed an algorithm that can automatically determine an object’s absolute location by analyzing changes in emissions from pulsars, similar to how a GPS receiver determines its position based on its distance from different satellites. Because this calculation happens locally and does not depend on feedback from Earth, the new system could allow spacecraft to navigate more efficiently and effectively in deep space.
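The GPS comparison can be made concrete with a simplified sketch of locating a point from its distances to known reference points. This is not NASA's algorithm, which works from the timing of pulsar emissions. The example assumes a flat 2D plane, three reference points with known coordinates, and exact distance measurements, and the function name `trilaterate_2d` is invented for illustration.

```python
import math


def trilaterate_2d(beacons, distances):
    """Estimate (x, y) from distances to three known reference points.

    Each distance defines a circle around its reference point. Subtracting
    the first circle equation from the other two leaves two linear
    equations in x and y, solved here with Cramer's rule.
    """
    (x1, y1), (x2, y2), (x3, y3) = beacons
    d1, d2, d3 = distances

    a11, a12 = 2 * (x2 - x1), 2 * (y2 - y1)
    a21, a22 = 2 * (x3 - x1), 2 * (y3 - y1)
    b1 = d1**2 - d2**2 + x2**2 - x1**2 + y2**2 - y1**2
    b2 = d1**2 - d3**2 + x3**2 - x1**2 + y3**2 - y1**2

    det = a11 * a22 - a12 * a21
    if abs(det) < 1e-12:
        raise ValueError("reference points are collinear; position is ambiguous")

    x = (b1 * a22 - a12 * b2) / det
    y = (a11 * b2 - a21 * b1) / det
    return x, y


# Hypothetical reference points and a known true position to check against.
beacons = [(0.0, 0.0), (10.0, 0.0), (0.0, 10.0)]
true_position = (3.0, 4.0)
distances = [math.dist(true_position, b) for b in beacons]
print(trilaterate_2d(beacons, distances))
```

With noisy measurements or more than three reference points, a least-squares fit would be the usual replacement for the exact solve shown here.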
The Intelligence Advanced Research Projects Activity (IARPA), which is overseen by the U.S. Office of the Director of National Intelligence, has launched a research initiative called Deep Intermodal Video Activity (DIVA) to develop a machine learning system that can automatically monitor multiple video feeds, such as those from security cameras and body cameras, and warn authorities about signs of potential terrorist attacks. IARPA’s goal is to have this system supplement human monitoring of areas around government buildings and densely populated public areas and prompt interventions to stop attacks before they happen. DIVA will first focus on developing software that can identify general activities, such as a person holding an object, and then move on to more complex activities that could signal an impending attack, such as a person carrying a gun or dropping off an object in a public location.