This week’s list of data news highlights covers September 3-7, 2018, and includes articles about an AI system that can identify images of child abuse and a new search engine for open data.
Google has developed an AI tool designed to identify images of child sexual abuse material (CSAM) online and has made it freely available. Automated systems to detect CSAM already exist, but they typically check an image against a database of previously reported CSAM, making them ineffective for new material that has never been reviewed. Google’s tool instead uses AI to analyze suspected CSAM and prioritize the material most likely to be CSAM for human review. In a trial, Google’s tool helped a moderator review 700 percent more CSAM than a moderator relying on traditional tools over the same time period.
Hearing aid manufacturer Starkey Hearing Technologies has developed a new hearing aid called Livio AI that uses AI to improve a user’s hearing more effectively than traditional methods. Livio AI uses machine learning to monitor a user’s environment and can automatically detect and amplify sounds a user wants to hear while dampening background noise. Livio AI also has sensors that can track a user’s movements and detect falls, automatically notifying a caretaker or emergency services, and it can provide in-ear translation of other languages.
The U.S. Centers for Disease Control and Prevention (CDC) and IBM have developed a blockchain-based system to improve how public health agencies collect and share data. The system is designed to coordinate the wide variety of health data the CDC collects from surveys, such as the National Ambulatory Medical Care Survey, allow the data to be easily updated, and make it easy for the CDC to grant researchers access to the data. The system is still a prototype, but could eventually make it easier for researchers to study public health crises such as the opioid epidemic.
A startup called WatchTower Robotics has developed a robot that can swim through water pipes and automatically detect leaks. Leaking pipes can result in serious damage and waste trillions of gallons of water per year, and while leak-detection systems exist, they typically rely on acoustic sensors, which are not reliable in noisy, densely populated areas. WatchTower Robotics’ robot, which resembles a badminton birdie, uses sensors on its fins to detect when suction tugs on it, indicating a nearby leak. The robot logs the location of all these tugs, so when it is collected, this data can generate accurate maps of where pipes are leaking.
Researchers at the Auckland University of Technology have developed an AI system that can analyze brain activity and predict a person’s choice before they consciously make it. The system relies on data from an electroencephalogram headset worn by a participant while researchers expose them to logos of different beverages. The system learns to associate brain activity patterns with the different logos, and when a participant is asked to choose between beverages, the system can predict which one they will choose 0.2 seconds before they consciously perceive it.
Google has developed a service called Dataset Search to make it easier for people to locate freely available data online. Despite the large amounts of open data on the Internet, researchers, journalists, data scientists, and others often struggle to identify what data is available to them because it can be difficult to find. Unlike other Google search tools, Dataset Search does not analyze the data itself to determine if it is relevant to a user’s query, and instead relies on data owners tagging their data with a standardized vocabulary from Schema.org, an initiative launched by Google, Microsoft, Yahoo, and Yandex to make it easier to manage structured data online.
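To illustrate the kind of tagging involved, the sketch below builds a Schema.org `Dataset` description as JSON-LD, a format commonly embedded in a web page to carry structured data. The `dataset_jsonld` helper, the example dataset, and its URLs are hypothetical; only the property names (`name`, `description`, `url`, `keywords`, `license`) come from the Schema.org vocabulary, and a real publisher would likely include more of them.

```python
import json


def dataset_jsonld(name, description, url, keywords, license_url):
    """Build a Schema.org Dataset description as a JSON-LD string.

    Hypothetical helper for illustration; not part of any Google tool.
    """
    record = {
        "@context": "https://schema.org",
        "@type": "Dataset",
        "name": name,
        "description": description,
        "url": url,
        "keywords": list(keywords),
        "license": license_url,
    }
    return json.dumps(record, indent=2)


# Made-up dataset used only to show the shape of the markup.
markup = dataset_jsonld(
    name="City Air Quality Readings 2018",
    description="Hourly particulate readings from municipal sensors.",
    url="https://example.org/data/air-quality-2018",
    keywords=["air quality", "open data"],
    license_url="https://creativecommons.org/publicdomain/zero/1.0/",
)
print(markup)
```

A data owner would typically place this output inside a `<script type="application/ld+json">` element on the page that describes the dataset.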
Genetic analysis company Personal Genome Diagnostics has developed a machine learning tool called Cerebro that can identify cancerous mutations more accurately than existing methods. Sequencing tumor genomes provides valuable data that can help doctors identify the best treatments. However, tumors are constantly mutating, and analysis software often requires human experts to check for missed mutations or false positives. Cerebro automates this review process by generating a confidence score for each suspected mutation to indicate the likelihood that it is a false positive.
Uber has developed a feature called Ride Check that uses a smartphone’s GPS, accelerometer, and other sensors to automatically detect if a user experienced a crash and call emergency services. Ride Check works inside the Uber app and can detect spikes of force, rapid speed changes, or other anomalies that could indicate a crash, and sends a notification to users’ phones to verify if they were in an accident and call for emergency assistance if necessary. Ride Check will also send this notification if it detects prolonged or unusual stops to verify a user’s safety.
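The general idea of flagging a force spike followed by a rapid speed change can be sketched as follows. This is not Uber’s method, which the article does not describe in detail; the `Sample` type, the `possible_crash` function, and every threshold are assumptions chosen only for illustration.

```python
import math
from dataclasses import dataclass

G = 9.81  # standard gravity in m/s^2


@dataclass
class Sample:
    t: float      # timestamp in seconds
    ax: float     # accelerometer reading, x axis, m/s^2
    ay: float     # accelerometer reading, y axis, m/s^2
    az: float     # accelerometer reading, z axis, m/s^2
    speed: float  # GPS speed in m/s


def possible_crash(samples, force_g=4.0, speed_drop=8.0, window=2.0):
    """Return True if a force spike is followed by a sharp speed drop.

    All thresholds are hypothetical: a spike is an acceleration magnitude
    of at least `force_g` times gravity, and a sharp drop is a loss of at
    least `speed_drop` m/s within `window` seconds of the spike.
    """
    for i, s in enumerate(samples):
        magnitude = math.sqrt(s.ax ** 2 + s.ay ** 2 + s.az ** 2)
        if magnitude < force_g * G:
            continue
        # Speed just before the spike (or at the spike if it is the first sample).
        before = samples[i - 1].speed if i > 0 else s.speed
        # Lowest speed seen from the spike until the window closes.
        after = min(x.speed for x in samples[i:] if x.t - s.t <= window)
        if before - after >= speed_drop:
            return True
    return False


# Synthetic trips: one steady, one with a spike and a sudden stop.
steady = [Sample(float(t), 0.0, 0.0, 9.81, 15.0) for t in range(5)]
sudden = [
    Sample(0.0, 0.0, 0.0, 9.81, 15.0),
    Sample(1.0, 45.0, 0.0, 9.81, 14.0),
    Sample(2.0, 0.0, 0.0, 9.81, 1.0),
]
print(possible_crash(steady), possible_crash(sudden))
```

Requiring both signals before sending a notification is one way to reduce false alarms from a dropped phone or a hard brake, which is presumably why the feature asks users to confirm before calling for help.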
A startup called Rigetti Computing has launched its Quantum Cloud Service (QCS) to make quantum computing services available via the cloud. Quantum computers have the potential to be substantially more effective than traditional computers at solving certain kinds of computing challenges, such as advanced physics calculations. However, since quantum computing resources are limited, researchers often must rely on application programming interfaces (APIs) to send bits of data to quantum computers for processing and then wait for the results to be sent back, which can be time intensive. QCS instead relies on a datacenter combining both traditional and quantum computers, allowing users to run quantum algorithms 20 to 50 times faster than on traditional cloud computing setups.
Researchers from the Allen Institute for Artificial Intelligence, OpenAI, and fast.ai have developed an AI model called Embeddings from Language Models (ELMo) that can comprehend language significantly better than previous approaches. ELMo relies on a technique called word embeddings, which maps words’ relationships to each other based on where they appear in text, allowing it to train unsupervised on unstructured data, a challenge that has long plagued natural language processing researchers. In tests, ELMo outperformed leading systems at tasks such as reading comprehension and sentiment analysis by as much as 25 percent, which is a major jump in a field that typically sees only incremental progress.