10 Bits: the Data News Hotlist
This week’s list of data news highlights covers December 3-9, 2016 and includes articles about Amazon’s new store without any checkout lines and a new database helping Holocaust survivors reclaim lost property.
Amazon has developed a proof-of-concept retail store called Amazon Go that uses connected sensors and artificial intelligence to automatically track what customers are taking, allowing customers to simply pick up what they want to buy and walk out of the store without checking out. Customers scan a smartphone app linked to their Amazon account when they enter the store, and as they shop, sensors on shelves and cameras using computer vision algorithms track what customers take and automatically add it to their Amazon shopping carts. Once customers leave the store, they are charged for whatever is in their carts, eliminating the need for cashiers or self-checkout machines.
Facebook, Microsoft, YouTube, and Twitter have announced that they will collaborate to build a shared database of terrorist imagery, such as recruitment videos, to help track and remove the content when it appears online. The database will consist of hashes—unique identifiers assigned to a particular file—of violent terrorist imagery or recruitment imagery that participating companies have previously removed from their platforms. Sharing all of these hashes can make it easier and faster to identify and remove this imagery, as it enables companies to cross-reference the database whenever they come across new imagery to see if it was previously flagged elsewhere.
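The cross-referencing step described above can be sketched in a few lines. This is only an illustration under stated assumptions: the announcement does not say which hashing scheme the companies use, so the sketch uses SHA-256, which matches only byte-for-byte identical files, and the `SharedHashDatabase` class and its method names are hypothetical.

```python
import hashlib


def file_hash(data: bytes) -> str:
    """Return a hex digest identifying the exact bytes of a file.

    Assumption: SHA-256 stands in for whatever hashing scheme the
    participating companies actually use.
    """
    return hashlib.sha256(data).hexdigest()


class SharedHashDatabase:
    """Hypothetical shared store holding hashes only, never the files."""

    def __init__(self):
        self._hashes = set()

    def flag(self, data: bytes) -> str:
        """Record the hash of content a participant has removed."""
        digest = file_hash(data)
        self._hashes.add(digest)
        return digest

    def is_flagged(self, data: bytes) -> bool:
        """Check whether identical content was flagged before."""
        return file_hash(data) in self._hashes


shared_db = SharedHashDatabase()

# One company removes a file and contributes its hash.
shared_db.flag(b"bytes of a previously removed video")

# Another company checks newly uploaded files against the database.
print(shared_db.is_flagged(b"bytes of a previously removed video"))  # True
print(shared_db.is_flagged(b"bytes of an unrelated video"))          # False
```

Because only hashes are stored, participants can check new uploads against the database without exchanging the underlying imagery.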
A team of doctors and engineers from Sweden and Finland has developed a machine learning tool for assessing an elderly person’s risk of developing dementia. The team used machine learning to analyze the results of a Finnish study that surveyed people between 65 and 79 years old about their health and cognitive performance, and then surveyed them again 10 years later to compare the results. This analysis allowed the team to develop a dementia risk index that estimates the likelihood that an elderly person in good cognitive health will go on to develop dementia.
The nonprofit World Jewish Restitution Organization (WJRO) has developed a database designed to help Holocaust survivors and their families reclaim property lost during World War II in Warsaw. A recent Polish law gave people a six-month window to file claims for 2,613 properties lost during World War II, many of which belonged to Jews. Claimants who fail to file before the six months expire lose any claim to the property. WJRO’s database matches the street addresses of the unclaimed properties with names found in historical documents relating to each property, so users can search for their names to more quickly identify any property that might be rightfully theirs.
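The address-to-name matching can be pictured as a simple lookup index. The sketch below is hypothetical and does not reflect how WJRO's database is actually built: the function names are invented for illustration, and the addresses and names are placeholders, not real records.

```python
from collections import defaultdict


def build_name_index(property_records):
    """Map each normalized name to the set of addresses it appears with.

    `property_records` is assumed to be a dict of
    street address -> list of names found in related documents.
    """
    index = defaultdict(set)
    for address, names in property_records.items():
        for name in names:
            index[name.strip().lower()].add(address)
    return index


def find_properties(index, name):
    """Return the sorted addresses linked to a name, ignoring case."""
    return sorted(index.get(name.strip().lower(), set()))


# Placeholder data only; these are not real addresses or names.
records = {
    "Example Street 1": ["Placeholder-A", "Placeholder-B"],
    "Example Street 2": ["Placeholder-A"],
}

name_index = build_name_index(records)
print(find_properties(name_index, "placeholder-a"))
# ['Example Street 1', 'Example Street 2']
```

Indexing by name rather than by address is what lets a user start from a family name and arrive at candidate properties.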
Researchers at Princeton University have developed a method for improving the accuracy of predictions about whether or not a person will develop breast cancer from 70 percent to 92 percent. The researchers used a technique called the influence score (I-score) to analyze genetic data and differentiate between predictive and non-predictive variables for breast cancer, reducing the “noise” involved in making predictions. The I-score was also able to identify variables that have predictive value but that are not normally considered in traditional methods of breast cancer risk prediction.
Researchers at the U.S. Department of Energy’s Argonne National Laboratory have developed a model for predicting the physical, chemical, and mechanical properties of nanomaterials (atomic-scale materials) using machine learning. Nanomaterials can have useful applications, but building models of these materials to determine what their properties are can take years. The researchers compiled experimental data and theoretical calculations about stanene, a 2D nanomaterial composed of a one-atom-thick sheet of tin, and used a machine learning system to develop a model that could successfully predict all of these properties, reducing the time needed to create an accurate nanomaterial model to just a few months.
A new benefit corporation called Data Does Good has developed a service that allows consumers to donate their anonymized online shopping data in exchange for annual cash donations to the nonprofit of their choice. Data Does Good will automatically strip personally identifiable information from users’ Amazon shopping histories and aggregate this data for resale. The value of this bulk data then allows Data Does Good to donate $15 per year, indefinitely, to any U.S.-based nonprofit a user chooses, as long as he or she makes at least one purchase on Amazon each year.
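The strip-then-aggregate step can be sketched as follows. This is a minimal illustration, not the company's actual pipeline: the field names, the `PII_FIELDS` set, and the sample orders are all hypothetical, and simply dropping fields is a simplification that would not by itself guarantee anonymity.

```python
from collections import Counter

# Hypothetical field names treated as personally identifiable.
PII_FIELDS = {"name", "email", "shipping_address"}


def strip_pii(order):
    """Return a copy of an order with the identifying fields removed."""
    return {key: value for key, value in order.items() if key not in PII_FIELDS}


def aggregate_by_category(orders):
    """Total spending per category, in cents, across all users' orders."""
    totals = Counter()
    for order in orders:
        clean = strip_pii(order)
        totals[clean["category"]] += clean["price_cents"]
    return dict(totals)


# Invented sample orders for illustration only.
orders = [
    {"name": "User One", "email": "one@example.com",
     "shipping_address": "1 Example Rd", "category": "books", "price_cents": 1500},
    {"name": "User Two", "email": "two@example.com",
     "shipping_address": "2 Example Rd", "category": "books", "price_cents": 900},
    {"name": "User Two", "email": "two@example.com",
     "shipping_address": "2 Example Rd", "category": "garden", "price_cents": 2500},
]

print(aggregate_by_category(orders))
# {'books': 2400, 'garden': 2500}
```

The aggregated totals carry no per-user fields, which is the property that makes the bulk data suitable for resale in the scheme described above.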
Leading AI research organizations DeepMind and OpenAI have separately announced they will make their AI development platforms freely available as open source to advance AI research. OpenAI has published its software platform, Universe, which helps AI developers train their systems by having them play games and operate other computer applications. DeepMind is open-sourcing DeepMind Lab, a 3D videogame-like training environment that developers can use to make their AI systems learn to accomplish different tasks.
The U.S. Senate has passed the bipartisan 21st Century Cures Act, which includes nearly $3 billion in funding to expand precision medicine research launched during the Obama administration, including the Precision Medicine Initiative and the White House Brain Research Through Advancing Innovative Neurotechnologies (BRAIN) Initiative, which focuses on mapping and modeling how the brain functions. The 21st Century Cures Act has already passed the House, and President Obama has announced that he will sign the bill into law.
A researcher at Microsoft Research in Israel has developed a machine learning algorithm that can predict whether a drug will be recalled by analyzing search activity on Microsoft’s Bing search engine. The researcher trained his algorithm on millions of historical queries from 2015, which included the names of 300 different drugs that Bing users searched for at least 1,000 times. Based on this data, the algorithm was able to identify drugs that would be recalled up to two days before a recall was announced by regulators.
Image: Patrick Hoesly.