This week’s list of data news highlights covers April 14-20, 2018, and includes articles about a microscope that uses machine learning to detect cancer and a new method for detecting passwords accidentally posted online.
Researchers at the Massachusetts Institute of Technology have developed a machine learning model that processes audio in a way similar to how the human brain does. The researchers designed their model to imitate the theorized hierarchical structure of the brain’s auditory cortex, in which sensory information goes through successively more complex stages of processing. The model identified spoken words in audio clips and the genre of music clips with the same accuracy as a human.
Researchers at Google have developed a prototype augmented reality microscope (ARM) that uses machine learning to analyze images in real time to identify different cancers. ARM uses a modified digital microscope, and its software can process the contents of slides at a rate of 10 frames per second, highlighting cancer cells in the viewer. ARM can currently detect breast and prostate cancer but could be trained to identify signs of a variety of other diseases, such as malaria, which could make it valuable in developing countries where doctors trained to diagnose these diseases are scarce.
A team of management professors and researchers at Washington University in St. Louis has created a program named SimSoy that can analyze historical data about a farm’s weather and soil to predict which of 182 varieties of soy would perform best on that particular farm. The team compiled data about soy seed varieties, historical weather data, and soil data for large areas of farmland in the midwestern United States and Canada to develop predictive models that can estimate which soy variety would produce the best yield on farms that have never grown it before.
A pair of South Korean researchers from the Cheonan Public Health Center and the Korea Advanced Institute of Science and Technology have developed a machine learning system that can identify whether a person is likely to develop Alzheimer’s based on brain images. A scanning technique called positron emission tomography (PET) can reveal growths of certain protein clumps in the brain that form as a result of Alzheimer’s disease. Spotting these growths can reveal whether a person will soon develop the disease or is just experiencing regular cognitive decline with age, but analyzing PET scans is time-intensive and error-prone. The researchers’ system can identify whether a person with mild cognitive impairment will develop Alzheimer’s within three years with 81 percent accuracy.
Google and 3D scanning nonprofit CyArk have launched a project called Open Heritage to scan historical sites at risk of destruction from natural disasters or human conflict and recreate them as 3D renderings. Open Heritage will use laser mapping and high-resolution photography to create accurate digital representations of 25 locations in 18 countries, including the Ananda Ok Kyaung temple in Myanmar and the Al Azem Palace in Syria.
Researchers at Ben-Gurion University in Israel and the University of Washington have developed an algorithm that can identify social media profiles that have a high likelihood of being fake. In graph theory, social media profiles can be represented as nodes, and their connections to other profiles as edges, with the number of edges depending on the size of each profile’s social network. The researchers trained their system to identify anomalous patterns of edges that indicate a particular profile is likely to be fake. In a test on 10 different simulated and real-world social networks, such as Yelp and Twitter, the system detected fake profiles with higher accuracy and lower false positive rates than traditional methods.
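The node-and-edge representation can be illustrated with a short Python sketch. This is not the researchers' algorithm: as a stand-in, it uses a deliberately simple heuristic (the local clustering coefficient, which measures how many of a profile's connections are also connected to each other), and the names and edges below are made up for illustration.

```python
from itertools import combinations


def build_graph(edges):
    """Store an undirected graph as {node: set of neighboring nodes}."""
    graph = {}
    for a, b in edges:
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set()).add(a)
    return graph


def clustering(graph, node):
    """Fraction of a node's neighbor pairs that are themselves connected."""
    neighbors = sorted(graph[node])
    k = len(neighbors)
    if k < 2:
        return 0.0
    linked = sum(1 for a, b in combinations(neighbors, 2) if b in graph[a])
    return linked / (k * (k - 1) / 2)


# Hypothetical network: two groups of friends plus one profile ("bot")
# that connects to people who do not know each other.
edges = [
    ("ana", "ben"), ("ana", "cy"), ("ben", "cy"), ("cy", "dee"), ("ben", "dee"),
    ("eve", "fay"), ("eve", "gus"), ("fay", "gus"),
    ("bot", "ana"), ("bot", "dee"), ("bot", "eve"),
]

graph = build_graph(edges)
scores = {node: clustering(graph, node) for node in graph}

# The profile whose connections are least connected to one another.
suspect = min(scores, key=scores.get)
print(suspect, scores[suspect])
```

A low score alone would not prove a profile is fake; it is only one example of an edge pattern that a trained system could weigh.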
Researchers at the Massachusetts Institute of Technology have developed an AI system called RoadTracer that can automatically build maps from aerial imagery 45 percent more accurately than existing approaches. Constructing maps from aerial imagery can be tedious and expensive, making it infeasible to maintain up-to-date maps of areas with low populations or frequent construction. Traditional automated approaches build maps by labeling each pixel in an image as “road” or “not road,” but trees, buildings, and other objects can obscure the road, making this unreliable. Instead, RoadTracer first predicts connections between known locations on a road network and then analyzes pixels to confirm and adjust these predictions.
Chaos theorists at the University of Maryland have developed a machine learning system that can imitate the properties of a chaos theory equation called the Kuramoto-Sivashinsky equation, allowing it to simulate complex behaviors of certain physical phenomena. The equation behaves like a flame that reacts as it moves through a combustible medium and is useful for researchers studying turbulence. Rather than teach the system the equation itself, the researchers trained the system on data about how the equation behaves, allowing it to learn to predict the equation’s behavior for significantly longer periods of time than existing approaches. This approach could eventually allow machine learning systems to predict chaotic systems with unknown equations, enabling applications such as weather forecasting without the need for sophisticated atmospheric models.
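For reference, a commonly used one-dimensional form of the Kuramoto-Sivashinsky equation is shown below. The exact form and parameters the researchers used are not stated here, so treat this as background only; the symbol u(x, t) is assumed to stand for a quantity such as the position of a flame front.

$$
\frac{\partial u}{\partial t} + \frac{\partial^2 u}{\partial x^2} + \frac{\partial^4 u}{\partial x^4} + u \frac{\partial u}{\partial x} = 0
$$

The second-derivative term feeds instabilities, the fourth-derivative term damps them at small scales, and the nonlinear term couples the two, which is what produces the chaotic behavior.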
Peter Christensen, an art history professor at the University of Rochester, has launched a research initiative to study “architecture biometrics,” which uses analytics techniques similar to those used by facial recognition algorithms to study buildings. Christensen’s research involves taking detailed 3D measurements of buildings and performing comparative analysis of these models. Christensen hopes to be able to use this approach to track how different influences shape architectural trends.
Software firm Pivotal has developed an AI system that can differentiate between regular text and text that is likely part of a password, which could help stop people from accidentally posting their passwords online. Traditional approaches to detecting publicly disclosed passwords rely on hard-coded rules that instruct software to flag any text that meets specific criteria, such as length or the use of certain characters. However, this is imprecise because passwords can vary greatly. Pivotal first converted strings of characters, including both passwords and regular text, into matrices, which describe each sequence as a series of numbers. Because passwords are often randomized, Pivotal’s system was able to differentiate between passwords and regular text by visualizing these matrices and comparing the level of randomness each one contained.
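The idea of scoring text by its randomness can be sketched in a few lines of Python. This is not Pivotal's system: as a simple stand-in, it measures Shannon entropy over the characters of a string, and the function names and the 3.0 bits-per-character threshold are assumptions chosen only for illustration.

```python
import math
from collections import Counter


def shannon_entropy(text):
    """Return the entropy of the character distribution, in bits per character."""
    if not text:
        return 0.0
    total = len(text)
    counts = Counter(text)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def looks_random(text, threshold=3.0):
    """Flag strings whose characters are spread out enough to look randomized.

    The threshold is an arbitrary illustrative value, not a tuned parameter.
    """
    return shannon_entropy(text) > threshold


for sample in ["hello world", "x9$Lq2!vRt7#"]:
    print(sample, round(shannon_entropy(sample), 2), looks_random(sample))
```

A fixed threshold like this misfires on long passages of ordinary text and on short, weak passwords, which is the kind of imprecision that motivates a learned model over hand-written rules.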
Image: Paul Arps.