This week’s list of data news highlights covers August 11-17, 2018, and includes articles about an AI system that can manage a data center’s cooling and an effort to combat hate speech on Facebook in Myanmar.
Researchers at Drexel University and George Washington University have developed machine learning techniques that recognize patterns in code to identify its original author, similar to how statistical analysis of a text’s syntax and vocabulary can identify who wrote it. Since code does not use syntax or vocabulary in the same way as written language, the researchers instead developed methods for recognizing abstract syntactic patterns in code. This approach, which is effective even for analyzing compiled code, could help resolve plagiarism disputes, such as when a student plagiarizes code for an assignment or when a developer reuses code in violation of a noncompete clause.
Google has announced that several of its data centers now rely exclusively on AI to manage their cooling. Google has been experimenting with AI to optimize data center cooling for several years, with software that helps managers increase energy savings by 40 percent. The new system, developed by DeepMind, analyzes data about data center performance and uses a machine learning technique called reinforcement learning to determine cooling configurations that reduce energy consumption without putting a data center at risk.
The U.S. Federal Reserve has been experimenting with analyzing real-time spending data to gain insights into economic activity, a process that would normally take weeks. The Fed deployed a new tool three days after Hurricanes Harvey and Irma hit Houston and Miami in 2017 to analyze card swipe data from payment technology company First Data Corp. This data showed that spending plummeted below normal levels or stopped almost completely, which was to be expected, but also that after two weeks, there was no above-normal spending rebound, which allowed the Fed to more accurately assess how the storms would detract from third-quarter economic growth. The Fed is now studying how to apply this approach to get better data on overall retail spending, employment, and prices.
Alphabet, Amazon, IBM, Microsoft, and Salesforce have pledged to build healthcare technology using the Fast Healthcare Interoperability Resources (FHIR) specification, a set of standards for sharing health data designed to encourage interoperability. Though FHIR is not new, a lack of interoperability has plagued the U.S. healthcare system for decades. Major market players have little incentive to adopt common standards that would make it easier for customers to integrate competitors’ products, and healthcare providers have little incentive to make it easier for patients to take their data with them and shop around for better care from other providers.
Researchers at the Cardiovascular Disease Initiative at the Broad Institute, a research organization run by Harvard University and the Massachusetts Institute of Technology, have developed a method for predicting whether a person is at high risk of developing a disease by analyzing their genome. Linking genetic factors to disease risk is not new, but traditionally this has involved identifying whether a person had a specific mutation linked with increased disease risk. The researchers’ method relies on analyzing millions of different genetic factors associated with diseases to establish a “polygenic score” for five common diseases, including atrial fibrillation, type 2 diabetes, and breast cancer. This approach can help doctors make more informed decisions about whether to provide preventative or proactive treatments to patients with high risk scores.
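As a rough illustration of the general idea, a polygenic score is commonly computed as a weighted sum: for each genetic variant, the number of risk alleles a person carries (0, 1, or 2) is multiplied by a weight reflecting that variant's association with the disease, and the products are added up. The sketch below assumes this common formulation and is not the researchers' actual method; the function name, variant IDs, and weights are hypothetical.

```python
from typing import Mapping


def polygenic_score(allele_counts: Mapping[str, int],
                    effect_weights: Mapping[str, float]) -> float:
    """Weighted sum of risk-allele counts over all scored variants."""
    score = 0.0
    for variant_id, weight in effect_weights.items():
        # Number of risk alleles carried at this variant: 0, 1, or 2.
        # Treating variants missing from the genotype as 0 is an assumption.
        count = allele_counts.get(variant_id, 0)
        score += weight * count
    return score


# Hypothetical variant IDs and weights, for illustration only.
example_weights = {"variant_a": 0.2, "variant_b": -0.1, "variant_c": 0.05}
example_genotype = {"variant_a": 2, "variant_b": 1}

example_score = polygenic_score(example_genotype, example_weights)
```

In practice, a score like this would be compared against the distribution of scores in a reference population to decide who counts as high risk; that step is omitted here.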
Facebook has announced that it is using AI to identify and remove content in Myanmar that violates its policies about hate speech and misinformation designed to fuel racial violence in the country. Human rights organizations have noted that groups have been using Facebook to share hate speech and misinformation to foment hatred and violence against Rohingya Muslims since 2013. Facebook, which relies primarily on users to flag posts that violate its policies on hate speech, has struggled to remove the content because few people report the posts. Facebook’s AI system is able to flag 52 percent of all the content it removes before humans report it.
Researchers at DeepMind, University College London, and Moorfields Eye Hospital have created AI software that can diagnose over 50 different eye diseases by using deep learning to analyze 3D scans of patients’ eyes. The researchers trained their software on 15,000 eye scans from 7,500 patients paired with diagnoses from human experts to teach it to associate abnormalities in the scans with different diseases. In a test, the software made the same diagnoses as a team of eight doctors 94 percent of the time.
The U.S. Department of Defense (DoD) has hired its first-ever chief data officer, Michael Conlin. Though several branches of the military have had chief data officers for years, DoD itself has not had one until now. Conlin joins as the agency works towards adopting commercial cloud services and consolidating its cloud infrastructure with a single provider.
The U.S. Food and Drug Administration (FDA) has approved a smartphone app called Natural Cycles as a method of contraception. Natural Cycles logs a woman’s body temperature and uses an algorithm to predict days when a user will be fertile and thus require protection or abstinence to prevent pregnancy. Natural Cycles requires a minimum of five temperature recordings per week and can accurately predict fertility after just one to three menstrual cycles.
The International Wheat Genome Sequencing Consortium (IWGSC), an international collaboration to study the genome of wheat, has finally sequenced the genome of bread wheat after 13 years. Sequencing crops’ genomes can provide valuable data that can guide efforts to improve food security and increase production. However, bread wheat, which is the most widely grown crop worldwide, has a genome five times larger and more complex than the human genome, which has caused some to dismiss the task of sequencing its genome as impossible. IWGSC’s sequencing data includes the exact location of wheat’s 107,891 genes and 4.7 million molecular markers, as well as information about gene expression.
Image: Sakae Ramaru.