10 Bits: the Data News Hotlist
This week’s list of data news highlights covers September 10-16, 2016 and includes articles about a new marketplace for machine learning algorithms and a startup using genomics to keep livestock healthy.
The U.S. Department of Health and Human Services (HHS) has issued new rules about how researchers and pharmaceutical companies must publish data about clinical trials that use human volunteers. For years, HHS has required clinical trial data involving human volunteers to be published in a timely manner on ClinicalTrials.gov, but many universities and companies have failed to do so. The new rules are designed to clarify the data publication requirements and require researchers to publish clinical trial data even for drugs that are never brought to market. Failure to comply with the new rules, which go into effect in January 2017, could result in fines of up to $10,000 per day or withholding of government funding for future trials.
Researchers at the Massachusetts Institute of Technology and the Georgia Institute of Technology have developed an imaging technique that enables a computer to “see” writing on individual pages of a book while the book is closed. The technique relies on shooting bursts of terahertz radiation—close to infrared light on the electromagnetic spectrum—at a book and using a special camera to analyze how long it takes for the radiation to bounce back to the camera. An algorithm then clarifies the resulting image. Because different substances absorb terahertz radiation at different rates, the system can differentiate between plain paper and paper with ink on it, with a high degree of accuracy up to nine pages deep.
The startup company Algorithmia has developed an online marketplace for machine learning algorithms to make it easier for companies to purchase machine learning software and integrate it into their products and services without having to develop the code themselves. Algorithmia allows developers to sell their algorithms on a fee-per-use basis, while also providing cloud computing power for companies to run programs using these algorithms. Algorithmia also hosts open-source machine learning algorithms, which are free for companies to use; for these, it charges only for the computing power customers use.
Researchers from Intel Labs and Darmstadt University in Germany have developed a method for converting the advanced 3D simulated environments from the video game Grand Theft Auto into training data for self-driving cars. Because the game’s environments are so complex, the researchers trained a machine-learning algorithm to classify all of the objects in an environment, such as pedestrians and other cars, and then extract this data into a format usable in training self-driving cars. By automating the process of extracting training data from a video game, researchers hope to reduce the large amounts of time and money autonomous vehicle researchers spend capturing data from real-world environments to train their systems.
Japan’s Ministry of Education, Culture, Sports, Science and Technology has announced it will establish data science education research centers at 10 national universities in 2017. The ministry will provide ¥1.2 billion (roughly $12 million at September 2016 exchange rates) in funding for the education centers, which will develop educational materials and curricula focused on data science, statistics, and mathematics. According to a 2015 survey, 40 percent of Japanese companies in the field of big data reported a shortage of workers trained in data science skills.
U.K.-based startup Luminance has developed an AI system capable of analyzing legal and financial documents, which could help automate the process of analyzing potential investments and acquisitions. Luminance’s system can analyze hundreds of pages of legal documents in under a minute, greatly reducing the time and cost of due diligence, which teams of lawyers and analysts otherwise must perform manually. In a trial of Luminance’s software, London-based law firm Slaughter and May was able to cut the time of the due diligence process by 50 percent.
Online education company Udacity has partnered with Mercedes-Benz, self-driving truck startup Otto, and Nvidia to launch an online education program that teaches software engineers the skills needed to develop self-driving cars. The course, which lasts 12 weeks, is designed to help address the shortage of workers with the computer science expertise necessary for autonomous vehicles, such as machine learning and computer vision, as the industry rapidly grows.
Startup Rex Animal Health has developed a clinical support system that helps veterinarians treat farm animals by predicting genetic factors that contribute to poor livestock health and disease susceptibility. The system can analyze livestock genomes, provide farmers with epidemiological forecasts of how diseases in the area would affect their herds, and help develop preventative measures. Additionally, the system allows veterinarians to convert paper notes into machine-readable health data for animals, track symptoms and treatment measures, and provide a cost-benefit analysis of different treatment options.
A wearable device called Shortcut relies on sensor technology, similar to that used in advanced prosthetic limbs, to analyze amputees’ muscle movements and translate different movements into actions on a computer. An amputee with a prosthetic hand would place Shortcut on his or her wrist, where it can sense muscle movements corresponding to specific gestures, each of which can be programmed to trigger a specific action. For example, Shortcut could translate the muscle movements that would create an “OK” hand sign into a left click on a computer mouse.
Jodie Archer, a former literature researcher at Apple, and Matthew Jockers, an English professor, have developed an algorithm capable of analyzing novels and predicting which ones will be bestsellers. Archer and Jockers trained their algorithm on the text of more than 20,000 novels to identify factors likely to make a book popular, and they found that language traits, such as straightforward prose and declarative verbs, and story traits, such as narrative cohesion and human closeness, were more common in bestselling books. In a test of their algorithm on the past 30 years of New York Times bestsellers, Archer and Jockers were able to predict which novels would be on the bestseller list based on the presence of these traits with 80 percent accuracy.
Image: U.S. Department of Agriculture.