10 Bits: the Data News Hotlist
This week’s list of data news highlights covers November 5-11, 2016 and includes articles about a wearable device that can help the blind read text and a system for tracking endangered dugong populations with drones and machine learning.
The Patricia Galvao Institute, a women’s rights organization based in Brazil, has launched an online tool called the Femicide Dossier that compiles data on gendered violence in Brazil to make the information more easily available to the public. The Femicide Dossier makes its data freely available under a Creative Commons license to encourage the public to analyze it. It also provides information about legal protections for women and steps people can take to help reduce the femicide rate in Brazil, which is the fifth highest in the world.
Researchers at the University of Maryland have built a prototype device called HandSight that uses a small camera worn on the hand to analyze text and convert it to audio for people with vision impairments. HandSight attaches to a user’s finger and connects to a computer that translates the text to speech and triggers audio cues or haptic feedback in the device to help users follow lines of text. In a test, blind users read an average of between 64 and 81 words per minute with HandSight, with a high degree of accuracy.
Researchers at the Swiss Federal Institute of Technology and Penn State University have developed a smartphone app called PlantVillage that uses an artificial neural network to analyze images of sick plants and identify their diseases. PlantVillage diagnoses plant diseases with 98 percent accuracy when given high-quality images, and its database of 150,000 images of diseased plants, the largest of its kind in the world, is free to download. PlantVillage will be publicly available in early 2017.
Researchers at the University of Oxford have developed deep learning software called LipNet that can read lips in near-real time and nearly twice as accurately as humans, albeit only for a limited vocabulary. The researchers trained LipNet on a dataset of thousands of annotated videos of people mouthing different words. In a test, LipNet read lips with 93 percent accuracy, while experienced human lip readers achieved only 52 percent accuracy.
Researchers at the Swiss Federal Institute of Technology used machine learning to reveal that astronomy research papers with a woman listed as the first author receive about 10 percent fewer citations than they would if they had been written by men. The researchers developed a machine learning algorithm that analyzed 200,000 papers published in five astronomy journals between 1950 and 2015 to calculate the number of citations a paper could be expected to receive based on non-gender-related factors, such as its publication date and how established its authors were. Comparing papers by the gender of the first author, the researchers found that papers led by women received 6 percent fewer citations than papers led by men, even though the non-gender-related factors predicted they should have received 4 percent more, a combined shortfall of about 10 percent.
Google’s DeepMind has trained an AI system to teach itself about the physical properties of objects by interacting with them. In one virtual environment, researchers tasked the system with repeatedly identifying the heaviest of five same-sized blocks, randomly assigning them different masses each time. In another environment, they tasked the system with determining whether groups of blocks were stuck together. In both tests, the researchers gave the system positive feedback for correct answers and negative feedback for wrong ones, and the system eventually learned to physically manipulate the blocks to determine their properties before making a guess.
Mental health researchers at Cincinnati Children’s Hospital have developed a machine learning model that can analyze a person’s spoken and written words to determine whether he or she is suicidal with 93 percent accuracy. The researchers trained the model on interviews built around open-ended mental health questions and on standardized behavioral surveys of 379 patients who had been diagnosed as suicidal, mentally ill but not suicidal, or neither, and identified certain language use and behaviors strongly correlated with suicidal tendencies. When classifying a person into one of these three categories, rather than just as suicidal or not, the model was 85 percent accurate.
British agricultural research organization Rothamsted Research has partnered with analytics firm Tessella to make data from agricultural research more easily available to scientists, in support of the United Nations’ Global Open Data for Agriculture and Nutrition (Godan) initiative to improve food security. Data from long-term agricultural experiments can be very difficult to analyze because of its size and diversity, as sensors continuously collect large amounts of many different kinds of data. However, combining these datasets and making them publicly available for secondary analysis can lead to valuable new insights. Rothamsted Research and Tessella will develop a public portal that makes this data easier to access and analyze.
A marine biology research team at Murdoch University in Australia is using drones and Google’s open source machine learning platform TensorFlow to identify dugongs, also known as sea cows, underwater in order to track their population and monitor changes in their habitat. Dugongs are endangered, but monitoring them across large bodies of water is challenging because they can be hard to spot from above the surface. The team uses drones to take aerial photographs of dugong habitats and then uses TensorFlow to quickly identify dugongs with 80 percent accuracy, even in complex images, such as those taken when the water is rough or when there is a large amount of underwater plant life.
IBM has partnered with researchers at Harvard University and the Massachusetts Institute of Technology to use its Watson cognitive computing platform to analyze genomic data from drug-resistant tumors and identify new methods of treating those cancers. Certain cancers can develop genetic mutations that make them resistant to known treatments, a major factor in the roughly 600,000 cancer deaths each year in the United States. Over the next five years, researchers will use Watson to analyze genetic sequence data and help predict drug sensitivity and resistance.
Image: Julian Willem.