10 Bits: The Data News Hotlist
This week’s list of data news highlights covers May 23-29, 2015 and includes articles about how data mining can catch unsavory government spending and a robot that can teach itself to overcome injuries.
IBM’s artificial intelligence platform Watson has processed videos of 2,000 TED Talks, the popular speaker series covering science, culture, and academic subjects, to map out connections between the topics they discuss. Users can ask Watson a question, and it will pull relevant clips from the talks that touch on the subject. IBM hopes Watson could eventually be applied to other media, such as a user’s Twitter feed, to connect users with TED Talks and other content relevant to their interests.
Pharmaceutical companies and pharmacies are turning to advanced supply chain monitoring technology to ensure they can meet consumer demand for allergy medication during allergy season. By combining inventory data from thousands of stores and retailers with weather data, pollen predictions, and other data, vendors can make more detailed predictions about demand spikes months in advance. This helps keep pharmacies from running out of allergy medication, since stockouts can result in a six to ten percent loss in retail sales.
A report from the U.S. Department of Defense inspector general used data mining to identify thousands of transactions over a one-year period, totaling $1 million, in which credit cards issued to Pentagon workers were used at casinos and for adult entertainment. Though these transactions are small compared with the $3.4 billion in legitimate credit-card spending by Pentagon employees, the inspector general recommends that the Pentagon adopt better monitoring and authorization standards that provide closer to real-time visibility into employee spending.
The U.S. City Open Data Census, a ranking of municipal open data efforts conducted by the open information nonprofit Open Knowledge, has listed Los Angeles as the top city for open data in the United States, with New York City and San Francisco in second and third place, respectively. The survey evaluated 98 cities on the types of government datasets they publish, such as campaign finance and government spending data, to help city leaders understand how to improve their open data policies.
Researchers at the University of Warwick in England have devised a method of estimating crowd sizes with mobile phone data and geotagged tweets. The researchers studied two months of Twitter data, along with mobile phone data provided by an Italian telecommunications company, to analyze activity surrounding nine soccer matches in Milan. Using that analysis, they predicted attendance at a 10th match with a margin of error of 13 percent. The researchers hope to improve their prediction model by analyzing data from other countries and different environments so they can more accurately understand how crowds form.
The National Institute of Standards and Technology (NIST) released a research and development roadmap for location-based services to guide the development of public safety technologies over the next 20 years. The roadmap outlines the need for standards for new and emerging technologies that can report location data, such as Internet-connected wearables and mobile safety apps, so that public safety officials can use this data in an emergency. NIST expects that these devices will eventually be able to report location data accurately from inside buildings, underground, and in rural areas, which could substantially improve public safety efforts.
A medical research team from the U.S. Army, Massachusetts General Hospital, and other health organizations is field testing an automated system for ambulances that uses pattern recognition to diagnose life-threatening bleeding before a patient arrives at the hospital. The system analyzes vital signs, such as blood pressure, heart rate, and breathing patterns, to determine whether a patient has life-threatening bleeding with up to 80 percent accuracy, substantially higher than the 50 percent accuracy of standard clinical practice. This early diagnosis allows hospitals to prepare for emergency surgery or blood transfusions before the patient arrives.
Researchers at Sorbonne University in Paris have developed a machine learning method that teaches robots to work around damage, such as a broken leg, through trial and error. The team equipped robots with sensors that monitor performance indicators, such as speed and direction of movement, and fed this data into algorithms based on a machine learning technique called Bayesian optimization. When a robot is damaged to the point that its regular way of walking is impaired, it tests new gaits by combining pre-programmed knowledge about walking patterns with random choices, identifying the most efficient way to continue its task.
Photo-sharing website Pinterest has developed visual search technology that uses deep learning, a type of artificial intelligence, to help connect users with the content they search for. The system combines machine vision that can recognize the contents of images with text metadata to identify images that meet certain search criteria while filtering out false positives, and then recommends content to users based on how closely it matches their interests.
New York City’s Board of Elections has, for the first time, published machine-readable data on the results of a recent special election for the 11th Congressional District. The Board of Elections, which is not under the jurisdiction of other city or state open data initiatives, hopes that publishing this data in easy-to-use formats will help journalists and researchers conduct analyses that could improve voter turnout or ensure that elections are accessible to people with disabilities.
Image: Flickr user Steve Jurvetson.