News articles contain a wealth of information about people and events, but because most of this data is unstructured, meaning that it cannot be easily queried in a database, it is difficult for others to make use of it. To change this, researchers at Georgetown University, Penn State University and the University of Texas at Dallas have created the Global Database of Events, Language and Tone (GDELT), an open dataset of events extracted from global news articles and automatically structured. The data includes the locations of events, which individuals were involved, and the tone of public sentiment toward each event. Or, as computer scientist Kalev Leetaru, co-creator of GDELT and a Georgetown University fellow, succinctly describes it, the database is “basically processing the world’s news each day and compiling a list of everything that has happened, who’s involved, and how the world feels about it.”
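To make the difference between unstructured and structured data concrete, here is a minimal sketch of what querying structured event records could look like. The table layout, column names and rows are invented for illustration and are far simpler than GDELT's actual schema; the point is only that once events are rows rather than prose, a question like "what is the average tone of coverage per location?" becomes a one-line query.

```python
import sqlite3

# Hypothetical, simplified event table. The columns and rows below are
# made up for illustration and do not reflect GDELT's real schema or data.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE events "
    "(day TEXT, actor1 TEXT, actor2 TEXT, location TEXT, tone REAL)"
)
rows = [
    ("2013-10-01", "Government", "Protesters", "City A", -4.5),
    ("2013-10-02", "Government", "Opposition", "City A", -1.5),
    ("2013-10-02", "Aid agency", "Government", "City B", 3.0),
]
conn.executemany("INSERT INTO events VALUES (?, ?, ?, ?, ?)", rows)


def average_tone_by_location(connection):
    """Return (location, event count, mean tone) tuples, sorted by location."""
    cursor = connection.execute(
        "SELECT location, COUNT(*), AVG(tone) "
        "FROM events GROUP BY location ORDER BY location"
    )
    return cursor.fetchall()


for location, count, tone in average_tone_by_location(conn):
    print(f"{location}: {count} events, average tone {tone:+.1f}")
```

Answering the same question from raw article text would require reading every article; here the database does the aggregation.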
GDELT launched this spring, and the database has already been used to predict human rights violations, track influencer networks in Iran and even map the public’s feelings about Obamacare. Its creators hope it will give researchers and governments around the world the ability to forecast political events and public opinion with greater accuracy than ever before. For example, GDELT can help users quantify the likelihood that an unpopular leader will be overthrown, or map out hot spots for future insurgent activities. The database draws on English-language news articles from all corners of the globe, spanning 1979 to the present. The project, which was funded by the National Science Foundation, Reed Elsevier, Google, the BBC and the University of Texas at Dallas, is updated with new data daily.
A recent expansion of the project, called the GDELT Global Knowledge Graph, launched in October 2013 and attempts to identify connections between people, organizations, locations, themes, news sources and events. Its developers hope it will allow researchers to explore latent patterns in global news events using network analysis tools and methodologies that would have been impossible using only raw, unstructured text.
The Global Knowledge Graph formed the basis of the aforementioned Iran analysis, allowing researchers to apply community detection algorithms common in social network analysis to map the relationships between the country’s prominent newsmakers. The results were not surprising: the United States is well aware that Supreme Leader Ali Khamenei keeps a relatively low profile, for example, and that some of Iran’s key political figures are largely insulated from interactions with foreign nations. Even so, the very possibility of automatically deriving insights about the country’s political structure opens up numerous new directions for study in international relations and defense.
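As a rough illustration of the kind of analysis described above, the sketch below builds a co-occurrence network from lists of names mentioned together in articles and then groups the names into communities. It is not the method used in the Iran study, whose algorithm is not specified here. It uses a simple, deterministic variant of label propagation, one of several common community detection techniques, and the names "A" through "Z" are placeholders rather than real newsmakers.

```python
from collections import Counter, defaultdict
from itertools import combinations


def cooccurrence_graph(articles):
    """Weighted undirected graph: weight = number of articles naming both."""
    graph = defaultdict(Counter)
    for names in articles:
        for a, b in combinations(sorted(set(names)), 2):
            graph[a][b] += 1
            graph[b][a] += 1
    return graph


def label_propagation(graph, max_rounds=20):
    """Group nodes by repeatedly adopting the heaviest label among neighbors."""
    labels = {node: node for node in graph}  # every node starts alone
    for _ in range(max_rounds):
        changed = False
        for node in sorted(graph):  # fixed order keeps the result deterministic
            votes = Counter()
            for neighbor, weight in graph[node].items():
                votes[labels[neighbor]] += weight
            if not votes:
                continue
            best = max(votes.values())
            top = sorted(label for label, w in votes.items() if w == best)
            # Keep the current label on ties; otherwise take the first alphabetically.
            winner = labels[node] if labels[node] in top else top[0]
            if winner != labels[node]:
                labels[node] = winner
                changed = True
        if not changed:
            break
    groups = defaultdict(set)
    for node, label in labels.items():
        groups[label].add(node)
    return sorted(sorted(group) for group in groups.values())


# Placeholder data: each inner list is the set of names found in one article.
articles = [
    ["A", "B", "C"], ["A", "B"], ["B", "C"], ["A", "C"],
    ["X", "Y", "Z"], ["X", "Y"], ["Y", "Z"], ["X", "Z"],
    ["C", "X"],  # a single article linking the two groups
]
communities = label_propagation(cooccurrence_graph(articles))
print(communities)  # [['A', 'B', 'C'], ['X', 'Y', 'Z']]
```

Even with one article connecting C and X, the two tightly linked groups stay separate, which is the kind of structure an analyst would look for among real newsmakers.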
The defense community is already interested: this summer, researchers from DARPA collaborated on a paper that uses GDELT to improve forecasting methodologies for international events such as political instability and insurgent activity. Co-creator Philip Schrodt, a professor of political science at Penn State University, has noted that the system’s only competitor is the U.S. Department of Defense’s Worldwide Integrated Crisis Early Warning System, which is available to only a small number of users, offers less granular event data than GDELT and lacks the newer system’s sophisticated text analysis component.
Interest in the database is growing rapidly, and the wealth of information that can be gleaned from previously unstructured news articles has great potential to help international relations experts, defense analysts, international development organizations and others better predict the future of human action.