
Published on August 14th, 2014 | by Travis Korte


Fans Create Database of Over 200,000 Jeopardy Questions

Reddit users have created a machine-readable data set of over 200,000 Jeopardy questions. The creators scraped the data from J!-Archive, a fan-created question repository; each record contains a question and its answer, along with the category, dollar value, air date, and other data. One analysis using the data set showed how diverse Jeopardy’s question categories are: the 100 most commonly used categories account for only 11 percent of all questions asked. The creator of that analysis noted that this extreme amount of variation “has given me a lot of sympathy for IBM’s Jeopardy!-playing robot Watson.”
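For readers who want to try a similar calculation, the category-share figure could be computed roughly as follows. This is only a minimal sketch, not the original analyst's code: the function name, the `category` field name, and the tiny sample records are assumptions made for illustration, and the real data set's layout may differ.

```python
from collections import Counter


def top_category_share(records, top_n=100, category_key="category"):
    """Return the fraction of questions that fall in the top_n most common categories.

    `records` is any iterable of dicts; `category_key` is an assumed field name.
    """
    counts = Counter(record[category_key] for record in records)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    # most_common(top_n) yields (category, count) pairs, largest counts first
    top_total = sum(count for _, count in counts.most_common(top_n))
    return top_total / total


# Hypothetical sample records, standing in for the real data set
sample = [
    {"category": "HISTORY"},
    {"category": "HISTORY"},
    {"category": "SCIENCE"},
    {"category": "POTPOURRI"},
]

share = top_category_share(sample, top_n=1)
print(f"Top category covers {share:.0%} of sample questions")
```

Run against the full data set with `top_n=100`, a function like this would give the kind of share the analysis describes.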

Get the data.

Photo: Queen’s University


About the Author

Travis Korte is a research analyst at the Center for Data Innovation specializing in data science applications and open data. He has a background in journalism, computer science and statistics. Prior to joining the Center for Data Innovation, he launched the Science vertical of The Huffington Post and served as its Associate Editor, covering a wide range of science and technology topics. He has worked on data science projects with HuffPost and other organizations. Before this, he graduated with highest honors from the University of California, Berkeley, having studied critical theory and completed coursework in computer science and economics. His research interests are in computational social science and using data to engage with complex social systems. You can follow him on Twitter @traviskorte.
