
Published on February 2nd, 2017 | by Joshua New


Learning to Understand What People Ask on the Internet

Crowdsourced question-and-answer website Quora has published a dataset of pairs of questions that users have asked on its website and that may be similar to each other, to spur the development of algorithms that can help identify redundant questions. For example, the questions “Should I learn Python or Java first?” and “If I had to choose between learning Java and Python, what should I choose to learn first?” are semantically equivalent: though they use different words and structures, they ask the same thing. Recognizing this is challenging to automate, so users frequently post questions on the website without realizing an answer is already available. The dataset contains 400,000 lines of these potentially similar question pairs and indicates whether each pair is semantically equivalent, so developers can build natural language processing systems that could help Quora redirect users to posts where their questions have already been answered.
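To give a rough sense of the task, the sketch below scores the two example questions by word overlap (Jaccard similarity). It is a naive illustrative baseline, not Quora's method or anything distributed with the dataset. The stopword list, the function names, and the 0.5 threshold are all assumptions chosen for this example.

```python
import re

# Tiny illustrative stopword list (an assumption, not a standard resource).
STOPWORDS = {
    "a", "an", "the", "i", "to", "or", "and", "if", "what",
    "should", "had", "between", "is", "of", "in",
}


def tokens(question):
    """Lowercase the question and return its set of non-stopword tokens."""
    words = re.findall(r"[a-z0-9]+", question.lower())
    return {w for w in words if w not in STOPWORDS}


def jaccard(q1, q2):
    """Share of distinct tokens that the two questions have in common."""
    a, b = tokens(q1), tokens(q2)
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def looks_duplicate(q1, q2, threshold=0.5):
    """Flag a pair as a likely duplicate; the threshold is a hypothetical choice."""
    return jaccard(q1, q2) >= threshold


q1 = "Should I learn Python or Java first?"
q2 = ("If I had to choose between learning Java and Python, "
      "what should I choose to learn first?")
print(jaccard(q1, q2), looks_duplicate(q1, q2))
```

Word overlap happens to work on this pair because both questions share the words "learn", "Python", "Java", and "first". It fails on paraphrases that share few words and on questions that share many words but ask different things, which is why labeled pairs like those in the dataset are useful for training stronger models.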

Get the data.


About the Author

Joshua New is a policy analyst at the Center for Data Innovation. He has a background in government affairs, policy, and communication. Prior to joining the Center for Data Innovation, Joshua graduated from American University with degrees in C.L.E.G. (Communication, Legal Institutions, Economics, and Government) and Public Communication. His research focuses on methods of promoting innovative and emerging technologies as a means of improving the economy and quality of life. Follow Joshua on Twitter @Josh_A_New.
