Published on February 16th, 2017 | by Joshua New


Studying How to Make Wikipedia Less Toxic

Researchers working on Wikimedia’s Wikipedia Detox project, which aims to reduce the impact of harassment and attacks on the Wikipedia editor community, have published a dataset of more than 100,000 comments from English-language Wikipedia pages, each annotated to indicate whether it includes a personal attack. The researchers collected the data to help develop methods that combine crowdsourced analysis and machine learning to automatically detect personal attacks on the site.

Get the data.

Image: Wikimedia

About the Author

Joshua New is a policy analyst at the Center for Data Innovation. He has a background in government affairs, policy, and communication. Prior to joining the Center for Data Innovation, Joshua graduated from American University with degrees in C.L.E.G. (Communication, Legal Institutions, Economics, and Government) and Public Communication. His research focuses on methods of promoting innovative and emerging technologies as a means of improving the economy and quality of life. Follow Joshua on Twitter @Josh_A_New.
