
Published on August 4th, 2016 | by Joshua New


Understanding How Non-Native English Speakers Write

Researchers at the Massachusetts Institute of Technology (MIT) have published the first database of annotated English sentences written by non-native English speakers, intended to help train natural language processing systems. English is the most-used language on the Internet, but most people who use English online are non-native speakers, and the grammatical errors and quirks common in their writing can make it difficult for language processing algorithms to analyze large amounts of text. The database consists of 5,124 English sentences written by native speakers of 10 different languages. Each sentence contains at least one grammatical error and is accompanied by annotations of the parts of speech used and the relationships between words, along with a corrected version for comparison.

Get the data.

Image: WokinghamLibraries

About the Author

Joshua New is a policy analyst at the Center for Data Innovation. He has a background in government affairs, policy, and communication. Prior to joining the Center for Data Innovation, Joshua graduated from American University with degrees in C.L.E.G. (Communication, Legal Institutions, Economics, and Government) and Public Communication. His research focuses on methods of promoting innovative and emerging technologies as a means of improving the economy and quality of life. Follow Joshua on Twitter @Josh_A_New.
