Amazon has released a dataset of nearly 400,000 names in English, Hebrew, Russian, Arabic, and Japanese, collected from Wikipedia articles, to help AI translate more accurately between alphabets. Differences between alphabets, such as distinct characters and pronunciations, can affect how well AI performs these translations. For example, Amazon found its AI handled English-to-Russian translations better than Arabic-to-English ones because the Latin alphabet is more similar to the Cyrillic alphabet than to the Arabic alphabet. The dataset could help personal assistants retrieve information across languages.
Creating Better Translations with AI
Michael McLaughlin is a research assistant at the Center for Data Innovation. He previously worked at Oracle and held internships at USA TODAY and in local government. Prior to joining the Center for Data Innovation, Michael graduated from Wake Forest University, where he majored in Communication with minors in Politics and International Affairs and in Journalism. He is currently pursuing his Master’s in Communication at Stanford University, specializing in Data Journalism.