Creating Better Translations with AI
Amazon has released a dataset of nearly 400,000 names in English, Hebrew, Russian, Arabic, and Japanese, collected from Wikipedia articles, to help AI convert names more accurately from one alphabet to another (a task known as transliteration). Differences between writing systems, such as distinct character sets and pronunciation conventions, affect how well AI can carry a name across scripts. For example, Amazon found that its AI performed better on English-to-Russian name conversion than on Arabic-to-English, because the Latin alphabet is more similar to the Cyrillic alphabet than to the Arabic alphabet. This data could help personal assistants retrieve information across languages.
Michael McLaughlin is a research analyst at the Center for Data Innovation. He researches and writes about a variety of issues related to information technology and Internet policy, including digital platforms, e-government, and artificial intelligence. Michael graduated from Wake Forest University, where he majored in Communication with Minors in Politics and International Affairs and Journalism. He received his Master’s in Communication at Stanford University, specializing in Data Journalism.