The Center for Data Innovation spoke to Mait Müntel, co-founder and chief executive officer of Lingvist, an Estonian startup that uses artificial intelligence to develop language learning software tailored to individual learners. Müntel discussed how AI can adapt to how well a learner absorbs a new language, and how accurately it can predict what they already know.
Nick Wallace: I read that you used to work at the European Organization for Nuclear Research (CERN) in Geneva, on the Higgs boson project. How did you go from that to designing software to help people learn foreign languages?
Mait Müntel: For me it was kind of like a hobby that went out of control. I didn’t intend to start a language teaching company, or to become an entrepreneur at all. I was just playing around with ideas for my own purposes, and then somehow it turned out some other people wanted to use this prototype I used for myself. And I thought “right, I’m not such a good software developer, so maybe I can pay somebody to code it properly”—I gave the project my finger, and it took my hand! So I had some people working for me, I had to raise money to pay their salaries, and then I needed to hire more people to raise more funds, and the spiral started to work.
I built the app in the first place just because I was curious. Language learning was pretty hard for me at school. It was the reason I started to study physics: physics was easy, language learning was not. I struggled a lot learning English, and I had a very hard time learning Russian—Estonia was still part of the Soviet Union when I was a kid, and everybody had to learn Russian. It was like a big social experiment where everybody had to learn it for 12 years, but they didn’t learn anything.
Then later I moved to CERN in Geneva, which as you know is in a French-speaking area, and I thought that it would be very nice to learn a new language. But it takes so much time that I thought I couldn’t manage it, because I’d had such a difficult time learning Russian and English. And after the Higgs boson project, I was just thinking, “what, theoretically, would be the shortest possible time needed to learn a language, if we could optimize everything?”
I played around with some very hypothetical estimations, and ended up with a pretty small theoretical number of 100 to 200 hours. It seemed so incredible to me, but I trusted my estimates, and then I thought, “alright, let’s test it, let’s write a software program to try it out” and just out of theoretical curiosity I started to write the software.
We were using different algorithms and machine learning tools in particle physics at CERN, so those were the tools that I knew. So it was easy for me to use those tools to build a piece of personalized software for myself. But before I started writing the software I needed to do quite a bit of research into language learning.
I tried the other software tools, just to understand what’s out there, and I was disappointed that they did not take into account important things like language statistics, or differences in memory. If you compare memory to physical differences: the world’s fastest runner can run 100 meters in ten seconds, whereas most of us can easily run 100 meters in 20 seconds, so it’s not that big a difference. But with mental capabilities, the differences aren’t twofold, they’re maybe tenfold or even a hundredfold. It would be nice to measure this when learning, because you have so many data points. The other thing was the need to optimize the process: deciding what to do, and when. Language statistics and memory management make up the artificial intelligence core, based on the student’s personal data.
Wallace: Many people struggle to learn even the basics of any foreign language—but then there are polyglots like Joseph Conrad and Arthur Koestler, who managed to write great novels in languages not their own. What makes some people struggle more than others, and how can artificial intelligence help?
Müntel: I think the biggest obstacle is motivation. If you are less interested in language learning, and maybe even a bit afraid of it because you have not had a success experience, then you’re more inclined to give up. The success experience is important, because it ramps up motivation. If you take classroom lessons, there is one teacher and everybody has to have the same program and the same content. Those who are good at learning languages may not learn as efficiently as they could, and those who aren’t get left behind and struggle to catch up. And if everyone else is better, it’s very demotivating to be the person who does not know. Those people drop out of learning.
Artificial intelligence can take into account these personal differences very efficiently. If there is software that millions of people are learning, the software can adapt to your personality with the help of other people, because probably there are some other people who are similar to you. As they have used the software before, they have trained the software for your particular kind of personality, and when you learn enough, then the software can start learning your personal strengths and weaknesses and using them more efficiently to help you.
Currently, we don’t measure much besides memory. But we are starting to measure other things because we have a vocabulary learning engine and a grammar learning engine, and now we are incorporating speaking, reading, and listening, with all those skills interfaced in the same AI core. So pretty soon we will be able to measure how you learn: whether you learn faster when you speak, or when you type. The software has to be able to make this distinction, because while you might think you learn faster by typing, you could be wrong.
And that’s why this software is amazing: you can collect so much data about personal learning behavior. Every interaction is also a measurement point, so during your whole learning cycle we will have millions of data points about your learning behavior. We can use that data not only to teach you languages, but to teach you other things as well, because your memory patterns are the same for different subjects.
Wallace: There are some really fundamental arguments over the right method to teach a language, such as whether the teacher should ever use the learner’s native language, or how to balance vocabulary and grammar in the early stages. Some of these differences seem irreconcilable—how does Lingvist approach such controversies?
Müntel: Definitely, there are a lot of language-learning theories, maybe 60 to 80, that contradict each other. We think different methodologies suit different people, and one piece of software cannot serve everybody. We are not building this software for polyglots, or for those people who can naturally pick up language very efficiently.
But we test everything that we build so that we will have actual data. You can have theoretical discussions about what makes sense, but if you can test and measure an approach with real people, then you have an answer, and that answer may be that for you, one methodology is better, but for somebody else, another methodology may be better.
We hope to use these differences in our product. Currently we try to serve the biggest sector of users, and some users don’t find our software is the best for them because they are a different kind of learner. But we try to test everything on real students and serve different functionalities to different students, so everybody will have a different learning experience.
Wallace: Depending on what your native language is, some languages are harder than others. Czechs and Slovaks already understand each other pretty well, Romanians can usually pick up Italian or French without too much difficulty, and German is not much of a leap if you already know Dutch—but virtually everyone struggles with Hungarian. How can AI navigate these contrasts?
Müntel: AI can navigate this very efficiently. For a French person, it’s easy to learn Italian and vice versa because they share quite a lot of similar vocabulary, and the grammar is similar. If you take Hungarian, that’s a different branch of language, and there’s almost no common vocabulary at all. These difficulties emerge from the data very quickly, so they can be taken into account.
We can see that learning speed is different between different language pairs. AI is capable of handling it pretty efficiently. It can pretty efficiently predict, based on what you know already, what else you are likely to know. So the learning experiences of different pairs of languages would be different.
One thing that we didn’t expect to see was that for Estonians, it’s quite easy to learn Japanese. They’re from the other side of the world, but the pronunciation is somewhat similar. For them it’s easier to pronounce Estonian words, and for us it’s easier to pronounce Japanese words. The words mean quite different things, but they’re easy to remember because there is a connection point in the phonetics.
Wallace: Human language still poses a huge challenge for AI. Machines can encode and compute a lot of vocabulary and grammar rules, but they do not actually understand it. What do you think is the next frontier for AI and language, and how do you think it might impact tools like yours?
Müntel: Artificial intelligence has made voice recognition and automatic translation much, much better. If you listen to some radio recordings, and then try to write them down, the automatic solutions are pretty close to the human ability, if not better. It’s very good to add those things into language learning software. But what is a little bit further away is having an actual conversation with the computer, because this means it has to have some level of understanding. You need conversation to learn a language properly, but currently it’s easier to do that by making people work in pairs, like Busuu does.
Wallace: What can the data generated by customers using Lingvist tell you about how people learn languages? Have you found anything that might be valuable for pedagogical research?
Müntel: We have found some pretty interesting things. We do not have time to publish scientific articles yet, but maybe in a few years when we’ve collected enough data, we will. Right now we are planning to give some data to universities, to do some research.
For example, we can predict pretty well what a person knows. If your answers indicate you know some 50 words, then we are able to predict thousands of other words you might know. We can do it quite accurately, because we have a lot of data about other users, so we can draw conclusions from other similar users.
The accuracy of that kind of prediction is about 90 to 95 percent. So if you start using language learning software, this method allows it to map your knowledge quite quickly and start teaching you things that actually matter to you. You don’t have to start at the beginning. It doesn’t have to be language: that’s what we’re working with right now, but in principle it could work with anything.
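Editor's illustration: the interview does not describe how Lingvist makes these predictions, so the following is only a minimal sketch of the general idea of inferring what a learner knows from similar past users. It assumes a simple nearest-neighbour scheme; the function names, the toy French words, and the choice of similarity measure are all hypothetical and are not taken from Lingvist.

```python
def similarity(a, b):
    """Fraction of commonly tested words on which two users gave the same result.

    Each user is a dict mapping word -> 1 (known) or 0 (not known).
    This agreement measure is an assumption chosen for simplicity.
    """
    shared = [w for w in a if w in b]
    if not shared:
        return 0.0
    return sum(a[w] == b[w] for w in shared) / len(shared)


def predict_known(new_user, past_users, k=2):
    """Estimate the probability that new_user knows each untested word.

    Takes the k most similar past users and averages their results for
    each word, weighting every neighbour by its similarity score.
    """
    neighbours = sorted(
        past_users, key=lambda u: similarity(new_user, u), reverse=True
    )[:k]
    weights = [similarity(new_user, u) for u in neighbours]

    # Words the neighbours were tested on but the new user was not.
    candidates = {w for u in neighbours for w in u} - set(new_user)

    predictions = {}
    for word in candidates:
        num = sum(wt * u[word] for wt, u in zip(weights, neighbours) if word in u)
        den = sum(wt for wt, u in zip(weights, neighbours) if word in u)
        if den > 0:
            predictions[word] = num / den
    return predictions


# Hypothetical toy data: three past learners and one new learner.
past_users = [
    {"maison": 1, "chat": 1, "néanmoins": 1, "quotidien": 1},
    {"maison": 1, "chat": 1, "néanmoins": 0, "quotidien": 0},
    {"maison": 0, "chat": 0, "néanmoins": 0, "quotidien": 0},
]
new_user = {"maison": 1, "chat": 1, "néanmoins": 1}

predictions = predict_known(new_user, past_users, k=2)
```

In this toy run the new learner most resembles the first past user, so the untested word "quotidien" gets a score above one half. A production system would need far more data and a validated model to approach the accuracy figures quoted in the interview.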