Learning Method

Pop2Talk lets kids learn at their own pace and engages them via a playful learning environment in the form of a moonstone popping game.

Players pop moonstones. On each pop, the player hears a sample pronunciation of the target word. After a number of pops, the player is shown the meaning of the word in a picture and asked to repeat the word.

The player's pronunciation is evaluated computationally and stars are handed out as a reward. These stars measure the level of proficiency in pronunciation. Now and then the player needs to pronounce a word without hearing the sample. This reinforces learning and also measures how much vocabulary has been learned.

The Pop2Talk method adjusts the ratio of hearing to pronouncing words and uses feedback and game-like elements to keep the child interested.

Pronunciation is the first step...

...or to be more precise, it’s the second. First you need to hear an example pronunciation. Hear and say, hear and say. Repeat enough times and eventually you’ll get the sounds right. Then you’ll be able to say the words you know in a way that others understand, and on that basis you can start saying sentences.

Learning new sounds

Pronouncing is not only a linguistic skill. It is also a motor skill, a bit like gymnastics or juggling. It takes numerous attempts to master the control of your vocal tract for a new sound. When you first hear a sound that is not in any of the languages you know, your brain does not even notice that it is new. Listening and repeating form a loop that allows you to gradually learn the new sound.

It takes time and effort before representations of the sound are formed in your brain. These changes in the sensorimotor cortex are very small, but they indicate when the language learner is able to distinguish between wrong and right pronunciations, whether it's their own or someone else's.

Pop2Talk came to be during years of research at the University of Helsinki and Aalto University. Pop2Talk is based on a combination of brain research, educational science and speech technology. From brain measurements of test subjects playing our language learning games, we can tell how much repetition is generally needed to learn new sounds.

Probability of recall versus number of spoken repeats of a word in the Pop2Talk game study. There is a dramatic increase in players' ability to remember spoken word forms after a certain number of repeats. Numbers will be published by Ylinen et al. (manuscript in preparation).

Playing keeps the child invested

Listening to a sample over and over again and pronouncing the same word countless times can feel like a grind. You’ll need to hear a word hundreds of times and repeat it a few dozen times to form an enduring representation of it in your brain.

Many young children have a hard time concentrating on such an amount of work. A gaming environment can break the monotony of repetition by providing an exciting and rewarding side task.

When the difficulty of the game is tuned through careful selection of words and a rewarding learning curve, the player can experience a satisfying sense of proficiency as well as a delightful feeling of being challenged.

The difficulty of the game is tuned with the well-known Leitner system, an effective model for enhancing learning. The system was developed by German journalist Sebastian Leitner in the 1970s.

The Leitner system is based on simple spaced repetition. The system allows the game to automatically increase the number of repetitions of words that a child still has difficulty with. Words that have already been mastered or learned during the game are presented less frequently, and words that cause difficulties are carried along among the new words until they are mastered well enough.


The Leitner system adapts to both intensive and slow learning, allowing each learner an individual learning path.
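
The scheduling idea described above can be sketched in a few lines of Python. This is a minimal illustration of a generic Leitner scheduler, not Pop2Talk's actual implementation: the class name LeitnerDeck, the three boxes and the review intervals are all assumptions chosen for the example. In this variant, a word that is pronounced well moves up one box and is shown less often, and a word that causes difficulty drops back to the first box.

```python
from dataclasses import dataclass, field

# Hypothetical review intervals (in game rounds) for boxes 0, 1 and 2.
# A word in box b is due whenever the round number is a multiple of INTERVALS[b].
INTERVALS = [1, 2, 4]


@dataclass
class LeitnerDeck:
    """Generic Leitner scheduler (illustrative, not the Pop2Talk code)."""

    boxes: dict = field(default_factory=dict)  # word -> box index

    def add(self, word):
        """New words start in box 0, the most frequently reviewed box."""
        self.boxes.setdefault(word, 0)

    def due(self, round_number):
        """Return the words scheduled for practice in this round."""
        return [
            word
            for word, box in self.boxes.items()
            if round_number % INTERVALS[box] == 0
        ]

    def record(self, word, correct):
        """Promote a word after a good attempt, reset it after a poor one."""
        if correct:
            self.boxes[word] = min(self.boxes[word] + 1, len(INTERVALS) - 1)
        else:
            self.boxes[word] = 0
```

With this sketch, a learner who gets most words right quickly sees each word less often, while a slower learner keeps meeting the difficult words every round, which is how one scheduler can serve both learning speeds.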

How can AI help in learning pronunciation?

The pronunciation scoring in the Pop2Talk game is provided by our proprietary in-house speech analysis system, which is driven by modern machine learning algorithms.

Children's voices show greater variety in speaking style and high variability in pitch and vocal tract size, so they are a challenge for speech recognition even when children speak their native language. By changing our approach to speech recognition, we have made our technology work well for children learning new languages.

From speech recognition to pronunciation recognition

Standard speech recognisers try to infer what the speaker tried to say, no matter how it was said. We redefined the problem: we know what the player wants to say, so we can concentrate our efforts on detecting pronunciation mistakes. Our scoring is based on perceived mistakes. Statistics of mistakes can be compiled into a report for a teacher.
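
To make the idea of a known target concrete, here is a small Python sketch. It is not the Pop2Talk scoring system, which is proprietary and works on audio. As a stand-in, the sketch assumes that both the target word and the player's attempt are already available as lists of phoneme symbols, and it uses a plain sequence alignment from Python's standard difflib module to locate the differences. The function names find_mistakes and mistake_report and the phoneme labels are hypothetical.

```python
from collections import Counter
from difflib import SequenceMatcher


def find_mistakes(target, heard):
    """Return (expected, produced) phoneme pairs where the attempt differs.

    Both arguments are lists of phoneme symbols (an assumption of this sketch).
    """
    matcher = SequenceMatcher(None, target, heard, autojunk=False)
    mistakes = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != "equal":  # a substitution, omission or extra sound
            mistakes.append((tuple(target[i1:i2]), tuple(heard[j1:j2])))
    return mistakes


def mistake_report(attempts):
    """Count how often each mistake occurs over many (target, heard) attempts."""
    counts = Counter()
    for target, heard in attempts:
        counts.update(find_mistakes(target, heard))
    return counts
```

Because the target is known in advance, the comparison only has to explain where an attempt departs from it. Counts like those from mistake_report are one simple form that per-sound statistics for a teacher could take.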

