Learning Method

Pop2Talk lets kids learn at their own pace and keeps them engaged through a playful learning environment: a moonstone-popping game.

Players pop moonstones. With each pop, they hear a sample pronunciation of the target word. After a number of pops, the player is shown a picture illustrating the word's meaning and is asked to repeat the word.

The player's pronunciation is evaluated computationally, and the player earns stars as a reward. The stars reflect the player's level of proficiency in pronunciation. Now and then, the player must pronounce a word without first hearing the sample. This reinforces learning and also measures how well the vocabulary has been learned.

The Pop2Talk method adjusts the ratio of listening to speaking and uses feedback and game-like elements to keep the child interested.
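To make the listen/speak loop concrete, here is a minimal Python sketch of one round for a single target word. It is an illustration under stated assumptions, not Pop2Talk's implementation: the step kinds, `build_round`, `speak_share`, and the parameters `pops_before_prompt` (number of sample pronunciations before the picture prompt) and `recall_probability` (chance of a prompt with no sample) are all hypothetical, and the default values are placeholders rather than the game's actual settings.

```python
import random
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Step:
    # Hypothetical step kinds:
    #   "listen" - a moonstone pop plays the sample pronunciation
    #   "repeat" - picture shown, player repeats the word they just heard
    #   "recall" - picture shown, player says the word with no sample first
    kind: str
    word: str


def build_round(word: str,
                pops_before_prompt: int = 5,
                recall_probability: float = 0.2,
                rng: Optional[random.Random] = None) -> List[Step]:
    """Sketch of one round for one target word (hypothetical parameters)."""
    rng = rng if rng is not None else random.Random()
    if rng.random() < recall_probability:
        # Occasional recall prompt: no sample is played, which tests
        # whether the spoken word form has been retained.
        return [Step("recall", word)]
    # Normal round: several listening pops, then a repeat prompt.
    steps = [Step("listen", word) for _ in range(pops_before_prompt)]
    steps.append(Step("repeat", word))
    return steps


def speak_share(steps: List[Step]) -> float:
    """Fraction of steps in which the player speaks rather than listens."""
    spoken = sum(1 for s in steps if s.kind in ("repeat", "recall"))
    return spoken / len(steps)
```

Raising `pops_before_prompt` shifts the balance toward listening, while raising `recall_probability` shifts it toward unprompted speaking; `speak_share` measures that balance for a given round.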

Pronunciation is the first step...

...or, to be more precise, it’s the second. First you need to hear an example pronunciation. Listen and repeat, listen and repeat. Repeat enough times and eventually you’ll get the sounds right. Then you’ll be able to say the words you know in a way others can understand, and on that basis you can start speaking in full sentences.

Learning new sounds

Pronunciation is not only a linguistic attainment but also a motor skill, rather like gymnastics or juggling. It takes numerous attempts to master control of your vocal tract for a new sound. When you first hear a sound that does not occur in any of the languages you know, your brain does not even notice that it is new. Listening and repeating form a loop that allows you to gradually learn the new sound.

It takes time and effort before representations of the sound form in your brain. These changes in the sensorimotor cortex are very small, but brain measurements can reveal when the language learner is able to distinguish between correct and incorrect pronunciations, whether their own or another person's.

Pop2Talk grew out of years of research at the University of Helsinki and Aalto University and combines brain research, educational science, and speech technology. From brain measurements of test subjects playing our language learning games, we can estimate how much repetition is generally needed to learn new sounds.

Probability of recall versus the number of spoken repetitions of a word in a Pop2Talk game study. Players' ability to remember spoken word forms increases dramatically after a certain number of repetitions. The numbers will be published by Ylinen et al. (manuscript in preparation).

Playing keeps the child invested

Listening to a sample over and over again and pronouncing the same word countless times can feel like a grind. You’ll need to hear a word hundreds of times and repeat it a few dozen times to form a lasting representation in your memory.

Many young children have a hard time concentrating on such an amount of work. A gaming environment can break the monotony of repetition by providing an exciting and rewarding side task.

By tuning the game's difficulty through careful word selection and a rewarding learning curve, the game lets the child experience both a satisfying sense of proficiency and the delightful feeling of being challenged.

The game's difficulty is tuned using the well-known Leitner system, an effective model for enhancing learning.

The Leitner system is a simple form of spaced repetition. It lets the game automatically present the words a child still has difficulty with more often. Words that have already been mastered, or were learned during the game, are presented less frequently, while words that cause difficulties are carried along with the new words until they are mastered well enough.

The Leitner system adapts to both intensive and slow learning, allowing each learner an individual learning path.
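The Leitner mechanism can be sketched in a few lines of Python. The source does not say how many boxes Pop2Talk uses or how often each box is reviewed, so this sketch assumes the common textbook variant: five boxes, a correctly pronounced word moves up one box, a failed word drops back to box 1, and box k is reviewed every 2^(k-1) sessions. `LeitnerDeck`, `NUM_BOXES`, and the method names are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List

NUM_BOXES = 5  # assumption: the source does not state the number of boxes


@dataclass
class LeitnerDeck:
    # Maps each word to its current box, 1 (hardest) .. NUM_BOXES (mastered).
    boxes: Dict[str, int] = field(default_factory=dict)

    def add(self, word: str) -> None:
        # New words enter box 1, which is reviewed every session.
        self.boxes.setdefault(word, 1)

    def due(self, session: int) -> List[str]:
        # Box k is reviewed every 2**(k - 1) sessions, so words in higher
        # boxes (already mastered) come up less and less often.
        return [w for w, box in self.boxes.items()
                if session % (2 ** (box - 1)) == 0]

    def record(self, word: str, correct: bool) -> None:
        if correct:
            # Success: promote one box, capped at the last box.
            self.boxes[word] = min(self.boxes[word] + 1, NUM_BOXES)
        else:
            # Failure: back to box 1, so the word keeps appearing
            # alongside new words until it is mastered.
            self.boxes[word] = 1
```

For example, after `add("cat")`, `add("dog")`, and `record("cat", True)`, `due(1)` returns only `["dog"]`, because "cat" now sits in box 2 and is reviewed only in even-numbered sessions. This per-word bookkeeping is what lets each learner follow an individual path through the same word list.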

How can AI help in learning pronunciation?

The pronunciation scoring in the Pop2Talk game is provided by our proprietary in-house speech analysis system, which is driven by modern machine learning algorithms.

Children's voices vary more in speaking style than adults' voices, and they also vary widely in pitch and vocal tract size. This poses a challenge for speech recognition even when children speak their native language. We understand this and have adapted our approach to speech recognition accordingly, so that our technology reflects the specific needs of children learning new languages.

From speech recognition to pronunciation recognition

Standard speech recognisers try to infer what the speaker meant to say, no matter how it was said. We redefined the problem: we know what the player intends to say, so we can concentrate our efforts on detecting pronunciation mistakes. Our scoring is based on the perceived mistakes, and statistics of these mistakes can be compiled into a report for the teacher.
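Because the target word is known in advance, scoring can work directly on a list of detected mistakes. The sketch below is a hypothetical illustration of mistake-based scoring and reporting, not the Pop2Talk scorer: it assumes the speech analysis returns each mistake as an (expected sound, perceived sound) pair, maps fewer mistakes to more stars on an assumed three-star scale, and tallies mistakes across attempts for a teacher report. `Attempt`, `Mistake`, `stars_for`, `teacher_report`, and `MAX_STARS` are invented names.

```python
from collections import Counter
from dataclasses import dataclass
from typing import List, Tuple

MAX_STARS = 3  # assumption: the star scale is not specified in the source

# Hypothetical format: a mistake is an (expected_sound, perceived_sound) pair.
Mistake = Tuple[str, str]


@dataclass
class Attempt:
    word: str
    mistakes: List[Mistake]


def stars_for(attempt: Attempt) -> int:
    # Fewer perceived mistakes earn more stars, never fewer than zero.
    return max(0, MAX_STARS - len(attempt.mistakes))


def teacher_report(attempts: List[Attempt]) -> List[Tuple[Mistake, int]]:
    # Count each kind of mistake across all attempts, most frequent first.
    counts = Counter(m for a in attempts for m in a.mistakes)
    return counts.most_common()
```

With attempts `Attempt("thin", [("th", "f")])` and `Attempt("think", [("th", "f"), ("ng", "n")])`, the player earns 2 and 1 stars, and the report lists the th-to-f substitution first with a count of 2, which is the kind of pattern a teacher could act on.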