How English-captioned videos stimulate learning

Learning new vocabulary from videos is improved with full captions (glossed, if possible) combined with pre-viewing activities, as demonstrated in a study by Mark Feng Teng at Hong Kong Baptist University.

Learning new vocabulary is demanding and relentless for language learners, and strategies to make it more interesting and efficient are welcome. Watching videos is a popular method for all ages, but there is some debate over how best to support this and whether reading text at the same time is helpful or just too much extra work, producing cognitive overload that interferes with the learning process.

Teng set up a rigorously designed study to test the use of captioned videos, as well as the use of pre-viewing activities. Two hundred and forty 11- to 12-year-old Chinese EFL students were recruited from six Chinese-medium primary schools. All the students taking part were of a similar intermediate standard.

The 240 students were split into two groups. One was given pre-viewing tasks that involved matching pictures from the videos with captions; the other was given no such tasks. Each of these two groups was then split into a further four groups, each of which viewed videos with a different kind of captioning: glossed full captions, glossed keyword captions, full captions or keyword captions. So, there were eight groups in total.
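The two-by-four design described here can be laid out as a short sketch. The variable names are illustrative only, and the equal split of students across groups is an assumption: the article gives the total of 240 but not the size of each group.

```python
from itertools import product

# Factor 1: whether the group received pre-viewing tasks.
PRE_VIEWING = ["with pre-viewing tasks", "without pre-viewing tasks"]

# Factor 2: the four caption styles named in the article.
CAPTIONS = [
    "glossed full captions",
    "glossed keyword captions",
    "full captions",
    "keyword captions",
]

# Crossing the two factors gives every experimental condition.
conditions = list(product(PRE_VIEWING, CAPTIONS))

TOTAL_STUDENTS = 240

# ASSUMPTION: students were divided equally between conditions.
# The article does not report the actual group sizes.
students_per_group = TOTAL_STUDENTS // len(conditions)

for pre_viewing, caption in conditions:
    print(f"{caption}, {pre_viewing}: about {students_per_group} students")
```

Under that equal-split assumption, each of the eight groups would contain about 30 students, which fits the later description of the groups as being of fairly modest size.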

Each group watched four videos of about 4 minutes. Each video showed an animated story and had an English-speaking narrator. Full captions were verbatim transcripts of the narrator speaking, while keyword captions showed only one to four difficult words/phrases, eg, ‘extravagant’ and ‘abide by the rules’.

The glossed captions had links under some difficult words so that when the viewer clicked on the link a pop-up box showed the meaning in Chinese. The video paused while the pop-up was open and continued again when the viewer clicked again.

There were 20 target words in the videos: nouns (eg, twilight), verbs (eg, nibble) and adjectives (eg, extravagant). The students were pre-tested on a larger bank of target words, and the final set was selected as being unknown to all the students. Each target word appeared only once in the video.

After viewing, the students were tested on their comprehension of the videos – but none of the students were aware that target vocabulary was the focus of this testing.

It requires a sturdy effect to produce significant differences between so many groups of fairly modest size, but Teng did find significant differences between all groups (properly using a MANOVA with Bonferroni corrections for multiple comparisons).
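To see why the effect needs to be sturdy, it helps to work through how a Bonferroni correction tightens the significance threshold. The sketch below assumes that all eight groups were compared pairwise and that the starting significance level was 0.05; the article states neither, so both figures are illustrative rather than taken from the study.

```python
from math import comb


def bonferroni_threshold(n_groups: int, alpha: float = 0.05) -> tuple[int, float]:
    """Return the number of pairwise comparisons and the corrected threshold.

    ASSUMPTION: every pair of groups is compared once, and alpha is the
    family-wise significance level before correction.
    """
    n_comparisons = comb(n_groups, 2)  # number of distinct pairs of groups
    return n_comparisons, alpha / n_comparisons


n_comparisons, corrected_alpha = bonferroni_threshold(8)
print(f"{n_comparisons} comparisons, corrected threshold {corrected_alpha:.4f}")
```

With eight groups there are 28 possible pairs, so under these assumptions each individual comparison would have to reach a threshold of roughly 0.0018 rather than 0.05.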

Overall, the choice of captions accounted for 63% of the variance in post-viewing test scores, while the use of pre-viewing activities accounted for 37%. Clearly the use of both glossed full captions and pre-viewing tasks produced the best scores, although the choice of caption style had the strongest influence on scores.

The pre-viewing activities took up to 50 minutes to complete, however, making them a much less efficient use of time than the four 4-minute videos. In practice, with limited classroom time to devote to vocabulary activities, these activities would most likely need to be added as extra homework, and there may well be other, higher-priority tasks. Further research would be useful to find out the minimal, most efficient way to prepare these activities, eg, is it worth preparing just 10 minutes of pre-viewing activities or does the effect become so small it isn’t a good use of either the teacher’s or the students’ time?

Of the captioned videos, the order of effectiveness was glossed full, glossed keywords, full, then just keywords – and scores were significantly different between all groups (with or without pre-viewing tasks). It’s clear from these results that reading full captions in the target language while watching the video and also listening to the narration actually improves learning rather than being a burden.

This makes sense given current models of working memory, which have separate ‘loops’ for visual and auditory information, so that both can run concurrently without interference or cognitive overload. On the contrary, having both streams running simultaneously leads to more efficient storage in memory (as per dual coding theory).

Clearly students were making use of the glossed term links and these significantly improved vocabulary learning. Glossing is attention-enhancing in itself – and anything that improves learners’ attention is a winning strategy.

The most straightforward application of these findings is to favour videos fully captioned in the target language. Suitable videos that are not already in English could be used if they have a target language narration inserted: for this study, new audio files of English narration were prepared using Wondershare Filmora and glossing was added using MAGpie2.5. Investment in a bank of such videos has potential as a relatively time-efficient and entertaining way to learn vocabulary.


  • Teng, M F (2022), ‘Vocabulary learning through videos: captions, advance-organizer strategy, and their combination,’ Computer Assisted Language Learning, 35:3, 518-550, DOI: 10.1080/09588221.2020.1720253
Gill Ragsdale
Gill has a PhD in Evolutionary Psychology from Cambridge, and teaches Psychology with the Open University, but also holds an RSA-Cert TEFL. Gill has taught EFL in the UK, Turkey, Egypt and to the refugees in the Calais 'Jungle' in France. She currently teaches English to refugees in the UK.
