YouTube Culture

A Machine Misheard the Lyrics, and We Laughed. For 430 Million People, the Words Were the Whole Point.


In the winter of 2011, two men in North Carolina figured out that you could feed a sentence into YouTube's automatic captions, film yourself reading whatever the machine guessed, and then run that through the captions again. Do this enough times and the machine would eventually hand you a kind of accidental poetry. Rhett McLaughlin and Link Neal started with a simple line: "I got tickets to the Lady Gaga putt-putt tournament and monster rally." A few passes later the machine had rewritten it as "Advantages of the Lady Gaga puppets in a lot of Iraq." They called the series Caption Fail. We watched it, and we laughed, because it was genuinely, helplessly funny.

Rhett & Link - CAPTION FAIL: Lady Gaga Putt-Putt Rally (2011)
The pilot of Caption Fail, January 2011. The whole joke lives in the gap between what was said and what the machine heard.

What the machine was doing has a name, and the name is older than the machine. In 1954, a writer named Sylvia Wright confessed in a magazine that as a child she had misheard a line of a Scottish ballad. "They hae slain the Earl o' Moray, and laid him on the green" had arrived in her ears as "and Lady Mondegreen," a tragic noblewoman who had never existed, slain alongside the Earl in a song that contained no such person. Wright loved her imaginary noblewoman so much that she gave the entire category of misheard words a name. The mondegreen.

"The point about what I shall hereafter call mondegreens, since no one else has thought up a word for them, is that they are better than the original."
Sylvia Wright, 1954

That is the secret of why Caption Fail worked. An auto-caption is a mondegreen at industrial scale: a machine that hears sound without meaning and guesses, the way a half-asleep child hears a lullaby. It is not stupid. It is dreaming. And like Wright's Lady Mondegreen, its mistakes are sometimes funnier and stranger and more alive than the words that were actually sung.

430M people living with disabling hearing loss worldwide (WHO)
700M projected by 2050, or 1 in every 10 people (WHO)
80% of people who switch captions on have no hearing impairment (Ofcom)
1954: the year a misheard lyric first got the name "mondegreen"

The joke only works if you already know the words

Here is the thing I keep coming back to. The comedy only lands if you already know the real line. Everyone laughing at "a lot of Iraq" could hear Lady Gaga perfectly well; the joke is the distance between the two, and you can only measure that distance if you have both. The caption, for that audience, was a toy. It was never built for them at all.

It was built for the person who cannot hear the song. The World Health Organization estimates that over 430 million people live with disabling hearing loss, more than five percent of everyone alive, and that by 2050 the number will pass 700 million. For someone in that 430 million, the auto-caption is not a parlor trick. It is the only way into the room. And when the machine mishears Lady Gaga as "a lot of Iraq," the gap the rest of us find funny becomes, for them, a door that quietly does not open.

Jessica Kellgren-Fozard - How using captions can get you 80% more views, why captions are useful
Jessica Kellgren-Fozard, a deaf YouTuber who captions every video by hand, on why the words matter and who they are really for.

Jessica Kellgren-Fozard has spent years patiently explaining this to creators who treat captions as optional, and her most-quoted point is also the most surprising: roughly eighty percent of the people who turn captions on are not deaf or hard of hearing at all. They are watching in a loud bar, or beside a sleeping baby, or on a train at two in the morning with the sound off. Captions became one of the most widely used accessibility features on the internet precisely because they help almost everyone. Which means the small group they were actually built for now shares them with the rest of us, who borrow a door they have no choice but to use.

The smallest erasure is one word long

And then there is the quietest failure of all. When a song plays and the machine cannot make out the words, it does not guess. It does not even mishear. It simply writes one flat word inside two brackets and moves on.

[Music]. It is the most honest thing the auto-caption writes, and the saddest. The swelling thing under the goodbye montage, the verse the whole video was built around, the lyric someone chose because it said what they could not say themselves, all of it compressed into a single label that tells a deaf viewer only that something is happening in a room they have not been invited into.

Captioning style guides are clear about this. When lyrics carry meaning, they should be written out, word for word, wrapped in little music notes so you know they are sung. The machine knows none of that. [Music] is what giving up looks like in text. It is the place where the song simply stops being part of the video, for the one person who needed it written down.

The audience captions are quietly built for (WHO estimates)

Some hearing loss, today: 1.5 billion
Some hearing loss, 2050: 2.5 billion
Disabling loss, today: 430 million
Disabling loss, 2050: 700 million

Captions are built for the people on these bars. Eighty percent of the people who actually use them, per Ofcom, are not on any of them. Figures via the World Health Organization.

The fix was never a better machine

The answer, so far, has not been smarter software. It has been people. In 2016 a deaf YouTuber named Rikki Poynter got tired of watching automatic captions turn a video about concealer into a video about zebras, and she gave the bad ones a name that stuck: craptions. She launched a campaign, #NoMoreCraptions, asking creators to write their own captions and viewers to ask the creators they loved to do it. Within two days there were forty-seven videos. The point was never that the machine was lazy. The point was that a caption is a small act of care, and care is the one thing you cannot automate.

Rikki Poynter - #NoMoreCraptions: How To Properly Caption Your Videos
Rikki Poynter coined the word "craptions" and turned a private frustration into a movement creators could join in an afternoon.

For most of us a caption is a convenience we switch on in a loud room. For some of us it is the only door into the song. The same little line of text is doing two completely different jobs, and only one of them is optional.

So the same lyric can live three different lives on the same screen, depending on who is doing the listening, or whether anyone is.

THE MISHEARD LINE

"A lot of Iraq." Wrong, and somehow wonderful. We keep it and laugh because we already know the real words by heart.

THE GIVEN-UP LINE

[Music]. One word, two brackets. The song plays, the screen says nothing, and the meaning leaves the room.

THE WRITTEN LINE

The real words, typed out by someone who sat with the song. The only version that opens the door for everyone in the room.

The machine is getting better. It always does. In a few years the mondegreens will mostly stop, and a small, specific kind of comedy will quietly go extinct, and that will be, on balance, a good trade. But the gap the machine leaves behind has always been filled the same way: by a person putting back what the system took out. It is the same instinct that turns a comment section under a muted, copyright-struck video into a memorial for the song that used to be there. Somebody, somewhere, types the real words for a stranger they will never meet.

I think about this when I save a video. A bookmark keeps the thing itself, but a video with real captions keeps something more: it stays watchable when the sound has to stay off, in the quiet room, on the late train, beside the sleeping person. The best videos in anyone's library turn out to be the ones somebody captioned by hand, because those are the ones that survive the volume going to zero. Saving a video is keeping it. Captioning it well is keeping it for everyone.
