A Machine Misheard the Lyrics, and We Laughed. For 430 Million People, the Words Were the Whole Point.
In the winter of 2011, two men in North Carolina figured out that you could feed a sentence into YouTube's automatic captions, film yourself reading whatever the machine guessed, then run that through the captions again, and that if you did this enough times the machine would eventually hand you a kind of accidental poetry. Rhett McLaughlin and Link Neal started with a simple line: "I got tickets to the Lady GaGa putt-putt tournament and monster rally." A few passes later the machine had rewritten it as "Advantages of the Lady GaGa puppets in a lot of Iraq." They called the series Caption Fail. We watched it, and we laughed, because it was genuinely, helplessly funny.
What the machine was doing has a name, and the name is older than the machine. In 1954, a writer named Sylvia Wright confessed in a magazine that as a child she had misheard a line of a Scottish ballad. In the line "They hae slain the Earl o' Moray, and laid him on the green," the last words had arrived in her ears as "and Lady Mondegreen": a tragic noblewoman who had never existed, slain alongside the Earl in a song that contained no such person. Wright loved her imaginary noblewoman so much she gave the entire category of misheard words a name. The mondegreen.
"The point about what I shall hereafter call mondegreens, since no one else has thought up a word for them, is that they are better than the original."
Sylvia Wright, 1954
That is the secret of why Caption Fail worked. An auto-caption is a mondegreen at industrial scale: a machine that hears sound without meaning and guesses, the way a half-asleep child hears a lullaby. It is not stupid. It is dreaming. And like Wright's Lady Mondegreen, its mistakes are sometimes funnier and stranger and more alive than the words that were actually sung.
The joke only works if you already know the words
Here is the thing I keep coming back to. The comedy only lands if you already know the real line. Everyone laughing at "a lot of Iraq" could hear Lady Gaga perfectly well; the joke is the distance between the two, and you can only measure that distance if you have both. The caption, for that audience, was a toy. It was never built for them at all.
It was built for the person who cannot hear the song. The World Health Organization estimates that over 430 million people live with disabling hearing loss, more than five percent of everyone alive, and that by 2050 the number will pass 700 million. For someone in that 430 million, the auto-caption is not a parlor trick. It is the only way into the room. And when the machine mishears Lady Gaga as "a lot of Iraq," the gap the rest of us find funny becomes, for them, a door that quietly does not open.
Jessica Kellgren-Fozard has spent years patiently explaining this to creators who treat captions as optional, and her most-quoted point is also the most surprising: roughly eighty percent of the people who turn captions on are not deaf or hard of hearing at all. They are watching in a loud bar, or beside a sleeping baby, or on a train at two in the morning with the sound off. Captions became the most-used accessibility feature on the internet precisely because they help almost everyone. Which means the small group they were actually built for now shares them with the rest of us, and we are only borrowing a door that they have no choice but to use.
The smallest erasure is two characters long
And then there is the quietest failure of all. When a song plays and the machine cannot make out the words, it does not guess. It does not even mishear. It simply writes one flat word inside two brackets and moves on.
The captioning standards are clear about this. When lyrics carry meaning, they should be written out, word for word, wrapped in little music notes so you know they are sung. The machine knows none of that. [Music] is what giving up looks like in text. It is the place where the song just stops being part of the video, for the one person who needed it written down.
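For anyone who captions by hand, the convention is small enough to sketch. What follows is a minimal Python illustration, not any platform's actual tooling: the function names `format_timestamp` and `lyric_cue` are hypothetical, and the output only assumes the general shape of a WebVTT cue (a timing line, then the text), with each sung line wrapped in music notes.

```python
def format_timestamp(seconds: float) -> str:
    """Render seconds as a WebVTT-style HH:MM:SS.mmm timestamp."""
    total_ms = round(seconds * 1000)
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}.{ms:03d}"


def lyric_cue(start: float, end: float, lyric_lines: list[str]) -> str:
    """Build one caption cue whose sung lines are wrapped in music notes.

    Hypothetical helper for illustration only.
    """
    if end <= start:
        raise ValueError("cue must end after it starts")
    timing = f"{format_timestamp(start)} --> {format_timestamp(end)}"
    # Write the words out, one sung line per caption line.
    sung = [f"♪ {line.strip()} ♪" for line in lyric_lines if line.strip()]
    if not sung:
        # No words supplied: all that is left is the bare placeholder.
        sung = ["[Music]"]
    return "\n".join([timing, *sung])


if __name__ == "__main__":
    print(lyric_cue(12.5, 16.0, [
        "They hae slain the Earl o' Moray",
        "and laid him on the green",
    ]))
```

The fallback branch is the interesting part: the code can only write [Music] when nobody has given it the words, which is the whole argument in four lines.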
The fix was never a better machine
The answer, so far, has not been smarter software. It has been people. In 2016 a deaf YouTuber named Rikki Poynter got tired of watching automatic captions turn a video about concealer into a video about zebras, and she gave the bad ones a name that stuck: craptions. She launched a campaign, #NoMoreCraptions, asking creators to write their own captions, and asking viewers to press the creators they loved to do the same. Within two days there were forty-seven videos. The point was never that the machine was lazy. The point was that a caption is a small act of care, and care is the one thing you cannot automate.
For most of us a caption is a convenience we switch on in a loud room. For some of us it is the only door into the song. The same little line of text, doing two completely different jobs, and only one of them is optional.
So the same lyric can live three different lives on the same screen, depending on who is doing the listening, or whether anyone is.
THE MISHEARD LINE
"A lot of Iraq." Wrong, and somehow wonderful. We keep it and laugh because we already know the real words by heart.
THE GIVEN-UP LINE
[Music]. One word, two brackets. The song plays, the screen says nothing, and the meaning leaves the room.
THE WRITTEN LINE
The real words, typed out by someone who sat with the song. The only version that opens the door for everyone in the room.
The machine is getting better. It always does. In a few years the mondegreens will mostly stop, and a small, specific kind of comedy will quietly go extinct, and that will be, on balance, a good trade. But the gap the machine leaves behind has always been filled the same way: by a person putting back what the system took out. It is the same instinct that turns a comment section under a muted, copyright-struck video into a memorial for the song that used to be there. Somebody, somewhere, types the real words for a stranger they will never meet.
I think about this when I save a video. A bookmark keeps the thing itself, but a video with real captions keeps something more: it stays watchable when the sound has to stay off, in the quiet room, on the late train, beside the sleeping person. The best videos in anyone's library turn out to be the ones somebody captioned by hand, because those are the ones that survive the volume going to zero. Saving a video is keeping it. Captioning it well is keeping it for everyone.
