A Short Story about Machine Translation
Once upon a time, there were a bunch of universities who were drafted to participate in a “Surprise Language Machine Translation exercise.”
The exercise went like this:
“We’re going to tell you the name of a language, and all you machine translation departments will have one month to get something out of that language that more or less resembles English.”
The mystery language turned out to be Hindi, and the boffins got in gear.
Now, when you have a massive nationwide brain trust including people like Franz Josef Och, Philip Resnik, and Dan Melamed collaborating and competing to build a machine translation system, you will get results—the best in the business.
(Mostly notwithstanding the muddlings at the not-so brain-trusty rungs of the ladder, where yours truly munged away as a Perl apprentice.)
Of course, the best in the machine translation business still isn’t exactly limpid prose… but then it’s not entirely useless, either (and it’s getting better).
Point being, there was plenty of output at the end of the month.
They’re so Bleu
Now, here’s something you might not know about MT: getting the system to produce output is only the beginning of the work. Then comes the painstaking stage called “evaluation,” where you have to compare the outputs, and decide which is best.
This is usually done by comparing the translations to a translation by a real live human, which is called the “gold standard.” In fact, the most common metric for evaluating MT systems is surprisingly simple: you essentially count how many strings of words each MT output has in common with the gold standard translations. Whichever has the most, wins. (The metric is called “Bleu”; you can read about it here if you’re curious.)
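If you'd like to see the counting idea in code, here is a minimal Python sketch. It is not the real Bleu formula, which works with clipped precisions, a geometric mean, and a brevity penalty. It is only the "count the shared strings of words" intuition described above. The function names (`ngrams`, `overlap_score`) and the example sentences are made up for illustration.

```python
from collections import Counter

def ngrams(tokens, n):
    """Return a Counter of every n-word sequence in a list of tokens."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def overlap_score(candidate, reference, max_n=4):
    """Count the word sequences (length 1..max_n) shared with the reference.

    A shared sequence is counted at most as often as it occurs in the
    reference, so repeating a word over and over does not inflate the score.
    This is a simplification of Bleu, not the full metric.
    """
    cand = candidate.lower().split()
    ref = reference.lower().split()
    total = 0
    for n in range(1, max_n + 1):
        # Counter intersection keeps the minimum count of each n-gram.
        shared = ngrams(cand, n) & ngrams(ref, n)
        total += sum(shared.values())
    return total

# Hypothetical gold standard and two hypothetical MT outputs.
reference = "the cat sat on the mat"
system_a = "the cat sat on a mat"
system_b = "a cat is on the mat"

scores = {
    "A": overlap_score(system_a, reference),
    "B": overlap_score(system_b, reference),
}
best = max(scores, key=scores.get)  # whichever has the most, wins
print(scores, "winner:", best)
```

Note that the whole thing depends on having that `reference` string in hand: with no human translation, there is nothing to count against.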
But anon!
There were no gold standard translations for Hindi»English. That was the whole point! If there had already been good, carefully translated texts, well, the surprise language wouldn’t have been terribly surprising.
After all, these guys already had systems for doing translations between famous language pairs like French»English or English»Spanish. (You know what I mean: the dialects with armies and navies.) They could have just fed the systems Hindi»English content, and whammo, MT system. But there just wasn’t enough such translated content to be had.
Faced with this dilemma, the boffins did what they do best: they came up with a clever hack. They simply put up a website with some texts in English, and rewarded amateur translators on the internet with gift certificates to Amazon.com if they would translate those texts into Hindi.
And here’s where the story gets interesting.
Forests and Trees
Lo and behold, the response was overwhelming. And within just a few days they had stacks of gold standard translations—far more human translations than they needed to function as a gold standard, in fact.
So the boffins thought that all was well and they went back to tinkering with the real problem: their MT systems… And they happily labored, knowing that they would be in possession of a scientifically sound point of comparison at the end of the month… lots of good Hindi»English translations, with which to compare their not-so-hot machine translations.
Now, stop and think about this a minute, O Children of the Age of Wikipedia.
Can you feel the irony?
Update: Speak of the devil — if you’re curious to read more about how statistical machine translation works (and how it relies on the existence of translated content), check out Translation by Numbers at Technology Review.
