The previous post spawned an interesting comment thread. Thanks to everyone for the input and ideas!
Serendipitously enough, Richard Ishida at the W3C recently published Text size in translation, which has some numbers relevant to our discussion about translation length.
He cites some data from IBM which suggests that if you translate from English into a “European” language (whatever that means), your text gets longer in general. According to these numbers, if you start with a text of 10 characters, you’ll probably end up with a translation of about 25 characters. If you start with a text of 70 characters, you’ll probably end up with a translation of about 105 characters.
Richard also cites some research he himself did on localization in Flickr. He found that a typical interface term such as views (5 letters) could end up three times as long in Italian (visualizzazioni, 15 letters), i.e. 300% of the original length. Korean (조회), by way of comparison, comes out shorter - just 2 letters (or better, perhaps, “syllabic glyphs”).
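To make the arithmetic behind those figures concrete, here is a minimal sketch. The helper name `expansion_ratio` is made up for illustration, and it assumes that Python's `len()` (which counts Unicode code points) is a fair stand-in for "characters". That holds for the precomposed Hangul syllables below, but not for every script.

```python
def expansion_ratio(source: str, translation: str) -> float:
    """Length of the translation relative to the source, in code points.

    Hypothetical helper for illustration only; len() counts code points,
    which is only an approximation of what a reader sees as a "letter".
    """
    return len(translation) / len(source)


# The Flickr examples quoted above.
print(expansion_ratio("views", "visualizzazioni"))  # 3.0 (15 / 5)
print(expansion_ratio("views", "조회"))              # 0.4 (2 / 5)

# The IBM figures quoted above, expressed as ratios.
print(25 / 10)   # 2.5 for a 10-character source
print(105 / 70)  # 1.5 for a 70-character source
```

Note that the ratio shrinks as the source gets longer, which matches the pattern in the IBM numbers: short strings expand proportionally more.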
Following up on his idea, I decided to look at average word length across a wide variety of languages. You can see the results here:
What’s the bottom line? (Keeping in mind that the tail end of the list is screwy because of languages that don’t delimit words, because the definition of “word” is fuzzy, etc. etc.)
Average “word” length varies from something like 3 characters (Dangme) to somewhere around 15 (Inuktitut).
So, quite apart from the issues of translator skill, it seems undeniable that if you translate a text from Inuktitut to Dangme, it will come out shorter. (And that’s a HUGE market right there ;) )
Thoughts about the chart are welcome… I gotta get back to work!
If you’re feeling a little masochistic you can also take a look at the 20 minutes’ worth of grungy code I used to build that table. (There are some dependencies mentioned inline; if you have trouble running it, let me know & I’ll try to help you out/clean it up): udhr_word_lengths.py
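If you just want the gist of the calculation, here is a rough, dependency-free sketch. It is not the script linked above: the function name `average_word_length` and the two sample sentences are assumptions for illustration, and it treats a “word” as anything between whitespace, which is exactly the assumption that breaks down for languages that don’t delimit words.

```python
import string


def average_word_length(text: str) -> float:
    """Mean length, in code points, of whitespace-delimited tokens.

    Hypothetical helper for illustration; assumes words are separated
    by whitespace, which is not true of every writing system.
    """
    # Strip ASCII punctuation from token edges so "rights." counts as "rights".
    words = [token.strip(string.punctuation) for token in text.split()]
    words = [word for word in words if word]  # drop pure-punctuation tokens
    if not words:
        return 0.0
    return sum(len(word) for word in words) / len(words)


# Illustrative sample sentences only; a real comparison needs far more text.
samples = {
    "English": "All human beings are born free and equal in dignity and rights.",
    "Italian": "Tutti gli esseri umani nascono liberi ed eguali in dignità e diritti.",
}

# Print languages from shortest to longest average word length.
for language, text in sorted(samples.items(),
                             key=lambda item: average_word_length(item[1])):
    print(f"{language}: {average_word_length(text):.2f}")
```

A single sentence per language is far too little to say anything meaningful; the point is only to show the shape of the computation.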