Hacklog: Blogamundo — poking holes in the language barrier since approximately 1 month from now

New i18n Aggregator at the W3C

Written by Patrick Hall, 10 months, 3 weeks ago.

We’re happy to report that there is a new aggregator started by the W3C’s i18n lead Richard Ishida: Planet I18n. Hacklog is lucky to be included! Here’s Richard’s description:

Planet i18n has just been launched by the I18n Core Working Group. It gathers together posts from various blogs that talk about internationalization (i18n). While it is hosted by the W3C Internationalization Activity, the content of the individual entries represent only the opinion of their respective authors and does not reflect the position of the Internationalization Activity.

If you own a blog with a focus on internationalization, and want to be added to this aggregator, please get in touch with Richard Ishida at ishida@w3.org.

The list is pretty short so far, but there’s some great stuff there (I’d never run across iheni, for instance). Check it out!

Web Designers and Internationalization

Written by Patrick Hall, 10 months, 3 weeks ago.

Random thought:

I think more web designers should become interested in internationalization (i18n).

I don’t mainly mean that they should learn about the technical side of things, such as encodings, keyboard support, Unicode and so on (although of course they should).

What I mean is, there are a lot of interesting design problems related to i18n. People who like reading about how to design effective websites and appealing, useful interfaces will also like the challenges that i18n offers.

  • Design: interfaces to translated content in many (perhaps 30 or more) languages in a flexible, extensible, intuitive way.
  • Rethink: how existing content will fit into an internationalized site. Question your assumptions.
  • Imagine: somebody halfway around the world, who doesn’t even know your language, might end up using something that you’ve designed. Cool.

For some analysis, check out Global by Design’s The Best Global Web Sites (and why).

I’ve often seen designers out there seeming to dread dealing with multilingual content. Why? It’s fun.

More on translation length: Word lengths in many languages

Written by Patrick Hall, 10 months, 3 weeks ago.

The previous post spawned an interesting comment thread. Thanks to everyone for the input and ideas!

Serendipitously enough, Richard Ishida at the W3C recently published Text size in translation, which has some numbers relevant to our discussion about translation length.

He cites some data from IBM which suggests that if you translate from English into a “European” language (whatever that means), your text gets longer in general. According to these numbers, if you start with a text of 10 characters, you’ll probably end up with a translation of about 25 characters. If you start with a text of 70 characters, you’ll probably end up with a translation of about 105 characters.

Richard also cites some research he himself did on localization in Flickr. He found that a typical interface term such as views could end up three times as long in Italian (visualizzazioni: 15 characters against 5, or 300% of the English length). Korean (조회), by way of comparison, comes out shorter: just 2 letters (or better, perhaps, “syllabic glyphs”).
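To make the arithmetic concrete, here is a minimal Python sketch of how such a ratio can be computed. The helper names (char_count, length_ratio) are made up for illustration, and "length" is assumed to mean the number of Unicode code points after NFC normalization, which is only one of several reasonable ways to count.

```python
import unicodedata


def char_count(text: str) -> int:
    """Count code points after NFC normalization.

    Assumption: one code point is one "character". With NFC, a Hangul
    syllable such as 조 counts as one, even if it was typed as separate jamo.
    """
    return len(unicodedata.normalize("NFC", text))


def length_ratio(source: str, target: str) -> float:
    """Length of target relative to source (1.0 means the same length)."""
    return char_count(target) / char_count(source)


source = "views"
translations = {"Italian": "visualizzazioni", "Korean": "조회"}

for language, term in translations.items():
    ratio = length_ratio(source, term)
    print(f"{language}: {char_count(term)} characters, {ratio:.0%} of the English length")
```

Counting code points says nothing about rendered width: two Korean syllable blocks may well take up more horizontal space than two Latin letters, so for layout work a pixel measurement would be the better number.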

Following up on his idea, I decided to look at average word length across a wide variety of languages. You can see the results here:


Languages by Average word length (click to see table)

What’s the bottom line? (Keeping in mind that the tail end of the list is screwy because of languages that don’t delimit words, because the definition of “word” is fuzzy, etc. etc.)

Average “word” length varies from something like 3 characters (Dangme) to somewhere around 15 (Inuktitut).

So, quite apart from the issues of translator skill, it seems undeniable that if you translate a text from Inuktitut to Dangme, it will come out shorter. (And that’s a HUGE market right there ;) )

Thoughts about the chart are welcome… I gotta get back to work!

If you’re feeling a little masochistic you can also take a look at the 20 minutes’ worth of grungy code I used to build that table. (There are some dependencies mentioned inline; if you have trouble running it, let me know and I’ll try to help you out or clean it up): udhr_word_lengths.py
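For readers who just want the gist of the measurement, here is a minimal, self-contained sketch. It is not the script linked above. It assumes that a "word" is whatever sits between whitespace once punctuation is dropped, which is exactly the assumption that breaks down for languages that don't delimit words. The function names are made up, and the two sample sentences (the opening of Article 1 of the Universal Declaration of Human Rights) are only there to have something to count.

```python
import unicodedata


def split_words(text: str) -> list[str]:
    """Split on whitespace and drop every punctuation character.

    Assumption: words are whitespace-delimited. Punctuation (Unicode
    categories starting with "P") is removed everywhere, including
    apostrophes and hyphens inside words.
    """
    text = unicodedata.normalize("NFC", text)
    words = []
    for chunk in text.split():
        cleaned = "".join(
            ch for ch in chunk if not unicodedata.category(ch).startswith("P")
        )
        if cleaned:
            words.append(cleaned)
    return words


def average_word_length(text: str) -> float:
    """Mean number of code points per word."""
    words = split_words(text)
    if not words:
        raise ValueError("no words found in text")
    return sum(len(word) for word in words) / len(words)


# Sample text only; swap in a real corpus for anything serious.
samples = {
    "English": "All human beings are born free and equal in dignity and rights.",
    "Portuguese": "Todos os seres humanos nascem livres e iguais em dignidade e em direitos.",
}

for language, text in sorted(samples.items(), key=lambda item: average_word_length(item[1])):
    print(f"{language}: {average_word_length(text):.2f}")
```

A single sentence per language is far too little to rank anything; the point is only to show what is being averaged.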

Any data on translation length by language pair?

Written by Patrick Hall, 11 months ago.

Is there any research out there that gives numbers for the average length of a translation for given language pairs?

For instance, my cohort Jonas has noticed that when he translates from English into Portuguese, the resulting translation seems to end up about 25% longer than the English.

For some languages the writing system alone more or less guarantees a particular proportion: a translation into Chinese will end up with fewer characters than a source text that uses an alphabetic writing system. Highly inflected languages (Russian, say) will tend to be “long” on average, and isolating languages (Vietnamese, say) “short.”

What I’m imagining is a big matrix with a long list of languages on the X axis, and the same list on the Y axis, and a percentage in each cell of the matrix. Looking up English » Portuguese would give 125%; looking up Portuguese » English would give 80%, since 1 / 1.25 = 0.8 (assuming Jonas’ guess is correct and that I can do math).
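A rough sketch of that lookup table in Python follows. Only the 125% figure comes from Jonas' guess; the language codes, the function name and the idea of filling in the reverse direction automatically are assumptions made for illustration. Note that the reverse cell is the reciprocal of the forward one (1 / 1.25 = 80%), not 100% minus the difference, and that real translations in the two directions would not have to be exact reciprocals at all.

```python
# Ratio of translation length to source length, keyed by (source, target).
# Assumption: the single entry below is Jonas' informal estimate, not measured data.
known_ratios = {("en", "pt"): 1.25}


def with_reverse_pairs(ratios: dict[tuple[str, str], float]) -> dict[tuple[str, str], float]:
    """Return a copy where every missing reverse pair is filled with the reciprocal.

    Assumption: translating back undoes the expansion exactly. Measured
    values, where available, are kept as they are.
    """
    matrix = dict(ratios)
    for (source, target), ratio in ratios.items():
        matrix.setdefault((target, source), 1 / ratio)
    return matrix


matrix = with_reverse_pairs(known_ratios)

for (source, target), ratio in sorted(matrix.items()):
    print(f"{source} -> {target}: {ratio:.0%}")
```

A dictionary keyed by language pair keeps the table sparse, which suits the situation described here: most cells are unknown, and there is no reason to pretend otherwise.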

I’m just about to start searching for this myself, but I figured that I’d throw out the question.