Hacklog: Blogamundo — poking holes in the language barrier since approximately 1 month from now

Any data on translation length by language pair?

Written by Patrick Hall, 1 year, 1 month ago.

Is there any research out there that gives numbers for the average length of a translation for given language pairs?

For instance, my cohort Jonas has noticed that when he translates from English into Portuguese, the resulting translation seems to end up about 25% longer than the English.

For some languages the writing system alone more or less guarantees a particular proportion: a translation into Chinese will end up shorter than a source text that uses an alphabetic writing system. Highly inflected languages (Russian, say) will tend to be “long,” on average, isolating languages “short” (Vietnamese).

What I’m imagining is a big matrix with a long list of languages on the X axis, and the same list on the Y axis, and a percentage in each cell of the matrix. Looking up Portuguese » English would give 75%; looking up English » Portuguese would give 125% (assuming Jonas’ guess is correct and that I can do math).

I’m just about to start searching for this myself, but I figured that I’d throw out the question.

12 Comments for 'Any data on translation length by language pair?'

  1. Comment received 1 year, 1 month ago from Rob De Almeida

    Actually it would be 80% / 125% — just think of an English text with 100 words, for example: 100 → 125 is a 25% increase, but 125 → 100 is a 20% decrease.

    I read that in conversations the information flux is more or less constant for the human species. This is why Italian and Spanish people speak faster than Americans; their languages have a lower information density (facts per word) than English, so they need to speak faster to keep the flux constant. I don’t know if this is true, and of course there are obvious exceptions, but it makes sense.

  2. Comment received 1 year, 1 month ago from Patrick Hall

    Thanks Rob,

    Okay, it’s established, I can’t do math :P (That’s what I get for writing a blog post in 5 minutes ;) )

    I’ve never heard the term flux used in the context you describe. Is it really true that Spanish and Italian speakers speak faster than speakers of English? I would guess that the speech recognition literature would be the place to look for information on that.

    Another quick way I just thought of to generate some very raw data and build a matrix of the sort I described would simply be to count the number of letters in a translated document.

    The Universal Declaration of Human Rights would be a good text to start with, but it’s probably not long enough to get really representative numbers. (I suppose religious texts, which are commonly available, and heavily translated, would be another good choice.)

  3. Comment received 1 year, 1 month ago from Chris Waigl

    Well, I’m not sure about research, but I do know that professional translators have rate correspondence tables by language, for example to work out how to charge if the client insists on a rate for words in the source vs. words in the target document (both cases exist).

  4. Comment received 1 year, 1 month ago from Farzaneh

    Such a matrix won’t be a symmetrical one. I think it is usually the case that the translator needs to use more words to convey the exact meaning of the few words used in the original text.

  5. Comment received 1 year, 1 month ago from Patrick Hall

    @Chris,

    Hmm, hadn’t thought of that, interesting. Those rate charts would be a useful proxy for translation length. The demand for each particular language pair would also factor into those rates, but it would still be interesting to see.

    @Farzaneh,

    Hi there, you raise a very good point. I have often found myself with English translations of a Portuguese source text that end up longer than the original, but I’ve always chalked that up to not having a lot of experience myself as a translator.

    But perhaps there’s always a tendency for translations to be longer than the source text? (Disregarding issues like the Chinese writing system versus alphabetic ones, mentioned above.)

    Thanks for stopping by!

  6. Comment received 1 year, 1 month ago from MBM

    I guess you could get such statistics from parallel aligned corpora or from translation memories.

    I think that there is always a tendency for translations to be longer than originals, no matter what the language pair is (provided the writing systems are comparable, of course). I like to think of translation as a process of fitting the possibilities of the source language into the restrictions of the target language. This often means that the translator needs to go to some length to express ideas for which the source language has a short and easy way of saying them, but the target language doesn’t.

  7. Comment received 1 year, 1 month ago from Glenn

    The formula I’ve used for English to Romance of +20% would logically generate 20% fewer words into English but it never does. I attribute it to the fact that translators, feeling tied to the source text, don’t always pare down enough, sometimes due to speed (it’s easier and faster to include all the words you see in the source), but mostly because they don’t feel they have the right not to reflect all of the words of the source by economizing into more genuine-sounding English.

  8. Comment received 1 year, 1 month ago from MBM

    Exactly! This is a fallacy I keep hearing again and again. People presume that, if translating from language A to language B prolongs the text, then translating in the opposite direction should make the text shorter. But that almost never happens. On average, translation makes the target text longer in any language, no matter which direction.

    These days, most translation gets done from English into other languages, the translations are usually longer than the original, and that makes people believe that English must be the most concise and most efficient language in the world. This is an illusion. Translating from other languages into English also produces texts longer than the original.

  9. Comment received 1 year, 1 month ago from Alex

    Farzaneh:

    IMHO, it’s a sign of a poor translator if the target language always ends up more wordy, irrespective of source-target combination. An ideal translation should convey no more and no less than the original information, also preserving the emotional content intact.

    Patrick:

    The European Parliament has ample archives of its proceedings translated into a lot of languages. A quick search didn’t turn up much, but they are there for a determined person to find.
    Or, you could set up a web application in the form of a grid and ask various translators to fill in their relevant language pairs with their numbers (I realize results would be rather subjective). Then you could average the values.

  10. Comment received 1 year ago from Brian

    Hi there, here is an interesting link to a Swiss localization service provider with some statistics on this topic: http://www.oettli.gr/subpage1.asp?catid=52&maincat=2

    However, I am not sure about the sample documents they used to calculate the contraction/expansion ratios.

  11. Comment received 1 year ago from Patrick Hall

    Hey Brian!

    Thanks for the link, interesting data.

    I agree that it’s a bit hard to know what to make of divisions like “Spanish” vs “Mexican Spanish” without knowing what specific text genres and lengths were involved in the measurements. The variation is pretty wide.

    In any case, it seems that “contraction” and “expansion” are the right terms to search for to find more information on this topic.

    Danke vielmals!

  12. Comment received 1 month, 1 week ago from Bradleyjames

    Thanks a lot, this is really helpful. Really well for me and I’m not going back to the proprietary guys! If You Need More Information Please Visit us :- eTranslate is an international company specialising in the provision of Internationalization and Globalization Solutions.
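
The matrix from the post, the 80% / 125% arithmetic from comment 1, and the letter-counting idea from comment 2 can be sketched in a few lines of Python. This is only a rough illustration: the function names `count_letters` and `length_ratio_matrix` are made up for this example, and the letter counts are hypothetical numbers chosen to match Jonas' 25% guess, not measurements.

```python
def count_letters(text):
    """Count alphabetic characters only, ignoring spaces, digits and punctuation."""
    return sum(1 for ch in text if ch.isalpha())


def length_ratio_matrix(lengths):
    """Return ratio[src][tgt] = length of tgt version / length of src version."""
    return {
        src: {tgt: lengths[tgt] / lengths[src] for tgt in lengths if tgt != src}
        for src in lengths
    }


# Hypothetical letter counts for one document in two languages (illustrative only).
# With real texts these would come from count_letters(document_text).
lengths = {"English": 1000, "Portuguese": 1250}

matrix = length_ratio_matrix(lengths)
for src, row in matrix.items():
    for tgt, ratio in row.items():
        print(f"{src} » {tgt}: {ratio:.0%}")
# English » Portuguese: 125%
# Portuguese » English: 80%
```

Note that measuring a single pair of documents this way makes the two directions reciprocal by construction. To capture the effect several commenters describe, where translations come out longer in either direction, the counts would have to be kept separately according to which side was the original.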
