Hacklog: Blogamundo — poking holes in the language barrier since approximately 1 month from now

The Human-speed Web

Written by Patrick Hall, 7 months, 3 weeks ago.

For most Google services, the policy seems to be: “if it’s not instantaneous, it’s broken.” For search, this is great.

For translation, it depends on what you want. If you want an instantaneous approximation of understanding, your option is machine translation, and that’s what Google does at translate.google.com.

But we’re a long way from artificial intelligence, right? And machine translation isn’t going to be putting translators out of business in the foreseeable future.*

So where does translation fit into the age of the internet? The possibility we believe has a lot of merit is to use computers to help human translators do their job more efficiently.

Guess what this means from the point of view of someone who’s requesting a translation from a skilled translator?

You gotta wait.

That sound you just heard was the eyeballs of the blipvert generation exploding, right? And this is the sort of internal kvetching I’ve been doing ever since we began this project: “Good grief,” I’d mutter to myself, whilst gnawing at my fingernails, “how am I going to explain to people that I’m working on a website that involves waiting?”

But you know what? I have come to believe that the idea of waiting for a quality service by a human is really not so weird, even out here on the series of tubes. There are successful services out on the web that are not instantaneous.

Yahoo Answers is a good example: visitors ask reference questions, and wait for responses prepared by others.

And I suppose one could argue that journalism in general operates at “human speed,” at least things that are written. The news doesn’t wait to happen, but we’re all willing to wait for helpful interpretation.

So, I dunno, maybe I’m nuts, and the whole world really is utterly impatient and unwilling to accept anything that isn’t instantaneous. But I don’t think so, and we’re betting on it. And we are going to make the processes of requesting and doing translations more efficient, and yes, faster.

* It’s my own personal opinion that once really good MT exists, i.e., content that’s indistinguishable from the work of a human, we’ll more or less be living in the age of artificial intelligence. And at that point, all bets are off anyway: whether “real” machine translation is feasible will be a moot point, because there will be machine authors!

Reddit goes Multilingual

Written by Patrick Hall, 7 months, 3 weeks ago.

I posted about the reaction to a translation being posted on Reddit a while back.

Since that post, I ran across a really cool discovery. Reddit is going multilingual:

I’ve been looking forward to announcing this feature ever since we first started drawing up plans for reddit in Steve’s notebook.

Over the months (and as recently as yesterday), we’ve gotten requests for reddits in a number of languages, but subreddits now make this possible (using language codes as subdomains, a la wikipedia). Translating the interface is the next step, but for now, you can read/submit/share links written in your preferred language(s) through the english UI.


So now there are Reddits in Chinese (zh.reddit.com), French (fr.reddit.com), German (de.reddit.com), Japanese (ja.reddit.com), Korean (ko.reddit.com), and Spanish (es.reddit.com), and a bunch more. And there’s an open invitation for more suggestions.

The individual Reddits haven’t been localized or promoted very heavily yet, but I’m really looking forward to seeing if they catch on. For one thing, it’s an interesting way to bootstrap corpora in various languages: it’s easy to get URLs out of feeds.
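To give a rough idea of how little work "getting URLs out of feeds" can be, here is a minimal sketch using only Python's standard library. It assumes the feed is plain RSS 2.0 with one `<link>` per `<item>`; the sample feed and the `item_links` helper are made up for illustration, and a real feed would be fetched over HTTP rather than pasted in as a string.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample feed, standing in for one fetched from a language-specific site.
SAMPLE_RSS = """<rss version="2.0">
  <channel>
    <title>example feed</title>
    <link>http://example.com/</link>
    <item>
      <title>First article</title>
      <link>http://example.com/fr/article-1</link>
    </item>
    <item>
      <title>Second article</title>
      <link>http://example.com/fr/article-2</link>
    </item>
  </channel>
</rss>"""


def item_links(rss_text):
    """Return the <link> text of every <item> in an RSS 2.0 document."""
    root = ET.fromstring(rss_text)
    links = []
    for item in root.iter("item"):
        link = item.findtext("link")
        if link:  # skip items that have no link element
            links.append(link.strip())
    return links


if __name__ == "__main__":
    for url in item_links(SAMPLE_RSS):
        print(url)
```

The channel's own `<link>` is skipped on purpose, since only the submitted items are interesting as corpus material.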

From our own point of view, these multilingual Reddits offer opportunities for translation: if a particular submission is popular on the Portuguese Reddit, doesn’t it stand to reason that a link to a translation of that submission could be popular on English or Spanish?

Update: Oh, I found the full list of supported languages: Armenian, Basque, Bulgarian, Catalan, Chinese, Esperanto, French, German, Hungarian, Indonesian, Italian, Japanese, Korean, Norwegian, Persian, Polish, Portuguese, Romanian, Russian, Slovenian, Spanish, Swedish, Turkish, and Vietnamese.

PanImages

Written by Patrick Hall, 7 months, 4 weeks ago.

A press release about an interesting project at the University of Washington:

A rose is a rózsa is a 薔薇: Image-search tool speaks hundreds of languages

PanImages is a tool that helps you to search for images across languages.

Searching for “house” on Google Images will get you different results than searching for casa or maison.

PanImages takes all this to the next level, and automatically translates your search term into various languages, and then re-runs an ubersearch on Google Images and Flickr.

Pretty neat. But more interesting than the interface itself (to yours truly, anyway), is the work that’s going on behind the scenes to find the translated terms to broaden the query. The project is based on a paper:

Lexical Translation with Application to Image Search on the Web (pdf)

I just started taking a look at it, but there is some stuff in there about “translation graphs” that suggests all kinds of interesting applications.

UPDATE: I should add that I still find it compelling to note that there are more translations for “house” hidden in the interwiki links on the normal Wikipedia than there are to be found by parsing out the contents of Wiktionary: 45 versus 33, in this case.
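The query-broadening idea itself is easy to sketch, even if the hard part (finding good translations) is not. The snippet below is only an illustration and not how PanImages works internally: the `TRANSLATIONS` table is a tiny hand-written stand-in for a real lexical resource, and `expand_query` is a hypothetical helper name.

```python
# Hand-written stand-in for a real translation resource (an assumption for illustration).
TRANSLATIONS = {
    "house": {"es": "casa", "fr": "maison"},
    "rose": {"hu": "rózsa", "ja": "薔薇"},
}


def expand_query(term, table=TRANSLATIONS):
    """Return the original term followed by its known translations, without duplicates."""
    terms = [term]
    by_language = table.get(term, {})
    for lang in sorted(by_language):  # sorted so the output order is predictable
        word = by_language[lang]
        if word not in terms:
            terms.append(word)
    return terms


if __name__ == "__main__":
    # Each resulting term would then be sent to the image search separately.
    print(expand_query("house"))
```

A term with no entry in the table simply comes back on its own, so the search degrades to the ordinary single-language case.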

A Translation on Reddit

Written by Patrick Hall, 8 months ago.

Reddit.com is a popular site where users submit interesting links, and then other users vote the links up or down, and discuss the submissions. The great majority of submissions link to content in English. I posted once before about the relationship of translation and aggregation.

This post on Reddit got me thinking about the topic again. Here’s the unwieldy title:

Without a doubt the most beautiful and moving expression of a father’s love and grief I’ve ever read. [I’m not sure how well reddit receives stuff in other languages, but please if you don’t read French leave this alone instead of downvoting it for that reason — I did my best at translating it into English in my comment.] (reddit.com)

The submitter found the French article so moving that he or she felt compelled to translate it in its entirety, and to submit the translation as a comment.

And that’s where the episode takes on separate layers of interest―linguistic layers: how do other users on the site react? And, just how much does their reaction depend on the nature of the translation?

This comment is undoubtedly in very poor taste, and that’s why it’s now voted into hidden status.

However, the commenter explains that he or she doesn’t read French. And if you consider the translated paragraph in question (which begins “Her liver is now in the belly of a two-month old infant…”), at least some of the commenter’s unease is explained: the English translation really does come off as more brutal than the original. While frank and painfully honest, the French version somehow does not convey the same clinical tone.

And so we have a whole conversation which is based on the tone of the translation, not the original. It takes a truly skilled translator to be able to convey such nuance. A professional translator would have better captured the tone, not just the meaning.

But this is the internet, right? What about collaborating to improve translation?

Well, we see some of that even in this informal translation-in-a-comment. Here, a commenter suggests an improvement, and the submitter incorporates the suggestion.

Could more such collaboration have produced a professional-quality English version? I kind of doubt it, to be honest. Many a Wikipedia article is separated from professional quality by a healthy bout of proofreading, after all. Nonetheless, the translation improved. At least it was possible to incorporate suggestions.

These sorts of problems are problems of process. Translations have the same vagaries as any text. Their chimeric character is part of their charm and challenge. No technology will change that. (Certainly not machine translation!)

But can we streamline the process of translation on the web? Can we make it feel native to the web? We hope so―that’s what we’re trying to do around here. It’s taking us a long time. But it’s worth doing right.

More soon.

PS: I hasten to point out that there are also serious questions about the legality of this translation. After all, it’s copyrighted, and translations are derivative works. This separate topic merits a few bazillion more posts…

From the Ever-So-Thinly-Disguised Press Release Department

Written by Patrick Hall, 8 months ago.

Language and terminology issues hold back global business: A study carried out by the Localisation Industry Standards Association and global information management provider SDL, which is best known for its translation and terminology management software, found that global business growth is hindered because decision makers have little knowledge of core technology such as content management, terminology management, and budgets associated with global communications.

Seriously folks, is a study carried out by a company that sells translation and terminology management software not going to “discover” that language and terminology are hindering global business?

</tinyrant>

We also note with interest that The Localization Industry Standards Association’s home page is not localized.

Why can’t your ATM remember what language you speak?

Written by Patrick Hall, 8 months, 1 week ago.

An amusing (and true) observation in a cantankerous article called Is Enterprise Software Failing The Innovation Test?

Yet with all this investment when I go to the ATM at my bank it still asks me in what language I want to be spoken to as I’m withdrawing my money. What!? It doesn’t know me? Despite all that equipment and data, the system can’t call up my preferences when I put my bankcard into the machine? I have millions of dollars in this bank!! In terms of innovation this industry definitely gets an “F.”

As the kind of guy who is usually more attuned to the lack of language support, I tend to overlook things like this. (I’m more likely to be counting the number of languages offered…)

But the author really makes a good point. Imagine if a localized website made you choose your language interface every time you logged in. Well, what would be the point of logging in?

And ATMs have access to a (supposedly) highly secure physical source of identification. Why don’t they remember your language preferences?
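For what it's worth, the "remembering" part is about as simple as software gets. Here is a minimal sketch, assuming a card identifier is available once the card is inserted; the `LanguagePreferences` class, the `start_session` function, and the in-memory dictionary are all hypothetical, and a real bank would obviously keep this in its customer records rather than in a Python dict.

```python
class LanguagePreferences:
    """Toy in-memory store mapping a card identifier to a language code."""

    def __init__(self):
        self._by_card = {}

    def remember(self, card_id, language):
        self._by_card[card_id] = language

    def language_for(self, card_id):
        # Returns None when no preference has been stored for this card.
        return self._by_card.get(card_id)


def start_session(prefs, card_id, ask_user):
    """Return the session language, asking the user only when nothing is stored."""
    language = prefs.language_for(card_id)
    if language is None:
        language = ask_user()  # e.g. show the language menu once
        prefs.remember(card_id, language)
    return language


if __name__ == "__main__":
    prefs = LanguagePreferences()
    print(start_session(prefs, "card-123", lambda: "es"))  # asks the first time
    print(start_session(prefs, "card-123", lambda: "en"))  # reuses the stored choice
```

The second call never reaches the menu, which is the whole point: the question gets asked once, not on every withdrawal.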