Hacklog: Blogamundo — poking holes in the language barrier since approximately 1 month from now

Interview with Microsoft Internationalization Guy Michael Kaplan

Written by Patrick Hall, 2 years, 4 months ago.

I’m a bit of a Linux freak myself, but I’m a big fan of Microsoft internationalization guy Michael Kaplan.

A few weeks ago Robert Scoble did a video interview with him for Channel 9:

Michael Kaplan - Bringing Windows Vista to International markets

I was happy to learn about the keyboard layout tool he created for Windows users. (Need to try that…)

Michael also has a great blog at Sorting It All Out. Naturally he often blogs about Windows-related stuff, but not exclusively: he knows Unicode like the back of his hand, and when it comes to anybody who spends all day futzing around with funny languages, well… the current blogger is automatically a fan. ☺

Back in Business

Written by Patrick Hall, 2 years, 4 months ago.

Sorry for the server’s sluggishness over the past several days; it has been betweaked in time for the new year. Prepare for an onslaught of backlogged blogging!

Name games

Written by Patrick Hall, 2 years, 4 months ago.

I have a friend who was telling me about someone he knew who’d changed his name to a single letter.

Like, not just his first name: his entire name was a single letter. Something tells me the IRS doesn’t like that guy.

But Blogamundo will love him!

Does it really make sense to use First Name and Last Name fields on a website anymore?

I humbly suggest that it doesn’t. Or rather, that those fields are mostly artifacts of the English-speaking world.

Er, no, that’s not right either. The point is that the field names themselves don’t work. Trying to translate “First Name” and “Last Name” is a can of worms, because which name counts as “first” depends on the language and the context.

Take Japanese, for instance. The famous novelist’s name is written 村上 春樹 and romanized as Murakami Haruki, family name first. In English-speaking contexts he uses Haruki Murakami.

Now, which of those is “first” and which is “last”?

Multiply that particular example by all the naming conventions for Akan and Arabic and Chinese and Fijian and French and Philippine and German and Hawaiian and Hebrew and Hungarian and Icelandic and Indian and Japanese and Korean and Polish and Vietnamese and Russian and thereabouts and Spanish, Portuguese and Catalan and…

And you get the idea.

So, we’re going to give people a great big “Name” box and be done with it.
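The following is a minimal Python sketch of why the two-field form breaks down and what the single box amounts to. `naive_first_last` and `Profile` are hypothetical names used for illustration; they are not Blogamundo’s actual code.

```python
from dataclasses import dataclass
from typing import Tuple


def naive_first_last(full_name: str) -> Tuple[str, str]:
    """Hypothetical helper: the common (and fragile) habit of
    splitting a name at the first space into "first" and "last"."""
    first, _, last = full_name.partition(" ")
    return first, last


@dataclass
class Profile:
    """Hypothetical user record with one free-form name field.

    The name is stored exactly as typed; nothing tries to guess
    which part is the given name and which is the family name.
    """
    name: str


# Japanese order puts the family name first, so the naive split
# labels the surname 村上 as the "first" name.
print(naive_first_last("村上 春樹"))   # ('村上', '春樹')

# A single-letter name has no "last" part at all.
print(naive_first_last("X"))           # ('X', '')

# The single-field record simply keeps whatever it was given.
print(Profile(name="村上 春樹").name)  # 村上 春樹
```

The design choice is to store what people type and stop guessing. Any later need for a sortable or formal form of the name would have to be asked for explicitly, not derived by splitting.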

Is Machine Translation Possible? Well, yeah, but…

Written by Patrick Hall, 2 years, 4 months ago.

Sorry about the lack of posts and the slow server this week. I’ve just returned from a week in London at the amazing Global Voices Summit (about which I have more to blog) and have been recovering ever since. Meanwhile, back on the bandwagon…

Interesting post by Ryan Coleman over at Found in Translation: Machine Translation – ever ready for prime time?

At the end of the day, our goal as technologists shouldn’t be to replace the translator - they’re an essential part of the process - but rather build and create tools that automate the mundane so they can focus on the exceptions. In the end I think it means more efficient, higher quality translation for all of us.

I certainly agree with that sentiment. The question of whether “real” machine translation is possible is equivalent, I think, to the question of whether “real” artificial intelligence is possible. In other words, when somebody finally makes machine translation work flawlessly, we will necessarily have reached the era of “OMG MY COMPUTER IS ALIVE!”

And by that point we’ll be dealing with problems far more complex and difficult to imagine than mere translation. (Brain uploading, anyone?)

And, anyway, back in the now, the variety of machine translation that’s actually working well these days—statistical translation—is already a sort of “mechanical turk.” It only works because human translators have given it something to imitate. Human translators are still in the loop, even though it’s called “machine” translation.

That doesn’t change the fact that there just won’t be enough financial and human resources dedicated to the problem to build machine translation systems for a significant number of language pairs.

Do I think that real machine translation is possible? Yep, I do. But I also think that real AI is possible, and that the two developments will be more or less contemporaneous. And when we hit AI, fuhgeddaboutit, all bets are off.

And another thing: couldn’t Amazon have come up with a name for their service that’s a little more tasteful than “Mechanical Turk”? Good grief.

We want to know about your language

Written by Patrick Hall, 2 years, 5 months ago.

If you’re attending the conference or following along in IRC, and you’re interested in the Blogamundo project, we’d like to get in touch with you and learn about:

  • Your languages
  • Whether you yourself would be interested in translating a few blog posts
  • Any suggestions or ideas you have about what Blogamundo can do for Global Voices

We are not machine translation.

You can email any of us at:

  • pat@blogamundo.com
  • jonas@blogamundo.com
  • john@blogamundo.com

We’ll be starting a mailing list soon, and we’re looking for beta testers!

Hope to hear from you!

Two thirds of Blogamundo reporting from London!

Written by Patrick Hall, 2 years, 5 months ago.

Blogamundo North American Department East Coast Subdivision (cough, that’s me) and West Coast Subdivision (my brother John!) reporting from London!

First observation — cars are backwards.

The South American Department (Jonas, that is) is arriving tomorrow, and we’re looking forward to meeting everyone. The conference will be webcast, so check it out if you’re interested!

Unicode vs. Latin-1… DEATHMATCH

Written by Patrick Hall, 2 years, 5 months ago.

And now for some highly inaccurate but hopefully provocative statistics on the progress of Unicode on the web…

Emily Chang’s eHub is “a constantly updated list of web applications, services, resources, blogs or sites with a focus on next generation web (web 2.0), social software, blogging, Ajax, Ruby on Rails, location mapping, open source, folksonomy, design and digital media sharing.”

Holy smokes that’s a lotta buzzwords!

Presumably the folks building such web applications are pretty up-to-date with regard to web standards and such, and I was wondering how clued-in they were about character encodings.

So I took five minutes, pulled all the URLs out of eHub’s front page, put them in a file called “ehublinks.html”, and said this to my bash shell:


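$ wget -i ehublinks.html
# lowercase, split into tokens on spaces, quotes and semicolons, keep the charset tokens, then count them
$ grep -i charset index.html* | tr '[:upper:]' '[:lower:]' | tr ' ' '\n' | grep chars | tr '";' '\n\n' | grep chars | sort | uniq -c | sort -n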

1 charset
1 charset=
5 charset=windows-1252
66 charset=iso-8859-1
100 charset=utf-8

66 in latin-1, 100 in utf-8.

That’s out of two-hundred-some-odd pages I downloaded, so it’s hardly accurate.

But whatever, ballparks, ballparks.

The good news is that Unicode (utf-8) is winning; the bad news is that latin-1 won’t be going away any time soon.

And the even worse news is that my crufty little survey is probably half wrong anyway, since servers don’t necessarily actually send the page in the encoding that the page says it’s in.
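A less crufty survey would compare the charset a page declares in its own markup with the charset the server announces in its HTTP Content-Type header. Here is a minimal Python sketch of that comparison. It assumes you have already fetched the header value and the raw page bytes. It uses simple regular expressions instead of a real HTML parser, so it will miss unusual markup. The function names are hypothetical.

```python
import re
from typing import Optional

# charset=... inside a Content-Type header value, e.g.
# "text/html; charset=ISO-8859-1"; quotes around the value are optional.
HEADER_CHARSET = re.compile(r'charset\s*=\s*["\']?([\w.:-]+)', re.I)

# Covers both <meta charset="..."> and the older
# <meta http-equiv="Content-Type" content="text/html; charset=..."> form.
META_CHARSET = re.compile(rb'<meta[^>]+charset\s*=\s*["\']?([\w.:-]+)', re.I)


def header_charset(content_type: str) -> Optional[str]:
    """Charset announced by the server, lowercased, or None if absent."""
    m = HEADER_CHARSET.search(content_type)
    return m.group(1).lower() if m else None


def meta_charset(body: bytes) -> Optional[str]:
    """Charset the page claims for itself, lowercased, or None.

    Only the first 1024 bytes are scanned, on the assumption that
    the declaration sits near the top of the document.
    """
    m = META_CHARSET.search(body[:1024])
    return m.group(1).decode("ascii").lower() if m else None


def charset_mismatch(content_type: str, body: bytes) -> bool:
    """True when both sources declare a charset and they disagree."""
    h, m = header_charset(content_type), meta_charset(body)
    return h is not None and m is not None and h != m


page = b'<head><meta http-equiv="Content-Type" content="text/html; charset=utf-8"></head>'
print(charset_mismatch("text/html; charset=ISO-8859-1", page))  # True
```

When the two disagree, browsers generally give the HTTP header priority over the meta tag. That is why counting meta tags alone can overstate utf-8.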

But about those windows-1252 people… GOOD GRIEF.
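Here is one concrete reason windows-1252 pages cause trouble. Windows-1252 and ISO-8859-1 agree on almost every byte, but windows-1252 puts printable characters such as curly quotes and the euro sign at bytes 0x80 through 0x9F. ISO-8859-1 reserves that range for invisible C1 control codes. The small Python illustration below uses the standard `cp1252` and `latin-1` codecs. The sample bytes are made up for the demonstration.

```python
# Hypothetical bytes a Windows editor might save for “café” with curly quotes.
raw = b"\x93caf\xe9\x94"

as_cp1252 = raw.decode("cp1252")   # the quotes come out as “ and ”
as_latin1 = raw.decode("latin-1")  # the quotes become C1 control characters

print(as_cp1252)        # “café”
print(repr(as_latin1))  # '\x93café\x94'

# Outside 0x80-0x9F the two codecs decode every byte identically.
same = all(
    bytes([b]).decode("cp1252") == bytes([b]).decode("latin-1")
    for b in range(256)
    if not 0x80 <= b <= 0x9F
)
print(same)  # True
```

So a page that is really windows-1252 but is read as latin-1 (or the other way round) looks fine until the first smart quote, which then silently turns into garbage.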

Language death isn’t THAT inevitable

Written by Patrick Hall, 2 years, 5 months ago.

On the occasion of the Portuguese translation of his book We the Media, Dan Gillmor has this to say about finding common ground in translation:

We Americans tend to take for granted the ascendency of English. While English has become the international language of commerce, science and aviation — and it’s becoming a common second language around the globe — cultures are holding onto what makes them unique. As they should.

It’s nice to see someone taking note of the fact that linguistic diversity is actually alive and well. Yes, language death is a problem, but there’s something of a cottage industry in linguistic gloom and doom — and not solely with regard to the spread of English. Clay Shirky’s contribution from way back in the last century was particularly dire:

In the next 10 years, we will see the world’s languages sorted into two categories — those that form part of language networks will grow, and those that don’t will shrink, as the export of languages in the last century reshapes the map of the next one.

I don’t buy that.

Every language is its own network. I just don’t buy the idea that a language has to have strong interaction with other languages to exist. (Hopi is doing just fine, thank you very much.)

The well-worn account of English running amok and leaving a trail of linguistic corpses in its wake is actually a bit… well… melodramatic. English (along with a few other languages: Hindi, Mandarin…) is spreading and becoming a lingua franca, but the use of language is so fluid, so difficult to predict, that we simply can’t know what social, political, and technological forces will shape linguistic trends in the future.

I guess what I’m trying to say is, it’s not all bad news. A little positive thinking and creativity will certainly pay dividends in protecting our linguistic heritage.

Golly, that sounded a little pretentious. ☺

Here are a few relevant posts of varying vintage from my del.icio.us links: