Hacklog: Blogamundo — poking holes in the language barrier since approximately 1 month from now

More on How to Find Book Translations

Written by Patrick Hall, July 31st, 2008

That post about how to find the translations of books turned out to address a more difficult question than I had imagined. Surely, somewhere there was a giant database, which maps translations to originals?

Well, there is, sort of.

The big discovery for me was something called the Index Translationum, which was built by UNESCO. (I was tipped off to it by the very useful www.askusnow.info, a service where you can chat with librarians.)

For instance, a search for “The Lord of the Rings” returns 390 listings, “Harry Potter” returns 442 (yes, as a matter of fact, Harry Potter has been translated into Basque), and Philip K. Dick’s The Man in the High Castle returns 24. That last number appears to be larger than the number of translations listed on www.philipkdick.com, and also larger than the 16 listed on LibraryThing (click “Work Details”). It’s very important to note, however, that LibraryThing has proper Unicode records of titles in non-Roman scripts, whereas the Index Translationum uses a crummy, ill-defined transliteration system.

Thanks to Anirvan Chatterjee of BookFinder.com for his suggestions, which included the fact that LibraryThing tracks translations.

One final suggestion I’d throw in myself is simply to check Wikipedia. If the book is famous enough to have its own article, as The Man in the High Castle is, then the language links in the left-hand sidebar often turn up several articles whose titles are probably the titles of the translated works. In this case, the links turn up Człowiek z Wysokiego Zamku, Manden i den store fæstning, Das Orakel vom Berge, El hombre en el castillo, Le Maître du Haut Château, La svastica sul sole, האיש במצודה הרמה, Mannen i det höga slottet, and 高堡奇人…
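
The Wikipedia trick can be scripted, roughly. Here is a minimal sketch in Python that asks the MediaWiki API for a page's interlanguage links (`prop=langlinks`). The function names and the User-Agent string are made up for illustration, and the parsing assumes the default JSON response shape, where each linked title sits under a `"*"` key.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen

API_URL = "https://en.wikipedia.org/w/api.php"


def langlinks_url(title, limit=500):
    """Build a MediaWiki API query URL asking for a page's interlanguage links."""
    params = {
        "action": "query",
        "prop": "langlinks",
        "titles": title,
        "lllimit": limit,
        "format": "json",
    }
    return API_URL + "?" + urlencode(params)


def parse_langlinks(response):
    """Map language code -> linked article title from a decoded API response."""
    titles = {}
    for page in response.get("query", {}).get("pages", {}).values():
        for link in page.get("langlinks", []):
            # Assumption: default JSON format, linked title under the "*" key.
            titles[link["lang"]] = link["*"]
    return titles


def fetch_langlinks(title):
    """Fetch and parse the links over the network (hypothetical helper)."""
    request = Request(
        langlinks_url(title),
        headers={"User-Agent": "langlinks-sketch/0.1"},  # made-up agent string
    )
    with urlopen(request) as reply:
        return parse_langlinks(json.load(reply))
```

Calling `fetch_langlinks("The Man in the High Castle")` should give a dictionary keyed by language code. Keep in mind that these are article titles, which are only probably the titles of the translations.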

So the bottom line is, there are a bunch of places to look. But we’re not in the age of a “translation lookup web service” or anything like that, yet.

Further suggestions welcome…

(I’d add in passing that it’s really surprising that online bookstores don’t make this sort of information available. A Brazilian customer, say, who searches for “The Man in the High Castle” might be that much more likely to buy a copy upon being informed that O homem do castelo alto exists…)

Cuil’s Unicode Support… Or lack thereof.

Written by Patrick Hall, July 28th, 2008

The blawgs are ablather in talk about a new search engine called Cuil.

I took a look, and personally I think it looks pretty nice. The layout is original, it seems quite fast, and while the results don’t seem to be as good as that, cough, other search engine, it strikes me as better than some other alternatives I’ve seen.

Except for the deal killer.

Exhibit A: Russian
Википедию - Cuil

Exhibit B: Bengali
উইকিপিডিয়া - Cuil

Exhibit C: Japanese
ウィキペディア - Cuil

Exhibit D: French
Wikipédia - Cuil
It ignores the diacritic and searches for “Wikipedia.” The first hit is http://en.wikipedia.org/wiki/Wikipedia.

Exhibit E: Chinese
維基百科 - Cuil
Huh, Chinese works. Go figure.

(I just randomly tried those languages.) Almost all of the above return “No results because of high load… Due to excessive load, our servers didn’t return results. Please try your search again.” Which is obviously not the case, because ASCII searches run okay.

In other words, Cuil pretty much doesn’t index anything but ASCII and, uh, Chinese. The 1970s called, they want their regular expression back…
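
I have no idea what Cuil's code actually looks like, so the following is only a guess at the kind of thing that produces these symptoms. It is a small Python sketch contrasting an ASCII-only word pattern with a Unicode-aware one, plus a diacritic-folding helper that behaves like Exhibit D. All the names here are invented for the example.

```python
import re
import unicodedata

ASCII_WORD = re.compile(r"[A-Za-z0-9]+")  # the "1970s" idea of a word
UNICODE_WORD = re.compile(r"\w+")         # Unicode-aware for str patterns in Python 3


def ascii_tokens(text):
    """Tokenize with the ASCII-only pattern; other scripts simply vanish."""
    return ASCII_WORD.findall(text)


def unicode_tokens(text):
    """Tokenize with the Unicode-aware pattern."""
    return UNICODE_WORD.findall(text)


def fold_diacritics(text):
    """Decompose to NFD, then drop combining marks: 'Wikipédia' -> 'Wikipedia'."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))
```

With the ASCII pattern, "Википедию" yields no tokens at all, which from the outside would look a lot like "no results." Note that `\w` is not a complete answer either: it can still split words in scripts that lean on combining marks, such as Bengali.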

I’m sure more complaints like this Twitter post complaining about lack of Vietnamese support will bubble up…

Anyway, it’s not like building a search engine is easy or something! I hope a fix for this is in the works, and good luck to Cuil!

Oh brother. Putting the word “Cuil” into this post seems to have been a total spam trap… had to turn off comments on this post. Eh, I’ll just delete them.

How do you figure out what languages a book has been translated into?

Written by Patrick Hall, July 24th, 2008

I was watching an interview of an author this morning, and by way of introduction, the interviewer said that the author’s book had been “translated into 30 languages.”

That’s a standard phrase, but I wondered which 30 languages the book had been translated into.

And then I realized I have no idea how to find out the answer to that question.

Do you?

Unicode Normalization in Ruby?

Written by Patrick Hall, July 19th, 2008

Last week I gave a little talk on Unicode at the DC Ruby Users Group. I have met some really interesting folks in that group; if you’re in the DC area and into Ruby I highly recommend it.

The talk was a high-level overview of “why Unicode matters,” more than a nitty-gritty down-to-the-bits sort of thing. In my experience the former issue is often more problematic than the latter, so that’s where I focused my attention.

Anyway, as a result of chatting with some folks I decided I would try to get a couple of very small-scale open source Ruby projects rolling. There are some READMEs scribbled at github.com, and I’m going to try to work regularly there.

I have two initial ideas:

  1. Trying to do a pure-Ruby port of Python’s unicodedata module
  2. Statistical language identification with Ruby 1.9

#1 is something I miss a lot in Ruby.
#2 is something I’ve had some success with in Python already, and I’d like to get it running in Ruby and turn it into a gem or something, as I imagine it would be of use to others.
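
For anyone who hasn't used it, here is a taste of what Python's unicodedata module gives you, and so roughly what idea #1 would need to cover. The helper names `describe` and `same_text` are mine, not part of the module; the calls to `unicodedata.name`, `unicodedata.category`, and `unicodedata.normalize` are the real thing.

```python
import unicodedata

composed = "\u00e9"     # é as a single code point
decomposed = "e\u0301"  # e followed by COMBINING ACUTE ACCENT


def describe(ch):
    """Return (code point, Unicode name, general category) for one character."""
    return (
        f"U+{ord(ch):04X}",
        unicodedata.name(ch, "<unnamed>"),
        unicodedata.category(ch),
    )


def same_text(a, b, form="NFC"):
    """Compare two strings after normalizing both to the same form."""
    return unicodedata.normalize(form, a) == unicodedata.normalize(form, b)


print(describe(composed))
print(composed == decomposed)           # raw comparison of code points
print(same_text(composed, decomposed))  # comparison after normalization
```

The two spellings of é look identical on screen but compare as different strings until they are normalized, which is the whole reason to want normalization in the first place.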

Basically, I’m interested in collaborating on Ruby stuff that intersects with language, i18n, l10n, and all the rest of the stuff I babble about around these parts. Comments welcome…
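
To make idea #2 a bit more concrete, here is a toy sketch of one common approach to statistical language identification: compare character trigram counts using cosine similarity. This is not the code I mentioned having in Python. The two training sentences are invented for the example, and anything real would need far more text per language.

```python
import math
from collections import Counter

# Toy training text, invented for this sketch; real profiles need much more data.
SAMPLES = {
    "en": "the quick brown fox jumps over the lazy dog and the cat sat on the mat",
    "es": "el rápido zorro marrón salta sobre el perro perezoso y el gato se sentó en la alfombra",
}


def trigrams(text):
    """Count overlapping character trigrams, padded so word edges count too."""
    padded = f" {text.lower()} "
    return Counter(padded[i:i + 3] for i in range(len(padded) - 2))


def cosine(a, b):
    """Cosine similarity between two trigram count vectors."""
    dot = sum(count * b[gram] for gram, count in a.items())
    norm = math.sqrt(sum(c * c for c in a.values())) * math.sqrt(
        sum(c * c for c in b.values())
    )
    return dot / norm if norm else 0.0


PROFILES = {lang: trigrams(text) for lang, text in SAMPLES.items()}


def identify(text):
    """Return the language code whose profile is most similar to the text."""
    query = trigrams(text)
    scores = {lang: cosine(query, profile) for lang, profile in PROFILES.items()}
    return max(scores, key=scores.get)
```

With these toy profiles, `identify("the dog and the cat")` picks "en" and `identify("el gato y el perro")` picks "es". Short or mixed-language input will fool it easily.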

PS: This stuff will be GPL’d. Free Software FTW.