YouTube comments, generally speaking, are pretty convincing evidence that intelligence has not yet been conclusively proven to exist in the universe.
But every once in a while, something intriguing comes up, like this rather nifty example of Unicode art found in the comments under “Jack Cafferty Tells Us How He Really Feels About Sarah Palin” (apparently individual YouTube comments don’t have permalinks, sorry):
¨¤ø„¸¨°º¤ø„¸¸„ø¤º°¨¸„ø¤º°¨
¨°º¤ø„¸Obama & Biden„ø¤º°¨
¸„ø¤º°¨ ROCKS!!!``°º¤ø„¸
ø¤º°¨ ¸„ø¤º°¨¨°º¤ø„¸¨°º¤ø
If you’re curious, here are the names of the particular characters:
¨ DIAERESIS
° DEGREE SIGN
º MASCULINE ORDINAL INDICATOR
¤ CURRENCY SIGN
ø LATIN SMALL LETTER O WITH STROKE
„ DOUBLE LOW-9 QUOTATION MARK
¸ CEDILLA
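For anyone who wants to look up names like these without a code chart, here is a minimal sketch using Python’s standard `unicodedata` module. The helper name `describe` and the sample string are just illustrations, not anything from the original comment thread.

```python
import unicodedata

# The distinct decorative characters used in the comment art above.
ART_CHARS = "¨°º¤ø„¸"

def describe(text):
    """Return (character, code point, Unicode name) for each distinct
    character in text, in order of first appearance."""
    seen = []
    for ch in text:
        if ch not in seen:
            seen.append(ch)
    # The second argument is a fallback for characters that have no name.
    return [(ch, f"U+{ord(ch):04X}", unicodedata.name(ch, "<unnamed>"))
            for ch in seen]

for ch, code_point, name in describe(ART_CHARS):
    print(ch, code_point, name)
```

Feeding it a whole line of the art instead of `ART_CHARS` works too, since repeated characters are only reported once.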
It seems to me that this little trick is just the tip of the iceberg; the possibilities for “drawing” given the number of characters in Unicode are truly terrifying (er, impressive).
Long-time pal Chris Waigl gave a talk at BarCamp London 5 about multilingualism on the web; here be the slides:
There are lots of interesting factoids in there, particularly the numerical ones. G’wan, have a look.


Here’s one I ran across clicking through Amazon’s best seller page (had never heard of it):
The Girl with the Dragon Tattoo
It’s a crime novel translated from Swedish to English. I’m no aficionado of the genre, but it would seem that its author is not the first Swedish crime writer to have made a splash in English: there are Henning Mankell and Liza Marklund, and many more.
The author is a colorful fellow named Stieg Larsson, and the translator works under the name of Reg Keeland. It turns out his real name is Steven T. Murray. In the original Swedish, the title is Män Som Hatar Kvinnor which means, apparently, “Men Who Hate Women.” I wonder where the title was changed in the pipeline; it seems that Murray himself was none too pleased with the change.
Interestingly, it would seem that there’s a fair chunk of translation that goes on in the crime fiction world. There’s even an annual prize for translation into English within the genre: the Duncan Lawrie International Dagger. Nifty.
A while back we had a discussion about some rather mysterious internationalization (or “informationization,” as it was called) efforts in China that supposedly addressed various minority languages there.
They’re baaack, this time talking exclusively about Tibetan:
White paper: International-standard Tibetan character code approved_English_Xinhua
BEIJING, Sept. 25 (Xinhua) — An international-standard Tibetan character code has been approved by the International Standards Organization, making the Tibetan script the first ethnic minority script in China with an international standard, said a white paper issued by the Information Office of the State Council on Thursday.
ISO, huh? If you say so, but I can’t find it.
Are they talking about Unicode? I hope so, but I don’t think so… hasn’t Tibetan been a part of Unicode for some time now?
And I thought US media were bewildering.
UPDATE Mondrian points out in a comment that the article is talking about Unicode. Cool. Thanks Mondrian.
Given the perilous state of most Native American languages, it might be surprising to learn just how vital the Navajo language is. There are some 170,000 speakers, and plenty of kids speaking it.
Navajo is vital enough that there is a population of speakers whose command of English is somewhat limited, and as a result translation issues arise.
Case in point: Edward R. Garrison, a biologist at Dine College in New Mexico, has been managing a project to translate a glossary about cancer for the use of health workers.
Cancer and Navajo language: No longer lost in translation - Salt Lake Tribune
News From Indian Country - Dine College on quest to rename Navajo cancer terms
Black-Spencer, a community health educator at the University of New Mexico, introduces herself by name and clan to establish a relationship and earn their trust. She speaks both Navajo and English, catering to older and younger generations.
…
And then there’s the issue of how to describe cancer. For decades, Navajos have used a word that when translated into English means, “the sore that does not heal” - lood doo na’dziihii.
It’s Black-Spencer’s biggest barrier and a description she says leads Navajos to lose any hope for survival. Officials at Dine College’s Shiprock campus want to change that.
…
In the end, Garrison hopes to make the glossary available as a guide for people like Black-Spencer, whose work takes her to Navajo communities where she presents information on cancer.
People often boil down language issues to a black/white distinction between those who “can” and those who “can’t” speak a language. But it’s not like that. There are a host of emotional and personal issues embodied in language.
This is just so cool I have to quote the whole email from the Unicode list:
On http://www.iau.org/public_press/news/release/iau0807/ , the IAU (International Astronomical Union) publishes a press release of 2008-09-17 “IAU names fifth dwarf planet Haumea”.
There, also the names of two moons of this dwarf planet are announced, the larger of them being named Hiʻiaka (after a Hawaiian goddess).
It is pleasant to see that this name is in fact spelled correctly in the recent version of that press release, including U+02BB as the correct encoding for the Hawaiian ʻokina. This even is done in the plain text file downloadable from that site, which is UTF-8 encoded.
Thus we have now a celestial body which is officially given a name which requires Unicode to be spelled correctly, rather than simply ASCII (aka ISO 646) or ISO 8859-1.
- Karl Pentzlin
(emphasis added)
If you’re wondering what the heck an ʻokina is, therein lies a story, but I’ll just refer you to Wikipedia.
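To see concretely why this name needs Unicode, here is a small sketch that tries to encode it in the encodings Karl mentions. The helper `encodable` is a name made up for this illustration; Python’s `"ascii"` and `"iso-8859-1"` codecs stand in for ISO 646 and ISO 8859-1.

```python
import unicodedata

# Hiʻiaka, with the ʻokina written as U+02BB.
NAME = "Hi\u02bbiaka"

def encodable(text, encoding):
    """Return True if every character of text exists in the given encoding."""
    try:
        text.encode(encoding)
        return True
    except UnicodeEncodeError:
        return False

print(unicodedata.name("\u02bb"))  # Unicode's formal name for the character
for encoding in ("ascii", "iso-8859-1", "utf-8"):
    print(encoding, encodable(NAME, encoding))

# The UTF-8 bytes as they would appear in a plain text file.
print(NAME.encode("utf-8"))
```

Only the UTF-8 attempt succeeds, and the ʻokina takes two bytes there.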
I vaguely remember that one of Boston’s two major papers, the Globe or the Herald, started hosting content in Japanese on its Red Sox site, in an attempt to woo the many Japanese fans of Red Sox pitcher Daisuke Matsuzaka. But somehow I never got around to posting here about it, and that was 2006.
Go, Red Sox! Go, Daisuke! Go, Daigo!
Poynter Online: Japanese Baseball Fan’s Site Hits Big Time
I’m not exactly sure what the whole story is, but it would seem (to judge by some old blog links) that at one point it was the Herald that was actually hosting content on their own domain, but the content seems to have gone stale.
As far as I’ve been able to find, a partnership between Boston.com (which is, I think, the Globe’s site) and a Japanese fan site (ボストンレッドソックス応援日記 : Go-RedSox.com) is all that survives of the Daisuke mania.
I find all of this very interesting, because it’s the only instance I can think of where a major US newspaper was actively producing and hosting non-English content for a non-US audience.
Interesting things happen on the internet…
I figured I’d start posting about interesting translation topics I run across here, rather than stick them in my delicious.com account. I’m always interested in when, where, and why translation happens…
Publisher turns a new page in translation
“Every writer wants to reach as many people as possible,” says Prof Kithaka, who teaches communications, linguistics and Kiswahili at the University of Nairobi.
“As much as I respect Kiswahili and I have no doubt the language I want to use in writings, I am also aware that there are many readers who cannot access my books due to the language barrier.”
Moue Magazine »Stanford Offers Free Online Courses in CS, Robotics
Yippee! And what’s more, there’s a course on Natural Language Processing!
…except…
Stanford School of Engineering: Artificial Intelligence | Natural Language Processing: Instructor: Christopher D Manning
Due to copyright issues, video downloads and lecture slides are not available for Natural Language Processing.
Sigh.
Just a quick followup on the topic of open multilingual dictionaries: with the help of my homey Carlos, we’re going to run a little experiment here shortly. For starters, we’ll be sticking a bunch of lexicons in the simple “newline separated” format into a directory. It’s going to be waaay lo-fi. Like this:
lexicons/
    en-pt.txt
    de-pt.txt
    de-en.txt
    ...
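Purely as a sketch of how files like these might be consumed: the post only says “newline separated,” so the per-line layout below (a source term, a tab, then a target term) is an assumption of mine, and `parse_lexicon` and `pair_from_filename` are hypothetical helpers, not the scripts mentioned here.

```python
import io

def pair_from_filename(filename):
    """Split a name like 'en-pt.txt' into ('en', 'pt')."""
    stem = filename.rsplit(".", 1)[0]
    source_lang, target_lang = stem.split("-", 1)
    return source_lang, target_lang

def parse_lexicon(lines, sep="\t"):
    """Map each source term to its list of target terms.

    ASSUMPTION: one 'source<sep>target' entry per line. The real files
    may be laid out differently.
    """
    lexicon = {}
    for raw in lines:
        line = raw.strip()
        if not line or sep not in line:
            continue  # skip blank or malformed lines
        source, target = line.split(sep, 1)
        lexicon.setdefault(source, []).append(target)
    return lexicon

# A made-up sample standing in for the contents of lexicons/en-pt.txt.
sample = io.StringIO("dog\tcão\ncat\tgato\n\ndog\tcachorro\n")
print(pair_from_filename("en-pt.txt"), parse_lexicon(sample))
```

A real file would be opened with `open(path, encoding="utf-8")` and passed in the same way, since `parse_lexicon` accepts any iterable of lines.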
We’ll also release the scripts we use to generate those files from Wikipedia dumps, so if you like you can download your favorite language and produce a lexicon to other languages.
I have plenty of thoughts about the comments in the previous thread, but I’m going to refrain from too much theorizing and concentrate on getting some data out there. Then we can all theorize about what to do with it, and whether these lexicons would serve as a sensible ground floor for more detailed and elaborate projects (proper dictionaries, collecting cross-lingual concepts, …?)
More soon…
Also, for a look at a method of building a multilingual thesaurus from Wikipedia, check out Daniel’s English summary of his thesis (and his thesis itself, if you can read German).