Why doesn’t Google index Khmer and Amharic?
Note: you might want to install Khmer and Ethiopic fonts. But you can still get the idea behind this post without having them installed.
Compare:
Three searches for ភាសាខ្មែរ (“Khmer language”) on Yahoo, MSN, and Google. You can click on the images to run the searches yourself.
Google doesn’t just return zero results; it returns nothingness.
A Google blue screen of death.
And it’s not just Khmer:
Here’s a search for ዩኒኮድ (“Unicode” in Amharic, Tigrigna, and several other Ethiopic languages) on Yahoo, MSN, and Google.
Once again, Google gives us nothing for those queries. Nary a “Did you mean” or “Your search did not match any documents.”
Just zilch. Zippo. Nada. Niente.
It’s not like nobody at Google has ever heard of these languages: unlike Yahoo and MSN, Google has actually been localized into Amharic, Tigrigna, and Khmer. And they’ve all got millions of speakers.
So what gives?
Theories welcome, I have none.
Update: They don’t bother to index Burmese, either: ဗမာစာ. Yahoo does. MSN, too.
Lame, Google.







Maybe these big fellows just don’t give any credit to these people. Perhaps they assume none of them has so much as a computer at home or at the office. =(
Hi Leandro,
Thanks for visiting :)
It may be that the communities of speakers of these languages are too small to warrant Google’s indexing them, but I suspect there is some other reason. The number of speakers who use a particular Unicode block doesn’t seem to be relevant.
Google indexes plenty of writing systems that are only used for “small” languages. To wit: Հայերեն (Armenian), with 7 million speakers, and ქართული ენა (Georgian), with just 4.1 million.
Amharic alone has 26 million speakers, Tigrinya another 5.1 million; Khmer has something like 20 million.
Admittedly, most of those speakers have no access to computers… yet. But the point
Oh look! I found another missing language: ဗမာစာ … Burmese! There’s another 40 million ungoogleable speakers.
Pfeh.
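For anyone who wants to check which script each of the sample strings above belongs to, here is a rough Python sketch. It is only a heuristic: the standard library’s `unicodedata` module doesn’t expose the Unicode Script property, so this takes the first word of each character’s official name (for example `KHMER` or `ETHIOPIC`) as a stand-in. The helper name `scripts_in` and the `samples` table are mine, made up for illustration.

```python
import unicodedata


def scripts_in(text):
    """Collect the first word of each character's Unicode name.

    This approximates the script: names look like 'KHMER LETTER ...'
    or 'ETHIOPIC SYLLABLE ...'. Whitespace and unnamed characters
    are skipped.
    """
    found = set()
    for ch in text:
        if ch.isspace():
            continue
        name = unicodedata.name(ch, "")  # "" if the character has no name
        if name:
            found.add(name.split()[0])
    return found


# Sample strings copied from the post and comments (labels are illustrative).
samples = {
    "Khmer": "ភាសាខ្មែរ",
    "Ethiopic": "ዩኒኮድ",
    "Burmese": "ဗမာစာ",
    "Armenian": "Հայերեն",
    "Georgian": "ქართული ენა",
}

for label, text in samples.items():
    print(label, sorted(scripts_in(text)))
```

Note that Unicode names the Burmese script `MYANMAR`. None of this explains why a search engine would handle some of these scripts and not others; it only confirms that each string sits cleanly in a single script.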
One search seems to work: ማውጫ - ጉግል መፈለጊያ
FYI:
Google, Altavista, MSN, and Scroggle (any language) now support search in Burmese characters using a truly, fully compliant Unicode-Padauk G font.
Yahoo only supports it in web mail so far, as of May 07, 2006.
Funny thing is, Ethiopic searching was working on Google about two years ago, then it suddenly stopped working. I had email and face-to-face correspondence with several people at Google over this. As you note, they finally got it working again. Not sure what the problem was…