Weirdness with Greek on Twitter
Sometimes I search for translations of the word “translation” (or “translator”, etc.) on different search engines, just to see what comes up. Even if I don’t know the language in question, I can sometimes glean something interesting, even if it’s only the fact that translations between particular languages are happening.
So anyway, today I stuck the Greek word for “translation”, “Μετάφραση”, into Twitter. I got some weird results:

Many of the tweets on the results page come back as strings of question marks (“????”), but when you click through the “View Tweet” link on one of those tweets, like this one, proper Greek appears.

Any theories as to what sort of encoding issue could be going on here? And does anyone know what the most widely used encoding for Greek is? Perhaps it’s already UTF-8?

Hm, if we assume the following architecture for Twitter…
User --> Main module
     \--> Search module
          |--> Indexing module
          \--> Snippet storage module
…then it could be that encodings are resolved correctly in the main module and in the indexing module but for some reason not in the snippet storage module. Weird, but I cannot come up with a more plausible explanation for the behavior you’re describing.
This could apply particularly to automatic fixing of pathological cases, e.g. where clients submit posts in an encoding other than the one they declare (likely to occur given the variety of client apps that allow you to post something to Twitter).
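To make the hypothesis concrete, here is a minimal Python sketch of how one module could show proper Greek while another shows question marks for the same tweet. The function names and the choice of Latin-1 for the snippet store are assumptions made up for illustration; nothing here reflects Twitter's actual code.

```python
def store_main(text: str) -> bytes:
    """Hypothetical main module: keeps the tweet as UTF-8."""
    return text.encode("utf-8")


def store_snippet(text: str) -> bytes:
    """Hypothetical snippet store: forces text into Latin-1 (an assumed
    charset), replacing every character it cannot represent with '?'."""
    return text.encode("latin-1", errors="replace")


tweet = "Μετάφραση"

# The single-tweet page reads from the main store and round-trips cleanly.
main_view = store_main(tweet).decode("utf-8")

# The search results page reads from the lossy snippet store.
snippet_view = store_snippet(tweet).decode("latin-1")

print(main_view)
print(snippet_view)
```

The point of the sketch is that the damage happens at storage time: once the snippet has been written as question marks, no later decoding step can recover the Greek, which would match seeing "????" in search results and correct text on the tweet's own page.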
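The encoding hidden beneath the question marks is probably ISO-8859-7: it stores each Greek letter as a single byte in the upper half of the byte range, and runs of such bytes typically do not form valid UTF-8 sequences.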
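A quick way to see this, and a rough sketch of what the "automatic fixing" could look like: try strict UTF-8 first and fall back to ISO-8859-7 only if that fails. The helper name `decode_post` and the fallback order are assumptions for illustration, not a description of what Twitter does.

```python
import unicodedata


def decode_post(raw: bytes) -> str:
    """Hypothetical helper: prefer UTF-8, fall back to ISO-8859-7.

    This is a heuristic. A few byte values are unassigned in ISO-8859-7,
    so the fallback itself can still raise UnicodeDecodeError.
    """
    try:
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        return raw.decode("iso8859_7")


# Normalize to precomposed form so the accented letter is a single character.
word = unicodedata.normalize("NFC", "Μετάφραση")

as_greek = word.encode("iso8859_7")   # one byte per letter
as_utf8 = word.encode("utf-8")        # two bytes per Greek letter

# The ISO-8859-7 bytes are not valid UTF-8, so a strict decode fails.
try:
    as_greek.decode("utf-8")
    greek_bytes_valid_utf8 = True
except UnicodeDecodeError:
    greek_bytes_valid_utf8 = False

print(greek_bytes_valid_utf8)
print(decode_post(as_greek) == word, decode_post(as_utf8) == word)
```

Trying UTF-8 first is the safer order, because valid UTF-8 is rarely produced by accident, whereas almost any byte string decodes as "something" in a single-byte encoding.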