UDHR, UDHR on the wall, who’s the x-iest language of them all?
Sometimes, if you are me, you wonder weird things, like:
What language has the highest proportion of xs?
from glob import glob
from operator import itemgetter
def xiness(text):
return float(text.count('x'))/len(text)
udhr = {}
for lang in glob('udhr/udhr_*.txt'):
udhr[lang] = open(lang).read().decode('utf-8')
xsurvey = {}
for lang, text in udhr.items():
xsurvey[lang] = xiness(text)
print sorted(xsurvey.items(), key=itemgetter(1))[-1]
Wahoo!!! The winner is Northern Qiandong Hmong!! The crowd goes wild! That’s one x-y language.
Goodnight.

who comes in second?
1. Northern Qiandong Hmong.
2. Hmong Njua, another type of Hmong.
3. Susu, a Mande language from Guinea.
4. Mazatec, Ixcatlán, a Mexican language.
5. Northern Mam, a Guatemalan language.
6. Q’eqchi’, a Mayan language from Central America.
7. Southern Qiandong Hmong, another Hmong!
8. Papantla Totonac, another Mexican language, different family.
9. Somali, ha! I knew it would be in here somewhere… Somali is very x-y.
10. Miahuatlán Zapotec… they sure do like x’s in Mexico!