Hacklog: Blogamundo — poking holes in the language barrier since approximately 1 month from now

Þšéüđø-lõçålîżáŧïòň? Pseudolocalization.

Written by Patrick Hall, 2 years, 1 month ago.

What a cool idea. Within a wide-ranging thread I wandered into via Found in Translation, I ended up at this post by Erik Schwiebert:

One of the ways we deal with that is a process called “pseudo-localization.” This has nothing to do with ‘pseudo-code’; instead, it is a way of forcing text into some translation automatically, yet still have that text be mostly readable. It works by taking the normal Roman alphabet and changing each of the characters into some similar character, perhaps one with an accent, or a copyright symbol instead of a C. We also pad each string with extra text to make it wider to check for dialog mis-layout and string insertions.

So “pseudo-localization” might become “[=== Þšéüđø-lõçålîżáŧïòň ===]” — still mostly humanly-readable, wider to force dialog layout, and bracketed so we can tell if a dev hardcoded string insertions. We can do this in an entirely automated fashion, and this technique lets us test perhaps 50% of Office as if it were localized, so that we can catch obvious dev mistakes right away.
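To make that description concrete, here is a minimal Python sketch of the three steps: swap letters for look-alikes, pad the string, and bracket it. The substitution table, the `pseudolocalize` name, and the `===` padding are my own assumptions for illustration, not anything taken from the Office tooling. A real tool would also have to leave format placeholders such as `%s` untouched, which this sketch does not attempt.

```python
# Hypothetical look-alike table: an assumption for illustration only.
# Letters without an entry (and all uppercase letters) pass through unchanged.
LOOKALIKES = str.maketrans({
    "a": "á", "c": "ç", "d": "đ", "e": "é", "i": "ï", "n": "ň",
    "o": "ò", "p": "Þ", "s": "š", "t": "ŧ", "u": "ü", "z": "ż",
})


def pseudolocalize(text, pad="==="):
    """Return text with look-alike letters, padded and wrapped in brackets."""
    # Step 1: replace each mapped letter with its accented look-alike.
    accented = text.translate(LOOKALIKES)
    # Steps 2 and 3: pad to widen the string, then bracket it so that
    # strings which never went through translation stand out.
    return "[{0} {1} {0}]".format(pad, accented)


print(pseudolocalize("pseudo-localization"))
```

Unlike the example in the quote, this sketch always maps a given letter to the same substitute, which keeps the table small at the cost of a little variety.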

This reminds me of Sam Ruby’s Survival Guide to Internationalization, where he uses “Iñtërnâtiônàlizætiøn” as a test phrase. In a previous post I advocated smushing any old non-ASCII text into every nook and cranny of your application.

I still think that’s worth doing, but this “pseudo-localization” tool has some advantages. For one thing, something like “Þšéüđø-lõçålîżáŧïòň” is easier to pronounce. (You can just say “pseudo-localization”!) That is useful when the developers you’re working with don’t all speak the same set of languages.

Far more important is the fact that it’s automated. That means you can use this sort of thing in unit tests, for instance.


# -*- coding: utf-8 -*-
import random
import re

# Accented look-alikes, aligned character by character with `plain`.
pseudo = "Þšéüđølõçålîżáŧïòň"
plain = "pseudolocalization"
# For letters that repeat in `plain` (o, a, i), the last pairing wins.
pseudomap = dict(zip(plain, pseudo))

# Matches words spelled only with letters that have a mapping.
sample = ''.join(sorted(set(plain)))
sampleRE = re.compile('^[' + sample + ']+$')

# Assumes a Unix-style word list at this path.
with open('/usr/share/dict/words', encoding='utf-8') as f:
    allwords = [w.strip() for w in f]

samplewords = [w for w in allwords if sampleRE.match(w)]
random.shuffle(samplewords)
afewwords = samplewords[:25]
for w in afewwords:
    print(w, ''.join(pseudomap[c] for c in w))

The excitement is unbearable!

autopilot áüŧòÞïlòŧ
canine çáňïňé
clueless çlüéléšš
colonialists çòlòňïálïšŧš
consciousnesses çòňšçïòüšňéššéš
cuddliest çüđđlïéšŧ
ipecacs ïÞéçáçš
opines òÞïňéš
outlast òüŧlášŧ
postponed ÞòšŧÞòňéđ
punctuates Þüňçŧüáŧéš
salon šálòň
saltiest šálŧïéšŧ
sappiest šáÞÞïéšŧ
snot šňòŧ
spectacles šÞéçŧáçléš
titillates ŧïŧïlláŧéš
toilette ŧòïléŧŧé
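As for the unit test idea, here is one way it might look. This is only a sketch under stated assumptions: `build_labels` stands in for whatever application code produces user-visible strings, and `find_unwrapped`, `ACCENTS`, and the bracket markers are hypothetical names I made up. The point is that if the pseudo-localizer is passed in as the translation function, any string that comes back without brackets was never sent through translation.

```python
import unittest

# Assumed look-alike table (lowercase only), for illustration.
ACCENTS = str.maketrans({
    "a": "á", "c": "ç", "e": "é", "n": "ň", "p": "Þ", "s": "š", "v": "v",
})
OPEN, CLOSE = "[=== ", " ===]"


def pseudolocalize(text):
    """Swap in look-alike letters and wrap the result in bracket markers."""
    return OPEN + text.translate(ACCENTS) + CLOSE


def build_labels(translate):
    """Hypothetical code under test: every label goes through translate()."""
    return [translate("Save"), translate("Cancel")]


def find_unwrapped(strings):
    """Return the strings that lack the markers, i.e. were never translated."""
    return [s for s in strings
            if not (s.startswith(OPEN) and s.endswith(CLOSE))]


class PseudoLocalizationTest(unittest.TestCase):
    def test_all_labels_are_translatable(self):
        labels = build_labels(pseudolocalize)
        self.assertEqual(find_unwrapped(labels), [])

    def test_hardcoded_string_is_caught(self):
        # Simulate a developer appending a string without translating it.
        labels = build_labels(pseudolocalize) + ["Help"]
        self.assertEqual(find_unwrapped(labels), ["Help"])
```

Saved to a file, this can be run with `python -m unittest` followed by the file name.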

1 Comment for 'Þšéüđø-lõçålîżáŧïòň? Pseudolocalization.'

  1. Comment received 2 years, 1 month ago from Larry

    That’s so hard to type. Couldn’t we just call it Þ17ň?
