If you are writing an application that deals with multilingual content…
Here is a piece of advice for the gearheads. Posted at 6 am. Y’all know who you are, you’ve been where we are right now, oh bleary-eyed brethren. Here it is:
Never test an application under development with plain-ASCII content.
Just don’t bother.
Yes, you will establish that stuff is being saved where it should be, that data is flowing wherever it needs to flow, that your models and views and controllers and bits and bobs and kitchen sinks are all cooperating in Rube Goldbergian glory.
But you haven’t done anything to test whether all the pieces are cooperating with regard to encodings.
In other words, you’re kidding yourself.
Don’t say this:
“Hmm, I’ll just stick some sample text into this text entry box to make sure everything’s working… let’s see… I’ll type in abcde.”
Don’t do that.
Please don’t do that.
Do this:
“Hmm, let’s go over here and look at the funny-looking Unicode-encoded scripts on this page. What is this one, Zlatiborian? “ዩኒኮድ ምቃሩ?” “유니코드에 대해?” “Unicode คืออะไร?” “რა არის უნიკოდი?” “यूनिकोड क्या है?” What is all this stuff? I have no idea! Yeah, that sounds about right. Let’s stick all that in the text box!”
Now you’re talking. Cut and paste some of that Zlatiborian (or whatever) into your application, my friend. You don’t have to be able to understand it. You don’t have to read it. You don’t even have to recognize it.
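If you’d rather not paste by hand every time, here’s a rough sketch of the same idea as an automated check. Everything named here is made up for illustration: `SAMPLES`, `round_trip`, and `failing_samples` are hypothetical, and the in-memory SQLite table is just a stand-in for whatever your application actually saves to. Swap in your own save-and-load path.

```python
import sqlite3

# The "funny-looking" excerpts quoted above. You don't need to read them.
SAMPLES = [
    "ዩኒኮድ ምቃሩ?",
    "유니코드에 대해?",
    "Unicode คืออะไร?",
    "რა არის უნიკოდი?",
    "यूनिकोड क्या है?",
]


def round_trip(text):
    """Save text and read it back (assumption: SQLite stands in for your app)."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE posts (body TEXT)")
        conn.execute("INSERT INTO posts (body) VALUES (?)", (text,))
        (stored,) = conn.execute("SELECT body FROM posts").fetchone()
        return stored
    finally:
        conn.close()


def failing_samples(samples=SAMPLES, store=round_trip):
    """Return every sample that does not come back exactly as it went in."""
    return [s for s in samples if store(s) != s]


if __name__ == "__main__":
    for sample in failing_samples():
        print("came back mangled:", repr(sample))
```

The point of passing `store` in as a parameter is that you can aim the same samples at each layer in turn (form handler, database, template) and see which one does the mangling. Feed it “abcde” and a broken layer will sail through, which is the whole problem.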
Because you can recognize gibberish when you see it. I recently learned a neat word for the phenomenon: Mojibake. Here’s a page in Russian. Even if you’ve never read a word of Russian in your life, you’ll see where the examples of Mojibake are on that page.
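For the curious, here’s a small sketch of one common way Mojibake gets made: text is written out as UTF-8 and then read back by something that assumes a one-byte-per-character encoding (Latin-1 here). The helper names `mangle`, `repair`, and `looks_like_mojibake` are mine, and the detection is only a heuristic for this one flavor of damage, not a general-purpose detector.

```python
def mangle(text):
    """Simulate the bug: UTF-8 bytes misread as Latin-1."""
    return text.encode("utf-8").decode("latin-1")


def repair(text):
    """Undo that particular misreading (raises UnicodeError if it doesn't apply)."""
    return text.encode("latin-1").decode("utf-8")


def looks_like_mojibake(text):
    """Heuristic: does the text decode cleanly to something else when un-mangled?"""
    try:
        return repair(text) != text
    except UnicodeError:
        # Either it has characters outside Latin-1, or the bytes are not
        # valid UTF-8. In both cases it isn't this kind of Mojibake.
        return False


if __name__ == "__main__":
    print(mangle("é"))  # prints "Ã©"
    print(looks_like_mojibake(mangle("Привет")))  # prints "True"
    print(looks_like_mojibake("abcde"))  # prints "False"
```

Notice that “abcde” looks identical before and after mangling, since ASCII bytes mean the same thing in both encodings. That’s exactly why it tells you nothing as test input.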
Do it early, do it often. The Mojibake will come creeping out of the woodwork, believe me. But you’ve got to fix it eventually, right?
P.S. By the way, chances are that you didn’t have the fonts for all the excerpts of uncommon scripts I stuck in that quote up there. If you’re writing software that is going to have multilingual users, maybe you should think about investing in some fonts? Or at least, downloading some?
2 comments.
Praise be to the author of this article for giving mention to the glorious language of Mt Zlatibor! When the time of reckoning comes, and the friends of the Zlatiborians are separated from the enemies, he will surely be judged kindly!
[…] Editing WordPress posts in Performancing | Performancing.com Testing Performancing… seems pretty neat. The account setup wizard was quite easy. Ooops, I broke my own rule — I forgot to test Performancing with non-Ascii text. […]