Hacklog: Blogamundo — poking holes in the language barrier since approximately 1 month from now

Unicode headed toward World Domination™

Written by Patrick Hall, 1 week, 3 days ago.

The Google Blog has a chart showing that there is a very clear trend toward Unicode adoption.

Apparently their numbers refer to UTF-8 alone (as opposed to UTF-16/UCS-2 or (haha) UTF-32/UCS-4), which again is good news. (Though one wonders if there is any uptake of UTF-16 on the web… I hope not.)

The data is “Google internal”… peer-reviewed, it ain’t.

Thanks to Won for the pointer!

6 Comments for 'Unicode headed toward World Domination™'

  1. Comment received 1 week, 3 days ago from Robin

    Yay, long live Unicode! :)

  2. Comment received 1 week, 3 days ago from Patrick Hall

    Hear, hear!

    And many thanks for your comment, Robin

  3. Comment received 1 week, 3 days ago from ke

    What’s wrong with UTF-16 and other Unicode encodings?

  4. Comment received 1 week, 3 days ago from Patrick Hall

    There’s nothing inherently wrong with UTF-16 or any other transformation format of Unicode.

    But I think the web is heading toward standardization on UTF-8, because it’s byte-for-byte backwards-compatible with ASCII, Unicode’s first 256 code points match Latin-1 (though not the annoying CP1252 gremlins), and it has the widest support in applications.

    Because of the way it’s defined, UTF-8 is, in a way, self-validating. For a file to be processed as UTF-8, it pretty much has to be UTF-8. (As any Python programmer familiar with the notorious UnicodeDecodeError can attest.) Asking to decode a file as UTF-16 does hardly any such “validation”, because almost any even-length sequence of bytes is valid UTF-16; only unpaired surrogates get rejected.

    (It also takes up more memory, but that hardly matters these days… it’s just text.)

  5. Comment received 1 week, 2 days ago from Christoph

    It also takes up more memory

    <nit-picky>Not for scripts that sit mainly above the 7-bit border.</nit-picky>

  6. Comment received 5 days, 14 hours ago from ke

    That’s not nit-picky, that’s an important point. The “UTF-8 only” attitude strikes me as latin-centric.
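
For anyone who wants to poke at the two technical points in the thread, here is a minimal Python sketch, not anything from the original post. It checks whether some bytes decode under a given encoding (the UnicodeDecodeError behaviour from comment 4) and compares encoded sizes (the memory point from comments 5 and 6). The helper names `decodes_as` and `encoded_sizes` and the sample strings are made up for illustration.

```python
def decodes_as(data: bytes, encoding: str) -> bool:
    """Return True if `data` decodes under `encoding` in strict mode."""
    try:
        data.decode(encoding)
        return True
    except UnicodeDecodeError:
        return False


def encoded_sizes(text: str) -> dict:
    """Byte length of `text` in UTF-8 and in UTF-16 (little-endian, no BOM)."""
    return {enc: len(text.encode(enc)) for enc in ("utf-8", "utf-16-le")}


# "café" saved as Latin-1 is four bytes: 63 61 66 E9.
latin1_bytes = "café".encode("latin-1")

# UTF-8 refuses it: 0xE9 announces a multi-byte sequence that never arrives.
print(decodes_as(latin1_bytes, "utf-8"))

# UTF-16 happily reads the same four bytes as two (meaningless) code units.
print(decodes_as(latin1_bytes, "utf-16-le"))

# UTF-16 does reject one thing: a surrogate without its partner.
lone_surrogate = b"\x00\xd8\x41\x00"  # U+D800 followed by "A"
print(decodes_as(lone_surrogate, "utf-16-le"))

# Size comparison: ASCII, Cyrillic, and Japanese samples.
for sample in ("hello", "привет", "こんにちは"):
    print(sample, encoded_sizes(sample))
```

In this sketch the ASCII sample is half the size in UTF-8, the Cyrillic one comes out the same in both, and the Japanese one is smaller in UTF-16, so which encoding "takes up more memory" depends on the script.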
