Hacklog: Blogamundo — poking holes in the language barrier since approximately 1 month from now

Informationization, you say?

Written by Patrick Hall, 1 year, 2 months ago.

Well, I’m pretty certain this is the first time I’ve ever linked to the People’s Daily Online, but whatever:

People’s Daily Online — Ethnic minority languages head for the direction of informationization

Work on both spoken and written ethnic minority languages, which began in 1980s, has made big headway, note officials from the State Ethic Affairs Commission.

So far, China has made coded character set, keyboard, type matrix and other national standards for the Mongol script, Tibetan language, Uygur writing, Kazak writing, Kirgiz writing, Korean writing, Yi writing and Dai writing. In the latest edition of international standard multi-language coding, the coded character set of Mongol script, Tibetan language, Uygur writing, Kazak writing, Kirgiz writing, Yi writing and Dai writing were formally accepted. Much ethnic language and writing software can be used in Windows system. And the electronic publication system, office automation (OA) system, and all forms of databases come to the fore in succession. The website and web page of ethnic language and writing are preliminary established. And there also are many good achievements in ethnic character and voice recognition, machine translation and so on.

So what I’m wondering is, what exactly are these standards? Please, somebody informed about such matters out there in Cyberia, tell me that they’re not an alternative to Unicode.

That would suck.

A lot.

PS. I think it’s State Ethnic Affairs, not “Ethic.” I.e., this thing: 中华人民共和国国家民族事务委员会. I tried Googling within that site for a couple of the Chinese names of languages mentioned in the article, like Tibetan (藏语) and Uyghur (维吾尔语), and I got some results.

Then I remembered I don’t understand Chinese.

Bueller?

10 Comments for 'Informationization, you say?'

  1. Comment received 1 year, 2 months ago from dda

    Yeah, that’s ethnic:
    中华 China
    人民 People
    共和国 Republic
    国家 National
    民族 Ethnies
    事务 Affairs
    委员会 Commission

    I’ll poke around and see what I can get. And yeah I hope – but seriously doubt – it’s Unicode-based.

  2. Comment received 1 year, 2 months ago from Sile

    China’s thrust is to appear to be supporting minority languages while subtly ensuring they wither (i.e. non-Unicode ‘standards,’ etc.).

    The Chinese people ROCK, the Chinese ‘government’ SUCKS.

    BOYCOTT THE 2008 BEIJING SHAM OLYMPICS! A lot of bad is being done in front of this event.

  3. Comment received 1 year, 2 months ago from Sile

    But I should add–don’t be discouraged yet! One of the best ways to oppose this kind of tyranny is to learn even just a little bit of one of these languages.

    Languages are POWERFUL - you can see how powerful by looking at the lengths to which a tyrannical government will go to kill them off.

    As long as these languages are on people’s hearts, minds and tongues, they cannot be killed by the corrupt Beijing clique.

  4. Comment received 1 year, 2 months ago from Patrick Hall

    Hi Sile,

    As for the importance of minority linguistic rights, you’re preaching to the choir. ☺

    As for the specific issue of encoding policy in China, can you point me to some specific information that shows what the Chinese government’s policy actually is, with regard to both Chinese itself and minority languages?

    (Pointers to Chinese content would be fine, I can get it translated.)

  5. Comment received 1 year, 2 months ago from serapio

    If they’re not talking about Unicode, they’re probably talking about the GB18030 standard. The Chinese version of the news story doesn’t make it clear either. Chances are the reporter was a bit confused too. I can’t imagine what changes they could be making to the Korean part of Unicode or GB18030. GB18030 apparently includes Tibetan, Japanese, Korean, Yi, and Thai among others, so the rest of the list in the story seems likely too. I can’t find anything indicating whether government policy about GB vs Unicode has changed since GB18030 was released, but it doesn’t seem like they could be intentionally hobbling the minority languages, since they’re still in the same standard as the Chinese characters.
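
As a rough illustration of the point above, Python's built-in `gb18030` codec can be used to check that characters from several of the scripts named in the article are encodable in GB18030, and how many bytes each one takes. The sample characters and the `SAMPLES` table are my own picks for illustration, not anything taken from the news story. A minimal sketch:

```python
# Sample characters are illustrative choices, not taken from the article.
SAMPLES = {
    "Latin (ASCII)": "A",
    "Chinese": "\u4e2d",                  # 中
    "Tibetan": "\u0f40",                  # TIBETAN LETTER KA
    "Mongolian": "\u1820",                # MONGOLIAN LETTER A
    "Uyghur (Arabic script)": "\u0626",   # ARABIC LETTER YEH WITH HAMZA ABOVE
    "Korean": "\uac00",                   # first Hangul syllable
    "Yi": "\ua000",                       # first Yi syllable
}


def gb18030_lengths(samples):
    """Return {label: number of bytes the character takes in GB18030}."""
    return {label: len(ch.encode("gb18030")) for label, ch in samples.items()}


if __name__ == "__main__":
    for label, n in gb18030_lengths(SAMPLES).items():
        print(f"{label}: {n} byte(s)")
```

ASCII stays at one byte, the common Chinese characters inherited from GBK take two, and scripts such as Tibetan land in the four-byte range.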

  6. Comment received 1 year, 2 months ago from Patrick Hall

    Thanks for the comment Serapio. Your blog is awesome, definitely going to be spending some time looking around… (e fala português? A kindred spirit ☺)

    This GB18030 stuff is all news to me; I hadn’t realized that there is a Chinese standard based on ISO 10646 (which essentially is Unicode… or something like that…).

    In any case, I received a similar explanation in an email from Andrew West with further background on the relationship of Unicode to GB18030:

    China doesn’t really recognise Unicode as such, but does recognise and
    participate in the development of the corresponding ISO standard,
    ISO/IEC 10646. The PRC encoding standard, GB-18030, is effectively an
    encoding form of 10646/Unicode, as it has a one-to-one mapping to
    10646. GB18030 thus encapsulates the entire repertoire of
    Unicode/10646.

    All of China’s efforts to support minority languages and historic
    scripts are within the framework of ISO/IEC 10646 and GB-18030. So in
    effect they are talking about Unicode, even though they don’t call it
    that.

    Hope this makes some sense.

    More info about GB 18030 - at Wikipedia.

    So yeah, good news. I interpret this all to mean that the changes that they’re talking about probably are Unicode (even though it’s under another name). Still curious to know the exact nature of the work they’ve done with regard to these specific languages, but it seems like positive news to me.

    Thanks to you both!
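
The one-to-one mapping mentioned in the email above is easiest to see for code points beyond the Basic Multilingual Plane, where GB18030 uses a purely arithmetic four-byte layout that starts at 0x90 0x30 0x81 0x30 for U+10000. The sketch below computes those bytes by hand and compares them with Python's built-in `gb18030` codec. The helper name `gb18030_supplementary` is hypothetical, and BMP characters, which go through lookup tables instead, are not covered here.

```python
def gb18030_supplementary(code_point: int) -> bytes:
    """Hand-compute the GB18030 four-byte form of a supplementary code point.

    Only valid for U+10000..U+10FFFF.
    """
    if not 0x10000 <= code_point <= 0x10FFFF:
        raise ValueError("only supplementary-plane code points are handled")
    index = code_point - 0x10000
    # Mixed-radix split: byte 4 has 10 values, byte 3 has 126, byte 2 has 10.
    index, b4 = divmod(index, 10)
    index, b3 = divmod(index, 126)
    b1, b2 = divmod(index, 10)
    return bytes([0x90 + b1, 0x30 + b2, 0x81 + b3, 0x30 + b4])


if __name__ == "__main__":
    for cp in (0x10000, 0x20000, 0x10FFFF):
        by_hand = gb18030_supplementary(cp)
        by_codec = chr(cp).encode("gb18030")
        print(f"U+{cp:04X}: {by_hand.hex()} (codec agrees: {by_hand == by_codec})")
```

Because the mapping is plain arithmetic in this range, every supplementary code point has exactly one GB18030 byte sequence and vice versa.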

  7. Comment received 1 year, 2 months ago from Andrew West

    Just to elaborate a bit on the standardization process involved, Unicode and WG2 (which is the ISO/IEC working group that is responsible for ISO/IEC 10646) work in parallel to develop two synchronised standards with the same character repertoire. But whilst Unicode is an American consortium with mostly industry membership, the membership of WG2 (actually its parent committee SC2) comprises various national bodies. There is considerable overlap in membership of the two committees (Unicode and WG2), and very close liaison between them, so in general the two committees work in harmony. China has made a commitment to tie its character encoding standards to ISO/IEC 10646, and so has to work within the framework of the international standard. To this end China is a very active member of WG2 and its Ideographic Rapporteur Group (the group responsible for coordinating the encoding of Han ideographs).

    To give you some idea of the level of Chinese involvement in this process, at the WG2 meeting in Frankfurt that I was at two weeks ago, China was represented by 11 delegates, far more than any other body, including three Tibetans, two Uyghurs and at least one of Dai nationality. Topics that were discussed at the meeting included the addition of 4,000 more CJK ideographs, Tibetan collation issues (not an issue in Vista), the encoding of the ancient Turkic/Uyghur Orkhon script (proposal by Professor Silamu of Xinjiang University), and the encoding of the Lanna script that is used by the Dai nationality in China as well as in Thailand and elsewhere.

  8. Comment received 1 year, 2 months ago from serapio

    “Your blog is awesome, definitely going to be spending some time looking around… (e fala português? A kindred spirit ☺)”

    Thanks. Posso fingir falar português, mais de verdade é só espanhol com características portugueses.

  9. Comment received 1 year, 2 months ago from Patrick Hall

    A, você fala portunhol. Eu sou falante nativo do portunhol. :P

  10. Comment received 1 year ago from Chris Fynn

    The Chinese National standard which is supposed to correspond directly (code point for code point) to ISO/IEC 10646 is GB13000. GB18030 uses a different encoding - but there is supposed to be a one to one mapping between the two character encodings.

    For some “minority” scripts like Tibetan, China has developed a national standard which eschews the “atomic” character model of ISO/IEC 10646 & Unicode and places thousands of pre-composed ligatures in the PUA.

    See: Andrew’s Precomposed Tibetan Part 1 : BrdaRten &
    Precomposed Tibetan Part 2 : Stuck in the PUA
    for a discussion.

    On my site there is a chart of Part A of the Chinese Standard for pre-composed Tibetan with the corresponding encoding for these ligatures using “standard” Unicode ISO/IEC 10646 Tibetan characters.

    - Chris
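
To make the contrast in the comment above concrete: in the Unicode/10646 model a Tibetan stack is spelled as a sequence of atomic characters, while a precomposed approach puts a single private-use code point in its place. The sketch below lists the code points of one stack (sa + subjoined ga + subjoined ra) and flags Private Use Area characters with `unicodedata`. The PUA code point U+F300 is an arbitrary placeholder chosen for this example, not a value from the Chinese standard, and the function names are hypothetical.

```python
import unicodedata


def describe(text: str):
    """Return (code point, character name) pairs; PUA characters have no name."""
    rows = []
    for ch in text:
        if unicodedata.category(ch) == "Co":  # Co = private use
            name = "<private use>"
        else:
            name = unicodedata.name(ch, "<unnamed>")
        rows.append((f"U+{ord(ch):04X}", name))
    return rows


def uses_pua(text: str) -> bool:
    """True if any character in the text is a private-use code point."""
    return any(unicodedata.category(ch) == "Co" for ch in text)


ATOMIC_STACK = "\u0f66\u0f92\u0fb2"  # sa + subjoined ga + subjoined ra
PUA_PLACEHOLDER = "\uf300"           # hypothetical stand-in for a precomposed ligature

if __name__ == "__main__":
    for label, text in (("atomic", ATOMIC_STACK), ("precomposed", PUA_PLACEHOLDER)):
        print(label, describe(text), "uses PUA:", uses_pua(text))
```

The atomic spelling is self-describing to any Unicode-aware software, whereas the private-use code point only means something to a font or application that knows the same private convention.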
