Unicode Script property and JavaScript
Dear lazyweb,
I would like a JavaScript function that works like this:
magicalFunction('カ')
→ 'Katakana'
magicalFunction('a')
→ 'Latin'
magicalFunction('አ')
→ 'Ethiopic'
In other words, I want to be able to access the script property described in UAX #24: Script Names.
This actually exists already in Perl regular expressions, where you can just say \p{Katakana} in a regex to match Katakana characters.
Maybe such a thing could end up in the next version of JavaScript… not that I have the slightest idea where to make that suggestion. But in the meantime, it seems to me that there should be a unicodescripts.js or some such.
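As it happens, engines that implement ECMAScript 2018 or later accept Unicode property escapes in regular expressions with the u flag, so \p{Script=Katakana} works much like the Perl version. Here is a minimal sketch of the function wished for above, assuming such an engine. The CANDIDATES list is a hypothetical shortlist for illustration, not the full set of scripts:

```javascript
// Hypothetical shortlist of script names to try; extend it as needed.
const CANDIDATES = ['Latin', 'Katakana', 'Ethiopic'];

// Returns the first candidate script whose \p{Script=...} class matches
// the given single character, or 'Unknown' if none of them match.
function magicalFunction(ch) {
  for (const name of CANDIDATES) {
    // The 'u' flag is required for \p{...} property escapes.
    const re = new RegExp(`^\\p{Script=${name}}$`, 'u');
    if (re.test(ch)) return name;
  }
  return 'Unknown';
}
```

This tries one regex per candidate, so it is fine for a handful of scripts but not a substitute for a real lookup table.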
Any ideas on what would be an efficient programming approach to implementing such a data structure, something that might be reasonably squeezed into a .js file?
Update: Longtime reader Edward O’Connor emails to suggest xregexp:
…you should check out the unicode plugin for xregexp:
http://blog.stevenlevithan.com/archives/xregexp-unicode-plugin
This does pretty much exactly what you want.
Lazyweb, the greatest programming platform in history!
In vaguely related news, rubyistas out there should check out Edward’s talk from MerbCamp.

In ICU it’s implemented with a large C array, where the indices are the code points and the values are the codes for the scripts. But that’s not really an option for JavaScript.
Maybe the whole data could be encoded into a large binary string and then you’d have special functions to access the right bits from the string.
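A rough sketch of the packed-string idea, assuming one character of the string per code point, with that character's code used as a small script id. The names SCRIPT_NAMES, PACKED and scriptOfPacked are made up for illustration, and the table covers only a few sample ranges up to U+30FF. A real file would ship the string as a literal generated from the Unicode data rather than build it at load time:

```javascript
// Hypothetical script ids; index 0 stands for "not in the table".
const SCRIPT_NAMES = ['Unknown', 'Latin', 'Ethiopic', 'Katakana'];

// Illustrative sample ranges only: [firstCodePoint, lastCodePoint, scriptId].
const SAMPLE_RANGES = [
  [0x0041, 0x005A, 1],
  [0x0061, 0x007A, 1],
  [0x1290, 0x12B0, 2],
  [0x30A1, 0x30FA, 3],
];

// Build the packed string: the character at index cp has char code = script id.
const LIMIT = 0x3100;
const ids = new Array(LIMIT).fill(0);
for (const [first, last, id] of SAMPLE_RANGES) {
  for (let cp = first; cp <= last; cp++) ids[cp] = id;
}
const PACKED = ids.map((id) => String.fromCharCode(id)).join('');

// Constant-time lookup: index into the string, then map the id to a name.
function scriptOfPacked(ch) {
  const cp = ch.codePointAt(0);
  if (cp >= PACKED.length) return SCRIPT_NAMES[0];
  return SCRIPT_NAMES[PACKED.charCodeAt(cp)];
}
```

The cost is one character per code point covered, which is why a denser encoding (several ids per character, or run lengths) would probably be needed to cover all of Unicode.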
Or, as the data is a mapping from a range (of numbers) to a value, you could store the ranges in a tree. Lookup is slower than with the string (logarithmic in the number of ranges instead of O(1)), but it is about as memory-efficient as it gets.
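The range idea can be sketched without an explicit tree: a sorted array of non-overlapping ranges plus binary search gives the same logarithmic lookup. This is only an illustration. SCRIPT_RANGES and scriptOf are hypothetical names, and the table holds a few sample ranges rather than the full data, which would be generated from Scripts.txt in the Unicode Character Database:

```javascript
// Illustrative excerpt, sorted by first code point and non-overlapping:
// [firstCodePoint, lastCodePoint, scriptName].
const SCRIPT_RANGES = [
  [0x0041, 0x005A, 'Latin'],
  [0x0061, 0x007A, 'Latin'],
  [0x1290, 0x12B0, 'Ethiopic'],
  [0x30A1, 0x30FA, 'Katakana'],
];

// Binary search for the range containing the character's code point.
function scriptOf(ch) {
  const cp = ch.codePointAt(0);
  let lo = 0;
  let hi = SCRIPT_RANGES.length - 1;
  while (lo <= hi) {
    const mid = (lo + hi) >> 1;
    const [first, last, name] = SCRIPT_RANGES[mid];
    if (cp < first) {
      hi = mid - 1; // target lies in the lower half
    } else if (cp > last) {
      lo = mid + 1; // target lies in the upper half
    } else {
      return name; // first <= cp <= last
    }
  }
  return 'Unknown'; // code point falls in a gap between ranges
}
```

Memory use grows with the number of ranges rather than the number of code points, which is what makes this attractive for a .js file.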
Some time ago I implemented another approach in Python, a stack of dicts, which is kind of a compromise of the above solutions in terms of speed and memory. Contact me if you’re interested in the code.
Anyway, interesting problem :).
You’re looking for this.
http://www.sungnyemun.org/ScriptName.html
Thanks, folks… yeah, dda, that seems to be a workable solution.
Edward, the one you linked is great but it seems to only handle looking up blocks, not scripts, which happens to be slightly different from what I was looking for.
Dda, maybe you should get in touch with the guy who wrote the plugin Edward linked?