Is there such a thing as “Linguistic Computing”?
Jon Udell has this to say about mentorship and open source:
Open source software development, as a profession, is an early adopter of a work style that can also characterize many other professions. The key aspects of that work style are transparency, accountability, network-mediated collaboration, and narration of work.
My own educational background is in linguistics, not programming. The only reason that I’ve been able to become a hacker is the dynamic that Jon describes: if you want to learn to hack, there is a “guild” waiting for you. You can rise to whatever level of expertise you work up to. It’s a society, a culture. There are tour guides. There are neighborhood hangouts. There are public libraries. And there are blogs, too many to link.
This was all a welcome discovery to me when I learned the hard way that the job opportunities for a language geek pale in comparison to those for a language geek who can program.
So, it’s no exaggeration to say that open source has offered me a career.
But some of the things I’ve had to learn were harder to learn than they should have been. I’ve read more conflicting theories about the right way to handle Unicode than I care to remember. But Unicode should be bread and butter to anyone who studies language with a computer! Unicode is an absolute fundamental. Imagine if there were no man page for bash, and you start to understand the sort of hoops a would-be language hacker has to jump through when approaching Unicode for the first time.
Programmers often scoff at such complaints, because they’ll tell you “it’s just character encoding, what’s so hard about it?” That’s because most programmers don’t remember what it’s like not to know a character encoding from a salad fork. And the way that individual writing systems are set up within Unicode has its complications as well. And this is just one subtopic.
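For anyone who has never seen what “just character encoding” looks like in practice, here is a minimal Python sketch. It is only an illustration: the sample strings (a Devanagari greeting and an accented “e”) are my own assumptions, chosen to show that the same text has different lengths depending on whether you count code points or bytes, and that two strings that look identical can still compare unequal.

```python
import unicodedata

# Sample text (an assumption for illustration): the Hindi greeting "namaste",
# written with escapes so the exact code points are unambiguous.
text = "\u0928\u092e\u0938\u094d\u0924\u0947"

# A Python string is a sequence of Unicode code points.
code_points = [f"U+{ord(ch):04X}" for ch in text]

# An encoding maps those code points to bytes; different encodings
# produce different byte sequences for the same text.
utf8_bytes = text.encode("utf-8")
utf16_bytes = text.encode("utf-16-le")

print(code_points)
print(len(text), len(utf8_bytes), len(utf16_bytes))  # 6 18 12

# Decoding bytes with the wrong encoding does not fail here;
# it silently yields garbage ("mojibake").
garbled = utf8_bytes.decode("latin-1")
print(garbled == text)  # False

# The same visible character can be stored in more than one way:
composed = "\u00e9"     # e-acute as a single code point
decomposed = "e\u0301"  # "e" followed by a combining acute accent
print(composed == decomposed)  # False

# Normalizing both to the same form (NFC here) makes them comparable.
print(unicodedata.normalize("NFC", decomposed) == composed)  # True
```

None of this is difficult once someone has shown it to you, which is rather the point.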
Unlike the “guild” that welcomed me as I tiptoed my way into general programming and hackerdom, there are no open arms to help you into the world of “linguistic computing.” There simply is no “conscious community” of people interested in thinking of the intersection of these topics as a unified discipline.
Perhaps this is all my own wishful thinking. Perhaps there is no discernible discipline surrounding the intersection of language and computing.
You tell me: is there a thread here?
- localization
- web-based language study
- language creation (!)
- computational linguistics
- natural language processing
- Unicode
- cross-language retrieval
- computer aided translation
- internationalization
- statistical language modelling
- information retrieval
- encoding and keyboarding issues
I look at that list and I see a connection: they’re all to do with “linguistic computing.” (And no, I don’t like that term either, but “language hacking” sounds even worse…)
I know that there are people who share an interest in this “thing.” And sometimes, when a programmer happens to get involved in this stuff, they become aficionados. Jonas (the real programming brains behind Blogamundo) wasn’t into this stuff at all, really, before we became friends, but now he’s a dyed-in-the-wool Unicode fanatic, and increasingly a language geek in his own right. But by and large, it’s not “discoverable” on the web.
Why isn’t there more of a sense of community among:
- linguists who want to dip their feet into programming
- programmers who want to learn more about the mysterious number crunching that goes on in stuff like chardet and SpamBayes
- language technology professionals who want to promote their ideas and their code to a wider audience, be those ideas from machine translation, information retrieval, l10n, etc.
There are web communities for environmental geeks, economics geeks, nanotech geeks, and god knows how many for politics geeks.
And I hasten to point out that there is no shortage of fantastic blogs about linguistics, language, globalization, writing systems, translation, or localization issues.
But I still feel like this “discipline” of linguistic computing is a distinct thing, which could and should serve as the basis for a vibrant community of people who help each other learn about code, language, and all the rest of this stuff, together.
Am I just being a hippy with all this talk, or does anyone agree?
8 comments.
Technorati tags: Code, Language and the Web
I definitely agree. I’m a programmer and a “linguistics hobbyist,” I guess you could call it. More and more companies (like the one I work in) are “going I18n”, so the market is growing for that skill set. I get the feeling though that most places just do enough to support Unicode and I18n/L10n then move on, which is a pity since there’s so much more you could do for your customers. But then I work for a company that produces software for newspapers and other communications media, so maybe I’m biased.
Maybe the members are out there, waiting for the community to form. Maybe we need a nucleation site to start the crystallization of the supersaturated solution of the nascent linguistic computing community (sorry, chemical engineering training rears its head) - any volunteers?
Tag, you’re it ;-)
Hi Sean,
Thanks for the comment. Good to hear from someone who works in media software; it’s a sort of “proving ground” for Unicode acceptance, for one thing. (It’s interesting to watch which online papers have adopted Unicode in India, for instance…)
I’m glad to hear that you agree. I actually used to run a site called fieldmethods.net, with the idea that it would serve as a sort of “watering hole” for this crowd, but a combination of things conspired that resulted in me letting it expire. There was some response, but the interface was pretty lousy (sort of a Slashdot-style thing).
I would love to start such a thing up again, but Blogamundo itself is keeping me plenty busy. I do try to post on these sorts of topics here on a regular basis, but aside from comments it’s not really a community thing.
I do wonder whether it might not make sense to start a “language hackers” mailing list, or something — seems like a pretty easy, low-risk thing to try.
True, the community is small, but if you look for it, you can find it.
Try:
- http://blogs.msdn.com/michkap/default.aspx
- http://www.microsoft.com/globaldev
- (newsgroup) microsoft.public.win32.programmer.international
True, it seems like a lot of MS, but go ahead, try asking about Linux or Mac and you will get some answers.
What you listed is interesting, but probably difficult to find in one place. But the links above are a good start for:
* internationalization
* Unicode
* encoding and keyboarding issues
* localization
This isn’t exactly what you’ve described, but it is a community of computational linguists, most of whom seem to be more computational than linguistic, which is a big issue in this finding-a-linguistic-computing-group.
I am interested in this stuff, too, of course, but I don’t know anything about Unicode and I’m more comfortable/qualified talking about English language linguistics than any other language’s.
Hiya Mihai,
There does seem to be some interesting stuff in that Microsoft group. I must admit that I’ve never done any Windows programming at all.
But I’m a huge fan of Michael Kaplan’s blog, and transliteration is an abiding interest of mine, so perhaps I’ll try that tool.
Thanks for the pointers!
Hi Erin,
Thanks for that, straight into the aggregator. I might have to break down & get a livejournal account… I only have about forty bazillion blog accounts already. ☺
As for Unicode, I have a post that tries to explain the motivation for why it matters so much, check it out:
Why Unicode is Better (Even if you’re not a programmer)
[…] None of these occupations is a direct application of linguistic theories, and in fact most of them receive no attention at all in linguistics programs at most universities; nevertheless, all of them are connected in one way or another to subjects that linguists understand better than most people: language, the representation of information, linguistic and non-linguistic context, and so on. Another characteristic of all these occupations is their interdisciplinary dimension, which shows in the fact that they rely on an understanding of more than one field. There is no doubt that when the time comes to look for a job, someone who studied only linguistics is in a much worse position than someone who studied, in addition to linguistics, another field such as computer science, information science, education, special education, translation, communication disorders, neuroscience, and more. As someone who learned this the hard way writes here, the employment prospects for a “language geek” who can program are far greater than for one who cannot. […]