Machine translation and Open Source
InformationWeek blogger Serdar Yegulalp has some thoughts on the intersection of machine translation and open source:
Talk To Me, Openly - Open Source Blog - InformationWeek
He opens with an anecdote about how he taught himself Japanese, which doubles as a nice introduction to the idea behind bitext and statistical machine translation:
..Since I didn’t have money for classes, I homebrewed my own self-teaching method. I went out and bought a grammar guide, and then two copies of a given book — one in Japanese, the other an English translation — and sat with them side-by-side, comparing the two on a sentence-by-sentence and phrase-by-phrase level. It worked, up to a point, and while I’m no native speaker I can certainly figure out a fair amount of what’s put in front of me as long as I have a dictionary.
I didn’t know it at the time, but this parallel-texts technique is actually one of the best ways to also teach a computer to perform translations between languages.
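To make the parallel-texts idea concrete, here is a minimal sketch of how a program can pick up word translations from nothing but aligned sentence pairs. It uses IBM Model 1, a classic statistical MT word-alignment model; that choice is mine, not something the quoted post describes. The three-sentence English/German corpus and the names `train_ibm1`, `best_translation`, and `toy_bitext` are made up for illustration.

```python
from collections import defaultdict


def train_ibm1(bitext, iterations=10):
    """Estimate t(target_word | source_word) with EM.

    A simplified IBM Model 1 (no NULL word). `bitext` is a list of
    (source_tokens, target_tokens) sentence pairs.
    """
    src_vocab = {w for src, _ in bitext for w in src}
    tgt_vocab = {w for _, tgt in bitext for w in tgt}

    # Start with no knowledge: every target word is equally likely.
    t = {(f, e): 1.0 / len(tgt_vocab) for e in src_vocab for f in tgt_vocab}

    for _ in range(iterations):
        count = defaultdict(float)  # expected co-translation counts
        total = defaultdict(float)  # expected counts per source word

        # E-step: spread each target word's "credit" over the source
        # words in the same sentence pair, in proportion to current t.
        for src, tgt in bitext:
            for f in tgt:
                norm = sum(t[(f, e)] for e in src)
                for e in src:
                    frac = t[(f, e)] / norm
                    count[(f, e)] += frac
                    total[e] += frac

        # M-step: re-estimate the probabilities from the expected counts.
        for (f, e) in t:
            t[(f, e)] = count[(f, e)] / total[e] if total[e] else 0.0

    return t


def best_translation(t, source_word):
    """Return the target word with the highest t(target | source_word)."""
    candidates = {f: p for (f, e), p in t.items() if e == source_word}
    return max(candidates, key=candidates.get)


# Toy, invented bitext: English source, German target.
toy_bitext = [
    ("the house".split(), "das Haus".split()),
    ("the book".split(), "das Buch".split()),
    ("a book".split(), "ein Buch".split()),
]

table = train_ibm1(toy_bitext)
for word in ["the", "house", "book", "a"]:
    print(word, "->", best_translation(table, word))
```

Nothing in the code knows any German or English. It works the same way the side-by-side reading does: a word that keeps showing up opposite the same counterpart across different sentence pairs gradually gets matched to it. Real systems need far more data than this, which is why the licensing of that data matters.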
He also has some thoughts on the licensing issues surrounding the data used to build MT systems, a topic I don’t think has gotten enough attention.
(Please consider this an open thread for your thoughts on how MT and FOSS can and should interact.)