Hacklog: Blogamundo — poking holes in the language barrier since approximately 1 month from now

Google and UPenn’s Ngrams

Written by Patrick Hall, 1 year, 9 months ago.

The Google folks have updated the blog post about releasing a veritable avalanche of ngrams, which we mentioned here a while back.

Unfortunately, unless I’m mistaken, the data is all English. That’s great if English is all you care about, but not so much for us, since it isn’t. It also costs $150, which I suppose is fair enough, considering the amount of work that must have gone into spidering all that data, converting 24 gigs of it to UTF-8 (no mean feat, that), and then filtering out everything but English.

(That last bit is where I cry.)

Anyway, here’s hoping there will be some similarly cool multilingual content somewhere down the road.
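In the meantime, the underlying idea is simple enough: a big table of word sequences and how often each one occurs. What follows is a minimal sketch of building a (much smaller) table like that from your own UTF-8 text in any language. It assumes naive whitespace tokenization. The function names `count_ngrams` and `to_tsv` are hypothetical, and the tab-separated `ngram<TAB>count` output is only an illustrative layout. It says nothing about how Google actually produced its files.

```python
from collections import Counter


def count_ngrams(text: str, n: int) -> Counter:
    """Count word n-grams in a Unicode string.

    Hypothetical helper: tokenizes by splitting on whitespace only,
    which is an assumption that fails for scripts written without spaces.
    """
    if n < 1:
        raise ValueError("n must be at least 1")
    # str.split() with no arguments splits on any Unicode whitespace.
    tokens = text.split()
    # Slide a window of n tokens across the list and join each window.
    grams = (" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1))
    return Counter(grams)


def to_tsv(counts: Counter) -> str:
    """Render counts as 'ngram<TAB>count' lines, most frequent first.

    Illustrative output layout only (an assumption, not Google's format).
    """
    return "\n".join(f"{gram}\t{c}" for gram, c in counts.most_common())


if __name__ == "__main__":
    # Read raw bytes and decode explicitly, so non-UTF-8 input fails loudly.
    raw = "le chat noir et le chat blanc".encode("utf-8")
    print(to_tsv(count_ngrams(raw.decode("utf-8"), 2)))
```

Whitespace splitting is fine for, say, French or Russian. Chinese, Japanese, or Thai text would need a proper word segmenter before the counting step.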
