Languages of the Blogosphere
Dave Sifry’s recent state of the blogosphere post has gotten a fair amount of attention: his data suggest that English has lost the title of “most common language of blogs” to Japanese. He points out some important caveats on this claim, however; you can read more about those in Dave’s post, along with some further interesting comments in Ethan Zuckerman’s response.
Those caveats are enough to keep me from taking the specific ranking of languages too seriously. (But then, it’s not as if any particular ranking of the popularity of languages in the world can be taken too seriously, either.)
More importantly, I don’t think the specifics of the ranking really matter that much. If we put aside the “English apocalypse” banter that many responses have focused on, we can see the more important message here:
The blog world is a very multilingual place, and there isn’t any language in which a majority of blogs are written.
As for the specific contention that Japanese is a heavy hitter in the blog world… well… that’s not too surprising: it’s Japan! And I don’t see anything too shocking in the rest of the list. Actually, it’s quite similar in broad outline to the list of most active Wikipedias. (Although I share this blogger’s surprise [ES] that the rate of growth in the number of Spanish blogs seems to have slowed relative to that of other languages.)
However, I do have another problem with this ranking: we only get to hear about the top of the list. I’d like to see the big picture — the top 100 or so.
Are they still using the default set of languages that Maciej Ceglowski built into his initial release of Languid, or have the Technorati folks added languages to the default list of 70 or so languages? (Languid, like its predecessor TextCat, can only identify languages on which it has been trained, of course.)
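To make the "only knows what it was trained on" point concrete, here is a minimal sketch of character n-gram language identification in the general style of TextCat (ranked n-gram profiles compared with an "out-of-place" distance). It is not Languid's or Technorati's actual code. The function names, the tiny training samples, and the profile size of 300 are all assumptions chosen for illustration.

```python
from collections import Counter


def ngram_profile(text, max_n=3, top=300):
    """Map each of the most frequent character n-grams to its frequency rank."""
    # Pad word boundaries with "_" so word starts and ends count as n-grams.
    padded = "_" + "_".join(text.lower().split()) + "_"
    counts = Counter(
        padded[i:i + n]
        for n in range(1, max_n + 1)
        for i in range(len(padded) - n + 1)
    )
    # Sort by descending count, breaking ties alphabetically for determinism.
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return {gram: rank for rank, (gram, _) in enumerate(ranked[:top])}


def out_of_place(doc_profile, lang_profile):
    """Sum of rank differences; n-grams unknown to the language get a fixed penalty."""
    penalty = len(lang_profile)
    return sum(
        abs(rank - lang_profile[gram]) if gram in lang_profile else penalty
        for gram, rank in doc_profile.items()
    )


def identify(text, profiles):
    """Return the trained language whose profile is closest to the text."""
    doc = ngram_profile(text)
    return min(profiles, key=lambda lang: out_of_place(doc, profiles[lang]))


# Toy training data (hypothetical; a real identifier trains on far more text).
SAMPLES = {
    "en": "the blog world is a very multilingual place and there is no "
          "language in which most blogs are written",
    "cy": "mae hen wlad fy nhadau yn annwyl i mi gwlad beirdd a chantorion "
          "enwogion o fri",
    "es": "en un lugar de la mancha de cuyo nombre no quiero acordarme",
}
profiles = {lang: ngram_profile(text) for lang, text in SAMPLES.items()}

# Japanese was never trained, so the answer is still forced to be one of
# "en", "cy", or "es": the identifier has no way to say "none of the above".
guess = identify("日本語のブログ", profiles)
```

The design consequence is the one that matters for reading a language ranking: any blog written in a language missing from the trained set gets counted under some other language, so the list of trained languages shapes the results.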
For me the most interesting linguistic data with regard to the blogosphere isn’t in the top ten; it’s in the nascent blogging communities that are just now popping up. I watched with amazement as the Welsh blogosphere grew from just one guy into a sprawling community. There seemed to be a “critical mass” sort of phenomenon that took place there: suddenly there were too many Welsh blogs to keep in your aggregator. (Even assuming that you could read more than, oh, a paragraph a day. That’s about my rate with Welsh. ☺)
How about it, Dave, any more data to share?
3 comments.
Technorati tags: Code, cymraeg, japanese, Language and the Web, search, technorati, welsh, 日本語
As far as I can understand, it’s not “that the number of Spanish blogs seems to have decreased,” but rather that the ratio of posts written in Spanish compared to the overall number of posts is decreasing.
That makes sense to me, but I would be beyond skeptical of any conclusion that there are now fewer weblogs in Spanish than there were three months ago.
Hi David,
Yes, you’re exactly right; my wording was inaccurate. I corrected it, thanks.
[…] It’s important to note, as Patrick Hall and Ethan Zuckerman previously have, that this latest data mark a change from the days when English was by far the dominant language of the web. But just because the English slice of the pie is decreasing, does that give us reason to celebrate an increase in language diversity? […]