hacklog

An wiki-focloir gaeilge

Written by Patrick Hall, December 8th, 2008

Via the Language and Linguistics subreddit, I ran across an interesting account of running a wiki-style dictionary.

The author, Eoin Ó Conchúir, describes how he has been running an English↔Irish wiki-style dictionary at IrishDictionary.org and FocloirGaeilge.ie.

I found this bit quite interesting:

People submit everything and anything. As much as I could, I tried to write the wording on the pages to guide people into adding words into the database if they knew the translation.  Ok, let’s look at the last 15 *Irish* headwords added to the dictionary:

dúirt
doire
salus
emphasize
josh
an
unwise
cuirm
goog morning ruth
sorry
i am very well
sweater
ma chara
symphony
muintir

You see, this is a validation pain, for want of a better term. “emphasize” is not an Irish word. “josh” is not an Irish word. “goog morning ruth” not an Irish word, far from it!

Validation pain is something I can identify with. With my statistical language identifier, I’ve found that no amount of help text convinces users that it won’t work unless they give it enough text. I plan to enforce a minimum input length programmatically.
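A minimum-length rule could be as simple as the following. This is a hypothetical Python sketch, not the identifier’s actual code; the function names and the 50-character threshold are assumptions, since “enough text” isn’t pinned down anywhere.

```python
from typing import Optional

# Hypothetical threshold: "enough text" isn't defined, so 50 is a guess.
MIN_CHARS = 50


def has_enough_text(text: str, min_chars: int = MIN_CHARS) -> bool:
    """True if `text` has at least `min_chars` non-whitespace characters."""
    # Remove all whitespace so padding with spaces or newlines doesn't count.
    return len("".join(text.split())) >= min_chars


def validate_submission(text: str) -> Optional[str]:
    """Return an error message for too-short input, or None if it's long enough."""
    if not has_enough_text(text):
        return f"Please enter at least {MIN_CHARS} characters of text."
    return None
```

Counting characters rather than words keeps the check usable for languages that don’t separate words with spaces.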

But even that won’t keep bad data out: there are plenty of examples of people pasting in the same two words a few hundred times over, just so they’re satisfied that there’s “enough” text in the system.

Sigh.
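A length check alone is easy to game with that kind of padding. One hypothetical extra check is to require a minimum number of distinct words: text built by repeating two words fails no matter how long it gets. The names and the threshold of 10 are assumptions for illustration.

```python
import re

# Hypothetical threshold for how many different words real input should contain.
MIN_DISTINCT_WORDS = 10


def distinct_words(text: str) -> set[str]:
    """Case-folded set of word tokens in `text`."""
    # \w is Unicode-aware for str patterns in Python 3, so accented
    # words such as "dúirt" stay whole tokens.
    return {w.casefold() for w in re.findall(r"\w+", text)}


def looks_padded(text: str, min_distinct: int = MIN_DISTINCT_WORDS) -> bool:
    """True if `text` has too few distinct words, e.g. one phrase pasted many times."""
    return len(distinct_words(text)) < min_distinct
```

For example, `looks_padded("hello world " * 300)` is true even though the input is thousands of characters long.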

As for Eoin’s project, I wonder if it might make sense to go out on the web and acquire tons of Irish text, tokenize it, and then put those words into the database as “empty” headwords.

Then, if someone submits a word that has never been seen in that text, one could at least flag it as doubly suspicious and give it low priority for vetting.
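Here’s a rough Python sketch of that idea, assuming the web text has already been collected into a list of strings. `build_vocabulary`, `vetting_priority`, and the `min_count` cutoff (meant to skip one-off typos in the corpus) are all hypothetical names and choices, not anything the dictionary actually uses.

```python
import re
from collections import Counter
from typing import Iterable


def tokenize(text: str) -> list[str]:
    # Unicode-aware word tokens, case-folded; keeps fadas as in "dúirt".
    return [w.casefold() for w in re.findall(r"\w+", text)]


def build_vocabulary(corpus: Iterable[str], min_count: int = 2) -> set[str]:
    """Words seen at least `min_count` times across all corpus texts."""
    counts: Counter = Counter()
    for doc in corpus:
        counts.update(tokenize(doc))
    return {w for w, c in counts.items() if c >= min_count}


def vetting_priority(headword: str, vocab: set[str]) -> str:
    """'normal' if every token of the submission occurs in the corpus, else 'low'."""
    tokens = tokenize(headword)
    if tokens and all(t in vocab for t in tokens):
        return "normal"
    # Unseen (or empty) submissions are the doubly suspicious ones.
    return "low"
```

A multi-word submission such as “goog morning ruth” gets low priority if any of its tokens is unseen. Since Irish initial mutations produce forms like “fhoclóir” from “foclóir”, a genuine word can still turn up unseen, which is one reason to deprioritize rather than reject.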

Just a thought.

Anyway, cool project, Eoin!

PS, I have no idea if the title of this post is acceptable as Irish, but I couldn’t resist trying… never had a post titled in Irish, you see… ☺

2 Comments for 'An wiki-focloir gaeilge'

  1. Comment received December 8th, 2008 from Eoin

    Hi Patrick. It’s an interesting idea to have an empty list of headwords taken from lots of text. I hadn’t thought of approaching it that way.

    Good attempt with the title :) My grammar isn’t great, so I’m not sure whether the words should change as follows: “An wiki-fhoclóir Ghaeilge”.

  2. Comment received December 8th, 2008 from Patrick Hall

    Hey Eoin!

    Small world.

    Ah, I see that Irish has the same bug (er, feature) as Welsh: consonant mutations. ☺ I never got those straightened out in Welsh, either.

    Ádh mór ort (good luck) with the dictionary!!
