Hacklog: Blogamundo — poking holes in the language barrier since approximately 1 month from now

Google’s Stemming Considered… Not So Useful.

Written by Patrick Hall, 1 year, 4 months ago.

It seems lately that Google has increased the amount of stemming that takes place on search queries. At least, that’s my anecdotal impression.

Here’s what I mean:

Try this search:

“like an index” genes

Almost all of the results that come back treat genes as the name Gene.

It’s true that I can put quotes around that term:

“like an index” “genes”

and get what I originally intended.

Personally, I don’t find that sort of second-guessing very useful. If the user bothered to type the plural, they want the plural.

One could argue that it makes sense to return plurals for singular searches, getting “genes” for “gene”. It makes much less sense to me to turn that around and return singulars for a plural search, which is what Google is doing at the moment.
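To make the complaint concrete, here is a toy sketch of how a stemmer ends up conflating “genes” with “Gene”. It is emphatically not Google’s algorithm, which isn’t public; `toy_stem` and `matches` are made-up names, and the single plural-stripping rule is an assumption chosen only to reproduce the behavior described above.

```python
def toy_stem(word):
    """Lowercase the word and strip one trailing plural 's'.

    A deliberately crude, hypothetical rule; real stemmers are far more involved.
    """
    w = word.lower()
    if len(w) > 3 and w.endswith("s") and not w.endswith("ss"):
        return w[:-1]
    return w


def matches(query_term, doc_token, quoted=False):
    """Return True if a query term matches a document token.

    Unquoted terms are compared after stemming; quoted terms are compared
    as typed (case-insensitively), which is the behavior the post relies on.
    """
    if quoted:
        return query_term.lower() == doc_token.lower()
    return toy_stem(query_term) == toy_stem(doc_token)


# Unquoted, the plural query collapses onto the name "Gene".
print(matches("genes", "Gene"))
# Quoted, only the form that was actually typed matches.
print(matches("genes", "Gene", quoted=True))
print(matches("genes", "genes", quoted=True))
```

Under this rule the unquoted search matches “Gene” and the quoted one doesn’t, which is exactly the difference between the two queries above.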

8 Comments for 'Google’s Stemming Considered… Not So Useful.'

  1. Comment received 1 year, 4 months ago from A.S.

    Yes, the stemming is extremely annoying, especially when trying to use Google for linguistic research. I live in continual fear that they might introduce stemming even inside of quotation marks…

  2. Comment received 1 year, 4 months ago from Patrick Hall

    Hi Anatol,

    I noticed today that

    site:www.library.yale.edu translation

    will return not just “translations” but also “translator” and even “translator’s”.

    Sigh.

  3. Comment received 1 year, 4 months ago from A.S.

    That is truly horrible. There seems to be neither rhyme nor reason to the morphological behavior of this function:

    translation will return the words you mention as well as the verb form translate, but strangely not translates, translated or translating;
    translate will return all the derived nouns as well as the third person translates, but not translated or translating;
    translates will return translation and translations but not translator or any of its forms, and the verb forms translate and translated, but not translating;
    translating will return the same nouns and the verb form translate but not the other verb forms;
    translated will return the same nouns and translate but not the other verb forms.

    What are they thinking?
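One way to see the lack of system is to write the observations down as a table and look for pairs where A pulls in B but B does not pull in A. The sketch below does that. The `observed` dictionary is only one reading of the list above (in particular, “all the derived nouns” and “the same nouns” are interpreted loosely), and `asymmetric_pairs` is a made-up helper name.

```python
# Query term -> other forms reported as showing up in the results.
# This is an interpretation of the comment above, not measured data.
observed = {
    "translation": {"translations", "translator", "translator's", "translate"},
    "translate": {"translation", "translations", "translator", "translates"},
    "translates": {"translation", "translations", "translate", "translated"},
    "translating": {"translation", "translations", "translate"},
    "translated": {"translation", "translations", "translate"},
}


def asymmetric_pairs(expansions):
    """Return sorted (a, b) pairs where a expands to b but b does not expand to a.

    Only pairs where both terms were actually tested as queries are considered,
    since nothing is known about the expansions of untested forms.
    """
    pairs = []
    for a, forms in expansions.items():
        for b in forms:
            if b in expansions and a not in expansions[b]:
                pairs.append((a, b))
    return sorted(pairs)


for a, b in asymmetric_pairs(observed):
    print(f"{a!r} returns {b!r}, but not the other way round")
```

A conventional stemmer maps every form to one stem, so matching would be symmetric. Any pair printed here is a hint that something other than plain stemming is going on.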

  4. Comment received 1 year, 3 months ago from Patrick Hall

    Hi again Anatol. I see from your blog that you’re a German speaker.

    Any evidence that Google does similar stuff in German?

  5. Comment received 1 year, 3 months ago from A.S.

    Hi Patrick, yes, Google does the same thing in German and no more systematically so than in English… There are too many inflectional forms for me to test them all, but here are some examples:
    - übersetzen (‘translate’, 1st/3rd present or infinitive) returns Übersetzer (‘translator.masc’) but no inflectional forms and not the female form Übersetzerin, and it doesn’t return Übersetzung (‘translation’) or its plural nor any other verb forms;
    - Übersetzung does not return anything but itself, not even the plural;
    - Übersetzer returns the infinitive/1st/3rd present übersetzen but no other verb forms nor the nouns Übersetzung or Übersetzerin;
    - Übersetzerin (‘translator.fem’) returns nothing but itself, not even the masculine form;
    - a randomly chosen verb form, übersetzt (past participle/3sg present), returns nothing but itself.

    It would be interesting to know what kind of algorithm Google uses to decide which forms to include in any given search. My guess is that it has nothing to do with linguistics (duh), but is somehow based on the frequency of these forms and perhaps their per-document co-occurrence.

  6. Comment received 1 year, 3 months ago from Patrick Hall

    I was thinking the same thing with regard to frequency, Anatol.

    It might be interesting to take a set of these terms and see if their frequencies correspond somehow to their occurrence patterns.

    I’ll push that on my mile-long todo list ;)
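For whenever that todo item comes up, a first pass might look something like the sketch below: count how many documents each form appears in, and measure per-document co-occurrence for each pair of forms with a Jaccard score. The four-document corpus is invented purely for illustration, and `doc_frequency` and `jaccard` are hypothetical helper names; whether Google uses anything like this is pure guesswork.

```python
from itertools import combinations

# Each document is reduced to the set of word forms it contains.
# Toy data, invented for illustration only.
docs = [
    {"translation", "translator"},
    {"translation", "translate"},
    {"translating"},
    {"translation", "translator", "translate"},
]


def doc_frequency(documents, term):
    """Number of documents that contain the term."""
    return sum(1 for d in documents if term in d)


def jaccard(documents, a, b):
    """Per-document co-occurrence: docs with both terms / docs with either term."""
    both = sum(1 for d in documents if a in d and b in d)
    either = sum(1 for d in documents if a in d or b in d)
    return both / either if either else 0.0


terms = ["translation", "translator", "translate", "translating"]
for term in terms:
    print(term, doc_frequency(docs, term))
for a, b in combinations(terms, 2):
    print(a, b, round(jaccard(docs, a, b), 2))
```

If the frequency guess is right, pairs with high co-occurrence in a real corpus should be the ones that get returned for each other, and rare forms like the participles should be the ones left out.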

  7. Comment received 1 year, 3 months ago from Patrick Hall

    aaargh

    aaaaaaaaargh

  8. Comment received 1 year, 3 months ago from Patrick Hall

    I’m gonna stop moaning about this right… now.

    As soon as I’ve moaned about this:

    “apartment maintenance” tipping

    I wasn’t looking for tips, I was looking for information about tipping.

    Ugh!
