Hacklog: Blogamundo — poking holes in the language barrier since approximately 1 month from now

b
l
o
g
a
m
u
n
d
o

My Favorite Data Structure

Written by Patrick Hall, 2 years ago.
Tags: , .

The “dictionary” data structure (also called mappings or hashes or associative arrays) is the workhorse of representing relationships between pieces of data in many programming languages, including Python.

But these data structures represent a one-to-one mapping.

I’ll show you what I mean. We start with a dictionary:

(By the way, should you try to cut & paste any of this code, watch out for the newlines… I’ll post a plain text version soon. Wordpress has some rather… helpful ideas about how to fix quotes and stuff. There are also some wayward backslashes but… my will grows weak.)

>>> d = {}

And then we add a few keys and values:

>>> d['a'] = 1
	
>>> d['b'] = 2
>>> d['c'] = 3
>>> print d
{'a': 1, 'c': 3, 'b': 2}

But if we put in a new value for ‘a’:

>>> d['a'] = 10
>>> print d
{'a': 10, 'c': 3, 'b': 2}

The old value is lost.

But what if you want to keep the old value around, that is, to collect all the values that ‘a’ is set to?

That’s where we need something that might be called a “Collection.” (Thanks to Jonas for helping industrialize this). If you’re familiar with Python classes, you’ll recognize that this is a subclass of the dictionary type — that means that you can add to the collection and look things up just as if it were a normal dictionary.

class Collection(dict):
   def __init__(self, *args, **kargs):
       dict.__init__(self, *args, **kargs)
       for k, v in self.items():
        dict.__setitem__(self, k, [v])
   def __setitem__(self, item, value):
      if item in self:
        self[item].append(value)
      else:
        dict.__setitem__(self, item, [value])

The implemenation is a bit tricky, perhaps, but it’s easy enough to use…

>>> from collection import Collection
	
>>> c = Collection()
>>> c['a'] = 1
>>> c['a'] = 10
>>> c['b'] = 2
>>> c['c'] = 3

Notice that we entered 1 and 10 as the values of the key ‘a’, now:

>>> print c
{'a': [1, 10], 'c': [3], 'b': [2]}
	

So if we look up the value of the key ‘a’, we get both of them back in a list:

>>> print c['a']
[1, 10]

This is great, because it lets you pigeonhole data into categories. For instance, say you wanted to group words into lists according to their first letter:

>>> randomwords = \"about always asking authors brainymedia
brainyquote comedian december dictionary exact
history inquire paula poundstone taken their topics trivia trying”.split()
	
>>> c = Collection()
>>> for word in randomwords:
…     firstletter = list(word)[0]
…     c[firstletter] = word
	
>>> for k,v in c.items(): print k, v
a [’about’, ‘always’, ‘asking’, ‘authors’]
c [’comedian’]
b [’brainymedia’, ‘brainyquote’]
e [’exact’]
d [’december’, ‘dictionary’]
i [’inquire’]
h [’history’]
p [’paula’, ‘poundstone’]
t [’taken’, ‘their’, ‘topics’, ‘trivia’, ‘trying’]

Neat right?

Here’s a somewhat more realistic example, where we create a mapping of letters to all the words that contain those letters:

>>> letters = list(set(''.join(randomwords)))
>>> # this creates a list all the letters in our random word list
>>> print letters
['a', 'c', 'b', 'e', 'd', 'g', 'i', 'h', 'k', 'm', 'l', 'o',
‘n’, ‘q’, ‘p’, ’s’, ‘r’, ‘u’, ‘t’, ‘w’, ‘v’, ‘y’, ‘x’]
	
>>> c = Collection()
	
>>> for letter in letters:
…     for word in randomwords:
…             if letter in word:
…                     c[letter] = word
>>> print c[’a']
[’about’, ‘always’, ‘asking’, ‘authors’, ‘brainymedia’, ‘brainyquote’, ‘comedian’, ‘dictionary’, ‘exact’, ‘paula’, ‘taken’, ‘trivia’]
>>> for k,v in c.items(): print k, len(v)
a 12
c 5
b 4
e 9
d 5
g 2
i 11
h 3
k 2
m 3
l 2
o 8
n 9
q 2
p 3
s 6
r 10
u 6
t 12
w 1
v 1
y 6
x 1

There’s nothing revolutionary about any of this, it’s just that it’s convenient, and I’ve found that for whatever reason this structure is useful in linguistic contexts.

No Comments for 'My Favorite Data Structure'

No comments yet.

Leave a comment

(required)

(required)

Comment moderation may delay the posting of your comment. XHTML: You can use the following tags: <a href="" title=""> <abbr title=""> <acronym title=""> <b> <blockquote cite=""> <code> <em> <i> <img src="" alt=""> <strike> <strong> . Don't forget to close them after use.