Saturday, December 11, 2010

Consider the tuple...

I remember learning Python and wondering what a tuple was for. Why wouldn't you just use a list? Or a dict?

What I've come down to is a few rules of thumb:
1) If I want sortable items, use a list. (list.sort in Python is way fast!)
2) If I want to use an item as a key to find the related info (the value), put the key/value pair into a dict.
3) If I want possibly-heterogeneous-but-related items in a specific, unchangeable order, consider a tuple.
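
Here is a small sketch of the three rules side by side. The variable names and sample values are made up for illustration.

```python
# Rule 1: a list, when the items should be sortable (list.sort sorts in place).
scores = [72, 95, 88]
scores.sort()
print(scores)  # [72, 88, 95]

# Rule 2: a dict, when a key should lead to its related info (the value).
capitals = {"California": "Sacramento", "Oregon": "Salem"}
print(capitals["Oregon"])  # Salem

# Rule 3: a tuple, when related (possibly mixed-type) items have a fixed order.
meeting = ("Standup", 9, 30)  # (name, hour, minute): each position has a meaning
name, hour, minute = meeting
print(f"{name} at {hour}:{minute:02d}")  # Standup at 9:30
```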


Tuples are often overlooked as a data structure in Python, but they can be really useful for things like x,y coordinates, GPS coordinates, timestamps, or addresses. Order matters for these: you don't want to mix up the x and the y, or the hour and the minutes. You can't do much *to* tuples. They are immutable (unchangeable) and have only two methods, count and index, although you can test whether something is in one with "in". If you have a tuple that needs something changed, you'll just have to replace it with a new tuple.
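
A quick sketch of what you can and can't do with a tuple, using a made-up point:

```python
point = (3, 4)

# Membership test with `in`, plus the only two tuple methods, count and index.
print(4 in point)      # True
print(point.count(3))  # 1
print(point.index(4))  # 1

# Tuples are immutable: item assignment raises TypeError.
try:
    point[0] = 10
except TypeError as err:
    print("TypeError:", err)

# To "change" a tuple, build a new one and rebind the name to it.
point = (10,) + point[1:]
print(point)  # (10, 4)
```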

Of course, a kewl thing in Python is that you can mix and match. You can have a list of tuples, or a dict of tuples. Tuples are really useful as keys in a dict because, unlike lists, they're immutable (and therefore hashable, as long as their items are). So, for example, you could have a dict whose keys are tuples of GPS coordinates and whose values are the place names at those locations, or whose keys are addresses and whose values are house prices. You can then use "in" on each key to find all the addresses that have, say, Sunnyvale as the city, and look up the value related to each one.
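
A minimal sketch of the address example, assuming keys shaped as (street number, street, city). The addresses and prices below are hypothetical placeholders, not real data.

```python
# Hypothetical data: keys are (street number, street, city) tuples,
# values are made-up prices used only for illustration.
prices = {
    ("213", "1st Street", "Sunnyvale"): 100,
    ("42", "Main Street", "Sunnyvale"): 200,
    ("7", "Oak Avenue", "Cupertino"): 300,
}

# A tuple key can be looked up directly, because tuples are hashable.
print(prices[("42", "Main Street", "Sunnyvale")])  # 200

# Use `in` on each key tuple to find every entry located in Sunnyvale.
sunnyvale = {addr: price for addr, price in prices.items() if "Sunnyvale" in addr}
print(len(sunnyvale))  # 2
```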

Lists and dicts get all the press in Python, because they're really useful and fast. But tuples are pretty awesome in their own way...

EDIT to add examples:

In a list mylist = [2,1], if you call mylist.sort(), mylist becomes [1,2]. If 2,1 is a coordinate point, then sorting it to 1,2 is a REALLY BAD THING and could lead to really bad bugs.
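
The list case, spelled out:

```python
mylist = [2, 1]         # meant as an (x, y) point: x=2, y=1
result = mylist.sort()  # sorts in place; the method itself returns None
print(result)           # None
print(mylist)           # [1, 2]: now it reads as x=1, y=2, a different point
```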

OTOH: if mytuple = 1,2 and you call mytuple.sort(), you get an AttributeError, because tuples have no sort method; their order can't be rearranged in place. This is a GOOD THING. The same immutability is what makes a tuple hashable, so it can be used as a key in a dict and for identifying specific things.
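
A short sketch of the tuple side. Note that sorted(point) would still work, but it returns a new list and leaves the tuple untouched. The dict name and value are made up.

```python
point = (2, 1)

# Tuples have no sort method, so the order can't be rearranged by accident.
try:
    point.sort()
except AttributeError as err:
    print("AttributeError:", err)

# Immutable (and hashable, since its items are ints), so it can be a dict key.
labels = {point: "treasure"}
print(labels[(2, 1)])  # treasure

# A list is unhashable, so trying to use one as a key fails.
try:
    labels[[2, 1]] = "oops"
except TypeError as err:
    print("TypeError:", err)
```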

A list has no business being coerced to a tuple just so it can act as a key in a dict. I think that's an abuse of list. You'd be better off thinking of a tuple as a record in a database. "213 1st Street" means nothing when sorted in a list ['1st', '213', 'Street'], but means everything when left as the tuple ('213', '1st', 'Street'), as it should be. Tuples are great for data when the position *means* something, not just an "ordering".

People ask, "Why don't you just use named fields?" A phone number doesn't really need named fields for the 800 to mean something, or for it to matter that its order not be rearranged. Take the following two tuples: (408,555,1212) and (555,408,1212). If you treat them just as lists, you could end up sorting them, and they'd be "identical". But they're not identical: they're phone numbers for completely different parts of the country, and the structure is meaningful. So tuples are appropriate here, and that structure is what makes them good as keys.
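
The phone number and address examples in code. The dict values are placeholder labels.

```python
a = (408, 555, 1212)
b = (555, 408, 1212)

# Compared as tuples, position matters: these are different numbers.
print(a == b)                  # False

# Sorted like lists, the structure is lost and they look "identical".
print(sorted(a) == sorted(b))  # True

# The same happens to an address: sorting destroys its meaning.
address = ("213", "1st", "Street")
print(sorted(address))         # ['1st', '213', 'Street']

# As dict keys, the tuples stay distinct.
directory = {a: "number A", b: "number B"}
print(len(directory))          # 2
```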

6 comments:

Unknown said...

Wait, so what do you do when you need to associate a list with a value, besides coerce it into a tuple so it can be a dictionary key?

Unknown said...

I would first consider whether it should have been a tuple in the first place. Of course, there are cases where you have a list as a side effect/return value of some other operation, where you then need to turn the list into some immutable object before you use it as a key. A tuple might indeed be the easiest solution, but it may not be the most appropriate one, depending on your use case. Consider whether that immutable object should be a tuple or whether a string, or even a user-defined object, might be more suitable, based on what else you'll be doing with the object.
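
As one illustration of the "user-defined object" option, here is a sketch using a frozen dataclass from the standard library (Python 3.7+). The Address class and its field names are hypothetical, chosen to match the address example in the post.

```python
from dataclasses import dataclass, FrozenInstanceError

# frozen=True makes instances read-only; combined with the default eq=True,
# the dataclass decorator then generates __hash__, so instances can be dict keys.
@dataclass(frozen=True)
class Address:
    number: str
    street: str
    suffix: str

# Suppose some other operation handed us a list (hypothetical data).
parts = ["213", "1st", "Street"]

# Convert once, into the type the data really is, then use it as a key.
home = Address(*parts)
owners = {home: "owner A"}

# Any equal Address finds the entry; fields are named and read-only.
print(owners[Address("213", "1st", "Street")])  # owner A
print(home.street)                              # 1st
try:
    home.street = "2nd"
except FrozenInstanceError:
    print("Address is read-only")
```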

Unknown said...

s/tuple in the first place/list in the first place/

Unknown said...

So thinking about lists and tuples. Lists are really useful containers. They can be great as temporary holding places for data that came from other operations. It's not uncommon to end up with a list of strings, for example. Or a list of lists of strings. But what is the type you really want that data to be? Is it a list because you actually need a list? You want to sort in place, or do the same operation to every item in the list in order? Or is it just in a list as a temporary holder for something that properly is another type of object? If the latter is the case, then what is that other type of object?

If the data is really a string, use str.join to get your string. If you've got a list acting as a temporary container for data that properly belongs in a tuple, then make your tuple. Don't just leave it hanging around as a list and then "coerce it into a tuple" when you decide you want it to be a key in a dict. Down the road, you might find some hairy bugs that come from leaving it as a list instead of the string or tuple or whatever it properly should have been in the first place.
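
A sketch of converting early versus coercing late, assuming a hypothetical input line that gets split into a temporary list:

```python
raw = "408-555-1212"         # hypothetical input line
fields = raw.split("-")      # ['408', '555', '1212'], a temporary list

# Really a string? Join it.
digits = "".join(fields)     # '4085551212'

# Really a record? Make the tuple once, right away, and keep using it.
phone = tuple(fields)        # ('408', '555', '1212')
owners = {phone: "owner A"}

# The risk of keeping the list around and coercing it later:
fields.sort()                     # some later code sorts the list in place
print(owners.get(tuple(fields)))  # None: the coerced key no longer matches
print(owners.get(phone))          # owner A
```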

That's what I mean when I say that tuples shouldn't be treated as just frozen lists. Tuples are first class data structures in their own right, with their own reason for being.

Peter said...

I like to think of it as the difference between a pair of shoes and a two element list of shoes, or the difference between a string quartet and a four element list of musicians. For tuples, the length of the sequence is part of its type.
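
One way to see "the length is part of its type" in code: unpacking depends on the exact length, and (in Python 3.9+) a type hint such as tuple[str, str] states the length, while list[str] does not. The names below are illustrative.

```python
shoes = ("left shoe", "right shoe")  # a pair: exactly two, each slot has a role
left, right = shoes                  # unpacking relies on that fixed length

quartet = ("violin I", "violin II", "viola", "cello")
try:
    first, second, third = quartet   # wrong arity is an error, not a smaller quartet
except ValueError as err:
    print("ValueError:", err)

# tuple[str, str] says "exactly two strings"; list[str] says nothing about length.
def describe(pair: tuple[str, str]) -> str:
    a, b = pair
    return f"{a} and {b}"

print(describe(shoes))  # left shoe and right shoe
```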

Paddy3118 said...

There is nothing wrong with 'freezing' a list into a tuple.

But a tuple is more than a mere frozen list!

The contents of a list are ephemeral. What distinguishes one list from another is the list's id. You could use the id of a list as a key in a dict, but on consideration, what you usually want is the frozen state of a list at some point in time, and the easiest way to get that is usually a tuple, as we have no "frozen list".
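
A sketch of the difference between keying by id() and keying by a tuple snapshot. The names are made up. Keep in mind that an id() is only guaranteed unique while the object is alive.

```python
musicians = ["Ann", "Bob"]

# Keying by id() tracks *which* list object, not what it contains.
by_identity = {id(musicians): "our band"}

# Keying by tuple(...) freezes the contents as they are right now.
by_state = {tuple(musicians): "original lineup"}

musicians.append("Cy")  # the list changes afterwards

print(by_identity[id(musicians)])    # our band (still the same object)
print(tuple(musicians) in by_state)  # False (the contents moved on)
print(by_state[("Ann", "Bob")])      # original lineup
```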