Andrew Channels Dexter Pinion

Wherein I write some stuff that you may like to read. Or not, it's up to you really.

June 17, 2003

Iterators and Databases

Inspired by David Mertz' latest Charming Python article, I've been playing with generators.

One area where they look very applicable to me is fetching large result sets from a database. Until now, I've always executed the query and then looped with for row in cursor.fetchall() to get and process all of the results.

In practice this means that all of the rows are read into memory before they are processed. If your query returns a lot of rows this can use an awful lot of memory.

So, until all of the Python database modules are updated to use generators, here is a simple recipe to use:

def genResults(db, sql):
    """Yield rows from sql one at a time instead of fetching them all at once."""
    cursor = db.cursor()
    cursor.execute(sql)
    row = cursor.fetchone()
    while row:
        yield row
        row = cursor.fetchone()

So in your code you can just write for x in genResults(db, "SELECT * FROM table") and carry on as normal.

Beware, though, of those database modules which only allow one cursor per connection (in which case pass in a cursor rather than a connection), but otherwise it's a thing of beauty.
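
For such modules, a variant that takes an existing cursor rather than a connection might look something like this (just a sketch; genResultsFromCursor is a made-up name):

def genResultsFromCursor(cursor, sql):
    """Like genResults, but reuses a cursor you already hold (illustrative only)."""
    cursor.execute(sql)
    row = cursor.fetchone()
    while row:
        yield row
        row = cursor.fetchone()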

Posted by Andy Todd at June 17, 2003 01:14 PM

Comments

It's considered bad form by some to duplicate code both before and inside a loop. The preferred Python way to do this (and I think it looks weird myself) is:

...
while 1:
    row = cursor.fetchone()
    if row is None: break # Explicitly break
    yield row

Posted by: Adam Vandenberg on June 17, 2003 07:20 PM

I do this myself, but with recent Python versions I prefer:

while True:

Posted by: Simon on June 18, 2003 09:17 AM

Note that for better performance you may want to use fetchmany and then have a second, nested loop to iterate through those sub-results.
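
For example, the genResults recipe above could batch its fetches along these lines (just a sketch; the batch size of 100 is arbitrary):

def genResultsBatched(db, sql, batchsize=100):
    # fetch rows from the database in batches, but still yield them one at a time
    cursor = db.cursor()
    cursor.execute(sql)
    while True:
        rows = cursor.fetchmany(batchsize)
        if not rows:  # fetchmany returns an empty sequence when no rows are left
            break
        for row in rows:
            yield row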

I had to use that with ADO, as the performance was quite abysmal when doing single fetches. I made a simple recordset reader iterator like this:


def rs_reader(rs):
    while not rs.EOF:
        # GetRows returns the data column-wise, so zip(*cols) turns it back into rows
        cols = rs.GetRows(100)
        for row in zip(*cols):
            yield row

Posted by: Holger on June 19, 2003 10:21 AM

Several database modules already implement iterator-based fetching this way:
----------------------
cursor = connection.cursor()
cursor.execute("select ...")
for row in cursor:
    ...
----------------------

Posted by: Dave on June 20, 2003 04:10 AM

You don't need generators for any of this; you can also do
-------------
cursor = connection.cursor()
cursor.execute("select ...")
for row in iter(cursor.fetchone, None):
    ...
-------------
If you have Python 2.3, you can implement the more efficient batched fetching version as
-------------
def make_fetcher(cursor, batchsize):
    def fetcher():
        return cursor.fetchmany(batchsize)
    return fetcher

from itertools import chain
...
# fetchmany returns an empty sequence when the rows run out, so use [] as the
# sentinel; chain.from_iterable then flattens the batches back into single rows
rows = chain.from_iterable(iter(make_fetcher(cursor, 1000), []))
for row in rows:
    ...
-------------
For both of these it should be obvious how to wrap them up in a nice reusable function.
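
For instance, the second one might wrap up roughly like this (just a sketch; query_rows is a made-up name):
-------------
from itertools import chain

def query_rows(cursor, sql, batchsize=1000):
    # run the query and return a lazy iterator over the individual rows
    cursor.execute(sql)
    return chain.from_iterable(iter(lambda: cursor.fetchmany(batchsize), []))
-------------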

The second one would be a true thing of beauty (or possibly a monster) if Python had proper currying support -- just type chain.from_iterable(iter(curry(cursor.fetchmany, 1000), [])), and then think up some profound statement comparing Python to Perl. Or don't.

The other cute trick I noticed the other day when pulling out a bunch of key-value pairs from a database was that I could type
return dict(iter(cursor.fetchone, None))
and have everything magically work...


Posted by: Nathaniel Smith on June 21, 2003 03:08 AM