Andrew Channels Dexter Pinion

Wherein I write some stuff that you may like to read. Or not, it's up to you really.

October 19, 2004

Interesting Behaviour in ElementTree

I've been using the Effbot's ElementTree module to perform some XML processing. As I've mentioned before, I like it because it is the most Pythonic XML library I have found. But I've discovered an irritating feature. I'm not saying it's a bug because it could be a failure of understanding on my part. I'm hoping that Fredrik will read this and let me know what I'm doing wrong.

Allow me to explain. I have a simple XML file of the form:

<?xml version="1.0" encoding="UTF-8"?>
<wibble>
  <wobble id="1">
    <child1>
      Child value 1
    </child1>
    <child2>
      Child value 2
    </child2>
  </wobble>

  <wobble id="2">
    <child1>
      Child value one
    </child1>
  </wobble>
</wibble>

All I wish to do is iterate through the "wobble" elements and, if they have a child2 element, print out its contents. Looking at each element individually, it's fine:

>>> from elementtree import ElementTree
>>> tree = ElementTree.parse("wibble.xml")
>>> root = tree.getroot()
>>> wobbles = root.getchildren()
>>> wobble = wobbles[0]
>>> child2 = wobble.find("child2")
>>> if child2:
...      print child2.text.strip()
...
Child value 2
>>> wobble = wobbles[1]
>>> child2 = wobble.find("child2")
>>> if child2:
...      print child2.text.strip()
...
>>>

But if we iterate through them, the find operation doesn't seem to produce any results:

>>> from elementtree import ElementTree
>>> tree = ElementTree.parse("wibble.xml")
>>> root = tree.getroot()
>>> for wobble in root.getchildren():
...      child2 = wobble.find("child2")
...      if child2:
...          print child2.text.strip()
...
>>>

I would expect to get some output from the second snippet of Python code, specifically the string "Child value 2", so seeing nothing is a bit of a surprise. I'm hoping that it is either a bug in ElementTree or something simple that I have misunderstood. Please enlighten me in the comments.

Posted by Andy Todd at October 19, 2004 03:18 AM

Comments

I've never found getchildren to work as expected. The XPath support in ElementTree seems to be easier to use, or at least it works as expected.

from elementtree import ElementTree
tree = ElementTree.parse("wibble.xml")
root = tree.getroot()
for wobble in root.findall("*/child1"):
  print wobble.text.strip()
for wobble in root.findall("*/child2"):
  print wobble.text.strip()

Posted by: Brian Lenihan on October 19, 2004 04:42 AM

Thanks Brian, that works. But sadly it loses the link between child2 and its parent.

Just to make life more difficult in my *actual* code I want to output an attribute of the 'wobble' element as well as the contents of any 'child2' elements if they are present.

Posted by: Andy Todd on October 19, 2004 05:11 AM
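Later releases of ElementTree (version 1.3, and the standard library's `xml.etree.ElementTree` from Python 2.7 on) added a `[tag]` predicate to the XPath subset. This predicate keeps the link to the parent: `findall("wobble[child2]")` returns only the `wobble` elements that have a `child2` child. The sketch below assumes Python 3. The 1.2-era `elementtree` package used in this post predates the predicate, and `wobbles_with_child2` is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET

# The post's sample document, condensed and inlined so the sketch runs on its own.
WIBBLE_XML = """<wibble>
  <wobble id="1">
    <child1>Child value 1</child1>
    <child2>Child value 2</child2>
  </wobble>
  <wobble id="2">
    <child1>Child value one</child1>
  </wobble>
</wibble>"""


def wobbles_with_child2(root):
    """Return (id, child2 text) for each wobble that has a child2 child.

    The "wobble[child2]" predicate needs ElementTree 1.3 or later
    (Python 2.7+ / 3.x); it is not available in the 2004 release.
    """
    results = []
    for wobble in root.findall("wobble[child2]"):
        # The predicate guarantees that find() returns an element here.
        child2 = wobble.find("child2")
        results.append((wobble.get("id"), (child2.text or "").strip()))
    return results


root = ET.fromstring(WIBBLE_XML)
for wobble_id, text in wobbles_with_child2(root):
    print(wobble_id, text)  # prints: 1 Child value 2
```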

Ah, in that case, listing three in an old article by Uche Ogbuji should help:

http://www.xml.com/pub/a/2003/02/12/py-xml.html

I ended up doing something that seemed like it was easier, but I can't find the code.

Posted by: Brian Lenihan on October 19, 2004 06:25 AM

The one problem with that example is that it uses getchildren().

As you correctly surmised in your original comment this is the cause of my pain ;-)

I'm fairly convinced that there is a bug or implementation problem in that method and here's hoping that the effbot can fix it.

In the meantime I may give libxml2 a punt and put up with some nasty DOM manipulation. Although I suspect I'll have to write some wrappers that will end up poorly imitating ElementTree's interface.

Posted by: Andy Todd on October 19, 2004 07:22 AM

Note that elements are sequences (containing subelements), and empty sequences evaluate to false. In your second example, child2 is an element, but it has no subelements, so "if child2" evaluates to false. Use "if child2 is not None" instead.

See the sequence part on this page for some more discussion:

http://effbot.org/zone/element.htm#the-element-type

(also note that you don't have to use the getchildren method; just loop over the element itself)

Posted by: Fredrik on October 19, 2004 08:12 AM
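The sketch below combines Fredrik's two suggestions into Andy's real requirement: print each wobble's `id` plus any `child2` text. It assumes Python 3's standard-library `xml.etree.ElementTree`, the later home of the `elementtree` package. There, `getchildren()` was removed in Python 3.9, so looping over the element itself is the way to go.

The sketch never uses `if child2:`. Instead, `is not None` checks whether `find()` matched, and `len()` shows that the matched `child2` has zero subelements, which is why the truth test failed. `report_wobbles` is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET

# The post's sample document, condensed and inlined so the sketch runs on its own.
WIBBLE_XML = """<wibble>
  <wobble id="1">
    <child1>Child value 1</child1>
    <child2>Child value 2</child2>
  </wobble>
  <wobble id="2">
    <child1>Child value one</child1>
  </wobble>
</wibble>"""


def report_wobbles(root):
    """Return one line per wobble: its id plus any child2 text."""
    lines = []
    for wobble in root:  # loop over the element itself, no getchildren()
        wobble_id = wobble.get("id")
        child2 = wobble.find("child2")
        if child2 is not None:  # find() matched, even though len(child2) == 0
            text = (child2.text or "").strip()
            lines.append(f"{wobble_id}: {text} ({len(child2)} subelements)")
        else:
            lines.append(f"{wobble_id}: no child2")
    return lines


for line in report_wobbles(ET.fromstring(WIBBLE_XML)):
    print(line)
```

On the sample document this prints `1: Child value 2 (0 subelements)` followed by `2: no child2`.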

I don't disagree that Fred's ElementTree has a lot of nice things in it. But I *still* strenuously argue that my gnosis.xml.objectify is significantly more "Pythonic" as an XML binding. In general, the difference is that gnosis.xml.objectify treats all the data as regular attributes of an object, whereas ElementTree tends to require method calls to get at the data.

E.g.:

>>> from gnosis.xml.objectify import make_instance, children
>>> root = make_instance('wibble.xml')
>>> wobbles = root.wobble
>>> wobble = wobbles[0]
>>> child2 = wobble.child2
>>> print child2.PCDATA.strip()
Child value 2

Or for the second example:

>>> for wobble in children(root):
...     if hasattr(wobble, "child2"):
...         print wobble.child2[0].PCDATA.strip()

Posted by: David Mertz on October 19, 2004 05:14 PM

I immediately guessed your problem. I've only been bitten once by forgetting that a node with no children evaluates to False. It was a nasty bite (I thought I'd been bitten by a bug, too), but this behaviour is documented, and the ElementTree documentation is clear and not excessive. Unlike a lot of XML libraries, the ElementTree interface is the right size and form for me to deal with and easy to describe to others.

I'm okay with the possibility that this empty-list behavior may change with 1.3, because ElementTree is evolving quickly, in ways that have always been incrementally better for my needs. That far outweighs this one nasty test.

I also get along with Expat just fine, so I like the Elementtree / Expat combination.

Posted by: Brian Mahoney on October 19, 2004 06:46 PM
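A node with no children evaluating to False is ordinary Python truth testing. If a class defines `__len__` but not `__bool__`, its instances are false whenever their length is zero. The toy `Node` class below is hypothetical and only mimics that one aspect of an element. It is not ElementTree's implementation.

```python
class Node:
    """A toy container whose truthiness comes only from __len__."""

    def __init__(self, *children):
        self.children = list(children)

    def __len__(self):
        return len(self.children)


leaf = Node()          # exists, but has no children
parent = Node(leaf)    # has one child

print(bool(leaf))          # False: len(leaf) == 0
print(bool(parent))        # True: len(parent) == 1
print(leaf is not None)    # True: the identity test ignores length
```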

ElementTree also writes XML, and I believe gnosis.xml.objectify does not.

ElementTree is a nice compromise between a true XML DOM and Pythonic data structures.

It looks like the OP just got bitten by Python treating len() == 0 as boolean false.

Posted by: Manuel on October 19, 2004 09:28 PM
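Here is a minimal sketch of the writing side, assuming Python 3's `xml.etree.ElementTree`. `Element` and `SubElement` build the tree, and `tostring(..., encoding="unicode")` serializes it to a `str` without an XML declaration. `build_wibble` is a hypothetical helper that recreates part of the post's sample document.

```python
import xml.etree.ElementTree as ET


def build_wibble():
    """Build <wibble><wobble id="1"><child2>...</child2></wobble></wibble>."""
    root = ET.Element("wibble")
    wobble = ET.SubElement(root, "wobble", id="1")  # keyword becomes an attribute
    child2 = ET.SubElement(wobble, "child2")
    child2.text = "Child value 2"
    return root


xml_text = ET.tostring(build_wibble(), encoding="unicode")
print(xml_text)
# prints: <wibble><wobble id="1"><child2>Child value 2</child2></wobble></wibble>
```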

Fredrik's comment reminded me where to look for my code snippet (bad filing system). This is what it would look like, modified for your wibbles.

from elementtree import ElementTree
tree = ElementTree.parse("wibble.xml")
root = tree.getroot()
for elem in root:
  print elem.tag, elem.items()
  for e in elem:
    print e.tag, repr(e.text.strip())

The Gnosis tools do have their nice points. Fredrik's code is almost always extremely elegant even though his docs can be skimpy or even completely missing.

The real problem is that there are many, many ways to work with XML in Python, and none of them are really as Pythonic as I would like. To me, ElementTree comes the closest, but it is not the "batteries included" holy grail we have all come to expect.

Posted by: Brian Lenihan on October 20, 2004 05:23 AM
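Brian's loop carries over almost unchanged to `xml.etree.ElementTree`, which joined the standard library in Python 2.5. The sketch below assumes Python 3 and collects the same tag/attribute and tag/text pairs in a list instead of printing them. `walk` is a hypothetical name. The `list()` around `items()` keeps the result a plain list whichever ElementTree implementation is in use.

```python
import xml.etree.ElementTree as ET

# The post's sample document, condensed and inlined so the sketch runs on its own.
WIBBLE_XML = """<wibble>
  <wobble id="1">
    <child1>Child value 1</child1>
    <child2>Child value 2</child2>
  </wobble>
  <wobble id="2">
    <child1>Child value one</child1>
  </wobble>
</wibble>"""


def walk(root):
    """Collect Brian's two levels of output as data instead of printing it."""
    rows = []
    for elem in root:  # each wobble: its tag and (name, value) attribute pairs
        rows.append((elem.tag, list(elem.items())))
        for e in elem:  # each child of the wobble: its tag and stripped text
            rows.append((e.tag, (e.text or "").strip()))
    return rows


for row in walk(ET.fromstring(WIBBLE_XML)):
    print(*row)
```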