Andrew Channels Dexter Pinion

Wherein I write some stuff that you may like to read. Or not, it's up to you really.

November 26, 2004

Validating Relax NG with libxml2 and Python

Today's chore: validate that an XML document adheres to a schema defined in Relax NG, using the libxml2 toolkit and its Python bindings. Give this a try:

import libxml2

def fileRead(filename, mode):
    # Read and return the whole file; the with block closes it for us
    with open(filename, mode) as myFile:
        return myFile.read()

def isValid(schemaFileName, instanceFileName):
    success = False
    schema = fileRead(schemaFileName, 'r')
    instance = fileRead(instanceFileName, 'r')
    # Compile the Relax NG schema (XML syntax) from the string in memory
    rngParser = libxml2.relaxNGNewMemParserCtxt(schema, len(schema))
    rngSchema = rngParser.relaxNGParse()
    ctxt = rngSchema.relaxNGNewValidCtxt()
    # Parse the instance; relaxNGValidateDoc returns 0 when it is valid
    doc = libxml2.parseDoc(instance)
    ret = doc.relaxNGValidateDoc(ctxt)
    if ret == 0:
        success = True
    # Validation completed, let's clean up
    doc.freeDoc()
    del rngParser, rngSchema, ctxt
    libxml2.relaxNGCleanupTypes()
    libxml2.cleanupParser()
    # Report leaks; the count is only meaningful if libxml2.debugMemory(1)
    # was also called before any parsing started
    if libxml2.debugMemory(1) != 0:
        print("Memory leaked %d bytes" % libxml2.debugMemory(1))
        libxml2.dumpMemory()
    return success

As usual, I'm liberally borrowing from the work of others. I defined my schema file after a quick skim through the Relax NG tutorial. The libxml2 documentation was worse than useless, but Google brought AMK and Dave Kuhlman to my aid.

There is a remarkable similarity between their code and mine; the working parts are theirs and the bugs are all mine. I'm posting my snippet because it is the simplest way I could find to validate my XML document against my schema, and that's quite useful to me.
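If you want a schema and a document to try isValid on, here is a minimal sketch that writes a pair modelled on the address-book example at the start of the Relax NG tutorial. The file names and the write_example helper are hypothetical. The script only checks that both files are well-formed XML using the standard library's ElementTree; it does not perform any Relax NG validation itself.

```python
import xml.etree.ElementTree as ET

# Minimal Relax NG schema (XML syntax), modelled on the tutorial's address book
SCHEMA = """<element name="addressBook" xmlns="http://relaxng.org/ns/structure/1.0">
  <zeroOrMore>
    <element name="card">
      <element name="name"><text/></element>
      <element name="email"><text/></element>
    </element>
  </zeroOrMore>
</element>
"""

# An instance document intended to satisfy the schema above
INSTANCE = """<addressBook>
  <card>
    <name>John Smith</name>
    <email>js@example.com</email>
  </card>
</addressBook>
"""

def write_example(schema_path="addressBook.rng", instance_path="addressBook.xml"):
    """Write the sample schema and instance (hypothetical names); return their paths."""
    for path, text in ((schema_path, SCHEMA), (instance_path, INSTANCE)):
        with open(path, "w") as f:
            f.write(text)
        ET.parse(path)  # raises ParseError if the file is not well-formed XML
    return schema_path, instance_path
```

Passing the two returned paths to isValid, for example isValid(*write_example()), should then exercise the full libxml2 validation path. Deleting the email element from the instance gives a handy non-valid case, because the schema requires every card to contain a name followed by an email.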

Posted by Andy Todd at November 26, 2004 08:50 AM

Comments

I don't suppose you've had any luck validating XML documents against RelaxNG compact schema using Python? I tried a while ago and ended up having to shell out to jing.

Posted by: Simon Willison on November 27, 2004 02:44 AM

That's my next step, Simon. But AMK's page that I referenced above uses jing as well. As far as I can tell, libxml2 doesn't support the compact syntax, so you need another tool for it.

Posted by: Andy Todd on November 27, 2004 11:28 AM
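Because libxml2 won't read the compact syntax, shelling out to jing as Simon describes is a reasonable fallback. The following is a minimal sketch of that approach. It assumes Java is on the PATH and Jing is available as a jar at the hypothetical path jing.jar; the helper names are also illustrative. It relies on Jing's -c option, which selects the compact syntax, and on Jing exiting with status 0 when the document is valid.

```python
import subprocess

# Hypothetical location of the Jing jar; adjust for your installation
JING_JAR = "jing.jar"

def jing_command(schema_path, instance_path, compact=None, jar=JING_JAR):
    """Build the command line for validating instance_path with Jing.

    compact=None guesses from a .rnc schema extension; Jing's -c flag
    tells it the schema uses the compact syntax.
    """
    if compact is None:
        compact = schema_path.lower().endswith(".rnc")
    cmd = ["java", "-jar", jar]
    if compact:
        cmd.append("-c")
    cmd += [schema_path, instance_path]
    return cmd

def is_valid_with_jing(schema_path, instance_path, compact=None, jar=JING_JAR):
    """Return True if Jing exits with status 0, i.e. the document is valid.

    Raises FileNotFoundError if the java executable cannot be found.
    """
    result = subprocess.run(
        jing_command(schema_path, instance_path, compact, jar),
        stdout=subprocess.PIPE,  # captured so Jing's messages don't clutter output
        stderr=subprocess.PIPE,
    )
    return result.returncode == 0
```

A call such as is_valid_with_jing("schema.rnc", "doc.xml") returns a boolean, much like isValid. Keeping the command construction in a separate function makes it easy to inspect or log the exact jing invocation. Note that each call starts a new JVM, which may add noticeable overhead when validating many documents.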

Why not just use xvif or 4Suite to validate RelaxNG in Python? And if you want to use compact schemas, use my rnc2rng to convert them to the XML syntax first.

For details, see:

http://www-106.ibm.com/developerworks/xml/library/x-matters27.html

Posted by: David Mertz on November 29, 2004 04:37 AM

Here is somebody who shows code that actually WORKS (I tried it), using libxml2.

If you suggest xvif (never heard of it) or 4Suite, then please also show code that works and that shows why your suggestion would be better.

My experiences with 4Suite, for example, have been horrible, so I am happy with the example from Andrew using libxml2.

Posted by: Will Stuyvesant on December 2, 2004 02:06 AM

Challenge accepted. Code that WORKS on 4Suite:

http://uche.ogbuji.net/tech/akara/nodes/2003-01-01/relaxng

Mr. Stuyvesant might not like it, but it works, and some do find it useful.

Re: XVIF, Google is your friend.

Posted by: uche on December 6, 2004 09:47 AM

And indeed, thank you Uche!

I installed 4Suite from ftp://ftp.4suite.org/pub/4Suite/
There is an .exe with Python2.4 in the name, and installing is just a matter of running it.

Then I tried the examples at the link you gave, and with them I was able to create a Python function that takes XML and a RelaxNG schema as input and returns 0 or 1. Great!

I don't know why you wrote "Mr. Stuyvesant might not like it": if I can install something and it is not too hard to use, I usually like it a lot! My last experience with 4Suite was months ago; back then I could not install it in a way that let me use it, but now I can. Big improvement.

Minor nitpicks: on your examples page, the 2nd example should pass rng-tut7.xml instead of rng-tut1.xml as command-line parameter 2 for the non-valid case.
Also, it would help to make the Windows .exe download for 4Suite available via HTTP rather than FTP, since FTP is rather slow in Internet Explorer.

Anyway, good job, and there are lots of other XML-related modules too; 4Suite is in my toolbox now!

Posted by: Will Stuyvesant on December 6, 2004 11:43 PM