XML Canonicalization (C14N) in Python using lxml

XML Canonicalization (C14N) produces a normalized serialization of an XML document, which is needed in cases such as XML digital signatures.
lxml provides a very easy way to do it in Python. However, the current version, lxml 2.1, does not give access to all C14N parameters. Here is a simple patch to improve its C14N support.

Here is an example showing how to perform C14N using lxml 2.1:

import StringIO
import lxml.etree as ET
et = ET.parse('file.xml')
output = StringIO.StringIO()
et.write_c14n(output)
print output.getvalue()

XML C14N version 1.0 offers two independent options, which give four possible combinations (see http://www.w3.org/TR/xml-c14n and http://www.w3.org/TR/xml-exc-c14n/):
- Inclusive or Exclusive C14N
- With or without comments
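
To make the "with or without comments" option concrete, here is a minimal sketch. It does not use lxml: it assumes Python 3.8 or later and relies on the standard library function xml.etree.ElementTree.canonicalize, which implements C14N 2.0 rather than the C14N 1.0 discussed here, so it only illustrates the idea. The sample document is made up.

```python
# Illustration only: uses the Python 3.8+ standard library, which implements
# C14N 2.0, not the C14N 1.0 code path of libxml2/lxml discussed in this post.
from xml.etree.ElementTree import canonicalize

# Hypothetical sample document, made up for this example.
SAMPLE = '<doc b="2" a="1"><!-- a comment --><empty/></doc>'

# This function drops comments by default and keeps them only on request.
without_comments = canonicalize(SAMPLE)
with_comments = canonicalize(SAMPLE, with_comments=True)

print(without_comments)
print(with_comments)
```

In both results the attributes are sorted by name and the empty element is written with explicit start and end tags. Only the second result keeps the comment.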

libxml2 gives access to these options in its C14N API: http://xmlsoft.org/html/libxml-c14n.html
However, lxml's write_c14n method does not expose these options. Current versions of lxml only produce inclusive C14N with comments, which is not always what an application needs.

This limitation can be fixed by changing a few lines in the lxml source code and recompiling it.

In serializer.pxi:

#[PL] 2008-07-22: added exclusive and with_comments args
cdef _tofilelikeC14N(f, _Element element, int exclusive, int with_comments):
    [...]
    try:
        if _isString(f):
            [...]
                #[PL]
                bytes = c14n.xmlC14NDocSave(c_doc, NULL, exclusive, NULL,
                                            with_comments, c_filename, 0)
                # end
        elif hasattr(f, u'write'):
            [...]
            #[PL]
            bytes = c14n.xmlC14NDocSaveTo(c_doc, NULL, exclusive, NULL, 
                                          with_comments, c_buffer)
            # end
            writer.error_log.disconnect()
            [...]

In lxml.etree.pyx:

    #[PL] 2008-07-22: added exclusive and with_comments (see serializer.pxi)
    def write_c14n(self, file, exclusive=0, with_comments=1):
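        u"""write_c14n(self, file, exclusive=0, with_comments=1)

        C14N write of document. Always writes UTF-8.
        Set exclusive=1 for Exclusive C14N, with_comments=0 to drop comments.
        """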
        self._assertHasRoot()
        _tofilelikeC14N(file, self._context_node, exclusive, with_comments)

Here is how to use this patched version:

import StringIO
import lxml.etree as ET
et = ET.parse('file.xml')
output = StringIO.StringIO()
et.write_c14n(output, exclusive=1, with_comments=0)
print output.getvalue()
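
To compare the four combinations side by side, a small loop over both flags can help. This is only a sketch under several assumptions: it is written for Python 3, and it needs an lxml build whose write_c14n accepts the exclusive and with_comments keyword arguments (recent lxml releases do; stock lxml 2.1 does not). The sample document and the helper name c14n_variants are made up for illustration.

```python
import io

try:
    from lxml import etree  # third-party package, not in the standard library
except ImportError:
    etree = None  # the demo below is skipped when lxml is missing

# Hypothetical sample: one namespace is declared but never used.
SAMPLE = (b'<root xmlns:unused="urn:unused" xmlns:a="urn:a">'
          b'<!-- a comment --><a:child/></root>')


def c14n_variants(xml_bytes):
    """Return {(exclusive, with_comments): canonical bytes} for all four cases.

    Assumes write_c14n accepts the exclusive and with_comments keywords.
    """
    tree = etree.parse(io.BytesIO(xml_bytes))
    variants = {}
    for exclusive in (False, True):
        for with_comments in (False, True):
            buf = io.BytesIO()
            tree.write_c14n(buf, exclusive=exclusive,
                            with_comments=with_comments)
            variants[(exclusive, with_comments)] = buf.getvalue()
    return variants


if etree is not None:
    for key, data in sorted(c14n_variants(SAMPLE).items()):
        print(key, data)
```

With this sample, the inclusive variants keep the unused namespace declaration on the root element, while the exclusive variants only declare the namespace that is actually used.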

Download:
The patch and a prebuilt lxml binary installer for Python 2.5 on Windows are attached below.

Attached file                          Size
lxml_2.2alpha1_C14N_patch.zip          46.44 KB
lxml-2.2alpha1.win32-py2.5_PL.exe      2.31 MB