ElementTree is a "pythonic" XML parser interface developed by Fredrik Lundh that has been included in the Python standard library since version 2.5. It provides a simple and intuitive API for processing XML (well, much simpler and more intuitive than the usual parsers). lxml is a more efficient parser with a compatible interface. Here are some useful tips for using ElementTree and lxml.
The ElementTree documentation included in the Python 2.5 manual is far from complete. Fredrik Lundh's pages on his effbot website are still necessary to take advantage of all useful ElementTree features:
To write portable code, you need to support ElementTree both as bundled with Python 2.5 and later and as a standalone install for earlier versions. This can be done as follows:
    try:
        # Python 2.5+: batteries included
        import xml.etree.ElementTree as ET
    except ImportError:
        try:
            # Python <2.5: standalone ElementTree install
            import elementtree.ElementTree as ET
        except ImportError:
            # Call syntax for raise works on both Python 2 and Python 3
            raise ImportError("ElementTree is not installed, see http://effbot.org/zone/element-index.htm")
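Once the module is bound to the name ET, the rest of the code does not need to know which install it came from. Here is a minimal sketch of typical usage; the sample document and its tag names are made up for illustration and do not come from any real data:

```python
try:
    # Python 2.5+: batteries included
    import xml.etree.ElementTree as ET
except ImportError:
    # Python <2.5: standalone ElementTree install
    import elementtree.ElementTree as ET

# Hypothetical sample document, invented for this illustration.
SAMPLE = """<library>
  <book id="1"><title>First title</title></book>
  <book id="2"><title>Second title</title></book>
</library>"""

# fromstring() parses XML held in a string and returns the root element.
root = ET.fromstring(SAMPLE)

# findall() with a plain tag name returns the matching direct children;
# findtext() returns the text of the first matching child element;
# get() reads an attribute value.
titles = [book.findtext("title") for book in root.findall("book")]
ids = [book.get("id") for book in root.findall("book")]

print(root.tag)
print(titles)
print(ids)
```

For a document stored on disk, ET.parse(filename).getroot() gives the same kind of root element.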
You may also replace ElementTree with cElementTree in the import lines to get an optimized version of the parser written in C. See below for a performance comparison.
lxml is another module providing an ElementTree-compatible API with additional features thanks to the use of libxml2 and libxslt libraries:
Official website: http://codespeak.net/lxml/
Since the API is compatible, switching from ElementTree to lxml is often just a matter of changing the import lines:
    try:
        import lxml.etree as ET
    except ImportError:
        # Call syntax for raise works on both Python 2 and Python 3
        raise ImportError("lxml is not installed, see http://codespeak.net/lxml/")
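To illustrate what a compatible interface means in practice, the sketch below runs one function against the standard library module and, if it happens to be installed, against lxml. The function name count_tags and the sample string are hypothetical. The code also assumes a version of ElementTree that provides Element.iter() (Python 2.7 or later); older releases spell it getiterator().

```python
import xml.etree.ElementTree as stdlib_ET

try:
    import lxml.etree as lxml_ET
except ImportError:
    lxml_ET = None  # lxml is optional for this sketch

# Hypothetical sample document, invented for this illustration.
SAMPLE = "<root><item>a</item><item>b</item><other/></root>"


def count_tags(et_module, xml_text):
    """Count how often each tag occurs, using any ElementTree-compatible module."""
    root = et_module.fromstring(xml_text)
    counts = {}
    # iter() walks the root element and all of its descendants.
    for elem in root.iter():
        counts[elem.tag] = counts.get(elem.tag, 0) + 1
    return counts


stdlib_counts = count_tags(stdlib_ET, SAMPLE)
print(stdlib_counts)

if lxml_ET is not None:
    # Same function, different backend: the counts should match.
    print(count_tags(lxml_ET, SAMPLE) == stdlib_counts)
```

Only the module passed in changes; the parsing and traversal code stays the same. This holds for the common subset of the API, while lxml-specific features naturally have no ElementTree equivalent.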
When parsing large XML files, performance matters. For example, I parsed a large and complex 11 MB XML file using ElementTree, cElementTree and lxml, first in a normal environment and then with psyco enabled. Here are the results:
    1) parsing with lxml...
    lxml: 1.231 s
    2) parsing with cElementTree...
    cElementTree: 4.416 s
    3) parsing with ElementTree...
    ElementTree: 15.927 s

    same tests with psyco.full() enabled:
    4) parsing with lxml...
    lxml: 4.486 s
    5) parsing with cElementTree...
    cElementTree: 2.731 s
    6) parsing with ElementTree...
    ElementTree: 14.419 s
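The script that produced these timings is not shown here. As a rough sketch of how a similar comparison could be set up, the code below times each parser that can be imported on a synthetic document. It is not the original benchmark: the helper names make_sample and time_parsers are hypothetical, the generated document is much simpler than the 11 MB file, psyco is left out, and the code assumes Python 3 (time.perf_counter).

```python
import importlib
import io
import time

# Candidate modules; any that cannot be imported are skipped.
CANDIDATES = ["lxml.etree", "xml.etree.cElementTree", "xml.etree.ElementTree"]


def make_sample(n_items=20000):
    """Build a synthetic XML document as bytes (a stand-in for a real file)."""
    parts = ["<root>"]
    for i in range(n_items):
        parts.append('<item id="%d"><name>item %d</name></item>' % (i, i))
    parts.append("</root>")
    return "".join(parts).encode("utf-8")


def time_parsers(data, candidates=CANDIDATES):
    """Return {module name: seconds needed to parse data} for importable modules."""
    timings = {}
    for name in candidates:
        try:
            module = importlib.import_module(name)
        except ImportError:
            continue  # this parser is not available here
        start = time.perf_counter()
        module.parse(io.BytesIO(data))  # parse() accepts a file-like object
        timings[name] = time.perf_counter() - start
    return timings


timings = time_parsers(make_sample())
for name, seconds in sorted(timings.items(), key=lambda item: item[1]):
    print("%-26s %.3f s" % (name, seconds))
```

The numbers this prints depend on the machine, the Python version and the document, so they are not comparable with the figures above. A single run per parser is also noisy; repeating each measurement and keeping the best time would be more robust.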
This simple test may not be very representative, but it clearly shows two things:
In conclusion, I would recommend lxml for most XML processing, with a fallback to cElementTree for portability, like this:
    try:
        # lxml: best performance for XML processing
        import lxml.etree as ET
    except ImportError:
        try:
            # Python 2.5+: batteries included
            import xml.etree.cElementTree as ET
        except ImportError:
            try:
                # Python <2.5: standalone ElementTree install
                import elementtree.cElementTree as ET
            except ImportError:
                # Call syntax for raise works on both Python 2 and Python 3
                raise ImportError(
                    "lxml or ElementTree are not installed, "
                    "see http://codespeak.net/lxml "
                    "or http://effbot.org/zone/element-index.htm")
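The nested try/except blocks grow one level deeper with every extra fallback. One possible alternative, sketched here under a few assumptions, is to loop over a list of module names with importlib.import_module (available from Python 2.7). The helper name load_etree is hypothetical, and the last entry, xml.etree.ElementTree, is an addition of mine to the list above: recent Python 3 releases no longer ship xml.etree.cElementTree, and the plain module uses the C accelerator there when it is available.

```python
import importlib

# Preferred order: lxml first, then the C ElementTree variants, and
# finally the plain standard library module (added as a last resort).
CANDIDATES = [
    "lxml.etree",
    "xml.etree.cElementTree",
    "elementtree.cElementTree",
    "xml.etree.ElementTree",
]


def load_etree(candidates=CANDIDATES):
    """Return the first ElementTree-compatible module that can be imported."""
    for name in candidates:
        try:
            return importlib.import_module(name)
        except ImportError:
            continue  # try the next candidate
    raise ImportError(
        "no ElementTree implementation found, tried: " + ", ".join(candidates))


ET = load_etree()
print("using " + ET.__name__)
```

Adding or reordering fallbacks then only means editing the list, and the error message automatically names everything that was tried.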
To be continued...