Martin Böhm, Jean-Jacques Dubray
Eigner Precision Lifecycle Management
We have created a simple test bed to evaluate the performance of dom4j versus Xerces/Xalan. These results are intended to give a rough idea rather than to serve as an exhaustive test suite. In particular, we focus our study on XML documents that look like database result sets. Performance results may vary greatly depending on the topology of your XML.
The test was designed with two topologies in mind:
a) Elements only, with each element name unique in the whole document.
<?xml version="1.0" encoding="UTF-8"?>
<ItemResultSet>
<Item>
<Attr0x0>123456789</Attr0x0>
<Attr1x0>123456789</Attr1x0>
<Attr2x0>123456789</Attr2x0>
<Attr3x0>123456789</Attr3x0>
<Attr4x0>123456789</Attr4x0>
<Attr5x0>123456789</Attr5x0>
<Attr6x0>123456789</Attr6x0>
<Attr7x0>123456789</Attr7x0>
<Attr8x0>123456789</Attr8x0>
<Attr9x0>123456789</Attr9x0>
<Attr10x0>123456789</Attr10x0>
<Attr11x0>123456789</Attr11x0>
<Attr12x0>123456789</Attr12x0>
<Attr13x0>123456789</Attr13x0>
...
</Item>
<Item>
<Attr0x1>123456789</Attr0x1>
<Attr1x1>123456789</Attr1x1>
<Attr2x1>123456789</Attr2x1>
...
</ItemResultSet>
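A document of this shape can be generated for any item count. The following is a minimal sketch rather than the code from PerfDOM4J.java; the class name `ElementTopology`, the method `build`, and its two parameters are hypothetical names chosen for illustration.

```java
// Sketch only: names are hypothetical and not taken from PerfDOM4J.java.
final class ElementTopology {

    /** Builds an element-only result set with `items` rows and `attrs` uniquely named children per row. */
    static String build(int items, int attrs) {
        StringBuilder xml = new StringBuilder("<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n");
        xml.append("<ItemResultSet>\n");
        for (int row = 0; row < items; row++) {
            xml.append("<Item>\n");
            for (int col = 0; col < attrs; col++) {
                // Names follow the Attr<column>x<row> pattern, so each one is unique in the document.
                String name = "Attr" + col + "x" + row;
                xml.append('<').append(name).append(">123456789</").append(name).append(">\n");
            }
            xml.append("</Item>\n");
        }
        return xml.append("</ItemResultSet>\n").toString();
    }

    public static void main(String[] args) {
        // Prints a small document: 2 items with 3 child elements each.
        System.out.print(build(2, 3));
    }
}
```

Building the text with a `StringBuilder` keeps the sketch independent of any particular XML library, so the same string can be fed to either parser under test.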
b) Attributes only.
<?xml version="1.0" encoding="UTF-8"?>
<ItemResultSet>
<Item guid="0" Attr0="123456789" Attr1="123456789" .../>
<Item guid="1" Attr0="123456789" Attr1="123456789" .../>
</ItemResultSet>
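The attribute-only layout can be generated the same way. Again this is only a sketch, not the code from PerfDOM4JAttr.java; `AttributeTopology` and `build` are hypothetical names, and the `guid` attribute is taken from the sample markup above.

```java
// Sketch only: names are hypothetical and not taken from PerfDOM4JAttr.java.
final class AttributeTopology {

    /** Builds an attribute-only result set with `items` rows and `attrs` attributes per row. */
    static String build(int items, int attrs) {
        StringBuilder xml = new StringBuilder("<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n");
        xml.append("<ItemResultSet>\n");
        for (int row = 0; row < items; row++) {
            // Each row is a single empty element identified by its guid attribute.
            xml.append("<Item guid=\"").append(row).append('"');
            for (int col = 0; col < attrs; col++) {
                xml.append(" Attr").append(col).append("=\"123456789\"");
            }
            xml.append("/>\n");
        }
        return xml.append("</ItemResultSet>\n").toString();
    }

    public static void main(String[] args) {
        // Prints a small document: 2 items with 2 attributes each (plus guid).
        System.out.print(build(2, 2));
    }
}
```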
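For documents of 1000, 100, 10, and 1 items, we measured the time it takes to create the document, write it to disk, reparse it from disk, and evaluate the following XPath expressions: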
a)
/*/*/Attr1x1
/*/*/Attr1x500
/*/*/Attr1x999
/*/*/Item
b)
/*/*[@id="1"]
/*/*[@id="500"]
/*/*[@id="999"]
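To make the measurement concrete, here is a rough sketch of timing such expressions. It is an assumption-laden illustration: it uses the XPath API bundled with the JDK (`javax.xml.xpath`) rather than the dom4j or Xalan `XPathAPI` calls that were actually benchmarked, and the class and method names are hypothetical. The attribute query uses `guid`, as in the sample markup, whereas the expressions listed above use `id`.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Sketch only: JDK JAXP stands in for the libraries that were benchmarked.
final class XPathTiming {

    /** Parses an XML string into a W3C DOM document. */
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException("could not parse XML", e);
        }
    }

    /** Evaluates an XPath expression and returns the number of matching nodes. */
    static int countMatches(Document doc, String expression) {
        try {
            NodeList nodes = (NodeList) XPathFactory.newInstance().newXPath()
                    .evaluate(expression, doc, XPathConstants.NODESET);
            return nodes.getLength();
        } catch (Exception e) {
            throw new IllegalStateException("could not evaluate " + expression, e);
        }
    }

    /** Element-only sample with two uniquely named children per Item. */
    static String elementSample(int items) {
        StringBuilder xml = new StringBuilder("<ItemResultSet>");
        for (int row = 0; row < items; row++) {
            xml.append("<Item>");
            for (int col = 0; col < 2; col++) {
                String name = "Attr" + col + "x" + row;
                xml.append('<').append(name).append(">123456789</").append(name).append('>');
            }
            xml.append("</Item>");
        }
        return xml.append("</ItemResultSet>").toString();
    }

    /** Attribute-only sample keyed by the guid attribute. */
    static String attributeSample(int items) {
        StringBuilder xml = new StringBuilder("<ItemResultSet>");
        for (int row = 0; row < items; row++) {
            xml.append("<Item guid=\"").append(row).append("\" Attr0=\"123456789\"/>");
        }
        return xml.append("</ItemResultSet>").toString();
    }

    /** Runs one expression once and prints the match count and elapsed wall-clock time. */
    static void time(Document doc, String expression) {
        long start = System.nanoTime();
        int matches = countMatches(doc, expression);
        long elapsedMs = (System.nanoTime() - start) / 1_000_000;
        System.out.println(expression + " -> " + matches + " node(s) in " + elapsedMs + " ms");
    }

    public static void main(String[] args) {
        Document elements = parse(elementSample(1000));
        time(elements, "/*/*/Attr1x1");
        time(elements, "/*/*/Attr1x500");
        time(elements, "/*/*/Attr1x999");

        Document attributes = parse(attributeSample(1000));
        time(attributes, "/*/*[@guid=\"1\"]");
        time(attributes, "/*/*[@guid=\"999\"]");
    }
}
```

A single run like this includes JIT warm-up and factory lookup costs, so a real comparison would repeat each query many times and average the results.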
All tests were run on a laptop (PIII, 500 MHz, 512 MB). We allocated a heap size of 256 MB when starting the tests.
All times in ms
Items | Create document (dom4j) | Create document (xalan) | Write document to disk (dom4j) | Write document to disk (xalan) | Reparse document from disk (dom4j) | Reparse document from disk (xalan)
1000 | 641.0 | 571.0 | 531 | 852 | 2020 | 2664
100 | 9.0 | 20.0 | 60 | 61 | 62.99 | 68.6
10 | 0.7 | 1.0 | 10 | 10 | 11.92 | 14.62
1 | 0.1 | 0.0 | 10 | 10 | 8.01 | 8.31
The most surprising result comes from executing XPath statements. Xalan does warn us in the JavaDoc that things could be a little slow.
[XPath timing tables: selectSingleNode(), selectNodes()]
All times in ms
Items | Create document (dom4j, elements) | Create document (dom4j, attrs) | Write document to disk (dom4j, elements) | Write document to disk (dom4j, attrs) | Reparse document from disk (dom4j, elements) | Reparse document from disk (dom4j, attrs)
1000 | 641.0 | 100 | 531 | 140 | 2020 | 207
100 | 9.0 | 8.0 | 60 | 20 | 62.99 | 24
10 | 0.7 | 0.9 | 10 | 10 | 11.92 | 8.31
1 | 0.1 | 0.1 | 10 | 10 | 8.01 | 6.81
[XPath timing tables: selectSingleNode(), selectNodes()]
These numbers suggest that one should use the XPathAPI class of Xalan with great caution, if at all.
The syntax of XPath statements must be chosen carefully. Contrary to some belief, and perhaps because of the topology of our XML format, using /*/* or // was more efficient than the absolute path /ItemResultSet/Item.
With dom4j, it appears more efficient to use selectNodes even if one needs a single node.
With dom4j, XPath runs about twice as fast against a document that contains elements as against one that contains attributes.
In our case, we found that dom4j is faster than Xalan for XSLT transformations. We do not claim this is a general result, but rather a data point.
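The point about XPath syntax is easy to check on your own data, because the wildcard, descendant, and absolute forms select the same nodes in this topology. The sketch below is illustrative only: it uses the JDK's `javax.xml.xpath` API instead of dom4j or Xalan, the class and method names are hypothetical, and `//Item` is an assumed reading of the `//` form mentioned above. It makes no claim about which form wins on a given engine.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpression;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Sketch only: JDK JAXP stands in for the libraries that were benchmarked.
final class PathSyntaxComparison {

    /** Builds and parses an element-only result set with `items` Item elements. */
    static Document sample(int items) {
        StringBuilder xml = new StringBuilder("<ItemResultSet>");
        for (int row = 0; row < items; row++) {
            xml.append("<Item><Attr0x").append(row).append(">123456789</Attr0x")
               .append(row).append("></Item>");
        }
        xml.append("</ItemResultSet>");
        try {
            return DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml.toString())));
        } catch (Exception e) {
            throw new IllegalStateException("could not parse sample", e);
        }
    }

    /** Compiles the expression once so repeated runs measure evaluation only. */
    static XPathExpression compile(String expression) {
        try {
            return XPathFactory.newInstance().newXPath().compile(expression);
        } catch (Exception e) {
            throw new IllegalStateException("could not compile " + expression, e);
        }
    }

    /** Returns the number of nodes the expression selects in the document. */
    static int count(Document doc, String expression) {
        try {
            NodeList nodes = (NodeList) compile(expression).evaluate(doc, XPathConstants.NODESET);
            return nodes.getLength();
        } catch (Exception e) {
            throw new IllegalStateException("could not evaluate " + expression, e);
        }
    }

    /** Average evaluation time in microseconds over `runs` repetitions. */
    static long averageMicros(Document doc, String expression, int runs) {
        XPathExpression compiled = compile(expression);
        long start = System.nanoTime();
        try {
            for (int i = 0; i < runs; i++) {
                compiled.evaluate(doc, XPathConstants.NODESET);
            }
        } catch (Exception e) {
            throw new IllegalStateException("could not evaluate " + expression, e);
        }
        return (System.nanoTime() - start) / 1_000 / runs;
    }

    public static void main(String[] args) {
        Document doc = sample(1000);
        for (String expression : new String[] {"/*/*", "//Item", "/ItemResultSet/Item"}) {
            System.out.println(expression + ": " + count(doc, expression) + " nodes, about "
                    + averageMicros(doc, expression, 20) + " us per run");
        }
    }
}
```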
Here are the source code and data for these tests. Try them for yourself.
PerfDOM4J.java
PerfDOM4JAttr.java
PerfW3C.java
item.xslt
w3c_100.xml