XML and Java: Developing Web Applications, Second Edition

5.3 DOM versus SAX

We discussed the basic concepts of DOM and tips for using DOM in Chapter 4 and discussed those of SAX in the previous section. In Section 2.4.3, we discussed points for deciding whether to use DOM or SAX. In this section, we compare the performance of DOM and SAX and study the conversion of DOM from and to SAX.

5.3.1 Performance: Memory and Speed

In this section, we compare the performance of DOM and SAX based on memory usage and on parsing speed.

Memory Usage

First, we compare the memory usage of DOM and SAX. We would expect SAX to use less memory than DOM.

We use the XML document shown in Listing 5.4. Its size is 348 bytes.

Listing 5.4 A sample document to test memory usage, chap05/memtest10.xml
<?xml version="1.0" encoding="us-ascii"?>
<root>
<child>Hello, XML! 1</child>
<child>Hello, XML! 2</child>
<child>Hello, XML! 3</child>
<child>Hello, XML! 4</child>
<child>Hello, XML! 5</child>
<child>Hello, XML! 6</child>
<child>Hello, XML! 7</child>
<child>Hello, XML! 8</child>
<child>Hello, XML! 9</child>
<child>Hello, XML! 10</child>
</root>

Listing 5.5 parses a given XML document ten times with a SAX parser and prints the memory usage for each iteration.

Listing 5.5 Print memory usage for SAX parsing, chap05/MemoryUsageSAX.java
package chap05;

import org.apache.xerces.parsers.SAXParser;

public class MemoryUsageSAX {
    static void printMemory() {
       System.gc();
       Runtime rt = Runtime.getRuntime();
       System.out.print(rt.totalMemory()-rt.freeMemory());
    }
    public static void main(String[] argv) throws Exception {
       String xml = argv[0];
       printMemory();
       System.out.println("");

       final int N = 10;
       SAXParser saxp = new SAXParser();
       printMemory();
       for (int i = 0; i < N; i++) {
           System.out.print(",");
           saxp.parse(xml);
           printMemory();
       }
       System.out.println("");
    }
}
R:\samples>java chap05.MemoryUsageSAX file:./chap05/memtest10.xml
104792,152912,208360,207712,247704,207712,247704,207712,247704,207712,
247704,207712

A SAX parser creates events and throws them to a handler. If the handler does nothing or there are no handlers, nothing is stored in memory. The result just shown confirms this observation. The amount of memory used did not increase after the first parsing. The memory consumed in the first parsing was for the classes and working area of the parser.
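
As a minimal sketch of a handler that retains almost nothing, the following class counts child elements and keeps only one integer between events. It uses the JAXP SAXParserFactory rather than the Xerces SAXParser class used in Listing 5.5, and the names CountingHandlerSketch, ElementCounter, and countChildren are hypothetical.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

public class CountingHandlerSketch {
    /** Keeps a single counter; no event data is retained after a callback returns. */
    static class ElementCounter extends DefaultHandler {
        int count = 0;

        @Override
        public void startElement(String uri, String local,
                                 String qname, Attributes atts) {
            if (qname.equals("child"))
                count++;
        }
    }

    /** Parses the given XML text and returns the number of child elements. */
    static int countChildren(String xml) throws Exception {
        ElementCounter counter = new ElementCounter();
        SAXParserFactory.newInstance().newSAXParser()
            .parse(new InputSource(new StringReader(xml)), counter);
        return counter.count;
    }

    public static void main(String[] args) throws Exception {
        // Build a document shaped like memtest10.xml in memory.
        StringBuilder sb = new StringBuilder("<root>");
        for (int i = 1; i <= 10; i++)
            sb.append("<child>Hello, XML! ").append(i).append("</child>");
        sb.append("</root>");
        System.out.println(countChildren(sb.toString()));
    }
}
```

Because the handler stores only the counter, its memory footprint should not depend on the size of the document.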

Next, let's do similar experiments for DOM. Listing 5.6 parses a given XML document with a DOM parser ten times and prints the memory usage for each iteration. To see how much memory is used for the DOM trees, the program keeps each of the created DOM trees in memory.

Listing 5.6 Print memory usage for DOM parsing, chap05/MemoryUsageDOM.java
package chap05;

import org.apache.xerces.parsers.DOMParser;
import org.w3c.dom.Document;

public class MemoryUsageDOM {
   static final String PROP_DOC =
      "http://apache.org/xml/properties/dom/document-class-name";
   static final String FEATURE_DEFER =
      "http://apache.org/xml/features/dom/defer-node-expansion";

   static void printMemory() {
      System.gc();
      Runtime rt = Runtime.getRuntime();
      System.out.print(rt.totalMemory()-rt.freeMemory());
   }

   public static void main(String[] argv) throws Exception {
      String className = argv[0];
      boolean defer = argv[1].equals("true");
      String xml = argv[2];
      printMemory();
      System.out.println("");

      final int N = 10;
      Document[] docs = new Document[N];
      DOMParser domp = new DOMParser();
      domp.setProperty(PROP_DOC, className);
      domp.setFeature(FEATURE_DEFER, defer);
      printMemory();
      for (int i = 0; i < N; i++) {
          System.out.print(",");
          domp.parse(xml);
          docs[i] = domp.getDocument();
          printMemory();
      }
      System.out.println("");
   }
}

Xerces has two DOM implementations. One is fully compliant with all DOM Level 2 specifications; its Document implementation class is org.apache.xerces.dom.DocumentImpl. The other supports DOM Level 2 Core only; its Document implementation class is org.apache.xerces.dom.CoreDocumentImpl. In addition, DocumentImpl has the Deferred DOM feature, which improves parsing speed. If Deferred DOM is enabled, the Xerces parser does not create all DOM nodes during parsing; each node is created only when an application program attempts to access it.

In this section, we call DocumentImpl with Deferred DOM "Deferred DOM," we call DocumentImpl without deferred DOM "Non-deferred DOM," and we call CoreDocumentImpl "Core DOM."

Listing 5.6 can check the memory usage of these three implementations: Deferred DOM, Non-deferred DOM, and Core DOM.
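
If you create parsers through JAXP instead of the Xerces DOMParser class, the same feature URI can be tried on a DocumentBuilderFactory. The following is a sketch under the assumption that the JAXP implementation in use is Xerces or derived from it; other implementations may reject the feature, so the code falls back to the parser's default. The class name DeferSettingSketch and the method parse are hypothetical.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

public class DeferSettingSketch {
    static final String FEATURE_DEFER =
        "http://apache.org/xml/features/dom/defer-node-expansion";

    /** Parses XML text, asking for Deferred DOM to be on or off if supported. */
    static Document parse(String xml, boolean defer) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        try {
            factory.setFeature(FEATURE_DEFER, defer);
        } catch (ParserConfigurationException e) {
            // Assumption: a parser that is not Xerces-based may not know
            // this feature; keep its default behavior in that case.
        }
        return factory.newDocumentBuilder()
                      .parse(new InputSource(new StringReader(xml)));
    }

    public static void main(String[] args) throws Exception {
        Document doc = parse("<root><child/><child/></root>", false);
        System.out.println(doc.getDocumentElement().getNodeName());
    }
}
```

Either setting yields the same tree from the application's point of view; only the moment at which nodes are created differs.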

R:\samples>java chap05.MemoryUsageDOM org.apache.xerces.dom.
DocumentImpl true file:./chap05/memtest10.xml
104768,155576,334536,446816,563016,679216,795416,896928,1013128,
1129328,1245528,1347040

R:\samples> java chap05.MemoryUsageDOM org.apache.xerces.dom.
DocumentImpl false file:./chap05/memtest10.xml
104768,155576,278488,280832,324480,327120,329776,291456,335104,337744,
340400,302080

R:\samples> java chap05.MemoryUsageDOM org.apache.xerces.dom.
CoreDocumentImpl false file:./chap05/memtest10.xml
104776,155584,278472,280792,324416,327032,329664,291320,334944,337560,
340192,301848

The first command invokes Deferred DOM, which is the default setting of Xerces, and uses approximately 110KB for one document. The second invokes Non-deferred DOM and uses about 2.62KB for one document. The third invokes Core DOM and uses about 2.60KB for one document.
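
One plausible way to arrive at these per-document figures is to divide the growth between the first and the tenth parse by nine. The sketch below applies that arithmetic to the numbers printed above; the averaging method is an assumption, and the class name PerDocumentMemory is hypothetical.

```java
public class PerDocumentMemory {
    /**
     * Average growth per retained document, in bytes:
     * (usage after the last parse - usage after the first parse) / (parses - 1).
     */
    static long bytesPerDocument(long[] afterEachParse) {
        int n = afterEachParse.length;
        return (afterEachParse[n - 1] - afterEachParse[0]) / (n - 1);
    }

    public static void main(String[] args) {
        // The last ten values of each run shown above.
        long[] deferred = {334536, 446816, 563016, 679216, 795416,
                           896928, 1013128, 1129328, 1245528, 1347040};
        long[] nonDeferred = {278488, 280832, 324480, 327120, 329776,
                              291456, 335104, 337744, 340400, 302080};
        long[] core = {278472, 280792, 324416, 327032, 329664,
                       291320, 334944, 337560, 340192, 301848};

        System.out.println(bytesPerDocument(deferred));     // prints 112500
        System.out.println(bytesPerDocument(nonDeferred));  // prints 2621
        System.out.println(bytesPerDocument(core));         // prints 2597
    }
}
```

The individual steps fluctuate because garbage collection timing affects each reading, which is why an average over several parses is more informative than any single difference.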

Figure 5.5 shows the memory usage of SAX, Deferred DOM, Non-deferred DOM, and Core DOM.

Figure 5.5. Memory usage for SAX and DOM implementations

graphics/05fig05.gif

For Non-deferred DOM or Core DOM, the amount of memory used increases in proportion to the number of nodes in a document. For Deferred DOM, the amount of memory used is not proportional. It does not use 220KB for a document twice as large. Table 5.3 shows the memory usage for documents containing 10, 100, 200, 300, 400, or 500 child nodes.

This result indicates that Deferred DOM uses far more memory than a small document requires. In fact, Deferred DOM defers creating DOM nodes to improve parsing speed, not memory performance. In general, object creation in Java costs much time, and reducing object creation (new operators) is very effective for improving processing speed. When Deferred DOM is enabled, Xerces keeps char arrays or byte arrays wherever possible and defers String and DOM node creation.

It is clear that SAX uses less memory than DOM. We have also learned that Deferred DOM, which is the default DOM in Xerces, uses a large amount of memory for a small XML document and that there is little difference between Non-deferred DOM and Core DOM.

Table 5.3. Memory Usage for Deferred DOM
Number of Child Nodes           |  10 | 100 | 200 | 300 | 400 | 500
Memory for a Document (in KB)   | 110 | 110 | 114 | 124 | 124 | 144

Speed

We use the program shown in Listing 5.7 to compare parsing speeds.

Listing 5.7 Print parsing times for SAX and DOM implementations, chap05/SpeedTest.java
package chap05;

import java.io.IOException;
import java.io.OutputStream;
import org.apache.xerces.parsers.SAXParser;
import org.apache.xerces.parsers.DOMParser;
import org.apache.xml.serialize.OutputFormat;
import org.apache.xml.serialize.XMLSerializer;
import org.w3c.dom.Document;

public class SpeedTest {
    static final String PROP_DOC =
       "http://apache.org/xml/properties/dom/document-class-name";
    static final String FEATURE_DEFER =
       "http://apache.org/xml/features/dom/defer-node-expansion";

    public static void main(String[] argv) throws Exception {
       int n = Integer.parseInt(argv[0]);
       boolean consume = argv[1].equals("true");
       String xml = argv[2];

       OutputFormat format;
       format = new OutputFormat("xml", "UTF-8", false);
       format.setPreserveSpace(true);
       OutputStream stream = new NullOutputStream();
       XMLSerializer serializer = new XMLSerializer(format);
       long start = 0, end;

       System.gc();
       SAXParser saxp = new SAXParser();
       if (consume)
           saxp.setDocumentHandler(serializer);
       for (int i = -1; i < n; i++) {
           if (i == 0)
              start = System.currentTimeMillis();
           if (consume) {
               serializer.reset();
               serializer.setOutputByteStream(stream);
           }
           saxp.parse(xml);
       }
       end = System.currentTimeMillis();
       System.out.println("SAX: "+(end-start)+"ms");
       System.gc();
       DOMParser domp = new DOMParser();
       for (int i = -1; i < n; i++) {
          if (i == 0)
              start = System.currentTimeMillis();
          domp.parse(xml);
          Document doc = domp.getDocument();
          if (consume) {
              serializer.reset();
              serializer.setOutputByteStream(stream);
              serializer.serialize(doc);
          }

       }
       end = System.currentTimeMillis();
       System.out.println("Deferred DOM: "+(end-start)+"ms");

       System.gc();
       domp.setFeature(FEATURE_DEFER, false);
       for (int i = -1; i < n; i++) {
          if (i == 0)
              start = System.currentTimeMillis();
          domp.parse(xml);
          Document doc = domp.getDocument();
          if (consume) {
              serializer.reset();
              serializer.setOutputByteStream(stream);
              serializer.serialize(doc);
          }

       }
       end = System.currentTimeMillis();
       System.out.println("Non-deferred DOM: "+(end-start)+"ms");

       System.gc();
       domp.setProperty(PROP_DOC,
                     "org.apache.xerces.dom.CoreDocumentImpl");
       for (int i = -1; i< n; i++) {
           if (i == 0)
              start = System.currentTimeMillis();
           domp.parse(xml);
           Document doc = domp.getDocument();
           if (consume) {
              serializer.reset();
              serializer.setOutputByteStream(stream);
              serializer.serialize(doc);
           }

       }
       end = System.currentTimeMillis();
       System.out.println("Core DOM: "+(end-start)+"ms");
    }

    static class NullOutputStream extends OutputStream {
       public NullOutputStream() {
       }
       public void close() throws IOException {
       }
       public void flush() throws IOException {
       }
       public void write(byte[] b) throws IOException {
       }
       public void write(byte[] b, int off, int len)
              throws IOException {
       }
       public void write(int b) throws IOException {
       }
    }
}

Given a repetition count, the program parses a document that many times with each of SAX, Deferred DOM, Non-deferred DOM, and Core DOM, and then prints the total parsing time for each. Each loop starts at -1 so that the first, untimed iteration serves as a warm-up. The program also lets us specify whether it serializes the parsed document.

The following shows the result of parsing a document 500 times. The document has about 500 elements. The first command does not serialize, and the second does.

R:\samples>java chap05.SpeedTest 500 false file:./chap05/memtest500.xml
SAX: 10114ms
Deferred DOM: 11237ms
Non-deferred DOM: 12748ms
Core DOM: 12648ms

R:\samples> java chap05.SpeedTest 500 true file:./chap05/memtest500.xml
SAX: 11036ms
Deferred DOM: 15462ms
Non-deferred DOM: 13760ms
Core DOM: 13850ms

Figure 5.6 shows the time for parsing documents 500 times. The documents have 100 to 1,000 elements.

Figure 5.6. Parsing times for SAX and DOM implementations

graphics/05fig06.gif

Roughly speaking, SAX is the fastest and Deferred DOM is not significantly slower than SAX. There is little difference between Non-deferred DOM and Core DOM.

Because a serializer accesses all nodes in the DOM tree, all nodes are eventually created even when Deferred DOM does not create them during parsing. In fact, Deferred DOM is the slowest in parsing combined with serialization.

5.3.2 Conversion from DOM to SAX and Vice Versa

As described earlier, SAX performed better than DOM at runtime in our measurements. However, developing an application with SAX alone is hard work, so converters from DOM to SAX and vice versa are useful.

In this section, we introduce DOMReader, which throws SAX events from a DOM tree, and DOMConstructor, which creates a DOM tree from SAX events.

Converting a DOM Tree to SAX Events

DOMReader traverses an input DOM tree and generates corresponding SAX events. Because it generates SAX events, it implements XMLReader, which is the SAX parser interface. However, the input to DOMReader is a DOM node, whereas the input to XMLReader is an InputSource or a URI. Thus, DOMReader ignores the parameters of the parse() method and receives the input DOM via the setProperty() method.

The core of DOMReader is the processNode() method, which generates corresponding SAX events from various types of DOM nodes. It is not difficult to understand this method if you are familiar with DOM and SAX. Because there are no ways to detect ignorable whitespace with the DOM API, DOMReader throws characters() events instead of ignorableWhitespace() events.

This program, shown in Listing 5.8, assumes that the input DOM tree has namespace information. The Xerces API enables namespaces by default. If you use JAXP, remember to enable the namespace feature explicitly. (See Section 4.3.5.)
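
The effect of the JAXP namespace setting can be checked with a small sketch such as the following, which parses the same text with namespace awareness off and on and inspects the root element. The class name NamespaceAwareSketch and the method parseRoot are hypothetical.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

public class NamespaceAwareSketch {
    /** Parses XML text and returns the root element. */
    static Element parseRoot(String xml, boolean namespaceAware)
            throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        // JAXP factories are not namespace-aware unless asked to be.
        factory.setNamespaceAware(namespaceAware);
        Document doc = factory.newDocumentBuilder()
                              .parse(new InputSource(new StringReader(xml)));
        return doc.getDocumentElement();
    }

    public static void main(String[] args) throws Exception {
        String xml = "<foo:root xmlns:foo='urn:x-foo'/>";
        Element plain = parseRoot(xml, false);
        Element aware = parseRoot(xml, true);
        System.out.println("not aware: uri=" + plain.getNamespaceURI()
                           + " local=" + plain.getLocalName());
        System.out.println("aware:     uri=" + aware.getNamespaceURI()
                           + " local=" + aware.getLocalName());
    }
}
```

A tree built without namespace awareness carries no namespace URIs, so a converter like DOMReader that calls getNamespaceURI() and getLocalName() would have nothing useful to report in its startElement() events.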

Listing 5.8 Convert a DOM tree to SAX events, chap05/DOMReader.java
package chap05;

...[snip]...

 public class DOMReader implements XMLReader {

   ... [snip] ...

   protected void parse() throws SAXException {
      this.processNode(this.current);
      this.current = null;
   }
   public void parse(InputSource input)
          throws IOException, SAXException {
      this.parse();
   }
   public void parse(String sysid)
          throws IOException, SAXException {
      this.parse();
   }

   protected void processNode(Node node) throws SAXException {
      char[] chars;

      this.current = node;
      switch (node.getNodeType()) {
      case Node.COMMENT_NODE:
         chars = node.getNodeValue().toCharArray();
         this.lhandler.comment(chars, 0, chars.length);
         break;

      case Node.DOCUMENT_FRAGMENT_NODE:
         this.processChildren(node);
         break;

      case Node.DOCUMENT_NODE:
         this.chandler.startDocument();
         this.processChildren(node);
         this.chandler.endDocument();
         break;

      case Node.ELEMENT_NODE:
         // Prepares attributes
         if (this.attrs == null)
            this.attrs = new AttributesImpl();
         NamedNodeMap map = node.getAttributes();
         String[] prefixes = new String[map.getLength()];
         int nprefixes = 0;
         for (int i = 0; i < map.getLength(); i++) {
            Attr attr = (Attr)map.item(i);
            String qname = attr.getNodeName();
            // Namespace declarations become prefix-mapping events
            if (this.namespaces
                && (qname.equals("xmlns") ||
                    qname.startsWith("xmlns:"))) {
               String prefix = "";
               int colon = qname.indexOf(':');
               if (colon >= 0)
                  prefix = qname.substring(colon+1);
               this.chandler.startPrefixMapping(
                      prefix, attr.getNodeValue());
               prefixes[nprefixes++] = prefix;
               if (!this.namespaceprefixes)
                  continue;
            }
            this.attrs.addAttribute(attr.getNamespaceURI(),
                                    attr.getLocalName(),
                                    qname,
                                    "CDATA",
                                    attr.getNodeValue());
         }

         String uri = node.getNamespaceURI();
         String lname = node.getLocalName();
         String qname = node.getNodeName();
         if (uri == null)
            uri = "";
         this.chandler.startElement(uri, lname, qname, attrs);
         this.attrs.clear();
         this.processChildren(node);
         this.chandler.endElement(uri, lname, qname);

         // Close the prefix mappings opened for this element
         if (this.namespaces)
            while (nprefixes > 0)
               this.chandler.endPrefixMapping(
                      prefixes[--nprefixes]);
         break;

      case Node.ENTITY_REFERENCE_NODE:
         this.lhandler.startEntity(node.getNodeName());
         this.processChildren(node);
         this.lhandler.endEntity(node.getNodeName());
         break;

      case Node.PROCESSING_INSTRUCTION_NODE:
         this.chandler.processingInstruction(node.getNodeName(),
                                             node.getNodeValue());
         break;

      case Node.CDATA_SECTION_NODE:
         chars = node.getNodeValue().toCharArray();
         this.lhandler.startCDATA();
         this.chandler.characters(chars, 0, chars.length);
         this.lhandler.endCDATA();
         break;

      case Node.TEXT_NODE:
         // DOM does not provide information whether
         // text is ignorable or not.
         chars = node.getNodeValue().toCharArray();
         this.chandler.characters(chars, 0, chars.length);
         break;

      case Node.DOCUMENT_TYPE_NODE:
         this.lhandler.startDTD(
                node.getNodeName(),
                ((DocumentType)node).getPublicId(),
                ((DocumentType)node).getSystemId());
         // Ignore contents.
         this.lhandler.endDTD();
         break;

      case Node.ATTRIBUTE_NODE:
      case Node.ENTITY_NODE:
      case Node.NOTATION_NODE:
         throw new IllegalArgumentException
             ("Internal Error: Non-supported node");
      }
   }

   protected void processChildren(Node parent)
          throws SAXException {
      for (Node child = parent.getFirstChild();
          child != null;
          child = child.getNextSibling()) {
         this.processNode(child);
      }
   }

   ... [snip] ...
}

Listing 5.9 is an example of using DOMReader. It prints SAX events directly created from an XML document by a SAX parser and also prints SAX events created from a DOM tree by DOMReader. SAXMonitor, used in this program, is a class to print SAX events to the console.

Listing 5.9 An example using DOMReader, chap05/DOM2SAX.java
package chap05;

import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.XMLReaderFactory;

public class DOM2SAX {
   private static final String PROP_LEX
          = "http://xml.org/sax/properties/lexical-handler";
   private static final String PROP_DOM
          = "http://xml.org/sax/properties/dom-node";

   public static void main(String[] argv) {
       try {
          SAXMonitor mon = new SAXMonitor();

          System.out.println("--- SAX ---");
          XMLReader xreader = XMLReaderFactory.createXMLReader(
                 "org.apache.xerces.parsers.SAXParser");
          xreader.setContentHandler(mon);
          xreader.setProperty(PROP_LEX, mon);
          xreader.parse(argv[0]);

          System.out.println("--- DOM -> SAX ---");
          // Create namespace-aware DOM tree
          DocumentBuilderFactory factory
             = DocumentBuilderFactory.newInstance();
          factory.setNamespaceAware(true);
          DocumentBuilder builder = factory.newDocumentBuilder();
          Document doc = builder.parse(argv[0]);

          DOMReader reader = new DOMReader();
          // Register a monitor
          reader.setContentHandler(mon);
          reader.setProperty(PROP_LEX, mon);
          // Set the source DOM tree
          reader.setProperty(PROP_DOM, doc);
          // Start producing events
          reader.parse();
      } catch (Exception e) {
          e.printStackTrace();
      }
   }
}

The following is the result of running the program.

R:\samples>java chap05.DOM2SAX file:./chap05/nstest.xml
--- SAX ---
setDocumentLocator: line=1 column=1
startDocument
startPrefixMapping: foo=urn:x-foo
startElement: {urn:x-foo}root (foo:root),
characters: length=3 '\n  '
startPrefixMapping: foo=urn:x-foo
startElement: {}foo2 (foo2),
endElement: foo2
endPrefixMapping: foo
characters: length=3 '\n  '
startElement: {}bar (bar), foo:id='id'
comment: <!-- foo foo -->
characters: length=11 '\n    aaa\n  '
endElement: bar
characters: length=3 '\n  '
processingInstruction: <?target ?>
characters: length=3 '\n  '
startCDATA: <![CDATA[
characters: length=11 '<foo></foo>'
endCDATA: ]]>
characters: length=1 '\n'
endElement: foo:root
endPrefixMapping: foo
endDocument
--- DOM -> SAX ---
startDocument
startPrefixMapping: foo=urn:x-foo
startElement: {urn:x-foo}root (foo:root),
characters: length=3 '\n  '
startPrefixMapping: foo=urn:x-foo
startElement: {}foo2 (foo2),
endElement: foo2
endPrefixMapping: foo
characters: length=3 '\n  '
startElement: {}bar (bar), foo:id='id'
comment: <!-- foo foo -->
characters: length=11 '\n    aaa\n  '
endElement: bar
characters: length=3 '\n  '
processingInstruction: <?target ?>
characters: length=3 '\n  '
startCDATA: <![CDATA[
characters: length=11 '<foo></foo>'
endCDATA: ]]>
characters: length=1 '\n'
endElement: foo:root
endPrefixMapping: foo
endDocument

The setDocumentLocator() method of the ContentHandler interface is not called because DOM nodes have no line or column information. However, apart from the setDocumentLocator() event, the events from SAX and those from DOM-then-SAX are identical.
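
For comparison, a DOM tree can also be replayed as SAX events with the standard JAXP transformation API instead of a hand-written reader. The following sketch is a different technique from DOMReader, not a variant of it: it runs an identity transformation from a DOMSource to a SAXResult and records the element names it receives. The class name DomToSaxSketch and the method elementEvents are hypothetical.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.sax.SAXResult;
import org.w3c.dom.Document;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

public class DomToSaxSketch {
    /** Builds a DOM tree, replays it as SAX events, and returns element names. */
    static List<String> elementEvents(String xml) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true);
        Document doc = factory.newDocumentBuilder()
                              .parse(new InputSource(new StringReader(xml)));

        final List<String> names = new ArrayList<>();
        DefaultHandler recorder = new DefaultHandler() {
            @Override
            public void startElement(String uri, String local,
                                     String qname, Attributes atts) {
                names.add(qname);
            }
        };
        // A Transformer created without a stylesheet copies its input.
        TransformerFactory.newInstance().newTransformer()
            .transform(new DOMSource(doc), new SAXResult(recorder));
        return names;
    }

    public static void main(String[] args) throws Exception {
        System.out.println(elementEvents("<root><bar/><foo2/></root>"));
    }
}
```

Writing the traversal by hand, as DOMReader does, keeps full control over which events are generated for each node type; the transformer route trades that control for less code.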

Converting SAX Events to a DOM Tree

In contrast to DOMReader, the DOMConstructor class creates a DOM tree from SAX events.

Though the processNode() method is the core of DOMReader, DOMConstructor has no such core methods. Rather, all its event handler methods collectively create the DOM tree from the input events. This difference between DOMReader and DOMConstructor is caused by the differences in the programming models of DOM and SAX.

Because both normal character data and CDATA sections are represented by characters() events, we cannot distinguish CDATA sections from characters() events by examining characters() events only. To distinguish CDATA sections, we have to track the status of whether a parser is processing CDATA sections or not by checking startCDATA() and endCDATA(). As for entity references, we also have to check startEntity() and endEntity() to know whether or not a parser is processing an entity reference.
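
The state tracking described here can be sketched in isolation as follows. The handler sets a flag in startCDATA(), clears it in endCDATA(), and uses the flag in characters() to decide where the text belongs. It uses the JAXP SAXParserFactory and DefaultHandler2; the names CdataTrackingSketch, Tracker, and track are hypothetical.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;
import org.xml.sax.ext.DefaultHandler2;

public class CdataTrackingSketch {
    /** Separates CDATA text from normal text by tracking lexical events. */
    static class Tracker extends DefaultHandler2 {
        final StringBuilder cdata = new StringBuilder();
        final StringBuilder text = new StringBuilder();
        private boolean inCdata = false;

        @Override
        public void startCDATA() { inCdata = true; }

        @Override
        public void endCDATA() { inCdata = false; }

        @Override
        public void characters(char[] ch, int start, int length) {
            // The event itself does not say where the characters came from.
            (inCdata ? cdata : text).append(ch, start, length);
        }
    }

    static Tracker track(String xml) throws Exception {
        Tracker tracker = new Tracker();
        XMLReader reader =
            SAXParserFactory.newInstance().newSAXParser().getXMLReader();
        reader.setContentHandler(tracker);
        // Without this registration, startCDATA()/endCDATA() are never called.
        reader.setProperty(
            "http://xml.org/sax/properties/lexical-handler", tracker);
        reader.parse(new InputSource(new StringReader(xml)));
        return tracker;
    }

    public static void main(String[] args) throws Exception {
        Tracker t = track("<r>ab<![CDATA[<x>]]>cd</r>");
        System.out.println("text=" + t.text + " cdata=" + t.cdata);
    }
}
```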

Methods such as startCDATA(), startEntity(), and comment() belong to the LexicalHandler interface. They are therefore not called if the SAX parser does not support LexicalHandler or if the application does not register the DOMConstructor instance as a LexicalHandler with the SAX parser. In this case,

  • No Comment nodes are generated.

  • No EntityReference nodes are created and the contents of entity references are appended directly.

  • Text nodes are generated instead of CDATASection nodes.

These differences do not change the meaning of an XML document, though they change its lexical representation.

The type of an output node of DOMConstructor depends on the input SAX events. We get a Document node if the input SAX events start with startDocument() and end with endDocument(). Meanwhile, we get an Element node if the input SAX events start with startElement() and end with endElement(). To convert part of an XML document to a DOM tree, you can create a SAX filter to discard unnecessary events. See Listing 5.10.

Listing 5.10 Convert SAX events to a DOM tree, chap05/DOMConstructor.java
package chap05;

import org.xml.sax.Locator;
import org.xml.sax.Attributes;
import org.xml.sax.SAXException;
import org.xml.sax.ContentHandler;
import org.xml.sax.ext.LexicalHandler;
import java.util.Vector;
import java.util.Stack;
import org.w3c.dom.Attr;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.ProcessingInstruction;

public class DOMConstructor
       implements ContentHandler, LexicalHandler {

   public static final String XMLNS_NSURI
          = "http://www.w3.org/2000/xmlns/";

   Node contextNode = null;
   Stack contextStack;
   Document factory;
   boolean inCdata = false;
   Vector prefixes = null;
   StringBuffer buffer = null;

   /**
    * Create new DOMConstructor instance.
    * @param factory Factory instance to be used for creating nodes.
    */
   public DOMConstructor(Document factory) {
      this.factory = factory;
      this.contextStack = new Stack();
   }

   /**
    * Return created DOM node.
    */
   public Node getNode() {
      this.flushText();
      return this.contextNode;
   }

   protected void output(Node node) {
      if (this.contextNode == null) {
         this.contextNode = node;
      } else {
          this.contextNode.appendChild(node);
      }
   }

   protected void pushContext(Node newContext) {
      this.contextStack.push(this.contextNode);
      this.contextNode = newContext;
   }
   protected Node popContext() {
      Node ret = this.contextNode;
      this.contextNode = (Node)this.contextStack.pop();
      return ret;
   }
   protected void flushText() {
      if (this.buffer == null || this.buffer.length() == 0)
         return;
      String text = new String(this.buffer);
      if (this.inCdata) {
         this.output(this.factory.createCDATASection(text));
      } else {
         this.output(this.factory.createTextNode(text));
      }
      this.buffer.setLength(0);
   }

   // Text and CDATA section
   public void startCDATA() throws SAXException {
      this.flushText();
      this.inCdata = true;
   }
   public void endCDATA() throws SAXException {
      this.flushText();
      this.inCdata = false;
   }
   public void characters(char[] ch, int start, int length)
      throws SAXException {
      if (this.buffer == null)
         this.buffer = new StringBuffer();
      this.buffer.append(ch, start, length);
   }
   public void ignorableWhitespace(char[] ch, int start, int len)
      throws SAXException {
      this.characters(ch, start, len);
   }

   public void processingInstruction(String target, String data)
      throws SAXException {
      this.flushText();
      ProcessingInstruction pi;
      pi = this.factory.createProcessingInstruction(target, data);
      this.output(pi);
   }
   public void comment(char[] ch, int start, int length)
      throws SAXException {
      this.flushText();
      String data = new String(ch, start, length);
      this.output(this.factory.createComment(data));
   }
   public void startDocument() throws SAXException {
      this.pushContext(this.factory);
   }
   public void endDocument() throws SAXException {
      this.output(this.popContext());
   }

   public void startPrefixMapping(String prefix, String uri)
      throws SAXException {
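      // Keep every declaration reported before the next startElement().
      // Clearing the list here would lose all but the last declaration
      // of an element; startElement() empties it after attaching them.
      if (this.prefixes == null)
         this.prefixes = new Vector();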

      String qname = "xmlns";
      if (prefix.length() > 0)
         qname = "xmlns:"+prefix;
      Attr attr = this.factory.createAttributeNS(XMLNS_NSURI,
                                                       qname);
      attr.setNodeValue(uri);
      this.prefixes.addElement(attr);
   }
   public void startElement(String uri, String local,
                         String qname, Attributes atts)
      throws SAXException {
      this.flushText();

      Element elem = this.factory.createElementNS(uri, qname);
      for (int i = 0; i < atts.getLength(); i++) {
         elem.setAttributeNS(atts.getURI(i),
                          atts.getQName(i),
                          atts.getValue(i));
      }
      if (this.prefixes != null && this.prefixes.size() > 0) {
         for (int i = 0; i < this.prefixes.size(); i++) {
            Attr attr = (Attr)this.prefixes.elementAt(i);
            elem.setAttributeNode(attr);
         }
         this.prefixes.removeAllElements();
      }
      this.pushContext(elem);
   }
   public void endElement(String uri, String local, String qname)
      throws SAXException {
      this.flushText();
      this.output(this.popContext());
   }
   public void endPrefixMapping(String prefix)
      throws SAXException {
   }


   // EntityReference
   public void startEntity(String name) throws SAXException {
      this.flushText();
      Node entityref = this.factory.createEntityReference(name);
      this.pushContext(entityref);
   }
   public void endEntity(String name) throws SAXException {
      this.flushText();
      this.output(this.popContext());
   }

   // DOCTYPE: ignored
   public void startDTD(String root, String p, String s)
      throws SAXException {
   }
   public void endDTD() throws SAXException {
   }

   public void setDocumentLocator(Locator locator) {
   }
   public void skippedEntity(String name) throws SAXException {
   }
}
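
The SAX filter mentioned before Listing 5.10 could look roughly like the following sketch. It extends XMLFilterImpl and forwards only the events that fall inside the first element with a given name, so that a downstream handler such as DOMConstructor would see a stream that starts with startElement() and ends with endElement(). The names SubtreeFilterSketch, SubtreeFilter, and forwardedEvents are hypothetical, and for brevity the sketch does not filter prefix mappings, processing instructions, or ignorable whitespace.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;
import org.xml.sax.helpers.XMLFilterImpl;

public class SubtreeFilterSketch {
    /** Forwards only the events inside the first element named target. */
    static class SubtreeFilter extends XMLFilterImpl {
        private final String target;
        private int depth = 0;          // > 0 while inside the subtree
        private boolean done = false;   // true once the subtree has ended

        SubtreeFilter(XMLReader parent, String target) {
            super(parent);
            this.target = target;
        }

        // Document boundaries are discarded on purpose.
        @Override public void startDocument() { }
        @Override public void endDocument() { }

        @Override
        public void startElement(String uri, String local, String qname,
                                 Attributes atts) throws SAXException {
            if (depth > 0 || (!done && qname.equals(target))) {
                depth++;
                super.startElement(uri, local, qname, atts);
            }
        }

        @Override
        public void endElement(String uri, String local, String qname)
                throws SAXException {
            if (depth > 0) {
                super.endElement(uri, local, qname);
                if (--depth == 0)
                    done = true;
            }
        }

        @Override
        public void characters(char[] ch, int start, int length)
                throws SAXException {
            if (depth > 0)
                super.characters(ch, start, length);
        }
    }

    /** Returns the element events that pass through the filter. */
    static List<String> forwardedEvents(String xml, String target)
            throws Exception {
        final List<String> events = new ArrayList<>();
        SAXParserFactory spf = SAXParserFactory.newInstance();
        spf.setNamespaceAware(true);
        SubtreeFilter filter =
            new SubtreeFilter(spf.newSAXParser().getXMLReader(), target);
        filter.setContentHandler(new DefaultHandler() {
            @Override
            public void startDocument() { events.add("startDocument"); }
            @Override
            public void startElement(String u, String l, String q,
                                     Attributes a) { events.add("start:" + q); }
            @Override
            public void endElement(String u, String l, String q) {
                events.add("end:" + q);
            }
        });
        filter.parse(new InputSource(new StringReader(xml)));
        return events;
    }

    public static void main(String[] args) throws Exception {
        System.out.println(forwardedEvents(
            "<root><a>1</a><keep><b>x</b></keep><a>2</a></root>", "keep"));
    }
}
```

To build a DOM subtree, the recording handler in forwardedEvents() would be replaced by a DOMConstructor instance registered as the filter's ContentHandler.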

The program SAX2DOM (see Listing 5.11) is an example of converting SAX events to a DOM tree with DOMConstructor. It prints the structure of a DOM tree parsed by a DOM parser and the structure of another DOM tree converted from SAX events. We must not disable the SAX namespace feature because DOMConstructor uses it. DOMMonitor, used in Listing 5.11, is a class for printing the structure of a DOM tree.

Listing 5.11 An example using DOMConstructor, chap05/SAX2DOM.java
package chap05;

import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.XMLReaderFactory;

public class SAX2DOM {
   private static final String PROP_LEX
          = "http://xml.org/sax/properties/lexical-handler";

   public static void main(String[] argv) {
      try {
         System.out.println("--- DOM ---");
         DocumentBuilderFactory factory
                = DocumentBuilderFactory.newInstance();
         factory.setNamespaceAware(true);
         DocumentBuilder builder = factory.newDocumentBuilder();
         DOMMonitor.dump(builder.parse(argv[0]), 0);

         System.out.println("--- SAX -> DOM ---");
         // This builder is namespace-aware
         Document doc = builder.newDocument();
         DOMConstructor con = new DOMConstructor(doc);
         XMLReader xreader = XMLReaderFactory.createXMLReader(
                "org.apache.xerces.parsers.SAXParser");
         // Register the DOMConstructor as
         // ContentHandler and LexicalHandler
         xreader.setContentHandler(con);
         xreader.setProperty(PROP_LEX, con);
         // Start construction
         xreader.parse(argv[0]);
         // Examine the result
         DOMMonitor.dump(con.getNode(), 0);
       } catch (Exception ex) {
           ex.printStackTrace();
       }
   }
}

The result of running SAX2DOM follows. We can see that two identical tree structures are created.

R:\samples>java chap05.SAX2DOM file:./chap05/nstest.xml
--- DOM ---
#document
   ELEMENT: foo:root xmlns:foo='urn:x-foo'
      #text
      ELEMENT: foo2 xmlns:foo='urn:x-foo'
      #text
      ELEMENT: bar foo:id='id'
         <!-- foo foo -->
         #text
      #text
      <?target ?>
      #text
      #cdata-section
      #text
--- SAX -> DOM ---
#document
   ELEMENT: foo:root xmlns:foo='urn:x-foo'
      #text
      ELEMENT: foo2 xmlns:foo='urn:x-foo'
      #text
      ELEMENT: bar foo:id='id'
         <!-- foo foo -->
         #text
      #text
      <?target ?>
      #text
      #cdata-section
      #text
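
For comparison, the standard JAXP transformation API can also build a DOM tree from SAX events. The following sketch is a different technique from DOMConstructor: it obtains an identity TransformerHandler, directs its output to a DOMResult, and registers it with a SAX parser as both ContentHandler and LexicalHandler. It assumes that the TransformerFactory in use is a SAXTransformerFactory, which is not guaranteed for every implementation. The class name SaxToDomSketch and the method build are hypothetical.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMResult;
import javax.xml.transform.sax.SAXTransformerFactory;
import javax.xml.transform.sax.TransformerHandler;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;

public class SaxToDomSketch {
    /** Parses XML text with SAX and collects the events into a DOM tree. */
    static Document build(String xml) throws Exception {
        // Assumption: the default factory supports SAX input and output.
        SAXTransformerFactory stf =
            (SAXTransformerFactory) TransformerFactory.newInstance();
        TransformerHandler handler = stf.newTransformerHandler();
        DOMResult result = new DOMResult();
        handler.setResult(result);

        SAXParserFactory spf = SAXParserFactory.newInstance();
        spf.setNamespaceAware(true);
        XMLReader reader = spf.newSAXParser().getXMLReader();
        // TransformerHandler is a ContentHandler and a LexicalHandler.
        reader.setContentHandler(handler);
        reader.setProperty(
            "http://xml.org/sax/properties/lexical-handler", handler);
        reader.parse(new InputSource(new StringReader(xml)));
        return (Document) result.getNode();
    }

    public static void main(String[] args) throws Exception {
        Document doc = build("<root><bar>aaa</bar></root>");
        System.out.println(doc.getDocumentElement().getNodeName());
    }
}
```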