
XML and Java: Developing Web Applications, Second Edition


4.3 Advanced DOM

By now you should be able to create and manipulate any tree consisting of nodes. We discuss the advanced use and the pitfalls of DOM in this section.

4.3.1 How to Simplify Your Code by Removing Entity References

Special attention is needed when an XML document may contain general entity references, such as &foo;, which are to be replaced by their definitions. See the example in Table 4.2.

Table 4.2. A Document without Entity References and a Document with an Entity Reference

DOCUMENT A

<root>
  <first>Ichibanme</first>
  <second>Nibanme</second>
</root>

DOCUMENT B

<!DOCTYPE root [
  <!ENTITY first "<first>Ichibanme</first>">
]>
<root>
  &first;
  <second>Nibanme</second>
</root>

Documents A and B are almost identical. In fact, validity should be checked after the entity reference &first; is replaced by its value, <first>Ichibanme</first>.

However, because the XML 1.0 Recommendation requires an XML processor to preserve entity references, the DOM trees for A and for B are different, as shown in Figure 4.10.

Figure 4.10. DOM trees for documents A and B

graphics/04fig10.gif

In the DOM tree for document B, an EntityReference node is inserted where an entity reference appeared in the XML document. Therefore, if you do not take into account the possibility of entity references, you might miss elements. For instance, do not assume, when coding your program, that the first element in the children of the root element must be a first element because it is specified in the content model of the root element. The first child element is a second element in the case of document B.

If your application does not need to know whether entity references are used, you can simplify your code by avoiding EntityReference nodes altogether. With Xerces, you can turn off EntityReference creation with the following code.

DOMParser parser = new DOMParser();
parser.setFeature("http://apache.org/xml/features/dom/create-entity-ref-nodes", false);
parser.parse(...);
Document doc = parser.getDocument();
// doc contains no EntityReference nodes.

With JAXP, you do not have to do anything, because JAXP does not create EntityReference nodes during parsing by default.
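The JAXP default can be checked with a small experiment. The following is a minimal sketch, not part of the original samples: the class name EntityExpansionSketch and its helper method are hypothetical, and the sketch assumes the JAXP parser bundled with a current JDK. It parses document B from Table 4.2 and reports the first child element of the root element. With entity references expanded, that element should be first rather than second.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

class EntityExpansionSketch {
    // Document B from Table 4.2.
    static final String DOCUMENT_B =
        "<!DOCTYPE root [\n"
        + "  <!ENTITY first \"<first>Ichibanme</first>\">\n"
        + "]>\n"
        + "<root>\n"
        + "  &first;\n"
        + "  <second>Nibanme</second>\n"
        + "</root>";

    /** Parses xml with JAXP and returns the name of the root's first child element. */
    static String firstChildElementName(String xml) {
        Document doc;
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            // Already the JAXP default; stated explicitly to document the intent.
            factory.setExpandEntityReferences(true);
            doc = factory.newDocumentBuilder()
                         .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException("could not parse the sample", e);
        }
        // Skip whitespace Text nodes and return the first Element child.
        for (Node child = doc.getDocumentElement().getFirstChild();
             child != null;
             child = child.getNextSibling()) {
            if (child.getNodeType() == Node.ELEMENT_NODE) {
                return child.getNodeName();
            }
        }
        return null;
    }

    public static void main(String[] args) {
        System.out.println(firstChildElementName(DOCUMENT_B));
    }
}
```

Code that must also work with parsers configured to keep EntityReference nodes should still descend into those nodes instead of relying on this default.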

4.3.2 Tree Traversal

You might want to visit all the descendant nodes of a given node. A code fragment for such a visit is shown here.

public void processNodeRecursively(Node node) {
   ...process node
   for (Node child = node.getFirstChild();
          child != null;
          child = child.getNextSibling()) {
       processNodeRecursively(child);
   }
}

Starting at the name element in Figure 4.11, the code visits the nodes in this order:

  1. Element node of the name name

  2. Text node that has "\n "

  3. Element node of the name given

  4. Text node that has "John"

  5. Text node that has "\n "

  6. Element node of the name family

  7. Text node that has "Doe"

  8. Text node that has "\n"

Figure 4.11. Processing descendants

graphics/04fig11.gif
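The visiting order listed above can be reproduced with a small program. This is a sketch under assumptions: the class name TraversalOrderSketch is hypothetical, and the XML text is a guess at the source of the tree in Figure 4.11, reconstructed from the eight nodes in the list. The visit() method has the same shape as processNodeRecursively(), with "process node" replaced by logging.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

class TraversalOrderSketch {
    // Assumed source text for the tree in Figure 4.11.
    static final String NAME_XML =
        "<name>\n <given>John</given>\n <family>Doe</family>\n</name>";

    /** Records the node first, then recurses into its children in order. */
    static void visit(Node node, List<String> log) {
        if (node.getNodeType() == Node.TEXT_NODE) {
            log.add("text:" + node.getNodeValue());
        } else {
            log.add("element:" + node.getNodeName());
        }
        for (Node child = node.getFirstChild();
             child != null;
             child = child.getNextSibling()) {
            visit(child, log);
        }
    }

    /** Parses xml and returns the visiting order starting at the root element. */
    static List<String> visitOrder(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
            List<String> log = new ArrayList<>();
            visit(doc.getDocumentElement(), log);
            return log;
        } catch (Exception e) {
            throw new IllegalStateException("could not parse the sample", e);
        }
    }

    public static void main(String[] args) {
        for (String entry : visitOrder(NAME_XML)) {
            // Make line breaks inside whitespace text visible.
            System.out.println(entry.replace("\n", "\\n"));
        }
    }
}
```

Note that the whitespace between elements shows up as Text nodes of its own, which is why the list contains eight entries and not five.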

The code uses only the methods described in Section 4.2. However, DOM Level 2, issued in November 2000, introduced new interfaces for traversing a tree. They are NodeIterator, TreeWalker, NodeFilter, and DocumentTraversal. The following code fragment behaves the same as the previous code fragment using recursion.

DocumentTraversal trav;
NodeIterator iterator;
Node child;
trav = (DocumentTraversal)node.getOwnerDocument();
iterator = trav.createNodeIterator(node,
                                   NodeFilter.SHOW_ALL,
                                   null,
                                   false);
while ((child = iterator.nextNode()) != null) {
   ...process child
}
iterator.detach();

The advantages of these DOM Level 2 interfaces are:

  • You can specify the node type that you are interested in before starting the loop.

  • You can change the direction of traversal anytime.

Note that Crimson, the default parser of JAXP 1.1, does not implement DOM traversal as of this writing. Xerces does.
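Both advantages can be seen in one short sketch. It is illustrative only: the class name ElementIteratorSketch and the sample XML are hypothetical, and the cast assumes a DOM implementation that supports traversal (the sketch checks this with instanceof before casting). The whatToShow argument NodeFilter.SHOW_ELEMENT restricts the iteration to Element nodes, and previousNode() walks back over the nodes that nextNode() has already returned.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.traversal.DocumentTraversal;
import org.w3c.dom.traversal.NodeFilter;
import org.w3c.dom.traversal.NodeIterator;
import org.xml.sax.InputSource;

class ElementIteratorSketch {
    static final String NAME_XML =
        "<name>\n <given>John</given>\n <family>Doe</family>\n</name>";

    /** Parses xml and returns its root element. */
    static Node parseRoot(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
            return doc.getDocumentElement();
        } catch (Exception e) {
            throw new IllegalStateException("could not parse the sample", e);
        }
    }

    /** Visits only elements under root, first forward, then backward. */
    static List<String> forwardThenBack(Node root) {
        Document owner = root.getOwnerDocument();
        if (!(owner instanceof DocumentTraversal)) {
            throw new UnsupportedOperationException("DOM traversal not supported");
        }
        NodeIterator iterator = ((DocumentTraversal) owner).createNodeIterator(
            root, NodeFilter.SHOW_ELEMENT, null, false);
        List<String> visited = new ArrayList<>();
        Node node;
        // Forward pass: Text nodes are filtered out by SHOW_ELEMENT.
        while ((node = iterator.nextNode()) != null) {
            visited.add(node.getNodeName());
        }
        // Change direction: walk back toward the root.
        while ((node = iterator.previousNode()) != null) {
            visited.add("back:" + node.getNodeName());
        }
        iterator.detach();
        return visited;
    }

    public static void main(String[] args) {
        System.out.println(forwardThenBack(parseRoot(NAME_XML)));
    }
}
```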

4.3.3 DOM Collection Is Live

Given a node, how do we remove all its child nodes? You might think it is easy, but it is not.

Suppose you write the following code to remove all children of a node.

for (Node child = node.getFirstChild();
      child != null;
      child = child.getNextSibling()) {
   node.removeChild(child);
}

This looks like a straightforward implementation, but unfortunately, it does not work properly. It simply removes the first child of the node and exits the loop. Here is how it works, or more precisely, how it does not work.

  1. In the initialization of the for loop, the first child of the node is assigned to child (if there are no children, this value is null).

  2. The expression child != null is true, so this check passes and the body of the loop is executed.

  3. The statement node.removeChild(child) removes the first child from node. As soon as the child is removed, its internal variables are updated, and child.getParentNode(), child.getPreviousSibling(), and child.getNextSibling() all now return null.

  4. In the third component of for (that is, child = child.getNextSibling()), child is set to null and the loop terminates.

Even if you use getChildNodes() and NodeList instead of getNextSibling(), as shown next, it still does not work. This is because getChildNodes() returns not the "snapshot" of the child nodes but a "live" data structure that will be updated immediately whenever any changes are made on the node from which it was created.

NodeList nodeList = node.getChildNodes();
for (int i = 0; i < nodeList.getLength(); i++) {
   node.removeChild(nodeList.item(i));
}

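Each removal shifts the remaining children down by one position: after item(0) is removed, the former second child becomes item(0), but i has already advanced to 1. Thus, this code fails to remove the second child, fourth child, and so forth.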

To remove all children properly, you can use either of the following code fragments. Both implement the strategy of "remove the first child node until there are no more child nodes."

while (node.hasChildNodes()) {
   node.removeChild(node.getFirstChild());
}

NodeList nodeList = node.getChildNodes();
while (nodeList.getLength() > 0) {
   node.removeChild(nodeList.item(0));
}
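The difference between the three loops is easy to observe by counting the children that survive. The sketch below is illustrative: the class name RemoveChildrenSketch, its method names, and the four child element names are hypothetical, and the counts in the comments assume the DOM implementation bundled with the JDK.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

class RemoveChildrenSketch {
    /** Builds a parent element with four empty child elements. */
    static Element newParent() {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder().newDocument();
            Element parent = doc.createElement("parent");
            for (String name : new String[] {"a", "b", "c", "d"}) {
                parent.appendChild(doc.createElement(name));
            }
            return parent;
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException(e);
        }
    }

    /** Broken: the removed child no longer has a next sibling. */
    static int siblingLoop(Element node) {
        for (Node child = node.getFirstChild();
             child != null;
             child = child.getNextSibling()) {
            node.removeChild(child);
        }
        return node.getChildNodes().getLength();   // 3 children remain
    }

    /** Broken: the live NodeList shrinks while the index grows. */
    static int indexLoop(Element node) {
        NodeList nodeList = node.getChildNodes();
        for (int i = 0; i < nodeList.getLength(); i++) {
            node.removeChild(nodeList.item(i));
        }
        return node.getChildNodes().getLength();   // b and d remain
    }

    /** Correct: remove the first child until none is left. */
    static int firstChildLoop(Element node) {
        while (node.hasChildNodes()) {
            node.removeChild(node.getFirstChild());
        }
        return node.getChildNodes().getLength();   // 0 children remain
    }

    public static void main(String[] args) {
        System.out.println("sibling loop leaves " + siblingLoop(newParent()));
        System.out.println("index loop leaves " + indexLoop(newParent()));
        System.out.println("first-child loop leaves " + firstChildLoop(newParent()));
    }
}
```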

4.3.4 Moving Nodes over Documents

Each DOM node is owned by a Document node that created the node. We can use the getOwnerDocument() method of the Node interface to obtain the owner Document node. The getOwnerDocument() method for a Document node always returns null.

You cannot insert a DOM node into a document other than its owner. The following code throws a DOMException.

DOMParser parser = new DOMParser();
parser.parse(...);
Document doc1 = parser.getDocument();
parser.parse(...);
Document doc2 = parser.getDocument();

doc1.getDocumentElement().appendChild(doc2.getDocumentElement());
// DOMException is thrown at the above line.

To move nodes across the document boundary, use doc.importNode(Node src, boolean deep) of the Document interface. The importNode() method is a kind of factory method. It creates a copy of the specified src node, which belongs to another document (and of its descendants if the deep parameter is true), and the copy is owned by the doc document.

The following is a sample using importNode().

DOMParser parser = new DOMParser();
parser.parse(...);
Document doc1 = parser.getDocument();
parser.parse(...);
Document doc2 = parser.getDocument();

Element root2 = doc2.getDocumentElement();
Node root2forDoc1 = doc1.importNode(root2, true);
doc1.getDocumentElement().appendChild(root2forDoc1);

In this case, doc2 is not modified at all.
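The following self-contained sketch puts the failing and the working variants side by side. It uses JAXP instead of the Xerces DOMParser so that it runs on a plain JDK; the class name ImportNodeSketch, its method names, and the tiny sample documents are hypothetical. The error code it reports for the failing case is WRONG_DOCUMENT_ERR with the JDK's bundled DOM implementation.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.DOMException;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

class ImportNodeSketch {
    /** Parses xml into a new Document. */
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException("could not parse the sample", e);
        }
    }

    /**
     * Tries to append doc2's root directly under doc1's root.
     * Returns the DOMException code, or 0 if no exception was thrown.
     */
    static short appendForeign(Document doc1, Document doc2) {
        try {
            doc1.getDocumentElement().appendChild(doc2.getDocumentElement());
            return 0;
        } catch (DOMException e) {
            return e.code;
        }
    }

    /** Copies doc2's root (deeply) into doc1 and appends the copy. */
    static void importInto(Document doc1, Document doc2) {
        Node copy = doc1.importNode(doc2.getDocumentElement(), true);
        doc1.getDocumentElement().appendChild(copy);
    }

    public static void main(String[] args) {
        Document doc1 = parse("<a/>");
        Document doc2 = parse("<b><c/></b>");
        System.out.println("wrong document: "
            + (appendForeign(doc1, doc2) == DOMException.WRONG_DOCUMENT_ERR));
        importInto(doc1, doc2);
        // The copy is owned by doc1; doc2 still has its own root.
        System.out.println(doc1.getDocumentElement().getFirstChild().getNodeName());
        System.out.println(doc2.getDocumentElement().getNodeName());
    }
}
```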

4.3.5 Namespaces in DOM

Some APIs in the previous sections are insufficient for processing documents using namespaces. Element names and attribute names in these APIs are actually treated as qualified names. The prefix in a qualified name is a placeholder for the namespace URI. Applications should use such URIs rather than prefixes.

DOM Level 2 introduces namespace-aware methods. They are named xxxNS(). Almost all methods take a namespace URI and a local name as parameters, though factory methods take a namespace URI and a qualified name.

Element nodes and Attr nodes created by namespace-aware methods support getNamespaceURI(), getLocalName(), getPrefix(), and setPrefix(). For nodes created by namespace-unaware methods, getNamespaceURI(), getLocalName(), and getPrefix() always return null.

If you parse a document and create a DOM tree by using the JAXP API, the resultant DOM tree is not namespace-aware by default. You can enable the namespace feature as follows:

DocumentBuilderFactory factory = DocumentBuilderFactory.
    newInstance();
factory.setNamespaceAware(true);
DocumentBuilder builder = factory.newDocumentBuilder();
builder.parse(...);
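The effect of setNamespaceAware() is visible on the nodes themselves. The sketch below is illustrative (the class name NamespaceAwareSketch, the helper names, and the one-element sample document are hypothetical) and assumes the JAXP parser bundled with the JDK. It parses the same document twice and reports what the root element knows about its namespace.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

class NamespaceAwareSketch {
    static final String XML = "<foo:root xmlns:foo='urn:x-foo'/>";

    /** Parses xml and returns the root element. */
    static Element parseRoot(String xml, boolean namespaceAware) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(namespaceAware);
            return factory.newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)))
                .getDocumentElement();
        } catch (Exception e) {
            throw new IllegalStateException("could not parse the sample", e);
        }
    }

    /** Summarizes the name-related properties of an element. */
    static String describe(Element el) {
        return el.getNodeName()
            + " localName=" + el.getLocalName()
            + " namespaceURI=" + el.getNamespaceURI();
    }

    public static void main(String[] args) {
        // Namespace-aware: local name and namespace URI are available.
        System.out.println(describe(parseRoot(XML, true)));
        // Not namespace-aware: both are null, only the qualified name is known.
        System.out.println(describe(parseRoot(XML, false)));
    }
}
```

A practical consequence is that code calling getLocalName() on a tree parsed without setNamespaceAware(true) has to be prepared for null.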

On the other hand, the Xerces native API creates a namespace-aware DOM tree by default. If you want to turn it off, call setFeature() like this:

DOMParser parser = new DOMParser();
parser.setFeature("http://xml.org/sax/features/namespaces", false);
parser.parse(...);
Document doc = parser.getDocument();

Next, we discuss two important points of namespace processing in DOM.

DOM Creates No Namespace Declarations Automatically

For example:

Element elem = doc.createElementNS("urn:x-foo", "foo:root");

This code creates an element whose namespace URI is urn:x-foo and prefix is foo. But a namespace declaration like xmlns:foo="urn:x-foo" is not created automatically. If you need it, you have to add it by providing code similar to the following.

elem.setAttributeNS("http://www.w3.org/2000/xmlns/",
                    "xmlns:"+elem.getPrefix(),
                    elem.getNamespaceURI());

According to the "Namespaces in XML" specification, xmlns in a namespace declaration is not a prefix, and it has no corresponding namespace URI. In DOM, it is treated as a prefix and has a namespace URI, which is http://www.w3.org/2000/xmlns/.
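The missing declaration can be confirmed by inspecting the attributes of a freshly created element. The following sketch is illustrative: the class name NamespaceDeclSketch and its methods are hypothetical, and the declare() helper also covers the case of an element without a prefix, which is an addition to the fragment shown above.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

class NamespaceDeclSketch {
    static final String XMLNS_NS = "http://www.w3.org/2000/xmlns/";

    /** Creates foo:root in the namespace urn:x-foo, as in the text. */
    static Element newFooRoot() {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            Document doc = factory.newDocumentBuilder().newDocument();
            return doc.createElementNS("urn:x-foo", "foo:root");
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException(e);
        }
    }

    /** Adds the namespace declaration that matches the element's own name. */
    static void declare(Element elem) {
        String prefix = elem.getPrefix();
        // An element without a prefix needs a default declaration (xmlns="...").
        String qname = (prefix == null) ? "xmlns" : "xmlns:" + prefix;
        elem.setAttributeNS(XMLNS_NS, qname, elem.getNamespaceURI());
    }

    public static void main(String[] args) {
        Element elem = newFooRoot();
        // No declaration exists yet, although the element has a namespace URI.
        System.out.println("declared before: " + elem.hasAttribute("xmlns:foo"));
        declare(elem);
        System.out.println("declared after: " + elem.hasAttribute("xmlns:foo"));
    }
}
```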

Adding Namespace Declarations Does Not Affect Other Nodes

You cannot modify the local name and the namespace URI of a node after creating the node. Take a look at the same example as in the last section.

Element elem = doc.createElementNS("urn:x-foo", "foo:root");

The local name of this elem element is root, and the namespace URI is urn:x-foo. Now we add a namespace declaration, xmlns:foo="urn:x-bar".

elem.setAttributeNS("http://www.w3.org/2000/xmlns/",
                    "xmlns:foo", "urn:x-bar");

In this operation, it seems as if we had the following fragment. Suppose that an ancestor element of elem already had a namespace declaration for the prefix foo.

<... xmlns:foo="urn:x-foo">
   <foo:root/>
</...>

We add a namespace declaration to the foo:root element.

<... xmlns:foo="urn:x-foo">
   <foo:root xmlns:foo="urn:x-bar"/>
</...>

In this lexical view, the namespace URI of foo:root should be changed to urn:x-bar. However, elem.getNamespaceURI() still returns urn:x-foo. You cannot change the namespace URI of a node by editing namespace declarations in DOM.

Unlike the local name and namespace URI, the prefix of a node is changeable with the setPrefix(newPrefix) method. Note that you have to add a namespace declaration for the new prefix when you change the prefix.
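Both behaviors, the fixed namespace URI and the changeable prefix, can be checked directly. This is a minimal sketch: the class name FixedNamespaceSketch, its method names, and the replacement prefix bar are hypothetical, and it assumes the DOM implementation bundled with the JDK.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

class FixedNamespaceSketch {
    static final String XMLNS_NS = "http://www.w3.org/2000/xmlns/";

    /** Creates foo:root in urn:x-foo and adds a conflicting declaration. */
    static Element newConflictingElement() {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            Document doc = factory.newDocumentBuilder().newDocument();
            Element elem = doc.createElementNS("urn:x-foo", "foo:root");
            // The declaration says urn:x-bar, but the node keeps urn:x-foo.
            elem.setAttributeNS(XMLNS_NS, "xmlns:foo", "urn:x-bar");
            return elem;
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException(e);
        }
    }

    /** Changes the prefix and adds the declaration the new prefix needs. */
    static void rename(Element elem, String newPrefix) {
        elem.setPrefix(newPrefix);
        elem.setAttributeNS(XMLNS_NS, "xmlns:" + newPrefix,
                            elem.getNamespaceURI());
    }

    public static void main(String[] args) {
        Element elem = newConflictingElement();
        System.out.println(elem.getNamespaceURI());   // still urn:x-foo
        rename(elem, "bar");
        // Qualified name changes; local name and namespace URI do not.
        System.out.println(elem.getNodeName() + " " + elem.getLocalName()
            + " " + elem.getNamespaceURI());
    }
}
```

After rename(), the stale xmlns:foo="urn:x-bar" attribute is still attached to the element, which is harmless here because no name on the element uses the prefix foo any longer.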

As described earlier, we can create incomplete DOM trees, such as those lacking the required namespace declarations or having inconsistent namespace declarations.

The program in Listing 4.1, NamespaceCorrector.java, is a utility class for these namespace problems. This class adds missing namespace declarations to the specified DOM tree; it also checks namespace inconsistencies and throws a DOMException if the DOM tree has inconsistencies.

Listing 4.1 Utility to correct namespace problems, chap04/NamespaceCorrector.java
package chap04;

import org.w3c.dom.Attr;
import org.w3c.dom.DOMException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NamedNodeMap;

/**
 * Add required namespace declarations.
 */
public class NamespaceCorrector {
   private static final String XMLNS_NS
      = "http://www.w3.org/2000/xmlns/";

   private NamespaceCorrector() {
   }

   /**
    * @param node The top node of target nodes
    */
   public static void correct(Node node) {
      switch (node.getNodeType()) {
      case Node.ELEMENT_NODE:
         correctElement((Element)node);
                             // Fall through to visit the children
      case Node.DOCUMENT_NODE:
      case Node.DOCUMENT_FRAGMENT_NODE:
      case Node.ENTITY_REFERENCE_NODE:
         for (Node ch = node.getFirstChild();
             ch != null;
             ch = ch.getNextSibling()) {
            correct(ch);
         }
         break;
      }
   }

   /**
   * Check whether the prefixes and the namespaces of el and
   * its attributes are declared or not.
   * if not, add a namespace declaration to el.
   */
   private static void correctElement(Element el) {
      // Check el.
      String prefix = el.getPrefix();
      String current = el.getNamespaceURI();
      String declared = howDeclared(el, prefix);
      if (prefix == null) {
         if (current == null && declared == null) {
            // ok
         } else if (current == null || declared == null) {
            set(el, prefix, current == null ? "" : current);
         } else if (!current.equals(declared)) {
            set(el, prefix, current);
         }
      } else {
         if (current == null)
            throw new DOMException(DOMException.NAMESPACE_ERR,
                                   el.getNodeName()
                                   +" has no namespace");
         if (declared == null || !current.equals(declared))
            set(el, prefix, current);
      }

      // Check attributes of el.
      NamedNodeMap map = el.getAttributes();
      for (int i = 0;  i < map.getLength();  i++) {
         Attr attr = (Attr)map.item(i);
         prefix = attr.getPrefix();
         if (prefix == null || prefix.equals("xml")
            || prefix.equals("xmlns"))
            continue;
         current = attr.getNamespaceURI();
         declared = howDeclared(el, prefix);
         if (declared == null || !current.equals(declared)) {
            set(el, prefix, current);
            i = -1;         // map has changed.
                            // So restart the loop.
         }
      }
   }

   private static void set(Element el, String prefix, String ns) {
      String qname = prefix == null ? "xmlns" : "xmlns:"+prefix;

      if (el.getAttributeNode(qname) != null)
         throw new DOMException(DOMException.NAMESPACE_ERR,
                                "Namespace inconsistence");
      el.setAttributeNS(XMLNS_NS, qname, ns);
   }

   /**
    * Search <var>context</var> and ancestors for declaration
    * of prefix.
    * @param prefix Prefix, or <code>null</code> for default ns.
    */
   private static String howDeclared(Element context,
                                     String prefix) {
      String qname = prefix == null ? "xmlns" : "xmlns:"+prefix;

      for (Node node = context; node != null;
          node = node.getParentNode()) {
         if (node.getNodeType() == Node.ELEMENT_NODE) {
            Attr attr = ((Element)node).getAttributeNode(qname);
            if (attr != null) {
               if (prefix == null &&
                      attr.getNodeValue().equals(""))
                   return null;
               else
                   return attr.getNodeValue();
            }
         }
      }
      return null;
   }
}

The correctElement() method is the core of the program. Given an element, it searches ancestor elements containing the namespace declarations that are used for the given element and its attributes. If ancestors have no matching declaration, it checks the consistency of the declarations and adds Attr nodes representing the missing namespace declarations.

Listing 4.2 shows how NamespaceCorrector adds declarations correctly and checks namespace inconsistency.

Listing 4.2 An example to show how NamespaceCorrector works, chap04/NCTest.java
package chap04;

import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.apache.xml.serialize.OutputFormat;
import org.apache.xml.serialize.XMLSerializer;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

public class NCTest {
   static final String NS0 = "http://example.com/@";
   static final String NS1 = "http://example.com/a";
   static final String XML_NS =
         "http://www.w3.org/XML/1998/namespace";
   static final String XMLNS_NS = "http://www.w3.org/2000/xmlns/";

   public static void main(String[] argv) throws Exception {
      DocumentBuilderFactory dbfactory
             = DocumentBuilderFactory.newInstance();
      dbfactory.setNamespaceAware(true);
      DocumentBuilder builder = dbfactory.newDocumentBuilder();
      Document factory = builder.newDocument();
      OutputFormat format
             = new OutputFormat("xml", "UTF-8", true);
      XMLSerializer serializer
             = new XMLSerializer(System.out, format);

      Element top = factory.createElementNS(null, "Address");
      top.setAttributeNS(XMLNS_NS, "xmlns:p", NS0);

      // Add an element that has the namespace and no prefix.
      Element el1 = factory.createElementNS(NS1, "Zip");
      // el1 has an attribute of which namespace is
      // the same as el1.
      el1.setAttributeNS(NS1, "p:id", "");
      el1.appendChild(factory.createElementNS(null, "Zip2"));
      top.appendChild(el1);

      // Add an element that has the namespace and prefix.
      Element el2 = factory.createElementNS(NS1, "p:State");
      // el2 has an attribute. It has the same prefix
      // and the same NS.
      el1.setAttributeNS(NS1, "p:id", "");
      el2.setAttributeNS(XML_NS, "xml:lang", "en");
      top.appendChild(el2);

      Element el3 = factory.createElementNS(NS0, "p:City");
      top.appendChild(el3);

      // Prints the tree before correction.
      serializer.serialize(top);
      System.out.println("");
      // Correct
      NamespaceCorrector.correct(top);
      // Prints the tree after correction.
      serializer.reset();
      serializer.serialize(top);
      System.out.println("");

      // Another test:
      // p:Country and p:iso2 have the same prefix but
      // different namespaces.
      Element el4 = factory.createElementNS(NS0, "p:Country");
      el4.setAttributeNS(NS1, "p:iso2", "ja");
      // This should throw an exception.
      NamespaceCorrector.correct(el4);
   }
}

First, this program builds a DOM tree that lacks some required namespace declarations, and serializes it to the console. The serialized XML document is broken. Then, NamespaceCorrector fixes the DOM tree and serializes it again. We can see the correct XML document on the console.

Finally, the program tests the behavior for namespace inconsistencies in the DOM tree. In this case, NamespaceCorrector detects inconsistencies such as an element and its attribute having the same prefix but different namespace URIs.

In Listing 4.2, the element el4 has the prefix "p" and the namespace NS0, and its attribute p:iso2 has the same prefix "p" and a different namespace, NS1. In this case, we cannot correct the DOM tree by adding namespace declarations. So NamespaceCorrector throws a DOMException.

R:\samples>java chap04.NCTest
<?xml version="1.0" encoding="UTF-8"?>
<Address xmlns:p="http://example.com/@">
   <Zip p:id="">
      <Zip2/>
   </Zip>
   <p:State xml:lang="en"/>
   <p:City/>
</Address>

<?xml version="1.0" encoding="UTF-8"?>
<Address xmlns:p="http://example.com/@">
   <Zip p:id="" xmlns="http://example.com/a"
xmlns:p="http://example.com/a">
       <Zip2 xmlns=""/>
   </Zip>
   <p:State xml:lang="en" xmlns:p="http://example.com/a"/>
   <p:City/>
</Address>

Exception in thread "main" org.w3c.dom.DOMException: Namespace inconsistence
       at java.lang.Throwable.<init>(Throwable.java:96)
       at java.lang.Exception.<init>(Exception.java:44)
       at java.lang.RuntimeException.<init>(RuntimeException.java:49)
       at org.w3c.dom.DOMException.<init>(DOMException.java:34)
       at chap04.NamespaceCorrector.set(NamespaceCorrector.java:89)
       at chap04.NamespaceCorrector.correctElement(NamespaceCorrector.java:78)
       at chap04.NamespaceCorrector.correct(NamespaceCorrector.java:26)
       at chap04.NCTest.main(NCTest.java:67)

In this example, we could have changed the prefixes of the elements and attributes. However, NamespaceCorrector does not change the prefixes because such changes might cause other problems. If a changed prefix were used in attribute values or character data, we would have another problem.
