6.3 Basic Xerces Tricks

You can use the general tricks to solve many common XML application problems, regardless of what parser implementation you use. However, at times you need to take advantage of the features of a specific parser. This section shows how to use the Xerces parser to solve problems that you cannot solve by using standard interfaces.

6.3.1 Extended Parser Options

SAX version 2 introduced a generic configuration mechanism to allow parser implementations to be extended without requiring the parser interface to be updated. Parser features and properties are set by name using the setFeature() and setProperty() methods. SAX defines a small set of core features and properties to be used with XML parsers that implement these methods. Most of the features and properties are supported by Xerces and other parsers.

In addition to the core settings, Xerces supports extra features and properties that you may find useful. The Xerces documentation contains a full list of the available features and properties, but here's a list of some of the more useful features.

http://apache.org/xml/features/validation/dynamic
This feature allows validation to be performed based on whether the document contains a DOCTYPE line. The standard validation feature must be turned on for this extended feature to work.

http://apache.org/xml/features/allow-java-encodings
This feature allows Java encoding names to be used in the XMLDecl and TextDecl lines of the XML document and external parsed entities, respectively. By default, only Internet Assigned Numbers Authority (IANA) encoding names are recognized (assuming that a decoder is available).

http://apache.org/xml/features/validation/schema
This feature allows the application to explicitly turn off support for validating documents against XML Schemas. By default, Xerces supports both DTD and XML Schema validation.

http://apache.org/xml/features/nonvalidating/load-dtd-grammar
This feature allows the application to prevent the parser from using the attribute declarations in the DTD to add default attribute values and attribute type information to elements in the document. This feature is always on when validation is enabled.

http://apache.org/xml/features/nonvalidating/load-external-dtd
This feature allows the application to prevent the parser from loading the external DTD referenced in the DOCTYPE line of the document. By default, the Xerces XML parser reads the external DTD even in non-validating mode so that default attribute values and attribute value normalization can be applied. This feature is always on when validation is enabled.

http://apache.org/xml/features/dom/create-entity-ref-nodes
This feature specifies whether entity reference nodes are created in the DOM document. By default, entity reference nodes are created. This feature is only available for the Xerces DOM parser. The DOM Level 2 Traversal interfaces let you hide entity reference nodes in a similar way, but this Xerces feature allows you to avoid writing code.

http://apache.org/xml/features/dom/include-ignorable-whitespace
This feature specifies whether ignorable whitespace nodes are created in the DOM document. By default, all character content (even ignorable whitespace) is considered significant. This feature is only available for the Xerces DOM parser. A grammar must be available and processed (even if not used for validation) in order to use this feature. Without a grammar, the parser has no way of determining what can be considered ignorable whitespace.

http://apache.org/xml/features/dom/defer-node-expansion
This feature allows the DOM parser to create a deferred document that is expanded as the application traverses the tree. The deferred document builds the full DOM document faster and saves memory when the application does not traverse the entire tree.
This feature works only when the DOM document factory is set to the default Xerces DOM implementation.

Here is a list of a few of the extended properties.

http://apache.org/xml/properties/dom/current-element-node
This read-only property returns the current DOM element as the document tree is being constructed. You may think that this property could be used to associate validation errors with their DOM nodes, but there are some caveats:
http://apache.org/xml/properties/dom/document-class-name
This property allows the application to set the DOM document factory by name. The DOM parser will then use this factory to create the document nodes. The document class used must have a default, no-argument constructor.

6.3.2 Custom DOM Implementation

Application data is often organized in a hierarchical structure resembling a tree. Because an XML document is also a tree, it lends itself to being used to model application data. There are many benefits to using DOM: it is a convenient in-memory representation for the data; it provides an interface to XML processors such as Xalan for transformations; and it allows a platform- and application-independent serialization mechanism.

However, because DOM is a generic tree model, it is sometimes more convenient to provide a custom interface for the application. Therefore, in this section we build a DOM document implementation that creates custom objects to model the music collection DTD defined in Section 6.2.1. The Xerces parser makes this easy by allowing the application to set the DOM document implementation by name.

First, as shown in Listings 6.8 through 6.11, we write custom classes to model the elements. To avoid having to write all the supporting DOM code, we extend the DOM implementation classes in the org.apache.xerces.dom package.

Listing 6.8 Creating the MusicCollection class, chap06/music/MusicCollection.java
package chap06.music;

import java.io.IOException;
import java.util.Enumeration;
import java.util.NoSuchElementException;

import chap06.util.DOMUtil;
import org.apache.xerces.dom.DocumentImpl;
import org.apache.xerces.dom.ElementImpl;
import org.apache.xerces.parsers.DOMParser;
import org.w3c.dom.DOMException;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

public class MusicCollection
    extends ElementImpl {

    public MusicCollection(DocumentImpl ownerDoc) {
        super(ownerDoc, "collection");
    }

    public static MusicCollection loadCollection(String systemId)
        throws SAXException, IOException {
        return loadCollection(new InputSource(systemId));
    }

    public static MusicCollection loadCollection(InputSource inputSource)
        throws SAXException, IOException {
        DOMParser parser = new DOMParser();
        // The property name must be a single, unbroken string.
        parser.setProperty(
            "http://apache.org/xml/properties/dom/document-class-name",
            MusicCollection.Document.class.getName());
        parser.parse(inputSource);
        return (MusicCollection)parser.getDocument().getDocumentElement();
    }

    // Enumerates the <album> children of this collection in document order.
    public Enumeration getAlbums() {
        final MusicCollection collection = this;
        return new Enumeration() {
            private Element place =
                DOMUtil.getFirstChildElement(collection, "album");
            public boolean hasMoreElements() {
                return place != null;
            }
            public Object nextElement() {
                if (place == null) {
                    // Honor the Enumeration contract instead of failing
                    // with a NullPointerException in DOMUtil.
                    throw new NoSuchElementException();
                }
                Element album = place;
                place = DOMUtil.getNextSiblingElement(place, "album");
                return album;
            }
        };
    }

    // Document factory: maps element names to the custom element classes.
    public static class Document
        extends DocumentImpl {

        public Element createElement(String name) {
            if (name.equals("collection")) {
                return new MusicCollection(this);
            }
            if (name.equals("album")) {
                return new Album(this);
            }
            if (name.equals("artist")) {
                return new Artist(this);
            }
            if (name.equals("title")) {
                return new Title(this);
            }
            throw new DOMException(DOMException.NOT_SUPPORTED_ERR, name);
        }

        // Ignores the namespace and dispatches on the local part of the name.
        public Element createElementNS(String uri, String name) {
            int index = name.indexOf(":");
            if (index != -1) {
                return createElement(name.substring(index + 1));
            }
            return createElement(name);
        }
    }
}
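The getAlbums() method above walks sibling nodes by hand and returns an untyped Enumeration. As a rough sketch of the same traversal on a newer Java version, the following collects matching child elements into a typed list. It is not part of the book's sample code: the class name ChildElements and its methods are hypothetical, and it uses the JDK's default JAXP parser rather than the Xerces DOMParser so that it runs without extra libraries.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

// Hypothetical helper for illustration; not part of chap06.
class ChildElements {

    // Returns the direct child elements of parent whose tag name equals name,
    // in document order. Text, comment, and other node types are skipped.
    static List<Element> named(Node parent, String name) {
        List<Element> result = new ArrayList<>();
        for (Node n = parent.getFirstChild(); n != null; n = n.getNextSibling()) {
            if (n.getNodeType() == Node.ELEMENT_NODE
                    && n.getNodeName().equals(name)) {
                result.add((Element) n);
            }
        }
        return result;
    }

    // Parses an XML string with the JDK's default JAXP parser (an assumption;
    // the chapter itself uses org.apache.xerces.parsers.DOMParser).
    static Document parse(String xml) throws Exception {
        return DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
    }

    public static void main(String[] args) throws Exception {
        Document doc = parse(
            "<collection><album cd-id='1'/><note/><album cd-id='2'/></collection>");
        for (Element album : named(doc.getDocumentElement(), "album")) {
            System.out.println(album.getAttribute("cd-id"));
        }
    }
}
```

Returning a list trades the lazy traversal of the Enumeration for type safety and for-each support; for a collection of this size the difference is unlikely to matter.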
Listing 6.9 Creating the Album class, chap06/music/Album.java
package chap06.music;

import chap06.util.DOMUtil;
import org.apache.xerces.dom.DocumentImpl;
import org.apache.xerces.dom.ElementImpl;
import org.w3c.dom.Element;

public class Album
    extends ElementImpl {

    public Album(DocumentImpl ownerDoc) {
        super(ownerDoc, "album");
    }

    public Artist getArtist() {
        return (Artist)DOMUtil.getFirstChildElement(this, "artist");
    }

    public Title getTitle() {
        return (Title)DOMUtil.getFirstChildElement(this, "title");
    }
}
Listing 6.10 Creating the Artist class, chap06/music/Artist.java
package chap06.music;

import chap06.util.DOMUtil;
import org.apache.xerces.dom.DocumentImpl;
import org.apache.xerces.dom.ElementImpl;

public class Artist
    extends ElementImpl {

    public Artist(DocumentImpl ownerDoc) {
        super(ownerDoc, "artist");
    }

    public void setName(String name) {
        DOMUtil.setNodeValue(this, name);
    }

    public String getName() {
        return DOMUtil.getNodeValue(this);
    }
}
Listing 6.11 Creating the Title class, chap06/music/Title.java
package chap06.music;

import chap06.util.DOMUtil;
import org.apache.xerces.dom.DocumentImpl;
import org.apache.xerces.dom.ElementImpl;

public class Title
    extends ElementImpl {

    public Title(DocumentImpl ownerDoc) {
        super(ownerDoc, "title");
    }

    public void setName(String name) {
        DOMUtil.setNodeValue(this, name);
    }

    public String getName() {
        return DOMUtil.getNodeValue(this);
    }
}
To keep the code simple and eliminate duplication, we use a small DOM utility class called DOMUtil, as shown in Listing 6.12. Its helpers wrap the child-traversal loops that the element classes would otherwise repeat, which shortens the application code and makes the DOM easier to program.

Listing 6.12 Creating the DOM utility class, chap06/util/DOMUtil.java
package chap06.util;

import org.w3c.dom.Element;
import org.w3c.dom.Node;

public class DOMUtil {

    private DOMUtil() {}

    // Replaces all children of parent with a single text node.
    public static void setNodeValue(Node parent, String value) {
        Node child = parent.getFirstChild();
        while (child != null) {
            parent.removeChild(child);
            child = parent.getFirstChild();
        }
        Node text = parent.getOwnerDocument().createTextNode(value);
        parent.appendChild(text);
    }

    // Concatenates the values of the direct text-node children of parent.
    public static String getNodeValue(Node parent) {
        StringBuffer str = new StringBuffer();
        Node child = parent.getFirstChild();
        while (child != null) {
            if (child.getNodeType() == Node.TEXT_NODE) {
                str.append(child.getNodeValue());
            }
            child = child.getNextSibling();
        }
        return str.toString();
    }

    // Returns the first child element with the given name, or null.
    public static Element getFirstChildElement(Node parent, String name) {
        Node child = parent.getFirstChild();
        while (child != null) {
            if (child.getNodeType() == Node.ELEMENT_NODE) {
                if (child.getNodeName().equals(name)) {
                    return (Element)child;
                }
            }
            child = child.getNextSibling();
        }
        return null;
    }

    // Returns the next sibling element with the given name, or null.
    public static Element getNextSiblingElement(Node node, String name) {
        Node sibling = node.getNextSibling();
        while (sibling != null) {
            if (sibling.getNodeType() == Node.ELEMENT_NODE) {
                if (sibling.getNodeName().equals(name)) {
                    return (Element)sibling;
                }
            }
            sibling = sibling.getNextSibling();
        }
        return null;
    }
}
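One detail of getNodeValue() in Listing 6.12 is easy to miss: it concatenates only the direct text-node children of the element, so text nested inside child elements is skipped. The sketch below illustrates the difference against the DOM Level 3 method Node.getTextContent(), which includes descendant text. The class TextValueDemo and the sample markup are made up for this illustration, and the JDK's default JAXP parser stands in for Xerces.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

// Hypothetical demo class; directText mirrors DOMUtil.getNodeValue.
class TextValueDemo {

    // Same logic as DOMUtil.getNodeValue: direct TEXT_NODE children only.
    static String directText(Node parent) {
        StringBuilder str = new StringBuilder();
        for (Node child = parent.getFirstChild(); child != null;
                child = child.getNextSibling()) {
            if (child.getNodeType() == Node.TEXT_NODE) {
                str.append(child.getNodeValue());
            }
        }
        return str.toString();
    }

    // Parses an XML string and returns its root element (JAXP default parser).
    static Element parseRoot(String xml) throws Exception {
        return DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)))
                .getDocumentElement();
    }

    public static void main(String[] args) throws Exception {
        // Made-up markup: the music collection DTD has no <em> element.
        Element title = parseRoot("<title>Brand <em>New</em> Knife</title>");
        System.out.println("[" + directText(title) + "]");       // prints [Brand  Knife]
        System.out.println("[" + title.getTextContent() + "]");  // prints [Brand New Knife]
    }
}
```

For the music collection documents in this chapter the two approaches give the same result, because artist and title elements contain only character data.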
The MusicCollection class has two important parts. The first part is the implementation of a document factory, as shown in Listing 6.13, to create our custom objects as the DOM tree is constructed by the parser.

Listing 6.13 Document factory
public static class Document
    extends DocumentImpl {

    public Element createElement(String name) {
        if (name.equals("collection")) {
            return new MusicCollection(this);
        }
        if (name.equals("album")) {
            return new Album(this);
        }
        if (name.equals("artist")) {
            return new Artist(this);
        }
        if (name.equals("title")) {
            return new Title(this);
        }
        throw new DOMException(DOMException.NOT_SUPPORTED_ERR, name);
    }

    public Element createElementNS(String uri, String name) {
        int index = name.indexOf(":");
        if (index != -1) {
            return createElement(name.substring(index + 1));
        }
        return createElement(name);
    }
}
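The chain of if statements in Listing 6.13 grows with every element type in the DTD. One possible alternative, sketched here under stated assumptions, is a table that maps element names to constructors. This is a swapped-in design, not the book's code: ElementRegistry is a hypothetical class, it requires Java 8 or later, and the demo registers plain strings as stand-ins where a real factory would register something like a lambda that calls new Album(document).

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Supplier;

// Hypothetical table-driven factory; T would be org.w3c.dom.Element in practice.
class ElementRegistry<T> {

    private final Map<String, Supplier<? extends T>> factories = new HashMap<>();

    // Associates an element name with a supplier that creates its object.
    ElementRegistry<T> register(String name, Supplier<? extends T> factory) {
        factories.put(name, factory);
        return this;
    }

    // Strips an optional "prefix:" the same way createElementNS does
    // in Listing 6.13.
    static String localName(String qName) {
        int index = qName.indexOf(':');
        return index != -1 ? qName.substring(index + 1) : qName;
    }

    // Creates the object registered for the name, or fails for unknown names
    // (Listing 6.13 throws a DOMException with NOT_SUPPORTED_ERR instead).
    T create(String qName) {
        Supplier<? extends T> factory = factories.get(localName(qName));
        if (factory == null) {
            throw new IllegalArgumentException("unsupported element: " + qName);
        }
        return factory.get();
    }

    public static void main(String[] args) {
        // Strings stand in for the custom element objects.
        ElementRegistry<String> registry = new ElementRegistry<String>()
            .register("album", () -> "Album instance")
            .register("artist", () -> "Artist instance");
        System.out.println(registry.create("album"));     // prints Album instance
        System.out.println(registry.create("m:artist"));  // prints Artist instance
    }
}
```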
The second part constructs a Xerces DOMParser and registers our custom document factory as the document class that builds the tree when a file is parsed, as shown in the following code.
DOMParser parser = new DOMParser();
parser.setProperty(
    "http://apache.org/xml/properties/dom/document-class-name",
    MusicCollection.Document.class.getName());
parser.parse(inputSource);
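Because feature and property names such as document-class-name are plain strings, a misspelled or unsupported name is only detected at run time, when the parser throws SAXNotRecognizedException or SAXNotSupportedException. The following sketch shows one way to probe a setting before relying on it. The FeatureProbe class and the bogus example.org feature name are hypothetical, and the reader comes from the JDK's default JAXP factory rather than from org.apache.xerces directly, so which Xerces-specific names are accepted depends on the parser actually installed.

```java
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.SAXNotRecognizedException;
import org.xml.sax.SAXNotSupportedException;
import org.xml.sax.XMLReader;

// Hypothetical helper for probing SAX2 features by name.
class FeatureProbe {

    // Obtains an XMLReader from the default JAXP factory (an assumption;
    // the chapter instantiates Xerces parser classes directly).
    static XMLReader newReader() throws Exception {
        return SAXParserFactory.newInstance().newSAXParser().getXMLReader();
    }

    // Returns true if the reader accepted the setting, or false if the
    // name is unknown or the value cannot be applied.
    static boolean trySetFeature(XMLReader reader, String name, boolean value) {
        try {
            reader.setFeature(name, value);
            return true;
        } catch (SAXNotRecognizedException | SAXNotSupportedException e) {
            return false;
        }
    }

    public static void main(String[] args) throws Exception {
        XMLReader reader = newReader();
        String[] names = {
            // SAX2 core feature.
            "http://xml.org/sax/features/namespaces",
            // Xerces extended feature from Section 6.3.1.
            "http://apache.org/xml/features/nonvalidating/load-external-dtd",
            // Deliberately bogus name.
            "http://example.org/no-such-feature"
        };
        for (String name : names) {
            System.out.println(name + " accepted: "
                + trySetFeature(reader, name, true));
        }
    }
}
```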
The sample program in Listing 6.14 uses the custom document implementation to load a music collection and print all the artists in the file to standard output. Notice how there are no direct calls to DOM methods in the code.

Listing 6.14 Program to display the artists, chap06/music/DisplayArtists.java
package chap06.music;

import java.util.Enumeration;

public class DisplayArtists {

    public static void main(String[] argv) throws Exception {
        for (int i = 0; i < argv.length; i++) {
            MusicCollection collection =
                MusicCollection.loadCollection(argv[i]);
            Enumeration albums = collection.getAlbums();
            while (albums.hasMoreElements()) {
                Album album = (Album)albums.nextElement();
                Artist artist = album.getArtist();
                System.out.println(artist.getName());
            }
        }
    }
}
Running this program with the following XML document, called collection.xml, produces the output shown after it.
<!DOCTYPE collection SYSTEM 'collection.dtd'>
<collection>
  <album cd-id='189EFCF'>
    <artist>They Might Be Giants</artist>
    <title>Flood</title>
  </album>
  <album cd-id='2A77609'>
    <artist>Shonen Knife</artist>
    <title>Brand New Knife</title>
  </album>
</collection>
R:\samples>java chap06.music.DisplayArtists chap06/data/collection.xml
They Might Be Giants
Shonen Knife
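For comparison, here is roughly what the same task looks like with generic DOM calls and no custom element classes. This is only a sketch: the class DisplayArtistsPlainDom is hypothetical, it uses the JDK's default JAXP parser, and the document is inlined as a string without the DOCTYPE line so that no collection.dtd file is needed.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical counterpart to DisplayArtists that uses only the generic DOM API.
class DisplayArtistsPlainDom {

    // Returns the text of the first <artist> in each <album>, in document order.
    static List<String> artists(String xml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
        NodeList albums = doc.getDocumentElement().getElementsByTagName("album");
        List<String> names = new ArrayList<>();
        for (int i = 0; i < albums.getLength(); i++) {
            Element album = (Element) albums.item(i);
            Node artist = album.getElementsByTagName("artist").item(0);
            if (artist != null) {
                names.add(artist.getTextContent());
            }
        }
        return names;
    }

    public static void main(String[] args) throws Exception {
        // Same data as collection.xml, minus the DOCTYPE line.
        String xml =
            "<collection>"
            + "<album cd-id='189EFCF'>"
            + "<artist>They Might Be Giants</artist><title>Flood</title>"
            + "</album>"
            + "<album cd-id='2A77609'>"
            + "<artist>Shonen Knife</artist><title>Brand New Knife</title>"
            + "</album>"
            + "</collection>";
        for (String name : artists(xml)) {
            System.out.println(name);
        }
    }
}
```

The casts, node lists, and null checks that appear here are exactly the details that getAlbums() and getArtist() hide from the application in Listing 6.14.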
In addition to implementing standard XML programming interfaces and providing special features for application developers, Xerces provides a modular parser framework. This new framework gives users more choices in configuring the parser and allows new parser components and configurations to be written. The next section describes this framework and provides a basis for advanced developers to start writing their own parser configurations.