6.4 Advanced Xerces Tricks
Xerces2 is a complete redesign of the Xerces parser, with the goal of making Xerces simpler, easier to maintain, and more modular. The foundation of the Xerces2 implementation is a set of interfaces known as the Xerces Native Interface (XNI). XNI is a framework for communicating the "streaming" document information set and constructing modular parser configurations.[7]
The modular framework provided by XNI and implemented by the Xerces2 parser components allows unprecedented flexibility and extensibility. XML application developers can mix and match existing parser components or write new components and configurations that better suit the needs of their applications. In the past, XML developers had to choose from a wide array of different (and often incompatible) parser implementations. With the advent of XNI, the all-or-nothing XML parser model no longer exists. Some example configurations might include the following.
HTML parser: An HTML scanner can be implemented and used in place of the default XML scanner in the parsing pipeline. This would allow existing HTML files to be parsed and processed by applications using standard XML APIs.
Data binding parser: DOM and SAX are not the only ways to access information from parsing an XML document. A parser implementation can be written that receives information from an XNI parser configuration and builds native Java objects, a process known as "data binding."[8] Because XNI defines a standard parser configuration interface, all existing (and future) parser configurations can be used to "drive" the data binding.
XInclude processor: A parser component can be added to the parser pipeline to automatically insert the external entities referred to by XInclude elements.
This is only a short list of what you can accomplish using the XNI framework. This section teaches you how to use the modularity of XNI and take full advantage of the features in the Xerces2 implementation.
6.4.1 The Xerces Native Interface
With several APIs already available for programming XML applications, why is the Xerces Native Interface needed? As the name implies, the interfaces are native (or internal) to the Xerces parsers. XNI is completely independent of existing APIs, such as DOM or SAX, so that parser implementations can be separated and layered by the user without introducing unneeded dependencies. In addition, the existing programming interfaces are "lossy" in that they discard some valuable document information. XNI was designed to communicate as much document information as possible, within reason.
The Xerces Native Interface is divided between a set of core interfaces that define the streaming information set and a framework for building modular parser components and configurations. The core interfaces are contained within the org.apache.xerces.xni package, and the parser framework is defined in the org.apache.xerces.xni.parser package. Figure 6.1 shows the package hierarchy.
Figure 6.1. Hierarchy of XNI packages
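To make the point about "lossy" APIs concrete, the following minimal sketch records the SAX events for an empty-element tag and for a start tag immediately followed by an end tag. It uses only the JAXP and SAX classes that ship with the JDK; the class name LossySaxDemo and its events method are hypothetical names chosen for this illustration. SAX reports both forms identically, whereas the XMLDocumentHandler interface in Listing 6.15 has a separate emptyElement callback.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical demo class (not part of Xerces): records the SAX element
// events produced for a small document.
class LossySaxDemo {

    /** Parses the XML text and returns the element events in document order. */
    static List<String> events(String xml) {
        final List<String> log = new ArrayList<>();
        DefaultHandler recorder = new DefaultHandler() {
            @Override
            public void startElement(String uri, String local, String qName,
                                     Attributes atts) {
                log.add("start:" + qName);
            }
            @Override
            public void endElement(String uri, String local, String qName) {
                log.add("end:" + qName);
            }
        };
        try {
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), recorder);
        } catch (Exception e) {
            // Wrap checked parser exceptions to keep the sketch short.
            throw new IllegalStateException(e);
        }
        return log;
    }

    public static void main(String[] args) {
        // Both lines print [start:a, end:a]: SAX cannot tell the two forms apart.
        System.out.println(events("<a/>"));
        System.out.println(events("<a></a>"));
    }
}
```

An application that needs to write a document back out exactly as it was read is one case where this kind of lost detail matters.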
XNI parsers can be thought of as a series of components connected in a pipeline, with separate pipelines for the document and DTD information. Each pipeline is made by connecting a "source" to zero or more "filters"; the last stage of the pipeline is then connected to a "target" for the information flowing through the pipeline. Typically, a document or DTD scanner acts as the pipeline source. The information scanned from the XML document flows through the filters, which may augment or validate the document information. Finally, the events are received by the target, which exposes them through some type of programming API, for example by building a DOM tree or emitting SAX events.
XNI contains several handler interfaces that define the document and DTD information that flows through the parser pipelines. The document information is defined in the XMLDocumentHandler interface. This handler is similar to a combination of the SAX ContentHandler and LexicalHandler interfaces but has different method prototypes and parameter values, where appropriate, to pass additional information. Listing 6.15 shows the entire interface.
Listing 6.15 XMLDocumentHandler interface
package org.apache.xerces.xni;

public interface XMLDocumentHandler {

    public void startDocument(XMLLocator locator, String encoding,
                              Augmentations augs)
        throws XNIException;

    public void endDocument(Augmentations augs) throws XNIException;

    public void xmlDecl(String version, String encoding,
                        String standalone, Augmentations augs)
        throws XNIException;

    public void doctypeDecl(String rootElement, String publicId,
                            String systemId, Augmentations augs)
        throws XNIException;

    public void comment(XMLString text, Augmentations augs)
        throws XNIException;

    public void processingInstruction(String target, XMLString data,
                                      Augmentations augs)
        throws XNIException;

    public void startPrefixMapping(String prefix, String uri,
                                   Augmentations augs)
        throws XNIException;

    public void endPrefixMapping(String prefix, Augmentations augs)
        throws XNIException;

    public void startElement(QName element, XMLAttributes attributes,
                             Augmentations augs)
        throws XNIException;

    public void emptyElement(QName element, XMLAttributes attributes,
                             Augmentations augs)
        throws XNIException;

    public void endElement(QName element, Augmentations augs)
        throws XNIException;

    public void characters(XMLString text, Augmentations augs)
        throws XNIException;

    public void ignorableWhitespace(XMLString text, Augmentations augs)
        throws XNIException;

    public void startGeneralEntity(String name,
                                   XMLResourceIdentifier identifier,
                                   String encoding,
                                   Augmentations augs)
        throws XNIException;

    public void textDecl(String version, String encoding,
                         Augmentations augs)
        throws XNIException;

    public void endGeneralEntity(String name, Augmentations augs)
        throws XNIException;

    public void startCDATA(Augmentations augs) throws XNIException;

    public void endCDATA(Augmentations augs) throws XNIException;

}
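The source, filter, and target arrangement described before Listing 6.15 has a close analogue in SAX that can be tried with nothing but the JDK. The sketch below is not XNI code: it assumes SAX's XMLReader as the source, an XMLFilterImpl subclass as the single filter, and a DefaultHandler as the target. The class names SaxPipelineDemo, UpperCaseFilter, and Collector are hypothetical.

```java
import java.io.StringReader;
import java.util.Locale;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;
import org.xml.sax.helpers.XMLFilterImpl;

// Hypothetical SAX analogue of an XNI pipeline: source -> filter -> target.
class SaxPipelineDemo {

    /** Filter stage: upper-cases element names, then passes the event on. */
    static class UpperCaseFilter extends XMLFilterImpl {
        UpperCaseFilter(XMLReader parent) {
            super(parent);
        }
        @Override
        public void startElement(String uri, String local, String qName,
                                 Attributes atts) throws SAXException {
            super.startElement(uri, local.toUpperCase(Locale.ROOT),
                               qName.toUpperCase(Locale.ROOT), atts);
        }
        @Override
        public void endElement(String uri, String local, String qName)
                throws SAXException {
            super.endElement(uri, local.toUpperCase(Locale.ROOT),
                             qName.toUpperCase(Locale.ROOT));
        }
    }

    /** Target stage: writes the events it receives back out as markup. */
    static class Collector extends DefaultHandler {
        final StringBuilder out = new StringBuilder();
        @Override
        public void startElement(String uri, String local, String qName,
                                 Attributes atts) {
            out.append('<').append(qName).append('>');
        }
        @Override
        public void endElement(String uri, String local, String qName) {
            out.append("</").append(qName).append('>');
        }
        @Override
        public void characters(char[] ch, int start, int length) {
            out.append(ch, start, length);
        }
    }

    /** Connects the three stages and runs the document through them. */
    static String run(String xml) {
        try {
            XMLReader source =
                SAXParserFactory.newInstance().newSAXParser().getXMLReader();
            UpperCaseFilter filter = new UpperCaseFilter(source);
            Collector target = new Collector();
            filter.setContentHandler(target);
            filter.parse(new InputSource(new StringReader(xml)));
            return target.out.toString();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(run("<a><b>hi</b></a>")); // prints <A><B>hi</B></A>
    }
}
```

The target never learns that a filter sits in front of it, which is the property that lets pipeline stages be rearranged freely.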
Developers familiar with SAX should recognize the XMLDocumentHandler almost immediately. However, you'll notice a few differences between the SAX handlers and the XNI handlers. Some methods were changed to pass more information, whereas other methods were added when the SAX interface was lacking.
In addition to the document handler, there are two handler interfaces for DTD information. The XMLDTDHandler communicates the basic markup declarations defined in the internal and external subsets of a document's DTD, whereas the XMLDTDContentModelHandler is used only to break down element content models so that each stage in the DTD pipeline does not have to reproduce the work of parsing the element's content model.
Taken alone, the document and DTD handler interfaces don't communicate much more information than what is provided by the SAX interfaces. However, when these interfaces are combined with the XNI parser configuration framework, you can create a myriad of powerful configurations from an assortment of existing and custom modular components. In the following sections, we teach you how to take advantage of this powerful framework.
6.4.2 Components and the Component Manager
As stated earlier, XNI parsers are made by connecting a series of components to form an XML parsing pipeline. To make these (and other) parser components modular, XNI contains a component manager, which is responsible for storing features and properties that are common to the entire parser configuration. Although there may be many configurable components, there is only one component manager for any given parser configuration, as shown in Figure 6.2.
Figure 6.2. Component manager framework
The XMLComponent interface contains methods that allow the component manager to initialize the state of each component and notify the components of changes that occur in the configuration. Before a document is parsed, each component is initialized by calling the reset method with an instance of the component manager, which implements the XMLComponentManager interface. The component then queries the component manager for any features and properties that it requires for its operation. For example, a document scanner needs to know whether namespace processing is enabled so that it can properly scan element names as qualified names.
Components in the parser configuration do not need to be directly related to the parsing pipeline, though. A parser configuration may contain other components that are responsible for reporting errors, maintaining a list of commonly used symbols, and so on. What task the components perform is completely up to the parser configuration implementer. However, the Xerces2 reference implementation contains a number of shared components, which are discussed in more detail in Section 6.4.4, Building Parser Configurations from Xerces2 Components.
6.4.3 Parser Configurations
The actual work of parsing XML documents is done within the parser configuration defined by the XMLParserConfiguration interface. In this section, we give an overview of this interface and then implement a complete parser configuration as an example.
Overview
In the XNI parser configuration framework, the parser-processing pipeline is separate from parser instances. Whereas the parser configuration maintains the parser state and constructs the XML parsing pipelines, the parser merely receives the document and DTD information from the parser configuration to generate some type of programming model such as DOM or SAX. This separation allows the same API-generating parser class to be used with any parser configuration so that the API generation code never needs to be duplicated.
This separation is shown more clearly in Figure 6.3.
Figure 6.3. Parser configuration framework
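The reset protocol described in Section 6.4.2 can be sketched in a few lines of plain Java. The types below are simplified, hypothetical stand-ins (SketchComponentManager, SketchComponent, SketchScanner, SketchConfiguration) and not the real XMLComponentManager and XMLComponent interfaces, whose methods differ in detail; only the namespaces feature identifier is a real SAX feature name. The sketch shows a scanner-like component that re-reads the namespace setting each time the configuration resets it.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical, simplified stand-in for a component manager.
interface SketchComponentManager {
    boolean getFeature(String featureId);
}

// Hypothetical, simplified stand-in for a configurable component.
interface SketchComponent {
    void reset(SketchComponentManager manager);
}

// A scanner-like component that needs to know whether namespaces are on.
class SketchScanner implements SketchComponent {
    static final String NAMESPACES = "http://xml.org/sax/features/namespaces";
    private boolean namespaces;

    @Override
    public void reset(SketchComponentManager manager) {
        // Query the manager for the settings this component depends on.
        namespaces = manager.getFeature(NAMESPACES);
    }

    /** Splits a raw name into prefix and local part only when namespaces are on. */
    String describe(String rawName) {
        int colon = rawName.indexOf(':');
        if (!namespaces || colon < 0) {
            return "local=" + rawName;
        }
        return "prefix=" + rawName.substring(0, colon)
             + " local=" + rawName.substring(colon + 1);
    }
}

// The configuration doubles as the single component manager.
class SketchConfiguration implements SketchComponentManager {
    private final Map<String, Boolean> features = new HashMap<>();
    private final List<SketchComponent> components = new ArrayList<>();

    void addComponent(SketchComponent component) {
        components.add(component);
    }

    void setFeature(String featureId, boolean state) {
        features.put(featureId, state);
    }

    @Override
    public boolean getFeature(String featureId) {
        return features.getOrDefault(featureId, Boolean.FALSE);
    }

    /** Called before each parse so every component picks up current settings. */
    void resetAll() {
        for (SketchComponent component : components) {
            component.reset(this);
        }
    }
}

class ComponentResetDemo {
    public static void main(String[] args) {
        SketchConfiguration config = new SketchConfiguration();
        SketchScanner scanner = new SketchScanner();
        config.addComponent(scanner);
        config.setFeature(SketchScanner.NAMESPACES, true);
        config.resetAll();
        System.out.println(scanner.describe("xsl:template"));
    }
}
```

Because components read their settings only from the manager passed to reset, the same component class can be dropped into any configuration that supplies those settings.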
The parser configuration appears as a black box to the parser that uses it. Therefore, the configuration can be implemented as a single unit or can consist of any number of components connected as a processing pipeline, as shown in Figure 6.3. The only requirement is that the class implement the XMLParserConfiguration interface.
Because a parser configuration can be implemented as a single unit, there is no requirement that it use the XNI component and component manager framework. That framework is provided only as a convenience when reusing existing Xerces2 components or creating an entirely new modular configuration. In the following example, we implement the entire parser configuration as a single class.
A Simple Example
There is also no requirement that the parser configuration be used to parse XML documents. The parser configuration can parse any type of information as long as it exposes it to the parser as a series of XNI events. To demonstrate this, we create a simple parser configuration that is capable of parsing a text file of the following form:
They Might Be Giants:189EFCF:Flood
Shonen Knife:2A77609:Brand New Knife
Each line in this file is a separate record containing any number of fields separated by colons. Listing 6.16 shows the source code that implements such a parser configuration.
Listing 6.16 Creating a parser configuration, chap06/xni/SimpleConfiguration.java
package chap06.xni;

import java.io.*;
import java.util.*;

import org.apache.xerces.util.XMLAttributesImpl;
import org.apache.xerces.util.XMLStringBuffer;
import org.apache.xerces.xni.QName;
import org.apache.xerces.xni.XMLDocumentHandler;
import org.apache.xerces.xni.XMLDTDHandler;
import org.apache.xerces.xni.XMLDTDContentModelHandler;
import org.apache.xerces.xni.XNIException;
import org.apache.xerces.xni.parser.XMLConfigurationException;
import org.apache.xerces.xni.parser.XMLEntityResolver;
import org.apache.xerces.xni.parser.XMLErrorHandler;
import org.apache.xerces.xni.parser.XMLInputSource;
import org.apache.xerces.xni.parser.XMLParserConfiguration;

public class SimpleConfiguration
    implements XMLParserConfiguration {

    // Element names and whitespace padding reused for every event.
    protected static final QName ROOT = new QName(null, "root", "root", null);
    protected static final QName ROW = new QName(null, "row", "row", null);
    protected static final QName COL = new QName(null, "col", "col", null);
    protected static final XMLStringBuffer NEWLINE = new XMLStringBuffer("\n");
    protected static final XMLStringBuffer SPACE1 = new XMLStringBuffer(" ");
    protected static final XMLStringBuffer SPACE2 = new XMLStringBuffer("  ");

    // Scratch buffer for field text and a shared, empty attribute list.
    private final XMLStringBuffer text = new XMLStringBuffer();
    private final XMLAttributesImpl attributes = new XMLAttributesImpl();

    protected XMLDocumentHandler documentHandler;

    public void setDocumentHandler(XMLDocumentHandler handler) {
        documentHandler = handler;
    }

    public XMLDocumentHandler getDocumentHandler() {
        return documentHandler;
    }

    public void parse(XMLInputSource source)
        throws IOException, XNIException {

        // Use the supplied reader or stream, or open the system identifier.
        Reader reader = source.getCharacterStream();
        boolean openedStream = false;
        if (reader == null) {
            InputStream stream = source.getByteStream();
            if (stream == null) {
                openedStream = true;
                stream = new FileInputStream(source.getSystemId());
            }
            reader = new InputStreamReader(stream, "UTF8");
        }

        documentHandler.startDocument(null, "UTF8", null);
        documentHandler.startElement(ROOT, attributes, null);
        documentHandler.ignorableWhitespace(NEWLINE, null);

        // One row element per line, one col element per field.
        BufferedReader in = new BufferedReader(reader);
        String line;
        while ((line = in.readLine()) != null) {
            StringTokenizer tokenizer = new StringTokenizer(line, ":");
            documentHandler.ignorableWhitespace(SPACE1, null);
            documentHandler.startElement(ROW, attributes, null);
            documentHandler.ignorableWhitespace(NEWLINE, null);
            while (tokenizer.hasMoreTokens()) {
                documentHandler.ignorableWhitespace(SPACE2, null);
                documentHandler.startElement(COL, attributes, null);
                text.clear();
                text.append(tokenizer.nextToken());
                documentHandler.characters(text, null);
                documentHandler.endElement(COL, null);
                documentHandler.ignorableWhitespace(NEWLINE, null);
            }
            // Close the row unconditionally so that a blank input line
            // still produces a balanced (empty) row element.
            documentHandler.ignorableWhitespace(SPACE1, null);
            documentHandler.endElement(ROW, null);
            documentHandler.ignorableWhitespace(NEWLINE, null);
        }

        documentHandler.endElement(ROOT, null);
        documentHandler.endDocument(null);

        // Only close streams that this method opened itself.
        if (openedStream) {
            in.close();
        }
    }

    // The remaining configuration methods are intentionally empty.
    public void addRecognizedFeatures(String[] featureIds) {}
    public void setFeature(String featureId, boolean state) {}
    public boolean getFeature(String featureId) { return false; }
    public void addRecognizedProperties(String[] propertyIds) {}
    public void setProperty(String propertyId, Object value) {}
    public Object getProperty(String propertyId) { return null; }
    public void setDTDHandler(XMLDTDHandler handler) {}
    public XMLDTDHandler getDTDHandler() { return null; }
    public void setDTDContentModelHandler(XMLDTDContentModelHandler handler) {}
    public XMLDTDContentModelHandler getDTDContentModelHandler() {
        return null;
    }
    public void setErrorHandler(XMLErrorHandler handler) {}
    public XMLErrorHandler getErrorHandler() { return null; }
    public void setEntityResolver(XMLEntityResolver resolver) {}
    public XMLEntityResolver getEntityResolver() { return null; }
    public void setLocale(Locale locale) {}
    public Locale getLocale() { return null; }

}
This parser configuration parses text files that are in the specified form and emits XNI events as if the following document were parsed by a normal XML parser:
<root>
 <row>
  <col>They Might Be Giants</col>
  <col>189EFCF</col>
  <col>Flood</col>
 </row>
 <row>
  <col>Shonen Knife</col>
  <col>2A77609</col>
  <col>Brand New Knife</col>
 </row>
</root>
Now let's examine in more detail how this parser configuration works. To start, we created a class that implements the XMLParserConfiguration interface. Because the format of our file is so simple, we don't care about resolving external entities or communicating DTD information. Therefore, we can provide empty implementations for most of the parser configuration methods. The only methods we need to implement are setDocumentHandler, so that our parser configuration knows which handler should receive the document information, and the parse method, which does the actual work.
The first thing we do in the parse method is to retrieve the input source stream or open it, if needed. For simplicity, we assume that the text file is encoded using only UTF-8.
Reader reader = source.getCharacterStream();
boolean openedStream = false;
if (reader == null) {
    InputStream stream = source.getByteStream();
    if (stream == null) {
        openedStream = true;
        stream = new FileInputStream(source.getSystemId());
    }
    reader = new InputStreamReader(stream, "UTF8");
}
Next we start sending document events with the following code:[9]
documentHandler.startDocument(null, "UTF8", null);
documentHandler.startElement(ROOT, attributes, null);
documentHandler.ignorableWhitespace(NEWLINE, null);
Notice that the startElement call makes reference to ROOT and attributes. Because we use the same element qualified names repeatedly, we have defined some convenient constants at the beginning of the class. In addition, we've defined some convenient whitespace padding buffers and created an empty attributes instance to pass to the document handler when we call startElement. The following code shows the constants and fields that we use:
protected static final QName ROOT = new QName(null, "root", "root", null);
protected static final QName ROW = new QName(null, "row", "row", null);
protected static final QName COL = new QName(null, "col", "col", null);
protected static final XMLStringBuffer NEWLINE = new XMLStringBuffer("\n");
protected static final XMLStringBuffer SPACE1 = new XMLStringBuffer(" ");
protected static final XMLStringBuffer SPACE2 = new XMLStringBuffer("  ");
private final XMLStringBuffer text = new XMLStringBuffer();
private final XMLAttributesImpl attributes = new XMLAttributesImpl();
Now we do the real work of the parse method. Using a BufferedReader, we read each line, tokenize it with a Java StringTokenizer, and create the appropriate XNI events for the registered document handler. This code is pretty straightforward.
BufferedReader in = new BufferedReader(reader);
String line;
while ((line = in.readLine()) != null) {
    StringTokenizer tokenizer = new StringTokenizer(line, ":");
    documentHandler.ignorableWhitespace(SPACE1, null);
    documentHandler.startElement(ROW, attributes, null);
    documentHandler.ignorableWhitespace(NEWLINE, null);
    while (tokenizer.hasMoreTokens()) {
        documentHandler.ignorableWhitespace(SPACE2, null);
        documentHandler.startElement(COL, attributes, null);
        text.clear();
        text.append(tokenizer.nextToken());
        documentHandler.characters(text, null);
        documentHandler.endElement(COL, null);
        documentHandler.ignorableWhitespace(NEWLINE, null);
    }
    // Close the row unconditionally so that a blank input line
    // still produces a balanced (empty) row element.
    documentHandler.ignorableWhitespace(SPACE1, null);
    documentHandler.endElement(ROW, null);
    documentHandler.ignorableWhitespace(NEWLINE, null);
}
Once we reach the end of the input file, the only thing that remains is to send the end of the document and close the input stream if it was opened within the parse method.[10]
documentHandler.endElement(ROOT, null);
documentHandler.endDocument(null);
if (openedStream) {
    in.close();
}
We can test our parser configuration using one of the sample programs that come with Xerces2. Most of the command-line XNI samples have a -p option, which lets you specify by name the parser configuration class used to run the sample. Using the xni.Writer sample with our sample configuration, we see the following output on the console:
R:\samples>java xni.Writer -p chap06.xni.SimpleConfiguration chap06/data/collection.txt
<?xml version="1.0" encoding="UTF-8"?>
<root>
 <row>
  <col>They Might Be Giants</col>
  <col>189EFCF</col>
  <col>Flood</col>
 </row>
 <row>
  <col>Shonen Knife</col>
  <col>2A77609</col>
  <col>Brand New Knife</col>
 </row>
</root>
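The same idea, turning a non-XML input into a stream of parser events, can be tried without Xerces on the classpath by emitting SAX events instead of XNI events. The following sketch is an analogue under that assumption, not the book's configuration: ColonTextToSax and its methods are hypothetical names, the events go to any SAX ContentHandler, and a JAXP TransformerHandler is used as the target to serialize them. Unlike Listing 6.16, it emits no whitespace padding and skips blank lines.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.StringWriter;
import java.util.StringTokenizer;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.sax.SAXTransformerFactory;
import javax.xml.transform.sax.TransformerHandler;
import javax.xml.transform.stream.StreamResult;
import org.xml.sax.ContentHandler;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.AttributesImpl;

// Hypothetical SAX analogue of SimpleConfiguration (uses only JDK classes).
class ColonTextToSax {

    /** Reads colon-separated records and reports them as SAX events. */
    static void parse(Reader reader, ContentHandler handler)
            throws IOException, SAXException {
        AttributesImpl none = new AttributesImpl(); // shared empty attribute list
        BufferedReader in = new BufferedReader(reader);
        handler.startDocument();
        handler.startElement("", "root", "root", none);
        String line;
        while ((line = in.readLine()) != null) {
            StringTokenizer fields = new StringTokenizer(line, ":");
            if (!fields.hasMoreTokens()) {
                continue; // design choice for this sketch: ignore blank lines
            }
            handler.startElement("", "row", "row", none);
            while (fields.hasMoreTokens()) {
                char[] text = fields.nextToken().toCharArray();
                handler.startElement("", "col", "col", none);
                handler.characters(text, 0, text.length);
                handler.endElement("", "col", "col");
            }
            handler.endElement("", "row", "row");
        }
        handler.endElement("", "root", "root");
        handler.endDocument();
    }

    /** Drives a serializing handler with the events and returns the markup. */
    static String toXml(String text) {
        try {
            SAXTransformerFactory factory =
                (SAXTransformerFactory) TransformerFactory.newInstance();
            TransformerHandler serializer = factory.newTransformerHandler();
            serializer.getTransformer()
                      .setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            serializer.setResult(new StreamResult(out));
            parse(new StringReader(text), serializer);
            return out.toString();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(toXml("They Might Be Giants:189EFCF:Flood\n"
                               + "Shonen Knife:2A77609:Brand New Knife\n"));
    }
}
```

Any other ContentHandler, such as one that builds application objects, could be passed to parse in place of the serializer.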
We have implemented almost the simplest parser configuration possible. However, a real-world implementation of a parser configuration requires more work. Next, we give an overview of the responsibilities of a full parser configuration.
Parser Configuration Responsibilities
In general, a parser configuration is responsible for the following:
We take a quick look at each of these responsibilities in turn.
First, the parser configuration is responsible for maintaining feature and property settings. What does this mean? Simply put, the parser configuration must keep track of which features and properties are recognized and the values set for these features and properties so that they may be queried by the parser components when initialized. Because the parser configuration is responsible for holding all (or at least most) of the parser state, the parser class using the parser configuration may call the addRecognizedFeatures method and the addRecognizedProperties method to add features and properties that should be accepted and stored by the parser configuration. Also, when a new component is added to the configuration, the features and properties that it recognizes should be added to the list of those already recognized by the parser configuration. The parser configuration can query the features and properties recognized by a component by calling the getRecognizedFeatures and getRecognizedProperties methods on the component. This list of recognized features and properties is important because when the parser configuration is queried for features or properties, the configuration should signal unrecognized ones by throwing a configuration exception that indicates this fact.
Next, the parser configuration must configure the parser pipeline, if appropriate. This simply means that the document sources, filters, and the registered document handler are chained together to form a pipeline for the document information.
Last, before parsing the input document, the parser configuration must initialize the state of each configurable component by calling the reset method, passing a reference to itself as the component manager. This gives each component an opportunity to query those features and properties that are required for its proper operation.
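The bookkeeping for recognized features can be sketched as follows. This is a plain-Java illustration with hypothetical names (FeatureAware, FeatureRegistry, UnrecognizedFeatureException); the real configuration classes signal the error with a configuration exception and also track properties, which this sketch omits. The validation feature identifier is a standard SAX feature name.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical stand-in for a component that declares what it understands.
interface FeatureAware {
    String[] getRecognizedFeatures();
}

// Hypothetical stand-in for the configuration exception mentioned in the text.
class UnrecognizedFeatureException extends RuntimeException {
    UnrecognizedFeatureException(String featureId) {
        super("Feature not recognized: " + featureId);
    }
}

// Tracks which features are recognized and the state set for each one.
class FeatureRegistry {
    private final Set<String> recognized = new HashSet<>();
    private final Map<String, Boolean> states = new HashMap<>();

    /** Called by the parser class to register features it wants accepted. */
    void addRecognizedFeatures(String[] featureIds) {
        for (String id : featureIds) {
            recognized.add(id);
        }
    }

    /** Adding a component also registers the features it recognizes. */
    void addComponent(FeatureAware component) {
        addRecognizedFeatures(component.getRecognizedFeatures());
    }

    void setFeature(String featureId, boolean state) {
        requireRecognized(featureId);
        states.put(featureId, state);
    }

    boolean getFeature(String featureId) {
        requireRecognized(featureId);
        return states.getOrDefault(featureId, Boolean.FALSE);
    }

    // Unrecognized identifiers are reported rather than silently ignored.
    private void requireRecognized(String featureId) {
        if (!recognized.contains(featureId)) {
            throw new UnrecognizedFeatureException(featureId);
        }
    }
}

class FeatureRegistryDemo {
    public static void main(String[] args) {
        String validation = "http://xml.org/sax/features/validation";
        FeatureRegistry registry = new FeatureRegistry();
        registry.addComponent(() -> new String[] { validation });
        registry.setFeature(validation, true);
        System.out.println(registry.getFeature(validation)); // prints true
    }
}
```

Rejecting unknown identifiers early means that a misspelled feature name fails loudly instead of leaving a component with a default it did not expect.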
In addition, anytime during parsing, if a feature or property changes state, each configurable component should be notified of the change by calling setFeature or setProperty.
Fortunately, most of this work is done for you by the default parser configuration implementations that come with the Xerces2 reference implementation. We take a little closer look at the basic parser configuration implementation in the next section.
6.4.4 Building Parser Configurations from Xerces2 Components
The Xerces2 parser is built on the XNI parser framework to handle all the common parser tasks as a series of separable modules. There are components to scan documents and DTDs, components to perform validation, components to bind namespaces, and parser classes to generate DOM trees and SAX events for any XNI parser configuration.
Standard Xerces2 Components
The standard set of parser components in Xerces2 includes the following:
This library of parser components allows you to rearrange the default components as needed or combine them with custom components to create new types of parsers and parser configurations. When doing this, however, you should know the dependencies among the Xerces2 components. This section lists those dependencies and gives an example that builds a parser configuration using the Xerces2 components.
Component Dependencies
Most configurable components in the Xerces2 reference implementation require two standard components: the symbol table and the error reporter. For performance reasons, commonly used strings are stored within a symbol table so that string comparisons can be done directly on the string references. Also, each component needs a common way to report errors to the application; the error reporter is used for this purpose. These components are stored within the parser configuration as properties using the property identifiers shown in Table 6.1.
In addition to depending on the symbol table and the error reporter, the document and DTD scanners both depend on an entity manager, which is responsible for starting and stopping entities. The entity manager makes this process transparent so that the scanners don't have to worry about the low-level scanning and management of entities; the scanners can just implement code to parse the document and DTD structures. The entity manager is also stored in the parser configuration using the property identifier shown in Table 6.1.
Finally, the document scanner depends on the DTD scanner in order to scan the internal and external subsets of the DTD. The DTD scanner is stored using the property identifier shown in Table 6.1.
Xerces2 Parser Configurations
The Xerces2 reference implementation contains a couple of parser configuration classes to simplify the construction of new parser configurations. The BasicParserConfiguration class provides an abstract skeleton for implementing new parser configurations.
This class manages components and parser configuration feature and property settings. To use this base class properly, the subclass is required to do the following:[11]
In addition to the basic parser configuration, Xerces2 contains the StandardParserConfiguration class, which constructs the standard configuration used by the default Xerces2 parsers. This configuration instantiates, registers, and configures the standard parser component pipeline. Figure 6.4 illustrates the standard configuration pipeline.
Figure 6.4. Standard configuration pipeline
The standard parser configuration uses protected factory methods for constructing the standard set of parser components. This allows the user to create a configuration based on the standard configuration very quickly and easily. These factory methods are the following:
protected XMLErrorReporter createErrorReporter();
protected XMLEntityManager createEntityManager();
protected XMLDocumentScanner createDocumentScanner();
protected XMLDTDScanner createDTDScanner();
protected XMLDTDValidator createDTDValidator();
protected XMLNamespaceBinder createNamespaceBinder();
Non-Validating Parser Configuration Example
We now proceed to build a non-validating parser configuration by building a parsing pipeline that does not contain the XMLDTDValidator component.[12]
For this parser configuration, we extend the BasicParserConfiguration to simplify the implementation. The code for this configuration is shown in Listing 6.17.
Listing 6.17 Non-validating parser configuration, chap06/xni/NonValidatingConfiguration.java
package chap06.xni;

import java.io.IOException;
import java.util.Locale;

import org.apache.xerces.impl.XMLDocumentScannerImpl;
import org.apache.xerces.impl.XMLDTDScannerImpl;
import org.apache.xerces.impl.XMLEntityManager;
import org.apache.xerces.impl.XMLErrorReporter;
import org.apache.xerces.impl.XMLNamespaceBinder;
import org.apache.xerces.parsers.BasicParserConfiguration;
import org.apache.xerces.xni.XNIException;
import org.apache.xerces.xni.parser.XMLInputSource;

public class NonValidatingConfiguration
    extends BasicParserConfiguration {

    // Property identifiers under which the shared components are stored.
    public static final String ERROR_REPORTER =
        "http://apache.org/xml/properties/internal/error-reporter";
    public static final String ENTITY_MANAGER =
        "http://apache.org/xml/properties/internal/entity-manager";
    public static final String DOCUMENT_SCANNER =
        "http://apache.org/xml/properties/internal/document-scanner";
    public static final String DTD_SCANNER =
        "http://apache.org/xml/properties/internal/dtd-scanner";
    public static final String LOCALE =
        "http://apache.org/xml/properties/internal/locale";

    // Pipeline components and the components they depend on.
    protected XMLErrorReporter errorReporter = new XMLErrorReporter();
    protected XMLEntityManager entityManager = new XMLEntityManager();
    protected XMLDocumentScannerImpl docScanner = new XMLDocumentScannerImpl();
    protected XMLDTDScannerImpl dtdScanner = new XMLDTDScannerImpl();
    protected XMLNamespaceBinder namespaceBinder = new XMLNamespaceBinder();

    public NonValidatingConfiguration() {
        String[] recognizedProperties = {
            ERROR_REPORTER, ENTITY_MANAGER,
            DOCUMENT_SCANNER, DTD_SCANNER,
            LOCALE,
        };
        addRecognizedProperties(recognizedProperties);

        setProperty(ERROR_REPORTER, errorReporter);
        setProperty(ENTITY_MANAGER, entityManager);
        setProperty(DOCUMENT_SCANNER, docScanner);
        setProperty(DTD_SCANNER, dtdScanner);

        addComponent(errorReporter);
        addComponent(entityManager);
        addComponent(docScanner);
        addComponent(dtdScanner);
        addComponent(namespaceBinder);
    }

    public void parse(XMLInputSource source)
        throws IOException, XNIException {
        reset();
        configurePipeline();
        docScanner.setInputSource(source);
        docScanner.scanDocument(true);
    }

    public void setLocale(Locale locale) {
        setProperty(LOCALE, locale);
    }

    // Scanner -> namespace binder -> registered handler; no DTD validator.
    protected void configurePipeline() {
        docScanner.setDocumentHandler(namespaceBinder);
        namespaceBinder.setDocumentHandler(fDocumentHandler);
        dtdScanner.setDTDHandler(fDTDHandler);
        dtdScanner.setDTDContentModelHandler(fDTDContentModelHandler);
    }

}
First, we create the components that we will use in our parser configuration. Some of the components make up the document and DTD pipelines, and other components are required by the pipeline components.[13]
protected XMLErrorReporter errorReporter = new XMLErrorReporter();
protected XMLEntityManager entityManager = new XMLEntityManager();
protected XMLDocumentScannerImpl docScanner = new XMLDocumentScannerImpl();
protected XMLDTDScannerImpl dtdScanner = new XMLDTDScannerImpl();
protected XMLNamespaceBinder namespaceBinder = new XMLNamespaceBinder();
The basic parser configuration manages the feature and property settings as long as we call the correct methods. Therefore, in the constructor, we add the set of recognized properties, set the properties for our components so that they are accessible by other components in the system, and then add the configurable components to the configuration.
String[] recognizedProperties = {
    ERROR_REPORTER, ENTITY_MANAGER,
    DOCUMENT_SCANNER, DTD_SCANNER,
    LOCALE,
};
addRecognizedProperties(recognizedProperties);

setProperty(ERROR_REPORTER, errorReporter);
setProperty(ENTITY_MANAGER, entityManager);
setProperty(DOCUMENT_SCANNER, docScanner);
setProperty(DTD_SCANNER, dtdScanner);

addComponent(errorReporter);
addComponent(entityManager);
addComponent(docScanner);
addComponent(dtdScanner);
addComponent(namespaceBinder);
Next, we implement a configurePipeline method to connect the document and DTD pipelines.
protected void configurePipeline() {
    docScanner.setDocumentHandler(namespaceBinder);
    namespaceBinder.setDocumentHandler(fDocumentHandler);
    dtdScanner.setDTDHandler(fDTDHandler);
    dtdScanner.setDTDContentModelHandler(fDTDContentModelHandler);
}
Finally, we implement the parse method to reset the components, configure the pipeline, and start the document scanner.
public void parse(XMLInputSource source)
    throws IOException, XNIException {
    reset();
    configurePipeline();
    docScanner.setInputSource(source);
    docScanner.scanDocument(true);
}
Using the xni.Writer sample with our sample configuration, we see the following output on the console:
R:\samples>java xni.Writer -p chap06.xni.NonValidatingConfiguration chap06/data/collection-ns1.xml
<?xml version="1.0" encoding="UTF-8"?>
<collection>
<album cd-id="189EFCF">
<artist>They Might Be Giants</artist>
<title>Flood</title>
</album>
</collection>
Compare this with the output from the default configuration that has a DTD validator in the pipeline:
R:\samples>java xni.Writer chap06/data/collection-ns1.xml
<?xml version="1.0" encoding="UTF-8"?>
<collection xmlns="http://www.company.com/">
<album cd-id="189EFCF">
<artist>They Might Be Giants</artist>
<title>Flood</title>
</album>
</collection>
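The xmlns attribute in the default configuration's output is a defaulted attribute: it is declared in the DTD and does not appear in the document's start tag. This behavior can be observed with the JAXP parser bundled with the JDK, without writing any XNI code. The sketch below rests on two assumptions: the inline document is only a stand-in for collection-ns1.xml, whose actual DTD is not shown here, and the class name DefaultedAttributeDemo is hypothetical. A default, non-validating DocumentBuilder still reads the internal subset and supplies the declared attribute value.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Attr;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

// Hypothetical demo: shows a DTD-defaulted attribute appearing in a DOM tree.
class DefaultedAttributeDemo {

    // Assumed stand-in document; the attribute is declared only in the DTD.
    static final String XML =
          "<!DOCTYPE collection [\n"
        + "  <!ATTLIST collection xmlns CDATA #FIXED 'http://www.company.com/'>\n"
        + "]>\n"
        + "<collection/>";

    /** Parses the stand-in document and returns the root's xmlns attribute. */
    static Attr xmlnsAttribute() {
        try {
            // The default factory is non-validating and not namespace-aware.
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            Element root = factory.newDocumentBuilder()
                    .parse(new InputSource(new StringReader(XML)))
                    .getDocumentElement();
            return root.getAttributeNode("xmlns");
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        Attr attr = xmlnsAttribute();
        System.out.println(attr.getValue());
        // getSpecified() is false when the value came from the DTD default.
        System.out.println(attr.getSpecified());
    }
}
```

The getSpecified method is the DOM's way of distinguishing an attribute written in the document from one supplied by a DTD default.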
Notice that the non-validating parser configuration did not add the default xmlns attribute specified in the DTD.
The XNI examples presented here are simple in scope but demonstrate the flexibility of the framework in constructing new parser components and configurations. Because the Xerces2 parser is designed around this framework, a whole new class of XML applications can be built. Although the DOM and SAX parsers are sufficient for most developers, advanced application programmers will appreciate the power and robustness of the XNI framework.