XML and Java: Developing Web Applications, Second Edition

2.3 More about Parsing XML Documents

This section covers more parsing examples in various environments.

2.3.1 Parsing XML Documents with Namespaces

Namespaces allow us to use multiple tag sets in a single XML document. However, we need a trick when we want to use a DTD to specify the structure of such a document. This section covers how to parse and validate XML documents with namespaces. Listing 2.9 shows an example of an XML document with a namespace.

Listing 2.9 XML document with namespaces, chap02/department-ns.xml
<?xml version="1.0" encoding="utf-8"?>
<!DOCTYPE org:department SYSTEM "department-ns.dtd">
<org:department xmlns:org="http://www.schema.org/department/">
   <org:employee id="J.D">
      <org:name>John Doe</org:name>
      <org:email>John.Doe@foo.com</org:email>
   </org:employee>
   <org:employee id="B.S">
      <org:name>Bob Smith</org:name>
      <org:email>Bob.Smith@foo.com</org:email>
   </org:employee>

   <org:employee id="A.M">
      <org:name>Alice Miller</org:name>
      <org:url href="http://www.foo.com/~amiller/"/>
   </org:employee>
</org:department>

This document is valid, so we can parse it with a validating processor. However, if the processor is not namespace-aware, the element type name of the root element is simply org:department, and it is impossible to access the namespace prefix (org), local name (department), and namespace URI (http://www.schema.org/department/) separately. To handle namespaces correctly, we must tell the XML processor to be namespace-aware. In Xerces, we use the setFeature() method, just as we did to set the validation feature in the previous section.

The DTD for the document is shown in Listing 2.10. Because the DTD does not support namespaces, we must embed the namespace prefix in the element declarations in order to validate the document, although the prefix does not have to be fixed. Refer to Section 6.2.1 for the trick of using parameter entities.

Listing 2.10 XML document with DTD and namespaces, chap02/department-ns.dtd
<?xml version="1.0" encoding="utf-8"?>
<!ELEMENT org:department (org:employee)*>
<!ATTLIST org:department xmlns:org CDATA
              #FIXED "http://www.schema.org/department/">
<!ELEMENT org:employee (org:name, (org:email | org:url))>
<!ATTLIST org:employee id CDATA #REQUIRED>
<!ELEMENT org:name (#PCDATA)>
<!ELEMENT org:email (#PCDATA)>
<!ELEMENT org:url EMPTY>
<!ATTLIST org:url href CDATA #REQUIRED>

Listing 2.11 shows a sample program to validate an XML document with namespaces.

Listing 2.11 Parsing an XML document with namespaces, chap02/SimpleParseWithNS.java
       package chap02;
       /**
        *       SimpleParseWithNS.java
        **/
       import org.w3c.dom.Document;
       import org.xml.sax.InputSource;
       import org.xml.sax.SAXException;
       import org.xml.sax.SAXParseException;
       import org.xml.sax.ErrorHandler;
       import org.apache.xerces.parsers.DOMParser;
       import org.apache.xerces.parsers.SAXParser;
       import share.util.MyErrorHandler;
       import java.io.IOException;

       public class SimpleParseWithNS {

           public static void main(String[] argv) {
               if (argv.length != 1) {
                   System.err.println(
                       "Usage: java chap02.SimpleParseWithNS <filename>");
                   System.exit(1);
               }
               try {
                   // Creates parser object
                   DOMParser parser = new DOMParser();
                   // Tells the parser to validate documents
                   parser.setFeature(
                       "http://xml.org/sax/features/validation", true);
[30]               // Tells the parser to be aware of namespaces
[31]               parser.setFeature(
[32]                   "http://xml.org/sax/features/namespaces", true);
                   // Sets an ErrorHandler
                   parser.setErrorHandler(new MyErrorHandler());
                   // Parses an XML Document
                   parser.parse(argv[0]);
                   // Gets a Document object
                   Document doc = parser.getDocument();
                   // Does something
               } catch (Exception e) {
                   e.printStackTrace();
               }
           }
       }

This program is very similar to SimpleParseWithValidation, shown in Listing 2.6. The difference is the following line:

[31] parser.setFeature("http://xml.org/sax/features/namespaces", true);

This tells the processor to recognize namespaces. As a result, an application can access the namespace prefix, the local part, and the URI with the methods explained in Chapter 4. In the current version of Xerces, the default value of this feature is true. However, we recommend that you specify the feature explicitly. If you set this feature to false, the parser does not recognize namespaces. For example, if the element type name is ns1:root, in which ns1 is the namespace prefix, the prefix is just treated as a part of the element type name. Therefore, it is natural to keep the default value (true) unless you have a strong reason not to use it (generally, namespace handling involves some cost). Further discussions on namespaces, the DTD, and XML Schema appear in Sections 6.2.1 and 9.2.
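The effect of the namespace feature can be seen in a small experiment. The following is only a sketch: instead of the Xerces native DOMParser, it assumes the JAXP parser bundled with the JDK (the JAXP API is introduced in Section 2.3.4), and the class name NamespaceAwarenessDemo and the cut-down inline document are hypothetical.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

public class NamespaceAwarenessDemo {

    // A cut-down version of the document in Listing 2.9, without the DOCTYPE
    static final String XML =
        "<org:department xmlns:org=\"http://www.schema.org/department/\"/>";

    // Parses XML and reports how the name of the root element is exposed
    public static String describeRoot(boolean namespaceAware) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(namespaceAware);
        DocumentBuilder builder = factory.newDocumentBuilder();
        Element root = builder.parse(new InputSource(new StringReader(XML)))
                              .getDocumentElement();
        return "name=" + root.getNodeName()
             + " prefix=" + root.getPrefix()
             + " local=" + root.getLocalName()
             + " uri=" + root.getNamespaceURI();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(describeRoot(true));   // namespace-aware parse
        System.out.println(describeRoot(false));  // prefix is part of the name
    }
}
```

In the namespace-aware run, the prefix, local name, and URI are reported separately. In the other run, only the qualified name org:department is available and the namespace URI is null.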

2.3.2 Parsing XML Documents with XML Schema

XML Schema is a specification that enhances the DTD and has some advantages over it. It became a W3C Recommendation on May 2, 2001. Chapter 9 discusses XML Schema in detail. In this section, we show a basic technique to parse and validate an XML document with an XML Schema. Xerces supports most of the XML Schema specification, but the details are still under discussion (see http://xml.apache.org/xerces-j/schema.html for the limitations of the support), so you may see warnings when parsing an XML document with XML Schema validation. Listing 2.12 shows an example of an XML Schema that describes the same document structure as the DTD in the previous section.

Listing 2.12 XML Schema example, chap02/department.xsd
<?xml version="1.0"?>
<schema xmlns="http://www.w3.org/2001/XMLSchema"
        xmlns:org="urn:department"
        targetNamespace="urn:department">

 <element name="department">
  <complexType>
   <sequence>
     <element ref="org:employee" minOccurs='0'
                                 maxOccurs='unbounded'/>

   </sequence>
  </complexType>
 </element>

 <element name="employee">
  <complexType>
   <sequence>
     <element ref="org:name"/>
     <choice>
      <element ref="org:email" minOccurs='0' maxOccurs='1'/>
      <element ref="org:url"   minOccurs='0' maxOccurs='1'/>
     </choice>
   </sequence>
   <attribute name="id"  type="ID" use='required'/>
  </complexType>
 </element>

 <element name="name" type='string'/>

 <element name="email" type='string'/>

 <element name="url">
  <complexType>
   <attribute name="href" type="string" use='required'/>
  </complexType>
 </element>

</schema>

Listing 2.13 shows an XML document that uses this XML Schema. The location of the schema is specified by using the xsi:schemaLocation attribute at the root element of the document.

Listing 2.13 XML document with XML Schema, chap02/department-schema.xml
<?xml version="1.0" encoding="utf-8"?>
<org:department xmlns:org="urn:department"
        xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
        xsi:schemaLocation="urn:department department.xsd">

  <org:employee id="J.D">
    <org:name>John Doe</org:name>
    <org:email>John.Doe@foo.com</org:email>
  </org:employee>

  <org:employee id="B.S">
    <org:name>Bob Smith</org:name>
    <org:email>Bob.Smith@foo.com</org:email>
  </org:employee>

  <org:employee id="A.M">
    <org:name>Alice Miller</org:name>
    <org:url href="http://www.foo.com/~amiller/"/>
  </org:employee>
</org:department>

The program to validate the document is SimpleParseWithSchemaValidation, which is stored on the CD-ROM. The difference between this program and SimpleParseWithNS (see Listing 2.11) is the following line:

parser.setFeature(
    "http://apache.org/xml/features/validation/schema", true);

This sets the feature for XML Schema support. The following is the output of the program:

R:\samples>java chap02.SimpleParseWithSchemaValidation chap02/department-schema.xml
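For readers without the CD-ROM, schema validation can also be tried with a self-contained sketch. Note the assumptions: it does not use the Xerces feature shown above but the javax.xml.validation package that later JAXP versions added to the JDK, and the class name SchemaValidationSketch and the reduced inline schema (only loosely modeled on Listing 2.12) are hypothetical.

```java
import java.io.StringReader;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import org.xml.sax.SAXException;

public class SchemaValidationSketch {

    // A reduced schema in the spirit of Listing 2.12 (not the book's file)
    static final String XSD =
        "<schema xmlns='http://www.w3.org/2001/XMLSchema'"
      + "        xmlns:org='urn:department' targetNamespace='urn:department'>"
      + " <element name='department'>"
      + "  <complexType><sequence>"
      + "   <element ref='org:name' minOccurs='0' maxOccurs='unbounded'/>"
      + "  </sequence></complexType>"
      + " </element>"
      + " <element name='name' type='string'/>"
      + "</schema>";

    // Returns true if xml is valid against XSD, false on a validity error
    public static boolean isValid(String xml) throws Exception {
        SchemaFactory schemaFactory =
            SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
        Schema schema =
            schemaFactory.newSchema(new StreamSource(new StringReader(XSD)));
        try {
            // Without a custom ErrorHandler, validity errors are thrown
            schema.newValidator()
                  .validate(new StreamSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            return false;
        }
    }

    public static void main(String[] args) throws Exception {
        String valid = "<org:department xmlns:org='urn:department'>"
                     + "<org:name>John Doe</org:name></org:department>";
        String invalid = "<org:department xmlns:org='urn:department'>"
                       + "<org:email>x</org:email></org:department>";
        System.out.println(isValid(valid));
        System.out.println(isValid(invalid));
    }
}
```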

2.3.3 Design Point: The DTD versus XML Schema

At the time of this writing, several schema languages are available, including the DTD, XML Schema, and RELAX (see Chapter 16), to specify the structure of an XML document. This section describes the pros and cons of using a DTD and XML Schema.

If you want to define data types, you should use XML Schema. A DTD essentially has a single data type: String. It does not support numeric data types, for example.
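The difference in data typing can be illustrated with a short sketch. It assumes the javax.xml.validation package of later JDKs, and the element name headcount and the class name DataTypeSketch are invented for this example. A DTD could declare the element only as (#PCDATA), whereas the schema below restricts it to an integer.

```java
import java.io.StringReader;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import org.xml.sax.SAXException;

public class DataTypeSketch {

    // Declares a single element whose content must be an xs:int
    static final String XSD =
        "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'>"
      + "<xs:element name='headcount' type='xs:int'/>"
      + "</xs:schema>";

    // Returns true if xml is valid against XSD, false on a validity error
    public static boolean isValid(String xml) throws Exception {
        Schema schema = SchemaFactory
            .newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI)
            .newSchema(new StreamSource(new StringReader(XSD)));
        try {
            schema.newValidator()
                  .validate(new StreamSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            return false;
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(isValid("<headcount>42</headcount>"));
        System.out.println(isValid("<headcount>many</headcount>"));
    }
}
```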

If you want to use namespaces, using XML Schema is better, because the DTD does not support namespaces. If you want to use namespaces in a DTD, you can use the trick shown in Section 6.2.1.

If you don't need data types and namespaces, using a DTD is still a good decision for the following reasons. First, at this moment, the level of conformance to XML Schema varies among XML processors. By contrast, all processors can handle DTDs as defined by the XML 1.0 specification.

Second, the XML Schema specification does not specify how an XML processor should obtain a schema definition. That means it depends on the implementations. By contrast, the location of a DTD is always given in a DOCTYPE declaration.

2.3.4 Parsing XML Documents with JAXP

The Java API for XML Processing (JAXP) is a common API to handle (parse and transform) XML documents. As this book is being written, the current version of JAXP is 1.1. The JAXP specification is available from http://www.jcp.org/jsr/detail/5.jsp (you can also see a related specification in Appendix C). DOM and SAX are well known as APIs for accessing XML documents. JAXP provides an API that is not specified in DOM and SAX: an API for parsing and transforming XML documents. Xerces supports JAXP, and we can write highly interoperable source code that does not depend on the implementation of an XML processor.

Listing 2.14 shows an example of DOM-based parsing with JAXP.

Listing 2.14 Parsing an XML document with the JAXP API, chap02/SimpleParseJAXP.java
       package chap02;
       /**
        *        SimpleParseJAXP.java
        **/
       import java.io.IOException;
       import org.w3c.dom.Document;
       import org.xml.sax.ErrorHandler;
       import org.xml.sax.SAXException;
       import org.xml.sax.SAXParseException;
       import javax.xml.parsers.DocumentBuilderFactory;
       import javax.xml.parsers.DocumentBuilder;
       import javax.xml.parsers.FactoryConfigurationError;
       import javax.xml.parsers.ParserConfigurationException;
       import share.util.MyErrorHandler;

       public class SimpleParseJAXP {

          public static void main(String[] argv) {
             if (argv.length != 1) {
                System.err.println(
                       "Usage: java chap02.SimpleParseJAXP <filename>");
                System.exit(1);
             }
             try {
                 // Creates document builder factory
                 DocumentBuilderFactory factory =
                             DocumentBuilderFactory.newInstance();
[28]             // Tells the parser to validate documents
[29]             factory.setValidating(true);
[30]             // Tells the parser to be aware of namespaces
[31]             factory.setNamespaceAware(true);
[32]             // Creates builder object
[33]             DocumentBuilder builder =
                             factory.newDocumentBuilder();
[35]             // Sets an ErrorHandler
[36]             builder.setErrorHandler(new MyErrorHandler());
[37]             // Parses the document
[38]             Document doc = builder.parse(argv[0]);
             } catch (ParserConfigurationException pe) {
                 pe.printStackTrace();
             } catch (SAXException se) {
                 se.printStackTrace();
             } catch (IOException ioe) {
                 ioe.printStackTrace();
             }
          }
       }

The main difference from SimpleParse, shown in Listing 2.2, is the use of a factory method to create an instance of the parser. This technique is based on the Abstract Factory and Factory Method design patterns. The DOM Document interface is another example: it plays the role of an abstract factory that provides factory methods to create DOM nodes without exposing their implementation classes.

A number of these techniques have been found through the development of large programs. The standard Java class library uses a lot of design patterns, so you may have already noticed them. In this book, we try to introduce various design patterns to develop reusable program code. We recommend Design Patterns, by Erich Gamma et al. (Addison-Wesley, ISBN 0201633612), for further reading.

Instead of the application creating the DOM parser instance directly, a special class called a factory has the responsibility of creating it. Factories are based on design patterns that improve the reusability of software components, and they are used throughout this book. The features for supporting validation and namespaces are set on the factory instance, whereas in the previous examples they were set on a parser instance. The JAXP approach is thus employed to abstract away the XML processor.
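The role of the Document interface as an abstract factory can be sketched as follows. This is only an illustration that assumes the JAXP implementation bundled with the JDK; the class name DocumentFactorySketch is hypothetical, and the element names are borrowed from the department examples. The program never names a concrete node class: every node is obtained from a create method of Document.

```java
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

public class DocumentFactorySketch {

    // Builds a tiny department tree using only DOM interfaces
    public static Document buildDepartment() throws Exception {
        DocumentBuilder builder =
            DocumentBuilderFactory.newInstance().newDocumentBuilder();
        Document doc = builder.newDocument();

        // Factory methods on Document hide the implementation classes
        Element department = doc.createElement("department");
        Element employee = doc.createElement("employee");
        employee.setAttribute("id", "J.D");
        Element name = doc.createElement("name");
        name.appendChild(doc.createTextNode("John Doe"));

        employee.appendChild(name);
        department.appendChild(employee);
        doc.appendChild(department);
        return doc;
    }

    public static void main(String[] args) throws Exception {
        Document doc = buildDepartment();
        System.out.println(doc.getDocumentElement().getNodeName());
        // The concrete class depends on the XML processor in use
        System.out.println(doc.getClass().getName());
    }
}
```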

[29]   factory.setValidating(true);
[31]   factory.setNamespaceAware(true);

The setValidating() method (line 29) tells the DocumentBuilderFactory object to create validating parsers. Note that Xerces provides two separate features for DTD and XML Schema validation, whereas the current JAXP API provides only a single method. When you use Xerces, calling factory.setValidating(true) sets both features. If a JAXP-compliant XML processor does not support XML Schema, only DTD validation is activated.

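The setNamespaceAware() method (line 31) tells the DocumentBuilderFactory object to create namespace-aware parsers. The native Xerces parser is namespace-aware by default, but the JAXP specification defines the default for DocumentBuilderFactory as false, so we set the value explicitly.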

A DOM parser instance is created from the factory instance.

[33]   DocumentBuilder builder = factory.newDocumentBuilder();

The method creates a DocumentBuilder instance. The mechanism to determine the implementation class of the DocumentBuilder interface is shown in Section 2.4.2.

An error handler is set to the parser instance and the program parses an input XML document.

[36]   builder.setErrorHandler(new MyErrorHandler());
[38]   Document doc = builder.parse(argv[0]);

The following is the result of the program.

R:\samples>java chap02.SimpleParseJAXP chap02/department-dtd.xml
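Because MyErrorHandler and the sample files live on the CD-ROM, the same sequence of steps can be tried with the following self-contained sketch. It makes several assumptions: the nested CollectingHandler class is a hypothetical stand-in for share.util.MyErrorHandler, the documents are inline strings with an internal DTD subset rather than files, and the parser is whatever JAXP implementation the JDK provides.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.ErrorHandler;
import org.xml.sax.InputSource;
import org.xml.sax.SAXParseException;

public class JaxpValidationSketch {

    // Hypothetical stand-in for MyErrorHandler: records validity errors
    static class CollectingHandler implements ErrorHandler {
        final List<String> errors = new ArrayList<>();
        public void warning(SAXParseException e) { /* ignored here */ }
        public void error(SAXParseException e) { errors.add(e.getMessage()); }
        public void fatalError(SAXParseException e) throws SAXParseException {
            throw e; // the document is not well-formed; give up
        }
    }

    static final String DTD =
        "<!DOCTYPE department ["
      + "<!ELEMENT department (name)*>"
      + "<!ELEMENT name (#PCDATA)>"
      + "]>";

    // Parses xml with DTD validation and returns the number of errors
    public static int countValidityErrors(String xml) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setValidating(true);      // same role as line 29
        factory.setNamespaceAware(true);  // same role as line 31
        DocumentBuilder builder = factory.newDocumentBuilder();
        CollectingHandler handler = new CollectingHandler();
        builder.setErrorHandler(handler);
        builder.parse(new InputSource(new StringReader(xml)));
        return handler.errors.size();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(countValidityErrors(
            DTD + "<department><name>John Doe</name></department>"));
        System.out.println(countValidityErrors(
            DTD + "<department><email>x</email></department>"));
    }
}
```

The first document conforms to the DTD, so no errors are recorded. The second contains an undeclared email element, so the handler receives at least one validity error while parsing still completes.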

2.3.5 Design Point: JAXP and Xerces Native API

In the previous sections, you learned how to parse an XML document with the Xerces native and JAXP APIs. This section discusses their advantages and disadvantages.

Using JAXP provides a way to write highly reusable program code. By contrast, code written against a native API is tied to one processor: for example, the small program shown in Listing 2.2 does not work without Xerces. Therefore, JAXP should be used whenever an application requires only basic parsing capability, because most XML processors, including Xerces, support or plan to support JAXP.

However, JAXP supports only a basic method for parsing, so it might not be enough when you want to do something special. Chapter 6 is devoted to the special but useful patterns (tricks) that can be used with Xerces. For example, if an application requires a custom DOM implementation, Xerces provides an easy way to accomplish it (see Section 6.3.2). So developers should consider what they require of an XML processor before deciding which API is best.
