16.2 Schemas as Syntactic Constraints

In this section, we show one important reason for creating schemas: schemas allow us to focus on documents that satisfy our assumptions about documents.

16.2.1 Checking Unexpected Documents

In Chapter 1, we constructed our PowerWarning application using XML. We obtained the current temperature by using the strategy "The current temperature is shown by the <CurrTemp> tag." We observed that the use of XML keeps programs independent of the way the Web page is displayed for human users. Now, suppose that the XML document shown in Listing 16.1 is received as weather information. Although this document appears to correctly represent weather information, it differs from our expectations in three points: the tag name Current is used rather than CurrTemp, this element is nested inside an intervening Temperature element, and its content is the word "seventy" rather than an integer.
Listing 16.1 Unexpected XML document
<?xml version="1.0" encoding="utf-8"?>
<WeatherReport>
  <City>White Plains</City>
  <State>NY</State>
  <Date>1998-07-25</Date>
  <Time>11:00:00-04:00</Time>
  <Temperature>
    <Current unit="Fahrenheit">seventy</Current>
    <High unit="Fahrenheit">82</High>
    <Low unit="Fahrenheit">62</Low>
  </Temperature>
</WeatherReport>
Our PowerWarning application does not work properly when it receives this document. First, it fails to obtain the element Current. Second, even if it could obtain the element Current, it would fail to obtain the integer 70 from the string "seventy".

Let us see why it fails to obtain the element Current. If we used SAX to construct our PowerWarning application, we tested whether the start tag received by the method startElement has the name CurrTemp. If we used DOM, we used either getElementsByTagName/getElementsByTagNameNS or a combination of the methods getChildNodes and getLocalName. In either case, the tag name Current, rather than CurrTemp, is fatal. When we combine getChildNodes and getLocalName, we fail to reach the element from the root element WeatherReport at all, because getChildNodes returns only immediate children and the element Temperature intervenes; so even if the tag name were CurrTemp rather than Current, our PowerWarning application would not work properly.

In general, when we develop an application program for handling XML documents, we make some assumptions about target documents and deal with only those documents that satisfy these assumptions. The application program works properly when it receives documents satisfying these assumptions but does not work when it receives documents that do not satisfy them. A schema is what lets us check that a given document satisfies such assumptions. A schema is a formal description written in some schema language, and it precisely specifies permissible XML documents. If we have a validator for the schema language, we can validate a given XML document against a schema; that is, we can determine whether the document is permitted by the schema (see Figure 16.1).

Figure 16.1. Validation
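The two failures can be reproduced with a small DOM sketch. The class name CurrTempLookupDemo and the helper methods parse, currentTemperature, and isInteger are hypothetical, written only for this illustration; they follow the getElementsByTagName strategy described above rather than reproducing the actual PowerWarning code from Chapter 1.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class CurrTempLookupDemo {
    // Content of Listing 16.1 (XML declaration omitted)
    static final String UNEXPECTED =
        "<WeatherReport>"
        + "<City>White Plains</City><State>NY</State>"
        + "<Date>1998-07-25</Date><Time>11:00:00-04:00</Time>"
        + "<Temperature>"
        + "<Current unit='Fahrenheit'>seventy</Current>"
        + "<High unit='Fahrenheit'>82</High>"
        + "<Low unit='Fahrenheit'>62</Low>"
        + "</Temperature>"
        + "</WeatherReport>";

    // Parses an XML string into a DOM tree.
    static Document parse(String xml) throws Exception {
        return DocumentBuilderFactory.newInstance().newDocumentBuilder()
            .parse(new InputSource(new StringReader(xml)));
    }

    // Hypothetical lookup following the strategy "the current temperature
    // is shown by the <CurrTemp> tag". Returns null if no CurrTemp exists;
    // throws NumberFormatException if its content is not an integer.
    static Integer currentTemperature(Document doc) {
        NodeList list = doc.getElementsByTagName("CurrTemp");
        if (list.getLength() == 0) {
            return null;
        }
        return Integer.valueOf(list.item(0).getTextContent().trim());
    }

    // True if the text can be read as a decimal integer.
    static boolean isInteger(String text) {
        try {
            Integer.parseInt(text.trim());
            return true;
        } catch (NumberFormatException e) {
            return false;
        }
    }

    public static void main(String[] args) throws Exception {
        Document doc = parse(UNEXPECTED);
        // Failure 1: getElementsByTagName finds no CurrTemp element.
        System.out.println("CurrTemp found: " + (currentTemperature(doc) != null));
        // Failure 2: the Current element holds a word, not an integer.
        String current = doc.getElementsByTagName("Current").item(0).getTextContent();
        System.out.println("\"" + current + "\" is an integer: " + isInteger(current));
    }
}
```

Running main prints "CurrTemp found: false" followed by "\"seventy\" is an integer: false". Nothing in the program itself says why the document was rejected; it simply fails to find what it assumed would be there.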
A schema declares names for tags and attributes. For example, consider a schema for our PowerWarning application. That example schema declares CurrTemp as a permissible tag name. A schema further declares permissible structural relationships among elements and attributes. Our example schema declares that a CurrTemp element is allowed as a child element of a WeatherReport element. A schema also declares the datatypes to which the character contents of elements or the values of attributes belong. Our example schema declares that the character content of CurrTemp is of the datatype integer.

The DTD shown in Chapter 1, Section 1.4, is an example of a schema. It specifies, as far as the DTD language allows, which XML documents are permitted as proper representations of weather information. Because the DTD language is not equipped with rich datatypes, however, the DTD does not specify that the content of CurrTemp is an integer.

16.2.2 What Happens If We Neglect Schemas?

Now, let us reconsider the criticism that schemas are hard to create. Although the DTD in Section 1.4 is simple, its creation is not cost-free, and creating a huge DTD sometimes takes several years. What would happen if we did not create schemas at all, so as to eliminate the burden of schema authoring and maintenance? Neglecting schemas implies that we cannot use validators to determine whether a given document satisfies our assumptions. Rather, our application programs have to perform the required checks themselves. In the PowerWarning application example, the application program is forced to examine conditions such as whether the root element is named WeatherReport, whether that element carries no attributes and no non-whitespace text, and whether its first child element is a City element that has no attributes and no child elements.
Although there are many other things to check, these conditions alone already require a long Java program, as shown in Listing 16.2. If we tried to capture all the conditions specified in the DTD, this program would become significantly longer. Such long validation programs are hard to create and maintain.

Listing 16.2 Part of the Java program HandWrittenValidator.java
package chap16;
/**
 * HandWrittenValidator.java
 **/
import java.io.IOException;
import org.xml.sax.SAXException;
import org.w3c.dom.*;
import javax.xml.parsers.*;
import share.util.MyErrorHandler;

public class HandWrittenValidator {
    public HandWrittenValidator() {
    }

    public void validateWeatherReport(Element element) {
        //Does the root element have a tag name "WeatherReport"?
        if (!(element.getTagName().equals("WeatherReport"))) {
            System.err.println(
                "[Invalid] Incorrect root");
            return;
        }
        //Does the WeatherReport element have any attributes?
        if (element.getAttributes().getLength() != 0) {
            System.err.println(
                "[Invalid] WeatherReport has illegal attributes.");
            return;
        }
        //Validate children: reject non-whitespace text and
        //stop at the first child element
        Node child;
        for (child = element.getFirstChild();
             child != null;
             child = child.getNextSibling()) {
            if ((child instanceof Text)
                && (((Text)child).getData().trim().length() != 0)) {
                System.err.println(
                    "[Invalid] WeatherReport cannot have a text child.");
                return;
            }
            else if (child.getNodeType() == Node.ELEMENT_NODE) {
                break;
            }
        }
        if (child == null) {
            System.err.println(
                "[Invalid] WeatherReport has no child elements.");
            return;
        }
        else {
            //The first child element must be City
            validateCity((Element)child);
        }
    }

    public void validateCity(Element element) {
        //Does this element have a tag name "City"?
        if (!(element.getTagName().equals("City"))) {
            System.err.println("[Invalid] City is missing.");
            return;
        }
        //Does the City element have any attributes?
        if (element.getAttributes().getLength() != 0) {
            System.err.println(
                "[Invalid] City has illegal attributes.");
            return;
        }
        //Does the City element have only character contents,
        //that is, no child elements?
        for (Node child = element.getFirstChild();
             child != null;
             child = child.getNextSibling()) {
            if (child.getNodeType() == Node.ELEMENT_NODE) {
                System.err.println(
                    "[Invalid] City cannot have an element child.");
                return;
            }
        }
    }
    ...
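For comparison with Listing 16.2, the same kind of checking can be delegated to a validator, as in Figure 16.1. The following is a minimal sketch using the standard JAXP javax.xml.validation API. It assumes a W3C XML Schema instead of the DTD of Section 1.4, because the DTD language cannot declare that CurrTemp holds an integer. The schema itself is hypothetical: the element order, the xs:string types of Date and Time, and the type name Temp are assumptions made for this illustration, not the book's schema. The unit attribute is declared as required, so a report that omits it is rejected as well.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import javax.xml.validation.Validator;
import org.xml.sax.SAXException;

public class SchemaValidationDemo {
    // Hypothetical schema: names, structure, and datatypes are all declared here.
    static final String SCHEMA =
        "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'>"
        + "<xs:complexType name='Temp'><xs:simpleContent>"
        + "<xs:extension base='xs:integer'>"
        + "<xs:attribute name='unit' type='xs:string' use='required'/>"
        + "</xs:extension></xs:simpleContent></xs:complexType>"
        + "<xs:element name='WeatherReport'><xs:complexType><xs:sequence>"
        + "<xs:element name='City' type='xs:string'/>"
        + "<xs:element name='State' type='xs:string'/>"
        + "<xs:element name='Date' type='xs:string'/>"
        + "<xs:element name='Time' type='xs:string'/>"
        + "<xs:element name='CurrTemp' type='Temp'/>"
        + "<xs:element name='High' type='Temp'/>"
        + "<xs:element name='Low' type='Temp'/>"
        + "</xs:sequence></xs:complexType></xs:element>"
        + "</xs:schema>";

    static final String HEAD =
        "<WeatherReport><City>White Plains</City><State>NY</State>"
        + "<Date>1998-07-25</Date><Time>11:00:00-04:00</Time>";
    // A report that satisfies the assumptions.
    static final String EXPECTED = HEAD
        + "<CurrTemp unit='Fahrenheit'>70</CurrTemp>"
        + "<High unit='Fahrenheit'>82</High><Low unit='Fahrenheit'>62</Low>"
        + "</WeatherReport>";
    // Content of Listing 16.1 (XML declaration omitted).
    static final String UNEXPECTED = HEAD
        + "<Temperature><Current unit='Fahrenheit'>seventy</Current>"
        + "<High unit='Fahrenheit'>82</High><Low unit='Fahrenheit'>62</Low>"
        + "</Temperature></WeatherReport>";
    // Like EXPECTED, but CurrTemp omits the unit attribute.
    static final String MISSING_UNIT = HEAD
        + "<CurrTemp>70</CurrTemp>"
        + "<High unit='Fahrenheit'>82</High><Low unit='Fahrenheit'>62</Low>"
        + "</WeatherReport>";

    // Returns null if the document is valid, otherwise the validator's message.
    static String validate(String xml) throws SAXException, IOException {
        SchemaFactory factory =
            SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
        Schema schema = factory.newSchema(new StreamSource(new StringReader(SCHEMA)));
        Validator validator = schema.newValidator();
        try {
            // With no ErrorHandler set, validation errors are thrown as SAXException.
            validator.validate(new StreamSource(new StringReader(xml)));
            return null;
        } catch (SAXException e) {
            return String.valueOf(e.getMessage());
        }
    }

    public static void main(String[] args) throws SAXException, IOException {
        String[][] cases = {
            {"expected", EXPECTED},
            {"Listing 16.1", UNEXPECTED},
            {"missing unit", MISSING_UNIT}
        };
        for (String[] c : cases) {
            String error = validate(c[1]);
            System.out.println(c[0] + ": "
                + (error == null ? "valid" : "invalid (" + error + ")"));
        }
    }
}
```

Here the root name, the element order, the required attribute, and the integer datatype are all stated declaratively in the schema, and the program merely invokes the validator. A real application would typically compile the Schema once and reuse it for many documents rather than recompiling it on every call.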
Checking by schemas and validators is more effective than checking by application programs. For application programs to check documents, programmers have to write lengthy programs, as in Listing 16.2. When many application programs use these documents, such checking programs have to be written many times (possibly in different programming languages). On the other hand, once a schema is created, validation can easily be repeated many times; application programmers only have to invoke their favorite validators.

Furthermore, a fundamental problem of not creating schemas is that assumptions about XML documents become unclear. If different programmers have different understandings of these assumptions, application programs that should interwork through XML documents fail to interwork. For example, suppose that a programmer thinks that the attribute unit is optional and creates a program that does not output this attribute. The XML documents created by this program cannot be correctly handled by other programs that rely on this attribute.

Put another way, we can dispense with schemas only when we can clearly specify permissible XML documents in prose and we are willing to write programs for the required checks (possibly several times). Otherwise, we have to create schemas and validate documents with validators. The cost of schema authoring and maintenance is a problem, but we cannot completely avoid it. All we can do is choose good schema languages, which minimize the cost of schema authoring and maintenance while taking full advantage of schemas.

16.2.3 Desiderata for Schema Languages

The discussion in this section leads to some desiderata for schema languages. They are basic and indispensable for making Web application development easier by using schemas. In other words, a schema language that fails to satisfy these desiderata should be avoided.
Although these desiderata are important, we do not discuss whether each schema language satisfies them, because such a discussion can easily become subjective and unpersuasive. Readers are encouraged to read the schema examples in this book and in other resources and draw their own conclusions. In particular, the report of the schema language comparison panel at XML 2001 is helpful.