16.5 General-Purpose Schema Languages
In this section, we consider four notable schema languages: DTD, W3C XML Schema, RELAX NG, and Schematron. We study whether these languages satisfy the desiderata shown in this chapter.
16.5.1 DTD
The DTD language does not satisfy desideratum 6: Schema languages should not change trees created by XML parsers. The reason is that default values, entity declarations, and notation declarations specified in DTDs affect the result of parsing. As a result, nonvalidating parsers do not always behave the same as validating ones. This lack of interoperability has caused problems in application development.
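The effect of DTD default values on the parse result can be sketched with Python's standard expat binding. The document below, its element name memo, and its attribute priority are made up for illustration. The same text is parsed twice: once with attribute defaults from the DTD applied, and once reporting only the attributes written in the instance. The two modes merely stand in for parsers that do and do not process the declarations.

```python
from xml.parsers import expat

# Hypothetical document: the internal DTD subset declares a default value.
DOC = """<!DOCTYPE memo [
  <!ATTLIST memo priority CDATA "normal">
]>
<memo>Lunch at noon</memo>"""


def root_attributes(text, specified_only):
    """Return the attributes reported for the root element."""
    seen = []
    parser = expat.ParserCreate()
    # When true, expat reports only attributes written in the instance,
    # not those derived from ATTLIST declarations.
    parser.specified_attributes = specified_only
    parser.StartElementHandler = lambda name, attrs: seen.append(dict(attrs))
    parser.Parse(text, True)
    return seen[0]


with_defaults = root_attributes(DOC, specified_only=False)
without_defaults = root_attributes(DOC, specified_only=True)
print(with_defaults)     # attributes when DTD defaults are applied
print(without_defaults)  # attributes actually present in the instance
```

An application that reads the priority attribute sees a value in the first case and nothing in the second, although the document is the same.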
16.5.2 W3C XML Schema
W3C XML Schema does not satisfy desideratum 6. The reason is the post-schema-validation infoset (PSVI): much of the information specified in schemas (including default values) is added to the trees created by parsing. As a result, we cannot safely omit validation even when documents are guaranteed to be valid.
PSVI is intended to support data model 3 (an XML document is a collection of data compliant with a schema). PSVI contains diagnostic messages, data of some datatypes (for example, the integer 1), references to declarations in schemas, and default values. However, PSVI is defined only in an abstract manner as of this writing. Neither DOM nor SAX can handle PSVI, and there are no standards for concretely representing PSVI. This author believes that PSVI complicates W3C XML Schema and that data model 3 should be supported by data binding tools rather than PSVI.
W3C XML Schema provides derivation by extension, derivation by restriction, and substitution groups. These mechanisms support one style of inheritance. Some people believe that they narrow the gaps between XML and programs or databases.
However, these mechanisms complicate W3C XML Schema significantly. This author believes that these mechanisms make data model 3 difficult to support because the inheritance of W3C XML Schema is very different from the inheritance of programming languages.
By mimicking NULL of SQL, xsi:nil is intended to narrow the gaps between W3C XML Schema and RDBMS. Although this attribute does not complicate W3C XML Schema, it is specific to one particular type of application. This author believes that an independent specification should be created for this attribute, if it is really necessary.
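As a rough illustration of the NULL analogy, the following sketch shows how an application might map xsi:nil to a null value when loading a record. The element names row, name, and phone are hypothetical, and the function is only one possible reading of the attribute, not part of any specification.

```python
import xml.etree.ElementTree as ET

XSI_NIL = "{http://www.w3.org/2001/XMLSchema-instance}nil"

# Hypothetical record; element names are illustrative only.
DOC = """<row xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
  <name>Alice</name>
  <phone xsi:nil="true"/>
</row>"""


def column_value(element):
    """Map xsi:nil="true" (or "1") to None, much as SQL NULL maps to a null value."""
    if element.get(XSI_NIL) in ("true", "1"):
        return None
    return element.text or ""


row = {child.tag: column_value(child) for child in ET.fromstring(DOC)}
print(row)
```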
In summary, W3C XML Schema attempts to solve problems by incorporating many mechanisms from other technologies. However, opponents (including this author) believe that these mechanisms complicate W3C XML Schema and do not help in interworking with other technologies.
16.5.3 RELAX NG
RELAX NG satisfies desideratum 6 because it does not change trees created by XML parsers. We can thus safely omit validation when documents are guaranteed to be valid. On the other hand, desideratum 6 implies that RELAX NG and its validators do not provide default values. However, to help migration from DTDs to RELAX NG, annotations in RELAX NG schemas can represent default values. Without performing validation, programs can use such default values to transform trees created by XML parsers. In other words, RELAX NG separates validation and default values.
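The separation of validation and default values can be sketched as follows. The sketch assumes that defaults are recorded with the a:defaultValue annotation of the RELAX NG DTD Compatibility specification. The schema, the element name memo, and the attribute priority are made up. The code handles only named element patterns with named attribute patterns below them, ignores namespaces in the instance, and performs no validation at all.

```python
import xml.etree.ElementTree as ET

RNG = "{http://relaxng.org/ns/structure/1.0}"
# Annotation attribute from the RELAX NG DTD Compatibility specification.
DEFAULT = "{http://relaxng.org/ns/compatibility/annotations/1.0}defaultValue"

# Hypothetical schema: element and attribute names are illustrative only.
SCHEMA = """<element name="memo" xmlns="http://relaxng.org/ns/structure/1.0"
    xmlns:a="http://relaxng.org/ns/compatibility/annotations/1.0">
  <optional>
    <attribute name="priority" a:defaultValue="normal"/>
  </optional>
  <text/>
</element>"""


def collect_defaults(schema_root):
    """Map element name -> {attribute name: default value} (simplified)."""
    defaults = {}

    def visit(pattern, owner):
        if pattern.tag == RNG + "element" and "name" in pattern.attrib:
            owner = pattern.get("name")  # attributes below belong to this element
        elif pattern.tag == RNG + "attribute" and DEFAULT in pattern.attrib:
            if owner is not None and "name" in pattern.attrib:
                defaults.setdefault(owner, {})[pattern.get("name")] = pattern.get(DEFAULT)
        for child in pattern:
            visit(child, owner)

    visit(schema_root, None)
    return defaults


def apply_defaults(instance_root, defaults):
    """Add missing attributes in place; explicit values are left untouched."""
    for element in instance_root.iter():
        for name, value in defaults.get(element.tag, {}).items():
            element.attrib.setdefault(name, value)
    return instance_root


defaults = collect_defaults(ET.fromstring(SCHEMA))
memo = apply_defaults(ET.fromstring("<memo>Lunch at noon</memo>"), defaults)
explicit = apply_defaults(ET.fromstring('<memo priority="high"/>'), defaults)
print(memo.attrib, explicit.attrib)
```

Because the transformation is a separate step, a program that does not need default values can skip it, and the tree produced by the parser stays the same whether or not the schema is consulted.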
RELAX NG does not provide any mechanisms specific to particular programming languages or RDBMSs. It does not have inheritance and does not have anything like xsi:nil.
The absence of inheritance in RELAX NG is certainly debatable. James Clark has argued that modeling languages such as Unified Modeling Language (UML) should be used for representing inheritance and that RELAX NG schemas generated from UML do not have to provide inheritance. It is also possible to create syntactic sugar that mimics inheritance and to convert schemas using such syntactic sugar to RELAX NG schemas. Because we can introduce different sets of syntactic sugar for different styles of inheritance, this approach can narrow the gap between RELAX NG and any programming language.
Data binding tools for RELAX NG can support data model 3. Two data binding tools have been developed for RELAX NG: Relaxer (introduced in Chapter 15) and RelaxNGCC.
Relaxer generates Java programs from schemas and provides bidirectional mapping between XML documents and Java objects. Although Relaxer was originally designed for RELAX Core, it can also handle RELAX NG. The absence of inheritance in RELAX Core and RELAX NG allows Relaxer to take full advantage of the inheritance of Java (see Section 15.2.2).
Intuitively speaking, RelaxNGCC is yacc or JavaCC for RELAX NG. To manipulate XML documents that are valid against a RELAX NG schema, programmers insert Java code fragments in this schema. Then, RelaxNGCC generates a main program that executes these code fragments during parsing. By invoking this main program, programmers can easily handle XML documents with Java programs. Although RelaxNGCC supports the creation of Java objects from XML documents, it does not support the creation of XML documents from Java objects.
Both Relaxer and RelaxNGCC are on the accompanying CD-ROM.
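To make the idea of a bidirectional mapping concrete, here is a hand-written sketch of the kind of class a data binding tool might generate. It is written in Python rather than Java, the class Author and its methods are hypothetical, and it does not reproduce the code that Relaxer or RelaxNGCC actually generates.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass


@dataclass
class Author:
    """Hypothetical binding for <author id="..."><name>...</name></author>."""
    id: str
    name: str

    @classmethod
    def from_xml(cls, element):
        # XML document -> object
        return cls(id=element.get("id"), name=element.findtext("name", default=""))

    def to_xml(self):
        # object -> XML document
        element = ET.Element("author", {"id": self.id})
        ET.SubElement(element, "name").text = self.name
        return element


author = Author.from_xml(ET.fromstring('<author id="a1"><name>Alice</name></author>'))
round_trip = ET.tostring(author.to_xml(), encoding="unicode")
print(author)
print(round_trip)
```

The application works with typed fields instead of tree nodes, which is how data model 3 can be supported without adding anything to the trees created by the parser.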
RELAX NG has a mathematical foundation, which is the tree or hedge automaton theory. This foundation provides a solid basis for implementing validators for RELAX NG. Furthermore, query languages (such as XQuery) and programming languages (such as XDuce) are based on the same foundation.
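The flavor of this foundation can be conveyed by a toy bottom-up hedge automaton. This is a minimal sketch under strong assumptions: the rules and state names are invented, text content is ignored, and the brute-force search over child states is far less efficient than the algorithms used by real RELAX NG validators. Each rule assigns a state to an element when the sequence of its children's states matches a regular expression.

```python
import re
import xml.etree.ElementTree as ET
from itertools import product

# Hypothetical transition rules: (element name, regular expression over the
# states of the children, resulting state). Each state is a single letter.
RULES = [
    ("name", r"", "N"),
    ("author", r"N", "A"),
    ("authors", r"A+", "S"),
]


def states(element):
    """Return the set of states the automaton can assign to an element."""
    child_states = [states(child) for child in element]
    result = set()
    for label, content, state in RULES:
        if label != element.tag:
            continue
        # Try every combination of child states against the content model.
        for combo in product(*child_states):
            if re.fullmatch(content, "".join(combo)):
                result.add(state)
                break
    return result


def accepts(text, final_states=frozenset("S")):
    """A document is accepted if its root can reach a final state."""
    return bool(states(ET.fromstring(text)) & final_states)


valid = accepts("<authors><author><name>A</name></author></authors>")
empty = accepts("<authors/>")
wrong_child = accepts("<authors><name/></authors>")
print(valid, empty, wrong_child)
```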
In summary, RELAX NG concentrates on validation and lets data binding tools provide data model 3. By doing so, RELAX NG has become simple yet powerful.
16.5.4 Schematron
Schematron is a schema language designed by Rick Jelliffe. Unlike other schema languages, such as W3C XML Schema or RELAX NG, schemas in Schematron are collections of rules using XPath expressions.
The Schematron schema in Listing 16.5 references an XML document (source1.xml) that provides a list of authors. This schema ensures that each author in an instance document exists in the author list.
<?xml version="1.0" encoding="utf-8"?>
<schema xmlns="http://www.ascc.net/xml/schematron">
  <pattern name="Compare with the database">
    <rule context="author">
      <assert test="document('source1.xml')//author[@id=current()/@id]">The author is not in the database.</assert>
    </rule>
  </pattern>
</schema>
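The check expressed by this rule can be emulated in a few lines of ordinary code, which may help in reading the XPath expression. The sketch below is not a Schematron implementation. It hard-codes the one assertion, and the two documents are made-up stand-ins for source1.xml and an instance document.

```python
import xml.etree.ElementTree as ET

# Stand-ins for source1.xml and an instance document (contents are made up).
AUTHOR_LIST = '<authors><author id="a1"/><author id="a2"/></authors>'
INSTANCE = '<book><author id="a1"/><author id="a9"/></book>'


def missing_authors(instance_text, author_list_text):
    """Return the ids of instance authors that fail the assertion."""
    known = {a.get("id") for a in ET.fromstring(author_list_text).iter("author")
             if a.get("id") is not None}
    # An author without an id also fails, as the XPath comparison would be false.
    return [a.get("id") for a in ET.fromstring(instance_text).iter("author")
            if a.get("id") is None or a.get("id") not in known]


missing = missing_authors(INSTANCE, AUTHOR_LIST)
for author_id in missing:
    print(f"author {author_id}: The author is not in the database.")
```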
Schematron satisfies desideratum 6 because it does not change trees created by XML parsers. Just like RELAX NG, Schematron does not provide default values.
Schematron has been used for validation but has not been used as a data model. No data binding tools for Schematron have appeared. In other words, Schematron has not been used to support data model 3 (An XML document is a collection of data compliant with a schema).
The use of XPath allows Schematron to capture what other schema languages cannot capture. In particular, only Schematron can capture constraints among multiple documents. For this reason, Schematron is expected to be used in conjunction with other schema languages such as W3C XML Schema and RELAX NG.