XML and Java: Developing Web Applications, Second Edition

1.4 Some XML Basics

As described in the power overload warning example in Section 1.2.2, XML is a key technology for B2B Web applications. Although we do not go into the details of the XML 1.0 Recommendation, several topics related to XML are helpful in reading and understanding the rest of the book. We discuss these topics in this section.

1.4.1 Standardization Process

Many of the Web-related standards are defined by W3C. Unlike ANSI and ISO, W3C is not an "official" standards body, so W3C issues its decisions as "recommendations," not as "international standards." Nonetheless, W3C recommendations effectively have the same authoritative standing as international standards issued by other standards bodies such as ISO and ANSI.

There are several levels in documents published by W3C. The less formal ones are called Notes. A Note is a submission by one organization or a group of organizations and is not the result of W3C formal discussions. If a formal proposal is submitted and it is determined to be important and worthy of discussion to reach a consensus, a Working Group is formed. A Working Group issues Working Drafts (WDs), which are made public so that they can receive feedback from anyone who is interested in the subject. Once the discussions converge on a set of common agreements and the Working Group feels that the Working Draft has become stable, the Working Group may issue a Candidate Recommendation (CR). The purpose of a Candidate Recommendation is to encourage implementations and to verify the specification through implementation experience. Once inconsistencies and ambiguities have been removed from the Candidate Recommendation, the Working Group issues a Proposed Recommendation (PR) as the basis for a vote. W3C member organizations cast votes for the proposal, and if it is approved, the proposal becomes a Recommendation (REC), which can be considered a standard in the normal sense.

In this book, we base our XML discussion on the XML 1.0 Recommendation. If we need to refer to other specifications, we will explicitly indicate which specification we are referring to. Any formal publication from W3C has a unique document name, whether it is a Note, Working Draft, Proposed Recommendation, or Recommendation. Readers are encouraged to check the latest publications at the W3C Web site (http://www.w3.org/). Appendix C, XML-Related Standardization Activities, lists several important ones to keep an eye on.

1.4.2 Validity and Well-Formedness

Because this book is not intended to be an introduction to or a reference manual for XML, we do not discuss the details of the XML specification. However, we do want to explain one important concept: the difference between validity and well-formedness.

In XML, you can define your own tag set using a Document Type Definition, or DTD. The following is an example of a DTD.

<!ELEMENT WeatherReport (City, State, Date, Time, CurrTemp, High, Low)>
<!ELEMENT City (#PCDATA)>
<!ELEMENT State (#PCDATA)>
<!ELEMENT Date (#PCDATA)>
<!ELEMENT Time (#PCDATA)>
<!ELEMENT CurrTemp (#PCDATA)>
<!ELEMENT High (#PCDATA)>
<!ELEMENT Low (#PCDATA)>
<!ATTLIST CurrTemp unit (Fahrenheit|Celsius) #REQUIRED>
<!ATTLIST High unit (Fahrenheit|Celsius) #REQUIRED>
<!ATTLIST Low unit (Fahrenheit|Celsius) #REQUIRED>

Following the SGML tradition, the DTD has a different syntax from the XML syntax in the XML 1.0 Recommendation. We might want to use the same XML syntax for both documents and DTDs. This, in addition to a few other points (such as the introduction of data types into element specifications), is one of the hot topics being discussed by the XML community. Chapter 9 discusses two of these activities: the W3C XML Schema and RELAX NG.

An XML document with a <!DOCTYPE> declaration is said to be valid if it meets the constraints specified in the DTD. These constraints include the element content models (what child elements are allowed in what order) and attribute types. Validity constraints (VCs) are strictly defined in the XML 1.0 Recommendation. When an application receives an XML document, it should know the semantics of all the tags appearing in the document in order to process them correctly. Otherwise, the program has no clue how to handle an element with an unknown tag name even though it might look meaningful to human eyes (for example, <Purchase_order> may be understandable to English-speaking people, but it is as meaningless as <seikyuusho> to a program that does not know the semantics of the tag). Therefore, when processing XML documents with programs, we are usually interested in valid documents.

When a DTD is used as a schema language, an XML document must have a <!DOCTYPE> declaration at the beginning of the document. This declaration specifies a DTD against which an XML processor must validate the document. For example, if an XML document has the following declaration, it should be validated against a DTD available at http://www.example.com/WeatherReport.dtd.[6]

[6] How to specify the W3C XML Schema in an instance document is discussed in Chapter 9.

<!DOCTYPE WeatherReport SYSTEM
 "http://www.example.com/WeatherReport.dtd">
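As a rough illustration of what validation looks like from Java, the following sketch uses the standard JAXP API (DocumentBuilderFactory with setValidating(true)). To stay self-contained it assumes a cut-down, two-element version of the WeatherReport DTD placed in the internal subset of the <!DOCTYPE> declaration instead of the external DTD at www.example.com. The class name DtdValidationSketch, the helper validityErrors, and the sample documents are hypothetical names chosen for this example.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;
import org.xml.sax.helpers.DefaultHandler;

public class DtdValidationSketch {
    // Assumption: a reduced DTD, embedded in the internal subset so that
    // the example does not need network access.
    static final String DOCTYPE =
        "<!DOCTYPE WeatherReport [\n"
        + "  <!ELEMENT WeatherReport (City, State)>\n"
        + "  <!ELEMENT City (#PCDATA)>\n"
        + "  <!ELEMENT State (#PCDATA)>\n"
        + "]>\n";

    // Children appear in the order required by the content model.
    static final String VALID = DOCTYPE
        + "<WeatherReport><City>White Plains</City><State>NY</State></WeatherReport>";

    // Well-formed, but State comes before City, which violates the content model.
    static final String INVALID = DOCTYPE
        + "<WeatherReport><State>NY</State><City>White Plains</City></WeatherReport>";

    // Parses the text with DTD validation requested and collects the
    // validity errors reported by the parser.
    public static List<String> validityErrors(String xml) {
        final List<String> errors = new ArrayList<>();
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setValidating(true); // ask for DTD validation
            DocumentBuilder builder = factory.newDocumentBuilder();
            builder.setErrorHandler(new DefaultHandler() {
                @Override
                public void error(SAXParseException e) {
                    // Validity violations are reported as recoverable errors.
                    errors.add(e.getMessage());
                }
            });
            builder.parse(new InputSource(new StringReader(xml)));
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException(e);
        }
        return errors;
    }

    public static void main(String[] args) {
        System.out.println("errors in VALID:   " + validityErrors(VALID));
        System.out.println("errors in INVALID: " + validityErrors(INVALID));
    }
}
```

A validating parser reports validity violations through the error callback and continues parsing, which is why the sketch collects messages rather than relying on an exception.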

On the other hand, XML is designed so that simple XML documents are parsable without defining an explicit DTD as long as they contain no external entities.[7] This is one of the big differences from SGML and HTML. In XML, every start tag (for example, <Body>) must have a corresponding end tag (</Body>). The only exception is an empty-element tag, which has an explicit slash at the end (for example, <City/>). An XML document is said to be well-formed if it satisfies this constraint as well as several others defined in the XML 1.0 Recommendation. A valid document is always well-formed, but there may be well-formed documents that are not valid.

[7] An external entity is a unit of an XML document whose content is stored outside the document itself and is retrieved through an entity declaration in a DTD.
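A well-formedness check can be sketched with the same JAXP API by parsing without validation: a non-validating parser accepts any tag names but raises a fatal error when, for example, an end tag is missing. The class name WellFormedSketch and the helper isWellFormed are hypothetical names for this illustration, and the sample strings are made up.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

public class WellFormedSketch {
    // Returns true if the text parses as a well-formed XML document.
    // No DTD is required and no validation is requested.
    public static boolean isWellFormed(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            DocumentBuilder builder = factory.newDocumentBuilder();
            // DefaultHandler rethrows fatal errors and ignores the rest.
            builder.setErrorHandler(new DefaultHandler());
            builder.parse(new InputSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            // Well-formedness violations are fatal errors.
            return false;
        } catch (ParserConfigurationException | IOException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // Start tag with a matching end tag, plus an empty-element tag.
        System.out.println(isWellFormed("<Body><City/></Body>"));
        // <City> is never closed.
        System.out.println(isWellFormed("<Body><City></Body>"));
        // Tags unknown to any DTD are still acceptable.
        System.out.println(isWellFormed("<Comment>See <b>this</b></Comment>"));
    }
}
```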

Why do we want to allow well-formed documents that may contain unknown tags? Does it make sense at all to do such a thing? The answer is yes because not all the tags must necessarily be understood by one program. For example, we may want to allow text with HTML markups in a certain field, such as a comment. Even though the content cannot be understood by the program that receives the document, it can be submitted to an external browser upon request to display the HTML-tagged comment on the screen. Rendering is another good example that does not require DTD validity. Even if a browser encounters an unknown tag, usually it can simply be skipped without causing disastrous results.

1.4.3 Namespaces

The W3C Recommendation Namespaces in XML allows multiple tag sets (elements and attributes defined in schemas) in a single XML document. It is not easy to design a good schema that is used by many people for a long time. Using namespaces, you can reuse existing schemas to design more complex XML documents. In the following example, the tag price is defined in a namespace called http://ecommerce.org/edi and is distinguished from a tag that has the same name but belongs to a different namespace. Similarly, the tag x belongs to the namespace http://ecommerce.org/order.

<?xml version="1.0" encoding="UTF-8"?>
<order:x
   xmlns:order='http://ecommerce.org/order'
   xmlns:edi='http://ecommerce.org/edi'>
      <edi:price>14.95</edi:price>
</order:x>

In this example, edi in <edi:price> is called a namespace prefix. A namespace prefix must be declared on the element where the prefix is used or on one of its ancestors. The namespace declaration xmlns:edi='http://ecommerce.org/edi' declares that the namespace prefix edi is bound to the namespace http://ecommerce.org/edi, which is also called a namespace URI.
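The binding between prefixes and namespace URIs can be observed from Java with a namespace-aware DOM parser. The sketch below parses the order example and reports each element as a pair of namespace URI and local name. The class name NamespaceSketch and its helper methods are hypothetical names for this illustration; note that JAXP parsers are not namespace-aware unless setNamespaceAware(true) is called.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

public class NamespaceSketch {
    // The order document from the text, without the XML declaration.
    static final String ORDER =
        "<order:x xmlns:order='http://ecommerce.org/order'"
        + " xmlns:edi='http://ecommerce.org/edi'>"
        + "<edi:price>14.95</edi:price>"
        + "</order:x>";

    // Parses the text with namespace processing enabled and returns the root element.
    public static Element parseRoot(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true); // off by default in JAXP
            return factory.newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)))
                .getDocumentElement();
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException(e);
        }
    }

    // Returns the first child of the parent that is an element, or null.
    public static Element firstChildElement(Element parent) {
        for (Node n = parent.getFirstChild(); n != null; n = n.getNextSibling()) {
            if (n instanceof Element) {
                return (Element) n;
            }
        }
        return null;
    }

    // Formats an element name as {namespace URI}local name, ignoring the prefix.
    public static String expandedName(Element e) {
        return "{" + e.getNamespaceURI() + "}" + e.getLocalName();
    }

    public static void main(String[] args) {
        Element root = parseRoot(ORDER);
        Element price = firstChildElement(root);
        System.out.println(root.getPrefix() + " -> " + expandedName(root));
        System.out.println(price.getPrefix() + " -> " + expandedName(price));
    }
}
```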

A namespace URI is the name of the tag set. In general, it has nothing to do with the location of the specification of the namespace. A namespace is often associated with a schema that dictates the syntax for the tags defined in the namespace. However, there is no standard way defined for validating an XML document against a namespace.

One of the original intentions of the namespace specification was to solve the problem of name collision. If the same element type, such as <price>, is defined in multiple namespaces, giving them different namespace prefixes can distinguish them in a single document. For this mechanism to work correctly, it must be possible to rename namespace prefixes without otherwise modifying the contents of an XML document. Unfortunately, partly because XML documents lack a common data model, namespace prefixes are often fixed in an XML document.[8]

[8] If there were a common XML data model that explicitly stated that namespace prefixes have no significance other than binding an element type or an attribute name to a namespace URI, this problem would not have arisen. However, some XML specifications, most notably XPath and XSLT, allow namespace prefixes to be used in attribute values. For such substrings appearing in string content, an XML processor cannot know whether they are associated with a namespace URI or are just part of an application data string, even though they look like namespace prefixes.
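The renaming problem can be made concrete with a small sketch. It compares the order example with a copy in which the prefixes order and edi have been renamed to o and e. Element names survive the renaming because a namespace-aware parser resolves them to namespace URIs, but a prefix that appears inside an attribute value is left untouched and no longer resolves. The ref attribute, the class name PrefixRenameSketch, and the helper methods are assumptions made up for this illustration; the ref attribute merely stands in for prefixed names in attribute values such as those used by XPath and XSLT.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

public class PrefixRenameSketch {
    static final String EDI_URI = "http://ecommerce.org/edi";

    // Hypothetical variant of the order example with a prefixed name in an attribute value.
    static final String ORIGINAL =
        "<order:x xmlns:order='http://ecommerce.org/order'"
        + " xmlns:edi='http://ecommerce.org/edi' ref='edi:price'>"
        + "<edi:price>14.95</edi:price></order:x>";

    // Same document after renaming the prefixes in element names and declarations only.
    static final String RENAMED =
        "<o:x xmlns:o='http://ecommerce.org/order'"
        + " xmlns:e='http://ecommerce.org/edi' ref='edi:price'>"
        + "<e:price>14.95</e:price></o:x>";

    static Element parseRoot(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            return factory.newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)))
                .getDocumentElement();
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException(e);
        }
    }

    // Counts price elements in the edi namespace, whatever prefix they use.
    public static int countEdiPrices(String xml) {
        return parseRoot(xml).getElementsByTagNameNS(EDI_URI, "price").getLength();
    }

    // Resolves the prefix found in the ref attribute value against the
    // declarations in scope on the root element; returns null if unbound.
    public static String resolveRefPrefix(String xml) {
        Element root = parseRoot(xml);
        String ref = root.getAttribute("ref");           // plain string to the parser
        String prefix = ref.substring(0, ref.indexOf(':'));
        return root.lookupNamespaceURI(prefix);
    }

    public static void main(String[] args) {
        System.out.println(countEdiPrices(ORIGINAL) + " / " + countEdiPrices(RENAMED));
        System.out.println(resolveRefPrefix(ORIGINAL) + " / " + resolveRefPrefix(RENAMED));
    }
}
```

The parser has no way to know that the string edi:price in the attribute was meant as a qualified name, so any tool that renames prefixes would have to understand every application-specific attribute that may carry one.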
