
XML and Java: Developing Web Applications, Second Edition


11.2 Storing and Searching for XML Documents

Database management systems such as relational database management systems (RDBMSs) are very efficient at handling large amounts of data, and they provide the characteristics essential for mission-critical applications: robustness, integrity, consistency, and availability. The three-tier model shown in Figure 11.1 is widely used for applications such as purchase orders, ticket reservations, and electronic form application systems. However, most data stored in these systems is not in XML format.

There are three approaches to storing data represented in XML into database systems:

  1. Store an XML document as a structured document.

  2. Store an XML document as a DOM tree object.

  3. Store an XML document as a set of relational tables.

The first approach came into use for storing and retrieving structured documents by using an SGML/XML native database. For example, OpenText (LiveLink) is a full-text search engine (see http://www.opentext.com/livelink). An XML database server called Tamino, from Software AG, also employs this approach (see http://www.softwareag.com/tamino). LiveLink associates an XML element with a region and so supports structure-aware searching. Tamino provides a searching capability based on (extended) XPath expressions and manages indexes for schemas to optimize the search process. The advantage of using a native database is that there is no need to design a mapping between an XML document and tables, which is nontrivial work, as discussed in this chapter. An XML document can simply be stored in the database and retrieved with XPath, XQuery, and other standards-based technologies. A native database also provides a full-text search capability that can be useful when complex documents (rather than data) are to be stored.

The second approach can be realized by using an Object-Oriented Database (OODB). In the OODB framework, a data object is stored as a persistent object, and an application can address the object via a pointer. In an XML-based OODB, an XML document can be represented as a DOM tree object and is stored in its persistent data store. A well-known implementation of this approach is eXcelon (see http://www.exceloncorp.com). The product is based on a general OODB called ObjectStore. It stores a collection of DOM objects and provides searching functions by using XPath with some extensions. The main advantage of this approach is the same as for the native database.

The last approach is the topic of this chapter. In this approach, an XML document is stored in an RDBMS. An RDBMS manages a set of relational tables whose schemas are strictly defined. An XML document, on the other hand, is semistructured data, which is one of the most important characteristics of XML; therefore, it is not easy to map an XML document to one or more tables.
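To make the mismatch concrete, the following is a minimal sketch of one generic way to flatten an XML tree into rows. It is not the mapping this chapter develops; the class name `EdgeTableSketch`, the `Row` type, and the implied table layout ELEMENT(id, parent_id, name, text) are all assumptions chosen for illustration. It uses only the standard JAXP DOM parser.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

// Hypothetical sketch: flatten an XML document into rows of an assumed
// table ELEMENT(id, parent_id, name, text). Not the book's mapping.
final class EdgeTableSketch {

    /** One row of the assumed ELEMENT table. */
    static final class Row {
        final int id;
        final int parentId; // 0 means "no parent" (the root element)
        final String name;
        final String text;

        Row(int id, int parentId, String name, String text) {
            this.id = id;
            this.parentId = parentId;
            this.name = name;
            this.text = text;
        }
    }

    /** Parses the XML string and returns one row per element, in document order. */
    static List<Row> shred(String xml) {
        try {
            Element root = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)))
                    .getDocumentElement();
            List<Row> rows = new ArrayList<>();
            walk(root, 0, rows);
            return rows;
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse XML", e);
        }
    }

    private static void walk(Element elem, int parentId, List<Row> rows) {
        int id = rows.size() + 1; // ids are assigned in document order
        StringBuilder text = new StringBuilder();
        for (Node c = elem.getFirstChild(); c != null; c = c.getNextSibling()) {
            if (c.getNodeType() == Node.TEXT_NODE) {
                text.append(c.getNodeValue());
            }
        }
        rows.add(new Row(id, parentId, elem.getTagName(), text.toString().trim()));
        // Recurse into child elements, recording this element as their parent.
        for (Node c = elem.getFirstChild(); c != null; c = c.getNextSibling()) {
            if (c.getNodeType() == Node.ELEMENT_NODE) {
                walk((Element) c, id, rows);
            }
        }
    }
}
```

Even this simple layout has to invent identifiers and parent keys that do not appear in the document, and it ignores attributes, mixed content, and any per-element typing, which hints at why designing a good mapping takes real effort.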

This chapter focuses on the third approach for the following two reasons. First, most existing business applications store their data in an RDBMS. When you develop an XML-based Web application that is integrated with existing resources, it is natural to store the XML documents sent from clients in an RDBMS as well.

Second, most commercial RDBMS products, such as Oracle and DB2, remain highly available under large data volumes and huge numbers of accesses. They also provide various management capabilities, including data backup and recovery. These features matter a great deal when you develop large-scale, reliable business applications. Furthermore, many techniques for optimizing a system and its queries have already been established, so users can benefit from them.

How to retrieve stored XML documents is another big issue. There are four well-known approaches to retrieval.

  • An application-specific query language, such as the one provided by OpenText. For example, when you want to search for TITLE elements that contain the string "XML" with OpenText, you can use a query like region TITLE including "XML" (assuming that the TITLE element is associated with the TITLE region, which is an OpenText-specific topic). The syntax of the query is application-specific.

  • XPath (see Chapter 7). XPath and its (application-specific) extensions can be used as a query language because parts of an XML document can be addressed by using XPath. It is also possible to convert an XPath expression to Structured Query Language (SQL) and, by using JDBC, search a database in which XML documents are decomposed into tables.

  • XQuery, the W3C standard in progress. XQuery is a query language for XML documents that is being standardized by the W3C. It borrows concepts from SQL, and it uses XPath to address parts of an XML document. XQuery can specify the format of the result in a flexible way. The latest Working Draft was published on June 7, 2001. This specification may change before becoming a Recommendation; therefore, the details of XQuery are not covered in this book. If you want to know more about it, visit the W3C Web site (http://www.w3.org/XML/Query). The advantages of XQuery are as follows.

    - It provides a common query language for XML documents. It does not depend on a particular type of database (native, OODB, and RDB).

    - It can be applied to a set of XML documents.

    - It provides powerful syntax to search and transform the result.

    The following is an example of a query appearing in the XQuery 1.0 specification.

    FOR $p IN distinct(document("bib.xml")//publisher)
    LET $a := avg(document("bib.xml")//book[publisher = $p]/price)
    RETURN
        <publisher>
            <name> {$p/text()} </name>
            <avgprice> {$a} </avgprice>
        </publisher>
    

    The words FOR, LET, WHERE, and RETURN are the main building blocks of XQuery, and a query built from them is called an "FLWR expression." The previous query lists, for each publisher, the average price of its books. You may have heard that the concept of the query is similar to SQL. That's true. A FOR clause iterates over a target part of an XML document, binding a variable to each item in turn; its role is roughly that of the FROM clause in SQL. A LET clause binds a variable to the result of an expression (here, a function that averages multiple values). A WHERE clause, which this example does not use, specifies the condition of the query. The RETURN clause, comparable to SELECT in SQL, produces the result; it is a kind of template for output, and the variables in the template are bound during query processing. XQueryX is a syntax for representing the FLWR expression in XML. The draft of XQueryX is also available at W3C (http://www.w3.org/TR/xqueryx).

  • Structured Query Language (SQL). SQL is the common language for accessing an RDBMS. If an XML document is decomposed into data that is stored in tables as column values, or if it is generated from data stored in a database, that data can be accessed by using SQL. We discuss this approach further in the next section.
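As a small illustration of the XPath approach in the list above, the following sketch runs the "TITLE elements containing a keyword" search against an in-memory document by using the standard `javax.xml.xpath` API of the JDK. The class name `TitleSearch`, the method `findTitles`, and the variable name `kw` are assumptions made for this example; a native XML database would evaluate a similar expression against its own store instead.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical sketch: search TITLE elements with an XPath expression.
final class TitleSearch {

    /** Returns the text of every TITLE element whose content contains the keyword. */
    static List<String> findTitles(String xml, String keyword) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));

            XPath xpath = XPathFactory.newInstance().newXPath();
            // Pass the keyword as the XPath variable $kw instead of pasting it
            // into the expression string, so quotes in it cannot break the query.
            xpath.setXPathVariableResolver(
                    name -> "kw".equals(name.getLocalPart()) ? keyword : null);

            NodeList nodes = (NodeList) xpath.evaluate(
                    "//TITLE[contains(., $kw)]", doc, XPathConstants.NODESET);

            List<String> titles = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                titles.add(nodes.item(i).getTextContent());
            }
            return titles;
        } catch (Exception e) {
            throw new IllegalStateException("XPath search failed", e);
        }
    }
}
```

The expression `//TITLE[contains(., $kw)]` plays the same role as the OpenText query region TITLE including "XML", but in a standard syntax that is not tied to one product.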

In this chapter, we focus on storing XML documents in an RDBMS. Furthermore, we should consider the opposite direction, creating XML documents from a set of relational tables. This is very important because most existing business data is stored in RDBMSs in the form of tables, not XML documents. The next section covers mapping between XML documents and relational tables.
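The opposite direction can be sketched just as briefly. The example below builds an XML document from tabular data by using the standard DOM and `javax.xml.transform` APIs. To stay self-contained it takes rows as in-memory string arrays; in a real application they would typically come from a JDBC `ResultSet`. The class name `RowsToXml`, the method `toXml`, and the element-per-column layout are assumptions for illustration, not the mapping described in the next section.

```java
import java.io.StringWriter;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

// Hypothetical sketch: turn table rows into an XML document,
// one element per row and one child element per column.
final class RowsToXml {

    static String toXml(String rootName, String rowName,
                        String[] columns, List<String[]> rows) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .newDocument();
            Element root = doc.createElement(rootName);
            doc.appendChild(root);

            for (String[] row : rows) {
                Element rowElem = doc.createElement(rowName);
                root.appendChild(rowElem);
                for (int i = 0; i < columns.length; i++) {
                    Element col = doc.createElement(columns[i]);
                    // The DOM stores raw text; special characters such as
                    // '&' are escaped when the document is serialized.
                    col.setTextContent(row[i]);
                    rowElem.appendChild(col);
                }
            }

            // Serialize the DOM tree to a string without the XML declaration.
            Transformer t = TransformerFactory.newInstance().newTransformer();
            t.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            t.transform(new DOMSource(doc), new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new IllegalStateException("Could not build XML", e);
        }
    }
}
```

Building the document through the DOM rather than by string concatenation leaves escaping and well-formedness to the serializer.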
