11.2 Storing and Searching for XML Documents
Database management systems such as relational database management systems (RDBMSs) handle large amounts of data very efficiently, and they provide the robustness, integrity, consistency, and availability that mission-critical applications require. The three-tier model shown in Figure 11.1 is widely used for applications such as purchase orders, ticket reservations, and electronic forms. However, most data stored in these systems is not in XML format. There are three approaches to storing XML data in database systems.
The first approach, which originated in storing and retrieving structured documents, uses an SGML/XML native database. OpenText's LiveLink, a full-text search engine, is one example (see http://www.opentext.com/livelink); Tamino, an XML database server from Software AG, also takes this approach (see http://www.softwareag.com/tamino). LiveLink associates each XML element with a region of text, which enables structure-aware searching. Tamino supports searching with (extended) XPath expressions and maintains schema-based indexes to optimize the search process. The advantage of a native database is that there is no need to design a mapping between XML documents and tables, which, as this chapter shows, is nontrivial work. An XML document can simply be stored in the database as is and retrieved with XPath, XQuery, and other standards-based technologies. A native database also provides full-text search, which is useful when complex documents (rather than data) are to be stored.
The second approach uses an object-oriented database (OODB). In the OODB framework, a data object is stored as a persistent object, and an application addresses it through a pointer. In an XML-based OODB, an XML document is represented as a DOM tree object and kept in the persistent data store. A well-known implementation of this approach is eXcelon (see http://www.exceloncorp.com), which is built on a general-purpose OODB called ObjectStore. It stores a collection of DOM objects and provides search functions based on XPath with some extensions. This approach shares the main advantage of the native database.
The last approach, storing XML documents in an RDBMS, is the topic of this chapter. An RDBMS manages a set of relational tables whose schemas are strictly defined.
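In both the native and the OODB approaches, retrieval comes down to evaluating an XPath expression against a stored document tree. The Java sketch below imitates that idea in memory using the standard JAXP packages (javax.xml.parsers and javax.xml.xpath). It is not the API of Tamino, eXcelon, or any other product named above; the <order>/<item> document and the method names are hypothetical, chosen only for illustration.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class XPathOverDom {
    // Hypothetical sample document; element and attribute names are illustrative only.
    static final String ORDER =
          "<order id=\"A100\">"
        + "<item sku=\"X1\"><qty>2</qty></item>"
        + "<item sku=\"X2\"><qty>5</qty></item>"
        + "</order>";

    // Parses XML text into an in-memory DOM tree.
    static Document parse(String xml) throws Exception {
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        return builder.parse(new InputSource(new StringReader(xml)));
    }

    // Returns the sku attribute of every item whose quantity exceeds minQty,
    // in document order.
    static String[] skusWithQtyAbove(Document doc, int minQty) throws Exception {
        XPath xpath = XPathFactory.newInstance().newXPath();
        NodeList nodes = (NodeList) xpath.evaluate(
            "/order/item[qty > " + minQty + "]/@sku", doc, XPathConstants.NODESET);
        String[] result = new String[nodes.getLength()];
        for (int i = 0; i < nodes.getLength(); i++) {
            result[i] = nodes.item(i).getNodeValue(); // attribute value, e.g. "X2"
        }
        return result;
    }

    public static void main(String[] args) throws Exception {
        Document doc = parse(ORDER);
        for (String sku : skusWithQtyAbove(doc, 3)) {
            System.out.println(sku); // prints X2
        }
    }
}
```

A database product would evaluate the same kind of path expression over persistent, indexed storage rather than a freshly parsed DOM, but the query the application writes looks much the same.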
An XML document, however, is semistructured data, which is one of the most important characteristics of XML; therefore, mapping it onto one or more strictly defined tables is not easy. This chapter focuses on the third approach for two reasons. First, most existing business applications store data in RDBMSs, so when an XML-based Web application must be integrated with existing resources, it is natural to store the XML documents sent from clients in an RDBMS as well. Second, most commercial RDBMS products, such as Oracle and DB2, remain highly available under large data volumes and heavy access loads, and they provide management capabilities such as data backup and recovery. These features matter a great deal when you develop large-scale, reliable business applications. Furthermore, many techniques for optimizing systems and queries are already well established, so users can benefit from them. How to retrieve stored XML documents is another big issue. There are four well-known approaches to retrieval.
In this chapter, we focus on storing XML documents in an RDBMS. We must also consider the opposite direction: creating XML documents from a set of relational tables. This is very important because most existing business data is stored in RDBMSs as tables, not as XML documents. The next section covers mapping between XML documents and relational tables.
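As a minimal sketch of the table-to-XML direction, assume a hypothetical ORDER_ITEM table with columns ORDER_ID, SKU, and QTY whose rows have already been fetched (for example, through JDBC) into Java objects. Each row becomes an <item> element under a single <order> root, built with the standard DOM and javax.xml.transform APIs. The table, column, and element names are illustrative assumptions, and the mapping is a deliberately simple one, not a definitive design.

```java
import java.io.StringWriter;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

public class RowsToXml {
    // Hypothetical row of an ORDER_ITEM table: (ORDER_ID, SKU, QTY).
    static final class ItemRow {
        final String orderId;
        final String sku;
        final int qty;

        ItemRow(String orderId, String sku, int qty) {
            this.orderId = orderId;
            this.sku = sku;
            this.qty = qty;
        }
    }

    // Builds <order id="..."><item sku="..."><qty>...</qty></item>...</order>,
    // keeping only the rows whose ORDER_ID matches orderId.
    static Document toXml(String orderId, List<ItemRow> rows) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
        Element order = doc.createElement("order");
        order.setAttribute("id", orderId);
        doc.appendChild(order);
        for (ItemRow row : rows) {
            if (!row.orderId.equals(orderId)) {
                continue; // row belongs to a different order
            }
            Element item = doc.createElement("item");
            item.setAttribute("sku", row.sku);
            Element qty = doc.createElement("qty");
            qty.setTextContent(Integer.toString(row.qty));
            item.appendChild(qty);
            order.appendChild(item);
        }
        return doc;
    }

    // Serializes the DOM tree to a string, omitting the XML declaration.
    static String serialize(Document doc) throws Exception {
        Transformer t = TransformerFactory.newInstance().newTransformer();
        t.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
        StringWriter out = new StringWriter();
        t.transform(new DOMSource(doc), new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        List<ItemRow> rows = List.of(
            new ItemRow("A100", "X1", 2),
            new ItemRow("A100", "X2", 5),
            new ItemRow("B200", "Y9", 1));
        System.out.println(serialize(toXml("A100", rows)));
    }
}
```

Running main prints order A100 as XML with its two items; the B200 row is skipped because it belongs to another order. Grouping child rows under a parent element by a shared key is about the simplest table-to-XML mapping; real schemas with nested or optional structure need the more careful mappings this chapter discusses.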