XML Goes Native

Native XML Databases (NXDs) are emerging as a solution for XML document management—for now.

With XML, we finally have the means to connect disparate systems via practically any common ground. Well-formed XML data makes for more rapid and powerful application development, and the extensible nature of the standard means that it can be adapted to a wide variety of uses.

Through different schemas, XML is being used for everything from describing 3D designs to documenting drug interactions. XML has also affected the database arena. In the traditional client-server world, a database imposes the underlying order on an application's or organization's data. Applications send queries to the database, which responds with some combination of data and metadata about the query. The data itself is typically tabular: a series of rows, each holding a value for every column.

But if an application is receiving XML data, it doesn't really matter whether that data is coming from a database, an HTTP connection to a flat file served by a Web server, or even a console input (if that data is from someone who really likes to do things the hard way). A database that merely hands back XML therefore adds little, because it does nothing with the structure that XML carries. After all, what's a traditional relational database going to do with XML, beyond just stuffing it into a text field?

The most obvious solution is to map any given XML document onto the database's underlying tables of rows and columns, but the flexible nature of an XML schema makes that difficult: optional, repeating and deeply nested elements don't fit neatly into fixed columns.
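One generic way to force any XML document into rows and columns is a node table (sometimes called an edge table) with one row per element. The sketch below uses Python's standard sqlite3 and xml.etree.ElementTree modules; the table name nodes and its columns are illustrative assumptions, and attributes are simply dropped for brevity. It hints at why the approach is unattractive: a tiny document becomes many rows, and even a simple question requires joining the table to itself.

```python
import sqlite3
import xml.etree.ElementTree as ET

DOC = """<order id="17">
  <customer>Acme</customer>
  <item sku="A1"><qty>2</qty></item>
  <item sku="B7"><qty>5</qty></item>
</order>"""

def shred_to_node_table(conn: sqlite3.Connection, xml_text: str) -> None:
    """Store every element as one row: (id, parent_id, tag, text).

    Hypothetical schema; attributes (id, sku) are ignored and lost here.
    """
    conn.execute(
        "CREATE TABLE IF NOT EXISTS nodes ("
        "id INTEGER PRIMARY KEY, parent_id INTEGER, tag TEXT, text TEXT)"
    )
    root = ET.fromstring(xml_text)

    def insert(elem: ET.Element, parent_id) -> None:
        # Whitespace-only text (indentation) is stored as NULL.
        text = (elem.text or "").strip() or None
        cur = conn.execute(
            "INSERT INTO nodes (parent_id, tag, text) VALUES (?, ?, ?)",
            (parent_id, elem.tag, text),
        )
        for child in elem:
            insert(child, cur.lastrowid)

    insert(root, None)

conn = sqlite3.connect(":memory:")
shred_to_node_table(conn, DOC)

# Even "every qty inside an item" needs a self-join on parent_id.
rows = conn.execute(
    "SELECT q.text FROM nodes AS i JOIN nodes AS q ON q.parent_id = i.id "
    "WHERE i.tag = 'item' AND q.tag = 'qty' ORDER BY q.id"
).fetchall()
print([r[0] for r in rows])                                     # ['2', '5']
print(conn.execute("SELECT COUNT(*) FROM nodes").fetchone()[0])  # 6
```

The mapping works for any document precisely because it ignores the schema, which is also why reassembling a document or querying by path gets expensive.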

One solution to this quandary is the NXD, or Native XML Database. Such a database stores—or at least appears to store—whole XML documents as database entities. An NXD thus helps fill the capability gaps in standard database environments left by the rapid acceptance of XML in the marketplace.

But because an NXD changes the underlying unit of data from a row to a document, several other aspects of traditional databases need to be reinvented. For example, NXDs by and large don't support ANSI SQL. Instead, they use various supersets of XPath, the W3C language for addressing parts of an XML document, to query and update data. That's one of the areas where NXDs show just how much of an emerging technology they are: Almost every NXD uses a different query syntax. Fortunately, standardization is under way: the XML:db initiative is defining a common API for NXDs, and the W3C's XQuery promises a standard query language.
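To give a feel for path-based querying over whole documents, here is a rough sketch that treats a Python dict of parsed documents as a toy "collection" (a hypothetical stand-in for an NXD, not any product's API) and runs one path expression against every document. It relies on the limited XPath subset that Python's xml.etree.ElementTree supports, so it illustrates the idea rather than the syntax of any particular NXD.

```python
import xml.etree.ElementTree as ET

# A toy "collection": whole documents stored as units, keyed by name.
# This dict is a hypothetical stand-in for an NXD.
collection = {
    "order17.xml": ET.fromstring(
        "<order id='17'><customer>Acme</customer>"
        "<item sku='A1'><qty>2</qty></item></order>"
    ),
    "order18.xml": ET.fromstring(
        "<order id='18'><customer>Globex</customer>"
        "<item sku='B7'><qty>5</qty></item>"
        "<item sku='A1'><qty>1</qty></item></order>"
    ),
}

def query(path: str) -> list[tuple[str, ET.Element]]:
    """Run an ElementTree path expression against every stored document."""
    hits = []
    for name, root in collection.items():
        for elem in root.iterfind(path):
            hits.append((name, elem))
    return hits

# Every order line for SKU A1, across all documents in the collection.
a1_lines = query(".//item[@sku='A1']")
print([(name, e.findtext("qty")) for name, e in a1_lines])
# [('order17.xml', '2'), ('order18.xml', '1')]

# Customers who ordered SKU B7.
print([root.findtext("customer") for root in collection.values()
       if root.find(".//item[@sku='B7']") is not None])
# ['Globex']
```

Note that the query returns pieces of documents, with no joins: the document, not the row, is the unit being searched.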

The current generation of NXDs has another weakness: updates. Most NXDs simply don't support updating a portion of a document (say, a single element). Changing an attribute means pulling down a copy of the document, changing the attribute in an application, and then sending the document back to the database. For a large document, that round trip is obviously inefficient.

However, it's clear that NXDs—or something offering similar functionality—will play an important role in the future of XML document management. That means that it's well worth looking at today's crop of NXDs, warts and all, to begin preparing for when NXD standards are settled and the technology becomes business-ready.

Big Players in the Game
As you probably know, the bulk of the work involved in building a database isn't in deciding how the data is structured. Rather, any enterprise-level database must have robust backup and restore options, replication, a top-notch query optimizer, referential integrity and a solid transaction model. Looking at the handful of upstart NXD vendors out there, I doubt that they're really going to be able to build out those capabilities more quickly than the big RDBMS vendors will incorporate native XML storage.

Oracle, IBM and Microsoft all offer the ability to return result sets as XML documents in one form or another. Beyond that, Oracle offers an XML datatype, XMLType, an abstraction over a text blob that allows limited manipulation of XML data. Oracle also builds XML parsers and validators into the database engine, a handy feature that will no doubt become standard practice in the RDBMS world.
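Each vendor exposes result-set-to-XML conversion through its own mechanism, but the underlying idea is simple: wrap each row in an element and each column value in a child element. The sketch below does that by hand with Python's sqlite3 and ElementTree; the element names rowset and row are arbitrary choices, not any vendor's output format.

```python
import sqlite3
import xml.etree.ElementTree as ET

def query_to_xml(conn: sqlite3.Connection, sql: str, params=()) -> str:
    """Run a query and return its result set as an XML string.

    Column names from cursor.description become element names, so they
    must be valid XML names (column aliases in the SQL can ensure that).
    """
    cur = conn.execute(sql, params)
    columns = [d[0] for d in cur.description]
    rowset = ET.Element("rowset")
    for values in cur:
        row = ET.SubElement(rowset, "row")
        for col, val in zip(columns, values):
            if val is not None:              # omit NULLs rather than emit empty tags
                ET.SubElement(row, col).text = str(val)
    return ET.tostring(rowset, encoding="unicode")

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (id INTEGER, name TEXT)")
conn.executemany("INSERT INTO customers VALUES (?, ?)", [(1, "Acme"), (2, "Globex")])
print(query_to_xml(conn, "SELECT id, name FROM customers ORDER BY id"))
# <rowset><row><id>1</id><name>Acme</name></row><row><id>2</id><name>Globex</name></row></rowset>
```

This is XML as an output format only: the data still lives in rows and columns, which is the distinction the following paragraphs draw.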

IBM's DB2 is also heading into the XML world, with the usual XML results feature as well as the ability to break XML documents apart and store them in a normalized schema of relational tables.
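Decomposition of this kind works best when the document structure is known in advance. Here is a minimal sketch for a hypothetical order document, again with Python's sqlite3 and ElementTree: the order header goes into one table and its repeating item elements into a child table keyed by order ID. The schema is an illustrative assumption and does not reflect how DB2 itself performs the mapping.

```python
import sqlite3
import xml.etree.ElementTree as ET

SCHEMA = """
CREATE TABLE orders (order_id INTEGER PRIMARY KEY, customer TEXT);
CREATE TABLE order_items (
    order_id INTEGER REFERENCES orders(order_id),
    sku TEXT,
    qty INTEGER
);
"""

def shred_order(conn: sqlite3.Connection, xml_text: str) -> None:
    """Split one <order> document into a parent row and child item rows."""
    root = ET.fromstring(xml_text)
    order_id = int(root.get("id"))
    conn.execute(
        "INSERT INTO orders (order_id, customer) VALUES (?, ?)",
        (order_id, root.findtext("customer")),
    )
    # Repeating <item> elements become rows in the child table.
    conn.executemany(
        "INSERT INTO order_items (order_id, sku, qty) VALUES (?, ?, ?)",
        [(order_id, item.get("sku"), int(item.findtext("qty")))
         for item in root.findall("item")],
    )

conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
shred_order(conn, "<order id='17'><customer>Acme</customer>"
                  "<item sku='A1'><qty>2</qty></item>"
                  "<item sku='B7'><qty>5</qty></item></order>")
print(conn.execute(
    "SELECT o.customer, i.sku, i.qty FROM orders o "
    "JOIN order_items i ON i.order_id = o.order_id ORDER BY i.sku"
).fetchall())
# [('Acme', 'A1', 2), ('Acme', 'B7', 5)]
```

Once shredded, the data is ordinary relational data with all the usual tooling; the catch is that any element the schema did not anticipate has nowhere to go.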

Microsoft recently released SQLXML 3.0, which goes further and allows SQL Server to offer XML-based Web services and participate in Microsoft's .NET initiative. Still, the XML here is all for communications and API purposes; there's no facility to store and manipulate XML documents themselves.

So we have NXDs on one hand, with robust XML support but very immature approaches to queries—and no support for enterprise-level functionality like replication. And on the other hand, we have huge vendors with proven databases scurrying to catch up in the XML space. Where's it all headed?

I certainly think NXDs serve an important function. And it's probably worth investing some of your time and energy to get up to speed on XML:db and XQuery, quite possibly by using one of the NXDs available today. However, I don't see the genre as having a particularly long life span. There's no benefit to separating an XML database from a relational one, and it's just a matter of time before the big relational players incorporate this kind of functionality into their products, either through development or acquisition.

To learn more about XML, databases and native XML databases, check out www.xml.com/databases/ and www.rpbourret.com/xml/XMLDatabaseProds.htm.

About the Author

Laura Wonnacott is VP of Business and Technology Development for Aguirre International, and a California State University system instructor.