In-Depth

IBM Bangs the XML Drum Slowly but Surely

pureXML extends DB2 9’s ability to store, update, delete, query, and index well-formed XML, officials say.

Don’t look now, but IBM Corp.’s DB2 team appears to have a first-class case of XML religion.

Big Blue announced the first XML-native version of its DB2 database six months ago—claiming (and not without controversy) that DB2 9.1’s XML implementation outclassed those of rivals Microsoft Corp. and Oracle Corp.

Big Blue beat the XML drum again last week, when it announced new "quick start" XML-friendly software bundles for DB2. IBM’s DB2 9 pureXML quick start bundles are designed to make it easier for DBAs and software developers to load industry-standard XML exchange messages and schemas. The bundles will also help developers and DBAs define XML queries and indexes for business applications more quickly, IBM officials say.

The first pureXML bundles support the Financial Information eXchange Markup Language (FIXML) and Financial products Markup Language (FpML) standards, two of several industry-specific XML message formats that companies use to exchange data with one another and internally. "One of the key features of DB2 9 is the ability to handle both relational data as well as XML," said Bernie Spang, director of data server marketing with IBM’s Information Management group, in a statement. "By enhancing this XML capability and providing specific industry format support, we are again setting the bar for the industry XML standards."

pureXML extends DB2 9’s ability to store, update, delete, query, and index well-formed XML, officials say. Users can incorporate XPath, XQuery, and even SQL statements into their queries to retrieve XML documents or document fragments. In addition, they can register XML schemas and have DB2 validate XML documents against them. Big Blue plans to release additional pureXML bundles over time. The first bundles are available free of charge from IBM’s alphaWorks site (http://www.alphaworks.ibm.com/tech/purexml).

Big Blue’s promotion of DB2 9.1’s native XML capabilities has generated some controversy, chiefly because IBM claims that DB2 9.1’s native XML implementation outstrips the XML capabilities that Microsoft and Oracle deliver in their own flagship RDBMS products.

In a June interview, for example, IBM’s Spang claimed that Microsoft SQL Server 2005 and Oracle 10g store XML information as non-native XML datatypes, in other words as data in relational tables and columns. That approach is functional but less efficient than IBM’s own approach in DB2 9, Spang argued: "In the same sense that they say that, DB2 has also [in the past] supported XML as a datatype stored in the relational data model, so no argument that they store XML data. The difference is [DB2] version 9.1 stores the XML data in its pure XML structure, so a hierarchical structure that can be indexed and searched or queried [without having to be translated]. That results in a significant performance improvement."

Furthermore, Spang said, other RDBMS vendors either "shred" XML (breaking documents apart and turning their contents into rows in relational tables) or store it as Binary Large Objects (BLOBs) or Character Large Objects (CLOBs), in which each XML document is essentially dumped en masse into a relational table as a single, opaque value.
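To make the two approaches concrete, here is a minimal sketch in Python using the standard library’s sqlite3 and xml.etree.ElementTree modules as stand-ins. It does not show how DB2, SQL Server, or Oracle work internally; the trade document, table names, and column names are hypothetical, chosen only to illustrate shredding versus CLOB-style storage.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Hypothetical, simplified trade message (not real FIXML or FpML).
TRADE_XML = """<trade id="T1">
  <party role="buyer">ACME</party>
  <party role="seller">Globex</party>
  <amount currency="USD">1000000</amount>
</trade>"""


def store_as_clob(conn: sqlite3.Connection, doc: str) -> None:
    """CLOB-style storage: the whole document goes into one text column."""
    conn.execute("CREATE TABLE IF NOT EXISTS trade_docs (doc TEXT)")
    conn.execute("INSERT INTO trade_docs (doc) VALUES (?)", (doc,))


def store_shredded(conn: sqlite3.Connection, doc: str) -> None:
    """Shredding: break the document apart into rows of relational tables."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS trades "
        "(trade_id TEXT PRIMARY KEY, amount INTEGER, currency TEXT)"
    )
    conn.execute(
        "CREATE TABLE IF NOT EXISTS trade_parties "
        "(trade_id TEXT, role TEXT, name TEXT)"
    )
    root = ET.fromstring(doc)
    trade_id = root.get("id")
    amount = root.find("amount")
    conn.execute(
        "INSERT INTO trades VALUES (?, ?, ?)",
        (trade_id, int(amount.text), amount.get("currency")),
    )
    # One row per <party> child element.
    for party in root.findall("party"):
        conn.execute(
            "INSERT INTO trade_parties VALUES (?, ?, ?)",
            (trade_id, party.get("role"), party.text),
        )


conn = sqlite3.connect(":memory:")
store_as_clob(conn, TRADE_XML)
store_shredded(conn, TRADE_XML)

# Shredded data can be queried with ordinary SQL predicates...
buyer = conn.execute(
    "SELECT name FROM trade_parties WHERE trade_id = ? AND role = 'buyer'",
    ("T1",),
).fetchone()[0]
print(buyer)  # ACME

# ...while the CLOB column holds a string the database cannot see into.
raw = conn.execute("SELECT doc FROM trade_docs").fetchone()[0]
print(raw == TRADE_XML)  # True
```

The shredded form supports plain SQL but loses the document’s original shape unless it is reassembled; the CLOB form keeps the document intact but gives the database nothing inside it to index or filter on.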

"The problem with that is that if you want to gain any insight, you have to pull the whole CLOB out and then you and your application code have to decompose it, or you have to have some intermediate layer to parse it and do the query," he said. "So DB2 9 implements management of both [relational and XML] data structures through a single data server. As an application developer, you can write a query intermixing SQL, XPath, or the new XQuery language, and DB2 does the work for you to query both, any and all data sets in the database, whether it’s the relational structure or the [XML] data structure."
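The cost Spang describes can be sketched in a few lines of Python, again with sqlite3 standing in for a relational database and hypothetical trade documents and function names. Because the XML sits in an opaque text column, answering even a simple question means fetching every document and parsing it in application code; ElementTree’s limited XPath subset handles the filtering on the client side. This is an illustration of the CLOB pattern, not of how DB2 evaluates XQuery.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Hypothetical trade documents stored whole, CLOB-style, in one text column.
DOCS = [
    '<trade id="T1"><party role="buyer">ACME</party><amount>1000000</amount></trade>',
    '<trade id="T2"><party role="buyer">Initech</party><amount>250000</amount></trade>',
    '<trade id="T3"><party role="buyer">ACME</party><amount>75000</amount></trade>',
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE trade_docs (doc TEXT)")
conn.executemany("INSERT INTO trade_docs (doc) VALUES (?)", [(d,) for d in DOCS])


def trades_for_buyer(conn: sqlite3.Connection, buyer: str) -> list[str]:
    """Find trade ids for a buyer when the XML is stored as an opaque string.

    The database cannot look inside the text column, so every document is
    fetched in full and parsed client-side before filtering.
    """
    matches = []
    for (doc,) in conn.execute("SELECT doc FROM trade_docs ORDER BY rowid"):
        root = ET.fromstring(doc)  # decompose the whole document in the app
        # ElementTree supports a limited XPath subset, including [@attr='value'].
        party = root.find("party[@role='buyer']")
        if party is not None and party.text == buyer:
            matches.append(root.get("id"))
    return matches


print(trades_for_buyer(conn, "ACME"))  # ['T1', 'T3']
```

In an engine that keeps XML in a parsed, indexable form, the same filter could in principle be evaluated inside the database, ideally with help from an index, rather than by a full fetch-and-parse in the application.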

IBM’s rivals suggest that Big Blue’s claim to XML supremacy hinges more on what it means by "XML datatype" than on any explicit performance or implementation advantage.

Take Michael Rys, program manager for Microsoft’s SQL Server Engine Team, who has consistently downplayed IBM’s claims, particularly on his Weblog.

"Both DB2 and SQL Server (and others) expose or will expose at the logical level an XML datatype that provides XML fidelity plus query and update functionality. Thus all of them provide ‘native’ XML capabilities (without abusing the language)," Rys wrote last year. "IBM's physical design is irrelevant. Whether you store it as a string, store it in some internal binary format making use of existing storage facilities provided by the relational database system or design a complete new storage engine does not matter."

As for the performance advantages of IBM’s "native" XML data server, Rys said that SQL Server 2005 stores XML in an internal format that is easier to traverse than the original XML text, which speeds up processing, especially compared with the practice of parsing raw XML.

About the Author

Stephen Swoyer is a Nashville, TN-based freelance journalist who writes about technology.
