In-Depth
XQuery—Could Finalized Standard Emerge This Year?
A proposed standard for querying structured and unstructured data has been a long time coming
Ever dreamed of a query language that is as much at home getting at data in relational sources as it is accessing e-mail messages, HTML documents, and other unstructured data?
You’re in luck. The notion of a unified query standard—XML Query (XQuery)—was first proposed in 1998. The World Wide Web Consortium (W3C) released the first public XQuery Working Draft specification in June 2001. At the time, industry watchers predicted that XQuery would eventually revolutionize the management of XML data and XML databases, much like SQL once did for relational data and relational database management systems.
Since that time, however, the goal of a finished XQuery standard has remained elusive. Last year, for example, some XQuery boosters suggested that the W3C would almost certainly finalize the standard by the end of 2002—but that didn’t happen, of course, in spite of the fact that a new version of the XQuery Working Draft specification was published in late November.
It’s not as if vendor support is holding back the standard, either. Big-time ISVs—such as IBM Corp., Microsoft Corp., and Oracle Corp., among others—have pledged to support XQuery in their product offerings when the standard is finalized. Some, like Oracle and Microsoft, have even released prototype versions of XQuery tools.
As Jeff Jones, IBM’s director of strategy for data management, indicates: “We have an enormous involvement in the [XQuery] standard, and one of the co-inventors of the SQL language is involved with IBM on the W3C [XQuery standardization effort]. We will support the standard when it is finalized.”
That hasn’t stopped at least one vendor—Enosys Software—from marketing software based on the current working draft of the XQuery specification. Enosys has partnered with BEA Systems, which developed LiquidData 1.0—an XQuery implementation based on Enosys technology—for inclusion in its WebLogic enterprise integration platform.
As a matter of fact, says Yannis Papakonstantinu, Enosys’ chief scientist and co-founder, the XQuery standard is close to being finalized, and is even suitable for deployment in some environments. “At this point, XQuery is fairly stable. There [are] a bunch of discussions going on ahead of providing the official recommendation, but now it is really down to the data. The core of XQuery is very stable, and XQuery implementations are available from Enosys and other [vendors].”
XQuery's Advantage
At the database level, XQuery is enabled by XML Schema, which describes a data model for the representation of structured, semi-structured and unstructured data. The relational data model behind SQL is well adapted to, and highly optimized for, structured data. XML Schema, for its part, is more comprehensive.
Vendors and analysts alike suggest that, as a proposed query language for relational data stores, XQuery is inferior to SQL, certainly in terms of the relative performance and maturity of its underlying XML base, but also in another key area. According to Sandeepan Banerjee, head of Oracle’s XML database technologies division, XQuery “is a good decade behind SQL in terms of the penetration of the development expertise aspect.” Thousands of enterprise applications that support SQL have already been written, Banerjee points out, and virtually all organizations have SQL developers in-house. “People are pretty happy with [SQL]. Why would you want to reinvent this when there’s a lot of expertise and technology investment in [SQL] development?”
Similarly, as a proposed standard for querying unstructured data, XQuery is by no means the only query language on the block. The reality is that there are several existing query standards that can facilitate access to unstructured data.
Among existing standards, for example, there’s Extensible Stylesheet Language Transformations (XSLT), an XML-based language for transforming one XML document into another. XSLT works in tandem with another standard, XML Path Language (XPath), to facilitate access to unstructured data and transform it into an expected form. According to Oracle’s Banerjee, XPath “allows you URL-like navigational syntax for getting at the data, so you can navigate to a particular e-mail message, for example, or to a particular document, by using a path-based syntax.”
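To make the path-based syntax concrete, here is a minimal sketch in Python, whose standard library supports a limited subset of XPath. The mailbox document, its element names, and the subject_of helper are all invented for illustration; none of them comes from Oracle or from the XPath specification.

```python
import xml.etree.ElementTree as ET

# Hypothetical mailbox document; element and attribute names are assumptions.
MAILBOX = """
<mailbox>
  <message id="1">
    <from>alice@example.com</from>
    <subject>Quarterly numbers</subject>
  </message>
  <message id="2">
    <from>bob@example.com</from>
    <subject>XQuery draft</subject>
  </message>
</mailbox>
"""


def subject_of(xml_text, message_id):
    """Return the subject of the message with the given id, or None."""
    root = ET.fromstring(xml_text.strip())
    # Path-based navigation: pick the <message> child whose id attribute
    # matches, then step down to its <subject> child.
    node = root.find(f"./message[@id='{message_id}']/subject")
    return node.text if node is not None else None


print(subject_of(MAILBOX, "2"))
```

The path reads much like a URL: each slash steps one level down the document, and the bracketed predicate narrows the step to a single message.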
That’s assuming that you can get at the data in the first place. For those who use Oracle databases, that isn’t necessarily a problem: In 1998, Oracle introduced its Internet File System (IFS), which allows documents stored in proprietary formats, such as Word or Excel files, to be imported into a database and parsed. Oracle currently uses XPath in tandem with IFS to support access to and querying of unstructured data. In addition, Oracle supports another standard—SQL/XML—which amounts to a fusion of SQL with XPath, and which allows a developer to embed XPath in SQL statements.
But using IFS and XPath or SQL/XML to query unstructured content assumes that you’re first willing to load it into your Oracle database. And it’s here that XQuery departs from most existing solutions that provide access to relational and unstructured data. In this respect, says Enosys’ Papakonstantinu, XQuery can facilitate federated access to data of all kinds: “I can have an XQuery statement accessing multiple sources, providing an XML view of a relational database, along with other sources.”
Oracle’s Banerjee agrees. “If you are only going to deal with structured data, then SQL is for the foreseeable future the standard that you’ll want to use. If you’re building something that has to look at tables and documents and e-mail messages, then—when it emerges as an approved standard—XQuery will be appropriate.”
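The federated idea, in which one query spans a relational source and a document source through a shared XML view, can be approximated in a few lines of Python. This is only a sketch of the concept and not XQuery syntax or any vendor's implementation: the orders table, the mailbox document, and every function name below are hypothetical, and an in-memory SQLite database stands in for the relational source.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Hypothetical document source; element names are assumptions.
MAIL = """
<mailbox>
  <message><from>alice@example.com</from><subject>Quarterly numbers</subject></message>
  <message><from>bob@example.com</from><subject>XQuery draft</subject></message>
</mailbox>
"""


def relational_as_xml(conn):
    """Expose rows of the hypothetical 'orders' table as an XML fragment."""
    orders = ET.Element("orders")
    for order_id, customer in conn.execute(
        "SELECT id, customer FROM orders ORDER BY id"
    ):
        ET.SubElement(orders, "order", {"id": str(order_id), "customer": customer})
    return orders


def federated_view(conn, mail_xml):
    """Combine the relational view and the mailbox under a single root."""
    root = ET.Element("sources")
    root.append(relational_as_xml(conn))
    root.append(ET.fromstring(mail_xml.strip()))
    return root


def subjects_from_customers(view):
    """Subjects of messages whose sender also appears in the orders table."""
    customers = {o.get("customer") for o in view.findall("./orders/order")}
    return [
        m.findtext("subject")
        for m in view.findall("./mailbox/message")
        if m.findtext("from") in customers
    ]


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, customer TEXT)")
conn.executemany(
    "INSERT INTO orders (id, customer) VALUES (?, ?)",
    [(1, "alice@example.com"), (2, "carol@example.com")],
)

print(subjects_from_customers(federated_view(conn, MAIL)))
```

The point of the sketch is that the e-mail never has to be loaded into the database: the relational rows are presented as XML on the way out, and the join between the two sources happens over that common view.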
The Wait Goes On
Enosys’ Papakonstantinu acknowledges that there have been several starts and stops on the road to XQuery, but expresses confidence that 2003 will witness the finalization of the XQuery standard. “Nothing big is holding it up, just low-level details that should be worked out this year.”
Nevertheless, ISVs such as Oracle and IBM express a reluctance to officially support XQuery in their databases in the absence of a formal standard. Stresses IBM’s Jones: “We do not yet have an XQuery interface; the main reason for that is that XQuery hasn’t [been] formalized and isn’t yet an accepted standard. That’s not said to be an avoidance statement.”
Mike Schiff, an analyst with consultancy Current Analysis, speculates that the XQuery standard effort has been to some extent hampered by the usual political battles, particularly from vendor representatives “who feel some pressure [not to reduplicate] the functionality of their own products, that and the fact that when you’re reaching for a consensus, you tend to reach for the least common denominator.” Hammering out a common denominator acceptable to all, Schiff deadpans, can take a surprisingly long time.
The upshot is that Schiff and most other analysts won’t venture a guess as to when the standard will be finalized. Oracle’s Banerjee, on the other hand, suggests that the standard is still at least another year away: “The reality today is that XQuery is still probably a good year away from the standard. The use cases are probably hypothetical use standards today.”
XQuery-related Links
For links to draft standards and XQuery mailing lists, along with links to XQuery implementations, visit http://www.w3.org/XML/Query.
About the Author
Stephen Swoyer is a Nashville, TN-based freelance journalist who writes about technology.