XML Tries to Be the Latest and Greatest

When I was growing up, pride in a new squirt gun was always muted the day my neighbor surfaced with a newer model firing farther and storing more water. At times like that I looked at my nearly brand-new water pistol and regarded it as a dowdy relic of the distant past.

The pace of change on the Web is even faster than that of technology in water weapons. Nowhere is this more evident than in the arena of eXtensible Markup Language (XML). XML is barely a factor on the Web: Browser support is limited to experimental plug-ins or rough, special-purpose, first-generation tools. Still, the underlying standards are undergoing almost weekly change. For businesses with a bottom line to protect, it's difficult to experiment with XML right now without risking getting a lot more than their feet wet.

XML was originally designed to meet the challenges of large-scale, online publishing. HTML isn't nearly rich enough to support the advanced document needs of parts catalogs, airline reservation systems or other complex publishing requirements. In fact, the current version of HTML, version 4.0, is probably the last of the series: No attempts to extend it or address its limitations through new versions are forthcoming. The designers of XML attempted to address the limitations of HTML by crafting a new language from HTML's distant parent: Standard Generalized Markup Language (SGML). SGML is a set of rules for developing new markup languages. Unfortunately, SGML is so complex that it seems as if only three people in the world understand it.

XML emerged as a simplified implementation of SGML, enabling Web designers to create new document structure languages. The advantage of XML is that anyone can create his or her own markup tags. The downside is that browsers require some way to figure out what those tags mean. In some special cases, today's browsers know how to handle certain XML language definitions. For instance, Microsoft Corp.'s version of push technology is based on its proposed Channel Definition Format (CDF), and CDF is simply an XML-based language for defining push content and related deliverables.
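To make the custom-tag idea concrete, here is a minimal sketch in Python using the standard library's xml.etree.ElementTree parser. The parts-catalog tag names (catalog, part, name, price) are invented for illustration and do not come from any published XML language. A parser reads the structure without trouble, but nothing in the document itself says what a "part" is or how to display it; that knowledge has to live in the receiving software.

```python
import xml.etree.ElementTree as ET

# Hypothetical parts-catalog markup; every tag name here is made up.
CATALOG_XML = """
<catalog>
  <part id="A-100">
    <name>Hex bolt</name>
    <price currency="USD">0.25</price>
  </part>
  <part id="B-200">
    <name>Lock washer</name>
    <price currency="USD">0.10</price>
  </part>
</catalog>
"""

def list_parts(xml_text):
    """Return (id, name, price text) for each <part> element."""
    root = ET.fromstring(xml_text)
    parts = []
    for part in root.findall("part"):
        parts.append((part.get("id"), part.findtext("name"), part.findtext("price")))
    return parts

if __name__ == "__main__":
    for part_id, name, price in list_parts(CATALOG_XML):
        print(part_id, name, price)
```

The function only works because its author already knew what the tags were meant to convey, which is exactly the gap that a shared language definition has to fill.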

XML's importance in enterprise applications is in its extensibility. Organizations will be able to use XML to address complex publishing and application challenges that HTML could never handle. XML also appears to be the future, industry-standard mechanism for the exchange of data as well as documents. For example, it may be possible to use XML as a mechanism for databases from different vendors to exchange information across the Internet.
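As a rough illustration of the data-exchange idea, the following Python sketch writes a few database-style rows out as XML and reads them back. The rows/row layout and the column names are assumptions made for this example, not an industry format, and the sketch assumes every column name is a legal XML tag name.

```python
import xml.etree.ElementTree as ET

def rows_to_xml(rows):
    """Serialize a list of dicts to an XML string (hypothetical <rows>/<row> layout)."""
    root = ET.Element("rows")
    for row in rows:
        row_el = ET.SubElement(root, "row")
        for column, value in row.items():
            ET.SubElement(row_el, column).text = str(value)
    return ET.tostring(root, encoding="unicode")

def xml_to_rows(xml_text):
    """Parse the same layout back into dicts; all values come back as strings."""
    root = ET.fromstring(xml_text)
    return [{col.tag: col.text for col in row} for row in root.findall("row")]

if __name__ == "__main__":
    exported = rows_to_xml([{"sku": "A-100", "qty": 12}])
    print(exported)
    print(xml_to_rows(exported))
```

Note that the quantity goes out as a number and comes back as text: the two sides can agree on structure this way, but agreeing on what the values mean takes something more.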

What stands out today is how few real business applications there are for XML and, despite this, how excited everyone is about the technology. In fact, people are so excited about XML that they've discovered parts of the specification that they are already trying to improve -- long before the first substantial implementations of the first standard!

In August, Microsoft and IBM Corp. joined to suggest a major change to XML. They proposed a new way of telling browsers how to understand the tags embedded in XML documents. The specification, called the Document Content Description (DCD) for XML, describes how authors of XML-based languages define the structure and syntax of tags in their documents. The newly proposed DCD would completely supersede XML's existing method of doing this, a mechanism called the Document Type Definition (DTD).

One way in which DCDs improve on DTDs is in the ability to provide data types. If the content of an XML tag is a number -- say, 32768 -- a DCD will allow the designer to specify whether that number is a date, time, integer, string or some other type of data. DCDs also improve on DTDs by allowing designers to easily extend XML-based languages and reuse definitions for tags.
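The practical effect of data types can be sketched in a few lines of Python. A DTD can declare element content only as character data, so 32768 reaches the application as plain text. In the sketch below, a Python dictionary stands in for the type declarations a typed schema would carry; it is not DCD syntax, and the element names are hypothetical.

```python
import xml.etree.ElementTree as ET

# Stand-in for schema-level type declarations. This dict is NOT DCD syntax,
# and the element names are hypothetical.
ELEMENT_TYPES = {"quantity": int, "weight": float, "label": str}

def read_typed(xml_text):
    """Convert each child element's text using its declared type."""
    root = ET.fromstring(xml_text)
    record = {}
    for child in root:
        convert = ELEMENT_TYPES.get(child.tag, str)  # undeclared tags stay text
        # int() and float() raise ValueError on text that is not a number.
        record[child.tag] = convert(child.text or "")
    return record

if __name__ == "__main__":
    doc = "<item><quantity>32768</quantity><weight>1.5</weight><label>bolt</label></item>"
    print(read_typed(doc))
```

With the types declared in one shared place, a malformed value such as a quantity of "lots" is rejected when the document is read rather than surfacing later as a bug in each application that consumes it.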

Bridging the gap between HTML and XML is going to present many headaches for companies as they move more advanced applications to the Web. Companies that view XML as just a souped-up version of an old friend -- HTML -- are in for a painful lesson. What's potentially more difficult is that, despite the tremendous momentum for XML, the underlying standards are still being thought out, revised and revisited. Examples such as the DCD specification show that the Internet industry is in the process of getting the standard right. We're just not quite there yet.

For corporations considering implementation of XML applications, caution is in order. Development tools, XML-aware clients, and common, standardized XML languages are on the way, but using initial implementations is likely to feel like having a brand-new water pistol -- and knowing that the kid next door will have a SuperSoaker the very next day.

-- Mark McFadden is a consultant and is communications director for the Commercial Internet eXchange (Washington). Contact him at mcfadden@cix.org.