VoiceXML: XML Talks Back

Will the VoiceXML standard fit into your enterprise … or is it not worth its SALT?

The problem with emerging technology is that there's just so much of it. But it can't be ignored. Understanding new technologies when they appear on the horizon makes the decision to adopt or pass a lot easier. So let's look at one: VoiceXML, a specification developed by the VoiceXML Forum.

The Forum was founded by IBM, AT&T, Lucent and Motorola. They've gathered some impressive allies: Nortel, Cisco, Siemens, Verizon, Alcatel, Intel and Oracle.

VoiceXML is a specification for designing XML documents that tell a voice gateway (in effect, a voice browser) how to interact with a phone user, just as HTML tells a Web browser how to present a page to someone at a screen.

Because the business logic behind an existing application remains the same, VoiceXML promises rapid development times. It should be fairly easy to code pages for VoiceXML; easier than HTML, in fact. VoiceXML looks like a structured version of many existing call center applications, providing robust two-way communication, if/then/else logic constructs, error checking and data validation. The difference is that it's built around speech rather than relying solely on DTMF button-pushes.
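To make that concrete, here is a rough sketch of what such a page might look like. The dialog is written in VoiceXML 2.0 syntax and wrapped in a few lines of Python that only check it is well-formed and pull out its parts. The restaurant scenario, the "guests" field and the eight-person limit are invented for illustration; they come from no real application.

```python
import xml.etree.ElementTree as ET

VXML_NS = "http://www.w3.org/2001/vxml"

# Hypothetical dialog: one question, two error handlers, one validation rule.
DIALOG = """
<vxml version="2.0" xmlns="http://www.w3.org/2001/vxml">
  <form id="reservation">
    <field name="guests" type="number">
      <prompt>How many guests are in your party?</prompt>
      <noinput>Sorry, I did not hear you. <reprompt/></noinput>
      <nomatch>Sorry, I did not understand. <reprompt/></nomatch>
      <filled>
        <if cond="guests &gt; 8">
          <prompt>Sorry, we can seat at most eight.</prompt>
          <clear namelist="guests"/>
        <else/>
          <prompt>Thank you. Your table is booked.</prompt>
        </if>
      </filled>
    </field>
  </form>
</vxml>
"""


def summarize(document: str) -> dict:
    """Parse the dialog and report its prompt, error handlers and rule."""
    ns = {"v": VXML_NS}
    root = ET.fromstring(document.strip())
    field = root.find("v:form/v:field", ns)
    # Element tags come back as "{namespace}name"; keep only the local name.
    local_names = [child.tag.split("}")[1] for child in field]
    return {
        "field": field.get("name"),
        "prompt": field.find("v:prompt", ns).text.strip(),
        "handlers": [n for n in local_names if n in ("noinput", "nomatch")],
        "condition": field.find("v:filled/v:if", ns).get("cond"),
    }
```

The prompt is the question, the noinput and nomatch elements are the error checking, and the if/else inside filled is the data validation. A real voice gateway, not this Python wrapper, would be what actually speaks and listens.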

More people will be able to use an HTML/VoiceXML application than a pure HTML application, because it can enable data presentations outside the normal desktop/laptop/PDA venue.

There are already voice technologies doing that, but XML will allow a new degree of standardization. With widespread adoption, you should be able to buy a voice gateway from any supporting vendor, ensuring that competition will force the continued improvement of both the speech generation and speech recognition engines.

But there's a 1,000-pound gorilla blocking the way: Microsoft, which is promoting a competing standard, SALT (Speech Application Language Tags). Its APIs will be part of Visual Studio .NET.

While the VoiceXML Forum is looking for licensing royalties, Microsoft is giving SALT away—a strategy that's proved successful for everything from browsers to music.

SALT's smaller following includes both Cisco and Intel. With Microsoft's stranglehold on the desktop market, and its leverage into servers, SALT will be a significant challenge.

Does VoiceXML stand on its own merits? Probably its key benefit is that it will allow an effective voice interface to existing applications by simply replacing the application's presentation layer, leaving the business logic and data behind it untouched.
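A minimal sketch of that idea, in Python, might look like the following. The account lookup, the account number and both rendering functions are hypothetical stand-ins; the point is only that one piece of business logic can feed either an HTML page or a VoiceXML prompt.

```python
import xml.etree.ElementTree as ET


def account_balance(account_id: str) -> float:
    """Stand-in business logic; a real application would query a back end."""
    balances = {"1001": 250.75}  # made-up sample data
    return balances[account_id]


def render_html(account_id: str) -> str:
    """Presentation for a Web browser."""
    balance = account_balance(account_id)
    return f"<html><body><p>Your balance is ${balance:.2f}.</p></body></html>"


def render_vxml(account_id: str) -> str:
    """Presentation for a voice gateway (VoiceXML 2.0 syntax assumed)."""
    balance = account_balance(account_id)
    return (
        '<vxml version="2.0" xmlns="http://www.w3.org/2001/vxml">'
        "<form><block>"
        f"<prompt>Your balance is {balance:.2f} dollars.</prompt>"
        "</block></form>"
        "</vxml>"
    )


def is_well_formed(markup: str) -> bool:
    """Return True if the markup parses as XML."""
    try:
        ET.fromstring(markup)
        return True
    except ET.ParseError:
        return False
```

Both renderers call the same account_balance function, so adding the voice channel here means writing one new output function rather than touching the logic underneath.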

In a vacuum, I'd say this was a slam dunk: A technology that many companies clearly need, a probable W3C standard from some of the biggest IT players, backed by even more of the biggest players. But the opposition, Microsoft, has a track record of successfully killing rival standards. That's not good.

SALT is an interesting technology. More complex and flexible than VoiceXML, it, too, is XML-based, but works by voice-enabling existing Web pages, binding voice events to HTML forms. The same SALT-enabled Web page could be served to a Web browser as well as a voice gateway.

VoiceXML looks cleaner and simpler, has huge vendor support, and looks like it would meet most business needs. SALT, however, has the support of a huge vendor and is bound to start appearing in other Microsoft products, leveraging their wide deployment.

Best way to prepare: Tell your R&D developers to focus on VoiceXML, but hedge your bets by giving them some background in SALT.

About the Author

Laura Wonnacott is VP of Business and Technology Development for Aguirre International, and a California State University system instructor.