Splunk Touts Troubleshooting Rx

How you may be troubleshooting application performance and reliability issues in the loosely coupled application-scape of the future

If there’s a downside to the loosely coupled application-scape of the future, it could well be that it’s going to look a lot like the highly distributed application-scape of the past. In an application model with multiple touchpoints, encompassing a range of different systems, networks, and data sources, how can IT troubleshoot application performance and reliability issues quickly and easily?

Scrappy start-up Splunk Inc. thinks it has a solution. Splunk markets a search and analysis tool that can parse log and configuration files from hardware devices, operating system platforms, applications, databases, message queues, and even enterprise service buses to help IT organizations troubleshoot application performance and reliability problems.

It also has impressive antecedents. Splunk CEO Michael Baum worked on search-engine technology with Yahoo! Inc. and the former Infoseek. Ditto for CTO Eric Swan, another Infoseek veteran—and a former eBay staffer. It’s tempting to frame Splunk as a sort of über-Google for log files—as at least one reviewer has done—but CEO Baum says that’s an oversimplification.

“For a layman from 100,000 feet, maybe that’s a fine explanation, but … it’s a pretty gross oversimplification of what we’re doing,” he comments. “On the Web, data is relatively static; Web sites don’t change as much. But IT data changes every millisecond—it’s streaming, rapid-fire information that isn’t hyperlinked to anything, so our challenge is [to take] all of that streaming data and [be] able to catalogue it in real time. It’s a whole lot different [than Web search].”

Of course, any sys admin worth his or her salt can grep, concatenate, and analyze a collection of log files. Most organizations use homegrown or ad hoc tools to do just that, after all. But Splunk outstrips vanilla search and analysis tools, Baum maintains, because it uses inferencing algorithms and other black-box technologies to reverse-engineer relationships. In this respect, he argues, it’s able to reliably pinpoint problems that might span dozens of different log files, involving hundreds of megabytes (or even gigabytes) of data.

“There’s this very complex data in lots of different formats, none of it is very well structured, and you have all of these experts trying to make sense of that data. But why hasn’t anyone applied technology to this problem? Here you [have] someone logging into a machine, grepping through files, basically reverse-engineering what these systems are doing at runtime. Why not automate that?” Baum suggests.

“The opportunity is to take every piece of data that’s generated in your system and in real time index it and link it together with other pieces of information on the fly. You’re essentially reverse-engineering those particular relationships so you can identify what component or components are at fault.”
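To make the idea of indexing events and linking them across sources concrete, here is a minimal Python sketch. It is not Splunk's implementation, which the article describes only as black-box technology. It assumes an in-memory inverted index over a handful of invented log lines, and it links events only when they share a literal token. The file names, log formats, and "req-" identifiers are all hypothetical.

```python
import re
from collections import defaultdict

# Hypothetical sample data: the source names, line formats, and "req-" IDs
# are invented for illustration and do not come from any real system.
SAMPLE_LOGS = {
    "web.log": [
        "2006-03-01 10:00:01 GET /checkout req-42 status=500",
        "2006-03-01 10:00:02 GET /home req-43 status=200",
    ],
    "queue.log": [
        "2006-03-01 10:00:01 enqueue order req-42",
    ],
    "db.log": [
        "2006-03-01 10:00:01 ERROR deadlock detected req-42",
    ],
}

# A token is a run of letters, digits, and a few punctuation characters.
TOKEN = re.compile(r"[A-Za-z0-9_./=-]+")


def build_index(logs):
    """Map each lowercased token to the set of (source, line number) pairs containing it."""
    index = defaultdict(set)
    for source, lines in logs.items():
        for lineno, line in enumerate(lines):
            for token in TOKEN.findall(line.lower()):
                index[token].add((source, lineno))
    return index


def search(index, logs, *terms):
    """Return (timestamp, source, line) tuples for lines containing every term, sorted by time."""
    postings = [index.get(term.lower(), set()) for term in terms]
    hits = set.intersection(*postings) if postings else set()
    # Assumes each line starts with a 19-character "YYYY-MM-DD HH:MM:SS" timestamp.
    events = [(logs[src][n][:19], src, logs[src][n]) for src, n in hits]
    return sorted(events)


def related_sources(index, term):
    """List every source in which a token appears: a crude cross-source 'link'."""
    return sorted({source for source, _ in index.get(term.lower(), set())})


if __name__ == "__main__":
    idx = build_index(SAMPLE_LOGS)
    print(related_sources(idx, "req-42"))
    for event in search(idx, SAMPLE_LOGS, "req-42"):
        print(event)
```

Searching for a single request identifier pulls the matching lines from the Web, queue, and database logs into one time-ordered view, which is the manual grep-and-compare workflow Baum describes, done in one step. A production tool would also have to cope with streaming input, inconsistent timestamp formats, and relationships that are not spelled out by a shared token.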

Splunk isn’t just a product in search of a market, Baum insists; in fact, its impetus grew out of his experience at Yahoo!, Infoseek, and elsewhere. “I took a look at the kinds of things we did when I was at Yahoo, and even when I was at Infoseek, we had a lot of individual system administrators, support people, developers who would be called in for escalation, going through all of these different data sources, whether it was a database creating its own records or a bunch of J2EE traffic coming down a message bus. They’d have to comb through it to identify problems,” he says. “In a best-case scenario, it might only take a few hours or days—but in some cases, it could take weeks.”

Baum says Splunk can get at almost any data source—including data residing on mainframe systems. It supports standard connectivity (ODBC and JDBC, for example), messaging (MQSeries, MSMQ, or JMS), and management (SNMP) standards, and also ships with a SOAP connectivity toolkit. “We have a number of different ways to tap data. We include a complete open set of SOAP APIs, for example, and as far as mainframe data goes, I know we’ve had people get at it right through TCP sockets,” Baum notes.

Splunk is a pay-for-use product, but it’s also available as a free “lite” tool. Many customers first take the Splunk plunge by way of the free download, Baum says.

“Anyone can come to our Web site and download the product. We’re doing on the order of 15,000 downloads on our Web site a month now,” he says. “That’s the typical entry point for a lot of [customers]. They’re searching [the Web] for something to help them [troubleshoot] these problems, and they end up downloading the free tool to try it out.”

About the Author

Stephen Swoyer is a Nashville, TN-based freelance journalist who writes about technology.