BMC Cuts False Performance Alerts to Zero
BMC says its new performance-monitoring software can help reduce false alerts—and sleep-shattering beeper pages—by almost 100 percent
If you’ve ever found yourself keeping company with talk show host Conan O’Brien because your beeper’s gone off in the middle of the night, BMC Software Corp. says it has just the product for you.
Yesterday, Houston-based BMC announced PATROL Analytics, a new business intelligence add-on for its PATROL distributed systems management tool. BMC says the new offering contains an analytics engine, licensed from Netuitive Inc., that brings self-learning capabilities to the PATROL environment.
“A lot of [systems management products] collect all of this performance information, but there’s no way for administrators to make sense of it. It becomes overwhelming,” says Sean Duclaux, director of infrastructure management with BMC.
In many cases, Duclaux argues, such tools wind up creating more work for administrators. “Too many alerts from data collection and poorly tuned thresholds in these tools really lead to an inefficiency of the IT organization,” he indicates. “They are forced to chase every rabbit down every rabbit hole, if they set the static threshold too low—but if they set it too high, then they miss their alert.”
That’s where PATROL Analytics comes into the picture. It uses the Netuitive analytics engine to crunch through performance data and unearth hidden trends, as analytic software is wont to do. It also tracks historical information, so it can build a profile of system, network, and application performance over the course of an average business cycle.
Once PATROL Analytics builds a profile, which typically takes about two or three months, it can generate dynamic, real-time thresholds that BMC says eliminate nearly all false positives. “We begin to learn the behavior profiles of not only the system or a business service, but the correlated data or the parameters that impact that service,” Duclaux says. “We have found we are able to eliminate almost 100 percent of false positive alerts. … Not only are we reducing the overall volume of the alerts, but when we raise a trusted alarm, you know that is the one rabbit you should chase down the hole.”
Consider, for example, a corporate mail server. One of the busiest times for any messaging platform is Monday morning, when an army of workers sorts through a weekend’s worth of accumulated mail. In many cases, Duclaux says, this will drive up CPU utilization on the server, which—if your performance-monitoring threshold is set too low—is going to trigger an alert.
Consequently, an organization may set its threshold artificially high for just this reason. Of course, if a new mass-mailing worm gets into the wild, it, too, could increase CPU utilization on the company’s e-mail servers. Unfortunately, the company’s performance monitoring software won’t generate an alert until CPU utilization exceeds the maximum threshold – which is designed to accommodate the artificially high Monday morning e-mail crunch.
PATROL Analytics, on the other hand, could flag the worm outbreak as an anomaly that requires immediate IT attention. A similar example might involve a monthly or quarterly corporate meeting, at which all employees at a given business campus gather together. As is typical in such cases, many employees will return to their desktops and e-mail one another about what was said in the meeting. A conventional performance monitoring software package might flag such activity as anomalous, but PATROL Analytics would give it a pass, at least once it has a cycle or two of historical data under its belt.
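The difference between a static threshold and a learned one can be illustrated with a minimal Python sketch. It is not the Netuitive engine or anything BMC has described in detail: it assumes a simple per-time-slot baseline (mean plus or minus a few standard deviations), and the function names, the sample CPU figures, and the `k` and `min_band` parameters are all hypothetical.

```python
from statistics import mean, stdev

def build_profile(history):
    """Group past CPU readings by time slot, e.g. ("Mon", 9), and
    return {slot: (mean, stdev)} for slots with at least two samples."""
    by_slot = {}
    for slot, cpu in history:
        by_slot.setdefault(slot, []).append(cpu)
    return {s: (mean(v), stdev(v)) for s, v in by_slot.items() if len(v) >= 2}

def is_anomalous(profile, slot, cpu, k=3.0, min_band=5.0):
    """Dynamic check: flag a reading that strays from the slot's own
    baseline by more than max(k * stdev, min_band) percentage points."""
    if slot not in profile:
        return False  # no baseline learned yet, so stay quiet
    mu, sigma = profile[slot]
    return abs(cpu - mu) > max(k * sigma, min_band)

def static_alert(cpu, threshold):
    """Conventional check: one fixed CPU threshold for every hour."""
    return cpu > threshold

# Made-up history: busy Monday mornings, quiet Wednesday nights.
history = [
    (("Mon", 9), 82.0), (("Mon", 9), 85.0), (("Mon", 9), 88.0), (("Mon", 9), 84.0),
    (("Wed", 3), 10.0), (("Wed", 3), 12.0), (("Wed", 3), 9.0), (("Wed", 3), 11.0),
]
profile = build_profile(history)

# Normal Monday crunch: a low static threshold pages someone, the profile does not.
monday_static = static_alert(86.0, threshold=80.0)
monday_dynamic = is_anomalous(profile, ("Mon", 9), 86.0)

# Worm-like spike at 3 a.m.: a high static threshold misses it, the profile catches it.
night_static = static_alert(60.0, threshold=90.0)
night_dynamic = is_anomalous(profile, ("Wed", 3), 60.0)
```

In this toy data set, the Monday reading of 86 percent sits well inside its learned band, while 60 percent at 3 a.m. on a Wednesday falls far outside it, even though it would never cross a static threshold raised to tolerate Monday mornings.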
Because it typically pays for itself in two to three months, Duclaux says, the new offering should also be an easy sell to management.
“We see very, very fast ROI on the product, and here we state, [ROI] is generally three months or less,” he says. “We pick that three-month window (generally we understand it to be closer to two), but we pick that [so] the customers have time to understand the business cycle.”
According to BMC, a prominent satellite radio service achieved a 300-to-1 reduction in false alerts using PATROL Analytics, while a leading phone company managed to reduce the number of pages it sends to IT staff by a 15-to-1 ratio. Another success story is Oklahoma Heart Hospital (OHH), which tapped PATROL Analytics to support one of its most critical applications, Cerner Millennium, a patient and doctor information system.
According to Dave Stinson, network director with OHH, PATROL Analytics works as advertised in his environment.
“The problem I had was getting [all of this information] under control, all of the management [tools], everything from applications to networking connectivity, everything that can generate an alert, so to speak, under control,” he says. “Before [PATROL] Analytics, the perspective was just getting all of the alarms into one place, getting them to go to one pager. And when I looked at the PATROL Analytics, it was almost like a present from god.”
Now, he says, he rarely gets a false alarm. This frees him to focus on improving application and network performance for OHH’s doctors, nurses, and other staff.
Stephen Swoyer is a Nashville, TN-based freelance journalist who writes about technology.