Q&A: How IT Departments Can Optimize Business Processes

Business transaction management can optimize business processes and performance.

Gartner's senior vice president of research, Peter Sondergaard, spoke at a recent Gartner Symposium about four trends changing computing. He emphasized that although IT departments have spent the past 20 years internally focused on optimizing their own processes and costs, the focus is now shifting to optimizing business processes.

Charley Rich of Nastel Technologies, a provider of business transaction management (BTM) solutions, discusses how IT departments can optimize business processes and performance using BTM technology that provides real-time visibility into application performance. Complex event processing (CEP) plays a key role here, ensuring business processes flow without interruption and enabling IT teams to act proactively to prevent problems.

Enterprise Strategies: How can IT departments change their focus to help grow the business and not just run the business when there are always so many fires to put out, so little time, and an ever-more-constrained IT budget?

Charley Rich: It starts by moving IT management’s focus from its traditional approach of improving mean-time-to-repair (MTTR) to outage-avoidance. Although improving the MTTR for application performance problems is important, it is essentially a reactive strategy. By the time repair begins, end users are often impacted, productivity is decreasing, and the business process itself is disrupted. This “stealth waste” can be expensive; it reduces the level of service a firm delivers and keeps operational costs high, and it is often the underlying cause of customer attrition and order fallout.

One of the all-too-common scenarios is that an IT team repeatedly encounters the same sets of problems. Although the team gets better at resolving them, they rarely improve at preventing them, as they are unable to recognize the symptoms until users are affected.

Outage-avoidance looks to automation to prevent the impact of problems before the pain is felt. This requires auto-discovery of how business processes map to IT infrastructure and a complex event processing (CEP) engine that can predict failures. This approach automates the mapping of problems to situations composed of multiple events and metrics from multiple sources and determines whether these situations impact business users. The next time a symptom of a known problem appears, the CEP engine recognizes it and can take action to prevent it from having impact. Now IT can break out of the cycle of using its best talent to chase problems at the help desk and instead rebalance resources, freeing some of that talent to develop new services.
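As a rough illustration of this idea (a generic sketch, not Nastel's implementation), a CEP-style detector can be modeled as a rule that fires a preventive action once all the symptom events defining a situation arrive within a sliding time window. The situation name, event types, and action below are hypothetical:

```python
from time import time

# Hypothetical sketch: a "situation" is recognized when all of its
# symptom events appear within a sliding time window, letting an
# automated action run before users feel any impact.
class SituationDetector:
    def __init__(self, name, symptoms, window_secs, action):
        self.name = name
        self.symptoms = set(symptoms)   # event types defining the situation
        self.window = window_secs
        self.action = action            # preventive automation to run
        self.seen = {}                  # symptom -> last timestamp observed

    def on_event(self, event_type, timestamp=None):
        if event_type not in self.symptoms:
            return False
        self.seen[event_type] = timestamp if timestamp is not None else time()
        newest = max(self.seen.values())
        recent = {s for s, t in self.seen.items() if newest - t <= self.window}
        if recent == self.symptoms:     # every symptom seen within the window
            self.action(self.name)
            self.seen.clear()
            return True
        return False

# Usage: queue depth rising plus a slow consumer together predict an outage.
alerts = []
detector = SituationDetector(
    "queue-backlog", {"queue_depth_high", "consumer_slow"},
    window_secs=60, action=alerts.append)
detector.on_event("queue_depth_high", timestamp=100)
detector.on_event("consumer_slow", timestamp=130)   # within the window: fires
```

A production CEP engine adds pattern languages, event ordering, and correlation across sources, but the core is the same: composite symptoms, not single alerts, trigger the response.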

This methodology makes possible ITIL v3's vision of integrating IT with the business. With IT focused on outage-avoidance, resources are freed so IT can become the part of the business that enables growth by adding new services and other company-differentiating capabilities.

The blame game is the all-too-common process after a problem occurs with a complex, composite application. Can IT move from the central war-room approach to a more effective automated process that proactively identifies problems before users are impacted and business processes are disrupted?

Yes, it can. The unfortunate and unproductive blame-storming that often ensues after a business process is disrupted is caused by a siloed IT organization and multiple sets of overlapping tools. From an organizational perspective, establishing an executive-sponsored, cross-domain services organization can help. This group, with C-level sponsorship, is charged with improving service delivery across silos.

Integrating the events and metrics that already-deployed tools provide into a CEP engine for an overall analysis of the situation can also help. Prior efforts to do this with a manager of managers (MoM) have not worked. They let the information sit side by side on the same dashboard but provide no insight into how composite events from different sources, taken together, describe a problem that impacts the business. Utilizing an analysis tool such as a CEP engine can enable proactive notification before there is impact and eliminate the need for the war room.

Are all IT problems equal? How does IT prioritize and differentiate between business transaction problems that impact the business and those that are important but do not immediately affect business processes?

No, IT problems vary in their impact on the business process. Many “problems” occur in IT that are completely transparent to end users and business processes. For example, consider the failure of a Java transaction on a development server. Although important to the developer, there is no immediate impact whatsoever to your customers. However, if IT is resolving issues one at a time, it may be unclear whether there is business impact. A problem with a router might seem low priority -- until it creates a bottleneck that causes a WebSphere MQ (WMQ) server to take too long to drain a queue, which causes the application waiting for data from WMQ to time out and miss its SLA, until finally the user of the application abandons their order.

Taking the approach of using a CEP engine that understands the relationship of IT infrastructure to business processes, learns on its own, and prevents repeated problems can help here. The CEP engine, along with business rules, can determine whether there is business impact so problems can be prioritized for IT accordingly.
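In the simplest form, such business rules amount to a mapping from IT components to the business processes they support, plus severity. The component names and rule set below are invented purely for illustration:

```python
# Hypothetical sketch: simple business rules assign a priority to an IT
# event based on whether the affected component backs a business process.
BUSINESS_CRITICAL = {              # component -> business process it supports
    "wmq-prod-01": "order-processing",
    "router-edge-7": "order-processing",
}

def prioritize(event):
    """Return (priority, reason) for an IT event dict."""
    component = event["component"]
    process = BUSINESS_CRITICAL.get(component)
    if process is None:
        # No business process depends on this component: important to IT,
        # but no immediate business impact (e.g., a dev-server failure).
        return ("low", "no business process mapped to %s" % component)
    if event.get("severity") == "warning":
        # An early symptom on a critical path: act now to avoid the outage.
        return ("high", "early symptom on %s (%s)" % (component, process))
    return ("critical", "%s impacts %s" % (component, process))

print(prioritize({"component": "dev-server-3", "severity": "error"}))
print(prioritize({"component": "router-edge-7", "severity": "warning"}))
```

The router example above plays out exactly this way: in isolation the router warning looks minor, but the mapping shows it sits on the order-processing path, so it outranks a hard failure on an unmapped development server.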

How can we continuously improve our response to the failures that have occurred and not spend the same time fixing a problem that has happened multiple times? In fact, is it even possible to prevent software problems?

Yes, software problems can be prevented -- more precisely, their impact can be prevented. Using CEP, we can predict problems based on analyzing the multiple events and metrics captured by existing tooling. It can automatically predict issues via analytics such as exponential moving averages, Bollinger Bands, momentum indicators, and other functions that differentiate a business-normal state from a business-abnormal one. These composite problems define a situation. An automated response can be associated with these situations and initiated as soon as the first symptoms occur, long before there is impact to the business process.
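To make the analytics concrete, here is a minimal sketch (not any vendor's implementation) of two of the functions mentioned: an exponential moving average over a metric stream, and a Bollinger-style band (mean plus or minus k standard deviations) that flags values leaving the business-normal range. The metric values are made up:

```python
import statistics

def ema(values, alpha=0.3):
    """Exponential moving average over a metric stream."""
    avg = values[0]
    for v in values[1:]:
        avg = alpha * v + (1 - alpha) * avg
    return avg

def is_abnormal(history, latest, k=2.0):
    """True when `latest` falls outside mean +/- k*stdev of history
    (a Bollinger-band-style test for leaving the normal range)."""
    mean = statistics.mean(history)
    stdev = statistics.stdev(history)
    return abs(latest - mean) > k * stdev

# Usage: queue-drain times in ms hover near 100; a jump to 180 is abnormal
# and would fire the situation's automated response before users notice.
history = [98, 102, 101, 99, 100, 103, 97, 100]
print(round(ema(history), 1))      # smoothed baseline near 100
print(is_abnormal(history, 180))   # True: outside the band
print(is_abnormal(history, 104))   # False: normal variation
```

The point is that the threshold is learned from the metric's own recent behavior rather than hard-coded, which is what lets the engine catch a drift toward failure before any static alarm would trip.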

How does the move to the cloud impact IT?

Moving to the cloud makes everything more difficult for IT. Many large enterprises may migrate to the cloud in phases. With most of their business-critical applications based on service-oriented architecture (SOA), there is no reason to move all of an application to the cloud. These firms may utilize a hybrid environment, with some services in their data center, some in a private cloud, and some using a public cloud with software as a service, all while communicating with trading-partner services that are distributed in a similar fashion. With so many moving parts distributed in so many locations, the need for visibility becomes ever more important, and monitoring this distributed “state” becomes more difficult.

How can we achieve ROI from business transaction management for monitoring the availability and performance of applications spanning the data center to the cloud?

A business transaction management (BTM) solution as described in Forrester’s recent report Evaluating Innovative I&O Solutions: Converged Application Performance Management includes a CEP engine and the capability to predict and prevent problems. It automatically discovers applications and stitches together the IT transactions it discovers into business transactions that implement the firm’s business processes. When a problem does occur, the BTM solution provides a deep-dive analysis that enables the IT staff to know the root cause of the problem down to the code level.

The BTM solution can be deployed in a data center, or in its virtualized form it can be deployed along with applications as they are provisioned in the cloud. Its embedded CEP engine can correlate events coming from the BTM solution across many locations and produce a composite analysis, with the rules and automation necessary to prevent problems. The visibility and problem prevention this provides supplies IT with an ROI -- outage-avoidance. This gives IT the time and resources to grow and extend the business, which is essential to staying competitive.

Are there any limitations to what BTM can provide?

Nothing in our world is without limits, and that applies to BTM as well. BTM is just a technology. The larger part of the equation is the people who deploy and maintain that technology. IT has a long history as an organization comprised of many stovepiped, silo-structured departments (e.g., storage, network, applications, operations). That means it is not easy for the organization to work across silos, and this is something BTM absolutely demands in order to be effective.

There needs to be an “owner” for the BTM technology with a purview spanning across the silos and C-level sponsorship. The different application owners in the line of business and their supporting IT staff in shared services need to leverage the BTM technology to avoid multiple overlapping monitoring products all producing different answers.

What role does Nastel Technologies play in BTM?

Nastel Technologies provides a BTM solution called AutoPilot that ensures the availability and performance of applications, messaging middleware, and transactions across distributed, mainframe, and cloud tiers. Using an embedded, grid-based complex event processing engine, it automatically identifies and fixes problems -- and even predicts and prevents them -- before they impact users and disrupt business processes. Recently, Nastel Technologies' AutoPilot solution was reviewed by Forrester Research in its report Evaluating Innovative I&O Solutions.