Do Not Be Alarmed

Remember the boy who cried "Wolf!" one too many times? You know what happened: no one listened when the wolf really came along. The same is true for network operators forced to prioritize alarms from a torrent of "Wolf!" alerts.

Ultimately, everyone is concerned about Quality of Service (QoS). Or they should be. So the question becomes, "How do we get to QoS from thousands of events per day?" The answer is simple: event correlation. That is, taking a large number of raw events and reducing them to only the most important ones. While some systems use simple filtering to weed out unimportant events, most others use some kind of basic pattern matching and thresholding.
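
Filtering followed by thresholding can be sketched in a few lines of Python. This is a minimal illustration, not any vendor's implementation: the `Event` fields, the severity scale and the `min_severity` and `threshold` values are assumptions chosen for the example.

```python
from collections import Counter
from dataclasses import dataclass

@dataclass(frozen=True)
class Event:
    node: str      # hypothetical: device that raised the event
    kind: str      # hypothetical: event type, e.g. "linkDown"
    severity: int  # hypothetical scale: 0 = informational .. 5 = critical

def reduce_events(events, min_severity=2, threshold=3):
    """Drop low-severity events, then report a (node, kind) pair only
    if it occurs at least `threshold` times."""
    # Step 1: simple filtering removes events below the severity floor.
    kept = [e for e in events if e.severity >= min_severity]
    # Step 2: thresholding counts repeats and keeps only frequent pairs.
    counts = Counter((e.node, e.kind) for e in kept)
    return {pair: n for pair, n in counts.items() if n >= threshold}

raw = ([Event("r1", "linkDown", 4)] * 5
       + [Event("s2", "authFailure", 1)] * 40
       + [Event("s3", "linkDown", 4)])
print(reduce_events(raw))  # {('r1', 'linkDown'): 5}
```

Here 46 raw events shrink to one alert: the 40 low-severity authentication failures are filtered out, and the single link-down on s3 falls below the threshold.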

Another, more sophisticated method combines a number of raw events into a few new events that better describe a problem. Many of these systems, like Seagate NerveCenter, even surpass some of the basic functionality of HP OpenView Network Node Manager (OVNNM) with improved polling engines and advanced Java and Windows interfaces.

OVNNM identifies problems by using an event correlation engine. The engine correlates events into high-level alarms, attempting to immediately pinpoint the root cause of network problems. A drill-down capability allows network administrators to see all of the contributing events for each of the alarms.
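
To make root-cause correlation concrete, here is a minimal Python sketch that rolls "nodeDown" events up into one high-level alarm per root cause and keeps the raw events for drill-down. It is not how OVNNM's engine works internally; the `upstream` topology map, the event tuples and the rule "the down device closest to the management station is the root cause" are assumptions made for illustration.

```python
from dataclasses import dataclass, field

@dataclass
class Alarm:
    root_cause: str
    # Raw events kept so an operator can drill down into the alarm.
    contributing: list = field(default_factory=list)

def correlate(events, upstream):
    """Roll 'nodeDown' events up into one Alarm per root cause.

    events:   list of (node, kind) tuples, e.g. ("r1", "nodeDown")
    upstream: hypothetical topology map from each node to the device it is
              reached through (None for devices attached directly to the
              management station); assumed to be a tree with no loops.
    """
    down = {node for node, kind in events if kind == "nodeDown"}
    alarms = {}
    for node, kind in events:
        if kind != "nodeDown":
            continue
        # Walk toward the management station; the last down device seen
        # on the way is the one closest to the station: the root cause.
        root = node
        parent = upstream.get(node)
        while parent is not None:
            if parent in down:
                root = parent
            parent = upstream.get(parent)
        alarms.setdefault(root, Alarm(root)).contributing.append((node, kind))
    return list(alarms.values())

# Router r1 fails, so hosts h1 and h2 behind it also stop answering.
upstream = {"r1": None, "h1": "r1", "h2": "r1"}
events = [("h1", "nodeDown"), ("r1", "nodeDown"), ("h2", "nodeDown")]
for alarm in correlate(events, upstream):
    print(alarm.root_cause, alarm.contributing)
```

Three raw events become one alarm for r1; its `contributing` list is what a drill-down would display.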

The HP OpenView Event Correlation Services (OVECS) Designer for IT/Operations (ITO) and Network Node Manager allows development and testing of correlation "circuits" for custom correlation requirements through a GUI. These "circuits" can then be deployed to collection stations or management stations in the enterprise.

The ECS Engine is bundled with every product, so the out-of-the-box correlation "circuits" that come free with the product can be implemented immediately. (For a detailed description of each out-of-the-box circuit, see the Circuits Maximus sidebar in "Network Management And NNM Move To The Web" on page 26.) Events are processed through the logic either to completion (suppressed or output) or held pending the evaluation of some future conditions.
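
The "hold pending" behavior can be illustrated with a small Python sketch of a flap filter. It is not an OVECS circuit and uses none of the ECS Designer's components; the event tuples, the 60-second window and the rule "a nodeUp for the same node within the window suppresses the held nodeDown" are assumptions for the example. Each event either completes immediately (output), completes by suppression, or is held until a later event or the end of input decides its fate.

```python
def flap_filter(events, window=60):
    """events: (time_seconds, node, kind) tuples sorted by time.

    A 'nodeDown' is held pending; a 'nodeUp' for the same node within
    `window` seconds suppresses both. Held events whose window expires
    are output.
    """
    pending = {}  # node -> time its nodeDown was first held
    output = []
    for t, node, kind in events:
        # Release held events whose window expired before this event.
        for held_node, t0 in list(pending.items()):
            if t - t0 > window:
                output.append((t0, held_node, "nodeDown"))
                del pending[held_node]
        if kind == "nodeDown":
            # Hold pending a future condition; repeats collapse into the first.
            pending.setdefault(node, t)
        elif kind == "nodeUp" and node in pending:
            del pending[node]  # condition met: suppress both events
        else:
            output.append((t, node, kind))  # nothing to wait for: output
    # End of input: no further condition can arrive, so output what is held.
    output.extend((t0, n, "nodeDown") for n, t0 in pending.items())
    return output

stream = [(0, "a", "nodeDown"), (20, "a", "nodeUp"),
          (30, "b", "nodeDown"), (200, "c", "linkUp")]
print(flap_filter(stream))  # [(30, 'b', 'nodeDown'), (200, 'c', 'linkUp')]
```

Node a's 20-second outage never reaches the operator, while node b's outage is reported once its window lapses.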

The ECS engine implements Boolean logic when deciding whether to suppress an event or to forward it downstream in the correlation logic. Information from outside the engine may be retrieved and used to make correlation decisions, to modify event attribute values or to create new events. Figure 1 presents an example of correlated events from xnmevents in HP OpenView NNM 6.0. Figure 2 shows the correlated events for the "node added" event on Sun Nov 15 12:18:07.
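
A minimal Python sketch of such a Boolean suppress-or-forward gate follows. The specific rule (suppress if the node is under scheduled maintenance, or if the event is minor and the device is not critical), the `maintenance` set and the `inventory` dictionary that stands in for "information from outside the engine" are all hypothetical; real ECS circuits are designed in the ECS Designer GUI rather than written this way.

```python
def gate(event, maintenance, inventory):
    """Return None to suppress the event, or an enriched copy to forward.

    maintenance: hypothetical set of nodes under scheduled maintenance
    inventory:   hypothetical dict node -> {"owner": ..., "critical": ...},
                 standing in for an external data source
    """
    node, severity = event["node"], event["severity"]
    record = inventory.get(node, {})
    critical = record.get("critical", False)
    # Boolean decision: suppress if under maintenance, OR if the event is
    # minor AND the device is not marked critical.
    if node in maintenance or (severity < 3 and not critical):
        return None
    # External information modifies an attribute before forwarding.
    forwarded = dict(event)
    forwarded["owner"] = record.get("owner", "unknown")
    return forwarded

inventory = {"r1": {"owner": "netops", "critical": True},
             "s1": {"owner": "unix", "critical": False}}
print(gate({"node": "s1", "severity": 2}, set(), inventory))   # None
print(gate({"node": "r1", "severity": 2}, set(), inventory))   # {'node': 'r1', 'severity': 2, 'owner': 'netops'}
print(gate({"node": "r1", "severity": 5}, {"r1"}, inventory))  # None
```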

Nervous Network Nodes
Seagate's NerveCenter uses the concept of "alarm states" to represent status information. Alarm state information is always available in NerveCenter's Aggregate Summary window. An alarm state has a name, such as Critical, and a color, typically used to communicate the severity. Alarm state information can also be passed to the HP OpenView management platform or sent via several interfaces to other entities such as trouble-ticketing or notification systems.

In day-to-day operation, NerveCenter users typically interact with the NerveCenter client to view the Aggregate Summary and to configure property groups, alarms, polls, masks and Perl subroutines, which are the mechanisms used to create behavior models. A behavior model is a combination of these mechanisms used to model some situation or process.

The Aggregate Summary is NerveCenter's primary mechanism for presenting information. It presents a summary of the number of object instances that are in a particular alarm state (see Figure 3).
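
The relationship between alarm states (a name plus a color) and the Aggregate Summary can be sketched in Python. The three state names and colors below are assumptions for illustration, not NerveCenter's shipped defaults, and the summary is modeled simply as a count of object instances per state.

```python
from collections import Counter
from dataclasses import dataclass

@dataclass(frozen=True)
class AlarmState:
    name: str   # e.g. "Critical"
    color: str  # color used to communicate severity

# Hypothetical states; NerveCenter's actual defaults may differ.
NORMAL = AlarmState("Normal", "green")
MAJOR = AlarmState("Major", "orange")
CRITICAL = AlarmState("Critical", "red")

def aggregate_summary(current):
    """current: dict of object instance -> AlarmState.
    Returns the number of instances in each alarm state."""
    return Counter(current.values())

current = {"r1": CRITICAL, "r2": NORMAL, "s1": NORMAL, "s2": MAJOR}
for state, count in aggregate_summary(current).items():
    print(f"{state.name:<9} {state.color:<7} {count}")
```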

Property groups are used to categorize managed devices into groups that share common traits or properties. For instance, a network manager might define a property group named "CiscoRouter" into which all Cisco routers are placed.
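
Property-group assignment can be pictured as an ordered set of rules applied to each device's properties. In this minimal Python sketch the device records, the `vendor` and `type` properties, the "BayRouter" and "Other" groups and the first-match-wins policy are all assumptions; NerveCenter's own mechanism for placing devices in groups is not shown here.

```python
def assign_property_group(props, rules, default="Other"):
    """Return the first group whose rule matches the device's properties.

    rules:   ordered list of (group name, predicate) pairs; first match wins
    default: hypothetical catch-all group for devices that match no rule
    """
    for group, matches in rules:
        if matches(props):
            return group
    return default

def group_devices(devices, rules):
    """Map each property group name to the devices placed in it."""
    groups = {}
    for name, props in devices.items():
        groups.setdefault(assign_property_group(props, rules), []).append(name)
    return groups

devices = {
    "core1": {"vendor": "Cisco", "type": "router"},
    "edge7": {"vendor": "Bay Networks", "type": "router"},
    "srv3": {"vendor": "Sun", "type": "server"},
}
rules = [
    ("CiscoRouter", lambda p: p["vendor"] == "Cisco" and p["type"] == "router"),
    ("BayRouter", lambda p: p["vendor"] == "Bay Networks" and p["type"] == "router"),
]
print(group_devices(devices, rules))
# {'CiscoRouter': ['core1'], 'BayRouter': ['edge7'], 'Other': ['srv3']}
```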

Alarms are the heart of NerveCenter. An alarm usually relies on the results of a poll or an SNMP trap to generate "transitions" within the state machine. For example, a state machine might use the results of an ICMP echo reply (i.e., a ping) to determine if a device is "up" or "down." NerveCenter provides 30 predefined behavior models covering numerous situations including:

  • ICMP status (is a node able to be pinged or not)
  • SNMP status (is a node's SNMP agent responding or not)
  • Interface errors
  • Interface loading
  • Interface status
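
A ping-driven alarm state machine of the kind described above can be sketched as a transition table. This is not one of NerveCenter's predefined behavior models: the state names, the intermediate "Suspect" state (which keeps a single lost ping from crying wolf) and the `run_alarm` helper are assumptions for illustration.

```python
# Hypothetical transition table: (current state, ping replied?) -> next state.
TRANSITIONS = {
    ("Unknown", True): "Up",
    ("Unknown", False): "Suspect",
    ("Up", True): "Up",
    ("Up", False): "Suspect",
    ("Suspect", True): "Up",
    ("Suspect", False): "Down",
    ("Down", True): "Up",
    ("Down", False): "Down",
}

def run_alarm(ping_results, state="Unknown"):
    """Feed successive ICMP echo results (True = reply received) through
    the state machine and return the state after each poll."""
    history = []
    for replied in ping_results:
        state = TRANSITIONS[(state, replied)]
        history.append(state)
    return history

print(run_alarm([True, False, True, False, False, True]))
# ['Up', 'Suspect', 'Up', 'Suspect', 'Down', 'Up']
```

Only two consecutive missed replies drive the device to "Down"; an isolated loss is absorbed by "Suspect."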

This is not an all-inclusive list of behavior models. Included, but not loaded by default, are additional behavior models created specifically for Bay Networks devices and Cisco routers.

Cost and complexity can be the downfall of these systems, and not every cost is visible up front. Seagate NerveCenter is the least expensive product, at $6,995 for a 250-node license. NerveCenter is also the most extensible and flexible solution, but it can be complex to implement.

Leadership Potential
HP OpenView ECS has the potential to be a leader because the engine is bundled with Network Node Manager 6.0, but at $25,000, it's expensive to implement. And HP OpenView ECS' technical complexity can only add to that expense.

So what's my pick? The integration of HP OpenView ECS with Network Node Manager and the Web configuration screens are attractive, but when it comes to cost and flexibility, I must choose Seagate NerveCenter.

--Charles Hebert (charles@southernview.com) is President of Southernview Technologies, Inc. (Marietta, Ga.) and the Chairman of the Program Committee for the 1998 OpenView Forum & Universe Conference.

OTHER CORRELATIONS

Two other companies offer event correlation products that integrate with HP OpenView:

System Management Arts' (SMARTS; White Plains, N.Y.) InCharge is a family of products, including InCharge IP Fault Management, InCharge SNMP Management Applications and InCharge Service Impact Management, that enables managers to improve network and system service levels. It automatically identifies the root cause of faults and performance problems by cross-correlating network, system and application data.

Tavve Software Company (Durham, N.C.) extends the functionality of HP OpenView with software developed through years of network management and systems integration experience. The result is the Tavve applications suite: tsc/Event Management, tsc/Web, tsc/Utilities, tsc/X Window Tools and tsc/EventWatch.

tsc/EventWatch offers four major components:

  • Event Correlation & Root Cause Analysis
  • Impact Analysis
  • Fault Notification
  • Service Level Management
