In-Depth
Distributed Queuing: MQSeries Queue Manager Clustering
Message queuing is a widely adopted technology for developing distributed processing systems where business logic extends to multiple and potentially remote systems. The asynchronous nature of message queuing enables applications on geographically separated systems to interact effectively, without the burden of synchronizing activity at both ends of the communication that usually comes with the conversational client/server model. This makes message queuing a preferred technology for many large-scale e-commerce applications.
MQSeries, IBM’s message queuing product, is available on practically every computer platform, and can be used for exchanging messages between applications running on the same system, as well as on different systems. From an end user's point of view, there is not much difference -- putting a message to a local queue has the same semantics as putting one to a remote queue, and MQSeries takes care of the message delivery. However, the real price for message queuing across queue managers is paid on the administration side, where a variety of objects have to be configured correctly on the sending and receiving queue managers before messages can be delivered. This is usually a very involved process, and if messages need to be exchanged among multiple queue managers, the amount of work increases dramatically.
The simplest configuration involves two queue managers. To configure these queue managers so that messages can be exchanged between one queue manager (local) and another (remote), the following objects may be required (assuming one local queue at each queue manager is configured to receive messages):
• A sender channel definition for sending messages to the remote queue manager
• A receiver channel definition for receiving messages from the remote queue manager
• A transmission queue definition for sending messages to the remote queue manager
• A local queue definition for messages received from the remote queue manager
• A remote queue definition for sending messages to a remote queue at the remote queue manager
The Alternative -- Queue Manager Clustering
To mitigate this problem and to gain other benefits in message queuing (which we will examine later), IBM introduced a new feature in its MQSeries V5.1 products (and MQSeries 2.1 for OS/390 and AS/400) -- queue manager clustering.
Queue manager clustering is a technology that extends the way MQSeries does distributed queuing. It is designed to provide better manageability and workload balancing. With queue manager clustering, queue managers can be grouped into a logical group called a cluster. Queues can be defined as cluster queues. A cluster queue can be accessed by other queue managers within the same cluster just as if it were a normal local queue.
A cluster has the following components and objects:
• Cluster Repository. This is a collection of all the cluster-related information. This is maintained in a queue. Usually, two full copies of the repository are maintained for each cluster.
• Repository Queue Manager. A queue manager that manages a cluster repository.
• Cluster Queue Managers. Queue managers that are defined as members of a cluster.
• Cluster Sender and Cluster Receiver Channels. Cluster-specific channel definitions. A pair of these definitions is required at each cluster queue manager.
• Cluster Transmission Queue. A cluster-specific transmission queue. One is required at each cluster queue manager.
Configuration
To configure a queue manager so that it can communicate with other queue managers using the clustering technique, the following definitions at each cluster queue manager are required:
• A cluster sender channel definition for sending messages to a repository queue manager.
• A cluster receiver channel definition for receiving messages from other cluster queue managers.
• A local cluster queue definition for receiving messages from other cluster queue managers.
The setup of a repository queue manager follows the same steps, except that the queue manager must also be defined to hold the cluster repository.
To understand to what extent queue manager clustering can reduce the complexity of the administration work related to distributed queuing, we can look at the configuration steps for the same four-queue manager scenario we discussed earlier, but this time, we use queue manager clustering instead of the traditional distributed queuing.
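As a rough illustration of the difference in scale, the sketch below counts definitions under two assumptions that go beyond the lists above: every queue manager exchanges messages directly with every other one, and each queue manager hosts exactly one queue for incoming messages. The function names are made up for this example; the per-queue-manager object lists come from the two bulleted lists in this article.

```python
def traditional_definitions(n):
    """Definitions for n fully connected queue managers, traditional setup."""
    # Per remote partner: sender channel, receiver channel,
    # transmission queue, remote queue definition.
    per_remote = 4
    # Plus one local queue at each queue manager to receive messages.
    per_qmgr = per_remote * (n - 1) + 1
    return n * per_qmgr


def cluster_definitions(n):
    """Definitions for n queue managers in one cluster."""
    # Cluster-sender channel, cluster-receiver channel, cluster queue.
    per_qmgr = 3
    return n * per_qmgr


for n in (2, 4, 10):
    print(n, traditional_definitions(n), cluster_definitions(n))
```

Under these assumptions the traditional count grows with the square of the number of queue managers, while the cluster count grows linearly.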
Workload Balancing
One of the added benefits of queue manager clustering is workload balancing. This is done by defining distributed cluster queues -- basically cluster queues that have multiple instances defined in the same cluster. Messages sent to a distributed cluster queue may be delivered to any of the instances of the queue. The algorithm used to determine which instance of the queue should get a message is specified in a workload exit, which can be customized by users. The default workload exit uses a round-robin approach, with one exception: if a message is sent to a distributed cluster queue and there is a local instance of that queue, the message is always delivered to the local instance.
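The default behavior described above can be modeled in a few lines. This is only a simulation of the selection rule as stated in this article, not the MQSeries workload exit interface; the class name and the queue manager names are hypothetical.

```python
from itertools import cycle


class ClusterQueueRouter:
    """Toy model: prefer a local instance, otherwise rotate round robin."""

    def __init__(self, local_qmgr, instances):
        self.local_qmgr = local_qmgr        # queue manager doing the put
        self.instances = list(instances)    # queue managers hosting the queue
        self._round_robin = cycle(self.instances)

    def choose(self):
        # A local instance always wins over remote instances.
        if self.local_qmgr in self.instances:
            return self.local_qmgr
        # No local instance: take the next remote instance in turn.
        return next(self._round_robin)


remote_only = ClusterQueueRouter("QM1", ["QM2", "QM3"])
print([remote_only.choose() for _ in range(4)])

has_local = ClusterQueueRouter("QM2", ["QM2", "QM3"])
print([has_local.choose() for _ in range(3)])
```

A customized workload exit would replace the body of `choose` with its own selection logic.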
The obvious advantages of using a distributed cluster queue are:
• Increased availability of queues and applications, because there are multiple instances of them.
• Faster throughput for message processing, because messages can be delivered to multiple destinations and processed by multiple applications at the same time.
One of the common design issues associated with workload balancing is message affinity -- the relationship between messages that needs to be maintained when the messages are processed. One example is a package of data that is broken into multiple messages and that needs to be reassembled at the receiving end by one application. Using workload balancing in this case may result in the pieces of the package ending up at different cluster queue instances.
To overcome this problem, MQSeries has built-in mechanisms that let a program specify message affinity and make sure associated messages are delivered to the same queue instance.
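One such mechanism is binding the destination when the queue is opened (the MQOO_BIND_ON_OPEN open option). The sketch below imitates that idea in plain Python: the instance is chosen once at open time and reused for every put on that handle. It does not call the MQI; `QueueHandle` and `make_chooser` are hypothetical names used only for illustration.

```python
from itertools import cycle


def make_chooser(instances):
    """Return a function that yields queue instances round robin."""
    rotation = cycle(instances)
    return lambda: next(rotation)


class QueueHandle:
    """Toy handle: fixes the target at open time when bind_on_open is set."""

    def __init__(self, chooser, bind_on_open):
        self._chooser = chooser
        # With binding, the instance is selected once, here.
        self._fixed = chooser() if bind_on_open else None

    def put(self, message):
        # Without binding, every put may land on a different instance.
        target = self._fixed if self._fixed is not None else self._chooser()
        return (target, message)


parts = ["part-1", "part-2", "part-3"]

bound = QueueHandle(make_chooser(["QM2", "QM3"]), bind_on_open=True)
print([bound.put(p) for p in parts])      # all parts share one target

unbound = QueueHandle(make_chooser(["QM2", "QM3"]), bind_on_open=False)
print([unbound.put(p) for p in parts])    # parts are spread across targets
```

The trade-off is that a bound handle gives up workload balancing for the messages it sends, so binding is best reserved for messages that genuinely belong together.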
What Lies Beneath -- Automated Configuration
What is the secret behind this technology that makes distributed queuing seem simple? How is it possible to send messages to remote queues without defining channels and remote queues?
The answer is automation. Distributed queuing using clustering is just as cumbersome as the traditional way of doing it, except that with queue manager clustering the computer handles the management for us. Because all the information about cluster queue managers and cluster queues is stored in a cluster repository and is therefore accessible to every queue manager in the cluster, it is possible for a queue manager to define channels and remote queues "on the fly" (i.e., queue managers create the channel and remote queue definitions for a cluster queue when messages are put to it). All of this happens without any manual configuration.
The key to this mechanism is that the queue manager and queue have to be defined as a cluster queue manager and a cluster queue, so that the necessary information about the queue manager and the queue is gathered and stored in a cluster repository. This has the effect of "advertising" the cluster queue to all queue managers in the cluster. Armed with this information, a queue manager can automatically create the objects that are necessary to support delivering messages to the "advertised" cluster queue.
The actual steps that are taken by a queue manager to send a message to a cluster queue are as follows:
1) The cluster queue’s information is retrieved from the cluster repository.
2) Routing information is added to the message.
3) The message is put on the cluster transmission queue.
4) A remote queue definition is automatically created.
5) A cluster-sender channel definition pointing to the receiving queue manager is automatically created.
6) The channel is started.
7) The message is transferred.
8) The channel is shut down when it is no longer needed.
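The eight steps above can be traced with a small in-memory model. Everything here is a simplification for illustration: the repository is a plain dictionary, the channel is a status string, and the channel is stopped immediately after one transfer rather than when it is no longer needed. The class, field, and queue names are hypothetical and are not MQSeries objects.

```python
class QueueManager:
    """Toy queue manager that creates definitions on the fly."""

    def __init__(self, name, repository):
        self.name = name
        self.repository = repository   # cluster queue name -> hosting queue manager
        self.remote_queues = {}        # auto-created remote queue definitions
        self.channels = {}             # auto-created cluster-sender channels
        self.xmit_queue = []           # cluster transmission queue

    def put(self, queue_name, body):
        # 1) Retrieve the cluster queue's information from the repository.
        target = self.repository[queue_name]
        # 2) Add routing information to the message.
        message = {"dest_qmgr": target, "dest_queue": queue_name, "body": body}
        # 3) Put the message on the cluster transmission queue.
        self.xmit_queue.append(message)
        # 4) Create the remote queue definition if it does not exist yet.
        self.remote_queues.setdefault(queue_name, target)
        # 5) Create the cluster-sender channel definition if needed.
        self.channels.setdefault(target, "stopped")
        # 6) Start the channel.
        self.channels[target] = "running"
        # 7) Transfer the message.
        delivered = self.xmit_queue.pop(0)
        # 8) Shut the channel down (simplified: right after the transfer).
        self.channels[target] = "stopped"
        return delivered


repository = {"ORDERS": "QM2"}
qm1 = QueueManager("QM1", repository)
print(qm1.put("ORDERS", "hello"))
print(qm1.remote_queues, qm1.channels)
```

Note that the application only supplies the queue name; every other object involved is derived from the repository entry.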
The central piece of this architecture is the cluster repository, which holds all of the relevant information about the cluster: cluster queue manager parameters, cluster channel definitions at each cluster queue manager and cluster queue parameters.
To reduce the network traffic and potential bottleneck caused by this centralized repository (called the full repository), cluster information is cached at each queue manager (in what is called a partial repository) and is updated periodically. To reduce the danger of a single point of failure, it is common practice to set up more than one full repository for a cluster.
Now you have seen the benefits of queue manager clustering: It removes the burden of configuring and maintaining queue managers for distributed queuing from your system administrator, and it increases your distributed system’s overall availability and efficiency. Before you redesign your system to adopt this new technology, however, there are several things you need to consider.
First of all, because it is new technology, its availability is limited (currently only on MQSeries V5.1 and MQSeries for OS/390). If you run MQSeries on multiple platforms and some of them do not support clustering, then you might want to make sure that the overhead of dealing with a mix of queue managers that do and don’t support clustering does not outweigh the benefits.
Another thing you want to keep in mind is the likelihood that you will have to configure your systems so that queue managers that are not in a cluster or that are in different clusters can exchange messages with queue managers in the cluster effectively. This involves setting up queue managers as bridges or routers, and presents a different kind of complexity.
Moreover, the possibility that messages get distributed to arbitrary queue instances when workload balancing is used increases the difficulty of debugging message distribution-related problems. The effort to locate a misplaced message sent to a cluster queue is multiplied, which makes diagnosing MQ problems a much more difficult job. Transaction analysis software that lets you visualize the cluster and drill down into API-level detail will become a necessity.
The promise of a simple, transparent development interface for MQSeries distributed queuing is built at the cost of introducing complex and often difficult (if not frustrating) configuration and maintenance work. Clustering is an elegant way to get the best from both sides. On top of that, the benefit of workload balancing is worth investigation for any IT manager.
Jason Zhong is a software engineer with Bristol Technology Inc. (Danbury, Conn.), and an IBM MQSeries certified developer. He can be reached at jasonz@bristol.com.