In-Depth

Q&A: Cloud Migration and the Impact on Enterprise Monitoring

Your cloud work doesn't stop once you go live. IT must vigilantly monitor its performance. We examine the options and impediments.

Cloud computing's benefits are well known, but once IT makes the move to the cloud, the job isn't over -- you must keep a close eye on cloud performance. What are the options and limitations, and what role do service-level agreements play? For answers, we spoke with James Sivis, vice president of sales and marketing at Circonus, a company that performs both internal and external monitoring of physical and virtual environments as SaaS or private SaaS.

Sivis' expertise includes product management, operations management, and strategic planning; over the past 25 years he has worked for start-ups and large enterprises in sectors including telecom/networking, IT, new media, software, hardware, and professional services.

Enterprise Strategies: Will an enterprise migrating to the cloud have separate monitoring for their physical and virtual environments? If so, how will the two communicate?

James Sivis: It used to be the case that you had to have separate monitoring systems not only for virtual and physical environments but even within each environment, for different types of monitored end-points -- say applications, servers, switches, databases, Web metrics, and business metrics. Of course, these disparate systems didn't communicate, so you had to switch back and forth between them, never getting a cohesive view of your enterprise. With the new generation of systems that can monitor both physical and virtual environments, those issues should be a thing of the past.

The cloud providers have some built-in monitoring. Is it sufficient?

Unfortunately, the visibility that most cloud providers give to their customers is not ideal. Cloud providers are naturally concerned with performance of the infrastructure they provide to their customers and expend significant effort to keep their systems up. However, they expose little, if any, performance data externally to their customer base. Although cloud customers could (and most often do) simply rely on their providers to ensure availability, etc., that's not good enough for anyone whose IT operations are mission-critical for the continued health and viability of their business -- and whose IT isn't mission-critical nowadays?

There is also the notion of sufficient performance rather than optimal performance. Which do you want? You are responsible for your IT, its costs and performance, and for the ability to deliver to internal constituents and to your external customers. This being the case, "just sufficient" isn't really good enough -- you want the best performance you can have. For this, you must be able to have your own direct visibility into your cloud infrastructure. Not having performance data is like driving in the dark without your headlights on. As long as the road is straight and there are no additional impediments, you're fine -- but for how long?

What happens when something goes wrong (and let's face it, that time always comes)? More often than not, you're told to simply spin up a new image to replace the one that faltered. That's fine for resolution of the immediate symptom but does nothing to identify the root cause of the problem.

What about SLAs? They provide a measure of assurance, don't they?

I always like hearing about SLAs -- you know the adage, that they're not worth the paper they're written on. SLAs would be great if they were ever laid down in such a way as to provide business continuity assurance, with full, immediate recompense for the long-term business damage that infrastructure downtime can easily wreak. Of course, they never are written that way. What do they provide? Simply a refund covering the amount paid to the cloud provider for the period during which the SLAs weren't met. No business-impact coverage whatsoever.

Let me ask you: what good is a refund of your cloud hosting fees when you've damaged your relationship and reputation with your internal users and/or your customer base? Obviously, it is of no real consolation. Let's face it, it's your infrastructure and your business -- so you have to maintain control over it to ensure the best possible outcomes while avoiding the worst possible ones. A crucial requirement for being able to do so is to not rely on SLAs but rather maintain your control through the visibility and alerts that continual monitoring provides.

Many enterprises are looking at the option of establishing a private cloud. Why would monitoring differ for private cloud versus public cloud?

That's a very reasonable question. With public cloud, your data transmits through the cloud along with the data of other clients. First, it's worth putting this data transmission in perspective -- performance data represents aspects of your past and present and not your future. You're likely already transmitting things much more vital to your organization's future, such as information about your clients, through the cloud by use of such widely accepted vehicles as Salesforce.com CRM.

Having said this, quite effective measures are certainly taken to protect your data in the public cloud. With Circonus, for instance, these include SSLv3 with PKI framework for in-motion data, along with performance data and account data being stored in completely separate repositories.

Nonetheless, some companies, whether for competitive reasons or regulatory requirements, do not wish any of their data to be sent outside their organization. For such companies, private-cloud is the preferred approach. In this way, they have an exclusive monitoring system that resides within their legal boundaries, and, if desired, within their physical boundaries as well.

There are separate organizations within our company that are also moving infrastructure to the cloud. Are there multi-tenant capabilities for cloud monitoring?

Yes, being multi-tenant is a requirement for cloud monitoring. Different parts of your organization will be able to have their own monitoring set up and either share information cross-organization or keep it segregated, depending upon your preference. We at Circonus are proponents of information sharing so that different parts of the organization will know how their actions affect other parts and vice versa.

What are we talking about in terms of cost for cloud monitoring? Will an enterprise also have to install and pay for agents for each virtual host it wants monitored?

Cost really depends on the solution. Some "Big 4" solutions that have been repackaged as SaaS offerings are still quite pricey (along with their other downsides). You also have to be careful because there are seemingly inexpensive solutions that start off cheap but then, like the Salesforce.com pricing model, escalate dramatically as you grow.

With a newer generation of hybrid monitoring specifically designed for both physical and virtual environments, cloud monitoring can actually be much cheaper than monitoring physical environments used to be. Modern SaaS monitoring pricing goes down to as low as $15 per host per month. If you have several hundred (or more) hosts, you can really drive down your monitoring costs by taking advantage of an all-you-can-eat pricing model, with no restriction on hosts or metrics.

As for the question of per-host agents, the last thing you want to be dealing with is putting a special agent on every one of your hosts. Some "SaaS-ified" versions of old-school tools require such an installation of an agent on every host at a cost of $50 per host per month -- that, of course, adds up if you have any sort of volume. It is much more digestible and cost-effective to find an option where per-host agent installation is not called for, where a single enterprise broker is generally sufficient to handle an entire data center.
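To put rough numbers on that comparison, here is a minimal Python sketch. The $15 and $50 per-host monthly prices come from the figures quoted above; the flat "all-you-can-eat" fee is not stated in the interview, so `HYPOTHETICAL_FLAT_FEE` is an invented placeholder, and the function names are illustrative only.

```python
import math

# Per-host monthly prices quoted in the interview.
HOSTED_PRICE_PER_HOST = 15.0      # modern SaaS monitoring, low end
AGENT_PRICE_PER_HOST = 50.0       # "SaaS-ified" per-host agent tools

# ASSUMPTION: the interview gives no flat-rate price; this value is made up.
HYPOTHETICAL_FLAT_FEE = 3000.0


def monthly_cost(hosts: int, price_per_host: float) -> float:
    """Monthly bill under simple per-host pricing."""
    return hosts * price_per_host


def break_even_hosts(flat_fee: float, price_per_host: float) -> int:
    """Smallest host count at which the flat fee costs no more than per-host pricing."""
    return math.ceil(flat_fee / price_per_host)


if __name__ == "__main__":
    hosts = 300
    print("per-host SaaS:", monthly_cost(hosts, HOSTED_PRICE_PER_HOST))
    print("per-host agent:", monthly_cost(hosts, AGENT_PRICE_PER_HOST))
    print("flat fee pays off at",
          break_even_hosts(HYPOTHETICAL_FLAT_FEE, HOSTED_PRICE_PER_HOST), "hosts")
```

At 300 hosts the two per-host prices work out to $4,500 versus $15,000 a month; where a flat fee starts to pay off depends entirely on the actual quote you receive.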

Is cloud monitoring real-time?

It sure can be. Now, what do we mean when we say real-time -- how real-time? You want to know when things hit the fan and you want to know before your internal and external users do. For rapid network operations center (NOC) response and troubleshooting, you want down-to-the-second for alerting and for a live "Play" mode on your graphs.

Furthermore, you don't want to lose too much granularity when looking at historical data and visualizations. Some tools roll up historical data so much as to make it worthless, or don't keep that data at all after too short a period. You should be able to drill down to about 5 minutes' worth of data on historical graphs, and data retention should be not weeks or months but several years.
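The effect of roll-up granularity is easy to see in a small Python sketch. This is an illustration only, not a description of how any particular product stores data: the `roll_up` function, the averaging strategy, and the sample values are all assumptions made for the example.

```python
from collections import defaultdict
from statistics import mean


def roll_up(samples, bucket_seconds=300):
    """Average (timestamp, value) samples into fixed-width time buckets.

    Returns a list of (bucket_start, mean_value) sorted by bucket start.
    The default bucket of 300 seconds matches the 5-minute granularity
    suggested in the interview.
    """
    buckets = defaultdict(list)
    for timestamp, value in samples:
        # Align each sample to the start of its bucket.
        buckets[timestamp - timestamp % bucket_seconds].append(value)
    return [(start, mean(values)) for start, values in sorted(buckets.items())]


if __name__ == "__main__":
    # Made-up latency samples (seconds since start, milliseconds).
    samples = [(0, 10.0), (60, 20.0), (299, 30.0), (300, 100.0)]
    print(roll_up(samples, 300))    # the spike at t=300 stays visible
    print(roll_up(samples, 3600))   # an hourly roll-up averages it away
```

With 5-minute buckets the spike at t=300 keeps its own data point; with hourly buckets it is blended into a single average, which is the loss of detail described above.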

What sort of functionality is available for cloud monitoring -- does it include, say, trending and alerting, for example?

Definitely, but you have to look beyond the offerings that spend mega-dollars on advertising to show up all over the place with their marketing-speak. Don't believe the hype -- do an evaluation, using free trials, of the technology itself.

What sort of functionality do you want? You want a full-featured system, but that doesn't mean the "mile deep and an inch wide" sort of tools, such as for applications monitoring. They're good at what they do, but you need a unified system that's going to handle your full range of monitoring needs: servers, databases, apps, Web, business. Your system should also be able to pull in data from these special-purpose tools to bring them into your unified view.

Your cloud monitoring system should give you stellar visuals of your data, with a UI that takes only a couple of clicks to create configurable dashboards, shareable graphs, correlations among data types, etc. This doesn't mean being spoon-fed, for example with a very limited subset of pre-prepared graphs. Put the time in to understand what you need to monitor and how -- don't rely on, or be limited to, some designer's guess of what's right for everyone -- you're not everyone, you're you. The same goes for your operations and your business. You want flexibility in your monitoring system -- you should be able to monitor what you want, how you want to, and when you want it. You should also be able to easily replicate your checks across similar hosts.

Your cloud monitoring also should come with a full alerting suite with such elements as escalations, maintenance windows, recoveries, rule criteria, parent/child dependencies, timed acknowledgement windows, soft alerts, and alert history reports. As an aside, alerting is one of the areas where it shows who built your monitoring system. It should ideally have been built by people experienced in software architecture as well as Ops. The latter should have used most of the inadequate and frustrating monitoring toolsets that have been out there since the 1990s, so that they know to build a system that avoids, for example, those not-so-fun false alerts in the middle of the night.
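Two of those elements, maintenance windows and parent/child dependencies, are what keep a single failure from paging you many times over. The Python sketch below shows one plausible way such suppression could work; the `Check` class, its fields, and `should_alert` are hypothetical names invented for this example and do not describe any vendor's implementation.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class Check:
    """A monitored item (hypothetical model for illustration)."""
    name: str
    failing: bool
    parent: Optional["Check"] = None
    # Maintenance windows as (start, end) epoch seconds, end exclusive.
    maintenance: List[Tuple[int, int]] = field(default_factory=list)


def should_alert(check: Check, now: int) -> bool:
    """Return True only if this failure deserves its own alert."""
    if not check.failing:
        return False
    # Suppress alerts during a scheduled maintenance window.
    if any(start <= now < end for start, end in check.maintenance):
        return False
    # Suppress alerts when any ancestor is already failing:
    # the ancestor's alert covers the root problem.
    ancestor = check.parent
    while ancestor is not None:
        if ancestor.failing:
            return False
        ancestor = ancestor.parent
    return True


if __name__ == "__main__":
    switch = Check("core-switch", failing=True)
    web = Check("web-01", failing=True, parent=switch)
    print(should_alert(switch, now=100))  # the switch itself alerts
    print(should_alert(web, now=100))     # the host behind it stays quiet
```

In this sketch a failing switch raises one alert, while the hosts behind it are suppressed until the switch recovers.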

The bottom line:

  • Make sure that you have your own visibility and alerting for your cloud infrastructure
  • Test out in a trial any system before you buy it
  • Know the reputation of the people who designed and maintain the system
  • Make sure your monitoring of different aspects of your infrastructure can be cross-correlated
  • Make sure your monitoring and alerting are real-time
  • Make sure the system can scale (for example, handle Big Data), and do so cost-effectively
  • Consolidate your monitoring into a unified view of your infrastructure/enterprise
  • Spend the time up front to learn best practices for monitoring, or use a managed monitoring offering
  • Continually monitor your cloud environment
  • Do root-cause analysis to learn from any operational issues that come up, so you don't face the same ones again
