Q&A: How to Improve Cloud Management

IT administrators have traditionally been managing cloud technology with highly customized, manual scripts. Automation and integration are key to overcoming the inefficiencies of this approach. We explain what you need to know about avoiding common mistakes in moving to automated, integrated management.

IT administrators have traditionally managed newer technology -- especially cloud environments -- using highly customized (but highly inefficient) manual scripts. It's time to automate and integrate your management tasks! To learn more about what's needed and how to avoid common pitfalls, we turned to Moe Fardoost, senior director of product marketing at Oracle; he's responsible for global product marketing for Oracle Enterprise Manager.

Enterprise Strategies: What management challenges does cloud computing present?

Moe Fardoost: Cloud computing promises to deliver greater adaptability for dynamic business needs, significant operational efficiencies, and lower cost. However, without a comprehensive management regime, these promises cannot be realized. Indeed, achieving adaptability with efficiency and lower cost demands that a comprehensive cloud management layer exist behind the scenes, orchestrating it all.

Virtualization is becoming prevalent and is an important enabler of cloud computing. It is easy and tempting to request and create virtual machines to meet demand. An application may consist of many VMs talking to each other, each with a software stack properly configured. For many customers the focus on VMs has resulted in an exponential growth of virtual machines with redundant applications running in them, loosely automated by a custom-built, cumbersome automation regime, creating more complexity and sprawl within the data center -- and that's just setting up the applications the first time!

After initial setup, unauthorized and untested changes to virtualized applications impact application performance, security, and reliability -- and can lead to expensive downtime, risky compliance issues, and service-level violations. Comprehensive management of these changes over time is required for cloud environments to provide acceptable ongoing performance, security and reliability.

Lack of visibility into the full application-to-disk cloud stack is another inhibitor to efficient cloud management and resource utilization. At times, customers end up using multiple management tools to manage the virtual and physical infrastructure while using completely separate tools to manage the software tier deployed into the infrastructure. This siloed approach creates more complexity and headaches for administrators. It also makes problem diagnosis and triage more difficult -- especially if an issue runs across multiple tiers or components.

Getting to a single pane of glass is not easy because it requires deep diagnostic capabilities instrumented directly in the applications (presentation layer, middleware, database, etc.), virtual machines, and physical infrastructure in order to provide a complete picture.

To achieve the best value out of the cloud infrastructure, it is important to have end-to-end management and automated workflows for various activities during the complete cloud life cycle -- from planning and setting up the cloud to end-user self-service provisioning and de-provisioning of applications, to metering and charging back the usage.

Because both the cloud infrastructure and applications must be maintained and updated over time, lowering the risk of ongoing change through proactive support -- with a call-home facility, integrated remote service requests, and patch management capabilities -- becomes critical.

What are the keys to evaluating your ability to automate IT operations and fill the gaps?

To automate IT operations, you first need to evaluate the foundational management capabilities you already have in place for managing your data center before you look at managing the aspects that are unique to cloud. These foundational capabilities are vital for gaining end-to-end visibility into the cloud infrastructure, which can be complex because of the interdependencies among the application, virtualization, and physical infrastructure layers.

Some of these foundational capabilities include:

  • Configuration and compliance management, including tools and processes for application discovery and dependency mapping, analysis of configuration states, compliance dashboards, and the maintenance of this information on an ongoing basis.

  • Life cycle management that requires tools and processes for testing, patching, provisioning, and dynamic resource management, which can be linked back to the vendor's support cloud for ongoing maintenance.

  • Application performance management that includes tools for end-user experience monitoring as well as diagnostics and tuning from applications to disk.

  • Application quality management that provides tools for functional testing, load testing, and data masking.

After you evaluate and fill in the gaps for the foundational management capabilities, you need to look at the new requirements for managing the complete cloud life cycle. An integrated cloud life cycle management approach will require tools and processes for planning and setting up the cloud, testing and deploying applications into it, and monitoring and managing the cloud and applications on an ongoing basis. Tools that provide these capabilities inclusive of self-service, policy-based resource management, assembly deployment, metering, chargeback, and capacity and consolidation planning are all critical for complete cloud management.

What are the biggest mistakes IT makes in creating and centrally managing multi-tier application assemblies for a self-service catalog?

IT administrators have traditionally been managing application deployment and images with highly customized manual scripts. They have now begun to attempt to leverage those scripts in virtual machine environments, but expecting a self-service user to understand the intricacies of full-stack deployments into VMs is unrealistic. Put another way, if a self-service user has to spend hours running setup scripts after receiving his or her allocation of cloud resources, have we really created an efficient cloud? Obviously not.

Another mistake is assuming that IT or end users can now manage these multi-tier application and virtual machine deployments with the same point tools they have been using in the past.

Yet another mistake is when IT administrators ignore the need for standard configuration of applications, virtual machines, and OS images and do custom deployments for each individual deployment request, either in an attempt to provide a highly customized solution for each request or out of fear of vendor lock-in. By doing this they miss the ability to automate and truly track the progress against the service-level agreements they signed with the lines of business.

Finally, another common mistake is to ignore the need for detailed tracking of the usage of the IT resources and the ability to charge the line of business for their usage.

What best practices can you recommend to avoid these problems?

Having an integrated cloud management regime to manage the entire cloud applications-to-disk infrastructure throughout the full cloud life cycle will avoid many of the problems mentioned above.

Standardizing the configuration of applications, virtual machines, and the physical infrastructure and setting up policies to flag compliance issues and dynamic resource management will be immensely helpful in managing the application request and provisioning process through a self-service catalog.

What are the best ways to dynamically allocate and balance resources based on schedule- or performance-based policies?

One of the major challenges that enterprise IT traditionally had was its inability to respond quickly to performance issues or utilization trends and dynamically balance IT resources. Cloud attempts to address the challenge by making resources available on demand, especially catering to applications that have big seasonality and load variance.

Therefore, in a modern cloud-based environment, it becomes even more important for IT infrastructure to be elastic and have auto-scaling capabilities to respond quickly to business demands. IT managers should be able to set policies to dynamically allocate, scale up, or scale back resources. They should be able to set policies based on performance thresholds, for example, scaling out nodes if the database load goes beyond a certain threshold for a certain duration.
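To make the threshold-for-a-duration idea concrete, here is a minimal sketch in Python. It is an illustration only, not how any particular product evaluates its policies; the names `ScaleOutPolicy` and `should_scale_out`, the 80 percent threshold, and the three-sample window are all assumptions chosen for the example.

```python
from dataclasses import dataclass


@dataclass
class ScaleOutPolicy:
    """Hypothetical policy: act only when load stays high for a while."""
    threshold: float       # load level that counts as a breach (0.8 = 80%)
    duration_samples: int  # consecutive breaching samples required


def should_scale_out(load_samples, policy):
    """Return True if the most recent samples all exceed the threshold.

    `load_samples` is a list of load readings, oldest first, taken at a
    fixed interval, so a sample count stands in for a duration.
    """
    if policy.duration_samples <= 0:
        raise ValueError("duration_samples must be positive")
    if len(load_samples) < policy.duration_samples:
        return False  # not enough history to judge yet
    recent = load_samples[-policy.duration_samples:]
    return all(sample > policy.threshold for sample in recent)


policy = ScaleOutPolicy(threshold=0.8, duration_samples=3)

# A brief spike that drops back below the threshold does not trigger scaling.
brief_spike = [0.5, 0.9, 0.6, 0.85]
# Load that stays above the threshold for three samples in a row does.
sustained_load = [0.5, 0.85, 0.9, 0.95]

print(should_scale_out(brief_spike, policy))
print(should_scale_out(sustained_load, policy))
```

Requiring the breach to persist is what keeps a policy like this from adding nodes in response to a momentary spike.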

Should IT approach capacity planning for on-premise (private cloud) resources differently than they do for off-site (public cloud) resources?

Public clouds often claim to have "elastic scalability" and the ability to handle large capacity increases and decreases on demand, so IT may not need to do capacity planning at all for public clouds. Of course, no cloud has truly infinite capacity, but large public clouds have larger reserves than small or midsize organizations with private clouds.

For organizations with private clouds, IT must perform capacity planning for shared pools of resources, not individual applications. Good monitoring and planning tools can help. When sizing capacity for individual applications, especially new ones with uncertain demand, IT often intentionally over-estimates the capacity required because it can take so long to bring new capacity online and because capacity is not shared with other applications in a pooled reserve. With a private cloud, the amount of reserve and excess capacity can be smaller because of the shared resource architecture. Again, good monitoring and planning tools are essential.
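The arithmetic behind the smaller pooled reserve can be sketched as follows. The application names and demand figures are invented for illustration, and the sketch assumes the applications peak at different times, which is what makes pooling pay off.

```python
def dedicated_capacity(demand_by_app):
    """Size every application for its own peak, with nothing shared."""
    return sum(max(series) for series in demand_by_app.values())


def pooled_capacity(demand_by_app):
    """Size one shared pool for the peak of the combined demand."""
    combined = [sum(values) for values in zip(*demand_by_app.values())]
    return max(combined)


# Hypothetical demand (in CPU cores) for four consecutive periods.
demand = {
    "payroll":    [10, 40, 10, 10],
    "storefront": [30, 10, 50, 20],
    "reporting":  [5, 5, 5, 45],
}

print(dedicated_capacity(demand))  # 40 + 50 + 45 = 135
print(pooled_capacity(demand))     # combined peak is 75
```

The pooled figure can never exceed the dedicated one, and the gap grows as the applications' peaks overlap less.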

In addition to capacity planning, another issue is control and visibility. Customers running applications in public cloud environments should not expect to have complete visibility into and control over those environments. If an application is mission-critical to the business and requires a complete level of control, it should be deployed in a private cloud.

In a private cloud, enterprise IT has the responsibility of managing the on-premise private cloud resources and using them efficiently to get the best value out of the infrastructure. They can have complete control over the on-premise resources and track the usage patterns over time to predict the optimal capacity or even charge back the metered costs to the lines of business if needed. This can help them understand usage patterns and avoid over-provisioning.

With an efficient solution for capacity planning and management, IT can plan for consolidation or repurpose underutilized systems. They can monitor the resource usage and make purchase decisions only when necessary to meet demand.

Why should you consider developing the ability to meter resources for the possibility of chargeback?

One of the most popular definitions of cloud is from the National Institute of Standards and Technology (NIST), which considers metering capability one of the key characteristics of cloud.

According to NIST, "Cloud systems automatically control and optimize resource use by leveraging a metering capability (typically through a pay-per-use business model) at some level of abstraction appropriate to the type of service (e.g., storage, processing, bandwidth, and active user accounts). Resource usage can be monitored, controlled, and reported, providing transparency for both the provider and consumer of the utilized service."

IT is increasingly expected to run as a business and not as a cost center, but whether IT is truly charging back user organizations based on utilization of shared IT resources or merely using the metering data to do allocations, having accurate utilization data is critical. Metering or measuring the shared resource utilization in terms of business metrics as well as IT metrics (such as compute power, memory and storage) enables IT to perform chargeback or "showback" to the business units for their actual usage.

IT needs an automated way to understand how the different lines of business consume IT resources and what charges should be placed on those resources based on usage or policy. It is critical to note that IT must be able to track business metrics, not just IT metrics. Line-of-business users may not be satisfied with costing based on IT metrics alone. For example, metering capabilities should include tracking the number of orders processed, application response times, and other metrics to which their line-of-business customers are accustomed.
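As a rough sketch of what chargeback from metered usage might look like, the Python below totals charges per line of business across both IT metrics and a business metric. The record layout, the metric names, and the rates are hypothetical and chosen only for illustration; a real chargeback scheme would be driven by the organization's own policies.

```python
from collections import defaultdict

# Hypothetical price per unit for each metered metric.
RATES = {
    "cpu_hours": 0.10,          # IT metric
    "storage_gb": 0.02,         # IT metric
    "orders_processed": 0.001,  # business metric
}


def chargeback(usage_records, rates):
    """Total the charge owed by each line of business.

    Each record names a line of business, a metered metric, and the
    quantity consumed; the charge is quantity times the metric's rate.
    """
    totals = defaultdict(float)
    for record in usage_records:
        rate = rates[record["metric"]]
        totals[record["line_of_business"]] += record["quantity"] * rate
    return {lob: round(total, 2) for lob, total in totals.items()}


usage = [
    {"line_of_business": "sales", "metric": "cpu_hours", "quantity": 100},
    {"line_of_business": "sales", "metric": "orders_processed", "quantity": 5000},
    {"line_of_business": "finance", "metric": "storage_gb", "quantity": 250},
]

print(chargeback(usage, RATES))
```

The same totals could be reported without billing anyone, which is the "showback" case mentioned above.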

What specific products does Oracle have for cloud management?

Oracle Enterprise Manager is Oracle's solution for addressing the challenges I've described. Customers who build their private or public cloud using Oracle technologies can manage the entire cloud life cycle from a single console -- including cloud setup and the resulting self-service provisioning, policy-based resource management, assembly deployment, metering, chargeback, and capacity and consolidation planning.

Oracle Enterprise Manager delivers a highly optimized solution for managing the entire Oracle cloud platform. To complement this, key capabilities for monitoring, diagnosing, packaging, and provisioning are built into the Oracle technologies and applications which Oracle Enterprise Manager manages; hence, total control is achieved through an intricate orchestration of cloud operations across the managed cloud infrastructure.
