Managing Data Center Performance and Availability (Part 2 of 2)

In the second of a two-part series, we explore best practices for protecting against downtime and data loss.

Preparing for data center outages is no longer optional. Server and application failures, site outages, natural disasters, and even simple human error can represent a serious threat to business operations and service levels. After all, data centers bear the responsibility for managing and protecting a company’s most valuable asset—its data. Even a temporary loss of access to electronic information can be costly.

As a result, providing rapid, reliable information access is critical in today’s competitive e-business world. In fact, defense against catastrophic information loss has become a strong driver of information and data lifecycle management initiatives in companies around the globe.

Best practices for guarding against downtime and data loss continue to include the implementation of data protection, replication, and clustering technologies. Used in combination with application performance management technologies, these give organizations a powerful toolset for ensuring both uptime and quality of service.

Backup Basics

Best practices for data and application availability begin with a data protection strategy. Traditional tape backups have proven to be an effective and inexpensive means of data protection and recovery, and tape’s portability makes it appropriate for off-site storage. However, tape is also slow, complex, and often unreliable. What’s more, recovering from tape can be time-consuming and cumbersome.

Disk-based backups offer several advantages over tape, including greater reliability, increased speed, and flexibility. Recovery is also faster and more efficient than with tape because the backup is on disk. Disk-based backup also supports incremental backup and restore capabilities, enabling organizations to better avoid unacceptable interruptions to business operations. Plus, disk-based backup integrates with tape-based technologies to enable long-term data protection or off-site storage.

One of the most recent advancements in backup is continuous data protection, which provides the core benefits of disk-based protection while eliminating some of the challenges of more traditional technologies. Many continuous data protection tools also provide tape-based backup for archival and storage. Continuous data protection captures only changed portions of files, thereby simplifying backup, and can also back up multiple file servers simultaneously for greater efficiency. Recovery is also quick and flexible, allowing end users to retrieve their own files.
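The idea of capturing only the changed portions of a file can be illustrated with a small sketch. The following Python example is an assumption-laden illustration, not how any particular continuous data protection product works: it splits a file's contents into fixed-size blocks, hashes each block, and keeps only the blocks whose hash differs from the previous version. The block size and the function names are hypothetical.

```python
import hashlib

BLOCK_SIZE = 4096  # assumed block size, chosen only for illustration


def block_hashes(data: bytes, block_size: int = BLOCK_SIZE) -> list:
    """Return a SHA-256 digest for each fixed-size block of data."""
    return [
        hashlib.sha256(data[i:i + block_size]).hexdigest()
        for i in range(0, len(data), block_size)
    ]


def changed_blocks(old: bytes, new: bytes, block_size: int = BLOCK_SIZE) -> dict:
    """Map block index -> new block contents for blocks that differ from old.

    Blocks that exist only in the new version count as changed. A real tool
    would also record the new file length to handle truncation; this sketch
    does not.
    """
    old_hashes = block_hashes(old, block_size)
    changes = {}
    for index, digest in enumerate(block_hashes(new, block_size)):
        if index >= len(old_hashes) or old_hashes[index] != digest:
            start = index * block_size
            changes[index] = new[start:start + block_size]
    return changes
```

In this sketch, a file of three blocks in which only the second block was modified yields a single changed block to store, rather than a full copy of the file.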

High Availability

While performing regularly scheduled backups protects against many types of data loss, backup provides only one of the layers of availability that businesses need to guard against downtime. Clustering can be used to guard against component failure while ensuring application availability and avoiding substantial downtime. Replication can be used to protect against substantial data loss. Both are vital to restoring the critical applications of a business operation in the event of a disaster.

Clustering protects against server, application, and database downtime by eliminating the single point of failure found within a single server. Clustering also eliminates the need for additional application servers in the data center to otherwise guarantee availability.

Replication is designed to copy data to another location to protect it from disasters. Many replication tools allow organizations to replicate their data across disparate storage devices, over a standard IP connection, and across any distance.

Validation and Testing

Data replication and application clustering help protect IT and business operations in the event of a site outage or other event. Yet because application and storage configurations change so frequently and most organizations simply lack the time or budget for testing their infrastructure, the effectiveness of these tools can fall short of meeting the availability needs of the business.

Testing is often difficult and time-consuming, hardware resources are scarce, and there is virtually no way to avoid at least some level of disruption to the production environment when testing a business continuity or recovery plan.

At the same time, testing presents significant advantages to businesses. Organizations that test their recovery plans gain a more complete and accurate picture of their plan and can identify its shortcomings (and make the needed changes) before disaster strikes.

Consequently, it is critical that companies test, plan, and validate recovery scenarios in production without disruption. This includes verifying that applications are migrated to the most appropriate server based on planned failover strategy, and testing those strategies on any desktop or laptop computer.
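Verifying that an application lands on the most appropriate server can be thought of as a dry run of the failover policy. The Python sketch below assumes a simple policy, an ordered list of preferred servers plus a memory requirement, and checks which server would be chosen without moving anything. The `Server` fields, the server names, and the policy shape are hypothetical; actual clustering products define their own policies.

```python
from dataclasses import dataclass


@dataclass
class Server:
    name: str
    online: bool
    free_memory_gb: float


def pick_failover_target(preferred, servers, required_memory_gb):
    """Return the name of the first preferred server that is online and
    has enough free memory, or None if no server qualifies."""
    by_name = {server.name: server for server in servers}
    for name in preferred:
        server = by_name.get(name)
        if (
            server is not None
            and server.online
            and server.free_memory_gb >= required_memory_gb
        ):
            return server.name
    return None
```

Running such a check against the current server inventory after each configuration change is one way to catch a failover plan that no longer has a valid target.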

Because it is also important to plan for optimal bandwidth when replicating data between sites, testing a recovery plan also includes analyzing the organization’s network environment over a period of time to determine how much data is being written, and, in turn, establishing optimal bandwidth recommendations based on activity and specific parameters.
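As a rough illustration of how write activity translates into a bandwidth figure, the Python sketch below converts sampled write volumes into megabits per second and takes a high percentile with some headroom. The 95th percentile, the 20 percent headroom, and the function name are assumptions made for this example, not recommendations from any specific tool.

```python
import math


def recommend_bandwidth_mbps(bytes_written, interval_seconds,
                             percentile=0.95, headroom=1.2):
    """Estimate replication bandwidth in Mbit/s from write samples.

    bytes_written: bytes written during each sampling interval
    interval_seconds: length of each sampling interval in seconds
    percentile: fraction of intervals the link should cover (assumed 0.95)
    headroom: safety multiplier on top of the percentile rate (assumed 1.2)
    """
    if not bytes_written:
        raise ValueError("at least one sample is required")
    # Convert each sample to megabits per second and sort ascending.
    rates = sorted(b * 8 / interval_seconds / 1_000_000 for b in bytes_written)
    # Nearest-rank percentile, clamped to the valid index range.
    index = max(0, min(len(rates) - 1, math.ceil(percentile * len(rates)) - 1))
    return rates[index] * headroom
```

Sizing to a percentile rather than the average means short bursts of writes are less likely to leave the secondary site far behind, at the cost of a larger link.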

To complete the recovery plan test, replicated data as well as applications are validated, as are any new data, hardware, or application configurations. Automated tools make this simple. A typical tool mimics a failover without stopping applications at the organization’s primary data center. It brings up a database or application to make sure the application is capable of coming online as the secondary in case of a fault at a primary site. Space-optimized snapshots can be used for bringing applications online at a secondary site, enabling organizations to test their recovery plan without having a complete extra copy of data. When testing is complete, the snapshot is destroyed so that the disk space can be available for future tests.
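One common way to build a space-optimized snapshot is copy-on-write, in which the snapshot stores only the original contents of blocks that change after it is taken. The article does not say which mechanism its tools use, so the Python sketch below is an assumption for illustration, and the `Volume` and `Snapshot` classes are hypothetical.

```python
class Volume:
    """A toy block volume that supports copy-on-write snapshots."""

    def __init__(self, blocks):
        self.blocks = list(blocks)
        self.snapshots = []

    def snapshot(self):
        snap = Snapshot(self)
        self.snapshots.append(snap)
        return snap

    def write(self, index, data):
        # Preserve the old contents in every live snapshot before overwriting.
        for snap in self.snapshots:
            snap.preserve(index, self.blocks[index])
        self.blocks[index] = data


class Snapshot:
    """A point-in-time view that stores only blocks changed since creation."""

    def __init__(self, volume):
        self._volume = volume
        self._saved = {}  # block index -> contents at snapshot time

    def preserve(self, index, old_data):
        # Keep only the first saved copy; later writes must not replace it.
        self._saved.setdefault(index, old_data)

    def read(self, index):
        # Unchanged blocks are read straight from the live volume.
        return self._saved.get(index, self._volume.blocks[index])

    def space_used(self):
        return len(self._saved)

    def destroy(self):
        # Release the saved blocks and stop tracking further writes.
        self._volume.snapshots.remove(self)
        self._saved.clear()
```

In this sketch the snapshot consumes space only for blocks overwritten during the test, and destroying it frees that space for the next test.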

Organizations continue to work to defend the availability and performance of critical data and applications. By leveraging tools to apply best-practice approaches for optimizing performance and ensuring availability, organizations not only protect against costly downtime, but they also safeguard the quality of service their customers demand in today’s highly competitive business environment.

About the Author

As a senior group manager within the Data Center Management Group, Peter McKellar is responsible for product marketing, product strategy, and outbound marketing related to the company’s server foundation software. Peter joined Symantec through the VERITAS acquisition and has been with the company for more than four years.
