Protecting Big Data: The Year Past, The Year Ahead

This was the year Big Data emerged as a significant focus. What will 2012 bring?

By Jeff Tofano, Chief Technology Officer, SEPATON, Inc.

As near-exponential data growth continued unabated, Big Data emerged as its own category of IT challenge. Big Data is loosely defined as data sets that are too large to move, process, manage, and analyze with traditional approaches. The sheer volume of data in today's enterprise data centers is overwhelming the traditional systems designed to handle it. According to a May 2011 McKinsey & Co. report (Big data: The next frontier for innovation, competition, and productivity), by 2009, nearly every sector of the US economy averaged at least 200 terabytes of stored data per company with more than 1,000 employees.

In these environments, the data sets are so large and the required transfer rates are so high that all normal tools for moving, processing, analyzing, and protecting this data break. In 2011, this data growth reached a critical mass, causing IT staff to look for new ways to manage and protect it and opening a new market for vendors with compelling solutions.

2011 Trend #1: Very large databases emerged as data protection and processing challenges

Companies have become increasingly dependent on large databases to run mission-critical business operations. As data volumes increased in 2011, enterprises began to run out of the processing power and I/O performance their existing solutions need to move, deduplicate, and protect this data. Continuing to meet recovery time objectives (RTOs) and recovery point objectives (RPOs) for these databases emerged as a critical point of vulnerability as data volumes continued to grow.

More enterprises struggled to provide sufficient system resources to move database data to the safety of their backup environments within backup windows and to restore it quickly for business continuity. Deduplication within the database applications and inline deduplication technologies could not control data volumes sufficiently. Enterprise IT managers are looking to vendors for innovative solutions that protect large, rapidly growing databases efficiently and cost-effectively.
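To see why backup windows have become the pressure point, a rough back-of-the-envelope calculation helps (the figures below are hypothetical, chosen only for illustration):

```python
# Back-of-the-envelope sketch (hypothetical figures, not from the article):
# the sustained rate a backup target must ingest to move a large data set
# within a fixed nightly backup window.

def required_throughput_gb_per_s(data_tb: float, window_hours: float) -> float:
    """Sustained rate in GB/s needed to back up data_tb terabytes
    within window_hours hours."""
    data_gb = data_tb * 1024             # TB -> GB
    window_seconds = window_hours * 3600
    return data_gb / window_seconds

# Example: a 200 TB data set and an 8-hour backup window.
print(f"{required_throughput_gb_per_s(200, 8):.1f} GB/s sustained")  # ~7.1 GB/s
```

At that scale, the bottleneck shifts from raw capacity to the sustained ingest rate the backup environment can deliver night after night.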

2011 Trend #2: Increased adoption of virtualization created challenges

Although virtualization has been growing in small-to-medium enterprises (SMEs), until recently large enterprises adopted it more slowly for their mission-critical workloads. However, in 2011, more enterprises looked to virtualization to drive efficiency in their data centers. Although these environments offer savings through server consolidation and improved management efficiency, they create I/O contention challenges because many virtual images must share the underlying server resources.

Moving data to safety, as well as deduplicating, replicating, and restoring it, in these environments continues to be a challenge for traditional approaches. Looking ahead, the need for more efficient ways to protect massive, virtualized environments remains a challenge for enterprises -- and an opportunity for vendors to provide innovative solutions.

2011 Trend #3: Data protection for Big Data caused sprawl

One of the most significant trends in 2011 was the challenge of providing adequate protection for Big Data. As data volumes grew, many IT departments struggled to meet their backup windows, restore SLAs, and recovery time objectives (RTOs). Although the trend of moving data off physical tape libraries to disk-based backup systems with deduplication continued, in 2011 data protection sprawl emerged as a significant challenge.

Sprawl resulted as Big Data exceeded the scalability of some disk-based backup systems, forcing companies to add multiple "siloed" backup and disaster recovery (DR) systems and to manage each individually. Sprawl added significant costs and complexity by requiring IT managers to load balance their backups with every added system.

In addition, with more hardware and software to manage and maintain, IT staff spent significantly more time "tuning" systems for efficiency and monitoring data as it moved through backup, deduplication, replication, and restore processes. System sprawl also reduced deduplication effectiveness because duplicates could be found only within each silo.
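A toy illustration (the object names below are hypothetical) shows why per-silo deduplication stores more data than a single, global deduplication pool:

```python
# Toy sketch (hypothetical objects): duplicates that span backup silos are
# never found when each system deduplicates independently.
datasets = {
    "silo_A": ["vm_image_1", "vm_image_2", "payroll_db"],
    "silo_B": ["vm_image_1", "vm_image_2", "crm_db"],  # same VM images again
}

per_silo_unique = sum(len(set(objs)) for objs in datasets.values())
global_unique = len(set().union(*datasets.values()))
print("objects stored with per-silo dedup:", per_silo_unique)  # 6
print("objects stored with global dedup:  ", global_unique)    # 4
```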

A Look Ahead: 2012 Trends

As we look ahead to 2012, enterprise data sets will continue to grow. Databases and unstructured data in particular will break traditional backup approaches. Enterprises will be increasingly open to experimenting with new approaches to manage and protect Big Data more efficiently. This trend is creating a new opportunity for vendors in a market that has long been mature.

2012 Prediction #1: Experimentation in enterprise backup environments will grow

Until recently, enterprise data protection was a relatively straightforward process of backing up data to tape or to disk-based "targets." In 2012, enterprises will experiment with multiple technologies to protect Big Data volumes more efficiently.

Enterprises will combine several approaches to reduce the volume of data that must be moved, including source deduplication, snapshots, continuous data protection (CDP), and archiving. Expect increased market pressure for larger, more scalable, unified data protection platforms and for traditional and emerging deduplication and replication technologies that help reduce cost, complexity, footprint, and data protection sprawl.

2012 Prediction #2: Enterprises will adopt new technologies for protection of large databases

With their increased dependency on large, rapidly growing databases, enterprises will rethink the way these databases are protected and restored.

Enterprises will pursue new technologies that can ingest and process large database data efficiently. Database data, in which changes occur within small (less than 8 KB) blocks, cannot be deduplicated efficiently by traditional inline technologies. In addition, the new analytic software used to manage large data environments produces massive volumes of small data segments that are also difficult for traditional data protection systems to deduplicate.
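A small sketch may make the granularity point concrete. The code below is illustrative only (chunk sizes, data volumes, and change rates are assumptions, and real products use far more sophisticated methods): it deduplicates two consecutive backups of the same synthetic data set using fixed-size chunks, once at a coarse 128 KB granularity and once at the 8 KB database-block granularity. Scattered small-block changes dirty far fewer unique bytes when the deduplication granularity matches the block size.

```python
# Illustrative sketch only: why small, scattered changes in 8 KB database
# blocks defeat deduplication that works in large fixed-size segments.
import hashlib
import random

def dedup_ratio(data: bytes, chunk_size: int) -> float:
    """Chunk data into fixed-size segments; return logical/unique bytes."""
    seen, unique_bytes = set(), 0
    for i in range(0, len(data), chunk_size):
        chunk = data[i:i + chunk_size]
        digest = hashlib.sha256(chunk).digest()
        if digest not in seen:
            seen.add(digest)
            unique_bytes += len(chunk)
    return len(data) / unique_bytes

random.seed(0)
BLOCK = 8 * 1024                                          # 8 KB database block
backup1 = bytearray(random.randbytes(4 * 1024 * 1024))    # 4 MB synthetic "database"
backup2 = bytearray(backup1)
for _ in range(20):                                       # overwrite 20 scattered blocks
    offset = random.randrange(len(backup2) // BLOCK) * BLOCK
    backup2[offset:offset + BLOCK] = random.randbytes(BLOCK)

stream = bytes(backup1) + bytes(backup2)                  # two full backups, back to back
print("dedup ratio, 128 KB chunks:", round(dedup_ratio(stream, 128 * 1024), 2))
print("dedup ratio, 8 KB chunks:  ", round(dedup_ratio(stream, BLOCK), 2))
```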

In 2012, enterprises will adopt deduplication technologies that can scale to handle massive data volumes and that examine data in these small increments for optimal capacity reduction and WAN-optimized replication. As more analytic software tools are introduced to manage large data environments, the need for efficient ways to protect and deduplicate the data they create will increase.

2012 Prediction #3: Enterprises will adopt new approaches that can meet their data protection and recovery requirements over the next three to five years

In the short term, many enterprises will evaluate and roll out emerging approaches for new application environments but retain the traditional approach for their existing ones. This approach is not efficient or cost-effective long term, so enterprises will unify these siloed approaches into massively scalable systems. Unified solutions will enable enterprises to protect evolving environments efficiently by enabling them to:

  • Move massive data volumes to safety and recover them quickly

  • Deduplicate/replicate unstructured, semi-structured, and structured data quickly and efficiently

  • Leverage deduplication to minimize replication bandwidth requirements (see the sketch following this list)

  • Provide powerful reporting and dashboards that enable IT administrators to monitor petabytes of data as it is backed up, deduplicated, replicated, and restored
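For the bandwidth point above, a quick back-of-the-envelope sketch (hypothetical data volumes, link speeds, and deduplication ratios) shows why replicating only unique data matters over a WAN:

```python
# Rough sketch (hypothetical figures): how deduplication shrinks the time
# needed to replicate a nightly backup to a DR site over a WAN link.

def replication_hours(data_tb: float, dedup_ratio: float, link_gbps: float) -> float:
    """Hours to replicate data_tb terabytes, reduced by dedup_ratio,
    over a link of link_gbps gigabits per second."""
    reduced_bits = data_tb * 8 * 1024**4 / dedup_ratio  # TB -> bits, then reduce
    return reduced_bits / (link_gbps * 1e9) / 3600

# Example: a 50 TB nightly backup over a 10 Gb/s link.
print(f"No dedup:   {replication_hours(50, 1, 10):.1f} hours")   # ~12.2
print(f"10:1 dedup: {replication_hours(50, 10, 10):.1f} hours")  # ~1.2
```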

A Final Word

Is IT prepared for this shift? If not, what should it start doing? Brute force with traditional approaches is not going to work much longer; it is becoming unmanageable and unaffordable. IT will need to think outside the box and look at emerging technologies and new approaches that enable it to drive down costs and reduce the risks of data loss and downtime.

Jeff Tofano, the chief technology officer at SEPATON, Inc., has more than 30 years of experience in the data protection, storage, and high-availability industries. As CTO, he leads SEPATON's technical direction and drives product and architectural decisions to take advantage of SEPATON's ability to address the data protection needs of large enterprises. His experience includes serving as CTO of Quantum and holding senior architect roles with Oracle Corporation, Tandem Computers, and Stratus Technologies. Jeff earned a BA in computer science and mathematics from Colgate University. You can contact the author at jtofano@sepaton.com.
