In-Depth
Top 3 Trends in Enterprise and Big Data Backup: A Look Back, The Year Ahead
Scalability is a must-have for 2013 as data growth fueled by cloud and big data pushes backup technology to the breaking point.
By Jeff Tofano
As we look back over 2012, we saw unprecedented growth in backup data volumes, fueled in part by the adoption of cloud, big data analytics, and large database systems (Oracle, DB2, and SQL Server). This fast data growth brought many backup environments to the breaking point and resulted in three key trends in data protection as enterprises looked for faster, more cost-efficient ways to back up, replicate, and reduce the capacity requirements of overwhelming data volumes. I will discuss these trends and their impact on enterprise backup environments in 2012 and provide my predictions for large enterprise and big data backup environments in 2013.
2012 Trend #1: The need for enterprise scalability in data protection environments reached a critical point
Until recently, large enterprises have had a limited number of options for data protection. Only a handful of technologies were both cost-effective and fast enough to protect massive data volumes in these environments without the cost and complexity of managing multiple, independent systems. In the past year, we saw unprecedented levels of data growth push many of these backup environments to the breaking point. They routinely exceeded their backup windows by hours, failed to meet important service-level agreements, and saw IT costs spike. Many struggled to replicate data efficiently to remote sites to provide effective disaster recovery protection.
Some enterprise IT environments tried to solve this issue by using several different technologies to back up and protect data. For example, many added 10 Gb Ethernet backup using NetBackup OST for faster performance. Others added multiple single-node backup systems to their environments to maintain their backup windows, and in the process added enormous complexity, increased IT administration costs, and crowded data center rack space.
As a result, we saw growing demand for scalable, enterprise-class data protection systems that let enterprises add performance and capacity as needed to back up and protect tens of petabytes in a single system. These systems are designed for high performance and massive capacity, support multiple protocols (Fibre Channel, 10 GbE) and backup environments (tape library, OST), and deduplicate large databases and other data types that single-node systems cannot handle efficiently.
2012 Trend #2: Electronic replication use increased
Although physical tape libraries are still the least expensive way to provide offsite disaster protection, they cannot restore backups fast enough for today's data-dependent companies. As a result, in the past year, more large enterprises began using electronic replication technologies for disaster recovery protection. In a recent survey of large enterprises conducted by Sepaton (the company I work for), nearly half of respondents (47 percent) were replicating more than 50 percent of their data to a remote location. Nearly a quarter had implemented active-active replication strategies that enable immediate restoration of services and data from a remote site in the event of a disaster. Clearly, the increasing need for continuous access to applications and data is driving companies to adopt faster, more efficient remote replication technologies.
2012 Trend #3: Databases dominated backup environments
Database data grew to become a larger proportion of overall backup volume. Companies are using larger Oracle, DB2, and SQL Server databases to run more of their business-critical operations. Backing up these large databases within backup and replication windows is becoming increasingly difficult for several reasons. First, companies rely on fast multistreaming/multiplexing backup technologies to back up databases within their backup windows.
However, because inline, hash-based deduplication technologies cannot effectively deduplicate multistreamed or multiplexed data, we saw unchecked data growth in a large number of enterprise backup environments. Databases also store data in very small segments (less than 8 KB) that inline, hash-based deduplication technologies struggle to match -- further contributing to the fast data growth.
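To illustrate why multiplexing defeats this class of deduplication, here is a minimal, hypothetical sketch in Python (the chunk and block sizes are illustrative, and real deduplication engines are far more sophisticated). Two identical backup streams written back to back deduplicate well under fixed-chunk hashing, but the same data multiplexed in small interleaved blocks produces almost no matching chunks:

```python
import hashlib
import random

CHUNK = 8192  # fixed chunk size for a naive inline, hash-based deduplicator

def duplicate_fraction(stream: bytes) -> float:
    """Fraction of fixed-size chunks whose hash has been seen before."""
    chunks = [stream[i:i + CHUNK] for i in range(0, len(stream), CHUNK)]
    unique = {hashlib.sha256(c).digest() for c in chunks}
    return 1 - len(unique) / len(chunks)

random.seed(0)
db_stream = random.randbytes(1 << 20)  # ~1 MB standing in for a database backup

# Case 1: the same backup written twice, sequentially -- the ideal case.
sequential = db_stream + db_stream
print(f"sequential:  {duplicate_fraction(sequential):.0%} duplicate chunks")

# Case 2: two identical streams multiplexed in small interleaved blocks,
# as backup applications do to keep drives streaming at full speed.
BLOCK = 1024  # multiplex block, smaller than the dedup chunk
blocks = [db_stream[i:i + BLOCK] for i in range(0, len(db_stream), BLOCK)]
multiplexed = b"".join(b1 + b2 for b1, b2 in zip(blocks, blocks))
print(f"multiplexed: {duplicate_fraction(multiplexed):.0%} duplicate chunks")
```

In the multiplexed run, every fixed-size chunk straddles blocks from both streams, so no two chunk hashes match even though every byte has been seen before.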
Three Predictions for 2013
As we saw in the past year, even with smart policy-driven tiering, large enterprises and big data environments will need fast, scalable ingest performance to meet their data protection needs. The already exponential growth of data in today's enterprise will reach avalanche proportions as more companies adopt cloud and big data analytics technologies. Based on this growth, I predict the following three trends.
Prediction #1: Enterprises will consolidate onto grid-scalable, enterprise-class systems
Stacking up dozens of single-node, inline deduplication systems will continue to lead to crushing data center sprawl and complexity. To meet data protection needs and shrinking backup windows, enterprises will increasingly consolidate data onto a small number of disk-based solutions with massive, single-system capacity coupled with scalable, multi-node deduplication that doesn't slow data ingest or bog down restore times.
They will move away from hash-based inline deduplication and toward byte-differential, ContentAware deduplication technologies that are designed to cut capacity in multiplexed, multistreamed database backups and other massive backup environments without slowing backup performance. They will also need systems that automate much of the disk subsystem management (e.g., load balancing) and that provide powerful management and reporting interfaces, enabling IT administrators to track and manage data efficiently through the backup, deduplication, replication, and secure erasure processes.
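As a rough illustration of the byte-differential idea (a simplified sketch, not Sepaton's actual algorithm): rather than requiring whole chunks to hash identically, a content-aware engine aligns a new backup against the prior version of the same logical object and stores only the bytes that changed. The Python below uses difflib purely for readability; production systems use far faster differencing:

```python
from difflib import SequenceMatcher

def byte_delta(reference: bytes, new: bytes) -> list:
    """Store only the byte ranges of `new` that differ from `reference`."""
    ops = SequenceMatcher(None, reference, new, autojunk=False).get_opcodes()
    delta = []
    for tag, i1, i2, j1, j2 in ops:
        if tag == "equal":
            delta.append(("copy", i1, i2))        # reference bytes to reuse
        else:
            delta.append(("insert", new[j1:j2]))  # only changed bytes stored
    return delta

def apply_delta(reference: bytes, delta: list) -> bytes:
    """Rebuild the new backup from the reference plus the stored delta."""
    out = bytearray()
    for op in delta:
        if op[0] == "copy":
            out += reference[op[1]:op[2]]
        else:
            out += op[1]
    return bytes(out)

# A backup whose second run shifted by a few inserted bytes -- the case
# that defeats fixed-chunk hashing but suits byte-differential dedup.
v1 = b"ABCDEFGH" * 500
v2 = v1[:100] + b"<changed row>" + v1[100:]
delta = byte_delta(v1, v2)
stored = sum(len(op[1]) for op in delta if op[0] == "insert")
assert apply_delta(v1, delta) == v2
print(f"new backup: {len(v2)} bytes; stored after delta: {stored} bytes")
```

Because the alignment is against the same logical object, byte-shifted but otherwise identical content still deduplicates -- exactly the case that defeats fixed-chunk hashing.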
Prediction #2: Tiered storage will become essential
With more companies adopting cloud and big data technologies, the days of backing up and storing everything (just in case) are over. Cost reduction, labor savings, and system efficiency are paramount if organizations are to protect massively growing amounts of data while meeting backup windows and business objectives. As we look ahead to 2013 and beyond, enterprises will increasingly require automatic, tiered data protection that spans continuous data protection (CDP), snapshots, disk-to-disk (D2D) backup, tape, remote replication, and cloud. To stay cost-effective and to let data managers balance cost against recovery time and risk, data protection technologies will need to automatically identify and move low-priority data to the lowest-cost recovery tier -- without administrator involvement.
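A toy policy engine makes this concrete. The sketch below is hypothetical -- the tier names, priorities, and age thresholds are illustrative, and a real system would also weigh SLAs, restore statistics, and cost models:

```python
from dataclasses import dataclass

@dataclass
class BackupSet:
    name: str
    priority: str            # "critical", "standard", or "low"
    days_since_restore: int  # how long since anyone needed this data back

def assign_tier(b: BackupSet) -> str:
    """Pick the cheapest recovery tier that still meets the data's
    recovery-time needs -- automatically, with no administrator involved."""
    if b.priority == "critical":
        return "snapshot" if b.days_since_restore < 7 else "disk (D2D)"
    if b.priority == "standard":
        return "disk (D2D)" if b.days_since_restore < 30 else "remote replica"
    # Low-priority data that nobody restores drifts to the lowest-cost tiers.
    return "tape" if b.days_since_restore < 180 else "cloud archive"

for b in (BackupSet("erp-database", "critical", 2),
          BackupSet("file-shares", "standard", 45),
          BackupSet("legacy-project-data", "low", 400)):
    print(f"{b.name:>20} -> {assign_tier(b)}")
```

The point is the shape of the decision: classification plus policy, executed by the system rather than by an administrator.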
Prediction #3: A holistic enterprise-wide data protection approach will emerge
Given the scale of enterprise and big data environments, high-risk, rip-and-replace solutions and multiple single-node storage silos are not feasible approaches to data protection. Companies will adopt new data protection solutions that coexist with and integrate seamlessly into existing environments without disruption or added complexity. These solutions will enable companies to consolidate all of their backup requirements into a single system managed through one view that unifies management and reporting for the entire infrastructure -- including both existing and new technology.
Over time, they will migrate data to a few massively scalable, highly automated systems to handle all backup, deduplication, replication, and restoration in a highly efficient, tiered infrastructure.
Conclusion
Although enterprises have become accustomed to rapid backup data growth, few are prepared for the avalanche of data that is going to result from cloud and big data analytics in the coming year and beyond. This data growth will challenge the capabilities of even the most robust backup environments. Short-term solutions, solutions with limited scalability, and solutions that add complexity to the data center will no longer be viable options.
Jeff Tofano, the chief technology officer at Sepaton, Inc., has more than thirty years of experience in the data protection, storage, and high-availability industries. As CTO, he leads Sepaton's technical direction and drives product and architectural decisions to take full advantage of Sepaton's singular ability to address the data protection needs of large enterprises. His experience includes serving as CTO of Quantum and holding senior architect roles with Oracle Corporation, Tandem Computers, and Stratus Technologies. Jeff earned a BA in computer science and mathematics from Colgate University. You can contact the author at [email protected].