Archiving Data to Increase Data Warehouse Performance

Customers turn to data archiving to address data warehouse performance, availability woes

When organizations are concerned about the size and performance of their data warehouses, they often turn to data-quality or data-profiling tools to help solve the problem. Thanks to a technology that got its start in relational database environments, these customers have another option as well.

Data archiving describes the process of moving aged or infrequently accessed information out of data warehouse and into near-line or off-line storage. Logical links remain, so as far as the data warehouse is concerned, nothing has changed. But in many cases, the size of a data warehouse can be dramatically compacted in this fashion. Advocates say this results in improved data warehouse performance and—paradoxically—better availability.

“Database archiving lets you identify relational data that is not used often and gives you the option of storing it in another place, and because you’re reducing the size [of the database], that can result in some real performance benefits for customers,” says Jim Lee, vice-president of marketing with data archiving specialist Princeton Softech.

In addition to driving performance improvements, organizations can also tap data archiving to reduce their operating costs: Smaller data warehouses don’t need as much storage, can run on smaller servers, and require fewer staff to effectively manage them. With structured data growing at a 125 percent clip annually (according to META Group), and with an increasing number of data warehouses trending toward or above the once-benchmark 1 TB level, that’s an important consideration. “Most prospective customers recognize the value of what we do and how we do it,” Lee comments. Several trends are driving uptake of data archiving solutions, he says, but two of the most important are regulatory compliance requirements and limited IT budgets.

A number of compliance measures mandate retention periods for e-mail and other business records, or specify that data must be preserved and maintained in a manner that can guarantee its authenticity. Rather than keeping all of this data online in relational databases or data warehouses, Lee says, organizations should opt to move it to near-line storage—where it can be accessed by online users—or to offline storage resources, such as write once, read many (WORM) drives, which satisfy stringent data authenticity requirements. The best thing about such an approach, he argues, is that it can quickly be restored in the event of a disaster, or if required by federal or regulatory authorities.

“If you think about the typical disaster recovery plan, with a hot site and a backup, if you can archive and reduce your primary database by 30 percent, you can actually reduce that 30 percent twice,” he argues. “In some cases where we had customers doing automatic recovery, they actually ended up looking to archiving when they failed their disaster recovery service-level test. What they realized is that it took too long to recover the database.”

Most data archiving adopters have tapped the technology for use with relational databases, but Lee says there are also several thriving use cases in data warehousing environments. “One that we’ve seen with our customers is that they’re actually warehousing the reports themselves, as opposed to just raw data to run additional queries on,” he explains. “In that scenario, we’ve found that they’re archiving out the reports that are generated off the warehouse. When they’re needed, they’ll first check the warehouse, then they’ll check the archive, and then if it’s not there, they reconstruct the report from scratch.”

In data warehousing environments, especially, data retention is critical: What is a data warehouse if not a platform for the consolidation of historical data? In this respect, Lee says, data archiving has a very good story to tell. “Assuming the type of application, a data warehouse tends to keep two or three years of data [on hand], but many [data warehouses] have been around for six or seven years,” he comments. “As customers have found with warehouses, they need to retain that data, because it factors into decision-making. But do they need to retain all of it online, in the data warehouse? One thing they can do is archive it out the same way they do with OLTP systems, although it’s a little different.”

In at least one case, Lee says, a customer created an additional data warehouse to function as a destination for archived data. Call it meta archiving, if you will: “They actually have multiple warehouses, but they archive to an archive warehouse where they just store everything. It’s a low volume, not a souped-up server, it’s a way to streamline the mainline warehouses.”

Princeton Softech markets data archiving products for all prominent relational database products—DB2 UDB, Informix, Oracle, SQL Server, and Sybase—along with mainframe DB2. The company also offers archiving products for CRM systems from Amdocs Ltd. (ClarifyCRM) and Siebel Systems Inc., along with ERP systems from Oracle Corp. and PeopleSoft Corp. The company says it provides native drivers for a variety of near-line and off-line storage resources from EMC Corp., IBM Corp., and other vendors.

About the Author

Stephen Swoyer is a Nashville, TN-based freelance journalist who writes about technology.