Single-Instance Storage for Mastering High-Volume Content Storage
As managing storage is ever more critical, single-instance storage can reduce your storage footprint and save you money.
by Steve Jones
With the explosion in the volume of data and documents, increased regulatory demands, and the skyrocketing all-in cost of physical storage, enterprises are reaching a crisis point in their ability to store, manage, and retrieve information. Although there are a number of solutions that can help reduce the physical space occupied by content, one of the most effective and contemporary approaches that can dramatically reduce demand is single-instance storage (SIS). In fact, this approach can offer the single largest operational cost reduction opportunity and has proven to reduce storage needs by as much as 90 percent for some enterprises.
SIS eliminates duplication and increases efficiency by keeping only one copy of content that multiple users or computers share. This technique can be implemented in file systems, e-mail server software, data backup, and other storage solutions.
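As a rough illustration of the idea, the following Python sketch keeps one copy of each distinct piece of content and lets every logical file name point at it. The class name SingleInstanceStore, its methods, and the use of a SHA-256 hash to recognize identical content are assumptions made for this example; they do not describe any particular product.

```python
import hashlib


class SingleInstanceStore:
    """Toy in-memory store that keeps one copy of each distinct content blob."""

    def __init__(self):
        self._blobs = {}   # content hash -> bytes (stored once)
        self._names = {}   # logical name -> content hash

    def put(self, name, data):
        digest = hashlib.sha256(data).hexdigest()
        # Store the bytes only if this exact content has not been seen before.
        self._blobs.setdefault(digest, data)
        self._names[name] = digest
        return digest

    def get(self, name):
        return self._blobs[self._names[name]]

    def stored_bytes(self):
        return sum(len(blob) for blob in self._blobs.values())


store = SingleInstanceStore()
attachment = b"quarterly report" * 1000  # 16,000 bytes

# Three users receive the same attachment; only one copy is kept.
for user in ("alice", "bob", "carol"):
    store.put(f"{user}/inbox/report.doc", attachment)

print(store.stored_bytes())  # 16000 rather than 48000
```

Each user still retrieves the full attachment by name, while the store holds the bytes only once.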
Beyond office documents (Word, Excel, and PowerPoint files), e-mail messages, and music and video files, the concept of single-instance storage is particularly beneficial in the world of high-volume transactional output (HVTO). HVTO is the term commonly associated with the millions of customer-facing statements, bills, invoices, and policies that enterprises produce on a regular basis.
These documents are traditionally stored in enterprise content management (ECM) systems. Although ECM vendors typically offer compression options, they fall far short of what single instancing can offer when dealing with the newer, high-resolution, graphically rich, customer-facing content that is increasingly being produced in PDF format today.
Very few ECM solutions, legacy or modern, have taken any action to reduce the storage footprint of this content beyond basic compression. Unfortunately, older compression schemes, when applied to graphically rich document output, don’t generate a satisfactory reduction in storage size. Adding to the challenge of ever-expanding content creation and increasing individual document size are the compliance and discovery mandates that require information to be retained and available for extended periods of time. Given that it can cost on average between $20 and $30 per GB per month to sustain digital storage, a cost that is then multiplied over several years of retention, this can present a monumental and ongoing problem for enterprises.
To understand how single instancing can deliver significant benefits, consider how high-volume documents are created and what they are made of. High-volume documents are produced in massive batches by corporate applications and composition engines on a recurring basis. Examples include credit card statements, utility bills, and insurance benefits forms. These are produced every billing period and either mailed or made available online through a corporate self-service channel.
Each individual document in a batch is designed to have the same look and feel. The only differentiator is the individual transaction data within the document -- such as the customer’s name and address, account number, banking line items, phone call lists, or usage statistics. The common “composition resources” -- the branding, the graphics, the fonts, the overlays, the marketing messages, and the terms and conditions -- are essentially identical across every single individual document. Increasingly, the majority of these types of statements are stored in PDF format. Each individual document -- made up of its unique transactional data and a complete set of the composition resources -- is stored inside the ECM solution. This effectively results in the common composition resources being stored over and over again for every customer.
Because many large corporations produce millions of these documents every month and must store them for online access for 13 months to seven years (or longer), it is no wonder that their storage requirements are growing exponentially. The answer to conquering this mountain of content can be found in solutions specifically designed to reduce high-volume transactional document storage.
Server-based document storage reduction (DSR) technologies work by intelligently separating the common composition resources from the unique transactional content that make up an HVTO document. This technique is known as “deduplication.” DSR technologies then store only one single “bundled” copy of the common composition resources for any given batch of documents. This dramatically decreases the individual document size that has to be stored. When this technique is applied across a large set of like documents, the savings can be substantial.
Pointers embedded within each transactional document link it to the appropriate common composition resource bundle. When the document is retrieved from the archive, the two pieces are integrated in real time and the complete document is reconstituted exactly as it was originally designed.
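The separate, store-once, and reconstitute cycle can be sketched in a few lines of Python. This is a simplified model under stated assumptions: the function names store_document and retrieve_document are hypothetical, the pointer is a SHA-256 hash of the resource bundle, and reconstitution is shown as plain byte concatenation. Rebuilding a real PDF would require format-aware processing.

```python
import hashlib

bundles = {}   # bundle id -> shared composition resources (stored once)
archive = {}   # document id -> (bundle id, unique transactional data)


def store_document(doc_id, resources, transactional_data):
    """Store the shared resources once and keep only a pointer per document."""
    bundle_id = hashlib.sha256(resources).hexdigest()
    bundles.setdefault(bundle_id, resources)
    archive[doc_id] = (bundle_id, transactional_data)


def retrieve_document(doc_id):
    """Follow the pointer and recombine the two pieces at retrieval time."""
    bundle_id, transactional_data = archive[doc_id]
    return bundles[bundle_id] + transactional_data


# One batch: identical branding, fonts, and terms; unique data per customer.
resources = b"<logo><fonts><terms>" * 500
for n in range(1000):
    store_document(n, resources, f"account {n}".encode())

print(len(bundles), len(archive))
print(retrieve_document(42) == resources + b"account 42")
```

In this model the batch of 1,000 documents holds the common resources once, and every retrieved document is byte-for-byte identical to what was submitted.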
To determine how much can be saved through this approach, first estimate the current storage cost: multiply the number of documents in a given batch by the average size of each document, by the number of times the batch is produced per month, and by the average cost of storage per month ($25/GB/month has proven to be an acceptable, fully burdened storage cost based on analyst and customer engagements). Then multiply that cost by the expected percentage reduction in stored size.
Following is a real-life ROI estimate for an individual HVTO document type produced by a national insurance organization. The reduction in size of stored content is dependent upon the composition level and percent of duplication. In this case, following technical document analysis, an 83 percent reduction was determined -- although that could be higher.
2,000,000 docs/day X 70 KB/doc X 22 days/month X $0.000025 per KB/month = $77,000 per month fully burdened storage cost
$77,000 X 83% reduction in storage = $63,910 savings per month
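The estimate above can be reproduced with a short Python calculation. The function name monthly_storage_cost is made up for this sketch, and it assumes decimal units (1 GB = 1,000,000 KB), which is what the $0.000025 per KB rate implies for a $25/GB cost.

```python
def monthly_storage_cost(docs_per_day, kb_per_doc, days_per_month, cost_per_gb_month):
    """Fully burdened monthly cost of storing one month's document output."""
    kb_per_month = docs_per_day * kb_per_doc * days_per_month
    gb_per_month = kb_per_month / 1_000_000  # decimal units: 1 GB = 1,000,000 KB
    return gb_per_month * cost_per_gb_month


cost = monthly_storage_cost(
    docs_per_day=2_000_000,
    kb_per_doc=70,
    days_per_month=22,
    cost_per_gb_month=25.0,
)
savings = cost * 0.83  # 83 percent reduction from the document analysis

print(f"${cost:,.0f} per month, ${savings:,.0f} saved per month")
# $77,000 per month, $63,910 saved per month
```

Changing the inputs shows how sensitive the result is to document size and to the reduction percentage found in the analysis.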
One important consideration when applying the SIS approach to HVTO is that resources will often change from one statement run to another. To apply resource deduplication across documents over time, the solution must be able to extract and manage the different versions of the resource bundles used.
There is no question that the exponential growth of content is putting pressure on enterprises to manage storage resources more efficiently and cost-effectively. A number of approaches can be used to reduce an organization’s storage footprint. An increasingly popular and highly effective method that can deliver rapid ROI is implementing a single-instance storage approach. When applied properly, single-instance storage has already proven to dramatically reduce costs and improve overall ECM performance for many of today’s enterprises.
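One possible way to handle changing resources, sketched here in Python, is to identify each bundle version by a hash of its contents, so that a statement run reuses an existing version when nothing has changed and registers a new one when something has. The BundleRegistry class, its method names, and the run labels are hypothetical and serve only to illustrate the bookkeeping.

```python
import hashlib


class BundleRegistry:
    """Tracks which version of the resource bundle each statement run used."""

    def __init__(self):
        self._versions = {}  # version id (content hash) -> bundle bytes
        self._runs = {}      # statement run label -> version id

    def register_run(self, run_label, bundle):
        version_id = hashlib.sha256(bundle).hexdigest()
        # An unchanged bundle maps to an existing version and is not stored again.
        self._versions.setdefault(version_id, bundle)
        self._runs[run_label] = version_id
        return version_id

    def bundle_for_run(self, run_label):
        return self._versions[self._runs[run_label]]

    def version_count(self):
        return len(self._versions)


registry = BundleRegistry()
jan = registry.register_run("run-01", b"logo-v1|terms-v1")
feb = registry.register_run("run-02", b"logo-v1|terms-v1")  # resources unchanged
mar = registry.register_run("run-03", b"logo-v2|terms-v1")  # logo redesigned

print(jan == feb, jan == mar, registry.version_count())  # True False 2
```

Documents from the first run can still be reconstituted with their original logo after the third run introduces a new one, because each run keeps its own pointer to the bundle version it was composed with.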
Steve Jones is vice president, solutions strategy at Xenos Group (http://www.xenos.com) and is responsible for delivering strategies based on Xenos Enterprise Server, the core of Xenos’ technology offering. Steve can be reached at firstname.lastname@example.org