The Pros and Cons of Storage Caching Devices

Is a caching device right for your organization?

By Seiji Shintaku, Director of Product Management, Likewise Software

Recently a friend and I were talking about inline storage devices because one of my clients was looking at NFS and LUN caching appliances. My friend stated simply that inline devices never work. He had a very convincing argument: every device he had seen implemented was eventually ripped out of the infrastructure because there were too many inherent flaws with the inline data devices. His argument was not so much about the reading of data but the writing of data.

Is it possible to get a consistent view of the world when a third party is receiving the write request and loses contact with the back-end NAS device?

A caching device raises real concerns, especially because clients can bypass it and write directly to the LUN or the NFS volume, leaving the cache with stale data. However, most devices perform synchronous write operations (i.e., the caching device will not acknowledge the write operation until the back-end SAN or NAS array has acknowledged it). If the origin volume is no longer available, write operations to the "cached" volume will fail. Read operations will continue to work for any data the caching algorithm has already pulled in.
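The synchronous behavior described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not any vendor's implementation: the class names `Origin`, `WriteThroughCache` and `OriginUnavailable` are hypothetical, and a dictionary stands in for the back-end array.

```python
class OriginUnavailable(Exception):
    """Raised when the back-end array cannot be reached (hypothetical name)."""


class Origin:
    """Stand-in for the SAN or NAS array behind the caching device."""

    def __init__(self):
        self.blocks = {}
        self.online = True

    def read(self, key):
        if not self.online:
            raise OriginUnavailable(key)
        return self.blocks[key]

    def write(self, key, data):
        if not self.online:
            raise OriginUnavailable(key)
        self.blocks[key] = data


class WriteThroughCache:
    """Acknowledges a write only after the origin has accepted it."""

    def __init__(self, origin):
        self.origin = origin
        self.cache = {}

    def read(self, key):
        # Serve from cache when possible; otherwise pull from the origin.
        if key not in self.cache:
            self.cache[key] = self.origin.read(key)
        return self.cache[key]

    def write(self, key, data):
        # Synchronous mode: the origin write must succeed before we cache it.
        self.origin.write(key, data)
        self.cache[key] = data


origin = Origin()
origin.write("a", b"v1")
origin.write("b", b"v1")
cache = WriteThroughCache(origin)
cache.read("a")              # warms block "a"
origin.online = False        # the back end drops off the network

warm_read = cache.read("a")  # still served from the cache
try:
    cache.write("a", b"v2")
    write_ok = True
except OriginUnavailable:
    write_ok = False         # the write is refused rather than silently held
try:
    cache.read("b")
    cold_read_ok = True
except OriginUnavailable:
    cold_read_ok = False     # never cached, so it cannot be served
```

In this sketch the failed write leaves the cached copy of "a" unchanged, so the cache and the origin never disagree about what was acknowledged.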

In general, write operations are expensive and complicated. With intelligent servers and NAS devices, the write operation is first cached and journaled on an internal card that has a backup battery of its own. If the server or NAS device fails, the write operation is preserved on the internal card as long as the battery is functional. When the NAS or server comes back up, the internal card replays the write operation and goes on its merry way. Eventually the data has to be written to a physical medium, which is the disk. If the whole data caching and flushing system backs up and can't flow correctly, users see slow performance or latency.

Think of it this way. If you drain water from a single sink, your sewage system may be able to drain it fast enough. However, if you increase the volume too much by draining too many sinks, showers, bathtubs and toilets into the same sewage pipe, the pipe will not be able to drain fast enough and water will back up. The same is true with a storage array. If large write operations are sustained over a period of time and data can't be flushed to disk fast enough, demand eventually backs up and the array will not be able to service requests. At that point, your users will begin to complain about performance.
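The sink analogy reduces to simple arithmetic: a write cache fills whenever data arrives faster than it can be flushed to disk. The sketch below assumes steady rates, and the figures in it are illustrative values, not measurements from any real array.

```python
def seconds_until_full(cache_mb, ingest_mb_s, flush_mb_s):
    """Return how long a write cache lasts under sustained load.

    Returns None when the array flushes at least as fast as data arrives,
    meaning the cache never fills at these rates.
    """
    net_fill_rate = ingest_mb_s - flush_mb_s
    if net_fill_rate <= 0:
        return None
    return cache_mb / net_fill_rate


# Illustrative numbers only: a 4,096 MB write cache that flushes at 400 MB/s.
overloaded = seconds_until_full(4096, ingest_mb_s=600, flush_mb_s=400)
comfortable = seconds_until_full(4096, ingest_mb_s=300, flush_mb_s=400)

print(overloaded)   # seconds of sustained writes before the cache is full
print(comfortable)  # None: the drain keeps up with the sinks
```

Once the cache is full, new writes can proceed only as fast as the disks accept them, which is the point at which users notice latency.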

Consider a business unit whose data is 80 percent read-only. A caching device will excel in a use case like this once the device is fully "warmed" up. Write operations may not benefit much, but if the business unit performs mostly read operations, a caching device will accelerate them, and the company can spend less money on the NAS device because it can purchase smaller controllers on the back-end and fewer disks. Most people purchase more disks to get greater spindle count, which equates to faster write operation speeds. This also equates to a greater capital expenditure. Given the right circumstance, a caching device can be used to scale out read operations at a lower cost.

The working set determines how well the caching device will perform. If the working set is reread and reused repeatedly, the caching device will be extremely effective. If the application pulls in one working dataset and then another and never reuses either, the caching device will actually hinder performance compared with going directly to the NAS or SAN device itself, because every request misses the cache and still has to travel to the back end.
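A small simulation shows how strongly reuse drives the result. The sketch assumes a least-recently-used (LRU) eviction policy, which is a common choice but not necessarily what any particular caching device uses; the block counts and cache capacity are made up for illustration.

```python
from collections import OrderedDict


def hit_rate(accesses, capacity):
    """Replay a sequence of block reads through an LRU cache of given size."""
    cache = OrderedDict()
    hits = 0
    total = 0
    for block in accesses:
        total += 1
        if block in cache:
            hits += 1
            cache.move_to_end(block)       # mark as most recently used
        else:
            cache[block] = True
            if len(cache) > capacity:
                cache.popitem(last=False)  # evict the least recently used
    return hits / total if total else 0.0


reused = [i % 50 for i in range(1000)]      # 50 blocks, each reread 20 times
streaming = list(range(1000))               # every block touched exactly once
oversized = [i % 150 for i in range(1500)]  # cycled set larger than the cache

print(hit_rate(reused, capacity=100))     # 0.95
print(hit_rate(streaming, capacity=100))  # 0.0
print(hit_rate(oversized, capacity=100))  # 0.0
```

The third case is worth noting during a proof of concept: a working set that is reused but does not fit in the cache can perform as badly under LRU as data that is never reused at all.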

Essentially, a caching device makes sense for reading data and can be extremely effective. If a caching device is used, place it in synchronous write mode to limit the potential for data loss. Most if not all intelligent caching devices journal their write operations, so even if you flush asynchronous writes at one-minute intervals, the likelihood of data loss is still very low.
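To see why journaling keeps asynchronous mode reasonably safe, consider the following sketch. It assumes the journal lives on storage that survives a crash (for example, battery-backed memory), and the class `JournaledWriteBack` and its method names are hypothetical, chosen only for this illustration.

```python
class JournaledWriteBack:
    """Asynchronous write cache that records each write before acknowledging."""

    def __init__(self, origin):
        self.origin = origin  # dict standing in for the SAN or NAS array
        self.journal = []     # assumed to survive a crash (e.g. battery-backed)
        self.cache = {}       # volatile memory, lost on a crash

    def write(self, key, data):
        self.journal.append((key, data))  # persist the intent first
        self.cache[key] = data            # then acknowledge to the client

    def flush(self):
        # Runs on a timer in asynchronous mode, e.g. once a minute.
        for key, data in self.journal:
            self.origin[key] = data
        self.journal.clear()

    def crash(self):
        self.cache.clear()                # the journal is deliberately kept

    def recover(self):
        self.flush()                      # replay whatever was never flushed


array = {}
device = JournaledWriteBack(array)
device.write("a", b"1")
device.write("b", b"2")
device.flush()                 # scheduled flush reaches the array
device.write("c", b"3")        # acknowledged but not yet flushed
device.crash()                 # failure inside the flush interval
missing_before = "c" not in array
device.recover()
```

After `recover()` the array holds all three writes. If the journal itself were lost, everything written since the last flush would be gone, which is why the flush interval still matters.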

Some clients choose to hold everything on the caching device for 8 to 12 hours before flushing the data to the SAN or NAS device, because the working set is busiest during working hours. When users go home, the data is written to the real storage device.

Regardless of which caching device is used, run a proof-of-concept project to analyze how the data is being used, and make minor adjustments based on what you discover. You may decide a caching device is not a good fit for your organization.

Seiji Shintaku is the director of product management at Likewise Software, where he validates product requirements with customers and OEM partners and provides guidance on storage-related applications such as auditing for PCI, SOX, FISMA, and HIPAA compliance and data governance. He has also engineered global cloud computing, storage initiatives and virtualization environments in the financial, manufacturing and bio-medical environments. You can contact the author at sshintaku@likewise.com.