Q&A: Private Clouds Can Speed Data Access
A private cloud solution lets customers themselves access data in giant data centers almost immediately after it has been collected, addressing a long-standing need in the retail sector.
- By Linda Briggs
- 01/20/2010
One of the biggest challenges facing retailers is how to analyze the huge amounts of data collected nearly constantly from individual customer transactions -- in which store the purchase was made, when it was made, what was bought, how it was paid for, and much more. Accessing that gold mine of data in near-real time can yield immediate, up-close insights into customer behavior. However, because of the volume, truly fast data analysis has proven to be extremely difficult.
One way to address the issue, explains Jim Mattecheck, vice president of 1010data's Retail Solutions Group, is through a proprietary back-end database program such as the one 1010data offers. The company got its start in 1993 working with data from Wall Street, a classic high-volume, high-demand setting. As Mattecheck explains in this interview, this "private cloud" approach allows customers to manipulate entire, huge datasets themselves immediately after the data has been collected.
TDWI: You've said that 1010data captures some 1.5 billion rows of transactional data daily, making it one of the largest data sets in the commercial world. What are the challenges of cloud computing specific to very large data centers?
Jim Mattecheck: With large data centers, there are usually multiple sources of data, and typically they are not all in the cloud. Therefore, marrying your internal transactional systems with the external cloud is a big challenge. It's a challenge from two angles: first, making the connections, and then, once the connections are made, dealing with performance, which is probably the biggest issue. It's like a car with eight cylinders: if six are fast and two are slow, that's going to be a problem.
In terms of performance, cloud computing vendors face a very difficult challenge in sizing for the demands of their users, since those demands are virtually unlimited. For example, look at Google and Amazon as they try to assess what the demand on their infrastructures will be when the possibilities are limitless. Scalability within the cloud is a challenge for both the customer and the vendor.
Is the challenge that systems just aren't set up for that volume of data yet?
No. Internet response time is actually not as big an issue as people think. Most large companies, and even smaller ones, have good speed to the pipe. The scalability of the Internet itself is not the bottleneck today. Instead, it's the individual nodes within the infrastructure that need to be set up to handle the appropriate volume of traffic and data.
So most of the issues are with the software itself?
Yes, especially when you're talking about 10-plus billion rows of data, as we often are at 1010data. That's a large data set. The demand for handling more and more data is growing exponentially, which means the software that moves that data around must be able to handle trillions of rows with appropriate response times.
How much of an issue is security in cloud computing?
We've addressed the security issue at 1010data; we use secure encrypted methods for collecting data.
With cloud computing in general, security really should be a bigger concern than it is. It's amazing how many people buy things over the Internet without thinking much about security.
[In purchasing a cloud computing solution,] security absolutely should be a question posed to every vendor under consideration. In a few years, I predict it won't even be an issue because all vendors will have to play in a secured environment.
So security will just be a given at that point. But we're not there yet?
I don't think so.
What about service-level agreements in the cloud? In terms of best practices, can you offer some suggestions on how SLAs should be structured to address performance and other issues?
Cloud computing actually gives companies the opportunity to set up service-level agreements that are better than SLAs set up under a traditional data warehousing environment. Most companies treat their data warehouse as a lower priority in terms of recovery, accessibility, and uptime guarantees, simply because it holds historical data rather than live transactional data.
With the cloud, you can provide service-level agreements that should be well above 99 percent uptime, 24/7, 365 days a year.
The cloud should improve on delivery to users, and it should be cheaper -- significantly cheaper, not just incrementally cheaper.
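To put the uptime figure in perspective, the short sketch below (an illustration added here, not something from the interview) converts an uptime percentage into the maximum downtime it permits per year. It assumes a 365-day year with uptime measured over the whole year; the function name `allowed_downtime_hours` is hypothetical.

```python
# Hypothetical helper: assumes a 365-day year (leap years ignored)
# and that uptime is measured over the entire year.
HOURS_PER_YEAR = 365 * 24  # 8,760 hours


def allowed_downtime_hours(uptime_percent: float) -> float:
    """Return the maximum hours of downtime per year allowed by an uptime guarantee."""
    if not 0 <= uptime_percent <= 100:
        raise ValueError("uptime_percent must be between 0 and 100")
    # Downtime is the fraction of the year not covered by the guarantee.
    return HOURS_PER_YEAR * (100 - uptime_percent) / 100


if __name__ == "__main__":
    for sla in (99.0, 99.9, 99.99):
        hours = allowed_downtime_hours(sla)
        print(f"{sla}% uptime -> {hours:.2f} hours of downtime per year")
```

Under these assumptions, a flat 99 percent guarantee still allows about 87.6 hours (roughly 3.65 days) of downtime a year, while 99.9 percent cuts that to about 8.76 hours, which is why "well above 99 percent" is the meaningful part of such an SLA.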
Regarding historical versus transactional data in the cloud, what are your thoughts on where companies should be spending money and effort?
I feel very strongly that companies should not spend a lot of money on historical analysis, because there's not a lot you can do about what happened. You want to use historical data to make better decisions in the future, [but] you shouldn't be spending a lot of money on that. Really, the amount of money people are spending on data warehousing is just ridiculous. There's simply too much money, in my opinion, spent on the care and feeding of historical data, rather than on transactional data. That's where the focus should be -- modeling and predicting the future.
How do you define what 1010data offers in terms of cloud computing? Can you describe where you fit in the market?
You can call it a dedicated cloud or the 1010data cloud. We're a dedicated environment for the customer. It's certainly the cloud, but it's a very specialized hosted service; we've been doing it for a long time.
At 1010data, we give what I call upside-down analytics. Instead of starting at the top with assumptions on what you think happened, and why, and working your way down, you start with the user and ask, what do you want to see? We go up from there -- we dive in and say, what do you have in terms of data and what do you want to know?
That's true analytics -- ask a question, get an answer, and the answer generates two or three more questions. You simply cannot do that without instantaneous response time, because you lose your train of thought.
At 1010data, we provide the ability to go to the source data very quickly. The volume of data we load in minutes is quite impressive. It's one of our secret sauces -- our ability to get data into the system quickly.