Q&A: Why Big Data Security in the Cloud is No Small Matter
Big data and cloud computing are two top-of-mind technologies. What benefits can IT enjoy by bringing them together?
Big data is getting bigger all the time. Cloud computing is garnering increasing interest from IT. Are these two technologies bound to come together, and if so, what benefits do they offer the data center?
To learn more about the convergence of Big Data and the cloud, we turned to Rand Wacker, vice president of product management for CloudPassage, where he helps enterprises adopt disruptive cloud computing services by solving the security and compliance problems that arise when IT moves from private data centers to public cloud services.
Enterprise Strategies: What business and operational benefits of cloud computing match the goals of big data users?
Rand Wacker: Let’s start with two benefits of cloud computing that match well with the goals of big data users.
The first one concerns budget and the way it is spent. Without having to invest in hardware, software, and maintenance up front, your cloud can scale up or down based on actual demand for computing resources, and usage is billed on a pay-as-you-go basis. Instead of provisioning your computing infrastructure for peak load -- and then some -- you can cover the base load with your own private cloud and either lease capacity for the spikes or burst out into the public cloud when additional temporary resources are needed, as is often the case with big data projects. This can be more cost-effective than the classic on-premise data center because you pay only for the resources you actually consume.
The second benefit is business agility (speed to deployment). Cloud infrastructure providers are highly automated and make it possible to provision computing resources at the push of a button. It can take you only minutes to spin up hundreds or even thousands of cloud servers. Compare that with the time it took in the past to request this from your IT department.
Additionally, many cloud providers now offer special feature and pricing plans that allow customers to optimize their deployments for availability and compute cost in real time. On-demand spot pricing and CPU bidding allow companies to further reduce compute costs when analysis jobs don’t need to be completed in real time.
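As a back-of-the-envelope illustration of the pay-as-you-go argument above, the sketch below compares peak-provisioned on-premise cost with metered cloud cost for a bursty workload. All server counts and prices are hypothetical example values, not real provider rates.

```python
# Hypothetical cost comparison: provisioning on premise for peak load
# versus paying per hour in the cloud for a bursty analysis workload.
# All figures are assumed example values, not real provider pricing.

HOURS_PER_MONTH = 730

def on_premise_cost(peak_servers, cost_per_server_month):
    """On premise, you must provision for peak load around the clock."""
    return peak_servers * cost_per_server_month

def cloud_cost(base_servers, burst_servers, burst_hours, hourly_rate):
    """In the cloud, you pay only for the server-hours actually used."""
    base = base_servers * HOURS_PER_MONTH * hourly_rate
    burst = burst_servers * burst_hours * hourly_rate
    return base + burst

# Example: a peak of 100 servers, but only 10 run continuously; the
# other 90 are needed for one 40-hour analysis job each month.
onprem = on_premise_cost(100, 200)    # 100 * $200 = $20,000/month
cloud = cloud_cost(10, 90, 40, 0.25)  # $1,825 + $900 = $2,725/month
print(onprem, cloud)
```

The gap shrinks as utilization rises, which is why metered billing pays off most for spiky, temporary workloads like the big data jobs described here.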
Is the future of big data in the cloud?
The cloud and the future of big data are closely intertwined. As I mentioned before, the cloud allows companies to dramatically scale their computing power up, allowing them to turn their analyses into a competitive advantage before their competition does.
Also, fueled by cloud-based SaaS (software-as-a-service) applications, many companies that did not even think of maintaining their own IT department in the past can now afford to take full advantage of the data they generate. Simply collecting and storing data can be an issue for small and midsize companies. Now there are big data SaaS analysis tools available to make sense of it, too.
What use cases of big data in the cloud are you seeing?
Many big data projects are being driven by life sciences and health-care organizations. Applications such as genome sequencing or drug-test simulations represent just this kind of temporary, high-load computing use case that is ideal for flexible public cloud environments.
Traditionally, the retail industry has been one of the areas where a lot of data was generated at the point-of-sale (POS) and the store level, and it often could not be efficiently handled with limited computing resources. E-tailers are constantly honing their recommendation engines, capturing more variables to predict customer preferences at the most sophisticated level. The cloud is an affordable place for these online retailers to turn their big data into analytics that feed their marketing strategies.
The financial services segment is another great use case for big data analysis. Whether it is credit card fraud analysis or data mining to create new financial products and services, there is a host of applications where temporary cloud resources help financial companies create innovative solutions to provide better customer service.
How do companies deploy big data in the cloud? What accounts for their success?
The on-demand nature of the cloud enables companies to very easily secure the amount of processing power they need, no matter how large or how small. Instead of building a data center dedicated to data analysis, companies can lease servers by the hour. For highly variable workloads, metered (or utility) billing matches costs to usage and can lower overall investment, especially for firms that are just beginning to leverage big data technology.
What mistakes do companies typically make in deploying big data to the cloud?
Often, individuals who aren’t part of the traditional IT organization are put in charge of the cloud deployments, and they’re usually unaware of the security and operational needs that go along with the computing resources. This is understandable given that traditional server provisioning has been done by IT and has security considerations built-in.
Those overseeing a deployment to the cloud, for whatever purpose, must understand their responsibility in the operation of their servers, as stated by their cloud provider. Furthermore, they must know their company’s security policies and understand how to apply those to the cloud servers they are using for analysis.
What are the threats in cloud environments?
Due to the lack of perimeter controls such as firewalls and other security systems that protect traditional data centers, cloud environments are at significantly increased risk of being hacked. A cloud server left open to compromise and then replicated into a hundred clones during cloud bursting multiplies the attackable surface area and increases the exposure by an order of magnitude.
There are criminal organizations in all corners of the world operating large-scale fraud campaigns using massive botnets built of huge numbers of personal computers and servers on the Internet. These botnets turn the machines they capture into “zombies” or command-and-control drones via remote control software, rootkits, and other malware.
Zombies are then robustly networked to create botnets of massive proportions to carry out fraud, phishing, and denial of service attacks. Bot herders routinely target the IP address ranges of public cloud providers. They know that public cloud servers are softer targets without the layers of perimeter security found in a data center. They also know that the chances of the server being replicated once compromised are high, meaning easy growth of their botnet capacity. Without appropriate security controls, cloud servers will come under attack, sometimes within minutes of launching.
How is security different in the cloud? Why can't traditional security approaches be used?
For decades, data centers have relied on strong perimeter controls to prevent server weaknesses from being exploited. Relatively lax enforcement of security standards was tenable in these environments because servers were safe behind the corporate firewall. A completely new approach is required for securing cloud servers. Without defined perimeters or security choke points, elastic cloud environments are much more difficult to secure. Security mechanisms need to expand, contract and automatically update along with the cloud server environment that changes dynamically. Every single server has to be rigorously hardened before it can be exposed to public cloud threats.
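As one illustration of per-server hardening, each cloud server can enforce its own default-deny firewall policy instead of relying on a perimeter device. The sketch below generates host-based iptables rules from an allowlist; the service names and ports are illustrative assumptions, not a complete hardening checklist.

```python
# A minimal sketch of host-based firewall policy generation, assuming
# every cloud server enforces its own rules rather than trusting a
# perimeter firewall. Services and ports here are example values.

ALLOWED_INBOUND = {
    "ssh":   ("tcp", 22),
    "https": ("tcp", 443),
}

def iptables_rules(allowed):
    """Build a default-deny inbound rule set from an allowlist."""
    rules = [
        "iptables -P INPUT DROP",              # default deny everything
        "iptables -A INPUT -i lo -j ACCEPT",   # permit loopback traffic
        "iptables -A INPUT -m state --state ESTABLISHED,RELATED -j ACCEPT",
    ]
    for name, (proto, port) in sorted(allowed.items()):
        rules.append(
            f"iptables -A INPUT -p {proto} --dport {port} -j ACCEPT  # {name}")
    return rules

for rule in iptables_rules(ALLOWED_INBOUND):
    print(rule)
```

Because the policy is generated from data rather than hand-edited, it can be reapplied automatically as servers are cloned or burst into the public cloud, matching the dynamic environments described above.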
Who holds the responsibility for each layer of security in the cloud?
IaaS (infrastructure-as-a-service) cloud providers state in their SLAs that they share responsibility for security with their customers. This shared-responsibility model is new and not well understood; many who are new to cloud computing incorrectly assume the provider will handle all of the security. In a recent survey, CloudPassage found that 31 percent of respondents believe that their cloud provider will take care of securing their cloud servers.
In reality, IaaS cloud providers are responsible for securing their physical infrastructure, compute infrastructure, and hypervisor software, but the end customer holds responsibility for their virtual servers, including the operating system, application stack, and data they contain.
Regardless of job title, anyone spinning up servers in the cloud must be sure to collaborate with their IT and security departments to understand what needs to be enforced on the cloud servers to protect the very valuable data and algorithms they are processing.
Does Big Data security differ between public, private, and hybrid cloud environments?
Only if you are running your private cloud, hosted on your premises, do you have the same controls you have today in your own data center. In a hybrid or public cloud model, you have to secure each server before it gets exposed to the public.
Who is running the processing systems in the cloud? Who should be running them?
Very often, business analysts, marketing teams, or other non-IT groups are running the analysis systems in the cloud. One of the great benefits of the cloud model is that any group can provision their own servers, but they must be aware of the risks and their responsibilities.
What products or services does CloudPassage offer that are relevant to our discussion?
CloudPassage offers Halo, a security platform based on a new architecture designed for dynamic cloud environments. It allows companies to automatically secure all servers, regardless of whether they are in the public, private, or hybrid cloud, even in their own data center. Companies have the flexibility to port their big data projects securely across different cloud environments.
CloudPassage Halo is delivered as SaaS and billed by the hour, matching the cloud deployment model. It allows analysts, DBAs, and BI professionals to move their servers seamlessly across cloud providers, even to their own data center without exposing them to the public. For companies just starting out in the cloud, Halo Basic offers comprehensive server protection for free for up to 25 servers.