In-Depth
3 "Big-Data" Predictions for 2011
How will compliance-driven structured data retention affect your enterprise? Will expensive data management software fall and sustainable IT rise as a result?
By Ramon Chen, Vice President of Product Management, RainStor
Organizations in health care, communications, financial services, and other industries will see big data as a new opportunity to improve customer care, unearth business insights, control operational costs, and, in some cases, enable entirely new business models. Even so, I believe big data growth will force organizations across all industries to rethink their infrastructure and capacity plans if they are to successfully manage cost and compliance for massive structured data retention and online retrieval.
Over the past year, the concept of big data has gained considerable ground. It is one reason for the increased M&A activity, such as IBM's acquisition of Netezza and EMC's acquisition of Greenplum. However, big data is no longer just about analytics, business intelligence, and IT grappling with consolidating large, disparate data sets into more meaningful reports.
Moving into 2011 and beyond, organizations will need to retain data online and make it accessible for longer time periods because of ever more stringent compliance regulations driven by external governing bodies. Keeping data longer means infrastructure capacity is stretched physically, virtually, and economically. IT is now forced to do more with the same or fewer resources, and that pressure ultimately falls to many of the hardware and database vendors to innovate and provide solutions with a lower TCO.
Looking into our crystal ball, we see big data retention fast becoming a central theme for IT organizations in 2011.
Prediction #1: Not all data is created equal. Traditional relational database management systems will be challenged in 2011.
Traditional RDBMSs will no longer be the default solution for storing and retaining large data sets. OLTP and OLAP systems won't disappear, but in myriad cases this type of solution is simply overkill. Transactional application data will still rely on an RDBMS to support updates and edits, but the growing percentage of static historical data in these repositories will weigh on their performance and TCO.
In 2011, enterprises will recognize that offloading older, historical data to a secondary downstream online repository is a best practice that can significantly improve efficiency and reduce cost. During 2010, we already saw the rise in popularity of other flavors of repositories, including columnar, in-memory, Hadoop/MapReduce, and other NoSQL approaches made popular by Google, Facebook, and other large-scale Internet applications.
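The offload pattern described above is straightforward in principle: copy rows past a retention boundary into a downstream store, then delete them from the live system. A minimal sketch follows, using SQLite in-memory databases to stand in for the live OLTP system and the archive; the `orders` table, column names, and cutoff date are all hypothetical.

```python
import sqlite3

# Two stand-in databases: a live OLTP store and a downstream archive.
live = sqlite3.connect(":memory:")
archive = sqlite3.connect(":memory:")

live.execute("CREATE TABLE orders (id INTEGER, placed_at TEXT, total REAL)")
archive.execute("CREATE TABLE orders (id INTEGER, placed_at TEXT, total REAL)")

rows = [
    (1, "2005-03-14", 19.99),   # historical -- candidate for offload
    (2, "2010-11-02", 42.50),   # recent -- stays in the live system
]
live.executemany("INSERT INTO orders VALUES (?, ?, ?)", rows)

cutoff = "2008-01-01"  # retention boundary, set by compliance policy

# Copy cold rows downstream, then delete them from the live store.
cold = live.execute(
    "SELECT id, placed_at, total FROM orders WHERE placed_at < ?", (cutoff,)
).fetchall()
archive.executemany("INSERT INTO orders VALUES (?, ?, ?)", cold)
live.execute("DELETE FROM orders WHERE placed_at < ?", (cutoff,))

print(live.execute("SELECT COUNT(*) FROM orders").fetchone()[0])     # 1
print(archive.execute("SELECT COUNT(*) FROM orders").fetchone()[0])  # 1
```

In practice the archive would be a purpose-built retention store rather than another relational database, but the division of labor is the same: the RDBMS keeps only the rows that can still change.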
Industry sectors experiencing a combination of stringent compliance and growing big data volumes include telecommunications, where call data records and SMS/MMS messages must be retained for a minimum of seven years in most countries in the EMEA region; Japan and India also impose stringent retention requirements on all communications records for lawful-intercept purposes. Sensor data from smart-grid meters and from building management systems (such as those that control heating and lighting) also constitutes growing data sets that must be stored for three to six months or longer.
IT security vendors, including those involved with network packet data, are retaining data longer so they can develop better algorithms and models to prevent future network breaches; the volume of retained data can quickly total petabytes. Cybersecurity, which has even more stringent requirements (including longer data retention periods), is another example of a fast-growing big-data sector.
Looking ahead to 2011, expect more deployments of specialized repositories that better manage and scale big data -- especially human-activity or machine-generated data, which is historical the moment it is generated and does not require a traditional RDBMS. New systems that significantly compress and reduce the footprint of structured data, run on inexpensive commodity hardware, scale easily to manage growth, and are cloud-enabled will be in high demand across many sectors next year.
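The footprint-reduction claim above rests on a property of machine-generated data: its records are highly repetitive. A rough illustration, using generic `zlib` compression on synthetic meter readings (the record layout is invented for the example; specialized retention stores do far better with columnar layouts and value-level deduplication):

```python
import json
import zlib

# Synthetic machine-generated records: 10,000 readings cycling over
# 100 meter IDs -- highly repetitive, like most sensor or log data.
records = [{"sensor": "meter-%04d" % (i % 100), "kwh": 1.25}
           for i in range(10000)]

raw = json.dumps(records).encode()
packed = zlib.compress(raw, level=9)

# Even generic compression cuts the footprint to a small fraction.
print("raw bytes:", len(raw))
print("compressed bytes:", len(packed))
```

The point is not the exact ratio but the shape of the opportunity: data that never changes after creation can be stored in aggressively reduced form without sacrificing retrievability.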
Prediction #2: Cloud architecture deployments will grow, specifically for long-term data storage and retention.
Cloud offerings are starting to mature, and new players are entering the market. Larger ISVs are "buying their way" in through acquisition, reflecting the belief that enterprise clouds will soon take shape.
For many workloads, the cloud's compelling economics are still outweighed by security and availability concerns, but for 2011 we believe using the cloud for data storage and retention will pave the way for future cloud use cases. Application retirement is a perfect fit: the retired data is accessed infrequently, so it can take full advantage of the elastic computing capacity available.
According to recent market research from The 451 Group, a conservative estimate of the revenue generated by cloud (including SaaS) in 2010 is $8.7 billion, and the market will grow at a compound annual growth rate (CAGR) of 24 percent to reach $16.7 billion in 2013. Excluding software-as-a-service (SaaS), The 451 Group estimates the cloud market will total $964 million in 2010 and grow at a CAGR of 60 percent to reach $3.9 billion in 2013. Cloud storage will account for a significant share of that growth.
Prediction #3: Enterprises will search for sustainable storage.
Next year, enterprises will seek out storage hardware and supporting software that can scale cost-efficiently and sustainably retain big data as volumes continue to grow. Enterprises are becoming much more aware of the emissions and power consumption of their data centers, and with the big-data deluge pushing up storage needs across all industries, eco-friendly storage will become part of corporate strategy.
Although hardware manufacturers can lead the way in reducing power consumption and emissions, data retention and management software will play a critical role by logically expanding physical storage through compression and data reduction. Beyond these capabilities, how data is retrieved or extracted for on-demand use will also shape a corporation's data-retention "carbon footprint." Unstructured files can easily be located and accessed individually, but structured data must be accessible through queries that pre-filter and extract only a partial subset, without the need to download huge databases.
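The pre-filtering point above can be made concrete with a small sketch: instead of restoring an entire archived database to answer one request, a query pushes the filter to the data and moves only the matching subset. The call-record schema and dates here are hypothetical, with SQLite standing in for the archive.

```python
import sqlite3

# Stand-in for an archived call-record store (schema is illustrative).
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE cdr (caller TEXT, called TEXT, call_date TEXT)")
db.executemany(
    "INSERT INTO cdr VALUES (?, ?, ?)",
    [("555-0101", "555-0199", "2009-06-%02d" % (d + 1)) for d in range(30)],
)

# Pre-filter at the source: only the requested date range leaves the
# archive, not the full 30-row table.
subset = db.execute(
    "SELECT caller, called, call_date FROM cdr "
    "WHERE call_date BETWEEN ? AND ?",
    ("2009-06-15", "2009-06-20"),
).fetchall()

print(len(subset))  # 6 of 30 rows extracted
```

Scaled up from 30 rows to petabytes, the difference between shipping a subset and shipping the whole repository is exactly the energy and bandwidth cost the article flags.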
Ramon Chen is vice president of product management at RainStor. You can contact the author at ramon.chen@rainstor.com.