Q&A: Venture Capital Investor Sees Solid Future for Cloud BI, DW

Business intelligence delivered via software-as-a-service (SaaS) seems to be on the cusp of a wave of popularity, but it also brings some risks, as the recent demise of SaaS BI company LucidEra indicates. What's the future of SaaS BI and data warehousing in the cloud, and how can companies mitigate the risk?

The appeal of the cloud is catching on in the business intelligence and data warehousing world, where cloud-based data warehouses and software-as-a-service (SaaS) BI can offer clear benefits. Those benefits include lower initial capital expenditures -- especially attractive in a weak economy -- a shorter time to deployment, and options for collaboration between divisions and even across company borders.

As a managing director investing in information technology for Palo Alto-based venture capital firm Trident Capital, Evangelos Simoudis is admittedly bullish about SaaS BI and cloud-based data warehousing. In this interview, he explains why, as well as where he sees SaaS BI headed despite the recent demise of one of the first SaaS BI companies. He also discusses how companies can mitigate the risk of moving data warehousing to the cloud.

Simoudis, who has more than 25 years of experience in information technology, holds a Ph.D. in computer science from Brandeis University and a B.S. in electrical engineering from Caltech. He joined Trident in 2005 and invests in Internet and software businesses. He is a director on the boards of two BI companies, Pivotlink and Host Analytics.

BI This Week: As a venture capitalist watching technology markets, what do you see as the immediate and long-term future for BI offered as an off-premise service?

Evangelos Simoudis: Clearly, there's a renaissance of sorts with BI and analytics in general. Businesses, regardless of industry sector or size, are becoming analytics driven. Online businesses in particular have led the way in demonstrating the effective use of data to make meaningful decisions.

Seeing how online businesses are analyzing their data with good results (for example, Amazon's use of customer activity data to make its product recommendation engine more effective), other corporations now want a quick path to analytic information. Unfortunately, though progress has been made in the 25 years that BI solutions have been available, the reach of BI within corporations is still limited. Even within the Global 2000, estimates are that, on average, only 20 percent of the corporate users who might benefit from BI and analytics actually use any type of formal BI solution.

We can attribute this low adoption rate to either high cost or the perceived complexity of most solutions. Smaller companies, in particular, often lack the financial resources to purchase a costly solution or the IT resources to properly deploy and support it.

Even with the slower adoption rate for formal BI solutions, corporations usually have dozens or even hundreds of small ad hoc data marts in house, built using inexpensive databases such as MySQL and Microsoft Access, often accessed through Microsoft Excel. These applications are typically maintained by their business owners rather than by IT personnel, because while their content is valuable, they aren't viewed as strategic enough to warrant IT resources.

SaaS BI solutions can effectively address all of these less-than-ideal situations. Such solutions are simpler to deploy, require few corporate resources to manage and maintain, and have a lower total cost of ownership (TCO) than on-premise BI solutions. As a result, SaaS BI solutions are emerging as a strong alternative for companies of any size that aim at analytics-driven decision-making.

Here are just a few of the unique advantages that SaaS BI solutions offer over their on-premise counterparts:

The wisdom of the crowds: By monitoring the use of a SaaS BI solution, the vendor can (a) dynamically optimize it to improve performance and (b) determine which features are being used effectively, which features present difficulty to the user although users find them important (thus enabling the vendor to offer training), and which features aren't being used at all, enabling the vendor to remove them from the application and avoid maintaining them over time.

Customer metrics: This information allows the vendor to make immediate, incremental improvements to the product. That is difficult to do when the vendor's solution is on-premise with each customer. The bigger point here is that vendors of on-premise solutions don't know how their software is being used. SaaS vendors, in contrast, can closely monitor usage and even provide user metrics to customers in the way Nielsen and comScore do.

Leveraging collective knowledge across divisions and companies: Companies whose data is in cloud-based warehouses can easily choose to share or combine certain data sets with other companies using the same vendor, thus enabling collaborative problem-solving. Employees within a company but separated geographically or by division can more easily collaborate with colleagues and partners by looking together at a single view of internal and external data, all housed in the cloud.

Decisions that are most effective in today's business world are often based on data blended from inside and outside the company. I'm already seeing early examples of this type of collaboration, and I expect collaborative BI will accelerate in the next five years; SaaS solutions will be the catalyst. Specific analysis results and associated data can also be tagged (using systems like the ones found today in consumer Internet products such as Yahoo's Delicious). Tags can be searched and shared, enabling yet another level of information use and collaboration among users.

Given those advantages, the prospects for SaaS BI seem very good. However, what does the recent demise of SaaS BI company LucidEra indicate?

I continue to consider the near- and long-term prospects of SaaS BI solutions to be extremely bright. LucidEra's recent failure is no different from what we investors see in every other early market opportunity: even though BI has been around for many years, SaaS BI is relatively new. In new markets, some companies succeed and some fail.

I'm sure LucidEra's failure is causing companies that are already using SaaS BI solutions, or are planning to use them, to re-examine the soundness of their decision. However, my prediction is that this will only serve to improve how companies interact with their SaaS application vendors, including the ones that provide BI solutions. That is, companies should become more proactive about asking for detailed specifications on the vendor's data policies, disaster recovery policies and plans, and protection of customer data should the vendor fail.

How should organizations think about data warehouses in the cloud?

Cloud-based data warehouses offer smaller companies their first true opportunity to take full advantage of data warehousing in a cost-effective manner. By the same token, large companies are interested in the potential of cloud-based data warehousing solutions to speed up the creation of data marts from enterprise data warehouses. They can do this while reducing development time and maintenance costs, thus improving the "time to value" of data by making data available to analysts faster.

Finally, cloud-based data warehouses can provide companies with elastic capacity to help address the usage spikes that are common with mission-critical data warehouses and marts.

What sorts of concerns do you hear from IT executives about cloud-based data warehouses?

I'm hearing a number of legitimate concerns, which vendors will need to address in order to move cloud-based data warehouses forward.

Security: As prominent corporations create cloud-based data warehouses and draw visibility to the technology, CIOs fear that the cloud will become a prime target for hackers who would want to exploit the existence of large databases with prized corporate data, all in a single place.

Data Integrity: IT must guarantee that the data stored in a company's operational databases is synchronized with reasonable frequency with the data stored in the cloud-based data warehouse to ensure the cloud-based DW has the correct data to drive analyses.

Data Ingestion Throughput: This is not an issue in the short term, but will become more important as ever-larger data warehouses and marts are implemented. The size of the data sets and the time it takes to ingest them will become issues around which IT organizations will need to obtain guarantees. This issue will also need to be considered if a cloud-based warehouse is used to provide elastic capacity to address usage spikes.

Service-level agreements (SLAs): The cloud data warehouse vendor must be able to commit to SLAs covering uptime, query response time, and disaster recovery. For backup and recovery in particular, any cloud-based software vendor, including those that provide data warehousing services, should have automatic failover capability to ensure the customer won't experience any service interruptions. In addition, and depending on the critical nature of the analytic applications supported by the cloud-based DW, the customer may want to insist that the vendor use an alternate cloud vendor for backup and disaster recovery services.

The Ability to Take Back the Warehoused Data: IT must ensure that the corporate data will not become hostage to the cloud-based DW vendor. To this end, the vendor needs to make it easy for the customer to take back its entire data set, either because of a switch in vendors or because the vendor is going out of business. The cloud-based DW vendor also must make it easy for the customer to obtain subsets of stored data to use with other analytic applications offered by other on-premise or SaaS vendors.

Warehouse Auditing: IT must be able to audit the warehouse in the cloud in the same way that it audits any on-premise data warehouse, for issues including data integrity, query efficiency, and information delivery reliability.
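
To make the data integrity and auditing concerns above concrete, here is a minimal sketch of one way an IT team might check that a table in the cloud-based warehouse still matches its operational source. It is an illustration under stated assumptions, not a method described in the interview: the function names `table_fingerprint` and `in_sync` are hypothetical, and the rows are assumed to have already been fetched from each system as tuples with identical column order and types.

```python
import hashlib

MODULUS = 2 ** 256  # keeps the combined digest at a fixed size


def table_fingerprint(rows):
    """Return (row_count, combined_digest) for an iterable of rows.

    Each row is hashed on its own and the hashes are added together,
    so the result does not depend on the order in which rows arrive.
    Hypothetical helper; assumes both systems return the same column
    order and the same Python types for each value.
    """
    count = 0
    combined = 0
    for row in rows:
        digest = hashlib.sha256(repr(tuple(row)).encode("utf-8")).digest()
        combined = (combined + int.from_bytes(digest, "big")) % MODULUS
        count += 1
    return count, combined


def in_sync(source_rows, warehouse_rows):
    """True when both sides hold the same set of rows (order ignored)."""
    return table_fingerprint(source_rows) == table_fingerprint(warehouse_rows)


# Example data standing in for query results from each system.
source = [(1, "Acme", 120.0), (2, "Globex", 75.5)]
warehouse_ok = [(2, "Globex", 75.5), (1, "Acme", 120.0)]   # same rows, new order
warehouse_stale = [(1, "Acme", 120.0), (2, "Globex", 70.0)]  # one value is out of date

print(in_sync(source, warehouse_ok))
print(in_sync(source, warehouse_stale))
```

A check like this reports only whether the two copies differ, not which rows differ. In practice it would likely be run per table or per date partition after each synchronization, so that a mismatch points to a small slice of data to investigate.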

How should companies integrate data from cloud-based transactional applications into their SaaS BI systems?

The first generation of cloud-based data warehouses are actually data marts that provide reporting either on data stored in a single application or on data extracted from existing on-premise data warehouses. For example, Salesforce and NetSuite already offer reporting functionality with their applications, and SaaS BI vendors such as Pivotlink provide solutions that analyze the data stored in SaaS applications. The next wave of cloud-based data warehouses will integrate data from several SaaS applications.

SaaS application vendors are just now starting to develop APIs that allow the type of bulk data extraction necessary for the creation and updating of data warehouses. We are also starting to see the emergence of SaaS data extraction and integration vendors (some with proprietary software and others with open source software) whose tools are used in conjunction with these cloud-based data warehouses.

Today I can think of three types of risk related to integrating data into a cloud-based DW:

  • Immature APIs: The APIs provided by SaaS applications are not yet mature, since they were designed first for transactional operations rather than analytical/warehousing operations. Because the operational databases are under the control of the vendor rather than the customer, it's up to each SaaS vendor to give the right priority to developing APIs for warehousing operations.
  • Immature ETL Tools: ETL tools for SaaS are improving in their ability to deal with data from a single data source, but are just now being tested on multiple data sources distributed across several SaaS application vendors.
  • Distributed Data: Source data is becoming distributed across SaaS application vendors with different capabilities and priorities. That means that coordinating, synchronizing, and moving the source data across the cloud and into the cloud-based data warehouse needs to be taken into account when planning such a cloud warehouse, because the warehouse's owner has less control over the entire operation than with an on-premise equivalent.

Does that mean that having federated source data across multiple SaaS application vendors makes it harder to create a cloud-based data warehouse?

As I mentioned, we are still in the very early stages of the thinking regarding cloud-based data warehouses. Companies are most commonly at the stage of warehousing data from a single source (a single SaaS application, a single on-premise application, or another on-premise warehouse).

Companies are not yet warehousing data from multiple SaaS applications -- that still represents state-of-the-art practice rather than mainstream practice. For the progression to be successful, IT organizations, in collaboration with business users, will need to exhibit new and different thinking. Will organizations be willing to devote resources to creating this type of multi-source cloud-based DW? That will depend on their overall experiences with first-generation cloud-based data warehouses. In any event, the next five years will be an active and exciting period for SaaS BI and cloud-based data warehousing.