Microsoft Reveals More of its Hadoop/Windows Interoperability Plans

Microsoft last week provided more information about its "big data" and Hadoop/Windows interoperability plans. The details were presented on Wednesday by David Campbell, a Microsoft technical fellow, at the Strata Conference in Santa Clara, Calif.

Hadoop, an open source MapReduce framework sponsored by the Apache Software Foundation, lets users gather business intelligence (BI) from unstructured and structured data at petabyte levels. During the talk, Campbell said that Microsoft is submitting two new proposals to the Apache Software Foundation as part of its interoperability effort.

One proposal is for a new JavaScript framework for writing MapReduce programs. The second proposal focuses on creating a new open database connectivity (ODBC) driver for Hive, Hadoop's data warehouse system.

The addition of the Hive driver will bring Microsoft's BI tools to bear on Hadoop. For instance, Microsoft is touting the use of PowerPivot for Excel, as well as Power View, which will arrive with the SQL Server 2012 product this month (Microsoft is planning a SQL Server 2012 "launch event" on Wednesday). Both PowerPivot and Power View can be used to visually display Hadoop query results.

Campbell explained that having Microsoft's BI tools for Hadoop is important because researchers need common tools to share their results. He proposed Data Explorer as one option for sharing. Data Explorer, offered via Microsoft's SQL Azure Labs, handles multiple data formats and ties into the Windows Azure Marketplace, which provides data feeds for a price. Data Explorer for SQL Azure provides "capabilities for data curation, collaboration and mashup," according to Microsoft's datasheet description.

With Hadoop, users can improve "time to insight" when sifting through masses of data, and data quantity is key. Campbell said that when using Hadoop, having more data with a less sophisticated algorithm is better than having less data with a more sophisticated algorithm.

Hadoop, largely fostered by Yahoo, has been used for social media analytics and ad analytics, among other applications. It's designed to handle big data at a low cost. For instance, it is designed to run on commodity hardware and permits queries to be conducted on big piles of data on an ad hoc basis.

Partner Efforts

Microsoft's main partner on Hadoop is Hortonworks, a key contributor to the open source effort. Hortonworks is collaborating with Microsoft on the Hive ODBC driver and the JavaScript framework. In addition, patches are being contributed to the Apache Software Foundation to enable Apache Hadoop version 1.0 on Windows Server, according to an announcement issued by Hortonworks.

Other Microsoft Hadoop partner efforts were announced last week at the Strata Conference. Microsoft is collaborating with Karmasphere to enable Karmasphere's tools on Hadoop for Windows Server and Windows Azure, including Karmasphere Analyst and Karmasphere Studio. Microsoft is also working with Datameer to help make that company's BI tools work with Hadoop on Windows Azure, according to Datameer's announcement. HStreaming and Microsoft have formed a "strategic relationship" that will enable HStreaming's real-time analytics tools to work with Hadoop on Windows Server and Windows Azure. The HStreaming effort is currently open for testing through a Microsoft Community Technology Preview program, according to HStreaming's announcement.

Microsoft has its own SQL Server technology -- StreamInsight -- that is used for complex event processing. StreamInsight could possibly be used with Hadoop MapReduce jobs during the reducer phase, according to a Microsoft blog post. The reducer step is part of a three-tier Hadoop structure. At the base is the Hadoop Distributed File System, which stores the data. Next, the data is mapped via MapReduce and processed in parallel. Finally, the mapped data undergoes a "reduce" operation, which produces a summary of the results.
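
To make the map and reduce steps described above more concrete, here is a minimal in-memory sketch of a word count job written in plain Python. It does not use Hadoop or any Microsoft tooling, and it is not the JavaScript framework mentioned earlier. The function names (map_phase, shuffle, reduce_phase, run_job) are hypothetical and chosen only for illustration. In a real Hadoop cluster, the input would be read from the Hadoop Distributed File System and the map and reduce tasks would run in parallel across machines.

```python
from collections import defaultdict


def map_phase(document):
    """Map step: emit a (word, 1) pair for every word in one input record."""
    return [(word.lower(), 1) for word in document.split()]


def shuffle(pairs):
    """Group emitted values by key, standing in for the framework's
    shuffle between the map and reduce steps (illustrative only)."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Reduce step: summarize all values emitted for a single key."""
    return key, sum(values)


def run_job(documents):
    """Run map, shuffle and reduce sequentially over a list of strings.
    A real cluster would distribute this work; here it is a single process."""
    mapped = []
    for document in documents:
        mapped.extend(map_phase(document))
    grouped = shuffle(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    counts = run_job(["big data", "big piles of data"])
    print(counts)
```

The "summary" produced by the reduce step is, in this sketch, a single count per word. Other jobs would substitute a different summary, such as a sum, an average or a maximum per key.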

"Isotope" Release Plans

Microsoft uses an internal code name ("Isotope") for its Hadoop interoperability efforts with Windows Server and Windows Azure, according to Alex Stojanovic, Microsoft's general manager of Hadoop on Azure and Windows, in a Channel 9 video. In that video, Stojanovic said that Microsoft will deliver Isotope on Azure in March 2012. In June 2012, Microsoft will deliver the general availability release of the "enterprise edition," he added. Microsoft also plans to provide "deep integration" with System Center, he said.

More specific delivery dates have surfaced through a tip given to veteran Microsoft observer Mary Jo Foley. Microsoft will deliver Hadoop on Windows Azure on March 30, whereas Hadoop on Windows Server is expected to arrive on June 29, according to Foley's source.

About the Author

Kurt Mackie is senior news producer for the 1105 Enterprise Computing Group.