In-Depth
Cautious Optimism on SQL Server 2005
Changes on tap in SQL Server 2005 worry some BI and DM pros
When Microsoft Corp. finally ships SQL Server 2005, enterprise programmers—many with little or no experience programming directly to a SQL Server database or data warehouse—are going to get a crack at doing just that. That worries some business intelligence (BI) and data management pros, who argue that clumsy programming by inexperienced code jockeys could seriously impact the performance and reliability of SQL Server data warehouses
Today, coding for SQL Server 2000 typically involves one of three approaches: learning Transact-SQL (T-SQL); using ADO.NET clients or other bridging technologies to build middle-tier and front-end applications; or using a third-party Common Language Runtime (CLR) assembly that isn’t natively integrated into SQL Server 2000.
In SQL Server 2005, Microsoft drops its .NET CLR directly into the SQL Server database itself, right alongside T-SQL. This means many things to many people, of course: database administrators (DBA) and some BI pros typically react in horror (citing the potential for disaster), while enterprise developers adopt quite the opposite perspective.
There’s good reason for this. In SQL Server 2005, enterprise developers can, for the first time, use the languages they’re most comfortable with—such as Visual Basic .NET, C++, or C#—to code stored procedures, database triggers, and other programs directly in the database tier.
What’s more, they can do so without (at least in theory, anyway) suffering any performance loss relative to T-SQL. This ups the ante over competitors IBM Corp. and Oracle Corp.—both have dropped out-of-process (i.e., non-native) CLR implementations into their flagship databases.
Although IBM and Oracle have, arguably, beat Microsoft to the punch, Microsoft’s native (in-process) CLR implementation should be both faster and (by virtue of its support for ADO.NET 2.0) more robust.
“The key difference is that SQL Server hosts the .NET runtime internally, so that any managed code within a database effectively treats the SQL Server process as if it were the operating system in terms of thread scheduling, garbage collection, and so on,” says Graeme Malcolm, a SQL server programmer and principal technologist with technical training and content development specialist Content Master. “The point of this exercise is to allow you to develop database objects such as stored procedures, functions, triggers, and types in managed code instead of T-SQL.”
This is music to the ears of enterprise developers, who are used to thinking in terms of a programming paradigm (procedural software development) that differs radically from the set-based orientation on which T-SQL is premised. Because of the conceptual differences between T-SQL and procedural languages such as Visual Basic .NET, or C#, the learning curve for enterprise coders unversed in the intricacies of T-SQL is rather steep. This is a real-world constraint in which DBAs and BI pros—who are typically more concerned than most about the data residing in their relational databases or data warehouses—have long taken comfort.
“There will always be a learning curve for procedural developers coming into the world of SQL. It takes time to learn to think in terms of sets and abstractions instead of doing row-by-row processing,” says Adam Machanic, a database software engineer with a telecommunications and broadband services provider based in the Northeast. “But the rewards are quite great once you get there. And I don't believe that SQL CLR will ever be able to totally eliminate that advantage.”
As a result, Machanic and other data management pros admit to having concerns about SQL Server 2005’s native CLR. Their argument, for the record, isn’t without merit. “T-SQL is a great language for data access. It is a set-based language and as such, operates on collections of data rather than [on] a single row at a time as do procedural languages such as C# and VB.NET.”
For example, says Malcolm, assume that a user needs a quick way to solve a common BI problem—e.g., identifying which customers have placed orders in the last two months. If you think about this problem in terms of a Venn diagram—i.e., two overlapping circles, with a point of overlap marking the solution set of the problem—you’ve got a good illustration of how set-based programming works. “The area where the circles overlap represents the customers who have placed orders in the past two months. Essentially, the Venn diagram is just a visual representation of a set-based operation; specifically the intersection of a set of customers and a set of orders,” he says.This is the kind of thing T-SQL is designed to do. “To do the same task ‘purely’ in a procedural language, you'd have to get the first customer, find the first order made by the customer, check the date, if the date is in the past two months add the customer to the results, otherwise find the next order and repeat until an order in the last two months is found or there are no more orders for that customer, and then move onto the next customer,” Malcolm says.
The danger, data management pros say, is that programmers who are unfamiliar with T-SQL will almost certainly opt to use procedural languages to incorporate query capabilities into their applications—even when this clearly isn’t the best idea. “These guys are already writing cursors because they do not understand declarative, set-oriented programming. Why would they stop when they have languages that do not support set operations?” asks SQL consultant Joe Celko, author of Joe Celko’s SQL Programming Style. “Even worse, they will assume that their native language is universal and start modifying data with their rules. MOD() functions are not the same in various languages; C# and VB disagree about (0,+1) or (0 -1) for BOOLEAN values. … It just keeps getting worse.”
For this reason, Machanic, a SQL Server MVP, says T-SQL will continue to be the language of choice for most SQL Server pros. “[I]t can do certain operations a lot faster [than C# or Visual Basic .NET].”
Most data-management pragmatists concede, however, that there are circumstances in which it makes sense to give programmers direct access to SQL data. “I suspect … database developers will still be able to—and should—implement most of the data tier logic they'll ever need in T-SQL,” says Content Master’s Malcolm. “However, there will always be some scenarios where T-SQL doesn't provide the necessary flexibility, or where external resources are required, and this is where managed code in the database makes sense.”
About the Author
Stephen Swoyer is a Nashville, TN-based freelance journalist who writes about technology.