Parallelism
This is the fourth column I’ve written about NT scaling, but some recent e-mails from readers made me think some more about the topic.
Microsoft Corp. uses an approach to databases that is imprecisely called "shared-nothing." I call it "uncoordinated parallelism." The concept is to split a massive database into partitions, put a transaction router in front of the partitions, and route client requests to the correct partition. Each partition operates essentially independently of the other partitions. This works great for transactions that operate on only one partition but becomes really complicated for transactions that span multiple partitions.
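To make the routing idea concrete, here is a minimal sketch in Python. It assumes the router picks a partition by hashing a record key; the class name, the key format and the hash choice are all hypothetical illustrations, not a description of how any Microsoft product does it.

```python
import zlib


class TransactionRouter:
    """Hypothetical router: maps each record key to one partition."""

    def __init__(self, partition_count):
        # Each partition is modeled as an independent in-memory store.
        self.partitions = [dict() for _ in range(partition_count)]

    def partition_for(self, key):
        # Assumption: a stable hash of the key selects the partition.
        return zlib.crc32(key.encode("utf-8")) % len(self.partitions)

    def partitions_touched(self, keys):
        # The distinct partitions a transaction would have to visit.
        return sorted({self.partition_for(k) for k in keys})

    def is_single_partition(self, keys):
        # The easy case: everything lives in one partition.
        return len(self.partitions_touched(keys)) == 1


router = TransactionRouter(partition_count=4)
touched = router.partitions_touched(["savings:1001", "checking:1001"])
```

If `touched` holds more than one partition number, the transaction falls into the complicated multi-partition case.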
Another approach involves a single database with lots of systems hitting it simultaneously, using a clusterwide lock manager to keep things straight. I call this "coordinated parallelism." Digital is building a lock manager for NT and has offered this technology for at least 15 years with other operating systems. With this approach, the disk farm and database servers are interconnected via some private high-bandwidth bus. When a request arrives from a client, the server with the most free capacity at that instant handles it.
The database server that will handle the transaction needs to lock the area of the database on which it wants to operate so that no other nodes mess with it until the operation finishes. From a developer’s point of view, this means calling a system service to request the lock, running some code to do the operation, and calling another system service to release the lock.
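The request, operate, release pattern looks roughly like the following Python sketch. The `LocalLockManager` class is a hypothetical in-process stand-in for a clusterwide lock service, built on ordinary thread locks; a real cluster lock manager would expose its own system services.

```python
import threading
from contextlib import contextmanager


class LocalLockManager:
    """Hypothetical in-process stand-in for a clusterwide lock service."""

    def __init__(self):
        self._guard = threading.Lock()
        self._locks = {}

    def _lock_for(self, resource_name):
        # One lock object per named resource, created on first use.
        with self._guard:
            return self._locks.setdefault(resource_name, threading.Lock())

    @contextmanager
    def locked(self, resource_name):
        lock = self._lock_for(resource_name)
        lock.acquire()          # request the lock
        try:
            yield               # caller does the operation here
        finally:
            lock.release()      # release the lock, even on error


balances = {"savings": 500, "checking": 100}
manager = LocalLockManager()


def withdraw(account, amount):
    with manager.locked(account):
        if balances[account] < amount:
            raise ValueError("insufficient funds")
        balances[account] -= amount
```

Releasing in a `finally` clause matters: a failed operation must not leave the resource locked for every other node.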
Under the hood, the algorithm to grant a lock on a resource involves using up to three nodes: a requesting node, a directory node and possibly an owning node. The algorithm also depends on an agreed-upon naming scheme for any resources for which anyone will want a lock. On the basis of the resource name, the requesting node uses a hashing scheme to determine a directory node name. The requesting node then queries the directory node to find out if any other node has a lock on (or "owns") the resource. If so, this is the owning node, and the requesting node and owning node negotiate to determine if the owning node can grant the lock to the requesting node. If no other node owns the resource, the requesting node becomes the owning node.
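The three-node lookup can be sketched in a few lines of Python. This is a simplification under stated assumptions: the node names, the CRC-based hash and the "refuse while someone else holds it" negotiation rule are all hypothetical, and every lock is treated as exclusive.

```python
import zlib


class Cluster:
    """Hypothetical model of directory-based lock lookup."""

    def __init__(self, node_names):
        self.node_names = list(node_names)
        # Each node keeps a directory: resource name -> owning node.
        self.directory = {name: {} for name in self.node_names}

    def directory_node_for(self, resource):
        # Every node hashes the agreed-upon resource name the same way,
        # so all nodes arrive at the same directory node.
        index = zlib.crc32(resource.encode("utf-8")) % len(self.node_names)
        return self.node_names[index]

    def request_lock(self, requester, resource):
        """Return (granted, path of nodes consulted)."""
        dir_node = self.directory_node_for(resource)
        entries = self.directory[dir_node]
        owner = entries.get(resource)
        if owner is None:
            # Nobody owns it: the requester becomes the owning node.
            entries[resource] = requester
            return True, [requester, dir_node]
        if owner == requester:
            return True, [requester, dir_node]
        # Assumption: negotiation is reduced to a plain refusal.
        return False, [requester, dir_node, owner]

    def release_lock(self, owner, resource):
        entries = self.directory[self.directory_node_for(resource)]
        if entries.get(resource) == owner:
            del entries[resource]


cluster = Cluster(["A", "B", "C"])
granted, path = cluster.request_lock("A", "account:1001")
```

The first request involves only the requesting node and the directory node; the owning node enters the picture only when the resource is already held.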
I lobbied for Microsoft to consider using clusterwide locking in a future release of SQL Server. I also said this approach would always be slower than the uncoordinated approach because of the inherent lock traffic involved. A reader took me to task and pointed out that even with the uncoordinated approach, the transaction router or somebody must take care of transactions that involve multiple partitions. So maybe clusterwide locking is not always slower than uncoordinated parallelism.
For example, suppose I want to transfer money in my bank from my savings to my checking account. Under the uncoordinated model, assume my checking account lives in a different partition than my savings account. In the best case, the transaction flows something like this:
Client → Tx Router → Savings Partition → Tx Router → Checking Partition → Tx Router → Client.
Note that each of these jumps uses the network, and the more partitions involved, the more jumps between the Tx Router and appropriate partition. As a developer, I need code on the client to initiate the transaction, and code on the Tx Router to store partial results from each partition and handle sophisticated journals so I can roll back if one of the database partitions suddenly dies. I also need to design an intelligent partitioning scheme to minimize the number of cross-partition transactions.
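Here is a minimal Python sketch of the bookkeeping the Tx Router takes on for a cross-partition transfer. The `Partition` class, the `PartitionDown` exception and the undo journal are hypothetical; a production router would also have to cope with a partition dying during the rollback itself, which this sketch ignores.

```python
class PartitionDown(Exception):
    """Raised when a partition cannot be reached."""


class Partition:
    """Hypothetical partition holding some account balances."""

    def __init__(self, balances, alive=True):
        self.balances = dict(balances)
        self.alive = alive

    def adjust(self, account, delta):
        if not self.alive:
            raise PartitionDown(account)
        self.balances[account] += delta


def transfer(source, source_account, target, target_account, amount):
    """Move money across partitions; undo partial work on failure."""
    journal = []  # undo entries for every step that succeeded
    try:
        source.adjust(source_account, -amount)
        journal.append((source, source_account, amount))
        target.adjust(target_account, amount)
        journal.append((target, target_account, -amount))
        return True
    except PartitionDown:
        # Roll back in reverse order using the journal.
        for partition, account, undo_delta in reversed(journal):
            partition.adjust(account, undo_delta)
        return False


savings = Partition({"savings": 500})
checking = Partition({"checking": 100}, alive=False)
completed = transfer(savings, "savings", checking, "checking", 200)
```

With the checking partition down, the debit from savings is undone and `completed` is `False`.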
Under the clusterwide locking model, the worst case transaction flows like this:
Client → Database Server → Directory Node → Database Server → Owning Node → Database Server → Client.
Note that the traffic among the database server, directory node and owning node flows across a private interconnect, not the network. As a developer, I need code on the client to request the lock, do the work and release the lock. I don’t need to worry about partitioning schemes or storing and journaling partial results across partitions because there are no partitions. I also minimize my network load, regardless of how complex the transaction.
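The two flows can be compared by counting jumps, as in this rough Python sketch. The formulas come straight from the two flows above; the function names are hypothetical, and the idea that each additional lock costs another four interconnect jumps is my assumption for illustration.

```python
def uncoordinated_hops(partitions_touched):
    # Client -> Tx Router and Tx Router -> Client, plus one round trip
    # between the Tx Router and each partition, all on the network.
    return {"network": 2 + 2 * partitions_touched, "interconnect": 0}


def coordinated_hops(locks_needed=1):
    # Client -> Database Server and back travel on the network.
    # Worst case per lock: round trips to the directory node and the
    # owning node, both on the private interconnect (assumption:
    # every lock costs the full four jumps).
    return {"network": 2, "interconnect": 4 * locks_needed}


two_partitions = uncoordinated_hops(2)
one_lock = coordinated_hops(1)
```

For the savings-to-checking transfer, both models make six jumps, but under clusterwide locking only two of them cross the network.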
Unless I’m missing something, I can’t come up with any nontrivial scenario where the uncoordinated approach wins. So until somebody convinces me differently, I’ll keep lobbying for clusterwide locking.
--Greg Scott, Microsoft Certified Systems Engineer (MCSE), is president of Scott Consulting Corp. (Eagan, Minn.). Contact him at [email protected].