Microsoft Targets Availability, Scalability
Among the phrases being generated by the Microsoft Corp. Windows 2000 marketing machine are promises of reliability, availability and scalability. These are not new terms for Microsoft but, for many users, promises made many times before and left unfulfilled.
Microsoft has historically taken an approach that hardly lends itself to rock-solid stability. The lists of third-party products certified for use with Windows NT Server, for instance, are lengthy. Yet the price paid for the broad array of peripheral and driver support, predictably, is a less reliable operating system.
Other systems in the midrange and mainframe world have achieved high levels of reliability in part by restricting the number of devices, applications and configurations that are approved for use with a given system.
Mainframe vendors, for example, have a simple formula for their complicated systems: extensively test a specified number of devices, applications and configurations; certify the ones that work well; then hammer the list down so customers will know what works and what to expect.
With Windows 2000, Microsoft has adopted these methodologies, particularly for the Advanced Server and Datacenter Server editions. Microsoft now limits the hardware, devices, drivers and software certified for use with Windows 2000 Advanced Server and Datacenter Server, and has added new features to each server that are designed to increase uptime and make crash recovery a faster process.
"Microsoft has put massive resources behind quality assurance, which makes our confidence higher than it was with NT 4.0 that Windows 2000 will be more reliable, more available and more scalable," says Laura DiDio, operating systems analyst with Giga Information Group (www.gigaweb.com). DiDio, however, says Giga’s increased confidence does not necessarily mean the consulting firm actually has a high degree of confidence in Windows 2000.
Reliability and Availability
"One of the biggest issues Microsoft faces is that they are an OS maker without control of the hardware," observes Dan Kusnetsky, operating systems analyst at International Data Corp. (www.idc.com).
Most Unix vendors, in contrast, have complete control of the hardware. These vendors make the hardware and the software, and they test everything before certifying it, thus avoiding compatibility problems.
With Advanced Server and Datacenter Server, Microsoft is adopting a certification style closer to that used by the Unix community. For instance, the company is requiring stringent hardware testing before a product is included on the Hardware Compatibility List (HCL).
"We are asking hardware vendors to go through long-haul system tests, and we’ll keep track of that on the HCL. Right now, there is a pass/fail test that isn’t dependent on time," says Michel Gambier, product manager for Windows 2000 enterprise marketing at Microsoft.
The new testing procedures require hardware to run the operating system for specific periods of time, with the duration based on how the hardware system will be used.
Microsoft also is restricting the drivers that will be supported with Windows 2000 through stricter certification, device driver verification and signatures.
One new technology, called Driver Verifier, is a mechanism that helps Windows 2000 expose errors in kernel-mode drivers and activates defenses when interacting with unstable drivers. The technology tests specific sets of error conditions, and adds other likely failure modes to the suite as they are found.
Some of the problems Driver Verifier can provoke and detect are memory corruption, extreme memory pressure from the perspective of the specified driver, double releases of spinlocks, usage of uninitialized variables and pool corruption.
The performance impact of Driver Verifier prohibits its continuous use -- so Microsoft recommends using it primarily in a nonproduction environment. It is intended for testing new drivers and configurations that are later replicated in production systems.
To ensure that device drivers loading on Windows 2000 systems are certified production-grade products, Microsoft now digitally signs the binary code of drivers that pass the Windows Hardware Quality Labs (WHQL) tests.
"As Microsoft’s products move up to the enterprise level, Microsoft has to have more control over the hardware the products run on or they literally cannot get high-availability," IDC’s Kusnetzky says.
In addition to exercising more control over the hardware that Windows 2000 will run on, several new features were added to the server software to increase reliability and availability. The additions include kernel-mode write protection, System File Protection (SFP), pool tagging and guard pages.
Kernel-mode write protection uses the Windows 2000 memory manager to provide write protection for code and read-only subsections of the kernel and device drivers, as it does for user-mode programs and dynamic link libraries. The feature protects each part of the operating system from bugs in other sections.
System File Protection prevents the replacement of monitored system files, avoiding file version mismatches.
Pool tagging, also known as special pool, is an addition to Windows 2000 culled from NT 4.0’s Service Pack 4. The feature serves memory allocations for selected device drivers out of a special pool, rather than a shared system pool, helping driver writers produce better drivers and cleaner code. The memory protection is set to cause a system crash if a driver writes over the edge of its allocation.

Guard pages create the boundaries for the special pool. Attempts to write outside the limits of the pool hit a guard page, which is mapped so that the hardware protection causes an operating system failure. The induced failure tells developers whether their software is behaving properly or needs to be modified.
In addition to the reliability and availability improvements -- which help increase scalability as well -- Microsoft added specific features to improve Windows 2000’s scalability.
Microsoft is working to improve scalability in both vertical and horizontal directions. Vertical scalability is building bigger, more powerful servers. Horizontal scalability is increasing the number of systems that can be strung together in a cluster or system area network.
To increase vertical scalability, SMP support is being boosted at the high end. Current Microsoft specifications call for Advanced Server to support up to eight processors, and Datacenter Server will be able to support up to 16 processors in the shrink-wrapped box and more in the OEM version.
The company is making changes to the 32-bit memory address limit of 4 GB by adopting Intel Corp.’s Physical Address Extension, using direct I/O to access far more than 4 GB of physical memory, and treating all physical memory as general purpose memory.
The new Address Windowing Extensions (AWE) give user applications with 32-bit virtual address spaces access to regions of physical memory beyond the 32-bit limit. The AWE API allows applications to allocate physical, nonpaged memory and map window views of this 36-bit physical memory into a 32-bit virtual address space.
Additionally, a new feature called the Job Object API provides a nameable, securable, inheritable and sharable object that associates a group of processes. Using tools that leverage this API, administrators can manage the processes in a job as a unit and enforce limits on them, such as a jobwide user-mode CPU time limit or a per-process CPU time limit.
To improve horizontal scalability, networking enhancements were added to the operating system as well. These include I/O drivers, interrupt I/O affinity, NTFS improvements, SCSI support, support for storage area networks, TCP/IP stack performance, large frame support and, beginning in Datacenter Server, support for system area networks via the WinSock Direct Path API.
Other scalability enhancements include the basic clustering and load balancing features, such as COM+ Load Balancing and Network Load Balancing in Advanced Server and four-node clustering in Datacenter Server.
"The tools in Windows 2000 are helpful, but this is an area in need of further improvements," says Eric Cone, a senior consultant specializing in Windows 2000 planning at Metamor Technologies Ltd. (www.metamortech.com), a consulting firm. He says customers would like to see even more development in terms of load balancing and clustering.
Beyond the Operating System
Although Windows NT 4.0 has been criticized for its lack of reliability and availability, some of the responsibility can be pinned on sources outside the core operating system.
"Too many companies expect Windows servers to be available right out of the box, like a toaster," Cone says. "But if companies want it to do everything, the way people demanded of NT, that takes some configuring."
Barbara Gaffney, senior vice president for business programs at Sequent Computer Systems Inc. (www.sequent.com), agrees that achieving high reliability and high availability requires more than an adequate operating system.
"High reliability is definitely achievable for Windows NT or 2000," she says. "But just like Unix, it is a function of the solutions around it, the processes put in place to manage it and the people in charge of the entire solution."
Metamor’s Cone draws a parallel with Unix boxes, noted for their high reliability, which get there in part through extensive know-how on the part of the administrator.
"Even though it’s thought of as hardened, Unix can be left wide open if administrators don’t know what they are doing," he says. "The bulletproof Unix systems are locked down by administrators who really know Unix well."
Microsoft’s role behind the software is just as important. Analysts and consultants alike agree that one of the company’s shortcomings with Windows NT 4.0 was its failure to make best practices for using the operating system readily available to the public.
"That’s something customers ask us about with increasing frequency as they begin to put mission critical applications on our servers," Microsoft’s Gambier says. "They want to know what other customers are doing that is working, or what they’re doing that is not working. We are being more proactive about this. Companies like IBM have done this for some time in the mainframe world, and we are trying to do so as well."
Microsoft now has "The High Availability Deployment Guide," a 100-page document posted on its Web site. The guide features nine case studies of customers that have achieved high availability.
Gambier says the document will grow, and more documents will emerge in the company’s effort to increase best-practices awareness for Windows 2000.
According to Microsoft officials, the company is working to beef up its training and certification process to prepare professionals for the differences between achieving scalability, high reliability and high availability with Windows NT 4.0 vs. doing so with Windows 2000.
But IDC’s Kusnetzky is pessimistic: even with the operating system improvements and the initiatives to make best practices available to customers, he believes Advanced Server and Datacenter Server still will not be able to handle all the tasks of the largest enterprises.
"NT 4.0 already handles a good chunk of what companies need, and Windows 2000 will definitely handle more of those needs, but not all of the biggest jobs," he says. "I’d also like to point out that Unix can’t handle every task, and even OS/390 barely manages the absolutely largest of tasks."
Bringing Windows 2000 Back from a Crash
Despite several reliability and availability enhancements, analysts and consultants agree that Windows 2000 will probably crash from time to time, particularly in the early versions. The operating system includes new features to help when the server does go down, for whatever reason.
- Safe Mode Boot uses minimal services to boot the machine so users can correct installation problems or change settings that caused boot problems. In safe mode, Windows 2000 loads only default devices and services, such as mouse, monitor, keyboard, mass storage and base video drivers.
- Step-by-Step Configuration Mode lets users choose the basic files and drivers to start. The Last Known Good Configuration option starts a computer using the registry information saved at the last shutdown.
- The Recovery Console is a recovery and repair tool that is integrated with Windows 2000 Setup and allows access to and repair of Windows 2000 installations that are damaged or not booting. The console reduces reliance on FAT and DOS boot floppies for recoverability.
- Kernel-Only Crash Dumps optionally write only the kernel’s memory to disk after a crash, rather than the contents of all RAM, thus decreasing reboot time.
- Chkdsk in Windows 2000 was enhanced significantly. Microsoft reports that in some configurations Chkdsk’s performance is more than 10 times faster under Windows 2000 than it was under Windows NT 4.0.
- Kill Process Tree ensures that all processes spawned by a parent process are removed when the parent process is stopped.
Fewer Service Interruptions, Fewer Reboots, Higher Uptime
One of the most legitimate gripes about Windows NT is the number of administrative tasks that require rebooting the server, each reboot a service interruption. With Windows 2000, Microsoft whittled down the reboot list considerably. Many configuration changes that required a reboot under Windows NT 4.0 no longer do under Windows 2000:
- Changing IP settings, protocol binding order, IPX frame type, ATM address of the ATMARP server, performance optimization between applications and background services
- Resolving IP address conflicts
- Switching between static and DHCP IP address selections
- Enabling or disabling network adapters
- Adding or removing network protocols, including TCP/IP, IPX/SPX, NetBEUI, DLC and AppleTalk
- Adding or removing network services, such as SNMP, WINS, DHCP and RAS
- Adding Point-to-Point Tunneling Protocol (PPTP) ports
- Adding a new PageFile
- Installing Dial-Up Server on a system with Dial-Up Client installed and RAS already running
- Installing or removing a variety of Plug and Play devices, including network interface controllers, modems, disks and tape storage
- Installing or removing Universal Serial Bus (USB) devices, including mice, joysticks, keyboards, video capture and speakers
- Installing or removing PC Cards, File and Print Services for NetWare, Gateway Services for NetWare and Internet Locator Service
- Installing Internet Information Server, Microsoft Transaction Server, SQL Server 7.0, Exchange 5.5, device driver kits, software developer kits, Microsoft Connection Manager
- Increasing the PageFile initial size and maximum size
- Extending an NTFS volume
- Mirroring an NTFS volume
- Loading and using TAPI providers
- Switching MacClient network adapters and viewing shared volumes
- Docking or undocking a notebook