Microsoft Limits W2K High-End Configurations

Microsoft Corp. has launched an ambitious effort to eliminate memory leaks, marginal device drivers and ill-behaved hardware components.

REDMOND, Wash. -- Targeting a significant increase in the mean time between failures of Windows server systems, Microsoft Corp. has launched an ambitious effort to eliminate memory leaks, marginal device drivers and ill-behaved hardware components. The program is designed to boost the reliability of Windows 2000 server installations, particularly in the high-end Advanced Server and Datacenter Server versions of the product.

"The mean time to reboot is a metric we’re fighting against," concedes Michel Gambier, Microsoft product manager for Windows 2000 enterprise marketing. The short-term goal is to extend the number of days, weeks or months between reboots. "The ultimate goal is to eliminate it," he adds.

Gambier says Microsoft surveyed customer sites and found that Windows NT reliability varied by a factor of 10 between less and more stable configurations. Research also found anecdotal evidence suggesting that organizations with highly structured mainframe-like disciplines usually experienced higher levels of uptime.

Taking a cue from the 99.9 percent uptime programs unveiled by Compaq Computer Corp., Data General Corp., Hewlett-Packard Co. and IBM Corp., Microsoft will limit the number of devices it plans to certify or recommend for use with Windows 2000 Advanced Server and Datacenter Server. For Datacenter Server, Microsoft and its partners will mandate configurations; for Advanced Server, Microsoft will recommend -- but won’t require -- use of a larger subset of components. "We know the number of configurations we’re supporting is a factor in reliability," adds Jeff Price, lead product manager for Windows NT Server. At this time there are no plans to limit device and driver support for Windows 2000 Professional or the low-end Windows 2000 Server products.
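To put the 99.9 percent figure in perspective, an availability percentage translates directly into a yearly downtime budget. The short sketch below works through that arithmetic. The function name and the 365-day year are illustrative assumptions, not part of any vendor's program.

```python
HOURS_PER_YEAR = 365 * 24  # assumption: a non-leap year


def allowed_downtime_hours(availability_percent: float) -> float:
    """Hours of downtime per year permitted at the given availability."""
    return HOURS_PER_YEAR * (1 - availability_percent / 100)


# Compare a few common availability targets.
for pct in (99.0, 99.9, 99.99):
    hours = allowed_downtime_hours(pct)
    print(f"{pct}% uptime allows about {hours:.2f} hours of downtime per year")
```

At 99.9 percent, the budget works out to roughly 8.76 hours of downtime per year, which leaves little room for routine reboots.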

As part of the reliability initiative, Microsoft added several new types of reliability testing to the Windows 2000 builds. One is a long-term stress test: in addition to the routine overnight stress testing of each day’s build, groups of more than a dozen machines are put through a 30-day stress test cycle. The extended testing is not run daily; it is reserved for certain landmark builds of Windows 2000. This testing is expected to help root out minor memory leaks, in Microsoft code and in device drivers, that can take days or weeks to manifest themselves.
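The case for a 30-day run can be illustrated with a little arithmetic: a leak too small to notice overnight adds up over weeks. The sketch below fits a straight line to memory readings and projects the growth forward. The sample values, the hourly sampling interval and the function names are hypothetical, and the least-squares approach is only one plausible way to spot such a trend, not a description of Microsoft's test tooling.

```python
def leak_slope(samples):
    """Least-squares slope of memory use (MB) per sampling interval."""
    n = len(samples)
    if n < 2:
        raise ValueError("need at least two samples to estimate a trend")
    mean_x = (n - 1) / 2
    mean_y = sum(samples) / n
    cov = sum((i - mean_x) * (y - mean_y) for i, y in enumerate(samples))
    var = sum((i - mean_x) ** 2 for i in range(n))
    return cov / var


def projected_growth_mb(samples, intervals_ahead):
    """Memory growth (MB) expected if the fitted trend continues."""
    return leak_slope(samples) * intervals_ahead


# Hypothetical overnight run: one reading per hour, leaking 0.5 MB per hour.
overnight = [200 + 0.5 * hour for hour in range(12)]

print(f"Growth over 12 hours: {projected_growth_mb(overnight, 12):.1f} MB")
print(f"Growth over 30 days:  {projected_growth_mb(overnight, 30 * 24):.1f} MB")
```

Under these made-up numbers, the overnight growth is a few megabytes, easy to dismiss as noise, while the same rate sustained for 30 days consumes hundreds.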

The stress testing is based on what Microsoft has found to be the most demanding Windows NT environments encountered by its customers. Microsoft is doubling the customer loads in its internal stress testing. Company representatives say there are over 60 testing scenarios developed under this program.

The second aspect of the testing is certifying drivers for use with Advanced Server and Datacenter Server. Drivers will be thoroughly tested, including being exercised under so-called abnormal conditions -- a set of conditions Microsoft has identified as producing a high frequency of failures. Drivers that pass will receive a digital signature. Unsigned drivers will be trapped and identified by policy-driven system management tools so they are not installed without a system manager’s knowledge.
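The policy-driven handling of unsigned drivers can be pictured as a simple gate at install time. The following is a minimal sketch of that idea only; the `Driver` class, the policy names and the messages are hypothetical and do not represent any actual Windows 2000 interface.

```python
from dataclasses import dataclass


@dataclass
class Driver:
    name: str
    signed: bool  # True if the driver carries a digital signature


def review_install(driver: Driver, policy: str = "warn"):
    """Return (allowed, message) under a hypothetical 'block' or 'warn' policy."""
    if driver.signed:
        return True, f"{driver.name}: signed, installing"
    if policy == "block":
        return False, f"{driver.name}: unsigned, installation blocked"
    if policy == "warn":
        # Installation proceeds, but the system manager is told about it.
        return True, f"{driver.name}: unsigned, administrator notified"
    raise ValueError(f"unknown policy: {policy}")


# Hypothetical drivers, checked under a strict policy.
for drv in (Driver("raid_ctrl", signed=True), Driver("legacy_nic", signed=False)):
    allowed, message = review_install(drv, policy="block")
    print(message)
```

Either policy meets the stated aim: no unsigned driver reaches the system without the system manager knowing about it.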

By making these changes, Microsoft is taking a page from the playbook of mainframe vendors, who traditionally limit the number of peripherals and applications that a system can use.

"At the OEM level we’re talking to a different crowd than we’ve talked to before -- the people who built mainframes, not the people who built PCs," Gambier says. "We’re working with these OEMs to find out what they need."

Microsoft is also recommending that customers train their IT staff to work on both mainframes and Windows 2000 Datacenter Server systems.

"There are a lot of good things that the mainframe has done, and we need to have those in Datacenter Server to make it more reliable and available," Gambier adds.

The final aspect of the program is a recognition that application software factors into overall system reliability. To address that issue, Microsoft will work to educate customers about the methodologies that help achieve high reliability. "One of the enemies of getting [reliability] is change. You’re not going to be adding new releases every few months," Price observes. "We need to educate all of our customers on the best processes."

Achieving Windows 2000 Reliability

  • Extended stress testing to find memory leaks and "abnormal condition" bugs
  • Digital signatures attached to tested and approved drivers
  • Limitation of the number of hardware devices and drivers approved or recommended
  • Addition of mandatory hardware and driver selections for Datacenter Server
  • Education of customers on best practices for ensuring application software reliability