Methodology: Where We Got Our Numbers
The idea behind the ENT Web platform survey was to discover which servers the IT organizations at the largest US companies trust to run their corporate identity, or brochure, Web sites.
We chose the sites that host basic information about a company for investors, employees, and customers rather than e-commerce engines to get a consistent measure across corporations. Not every huge company derives millions or billions of dollars from an e-commerce source, but nearly all major corporations sport some kind of basic Web site.
The ENT Web platform survey was conducted in late April and early May. We used the 2000 Fortune 500 list (www.fortune.com) to rank the companies, and we used the corporate identity site listed by Fortune to determine what Web server and operating system were in use.
ENT used the Web site pinging tool on Netcraft Ltd.'s Web site (www.netcraft.com) to find out what platform each site was running.
The tool relies on a little-known feature of the Web: HTTP headers. When a browser requests a Web page, the server can send a rich mix of HTML, script, images, multimedia, and other objects for the browser to interpret and display. The server can also send information about the page or about the server that delivered it.
This information, sent from server to browser in small packages called HTTP headers, is the source for the information in the ENT survey. But is the information in those headers reliable?
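To make the mechanism concrete, here is a minimal Python sketch of how a server's self-reported software can be read from the headers of a raw HTTP response. It is not Netcraft's tool, whose internals are not described here; the function name `parse_server_header` and the sample response are invented for illustration. It relies on the standard `Server` header, which a server may set to any value or omit entirely.

```python
from typing import Optional

def parse_server_header(raw_response: str) -> Optional[str]:
    """Return the value of the Server header in a raw HTTP response, or None."""
    # Headers end at the first blank line; everything after it is the body.
    head, _, _ = raw_response.partition("\r\n\r\n")
    lines = head.split("\r\n")
    for line in lines[1:]:  # lines[0] is the status line, e.g. "HTTP/1.1 200 OK"
        name, sep, value = line.partition(":")
        # Header names are case-insensitive.
        if sep and name.strip().lower() == "server":
            return value.strip()
    return None

# Made-up sample response, not taken from the survey.
SAMPLE = (
    "HTTP/1.1 200 OK\r\n"
    "Date: Mon, 01 May 2000 12:00:00 GMT\r\n"
    "Server: Apache/1.3.12 (Unix)\r\n"
    "Content-Type: text/html\r\n"
    "\r\n"
    "<html>...</html>"
)

print(parse_server_header(SAMPLE))
```

For the sample, the function returns whatever text follows `Server:`. A response with no such header yields `None`.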
In certain situations, the answer is no. Sometimes the information in the headers reflects unusual or special-purpose networking strategies employed by the organizations running the servers. For instance, several large commercial sites use a front-door Web server that examines incoming requests and then redirects them to servers based on certain criteria, such as the server that is least busy, the one with special tools to meet the requirements of the request, or a server that is closer to the browser in the Internet’s public topology.
These front doors are often called HTTP request brokers because they redirect HTTP requests to the servers best suited to fulfill them. In our survey, an organization using a request broker would report the operating system, Web server, and platform of the broker, not the server that eventually delivered the service.
Another example is when organizations insert a cache between incoming requests for service and a Web server. Done to enhance performance for the visitor and reduce workload on the corporate Web server, these Web proxy cache servers have become popular at high-volume sites. In our survey, a corporation with a Web proxy cache would report the cache server’s platform and server software -- not the platform of the server with the original content.
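Intermediaries sometimes leave traces in the headers. As a rough sketch, the Python function below flags a response that carries the standard `Via` or `Age` header, or the nonstandard `X-Cache` header that some caches add. The function name and the sample values are hypothetical. The absence of these headers proves nothing, since a broker or cache is free not to announce itself.

```python
# Headers that hint at an intermediary. Via and Age are standard HTTP/1.1
# headers; X-Cache is a nonstandard header added by some cache products.
PROXY_HINT_HEADERS = ("via", "age", "x-cache")

def proxy_hints(headers):
    """Return the sorted, lowercased names of any hint headers present."""
    present = {name.strip().lower() for name in headers}
    return sorted(h for h in PROXY_HINT_HEADERS if h in present)

# Made-up example headers, not taken from the survey.
cached = {
    "Server": "ExampleCache/1.0",
    "Via": "1.1 cache.example.com",
    "Age": "120",
}
direct = {"Server": "ExampleServer/2.0", "Content-Type": "text/html"}

print(proxy_hints(cached))
print(proxy_hints(direct))
```

An empty result means only that no intermediary identified itself, not that none was present.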
Many organizations now put security tools between external visitors and the machines that deliver public content. Some use firewalls as a gateway between the public Internet and the corporate LAN to which their public Web server is connected. Others use a technique called network address translation to ensure that outside requests are passed through an intermediary where access control and other rules can be applied to the traffic. In these cases, our survey reflects information from the intermediate machine rather than the one on which the content resides.
Other complications exist, but all stem from the desire to provide better performance for the visitor and tighter security for the corporate site. Do these techniques substantially skew the results of the survey? They are in much wider use on e-commerce sites than on traditional outreach and publishing sites. While the actual number is a closely guarded secret, one estimate puts the share of traditional Fortune 500 sites using these techniques at about 10 percent.
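To put the 10 percent estimate in perspective, the short Python sketch below works through the arithmetic: 10 percent of 500 sites is about 50 sites whose reported platform might belong to an intermediary. The function name and the observed count of 200 are hypothetical, chosen only to show the calculation, and are not survey results. The sketch assumes the worst case, in which every masked site is misclassified in the same direction.

```python
def share_bounds(observed, total, masked_fraction):
    """Return (low, high) worst-case bounds on the true count for one platform.

    observed: number of sites the survey attributes to the platform.
    total: number of sites surveyed.
    masked_fraction: estimated fraction of sites behind an intermediary.
    """
    masked = round(total * masked_fraction)  # sites that may be misreported
    low = max(0, observed - masked)          # every masked site wrongly counted in
    high = min(total, observed + masked)     # every masked site wrongly counted out
    return low, high

# Hypothetical figures: 200 of 500 sites observed on one platform.
low, high = share_bounds(200, 500, 0.10)
print(f"true count between {low} and {high} of 500")
```

With these invented inputs, the bounds are 150 and 250 sites, or 30 to 50 percent, a deliberately pessimistic range.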