In-Depth
Efficient Design Never Goes Out of Style
When it comes to development, smaller truly is faster.
By Bob Supnik, Vice President of Engineering and Supply Chain, Unisys Technology Consulting and Integration Solutions
I recently relocated from an office in a company facility to a home office. To prepare for next summer, I finally had central air conditioning installed in my house, after a mere 33 years of living there. (I don't like to rush matters.) Because the new equipment took up about a quarter of the available space in the attic, I had to clean out boxes of stuff that had been up there for decades.
I started with the old books. I found a box of primary sources on diplomacy in the Second World War, dating from my brief fling as a history student. It was a pleasant surprise to find that some of them are rare and worth a bit of money now.
I also found multiple boxes of old computer books. These included such classics from the early 1970s as Madnick & Donovan's Operating Systems, Donovan's Systems Programming, and of course, Knuth's The Art of Computer Programming. There was also a complete set of operations and programming notebooks for the PDP-10, a 36-bit system from Digital Equipment Corporation (DEC), now part of HP. Still hidden up there are listings (remember listings?) for every software system I wrote or worked on before I joined Digital in 1977 and turned to the "Dark Side" (hardware) for the next 20 years of my career.
These discoveries raise an interesting question: what should I do with all this material? Looked at another way: in an industry that evolves as rapidly as ours, is there any value in technology books that are nearly 40 years old? For some of the material, the answer is easy. All of the PDP-10 notebooks have been scanned and are available online. There's no need (for me) to retain a paper copy. For the books, the answer is less obvious. Are they relevant in any way?
A superficial glance through the tables of contents is not encouraging: assembly language programming, data structures, algorithms, memory management, I/O strategies (channel and non-channel), resource minimization, workload management, how to write assemblers, compilers, and loaders, and so on. Computer Science, which was brand new as an academic discipline in those days, has moved on to objects, rapid prototyping, scripting, Web services, and multimedia, hasn't it?
For the most part, yes. Almost all software development is done in modern application environments, at very high levels in the software stack. Mechanism is unimportant, and computer science curricula tend to give it short shrift. Modern computer engineering students don't take apart alarm clocks to see how they work; they just use them to get up and go to classes (sometimes).
However, in the world of Unisys ClearPath mainframes -- where I live now -- mechanism is everything. An understanding of the underlying design principles of ClearPath is critical for anyone who needs to work on the operating system kernel or tightly connected subsystems, such as the file system, the database, transaction monitors, and integrated recovery mechanism.
These design principles derive from the state of computer science in the 1960s and 1970s. They are no longer taught, but these "ancient" textbooks describe them in great detail.
These principles still matter.
Take, for example, efficient use of resources. Why should a programmer worry about conserving a few kilobytes, when main memories are tens of gigabytes? Or efficient file formats, when disks store multiple terabytes each? The reason is simple: smaller is faster, even today. Although the main memory of your PC may be gigabytes, the primary cache on your x86 is 64 kilobytes, or perhaps 128 kilobytes. If your program resides entirely in the primary cache, it runs at 3GHz. If it resides in the secondary cache, it runs at 800MHz. If it resides in main memory, it runs at 80MHz: the same speed as a 1990 microprocessor. There's a reason for the sardonic saying that circulated when the first mega-operating systems showed up in the '90s: "It's big, but it's slow, too."
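To make "smaller is faster" concrete, here is a minimal sketch of a cache-size experiment in C. It is my illustration, not the column's: it does the same number of strided reads over working sets of different sizes and reports the time per read. The particular sizes (32KB, 4MB, 512MB) are assumptions meant to land in the primary cache, a secondary or tertiary cache, and main memory; the exact ratios you see will depend on your processor.

```c
/* cache_demo.c -- a sketch of the "smaller is faster" effect.
 * Compile with: cc -O2 cache_demo.c -o cache_demo
 */
#define _POSIX_C_SOURCE 199309L
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

#define ACCESSES 100000000UL            /* same amount of work for every size */

static double time_reads(size_t bytes)
{
    size_t n = bytes / sizeof(long);
    long *buf = malloc(n * sizeof(long));
    if (buf == NULL) {
        fprintf(stderr, "allocation of %zu bytes failed\n", bytes);
        exit(1);
    }
    for (size_t i = 0; i < n; i++)      /* touch every element up front */
        buf[i] = (long)i;

    volatile long sink = 0;             /* keep the loop from being optimized away */
    struct timespec t0, t1;
    clock_gettime(CLOCK_MONOTONIC, &t0);
    for (size_t i = 0, j = 0; i < ACCESSES; i++) {
        sink += buf[j];
        j += 16;                        /* stride past a cache line per read */
        if (j >= n)
            j -= n;
    }
    clock_gettime(CLOCK_MONOTONIC, &t1);
    free(buf);
    return (t1.tv_sec - t0.tv_sec) + (t1.tv_nsec - t0.tv_nsec) / 1e9;
}

int main(void)
{
    /* Assumed working sets: primary cache, secondary cache, main memory. */
    size_t sizes[] = { 32 * 1024, 4 * 1024 * 1024, 512 * 1024 * 1024 };
    for (int i = 0; i < 3; i++)
        printf("%10zu bytes: %6.2f ns per read\n",
               sizes[i], time_reads(sizes[i]) / ACCESSES * 1e9);
    return 0;
}
```

On most machines the smallest working set runs several times faster per read than the largest, which is exactly the gap described above.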
With the advent of in-memory databases and solid-state disks (SSDs), the same reasoning applies to storage. It's now possible to hold databases of a few terabytes in main memory or solid-state storage, with spectacular performance gains. Solid-state disk arrays have demonstrated I/O rates in excess of one million I/Os per second. However, they're expensive, and they have limited capacity, so again, small is better. (On a side note, hierarchical databases use much less space than full relational databases. Perhaps they'll make a comeback with SSDs.)
The same is true of wide-area networks (WANs). WAN bandwidth has improved much more slowly than other computing technologies, such as processor performance, memory capacity, storage capacity, and local-area network (LAN) bandwidth. Even today, the cheapest and fastest way to move 1 terabyte of data from one data center to another is to copy it to a (fast) portable hard drive and then ship the drive via overnight courier. Efficient use of resources remains a necessity for wide-area networking. Compression and local caching can only do so much. Smaller is faster.
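The courier comparison is easy to check with back-of-the-envelope arithmetic. The link rates and the 14-hour delivery window in the sketch below are my assumptions, not figures from the column, and sustained throughput over a real long-haul link is usually well below the nominal rate.

```c
/* wan_vs_courier.c -- rough arithmetic for moving 1 TB between data centers. */
#include <stdio.h>

int main(void)
{
    const double payload_bits = 1e12 * 8.0;            /* 1 TB expressed in bits */
    const double link_mbps[] = { 100.0, 1000.0 };      /* assumed WAN link rates */
    const double courier_hours = 14.0;                 /* assumed overnight window */

    for (int i = 0; i < 2; i++) {
        double hours = payload_bits / (link_mbps[i] * 1e6) / 3600.0;
        printf("%6.0f Mbit/s link:  %5.1f hours\n", link_mbps[i], hours);
    }
    printf("Overnight courier: %5.1f hours (~%.0f Mbit/s effective)\n",
           courier_hours, payload_bits / (courier_hours * 3600.0) / 1e6);
    return 0;
}
```

At 100 Mbit/s the wire takes roughly 22 hours, so the drive in the courier's van wins, and the courier's effective bandwidth only grows as the payload does.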
Let me demonstrate the value of efficiency with two anecdotes. In 1995, DEC started shipping Alpha systems based on the EV-5 chip. This was the first chip to issue four instructions at a time, out of its "mammoth" 8KB primary instruction and data caches, at 300MHz (and later 500MHz). Almost immediately, EV-5 was used for two spectacular accomplishments:
- Bolt, Beranek, and Newman (BBN), the inventors of the first Internet routers, demonstrated that EV-5 could implement a line-rate (100Mbit per second) router entirely in software, something that hitherto had required dedicated hardware.
- DEC demonstrated, via its AltaVista search engine, that it was possible to index and search the entire World Wide Web, as it then existed, in real time.
What was the key? In both cases, the implementers squashed the key loops into the 8KB primary instruction cache (that's only 2,000 instructions), and then hand-scheduled the code to get the maximum simultaneous instruction issue possible. They also hand-scheduled loads to bring data from main memory to the primary data cache before it was needed. The chip could theoretically issue 1.2 to 2 billion instructions per second, and these two programs came very close to achieving that. Why? Because they were efficient.
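For readers who have never hand-scheduled a load, here is a small sketch of the idea in C. It is my stand-in for the original hand-tuned Alpha code: the GCC/Clang __builtin_prefetch intrinsic asks for a cache line a few dozen iterations before the loop needs it, so the data is already in the primary cache when the add executes. On current processors the compiler and the hardware prefetchers often do this automatically, but the principle is the one the EV-5 programmers applied by hand.

```c
#include <stddef.h>

/* Sum an array, requesting data ahead of its use so the loads hit the
 * primary data cache instead of stalling on main memory. The distance of
 * 64 elements is an assumption; the right value depends on memory latency
 * and on how fast the loop body runs. */
long checksum(const long *data, size_t n)
{
    long sum = 0;
    for (size_t i = 0; i < n; i++) {
        if (i + 64 < n)
            __builtin_prefetch(&data[i + 64]);   /* start the fetch early */
        sum += data[i];
    }
    return sum;
}
```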
Maybe I won't junk those old textbooks quite yet. The lessons and techniques of the past aren't especially relevant when you're running Excel, viewing your photo collection, or perusing Facebook, but when you're trying to squeeze every ounce of possible performance out of a system, they're just as relevant now as they were 40 years ago.
Bob Supnik is vice president of engineering and supply chain, Unisys Technology Consulting and Integration Solutions. You can contact the author at [email protected].