Q&A: The Evolution of Emulation
Emulation can pay big dividends. We examine emulation's long history, its value, and what's ahead, focusing on its use at Unisys.
Unisys recently took a giant step forward in its "NextGen" initiative by transitioning the architecture of its ClearPath mainframes from proprietary processor technology to Intel Xeon processor family platforms. The result: the company's latest systems can deliver large-scale mainframe performance that rivals that of systems using proprietary processors.
To learn more about the part emulation played in the transition, the history of the technology, how it's paying off for Unisys, and what's ahead, we spoke with Bob Supnik, vice president, engineering and supply chain (ESC) operations for the technology, consulting, and integration solutions (TCIS) business at Unisys.
Bob is directly responsible for software and hardware development; his organization delivers enterprise-level systems for ClearPath and the ES lines of servers -- as well as tools and solutions for data center transformation and cloud computing -- around the world.
Enterprise Strategies: Tell us about the role emulation played in your ClearPath architecture transition.
Bob Supnik: Emulation played a key role. It's by no means a new concept, but its impact has been underappreciated. At this historic moment, it's worth considering the value that emulation has delivered not just for Unisys but also for large-scale systems providers since the beginning of the computer era.
The key historical questions are: How do current emulation offerings compare with the long line of earlier emulation projects, what distinguishes one from another, and what is the future of emulation?
What is an emulator?
An emulator is a computer program that mimics the behavior of one computer system (the source) on another (the host). The goal of an emulator is to run software from the source system on the host system without changes. Emulators have typically been used for one of these purposes:
- Migration between systems: Emulators allow software from one class of systems to be run on another, as an initial step in migration. Examples: Digital Equipment Corporation's (DEC) VAX to Alpha (VAX to EVAX Static Translator, or VEST), and Apple's 68000 to PowerPC and PowerPC to Intel transitions.
- Preservation of the past: Emulators allow software from systems that are out of production to be run on modern hardware. Examples include SimH (mainframes and minicomputers) and MAME (arcade machines, microcomputers, and gaming consoles).
- Augmentation of current systems: Emulators can provide far more extensive debugging capabilities than real hardware, particularly in the embedded space. MIMIC, which emulated early minicomputers, is one example.
- Design of future systems: Emulators allow construction of hypothetical machines and debugging of software before real hardware is available. Most examples are proprietary. SiCortex, for example, wrote a simulator for its system on a chip and used it to debug its version of Linux.
You mentioned using emulators in early minicomputers. How else have emulators been used over time?
Emulators emerged quite early in computing, typically for migration. In 1958, the IBM 709 provided an emulator for the earlier IBM 704, to encourage forward migration. In 1959, Remington Rand (an ancestor of Unisys) offered an emulator for the IBM 650 on its Univac Solid State Computer to encourage competitive migration. In 1964, the IBM 360 family offered emulators for both the IBM 1401 commercial computer (as part of the 360/30) and the IBM 7094 scientific computer (as part of the 360/65). All of these emulators were standalone programs that took over the entire machine and used both microcode assists and standard instructions to achieve the desired speed. In the 370 line, the emulators became standard programs that could be run in parallel with other tasks.
Another early use was to augment the limited capabilities of the first minicomputers, which typically had no operating systems at all. In 1966, at Applied Data Research, a DEC PDP-7 was used to emulate a PDP-8 for software development. The PDP-7 provided development tools (such as "mass storage" -- 578KB DECtape drives) that the PDP-8 lacked. This was generalized into a PDP-10 system called MIMIC that supported about 10 different minis in a common framework.
As technology improved and higher-level languages replaced assembly code, emulators became easier to implement. Before universities stopped teaching hardware architecture as part of computer science, implementing a simulator for DEC's 12-bit PDP-8 minicomputer was a standard undergraduate programming project.
How does an emulator work?
Emulators are conceptually pretty easy. The state of the source system -- its registers, memory, and so on -- is abstracted as variables and arrays in the emulation program. The emulator fetches an instruction from emulated memory, decodes it, and calls or jumps to a routine that executes that instruction. The process repeats until an error, a program halt, or a user-specified stop condition occurs.
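To make that loop concrete, here is a minimal sketch in Python of a fetch-decode-execute emulator. The five-instruction accumulator machine, its encoding, and the names SourceState, encode, and run are all hypothetical, invented for illustration; they do not describe any real system.

```python
from dataclasses import dataclass, field

# Opcodes of a hypothetical accumulator machine (not a real instruction set).
LOAD, ADD, STORE, JNZ, HALT = range(5)


def encode(opcode, addr=0):
    """Pack an instruction: opcode in the high bits, 8-bit address in the low bits."""
    return (opcode << 8) | addr


@dataclass
class SourceState:
    """Source-machine state abstracted as ordinary host variables and arrays."""
    acc: int = 0
    pc: int = 0
    memory: list = field(default_factory=lambda: [0] * 256)


def run(state, max_steps=10_000, breakpoints=frozenset()):
    """Fetch, decode, execute; stop on halt, error, or a user-specified condition."""
    for _ in range(max_steps):
        if state.pc in breakpoints:              # user-specified stop condition
            return "breakpoint"
        word = state.memory[state.pc]            # fetch from emulated memory
        opcode, addr = word >> 8, word & 0xFF    # decode
        state.pc += 1
        # Dispatch: an if/elif chain stands in for a table of handler routines.
        if opcode == LOAD:
            state.acc = state.memory[addr]
        elif opcode == ADD:
            state.acc += state.memory[addr]
        elif opcode == STORE:
            state.memory[addr] = state.acc
        elif opcode == JNZ:
            if state.acc != 0:
                state.pc = addr
        elif opcode == HALT:
            return "halt"                        # program halt
        else:
            return "illegal instruction"         # error
    return "step limit"                          # another user-specified stop


# Count a memory cell down from 3 to 0, then halt.
s = SourceState()
program = [encode(LOAD, 20), encode(ADD, 21), encode(STORE, 20),
           encode(JNZ, 0), encode(HALT)]
s.memory[:len(program)] = program
s.memory[20], s.memory[21] = 3, -1
print(run(s), s.memory[20])  # halt 0
```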
In practice, things are a little more complicated because emulators have to deal with more than just internal state -- they have to handle I/O as well. That introduces significant issues around simulating timing, asynchronous operations, and so on. I won't recapitulate all the issues, which are discussed extensively in the literature. (See, for example, "Simulators: Virtual Machines of the Past (and Future)" in ACM Queue, Volume 2, Number 5.)
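One common way to handle I/O timing is to measure emulated time in instructions executed and let devices schedule their completions on an event queue; simulators such as SimH work roughly this way. The sketch below is a minimal illustration of that idea, not SimH's code: EventQueue, Disk, and the 50-instruction READ_DELAY are hypothetical.

```python
import heapq


class EventQueue:
    """Device events scheduled in units of emulated instructions (a sketch)."""

    def __init__(self):
        self.now = 0        # emulated time: instructions executed so far
        self._heap = []     # entries: (due_time, sequence_number, callback)
        self._seq = 0       # tie-breaker keeps equal-time events in FIFO order

    def schedule(self, delay, callback):
        heapq.heappush(self._heap, (self.now + delay, self._seq, callback))
        self._seq += 1

    def tick(self):
        """Account for one emulated instruction, then fire any events now due."""
        self.now += 1
        while self._heap and self._heap[0][0] <= self.now:
            _, _, callback = heapq.heappop(self._heap)
            callback()


class Disk:
    """Hypothetical disk controller whose reads finish 50 instructions later."""
    READ_DELAY = 50

    def __init__(self, events):
        self.events = events
        self.busy = False
        self.interrupt_pending = False

    def start_read(self):
        self.busy = True
        self.events.schedule(self.READ_DELAY, self._complete)

    def _complete(self):
        self.busy = False
        self.interrupt_pending = True   # the CPU loop checks this between instructions


events = EventQueue()
disk = Disk(events)
disk.start_read()
executed = 0
while not disk.interrupt_pending:   # stand-in for executing guest instructions
    events.tick()
    executed += 1
print(executed)  # 50
```

Because the delay is counted in emulated instructions rather than wall-clock time, the guest sees the same device timing no matter how fast the host runs.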
By far the biggest problem for an emulator is performance. The repetitive operations of an emulator -- fetch, instruction decode, address decode, execution -- take many host instructions. The more complex the source system's architecture, the more work the emulator has to do and the slower it runs. On a modern PC, a PDP-8 emulator is orders of magnitude faster than any real PDP-8 ever built, but a VAX emulator is barely as fast as the best real hardware.
What efforts have been made to improve performance?
From a performance point of view, emulators are an imperfect way of transferring a program from one system to another. Ideally, you would like the program to run in the native environment of the host system for maximum performance. However, that's not always possible. For example, the language in which the program is written may not be offered on the host, or differences in architecture (character sets, word size, addressing modes) may cause the program to run improperly. Accordingly, engineers have long sought more efficient ways of running binaries from one system on another.
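The character-set problem is easy to demonstrate. A program that sorts text by raw byte value gets different answers on an ASCII host than on an EBCDIC system such as MCP. This minimal sketch uses Python's built-in cp037 codec (EBCDIC code page 037, US/Canada); which EBCDIC code page a particular system uses is an assumption here.

```python
# "cp037" is Python's built-in codec for EBCDIC code page 037 (US/Canada).
words = ["apple", "Apple", "123"]

ascii_order = sorted(words, key=lambda w: w.encode("ascii"))
ebcdic_order = sorted(words, key=lambda w: w.encode("cp037"))

print(ascii_order)   # ['123', 'Apple', 'apple']
print(ebcdic_order)  # ['apple', 'Apple', '123']

# The same letter has different byte values on the two systems.
print(b"\xc1".decode("cp037"), hex(ord("A")))  # A 0x41
```

In EBCDIC, lowercase letters sort before uppercase and digits sort last, the reverse of ASCII, so any code that depends on collating order must be handled explicitly.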
The key technique is to combine code emulation with code translation. In this model, the emulator executes sequences of source instructions, analyzes them, and creates sequences of host instructions that do exactly the same thing. On later executions, the host instruction sequences run instead, typically at much higher speed.
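A minimal sketch of this emulate-then-translate model follows, assuming a hypothetical three-instruction accumulator machine. Here the "host instructions" are Python source compiled with exec, standing in for the native machine code a real translator would emit; interpret, translate, execute_block, and translation_cache are illustrative names, not any product's API.

```python
def interpret(block, mem, acc):
    """Classic emulation: decode and dispatch each instruction every time."""
    for op, addr in block:
        if op == "LOAD":
            acc = mem[addr]
        elif op == "ADD":
            acc = acc + mem[addr]
        elif op == "STORE":
            mem[addr] = acc
        else:
            raise ValueError(f"illegal instruction {op}")
    return acc


def translate(block):
    """Generate host code (here, Python source) that does exactly what the block does."""
    lines = ["def host_block(mem, acc):"]
    for op, addr in block:
        if op == "LOAD":
            lines.append(f"    acc = mem[{addr}]")
        elif op == "ADD":
            lines.append(f"    acc = acc + mem[{addr}]")
        elif op == "STORE":
            lines.append(f"    mem[{addr}] = acc")
        else:
            raise ValueError(f"cannot translate {op}")
    lines.append("    return acc")
    namespace = {}
    exec("\n".join(lines), namespace)   # compile the generated host code once
    return namespace["host_block"]


translation_cache = {}   # block start address -> translated host function


def execute_block(start, block, mem, acc):
    host_fn = translation_cache.get(start)
    if host_fn is None:
        acc = interpret(block, mem, acc)               # first time: emulate...
        translation_cache[start] = translate(block)    # ...and translate for next time
        return acc
    return host_fn(mem, acc)                           # later: run the host code


mem = [5, 7, 0]
block = [("LOAD", 0), ("ADD", 1), ("STORE", 2)]
print(execute_block(0, block, mem, 0), mem[2])  # 12 12  (emulated)
mem[0] = 1
print(execute_block(0, block, mem, 0), mem[2])  # 8 8  (translated)
```

A real translator must also invalidate cached translations when a program modifies its own code; this sketch omits that.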
Some of the first practical instances of code translation emerged from DEC in the late 1980s as part of the Alpha program. VEST automatically rewrote VAX binaries as Alpha binaries, falling back to an emulator when the program took a blind (computed) branch or followed an otherwise unanalyzable path. Later, in the FX!32 series of translators, the control flow was reversed: execution started with emulation, and the emulator invoked a translator for frequently executed sequences. This became the accepted model for high-performance emulators and was used by Apple and IBM. It also formed the basis of Sun's "HotSpot" Java Virtual Machine.
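The general pattern described above (start in the emulator, count how often each block runs, and translate only the frequently executed ones) can be sketched as follows. This is not FX!32's or HotSpot's actual mechanism: the threshold of 3, the instruction set, and the HotSpotEmulator class are hypothetical. The OUT instruction simply plays the role of something the translator cannot analyze, so its block stays in the emulator permanently.

```python
HOT_THRESHOLD = 3   # hypothetical; real systems tune this value
output = []         # sink for the hypothetical OUT instruction


def interpret(block, regs):
    """Classic emulation of one block: decode and dispatch every instruction."""
    for op, a, b in block:
        if op == "ADD":
            regs[a] += regs[b]
        elif op == "ADDI":
            regs[a] += b
        elif op == "OUT":
            output.append(regs[a])
        else:
            raise ValueError(f"illegal instruction {op}")


def _add(a, b):
    def step(r):
        r[a] += r[b]
    return step


def _addi(a, imm):
    def step(r):
        r[a] += imm
    return step


TRANSLATABLE = {"ADD": _add, "ADDI": _addi}


def translate(block):
    """Pre-decode a block into host closures; None if any instruction is one the
    translator cannot analyze (OUT stands in for a blind branch or similar)."""
    steps = []
    for op, a, b in block:
        if op not in TRANSLATABLE:
            return None
        steps.append(TRANSLATABLE[op](a, b))

    def host_block(regs):
        for step in steps:
            step(regs)
    return host_block


class HotSpotEmulator:
    """Start by emulating; translate a block once it has run often enough."""

    def __init__(self, threshold=HOT_THRESHOLD):
        self.threshold = threshold
        self.counts = {}             # block address -> times emulated
        self.translated = {}         # block address -> host function
        self.untranslatable = set()  # blocks that must stay in the emulator

    def run_block(self, addr, block, regs):
        host_fn = self.translated.get(addr)
        if host_fn is not None:
            host_fn(regs)
            return "translated"
        interpret(block, regs)
        self.counts[addr] = self.counts.get(addr, 0) + 1
        if self.counts[addr] >= self.threshold and addr not in self.untranslatable:
            host_fn = translate(block)
            if host_fn is None:
                self.untranslatable.add(addr)   # fall back to emulation for good
            else:
                self.translated[addr] = host_fn
        return "emulated"


emu = HotSpotEmulator()
regs = [0, 1]
hot = [("ADDI", 0, 2), ("ADD", 0, 1)]    # r0 += 2; r0 += r1
print([emu.run_block(0x100, hot, regs) for _ in range(5)])
# ['emulated', 'emulated', 'emulated', 'translated', 'translated']
print(regs[0])  # 15
```

Rarely executed code never pays the cost of translation, while hot code is translated once and then runs without per-instruction decoding.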
How has Unisys been involved with emulation?
Unisys' work on emulation started two decades ago, with the introduction of the Micro-A. This system combined a proprietary A-series (predecessor of the current ClearPath Libra systems) microprocessor with a standard PC that emulated the I/O subsystem. Over time, the microprocessor was replaced with an emulator as well. The result was the first fully emulated A-Series mainframe, the A2150, which shipped in 1996, at a rousing 2 MIPS.
When Unisys committed to its NextGen initiative in 2006, the emulation team for the MCP operating system began investigating more aggressive forms of emulation, in particular, translation. First, they noted that decoding a given A-series instruction always yielded the same result, so they started caching the decoded results. This eliminated the fetch/decode overhead in most cases. They also noticed that A-series programs contained many repeated pairs of operators (like push/pop to move data). Accordingly, they defined "compound operators" that implemented both operations in one routine, eliminating unnecessary processing steps. Finally, they started generating code for short operator sequences, so that sequences of A-series operators were replaced by sequences of Xeon instructions, with no decoding or dispatching.
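Two of the techniques just described, caching decoded instructions and fusing a push/pop pair into one compound operator, can be sketched on a hypothetical stack machine. The operators, their encoding, the LOOP instruction, and the MOVE compound operator are illustrative assumptions, not the actual A-series operator set or the Unisys implementation.

```python
# Operators of a hypothetical stack machine (not actual A-series operators).
PUSH_VAR, PUSH_LIT, ADD, POP_VAR, LOOP, HALT = range(6)
MOVE = 100   # host-only compound operator: PUSH_VAR x followed by POP_VAR y


def enc(op, arg=0):
    return (op << 16) | arg


def decode(code, pc):
    """Split a raw instruction word (the step the cache lets us skip)."""
    word = code[pc]
    return word >> 16, word & 0xFFFF


class StackEmulator:
    def __init__(self, code, fuse=True):
        self.code = code
        self.fuse = fuse
        self.decoded = {}         # decode cache: pc -> (operator, operand, length)
        self.decode_count = 0     # cache misses, i.e. full decodes performed
        self.dispatch_count = 0   # handler invocations

    def _decode_at(self, pc):
        entry = self.decoded.get(pc)
        if entry is None:
            self.decode_count += 1
            op, arg = decode(self.code, pc)
            length = 1
            if self.fuse and op == PUSH_VAR and pc + 1 < len(self.code):
                op2, arg2 = decode(self.code, pc + 1)
                if op2 == POP_VAR:
                    op, arg, length = MOVE, (arg, arg2), 2
            entry = self.decoded[pc] = (op, arg, length)
        return entry

    def run(self, variables):
        stack, pc = [], 0
        while True:
            op, arg, length = self._decode_at(pc)
            self.dispatch_count += 1
            pc += length
            if op == PUSH_VAR:
                stack.append(variables[arg])
            elif op == PUSH_LIT:
                stack.append(arg)
            elif op == ADD:
                b, a = stack.pop(), stack.pop()
                stack.append(a + b)
            elif op == POP_VAR:
                variables[arg] = stack.pop()
            elif op == MOVE:                   # no stack traffic at all
                src, dst = arg
                variables[dst] = variables[src]
            elif op == LOOP:                   # decrement counter; restart at 0 if nonzero
                variables[arg] -= 1
                if variables[arg] != 0:
                    pc = 0
            elif op == HALT:
                return variables
            else:
                raise ValueError(f"illegal operator {op}")


program = [enc(PUSH_VAR, 0), enc(PUSH_LIT, 5), enc(ADD), enc(POP_VAR, 0),
           enc(PUSH_VAR, 0), enc(POP_VAR, 1), enc(LOOP, 2), enc(HALT)]
fused = StackEmulator(program)
print(fused.run([0, 0, 4]), fused.decode_count, fused.dispatch_count)  # [20, 20, 0] 7 25
plain = StackEmulator(program, fuse=False)
print(plain.run([0, 0, 4]), plain.decode_count, plain.dispatch_count)  # [20, 20, 0] 8 29
```

With the cache, each distinct instruction address is decoded only once (7 full decodes against 25 dispatches in the fused run); without it, every one of the 29 dispatches in the unfused run would pay for a full decode. Fusion then removes four dispatches and the intermediate stack traffic.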
These advances, combined with Intel's steadily improving chip architectures, produced dramatic improvements in performance, from the 200 MIPS of the Libra 4000 (Q4 2008) to the 300 MIPS of the Libra 4100 (Q4 2010) to the 550 MIPS of today's Libra 6200 (October 2012). Through the application of compiler-like optimizations to the generated code, and tailoring the A-series "instruction set" to Xeon, even more improvements are on tap.
The team working on the OS 2200 operating system for the ClearPath Dorado systems started on emulation about a decade later. As with MCP, the first approach was a classical emulator, and the initial performance results were modest: the Dorado 400 shipped in October 2007 at 90 MIPS. The emulation improved considerably in the later Dorado 4000 (Q4 2008, 195 MIPS) and 4100 (Q4 2010, 225 MIPS), but it was clear that the upper limits for a classical emulator had been reached.
The OS 2200 emulation team's first advance was to translate instructions into a highly simplified, 64-bit, RISC-like format tailored for easy emulation. This was released in the Dorado 4200 (300 MIPS). The simplified instruction format facilitates translation to native Intel code, and future Dorado systems will see substantial performance gains from the use of dynamic translation.
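The idea of translating each source instruction once into a simpler, uniform, RISC-like form can be sketched as follows. The source instruction (an add-from-memory with optional index auto-increment), the micro-op set, and the scratch register T0 are hypothetical and do not describe the actual OS 2200 instruction set or the Unisys intermediate format.

```python
from functools import lru_cache
from typing import NamedTuple

MASK64 = (1 << 64) - 1   # micro-ops work on 64-bit, host-sized registers
T0 = 15                  # hypothetical scratch register reserved for the translator


class MicroOp(NamedTuple):
    op: str     # "ADDI", "ADD", or "LOAD"
    rd: int     # destination register
    rs: int     # source or address register
    x: int = 0  # second source register for ADD, immediate for ADDI


@lru_cache(maxsize=None)   # translate each distinct instruction only once
def lower(instr):
    """Lower one hypothetical complex instruction into simple micro-ops.

    ("ADD_MEM", a, x, disp, autoinc) means: a += mem[x + disp]; then x += 1 if autoinc.
    """
    kind, a, x, disp, autoinc = instr
    if kind != "ADD_MEM":
        raise ValueError(f"unsupported instruction {kind}")
    uops = [MicroOp("ADDI", T0, x, disp),   # effective address
            MicroOp("LOAD", T0, T0),        # fetch operand
            MicroOp("ADD", a, a, T0)]       # arithmetic
    if autoinc:
        uops.append(MicroOp("ADDI", x, x, 1))  # index-register side effect
    return tuple(uops)


def execute(uops, regs, mem):
    """Every micro-op is uniform and trivial, so dispatch stays cheap."""
    for u in uops:
        if u.op == "ADDI":
            regs[u.rd] = (regs[u.rs] + u.x) & MASK64
        elif u.op == "ADD":
            regs[u.rd] = (regs[u.rs] + regs[u.x]) & MASK64
        elif u.op == "LOAD":
            regs[u.rd] = mem[regs[u.rs]]
        else:
            raise ValueError(f"bad micro-op {u.op}")


regs = [0] * 16
mem = {100: 10, 101: 20, 102: 30}
program = [("ADD_MEM", 1, 2, 100, True)] * 3   # sum three words, stepping the index
for instr in program:
    execute(lower(instr), regs, mem)
print(regs[1], regs[2])  # 60 3
```

Here lru_cache keys translations by instruction contents for brevity; an emulator would more likely key them by address. Because each micro-op does one simple thing, a later stage could map them almost one-to-one onto native instructions.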
What makes the Unisys efforts unique and worth our attention?
I'd point to a few factors. First, in most commercial uses of emulation, emulation was viewed as a migration aid, a crutch to get customers across to a new system. The ultimate intent was for the customer to rewrite or recompile in the new environment. Even for Apple, the emulation was a temporary measure to get applications across; Apple encouraged, and then demanded, that application writers recompile. The emulators were "one time" programs; once written, they didn't get better.
In contrast, Unisys' emulation is intended as a permanent solution with continuing performance improvements. Although ClearPath Forward -- our long-range architectural direction -- will give clients the opportunity to recompile for higher performance, they don't have to. From generation to generation, the emulators (now becoming translators) keep improving and deliver better performance with no effort at all on the customers' part.
Second, most emulators go "like to like" -- that is, the source and destination architectures are fairly similar. Even in the move from PowerPC to Intel, Apple was dealing with 64-bit, general register, byte-addressable, flat address space systems with identical character codes (ASCII), floating point formats, integer data types, memory management models, and so on. In contrast, Unisys moved two extraordinarily different architectures to Intel. MCP is a 48-bit system with tagged memory, a stack architecture, segmented memory, EBCDIC characters, and unique integer and floating point formats. OS 2200 is a 36-bit system with multiple operating modes, 6-bit and 9-bit character sets (not ASCII), and unique integer and floating-point formats. Capturing all these architectural idiosyncrasies, with perfect fidelity and ongoing performance improvements, is unprecedented.
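To give a flavor of the fidelity problem, here is a minimal sketch of two 36-bit details emulated on a host whose native integers are two's complement: ones'-complement addition with end-around carry, and splitting a 36-bit word into 6-bit or 9-bit character fields. It assumes a textbook ones'-complement adder; the exact behavior of real OS 2200 hardware (for example, when it produces negative zero) may differ, and all function names are hypothetical.

```python
WORD_BITS = 36
WORD_MASK = (1 << WORD_BITS) - 1   # all 36 bits set (also ones'-complement "negative zero")
SIGN_BIT = 1 << (WORD_BITS - 1)


def to_word(n):
    """Python int -> 36-bit ones'-complement word (negative = bitwise complement)."""
    return n if n >= 0 else WORD_MASK + n


def from_word(w):
    """36-bit ones'-complement word -> Python int (both zeros map to 0)."""
    return -(WORD_MASK - w) if w & SIGN_BIT else w


def add36(a, b):
    """Textbook ones'-complement addition with end-around carry."""
    total = a + b
    if total > WORD_MASK:                  # carry out of bit 35 ...
        total = (total & WORD_MASK) + 1    # ... is added back into bit 0
    return total


def unpack(word, width):
    """Split a 36-bit word into 36 // width fields, most significant first."""
    mask = (1 << width) - 1
    return [(word >> shift) & mask for shift in range(WORD_BITS - width, -1, -width)]


print(from_word(add36(to_word(5), to_word(-3))))    # 2
print(add36(to_word(3), to_word(-3)) == WORD_MASK)  # True: the result is negative zero
print(unpack(0o010203040506, 6))                    # [1, 2, 3, 4, 5, 6]
print(unpack(0o001002003004, 9))                    # [1, 2, 3, 4]
```

None of these operations maps onto a single native instruction of a two's-complement, byte-addressed host, which is why such details cost both engineering effort and run time.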
Finally, Unisys' emulation strategy has divorced the progress of our ClearPath systems from "Moore's Law" -- that is, Unisys can continue to improve ClearPath performance even if microprocessors don't get any faster. Despite the tremendous progress that has been made so far, there are lots of techniques and algorithms to explore: modifying compilers to provide useful hints for improving dynamic translation; applying compiler-like optimizations to translated sequences; using multiple cores to "pipeline" the execution of translated programs; and so on.
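As an illustration of the compiler-like optimizations mentioned above, here is a minimal constant-folding pass over a translated stack sequence. The instruction tuples and the function names fold_constants and evaluate are hypothetical, and the pass is safe only on straight-line code with a single entry point.

```python
import operator

FOLDABLE = {"ADD": operator.add, "MUL": operator.mul}


def fold_constants(seq):
    """Constant-fold a straight-line stack sequence (single entry, no branches in)."""
    out = []
    for instr in seq:
        if (instr[0] in FOLDABLE and len(out) >= 2
                and out[-1][0] == "PUSH_LIT" and out[-2][0] == "PUSH_LIT"):
            b = out.pop()[1]
            a = out.pop()[1]
            # A real translator must fold using the source machine's own
            # arithmetic (word size, overflow rules), not unbounded Python ints.
            out.append(("PUSH_LIT", FOLDABLE[instr[0]](a, b)))
        else:
            out.append(instr)
    return out


def evaluate(seq, variables):
    """Reference stack evaluator used to check that folding preserved behavior."""
    stack = []
    for instr in seq:
        if instr[0] == "PUSH_LIT":
            stack.append(instr[1])
        elif instr[0] == "PUSH_VAR":
            stack.append(variables[instr[1]])
        else:
            b, a = stack.pop(), stack.pop()
            stack.append(FOLDABLE[instr[0]](a, b))
    return stack[-1]


translated = [("PUSH_LIT", 2), ("PUSH_LIT", 3), ("ADD",),
              ("PUSH_LIT", 4), ("MUL",), ("PUSH_VAR", 0), ("ADD",)]
optimized = fold_constants(translated)
print(optimized)  # [('PUSH_LIT', 20), ('PUSH_VAR', 0), ('ADD',)]
print(evaluate(translated, [1]), evaluate(optimized, [1]))  # 21 21
```

Seven operations shrink to three with identical results, and none of this depends on the host processor getting any faster.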
In the words of the late Al Jolson, "You ain't seen nothin' yet."