Intel to Face Onslaught of 500MHz 86-Compatible CPUs
AMD, Cyrix, and Integrated Device Technology will compete with start-up firm Rise Technology, all with 86-compatible microprocessors scheduled for release from late 1998 through 1999.
From late 1998 through 1999, several 86-compatible chip manufacturers plan to ship their next generation of products. Most of them are setting their operating frequency targets at 500MHz or higher (Table 1). While they are all aiming for chip performance equivalent to that of the Celeron or Pentium II, each firm has its own design philosophy.
Competing Head-On with Intel
At the Microprocessor Forum held in San Jose, California at the end of 1998, a number of 86-compatible chip manufacturers announced new microprocessors running at speeds of 500MHz or higher. Established manufacturers including Advanced Micro Devices Inc (AMD), Cyrix Corp, and Integrated Device Technology, Inc (IDT), all of the US, revealed outlines for their new chip technologies, while newcomer Rise Technology Co of the US announced its mP6 chip (Fig 1). Rise Technology was established in 1993 by Taiwan-born David T Lin, and currently employs about 90 people. Though its offices are located across the street from the head office of major competitor Intel Corp of the US, the firm says there are no ex-Intel engineers in its employ.
Seemingly in a bid to shake off the competition, Intel announced plans for its 86-series microprocessors. Only slightly behind the 64-bit microprocessor scheduled for shipment in 2000 (codename: Merced), the firm will market the Foster chip, an 86-series 32-bit microprocessor which has a performance on a par with that of its 64-bit big brother. Intel has mentioned in the past that it would enhance 32-bit performance after Merced, and this is the first announcement of a concrete shipment plan.
Of the 86-series chips, the one aimed directly at the personal computer (PC) is the AMD-K7 (generally just called the K7), which is scheduled to ship from AMD in the first half of 1999 (Fig 2, 3). While the other manufacturers of compatible chips continue to aim at the low-end PC, AMD geared the design of the K7 towards high-end PCs and multiprocessor servers. By aiming at the corporate market, currently monopolized by Intel, the firm is challenging Intel directly, while keeping its interest in the low-end PC market.
With the exception of AMD, the 86-compatible chip manufacturers have continually emphasized low price and low power dissipation. This is especially true of Cyrix, which is pushing hard for the low-priced PC market. Its 86-compatible chip (codename: M3), scheduled to ship in the fourth quarter of 1999, for example, will integrate a three-dimensional (3D) graphics controller and a Rambus dynamic random access memory (DRAM) interface (Fig 4).
External Clock Speed of 200MHz
AMD, committed to a face-off with Intel, is building a secondary cache interface into the K7 chip to achieve a performance superior to that of the Pentium II processor. The chip will be able to manage a secondary cache of up to 8 Mbytes. The interface for peripheral equipment will use the same protocol and electrical specifications as the Alpha 21264 64-bit reduced instruction set computer (RISC) chip developed by Digital Equipment Corp of the US. The interface circuit will operate at 200MHz, and the firm plans to double this to 400MHz in the future. The Pentium II and existing compatible chips use interfaces running at 100MHz or less.
The interface linking the Alpha 21264 and peripheral integrated circuits (ICs) was originally designed for point-to-point connection. According to Gary Bixler, AMD-K7 product marketing manager, this means, "It is not very difficult to design signal lines running at 200MHz even with the four-layer PCBs used in desktop PCs. Our devices will be able to run at 400MHz with six- or eight-layer boards."
In a multiprocessor configuration, a peripheral IC can be connected to multiple K7 chips, serving as a crossbar switch to interconnect the microprocessors. Because the interface specifications are the same for connections between microprocessors and peripheral chips, the firm believes it will be possible to use peripheral ICs originally designed for the Alpha 21264.
The K7 will be mounted in a cartridge (called Slot A) with physical specifications almost identical to the Slot 1 used by Intel in the Pentium II processor. By using the same connectors and heat sinks as the Pentium II, the firm hopes to reduce component costs. To prevent the user from accidentally mounting it into a Slot 1, a mechanism to prevent physical interconnection is provided.
AMD, Cyrix, IDT Boost Pipelining
Differences in the design philosophy of the various compatible chip manufacturers are also apparent in the structures of the central processing unit (CPU) cores. Most significantly, they are found in the number of pipeline stages, in the branch prediction technology, in the number of instructions that can be issued simultaneously, and in out-of-order execution.
AMD, Cyrix and IDT seem to have decided that a major enhancement in pipelining is essential to compete with Intel for operation at 500MHz or more. For the Pentium Pro, Intel adopted a 12-stage pipeline.
The AMD-K7 uses a 10-stage pipeline, which represents a big jump from the six-stage pipeline used in the K6. The Cyrix M3 and the IDT WinChip 4 have both increased the number of stages to 11, from the current six.
The mP6 from Rise Technology, on the other hand, has only six stages. It is unknown whether the firm is aiming for the chip to reach an operating speed of 500MHz or more. "We don't want to talk about the operation speed until shipment," says chairman and chief executive officer (CEO) David Lin. The prototype displayed at the Forum had a 200MHz clock.
Increasing Branch Prediction Precision
An increase in the number of pipeline stages makes a corresponding improvement in branch prediction precision essential. This is because the deeper the pipeline, the greater the overhead when a prediction turns out to be wrong. In other words, the precision of branch prediction directly affects performance.
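The cost of a deeper pipeline can be put in rough numbers. The sketch below is a back-of-the-envelope model, not a figure from any of the vendors: it assumes that one instruction in five is a branch, that a misprediction wastes about as many cycles as the pipeline has stages, and that the chip otherwise sustains one instruction per clock. All three values, and the function name `effective_cpi`, are illustrative assumptions.

```python
def effective_cpi(pipeline_stages, prediction_accuracy,
                  branch_fraction=0.20, base_cpi=1.0):
    """Estimate cycles per instruction including branch misprediction stalls.

    Assumptions (illustrative, not vendor data):
      - a mispredicted branch wastes roughly `pipeline_stages` cycles
      - `branch_fraction` of all instructions are branches
      - `base_cpi` is the cycles per instruction with perfect prediction
    """
    mispredict_rate = 1.0 - prediction_accuracy
    stall_cycles = branch_fraction * mispredict_rate * pipeline_stages
    return base_cpi + stall_cycles


if __name__ == "__main__":
    for stages in (6, 11):
        for accuracy in (0.90, 0.95):
            cpi = effective_cpi(stages, accuracy)
            print(f"{stages:2d} stages, {accuracy:.0%} accuracy: CPI ~ {cpi:.2f}")
```

Under these assumptions, an 11-stage pipeline at 90% accuracy loses about 0.22 cycles per instruction to mispredictions, against about 0.12 for six stages. Raising the accuracy to 95% halves the loss in both cases.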
All of the 86-series compatible chip manufacturers who are boosting their pipeline stages are also providing enhanced branch prediction mechanisms.
IDT is especially concerned about this point; it uses three branch address generation methods and two branch prediction methods. The branch prediction mechanism uses the "gshare" technique to improve the utilization efficiency of the branch prediction table, and the "agree" technique, combined with either static or dynamic branch prediction, to make determinations based on the most recent result of that branch instruction. The branch prediction table is quite large: 8K entries. Winstone benchmarking results show a branch prediction precision of 95%, which Rise's Lin admits is "very good."
AMD has 2K entries in its table, and Cyrix only 1K entries. The Pentium II and Pentium Pro, on the other hand, have only 512 entries.
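As a rough illustration of the gshare idea, the sketch below indexes a table of two-bit saturating counters with the branch address XORed with a register of recent branch outcomes. It is a textbook-style model, not IDT's design: the class name, the 13-bit history length (chosen only to match an 8K-entry table), the counter initialization and the test trace are all assumptions.

```python
class GsharePredictor:
    """Textbook-style gshare predictor (illustrative; not IDT's circuit)."""

    def __init__(self, index_bits=13):
        # 2**13 = 8192 entries, matching the 8K-entry table size cited above.
        self.index_bits = index_bits
        self.mask = (1 << index_bits) - 1
        # Two-bit saturating counters: 0-1 predict not taken, 2-3 predict taken.
        self.counters = [1] * (1 << index_bits)
        self.history = 0  # global history of recent branch outcomes

    def _index(self, pc):
        # XOR the branch address with the global history to pick a counter.
        return (pc ^ self.history) & self.mask

    def predict(self, pc):
        return self.counters[self._index(pc)] >= 2

    def update(self, pc, taken):
        i = self._index(pc)
        if taken:
            self.counters[i] = min(3, self.counters[i] + 1)
        else:
            self.counters[i] = max(0, self.counters[i] - 1)
        # Shift the actual outcome into the history register.
        self.history = ((self.history << 1) | int(taken)) & self.mask


def measure_accuracy(predictor, trace):
    """Return the fraction of (pc, taken) pairs predicted correctly."""
    correct = 0
    for pc, taken in trace:
        if predictor.predict(pc) == taken:
            correct += 1
        predictor.update(pc, taken)
    return correct / len(trace)


if __name__ == "__main__":
    # Hypothetical trace: a loop branch taken 7 times, then not taken, repeated.
    trace = [(0x4010, i % 8 != 7) for i in range(8000)]
    print(f"accuracy: {measure_accuracy(GsharePredictor(), trace):.3f}")
```

Because the history register records where the loop currently stands, the final not-taken iteration lands on its own counter and is learned separately from the taken iterations. A table indexed by address alone would mispredict it on every pass.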
Rise Executes Three MMX Instructions
AMD and Rise Technology are developing chips capable of issuing three 86-series instructions simultaneously, thus finally catching up with the Pentium II.
The Rise chip issues three MMX instructions simultaneously. To accomplish this, the chip integrates an MMX instruction multiply/accumulate operator, arithmetic logic unit (ALU) and shift operator. According to the firm, even object code optimized for the Pentium II will deliver higher performance, because three MMX instructions are issued at once.
The AMD chip simultaneously converts three 86-series instructions into expressions called MacroOPs, each of which is made up of one or two internal instructions. MacroOPs are executed with either wired logic or microcode. Nine operators are provided for MacroOPs processing, consisting of three ALUs, three address operators and three floating-point operators. However, the chip can only execute two MMX instructions simultaneously.
Considering Circuit Complexity
Cyrix and IDT, having considered the circuit complexity required to issue three instructions simultaneously, decided that two instructions at a time was optimum.
According to a source at Cyrix, most PC software won't gain much of a performance boost even if the hardware running it can execute three or more instructions at once (Fig 5). This is because 40 to 60% of the processing time the application requires is spent on operating system (OS) execution, which has a very low degree of parallelism at the instruction level.
IDT did not adopt an out-of-order execution function, which can process instructions regardless of the order in which they appear in the program. The firm felt that maximum priority should be assigned to improving the cache hit rate, shrinking mounting area, and boosting operating frequency. Rise Technology's mP6 also excludes the out-of-order execution function, although, according to Lin, "We may adopt it in the future, but not right now."
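Cyrix's argument is essentially Amdahl's law. The sketch below is a minimal model under two assumptions that are not part of Cyrix's statement: that OS code gains nothing from a wider issue width, and that application code speeds up by the full ratio of issue widths (3/2 = 1.5), which is an optimistic upper bound. The function name `overall_speedup` is hypothetical.

```python
def overall_speedup(os_fraction, app_speedup):
    """Amdahl's law: only the non-OS share of run time is accelerated.

    os_fraction -- share of run time spent in OS code (assumed to gain nothing)
    app_speedup -- speedup of the remaining application code
    """
    if not 0.0 <= os_fraction <= 1.0:
        raise ValueError("os_fraction must be between 0 and 1")
    return 1.0 / (os_fraction + (1.0 - os_fraction) / app_speedup)


if __name__ == "__main__":
    # Optimistic assumption: going from two-issue to three-issue makes
    # application code 1.5 times faster.
    for os_fraction in (0.40, 0.60):
        gain = overall_speedup(os_fraction, app_speedup=1.5)
        print(f"OS share {os_fraction:.0%}: overall speedup ~ {gain:.2f}x")
```

Even with that optimistic bound, the overall gain is about 1.25 times when 40% of the time goes to the OS, and about 1.15 times at 60%.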
Intel Sets Sights on 1GHz
In response to advances in chip design made by competitors, Intel is striving to further boost the operating frequency of its microprocessors.
The Foster 86-series microprocessor, scheduled for commercialization in 2000 or 2001, will offer performance on a par with that of Merced (Fig 6). To achieve this, it will have an operating frequency of 1GHz or higher, and will be able to process significantly more 86-series instructions simultaneously than the existing Pentium II processor or its immediate successors (codenames Tanner and Cascades).
Three technologies are being introduced to execute more instructions simultaneously:
- A management method called the trace cache, whereby multiple instructions are lined up in execution order in the instruction cache;
- Enhanced branch prediction mechanisms; and
- Increased capacities for the internal primary and secondary caches of microprocessors. The primary cache data transfer rate will be 32 Gbytes/s, and the secondary rate will be 8 Gbytes/s.
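The cache transfer rates quoted above translate into bytes per clock cycle once a clock frequency is fixed. The sketch below assumes exactly 1GHz, the lower bound given for Foster, and treats 1 Gbyte as 10^9 bytes. Both are assumptions, and a higher clock would lower the per-cycle figures.

```python
def bytes_per_cycle(rate_gbytes_per_s, clock_ghz):
    """Convert a transfer rate into bytes moved per clock cycle.

    Assumes 1 Gbyte = 10**9 bytes and 1 GHz = 10**9 cycles per second,
    so the two powers of ten cancel.
    """
    if clock_ghz <= 0:
        raise ValueError("clock_ghz must be positive")
    return rate_gbytes_per_s / clock_ghz


if __name__ == "__main__":
    clock_ghz = 1.0  # assumed; the article says "1GHz or higher"
    for name, rate in (("primary", 32), ("secondary", 8)):
        print(f"{name} cache: {bytes_per_cycle(rate, clock_ghz):.0f} bytes/cycle")
```

At 1GHz this works out to 32 bytes (256 bits) per cycle from the primary cache and 8 bytes (64 bits) per cycle from the secondary cache.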
The basic specifications of Merced are as follows:
- Separate decoders are provided for IA-64 instructions and 86-series (IA-32) instructions, but the ALU, registers and data cache are shared;
- In addition to an external cache, the chip itself will also integrate primary and secondary caches. The primary cache will be divided into instruction and data portions, while the two will be mixed in the secondary cache; and
- Two floating-point multiply/accumulate operators each are integrated for single-precision and double-precision operations.
The 3D graphics drawing performance of the Merced floating-point operators is about 20 times that of the Pentium Pro, and three times that of the 86-series Tanner microprocessor, slated for release in 1999.
The successor to Merced, scheduled to ship in the second half of 2001 (codename: McKinley), aims to double the performance of Merced. It will increase the number of operators, and will offer a processor bus data transfer rate triple that of Merced.
by Hiroki Eda