In November 2000 Intel introduced the new and very powerful high-end chip Pentium 4, formerly known by the codename "Willamette". It was (as expected) delayed.
The Pentium 4 is a completely new processor containing several new designs. Here are the highlights:
The Rapid Execution Engine
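Advanced Dynamic Execution means that the processor may execute up to 6 instructions simultaneously.
Using the Rapid Execution Engine, certain simple integer instructions may be executed at twice the normal speed, because the arithmetic units that handle them run at twice the core clock frequency.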
20 stages of pipeline
The pipeline is an execution unit which takes in decoded micro-instructions. The x86 instructions are first decoded and then sent to the pipeline. A longer pipeline does not finish a single instruction sooner. Instead, each stage executes a smaller part of the work, so each stage can be completed more quickly and the clock frequency can be raised. By spreading the work on "more hands", the efficiency is increased.
In the Pentium III the pipeline had 10 stages. In the Pentium 4 it has been increased to 20 stages.
The problem with the long pipeline is that it takes longer to load the instructions. And if an instruction should not have been loaded at all, the error becomes more costly (in time) the longer the pipeline is.
With many instructions being executed simultaneously, you cannot avoid loading the wrong instruction (called a misprediction) from time to time. A shorter pipeline recovers from this error more quickly: fewer stages have to be cleared and reloaded.
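The cost of a misprediction can be put into rough numbers. The sketch below is a toy model and not a description of Intel's hardware: it assumes that every mispredicted branch throws away one clock cycle per pipeline stage, and the function name, the share of branches and the misprediction rate are invented for illustration.

```python
def cycles_per_instruction(stages, branch_fraction, mispredict_rate, base_cpi=1.0):
    """Average clock cycles per instruction in a toy pipeline model.

    Assumption: each mispredicted branch costs `stages` cycles,
    because every stage has to be cleared and reloaded.
    """
    penalty = stages  # cycles lost per misprediction (a simplification)
    return base_cpi + branch_fraction * mispredict_rate * penalty


# Hypothetical workload: 20% of instructions are branches, 5% of those are mispredicted.
short_pipe = cycles_per_instruction(stages=10, branch_fraction=0.20, mispredict_rate=0.05)
long_pipe = cycles_per_instruction(stages=20, branch_fraction=0.20, mispredict_rate=0.05)

print(f"10 stages: {short_pipe:.2f} cycles per instruction")
print(f"20 stages: {long_pipe:.2f} cycles per instruction")
```

With these made-up figures the 20-stage pipeline loses twice as many cycles to mispredictions as the 10-stage one, which is why better branch prediction matters more as the pipeline grows.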
The analytic work of preventing mispredictions is done by the Branch Prediction Unit. According to Intel, this unit performs 30% better than the one in the Pentium III.
Stalling is another phenomenon. Normally the data to be used is located in the cache. But if, for some reason, the data is missing from the cache, it has to be loaded from RAM. This takes a lot of time, and the longer the pipeline is, the longer it takes.
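A stall can also be expressed in clock cycles. The sketch below only converts a waiting time into cycles. The 100 ns RAM latency and the function name are assumptions chosen for illustration, not measured values for any real system.

```python
def stall_cycles(ram_latency_ns, clock_mhz):
    """Clock cycles that pass while the processor waits for RAM.

    One cycle lasts 1000 / clock_mhz nanoseconds, so the number of
    cycles is the latency divided by the cycle time.
    """
    return ram_latency_ns * clock_mhz / 1000


# Hypothetical RAM latency of 100 ns at two different clock frequencies.
for mhz in (1000, 2000):
    print(f"{mhz} MHz: {stall_cycles(100, mhz):.0f} cycles lost per cache miss")
```

The same delay in nanoseconds wastes twice as many cycles when the clock frequency is doubled, so a fast processor depends heavily on finding its data in the cache.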
The benefit of increasing the pipeline from 10 stages to 20 is that it opens up for new, higher clock frequencies. When each instruction is executed in more stages, each stage has less work to do and can be clocked a lot faster.
At lower frequencies this gives no advantage. In fact, reports indicate that a 2000 MHz Pentium 4 is slower than a 1600 MHz Athlon XP. This is due to the difficulty of predicting the order of the instructions. Wrong predictions result in wasted clock cycles and poorer performance.
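How a chip with a higher clock frequency can still be the slower one is easy to show with a small calculation. The sketch assumes that performance is simply clock frequency multiplied by the average number of instructions completed per clock cycle. The two per-cycle figures are invented to illustrate the effect and are not measurements of either processor.

```python
def instructions_per_second(clock_mhz, instructions_per_cycle):
    """Rough throughput: cycles per second times instructions per cycle."""
    return clock_mhz * 1_000_000 * instructions_per_cycle


# Invented values: the long pipeline wastes more cycles, so it completes
# fewer instructions per cycle on average.
long_pipeline = instructions_per_second(2000, 0.8)
short_pipeline = instructions_per_second(1600, 1.1)

print(f"2000 MHz, long pipeline:  {long_pipeline / 1e9:.2f} billion instructions/s")
print(f"1600 MHz, short pipeline: {short_pipeline / 1e9:.2f} billion instructions/s")
```

Under these assumptions the 1600 MHz chip comes out ahead. The long pipeline only pays off once the clock frequency is high enough to make up for the cycles it wastes.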
However, a longer pipeline is required for processor speeds above 2 GHz. Intel expects this new NetBurst core to live three to five years. Hence we may expect Pentium 4 successors to reach 5 GHz or more.
The Streaming SIMD (Single Instruction, Multiple Data) Extensions known from the Pentium III have been improved. The data width for integer operations has doubled from 64 to 128 bits.
Also, the 144 new instructions in SSE2 give more parallel execution. Now four Internet/multimedia-based operations can be executed simultaneously. The new design appears to have been accepted by software developers, and it will probably be very useful within programs like Photoshop.
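The idea of four simultaneous operations can be imitated in software. The sketch below treats 16 bytes (128 bits) as four independent 32-bit numbers and adds them lane by lane, which is the kind of work a packed integer add performs in a single instruction. The function name is made up, and Python of course runs the four additions one after another, so this shows only the data layout, not the speed.

```python
import struct


def packed_add_4x32(a, b):
    """Add two 128-bit values as four separate 32-bit unsigned integers."""
    lanes_a = struct.unpack("<4I", a)  # split 16 bytes into four lanes
    lanes_b = struct.unpack("<4I", b)
    # Each lane wraps around on its own at 2**32; no carry crosses into the next lane.
    sums = [(x + y) & 0xFFFFFFFF for x, y in zip(lanes_a, lanes_b)]
    return struct.pack("<4I", *sums)


a = struct.pack("<4I", 1, 2, 3, 0xFFFFFFFF)
b = struct.pack("<4I", 10, 20, 30, 1)
print(struct.unpack("<4I", packed_add_4x32(a, b)))  # (11, 22, 33, 0)
```

The last lane overflows and wraps to 0 without disturbing its neighbours, which is what distinguishes four packed additions from one large 128-bit addition.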
The Execution Trace Cache
The Pentium 4 is the first CPU to have a "code cache". All instructions are translated inside the CPU. This happens in all modern x86 processors. They receive x86 instructions from the software. These instructions are "crunched" into smaller instructions, which are then executed natively.
The new thing in Pentium 4 is that these translated instructions are cached and reused. The logic of this setup may look like this:
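This new cache, being a part of the L1 cache, holds about 12,000 translated micro-instructions (12K micro-ops), which is often loosely counted as 12 KB. Together with 8 KB of data cache, the L1 cache is then said to consist of 20 KB.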
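The principle of caching translated instructions can be sketched in a few lines. This is a deliberately simplified illustration and not Intel's design: the class name, the toy decoder and the micro-instruction names are all invented, and the cache here is indexed per instruction address, whereas the real trace cache stores whole sequences in execution order.

```python
class TraceCacheSketch:
    """Remembers decoded micro-instructions so that decoding happens only once."""

    def __init__(self, decode):
        self._decode = decode   # the slow x86 -> micro-instruction translation
        self._cache = {}        # instruction address -> decoded micro-instructions
        self.decode_count = 0   # how often the slow decoder was actually used

    def fetch(self, address, x86_instruction):
        if address not in self._cache:
            self.decode_count += 1
            self._cache[address] = self._decode(x86_instruction)
        return self._cache[address]


# Hypothetical decoder table: one x86 instruction becomes one or more micro-instructions.
TOY_MICRO_OPS = {
    "add eax, [ebx]": ("load tmp, [ebx]", "add eax, tmp"),
    "inc ecx": ("add ecx, 1",),
}


def toy_decode(x86_instruction):
    return TOY_MICRO_OPS[x86_instruction]


cache = TraceCacheSketch(toy_decode)
loop_body = [(0x1000, "add eax, [ebx]"), (0x1003, "inc ecx")]
for _ in range(100):  # a loop that runs 100 times
    for address, instruction in loop_body:
        cache.fetch(address, instruction)

print(cache.decode_count)  # 2
```

The loop executes 200 instructions, but the decoder is only used twice. Everything else is served from the cache, which is the saving the translated-instruction cache is meant to deliver.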
The first Pentium 4s were giant chips with a die size of 247 mm². The 42 million transistors use 60 watts and require heavy cooling. However, Intel has done a lot to provide sufficient cooling using new materials and designs.
The first Pentium 4s came in a 423-pin design. These CPUs could only use RDRAM, which requires only a few pins for its interface.
Later came a 478-pin design with support for SDRAM.
The i850 chipset ("Tehama") uses a dual RDRAM bus. This also heats up the MCH (north bridge) chip, so additional cooling is required.
The Intel i845 chipset was introduced in August 2001. It provides an interface to standard 133 MHz SDRAM. This chipset is found in many cheap Pentium 4 computers.
Support for 266 MHz DDR RAM is found in the VIA P4X266 and P4X266A chipsets for use with the Pentium 4 (against Intel's will; they don't like VIA at all).
The SiS 645 is a chipset for the Pentium 4 with support for both 266 and 333 MHz DDR RAM.
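The difference between these RAM types is easiest to see as peak bandwidth, which is the number of transfers per second multiplied by the width of the bus. The sketch below assumes the usual 64-bit (8-byte) bus for SDRAM and DDR modules and a 16-bit (2-byte) channel for RDRAM. The 800 MHz RDRAM speed is an assumption, since the speed grade is not stated above, and the function name is made up. These are theoretical maximum figures, not measured performance.

```python
def peak_bandwidth_mb_s(million_transfers_per_s, bus_width_bytes, channels=1):
    """Theoretical peak bandwidth in MB/s (1 MB = 1,000,000 bytes)."""
    return million_transfers_per_s * bus_width_bytes * channels


memory_types = [
    ("133 MHz SDRAM", 133, 8, 1),
    ("266 MHz DDR", 266, 8, 1),
    ("333 MHz DDR", 333, 8, 1),
    ("dual-channel 800 MHz RDRAM (assumed speed)", 800, 2, 2),
]

for name, rate, width, channels in memory_types:
    print(f"{name}: {peak_bandwidth_mb_s(rate, width, channels)} MB/s")
```

On paper, 266 MHz DDR doubles the bandwidth of 133 MHz SDRAM, while the dual RDRAM bus reaches its high figure through clock speed and two channels rather than a wide bus.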
The i845D chipset
In December 2001 Intel suddenly introduced a new flavour of the 845 chipset. It is designed for use with DDR RAM!
There were no press releases about this product. Searching Intel's web site for info on this chipset gave no result (December 27th 2001).
However, the i845D chipset was found in computers from Dell and others.
Intel was under hard pressure from AMD, VIA, and the Taiwanese motherboard companies throughout 2001, and it was good to see that they finally had to adapt to common sense. DDR is the RAM type to use - almost nobody wants RAMBUS! And Intel's dominating position in the market will promote better DDR RAM products.
On January 7th 2002 Intel introduced a new 2.2 GHz version of the Pentium 4.
This processor comes with the new Northwood core: