KarbosGuide.com. Module 3e.14

The Intel Pentium 4 processor

The contents:

  • An introduction to the Pentium 4
  • SSE2
  • The Execution Trace Cache
  • "Northwood"

    In November 2000 Intel introduced the new and very powerful high-end Pentium 4 chip, formerly known by the codename "Willamette". It was (as expected) delayed.

    The Pentium 4 is a completely new processor with several new design features. Here are the highlights:

  • 400 MHz Front Side Bus (a 100 MHz clock, quad-pumped), 64 bits wide
  • Execution Trace Cache
  • 20 KB L1 cache and 256 KB L2
  • The ALU (Arithmetic Logic Unit) runs at twice the clock speed
  • A new socket for simple motherboard design
  • Clock frequencies from 1500 MHz
  • 20-stage pipeline
  • SSE2 and 128-bit MMX
  • 42 million transistors
  • A new 423-pin socket design
  • Dual Rambus memory channel with i850 chipset
  • Only single processor mode available.


    Intel uses the term NetBurst to describe some of the features in the Pentium 4:

  • Advanced Dynamic Execution
  • The Rapid Execution Engine

    Advanced Dynamic Execution means that the processor may execute up to 6 instructions simultaneously.

    Using the Rapid Execution Engine, certain instructions may be executed at twice the normal speed.

    The 20-stage pipeline

    The pipeline is an execution unit which takes in decoded micro-instructions. The x86 instructions are first decoded and then sent to the pipeline. The longer the pipeline is, the less work each stage has to do, and the faster the clock can run. Each stage executes a minor part of the work, and by spreading the work over "more hands" the efficiency is increased.

    In the Pentium III the pipeline had 10 stages. In the Pentium 4 it has been increased to 20 stages.

    The problem with a long pipeline is that it takes longer to load an instruction. And if the instruction should not have been loaded at all, the error becomes more costly (in time) the longer the pipeline is.

    With many instructions being executed simultaneously, you cannot avoid loading the wrong instruction (called a misprediction) from time to time. A shorter pipeline recovers from this error more quickly, since fewer stages have to be cleared and reloaded.
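    The cost of mispredictions can be put into numbers with a very simple model. The sketch below is only an illustration: the function name, the share of branches, the misprediction rate and the assumption that a flush costs as many cycles as the pipeline has stages are all made up for the example and are not Intel figures.

```python
def average_cpi(base_cpi, branch_fraction, mispredict_rate, flush_penalty_cycles):
    """Average clock cycles per instruction when every mispredicted
    branch costs one pipeline flush."""
    return base_cpi + branch_fraction * mispredict_rate * flush_penalty_cycles


# Illustrative assumptions, not measurements: 20% of the instructions are
# branches, 5% of those are mispredicted, and the flush penalty is taken
# to be equal to the number of pipeline stages.
for stages in (10, 20):
    cpi = average_cpi(1.0, 0.20, 0.05, stages)
    print(f"{stages}-stage pipeline: {cpi:.2f} cycles per instruction")
```

    With these assumed numbers the 10-stage pipeline needs 1.10 cycles per instruction and the 20-stage pipeline 1.20, so the same misprediction rate hurts the longer pipeline twice as much.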

    The analytic work of preventing mispredictions is done by the Branch Prediction Unit. According to Intel, this unit performs 30% better than the one in the Pentium III.

    Stalling is another phenomenon. Normally the data to be used is located in the cache. But if, for some reason, the data is missing from the cache, it has to be loaded from RAM. This takes a lot of time, and the longer the pipeline is, the longer it takes.

    The benefit of increasing the pipeline from 10 to 20 stages is that it opens the way to new, higher clock frequencies. When each instruction is executed in more stages, each stage has less to do and can be completed a lot quicker.
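    A rough way to see the connection between stage count and clock frequency is to split a fixed amount of logic delay evenly over the stages and add a small fixed overhead per stage. This is a textbook-style sketch; the function name and the delay values are hypothetical and chosen only to give round numbers.

```python
def max_clock_mhz(total_logic_ns, stages, latch_overhead_ns):
    """Highest clock rate if the logic is split evenly over the stages.

    Each stage needs its share of the logic delay plus a fixed
    overhead for the latch between stages.
    """
    cycle_ns = total_logic_ns / stages + latch_overhead_ns
    return 1000.0 / cycle_ns


# Hypothetical delays, for illustration only.
for stages in (10, 20):
    mhz = max_clock_mhz(8.0, stages, 0.1)
    print(f"{stages} stages: about {mhz:.0f} MHz")
```

    Under these assumptions, doubling the number of stages raises the possible clock from about 1111 MHz to 2000 MHz. The gain is a little less than a doubling, because the fixed overhead per stage does not shrink.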

    At lower frequencies this gives no advantage. In fact, all reports indicate that a 2000 MHz Pentium 4 is slower than a 1600 MHz Athlon XP. This is due to the difficulty of predicting the order of the instructions. Wrong predictions give wasted clock cycles and poorer performance.
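    Performance is roughly the clock frequency multiplied by the work done per clock cycle. A small calculation, using only the two clock frequencies mentioned above (the function name is made up for the example), shows how much more work per cycle the lower-clocked chip needs in order to keep up:

```python
def required_ipc_advantage(higher_clock_mhz, lower_clock_mhz):
    """How much more work per clock cycle the lower-clocked chip
    must do to break even with the higher-clocked one."""
    return higher_clock_mhz / lower_clock_mhz


ratio = required_ipc_advantage(2000, 1600)
print(f"Break-even work per cycle: {ratio:.2f}x")  # 1.25x
```

    So a 1600 MHz chip is faster as soon as it completes more than 25% more work per clock cycle than the 2000 MHz chip.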

    However, a longer pipeline is required for processor speeds above 2 GHz. Intel expects this new NetBurst core to live three to five years. Hence we may expect Pentium 4 successors to reach 5 GHz or more.


    SSE2

    The Streaming SIMD (Single Instruction, Multiple Data) Extensions known from the Pentium III have been improved. The data width has doubled from 64 to 128 bits.

    In addition, the 144 new instructions in SSE2 give more parallel execution. Now four Internet/multimedia-based operations can be executed simultaneously. The new design appears to have been accepted by software developers, and it will probably be very useful within programs like Photoshop.
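    To illustrate what "four operations at once" means, the sketch below mimics in plain Python what a single packed add on a 128-bit value does: the 128 bits are treated as four independent 32-bit numbers which are added pairwise. This is only an emulation to show the idea, not real SSE2 code, and the function name is made up for the example.

```python
import struct


def packed_add_u32(a: bytes, b: bytes) -> bytes:
    """Add the four unsigned 32-bit lanes of two 128-bit values.

    Each lane wraps around on overflow, independently of its neighbours.
    """
    lanes_a = struct.unpack("<4I", a)
    lanes_b = struct.unpack("<4I", b)
    summed = [(x + y) & 0xFFFFFFFF for x, y in zip(lanes_a, lanes_b)]
    return struct.pack("<4I", *summed)


# Two 128-bit values, each holding four 32-bit numbers.
a = struct.pack("<4I", 1, 2, 3, 0xFFFFFFFF)
b = struct.pack("<4I", 10, 20, 30, 1)
result = struct.unpack("<4I", packed_add_u32(a, b))
print(result)  # (11, 22, 33, 0)
```

    The last lane overflows and wraps to 0 without disturbing the other three. In the real processor the four additions happen in one instruction, where the Python loop above handles them one at a time.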

    The Execution Trace Cache

    The Pentium 4 is the first CPU to have a "code cache". All instructions are translated inside the CPU, as happens in all modern x86 processors: they receive x86 instructions from the software, and these are "crunched" into smaller instructions which are then executed natively.

    The new thing in the Pentium 4 is that these translated instructions are cached and reused.

    This new cache, which is part of the L1 cache, holds 12 KB (Intel specifies its capacity as about 12,000 micro-instructions rather than bytes). Together with the 8 KB data cache, the L1 cache totals 20 KB.
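    The idea of caching translated instructions can be sketched as a small toy model. Everything in it is hypothetical (the class, the decoder and the way instructions are split up); the real trace cache is far more complex and also has a limited capacity, which this sketch ignores.

```python
class DecodedCache:
    """Toy model: stores the micro-instructions produced for each
    x86 instruction address so they need not be decoded again."""

    def __init__(self, decoder):
        self._decoder = decoder
        self._store = {}
        self.decode_count = 0  # how often the decoder had to run

    def fetch(self, address, x86_instruction):
        if address not in self._store:
            self._store[address] = self._decoder(x86_instruction)
            self.decode_count += 1
        return self._store[address]


def toy_decoder(x86_instruction):
    # Hypothetical split: one micro-instruction per word of the text.
    return [f"uop:{part}" for part in x86_instruction.split()]


cache = DecodedCache(toy_decoder)
loop_body = [(0x100, "add eax ebx"), (0x102, "dec ecx"), (0x103, "jnz 0x100")]

# Run the same three-instruction loop 1000 times.
for _ in range(1000):
    for address, instruction in loop_body:
        cache.fetch(address, instruction)

print(cache.decode_count)  # 3
```

    The loop executes 3000 instructions, but the decoder only runs three times. All later passes are served directly from the cache.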

    Heavy hardware

    The first Pentium 4s were giant chips with a die size of 247 mm². The 42 million transistors use 60 watts and require heavy cooling. However, Intel has done a lot to provide sufficient cooling, using new materials and designs.

    The sockets

    The first Pentium 4s came with a 423-pin design. These CPUs could only use RDRAM, which requires only a few interface pins.

    Later came a 478-pin design with support for SDRAM.

    The chipsets

    The i850 chipset ("Tehama") uses a dual RDRAM bus. This also heats up the MCH (north bridge) chip, so additional cooling is required.

    The Intel i845 chipset was introduced in August 2001. It provides an interface to standard 133 MHz SDRAM. This chipset is found in many cheap Pentium 4 computers.

    Support for 266 MHz DDR RAM is found in the VIA P4X266 and P4X266A chipsets for use with the Pentium 4 (against Intel's will; they don't like VIA at all).

    The SiS 645 is a chipset for the Pentium 4 with support for both 266 and 333 MHz DDR RAM.
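    The RAM types mentioned above differ a lot in theoretical peak bandwidth. The calculation below is a sketch under stated assumptions: SDRAM and DDR modules are taken to be 64 bits wide, an RDRAM channel 16 bits wide, and the RDRAM is assumed to be the PC800 type (800 million transfers per second). None of these figures are given in the text above, and the function name is made up for the example.

```python
def peak_bandwidth_mb_s(million_transfers_per_s, bus_width_bits, channels=1):
    """Theoretical peak in MB/s:
    transfers per second x bytes per transfer x number of channels."""
    return million_transfers_per_s * (bus_width_bits // 8) * channels


configs = {
    "133 MHz SDRAM": peak_bandwidth_mb_s(133, 64),
    "266 MHz DDR": peak_bandwidth_mb_s(266, 64),
    "333 MHz DDR": peak_bandwidth_mb_s(333, 64),
    # Assumption: PC800 RDRAM, 16 bits per channel, two channels.
    "Dual RDRAM": peak_bandwidth_mb_s(800, 16, channels=2),
}

for name, mb_s in configs.items():
    print(f"{name}: {mb_s} MB/s")
```

    Under these assumptions the result is 1064, 2128, 2664 and 3200 MB/s respectively. These are peak values only; real-world performance also depends on latency and on the chipset.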

    The i845D chipset

    In December 2001 Intel suddenly introduced a new flavour of the 845 chipset. It is designed for use with DDR RAM!

    There were no press releases about this product. A search of Intel's website for information on this chipset gave no result (December 27th 2001).

    However, the i845D chipset was found in computers from Dell and others.

    Intel was under heavy pressure from AMD, VIA, and the Taiwanese motherboard companies throughout 2001, and it was good to see that they finally had to yield to common sense. DDR is the RAM type to use - almost nobody wants RAMBUS! And Intel's dominant position in the market will promote better DDR RAM products.


    "Northwood"

    On January 7th 2002 Intel introduced a new 2.2 GHz version of the Pentium 4.

    This processor comes with the new Northwood core:

  • L2 cache doubled from 256 KB to 512 KB
  • 0.13-micron process

    New techniques should also improve the clock cycle/instruction execution ratio.

    Due to the new design, the performance of this Pentium 4 increased by 30% compared to the 2 GHz version, where one would expect only 10% from the higher clock frequency alone.
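    The 10% figure follows directly from the clock frequencies, and the rest of the gain can be attributed to the design changes. A small calculation (the function names are made up for the example, and the 30% is the figure quoted above, not a measurement of ours):

```python
def clock_scaling_percent(old_mhz, new_mhz):
    """Performance gain expected from the higher clock frequency alone."""
    return (new_mhz / old_mhz - 1.0) * 100.0


def design_gain_percent(total_gain_percent, clock_gain_percent):
    """The part of the total gain that the clock increase does not explain."""
    total = 1.0 + total_gain_percent / 100.0
    clock = 1.0 + clock_gain_percent / 100.0
    return (total / clock - 1.0) * 100.0


clock_gain = clock_scaling_percent(2000, 2200)
print(f"From the clock alone: {clock_gain:.0f}%")
print(f"From the new design: {design_gain_percent(30.0, clock_gain):.1f}%")
```

    With the figures above, roughly 18% of the improvement would come from the new design (the larger L2 cache and other changes) and 10% from the clock.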

    Learn more

    Read about chip sets on the motherboard in module 2d

    Read more about RAM in module 2e

    Read module 5a about expansion cards, where we evaluate the I/O buses from the port side.

    Read module 5b about AGP and module 5c about Firewire.

    Read module 7a about monitors, and 7b on graphics card.

    Read module 7c about sound cards, and 7d on digital sound and music.

    Copyright (c) 1996-2005 by Michael B. Karbo. Click & Learn