KarbosGuide.com. Module 3e.09

On MMX, 3DNow!, and Katmai

The contents:

  • An introduction
  • The FPU
  • Working with 3D graphics
  • MMX
  • 3DNow!
  • Katmai

    Multimedia, MMX and Katmai

    The Pentium MMX brought the first of several extensions to the microprocessor's instruction set. Later came 3DNow! and Katmai. What does all this mean?

    In 1997 the Pentium processor was expanded with the so-called MMX instructions. They were announced as a multimedia extension with 57 new instructions.

    Today the emphasis in multimedia is on 3D graphics. Here the most important operations are the so-called geometric transformations, which work on floating-point numbers. Let us take a look at these issues.


    The FPU

    FPU stands for Floating-Point Unit. It is the unit in the processor that handles floating-point numbers. Floating-point numbers are demanding for the CPU, since an accurate calculation requires a great many bits. Math with integers is much simpler, and the result is exact every time.

    The FPU works with floating-point numbers of various bit lengths, depending on the desired degree of accuracy. The most accurate type has a bit length of 80!

    All the modern P6 processors have 8 FP registers, each 80 bits wide. So there is room inside the CPU itself for 8 numbers of 80 bits each or, for example, 16 numbers of 32 bits each when two are packed into every register.

    Working with 3D graphics

    When people and landscapes are drawn and moved around in 3D graphics, the figures are built up from small polygons (usually triangles or rectangles).

    A figure in a PC game can typically be built from 200-1500 such polygons. For each change in the picture these polygons have to be re-drawn in a new position. This means that each corner (vertex) in every polygon has to be recalculated.

    Floating-point number operations

    To calculate the placement of the polygons, you need to use floating-point numbers. Integer calculations (1, 2, 3, 4 etc.) are not nearly precise enough. Instead, you use decimal numbers such as 4.347, stored as single-precision numbers that are 32 bits long. There are also 64-bit numbers, which carry more significant digits. They are called double-precision numbers and are useful for even more demanding calculations.
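
    The difference between the two formats can be seen in a small Python sketch. Python's own floats are 64-bit doubles, so the hypothetical helper to_single below rounds a value to 32 bits and back. It is only an illustration of the two bit lengths, not of how a game engine stores its numbers.

```python
import struct


def to_single(x: float) -> float:
    """Round a Python float (a 64-bit double) to the nearest 32-bit single."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


value = 4.347              # held as a 64-bit double by Python
single = to_single(value)  # the same number after a trip through 32 bits

# Bit lengths of the two storage formats.
single_bits = struct.calcsize("<f") * 8
double_bits = struct.calcsize("<d") * 8

# The 32-bit copy is close to the 64-bit value, but not identical.
rounding_error = abs(single - value)

print(single_bits, double_bits)
print(repr(value), repr(single), rounding_error)
```

    The rounding error is tiny compared with the size of a polygon on the screen, which is why the shorter format is usually acceptable for 3D work.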

    However, 32-bit numbers are sufficient for 3D objects. When the figures in a 3D landscape move, a so-called matrix multiplication is needed to calculate the new vertices. If a figure consists of 1000 polygons, it requires up to 84,000 multiplications, each with two 32-bit floating-point numbers.
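
    As a rough sketch of what such a matrix multiplication involves, the Python example below moves a single vertex with a 4x4 matrix and counts the arithmetic. The function name transform and the translation matrix are made up for illustration. Counting both the multiplications and the additions for the three vertices of a triangle is one way to arrive at a figure of 84 operations per polygon. That is an assumption about the arithmetic, not a statement of how the number above was derived.

```python
def transform(matrix, vertex):
    """Multiply a 4x4 matrix by a 4-component vertex (x, y, z, w).

    Returns the new vertex plus the number of multiplications and additions.
    """
    result = []
    mults = adds = 0
    for row in matrix:
        total = row[0] * vertex[0]
        mults += 1
        for m, v in zip(row[1:], vertex[1:]):
            total += m * v
            mults += 1
            adds += 1
        result.append(total)
    return result, mults, adds


# Hypothetical example: move a vertex by (10, 0, -5).
translate = [
    [1.0, 0.0, 0.0, 10.0],
    [0.0, 1.0, 0.0, 0.0],
    [0.0, 0.0, 1.0, -5.0],
    [0.0, 0.0, 0.0, 1.0],
]
moved, mults, adds = transform(translate, [1.0, 2.0, 3.0, 1.0])

# Assumption: every polygon is a triangle with three vertices.
triangles = 1000
ops_per_triangle = 3 * (mults + adds)
total_ops = triangles * ops_per_triangle

print(moved, mults, adds, total_ops)
```

    All of this work has to be repeated for every frame in which the figure moves.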

    It is quite a hefty piece of math, for which the traditional PC is not well equipped. Actually, the largest spreadsheet available to the finance ministry is a drop in the bucket compared to Quake II, as far as number crunching is concerned.

    What assists the 3D execution?

    The CPU can easily run out of breath when it has to handle 3D movement across the screen. So what assistance can it get? Help can come in different ways:

  • Generally speaking, the higher the CPU's clock speed, the faster the traditional FPU performs.

  • Improvements in the CPU’s FPU with pipelines and other acceleration. We see that in each new CPU generation.

  • New instructions for more effective 3D performance, which programs can call directly. 3DNow! and SSE are examples of this.

  • 3D accelerated graphics cards.


    MMX

    The Pentium MMX processor was a big success. However, that was not because of the MMX instructions. Many regard them as a flop.

    The point is that MMX only works with integers. Furthermore, the design is so limited that a program can work with either MMX or the FPU, but not both simultaneously. That is because the two sets of instructions share registers.

    The MMX instructions can assist with other tasks in the redrawing of 3D landscapes (the surfaces etc.), but for all the geometry you need much more oomph!
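
    To give an idea of the kind of integer work MMX is suited for, here is a small Python model of one packed operation: eight unsigned bytes are added in one go, and each result is clamped at 255 instead of wrapping around. The function names are invented for this sketch, and the 64-bit register is simply modeled as a Python integer. Real MMX code would use the processor's own instructions.

```python
def pack_bytes(values):
    """Pack eight unsigned bytes into one 64-bit integer (lane 0 lowest)."""
    assert len(values) == 8 and all(0 <= v <= 255 for v in values)
    return int.from_bytes(bytes(values), "little")


def unpack_bytes(register):
    """Split a 64-bit integer back into its eight byte lanes."""
    return list(register.to_bytes(8, "little"))


def packed_add_saturate(a, b):
    """Add all eight byte lanes at once, clamping each lane at 255."""
    lanes = [min(x + y, 255) for x, y in zip(unpack_bytes(a), unpack_bytes(b))]
    return pack_bytes(lanes)


# Hypothetical use: brighten eight pixel values by 60 in one operation.
pixels = pack_bytes([10, 50, 100, 150, 200, 250, 255, 0])
brighten = pack_bytes([60] * 8)
result = unpack_bytes(packed_add_saturate(pixels, brighten))

print(result)
```

    This is useful for surfaces and pixels, but it does nothing for the floating-point geometry.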

    One program that utilizes MMX is "Painter Classic", a great drawing program that is bundled with Wacom's drawing tablets. MMX can be enabled in the program's settings.


    3DNow!

    During the summer of 1998 AMD introduced 3DNow!, a new collection of CPU instructions that improve 3D execution:

  • 21 new instructions.

  • SIMD instructions, which handle several data items with just one instruction.

  • Improved handling of numbers, especially the 32-bit numbers that are widely used in 3D games.

    3DNow! became a big success, since support for the instructions was soon integrated in Windows, in various games (and other programs), and in the driver programs from the hardware producers.

    The instructions use the same registers as MMX and the traditional FPU, so they have to share them. 3DNow! uses 64 bits of each 80-bit register, so a register can hold two 32-bit numbers simultaneously.
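
    The idea of two 32-bit numbers sharing one register can be modeled in a few lines of Python. Two single-precision values occupy 64 bits side by side, and one packed addition processes both pairs. The helper names are hypothetical, and the register is just a Python integer. The sketch shows the data layout only, not AMD's actual instructions.

```python
import struct


def pack_pair(lo, hi):
    """Store two 32-bit floats side by side in one 64-bit integer."""
    return int.from_bytes(struct.pack("<2f", lo, hi), "little")


def unpack_pair(register):
    """Read the two 32-bit floats back out of a 64-bit integer."""
    return struct.unpack("<2f", register.to_bytes(8, "little"))


def packed_add(a, b):
    """Add both float lanes of two 64-bit values in a single call."""
    a_lo, a_hi = unpack_pair(a)
    b_lo, b_hi = unpack_pair(b)
    return pack_pair(a_lo + b_lo, a_hi + b_hi)


x = pack_pair(1.5, 2.25)
y = pack_pair(0.5, 4.0)
total = unpack_pair(packed_add(x, y))

print(total)
```

    One call handles two additions, which is where the doubling of 32-bit performance comes from.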


    Katmai

    Katmai (SSE) is Intel's way to improve 3D execution in the Pentium III. See also the description in module 3e7. The problem with Katmai is that the instructions require software support, and that will take some time to get in place.

    In principle Katmai is significantly more powerful than 3DNow!. Each of the 8 new 128-bit registers can hold four 32-bit numbers at a time. But to take advantage of this, the FPU pipeline should also have been doubled, so that each multiplication or addition pipeline could receive four numbers at a time.

    However, that was not done in the Pentium III, since it would have delayed its introduction. So the pipelines can still only handle two 32-bit numbers at a time. In that way the full potential of Katmai is not reached within the actual Pentium III design.

    With the current FPU, the Pentium III can perform twice as many 32-bit number operations per clock tick as the other P6 processors (Pentium II and Celeron). That is the same performance as we find in the 3DNow! processors. But future editions of the Pentium III are planned with a four-fold increase in FPU performance as far as the 32-bit numbers are concerned.
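
    The relation between register width and pipeline width can be illustrated with a deliberately simplified Python model. A register with four numbers is multiplied in chunks of pipeline_width numbers, and the passes are counted. The function and its parameter are invented for this sketch, and it makes no claim about the real timing inside the Pentium III.

```python
def packed_multiply(a, b, pipeline_width):
    """Multiply two 4-lane registers, feeding pipeline_width lanes per pass."""
    assert len(a) == len(b) == 4
    result = []
    passes = 0
    for start in range(0, 4, pipeline_width):
        chunk = slice(start, start + pipeline_width)
        result.extend(x * y for x, y in zip(a[chunk], b[chunk]))
        passes += 1
    return result, passes


a = [1.0, 2.0, 3.0, 4.0]
b = [0.5, 0.5, 2.0, 2.0]

# A pipeline that takes two numbers at a time needs two passes per register.
product_now, passes_now = packed_multiply(a, b, pipeline_width=2)

# A pipeline that takes four numbers at a time would need only one.
product_wide, passes_wide = packed_multiply(a, b, pipeline_width=4)

print(product_now, passes_now, passes_wide)
```

    The results are identical in both cases. Only the number of passes differs, which is the gap between a two-fold and a four-fold gain.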


    SIMD stands for Single Instruction Multiple Data. This technique was introduced in the MMX processors, where more than one integer could be processed simultaneously. In the Pentium III the technique was given another lift, so that it can now handle more than one floating-point number at a time. Multimedia handling especially will benefit from this, since sound and video programs perform many floating-point operations.

    With the introduction of the Pentium 4, the SIMD instruction set was further improved with 144 new instructions.


    Learn more

    Read about chip sets on the motherboard in module 2d

    Read more about RAM in module 2e

    Read module 5a about expansion cards, where we evaluate the I/O buses from the port side.

    Read module 5b about AGP and module 5c about Firewire.

    Read module 7a about monitors, and 7b on graphics cards.

    Read module 7c about sound cards, and 7d on digital sound and music.


    Copyright (c) 1996-2005 by Michael B. Karbo. www.karbosguide.com.