Intel Core micro architecture

The new architecture used for Intel’s Core 2 Duo explained

Wide Dynamic Execution

Dynamic Execution is Intel's name for a collection of technologies, including out-of-order execution, that the company has used since the P6 micro architecture. As the name implies, out-of-order execution means that instructions arriving at the processor are not necessarily processed in their original order. Modern processors determine the most efficient order in which to process instructions and, when possible, start working early on instructions that might be needed in the future. An important aspect of the inner workings of modern processors is pipelining, a technique that divides the work of fetching, decoding and executing an instruction into several smaller steps. An instruction travels through all the stages of the pipeline and, when everything goes as planned, one completely finished instruction exits the pipeline every clock cycle. By keeping the pipeline filled with instructions, all parts of the processor are kept busy at all times.
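The benefit of pipelining can be illustrated with a small calculation. The sketch below is a simplified model, not a description of any real processor: it assumes every stage takes exactly one clock cycle and that nothing ever stalls. The function names are made up for this example.

```python
def unpipelined_cycles(num_instructions: int, stages: int) -> int:
    """Cycles needed if each instruction must pass through all stages
    before the next one may start (no overlap at all)."""
    return num_instructions * stages


def pipelined_cycles(num_instructions: int, stages: int) -> int:
    """Cycles needed in an ideal pipeline: the first instruction takes
    `stages` cycles, after that one instruction finishes every cycle."""
    if num_instructions == 0:
        return 0
    return stages + num_instructions - 1


if __name__ == "__main__":
    n, depth = 100, 14  # 14 stages, as in the Core micro architecture
    print("without pipelining:", unpipelined_cycles(n, depth))
    print("with pipelining:   ", pipelined_cycles(n, depth))
```

Under these idealised assumptions the cost of filling the pipeline is paid only once, after which the processor completes one instruction per cycle regardless of how many stages there are.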

The Pentium 4 features a very long pipeline with no fewer than 31 stages. The advantage is that each stage does relatively little work and can therefore be executed at very high speed. Because Intel was targeting high clock frequencies during the development of the Pentium 4, it chose to implement this very long pipeline. At first sight this seems like a wise decision, and you would expect a processor with a finely divided pipeline and extremely high clock frequencies to reach a very high level of performance. But there is a downside to a long pipeline: as mentioned, modern processors work ahead on future instructions, and it is not always clear which instructions will be needed. When there is a branch in a computer program, for instance when you have to choose Yes or No in a dialog window, the CPU will start processing the more likely of the two branches. In the unfortunate case of a wrong prediction, all partial results within the pipeline have to be discarded and new instructions have to be fed into the pipeline. With a pipeline as long as that of the Pentium 4, it takes a long time before the first correct instruction reaches the end of the pipeline completely executed. For the Core micro architecture, Intel opted for a pipeline with 14 stages, fewer than half the number of stages in the last generation of Pentium 4 processors and just a few more than the 10-stage pipeline of the Pentium III.
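The trade-off can be made concrete with a rough back-of-the-envelope model. The sketch below assumes that a wrong branch prediction costs roughly one full pipeline refill (stages minus one cycles), which is a simplification, and the share of branch instructions and the misprediction rate are purely illustrative numbers, not measurements of any real processor.

```python
def average_cycles_per_instruction(stages: int,
                                   branch_fraction: float,
                                   mispredict_rate: float) -> float:
    """Average cycles per instruction for a single-issue pipeline.

    Assumption (simplified): every mispredicted branch flushes the
    pipeline and wastes `stages - 1` cycles before useful results
    come out again.
    """
    flush_penalty = stages - 1
    return 1.0 + branch_fraction * mispredict_rate * flush_penalty


if __name__ == "__main__":
    # Illustrative assumptions: 20% of instructions are branches,
    # 5% of those are predicted wrongly.
    for name, depth in [("Pentium III", 10), ("Core", 14), ("Pentium 4", 31)]:
        cpi = average_cycles_per_instruction(depth, 0.20, 0.05)
        print(f"{name:12s} {depth:2d} stages -> {cpi:.2f} cycles per instruction")
```

In this model the penalty grows linearly with pipeline depth, so a deeper pipeline needs a higher clock frequency just to make up for the cycles lost to mispredicted branches.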

The new pipeline within the Core micro architecture is also wider than the one inside the Pentium 4, hence the name Wide Dynamic Execution. With the new architecture, four instructions can be executed in parallel, whereas the pipelines of the Pentium 4 and Pentium M can only execute three instructions in parallel. The system is comparable to a motorway with four lanes instead of three. In the best case, the new pipeline can finish four instructions every clock cycle. But there are more new technologies that increase the IPC (instructions per clock cycle). Intel further improved its branch prediction algorithms, and as a result the chance that the processor chooses the wrong branch while working on future instructions has decreased considerably. Completely new is the concept of macrofusion. Usually all instructions are decoded and executed separately by a processor. The new micro architecture, on the other hand, checks whether it is possible to combine two consecutive instructions and feed them into the pipeline as one new instruction. This effectively means that occasionally two instructions can be finished in the execution time of a single instruction. The Core micro architecture also has an advanced form of micro-op fusion. The so-called micro-ops are tiny instructions with low complexity that are easy for the CPU to process. Usually, incoming complex x86 instructions are translated into one or more simpler micro-ops before they are fed into the pipeline. The new Core architecture tries to combine these micro-ops again and execute them in one go, again to increase the average number of instructions that can be processed per clock cycle.
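The idea behind macrofusion can be sketched with a toy model. The code below is not Intel's decoder logic: it simply assumes that a compare or test instruction immediately followed by a conditional jump may be merged into one operation, and that a pipeline of a given width can start that many operations per clock cycle. The instruction stream and all names are hypothetical.

```python
FUSIBLE_FIRST = {"cmp", "test"}  # assumption: only these can start a fused pair


def is_conditional_jump(mnemonic: str) -> bool:
    """Toy check: treat every j* mnemonic except the unconditional jmp
    as a conditional jump."""
    return mnemonic.startswith("j") and mnemonic != "jmp"


def macrofuse(instructions: list[str]) -> list[str]:
    """Merge each compare/test that is directly followed by a
    conditional jump into a single fused operation."""
    fused = []
    i = 0
    while i < len(instructions):
        current = instructions[i]
        following = instructions[i + 1] if i + 1 < len(instructions) else None
        if (current in FUSIBLE_FIRST and following is not None
                and is_conditional_jump(following)):
            fused.append(f"{current}+{following}")
            i += 2  # both instructions are consumed by the fused operation
        else:
            fused.append(current)
            i += 1
    return fused


def issue_cycles(operations: list[str], width: int) -> int:
    """Best-case number of cycles to start all operations when `width`
    operations can be started per cycle (ceiling division)."""
    return -(-len(operations) // width)


if __name__ == "__main__":
    program = ["mov", "add", "cmp", "jne", "mov", "test", "jz", "jmp"]
    fused = macrofuse(program)
    print(fused)
    print("3-wide, no fusion:", issue_cycles(program, 3), "cycles")
    print("3-wide, fusion:   ", issue_cycles(fused, 3), "cycles")
```

In this toy stream eight instructions shrink to six operations, so the same work occupies fewer pipeline slots. How much real code benefits depends on how often such instruction pairs occur.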


All parts of the pipeline of the new architecture can work on four instructions in parallel.
