Kishan Jainandunsing
(March 2006)
IDF Spring, March 2006, San Francisco – Intel revealed under-the-hood details of its next-generation multi-core micro-architecture, called Intel Core. With Intel Core the company is unifying the multi-core CPU architectures of its mobile, desktop and server processors and extending EM64T, its technology for 64-bit memory addressing, across all platforms. The Intel Core micro-architecture will be at the heart of the company's next-generation multi-core mobile, desktop and server CPU products, which are code-named Merom, Conroe and Woodcrest, respectively. These processors are expected to start shipping from Q3 2006 onwards in a 65nm process with up to 4MB of L2 cache.
The Intel Core micro-architecture achieves higher performance per watt than the company's current generation of processors through five innovations: (1) wide dynamic execution, (2) advanced digital media boost, (3) smart memory access, (4) advanced smart cache and (5) intelligent power management.
Wide dynamic execution allows the processor to issue 4 or more instructions per cycle without increasing data path width. This is done through a combination of micro-op fusion and macro-op fusion. Micro-op fusion is a technique that allows a single micro-op to represent multiple micro-ops that result from the decoding of an x86 instruction. Micro-op fusion was first introduced by the company in its Banias micro-architecture. Macro-op fusion is new in the Intel Core micro-architecture and is a technique that represents common x86 instruction pairs, such as a compare followed by a conditional branch, as a single micro-op.
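The effect of macro-op fusion on the number of micro-ops can be illustrated with a toy model. This is only a sketch and not Intel's decoder: it assumes that every instruction decodes to exactly one micro-op unless fused, and the two mnemonic sets and the function name fuse_macro_ops are hypothetical choices made for the illustration.

```python
# Toy model of macro-op fusion; an illustration, not Intel's decoder.
# Assumption: each instruction is one micro-op unless it is fused.
COMPARES = {"cmp", "test"}            # hypothetical set of fusible compares
BRANCHES = {"je", "jne", "jl", "jg"}  # hypothetical set of conditional branches


def fuse_macro_ops(instructions):
    """Return the micro-op stream after fusing compare + branch pairs."""
    micro_ops = []
    i = 0
    while i < len(instructions):
        current = instructions[i]
        following = instructions[i + 1] if i + 1 < len(instructions) else None
        if current in COMPARES and following in BRANCHES:
            # The adjacent pair travels through the pipeline as one micro-op.
            micro_ops.append(f"{current}+{following}")
            i += 2
        else:
            micro_ops.append(current)
            i += 1
    return micro_ops


stream = ["mov", "add", "cmp", "jne", "mov", "test", "je"]
fused = fuse_macro_ops(stream)
print(len(stream), "instructions ->", len(fused), "micro-ops")
```

In this example the seven instructions occupy only five micro-op slots, which is how fusion raises the effective issue rate without a wider data path.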
Advanced digital media boost increases the performance of the SSE engine for multimedia operations by doubling the width of its execution data path to 128 bits. This allows 128-bit SSE instructions to execute in a single cycle and is a huge gain for multimedia operations, such as audio or video compression and decompression.
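The single-cycle claim follows from simple arithmetic: a 128-bit operand needs two passes through a data path half that width and one pass through a 128-bit path. A minimal sketch, assuming that throughput is limited only by data path width (the function name passes_needed is hypothetical):

```python
def passes_needed(operand_bits, datapath_bits):
    """Passes needed to push one operand through the execution data path."""
    # Integer ceiling division, so partially filled passes still count.
    return -(-operand_bits // datapath_bits)


SSE_OPERAND_BITS = 128

print(passes_needed(SSE_OPERAND_BITS, 64))   # data path before doubling
print(passes_needed(SSE_OPERAND_BITS, 128))  # data path after doubling
```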
Smart memory access results in faster execution by allowing memory loads to execute ahead of preceding memory stores when the processor predicts that the loads do not depend on those stores. In addition, prefetching algorithms are used that detect application data patterns and pre-load data into the L1 and L2 caches. Each L1 cache has 2 data prefetchers and 1 instruction prefetcher. The shared L2 cache has two prefetchers of its own, which are flexibly allocated between cores as needed. The technique is an extension of the existing cache coherency system and hides latency in the memory subsystem.
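Intel did not detail the pattern detection used by the prefetchers. As an illustration only, the sketch below assumes one common scheme, constant-stride detection: if the most recent accesses are equally spaced, the next address in the sequence is predicted so that it can be fetched early. The function name and the min_matches parameter are hypothetical.

```python
def predict_next_address(addresses, min_matches=2):
    """Predict the next address if recent accesses share a constant stride.

    Returns None when no stable, non-zero stride is seen. This is an assumed
    stride-detection scheme, not Intel's documented algorithm.
    """
    if len(addresses) < min_matches + 1:
        return None
    recent = addresses[-(min_matches + 1):]
    strides = [b - a for a, b in zip(recent, recent[1:])]
    if strides[0] != 0 and all(s == strides[0] for s in strides):
        return addresses[-1] + strides[0]
    return None


regular = [0x1000, 0x1040, 0x1080]    # equally spaced accesses
irregular = [0x1000, 0x1040, 0x2000]  # no stable stride

print(predict_next_address(regular))
print(predict_next_address(irregular))
```

For the regular sequence the model predicts the address 0x10C0; for the irregular one it makes no prediction, so nothing is pre-loaded.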
Advanced smart cache results in faster execution by letting the L2 cache be dynamically shared between the cores and by reducing fetches to system memory. Each core can allocate what it needs, up to the full cache size. Data in the L2 cache that is needed by both cores is stored once and shared, rather than replicated for each core, as it would have to be if each core had its own, exclusive L2 cache.
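The saving from not replicating shared data can be shown with a small counting model. This is a sketch under simplifying assumptions: each core's working set is modeled as a set of cache-line labels, capacity limits and replacement policy are ignored, and the names used are hypothetical.

```python
def lines_stored(core0_lines, core1_lines, shared_l2):
    """Count the L2 lines occupied by the working sets of two cores."""
    if shared_l2:
        # One shared cache: a line used by both cores is stored once.
        return len(core0_lines | core1_lines)
    # Two exclusive caches: common lines are replicated in each cache.
    return len(core0_lines) + len(core1_lines)


core0 = {"A", "B", "C", "D"}
core1 = {"C", "D", "E"}  # lines C and D are used by both cores

print(lines_stored(core0, core1, shared_l2=False))
print(lines_stored(core0, core1, shared_l2=True))
```

With exclusive caches the two working sets occupy seven lines in total; with a shared cache they occupy five, leaving more room before data has to be fetched from system memory.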
Intelligent power management results in lower power consumption through ultra fine-grain control and split buses. Ultra fine-grain control switches off parts of the processor's data path, L2 cache and instruction pipeline when they are not in use. By splitting buses into a minimum width and incremental widths, the incremental widths can be switched off whenever the data or instruction size is smaller than the worst-case width that can occur.
In addition to these power management techniques, the company is rolling out further techniques: (1) a power status indicator (PSI), (2) digital thermal sensors (DTS) and (3) a platform environment control interface (PECI). Through the PSI interface the processor can communicate its power consumption needs to the voltage regulation system for optimized voltage regulator, load line and power delivery efficiency. Digital thermal sensors are located in the hot spot areas of the processor. Dedicated logic on the processor continuously reads data from these sensors and calculates the die temperature at any given time. PECI is a multi-drop, single-wire bus that allows efficient platform thermal control. Each processor in the system reports its temperature through this bus to a platform thermal management control chip, which in turn controls the cooling system, such as fans, as well as power shutdown.
Intel claims substantial improvements in performance and power consumption across the server, desktop and mobile platforms with the new Intel Core micro-architecture. For Woodcrest it claims a whopping 80% improvement in performance and a 35% improvement in power consumption compared to a high-end, 2.8GHz, dual-core Xeon processor (Paxville DP). For Conroe it claims improvements of more than 40% in both performance and power consumption compared to a high-end Pentium D processor 950 (Presler). And for Merom it claims a performance improvement of more than 20% and a further reduction of power consumption on top of the already impressive gains in power reduction achieved by the current Yonah, Dothan and Banias architectures.
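These claims can be combined into a rough performance-per-watt figure. The sketch below rests on an assumption that Intel's announcement does not spell out: that a power consumption improvement of x% means the new processor draws x% less power than the reference processor on the same workload. The function name is hypothetical, and the Conroe result is a lower bound because the claim is "more than 40%".

```python
def perf_per_watt_gain(perf_gain, power_reduction):
    """Ratio of new to old performance per watt.

    perf_gain and power_reduction are fractions, e.g. 0.80 for 80%.
    Assumes power_reduction is the fraction by which power draw falls.
    """
    return (1.0 + perf_gain) / (1.0 - power_reduction)


woodcrest = perf_per_watt_gain(0.80, 0.35)  # versus Xeon (Paxville DP)
conroe = perf_per_watt_gain(0.40, 0.40)     # versus Pentium D 950 (Presler)

print(f"Woodcrest: {woodcrest:.2f}x performance per watt")
print(f"Conroe: at least {conroe:.2f}x performance per watt")
```

Under this reading, Woodcrest would deliver roughly 2.77 times the performance per watt of the reference Xeon, and Conroe at least about 2.33 times that of the reference Pentium D.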

