

# Intel Core i7 Sandy Bridge-E 3960X LGA-2011



Course in Architecture and Design of Computer Systems and Services

# Roadmap

- Overview
- What's New
- Inside the Architecture
- Intel's Technologies
- Performance

# Overview

### 45nm Process Technology

**Penryn**  
Intel® Core™  
Microarchitecture

**Nehalem**  
Intel® Core™  
Microarchitecture

### 32nm Process Technology

**Westmere**  
Intel® Core™  
Microarchitecture  
(Nehalem)

**Sandy Bridge**  
Intel® Core™  
Microarchitecture

### 22nm Process Technology

**Ivy Bridge**  
Intel® Core™  
Microarchitecture  
(Sandy Bridge)  
*FUTURE PLATFORM*

Tick-tock cadence, in order: Penryn (TICK), Nehalem (TOCK), Westmere (TICK), Sandy Bridge (TOCK), Ivy Bridge (TICK)

First High End Desktop Platform  
on the Sandy Bridge Microarchitecture

Sandy Bridge-E



# Intel® Core™ i7-3960X processor Extreme Edition

## *Summary of Product Features*



- 6 Cores, 12 Threads
  - Intel® Turbo Boost Technology 2.0
  - Intel® Hyper-Threading Technology
  - Supports LGA 2011 socket Intel® X79 Express Chipset-based motherboards
  - Up to 15 MB Intel® Smart Cache
  - Integrated Memory Controller
    - 4 channels of DDR3 1600 MHz, 1DPC
  - Intel® AVX and AES
  - 40 PCI Express<sup>\*1</sup> Lanes
  - SSE4.1 & SSE4.2 Instructions

<sup>1</sup> Intel believes that some PCIe devices may be able to achieve the 8GT/s PCIe transfer rate on the X79 Express Chipset based platform.

\*Other names and brands may be claimed as the property of others.


Copyright© 2011 Intel Corporation. All rights reserved. Under embargo until 12:01am PT November 14, 2011




# Other Specs

- 2.27 billion transistors
- 3.30 GHz core clock speed
- 3.90 GHz Turbo Boost core clock speed
- 6x64 KB L1 cache
- 6x256 KB L2 cache
- 130 W TDP





<sup>1</sup>Theoretical maximum bandwidth

<sup>2</sup>All SATA ports capable of 3 Gb/s. 2 ports capable of 6 Gb/s.

## Intel® X79 Express Chipset Block Diagram

# What's New

# Core Block Diagram



# Front End Microarchitecture



## Instruction Decode in Processor Core

- 32 Kilo-byte 8-way Associative ICache
- 4 Decoders, up to 4 instructions / cycle
- Micro-Fusion
  - Bundle multiple instruction events into a single “Uop”
- Macro-Fusion
  - Fuse instruction pairs into a complex “Uop”
- Decode Pipeline supports 16 bytes per cycle

# New: Decoded Uop Cache



## Add a Decoded Uop Cache

- An L0 Instruction Cache for Uops instead of Instruction Bytes
  - ~80% hit rate for most applications
- Higher Instruction Bandwidth and Lower Latency
  - Decoded Uop Cache can deliver 32 bytes/cycle
    - More cycles sustaining 4 instructions/cycle
  - Able to 'stitch' across taken branches in the control flow

# New Branch Prediction Unit



## Do a 'Ground Up' Rebuild of Branch Predictor

- Twice as many targets
- Much more effective storage for history
- Much longer history for data dependent behaviors

## Sandy Bridge Out-of-Order (OOO) Cluster



- Method: Physical Reg File (PRF) instead of centralized Retirement Register File
  - Single copy of each piece of data
  - No movement after calculation
- Allow significant increase in buffer sizes
  - Dataflow window ~33% larger

PRF is a “Cool” feature  
better than linear  
performance/power

Key enabler for Intel®  
Advanced Vector  
Extensions (Intel® AVX)

|                        | Nehalem | Sandy Bridge |
|------------------------|---------|--------------|
| Load Buffers           | 48      | 64           |
| Store Buffers          | 32      | 36           |
| RS - Scheduler Entries | 36      | 54           |
| PRF integer            | N/A     | 160          |
| PRF floating-point     | N/A     | 144          |
| ROB Entries            | 128     | 168          |

# Execution Cluster – A Look Inside

## Scheduler sees matrix:

- 3 “ports” to 3 “stacks” of execution units
  - General Purpose Integer
  - SIMD (Vector) Integer
  - SIMD Floating Point
- The challenge is to double the output of one of these stacks in a manner that is invisible to the others



# Execution Cluster

## Solution:

- Repurpose existing datapaths to *dual-use*
- SIMD integer and legacy SIMD FP use legacy stack style
- Intel® AVX utilizes *both* 128-bit execution stacks



"Cool" Implementation of Intel AVX  
256-bit Multiply + 256-bit ADD + 256-bit Load per clock...  
Double your FLOPs with great energy efficiency
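
As a rough worked example of the "double your FLOPs" claim, assume every core issues one 256-bit multiply and one 256-bit add per clock on double-precision data (four 64-bit lanes each), at the 3.30 GHz base clock and 6 cores quoted elsewhere in these slides. This is an idealized upper bound under those assumptions, not a measured figure:

```latex
\begin{aligned}
\text{FLOPs/cycle/core (AVX)} &= 4_{\text{mul}} + 4_{\text{add}} = 8 \\
\text{Peak}_{\text{AVX}} &= 6 \times 3.30\,\text{GHz} \times 8 = 158.4\ \text{GFLOP/s} \\
\text{Peak}_{\text{SSE}} &= 6 \times 3.30\,\text{GHz} \times (2_{\text{mul}} + 2_{\text{add}}) = 79.2\ \text{GFLOP/s}
\end{aligned}
```

The ratio of the two peaks is exactly 2, which is where the doubling comes from; Turbo Boost clocks and memory bandwidth limits are ignored here.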

# Memory Cluster



- Memory Unit can service two memory requests per cycle
  - 16 bytes load and 16 bytes store per cycle

Challenge to the Memory Cluster Architects

Maintain the historic bytes/flop ratio of SSE for Intel® AVX

...  
...and do so in a "cool" manner

# Memory Cluster in Sandy Bridge



- Solution : Dual-Use the existing connections
  - Make load/store pipes symmetric
- Memory Unit services **three** data accesses per cycle
  - 2 read requests of up to 16 bytes AND 1 store of up to 16 bytes
  - Internal sequencer deals with queued requests

The second load port is one of the highest-performance features.  
It is required to keep the Intel® Advanced Vector Extensions (Intel® AVX) instruction set fed.  
Linear power/performance means it's "Cool".



# Intel AVX

- New 256-bit instruction set extension to Intel Streaming SIMD Extensions (Intel SSE)
- Released as part of the Intel microarchitecture code name Sandy Bridge
- Can give great computation power to boost applications

# Applications

- Suitable for floating point-intensive calculations in multimedia, scientific and financial applications
- Increases parallelism and throughput in floating point SIMD calculations
- Reduces register load due to the non-destructive instructions.

# SSE Vs AVX



# Demo

# Original C Implementation

```
for (int j=0 ; j<firHalfLength; j++) // firHalfLength is 1023
{
    dFirCoefs = pFIRBuf[j];
    acc1 += pDllBuf[lFirIndex]*dFirCoefs; //acc1 is accumulator for Index
    acc2 += pDllBuf[lFirIndexRev]*dFirCoefs; //acc2 is accumulator for IndexRev
    lFirIndex =(lFirIndex-1)&lMask; //dec backward index (modulo operation)
    lFirIndexRev = (lFirIndexRev+1)&lMask;
}
```
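
The loop above is a fragment that relies on variables declared elsewhere. A self-contained scalar version might look like the sketch below. The function name `fir_scalar`, the `FirAcc` struct and the explicit `buf_len` parameter are assumptions introduced for illustration; the slides only show the loop body.

```c
#include <assert.h>
#include <stddef.h>

/* One accumulator per walking direction (names follow the slide). */
typedef struct {
    double acc1; /* sum over the backward-walking index */
    double acc2; /* sum over the forward-walking index  */
} FirAcc;

/*
 * Scalar reference for the loop above.
 * buf_len must be a power of two so that (index & lMask) wraps like a modulo.
 */
FirAcc fir_scalar(const double *pFIRBuf, size_t firHalfLength,
                  const double *pDllBuf, size_t buf_len,
                  size_t lFirIndex, size_t lFirIndexRev)
{
    assert(buf_len != 0 && (buf_len & (buf_len - 1)) == 0);
    const size_t lMask = buf_len - 1;
    FirAcc out = {0.0, 0.0};

    for (size_t j = 0; j < firHalfLength; j++) {
        const double dFirCoefs = pFIRBuf[j];
        out.acc1 += pDllBuf[lFirIndex] * dFirCoefs;
        out.acc2 += pDllBuf[lFirIndexRev] * dFirCoefs;
        lFirIndex = (lFirIndex - 1) & lMask;       /* step backward, wrapping */
        lFirIndexRev = (lFirIndexRev + 1) & lMask; /* step forward, wrapping  */
    }
    return out;
}
```

Using unsigned indices means that stepping below zero wraps to `lMask` after the mask is applied, which matches the modulo behaviour noted in the original comment.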

# Intel SSE 128-bit Implementation

```
__m128d DllVal, FIRCoef, mulVal;

for (int i = 0; i < firHalfLength; i += 2) //Operate on 2 elements at a time
{
    FIRCoef = _mm_load_pd(pFIRBuf+i);

    //acc1
    DllVal = _mm_load_pd(pDllBuf+lFIRIndexRev);
    mulVal = _mm_mul_pd(FIRCoef, DllVal);
    acc1 = _mm_add_pd(acc1, mulVal);

    //acc2
    DllVal = _mm_load_pd(pDllBuf+lFIRIndex);
    DllVal = _mm_shuffle_pd(DllVal, DllVal, 0x1);
    mulVal = _mm_mul_pd(FIRCoef, DllVal);
    acc2 = _mm_add_pd(acc2, mulVal);

    lFIRIndex -= 2;
    lFIRIndex = (lFIRIndex & lMask);
    lFIRIndexRev += 2;
    lFIRIndexRev = (lFIRIndexRev & lMask);
}
```



# Intel AVX Implementation

```
__m256d DllVal, FIRCoef, mulVal;
__m128d tmph, tmpl, tmpsh, tmpsl;

for (int i = 0; i < firHalfLength; i += 4) //Operate on 4 elements at a time
{
    FIRCoef = _mm256_load_pd(pFIRBuf+i);

    //acc1
    DllVal = _mm256_load_pd(pDllBuf+lFIRIndexRev);
    mulVal = _mm256_mul_pd(FIRCoef, DllVal);
    acc1 = _mm256_add_pd(acc1, mulVal);

    //acc2
    DllVal = _mm256_load_pd(pDllBuf+lFIRIndex);
    DllVal = _mm256_permute2f128_pd (DllVal,DllVal ,0x1); // Cross lane shuffle
    DllVal = _mm256_permute_pd(DllVal, 0x5);
    mulVal = _mm256_mul_pd(FIRCoef, DllVal);
    acc2 = _mm256_add_pd(acc2, mulVal);

    lFIRIndex -= 4;
    lFIRIndex = (lFIRIndex & lMask);
    lFIRIndexRev += 4;
    lFIRIndexRev = (lFIRIndexRev & lMask);
}
```
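
The two permutes in the `acc2` path together reverse the four doubles in the register: the cross-lane shuffle swaps the 128-bit halves, and the in-lane permute then swaps the two elements inside each half. The sketch below is a plain scalar model of that behaviour, not the intrinsics themselves; `Vec4d` and the `model_*` names are hypothetical helpers introduced for illustration.

```c
#include <assert.h>

/* Four doubles standing in for one 256-bit register (element 0 is lowest). */
typedef struct { double e[4]; } Vec4d;

/* Scalar model of _mm256_permute2f128_pd(a, a, 0x1): swap the 128-bit halves. */
Vec4d model_swap_lanes(Vec4d a)
{
    Vec4d r = {{a.e[2], a.e[3], a.e[0], a.e[1]}};
    return r;
}

/* Scalar model of _mm256_permute_pd(a, imm): each result element picks one
 * of the two elements of its own 128-bit lane, using one immediate bit. */
Vec4d model_permute_pd(Vec4d a, unsigned imm)
{
    assert(imm < 16u);
    Vec4d r;
    r.e[0] = a.e[(imm >> 0) & 1u];       /* low lane: e0 or e1  */
    r.e[1] = a.e[(imm >> 1) & 1u];
    r.e[2] = a.e[2 + ((imm >> 2) & 1u)]; /* high lane: e2 or e3 */
    r.e[3] = a.e[2 + ((imm >> 3) & 1u)];
    return r;
}

/* The combination used in the AVX loop: full reversal of the four doubles. */
Vec4d model_reverse(Vec4d a)
{
    return model_permute_pd(model_swap_lanes(a), 0x5);
}
```

The SSE version needs only one shuffle because a 128-bit register holds two doubles; the extra cross-lane step is the price of AVX keeping most permutes inside each 128-bit lane.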



# Execution Speed Comparison



# Key Intel® AVX Features

| KEY FEATURES | BENEFITS |
|--------------|----------|
| Wider vectors: increased from 128 to 256 bits; two 128-bit load ports | Up to 2x peak floating point operations per second (FLOPs) output with good power efficiency |
| Enhanced data rearrangement: use the new 256-bit primitives to broadcast, mask loads and permute data | Organize, access and pull only necessary data more quickly and efficiently |
| Three and four operands: non-destructive syntax for both 128-bit and 256-bit Intel AVX instructions | Fewer register copies, better register use for both vector and scalar code |
| Flexible unaligned memory access support | More opportunities to fuse load and compute operations |
| Extensible new opcode (VEX) | Code size reduction |

***Intel® AVX is a general purpose architecture.***

# Inside the Architecture

- Basic Execution Environment
- Protection
- Multiple-Processor Management
- Memory Cache Control
- Power and Thermal Management

# Basic Execution Environment

# Modes of Operation



# Resources

- Basic Program Execution Registers
- Address Space
- FPU Registers
- MMX Registers
- XMM Registers
- Stack

# Additional Resources

- I/O ports
- Control Registers
- Memory Management Registers
- Debug Registers
- Memory Type Range Registers (MTRRs)
- Machine Specific Registers (MSRs)
- Machine Check Registers
- Performance Monitoring Counters

# Protection

- Operates at both the segment level and the page level
- Four privilege levels for segments
- Two privilege levels for pages
- Any violation results in an exception
- No performance penalty

# Protection Checks

- Limit Checks
- Type Checks
- Privilege Level Checks
- Restriction of Addressable Domain
- Restriction of Procedure Entry-Points
- Restriction of Instruction Set
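
Two of the checks listed above can be sketched in a few lines. The example assumes the simplest cases only: loading a data-segment selector (allowed when the descriptor's DPL is numerically at least as large as both CPL and RPL) and a byte-granular, expand-up segment limit. Conforming code segments, gates and expand-down segments follow different rules and are not modelled; the function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

static unsigned max_u(unsigned a, unsigned b) { return a > b ? a : b; }

/* Privilege level check for a data-segment access.
 * Levels run from 0 (most privileged) to 3 (least privileged). */
bool data_segment_access_ok(unsigned cpl, unsigned rpl, unsigned dpl)
{
    assert(cpl <= 3 && rpl <= 3 && dpl <= 3);
    return dpl >= max_u(cpl, rpl);
}

/* Limit check for an expand-up segment: the last byte touched,
 * offset + size - 1, must not exceed the segment limit. */
bool limit_check_ok(unsigned long offset, unsigned long size,
                    unsigned long limit)
{
    assert(size >= 1);
    return offset <= limit && (size - 1) <= (limit - offset);
}
```

In hardware a failed check raises an exception rather than returning a value; the boolean result here only stands in for that outcome.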

# Multi-Processor Management

# Goals

- Maintain system memory coherency
- Maintain cache consistency
- Allow predictable ordering of writes to memory
- Distribute interrupt handling among a group of processors.
- Increase system performance by exploiting the multi-threaded and multiprocess nature of contemporary operating systems and applications.

# How

- Bus locking and/or cache coherency management
- Serializing instructions
- An advanced programmable interrupt controller (APIC)
- Intel Hyper-Threading Technology
- A second-level cache (level 2, L2)
- A third-level cache (level 3, L3)

# Mechanisms for Locked Atomic Operations

- Guaranteed atomic operations
- Bus locking, using the LOCK# signal and the LOCK instruction prefix
- Cache coherency protocols that ensure that atomic operations can be carried out on cached data structures (cache lock)
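
From software, these mechanisms are usually reached through a language's atomic operations rather than by writing the LOCK prefix by hand. The sketch below is a minimal spinlock built on C11 `<stdatomic.h>`. The `SpinLock` type and function names are hypothetical, and the claim that a compiler lowers the test-and-set to a locked exchange on x86 is an assumption about typical code generation, not something the slides state.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

typedef struct { atomic_flag held; } SpinLock;

#define SPINLOCK_INIT { ATOMIC_FLAG_INIT }

/* Atomic test-and-set returns the previous value:
 * false means the flag was clear and this caller now owns the lock. */
bool spin_try_lock(SpinLock *l)
{
    assert(l);
    return !atomic_flag_test_and_set_explicit(&l->held, memory_order_acquire);
}

void spin_lock(SpinLock *l)
{
    while (!spin_try_lock(l)) {
        /* busy-wait until the owner releases the lock */
    }
}

void spin_unlock(SpinLock *l)
{
    assert(l);
    atomic_flag_clear_explicit(&l->held, memory_order_release);
}

/* Example critical section: increment a shared counter under the lock. */
long locked_increment(SpinLock *l, long *counter)
{
    assert(counter);
    spin_lock(l);
    long v = ++*counter;
    spin_unlock(l);
    return v;
}
```

When the flag sits in cacheable write-back memory, the atomic operation can typically be satisfied by a cache lock instead of a full bus lock, which is the third mechanism in the list above.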

# Automatic Locking

- When executing an XCHG instruction that references memory.
- When setting the B (busy) flag of a TSS descriptor
- When updating page-directory and page-table entries
- Acknowledging interrupts

# Serializing Instructions

- Force the processor to complete all modifications to flags, registers, and memory by previous instructions and to drain all buffered writes to memory before the next instruction is fetched and executed
- Privileged serializing instructions: INVD, INVEPT, INVLPG, INVPID, LGDT, LIDT, LLDT, LTR, MOV (to control register, with the exception of MOV CR8), MOV (to debug register), WBINVD, and WRMSR.
- Non-privileged serializing instructions: CPUID, IRET, and RSM.

# Multiprocessor Initialization

- Supports controlled booting of multiple processors without requiring dedicated system hardware.
- Allows hardware to initiate the booting of a system without the need for a dedicated signal or a predefined boot processor.
- Allows all IA-32 processors to be booted in the same manner, including those supporting Intel Hyper-Threading Technology.



## Bootstrap processor (BSP)

- The BSP flag is set in the IA32\_APIC\_BASE MSR of the BSP.
- The BSP then begins executing the operating-system initialization code.

## Application Processors (APs)

- The BSP flag is cleared for all other processors, which become APs.
- Each AP waits for a startup signal (a SIPI message) from the BSP. Upon receiving a SIPI message, an AP executes the BIOS AP configuration code, which ends with the AP being placed in the halt state.
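
A minimal sketch of how software could tell a BSP from an AP once it has the IA32\_APIC\_BASE value in hand. Reading the MSR itself needs the privileged RDMSR instruction, so the functions below take the 64-bit value as a parameter. The bit positions (BSP flag in bit 8, APIC global enable in bit 11, base address from bit 12 upward) and the MSR address 1BH are given as recalled from Intel's architecture manuals and should be checked against them; the function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define IA32_APIC_BASE_MSR  0x1Bu          /* MSR address (assumed, see lead-in) */
#define APIC_BASE_BSP       (1ull << 8)    /* set only on the bootstrap processor */
#define APIC_BASE_ENABLE    (1ull << 11)   /* APIC global enable */

bool apic_base_is_bsp(uint64_t msr_value)
{
    return (msr_value & APIC_BASE_BSP) != 0;
}

bool apic_base_enabled(uint64_t msr_value)
{
    return (msr_value & APIC_BASE_ENABLE) != 0;
}

/* Physical base of the APIC registers: bits 12 .. maxphyaddr-1. */
uint64_t apic_base_address(uint64_t msr_value, unsigned maxphyaddr)
{
    assert(maxphyaddr > 12 && maxphyaddr < 64);
    uint64_t mask = ((1ull << maxphyaddr) - 1) & ~0xFFFull;
    return msr_value & mask;
}
```

With these definitions, a value of `0xFEE00900` decodes as an enabled APIC on the BSP, while `0xFEE00800` decodes as an enabled APIC on an AP.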

# Management of Idle and Blocked Conditions

- HLT instruction
- PAUSE instruction
- MONITOR/MWAIT instructions

# Memory Cache Control

# Methods of Caching

- Strong Uncacheable (UC)
- Write Combining (WC)
- Uncacheable (UC-)
- Write Through (WT)
- Write Back (WB)
- Write Protected (WP)
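
The six memory types can be captured in a small enum. The numeric encodings below are the ones used by the page attribute table (PAT) as recalled from Intel's manuals (UC- is selectable only through the PAT); treat them, and the helper names, as assumptions to verify rather than as part of the slides.

```c
#include <assert.h>
#include <stdbool.h>

typedef enum {
    MEM_UC       = 0x00, /* Strong Uncacheable */
    MEM_WC       = 0x01, /* Write Combining    */
    MEM_WT       = 0x04, /* Write Through      */
    MEM_WP       = 0x05, /* Write Protected    */
    MEM_WB       = 0x06, /* Write Back         */
    MEM_UC_MINUS = 0x07  /* Uncacheable (UC-)  */
} MemType;

const char *mem_type_name(MemType t)
{
    switch (t) {
    case MEM_UC:       return "UC";
    case MEM_WC:       return "WC";
    case MEM_WT:       return "WT";
    case MEM_WP:       return "WP";
    case MEM_WB:       return "WB";
    case MEM_UC_MINUS: return "UC-";
    }
    assert(!"unknown memory type encoding");
    return "?";
}

/* Reads may be served from the cache for WT, WP and WB memory. */
bool mem_type_caches_reads(MemType t)
{
    return t == MEM_WT || t == MEM_WP || t == MEM_WB;
}

/* Only WB lets a write stay in the cache without going to memory. */
bool mem_type_is_write_back(MemType t)
{
    return t == MEM_WB;
}
```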

# Cache Control Protocol

| Cache Line State                            | M (Modified)                   | E (Exclusive)                  | S (Shared)                                                    | I (Invalid)                      |
|---------------------------------------------|--------------------------------|--------------------------------|---------------------------------------------------------------|----------------------------------|
| This cache line is valid?                   | Yes                            | Yes                            | Yes                                                           | No                               |
| The memory copy is...                       | Out of date                    | Valid                          | Valid                                                         | —                                |
| Copies exist in caches of other processors? | No                             | No                             | Maybe                                                         | Maybe                            |
| A write to this line ...                    | Does not go to the system bus. | Does not go to the system bus. | Causes the processor to gain exclusive ownership of the line. | Goes directly to the system bus. |

# MESI Protocol

- Upon loading:
  - A line is marked “E”
  - Subsequent read OK
  - Write marks “M”
- If another processor reads an “M” line
  - Write it back
  - Mark it “S”
- Write to an “S” line: send “I” to all other caches, mark “M”
- Read/write to an “I” line misses
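
The transitions above can be sketched as a small state machine for a single cache line. This is an illustration under stated assumptions, not the processor's actual implementation: the event names and `mesi_next` are hypothetical, a read miss is filled as “S” when another cache already holds the line, and a write miss is assumed to allocate the line and end in “M”.

```c
#include <assert.h>
#include <stdbool.h>

typedef enum { MESI_INVALID, MESI_SHARED, MESI_EXCLUSIVE, MESI_MODIFIED } MesiState;

typedef enum {
    EV_LOCAL_READ,   /* this core reads the line          */
    EV_LOCAL_WRITE,  /* this core writes the line         */
    EV_REMOTE_READ,  /* another core reads the line       */
    EV_REMOTE_WRITE  /* another core invalidates the line */
} MesiEvent;

/* Returns the next state; *writeback is set when dirty data must be
 * written back before the transition completes. */
MesiState mesi_next(MesiState s, MesiEvent ev, bool others_have_copy,
                    bool *writeback)
{
    assert(writeback);
    *writeback = false;
    switch (ev) {
    case EV_LOCAL_READ:
        if (s == MESI_INVALID)  /* miss: load the line */
            return others_have_copy ? MESI_SHARED : MESI_EXCLUSIVE;
        return s;               /* hit: state unchanged */
    case EV_LOCAL_WRITE:
        /* From S the other copies are invalidated first; from I the
         * line is fetched for ownership (write-allocate assumption). */
        return MESI_MODIFIED;
    case EV_REMOTE_READ:
        if (s == MESI_MODIFIED)
            *writeback = true;
        return s == MESI_INVALID ? MESI_INVALID : MESI_SHARED;
    case EV_REMOTE_WRITE:
        if (s == MESI_MODIFIED)
            *writeback = true;
        return MESI_INVALID;
    }
    return s;
}
```

Walking one line through load, write, a remote read and a remote write visits E, M, S (with a write-back) and finally I, matching the bullet sequence above.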

# Power and Thermal Management

# ACPI

- Open industry standard
- Provides methods for low-level control of hardware
- Defines performance states that system software uses to manage processor power consumption
- Needs compatible hardware

# ACPI System State



| State      | Description                                                                               |
|------------|-------------------------------------------------------------------------------------------|
| G0/S0      | Full On                                                                                   |
| G1/S3-Cold | Suspend-to-RAM (STR). Context saved to memory (S3-Hot is not supported by the processor). |
| G1/S4      | Suspend-to-Disk (STD). All power lost (except wakeup on PCH).                             |
| G2/S5      | Soft off. All power lost (except wakeup on PCH). Total reboot.                            |
| G3         | Mechanical off. All power removed from system.                                            |

# Core C-State

| Core C-State | Global Clock | PLL | L1/L2 Cache    | Core VCC          | Context        |
|--------------|--------------|-----|----------------|-------------------|----------------|
| CC0          | Running      | On  | Coherent       | Active            | Maintained     |
| CC1          | Stopped      | On  | Coherent       | Active            | Maintained     |
| CC1E         | Stopped      | On  | Coherent       | Request LFM       | Maintained     |
| CC3          | Stopped      | On  | Flushed to LLC | Request Retention | Maintained     |
| CC6          | Stopped      | On  | Flushed to LLC | Power Gate        | Flushed to LLC |
| CC7          | Stopped      | Off | Flushed to LLC | Power Gate        | Flushed to LLC |

# Threads and Core C-State



# Package C-State

| Package C-State        | Core States             | Limiting Factors                                                                                                                                                                                                                                  | Retention and PLL-Off                | LLC Fully Flushed | Notes <sup>1</sup> |
|------------------------|-------------------------|---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|--------------------------------------|-------------------|--------------------|
| PC0 – Active           | CC0                     | N/A                                                                                                                                                                                                                                               | No                                   | No                | 2                  |
| PC2 – Snoopable Idle   | CC3-CC7                 | PCIe/PCH and Remote Socket Snoops<br>PCIe/PCH and Remote Socket Accesses<br>Interrupt response time requirement<br>DMI Sidebands<br>Configuration Constraints | VccMin<br>Freq = MinFreq<br>PLL = ON | No                | 2                  |
| PC3 – Light Retention  | at least one Core in C3 | Core C-state<br>Snoop Response Time<br>Interrupt Response Time<br>Non Snoop Response Time | Vcc = retention<br>PLL = OFF         | No                | 2,3,4              |
| PC6 – Deeper Retention | CC6-CC7                 | LLC ways open<br>Snoop Response Time<br>Non Snoop Response Time<br>Interrupt Response Time | Vcc = retention<br>PLL = OFF         | No                | 2,3,4              |

# Package C-State Entry/Exit



# State Combinations

| Global (G) State | Sleep (S) State | Processor Core (C) State | Processor State | System Clocks   | Description     |
|------------------|-----------------|--------------------------|-----------------|-----------------|-----------------|
| G0               | S0              | C0                       | Full On         | On              | Full On         |
| G0               | S0              | C1/C1E                   | Auto-Halt       | On              | Auto-Halt       |
| G0               | S0              | C3                       | Deep Sleep      | On              | Deep Sleep      |
| G0               | S0              | C6/C7                    | Deep Power Down | On              | Deep Power Down |
| G1               | S3              | Power off                | —               | Off, except RTC | Suspend to RAM  |
| G1               | S4              | Power off                | —               | Off, except RTC | Suspend to Disk |
| G2               | S5              | Power off                | —               | Off, except RTC | Soft Off        |
| G3               | NA              | Power off                | —               | Power off       | Hard off        |

# Thermal Monitoring and Protection

- Catastrophic shutdown detector
- Automatic and adaptive thermal monitoring
- Software controlled clock modulation
- On-die digital thermal sensor and interrupt
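
The on-die digital thermal sensor reports how far the core is below its maximum junction temperature (TjMax), not an absolute temperature. A minimal decoding sketch follows, assuming the layout recalled from Intel's manuals for the IA32\_THERM\_STATUS MSR (digital readout in bits 22:16, reading-valid flag in bit 31). The layout should be verified before use, the MSR value must be obtained separately with a privileged read, and `tjmax_c` is a caller-supplied assumption.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define THERM_STATUS_VALID  (1ull << 31)  /* reading valid flag (assumed layout) */

/* Degrees C below TjMax, taken from bits 22:16 of the status value. */
static unsigned therm_status_readout(uint64_t therm_status)
{
    return (unsigned)((therm_status >> 16) & 0x7Fu);
}

/* Converts a raw status value to degrees C.
 * Returns false when the sensor reading is not marked valid. */
bool dts_temperature_c(uint64_t therm_status, unsigned tjmax_c, int *temp_c)
{
    assert(temp_c);
    if ((therm_status & THERM_STATUS_VALID) == 0)
        return false;
    *temp_c = (int)tjmax_c - (int)therm_status_readout(therm_status);
    return true;
}
```

For example, a valid readout of 25 with a hypothetical TjMax of 100 °C corresponds to a core temperature of 75 °C.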



# Intel's Technologies

- Advanced Smart Cache
- Smart Memory Access
- Turbo Boost 2.0
- Enhanced SpeedStep
- Hyper-Threading

# Advanced Smart Cache

## Efficient Data Sharing

Advanced Smart Cache



Independent Cache




*2X L2 to L1 Bandwidth*



# Intel Advanced Smart Cache

- Higher Cache Hit Rate
- Reduced BUS traffic
- Lower Latency to Data

# Intel Smart Memory Access

- Goals:
  - Improves system performance
  - Hides latency of memory accesses
- How:
  - Memory Disambiguation
  - IP-based prefetcher

# Smart Memory Access

*With Intel's New Memory Disambiguation*



Loads can decouple from Stores

Load4 can get its data FIRST

# Prefetchers and Multi-Core



# Intel® Hyper-Threading Technology



## What is it?

- Intel® Hyper-Threading Technology enables each processor core to run two tasks at the same time
- Two thread engines per core, enabling 4-way processing in dual core systems and 8-way processing in quad core systems
- Available with the new Intel® Core™ family of processors

## Benefits for consumers

- More threads and smart multitasking equals better performance
- Faster response time = less waiting



# Features

- Duplicated for each logical processor
- Shared by logical processors in a physical processor
- Shared or duplicated, depending on the implementation





## MainConcept 1.6.1

MPEG-2 to H.264

| Core i7-870         | Time |
|---------------------|------|
| Hyper-Threading On  | 1:26 |
| Hyper-Threading Off | 1:48 |



## AVG Anti-Virus 8.5

Virus Scan of 334MB Compressed Files

| Core i7-870         | Time |
|---------------------|------|
| Hyper-Threading On  | 2:57 |
| Hyper-Threading Off | 3:52 |
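
Assuming the times above are in minutes:seconds, the speedup from enabling Hyper-Threading on the Core i7-870 in these two benchmarks works out as follows:

```latex
\begin{aligned}
\text{MainConcept:}\quad & \frac{1{:}48}{1{:}26} = \frac{108\ \text{s}}{86\ \text{s}} \approx 1.26 \\
\text{AVG Anti-Virus:}\quad & \frac{3{:}52}{2{:}57} = \frac{232\ \text{s}}{177\ \text{s}} \approx 1.31
\end{aligned}
```

Both results are well below the 2x that a doubling of logical processors might suggest, which is expected because the two threads on a core share its execution resources.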



# Intel® Turbo Boost Technology 2.0

*Dynamically Delivering Optimal Performance*



# Intel® Turbo Boost Technology<sup>1</sup> 2.0



Four-Core Turbo

Dual-Core Turbo

Single-Core Turbo

## *Efficient.*

- Adapts by varying turbo frequency to conserve energy depending upon the type of instructions

## *Dynamic.*

- Boosts power level to achieve performance gains for high intensity "dynamic" workloads

## *Intelligent.*

- Power averaging algorithm manages power and thermal headroom to optimize performance

Intel® Turbo Boost Technology 2.0 delivers intelligent and energy efficient performance on demand
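
The slides do not spell out the power averaging algorithm. One way to picture it is an exponentially weighted moving average of measured power that is compared against a sustained limit, so that short bursts above the limit are tolerated while the average stays below it. The sketch below assumes exactly that; the `PowerBudget` type, the smoothing factor `alpha` and the function names are all hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

typedef struct {
    double avg_w;    /* running average power, in watts           */
    double limit_w;  /* sustained power limit, e.g. the 130 W TDP */
    double alpha;    /* smoothing factor in (0, 1]                */
} PowerBudget;

void power_budget_init(PowerBudget *b, double limit_w, double alpha)
{
    assert(b && limit_w > 0.0 && alpha > 0.0 && alpha <= 1.0);
    b->avg_w = 0.0;
    b->limit_w = limit_w;
    b->alpha = alpha;
}

/* Fold in one power sample and return the remaining headroom in watts
 * (negative once the average exceeds the limit). */
double power_budget_update(PowerBudget *b, double sample_w)
{
    assert(b);
    b->avg_w += b->alpha * (sample_w - b->avg_w);
    return b->limit_w - b->avg_w;
}

/* Turbo may continue while the average stays under the limit. */
bool turbo_allowed(const PowerBudget *b)
{
    assert(b);
    return b->avg_w < b->limit_w;
}
```

With a 130 W limit and `alpha` of 0.5, a single 200 W sample after a 100 W sample leaves the average at 125 W and turbo still allowed; a second 200 W sample pushes the average to 162.5 W and ends the burst.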

# Graphics Dynamic Frequency and Power Sharing



CPU Turbo bins & Graphics Dynamic Frequency (with Dynamic Range)

Base Frequencies

Idle mode

*Note 1: Power Sharing shown here with Single Core Turbo is only for illustrative purposes. Power Sharing can also occur when other cores are active as long as thermal headroom exists.*

*Note 2: Sandy Bridge is a monolithic die with integrated graphics. Graphics Core shown above as separate from CPU Cores is only for illustrative purposes.*

- Intel® HD Graphics with Dynamic Frequency delivers graphics performance boost to graphics intensive applications
- Power sharing algorithm works in concert with Intel® Turbo Boost Technology 2.0 to deliver performance when and where needed

Performance boost to graphics intensive applications when power and thermal headroom exist

# Next Generation Intel® Turbo Boost Technology

| Client | Merom/Penryn (Mobile only) | Nehalem/Westmere: Clarksfield, Lynnfield/Clarkdale | Nehalem/Westmere: Arrandale | Sandy Bridge |
|--------|----------------------------|----------------------------------------------------|-----------------------------|--------------|
| Key New Capabilities | 1 turbo bin when other core is asleep | Turbo controlled within power limit<br>Multi-core turbo<br>More turbo if cores are asleep | Graphics Dynamic Frequency<br>Driver controlled power sharing between IA and Graphics (Mobile) | HW controlled power sharing between IA cores and Graphics<br>Dynamic Turbo provides high responsiveness<br>More Turbo headroom from improved power monitoring and control |
| Turbo Behavior (illustrative only; does not represent actual number of turbo bins) | | Quad Core Die: Single Core Turbo, Dual Core Turbo, Quad Core Turbo | Dual Core Die: Single Core Turbo, Dual Core Turbo, Graphics Turbo | Dual Core Die: cores 0, 1 and GT<br>Quad Core Die: cores 0, 1, 2, 3 and GT |
| Single Core Turbo    | Dual Core Turbo                                                                         | Graphics Turbo                                                                                                                                                                                                                                                                                        |                                                                                                                                                        |                                                                                                                                                                                                                                         |                 |  |  |  |     |         |         |                                                                                                                                                                                                                                                                                                       |                   |                 |                |  |  |  |        |        |        |                                                                                                                                                                                           |
|                      |                                                                                         |                                                                                                                                                                                                                                                                                                       |                                                                                                                                                        |                                                                                                                                                                                                                                         |                 |  |  |  |     |         |         |                                                                                                                                                                                                                                                                                                       |                   |                 |                |  |  |  |        |        |        |                                                                                                                                                                                           |
| 0 1 GT               | 0 1 GT                                                                                  | 0 1 GT                                                                                                                                                                                                                                                                                                |                                                                                                                                                        |                                                                                                                                                                                                                                         |                 |  |  |  |     |         |         |                                                                                                                                                                                                                                                                                                       |                   |                 |                |  |  |  |        |        |        |                                                                                                                                                                                           |

# TB 1.0 Vs TB 2.0

## Innovative Concept: Thermal Capacitance

### Classic Model

Steady-State Thermal Resistance

Design guide for steady state

### New Model

Steady-State Thermal Resistance  
AND  
Dynamic Thermal Capacitance



*Figure: temperature vs. time, comparing the classic model response with the more realistic response to power changes.*

*Temperature rises as energy is delivered to the thermal solution.  
The thermal solution's response is calculated in real time.*
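The difference between the two models can be sketched numerically. The snippet below is a minimal illustration, not Intel's algorithm: it treats the thermal solution as a first-order RC circuit, and the resistance, capacitance, ambient temperature, idle power and 160 W turbo power are all hypothetical values chosen for the example. Only the 130 W TDP comes from the spec slide.

```python
import math


def steady_state_temp(power_w, r_th, t_ambient):
    """Classic model: temperature depends only on power and thermal resistance."""
    return t_ambient + power_w * r_th


def transient_temp(power_w, r_th, c_th, t_ambient, t_start, elapsed_s):
    """New model: first-order RC response after a step change to power_w."""
    t_final = steady_state_temp(power_w, r_th, t_ambient)
    tau = r_th * c_th  # thermal time constant in seconds
    return t_final + (t_start - t_final) * math.exp(-elapsed_s / tau)


def time_until_limit(power_w, r_th, c_th, t_ambient, t_start, t_limit):
    """Seconds that power_w can be sustained before t_limit is reached."""
    t_final = steady_state_temp(power_w, r_th, t_ambient)
    if t_final <= t_limit:
        return math.inf  # the limit is never reached at this power level
    if t_start >= t_limit:
        return 0.0
    tau = r_th * c_th
    return -tau * math.log((t_limit - t_final) / (t_start - t_final))


# Hypothetical cooler: 0.3 K/W resistance, 100 J/K capacitance, 40 °C ambient.
R_TH, C_TH, T_AMB = 0.3, 100.0, 40.0
t_idle = steady_state_temp(20.0, R_TH, T_AMB)    # starting from a 20 W idle
t_limit = steady_state_temp(130.0, R_TH, T_AMB)  # temperature reached at the 130 W TDP
budget_s = time_until_limit(160.0, R_TH, C_TH, T_AMB, t_idle, t_limit)
print(f"160 W can be sustained for about {budget_s:.0f} s before reaching {t_limit:.0f} °C")
```

Under the classic model, 160 W would be rejected outright because its steady-state temperature exceeds the limit. With thermal capacitance taken into account, the same power level is acceptable for a bounded time while the thermal solution warms up.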

# Next Generation Intel® Turbo Boost Benefit



# Synthetic Test



## PCMark Vantage

Overall Suite Score



# Multimedia Test



**TMPGEnc 4.7.3.292**

MPEG-2 to MPEG-4

5 min. Terminator II SE DVD



# Videogames Test



## Left 4 Dead 2

Tom's Hardware Demo

2560x1600, No AA / 8x AA



# Processors @ Dinox PC

*Percentage gain with Turbo ON (Core i5-2500K)*



# Consumption



# Overclock wins!



# Intel Enhanced SpeedStep

- An advanced means of enabling very high performance while also meeting the power-conservation needs of mobile systems.
- Switches voltage and frequency in tandem between high and low levels in response to processor load.
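Why voltage and frequency are lowered together can be seen from the standard CMOS dynamic-power relation, P ≈ C · V² · f. The sketch below compares two operating points. The voltage and frequency values are hypothetical and are not taken from any Intel datasheet. Only the ratio between the two states matters, so the capacitance term cancels out.

```python
def relative_dynamic_power(voltage, freq_ghz, ref_voltage, ref_freq_ghz):
    """Dynamic power relative to a reference state, using P ~ C * V^2 * f."""
    return (voltage / ref_voltage) ** 2 * (freq_ghz / ref_freq_ghz)


# Hypothetical operating points: (voltage in volts, frequency in GHz).
HIGH = (1.20, 3.3)
LOW = (0.90, 1.2)

# Lowering only the frequency versus lowering voltage and frequency together.
freq_only = relative_dynamic_power(HIGH[0], LOW[1], *HIGH)
volt_and_freq = relative_dynamic_power(*LOW, *HIGH)
print(f"frequency only:        {freq_only:.0%} of full dynamic power")
print(f"voltage and frequency: {volt_and_freq:.0%} of full dynamic power")
```

Because voltage enters quadratically, reducing it along with the frequency saves noticeably more power than reducing the frequency alone.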

# Performance

# Test Setups

| Component               | Configuration                                                                                    |
|-------------------------|--------------------------------------------------------------------------------------------------|
| **Motherboard:**        | ASUS P8Z68-V Pro (Intel Z68)<br>ASUS Crosshair V Formula (AMD 990FX)<br>Intel DX79SI (Intel X79) |
| **Hard Disk:**          | Intel X25-M SSD (80GB)<br>Crucial RealSSD C300                                                   |
| **Memory:**             | 4 x 4GB G.Skill Ripjaws X DDR3-1600 9-9-9-20                                                     |
| **Video Card:**         | ATI Radeon HD 5870 (Windows 7)                                                                   |
| **Video Drivers:**      | AMD Catalyst 11.10 Beta (Windows 7)                                                              |
| **Desktop Resolution:** | 1920 x 1200                                                                                      |
| **OS:**                 | Windows 7 x64                                                                                    |

# Processor Comparison

| Processor Number    | i7-3960X                 | i7-2600K                | i7-990X                  |
|---------------------|--------------------------|-------------------------|--------------------------|
| # of Cores          | 6                        | 4                       | 6                        |
| # of Threads        | 12                       | 8                       | 12                       |
| Clock Speed         | 3.3 GHz                  | 3.4 GHz                 | 3.46 GHz                 |
| Max Turbo Frequency | 3.9 GHz                  | 3.8 GHz                 | 3.73 GHz                 |
| Cache               | 15 MB Intel® Smart Cache | 8 MB Intel® Smart Cache | 12 MB Intel® Smart Cache |

# Cache and Memory Bandwidth Performance

**Cache/Memory Latency Comparison**

| Processor                           | L1 | L2 | L3 | Main Memory |
|-------------------------------------|----|----|----|-------------|
| AMD FX-8150<br>(3.6GHz)             | 4  | 21 | 65 | 195         |
| AMD Phenom II X4<br>975 BE (3.6GHz) | 3  | 15 | 59 | 182         |
| AMD Phenom II X6<br>1100T (3.3GHz)  | 3  | 14 | 55 | 157         |
| Intel Core i5 2500K<br>(3.3GHz)     | 4  | 11 | 25 | 148         |
| Intel Core i7 3960X<br>(3.3GHz)     | 4  | 11 | 30 | 167         |
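The latency table does not state its units. Assuming the figures are clock cycles measured at the clock speed listed next to each processor (an assumption, not something the slide confirms), they can be converted to nanoseconds so that chips running at different frequencies can be compared directly. A minimal sketch:

```python
def cycles_to_ns(cycles, clock_ghz):
    """Convert a latency in clock cycles to nanoseconds (ASSUMES cycle units)."""
    return cycles / clock_ghz


# (clock in GHz, [L1, L2, L3, main memory]) copied from the latency table.
latencies = {
    "AMD FX-8150": (3.6, [4, 21, 65, 195]),
    "AMD Phenom II X4 975 BE": (3.6, [3, 15, 59, 182]),
    "AMD Phenom II X6 1100T": (3.3, [3, 14, 55, 157]),
    "Intel Core i5 2500K": (3.3, [4, 11, 25, 148]),
    "Intel Core i7 3960X": (3.3, [4, 11, 30, 167]),
}

for name, (clock_ghz, cycles) in latencies.items():
    ns = [cycles_to_ns(c, clock_ghz) for c in cycles]
    print(name, " ".join(f"{value:6.1f} ns" for value in ns))
```

If the assumption about units is wrong, the conversion does not apply and the raw numbers should be read as given.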

**Memory Bandwidth Comparison - Sandra 2012.01.18.10**

|                            | Intel Core i7 3960X<br>(Quad Channel, DDR3-1600) | Intel Core i7 2600K<br>(Dual Channel, DDR3-1600) | Intel Core i7 990X<br>(Triple Channel, DDR3-1333) |
|----------------------------|--------------------------------------------------|--------------------------------------------------|---------------------------------------------------|
| Aggregate Memory Bandwidth | 37.0 GB/s                                        | 21.2 GB/s                                        | 19.9 GB/s                                         |
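To put the measured figures in context, they can be compared with the theoretical peak of each memory configuration. The sketch below uses the usual estimate of channels × 8 bytes × transfer rate, since a DDR3 channel is 64 bits wide. The helper name and the rounding of DDR3-1333 to 1333 MT/s are choices made for this illustration.

```python
BYTES_PER_TRANSFER = 8  # one DDR3 channel is 64 bits wide


def peak_bandwidth_gbs(channels, mega_transfers_per_s):
    """Theoretical peak bandwidth in GB/s for a DDR3 configuration."""
    return channels * BYTES_PER_TRANSFER * mega_transfers_per_s / 1000.0


# (channels, transfer rate in MT/s, measured GB/s from the Sandra table)
configs = {
    "Core i7 3960X": (4, 1600, 37.0),
    "Core i7 2600K": (2, 1600, 21.2),
    "Core i7 990X": (3, 1333, 19.9),
}

for name, (channels, rate, measured) in configs.items():
    peak = peak_bandwidth_gbs(channels, rate)
    print(f"{name}: peak {peak:.1f} GB/s, measured {measured:.1f} GB/s "
          f"({measured / peak:.0%} of peak)")
```

The quad-channel platform has by far the highest absolute bandwidth, while the dual-channel 2600K comes closest to its own theoretical peak.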

# Windows 7 Application Performance

## Cinebench 11.5 - Single Threaded

Score in CBMarks - Higher is Better



## Cinebench 11.5 - Multi-Threaded

Score in CBMarks - Higher is Better

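| Processor                 | Score | i7 3960X advantage |
|---------------------------|-------|--------------------|
| Intel Core i7 3960X       | 10.52 |                    |
| Intel Core i7 990X        | 8.85  | + 19%              |
| Intel Core i7 2600K       | 6.86  | + 53%              |
| AMD FX-8150               | 5.99  | + 75%              |
| AMD Phenom II X6 1100T BE | 5.9   |                    |
| Intel Core i5 2500K       | 5.42  |                    |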
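The percentage labels in the multi-threaded Cinebench results express how far the i7 3960X is ahead of each competitor. A short sketch of that calculation, using the scores from the chart (the function name is introduced here for illustration), also covers the two processors the chart leaves unlabeled:

```python
scores = {
    "Intel Core i7 3960X": 10.52,
    "Intel Core i7 990X": 8.85,
    "Intel Core i7 2600K": 6.86,
    "AMD FX-8150": 5.99,
    "AMD Phenom II X6 1100T BE": 5.9,
    "Intel Core i5 2500K": 5.42,
}


def advantage_pct(reference_score, other_score):
    """Percentage by which reference_score exceeds other_score."""
    return (reference_score / other_score - 1.0) * 100.0


reference = scores["Intel Core i7 3960X"]
for name, score in scores.items():
    if score != reference:
        print(f"{name}: 3960X is ahead by {advantage_pct(reference, score):.1f}%")
```

The computed values sit within a percentage point of the labels on the slide. Small differences are likely due to rounding in the original chart.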

## 7-zip Benchmark

32MB Dictionary - Total MIPS - Higher is Better



## AES-128 Performance - TrueCrypt 7.1 Benchmark

Mean Encryption/Decryption AES Algorithm - GB/s



## x264 HD Benchmark - 1st pass - v3.03

Frames per Second - Higher is Better



## x264 HD Benchmark - 2nd pass - v3.03

Frames per Second - Higher is Better



## Adobe Photoshop CS4 - Retouch Artists Speed Test

Time in Seconds - Lower is Better



## Build Chromium Project - Visual Studio 2008

Compile Time in Minutes - Lower is Better



# Gaming Performance



## DiRT 3 - Aspen Benchmark - 1920 x 1200 High Quality

Average Frames per Second - Higher is Better



## World of Warcraft

FRAPS Runthrough - FPS - Higher is Better



# Power Consumption

## Power Consumption - Idle

Total System Power Consumption in Watts (Lower is Better)



## Power Consumption - Load (x264 HD 3.03 2nd Pass)

Total System Power Consumption in Watts (Lower is Better)



# Overclocked Performance and Consumption



## Overclocked: x264 HD Benchmark - 2nd pass - v3.03

Frames per Second - Higher is Better



## Overclocked Power Consumption - Load (x264 HD 3.03 2nd Pass)

Total System Power Consumption in Watts (Lower is Better)



# Final Words

- No-compromise, ultra-high-end desktop solution
- Possibly the world's fastest desktop CPU
- No on-die GPU
- Does not improve the gaming experience or speed up the majority of desktop applications

Thank You