



# Parallel Programming 2.0

**Wei Li**

Senior Principal Engineer  
Software and Solutions Group  
Intel Corporation

CGO'06 Keynote 3/27/06

# Agenda

Software at Intel

Major Technological Change

Software Response

Parallel Programming 2.0



# Why Does Intel Care About Software?


Copyright © 2006, Intel Corporation

# Software At Intel®



**Development  
Products**



**Enabling**



**Platform  
Software**



**Professional  
Services**

## Developer Products



Layers shown: Environments, Development Products, OS's, Architectures

- Intel® VTune™ Analyzers
- Intel® Compilers
- Intel® Performance Libraries
- Intel® Cluster Toolkit
- Intel® Threading Tools

Architectures include Intel XScale® Technology



All Architectures Supported Across All Environments

Other names and brands may be claimed as the property of others.

# Enabling...

- Intel® Software Network
- Intel® Software College
- Intel® Early Access Program
- Industry Programs and Alliances
- Intel® Competency Centers
- Intel® Software Research
- Intel® Press Publications



[www.intel.com/software](http://www.intel.com/software)




Screenshot: the Intel® Software Network home page, with sections for Products, Training, Collaboration, Vision & Technologies, and Services.

# ...With Worldwide Reach

Over **50** Development **Sites**  
Across More Than **20 Countries**

*Scaling Innovation on Intel's Leading  
Platforms and Tapping the Best  
Talent Around the World*



Hardware + Platform Software + ISV Value Add = **Platform Solutions**

# Platform Software

- Intel® Active Management and Intel® Virtualization Technology Solutions
  - Rapid adoption throughout ecosystem
  - Many companies bringing products to market



- Carrier Grade Linux for Telecommunications with AdvancedTCA
  - Both hardware and software technology building blocks available
  - Commercial deployments already underway worldwide



# Platform Software

- Intel® Platform Administration Technology
  - For internet cafés and SMB installations
- Tiano / EFI
  - Apple using Tiano and EFI to boot Intel-based Macs
  - Embedded Tiano to replace PC BIOS for initialization and management of high performance clusters

Screenshot: Intel® Platform Administration Technology website.

Screenshot: TianoCore.org home page, the community supporting the open source components of Intel's implementation of EFI (the Platform Innovation Framework).

# The Next Platform Workload: XML



XML Traffic Will Exceed:  
All Mail Traffic in 2004 • All Web Traffic in 2007

# Managed Runtime

- Develop the best available Java and .NET technology for Intel® hardware
  - Help JVMs run better on parallel hardware
  - JITs generate great code for Intel® micro-architectures
- Example projects
  - Apache Harmony open-source JVM. Goals:
    - Create an open-source, compatible implementation of J2SE 5 under the Apache License
    - Create a community to carry the open J2SE forward
  - NUMA-cc awareness for JVMs; object co-location
  - Profile collection and management
  - Dynamic profile-guided optimization



# Apache Harmony

http://incubator.apache.org/harmony/



Screenshot: the Apache Harmony home page. It describes Harmony as the J2SE project of the Apache Software Foundation, an effort undergoing incubation at the ASF, with two aims:

- Create a compatible, independent implementation of J2SE 5 under the Apache License v2
- Create a community-developed modular runtime (VM and class library) architecture to allow independent implementations to share runtime components, and allow independent innovation in runtime components

# Software Technology Innovation



Internal Projects as well as Joint  
Research with Universities

# Agenda

Software at Intel

Major Technological Change

Software Response

Parallel Programming 2.0



# Quiz

According to Moore's law, which of the following roughly doubles every 2 years?

1. Frequency
2. Performance
3. Transistors
4. Transistor Density



*from [www.intel.com](http://www.intel.com)*



# Historical Driving Force

Increased Performance  
via Increased Frequency



**1946**

20 Numbers  
in Main Memory



**1971**

I4004 Processor  
2300 Transistors



**2005**

65nm  
1B+ Transistors



# The Challenge

## Power Limitations



$$\text{Power} = \text{Capacitance} \times \text{Voltage}^2 \times \text{Frequency}$$

also

$$\text{Power} \sim \text{Voltage}^3$$
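The two relations above can be combined in a short worked sketch. It assumes, as an idealization, that supply voltage scales in proportion to clock frequency and that capacitance stays fixed, which is what turns the first formula into the cube law. The 20% clock changes and the function name `relative_power` are hypothetical values chosen for illustration, not figures from the talk.

```python
def relative_power(freq_scale: float) -> float:
    """Relative dynamic power for a relative clock frequency.

    Assumption: voltage scales in proportion to frequency and capacitance
    is unchanged, so Power = C * V^2 * f grows as the cube of the scale.
    """
    voltage_scale = freq_scale  # idealized assumption, not a measured relation
    return voltage_scale ** 2 * freq_scale


over_clocked = relative_power(1.2)       # one core, clock raised 20%
under_clocked = relative_power(0.8)      # one core, clock lowered 20%
dual_under_clocked = 2 * under_clocked   # two such cores side by side

for label, power in [
    ("over-clocked (+20%)", over_clocked),
    ("under-clocked (-20%)", under_clocked),
    ("two under-clocked cores", dual_under_clocked),
]:
    print(f"{label}: {power:.2f}x power")
```

Under these assumptions, two cores running at 80% of the original clock draw roughly the same power as one core at full clock, which is the arithmetic behind trading frequency for cores.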

# Energy: The Next Frontier



# Energy Efficient Performance – High End

DATACENTER  
“ENERGY LABEL”



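|                  | NASA Columbia       | ASC Purple    |
|------------------|---------------------|---------------|
| Power            | 2 MWatt             | 6 MWatt       |
| Performance goal | 60 TFlops           | 100 TFlops    |
| CPUs             | 10,240 (Itanium II) | 12K+ (Power5) |
| Cost             | \$50M               | \$230M        |
| Source           | NASA                | LLNL          |

Computational efficiency, as listed with NASA Columbia: 30,720 Flops/Watt, 1,288 Flops/Dollar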
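The "energy label" metrics are simple ratios, and a rough sketch makes the comparison concrete. It uses only the round figures quoted above (performance goal, power, and cost); the function name `efficiency` is hypothetical. Computed this way, the results do not reproduce the printed 30,720 and 1,288, which appear to be expressed in thousands of Flops and may come from a different performance figure; that reading is an assumption.

```python
def efficiency(flops: float, watts: float, dollars: float):
    """Return (Flops per Watt, Flops per Dollar) for one system."""
    return flops / watts, flops / dollars


# Inputs are the round numbers quoted on the slide.
columbia = efficiency(flops=60e12, watts=2e6, dollars=50e6)
purple = efficiency(flops=100e12, watts=6e6, dollars=230e6)

for name, (per_watt, per_dollar) in [("NASA Columbia", columbia),
                                     ("ASC Purple", purple)]:
    print(f"{name}: {per_watt / 1e6:.1f} MFlops/Watt, "
          f"{per_dollar / 1e3:.0f} KFlops/Dollar")
```

On these inputs Columbia comes out ahead on both ratios even though ASC Purple has the higher performance goal.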



# The Classic Tradeoff

Higher  
Top Speed and Acceleration

OR

Increased  
Range and Economy



# A Simple Example

■ *Performance*  
■ *Power*



# Over-clocking



# Under-clocking



# Multi-Core Energy-Efficient Performance



# Moore's Law will provide transistors

Intel process technology capabilities

| High Volume Manufacturing                      | 2004 | 2006 | 2008 | 2010 | 2012 | 2014 | 2016 | 2018 |
|------------------------------------------------|------|------|------|------|------|------|------|------|
| Feature Size                                   | 90nm | 65nm | 45nm | 32nm | 22nm | 16nm | 11nm | 8nm  |
| Integration Capacity (Billions of Transistors) | 2    | 4    | 8    | 16   | 32   | 64   | 128  | 256  |
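The table follows a simple pattern that can be checked in a few lines. The sketch below assumes one process generation every two years, a doubling of integration capacity per generation, and a linear shrink of 1/√2 per generation (the rule of thumb that halving the area of a feature scales its width by about 0.7). The function names are hypothetical; the table values are copied from above.

```python
YEARS = list(range(2004, 2020, 2))                   # 2004, 2006, ..., 2018
TABLE_CAPACITY = [2, 4, 8, 16, 32, 64, 128, 256]     # billions of transistors
TABLE_FEATURE_NM = [90, 65, 45, 32, 22, 16, 11, 8]   # feature size in nm


def capacity_billions(year, base_year=2004, base_capacity=2):
    """Integration capacity, doubling once per two-year generation."""
    generations = (year - base_year) // 2
    return base_capacity * 2 ** generations


def feature_size_nm(year, base_year=2004, base_nm=90.0, shrink=0.5 ** 0.5):
    """Feature size, shrinking by ~0.7x per generation (assumed rule of thumb)."""
    generations = (year - base_year) // 2
    return base_nm * shrink ** generations


for year, cap, nm in zip(YEARS, TABLE_CAPACITY, TABLE_FEATURE_NM):
    print(f"{year}: model {capacity_billions(year)}B vs table {cap}B; "
          f"model {feature_size_nm(year):.1f}nm vs table {nm}nm")
```

The capacity model matches the table exactly, while the shrink model only approximates the named process nodes.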

Use transistors for

- Multiple cores
- On-core memory (caches)
- New features (\*Ts)

Multiple cores and caches address power and memory latency issues

# The Dawn of Energy-Efficient Performance



# Agenda

Software at Intel

Major Technological Change

Software Response

Parallel Programming 2.0



# Multi-Core Platforms Demand Threaded Software

Biggest Performance Leap Since  
Out-of-Order Execution

Chart: integer performance at introduction (normalized to 25MHz 486DX), single-threaded vs. multi-threaded, for Pentium, Pentium II, Pentium III, Pentium 4, Pentium D, and Conroe<sup>1</sup>




<sup>1</sup> Estimated based on preproduction measurements. Source: SpecWeb site & Newsletter

# The Importance of Threading

- Do Nothing: Benefits Still Visible
  - Operating systems ready for multi-processing
  - Background tasks benefit from more compute resources
- Parallelize: Unlock the Potential
  - Threaded applications
  - Threaded libraries
  - Compiler generated threads

# Unleash Multi-Core Potential

- Intel® C++ and Fortran Compilers
  - Built-in threading support with Auto-Parallelization and OpenMP\* support
- Intel® Thread Checker & Thread Profiler
  - Unique product locates hard-to-find threading errors before they happen!
  - Helps developers optimize threaded applications
- Intel® MKL and IPP Performance Libraries
  - Highly optimized threaded libraries that enable multi-core performance gains even if your application isn't threaded!
- Intel® VTune™ Performance Analyzer
  - Identifies performance bottlenecks in single or multithreaded applications to maximize performance



# Example: Threading for Multi-Core



# Threading for Multi-Core



Intel® VTune™  
Performance Analyzer

## Call Graph

- Functional Structure
- Execution Times
- Counts





Activity1 (Call Graph)



| Calls (841) | Execution Time (841) | Function (841)             |
|-------------|----------------------|----------------------------|
| 1           | 99.9%                | WinMainCRTStartup          |
| 1           | 99.9%                | WinMain                    |
| 1           | 98.5%                | ParseArguments             |
| 1           | 98.5%                | Initialize                 |
| 1           | 98.5%                | InitializeSG               |
| 1           | 97.3%                | LoadU3DFileInit            |
| 1           | 97.2%                | Load                       |
| 1           | 97.2%                | Load                       |
| 1           | 97.2%                | ExecuteReadX               |
| 2           | 60.3%                | ExecuteTransferX           |
| 1           | 60.3%                | ProcessTransferOrderX      |
| 8,056       | 47.1%                | TransferX                  |
| 16,212      | 34.4%                | ProcessGenericBlockX       |
| 8,085       | 33.6%                | ProcessModifierChainBlockX |
| 12,113      | 31.8%                | ProcessBlockX              |
| 16,362,002  | 18.5%                | GetResourcePtr             |
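A call graph view like the one above boils down to two numbers per function: how often it was called and how much time was spent inside it, callees included. The sketch below collects both with a hand-written decorator. It is an illustration of the idea only, not how VTune™ gathers its data, and the functions `load`, `process_block`, and `get_resource` are hypothetical stand-ins that loosely echo the names in the table.

```python
import time
from collections import defaultdict
from functools import wraps

call_counts = defaultdict(int)       # function name -> number of calls
inclusive_time = defaultdict(float)  # function name -> seconds, callees included


def profiled(func):
    """Record call count and inclusive wall-clock time for func."""
    @wraps(func)
    def wrapper(*args, **kwargs):
        call_counts[func.__name__] += 1
        start = time.perf_counter()
        try:
            return func(*args, **kwargs)
        finally:
            inclusive_time[func.__name__] += time.perf_counter() - start
    return wrapper


@profiled
def get_resource(i):
    return i * 2


@profiled
def process_block(n):
    return sum(get_resource(i) for i in range(n))


@profiled
def load(blocks, n):
    return [process_block(n) for _ in range(blocks)]


load(blocks=3, n=100)
for name in sorted(call_counts, key=inclusive_time.get, reverse=True):
    print(f"{name}: {call_counts[name]} calls, {inclusive_time[name]:.6f}s inclusive")
```

As in the table, the leaf function has by far the highest call count, while the functions near the root carry the largest inclusive time, which is why both columns are needed to decide where to thread.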



# Threading for Multi-Core



Intel® Compilers

OpenMP Loop Construct

- Creates one thread per core
- Assigns iterations to threads
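The two bullets above describe what the loop construct does on the programmer's behalf. In C or C++ it is written as `#pragma omp parallel for` ahead of a loop; the sketch below is not OpenMP but mimics the same two steps with Python's standard library, so the division of work is visible. The helper names `static_chunks` and `parallel_for` are hypothetical, and the contiguous near-equal split is one possible scheduling choice.

```python
import os
from concurrent.futures import ThreadPoolExecutor


def static_chunks(n_iterations, n_threads):
    """Split range(n_iterations) into n_threads contiguous, near-equal chunks."""
    base, extra = divmod(n_iterations, n_threads)
    chunks, start = [], 0
    for t in range(n_threads):
        size = base + (1 if t < extra else 0)
        chunks.append(range(start, start + size))
        start += size
    return chunks


def parallel_for(body, n_iterations, n_threads=None):
    """Run body(i) for every i, one worker thread per chunk of iterations."""
    n_threads = n_threads or os.cpu_count() or 1  # default: one thread per core

    def run_chunk(chunk):
        for i in chunk:
            body(i)

    with ThreadPoolExecutor(max_workers=n_threads) as pool:
        # Consume the iterator so exceptions raised in workers surface here.
        list(pool.map(run_chunk, static_chunks(n_iterations, n_threads)))


squares = [0] * 10


def body(i):
    squares[i] = i * i  # each iteration writes its own slot, so no lock is needed


parallel_for(body, len(squares), n_threads=4)
print(squares)
```

The loop body is safe to run this way only because iterations are independent; the Thread Checker example later in the talk shows what happens when they are not.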



# Threading for Multi-Core





Memory write at "cifxmodifierchain.cpp":1346 conflicts with a prior memory write at "cifxmodifierchain.cpp":1380 (output dependence)

### 1st Access

Location of the first thread that was executing at the time the conflict occurred

Stack:

```
int CIFXModifierChain::Invalidate(unsigned int,unsigned int)
    "cifxmodifierchain.cpp":1380
[IFXCore.dll, 0x4430e]
int CIFXModifierChain::Invalidate(unsigned int,unsigned int)
    "cifxmodifierchain.cpp":1494
[IFXCore.dll, 0x44d9]
int CIFXModifierDataPacket::InvalidateDataElement(unsigned int)
    "cifxmodifierdatapacket.cpp":403
[IFXCore.dll, 0x45843]
int CIFXAuthorCLODResource::SetAuthorMesh(class IFXAuthorCLODMesh *)
    "cifxauthorclodresource.cpp":611
[IFXCore.dll, 0x231c0]
void CIFXAuthorCLODDecoder::TransferX(int &)
    "CIFXAuthorCLODDecoder.cpp":199
[IFXImporting.dll, 0x20e8]
?ProcessTransferOrderX@CIFXLoadManager@@AAEXAAH@Z_1433__par_loop1
"CIFXLoadManager.cpp":1462
[IFXImporting.dll, 0x1923a]
?ProcessTransferOrderX@CIFXLoadManager@@AAEXAAH@Z_1356__par_loop0
"CIFXLoadManager.cpp":1356
[IFXImporting.dll, 0x19520]
void CIFXLoadManager::ExecuteTransferX(void)
    "CIFXLoadManager.cpp":692
[IFXImporting.dll, 0x18eb6]
```

### Source

```
// Iterate -- follow all of the invalidation sequences
while( IFXSUCCESS( result ) && s_InvDepth > StartDepth )
{
    InvRecord* pCurIterState = s_pInvState + s_InvDepth;

    // Get the current Inv Seq
    IFXModifierDataPacketInternal* pDP =
        pDataPacketState[pCurIterState->ModIdx].m_pDataPacket;
    IFXDidInvElement* pInvEl =
        &(pCurIterState->pDEState->m_pInvSeq[pCurIterState->InvIdx]);
    pCurIterState->InvIdx++;

    // pop this iter state if we are processing the last entry
    if( pCurIterState->InvIdx == pCurIterState->pDEState->m_uInvCount )
    {
        IFXInterlockedDecrement( (U32*)&s_InvDepth );
    }

    // Get the Invalidation Target and Do The Invalidation
    if( pInvEl->uMIndex != APPENDED_DATAPACKET_INDEX )
    {
        IFXDataPacketState* pTrgDPState =
            &pDataPacketState[pInvEl->uMIndex];
        pTrgDPState->
```

### 2nd Access

Location of the second thread that was executing at the time the conflict occurred

Stack:

```
int CIFXModifierChain::Invalidate(unsigned int,unsigned int)
    "cifxmodifierchain.cpp":1346
[IFXCore.dll, 0x4426a]
int CIFXModifierChain::Invalidate(unsigned int,unsigned int)
    "cifxmodifierchain.cpp":1494
[IFXCore.dll, 0x44d9]
int CIFXModifierDataPacket::InvalidateDataElement(unsigned int)
    "cifxmodifierdatapacket.cpp":403
[IFXCore.dll, 0x45843]
int CIFXAuthorCLODResource::SetAuthorMesh(class IFXAuthorCLODMesh *)
    "cifxauthorclodresource.cpp":610
[IFXCore.dll, 0x231b0]
void CIFXAuthorCLODDecoder::TransferX(int &)
    "CIFXAuthorCLODDecoder.cpp":199
[IFXImporting.dll, 0x20e8]
?ProcessTransferOrderX@CIFXLoadManager@@AAEXAAH@Z_1433__par_loop1
"CIFXLoadManager.cpp":1462
[IFXImporting.dll, 0x1923a]
?ProcessTransferOrderX@CIFXLoadManager@@AAEXAAH@Z_1356__par_loop0
"CIFXLoadManager.cpp":1356
[IFXImporting.dll, 0x19520]
void CIFXLoadManager::ExecuteTransferX(void)
    "CIFXLoadManager.cpp":692
[IFXImporting.dll, 0x18eb6]
```

### Source

```
result = IFX_E_INVALID_RANGE;

if( IFXSUCCESS( result ) )
{
    // Set the state for the Initial invalidation
    IFXAcquireMutex( s_mInvState );
    s_pInvState[s_InvDepth].ModIdx = uInModifierIndex;
    s_pInvState[s_InvDepth].pDEState =
        &(pDataPacketState[uInModifierIndex].m_pDataElements[uInDataElementIndex]);
    s_pInvState[s_InvDepth].InvIdx = 0;
    IFXReleaseMutex( s_mInvState );
}

// we never actually invalidate the proxy data packet
// all of the proxy data packet entries except for time
// should always be valid
if( IFXSUCCESS( result ) && uInModifierIndex != 0 )
    // invalidate this element
    s_pInvState[s_InvDepth].pDEState->State = IFXDATAELEMENTSTATE_INVALID;
    if( s_pInvState[s_InvDepth].pDEState->AspectBit_1 )
```
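The diagnostic above reports two threads writing the same shared state without ordering between them. A minimal sketch of the general remedy follows: every read-modify-write on the shared value happens while holding one lock. The class `InvalidationDepth` is a hypothetical stand-in for a shared depth counter, not the U3D code, and the sketch does not claim to be the fix that was applied to it.

```python
import threading


class InvalidationDepth:
    """Hypothetical shared depth counter, updated by several threads."""

    def __init__(self):
        self.depth = 0
        self._lock = threading.Lock()

    def push(self):
        # Keep the whole read-modify-write under the lock so no update is lost.
        with self._lock:
            self.depth += 1

    def pop(self):
        with self._lock:
            self.depth -= 1


def run_workers(n_threads, pushes_per_thread):
    """Each thread pushes N times and pops N - 1 times; return the final depth."""
    state = InvalidationDepth()

    def worker():
        for _ in range(pushes_per_thread):
            state.push()
        for _ in range(pushes_per_thread - 1):
            state.pop()

    threads = [threading.Thread(target=worker) for _ in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return state.depth


print(run_workers(n_threads=4, pushes_per_thread=5000))
```

With the lock in place the final depth always equals the number of threads; the cost of that correctness is contention on the lock.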

# Threading for Multi-Core



Intel® Thread  
Profiler

Find Contended Locks

- Most Overhead
- Largest Reduction in Parallelism



Screenshot: VTune™ Performance Environment, Thread Profiler activity for sampleplayer.exe (03:33 PM, 2006 Mar 02), Source View.

Transition source, threads 3, 5, 4, 1. Stack:

- IFXAcquireMutex, "ifxosthreads.cpp":105
- CIFXAuthorCLODResource::SetAuthorMesh, "cifxauthorclodresource.cpp":589
- CIFXAuthorCLODDecoder::~CIFXAuthorCLODDecoder(void), "CIFXAuthorCLODDecoder.cpp":257
- void * operator new[](unsigned int), "IFXCheckX.h":66
- libguide40.dll

| Address | Line | Source                                                                           |
|---------|------|----------------------------------------------------------------------------------|
| 0x223E6 | 581  | }                                                                                |
| 0x223EF | 582  | rpAuthorCLODMesh = m_pAuthorMesh;                                                |
| 0x223F2 | 583  | IFXRETURN(rc);                                                                   |
| 0x23150 | 584  | }                                                                                |
| 0x23150 | 585  | IFXRESULT rc = IFX_OK;                                                           |
| 0x23150 | 586  | IFXAcquireMutex( s_mSetAuthorMesh );                                             |
| 0x23161 | 587  | if(m_pAuthorMesh != pAuthorCLODMesh)                                             |
| 0x2316E | 588  | {                                                                                |
| 0x23179 | 589  | ClearMeshGroup();                                                                |
| 0x2317D | 590  | if(pAuthorCLODMesh)                                                              |
| 0x2317D | 591  | {                                                                                |
| 0x2317D | 592  | pAuthorCLODMesh->AddRef();                                                       |
| 0x2317D | 593  | }                                                                                |
| 0x2319C | 594  | IFXRELEASE(m_pAuthorMesh);                                                       |
| 0x231B6 | 595  | m_pAuthorMesh = pAuthorCLODMesh;                                                 |
| 0x231C0 | 596  | m_bMeshGroupDirty = TRUE;                                                        |
| 0x231D0 | 597  | if(m_pModifierDataPacket) {                                                      |
| 0x231D0 | 598  | m_pModifierDataPacket->InvalidateDataElement(m_uMeshGroupDataElementIndex);      |
| 0x231D0 | 599  | m_pModifierDataPacket->InvalidateDataElement(m_uBoundSphereDataElementIndex);    |
| 0x231D0 | 600  | }                                                                                |
| 0x231D0 | 601  | IFXReleaseMutex( s_mSetAuthorMesh );                                             |
| 0x231E0 | 602  | IFXRETURN(rc);                                                                   |
| 0x22400 | 603  | IFXRESULT CIFXAuthorCLODResource::GetAuthorMeshMap(IFXMeshMap **ppAuthorMeshMap) |
| 0x22405 | 604  | {                                                                                |
| 0x22407 | 605  | IFXRESULT rc = IFX_OK;                                                           |
| 0x2240D | 606  | if( ppAuthorMeshMap )                                                            |
| 0x2240D | 607  | if( m_pAuthorMeshMap )                                                           |

Transition source, threads 3, 5, 4, 1. Stack:

- IFXReleaseMutex, "ifxosthreads.cpp":113
- CIFXAuthorCLODResource::SetAuthorMesh, "cifxauthorclodresource.cpp":614
- CIFXAuthorCLODDecoder::~CIFXAuthorCLODDecoder(void), "CIFXAuthorCLODDecoder.cpp":257
- void * operator new[](unsigned int), "IFXCheckX.h":66
- libguide40.dll
| Address | Line | Source                                                                           |
|---------|------|----------------------------------------------------------------------------------|
| 0x23183 | 604  | IFXRELEASE(m_pAuthorMesh);                                                       |
| 0x23183 | 605  | m_pAuthorMesh = pAuthorCLODMesh;                                                 |
| 0x23183 | 606  | m_bMeshGroupDirty = TRUE;                                                        |
| 0x23183 | 607  | if(m_pModifierDataPacket) {                                                      |
| 0x23183 | 608  | m_pModifierDataPacket->InvalidateDataElement(m_uMeshGroupDataElementIndex);      |
| 0x23183 | 609  | m_pModifierDataPacket->InvalidateDataElement(m_uBoundSphereDataElementIndex);    |
| 0x23183 | 610  | }                                                                                |
| 0x23183 | 611  | IFXReleaseMutex( s_mSetAuthorMesh );                                             |
| 0x23183 | 612  | IFXRETURN(rc);                                                                   |
| 0x23183 | 613  |                                                                                  |
| 0x231D0 | 614  | IFXRESULT CIFXAuthorCLODResource::GetAuthorMeshMap(IFXMeshMap **ppAuthorMeshMap) |
| 0x231D0 | 615  | {                                                                                |
| 0x231E0 | 616  | IFXRESULT rc = IFX_OK;                                                           |
| 0x231E3 | 617  | if( ppAuthorMeshMap )                                                            |
| 0x231E3 | 618  | if( m_pAuthorMeshMap )                                                           |
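The listings above show a mutex held across the whole body of a function that several threads call, which is the pattern a contended-lock view brings to the surface. One common way to reduce contention, sketched below, is to do the per-item work on thread-local data and take the lock once per thread to merge the result. `CountingLock` and `run` are hypothetical helpers written for this illustration; whether the same restructuring applies to the code in the screenshot is not something the slides state.

```python
import threading


class CountingLock:
    """A lock wrapper that counts how many times it was acquired."""

    def __init__(self):
        self._lock = threading.Lock()
        self.acquisitions = 0

    def __enter__(self):
        self._lock.acquire()
        self.acquisitions += 1  # updated while holding the lock
        return self

    def __exit__(self, *exc):
        self._lock.release()
        return False


def run(n_threads, items_per_thread, batched):
    """Sum items across threads; return (total, number of lock acquisitions)."""
    lock = CountingLock()
    total = [0]

    def worker():
        if batched:
            local = 0
            for _ in range(items_per_thread):
                local += 1          # private work, no lock needed
            with lock:
                total[0] += local   # one short critical section per thread
        else:
            for _ in range(items_per_thread):
                with lock:          # one critical section per item
                    total[0] += 1

    threads = [threading.Thread(target=worker) for _ in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return total[0], lock.acquisitions


print(run(4, 1000, batched=False))
print(run(4, 1000, batched=True))
```

Both variants produce the same total, but the batched one enters the critical section once per thread instead of once per item, leaving far fewer opportunities for threads to wait on each other.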

# Performance Impact



Optimizations may improve performance significantly.

# Agenda

Software at Intel

Major Technological Change

Software Response

Parallel Programming 2.0



# A New Era...

*THE OLD*

Performance Equals Frequency

Unconstrained Power

Voltage Scaling

*THE NEW*

Performance Equals IPC

Multi-Core

Power Efficiency

Microarchitecture Advancements

*And it is happening fast...*



# Multi-Core Trajectory



2005/2006



2007

# Growing Momentum for Multi-Cores



# Growing Momentum For Software Parallelization

Activision (Ravensoft), Adobe, Algorithmics, Alias, Autodesk, Business Objects, Cakewalk, CodecPeople, Computer Associates, Corel (WordPerfect), Cyberlink, Discreet, IBM, id Software, Landmark, Macromedia, Mainconcept, Maxon, mental images, Microsoft (Office Suite), Midway, MSC, Novell SUSE, Oracle, Pegasus, Pinnacle, Pixar (Renderman), Paradigm, PTC, Red Hat, SAP, SAS, Siebel CRM, Signet, Skype, SLB, SnapStream, Sonic (Roxio), Sony, Steinberg, SunGard, Sybase, Symantec, Thomson, THQ, Ubisoft, UGS, Valve, Yahoo (Musicmatch)



# Growing Momentum in Ecosystem Driven by Intel

- Continuous tool improvements
  - Over 15 tools released for multi-core in '05
  - Coming in '06: more features across all environments
- Training end users and in-house developers
  - 2005: more than 1,500 students; 2006 target: 4,500 students
- Academia
  - Universities developing parallel programming curriculum
  - Research grants for parallel programming projects
- Contests: over \$100,000 in prizes
  - Topcoder.com: monthly contests for top-performing threaded software
  - Game Developers Conference: contest for best use of Intel platform features

# Opportunity for Software: Extract Full Potential of Multi-core



# Parallel Programming 2.0

- Ubiquitous parallel software
  - Large spectrum of domains (consumer/wireless vs HPC/database, home vs nuclear labs)
- Scalable software
  - Explosion of cores (e.g. 2X cores every 18-24 months)
- User experience
  - vs. just raw performance
- Greater ease of programming
  - New programming paradigms, languages, compilers, and tools
- Greater reliability and security
  - Programming for application and system reliability
- Industry-wide vs government-funded
- Higher demand for parallel programming education
  - Mass vs elite
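The "2X cores every 18-24 months" figure in the list above compounds quickly, which is the reason scalable software is on it. A back-of-the-envelope sketch: the starting point of 2 cores and the six-year horizon are assumptions made for illustration, and `projected_cores` is a hypothetical helper, not a forecast from the talk.

```python
def projected_cores(start_cores, months_elapsed, doubling_months):
    """Core count after months_elapsed if it doubles every doubling_months."""
    return start_cores * 2 ** (months_elapsed / doubling_months)


START_CORES = 2      # assumption: a dual-core starting point
HORIZON_MONTHS = 72  # assumption: look six years ahead

slow = projected_cores(START_CORES, HORIZON_MONTHS, doubling_months=24)
fast = projected_cores(START_CORES, HORIZON_MONTHS, doubling_months=18)
print(f"after {HORIZON_MONTHS // 12} years: {slow:.0f} to {fast:.0f} cores")
```

Software threaded for exactly two cores would leave most of that hardware idle, so the thread count should follow the core count rather than be fixed in the code.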




Imagine what can be  
Create what will be

# Parallel Programming 2.0



*The Beginning of a New Era*