



Pacific  
Northwest  
NATIONAL LABORATORY

# DOE Advanced Computer and System Architecture R&D Investments

Center for Research into Novel Computing Hierarchies:  
CRNCH Summit  
Georgia Tech  
January 28-29, 2021

**James A. Ang, Ph.D.**

Chief Scientist for Computing,  
Physical and Computational Sciences Directorate  
Pacific Northwest National Laboratory  
Richland, WA

PNNL is operated by Battelle for the U.S. Department of Energy

**PNNL-SA-159237**



# Outline

- My history with DOE computer and system architecture R&D
- DOE/ASCI approach to advanced computing technology R&D
- DOE-SC/ASCR & NNSA/ASC Exascale Computing Initiative (ECI):  
FastForward & DesignForward
- DOE-SC/ASCR & NNSA/ASC Exascale Computing Project (ECP):  
PathForward
- Identification of Future *Beyond Moore's Law* R&D Challenges



# My History with DOE Architecture Investments

- 1996-1998: Sandia representative at the start of the DOE ASCI program
- 2007: Established Sandia's Scalable Computer Architectures department
  - 2008: DOE-SC/ASCR Institute for Advanced Architectures & Algorithms
  - 2010: DARPA/UHPC project X-Caliber, to develop concepts for memory-centric node designs
  - 2011: Helped establish the NNSA/ASC testbed project
  - 2012: Sandia Extreme-scale Computing Grand Challenge (XGC) LDRD
- 2015-2017: Hardware Technology Director of the DOE Exascale Computing Project (ECP); established six PathForward architecture R&D projects with the U.S. computer industry
- 2018-Present: PNNL Chief Scientist for Computing
  - Sector lead for DOE-SC/ASCR
  - Data-Model Convergence (DMC) Initiative Lead
  - DOE Microelectronics Basic Research Needs Workshop
  - SRC Decadal Plan workshops



# DOE ASCI PathForward Contracts Announced by President Clinton in 1998

- Part of launch of ASCI
- Focus was on Increasing Scalability for MPP Architectures
  - Technologies to Interconnect tens of thousands of commodity processors
  - Scalable Operating Systems
- R&D in processor designs and memory subsystems was explicitly *off the table*
- Original PathForward Vendors:
  - Digital Equipment Corp.
  - IBM
  - SGI/Cray
  - Sun Microsystems

**DOE NEWS**

**FOR IMMEDIATE RELEASE**  
**February 3, 1998**

**NEWS MEDIA CONTACTS:**  
**Matthew Donoghue, 202/586-5806**  
**Mary Dixon, 202/586-2249**

**'PathForward' Aims for 30 Trillion Operations Per Second By 2001**  
***President Clinton Announces DOE Partnership with Computer Companies***

LOS ALAMOS, New Mexico – President Bill Clinton today announced PathForward, the next step in the Department of Energy's effort to develop the supercomputers of the 21st century. The computers and simulation capabilities will be used to keep the U.S. nuclear weapons stockpile safe, secure, and reliable without nuclear testing. The four-year, \$50 million contracts are with Digital Equipment Corporation of Maynard, Mass., International Business Machines (IBM) of Poughkeepsie, New York, Sun Microsystems, Inc. (SUN) of Chelmsford, Mass., and Silicon Graphics/Cray Computer Systems (SGI/Cray) of Chippewa Falls, Wisc. These collaborations with the computer industry will help reach the department's long-term goal of developing a 100 Teraflops computer by 2004.



# Key DOE Policy: Class Advance Waiver

- *Under this policy the Government waives its domestic and foreign patent rights and software copyright to any subcontractor that qualified for this Class Advance Waiver*
- To qualify, the subcontractor must provide a match of at least 30% of DOE's PathForward project investment
- ASCI Program leadership understood that platform procurements alone were not enough to keep a cross section of vendors in HPC; this was a second motivation for PathForward
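
As a rough illustration of the matching rule above, the sketch below computes the smallest qualifying subcontractor match for a given DOE investment. The function names, the dollar figures, and the whole-dollar rounding are assumptions made for illustration only; the slide states just the "at least 30% of the DOE investment" threshold, and the actual waiver terms may define the base differently.

```python
def minimum_match(doe_investment_usd: int, match_percent: int = 30) -> int:
    """Smallest whole-dollar match that is at least match_percent of the DOE investment.

    Hypothetical helper: rounding up to a whole dollar is an assumption,
    not something stated in the waiver policy.
    """
    if doe_investment_usd < 0 or match_percent < 0:
        raise ValueError("investment and percentage must be non-negative")
    # Integer ceiling division avoids floating-point rounding surprises.
    return -(-doe_investment_usd * match_percent // 100)


def qualifies(doe_investment_usd: int, contractor_match_usd: int,
              match_percent: int = 30) -> bool:
    """True if the contractor's match meets the assumed threshold."""
    return contractor_match_usd >= minimum_match(doe_investment_usd, match_percent)


if __name__ == "__main__":
    # Hypothetical project: DOE invests $10M, so the match must be at least $3M.
    print(minimum_match(10_000_000))
    print(qualifies(10_000_000, 2_999_999))
```

Under these assumptions, a match of \$2,999,999 against a \$10M DOE investment would fall just short of the threshold.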

United States Government

Department of Energy

**Memorandum**

**DATE:** November 24, 1997      **DRAFT**

**REPLY TO ATTN OF:** Gary Drew  
Oakland Operations Office  
Intellectual Property Law Division

**SUBJECT:** ASCI Pathforward Subcontract With IBM

**TO:** Gilbert G. Weigand  
Deputy Assistant Secretary for  
Strategic Computing and Simulation (DP-50)

**cc:** Gary Kent, DP-50  
Paul Gottlieb, GC-62

**Issue**  
Approval of modifications to the IBM subcontract and the applicability of these modifications to the remaining subcontracts.

**Background**  
To meet the requirements of DOE's Stockpile Stewardship and Management Program, DOE is enhancing its computational power by developing supercomputers with the capability of performing tera-scale computing. The ASCI Pathforward Project, which is one phase of the ASCI Program to develop several generations of computers, is aimed at developing the technology necessary to enable 100 TeraFLOP/s within six years. The Lawrence Livermore National Laboratory (LLNL) is presently negotiating with several U.S. companies to develop these technologies, which are part of a company's current business plan, but would not otherwise be available in the time frame needed or at the scale/performance level required by ASCI Program. In order to expedite the negotiations, a Class Advance Waiver (W(C)-97-004) was granted on November 7, 1997. Under this Waiver, the Government waives its domestic and foreign patent rights and software copyright to any subcontractor that qualified for this Class Advance Waiver.

# DOE ECI, Launched ~2009: Paths to *Influence* COTS Development

- The DOE Exascale Computing Initiative (ECI) provided a focal point for alignment and collaboration between DOE-SC/ASCR and NNSA/ASC
- ECI enabled ASCR and ASC to jointly fund industry R&D projects
  - FastForward – Node Architecture
    - Patterned after Original ASCI PathForward Program
    - National Laboratory Staff are assigned to collaborate with Industry Partners via Co-Design activities, Proxy Applications, Proxy Architectures, system software, etc.
    - Subcontractor matching investment requirement raised to 40%
  - DesignForward – System Architecture
    - Interconnect Networks:
    - System design and integration:
    - System-level performance: Energy Utilization, Resilience and Reliability

# DOE ECI \*Forward Architecture R&D Programs

- Objective: Accelerate and influence the transition of innovative ideas from architecture research into *first of a kind* products
- Processor and Memory FastForward: two rounds of projects (2011-2015), DOE budget \$163M
  - AMD/Micron
  - Intel
  - Cray/ARM/Broadcom
  - NVIDIA
  - IBM/Micron
- System Architecture DesignForward: two rounds of projects (2012-2016), DOE budget \$35M
  - AMD
  - Intel
  - Cray
  - NVIDIA
  - IBM
- Evaluate advanced research concepts and develop quantitative evidence of their benefit for DOE applications (using proxy apps)
  - Engage DOE application teams to understand technology trends and constraints, and how these affect their code development
  - Understand how to *program* these new features
  - Gather quantitative evidence to lower the risk of product teams adopting innovative ideas

# ECP Hardware & Integration: PathForward

- PathForward contracts awarded to:
  - AMD
  - HPE
  - Intel
  - Cray/ARM/Cavium
  - IBM
  - NVIDIA
- PathForward R&D objectives:
  - Innovative memory architectures
  - Higher-speed interconnects
  - Improved reliability systems
  - Increased application performance without prohibitive increases in energy demand
- Contracted: 2QFY17 - 4QFY20
  - Total DOE Budget: \$258M
  - ECP/contractor monthly meetings, deep dives, and hackathons
  - Semi-Annual Reviews of all projects
    - Many projects are high risk/high reward
    - In response to DOE's new schedule, contractors are pushing harder and have accelerated their work to intersect the 2021 exascale platforms
    - Engagement through proxy or full applications was essential to optimize hardware for our needs

# DOE-SC Basic Research Needs for Microelectronics: Oct 23-25, 2018

## DOE's Long-term R&D Strategy for *Beyond Moore's Law*

### Priority Research Directions

- Flip the current paradigm: Define innovative material, device, and architecture requirements driven by applications, algorithms, and software
- Revolutionize memory and data storage
- Reimagine information flow unconstrained by interconnects
- Redefine computing by leveraging unexploited physical phenomena
- Reinvent the electricity grid through new materials, devices, and architectures

<https://science.energy.gov/ascr/community-resources/program-documents/>



# DOE-SC ASCR, BES and HEP + NNSA/ASC Sponsor the SRC Decadal Plan Workshops



# SRC Decadal Plan Outcomes



## Decadal Plan for Semiconductors - 5 Seismic Shifts

<https://www.src.org/about/decadal-plan/>



- Fundamental **breakthroughs in analog hardware** are required to generate smarter world-machine interfaces that can sense, perceive and reason.
- The growth of memory demands will outstrip global silicon supply presenting opportunities for **radically new memory and storage** solutions.
- Always available communication requires new research directions that address the **imbalance of communication capacity vs. data generation rates**.
- Breakthroughs in hardware research are needed to address **emerging security challenges** in highly interconnected systems and AI.
- Ever rising energy demands for computing vs. global energy production is creating new risk, and new computing paradigms offer opportunities with **dramatically improved energy efficiency**.

Full Report Will Serve As A Guide Towards 2030 and Beyond

# Closing Thoughts

- Interfaces and Interactions
  - Technical – Co-design
  - Institutional
    - Public-Private Partnerships
    - Inter-agency Alignments
- The End of Moore's Law is an exciting time
  - Break Down Conventional Wisdoms
  - Fresh Look at Innovative Solutions



Thank you

