

# The ARM Advantage

## Printable Presenter Tutorial (Detailed)

Engineering 73

December 12, 2025

## Contents

# 1 How to use this tutorial

This document is designed so you can (1) truly understand the material, (2) rehearse efficiently, and (3) handle Q&A confidently.

Use three layers:

- **Slides (what the audience sees):** minimal words and clean visuals.
- **Script (what you say):** the short paragraphs under each slide below.
- **Deeper notes (if asked):** extra detail for Q&A or if you need to explain something differently.

# 2 Presentation setup (HTML deck)

Your slide file: `arm_presentation.html`

## 2.1 How to present it

1. Open `arm_presentation.html` in Chrome or Safari.
2. Use **Arrow keys** (or Space) to move forward/back.
3. Press **F** for fullscreen (Reveal.js shortcut). Press **Esc** to exit.
4. Press **S** to open speaker view (notes + preview). If it doesn't open, your browser may block popups.
5. Have a stable internet connection: the deck loads Reveal.js + MathJax via CDN.

## 2.2 Day-of checklist

- **Open the deck once before class** to confirm equations render.
- **Check font size** from the back of the room (projector TVs can be harsh).
- **Know your 1-sentence story** (see Section ??).
- **Practice transitions:** the presentation should feel like one connected story, not separate facts.

## 2.3 Timing plan (typical 8–10 minutes)

A good pacing target is about 30–60 seconds per slide.

| Slide                     | Suggested time       |
|---------------------------|----------------------|
| 1 (Title)                 | 0:15–0:25            |
| 2 (MOSFET)                | 0:40–0:60            |
| 3 (Current sets speed)    | 0:45–1:05            |
| 4 (CMOS logic)            | 0:35–0:55            |
| 5 (CISC vs RISC)          | 0:45–1:05            |
| 6 (RISC reduces overhead) | 0:45–1:05            |
| 7 (Dynamic power)         | 0:45–1:10            |
| 8 (Pipelining)            | 0:45–1:10            |
| 9 (Delay vs voltage)      | 0:30–0:50 (optional) |
| 10 (Area & cost)          | 0:30–0:50            |
| 11 (SoC)                  | 0:35–0:55            |
| 12 (Conclusion)           | 0:15–0:25            |

**If you must shorten:** keep Slides 2, 6, 7 as the core; optionally skip Slide 9.

## 3 The story you must memorize

### 3.1 One-sentence story

**Key idea:** ARM often wins on efficiency because simpler instruction formats reduce switching overhead (lower  $C_{load}$ ), enabling lower voltage; and dynamic power scales like  $V^2$ .

### 3.2 30-second story (memorize)

Computers are built from MOSFET switches. The big cost in modern chips is switching energy: charging and discharging capacitances. ARM's RISC style keeps instruction formats uniform, which makes decoding simpler, reduces overhead logic, and lowers the effective capacitance being switched. That reduction in switching plus lower operating voltage gives much better performance per watt, and that enables modern system-on-chip designs.

## 4 Deep Dive: How It All Connects (The Detailed Logic)

This section connects every concept in a single logical chain. If you get lost, come back here.

## Step 1: The Physics Constraint

**The Reality:** We build computers out of transistors (MOSFETs). A MOSFET is just a switch that charges a capacitor (the next gate).

- **Speed depends on Current ( $I$ ):** To switch faster, you need more current to fill the capacitor quickly.
- **Power depends on Voltage ( $V^2$ ):** Energy per switch is  $\frac{1}{2}CV^2$ .

**The Conflict:** To get speed, you want high current. But high current requires either big transistors (high Area/Capacitance) or high Voltage (high Power). You can't just "go faster" without paying a penalty.

## Step 2: The Architectural Choice (RISC)

**The Problem:** x86 (CISC) instructions are complex and variable-length. The hardware has to spend a lot of energy just figuring out *what* the instruction is before it can do any work. This adds extra transistors, extra wiring, and extra switching.

- More switching = Higher  $C_{load}$  (capacitive load).

**The ARM Solution:** Use simple, fixed-length instructions (RISC). The decoder is tiny. The control logic is simple.

- Less overhead = Lower  $C_{load}$ .

## Step 3: The Efficiency Multiplier

**The Magic of  $V^2$ :** Because ARM chips have lower  $C_{load}$  (less overhead), they generate less heat. This allows designers to lower the Voltage ( $V_{DD}$ ) without the chip becoming unstable or too slow for mobile tasks.

- The Master Equation:  $P_{dynamic} \approx C_{load} \cdot V_{DD}^2 \cdot f$

This is a double-win:

1. ARM lowers  $C_{load}$  by design (Architecture).
2. That efficiency allows them to lower  $V_{DD}$  (Physics).
3. Since  $V$  is squared, a small drop in  $V$  creates a *huge* drop in Power.

## Step 4: The System Result

**The Payoff:** Because the CPU core is efficient (low power) and physically small (low area), you have “budget” left over on the silicon die.

- **Integration:** You can fit the CPU + GPU + NPU + Memory Controller all on one small chip (SoC).
- **Synergy:** putting them close together shortens the wires between them. Short wires = less capacitance = even lower power.

**Conclusion:** ARM isn’t just about writing code differently. It’s a strategy to minimize the number of electrons you have to move to get a job done.

## 5 Glossary + how to say symbols out loud

Use this as a quick reference so you don't freeze on notation.

- **MOSFET**: Metal-Oxide-Semiconductor Field-Effect Transistor.
- **Gate / Source / Drain**: terminals of the MOSFET.
- **Channel**: conductive path created under the gate.
- $V_{GS}$ : “**V G S**” (gate-to-source voltage).
- $V_{TH}$ : “**V T H**” (threshold voltage).
- $I_D$ : “**I D**” (drain current).
- $W/L$ : “**W over L**” (width over length, geometry strength).
- **CISC**: Complex Instruction Set Computer.
- **RISC**: Reduced Instruction Set Computer.
- **Decoder**: hardware that interprets instructions into control signals.
- $C_{load}$ : “**C load**” (effective switching capacitance).
- $V_{DD}$ : “**V D D**” (supply voltage).
- $f$ : **frequency** (clock frequency).
- **CPI**: cycles per instruction.
- **SoC**: System-on-Chip.

## 6 Slide-by-slide: what each slide means + what to say

This section is the core rehearsal material.

### Slide 1 — Title: The ARM Advantage

**Goal:** Set the thesis and the direction: physics → architecture → efficiency → SoC.

**Point at:** The 4-step flow (MOSFET → CMOS → RISC → SoC) and the thesis box.

**Say (script):**

Today I'm explaining ARM from the bottom up: how the physics of transistors sets constraints, and how ARM's RISC design is one way to organize silicon to win on performance per watt. The main message is: simpler instruction structure reduces overhead switching, which reduces capacitance, which lets you use lower voltage — and power scales like voltage squared.

**Transition line:** Let's start at the smallest unit: the transistor switch.

## Slide 2 — The MOSFET

**Goal:** Make the audience comfortable with the MOSFET as a controllable switch.

**Point at:** OFF vs ON visual: gate label and channel line.

**Say (script):**

Every CPU starts with one physical device: the MOSFET. It's basically a voltage-controlled switch. When the gate voltage is low, there's no channel, so current can't flow. When the gate voltage is high enough, it creates an electric field that forms a channel, and current flows from source to drain.

**If asked (deeper):**

A helpful analogy is a water valve: the gate voltage opens or closes the valve, controlling flow. In reality it's an electric field shaping the energy barrier in silicon, but for this talk, "*voltage opens a path for current*" is the right mental model.

**Transition line:** Now: what determines how *fast* that switch can operate? Current.

## Slide 3 — Current sets speed

**Goal:** Connect transistor physics to speed: more current charges the next gate faster.

**Point at:** The current proportionality and the two knobs (Geometry / Overdrive).

**Say (script):**

Switching speed comes from how quickly you can charge the next gate and wires, and that depends on the drive current. The key relationship is that current increases with geometry  $W/L$  and with the square of the overdrive ( $V_{GS} - V_{TH}$ ). In simple terms: you can go faster by making transistors bigger or pushing more voltage — but both increase cost and power.

**If asked (deeper):**

Why does more current mean faster? Because charging a capacitance roughly follows  $I = C dV/dt$ . Bigger  $I$  means faster  $dV/dt$  (faster voltage change), which means faster switching.

**Transition line:** One transistor isn't a computer. We need logic gates.

## Slide 4 — CMOS logic

**Goal:** Show how we build reliable logic from transistors (and why NAND matters).

**Point at:** The NAND diagram: PMOS network on top, NMOS on bottom, output node.

**Say (script):**

We build logic using CMOS: PMOS pulls the output up to 1, NMOS pulls it down to 0. A NAND gate uses four transistors and is universal: you can build any digital circuit from NAND gates. So when we talk about CPU architecture, it's really a massive organization of these gate-level switching events.

**If asked (deeper):**

Why CMOS is efficient: in steady state, ideally one network is off, so DC current is low.

The dominant energy cost is switching (charging/discharging capacitances), not constant conduction.

**Transition line:** Now that we have billions of gates, the big question is: how do we organize them?

## Slide 5 — CISC vs RISC

**Goal:** Introduce the architecture fork and tie it to physical consequences.

**Point at:** The two columns and the decode-overhead visual.

**Say (script):**

There are two philosophies. CISC, like x86, historically has more irregular, variable-length instructions, which pushes complexity into hardware to interpret instructions. RISC, like ARM, uses more uniform instruction formats, which simplifies decoding and control. Physically, simpler control logic often means fewer transistors switching and less wiring overhead.

**If asked (deeper):**

Important nuance: modern x86 CPUs often decode variable-length instructions into simpler internal micro-operations, so internally they can look quite RISC-like. The key point is that variable-length decode still costs silicon and power, and ARM's ISA and ecosystem encourage simpler front-ends.

**Transition line:** So what does *simpler decode* buy you in the equations? It reduces  $C_{load}$ .

## Slide 6 — RISC reduces overhead

**Goal:** Define  $C_{load}$  and show why decode/control complexity matters physically.

**Point at:** The decoder blocks and the  $C_{load}$  up/down indicators.

**Say (script):**

$C_{load}$  is the effective capacitance that the circuit charges and discharges each switching event. If you have more overhead circuitry toggling and more wiring,  $C_{load}$  is larger. ARM's uniform instruction format typically makes decoding simpler, so the decode/control overhead is smaller. That means less switching capacitance and lower energy per clock.

**If asked (deeper):**

Where does capacitance come from? Gate capacitance (every transistor input), diffusion capacitance, and especially wire capacitance. As designs scale, wires become a huge part of  $C$ .

**Transition line:** Now we can write the master equation that links all this to battery life: dynamic power.

## Slide 7 — Dynamic power

**Goal:** Teach the most important power equation and why voltage is the biggest lever.

**Point at:**  $P_{dynamic} \approx C_{load}V_{DD}^2f$  and the  $V^2$  emphasis.

**Say (script):**

Dynamic power is approximately  $P_{dynamic} \approx C_{load}V_{DD}^2f$ . This is why voltage matters so much: it's squared. ARM benefits by reducing  $C_{load}$  through simpler overhead, and by targeting designs that run well at lower voltage. Even a modest voltage drop can massively reduce power.

**If asked (deeper):**

This is only *dynamic* power. There's also leakage (static power), which matters a lot in modern nodes. But dynamic power is still the cleanest way to explain why lowering switching capacitance and voltage helps.

**Transition line:** If we lower voltage, how do we keep performance? Structure: pipelining.

## Slide 8 — Pipelining

**Goal:** Explain how simple, uniform instructions help keep CPI low and throughput high.

**Point at:** The execution-time equation and pipeline stages.

**Say (script):**

Execution time depends on CPI, clock cycle time, and instruction count. Pipelining is an assembly line: while one instruction executes, another decodes, another fetches. Uniform instructions make the pipeline smoother and help keep CPI near 1. This is how ARM can get high throughput efficiently without relying only on high voltage and high frequency.

**If asked (deeper):**

CPI is not always 1. Branches, cache misses, and dependencies add stalls. But simple, regular pipelines are easier to keep busy.

**Transition line:** There's always a trade-off: lowering voltage also slows switching.

## Slide 9 — Delay vs voltage (optional)

**Goal:** Communicate the basic trade-off: lower voltage saves power but increases delay.

**Point at:** The trade-off chart and the sweet-spot point.

**Say (script):**

Lowering voltage reduces dynamic power dramatically, but it also reduces drive strength, so gates switch slower and the maximum frequency drops. Efficient designs aim for a sweet spot where performance per watt is best.

**If asked (deeper):**

If someone asks for the “real equation,” you can say: device delay depends on capacitance, voltage, and how far above threshold you are. As  $V$  approaches threshold, delays rise quickly.

**Transition line:** Efficiency has a silicon consequence: smaller, cheaper cores and more

integration.

## Slide 10 — Area & cost

**Goal:** Connect physical simplicity to cost and integration.

**Point at:** The integration block diagram.

**Say (script):**

Simpler cores tend to be smaller. Smaller area generally improves cost and leaves room to integrate more blocks on the same chip: multiple CPU cores, a GPU, an NPU, and I/O.

**If asked (deeper):**

Why does cost rise faster than area? A common mental model is: bigger dies mean fewer dies per wafer and usually lower yield. Cost per good die can rise superlinearly with area.

**Transition line:** And when you integrate multiple blocks, you get a System-on-Chip.

## Slide 11 — System-on-Chip

**Goal:** Show that ARM's advantage is also about integration and short wires.

**Point at:** Compute / Memory / I/O cards and the SoC block map.

**Say (script):**

A real product is a system: CPU, GPU, NPU, memory controllers, and physical I/O. Integration shortens wires, which reduces capacitance and energy per operation. ARM is not just a core; it's an ecosystem and design approach that fits well into SoCs.

**Transition line:** Let's wrap with the three takeaways.

## Slide 12 — Conclusion

**Goal:** Reinforce the thesis and end cleanly.

**Point at:** The three summary cards and final thought.

**Say (script):**

The physics takeaway is that switching and voltage dominate energy. The architecture takeaway is that RISC reduces overhead switching, lowering  $C_{load}$ . The result is lower power and easier integration — ARM isn't just code; it's a strategy for organizing silicon.

## 7 Extra background (for confidence)

This section is optional, but reading it once makes Q&A much easier.

## 7.1 Why $P \approx CV^2f$ is so powerful

Switching a capacitor from 0 to  $V$  stores energy  $E = \frac{1}{2}CV^2$ . Each cycle you often charge and discharge, so energy per toggle is proportional to  $CV^2$ . Multiply by toggles per second (roughly  $\alpha f$ , where  $\alpha$  is activity factor), and you get power:

$$P_{dyn} \approx \alpha CV^2f.$$

We dropped  $\alpha$  on slides to keep it simple, but it's there in real chip modeling.

## 7.2 Why decode/control can dominate

Control logic can switch a lot even when the instruction does little. Irregular instruction formats also increase wiring complexity. The point isn't that *all* x86 chips are inefficient; it's that the architecture historically pushed more work into front-end decode.

## 7.3 Common nuance: ISA vs microarchitecture

ARM is an ISA family; x86 is an ISA; real chips differ by microarchitecture and process node. Apple's M-series are ARM ISA but high-performance designs with large caches, wide pipelines, and advanced process technology.

# 8 Q&A cheat sheet (short answers you can memorize)

- **Q: Is ARM always faster or always lower power than x86?**

A: Not always. Efficiency depends on the design goal, microarchitecture, and process node. ARM's ISA and ecosystem make low-power designs easier, especially for mobile and SoCs.

- **Q: Doesn't modern x86 decode into micro-ops anyway?**

A: Yes, many x86 CPUs translate into simpler internal ops, but the translation/decoder still costs silicon and power.

- **Q: Why is voltage so important?**

A: Dynamic power scales like  $V^2$ . Lowering  $V$  is the strongest lever for power.

- **Q: What about leakage/static power?**

A: Leakage exists even when not switching and is significant in modern nodes. We focus on dynamic power because it cleanly links architecture ( $C$ ) and voltage ( $V$ ) to energy.

- **Q: What is  $C_{load}$  in simple words?**

A: It's the amount of "electrical stuff you have to charge" each time logic switches — transistors and wires.

- **Q: What's a SoC?**

A: CPU + GPU + NPU + memory controllers + I/O all on one chip.

## 9 One-page final cheat sheet

If you only memorize one page, memorize this.

**Five facts:**

1. MOSFET = voltage-controlled switch.
2. More current  $\Rightarrow$  faster switching (but costs area/voltage).
3. CMOS gates (like NAND) build all logic.
4. RISC reduces decode overhead  $\Rightarrow$  lower  $C_{load}$ .
5. Dynamic power:  $P \approx CV^2f$  (voltage is squared).

**Three transitions to remember:**

- MOSFET  $\rightarrow$  current sets speed.
- Gates  $\rightarrow$  architecture changes how much switches.
- Lower  $C$  and  $V$   $\rightarrow$  better perf/W  $\rightarrow$  SoC integration.