

## ⌚ x86 PROCESSOR BASICS (HOW THE CPU ACTUALLY RUNS THE SHOW)

Imagine the CPU as the **brain** of your computer.

But not a chill brain — a **cracked-out microsecond freak** that runs everything on caffeine and electricity. Here's how it works:

### 🏛️ The CPU – Central Processing Unit

This is where all the **thinking, math, and decision-making** happens.

It has:

- **Registers** – tiny super-fast storage slots (think 32-bit pockets for numbers)
- **Clock** – keeps time like a heartbeat so stuff happens in sync
- **Control Unit (CU)** – the **boss** that decides what happens next
- **ALU (Arithmetic Logic Unit)** – the **muscle** that does all the math and logic ops (ADD, SUB, AND, OR, NOT, etc.)



## 💡 How the CPU Connects to the World

The CPU talks to the rest of the PC through **pins** on its socket. These pins connect it to **buses** — long electric highways carrying signals.



## 💡 The 3 main buses:



## ■ Data Bus

Moves the *actual data* and *instructions* between the CPU, memory and I/O devices.

The data bus is bidirectional, meaning information can flow in both directions.

The "*width*" of the data bus (how many parallel wires it has) determines how much data can be transferred at once.

A *64-bit data bus* can move 64 bits of data simultaneously.



**Analogy:** The data bus is like a fleet of delivery trucks that transport goods (data) and mail (instructions) between the city hall (CPU), the library (memory), and various businesses (I/O devices). These trucks can deliver or pick up cargo.

## ■ Address Bus

Says *where* in memory we're looking.

The address bus is *unidirectional*, meaning information flows only from the CPU to other components.

It carries the **memory addresses or I/O port addresses** where data is to be read from or written to.

When the CPU wants to access a specific piece of data or instruction, it places its memory address on the address bus, telling the memory unit *exactly where to find* or store that information.

The *width of the address bus* determines the maximum amount of memory the CPU can access.

A *32-bit address* bus can address  $2^{32}$  unique memory locations (4 Gigabytes).



Imagine your *computer's RAM as a massive library*, and each book in that library has a unique shelf and position. When the CPU wants to read a specific piece of information (a "book"), it doesn't just shout out the book's title.

Instead, it sends out the exact "*shelf number*" and "*position*" through the address bus. This "shelf number and position" is what we call a **memory address**.

This *one-way communication* ensures that the CPU can accurately request data from, or send data to, a specific spot in memory.

## ■ Control Bus

Uses binary signals (on/off) to tell devices **when** to send or receive. It synchronizes the actions and manages the flow of information among all devices attached to the system bus.

**Think:** "Hey RAM — CPU wants to read now!"



It carries control signals that dictate operations like "memory read," "memory write," "I/O read," "I/O write," "interrupt request," and "bus grant."

These signals ensure that devices don't try to use the buses simultaneously or perform conflicting operations.

The control bus is like the city's traffic light system.

## Other Buses

### ● I/O Bus

Handles data moving between CPU and input/output devices (keyboard, mouse, etc.)

Also called the Peripheral bus, considered part of the system bus, but, yeah, it's a bit different coz its *dedicated* to transferring data between the CPU and the system I/O devices.

Modern systems often use high-speed serial buses like *PCI Express (PCIe)* for this purpose.



This bus is all about getting data to and from your **input/output devices**. Imagine:

- **Keyboard Input:** When you type "hello," that information needs to travel from your keyboard into the computer. The I/O bus is the route that data takes, like supplies being delivered to a restaurant. 
- **Printer Output:** When you hit "print," the document data needs to go from your computer out to the printer. The I/O bus handles this, much like official documents being sent out to residents. 

The I/O bus is considered part of the overall system bus, but it's "a bit different" because it's dedicated to these specific external communications. Modern systems use advanced, high-speed **I/O buses like PCI Express (PCIe)**.



## 🧠 Memory – Where Programs & Data Live

All your running programs and variables are stored in **RAM**. But here's the kicker: The CPU can't run them straight from **RAM** coz it is a temporary storage locker for the CPU.



It always does this:

1. Grabs the instruction from memory.
2. Brings it into the CPU which has small temporary storage locations, registers, the ones we covered in the previous chapter.
3. Executes it.
4. Maybe sends a result back to memory.

So, your code doesn't *run in RAM*, it runs **inside the CPU** — one piece at a time, or in chunks.

*We're going to see more about this Fetch, Decode, Execute cycle ahead.*

## 📦 Buses Summary (Quick Table):

| Bus Type    | What it Moves                     | Between                           | 🔗 |
|-------------|-----------------------------------|-----------------------------------|---|
| Data Bus    | Actual values, instructions       | CPU ⇌ Memory                      |   |
| Address Bus | Memory addresses                  | CPU → Memory (to say where to go) |   |
| Control Bus | Control signals (like READ/WRITE) | CPU → All hardware                |   |
| I/O Bus     | Device-level data                 | CPU ⇌ Keyboard, Mouse, etc.       |   |

## TLDR – Reverse Engineering Focus:

- Know the **ALU** is where bitwise ops live (AND, OR, SHL, etc.)
- Know that **registers** are the CPU's playground — what you see in disasm (like eax, edx, rsi, etc.)
- Remember: instructions **run inside** the CPU, not memory. Memory just holds them until they're needed.
- Buses = wires that move the ops around. If you're watching malware move code into memory and jump to it — that's this system in action.



A *register's purpose* often becomes clear from the instructions around it. Is it being used as a counter in a loop? An argument for a function? The return value? The context will clue you in.

*Learn by Doing:* The more assembly code you read and write (even small snippets!), the more you'll see how registers are actually used in real programs. This hands-on experience beats rote memorization any day.

## ⌚ CLOCK & CLOCK CYCLE (X86 CPU TIMING EXPLAINED)

### ⌚ The Unseen Rhythm: What's the Clock?

The **CPU clock** is like the relentless, precisely timed heartbeat of your processor — ticking at a fixed speed (e.g., 1 GHz = 1 billion ticks per second).



It is an *internal electronic signal* that oscillates at an incredibly fixed and high frequency.

This isn't just a simple timer; it's the *master synchronizer* that orchestrates every single operation within the processor and its interactions with the rest of the computer system.

This clock ticks at a specific, fixed speed, often measured in Gigahertz (GHz). For example, a **3 GHz CPU** means the clock "ticks" 3 billion times every second. This incredible speed allows for billions of individual operations to occur in a mere blink of an eye.

The *clock keeps the CPU, RAM, and buses perfectly in sync*. It ensures data moves smoothly and at the right time — no timing chaos, no crashes. Without it, everything would fall apart.

## What's a Clock Cycle?

One clock cycle = **one complete tick** = the smallest unit of time the CPU understands.

*One clock cycle is equivalent to one complete tick. It represents the smallest indivisible unit of time the CPU understands and utilizes to perform any action. Nothing, absolutely nothing, can happen for a duration shorter than one clock cycle.*

The duration of a single clock cycle is simply the inverse of the clock speed.

For a CPU running at 1 GHz (1,000,000,000 cycles per second), one clock cycle lasts:

$$\frac{1 \text{ second}}{1,000,000,000 \text{ blinks}} = 0.000000001 \text{ seconds}$$

This is an incredibly tiny slice of time, emphasizing the sheer speed at which modern processors operate.

$$\frac{1 \text{ second}}{1,000,000,000 \text{ cycles}}$$

$$= 0.000000001 \text{ seconds}$$

(which is 1 nanosecond)

## ⌚ Clock Cycle in Action

- Every CPU instruction takes **at least 1 clock cycle** to run.
- Thanks to **pipelining**, modern CPUs can crunch simple operations super fast — even finishing one per cycle.
- But older CPUs? Different story.

On something like the **Intel 8088**, a single MUL instruction could eat up **tens or even hundreds** of cycles. 🎉

## 🧠 Meet the 8088 – The OG PC Chip

- Dropped in **1981**, the **Intel 8088** powered the first IBM PCs. That moment? Kicked off the whole *personal computer era*.
- It was a **cost-cut** version of the 8086 — same 16-bit CPU inside, but with an **8-bit external data bus** instead of 16.



**Why?** So, IBM could use cheaper 8-bit parts and simpler motherboard designs. 💰

**Downside?** To move 16-bit data, the 8088 had to do **two 8-bit transfers**. Slower memory and I/O — but it was worth it for the cost savings at the time.

## ❖ Segmented Memory (Remember this?)

- The 8088, like the 8086, used **segment:offset** addressing to get around the 64KB memory limit.
- It combined:
  - ✓ A **16-bit segment register** (points to a 64KB block)
  - ✓ A **16-bit offset**
- Together = a **20-bit address** → Boom, access to **1MB of RAM**.

( $2^{16}$  segment shifted left by 4 bits + offset = 20-bit address) – *we discussed this before.*

## ✓ Compatibility Bonus

- The 8088 ran the *same instructions* as the 8086 — full instruction set compatibility.
- So devs didn't have to rewrite anything. If it ran on 8086, it ran on 8088.

That made adoption easy and fast — crucial for software devs.

Modern x86 CPUs are incredibly sophisticated. They employ techniques like *pipelining* and *out-of-order execution*.

### Pipelining:

Imagine an assembly line. Instead of one worker building an entire car from start to finish, different workers perform different stages simultaneously on different cars.

In a CPU, this means that while one instruction is in its "**execute**" phase, another might be in "**decode**", and a third in "**fetch**."



## Out-of-Order Execution:

The CPU skips stalled instructions and runs independent ones first, then reorders the results. It's like working on what's ready instead of waiting — keeps the clock cycles busy.



If the CPU has to wait for slow memory? That gap = **wait states** (empty cycles where CPU chills while memory catches up).

If someone's confused about how a CPU doesn't freeze when one instruction stalls, here's the reason:

*"The CPU looks at its queue like this. If one instruction's waiting on RAM, it just hops to the next one that's ready. Keeps that pipeline moving."*

## ↗ Wait States – When the CPU's Just... Waiting

### 🚧 The Problem:

The CPU's insanely fast — like sprinting ahead at gigahertz speeds.

But RAM? RAM's out here jogging. 🚶

So when the CPU asks RAM for some data, it's gotta wait... and **wait...**  
...because RAM's still looking for it in its dusty file cabinet.

### ● The Result:

While waiting, the CPU basically just **sits idle**, burning through clock cycles doing *nothing*.  
These wasted cycles?

👉 They're called **wait states** — and yeah, they suck. It's like revving a Ferrari just to sit in traffic.

### 💡 The Fix: Enter the Caches

Modern CPUs fight back using **caches** — tiny, super-fast memory layers:

- **L1** – Small but lightning fast, closest to the core
- **L2** – Bigger, a bit slower
- **L3** – Even bigger, shared across cores

These caches stash frequently used data so the CPU doesn't always have to bug slow RAM.  
If the data's in cache? Boom — **no wait states**.  
If not? Well... back to traffic.



## ⌚ TLDR:

- Wait states = CPU stalls for the RAM to catch up. Wasted Clock Cycles. Like a *140wpm giga-typist server admin waiting for some document from an intern who types at 20wpm*, in order to reboot the server.
- Happens when RAM's too slow to deliver data on time.
- **Cache** acts as the CPU's ultra-fast, on-site mini-warehouse. It's a small, incredibly quick memory buffer that stores frequently accessed data and instructions.

**Repeat:** *Cache is fast, tiny, and loaded with the stuff the CPU uses most so it doesn't have to keep calling slowpoke RAM.*

## 🧠 THE INSTRUCTION EXECUTION CYCLE: THE CPU'S ETERNAL GRIND

Modern CPUs may have deep pipelines, out-of-order logic, and all kinds of secret sauce — but at the core?

They still run the same ancient loop over and over:

**Fetch → Decode → Execute → Store**

Billions of times per second. Non-stop.

### ■ Step 1: Fetch – Go Get the Next Command

#### 📌 What Happens:

- The **Control Unit** checks the **Instruction Pointer** (IP / EIP / RIP) — this register holds the address of the next instruction to run.
- That address is slapped onto the **address bus**.
- CPU sends a **READ signal** on the **control bus**.
- RAM (like a good librarian) grabs the binary instruction from that address — say 0100101010101010 — and sends it back through the **data bus**.
- The instruction lands in the **Instruction Register (IR)** — ready to be decoded next.
- CPU bumps the Instruction Pointer forward, pointing it to the *next* instruction for the next cycle.

## ⚙️ TLDR:

Instruction Pointer → Address Bus → RAM → Data Bus → Instruction Register.

## 💡 Analogy Time:

Imagine a factory worker following a checklist.

- They look at the next task on their list (**Instruction Pointer**).
- Head over to the supply room (**RAM**) to grab the specific blueprint for the task (**instruction**).
- Bring it back to their station (**Instruction Register**) and get ready to work.
- Then? Flip the page to the next item on the checklist — ready for the next fetch.



## Step 2: Decode – “Alright, What Are We Even Doing?”

### What Happens:

Once the instruction lands in the **Instruction Register**, it's time to make sense of that raw binary. The **Control Unit** steps in and starts unpacking the meaning.

### The Decode Process:

#### 1. Opcode (Operation Code):

What's the instruction asking for?

- Is it an ADD?
- A MOV?
- A JMP?

Each opcode is a specific binary pattern that tells the CPU *what action* to take.

#### 2. Operands:

What data is being used?

- Registers?
- Memory addresses?
- Immediate values (hardcoded numbers)?

The CPU figures out *where* to pull data from and *where* to send it.

#### 3. Micro-operations (Microcode):

The CPU breaks down the instruction into **tiny internal steps** it can actually execute.

Like:

- “Send register A to ALU input 1”
- “Tell ALU to perform ADD”
- “Save result to register B”. These atomic actions are the **real moves** the hardware performs.

## Analogy:

Our factory worker just got the blueprint from RAM (Fetch).  
Now they're staring at it going:

**"Ahhh... I'm supposed to build Widget X using Part A and Part B, using the Assembly Tool 3000."**

- Widget X = the **opcode**.
- Part A & Part B = the **operands**.
- Assembly Tool 3000 = the **internal micro-operations**.

They don't start building just yet — they're **understanding the job first**.



## ↳ Step 3: Execute – “Do the Work!”

### ★ What Happens:

This is where the CPU finally **performs the action** it decoded. No more planning — it's go time.

The **Control Unit** sends control signals to fire up the right parts of the CPU, depending on what the instruction actually wants:

### ⚙️ Three Main Possibilities:

#### 1. 🧠 Math or Logic?

The **ALU** (Arithmetic Logic Unit) takes center stage.

- Control Unit feeds the operands into the ALU.
- ALU performs the operation: ADD, SUB, AND, OR, etc.
- Result gets output, ready for storage.

The ALU is built from a whole **matrix of transistors** flipping ON and OFF — pushing those 1s and 0s through logic gates at light speed.

#### 2. 📦 Moving Data?

For instructions like MOV or LOAD, data just gets routed between registers, or between CPU and memory.

No math here — just straight-up transfer missions.

#### 3. 🚀 Control Flow (JMP, CALL)?

The CPU changes direction.

- The **Instruction Pointer (IP/EIP/RIP)** is updated to a new address.
- The program "jumps" somewhere else — maybe a function, loop, or branch.

### 💡 What's Actually Happening Physically?

Under the hood, this is all **transistors flipping** on and off.

Just millions of tiny electrical switches pulling off math, logic, and movement of data — all timed perfectly by the clock pulse.

It's not magic, it's **electrons sprinting in formation**.

## **Analogy Time:**

The factory worker now **gets to work**.

They've read the blueprint, grabbed the parts — now they:

- Use the **Assembly Tool 3000**
- Combine **Part A + Part B**
- Finish the build 

This is the actual hands-on moment — the CPU's equivalent of hammering, welding, or screwing parts together.



## 💡 Step 4: Store (Write-Back) – “Save the Result!”

### 艴 What Happens:

The CPU just finished running the instruction — now it needs to **put that result somewhere useful**.

- **If it's needed immediately?**  
→ Stored in a **register** for the next instruction to grab.
- **If it's meant for long-term use?**  
→ Sent over the **data bus** to a specific spot in **RAM** (picked out using the **address bus**).  
→ CPU also fires a **WRITE** signal through the **control bus** to tell memory, “Hey, store this here.”

### 💡 TLDR:

Output goes to a **register** or **memory**, depending on where it's needed next.

### 🏭 Analogy:

The factory worker just finished building **Widget X**.

- **If another worker needs it right away?**  
→ It goes into the "active bin" at their workstation (**register**).
- **If it's going to storage or shipping?**  
→ They **box it up and send it off** to the warehouse (**memory**).



## CPU Life:

This **4-step loop — Fetch, Decode, Execute, Store — never stops.**

From the moment you power on to the second you shut down.

Billions of instructions per second. No breaks. No excuses.

## Real Talk:

If the CPU clock is the beat, then the instruction cycle(F-D-E-S) is the dance. Every instruction is a dancer. Same steps every time — just with different moves.



## The Grand Choreography – From Boot to Shutdown

The **Fetch → Decode → Execute → Store** cycle isn't optional.

It's the **heartbeat** of your computer.

Every game you've played, every piece of malware you've reverse engineered, every compiler you've used —

all of them are just riding this loop. Billions of times per second.

## ♪ So, What's the Vibe?

If the CPU's clock is a **drumbeat**, the instruction cycle is the **choreo**.

- **Fetch:** Scope out the next move.
- **Decode:** Understand what the move means.
- **Execute:** Process the move.
- **Store:** Save it and prepare for the next beat.

Even the wildest zero-day malware is just flipping bits in this same rhythm.

Its logic that runs the digital universe.

## Почем Why This Matters for You:

As a reverse engineer, **this is your map**.

*When you're analyzing disassembly, you're watching this dance play out — step by step.*



*When there's lag? You're spotting missed steps (wait states, cache misses).*



*When code's obfuscated? You're untangling its footwork.*



The better you understand this cycle, the more **x-ray vision** you get into what software *really* does underneath all the GUI fluff.

This isn't just theory — it's the **grind behind every syscall, jump, XOR, and function call** you'll ever break down.



Once assembly becomes second nature, you'll find it really easy to work with big tools like:

## Binary Ninja:



x64dbg:

