

TLB miss and audio / video co-processor

# Qupls4 System Control Processor

Robert Finch

---

## Table of Contents

|                                                              |    |
|--------------------------------------------------------------|----|
| Qupls4_copro .....                                           | 3  |
| Overview .....                                               | 3  |
| Interrupts .....                                             | 3  |
| Operation .....                                              | 3  |
| Vectors .....                                                | 3  |
| Priority .....                                               | 3  |
| Copro Programming Model .....                                | 4  |
| Internal Stack .....                                         | 4  |
| Copro Control Registers .....                                | 4  |
| Copro Control Register .....                                 | 4  |
| Copro Instruction Set Description .....                      | 5  |
| WAITcc .....                                                 | 5  |
| LOAD .....                                                   | 6  |
| STORE .....                                                  | 7  |
| STOREI .....                                                 | 7  |
| Jcc .....                                                    | 8  |
| JEQ / JNE / JLT / JLE / JGE / JGT / JLEP / JGEP / DJNE ..... | 8  |
| JUMP .....                                                   | 9  |
| JSR .....                                                    | 9  |
| RET .....                                                    | 10 |
| CALC_INDEX .....                                             | 10 |
| CALC_ADR .....                                               | 11 |
| BUILD_ENTRYNO .....                                          | 12 |
| BUILD_VPN .....                                              | 13 |
| ADD .....                                                    | 14 |
| ADD64 .....                                                  | 14 |
| AND .....                                                    | 15 |
| AND64 .....                                                  | 15 |
| OR .....                                                     | 16 |
| XOR .....                                                    | 17 |
| SHL .....                                                    | 18 |
| SHR .....                                                    | 18 |



# Qupl4\_copro

## Overview

The Qupl4 copro is co-processor that handles MMU and display controller tasks. It is capable of updating the display controller's register set at specific points during the display generation. The copro has a small instruction set. Copro instructions are 32-bit words. All data values are 64-bit.

If enabled, during every vertical reset of the display the copro's instruction pointer is set to point at location 10h and the copro begin executing instructions.

For TLB misses the copro begins executing instructions at location 008h.

## Interrupts

The copro supports two sources of interrupts, the video frame interrupt and a TLB miss interrupt. Copro interrupts are very high speed. It takes three clock cycles to re-awaken from low power mode. Registers are swapped in one of those three cycles.

A TLB miss interrupt will automatically disable paging. Paging will be enabled on return from the interrupt subroutine.

## Operation

Normal operation is to execute a WAIT instruction which will put the copro in a lower power mode, then process the interrupt when it occurs.

When the interrupt occurs the instruction pointer for the instruction is stored on an internal stack. Then registers r1 to r8 are also placed on the internal stack. On interrupt return the registers are restored and the instruction pointer is loaded from the internal stack.

## Vectors

The copro vectors to the following locations on reset or interrupt.

| Vector        | Value  |
|---------------|--------|
| Reset         | \$0000 |
| TLB Miss      | \$0008 |
| Vertical Sync | \$0100 |

## Priority

TLB miss interrupts have a higher priority than display interrupts. TLB interrupts are allowed to interrupt the display interrupt processing routine.

## Copro Programming Model

There are fifteen 64-bit general purpose registers r1 to r15. R0 is always zero.

### Internal Stack

There is a 16-entry stack which can hold the IP plus registers r1 to r8. The stack is not directly accessible.

When an interrupt occurs or a subroutine call is done (JSR) the instruction pointer and the first eight registers are all placed on the internal stack in a single clock cycle. A two-bit flag is also saved on the stack indicating the call type (subroutine TLB miss, or frame interrupt).

On a return (RET) instruction the stack is popped and the IP and first eight GPRs may be restored in a single clock cycle. Which registers get restored depends on a register list mask specified by the RET instruction. For interrupts the mask should be set to FFh to restore all eight registers.

## Copro Control Registers

Copro address registers are used to store addresses for copro programs that include the restart address (address register zero), and subroutine linkage addresses.

| 63    |             | 0                       |                      |
|-------|-------------|-------------------------|----------------------|
| \$CC0 | $\sim_{32}$ | Offset <sub>31..0</sub> |                      |
| ....  |             |                         | 14 more registers    |
| \$CF8 | $\sim_{32}$ | Offset <sub>31..0</sub> |                      |
| \$CB0 |             |                         | F E Control Register |

### Copro Control Register

This register has bits to enable the copro. It also has a bit indicating the restart rate for the copro. The copro program may be automatically restarted at the beginning of every video frame (the default configuration) or it may restart every 16<sup>th</sup> frame.

## Copro Instruction Set Description

### WAITcc

**Opcode:** 1

**Description:**

The wait instruction waits for a signal combination or interrupt to occur. This could be the display generation scan to reaching a specific horizontal, vertical and frame position. The copro is not active while waiting and other devices may freely access the display register set.

The copro will also wait for a write cycle to occur at the specified local memory address. The value written will be loaded into the destination register Rd. The local memory address must be within the last 2k words of the address space. This address can thus be used as a signal by an external processor.

While waiting the copro is placed into a lower power mode. Three clock cycles are required to exit the mode.

**Instruction Format:**

| Bits  | Field   | Description             |
|-------|---------|-------------------------|
| 21:31 | Address | Signal address last 2k  |
| 17:20 | Cond    | condition               |
| 13:16 | Rs2     | Signal conditions mask  |
| 9:12  | Rs1     | Signal conditions value |
| 5:8   | Rd      | Command value           |
| 4:0   | 1       | WAIT opcode             |

Cond – condition and operation to perform

|                   |     |                                |
|-------------------|-----|--------------------------------|
| Cond <sub>4</sub> | cc  | Jump operation                 |
| 0 to 7            |     | Reserved                       |
| 9                 | GEP | Scan position after {Rs1, Rs2} |
| 10                |     | Unconditional WAIT             |
| 9 to 14           |     | reserved                       |

**Signal Conditions:**

Rs1:

| Bits  | Field | Description  |
|-------|-------|--------------|
| 63:60 | ~     | reserved     |
| 48    | B     | blitter wait |

|       |   |                         |
|-------|---|-------------------------|
| 37:32 | F | frame. to wait for      |
| 27:16 | V | vert. pos. to wait for  |
| 11:0  | H | horiz. pos. to wait for |

Rs2

| Bits  | Field | Description             |
|-------|-------|-------------------------|
| 11:0  | H     | horiz. pos. to wait for |
| 37:32 | MF    | mask: frame             |
| 27:16 | MV    | mask: vertical pos      |
| 11:0  | MH    | mask: horizontal pos    |

B – indicates that the copro should wait for an outstanding blit operation to complete before continuing.

F – the frame number that the copro should wait for. This may be masked off by the frame mask (MF) field.

V – the vertical position that the copro should wait for. This may be masked by the vertical position mask field (MV)

H – the horizontal position that the copro should wait for. this may be masked by the horizontal position mask field (MH).

## LOAD

**Opcode:** 16

**Description:**

The LOAD instruction moves a value from a register or memory to Rd. The load address is specified as the sum of Rs1 and a sign extended displacement constant.

**Instruction Format:**

| Bits  | Field | Description                     |
|-------|-------|---------------------------------|
| 17:31 | Disp  | Displacement                    |
| 13:16 | ~     | reserved                        |
| 9:12  | Rs1   | Base Address                    |
| 5:8   | Rd    | Destination register (not used) |
| 4:0   | 16    | STORE opcode                    |

# STORE

**Opcode:** 17

**Description:**

The STORE instruction moves a value from Rs2 into one of the display controller or MMU registers specified as the sum of Rs1 and a sign extended displacement constant. This allows the copro to do things like initiate a blitter operation or trigger an interrupt.

**Instruction Format:**

**Instruction Format:**

| Bits  | Field | Description                     |
|-------|-------|---------------------------------|
| 17:31 | Disp  | Displacement                    |
| 13:16 | Rs2   | Value to store                  |
| 9:12  | Rs1   | Base Address                    |
| 5:8   | Rd    | Destination register (not used) |
| 4:0   | 17    | STORE opcode                    |

# STOREI

**Opcode:** 18

**Description:**

The STORE instruction moves an immediate value into one of the display controller or MMU registers specified as the sum of Rs1 and a sign extended displacement constant. This allows the copro to do things like initiate a blitter operation or trigger an interrupt.

If the immediate is larger than 14 the immediate constant follows the instruction.

**Instruction Format:**

| Bits  | Field | Description       |
|-------|-------|-------------------|
| 17:31 | Disp  | Displacement      |
| 13:16 | Rs2   | Value to store    |
| 9:12  | Rs1   | Base Address      |
| 5:8   | Rd    | Immediate 0 to 14 |
| 4:0   | 18    | STORE opcode      |

# JCC

## JEQ / JNE / JLT / JLE / JGE / JGT / JLEP / JGEP / DJNE

**Opcode:** 4

**Description:**

This instruction conditionally jumps to a destination address if the relationship between Rs1 and Rs2 is true.

The DJNE instruction decrements the Rs1 register then jumps if the decremented value is not equal to Rs2.

**Instruction Format:**

| Bits  | Field   | Description           |
|-------|---------|-----------------------|
| 17:31 | Address |                       |
| 13:16 | Rs2     | Test register (mask)  |
| 9:12  | Rs1     | Test register (value) |
| 5:8   | Cond    | Branch condition      |
| 4:0   | 4       | Jcc opcode            |

Cond – condition and operation to perform

| Cond <sub>4</sub> | cc   | Jump operation                  |
|-------------------|------|---------------------------------|
| 0                 | EQ   | Rs1 = Rs2                       |
| 1                 | NE   | Rs1 <> Rs2                      |
| 2                 | LT   | Rs1 < Rs2                       |
| 3                 | LE   | Rs1 <= Rs2                      |
| 4                 | GE   | Rs1 >= Rs2                      |
| 5                 | GT   | Rs1 > Rs2                       |
| 6                 | DJNE | Rs1 = Rs1 -1, Rs1 != Rs2        |
| 8                 | LEP  | Scan position before {Rs1, Rs2} |
| 9                 | GEP  | Scan position after {Rs1, Rs2}  |
| 10                | GQE  | Graphics que empty              |
| 11                | GQNE | Graphics que not empty          |
| 12 to 15          |      | reserved                        |

# JUMP

**Opcode:** 9

**Description:**

This instruction unconditionally jumps to a destination address. The destination address is the sum of Rs1 and a sign extended constant value.

**Instruction Format:**

| Bits  | Field   | Description  |
|-------|---------|--------------|
| 17:31 | address |              |
| 13:16 | ~       | Reserved     |
| 9:12  | Rs1     | Base address |
| 5:8   | 0       | Op           |
| 4:0   | 9       | JMP opcode   |

# JSR

**Opcode:** 9

**Description:**

This instruction unconditionally jumps to a subroutine by pushing the return address and registers r1 to r8 on an internal stack then changing program flow to the destination address. The destination address is the sum of Rs1 and a sign extended constant value.

Registers may be optionally popped from the stack by the RET instruction.

**Instruction Format:**

| Bits  | Field   | Description  |
|-------|---------|--------------|
| 17:31 | address |              |
| 13:16 | ~       | Reserved     |
| 9:12  | Rs1     | Base address |
| 5:8   | 1       | Op           |
| 4:0   | 9       | JMP opcode   |

## RET

**Opcode:** 9,2

**Description:**

This instruction unconditionally returns from a subroutine or interrupt by popping the return address. Registers r1 to r8 may be popped off an internal stack according to a register list mask. Bit 0 of the mask represents r1, bit 1 represents r2, and so on.

**Instruction Format:**

| Bits  | Field   | Description                |
|-------|---------|----------------------------|
| 17:31 | Reglist | Which registers to restore |
| 13:16 | ~       | Reserved                   |
| 9:12  | ~       | Reserved                   |
| 5:8   | 2       | Op                         |
| 4:0   | 9       | JMP opcode                 |

## CALC\_INDEX

**Opcode:** 12

**Description:**

This instruction computes the page table index or the PTE for the current miss address using the page size from the page table attributes register and the level supplied by register Rs1 and places the value in register Rd. Note the calculated value still needs to be masked according to the number of entries in the page table. This is done by CALC\_ADR.

This instruction replaces about four operations with an operation executed in a single cycle.

**Calculation:**

Index = miss\_address >> ((log2(page size) - 3) \* level + log2(page size))

**Instruction Format:**

| Bits  | Field | Description                  |
|-------|-------|------------------------------|
| 17:31 | ~     | reserved                     |
| 13:16 | ~     | Reserved                     |
| 9:12  | Rs1   | Table level – three LSB bits |
| 5:8   | Rd    | Index value                  |
| 4:0   | 12    | opcode                       |

# CALC\_ADR

**Opcode:** 13

## Description:

This instruction computes the address of the PTE for the current miss address using the page size from the page table attributes register, the index into the page table supplied by CALC\_INDEX in register Rs2 and the base address of the table supplied by register Rs1 and places the value in register Rd.

This instruction replaces about four operations with an operation executed in a single cycle.

## Calculation:

$$\text{Mask} = (1 \ll (\log_2(\text{page size})) - 1$$

$$\text{Address} = \text{Rs1} | (\text{Rs2} \& \text{mask})$$

## Instruction Format:

| Bits  | Field | Description                   |
|-------|-------|-------------------------------|
| 17:31 | ~     | reserved                      |
| 13:16 | Rs2   | Table index (from CALC_INDEX) |
| 9:12  | Rs1   | Page table address            |
| 5:8   | Rd    | Address of PTE                |
| 4:0   | 13    | opcode                        |

# BUILD\_ENTRYNO

**Opcode:** 14

**Description:**

This instruction builds the entry\_no argument required to access the TLB. It takes the PTE index in Rs1, the way in Rs2 and a constant used to determine whether to read or write the PTE.

This instruction replaces about three operations with an operation executed in a single cycle.

**Calculation:**

$$\text{Entry\_no} = \text{Rs1}[15:0] | (\text{Rs2}[7:0] \ll 16) | (1 \ll \text{const})$$

**Instruction Format:**

| Bits  | Field | Description                   |
|-------|-------|-------------------------------|
| 17:31 | Const | Only the lower five bits used |
| 13:16 | Rs2   | way                           |
| 9:12  | Rs1   | PTE index                     |
| 5:8   | Rd    | Entry_no                      |
| 4:0   | 14    | opcode                        |

# BUILD\_VPN

**Opcode:** 15

## Description:

This instruction builds the VPN portion of a TLB entry required to update the TLB. It takes the current miss address, asid, and current count and builds it into a single value.

This instruction replaces about four operations with an operation executed in a single cycle.

## Calculation:

$$\text{VPN} = (\text{miss address}) \gg (\log_2(\text{page size}) + \log_2(\text{TLB entries})) \mid (\text{asid} \ll 48) \mid (\text{count} \ll 42)$$

## Instruction Format:

| Bits  | Field | Description      |
|-------|-------|------------------|
| 17:31 | ~     | Reserved         |
| 13:16 | ~     | Reserved         |
| 9:12  | ~     | Reserved         |
| 5:8   | Rd    | VPN, ASID, COUNT |
| 4:0   | 15    | Opcode           |

# ADD

**Opcode:** 22

**Description:**

This instruction adds Rs1, Rs2 and a sign extended constant. This instruction may also be used to load a constant into a register.

**Calculation:**

$$Rd = Rs1 + Rs2 + Imm$$

**Instruction Format:**

| Bits  | Field | Description             |
|-------|-------|-------------------------|
| 17:31 | Imm   | Constant                |
| 13:16 | Rs2   | 2 <sup>nd</sup> operand |
| 9:12  | Rs1   | 1 <sup>st</sup> operand |
| 5:8   | Rd    | Result value            |
| 4:0   | 22    | Opcode                  |

# ADD64

**Opcode:** 5

**Description:**

This instruction adds Rs1, Rs2 and a 64-bit constant. The immediate constant follows the instruction. This instruction may also be used to load a 64-bit constant into a register. This instruction takes two or three clock cycles depending on the alignment of the constant.

**Calculation:**

$$Rd = Rs1 + Rs2 + Imm$$

**Instruction Format:**

| Bits  | Field | Description             |
|-------|-------|-------------------------|
| 17:31 | ~     | reserved                |
| 13:16 | Rs2   | 2 <sup>nd</sup> operand |
| 9:12  | Rs1   | 1 <sup>st</sup> operand |
| 5:8   | Rd    | Result value            |
| 4:0   | 5     | Opcode                  |

# AND

**Opcode:** 24

**Description:**

This instruction bitwise ‘ands’ Rs1, Rs2 and a sign extended constant.

**Calculation:**

$$Rd = Rs1 \& Rs2 \& Imm$$

**Instruction Format:**

| Bits  | Field | Description             |
|-------|-------|-------------------------|
| 17:31 | Imm   | constant                |
| 13:16 | Rs2   | 2 <sup>nd</sup> operand |
| 9:12  | Rs1   | 1 <sup>st</sup> operand |
| 5:8   | Rd    | Result value            |
| 4:0   | 24    | Opcode                  |

# AND64

**Opcode:** 23

**Description:**

This instruction bitwise ‘ands’ Rs1, Rs2 and a 64-bit constant. The immediate constant follows the instruction. This instruction takes two or three clock cycles depending on the alignment of the constant.

**Calculation:**

$$Rd = Rs1 \& Rs2 \& Imm$$

**Instruction Format:**

| Bits  | Field | Description             |
|-------|-------|-------------------------|
| 17:31 | ~     | reserved                |
| 13:16 | Rs2   | 2 <sup>nd</sup> operand |
| 9:12  | Rs1   | 1 <sup>st</sup> operand |
| 5:8   | Rd    | Result value            |
| 4:0   | 23    | Opcode                  |

# OR

**Opcode:** 25

**Description:**

This instruction bitwise ‘ors’ Rs1, Rs2 and a sign extended constant.

**Calculation:**

$$Rd = Rs1 \mid Rs2 \mid Imm$$

**Instruction Format:**

| Bits  | Field | Description             |
|-------|-------|-------------------------|
| 17:31 | Imm   | constant                |
| 13:16 | Rs2   | 2 <sup>nd</sup> operand |
| 9:12  | Rs1   | 1 <sup>st</sup> operand |
| 5:8   | Rd    | Result value            |
| 4:0   | 25    | Opcode                  |

# XOR

**Opcode:** 26

**Description:**

This instruction bitwise exclusive ‘ors’ Rs1, Rs2 and a sign extended constant.

**Calculation:**

$$Rd = Rs1 \wedge Rs2 \wedge Imm$$

**Instruction Format:**

| Bits  | Field | Description             |
|-------|-------|-------------------------|
| 17:31 | Imm   | constant                |
| 13:16 | Rs2   | 2 <sup>nd</sup> operand |
| 9:12  | Rs1   | 1 <sup>st</sup> operand |
| 5:8   | Rd    | Result value            |
| 4:0   | 26    | Opcode                  |

# SHL

**Opcode:** 20

**Description:**

This instruction performs a left shift operation on Rs1 using a count which is the bitwise or of the value in Rs2 and the immediate. Usually either the count or Rs2 would be zero. The shift count is limited to five bits allowing shifts of up to 31 bits.

**Calculation:**

$$Rd = Rs1 \ll (Rs2 | imm)$$

**Instruction Format:**

| Bits  | Field | Description      |
|-------|-------|------------------|
| 17:31 | Imm   | Five LSBs only   |
| 13:16 | Rs2   | Count            |
| 9:12  | Rs1   | Operand to shift |
| 5:8   | Rd    | Result value     |
| 4:0   | 20    | Opcode           |

# SHR

**Opcode:** 21

**Description:**

This instruction performs a right shift operation on Rs1 using a count which is the bitwise or of the value in Rs2 and the immediate. Usually either the count or Rs2 would be zero. The shift count is limited to five bits allowing shifts of up to 31 bits.

**Calculation:**

$$Rd = Rs1 \gg (Rs2 | imm)$$

**Instruction Format:**

| Bits  | Field | Description      |
|-------|-------|------------------|
| 17:31 | Imm   | Five LSBs only   |
| 13:16 | Rs2   | Count            |
| 9:12  | Rs1   | Operand to shift |
| 5:8   | Rd    | Result value     |
| 4:0   | 21    | Opcode           |