

# Generated Specification Document

## INTRODUCTION

-- Purpose and Scope --

Purpose: Provide a synthesizable USB 2.0 device (function) controller top-level that bridges a UTMI/UTMI+ PHY to a WISHBONE system bus and a shared external SSRAM payload buffer, handling packet/protocol processing, endpoint control, DMA handshakes, and interrupt/event reporting.

Scope: The usbf\_top integrates the complete device-side datapath and control including: UTMI interface adaptation (receive/transmit data strobes, line-state handling, suspend/resume, reset detection, HS/FS speed negotiation); packet decode/encode with CRC5/CRC16; protocol engine for token/data/handshake sequencing; endpoint register file and match logic for up to 16 endpoints (EP0–EP15, EP1–EP15 optionally enabled via macros); ping-pong buffer management (buf0/buf1) and DATA PID/buffer select tracking; per-endpoint DMA request/acknowledge generation and a memory arbiter sharing single-ported SSRAM between PHY-side DMA and WISHBONE accesses; WISHBONE-mapped control/status and memory windows with acknowledge generation; consolidated interrupt outputs (inta/intb) and error/status sources (PID check, CRC, NSE, timeouts); frame/microframe timing capture and vendor UTMI sideband control/status (VControl/VStatus). Address widths and buffer memory size are parameterized via macros.

Out of scope: USB host functionality, the physical transceiver (PHY), high-level USB class/descriptor firmware, and application-level data management. The core assumes system firmware configures endpoint CSRs and buffers, services DMA/interrupts, and provides external SSRAM and a UTMI-compliant PHY.

-- USB Compliance and Supported Modes --

USB 2.0 device core with an 8-bit UTMI+ Level 1 PHY interface. Supports Full-Speed (12 Mbps) and High-Speed (480 Mbps) operation; Low-Speed device mode is not supported. High-Speed entry via standard USB 2.0 chirp K/J negotiation (falls back to Full-Speed if HS negotiation fails). UTMI control signals XcvSelect, TermSel, SuspendM, and OpMode are driven per mode and power state; FS termination is enabled/disabled as required and transceiver select toggles HS/FS. Implements full protocol decode and integrity checks: PID complement checking, CRC5 (token) and CRC16 (data). Recognizes OUT/IN/SOF/SETUP and HS-only PING; SPLIT/ERR/PRE and other host/hub-specific PIDs are ignored for endpoint matching. Handshake generation includes ACK/NAK/STALL in FS and adds NYET in HS when no buffers are available. Data PID sequencing (DATA0/1/2/MDATA) is enforced by endpoint policy. SOF handling is implemented; in HS, microframes are tracked (mframe count and timing). Bus reset detection and recovery are implemented (SE0 timing) and speed re-negotiation occurs after reset. Suspend/resume complies with USB 2.0 timings: enters suspend after >3 ms idle; supports remote wake via resume\_req input (resume K signaling duration  $\geq 1$  ms, with appropriate settle delays); SuspendM/OpMode are managed for low-power. Attach/detach and VBUS presence are monitored; attach is debounced (~100 ms) and status is exposed. Supported transfer types: Control (EP0), Bulk, and Isochronous; interrupt type is not explicitly implemented as a distinct mode. Host mode and test modes are not implemented.

-- Feature Summary --

- USB 2.0 device core with UTMI interface: supports Full-Speed and High-Speed, automatic FS/HS speed negotiation (chirp), reset detection, suspend/resume; exposes mode\_hs, usb\_reset, usb\_suspend, usb\_attached.
- UTMI PHY glue: drives XcvSelect, TermSel, SuspendM, OpMode; handles resume signaling (drive K), line-state timing, and wake/suspend timing per USB spec.
- Packet engine: PID decode/generation, token/data handshakes (ACK/NACK/STALL/NYET), PING support, data PID sequencing, CRC5/CRC16 check and CRC16 generation, sequence and timeout error handling.
- Endpoint subsystem: up to 16 endpoints (EP0–EP15), compile-time selectable; each EP has CSR, two buffer descriptors (buf0/buf1) with address/size, auto PID toggle, buffer selection, DMA enable, direction/type (CTRL/ISO/BULK/INT), max packet size.
- Double-buffered data path with automatic descriptor updates, buffer rollover, out-to-small detection, and zero-length packet support.
- Integrated DMA engine (IDMA): streams between UTMI byte data and 32-bit memory, tracks sizes, assembles words, supports IN/OUT, exposes idma\_done and byte-count used.
- Shared SSRAM buffer memory: 32-bit wide, parameterizable size via SSRAM\_HADR; arbiter mediates between protocol DMA and WISHBONE host; dual-clock safe.
- WISHBONE slave interface: 32-bit data, address decoding splits register file vs buffer memory (via address MSB); supports read/write cycles with proper ack and clock crossing.
- Register file (RF): function address, per-EP control/status, buffer descriptors, maskable interrupt control, vendor UTMI register passthrough (write strobe + status readback).
- Interrupts: two aggregated outputs (inta\_o/intb\_o) combining per-EP events (buffer done, CRC16, timeouts, PID/sequence) and system events (attach/detach, suspend enter/exit, USB reset, RX errors); maskable sources.
- Frame/timekeeping: captures SOF frame number, microframe count, and inter-SOF timing into frm\_nat for diagnostics.
- External DMA handshake per endpoint: dma\_req\_o[15:0]/dma\_ack\_i[15:0] for system-level DMA coordination.
- Clean clock-domain crossings between system clock and PHY clock throughout control/data paths.
- Top-level I/O: UTMI 8-bit data and control pins, WISHBONE bus, interrupt lines, suspend output, resume request input, SSRAM memory port, and UTMI vendor control/status.

-- Endpoint Support and Buffering --

- Supported endpoints: EP0 is always present. EP1, EP2, EP3 are enabled by build-time defines (USBF\_HAVE\_EP1..3). EP4–EP15 are instantiated as dummies (no function) unless corresponding defines are enabled. A power-up message confirms the contiguous set of enabled endpoints.
- Endpoint configuration (per endpoint CSR):
- Direction: csr[27:26] = 00 CTRL, 01 IN, 10 OUT.
- Transfer type: csr[25:24] = 01 Isochronous, 10 Bulk (others reserved).
- State: csr[23:22] = 01 Disabled, 10 Stalled.
- DMA enable: csr[15] (ignored for control endpoints).
- Max payload size: csr[10:0] (11-bit, bytes).
- Additional fields: csr[17] large packet allowed, csr[16] small packet allowed, csr[12:11] transaction framing. Data PID and buffer select are tracked in uc\_dpd (csr[29:28]) and uc\_bsel (csr[31:30]).
- Double buffering model (per endpoint):
- Two 32-bit buffer descriptors: buf0 and buf1. Reset value 0xFFFF\_FFFF indicates unallocated.
- Descriptor fields: [31] NA flag (1 = not available); [30:17] size (14 bits, bytes); [SSRAM\_HADR+2:4] word-aligned base address; [3:0] low address nibble. An address of all 1s also implies NA.

- Active buffer selection: if DMA enabled, hardware uses buf0 (uc\_bsel not used for fetching). For control endpoints, selection follows token direction (IN uses buf1, OUT uses buf0). Otherwise, hardware prefers the non-NA buffer; uc\_bsel toggles the user-visible next buffer when a buffer completes.
- Buffer lifecycle:
  - IN: data is read from the active buffer until empty (new\_size == 0) then buffer\_done asserted.
  - OUT: writes proceed until remaining size < max payload (buffer\_full) then buffer\_done asserted.
  - On UPDATE phase, if DMA is enabled and buffer\_done is reached, buf0\_rl is pulsed to restore buf0 to its original descriptor (ring behavior). Otherwise, the active bufX is updated (bufX\_set) with idin carrying the new size and next start address.
  - uc\_bsel\_set and uc\_dpd\_set update csr[31:30] and csr[29:28] with the next buffer select and DATA PID sequencing.
  - Out-of-size conditions: to\_small (packet < max and small not allowed) or to\_large (packet > max and large not allowed) cause NAK on OUT. In HS OUT with DMA, a short packet (sizu\_c != max) sets out\_to\_small, updates the descriptor with the actual size, and raises an interrupt bit.
- DMA request/ack per endpoint:
  - Each non-control endpoint exposes a dma\_req bit (dma\_req\_o[15:0]), acknowledged via dma\_ack\_i.
  - OUT endpoints assert DMA request while dma\_out\_cnt != 0; IN endpoints assert while dma\_in\_cnt < programmed buffer length.
  - Flow control flags exported to the protocol layer per matched endpoint:
  - dma\_in\_buf\_sz1: at least one max packet of IN data is ready.
  - dma\_out\_buf\_avail: at least one max packet space is available for OUT.
  - Handshake is edge/level conditioned with hold logic to avoid premature drop until ack.
  - Protocol interaction and backpressure:
    - If endpoint is stalled, device answers STALL; if both buffers are NA, or DMA cannot start (no\_buf0\_dma), NAK is returned (NYET in HS when appropriate). PING in HS gets ACK without data.
    - Zero-length IN packets are generated when max\_pl\_sz == 0.
  - Memory and addressing:
    - External shared buffer memory is 4 KiB (SSRAM\_HADR = 9  $\Rightarrow$   $2^{(SSRAM\_HADR+1)}$  words  $\times$  4 bytes). Descriptors hold byte addresses; the IDMA uses word addresses [SSRAM\_HADR+2:2] and arbitrates with the WISHBONE via usbf\_mem\_arb.
  - Endpoint register access:
    - For each endpoint, address window exposes: CSR (offset 0), INT status (offset 1), BUF0 (offset 2), BUF1 (offset 3). Writes to BUFX set/refresh descriptors; RL (buf0\_rl) re-loads BUF0 from its original value.
  - Control endpoint (EP0):
    - Always present. Shares the same CSR/BUF interfaces, but CSR.DMA is effectively ignored for control transfers; buffer selection follows token direction rather than uc\_bsel.
  - Interrupts and status:
    - Per-endpoint interrupt causes include: buffer0/1 done, UPID change, CRC16 error, timeout (ACK or DATA), sequence error, and OUT short packet (out\_to\_small). Function-level interrupts report attach/detach, suspend/resume, reset, PID/CRC errors.

#### -- Clocking and Reset Overview --

- Clock domains
- phy\_clk\_pad\_i (PHY domain): Drives the entire USB datapath and protocol: UTMI interface and line-state FSM (usbf\_utmi\_if/usbf\_utmi\_ls), Packet Layer (usbf\_pl) including PD/PA/PE/IDMA, the shared SRAM arbiter (usbf\_mem\_arb), and the "clk" side of the register file/endpoints (usbf\_rf/usbf\_ep\_rf).
- clk\_i (System/WISHBONE domain): Drives the WISHBONE interface (usbf\_wb), the "wclk" side of the register file/endpoints (bus-visible registers, interrupt aggregation), DMA request aggregation, and

select top-level status synchronizers.

- External memory clocking
- The shared SRAM interface (sram\_adr\_o/sram\_data\_o/sram\_we\_o/sram\_re\_o) is generated in the PHY clock domain (phy\_clk\_pad\_i). The external SSRAM should be timed to the PHY clock domain.
- Reset scheme
  - rst\_i is an active-low reset distributed to all sub-blocks (forwarded to the PHY as phy\_rst\_pad\_o). In the delivered RTL, resets are synchronous to the local clock (if(!rst) in posedge clocked always blocks). If USBF\_ASYNC\_RESET is defined at compile time, key always blocks switch to asynchronous, active-low reset (negedge rst in sensitivity lists).
  - UTMI line-state FSM (usbf\_utmi\_ls) treats usb\_vbus\_pad\_i as an additional POR source: when asserted it drives the FSM to POR.
- Clock-domain crossings (CDC)
  - WISHBONE to PHY (usbf\_wb): wb\_stb\_i & wb\_cyc\_i are sampled into the PHY domain (wb\_req\_s1). Address/data/we are used while the request is asserted; wb\_ack\_o is re-timed back to clk\_i using a small pulse-stretch/edge-detect pipeline (wb\_ack\_s1/1a/2) to produce a one-cycle pulse on clk\_i.
  - System memory access (SRAM): usbf\_wb presents memory (ma\_\*) control signals already in the PHY domain; usbf\_mem\_arb arbitrates between PHY DMA (mreq) and WISHBONE-initiated memory cycles (wreq) all on phy\_clk, then drives the SSRAM. The WISHBONE-side acknowledge for memory cycles (ma\_ack) is generated from PHY-domain handshake.
  - Register file and endpoint state (usbf\_rf/usbf\_ep\_rf): Bus accesses occur on wclk (clk\_i); endpoint state, counters, and DMA pacing largely occur on clk (phy\_clk). Level-type interrupt sources from endpoints are sampled and ORed in the wclk domain for inta/intb.
  - DMA request/ack (per-endpoint in usbf\_ep\_rf): dma\_req is asserted in the wclk domain. The external dma\_ack (top-level input) is synchronized into the PHY domain (r4/r5) and a write-clear handshake is synchronized back into wclk, creating a robust 2-domain handshake.
  - UTMI suspend/resume handoff: suspend\_clr from the PHY domain is sampled in clk\_i (suspend\_clr\_wr) to clear a stretched resume\_req\_r generated on clk\_i from resume\_req\_i.
- Backpressure/feedback
  - tx\_valid\_out into the packet engine is driven from the UTMI TxValid\_pad\_o (PHY domain) to gate protocol-engine timers (e.g., ACK timeout logic) accurately to what the PHY transmits.
- Integration guidance
  - Prefer running clk\_i and phy\_clk\_pad\_i synchronous or frequency-related to minimize CDC complexity. If run asynchronously, observe that: (1) WISHBONE address/data/we must remain stable for the duration of a request until wb\_ack\_o; (2) all critical CDC paths include internal handshakes/synchronizers, but integrators should still include CDC constraints in timing sign-off.
  - Ensure external SSRAM timing matches the PHY clock, and that rst\_i is held low long enough for both clock domains to observe a clean reset.
  - If an asynchronous reset strategy is required, define USBF\_ASYNC\_RESET so all guarded blocks use async active-low reset.

-- External Interfaces Overview --

- Clock and reset
- clk\_i: System/Wishbone clock domain.
- phy\_clk\_pad\_i: UTMI/PHY clock domain.
- rst\_i: Synchronous active-high reset for the core; also forwarded to the PHY as phy\_rst\_pad\_o.
- Wishbone slave interface (32bit)

- wb\_addr\_i[`USBF\_UFC\_HADR:0], wb\_data\_i[31:0], wb\_data\_o[31:0], wb\_we\_i, wb\_stb\_i, wb\_cyc\_i, wb\_ack\_o.
- Address bit wb\_addr\_i[12] selects space: 0 = Register File (RF), 1 = Buffer Memory window (SSRAM).
- Single-cycle classic handshake; data width 32 bits; internal clock domain crossing handled.
- External SSRAM buffer memory interface
- sram\_adr\_o[SSRAM\_HADR:0]: word address to external memory.
- sram\_data\_o[31:0]: write data; sram\_data\_i[31:0]: read data.
- sram\_we\_o: write strobe; sram\_re\_o: read enable (held asserted; memory must tolerate RE high).
- Single physical port arbitrated between USB DMA (PHY side) and Wishbone accesses; DMA has priority.
- Memory size =  $(2^{(SSRAM\_HADR+1)}) \times 4$  bytes (word addressed).
- UTMI+ Low-Pin Interface (to USB PHY)
- TX: DataOut\_pad\_o[7:0], TxValid\_pad\_o, TxReady\_pad\_i.
- RX: DataIn\_pad\_i[7:0], RxValid\_pad\_i, RxActive\_pad\_i, RxError\_pad\_i.
- Line control/status: LineState\_pad\_i[1:0] (input), XcvSelect\_pad\_o, TermSel\_pad\_o, SuspendM\_pad\_o, OpMode\_pad\_o[1:0].
- Power/status: usb\_vbus\_pad\_i (input), usb\_attached/usb\_suspend handled internally and exposed via susp\_o.
- Supports FS/HS negotiation, suspend/resume signaling (drive-K handled internally).
- Interrupts
- inta\_o, intb\_o: Aggregated interrupt lines combining endpoint and core/RF events (masking/aggregation managed internally).
- Endpoint DMA handshake (per endpoint)
- dma\_req\_o[15:0]: Core DMA request lines (one per endpoint 0..15; only enabled endpoints will assert).
- dma\_ack\_i[15:0]: External DMA engine acks; used to pace endpoint DMA transfers.
- Suspend/Resume control
- susp\_o: Reflects current USB suspend state.
- resume\_req\_i: System request to initiate resume signaling; internally latched until the PHY clears suspend.
- Vendor/PHY sideband control and status
- VStatus\_pad\_i[7:0]: Vendor status inputs sampled into the register file space.
- VControl\_pad\_o[3:0], VControl\_Load\_pad\_o: Vendor control output bus with a load strobe when software writes the control register.
- Functional address (device address) and endpoints
- Device address managed in RF and used by protocol layer; endpoints 0..15 supported at the interface level; actual enabled set is compile-time configurable (USBF\_HAVE\_EPn).
- Clock domains and CDC
- Two asynchronous domains: Wishbone (clk\_i) and UTMI/PHY (phy\_clk\_pad\_i). All crossings (registers, DMA, memory arbitration) are handled internally; no external CDC constraints are required beyond providing stable clocks.

-- Deliverables and Supported Configurations --

- Deliverables:
  - Synthesizable Verilog RTL for the complete USB Function core, including: usbf\_top, usbf\_utmi\_if, usbf\_utmi\_ls, usbf\_pl, usbf\_pd, usbf\_pa, usbf\_pe, usbf\_idma, usbf\_mem\_arb, usbf\_wb, usbf\_rf, usbf\_ep\_rf, usbf\_ep\_rf\_dummy, usbf\_crc16, usbf\_crc5.
  - Top-level (usbf\_top) with UTMI+ PHY pad interface, WISHBONE slave interface, external synchronous SRAM interface, interrupt lines (inta\_o/intb\_o), DMA request/ack vectors, suspend/resume support, and UTMI vendor control/status pass-through.
  - Parameterization hooks: SSRAM\_HADR (default from macro; here 9 -> 10-bit word address, 4 KB buffer), WISHBONE address high bit macro USBF\_UFC\_HADR (here 12), and address decode macros USBF\_RF\_SEL/USBF\_MEM\_SEL (registers vs. buffer memory window on wb\_addr\_i[12]).
  - Compile-time feature macros: USBF\_HAVE\_EP1..USBF\_HAVE\_EP15 (endpoint enables), USBF\_DEBUG (diagnostics), USBF\_TEST\_IMPL (test hooks), optional USBF\_ASYNC\_RESET (switch reset style to async in supported flops).
  - Built-in simulation diagnostics (\$display) for endpoint configuration summary and debug traces in key state machines.
- Supported Configurations:
  - Endpoints: Up to 16 endpoints (EP0..EP15). Each endpoint implements double buffering (buf0/buf1), per-endpoint CSR, DMA request line, and interrupt sources; EP0 always present. Current build enables EP1..EP3 macros (EP0–EP3 active); others instantiate dummies by default, but can be enabled via USBF\_HAVE\_EP.
  - Transfer types and roles: Endpoint mode bits support Control (EP0), IN, OUT; transfer type bits include Isochronous and Bulk (csr[25:24]). DATA PID sequencing (DATA0/1/2/MDATA) and token responses (ACK/NACK/STALL/NYET) handled in hardware.
  - Speeds and PHY: UTMI+ interface supporting Full-Speed and High-Speed operation with automatic reset, suspend, resume, and HS/FS speed negotiation. UTMI signals: DataIn/Out, TxValid/Ready, RxValid/Active/Error, XcvSelect, TermSel, SuspendM, OpMode, LineState, usb\_vbus. Vendor pass-through: VControl\_pad\_o, VControl\_Load\_pad\_o, VStatus\_pad\_i.
  - System bus: WISHBONE slave, 32-bit data, address width A12:0 per macro (USBF\_UFC\_HADR=12). Register file vs. buffer memory selected by address bit (USBF\_RF\_SEL/USBF\_MEM\_SEL on wb\_addr\_i[12]).
  - Buffer memory: External synchronous SRAM port (32-bit data). Address width is SSRAM\_HADR+1; default SSRAM\_HADR=9 -> 4 KB total buffer space. Built-in arbiter (usbf\_mem\_arb) manages PHY-side DMA and host WISHBONE accesses across clock domains.
  - DMA: Per-endpoint DMA request vector (dma\_req\_o[15:0]) and corresponding ack inputs, with internal IDMA engine for endpoint data moves, word packing/unpacking, and zero-length packet support.
  - Clocks and resets: Two clock domains (phy\_clk\_pad\_i for link/PHY; clk\_i for WISHBONE/control). Synchronous resets by default; optional asynchronous reset on selected flops when USBF\_ASYNC\_RESET is defined.
  - Interrupts: Two aggregated interrupt outputs (inta\_o/intb\_o) with maskable sources including endpoint events (buffer done, UPID, CRC16, timeout, sequence error, out-to-small) and root function events (attach/detach, suspend/resume transitions, USB reset, RX errors, CRC5/PID errors).
  - Protocol compliance features: Full PID decode/generation, CRC5/CRC16 check and generation, FS/HS timing parameters (ACK/data timeouts), NYET handling in HS OUT flow, and control endpoint handling integrated with address match (function address).
  - Power management: Suspend output (susp\_o), external resume request input (resume\_req\_i) with UTMI resume signaling, and automatic SuspendM/OpMode management.

-- Assumptions and Limitations --

- Protocol/PHY assumptions
- UTMI 8-bit parallel interface is assumed (no ULPI). Line-state, XcvSelect, TermSel, OpMode follow UTMI conventions.
- HS/FS operation is implemented; Low-Speed is not explicitly supported. Speed negotiation logic targets HS/FS only.
- Timing constants (reset/suspend/resume, handshake timeouts) are hard-coded via macros and implicitly assume a specific UTMI clock (typically 60 MHz). Using a different phy\_clk requires retuning these constants.
- Resume (K) signaling is generated internally; remote-wakeup trigger from the register file (rf\_resume\_req) is exposed but not wired into the UTMI block at top level (resume is driven from resume\_req\_i via software).
  
- WISHBONE interface limitations
- 32-bit data bus, no byte select signals implemented, and no burst support. Unaligned/byte writes are not supported.
- Simple request/acknowledge protocol with single-cycle pulse acks; no STALL/ERR responses. Masters must tolerate multi-cycle response latency and single-cycle ack pulses.
- Address decoding uses compile-time macros; the top-level split between register file and memory depends on a single address bit (wb\_addr\_i[12]). This must match system integration.
  
- Memory/IDMA/arbitrer assumptions
- External buffer memory is a dedicated 32-bit SSRAM-like interface; sram\_re is held asserted continuously. Designs that require qualified read-enable may need adaptation.
- The arbiter grants either the PHY DMA or WISHBONE, with mack combinatorially tied to mreq (assumes zero-wait read visibility). Real memories with latency need additional handshake logic.
- Addressing is word-based; IDMA assembles/disassembles 32-bit words from 8-bit UTMI stream in little-endian byte order.
- Buffer descriptors use 14-bit size and an address field that includes lower nibble for flags (bsel/dpid). Software must program valid, aligned addresses and sizes. No hardware bounds checking beyond simple wrap logic.
- When DMA and the computed next write address equals end of buffer window, address wraps to 0 (global wrap), not to a per-buffer base; improper programming can corrupt memory.
  
- Endpoint/register file constraints
- Up to 16 endpoints (0–15) are implemented; EP0 is always present. Additional endpoints are enabled at compile time via USBF\_HAVE\_EPn macros and must be defined contiguously from EP1 upward (the code warns if gaps exist).
- Two buffers per endpoint (BUF0/BUF1). Software must manage buffer allocation, sizes, and status fields; hardware does not protect against overlapping buffers.
- DMA request/ack for each endpoint crosses clock domains using simple handshakes; integrators must verify CDC robustness in their environment.
- Interrupts (inta/intb) are level/latched on the WISHBONE clock domain and cleared by specific register reads; software must service them to avoid overrun or lost events.
  
- Packet engine/protocol handling
- PID sequencing for data packets supports DATA0/1/2/MDATA per transfer type. Some choices assume HS/FS rules (e.g., NYET use only in HS).
- Control transfers are minimally handled in hardware (token/data/handshake). Full USB chapter 9 handling (descriptors, requests) is not provided; software must implement protocol policy via register programming.
- Timeouts and error handling (CRC5/CRC16, sequence errors, NAK/STALL/NYET) follow fixed state machines; behavior outside USB spec corner cases is not parameterized.

- Clocks/resets and CDC
- Two primary clock domains: phy\_clk (UTMI) and wclk (WISHBONE). Multiple signals cross between them; most crossings are registered, but not all are double-synchronized. System integrators should review CDC paths for their timing/clocking.
- Resets can be synchronous or (optionally) asynchronous via USBF\_ASYNC\_RESET. Without ASYNC reset, clocks must be running when reset deasserts.

- Build/configuration caveats
- Macros for address widths and decodes (USBF\_UFC\_HADR, USBF\_SSRAM\_HADR, USBF\_RF\_SEL, USBF\_MEM\_SEL) appear multiple times; ensure final values are consistent in your build to avoid mismatched decoding or memory sizing.
- Simulation-only code is present (initial \$display, debug messages under USBF\_DEBUG/USBF\_TEST\_IMPL). Synthesis tools may warn; gate out debug for production.

- Integration/functional limits
- No power management beyond suspend/resume signaling; no PHY power control hooks beyond UTMI signals.
- No error correction/ECC on buffer memory; data integrity relies on USB CRCs only.
- No built-in descriptor ROM or firmware; the core requires software to configure endpoints, buffers, and handle control requests.
- Vendor-specific UTMI register interface is pass-through only; correctness depends on the external PHY's vendor extension behavior.

#### -- Acronyms and Notation --

- USBF: USB Function core (this IP).
- UTMI: USB 2.0 Transceiver Macrocell Interface (physical layer interface signals: XcvSelect, TermSel, SuspendM, OpMode, LineState, DataIn/Out, TxValid/Ready, RxValid/Active/Error).
- WB / WISHBONE: System bus interface used for register and buffer memory access (wb\_\* signals).
- EP: USB endpoint (EP0–EP15). Presence controlled by macros USBF\_HAVE\_EPn.
- CSR: Control/Status Register; per-endpoint configuration/status word (csr bus in endpoint register file).
- DMA: Direct Memory Access; in-core mover between UTMI side and shared SRAM (dma\_req/dma\_ack, idma module).
- IDMA: Internal DMA engine (module usbf\_idma) handling memory bursts for IN/OUT payloads.
- SSRAM: Synchronous SRAM used for endpoint buffers; address width parameter SSRAM\_HADR.
- RF: Register File (module usbf\_rf), includes endpoint CSRs, buffers descriptors, interrupts, and vendor/PHY control.
- PL: Protocol Layer (module usbf\_pl): integrates packet decode/assemble, protocol engine, and IDMA.
- PD: Packet Decoder (module usbf\_pd): decodes incoming UTMI stream, PIDs, tokens, data, CRCs.
- PA: Packet Assembler (module usbf\_pa): forms outgoing packets (handshake/data) and CRC16.
- PE: Protocol Engine (module usbf\_pe): endpoint state machine handling tokens, data, handshakes, buffer updates, and timeouts.
- CRC5/CRC16: USB token/data CRC calculators (modules usbf\_crc5/usbf\_crc16). Bit-reversal conventions applied at interfaces per USB spec.
- UTMI-LS: UTMI Link-State and speed-negotiation controller (module usbf\_utmi\_ls); manages FS/HS modes, reset, suspend/resume, K/J/SE0 detection.
- PIDs: USB packet identifiers. Token: OUT, IN, SOF, SETUP, PING. Data: DATA0, DATA1, DATA2, MDATA. Handshake: ACK, NACK, STALL, NYET. Special: PRE, ERR, SPLIT. Macros USBF\_T\_PID\_\* define 4-bit types; transmitted with inverted upper nibble.
- HS/FS: High-Speed / Full-Speed modes (mode\_hs flag selects behavior and timing constants).

- SOF: Start Of Frame token; frame number captured in frm\_nat.
- frm\_nat: Frame status word combining micro-frame count, last frame number, and SOF timing counters.
- fa: USB Function Address.
- nse\_err: Non-sequential error (protocol/stream sequencing).
- pid\_cs\_err: PID checksum error (upper/lower nibble mismatch).
- crc5\_err / crc16\_err: Token/data CRC errors.
- Interrupts: inta/intb aggregate endpoint and RF events; per-EP masks and status in RF.
- Buffer descriptors: buf0, buf1 are double-buffer descriptors per EP.
- Fields: [31]=NA (not allocated), [30:17]=buffer size (bytes), [SSRAM\_HADR+2:0]=byte address, [3:0]={data\_pid\_state,next\_bsel or low addr nibble} per context.
- uc\_bsel: software/engine buffer select state (round-robin when not DMA).
- uc\_dpd: data PID state tracking.
- buf0\_set/buf1\_set/buf0\_rl: update or release buffer descriptors; out\_to\_small indicates short OUT packet in DMA mode.
- DMA sizing signals: dma\_in\_buf\_sz1 (IN path has at least one max packet buffered), dma\_out\_buf\_avail (OUT path has space for next max packet).
- Memory arbitration: usbf\_mem\_arb arbitrates between packet layer (m\*) and WB access (w\*); sram\_\* are external SRAM ports.
- Wishbone address selection macros: USBF\_RF\_SEL selects RF space; USBF\_MEM\_SEL selects buffer memory space based on a high address bit. USBF\_UFC\_HADR defines WB address MSB index.
- Handshake/flow signals conventions:
- \*\_req/\*\_ack: request/acknowledge handshake (e.g., dma\_req/dma\_ack, mreq/mack).
- \*\_we/\*\_re: write-enable/read-enable.
- \*\_din/\*\_dout: data into/out of a module or memory port.
- \*\_adr: address bus; [SSRAM\_HADR+2:0] are byte addresses; word address is [SSRAM\_HADR:0].
- rd\_next: packet assembler requests next byte.
- Clock/reset domains:
- phy\_clk: UTMI/PHY clock domain.
- clk\_i (wclk/wb\_clk): system/Wishbone clock domain.
- rst: active-high reset; default synchronous. If USBF\_ASYNC\_RESET is defined, selected always blocks use asynchronous reset.
- Naming suffixes (register/combination hints):
- \_r: registered (flop) version of a signal; \_r1/\_s1/\_s2 staged pipeline taps.
- \_d: combinational next-state/value.
- \_set/\_clr: set/clear strobes; \_hold: retention qualifier.
- \_sel: select line; \_ena/\_en: enable; \_na: not available/allocated.
- Timing macros (relative to PHY clock):
- USBF\_HMS\_DEL, USBF\_RX\_ACK\_TO\_VAL\_{FS,HS}, USBF\_TX\_DATA\_TO\_VAL\_{FS,HS} define protocol timeouts.
- USBF\_T1\_\* and USBF\_T2\_\* constants implement USB suspend/resume/reset timing per spec (e.g., 2.5us, 3.0ms, 100us, 1.0ms).
- Line-state notation (UTMI):
- J/K/SE0/SE1: logical bus states from LineState[1:0]; ls\_j, ls\_k, ls\_se0 are used internally; idle is J (FS) or SE0 (HS).
- Address/bit indexing: [MSB:LSB] Verilog slice notation is used consistently; numeric literals use 'h hex and 'd decimal; underscores separate thousands in hex fields for readability.
- Vendor I/F: VControl/VStatus are vendor-specific UTMI sideband controls exposed in RF; VControl\_Load pulses a write.

# ARCHITECTURE



## -- Top-Level Block Overview --

usbf\_top is a USB 2.0 device (function) core integrating a UTMI+ PHY interface, a packet/protocol/data-path, a shared external SRAM buffer subsystem, and a Wishbone slave control interface. It targets full-/high-speed operation and supports up to 16 endpoints (EP0 always present; others compile-time selectable). Major blocks and roles: (1) UTMI interface (usbf\_utmi\_if) bridges the UTMI+ PHY to the core, handling TX/RX strobes and 8-bit data, and embeds a line-state controller (usbf\_utmi\_ls) for attach/reset, suspend/resume, and HS/FS speed negotiation, driving XcvSelect/TermSel/OpMode and producing mode\_hs, usb\_reset, usb\_suspend, usb\_attached. (2) Protocol Layer (usbf\_pl) runs on the PHY clock and composes the data path from: packet decode (usbf\_pd), packet assemble with CRC16 (usbf\_pa), protocol engine (usbf\_pe) implementing token/data/handshake handling, endpoint buffer selection, PID sequencing, error and timeout handling, and internal DMA (usbf\_idma) that moves payloads between the UTMI stream and external SRAM using 32-bit accesses. It also computes frame/sof timing (frm\_nat) and exposes endpoint selection and status updates (idin, buf0/1 control strobes, interrupt cause flags). (3) External SRAM arbiter (usbf\_mem\_arb) arbitrates the single 32-bit synchronous SRAM port between the protocol-layer DMA (PHY clock domain) and Wishbone accesses (system clock domain), presenting sram\_adr\_o/sram\_data\_o/sram\_we\_o/sram\_re\_o to the memory. (4) Register File and Endpoint Control (usbf\_rf) provides the main CSR block (device/line status, function address), per-endpoint CSRs and buffer descriptors (buf0/buf1), consolidated interrupt generation (inta\_o/intb\_o) from endpoint and core sources, and per-endpoint external DMA handshake (dma\_req\_o/dma\_ack\_i). It also exposes a vendor UTMI sideband interface (VControl/VStatus). (5) Wishbone bridge (usbf\_wb) is a Wishbone B3-compatible slave that decodes address space into register-file vs buffer memory regions, muxes read data, generates acks, and issues memory or RF transactions; its requests arbitrate through usbf\_mem\_arb to the external SRAM. Top-level interfaces: - System/Wishbone: clk\_i, rst\_i, wb\_\* for control/config and buffer memory access. - UTMI+ PHY: DataOut/TxValid/TxReady, DataIn/RxValid/RxActive/RxError, XcvSelect, TermSel, SuspendM, OpMode, LineState, usb\_vbus, with resume\_req\_i input and susp\_o output (mirrors usb\_suspend). - External SRAM: 32-bit data, address width parameterized by SSRAM\_HADR. - Interrupts/DMA: inta\_o, intb\_o; 16-bit dma\_req\_o/dma\_ack\_i (per endpoint). Clocks/resets and CDC: Two main domains—phy\_clk\_pad\_i (UTMI/packet path and

DMA) and clk\_i (Wishbone/control). Reset rst\_i is distributed to both; cross-domain handshakes are localized in usbf\_utmi\_if, usbf\_wb/usbf\_rf, and usbf\_mem\_arb. Data/control flow summary: UTMI RX/TX streams terminate in usbf\_pl; protocol engine selects endpoint and buffer (buf0/buf1) per endpoint CSR, coordinates DMA bursts via usbf\_idma into external SRAM through usbf\_mem\_arb. Endpoint and core status/control live in usbf\_rf, which also aggregates interrupts and issues external DMA requests. Software configures the core and endpoints via Wishbone (register space) and can read/write packet buffers via the memory space.

#### -- Data Path Architecture (RX/TX) --

##### Overview

- The core implements a UTMI-based USB Full-/High-Speed function with a split, pipelined RX/TX data path. The physical interface (UTMI) is terminated in the PHY clock domain; packet decode/encode, protocol control, and DMA operate in the PHY domain; endpoint registers and the Wishbone interface operate in the system (WB) clock domain. A memory arbiter shares a single-port SSRAM between protocol/DMA and the WB host.

##### UTMI Interface and Line-State Control

- usbf\_utmi\_if bridges UTMI signals to internal byte-streams: RxValid/Active/Error + DataIn become rx\_valid/active/err + rx\_data; tx\_data/tx\_valid/tx\_valid\_last and tx\_first are translated to DataOut/TxValid with TxReady backpressure.
- usbf\_utmi\_ls manages HS/FS mode selection, reset, suspend/resume, and UTMI control pins (XcvSelect, TermSel, SuspendM, OpMode). It can drive K-chirps (drive\_k) and stretches TxValid for resume/keep-alive signaling.

##### Receive (RX) Data Path

###### 1) Packet Decode (usbf\_pd)

- Captures PID, validates PID checksum, parses tokens (address/endp), checks CRC5, extracts frame number (SOF), and streams payload bytes.
- Outputs: token\_fadr/endp, token\_valid, crc5\_err, rx\_data\_st (byte), rx\_data\_valid, rx\_data\_done, crc16\_err, seq\_err.

###### 2) Protocol Match and Control (usbf\_pl + usbf\_pe)

- Address match (fsel) compares token\_fadr to the programmed function address; endpoint match comes from usbf\_rf based on token\_endp.
- usbf\_pe detects transaction type (IN/OUT/SETUP/CTRL), evaluates endpoint CSR (direction/type/stall/enable, DMA enable, max packet), and buffer availability. It raises send\_token (ACK/NAK/STALL/NYET) and controls DMA for data reception (rx\_dma\_en).
- Buffer bookkeeping: selects buf0/buf1, computes new address/size, generates idin (descriptor/status writeback), and asserts buf0\_set/buf1\_set/uc\_bsel\_set/uc\_dpd\_set or buf0\_rl to release/rotate buffers. Error/flow control: to\_small/to\_large, out\_to\_small (HS), seq/CRC timeouts.

###### 3) DMA Write (usbf\_idma)

- On rx\_dma\_en, assembles incoming bytes into 32-bit words and issues SSRAM writes via mreq/mwe/madr/mdout. Tracks received byte count (sizu\_c) and end-of-packet (idma\_done). Handles partial-word finalization.

###### 4) Memory Arbitration (usbf\_mem\_arb)

- Grants the PHY-side DMA access to SSRAM, arbitrating with WB-side accesses. Read data returns to DMA; writes go straight to memory.

##### Transmit (TX) Data Path

###### 1) Protocol Decision (usbf\_pe)

- For IN or control-IN, checks buffer availability and max packet. Asserts tx\_dma\_en and optionally send\_zero\_length. Selects data PID sequencing per transfer type, speed, and CSR.

###### 2) DMA Read (usbf\_idma)

- On tx\_dma\_en, prefetches 32-bit words from SSRAM (pipelined MEM\_RD states), presents byte stream tx\_data\_st, and honors rd\_next backpressure. Asserts send\_data while bytes remain or for ZLP as needed.

### 3) Packet Assemble (usbf\_pa)

- Emits data or immediate handshake tokens. Chooses token/data PIDs (token\_pid\_sel/data\_pid\_sel), computes/streams CRC16 (usbf\_crc16), controls tx\_first/tx\_valid/tx\_valid\_last, and generates rd\_next to pull data from DMA.

### 4) UTMI Egress (usbf\_utmi\_if)

- Drives DataOut/TxValid toward the PHY, merges resume drive\_k when required, and observes TxReady for pacing.

#### Endpoint/Register File and Interrupts (usbf\_rf)

- Hosts per-EP CSR and buffer descriptors (buf0/1: address, size, valid). Provides match, csr, buf0/1 to the protocol engine, aggregates per-EP DMA request lines, and generates interrupts (A/B) for buffer events, PID/CRC/timeout conditions, attach/reset/suspend/resume, and vendor/line-state updates.

#### Clock/Domain Boundaries

- PHY clock domain: UTMI I/F, packet decode/encode (pd/pa), protocol engine (pe), DMA (idma), memory arbiter PHY-port, and parts of rf.
- WB clock domain: Wishbone slave (usbf\_wb), memory arbiter WB-port, rf register access/interrupt presentation. Handshakes and simple synchronizers are used for cross-domain req/ack and status.

-- Control Path and Register File --

#### Overview

- The control path is split between a WISHBONE bus front-end (usbf\_wb), a register file and endpoint control block (usbf\_rf and usbf\_ep\_rf[0..15]), and the UTMI link-state controller (usbf\_utmi\_ls via usbf\_utmi\_if). The register file exposes all software-visible control/status, interrupt, endpoint configuration, buffer descriptors, and vendor UTMI hooks.

#### WISHBONE access and address decoding

- WISHBONE interface: usbf\_wb arbitrates between register-file (RF) and buffer memory windows. It asserts rf\_we/rf\_re for RF accesses and ma\_\* for memory.
- Selection: RF is selected when wb\_addr\_i[12] == 0 (USBF\_RF\_SEL); memory when wb\_addr\_i[12] == 1 (USBF\_MEM\_SEL). Address bus width is A12:0.
- Data mux: wb\_data\_o returns rf\_din when RF selected, otherwise ma\_din.
- Acknowledge: wb\_ack\_o is a single-cycle pulse per access. RF read data is valid in the same WISHBONE cycle that wb\_ack\_o asserts.

#### Register file top map (usbf\_rf)

- RF address presented as adr[6:0] = wb\_addr\_i[8:2] (word-aligned). Within an endpoint, adr[1:0] selects the sub-register.
- Top-level words (adr[6:2] = 0x00..0x03): multiplexed via adr[2:0]
- 0x00 (Main status CSR, RO; write used for resume request):
  - [4:3] LineState (UTMI)
  - [2] usb\_attached
  - [1] mode\_hs (1=HS, 0=FS)
  - [0] suspend
- Write side-effect: writing this word uses bit [5] as rf\_resume\_req latch (set on write, auto-clear when consumed). Other written bits are ignored.
- 0x01 (Function address, R/W): [6:0] funct\_adr
- 0x02 (Interrupt masks, R/W): [24:16] intb\_msk[8:0], [8:0] inta\_msk[8:0]

- 0x03 (Interrupt sources, RO with read-to-clear for RF sources):
  - [27:16] int\_srca[15:0] = (epN\_inta | epN\_intb) per endpoint N
  - [11:3] int\_srcb[8:0] RF-level sources (see Interrupts)
  - Read of this word clears int\_srcb bits; int\_srca is not latched and does not clear on read.
- 0x04 (Frame/time, RO): frm\_nat = {mframe\_count[3:0], 1'b0, frame\_no[10:0], 4'h0, sof\_time[11:0]}
- 0x05 (UTMI vendor, mixed RW):
  - Read: [7:0] utmi\_vend\_stat (sampled)
  - Write: [3:0] utmi\_vend\_ctrl; writing also sets utmi\_vend\_wr strobe until consumed in the PHY clock domain.
  - Endpoint windows (adr[6:2] = 0x04 + N for endpoint N=0..15): sub-register select by adr[1:0]
  - +0 (EP CSR, R/W): 32-bit control/status per endpoint (see Endpoint CSR)
  - +1 (EP INT, R/W): read shows {IEN masks and pending status}; write updates masks only
  - Read side-effect: clears local int\_stat[6:0] for the selected EP
  - +2 (EP BUF0, R/W): buffer descriptor 0 (see Buffers)
  - +3 (EP BUF1, R/W): buffer descriptor 1 (see Buffers)

#### Endpoint CSR (per usbf\_ep\_rf, exposed in usbf\_rf)

- CSR layout (MSB..LSB): {uc\_bsel[1:0], uc\_dpd[1:0], csr1[12:0], 1'b0, ots\_stop, csr0[12:0]}
- Fields consumed by the Protocol Engine (usbf\_pe):
  - [31:30] uc\_bsel: software-visible current buffer select; updated by HW at end of transfers
  - [29:28] uc\_dpd: software-visible current DATA PID state; updated by HW
  - [27:26] endpoint direction: 01=IN, 10=OUT, 00=CONTROL
  - [25:24] transfer type: 01=isochronous, 10=bulk, 00/others as design-specific
  - [23:22] endpoint state: 00=enabled, 01=disabled, 10=stall
  - [17] lrg\_ok: accept larger-than-MPS receipt allowed
  - [16] sml\_ok: accept smaller-than-MPS receipt allowed
  - [15] dma\_en: enable automatic DMA for non-control EPs
  - [12:11] tr\_fr: HS transaction scheduling (used internally in PID selection)
  - [10:0] max\_pl\_sz: max packet size
  - Write behavior: writing +0 uses din[27:15] -> csr1, din[13] -> ots\_stop, din[12:0] -> csr0.

#### Endpoint interrupts (per usbf\_ep\_rf)

- Read format (+1): {2'h0, iena[5:0], 2'h0, ienb[5:0], 9'h0, int\_stat[6:0]}
- Write format (+1): IENB <= din[21:16], IENA <= din[29:24]
- Pending bits int\_stat[6:0] (cleared by reading +1):
  - [6] OUT too small (DMA OUT short)
  - [5] Sequence error (PID/sequence)
  - [4] BUF1 update
  - [3] BUF0 update
  - [2] Unexpected PID
  - [1] CRC16 error
  - [0] Timeout
- Per-EP interrupt outputs inta/intb are generated by (int\_stat AND corresponding IEN mask). usbf\_rf ORs all endpoint inta/intb across EP0..EP15, then ORs with RF-level sources to form top-level inta\_o/intb\_o.

#### RF-level interrupts (usbf\_rf)

- Sources int\_srcb[8:0] (latched, auto-clear on read of top 0x03):
  - [8] USB reset detected
  - [7] RX error
  - [6] Detach
  - [5] Attach

- [4] Suspend exit
- [3] Suspend entry
- [2] No selected endpoint match (nse\_err)
- [1] PID check-sum error
- [0] Token CRC5 error
- Masks: inta\_msk[8:0], intb\_msk[8:0] at top 0x02 control routing into inta\_o/intb\_o.

#### Buffers (per endpoint)

- Two 32-bit buffer descriptors: BUF0 (+2) and BUF1 (+3).
- Format (function of SSRAM\_HADR; common fields):
  - [31] NA: 1 means buffer not allocated/unavailable
  - [30:17] Size in bytes for this buffer
  - [SSRAM\_HADR+2:0] SRAM start address (word-aligned; low bits may be overloaded on HW update)
  - [3:0] Low nibble used by HW to carry either {next\_dpid[1:0], next\_uc\_bsel[1:0]} when scheduling, or address low bits when HW advances the pointer (HW writes). Software writes should initialize address and size; HW will update size, address, and uc state as transfers complete.
- HW side-effects:
  - After successful transfers, HW either rotates uc\_bsel/uc\_dpd (no DMA) or advances address/size (DMA), asserting buf0\_set/buf1\_set to commit updates.
  - When a full buffer is consumed with DMA enabled, HW can release BUF0 (buf0\_rl) to original value buf0\_orig.

#### DMA handshake (exposed via usbf\_rf)

- dma\_req[15:0]: one request line per endpoint, asserted when dma\_en=1 and HW determines data is ready (IN) or space is available (OUT). Cleared on dma\_ack[n] unless hold conditions apply (internal counters prevent underrun/overflow).
- dma\_in\_buf\_sz1/dma\_out\_buf\_avail: per-EP indicators exported (via match selection) to the protocol engine for flow control decisions.

#### Protocol/PHY control hooks

- rf\_resume\_req (set via writing top 0x00 bit[5]) requests resume signaling; it is generated in clk\_i domain and synchronized into the PHY domain (exposed to top-level; system integration determines how it is applied).
- UTMI vendor hook at top 0x05: utmi\_vend\_ctrl[3:0] and utmi\_vend\_wr strobe (W), utmi\_vend\_stat[7:0] (R).

#### Access side-effects summary

- Reading EP INT (+1) clears that EP's int\_stat.
- Reading top interrupt sources 0x03 clears RF-level int\_srcb latched bits.
- Writing EP CSR (+0) updates control fields; writing masks (+1) updates IEN only.
- Writing BUF0/BUF1 sets descriptors; HW may later modify these descriptors on transfer completion.
- Writing top 0x00 with bit[5]=1 sets rf\_resume\_req (auto-clears on consumption).

#### Clocking and reset

- Control/CPU interface (RF and masks) operate on wclk (wb\_clk). Endpoint state machines and data-path operate on clk (phy\_clk). Cross-domain handshakes (e.g., dma\_req/ack, vendor strobe, resume) are synchronized internally.

-- Protocol Engine (PD/PE/PA) Architecture --

Overview

- The Protocol Engine (PE) complex comprises three tightly-coupled blocks inside usbf\_pl: PD (Packet Decoder), PE (Protocol/transaction state machine), and PA (Packet Assembler). It implements USB token/data/handshake reception and transmission, endpoint policy and buffer management, PID sequencing, DMA buffer servicing, and protocol timeouts/handshakes for Control, Bulk, and Isochronous transfers in both FS and HS modes.

#### Composition and roles

- PD (usbf\_pd): Decodes incoming UTMI RX stream into PIDs, token fields, and data; validates CRC5/CRC16; generates aligned data stream for memory; flags sequencing errors.
- PE (usbf\_pe): Transaction controller that decides IN/OUT processing, handshakes (ACK/NACK/STALL/NYET), PID toggling, buffer selection/updates, DMA enablement, and interrupt/event reporting; enforces endpoint policies from CSR.
- PA (usbf\_pa): Serializes outbound handshakes/tokens/data, inserts DATA PID per PE policy, computes/appends CRC16, issues tx\_first/tx\_valid/tx\_valid\_last.

#### External context and gating

- Functional address filter (fsel) is token\_fadr==fa and further gated: match\_o = !pid\_bad & fsel & match & token\_valid & !crc5\_err, where match (from RF) indicates endpoint number match.
- HS/FS mode selection provided by UTMI LS block; PE behavior (e.g., NYET/PING, timeouts) depends on mode\_hs.

#### Key interfaces

- PD outputs: pid\_\* flags, token\_fadr[6:0], token\_endp[3:0] (as ep\_sel), token\_valid, pid\_cks\_err, crc5\_err, rx\_data\_st[7:0], rx\_data\_valid, rx\_data\_done, crc16\_err, seq\_err.
- PE controls to PA/IDMA: send\_token, token\_pid\_sel[1:0], send\_data, data\_pid\_sel[1:0], send\_zero\_length, rx\_dma\_en, tx\_dma\_en, abort; memory/size: adr[SSRAM\_HADR+2:0], size[13:0], buf\_size[13:0], sizu\_c[10:0]; endpoint updates idin[31:0] and strobes buf0\_set/buf1\_set/uc\_bsel\_set/uc\_dpd\_set/buf0\_rl; interrupt set strobes int\_to\_set/int\_crc16\_set/int\_seqerr\_set/int\_upid\_set and flow flags nse\_err/out\_to\_small.
- PA I/O: tx\_data, tx\_valid/tx\_valid\_last, rd\_next, consumes tx\_data\_st from IDMA.
- IDMA (inside PL) bridges PE to external SRAM via usbf\_mem\_arb, using adr/size, rx/tx DMA enables, rd\_next backpressure, and madrs/mdin/mdout/mreq/mack/mwe.

#### Packet Decoder (PD) details

- FSM: IDLE → ACTIVE → TOKEN or DATA; returns to IDLE on end-of-packet or error.
- PID capture and validation: pid\_cks\_err if upper/lower nibble mismatch; classifies TOKEN (OUT/IN/SOF/SETUP/PING), DATA (DATA0/1/2/MDATA), HANDSHAKE (ACK/NACK/STALL/NYET) and others.
- Token parsing: token0/token1 latched; extracts function address (token\_fadr), endpoint (token\_endp), and CRC5; crc5\_err asserted if computed (~CRC5) mismatches received.
- Data path: 3-byte shift pipeline (d0→d1→d2) produces rx\_data\_st with rx\_data\_valid; CRC16 runs over received bytes (init FFFFh) and is checked against 800Dh at data\_done (crc16\_err). seq\_err flags protocol sequencing faults (unexpected transitions).

#### Protocol Engine (PE) details

- Endpoint policy fields from CSR (selected by current endpoint match):
- Direction/type: csr[27:26] (CTRL=00, IN=01, OUT=10), transfer type csr[25:24] (ISO=01, BULK=10, INT=00), halt/disable csr[23:22] (01=disabled, 10=stall), size policy csr[17] (large OK), csr[16] (small OK), DMA enable csr[15] (only for non-CTRL), max packet size csr[10:0], software PID bits uc\_dpd csr[29:28], buffer select uc\_bsel csr[31:30].
- Buffer model: Two buffer descriptors per EP (buf0, buf1) with fields [31]=NA, [30:17]=size, [SSRAM\_HADR+2:0]=address, [3:2]=next DATA PID, [1:0]=next buffer select. PE selects active buffer

based on CTRL/in\_token or uc\_bsel and buffer availability; checks NA bits and space.

- FSM states: IDLE, TOKEN, IN, IN2, OUT, OUT2A, OUT2B, UPDATEW, UPDATE, UPDATE2.
- IDLE: On match\_r and !ep\_disabled and !pid\_SOF: decide transaction:
  - If ep\_stall→send STALL.
  - If both buffers NA or no DMA capacity or CTRL buffer missing→send NACK.
  - If HS PING→send ACK.
  - If IN (EP IN or CTRL+IN): enable tx\_dma\_en (and send\_zero\_length if max\_pl\_sz==0) → IN.
  - If OUT (EP OUT or CTRL+(OUT/SETUP)): enable rx\_dma\_en → OUT.
- IN: Wait for IDMA completion; if ISO, update immediately; else proceed to IN2 to await ACK within rx\_ack\_to.
- IN2: If token\_valid&&pid;\_ACK→UPDATE; else on rx\_ack\_to→IDLE.
- OUT: Receive payload; on tx\_data\_to/crc16\_err/abort→IDLE. On rx\_data\_done: if ISO, optionally set seqerr and go UPDATEW; else proceed to OUT2A.
- OUT2A/OUT2B: Final checks; if to\_small/to\_large→NACK. If pid\_seq\_err→ACK (complete the transaction). Else ACK (or NYET in HS when no\_bufs), then UPDATE.
- UPDATEW→UPDATE: commit buffer pointer/size update; if buffer completed and DMA enabled, issue buf0\_rl (release) else write back updated descriptor to selected buffer; UPDATE2 latches uc\_bsel/uc\_dpd.
- PID sequencing: Computes this\_dpid (expected incoming DATA PID) and next\_dpid for subsequent transactions, per transfer type, mode\_hs, and uc\_dpd; compares against received DATA PID (pid\_seq\_err) and rotates sequence for ISO/BULK per USB rules.
- Size/address arithmetic: size\_next=min(buf\_size,max\_pl\_sz); new\_size=buf\_size-new\_sizeb; new\_addr=old\_addr+size increment (packet size or szu\_c). Checks buffer\_done (IN: empty; OUT: full). Detects overflow (buffer\_overflow when szu\_c>buf\_size during OUT).
- DMA interplay: dma\_en=csr[15] & !CTRL\_ep; when set, uses IDMA for payload movement; uses dma\_in\_buf\_sz1/dma\_out\_buf\_avail to throttle IN/OUT respectively; issues abort on overflow, mid-transaction retarget, or policy violations.
- Timeouts: rx\_ack\_to (IN handshake wait) uses USBF\_RX\_ACK\_TO\_VAL\_FS/HS; tx\_data\_to (OUT data availability) uses USBF\_TX\_DATA\_TO\_VAL\_FS/HS.
- Handshake selection encoding for PA: token\_pid\_sel (ACK=0, NACK=1, STALL=2, NYET=3); data\_pid\_sel reflects chosen DATA PID.
- Interrupt/event strobes (to RF):
  - int\_to\_set on timeouts (IN2 or OUT), int\_crc16\_set on rx\_data\_done&crc16\_err, int\_seqerr\_set on OUT iso or detected sequencing fault, int\_upid\_set on unexpected PID for active EP, plus buffer completion int\_buf0\_set/int\_buf1\_set; nse\_err for no-EP-match; out\_to\_small on HS OUT short packet under DMA.
  - Descriptor update bus (idin):
    - idin[31:17] = {buffer\_done,new\_size} or {4'h0,szu\_c} on out\_to\_small,
    - idin[SSRAM\_HADR+2:4] = new\_addr (or original on out\_to\_small),
    - idin[3:0] = {next\_dpid,next\_bsel} or preserved low bits when writing updated buffer address.

#### Packet Assembler (PA) details

- FSM: IDLE, DATA, CRC1, CRC2, WAIT (zero-length path).
- Builds outbound bytes: chooses between handshake/token (send\_token|send\_token\_r) or data path (tx\_data\_st); first data byte is DATA PID; appends CRC16 (computed on-the-fly, init FFFFh, output reflected/inverted per USB).
- Control: tx\_first asserted on start of token or data; tx\_valid\_last asserted with token or the final CRC byte; rd\_next strobes IDMA when next data byte is consumed; supports zero-length packets (sends only DATA PID+CRC).

#### Error/exception handling

- pid\_bad masks non-addressable PIDs (ACK/NACK/STALL/NYET/PRE/ERR/SPLIT and PING in FS), preventing PE match\_o.

- abort sources: buffer overflow, concurrent re-match mid-transaction, policy oversize (to\_large) and other checks; abort returns state to IDLE and deasserts DMA.

#### Timing and compliance

- CRC5/CRC16 computed per USB2.0; CRC16 good value check is 0x800D at EOP.
- HS features: PING handling (ACK response), NYET on OUT when no buffers, separate timeout values, out\_to\_small flag for non-full HS OUT DMA bursts.

#### Data/memory path

- For OUT: PD→IDMA (rx\_data\_st/valid)→SSRAM via arbiter; PE updates/release descriptors.
- For IN: SSRAM→IDMA→PA; PE arms DATA PID, zero-length if max\_pl\_sz==0; waits for ACK (non-ISO).

#### Integration in usbf\_pl

- PD, PE, PA, and IDMA are composed; frm\_nat (frame and micro-frame timing) is generated for RF diagnostics but does not alter PE decisions; tx\_valid\_out (from UTMI IF) feeds back to PE for ACK-wait timing.

#### Reset and clocking

- All state machines are synchronous to phy\_clk (USB clock); reset is rst. Async reset support is wrapped by `USBF\_ASYNC\_RESET defines in RTL.

#### -- Endpoint Architecture and CSR/Buffer Structure --

Endpoints 0..15 are implemented by identical endpoint register-file instances (EP0 is always present; EP1..EP15 presence is compile-time selectable). Each endpoint exposes four 32-bit registers in its window: 0x0=CSR, 0x4=INT, 0x8=BUF0, 0xC=BUF1. The SoC register file (usbf\_rf) maps endpoint windows sequentially: window index 0x04+N (N=0..15) selects endpoint N; within a window, adr[1:0] selects CSR/INT/BUF0/BUF1.

Endpoint CSR (EPx.CSR, read/write except noted) is a 32-bit composite: [31:30] uc\_bsel (current SW/PE buffer select, updated by hardware when uc\_bsel\_set), [29:28] uc\_dpd (current SW/PE data PID state, updated by hardware when uc\_dpd\_set), [27:26] EP type/direction: 00=Control, 01=IN, 10=OUT, [25:24] transfer type: 01=Isochronous, 10=Bulk (other values reserved), [23:22] EP state: 00=enabled, 01=disabled, 10=stall, [21:18] endpoint address (matches token ep\_sel), [17] large packet allowed (lrg\_ok), [16] small packet allowed (smi\_ok), [15] DMA enable (ignored/forced off for Control), [14] reserved (reads 0), [13] ots\_stop (OutToSmall monitor enable; when set, hardware writes a status code into csr1[8:7] on an out\_to\_small event), [12:11] tr\_fr (HS transaction framing; determines DATA PID progression in HS), [10:0] max\_pl\_sz (11-bit maximum packet size). Firmware writes CSR as one word (bits [27:15],[13],[12:0]); uc\_bsel/uc\_dpd are updated by hardware via uc\_bsel\_set/uc\_dpd\_set using idin.

Endpoint INT (EPx.INT): read returns {2'b0, iena[5:0], 2'b0, ienb[5:0], 9'b0, int\_stat[6:0]}. Write (we1) programs masks: ienb<=din[21:16], iena<=din[29:24]. Read of INT (re on adr==1) clears int\_stat. Per-endpoint interrupt sources (int\_stat) set when the addressed endpoint matches the current token: [6] out\_to\_small (OUT size != max\_pl\_sz with DMA), [5] int\_seqerr\_set (DATA PID sequencing error), [4] int\_buf1\_set (buffer 1 updated/completed), [3] int\_buf0\_set (buffer 0 updated/completed), [2] int\_upid\_set (unexpected PID for EP type/state), [1] int\_crc16\_set (CRC16 error on received data), [0] int\_to\_set (ACK/DATA timeout). INTA/INTB outputs per endpoint are the masked ORs of these bits with iena/ienb. Device-level interrupt aggregation and masks live in the usbf\_rf top scope (separate from per-endpoint INT).

Endpoint buffers (EPx.BUF0/BUF1) are 32-bit descriptors for the shared 32-bit wide SSRAM buffer space: [31] NA (not available flag; if set, buffer is treated as unavailable), [30:17] size (14-bit buffer size in bytes), [SSRAM\_HADR+2:0] buffer start address (byte address; lower two bits are byte offsets), [3:0] low nibble used by hardware either to carry address low bits on updates or to store {next\_dpid[1:0], next\_bsel[1:0]} when the protocol engine advances PID/buffer state. Software initializes BUF0/BUF1 (both the address and size). Hardware may update a buffer after a transfer as follows: on UPDATE, if the transfer did not consume the entire buffer and DMA is disabled, the engine writes back new\_size and new\_adr and rotates uc\_bsel; if DMA is enabled and the buffer is fully consumed, buf0\_rl triggers to reload BUF0 from its original copy (buf0\_orig). For OUT endpoints with DMA, out\_to\_small causes a status writeback: idin[31:17]<={4'h0,sizu\_c} and address low nibble preserved; for normal updates idin[31:17]<={buffer\_done,new\_size} and idin low nibble is either new\_adr[3:0] (when updating address) or {next\_dpid,next\_bsel} (when only PID/buf select advances). The NA condition is detected as bit[31]==1 or address field all-ones; such buffers are skipped/NACKed as appropriate.

Endpoint match and operation: An endpoint matches an incoming token when ep\_sel (from token) equals CSR[21:18]. For matched endpoints, the protocol engine (PE) drives: buf0\_set/buf1\_set (write idin to the selected buffer descriptor), buf0\_rl (reload BUF0 from its original value), uc\_bsel\_set/uc\_dpd\_set (update CSR[31:28] from idin[1:0]/[3:2]), and event bits that raise int\_stat. Data flow uses IDMA + memory arbiter into a unified SSRAM; per-endpoint DMA request (dma\_req[n]) asserts when: OUT endpoint has queued data to write (dma\_out\_cnt!=0) or IN endpoint needs to fetch data (dma\_in\_cnt)

Address map summary (relative to each endpoint window): +0x0: CSR, +0x4: INT, +0x8: BUF0, +0xC: BUF1. Endpoint window base indices are contiguous from EP0 at window 0x04. All registers are 32-bit word accessible over the WISHBONE RF space.

-- DMA and Memory Subsystem (SSRAM and Arbiter) --

#### Overview

- The USB function core uses a single 32-bit, word-addressed, single-port SSRAM as a shared buffer memory for packet payloads and software access. Access is arbitrated between the packet DMA engine (IDMA, running on phy\_clk) and a Wishbone memory window (software, wb\_clk).

#### External SSRAM interface

- Width: 32-bit data (sram\_din/sram\_dout)
- Address: [SSRAM\_HADR:0] word address (low 2 address bits are byte selects handled internally by the DMA)
- Controls: sram\_we (write enable), sram\_re (always 1), sram\_adr (word address)
- Parameterization: SSRAM\_HADR = `USBF\_SSRAM\_HADR` (default 9), yielding  $2^{(SSRAM\_HADR+1)}$  words =  $(1 << (SSRAM\_HADR+1)) * 4$  bytes (default 4 KiB)

#### Arbiter (usbf\_mem\_arb)

- Ports (IDMA side, phy\_clk domain): madr[m], mdout<=sram\_din, mdin=>sram\_dout, mwe, mreq, mack
- Ports (Wishbone side, wb\_clk domain via usbf\_wb): wadr[w], wdout<=sram\_din, wdin=>sram\_dout, wwe, wre, wreq, wack
- Selection: IDMA has priority. Wishbone selected when (wreq | wack) & !mreq; otherwise IDMA selected.
- Handshakes: mack = mreq (immediate acknowledge to IDMA). Wishbone ack wack is issued (on phy\_clk) when wre is seen and no concurrent mreq; wack is held off while mreq is high.
- Address/data mux: sram\_adr/sram\_dout steer from the selected master; sram\_din is broadcast to both masters (mdout, wdout).

- Write enable: sram\_we = (wreq & wwe) when Wishbone selected; else sram\_we = (mwe & mack) when IDMA selected. Read enable is tied high.
- Clocks: Arbiter internal flops (e.g., wack\_r) run on phy\_clk. Wishbone requests (wreq/wwe/wadr/wdin) originate on wb\_clk; IDMA requests (mreq/...) on phy\_clk. The arbiter provides a simple level-based handshake across domains; system integration should ensure wb\_clk/phy\_clk relationship is acceptable for this scheme.

#### Wishbone memory window (usbf\_wb)

- Address decode: Memory window selected when `USBF\_MEM\_SEL` (default wb\_addr\_i[12] == 1). Register file when `USBF\_RF\_SEL`.
- Mapping: ma\_adr (Wishbone address) is truncated to SSRAM word address: wadr = ma\_adr[SSRAM\_HADR+2:2]. No byte enables; software must perform 32-bit accesses.
- Acknowledgment: A single acknowledge is returned per request when the arbiter grants the bus (may stall while IDMA is active).

#### DMA Engine (usbf\_idma)

- Purpose: Moves packet payloads between UTMI data streams and SSRAM buffers under control of the protocol engine.
- Addressing: IDMA uses byte address adr[SSRAM\_HADR+2:0] internally; the arbiter sees word address madr = adr[SSRAM\_HADR+2:2]. Little-endian packing/unpacking across the 32-bit word lanes (byte lane selected by adr[1:0]).
- RX path (OUT/SETUP): Assembles 32-bit words from incoming bytes; issues SSRAM writes when a word is complete or at packet end (partial word). Uses mreq/mwe handshakes. Optionally wraps to buffer start when dma\_en=1 and the end of the buffer (adr + buf\_size) is reached.
- TX path (IN): Prefetches 32-bit words from SSRAM (double-word ping-pong into rd\_buf0/rd\_buf1) and streams bytes to the packet assembler, asserting rd\_next to advance.
- Sizes and flow control: Tracks remaining byte count (sizd\_c) and user-visible size (sizu\_c). Generates idma\_done when the programmed size is exhausted (TX) or end of RX packet is reached.

#### Concurrency and policy

- Priority: IDMA always preempts Wishbone. While IDMA mreq is asserted, Wishbone memory acks are deferred.
- Latency: IDMA sees mack immediately (mack=mreq), modeling a single-cycle SSRAM access. Wishbone typically receives a one-clock phy\_clk-cycle ack when IDMA is idle; otherwise it waits.

#### Notes and limitations

- No byte write enables on SSRAM; all software writes are full 32-bit words.
- Read path is always enabled; address selection determines which location is seen.
- The arbiter implements a minimal cross-clock handshake. Deployments with unrelated wb\_clk/phy\_clk should review CDC robustness or align clocks accordingly.
- Buffer wrap (ring) behavior is enabled only when dma\_en=1 for the endpoint.

-- Wishbone Slave Interface Architecture --

The core exposes a 32-bit, classic WISHBONE slave interface with a single address region split into Register File (RF) and Buffer Memory (SSRAM) spaces.

- Bus signals: wb\_clk, rst, wb\_addr\_i[USBF\_UFC\_HADR:0], wb\_data\_i[31:0], wb\_we\_i, wb\_stb\_i, wb\_cyc\_i, wb\_data\_o[31:0], wb\_ack\_o. No sel\_i/cti\_i/bte\_i/stall\_o/err\_o/rty\_o are used or generated.
- Data width and alignment: 32-bit words; lower address bits [1:0] are ignored internally (word-aligned accesses only). Byte enables are not supported (full-word accesses).
- Address width: parameter USBF\_UFC\_HADR (final define in this file is 12), so A12:0 are

implemented.

- Region selection (A12):

- A12 = 0 (USBF\_RF\_SEL): Register File space (function + endpoint registers).
- A12 = 1 (USBF\_MEM\_SEL): Packet Buffer Memory (SSRAM) space.
- RF address decoding: usbf\_rf sees adr[6:0] = wb\_addr\_i[8:2] (word addressing). RF reads assert rf\_re; writes assert rf\_we and pass wb\_data\_i on rf\_dout.
- Buffer memory decoding: usbf\_mem\_arb uses word address wadr = wb\_addr\_i[SSRAM\_HADR+2:2]. SRAM\_HADR (final define is 9) yields 4 KiB total buffer memory ((1<<(SSRAM\_HADR+1)) \* 4 = 4096 bytes).
- Read data return: wb\_data\_o multiplexes ma\_din (memory) or rf\_din (registers) based on the address MSB (A12). Masters should sample wb\_data\_o when wb\_ack\_o is asserted.
- Transaction model: classic single transfer. A request is recognized when wb\_stb\_i & wb\_cyc\_i is high. The slave returns a single-cycle wb\_ack\_o pulse when the operation completes. No pipelining or bursts.
- Acknowledge generation and CDC: The WISHBONE front-end (usbf\_wb) runs an internal state machine in the PHY clock domain (phy\_clk) and synchronizes a single-cycle acknowledge back to the WISHBONE clock domain (wb\_clk) using a two-stage pulse synchronizer. RF accesses acknowledge after a short fixed latency; memory accesses acknowledge when the memory arbiter grants the WISHBONE port.
- Memory arbiter behavior: usbf\_mem\_arb arbitrates between the packet layer DMA port (M) and the WISHBONE port (W). The WISHBONE port is served when no DMA request is active; otherwise, WISHBONE is back-pressed until DMA deasserts its request. WISHBONE memory acks (ma\_ack) are produced from the arbiter once the W port is selected.
- Side effects: RF reads may clear internal sticky status bits as defined in usbf\_rf; writes update control, endpoint CSRs, and buffer descriptors. Memory operations only access the shared packet buffers.
- Reset: Active high rst (optionally asynchronous if USBF\_ASYNC\_RESET is defined) resets the WISHBONE interface state machine to IDLE; wb\_ack\_o is deasserted.
- Timing expectations: RF space typically acknowledges in a few wb\_clk cycles (subject to CDC), memory space acknowledges variably depending on DMA activity and PHY clock arbitration.
- Unsupported features: partial-word writes, burst/CTI/BTE cycles, error/retry responses, and stall insertion are not implemented.

-- UTMI Interface and Link State Machine --

The UTMI Interface and Link State Machine provide the on-PHY-clock bridge to a UTMI-compliant USB PHY and implement FS/HS attach, reset, suspend/resume, and high-speed negotiation. They comprise two blocks: usbf\_utmi\_if (electrical/handshake bridge) and usbf\_utmi\_ls (link state machine).

usbf\_utmi\_if (UTMI bridge on phy\_clk):

- RX path: registers RxValid, RxActive, RxError and DataIn to rx\_valid, rx\_active, rx\_err, rx\_data.
- TX path: drives DataOut with tx\_data when TxReady or tx\_first; otherwise, if the link state machine requests a K drive (drive\_k=1), outputs 0x00. TxValid is asserted when any of: tx\_valid, tx\_valid\_last, or drive\_k are true, and it remains asserted until TxReady or the K drive completes (holding rule: TxValid <= tx\_valid | tx\_valid\_last | drive\_k | (TxValid & !(TxReady | drive\_k\_r))). tx\_ready mirrors TxReady. tx\_first primes the first data byte regardless of TxReady.
- Low-level UTMI controls (outputs to PHY): XcvSelect (1=FS, 0=HS), TermSel (FS termination on/off), SuspendM (asserted while in suspend without a pending resume request, or while LineState==K), OpMode (00=normal/bit-stuff on, 10=non-driving/bit-stuff off), plus mode\_hs, usb\_reset, usb\_suspend, usb\_attached. It accepts resume\_req (device-initiated resume request) and produces suspend\_clr to indicate the core has exited suspend (top-level clears a latched resume request on suspend\_clr).

usbf\_utmi\_ls (link state machine on phy\_clk):

- Interprets UTMI LineState (J/K/SE0/SE1) and classifies idle: idle = SE0 in HS, idle = J in FS.

Measures protocol timing using counters parameterized by macros: 250 ns ticks (USBF\_T1\_PS\_250\_NS), 2.5 µs, 62.5 µs, 3.0 ms, 3.125 ms, 5.0 ms, 100 µs, 0.5 ms, 1.0 ms, 1.2 ms, 100 ms.

- States and key behavior:

- POR: Initialize FS mode (XcvSelect=FS, TermSel on, OpMode=normal), clear suspend; go to ATTACH.
- ATTACH: After ~100 ms stable presence assert usb\_attached and enter NORMAL.
- NORMAL: Monitor idle durations. If (FS) idle SE0 >2.5 µs (reset) enter RESET; if idle >3 ms enter SUSPEND (FS). If HS and idle >3 ms, drop to FS electricals and enter RES\_SUSP.
- RES\_SUSP: After >100 µs decide RESET (SE0) or SUSPEND (J).
- SUSPEND: Exit to RESET on SE0>2.5 µs. Enter RESUME on K. If idle >5 ms and a resume\_req is present, initiate RESUME\_REQUEST.
- RESUME\_REQUEST: Device-initiated resume: assert FS term, disable bit-stuffing (non-driving), after wakeup window drive RESUME\_SIG.
- RESUME\_SIG: Drive K (drive\_k=1) for >1.0 ms, then go to RESUME; RESUME clears suspend and, once SE0 seen, re-enable normal mode and proceed to RESUME\_WAIT; after >100 µs enter NORMAL.
- RESET: Assert usb\_reset, force FS electricals and non-driving mode for >1.0 ms, then SPEED\_NEG.
- SPEED\_NEG / SPEED\_NEG\_K / SPEED\_NEG\_J: Perform HS chirp (drive K, count alternating K/J detections). On six K/J toggles enter SPEED\_NEG\_HS; if SE0 long occurs, fall back to SPEED\_NEG\_FS.
- SPEED\_NEG\_HS: Select HS (XcvSelect=HS, TermSel off, bit-stuff on, mode\_hs=1). Once bus returns to idle, enter NORMAL.
- SPEED\_NEG\_FS: Select FS (XcvSelect=FS, TermSel on, bit-stuff on, mode\_hs=0), then NORMAL.
- Outputs driven per state: mode\_hs (latched FS/HS), usb\_reset (one-shot during RESET), usb\_suspend (set in SUSPEND, cleared on resume/reset entry), usb\_attached (set after ATTACH), XcvSelect/TermSel/OpMode as above, drive\_k for resume and chirp signaling, and suspend\_clr when leaving suspend.

Integration notes:

- The top-level latches an external resume\_req and clears it upon suspend\_clr. The UTMI IF/LS logic runs entirely in the PHY clock domain and provides all UTMI control pins, bus state indications, and timing-compliant resume/reset/negotiation sequences required by the USB 2.0 FS/HS protocol.

-- Interrupt Architecture --

Overview

- Two interrupt outputs: inta\_o and intb\_o (Wishbone clock domain). Each is a level-OR of endpoint interrupts and masked function-level (core) events.

Top-level composition

- inta\_o = (OR of ep[0..15].inta) OR ((int\_srcb & inta\_msk))
- intb\_o = (OR of ep[0..15].intb) OR ((int\_srcb & intb\_msk))
- int\_srcb[8:0] are function-level latched sources. inta\_msk[8:0] and intb\_msk[8:0] independently route any function source to A and/or B lines.
- int\_srca[15:0] is a live status bitmap: int\_srca[n] = epn\_inta OR epn\_intb (not latched, not maskable; read-only status).

Function-level interrupt sources (int\_srcb)

Bit -> Event (latched, clear on read of RF INT\_SRC)

- [8] USB reset detected
- [7] UTMI RxError
- [6] Device detach (usb\_attached falling edge)

- [5] Device attach (usb\_attached rising edge)
  - [4] Resume from suspend (suspend\_end)
  - [3] Enter suspend (suspend\_start)
  - [2] No-matching-endpoint on addressed token (NSE)
  - [1] PID checksum error
  - [0] Token CRC5 error
- Masking/route
- Write RF INT\_MASK to set inta\_msk[8:0] and intb\_msk[8:0]. If a bit is set in a mask, the corresponding source contributes to that line. A source may drive both lines if both masks are set.
- Clearing
- Read RF INT\_SRC (see Register access) to clear int\_srcb bits (read-to-clear). int\_srca is not cleared (it mirrors current per-endpoint line activity).

#### Per-endpoint interrupt architecture (ep0..ep15)

- Each endpoint implements 7 latched causes in int\_stat[6:0] (set in PHY/packet clock domain, reported in Wishbone domain). They are cleared by reading the endpoint INT register (read-to-clear when re asserted and adr==1 within the endpoint block).
- Cause-to-line routing is controlled per endpoint by two 6-bit enable fields written into the endpoint INT register:
  - iena[5:0] routes selected causes to the endpoint's A line (epX\_inta).
  - ienb[5:0] routes selected causes to the endpoint's B line (epX\_intb).
- Cause IDs and mask-bit mapping (note that two causes share mask bit 3):
  - int\_stat[0] Timeout -> mask bit 0
  - int\_stat[1] CRC16 error -> mask bit 1
  - int\_stat[2] Unexpected PID (UPID) -> mask bit 2
  - int\_stat[3] Buffer0 update -> mask bit 3
  - int\_stat[4] Buffer1 update -> mask bit 3 (shared)
  - int\_stat[5] Sequence error (ISO OUT) -> mask bit 4
  - int\_stat[6] OUT packet smaller than max pkt (DMA) -> mask bit 5

#### Aggregation

- usbf\_rf ORs all epX\_inta to form inta\_ep and all epX\_intb to form intb\_ep; these are then ORed with function-level masked results to produce inta\_o/intb\_o.

#### Register access (RF block, Wishbone A[12]=0 region)

- INT\_MASK: sets inta\_msk[8:0] and intb\_msk[8:0] (write). Addressed via RF adr[2:0]==2.
- INT\_SRC: read returns {3'h0, int\_srcb[8:0], 4'h0, int\_srca[15:0]}. Reading at RF adr==0x03 clears int\_srcb (read-to-clear). Endpoints are cleared by reading each endpoint's INT register (adr[1:0]==1 within the endpoint window).
- Endpoint windows: RF adr[6:2]==0x04..0x13 select EP0..EP15; inside each, adr[1:0]==1 is the INT register (read returns {..., iena, ..., ienb, ..., int\_stat}; write updates iena/ienb).

#### Behavior and timing

- inta\_o/intb\_o are level signals, synchronous to Wishbone clock (wclk). They deassert when all contributing sources are cleared or masked.
- Function-level sources are latched until RF INT\_SRC is read. Endpoint causes are latched until the corresponding endpoint INT register is read.
- Software servicing flow: on inta\_o/intb\_o, read RF INT\_SRC to handle/clear function-level events and to discover which endpoint(s) are asserting (int\_srca). Then read each active endpoint's INT register to obtain/clear its cause(s).

-- Clock and Reset Domains and CDC Strategy --

## Clocking overview

- Two primary, independent clock domains are used:
  - 1) PHY/UTMI domain (phy\_clk\_pad\_i). Hosts the UTMI interface (usbf\_utmi\_if/usbf\_utmi\_ls), the packet layer (usbf\_pl), packet decoder/encoder (usbf\_pd/usbf\_pa), packet engine (usbf\_pe), the internal DMA (usbf\_idma), and the shared SRAM arbiter side for the PHY master (usbf\_mem\_arb.m\*).
  - 2) System/WISHBONE domain (clk\_i). Hosts the WISHBONE bridge (usbf\_wb), the register file and endpoint register bank front-end and interrupt aggregation (usbf\_rf), UTMI vendor control/status registers, and the shared SRAM arbiter side for system accesses (usbf\_mem\_arb.w\* via usbf\_wb).

## Reset scheme

- A single reset input rst\_i is fanned out to both domains as rst. The PHY reset output phy\_rst\_pad\_o is a direct copy of rst\_i.
- All modules use rst synchronously to their local clock by default. If the synthesis macro USBF\_ASYNC\_RESET is defined, many always blocks switch to asynchronous reset (negedge rst). No additional reset synchronizers are inserted between domains; system integration should ensure reset assertion is asynchronous and deassertion is aligned or otherwise safe for each clock domain.

## CDC strategy and placement

- Data-path containment: All time-critical byte and packet processing, and the SRAM data path and arbitration, are kept in the PHY domain. System/WB sees memory through a bridge rather than a dual-clock RAM interface.
- WISHBONE-to-PHY bridge (usbf\_wb):
  - WB request ingress: WB strobe/cycle (wb\_stb\_i & wb\_cyc\_i) are sampled directly into the PHY clock (wb\_req\_s1) and used by a PHY-clock state machine that generates memory/register cycles. This is a level-sampled crossing (single stage), relying on the request being stable long enough.
  - ACK egress back to WB: Acknowledge (wb\_ack\_d) generated in PHY is synchronized into WB clock with a two-flip-flop chain (wb\_ack\_s1a/wb\_ack\_s2), then edge-detected to form wb\_ack\_o. This implements a standard 2-FF synchronizer for the return pulse.
  - Address/data for WB memory cycles: ma\_adr is a combinational copy of wb\_addr\_i and is consumed on the PHY side (via usbf\_mem\_arb.wadr). Validity relies on the usbf\_wb state machine holding the request while address/data remain stable; no explicit multi-FF synchronizer is used on address/data.
- Memory arbitration (usbf\_mem\_arb): Entirely in PHY clock. The system (WB) side is presented as a PHY-clocked requester (w\* signals) driven by usbf\_wb; hence the actual SRAM interface has no dual-clock crossing.
- UTMI control/status:
  - resume\_req: Synchronized into PHY clock within usbf\_utmi\_ls via a 2-FF synchronizer (resume\_req\_s1/resume\_req\_s). The corresponding suspend\_clr generated in the PHY domain is sampled in the WB domain without an explicit 2-FF synchronizer and only used to clear resume\_req.
  - Vendor control writes: Written in WB domain (utmi\_vend\_ctrl\_r / utmi\_vend\_wr\_r) then sampled once into PHY domain (single-FF stage). Intended as slow control; pulses should be wide enough to be seen by the PHY clock.
  - Vendor status reads: UTMI vendor status is captured in PHY, then re-sampled once in WB domain (single-FF stage).
- Register file and endpoint register bank (usbf\_rf/usbf\_ep\_rf):
  - The endpoint control/status registers and per-endpoint DMA/bookkeeping live in the PHY domain. Interrupt outputs inta/intb are produced in the WB clock domain inside usbf\_ep\_rf by reading PHY-domain status; this is implemented via direct sampling (single-FF or combinational read across domains), assuming level stability over several cycles.
  - Global RF status/interrupt bits (attach, deattach, USB reset, CRC/sequence errors, suspend start/end) are latched in the WB domain by sampling PHY-domain signals using single-FF stages; these are level-type events with clear-on-read.

## Assumptions and integration guidance

- The design primarily uses: (a) 2■FF synchronizers for specific control pulses (e.g., resume\_req ack path), (b) request/ack level handshakes where one side holds a level until the other domain observes it, and (c) many single■stage, level■sampled crossings for slow■moving status/interrupt/control bits.
- No explicit dual■clock FIFOs are used. No timing relationship between clk\_i and phy\_clk\_pad\_i is required by construction, but because several crossings are single■stage, integrators should ensure that level■type signals remain stable for multiple cycles of the receiving clock, or strengthen CDC (add 2■FF synchronizers) if targeting strict CDC sign■off.
- If USBF\_ASYNC\_RESET is enabled, assertion is asynchronous per block; ensure reset release meets each domain's clocking requirements.

## -- Scalability and Parameterized Resources --

The core is designed to scale primarily through two mechanisms: (1) parameterized buffer memory addressing and (2) compile-time selectable endpoints. Key aspects: 1) Buffer memory depth: Parameter SSRAM\_HADR (default from `USBF\_SSRAM\_HADR`) sets the external SRAM address width and scales all related address fields and counters across the DMA path. - External SRAM address bus: sram\_adr\_o[SSRAM\_HADR:0]. - Wishbone-to-SRAM addressing: wadr = ma\_adr[SSRAM\_HADR+2:2] (word addressing). - Internal buffer descriptors (EP Buf0/Buf1) carry address fields sized to [SSRAM\_HADR+2:0]; software must align descriptor packing to the chosen SSRAM\_HADR. - Memory size formula (bytes):  $4 * 2^{(SSRAM\_HADR+1)}$ . Changing SSRAM\_HADR updates usbf\_pl/usbf\_pe/usbf\_idma/usbf\_mem\_arb consistently via parameter propagation from usbf\_top. 2) Endpoint scalability (0..15): EP0 is always present; EP1..EP15 are compile-time selectable via `USBF\_HAVE\_EPn` macros. Enabled endpoints instantiate usbf\_ep\_rf; disabled endpoints instantiate usbf\_ep\_rf\_dummy, preserving vector widths and tie-offs. The design enforces contiguous EP enables (1..N) with an initial-time check. DMA interface scales to 16 per-EP request/ack bits: dma\_req\_o[15:0] and dma\_ack\_i[15:0] (unused bits are tied off by dummy EPs). 3) Wishbone address scalability: The Wishbone address width is set by `USBF\_UFC\_HADR`. Memory vs register-file selection is controlled by macros `USBF\_MEM\_SEL` and `USBF\_RF\_SEL` (in this file, last defines select wb\_addr\_i[12]). Adjusting `USBF\_UFC\_HADR` and these select macros together moves the split point between RF and memory spaces without RTL changes. 4) Mode scalability at runtime: High-Speed vs Full-Speed behavior is handled dynamically (mode\_hs) in UTMI/Link state machines and protocol engines; no compile-time sizing impact. 5) Build-time feature toggles: `USBF\_ASYNC\_RESET`, `USBF\_DEBUG`, and `USBF\_TEST\_IMPL` gate reset style and diagnostics without changing functional resource sizing. 6) Integration considerations when scaling: - If you change SSRAM\_HADR, ensure external SRAM depth matches, and update software to encode/decode EP buffer descriptors using the new [SSRAM\_HADR+2:0] address field width. - If you enable more endpoints, ensure corresponding DMA ack bits are driven and that system software maps their RF windows. - Keep `USBF\_UFC\_HADR`, `USBF\_RF\_SEL`, and `USBF\_MEM\_SEL` consistent; the last macro definitions take precedence in this file. Default configuration in this source sets SSRAM\_HADR=9 ( $\approx 4$  KiB buffer), `USBF\_UFC\_HADR`=12, and enables EP1..EP3; others synthesize as dummies.

## OPERATION

### -- Power-Up, Reset, and Initialization --

## Overview

- The core has two clock domains: system (clk\_i) and PHY/UTMI (phy\_clk\_pad\_i). A single reset input (rst\_i) fans out to all blocks; phy\_rst\_pad\_o mirrors rst\_i to the PHY.
- By default, resets are synchronous to their respective clock domains. If USBF\_ASYNC\_RESET is defined at synthesis, resets become asynchronous.

### Reset assertion (rst\_i = 1)

- All FSMs return to IDLE/known states. No memory or bus transactions are issued.
- UTMI interface (usbf\_utmi\_if/usbf\_utmi\_ls):
  - State machine enters POR. XcvSelect is set for Full-Speed (FS), TermSel asserted (FS termination on), mode\_hs = 0, usb\_attached = 0, usb\_suspend = 0.
  - TxValid is deasserted; no transmit activity. drive\_k = 0.
  - suspend\_clr is inactive.
- Top-level outputs/flags:
  - susp\_o reflects usb\_suspend (low on reset).
  - resume\_req\_r (internal latch of resume\_req\_i) is cleared.
- Wishbone interface (usbf\_wb):
  - State = IDLE; wb\_ack\_o deasserted. No RF/MEM requests.
- Register file (usbf\_rf):
  - funct\_adr = 0.
  - Interrupt masks = 0; all RF interrupt source bits cleared; inta\_o/intb\_o low.
  - Vendor control/output (utmi\_vend\_ctrl/utmi\_vend\_wr) cleared.
  - rf\_resume\_req cleared.
- Endpoints (usbf\_ep\_rf, all implemented EPx):
  - CSR fields reset to 0 (dir = control, type = control, DMA disabled csr[15] = 0, max packet = 0, status bits = 0). uc\_bsel = 0, uc\_dpd = 0.
  - buf0 = buf1 = 0xFFFF\_FFFF (unallocated). Endpoint-matching/interrupt status cleared. DMA request outputs = 0.
- Protocol/link (usbf\_pl, usbf\_pe, usbf\_pd, usbf\_pa, usbf\_idma):
  - All protocol engines IDLE. No tx/rx activity, no token generation, no DMA.
- Memory subsystem (usbf\_mem\_arb/usbf\_idma):
  - No writes; sram\_we\_o = 0. sram\_re\_o held high by design but no transactions occur without requests. All DMA address/size counters reset.

### Power-up and attachment (VBUS present)

- With usb\_vbus\_pad\_i asserted and rst\_i released, the UTMI line-state controller sequences: POR -> ATTACH -> NORMAL.
- In ATTACH it waits approximately 100 ms before asserting usb\_attached and entering NORMAL.
- Speed negotiation and reset handling:
  - SE0 > ~1 ms causes RESET; controller performs HS chirp and then selects HS or FS, updating mode\_hs, XcvSelect, and TermSel accordingly.
  - Suspend is detected from extended idle; resume signaling (drive K) is generated as required. On exiting suspend, suspend\_clr is pulsed.

### Resume request handling

- External resume\_req\_i is level-latched into resume\_req\_r and automatically cleared on suspend\_clr, ensuring a complete resume sequence.

### USB bus reset indication

- Upon detecting a USB bus reset, the RF "core" interrupt source bit (int\_srcb[8]) is set and remains set until the software reads the interrupt source register (read-to-clear).

### Software initialization sequence (typical)

- After rst\_i deassertion and usb\_attached asserted (and/or after a USB reset is observed):
  - 1) Program the Function Address (RF register index 0x1).
  - 2) For each used endpoint, program EP CSR (type, direction, max packet), initialize buf0/buf1 descriptors (base address, size, initial DPID/BSEL), and only then set csr[15] = 1 to enable DMA (if used).
  - 3) Configure interrupt masks; clear pending interrupt sources by reading the RF interrupt source register.
  - 4) Ensure the external SRAM buffer space is initialized before enabling traffic. Uninitialized (0xFFFF\_FFFF) buffer descriptors are treated as not allocated and will cause NAK/NYET per protocol engine rules.

### Operational guarantees after initialization

- Endpoints NAK until valid buffers are allocated (and for control EP, until appropriate stage is active).
- No outstanding Wishbone or memory transactions exist unless initiated by software or the protocol engine.

### Reset requirements

- Hold rst\_i asserted until both clk\_i and phy\_clk\_pad\_i are stable.
- Reasserting rst\_i at any time returns the core to the safe, quiescent defaults described above in both clock domains (sync or async as built).

-- Device Attach/Detach and Speed Negotiation --

### Overview

- The USB function core implements device attach/detach detection and high-speed/full-speed (HS/FS) selection/negotiation inside the UTMI line-state controller (module usbf\_utmi\_ls), accessed through usbf\_utmi\_if and exposed at the top level (usbf\_top).

### Key UTMI signals and controls

- Inputs: LineState[1:0] (UTMI bus), usb\_vbus, phy clock, rst, resume\_req.
- Derived line states: SE0=00, J=01, K=10, SE1=11. Idle is J in FS, SE0 in HS.
- Outputs to PHY: XcvSelect (FS=1, HS=0), TermSel (FS termination), OpMode (bit-stuff enable/disable), SuspendM, drive\_k (used for K signaling during resume and HS chirp).
- Status out: mode\_hs (1=HS, 0=FS), usb\_reset, usb\_suspend, usb\_attached.

### Device attach

- Power-on/reset (POR) entry conditions: rst deasserted or a usb\_vbus event forces the state machine into POR.
- POR actions: select FS (XcvSelect<=FS), enable FS termination (TermSel<=1), clear usb\_attached, clear suspend, then transition to ATTACH.
- ATTACH timing: wait ~100 ms (macro USBF\_T2\_C\_100\_MS via internal counters). After the delay, set usb\_attached=1 and transition to NORMAL operation.
- System-visible attach event: usbf\_rf edge-detects usb\_attached and latches an “attach” interrupt source (int\_srcb[5]).

### Device detach

- A VBUS change that drives the controller back to POR clears usb\_attached to 0.
- System-visible detach event: usbf\_rf detects the falling edge of usb\_attached and latches a “deattach” interrupt source (int\_srcb[6]).

### Reset detection and entry to negotiation

- In NORMAL (operational) state, the controller monitors line idle duration using programmable timing (USBF\_T1\_\* macros). When FS and a reset (SE0) condition is observed for the required interval, the state machine enters RESET.
- RESET actions: assert usb\_reset, force FS electricals (XcvSelect<=HS select for chirp drive plus TermSel<=1; bit-stuff off), then after ~1 ms (USBF\_T2\_C\_1\_0\_MS) transition to SPEED\_NEG.

High-speed negotiation (chirp) and fallback to FS

- SPEED\_NEG: drive K (drive\_k asserted) and initialize chirp counting; after ~1.2 ms (USBF\_T2\_C\_1\_2\_MS) start sampling host chirps.
- SPEED\_NEG\_K / SPEED\_NEG\_J: alternate states that count received K/J chirps from the host; each valid opposite-state transition increments chirp\_cnt.
- HS entry criterion: on reception of 6 valid K/J chirps (chirp\_cnt==6), enter SPEED\_NEG\_HS.
- SPEED\_NEG\_HS actions: enable bit-stuffing (OpMode), select HS transceiver (XcvSelect<=HS), disable FS termination (TermSel<=0), set mode\_hs=1; then transition to NORMAL when SE0 long is observed.
- FS fallback criterion: if a long SE0 occurs during negotiation before 6 chirps are counted, enter SPEED\_NEG\_FS.
- SPEED\_NEG\_FS actions: enable bit-stuffing, select FS transceiver (XcvSelect<=FS), enable FS termination (TermSel<=1), set mode\_hs=0; then transition to NORMAL.

Suspend/Resume notes (interaction with attach/negotiation)

- The controller detects idle durations (USBF\_T1\_\* macros) to enter SUSPEND; in suspend it may generate resume signaling on resume\_req via RESUME\_REQUEST/RESUME\_SIG (drive\_k asserted for resume K).
- On exit from suspend (RESUME), if in HS the controller restores HS settings; otherwise it restores FS.

Top-level observability and integration

- usbf\_top propagates: mode\_hs, usb\_reset, usb\_suspend, usb\_attached to the protocol layer; and captures attach/detach as interrupt sources in usbf\_rf (maskable via register interface).
- The protocol layer (usbf\_pl) uses mode\_hs and usb\_reset to adapt packet timing and state; attach status can be read and its edges can generate interrupts.

Timing constants

- All timing thresholds are parameterized via macros:
- T1\_\* (e.g., USBF\_T1\_C\_2\_5\_US, \_3\_0\_MS, \_3\_125\_MS, \_5\_MS) for idle-based decisions (reset/suspend heuristics).
- T2\_\* (e.g., USBF\_T2\_C\_2\_5\_US, \_100\_US, \_1\_0\_MS, \_1\_2\_MS, \_0\_5\_MS tick basis, \_100\_MS) for reset/negotiation windows, wakeup and attach delays.

Compliance summary

- After attach (VBUS presence), the device waits ~100 ms before declaring attached.
- On reset, the device performs HS chirp negotiation per USB 2.0: drives K, counts up to 6 K/J chirps from host; selects HS on success, otherwise falls back to FS.
- Electrical controls (XcvSelect, TermSel, OpMode) are driven appropriately for FS vs HS during attach, negotiation, and normal operation.
- Attach/detach edges are latched and exposed to system software via interrupt status bits.

-- Function Address Assignment --

Function Address Assignment defines how the USB device address (7-bit) is stored and used for token matching. The core exposes a 32-bit register in the RF (register file) main page at word offset 0x01 (internal RF address adr==7'h01). Reset value is 0x00000000. Bits [6:0] hold the function address

(0–127); bits [31:7] are reserved (read as 0, write as 0).

Write behavior: On a WISHBONE write to RF offset 0x01 with WE=1, the core latches din[6:0] into funct\_adr. This write occurs on the system/bus clock (wclk). There is no deferred activation: the new address takes effect immediately in the packet processing pipeline.

Use in packet matching: The packet layer compares the received token address (token\_fadr) against funct\_adr (fa) to produce fsel = (token\_fadr == fa). Endpoint activity only proceeds when match\_o = fsel & endpoint match & token\_valid & !crc5\_err & not a prohibited PID. Thus, after assignment, the core responds only to tokens addressed to funct\_adr.

Timing/clock domains: funct\_adr is written in the WISHBONE clock domain (wclk) and consumed in the PHY/packet domain (clk/phy\_clk) without an explicit synchronizer. System software must update funct\_adr at a quiescent time (e.g., after completing the Status stage of SET\_ADDRESS) to avoid transient mismatches during in-flight tokens.

Reset/USB reset: The register clears to 0 only on core reset (rst). It is not automatically cleared on a USB bus reset event (usb\_reset). Firmware should set funct\_adr to 0 upon detecting a USB reset (int source provided by rf.int\_srcb[8]) to comply with the USB specification.

Software model:

- Read @ RF[0x01]: returns {25'h0, funct\_adr}.
- Write @ RF[0x01]: din[6:0] becomes funct\_adr; din[31:7] ignored.
- Program flow for SET\_ADDRESS: handle the control transfer; after the Status stage completes, write the assigned address to RF[0x01].

-- Token Detection and Endpoint Matching --

Overview: Incoming UTMI bytes are decoded into USB PIDs and token fields, validated (PID complement and CRC5), then qualified against the device Function Address and the configured Endpoint numbers. Only qualified tokens are allowed to drive the protocol engine; others are ignored and/or flagged as errors.

### 1) PID detection and validation (usbf\_pd)

- Latching: On rx\_active & rx\_valid, the first byte (PID) is latched into pid.
- PID integrity: pid\_cks\_err asserts if pid[3:0] != ~pid[7:4] (exposed at top as pid\_cs\_err).
- PID classification: Individual one-hot flags for OUT/IN/SOF/SETUP, DATA0/1/2/MDATA, ACK/NACK/STALL/NYET, PRE/ERR/SPLIT/PING.

### 2) Token field extraction (usbf\_pd)

- Sequencing: Token bytes token0 then token1 are captured in TOKEN state.
- token\_valid: Asserts after the second token byte or when an ACK is seen (got\_pid\_ack). Used to time address/endpoint checks.
- Field mapping: token\_fadr[6:0] = token0[6:0]; token\_endp[3:0] = {token1[2:0], token0[7]}; frame\_no = {token1[2:0], token0} for SOF.
- CRC5 check: Computed over {token\_fadr, token\_endp}, inverted, then compared with token1[7:3]. crc5\_err asserts on mismatch.

### 3) Token filtering before match (usbf\_pl)

- pid\_bad: Filters out handshake PIDs (ACK/NACK/STALL/NYET), PRE/ERR/SPLIT always, and PING when not in HS mode.
- Only non-handshake, valid tokens proceed: gating conditions require !pid\_bad & token\_valid & !crc5\_err.

#### 4) Function Address match (usbf\_pl + usbf\_rf)

- fa (funct\_adr) is a writeable 7-bit register in usbf\_rf (WB space adr==7'h1) holding the device address.
- fsel = (token\_fadr == fa). Only tokens addressed to this function are processed.

#### 5) Endpoint match (usbf\_rf + usbf\_ep\_rf)

- ep\_sel (from token\_endp) is presented to all 16 endpoint register files.
- Each endpoint compares ep\_sel to its configured endpoint number csr[21:18]. When equal, that endpoint's ep\_match asserts.
- rf.match is the OR of all epX\_match, registered (match\_r1). Disabled endpoints (dummy instances) never match.
- The endpoint's CSR/buffer context (csr, buf0, buf1, and DMA hints) corresponding to the matching ep are multiplexed to the protocol engine.

#### 6) Final match qualification into Protocol Engine (usbf\_pl)

- match\_o = !pid\_bad & fsel & match & token\_valid & !crc5\_err.
- match\_o is fed to usbf\_pe as match. If a directed token (OUT/IN/SETUP) fails endpoint match, usbf\_pe sets nse\_err (No Such Endpoint): nse\_err <= token\_valid & (pid\_OUT | pid\_IN | pid\_SETUP) & !match.

#### 7) Timing and domains

- Detection (usbf\_pd), filtering, and match qualification (usbf\_pl) run on the UTMI clock (phy\_clk\_pad\_i).
- Endpoint comparison logic inside usbf\_rf/usbf\_ep\_rf also clocks on phy\_clk for match paths; WB programming uses wclk.

Result: A token is considered valid-and-directed when PID is consistent, CRC5 passes, it is addressed to this function (fa), and its endpoint number matches a configured endpoint. Only then does match\_o assert, enabling the protocol engine for the transaction; errors (pid\_cs\_err, crc5\_err, nse\_err) annotate invalid/misdirected tokens.

-- IN Transaction Flow --

#### IN Transaction Flow (device-to-host data stage)

##### Entry conditions

- Trigger: Receipt of a valid IN token for the addressed endpoint (match asserted) and not pid\_SOF.
- Endpoint gating in IDLE state (usbf\_pe):
  - If endpoint stalled (csr[23:22] == 2'b10) -> send STALL token and return to IDLE.
  - If no buffer available or DMA cannot serve (e.g., both buffers NA, or DMA in-buffer insufficient) -> send NAK token and return to IDLE.
- Otherwise, if endpoint direction/config allows IN (IN\_ep or CTRL\_ep with IN phase), proceed with an IN transfer.

##### Initiation (state IN)

- tx\_dma\_en is asserted to start fetching payload from buffer memory via IDMA (usbf\_idma).
- If MaxPacketSize (csr[10:0]) == 0, send\_zero\_length is asserted (zero-length DATA with only CRC).
- The DATA PID to use (DATA0/1/2/MDATA) is selected by data\_pid\_sel, computed from endpoint type (iso/bulk/interrupt/control), HS/FS mode, transfer framing, and the current/next data toggle (uc\_dpd); allow\_pid reflects received DATAx for validation.
- Starts ACK wait timer: rx\_ack\_to\_clr\_d deasserted to enable rx\_ack\_to counting.
- Transition: When IDMA signals idma\_done:
- Isochronous (csr[25:24] == 2'b01): skip ACK wait -> UPDATE.

- Others (bulk/int/control): go to IN2 to await host ACK.

#### Data transmission path

- usbf\_idma reads from buffer at adr/size determined by usbf\_pe, honoring buf\_size and DMA enable.
- usbf\_pa emits: DATA PID -> payload bytes -> CRC16. It handles zero-length case by emitting only CRC after PID.
- send\_data drives pa; rd\_next handshakes the byte stream; tx\_valid/tx\_first/tx\_valid\_last are generated to UTMI MAC.

#### ACK handling (state IN2, non-iso endpoints)

- Waits for token\_valid && pid\_ACK from host.
- rx\_ack\_to timeout uses mode-dependent constants (HS: USBF\_RX\_ACK\_TO\_VAL\_HS, FS: USBF\_RX\_ACK\_TO\_VAL\_FS). If timeout (rx\_ack\_to asserted) occurs before ACK, aborts to IDLE (no buffer pointer/toggle update).
- On ACK reception, proceeds to UPDATE.

#### Buffer/accounting update (state UPDATE then UPDATE2)

- int\_set\_en asserted to enable endpoint interrupt cause generation.
- If buffer\_done && dma\_en:
- buf0\_rl\_d asserted to release the consumed buffer (toggle buffer ownership in ring mode).
- Else:
- buf\_set\_d asserted to write back updated buffer address/size and next data toggle/buffer select into idin; buf0\_set or buf1\_set strobes choose which buffer gets updated.
- idin contents:
  - [31:17] = {buffer\_done, new\_size} (or {0, szu\_c} on certain conditions like out\_to\_small, not used for IN data writes).
  - [SSRAM\_HADR+2:4] = new\_adr (next buffer address in words).
  - [3:0] = {next\_dpid, next\_bsel} (encodes next DATA PID and buffer select).
- UPDATE2:
- uc\_bsel\_set and uc\_dpd\_set asserted to commit next buffer select and data PID toggle into csr.
- Returns to IDLE.

#### Error/abort behavior

- If a new token match occurs while busy (match & state != IDLE) or other error conditions, abort is asserted, forcing return to IDLE.
- For IN path specifically:
  - Isochronous: no ACK wait; always updates buffer/toggle in UPDATE.
  - Bulk/Interrupt/Control: missing ACK (timeout) ends transfer without UPDATE (no toggle/buffer update), leaving state for reattempt on a subsequent IN token.

#### Key interactions and constraints

- Buffer selection: buf\_sel = 0 when DMA enabled; otherwise based on uc\_bsel and buffer availability. For IN+DMA, buf0 is preferred unless endpoint configuration dictates otherwise via uc\_bsel.
- Size selection: size\_next = min(buf\_size, MaxPacketSize). Zero-length handling if MaxPacketSize==0.
- Handshakes prior to IN data stage (in IDLE): STALL or NAK are sent via send\_token/token\_pid\_sel when endpoint is stalled or lacks resources.
- Timing: ACK wait cleared on tx\_valid activity and restarted per transaction; counts in PHY clock domain.

#### Signals of interest

- Entry/decision: match, IN\_ep/CTRL\_ep, ep\_stall, no\_buf0\_dma, buf0\_na/buf1\_na, mode\_hs.

- Start: tx\_dma\_en, send\_zero\_length, data\_pid\_sel.
- IDMA path: adr, size, dma\_en, idma\_done, rd\_next, tx\_data\_st, send\_data.
- ACK wait: rx\_ack\_to\_clr\_d, rx\_ack\_to, token\_valid, pid\_ACK.
- Update: int\_set\_en, buf0\_rl, buf0\_set, buf1\_set, idin, uc\_bsel\_set, uc\_dpd\_set.
- Interrupts: int\_buf0\_set/int\_buf1\_set asserted in UPDATE when buffer\_done and buffer allocated.

#### Outcome

- On successful IN transaction (data sent and ACKed, or any iso IN): data toggle and buffer bookkeeping are updated, and buffer may be released or advanced; endpoint interrupt sources are generated accordingly. On NAK/STALL gating or ACK timeout, no state updates are committed.

-- OUT and SETUP Transaction Flow --

Scope: OUT and SETUP packet handling inside the USB function core protocol/link layer (modules usbf\_pl/usbf\_pe/usbf\_idma/usbf\_pd). This describes how OUT and SETUP tokens are accepted, data are moved to memory, handshakes are generated, and endpoint/buffer state is updated.

#### Token gating and endpoint match:

- A token is considered for processing only if: token\_valid=1 with no CRC5 error, PID is not a non-data/bad PID (not ACK/NACK/STALL/NYET/PRE/ERR/SPLIT; PING is separate), device address matches (fsel), and the endpoint number matches a configured EP (match from register file). If a valid OUT/IN/SETUP token does not match, nse\_err is raised.

#### Entry to OUT/SETUP flow:

- Control endpoint (CTRL\_ep): SETUP and OUT tokens go through the OUT data path; control endpoints never use DMA (dma\_en=0 for CTRL).
- Non-control OUT endpoint: OUT token goes through the OUT data path. When a matching, enabled endpoint is selected and not stalled, the protocol engine asserts rx\_dma\_en and transitions to OUT state.

#### Data reception and integrity:

- usbf\_pd streams received bytes and computes CRC16. It asserts rx\_data\_valid and eventually rx\_data\_done when the payload ends. usbf\_idma writes payload into the buffer/memory designated by the current buffer descriptor, accumulating the received byte count szu\_c.
- If a payload CRC16 error occurs (crc16\_err=1) or a data timeout occurs while expecting data (tx\_data\_to=1) or an abort condition is detected, the transfer is dropped and the state machine returns to IDLE without sending a handshake (ISO and non-ISO alike), per USB rules for CRC errors.

#### Handshake policy after data (non-isochronous):

- After rx\_data\_done, the engine evaluates conditions in OUT2A/OUT2B to decide the handshake:
- Size policy checks: max packet size is csr[10:0]. If sml\_ok=0 and received size < max, or lrg\_ok=0 and received size > max, a NAK is returned and the transfer is discarded (to\_small/to\_large).
- PID sequence error (unexpected DATA PID relative to toggle state): a sequence error sets int\_seqerr\_set; for non-ISO endpoints the core sends ACK and returns to IDLE (ISO discussed below).
- High-speed flow control: if in HS mode and no\_bufs=1 (no buffer available for next transaction), return NYET; otherwise return ACK.
- PING is handled separately (not part of OUT and SETUP data phase), but note HS PING gets ACK/NACK based on buffer readiness; OUT handshake is generated only after data as described above.

#### Isochronous OUT:

- For isochronous endpoints (txfr\_iso=1), after rx\_data\_done there is no handshake. If a PID sequence

error is detected, an interrupt (int\_seqerr\_set) is generated, but no token is sent. The flow goes to an update stage to commit descriptors.

Buffer selection and DMA/data movement:

- Buffer descriptors buf0/buf1 carry address, size, and meta (including DP/BS). For DMA-enabled non-control endpoints, buf\_sel is forced to buf0 during the transaction; for non-DMA/control, buf\_sel follows endpoint/control rules.
- usbf\_idma writes received bytes into memory at adr, increments MADR on each accepted word, and tracks szu\_c (received bytes). It can also transmit (IN), but for OUT/SETUP only the receive path is used.

Update phase and descriptor/DP/BS maintenance:

- After completing the data and handshake decision, the engine enters UPDATE/UPDATE2:
- If buffer\_done=1 and DMA is enabled: assert buf0\_rl to release/rotate the buffer (ring advance).
- Otherwise: compute new\_size and new\_adr based on szu\_c (or max packet size for certain DMA cases) and program idin with updated fields. Assert buf0\_set or buf1\_set (depending on which buffer was used) to write the descriptor with the new size and address tail. Short OUT in DMA mode sets out\_to\_small and writes the original size/location.
- Update and publish next toggle and buffer select (uc\_dpd\_set and uc\_bsel\_set), and raise the appropriate buffer interrupt (int\_buf0\_set/int\_buf1\_set) when buffer is consumed.

Flow-control prechecks and stalls:

- If the endpoint is stalled, the core responds with STALL (handshake) and returns to IDLE.
- If both buffers are not available, or (in HS) DMA indicates insufficient space for at least one max packet (no\_buf0\_dma/dma\_out\_buf\_avail=0), the core flow-controls the host. For OUT this results in returning NAK/NYET as described above after data; for HS PING the immediate ACK/NAK is returned.

Abort conditions:

- The active transaction is aborted if: buffer\_overflow is detected, a matching token arrives while the endpoint state machine is not in IDLE (new transaction preempts), or a to\_large condition is detected mid-flow. Abort drops the packet and returns to IDLE without handshake.

SETUP specifics (control):

- SETUP tokens are treated as OUT to the control endpoint (CTRL\_ep). The state machine routes them into the OUT path (rx\_dma\_en asserted). Data PID handling for control is forced so that SETUP stage uses DATA0 and internal next/this PID tracking updates accordingly. Control endpoints do not use DMA; descriptor and DP/BS updates are applied via buf0\_set/buf1\_set and uc\_\*\_set in UPDATE/UPDATE2.

Interrupts and status:

- The engine asserts per-endpoint status/interrupt bits: int\_seqerr\_set on PID sequence error, int\_to\_set on timeouts (IN/OUT), int\_crc16\_set on CRC error (at data\_done), and buffer interrupts on buffer commit. out\_to\_small records short OUTs in DMA mode. nse\_err is asserted when an OUT/IN/SETUP token is received for a non-matching endpoint or address.

-- Ping-Pong Buffer Management and Data PID Handling --

Overview

- Each endpoint has two 32-bit buffer descriptors (B0/B1) and endpoint control/state in CSR. This block manages pingpong (double) buffering and Data PID sequencing for IN/OUT/CTRL transfers in both FS and HS modes.

Buffer Descriptor Format and Status

- $\text{bufX}[31] = \text{NA}$  (Not Available) flag. Additionally, a descriptor is considered not allocated if its base address field is all ones.
- $\text{bufX}[\text{SSRAM\_HADR}+2:0]$  = Base address (word+byte address; low nibble holds byte address).
- $\text{bufX}[30:17]$  = Remaining size (bytes) of this buffer region.
- $\text{uc\_bsel}$  (CSR[31:30]) and  $\text{uc\_dpd}$  (CSR[29:28]) are maintained in the endpoint CSR, not inside the buffer descriptor. They are updated from the IDIN bus when  $\text{uc\_bsel\_set}/\text{uc\_dpd\_set}$  strobe.

#### Active-Buffer Selection (buf\_sel)

- DMA enabled non-control endpoints (CSR.DMA\_EN=1 and DIR!=CTRL): Always use B0.
- Control endpoints (DIR=CTRL): Use B0 for OUT/SETUP stages, B1 for IN stages ( $\text{buf\_sel} = \text{in\_token}$ ).
- All other cases (non-DMA, non-CTRL): Choose B1 when ( $\text{uc\_bsel}[0] == 1$ ) or B0 is NA, and B1 is available; otherwise choose B0. This realizes ping-pong by toggling  $\text{uc\_bsel}$  when a buffer completes.

#### Transfer Accounting and Completion

- $\text{MaxPacket} = \text{CSR}[10:0]$ . For each transaction the engine computes:
- $\text{size\_next} = \min(\text{buf\_size}, \text{MaxPacket})$
- $\text{new\_size} = \text{buf\_size} - \text{bytes\_transferred}$
- $\text{new\_adr} = \text{adr} + \text{bytes\_transferred}$
- Bytes transferred:
  - IN:  $\text{size\_next}$
  - OUT (DMA path):  $\text{MaxPacket}$
  - OUT (non-DMA path):  $\text{sizu\_c}$  (measured received bytes)
- Buffer completion (buffer\_done) criteria:
  - IN:  $\text{new\_size} == 0$  (buffer empty)
  - OUT:  $\text{new\_size} < \text{MaxPacket}$  (buffer considered full enough for a packet)

#### Descriptor Write-Back and Ping-Pong Update

- On UPDATE state:
  - If DMA enabled and buffer\_done: assert  $\text{buf0\_rl}$  to reload B0 from its original descriptor ( $\text{buf0\_orig}$ ). This creates a cyclic, constant descriptor for DMA ring operation.
  - Else: assert  $\text{bufX\_set}$  ( $X = \text{selected buffer}$ ) to write back descriptor with updated address and size.
- On UPDATE2 state: assert  $\text{uc\_bsel\_set}$  and  $\text{uc\_dpd\_set}$  to commit next ping-pong pointer and next Data PID to CSR.
- Ping-pong pointer update (next\_bsel):
  - DMA endpoints: forced to 0 (no ping-pong).
  - Non-DMA:  $\text{uc\_bsel}$  increments when buffer\_done; otherwise it holds.

#### IDIN Bus Composition (write-back vector)

- IDIN[31:17]  $\leq \{\text{buffer\_done}, \text{new\_size}\}$  by default. For OUT with DMA when the packet is smaller than MaxPacket (out\_to\_small), it carries  $\{4'h0, \text{sizu\_c}\}$  instead.
- IDIN[SSRAM\_HADR+2:4]  $\leq \text{new\_adr}[\text{MSBs}]$  (or original adr when out\_to\_small).
- IDIN[3:0]  $\leq$
- During  $\text{bufX\_set}$ :  $\text{new\_adr}[3:0]$  (address low bits), writing the descriptor.
- During  $\text{uc\_bsel\_set}/\text{uc\_dpd\_set}$ :  $\{\text{next\_dpid}[1:0], \text{next\_bsel}[1:0]\}$ , updating CSR.

#### Flow Control, NAK/NYET Policy, and Interrupt Tags

- If endpoint is STALLED: return STALL.
- If both buffers are NA, or DMA resources are not ready (IN and !dma\_in\_buf\_sz1, or OUT and !dma\_out\_buf\_avail), or CTRL-phase buffer is missing: return NACK from IDLE.
- For HS OUT completion (OUT2B):
  - If no buffers available: return NYET; else ACK.

- If RX Data PID sequence error: ACK but raise int\_seqerr\_set.
- int\_buf0\_set/int\_buf1\_set pulse when a buffer transitions through completion (buffer\_done) and is eligible; these are qualified by allocation status to avoid reporting on unallocated buffers.
- out\_to\_small pulses for DMA OUT when received size != MaxPacket; buf1 is updated with size and an interrupt can be generated.

#### DMA Aids (computed in endpoint RF)

- dma\_in\_buf\_sz1: IN has at least one MaxPacket worth of data ready in B0 (and MaxPacket != 0).
- dma\_out\_buf\_avail: OUT has at least one MaxPacket worth of free space remaining in current OUT buffer.

#### Data PID Generation and Checking

- The engine keeps a 2-bit current Data PID state (uc\_dpd) per endpoint in CSR and computes:
- this\_dpid: expected PID for the current transfer (drives data\_pid\_sel for transmit; used to check received PIDs).
- next\_dpid: PID for the next transaction, written back to uc\_dpd during UPDATE2.
- Selection logic depends on:
  - Endpoint direction (CSR[27:26]), type/transfer attributes (CSR[25:24], CSR[12:11]), high-speed vs full-speed (mode\_hs), control-stage (in\_token, out\_token, setup\_token), and observed HS multi-data patterns (MDATA/DATA2 rules per USB2 bulk HS).
- pid\_seq\_err is asserted when a received DATA PID doesn't match this\_dpid. For iso OUT it raises int\_seqerr\_set; for bulk OUT the state machine ACKs and completes while flagging the error.

#### Special Cases

- Control endpoints: data PID handling resets appropriately around SETUP/IN/OUT phases; buffer selection is fixed per phase (B0 for OUT/SETUP, B1 for IN).
- Zero-length IN: If MaxPacket == 0, send\_zero\_length is asserted to transmit ZLP.

#### Summary of Handshake Outputs

- token\_pid\_sel: ACK/NACK/STALL/NYET chosen per endpoint state, buffer availability, error conditions, and HS vs FS rules.
- data\_pid\_sel: DATA0/DATA1/DATA2/MDATA chosen by this\_dpid for transmit.

#### Key Signals

- buf0\_set, buf1\_set: Write back updated buffer descriptor.
- buf0\_rl: Reload B0 from its original CPU-programmed descriptor (DMA ring).
- uc\_bsel\_set, uc\_dpd\_set: Update pingpong pointer and Data PID state in CSR.
- int\_buf0\_set, int\_buf1\_set, int\_seqerr\_set, out\_to\_small: Interrupt sources tied to buffer completion, PID sequence error, and short OUT packet in DMA mode.

-- DMA Read/Write Operation and Buffer Updates --

#### Scope

- Describes how payload data is moved between the UTMI interface and the shared SSRAM (internal DMA), and how endpoint buffer descriptors are updated afterward. Includes flow-control conditions that gate DMA and the exact buffer-maintenance actions.

#### Terminology and roles

- RX DMA (OUT/SETUP): host sends data to the device. Data is written from UTMI to SSRAM.
- TX DMA (IN): device sends data to the host. Data is read from SSRAM to UTMI.
- Packet Engine (PE, usbf\_pe): decides when to perform RX/TX DMA, computes new buffer pointers/sizes, and issues buffer update commands.

- Internal DMA (IDMA, usbf\_idma): performs aligned memory reads/writes on the shared SSRAM using mreq/mack/mwe/madr/mdout/mdin.
- Register File (RF, usbf\_ep\_rf): holds endpoint CSR and buffer descriptors (buf0, buf1, original buf0). Applies PE updates and generates external dma\_req (system-level DMA) and availability flags (dma\_in\_buf\_sz1, dma\_out\_buf\_avail).
- Memory arbiter (usbf\_mem\_arb): arbitrates SSRAM between IDMA (m\* port) and Wishbone (w\* port).

#### When DMA is used

- DMA enable is CSR[15] and only active for non-control endpoints (csr[27:26] != 2'b00). For DMA-enabled endpoints the PE always operates on buf0 (buf\_sel = 0).
- Before servicing tokens the PE checks flow-control: for IN, dma\_in\_buf\_sz1 must be 1 (>= one max packet available); for OUT, dma\_out\_buf\_avail must be 1 (>= one max packet space available). If not, PE NACKs the token.

#### IDMA operation (aligned SSRAM access)

- Address inputs: base adr[SSRAM\_HADR+2:0] and transfer size size[13:0] come from the PE. buf\_size[13:0] provides the configured buffer span.
- Addressing is word-based on the SSRAM (madr = adr\_cw; 32-bit), with internal byte-steering using adr\_cb[1:0].
- Wrap control (DMA only): last\_buf\_adr = adr + buf\_size. When an increment would reach last\_buf\_adr, IDMA wraps adr\_cw to 0 (start of the configured region) for ring-buffer semantics.

#### RX DMA (OUT/SETUP)

- State sequence: WAIT\_MRD -> MEM\_WR -> MEM\_WR1 -> MEM\_WR2.
- WAIT\_MRD: pre-read the current word (mreq with dtmp\_sel=1) to merge a possibly unaligned first write.
- MEM\_WR: assemble 32-bit words from rx\_data\_st by byte position (adr\_cb[1:0]) and write (mwe=1). word\_done when the 4th byte arrives or at end-of-packet (wr\_last flush).
- MEM\_WR1/2: finalize partial last word and complete the final write.
- szu\_c counts received bytes (increments on rx\_data\_valid). idma\_done asserts when rx\_data\_done or after the last flush.

#### TX DMA (IN)

- State sequence: MEM\_RD1 -> MEM\_RD2 -> MEM\_RD3 (loops).
- MEM\_RD1: issue first read (mreq) and mark rd\_first.
- MEM\_RD2/3: pipeline alternating word reads; fill rd\_buf0/rd\_buf1 on mack, and feed bytes to the transmitter via tx\_data\_st according to adrb\_next[2:0].
- szd\_c is loaded from size and decremented on rd\_first/mack and each rd\_next until zero; idma\_done asserts when szd\_c reaches zero (or on abort).
- send\_data is asserted from the first fetched word until the last byte; send\_zero\_length is honored to emit a zero-length data phase when requested.

#### Arbitration to SSRAM (usbf\_mem\_arb)

- Grants to Wishbone when (wreq | wack) & !mreq, otherwise to IDMA. sram\_adr/sram\_dout/sram\_we mux between w\* and m\* sides. sram\_re is always 1. mack mirrors mreq (single-cycle ready by construction in this model).

#### Buffer accounting in PE (post-transfer)

- Operates on descriptor fields: bufX\_adr = bufX[SSRAM\_HADR+2:0], bufX\_size = bufX[30:17], uc\_bsel = csr[31:30], uc\_dpd = csr[29:28], max\_pl\_sz = csr[10:0].
- Size to subtract (new\_sizeb):
- OUT & DMA: use max\_pl\_sz (fixed-slot semantics).

- IN: use size\_next = min(buf\_size, max\_pl\_sz).
- OUT & non-DMA: use actual received szu\_c.
- New metrics: new\_size = buf\_size - new\_sizeb; new\_adr = adr + new\_sizeb (both saturate to configured widths). Buffer completion test:
- IN: buffer\_done when new\_size == 0.
- OUT: buffer\_done when new\_size < max\_pl\_sz (not enough space for another full packet).

#### Handling short OUT packets under DMA

- If OUT & DMA and received szu\_c != max\_pl\_sz, out\_to\_small is set. In this case the PE writes back:
- idin[31:17] = {4'h0, szu\_c} (actual short size),
- idin[SSRAM\_HADR+2:4] = original buf0\_adr (do not advance),
- idin[3:0] unchanged for address; UC fields handled in UPDATE2.
- RF captures this in buf1 (always @(buf1\_set || out\_to\_small) buf1 <= idin) to preserve the short-packet metadata.

#### Buffer update commands from PE

- UPDATE state after successful data phase:
- If buffer\_done && DMA: assert buf0\_rl (release). RF restores buf0 <= buf0\_orig (the original descriptor staged by software), effectively closing the slot and re-arming the ring.
- Else: assert bufX\_set to write idin into the active descriptor (buf0 for DMA; buf0/1 per selection for non-DMA). idin packs: {new\_size or short-size, new\_adr[SSRAM\_HADR+2:4], low[3:0] = new\_adr[3:0] when buf\_set\_d else {next\_dpid, next\_bsel}}.
- UPDATE2 state: assert uc\_bsel\_set and uc\_dpd\_set to write back updated UC buffer select and next data PID sequencing bits.

#### Flow-control signals into PE and NAK/NYET behavior

- no\_buf0\_dma is true for DMA endpoints when:
- IN: !dma\_in\_buf\_sz1 (fewer than one max packet available to send),
- OUT: !dma\_out\_buf\_avail (fewer than one max packet worth of free space).
- In PE.IDLE, if ep is stalled: send STALL. If any of: both buffers NA, no\_buf0\_dma, or control-direction buffer NA: send NACK. HS PING is ACKed. Otherwise PE proceeds with RX/TX DMA and subsequent updates.
- For HS OUT with no buffers, PE may send NYET instead of ACK per mode\_hs and no\_bufs.

#### Endpoint RF side accounting and external DMA handshake

- RF keeps counters per endpoint:
- OUT: dma\_out\_cnt decrements on dma\_ack\_i (external DMA serviced) and increments by max\_pl\_sz on buf0\_set/rl. dma\_out\_left = buf0\_orig.size - dma\_out\_cnt; dma\_out\_buf\_avail = (dma\_out\_left >= max\_pl\_sz).
- IN: dma\_in\_cnt increments on dma\_ack\_i and decrements by max\_pl\_sz on buf0\_set/rl. dma\_in\_buf\_sz1 = (dma\_in\_cnt >= max\_pl\_sz).
- RF raises dma\_req when either IN needs data or OUT has data to drain, subject to hold conditions. External system asserts dma\_ack to accept or provide data.

#### Completion signaling

- idma\_done informs the PE that the data mover finished for the current token (either RX end-of-packet processing or TX size exhausted). The PE then performs UPDATE/UPDATE2 to finalize buffer metadata and UC fields.

#### Error/exception interactions (brief)

- Abort of IDMA/PE paths occurs on protocol/CRC/timeouts or size violations. In these cases PE returns to IDLE without applying normal ACK/NYET or without advancing descriptors (except where

`out_to_small` dictates preserving a short-packet record).

-- Handshake Generation and Timeouts (ACK/NACK/STALL/NYET) --

- Scope: This block specifies when the device returns STALL, NACK, ACK, or NYET handshakes, and how the IN-ACK and OUT-data timeouts are handled in the packet engine (usbf\_pe) and packet arbiter (usbf\_pa).

- Handshake emission mechanism:

- Handshakes are emitted by asserting `send_token` together with `token_pid_sel`. The usbf\_pa module encodes `token_pid_sel` into the proper PID bytes (including complement) and transmits exactly one handshake packet (state TOKEN is a 1-cycle staging state).

- Mapping: 0=ACK, 1=NACK, 2=STALL, 3=NYET.

- When a matching token is received (`match_r`) and the endpoint is enabled (`ep_disabled=0`) and the token is not SOF:

- STALL: If the endpoint is stalled (`csr[23:22]==2'b10`), immediately send STALL and finish (IDLE→TOKEN→IDLE).

- NACK (resource unavailable at token time): If buffers are not available or DMA cannot proceed, immediately send NACK and finish. Conditions:

- (`buf0_na && buf1_na`) OR `no_buf0_dma` OR (CTRL endpoint and required direction buffer not present: CTRL+IN needs buf1, CTRL+OUT needs buf0).

- ACK (PING response in HS): If a HS PING token is received (`pid_PING && mode_hs`), immediately send ACK.

- IN data phase start: If IN endpoint (or CTRL IN) is selected and `max_pl_sz==0`, assert `send_zero_length`; enable TX-DMA and enter IN state; otherwise TX-DMA without handshake here (host later ACKs).

- OUT data phase start: If OUT endpoint (or CTRL OUT/SETUP) is selected, enable RX-DMA and enter OUT state.

- OUT transaction handshakes (non-isochronous):

- After successful data reception (`rx_data_done`), evaluate in OUT2B:

- NACK on size policy violations: if `to_small` (shorter than allowed when `sml_ok=0`) or `to_large` (larger than allowed when `lrg_ok=0`), return NACK.

- ACK on success when buffers are available.

- NYET in HS only when no further buffers are available (`mode_hs && no_bufs`). This indicates the function accepted this OUT but cannot accept another in the same microframe.

- Sequence error (PID toggle mismatch): `int_seqerr_set` is raised; device still returns ACK (accepts the packet but notes the sequence error).

- Isochronous OUT (`txfr_iso`): No handshake is sent (USB spec-compliant behavior); state proceeds to update bookkeeping.

- Abort paths (any time during OUT/OUT2A/OUT2B): If abort, CRC16 error, or TX data timeout triggers, return to IDLE with no handshake.

- IN transaction handshakes:

- The device does not send a handshake after IN data packets (the host does). The core waits for a host ACK in state IN2. On ACK receipt (`token_valid && pid_ACK`) it proceeds to UPDATE; on timeout it abandons and returns to IDLE.

- Isochronous IN: No ACK wait; transitions directly from IN to UPDATE.

- Control endpoints:

- Immediate NACK if the required buffer for the control direction is not present: CTRL+IN needs buf1; CTRL+OUT/SETUP needs buf0.

- NYET generation (HS only):
  - Sent after a valid OUT when mode\_hs==1 and no\_bufs==1, where no\_bufs reflects buffer not-available conditions across the currently selected and alternate buffers (including DMA availability signals dma\_out\_buf\_avail/dma\_in\_buf\_sz1 and max-payload sizing checks).
  - In FS, the same condition results in ACK (NYET is HS-only per spec).
  - Timeouts:
    - IN ACK wait timeout (rx\_ack\_to):
      - Applies to non-iso IN transfers while waiting for the host ACK in state IN/IN2.
      - Counter runs when the core is in IN/IN2; reset on tx\_valid activity and outside IN/IN2.
      - Thresholds: USBF\_RX\_ACK\_TO\_VAL\_FS=36 cycles (FS), USBF\_RX\_ACK\_TO\_VAL\_HS=22 cycles (HS), measured on the core/PHY clock domain.
      - On timeout (no ACK in time): transition to IDLE (no endpoint update is performed).
    - OUT data timeout (tx\_data\_to):
      - Counter clears whenever rx\_active is asserted; otherwise it increments.
      - Thresholds: USBF\_TX\_DATA\_TO\_VAL\_FS=36 cycles (FS), USBF\_TX\_DATA\_TO\_VAL\_HS=22 cycles (HS).
      - In OUT state, if tx\_data\_to asserts (bus inactive too long during expected reception), transition to IDLE (no handshake is sent).
  - Additional notes and guards:
    - Handshakes are never generated for SOF.
    - If the endpoint is disabled (csr[23:22]==2'b01), no handshakes/data operations are initiated.
    - Abort conditions (buffer overflow, overlapping match while active, size-too-large at start) force exit to IDLE; these suppress handshakes for the current transaction.
    - usbf\_pa ensures proper single-PID emission with last-cycle extension; CRC for data packets is handled in the transmitter but does not affect the handshake PID generation beyond the state machine decisions above.
- Error Handling (PID, CRC5/CRC16, NSE, Sequence Errors) --

**Scope:** This core detects, reports, and reacts to protocol errors on the UTMI/USB packet stream. Error conditions are surfaced both as device-level interrupt sources (rf.int\_srcb) and, when applicable, as endpoint-level interrupt bits (ep.int\_stat) that are individually masked and cleared by software.

#### Sources and detection:

- PID checksum (pid\_cs\_err): In usbf\_pd, a PID is flagged bad when pid[3:0] != ~pid[7:4]. Exposed to the RF block as pid\_cs\_err and latched into rf.int\_srcb[1]. The core does not gate token processing solely on this flag; it is for software observability.
- CRC5 on tokens (crc5\_err): usbf\_pd recomputes CRC5 over {faddr,endp} and compares with the token's CRC (active low). crc5\_err asserts at token\_valid and mismatch. Device-level: rf.int\_srcb[0]. Functional effect: usbf\_pl suppresses endpoint match on CRC5 error (match\_o requires !crc5\_err), so the protocol engine treats such tokens as non-matches (see NSE below). No ACK/handshake is sent in response.
- CRC16 on data (crc16\_err): usbf\_pd accumulates CRC16 over payload (init 0xffff) and flags error at end of DATA (expected remainder 0x800d). Endpoint-level: usbf\_pe sets int\_crc16\_set on rx\_data\_done & crc16\_err for the matched endpoint (ep.int\_stat[1]). Functional effect: in OUT state, crc16\_err causes immediate abort to IDLE (no ACK sent), allowing host retry. IN path (device transmit) has no receive-CRC16 check by the device.
- Non-match/NSE (nse\_err): In usbf\_pe, nse\_err is asserted when token\_valid & (pid\_OUT | pid\_IN | pid\_SETUP) & !match. Because match\_o = !pid\_bad & fsel & match & token\_valid & !crc5\_err, both

true address/endpoint mismatches and CRC5-failed tokens appear as "no-match" to the PE and raise nse\_err. Device-level: rf.int\_srcb[2]. No handshake is returned.

- Data PID sequence error (int\_seqerr\_set): In usbf\_pe, pid\_seq\_err asserts if the received DATA PID does not match the expected sequence (this\_dpid vs pid\_DATA0/1/2/MDATA). Behavior by transfer type:
  - Isochronous OUT: int\_seqerr\_set is raised (ep.int\_stat[5]) and the engine proceeds to UPDATEW/UPDATE.
  - Non-iso OUT: the core sends ACK and returns to IDLE; int\_seqerr\_set is not raised.
  - IN (device transmit): the host validates sequence; device does not flag a receive-side sequence error.
  - Invalid/unsupported tokens (pid\_bad): Handshake and other non-token PIDs (ACK/NACK/STALL/NYET/PRE/ERR/SPLIT) are filtered from match; PING is allowed only in HS (in FS it is pid\_bad). These are not explicitly reported as errors unless they lead to NSE via match suppression.

Interrupt reporting, masking, and clear:

- Device-level error sources (rf.int\_srcb[8:0]):
  - [8] usb\_reset, [7] rx\_err (UTMI RxError), [6] detach, [5] attach, [4] suspend\_end, [3] suspend\_start,
  - [2] nse\_err, [1] pid\_cs\_err, [0] crc5\_err.
- Read of RF main interrupt source register (adr==7'h3 with re) clears all bits (int\_src\_re handshake).
- inta\_rf/intb\_rf are the masked OR of int\_srcb via inta\_msk/intb\_msk.
- Endpoint-level error/condition bits (ep.int\_stat per endpoint, cleared on read of that endpoint's INT register adr[1:0]==2'h1 with re):
  - [6] out\_to\_small, [5] int\_seqerr\_set (data PID sequence error, iso OUT only), [4] int\_buf1\_set, [3] int\_buf0\_set,
  - [2] int\_upid\_set (unexpected/changed PID context), [1] int\_crc16\_set (data CRC error), [0] int\_to\_set (protocol timeout from usbf\_pe: ACK timeout on IN, DATA timeout on OUT).
- Endpoint inta/intb lines are the masked ORs of int\_stat (iena/ienb).

Protocol reactions summary:

- Bad PID checksum: no automatic transaction suppression; software notified via rf.int\_srcb[1].
- Bad token CRC5: transaction is ignored (no endpoint match, no handshake), rf.int\_srcb[0]=1, and nse\_err also asserts because !match.
- Bad data CRC16 on OUT: drop/abort with no ACK; ep.int\_stat[1]=1 so software can recover.
- No-match/NSE (wrong address/endpoint or CRC5 failure): no handshake; rf.int\_srcb[2]=1.
- Data PID sequence error: iso OUT raises ep.int\_stat[5]; non-iso OUT sends ACK and does not raise the bit.

Software guidance:

- Use rf.int\_srcb to catch device-level stream integrity issues quickly (pid\_cs\_err, crc5\_err, nse\_err) and clear them by reading 7'h3.
- Use per-endpoint int\_stat for data-plane errors (CRC16, sequence) and timeouts; clear by reading the endpoint's INT register.
- On CRC16/NSE/CRC5 errors, expect the core to avoid acknowledging faulty OUTs or malformed tokens so the host retries; firmware should inspect and reset/prepare buffers as needed.

-- Interrupt Generation, Masking, and Clearing --

Overview

- The core exposes two interrupt outputs: inta\_o and intb\_o.
- Each is the OR of: (a) per-endpoint masked interrupt events and (b) global (RF-level) masked interrupt events.

Per-endpoint interrupts (generated in usbf\_ep\_rf)

- Status bits (latched until cleared):
  - 0: Timeout (int\_to\_set)
  - 1: CRC16 error on data payload (int\_crc16\_set)
  - 2: Unexpected/invalid PID sequence (int\_upid\_set)
  - 3: Buffer 0 update/commit event (int\_buf0\_set)
  - 4: Buffer 1 update/commit event (int\_buf1\_set)
  - 5: Data PID sequence error (int\_seqerr\_set)
  - 6: OUT packet size smaller than programmed (out\_to\_small)
- Masking to INTA/INTB (two independent mask groups per endpoint):
  - iena[5:0] masks endpoint sources into INTA (inta)
  - ienb[5:0] masks endpoint sources into INTB (intb)
- Mapping (both iena and ienb):
  - bit0 → status[0] Timeout
  - bit1 → status[1] CRC16 error
  - bit2 → status[2] Unexpected PID
  - bit3 → status[3] Buffer0 and status[4] Buffer1 (shared mask bit)
  - bit4 → status[5] PID sequence error
  - bit5 → status[6] OUT too small
- Assertion
  - Endpoint inta/intb are asserted when (int\_stat & mask) has any bit set.
- Clearing
  - Reading the endpoint INT register (EPx.INT) clears all seven status bits (int\_stat) for that endpoint in a single operation.
- Register access (EP window, adr[1:0]):
  - 0x0: CSR (control/status)
  - 0x1: INT: read returns {zeros, iena, zeros, ienb, zeros, int\_stat}; write updates iena (bits [29:24]) and ienb (bits [21:16]); read side-effect clears int\_stat.
  - 0x2: BUF0
  - 0x3: BUF1

#### Global (RF-level) interrupts (generated in usbf\_rf)

- Sources (int\_srcb[8:0], latched until cleared):
  - 8: USB reset detected
  - 7: RX error (UTMI RxError)
  - 6: De-attach (usb\_attached falling edge)
  - 5: Attach (usb\_attached rising edge)
  - 4: Suspend end (exit suspend)
  - 3: Suspend start (enter suspend)
  - 2: NSE (Non-SEQ) error
  - 1: Token PID checksum error (pid\_cs\_err)
  - 0: SOF token CRC5 error (crc5\_err)
- Masks
  - inta\_msk[8:0] masks RF sources into inta\_o.
  - intb\_msk[8:0] masks RF sources into intb\_o.
  - inta\_rf = OR reduction of (int\_srcb & inta\_msk);  
intb\_rf = OR reduction of (int\_srcb & intb\_msk).
- Clearing
  - Reading the global INT\_SRC register clears all bits in int\_srcb[8:0].
  - Note: int\_srca[15:0] (endpoint summary) reflects epX\_inta | epX\_intb for each endpoint and is not latched/cleared; it updates continuously.

Top-level interrupt outputs

- inta\_o = OR of all endpoint inta signals and inta\_rf (masked global sources).
- intb\_o = OR of all endpoint intb signals and intb\_rf (masked global sources).

Address map within RF space (adr[6:2] page, adr[1:0] offset)

- Global pages (adr[6:2] = 0x00 or 0x01):
  - 0x00/0x01, offset 0x2 (adr[6:0] == 0x02): INT\_MASK
    - Read: {zeros, intb\_msk[8:0] at [24:16], zeros, inta\_msk[8:0] at [8:0]}
    - Write: updates intb\_msk (din[24:16]) and inta\_msk (din[8:0])
  - 0x00/0x01, offset 0x3 (adr[6:0] == 0x03): INT\_SRC
    - Read: {zeros[31:29], int\_srcb[8:0] at [28:20], zeros[19:16], int\_srca[15:0] at [15:0]}
    - Read side-effect: clears int\_srcb[8:0]
- Endpoint windows (adr[6:2] = 0x04..0x13 for EP0..EP15):
  - offset 0x1 (INT):
    - Read: returns masks (iena/ienb) and status (int\_stat[6:0])
    - Write: updates iena (din[29:24]) and ienb (din[21:16])
    - Read side-effect: clears int\_stat[6:0]

#### Notes

- All masks default to 0 after reset (interrupts disabled until enabled by software).
- Endpoint status bits are edge/condition latched and remain set until software reads EPx.INT.
- Global RF sources are latched and remain set until software reads INT\_SRC.
- Endpoint summary (int\_srca) is a live OR of endpoint inta|intb and is not affected by reads.
- inta/intb are registered on the WISHBONE clock (wclk); endpoint status generation occurs on the PHY clock (clk). Software should follow standard read-clear procedures when crossing clock domains.

-- Suspend/Resume Entry and Exit --

This core implements USB suspend and resume in the UTMI low-speed interface state machine (usbf\_utmi\_ls) and exposes status/control at the top level (usbf\_top).

#### Entry into Suspend

- Full-Speed (FS): While in NORMAL, if the FS idle (J) state persists for >3.0 ms (T1\_gt\_3\_0\_mS), the controller asserts usb\_suspend (suspend\_set) and enters SUSPEND.
- High-Speed (HS): While in NORMAL, if inactivity persists for >3.0 ms (T1\_gt\_3\_0\_mS), the controller switches to FS transceiver/termination (xcv\_set\_fs, fs\_term\_on) and enters RES\_SUSP. After >100 µs (T2\_gt\_100\_uS) of stable J (j\_long), it asserts usb\_suspend and enters SUSPEND. If instead SE0 persists, a reset is detected (see below).
- Top-level export: usb\_suspend is driven out on susp\_o.

#### Exit from Suspend (Host-initiated resume)

- If a K state is detected while in SUSPEND (k\_long), the state machine transitions to RESUME and immediately deasserts suspend (suspend\_clr). Once SE0 is observed, it configures bit stuffing and (if applicable) returns to HS (xcv\_set\_hs, fs\_term\_off), then waits >100 µs in RESUME\_WAIT (T2\_gt\_100\_uS) before returning to NORMAL.
- UTMI SuspendM output is gated as: SuspendM = (usb\_suspend & !resume\_req\_s) | (LineState == K). While a resume request is pending from the device, SuspendM is cleared to allow signaling.

#### Exit from Suspend (Device-initiated remote wakeup)

- A device resume request is provided on resume\_req\_i (latched internally as resume\_req\_r until suspend\_clr). While in SUSPEND, if usb\_suspend has been asserted for >5.0 ms (T1\_gt\_5\_0\_mS) and resume\_req\_s is set, the controller enters RESUME\_REQUEST.
- After an additional wakeup delay (T2\_wakeup ≈ 5.0 ms), the controller enters RESUME\_SIG and

actively drives K on the bus (drive\_k\_d) for  $\geq 1.0$  ms (T2\_gt\_1\_0\_ms).

- It then transitions to RESUME (which asserts suspend\_clr to clear resume\_req\_r at the top level), configures the link (bit\_stuff\_on, and HS re-enable if applicable), waits  $>100$   $\mu$ s in RESUME\_WAIT, and returns to NORMAL.

#### Exit to Reset (from Suspend or Normal)

- From SUSPEND: If SE0 persists for  $>2.5$   $\mu$ s (T1\_gt\_2\_5\_uS with se0\_long), the controller clears suspend (suspend\_clr), asserts usb\_reset, and enters RESET.
- From NORMAL (FS): If SE0 duration is between  $>2.5$   $\mu$ s and  $<3.0$  ms and not an idle long condition, a RESET is recognized.
- After RESET, the controller performs speed negotiation (SPEED\_NEG  $\rightarrow$  SPEED\_NEG\_K/J transitions), then settles in HS or FS NORMAL as determined by chirp detection.

#### Timing Constants (implemented by counters)

- T1 thresholds: 2.5  $\mu$ s, 3.0 ms, 3.125 ms, 5.0 ms (used for idle/suspend detection and minimum suspend time before remote wake).
- T2 thresholds: 100  $\mu$ s (post-resume settle),  $\sim 5.0$  ms (remote-wake readiness and wakeup delay),  $\geq 1.0$  ms (resume K signaling), 1.2 ms, and 100 ms (attach debounce). Exact counts are defined by macros and internal prescalers.

#### Software/Top-level Controls and Observability

- Input resume\_req\_i requests device-initiated resume. It is latched (resume\_req\_r) and automatically cleared when suspend\_clr pulses from the UTMI state machine upon entering RESUME/RESET.
- Output susp\_o reflects usb\_suspend.
- Internal rf\_resume\_req exists in the register file, but in this top-level it is not wired into the UTMI resume path; only resume\_req\_i controls remote wake signaling.

#### UTMI and Transceiver Behavior During Transitions

- During suspend entry/exit the core switches transceiver settings and terminations appropriately (XcvSelect, TermSel) to meet FS/HS requirements, and manages OpMode (bit stuffing on/off) in alignment with signaling phases.

#### Summary

- Suspend entry: FS idle J  $>3.0$  ms; HS inactivity  $>3.0$  ms then FS-J  $>100$   $\mu$ s.
- Host resume: Detect K in SUSPEND  $\rightarrow$  RESUME  $\rightarrow$  wait SE0  $\rightarrow$  RESUME\_WAIT  $>100$   $\mu$ s  $\rightarrow$  NORMAL.
- Device remote wake: Allowed after  $\geq 5.0$  ms in SUSPEND if resume\_req\_i set; drive K for  $\geq 1.0$  ms; then RESUME and return to NORMAL.
- Reset detection from SUSPEND or NORMAL is handled per SE0 timing; speed renegotiation follows reset.

#### -- SOF Handling and Frame Timing --

Start-of-Frame (SOF) reception and frame timing are handled in the packet decoder (usbf\_pd) and protocol layer (usbf\_pl), with a software-visible summary provided in the register file (usbf\_rf).

##### 1) SOF token detection and validation

- The packet decoder (usbf\_pd) captures the USB PID and classifies it. For SOF tokens, it latches the two token bytes (token0, token1) and computes CRC5 over {frame\_address[6:0], endpoint[3:0]}.
- token\_valid asserts when the token is fully received; crc5\_err asserts if the inverted CRC5 field in the token (token1[7:3]) does not match the computed CRC5.
- The 11-bit frame number is reconstructed as frame\_no = {token1[2:0], token0}.

## 2) Frame number update and microframe counting

- In usbf\_pl, a valid SOF that passes CRC5 (token\_valid && !crc5\_err && pid\_SOF) generates frame\_no\_we, then (registered to frame\_no\_we\_r) updates frame\_no\_r with the newly received frame\_no.
- mfm\_cnt (4 bits) counts consecutive identical SOF frame numbers: it resets to 0 when the frame number changes (frame\_no\_we\_r && !frame\_no\_same) and increments when successive SOFs carry the same frame number (frame\_no\_same). This provides a microframe index for high-speed operation (typically 0–7), though the counter is 4 bits wide and does not explicitly saturate.

## 3) Time-since-SOF measurement (sof\_time)

- usbf\_pl generates a periodic tick hms\_clk from the PHY clock by counting cycles up to USBF\_HMS\_DEL (default 0x1C = 28). When hms\_cnt equals this constant, hms\_clk pulses and the counter resets.
- sof\_time (12 bits) increments on each hms\_clk pulse and is cleared on every new valid SOF (clr\_sof\_time = frame\_no\_we). Thus sof\_time measures time elapsed since the last accepted SOF in units of hms\_clk.
- Tick resolution depends on the PHY clock and USBF\_HMS\_DEL. With a 60 MHz UTMI clock and USBF\_HMS\_DEL=28, the tick is approximately 0.47–0.5 µs. The 12-bit width yields up to 4095 ticks (~2 ms at ~0.5 µs resolution), sufficient to cover the full-/high-speed inter-SOF intervals.

## 4) Software-visible framing data (frm\_nat)

- usbf\_pl packages a 32-bit framing/timing word frm\_nat = {mfm\_cnt[3:0], 1'b0, frame\_no\_r[10:0], 4'h0, sof\_time[11:0]}.
- Bits [31:28]: mfm\_cnt (microframe count within the current frame)
- Bit [27] : Reserved 0
- Bits [26:16]: frame\_no\_r (11-bit SOF frame number)
- Bits [15:12]: Reserved 0
- Bits [11:0] : sof\_time (time since last valid SOF in hms\_clk ticks)
- frm\_nat is exposed through the main register bank in usbf\_rf (select the main CSR window; within that window, use the sub-address that returns frm\_nat). This allows firmware to read frame number, microframe index, and fine-grain time since SOF for scheduling and diagnostics.

## 5) Robustness and gating

- SOF-driven updates occur only when crc5\_err is deasserted, ensuring corrupted SOFs do not disturb framing.
- The timebase (hms\_cnt) is reset on each SOF boundary to keep sof\_time aligned to the last accepted SOF.

## 6) Integration notes

- The effective time resolution depends on the UTMI PHY clock and the USBF\_HMS\_DEL macro; adjust USBF\_HMS\_DEL to change sof\_time granularity if needed.
- mfm\_cnt increments only when consecutive SOFs carry the same 11-bit frame number (typical of high-speed microframes); in full-speed operation it will remain at or near zero as frame numbers increment every SOF.

# REGISTER MAP

## -- Address Map Overview and Decoding --

Top-level Wishbone address map and decode (32-bit data, byte addressing). Address bus width: A[12:0]. Decoding is performed on A[12].

### Region select

- Register File (RF): A[12] = 0
- Select expression: RF\_SEL = !wb\_addr\_i[12]
- Addressing: only A[8:2] are used inside the RF (word index W = A[8:2]); A[11:9] and A[1:0] are ignored, so the RF is mirrored over A[11:9].
- Buffer Memory (SSRAM): A[12] = 1
- Select expression: MEM\_SEL = wb\_addr\_i[12]
- Addressing: A[11:2] map to sram\_adr\_o[9:0] (10-bit word address). Size = 1024 words x 4 B = 4096 B.

### RF region layout (word offsets W = A[8:2], byte offset = 4\*W)

- W = 0x00..0x01: Main block (mirrored over two pages; lower 3 bits A[4:2] select register):
  - A[4:2] = 0x0 (byte 0x00): Main CSR (RO): {line\_stat[1:0], usb\_attached, mode\_hs, suspend}. Write to this same word (full 7-bit RF address == 0x00) with DIN[5]=1 issues a resume request (self-cleared by core).
  - A[4:2] = 0x1 (byte 0x04): Function Address (FA) (R/W) DIN[6:0].
  - A[4:2] = 0x2 (byte 0x08): Interrupt mask (R/W): inta\_msk[8:0] (DIN[8:0]) and intb\_msk[8:0] (DIN[24:16]).
  - A[4:2] = 0x3 (byte 0x0C): Interrupt source summary (R/C): {int\_srcb[8:0], int\_srca[15:0]}. Read clears sources.
  - A[4:2] = 0x4 (byte 0x10): Frame/microframe/time (frm\_nat) (RO).
  - A[4:2] = 0x5 (byte 0x14): UTMI vendor status (RO). Writing this same word sets utmi\_vend\_ctrl[3:0] and pulses utmi\_vend\_wr.
  - A[4:2] = 0x6..0x7 (bytes 0x18..0x1C): Reserved.
  - W = 0x02..0x03 (bytes 0x20..0x3C): Reserved (reads return 0).
- Endpoint windows (EP0..EP15): W pages 0x04..0x13, one page per endpoint, 4 words each. For endpoint i (i = 0..15):
  - Base word index W\_base = (0x04 + i) << 2. Byte base = 4\*W\_base = 0x40 + 0x10\*i.
  - Offsets within the EP window (A[3:2] = 0..3):
    - +0x0 (byte +0x00): EPi CSR (R/W).
    - +0x1 (byte +0x04): EPi INT status (R/C).
    - +0x2 (byte +0x08): EPi BUFO descriptor (R/W).
    - +0x3 (byte +0x0C): EPi BUF1 descriptor (R/W).
  - Notes: EP presence is compile-time configurable. If an endpoint is not implemented, its window reads as 0 and has no effect on write.
  - Unused RF locations beyond W = 0x13 (bytes > 0x13C) are undefined.

### Buffer Memory (SSRAM) region (A[12] = 1)

- Word address: sram\_adr\_o[9:0] = A[11:2]. Byte addressing within a word uses Wishbone byte selects (A[1:0] are ignored in address decode).
- Size: 4 KiB total (1024 x 32-bit words).

### Notes

- RF aliasing: Because only A[8:2] are used for RF, A[11:9] are ignored—RF registers are mirrored across these upper RF bits.
- All RF and MEM accesses are 32-bit word aligned; A[1:0] are ignored by internal decode.

-- Global Control and Status --

### Global Control and Status (GCSR)

- Address: RF base + 0x00 (internal rf adr[6:0] == 0x00)
- Width: 32-bit
- Access: Read returns PHY/USB status. Writes are ignored except as noted for bit 5.

Read fields (bits):

- [31:5] Reserved. Reads 0.
- [4:3] LineState (UTMI line state): 00=SE0, 01=J, 10=K, 11=SE1. Interpretation depends on mode:
- Full-speed: J=idle, K=resume signaling.
- High-speed: SE0 is idle between packets.
- [2] USB Attached: 1=Device attached/present; 0=not attached.
- [1] Mode HS: 1=High-speed mode; 0=Full-speed mode.
- [0] Suspend: 1=Core in suspend; 0=active.

Write behavior (bit 5 only):

- [5] Resume Request (write-only side-effect): Writing 1 generates an internal pulse to request resume (rf\_resume\_req). Auto-clears internally; reading GCSR does not reflect this bit (always reads 0). All other written bits are ignored.

Reset values:

- All defined status bits default to 0 after reset. LineState reflects UTMI input and may change as the PHY initializes.

Clocking/consistency notes:

- Read data is produced in the system (wclk) domain from status signals sourced in the PHY domain; values may change asynchronously. For stable observation, software may read twice and compare.

Dependencies:

- LineState, Attached, Mode HS, and Suspend are driven by the UTMI/link state machine (usbf\_utmi\_ls). Resume request via bit 5 is generated in the RF block; system integration must route it to PHY control if used.

-- Function Address Register --

### Function Address Register (FAR)

Purpose:

- Holds the 7-bit USB device address assigned by the host (per USB Chapter 9). The packet layer compares incoming token addresses against this value to determine if the transaction is addressed to this device.

Location and Access:

- Address space: Register File (RF) region (i.e., Wishbone address bit 12 = 0 per `USBF\_RF\_SEL').
- Word offset within RF: 0x01 (adr[6:0] == 7'h1). With 32-bit word addressing, this corresponds to byte offset 0x04 from the RF base.
- Bus width: 32 bits; only bits [6:0] are implemented.
- Access type: Read/Write on the Wishbone clock domain (wclk).

Reset Value:

- 0x00 (function address = 0). Note: This register is NOT automatically cleared on USB bus reset; software must update it upon handling a bus-reset event.

**Read Behavior:**

- Returns zero-extended function address: bits [6:0] = current device address; bits [31:7] = 0.

**Write Behavior:**

- On write, bits [6:0] of the data word are latched into the function address; bits [31:7] are ignored.
- No side effects; no interrupts are generated by writes.

**Functional Use:**

- The value drives funct\_adr[6:0], which is fed to the packet layer (usbf\_pl) as "fa" and compared against incoming token\_fadr. Matching contributes to the internal match signal for endpoint processing.

**Clock/Domain Notes:**

- FAR is written in the Wishbone clock domain (wclk) and consumed in the PHY/UTMI clock domain via usbf\_pl. Software should update FAR at safe times (e.g., when the bus is idle or immediately after a USB reset) to avoid transient mismatches.

**Software Guidance:**

- After a SET\_ADDRESS request is accepted, write the assigned 7-bit address to FAR.
- On USB bus reset (indicated via RF interrupt sources — int\_srcb[8] in this design), software should reinitialize FAR (typically to 0) before enumeration resumes.

-- UTMI Vendor Control/Status --

UTMI Vendor Control/Status provides a simple sideband bridge between software (WISHBONE/register-file domain) and vendor-specific UTMI PHY signals. It exposes a 32-bit register at RF address 0x05 (word address in the USB Function Core register file) with the following behavior:

- **Read (RF[0x05]):**
  - Bits [7:0] return the most recently sampled vendor status bus VStatus[7:0].
  - Bits [31:8] read as 0.
  - Sampling/clocking: VStatus[7:0] originates in the PHY clock domain and is sampled into the WISHBONE clock domain each wclk cycle for reads. There is no handshake; software should tolerate asynchronous updates.

- **Write (RF[0x05]):**
  - Bits [3:0] are taken as the new vendor control value VControl[3:0].
  - Writing the register also generates a load/strobe pulse VControl\_Load to inform the PHY that a new control value is available.
  - Bits [31:4] are ignored on write.
  - Clocking and pulse generation: the write occurs in the WISHBONE clock domain; the control value and a single update pulse are then registered into the PHY clock domain. Each write produces exactly one load pulse in the PHY clock domain.

**Top-level signal mapping (usbf\_top ports):**

- VControl\_pad\_o[3:0] <= UTMI vendor control (driven from writes to RF[0x05], bits [3:0]).
- VControl\_Load\_pad\_o <= UTMI vendor control load strobe (asserted once per write to RF[0x05]).
- VStatus\_pad\_i[7:0] => UTMI vendor status (reflected in reads of RF[0x05], bits [7:0]).

**Clock/Reset domains:**

- Software access (read/write of RF[0x05]) uses the WISHBONE clock (wclk).
  - VControl[3:0] and VControl\_Load are registered in the PHY/UTMI clock domain (clk = phy\_clk).
  - VStatus[7:0] is captured from the PHY clock domain into WISHBONE clock domain for readback.
- Initial value after reset is implementation-defined (no synchronous reset applied to the sampled status).

Software usage pattern:

- 1) To issue a vendor-specific command to the PHY, write RF[0x05] with the desired control nibble in bits [3:0]. This latches VControl[3:0] and produces a single VControl\_Load pulse in the PHY clock domain.
- 2) To read vendor-specific status from the PHY, read RF[0x05] and examine bits [7:0]. Because the status is sampled without handshake, software may need to poll or read twice for stable values if the PHY updates the status near the read.

Notes and constraints:

- The meaning of VControl[3:0] and VStatus[7:0] is vendor-/PHY-specific and outside the USB core's standard operation; the core simply forwards control and mirrors status.
- Consecutive writes to RF[0x05] will generate a VControl\_Load pulse per write.
- After reset, VControl[3:0] has no explicit default value until first write; VControl\_Load is deasserted.
- This interface does not affect standard USB protocol handling and can be used for PHY diagnostics, test features, or vendor extensions.

-- Frame and Microframe Counters --

The core exposes a consolidated 32-bit frame/microframe/time register (frm\_nat) derived from SOF tokens on the bus. It is generated in the PHY clock domain inside usbf\_pl and made software-visible through the RF register block (usbf\_rf) at the main register bank with adr[2:0] = 3'b100. Behavior and fields: 1) Source and qualification: frm\_nat updates only on reception of a valid SOF token (token\_valid asserted and CRC5 correct). SOFs are not filtered by device address; all observed SOFs on the bus are considered. 2) Frame number latch: On each qualified SOF, the 11-bit frame number from the token is captured into frame\_no\_r one cycle after the SOF detection. 3) Microframe counter (mfm\_cnt[3:0]): Incremented when consecutive qualified SOFs carry the same 11-bit frame number (frame\_no\_same). Reset to 0 when the incoming SOF's frame number differs from the last latched frame\_no\_r. In full-speed operation this typically remains 0 (since FS SOFs increment the frame number each millisecond). In high-speed operation, if the host issues multiple SOFs per 1 ms frame that reuse the same frame number, this field counts those microframes. 4) SOF time-stamp (sof\_time[11:0]): A coarse time counter that resets to 0 on each qualified SOF and then increments by one on a periodic tick (hms\_clk). The tick is generated by a small prescaler driven by phy\_clk; its period is set by the macro USBF\_HMS\_DEL (default 0x1C). sof\_time thus measures elapsed time since the last SOF in units of the hms\_clk period and wraps on 12-bit overflow. 5) Register layout (frm\_nat[31:0]): [31:28] mfm\_cnt (microframe count), [27] reserved (0), [26:16] frame\_no\_r (11-bit frame number), [15:12] reserved (0), [11:0] sof\_time. 6) Reset behavior: On reset, frame\_no\_r, mfm\_cnt, and sof\_time are cleared to 0. sof\_time is also cleared on each new qualified SOF. No updates occur on PID or CRC errors. This mechanism provides software with the current frame number, a microframe index (when applicable), and a coarse time since the last SOF for scheduling/diagnostics.

-- Interrupt Masks and Sources --

Function-level interrupting follows a two-source model: (1) endpoint-specific interrupts generated and masked inside each endpoint register file (EP0–EP15), and (2) core/global interrupts generated in the RF block. The top-level interrupt outputs are asserted when either source is active.

Top-level interrupt composition

- $\text{inta\_o} = (\text{OR of } \text{epN\_inta for } N=0..15) \text{ OR } (\lvert(\text{int\_srcb} \& \text{inta\_msk}))$
- $\text{intb\_o} = (\text{OR of } \text{epN\_intb for } N=0..15) \text{ OR } (\lvert(\text{int\_srcb} \& \text{intb\_msk}))$
- The endpoint contribution ( $\text{epN\_inta/intb}$ ) is already masked per-endpoint internally (via endpoint  $\text{iena/ienb}$ ). The RF masks ( $\text{inta\_msk/intb\_msk}$ ) apply only to the core/global sources ( $\text{int\_srcb}$ ).

Registers (RF space, main page)

- Interrupt Mask Register (INT\_MASK) at main RF offset 0x02 ( $\text{adr}[2:0] == 3'h2$ )
  - [8:0] INTA\_MASK = inta\_msk (din[8:0])
  - [24:16] INTB\_MASK = intb\_msk (din[24:16])
  - Reset: 0x0000\_0000
- Function: Enables which core/global sources (int\_srcb bits) are allowed to raise inta\_o/intb\_o. A masked-off bit will not contribute to the corresponding top-level interrupt.
- Interrupt Source Register (INT\_SRC) at main RF offset 0x03 ( $\text{adr}[2:0] == 3'h3$ )
  - Read returns: { [31:29]=0, [28:20]=int\_srcb[8:0], [19:16]=0, [15:0]=int\_srca[15:0] }
  - int\_srca[15:0]: Endpoint interrupt summary (read-only, not sticky, not clear-on-read)
  - Bit n = epn\_inta OR epn\_intb (n=0..15). Reflects whether endpoint n currently asserts either A or B interrupt. Detailed reason and per-endpoint masking are contained in each endpoint's register file.
  - int\_srcb[8:0]: Core/global interrupt sources (sticky, clear-on-read)
  - Reading INT\_SRC clears all bits in int\_srcb simultaneously.
  - Reset: all bits 0.

Core/global interrupt sources (int\_srcb bit definitions)

- [8] USB reset detected
- [7] Receive error (UTMI RxError observed)
- [6] Detach event (transition from attached to de-attached)
- [5] Attach event (transition to attached)
- [4] Suspend end (resume from suspend)
- [3] Suspend start (entering suspend)
- [2] Endpoint/address mismatch (token for non-matching function address/endpoint)
- [1] PID checksum error (PID CS failure)
- [0] Token CRC5 error

Operational notes

- Writing INT\_MASK selects which core/global events can assert inta\_o/intb\_o via their respective masks; endpoint-generated interrupts are unaffected by these masks.
- Reading INT\_SRC provides a snapshot of endpoint interrupt presence (int\_srca) and the sticky core/global events (int\_srcb). The read operation clears all int\_srcb bits (read-to-clear). There is no per-bit clear; software should read INT\_SRC after servicing recorded events.
- All masking and source latching for core/global events occurs in the wclk domain. Masks reset to 0 (disabled).

-- Endpoint Window (CSR, INT, BUF0, BUF1) --

Endpoint Window provides a 4x32-bit register set per endpoint (EP0..EP15): CSR, INT, BUF0, BUF1. Addressing within the register-file region: each endpoint window is selected by  $\text{adr}[6:2] = 0x04+EPn$  (EP0 at 0x04, EP1 at 0x05, ...), and  $\text{adr}[1:0]$  selects the register within the window: 0:CSR, 1:INT, 2:BUF0, 3:BUF1.

CSR (offset 0): 32-bit, R/W (some fields are HW-updated only)

- [31:30] UC\_BSEL (read-only): user/current buffer selector mirrored by HW.
- [29:28] UC\_DPD (read-only): data PID state mirrored by HW.

- [27:26] Endpoint Type/Direction: 00=Control, 01=IN, 10=OUT.
- [25:24] Transfer Type: 01=Isochronous, 10=Bulk, others reserved.
- [23:22] Endpoint State: 00=Enabled, 01=Disabled, 10=Stall, 11=Reserved. If OTS\_STOP=1 and an OUT-too-small occurs, HW forces this field to 01 (Disabled).
- [21:18] Endpoint Number (EP address) compared against token EP.
- [17] Allow Large (> MaxPacketSize) OUT packets (1=allow, 0=NACK).
- [16] Allow Small (< MaxPacketSize) OUT packets (1=allow, 0=NACK).
- [15] DMA Enable (ignored for Control endpoints).
- [14] Reserved, reads 0.
- [13] OTS\_STOP: if set, HW disables EP (CSR[23:22]=01) on OUT-too-small.
- [12:0] Reserved.

Notes: SW writes CSR update only [27:15], [13], [12:0]. UC\_BSEL/UC\_DPD ([31:28]) are updated by HW (read-only to SW).

- INT (offset 1): 32-bit; read returns masks and sticky status; write updates masks; read clears status
- Readback layout: [31:30]=0, [29:24]=IENA[5:0] (mask for INTA), [23:22]=0, [21:16]=IENB[5:0] (mask for INTB), [15:7]=0, [6:0]=INT\_STAT[6:0].
  - Write: din[29:24] -> IENA, din[21:16] -> IENB. Other bits ignored on write.
  - INT\_STAT bits (set by HW, cleared on read of INT):
    - [6] OUT\_TOO\_SMALL: OUT packet size != MaxPacketSize during DMA.
    - [5] PID\_SEQ\_ERR: data PID sequencing error.
    - [4] BUF1 Event (buffer update/completion).
    - [3] BUF0 Event (buffer update/completion).
    - [2] UNEXPECTED\_PID: token PID not matching EP direction/type.
    - [1] CRC16\_ERR: data CRC error.
    - [0] TIMEOUT: IN ACK timeout or OUT data transmit timeout.
  - Interrupt outputs: INTA and INTB are asserted when any (INT\_STAT[i] & IEN? mask) is set. Note: BUF1 ([4]) and BUF0 ([3]) share the same mask bit index 3 in both IENA and IENB.

BUF0 (offset 2), BUF1 (offset 3): 32-bit buffer descriptors, R/W (HW updates during transfers)

- [31] NA (Not Available): 1 marks buffer unavailable (SW convention). HW also treats an all-ones address as not allocated.
- [30:17] SIZE (14 bits): buffer size in bytes.
- [SSRAM\_HADR+2:4] ADR[high]: start address high bits into external SSRAM window.
- [3:0] Mixed-use low nibble:
  - On SW write (initialization) and on HW address updates (buf\_set): holds ADR[3:0].
  - On HW UC status updates: holds {UC\_DPD[1:0], UC\_BSEL[1:0]}.

Notes/behavior:

- BUF0\_ORIG latches the last SW-written BUF0; when HW asserts buf0\_rl, BUF0 is reloaded from BUF0\_ORIG.
- During operation, HW updates BUF0/BUF1 on buffer completion or special conditions (e.g., OUT-too-small).
- Software should program SIZE and ADR fields; bit[31] may be used to mark unavailability; an address of all ones also denotes not allocated.

Reset values

- CSR: all zeros (UC\_BSEL/UC\_DPD=0; OTS\_STOP=0; EP disabled until configured).
- INT: IENA=0, IENB=0, INT\_STAT=0.
- BUF0/BUF1: 0xFFFF\_FFFF (treated as not allocated).

-- DMA Request/Acknowledge Mapping --

DMA request/acknowledge is exposed as a 16-bit vector per endpoint at the top level. Mapping and handshake are as follows:

- Buses
  - `dma_req_o[15:0]`: One request bit per endpoint. Bit N corresponds to endpoint N (0..15).
  - `dma_ack_i[15:0]`: One acknowledge bit per endpoint. Bit N acknowledges endpoint N.
  - For endpoints not compiled in (USBF\_HAVE\_EPx undefined), the corresponding `dma_req_o` bit is permanently 0 and the `dma_ack_i` bit is ignored.
- Clocking and polarity
  - Request and acknowledge are active-high and synchronous to the WISHBONE/system clock domain `wclk` (top-level `clk_i`).
  - Internally, each `dma_ack_i[N]` is also synchronized into the PHY clock domain for counter bookkeeping.
- When requests are generated
  - Requests are emitted only if the endpoint's DMA enable (CSR bit `csr[15]`) is set and the endpoint is not a control endpoint (`csr[27:26] != 2'b00`).
  - OUT endpoints (`csr[27:26] == 2'b10`): `dma_req_o[N]` asserts when there is at least one 32-bit word of received packet data to be transferred out by the system DMA.
  - IN endpoints (`csr[27:26] == 2'b01`): `dma_req_o[N]` asserts when the endpoint needs data to be filled by the system DMA (i.e., current filled-word count is below the programmed packet length).
- Acknowledge semantics (what a single ack means)
  - The system must pulse `dma_ack_i[N]` high for exactly one `wclk` cycle for each 32-bit word transferred by the system DMA on behalf of endpoint N.
  - For OUT endpoints, an ack pulse denotes that the system has consumed one 32-bit word from the endpoint buffer.
  - For IN endpoints, an ack pulse denotes that the system has provided one 32-bit word to the endpoint buffer.
- Request deassertion and burst/hold behavior
  - While `dma_req_o[N]` is high, the endpoint may request a burst: a hold condition keeps the request asserted across multiple acks until the internal word counters reach their termination conditions.
  - Deassertion rule: `dma_req_o[N]` will deassert after an ack pulse when the endpoint's internal hold condition is false; otherwise it remains asserted to continue the burst.
  - Hold conditions (internal):
    - OUT: request stays asserted while there are outstanding words to be serviced (roughly, when the remaining OUT word-count is non-zero; internally held for more than a few words to encourage bursts).
    - IN: request stays asserted until the filled-word count approaches the programmed buffer length (internally held while fewer than (length-3) words have been provided).
- Correct usage requirements for external DMA
  - Only assert `dma_ack_i[N]` when `dma_req_o[N]` is currently high.
  - Provide exactly one ack pulse per 32-bit word moved.
  - Ack pulses must be synchronous to `wclk` and at least one clock wide; multi-cycle high is allowed but is treated as a single handshake per cycle high.
  - Do not generate acks when DMA is disabled (`csr[15]==0`) or for control endpoints; no requests will be generated in those cases.
- Reset behavior

- On reset, dma\_req\_o[\*] are deasserted and internal handshake/counter state is cleared.

Summary mapping

- dma\_req\_o[0] ↔ dma\_ack\_i[0]: Endpoint 0
- dma\_req\_o[1] ↔ dma\_ack\_i[1]: Endpoint 1
- ...
- dma\_req\_o[15] ↔ dma\_ack\_i[15]: Endpoint 15

Note: If an endpoint is not synthesized (USBF\_HAVE\_EPx undefined), its request line remains 0 and its ack input has no effect.

-- Access Semantics (R/W, W1C, Side Effects) --

- Address decode: wb\_addr\_i[12]==0 selects the Register File (RF) space exposed by usbf\_rf (adr = wb\_addr\_i[8:2]); wb\_addr\_i[12]==1 selects external buffer memory (SSRAM) with normal RW semantics (no special side effects beyond memory read/write).
- RF global registers (adr[6:2]==0x00/0x01, subselected by adr[2:0]):
- 0x00.main\_csr (RO): {line\_state[1:0], usb\_attached, mode\_hs, suspend}. Read-only; no side effects.
- 0x00.funct\_adr (RW): 7-bit device address. Write sets the value; read returns it; no side effects.
- 0x00.int\_mask (RW): inta\_msk[8:0] and intb\_msk[8:0]. Write updates masks; read returns; no side effects.
- 0x00.int\_src (int sources): returns {int\_srcb[8:0], int\_srca[15:0]}.
- int\_srcb (R1C): sticky global source bits set by events (usb\_reset, rx\_err, attach/deattach, suspend start/end, nse\_err, pid\_cs\_err, crc5\_err). Read of this register (re at adr==0x03) clears int\_srcb bits (Read-to-Clear) and may deassert global interrupts.
- int\_srca (RO): per-endpoint summary (epN\_inta|epN\_intb). Read-only; not cleared by read.
- 0x00.frm\_nat (RO): frame/microframe/timestamp info. Read-only; no side effects.
- 0x00.utmi\_vendor (RW/WO/RO combo):
- Read returns UTMI vendor status (RO).
- Write din[3:0] updates utmi\_vend\_ctrl and generates a single-cycle utmi\_vend\_wr strobe (side effect). No defined readback of control.
- 0x00.resume\_req (WO, self-clearing): write to adr==0x00 with din[5]==1 sets rf\_resume\_req; it auto-clears when rf\_resume\_req is observed by the UTMI/link (side effect). Reading returns only main\_csr (no readback of this bit).
- Endpoint register blocks (adr[6:2]==0x04..0x13 map EP0..EP15; subregister selected by adr[1:0]):
- +0: EP\_CSR (mixed RW/RO):
- Writable fields: csr0[12:0], ots\_stop (bit13), csr1[27:15].
- Read-only within CSR image: uc\_bsel[31:30], uc\_dpd[29:28] (updated only by hardware via uc\_\*\_set from protocol engine).
- Side effects: if ots\_stop=1 and an out\_to\_small event occurs, hardware forces csr1[8:7]=2'b01.
- +1: EP\_INT (RW + R1C): read returns {ienab, ienb, int\_stat}.
- iena[29:24], ienb[21:16] (RW): write updates enables; read returns current.
- int\_stat[6:0] (R1C): sticky per-EP event bits (timeout, CRC16, PID/seq error, buffer events, out\_to\_small). A read of EP\_INT (re at adr sub==+1) clears int\_stat (Read-to-Clear) and may deassert ep interrupts.
- +2: BUF0 (RW, HW-updated): software write sets buf0 and also latches buf0\_orig to the same value (side effect). Hardware may update buf0 on buf0\_set/buf0\_rl events.
- +3: BUF1 (RW, HW-updated): software write sets buf1. Hardware may update buf1 on buf1\_set or out\_to\_small.
- Notes on BUFX side effects: hardware modifies addresses/sizes and toggles uc\_bsel/uc\_dpd via uc\_\*\_set; these affect DMA buffer sequencing. Reading BUFX has no side effects.
- Interrupt outputs:

- int\_a\_o/intb\_o are asserted from per-EP inta/intb and masked RF sources. Reading EP\_INT (per-EP) clears that EP's int\_stat (R1C). Reading global int\_src clears only int\_srcb (R1C). No W1C fields are implemented.
- Wishbone interface timing/acks: wb\_ack\_o pulses after internal state machine completes. RF reads assert rf\_re; RF writes assert rf\_we. No additional side effects beyond those listed above.

-- Reset Values and Field Definitions --

Register file is selected when wb\_addr\_i[12] == 0. All offsets below are word indices taken from wb\_addr\_i[8:2]. Unless stated, fields are 0 after reset. Endpoint blocks (EP0..EP15) are replicated; reset values are identical for all endpoints.

Top-level RF registers

- 0x00 MAIN\_CSR (R; W has side-effect on bit 5 only)
  - [31:5] R: 0
  - [4:3] R: LineState[1:0] (mirrors UTMI line state)
  - [2] R: usb\_attached (1 when attached)
  - [1] R: mode\_hs (1=High-Speed, 0=Full-Speed)
  - [0] R: suspend (1 when suspended)
  - Write side-effect: writing a 1 to bit [5] latches a resume request (rf\_resume\_req). Cleared automatically when consumed. All other bits are ignored on write. Reset: readback shows [4:0]=0; write-side latch reset to 0.
- 0x01 FUNCT\_ADR (R/W)
  - [31:7] 0
  - [6:0] Function address. Reset: 0x00.
- 0x02 RESERVED (R=0, W ignored)
- 0x03 RESERVED (R=0, W ignored)
- 0x04 FRAME\_NAT (R)
  - [31:28] Microframe count within current frame (increments on repeated SOF of same frame number)
  - [27] 0
  - [26:16] Last latched frame number
  - [15:12] 0
  - [11:0] SOF timing counter (host milli/µs tracking)
- Reset: 0x00000000.
- 0x05 UTMI\_VENDOR (R: status; W: control)
  - Read: [31:8]=0, [7:0]=utmi\_vend\_stat (mirrors VStatus\_pad\_i). Reset readback depends on external pad; unspecified in RTL.
  - Write: [3:0] utmi\_vend\_ctrl; write also asserts a one-shot utmi\_vend\_wr strobe. Reset value of utmi\_vend\_ctrl is unspecified (no explicit reset); write before use.
- 0x06..0x07 RESERVED (R=0, W ignored)
- 0x08..0x25 EPx blocks (EP0..EP15)
 

Each endpoint x has 4 registers at base = 0x04 + x:

  - base+0: EPx\_CSR (R/W)
  - Layout (read):
    - [31:30] uc\_bsel (R/O): internal user buffer select (updated by core)
    - [29:28] uc\_dpd (R/O): internal DATA PID state (updated by core)

- [27:26] EP direction/type (R/W): 00=Control, 01=IN, 10=OUT, 11=Reserved
- [25:24] Transfer type (R/W): 01=Isochronous, 10=Bulk, 00/11=Other/Reserved
- [23:22] Endpoint status (R/W): 00=Normal, 01=Disabled, 10=Stall, 11=Reserved
- [21:18] Endpoint address (R/W): 4-bit endpoint number this CSR controls
- [17] lrg\_ok (R/W): allow OUT larger than MaxPacket (1=allow)
- [16] sml\_ok (R/W): allow OUT smaller than MaxPacket (1=allow)
- [15] dma\_en (R/W): enable DMA for non-control endpoints (ignored for Control)
- [14] Reserved, reads as 0, writes ignored
- [13] ots\_stop (R/W): if set and an OUT packet is smaller than MaxPacket, core auto-sets status to Disabled (01)
- [12:11] tr\_fr (R/W): data PID sequencing mode (used by core for PID selection)
- [10:0] max\_pl\_sz (R/W): MaxPacketSize
- Writes: only csr1 ([27:15]), ots\_stop ([13]), and csr0 ([12:0]) are writable; uc\_bsel/uc\_dpd ([31:28]) are read-only and updated by hardware. Reset: all zeros (uc\_bsel=0, uc\_dpd=0, dir=Control, type=0, status=Normal, ep\_addr=0, lrg\_ok=0, sml\_ok=0, dma\_en=0, [14]=0, ots\_stop=0, tr\_fr=0, max\_pl\_sz=0).

#### base+1: EPx\_INT (R/W)

- Read returns: { [31:30]=0, [29:24]=IENA[5:0], [23:22]=0, [21:16]=IENB[5:0], [15:7]=0, [6:0]=INT\_STAT }
- INT\_STAT (latched, W1C-on-read by entire field):
- [6] OUT\_TOOSMALL (out\_to\_small)
- [5] SEQ\_ERR (int\_seqerr\_set)
- [4] BUF1\_DONE (int\_buf1\_set)
- [3] BUF0\_DONE (int\_buf0\_set)
- [2] UNEXPECTED\_PID (int\_upid\_set)
- [1] CRC16\_ERR (int\_crc16\_set)
- [0] TIMEOUT (int\_to\_set)
- IENA/IENB (masks for inta/intb aggregation): asserting a bit enables that source to contribute to inta/intb respectively.
- Write: only masks are writable: IENB <= din[21:16], IENA <= din[29:24]. Reading this register clears INT\_STAT (all 7 bits). Reset: IENA=0x00, IENB=0x00, INT\_STAT=0x00.

#### base+2: EPx\_BUF0 (R/W)

- Buffer descriptor for buffer 0.
- [31] NA (1=Not allocated/Not available)
- [30:17] SIZE (bytes)
- [SSRAM\_HADR+2:0] BYTE\_ADDRESS into shared SRAM (default build SSRAM\_HADR=9 → bits [11:0])
- Note: core updates this descriptor during transfers (address, size, and low nibble may be updated to track progress and PID/buffer select). Software writes initial descriptor; hardware maintains it thereafter. Reset: 0xFFFF\_FFFF (NA=1, address undefined).

#### base+3: EPx\_BUF1 (R/W)

- Same format and behavior as BUF0 for buffer 1. Reset: 0xFFFF\_FFFF.

#### Interrupt source aggregation (top-level RF register 0x03)

- Read: { [31:29]=0, [28:20]=INT\_SRCB[8:0], [19:16]=0, [15:0]=INT\_SRCA[15:0] }
- INT\_SRCB (latched; cleared on read of 0x03):
- [8] USB\_RESET detected
- [7] RX\_ERROR (UTMI RxError)
- [6] DETACH (usb\_attached falling edge)

- [5] ATTACH (usb\_attached rising edge)
- [4] SUSPEND\_END (usb\_suspend falling edge)
- [3] SUSPEND\_START (usb\_suspend rising edge)
- [2] NO SUCH\_ENDPOINT (nse\_err)
- [1] PID\_CHECKSUM\_ERR (pid\_cs\_err)
- [0] TOKEN\_CRC5\_ERR (crc5\_err)
- INT\_SRCA (live OR, not latched/cleared): bit i reflects (EPi\_INTA | EPi\_INTB). Reset readback = 0 until endpoints raise interrupts.

Address decoding summary

- RF region: wb\_addr\_i[12] == 0
- Register index within RF: wb\_addr\_i[8:2]
- Endpoint block x (0..15): base = 0x04 + x; subregister offsets: +0 CSR, +1 INT, +2 BUF0, +3 BUF1

Notes

- uc\_bsel and uc\_dpd in EPx\_CSR are managed by hardware; software reads them but does not write them.
- EP INT\_STAT and top-level INT\_SRCB are cleared by read (whole-field clear on read access to their respective registers); masks retain their values.
- UTMI vendor control (0x05 write) has no defined reset value; write before use.
- Default build parameters in this image: SSRAM\_HADR=9 (12-bit byte address in BUFx), WB RF/MEM select bit at wb\_addr\_i[12].

-- Endianness and Alignment --

Data path endianness is little-endian for buffer memory: the first byte of a stream maps to D[7:0] of the first 32-bit word, the second to D[15:8], third to D[23:16], and fourth to D[31:24]. UTMI is an 8-bit byte stream (no byte swapping on the UTMI interface); CRC blocks handle any bit order internally. Buffer descriptors store byte addresses: addr[SSRAM\_HADR+2:4] hold high address bits and, when a buffer update occurs (buf\*\_set), addr[3:0] captures the low nibble. The low nibble [3:0] is also used to convey UC\_DPD/UC\_BSEL when uc\_\*\_set is asserted, so software must write the address low nibble via buf\*\_set operations.

Alignment rules:

- Internal DMA supports unaligned buffer addresses at byte granularity. It performs read-modify-write on the first/last partial words to preserve untouched bytes.
- External SRAM interface is 32-bit word-addressed; sram\_adr\_o is a word address and writes are full 32-bit words (no byte enables).
- Wishbone memory/register accesses are 32-bit word accesses only; lower address bits [1:0] are ignored (memory via ma\_adr[SSRAM\_HADR+2:2], registers via ma\_adr[8:2]). Software must not rely on byte-selects.

Implication: software reading/writing packet data via the memory window should interpret 32-bit words in little-endian byte order; unaligned buffers are allowed and handled by the core, but CPU-side memory writes must be full words.

## INTERFACE SPECIFICATIONS

-- WISHBONE Slave Interface --

32-bit WISHBONE B3/B4 classic slave, synchronous to wb\_clk.

Signals

- Inputs: wb\_clk, rst (active-high), wb\_addr\_i[12:0], wb\_data\_i[31:0], wb\_we\_i, wb\_stb\_i, wb\_cyc\_i
- Outputs: wb\_data\_o[31:0], wb\_ack\_o
- No SEL\_I/ERR\_O/RTY\_O/STALL\_O; only single-word classic cycles supported.

Addressing and map (USBF\_UFC\_HADR=12)

- 13-bit address bus (A12..A0), 32-bit data.
- Region select by A[12]:
- A[12]=0: Register space (RF)
- Effective register index = A[8:2] (word-aligned). A[1:0] ignored. Space mirrored over A[11:9].
- 0x00.. word map drives core control/status, endpoint windows, etc. (see core register map).
- A[12]=1: Buffer memory window (SSRAM)
- Effective memory index = A[11:2] (word-aligned). A[1:0] ignored.
- Depth set by USBF\_SSRAM\_HADR=9 → 1024 words (4 KiB) accessible.
- Full 32-bit reads/writes only; no byte enables.

Data routing

- Reads: wb\_data\_o is muxed from RF or buffer memory by A[12].
- Writes: RF writes assert rf\_we; memory writes assert ma\_we and target SRAM via arbiter.
- Byte ordering in buffer memory: the byte at address +0 maps to wb\_data\_o[7:0] (little-endian within 32-bit word).

Handshake and timing

- wb\_ack\_o is a single-cycle pulse per transfer.
- Typical latency: 1–3 wb\_clk cycles from STB/CYC assertion to ACK, depending on clock-domain crossing and target (RF vs. memory).
- No STALL\_O: master must hold CYC/STB until ACK.

Clock domains and arbitration

- Requests are sampled into the PHY clock domain for servicing; ACK is returned in wb\_clk domain.
- Buffer memory accesses arbitrate with internal DMA on the PHY side. DMA has priority; Wishbone memory accesses may be delayed until DMA releases the SRAM.

Reset

- rst\_i synchronously clears internal state; do not access bus while in reset.

-- UTMI+ PHY Interface --

UTMI+ PHY Interface provides the physical-layer attachment for the USB Function core. It wraps a standard 8-bit UTMI+ data path and control/status sideband and implements line-state management, speed negotiation, suspend/resume, and PHY handshakes in the PHY clock domain.

Clock and reset

- phy\_clk\_pad\_i is the only clock for the UTMI interface logic; all UTMI signals are registered on this clock.
- phy\_rst\_pad\_o is a direct copy of rst\_i and should be used to reset the external PHY as required.

### UTMI+ data and handshakes

- Transmit (device to host): DataOut\_pad\_o[7:0], TxValid\_pad\_o, TxReady\_pad\_i.
- DataOut is driven with the first byte when either TxReady is asserted or a new packet starts (tx\_first). Subsequent bytes are presented each cycle while TxReady is high.
- TxValid is asserted for the duration of a packet and remains asserted until the PHY acknowledges (TxReady) or while a K-chirp is being driven during resume signaling.
- Receive (host to device): DataIn\_pad\_i[7:0], RxValid\_pad\_i, RxActive\_pad\_i, RxError\_pad\_i are sampled and synchronized into rx\_data, rx\_valid, rx\_active, rx\_err.

Line-state, mode, and termination control (managed by usbf\_utmi\_ls)

- LineState\_pad\_i[1:0] is decoded each cycle to detect J/K/SE0 conditions and idle.
- XcvSelect\_pad\_o selects transceiver mode (FS vs HS) during and after speed negotiation.
- TermSel\_pad\_o controls FS terminations (enabled when operating FS as per state machine).
- OpMode\_pad\_o drives the UTMI OpMode: normal operation when bit-stuffing is enabled; non-driving when disabled (used during reset/suspend/negotiation windows).
- SuspendM\_pad\_o controls PHY suspend per UTMI+ convention; it is asserted during suspend and deasserted when resuming/active (the wrapper internally gates resume requests and line-state conditions).
- usb\_vbus\_pad\_i is monitored to gate attach/pot behavior.

### Suspend and resume

- The core accepts a resume request (resume\_req\_i). usbf\_utmi\_ls generates a resume sequence (RESUME\_REQUEST → RESUME\_SIG → RESUME) observing T2 timing; during RESUME\_SIG a K-chirp is driven via an internal drive\_k signal, which forces TxValid and a defined DataOut pattern.
- suspend\_clr is produced to clear the latched resume request once the PHY exits suspend.
- USB suspend, reset, and attach status are exported internally (usb\_suspend, usb\_reset, usb\_attached) and used by higher layers.

### High-/Full-speed negotiation

- After SE0 reset, the state machine performs HS detection using alternating K/J chirps; on success it asserts mode\_hs and configures XcvSelect/TermSel/OpMode for HS. Otherwise it selects FS and enables FS terminations.

### Vendor sideband

- VStatus\_pad\_i[7:0] is sampled each wclk cycle into the register file for software visibility.
- VControl\_pad\_o[3:0] and VControl\_Load\_pad\_o provide a programmable vendor control nibble and write strobe to the PHY.

### Integration notes

- All UTMI signals are synchronous to phy\_clk.
- The wrapper ensures UTMI timing requirements on DataOut/TxValid relative to TxReady at packet start.
- Line-state and timing constants (T1/T2) are parameterized via macros and implement USB 2.0 suspend, reset, resume, and negotiation timings.

### -- External SSRAM Interface --

- Purpose: External single-port buffer memory used for endpoint data (RX/TX) and Wishbone debug/host access. The interface is arbitrated between the USB packet engine (IDMA) and the Wishbone side.
- Signals (top-level):
- sram\_adr\_o[SSRAM\_HADR:0]: Word address (32-bit word indexing; byte lanes are not exposed).

- sram\_data\_o[31:0]: Write data to SRAM.
- sram\_data\_i[31:0]: Read data from SRAM.
- sram\_we\_o: Active-high write enable.
- sram\_re\_o: Read enable, held high continuously.
- Parameterization and size:
  - Address width is parameter SSRAM\_HADR = `USBF\_SSRAM\_HADR (effective value in this build: 9). This yields A9:0 = 1024 words x 32 bits = 4 KiB total.
- Addressing and endianness:
  - External addressing is word-based: sram\_adr\_o = internal\_byte\_address[SSRAM\_HADR+2:2].
  - No byte enables; all writes are full 32-bit words.
  - Little-endian mapping within each word (byte 0 at bits [7:0]).
- Clocking and timing expectations:
  - No explicit SRAM clock signal is provided; the arbiter (usbf\_mem\_arb) runs in the PHY clock domain.
  - Read behavior: present sram\_adr\_o with sram\_we\_o=0; read data must be valid on sram\_data\_i one phy\_clk cycle later (assumes 1-cycle synchronous read latency or compliant flow-through timing).
  - Write behavior: drive sram\_adr\_o and sram\_data\_o with sram\_we\_o=1 for one cycle for a full-word write.
  - sram\_re\_o is tied high; use it as a constant chip/read-enable as appropriate for the memory device.
- Arbitration and access model:
  - usbf\_mem\_arb multiplexes two masters onto the single SRAM port:
  - M (USB IDMA, PHY clock): fixed priority; mack asserts immediately when mreq is high; address must be held to return data next cycle.
  - W (Wishbone via usbf\_wb): granted only when M is idle; wack is generated in PHY clock and resynchronized to WB clock.
  - Both masters see read data directly from sram\_data\_i.
- Integration notes:
  - Implement the memory as a 32-bit, single-port synchronous RAM with 1-cycle read latency (or wrap a device-specific SSRAM to match this handshake). Map its clock to the PHY clock domain.
  - No chip-select or byte-enable signals are provided; if required by the memory, derive them from sram\_re\_o (always 1) and sram\_we\_o.
  - Wishbone memory window selection is done above this interface (wb\_addr\_i[12] selects memory space); external mapping does not affect the SRAM pins.

#### -- Interrupt Outputs (inta\_o, intb\_o) --

inta\_o and intb\_o are the two external, active-high, level interrupts generated by the USB function core. They are synchronous to the WISHBONE clock (wclk = clk\_i) and are asserted whenever any enabled interrupt source is pending. Each line is the logical OR of:

- Endpoint-level masked interrupt requests (from EP0..EP15)
- Core-level (RF) masked interrupt events

#### Aggregation and masking

- inta\_o = (OR over all enabled endpoint-A interrupts) OR (OR over enabled RF-A events)
- intb\_o = (OR over all enabled endpoint-B interrupts) OR (OR over enabled RF-B events)
- Outputs update on the rising edge of wclk and remain asserted while any contributing source remains set and masked in.

#### Endpoint-level interrupts (per endpoint)

- Each endpoint (EP0..EP15; only instantiated endpoints contribute) has two interrupt outputs (A and B) that feed the aggregate inta\_o/intb\_o.
- Per-endpoint event sources (latched in the PHY clock domain, reported at wclk):
  - 0) Timeout (ACK timeout after IN, or TX data timeout during OUT)

- 1) CRC16 error on received data
- 2) Unexpected PID (UPID)
- 3) BUF0 update/completion
- 4) BUF1 update/completion
- 5) PID sequence error
- 6) OUT packet smaller than max payload (out\_to\_small)
  - Per-endpoint mask registers (written via the endpoint register space) control which events drive the A and B lines:
    - iena[5:0] masks EP-A; ienb[5:0] masks EP-B
    - Event-to-mask mapping:
      - [0] Timeout -> mask bit 0
      - [1] CRC16 error -> mask bit 1
      - [2] Unexpected PID -> mask bit 2
      - [3] BUF0 -> mask bit 3
      - [4] BUF1 -> mask bit 3 (shares mask with BUF0)
      - [5] Sequence error -> mask bit 4
      - [6] OUT too small -> mask bit 5
    - Clearing endpoint events: reading the endpoint INT register (adr[1:0] == 2) clears that endpoint's latched event bits.
    - Reset state: iena and ienb default to 0 (endpoint interrupts disabled until enabled by software).

#### Core-level (RF) interrupts

- RF event sources (latched, cleared on read of the RF INT source register):
  - [8] USB reset detected
  - [7] RX error (UTMI RxError)
  - [6] Detach (usb\_attached falling)
  - [5] Attach (usb\_attached rising)
  - [4] Exit suspend (suspend end)
  - [3] Enter suspend (suspend start)
  - [2] NSE error (protocol layer)
  - [1] PID check error (pid\_cs\_err)
  - [0] Token CRC5 error
- Separate 9-bit masks for A and B:
  - inta\_msk[8:0] controls which RF events contribute to inta\_o
  - intb\_msk[8:0] controls which RF events contribute to intb\_o
- Register access (RF register space, wclk domain):
  - Write masks at RF address 0x02: intb\_msk <= din[24:16], inta\_msk <= din[8:0]
  - Read INT source at RF address 0x03: returns {int\_srcb, int\_srca}; reading clears only RF event latches (int\_srcb), not endpoint latches
- Reset state: inta\_msk and intb\_msk default to 0 (RF interrupts disabled until enabled by software).

#### Software visibility and clearing

- The RF INT source readback (RF addr 0x03) returns:
  - int\_srca[15:0] = per-endpoint summary bits (epN\_inta OR epN\_intb), one bit per endpoint
  - int\_srcb[8:0] = RF event bits described above
- Reading RF addr 0x03 clears int\_srcb but does not clear any endpoint event bits; endpoint events must be cleared by reading each endpoint's INT register.

#### Summary

- Both inta\_o and intb\_o are level, synchronous, and represent the OR of enabled endpoint and RF sources.
- All masks default to 0 after reset; no interrupts will assert until software programs masks.

- RF event latches clear on read of RF INT source; endpoint event latches clear on read of the per-endpoint INT register.
- Only instantiated endpoints contribute to the aggregate outputs; dummy (non-present) endpoints contribute zero.

-- DMA Handshake (dma\_req\_o, dma\_ack\_i) --

Purpose: Provide a per-endpoint DMA service request/acknowledge handshake between the USB function core and an external DMA (or software DMA) agent for moving 32-bit words to/from the core's endpoint buffers in the shared SSRAM. Interface: dma\_req\_o[15:0] (output, active-high, wclk domain = clk\_i) and dma\_ack\_i[15:0] (input, active-high pulse, wclk domain = clk\_i). Bit mapping: bit 0 = EP0, bit N = EPN (0..15). Reset: On rst, requests deassert; acks should be inactive.

Operation: While an endpoint is DMA-enabled (csr[15]=1) and configured as IN (csr[27:26]=2'b01) or OUT (2'b10), the core asserts dma\_req\_o[n] when service is needed:

- OUT endpoint (device receives data from USB host): dma\_req\_o[n] asserts when there is received data in the endpoint buffer that must be drained to the system. Each dma\_ack\_i[n] pulse indicates the external DMA has removed one 32-bit word from the buffer; the core decrements its internal OUT word counter. When the counter reaches zero, the request can deassert.
- IN endpoint (device sends data to USB host): dma\_req\_o[n] asserts when the endpoint buffer needs to be filled by the system. Each dma\_ack\_i[n] pulse indicates one 32-bit word has been written into the buffer; the core increments its internal IN word counter. When sufficient data is available (up to the programmed buffer/packet size), the request can deassert.

Handshake semantics: dma\_req\_o is level-sensitive; it may remain asserted across multiple acks to allow streaming bursts. The external DMA may pulse dma\_ack\_i[n] on any wclk cycle while dma\_req\_o[n] is high; each pulse must correspond to exactly one completed 32-bit data transfer to/from the endpoint buffer. The core includes hold logic that can keep dma\_req\_o[n] asserted after an ack if more words are immediately needed/available (no need to wait for a reassert between beats). Ack pulses should be at least one wclk cycle wide and deasserted between pulses. Acks received while the corresponding request is low are ignored.

Clocking and CDC: Both dma\_req\_o and dma\_ack\_i are in the wclk (clk\_i) domain. The core internally synchronizes the ack into the PHY clock domain to update counters safely; no special CDC handling is required by the external DMA.

Direction and sizing: The handshake itself does not encode direction. Software/driver or the DMA engine should use endpoint CSR fields to determine direction (csr[27:26]) and sizes (max payload size csr[10:0] and buffer descriptors buf0/buf1). Transfers are 32-bit word granular; one dma\_ack\_i pulse == one 32-bit word.

Servicing data: Actual data movement occurs via the core's buffer memory (SSRAM) through the Wishbone/memory port; the DMA handshake only signals demand and completion. The external agent must ensure each ack corresponds to a successful 32-bit read (OUT) or write (IN) at the correct buffer address.

Throughput and flow control: The core exposes additional status (e.g., dma\_in\_buf\_sz1, dma\_out\_buf\_avail via internal paths) to manage flow, but these are internal to the core. The external DMA should continue issuing ack pulses until dma\_req\_o deasserts, indicating the current DMA need is satisfied.

-- Clocks and Resets (clk\_i, phy\_clk\_pad\_i, rst\_i) --

- Overview: The core operates with two independent clock domains and a single active-low reset:
- clk\_i: System/WISHBONE clock domain (register/file access, interrupts, DMA request handshake, system-side memory access requests).
- phy\_clk\_pad\_i: UTMI PHY clock domain (UTMI interface, packet layer, endpoint engines, internal DMA, and the physical SSRAM port).

- `rst_i`: Active-low reset distributed to all submodules; also driven out as `phy_rst_pad_o` to the PHY.
- Reset characteristics:
  - Polarity: Active-low (logic 0 asserts reset).
  - Default behavior: Synchronous reset inside modules (reset checked as `if(!rst)`). If compiled with `USBF_ASYNC_RESET`, some flops use asynchronous, active-low reset (posedge `clk` or negedge `rst`).
  - External PHY reset: `phy_rst_pad_o` mirrors `rst_i`.
  - Reset recommendation: Assert `rst_i` low long enough to be sampled by at least one rising edge of both `clk_i` and `phy_clk_pad_i` to guarantee clean reset in both domains (more conservative: a few cycles of the slower clock).
- Clock domain usage and crossings:
  - UTMI/PHY domain (`phy_clk_pad_i`): `usbf_utmi_if`, UTMI line-state and speed negotiation (`usbf_utmi_ls`), packet layer (`usbf_pl`), packet decoder/encoder (`usbf_pd/usbf_pa`), protocol engine (`usbf_pe`), internal DMA (`usbf_idma`), endpoint control/state in `usbf_rf` (`clk` port), and the memory arbiter's PHY side (`usbf_mem_arb.phy_clk`). The external SSRAM interface (`sram_adr_o/sram_data_o/sram_we_o/sram_re_o`) is driven in this domain.
  - System/WISHBONE domain (`clk_i`): WISHBONE interface/state (`usbf_wb.wb_clk`), register file host access side (`usbf_rf.wclk`), DMA request/ack interface to system (`dma_req_o/dma_ack_i`), and the memory arbiter's WB side (`usbf_mem_arb.wclk`).
  - CDC mechanisms:
    - WISHBONE requests (`wb_stb_i & wb_cyc_i`) are sampled into the PHY domain in `usbf_wb`; acknowledge is generated in the PHY domain and returned to `clk_i` using a small pipeline (`wb_ack_*` flops) to create a single-cycle ack pulse in `clk_i`.
    - DMA request/ack between `usbf_rf` (`clk_i` side) and external DMA uses handshake flops in `wclk` and `clk` domains to safely cross.
    - Memory arbitration (`usbf_mem_arb`) arbitrates PHY-side IDMA and WB-side memory accesses; requests from WB side must remain asserted until wack is observed; the actual SSRAM cycles occur in the PHY domain.
    - Line state and vendor status from PHY are registered in the PHY domain and read by the RF/WB side; internal registers and handshake logic mitigate CDC hazards.
  - Design constraints and expectations:
    - No fixed frequency ratio or phase relationship is required between `clk_i` and `phy_clk_pad_i`; they may be fully asynchronous.
    - WISHBONE master must hold stb/cyc (and address/data) stable until `wb_ack_o` is observed, which also ensures the PHY-domain sampling in `usbf_wb` is reliable.
    - External SSRAM, if used, should be clocked/qualified to the PHY domain timing, since SSRAM control signals are generated with `phy_clk_pad_i`.
    - Resume/suspend: `resume_req_i` is latched in `clk_i` domain and cleared by `suspend_clr` sourced from the PHY domain logic; internal synchronization is provided.
  - Summary: Provide two free-running clocks (`clk_i` for system/WISHBONE and `phy_clk_pad_i` for UTMI/PHY) and an active-low reset `rst_i` (also driven to the PHY as `phy_rst_pad_o`). The design includes internal synchronizers/handshakes for CDC between these domains and expects standard WISHBONE hold-until-ack behavior.

-- Power Management Signals (`usb_vbus`, `susp_o`, `resume_req_i`) --

This core exposes three power management related signals that coordinate USB attach/suspend/resume behavior across the system and the UTMI PHY domain:

- `usb_vbus` (input, `phy_clk` domain): Indicates VBUS/power condition to the UTMI link-state (LS) machine. When asserted, the LS FSM is held in POR (power-on-reset) state; when deasserted, the FSM proceeds through ATTACH and then NORMAL operation. In ATTACH, the core waits ~100 ms before declaring attached. Provide a stable, debounced level relative to `phy_clk`. Effect of assertion: clears attachment, exit-suspend flags, and reinitializes to FS defaults; deassertion allows normal attach/operation.
- `susp_o` (output, `clk_i` domain): System-level suspend status. Reflects the UTMI LS machine's `usb_suspend` flag, re-sampled into `clk_i`. Asserts after bus idle persists for ~3 ms (per UTMI suspend entry rules); deasserts on reset or a successful resume sequence. Use `susp_o` to gate clocks/power to the function logic. Note: single-flop CDC into `clk_i`; treat as a quasi-static status with up to one `clk_i` cycle latency.
- `resume_req_i` (input, `clk_i` domain): System request for remote wakeup/resume. Latched internally (`resume_req_r`) and synchronized into the `phy_clk` domain, then automatically cleared when the UTMI FSM issues `suspend_clr` after completing resume. Usage rules: only assert while suspended (`susp_o=1`) and after at least ~5 ms in suspend (UTMI spec constraint enforced by the FSM). When valid, the LS FSM performs a proper resume signaling sequence (drives K for  $\geq 1$  ms, then returns to NORMAL). Software/logic may pulse or level-hold `resume_req_i`; the core holds the request until it is honored and cleared.

Operational notes:

- Clock domains: `usb_vbus` is sampled in `phy_clk`; `resume_req_i` is accepted in `clk_i` and safely transferred to `phy_clk`; `susp_o` is a `clk_i` copy of a `phy_clk` status. Ensure long enough assertion of `resume_req_i` for recognition (the core latches it until cleared).
- Compliance timing: The FSM enforces approximate UTMI timings (idle ~3 ms to enter suspend;  $\geq 5$  ms before allowing remote wakeup;  $\geq 1$  ms resume K). No external timing shaping is required beyond meeting these conditions.
- Side effects: Asserting `usb_vbus` (POR) clears attachment and suspend state; deassertion begins attach timing. Resume requests are ignored unless in suspend and timing preconditions are met.

-- Vendor/Test Interfaces (VControl, VStatus) --

### Overview

- The core exposes a simple vendor/test sideband interface intended to interact with PHY- or board-specific test logic: a 4-bit control bus (`VControl[3:0]`) with a write strobe (`VControl_Load`), and an 8-bit status input (`VStatus[7:0]`).
- These signals are surfaced at the top level as:
  - `VControl_pad_o[3:0]` (output): vendor/test control value.
  - `VControl_Load_pad_o` (output): pulse indicating a new control value is available (use as an external latch strobe).
  - `VStatus_pad_i[7:0]` (input): vendor/test status returned by external logic.
- Functionally, they are connected to the register file (`usbf_rf`) as `utmi_vend_ctrl[3:0]` (output), `utmi_vend_wr` (output strobe), and `utmi_vend_stat[7:0]` (input).

### Register map and access

- Address space: USB Function Register File (RF) space (WISHBONE accesses with `wb_addr_i[12] == 0`).
- Word address within RF: 0x05 (i.e., `adr[6:0] == 0x05`, which corresponds to byte offset 0x14 assuming 32-bit word addressing).
- Read (RF offset 0x05):
  - Bits [7:0] = `VStatus` snapshot.

- Bits [31:8] = 0.
- Write (RF offset 0x05):
  - Bits [3:0] written to VControl.
  - Bits [31:4] are ignored (reserved).
- A write generates a pulse on VControl\_Load to signal external logic to capture the new VControl value.

#### Clocking and timing behavior

- Two clock domains are involved:
  - wclk (system/WISHBONE clock) for software register accesses.
  - phy\_clk (UTMI/PHY clock) for the vendor/test pins and internal USB logic.
- Write path (wclk -> phy\_clk):
  - Software writes RF[0x05] in wclk. The 4-bit value is transferred to phy\_clk and presented on VControl.
  - VControl\_Load pulses in phy\_clk as a one-shot notification that a new value is ready; external logic should latch VControl on this pulse.
  - Read path (phy\_clk -> wclk):
    - VStatus is first registered in phy\_clk, then sampled into wclk and returned on reads of RF[0x05].
    - Because crossings are level/pulse-based and not fully handshaked, software should:
      - Avoid back-to-back writes without allowing at least one phy\_clk cycle for the load pulse to assert and be cleared.
      - Expect VStatus reads to reflect a recent (but not strictly synchronized) snapshot; external logic should hold VStatus stable for at least one wclk cycle to be reliably captured.

#### Reset and default values

- VControl\_Load is deasserted after reset.
- VControl default after reset is undefined until software performs the first write to RF[0x05]. External logic should not rely on a power-on VControl value.
- VStatus readback may be undefined until the first stable sample is captured after reset.

#### Functional notes and constraints

- The vendor/test interface is orthogonal to normal USB operation; it does not affect packet handling, endpoints, or protocol state machines.
- Reserved bits in the RF[0x05] register must be written as zero and read as zero.
- Typical usage:
  - 1) Software writes a 4-bit control value to RF[0x05] to drive VControl and trigger VControl\_Load.
  - 2) External logic latches VControl on VControl\_Load and may update VStatus.
  - 3) Software reads RF[0x05] to obtain the current 8-bit VStatus.
- Common applications include PHY vendor-specific register access, board test hooks, or debug feature control via off-core logic.

#### -- Signal Timing and Latency --

This core has two primary clock domains and explicit handshakes that determine timing and latency from request to observable response. Unless USBF\_ASYNC\_RESET is defined, all resets are synchronous in their respective clock domains.

#### Clock domains

- phy\_clk (UTMI/packet engine/memory master side): phy\_clk\_pad\_i
- wb\_clk (system/Wishbone/CSR side): clk\_i (referred to as wclk in several blocks)
- All UTMI receive and transmit signals are registered on phy\_clk. Most CSR/interrupt outputs are registered on wb\_clk.

### UTMI interface timing (usbf\_utmi\_if / usbf\_utmi\_ls)

- RX path: RxValid/RxActive/RxError/DataIn are sampled on phy\_clk; rx\_valid, rx\_active, rx\_err, rx\_data update with 1 phy\_clk cycle latency.
- TX path: DataOut updates on phy\_clk when (TxReady || tx\_first); TxValid is asserted as (tx\_valid | drive\_k | tx\_valid\_last | hold) and deasserts after TxReady or the end of a resume K, effectively stretching until the PHY accepts the last byte. Net latency from tx\_valid to TxValid is 0–1 phy\_clk cycle.
- Line-state/state-machine timing is derived from counters in phy\_clk with programmable constants:
- T1 250 ns tick derived by a prescaler (USBF\_T1\_PS\_250\_NS=13). Higher-level thresholds: 2.5 µs, 3.0 ms, 3.125 ms, 5.0 ms.
- T2 multi-event timing: 100 µs, 2.5 µs granularity, 0.5 ms step, wakeup (~5 ms by count), 1.0 ms, 1.2 ms, and 100 ms maximum window.
- Suspend/resume and high-speed negotiation transitions occur when the above counters reach thresholds while line states satisfy required conditions (J/K/SE0 persistence checks use two-sample qualification). Outputs XcvSelect/TermSel/OpMode/mode\_hs/usb\_reset/usb\_suspend/usb\_attached change on the phy\_clk edge that closes the state's condition; typical latency is 0–1 phy\_clk cycles after the qualifying event.

### Packet decoding/encoding (usbf\_pd, usbf\_pa)

- PID capture: The first PID byte is latched on rx\_valid & rx\_active. pid\_\* flags are valid 0–1 phy\_clk cycles after PID sampling. Token fields (address, endpoint, CRC5) become valid after two RX bytes; token\_valid asserts one cycle later.
- CRC16 (RX): Running CRC updates each cycle during rx\_data\_valid; crc16\_err is valid at rx\_data\_done with 0–1 cycle latency.
- Data transmit (usbf\_pa):
  - send\_data rising edge (from PE) starts transmission same phy\_clk cycle; tx\_valid asserts immediately.
  - rd\_next pulses when tx\_ready & tx\_valid\_r to fetch next byte; one rd\_next per payload byte.
  - CRC insertion adds two trailing bytes. With send\_zero\_length, CRC phase begins immediately.
  - send\_token causes token PID to be sent in place of data; latency to DataOut is 0–1 phy\_clk cycle.

### Protocol engine timing (usbf\_pe)

- Token/data PID selection and send\_token/send\_data are combinational from current state and inputs; registered outputs present on the next phy\_clk.
- Buffer accounting (sizes, addresses) updates on phy\_clk using current CSR and buffer descriptors.
- Timeouts:
  - RX ACK timeout (rx\_ack\_to): counts phy\_clk cycles when waiting for ACK; thresholds are 36 cycles (FS) or 22 cycles (HS). Absolute time depends on phy\_clk frequency.
  - TX DATA timeout (tx\_data\_to): counts phy\_clk cycles while RX is inactive; thresholds 36 (FS) or 22 (HS).
  - Abort conditions (e.g., buffer overflow, size violations, mismatched PID sequences) drive early state exits; state transitions occur on the next phy\_clk with 0–1 cycle latency.

### Internal DMA (usbf\_idma)

- Handshake to memory arbiter: mreq asserted when ready; mack is sampled and registered as mack\_r. Effective acknowledgement latency is 1 phy\_clk cycle after mreq is asserted (mem\_arb returns mack combinatorially but idma registers it).
- RX DMA (SSRAM writes):
  - Starts after WAIT\_MRD sees mack\_r, then MEM\_WR. Writes are word-packed; word write occurs when a 32-bit word is assembled or at end-of-packet via wr\_last.
  - idma\_done asserts when rx\_data\_done or size counter reaches zero; latency 0–1 phy\_clk cycles after those conditions.

- TX DMA (SSRAM reads):
  - Issues alternating reads to fill two 32-bit staging buffers. After first mack\_r, rd\_first asserts; subsequent reads cycle MEM\_RD2/MEM\_RD3 when adrb\_next[1:0]==3 and rd\_next is seen.
  - send\_data remains asserted until size drains to zero; deasserts on the cycle the last byte is scheduled (or immediately for zero-length).

#### External SRAM arbitration (usbf\_mem\_arb)

- Single-port external interface multiplexed between PHY master (IDMA) and WB master. mack is tied to mreq; wack is generated one phy\_clk later (wack\_r) when no mreq is present.
- Port selection is combinational each cycle: Wishbone side wins only when (wreq | wack) and no mreq. Otherwise PHY side has priority.
- Effective PHY memory access latency: 1 phy\_clk cycle from mreq to mack\_r; external sram\_re\_o is held high; sram\_we\_o asserts in-cycle on write selection; sram\_dout/adr reflect the selected master combinationally for the current cycle.

#### Register file and interrupts (usbf\_rf/usbf\_ep\_rf)

- Endpoint CSR/data accesses are addressed on wb\_clk; ep\*\_re/ep\*\_we decode from adr[6:2] and re/we with 0–1 wb\_clk cycle latency.
- ep\_match is generated on phy\_clk by comparing ep\_sel to CSR address; match propagates to rf outputs in 0–1 phy\_clk cycles.
- DMA request generation (per-EP):
  - dma\_req is produced in wb\_clk domain with internal hold logic when buffers are available or transfer remains. It asserts when dma\_req\_d is seen and clears on dma\_ack and no-hold.
  - dma\_ack crosses to phy\_clk through a two-flop synchronizer (r4/r5); dma\_ack\_i becomes valid after 2 phy\_clk cycles typical.
- Interrupt lines (inta/intb):
  - Event bits (int\_stat) set on phy\_clk from protocol/CRC/timeout conditions; reading INT register (re at adr==1) clears on the next clk edge.
  - Output aggregation and mask application occur on wb\_clk; inta/intb update the wb\_clk following the bit set/clear, giving 1–2 clock cycle latency from event to pin depending on domain crossing.

#### Wishbone interface (usbf\_wb)

- Address/data are used directly; wb\_data\_o returns rf\_din or ma\_din on the next wb\_clk after read mux selection.
- Request detection: wb\_req\_s1 is sampled on phy\_clk from (wb\_stb\_i & wb\_cyc\_i); response generation occurs via a state machine in phy\_clk.
- Acknowledge timing:
  - Register-file (RF) access: next\_state=W0 causes wb\_ack\_d=1 for one phy\_clk; wb\_ack\_o is a single-pulse on wb\_clk after a two-flop pipeline (W1/W2), typically 2–3 wb\_clk cycles after request sampling plus one phy\_clk for cross-domain.
  - Memory (MA) access: ma\_req asserted until ma\_ack; mem\_arb returns wack one phy\_clk after wreq (if PHY not using memory). wb\_ack\_o then follows the same two-stage wb\_clk pipeline. Contention with PHY side can delay wack until mreq deasserts.
  - Data path latency for reads: memory data returns to wb\_data\_o after arbiter selection and cross-domain; effective latency is at least 1 phy\_clk for wack generation + 2 wb\_clk for ack pulse formation; additional wait if PHY is occupying the SRAM.

#### General CDC and latency notes

- Many crossings use single-register sampling or two-flop synchronizers depending on signal direction. Where single-sample captures exist (e.g., wb\_req\_s1 sampled by phy\_clk), design assumes requests are held stable long enough to avoid metastability hazards.
- All state machine outputs are registered; observable outputs (pins and WB ack) change on the clock

following the qualifying condition.

- Time-based behaviors (timeouts, suspend/resume, frame timing) scale with phy\_clk. Absolute times depend on the actual PHY clock frequency (e.g., HS typically 60 MHz, FS typically 12 MHz). The provided constants define cycle thresholds; convert to time as threshold\_cycles / phy\_clk\_frequency.

#### -- Clock Domain Crossing Requirements --

This core uses two primary, potentially asynchronous clock domains and includes multiple clock-domain crossings (CDCs) that must be handled carefully at integration time.

##### Clock domains

- phy\_clk (UTMI domain): Driven by phy\_clk\_pad\_i. All packet-level logic (PD/PA/PE/IDMA in usbf\_pl), UTMI link state (usbf\_utmi\_if/usbf\_utmi\_ls), and the external buffer memory (via usbf\_mem\_arb) operate in this domain.
- wb\_clk (system/Wishbone domain): Driven by clk\_i. The Wishbone slave (usbf\_wb) and register-file access side of the register block (usbf\_rf) operate in this domain.

##### Reset

- rst is distributed to both domains. If USBF\_ASYNC\_RESET is not defined, rst must be synchronous to each domain. If USBF\_ASYNC\_RESET is defined, rst is used as an asynchronous assertion; deassert rst synchronously within each domain to avoid metastability.

##### CDC summary and requirements

###### 1) Wishbone request/ack crossing (usbf\_wb)

- Direction wb\_clk -> phy\_clk: wb\_req\_s1 <= wb\_stb\_i & wb\_cyc\_i is sampled directly in the phy\_clk domain (single-flop sampling). Requirement: The Wishbone master must hold CYC/STB/ADR/DAT/WE stable until wb\_ack\_o is observed (classic WB handshake), ensuring the request level is stable long enough to be safely sampled.
- Direction phy\_clk -> wb\_clk: wb\_ack\_d (generated in phy\_clk) is re-timed through 2+ flops in wb\_clk and edge-detected to create wb\_ack\_o (1-cycle pulse). No additional action required by integrator beyond honoring WB protocol.
- STA: Declare clk\_i and phy\_clk\_pad\_i asynchronous (set\_clock\_groups -asynchronous) and constrain the req/ack crossings as CDC paths.

###### 2) Buffer memory arbitration (usbf\_mem\_arb)

- All arbitration and SRAM control are clocked by phy\_clk. The wclk port is not used internally, and wack is generated in phy\_clk. The Wishbone side memory requests presented to usbf\_mem\_arb also originate from phy\_clk (via usbf\_wb's MA interface), so there is no CDC within the memory path itself.
- External SSRAM interface: Treat as synchronous to phy\_clk. Constrain sram\_\* signals to phy\_clk timing.

###### 3) Register file and endpoint register block (usbf\_rf and usbf\_ep\_rf)

- Write/read strobes and address: rf\_we/rf\_re/adr originate in the phy\_clk FSM of usbf\_wb and are consumed in the wb\_clk (wclk) domain of usbf\_rf. These pulses/levels cross clock domains without explicit synchronizers. Requirement: Either (a) use a common clock (tie clk\_i == phy\_clk\_pad\_i), or (b) add proper CDC synchronization for re/we/adr (e.g., re-time strobes with two-flop synchronizers and hold address/data stable for multiple wclk cycles), or (c) wrap usbf\_rf with a register-slice CDC bridge.
- Status/interrupts back to wb\_clk: Many status bits (e.g., int\_stat, endpoint CSR mirrors) are produced in phy\_clk and are sampled in wb\_clk logic without explicit multi-bit synchronization. Requirement: If clocks are asynchronous, add synchronizers for single-bit status (2FF) and use handshake/gray-code schemes or shadow registers to move multi-bit values. Alternatively, keep both domains on the same clock.

#### 4) UTMI vendor/status and line-state into register file

- LineState and VStatus are captured in the phy\_clk domain at top-level, then sampled in wb\_clk within usbf\_rf. This is a CDC without explicit synchronizers. Requirement: If clocks are asynchronous, re-time these buses into wb\_clk using appropriate CDC techniques (2FF for single-bit, bus synchronizers/handshake for multi-bit) or use a shared clock.

#### 5) Resume/suspend handshake across domains

- resume\_req (clk\_i -> phy\_clk): Top-level holds resume\_req\_r asserted in wb\_clk until cleared by suspend\_clr generated in phy\_clk, then re-sampled back into wb\_clk. This level-based handshake mitigates short pulses, but sampling is single-flop. Requirement: Keep resume\_req\_i asserted until suspend\_clr is observed; if clocks are asynchronous, consider adding 2FF synchronizers on both control and acknowledge paths.

#### General integration guidance

- Preferred: Use a single clock for clk\_i and phy\_clk\_pad\_i to avoid CDC hazards, as several paths lack full CDC hardening for multi-bit signals.
- If separate clocks are required:
  - Classify clk\_i and phy\_clk\_pad\_i as asynchronous in STA and set false-paths or asynchronous clock groups between them.
  - Add 2FF synchronizers for single-bit control/status that cross domains (e.g., re/we, interrupt sources, suspend\_clr, resume\_req).
  - For multi-bit buses (addresses, data, CSR images, status vectors), use stable-level handshakes or bus synchronizers; ensure the producer holds values stable over at least two destination clock cycles.
  - Ensure WISHBONE master adheres strictly to classic handshake timing (hold CYC/STB/ADR/DAT/WE until ACK) so usbf\_wb can safely capture requests.
- Reset deassertion should be glitch-free and synchronized per domain if asynchronous assertion is used.

## PARAMETERIZATION AND CONFIGURATION

-- Generic Parameters (SSRAM\_HADR, Address Widths) --

- Purpose
- This core exposes two key compile-time sizing knobs that shape bus widths and buffer memory depth:
- SSRAM\_HADR: selects the on-chip buffer memory depth (in 32-bit words) and all related address widths to the SSRAM/datapath.
- USBF\_UFC\_HADR: selects the width of the WISHBONE address bus used by the USB Function Controller (UFC) side and the bit used to split Register-File (RF) vs Buffer-Memory (MEM) accesses.
- Defaults in this file (after preprocessor redefinitions; last definition wins)
- `USBF\_SSRAM\_HADR = 9
- `USBF\_UFC\_HADR = 12
- RF/MEM select: `USBF\_RF\_SEL = !wb\_addr\_i[12], `USBF\_MEM\_SEL = wb\_addr\_i[12]
- SSRAM\_HADR (buffer memory depth and address widths)
- Scope/propagation: Parameter SSRAM\_HADR is defined in usbf\_top and passed down into usbf\_pl, usbf\_idma, usbf\_pe, and usbf\_mem\_arb.

- External SSRAM address width: sram\_adr\_o is [SSRAM\_HADR:0] (word addressing of 32-bit words).
- Internal DMA address width: IDMA/PE use [SSRAM\_HADR+2:0] to include the 2 LSB byte/word-offset bits and span two-word prefetching.
- Wishbone→memory mapping: the memory word address seen from Wishbone is ma\_adr[SSRAM\_HADR+2:2] (i.e., drop byte lanes [1:0], keep word index up to bit SSRAM\_HADR+2).
- Capacity formula: Buffer Memory Size =  $(2^{(SSRAM\_HADR+1)})$  words  $\times$  4 bytes/word. Example with default SSRAM\_HADR=9  $\rightarrow 2^{(10)} \times 4 = 4096$  bytes.
- Data width: 32-bit datapath to SSRAM (sram\_data\_[i|o] are 32 bits). Addressing is word-based; byte-lane selection is handled internally.
  
- WISHBONE/UFC address width and mapping (USBF\_UFC\_HADR)
- WISHBONE address bus width: wb\_addr\_i is [USBF\_UFC\_HADR:0]. With default 12  $\rightarrow$  13-bit address (A12..A0).
- RF/MEM address-space split: A[12] (by default) selects the target space.
- A[12]=0  $\rightarrow$  Register File (RF) space (`USBF\_RF\_SEL)
- A[12]=1  $\rightarrow$  Buffer Memory (MEM) space (`USBF\_MEM\_SEL)
- Register file addressing into usbf\_rf:
- usbf\_top passes adr = ma\_adr[8:2] (7 bits) to usbf\_rf; A[1:0] are the byte lanes of the 32-bit WISHBONE data bus.
- Effective RF map: 128 longword slots (7-bit index), organized in sub-windows per endpoint as implemented inside usbf\_rf.
- Memory addressing into SSRAM from WISHBONE:
- Word address is ma\_adr[SSRAM\_HADR+2:2], producing SSRAM\_HADR+1 bits to drive sram\_adr\_o[SSRAM\_HADR:0].
  
- Consistency constraints and guidance when changing sizes
- Relationship: USBF\_UFC\_HADR must be large enough to cover the memory word address field up to bit (SSRAM\_HADR+2) and still have a higher bit available for RF/MEM selection. With the provided split, a good rule is:
  - $USBF\_UFC\_HADR \geq (SSRAM\_HADR + 3)$
  - Example: SSRAM\_HADR=9  $\rightarrow (9+3)=12$  (matches default USBF\_UFC\_HADR=12).
  - If you change USBF\_UFC\_HADR, also ensure `USBF\_RF\_SEL`/`USBF\_MEM\_SEL` use the same split bit. In this file they are defined explicitly on bit 12; adjust them together when moving the split.
  - If you change SSRAM\_HADR, the following adjust automatically via parameter propagation:  
sram\_adr\_o width, internal DMA address widths, and the WISHBONE-to-memory slice  
ma\_adr[SSRAM\_HADR+2:2]. Ensure USBF\_UFC\_HADR and the split bit still satisfy the constraint above.
  
- Practical examples (with defaults)
- SSRAM\_HADR=9  $\rightarrow$  sram\_adr\_o[9:0] (1024 words), 4 KB buffer memory.
- USBF\_UFC\_HADR=12  $\rightarrow$  WISHBONE A[12:0]; A[12] selects MEM vs RF; RF address passed as A[8:2] to usbf\_rf; MEM word address passed as A[11:2] to SSRAM.
  
- How to override
- Via preprocessor: compile with -DUSBF\_SSRAM\_HADR= and/or -DUSBF\_UFC\_HADR= and adjust `USBF\_RF\_SEL`/`USBF\_MEM\_SEL` accordingly.
- Via parameter override (where applicable): set SSRAM\_HADR on the usbf\_top instance; it propagates to all submodules that parameterize on SSRAM\_HADR.

-- Endpoint Enablement Macros (USBF\_HAVE\_EPn) --

Compile-time macros USBF\_HAVE\_EPn (where n = 1..15) control which non-control USB endpoints

are instantiated in hardware. Endpoint 0 is always present and is not gated by any macro.

#### Behavior

- If USBF\_HAVE\_EPn is defined at compile time, the real endpoint register file usbf\_ep\_rf is instantiated for EPn.
- If USBF\_HAVE\_EPn is not defined, a stub usbf\_ep\_rf\_dummy is instantiated for EPn, which ties off functionality: no match, no interrupts, no DMA requests, CSR reads return 0, and buffers appear unallocated (buf0/buf1 = 0xFFFF\_FFFF).
- The macros are tested with `ifdef, so only the presence of the macro matters (its value is ignored). Typical use is +define+USBF\_HAVE\_EPn or `define USBF\_HAVE\_EPn.

#### Contiguity requirement

- Endpoints must be enabled contiguously from EP1 upwards. Enabling EP(k) while any lower EP(m Effects
  - Register map: The address decode window for all endpoints (EP0..EP15) remains. Accesses to a disabled endpoint's window operate on the dummy instance and read back zeros (or the dummy defaults) and produce no side effects.
  - DMA: dma\_req[n] is driven by the endpoint instance. Disabled endpoints keep their DMA request low; dma\_ack[n] is still a port but unused by the dummy.
  - Interrupts: inta/intb contributions from disabled endpoints are 0.
  - Matching and endpoint selection: Disabled endpoints never match (ep\_match = 0) and therefore do not participate in transfers.
  - Resource usage: Only enabled endpoints synthesize real logic; disabled endpoints reduce area.

#### Limits and defaults

- Supported endpoint indices: 0..15 (EP0 fixed on; macros cover EP1..EP15).
- Example/default in this file: USBF\_HAVE\_EP1, USBF\_HAVE\_EP2, USBF\_HAVE\_EP3 are defined; EP4..EP15 are not defined, yielding endpoints 0..3 enabled.

#### Integration guidance

- Define USBF\_HAVE\_EP1..USBF\_HAVE\_EP(K) for the highest endpoint number you require; do not skip indices (e.g., enable EP1..EP6 to get endpoints 0..6).
- Add the defines in your synthesis/simulation toolchain (e.g., Verilog +define+USBF\_HAVE\_EP4) or via a header.
- Ensure your USB descriptors' endpoint count matches the enabled hardware and that SRAM/buffer sizing is adequate for the chosen number of endpoints.

#### -- Timing and Timeout Parameters (T1/T2, ACK/DATA TO) --

Timing and Timeout Parameters govern link-state (T1/T2) timing in the UTMI low-speed controller and handshake/data timeouts in the packet engine. All timers run in the PHY clock domain (phy\_clk\_pad\_i) and are parameterized by macros that can be tuned at synthesis.

##### 1) T1 timing (idle-based, us-ms scale)

- Purpose: Detect reset and suspend entry/exit conditions from continuous line-state idleness.
- Idle definition: ls\_idle = (mode\_hs ? SE0 : J). An "idle\_long" qualifier asserts when two consecutive samples indicate idle.
- Tick generation: A prescaler (ps\_cnt) produces a periodic tick (ps\_cnt\_clr) after USBF\_T1\_PS\_250\_NS cycles. On each tick, an 8-bit counter (idle\_cnt1) increments while idle\_long is true.
- Threshold macros (expressed as counts of the ps\_cnt\_clr tick):
  - USBF\_T1\_C\_2\_5\_US
  - USBF\_T1\_C\_3\_0\_MS

- USBF\_T1\_C\_3\_125\_MS
- USBF\_T1\_C\_5\_MS
- USBF\_T1\_C\_62\_5\_US (auxiliary windowing constant)
- Derived flags:
  - T1\_gt\_2\_5\_uS: idle\_cnt1 > USBF\_T1\_C\_2\_5\_US
  - T1\_st\_3\_0\_mS: idle\_cnt1 < USBF\_T1\_C\_3\_0\_MS
  - T1\_gt\_3\_0\_mS: idle\_cnt1 > USBF\_T1\_C\_3\_0\_MS
  - T1\_gt\_3\_125\_mS: idle\_cnt1 > USBF\_T1\_C\_3\_125\_MS
  - T1\_gt\_5\_0\_mS: idle\_cnt1 > USBF\_T1\_C\_5\_MS
- Usage in UTMI LS state machine (usbf\_utmi\_ls):
  - NORMAL: !mode\_hs & T1\_gt\_2\_5\_uS & T1\_st\_3\_0\_mS & !idle\_long → RESET
  - NORMAL: !mode\_hs & T1\_gt\_3\_0\_mS → SUSPEND
  - NORMAL: mode\_hs & T1\_gt\_3\_0\_mS → RES\_SUSP
  - SUSPEND: T1\_gt\_2\_5\_uS & se0\_long → RESET; k\_long → RESUME; T1\_gt\_5\_0\_mS & resume\_req → RESUME\_REQUEST
- Notes: Effective time in seconds depends on phy\_clk and the prescaler constant USBF\_T1\_PS\_250\_NS; macro names indicate intended nominal durations.

## 2) T2 timing (microframe/millisecond scale)

- Purpose: Provide coarser time bases for reset, resume, attach, and speed negotiation windows.
- Two-stage counters:
  - me\_ps counts phy\_clk cycles; me\_ps\_2\_5\_us asserts at USBF\_T2\_C\_2\_5\_US cycles (~2.5 μs nominal).
  - me\_ps2 counts me\_ps\_2\_5\_us events; me\_ps2\_0\_5\_ms asserts at USBF\_T2\_C\_0\_5\_MS events (~0.5 ms nominal).
  - me\_cnt increments each me\_ps2\_0\_5\_ms pulse and is compared to:
    - USBF\_T2\_C\_100\_US (via me\_ps2) → T2\_gt\_100\_uS
    - USBF\_T2\_C\_WAKEUP → T2\_wakeup (resume signaling min)
    - USBF\_T2\_C\_1\_0\_MS → T2\_gt\_1\_0\_mS
    - USBF\_T2\_C\_1\_2\_MS → T2\_gt\_1\_2\_mS
    - USBF\_T2\_C\_100\_MS → me\_cnt\_100\_ms (attach delay)
- Usage in UTMI LS state machine:
  - RESUME/RESUME\_WAIT/RESUME\_REQUEST windows driven by T2\_gt\_100\_uS, T2\_gt\_1\_0\_mS, T2\_wakeup
  - RESET entry and speed negotiation staging uses T2\_gt\_1\_0\_mS, T2\_gt\_1\_2\_mS
  - ATTACH delay to NORMAL occurs at me\_cnt\_100\_ms (~100 ms)
- Notes: Actual times scale with phy\_clk and macro values; defaults target nominal USB timings.

## 3) Handshake/Data timeouts in Packet Engine (ACK/DATA TO)

- Domain: phy\_clk (same as link logic).
- Mode selection: mode\_hs selects HS- vs FS-specific thresholds.
- RX ACK timeout (after IN data):
  - Counter: rx\_ack\_to\_cnt increments while rx\_ack\_to\_clr is deasserted; cleared by tx\_valid activity.
  - Threshold: rx\_ack\_to\_val = (mode\_hs ? USBF\_RX\_ACK\_TO\_VAL\_HS : USBF\_RX\_ACK\_TO\_VAL\_FS).
  - Use: In state IN2, if token\_valid & pid\_ACK arrives before timeout → proceed to UPDATE; if rx\_ack\_to asserts → return to IDLE and set timeout interrupt.
- TX DATA timeout (before OUT data arrival):
  - Counter: tx\_data\_to\_cnt increments while rx\_active is low; cleared while rx\_active is high.
  - Threshold: tx\_data\_to\_val = (mode\_hs ? USBF\_TX\_DATA\_TO\_VAL\_HS : USBF\_TX\_DATA\_TO\_VAL\_FS).
  - Use: In state OUT, if tx\_data\_to asserts (no data started) → abort to IDLE and set timeout interrupt.

- Interrupt generation: `int_to_set = (state==IN2 & rx_ack_to) | (state==OUT & tx_data_to)`. This feeds endpoint interrupt status machinery (`usbf_ep_rf/usbf_rf`).

#### 4) Configurability and integration

- All thresholds are macros defined at compile time; system integrators may retune for specific PHY clock frequencies.
- Speed-dependent behavior is gated by `mode_hs` (set by the UTMI LS controller during speed negotiation).
- Timers automatically clear on state machine events (e.g., tx activity for ACK TO, rx\_active for DATA TO, and explicit clears within `utmi_ls` transitions) to ensure clean windowing.

Default macro values (for reference) include: `USBF_RX_ACK_TO_VAL_FS=36, USBF_RX_ACK_TO_VAL_HS=22, USBF_TX_DATA_TO_VAL_FS=36, USBF_TX_DATA_TO_VAL_HS=22, USBF_T1_PS_250_NS=13, USBF_T1_C_2_5_US=10, USBF_T1_C_62_5_US=250, USBF_T1_C_3_0_MS=48, USBF_T1_C_3_125_MS=50, USBF_T1_C_5_MS=80, USBF_T2_C_2_5_US=148, USBF_T2_C_0_5_MS=200, USBF_T2_C_WAKEUP=10, USBF_T2_C_100_US=40, USBF_T2_C_1_0_MS=2, USBF_T2_C_1_2_MS=2, USBF_T2_C_100_MS=200`. Effective real times depend on `phy_clk` and these prescale factors.

#### -- Debug and Test Options (USBF\_DEBUG, USBF\_TEST\_IMPL) --

Debug and Test Options control simulation-only diagnostics and (potentially) test hooks.

`USBF_DEBUG`: When defined, compiles in \$display-based runtime checks primarily in `usbf_pe` and `usbf_idma`. These checks print clear error messages if key control signals are X/unknown or violate expected conditions at specific FSM states (e.g., IDLE/IN/OUT/UPDATE in `usbf_pe`; IDLE/WAIT\_MR/MEM\_WR/... in `usbf_idma`). This aids in catching X-propagation, illegal handshakes, and timing/sequence issues during simulation. It does not alter functional logic or synthesize into hardware (messages are simulation-only). For additional step-by-step state tracing, an optional `USBF_VERBOSE_DEBUG` (not defined by default) prints state-entry messages. Recommended: enable for simulation bring-up and X-hunting; disable for quieter regressions or synthesis.

`USBF_TEST_IMPL`: Present but not used in the provided modules; it has no effect in this file set. It is reserved for integrating test-specific implementations elsewhere. Enabling it here does not change behavior.

#### -- Buffer Memory Sizing and Addressing --

Overview: The USB function core uses a shared, 32-bit wide SSRAM for all endpoint buffers. The physical SSRAM address presented on `sram_adr_o` is word-addressed (32-bit words). The memory depth and all pointer/length fields are parameterized by `SSRAM_HADR`. Total buffer memory size is: `bytes = 4 × 2^(SSRAM_HADR+1)`. Example: `SSRAM_HADR=14` → 128 KiB. Addressing domains: 1) External SSRAM port (to SRAM): word addressing [`SSRAM_HADR:0`], 32-bit data. 2) Endpoint buffer descriptors (`buf0/buf1`): byte addressing [`SSRAM_HADR+2:0`] embedded in a 32-bit register, allowing unaligned starts and automatic byte/word boundary handling. 3) Internal DMA: converts byte addresses to word addresses and manages sub-word byte indices. Buffer descriptor format (per endpoint, `buf0/buf1` in `usbf_ep_rf`): - Bit 31: status flag (used by core as not-allocated/done). - Bits [30:17] (14 bits): buffer length in bytes (current/remaining). - Bits [SSRAM\_HADR+2:0]: buffer start pointer in bytes (lower 2 bits are byte offset within a word). Core-generated updates write the full 32-bit descriptor atomically. Address and size handling in transfers: - Max payload size is 11 bits (`csr[10:0]`); the packet size used per transaction is `size_next = min(buf_size, max_pl_sz)`. - New buffer pointer = old pointer + `size_next` (IN) or `+max_pl_sz` (OUT with DMA), computed in byte space; the lower 2 bits are preserved/updated to maintain correct sub-word alignment. - Remaining length is decremented

accordingly; when the buffer becomes empty (IN) or cannot fit another packet (OUT), buffer\_done is asserted. - For DMA endpoints, when buffer\_done is reached, the core pulses buf\*\_rl to release the buffer instead of writing back a new pointer/length. Byte/word conversion (IDMA): - The IDMA performs packing/unpacking between byte streams and 32-bit SSRAM words. - It tracks a 3-bit byte index (adr\_cb) to assemble 4 bytes per write and to walk bytes across words on read; word reads are pre-fetched into two 32-bit staging registers to allow seamless byte sequencing across word boundaries. - The SSRAM word address used on the memory port is adr[SSRAM\_HADR+2:2]; the byte index uses adr[1:0]. Wrap/linear behavior: - For DMA operation, the internal word address generator advances sequentially; when the computed next word address reaches start+length (buffer end), it wraps to 0 (enables safe progression even if a buffer spans to the end of memory). Non-DMA updates simply write back the new byte pointer to the descriptor, so no implicit wrap is required. Wishbone access to buffer memory: - The Wishbone master address (ma\_adr) is byte-based; the SRAM word address seen by the arbiter is ma\_adr[SSRAM\_HADR+2:2]. CPU accesses are 32-bit word operations (no byte strobes), matching the IDMA's packing. Constraints and recommendations: - Choose SSRAM\_HADR large enough so that buf\_size (14-bit) and endpoint payloads (11-bit) comfortably fit; memory is shared across all endpoints. - Buffer pointers may be byte-unaligned; the core handles alignment internally. - For deterministic behavior, software should mark a buffer not-allocated by setting the descriptor to all 1s or by using the status bit scheme consistently before enabling an endpoint.

-- Endpoint CSR Configuration (Direction, Type, Max Packet, DMA Enable) --

Endpoint CSR (EPn, offset 0 within the endpoint's 4-word register bank) packs direction, transfer type, max\_packet size, and DMA enable as follows (bit indices are [31:0], MSB..LSB):

- Direction (bits [27:26])
  - 00: Control endpoint (CTRL)
  - 01: IN (device → host)
  - 10: OUT (host → device)
  - 11: Reserved
  - Notes: Internally decoded as CTRL\_ep, IN\_ep, OUT\_ep.
- Type (bits [25:24])
  - 01: Isochronous
  - 10: Bulk
  - 00/11: Treated as non-iso/non-bulk by the core (typical mapping is 00=Control, 11=Interrupt). The core only checks these for iso/bulk behaviors.
  - Effects: For Isochronous, the protocol engine skips ACK/handshake waits (e.g., IN transitions directly to UPDATE; OUT goes via UPDATEW). For non-iso, normal handshakes apply.
- DMA Enable (bit [15])
  - 0: DMA disabled (PIO/buffered mode)
  - 1: DMA enabled
  - Effects/constraints:
    - Ignored for Control endpoints (Direction=00); the core forces DMA off for CTRL endpoints.
    - When enabled on IN/OUT endpoints, the IDMA engine moves data to/from external SRAM; buffer selection logic changes (endpoint logic fixes selection to buffer 0 for DMA transfers) and flow control uses dma\_in\_buf\_sz1 (IN) and dma\_out\_buf\_avail (OUT). If insufficient data/space is available, the core NACKs/NYETs as appropriate (NYET in HS mode when no space).
- Max Packet Size (bits [10:0])
  - Size in bytes (11-bit field).
  - Used by the protocol engine for transfer sizing, buffer pointer advancement, and validation against small/large constraints.

- Special case: For IN transfers, a value of 0 forces a zero-length packet (send\_zero\_length asserted).

Access notes:

- The CSR is 32-bit and written via the EPn register window (adr[1:0]=2'b00 inside each endpoint bank). Reset value is 0 (Direction=Control, Type=00, MaxPkt=0, DMA=0).

-- Default Reset Configuration --

- Reset polarity and timing
- rst\_i is active-low. All internal resets are synchronous to their respective clocks unless USBF\_ASYNC\_RESET is defined (not defined here).
- Global/top-level outputs after reset release (rst\_i=1)
- Interrupts: inta\_o=0, intb\_o=0
- DMA: dma\_req\_o[15:0]=0
- Suspend: susp\_o=0 (usb\_suspend cleared)
- UTMI pads:
  - XcvSelect\_pad\_o=1 (Full-Speed transceiver selected)
  - TermSel\_pad\_o=1 (FS termination enabled)
  - SuspendM\_pad\_o=0 unless LineState\_pad\_i==2'b10 (K); default deasserted if line not K
  - OpMode\_pad\_o: not forced during POR; remains unchanged until later state transitions (normal mode will be set during resume/speed-neg)
  - TxValid\_pad\_o=0; tx\_ready mirrors TxReady\_pad\_i; DataOut\_pad\_o undefined until a transmit or drive\_k occurs
  - SRAM interface: sram\_re\_o=1, sram\_we\_o=0; sram\_addr\_o/sram\_data\_o are don't-care until a request
  - Vendor control: VControl\_Load\_pad\_o=0, VControl\_pad\_o=4'b0000
  - phy\_rst\_pad\_o mirrors rst\_i
- USB link/PHY controller defaults (usbf\_utmi\_if/usbf\_utmi\_ls)
- Mode: mode\_hs=0 (FS), usb\_reset=0, usb\_suspend=0, usb\_attached=0
- Line-state driven K signaling (drive\_k) inactive
- State machine starts in POR, forcing FS select and FS termination; immediately proceeds to ATTACH state timing
- Packet layer defaults (usbf\_pl, usbf\_pd, usbf\_pa, usbf\_idma, usbf\_pe)
- All internal state machines in IDLE
- No token/data transmit in progress: tx\_valid=0, tx\_first=0
- Receive datapath idle: rx\_data\_valid=0, rx\_data\_done=0, crc16\_err=0
- No DMA activity: rx\_dma\_en=0, tx\_dma\_en=0, idma\_done=0
- Timing/counters cleared (frame tracking and half-microsecond clocks start from 0)
- Register file and endpoint defaults (usbf\_rf, usbf\_ep\_rf)
- Function address: funct\_adr=7'h00
- Interrupt masks: inta\_msk=intb\_msk=0; all RF and EP interrupt sources cleared
- Resume request: rf\_resume\_req=0
- UTMI vendor iface regs: utmi\_vend\_ctrl=4'h0, utmi\_vend\_wr=0; status is a sampled copy of VStatus\_pad\_i
- Endpoint configuration (for all instantiated endpoints; non-present endpoints are tied-off dummies with the same visible reset state):
  - CSR fields: csr0=0, csr1=0, uc\_bsel=0, uc\_dpd=0, ots\_stop=0 → direction=00 (control), transfer type=00, not disabled, not stalled, DMA disabled (csr[15]=0), max packet size=0, size-match qualifiers

cleared

- Buffers: buf0=32'hFFFF\_FFFF, buf1=32'hFFFF\_FFFF (unallocated); buf0\_orig=32'hFFFF\_FFFF

- Interrupt enables: iena=0, ienb=0; interrupt status int\_stat=0; inta=intb=0

- DMA sideband: dma\_req=0; dma\_in\_buf\_sz1=0; dma\_out\_buf\_avail=0; all internal DMA counters cleared

- Wishbone interface defaults (usbf\_wb)

- State machine in IDLE; no memory/rf access pending

- wb\_ack\_o=0; wb\_data\_o reflects selected target on reads when issued; rf\_we=0; ma\_req=0, ma\_we=0

- Memory arbiter defaults (usbf\_mem\_arb)

- wack\_r=0; mack follows mreq; sram\_we\_o=0; sram\_re\_o=1 constant; SRAM address/data mux idle until a request

- Miscellaneous/status

- Addressing parameters in this build: USBF\_UFC\_HADR=12 (13-bit WB address), SSRAM\_HADR=9 (10-bit word address)

- Error flags: pid\_cs\_err=0, nse\_err=0, crc5\_err=0 until traffic is observed

- frm\_nat counters start at 0

-- Synthesis and Clocking Assumptions --

- Clock domains

- phy\_clk\_pad\_i: UTMI/PHY clock domain. Drives the UTMI interface (usbf\_utmi\_if/usbf\_utmi\_ls), packet layer (usbf\_pl, usbf\_pd, usbf\_pa, usbf\_idma, usbf\_pe), the PHY-side of the endpoint/register file (usbf\_rf clk port), and the external buffer memory (usbf\_mem\_arb and sram\_\* signals). Expected rate is UTMI 60 MHz; all UTMI/suspend/reset timing counters are derived from this clock.

- clk\_i: System/WISHBONE clock domain. Drives the WISHBONE interface (usbf\_wb), the WB-side of the register file (usbf\_rf wclk port), and the WB side of the memory arbiter.

- The two clocks are fully asynchronous to each other; there is no defined phase or frequency relationship.

- Reset

- rst\_i is active-high and distributed to both clock domains.

- By default resets are synchronous to their respective clocks. If macro USBF\_ASYNC\_RESET is defined, many always blocks use asynchronous assertion (negedge rst) with synchronous deassert.

- Hold reset long enough for all CDC handshakes to settle across both domains.

- Clock-domain crossing (CDC)

- CDC between clk\_i and phy\_clk\_pad\_i is done with simple registered handshakes and two-flop style request/ack schemes (e.g., usbf\_wb, usbf\_mem\_arb, usbf\_ep\_rf DMA req/ack staging).

- Multi-bit data buses cross domains only when gated by these handshakes; otherwise they are used within a single domain.

- STA constraints: treat clk\_i and phy\_clk\_pad\_i as asynchronous clock groups (e.g., set\_clock\_groups -asynchronous). Do not rely on timing between these domains; optionally add false-paths or CDC exceptions on handshake paths.

- External buffer memory (SRAM/BRAM) interface

- Single-port, 32-bit data, address width is (SSRAM\_HADR+1) bits; macro USBF\_SSRAM\_HADR defines depth (final effective value in this file is 9 → 4 KiB total buffer:  $2^{(9+1)}$  words × 4 bytes).

- Memory is clocked by phy\_clk\_pad\_i only; all sram\_\* signals are synchronous to phy\_clk\_pad\_i.

- Zero-wait-state assumption: usbf\_mem\_arb asserts mack combinationally with mreq, and sram\_re\_o

is tied high. The memory must present read data with a single-cycle latency compatible with this assumption (or be a wrapper that emulates flow-through read and immediate acknowledge).

- Read-during-write and continuous-read behavior must be supported by the memory or wrapper, as sram\_re\_o is constantly asserted and read data is sampled whenever requested.

- WISHBONE access and address mapping

- WISHBONE runs on clk\_i; ack generation is multi-cycle due to CDC. Masters must hold CYC/STB stable until wb\_ack\_o pulses.

- Address decode uses macros: USBF\_RF\_SEL = !wb\_addr\_i[12] (register file), USBF\_MEM\_SEL = wb\_addr\_i[12] (buffer memory). Macro USBF\_UFC\_HADR final value is 12 in this file.

- UTMI interface and timing constants

- UTMI 8-bit interface is assumed at 60 MHz. Timing counters in usbf\_utmi\_ls are derived from macros (e.g., USBF\_T1\_PS\_250\_NS, USBF\_T1\_C\_xxx, USBF\_T2\_C\_xxx). These constants must match the actual phy\_clk\_pad\_i frequency; if the PHY clock differs, adjust the macros accordingly to meet USB spec timing (reset, resume, suspend, chirp, etc.).

- Synthesis notes

- No gated clocks; clock enables and state machines are used throughout.

- Simulation-only code: \$display and initial blocks exist (e.g., in usbf\_top and under USBF\_DEBUG). For clean synthesis, compile without USBF\_DEBUG/USBF\_TEST\_IMPL or ensure your tool ignores system tasks.

- No vendor-specific primitives are instantiated; design is generic RTL.

- I/O and miscellaneous assumptions

- UTMI signals (XcvSelect, TermSel, SuspendM, OpMode, LineState) must connect to a UTMI/UTMI+ compliant PHY. LineState and DataIn/Out are synchronous to phy\_clk\_pad\_i as per UTMI.

- resume\_req\_i originates in clk\_i domain and is synchronized inside the UTMI block before use.

- No tri-stated internal busses; external memory data\_in/out are separated.

- Byte ordering on the 32-bit buffer follows little-endian byte lanes as mapped in usbf\_idma (adr[1:0] selects byte within word).

## MODULE HIERARCHY AND INTEGRATION NOTES

-- Module List and Roles --

- usbf\_top: SoC-facing USB function controller top level. Integrates UTMI PHY interface, protocol layer, register file (RF), WISHBONE slave, and SSRAM memory arbiter. Exposes pads (UTMI), WISHBONE bus, interrupt lines, DMA request vector, and external SRAM interface.

- usbf\_utmi\_if: UTMI PHY front-end bridge. Synchronizes UTMI Rx/Tx handshakes and data to internal signals, manages TxValid generation (incl. resume K drive), and instantiates the UTMI line-state controller.

- usbf\_utmi\_ls: UTMI line-state and link manager. Detects/reset/suspend/attach states, handles HS/FS speed negotiation (chirp K/J), resume signaling, line termination and transceiver select control, and produces mode\_hs/usb\_reset/usb\_suspend/usb\_attached.

- usbf\_pl: USB protocol layer integration. Connects packet decoder (PD), packet assembler (PA), internal DMA (IDMA), and protocol engine (PE). Provides address match against function address, exposes frame/sof timing (frm\_nat), and bridges payload accesses to external memory via

madr/mdout/mreq.

- usbf\_pd: Packet decoder. Parses incoming UTMI RX stream into PIDs, tokens (function address, endpoint), frame numbers, data payload with CRC16 check, and flags sequence/CRC5/CRC16 errors.
- usbf\_pa: Packet assembler. Builds outgoing token/data packets, selects proper DATA PID, appends CRC16, manages send\_first/last timing toward UTMI Tx.
- usbf\_pe: Protocol engine per active endpoint context. Implements USB token/data handshake/state machines for IN/OUT/SETUP/ISO/BULK/CTRL, PID sequencing and checks, timeout handling (ACK/DATA TO), buffer selection/rollover, zero-length handling, and generates RF updates/interrupt set strobes. Coordinates with IDMA for payload moves.
- usbf\_idma: Internal DMA engine. Moves payload data between UTMI side and external SSRAM: writes RX data to memory and reads TX data from memory. Generates memory addresses/byte lanes, tracks sizes, and signals send\_data/idma\_done.
- usbf\_crc16: Byte-wise USB CRC16 generator used by PA/PD.
- usbf\_crc5: USB CRC5 generator used for token verification.
- usbf\_rf: Register file and endpoint aggregator. Hosts top-level RF registers (main status, function address, interrupt masks/sources, frame/diagnostics, UTMI vendor regs) and instantiates up to 16 endpoint RF blocks (ep0..ep15). Aggregates per-endpoint CSRs/buffers/DMA status, produces combined interrupt outputs (inta/intb), endpoint match muxing for PE, and issues per-endpoint DMA request bits.
- usbf\_ep\_rf: Endpoint register file (real endpoint). Holds endpoint CSR (type/enable/stall/max packet, data toggle, buffer select), buffer descriptors (two buffers with address/size flags), per-endpoint interrupt enables/status, and DMA request/handshake logic for IN/OUT directions.
- usbf\_ep\_rf\_dummy: Stub endpoint RF used when an endpoint is disabled via compile-time macros; returns safe defaults and no DMA/interrupts.
- usbf\_wb: WISHBONE slave interface. Decodes address space into RF vs external memory windows, returns read data from RF or memory, sequences memory access requests toward the arbiter, and generates WISHBONE acknowledgments.
- usbf\_mem\_arb: External SSRAM arbiter. Time-multiplexes memory interface between protocol-side master (payload DMA) and WISHBONE-side master (CPU access), arbitrating address/data/we and returning read data/ack.

Notes:

- Endpoint instantiation is compile-time selectable via USBF\_HAVE\_EPx macros (EP0 always present). usbf\_rf multiplexes the selected endpoint's CSR/BD and DMA status toward PE.
- External memory is addressed as 32-bit words with parameterizable high address width (SSRAM\_HADR). The arbiter bridges protocol (phy\_clk) and bus (wclk) domains.
- Interrupts are generated per-endpoint and at RF level (attach/detach/reset/suspend/errors) and presented as inta/intb outputs.
- DMA requests are per-endpoint (dma\_req\_o[15:0]) with matching dma\_ack\_i inputs.

-- Clock Domain Partitioning and Crossings --

- Clocks and resets
- Two primary clock domains:
  - PHY domain: phy\_clk\_pad\_i (UTMI/packet path, DMA, memory arbiter control, Wishbone transfer FSM)
  - WB domain: clk\_i (WISHBONE host/programming, interrupt aggregation)
- Reset rst\_i is fed to both domains. If USBF\_ASYNC\_RESET is undefined (as in this file), all resets are synchronous to their local clock. If defined, resets become async (negedge rst) with no additional re-synchronization across domains.
- Domain ownership (top-level instances)
- PHY domain (phy\_clk\_pad\_i): usbf\_utmi\_if (incl. usbf\_utmi\_ls), usbf\_pl (usbf\_pd/usbf\_pa/usbf\_idma/usbf\_pe), usbf\_mem\_arb sequential logic, usbf\_wb main FSM and

RF/Memory selection, usbf\_rf "clk" side (per-EP datapath and DMA counters)

- WB domain (clk\_i): usbf\_rf "wclk" side (host-visible CSRs: funct\_adr, masks, vendor ctrl, readback mux; INT aggregation), top-level resume\_req\_r/susp\_o handling

- Key clock domain crossings (CDC)

1) WB -> PHY (programming path via usbf\_wb)

- wb\_stb\_i/wb\_cyc\_i/wb\_we\_i sampled into PHY domain (wb\_req\_s1) without explicit synchronizers; wb\_addr\_i/wb\_data\_i used combinationally in PHY FSM

- Return path: wb\_ack\_d generated in PHY domain, then single-flop pipelined in WB domain (wb\_ack\_s1/2) to produce wb\_ack\_o

- Read data: rf\_din/ma\_din (PHY) used to drive wb\_data\_o in WB domain; validity tied to ack pipeline rather than CDC synchronizers

2) PHY -> WB (status/errors into RF)

- usb\_attached and suspend are double-synchronized in WB domain (attach\_r/attach\_r1, suspend\_r/suspend\_r1)

- usb\_reset, rx\_err, nse\_err, pid\_cs\_err, crc5\_err are single-flop sampled in WB domain and used for RF interrupt source bits

- UTMI vendor status (VStatus\_pad\_i) captured in PHY (VStatus\_r), then single-flop sampled in WB domain (utmi\_vend\_stat\_r)

- LineState captured in PHY (LineState\_r), read in WB domain without explicit synchronization

- RF per-endpoint int\_stat bits are set in PHY domain and read/cleared in WB domain (level-based, no synchronizer)

3) WB -> PHY (RF controls to PHY)

- funct\_adr is written in WB domain and used in PHY domain (usbf\_pl) without explicit synchronization

- utmi\_vend\_ctrl and utmi\_vend\_wr are written in WB domain and single-flop retimed into PHY domain

4) PHY <-> WB (DMA request/ack for endpoints in usbf\_ep\_rf)

- dma\_req\_d computed in PHY domain and consumed in WB domain to assert dma\_req\_r (no explicit synchronizer)

- dma\_ack asserted in WB domain and two-flop retimed into PHY domain (r4/r5) to produce dma\_ack\_i  
- Additional hold logic (dma\_req\_hold) used to keep request asserted across clocks

5) Memory subsystem

- usbf\_mem\_arb sequential logic uses PHY clock; wclk port is unused internally

- All SSRAM-side control/data are effectively in PHY domain; WB-originated memory ops are evaluated in PHY via usbf\_wb FSM

6) Top-level suspend/resume

- suspend\_clr generated in PHY and single-flop sampled in WB (suspend\_clr\_wr)

- resume\_req\_r generated in WB and sampled in PHY by UTMI LS (no explicit synchronizer)

7) Interrupt outputs

- inta\_o/intb\_o are generated in WB domain from per-EP and RF sources (which originate in PHY domain but are treated as level signals)

- CDC design intent and constraints

- The design relies on level-stable handshakes and long-lived status bits rather than explicit multi-flop synchronizers on many buses.

- For STA/constraints, treat clk\_i and phy\_clk\_pad\_i as asynchronous; add set\_clock\_groups  
-asynchronous or equivalent false paths between them.

- For handshake-based paths (WB ack, DMA req/ack), consider multicycle/false-path exceptions aligned with the handshake protocol.

- If targeting ASIC or timing-closure-sensitive FPGA flows, consider adding explicit 2-flop synchronizers for single-bit controls (e.g., funct\_adr, usb\_reset, error flags, wb\_req\_s1) and CDC FIFOs or registered staging for multi-bit data (wb\_addr\_i/wb\_data\_i, ma\_din/rf\_din, utmi\_vend\_stat, ep\*\_dout), or ensure data is only observed in conjunction with a synchronized handshake.

- If enabling UUSBF\_ASYNC\_RESET, add reset deassertion synchronizers per domain.

-- Memory Arbitration and Buffer Integration --

- Scope
  - Describes how the USB function core arbitrates access to the on-chip packet buffer memory (SSRAM) and how endpoint buffers are represented, updated, and consumed by the protocol engine and the Wishbone host.
- Buffer Memory Architecture
  - Physical memory: external/internal synchronous SRAM interface exposed by usbf\_mem\_arb.
  - Data width: 32-bit. Addressing is word-based on the SRAM bus and byte-based inside the packet engine.
  - Size: parameterized by SSRAM\_HADR. With SSRAM\_HADR=9, the buffer size is 1024 words (4096 bytes).
  - Clocks: arbitration and memory interface run in the PHY clock domain (phy\_clk). The Wishbone interface runs in wb\_clk.
- Masters of the Buffer Memory
  - USB master (M-side): internal IDMA (usbf\_idma) driven by the protocol engine (usbf\_pe/usbf\_pl) for packet RX/TX.
  - Wishbone master (W-side): CPU/host access via usbf\_wb, used to inspect/fill buffers and endpoint register file.
- Arbitration Policy (usbf\_mem\_arb)
  - Priority: USB master has fixed priority over Wishbone.
  - Selection: wsel = (wreq | wack) & !lreq. If the USB master is requesting, it owns the bus; otherwise the Wishbone master is serviced.
- Handshakes:
  - USB side: mack is asserted combinationally with mreq (zero-wait for M-side). mcyc mirrors mack.
  - Wishbone side: wack is generated in phy\_clk domain using a registered pulse (wack\_r) gated with !lreq to avoid contention.
- Multiplexing:
  - sram\_adr: driven by USB madr when M has bus; otherwise by W wadr.
  - sram\_dout (write data): driven by USB mdin or Wishbone wdin per wsel.
  - sram\_re is held asserted (1) for designs that don't require explicit read enables; sram\_we is qualified per selected master.
- Read data fan-out: sram\_din (SRAM read) is broadcast back to both masters (mdout, wdout); only the acknowledged master consumes it.
- Wishbone Integration (usbf\_wb)
  - Address map select (final macros):
    - USBF\_RF\_SEL = !wb\_addr\_i[12] selects register file (RF).
    - USBF\_MEM\_SEL = wb\_addr\_i[12] selects the buffer memory window.
  - RF path: direct read/write to usbf\_rf (endpoint register file and core CSRs).
  - Memory path: forwards ma\_adr, ma\_dout, ma\_we, ma\_req to arbiter; data returns via ma\_din; ack returned as wb\_ack\_o after a short pipeline in wb\_clk.
  - Addressing to memory: word-aligned, using ma\_adr[SSRAM\_HADR+2:2] into the arbiter (lower 2 bits dropped for 32-bit words).
- Endpoint Buffer Model (usbf\_ep\_rf)
  - Each endpoint has two 32-bit buffer descriptors: buf0 and buf1.
  - Field layout:

- [31] NA (Not Allocated/Unavailable) flag. If set or if address is all-ones, buffer is considered not available (bufx\_na).
- [30:17] Size (14 bits): remaining or configured payload size for the buffer.
- [SSRAM\_HADR+2:0] Buffer start address (byte address). Low nibble is multiplexed contextually (see below).
  - Original descriptor shadow: buf0\_orig captures CPU-programmed buf0 for release/restore (buf0\_rl).
  - Endpoint matching and status: csr holds EP type/dir, enable/stall, transfer attributes, max packet size, and SW-controlled DP/BSEL bits.
  - Interrupts: per-endpoint interrupt sources latched and masked; combined inta/intb are generated in wb\_clk domain.
- Descriptor Update Semantics (driven by usbf\_pe)
  - Data movement updates are produced on idin and applied via strobes:
  - buf0\_set / buf1\_set: write updated descriptor (address, size, dpid, buffer select) into the selected buffer.
  - buf0\_rl: release buf0 and restore from buf0\_orig (used when a DMA cycle completes and buffer rotation is needed).
  - uc\_bsel\_set / uc\_dpd\_set: update user buffer select and data PID fields packed into the low nibble when not writing a new address.
  - idin packing rules:
    - idin[31:17] = {buffer\_done, new\_size} for normal updates, or {4'h0, szu\_c} when out\_to\_small is asserted (special short OUT handling).
    - idin[SSRAM\_HADR+2:4] = new\_adr[SSRAM\_HADR+2:4] (word-aligned address increment) or old buf0\_adr when out\_to\_small.
    - idin[3:0] = new\_adr[3:0] when buf\_set\_d is writing a new address, otherwise {next\_dpid[1:0], next\_bsel[1:0]} to carry DPID/BSEL status.
  - Buffer availability flags:
    - bufX\_na = bufx[31] | (&bufx;\_adr) indicates not available (either flagged or unallocated address).
    - no\_buf0\_dma considers dma\_en and in/out buffer thresholds to prevent starting transfers when there is insufficient data/space.
- Protocol Engine to Memory Data Path (usbf\_pl → usbf\_idma)
- RX (OUT) path:
  - usbf\_pd parses incoming packets; usbf\_pe validates and enables rx\_dma\_en.
  - usbf\_idma accumulates 8-bit rx\_data\_st into 32-bit words in dtmp\_r, issues writes when a word completes or on last bytes, and advances adr.
  - Byte count szu\_c increments with rx\_data\_valid; transfer completes on rx\_data\_done or sized-out.
  - After successful reception, usbf\_pe computes new\_adr/new\_size and updates buf descriptors; it may NACK/ACK depending on size and policy.
- TX (IN) path:
  - usbf\_pe enables tx\_dma\_en; usbf\_idma prefetches two 32-bit words (rd\_buf0/rd\_buf1) using mreq bursts and serves tx\_data\_st to usbf\_pa.
  - szd\_c tracks remaining bytes; send\_data is held active until szd\_c reaches zero (or zero-length packet case).
  - Addressing and wrap:
    - adr is byte-based buffer pointer used by IDMA; madr is word-based address to SRAM.
    - last\_buf\_adr = adr + buf\_size; when adrw\_next reaches last\_buf\_adr and dma\_en is set, adrw\_next wraps to 0 (circular addressing under DMA mode).
- Flow Control, Timeouts, and Acknowledgments
  - The protocol engine enforces ACK/NACK/STALL/NYET policy based on endpoint state, buffer availability, max packet size constraints, and high-speed rules.

- RX ACK timeout and TX data timeout counters are parameterized for FS/HS and gate state transitions; on timeout, transfers abort and no descriptor update occurs.
- External DMA Signaling (System-level)
  - Each endpoint raises dma\_req when internal counters indicate data is ready for system DMA or space is available, honoring direction and max packet sizing.
  - dma\_ack is synchronized across clk domains ( $wclk \leftrightarrow clk$ ) inside usbf\_ep\_rf; hold logic prevents premature deassertion if there is still work to do.
- Clock-Domain Crossings and Safety
  - Arbitration and SRAM control are in phy\_clk; Wishbone requests/acks cross into this domain (wack\_r handshake).
  - Endpoint DMA req/ack cross between wclk and clk safely using multi-flop synchronization (r1/r2/r4/r5 sequence).
- Design Guarantees and Assumptions
  - USB master always wins arbitration to meet real-time requirements; Wishbone accesses are deferred but eventually serviced when USB is idle.
  - Read data is broadcast on both master return paths; only the acknowledged side latches it.
  - Word-aligned SRAM accesses; byte steering is handled internally in IDMA.
  - Buffer descriptors default to 0xFFFF\_FFFF (invalid) after reset and must be programmed by software before use.
  - Parameter SSRAM\_HADR defines both memory size and descriptor address field width; macros control the address decode between RF and MEM windows.

#### -- Endpoint Instantiation and Scalability --

The core supports up to 16 hardware endpoints, indexed 0–15. Endpoint 0 is always instantiated (control endpoint). Endpoints 1–15 are instantiated conditionally at compile time via Verilog defines:

- Define USBF\_HAVE\_EPx (x = 1..15) to instantiate a real endpoint register file (usbf\_ep\_rf) for that number.
- If a define is absent, a dummy endpoint (usbf\_ep\_rf\_dummy) is instantiated. The dummy never matches tokens, raises no interrupts, and issues no DMA requests; its buffers read back as all 1s (0xFFFF\_FFFF) and CSR/INT read as 0.

#### Contiguity requirement:

- Endpoints must be enabled contiguously starting from EP1. An elaboration-time check in usbf\_top flags an error if EP(n+1) is enabled while EPn is not. EP0 is always present and not part of this contiguity check.

#### Addressing and register map scaling:

- All endpoint register banks are accessed through the shared register-file module (usbf\_rf). The Wishbone address slice adr[6:2] selects the endpoint window:
  - 0x04 → EP0, 0x05 → EP1, ..., 0x13 → EP15.
- Within an endpoint window, adr[1:0] selects the register:
  - 0: CSR (control/status), 1: INT (interrupt source/enable), 2: BUF0 descriptor, 3: BUF1 descriptor.
- Adding endpoints increases the accessible register windows linearly; no other address decode changes are required.

#### DMA and interrupt scalability:

- DMA request is a 16-bit vector dma\_req[15:0], one bit per endpoint (index aligns with endpoint number). dma\_ack[15:0] is returned per-endpoint. Disabled (dummy) endpoints drive dma\_req=0.

- Endpoint interrupt outputs (inta, intb) are generated per endpoint and OR-aggregated across all instantiated endpoints in usbf\_rf. Additional RF/global interrupt sources are also ORed in with per-class masks.

Endpoint selection and data path:

- Each real endpoint has a CSR field CSR[21:18] that holds the USB endpoint number it responds to, CSR[27:26] for direction/type (00=CTRL, 01=IN, 10=OUT).
- During token processing, the usbf\_rf asserts ep\_match for the endpoint whose CSR[21:18] equals the token's endpoint number. Only the matched endpoint's CSR, BUF0/BUF1, and flow-control outputs (dma\_in\_buf\_sz1, dma\_out\_buf\_avail) are forwarded to the packet engine, ensuring a single active endpoint per transaction.

Buffers and memory sharing:

- Every instantiated endpoint exposes two buffer descriptors (BUF0/BUF1): address [SSRAM\_HADR+2:0] and size [30:17], with additional status flags in the low nibble. All endpoints share the same external SSRAM through a single arbiter (usbf\_mem\_arb); enabling more endpoints does not change the memory interface width or timing, only the number of register banks and potential concurrent DMA requesters.

Build-time configuration guidance:

- To scale the number of endpoints, define USBF\_HAVE\_EP1 ... USBF\_HAVE\_EPN contiguously for the highest endpoint required. Leave higher-numbered defines undefined.
- No RTL edits are needed beyond defines. The dma\_req vector, interrupt aggregation, address decode, and arbitration automatically scale to the highest enabled endpoint.

Example (this build):

- With USBF\_HAVE\_EP1/2/3 defined, EP0–EP3 are active; EP4–EP15 are dummies. The dma\_req width remains 16, but only bits [3:0] can assert.

-- Interrupt Wiring and Aggregation --

Overview

- Two interrupt outputs are provided at the top level: inta\_o and intb\_o. They are generated in the register file (usbf\_rf) and updated on the WISHBONE clock (wclk = clk\_i).
- Each endpoint (EP0..EP15) can assert two interrupt lines (epX\_inta, epX\_intb). usbf\_rf ORs all endpoint interrupt lines and combines them with a set of controller (RF-level) interrupt sources that are maskable per top-level line.

Top-level interrupt equations (in usbf\_rf)

- inta\_o = OR over all epX\_inta (X = 0..15) OR any(RF\_INT\_SRC & INTA\_MASK)
- intb\_o = OR over all epX\_intb (X = 0..15) OR any(RF\_INT\_SRC & INTB\_MASK)
- All terms above are registered on wclk.

RF-level interrupt sources and masks

- RF interrupt sources are collected in a 9-bit sticky vector int\_srcb[8:0]. Bits are set on events and cleared on read of the RF INT STATUS register.
- int\_srcb[8] = USB reset detected
- int\_srcb[7] = RX error
- int\_srcb[6] = Detach
- int\_srcb[5] = Attach
- int\_srcb[4] = Suspend end
- int\_srcb[3] = Suspend start

- int\_srcb[2] = NSE (no selected endpoint) error
- int\_srcb[1] = PID check-sum error
- int\_srcb[0] = CRC5 error (token)
- Two 9-bit masks gate RF sources into each top-level line:
  - INTA\_MASK = inta\_msk[8:0]
  - INTB\_MASK = intb\_msk[8:0]
- Mask programming (WISHBONE, usbf\_rf address space, wclk domain):
  - Write RF INT MASK register (addr 0x02):
    - inta\_msk <= din[8:0]
    - intb\_msk <= din[24:16]
- RF source clear mechanism:
  - Read RF INT STATUS register (addr 0x03) with re=1 clears all bits of int\_srcb. Clearing is per-bit, synchronous to wclk.

#### RF INT STATUS readback composition (addr 0x03)

- 32-bit read data layout: { 3'b000, int\_srcb[8:0], 4'b0000, int\_srca[15:0] }
- int\_srcb[8:0] are the sticky RF-level sources described above; reading this register clears int\_srcb.
- int\_srca[15:0] is a live (non-sticky) per-endpoint summary: int\_srca[N] = epN\_inta | epN\_intb.
- Mapping: bit 15..0 correspond to EP15..EP0, respectively.

#### Endpoint interrupt generation and masking (per usbf\_ep\_rf instance)

- Each endpoint implements its own 7-bit status vector int\_stat[6:0] that latches endpoint-specific events when that endpoint matches the current token.
  - int\_stat[6] = OUT payload smaller than max packet size while DMA (out\_to\_small)
  - int\_stat[5] = PID sequence error
  - int\_stat[4] = Buffer1 update/complete event
  - int\_stat[3] = Buffer0 update/complete event
  - int\_stat[2] = Unexpected PID (UPID) event
  - int\_stat[1] = CRC16 error on received data
  - int\_stat[0] = Timeout (IN ACK timeout or OUT data timeout)
- Per-endpoint interrupt mask registers (written on wclk by accessing the endpoint's register window, offset 0x1):
  - On write to EP INT MASK (offset 0x1 within the endpoint window):
    - ienb[5:0] <= din[21:16] gate epX\_intb
    - iena[5:0] <= din[29:24] gate epX\_inta
  - Endpoint interrupt outputs are level-true and updated on wclk:
    - epX\_inta <= OR over (int\_stat[i] & iena[i]) for i in the implemented set
    - epX\_intb <= OR over (int\_stat[i] & ienb[i]) for i in the implemented set
  - Clearing endpoint interrupt status:
    - Read EP INT STATUS (offset 0x1 within that endpoint's register window) with re=1 clears all bits in that endpoint's int\_stat.

#### Endpoint presence

- EP0 is always present. EP1..EP15 are conditionally instantiated via USBF\_HAVE\_EP# macros. If an endpoint is not instantiated, a dummy module ties epX\_inta/intb low, so it does not contribute to inta\_o/intb\_o or int\_srca.

#### Clocking and reset domain

- All interrupt status, mask, aggregation, and the top-level inta\_o/intb\_o are in the WISHBONE clock domain (wclk = clk\_i). RF event latching from PHY domain signals is synchronized into wclk before affecting int\_srcb.

## Software usage summary

- To enable top-level reporting of RF events on inta\_o/intb\_o: write RF INT MASK (addr 0x02) to set inta\_msk/intb\_msk.
- To observe and clear RF events: read RF INT STATUS (addr 0x03). Reading clears only the RF portion (int\_srcb), not endpoint statuses.
- To enable endpoint-generated interrupts on inta\_o/intb\_o: program per-endpoint iena/ienb in the endpoint's INT MASK register (offset 0x1) and ensure endpoint int\_stat bits can set.
- To clear endpoint interrupt causes: read the endpoint's INT STATUS register (offset 0x1) for that endpoint window; this clears that endpoint's int\_stat.
- The top-level lines assert whenever any enabled endpoint or masked RF source is active; they deassert when all enabled sources are cleared.

-- Reset Distribution and Sequencing --

## Overview and polarity

- A single reset net rst\_i drives the entire core. Reset is active-low: rst\_i = 0 asserts reset. The same level is driven out on phy\_rst\_pad\_o to reset the external UTMI PHY.

## Clock-domain distribution

- The core has two principal clock domains: wb clock (clk\_i) and PHY/packet-processing clock (phy\_clk\_pad\_i). The same rst\_i net feeds logic in both domains.
- Compile-time option USBF\_ASYNC\_RESET selects asynchronous assertion (negedge rst) for most sequential blocks; otherwise resets are synchronous to the local clock. In both cases, deassertion is sampled on the local clock.

## Deassertion and sequencing requirements

- Ensure both clk\_i and phy\_clk\_pad\_i are running and stable before deasserting rst\_i. Hold rst\_i low for at least two rising edges of both clocks to guarantee all FSMs and counters initialize deterministically.
- Because reset is distributed into multiple clock domains without additional synchronization, system-level sequencing must avoid glitching rst\_i and must meet each domain's recovery/removal timing.
- The external UTMI PHY reset (phy\_rst\_pad\_o) mirrors rst\_i; release the PHY reset together with or before the internal logic so that UTMI Rx/Tx handshake signals become valid when internal state machines leave reset.

## Module-level effects on reset

- usbf\_utmi\_if and usbf\_utmi\_ls (PHY/link state): UTMI-LS FSM enters POR on rst\_i and also whenever usb\_vbus is asserted. Defaults are forced to Full-Speed: XcvSelect=FS, TermSel=FS pull-up on, OpMode bit-stuffing off until the state machine transitions. usb\_suspend is cleared, usb\_attached is cleared, mode\_hs=0. After reset and with bus conditions present, the FSM proceeds through ATTACH ( $\approx 100$  ms debounce via internal timer) to NORMAL. suspend\_clr is generated during resume paths and is used at top-level to clear any latched resume\_req.
- usbf\_pl (protocol layer) and sub-blocks:
  - usbf\_pd (packet decoder): PID, token/data pipelines and CRC engines reset; FSM  $\rightarrow$  IDLE.
  - usbf\_pa (packet assembler): CRC16 seed set to 0xFFFF; FSM  $\rightarrow$  IDLE; no data/CRC emission pending.
  - usbf\_idma (internal DMA): Address/counters cleared, send\_data deasserted; read/write pipelines idle; FSM  $\rightarrow$  IDLE.
  - usbf\_pe (packet engine): All control/data-path FSMs  $\rightarrow$  IDLE; buffer selectors, timers (ACK/DATA TO), and error flags cleared; no token/data is driven.
- usbf\_mem\_arb (SRAM arbiter): wack\_r is cleared (PHY clock domain). SRAM read-enable is

statically 1 by design; write-enable is held low while requests are idle.

- usbf\_wb (Wishbone interface): Main arbiter FSM (PHY clock domain) → IDLE; Wishbone handshake flops in clk\_i domain start in the deasserted state and generate a clean first acknowledge after reset.
- usbf\_rf (register file) and usbf\_ep\_rf (per-endpoint regs): Function address (funct\_adr) resets to 0; interrupt masks clear; vendor-control write strobes clear. All endpoints: CSR fields default to 0; buf0/buf1 descriptors initialize to 0xFFFF\_FFFF (not allocated), DMA counters/buffer-availability indicators clear, and interrupt status bits clear (and are cleared again upon read of the status register). inta/intb outputs are low until enabled and a source is set.

#### VBUS interaction

- The UTMI link-state FSM unconditionally re-enters POR while usb\_vbus is asserted; after bus presence and timers satisfy attach timing, the FSM proceeds to ATTACH then NORMAL. System designers should ensure usb\_vbus reflects actual attach/power state to avoid holding the link FSM in POR.

#### Cross-domain considerations

- Several handshakes cross domains (e.g., resume\_req synchronizers, DMA req/ack, UTMI status). Reset drives all participating flops to benign values; there is no additional reset-release ordering between domains beyond the shared rst\_i. Designers should therefore deassert rst\_i only when both clocks are running.

#### Summary

- Assert rst\_i low to reset the entire core and the external PHY. Keep it asserted until both clocks are stable. With UBF\_ASYNC\_RESET defined, assertion is asynchronous per domain; otherwise it is synchronous. On release, all FSMs start in IDLE/POR, endpoints are disabled with invalid buffer descriptors, interrupts are cleared, and the UTMI FSM performs its standard attach/normal transition when VBUS and line conditions permit.

### -- PHY and External Memory Integration --

This core integrates a UTMI+ USB PHY and an external 32-bit synchronous SRAM through well-defined interfaces and clock-domain boundaries.

#### UTMI PHY integration

- Physical interface: The usbf\_utmi\_if module connects directly to the external PHY pads: DataOut/TxValid/TxReady (TX), DataIn/RxValid/RxActive/RxError (RX), XcvSelect, TermSel, SuspendM, OpMode, LineState, and usb\_vbus. All PHY-side logic runs on phy\_clk\_pad\_i; reset is rst\_i (also routed to phy\_rst\_pad\_o).
- Link state machine: usbf\_utmi\_ls performs HS/FS speed negotiation, reset detection, suspend/resume entry and signaling (drive\_k), attach/de-attach detection, and line-state based timing. It produces mode\_hs, usb\_reset, usb\_suspend, usb\_attached, and suspend\_clr. SuspendM is driven from usb\_suspend and resume request state.
- Resume handling: Top-level resume\_req\_i is latched until a suspend\_clr from the link state machine and then forwarded to usbf\_utmi\_ls. During resume signaling, TxValid is held asserted and DataOut forced while drive\_k is active.
- Data path coupling: usbf\_pl implements packet decode/encode (PD/PA), CRC, and transaction state machine (PE). It consumes rx\_\* and provides tx\_\* to usbf\_utmi\_if. TxFirst is used to stage the first TX byte to DataOut in sync with TxReady.
- Vendor I/O hooks: UTMI vendor status VStatus\_pad\_i is sampled and exposed via the register file; vendor control VControl\_pad\_o and load strobe VControl\_Load\_pad\_o are driven by software through usbf\_rf.

#### External memory (buffer SRAM) integration

- Memory organization: External buffer memory is 32-bit wide with address bus `sram_adr_o[SSRAM_HADR:0]`. With `SSRAM_HADR=9`, total capacity is 1024 words (4 KiB). `sram_data_o/sram_data_i` carry data; `sram_we_o` controls writes; `sram_re_o` is held high (combinational read enable).
- Masters and arbitration: `usbf_mem_arb` arbitrates between two masters in the PHY clock domain: (1) the packet-layer IDMA (M-port: `madr`, `mdout`, `mdin`, `mwe`, `mreq`, `mack`) used for line-rate payload transfers; and (2) the system/Wishbone software access (W-port: `wadr`, `wdout`, `wdin`, `wwe`, `wreq`, `wack`) used to inspect/fill endpoint buffers. Grant policy favors the IDMA; the W-port gains access only when no M-port request is active. Address and write-data are muxed based on the current grant; read-data is broadcast to both masters.
- Addressing and alignment: The IDMA operates on byte addresses `adr[SSRAM_HADR+2:0]` and packs/unpacks to 32-bit words internally. The SRAM is accessed using word addresses; IDMA and WISHBONE word indices are derived by dropping the two LSBs (e.g., `ma_addr[SSRAM_HADR+2:2]` to W-port). This yields contiguous, word-aligned buffer addressing for software and streaming byte addressing for the IDMA.
- Clock-domain crossing: All SRAM control (arbiter, M/W requests) runs on `phy_clk_pad_i`. `usbf_wb` synchronizes Wishbone bus requests from `clk_i` into the PHY domain (`wb_stb_i/wb_cyc_i` sampled on `phy_clk`) before generating memory requests, avoiding CDC hazards. W-port acknowledge (`wack`) is generated in the PHY domain and observed by `usbf_wb` in the system domain to complete Wishbone handshakes.
- Software mapping: The Wishbone address map is split by `wb_addr_i[12]`: 0 selects the register file (`usbf_rf`), 1 selects buffer SRAM via `usbf_wb → usbf_mem_arb`. Software accesses buffer memory as 32-bit words using `ma_addr[SSRAM_HADR+2:2]`.
- Packet-layer DMA: `usbf_idma` performs bursty RX writes and TX reads with word coalescing and sub-word packing, handshaking via `mreq/mack` and `mwe`. Buffer sizes and descriptors come from the endpoint register file; the packet engine updates descriptors atomically after transfers.

#### Reset and status

- `rst_i` resets both PHY and system domains; `phy_rst_pad_o` mirrors `rst_i`. LineState and VStatus are synchronously sampled in the PHY domain. PHY/link status (`usb_reset`, `usb_suspend`, `usb_attached`, `mode_hs`) is fed into the packet layer and register file for software visibility and interrupt generation.

-- System Bus Address Space Integration --

#### Bus type and width:

- Wishbone classic, 32-bit data bus, single transfer (no pipeline), one-hot ack. No byte select signals are implemented; all accesses are 32-bit word accesses.
- Local address bus width inside the core is A12:0 (13 bits). As compiled here: `USBF_UFC_HADR = 12`.

#### Top-level address map (relative to the core's base address B):

- The 8 KB local space is split by address bit 12.
- B + 0x0000\_0000 ... B + 0x0000\_0FFF (`wb_addr[12] = 0`): Register File (RF) region
- B + 0x0000\_1000 ... B + 0x0000\_1FFF (`wb_addr[12] = 1`): Buffer Memory window (SSRAM)

#### Address select and alignment:

- Region select:
- RF\_SEL = !`wb_addr[12]`
- MEM\_SEL = `wb_addr[12]`
- All accesses must be 32-bit word aligned. `wb_addr[1:0]` are ignored by the core (no byte enables).

Register File (RF) region (`wb_addr[12] = 0`):

- Register addressing uses wb\_addr[8:2] (7-bit word index) inside RF. The core internally decodes:
- Main/global registers: word index adr[6:0] = 0x00 ... 0x05
- 0x00: Main CSR (read). Also write: bit[5]=1 issues a resume request toward the UTMI layer.
- 0x01: Function Address (read/write). Bits [6:0].
- 0x02: Interrupt mask (read/write): {intb\_msk[8:0], inta\_msk[8:0]} packed as {7'h0, intb\_msk, 7'h0, inta\_msk}.
- 0x03: Interrupt sources (read/clear-on-read): packed as {int\_srcb[8:0], int\_srca[15:0]}.
- 0x04: Frame/time stamp (frm\_nat) (read).
- 0x05: UTMI vendor interface:
- Read: Vendor status ({24'h0, utmi\_vend\_stat}).
- Write: Vendor control (low nibble din[3:0] latched to VControl; write also asserts VControl\_Load strobe).
- Endpoint windows (per endpoint, 16 windows, each 4 words):
- Base word index = 0x04 + ep\_num (ep\_num = 0...15). Within a window, wb\_addr[3:2] selects:
  - +0: EP CSR (configuration/status)
  - +1: EP INT (interrupt status/masks per EP)
  - +2: EP BUF0 descriptor
  - +3: EP BUF1 descriptor
- Unimplemented endpoints are present as read-as-zero, write-ignored dummies (no side effects). In this build EP0...EP3 are implemented; EP4...EP15 are stubs.
- Notes:
  - RF read data is returned when wb\_addr[12]=0; RF writes are performed when wb\_addr[12]=0 and wb\_we\_i=1.
  - Only the addresses listed above have defined behavior; other RF word indices return zero and ignore writes.

#### Buffer Memory window (wb\_addr[12] = 1):

- Maps the USB buffer SSRAM to the system bus.
- SSRAM addressing:
  - Parameter SSRAM\_HADR = 9 (as compiled here). The external SSRAM address bus is sram\_adr[9:0] (10 bits), addressing 1024 words.
  - Wishbone word address bits wb\_addr[11:2] map to sram\_adr[9:0] (i.e., wb\_addr >> 2, truncated to 10 bits). The window thus covers 1024 x 32-bit words = 4096 bytes.
- Data path:
  - Reads/writes are full 32-bit words. There are no byte enables.
  - The arbiter (usbf\_mem\_arb) arbitrates between the PHY DMA master and the Wishbone port. System accesses to the memory window may incur wait states when DMA is active.

#### Handshake/timing on Wishbone:

- A valid bus request is wb\_cyc\_i=1 and wb\_stb\_i=1. The core pulses wb\_ack\_o=1 for exactly one cycle per completed transfer.
- RF accesses typically acknowledge after one core clock; memory window accesses acknowledge when the internal memory arbiter grants and returns data (wait states possible).

#### Integration guidelines:

- Map an 8 KB aperture for the core at base B. Use bit[12] of the local offset to select RF vs SSRAM.
- Ensure the bus fabric performs 32-bit word accesses only; no byte writes are supported.
- Treat wb\_addr[11:9] as don't-care in RF space (they are ignored by the RF decoder); software should keep them 0 for forward compatibility.
- Software should program endpoint buffer descriptors (BUF0/BUF1) with word-aligned SSRAM addresses consistent with the 4 KB buffer memory window.
- Interrupts are exposed as two lines (inta\_o, intb\_o); per-EP and core interrupt sources/masks are in

the RF region at 0x02/0x03.

- Function address is set via RF 0x01. A resume request toward the PHY can be triggered by writing bit[5]=1 at RF 0x00.
- The core reports its configured local widths at reset (A12:0 for Wishbone, SSRAM\_HADR for memory). In this build: Wishbone A12:0, SSRAM window size 4 KB.

-- Verification Hooks and Testbench Notes --

This core exposes multiple built-in hooks and behaviors to aid verification and testbench development.

#### Clocks and reset

- Two clock domains: wb\_clk=clk\_i (register-file/host) and phy\_clk=phy\_clk\_pad\_i (UTMI/link/memory). Many CDC handshakes exist (e.g., request/ack flip-flop pairs). Reset is synchronous by default; define USBF\_ASYNC\_RESET to switch to async resets on many always blocks. Top drives phy\_rst\_pad\_o=rst\_i for convenience.

#### Compile-time switches (enable in TB via +define+...)

- USBF\_DEBUG: enabled here; emits \$display error lines when key FSM inputs are X/unknown. Helps catch TB driving issues.
- USBF\_VERBOSE\_DEBUG: optional; adds state-entry traces in PE/IDMA.
- USBF\_HAVE\_EP: controls which endpoint register files are real (u1..u15) vs dummy. Top initial block prints errors if gaps exist.
- Timing constants: USBF\_T\* macros parameterize UTMI low-speed/full/high-speed timing checks (used by UTMI LS state machine). Can be adjusted for accelerated sims.

#### UTMI interface TB driving

- Connect DataIn/RxValid/RxActive/RxError/LineState and monitor DataOut/TxValid/TxReady.
- Tx path handshake: usbf\_utmi\_if asserts TxValid while any of: tx\_valid (from packet assembler), tx\_valid\_last, or internal resume drive\_k is active. It holds TxValid until TxReady or drive\_k\_r are observed. TB must model TxReady backpressure and accept DataOut updates on TxReady or on tx\_first.
- Rx path: provide byte stream with RxValid/RxActive; PID decode and token/data parsing is in usbf\_pd. Inject errors by setting RxError or creating bad PID complement (pid\_cks\_err), or CRC mismatches; crc16\_err and crc5\_err are generated internally.
- Suspend/resume: drive resume\_req\_i (host) and observe suspend\_clr pulse from utmi\_if. UTMI LS FSM also reacts to LineState (J/K/SE0), usb\_vbus\_pad\_i, and enforces timing windows; TB can emulate resets (SE0), chirp sequences, and HS/FS negotiation.
- Vendor UTMI test registers: write RF address 0x5 (usbf\_rf space) to drive VControl\_pad\_o with VControl\_Load\_o strobe; read UTMI vendor status via VStatus\_pad\_i (latched in u4).

#### Memory/IDMA and arbitration

- usbf\_mem\_arb connects UTMI-side IDMA and WISHBONE-side memory window to a shared synchronous SRAM; sram\_re\_o is tied high; sram\_we\_o pulsed as needed. Arbiter grants write/read-side on phy\_clk; wack asserts when wreq & !mreq & !wack previously, creating simple handshake. TB can use a trivial SRAM model (combinational read, registered write) with address sram\_adr\_o and 32-bit data.
- usbf\_idma generates mreq/mwe and expects mack (granted in arbiter). Word packing/unpacking and last-word completion are handled internally; TB need only provide mack via arbiter and allow continuous access.

#### WISHBONE access and address map

- usbf\_wb decodes wb\_addr\_i MSB per macros: USBF\_RF\_SEL(!wb\_addr\_i[USBF\_UFC\_HADR]) selects register file (RF), USBF\_MEM\_SEL(wb\_addr\_i[USBF\_UFC\_HADR]) selects memory window.

Note the last macro definitions in this file set USBF\_UFC\_HADR=12 and USBF\_SSRAM\_HADR=9; ensure the TB master uses the same widths. Memory path uses ma\_req/ma\_ack handshake (ack is immediate in arbiter).

- Reading RF vs MEM: wb\_data\_o is sourced from rf\_din or ma\_din based on RF\_SEL. wb\_ack\_o is a short pulse derived from internal state; TB WB master should accept single-cycle acks.

Endpoint register file (RF) and DMA request to system

- usbf\_rf instantiates EP0 real and EP1..EP15 per USBF\_HAVE\_EP flags. Each EP has CSR, buffer pointers (buf0/1), interrupt masks/status, and internal DMA bookkeeping.
- External DMA handshake: dma\_req\_o[15:0] are per-endpoint requests for system DMA (not the internal IDMA). TB can assert dma\_ack\_i bits to emulate DMA engine acceptance; EP logic keeps requests asserted/held across clock domains and uses holdoffs to prevent underrun/overrun.
- Interrupts: inta\_o/intb\_o combine EP interrupts and RF-level events (attach/detach, suspend/resume edges, errors). Some RF sources clear on read of the INT\_SRC register (addr 0x3 in RF page); masks at addr 0x2. TB should read INT\_SRC to clear latched sources.
- EP selection/match: usbf\_pe uses token\_fadr==funct\_adr (RF addr 0x1) and ep\_sel field to match endpoints (match\_o). TB must program endpoint CSRs (type/dir/max packet/DMA enable) and buf descriptors before exercising traffic.

Error injection and observability

- PID/CRC errors: pid\_cs\_err, crc5\_err, crc16\_err outputs available; nse\_err indicates token with no EP match. rx\_err input forces framing errors.
- State/time behavior: frm\_nat bus exposes frame number, SOF timing and microframe count for checking timing; useful for suspend/resume/reset sequences.
- USBF\_DEBUG prints: PE and IDMA states emit \$display when critical inputs are X; helps assert TB drives. Top initial \$display summarizes configured endpoints and bus sizes.

Recommended TB scenarios (high-level)

- Reset and speed negotiation: drive usb\_vbus\_pad\_i, SE0 for reset, then K/J line states per HS/FS negotiation; verify mode\_hs, XcvSelect/TermSel/OpMode and usb\_reset strobes.
- Enumeration path: program funct\_adr; configure EP0 CSR; exercise SETUP/IN/OUT with correct PIDs, including zero-length and data PID toggling; verify token/data paths and ACK/NAK/STALL responses.
- DMA flow: configure a non-control EP with DMA enabled; preload buf0/1 descriptors; observe dma\_req\_o and respond with dma\_ack\_i; verify internal IDMA moves data to/from SSRAM and that buf descriptors update (buf\*\_set, uc\_\*\_set) and INT bits set/clear.
- Backpressure and timeouts: throttle TxReady, assert RxActive without data to trigger timeouts (rx\_ack\_to, tx\_data\_to); verify INT\_TO and PE transitions.
- Error cases: inject bad PID complement, CRC errors, sequence errors, and unmatched tokens; verify INT sources (crc5\_err, pid\_cs\_err, nse\_err, seqerr) and endpoint-side INT bits.

Waveform/coverage focus

- FSM coverage: usbf\_pd, usbf\_pa, usbf\_pe, usbf\_idma, utmi\_ls state transitions; handshake crossings (dma\_req/ack, WB ack strobes).
- Endpoint matrix: directions (IN/OUT/CTRL), transfer types (iso/bulk), DMA vs non-DMA, buf0/buf1 rolling, max packet edge cases (short/long/out\_to\_small).
- Interrupts and mask behavior, RF read-to-clear side effects, vendor register handshake.

Known behaviors useful in TB

- sram\_re\_o tied high; memory is effectively always-readable. Arbiter grants based on mreq vs wreq.
- ma\_ack is generated combinationally from mreq inside wb/arbitration, so MEM window behaves 0-wait-cycle; ensure TB WB master samples wb\_data\_o accordingly.
- Zero-length IN support via send\_zero\_length path when max packet=0.

- Final macro definitions in this flattened file override earlier ones; ensure TB uses the last values for address MSBs and SSRAM size.

-- Integration Guidelines and Best Practices --

- Clocks and resets

- Two primary clock domains: wb\_clk (clk\_i) for the system/Wishbone and phy\_clk (phy\_clk\_pad\_i) for UTMI PHY and packet logic. Avoid crossing signals directly; use the existing bridge modules (usbf\_wb, usbf\_utmi\_if, usbf\_pl) which already include synchronizers.
- By default resets are synchronous (USBF\_ASYNC\_RESET not defined). If you require asynchronous resets, define USBF\_ASYNC\_RESET consistently across the project.
- phy\_rst\_pad\_o is a copy of rst\_i; ensure PHY reset timing meets your PHY's requirements.

- UTMI+ PHY integration

- Connect UTMI 8-bit interface: DataIn\_pad\_i/DataOut\_pad\_o, TxValid\_pad\_o, TxReady\_pad\_i, RxValid/Active/Error, LineState\_pad\_i, XcvSelect\_pad\_o, TermSel\_pad\_o, SuspendM\_pad\_o, OpMode\_pad\_o, usb\_vbus\_pad\_i.
- Clock phy\_clk\_pad\_i must be the UTMI data clock (typically 60 MHz). Maintain low-jitter and correct duty cycle.
- Speed negotiation and resume/suspend are handled in usbf\_utmi\_ls. Provide resume\_req\_i when system intends to remote-wake; it is internally latched and auto-cleared by suspend\_clr.
- Verify usb\_vbus polarity versus your PHY. The state machine uses usb\_vbus in its state reset path; mismatched polarity can keep the link in POR.
- Vendor control/status: VControl\_pad\_o and VControl\_Load\_pad\_o drive UTMI vendor registers; VStatus\_pad\_i is sampled. Ensure your PHY exposes these or tie off appropriately.

- Wishbone bus integration

- 32-bit classic Wishbone (no byte selects). Single read/write cycles; wb\_ack\_o is a single-cycle pulse generated in wb\_clk domain.
- Address map split uses macro USBF\_RF\_SEL/USBF\_MEM\_SEL based on wb\_addr\_i[USBF\_UFC\_HADR]. Final macro definitions in this file set USBF\_UFC\_HADR=12 (i.e., wb\_addr\_i[12] selects):
  - wb\_addr\_i[12]==0: Register file (RF) space (accessed via usbf\_rf)
  - wb\_addr\_i[12]==1: Buffer memory space (accessed via usbf\_mem\_arb)
- Ensure wb\_addr\_i is at least 13 bits wide. Align accesses to 32-bit words.

- External buffer memory (SSRAM) interface

- Signals: sram\_adr\_o[SSRAM\_HADR:0], sram\_data\_i/o[31:0], sram\_re\_o, sram\_we\_o. Data width is 32-bit.
- Final macro sets USBF\_SSRAM\_HADR=9, yielding 4 KiB buffer RAM ((1<<(SSRAM\_HADR+1))\*4). Adjust macro as needed.
- usbf\_mem\_arb expects zero-wait-state behavior for the DMA port (mack=mreq). If your memory has wait states, insert adaptation logic or modify the arbiter.
- sram\_re\_o is held high; your memory wrapper should ignore it or treat it as permanently enabled.
- Addressing is word-based inside the core: ma\_addr[SSRAM\_HADR+2:2] maps to sram\_adr\_o. Provide proper byte ordering; IDMA packs bytes little-endian across the 32-bit word.

- Endpoints and register file

- EP0 is always present. Additional endpoints are controlled by defines USBF\_HAVE\_EP1..USBF\_HAVE\_EP15. This top includes defines for EP1–EP3; extend contiguously from EP1 upward (the top checks for gaps and prints errors).
- Register file usbf\_rf exposes per-EP CSRs and descriptors and system-level status/interrupt control.

Access via RF address space (wb\_addr\_i[12]==0). Internal address mapping uses adr[6:2] per-EP windows; keep software consistent with this layout.

- Functional address (device address) is held in funct\_adr and must be set by firmware after a SET\_ADDRESS request.

- Buffer descriptors and data flow

- Each EP has Buf0/Buf1 descriptors and CSR. Descriptor fields (typical):

- [31]: not-available flag; [30:17]: buffer size in bytes; [SSRAM\_HADR+2:0]: start address in words/bytes (see module access). Align buffers to 4-byte boundaries to match 32-bit memory port.

- The packet engine (usbf\_pl/usbf\_pe/usbf\_idma) directly moves data between UTMI and the external buffer RAM based on these descriptors and the EP CSR (direction, max packet size, transfer type).

- Max packet size must match the host configuration and EP CSR; software should ensure descriptor sizes are consistent and large enough to avoid overflow conditions flagged by the core.

- External DMA handshake (system DMA)

- Top-level dma\_req\_o[15:0]/dma\_ack\_i[15:0] expose per-EP requests from usbf\_ep\_rf to system logic. Use these to schedule system DMA transfers between the USB buffer RAM and system memory, if desired.

- Protocol: EP asserts dma\_req when it has space/data according to EP direction and CSR. System asserts corresponding dma\_ack\_i bit to acknowledge service; the EP will deassert request (it may hold the request depending on internal hold logic if more work remains). Design your DMA engine to operate per-bit and avoid missing short pulses (use registered acks).

- Interrupts

- inta\_o and intb\_o are the combined interrupt outputs. Masking and source registers are in the RF (main and per-EP). Software must read/clear latched status per documented locations in usbf\_rf.

- Sources include protocol errors (PID CS, CRC5/CRC16), attach/detach, suspend/resume, USB reset, and EP buffer events.

- Error handling and flow control

- The core autonomously returns ACK/NACK/STALL/NYET based on EP state, buffer availability, transfer type, and HS/FS mode.

- Sequence errors and size mismatches are flagged; software should clear and/or reconfigure EPs accordingly.

- Power management and resume

- usb\_suspend is exported (susp\_o). To remote-wake, assert resume\_req\_i; the core manages K-chirp and resume timing, and auto-clears the request via suspend\_clr.

- Timing and CDC constraints

- Constrain both clk\_i and phy\_clk\_pad\_i domains. Mark inter-domain paths through provided synchronizers as false paths if your STA flow requires it.

- Ensure UTMI timing (TxValid/TxReady and DataOut/DataIn) is met; the core registers outputs and samples inputs in phy\_clk domain.

- Configuration macros sanity

- This file redefines macros; ensure your build system sets final intended values: USBF\_UFC\_HADR (default final 12), USBF\_SSRAM\_HADR (default final 9), USBF\_HAVE\_EPx as needed, and optionally USBF\_ASYNC\_RESET.

- Keep endpoint defines contiguous from 1..N to avoid RF instantiation gaps.

- Software bring-up checklist

- After reset: program EP0 CSR and buffers; write functional address after SET\_ADDRESS; configure

masks in RF; set EP CSRs (direction/type/max packet size); allocate and program buf0/buf1 descriptors pointing into buffer RAM; service dma\_req\_o (if using system DMA); handle interrupts.

- Simulation and synthesis notes
- The design prints configuration info at sim start. Disable USBF\_DEBUG/USBF\_TEST\_IMPL in production to reduce logic and messages.
- Verify memory wrapper semantics (read enable always high, zero-wait DMA path). If not met, adapt usbf\_mem\_arb or add a memory adapter.