

---

## 1. Introduction and Central Problem Statement

Optimizing the Raspberry Pi 5 (RPi 5) for humanoid robotics is a technical imperative, as the dynamic stability of a biped requires a Whole-Body Control (WBC) loop operating at a minimum frequency of 1 kHz, corresponding to a computation period of 1000 µs. At this frequency, every microsecond of latency is critical, and the specific architecture of the RPi 5 necessitates radical measures to guarantee temporal determinism.

Here is a detailed analysis of the challenges and optimization solutions to transform this processor into a hard real-time controller.

## 2. Physical Architecture and Communication (The Nervous System)

### 1. Structural Problem: Disaggregated Architecture and RP1

Although the RPi 5 has a Broadcom BCM2712 (Cortex-A76) processor that is significantly more powerful than its predecessors, it introduces an architectural complexity that is detrimental to real-time performance.

- Southbridge RP1 Indirection: Unlike previous models, the peripherals (GPIO, SPI, I2C) are no longer integrated into the main SoC but are located on a separate chip, the RP1, connected via a PCIe 2.0 x4 link. This physical separation creates an unavoidable arbitration time for each transaction.
- Structural "Dead Time": Instrumental analyses reveal a dead time of approximately 12 µs between the activation of the Chip Select line and the effective start of the clock signal, compared to only 1 µs on the RPi 4.
- Arithmetic impact: For a robot with 12 degrees of freedom requiring 24 transactions per cycle (read and write), this delay accumulates to reach 288 µs, consuming nearly 30% of the total time budget before any algorithmic calculation.
- The performance paradox: Under load, the RPi 5 can exhibit latency peaks reaching 800 µs, whereas the RPi 4 capped at 200 µs, which is catastrophic for a 1000 µs loop.

### 2. Nervous System: Communication and Performance

The development of dynamic humanoid robotics requires a transition from a classic communication architecture to a high-bandwidth "nervous system".

Sources highlight a major technological breakthrough: the abandonment of the SPI bus in favor of the PCIe and CAN-FD combination to guarantee the determinism and reactivity necessary for bipedal balance.

#### 2.1. The Obsolescence of SPI on Raspberry Pi 5

The use of the SPI (Serial Peripheral Interface) bus on the Raspberry Pi 5 now constitutes a critical bottleneck for high-frequency control (1 kHz and above).

- The Chiplet Effect and the RP1 Southbridge: Unlike previous generations, the RP1's BCM2712 SoC offloads input/output functions (including SPI) to a separate chip, the RP1, connected via a PCIe 2.0 x4 link. This hardware indirection introduces an unavoidable arbitration time.
- Dead Time: Instrumental measurements reveal a delay of approximately 12  $\mu$ s between the activation of the Chip Select (CS) signal and the actual start of the clock, as well as between packets. On a Raspberry Pi 4, this delay was only 1  $\mu$ s.
- Time Budget Saturation: For a robot with 12 degrees of freedom (DoF), 24 transactions (read/write per cycle) generate a cumulative delay of 288  $\mu$ s. This consumes nearly 30% of the CPU time of a 1 ms loop before any useful data is even processed, capping the stable control frequency at 500 Hz, which is insufficient for dynamic locomotion.

## 2.2. The PCI Express (PCIe) Revolution and DMA

To break this latency ceiling, the use of the PCIe 2.0 x1 connector exposed by the RP1 is essential.

- Direct Memory Access (DMA): Unlike SPI, which requests the CPU for each byte, PCIe-based CAN-FD controllers use DMA to write sensor data directly into the host system's RAM without CPU intervention.
- Drastic Latency Reduction: The latency for initiating a transaction on PCIe drops to less than 1  $\mu$ s. In addition, the use of interrupt coalescing reduces the overhead associated with context switching.
- Bus Efficiency: While SPI or USB adapters often saturate at 20-50% with significant jitter, a PCIe link allows maintaining a bus load of 95% to 98% without frame loss.

## 2.3. CAN-FD: Densification and Unified Control

The CAN-FD (Flexible Data-rate) protocol complements this architecture as a robust fieldbus for distributed control to members.

- Payload Densification: The increase from 8 bytes (CAN 2.0) to 64 bytes per frame allows for the implementation of unified control frames. A single frame can now encapsulate the target position, speed, torque, and gains (KP, KD).

- Reduced Overhead: This capability halves or even reduces by two-thirds the number of transactions required per control cycle. By grouping data, the number of arbitrations on the bus is reduced, and the atomic consistency of commands applied to each joint is guaranteed.
- Throughput Acceleration: CAN-FD maintains a robust arbitration phase but switches to a throughput of 5 to 8 Mbps for the data phase, radically reducing the bus occupancy time for each message.

In short, if the SPI bus is a crowded hiking trail where each hiker must stop at every barrier, the PCIe/CAN-FD architecture is an automated highway where data flows in massive convoys without ever stopping. [Source: Analogy based on the concepts of transactional latency and source densification 8, 197, 199].

### 3. Software Optimization (Kernel & OS)

#### 2. Kernel Tuning

Applying the PREEMPT\_RT patch is the first step to transforming Linux into a real-time system, but it is insufficient on the RPi 5 without strict partitioning of resources via the cmdline.txt file.

- Isolation (isolcpus=2,3): This physical parameter excludes cores 2 and 3 from the Linux general scheduler. This prevents the system from moving non-critical processes (WiFi, SSH, updates) to these cores, thus avoiding the invalidation of L1/L2 caches that causes unpredictable memory reload times.
- "Full Tickless" mode (nohz\_full=2,3): The Linux kernel normally generates a periodic clock interrupt (the "tick") to manage time. Tickless mode disables this interrupt on isolated cores, eliminating a major source of jitter and enabling "bare-metal" execution of the control code.
- RCU callbacks (rcu\_nocbs=2,3): Kernel memory maintenance operations (Read-Copy-Update) are offloaded to cores 0 and 1. This measure avoids unpredictable stalls during intensive solver memory accesses.
- Frequency governor (cpufreq performance): It is crucial to lock the CPU frequency at its maximum (2.4 GHz). This eliminates latency related to clock speed changes (DVFS) and phase-locked loop (PLL) synchronization.

#### 3. Interrupt Management (IRQ Affinity)

By default, the interrupt controller distributes hardware signals to any available core, which saturates the system with conflicting requests. A physical segregation strategy is necessary.

- Cores 0 and 1 (Management): They absorb all the "noise" of the system, including network interruptions, USB and logging.

- Core 2 (Pre-processing): The critical interrupt from the CAN-FD board (signaling the arrival of sensor data) must be dedicated to it. It prepares the data and injects it into the shared memory.
- Core 3 (Sanctuary): This core is reserved exclusively for the real-time control thread running the QP (Quadratic Programming) or OCS2 solver. By being freed from hardware interrupts and system tasks, it can solve inverse dynamics equations with minimal and constant latency.

Summary of the recommended configuration (cmdline.txt)

To stabilize the loop at 1 kHz, the following parameters must be added: isolcpus=2,3 nohz\_full=2,3 rcu\_nocbs=2,3 cpufreq.default\_governor=performance.

## 4. Electrical Safety and Power (48V Architecture)

### 3. Power Architecture Safety (48V)

Adopting a 48V architecture (typically using 13S LiPo packs) is essential to achieve the power densities required for the dynamic locomotion of humanoid robots. However, this transition transforms the robot into a high-energy system comparable to light electric vehicles, where the low internal resistance of the batteries can generate short-circuit currents of several hundred amps, necessitating rigorous hardware safety protocols.

#### 3.1 Inrush Current Management: The Pre-charge Circuit

Connecting a 48V battery directly to a bus powering many motor drives (such as the Robostrike 04) causes a destructive electric arc, called "inrush current".

- Physics of the phenomenon: Motor controllers have massive banks of filtering capacitors (often >2000 F). At the moment of power-up, if the capacitors are discharged, the system behaves like a short circuit. The instantaneous current, limited only by the resistance of the cables, can reach approximately 1000 A.
- Technical solution: A pre-charge circuit is mandatory to prevent permanent welding of the main relay contacts by electro-erosion. It consists of a secondary relay connected in series with a power resistor (typically between 20 and 50 Ohms, ceramic or wire-wound type) placed in parallel with the main contactor.
- Start-up sequence: A control logic (via microcontroller or time-delay relay) must first close the circuit through the resistor. Once the bus voltage reaches 90 to 95% of the battery voltage, the main contactor is closed without arcing, and then the pre-charge relay is opened.

#### 3.2 Overvoltage Protection: The Braking Chopper

In dynamic robotics, motors are so-called "4 quadrant" loads: they consume energy, but also produce it during braking or impact phases.

- Regenerative energy: During sudden deceleration or when landing from a jump, the motors act as generators and send a massive reverse current back to the DC bus (Back-EMF). If the battery cannot absorb this current quickly enough (or if the battery management system, BMS, cuts the circuit), the bus voltage rises instantly.
- Critical threshold: On a 48V bus, the voltage can exceed 60V, crossing the breakdown voltage of the MOSFETs in the drives and causing their immediate destruction.
- Module operation: The Braking Chopper constantly monitors the voltage. As soon as a critical threshold is reached (usually 52V), a power transistor (IGBT or MOSFET) diverts the excess energy to an external dissipation resistor which transforms the excess into heat, acting as an electrical safety valve.

### 3.3 Emergency Stop (E-Stop) Architecture and Circuit Integrity

Emergency shutdown of a high-power direct current (DC) is physically more complex than that of an alternating current (AC) because there is no zero crossing to extinguish the electric arc.

- Limitations of standard relays: Conventional relays cannot interrupt a 48V DC arc under heavy load; the arc can persist, burning the contacts and keeping the robot powered on despite the emergency button being pressed.
- Dedicated contactors: It is essential to use industrial DC contactors (e.g., Albright SW80 or Gigavac GV200) equipped with magnetic blowers. These devices use permanent magnets to "push" and lengthen the electric arc via the Lorentz force until it physically breaks.
- Redundancy and STO: In accordance with ISO 13849, the circuit must be dual-channel (redundant). Additionally, the drives must support the Safe Torque Off (STO) function, which physically disconnects the power supply to the gate drivers of the power transistors, ensuring that no torque can be produced, even in the event of a control software crash.
- Warning about SSRs: Solid state relays (SSRs) are not recommended for the main safety cutoff because their typical failure mode is short circuit (remaining closed), which is unacceptable for a safety system.

To further stabilize the system during violent current demands (10kW peaks during a jump), the addition of supercapacitors in parallel can act as a buffer, protecting the battery from thermal stress and preventing voltage drops (brownouts) that could reset the on-board computers.

## 5. Detailed Engineering and Technical Risks

### 5. Summary of Technical Risks and Lessons Learned (Reconstruction of the Table)

The following table reorganizes the technical "Bitter Lessons" scattered throughout the original document. Here is a detailed extension of your project's technical specifications, structured by source to provide a robust engineering foundation.

#### 1. Actuators (Thermal): The Endurance Barrier

Quasi-Direct Drive (QDD) actuators are essential for mechanical transparency, but their low reduction necessitates high currents for posture maintenance.

- Physical Problem: Heat is generated by Joule losses. Since the torque is proportional to the current, the heat increases with the square of the torque produced.
- Material Reality: Standard grade Neodymium (NdFeB) magnets lose their magnetic strength irreversibly at 80-100°C, which permanently reduces motor performance.
- Golden Rule: According to Arrhenius' law, an increase of 10°C above the nominal temperature halves the lifespan of the winding insulation.

#### 2. Materials: Structural and Thermal Resilience

Using materials unsuitable for heat can transform a rigid structure into a soft and imprecise system.

- Limitations of PLA: PLA undergoes creep (slow deformation under load) from 55-60°C. The frame of a motor under load can easily reach this temperature, risking blocking the fan blades or misaligning the motor shafts.
- PAHT-CF Solution: The use of PAHT-CF (High Temperature Carbon Reinforced Nylon) is imperative for fairings and structural parts thanks to its Load Deflection Temperature (HDT) of 194°C.
- PCM management: Phase change materials (PCMs) such as paraffin act as "thermal capacitors," storing energy during peak stress without increasing the temperature. However, they require rest periods to solidify again and release this heat.

#### 3. Cooling: Active Exhaust Required

Passive cooling is insufficient for the power densities required by dynamic running.

- High-Pressure Ventilation: In a compact robot, airflow is restricted. High static pressure fans (such as the 9HV series) capable of forcing air through dense heat sinks should be preferred over models with high vacuum airflow.
- Software Integration: Cooling must be coupled with active thermal monitoring that forces the robot to change posture or sit down if the coil exceeds a critical threshold (e.g., 80°C).

#### 4. Electronics (Bus): CAN-FD Signal Integrity

The move to high data rates (5-8 Mbps) transforms cables into complex transmission lines where every physical error creates a "robotic epilepsy".

- Daisy Chain Topology: Star topology is strictly forbidden because it creates massive impedance mismatches (up to 67% of reflected energy). The wiring must be a continuous daisy chain.
- Terminations and Stubs: Each end of the bus must have a resistance of 120 Ohms. The stubs to the motors must be limited to a maximum of 30 cm to avoid destructive echoes.
- Split Termination: To stabilize the bus against common mode noise, two 60 Ohm resistors are used with a capacitor connected to ground at the midpoint.

#### 5. EMI (Interference): Shielding and Protection

The power inverter switches currents of 100A in nanoseconds, creating a hostile electromagnetic environment.

- Hybrid Shielding: Use cables with braid (copper) for mechanical resistance and low frequencies, coupled with foil (aluminum) to block high frequencies from PWM.
- 360° Ground Connection: The shield termination must never be made with a wire ("pig's tail"), as this creates a radiating antenna. It is imperative to use EMC cable glands ensuring complete circular contact.
- Filtering and Flyback: A 51  $\mu$ H common-mode choke (CMC) is recommended to filter motor noise without distorting the CAN-FD signal. TVS protection is essential to absorb flyback voltage surges (>100V) generated during sudden current interruptions in the inductors.

#### 6. Project Strategy: Software-Defined David

Faced with Chinese competition capable of selling robots (e.g., Unitree G1) at a price close to your material cost, the strategy must pivot.

- Software Focus: Don't aim for pure hardware excellence. The value lies in the software stack and cognitive autonomy.
- Target Budget: Maintain a budget of around 5k using "off-the-shelf" components (COTS) and modular chassis.
- Openness: The competitive advantage is the total transparency of the control loops, allowing rapid iterations where industrial systems are "black boxes".

## 6. AI, Simulation and Control

### 4. The Brain: AI and Sim-to-Real Transfer

Transferring policies learned in simulation to the real robot is the most critical step, as "ideal" physical models often fail to account for unmodeled nonlinearities such as friction, mechanical play, and communication delays. To bridge this "reality gap," a hybrid approach combining neural network modeling and stochastic environments is employed.

#### 4.1 Actuator Nets: Neural Modeling of Motors

The conventional approach ( $\tau = k_t \cdot I$ ) is considered insufficient for geared actuators, because it ignores complex physical phenomena such as Coulomb, viscous and Stribeck friction, as well as planetary reducer hysteresis and mechanical backlash.

- Capture methodology: To create a faithful model, the real actuator is excited with variable frequency signals, called "chirp" signals, and random step inputs. The motor's response (actual torque produced, speed, position error) is then precisely recorded as a function of the commands sent.
- Network architecture: This data is used to train a temporal neural network (usually an LSTM or an MLP with history). This network learns the complex transfer function of the real actuator.
- Integration and impact: This "actuator network" is then integrated directly into the simulator (as with Isaac Lab), replacing the default motor model. This technique reduces tracking error from over 20% to less than 5%, often facilitating a "zero-shot" transfer (without readjustment to the real robot). The agent thus learns a policy that natively respects the physical limits and saturations of real motors.

#### 4.2 Domain Randomization

To ensure the robustness of the policy, the simulation environment must be designed to be far more hostile and noisy than reality. By massively perturbing the physical parameters in each training episode, the neural network learns a policy capable of encompassing the entire spectrum of real-world conditions.

- Variability of physical parameters: Sources recommend strict ranges of variation to force adaptability:
  - ÿ Mass of segments: Variation of ±15% to 20% to compensate for unmodeled wiring errors or manufacturing tolerances.
  - ÿ Ground friction: Simulation of coefficients ranging from 0.4 to 1.2 (to mimic surfaces ranging from ice to thick carpet).
  - ÿ Joint friction: Addition of noises of the order of ±0.05 Nm related to wear, grease or temperature.

- Communication latency management: This is a crucial parameter. Command delays ranging from 0 to 20 ms are simulated. This allows the AI to remain stable despite the jitter of the Linux kernel, the delays of the CAN bus and the intrinsic latency of the embedded computer (Raspberry Pi or Jetson).

#### 4.3 Disruptions and Curriculum-Based Learning

The acquisition of recovery reflexes (such as catching oneself after a stumble) requires the introduction of dynamic external perturbations during learning.

- Random thrusts: The simulator applies unpredictable forces to the robot's center of mass (CoM). These "blows" force the agent to discover active equilibrium strategies.
- Progressive curriculum: Applying all disturbances and maximum noise from the outset would prevent learning convergence. Therefore, a curriculum approach is used: the environment is initially "clean," then the difficulty (amplitude of thrusts, level of friction, latencies) gradually increases over training periods.
- Result: This increase in power ensures a convergence towards a stable policy capable of working on rough terrain or resisting unforeseen human interactions from its physical deployment.

#### 7. Control (AI): Bridging the "Reality Gap"

Transferring the simulation (Isaac Lab) to the real robot fails if the models ignore friction and mechanical play.

- Actuator Net: Replace ideal motor models with neural networks (MLP or LSTM) trained on real signals ("chirp") to capture hysteresis and nonlinear friction. This reduces tracking error from 20% to less than 5%.
- Massive Randomization: For a "zero-shot" transfer, randomly vary during training the mass of the links, the ground friction (0.4 to 1.2) and especially the command latencies (0-20 ms) related to the Linux kernel and the CAN bus.

## 7. Summary and Conclusion

The stability and operational survival of a dynamic humanoid robot do not result from the isolated performance of a single component, but from the Chain of Coherence: a systemic synchronization between computing power, software determinism, and motor intelligence. This architecture transforms a potentially fragile machine into a resilient system capable of executing explosive movements without thermal failure or loss of communication.

### 1. The Hardware Layer: The PCIe Backbone and the CAN-FD Reflex Bus

The physical layer constitutes the robot's "nervous system". For a bipedal humanoid, which is structurally an unstable inverted pendulum, the latency between the perception of an imbalance and the application of a corrective torque is the number one survival factor.

- The SPI Bottleneck: On the Raspberry Pi 5 (RPi5) architecture, using the SPI bus to control motors is prohibited for high performance. The chiplet architecture of the BCM2712 SoC introduces a dead time of 12  $\mu$ s per transaction. For a robot with 12 degrees of freedom performing 24 transactions per cycle, this consumes nearly 30% of the time budget of a 1 kHz loop before any useful data is even transferred.
- The Superiority of PCIe: The imperative solution lies in exploiting the PCIe 2.0 x1 interface of the RPi5. Unlike SPI, PCIe allows Direct Memory Access (DMA), reducing initiation latency to less than 1  $\mu$ s and freeing the processor from the overhead of interrupt handling.
- CAN-FD Densification: The switch to the CAN-FD protocol (5 to 8 Mbps) allows all commands (position, speed, torque, gains) to be encapsulated in a single 64-byte frame. This guarantees the atomic consistency of the commands sent to the actuator and reduces the number of transactions required by a factor of three compared to conventional CAN 2.0.

### 2. The System Layer (OS): Mastering Temporal Determinism

Having a fast processor is not enough if the operating system is not deterministic. A Whole-Body Control (WBC) loop must run with metronome-like regularity at 1 kHz.

- Linux RT (PREEMPT\_RT): Applying this patch is vital to make critical kernel sections preemptible. Without it, unpredictable latency spikes ( $> 800 \mu$ s on RPi5) cause missed deadlines, inevitably leading to mechanical instability and robot failure.
- Strict isolation via isolcpus: To transform Linux into a near "bare-metal" system, CPU resources must be partitioned. Adding the parameter isolcpus=2,3 to the boot configuration removes cores 2 and 3 from the main scheduler.

- The Choreography of Hearts: A robust strategy is to "pin" the tasks
  - :
- ÿ Core 3: Dedicated exclusively to the real-time control thread (QP/OCS2 solver).
- ÿ Core 2: Dedicated to interrupt (IRQ) management of the CAN-FD controller.
- ÿ Cores 0-1: Handling non-critical tasks such as WiFi, USB and logs.
- Aggressive Optimizations: The use of nohz\_full (disabling the periodic clock tick) and rcu\_nocbs modes on isolated cores eliminates the last sources of software jitter.

### 3. The AI Layer: Driving Intelligence via Actuator Nets

The transfer from simulation to the real world (Sim-to-Real) usually fails because of the "Reality Gap": "ideal" physical models do not capture the friction, backlash and hysteresis of real engines.

- The Actuator Net Methodology: Instead of theoretical physical equations, a neural network (LSTM or MLP) trained on empirical data is used. Real actuators are excited with "chirp" signals (frequency sweeps) to record their exact response.
- Integration into Isaac Lab: This neural network model is integrated directly into the simulator, replacing the default motor model. The agent thus learns a policy that natively respects the thermal limits, current saturations, and non-linearities of real motors, reducing the position tracking error from 20% to less than 5%.
- Domain Randomization: To ensure robustness, the training environment is intentionally made more hostile than reality. Segment mass ( $\pm 15\text{-}20\%$ ), ground friction (0.4 to 1.2), and communication latencies (0 to 20 ms) are massively perturbed. This curriculum-based approach ensures that the learned policy can handle any real-world condition upon physical deployment.