

Master Thesis

# Optimal Platform for Embedded Super Computers

---

ZURICH UNIVERSITY OF APPLIED SCIENCES

INSTITUTE OF EMBEDDED SYSTEMS

Authors: Alexey Gromov

Last changes: 28.5.2020

**Copyright Information**

This document is the property of the Zurich University of Applied Sciences in Winterthur, Switzerland. All rights reserved. No part of this document may be used or copied in any way without the prior written permission of the Institute.

**Contact Information**

c/o Inst. of Embedded Systems (InES)  
Zürcher Hochschule für Angewandte Wissenschaften  
Technikumstrasse 22  
CH-8401 Winterthur

Tel.: +41 (0)58 934 75 25

Fax.: +41 (0)58 935 75 25

E-Mail: [groo@zhaw.ch](mailto:groo@zhaw.ch)

Homepage: <http://www.ines.zhaw.ch>

## **Declaration Concerning the Independent Authorship of a Specialization Project/Master Thesis in the Department School of Engineering**

By submitting this **specialization project/master thesis**, the student affirms that he/she has written the work independently and without outside help.

The undersigned student declares that all sources used (including websites) are correctly cited in the text or appendix, i.e. that the **specialization project/master thesis** contains no plagiarism, meaning no parts taken partly or wholly from another text or another person's work while claiming own authorship or without citing the source.

In the event of misconduct of any kind, Sections 39 and 40 of the Framework Examination Regulations (Rahmenprüfungsordnung) for the Bachelor's and Master's degree programmes at the Zurich University of Applied Sciences of 29 January 2008, as well as the provisions on disciplinary measures of the University Regulations (Hochschulordnung), take effect.

Place, date:

Signature:

The original of this form, with original signatures and date (no copy), must be included in the appendix of the ZHAW version of all submitted **specialization projects/master theses**.



# Abstract

Embedded super computers are a powerful combination of multi-core CPUs, additional accelerating computing blocks and dedicated high-performance interfaces in one chip. The tight coupling of these blocks has strongly improved the ratio of energy consumption to compute power and enabled the realization of vision and AI applications in compact embedded devices. Embedded super computers target a wide range of applications and offer a variety of interfaces to bring data into the processing units. However, the fusion of high-performance blocks makes the chips expensive and the development complex compared to standard processors.

Carrier board manufacturers for embedded super computers inherit this complexity and try to include as many physical interfaces as possible on their boards. Hence, the price of the carrier boards increases. However, each application demands only a reduced set of interfaces, so a complex carrier board is always only partly used.

This thesis proposes an implementation of a modular carrier board to increase the utilization of hardware and reduce the overall system price. To develop an optimal, modular platform for the Nvidia Xavier, the internal peripherals are grouped into functional blocks, and a market analysis derives the desirable converters for each block in different applications in order to define standardized extension module interfaces. The end product is a minimal carrier with simple access to all functional blocks, without limiting the functionality of the embedded super computer.

Five extension modules for the functional groups of Internet access, video input, video output and general-purpose high-speed communication are developed to compare the modular carrier board with existing solutions. The new approach is configurable with simple, specialized modules and increases the utilization of an embedded super computer without increasing the complexity.

## Acknowledgment

I wish to thank everyone who was involved in my master thesis for the interesting discussions and helpful feedback over the past months. In particular, I would like to express my gratitude to my supervisor Hans Gelke for his patience, his forward-driving energy and his support, and especially for his understanding of the difficulties of a hardware project during the COVID-19 pandemic.

In addition, I am thankful to the ZHAW InES HPMM group for their support in hardware development and reviews. In this context, I want to highlight the great support of Rico Ganahl, Philipp Huber, Dominic Mösch and Simone Schweizer.

Finally, many thanks to everyone close to me for understanding my absence from social life.

# Contents

- **1. Introduction**
  - 1.1. Technological Computing Environment
    - 1.1.1. General Computing Tasks
    - 1.1.2. General Solutions
    - 1.1.3. Multidisciplinary Solutions
    - 1.1.4. Limits of Conventional Solutions
  - 1.2. Embedded Super Computers
    - 1.2.1. Quantitative Analysis of ESC
    - 1.2.2. ESC Development Challenge
  - 1.3. Technological Environment Conclusion
- **2. Motivation for Modular Architecture**
  - 2.1. SoM Applications
    - 2.1.1. Autonomous Moving Drones
    - 2.1.2. Delivery Robots
    - 2.1.3. Multimedia Applications
    - 2.1.4. PCIe Master Applications
  - 2.2. Standard Interfaces and Extensions
    - 2.2.1. Video Interfaces
    - 2.2.2. Internet Connectivity
    - 2.2.3. USB C
  - 2.3. Available Carrier Boards
    - 2.3.1. Carrier Board Analysis
    - 2.3.2. Cost and Complexity
  - 2.4. Modular approach
- **3. Modular Concept**
  - 3.1. Xavier Overview
  - 3.2. Physical Module Connection
  - 3.3. Video Output
    - 3.3.1. Requirements
    - 3.3.2. Video Output Module Interface
  - 3.4. Video Input
    - 3.4.1. Requirements
    - 3.4.2. Video Input Module Interface
  - 3.5. Nvidia Highspeed UPHY (NVHS)
  - 3.6. Highspeed IO x12 (HSIO)
    - 3.6.1. SoM Structure
    - 3.6.2. Requirements
    - 3.6.3. Lane Partition
  - 3.7. Ethernet Module Interface
  - 3.8. USB Module Interface
  - 3.9. Value optimization
  - 3.10. System overview
- **4. Implementation**
  - 4.1. Carrier Board
  - 4.2. Modules
- **5. Results**
  - 5.1. System Feasibility
  - 5.2. Conclusion
- **A. Appendix**

# 1. Introduction

Mobility best describes what has driven technological progress over the last decade. Interaction with the environment and other tasks traditionally performed by humans are now handled by small, autonomous and powerful devices. Such applications push the development of embedded supercomputers. Examples include autonomous driving, drones, robots and augmented/virtual reality, but also the control of remote processes with machine learning (ML) on the edge.

Product development based on an embedded supercomputer has a high technological entry barrier. To create value for the end product, a long chain of different engineering fields needs to be mastered, which demands either many employees or a long time to market at high cost. Most technical startup ideas die before the first working prototype is realized, due to complexity and limited resources. Even small, established niche players are unable to build up know-how in overwhelming new technologies in parallel to their daily business, because investment capital and resources are limited.

The source of the problem is the different market targets of the SoM producer and the end-product manufacturer. While the former targets broad markets with a wide range of interfaces and shared high-performance blocks, the latter needs to produce a tailored, cost-optimized product for a specific market with a specific, and possibly non-standard, interface.

## 1.1. Technological Computing Environment

A wide range of computing devices exists, which makes it challenging to choose the optimal architecture to solve a specific problem. To develop an optimal platform for embedded supercomputers, it is important to understand in which applications they are intended to be used.

Figure 1.1 shows the design steps from a problem in the market, defined by the management, to a fixed system architecture and technical requirements. This process helps to understand in which cases embedded supercomputers need to be used and what the characteristics of an optimal hardware platform for them are. The problem to be solved is separated into three technology-defining aspects:

- What is the data type, and how much of this data has to be processed in what time?
- What computational effort has to be spent on this data?
- What are the physical constraints on the size of the device or on the distance from data input to output?
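The three aspects above can be written down as a small record from which first-order requirements follow. The sketch below is only an illustration: the class name, the field names and the example numbers are assumptions and do not come from the thesis.

```python
from dataclasses import dataclass


@dataclass
class ProductCharacterization:
    """Hypothetical record of the three technology-defining aspects."""
    bytes_per_item: int        # amount of data per item, e.g. one video frame
    items_per_second: int      # how many items must be processed per second
    operations_per_byte: int   # computational effort spent on each byte
    max_io_distance_m: float   # physical constraint: data input to output

    def required_throughput(self) -> int:
        """Data rate in bytes per second that the interfaces must sustain."""
        return self.bytes_per_item * self.items_per_second

    def required_compute(self) -> int:
        """Operations per second that the processing unit must deliver."""
        return self.required_throughput() * self.operations_per_byte


# Example values are assumptions chosen only for illustration.
sensor_node = ProductCharacterization(
    bytes_per_item=1_000,
    items_per_second=50,
    operations_per_byte=20,
    max_io_distance_m=0.5,
)
print(sensor_node.required_throughput())  # 50000 bytes per second
print(sensor_node.required_compute())     # 1000000 operations per second
```

Numbers of this kind are what allow a first decision on data acquisition, processor class and data transfer options.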



Figure 1.1.: Product Characterization

Answering these questions defines the technical requirements for the product. At this point, it is possible to decide whether there are enough development resources to solve the problem. Further, the hardware aspects of data acquisition, capable processors and data transfer options can be determined. A vast majority of problems can be solved with different approaches by investing more resources in one of the mentioned aspects. To establish a common technological understanding, this section gives an overview of existing computational tasks, methods and architectures. Major application fields and their characteristics are discussed in terms of typical problems and common solutions.

### 1.1.1. General Computing Tasks

Processing tasks can be separated into three focal areas (Figure 1.2). The first area is automation, which requires a well-defined environment: the problem can be described with individual physical properties such as distance, time or temperature. These are often control applications with several inputs and outputs, hard but not tight timing constraints, and a limited ability to react to environmental changes.

Another type of processing tasks is capture and distribution. In these applications the input and output information is irrelevant, but the throughput and interfaces define the system. A common task would be data conversion from one streaming standard to another.

The last area is analysis tasks with high computational effort. The available information is not a direct input to solve the problem. Out of a given situation, the relevant information needs to be extracted and analyzed. From this interpreted state a reaction needs to be calculated.



Figure 1.2.: Technology Fields: ■ Time critical, ■ Computationally heavy, ■ High data transfer, ■ Interactive Multimedia, ■ Production Lines, ■ Server Applications, ■ Autonomous Movement

However, many applications require a combination of several fields, and different hardware solutions exist for the same problem. As discussed in the product characterization process, more or fewer resources can be invested in the different fields, leading to various architectures. In particular, when the system scales in the number of channels, the initial approach for one channel may no longer be the best for many.

### 1.1.2. General Solutions

Figure 1.3 shows the most common processing families in relation to processing tasks. On the outside of the figure are basic representatives of circuits that solve simpler tasks in their field. Nearer the center of the figure, more complex device families solve more complex and comprehensive problems.



Figure 1.3.: Technology Families: ■ Automation, ■ Analysis, ■ Capture/Distribution

The center shows multidisciplinary applications, indicated by the black triangle.

#### Automation

The most widespread processing units for automation are microcontrollers and microprocessors, because of the enormous low-cost consumer market. Historically, these devices solve simple tasks such as reacting to an on/off signal, interpreting an analog value and performing a defined calculation, or simply repeating a given task in an endless loop. Nowadays, the diversity and capabilities of microprocessors have increased strongly, extending even to simple AI applications. However, their architecture and interfaces do not provide the scalability required for complex applications.

#### Capture/Distribution

An example of capture and distribution applications is the multimedia market. The main characteristic of these devices is high data rates without analysis of the information in the data stream. In the primary applications in this field, the data is muxed, copied, converted or simply re-transmitted. These predefined operations can be realized optimally in a customized ASIC. However, ASICs that are optimized for a particular task are not flexibly configurable, so a custom PCB needs to be designed for each application. This problem led to the development of FPGAs, where different applications can be realized on the same hardware by reconfiguring the FPGA. Nevertheless, this flexibility comes at a high cost because of the complex silicon structures.

#### Analysis

Analysis applications originated in the scientific environment, where the amount of data cannot be interpreted by a human. Often, different approaches are tested on the same data to find an optimal solution. The priority is not the computation time needed to search for specific behavior in this data, but the flexibility of the system to perform different calculations. This flexibility is intrinsic to an architecture that runs different software programs on the same hardware (e.g. a PC).

#### Multidisciplinary Applications

All of the application fields noted above are highly optimized in themselves. The market is satisfied and saturated with solutions for tasks within one area. The main focus of modern engineering is on multidisciplinary applications, located inside the black triangle in the center of Figure 1.3. The challenge is how to solve a task that needs to analyze a given situation on a continuous high-speed data stream within a given time, at the lowest cost in terms of hardware and development time.

### 1.1.3. Multidisciplinary Solutions

To solve multidisciplinary problems, the combination of different system types is an obvious solution. However, also at this point many different approaches with advantages and disadvantages exist.

#### One Box Solutions

The first approach was to introduce expensive backplanes or carrier PCBs to expand the possibilities of the system, as shown in Figure 1.4. Not only are the PCB costs high, but this system type also requires expensive high-speed connectors. The system cost is increased further by expensive cables from the acquisition system to the source of data.



Figure 1.4.: One Box Solutions

The benefit of these systems is that they support configurations to solve the most complex problems but only in high price markets.

#### Edge Converter Solutions

In several cases of data input, the standard IO connectors occupy much space (e.g., BNC connectors). This leads to simple capture cards with a low number of interface connectors but an expensive high-speed connector for the backplane or carrier board. To reduce this cost, edge converters are introduced (Figure 1.5).



Figure 1.5.: Complexity Comparison Central and Edge Conversion Solution

This architecture combines and compresses the data at the source to a higher data density to feed the expensive acquisition device with more data. With this approach, the system costs are reduced by more efficient data transport. Moreover, this architecture is beneficial if the system is in an environment with electrical disturbances and the initial data source type is not robust against noise.

#### Server Applications

To optimize computational efficiency and cost, server applications are detached from the physical constraint of distance. The infrastructure of the Internet is used to transport the information to a powerful computing unit (Figure 1.6).



Figure 1.6.: Server Architecture

The driving idea of server applications is sharing costs. It is assumed that the Internet infrastructure is already used for other purposes, so that the same hardware and fees (e.g., WLAN router, Ethernet switches, Internet fee) generate value for different applications. Moreover, the hardware costs of the server are also shared between different users. In the server, high-performance computing devices are combined to handle the most complex calculations for a vast number of users and applications.

### 1.1.4. Limits of Conventional Solutions

All shown architectures use a combination of conventional solutions. The maturity of these technological fields results in many off-the-shelf solutions for different cases. In all cases, the goal is to reduce hardware development by using existing products and to solve the problem in software. Table 1.1 shows the advantages and disadvantages of these architectures.

|          | One Box                            | Edge Conversion                                       | Server                                  |
|----------|------------------------------------|-------------------------------------------------------|-----------------------------------------|
| Positive | High variety of cards exists       | Tailored optimized solutions                          | Only software development               |
|          | Hardware engineering not mandatory | Combined solutions with custom and off-the-shelf parts | Cheap if infrastructure is given        |
|          |                                    | Reduced costs                                         | Not limited by distance                 |
| Negative | Very expensive                     | Expensive                                             | Needs infrastructure                    |
|          | More expensive to cover distance   | Custom hardware for optimization needed               | Data transfer limited to Internet speed |
|          | Not mobile                         | Not mobile                                            | Monthly costs                           |
|          | Not energy efficient               | Not energy efficient                                  | No latency control                      |
|          | Limited in distance                | Limited in distance                                   | Not real time                           |

Table 1.1.: Conventional solutions comparison

The positive aspects of conventional solutions show that they fulfill the needs of data throughput and computational power. Nevertheless, they fail to satisfy modern physical aspects of problems or industrial latency requirements. These aspects are best shown with examples.

#### Autonomous Devices

An autonomously moving drone is battery powered. One of its main characteristics is the operation time, which depends directly on the battery size and the power consumption. On the one hand, one can use a bigger battery, which is heavier and needs more powerful motors to keep the machine in the air. On the other hand, it is possible to reduce the power consumption to extend the operation time. However, this is not possible with the standard One Box and Edge Conversion solutions.
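The dependency between battery size, power consumption and operation time can be illustrated with a first-order estimate: stored energy divided by total power draw. The following sketch rests on that simplification, and all numbers in it are assumed for illustration; it deliberately ignores that a larger battery also raises the propulsion power.

```python
def operation_time_hours(battery_wh: float, propulsion_w: float, compute_w: float) -> float:
    """Operation time as stored energy divided by total power draw."""
    total_power_w = propulsion_w + compute_w
    if total_power_w <= 0:
        raise ValueError("total power draw must be positive")
    return battery_wh / total_power_w


# All numbers below are assumed for illustration only.
BATTERY_WH = 100.0    # usable battery energy
PROPULSION_W = 180.0  # power needed to keep the drone in the air

# Compare a power-hungry processing unit with a more efficient one.
for compute_w in (70.0, 20.0):
    hours = operation_time_hours(BATTERY_WH, PROPULSION_W, compute_w)
    print(f"compute {compute_w:.0f} W -> {hours * 60:.0f} min")
```

With these assumed values, lowering the compute power from 70 W to 20 W extends the estimated flight time from 24 to 30 minutes without touching the battery.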

One possibility is to capture the video data on the drone and stream it to a server that performs the calculations, but this leads to an enormous data transfer over a mobile connection without guaranteed quality. Furthermore, the latency introduced by the information transfer is not acceptable in such an application. Energy efficiency is not only important for small devices. In the case of delivery robots or even electric cars, the use of a conventional CPU and GPU also reduces the operation time significantly, or at least noticeably.

#### Security Considerations

Another modern problem is the availability of the Internet infrastructure. In most locations, the Internet is not available all the time or only at low data rates, which makes it unusable for industrial applications. Moreover, even when focusing on regional markets with good Internet quality, it is often not possible to deploy a device at a business because of its IT security rules. In this case, an additional option with mobile Internet needs to be offered. However, as already discussed, server applications are only cheap if the communication infrastructure is shared between different value creation units.

These two aspects best characterize the new requirements for a processing solution. In addition to system performance and flexibility come energy efficiency and autonomy.

## 1.2. Embedded Super Computers

To cope with these modern requirements, Systems on Chip (SoCs) were introduced and have been optimized over the last decades. As a result, they offer high performance and energy efficiency and are available in options across a wide price range. These systems make it possible to implement complex autonomous solutions at the data source. This section gives an overview of competing SoC fields and how they are nowadays used as Systems on Module (SoMs).

An SoC is defined as a CPU extended by other hardware ASIC functions on the same silicon chip. Figure 1.7 shows the dominant SoC types. Simple SoCs only have extensions for data interfaces. The second SoC type is extended with a GPU to provide more computational power. The last type is a native FPGA, which additionally offers a CPU and fixed, optimized interface blocks.



Figure 1.7.: SoC Types

The main change compared to the standard solutions is a high-bandwidth interconnection between the hardware instances within a chip. This architecture eliminates the need for an expensive backplane and connectors between the functional blocks for computation, capture and control. Moreover, all SoCs are built as autonomous computing devices in a single chip, with no direct need for a supervising server or a connection to the Internet.

Another advantage of tightly coupled hardware blocks is the elimination of IO pin drivers between the blocks. Transferring a signal over a longer distance requires much more energy, especially because digital signals need higher voltage levels over such distances to be noise resistant. Still, the processing unit is the most power-consuming block. However, with specialized interface blocks, standardized conversions can be included directly in the interface, which offloads computational effort from the CPU. This allows less powerful CPUs to be used for the same task, which strongly reduces the overall energy consumption.

### System on Module

High-performance SoCs include several high-speed interfaces and require stable clocks and high-quality supplies. Furthermore, the package of these chips requires a complex and expensive PCB stackup and technology. To reduce the development complexity for the device manufacturer and to guarantee stable operation of the chip, the SoC developer creates modules for its chip. The module offers a connection to the functional pins of the SoC in a more common mechanical structure. It implements the most critical supply rails, clocking and memory connections close to the chip. This simplifies the usage of the SoC and allows a cheaper PCB technology to be used for the carrier board of the module.

### Embedded Super Computer

The performance range of SoMs is wide. To distinguish between performance classes, the term Embedded Super Computer (ESC) was introduced to refer to the highest performance segment of SoMs. The technical difference between Embedded Super Computers and other SoM classes is their powerful acceleration peripherals: in addition to a multi-core CPU, an even more computationally powerful block is added to perform specific calculations in parallel.

No clear line exists to separate Embedded Super Computers from medium performance class SoMs/SoCs.

### 1.2.1. Quantitative Analysis of ESC

An example of the different SoM types is shown in Figure 1.8. In this comparison, the three types of SoM are represented by the Raspberry 3+ Module as a simple SoM, the Nvidia Xavier as a powerful GPU ESC and the Xilinx Zynq Ultrascale+ as an FPGA-based ESC.



(a) Raspberry 3+ Simple SoM    (b) Nvidia Xavier GPU ESC    (c) Xilinx Zynq Ultrascale+ FPGA ESC

Figure 1.8.: SoM / Embedded Super Computer

The following comparison of the possibilities and cost shows the difference between a standard SoM and ESC class.

These three examples represent sharply different SoM types, so no fair overall comparison can be carried out. Table 1.2 is intended to give an overview of the different optimization focuses of these families and shows the limitations of each of them. As long as no specific problem or application is given, no ranking of the modules can be established. However, the overview of processors and peripherals in Table 1.2 shows which area of modern complex problems can be solved optimally with which device family.

|                    | Simple SoM                         | GPU ESC                                                            | FPGA ESC                                              |
|--------------------|------------------------------------|--------------------------------------------------------------------|-------------------------------------------------------|
| Module             | Raspberry Pi 3+                    | Nvidia Xavier 8GB                                                  | Enclustra XCZU5EV-1FBVB900I                           |
| CPU                | 4-core ARM Cortex A53 (ARMv8)      | 6-core Carmel ARMv8                                                | 4-core ARM Cortex A53 (ARMv8) 2-core R5               |
| GPU                | none                               | 384 Volta cores                                                    | Mali-400MP2                                           |
| Memory             | 1GB LPDDR2<br>32GB eMMC            | 8GB LPDDR4 85GB/s<br>32GB eMMC                                     | 2GB DDR4 19GB/s<br>16GB eMMC                          |
| PCIe               | none                               | Gen3 x8 + x4 + x2 + x1                                             | Gen3 x16 + Gen2 x4                                    |
| Video IN           | CSI D-Phy 1.1 6-lanes              | CSI D-Phy 1.2 16-lanes or<br>C-Phy 1.1 16-lanes<br>SLVS-EC 8-lanes | CSI D-Phy 1.2 0 to 10-lanes <sup>1</sup>              |
| Video Out          | HDMI 1.3a<br>DSI D-Phy 1.1 6-lanes | 3 x Multimode<br>HDMI2.0/DP1.4/eDP                                 | 2 x DP1.2<br>DSI D-Phy 1.2 0 to 10-lanes <sup>1</sup> |
| Video Codec        | H.264                              | H.264/H.265                                                        | H.264/H.265                                           |
| Accelerators       | none                               | 7-Way VLIW Processor<br>2 x DLA                                    | 2678 DSP slices<br>504k Logic Cells                   |
| General High Speed | none                               | none                                                               | 16x 15 Gb Transceivers                                |
| Misc               | USB2/SD                            | USB2/USB3/ETH                                                      | USB2/USB3/ETH                                         |
| Price              | 50 USD                             | 630 USD                                                            | 1691 USD                                              |

Table 1.2.: Possibilities Comparison

<sup>1</sup> General purpose highspeed IOs are configurable as CSI lines or other functionality.

#### Simple SoM

The simple SoM is cost-optimized and can solve multidisciplinary problems only if the technical requirements are low and the limited set of supported input and output standards suffices. The CSI and HDMI versions only support 1080p at 30 frames per second (fps). This is intentional: without a GPU or accelerators, the CPU would not be able to process more data. Furthermore, no general-purpose high-speed interfaces are available, which strongly limits the application range.
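To give a feeling for the data volume behind "1080p at 30 fps", the raw data rate of an uncompressed video stream can be estimated as width times height times bits per pixel times frame rate. The sketch below assumes 24 bits per pixel (8-bit RGB) and takes "4K" to mean 3840 x 2160 pixels; both are assumptions for illustration, and real CSI or HDMI links add protocol overhead and may use other pixel formats.

```python
def raw_video_bitrate(width: int, height: int, fps: int, bits_per_pixel: int = 24) -> int:
    """Uncompressed video data rate in bits per second."""
    return width * height * bits_per_pixel * fps


# 1080p at 30 fps, the limit named for the simple SoM.
full_hd = raw_video_bitrate(1920, 1080, 30)

# A 4K stream at the same frame rate carries four times as many pixels.
uhd_4k = raw_video_bitrate(3840, 2160, 30)

print(f"1080p30: {full_hd / 1e9:.2f} Gbit/s")  # 1.49 Gbit/s
print(f"4K30:    {uhd_4k / 1e9:.2f} Gbit/s")   # 5.97 Gbit/s
```

Under these assumptions, moving from 1080p to 4K quadruples the raw data rate that the interfaces, the memory and the processing units have to handle.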

#### GPU ESC

The higher number of CPU cores is not the main difference between the Nvidia Xavier ESC and a simple SoM. The main difference is the combination of a powerful GPU and multimedia interfaces, which targets multimedia applications. The Xavier features the most competitive standard video interfaces in this comparison. At the same time, it offers the general high-speed interfaces PCIe and USB3, which extends its application fields. However, high-speed interfaces are not the only requirement for solving complex problems. The Xavier also provides the biggest memory with the highest memory bandwidth. This feature is essential to run computationally dense algorithms on large data chunks such as 4k video streams. Of the accelerators, the 7-way VLIW processor optimizes vision algorithms and the DLAs optimize AI tasks.

#### FPGA ESC

The main focus of FPGA ESCs is on configurable, real-time and power-optimized solutions. DSP slices and logic cells are a very different type of accelerator compared to those of a GPU ESC. Each slice and cell can be configured separately, delivering an optimized solution with a minimal number of active transistors. The smaller memory and lower memory bandwidth show that the architecture is not optimal for a high number of memory accesses. FPGAs, in general, are intended to work in (parallel) pipelines. This also explains the high number of general-purpose high-speed interfaces in the form of PCIe and additional 15 Gbps transceivers, which can be configured to the desired format. Each interface and the pipeline that follows it can be configured in a deterministic and low-latency way, which is the most significant benefit of an FPGA ESC.

### Optimal Fields

For the price range of 50 to 100 USD, a simple SoM is an optimal solution for standard capture and simple vision/ML algorithms. Nevertheless, it is not capable of solving complex time-critical problems, for example, autonomous flying drones in a closed environment. However, it is an optimal solution for education and training purpose.

The GPU ESCs are optimal to work with consumer data standards and have enough input interfaces and computational power for a wide range of complex applications in the market. With the price range from 400 to 900 USD, these modules offer the full automation of a complex human task. But the fixed architecture limits the optimization of power consumption and the latency through the system makes it not optimal for problems with constraints in complexity, time and power.

The technologically most competitive FPGA ESCs start at around 1000 USD and can reach several thousand USD. This highest price segment imposes the fewest restrictions on developing a solution that is optimal in all aspects of a digital system, offering customizable high-performance blocks for both capture and compute. The defined processing pipeline architecture gives control over data throughput and latency, though the price limits the usage of FPGA ESCs to devices with high price margins.

### 1.2.2. ESC Development Challenge

The main difference between simple SoMs and ESCs is the availability of accelerator processors for specific computational tasks. The accelerators have different compute architectures and hence different programming requirements. A developer needs experience and education in diverse fields to make use of the additional computational advantage. In some instances, several developers are involved in optimizing the algorithm to achieve the needed performance.

In addition, complexity is increased by the configurable interface blocks, which are designed to cover a wide range of applications. A thorough understanding of the interface capabilities is required to determine the optimal configuration for a given problem.

These requirements are a challenge for the majority of small and mid-size manufacturers. Alongside their daily business, they have neither the resources nor the capital to build up expertise in these fields.

All SoM types offer a CPU, which has a technologically low entry level for programming: it is programmed in C, the most widespread programming language for embedded devices. However, GPU and FPGA ESCs differ strongly in the development tools and methods for their accelerators.

### GPU ESC Development

The most common languages used for GPUs are CUDA and OpenGL. Both are based on C and have a similar syntax. To describe the behavior of GPUs, C is extended with metaphors that express parallelism in a simple way. Understanding these metaphors and the functionality of the GPU architecture is enough to achieve acceleration.
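The parallelism metaphor can be illustrated without GPU hardware. The following sketch emulates in plain Python how a CUDA-style kernel derives a global element index from its block and thread indices. The names `saxpy_kernel` and `launch` are hypothetical and chosen only for this illustration, and the sequential loops merely stand in for threads that a real GPU would execute in parallel.

```python
# Emulation of the CUDA grid/block/thread metaphor in plain Python.
# This is not CUDA code: the nested loops run sequentially, whereas a GPU
# would execute the kernel instances in parallel.


def saxpy_kernel(block_idx, block_dim, thread_idx, n, a, x, y, out):
    """One kernel instance computes exactly one output element."""
    # Same index formula as blockIdx.x * blockDim.x + threadIdx.x in CUDA.
    i = block_idx * block_dim + thread_idx
    if i < n:  # guard: the last block may contain surplus threads
        out[i] = a * x[i] + y[i]


def launch(kernel, grid_dim, block_dim, *args):
    """Hypothetical helper that mimics a kernel launch over a 1D grid."""
    for block_idx in range(grid_dim):
        for thread_idx in range(block_dim):
            kernel(block_idx, block_dim, thread_idx, *args)


n = 10
x = [float(i) for i in range(n)]
y = [1.0] * n
out = [0.0] * n

block_dim = 4                                # threads per block
grid_dim = (n + block_dim - 1) // block_dim  # enough blocks to cover n elements

launch(saxpy_kernel, grid_dim, block_dim, n, 2.0, x, y, out)
print(out)
```

The developer writes only the per-element kernel body; the choice of grid and block size determines how the work is distributed over the hardware.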

The Deep Learning Accelerator (DLA) cores differ from the GPU and are optimized to accelerate machine learning algorithms. Different expertise is required to understand and optimize for the DLA cores. For example, Nvidia provides the TensorRT translation API, with which a created and trained model is recompiled for the target platform (GPU or DLA). Figure 1.9 shows which AI frameworks can be translated.



Figure 1.9.: Overview of tools to be used with Nvidia GPU<sup>[1]</sup>

### FPGA ESC Development

The high degree of flexibility of FPGAs also requires a deep understanding of FPGA functionality, especially of the dedicated interface peripherals. FPGA development tools offer configuration GUIs to simplify the process; nevertheless, reading and understanding the datasheet of each peripheral is crucial to achieve the desired performance.

The CPU in an FPGA ESC is programmed in C, whereas the rest of the FPGA ESC is not programmed but described in a hardware description language. Implementing an algorithm this way differs strongly from programming in C. Good knowledge of the internal hardware structure of an FPGA is essential to achieve the needed performance. Furthermore, simulation is an essential step in FPGA development and requires further skills and expertise.

To implement AI networks, translation software is used to simplify the development process, as in the case of GPUs. Figure 1.10 shows the optimization process of the Xilinx DNN Development Kit (DNNDK), in which the software optimizes the network automatically. However, the configuration of the peripherals and the optimization of the whole system remain a challenging and time-consuming task.



Figure 1.10.: Xilinx DNNDK process<sup>[2]</sup>

### 1.3. Technological Environment Conclusion

Each technology field by itself is optimized, and many solutions exist to cover the whole desired range. The modern challenge is how to solve complex multidisciplinary problems. For many years, the solution was to connect chips from different fields with a high-speed bus, which introduces an additional expensive component. The price of such solutions exceeds the limit for most consumer markets.

Three main architectures have been used to solve demanding problems: One Box, Edge Conversion and server applications. All of them show weaknesses in solving modern multidisciplinary problems. SoMs are the most promising architecture to tackle applications demanding high performance in execution time, complexity and data throughput. The most powerful SoMs are embedded super computers. They differ in that, in addition to a CPU and dedicated interface blocks, they include computing acceleration capability for specific tasks. This guarantees data throughput through the whole processing chain.

However, exploiting the potential of embedded super computers involves a trade-off between performance and resources. On the one hand, the high performance is achieved through the combination of different technologies on one die. Hence, the same price is paid for the SoM regardless of the value it generates in the product. On the other hand, the combination of different technologies requires expertise in diverse fields, resulting in a high demand for human resources in ESC development.

As outlined, the integration of an ESC is only realistic for large companies with enough resources to invest in a long-term project. For mid-size and small companies, it is impossible to build up the needed expertise alongside daily business. This thesis proposes a new architecture to lower the knowledge entry level and the risk associated with an ESC development. The proposal is based on the Nvidia Xavier 8GB ESC, which shows the best performance-to-cost ratio for the high-end consumer market.

## 2. Motivation for Modular Architecture

This chapter analyses the multidisciplinary property of ESCs. The variety of interfaces enables the use of a SoM in many different applications, leading to an increased number of sold chips for the SoM manufacturer. For the same reason, baseboard manufacturers also try to serve as many applications as possible and try not to limit the possibilities of the SoM through missing interfaces and connectors.

The possibilities and requirements of complete SoM systems are shown using different example applications. Although configurable interfaces strongly improve the system possibilities, their complexity can be a limiting factor for new markets. The reason is the wide range of standards and interfaces. Different types of available baseboards are discussed in terms of the ratio of cost to potential market value. To improve this ratio, a new modular approach is proposed.

### 2.1. SoM Applications

Given the use of an ESC, this section focuses on peripheral usage in overall systems. Using the Nvidia Xavier 8GB as an example, the following applications show the challenge of unused interfaces and of interfaces in the wrong format in different markets.

#### 2.1.1. Autonomous Moving Drones

Autonomous drones are one of the most typical SoM applications. Figure 2.1 shows the Skydio 2, a multi-camera drone capable of creating a depth map of its surroundings, tracking a moving target and calculating its own path so as not to lose visual contact with the target.



Figure 2.1.: Skydio 2 drone<sup>[3]</sup>

To accomplish this, six wide-angle cameras are used to create the depth map, and one high-resolution camera captures the video and ensures the tracking of the target. Physically, the cameras are located near the processor. The video needs to be compressed and transmitted to a host device. As the cameras are near the processor, the native CSI interfaces can be used to capture the data. The integrated video engines are used to compress the video, and a Wi-Fi adapter is needed to transmit the data to the host. Table 2.1 shows an overview of the block utilization in a drone application.

| Used blocks                         | Not used blocks                                                      |
|-------------------------------------|----------------------------------------------------------------------|
| CSI/ h.264 encoder / PCIe x1 / USB2 | PCIe x1 x2 x4 x8 / 3x Video output / h.264 decoder / USB3 / Ethernet |

Table 2.1.: Interface and hardware blocks utilization in drone applications

#### 2.1.2. Delivery Robots

In general, delivery robots need to solve a similar problem to drones: moving while avoiding obstacles in their surroundings. However, they have different physical constraints because of their transportation use. Primarily, they need to carry larger deliveries, which makes them larger in size. On the hardware side, this constraint makes the CSI interface unviable, as it only works up to a maximum distance of 15 cm. The same problem occurs in cars with driver assistance. Figure 2.2 shows two examples of this situation<sup>1</sup>.



Figure 2.2.: Short range of CSI problem

A delivery robot needs several cameras and distance sensors to perceive its environment, as well as a GPS sensor and an Internet connection to receive destination information and transmit its own location. To capture video data on different sides of the robot, additional converters are needed to convert the data at the camera and transmit it to the processor. In the development process, this step introduces additional hardware effort, requires an understanding of uncommon standards and involves software development for the converter. Widespread extensions to existing interfaces for the Nvidia Xavier are discussed in Section 2.2. Table 2.2 shows the required interface blocks, needed extensions and unused blocks.

| Used blocks           | Additional hardware   | Not used blocks                                                                   |
|-----------------------|-----------------------|-----------------------------------------------------------------------------------|
| CSI / PCIe 2x1 / USB2 | Video transfer to CSI | PCIe x2 x4 x8 / 3x Video output / h.264 encoder / h.264 decoder / USB3 / Ethernet |

Table 2.2.: Interface and hardware blocks utilization in autonomous delivery robots

Because driver assistance tasks vary widely, no utilization table is shown for their interfaces.

<sup>1</sup>It is not known which type of sensors Starship is using, but this shows a possible situation when CSI is not sufficient for video capturing.

#### 2.1.3. Multimedia Applications

To increase performance and reduce costs, a Xavier module can replace an industrial PC. The integrated GPU is suitable for scaling and muxing video streams. Figure 2.3 describes a possible system with four HDMI sources that are muxed externally; the selected streams are merged in pairs, rescaled and streamed to an HDMI and DP splitter. The system is configured through Ethernet.



Figure 2.3.: HDMI mux/scaler based on Xavier SoM

| Used blocks                             | Additional hardware                               | Not used blocks                                          |
|-----------------------------------------|---------------------------------------------------|----------------------------------------------------------|
| CSI / USB2 / Ethernet / 1x Video output | Video transfer to CSI / HDMI Mux / Video splitter | PCIe 2x1 x2 x4 x8 / h.264 encoder / h.264 decoder / USB3 |

Table 2.3.: Interface and hardware blocks utilization in multimedia applications

#### 2.1.4. PCIe Master Applications

As discussed in the introduction, a significant benefit of the One Box and Edge Conversion solutions is the high availability of different capture cards. Where older architectures are physically sufficient, the Xavier module still adds value through cost reduction by combining the CPU and GPU on one chip. Two situations cover the most likely cases. First, the number of channels of a specific interface supported by the ESC is not sufficient. An example is a surveillance application in which ten IP camera streams need to be captured. Second, a particular interface needs to be captured, as is the case with a professional audio mixer. For both situations, PCIe grabber cards exist and are deployed. Table 2.4 gives an overview of the used hardware.

| Used blocks                              | Additional hardware   | Not used blocks                         |
|------------------------------------------|-----------------------|-----------------------------------------|
| USB2 / Ethernet / PCIe x8 / Video Output | specialized PCIe Card | PCIe 2x1 x2 x4 / 2x Video output / USB3 |

Table 2.4.: Interface and hardware blocks utilization in One Box or Edge Conversion applications

## 2.2. Standard Interfaces and Extensions

The previously discussed applications show that in all cases, many interfaces are either not used or need a converter to be used. This section gives an overview of available interface extensions for the Xavier 8GB module.

### 2.2.1. Video Interfaces

To capture image data, MIPI CSI is the most suitable interface. The most widely used version is CSI D-PHY. It consists of one differential clock pair and up to four differential data pairs and has a data rate of 2.5 Gb/s per data lane<sup>[6]</sup>. The drawback of MIPI CSI is that it was developed for the mobile market and has a working distance of 15 cm, which is not sufficient for many of the applications discussed in Section 2.1.
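As a rough plausibility check of these figures, the following sketch compares the aggregate rate of a four-lane D-PHY link with the raw pixel data rate of a 4K stream at 60 fps. The pixel format of 10 bits per pixel (RAW10) is an assumption made for this example, and blanking intervals and protocol overhead are ignored, so the result is only a lower bound on the required link rate.

```python
# Rough CSI D-PHY bandwidth check. Lane count and lane rate come from the text;
# the RAW10 pixel format is an assumption, and overhead is ignored.

CSI_LANE_RATE_GBPS = 2.5   # data rate per D-PHY data lane
CSI_MAX_DATA_LANES = 4     # maximum number of data lanes per link


def csi_link_rate_gbps(lanes=CSI_MAX_DATA_LANES):
    """Aggregate data rate of one CSI D-PHY link in Gb/s."""
    if not 1 <= lanes <= CSI_MAX_DATA_LANES:
        raise ValueError("a D-PHY link uses between 1 and 4 data lanes")
    return lanes * CSI_LANE_RATE_GBPS


def raw_video_rate_gbps(width, height, fps, bits_per_pixel):
    """Raw pixel data rate in Gb/s, without blanking or protocol overhead."""
    return width * height * fps * bits_per_pixel / 1e9


link_rate = csi_link_rate_gbps(4)
stream_rate = raw_video_rate_gbps(3840, 2160, 60, 10)

print(f"link: {link_rate:.2f} Gb/s, 4K60 RAW10 stream: {stream_rate:.2f} Gb/s")
print("stream fits on one link:", stream_rate <= link_rate)
```

Under these assumptions, one four-lane link carries a single 4K60 sensor stream with headroom, which indicates why bandwidth is less of a limitation for CSI than cable length.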

#### MIPI Serializer

To solve the distance problem, serializers/deserializers are used. Figure 2.4 shows a simplified setup for CSI serializers.



Figure 2.4.: Concept of CSI serializer and deserializer used by FPD-Link III and GMSL

This technology offers a high-speed serial link over up to 15 m and is driven by the automotive industry. The link carries not only high-speed data but also a bidirectional control channel. Furthermore, power over coax (PoC) is supported to supply the camera. These standards are the optimal solution for placing cameras at a distance from the processor and are a cost-effective analogue to edge converter systems.

#### HDMI converter

The ASIC manufacturers Toshiba and Lontium offer various HDMI to MIPI converters. These are used to convert HDMI to a CSI input or to connect a DSI display to an HDMI output. Both cases are presented in Figure 2.5.



Figure 2.5.: HDMI to MIPI converter

Texas Instruments, the producer of FPD-Link III, also offers HDMI to FPD-Link III converters. This makes it attractive to convert an HDMI source to FPD-Link III and then from FPD-Link III to CSI at the processor.

Furthermore, the distance covered by HDMI and by GMSL or FPD-Link III is not sufficient for several applications. HDBaseT technology provides a solution to this problem. It combines Ethernet, HDMI and USB on one CAT6 cable and transfers power of up to 100 W with Power over Ethernet (Figure 2.6).



Figure 2.6.: HDBaseT example to transport HDMI, Ethernet and USB 2 to an ESC

#### Camera Sensor Standard

Besides MIPI CSI, LVDS image sensors exist in the market. The use of those sensors requires a converter from LVDS to CSI, as shown in Figure 2.7.



Figure 2.7.: Lattice Crosslink conversion LVDS to MIPI CSI

Many applications are sensitive to a change of the sensor. Often it is not desirable to change a proven combination of algorithms and sensor. Therefore, the same LVDS camera needs to be maintained.

#### Video Output

The Xavier module supports different video output formats (HDMI, DP, eDP). With a fixed connector, the output format can no longer be configured; the interface is defined by the connector, which in turn limits the potential of the ESC. Figure 2.8 shows an example of extending the video output over 100 m with HDBaseT.



Figure 2.8.: HDBaseT used to extend HDMI signal to longer distance

In addition, this setup is more resistant to external noise than HDMI.

### 2.2.2. Internet Connectivity

The Ethernet standard has evolved over the last decades, resulting in various functionalities, supported speeds, media types and distances.

Table 2.5 describes the conventional transport media defined by IEEE for use with the Ethernet/IP protocol. However, many other protocols exist, namely Profinet, EtherCAT, SERCOS III, VARAN and others. The PHY connected to the ESC defines which capabilities the whole system supports.

| Standard                  | Capability                                                                           |
|---------------------------|--------------------------------------------------------------------------------------|
| 10Base-T (IEEE 802.3)     | 10 Mbps with category 3 unshielded twisted pair (UTP) wiring. Up to 100 meters long  |
| 100Base-TX (IEEE 802.3u)  | known as Fast Ethernet, uses category 5, 5E, or 6 UTP wiring. Up to 100 meters long  |
| 100Base-FX (IEEE 802.3u)  | a version of Fast Ethernet that uses multi-mode optical fiber. Up to 412 meters long |
| 1000Base-CX (IEEE 802.3z) | uses copper twisted-pair cabling. Up to 25 meters long                               |
| 1000Base-T (IEEE 802.3ab) | 1 Gigabit Ethernet that uses Category 5 UTP wiring. Up to 100 meters long            |
| 1000Base-SX (IEEE 802.3z) | 1 Gigabit Ethernet running over multimode fiber-optic cable                          |
| 1000Base-LX (IEEE 802.3z) | 1 Gigabit Ethernet running over single-mode fiber                                    |

Table 2.5.: Main Ethernet Standards with description of technology and distance [7]

### 2.2.3. USB C

With the introduction of USB 3.0 and the Type-C connector, the variety of USB variants increased. For applications, the increased bandwidth is essential. At the design level, however, the new standards introduce the extended power capabilities shown in Table 2.6, which have a significant impact on the hardware design.

| Version        | Voltage [V] | Current [A] | Power [W] |
|----------------|-------------|-------------|-----------|
| USB A / B / AB | 5           | 0.5         | 2.5       |
| USB-BC         | 5           | 1.5         | 7.5       |
| USB C          | 5           | 3           | 15        |
| USB-PD         | 5 / 12 / 20 | 5           | 100       |

Table 2.6.: USB standards, showing voltage and power levels

The Xavier module offers two active USB ports (USB 3.0 to 3.2 data rates), which can be implemented in the BC, C or PD variant. If the application requires two PD ports, the system needs to deliver an additional 200 W, compared to 30 W for the Xavier module itself. Moreover, not only the high power delivery is a challenge, but also the implementation of a buck-boost (up-down) converter to guarantee voltage levels from 5 V to 20 V.
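The power figures follow directly from P = V · I. The sketch below recomputes the values of Table 2.6 and the additional supply budget for a given port configuration. It assumes that the BC and Type-C variants run at 5 V, which is consistent with the current and power values in the table, and that USB-PD is dimensioned for its highest level of 20 V at 5 A. The dictionary and function names are illustrative only.

```python
# USB power budget derived from Table 2.6 using P = V * I.
# Assumption: BC and Type-C run at 5 V; USB-PD is sized for 20 V at 5 A.

USB_POWER_VARIANTS = {
    # variant: (voltage in V, current in A)
    "USB A / B / AB": (5.0, 0.5),
    "USB-BC": (5.0, 1.5),
    "USB C": (5.0, 3.0),
    "USB-PD": (20.0, 5.0),
}

XAVIER_MODULE_POWER_W = 30.0  # module power stated in the text


def port_power_w(variant):
    """Maximum power in W that one port of the given variant must deliver."""
    voltage, current = USB_POWER_VARIANTS[variant]
    return voltage * current


def usb_power_budget_w(ports):
    """Total additional supply power in W for a list of port variants."""
    return sum(port_power_w(variant) for variant in ports)


two_pd_ports = usb_power_budget_w(["USB-PD", "USB-PD"])
total_supply = XAVIER_MODULE_POWER_W + two_pd_ports

print(f"two PD ports: {two_pd_ports:.0f} W, total supply: {total_supply:.0f} W")
```

With two PD ports, the USB supply alone is several times larger than the power demand of the module, which illustrates why the choice of USB variant is a system-level design decision.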

## 2.3. Available Carrier Boards

The diversity of available interfaces and standards shows how a preselection of supported interfaces affects the module's performance. This section presents the interfaces of three carrier boards and compares the functionality of the systems.

### Rogue-X for Xavier

|           |                                        |
|-----------|----------------------------------------|
| Internet  | 2x Ethernet ports / M.2 E Key for Wlan |
| Video Out | 2x HDMI 2.0                            |
| Video In  | 16 lanes CSI connector                 |
| USB       | 3x USB 3.1 Type C                      |
| PCIe      | 2x PCIe Gen2 x4 I-Pex Connector        |
| Storage   | M.2 M Key for NVMe / SD Card           |
| Size      | 105mm x 105mm                          |
| Price     | 930 USD <sup>[8]</sup>                 |

Table 2.7.: Rogue-X capabilities



Figure 2.9.: Rogue-X<sup>[9]</sup>

The Rogue-X carrier board provides two Ethernet connections, with unknown protocol support, as well as a port for a WLAN module. Video output is available through two HDMI 2.0 ports. All 16 CSI lanes are accessible through a connector. Three USB Type-C ports are implemented, with the restriction that only one of them supports 3 A power delivery; the others are limited to 1.5 A. Furthermore, a port for NVMe modules is present. The system overview is given in Table 2.7 and Figure 2.9.

### AX710 for Xavier

|           |                                                  |
|-----------|--------------------------------------------------|
| Internet  | 1x Ethernet / 2x pin header / M.2 E Key for Wlan |
| Video Out | 2 x HDMI 2.0                                     |
| Video In  | 16 lanes CSI connector                           |
| USB       | 1x USB 2.0 Type C / 2 x 3.1 Type A               |
| PCIe      | PCIe x8 extension connector                      |
| Storage   | M.2 M Key for NVMe / SD Card                     |
| Size      | 112mm x 107mm                                    |
| Price     | Not known. >1000 USD <sup>2</sup>                |

Table 2.8.: AX710 capabilities



Figure 2.10.: AX710<sup>[10]</sup>

The AX710 offers an Ethernet port and two pin headers to which additional Internet modules can be connected. Like the Rogue-X, it provides two HDMI 2.0 ports and a 16-lane CSI connector. A USB Type-C connector is implemented but limited to USB 2.0 speed, and the two USB Type-A connectors are specified for a maximum of 500 mA. PCIe x8 is accessible, but only through an expensive cable to a custom PCIe board. An M.2 NVMe storage module can be plugged into the AX710. The system overview is given in Table 2.8 and Figure 2.10.

<sup>2</sup>Because the price is not accessible on-line, it is assumed that for a single piece the price is over 1000 USD

### X220 for Xavier

|           |                                                   |
|-----------|---------------------------------------------------|
| Internet  | 1x Ethernet / 1x pin header                       |
| Video Out | 2 x HDMI 2.0                                      |
| Video In  | 4 lanes CSI connector                             |
| USB       | 1x USB 2.0 Type B / 2 x 3.0 Type A                |
| PCIe      | PCIe x1 + x4 + USB 2.0 on one extension connector |
| Storage   | M.2 M Key for NVMe / SD Card                      |
| Size      | 125mm x 1105 mm                                   |
| Price     | 380 USD                                           |

Table 2.9.: X220 capabilities

Figure 2.11.: X220<sup>[11]</sup>

The X220 carrier board includes one Ethernet connection and a pin header with Ethernet signals. As on the other boards, two HDMI 2.0 ports are included. For USB, only one USB 2.0 Type-B port and two USB Type-A ports, limited to 500 mA, are implemented. The X220 provides five PCIe lanes and an additional USB 2.0 port on a high-speed connector. Like the others, it offers an M.2 M-Key slot for NVMe storage modules and an SD card slot.

#### 2.3.1. Carrier Board Analysis

Compared to the full functionality of the Xavier module, all carrier boards limit its possibilities in one way or another. Consumer interfaces are the best-represented group on all boards, but the unknown Ethernet PHY limits the supported protocols. Some of the boards offer a general pin header with Ethernet signals, which, depending on the data rate, implies an expensive cable and connectors to an external custom board. For USB, none of the boards lets the end manufacturer configure which USB data rates and power capabilities are supported. The last consumer interface is video output, which is restricted to HDMI on all boards, and none of the carrier boards exposes the third video output of the Xavier module.

The PCIe standard has two significant benefits. First, it has the highest data throughput. Second, owing to its maturity, a vast number of PCIe cards with plated edge connectors exist. Neither of these benefits is realized on any of the carrier boards. All carrier boards include a compact connector with access to PCIe, but this implies the use of expensive high-speed connectors and cables to a custom-designed PCIe card. However, all carrier boards include an M.2 M-Key slot with four PCIe lanes for high-speed storage modules.

For the video input, all carriers implement CSI connectors: the Rogue-X in the form of one high-density, high-speed connector and the AX710 with four low-profile, high-speed connectors. In both options, again, expensive cables and connectors need to be used. The X220 only offers four CSI lanes, which is few for the applications discussed in Section 2.1, although on a comparatively cheaper connector than the first two.

All carriers include standard low-speed communication like UART, I2C, CAN and offer GPIOs. This type of interface is not discussed in the comparison because it is not system defining. Cheap solutions exist to extend the number of low-speed interfaces.

#### 2.3.2. Cost and Complexity

Rogue-X and AX710 are carrier boards in the price range of 1000 USD. However, these carrier boards offer only 1 Gb Ethernet and USB as data input channels. These interfaces cannot deliver enough data to exploit the full computing power of a Xavier module. Considering that the Xavier 8GB module costs 612 USD, the result is a suboptimal system with limited capability for over 1500 USD. This is not acceptable for any cost-optimized market. The Auvidea board has a much lower price of 380 USD, but comes with strong restrictions on optional extensions.

Combining additional modules with the system increases its feasibility. Figure 2.12 shows two examples of CSI input modules: first, a one-channel HDMI to CSI converter from Auvidea, which costs 110 USD, and second, an eight-channel GMSL to CSI module with a price of 450 USD. The system price increases significantly, but overall the system has a higher potential value, which is created by using the GPU and accelerators to capacity.
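The resulting system prices can be summed up from the figures quoted in this chapter. The following sketch is a simple cost model that uses only prices stated in the text and tables; the AX710 is left out because its price is not known, and the function name `system_price_usd` is illustrative.

```python
# System price model built from the prices quoted in this chapter (in USD).
# The AX710 is omitted because its price is not published.

XAVIER_8GB_MODULE_USD = 612

CARRIER_BOARDS_USD = {
    "Rogue-X": 930,
    "X220": 380,
}

CSI_EXTENSIONS_USD = {
    "HDMI to CSI, 1 channel": 110,
    "GMSL to CSI, 8 channels": 450,
}


def system_price_usd(carrier, extensions=()):
    """Price of module, carrier board and optional CSI extension modules."""
    extension_total = sum(CSI_EXTENSIONS_USD[name] for name in extensions)
    return XAVIER_8GB_MODULE_USD + CARRIER_BOARDS_USD[carrier] + extension_total


for carrier in CARRIER_BOARDS_USD:
    base = system_price_usd(carrier)
    with_gmsl = system_price_usd(carrier, ["GMSL to CSI, 8 channels"])
    print(f"{carrier}: {base} USD, with GMSL module: {with_gmsl} USD")
```

Adding the eight-channel GMSL module brings a Rogue-X based system close to 2000 USD, the price region in which complete industrial PCs become the competing option.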



(a) Auvidea one-channel HDMI to CSI converter, 110 USD. (b) Connectech GMSL to CSI module for 8 cameras, 450 USD.

Figure 2.12.: Available camera extensions and price

An important point to understand is the competition to ESCs, namely the older architectures One Box, Edge Conversion and industrial computers. Complete industrial PCs, including an Intel i7 CPU and an Nvidia GTX1060 GPU, are available for less than 2500 USD<sup>[12]</sup> (Figure 2.13).



Figure 2.13.: Industrial PC with i7 CPU and GTX1060 GPU for less than 2500 USD

## 2.4. Modular approach

The wide range of interfaces that the ESC offers for diverse applications needs to be accessible simply and cheaply. The previously described disadvantages of fixed carrier boards motivate the development of an architecture with exchangeable modules. The main focus is to retain the initial interface variety of the Xavier module and guarantee an uncomplicated integration of converters and extensions.

This underlying motivation is also acknowledged by the discussed carrier board manufacturers. All the pin headers and non-consumer-standard high-speed connectors show the intention not to limit the possibilities of the ESC. From the viewpoint of a general end-system manufacturer, the offered solutions, which imply additional expensive connectors and cables, are not optimal. Moreover, extensions and converters connected by a cable are loose and need additional effort in the final assembly of the system. Carrier board manufacturers have a market because of the complexity and long-term resource investment required to develop a custom carrier board. As mentioned before, mid-size and small companies do not have these resources and depend on the carrier board. This is also reflected in the fact that most carrier board manufacturers offer an Original Equipment Manufacturer (OEM) service, meaning that they take over the hardware development of the customer's system. By using it, an electronic device producer outsources one of its leading value creation fields to an external company. This results in the loss of interface and technology knowledge, potentially leading to worse products in the long term.

To open access to ESCs for various new markets, the base platform needs to be modular. Moreover, modularity needs to become the main driving idea of the carrier board structure, instead of filling up free space on the PCB with expensive, non-standard connectors. The carrier board needs to function similarly to a backplane, where modules can be plugged in without external cables. This structure will motivate end-system manufacturers to keep their expertise in specialized interfaces and optimize them further. Furthermore, in a technologically fast-changing market, a new interface module can be developed in a short time. Even the minimal carrier board itself can be upgraded to a new ESC while still being used with the already developed modules. Modularity actively improves time to market and pushes the development of efficient, high-performance and agile hardware.

## 3. Modular Concept

This chapter describes the concept of a modularity-driven implementation of an ESC carrier board for the Nvidia Xavier 8GB module. The Xavier hardware structure is separated into functional blocks to define suitable modules. The possibilities for each block, with corresponding modules, are described and lead to a final overall architecture.

### 3.1. Xavier Overview

The functional groups of the Xavier module are shown in Figure 3.1.



Figure 3.1.: Xavier internal hardware interface blocks grouped by functionality

#### **Video Out** (Section 3.3)

The SoM implements three independent video output blocks. Each port can act as eDP, DP or HDMI. Through the analysis of required

#### **Video Input** (Section 3.4)

The video input blocks are the 16-lane MIPI CSI input and the Sony SLVS interface. Because the SLVS interface shares its pins with PCIe, it cannot be combined with CSI into one functional group.

#### **NVHS** (Section 3.5)

The Nvidia High-Speed cluster (NVHS) implements eight RX/TX high-speed lanes that serve different purposes, namely as one PCIe x8 controller, as Nvidia high-speed link (NVLink) or as Sony SLVS-EC interface.

#### **HSIO** (Section 3.6)

The High-Speed Input-Output cluster (HSIO) is the main high-speed block of the SoM. It is shared between several PCIe, high-speed USB, UFS and SATA controllers.

#### **Ethernet** (Section 3.7)

For Ethernet connectivity, the SoM offers a Reduced Gigabit Media Independent Interface (RGMII). This gives the option to implement the desired PHY and transport media.

### 3.2. Physical Module Connection

A common requirement for all modules is the elimination of expensive cables and connectors. This is achieved through plated PCB edge connections, as this solution needs only one component, the female socket.

Conceptually, the carrier board for a modular approach is similar to a PC motherboard, which is also highly configurable. Laptops, for example, use the M.2 Next Generation Form Factor (NGFF) standard.



Figure 3.2.: NGFF connectivity

Figure 3.2 shows the female receptacle and a PCB with plated edge contacts. NGFF connectors are intended to be used with speeds up to PCIe Gen 3 (8 Gbps). This equals the highest bandwidth of the Xavier module. In total, an NGFF connector offers 67 pins, and each pin can carry up to 500 mA of current.



Figure 3.3.: NGFF options<sup>[13]</sup>

Figure 3.3 presents the available options in the NGFF standard. First, different key (PCB cutout) locations are available to prevent the insertion of a wrong module into a slot. Second, Figure 3.3b shows the available connector heights. This is beneficial for optimizing the volume of the end product. Further, the different heights make it possible to manufacture single- or double-sided PCBs.

The drawback is that NGFF builds on the PCIe M.2 standard, which defines a fixed pinout for different keys and applications. Moreover, all M.2 pinouts include PCIe lanes, which are not present in all functional blocks of the Xavier module. However, a mechanical provision in the form of an additional pin at the height of the key prevents the insertion of unintended modules. Therefore, the module interfaces are designed with the NGFF standard.

### 3.3. Video Output

The main characteristics of supported video outputs of the Xavier are:

- HDMI 2.0: The standard interface for video transmission. The 18 Gbps data link supports 4K resolutions at 60 fps.
- DP 1.4: A newer standard that supports 25.92 Gbps and transports up to 8K images at 60 fps. It supports multiple independent displays per port, which can be daisy-chained.
- eDP: Intended as an internal video link to panels in embedded or portable devices; it supports touch-screen information transfer.
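The quoted link rates can be reproduced from the signalling parameters of the two standards. The sketch below makes several assumptions that are not stated in the text: a total frame size of 4400 x 2250 pixels including blanking for 3840 x 2160 at 60 fps (CTA-861 timing), three TMDS data channels carrying 10 bits per pixel clock for HDMI 2.0, and four DP 1.4 lanes at 8.1 Gbps with 8b/10b coding. Rates are kept in integer Mbps to avoid rounding effects.

```python
# Back-of-the-envelope check of the HDMI 2.0 and DP 1.4 link rates.
# Timing and coding parameters are assumptions taken from the public standards,
# not from the thesis text.


def hdmi_tmds_rate_mbps(h_total, v_total, fps, channels=3, bits_per_symbol=10):
    """TMDS line rate in Mbps for a video timing including blanking."""
    pixel_clock_hz = h_total * v_total * fps
    return pixel_clock_hz * channels * bits_per_symbol // 1_000_000


def dp_payload_rate_mbps(lanes=4, lane_rate_mbps=8100):
    """DisplayPort payload rate in Mbps after removing 8b/10b coding overhead."""
    return lanes * lane_rate_mbps * 8 // 10


HDMI_2_0_LIMIT_MBPS = 18_000

hdmi_4k60 = hdmi_tmds_rate_mbps(4400, 2250, 60)
dp_1_4 = dp_payload_rate_mbps()

print(f"HDMI 4K60 line rate: {hdmi_4k60} Mbps (limit {HDMI_2_0_LIMIT_MBPS} Mbps)")
print(f"DP 1.4 payload rate: {dp_1_4} Mbps")
```

Under these assumptions, 4K at 60 fps almost fully occupies the HDMI 2.0 link, and the DP 1.4 payload rate matches the 25.92 Gbps given in the list.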

Figure 3.4 shows the lane muxing between the HDMI and DP/eDP configurations. In the HDMI case, only one Consumer Electronics Control (CEC) signal is provided for all HDMI ports.



Figure 3.4.: Xavier Video Output line sharing between HDMI, DP and eDP

#### 3.3.1. Requirements

Nvidia, the manufacturer of the Xavier, configures the most commonly used video output standards inside the SoM. Options for using the video output in multimedia applications, such as video splitting, video muxing and long-range transmission, are missing.

##### Video Splitter

The duplication of a given video stream is desired in multimedia applications. One possibility is to output the same video stream on two different video outputs of the Xavier module. However, this is not an efficient utilization of the video output engine. It is more cost-effective to use splitter ASICs, which exist on the market. The behavior of the splitter is configured through I2C or GPIO, which need to be provided on the module.

##### Video Mux

Other applications might require a switch between video sinks. Specialized solutions/chips exist for this purpose, with the same configuration interfaces GPIO and I2C as splitter ASICs.

##### HDBaseT

With the standards offered by Nvidia, it is not possible to transmit a video signal over a long range. The HDBaseT standard solves this problem and transmits HDMI over a distance of 100 m. In addition to HDMI, HDBaseT can also carry I2C, UART, USB and Ethernet.

##### Power Supplies

Different modules require diverse power supplies from 1 V to 5 V, and the power demand also differs considerably from case to case. It is considered optimal to deliver 3.3 V and 5 V to the module, each with four pins. With 0.5 A per pin, this results in a total of 6.6 W on the 3.3 V supply and 10 W on the 5 V supply. The quality of the lower voltages is more critical for the functionality of the implemented ASICs, so these voltages should be generated on the module itself.
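The power figures follow directly from the pin count and the per-pin current limit. A minimal sketch of this arithmetic, with a hypothetical helper name, could look as follows:

```python
PIN_CURRENT_A = 0.5  # per-pin current limit of the NGFF connector, as quoted in the text


def rail_power_w(voltage_v, pins, pin_current_a=PIN_CURRENT_A):
    """Maximum power a supply rail can deliver through a given number of pins."""
    return voltage_v * pins * pin_current_a


# Four pins per rail, as proposed for the module interface.
rails = {"3.3 V": rail_power_w(3.3, 4), "5 V": rail_power_w(5.0, 4)}

for name, power in rails.items():
    print(f"{name} rail: {power:.1f} W")
```

The same helper can be reused to size other rails by changing the voltage or the number of pins.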

#### 3.3.2. Video Output Module Interface

To fully utilize the 67 pins of the NGFF connector, two video output ports are combined on one connector. Figure 3.5a gives a simplified description of the functionality available on the connector, and Figure 3.5b shows the detailed pin description.



Figure 3.5.: Video Output Interface

The naming convention, inherited from the Xavier module, shows which standard each video line belongs to. Hence, HDMI0\_DP3 denotes the line that serves as HDMI data lane 0 or as DisplayPort lane 3.

Moreover, for control purposes, an I2C port and three GPIOs for each video output port exist on the module connector. The USB 2.0 port provides the possibility to connect a touchscreen or a display with peripherals over HDBaseT. The key pin is a protruding pin, similar to the M.2 key, that prevents the insertion of standard M.2 modules.


### 3.4. Video Input

Xavier offers 16 CSI Lanes which can be configured as MIPI D-Phy or MIPI C-Phy. Figure 3.6 shows the 16 lane arrangement in D-Phy and C-Phy modes.



Figure 3.6.: Camera Input Ports

The datasheet of the Xavier module states that four 4-lane cameras or six 2-lane cameras can be connected. This follows from the input path of the video signal into the Xavier module, as every pair of ports is connected to a Camera Input Logic (CIL) block. This block is responsible for decoding the applied standard into a byte stream. A Pixel Parser then converts the byte stream into a pixel stream. Figure 3.7 shows only one Pixel Parser for the CILs of ports 4/5 and only one for the CILs of ports 6/7. Hence, ports 5 and 7 cannot operate standalone. Table 3.1 shows potential port configurations.



Figure 3.7.: Missing Pixel Parser Port 5 and 7

| Port configuration |                     |                     |          |
|--------------------|---------------------|---------------------|----------|
| 6x2 Lane           | 4x2 Lane & 2x4 Lane | 2x2 Lane & 3x4 Lane | 4x4 Lane |

Table 3.1.: CSI lane options
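The configurations in Table 3.1 can be checked against the two resource limits described above. The sketch below is a simplified necessary-condition check, not a model of the actual port pairing: it assumes 16 lanes in total and six Pixel Parsers, where the number six is inferred from the "six 2-lane cameras" limit and the shared parsers of ports 4/5 and 6/7. All names are illustrative.

```python
TOTAL_CSI_LANES = 16  # from the text
PIXEL_PARSERS = 6     # assumption: inferred from the six 2-lane camera limit

# Table 3.1 entries as {lanes_per_camera: number_of_cameras}
TABLE_3_1 = {
    "6x2 Lane": {2: 6},
    "4x2 Lane & 2x4 Lane": {2: 4, 4: 2},
    "2x2 Lane & 3x4 Lane": {2: 2, 4: 3},
    "4x4 Lane": {4: 4},
}


def usage(config):
    """Return (lanes used, cameras used) for one configuration."""
    lanes = sum(width * count for width, count in config.items())
    cameras = sum(config.values())
    return lanes, cameras


def is_feasible(config):
    """A configuration must fit both the lane count and the parser count."""
    lanes, cameras = usage(config)
    return lanes <= TOTAL_CSI_LANES and cameras <= PIXEL_PARSERS


for name, config in TABLE_3_1.items():
    lanes, cameras = usage(config)
    print(f"{name}: {lanes} lanes, {cameras} cameras, feasible={is_feasible(config)}")

# Eight 2-lane cameras would fit the lanes but exceed the assumed parser count.
print("8x2 Lane feasible:", is_feasible({2: 8}))
```

Under these assumptions, every entry of Table 3.1 passes the check, while eight 2-lane cameras fail because of the missing Pixel Parsers rather than the lane count.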

### 3.4.1. Requirements

#### HDMI / DP

HDMI / DP video input is converted to CSI. As both standards carry video and audio data, the converter separates the incoming data into a CSI and an I2S stream. The converter is configured and controlled through I2C and GPIOs.

#### HDBaseT

For an HDBaseT to CSI conversion, two chips are combined. With the Valens chipset, HDBaseT is first converted to HDMI and then to CSI as in the previous example. Similar to the HDMI / DP converter, the HDBaseT converter needs I2C and GPIOs for configuration.

#### Camera Sensors

I2C and GPIOs are also the control interface for Camera sensors. Moreover, Lattice Crosslink technology extends the sensor selection to subLVDS sensors.

#### FPD Link / GMSL

FPD-Link III and GMSL, which are used to extend the range of CSI, are transparent to the host video input interface. Hence, the same control interfaces are used as for camera sensors.

#### Power Supply

For consistency, the same power concept as in the video output module is used: 6.6 W on the 3.3 V power supply and 10 W on the 5 V power supply.

### 3.4.2. Video Input Module Interface

With four 2-lane CSI configurations and four clocks, leading to a total of eight lanes, all desired video standards can be provided. Figure 3.8 exemplifies the structure, including the configuration possibilities via I2C or six GPIO ports. For standards containing an audio signal, the interface provides an I2S port.



Figure 3.8.: Video Input

### 3.5. Nvidia Highspeed UPHY (NVHS)

The Xavier module offers a dedicated eight-lane high-speed port named NVHS, which is the highest-performance data transfer port of the SoM. The NVHS block includes three independent blocks: NVLink, an SLVS EC controller and PCIe x8. Figure 3.9 shows the muxing of these standards onto one eight-lane port.



Figure 3.9.: NVHS UPHY muxing of NVLink, PCIe and SLVS EC

#### SLVS EC

Sony Scalable Low Voltage Signaling with Embedded Clock (SLVS EC) is a new standard for Sony high-speed sensors.

#### NVLink

NVLink is Nvidia's GPU link and is used to interconnect GPUs with each other.

#### PCIe

PCIe x8 is the most common interface for general-purpose high-speed cards. The maturity of the standard has led to a vast availability of PCIe cards. PCIe cards add the most value to an ESC carrier board by extending the interface possibilities. Physically, a PCIe x16 connector on the carrier board allows existing PCIe x1 to x16 cards to be plugged in, as shown in Figure 3.10.



Figure 3.10.: PCIe x16 connector to connect PCIe x1 to x16 cards

## 3.6. Highspeed IO x12 (HSIO)

The main value of HSIO lies in its twelve independent high-speed lanes; it is also the most complex cluster of the Xavier module. Through muxes, the output drivers connect to different high-speed protocols (USB 3.1, SATA, UFS and five PCIe controllers). The analysis of this block results in an optimal distribution of the output drivers between the standards.

### 3.6.1. SoM Structure

The high-speed cluster is a complex structure connecting different interfaces. HSIO includes a total of 12 high-speed RX/TX lanes behind the output mux. These lanes are shared between four USB 3 ports, one SATA port, one UFS port and 12 PCIe lanes. Additionally, the 12 PCIe lanes are shared between five PCIe controllers with different bus widths, which together would require 15 lanes. Inside the PCIe complex, the left part of Figure 3.11, the PCIe lanes are connected to 12 PCIe muxes. Each PCIe mux has two inputs per lane. When a PCIe controller is selected, it blocks the other controller connected to the same mux.



Figure 3.11.: High speed signal muxing

The PCIe controllers on the left side of Figure 3.11 support the following configurations:

- C0: x1, x2, x4, x8
- C1: x1
- C2: x1
- C3: x1
- C4: x1, x2, x4

Moreover, Figure 3.11 shows that four USB 3.1 connections exist, but only two controllers are available. Therefore, only two USB 3.1 ports can be active simultaneously.
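The oversubscription of the HSIO lanes by the PCIe controllers can be made explicit with a few lines of code. This is a sketch based only on the controller widths listed above; the dictionary and function names are illustrative.

```python
HSIO_LANES = 12  # physical high-speed lanes of the HSIO cluster

# Supported link widths per PCIe controller, taken from the list above
CONTROLLER_WIDTHS = {
    "C0": (1, 2, 4, 8),
    "C1": (1,),
    "C2": (1,),
    "C3": (1,),
    "C4": (1, 2, 4),
}


def is_supported(controller, width):
    """Check whether a controller can run at the requested link width."""
    return width in CONTROLLER_WIDTHS[controller]


# Lanes needed if every controller ran at its maximum width simultaneously
max_total = sum(max(widths) for widths in CONTROLLER_WIDTHS.values())
oversubscription = max_total - HSIO_LANES

print(f"Maximum controller demand: {max_total} lanes")
print(f"Exceeds the {HSIO_LANES} physical lanes by {oversubscription}")
```

The sum of the maximum widths reproduces the 15 lanes mentioned in the text, which is why not all controllers can operate at full width at the same time.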

### 3.6.2. Requirements

The NVHS block already provides PCIe, and therefore a second PCIe connector is not a priority. The following list ranks the other high-speed interfaces by priority.

#### 1. USB 3

To enable usage in stereo vision application with USB 3 high-speed cameras at least two USB 3 ports need to be provided. Each of the ports must have an independent USB controller.

#### 2. WLAN/Bluetooth/LTE

For mobile devices like robots or drones, a standard M.2 E-key slot needs to be provided to enable the use of standard WLAN or Bluetooth modules. Standard E-key modules require one PCIe lane and a USB 2.0 interface. For autonomous machines, an M.2 B-key slot gives access to LTE and WWAN (Wireless Wide Area Network) and uses two PCIe lanes in combination with a USB 2.0 port.

#### 3. Mass storage

A connection to a mass storage device like SATA or NVMe SSD needs to be provided. These modules are connected with a standard M.2 M-key and require four PCIe lanes.

#### 4. Compact storage

To allow future applications, Universal Flash Storage (UFS) needs to be supported; it requires one high-speed lane.

#### 5. Free High Speed Ports

An increase in the number of USB 3 ports is desirable, and the remaining high-speed ports provide connections to a general-purpose module.

### 3.6.3. Lane Partition

The HSIO cluster offers enough lanes to allocate all desired functions. To achieve this, interfaces with no placement option are defined first. Second, the interfaces with higher lane counts are defined, and the last part is the remainder. Table 3.2 is derived from Figure 3.11 and shows all options on all output lanes, with the naming convention:

- CX: PCIe Controller X
- UX\_Y: USB 3.1 controller X, Port Y

| Lane    | 0         | 1                  | 2  | 3  | 4           | 5         | 6                  | 7         | 8         | 9         | 10                         | 11          |
|---------|-----------|--------------------|----|----|-------------|-----------|--------------------|-----------|-----------|-----------|----------------------------|-------------|
| Options | C1,<br>C4 | C3,<br>C4,<br>U0_0 | C0 | C0 | C0,<br>U0_1 | C0,<br>C2 | C0,<br>C1,<br>U1_0 | C0,<br>C3 | C0,<br>C4 | C0,<br>C4 | C2,<br>C4,<br>SATA,<br>UFS | C4,<br>U1_1 |

Table 3.2.: HS Lane Partition

Table 3.3 to Table 3.5 visualize the process of partitioning. The colored lanes with only one entry are those with fixed locations.

The first step reserves output lane 10 for UFS, which prevents C4 from operating in x4 mode for the mass storage module. C0 is therefore set to x4 mode for this purpose.

| Lane    | 0         | 1                  | 2    | 3    | 4    | 5    | 6           | 7  | 8  | 9  | 10  | 11          |
|---------|-----------|--------------------|------|------|------|------|-------------|----|----|----|-----|-------------|
| Options | C1,<br>C4 | C3,<br>C4,<br>U0_0 | C0_0 | C0_1 | C0_2 | C0_3 | C1,<br>U1_0 | C3 | C4 | C4 | UFS | C4,<br>U1_1 |

Table 3.3.: Placed ■ UFS and ■ PCIe x4

The PCIe controller C0 blocks USB controller 0 port 1 (lane 4). Hence, reserving lane 1 for USB controller 0 port 0 ensures that both USB controllers remain accessible. Further, C3 is used for the Wi-Fi/Bluetooth module, and lanes 8 and 9 cover the two-lane B-key setup without blocking other interfaces.

| Lane    | 0         | 1    | 2    | 3    | 4    | 5    | 6           | 7  | 8  | 9  | 10  | 11          |
|---------|-----------|------|------|------|------|------|-------------|----|----|----|-----|-------------|
| Options | C1,<br>C4 | U0_0 | C0_0 | C0_1 | C0_2 | C0_3 | C1,<br>U1_0 | C3 | C4 | C4 | UFS | C4,<br>U1_1 |

Table 3.4.: Placed ■ UFS, ■ PCIe x4, ■ USB, ■ PCIe x1, ■ PCIe x2

Lanes 6 and 11 can both be used for the second USB controller (controller 1), as all other requirements are fulfilled. Finally, lane 0 serves as a free x1 PCIe lane (C1), and lanes 8/9 form an x2 PCIe link (C4) for the user.

| Lane    | 0  | 1    | 2    | 3    | 4    | 5    | 6    | 7  | 8  | 9  | 10  | 11   |
|---------|----|------|------|------|------|------|------|----|----|----|-----|------|
| Options | C1 | U0_0 | C0_0 | C0_1 | C0_2 | C0_3 | U1_0 | C3 | C4 | C4 | UFS | U1_1 |

Table 3.5.: Placed ■ UFS, ■ PCIe x4, ■ USB, ■ PCIe x1, ■ PCIe x2, ■ 2xUSB
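The partitioning result can be verified mechanically against the options of Table 3.2. The following sketch encodes Table 3.2 and the final assignment of Table 3.5 and checks two conditions: every lane carries a function that its mux offers, and every PCIe controller ends up with a supported link width. The per-lane labels C0_0 to C0_3 are collapsed to C0 here, and all identifiers are illustrative.

```python
from collections import Counter

# Mux options per HSIO lane (Table 3.2)
LANE_OPTIONS = {
    0: {"C1", "C4"}, 1: {"C3", "C4", "U0_0"}, 2: {"C0"}, 3: {"C0"},
    4: {"C0", "U0_1"}, 5: {"C0", "C2"}, 6: {"C0", "C1", "U1_0"},
    7: {"C0", "C3"}, 8: {"C0", "C4"}, 9: {"C0", "C4"},
    10: {"C2", "C4", "SATA", "UFS"}, 11: {"C4", "U1_1"},
}

# Supported PCIe link widths per controller (Section 3.6.1)
SUPPORTED_WIDTHS = {
    "C0": {1, 2, 4, 8}, "C1": {1}, "C2": {1}, "C3": {1}, "C4": {1, 2, 4},
}

# Final assignment (Table 3.5)
FINAL = {
    0: "C1", 1: "U0_0", 2: "C0", 3: "C0", 4: "C0", 5: "C0",
    6: "U1_0", 7: "C3", 8: "C4", 9: "C4", 10: "UFS", 11: "U1_1",
}


def check_partition(assignment):
    """Return a list of problems; an empty list means the partition is valid."""
    errors = []
    for lane, function in assignment.items():
        if function not in LANE_OPTIONS[lane]:
            errors.append(f"lane {lane}: {function} is not a mux option")
    # Count how many lanes each PCIe controller occupies
    widths = Counter(f for f in assignment.values() if f in SUPPORTED_WIDTHS)
    for controller, width in widths.items():
        if width not in SUPPORTED_WIDTHS[controller]:
            errors.append(f"{controller}: x{width} is not a supported width")
    return errors


print("Table 3.5 problems:", check_partition(FINAL))
```

The check does not model which USB port of a controller is active at a given time, so lanes 6 and 11 are both accepted even though they share USB controller 1.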

Standard PCIe M.2 modules with B-Key, E-Key and M-key as well as the standard UFS module follow from their standard definition and will not be discussed in detail.

### 3.7. Ethernet Module Interface

The SoM has a dedicated Reduced Gigabit Media Independent Interface (RGMII) for wired Internet access. This interface allows a desired PHY to be chosen and connected to the module, which supports the idea of the modular approach. However, RGMII only supports connections up to 1000 Mbit/s. To expand the possibilities for wired Internet connections, the free C1 PCIe controller in Table 3.5 is combined with RGMII on the same module interface. This gives the end manufacturer further possibilities, such as multi-port Ethernet connectivity over a PCIe connection.



Figure 3.12.: Ethernet module interface

Figure 3.12 shows the M.2 connector used for the Ethernet module, which has enough free pins to include 1.8 V, 3.3 V and 12 V supplies. 3.3 V and 1.8 V are the most common voltage levels for Ethernet PHYs. The four-pin 12 V supply delivers up to 24 W, which exceeds the power specified by 802.3af-2003 for Power over Ethernet, so the module is capable of supporting this standard.
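The Power over Ethernet margin can be estimated as follows. The 15.4 W value is the maximum power a power sourcing equipment port supplies under IEEE 802.3af; it is taken from the standard and not from this thesis, and conversion losses on the module are ignored in this rough sketch.

```python
PIN_CURRENT_A = 0.5       # per-pin current limit of the connector (from Section 3.2)
POE_802_3AF_PSE_W = 15.4  # assumption: 802.3af maximum power per sourcing port

# Four 12 V pins on the Ethernet module interface
supply_w = 12.0 * 4 * PIN_CURRENT_A
headroom_w = supply_w - POE_802_3AF_PSE_W

print(f"12 V supply: {supply_w:.1f} W, headroom over 802.3af: {headroom_w:.1f} W")
```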

### 3.8. USB Module Interface

A USB 3.1 port consists of a pair of high-speed RX/TX lines and a USB 2.0 port. All required signals occupy small parts of an M.2 connector. To increase the optional value, the module interface is extended with general-purpose low-speed interfaces (I2C, SPI, UART, CAN). Figure 3.13 shows a functional overview of the module connection and provides a detailed pinout for both ports.



Figure 3.13.: USB module interface

### 3.9. Value optimization

Applied to the Xavier module, the presented modular interfaces have two drawbacks. The first drawback is the odd number of video ports on the Xavier. Because each video output module interface carries two ports, one interface will have only one port occupied. For a direct video output implementation without converters, the manufacturing costs of a one-channel video output module plus the module interface on the carrier board are higher than those of a direct implementation of the video output on the baseboard.

Second, USB controller 1 in Table 3.5 is available on two HSIO lanes (six and eleven), and one of them is connected to the USB module interface. However, if no USB module is used, it is better to connect the remaining USB port to a fixed connector on the baseboard. As for the video input, a direct connection without converters is very cost-effective.

Both aspects motivate implementing a minimum of two interfaces directly on the baseboard. This adds the value that a Xavier ESC without any modules is still accessible for development and maintenance.

### 3.10. System overview



Figure 3.14.: Concept Overview

Figure 3.14 shows the connections to all peripherals containing at least one high-speed connection. Custom-designed M.2 interfaces are marked in blue, whereas the parts highlighted in red are standard M.2 modules. Interfaces implemented directly on the carrier board are green. The modular approach achieves a higher potential value, and custom-designed modules provide development opportunities to the end manufacturer in a simplified way. The standard M.2 modules offer a simple way to add connectivity and storage options, while the interfaces installed on the carrier board reduce the development and prototyping effort.

## 4. Implementation

The main benefits of the modular approach are agile hardware and reduced costs. The design and production of the carrier board and five modules show the advantages of this approach. This chapter describes the resulting hardware.

Figure 4.1 visualizes the top, bottom and side views of the mechanical concept. Dark green represents the main carrier board, whereas light green marks the Video Output and USB custom modules, which face the front side of the device on the top level. Furthermore, the standard M.2 modules, which are single-sided boards, are located underneath the Xavier module. The potential E-key and B-key modules, which imply the use of antennas, face the backside of the device so that the antennas can be mounted there. The PCIe slot is also designed to face the backside of the device. In general, the PCIe card defines the length of the module from the connector end to the PCIe front panel. The two non-modular connectors, HDMI and USB, are also placed facing the backside, combined with the power input. All connectors on the backside are intended to provide developer or maintenance access, while the front side is primarily used for modular connectivity.

The bottom custom modules, namely the two video input modules and the Ethernet module, are drawn in grey. Note that on the front side at top and bottom, different types of custom modules are located near the edge of the baseboard. The intention of this design is to relax physical constraints for module development. Hence, modules located near the edge can have larger dimensions, offering sufficient space to implement multi-stage converters.



Figure 4.1.: Modular Xavier carrier board

## 4.1. Carrier Board

With the described concept the board was designed<sup>1</sup> and produced. Figure 4.2 shows the finalized base board.



(a) 3D drawing of the carrier board from top view (b) 3D drawing of the carrier board from bottom view

Figure 4.2.: Carrier board

Table 4.1 shows the technical specifications of this baseboard.

| Function                     | Description                                                                                                                                                    |
|------------------------------|----------------------------------------------------------------------------------------------------------------------------------------------------------------|
| Main processor               | Nvidia Xavier connector                                                                                                                                        |
| Standard PCIe                | PCIe x16 connector with eight active lines.<br>PCIe M.2 M-key x4<br>PCIe M.2 E-Key x1 with USB and I2C support<br>PCIe M.2 B-key with SIM slot and USB support |
| Video Input modules          | Two times custom optimized video input with supported 4 x 2 lane or 2 x 4 lane CSI configuration supporting I2S, I2C and GPIO control                          |
| Video Output module          | Including two video ports with support of HDMI 2.0, DP 1.2, eDP 1.4 supporting USB, I2C and GPIO control                                                       |
| Ethernet module              | RGMII interface and PCIe x1 for Phy connection. Supports PoE and GPIO control                                                                                  |
| USB / General Purpose module | Two independent USB 3.1 controllers with up to 48 W USB power delivery. Including I2C, SPI, UART and CAN. |
| Fixed connectors             | USB 3.0 , USB 2.0 , HDMI 2.0, FTDI UART, 16 GPIOs Pin Header, Fan output                                                                                       |
| Physical                     | 8-Layer HDI design. 145mm x 120 mm                                                                                                                             |

Table 4.1.: Modular carrier board specifications

<sup>1</sup>The PCB Layout was accomplished by a external company "Pender Electronics"

## 4.2. Modules

Five modules were designed to validate the applicability of the proposed modular approach. This section gives an overview of the specifications of each module. Mechanically, all modules follow the outline of Figure 4.3.



Figure 4.3.: Module outline with edge connector and carrier board fixation on the right side. Two options describe potential fixations of the front panel.

### HDMI Input

The two-channel HDMI to CSI converter shown in Figure 4.4 uses the Lontium LT6911UXC chip to convert HDMI 2.0 to CSI D-Phy 1.2. The module interface, as detailed in Section 3.4.2, has one I2S audio port; therefore, the audio extracted from the HDMI sources is muxed into the available channel. The external connections are protected from ESD and have EMI filters.



Figure 4.4.: Two channel HDMI to CSI converter

### HDMI Output

For additional video output, a two-channel HDMI 2.0 module was developed and is shown in Figure 4.5. The HDMI channels are directly connected to the video outputs of the Xavier module, and therefore the module provides the same capabilities as the Xavier. The external connections are protected from ESD and have EMI filters.



Figure 4.5.: Two channel HDMI output

### Ethernet

The Ethernet module, shown in Figure 4.6, implements the Marvell 88E1512 PHY for 1000BASE-T (IEEE 802.3). The external connections are protected from ESD.<sup>2</sup>



Figure 4.6.: Ethernet Phy and connector

---

<sup>2</sup>The layout of the Ethernet module was done by Dominic Moesch and is not part of this thesis.

### FPD-Link

Four FPD-Link III channels are provided on the FPD-Link module, see Figure 4.7. The TI DS90UB954 converts two FPD-Link channels to one CSI stream.<sup>3</sup>



Figure 4.7.: Four channel FPD-Link III converter

### USB-C

The USB module implements two USB 3.1 Type-C interfaces and four connectors for low-speed communication, namely I2C, SPI, UART and CAN. The module is shown in Figure 4.8.



Figure 4.8.: Two channel USB Type-C and low-speed interfaces

<sup>3</sup>The layout and schematic of the FPD-Link module was done by Rico Ganahl and is not part of this thesis.

## 5. Results

The main focus of this thesis is the question of what needs to change so that ESCs can be used in a wider range of applications. Currently, the use of ESCs is associated with a high long-term investment, and only large companies have enough human and capital resources to bring an ESC device to the market.

Three factors account for the current situation of the ESC market. First, an ESC is a complex and expensive computational device. A developer or purchaser has little influence on the price, except through the number of devices purchased, which is limited by the market. More important is that, if an ESC is used, its full potential is realized to add more value to the end product. This leads to the second point: utilizing an ESC to its capacity requires a high data transfer rate to the powerful processing units. This is achieved through the wide range of high-performance interfaces of the ESC. However, the variety of interfaces demanded in different markets is so high that each application needs a custom-designed board for an ESC. This is the reason for the third defining factor: expensive and suboptimal carrier boards. Due to limited resources, a smaller company is forced to purchase an existing board and pay off the long-term investment of the carrier board developer. The combination of an expensive ESC with an expensive carrier board, on which the whole system cannot show its best performance, is the reason why ESCs cannot access new demanding niche markets.

Our in-depth analysis of the requirements shows a demand for a standardized modular interface approach. All interfaces of the ESC should be accessible on cost-effective high-speed connectors. To further reduce costs, the PCIe M.2 connector ensures that only one additional mechanical part is required between the data source and a sink on a different board. During this thesis, with modularity as the guiding idea, a complete system for an Nvidia Xavier 8GB ESC was developed, including five modules: an Ethernet module, a dual-channel HDMI to CSI module, a dual-channel HDMI output module, an FPD-Link III deserializer module and a general-purpose module for USB Type-C and low-speed communication. The complete system is shown in Figure 5.1.



(a) Modular carrier board with modules  
bottom view

(b) Modular carrier board with modules  
top view

Figure 5.1.: Final system design as outcome of the thesis.

## 5.1. System Feasibility

The prototype production for three carrier boards costs 640 USD. Due to the low quantity, the price is not comparable with carrier boards available on the market. From experience in other projects, it can be assumed that the price drops by 50% already at 100 pieces, which is a suitable production number for small niche markets. Furthermore, no assembly or supply chain optimization was done for the prototype, and both strongly influence the price.

Because the carrier board is built with a maximal degree of freedom for the end manufacturer, only a minimum of interfaces is available on the carrier board itself. In a comparison with other products, the functionality of the bare modular baseboard is therefore at a disadvantage. Once the interface extensions are included, the situation changes strongly.

The production of three modules of each of the five types costs 2430 USD in total, hence 809 USD for a set of five different modules. With the same assumption of a 50% cost reduction for a quantity of at least 100, the price changes to 409 USD for five modules, and the total system price including the carrier board is 730 USD.

Table 5.1 shows a comparison between the most comprehensive carrier board, the Rogue-X, and the developed carrier board. For the price comparison, the new board is extended with modules so that it is technically equal to or better than the Rogue-X. To achieve this, a one-channel HDMI output module (26 USD<sup>1</sup>), the USB-C module (42 USD<sup>1</sup>) and a two-port Ethernet module (60 USD<sup>1</sup>) are added.

| Functional block | Rogue-X                                  | Modular carrier board approach                                      |
|------------------|------------------------------------------|---------------------------------------------------------------------|
| Internet         | 2x Ethernet ports / M.2 E Key for Wlan   | 2x Ethernet ports / M.2 E Key for Wlan / B key for LTE,WWAN and GPS |
| Video Out        | 2x HDMI 2.0                              | 2x HDMI 2.0                                                         |
| Video In         | 16 lanes CSI connector                   | 16 lanes CSI on connectors                                          |
| USB              | 3x USB 3.1 Type C (only one true Type C) | 1x USB 3.1 Type A / 2x full USB Type C                              |
| PCIe             | 2x PCIe Gen2 x4 I-Pex Connector          | Standard PCIe slot Gen3 x8                                          |
| Storage          | M.2 M Key for NVMe / SD Card             | M.2 M Key (Optional B-key) / SD Card / UFS                          |
| Size             | 105mm x 105mm                            | 165mm x 120mm                                                       |
| Price            | 930 USD <sup>[8]</sup>                   | 448 USD                                                             |

Table 5.1.: Comparison between Rogue-X and modular carrier board
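The text does not spell out how the 448 USD in Table 5.1 is composed. One reading that reproduces the figure is sketched below. It assumes that the 640 USD prototype cost applies per carrier board and that the 50% reduction at 100 pieces is applied to it; both points are assumptions, while the module prices are the estimates quoted above.

```python
PROTOTYPE_CARRIER_USD = 640  # assumption: prototype cost per carrier board
VOLUME_REDUCTION = 0.5       # assumed price drop at 100 pieces (Section 5.1)
ROGUE_X_USD = 930            # Rogue-X price from Table 5.1

# Estimated module prices at 100 pieces, as quoted in the text
MODULES_USD = {
    "HDMI output, one channel": 26,
    "USB-C": 42,
    "Ethernet, two ports": 60,
}

carrier_usd = PROTOTYPE_CARRIER_USD * (1 - VOLUME_REDUCTION)
total_usd = carrier_usd + sum(MODULES_USD.values())
saving_usd = ROGUE_X_USD - total_usd

print(f"Carrier board at volume: {carrier_usd:.0f} USD")
print(f"Modular system total:    {total_usd:.0f} USD")
print(f"Difference to Rogue-X:   {saving_usd:.0f} USD")
```

Changing the entries of `MODULES_USD` shows how the system price scales when modules are added or left out.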

<sup>1</sup>Estimated price at 100 pieces

## 5.2. Conclusion

The modular carrier board is the optimal platform for the efficient utilization of ESC-type SoMs. The results show that the modular approach has an attractive price compared to available carrier boards. In particular, the possibility to include or exclude certain modules completely strongly improves the ratio of cost to added value. Furthermore, each manufacturer can use its expertise to develop a specific module that best fits the intended application. This provides great control over the production costs of the module.

In addition to material-related benefits such as lower price and higher performance, the development process is strongly simplified. Software and hardware development are more detached from each other. While a new interface is being developed, a software developer can continue working with an older module or a temporary PCIe card emulating the new module. This property of the system shortens the development time dramatically. Moreover, the evaluation of new technologies needs less effort because of the short time needed to develop a one-channel prototype module. The modules themselves have a high value in production, considering that the same modules can be used on other carrier boards for other SoMs. This enables control over market segmentation and simplifies the supply chain.

The modular platform architecture for embedded super computers has great potential to be the enabling factor for further innovation and quality.

## A. Appendix



*Schematic sheets (images not reproduced). Recoverable sheet labels:*

- M.2 B-Key Module: option with SIM card, for LTE/WWAN/GPS
- Video Input Module P0: 2+2 lanes AB and 2+2 lanes CD
- Video Input Module P1: 4 lanes EF and 4 lanes GH











# Bibliography

- [1] Developer guide tensor rt. <https://docs.nvidia.com/deeplearning/tensorrt/pdf/TensorRT-Developer-Guide.pdf>.
- [2] Xilinx. Xilinx DNNDK user guide. <https://www.xilinx.com/support/documentation/user_guides/ug1327-dnndk-user-guide.pdf>. Online; accessed 21.05.2020.
- [3] Medium. <https://medium.com/skydio/inside-the-mind-of-the-skydio-2-b1b78aa6dfa7>. Online; accessed 20.05.2020.
- [4] Gladys Makhana, Oliver Pwaka, and Chergedzai Mafini. *Liquid Petroleum Gas Supply Chain Challenges in Rural Medical Facilities in Zimbabwe: 13th EAI International Conference, TridentCom 2018, Shanghai, China, December 1-3, 2018, Proceedings*, pages 288–301. 03 2019.
- [5] Aldec. Aldec ADAS. <https://www.aldec.com/en/solutions/embedded/adas>. Online; accessed 24.05.20.
- [6] MIPI. Mipi csi description. <http://resources.mipi.org/blog/powering-ai-and-automotive-applications-with-the-mipi-camera-interface>. Online; accessed 24.05.20.
- [7] geek university. Ethernet standards. <https://geek-university.com/ccna/ieee-ethernet-standards/>. Online; accessed 25.05.20.
- [8] SiliconHighway. Agx101. <https://www.siliconhighwaydirect.co.uk/product-p/agx101.htm>. Online; accessed 25.05.20.
- [9] Connect Tech. Rogue-x carrier. <http://connecttech.com/product/rogue-x-carrier-nvidia-jetson-agx-xavier/>. Online; accessed 25.05.20.
- [10] Aetina. Ax710. <https://www.aetina.com/products-detail.php?i=255>. Online; accessed 25.05.20.
- [11] Auvidea. X220. <https://auvidea.eu/product/70410/>. Online; accessed 25.05.20.
- [12] abigo4u. Industrial PC. <https://abigo4u.com/en/nuvo-7160gc-rugged-pc-8th-gen-corei-poe-120w-nvidia.html>. Online; accessed 25.05.20.
- [13] TE. Ngff te. <https://www.te.com/deu-de/products/connectors/pcb-connectors/card-edge-connectors/m-2-connectors.html?tab=pgp-story>. Online; accessed 25.05.20.

# List of Figures

| Figure | Page |
|--------|------|
| 1.1. Product Characterization | 2 |
| 1.2. Technology Fields: ■ Time-critical, ■ Computationally heavy, ■ High data transfer, ■ Interactive Multimedia, ■ Production Lines, ■ Server Applications, ■ Autonomous Movement | 3 |
| 1.3. Technology Families: ■ Automation, ■ Analysis, ■ Capture/Distribution | 4 |
| 1.4. One Box Solutions | 5 |
| 1.5. Complexity Comparison Central and Edge Conversion Solution | 5 |
| 1.6. Server Architecture | 6 |
| 1.7. SoC Types | 7 |
| 1.8. SoM / Embedded Super Computer | 8 |
| 1.9. Overview of tools to be used with Nvidia GPU <sup>[1]</sup> | 11 |
| 1.10. Xilinx DNNDK process <sup>[2]</sup> | 11 |
| 2.1. Skydio 2 drone <sup>[3]</sup> | 13 |
| 2.2. Short range of CSI problem | 14 |
| 2.3. HDMI mux/scaler based on Xavier SoM | 15 |
| 2.4. Concept of CSI serializer and deserializer used by FPD-Link III and GMSL | 16 |
| 2.5. HDMI to MIPI converter | 16 |
| 2.6. HDBaseT example to transport HDMI, Ethernet and USB 2 to an ESC | 17 |
| 2.7. Lattice Crosslink conversion LVDS to MIPI CSI | 17 |
| 2.8. HDBaseT used to extend HDMI signal to longer distance | 17 |
| 2.9. Rogue-X <sup>[9]</sup> | 19 |
| 2.10. AX710 <sup>[10]</sup> | 19 |
| 2.11. X220 <sup>[11]</sup> | 20 |
| 2.12. Available camera extensions and price | 21 |
| 2.13. Industrial PC with i7 CPU and GTX1060 GPU for less than 2500 USD | 21 |
| 3.1. Xavier internal hardware interface blocks grouped by functionality | 23 |
| 3.2. NGFF connectivity | 24 |
| 3.3. NGFF options <sup>[13]</sup> | 24 |
| 3.4. Xavier Video Output line sharing between HDMI, DP and eDP | 25 |
| 3.5. Video Output Interface | 26 |
| 3.6. Camera Input Ports | 27 |
| 3.7. Missing Pixel Parser Port 5 and 7 | 27 |
| 3.8. Video Input | 28 |
| 3.9. NVHS UPHY muxing of NVLink, PCIe and SLVS-EC | 29 |
| 3.10. PCIe x16 connector to connect PCIe x1 to x16 cards | 29 |
| 3.11. High speed signal muxing | 30 |
| 3.12. Ethernet module interface | 33 |
| 3.13. USB module interface | 34 |
| 3.14. Concept Overview | 35 |
| 4.1. Modular Xavier carrier board | 36 |
| 4.2. Carrier board | 37 |
| 4.3. Module outline with edge connector and carrier board fixation on the right side. Two options describe potential fixations of the front panel | 38 |
| 4.4. Two channel HDMI to CSI converter | 38 |
| 4.5. Two channel HDMI output | 39 |
| 4.6. Ethernet Phy and connector | 39 |
| 4.7. Four channel FPD-Link III converter | 40 |
| 4.8. Two channel USB Type-C and low-speed interfaces | 40 |
| 5.1. Final system design as outcome of the thesis | 41 |

# List of Tables

| Table | Page |
|-------|------|
| 1.1. Conventional solutions comparison | 6 |
| 1.2. Possibilities Comparison | 9 |
| 2.1. Interface and hardware blocks utilization in drone applications | 13 |
| 2.2. Interface and hardware blocks utilization in autonomous delivery robots | 14 |
| 2.3. Interface and hardware blocks utilization in multimedia applications | 15 |
| 2.4. Interface and hardware blocks utilization in One Box or Edge Conversion applications | 15 |
| 2.5. Main Ethernet Standards with description of technology and distance <sup>[7]</sup> | 18 |
| 2.6. USB standards, showing voltage and power levels | 18 |
| 2.7. Rogue-X capabilities | 19 |
| 2.8. AX710 capabilities | 19 |
| 2.9. X220 capabilities | 20 |
| 3.1. CSI lane options | 27 |
| 3.2. HS Lane Partition | 31 |
| 3.3. Placed ■ UFS and ■ PCIe x4 | 32 |
| 3.4. Placed ■ UFS, ■ PCIe x4, ■ USB, ■ PCIe x1, ■ PCIe x2 | 32 |
| 3.5. Placed ■ UFS, ■ PCIe x4, ■ USB, ■ PCIe x1, ■ PCIe x2, ■ 2xUSB | 32 |
| 4.1. Modular carrier board specifications | 37 |
| 5.1. Comparison between Rogue-X and modular carrier board | 42 |