

Advanced Topics in Communication Networks

# Programming Network Data Planes



Laurent Vanbever  
[nsg.ee.ethz.ch](http://nsg.ee.ethz.ch)

ETH Zürich  
Nov 1 2018

Last week on  
**Advanced Topics in Communication Networks**

We looked at the Tofino architecture together with two (key, value) store applications: Net/{Cache, Chain}



A screenshot of a presentation slide titled "NetCache: Balancing Key-Value Stores with Fast In-Network Caching" by Xin Jin, Xiaozhou Li, Haoyu Zhang, Robert Soulé, Jeongkeun Lee, Nate Foster, Changhoon Kim, Ion Stoica. The slide includes logos for Johns Hopkins University, Barefoot Networks, Princeton University, Cornell University, and Berkeley. The copyright notice "Copyright © 2017 - Barefoot Networks" is visible at the bottom.

A screenshot of a presentation slide titled "NetChain: Scale-Free Sub-RTT Coordination" by Xin Jin, Xiaozhou Li, Haoyu Zhang, Robert Soulé, Jeongkeun Lee, Nate Foster, Changhoon Kim, Ion Stoica. The slide includes logos for Johns Hopkins University, Barefoot Networks, Princeton University, Cornell University, and Berkeley. The copyright notice "Copyright © 2017 - Barefoot Networks" is visible at the bottom.




**“Programmable switches are 10-100x slower than fixed-function switches. They cost more and consume more power.”**

**Conventional wisdom in networking**

One of the main enablers of data-plane programmability is the shrinking size of the packet-processing logic on the chip.



Source: Programmable Data Planes at Terabit Speeds, Vladimir Gurevich, 2017

Barefoot Tofino processes packets in parallel,  
even though the semantics of a P4 program are sequential

## Parallelism and alternatives

- Sequential semantics does not prohibit parallelism
- Doing everything does not mean doing everything all the time


## PISA: Important Details

- Multiple simultaneous lookups and actions can be supported



# Barefoot Tofino 6.5 Tbps backplane

## several billion packets per second at line rate

### 6.5Tb/s Tofino™ Summary

- **State of the art design**
  - Single Shared Packet Buffer
  - TSMC 16nm FinFET+
- **Four Match+Action Pipelines**
  - Fully programmable PISA Embodiment
  - All compiled programs run at line-rate.
  - Up to 1.3 million IPv4 routes
- **Port Configurations**
  - 65 x 100GE/40GE
  - 130 x 50GE
  - 260 x 25GE/10GE
- **CPU Interfaces**
  - PCIe: Gen3 x4/x2/x1
  - Dedicated 100GE port



Tofino relies on the Packet Header Vector (PHV) to pass state between stages; this is one of its limiting factors

## Packet Header Vector (PHV)

- A set of uniform containers that carry the headers and metadata along the pipeline
- Fields can be packed into any container or their combination
- PHV Allocation step in the compiler decides the actual packing



Tofino uses a folded pipeline in which the *same* stages are used for both the ingress and the egress pipeline

## Unified Pipeline

- There is no difference between ingress and egress processing
  - The same blocks can be efficiently shared






NetCache solves the problem of load balancing in key-value stores facing *dynamic, skewed* workloads

**Key challenge:** **highly-skewed** and **rapidly-changing** workloads

**low throughput & high tail latency**



Source: NetCache: Balancing Key-Value Stores with Fast In-Network Caching, Xin Jin, 2017

It leverages the fact that a small but very fast cache can provide perfect load balancing... in theory

## Opportunity: fast, small cache can ensure load balancing

[B. Fan et al. **SoCC'11**, X. Li et al. **NSDI'16**]

Cache  $O(N \log N)$  hottest items

E.g., 10,000 hot objects



**N:** # of servers

E.g., 100 backends with 100 billion items



**Requirement:** cache throughput  $\geq$  backend aggregate throughput
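The effect of caching only the hottest items can be checked with a small calculation. The sketch below is illustrative only: the Zipf skew of 0.99, the 100,000 items, the cache size of 1,000 and the pseudo-random item placement are all assumptions, not values taken from the NetCache paper. It computes the expected load on each server and reports the ratio between the most loaded server and the average.

```python
import random


def zipf_popularity(num_items, skew):
    """Return normalized Zipf request probabilities; item 0 is the hottest."""
    weights = [1.0 / (rank ** skew) for rank in range(1, num_items + 1)]
    total = sum(weights)
    return [w / total for w in weights]


def load_imbalance(popularity, num_servers, cache_size, seed=0):
    """Max server load divided by mean server load, counting only
    the requests that still reach the servers (cache misses)."""
    rng = random.Random(seed)
    # Stand-in for hash partitioning: each item lives on one pseudo-random server.
    home = [rng.randrange(num_servers) for _ in popularity]
    load = [0.0] * num_servers
    for item, prob in enumerate(popularity):
        if item >= cache_size:  # the cache_size hottest items are served by the cache
            load[home[item]] += prob
    mean = sum(load) / num_servers
    return max(load) / mean


popularity = zipf_popularity(num_items=100_000, skew=0.99)
no_cache = load_imbalance(popularity, num_servers=100, cache_size=0)
with_cache = load_imbalance(popularity, num_servers=100, cache_size=1_000)
print(f"max/mean load without cache: {no_cache:.2f}")
print(f"max/mean load with 1,000 cached items: {with_cache:.2f}")
```

Without a cache, the server holding the single hottest item carries several times the average load; once the head of the distribution is absorbed by the cache, the remaining load is close to uniform.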

# NetCache relies on the O(billion) throughput of programmable network devices to achieve it in practice

## NetCache: towards billions QPS key-value storage rack

Cache needs to provide the **aggregate** throughput of the storage layer



| Storage layer | Cache layer |
|---|---|
| flash/disk, each $O(100)$ KQPS, **total $O(10)$ MQPS** | in-memory, **$O(10)$ MQPS** |
| in-memory, each $O(10)$ MQPS, **total $O(1)$ BQPS** | in-network, **$O(1)$ BQPS** |

Small on-chip memory?  
Only cache  **$O(N \log N)$  small** items

Source: NetCache: Balancing Key-Value Stores with Fast In-Network Caching, Xin Jin, 2017

It relies on a tailored UDP-based protocol, an encoding/decoding scheme for storing variable-length values, and sketches

## **Key-value caching in network ASIC at line rate ?!**

- How to identify application-level packet fields ?
- How to store and serve variable-length data ?
- How to efficiently keep the cache up-to-date ?




NetChain builds upon NetCache to scale coordination services, a key building block of distributed systems

Conventional wisdom: avoid coordination

NetChain: lightning-fast coordination  
enabled by programmable switches

This opens the door to rethinking distributed systems design

# Coordination services typically rely on a replicated key-value store for consistency and fault-tolerance



Source: NetChain: Scale-Free Sub-RTT Coordination, Xin Jin, 2018

State-of-the-art server-based coordination services struggle to provide high throughput and low latency

## Opportunity: **in-network** coordination



- Throughput: **switch throughput**
- Latency: **half of an RTT**

# Key challenge is to ensure consistency and fault-tolerance

## Design goals for coordination services

- High throughput and low latency: directly from high-performance switches
- Strong consistency and fault tolerance: chain replication in the network

NetChain does so using chain replication, building upon NetCache for storing values in each switch

## What is chain replication



- Storage nodes are organized in a **chain** structure
- Handle operations
  - **Read** from the **tail**
  - **Write** from **head** to **tail**
- Provide strong consistency and fault tolerance
  - Tolerate **f** failures with **f+1** nodes
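The read and write rules above can be sketched in a few lines. This is a minimal in-memory model, not NetChain's switch implementation: the class and function names are made up for illustration, nodes are plain Python objects, and failure handling is reduced to relinking the surviving nodes.

```python
class ChainNode:
    """One storage node in the chain, holding a local key-value map."""

    def __init__(self, name):
        self.name = name
        self.store = {}
        self.next = None  # successor in the chain; None for the tail

    def write(self, key, value):
        # Apply the write locally, then propagate it toward the tail.
        self.store[key] = value
        if self.next is not None:
            return self.next.write(key, value)
        return f"ack from {self.name}"  # only the tail acknowledges


def build_chain(names):
    nodes = [ChainNode(name) for name in names]
    for node, successor in zip(nodes, nodes[1:]):
        node.next = successor
    return nodes


def chain_write(nodes, key, value):
    return nodes[0].write(key, value)  # writes enter at the head


def chain_read(nodes, key):
    return nodes[-1].store.get(key)  # reads are served by the tail


def remove_failed(nodes, failed_name):
    """Return the chain without the failed node, relinking its neighbors."""
    survivors = [node for node in nodes if node.name != failed_name]
    for node in survivors:
        node.next = None
    for node, successor in zip(survivors, survivors[1:]):
        node.next = successor
    return survivors


nodes = build_chain(["S0", "S1", "S2"])  # f + 1 = 3 nodes tolerate f = 2 failures
ack = chain_write(nodes, "lock", "owner-A")
value = chain_read(nodes, "lock")
after_failure = remove_failed(nodes, "S2")
value_after_failure = chain_read(after_failure, "lock")
```

Because a write is acknowledged only after it reached the tail, every acknowledged value is present on all nodes, so reads stay correct after the tail is removed.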

NetChain relies on a tailored UDP-based protocol,  
source-routing mechanisms and message serialization

## How to build a strongly-consistent, fault-tolerant, in-network key-value store

- How to store and serve key-value items? (data plane)
- How to route queries according to the chain structure? (data plane)
- How to handle out-of-order delivery in the network? (data plane)
- How to handle switch failures? (control plane)

This week on

# Advanced Topics in Communication Networks

A high-level, **non-exhaustive** overview of the research surrounding data plane programmability

Data plane programmability for:

- **Performance**
- **Monitoring**
- **Applications offloading**

For data plane programmability:

- **Platforms**
- **Correctness**
- **Management**

# A large set of papers on programmable data planes aims at improving performance, especially load balancing

CONGA [SIGCOMM'14]

LetFlow [NSDI'17]

HULA [SOSR'16]

DRILL [SIGCOMM'17]

Let It Flow: Resilient Asymmetric Load Balancing with Flowlet Switching

Erico Vanini<sup>1</sup> Rong Pan<sup>1</sup> Mohammad Alizadeh<sup>2</sup> Parvin Taheri<sup>1</sup> Tom Edsall<sup>1</sup>  
<sup>1</sup>Cisco Systems <sup>2</sup>Massachusetts Institute of Technology

## Abstract

Datacenter networks often have asymmetric links between switches. These links can cause significant imbalances if there are a few large flows. More importantly, ECMP uses a purely *local* decision to split traffic among several paths without knowledge of potential downstream congestion on each path. Thus ECMP fares poorly with *asymmetry* caused by link failures that occur frequently and are disruptive in datacenters [17, 34]. For instance, the recent study by Gill *et al.* [17] shows that failures can reduce delivered traffic by up to 40% despite built-in redundancy.

Broadly speaking, the prior work on addressing ECMP's shortcomings can be classified as either centralized scheduling (e.g., Hedera [2]), local switch mechanisms (e.g., Flare [27]), or host-based transport protocols (e.g., MPTCP [41]). These approaches all have important drawbacks. Centralized schemes are too slow for the traffic volatility in datacenters [28, 8], and local congestion-aware mechanisms are suboptimal and can perform even worse than ECMP with asymmetry [24]. Host-based methods such as MPTCP are challenging to deploy because network operators often do not control the end-host stack (e.g., in a public cloud) and, even when they do, some high-performance applications (such as low-latency storage systems [39, 7]) bypass the kernel and implement their own transport. Further, host-based load balancing adds more complexity to an already complex transport layer burdened by new requirements such as low latency and burst tolerance [4] in datacenters. As our experiments with MPTCP show, this can make for brittle and unreliable performance [5].

## 1. INTRODUCTION

Datacenter networks being deployed by cloud providers as well as enterprises must provide large bisection bandwidth to support an ever-increasing array of applications, ranging from financial services to big-data analytics. They also must provide agility, enabling any application to be deployed at any server, in order to realize operational efficiency and reduce costs. Seminal papers such as VL2 [18] and Portland [1] showed how to achieve this with Clos topologies, Equal Cost MultiPath (ECMP) load balancing, and the decoupling of endpoint addresses from their location. These design principles are followed by next-generation overlay technologies that accomplish the same goals using standard encapsulations such as VXLAN [35] and NVGRE [45].

However, it is well known [2, 41, 9, 27, 44, 10] that ECMP can balance load poorly. First, because ECMP randomly hashes flows to paths, hash collisions can cause significant imbalance if there are a few large flows. More importantly, ECMP uses a purely *local* decision to split traffic among several paths without knowledge of potential downstream congestion on each path. Thus ECMP fares poorly with *asymmetry* caused by link failures that occur frequently and are disruptive in datacenters [17, 34]. For instance, the recent study by Gill *et al.* [17] shows that failures can reduce delivered traffic by up to 40% despite built-in redundancy.

This paper presents a distributed load balancing mechanism that overcomes the shortcomings of ECMP. Our key insight is that the load balancing problem is best solved at the edge of the datacenter fabric. This is because the edge is where the traffic is generated and consumed. By placing the load balancer at the edge, we can take advantage of the fact that the traffic is already aggregated and can be easily distributed across multiple paths. We also show that this approach can achieve high load balancing performance even in the presence of link failures.

The paper is organized as follows. In Section 2, we introduce the system architecture and the design of HULA. In Section 3, we evaluate HULA's performance and compare it with other load balancing schemes. In Section 4, we discuss related work. Finally, we conclude in Section 5.


# Motivation

---

DC networks need large bisection bandwidth for distributed apps (big data, HPC, web services, etc)

## Single-rooted tree

- High oversubscription




Source: CONGA: Distributed Congestion-Aware Load Balancing for Datacenters,  
Mohammad Alizadeh et al., 2014

## Multi-rooted tree [Fat-tree, Leaf-Spine, ...]

- Full bisection bandwidth, achieved via multipathing




# Multi-rooted != Ideal DC Network

Ideal DC network:  
Big output-queued switch



- No internal bottlenecks → predictable
- Simplifies BW management  
[EyeQ, FairCloud, pFabric, Varys, ...]

Multi-rooted tree



Possible  
bottlenecks

Need precise load balancing


# Today: ECMP Load Balancing

---

Pick among equal-cost paths by a **hash** of 5-tuple

- Approximates Valiant load balancing
- Preserves packet order

## Problems:

- Hash collisions (coarse granularity)
- Local & stateless (very bad with asymmetry due to link failures)
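The hashing step and the collision problem can be illustrated as follows. This is a sketch only: real switches use vendor-specific hash functions, and CRC32 over a string encoding of the 5-tuple is an assumption made here for simplicity, as are the example addresses and ports.

```python
import zlib
from collections import Counter


def ecmp_path(five_tuple, num_paths):
    """Pick a path index from a hash of the 5-tuple (CRC32 as a stand-in hash)."""
    key = "|".join(str(field) for field in five_tuple).encode()
    return zlib.crc32(key) % num_paths


# (src IP, dst IP, protocol, src port, dst port)
flows = [("10.0.0.1", "10.0.1.1", 6, 40000 + i, 80) for i in range(5)]

# Every packet of a flow hashes to the same path, which preserves packet order.
same_path_each_time = ecmp_path(flows[0], 4) == ecmp_path(flows[0], 4)

# With more flows than paths, at least two flows must share a path,
# regardless of how much load each of them carries.
paths_used = Counter(ecmp_path(flow, 4) for flow in flows)
print(dict(paths_used))
```

The choice depends only on header fields, never on load, which is why two large flows that collide stay on the same path for their whole lifetime.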




# Dealing with Asymmetry

---

Handling asymmetry needs non-local knowledge




# Dealing with Asymmetry: ECMP




# Dealing with Asymmetry: Local Congestion-Aware




# Dealing with Asymmetry: Global Congestion-Aware




## Global Congestion-Awareness (in Datacenters)




**Key Insight:**  
Use *extremely fast, low latency*  
distributed control

# CONGA in 1 Slide

---

1. Leaf switches (top-of-rack) track congestion to other leaves on different paths **in near real-time**
2. Use greedy decisions to minimize bottleneck utilization



Fast feedback loops  
between leaf switches,  
**directly in dataplane**
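The greedy decision can be sketched as picking the uplink whose path bottleneck is smallest. The sketch assumes that each leaf already holds two tables, one with the utilization of its own uplinks and one with the congestion reported back by the destination leaf for each uplink; the feedback loop that fills the second table is not modeled, and all names and numbers are hypothetical.

```python
def pick_uplink(local_util, remote_util):
    """Return the uplink whose path bottleneck, i.e. the larger of the
    local and the remote congestion value, is lowest."""
    best_uplink, best_metric = None, None
    for uplink in local_util:
        bottleneck = max(local_util[uplink], remote_util[uplink])
        if best_metric is None or bottleneck < best_metric:
            best_uplink, best_metric = uplink, bottleneck
    return best_uplink


local_util = {"up0": 0.30, "up1": 0.50}   # measured at this leaf
remote_util = {"up0": 0.90, "up1": 0.40}  # fed back by the destination leaf

choice = pick_uplink(local_util, remote_util)
local_only = min(local_util, key=local_util.get)
print(choice, local_only)
```

A purely local scheme would pick `up0` because its first hop looks idle; taking the remote value into account steers traffic to `up1`, whose bottleneck is lower.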


# A large set of papers on programmable data planes aims at improving performance, especially load balancing

- HULA [SOSR'16]: P4-based data-plane load balancing with better scalability than CONGA
- DRILL [SIGCOMM'17]: "micro" load balancing, packet by packet, that can deal with micro-bursts
- LetFlow [NSDI'17]: stateless, yet congestion-aware load-balancing decisions




## In-band Network Telemetry (INT)

June 2016

Changhoon Kim, Parag Bhide, Ed Doe: *Barefoot Networks*  
 Hugh Holbrook: *Arista*  
 Anoop Ghanwani: *Dell*  
 Dan Daly: *Intel*  
 Mukesh Hira, Bruce Davie: *VMware*

Contents:

- Introduction
  - Terms
  - What To Monitor
    - Switch-level Information
    - Ingress Information
    - Egress Information
    - Buffer Information
  - Processing INT Headers
    - INT Header Types
    - Handling INT Packets
  - Header Format and Location
    - INT over any encapsulation
    - On-the-fly Header Creation
    - Header Format
    - Header Location and Format -- INT over Geneve



## Current monitoring methods are inadequate

- Not fast enough
  - Involve CPU and control planes
  - Network state changes rapidly
- Do not provide end-to-end state
  - Difficult to correlate per-element state with the actual path of a flow

Source: In-band Network Telemetry, Mukesh Hira and Naga Katta, 2015

## INT : In-band Network Telemetry

- Mechanism for collecting network state in the dataplane
  - As close to **realtime** as possible
  - At current and future **line rates**
  - With a framework that can **adapt** over time
- Examples of network state
  - Switch ID, Ingress Port ID, Egress Port ID
  - Egress Link Utilization
  - Hop Latency
  - Egress Queue Occupancy
  - Egress Queue Congestion Status
  - ....


# INT Header Format




## INT using P4

- P4 enables flexible packet parsing and modification for INT
- P4 allows INT to adapt to
  - Any Encapsulation format
  - Any State required to be collected
  - Any feature, protocol – current and future


# INT : P4 Code Snippet

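Exact-match table definition

```
// Exact match on the instruction mask carried in the INT header:
// each mask value selects exactly one of the actions listed below.
table int_inst {
    reads {
        int_header.instruction_mask : exact;
    }
    actions {
        int_set_header_i0;
        int_set_header_i1;
        int_set_header_i2;
        int_set_header_i3;
        ....
    }
}
```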

Action definitions

```
// One action per instruction-mask value: i0 adds nothing,
// i1 and i2 each call one helper action, i3 calls both.
action int_set_header_i0() {
}
action int_set_header_i1() {
    int_set_header_3();
}
action int_set_header_i2() {
    int_set_header_2();
}
action int_set_header_i3() {
    int_set_header_3();
    int_set_header_2();
}
....
```
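The dispatch pattern of the snippet, one exact-match entry per instruction-mask value and one action per entry, can be emulated in a few lines of Python. This is only a behavioral sketch: `field_2` and `field_3` are placeholder names for whatever metadata the two helper actions insert, and the switch state is a plain dictionary.

```python
def int_set_header_2(metadata_stack, switch_state):
    # Placeholder for the helper action that inserts one piece of metadata.
    metadata_stack.append(("field_2", switch_state["field_2"]))


def int_set_header_3(metadata_stack, switch_state):
    # Placeholder for the helper action that inserts another piece of metadata.
    metadata_stack.append(("field_3", switch_state["field_3"]))


# Exact-match table: instruction-mask value -> ordered list of helper actions,
# mirroring int_set_header_i0 .. int_set_header_i3.
INT_INST = {
    0: [],
    1: [int_set_header_3],
    2: [int_set_header_2],
    3: [int_set_header_3, int_set_header_2],
}


def apply_int_inst(instruction_mask, switch_state):
    """Emulate one table lookup followed by the action it selects."""
    metadata_stack = []
    for helper in INT_INST[instruction_mask]:
        helper(metadata_stack, switch_state)
    return metadata_stack


state = {"field_2": 17, "field_3": 42}
stack = apply_int_inst(3, state)
print(stack)
```

Enumerating every mask value as its own action keeps the lookup to a single exact match, at the cost of a table whose size grows with the number of instruction bits.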


## HULA: INT + Flowlet routing

1. Periodic INT probes
  - disseminate path utilization to switches
2. Flowlet detection and path selection
  - happens at **all** switches
  - hop-by-hop adaptive routing
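The two steps can be sketched together as a toy switch model. The sketch rests on several assumptions: the class and method names are invented, time is a plain float, a single utilization value per destination is kept, and a probe arriving from the current best hop is taken to refresh the stored utilization even when it is higher.

```python
class HulaSwitch:
    """Toy model of one switch: probes update the best next hop,
    data packets follow their flowlet's next hop."""

    def __init__(self, flowlet_gap):
        self.flowlet_gap = flowlet_gap  # idle time that starts a new flowlet
        self.best_hop = {}              # destination -> (next hop, path utilization)
        self.flowlets = {}              # flow id -> (next hop, time of last packet)

    def on_probe(self, dst, via_hop, path_util):
        """Keep the next hop that advertises the least-utilized path to dst."""
        current = self.best_hop.get(dst)
        if current is None or path_util < current[1] or via_hop == current[0]:
            self.best_hop[dst] = (via_hop, path_util)

    def forward(self, flow_id, dst, now):
        """Stay on the flowlet's hop unless the flow was idle for longer than the gap."""
        entry = self.flowlets.get(flow_id)
        if entry is not None and now - entry[1] <= self.flowlet_gap:
            hop = entry[0]
        else:
            hop = self.best_hop[dst][0]  # new flowlet: take the current best hop
        self.flowlets[flow_id] = (hop, now)
        return hop


switch = HulaSwitch(flowlet_gap=0.5)
switch.on_probe("torB", "hop1", 0.7)
switch.on_probe("torB", "hop2", 0.2)
first = switch.forward("f1", "torB", now=0.0)   # new flowlet, best hop is hop2
switch.on_probe("torB", "hop1", 0.1)            # hop1 becomes the best hop
second = switch.forward("f1", "torB", now=0.1)  # same flowlet, stays on hop2
third = switch.forward("f1", "torB", now=1.0)   # idle gap exceeded, moves to hop1
```

Switching paths only at flowlet boundaries lets the switch react to new probe information without reordering packets inside a burst.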

## INT probes traverse multiple paths




## Probes carry path utilization




## Probes update switch state




## Summary

- INT provides real-time network state directly in the dataplane
  - Scales to arbitrarily large networks
  - Scales to current and future link speeds
  - Can adapt to any network, any encap, any application
- Knowledge of real-time network state opens up new possibilities
  - Enhanced monitoring and troubleshooting
  - Network-state aware routing
  - ...









## MARPLE [SIGCOMM'17]

## SONATA [SIGCOMM'18]

Both papers enable operators to express **monitoring queries**

```
result = filter(pktstream, qid == Q and switch == S
                and t_out - t_in > 1ms)
```

This query returns a stream of packets experiencing high queuing latencies.
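The meaning of the query can be reproduced on a list of packet records. This is a plain Python rendering of the filter, not the output of either compiler: the `PacketRecord` type, the sample values and the use of seconds as the time unit are assumptions.

```python
from dataclasses import dataclass


@dataclass
class PacketRecord:
    qid: int      # queue the packet went through
    switch: str   # switch that observed the packet
    t_in: float   # enqueue time, in seconds
    t_out: float  # dequeue time, in seconds


def high_latency_packets(pktstream, Q, S, threshold=1e-3):
    """Yield the records seen at queue Q of switch S whose queuing
    delay exceeds the threshold (1 ms by default)."""
    for pkt in pktstream:
        if pkt.qid == Q and pkt.switch == S and pkt.t_out - pkt.t_in > threshold:
            yield pkt


pktstream = [
    PacketRecord(1, "s1", 0.0000, 0.0005),  # right queue, only 0.5 ms of delay
    PacketRecord(1, "s1", 0.0010, 0.0035),  # right queue, 2.5 ms of delay
    PacketRecord(2, "s1", 0.0020, 0.0090),  # other queue
    PacketRecord(1, "s2", 0.0030, 0.0080),  # other switch
]
result = list(high_latency_packets(pktstream, Q=1, S="s1"))
```

Only the second record satisfies all three predicates, so it is the only one in `result`.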

A compiler then compiles these queries to: switch programs + control code

The two papers differ, among other things, in the types of queries they support



LossRadar [CoNEXT'16]

FlowRadar [NSDI'16]

Develop techniques and tools to monitor *all flows* by

- relying on in-switch data structures (Bloom Filters) and
- decoding them at the controller level
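As a reminder of the basic building block, here is a plain Bloom filter. It is a generic sketch and not the data structure of either paper, which build richer, decodable variants on top of it; the sizes, the SHA-256-based hashing and the flow key format are assumptions.

```python
import hashlib


class BloomFilter:
    """Fixed-size bit array answering 'possibly seen' or 'definitely not seen'."""

    def __init__(self, num_bits, num_hashes):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = [False] * num_bits

    def _positions(self, item):
        # Derive num_hashes bit positions by salting SHA-256 with the hash index.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{item}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos] = True

    def might_contain(self, item):
        # True can be a false positive; False is always correct.
        return all(self.bits[pos] for pos in self._positions(item))


seen_flows = BloomFilter(num_bits=1024, num_hashes=3)
seen_flows.add("10.0.0.1:40000->10.0.1.1:80/tcp")
known = seen_flows.might_contain("10.0.0.1:40000->10.0.1.1:80/tcp")
empty = BloomFilter(num_bits=1024, num_hashes=3)
unknown_in_empty = empty.might_contain("10.0.0.2:40001->10.0.1.1:80/tcp")
```

The structure needs only a few hash computations and bit writes per packet, which is what makes it a natural fit for switch memory.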

## DAPPER [SOSR'17]

## Network-Wide HH [SOSR'18]



Develop P4-based detection mechanisms to

- diagnose TCP performance issues (e.g., small receiver buffers)
- detect heavy hitters (e.g., port scanners, superspreaders, DDoS)

Introduce techniques to make sketch-based monitoring more practical (by making sketches adaptive or "universal")

SketchLearn [SIGCOMM'18]

Elastic Sketch [SIGCOMM'18]

UnivMon [SIGCOMM'16]
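For readers unfamiliar with sketches, the following shows a Count-Min sketch, used here as a simple representative of the family. It is not the structure proposed by any of the three papers; the width, depth, hashing scheme and flow names are assumptions.

```python
import hashlib


class CountMinSketch:
    """Approximate per-key counters in depth x width cells."""

    def __init__(self, width, depth):
        self.width = width
        self.depth = depth
        self.rows = [[0] * width for _ in range(depth)]

    def _index(self, row, key):
        # One independent-looking hash per row, derived by salting SHA-256.
        digest = hashlib.sha256(f"{row}:{key}".encode()).digest()
        return int.from_bytes(digest[:8], "big") % self.width

    def update(self, key, count=1):
        for row in range(self.depth):
            self.rows[row][self._index(row, key)] += count

    def estimate(self, key):
        # Collisions can only inflate a cell, so the minimum over the rows
        # is the tightest estimate and never undercounts.
        return min(self.rows[row][self._index(row, key)] for row in range(self.depth))


sketch = CountMinSketch(width=64, depth=4)
for _ in range(1000):
    sketch.update("heavy-flow")
sketch.update("mouse-flow", 3)
heavy_estimate = sketch.estimate("heavy-flow")
mouse_estimate = sketch.estimate("mouse-flow")
```

The memory footprint is fixed by `width` and `depth` regardless of the number of flows, and the price is an overestimate whenever keys collide.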




- Consensus at network speed [SOSR'15]
- In-network aggregation (e.g., for MapReduce, graph analytics, ML) [HotNets'17]
- Stateful layer-4 load balancers [SIGCOMM'17]
- plus NetCache [SOSP'17] and NetChain [NSDI'18]


"Data-plane" programmability goes beyond  
switch programmability (or P4 for that matter)

# Offloading...

# ... to FPGA-based SmartNICs

host networking

congestion control

**Azure Accelerated Networking: SmartNICs in the Public Cloud**

Daniel Firestone Andrew Putnam Sambrahna Mundkur Derek Chiu Alireza Dabagh  
Mike Andrewartha Hari Angepat Vivek Bhandi Adrian Caulfield Eric Chung  
Harish Kumar Chandrapur Somesh Chaturmodha Matt Humphrey Jack Lavier Norman Lam  
Fengfen Liu Kalin Ovtcharov Gautham Popuri Shachar Ravidel Tejas Sapte  
Mark Shaw Gabriel Silva Madhan Sivakumar Nisheet Srivastava Anshuman Verma Qasim Zuhair  
Deepak Bansal Doug Burger Kushagra Vaish David A. Maltz Albert Greenberg  
Microsoft

**Abstract**

Modern cloud architectures rely on each server running its own networking stack to implement policies such as tunneling for virtual networks, security, and load balancing. However, these networking stacks are becoming increasingly complex as features are added, and implementing network services in software on CPU cores takes processing power away from VMs, increasing the cost of running cloud services and adding latency and variability to network performance.

We present Azure Accelerated Networking (AccelNet), a solution for offloading host networking to hardware using custom Azure SmartNICs based on FPGAs. We describe the goals of AccelNet, including programmability comparable to software, and performance and efficiency comparable to hardware. We show that FPGAs are the best current platform for offloading our networking stack, as ASICs do not provide sufficient programmability and embedded CPU cores do not provide scalable performance, especially on single network flows.

Azure SmartNICs implementing AccelNet have been developed and deployed in the Microsoft cloud [20] in a fleet of >1M hosts. The AccelNet service has been available for Azure customers since 2016, providing consistent <15μs VM-VM TCP latencies and 32Gbps throughput, which we believe represents the fastest network available to customers in the public cloud. We present the design of AccelNet, including our hardware/software co-design model, performance results on key workloads, and experiences and lessons learned from developing and deploying AccelNet on FPGA-based Azure SmartNICs.

**1 Introduction**

The public cloud is the backbone behind a massive and rapidly growing percentage of online software services [1], [2], [3]. In the Microsoft Azure cloud alone, these services consume millions of processor cores, exabytes of storage, and petabytes of network bandwidth. Network performance, both bandwidth and latency, is critical to most cloud workloads, especially interactive customer-facing workloads.

As a large public cloud provider, Azure has built its cloud network on host-based software-defined networks

**HotCocoa: Hardware Congestion Control Abstractions**

Mina Tahmasbi Arashloo Monia Ghobadi Jennifer Rexford David Walker  
Princeton University Princeton University Princeton University Princeton University

**ABSTRACT**

Congestion control in multi-tenant data centers is an active area of research because of its significant impact on customer experience and, consequently, on revenue. Therefore, new algorithms are frequently explored to leverage the Cloud evolution. Deploying new congestion control algorithms at the end host's hypervisor allows frequent updates, but processing packets at high rates in the hypervisor and implementing the elements of a congestion control algorithm, such as traffic shapers and timestamps, in software have well-studied inaccuracies and inefficiencies. These inaccuracies originate from the fact that these components rely entirely on software timers to timestamp packets. These timers are inaccurate, as they drift orders of magnitude more than hardware timers [20, 22, 23]. Worse yet, merely switching packets between the NIC and VMs at 100Gbps can utilize up to 45% of CPUs on a 12-core machine [16]. Thus, with 100Gbps NICs on the horizon, implementing traffic shaping, and similar per-packet stateless processing, at line rate in the hypervisor requires additional CPU cycles and memory that could have otherwise been sold to operators.

To free up CPU cycles on servers, several techniques have been developed for offloading various networking functions to the NIC (e.g., TCP Segmentation Offload [10] and Generic Receive Offload [2]). More recent technologies such as Single Root I/O Virtualization [17] enable VMs to bypass the host's hypervisor and interact directly with the NIC. This trend is likely to continue, given the impact of network congestion on data centers' revenue and their rapid adoption

HotNets-XV, November 29–December 1, 2017, Palo Alto, CA, USA  
© 2017 Association for Computing Machinery.  
ACM ISBN 978-1-4503-5569-8/17/11...\$15.00  
<https://doi.org/10.1145/32343.3152457>



[NSDI'18]

[HotNets'17]

NetFPGA SUME board

# Host-based programmability + SmartNICs + programmable switches = fully programmable platforms

The big question is

How do we best combine them?

The screenshot shows a PDF document titled "beyond\_smart\_nics.pdf (page 1 of 6)". The title is "Beyond SmartNICs: Towards a Fully Programmable Cloud" with a subtitle "(Invited Paper)". Three authors are listed: Adrian Caulfield (Microsoft Research, acauliffe@microsoft.com), Paolo Costa (Microsoft Research, pcosta@microsoft.com), and Monia Ghobadi (Microsoft Research, mgh@microsoft.com). The abstract discusses the integration of FPGA-based SmartNICs and programmable switches to enable a fully programmable cloud. It highlights the benefits of hardware acceleration and custom pipelines, and argues for a shift towards a fully programmable cloud where hardware and software co-design across all layers. The paper focuses on the potential of FPGA-based SmartNICs and programmable switches to realize this vision. The introduction section discusses the continuous growth of cloud applications and the challenges of modern clouds. It also mentions traditional offloading techniques like SR-IOV and PCIe Process Address Space ID (PASID). The document is 6 pages long.

Data plane  
programmability

for

Performance  
Monitoring  
Applications offloading

Platforms  
**Correctness**  
Management

for  
Data plane  
programmability

# So you have a programmable network...

# How do you make sure that it works as it should?!



[SIGCOMM'18]

[SIGCOMM'18]

[CoNEXT'18]

# Programmable routers...

(specifically, programmable data planes)

**...how do they work?**



Arista 7170 series switches

Source: p4v, Practical Verification for Programmable Data Planes, Liu et al., 2018

## Let's verify!



Bit-level description  
of data-plane behaviour

Give programmers language-based  
verification tools

P4 also used as HDL for fixed-function  
devices

## P4 by example

- P4 is a low-level language → many gotchas
- Let's explore by example!
  - IPv6 router w/ access control list (ACL)

```
control ingress { apply(acl); }

table acl {
    reads { ipv6.dstAddr: lpm; }
    actions { allow; deny; }
}

action allow() {
    modify_field(std_meta.egress_spec, 1);
}

action deny() { drop(); }
```

**What could *possibly* go wrong?**

**What if we didn't receive an IPv6 packet?**

ipv6 header will be **invalid**

### **What goes wrong**

Table reads arbitrary values

→ Intended ACL policy violated

Can read values from a previous packet

→ Side channel vulnerability!

Real programs are complicated:  
hard to keep validity in your head

## **Property #1: header validity**
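
To make the failure mode concrete, here is a minimal Python sketch (not P4) of a pipeline whose header storage is reused between packets. The `Pipeline` class, the example prefixes, and the decision to deny non-IPv6 traffic in the guarded variant are all assumptions made for illustration; real targets may behave differently.

```python
from dataclasses import dataclass
from ipaddress import IPv6Address, IPv6Network

ETHERTYPE_IPV4 = 0x0800
ETHERTYPE_IPV6 = 0x86DD


@dataclass
class Pipeline:
    """Toy switch model (hypothetical): header storage survives across packets."""
    acl: dict                 # IPv6Network -> "allow" | "deny"
    ipv6_valid: bool = False
    ipv6_dst: int = 0         # raw storage, NOT cleared between packets

    def parse(self, ethertype, dst=None):
        self.ipv6_valid = ethertype == ETHERTYPE_IPV6
        if self.ipv6_valid:
            self.ipv6_dst = int(IPv6Address(dst))
        # otherwise ipv6_dst keeps whatever the previous packet left behind

    def lookup(self):
        """Longest-prefix match on ipv6_dst, ignoring header validity."""
        addr = IPv6Address(self.ipv6_dst)
        hits = [(net.prefixlen, action)
                for net, action in self.acl.items() if addr in net]
        return max(hits)[1] if hits else "miss"

    def ingress_unguarded(self):
        return self.lookup()                       # like apply(acl) on its own

    def ingress_guarded(self):
        # Assumed policy for this sketch: non-IPv6 packets are denied.
        return self.lookup() if self.ipv6_valid else "deny"


acl = {IPv6Network("2001:db8::/32"): "allow", IPv6Network("::/0"): "deny"}
sw = Pipeline(acl)

sw.parse(ETHERTYPE_IPV6, "2001:db8::1")
first = sw.ingress_unguarded()    # "allow": a legitimate IPv6 packet

sw.parse(ETHERTYPE_IPV4)          # the ipv6 header is now invalid
leaked = sw.ingress_unguarded()   # "allow": decided by the previous packet's address
safe = sw.ingress_guarded()       # "deny": the table is never consulted
```

The guarded variant plays the role of a validity check placed in front of the table (in P4_14 terms, something like testing `valid(ipv6)` before `apply(acl)`).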

## **What if acl table misses (no rule matches)?**

Forwarding decision is unspecified

### **What goes wrong**

Forwarding behaviour depends on hardware

- May not do what you expect!
- Code not portable

## **Property #2: unambiguous forwarding**
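
The same issue can be sketched in Python. Below, `apply_acl` and `resolve` are hypothetical helpers, and the two fallback ports stand in for two imaginary targets that treat an unset `egress_spec` differently. The drop port 511 is borrowed from the GCL snippet shown later in these slides.

```python
from ipaddress import IPv6Address, IPv6Network

DROP_PORT = 511
ACTIONS = {"allow": 1, "deny": DROP_PORT}   # action name -> egress_spec


def apply_acl(rules, dst, default_action=None):
    """Return the egress_spec chosen by the table, or None if nothing chose one."""
    addr = IPv6Address(dst)
    hits = [(net.prefixlen, action)
            for net, action in rules.items() if addr in net]
    action = max(hits)[1] if hits else default_action
    return ACTIONS.get(action)


def resolve(egress_spec, target_fallback):
    """What a (hypothetical) target does when egress_spec was never written."""
    return target_fallback if egress_spec is None else egress_spec


rules = {IPv6Network("2001:db8::/32"): "allow"}

miss = apply_acl(rules, "2001:dead::1")       # no rule matches: None
on_target_a = resolve(miss, 0)                # imaginary target A sends to port 0
on_target_b = resolve(miss, DROP_PORT)        # imaginary target B drops

fixed = apply_acl(rules, "2001:dead::1", default_action="deny")   # always defined
```

Pinning a default action removes the dependence on the target: the miss case now produces the same forwarding decision everywhere.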

# Types of properties

## General safety

- **Header validity**
- Arithmetic-overflow checking
- Index bounds checking (header stacks, registers, meters, ...)

## Architectural

- **Unambiguous forwarding**
- **Reparseability**
- **Mutual exclusion of headers**
- Correct metadata usage (e.g., read-only metadata)

## Program-specific

- Custom assertions in P4 program — e.g., IPv4 ttl correctly decremented
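
As an illustration of a program-specific property, the Python sketch below checks "the TTL of every forwarded packet is decremented by exactly one" over all 256 values of an 8-bit field. It uses exhaustive enumeration where a verifier would use a solver. Both `forward_*` functions are hypothetical, and dropping packets whose TTL is 0 or 1 is an assumed policy.

```python
def forward_naive(ttl):
    """Decrement in 8-bit arithmetic, with no check for expiry."""
    return {"dropped": False, "ttl": (ttl - 1) & 0xFF}


def forward_checked(ttl):
    """Drop packets whose TTL would expire (assumed policy), else decrement."""
    if ttl <= 1:
        return {"dropped": True, "ttl": ttl}
    return {"dropped": False, "ttl": ttl - 1}


def violations(forward):
    """All input TTLs for which a forwarded packet is not decremented by one."""
    bad = []
    for ttl in range(256):
        out = forward(ttl)
        if not out["dropped"] and out["ttl"] != ttl - 1:
            bad.append(ttl)
    return bad


naive_bad = violations(forward_naive)      # [0]: TTL 0 wraps around to 255
checked_bad = violations(forward_checked)  # []: the assertion holds on every input
```

The single counterexample is also an instance of the arithmetic-overflow class listed under general safety.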

## Challenge #1: imprecise semantics



- P4 language spec doesn't give precise semantics
- Defined semantics by translation to GCL (a simple imperative language)
- Tested semantics
  - Symbolically executed GCL to generate input-output tests for several programs
  - Ran w/ Barefoot P4 compiler & Tofino simulator

## Challenge #2: modelling the control plane

- A P4 program is just half the program
  - Table rules are not statically known
  - Populated by the control plane at run time
- Control planes are carefully programmed
  - Tables rarely take arbitrary actions
- To rule out false positives, need to model behaviour of control plane



```
table acl {  
    reads {  
        ipv6.dstAddr: lpm;  
    }  
    actions { allow; deny; }  
}
```

```
( @[ Action ] acl <hit> (allow);  
    std_meta.egress_spec := 1)  
[] ( @[ Action ] acl <hit> (deny);  
    std_meta.egress_spec := 511)  
[] @[ Action ] acl <miss>
```

Tables translated into *unconstrained nondeterministic choice*
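
A small Python sketch of why control-plane assumptions matter: the table is modelled as a three-way nondeterministic choice, and the property checked is that `egress_spec` has been assigned. The assumption "the table never misses" (for example because a catch-all rule is always installed) is hypothetical and only serves to show how a constraint prunes an infeasible arm.

```python
BRANCHES = ("allow", "deny", "miss")   # the three arms of the choice


def run_acl(branch):
    """Execute one arm of the choice and return the resulting egress_spec."""
    egress_spec = None                 # nothing has chosen an output port yet
    if branch == "allow":
        egress_spec = 1
    elif branch == "deny":
        egress_spec = 511
    return egress_spec


def counterexamples(control_plane_allows=lambda branch: True):
    """Arms that are feasible under the assumption yet leave egress_spec unset."""
    return [b for b in BRANCHES
            if control_plane_allows(b) and run_acl(b) is None]


unconstrained = counterexamples()                              # ["miss"]
constrained = counterexamples(lambda branch: branch != "miss") # []
```

Without the assumption the checker reports the miss arm. If the real control plane can never produce a miss, that report is a false positive, and stating the assumption removes it.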

# p4v overview

- **Automated** tool for verifying P4 programs
- Considers **all paths**
  - But also practical for **large programs**
- Includes basic safety properties for any program
- **Extensible** framework
  - Verify custom, program-specific properties
  - Assert-style debugging



# p4v architecture



1. Start w/ control-plane interface (CPI) & P4 program
2. Translate to GCL
3. Auto-annotate w/ assertions
4. Standard optimizations
5. Generate formula
6. Send to Z3
7. Success or counterexample
  - Input packet
  - Program trace
  - Violated assertion
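
The steps above can be mimicked at toy scale in Python. This sketch is not p4v: it replaces formula generation and Z3 with brute-force enumeration of paths and input packets, and the tuple encoding of GCL commands (`seq`, `choice`, `assume`, `assert`, `assign`) is an assumption made for illustration. It does return the same three pieces of a counterexample: input packet, trace, and violated assertion.

```python
from itertools import product


def paths(cmd):
    """Yield every path through a GCL-like command as a list of atomic steps."""
    kind = cmd[0]
    if kind == "seq":
        for combo in product(*(list(paths(c)) for c in cmd[1:])):
            yield [step for part in combo for step in part]
    elif kind == "choice":
        for branch in cmd[1:]:
            yield from paths(branch)
    else:                                   # atomic step: (kind, label, fn)
        yield [cmd]


def check(program, packets):
    """Return the first counterexample found, or None if every path is safe."""
    for packet in packets:
        for path in paths(program):
            state, trace = dict(packet), []
            for kind, label, fn in path:
                if kind == "assume" and not fn(state):
                    break                   # path infeasible for this packet
                trace.append(label)
                if kind == "assign":
                    state.update(fn(state))
                elif kind == "assert" and not fn(state):
                    return {"packet": packet, "trace": trace, "violated": label}
    return None


def valid(state):
    return state["ipv6_valid"]


# Annotation step: acl reads ipv6.dstAddr, so an assertion is placed in front of it.
acl = ("seq",
       ("assert", "ipv6 is valid when acl reads ipv6.dstAddr", valid),
       ("choice",
        ("assign", "acl hit: allow", lambda s: {"egress_spec": 1}),
        ("assign", "acl hit: deny", lambda s: {"egress_spec": 511}),
        ("assign", "acl miss", lambda s: {})))

# The same table behind a validity guard, written as two guarded arms.
guarded = ("choice",
           ("seq", ("assume", "ipv6 valid", valid), acl),
           ("assume", "ipv6 not valid", lambda s: not valid(s)))

packets = [{"ipv6_valid": True}, {"ipv6_valid": False}]
bug = check(acl, packets)       # counterexample: the packet without an IPv6 header
ok = check(guarded, packets)    # None: no path violates the assertion
```

Enumeration only works here because the state space is tiny. Handing a formula to a solver is what lets the real tool consider all paths of large programs.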

Data plane  
programmability

for

**Performance  
Monitoring**  
**Applications offloading**

**Platforms**  
**Correctness**  
**Management**

for  
Data plane  
programmability

So you have a *verified* programmable network...

How do you manage it?!

How do you perform planned maintenance?  
now that you have state in your switches...

How do you run multiple applications in your switches?  
monitoring, forwarding, load-balancing, etc.

How do you share resources amongst applications?  
especially memory and the number of packet operations

# We need an Operating System for the data plane

Definition  
Wikipedia

An operating system is system software that manages computer hardware and software resources and provides common services for computer programs.

Do we have that? **Nope.** Not yet at least.

# We're working on it...

[SOSR'17]

**Swing State: Consistent Updates for Stateful and Programmable Data Planes**

Shouxi Luo\* Hongfang Yu  
University of Electronic Science and Technology of China

Laurent Vanbever  
ETH Zürich

**ABSTRACT**  
With the rise of stateful programmable data planes, a lot of the network functions that used to be implemented in the controller or at the end-hosts are now moving to the data plane to benefit from line-rate processing. Unfortunately, stateful data planes also mean more complex network updates as not only flows, but also the associated states, must now be migrated consistently to guarantee correct network behaviors. The main challenge is that data-plane states are maintained at line rate, according to possibly runtime criteria, rendering controller-driven migration impossible.

We present **Swing State**, a general state-management framework and runtime system supporting consistent state migration in stateful data planes. The key insight behind **Swing State** is to perform state migration entirely within the data plane by piggybacking state updates on live traffic. To minimize the overhead, **Swing State** only migrates the states that cannot be safely reconstructed at the destination switch.

We implemented a prototype of **Swing State** for P4. Given a P4 program, **Swing State** performs static analysis to compute which states require consistent migration and automatically augments the program to enable the transfer of these states at runtime. Our preliminary results indicate that **Swing State** is practical in migrating data-plane states at line rate with small overhead.

**Keywords**  
Network updates; Software-Defined Networking; P4; Stateful programmable data planes.

**1. INTRODUCTION**  
By enabling stateful applications to run directly *in* the data plane, at line rate, programmable data planes [9, 22, 8, 28, 27, 16, 23] have recently emerged as a promising research area.

Yet, despite making SDNs more powerful, maintaining states in the data plane also calls for new consistent update mechanisms as it prevents traditional update techniques from working, and this, for three main reasons. First, the fact that data-plane states can be updated at line rate—at speeds that can reach Tbps [5]—prevents any software-based controller from consistently moving states from one device to another. Inconsistent migration is a problem for any data-plane application that requires strong-consistency network-wide. Examples of such applications include stateful firewalls tracking dynamic flow characteristics (e.g., low-level TCP states [29]) or anomaly detection applications [21]. Second, even ignoring states dynamism, the exact set of states to be migrated may actually be unknown to the controller, preventing it from performing the migration in the first place. Indeed, the states location in memory can differ from device to device according to runtime factors (e.g. a hash computed on packet headers) that are invisible to the controller. Third, data-plane states

Swing State is a state management framework with 1 primitive: **moveStates**



Source: Swing State: Consistent Updates for Stateful and Programmable Data Planes  
Luo et al., SOSR 2017
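
The idea of piggybacking state on live traffic can be sketched as follows. This is a loose Python illustration, not the Swing State implementation: `CounterSwitch`, `install`, and `move_states` are hypothetical names, and the per-flow packet counter stands in for arbitrary data-plane state.

```python
class CounterSwitch:
    """Hypothetical switch keeping one packet counter per flow."""

    def __init__(self):
        self.count = {}

    def process(self, flow):
        """Update the flow's state for one packet and return the new value."""
        self.count[flow] = self.count.get(flow, 0) + 1
        return self.count[flow]

    def install(self, flow, value):
        """Adopt a state value carried by an incoming packet."""
        self.count[flow] = value


def move_states(src, dst, live_traffic):
    """During migration, src keeps updating state and tags each packet with it."""
    for flow in live_traffic:
        value = src.process(flow)    # src still owns the state
        dst.install(flow, value)     # the packet carries a copy to dst


old, new = CounterSwitch(), CounterSwitch()
for flow in ["a", "a", "b"]:         # traffic seen before the migration starts
    old.process(flow)

move_states(old, new, ["a", "b", "a"])
migrated = dict(new.count)           # {"a": 4, "b": 2}
after = new.process("a")             # 5: counting continues on the new switch
```

No controller copies the counters in this sketch: the state follows the packets. A flow that sends nothing during the migration is never transferred here, which is one of the cases a real system has to handle.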

# Advanced Topics in Communication Networks



~7 weeks  
how to program in P4

>= 7 weeks  
in teams of 2–3

The group project starts this week

It accounts for 50% of your final grade

The evaluation of your project will depend on  
your implementation, report, and presentation

implementation

70%

achieves the basic goals  
is properly documented  
runs...

report

15%, 10 pages max

describes the main building blocks  
evaluates the solution  
describes what each group member did

presentation

15%, 12 min. + questions

summarizes the problem and the solution  
contains a *live demo*  
involves all group members

The final deadline for the project is

**Wed Dec 19 at 11.59pm**

This week

Select a proposal from the list ([see Doodle](#))  
or send us your own proposal by email

*Every* week

Meet with the responsible assistant  
schedule a recurring slot in [10.15am; noon]

**Wed Dec 19  
11.59pm**

Send us an archive with report, code, slides

**Thu Dec 20  
from 8.15am**

Group presentations + course/exam debrief  
**attendance is mandatory**

The project has to be done in groups of 3 students  
"Matching" process for incomplete groups via Slack

The project grade is shared by all group members  
provided that each contributed (roughly equally)

- Let us know in advance if that's *not* the case
- Briefly describe in the report the contribution of each group member
- Each group member should be involved in the presentation and be able to answer questions

# Details about each proposal are available on our website

## Advanced Topics in Communication Networks **Project Proposals**

### Proposal #1: Hardware-Based RSVP

Responsible: Albert Gran Alcoz

Resource Reservation Protocol (RSVP) [1] is a signaling protocol that lets connections in a network request bandwidth along a given path. It has been used in both traffic engineering and quality of service. Integrated Services (IntServ) was the first to adopt it, in the late 1990s, as a means of providing guaranteed quality of service in multimedia networks. Some years later, and with greater success, RSVP was extended for traffic engineering in the RSVP-TE protocol [2], which is used to establish virtual circuits in MPLS. With RSVP, users reserve bandwidth before they start transmitting data. To do so, probe packets are forwarded from source to destination, letting the routers in between identify the amount of resources requested by the new connection. Routers receive those requests and reply by annotating their resource availability in the same packet. A flow is admitted only if all routers along the path agree that they have enough resources to host the new request. Although RSVP achieves notable and robust performance and can provide 100% resource guarantees, its high cost in terms of scalability and complexity has so far kept it from succeeding in many scenarios. The main drawbacks are the time required to set up a new connection (too high, especially for real-time flows), the amount of state stored in each switch along the path (to keep track of reservations), and the periodic overhead needed to refresh reservation requests.
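
The admission rule described above (a flow is admitted only if every router on the path has enough free bandwidth) can be sketched in a few lines of Python. The router dictionaries, the Mbps figures, and the `probe`/`reserve` helpers are hypothetical. Soft-state refreshes and teardown, which real RSVP needs, are left out.

```python
def probe(path):
    """Collect the free bandwidth of each hop, as a probe packet would."""
    return [router["capacity"] - router["reserved"] for router in path]


def reserve(path, request_mbps):
    """Admit the flow only if every hop can host it, then reserve on all hops."""
    if min(probe(path)) < request_mbps:
        return False                      # at least one hop lacks resources
    for router in path:
        router["reserved"] += request_mbps
    return True


path = [
    {"capacity": 100, "reserved": 0},
    {"capacity": 40, "reserved": 0},      # bottleneck hop
    {"capacity": 100, "reserved": 0},
]

first = reserve(path, 30)     # True: every hop has at least 30 Mbps free
second = reserve(path, 30)    # False: the bottleneck has only 10 Mbps left
```

A rejected request leaves every hop untouched, so no partial reservation has to be rolled back.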

In this project, we propose the design and implementation of an evolved version of RSVP, based on P4, to be run directly on hardware. We strongly believe that a signaling protocol executed at line rate in the data-plane can be quicker in deploying configurations and faster in reacting to updates.

Students are expected to come up with a data-plane implementation, aiming to overcome RSVP original

Register your proposal (one per group)  
from Friday 3pm until Sunday 11.59pm

## Adv-Net Group Projects

by Roland Meier

Please register as teams of 3 people (write the names of all team members). If a team has only 2 members, we might add another person. Reservations with only 1 name or with more than 3 names will be removed.

| | Proposal #1: Hardware-Based RSVP | Proposal #2: Data-plane driven network convergence | Proposal #3: Intra-domain routing in the data-plane | Proposal #4: Delay-based routing entirely in the data-plane | Proposal #5: Advanced stateful firewall |
|---|---|---|---|---|---|
| 2 participants | ✓ 1/1 | ✓ 1/1 | ✓ 0/1 | ✓ 0/1 | ✓ 0/1 |

If you want to propose your own project,  
send me an email describing it by **Friday (Nov 2) 3pm**

lvanbever@ethz.ch

# Quick overview of the proposals



Albert  
Thomas  
Roland  
Alexander  
Edgar

# Proposal #1

## Hardware-Based RSVP

Bandwidth reservations throughout a given path:

- Quality of Service guarantees (IntServ)
- Establishment of virtual circuits (MPLS)

Implemented exclusively in the data plane:

- Custom headers
- Header stacks
- Registers
- Bloom filters



Faster and more scalable than traditional RSVP

# Proposal #2

## Data-plane Driven Network Convergence



# Proposal #3

## Delay-based Routing Entirely in the Data-Plane



# Proposal #4

## Advanced stateful firewall



- ✓ Fine-grained access policies
- ✓ Deep packet inspection (DPI)
- ✓ VPN
- ✓ Attack detection
- ✓ Spoofing detection
- ✓ (add your idea here)

# Proposal #5

## I know what you're seeing now



# Proposal #6

## Playing snake in the data plane



# Proposal #7

In **Active Networks**, packets carry **programs**.



The programs **are executed**  
**on each switch** along the path

# Proposal #8

## Storing data in the cloud ~~the right way!~~



Store data in a  
**forwarding loop**

# Proposal #9

## Data Plane Failure Detection



Detect **local** and **remote** link failures (A-C)

Detect **random** packet drops (B-C)

Detect **corrupted** table entries (E)

| prefix      | port          |
|-------------|---------------|
| 10.1.1.0/24 | 01            |
| 10.1.2.0/24 | ~~10~~        |

# Proposal #10

## Stateful Application Migration



# Proposal #11

## P4 Switch

### Management and Configuration API

Control Plane

#### Basic Features

L2 forwarding, learning, multicast

IPv4, IPv6, L3 multicast

ECMP, Weighted ECMP

ICMP

ARP

ECN

Simple QoS

#### Advanced Features

Spanning Tree Protocol

NetFlow, sFlow or similar

VXLAN, MPLS, GRE

DHCP Server

DNS Cache

Simple Firewall

NAT

Data Plane
