This repository has been archived by the owner on Nov 16, 2023. It is now read-only.

microsoft/Validate-DCB


⭐ More by the Microsoft Core Networking team

Find more from the Core Networking team using the MSFTNet topic

What's New in v2.2

For more information, please see What's New

Getting Started
  1. Learn the Tool
  2. Customize your Config
  3. Initiate Testing

Description

Validate-DCB v2.1 is a PowerShell-based unit test tool that allows you to:

   :heavy_check_mark: Validate the expected configuration on one to N systems or clusters

   :heavy_check_mark: Validate the configuration meets best practices

Additional benefits include:

   :heavy_check_mark: The configuration doubles as DCB documentation for the expected configuration of your systems.

   :heavy_check_mark: Answer "What Changed?" when faced with an operational issue (see Test Results)

   :heavy_check_mark: [New with version 2] Deploy the configuration to nodes

ℹ️ Note: This tool does not modify your system unless you specify the -Deploy parameter. As such, you can re-validate the configuration as many times as desired.

Overview

RDMA over Converged Ethernet (RoCE) requires Data Center Bridging (DCB) technologies to make a network fabric lossless. The configuration requirements are complex and error-prone, requiring exact configuration and adherence to best practices across:

   ➡️ Each Windows Node

   ➡️ Each network port RDMA traffic passes through on the fabric

This tool aims to validate the DCB configuration on the Windows nodes by taking an expected configuration as input and unit testing each Windows system against it.

Important: The validation of the network fabric is out-of-scope for this tool.

Here's a quick introductory video from Microsoft Premier Field Engineer, Jan Mortensen.

Scenarios

Validate-DCB will provide configuration validation for one or more nodes or clusters across a variety of scenarios including:

   ➡️ Native RDMA Adapters (Mode 1)

   ➡️ Host vNIC RDMA (Mode 2) with vNICs in the parent partition

   ➡️ Combination scenarios with both Native RDMA and Host Virtual NICs

   ➡️ Multiple virtual switches with RDMA enabled adapters

⚠️ For step-by-step configuration instructions, please see the Converged NIC Guide. Alternatively, you can use the deployment options in version 2

Test Overview

Test Types

Currently all tests in Validate-DCB are unit tests. That is, they break down and check individual configuration items one by one, rather than a holistic or functional test. In the future, we may incorporate integration/acceptance testing.

Tests are broken down into two types:

   ➡️ Global - Tests the TestHost, each SUT, and the configuration file for prerequisites. If anything fails here, Validate-DCB will not move on to the actual DCB tests.

   ➡️ Modal - Tests each SUT for RDMA and configuration best practices

For more information, please see Test Details

Test Results

Testing with Azure DevOps and a CI/CD pipeline

Besides the on-screen feedback provided by the tool, results of the tests are stored in NUnitXML format in the \Results folder. These results can be stored for historical reference and take part in a CI/CD pipeline as shown in Building a Continuous Integration and Continuous Deployment pipeline with DSC.
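
As a rough sketch of consuming those results outside a pipeline, the function below lists failing test cases from an NUnit-style XML document. The element and attribute names (test-case, result, failure/message) follow the NUnit 2.5 layout that Pester emits. The function name Get-FailedTestCase and the sample XML are illustrative assumptions, not part of Validate-DCB or output from it.

```powershell
# Hypothetical helper: returns one object per failing test case.
function Get-FailedTestCase {
    param(
        [Parameter(Mandatory)]
        [xml] $Results
    )

    Select-Xml -Xml $Results -XPath '//test-case[@result="Failure"]' |
        ForEach-Object {
            $message = $_.Node.SelectSingleNode('failure/message')
            [pscustomobject]@{
                Name    = $_.Node.GetAttribute('name')
                Message = if ($null -ne $message) { $message.InnerText } else { $null }
            }
        }
}

# Simplified, made-up sample in the NUnit 2.5 shape (real files carry more attributes).
[xml] $sample = @'
<test-results>
  <test-suite name="Modal Unit">
    <results>
      <test-case name="Interface status must be Up" result="Success" />
      <test-case name="Binding must be enabled" result="Failure">
        <failure><message>Expected $true, but got $false.</message></failure>
      </test-case>
    </results>
  </test-suite>
</test-results>
'@

Get-FailedTestCase -Results $sample

# For a real run, load a file from the Results folder instead, for example:
# Get-FailedTestCase -Results ([xml](Get-Content -Path '.\Results\<file>.xml' -Raw))
```

Comparing the failing names from two runs is one simple way to approach the "What Changed?" question mentioned earlier.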

Simple report using Power BI

You can also use Power BI to make displaying results easy. For more information, please see Using the Results or see this video from Microsoft Premier Field Engineer, Jan Mortensen.


Interpreting Test Results

Validate-DCB may not work with other languages. In this case, use the tests as guidance on how to verify your configuration.

How Test Output is Constructed

Tests are constructed hierarchically. Describe blocks contain one or more Context blocks, and Context blocks contain one or more tests. These terms come from Pester, a PowerShell-based unit testing framework included in-box with Windows 10, Windows Server 2016, and Windows Server 2019; a full explanation of Pester is outside the scope of this documentation.

While we have future plans to include more sections, currently the only two possible describe blocks are:

   ➡️ [Global Unit] tests requirements or prerequisites to run the modal tests

   ➡️ [Modal Unit] tests a node's configuration or best practices

A context block is a group of one or more tests. For example, Validate-DCB may test a physical NetAdapter's Advanced Properties including the VLANID or NetworkDirect (RDMA in driver terms) settings. These would be grouped in the same context.

Describe or Context Titles

Each Describe, Context, and Test includes a title enclosed in square brackets [ ]. The information inside these square brackets is intended to guide you to the details needed to either resolve a failing test or understand what just passed. Let's use this as an example:



➡️ Describing [Modal Unit] contains unit tests for the RDMA modes of operation (NDK mode 1 or 2)

➡️ Context can be broken down as follows:

   ↪️ [Modal Unit] – The describe block this Context is within

   ↪️ [VMSwitch.RDMAEnabledAdapters] – The section of the config file currently being tested.

   ↪️ [SUT: TK5-3WP07R0511] – The hostname of the current System Under Test

In this example, the current context is used for testing an adapter that is expected to be enabled for RDMA and connected to a VMSwitch.

This adapter exists below the VMSwitch section of the configuration file.

Note: During runtime, a variable named $ConfigData contains the information from the config file. With a debugger attached, you can walk the variable like this:

   [DBG]: PS C:\> $ConfigData.AllNodes.VMSwitch.RDMAEnabledAdapters
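
To make that walk concrete, here is a minimal stand-in for $ConfigData. Only the AllNodes, VMSwitch, and RDMAEnabledAdapters names come from the path above. The NodeName and Name keys are assumptions for illustration, with values borrowed from the examples in this document. A real configuration file carries many more settings (see Customize your Config).

```powershell
# Illustrative stand-in only; not a complete or authoritative configuration file.
$ConfigData = @{
    AllNodes = @(
        @{
            NodeName = 'TK5-3WP07R0511'               # assumed key name
            VMSwitch = @(
                @{
                    Name                = 'VMSTest'   # assumed key name
                    RDMAEnabledAdapters = @(
                        @{ Name = 'RoCE-01' },
                        @{ Name = 'RoCE-02' }
                    )
                }
            )
        }
    )
}

# Member enumeration walks every node and every switch and returns the adapter entries.
$ConfigData.AllNodes.VMSwitch.RDMAEnabledAdapters | ForEach-Object { $_.Name }
```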

Passing Tests

If your system passes a test you will see green text similar to this:

+ [SUT: TK5-3WP07R0511]-[VMSwitch: VMSTest]-[RDMAEnabledAdapter: RoCE-01]-[Noun: NetAdapter] Interface status must be "Up"

Using the above image as an example, you can interpret this passing test as:

▶️ The SUT named TK5-3WP07R0511

   ↪️ is expecting the RDMAEnabledAdapter named RoCE-01

    ↪️ intended to back the VMSwitch named VMSTest

     ✔️ to have an interface operational status of "Up"

You can verify this using the PowerShell noun identified in the test (in the example, this is NetAdapter).
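
For instance, the check could be reproduced by hand roughly as follows. This is a sketch that assumes the adapter name from the example title and that it is run directly on the SUT; it is not how Validate-DCB itself performs the test.

```powershell
# Adapter name taken from the example test title above.
$adapter = 'RoCE-01'

# Guarded so the snippet does nothing on machines without the NetAdapter cmdlets.
if (Get-Command -Name Get-NetAdapter -ErrorAction SilentlyContinue) {
    # Status should read "Up" for the test to pass.
    # To query the SUT from the TestHost instead, add: -CimSession 'TK5-3WP07R0511'
    Get-NetAdapter -Name $adapter -ErrorAction SilentlyContinue |
        Select-Object -Property Name, InterfaceDescription, Status
}
```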

    

Failing Tests

If your system is incorrectly configured, the test will provide an error message on-screen.

Unlike most PowerShell scripts, red error messages do not indicate an exception or failing code. Rather, this (typically) indicates a failing test. In other words, it highlights something you need to fix.

Failing tests give information to identify the misconfiguration. In the failing test shown below (red output), the RDMAEnabledAdapter named RoCE-02 on SUT named TK5-3WP07R0511 was expected to be attached to the VMSwitch named VMSTest.

As you can see above, the Enabled property corresponding to the:

  

By running Get-NetAdapterBinding on the SUT you can see this for yourself.
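
A sketch of that check, assuming the adapter name from the example and that it is run on the SUT. The component ID vms_pp identifies the Hyper-V Extensible Virtual Switch binding; its Enabled property should be True once the adapter is attached to a VMSwitch.

```powershell
# Adapter name taken from the failing-test example above.
$adapter = 'RoCE-02'

# Guarded so the snippet does nothing on machines without the NetAdapter cmdlets.
if (Get-Command -Name Get-NetAdapterBinding -ErrorAction SilentlyContinue) {
    Get-NetAdapterBinding -Name $adapter -ComponentID 'vms_pp' -ErrorAction SilentlyContinue |
        Select-Object -Property Name, DisplayName, ComponentID, Enabled
}
```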

Here's another video from Microsoft Premier Field Engineer, Jan Mortensen, who reviews and validates errors found with Validate-DCB


Reviewing the Tests

You may also find it useful to review the code generating the failing test. To do this, navigate in the folder structure to the file and line specified in the test failure, for example:

This message identifies the file and line number of the failing test.

  

Now navigate to the file and review the code.

  

If you’re still stuck and want to review the variables during runtime, you can set a breakpoint on the line above the one specified in the test failure (the test failed at line 490, so the breakpoint is set at line 489, as shown here):
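
From a console, the same breakpoint could be set with Set-PSBreakpoint. The file path below is a placeholder, since the actual file is the one named in your test failure message; the line numbers follow the example above.

```powershell
# Hypothetical path; substitute the file named in your test failure message.
$testFile   = 'C:\path\to\failing.unit.tests.ps1'
$failedLine = 490
$breakLine  = $failedLine - 1   # break one line before the failing assertion

if (Test-Path -Path $testFile) {
    Set-PSBreakpoint -Script $testFile -Line $breakLine
}

# Re-run Validate-DCB in the same session; execution should pause at the breakpoint,
# where variables such as $ConfigData can be inspected.
# Remove breakpoints afterwards with: Get-PSBreakpoint | Remove-PSBreakpoint
```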

  

⚠️ If searching for a test in the code, please be aware that parentheses typically indicate variables that are being expanded. All other test descriptions should be searchable.

For example, in this test description the exact driver version is specific to a particular NIC manufacturer (in this case 1.90.19240.0) and therefore, you cannot search for this in the test as it’s an expanded variable.

Resolving Test Failures

To complete our example above, we need to resolve the configuration issue. To do this, we'll attach the adapter(s) to the VMSwitch so the binding is now enabled.

Getting Started

Installation

Validate-DCB is now published in the PowerShell Gallery. Please use Install-Module Validate-DCB from a system with internet connectivity.

For disconnected systems, use Save-Module -Name Validate-DCB -Path c:\temp\Validate-DCB then move the modules in c:\temp\Validate-DCB to your disconnected system. Here's a video from Microsoft Premier Field Engineer, Jan Mortensen.


Requirements

  • TestHost: Windows 10, Windows Server 2016, or Windows Server 2019. The TestHost can also be a SUT if it is the appropriate OS.

  • System Under Test (SUT): Windows Server 2016 or Windows Server 2019

  • Configuration File: This is a file that defines the expected configuration on the SUTs.

Configuration File

Regardless of the scenario, you need a configuration file to define the expected configuration on your systems. Validate-DCB then checks that each system matches the expected configuration. With Validate-DCB v2.1 we recommend using the user interface to create the configuration for you. To do this, run Validate-DCB without parameters. For more information on customizing your own file, please see: Customize your Config

Running Validate-DCB

To begin testing, complete the wizard mentioned in the previous section or run Validate-DCB -ConfigFilePath <Path to your configuration file>.ps1 if you have an existing configuration file you wish to use.

Additionally, you can connect Validate-DCB with your Azure Automation account to first deploy the configuration (then validate).

ℹ️ Note: For full parameter help use: Get-Help Validate-DCB

Here are a few tips on the parameters.

| Parameter | Description |
| --- | --- |
| TestScope | Determines the describe block to be run. You can use this to only run certain describe blocks. For example, use Global if you just want to set up a test host or validate that your systems are ready to be tested; use Modal if you already know all the prerequisites are met. |
| LaunchUI | Use this parameter to launch a user interface that helps create a configuration file. |
| ExampleConfig | Use this to select one of the pre-defined configuration files that will test a system in Mode 1 or Mode 2. For more information on the example configurations, please see Examples. For details about the configuration for these modes, please review the Converged NIC Guide. |
| ConfigFilePath | Use this parameter to specify the path to a custom configuration file. |
| ContinueOnFailure | If a test fails in one of the Describe blocks, Validate-DCB exits before moving to the next Describe block, allowing you to correct the issue. Use this to attempt all tests even if a test failure is detected. |
| Deploy | Use this parameter to deploy the configuration to all specified nodes prior to validating the configuration. |
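
Putting the documented parameters together, a run might look like the sketch below. The configuration path is hypothetical, and only ConfigFilePath and TestScope (with the Global value described above) are used; check Get-Help Validate-DCB for the exact parameter types and accepted values.

```powershell
# Hypothetical path to an existing configuration file.
$validateParams = @{
    ConfigFilePath = 'C:\Configs\MyDcbConfig.ps1'
    TestScope      = 'Global'   # prerequisites only; use 'Modal' once these pass
}

# Only invoke the tool when the module is actually installed.
if (Get-Command -Name Validate-DCB -ErrorAction SilentlyContinue) {
    Validate-DCB @validateParams
}
```

Splatting keeps the parameter set in one place, so the same hashtable can be adjusted between a prerequisite check and a full run.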
