
Performance of open-nic-dpdk #2

Open
aneesullah opened this issue Mar 28, 2022 · 11 comments

Comments
@aneesullah

aneesullah commented Mar 28, 2022

Hi,
How can the results reported in "Xilinx Answer 71453 QDMA Performance Report" be reproduced with PKTGEN? I am only getting 10 Gbps on a Threadripper PRO with a U280 card. Do these results apply only to the QDMA example design, or do they also apply to OpenNIC?
Regards,
Anees

@cneely-amd
Collaborator

Hi @aneesullah,

Can you say more about your test setup?

The QDMA Performance Report was characterized separately by a different team and not using OpenNIC; however, performance should be similar since both use the QDMA.

Depending on machine capabilities, the performance using pktgen-dpdk should be something along the lines of the following:

- Ports 0-1 of 2   <Main Page>  Copyright(c) <2010-2021>, Intel Corporation
  Flags:Port        : -------Range      :0 -------Range      :1
PMD: qdma_dev_link_update(): Link update done
Link State          :       <UP-100000-FD>       <UP-100000-FD>     ---Total Rate---
Pkts/s Rx           :            7,679,047            7,666,897           15,345,944
       Tx           :            7,743,488            7,743,488           15,486,976
MBits/s Rx/Tx       :        47,762/48,153        49,338/49,827        97,101/97,981
Pkts/s Rx Max       :           27,320,352            7,774,715           27,320,352
       Tx Max       :           36,211,872            7,833,377           36,211,872
Broadcast           :                    0                    0
Multicast           :                    0                    0
Sizes 64            :          892,988,087          700,471,032
      65-127        :       14,667,088,513        8,375,758,013
      128-255       :       48,810,275,136       22,767,085,962
      256-511       :       59,827,277,291       70,602,037,739
      512-1023      :      117,997,157,741      142,212,997,263
      1024-1518     :      115,162,837,292      112,307,731,359
Runts/Jumbos        :                  0/0                  0/0
ARP/ICMP Pkts       :                  0/0                  0/0
Errors Rx/Tx        :                  0/0                  0/0
Total Rx Pkts       :      357,351,332,184      356,959,800,953
      Tx Pkts       :      359,974,053,503      359,438,896,255
      Rx/Tx MBs     :2,222,514,601/2,238,12,297,667,031/2,313,300,795
Pattern Type        :              abcd...              abcd...
Tx Count/% Rate     :        Forever /100%        Forever /100%
Pkt Size/Tx Burst   :            64 /   32            64 /   32
TTL/Port Src/Dest   :       64/ 1234/ 5678       64/ 1234/ 5678
Pkt Type:VLAN ID    :      IPv4 / TCP:0001      IPv4 / TCP:0001
802.1p CoS/DSCP/IPP :            0/  0/  0            0/  0/  0
VxLAN Flg/Grp/vid   :     0000/    0/    0     0000/    0/    0
IP  Destination     :          192.168.1.1          192.168.0.1
    Source          :       192.168.0.1/24       192.168.1.1/24
MAC Destination     :    15:16:17:18:19:1a    15:16:17:18:19:1a
    Source          :    15:16:17:18:19:1a    15:16:17:18:19:1a
PCI Vendor/Addr     :    10ee:903f/65:00.0    10ee:913f/65:00.1
-- Pktgen 21.03.0 (DPDK 20.11.0)  Powered by DPDK  (pid:32576) ----------------

@aneesullah
Author

aneesullah commented Mar 28, 2022

Hi @cneely-amd,
Thanks a lot for your quick response. Here is the information:
1) Hardware
From lscpu:

Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Byte Order: Little Endian
Address sizes: 43 bits physical, 48 bits virtual
CPU(s): 64
On-line CPU(s) list: 0-63
Thread(s) per core: 2
Core(s) per socket: 32
Socket(s): 1
NUMA node(s): 1
Vendor ID: AuthenticAMD
CPU family: 23
Model: 49
Model name: AMD Ryzen Threadripper PRO 3975WX 32-Cores
Stepping: 0
Frequency boost: enabled
CPU MHz: 1919.520
CPU max MHz: 3500.0000
CPU min MHz: 2200.0000
BogoMIPS: 7000.66
Virtualization: AMD-V
L1d cache: 1 MiB
L1i cache: 1 MiB
L2 cache: 16 MiB
L3 cache: 128 MiB
NUMA node0 CPU(s): 0-63
Vulnerability Itlb multihit: Not affected
Vulnerability L1tf: Not affected
Vulnerability Mds: Not affected
Vulnerability Meltdown: Not affected
Vulnerability Spec store bypass: Mitigation; Speculative Store Bypass disabled via prctl and seccomp
Vulnerability Spectre v1: Mitigation; usercopy/swapgs barriers and _user pointer sanitization
Vulnerability Spectre v2: Mitigation; LFENCE, IBPB conditional, STIBP conditional, RSB filling
Vulnerability Srbds: Not affected
Vulnerability Tsx async abort: Not affected
Flags: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb rdtscp lm constant_tsc rep_good nopl nonstop_tsc cpuid extd_apicid aperfmperf pni pclmulqdq monitor ssse3 fma cx16 sse4_1 sse4_2 movbe popcnt aes xsave avx f16c rdrand lahf_lm cmp_legacy svm extapic cr8_legacy abm sse4a misalignsse 3dnowprefetch osvw ibs skinit wdt tce topoext perfctr_core perfctr_nb bpext perfctr_llc mwaitx cpb cat_l3 cdp_l3 hw_pstate sme ssbd mba sev ibpb stibp vmmcall fsgsbase bmi1 avx2 smep bmi2 cqm rdt_a rdseed adx smap clflushopt clwb sha_ni xsaveopt xsavec xgetbv1 xsaves cqm_llc cqm_occup_llc cqm_mbm_total cqm_mbm_local clzero irperf xsaveerptr wbnoinvd arat npt lbrv svm_lock nrip_save tsc_scale vmcb_clean flushbyasid decodeassists pausefilter pfthreshold avic v_vmsave_vmload vgif umip rdpid overflow_recov succor smca

More details about hardware:

H/W path Device Class Description
system AS -5014A-TT (091715D9)
/0 bus M12SWA-TF
/0/0 memory 64KiB BIOS
/0/12 memory 512GiB System Memory
/0/12/0 memory 64GiB DIMM DDR4 Synchron
/0/12/1 memory 64GiB DIMM DDR4 Synchron
/0/12/2 memory 64GiB DIMM DDR4 Synchron
/0/12/3 memory 64GiB DIMM DDR4 Synchron
/0/12/4 memory 64GiB DIMM DDR4 Synchron
/0/12/5 memory 64GiB DIMM DDR4 Synchron
/0/12/6 memory 64GiB DIMM DDR4 Synchron
/0/12/7 memory 64GiB DIMM DDR4 Synchron
/0/15 memory 2MiB L1 cache
/0/16 memory 16MiB L2 cache
/0/17 memory 128MiB L3 cache
/0/18 processor AMD Ryzen Threadripper P
/0/100 bridge Starship/Matisse Root Co
/0/100/0.2 generic Starship/Matisse IOMMU
/0/100/7.1 bridge Starship/Matisse Interna
/0/100/7.1/0 generic Starship/Matisse PCIe Du
/0/100/8.1 bridge Starship/Matisse Interna
/0/100/8.1/0 generic Starship/Matisse Reserve
/0/100/8.1/0.3 bus Starship USB 3.0 Host Co
/0/100/8.1/0.3/0 usb9 bus xHCI Host Controller
/0/100/8.1/0.3/1 usb10 bus xHCI Host Controller
/0/100/14 bus FCH SMBus Controller
/0/100/14.3 bridge FCH LPC Bridge
/0/101 bridge Starship/Matisse PCIe Du
/0/102 bridge Starship/Matisse PCIe Du
/0/103 bridge Starship/Matisse PCIe Du
/0/104 bridge Starship/Matisse PCIe Du
/0/105 bridge Starship/Matisse PCIe Du
/0/106 bridge Starship/Matisse PCIe Du
/0/107 bridge Starship/Matisse PCIe Du
/0/108 bridge Starship Device 24; Func
/0/109 bridge Starship Device 24; Func
/0/10a bridge Starship Device 24; Func
/0/10b bridge Starship Device 24; Func
/0/10c bridge Starship Device 24; Func
/0/10d bridge Starship Device 24; Func
/0/10e bridge Starship Device 24; Func
/0/10f bridge Starship Device 24; Func
/0/110 bridge Starship/Matisse Root Co
/0/110/0.2 generic Starship/Matisse IOMMU
/0/110/7.1 bridge Starship/Matisse Interna
/0/110/7.1/0 generic Starship/Matisse PCIe Du
/0/110/8.1 bridge Starship/Matisse Interna
/0/110/8.1/0 generic Starship/Matisse Reserve
/0/110/8.1/0.1 generic Starship/Matisse Cryptog
/0/110/8.1/0.3 bus Starship USB 3.0 Host Co
/0/110/8.1/0.3/0 usb7 bus xHCI Host Controller
/0/110/8.1/0.3/0/2 bus USB Virtual Hub
/0/110/8.1/0.3/0/2/1 input SMCI HID KM
/0/110/8.1/0.3/0/2/2 enxb03af2b6059f communication RNDIS/Ethernet Gadget
/0/110/8.1/0.3/1 usb8 bus xHCI Host Controller
/0/110/8.1/0.4 multimedia Starship/Matisse HD Audi
/0/111 bridge Starship/Matisse PCIe Du
/0/112 bridge Starship/Matisse PCIe Du
/0/113 bridge Starship/Matisse PCIe Du
/0/114 bridge Starship/Matisse PCIe Du
/0/115 bridge Starship/Matisse PCIe Du
/0/116 bridge Starship/Matisse PCIe Du
/0/117 bridge Starship/Matisse PCIe Du
/0/118 bridge Starship/Matisse Root Co
/0/118/0.2 generic Starship/Matisse IOMMU
/0/118/1.1 bridge Starship/Matisse GPP Bri
/0/118/3.1 bridge Starship/Matisse GPP Bri
/0/118/3.1/0 enp67s0 network Ethernet interface
/0/118/7.1 bridge Starship/Matisse Interna
/0/118/7.1/0 generic Starship/Matisse PCIe Du
/0/118/8.1 bridge Starship/Matisse Interna
/0/118/8.1/0 generic Starship/Matisse Reserve
/0/119 bridge Starship/Matisse PCIe Du
/0/11a bridge Starship/Matisse PCIe Du
/0/11b bridge Starship/Matisse PCIe Du
/0/11c bridge Starship/Matisse PCIe Du
/0/11d bridge Starship/Matisse PCIe Du
/0/11e bridge Starship/Matisse PCIe Du
/0/11f bridge Starship/Matisse PCIe Du
/0/120 bridge Starship/Matisse Root Co
/0/120/0.2 generic Starship/Matisse IOMMU
/0/120/3.1 bridge Starship/Matisse GPP Bri
/0/120/3.1/0 bridge Matisse Switch Upstream
/0/120/3.1/0/1 bridge Matisse PCIe GPP Bridge
/0/120/3.1/0/1/0 storage NVMe SSD Controller Cx6
/0/120/3.1/0/1/0/0 /dev/nvme0 storage KCD6XLUL1T92
/0/120/3.1/0/1/0/0/1 /dev/nvme0n1 disk 1920GB NVMe namespace
/0/120/3.1/0/1/0/0/1/1 /dev/nvme0n1p1 volume 511MiB Windows FAT volum
/0/120/3.1/0/1/0/0/1/2 /dev/nvme0n1p2 volume 8191MiB Linux swap volum
/0/120/3.1/0/1/0/0/1/3 /dev/nvme0n1p3 volume 1779GiB EXT4 volume
/0/120/3.1/0/8 bridge Matisse PCIe GPP Bridge
/0/120/3.1/0/8/0 generic Starship/Matisse Reserve
/0/120/3.1/0/8/0.1 bus Matisse USB 3.0 Host Con
/0/120/3.1/0/8/0.1/0 usb1 bus xHCI Host Controller
/0/120/3.1/0/8/0.1/0/2 generic A-U280-A32G
/0/120/3.1/0/8/0.1/0/6 multimedia USB Audio
/0/120/3.1/0/8/0.1/1 usb2 bus xHCI Host Controller
/0/120/3.1/0/8/0.3 bus Matisse USB 3.0 Host Con
/0/120/3.1/0/8/0.3/0 usb3 bus xHCI Host Controller
/0/120/3.1/0/8/0.3/1 usb4 bus xHCI Host Controller
/0/120/3.1/0/a bridge Matisse PCIe GPP Bridge
/0/120/3.1/0/a/0 storage FCH SATA Controller [AHC
/0/120/3.2 bridge Starship/Matisse GPP Bri
/0/120/3.2/0 bus ASMedia Technology Inc.
/0/120/3.2/0/0 usb5 bus xHCI Host Controller
/0/120/3.2/0/1 usb6 bus xHCI Host Controller
/0/120/3.3 bridge Starship/Matisse GPP Bri
/0/120/3.3/0 enp103s0 network I210 Gigabit Network Con
/0/120/3.4 bridge Starship/Matisse GPP Bri
/0/120/3.4/0 bridge AST1150 PCI-to-PCI Bridg
/0/120/3.4/0/0 display ASPEED Graphics Family
/0/120/3.5 bridge Starship/Matisse GPP Bri
/0/120/3.5/0 network Aquantia Corp.
/0/120/7.1 bridge Starship/Matisse Interna
/0/120/7.1/0 generic Starship/Matisse PCIe Du
/0/120/8.1 bridge Starship/Matisse Interna
/0/120/8.1/0 generic Starship/Matisse Reserve
/0/121 bridge Starship/Matisse PCIe Du
/0/122 bridge Starship/Matisse PCIe Du
/0/123 bridge Starship/Matisse PCIe Du
/0/124 bridge Starship/Matisse PCIe Du
/0/125 bridge Starship/Matisse PCIe Du
/0/126 bridge Starship/Matisse PCIe Du
/0/127 bridge Starship/Matisse PCIe Du
/0/1 system PnP device PNP0c02
/0/2 system PnP device PNP0c01
/0/3 system PnP device PNP0b00
/0/4 system PnP device PNP0c02
/0/5 communication PnP device PNP0501
/0/6 communication PnP device PNP0501
/0/7 system PnP device PNP0c02
/0/8 system PnP device PNP0c02
/1 power To Be Filled By O.E.M.
/2 power To Be Filled By O.E.M.

from numactl --hardware:
available: 1 nodes (0)
node 0 cpus: 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63
node 0 size: 515641 MB
node 0 free: 482304 MB
node distances:
node 0
0: 10

So NUMA is not enabled in the BIOS. Is it required?
2) I have generated vivado project with
vivado -mode tcl -source build.tcl -tclargs -board au280 -min_pkt_len 64 -max_pkt_len 9600 -num_cmac_port 2 -num_phys_func 2 -impl 1 -post_impl 1 -jobs 64

3) The server is connected to a U280 card; both QSFPs are connected through a loopback cable, and pktgen is run with the command:
sudo pktgen-dpdk-pktgen-20.11.3/usr/local/bin/pktgen -a 43:00.0 -a 43:00.1 -d librte_net_qdma.so -l 4-10 -n 4 -a 40:03.1 -a 40:03.1 -- -m [6:7].0 -m [8:9].1

Following is the output

\ Ports 0-1 of 2

Copyright (c) <2010-2020>, Intel Corporation 0/0
Flags:Port : -------Single :0 -------Single :1 0/0
Link State : ---Total Rate---
Pkts/s Max/Rx : 12936320/12730810 12936288/12728434 25872608/25459244
Max/Tx : 12936704/12728481 12936321/12730847 25873025/25459328
MBits/s Rx/Tx : 8555/2443 8553/2444 17108/4888
Broadcast : 0 0
Multicast : 0 0
Sizes 64 : 1501510360 1501000821
65-127 : 0 0
128-255 : 0 0
256-511 : 0 0
512-1023 : 0 0
1024-1518 : 0 0
Runts/Jumbos : 0/0 0/0
ARP/ICMP Pkts : 0/0 0/0
Errors Rx/Tx : 0/0 0/0
Total Rx Pkts : 1497474002 1496964629
Tx Pkts : 1496985888 1497477119
Rx MBs : 1006302 1005960
Tx MBs : 287421 287515
Pkt Size/Tx Burst : 64 / 32 64 / 32
Pattern Type : abcd... abcd...
Tx Count/% Rate : Forever /100% Forever /100%
Pkt Size/Tx Burst : 64 / 32 64 / 32
TTL/Port Src/Dest : 4/ 1234/ 5678 4/ 1234/ 5678
Pkt Type:VLAN ID : IPv4 / TCP:0001 IPv4 / TCP:0001
802.1p CoS/DSCP/IPP : 0/ 0/ 0 0/ 0/ 0
VxLAN Flg/Grp/vid : 0000/ 0/ 0 0000/ 0/ 0
IP Destination : 192.168.1.1 192.168.0.1--------------
Source : 192.168.0.1/24 192.168.1.1/24
MAC Destination : 15:16:17:18:19:1a 15:16:17:18:19:1a
-- Pktgen 20.11.3 (D: 10ee:903f/43:00.0 10ee:913f/43:00.1--------------

It seems from the pktgen output that only 64-byte packets are being generated. How can larger packets be generated?
Regards,
Anees

@cneely-amd
Collaborator

Hi @aneesullah, in your testing with pktgen-dpdk, can you try something like the following, which sweeps the packet size on each port instead of sending only fixed 64-byte packets:

range 0 size 64 64 1518 3
range 1 size 1500 64 1518 5
enable 0-1 range
start 0-1

@cneely-amd
Collaborator

@aneesullah
Also, to go along with my suggestion above for varying the packet size in pktgen-dpdk, I wanted to mention that in my testing I've been enabling serdes loopback instead of a cable for these quick tests, by writing 0x1 to register 0x8090 (for port 0) and 0x1 to register 0xC090 (for port 1).
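A minimal sketch of how those two register writes could be issued from Linux, assuming the OpenNIC shell registers are exposed through the card's PCIe user BAR (the resource2 path, BAR index, and map size below are assumptions and must be adapted to the specific build; the 0x8090/0xC090 offsets are the port 0 / port 1 values mentioned above):

/* serdes_loopback.c: hedged sketch for enabling CMAC serdes loopback in an
 * OpenNIC build by writing the shell registers through the PCIe user BAR.
 * The resource2 path and MAP_SIZE are assumptions; adjust to your system. */
#include <fcntl.h>
#include <stdint.h>
#include <stdio.h>
#include <sys/mman.h>
#include <unistd.h>

#define MAP_SIZE 0x10000UL   /* large enough to cover 0x8090 and 0xC090 */

int main(int argc, char **argv)
{
    const char *res = (argc > 1) ? argv[1]
        : "/sys/bus/pci/devices/0000:43:00.0/resource2";  /* assumed BAR */
    int fd = open(res, O_RDWR | O_SYNC);
    if (fd < 0) { perror("open"); return 1; }

    volatile uint32_t *bar = mmap(NULL, MAP_SIZE, PROT_READ | PROT_WRITE,
                                  MAP_SHARED, fd, 0);
    if (bar == MAP_FAILED) { perror("mmap"); close(fd); return 1; }

    bar[0x8090 / 4] = 0x1;   /* serdes loopback, port 0 */
    bar[0xC090 / 4] = 0x1;   /* serdes loopback, port 1 */

    munmap((void *)bar, MAP_SIZE);
    close(fd);
    return 0;
}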

@aneesullah
Author

aneesullah commented Mar 31, 2022

Hi @cneely-amd ,
Thanks a lot. Throughput has improved; I am getting around 70 Gbps with cable loopback, but I am still not able to hit the link rate.
Here is the PKTGEN snapshot:
pktgen_cable_lp
I am also getting around 70 Gbps with SERDES loopback. To enable SERDES loopback I used the following:
open_nic_registers_setup
Any ideas?

Regards,
Anees

@aneesullah
Author

aneesullah commented Mar 31, 2022

Another related question:
pktgen-dpdk allows only a 16-byte user fill pattern for testing. What if we have a large amount of data, from a file or from memory, to transfer? As far as I can tell that is not supported. What about the dma_to_device and dma_from_device functions from the QDMA driver library: can they be used to transmit custom user data at 100 Gbps with OpenNIC? Or do we need to write our own DPDK app based on the QDMA driver library that is patched for OpenNIC? Any suggestions on how such functionality can be achieved quickly? Note that we also need to measure performance while doing the TX/RX of our data. Thanks a lot.
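One possible shape for such a custom DPDK transmit path, sketched under the assumption that EAL init, rte_eth_dev_configure(), rte_eth_tx_queue_setup(), and rte_eth_dev_start() have already been done for port 0 / queue 0 of the patched QDMA PMD: slice the user buffer into mbufs and push them out through the normal rte_eth_tx_burst() path. The PKT_LEN value and the absence of real Ethernet/IP headers are placeholders.

/* tx_from_buffer: hedged sketch of streaming an application buffer out of an
 * OpenNIC port with the QDMA PMD.  Port/queue setup is assumed to be done
 * elsewhere; a real application would also prepend proper packet headers. */
#include <rte_ethdev.h>
#include <rte_mbuf.h>
#include <string.h>

#define BURST    32
#define PKT_LEN 1024   /* payload bytes per packet, an arbitrary choice */

static void tx_buffer(uint16_t port, struct rte_mempool *pool,
                      const uint8_t *data, size_t len)
{
    size_t off = 0;

    while (off < len) {
        struct rte_mbuf *burst[BURST];
        uint16_t n = 0;

        /* Build up to BURST packets from the next slice of the buffer. */
        while (n < BURST && off < len) {
            struct rte_mbuf *m = rte_pktmbuf_alloc(pool);
            if (m == NULL)
                break;
            uint16_t chunk = (uint16_t)RTE_MIN((size_t)PKT_LEN, len - off);
            char *dst = rte_pktmbuf_append(m, chunk);
            if (dst == NULL) {          /* not enough tailroom */
                rte_pktmbuf_free(m);
                break;
            }
            memcpy(dst, data + off, chunk);
            off += chunk;
            burst[n++] = m;
        }

        /* Transmit, retrying the tail so nothing is silently dropped. */
        uint16_t sent = 0;
        while (sent < n)
            sent += rte_eth_tx_burst(port, 0, burst + sent, n - sent);
    }
}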

@cneely-amd
Collaborator

cneely-amd commented Apr 6, 2022

@aneesullah
I'm not sure what the best approach would be for improving the performance. I can give two examples of different machine configurations that I have tried recently. Both use -n 4 and map cores as in the example.

~70-80Gbps (fluctuating in that range):

Architecture:            x86_64
  CPU op-mode(s):        32-bit, 64-bit
  Address sizes:         39 bits physical, 48 bits virtual
  Byte Order:            Little Endian
CPU(s):                  16
  On-line CPU(s) list:   0-15
Vendor ID:               GenuineIntel
  Model name:            11th Gen Intel(R) Core(TM) i7-11700F @ 2.50GHz
    CPU family:          6
    Model:               167
    Thread(s) per core:  2
    Core(s) per socket:  8
    Socket(s):           1
    Stepping:            1
    CPU max MHz:         4900.0000
    CPU min MHz:         800.0000
    BogoMIPS:            4992.00
Virtualization features: 
  Virtualization:        VT-x
Caches (sum of all):     
  L1d:                   384 KiB (8 instances)
  L1i:                   256 KiB (8 instances)
  L2:                    4 MiB (8 instances)
  L3:                    16 MiB (1 instance)
NUMA:                    
  NUMA node(s):          1
  NUMA node0 CPU(s):     0-15

RAM: 32 GB

~95Gbps (fairly constant):

Architecture:        x86_64
CPU op-mode(s):      32-bit, 64-bit
Byte Order:          Little Endian
CPU(s):              64
On-line CPU(s) list: 0-63
Thread(s) per core:  2
Core(s) per socket:  16
Socket(s):           2
NUMA node(s):        2
Vendor ID:           GenuineIntel
CPU family:          6
Model:               85
Model name:          Intel(R) Xeon(R) Silver 4216 CPU @ 2.10GHz
Stepping:            7
CPU MHz:             800.045
CPU max MHz:         3200.0000
CPU min MHz:         800.0000
BogoMIPS:            4200.00
Virtualization:      VT-x
L1d cache:           32K
L1i cache:           32K
L2 cache:            1024K
L3 cache:            22528K
NUMA node0 CPU(s):   0-15,32-47
NUMA node1 CPU(s):   16-31,48-63

RAM: 192GB
(Note: I updated the info above because the first time it didn't paste correctly into my message)

I also have a Ryzen 5950 with 32GB for testing, but right now my GPU is using up most of the lanes and I need to swap the order of my PCI cards around before I can test it. I'll try to do that as an experiment when I get a chance.

Best regards,
--Chris

@cneely-amd
Collaborator

cneely-amd commented Apr 9, 2022

Hi @aneesullah,

I tried my Ryzen 5950X machine and I'm getting the following:

\ Ports 0-1 of 2   <Main Page>  Copyright(c) <2010-2021>, Intel Corporation
  Flags:Port        : -------Range      :0 -------Range      :1
Link State          :       <UP-100000-FD>       <UP-100000-FD>     ---Total Rate---
Pkts/s Rx           :            7,965,641            7,958,737           15,924,378
       Tx           :            8,003,200            8,003,200           16,006,400
MBits/s Rx/Tx       :        49,627/49,875        51,223/51,520      100,851/101,395
Pkts/s Rx Max       :            8,104,052            8,121,805           16,225,857
       Tx Max       :            8,129,920            8,129,793           16,259,713
Broadcast           :                    0                    0
Multicast           :                    0                    0
Sizes 64            :            2,131,832            2,131,412
      65-127        :           44,765,934           25,580,104
      128-255       :          147,057,426           68,193,983
      256-511       :          183,028,200          217,291,635
      512-1023      :          362,044,422          434,493,564
      1024-1518     :          351,542,078          342,874,892
Runts/Jumbos        :                  0/0                  0/0
ARP/ICMP Pkts       :                  0/0                  0/0
Errors Rx/Tx        :                  0/0                  0/0
Total Rx Pkts       :        1,084,465,571        1,084,468,533
      Tx Pkts       :        1,085,482,495        1,085,482,367
      Rx/Tx MBs     :  6,758,742/6,764,658  6,981,025/6,987,758
Pattern Type        :              abcd...              abcd...
Tx Count/% Rate     :        Forever /100%        Forever /100%
Pkt Size/Tx Burst   :            64 /   32            64 /   32
TTL/Port Src/Dest   :       64/ 1234/ 5678       64/ 1234/ 5678
Pkt Type:VLAN ID    :      IPv4 / TCP:0001      IPv4 / TCP:0001
802.1p CoS/DSCP/IPP :            0/  0/  0            0/  0/  0
VxLAN Flg/Grp/vid   :     0000/    0/    0     0000/    0/    0
IP  Destination     :          192.168.1.1          192.168.0.1
    Source          :       192.168.0.1/24       192.168.1.1/24
MAC Destination     :    15:16:17:18:19:1a    15:16:17:18:19:1a
    Source          :    15:16:17:18:19:1a    15:16:17:18:19:1a
PCI Vendor/Addr     :    10ee:903f/0b:00.0    10ee:913f/0b:00.1
-- Pktgen 21.03.1 (DPDK 20.11.0)  Powered by DPDK  (pid:3131) -----------------

This is with (as before):

Pktgen:/> range 0 size 64 64 1518 3
Pktgen:/> range 1 size 1500 64 1518 5
Pktgen:/> enable 0-1 range
Pktgen:/> start 0-1

lscpu reports:

Architecture:            x86_64
  CPU op-mode(s):        32-bit, 64-bit
  Address sizes:         48 bits physical, 48 bits virtual
  Byte Order:            Little Endian
CPU(s):                  32
  On-line CPU(s) list:   0-31
Vendor ID:               AuthenticAMD
  Model name:            AMD Ryzen 9 5950X 16-Core Processor
    CPU family:          25
    Model:               33
    Thread(s) per core:  2
    Core(s) per socket:  16
    Socket(s):           1
    Stepping:            0
    Frequency boost:     enabled
    CPU max MHz:         5272.6558
    CPU min MHz:         2200.0000
    BogoMIPS:            6799.19
Virtualization features: 
  Virtualization:        AMD-V
Caches (sum of all):     
  L1d:                   512 KiB (16 instances)
  L1i:                   512 KiB (16 instances)
  L2:                    8 MiB (16 instances)
  L3:                    64 MiB (2 instances)
NUMA:                    
  NUMA node(s):          1
  NUMA node0 CPU(s):     0-31

RAM: 32GB

(P.S. note: my Ryzen machine might have some overclocking settings enabled due to the latest Radeon software driver issue in the news.)

@aneesullah
Author

Hi @cneely-amd,
I checked on another machine, on which NUMA nodes are not enabled in the BIOS, and I am able to get 100 Gbps.
image (5)
Thanks for your help. Any idea why enabling NUMA reduces the speed on the other machine, or is it something else?
Regards,
Anees

@cneely-amd
Collaborator

Hi @aneesullah,
I would guess that it might have to do with NUMA and allocation of hugepages and their locality to whichever processor cores are specified in the test, but that is just a guess.
--Chris
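A minimal sketch of how a DPDK application could check the locality being guessed at here, i.e. that the NIC, the polling lcore, and the mbuf pool all sit on the same NUMA node; standard DPDK calls are used, and the pool name and sizing values are placeholders:

/* numa_check: hedged sketch that allocates the mbuf pool on the NIC's NUMA
 * node and warns if the polling lcore sits on a different node.  Pool sizing
 * values are placeholders; EAL/port setup is assumed to happen elsewhere. */
#include <rte_ethdev.h>
#include <rte_lcore.h>
#include <rte_mbuf.h>
#include <stdio.h>

static struct rte_mempool *make_local_pool(uint16_t port, unsigned int lcore)
{
    int dev_socket  = rte_eth_dev_socket_id(port);        /* NUMA node of the NIC  */
    int core_socket = (int)rte_lcore_to_socket_id(lcore); /* NUMA node of the core */

    if (dev_socket >= 0 && dev_socket != core_socket)
        printf("warning: port %u is on node %d but lcore %u is on node %d\n",
               port, dev_socket, lcore, core_socket);

    /* Put buffers on the NIC's node so DMA targets stay local. */
    return rte_pktmbuf_pool_create("mbuf_pool_local", 8192, 256, 0,
                                   RTE_MBUF_DEFAULT_BUF_SIZE,
                                   dev_socket >= 0 ? dev_socket
                                                   : (int)rte_socket_id());
}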

@attdone

attdone commented Dec 28, 2023

Hi @aneesullah and @cneely-amd ,
I am utilizing the Alveo U200 and have conducted tests with pktgen. However, the observed packet transfer rates are at 300 MBits/s for transmission and 9000 MBits/s for reception. I'm seeking guidance on how to enhance the throughput to achieve the optimal 100 Gbps. I have configured the BIOS settings in accordance with the specifications outlined on the Open-NIC DPDK Git page.
image

During packet transfer I keep getting "Timeout on request to dma internal csr register", "Packet length mismatch error", and "Detected Fatal length mismatch" errors, which prevent further transfers. Please let me know how to resolve this.
image

Thanking in advance.
