
disabling NETIF_F_HW_VLAN_CTAG_TX issue #1317

Open
michael-Lzh opened this issue May 6, 2024 · 3 comments

Comments

@michael-Lzh

When the NETIF_F_HW_VLAN_CTAG_TX feature is disabled, tcpdump can't filter the ARP type correctly.

Please run the following commands to reproduce.
Everything runs on the same board (including ping and tcpdump).

1. Create the VLAN:
sudo ip link del link eno1 name eno1.50 type vlan id 50
sudo ip link add link eno1 name eno1.50 type vlan id 50
sudo ip link set eno1.50 type vlan egress 0:5 1:5 2:5 3:5 4:5 5:5 6:5 8:5
sudo ifconfig eno1.50 182.31.50.1 netmask 255.255.254.0 up

2. Disable the NETIF_F_HW_VLAN_CTAG_TX feature:
ethtool -K eno1 txvlan off

3. Ping:
ping 182.31.50.99 -I eno1.50 > /dev/null &

4. Run tcpdump:
tcpdump -i eno1 arp

Normally, bytes 12-13 of the packet data are 0x0806 (the ARP EtherType), but in this case that value is at bytes 16-17 (checked in tpacket_rcv()).
The result is correct if the BPF code is modified.

The data printed in tpacket_rcv() is shown below.
[Image: packet data printed in tpacket_rcv()]
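As a rough illustration of the frames involved (a sketch, not part of the original report): with txvlan off, the 802.1Q tag ends up in the packet data, which is consistent with what the report describes. The frame seen on eno1 then has the TPID 0x8100 at bytes 12-13, the Tag Control Information (TCI) at bytes 14-15, and the ARP EtherType at bytes 16-17. The Python below builds such a frame. The MAC address and helper names are hypothetical, and it assumes the ARP request leaves with skb priority 0, which the egress map in step 1 maps to 802.1p priority 5.

```python
import struct
from typing import Optional

# Values from the repro: VLAN ID 50 on eno1.50, and the egress map
# "0:5 ..." that (assumed here) maps skb priority 0 to 802.1p priority 5.
VLAN_ID = 50
PCP = 5


def make_tci(pcp: int, vid: int, dei: int = 0) -> int:
    """Pack the 16-bit 802.1Q Tag Control Information: PCP(3) | DEI(1) | VID(12)."""
    return ((pcp & 0x7) << 13) | ((dei & 0x1) << 12) | (vid & 0x0FFF)


def arp_request_frame(src_mac: bytes, sender_ip: bytes, target_ip: bytes,
                      tci: Optional[int] = None) -> bytes:
    """Build a broadcast ARP request; if tci is given, put an 802.1Q tag in the data."""
    arp = struct.pack(
        "!HHBBH6s4s6s4s",
        1,              # hardware type: Ethernet
        0x0800,         # protocol type: IPv4
        6, 4,           # hardware / protocol address lengths
        1,              # opcode: request
        src_mac, sender_ip,
        bytes(6),       # target MAC is unknown in a request
        target_ip,
    )
    header = b"\xff" * 6 + src_mac                 # broadcast destination + source MAC
    if tci is not None:
        header += struct.pack("!HH", 0x8100, tci)  # TPID + TCI, 4 bytes in the data
    return header + struct.pack("!H", 0x0806) + arp  # EtherType: ARP


mac = bytes.fromhex("020000000001")               # hypothetical MAC for eno1
src, dst = bytes([182, 31, 50, 1]), bytes([182, 31, 50, 99])
untagged = arp_request_frame(mac, src, dst)
tagged = arp_request_frame(mac, src, dst, tci=make_tci(PCP, VLAN_ID))
print(hex(struct.unpack_from("!H", untagged, 12)[0]))  # 0x806  (EtherType at 12-13)
print(hex(struct.unpack_from("!H", tagged, 12)[0]))    # 0x8100 (TPID at 12-13)
print(hex(struct.unpack_from("!H", tagged, 16)[0]))    # 0x806  (EtherType at 16-17)
```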

@infrastation transferred this issue from the-tcpdump-group/tcpdump on May 6, 2024
@guyharris
Member

guyharris commented May 20, 2024

> Normally, the 12th-13th data is equal 0x806(arp type), but actually result above is at 16th-17th(check from tpacket_rcv()).

In Ethernet traffic, having 0x08 0x06 in octets 12 and 13 of a frame is normal for non-VLAN ARP frames; having them in octets 16 and 17 is normal for VLAN ARP frames.
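The offset rule can be sketched as a small Python helper that skips in-line VLAN tags before reading the EtherType. It assumes the tags are still in the packet data (not stripped into metadata) and treats as tags the same three TPIDs that appear in the generated code quoted in this thread (0x8100, 0x88a8, 0x9100). The function name is hypothetical; unlike that generated code, it also walks stacked (QinQ) tags.

```python
import struct
from typing import Tuple

# TPIDs that the generated libpcap code in this thread treats as VLAN tags.
VLAN_TPIDS = {0x8100, 0x88A8, 0x9100}


def find_ethertype(frame: bytes) -> Tuple[int, int]:
    """Return (offset, EtherType) of the first non-VLAN EtherType in a frame
    whose VLAN tags are still in the packet data."""
    offset = 12                                  # after destination + source MAC
    ethertype = struct.unpack_from("!H", frame, offset)[0]
    while ethertype in VLAN_TPIDS:
        offset += 4                              # skip TPID (2 bytes) + TCI (2 bytes)
        ethertype = struct.unpack_from("!H", frame, offset)[0]
    return offset, ethertype


macs = bytes(12)                                 # placeholder MAC addresses
print(find_ethertype(macs + b"\x08\x06" + bytes(28)))                          # (12, 2054)
print(find_ethertype(macs + b"\x81\x00\x00\x32" + b"\x08\x06" + bytes(28)))    # (16, 2054)
```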

If you want to filter out non-VLAN ARP frames, the filter is `not arp`. That will not filter out VLAN ARP frames.

If you want to filter out only VLAN ARP frames, the filter is `not (vlan and arp)`. That will not filter out non-ARP VLAN frames.

When capturing on a Linux Ethernet adapter, the generated code is:

(000) ld       #0x0				# set M[1] to 0
(001) st       M[1]
(002) ldb      [vlanp]				# is a stripped VLAN tag present?
(003) jeq      #0x1             jt 10	jf 4	# if so, jump ahead
(004) ld       #0x4				# set M[1] to 4
(005) st       M[1]
(006) ldh      [12]				# load Ethertype
(007) jeq      #0x8100          jt 10	jf 8	# if VLAN tag, jump ahead
(008) jeq      #0x88a8          jt 10	jf 9	# if other tag, jump ahead
(009) jeq      #0x9100          jt 10	jf 14	# if other other tag, ditto
(010) ldx      M[1]
(011) ldh      [x + 12]				# load *(M[1] + 12)
(012) jeq      #0x806           jt 13	jf 14	# if ARP, reject, otherwise accept
(013) ret      #0
(014) ret      #262144

That's a bit weird, as it has to handle frames that have VLAN tags stripped out and frames that don't have VLAN tags stripped out. M[1] is being set to 0 if the tag is stripped and 4 if the tag is not stripped, so that M[1] + 12 is the offset, into the packet data as handed to the PF_PACKET socket, of the real Ethertype.
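To make the M[1] trick concrete, here is a line-by-line Python translation of the program above. It is a sketch for reasoning about the generated code, not anything libpcap runs. The `vlan_present` argument stands in for the `[vlanp]` load (whether the kernel stripped the tag into metadata), and the function name is hypothetical.

```python
import struct

ACCEPT = 262144   # "ret #262144": accept, keeping up to 262144 bytes
REJECT = 0        # "ret #0": drop


def not_vlan_and_arp(pkt: bytes, vlan_present: bool) -> int:
    """Hand translation of the program generated for `not (vlan and arp)`.
    vlan_present stands in for [vlanp] (tag stripped into skb metadata)."""
    def ldh(off: int) -> int:                 # 16-bit big-endian load, like "ldh [off]"
        return struct.unpack_from("!H", pkt, off)[0]

    m1 = 0                                    # (000)-(001): M[1] = 0
    if not vlan_present:                      # (002)-(003): no stripped tag
        m1 = 4                                # (004)-(005): M[1] = 4
        if ldh(12) not in (0x8100, 0x88A8, 0x9100):
            return ACCEPT                     # (006)-(009): untagged, so not "vlan and arp"
    # (010)-(012): the real EtherType is at M[1] + 12 in both cases
    return REJECT if ldh(m1 + 12) == 0x0806 else ACCEPT
```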

If you want to filter out VLAN ARP frames as well as non-VLAN ARP frames, the filter would, in theory, be `not arp and not (vlan and arp)`, but it's not generating correct code for that:

(000) ldh      [12]				# load Ethertype
(001) jeq      #0x806           jt 9	jf 2	# if ARP, reject
(002) jeq      #0x8100          jt 7	jf 3	# if VLAN, jump ahead
(003) jeq      #0x88a8          jt 7	jf 4	# if other tag, jump ahead
(004) jeq      #0x9100          jt 7	jf 5	# if other other tag, ditto
(005) ldb      [vlanp]				# is a stripped VLAN tag present?
(006) jeq      #0x1             jt 7	jf 10	# if not, succeed
(007) ldh      [16]				# load Ethertype from tag
(008) jeq      #0x806           jt 9	jf 10	# if ARP, reject, otherwise accept
(009) ret      #0
(010) ret      #262144

That's a bug, but it's not the cause of the problem you're seeing.
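One reading of why the second program is wrong: instruction (007) always loads the EtherType from offset 16, but when the tag was stripped into metadata (`[vlanp]` is 1) the packet data has no tag, so offset 16 is 4 bytes into the payload. The Python sketch below translates the dump by hand and compares it with what the filter presumably should do (reject ARP whether untagged, tagged in the data, or with the tag stripped). The test frame is crafted: an IPv4 packet whose total-length field happens to be 0x0806 (2054 bytes, which would need a jumbo MTU). Both function names are hypothetical, and `intended` is an illustration, not libpcap's fix.

```python
import struct

ACCEPT, REJECT = 262144, 0
VLAN_TPIDS = (0x8100, 0x88A8, 0x9100)


def _ldh(pkt: bytes, off: int) -> int:
    """16-bit big-endian load, like BPF "ldh [off]"."""
    return struct.unpack_from("!H", pkt, off)[0]


def generated(pkt: bytes, vlan_present: bool) -> int:
    """Hand translation of the 11-instruction program above."""
    et = _ldh(pkt, 12)                                  # (000)
    if et == 0x0806:                                    # (001): untagged ARP
        return REJECT
    if et not in VLAN_TPIDS and not vlan_present:       # (002)-(006)
        return ACCEPT
    return REJECT if _ldh(pkt, 16) == 0x0806 else ACCEPT  # (007)-(010): always offset 16


def intended(pkt: bytes, vlan_present: bool) -> int:
    """Assumed meaning of `not arp and not (vlan and arp)`: reject every ARP frame."""
    if vlan_present:
        off = 12                      # tag was removed from the data
    elif _ldh(pkt, 12) in VLAN_TPIDS:
        off = 16                      # tag is still in the data
    else:
        off = 12                      # untagged
    return REJECT if _ldh(pkt, off) == 0x0806 else ACCEPT


macs = bytes(12)
# Stripped-tag IPv4 frame; IPv4 total length (header bytes 2-3) is 0x0806.
stripped_ip = macs + b"\x08\x00" + bytes([0x45, 0x00, 0x08, 0x06]) + bytes(16)
print(generated(stripped_ip, True), intended(stripped_ip, True))  # 0 262144
```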

@michael-Lzh
Author

So it's a kernel issue? Can it be solved in libpcap?

@guyharris
Member

> So it's a kernel issue?

The filter `not XXX` failing to filter out XXX packets in a VLAN, for all values of XXX, is a deficiency of libpcap. Fixing the deficiency without too badly reducing the efficiency of the generated code might require some significant restructuring of the filter compiler.

The filter `not arp and not (vlan and arp)` generating incorrect code is a libpcap problem.

Neither of those is a kernel issue.

The kernel issue is that, before handing packets to the PF_PACKET socket code, the kernel might remove VLAN tags from the raw packet data and put them into metadata attached to the skbuff. As a result, 1) the filtering code has to take that into account, so that packets with VLAN tags removed are filtered as if the tags had not been removed, and 2) the code that handles packets that pass the filter has to re-insert the VLAN tags into the packet data before handing the packet to the callback, so that, to the application or library using libpcap, the packets look as if the tags had not been removed.
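Step 2 can be sketched as splicing the 4-byte tag back in after the two MAC addresses. This is an illustrative Python sketch, not libpcap's code. It assumes the caller already has the TCI from the ring-buffer frame header (for example the `tp_vlan_tci` field of a TPACKET_V2 header) and defaults the TPID to 0x8100 when no other TPID is reported. The function name is hypothetical.

```python
import struct


def reinsert_vlan_tag(frame: bytes, tci: int, tpid: int = 0x8100) -> bytes:
    """Rebuild the on-the-wire frame from data whose 802.1Q tag was moved into
    metadata: put TPID + TCI back between the source MAC and the EtherType."""
    if len(frame) < 14:
        raise ValueError("frame too short to contain an Ethernet header")
    # bytes 0-11: destination + source MAC; then the 4-byte tag; then the rest
    return frame[:12] + struct.pack("!HH", tpid, tci) + frame[12:]


stripped = bytes(12) + b"\x08\x06" + bytes(28)   # ARP frame with its tag stripped
restored = reinsert_vlan_tag(stripped, tci=0xA032)
print(restored[12:18].hex())                      # 8100a0320806
```

After re-insertion the ARP EtherType sits at bytes 16-17, which is what a filter or application expecting an in-line tag sees.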

That kernel issue is already handled in libpcap.
