Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Set the rule handle after flush #88

Open
wants to merge 16 commits into
base: main
Choose a base branch
from

Conversation

alexispires
Copy link
Contributor

@alexispires alexispires commented Dec 26, 2019

Currently this very basic and useful workflow is not possible:

r1 := c.AddRule(&nftables.Rule{
	Table: filter,
	Chain: prerouting,
	Exprs: []expr.Any{
		&expr.Verdict{
			// [ immediate reg 0 drop ]
			Kind: expr.VerdictDrop,
		},
	},
})

c.Flush()

c.DelRule(r1)

c.Flush()

The solution:
Because of NLM_F_ECHO all netlink messages in the transaction are returned back. In the case of rules, there are returned back with Handle. Rules are linked to requests by keeping the index of netlink request of a rule. We rely on sequence number to retrieve the right request when receiving a netlink response.

What does the solution allow?

  • Manipulating rules without fetching and processing rules before
  • Adding comment to rules, because we don't need user data to retrieve rules anymore

@alexispires alexispires changed the title set handle back Set the rule handle after flush Dec 26, 2019
@alexispires alexispires marked this pull request as ready for review December 26, 2019 11:52
@sbezverk
Copy link
Contributor

@alexispires check out this library github.com/sbezverk/nftableslib it is based on github.com/google/nftables and it allows to do easily what you need.

@sbezverk
Copy link
Contributor

I think this change would break 1 important use case, when Flush() is called only once for everything, chains, rules, sets etc. We have a use case when everything gets prepared and then programmed in a single shot. I do not see how proposed implementation would address such case.

@alexispires
Copy link
Contributor Author

Look at my test function, I send 1 table, 1 chain and 2 rules in a single transaction

@sbezverk
Copy link
Contributor

When there are several chains, multiple rules how do you insure correspondence with returned handle and rule? It seems you rely only on sequence, I do not think it is reliable enough. Still it is better to extend functionality which ensures backward compatibility and not change with breakage.

@alexispires
Copy link
Contributor Author

I understand your concern. The same problem appeared to nft tool and the same assumptions were made, which makes them acceptable to me.

Look at this commit : http://git.netfilter.org/nftables/commit/?id=bb32d8db9a125d9676f87866e48ffbf0221ec16a

The basic principle is to not return a JSON object freshly created from
netlink responses, but just update the existing user-provided one to
make sure callers get back exactly what they expect.
To achieve that, keep the parsed JSON object around in a global variable
('cur_root') and provide a custom callback to insert handles into it
from received netlink messages. The tricky bit here is updating rules
since unique identification is problematic. Therefore drop possibly
present handles from input and later assume updates are received in
order so the first rule not having a handle set is the right one.

The proposal to extend the functionality bothers me since it introduces a behavior specific to the go implementation. The notion of Handle is already used to identify the rules and the echo function exist mainly to return the Handle.

Look at this commit too: http://git.netfilter.org/nftables/commit/?id=b99c4d072d9969f7a0dfc539b2b68b517f90af68

@sbezverk
Copy link
Contributor

The work you quoted geared towards less dynamic and more static implementations, imho. My focus is in dynamic, concurrent use uses, example kube-proxy using nftables. Here is scenario when this approach pretty much guarantees the issue, there are 2 processes handling ipv4 and ipv6 tables concurrently, there is a possibility that ipv4 adn ipv6 rules are put in the same messages queue for Flush to program, if you cannot identify precisely which rule handle corresponds to which rule, then it is a matter of chance before they get mixed up. I think using user defined field to be able to identify a particular rule and its id is a safe and reliable way covering all possible use cases, well at least none of use cases presented in the past were not covered by this approach.

@sbezverk
Copy link
Contributor

The same problem appeared to nft tool and the same assumptions were made, which makes them acceptable to me.

It is a public and shared library, what is acceptable to you might not work for others, let's find a solution which is not breaking the current behaviour and providing you with what you need for your use case.

conn.go Outdated Show resolved Hide resolved
@stapelberg
Copy link
Collaborator

I’m not sure yet whether we should support this behavior.

For robustness, programs should generally do idempotent operations where possible. Changing the program from its current “modify/modify” approach to a “modify, query/modify” approach lends itself better to making the program work correctly when interrupted.

Can you share some more details regarding what you’re trying to do?

@alexispires
Copy link
Contributor Author

Today there's only two way to delete or replace a rule with this library:

  • Using UserData and put a custom userland id (not standard), then fetch all rules and retrieve it
  • Fetch all rules and compare field by field to retrieve the right rule

IMO the right way is to get back the handle after flush. I can't tell much about what I'm trying to do (commercial product) but it's like a SDN application. I store in memory inside a container (so not problem with interruption it's almost stateless) all my rules to delete them after.

I'm trying to code it in a strong way by matching the sequence number of request (rule by rule) with sequence of the response wich contain handle. Maybe it can be stronger.

@alexispires
Copy link
Contributor Author

alexispires commented Jan 3, 2020

@stapelberg @sbezverk
I'm trying to improve my loop control and I'm not sure about the best way to accomplish it. By using strace I see that nftables do a select syscall to detect if a netlink message is going to be received. Do you think this is the right way ?

select(4, [3], NULL, NULL, {tv_sec=0, tv_usec=0}) = 1 (in [3], left {tv_sec=0, tv_usec=0})
recvmsg(......)
select(4, [3], NULL, NULL, {tv_sec=0, tv_usec=0}) = 0 (Timeout)

@alexispires
Copy link
Contributor Author

I made a proto with select, what do you think?

conn_linux.go (from github.com/mdlayher/netlink)

func (c *conn) Select() (int, error) {

	var fdSet unix.FdSet
	fdSet.Zero()
	fdSet.Set(c.s.FD())

	n, err := unix.Select(c.s.FD()+1, &fdSet, nil, nil, &unix.Timeval{})

	return n, err
}

conn.go (from github.com/google/nftables)

        smsg, err := conn.SendMessages(batch(cc.messages))

	if err != nil {
		return fmt.Errorf("SendMessages: %w", err)
	}

	// Retrieving of seq number associated to entities
	entitiesBySeq := make(map[uint32]Entity)
	for i, e := range cc.entities {
		entitiesBySeq[smsg[i].Header.Sequence] = e
	}

	// Trigger entities callback
	for i := 1; i > 0; i, _ = conn.Select() {
		rmsg, err := conn.Receive()

		if err != nil {
			return fmt.Errorf("Receive: %w", err)
		}

		for _, msg := range rmsg {
			if e, ok := entitiesBySeq[msg.Header.Sequence]; ok {
				e.HandleResponse(msg)
			}
		}
	}

conn.go Show resolved Hide resolved
conn.go Outdated Show resolved Hide resolved
conn.go Show resolved Hide resolved
@alexispires
Copy link
Contributor Author

alexispires commented Jan 6, 2020

I just added a more robust test suites (4096 rules added in // by 16 workers) with assert which rely on UserData to check Handle value.

rule.go Outdated Show resolved Hide resolved
@alexispires
Copy link
Contributor Author

Cleaner version below, but both works and are reliable.

diff --git a/conn.go b/conn.go
index 6a552d8..34cf0b9 100644
--- a/conn.go
+++ b/conn.go
@@ -17,7 +17,6 @@ package nftables
 import (
 	"fmt"
 	"sync"
-	"sync/atomic"
 
 	"github.com/google/nftables/expr"
 	"github.com/mdlayher/netlink"
@@ -41,7 +40,7 @@ type Conn struct {
 	sync.Mutex
 	put      sync.Mutex
 	messages []netlink.Message
-	entities map[int32]Entity
+	entities map[int]Entity
 	it       int32
 	err      error
 }
@@ -52,7 +51,6 @@ func (cc *Conn) Flush() error {
 	defer func() {
 		cc.messages = nil
 		cc.entities = nil
-		cc.it = 0
 		cc.Unlock()
 	}()
 	if len(cc.messages) == 0 {
@@ -71,7 +69,7 @@ func (cc *Conn) Flush() error {
 
 	cc.endBatch(cc.messages)
 
-	_, err = conn.SendMessages(cc.messages[:cc.it+1])
+	_, err = conn.SendMessages(cc.messages)
 
 	if err != nil {
 		return fmt.Errorf("SendMessages: %w", err)
@@ -104,36 +102,29 @@ func (cc *Conn) Flush() error {
 }
 
 // PutMessage store netlink message to sent after
-func (cc *Conn) PutMessage(msg netlink.Message) int32 {
+func (cc *Conn) PutMessage(msg netlink.Message) int {
 	cc.put.Lock()
 	defer cc.put.Unlock()
 
 	if cc.messages == nil {
-		cc.messages = make([]netlink.Message, 16)
-		cc.messages[0] = netlink.Message{
+		cc.messages = append(cc.messages, netlink.Message{
 			Header: netlink.Header{
 				Type:  netlink.HeaderType(unix.NFNL_MSG_BATCH_BEGIN),
 				Flags: netlink.Request,
 			},
 			Data: extraHeader(0, unix.NFNL_SUBSYS_NFTABLES),
-		}
-	}
-
-	i := atomic.AddInt32(&cc.it, 1)
-
-	if len(cc.messages) <= int(i) {
-		cc.messages = resize(cc.messages)
+		})
 	}
 
-	cc.messages[i] = msg
+	cc.messages = append(cc.messages, msg)
 
-	return i
+	return len(cc.messages) - 1
 }
 
 // PutEntity store entity to relate to netlink response
-func (cc *Conn) PutEntity(i int32, e Entity) {
+func (cc *Conn) PutEntity(i int, e Entity) {
 	if cc.entities == nil {
-		cc.entities = make(map[int32]Entity)
+		cc.entities = make(map[int]Entity)
 	}
 	cc.entities[i] = e
 }
@@ -214,23 +205,14 @@ func (cc *Conn) marshalExpr(e expr.Any) []byte {
 
 func (cc *Conn) endBatch(messages []netlink.Message) {
 
-	i := atomic.AddInt32(&cc.it, 1)
-
-	if len(cc.messages) <= int(i) {
-		cc.messages = resize(cc.messages)
-	}
+	cc.put.Lock()
+	defer cc.put.Unlock()
 
-	cc.messages[i] = netlink.Message{
+	cc.messages = append(cc.messages, netlink.Message{
 		Header: netlink.Header{
 			Type:  netlink.HeaderType(unix.NFNL_MSG_BATCH_END),
 			Flags: netlink.Request,
 		},
 		Data: extraHeader(0, unix.NFNL_SUBSYS_NFTABLES),
-	}
-}
-
-func resize(messages []netlink.Message) []netlink.Message {
-	new := make([]netlink.Message, cap(messages)*2)
-	copy(new, messages)
-	return new
+	})
 }

@sbezverk
Copy link
Contributor

sbezverk commented Jan 8, 2020

@alexispires could you please demonstrate that with this approach you can create rules with a reference to anonymous set. The reason for this request is the fact that creation of anonymous set must happen within the same transaction of a rule creation. It is not clear to me if it works with this solution.

@alexispires
Copy link
Contributor Author

@alexispires could you please demonstrate that with this approach you can create rules with a reference to anonymous set. The reason for this request is the fact that creation of anonymous set must happen within the same transaction of a rule creation. It is not clear to me if it works with this solution.

I'm not sure to understand, there are tests for this case (TestCreateUseAnonymousSet), you mean if handle ref is still retrieved ?

@sbezverk
Copy link
Contributor

sbezverk commented Jan 8, 2020

@alexispires The test case only tests the sequence of bytes that gets sent, not the actual kernel/netfilter programming, unfortunately travis does not allow this type of "e2e" tests. I was asking you a favour, to build an actual simple binary, based on your change that programs a rule with a reference to an anonymous set and then run 'nft list table {your table name}' to confirm it.

@alexispires
Copy link
Contributor Author

@sbezverk ok no problem, i'll come back to you

@alexispires
Copy link
Contributor Author

alexispires commented Jan 8, 2020

It looks good:

Before flush rule Handle is 0
After flush rule Handle is 3
table ip filter {
        chain nfpoc {
                type filter hook postrouting priority 0; policy accept;
                tcp dport { tftp, 1163 } drop # handle 3
        }
}

Code (also in my github, with binary included for x64: https://github.com/alexispires/nfpoc):

package main

import (
	"fmt"
	"os/exec"

	"github.com/google/nftables"
	"github.com/google/nftables/binaryutil"
	"github.com/google/nftables/expr"
	"golang.org/x/sys/unix"
)

func main() {

	c := &nftables.Conn{}

	table := c.AddTable(&nftables.Table{
		Family: nftables.TableFamilyIPv4,
		Name:   "filter",
	})

	chain := c.AddChain(&nftables.Chain{
		Name:     "nfpoc",
		Table:    table,
		Type:     nftables.ChainTypeFilter,
		Hooknum:  nftables.ChainHookPostrouting,
		Priority: nftables.ChainPriorityFilter,
	})

	set := &nftables.Set{
		Anonymous: true,
		Constant:  true,
		Table:     table,
		KeyType:   nftables.TypeInetService,
	}

	if err := c.AddSet(set, []nftables.SetElement{
		{Key: binaryutil.BigEndian.PutUint16(69)},
		{Key: binaryutil.BigEndian.PutUint16(1163)},
	}); err != nil {
		fmt.Printf("c.AddSet() failed: %s", err.Error())
	}

	r := c.AddRule(&nftables.Rule{
		Table: table,
		Chain: chain,
		Exprs: []expr.Any{
			// [ meta load l4proto => reg 1 ]
			&expr.Meta{Key: expr.MetaKeyL4PROTO, Register: 1},
			// [ cmp eq reg 1 0x00000006 ]
			&expr.Cmp{
				Op:       expr.CmpOpEq,
				Register: 1,
				Data:     []byte{unix.IPPROTO_TCP},
			},

			// [ payload load 2b @ transport header + 2 => reg 1 ]
			&expr.Payload{
				DestRegister: 1,
				Base:         expr.PayloadBaseTransportHeader,
				Offset:       2,
				Len:          2,
			},
			// [ lookup reg 1 set __set%d ]
			&expr.Lookup{
				SourceRegister: 1,
				SetName:        set.Name,
				SetID:          set.ID,
			},
			// [ immediate reg 0 drop ]
			&expr.Verdict{
				Kind: expr.VerdictDrop,
			},
		},
	})

	fmt.Printf("Before flush rule Handle is %d\n", r.Handle)

	if err := c.Flush(); err != nil {
		fmt.Printf(err.Error())
	}

	fmt.Printf("After flush rule Handle is %d\n", r.Handle)

	out, err := exec.Command("/usr/sbin/nft", "-a", "list", "table", "filter").Output()
	if err != nil {
		fmt.Printf(err.Error())
	}
	fmt.Printf("%s\n", out)
}

@alexispires
Copy link
Contributor Author

@stapelberg do you need more informations?

Copy link
Collaborator

@stapelberg stapelberg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks! First review pass done

conn.go Outdated Show resolved Hide resolved
conn.go Outdated Show resolved Hide resolved
conn.go Show resolved Hide resolved
conn.go Outdated Show resolved Hide resolved
conn.go Outdated Show resolved Hide resolved
rule.go Outdated Show resolved Hide resolved
return fmt.Errorf("Receive: %w", err)
// Retrieving of seq number associated to entities
entitiesBySeq := make(map[uint32]Entity)
for i, e := range cc.entities {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Instead of the separate entites book-keeping, why not just iterate over cc.messages and check if the type implements HandleResponse?

for _, msg := range cc.messages {
  m, ok := msg.(interface { HandleResponse(netlink.Message) })
  if !ok {
    continue
  }
  m.HandleResponse(rmsg)
}

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

HandleResponse is not a method that belong to netlink.Message. It's belong to entity.
For a given entity, it's perform some operations on it by using the netlink response. We have to connect an entity (like a Rule) and a netlink message in case of the slice of responses is unordered. I connect them with the netlink sequence.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay. I’m still not particularly happy about having to manage messages and entities, which are separate but connected.

Instead of the current approach, we could make messages take an interface type which has a NetlinkMessage method. putMessages would then wrap a netlink.Message in a type which implements NetlinkMessage, and can optionally implement HandleResponse as well.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@stapelberg I'm not sure to understand what do you have in mind but the main limit I have is that the SendMessages method of netlink library take a slice of netlink.Message and not a slice of netlink.Message's pointer. So the only data struct that give me the netlink sequence number is the slice of netlink.Message returned by sendMessages. So the slice of netlink.Message have to be connected in one way or another to something else. In your solution I have to connect the type that wrap netlink.Message with the netlink.Message that contains the seq even if both are the same netlink message.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I’m a little confused now: you’re saying the sequence numbers are only set in the []netlink.Message that’s returned by (*netlink.Conn).SendMessages, but in your PR, you’re discarding that result:

if _, err = conn.SendMessages(cc.messages); err != nil {

Is that a bug in your PR, or am I misunderstanding?

(Independently, related to your question: the fact that the netlink package takes a []netlink.Message instead of []*netlink.Message doesn’t limit us in which data structures we want to use. If need be, we can do a copy from one to the other. A netlink.Message’s Header is just a few ints, and copying the Body byte slice won’t need to copy its contents.)

nftables_test.go Outdated Show resolved Hide resolved
}
}
}

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is this test running many workers concurrently? In other words: why is not sufficient to test with 1 worker?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To test the behaviors with concurrency as exprimed here: #88 (comment)
IMO it's safer to keep it to identify regression on concurent access. But it's not specific on this part of lib, I think concurrency have to be tested on the whole lib.

conn.go Outdated Show resolved Hide resolved
@sbezverk
Copy link
Contributor

@alexispires @stapelberg Can we preserve old Flush() call the way it is now, but new changes implement in something like FlushWithCallback. I will not be able to switch to new functionality right away, maybe eventually after running some performance tests.

@alexispires
Copy link
Contributor Author

@alexispires @stapelberg Can we preserve old Flush() call the way it is now, but new changes implement in something like FlushWithCallback. I will not be able to switch to new functionality right away, maybe eventually after running some performance tests.

I let @stapelberg decide about it, but what do you suggest as API ?

@sbezverk
Copy link
Contributor

@alexispires I was suggesting to have old API and new API in parallel, it will help to preserve the backward compatibility and give us a chance to move to new API when confidence/timing is right.

TestDial nltest.Func // for testing only; passed to nltest.Dial
NetNS int // Network namespace netlink will interact with.
entities map[int]Entity
messagesMu sync.Mutex
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Move messagesMu and messages down and separate them with a newline (to form a group of fields):

type Conn struct {
  TestDial nltest.Func
  // …
  entities map[int]Entity

  messagesMu sync.Mutex
  messages []netlink.Message
}

}

// Trigger entities callback
msg, err := cc.checkReceive(conn)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why call checkReceive instead of just calling conn.Receive? It’s not clear to me whether you’re blocking until the next message is available, or only reading as long as messages are available. If the latter, that doesn’t seem sound to me—instead, we should read blockingly until we can identify the end of the response.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's to read as long as message are available. I'm not sure how to detect the end of the response without select.

Here an example of batch with echo flag:

sendmsg(3, {msg_name={sa_family=AF_NETLINK, nl_pid=0, nl_groups=00000000}, msg_namelen=12, msg_iov=[{iov_base=[{{len=20, type=NFNL_MSG_BATCH_BEGIN, flags=NLM_F_REQUEST, seq=3, pid=0}, "\x00\x00\x0a\x00"}, {{len=36, type=NFNL_SUBSYS_NFTABLES<<8|NFT_MSG_NEWTABLE, flags=NLM_F_REQUEST|NLM_F_ECHO, seq=4, pid=0}, "\x02\x00\x00\x00\x08\x00\x01\x00\x6e\x61\x74\x00\x08\x00\x02\x00\x00\x00\x00\x00"}, {{len=84, type=NFNL_SUBSYS_NFTABLES<<8|NFT_MSG_NEWCHAIN, flags=NLM_F_REQUEST|NLM_F_ECHO|NLM_F_CREATE, seq=5, pid=0}, "\x02\x00\x00\x00\x08\x00\x01\x00\x6e\x61\x74\x00\x0f\x00\x03\x00\x70\x72\x65\x72\x6f\x75\x74\x69\x6e\x67\x00\x00\x14\x00\x04\x80\x08\x00\x01\x00\x00\x00\x00\x00\x08\x00\x02\x00\x00\x00\x00\x00\x08\x00\x05\x00\x00\x00\x00\x01\x0b\x00\x07\x00\x66\x69\x6c\x74\x65\x72\x00\x00"}, {{len=84, type=NFNL_SUBSYS_NFTABLES<<8|NFT_MSG_NEWCHAIN, flags=NLM_F_REQUEST|NLM_F_ECHO|NLM_F_CREATE, seq=6, pid=0}, "\x02\x00\x00\x00\x08\x00\x01\x00\x6e\x61\x74\x00\x10\x00\x03\x00\x70\x6f\x73\x74\x72\x6f\x75\x74\x69\x6e\x67\x00\x14\x00\x04\x80\x08\x00\x01\x00\x00\x00\x00\x04\x08\x00\x02\x00\x00\x00\x00\x64\x08\x00\x05\x00\x00\x00\x00\x01\x0b\x00\x07\x00\x66\x69\x6c\x74\x65\x72\x00\x00"}, {{len=96, type=NFNL_SUBSYS_NFTABLES<<8|NFT_MSG_NEWSET, flags=NLM_F_REQUEST|NLM_F_ECHO|NLM_F_CREATE, seq=7, pid=0}, "\x02\x00\x00\x00\x08\x00\x01\x00\x6e\x61\x74\x00\x0c\x00\x02\x00\x5f\x5f\x73\x65\x74\x25\x64\x00\x08\x00\x03\x00\x00\x00\x00\x03\x08\x00\x04\x00\x00\x00\x00\x07\x08\x00\x05\x00\x00\x00\x00\x04\x08\x00\x0a\x00\x00\x00\x00\x04\x0c\x00\x09\x80\x08\x00\x01\x00\x00\x00\x00\x04\x0a\x00\x0d\x00\x00\x04\x02\x00\x00\x00\x00\x00"}, {{len=116, type=NFNL_SUBSYS_NFTABLES<<8|NFT_MSG_NEWSETELEM, flags=NLM_F_REQUEST|NLM_F_ECHO|NLM_F_CREATE, seq=7, pid=0}, "\x02\x00\x00\x00\x0c\x00\x02\x00\x5f\x5f\x73\x65\x74\x25\x64\x00\x08\x00\x04\x00\x00\x00\x00\x04\x08\x00\x01\x00\x6e\x61\x74\x00\x44\x00\x03\x80\x10\x00\x01\x80\x0c\x00\x01\x80\x08\x00\x01\x00\x54\x4d\x28\x84\x10\x00\x02\x80\x0c\x00\x01\x80\x08\x00\x01\x00\xb0\x1f\x35\x63\x10\x00\x03\x80\x0c\x00\x01\x80\x08\x00\x01\x00\x51\x13\x60\x94\x10\x00\x04\x80\x0c\x00\x01\x80\x08\x00\x01\x00\x8a\x64\x3e\x08"}, {{len=168, type=NFNL_SUBSYS_NFTABLES<<8|NFT_MSG_NEWRULE, flags=NLM_F_REQUEST|NLM_F_ECHO|NLM_F_EXCL|NLM_F_CREATE|NLM_F_APPEND, seq=8, pid=0}, "\x02\x00\x00\x00\x08\x00\x01\x00\x6e\x61\x74\x00\x0f\x00\x02\x00\x70\x72\x65\x72\x6f\x75\x74\x69\x6e\x67\x00\x00\x7c\x00\x04\x80\x34\x00\x01\x80\x0c\x00\x01\x00\x70\x61\x79\x6c\x6f\x61\x64\x00\x24\x00\x02\x80\x08\x00\x01\x00\x00\x00\x00\x01\x08\x00\x02\x00\x00\x00\x00\x01\x08\x00\x03\x00\x00\x00\x00\x0c\x08\x00\x04\x00\x00\x00\x00\x04\x30\x00\x01\x80\x0b\x00\x01\x00\x6c\x6f\x6f\x6b\x75\x70\x00\x00\x20\x00\x02\x80\x08\x00\x02\x00\x00\x00\x00\x01\x0c\x00\x01\x00\x5f\x5f\x73\x65\x74\x25\x64\x00\x08\x00\x04\x00\x00\x00\x00\x04\x14\x00\x01\x80\x0c\x00\x01\x00\x63\x6f\x75\x6e\x74\x65\x72\x00\x04\x00\x02\x80"}, {{len=20, type=NFNL_MSG_BATCH_END, flags=NLM_F_REQUEST, seq=9, pid=0}, "\x00\x00\x0a\x00"}], iov_len=624}], msg_iovlen=1, msg_controllen=0, msg_flags=0}, 0) = 624
select(4, [3], NULL, NULL, {tv_sec=0, tv_usec=0}) = 1 (in [3], left {tv_sec=0, tv_usec=0})
recvmsg(3, {msg_name={sa_family=AF_NETLINK, nl_pid=0, nl_groups=0x000040}, msg_namelen=12, msg_iov=[{iov_base={{len=104, type=NFNL_SUBSYS_NFTABLES<<8|NFT_MSG_NEWCHAIN, flags=0, seq=5, pid=3583}, "\x02\x00\x00\x07\x08\x00\x01\x00\x6e\x61\x74\x00\x0c\x00\x02\x00\x00\x00\x00\x00\x00\x00\x00\x01\x0f\x00\x03\x00\x70\x72\x65\x72\x6f\x75\x74\x69\x6e\x67\x00\x00\x14\x00\x04\x00\x08\x00\x01\x00\x00\x00\x00\x00\x08\x00\x02\x00\x00\x00\x00\x00\x08\x00\x05\x00\x00\x00\x00\x01\x0b\x00\x07\x00\x66\x69\x6c\x74\x65\x72\x00\x00\x08\x00\x06\x00\x00\x00\x00\x04"}, iov_len=4096}], msg_iovlen=1, msg_controllen=0, msg_flags=0}, 0) = 104
fstat(1, {st_dev=makedev(0, 22), st_ino=3, st_mode=S_IFCHR|0620, st_nlink=1, st_uid=1000, st_gid=5, st_blksize=1024, st_blocks=0, st_rdev=makedev(136, 0), st_atime=1579772816 /* 2020-01-23T10:46:56.597292734+0100 */, st_atime_nsec=597292734, st_mtime=1579772816 /* 2020-01-23T10:46:56.597292734+0100 */, st_mtime_nsec=597292734, st_ctime=1579772494 /* 2020-01-23T10:41:34.597292734+0100 */, st_ctime_nsec=597292734}) = 0
select(4, [3], NULL, NULL, {tv_sec=0, tv_usec=0}) = 1 (in [3], left {tv_sec=0, tv_usec=0})
recvmsg(3, {msg_name={sa_family=AF_NETLINK, nl_pid=0, nl_groups=0x000040}, msg_namelen=12, msg_iov=[{iov_base={{len=104, type=NFNL_SUBSYS_NFTABLES<<8|NFT_MSG_NEWCHAIN, flags=0, seq=6, pid=3583}, "\x02\x00\x00\x07\x08\x00\x01\x00\x6e\x61\x74\x00\x0c\x00\x02\x00\x00\x00\x00\x00\x00\x00\x00\x02\x10\x00\x03\x00\x70\x6f\x73\x74\x72\x6f\x75\x74\x69\x6e\x67\x00\x14\x00\x04\x00\x08\x00\x01\x00\x00\x00\x00\x04\x08\x00\x02\x00\x00\x00\x00\x64\x08\x00\x05\x00\x00\x00\x00\x01\x0b\x00\x07\x00\x66\x69\x6c\x74\x65\x72\x00\x00\x08\x00\x06\x00\x00\x00\x00\x00"}, iov_len=4096}], msg_iovlen=1, msg_controllen=0, msg_flags=0}, 0) = 104
select(4, [3], NULL, NULL, {tv_sec=0, tv_usec=0}) = 1 (in [3], left {tv_sec=0, tv_usec=0})
recvmsg(3, {msg_name={sa_family=AF_NETLINK, nl_pid=0, nl_groups=0x000040}, msg_namelen=12, msg_iov=[{iov_base={{len=100, type=NFNL_SUBSYS_NFTABLES<<8|NFT_MSG_NEWSET, flags=0, seq=7, pid=3583}, "\x02\x00\x00\x07\x08\x00\x01\x00\x6e\x61\x74\x00\x0b\x00\x02\x00\x5f\x5f\x73\x65\x74\x33\x00\x00\x0c\x00\x10\x00\x00\x00\x00\x00\x00\x00\x00\x09\x08\x00\x03\x00\x00\x00\x00\x03\x08\x00\x04\x00\x00\x00\x00\x07\x08\x00\x05\x00\x00\x00\x00\x04\x0a\x00\x0d\x00\x00\x04\x02\x00\x00\x00\x00\x00\x0c\x00\x09\x00\x08\x00\x01\x00\x00\x00\x00\x04"}, iov_len=4096}], msg_iovlen=1, msg_controllen=0, msg_flags=0}, 0) = 100
select(4, [3], NULL, NULL, {tv_sec=0, tv_usec=0}) = 1 (in [3], left {tv_sec=0, tv_usec=0})
recvmsg(3, {msg_name={sa_family=AF_NETLINK, nl_pid=0, nl_groups=0x000040}, msg_namelen=12, msg_iov=[{iov_base={{len=60, type=NFNL_SUBSYS_NFTABLES<<8|NFT_MSG_NEWSETELEM, flags=0, seq=0, pid=3583}, "\x02\x00\x00\x07\x08\x00\x01\x00\x6e\x61\x74\x00\x0b\x00\x02\x00\x5f\x5f\x73\x65\x74\x33\x00\x00\x14\x00\x03\x00\x10\x00\x01\x00\x0c\x00\x01\x00\x08\x00\x01\x00\x54\x4d\x28\x84"}, iov_len=4096}], msg_iovlen=1, msg_controllen=0, msg_flags=0}, 0) = 60
select(4, [3], NULL, NULL, {tv_sec=0, tv_usec=0}) = 1 (in [3], left {tv_sec=0, tv_usec=0})
recvmsg(3, {msg_name={sa_family=AF_NETLINK, nl_pid=0, nl_groups=0x000040}, msg_namelen=12, msg_iov=[{iov_base={{len=60, type=NFNL_SUBSYS_NFTABLES<<8|NFT_MSG_NEWSETELEM, flags=0, seq=0, pid=3583}, "\x02\x00\x00\x07\x08\x00\x01\x00\x6e\x61\x74\x00\x0b\x00\x02\x00\x5f\x5f\x73\x65\x74\x33\x00\x00\x14\x00\x03\x00\x10\x00\x01\x00\x0c\x00\x01\x00\x08\x00\x01\x00\xb0\x1f\x35\x63"}, iov_len=4096}], msg_iovlen=1, msg_controllen=0, msg_flags=0}, 0) = 60
select(4, [3], NULL, NULL, {tv_sec=0, tv_usec=0}) = 1 (in [3], left {tv_sec=0, tv_usec=0})
recvmsg(3, {msg_name={sa_family=AF_NETLINK, nl_pid=0, nl_groups=0x000040}, msg_namelen=12, msg_iov=[{iov_base={{len=60, type=NFNL_SUBSYS_NFTABLES<<8|NFT_MSG_NEWSETELEM, flags=0, seq=0, pid=3583}, "\x02\x00\x00\x07\x08\x00\x01\x00\x6e\x61\x74\x00\x0b\x00\x02\x00\x5f\x5f\x73\x65\x74\x33\x00\x00\x14\x00\x03\x00\x10\x00\x01\x00\x0c\x00\x01\x00\x08\x00\x01\x00\x51\x13\x60\x94"}, iov_len=4096}], msg_iovlen=1, msg_controllen=0, msg_flags=0}, 0) = 60
select(4, [3], NULL, NULL, {tv_sec=0, tv_usec=0}) = 1 (in [3], left {tv_sec=0, tv_usec=0})
recvmsg(3, {msg_name={sa_family=AF_NETLINK, nl_pid=0, nl_groups=0x000040}, msg_namelen=12, msg_iov=[{iov_base={{len=60, type=NFNL_SUBSYS_NFTABLES<<8|NFT_MSG_NEWSETELEM, flags=0, seq=0, pid=3583}, "\x02\x00\x00\x07\x08\x00\x01\x00\x6e\x61\x74\x00\x0b\x00\x02\x00\x5f\x5f\x73\x65\x74\x33\x00\x00\x14\x00\x03\x00\x10\x00\x01\x00\x0c\x00\x01\x00\x08\x00\x01\x00\x8a\x64\x3e\x08"}, iov_len=4096}], msg_iovlen=1, msg_controllen=0, msg_flags=0}, 0) = 60
select(4, [3], NULL, NULL, {tv_sec=0, tv_usec=0}) = 1 (in [3], left {tv_sec=0, tv_usec=0})
recvmsg(3, {msg_name={sa_family=AF_NETLINK, nl_pid=0, nl_groups=0x000040}, msg_namelen=12, msg_iov=[{iov_base={{len=204, type=NFNL_SUBSYS_NFTABLES<<8|NFT_MSG_NEWRULE, flags=0, seq=8, pid=3583}, "\x02\x00\x00\x07\x08\x00\x01\x00\x6e\x61\x74\x00\x0f\x00\x02\x00\x70\x72\x65\x72\x6f\x75\x74\x69\x6e\x67\x00\x00\x0c\x00\x03\x00\x00\x00\x00\x00\x00\x00\x00\x0a\x94\x00\x04\x00\x34\x00\x01\x00\x0c\x00\x01\x00\x70\x61\x79\x6c\x6f\x61\x64\x00\x24\x00\x02\x00\x08\x00\x01\x00\x00\x00\x00\x01\x08\x00\x02\x00\x00\x00\x00\x01\x08\x00\x03\x00\x00\x00\x00\x0c\x08\x00\x04\x00\x00\x00\x00\x04\x30\x00\x01\x00\x0b\x00\x01\x00\x6c\x6f\x6f\x6b\x75\x70\x00\x00\x20\x00\x02\x00\x0b\x00\x01\x00\x5f\x5f\x73\x65\x74\x33\x00\x00\x08\x00\x02\x00\x00\x00\x00\x01\x08\x00\x05\x00\x00\x00\x00\x00\x2c\x00\x01\x00\x0c\x00\x01\x00\x63\x6f\x75\x6e\x74\x65\x72\x00\x1c\x00\x02\x00\x0c\x00\x01\x00\x00\x00\x00\x00\x00\x00\x00\x00\x0c\x00\x02\x00\x00\x00\x00\x00\x00\x00\x00\x00"}, iov_len=4096}], msg_iovlen=1, msg_controllen=0, msg_flags=0}, 0) = 204
select(4, [3], NULL, NULL, {tv_sec=0, tv_usec=0}) = 0 (Timeout)

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How about setting NLM_F_ACK or NLM_F_ECHO on the batch start and batch end messages? That way, you should be able to read until you encounter the sequence number for the batch end message.

return fmt.Errorf("Receive: %w", err)
// Retrieving of seq number associated to entities
entitiesBySeq := make(map[uint32]Entity)
for i, e := range cc.entities {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay. I’m still not particularly happy about having to manage messages and entities, which are separate but connected.

Instead of the current approach, we could make messages take an interface type which has a NetlinkMessage method. putMessages would then wrap a netlink.Message in a type which implements NetlinkMessage, and can optionally implement HandleResponse as well.

@prologic
Copy link

prologic commented Sep 8, 2022

Damn I need this! Any changes we can get this merged? 🙏

@prologic
Copy link

prologic commented Sep 8, 2022

I'm not familiar enough with the codebase to help resolve the merge conflicts :/

@alexispires
Copy link
Contributor Author

There was some reluctance about it. I could continue this work if it's deemed useful.

@prologic
Copy link

prologic commented Sep 9, 2022

There was some reluctance about it. I could continue this work if it's deemed useful.

Please do 🙏 I could really use the improvement you set out to fix here 🙇‍♂️

I think the comments from @stapelberg should be easily addressable 👌

@prologic
Copy link

@alexispires Just an FYI, this is where I want to use this: https://git.mills.io/prologic/box/issues/18 so it would be awesome if you could revive this PR of yours, rebase it and hopefully we can get it merged soon 🤞 Let me know if I can help in any way, I'm not that familiar with the codebase (let alone nftables 🤣) but my Go is "okay" 😅

@alexispires
Copy link
Contributor Author

@alexispires Just an FYI, this is where I want to use this: https://git.mills.io/prologic/box/issues/18 so it would be awesome if you could revive this PR of yours, rebase it and hopefully we can get it merged soon 🤞 Let me know if I can help in any way, I'm not that familiar with the codebase (let alone nftables 🤣) but my Go is "okay" 😅

I'm still waiting opinion (if it changed) from the maintainers about this feature.

However you can fix your problem by doing as this:

  1. Maintain an unique id for each rule.
  2. When creating your rules, set the UserData field with the defined id.
  3. After the flush, fetch all rules.
  4. Retrieve the right rule by comparing the UserData field with the defined id.
  5. Retrieve the handle from the rule retrieved. After that you can return it, as doing here : https://git.mills.io/prologic/box/src/branch/master/nat/nat.go#L305

@prologic
Copy link

I'm still waiting opinion (if it changed) from the maintainers about this feature.

Is that @stapelberg ?

However you can fix your problem by doing as this:

  1. Maintain an unique id for each rule.
  2. When creating your rules, set the UserData field with the defined id.
  3. After the flush, fetch all rules.
  4. Retrieve the right rule by comparing the UserData field with the defined id.
  5. Retrieve the handle from the rule retrieved. After that you can return it, as doing here : https://git.mills.io/prologic/box/src/branch/master/nat/nat.go#L305

Ahh brilliant! I didn't know you could do this 😅

@prologic
Copy link

@alexispires If you at least rebase this PR I can also test it in my project 👌 I'm not sure I wouldn't screw up the merge conflicts myself 😂

@stapelberg
Copy link
Collaborator

It’s been a long time since I last looked at this, so reading up always takes some time. Maybe a summary in the first post with open questions/issues would be helpful.

I will note that I’m not against this feature per se, but we need to introduce it in a way that doesn’t cause breakage.

There seem to be still a number of unaddressed code review comments. To move this forward, rebasing the PR and replying to the comments would be the first step :)

@fungaren
Copy link

fungaren commented Apr 4, 2023

However you can fix your problem by doing as this:

  1. Maintain an unique id for each rule.
  2. When creating your rules, set the UserData field with the defined id.
  3. After the flush, fetch all rules.
  4. Retrieve the right rule by comparing the UserData field with the defined id.
  5. Retrieve the handle from the rule retrieved. After that you can return it, as doing here : https://git.mills.io/prologic/box/src/branch/master/nat/nat.go#L305

Ahh brilliant! I didn't know you could do this 😅

Anyway, this PR is important for us. As for now, use this code as described above:

var errNotFound = errors.New("not found")

// When a new rule is added or inserted, we do not know its handle.
// Use this method to get its handle if you want to delete it later.
func FindRule(c *nftables.Conn, rule *nftables.Rule) (*nftables.Rule, error) {
	rules, err := c.GetRules(rule.Table, rule.Chain)
	if err != nil {
		return nil, err
	}
	for _, r := range rules {
		if bytes.Equal(r.UserData, rule.UserData) {
			for idx, e := range r.Exprs {
				if _, ok := e.(*expr.Lookup); ok {
					continue
				} else if !reflect.DeepEqual(e, rule.Exprs[idx]) {
					goto next
				}
			}
			return r, nil
		}
	next:
	}
	return nil, errNotFound
}
   rule := conn.InsertRule(&nftables.Rule{...})
   err := conn.Flush()
   if err != nil {
      return nil, err
   }
   rule, err = FindRule(conn, rule)
   if err != nil {
      return nil, err
   }

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

6 participants