Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

net: Replace libnatpmp with built-in PCP+NATPMP implementation #30043

Open
wants to merge 17 commits into
base: master
Choose a base branch
from

Conversation

laanwj
Copy link
Member

@laanwj laanwj commented May 5, 2024

Continues #30005. Closes #17012..

This PR adds PCP (Port Control Protocol) from RFC6887. This adds, in addition to the existing IPv4 port mapping (which now uses PCP, with fallback to NAT-PMP), support for IPv6 pinholing-that is, opening a port on the firewall to make it reachable.

PCP, like NAT-PMP is a simple UDP-based protocol, and the implementation is self-contained, so this gets rid of lthe libnatpnp dependency without adding a new one. It should otherwise be a drop-in replacement. NAT-PMP fallback is implemented so this will not make router support worse.

For now it is disabled by default, though in the future (not in this PR) we could consider enable it by default to increase the number of connectable nodes without adding significant attack surface.

To test:

bitcoind -regtest -natpmp=1 -debug=net

(most of the changes in this PR are, ironically, removing the libnatpmp dependency and associated build system and build docs)

TODO

@DrahtBot
Copy link
Contributor

DrahtBot commented May 5, 2024

The following sections might be updated with supplementary metadata relevant to reviewers and maintainers.

Code Coverage

For detailed information about the code coverage, see the test coverage report.

Reviews

See the guideline for information on the review process.

Type Reviewers
Concept ACK Sjors, theuni, vasild

If your review is incorrectly listed, please react with 👎 to this comment and the bot will ignore it on the next update.

Conflicts

Reviewers, this pull request conflicts with the following ones:

  • #29852 ([WIP] build: remove need to test for endianness by fanquake)
  • #29641 (scripted-diff: Use LogInfo/LogDebug over LogPrintf/LogPrint by maflcko)
  • #29015 (kernel: Streamline util library by ryanofsky)

If you consider this pull request important, please also help to review the conflicting pull requests. Ideally, start with the one that should be merged first.

@DrahtBot
Copy link
Contributor

DrahtBot commented May 5, 2024

🚧 At least one of the CI tasks failed. Make sure to run all tests locally, according to the
documentation.

Possibly this is due to a silent merge conflict (the changes in this pull request being
incompatible with the current code in the target branch). If so, make sure to rebase on the latest
commit of the target branch.

Leave a comment here, if you need help tracking down a confusing failure.

Debug: https://github.com/bitcoin/bitcoin/runs/24606743714

@Sjors
Copy link
Member

Sjors commented May 6, 2024

Concept ACK

@theuni
Copy link
Member

theuni commented May 6, 2024

Whoa :)

(Concept ACK)

Copy link
Member

@Sjors Sjors left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

RFC6887 appendix A describes how a router that supports NAP-PMP but not PCP, will return UNSUPP_VERSION. A log message could encourage users to try -upnp instead (if not already enabled). Or upgrade their ancient router firmware :-)

Got distracted during review, will continue later. I'll look into how we can preserve a previously selected NatPMP checkbox in the GUI.

src/init.cpp Outdated Show resolved Hide resolved
@theuni
Copy link
Member

theuni commented May 6, 2024

Regarding the custom lib/self-impl discussion in #30005:
Taking a quick look at the implementation here, I think it's simple enough for us to maintain ourselves. If a nice canonical lib ever emerges we could always jump to it, as there are only a few basic functions and presumably we could probably switch them out close to 1:1.

That said, it is a little rough to review as

  • It needs to be compared to the letter of the spec, with the assumption that @laanwj is evil or has mis-implemented (I doubt that :)
  • It needs to be very defensive, though the attack surface seems quite minimal
  • It needs to be aware of real-world violators/benders/extenders of the spec (if any? I have no idea.)

But it seems worth the effort to me.

@laanwj
Copy link
Member Author

laanwj commented May 7, 2024

Rebased for #29984

@laanwj
Copy link
Member Author

laanwj commented May 7, 2024

That said, it is a little rough to review as
It needs to be compared to the letter of the spec, with the assumption that @laanwj is evil or has mis-implemented (I doubt that :)

If you're more comfortable comparing it against another implementation there's:

It needs to be aware of real-world violators/benders/extenders of the spec (if any? I have no idea.)

Miniupnpd's is, likely, the most common server implementation in the wild. It's been tested against that. i'm hoping people will test this on various routers and conditions, there will be inevitable edge cases to iron out.

If a nice canonical lib ever emerges we could always jump to it

Maybe, but at some point there's not much difference between implementing a simple request/reply protocol and using a library. At least the RFC is unambigiously documented, which can't be said of many FOSS ABI's. Also, @fanquake (and @gmaxwell in the past) has been hoping for a solution that doesn't introduce a library dependency.

src/util/pcp.cpp Outdated Show resolved Hide resolved
@luke-jr
Copy link
Member

luke-jr commented May 8, 2024

Would prefer this in two steps (add PCP, then remove NAT-PMP)

@laanwj
Copy link
Member Author

laanwj commented May 9, 2024

Would prefer this in two steps (add PCP, then remove NAT-PMP)

i'm not planning to do this, the build system commits are already set up this way, but doing it throughout would involve adding another setting in Qt just to remove it later. Same for adding a third mechanism in portmap.cpp. Agree with @Sjors that having a forest of different port mapping settings is confusing to the user.

Edit: no longer relevant now that this implements PCP with NAT-PMP fallaback.

Copy link
Member

@Sjors Sjors left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Tested on Ubuntu 24.04 I'm getting in bound via IPv4 and IPv6.

When a GUI user previously had NAT-PMP selected there would be a "natpmp": true field in settings.json.

We need to rename this to "tcp" in order for the box to remain checked.

Maybe @ryanofsky has an idea how to do this elegantly?

A little hack that works is to put the following at the end of ReadSettings() in settings.cpp:

    // Migrate settings:
    if (values.contains("natpmp")) {
        auto el = values.extract("natpmp");
        el.key() = "pcp";
        values.insert(std::move(el));
    }

A bit nicer would be to have a MigrateSettingsFile() that's called between ReadSettingsFile and WriteSettingsFile in init.cpp.

(not to be confused with OptionsModel::checkAndMigrate() which is called before settings.json is read)


Maybe you can split src/util/netif.h off into a separate PR, along with b06edec.

src/qt/optionsdialog.cpp Outdated Show resolved Hide resolved
src/qt/forms/optionsdialog.ui Outdated Show resolved Hide resolved
@Sjors
Copy link
Member

Sjors commented May 9, 2024

In terms of which IPv6 address to use, my understanding is this:

  1. In the olden days the IPv6 address contained part of the MAC address. This ensured uniqueness, but was bad for mobile device privacy; no matter where you went, part of your IP address was constant. Such addresses always (?) contain 0xfffe. See https://datatracker.ietf.org/doc/html/rfc7707
  2. A new standard was introduced that uses a hash of both the MAC address and the prefix, ensuring it's stable if you stay on the same network, but changes when you move. See https://datatracker.ietf.org/doc/html/rfc7217#page-19
  3. There's also temporary addresses, which are rotated every couple of hours. See https://datatracker.ietf.org/doc/html/rfc8981

If we were able to tell which one one is which, then I think we should pick only one, in order of preference: (2), (3), (1). This reflects our desire to actually get inbound connections, even after a shutdown, while at the same time not doxxing ourselves when on the move (mainly for laptops, perhaps mobile in the future).

We can easily detect type (1) by looking for 0xfffe at the right position (and then least prefer it).

I believe you can detect (2) by looking for IFA_F_STABLE_PRIVACY in flags of the inet6_ifaddr struct. It seems getifaddrs doesn't have access to that. Neither does /proc/net/if_inet6 since https://patchwork.ozlabs.org/project/netdev/patch/1386680189-7852-1-git-send-email-jiri@resnulli.us/

Afaik there's also no guarantee about the order of results.

So maybe we should just pick one at random rather than announce all three. This also seems orthogonal to PCP.

@Sjors
Copy link
Member

Sjors commented May 27, 2024

Actually this might be an existing bug: miniupnp/miniupnp@7bd0877

I'll see if I can convince OPNsense to ship >= 2.3.6, tracking at opnsense/plugins#4003.

@laanwj
Copy link
Member Author

laanwj commented May 27, 2024

That does look like an issue that would affect renewal, but i'm not sure it would cause this specific issue, if i understand correctly it'd bump the renewal timestamp so far it would live forever. It wouln't make it complain about the port being used. Unless it's an old mapping that sticks around. But that wouldn'e explain why it only happens on renewal.

@Sjors
Copy link
Member

Sjors commented May 27, 2024

generating a new nonce every time for every mapping

That doesn't seem like a good idea. IIUC the nonce is used to recognise the requesting application, and to distinguish a renew request from some other app trying to use the same port.

@laanwj
Copy link
Member Author

laanwj commented May 27, 2024

That doesn't seem like a good idea. IIUC the nonce is used to recognise the requesting application, and to distinguish a renew request from some other app trying to use the same port.

It's not supposed to be a good idea, but i wonder if it works around the issue with your router, if they implemented the protocol wrongly. i honestly have no idea what could be the problem here i'm just guessing.

@Sjors
Copy link
Member

Sjors commented May 27, 2024

I thought perhaps the issue was that miniupnpd checks whether at least half the lease time went by and would refuse if not. So then if the lease time was too far in the future, it wouldn't allow renewal.

But looking through the source code it doesn't appear to care about that. There's a few places that can trigger a PCP_ERR_NO_RESOURCES reply, but so far those don't offer an obvious explanation.


The pinhole code is mostly seperate from port mappings. Depending on pcp_msg_info->is_fw it will call CreatePCPMap_FW for an IPv6 pinhole. My guess is this function doesn't recognise the renewal attempt as such, treats it as a new request and then fails. I need to figure out how to read the router log messages.


Figured out how to get verbose logging (opnsense/plugins#4004).

It does log updating pinhole to before the failure, so it does find the existing route and knows that it should update. The line after that log statement calls upnp_update_inboundpinhole and interprets any failure as PCP_ERR_NO_RESOURCES. There's only two ways that can fail:

  1. If #if defined(USE_PF) || defined(USE_NETFILTER) is false, because it'll return -42; /* not implemented */. But upnp_find_inboundpinhole has the same guard, so that can't be it.

  2. If update_pinhole fails, which can only fail if get_pinhole fails. That implies that the uid returned by find_pinhole is wrong.

I'm probably still missing something... I should test if deleting an IPv6 pinhole does work.


Deleting a pinhole succeeds. The log message is wrong (PCP: UDP port 8 mapping removed), which is an unrelated bug miniupnp/miniupnp#743.

I also tried opening only a single pinhole, to rule out some race condition. But that doesn't make a difference.


Anyway, I'm fairly sure it's not a bug in this PR.

src/util/pcp.cpp Outdated Show resolved Hide resolved
@laanwj
Copy link
Member Author

laanwj commented May 28, 2024

Thanks for investigating in detail.
FWIW, i'm not seeing that problem on TurrisOS (openwrt) with miniupnpd 2.3.3. Renewals work fine, both with 20 minute and 5 minute reannounce period. Both for IPv4 and IPv6. So it's either something that was solved in the meantime, or something different in your networking situation.

Makes the name of the constant (approximately) match the
behavior again.
@Sjors
Copy link
Member

Sjors commented May 28, 2024

Both our setups are at miniupnpd 2.3.3; the plugin description incorrectly says 2.3.1: opnsense/plugins#4003 (comment)

So perhaps there's a difference in how miniupnpd controls the router / firewall itself, which causes the renewal to fail on my end. This is above my pay grade...

@laanwj
Copy link
Member Author

laanwj commented May 28, 2024

i've been trying to build the package myself, and there's miniupnpd-nftables and miniupnpd-iptables, do you know which one you have? i see i have -iptables installed at the moment, it appears.

Edit: i installed the -nftables variant and rebooted, and it's giving me NO_RESOURCES even on first rquest, for IPv4 and IPv6! Restored the -iptables variant and it worked again. No idea if it's the same issue of course. May just be a kernel issue.

@Sjors
Copy link
Member

Sjors commented May 28, 2024

No idea. OPNsense is built on FreeBSD and I think the plugin just relies on whatever miniupnpd comes with the OS:

# pkg info miniupnpd
miniupnpd-2.3.3_3,1
Name           : miniupnpd
Version        : 2.3.3_3,1
Installed on   : Thu May  2 10:44:17 2024 CEST
Origin         : net/miniupnpd
Architecture   : FreeBSD:13:amd64
Prefix         : /usr/local
Categories     : net
Licenses       : BSD3CLAUSE
Maintainer     : squat@squat.no
WWW            : http://miniupnp.free.fr/
Comment        : UPnP IGD implementation which uses pf
Options        :
	CHECK_PORTINUSE: on
	IPV6           : on
	LEASEFILE      : off
	UPNP_IGDV2     : off
	UPNP_STRICT    : off
Shared Libs required:
	libssl.so.12
	libpfctl.so.0
	libcrypto.so.12
Annotations    :
	FreeBSD_version: 1302001
	cpe            : cpe:2.3:a:miniupnp_project:miniupnpd:2.3.3:::::freebsd13:x64:3
	repo_type      : binary
	repository     : OPNsense
Flat size      : 160KiB
Description    :
Mini UPnPd is a lightweight implementation of a UPnP IGD daemon. This is
supposed to be run on your gateway machine to allow client systems to redirect
ports and punch holes in the firewall.

I installed 2.3.6 from source and got the same NO_RESOURCES error upon IPv6 pinhole renewal. Tried 5fcf0c281fd4e3fa3f32114824c1dee8f78cca03 for good measure. Ditto fail.


Update: @laanwj figured it out pinned it down: miniupnp/miniupnp#747

MaxJitphanu

This comment was marked as spam.

Co-authored-by: Vasil Dimov <vd@FreeBSD.org>
Comment on lines +28 to +33
const int s{socket(AF_NETLINK, SOCK_DGRAM, NETLINK_ROUTE)};
if (s < 0) {
LogPrintLevel(BCLog::NET, BCLog::Level::Error, "socket(AF_NETLINK): %s\n", NetworkErrorString(errno));
return std::nullopt;
}
Sock sock{static_cast<SOCKET>(s)};
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Continuing the discussion from #30043 (comment), so that messages are grouped together, not scattered in the main PR thread.

The first two commits of #26812 would make it possible to test and fuzz how this code interacts with a router.

So ive been thinking about this, do we have a mockable way to do std::optional socket(int domain, int type, int protocol)? ...

Almost. Right now we have the "pointer"

std::function<std::unique_ptr<Sock>(const sa_family_t&)> CreateSock;

which in the real code points to the function

std::unique_ptr<Sock> CreateSockOS(sa_family_t address_family);

Tests that wish to mock the sockets redirect the pointer to another one CreateSockWhateverMockedStuff().

That address_family is the first argument to the socket(2) syscall. What's needed is to extend CreateSock*() with the other two arguments - type and protocol. I can do that.

@Sjors, are you ACK on the first 2 commits of #26812? If yes, should I create a separate PR with those 2 commits?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@vasild I haven't had a chance to thoroughly review the second commit. Making a separate PR could help.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

SGTM.

That address_family is the first argument to the socket(2) syscall. What's needed is to extend CreateSock*() with the other two arguments - type and protocol. I can do that.

Yes, for specifying UDP (and NETLINK, but i do not intend to make a test framework for that) it would need all three arguments.

Copy link
Contributor

@vasild vasild May 30, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Extended CreateSock() in the topmost commit in https://github.com/vasild/bitcoin/commits/extend_CreateSock/.
Separate PR or include in this PR?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This PR is already quite large, so let's make a fresh one that this can rebase on if needed.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agree adding fuzzing/testing makes sense as a follow-up PR. It's already big enough, and also i don't want to interfere with review by doing more active development here than address review comments.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Opened #30202 "netbase: extend CreateSock() to support creating arbitrary sockets"

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Making a separate PR could help

Done, the first two commits from #26812 extracted into a separate PR: #30205

Copy link
Member

@Sjors Sjors May 30, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I got a bit confused about these two new PRs, but I guess they are orthogonal.

@DrahtBot
Copy link
Contributor

🚧 At least one of the CI tasks failed. Make sure to run all tests locally, according to the
documentation.

Possibly this is due to a silent merge conflict (the changes in this pull request being
incompatible with the current code in the target branch). If so, make sure to rebase on the latest
commit of the target branch.

Leave a comment here, if you need help tracking down a confusing failure.

Debug: https://github.com/bitcoin/bitcoin/runs/25552745115

this should remove integer comparison warnings for all platforms.
@Sjors
Copy link
Member

Sjors commented May 30, 2024

I was able to build b4535d4 on my BSD 13.2 VM (errors and warnings are gone).

Opening ports fails, maybe because it's a VM, but "Address family not supported by protocol family" seems an odd error for that.

2024-05-23T01:58:47Z [net:error] socket(AF_NETLINK): Address family not supported by protocol family (47)
2024-05-23T01:58:47Z [net] portmap: Could not determine IPv4 default gateway
2024-05-23T01:58:47Z [net:error] socket(AF_NETLINK): Address family not supported by protocol family (47)
2024-05-23T01:58:47Z addcon thread start
2024-05-23T01:58:47Z [net] portmap: Could not determine IPv6 default gateway

On BSB 14.0 it seems to work fine:

2024-05-27T17:37:34Z mapport thread start
2024-05-27T17:37:34Z [net] portmap: gateway [IPv4]: 10.0.2.2
2024-05-27T17:37:34Z [net] pcp: Requesting port mapping for addr 0.0.0.0 port 38333 from gateway 10.0.2.2
2024-05-27T17:37:34Z Bound to [::]:38333
2024-05-27T17:37:34Z [net] pcp: Internal address after connect: 10.0.2.15
2024-05-27T17:37:34Z Bound to 0.0.0.0:38333
2024-05-27T17:37:35Z [net] pcp: Timeout
2024-05-27T17:37:35Z [net] pcp: Retrying (1)
2024-05-27T17:37:36Z [net] pcp: Timeout
2024-05-27T17:37:36Z [net] pcp: Retrying (2)
2024-05-27T17:37:37Z [net] pcp: Timeout
2024-05-27T17:37:37Z [net] pcp: Giving up after 3 tries

(it doesn't actually get a mapping, but that's expected because VirtualBox doesn't come with PCP, afaik)

@laanwj
Copy link
Member Author

laanwj commented May 30, 2024

Thanks for testing. Sounds like AF_NETLINK is not actually supported on that kernel, even though the necessary stuff is in the headers.

On BSB 14.0 it seems to work fine:

So let's bump the minimum to FreeBSD 14.0 instead?

(it doesn't actually get a mapping, but that's expected because VirtualBox doesn't come with PCP, afaik)

It'd likely work if you set VirtualBox's VM networking to bridged instead of local NAT.

On 13.2 it compiles but doesn't actually work at runtime due to
AF_NETLINK error.
@Sjors
Copy link
Member

Sjors commented May 30, 2024

With a bridged network it indeed works.

Also, as suggested by @vasild, doing kldload /boot/kernel/netlink.ko makes things work on FreeBSD 13.2.

@Self-Hosting-Group
Copy link

Maybe too late or has been looked at. A PCP/NAT-PMP library https://github.com/libpcp/pcp

@laanwj
Copy link
Member Author

laanwj commented Jun 1, 2024

Maybe too late or has been looked at. A PCP/NAT-PMP library https://github.com/libpcp/pcp

Yes, some revievers have been looking at it as reference/comparison.

This has been said before but: implementing a self-contained version of the RFCs here is intentional, we don't want to introduce a dependency on a third-party library. It shoudn't be needed for a straightforward fixed packet-based protocol. The intent is to have code that integrates with bitcoin core's frameworks for logging, networking and testing/fuzzing. At some point we want to be confident enough about it to enable it by default.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
Development

Successfully merging this pull request may close these issues.

Add IPv6 pinhole support using UPnP / NAT-PMP