
Tailscale connection fails in both Docker container and new LXC container on Proxmox #1824

Open
adoolaard opened this issue Mar 12, 2024 · 7 comments
Labels: bug (Something isn't working)

Comments

@adoolaard

Bug description

I have successfully installed Headscale in a Docker container running on a Proxmox LXC container. I opened ports 80, 443, and 8080 in the Proxmox firewall, forwarding them to port 8080 on the Headscale container.
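
(For reference, that forwarding amounts to something like the following DNAT rules. This is only an illustration of the setup, not the literal rules; the Proxmox firewall manages these itself, and 192.168.1.4 is the LXC's LAN address as seen later in the logs.)

iptables -t nat -A PREROUTING -p tcp --dport 80 -j DNAT --to-destination 192.168.1.4:8080
iptables -t nat -A PREROUTING -p tcp --dport 443 -j DNAT --to-destination 192.168.1.4:8080
iptables -t nat -A PREROUTING -p tcp --dport 8080 -j DNAT --to-destination 192.168.1.4:8080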

I can successfully connect to Headscale using the Tailscale apps on my iPhone and Macbook. However, I am unable to connect from:

A Tailscale Docker container running on the same LXC container as Headscale.
A new LXC container where I installed Tailscale with apt install tailscale and ran tailscale up --login-server https://headscale.mydomain.com:443.
When attempting to connect from these containers, nothing happens for 15 minutes before the command times out. I have tried with and without the --authkey option.

For the Docker container, I have some logs, but they are not helpful in understanding the issue. I have tried using both the stable version of Headscale and "v0.23.0-alpha5." My iPhone and Macbook connect successfully with both versions, but Linux and Docker connections fail.
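
(As a basic reachability check from the affected machines, headscale's /health endpoint can be queried directly; recent headscale versions expose one. For example, from the new LXC container:)

curl -sv https://headscale.mydomain.com:443/health

(This at least confirms that DNS, TLS, and the port forwarding work from that network segment.)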

Environment

What I have tried:

Opened the necessary ports in the Proxmox firewall.
Used both stable and alpha versions of Headscale.
Tried connecting with and without the --authkey option.
Checked the Docker container logs (limited information).

Docker Compose configuration:

services:
  tailscale:
    container_name: tailscale
    #image: tailscale/tailscale:stable
    image: tailscale/tailscale:v1.58.2
    hostname: headtailscale
    volumes:
      - ./data:/var/lib/tailscale
      - /dev/net/tun:/dev/net/tun
    network_mode: "host"
    cap_add:
      - NET_ADMIN
      - NET_RAW
    environment:
      - TS_STATE_DIR=/var/lib/tailscale
      - TS_EXTRA_ARGS=--login-server=https://headscale.mydomain.nl --advertise-exit-node --advertise-routes=192.168.1.0/24 --accept-dns=true
      - TS_NO_LOGS_NO_SUPPORT=true
      - TS_AUTHKEY=<my_generated_key>
    restart: unless-stopped
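
(One thing worth noting from the logs below: tailscaled starts with --tun=userspace-networking even though /dev/net/tun is mounted, because the tailscale/tailscale image defaults to userspace networking unless TS_USERSPACE is set. To actually use the kernel TUN device, the environment would also need something like:)

      - TS_USERSPACE=false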

Docker logs:

docker compose up
[+] Running 4/4
 ✔ tailscale 3 layers [⣿⣿⣿]   0B/0B   Pulled                                                                             3.8s 
  ✔ c926b61bad3b Pull complete                                                                                       0.4s 
  ✔ 74bc9945fe25 Pull complete                                                                                       0.4s 
  ✔ 7726f8056532 Pull complete                                                                                       0.8s 
[+] Running 1/1
 ✔ Container tailscale Created                                                                                       7.6s 
Attaching to tailscale
tailscale | boot: 2024/03/12 18:13:34 Starting tailscaled
tailscale | boot: 2024/03/12 18:13:34 Waiting for tailscaled socket
tailscale | 2024/03/12 18:13:34 You have disabled logging. Tailscale will not be able to provide support.
tailscale | 2024/03/12 18:13:34 logtail started
tailscale | 2024/03/12 18:13:34 Program starting: v1.58.2-tb0e1bbb62, Go 1.21.5: []string{"tailscaled", "--socket=/tmp/tailscaled.sock", "--statedir=/var/lib/tailscale", "--tun=userspace-networking"}
tailscale | 2024/03/12 18:13:34 LogID: cc2ab974be4ad126eb5f7d816f99afa6b4c9055812fc865241444fb35aa137fa
tailscale | 2024/03/12 18:13:34 logpolicy: using system state directory "/var/lib/tailscale"
tailscale | 2024/03/12 18:13:34 wgengine.NewUserspaceEngine(tun "userspace-networking") ...
tailscale | 2024/03/12 18:13:34 dns: using dns.noopManager
tailscale | 2024/03/12 18:13:34 link state: interfaces.State{defaultRoute=eth0 ifs={br-249257de4702:[172.20.0.1/16 llu6] br-448c3b3b6366:[172.26.0.1/16 llu6] br-6de1989c1aba:[172.19.0.1/16 llu6] br-da5fa8b46807:[172.18.0.1/16 llu6] br-ffb655d9d88a:[172.21.0.1/16 llu6] docker0:[172.17.0.1/16] eth0:[192.168.1.4/24 llu6] wg0:[10.10.88.1/24]} v4=true v6=false}
tailscale | 2024/03/12 18:13:34 onPortUpdate(port=51777, network=udp6)
tailscale | 2024/03/12 18:13:34 magicsock: [warning] failed to force-set UDP read buffer size to 7340032: operation not permitted; using kernel default values (impacts throughput only)
tailscale | 2024/03/12 18:13:34 magicsock: [warning] failed to force-set UDP write buffer size to 7340032: operation not permitted; using kernel default values (impacts throughput only)
tailscale | 2024/03/12 18:13:34 onPortUpdate(port=46084, network=udp4)
tailscale | 2024/03/12 18:13:34 magicsock: [warning] failed to force-set UDP read buffer size to 7340032: operation not permitted; using kernel default values (impacts throughput only)
tailscale | 2024/03/12 18:13:34 magicsock: [warning] failed to force-set UDP write buffer size to 7340032: operation not permitted; using kernel default values (impacts throughput only)
tailscale | 2024/03/12 18:13:34 magicsock: disco key = d:ff5a60f30ec136bd
tailscale | 2024/03/12 18:13:34 Creating WireGuard device...
tailscale | 2024/03/12 18:13:34 Bringing WireGuard device up...
tailscale | 2024/03/12 18:13:34 Bringing router up...
tailscale | 2024/03/12 18:13:34 Clearing router settings...
tailscale | 2024/03/12 18:13:34 Starting network monitor...
tailscale | 2024/03/12 18:13:34 Engine created.
tailscale | 2024/03/12 18:13:34 pm: migrating "_daemon" profile to new format
tailscale | 2024/03/12 18:13:34 envknob: TS_NO_LOGS_NO_SUPPORT="true"
tailscale | 2024/03/12 18:13:34 logpolicy: using system state directory "/var/lib/tailscale"
tailscale | 2024/03/12 18:13:34 got LocalBackend in 18ms
tailscale | 2024/03/12 18:13:34 Start
tailscale | 2024/03/12 18:13:34 Backend: logs: be:cc2ab974be4ad126eb5f7d816f99afa6b4c9055812fc865241444fb35aa137fa fe:
tailscale | 2024/03/12 18:13:34 Switching ipn state NoState -> NeedsLogin (WantRunning=false, nm=false)
tailscale | 2024/03/12 18:13:34 blockEngineUpdates(true)
tailscale | 2024/03/12 18:13:34 health("overall"): error: state=NeedsLogin, wantRunning=false
tailscale | 2024/03/12 18:13:34 wgengine: Reconfig: configuring userspace WireGuard config (with 0/0 peers)
tailscale | 2024/03/12 18:13:34 wgengine: Reconfig: configuring router
tailscale | 2024/03/12 18:13:34 wgengine: Reconfig: configuring DNS
tailscale | 2024/03/12 18:13:34 dns: Set: {DefaultResolvers:[] Routes:{} SearchDomains:[] Hosts:0}
tailscale | 2024/03/12 18:13:34 dns: Resolvercfg: {Routes:{} Hosts:0 LocalDomains:[]}
tailscale | 2024/03/12 18:13:34 dns: OScfg: {}
tailscale | boot: 2024/03/12 18:13:34 Running 'tailscale up'
tailscale | 2024/03/12 18:13:34 Start
tailscale | 2024/03/12 18:13:34 control: client.Shutdown()
tailscale | 2024/03/12 18:13:34 control: client.Shutdown
tailscale | 2024/03/12 18:13:34 control: authRoutine: exiting
tailscale | 2024/03/12 18:13:34 control: mapRoutine: exiting
tailscale | 2024/03/12 18:13:34 control: updateRoutine: exiting
tailscale | 2024/03/12 18:13:34 control: Client.Shutdown done.
tailscale | 2024/03/12 18:13:34 Backend: logs: be:cc2ab974be4ad126eb5f7d816f99afa6b4c9055812fc865241444fb35aa137fa fe:
tailscale | 2024/03/12 18:13:34 Switching ipn state NoState -> NeedsLogin (WantRunning=true, nm=false)
tailscale | 2024/03/12 18:13:34 blockEngineUpdates(true)
tailscale | 2024/03/12 18:13:34 StartLoginInteractive: url=false
tailscale | 2024/03/12 18:13:34 control: client.Login(false, 2)
tailscale | 2024/03/12 18:13:34 control: LoginInteractive -> regen=true
tailscale | 2024/03/12 18:13:34 control: doLogin(regen=true, hasUrl=false)
tailscale | boot: 2024/03/12 18:14:34 failed to auth tailscale: failed to auth tailscale: tailscale up failed: signal: killed
tailscale exited with code 1
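
(For context on that last line: "signal: killed" comes from containerboot SIGKILLing the tailscale up child process when a fixed deadline expires, so any login that stalls longer than the timeout surfaces exactly like this. A minimal Go sketch of the pattern, not the actual containerboot source; the 60-second value and the login URL here are assumptions:)

package main

import (
	"context"
	"fmt"
	"os/exec"
	"time"
)

func main() {
	// Hypothetical fixed deadline; the real value lives in the
	// containerboot code linked further down in this thread.
	ctx, cancel := context.WithTimeout(context.Background(), 60*time.Second)
	defer cancel()

	// CommandContext sends SIGKILL to the child when ctx expires,
	// so a slow or stuck login fails with "signal: killed".
	cmd := exec.CommandContext(ctx, "tailscale",
		"--socket=/tmp/tailscaled.sock", "up",
		"--login-server=https://headscale.example.com")
	if err := cmd.Run(); err != nil {
		fmt.Println("failed to auth tailscale:", err)
	}
}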

I have searched for similar issues in the existing tickets and documentation but could not find a solution. Any help would be greatly appreciated!

adoolaard added the bug label Mar 12, 2024

adoolaard commented Mar 12, 2024

Update:

In the meantime, I have also installed Headscale on bare metal (in a Debian VM on Proxmox). I am experiencing the same issue there: I can connect my Mac and iPhone, but not Linux (via the tailscale up command or the Tailscale Docker container).


pax0707 commented Apr 3, 2024

Did you check this:

https://tailscale.com/kb/1130/lxc-unprivileged
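
For unprivileged LXC containers on Proxmox, the core of that page is giving the container access to /dev/net/tun, roughly these lines in the container's config (paraphrased from the KB; check the page for the exact, version-appropriate form):

lxc.cgroup2.devices.allow: c 10:200 rwm
lxc.mount.entry: /dev/net/tun dev/net/tun none bind,create=file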


sthomson-wyn commented Apr 4, 2024

We see this occasionally as well.

Normally restarting the headscale instance a couple of times fixes it.

This only happens after we update the routes of a subnet router, and only subnet routers are affected. Other clients can connect fine. (We are running the subnet routers in Docker containers as well.)

The tailscale up command fails with no output; it just times out: https://github.com/tailscale/tailscale/blob/ac574d875c7bf6ce16e744b47ce94b74622d550b/cmd/containerboot/main.go#L704

We're unable to find any relevant logs in headscale indicating an error. In fact, headscale logs that it authenticates the node correctly.

Our tailscale client containers are configured as follows (using the container declaration on GCP GCE):

  - name: test-container
    image: tailscale/tailscale:v1.56.1@sha256:196044d4d339f10bef9bdd639504fb359afbbb6486608f2bc9851aa1f2014e0b
    env:
    - name: TS_EXTRA_ARGS
      value: --login-server https://{headscale} --reset
    - name: TS_ROUTES
      value: {list of routes}
    - name: TS_USERSPACE
      value: 'false'
    - name: TS_STATE_DIR
      value: /var/headscale
    securityContext:
      privileged: true
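
(Since TS_USERSPACE is 'false' here, tailscaled should be using the kernel TUN device rather than userspace networking; privileged: true ought to make /dev/net/tun available, which can be sanity-checked from inside the container with: test -c /dev/net/tun && echo tun ok)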

@sthomson-wyn

[screenshot: headscale log excerpt for this node]

Here are the logs on headscale's side regarding the particular node.

@sthomson-wyn

I wonder if it's an issue of awkward timing, where a machine is declared to be offline while it is trying to authenticate.

@sthomson-wyn
Copy link

Some info on timing:

At 2024-04-04 10:14:50.000 headscale reports "Machine successfully authorized"
At 2024-04-04 10:14:51.000 headscale reports "Machine successfully authorized"
At 2024-04-04T14:14:51.078128612Z subnet router node reports "RegisterReq: got response; nodeKeyExpired=false, machineAuthorized=true; authURL=false"
At 2024-04-04 10:15:49.845 subnet router node reports "failed to auth tailscale: failed to auth tailscale: tailscale up failed: signal: killed"
{subnet router docker container restarts}
At 2024-04-04 10:15:50.000 headscale reports "Machine successfully authorized"
At 2024-04-04 10:15:50.454 subnet router node reports "RegisterReq: got response; nodeKeyExpired=false, machineAuthorized=true; authURL=false"
At 2024-04-04 10:15:59.000 headscale reports "Machine successfully authorized"
At 2024-04-04 10:16:50.106 subnet router node reports "failed to auth tailscale: failed to auth tailscale: tailscale up failed: signal: killed"
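
Note that each "failed to auth tailscale" line lands roughly 60 seconds after the corresponding "Machine successfully authorized", which lines up with the fixed timeout around tailscale up in the containerboot code linked above.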

This auth + timeout behaviour loops indefinitely until we restart headscale a couple of times.

It's kind of interesting that headscale reports "Machine successfully authorized" twice for each auth attempt.

Between that and the fact that this only happens to us intermittently, it feels like some kind of race condition.

@simonszu

I have the same problem as @adoolaard. Connecting from a Mac or iOS device is fine, and the connection from Linux looks fine on the server side:

2024-05-25T08:58:05+02:00 DBG Registering machine from API/CLI or auth callback expiresAt=<nil> nodeKey=[iYXXZ] registrationMethod=cli userName=simonszu
2024-05-25T08:58:05+02:00 DBG Registering machine machine=naugol machine_key=b5416c5da860668ded90885d6d7a283aec8bf96dcb427f9f70f304f273babc24 node_key=8985d7673375cec652fb5956a3010419a5f6056cf9ac0dee63362a132ecf9204 user=simonszu
2024-05-25T08:58:05+02:00 INF unary dur=21.093901 md={":authority":"/var/run/headscale/headscale.sock","content-type":"application/grpc","user-agent":"grpc-go/1.54.0"} method=RegisterMachine req={"key":"nodekey:8985d7673375cec652fb5956a3010419a5f6056cf9ac0dee63362a132ecf9204","user":"simonszu"} service=headscale.v1.HeadscaleService
2024-05-25T08:58:05+02:00 DBG go/src/headscale/hscontrol/protocol_common.go:665 > Client is registered and we have the current NodeKey. All clear to /map machine=naugol noise=true
2024-05-25T08:58:05+02:00 INF go/src/headscale/hscontrol/protocol_common.go:703 > Machine successfully authorized machine=naugol noise=true
2024-05-25T08:58:05+02:00 DBG A machine is entering polling via the Noise protocol handler=NoisePollNetMap machine=naugol
2024-05-25T08:58:05+02:00 DBG Client map request processed handler=PollNetMap machine=naugol noise=true omitPeers=true readOnly=false stream=false
2024-05-25T08:58:05+02:00 INF Client sent endpoint update and is ok with a response without peer list handler=PollNetMap machine=naugol noise=true
2024-05-25T08:58:05+02:00 DBG A machine is entering polling via the Noise protocol handler=NoisePollNetMap machine=naugol
2024-05-25T08:58:05+02:00 DBG Client map request processed handler=PollNetMap machine=naugol noise=true omitPeers=false readOnly=false stream=true
2024-05-25T08:58:05+02:00 INF Client is ready to access the tailnet handler=PollNetMap machine=naugol noise=true
2024-05-25T08:58:05+02:00 INF Sending initial map handler=PollNetMap machine=naugol noise=true
2024-05-25T08:58:05+02:00 INF Notifying peers handler=PollNetMap machine=naugol noise=true

However, the client side does not seem to get the callback/response, and therefore the login command hangs indefinitely. No idea why; any help would be appreciated.
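
(One way to narrow this down might be to watch tailscaled's own logs on the Linux client while the login hangs, assuming a systemd-based install from apt:

journalctl -u tailscaled -f

and, in another shell, retry the login; recent clients also accept a --timeout flag on tailscale up, so it fails fast instead of hanging.)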
