Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Starting new configured images hangs for > 1min #3031

Open
maggie44 opened this issue Feb 15, 2023 · 1 comment
Open

Starting new configured images hangs for > 1min #3031

maggie44 opened this issue Feb 15, 2023 · 1 comment

Comments

@maggie44
Copy link

maggie44 commented Feb 15, 2023

https://balena.zulipchat.com/#narrow/stream/345889-balena-io.2Fos/topic/Delay.20in.20joining.20the.20Cloud.20on.20first.20boot/near/327504498

RPI 4 (balenaOS 2.108.27) appears to hang before joining the Cloud on first boot when a new image is configured. Confirmed this is the case both when using balena os config and when downloading configured images from the Cloud.

Feb 14 17:11:43 937d7c5 balena-supervisor[2135]: inactive
Feb 14 17:11:43 937d7c5 systemd[1]: balena-supervisor.service: Killing process 1863 (balena) with signal SIGKILL.
Feb 14 17:11:43 937d7c5 systemd[1]: balena-supervisor.service: Killing process 1858 (balena) with signal SIGKILL.
Feb 14 17:11:43 937d7c5 systemd[1]: balena-supervisor.service: Killing process 1854 (balena) with signal SIGKILL.
Feb 14 17:11:43 937d7c5 systemd[1]: balena-supervisor.service: State 'stop-sigterm' timed out. Killing.

Feb 14 17:10:35 937d7c5 healthdog[2120]: try: 8, refid: 823CCC0A, correction: 0.000000000, skew: 1000000.000
Feb 14 17:10:35 937d7c5 healthdog[2120]: try: 7, refid: 00000000, correction: 0.000000000, skew: 0.000
Feb 14 17:10:35 937d7c5 healthdog[2120]: try: 6, refid: 00000000, correction: 0.000000000, skew: 0.000
Feb 14 17:10:35 937d7c5 healthdog[2120]: try: 5, refid: 00000000, correction: 0.000000000, skew: 0.000
Feb 14 17:10:35 937d7c5 healthdog[2120]: try: 4, refid: 00000000, correction: 0.000000000, skew: 0.000
Feb 14 17:10:35 937d7c5 healthdog[2120]: try: 3, refid: 00000000, correction: 0.000000000, skew: 0.000
Feb 14 17:10:35 937d7c5 healthdog[2120]: try: 2, refid: 00000000, correction: 0.000000000, skew: 0.000
Feb 14 17:10:35 937d7c5 healthdog[2120]: try: 1, refid: 00000000, correction: 0.000000000, skew: 0.000
Feb 14 17:10:35 937d7c5 healthdog[2108]: 2023-02-14T17:10:35Z System clock was stepped by 1.087356 seconds
Feb 14 17:10:34 937d7c5 healthdog[2108]: 2023-02-14T17:10:34Z System clock wrong by 1.087356 seconds
Feb 14 17:10:34 937d7c5 healthdog[2108]: 2023-02-14T17:10:34Z Selected source 130.60.204.10 (1.resinio.pool.ntp.org)
Feb 14 17:10:32 937d7c5 nm-dispatcher[2129]: Setting NTP source on/offline status (connectivity-change FULL).
Feb 14 17:10:32 937d7c5 dbus-daemon[1514]: [system] Successfully activated service 'org.freedesktop.nm_dispatcher'
Feb 14 17:10:32 937d7c5 dbus-daemon[1514]: [system] Activating via systemd: service name='org.freedesktop.nm_dispatcher' unit='dbus-org.freedesktop.nm-dispatcher.service' requested by ':1.3' (uid=0 pid=>
Feb 14 17:10:32 937d7c5 NetworkManager[1825]: <info>  [1676394632.8895] manager: NetworkManager state is now CONNECTED_GLOBAL

Feb 14 17:10:29 937d7c5 systemd[1]: Failed to start balena-supervisor.json watcher service.
Feb 14 17:10:29 937d7c5 systemd[1]: balena-supervisor-conf.service: Failed with result 'exit-code'.
Feb 14 17:10:29 937d7c5 systemd[1]: balena-supervisor-conf.service: Main process exited, code=exited, status=1/FAILURE
Feb 14 17:10:29 937d7c5 os-config[1870]: Awaiting balena-supervisor.service to exit...
Feb 14 17:10:28 937d7c5 systemctl[1846]: Job for balena-supervisor.service canceled.
Feb 14 17:10:28 937d7c5 os-config[1870]: Stopping balena-supervisor.service...

It seems the systemd attempt to stop balena-supervisor.service hangs.

balena-supervisor.service:

[Unit]
Description=Balena supervisor
Requires=\
    resin\x2ddata.mount \
    balena-device-uuid.service \
    os-config-devicekey.service \
    bind-etc-balena-supervisor.service \
    extract-balena-ca.service
Wants=\
    migrate-supervisor-state.service
After=\
    balena.service \
    resin\x2ddata.mount \
    balena-device-uuid.service \
    os-config-devicekey.service \
    bind-etc-systemd-system-resin.target.wants.service \
    bind-etc-balena-supervisor.service \
    migrate-supervisor-state.service \
    extract-balena-ca.service
Wants=balena.service
ConditionPathExists=/etc/balena-supervisor/supervisor.conf

[Service]
Type=simple
Restart=always
RestartSec=10s
WatchdogSec=180
SyslogIdentifier=balena-supervisor
EnvironmentFile=/etc/balena-supervisor/supervisor.conf
EnvironmentFile=-/tmp/update-supervisor.conf
ExecStartPre=-/usr/bin/balena stop resin_supervisor
ExecStartPre=-/usr/bin/balena stop balena_supervisor
ExecStartPre=/bin/systemctl is-active balena.service
ExecStart=/usr/bin/healthdog --healthcheck=/usr/lib/balena-supervisor/balena-supervisor-healthcheck /usr/bin/start-balena-supervisor
ExecStop=-/usr/bin/balena stop balena_supervisor

[Install]
WantedBy=multi-user.target
Alias=resin-supervisor.service

It attempts to call balena stop balena_supervisor before balenad has started. There shouldn't be any reason for that to hang though, but instead return an immediate error that the socket isn't available. If balenad was up, then the default timeout on killing docker containers is around 10 seconds, so assuming it isn't reaching as far as balenad and making a request.

Perhaps instead, it is the termination of the healthdog service that is causing the hang, or one of the dependant after or want processes. Healthdog is connected to a tree of shell scripts.

That is as far as I have got with the investigation.

Full logs:

Feb 14 17:12:00 937d7c5 balena-supervisor[2300]: Error response from daemon: No such container: resin_supervisor
Feb 14 17:12:00 937d7c5 balenad[2137]: time="2023-02-14T17:12:00.813653215Z" level=info msg="API listen on [::]:2375"
Feb 14 17:12:00 937d7c5 balenad[2137]: time="2023-02-14T17:12:00.780013159Z" level=info msg="API listen on /var/run/balena-engine.sock"
Feb 14 17:12:00 937d7c5 balenad[2137]: time="2023-02-14T17:12:00.766651178Z" level=info msg="API listen on /var/run/balena.sock"
Feb 14 17:12:00 937d7c5 balenad[2137]: time="2023-02-14T17:12:00.756384511Z" level=info msg="API listen on /run/balena-engine.sock"
Feb 14 17:12:00 937d7c5 balenad[2137]: time="2023-02-14T17:12:00.655560937Z" level=info msg="Daemon has completed initialization"
Feb 14 17:12:00 937d7c5 balenad[2137]: time="2023-02-14T17:12:00.655352511Z" level=info msg="Docker daemon" commit=13db38c82bdb056f013f5497b0662ad34ffb98f7 graphdriver(s)=overlay2 version=20.10.17
Feb 14 17:12:00 937d7c5 balenad[2137]: time="2023-02-14T17:12:00.480075937Z" level=info msg="Loading containers: done."
Feb 14 17:12:00 937d7c5 NetworkManager[1825]: <info>  [1676394720.3150] manager: (balena0): new Bridge device (/org/freedesktop/NetworkManager/Devices/6)
Feb 14 17:12:00 937d7c5 systemd-udevd[2213]: Using default interface naming scheme 'v250'.
Feb 14 17:12:00 937d7c5 kernel: Bridge firewalling registered
Feb 14 17:12:00 937d7c5 balenad[2137]: time="2023-02-14T17:12:00.091054178Z" level=info msg="Loading containers: start."
Feb 14 17:12:00 937d7c5 balenad[2137]: time="2023-02-14T17:12:00.090206752Z" level=warning msg="Your kernel does not support cgroup blkio weight_device"
Feb 14 17:12:00 937d7c5 balenad[2137]: time="2023-02-14T17:12:00.090160030Z" level=warning msg="Your kernel does not support cgroup blkio weight"
Feb 14 17:12:00 937d7c5 balenad[2137]: time="2023-02-14T17:12:00.090052956Z" level=warning msg="Your kernel does not support CPU realtime scheduler"
Feb 14 17:11:59 937d7c5 balenad[2137]: time="2023-02-14T17:11:59.945015197Z" level=info msg="Storage migration skipped: %!s(<nil>)"
Feb 14 17:11:59 937d7c5 balenad[2137]: time="2023-02-14T17:11:59.942876289Z" level=warning msg="Configured runtime \"runc\" is deprecated and will be removed in the next release"
Feb 14 17:11:59 937d7c5 balenad[2200]: time="2023-02-14T17:11:59.810888345Z" level=info msg="containerd successfully booted in 0.111913s"
Feb 14 17:11:59 937d7c5 balenad[2200]: time="2023-02-14T17:11:59.810591456Z" level=info msg=serving... address=/var/run/balena-engine/containerd/balena-engine-containerd.sock
Feb 14 17:11:59 937d7c5 balenad[2200]: time="2023-02-14T17:11:59.810043771Z" level=info msg=serving... address=/var/run/balena-engine/containerd/balena-engine-containerd.sock.ttrpc
Feb 14 17:11:59 937d7c5 balenad[2200]: time="2023-02-14T17:11:59.809472382Z" level=info msg=serving... address=/var/run/balena-engine/containerd/balena-engine-containerd-debug.sock
Feb 14 17:11:59 937d7c5 balenad[2200]: time="2023-02-14T17:11:59.808258901Z" level=info msg="loading plugin \"io.containerd.grpc.v1.introspection\"..." type=io.containerd.grpc.v1
Feb 14 17:11:59 937d7c5 balenad[2200]: time="2023-02-14T17:11:59.806577104Z" level=info msg="loading plugin \"io.containerd.grpc.v1.version\"..." type=io.containerd.grpc.v1
Feb 14 17:11:59 937d7c5 balenad[2200]: time="2023-02-14T17:11:59.806150715Z" level=info msg="loading plugin \"io.containerd.grpc.v1.tasks\"..." type=io.containerd.grpc.v1
Feb 14 17:11:59 937d7c5 balenad[2200]: time="2023-02-14T17:11:59.805837864Z" level=info msg="loading plugin \"io.containerd.grpc.v1.snapshots\"..." type=io.containerd.grpc.v1
Feb 14 17:11:59 937d7c5 balenad[2200]: time="2023-02-14T17:11:59.805439308Z" level=info msg="loading plugin \"io.containerd.grpc.v1.namespaces\"..." type=io.containerd.grpc.v1
Feb 14 17:11:59 937d7c5 balenad[2200]: time="2023-02-14T17:11:59.805094178Z" level=info msg="loading plugin \"io.containerd.grpc.v1.leases\"..." type=io.containerd.grpc.v1
Feb 14 17:11:59 937d7c5 balenad[2200]: time="2023-02-14T17:11:59.804784808Z" level=info msg="loading plugin \"io.containerd.grpc.v1.images\"..." type=io.containerd.grpc.v1
Feb 14 17:11:59 937d7c5 balenad[2200]: time="2023-02-14T17:11:59.804472567Z" level=info msg="loading plugin \"io.containerd.grpc.v1.healthcheck\"..." type=io.containerd.grpc.v1
Feb 14 17:11:59 937d7c5 balenad[2200]: time="2023-02-14T17:11:59.804177308Z" level=info msg="loading plugin \"io.containerd.grpc.v1.events\"..." type=io.containerd.grpc.v1
Feb 14 17:11:59 937d7c5 balenad[2200]: time="2023-02-14T17:11:59.803877789Z" level=info msg="loading plugin \"io.containerd.grpc.v1.diff\"..." type=io.containerd.grpc.v1
Feb 14 17:11:59 937d7c5 balenad[2200]: time="2023-02-14T17:11:59.803573289Z" level=info msg="loading plugin \"io.containerd.grpc.v1.content\"..." type=io.containerd.grpc.v1
Feb 14 17:11:59 937d7c5 balenad[2200]: time="2023-02-14T17:11:59.803218049Z" level=info msg="loading plugin \"io.containerd.grpc.v1.containers\"..." type=io.containerd.grpc.v1
Feb 14 17:11:59 937d7c5 balenad[2200]: time="2023-02-14T17:11:59.802752549Z" level=info msg="loading plugin \"io.containerd.internal.v1.restart\"..." type=io.containerd.internal.v1
Feb 14 17:11:59 937d7c5 balenad[2200]: time="2023-02-14T17:11:59.802139215Z" level=info msg="loading plugin \"io.containerd.service.v1.tasks-service\"..." type=io.containerd.service.v1
Feb 14 17:11:59 937d7c5 balenad[2200]: time="2023-02-14T17:11:59.799774771Z" level=info msg="loading plugin \"io.containerd.monitor.v1.cgroups\"..." type=io.containerd.monitor.v1
Feb 14 17:11:59 937d7c5 balenad[2200]: time="2023-02-14T17:11:59.798743086Z" level=info msg="loading plugin \"io.containerd.runtime.v2.task\"..." type=io.containerd.runtime.v2
Feb 14 17:11:59 937d7c5 balenad[2200]: time="2023-02-14T17:11:59.797872530Z" level=info msg="loading plugin \"io.containerd.runtime.v1.linux\"..." type=io.containerd.runtime.v1
Feb 14 17:11:59 937d7c5 balenad[2200]: time="2023-02-14T17:11:59.797520845Z" level=info msg="loading plugin \"io.containerd.service.v1.snapshots-service\"..." type=io.containerd.service.v1
Feb 14 17:11:59 937d7c5 balenad[2200]: time="2023-02-14T17:11:59.797164086Z" level=info msg="loading plugin \"io.containerd.service.v1.namespaces-service\"..." type=io.containerd.service.v1
Feb 14 17:11:59 937d7c5 balenad[2200]: time="2023-02-14T17:11:59.796696956Z" level=info msg="loading plugin \"io.containerd.service.v1.leases-service\"..." type=io.containerd.service.v1
Feb 14 17:11:59 937d7c5 balenad[2200]: time="2023-02-14T17:11:59.796424030Z" level=info msg="loading plugin \"io.containerd.service.v1.images-service\"..." type=io.containerd.service.v1
Feb 14 17:11:59 937d7c5 balenad[2200]: time="2023-02-14T17:11:59.796163956Z" level=info msg="loading plugin \"io.containerd.service.v1.diff-service\"..." type=io.containerd.service.v1
Feb 14 17:11:59 937d7c5 balenad[2200]: time="2023-02-14T17:11:59.795913827Z" level=info msg="loading plugin \"io.containerd.service.v1.content-service\"..." type=io.containerd.service.v1
Feb 14 17:11:59 937d7c5 balenad[2200]: time="2023-02-14T17:11:59.795635086Z" level=info msg="loading plugin \"io.containerd.service.v1.containers-service\"..." type=io.containerd.service.v1
Feb 14 17:11:59 937d7c5 balenad[2200]: time="2023-02-14T17:11:59.795333641Z" level=info msg="loading plugin \"io.containerd.service.v1.introspection-service\"..." type=io.containerd.service.v1
Feb 14 17:11:59 937d7c5 balenad[2200]: time="2023-02-14T17:11:59.794924864Z" level=info msg="loading plugin \"io.containerd.gc.v1.scheduler\"..." type=io.containerd.gc.v1
Feb 14 17:11:59 937d7c5 balenad[2200]: time="2023-02-14T17:11:59.794555030Z" level=info msg="loading plugin \"io.containerd.differ.v1.walking\"..." type=io.containerd.differ.v1
Feb 14 17:11:59 937d7c5 balenad[2200]: time="2023-02-14T17:11:59.791332252Z" level=info msg="metadata content store policy set" policy=shared
Feb 14 17:11:59 937d7c5 balenad[2200]: time="2023-02-14T17:11:59.791094623Z" level=warning msg="could not use snapshotter aufs in metadata plugin" error="invalid aufs configuration"
Feb 14 17:11:59 937d7c5 balenad[2200]: time="2023-02-14T17:11:59.790733993Z" level=info msg="loading plugin \"io.containerd.metadata.v1.bolt\"..." type=io.containerd.metadata.v1
Feb 14 17:11:59 937d7c5 balenad[2200]: time="2023-02-14T17:11:59.789634882Z" level=info msg="loading plugin \"io.containerd.snapshotter.v1.overlayfs\"..." type=io.containerd.snapshotter.v1
Feb 14 17:11:59 937d7c5 balenad[2200]: time="2023-02-14T17:11:59.788794438Z" level=info msg="loading plugin \"io.containerd.snapshotter.v1.native\"..." type=io.containerd.snapshotter.v1
Feb 14 17:11:59 937d7c5 balenad[2200]: time="2023-02-14T17:11:59.788547641Z" level=warning msg="failed to load plugin io.containerd.snapshotter.v1.aufs" error="invalid aufs configuration"
Feb 14 17:11:59 937d7c5 balenad[2200]: time="2023-02-14T17:11:59.788095975Z" level=info msg="loading plugin \"io.containerd.snapshotter.v1.aufs\"..." type=io.containerd.snapshotter.v1
Feb 14 17:11:59 937d7c5 balenad[2200]: time="2023-02-14T17:11:59.784493160Z" level=info msg="loading plugin \"io.containerd.content.v1.content\"..." type=io.containerd.content.v1
Feb 14 17:11:59 937d7c5 balenad[2200]: time="2023-02-14T17:11:59.708534012Z" level=info msg="starting containerd" revision= version=1.4.0+unknown
Feb 14 17:11:59 937d7c5 balenad[2137]: time="2023-02-14T17:11:59.591938956Z" level=info msg="libcontainerd: started new balena-engine-containerd process" pid=2200
Feb 14 17:11:57 937d7c5 openvpn[2192]: 2023-02-14 17:11:57 SIGTERM[soft,auth-failure] received, process exiting
Feb 14 17:11:57 937d7c5 openvpn[2192]: 2023-02-14 17:11:57 AUTH: Received control message: AUTH_FAILED
Feb 14 17:11:57 937d7c5 openvpn[2192]: 2023-02-14 17:11:57 SENT CONTROL [vpn.balena-cloud.com]: 'PUSH_REQUEST' (status=1)
Feb 14 17:11:56 937d7c5 os-config[2191]: No configuration changes
Feb 14 17:11:56 937d7c5 os-config[2191]: Service configuration retrieved
Feb 14 17:11:56 937d7c5 openvpn[2192]: 2023-02-14 17:11:56 [vpn.balena-cloud.com] Peer Connection Initiated with [AF_INET]52.7.228.224:443
Feb 14 17:11:56 937d7c5 openvpn[2192]: 2023-02-14 17:11:56 Control Channel: TLSv1.3, cipher TLSv1.3 TLS_AES_256_GCM_SHA384, peer certificate: 2048 bit RSA, signature: RSA-SHA256
Feb 14 17:11:56 937d7c5 openvpn[2192]: 2023-02-14 17:11:56 VERIFY OK: depth=0, C=US, ST=WA, L=Seattle, O=balena.io, OU=balenaCloud, CN=vpn.balena-cloud.com
Feb 14 17:11:56 937d7c5 openvpn[2192]: 2023-02-14 17:11:56 VERIFY EKU OK
Feb 14 17:11:56 937d7c5 openvpn[2192]: 2023-02-14 17:11:56 ++ Certificate has EKU (str) TLS Web Server Authentication, expects TLS Web Server Authentication
Feb 14 17:11:56 937d7c5 openvpn[2192]: 2023-02-14 17:11:56 Validating certificate extended key usage
Feb 14 17:11:56 937d7c5 openvpn[2192]: 2023-02-14 17:11:56 VERIFY KU OK
Feb 14 17:11:56 937d7c5 openvpn[2192]: 2023-02-14 17:11:56 VERIFY OK: depth=1, C=US, ST=WA, L=Seattle, O=balena.io, OU=balenaCloud, CN=open-balena-vpn-rootCA
Feb 14 17:11:56 937d7c5 os-config[2191]: Fetching service configuration from https://api.balena-cloud.com/os/v1/config...
Feb 14 17:11:56 937d7c5 openvpn[2192]: 2023-02-14 17:11:56 TLS: Initial packet from [AF_INET]52.7.228.224:443, sid=4dafbb42 2a432d6a
Feb 14 17:11:56 937d7c5 openvpn[2192]: 2023-02-14 17:11:56 NOTE: UID/GID downgrade will be delayed because of --client, --pull, or --up-delay
Feb 14 17:11:56 937d7c5 openvpn[2192]: 2023-02-14 17:11:56 TCP_CLIENT link remote: [AF_INET]52.7.228.224:443
Feb 14 17:11:56 937d7c5 openvpn[2192]: 2023-02-14 17:11:56 TCP_CLIENT link local: (not bound)
Feb 14 17:11:56 937d7c5 openvpn[2192]: 2023-02-14 17:11:56 TCP connection established with [AF_INET]52.7.228.224:443
Feb 14 17:11:56 937d7c5 openvpn[2192]: 2023-02-14 17:11:56 Attempting to establish TCP connection with [AF_INET]52.7.228.224:443 [nonblock]
Feb 14 17:11:56 937d7c5 openvpn[2192]: 2023-02-14 17:11:56 Socket Buffers: R=[131072->131072] S=[16384->16384]
Feb 14 17:11:56 937d7c5 openvpn[2192]: 2023-02-14 17:11:56 TCP/UDP: Preserving recently used remote address: [AF_INET]52.7.228.224:443
Feb 14 17:11:56 937d7c5 openvpn[2192]: 2023-02-14 17:11:56 NOTE: the current --script-security setting may allow this configuration to call user-defined scripts
Feb 14 17:11:56 937d7c5 openvpn[2192]: 2023-02-14 17:11:56 library versions: OpenSSL 3.0.5 5 Jul 2022, LZO 2.10
Feb 14 17:11:56 937d7c5 openvpn[2192]: 2023-02-14 17:11:56 OpenVPN 2.5.6 aarch64-poky-linux-gnu [SSL (OpenSSL)] [LZO] [LZ4] [EPOLL] [MH/PKTINFO] [AEAD] built on Mar 16 2022
Feb 14 17:11:56 937d7c5 openvpn[2192]: 2023-02-14 17:11:56 WARNING: file '/var/volatile/vpn-auth' is group or others accessible
Feb 14 17:11:56 937d7c5 openvpn[2192]: 2023-02-14 17:11:56 --cipher is not set. Previous OpenVPN version defaulted to BF-CBC as fallback when cipher negotiation failed in this case. If you need this fal>
Feb 14 17:11:56 937d7c5 openvpn[2192]: 2023-02-14 17:11:56 WARNING: Compression for receiving enabled. Compression has been used in the past to break encryption. Sent packets are not compressed unless ">
Feb 14 17:11:46 937d7c5 openvpn[2181]: 2023-02-14 17:11:46 SIGTERM[soft,auth-failure] received, process exiting
Feb 14 17:11:46 937d7c5 openvpn[2181]: 2023-02-14 17:11:46 AUTH: Received control message: AUTH_FAILED
Feb 14 17:11:46 937d7c5 openvpn[2181]: 2023-02-14 17:11:46 SENT CONTROL [vpn.balena-cloud.com]: 'PUSH_REQUEST' (status=1)
Feb 14 17:11:44 937d7c5 openvpn[2181]: 2023-02-14 17:11:44 [vpn.balena-cloud.com] Peer Connection Initiated with [AF_INET]52.7.228.224:443
Feb 14 17:11:44 937d7c5 openvpn[2181]: 2023-02-14 17:11:44 Control Channel: TLSv1.3, cipher TLSv1.3 TLS_AES_256_GCM_SHA384, peer certificate: 2048 bit RSA, signature: RSA-SHA256
Feb 14 17:11:44 937d7c5 os-config[2180]: No configuration changes
Feb 14 17:11:44 937d7c5 os-config[2180]: Service configuration retrieved
Feb 14 17:11:44 937d7c5 openvpn[2181]: 2023-02-14 17:11:44 VERIFY OK: depth=0, C=US, ST=WA, L=Seattle, O=balena.io, OU=balenaCloud, CN=vpn.balena-cloud.com
Feb 14 17:11:44 937d7c5 openvpn[2181]: 2023-02-14 17:11:44 VERIFY EKU OK
Feb 14 17:11:44 937d7c5 openvpn[2181]: 2023-02-14 17:11:44 ++ Certificate has EKU (str) TLS Web Server Authentication, expects TLS Web Server Authentication
Feb 14 17:11:44 937d7c5 openvpn[2181]: 2023-02-14 17:11:44 Validating certificate extended key usage
Feb 14 17:11:44 937d7c5 openvpn[2181]: 2023-02-14 17:11:44 VERIFY KU OK
Feb 14 17:11:44 937d7c5 openvpn[2181]: 2023-02-14 17:11:44 VERIFY OK: depth=1, C=US, ST=WA, L=Seattle, O=balena.io, OU=balenaCloud, CN=open-balena-vpn-rootCA
Feb 14 17:11:44 937d7c5 os-config[2180]: Fetching service configuration from https://api.balena-cloud.com/os/v1/config...
Feb 14 17:11:44 937d7c5 openvpn[2181]: 2023-02-14 17:11:44 TLS: Initial packet from [AF_INET]52.7.228.224:443, sid=34ddcae0 44649265
Feb 14 17:11:44 937d7c5 openvpn[2181]: 2023-02-14 17:11:44 NOTE: UID/GID downgrade will be delayed because of --client, --pull, or --up-delay
Feb 14 17:11:44 937d7c5 openvpn[2181]: 2023-02-14 17:11:44 TCP_CLIENT link remote: [AF_INET]52.7.228.224:443
Feb 14 17:11:44 937d7c5 openvpn[2181]: 2023-02-14 17:11:44 TCP_CLIENT link local: (not bound)
Feb 14 17:11:44 937d7c5 openvpn[2181]: 2023-02-14 17:11:44 TCP connection established with [AF_INET]52.7.228.224:443
Feb 14 17:11:44 937d7c5 openvpn[2181]: 2023-02-14 17:11:44 Attempting to establish TCP connection with [AF_INET]52.7.228.224:443 [nonblock]
Feb 14 17:11:44 937d7c5 openvpn[2181]: 2023-02-14 17:11:44 Socket Buffers: R=[131072->131072] S=[16384->16384]
Feb 14 17:11:44 937d7c5 openvpn[2181]: 2023-02-14 17:11:44 TCP/UDP: Preserving recently used remote address: [AF_INET]52.7.228.224:443
Feb 14 17:11:44 937d7c5 openvpn[2181]: 2023-02-14 17:11:44 NOTE: the current --script-security setting may allow this configuration to call user-defined scripts
Feb 14 17:11:44 937d7c5 openvpn[2181]: 2023-02-14 17:11:44 library versions: OpenSSL 3.0.5 5 Jul 2022, LZO 2.10
Feb 14 17:11:44 937d7c5 openvpn[2181]: 2023-02-14 17:11:44 OpenVPN 2.5.6 aarch64-poky-linux-gnu [SSL (OpenSSL)] [LZO] [LZ4] [EPOLL] [MH/PKTINFO] [AEAD] built on Mar 16 2022
Feb 14 17:11:44 937d7c5 openvpn[2181]: 2023-02-14 17:11:44 WARNING: file '/var/volatile/vpn-auth' is group or others accessible
Feb 14 17:11:44 937d7c5 openvpn[2181]: 2023-02-14 17:11:44 --cipher is not set. Previous OpenVPN version defaulted to BF-CBC as fallback when cipher negotiation failed in this case. If you need this fal>
Feb 14 17:11:44 937d7c5 openvpn[2181]: 2023-02-14 17:11:44 WARNING: Compression for receiving enabled. Compression has been used in the past to break encryption. Sent packets are not compressed unless ">
Feb 14 17:11:44 937d7c5 balenad[2137]: time="2023-02-14T17:11:44.567818965Z" level=warning msg="Support for listening on TCP without authentication or explicit intent to run without authentication will >
Feb 14 17:11:44 937d7c5 balenad[2137]: time="2023-02-14T17:11:44.567779484Z" level=warning msg="You can override this by explicitly specifying '--tls=false' or '--tlsverify=false'" host="tcp://0.0.0.0:2>
Feb 14 17:11:44 937d7c5 balenad[2137]: time="2023-02-14T17:11:44.567738706Z" level=warning msg="Please consider generating tls certificates with client validation to prevent exposing unauthenticated roo>
Feb 14 17:11:44 937d7c5 balenad[2137]: time="2023-02-14T17:11:44.567625873Z" level=warning msg="Binding to an IP address without --tlsverify is deprecated. Startup is intentionally being slowed down to >
Feb 14 17:11:44 937d7c5 extract-balena-ca[2170]: [extract-balena-ca][INFO] The config.json file does not contain custom CA
Feb 14 17:11:44 937d7c5 prepare-openvpn[2159]: prepare-openvpn: [INFO] Balena.io VPN authentication.
Feb 14 17:11:44 937d7c5 os-config[1870]: Starting balena-supervisor.service...
Feb 14 17:11:44 937d7c5 os-config[1870]: /home/root/.ssh/authorized_keys_remote updated
Feb 14 17:11:44 937d7c5 os-config[1870]: Starting openvpn.service...
Feb 14 17:11:44 937d7c5 os-config[1870]: Starting prepare-openvpn.service...
Feb 14 17:11:44 937d7c5 os-config[1870]: /etc/openvpn/openvpn.conf updated
Feb 14 17:11:44 937d7c5 os-config[1870]: /etc/openvpn/ca.crt updated
Feb 14 17:11:44 937d7c5 os-config[1870]: Awaiting openvpn.service to exit...
Feb 14 17:11:44 937d7c5 os-config[1870]: Awaiting prepare-openvpn.service to exit...
Feb 14 17:11:44 937d7c5 os-config[1870]: Stopping openvpn.service...
Feb 14 17:11:44 937d7c5 os-config[1870]: Stopping prepare-openvpn.service...
Feb 14 17:11:43 937d7c5 balenad[2137]: time="2023-02-14T17:11:43.567149410Z" level=warning msg="Binding to an IP address, even on localhost, can also give access to scripts run in a browser. Be safe out>
Feb 14 17:11:43 937d7c5 balenad[2137]: time="2023-02-14T17:11:43.567027947Z" level=warning msg="Binding to IP address without --tlsverify is insecure and gives root access on this machine to everyone wh>
Feb 14 17:11:43 937d7c5 balenad[2137]: time="2023-02-14T17:11:43.554991410Z" level=warning msg="Running experimental build"
Feb 14 17:11:43 937d7c5 balenad[2137]: time="2023-02-14T17:11:43.554828725Z" level=info msg="Starting up"
Feb 14 17:11:43 937d7c5 extract-balena-ca[2141]: [extract-balena-ca][INFO] The config.json file does not contain custom CA
Feb 14 17:11:43 937d7c5 systemd[1]: balena-supervisor.service: Failed with result 'timeout'.
Feb 14 17:11:43 937d7c5 systemd[1]: balena-supervisor.service: Control process exited, code=exited, status=3/NOTIMPLEMENTED
Feb 14 17:11:43 937d7c5 balena-supervisor[2135]: inactive
Feb 14 17:11:43 937d7c5 systemd[1]: balena-supervisor.service: Killing process 1863 (balena) with signal SIGKILL.
Feb 14 17:11:43 937d7c5 systemd[1]: balena-supervisor.service: Killing process 1858 (balena) with signal SIGKILL.
Feb 14 17:11:43 937d7c5 systemd[1]: balena-supervisor.service: Killing process 1854 (balena) with signal SIGKILL.
Feb 14 17:11:43 937d7c5 systemd[1]: balena-supervisor.service: State 'stop-sigterm' timed out. Killing.

Feb 14 17:10:35 937d7c5 healthdog[2120]: try: 8, refid: 823CCC0A, correction: 0.000000000, skew: 1000000.000
Feb 14 17:10:35 937d7c5 healthdog[2120]: try: 7, refid: 00000000, correction: 0.000000000, skew: 0.000
Feb 14 17:10:35 937d7c5 healthdog[2120]: try: 6, refid: 00000000, correction: 0.000000000, skew: 0.000
Feb 14 17:10:35 937d7c5 healthdog[2120]: try: 5, refid: 00000000, correction: 0.000000000, skew: 0.000
Feb 14 17:10:35 937d7c5 healthdog[2120]: try: 4, refid: 00000000, correction: 0.000000000, skew: 0.000
Feb 14 17:10:35 937d7c5 healthdog[2120]: try: 3, refid: 00000000, correction: 0.000000000, skew: 0.000
Feb 14 17:10:35 937d7c5 healthdog[2120]: try: 2, refid: 00000000, correction: 0.000000000, skew: 0.000
Feb 14 17:10:35 937d7c5 healthdog[2120]: try: 1, refid: 00000000, correction: 0.000000000, skew: 0.000
Feb 14 17:10:35 937d7c5 healthdog[2108]: 2023-02-14T17:10:35Z System clock was stepped by 1.087356 seconds
Feb 14 17:10:34 937d7c5 healthdog[2108]: 2023-02-14T17:10:34Z System clock wrong by 1.087356 seconds
Feb 14 17:10:34 937d7c5 healthdog[2108]: 2023-02-14T17:10:34Z Selected source 130.60.204.10 (1.resinio.pool.ntp.org)
Feb 14 17:10:32 937d7c5 nm-dispatcher[2129]: Setting NTP source on/offline status (connectivity-change FULL).
Feb 14 17:10:32 937d7c5 dbus-daemon[1514]: [system] Successfully activated service 'org.freedesktop.nm_dispatcher'
Feb 14 17:10:32 937d7c5 dbus-daemon[1514]: [system] Activating via systemd: service name='org.freedesktop.nm_dispatcher' unit='dbus-org.freedesktop.nm-dispatcher.service' requested by ':1.3' (uid=0 pid=>
Feb 14 17:10:32 937d7c5 NetworkManager[1825]: <info>  [1676394632.8895] manager: NetworkManager state is now CONNECTED_GLOBAL

Feb 14 17:10:29 937d7c5 systemd[1]: Failed to start balena-supervisor.json watcher service.
Feb 14 17:10:29 937d7c5 systemd[1]: balena-supervisor-conf.service: Failed with result 'exit-code'.
Feb 14 17:10:29 937d7c5 systemd[1]: balena-supervisor-conf.service: Main process exited, code=exited, status=1/FAILURE
Feb 14 17:10:29 937d7c5 os-config[1870]: Awaiting balena-supervisor.service to exit...
Feb 14 17:10:28 937d7c5 systemctl[1846]: Job for balena-supervisor.service canceled.
Feb 14 17:10:28 937d7c5 os-config[1870]: Stopping balena-supervisor.service...
Feb 14 17:10:28 937d7c5 os-config[1870]: Service configuration retrieved
Feb 14 17:10:27 937d7c5 healthdog[2108]: 2023-02-14T17:10:27Z System clock was stepped by 0.000000 seconds
Feb 14 17:10:27 937d7c5 kernel: cam-dummy-reg: disabling
Feb 14 17:10:27 937d7c5 healthdog[2110]: [chrony-healthcheck][INFO] No online NTP sources - forcing poll
Feb 14 17:10:27 937d7c5 healthdog[2108]: 2023-02-14T17:10:27Z Running with root privileges
Feb 14 17:10:27 937d7c5 healthdog[2108]: 2023-02-14T17:10:27Z chronyd version 4.2 starting (+CMDMON +NTP +REFCLOCK +RTC -PRIVDROP -SCFILTER -SIGND +ASYNCDNS -NTS -SECHASH +IPV6 -DEBUG)
Feb 14 17:10:27 937d7c5 timesync-https.sh[2107]: hwclock: can't open '/dev/misc/rtc': No such file or directory
Feb 14 17:10:27 937d7c5 timesync-https.sh[1872]: [timesync-https.sh][INFO] New time: Tue Feb 14 17:10:27 UTC 2023
Feb 14 17:10:27 937d7c5 timesync-https.sh[1872]: [timesync-https.sh][INFO] Old time: Thu Apr 28 17:42:46 UTC 2022
Feb 14 17:10:27 937d7c5 journalctl[2080]: Vacuuming done, freed 0B of archived journals from /run/log/journal.
Feb 14 17:10:27 937d7c5 journalctl[2080]: Vacuuming done, freed 0B of archived journals from /run/log/journal/29edd8eebea441c3a4d80f4fbaf1a16f.
Feb 14 17:10:27 937d7c5 systemd-journald[1341]: /run/log/journal/29edd8eebea441c3a4d80f4fbaf1a16f/system.journal: Journal header limits reached or header out-of-date, rotating.
Feb 14 17:10:27 937d7c5 systemd-journald[1341]: Oldest entry in /run/log/journal/29edd8eebea441c3a4d80f4fbaf1a16f/system.journal is older than the configured file retention duration (1month), suggesting>
Feb 14 17:10:27 937d7c5 timesync-https.sh[1872]: [timesync-https.sh][INFO] Time synchronised via HTTPS.
Apr 28 17:42:43 937d7c5 rngd[1503]: [jitter]: Initialized
Apr 28 17:42:43 937d7c5 rngd[1503]: [jitter]: Enabling JITTER rng support
Apr 28 17:42:42 937d7c5 os-config[1870]: Awaiting service configuration...
Apr 28 17:42:39 937d7c5 NetworkManager[1825]: <info>  [1651167759.3941] manager: startup complete
Apr 28 17:42:39 937d7c5 sshd_check_keys[1547]:   generating ssh ED25519 host key...
Apr 28 17:42:39 937d7c5 sshd_check_keys[1547]:   generating ssh ECDSA host key...
Apr 28 17:42:38 937d7c5 nm-dispatcher[2000]: Setting NTP source on/offline status (connectivity-change LIMITED).
Apr 28 17:42:38 937d7c5 os-config[1870]: https://api.balena-cloud.com/os/v1/config: error trying to connect: error:0A000086:SSL routines:tls_post_process_server_certificate:certificate verify failed:../>
Apr 28 17:42:38 937d7c5 NetworkManager[1825]: <info>  [1651167758.1686] device (eth0): Activation: successful, device activated.
Apr 28 17:42:38 937d7c5 NetworkManager[1825]: <info>  [1651167758.1677] manager: NetworkManager state is now CONNECTED_SITE
Apr 28 17:42:38 937d7c5 NetworkManager[1825]: <info>  [1651167758.1663] device (eth0): state change: secondaries -> activated (reason 'none', sys-iface-state: 'managed')
Apr 28 17:42:38 937d7c5 NetworkManager[1825]: <info>  [1651167758.1655] device (eth0): state change: ip-check -> secondaries (reason 'none', sys-iface-state: 'managed')
Apr 28 17:42:38 937d7c5 NetworkManager[1825]: <info>  [1651167758.1457] device (eth0): state change: ip-config -> ip-check (reason 'none', sys-iface-state: 'managed')
Apr 28 17:42:38 937d7c5 dnsmasq[1942]: dnsmasq[1942]: using nameserver 192.168.1.1#53
Apr 28 17:42:38 937d7c5 dnsmasq[1942]: dnsmasq[1942]: using nameserver 8.8.8.8#53
Apr 28 17:42:38 937d7c5 dnsmasq[1942]: dnsmasq[1942]: reading /etc/resolv.dnsmasq
Apr 28 17:42:38 937d7c5 NetworkManager[1825]: <info>  [1651167758.0434] dns-mgr: Writing DNS information to /sbin/resolvconf
Apr 28 17:42:38 937d7c5 NetworkManager[1825]: <info>  [1651167758.0422] policy: set 'Wired connection 1' (eth0) as default for IPv4 routing and DNS
Apr 28 17:42:38 937d7c5 NetworkManager[1825]: <info>  [1651167758.0409] dhcp4 (eth0): state changed new lease, address=192.168.1.139
Apr 28 17:42:36 937d7c5 NetworkManager[1825]: <info>  [1651167756.7511] dhcp4 (eth0): activation: beginning transaction (timeout in 45 seconds)
Apr 28 17:42:36 937d7c5 NetworkManager[1825]: <info>  [1651167756.7364] device (eth0): state change: config -> ip-config (reason 'none', sys-iface-state: 'managed')
Apr 28 17:42:36 937d7c5 NetworkManager[1825]: <info>  [1651167756.7342] device (eth0): state change: prepare -> config (reason 'none', sys-iface-state: 'managed')
Apr 28 17:42:36 937d7c5 NetworkManager[1825]: <info>  [1651167756.7336] manager: NetworkManager state is now CONNECTING
Apr 28 17:42:36 937d7c5 NetworkManager[1825]: <info>  [1651167756.7328] device (eth0): state change: disconnected -> prepare (reason 'none', sys-iface-state: 'managed')
Apr 28 17:42:36 937d7c5 NetworkManager[1825]: <info>  [1651167756.7325] device (eth0): Activation: starting connection 'Wired connection 1' (7662ca9e-26a9-3cb6-b13d-6ce91585e2b5)
Apr 28 17:42:36 937d7c5 NetworkManager[1825]: <info>  [1651167756.7306] policy: auto-activating connection 'Wired connection 1' (7662ca9e-26a9-3cb6-b13d-6ce91585e2b5)
Apr 28 17:42:36 937d7c5 kernel: IPv6: ADDRCONF(NETDEV_CHANGE): eth0: link becomes ready
Apr 28 17:42:36 937d7c5 kernel: bcmgenet fd580000.ethernet eth0: Link is Up - 1Gbps/Full - flow control rx/tx
Apr 28 17:42:36 937d7c5 NetworkManager[1825]: <info>  [1651167756.7282] device (eth0): state change: unavailable -> disconnected (reason 'carrier-changed', sys-iface-state: 'managed')
Apr 28 17:42:36 937d7c5 NetworkManager[1825]: <info>  [1651167756.7270] device (eth0): carrier: link connected
Apr 28 17:42:36 937d7c5 bluetoothd[1809]: Adv Monitor app :1.12 disconnected from D-Bus
Apr 28 17:42:36 937d7c5 bthelper[1978]: AdvertisementMonitor path registered
Apr 28 17:42:36 937d7c5 bluetoothd[1809]: Path / reserved for Adv Monitor app :1.12
Apr 28 17:42:36 937d7c5 bluetoothd[1809]: Adv Monitor app :1.11 disconnected from D-Bus
Apr 28 17:42:36 937d7c5 bthelper[1977]: Changing power off succeeded
Apr 28 17:42:36 937d7c5 bluetoothd[1809]: Path / reserved for Adv Monitor app :1.11
Apr 28 17:42:33 937d7c5 ModemManager[1510]: [base-manager] couldn't check support for device '/sys/devices/platform/soc/fe300000.mmcnr/mmc_host/mmc1/mmc1:0001/mmc1:0001:1': not supported by any plugin
Apr 28 17:42:33 937d7c5 ModemManager[1510]: [base-manager] couldn't check support for device '/sys/devices/platform/scb/fd580000.ethernet': not supported by any plugin
Apr 28 17:42:33 937d7c5 NetworkManager[1825]: <info>  [1651167753.5073] device (p2p-dev-wlan0): state change: unavailable -> disconnected (reason 'none', sys-iface-state: 'managed')
@maggie44
Copy link
Author

Was able to replicate it on an AMI:

aws ec2 run-instances --image-id ami-0799401323b99ed4c \
                      --count 1 \
                      --instance-type m5.large \
                      --tag-specifications \
                      "ResourceType=instance,Tags=[{Key=Name,Value=test-name}]" \
                      "ResourceType=volume,Tags=[{Key=Name,Value=test-name}]" \
                      --user-data "file://config.json" --region us-east-1

It seems to only occur occasionally though. So far only noticeable difference is avahi-daemon and balena-hostname are up before os-config on the working one.

Next week I will try and capture the full logs of both a working and non-working, and try and look in a more detail at the differences in boot order.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

1 participant