npm hangs on linux/s390x containers #1973

Open
hardillb opened this issue Oct 4, 2023 · 22 comments

Comments

@hardillb

hardillb commented Oct 4, 2023

Environment

  • Platform: linux/s390x
  • Docker Version: 24.0.6, build ed223bc
  • Node.js Version: 18
  • Image Tag: 18-alpine

Expected Behavior

npm install runs and packages are installed.

Current Behavior

Building a container for the linux/s390x platform hangs while running npm install, with npm consuming 100% CPU.

Previous builds completed in under 5 minutes; the current build has been running for over an hour.

We are building the https://github.com/node-red/node-red-docker container with

docker buildx build --platform linux/s390x --file .docker/Dockerfile.alpine --build-arg NODE_VERSION=18 .

Possible Solution

Steps to Reproduce

Additional Information

Same thing is happening with 14-alpine and 16-alpine tags

I'm hitting this both locally and in a GH Action, both of which use Qemu to support building for alternate architectures.
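
For anyone reproducing this, it may help to put a deadline on the build so a hang fails fast instead of spinning for an hour. The following is only a sketch: `run_with_deadline` is a hypothetical helper name, and it assumes GNU coreutils `timeout`, which exits with status 124 when the deadline is reached.

```shell
# Hypothetical helper (not part of Docker or npm): run a command with a
# deadline and report when the deadline, rather than the command, ended it.
# Usage: run_with_deadline <seconds> <command> [args...]
run_with_deadline() {
  local limit="$1"
  shift
  local status=0
  timeout "$limit" "$@" || status=$?
  if [ "$status" -eq 124 ]; then
    # GNU coreutils `timeout` exits with 124 when the time limit is hit.
    echo "command exceeded ${limit}s, treating it as a hang" >&2
  fi
  return "$status"
}
```

With the build from this report, that could look like `run_with_deadline 600 docker buildx build --platform linux/s390x --file .docker/Dockerfile.alpine --build-arg NODE_VERSION=18 .`, where the 600 second limit is an arbitrary choice based on earlier builds finishing in under 5 minutes.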

@tyranron

tyranron commented Oct 4, 2023

I have a similar issue (see Dockerfile).

I wonder whether the problem of #1798 and #1829 finally snuck into 18 and earlier images.

@sxa
Member

sxa commented Oct 4, 2023

Interesting. I've just fired up the Docker images (node:16-alpine and node:18-alpine) on a real s390x system and npm installs without any problems. That suggests something specific to qemu or to the Docker version in use (mine is Docker version 24.0.5, build 24.0.5-0ubuntu1~22.04.1).

@sxa
Member

sxa commented Oct 4, 2023

Just tried with your dockerfile - went through without problems:
build18.log.gz
Command: docker build --file .docker/Dockerfile.alpine --build-arg NODE_VERSION=18 . 2>&1 | tee build18.log

@hardillb
Author

hardillb commented Oct 4, 2023

Which does appear to point to this possibly being a qemu based problem. I know my laptop got a recent set of qemu packages, but not sure what would be needed to debug this. Any pointers would be helpful

@tyranron

tyranron commented Oct 4, 2023

@hardillb setup-qemu-action uses the tonistiigi/binfmt Docker image for installing QEMU binaries. I think other versions like 6.1.0 or master could be tried to "resolve" this, at least on GitHub Actions.

@hardillb
Author

hardillb commented Oct 4, 2023

master doesn't appear to fix it for me, testing 6.1.0

@hardillb
Author

hardillb commented Oct 4, 2023

no joy with qemu-v6.1.0 either so this may be a NodeJS + Qemu issue

@hardillb
Author

hardillb commented Oct 5, 2023

OK, while this appears to be limited to when running builds using qemu, this is going to be the default way 99% of CI builds run that target s390x, so I think we still need to track this down, even if it's just to raise a sensible upstream issue against qemu.

  • What debug options can I enable to try and get some useful debug information here?
  • Would trying to connect GDB to the spinning process help?

@tyranron

tyranron commented Oct 5, 2023

@hardillb it seems that after moby/buildkit#1516 we may omit setup-qemu-action, because BuildKit supports QEMU emulation out of the box. Moreover, judging by the tonistiigi/binfmt Docker image tags, newer versions of QEMU are released for buildkit- images only. The latest one is 7.1.0.

However, for my repository the result is still the same, no matter which version is used: 6.0.0, 6.1.0, 6.2.0, 7.0.0, 7.1.0 or master.

@tyranron

tyranron commented Oct 5, 2023

@hardillb in my case, the problem seems to be related to Linux only, somehow. I was able to resolve the issue just by switching to the macos-latest runner for the archs where the build gets stuck.

I will try this workaround for #1798 too, and will report the results.

@hardillb
Author

@tyranron did you get any joy using the docker.io/ prefix on the base containers?

If it is the qemu bug https://gitlab.com/qemu-project/qemu/-/issues/1729 then hopefully it gets fixed soon.

@tyranron

@hardillb

> did you get any joy using the docker.io/ prefix on the base containers?

These are the same images, no?

> I will try this workaround for #1798 too, and will report the results.

Building under macos-latest runner didn't work out for Node.js 20, but for 18 it fixed my problem.

@hardillb
Author

hardillb commented Oct 17, 2023

This may not be the same as the other qemu bug as it's not calling mremap.

I ran the following command:

docker run --platform linux/s390x -it --cap-add=SYS_PTRACE -e QEMU_STRACE=true -e QEMU_LOG_FILENAME=qemu.log -v ./qemu.log:/qemu.log --rm node:18-alpine npm install node-red:3.1.0

and got the following strace:

1 clock_gettime(CLOCK_MONOTONIC,0x0000004007b62cd8) = 0 ({tv_sec = 2223689,tv_nsec = 708242416})
1 clock_gettime(CLOCK_MONOTONIC,0x0000004007b62db8) = 0 ({tv_sec = 2223689,tv_nsec = 708269945})
1 clock_gettime(CLOCK_MONOTONIC,0x0000004007b62db8) = 0 ({tv_sec = 2223689,tv_nsec = 708530498})
1 clock_gettime(CLOCK_MONOTONIC,0x0000004007b62eb0) = 0 ({tv_sec = 2223689,tv_nsec = 708556031})
1 clock_gettime(CLOCK_MONOTONIC,0x0000004007b62eb0) = 0 ({tv_sec = 2223689,tv_nsec = 708593060})
1 munmap(0x00000040101ef000,57344) = 0
1 clock_gettime(CLOCK_MONOTONIC,0x0000004007b63028) = 0 ({tv_sec = 2223689,tv_nsec = 708653152})
1 socket(PF_NETLINK,SOCK_RAW|SOCK_CLOEXEC,NETLINK_ROUTE) = 24
1 sendto(24,275007277688,20,0,0,0) = 20
1 recvfrom(24,275007277688,8192,64,0,0) = 2880

qemu.log
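
To see which syscall dominates a long QEMU_STRACE log, one option is to tally the calls by name. This is a minimal sketch that assumes every relevant line has the `<pid> <syscall>(...)` shape seen in the excerpt above; `summarize_syscalls` is a hypothetical helper name, not a QEMU tool.

```shell
# Hypothetical helper: count syscalls by name in QEMU_STRACE output.
# Reads the files given as arguments, or stdin when there are none.
summarize_syscalls() {
  awk 'NF >= 2 && $2 ~ /\(/ {
         name = $2
         sub(/\(.*/, "", name)   # keep only the text before the first "("
         count[name]++
       }
       END { for (name in count) print count[name], name }' "$@" |
    sort -k1,1nr -k2,2           # highest count first, then by name
}
```

Running `summarize_syscalls qemu.log` on the full log should show whether the spin is really in `recvfrom` or in the surrounding `clock_gettime` calls. The ten-line excerpt alone is too short to tell.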

@hardillb
Author

This looks to be spinning trying to receive data from the network. How do we move this forward?

@tyranron

tyranron commented Oct 25, 2023

Due to tonistiigi/binfmt#120 we now have QEMU 8.0 in the tonistiigi/binfmt:master Docker image. I tried it with the node:21 Docker image, and still no luck.

@felddy

felddy commented Nov 7, 2023

I started seeing this issue on September 19th, 2023.

I created a repo to help diagnose the problem, or to detect when a fix is made upstream. It runs daily tests on two versions of node across six architectures on Debian and Alpine. It simply attempts npm -v.

On Nov 7: 4 of the 12 Alpine combinations are failing.
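
A check of this kind can be sketched roughly as below. This is not the code from that repo: `npm_smoke_test` and the `DOCKER` and `SMOKE_TIMEOUT` overrides are hypothetical names introduced for illustration, and the sketch assumes GNU coreutils `timeout` plus QEMU binfmt handlers already registered on the host.

```shell
# Hypothetical helper: try `npm -v` in an image on each listed platform and
# report PASS or FAIL, treating a run that exceeds the deadline as a failure.
# Usage: npm_smoke_test <image> <platform> [platform...]
npm_smoke_test() {
  local image="$1"
  shift
  local docker="${DOCKER:-docker}"       # illustrative override, e.g. for dry runs
  local limit="${SMOKE_TIMEOUT:-120}"    # seconds allowed per platform (arbitrary default)
  local failed=0 platform
  for platform in "$@"; do
    if timeout "$limit" "$docker" run --rm --platform "$platform" "$image" npm -v >/dev/null 2>&1; then
      echo "PASS $platform"
    else
      echo "FAIL $platform"
      failed=1
    fi
  done
  return "$failed"
}
```

For example, `npm_smoke_test node:18-alpine linux/s390x linux/arm/v6 linux/arm/v7` covers the platforms mentioned in this thread. A spinning container may outlive the timed-out client, so a real job would probably need to clean up leftover containers afterwards.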

Daily test status:

See:

@ozbillwang

ozbillwang commented Nov 15, 2023

A similar problem is reported in ticket #1946.

mendhak added a commit to mendhak/docker-http-https-echo that referenced this issue Dec 3, 2023
@hardillb hardillb changed the title from "npm hangs on linux/s309x containers" to "npm hangs on linux/s390x containers" Dec 25, 2023
@hardillb
Author

I've been playing with this again (as it's still a problem). I've been using AWS EC2 machines to try out a few different options.

  • It fails on both Intel and AMD based x86_64 hardware
  • It fails on AWS Arm64 hardware as well
  • It fails with the latest 8.0.6 qemu builds (as provided by the qemu-v8.0.4 tag of tonistiigi/binfmt)
  • I've tried Ubuntu 22.04 and 23.10 base OS builds

@whyour

whyour commented Jan 1, 2024

I tried to run it on ubuntu-20.04 s390x and it works fine, but arm/v6 and arm/v7 still don't work, only alpine3.18 and nodejs18.
https://github.com/whyour/qinglong/actions/runs/7375782137/job/20067750407

@janvda

janvda commented Jan 3, 2024

I tried to reproduce the problem on my macbook and it seems to be working for me:
FYI this is what I get:

mac-jan:tmp jan$ git clone https://github.com/node-red/node-red-docker
Cloning into 'node-red-docker'...
remote: Enumerating objects: 3154, done.
remote: Counting objects: 100% (225/225), done.
remote: Compressing objects: 100% (107/107), done.
remote: Total 3154 (delta 133), reused 197 (delta 118), pack-reused 2929
Receiving objects: 100% (3154/3154), 823.97 KiB | 1.99 MiB/s, done.
Resolving deltas: 100% (1988/1988), done.
mac-jan:tmp jan$ ls
node-red-docker
mac-jan:tmp jan$ cd node-red-docker/
mac-jan:node-red-docker jan$ docker buildx build --platform linux/s390x --file .docker/Dockerfile.alpine --build-arg NODE_VERSION=18 .
[+] Building 331.6s (20/20) FINISHED                                                                                                                 docker-container:build
 => [internal] load build definition from Dockerfile.alpine                                                                                                            0.1s
 => => transferring dockerfile: 3.55kB                                                                                                                                 0.0s
 => [internal] load metadata for docker.io/library/node:18-alpine                                                                                                      4.0s
 => [auth] library/node:pull token for registry-1.docker.io                                                                                                            0.0s
 => [internal] load .dockerignore                                                                                                                                      0.0s
 => => transferring context: 2B                                                                                                                                        0.0s
 => [base  1/11] FROM docker.io/library/node:18-alpine@sha256:b1a0356f7d6b86c958a06949d3db3f7fb27f95f627aa6157cb98bc65c801efa2                                        18.3s
 => => resolve docker.io/library/node:18-alpine@sha256:b1a0356f7d6b86c958a06949d3db3f7fb27f95f627aa6157cb98bc65c801efa2                                                0.0s
 => => sha256:8f566b0cf37515471460f9658e9c86f83bf2350169945c1f2b328eec90ccac61 449B / 449B                                                                             2.0s
 => => sha256:743d88e4fdd9423c2822f8530204bc76ce14cdcfc62b97fe81bb5a6115485080 2.34MB / 2.34MB                                                                        18.0s
 => => sha256:9d91f34cd4b1eccf088eefcf662f235b2c0ae325b9739b7f7f7e875c25ba8643 41.11MB / 41.11MB                                                                      11.1s
 => => sha256:0fca3ee44ced87b7184bc23390283fdf10cfae0e844a25b785dd11c463815227 3.24MB / 3.24MB                                                                         3.3s
 => => extracting sha256:0fca3ee44ced87b7184bc23390283fdf10cfae0e844a25b785dd11c463815227                                                                              0.2s
 => => extracting sha256:9d91f34cd4b1eccf088eefcf662f235b2c0ae325b9739b7f7f7e875c25ba8643                                                                              2.3s
 => => extracting sha256:743d88e4fdd9423c2822f8530204bc76ce14cdcfc62b97fe81bb5a6115485080                                                                              0.1s
 => => extracting sha256:8f566b0cf37515471460f9658e9c86f83bf2350169945c1f2b328eec90ccac61                                                                              0.0s
 => [internal] load build context                                                                                                                                      0.1s
 => => transferring context: 7.81kB                                                                                                                                    0.0s
 => [base  2/11] COPY .docker/scripts/*.sh /tmp/                                                                                                                       0.0s
 => [base  3/11] COPY .docker/healthcheck.js /                                                                                                                         0.0s
 => [base  4/11] RUN set -ex &&     apk add --no-cache         bash         tzdata         iputils         curl         nano         git         openssl         open  8.1s
 => [base  5/11] WORKDIR /usr/src/node-red                                                                                                                             0.0s 
 => [base  6/11] COPY .docker/known_hosts.sh .                                                                                                                         0.0s 
 => [base  7/11] RUN ./known_hosts.sh /etc/ssh/ssh_known_hosts && rm /usr/src/node-red/known_hosts.sh                                                                 71.6s 
 => [base  8/11] RUN echo "PubkeyAcceptedKeyTypes +ssh-rsa" >> /etc/ssh/ssh_config                                                                                     0.2s 
 => [base  9/11] COPY package.json .                                                                                                                                   0.0s 
 => [base 10/11] COPY flows.json /data                                                                                                                                 0.1s 
 => [base 11/11] COPY .docker/scripts/entrypoint.sh .                                                                                                                  0.1s 
 => [build 1/1] RUN apk add --no-cache --virtual buildtools build-base linux-headers udev python3 &&     npm install --unsafe-perm --no-update-notifier --no-audit   178.4s 
 => [release 1/3] COPY --from=build /usr/src/node-red/prod_node_modules ./node_modules                                                                                 0.8s 
 => [release 2/3] RUN chown -R node-red:root /usr/src/node-red &&     /tmp/install_devtools.sh &&     rm -r /tmp/*                                                    41.5s 
 => [release 3/3] RUN npm config set cache /data/.npm --global                                                                                                         7.3s 
WARNING: No output specified with docker-container driver. Build result will only remain in the build cache. To push result image into registry use --push or to load image into docker use --load                                                                                                                                                      
mac-jan:node-red-docker jan$

FYI My macbook docker setup:

1/ I have installed lima (so I don't use docker desktop)

# install lima
brew install lima

# create default lima instance with 6GB memory using docker template
limactl start --name=default --set='.cpus = 4 | .memory = "6GiB" | .disk = "100GiB" ' template://docker

# create docker context - note that the actual unix socket path is returned by the previous command.
docker context create colima --docker "host=unix:///Users/jan/.lima/default/sock/docker.sock"

# starts the docker environment on my macbook.
limactl start

2/ I have installed Docker Buildx as follows:

# in folder /Users/jan/.docker/cli-plugins
wget https://github.com/docker/buildx/releases/download/v0.10.3/buildx-v0.10.3.darwin-amd64
mv buildx-v0.10.3.darwin-amd64 docker-buildx
chmod a+x docker-buildx

Add binfmt_misc support for additional platforms as specified in https://docs.docker.com/build/building/multi-platform/

 docker run --privileged --rm tonistiigi/binfmt --install all

@tyranron

tyranron commented Jan 5, 2024

With tonistiigi/binfmt#144 (QEMU 8.1.4) and node:21 it still doesn't work for me on arm32v6, arm32v7 and s390x platforms. Tried building on both macos-latest and ubuntu-latest runners:

@hardillb
Author

hardillb commented Jan 5, 2024

Also, the important place to test this is on AMD64 hardware, as this needs to run on GH Actions with the Ubuntu runner.
