Docker container not building due to missing S3 artifact #4544

Open
3 tasks done
stefanAMB opened this issue Mar 15, 2024 · 3 comments
Assignees
Labels
type:bug Software flaws or errors.

Comments

@stefanAMB

Checklist

  • I've read the contribution guidelines.
  • I've searched other issues and no duplicate issues were found.
  • I'm convinced that this is not my fault but a bug.

Description

I am trying to build the Autoware Docker images, but the build consistently fails because it cannot download an S3 artifact. The logs say:

221.7 TASK [autoware.dev_env.artifacts : Download yabloc_pose_initializer/resources.tar.gz] ***
222.3 fatal: [localhost]: FAILED! => {"changed": false, "dest": "/root/autoware_data/yabloc_pose_initializer/resources.tar.gz", "elapsed": 0, "msg": "Request failed: <urlopen error [Errno 101] Network is unreachable>", "url": "https://s3.ap-northeast-2.wasabisys.com/pinto-model-zoo/136_road-segmentation-adas-0001/resources.tar.gz"}

Trying to manually get the asset (wget or Chrome) also fails saying address unreachable. Further logs are below.

Expected behavior

I expect to be able to run ./docker/build.sh without any issues.

Actual behavior

The actual behaviour is given above. Task [autoware.dev_env.artifacts] fails with:

#24 221.7 TASK [autoware.dev_env.artifacts : Download yabloc_pose_initializer/resources.tar.gz] ***
#24 222.3 fatal: [localhost]: FAILED! => {"changed": false, "dest": "/root/autoware_data/yabloc_pose_initializer/resources.tar.gz", "elapsed": 0, "msg": "Request failed: <urlopen error [Errno 101] Network is unreachable>", "url": "https://s3.ap-northeast-2.wasabisys.com/pinto-model-zoo/136_road-segmentation-adas-0001/resources.tar.gz"}
#24 222.3
#24 222.3 PLAY RECAP *********************************************************************
#24 222.3 localhost                  : ok=9    changed=6    unreachable=0    failed=1    skipped=60   rescued=0    ignored=0
#24 222.3
#24 222.4 Failed.
#24 ERROR: process "/bin/bash -o pipefail -c ./setup-dev-env.sh -y --module all ${SETUP_ARGS} --download-artifacts --no-cuda-drivers --runtime openadk   && pip uninstall -y ansible ansible-core   && mkdir src   && vcs import src < autoware.repos   && rosdep update   && DEBIAN_FRONTEND=noninteractive rosdep install -y --dependency-types=exec --ignore-src --from-paths src --rosdistro \"$ROS_DISTRO\"   && apt-get autoremove -y && apt-get clean -y && rm -rf /var/lib/apt/lists/* \"$HOME\"/.cache   && find /usr/lib/$LIB_DIR-linux-gnu -name \"*.a\" -type f -delete   && find / -name \"*.o\" -type f -delete   && find / -name \"*.h\" -type f -delete   && find / -name \"*.hpp\" -type f -delete   && rm -rf /autoware/src /autoware/ansible /autoware/autoware.repos     /root/.local/pipx /opt/ros/\"$ROS_DISTRO\"/include /etc/apt/sources.list.d/cuda*.list     /etc/apt/sources.list.d/docker.list /etc/apt/sources.list.d/nvidia-docker.list     /usr/include /usr/share/doc /usr/lib/gcc /usr/lib/jvm /usr/lib/llvm*" did not complete successfully: exit code: 1

Steps to reproduce

Assuming you have cloned the repo, run:

  1. git checkout main
  2. git fetch
  3. cd docker
  4. ./build.sh

Versions

No response

Possible causes

I assume the host s3.ap-northeast-2.wasabisys.com is simply offline.

Additional context

No response

@oguzkaganozt oguzkaganozt self-assigned this Mar 25, 2024
@idorobotics idorobotics added the type:bug Software flaws or errors. label Apr 4, 2024
@oguzkaganozt
Contributor

Can this still be reproduced? I could not reproduce it. @stefanAMB

@stefanAMB
Author

Hi @oguzkaganozt,

I am afraid it's still the same for me. The issue is still with an artifact download. Here are the logs.

#25 192.4 changed: [localhost]
#25 192.5 
#25 192.5 TASK [autoware.dev_env.artifacts : Download yabloc_pose_initializer/resources.tar.gz] ***
#25 198.1 fatal: [localhost]: FAILED! => {"changed": false, "dest": "/root/autoware_data/yabloc_pose_initializer/resources.tar.gz", "elapsed": 5, "msg": "Request failed: <urlopen error [Errno 101] Network is unreachable>", "url": "https://s3.ap-northeast-2.wasabisys.com/pinto-model-zoo/136_road-segmentation-adas-0001/resources.tar.gz"}
#25 198.1 
#25 198.1 PLAY RECAP *********************************************************************
#25 198.1 localhost                  : ok=45   changed=19   unreachable=0    failed=1    skipped=29   rescued=0    ignored=0   
#25 198.1 
#25 198.2 Failed.
#25 ERROR: process "/bin/bash -o pipefail -c ./setup-dev-env.sh -y --module all ${SETUP_ARGS} --download-artifacts --no-cuda-drivers --runtime openadk   && pip uninstall -y ansible ansible-core   && mkdir src   && vcs import src < autoware.repos   && rosdep update   && DEBIAN_FRONTEND=noninteractive rosdep install -y --dependency-types=exec --ignore-src --from-paths src --rosdistro \"$ROS_DISTRO\"   && apt-get autoremove -y && apt-get clean -y && rm -rf /var/lib/apt/lists/* \"$HOME\"/.cache   && find /usr/lib/$LIB_DIR-linux-gnu -name \"*.a\" -type f -delete   && find / -name \"*.o\" -type f -delete   && find / -name \"*.h\" -type f -delete   && find / -name \"*.hpp\" -type f -delete   && rm -rf /autoware/src /autoware/ansible /autoware/autoware.repos     /root/.local/pipx /opt/ros/\"$ROS_DISTRO\"/include /etc/apt/sources.list.d/cuda*.list     /etc/apt/sources.list.d/docker.list /etc/apt/sources.list.d/nvidia-docker.list     /usr/include /usr/share/doc /usr/lib/gcc /usr/lib/jvm /usr/lib/llvm*" did not complete successfully: exit code: 1

#24 [devel prebuilt 1/3] RUN --mount=type=ssh   ./setup-dev-env.sh -y --module all  --no-cuda-drivers openadk   && pip uninstall -y ansible ansible-core   && apt-get autoremove -y && apt-get clean -y && rm -rf /var/lib/apt/lists/* "$HOME"/.cache   && find / -name 'libcu*.a' -delete   && find / -name 'libnv*.a' -delete
------
 > [runtime runtime 2/7] RUN --mount=type=ssh   ./setup-dev-env.sh -y --module all  --download-artifacts --no-cuda-drivers --runtime openadk   && pip uninstall -y ansible ansible-core   && mkdir src   && vcs import src < autoware.repos   && rosdep update   && DEBIAN_FRONTEND=noninteractive rosdep install -y --dependency-types=exec --ignore-src --from-paths src --rosdistro "humble"   && apt-get autoremove -y && apt-get clean -y && rm -rf /var/lib/apt/lists/* "$HOME"/.cache   && find /usr/lib/x86_64-linux-gnu -name "*.a" -type f -delete   && find / -name "*.o" -type f -delete   && find / -name "*.h" -type f -delete   && find / -name "*.hpp" -type f -delete   && rm -rf /autoware/src /autoware/ansible /autoware/autoware.repos     /root/.local/pipx /opt/ros/"humble"/include /etc/apt/sources.list.d/cuda*.list     /etc/apt/sources.list.d/docker.list /etc/apt/sources.list.d/nvidia-docker.list     /usr/include /usr/share/doc /usr/lib/gcc /usr/lib/jvm /usr/lib/llvm*:
192.3 TASK [autoware.dev_env.artifacts : Create yabloc_pose_initializer directory inside /root/autoware_data] ***
192.4 changed: [localhost]
192.5 
192.5 TASK [autoware.dev_env.artifacts : Download yabloc_pose_initializer/resources.tar.gz] ***
198.1 fatal: [localhost]: FAILED! => {"changed": false, "dest": "/root/autoware_data/yabloc_pose_initializer/resources.tar.gz", "elapsed": 5, "msg": "Request failed: <urlopen error [Errno 101] Network is unreachable>", "url": "https://s3.ap-northeast-2.wasabisys.com/pinto-model-zoo/136_road-segmentation-adas-0001/resources.tar.gz"}
198.1 
198.1 PLAY RECAP *********************************************************************
198.1 localhost                  : ok=45   changed=19   unreachable=0    failed=1    skipped=29   rescued=0    ignored=0   
198.1 
198.2 Failed.
------

As you can see, the error is still a failing connection to an S3 store. I also queried the hosts with dig. Here's the outcome:

wasabisys.com ✅

; <<>> DiG 9.18.18-0ubuntu0.22.04.2-Ubuntu <<>> wasabisys.com
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 61557
;; flags: qr rd ra; QUERY: 1, ANSWER: 28, AUTHORITY: 0, ADDITIONAL: 1

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 65494
;; QUESTION SECTION:
;wasabisys.com.                 IN      A

;; ANSWER SECTION:
wasabisys.com.          120     IN      A       38.27.106.124
wasabisys.com.          120     IN      A       38.27.106.16
wasabisys.com.          120     IN      A       38.27.106.27
wasabisys.com.          120     IN      A       38.27.106.29
wasabisys.com.          120     IN      A       38.27.106.32
wasabisys.com.          120     IN      A       38.27.106.100
wasabisys.com.          120     IN      A       38.27.106.126
wasabisys.com.          120     IN      A       38.27.106.102
wasabisys.com.          120     IN      A       38.27.106.15
wasabisys.com.          120     IN      A       38.27.106.24
wasabisys.com.          120     IN      A       38.27.106.23
wasabisys.com.          120     IN      A       38.27.106.19
wasabisys.com.          120     IN      A       38.27.106.106
wasabisys.com.          120     IN      A       38.27.106.101
wasabisys.com.          120     IN      A       38.27.106.33
wasabisys.com.          120     IN      A       38.27.106.125
wasabisys.com.          120     IN      A       38.27.106.21
wasabisys.com.          120     IN      A       38.27.106.31
wasabisys.com.          120     IN      A       38.27.106.107
wasabisys.com.          120     IN      A       38.27.106.14
wasabisys.com.          120     IN      A       38.27.106.26
wasabisys.com.          120     IN      A       38.27.106.12
wasabisys.com.          120     IN      A       38.27.106.25
wasabisys.com.          120     IN      A       38.27.106.13
wasabisys.com.          120     IN      A       38.27.106.103
wasabisys.com.          120     IN      A       38.27.106.22
wasabisys.com.          120     IN      A       38.27.106.30
wasabisys.com.          120     IN      A       38.27.106.123

;; Query time: 292 msec
;; SERVER: 127.0.0.53#53(127.0.0.53) (UDP)
;; WHEN: Fri May 10 09:16:49 CEST 2024
;; MSG SIZE  rcvd: 490

ap-northeast-2.wasabisys.com ✅

; <<>> DiG 9.18.18-0ubuntu0.22.04.2-Ubuntu <<>> ap-northeast-2.wasabisys.com
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 11056
;; flags: qr rd ra; QUERY: 1, ANSWER: 4, AUTHORITY: 0, ADDITIONAL: 1

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 65494
;; QUESTION SECTION:
;ap-northeast-2.wasabisys.com.  IN      A

;; ANSWER SECTION:
ap-northeast-2.wasabisys.com. 120 IN    A       219.164.248.231
ap-northeast-2.wasabisys.com. 120 IN    A       219.164.248.230
ap-northeast-2.wasabisys.com. 120 IN    A       219.164.248.232
ap-northeast-2.wasabisys.com. 120 IN    A       219.164.248.233

;; Query time: 164 msec
;; SERVER: 127.0.0.53#53(127.0.0.53) (UDP)
;; WHEN: Fri May 10 09:18:07 CEST 2024
;; MSG SIZE  rcvd: 121

s3.ap-northeast-2.wasabisys.com ❌

; <<>> DiG 9.18.18-0ubuntu0.22.04.2-Ubuntu <<>> s3.ap-northeast-2.wasabisys.com
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 34605
;; flags: qr rd ra; QUERY: 1, ANSWER: 2, AUTHORITY: 0, ADDITIONAL: 1

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 65494
;; QUESTION SECTION:
;s3.ap-northeast-2.wasabisys.com. IN    A

;; ANSWER SECTION:
s3.ap-northeast-2.wasabisys.com. 10 IN  CNAME   malware.demo.spsredir.dnsfilters.com.
malware.demo.spsredir.dnsfilters.com. 493 IN A  23.200.237.238

;; Query time: 16 msec
;; SERVER: 127.0.0.53#53(127.0.0.53) (UDP)
;; WHEN: Fri May 10 09:19:29 CEST 2024
;; MSG SIZE  rcvd: 123
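
The last answer is the interesting one: the s3 hostname comes back as a CNAME to a name under dnsfilters.com rather than to a Wasabi address. That would be consistent with the resolver in use rewriting the answer, although the dig output alone does not prove it. As a minimal sketch (the function names `parse_answer_section` and `foreign_cnames` are made up for illustration, and the parser assumes dig's default `name TTL class type value` layout), a check like this could flag such redirects automatically:

```python
from __future__ import annotations


def parse_answer_section(dig_output: str) -> list[tuple[str, str, str]]:
    """Return (name, record_type, value) for each record in dig's ANSWER SECTION."""
    records = []
    in_answer = False
    for line in dig_output.splitlines():
        stripped = line.strip()
        if stripped.startswith(";; ANSWER SECTION"):
            in_answer = True
            continue
        if in_answer:
            if not stripped or stripped.startswith(";"):
                break  # a blank line or the next section header ends the answers
            fields = stripped.split()
            # Assumed layout: name TTL class type value
            if len(fields) >= 5:
                records.append((fields[0], fields[3], fields[4]))
    return records


def foreign_cnames(dig_output: str, expected_domain: str) -> list[tuple[str, str]]:
    """Return (name, target) for CNAME records whose target leaves expected_domain."""
    domain = expected_domain.rstrip(".") + "."
    return [
        (name, value)
        for name, rtype, value in parse_answer_section(dig_output)
        if rtype == "CNAME" and value != domain and not value.endswith("." + domain)
    ]


# Answer section copied from the dig output above.
SAMPLE = """\
;; ANSWER SECTION:
s3.ap-northeast-2.wasabisys.com. 10 IN  CNAME   malware.demo.spsredir.dnsfilters.com.
malware.demo.spsredir.dnsfilters.com. 493 IN A  23.200.237.238

;; Query time: 16 msec
"""

if __name__ == "__main__":
    for name, target in foreign_cnames(SAMPLE, "wasabisys.com"):
        print(f"{name} is redirected to {target}")
```

Running the same query against a different resolver (for example `dig @<other-resolver> s3.ap-northeast-2.wasabisys.com`) and comparing the answers would help confirm whether the local resolver is responsible.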

Consequently, the issue persists with curl or wget:

$> wget -v https://s3.ap-northeast-2.wasabisys.com/pinto-model-zoo/136_road-segmentation-adas-0001/resources.tar.gz
--2024-05-10 09:58:50--  https://s3.ap-northeast-2.wasabisys.com/pinto-model-zoo/136_road-segmentation-adas-0001/resources.tar.gz
Resolving s3.ap-northeast-2.wasabisys.com (s3.ap-northeast-2.wasabisys.com)... 23.200.237.238
Connecting to s3.ap-northeast-2.wasabisys.com (s3.ap-northeast-2.wasabisys.com)|23.200.237.238|:443... failed: Network is unreachable.
$> curl -v https://s3.ap-northeast-2.wasabisys.com/pinto-model-zoo/136_road-segmentation-adas-0001/resources.tar.gz
*   Trying 23.200.237.238:443...
* connect to 23.200.237.238 port 443 failed: Network is unreachable
* Failed to connect to s3.ap-northeast-2.wasabisys.com port 443 after 1210 ms: Network is unreachable
* Closing connection 0
curl: (7) Failed to connect to s3.ap-northeast-2.wasabisys.com port 443 after 1210 ms: Network is unreachable

The network I am using is located in Thuringia, Germany. I was able to connect to the host from a different network (in Berlin, Germany). I'll have to investigate whether something in the local network prevents the connection once the network manager is back in (next Monday), and I will report back.
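
To compare the two networks in a repeatable way, a small check that separates "the name does not resolve" from "the resolved address cannot be reached" may help. This is only a sketch using the Python standard library; `check_tcp` is a hypothetical helper name, and the default port 443 is an assumption based on the HTTPS URLs above.

```python
from __future__ import annotations

import socket


def check_tcp(host: str, port: int = 443, timeout: float = 5.0) -> tuple[bool, str]:
    """Resolve host, then try a TCP connection to each address; return (ok, detail)."""
    try:
        infos = socket.getaddrinfo(host, port, type=socket.SOCK_STREAM)
    except socket.gaierror as exc:
        return False, f"DNS resolution failed: {exc}"

    last_error = "no addresses returned"
    for family, socktype, proto, _canonname, sockaddr in infos:
        try:
            with socket.socket(family, socktype, proto) as sock:
                sock.settimeout(timeout)
                sock.connect(sockaddr)
            return True, f"connected to {sockaddr[0]}:{sockaddr[1]}"
        except OSError as exc:
            # Covers "Network is unreachable", refused connections, and timeouts.
            last_error = f"{sockaddr[0]}: {exc}"
    return False, last_error
```

Calling `check_tcp("s3.ap-northeast-2.wasabisys.com")` and `check_tcp("ap-northeast-2.wasabisys.com")` on each network and comparing the returned details should show whether the failure follows the hostname or the network.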

@stefanAMB
Author

stefanAMB commented May 10, 2024

OK, I investigated a bit more. See the following traceroutes:

 traceroute to s3.ap-northeast-2.wasabisys.com (23.200.237.238), 30 hops max, 60 byte packets
 1  fritz.box (192.168.178.1)  0.498 ms  0.548 ms
 2  ber1001fihr001.versatel.de (62.214.63.105)  3.157 ms  2.819 ms
 3  vlan213.100M.flensburg1.distribution.komtel.net (62.214.0.77)  2.429 ms  2.328 ms
 4  versatel-ic-326760.ip.twelve99-cust.net (213.155.129.191)  2.381 ms  2.492 ms
 5  * 212.162.40.37 (212.162.40.37)  2.992 ms
 6  be3341.rcr71.ber01.atlas.cogentco.com (154.54.60.1)  14.528 ms  14.385 ms
 7  hbg-bb3-link.ip.twelve99.net (62.115.137.40)  9.518 ms be3141.ccr41.ham01.atlas.cogentco.com (130.117.49.137)  14.635 ms
 8  4.53.31.70 (4.53.31.70)  155.526 ms !N *

versus

traceroute to ap-northeast-2.wasabisys.com (219.164.248.230), 30 hops max, 60 byte packets
 1  fritz.box (192.168.178.1)  0.386 ms  0.491 ms
 2  * *
 3  vlan213.100M.flensburg1.distribution.komtel.net (62.214.0.77)  1.847 ms  1.976 ms
 4  versatel-ic-326760.ip.twelve99-cust.net (213.155.129.191)  2.076 ms 149.11.163.26 (149.11.163.26)  2.638 ms
 5  * 80.156.161.25 (80.156.161.25)  3.193 ms
 6  ae2.11.edge1.mln1.neo.colt.net (171.75.9.108)  21.549 ms f-ed12-i.F.DE.NET.DTAG.DE (217.5.67.162)  78.135 ms
 7  62.157.249.186 (62.157.249.186)  14.937 ms be3141.ccr41.ham01.atlas.cogentco.com (130.117.49.137)  14.085 ms
 8  be2816.ccr42.ams03.atlas.cogentco.com (154.54.38.209)  13.983 ms  14.010 ms
 9  ae-14.r21.londen12.uk.bb.gin.ntt.net (129.250.3.12)  21.920 ms ae-3.r20.frnkge13.de.bb.gin.ntt.net (129.250.3.22)  16.958 ms
10  ae-13.r24.asbnva02.us.bb.gin.ntt.net (129.250.6.6)  107.803 ms *
11  * be2806.ccr41.dca01.atlas.cogentco.com (154.54.40.106)  95.401 ms
12  * be3084.ccr41.iad02.atlas.cogentco.com (154.54.30.66)  101.598 ms
13  ae-1.a02.osakjp02.jp.bb.gin.ntt.net (129.250.4.232)  267.338 ms ae-21.a08.asbnva02.us.bb.gin.ntt.net (129.250.8.121)  110.058 ms
14  ae-3.r22.chcgil09.us.bb.gin.ntt.net (129.250.2.166)  118.235 ms ae-7.r26.dllstx14.us.bb.gin.ntt.net (129.250.4.152)  133.497 ms
15  ae-7.r26.dllstx14.us.bb.gin.ntt.net (129.250.4.152)  141.293 ms ae-4.r32.tokyjp05.jp.bb.gin.ntt.net (129.250.5.55)  255.862 ms
16  211.6.15.190 (211.6.15.190)  267.001 ms ae-2.r24.lsanca07.us.bb.gin.ntt.net (129.250.7.69)  154.067 ms
17  219.164.248.230 (219.164.248.230)  263.239 ms ae-1.a02.osakjp02.jp.bb.gin.ntt.net (129.250.4.232)  260.841 ms
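
In the first trace, hop 8 carries the `!N` annotation, which traceroute uses for an ICMP "network unreachable" reply, while the second trace reaches its destination. A minimal sketch for pulling such annotations out of saved traceroute output follows; `unreachable_hops` is a hypothetical helper, and only a few of the annotations listed in the traceroute man page are covered.

```python
from __future__ import annotations

import re

# A subset of the annotations described in the traceroute man page.
ANNOTATIONS = {
    "!H": "host unreachable",
    "!N": "network unreachable",
    "!P": "protocol unreachable",
    "!X": "communication administratively prohibited",
}


def unreachable_hops(traceroute_output: str) -> list[tuple[int, str]]:
    """Return (hop_number, meaning) for every annotated probe in the output."""
    hits = []
    for line in traceroute_output.splitlines():
        # Hop lines start with the hop number; the header line does not match.
        match = re.match(r"\s*(\d+)\s+(.*)", line)
        if not match:
            continue
        hop, rest = int(match.group(1)), match.group(2)
        for token in rest.split():
            if token in ANNOTATIONS:
                hits.append((hop, ANNOTATIONS[token]))
    return hits


# Last two hops of the first trace above.
SAMPLE = """\
 7  hbg-bb3-link.ip.twelve99.net (62.115.137.40)  9.518 ms be3141.ccr41.ham01.atlas.cogentco.com (130.117.49.137)  14.635 ms
 8  4.53.31.70 (4.53.31.70)  155.526 ms !N *
"""

if __name__ == "__main__":
    for hop, meaning in unreachable_hops(SAMPLE):
        print(f"hop {hop}: {meaning}")
```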

After fiddling a bit, I found that the s3 subdomain isn't even needed. The following diff therefore fixes the issue I face:

--- a/ansible/roles/artifacts/tasks/main.yaml
+++ b/ansible/roles/artifacts/tasks/main.yaml
@@ -8,7 +8,7 @@
 - name: Download yabloc_pose_initializer/resources.tar.gz
   become: true
   ansible.builtin.get_url:
-    url: https://s3.ap-northeast-2.wasabisys.com/pinto-model-zoo/136_road-segmentation-adas-0001/resources.tar.gz
+    url: https://ap-northeast-2.wasabisys.com/pinto-model-zoo/136_road-segmentation-adas-0001/resources.tar.gz
     dest: "{{ data_dir }}/yabloc_pose_initializer/resources.tar.gz"
     mode: "644"
     checksum: sha256:1f660e15f95074bade32b1f80dbf618e9cee1f0b9f76d3f4671cb9be7f56eb3a
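
Because the task pins a sha256 checksum, the same file should be acceptable from either hostname as long as the digest matches. Outside of Ansible, that idea can be sketched as a fallback download: try each URL in order and accept the first body whose checksum is correct. This is an illustration only, not how the `artifacts` role works; `fetch_verified` is a hypothetical helper, and it reads the whole file into memory, which may not suit large artifacts.

```python
from __future__ import annotations

import hashlib
import urllib.request


def fetch_verified(urls: list[str], sha256_hex: str, timeout: float = 30.0) -> tuple[str, bytes]:
    """Return (url, data) for the first URL whose body matches sha256_hex."""
    errors = []
    for url in urls:
        try:
            with urllib.request.urlopen(url, timeout=timeout) as response:
                data = response.read()
        except OSError as exc:
            # urllib.error.URLError is a subclass of OSError, so this also
            # covers "Network is unreachable" and HTTP errors.
            errors.append(f"{url}: {exc}")
            continue
        digest = hashlib.sha256(data).hexdigest()
        if digest == sha256_hex:
            return url, data
        errors.append(f"{url}: checksum mismatch ({digest})")
    raise RuntimeError("all sources failed:\n" + "\n".join(errors))
```

With the two URLs from the diff and the checksum `1f660e15f95074bade32b1f80dbf618e9cee1f0b9f76d3f4671cb9be7f56eb3a`, the call would fall through to the second hostname on a network where the first one is unreachable.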

I just checked, and all images build as expected. I'm not sure this is worth a PR, as it might be related to network issues that aren't really the concern here.

Projects
Status: Foundation