oc mirror very slow, failure prone / inconsistent on bandwidth constrained network #793
Just today I opened a ticket with Red Hat about the slowness of this tool. Listing the channels of a single operator takes around 11 minutes. I love the tool, but this is kind of frustrating.
Yeah, it definitely depends on the network, but channel listing is super slow, and checking multiple operators across multiple registries can take a ton of time. I'm not sure what optimization or caching can be done with the current architecture, but I would love for that to be possible!
@BadgerOps Another person in the EU region recently asked me about oc-mirror being very slow: 4-5 minutes, or over 10 minutes in some cases. We ran this command:
With the verbosity increased, I can see CloudFront cache hits like:
If you're seeing CloudFront cache misses, that could easily be part of the problem. Given the number of OpenShift clusters in any region, I would expect the redhat-operator-index images to almost never miss the cache, and even if one did, the miss shouldn't recur when you rerun the same command. Are you seeing any cache misses in the verbose oc-mirror logs, or any other messages that indicate slowness?
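One rough way to answer that from a saved verbose log is to count the lines that report a CloudFront cache result. This is a sketch, not an oc-mirror feature: `count_cloudfront_cache` is a hypothetical helper, and it assumes the verbose output includes CloudFront's `X-Cache` header values such as `Hit from cloudfront` and `Miss from cloudfront`.

```shell
# Hypothetical helper (not part of oc-mirror): tally CloudFront cache results
# in a saved verbose log. Assumes the log contains CloudFront X-Cache header
# values like "Hit from cloudfront" / "Miss from cloudfront".
# grep -c counts matching *lines*, and "RefreshHit from cloudfront" also
# matches the hit pattern.
count_cloudfront_cache() {
  log_file=$1
  # "|| true" keeps a zero-match grep (exit status 1) from aborting under set -e
  hits=$(grep -ci 'hit from cloudfront' "$log_file" || true)
  misses=$(grep -ci 'miss from cloudfront' "$log_file" || true)
  echo "hits=$hits misses=$misses"
}
```

For example, `count_cloudfront_cache mirror-time.log` on a log captured with `2>&1 | tee mirror-time.log`. A non-zero miss count on repeated runs of the same command would point at the CDN rather than the local host.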
I'll have to check the log files from my larger run, which is currently at around 30 hours of run time. (Edit: I only have
For grins, here's my output from the same command, hitting the cache as expected:
time oc-mirror --verbose 9 list operators --catalog=registry.redhat.io/redhat/redhat-operator-index:v4.12 --package=rhacs-operator 2>&1 | tee mirror-time.log
<snip>
NAME            DISPLAY NAME                              DEFAULT CHANNEL
rhacs-operator  Advanced Cluster Security for Kubernetes  stable
PACKAGE         CHANNEL     HEAD
rhacs-operator  latest      rhacs-operator.v3.74.8
rhacs-operator  rhacs-3.62  rhacs-operator.v3.62.1
rhacs-operator  rhacs-3.64  rhacs-operator.v3.64.2
<snip>
rhacs-operator  rhacs-4.3   rhacs-operator.v4.3.4
rhacs-operator  stable      rhacs-operator.v4.3.4
real 1m49.465s
user 0m53.767s
sys 0m18.740s
Also, to clarify, this is when running:
./oc-mirror version
WARNING: This version information is deprecated and will be replaced with the output from --short. Use --output=yaml|json to get the full version.
Client Version: version.Info{Major:"", Minor:"", GitVersion:"4.15.0-202401241750.p0.g6ddf902.assembly.stream-6ddf902", GitCommit:"6ddf902e42c93a3fd1cb155d52584bb8dd912c43", GitTreeState:"clean", BuildDate:"2024-01-24T22:09:12Z", GoVersion:"go1.20.12 X:strictfipsruntime", Compiler:"gc", Platform:"linux/amd64"}
# what _does_ --continue-on-error do and should I be using it :thonk:
./oc-mirror -v 4 --continue-on-error --config imageset.yaml file://.
imageset.yaml:
kind: ImageSetConfiguration
apiVersion: mirror.openshift.io/v1alpha2
storageConfig:
  local:
    path: /var/quay/oc-mirror/offline
mirror:
  platform:
    architectures:
      - "amd64"
    channels:
      - name: stable-4.12
        type: ocp
        minVersion: '4.12.40'
        maxVersion: '4.12.40'
        shortestPath: true
    graph: true
  operators:
    - catalog: registry.redhat.io/redhat/redhat-marketplace-index:v4.12
      packages:
        - name: percona-postgresql-operator-certified-rhmp
    - catalog: registry.redhat.io/redhat/certified-operator-index:v4.12
      packages:
        - name: gitlab-operator-kubernetes
        - name: gitlab-runner-operator
        - name: dell-csm-operator-certified
        - name: splunk-operator
    - catalog: registry.redhat.io/redhat/redhat-operator-index:v4.12
      full: true
  additionalImages:
    - name: registry.redhat.io/ubi8/ubi:latest
    - name: registry.redhat.io/rhel8/support-tools:latest
    - name: registry.k8s.io/sig-storage/csi-node-driver-registrar:v2.8.0
    - name: registry.k8s.io/sig-storage/csi-resizer:v1.8.0
    - name: registry.k8s.io/sig-storage/csi-attacher:v4.3.0
    - name: registry.k8s.io/sig-storage/csi-provisioner:v3.5.0
    - name: registry.k8s.io/sig-storage/csi-snapshotter:v6.2.2
    - name: docker.io/dellemc/csi-metadata-retriever:v1.4.0
    - name: registry.access.redhat.com/ubi8/nginx-120:latest
    - name: registry.gitlab.com/gitlab-org/build/cng/kubectl:v16.5.1
Note, this attempt changed from:
to:
This was because we realized we were missing some necessary Red Hat operators (specifically the logging operator) and hoped that mirroring the whole catalog would solve that problem. I'm happy to grab any details you need to dig into this, and will make sure to re-run with
Looks exactly the same for me.
Womp, womp: I ran out of disk space (the disk had 3 TB) after ~5.8 days. It would be awesome if there were a way to calculate the disk space an imageset requires.
So again, given the imageset.yaml above and ~30mb average bandwidth, oc-mirror ran for almost 6 days. It's a miracle that a network reset didn't break it in that time. And since it did fail, restarting means starting from zero: another ~6 days of waiting. Differential downloads, or resuming from a cached partial download, would be very nice.
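As a rough sanity check on those numbers, assuming "~30mb" means a sustained 30 megabits per second (if it means megabytes per second, multiply the result by 8), the most data the run could have downloaded in 5.8 days is:

```latex
\begin{aligned}
\frac{30\ \text{Mb/s}}{8\ \text{bits/byte}} &= 3.75\ \text{MB/s} \\
5.8\ \text{days} \times 86\,400\ \text{s/day} &= 501\,120\ \text{s} \\
3.75\ \text{MB/s} \times 501\,120\ \text{s} &= 1\,879\,200\ \text{MB} \approx 1.88\ \text{TB}
\end{aligned}
```

Under that assumption the download alone is well below the 3 TB disk, so something beyond the raw transfer (for example data written more than once, or files already on the disk) would have to account for the rest; the thread does not say which.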
So, here we are, 2 more days of attempted syncs later. I did learn that
appears to mirror every operator version instead of just the default. It seems like if I just want the latest/default version of an operator, I should just have
but I don't see that explicitly called out. Back to the random failures. I tried running with
I then copy the (several) mirror_seq tar files over to my disconnected network and run
Google, other issues, Stack Overflow, and the AI bots are all failing to help me move forward here. Any thoughts? Am I doing this completely wrong?
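Since the mirror_seq archives cross an air gap, one cheap way to rule out corruption in transit as a cause of the random failures is to checksum them on both sides before publishing. This is a sketch using GNU coreutils `sha256sum`; `write_checksums` and `verify_checksums` are hypothetical helper names, and it assumes the archives match `mirror_seq*.tar` in a single directory.

```shell
# Hypothetical helpers (not part of oc-mirror). Assumes GNU coreutils
# sha256sum and archives named mirror_seq*.tar in one directory.

# On the internet-facing host: record a SHA-256 digest per archive.
write_checksums() {
  (cd "$1" && sha256sum mirror_seq*.tar > SHA256SUMS)
}

# On the disconnected host, after copying the archives and SHA256SUMS:
# exits non-zero and prints "<file>: FAILED" if any archive changed.
verify_checksums() {
  (cd "$1" && sha256sum -c --quiet SHA256SUMS)
}
```

For example, run `write_checksums /path/to/archives` before copying and `verify_checksums /path/to/archives` afterwards, and only publish the archives if verification succeeds.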
For the Red Hatters: I submitted a support case with the same details as here. FYSA.
@BadgerOps FYI, the caching feature will be available in v2 of oc-mirror, which will be released around OpenShift 4.16. In my case, the slowness is probably due to my mediocre host, which could use better specs.
Another update: we've tried quite a few different ways of consistently mirroring platform, operator, and container images using oc-mirror. Given the restrictions mentioned above and in our support case, we're having significant issues using this tool, and really need better guidance on how it is supposed to be used. Are there people out there on restricted networks who are successfully using oc-mirror to move data across networks? Feel free to reach out to me via the email on my profile to coordinate a discussion.
First off, I'm excited about what I'm seeing being developed here. I can see a lot of improvements coming soon, and would love to help contribute a solution to the problem I outline below.
Version
What happened?
Hello team! I am on a bandwidth-constrained network (~20mb average) in the EU. I am attempting to mirror something similar to the following imageset:
I'm mirroring with the following syntax:
This process takes anywhere from 500 to 1,100 minutes to complete, but it often fails with either a connection reset error (probably our network) or an upstream error that usually looks like rate limiting.
It also seems to take forever (10+ minutes on my system) to initialize the working directory (I'd love to know why a whole filesystem tree is created :confusedbadger:), and it doesn't seem to cache anything on failure, only after a completely successful sync.
This is becoming frustrating for our team, as we're unable to reliably sync platform updates and operators to our disconnected OpenShift installation.
What did you expect to happen?
Reliable source of upstream updates for platform & operators
How to reproduce it (as minimally and precisely as possible)?
I will try to provide some sanitized logs if they would be helpful. What specifically can I provide to help dig into this?
Anything else we need to know?
I would love to help identify and resolve the issues described above. I suspect that retry logic with exponential backoff could help with some of them, and I would also love to dig into whether there is a way to recover from a partially mirrored imageset.
I've been trying different combinations of imagesets and oc-mirror versions (4.12, 4.14, 4.15-rc2, -rc3, -rc4) to get an imageset reliably downloaded to our internet-facing server, but I pretty consistently see the problems described above.
Thank you, and I look forward to figuring out a good path forward. And, of course, I'd love it if I was just doing it wrong!
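Until oc-mirror retries on its own, a wrapper along these lines can re-run the whole command after a connection reset or rate-limit error, doubling the wait after each failure. It is a minimal sketch: `retry_with_backoff` is a hypothetical helper, not an oc-mirror feature, and since a failed run doesn't appear to cache anything, each retry may start the mirror over from scratch.

```shell
# Hypothetical helper (not part of oc-mirror): re-run a command until it
# succeeds, doubling the wait between attempts (exponential backoff).
#   $1 = maximum number of attempts
#   $2 = initial delay in seconds
#   remaining args = the command to run
retry_with_backoff() {
  max_attempts=$1
  delay=$2
  shift 2
  attempt=1
  until "$@"; do
    if [ "$attempt" -ge "$max_attempts" ]; then
      echo "giving up after $attempt attempts" >&2
      return 1
    fi
    echo "attempt $attempt failed; retrying in ${delay}s" >&2
    sleep "$delay"
    attempt=$((attempt + 1))
    delay=$((delay * 2))
  done
  return 0
}
```

For example, `retry_with_backoff 5 60 ./oc-mirror -v 4 --config imageset.yaml file://.` allows up to five attempts, waiting 60, 120, 240, then 480 seconds between them. Plain doubling keeps the sketch short; adding random jitter to each delay would stop many clients from retrying in lockstep against a rate-limited registry.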