Database connection limits reached when GC is run #1708

Open

nmcostello opened this issue Feb 15, 2024 · 4 comments

@nmcostello

Hi,

We have harbor-helm deployed on a K8s cluster with RDS and S3 as the data backend. We have begun seeing an issue where, when GC runs, it takes up all of the available connections on the RDS cluster. This leaves us unable to interact with Harbor via the UI, API, or OCI clients. The connections are eventually freed after ~5 hours, but during that time Harbor is inoperable.

Please let me know if this issue would be a better fit for the main Harbor repo.

We have paused the GC schedule for the time being.

Harbor helm chart version: 1.11.1
Harbor version: v2.7.1-6015b3ef

DB Connection Values:

              maxIdleConns: 4
              maxOpenConns: 14

At the time the connections were overwhelmed, we had ~80 core + exporter pods running. By my calculations, that only equates to ~1100 connections (roughly 14 × 80), which is nowhere near the 5k that we saw at the time. Any thoughts on this?

Pics

In the pictures below, you can see that the connections to the DB are higher than they should be according to the Harbor docs, which say max connections = maxOpenConns × (core + exporter pods). The spike in pod count around 21:30 is the result of my interventions and is well after we hit max connections on the DB.
[Screenshot 2024-02-15 at 4:08:54 PM]

[Screenshot 2024-02-15 at 4:10:26 PM]
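
For reference, here is how I am arriving at the ~1100 figure. A rough sketch, assuming the two values above sit under the chart's database block (that is how our values file is laid out) and using the formula from the Harbor docs:

    # expected ceiling ≈ maxOpenConns × (core + exporter pod count)
    #                  ≈ 14 × ~80 pods ≈ 1120 connections
    database:
      maxIdleConns: 4
      maxOpenConns: 14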

@Vad1mo
Member

Vad1mo commented Feb 22, 2024

Why do you run 80 core pods? Are you piping the S3 traffic via core (i.e., did you disable redirect on Docker Distribution)?

One can do quite a bit of optimization with indexes and caches.

However, this won't solve the GC issue. It's a fundamental problem Harbor inherited from Docker Distribution.

@zyyw
Collaborator

zyyw commented Feb 22, 2024

Maybe you need to update the DB connection pool settings to a larger number, reference:
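
For example, something along these lines in the harbor-helm values (a sketch: key names as in recent chart versions, numbers illustrative; pick values that keep the total below your RDS max_connections):

    database:
      # per-pod pool limits; total usage scales with the number of core/exporter
      # replicas, so keep maxOpenConns × replica count below RDS max_connections
      maxIdleConns: 10
      maxOpenConns: 50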

zyyw closed this as completed on Feb 22, 2024
@nmcostello
Author

nmcostello commented Feb 22, 2024

> Why do you run 80 core pods? Are you piping the S3 traffic via core (i.e., did you disable redirect on Docker Distribution)?
>
> One can do quite a bit of optimization with indexes and caches.
>
> However, this won't solve the GC issue. It's a fundamental problem Harbor inherited from Docker Distribution.

@Vad1mo
We aren't doing anything special. Our pods spike to 80 during the day with the traffic that we see. But if there are ways to optimize this I would love to read about it. Let me paste our s3 configs...

          {{- if .Values.s3 }}
              s3:
                {{- if and (ne .Values.environment "internal") (ne .Values.environment "internal-test") }}
                existingSecret: {{ .Values.targetNamespace }}-secret
                {{- end }}
                region: {{ .Values.s3.region }}
                bucket: {{ .Values.s3.bucket }}
                accesskey: managed-by-sealed-secret
                secretkey: managed-by-sealed-secret
                regionendpoint: {{ .Values.s3.regionendpoint }}

                encrypt: ""
                keyid: ""
                secure: ""
                skipverify: true
                v4auth: ""
                chunksize: "5242880"
                rootdirectory: ""
                storageclass: STANDARD
                multipartcopychunksize: "33554432"
                multipartcopymaxconcurrency: 100
                multipartcopythresholdsize: "33554432"
          {{- end }}

@Vad1mo
Member

Vad1mo commented Feb 22, 2024

[screenshot of the option in question]
This is the option you should check (on vs. off).

Something seems way off in your setup. IMO, a single pod can handle 100-300 concurrent operations.
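
For reference, if the option in question is the storage redirect, in harbor-helm it is controlled roughly like this (a sketch based on recent chart versions; the region and bucket values are illustrative):

    persistence:
      imageChartStorage:
        # false (the default) redirects clients to S3 pre-signed URLs, so blob
        # traffic does not flow through the core/registry pods
        disableredirect: false
        type: s3
        s3:
          region: us-east-1
          bucket: my-harbor-bucket

With redirect enabled, the core pods mostly serve API and metadata traffic, which is why a handful of replicas is normally enough.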

Vad1mo reopened this on Mar 30, 2024