Database connection limits reached when GC is run #1708

Open

nmcostello opened this issue Feb 15, 2024 · 4 comments

@nmcostello

Hi,

We have harbor-helm deployed on a K8s cluster with RDS and S3 as the data backend. We have begun seeing an issue where, when GC runs, it takes up all of the available connections on the RDS cluster. This leaves us unable to interact with Harbor via the UI, API, or OCI clients. The connections are eventually freed after ~5 hours, but during that time Harbor is inoperable.

Please let me know if this issue would be a better fit for the main Harbor repo.

We have paused the GC schedule for the time being.

Harbor helm chart version: 1.11.1
Harbor version: v2.7.1-6015b3ef

DB Connection Values:

              maxIdleConns: 4
              maxOpenConns: 14

At the time the connections were overwhelmed, we had ~80 core + exporter pods running. By my calculations, that only equates to ~1100 connections (roughly 14 × 80), which is nowhere near the 5k that we saw at the time. Any thoughts on this?

Pics

In the pictures below, you can see that the connections to the DB are higher than they should be according to the Harbor docs, which say max connections = maxOpenConns × (core + exporter pods). The spike in pod count around 21:30 is the result of my interventions and is well after we hit max connections on the DB.
[Screenshot 2024-02-15 at 4:08:54 PM]

[Screenshot 2024-02-15 at 4:10:26 PM]
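
For reference, here is how I am arriving at the ~1100 figure. A rough sketch, assuming the two values above sit under the chart's database block (that is how our values file is laid out) and using the formula from the Harbor docs:

    # expected ceiling ≈ maxOpenConns × (core + exporter pod count)
    #                  ≈ 14 × ~80 pods ≈ 1120 connections
    database:
      maxIdleConns: 4
      maxOpenConns: 14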

@Vad1mo
Member

Vad1mo commented Feb 22, 2024

Why do you run 80 core pods? Are you piping the S3 traffic via core (i.e., did you disable redirect on Docker Distribution)?

One can do quite a bit of optimization with indexes and caches.

However, this won't solve the GC issue. It's a fundamental problem Harbor inherited from Docker Distribution.

@zyyw
Collaborator

zyyw commented Feb 22, 2024

Maybe you need to update the DB connection pool settings to a larger number, reference:
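
For example, something along these lines in the harbor-helm values (a sketch: key names as in recent chart versions, numbers illustrative; pick values that keep the total below your RDS max_connections):

    database:
      # per-pod pool limits; total usage scales with the number of core/exporter
      # replicas, so keep maxOpenConns × replica count below RDS max_connections
      maxIdleConns: 10
      maxOpenConns: 50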

zyyw closed this as completed on Feb 22, 2024
@nmcostello
Author

nmcostello commented Feb 22, 2024

> Why do you run 80 core pods? Are you piping the S3 traffic via core (i.e., did you disable redirect on Docker Distribution)?
>
> One can do quite a bit of optimization with indexes and caches.
>
> However, this won't solve the GC issue. It's a fundamental problem Harbor inherited from Docker Distribution.

@Vad1mo
We aren't doing anything special. Our pods spike to 80 during the day with the traffic that we see. But if there are ways to optimize this I would love to read about it. Let me paste our s3 configs...

          {{- if .Values.s3 }}
              s3:
                {{- if and (ne .Values.environment "internal") (ne .Values.environment "internal-test") }}
                existingSecret: {{ .Values.targetNamespace }}-secret
                {{- end }}
                region: {{ .Values.s3.region }}
                bucket: {{ .Values.s3.bucket }}
                accesskey: managed-by-sealed-secret
                secretkey: managed-by-sealed-secret
                regionendpoint: {{ .Values.s3.regionendpoint }}

                encrypt: ""
                keyid: ""
                secure: ""
                skipverify: true
                v4auth: ""
                chunksize: "5242880"
                rootdirectory: ""
                storageclass: STANDARD
                multipartcopychunksize: "33554432"
                multipartcopymaxconcurrency: 100
                multipartcopythresholdsize: "33554432"
          {{- end }}

@Vad1mo
Member

Vad1mo commented Feb 22, 2024

[screenshot of the option in question]
This is the option you should check (on vs. off).

Something seems way off in your setup. IMO, a single pod can handle 100-300 concurrent operations.
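
For reference, if the option in question is the storage redirect, in harbor-helm it is controlled roughly like this (a sketch based on recent chart versions; the region and bucket values are illustrative):

    persistence:
      imageChartStorage:
        # false (the default) redirects clients to S3 pre-signed URLs, so blob
        # traffic does not flow through the core/registry pods
        disableredirect: false
        type: s3
        s3:
          region: us-east-1
          bucket: my-harbor-bucket

With redirect enabled, the core pods mostly serve API and metadata traffic, which is why a handful of replicas is normally enough.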

Vad1mo reopened this on Mar 30, 2024