
Kubelet synchronizes pod states in a parallel manner. #124596

Open
@zhifei92 wants to merge 1 commit into base: master

Conversation

@zhifei92 commented Apr 28, 2024

What type of PR is this?

/sig node
/area kubelet
/kind cleanup

What this PR does / why we need it:

When a large number of pods are created or deleted on a node, calling syncPod in parallel within syncBatch can effectively improve the performance of pod status synchronization. This addresses the issues mentioned in #123875 and #123877.
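As a rough illustration of the approach, here is a minimal, self-contained sketch of the dispatch pattern; the manager and update types below are simplified stand-ins for illustration, not the actual status manager code:

```go
package main

import "sync"

// podSync is a simplified stand-in for one queued status update.
type podSync struct {
	podUID string
}

type manager struct{}

// syncPod stands in for the real sync, which includes a round trip to the apiserver.
func (m *manager) syncPod(uid string) {
	_ = uid // the real code reads/patches the pod status here
}

func main() {
	m := &manager{}
	updates := make([]podSync, 100) // the batch of pending status updates

	// Instead of syncing the drained updates one by one, each syncPod call is
	// wrapped in a goroutine and the whole batch is awaited.
	var wg sync.WaitGroup
	for _, u := range updates {
		wg.Add(1)
		go func(u podSync) {
			defer wg.Done()
			m.syncPod(u.podUID)
		}(u)
	}
	wg.Wait()
}
```

The diff excerpt quoted later in this conversation wraps m.syncPod inside syncBatch in the same way.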

unit test

Test scenario: syncPod has a simulated 20ms latency while 100 pods update their status concurrently.

  1. Below are the test results without optimization
% make test WHAT=./pkg/kubelet/status GOFLAGS="-v" KUBE_TEST_ARGS='-run ^TestManyPodsUpdates$'
+++ [0428 18:37:36] Set GOMAXPROCS automatically to 12
+++ [0428 18:37:36] Running tests without code coverage and with -race
=== RUN   TestManyPodsUpdates
--- PASS: TestManyPodsUpdates (2.22s)
PASS
ok      k8s.io/kubernetes/pkg/kubelet/status    4.143s
  2. Below are the test results after optimization
% make test WHAT=./pkg/kubelet/status GOFLAGS="-v" KUBE_TEST_ARGS='-run ^TestManyPodsUpdates$'
+++ [0428 18:38:03] Set GOMAXPROCS automatically to 12
+++ [0428 18:38:04] Running tests without code coverage and with -race
=== RUN   TestManyPodsUpdates
--- PASS: TestManyPodsUpdates (0.09s)
PASS
ok      k8s.io/kubernetes/pkg/kubelet/status    2.010s
e2e test

Spec: latency/resource should be within limit when create 90 pods with 0s interval
Environment: VirtualBox, Ubuntu 22.04, 4 cores, 8GB memory, on macOS (Intel Core i7)

  1. Below are the test results without optimization
{
  "version": "v2",
  "dataItems": [
    {
      "data": {
        "Perc100": 39425.357059,
        "Perc50": 36094.848409,
        "Perc90": 39269.009005,
        "Perc99": 39425.357059
      },
      "unit": "ms",
      "labels": {
        "datatype": "latency",
        "latencytype": "create-pod"
      }
    }
  ],
  "labels": {
    "desc": "latency/resource should be within limit when create 90 pods with 0s interval [Benchmark][NodeSpecialFeature:Benchmark]",
    "image": "Ubuntu 22.04.4 LTS",
    "machine": "cpu:4core,memory:7.8GB",
    "node": "ubuntu3",
    "test": "density_create_batch_90_0_0_5",
    "timestamp": "1715749372"
  }
}
  2. Below are the test results after optimization
{
  "version": "v2",
  "dataItems": [
    {
      "data": {
        "Perc100": 28667.124507,
        "Perc50": 25658.057549,
        "Perc90": 28422.753707,
        "Perc99": 28667.124507
      },
      "unit": "ms",
      "labels": {
        "datatype": "latency",
        "latencytype": "create-pod"
      }
    }
  ],
  "labels": {
    "desc": "latency/resource should be within limit when create 90 pods with 0s interval [Benchmark][NodeSpecialFeature:Benchmark]",
    "image": "Ubuntu 22.04.4 LTS",
    "machine": "cpu:4core,memory:7.8GB",
    "node": "ubuntu3",
    "test": "density_create_batch_90_0_0_5",
    "timestamp": "1715750021"
  }
}

Which issue(s) this PR fixes:

Fixes #

Special notes for your reviewer:

Does this PR introduce a user-facing change?


Additional documentation e.g., KEPs (Kubernetes Enhancement Proposals), usage docs, etc.:


@k8s-ci-robot
Contributor

Adding the "do-not-merge/release-note-label-needed" label because no release-note block was detected, please follow our release note process to remove it.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@k8s-ci-robot added the following labels on Apr 28, 2024: size/L (denotes a PR that changes 100-499 lines, ignoring generated files), sig/node (categorizes an issue or PR as relevant to SIG Node), do-not-merge/release-note-label-needed (indicates that a PR should not merge because it's missing one of the release note labels), area/kubelet, kind/cleanup (categorizes issue or PR as related to cleaning up code, process, or technical debt), cncf-cla: yes (indicates the PR's author has signed the CNCF CLA).
@k8s-ci-robot
Contributor

This issue is currently awaiting triage.

If a SIG or subproject determines this is a relevant issue, they will accept it by applying the triage/accepted label and provide further guidance.

The triage/accepted label can be added by org members by writing /triage accepted in a comment.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@k8s-ci-robot added the following labels on Apr 28, 2024: needs-triage (indicates an issue or PR lacks a `triage/foo` label and requires one), needs-priority (indicates a PR lacks a `priority/foo` label and requires one), needs-ok-to-test (indicates a PR that requires an org member to verify it is safe to test).
@k8s-ci-robot
Contributor

Hi @zhifei92. Thanks for your PR.

I'm waiting for a kubernetes member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@k8s-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: zhifei92
Once this PR has been reviewed and has the lgtm label, please assign mrunalp for approval. For more information see the Kubernetes Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@zhifei92
Author

zhifei92 commented May 7, 2024

/assign @dims @rphillips Please help me review this. Thanks.

@@ -436,6 +459,19 @@ func TestStaleUpdates(t *testing.T) {
verifyUpdates(t, m, 0)
}

func TestManyPodsUpdates(t *testing.T) {
const podNums = 100
Contributor

@zhifei92 FWIW, in my environment (Ubuntu 22.04 on Intel(R) Core(TM) i9-7920X CPU @ 2.90GHz, 24 cores) the change slows down the test:
master:

kubernetes (master) $ make test WHAT=./pkg/kubelet/status GOFLAGS="-v" KUBE_TEST_ARGS='-count 1 -run ^TestManyPodsUpdates$'
+++ [0508 14:06:19] Set GOMAXPROCS automatically to 24
+++ [0508 14:06:19] Running tests without code coverage and with -race
=== RUN   TestManyPodsUpdates
--- PASS: TestManyPodsUpdates (0.05s)
PASS
ok  	k8s.io/kubernetes/pkg/kubelet/status	1.113s

this PR:

kubernetes (124596) $ make test WHAT=./pkg/kubelet/status GOFLAGS="-v" KUBE_TEST_ARGS='-count 1 -run ^TestManyPodsUpdates$'
+++ [0508 14:03:41] Set GOMAXPROCS automatically to 24
+++ [0508 14:03:42] Running tests without code coverage and with -race
=== RUN   TestManyPodsUpdates
--- PASS: TestManyPodsUpdates (0.13s)
PASS
ok  	k8s.io/kubernetes/pkg/kubelet/status	1.231s

Author

"Yes, that's correct, especially if we're not simulating any latency scenarios, where the extra overhead of locks does become a factor. However, considering there is inherent network latency in the communication between the kubelet and the kube-apiserver, we need to emulate a delay of 10 to 20 milliseconds within the syncPod function to truly demonstrate the value of parallel processing."

Contributor

It looks like the provided unit test is not the best way to demonstrate the benefits of the change. How about providing an e2e test?

Author

An e2e test would be better. I'll try it.

Author

@bart0sh Using the spec "latency/resource should be within limit when create 90 pods with 0s interval" should achieve the objective; if feasible, consider employing `tc qdisc add dev lo root netem delay 20ms` to simulate network latency.

Contributor

I believe it would be better to have it reproducible in an e2e (not e2e_node) test case. This would allow us to avoid simulation and see real-life delays.

Author

Sorry, I had mistakenly believed that an e2e_node test case was needed. I need to carefully contemplate how to cover this scenario using an end-to-end (e2e) test case. As I am not yet well-versed in e2e testing, it might take me some additional time to figure this out.

@bart0sh moved this from Triage to Waiting on Author in SIG Node PR Triage on May 15, 2024
-	m.syncPod(update.podUID, update.status)
+	go func() {
+		defer wg.Done()
+		m.syncPod(update.podUID, update.status)
Contributor

syncPod causes multiple requests to the apiserver, but the kubelet's requests to the apiserver are limited by kube-api-qps and max-requests-inflight.

And in a large cluster, if many pods are created at the same time and the kubelets of multiple nodes run syncPod in parallel at the same time, it will undoubtedly trigger throttling and affect the normal operation of the apiserver.

This PR will bring unpredictable risks; I may prefer #123877.

Author

> And in a large cluster, if many pods are created at the same time and the kubelets of multiple nodes run syncPod in parallel at the same time, it will undoubtedly trigger throttling and affect the normal operation of the apiserver.
>
> This PR will bring unpredictable risks; I may prefer #123877.

Thanks for your comment. The impact of GET operations on the apiserver and etcd is considerably less than that of LIST (listing all pods). Coupled with apiserver APF (API Priority and Fairness), the apiserver QPS limits, and the kubelet QPS limits, this should suffice for the majority of scenarios.
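To illustrate the rate-limiting argument, here is a small stand-alone sketch using client-go's flowcontrol token bucket; the QPS and burst values are made up for illustration, while the kubelet's actual limits come from its --kube-api-qps and --kube-api-burst settings:

```go
package main

import (
	"fmt"
	"sync"
	"time"

	"k8s.io/client-go/util/flowcontrol"
)

func main() {
	// Illustrative values only: 10 QPS with a burst of 20.
	limiter := flowcontrol.NewTokenBucketRateLimiter(10, 20)

	var wg sync.WaitGroup
	start := time.Now()
	for i := 0; i < 100; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			limiter.Accept() // blocks until a token is available
			// a real client would issue its status GET/PATCH here
		}()
	}
	wg.Wait()

	// 100 requests at 10 QPS with a burst of 20 take roughly (100-20)/10 = 8s,
	// no matter how many goroutines issue them concurrently.
	fmt.Println("elapsed:", time.Since(start))
}
```

So even with per-pod goroutines, the request rate the apiserver sees stays bounded by the client-side limiter (and by server-side APF).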
