kubectl exec truncates stdout without reporting error #124571

Open
jvm-pangea opened this issue Apr 26, 2024 · 4 comments
Labels
kind/bug Categorizes issue or PR as related to a bug. needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. sig/cli Categorizes an issue or PR as relevant to SIG CLI.

Comments

@jvm-pangea

What happened?

When running a command with kubectl exec, the output (data written to stdout) can be truncated without the command returning an error. This makes it impossible to distinguish a failed command or a torn-down network connection (which should produce a non-zero exit code) from a command that completed successfully and returned complete output (exit code 0).

Unsure what causes variation here, but probably related to network I/O speed and/or latency. Here's an example of drastic failures and inconsistencies on my machine:

% for i in 1 2 3 4 5; do kubectl exec "${POD:?}" -- sh -c 'seq 1 2000000' | tail -n1; done
1788552
1994888
809911
1862792
1214600

If it doesn't reproduce for you, try adding some network latency or bandwidth restrictions, or increasing the amount of data that goes across the pipe. Visiting fast.com in a browser while running the tests, running more tests, and transferring more output (larger second value in the seq command) will help reproduce.

What did you expect to happen?

I expect to either:

  • Get all of the command's output, OR
  • Get truncated output and a non-zero exit code
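Until kubectl exec behaves this way, the expectation above can be approximated on the client side. The following is a minimal sketch, not part of kubectl: it assumes the remote command can be made to print a final marker line, and that the marker (here the made-up string `__EXEC_DONE__`) never occurs in the real output. The helper name `strip_and_verify` is hypothetical.

```shell
#!/bin/sh
# Assumed end-of-output marker; it must not appear in the command's real output.
SENTINEL='__EXEC_DONE__'

# strip_and_verify: copy stdin to stdout without the marker line.
# Exits 0 only if the marker line was seen, i.e. the stream was not cut short.
strip_and_verify() {
    awk -v s="$SENTINEL" '
        $0 == s { seen = 1; next }    # remember the marker, do not print it
        { print }                     # pass every other line through
        END { exit(seen ? 0 : 1) }    # non-zero exit if the marker never arrived
    '
}

# Intended use against a pod (not executed here):
#   kubectl exec "${POD:?}" -- sh -c "seq 1 2000000; echo $SENTINEL" | strip_and_verify
```

Because the pipeline's exit status is that of its last command, a truncated stream then shows up as a non-zero status even when kubectl itself reports success. This only detects truncation; it does not recover the missing data.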

How can we reproduce it (as minimally and precisely as possible)?

Run a command against a pod a number of times and check whether the last line of output is the same on every run (it should be):

for i in 1 2 3 4 5; do kubectl exec ${POD:?} -- sh -c 'seq 1 2000000' | tail -n1; done

If this doesn't reproduce try:

  • using a larger value for the upper bounds to seq
  • adding latency and/or bandwidth restrictions to the network link
  • (wouldn't reproduce for a colleague, had him visit fast.com in a browser while testing, which triggered it right away)
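To make the failure rate easier to report, the loop above can be wrapped so that it counts bad runs instead of relying on eyeballing the last lines. This is only a sketch; the function name `count_truncated` and its argument order are made up for illustration.

```shell
#!/bin/sh
# count_truncated EXPECTED RUNS CMD [ARGS...]
# Runs CMD RUNS times and prints how many runs ended on a last line
# other than EXPECTED (a likely sign of truncated output).
count_truncated() {
    expected=$1
    runs=$2
    shift 2
    bad=0
    i=0
    while [ "$i" -lt "$runs" ]; do
        last=$("$@" | tail -n1)          # last line of this run's stdout
        if [ "$last" != "$expected" ]; then
            bad=$((bad + 1))
        fi
        i=$((i + 1))
    done
    echo "$bad"
}

# Intended use against a pod (not executed here):
#   count_truncated 2000000 20 kubectl exec "${POD:?}" -- sh -c 'seq 1 2000000'
```

A result of 0 means every run produced the expected final line; anything higher is the number of runs that came up short.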

Anything else we need to know?

This appears to have been a problem for a long time:

Kubectl appears to be discarding standard output
"kubectl exec" sometimes incorrectly returns empty string causing tests to flake
#34256 (closed with only a workaround)

Kubernetes version

Also reproduces with (all permutations of) client version 1.29, server version v1.27.11-eks-b9c9ed7:

Client Version: v1.30.0
Kustomize Version: v5.0.4-0.20230601165947-6ce0bf390ce3
Server Version: v1.27.11-gke.1062000

Also reproduced with:

Client Version: v1.29.1
Kustomize Version: v5.0.4-0.20230601165947-6ce0bf390ce3
Server Version: v1.27.11-eks-b9c9ed7

Cloud provider

Reproduces against both AWS and GCP managed k8s.

OS version

I am on OS X.

Darwin hostname 23.3.0 Darwin Kernel Version 23.3.0: Wed Dec 20 21:30:44 PST 2023; root:xnu-10002.81.5~7/RELEASE_ARM64_T6000 arm64

Didn't have anyone handy not on OS X to try to repro.

Install tools

No response

Container runtime (CRI) and version (if applicable)

No response

Related plugins (CNI, CSI, ...) and versions (if applicable)

No response

@jvm-pangea jvm-pangea added the kind/bug Categorizes issue or PR as related to a bug. label Apr 26, 2024
@k8s-ci-robot k8s-ci-robot added the needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. label Apr 26, 2024
@k8s-ci-robot
Contributor

This issue is currently awaiting triage.

If a SIG or subproject determines this is a relevant issue, they will accept it by applying the triage/accepted label and provide further guidance.

The triage/accepted label can be added by org members by writing /triage accepted in a comment.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@k8s-ci-robot k8s-ci-robot added the needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. label Apr 26, 2024
@brianpursley
Member

This sounds like it could be the same problem as #60140 (comment)

This problem should be fixed in server version 1.30+ (actually 1.29+ if you enable the alpha feature). However, I know it is not always simple to upgrade.

As a potential workaround for earlier versions, some people have reported that adding a sleep at the end prevents the output from being truncated.
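For illustration, one way the sleep workaround might be applied is sketched below. The helper name `with_trailing_sleep`, the default delay of 1 second, and the step that saves and restores the exit code are assumptions added for this example; the reports only mention adding a sleep at the end.

```shell
#!/bin/sh
# with_trailing_sleep CMD [SECONDS]
# Prints a shell command string that runs CMD, sleeps, and then exits with
# CMD's own exit code so the delay does not mask a failure.
with_trailing_sleep() {
    # Single quotes keep $? and $rc literal; they are expanded by the remote shell.
    printf '%s; rc=$?; sleep %s; exit $rc\n' "$1" "${2:-1}"
}

# Intended use against a pod (not executed here):
#   kubectl exec "${POD:?}" -- sh -c "$(with_trailing_sleep 'seq 1 2000000')"
```

How long the sleep needs to be is not stated in the reports, so the delay is left as a parameter.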

@jvm-pangea
Author

A sleep does work around this, generally.

I think this is different from the cp EOF problem: in that case the EOF is detected and an error is reported (and I can imagine that happening due to PMTU problems or something, which was mentioned somewhere).

Looks like you're saying the data channel code for this has changed significantly in 1.30 (SPDY->WebSockets or the other way around?) though so that's promising. I'll let you know if I can test or repro on 1.30 if I get the chance to.

If anyone else has a 1.30 environment, would love to see if you can repro the issue there (network congestion is key.)

@neolit123
Member

/sig cli

@k8s-ci-robot k8s-ci-robot added sig/cli Categorizes an issue or PR as relevant to SIG CLI. and removed needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. labels Apr 26, 2024