kubectl exec truncates stdout without reporting error #124571

jvm-pangea · 2024-04-26T17:15:56Z

What happened?

When running a command with kubectl exec, the output (data written to stdout) can be truncated without kubectl returning an error. This makes it impossible to tell a failed command or a torn-down network connection (which should produce a non-zero exit code) apart from a command that completed successfully and returned its complete output (exit code 0).

Unsure what causes variation here, but probably related to network I/O speed and/or latency. Here's an example of drastic failures and inconsistencies on my machine:

% for i in 1 2 3 4 5; do kubectl exec "${POD:?}" -- sh -c 'seq 1 2000000' | tail -n1; done
1788552
1994888
809911
1862792
1214600

If it doesn't reproduce for you, try adding some network latency or bandwidth restrictions, or increasing the amount of data that goes across the pipe. Visiting fast.com in a browser while running the tests, running more tests, and transferring more output (larger second value in the seq command) will help reproduce.

What did you expect to happen?

Either:

I get all of the command's output, OR
I get truncated output and a non-zero exit code
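
Until kubectl reports truncation itself, one way to approximate the second behavior on the client side is to have the remote command print a marker as its last line and fail locally if the marker never arrives. The following is a minimal sketch, not part of kubectl: the function name require_sentinel and the marker string are hypothetical, and it assumes truncation only ever drops the tail of the stream.

```shell
# Hypothetical marker; pick any string that cannot appear in the real output.
SENTINEL='__EXEC_OUTPUT_COMPLETE__'

# Copy stdin to stdout, dropping the marker line.
# Exit 0 only if the marker was the last line received; otherwise exit 1.
require_sentinel() {
  awk -v s="$SENTINEL" '
    # Hold each line back by one so the final line can be inspected in END.
    { if (seen) print prev; prev = $0; seen = 1 }
    END {
      if (seen && prev == s) exit 0   # marker arrived: output is complete
      if (seen) print prev            # no marker: emit what we got, then fail
      exit 1
    }'
}

# Example use (requires a cluster and a pod name in $POD):
#   kubectl exec "${POD:?}" -- sh -c 'seq 1 2000000; echo __EXEC_OUTPUT_COMPLETE__' \
#     | require_sentinel > out.txt || echo "output was truncated" >&2
```

Because require_sentinel is the last stage of the pipeline, its exit status becomes the pipeline's status even without pipefail.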

How can we reproduce it (as minimally and precisely as possible)?

Run a command against a pod several times and check whether the output is the same each time (it should be):

for i in 1 2 3 4 5; do kubectl exec "${POD:?}" -- sh -c 'seq 1 2000000' | tail -n1; done

If this doesn't reproduce try:

using a larger value for the upper bounds to seq
adding latency and/or bandwidth restrictions to the network link
(It wouldn't reproduce for a colleague until I had him visit fast.com in a browser while testing, which triggered it right away.)
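
To make the reproduction easier to quantify, the loop can count lines per run instead of eyeballing the last value. This is a rough sketch; count_short_runs is a hypothetical helper name, and it assumes a run is "short" whenever it yields a different number of lines than expected.

```shell
# Run a command several times and print how many runs produced an
# unexpected number of output lines.
# Usage: count_short_runs RUNS EXPECTED_LINES COMMAND [ARGS...]
count_short_runs() {
  runs=$1; expected=$2; shift 2
  short=0
  i=0
  while [ "$i" -lt "$runs" ]; do
    # tr strips the padding some wc implementations (e.g. macOS) add.
    lines=$("$@" | wc -l | tr -d ' ')
    if [ "$lines" -ne "$expected" ]; then
      short=$((short + 1))
      echo "run $((i + 1)): got $lines lines, expected $expected" >&2
    fi
    i=$((i + 1))
  done
  echo "$short"
}

# Example use (requires a cluster and a pod name in $POD):
#   count_short_runs 5 2000000 kubectl exec "${POD:?}" -- sh -c 'seq 1 2000000'
```

A result of 0 means every run returned the full output; anything higher is the number of truncated runs, with per-run details on stderr.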

Anything else we need to know?

This appears to have been a problem for a long time:

Kubectl appears to be discarding standard output
"kubectl exec" sometimes incorrectly returns empty string causing tests to flake
#34256 (closed w/ only workaround)

Kubernetes version

Also reproduces with (all permutations of) client version 1.29, server version v1.27.11-eks-b9c9ed7:

Client Version: v1.30.0
Kustomize Version: v5.0.4-0.20230601165947-6ce0bf390ce3
Server Version: v1.27.11-gke.1062000

Also reproed with:

Client Version: v1.29.1
Kustomize Version: v5.0.4-0.20230601165947-6ce0bf390ce3
Server Version: v1.27.11-eks-b9c9ed7

Cloud provider

Reproduces against both AWS and GCP managed k8s.

OS version

I am on OS X.

Darwin hostname 23.3.0 Darwin Kernel Version 23.3.0: Wed Dec 20 21:30:44 PST 2023; root:xnu-10002.81.5~7/RELEASE_ARM64_T6000 arm64

Didn't have anyone handy not on OS X to try to repro.

Install tools

No response

Container runtime (CRI) and version (if applicable)

No response

Related plugins (CNI, CSI, ...) and versions (if applicable)

No response


k8s-ci-robot · 2024-04-26T17:16:05Z

This issue is currently awaiting triage.

If a SIG or subproject determines this is a relevant issue, they will accept it by applying the triage/accepted label and provide further guidance.

The triage/accepted label can be added by org members by writing /triage accepted in a comment.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

brianpursley · 2024-04-26T17:43:29Z

This sounds like it could be the same problem as #60140 (comment)

This problem should be fixed in server version 1.30+ (actually 1.29+ if you enable the alpha feature). However, I know it is not always simple to upgrade.

As a potential workaround for earlier versions, some people have reported that adding a sleep at the end prevents the output from being truncated.
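
A sketch of what that workaround might look like, for anyone who wants to try it: the helper below wraps a command string so that it sleeps before exiting while keeping the command's own exit code. The name with_trailing_sleep and the one-second default are assumptions for illustration; the reports do not say how long the sleep needs to be.

```shell
# Build an sh -c script that runs COMMAND, sleeps, then exits with
# COMMAND's original status.
# Usage: with_trailing_sleep COMMAND [SECONDS]   (SECONDS defaults to 1)
with_trailing_sleep() {
  printf '%s; rc=$?; sleep %s; exit $rc' "$1" "${2:-1}"
}

# Example use (requires a cluster and a pod name in $POD):
#   kubectl exec "${POD:?}" -- sh -c "$(with_trailing_sleep 'seq 1 2000000')"
```

Capturing the status before the sleep matters, since otherwise the exit code seen by kubectl would be that of sleep rather than of the command.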

jvm-pangea · 2024-04-26T18:08:30Z

A sleep does work around this, generally.

I think this is different from the cp EOF problem: in that case the EOF is detected and an error is successfully reported (and I can imagine that happening due to PMTU problems or something, which was mentioned somewhere).

Looks like you're saying the data channel code for this has changed significantly in 1.30 (SPDY->WebSockets or the other way around?) though so that's promising. I'll let you know if I can test or repro on 1.30 if I get the chance to.

If anyone else has a 1.30 environment, would love to see if you can repro the issue there (network congestion is key.)

neolit123 · 2024-04-26T18:14:08Z

/sig cli

jvm-pangea added the kind/bug Categorizes issue or PR as related to a bug. label Apr 26, 2024

k8s-ci-robot added the needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. label Apr 26, 2024

k8s-ci-robot added the needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. label Apr 26, 2024

k8s-ci-robot added sig/cli Categorizes an issue or PR as relevant to SIG CLI. and removed needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. labels Apr 26, 2024
