Improve the kubelet check error reporting in the output of agent status #7495

Merged (1 commit) on Sep 7, 2020

@L3n41c L3n41c commented Sep 3, 2020

What does this PR do?

In combination with DataDog/datadog-agent#6315, this change improves the kubelet check error reported in the output of agent status when the agent cannot connect to the kubelet.

Motivation

When the agent could not connect to the kubelet, the useful details were only in the logs; the output of the agent status command gave no indication of the cause.

Here is an example of the agent status output in such a case:

$ agent status
[…]

=========
Collector
=========

  Running Checks
  ==============

    kubelet (4.1.1)
    ---------------
      Instance ID: kubelet:d884b5186b651429 [ERROR]
      Configuration Source: file:/etc/datadog-agent/conf.d/kubelet.d/conf.yaml.default
      Total Runs: 1
      Metric Samples: Last Run: 0, Total: 0
      Events: Last Run: 0, Total: 0
      Service Checks: Last Run: 0, Total: 0
      Average Execution Time : 2ms
      Last Execution Date : 2020-09-03 11:40:31.000000 UTC
      Last Successful Execution Date : Never
      Error: Unable to detect the kubelet URL automatically.
      Traceback (most recent call last):
        File "/opt/datadog-agent/embedded/lib/python3.8/site-packages/datadog_checks/base/checks/base.py", line 827, in run
          self.check(instance)
        File "/opt/datadog-agent/embedded/lib/python3.8/site-packages/datadog_checks/kubelet/kubelet.py", line 297, in check
          raise CheckException("Unable to detect the kubelet URL automatically.")
      datadog_checks.base.errors.CheckException: Unable to detect the kubelet URL automatically.

Here is what the output becomes with this PR:

$ agent status
[…]

=========
Collector
=========

  Running Checks
  ==============

    kubelet (4.1.1)
    ---------------
      Instance ID: kubelet:d884b5186b651429 [ERROR]
      Configuration Source: file:/etc/datadog-agent/conf.d/kubelet.d/conf.yaml.default
      Total Runs: 1
      Metric Samples: Last Run: 0, Total: 0
      Events: Last Run: 0, Total: 0
      Service Checks: Last Run: 0, Total: 0
      Average Execution Time : 3ms
      Last Execution Date : 2020-09-03 11:14:20.000000 UTC
      Last Successful Execution Date : Never
      Error: Unable to detect the kubelet URL automatically: cannot set a valid kubelet host: cannot connect to kubelet using any of the given hosts: [1.2.3.4] [], Errors: [Get https://1.2.3.4:10250/: net/http: request canceled while waiting for connection (Client.Timeout exceeded while awaiting headers) cannot connect: http: "Get http://1.2.3.4:10255/: net/http: request canceled while waiting for connection (Client.Timeout exceeded while awaiting headers)"]
      Traceback (most recent call last):
        File "/opt/datadog-agent/embedded/lib/python3.8/site-packages/datadog_checks/base/checks/base.py", line 827, in run
          self.check(instance)
        File "/opt/datadog-agent/embedded/lib/python3.8/site-packages/datadog_checks/kubelet/kubelet.py", line 297, in check
          raise CheckException("Unable to detect the kubelet URL automatically: " + kubelet_conn_info.get('err'))
      datadog_checks.base.errors.CheckException: Unable to detect the kubelet URL automatically: cannot set a valid kubelet host: cannot connect to kubelet using any of the given hosts: [1.2.3.4] [], Errors: [Get https://1.2.3.4:10250/: net/http: request canceled while waiting for connection (Client.Timeout exceeded while awaiting headers) cannot connect: http: "Get http://1.2.3.4:10255/: net/http: request canceled while waiting for connection (Client.Timeout exceeded while awaiting headers)"]
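
For readers who want to see the shape of the check-side change shown in the traceback above, here is a minimal sketch, not the actual diff. It assumes that `kubelet_conn_info` is a dict whose `err` entry carries the connection error reported by the agent, as the traceback suggests. The `url` key, the `kubelet_error_message` and `resolve_kubelet_url` helpers, and the local `CheckException` class are hypothetical stand-ins so that the example runs without the Datadog packages.

```python
class CheckException(Exception):
    """Local stand-in for datadog_checks.base.errors.CheckException."""


MESSAGE = "Unable to detect the kubelet URL automatically"


def kubelet_error_message(kubelet_conn_info):
    """Build the error text, appending the agent-provided reason if present."""
    err = kubelet_conn_info.get("err")
    if err:
        return "{}: {}".format(MESSAGE, err)
    # No reason was reported: fall back to the generic message.
    return MESSAGE + "."


def resolve_kubelet_url(kubelet_conn_info):
    """Return the kubelet URL, or raise CheckException with the reason."""
    # 'url' is a hypothetical key used for illustration only.
    url = kubelet_conn_info.get("url")
    if not url:
        raise CheckException(kubelet_error_message(kubelet_conn_info))
    return url


if __name__ == "__main__":
    try:
        resolve_kubelet_url({"err": "cannot set a valid kubelet host"})
    except CheckException as exc:
        print(exc)
```

The line in the traceback concatenates `kubelet_conn_info.get('err')` directly onto the message, which in Python raises a `TypeError` if the key is missing. The sketch guards against that case; whether the key can actually be absent depends on the agent side, which is not shown here.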

Additional Notes

This would help the investigation of issues like #2582.

Review checklist (to be filled by reviewers)

  • Feature or bugfix MUST have appropriate tests (unit, integration, e2e)
  • PR title must be written as a CHANGELOG entry
  • File changes must correspond to the primary purpose of the PR as described in the title (small unrelated changes should have their own PR)
  • PR must have changelog/ and integration/ labels attached

@L3n41c L3n41c merged commit 1638ecc into master Sep 7, 2020
@L3n41c L3n41c deleted the lenaic/kubelet_error_status branch September 7, 2020 10:26