Can't enable DB monitoring collect_schemas feature: job "database-metadata" crashing. #16498

Open
se-ipsip opened this issue Dec 27, 2023 · 0 comments

Output of the info page

Getting the status from the agent.


===============
Agent (v7.49.1)
===============

  Status date: 2023-12-27 11:08:42.991 UTC (1703675322991)
  Agent start: 2023-12-27 11:07:33.663 UTC (1703675253663)
  Pid: 1
  Go Version: go1.20.10
  Python Version: 3.9.18
  Build arch: amd64
  Agent flavor: agent
  Check Runners: 4
  Log Level: INFO

  Paths
  =====
    Config File: /etc/datadog-agent/datadog.yaml
    conf.d: /etc/datadog-agent/conf.d
    checks.d: /etc/datadog-agent/checks.d

  Clocks
  ======
    System time: 2023-12-27 11:08:42.991 UTC (1703675322991)

  Host Info
  =========
    bootTime: 2023-12-20 09:54:14 UTC (1703066054000)
    hostId: <redacted>
    kernelArch: x86_64
    kernelVersion: 6.1.58+
    os: linux
    platform: ubuntu
    platformFamily: debian
    platformVersion: 23.04
    procs: 4
    uptime: 169h14m20s
    virtualizationRole: guest

  Hostnames
  =========
<redacted>

  Metadata
  ========

=========
Collector
=========

  Running Checks
  ==============

    postgres (15.1.1)
    -----------------
      Instance ID: postgres:5e98737379db6dc5 [OK]
      Configuration Source: kube_services:kube_service://datadog/datadog-cloudsql-proxy
      Total Runs: 3
      Metric Samples: Last Run: 307, Total: 814
      Events: Last Run: 0, Total: 0
      Database Monitoring Activity Samples: Last Run: 1, Total: 3
      Database Monitoring Metadata Samples: Last Run: 1, Total: 3
      Database Monitoring Query Metrics: Last Run: 1, Total: 3
      Database Monitoring Query Samples: Last Run: 27, Total: 62
      Service Checks: Last Run: 1, Total: 3
      Average Execution Time : 328ms
      Last Execution Date : 2023-12-27 11:08:31 UTC (1703675311000)
      Last Successful Execution Date : 2023-12-27 11:08:31 UTC (1703675311000)

      Instance ID: postgres:73bd4a61bbd12aab [OK]
      Configuration Source: kube_services:kube_service://datadog/datadog-cloudsql-proxy
      Total Runs: 2
      Metric Samples: Last Run: 350, Total: 468
      Events: Last Run: 0, Total: 0
      Database Monitoring Activity Samples: Last Run: 2, Total: 2
      Database Monitoring Metadata Samples: Last Run: 1, Total: 3
      Database Monitoring Query Metrics: Last Run: 2, Total: 2
      Database Monitoring Query Samples: Last Run: 9, Total: 9
      Service Checks: Last Run: 1, Total: 2
      Average Execution Time : 118ms
      Last Execution Date : 2023-12-27 11:08:23 UTC (1703675303000)
      Last Successful Execution Date : 2023-12-27 11:08:23 UTC (1703675303000)


==========
Aggregator
==========
  Checks Metric Sample: 1,450
  Dogstatsd Metric Sample: 1
  Event: 1
  Events Flushed: 1
  Number Of Flushes: 3
  Series Flushed: 313
  Service Check: 5
  Service Checks Flushed: 6
  Database Monitoring Activity Samples: 7
  Database Monitoring Metadata Samples: 7
  Database Monitoring Query Metrics: 6
  Database Monitoring Query Samples: 96
==========
Endpoints
==========
  https://app.datadoghq.eu - API Key ending with:
      - <redacted>


=====================
Datadog Cluster Agent
=====================

  - Datadog Cluster Agent endpoint detected: https://<redacted>:5005
  Successfully connected to the Datadog Cluster Agent.
  - Running: 7.49.1+commit.1790cab

=============
Autodiscovery
=============
  Enabled Features
  ================
    kubernetes

Additional environment details (Operating System, Cloud provider, etc):
GCP CloudSQL Postgres
DD running on GKE autopilot, deployed with Helm
PGSQL 14
Datadog connection via CloudSQL Proxy

Steps to reproduce the issue:

  1. Configure for Database Monitoring
     {
        "postgres": {
          "init_config": {},
          "instances": [
            {
              "host": "datadog-cloudsql-proxy",
              "reported_hostname": "gcpsql-01",
              "port": 5432,
              "dbstrict": true,
              "username": "<username>",
              "dbname": "db01",
              "dbm": true,
              "relations": [
                "relation_regex: .*"
              ]
            }
          ]
        }
      } 
  2. Enable the schema collection feature (https://docs.datadoghq.com/database_monitoring/setup_postgres/gcsql/?tab=kubernetes#collecting-schemas)
     by adding the following to the config:
     "collect_schemas": { "enabled": true }
  3. Observe the following error stack trace in the datadog-agent-clusterchecks container:
2023-12-27 10:56:14 UTC | CORE | ERROR | (pkg/collector/python/datadog_agent.go:129 in LogMessage) | postgres:8f8c6091ed5a240d | (utils.py:327) | [kube_service:datadog-cloudsql-proxy,env:demo,service:postgres,kube_namespace:datadog,cluster_name:***, kube_cluster_name:***,server:datadog-cloudsql-proxy,port:5432,db:postgres,dd.internal.resource:database_instance:gcpsql-01,job:database-metadata] Job loop crash
Traceback (most recent call last):
  File "/opt/datadog-agent/embedded/lib/python3.9/site-packages/datadog_checks/base/utils/db/utils.py", line 305, in _job_loop
    self._run_job_rate_limited()
  File "/opt/datadog-agent/embedded/lib/python3.9/site-packages/datadog_checks/base/utils/db/utils.py", line 344, in _run_job_rate_limited
    self._run_job_traced()
  File "/opt/datadog-agent/embedded/lib/python3.9/site-packages/datadog_checks/base/utils/db/utils.py", line 350, in _run_job_traced
    return self.run_job()
  File "/opt/datadog-agent/embedded/lib/python3.9/site-packages/datadog_checks/postgres/metadata.py", line 216, in run_job
    self.report_postgres_metadata()
  File "/opt/datadog-agent/embedded/lib/python3.9/site-packages/datadog_checks/base/utils/tracking.py", line 71, in wrapper
    result = function(self, *args, **kwargs)
  File "/opt/datadog-agent/embedded/lib/python3.9/site-packages/datadog_checks/postgres/metadata.py", line 242, in report_postgres_metadata
    metadata = self._collect_schema_info()
  File "/opt/datadog-agent/embedded/lib/python3.9/site-packages/datadog_checks/postgres/metadata.py", line 274, in _collect_schema_info
    metadata.append(self._collect_metadata_for_database(database))
  File "/opt/datadog-agent/embedded/lib/python3.9/site-packages/datadog_checks/postgres/metadata.py", line 461, in _collect_metadata_for_database
    tables_info = self._query_table_information_for_schema(cursor, schema['id'], dbname)
  File "/opt/datadog-agent/embedded/lib/python3.9/site-packages/datadog_checks/postgres/metadata.py", line 396, in _query_table_information_for_schema
    tables_info = self._get_table_info(cursor, dbname, schema_id)
  File "/opt/datadog-agent/embedded/lib/python3.9/site-packages/datadog_checks/postgres/metadata.py", line 324, in _get_table_info
    table_info = self._filter_tables_with_no_relation_metrics(dbname, table_info)
  File "/opt/datadog-agent/embedded/lib/python3.9/site-packages/datadog_checks/postgres/metadata.py", line 338, in _filter_tables_with_no_relation_metrics
    if table['name'] in cache[dbname].keys():
KeyError: 'db01'
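
For anyone triaging: the last frame indexes a per-database cache with `cache[dbname]`, which raises when that database has no entry. Below is a minimal sketch of that failure mode and of a tolerant lookup. The function names, the cache contents, and the sample tables are all hypothetical stand-ins, not the actual `datadog_checks` code, and the sketch does not claim to identify why the `db01` entry is missing.

```python
# Hypothetical stand-ins for illustration only; this is not the real
# datadog_checks implementation or its cache layout.

def filter_tables_strict(cache, dbname, tables):
    # Same shape as the failing line in the traceback: cache[dbname]
    # raises KeyError when no entry exists for this database.
    return [t for t in tables if t["name"] in cache[dbname].keys()]


def filter_tables_tolerant(cache, dbname, tables):
    # Treat a missing database entry as "no tables known yet".
    known = cache.get(dbname, {})
    return [t for t in tables if t["name"] in known]


# Assumed sample data: the cache has an entry for "postgres" but not "db01".
cache = {"postgres": {"users": 1}}
tables = [{"name": "users"}, {"name": "orders"}]

try:
    filter_tables_strict(cache, "db01", tables)
except KeyError as exc:
    print("strict lookup failed:", exc)  # strict lookup failed: 'db01'

print(filter_tables_tolerant(cache, "db01", tables))      # []
print(filter_tables_tolerant(cache, "postgres", tables))  # [{'name': 'users'}]
```

The tolerant variant only avoids the crash; with an empty entry it would still report no tables for `db01`, so the underlying question of why the cache has no `db01` key remains open.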

Describe the results you received:
The Schema (BETA) tab on the APM DB Monitoring page is still empty.

Describe the results you expected:
Schema information is collected and shown on the APM DB Monitoring page.
