[CEP] Support for Prometheus metrics #26816

Open
snopoke opened this issue Mar 9, 2020 · 0 comments
Labels
CEP: done CEP CommCare Enhancement Proposal

snopoke commented Mar 9, 2020

Abstract
CommCare currently supports sending metrics to Datadog. This proposal outlines changes required to support exposing metrics compatible with Prometheus. Prometheus is an open source monitoring solution which can be hosted alongside CommCare.

Motivation
The goal is to make it easier for organizations outside of Dimagi to run and support CommCare without depending on paid services. The specific use case at present is the ICDS program: as part of the effort to hand over the operation of CommCare to the government, a self-hosted monitoring solution is desirable.

Specification
Some important differences between Datadog and Prometheus:

| Function | Datadog | Prometheus |
| --- | --- | --- |
| Metric collection | Datadog is a push-based system. Agents running on each host collate metrics and push them to the central Datadog API. Custom services can be instrumented to send metrics to StatsD, which the Datadog agent forwards along with the other host-level metrics. | Prometheus is primarily a pull-based system. The Prometheus server makes HTTP requests to configured endpoints, from which it scrapes metrics. Prometheus does support pushed metrics for certain use cases, but pushing is not the primary method of collecting metrics. |
| Metric definition | The Datadog client libraries allow metrics to be defined dynamically via the metric name and a dynamic list of tags. | Prometheus client libraries require each metric to be defined up front as a global object. They also require the metric's labels to be declared at creation. |
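
To make the difference concrete, the sketch below formats the same counter in both wire formats: a DogStatsD datagram that the application pushes on every increment, and a Prometheus text-format sample that reports the running total when the server scrapes. The helper names are hypothetical and are not part of either client library. Prometheus metric names may not contain dots, so the sketch assumes they would be replaced with underscores.

```python
def statsd_datagram(name, value, tags):
    """Format one counter increment as a DogStatsD datagram (push model)."""
    tag_str = ",".join(f"{k}:{v}" for k, v in sorted(tags.items()))
    return f"{name}:{value}|c|#{tag_str}"


def prometheus_sample(name, total, labels):
    """Format a counter's running total as one text-format sample (pull model)."""
    prom_name = name.replace(".", "_")  # assumption: dots mapped to underscores
    label_str = ",".join(f'{k}="{v}"' for k, v in sorted(labels.items()))
    return f"{prom_name}{{{label_str}}} {total}"


print(statsd_datagram("commcare.blobs.added.count", 1, {"type_code": 1}))
print(prometheus_sample("commcare.blobs.added.count", 3, {"type_code": 1}))
```

The push format carries a single event, while the scrape format carries accumulated state, which is why a Prometheus-backed process has to keep its own totals in memory.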

Instrumentation

Since the Prometheus client library has a more restrictive API, it is recommended that a compatible Python API be created for Datadog so that the two can be used interchangeably. The following example illustrates the potential usage:

# this may be declared at the file level
metric_blobs_added = get_metrics_provider().counter(
    'commcare.blobs.added.count', 'Count of blobs added', tag_names=['type_code'])

metric_blobs_added.tag(type_code=1).inc()

The metrics provider may be interchanged between Datadog and Prometheus based on the configuration values in the system. This will work in a similar fashion to how the BlobDB currently works.

Exposing metrics
The metrics provider for Datadog will continue to push metrics to a local StatsD instance.

Exposing metrics for Prometheus requires an additional HTTP endpoint that the Prometheus server can scrape. This endpoint can be secured by blocking external access to it at the nginx proxy.
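
A minimal sketch of such an endpoint, using only the Python standard library, is shown below. The `/metrics` path follows the usual Prometheus convention; the `TOTALS` dictionary, the handler, and `scrape_once` are hypothetical names, and a real deployment would more likely serve this from the existing web application or the official client library. The sketch binds to the loopback interface only, which complements blocking the path at nginx.

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical in-process totals; a real provider would maintain these.
TOTALS = {("commcare_blobs_added_count", (("type_code", "1"),)): 2}


def render_metrics(totals):
    """Render running totals in the Prometheus text exposition format."""
    lines = []
    for (name, labels), value in sorted(totals.items()):
        label_str = ",".join(f'{k}="{v}"' for k, v in labels)
        lines.append(f"{name}{{{label_str}}} {value}")
    return "\n".join(lines) + "\n"


class MetricsHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        if self.path != "/metrics":
            self.send_error(404)
            return
        body = render_metrics(TOTALS).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "text/plain; version=0.0.4; charset=utf-8")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep the example quiet


def scrape_once():
    """Start the endpoint on a free loopback port and fetch it once."""
    server = HTTPServer(("127.0.0.1", 0), MetricsHandler)
    thread = threading.Thread(target=server.serve_forever, daemon=True)
    thread.start()
    try:
        conn = http.client.HTTPConnection("127.0.0.1", server.server_port, timeout=5)
        conn.request("GET", "/metrics")
        body = conn.getresponse().read().decode("utf-8")
        conn.close()
        return body
    finally:
        server.shutdown()
        server.server_close()


print(scrape_once(), end="")
```

The request made inside `scrape_once` plays the role of the Prometheus server, which would issue the same GET on its configured scrape interval.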

Impact on users
None

Impact on hosting
This should not impact any existing hosting but will create an alternative monitoring solution for hosters.

Backwards compatibility
Backwards compatibility with current metrics will be maintained.

Release Timeline
End of Q2 2020.

Open questions and issues
None

@snopoke snopoke added the CEP CommCare Enhancement Proposal label Mar 9, 2020
@snopoke snopoke changed the title [CEP] Support for Promethius metrics [CEP] Support for Prometheus metrics Mar 9, 2020