Observability Operator

What Is It?

The Observability Operator deploys and maintains a common monitoring platform that Application Services share to monitor and report on their service components. It integrates with the Observatorium project to push metrics and logs to a central location.

What's included?

Prometheus

Prometheus is a free software application used for event monitoring and alerting. It records real-time metrics in a time series database, collects them using an HTTP pull model, and supports flexible queries and real-time alerting.

Prometheus's Alertmanager is also supported, which handles alerts sent by Prometheus & takes care of deduplicating, grouping, and routing them to the correct receiver integration (such as, in our case, PagerDuty). It also takes care of silencing and inhibition of alerts.

Grafana

Grafana is a multi-platform open source analytics and interactive visualization web application. When configured with supported data sources, Grafana provides charts, graphs, and other visualizations via UI dashboards.

Promtail

Promtail is the agent that collects logs and ships them to Loki, Grafana's platform for aggregating and analyzing logs. Loki is the logging backend used in Observatorium.

How do we integrate our service with Observability?

If you find yourself asking, chances are good a conversation's already been started in that regard, but if not, reach out! It'll likely be a bit more involved than plug-and-play and we'd like to make sure we support your needs. That being said (and know that what follows is absolutely subject to change)...

The Observability Operator is intended to support multiple application services, each of which is responsible for maintaining its own configuration repository and for creating a Secret containing the information the operator needs in order to read from it. Currently, configuration repos are expected to reside within our 'bf2' organization, as we use a special read-only mechanism limited to our organization for access.

As an example, first take a look at the configuration repository for the first service we've onboarded, Managed Kafka. There you'll find an index file and various configuration files referenced from within. In order to use this config repo, the Observability Operator must be told about it via a Secret:

kind: Secret
apiVersion: v1
metadata:
  name: kafka-observability-configuration
  namespace: kafka-observability
  labels:
    configures: observability-operator
stringData:
  access_token: '<token here>'
  channel: 'resources'
  repository: 'https://api.github.com/repos/bf2fc6cc711aee1a0c2a/observability-resources-mk/contents'
  tag: <tag or branch>

The Observability Operator doesn't care too much about what namespace the Secret resides in (that's not to say that we won't have an opinion, though!). Instead, it scans all namespaces for any Secrets matching a particular label set as specified in the Observability CR (more on that in a bit):

  configurationSelector:
    matchLabels:
      configures: "observability-operator"
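
For readers unfamiliar with label selectors: matchLabels follows the standard Kubernetes rule that a resource matches only if it carries every listed label with an identical value. The minimal Python sketch below illustrates that rule. The `matches_selector` helper and the sample label sets are hypothetical and are not taken from the operator's source.

```python
def matches_selector(labels, match_labels):
    """Return True if every selector label is present with the same value."""
    return all(labels.get(key) == value for key, value in match_labels.items())


selector = {"configures": "observability-operator"}

# Hypothetical labels, as they might appear on Secrets in different namespaces.
kafka_secret_labels = {"configures": "observability-operator", "team": "kafka"}
unrelated_secret_labels = {"app": "something-else"}

print(matches_selector(kafka_secret_labels, selector))      # True
print(matches_selector(unrelated_secret_labels, selector))  # False
```

Extra labels on the Secret do not prevent a match; only the labels named in the selector are compared.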

What's supported via external config?

Within a given resources folder, an index.json file must exist containing, at a minimum, id and config fields:

{
  "id": "shiny-managed-service-development",
  "config": {...}
}

The id field (specifying both the service and channel) is used by the observability operator to label & track various generated resources.
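
Before pushing a config repo change, it can help to check the index file locally. The following is a minimal Python sketch that only verifies the two required fields described above. The `validate_index` helper is hypothetical, and the operator's own validation may be stricter or differ in detail.

```python
import json


def validate_index(raw):
    """Parse index.json text and check the two required top-level fields."""
    index = json.loads(raw)
    if not isinstance(index, dict):
        raise ValueError("index.json must contain a JSON object")
    missing = [field for field in ("id", "config") if field not in index]
    if missing:
        raise ValueError("index.json is missing required field(s): " + ", ".join(missing))
    if not isinstance(index["config"], dict):
        raise ValueError("index.json 'config' must be a JSON object")
    return index


sample = '{"id": "shiny-managed-service-development", "config": {}}'
index = validate_index(sample)
print(index["id"])  # shiny-managed-service-development
```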

The config field may contain various entries as appropriate for your service:

config.grafana.dashboards expects an array of subdirectory/file.yaml entries, each pointing to a complete Grafana Dashboard YAML definition file:
```
  "grafana": {
    "dashboards": [
      "grafana/foo-dashboard.yaml",
      "grafana/bar-dashboard.yaml"
    ]
  },
```

config.promtail specifies whether Promtail should be used and, if so, a namespace label selector for matching:

  "promtail": {
    "observatorium": "default",
    "enabled": true,
    "namespaceLabelSelector": {
      "app": "strimzi"
    }
  },

config.promtail.observatorium specifies the id of the Observatorium config to forward logs to.

config.alertmanager indicates the names of two prerequisite secrets, assumed to already exist on the cluster, used to configure the Alertmanager PagerDuty & Dead Man's Snitch integrations:
```
  "alertmanager": {
    "pagerDutySecretName": "pagerduty",
    "deadmansSnitchSecretName": "deadmanssnitch"
  },
```

config.prometheus.pod_monitors expects an array of sub/directory/file.yaml entries, each pointing to a complete Prometheus PodMonitor YAML definition file:

  "prometheus": {
    "pod_monitors": [
      "prometheus/pod_monitors/foo-monitor.yaml",
      "prometheus/pod_monitors/bar-monitor.yaml"
    ],

config.prometheus.rules expects an array of sub/directory/file.yaml entries, each pointing to a complete PrometheusRule YAML definition file:
```
  "rules": [
    "prometheus/prometheus-rules.yaml"
  ],
```
config.prometheus.federation expects a single subdirectory/file.yaml location pointing to a file containing an array of regex patterns to be concatenated & used in instantiating a Prometheus additional scrape config secret:
```
"federation": "prometheus/federation-config.yaml",
```
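
To make "concatenated" concrete, here is a minimal Python sketch of one plausible way a list of regex patterns could be combined into a single selector for the `match[]` parameter of Prometheus's `/federate` endpoint. This is an assumption for illustration only: the `federation_matcher` helper, the `|` join, and the sample patterns are hypothetical, and the operator's actual scrape config may be built differently.

```python
def federation_matcher(patterns):
    """Join regex patterns into one metric-name selector (assumed format)."""
    if not patterns:
        raise ValueError("at least one pattern is required")
    return '{__name__=~"' + "|".join(patterns) + '"}'


# Hypothetical patterns, as they might be listed in federation-config.yaml.
patterns = ["kafka_.*", "strimzi_.*"]
print(federation_matcher(patterns))  # {__name__=~"kafka_.*|strimzi_.*"}
```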
config.prometheus.observatorium specifies the id of the Observatorium config to forward metrics to.

config.prometheus.remoteWrite expects a single subdirectory/file.yaml location pointing to a file containing an array of regex patterns to be concatenated & used in instantiating the Prometheus operand (CR):
```
  "remoteWrite": "prometheus/remote-write.yaml"
```

config.observatoria expects an array of Observatorium configs, each with an id referenced by prometheus and/or promtail:

  [{
      "id": "default",
      "secretName": "observatorium-configuration-red-hat-sso"
  }, ...]
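
Because prometheus and promtail refer to Observatorium configs by id, a mistyped id is an easy mistake to make. The following minimal Python sketch checks those references against the declared ids. The `unknown_observatorium_refs` helper and the "staging" id are hypothetical, and this is not a description of how the operator itself validates the config.

```python
def unknown_observatorium_refs(config):
    """Return (section, id) pairs that point at an undeclared Observatorium id."""
    declared = {entry["id"] for entry in config.get("observatoria", [])}
    unknown = []
    for section in ("prometheus", "promtail"):
        ref = config.get(section, {}).get("observatorium")
        if ref is not None and ref not in declared:
            unknown.append((section, ref))
    return unknown


config = {
    "promtail": {"enabled": True, "observatorium": "default"},
    "prometheus": {"observatorium": "staging"},  # hypothetical, undeclared id
    "observatoria": [
        {"id": "default", "secretName": "observatorium-configuration-red-hat-sso"}
    ],
}

print(unknown_observatorium_refs(config))  # [('prometheus', 'staging')]
```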

Additionally, an empty ConfigMap can be created in a target namespace to prevent an Observability operand (CR) from being created in that namespace.

The ConfigMap requires the name to be set to observability-operator-no-init and the target namespace to be specified:

  kind: ConfigMap
  apiVersion: v1
  metadata:
    name: observability-operator-no-init
    namespace: my-target-namespace

What's required in the Observability operand (CR)?

Your spec must specify, at a minimum, three things:

- a resyncPeriod to indicate how often external config should be re-fetched
- a retention to configure the lifetime of stored data
- a configurationSelector indicating which labels to match when scanning for external config Secrets, as previously mentioned

  apiVersion: observability.redhat.com/v1
  kind: Observability
  metadata:
    name: observability-sample
  spec:
    resyncPeriod: 1h
    retention: 45d
    configurationSelector:
      matchLabels:
        configures: "observability-operator"

What's optionally supported via the Observability operand (CR)?

Prometheus storage config

spec:
  storage:
    prometheus:
      volumeClaimTemplate:
        spec:
          storageClassName: ssd
          resources:
            requests:
              storage: 40Gi

Node Tolerations

spec:
  tolerations:
  - effect: NoSchedule
    key: node-role.kubernetes.io/infra
    operator: Exists

Node Affinities

spec:
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
          - matchExpressions:
            - key: node-role.kubernetes.io/infra
              operator: Exists

Running Locally

Prerequisite Tools

golang 1.16+
operator-sdk v1.1.0
CodeReady Containers 1.18 (OCP 4.6.1) or later
OpenShift command line tool

Setting up your cluster

We recommend targeting a local CRC instance for daily development. Though we're hedging our bets a bit with these amounts, we do feel the additional CPU and memory parameters are essential for a stable platform to run against:

crc start --pull-secret-file $HOME/.crc/cache/pull-secret.txt --cpus 6 --memory 24576

Once your cluster is up and running, choose an existing namespace (or create a new one) to hold the various generated files. The operator looks in this namespace for an Observability operand (CR) and, if none is found, generates its own. For it to do so, you need to indicate the namespace in one of two ways:
- specify the WATCH_NAMESPACE=foo environment variable when running
- create a file containing only the namespace name at /var/run/secrets/kubernetes.io/serviceaccount/namespace to emulate a pod environment & prevent having to supply the env var every run.
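
The two options above can be pictured as a simple lookup. The minimal Python sketch below assumes the environment variable takes precedence over the file. That ordering, the `resolve_namespace` helper, and the error message are assumptions for illustration, not the operator's confirmed behavior.

```python
import os
from pathlib import Path

# Path quoted in the text above; inside a pod this file holds the pod's namespace.
NAMESPACE_FILE = Path("/var/run/secrets/kubernetes.io/serviceaccount/namespace")


def resolve_namespace(env=None, namespace_file=NAMESPACE_FILE):
    """Prefer WATCH_NAMESPACE, then fall back to the namespace file (assumed order)."""
    env = os.environ if env is None else env
    from_env = env.get("WATCH_NAMESPACE", "").strip()
    if from_env:
        return from_env
    try:
        from_file = Path(namespace_file).read_text().strip()
    except OSError:
        # Missing or unreadable file: treat it as "not configured".
        from_file = ""
    if from_file:
        return from_file
    raise RuntimeError("no namespace: set WATCH_NAMESPACE or create the namespace file")


print(resolve_namespace(env={"WATCH_NAMESPACE": "foo"}))  # foo
```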

Adding requisite resources

Note that some files require values that are not supplied here or within sample files for security reasons - please review the content of each!

PagerDuty secret:

oc apply -f config/samples/secrets/pagerduty.yaml

DeadmansSnitch secret:

oc apply -f config/samples/secrets/deadmanssnitch.yaml

Auth provider config secret:

Users can choose between two auth configurations: Dex or Red Hat SSO.

Dex config secret

oc apply -f config/samples/secrets/observatorium-dex-credentials.yaml

Red Hat SSO secret

oc apply -f config/samples/secrets/observatorium-configuration-red-hat-sso.yaml

External config repo secret:
- The Observability stack requires a Personal Access Token to read externalized configuration from within the bf2 organization. For development cycles, you will need to generate a personal token for your own GitHub user (with bf2 access) and place the value in the Secret.
- To generate a new token:
  - Follow the steps found here, making sure to check ONLY the repo box at the top of the scopes/permissions list (which will check each of the subcategory boxes beneath it).
  - Copy the value of your Personal Access Token to a secure private location. Once you leave the page, you cannot access the value again & you will be forced to reset the token to receive a new value should you lose the original.
  - Take care not to push your PAT to any repository. If you do, GitHub will automatically revoke the token as soon as you push, and you'll need to repeat this process to generate a new one.
- Apply the Secret with token value substituted in:
```
oc apply -f config/samples/secrets/observability_secret.yaml
```
The required secrets listed above can be configured and deployed using
```
make deploy/secrets
```
- The following parameters can be modified:
  - NAMESPACE: Defaults to current namespace in use.
  - OBSERVATORIUM_TENANT: Defaults to managedKafka
  - OBSERVATORIUM_GATEWAY: Defaults to https://observatorium-mst.api.stage.openshift.com
  - OBSERVATORIUM_AUTH_TYPE: Defaults to redhat
  - OBSERVATORIUM_RHSSO_URL: Defaults to https://sso.redhat.com/auth/
  - OBSERVATORIUM_RHSSO_REALM: Defaults to redhat-external
  - OBSERVATORIUM_RHSSO_METRICS_CLIENT_ID: No default provided
  - OBSERVATORIUM_RHSSO_METRICS_SECRET: No default provided
  - OBSERVATORIUM_RHSSO_LOGS_CLIENT_ID: No default provided
  - OBSERVATORIUM_RHSSO_LOGS_SECRET: No default provided
  - GITHUB_ACCESS_TOKEN: No default provided
- More information about configurable parameters can be found by running:
```
oc process --parameters -f ./templates/secrets-template.yml
```
For users deploying to CRC, an additional secret is required in the openshift-monitoring namespace for grafana datasources. This secret can be deployed with the command:
```
make deploy/crc/secret
```
- The following parameters can be modified:
  - DATASOURCES: Defaults to ZHVtbXk=
  - PROMETHEUS: Defaults to ZHVtbXk=
- More information about configurable parameters can be found by running:
```
oc process --parameters -f ./templates/crc-secret-template.yml
```
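
The default value `ZHVtbXk=` is the base64 encoding of the string `dummy`, which suggests (as an assumption, not confirmed by the template itself) that these two parameters expect base64-encoded values. A minimal Python sketch for producing or inspecting such values follows. The `encode_value` and `decode_value` helpers are hypothetical.

```python
import base64


def encode_value(plain):
    """Base64-encode a plain-text string for use as a parameter value."""
    return base64.b64encode(plain.encode("utf-8")).decode("ascii")


def decode_value(encoded):
    """Decode a base64 parameter value back to plain text."""
    return base64.b64decode(encoded).decode("utf-8")


print(decode_value("ZHVtbXk="))  # dummy
print(encode_value("dummy"))     # ZHVtbXk=
```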
Priority Class
- If the Observability Operator was not installed using OLM, you need to create the required Priority Class yourself. Use the following command to add it:
```
oc apply -f config/samples/prioclass.yaml
```

Running via `Makefile`

If this is the first time you've run against the cluster (or your CRD has changed and been uninstalled):

make install

Run the operator as a local process:

WATCH_NAMESPACE=<namespace> make run

Alternatively, you can deploy the operator's latest image to your cluster:

make deploy

Running via IntelliJ

If you encounter errors regarding PrometheusRule validation webhooks, you can use the following:
```
oc delete ValidatingWebhookConfiguration prometheusrules.openshift.io
```

Running via VS Code

Add the following to your launch.json file to enable running and debugging:

{
  "version": "0.2.0",
  "configurations": [{
    "name": "Observability Operator",
    "type": "go",
    "request": "launch",
    "mode": "auto",
    "program": "${workspaceFolder}/main.go",
    "env": {
      "WATCH_NAMESPACE": "kafka-observability",
      "KUBERNETES_CONFIG": "~/.kube/config",
      "OPERATOR_NAME": "observability-operator"
    },
    "cwd": "${workspaceFolder}",
    "args": []
  }]
}

Contributing

Fork the Project
Create your Feature Branch (git checkout -b feature/AmazingFeature)
Commit your Changes (git commit -m 'Add some AmazingFeature')
Push to the Branch (git push origin feature/AmazingFeature)
Open a Pull Request

Issues

If you'd like to report an issue, feel free to use our project Issues page.

Contact

If all else fails, you can reach us at mk-config-user[at]redhat.com
