Initial commit for setting up a new component: k8slog receiver #24439

Open: wants to merge 6 commits into base: main (showing changes from 5 commits)
20 changes: 20 additions & 0 deletions .chloggen/k8slog_receiver_setup.yaml
@@ -0,0 +1,20 @@
# Use this changelog template to create an entry for release notes.
# If your change doesn't affect end users, such as a test fix or a tooling change,
# you should instead start your pull request title with [chore] or use the "Skip Changelog" label.

# One of 'breaking', 'deprecation', 'new_component', 'enhancement', 'bug_fix'
change_type: new_component

# The name of the component, or a single word describing the area of concern, (e.g. filelogreceiver)
component: k8slogreceiver

# A brief description of the change. Surround your text with quotes ("") if it needs to start with a backtick (`).
note: "Add the skeleton for the new k8slogreceiver in development."

# Mandatory: One or more tracking issues related to the change. You can use the PR number here if no issue exists.
issues: [23339]

# (Optional) One or more lines of additional information to render under the primary note.
# These lines will be padded with 2 spaces and then inserted directly into the document.
# Use pipe (|) for multiline entries.
subtext:
1 change: 1 addition & 0 deletions .github/CODEOWNERS
@@ -219,6 +219,7 @@ receiver/jmxreceiver/ @open-telemetry/collect
receiver/journaldreceiver/ @open-telemetry/collector-contrib-approvers @sumo-drosiek @djaglowski
receiver/k8sclusterreceiver/ @open-telemetry/collector-contrib-approvers @dmitryax @TylerHelmuth @povilasv
receiver/k8seventsreceiver/ @open-telemetry/collector-contrib-approvers @dmitryax @TylerHelmuth
receiver/k8slogreceiver/ @open-telemetry/collector-contrib-approvers @h0cheung @TylerHelmuth
receiver/k8sobjectsreceiver/ @open-telemetry/collector-contrib-approvers @dmitryax @hvaghani221 @TylerHelmuth
receiver/kafkametricsreceiver/ @open-telemetry/collector-contrib-approvers @dmitryax
receiver/kafkareceiver/ @open-telemetry/collector-contrib-approvers @pavolloffay @MovieStoreGuy
1 change: 1 addition & 0 deletions .github/ISSUE_TEMPLATE/bug_report.yaml
@@ -215,6 +215,7 @@ body:
- receiver/journald
- receiver/k8scluster
- receiver/k8sevents
- receiver/k8slog
- receiver/k8sobjects
- receiver/kafka
- receiver/kafkametrics
1 change: 1 addition & 0 deletions .github/ISSUE_TEMPLATE/feature_request.yaml
@@ -209,6 +209,7 @@ body:
- receiver/journald
- receiver/k8scluster
- receiver/k8sevents
- receiver/k8slog
- receiver/k8sobjects
- receiver/kafka
- receiver/kafkametrics
1 change: 1 addition & 0 deletions .github/ISSUE_TEMPLATE/other.yaml
@@ -209,6 +209,7 @@ body:
- receiver/journald
- receiver/k8scluster
- receiver/k8sevents
- receiver/k8slog
- receiver/k8sobjects
- receiver/kafka
- receiver/kafkametrics
1 change: 1 addition & 0 deletions cmd/githubgen/allowlist.txt
@@ -14,3 +14,4 @@ cheempz
jerrytfleung
sh0rez
driverpt
h0cheung
1 change: 1 addition & 0 deletions receiver/k8slogreceiver/Makefile
@@ -0,0 +1 @@
include ../../Makefile.Common
144 changes: 144 additions & 0 deletions receiver/k8slogreceiver/README.md
@@ -0,0 +1,144 @@
# K8slog Receiver

<!-- status autogenerated section -->
| Status | |
| ------------- |-----------|
| Stability | [development]: logs |
| Distributions | [] |
| Issues | [![Open issues](https://img.shields.io/github/issues-search/open-telemetry/opentelemetry-collector-contrib?query=is%3Aissue%20is%3Aopen%20label%3Areceiver%2Fk8slog%20&label=open&color=orange&logo=opentelemetry)](https://github.com/open-telemetry/opentelemetry-collector-contrib/issues?q=is%3Aopen+is%3Aissue+label%3Areceiver%2Fk8slog) [![Closed issues](https://img.shields.io/github/issues-search/open-telemetry/opentelemetry-collector-contrib?query=is%3Aissue%20is%3Aclosed%20label%3Areceiver%2Fk8slog%20&label=closed&color=blue&logo=opentelemetry)](https://github.com/open-telemetry/opentelemetry-collector-contrib/issues?q=is%3Aclosed+is%3Aissue+label%3Areceiver%2Fk8slog) |
| [Code Owners](https://github.com/open-telemetry/opentelemetry-collector-contrib/blob/main/CONTRIBUTING.md#becoming-a-code-owner) | [@h0cheung](https://www.github.com/h0cheung), [@TylerHelmuth](https://www.github.com/TylerHelmuth) |

[development]: https://github.com/open-telemetry/opentelemetry-collector#development
<!-- end autogenerated section -->

Tails and parses logs in a Kubernetes environment.

Currently there is only one mode of discovery, specified by the `discovery.mode` configuration option:
atoulme marked this conversation as resolved.
- `daemonset-stdout`: (default) Deployed as a DaemonSet, the receiver reads logs from the stdout of pods on the same node.
@ChrsMark (Member), Jan 15, 2024:

Could we have a diagram for the 3 of them in the design itself?

I'm interested to know whether the daemonset-stdout mode will read the logs from the actual files or not. Could we clarify this in the design document? If it does not, I wonder whether this should be the default, because I'm not sure it could provide enough delivery guarantees.

Specifically, the receiver should be capable of recovering from where it left off when the Collector restarts. Reading from files gives us the option to keep track of the read offset, while I'm not sure that reading from stdout can provide such a recovery guarantee.

Contributor Author:

The document was updated.
It will read logs from file even in daemonset-stdout mode.


Two modes of discovery are planned to be supported in the future:

- `daemonset-file`: Deployed as a DaemonSet, the receiver will read logs from files inside pods on the same node.
- `sidecar`: Deployed as a sidecar container, the receiver will read logs from files.
Contributor:

not blocking, but it'd be nice to have issues to track those?


## Configuration

The following settings are common to all discovery modes:

| Field | Default | Description |
|------------------|--------------------|------------------------------------------------------------------------------------------------------------------|
| `discovery.mode` | `daemonset-stdout` | The mode of discovery. Only `daemonset-stdout` is supported now. `daemonset-file` and `sidecar` are coming soon. |
| `extract` | | The rules to extract metadata from pods and containers. TODO default values. |

TODO: add fields for reading files similar to filelogreceiver.

When `discovery.mode` is not `sidecar`, there are additional configuration options:
Member:

It would be nice to mention what functionalities users lose by not being able set these options.


| Field | Default | Description |
|-------------------------------|------------------|---------------------------------------------------------------------------------------------------------------------------------------------------------------|
| `discovery.k8s_api.auth_type` | `serviceAccount` | The authentication type for the Kubernetes API. Options are `serviceAccount` or `kubeConfig`. |
| `discovery.host_root` | `/host-root` | The directory on which the root of the host is mounted. |
| `discovery.runtime_apis` | | The runtime APIs used to get log file paths. docker and cri-containerd are currently supported. By default, the receiver tries to detect cri-containerd automatically. |
| `discovery.node_from_env` | `KUBE_NODE_NAME` | The name of the environment variable that holds the node name. |
| `discovery.filter` | [] | The filter used to filter pods and containers. By default, all pods and containers will be collected. |
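
As a rough sketch only, the options in the table above could be written out as follows. Every value shown is the documented default, so this fragment should behave the same as omitting the fields. The nesting is inferred from the dotted field names and is an assumption, since the receiver is still a skeleton. `discovery.runtime_apis` is left out because its value format is not described here.

```yaml
receivers:
  k8slog:
    discovery:
      mode: daemonset-stdout
      k8s_api:
        # Documented default; the other documented option is kubeConfig.
        auth_type: serviceAccount
      # Directory on which the host's root is mounted inside the collector container.
      host_root: /host-root
      # Name of the environment variable that carries the node name.
      node_from_env: KUBE_NODE_NAME
```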

### Operators

Each operator has a single responsibility, such as parsing a timestamp or JSON. Chain operators together to process logs into the desired format.

- Every operator has a `type`.
- Every operator can be given a unique `id`. If you use the same type of operator more than once in a pipeline, you must specify an `id`. Otherwise, the `id` defaults to the value of `type`.
- Operators will output to the next operator in the pipeline. The last operator in the pipeline will emit from the receiver. Optionally, the `output` parameter can be used to specify the `id` of another operator to which logs will be passed directly.
- Only parsers and general purpose operators should be used.
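
The rules above can be illustrated with a short chain. This is a hedged sketch rather than a recommended pipeline: it assumes `operators` sits directly under the receiver (as in the example at the end of this README), and the regular expression and the `parse_prefix` id are made up for illustration. `regex_parser` and `json_parser` are existing stanza operator types.

```yaml
receivers:
  k8slog:
    operators:
      # Explicit id; without it the id would default to the type, "regex_parser".
      - type: regex_parser
        id: parse_prefix
        regex: '^(?P<time>\S+) (?P<stream>stdout|stderr) (?P<log>.*)$'
      # No id given, so this operator's id is "json_parser".
      # It receives entries from parse_prefix because it is next in the list.
      - type: json_parser
        parse_from: attributes.log
```

Because neither operator sets `output`, entries flow in list order and the last operator emits from the receiver.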

### Filters

When `discovery.mode` is not `sidecar`, the `discovery.filter` field can be used to filter pods and containers. The filter is a list of rules. Each rule is a map with the following fields:

| Field | Description |
|---------------|--------------------------------------------------------------|
| `annotations` | MapFilters that filter pods by annotations. |
| `labels` | MapFilters that filter pods by labels. |
| `env` | MapFilters that filter containers by environment variables. |
| `containers` | ValueFilters that filter containers by name. |
| `namespaces` | ValueFilters that filter pods by namespace. |
| `pods` | ValueFilters that filter pods by name. |

#### MapFilter

A MapFilter can be used to filter pods by maps, such as annotations or labels. It has the following fields:

| Field | Description |
|---------|-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
| `op` | The operation to perform. Options are: <br> - "equals": (default) the value must be equal to the specified value. <br>- "not-equals": the value must not be equal to the specified value. <br> - "exists": the value must exist. <br> - "not-exists": the value must not exist. <br> - "matches": the value must match the specified regular expression. <br> - "not-matches": the value must not match the specified regular expression. |
| `key` | The key of the map. |
| `value` | The value to match. Only used for "equals", "not-equals", "matches", and "not-matches" operations. |
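
A minimal sketch of MapFilters inside a filter rule, assuming that `labels` and `annotations` each take a list of MapFilter entries (the tables do not state the exact shape). The label and annotation names are hypothetical.

```yaml
receivers:
  k8slog:
    discovery:
      mode: daemonset-stdout
      filter:
        - labels:
            # "op" is omitted, so it defaults to "equals".
            - key: app
              value: checkout
          annotations:
            # "exists" only checks for the key, so no "value" is given.
            - op: exists
              key: example.com/collect-logs
```

How multiple entries or multiple rules combine is not specified in this README, so this sketch makes no claim about that.
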
Contributor:

Do we need this right away?

Note other components also have built their own filters, such as k8sobjects.

@h0cheung (Contributor Author), Mar 21, 2024:

This filter is similar to the k8sattributes processor. Many users who want to use this component may be familiar with k8sattributes since it was the previous solution for correlating k8s metadata to logs. This may reduce their migration costs.


#### ValueFilter

A ValueFilter can be used to filter pods by string values, such as container names or namespaces. It has the following fields:

| Field | Description |
|---------|-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
| `op` | The operation to perform. Options are: <br> - "equals": (default) the value must be equal to the specified value. <br>- "not-equals": the value must not be equal to the specified value. <br> - "matches": the value must match the specified regular expression. <br> - "not-matches": the value must not match the specified regular expression. |
| `value` | The value to match. |
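
For comparison with MapFilter, a ValueFilter has no `key`. The fragment below is an illustrative sketch that assumes `namespaces` and `containers` each take a list of ValueFilter entries; the container name pattern is hypothetical.

```yaml
receivers:
  k8slog:
    discovery:
      mode: daemonset-stdout
      filter:
        - namespaces:
            # The namespace must not be equal to kube-system.
            - op: not-equals
              value: kube-system
          containers:
            # The container name must match this regular expression.
            - op: matches
              value: "^app-.*"
```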

### Extract

The `extract` field can be used to extract metadata from pods and containers. It has the following fields:

| Field | Description |
|---------------|--------------------------------------------------------------------------------------|
| `metadata` | A string slice of metadata to extract from the pods and containers. |
| `env` | A FieldExtractConfig that extracts fields from environment variables of containers. |
| `annotations` | A FieldExtractConfig that extracts fields from annotations of pods. |
| `labels` | A FieldExtractConfig that extracts fields from labels of pods. |

#### FieldExtractConfig

A FieldExtractConfig can be used to extract fields from maps, such as annotations or labels. It has the following fields:

| Field | Description |
|-------------|------------------------------------------------------------------------------------------------------|
| `tag_name` | Required. The name of the extracted attributes. |
| `key` | The key of the map (annotation, label, etc.). Exactly one of `key` or `key_regex` must be specified. |
| `key_regex` | The regular expression of the key. Exactly one of `key` or `key_regex` must be specified. |
| `regex` | Optional. The regular expression to extract a submatch from the value. |
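
The following sketch shows how the `extract` fields might fit together. It assumes that `labels` and `annotations` accept a list of FieldExtractConfig entries and that `k8s.pod.uid` is an accepted metadata name (it appears as an attribute in the example at the end of this README). The tag names, label key, and annotation pattern are hypothetical.

```yaml
receivers:
  k8slog:
    extract:
      metadata:
        - k8s.pod.uid
      labels:
        # Exactly one of key / key_regex per entry; here a literal key.
        - tag_name: app
          key: app.kubernetes.io/name
      annotations:
        # Here the key is selected by regular expression instead.
        - tag_name: team
          key_regex: "^example\\.com/team$"
```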

### Supported encodings

| Key | Description |
|------------|------------------------------------------------------------------|
| `nop` | No encoding validation. Treats the file as a stream of raw bytes |
| `utf-8` | UTF-8 encoding |
| `utf-16le` | UTF-16 encoding with little-endian byte order |
| `utf-16be` | UTF-16 encoding with big-endian byte order |
| `ascii` | ASCII encoding |
| `big5` | The Big5 Chinese character encoding |

Other less common encodings are supported on a best-effort basis. See [https://www.iana.org/assignments/character-sets/character-sets.xhtml](https://www.iana.org/assignments/character-sets/character-sets.xhtml) for other encodings available.

## Additional Terminology and Features

- An [entry](../../pkg/stanza/docs/types/entry.md) is the base representation of log data as it moves through a pipeline. All operators either create, modify, or consume entries.
- A [field](../../pkg/stanza/docs/types/field.md) is used to reference values in an entry.
- A common [expression](../../pkg/stanza/docs/types/expression.md) syntax is used in several operators. For example, expressions can be used to [filter](../../pkg/stanza/docs/operators/filter.md) or [route](../../pkg/stanza/docs/operators/router.md) entries.

### Parsers with Embedded Operations

Many parser operators can be configured to embed certain follow-up operations such as timestamp and severity parsing. For more information, see [complex parsers](../../pkg/stanza/docs/types/parsers.md#complex-parsers).
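
As an illustration, a `json_parser` with embedded timestamp and severity parsing might look like the sketch below. The field names `time` and `level` and the timestamp layout are assumptions about the log format, not something this receiver prescribes.

```yaml
receivers:
  k8slog:
    operators:
      - type: json_parser
        # Embedded timestamp parsing, applied after the JSON body is parsed.
        timestamp:
          parse_from: attributes.time
          layout: '%Y-%m-%d %H:%M:%S'
        # Embedded severity parsing from the parsed "level" field.
        severity:
          parse_from: attributes.level
```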

## Example - Collect logs from stdout of all containers

Receiver configuration:
```yaml
receivers:
  k8slog:
    discovery:
      mode: daemonset-stdout
    operators:
      - type: recombine
        combine_field: body
        is_first_entry: body matches "^\\d{4}-\\d{2}-\\d{2}"
        max_log_size: 128kb
        source_identifier: attributes["k8s.pod.uid"]
```