Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Initial commit for setting up a new component: k8slog receiver #24439

Open
wants to merge 6 commits into
base: main
Choose a base branch
from

Conversation

h0cheung
Copy link
Contributor

Description:

k8slog receiver provides full functionality to collect logs in k8s containers. It is an all-in-one solution, which supports to:

  • collect logs from files inside k8s containers via daemonset
  • support docker and cri-containerd
  • support many docker graph drivers / snapshotters
  • collect logs from stdout of k8s containers via daemonset
  • filter containers by metadata
  • extract metadata of containers
  • collect file logs in k8s containers via sidecar

If we want to collect logs from k8s, we can use this component.

Link to tracking Issue:

#23339

Testing:

Tests for unmarshaling configuration have been added.

Documentation:

Docs for configurations and overall design has been added.

@github-actions
Copy link
Contributor

github-actions bot commented Aug 8, 2023

This PR was marked stale due to lack of activity. It will be closed in 14 days.

@github-actions github-actions bot added the Stale label Aug 8, 2023
@h0cheung
Copy link
Contributor Author

h0cheung commented Aug 8, 2023

This PR was marked stale due to lack of activity. It will be closed in 14 days.

remove stale

)

// Config is the configuration of a k8s input operator
type Config struct {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are all of the configuration options required for the initial implementation? Can you remove them for now and add them along with the functionality using them going forward?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree, starting small will make things easier to review and get started and we can always add them in the future.

Copy link
Member

@TylerHelmuth TylerHelmuth left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@h0cheung I am very excited about this component, sorry I didn't see your proposal sooner. @dmitryax thanks for pointing it out.


// Expr represents the rules to filter containers based on a custom expression.
// TODO: define the values passed to expr.
Expr string `mapstructure:"expr"`
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Lets not introduce expr filtering to start. The other filter options you've provided are quite extensive. For expression language filtering I'd prefer we use OTTL.

receiver/k8slogreceiver/config.go Outdated Show resolved Hide resolved
stability:
development: [logs]
codeowners:
active: [h0cheung]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can add me here as well

Copy link
Contributor Author

@h0cheung h0cheung Aug 20, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, I will do it soon.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey @TylerHelmuth @h0cheung , I will be more than happy to help reviewing/testing and maintaining this component. In this, feel free to add me into the list as well if that's ok already.

)

// Config is the configuration of a k8s input operator
type Config struct {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree, starting small will make things easier to review and get started and we can always add them in the future.

receiver/k8slogreceiver/config.go Show resolved Hide resolved
@h0cheung
Copy link
Contributor Author

h0cheung commented Sep 4, 2023

@TylerHelmuth @dmitryax I've simplified the configs and added Tyler Helmuth to the code owner. Could you please have a look at this?

Comment on lines 44 to 48
// Env represents the rules to extract from container environment variables.
Env []FieldExtractConfig `mapstructure:"env"`

// OtelEnv represents the rules to extract from environment variables of otel itself.
OtelEnv []FieldExtractConfig `mapstructure:"otel_env"`
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think we should allow extracting from env variables, that feels like a security risk.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OtelEnv has been removed.
However, Env is needed in some cases. For example, if there are some environment variables in the docker image, there is no other way to get it.
Is this really risky? resourcedetection supports it, and this component gets k8s node name from environment variables, too.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Using an env var to filter feels ok. But this section is for taking fields from the pod and adding them as resource attributes to the log right?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Getting attributes from environment variables is useful, and many components support it such as resoucedetection

Are there any reports of security risks associated with reading environment variables as strings and adding them to data?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The risk is exposing apikey/password information, but that risk is lowered if the config only allows static strings. If we do allow this I think we can do all env vars with 1 config option, no need to split out OTel env vars.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

By default, the env config is empty, and no variable will be exposed. When a user set this, he should know what he is doing.

It seems that otel_env is necessary, so I've removed it in the latest commit.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@dmitryax do you have an opinion?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Any updates for this?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Lets move forward with this allowed as we've discussed.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I also find it useful to have the option to collect ENV_VAR information from the Pods.
That would give us important information about OTEL specific info like OTEL_SERVICE_NAME.
Having service.name populated in log records collected by the k8slog receiver is quite important since it will allow us to correlate logs with traces using the service.name value.

receiver/k8slogreceiver/config.go Outdated Show resolved Hide resolved
@h0cheung h0cheung force-pushed the feat/k8slog-setup branch 3 times, most recently from 132b260 to 8256f32 Compare September 8, 2023 12:35
@h0cheung h0cheung force-pushed the feat/k8slog-setup branch 6 times, most recently from f7a8980 to a912135 Compare March 5, 2024 11:23
Copy link
Contributor

This PR was marked stale due to lack of activity. It will be closed in 14 days.

@github-actions github-actions bot added the Stale label Mar 20, 2024
@atoulme
Copy link
Contributor

atoulme commented Mar 20, 2024

@TylerHelmuth fyi is it a good time to review? I'll try my hand at it if possible.

Two modes of discovery are planned to be supported in the future:

- `daemonset-file`: Deployed as a DaemonSet, the receiver will read logs from files inside pods in the same node.
- `sidecar`: Deployed as a sidecar container, the receiver will read logs from files.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

not blocking, but it'd be nice to have issues to track those?

|---------|-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
| `op` | The operation to perform. Options are: <br> - "equals": (default) the value must be equal to the specified value. <br>- "not-equals": the value must not be equal to the specified value. <br> - "exists": the value must exist. <br> - "not-exists": the value must not exist. <br> - "matches": the value must match the specified regular expression. <br> - "not-matches": the value must not match the specified regular expression. |
| `key` | The key of the map. |
| `value` | The value to match. Only used for "equals", "not-equals", "matches", and "not-matches" operations. |
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need this right away?

Note other components also have built their own filters, such as k8sobjects.

Copy link
Contributor Author

@h0cheung h0cheung Mar 21, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This filter is similar to the k8sattributes processor. Many users who want to use this component may be familiar with k8sattributes since it was the previous solution for correlating k8s metadata to logs. This may reduce their migration costs.

Copy link
Contributor

@atoulme atoulme left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As a first PR it looks like this can land. You could cut down a bit on the filters to bring them back later with more testing, but it's ok to mature them in as well.

@h0cheung h0cheung force-pushed the feat/k8slog-setup branch 3 times, most recently from 954c61e to 60fc7e4 Compare March 20, 2024 15:00
| `extract` | | The rules to extract metadata from pods and containers. TODO default values. |
| TODO: add fields for reading files similar to filelogreceiver |

When `discovery.mode` is not `sidecar`, there are additional configuration options:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would be nice to mention what functionalities users lose by not being able set these options.

receiver/k8slogreceiver/design.md Outdated Show resolved Hide resolved

The first one is caculated by the graph driver, and the second one is from the mount points.

Also, the host root is mounted under /host-root on the otel container.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will the otel container need any special permissions in order to be able to share mounts with the incoming target Pods on the fly?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No. It connect to the runtime API.

// NodeFromEnv represents the environment variable which contains the node name.
NodeFromEnv string `mapstructure:"node_from_env"`

// HostRoot represents the path which is used to mount the host's root filesystem.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure why we need the whole host's root directory to be mounted. Can't we just focus on /var/log/pods by default for the stdout mode? I guess many users would have concerns if their whole host directory is mounted by default.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes. But the files under /var/log/pods may be symbol links to paths defined by runtimes, which can also be configured by ops. If they don't mount the whole host's root directory, they must have knowledge about will the files actually live. This may be difficult for users, especially with k8s clusters that mix nodes with different runtimes.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wouldn't that be better if we chose a reasonable list of directories like /var/log/pods/*, /var/lib/docker/* etc in order to specifically define the possible directories the known runtimes use?
We can even make this list configurable so as advanced users could set their own extra directories if the want.

Mounting the whole host's root directory looks problematic to me and maybe shouldn't be the default.
In GKE Autopilot for example this would not be supported: grafana/loki#9100.

So my suggestion would be to start with reasonable defaults, while also giving the option for more advanced users to extend these defaults.

```

The `Source` will get pod list and metadata from k8s api (maybe there will also be an implementation using kubelet).
Then, it will get the logPath and env from docker/cri-containerd api. And they will be associated with the pod metadata
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can't we just by-pass the extra API call to the runtime and construct the log path based on a default pattern which is more or less known? i.e. /var/log/pods/'<namespace>_<pod_name>_<pod_uid>'/<container_name>/*.log, where namespace, pod_name, pod_uid and container_name are already known from the k8s API.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This also works. Datadog is using this way. But:

  1. The path is different across k8s versions. Maybe it will also change in future. see https://github.com/DataDog/datadog-agent/blob/0d3dcb9fdb96e8a9d1810bb041d2dae6250372f8/pkg/logs/launchers/container/tailerfactory/file.go#L239
  2. Some runtime API will directly return the actual path, which reduces resolving symbol links.
  3. If we bypass extra API, it will be complex to get environment variables defined by downward api.

Co-authored-by: Chris Mark <chrismarkou92@gmail.com>
Copy link
Contributor

This PR was marked stale due to lack of activity. It will be closed in 14 days.

@github-actions github-actions bot added the Stale label Apr 17, 2024
@h0cheung
Copy link
Contributor Author

This PR was marked stale due to lack of activity. It will be closed in 14 days.

remove stale

@github-actions github-actions bot removed the Stale label Apr 24, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Accepted Component New component has been sponsored cmd/githubgen
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

9 participants