
New component: Kubernetes api logs receiver #24641

Closed
yotamloe opened this issue Jul 27, 2023 · 14 comments

@yotamloe
Contributor

yotamloe commented Jul 27, 2023

The purpose and use-cases of the new component

The problem:

Logging is a challenge in serverless Kubernetes frameworks (like AWS EKS Fargate, AKS virtual nodes, GCP autopilot, etc...) due to the lack of direct access to the underlying nodes. This limitation means that traditional log collection methods, which often rely on reading log files directly from the nodes, are ineffective.

Example use cases:

  1. Consider a developer working in an AKS virtual nodes or GCP autopilot environment. As of today, there is no simple solution for forwarding logs from the serverless environment to other logging backends.
  2. Consider a developer working in an EKS Fargate environment. At present, EKS Fargate supports a limited set of outputs for logs: Elasticsearch, Firehose, Kinesis Firehose, CloudWatch, and CloudWatch Logs. If the developer's backend system doesn't align with any of these outputs, they face challenges in log collection.

Describe the solution you'd like:

I propose a new receiver that gathers logs directly from the Kubernetes API, bypassing the need for node-level access. This receiver would help developers working in serverless Kubernetes environments who need access to workload logs for troubleshooting and monitoring.
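
To make the idea concrete, here is a minimal client-go sketch of the core mechanism, for illustration only: stream a container's logs through the API server's logs subresource, with no access to the node's filesystem. The streamPodLogs helper and the in-cluster setup are assumptions for the sketch, not a committed design.

// Sketch only: follow one container's logs via the Kubernetes API using client-go.
package kubeapilogs

import (
	"bufio"
	"context"
	"fmt"

	corev1 "k8s.io/api/core/v1"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/rest"
)

func streamPodLogs(ctx context.Context, namespace, pod, container string) error {
	// In-cluster config is enough on Fargate/virtual nodes, since only API access is needed.
	cfg, err := rest.InClusterConfig()
	if err != nil {
		return err
	}
	client, err := kubernetes.NewForConfig(cfg)
	if err != nil {
		return err
	}

	// The API server proxies this request to the kubelet that runs the pod.
	req := client.CoreV1().Pods(namespace).GetLogs(pod, &corev1.PodLogOptions{
		Container:  container,
		Follow:     true,
		Timestamps: true,
	})
	stream, err := req.Stream(ctx)
	if err != nil {
		return err
	}
	defer stream.Close()

	scanner := bufio.NewScanner(stream)
	for scanner.Scan() {
		// A real receiver would turn each line into a log record instead of printing it.
		fmt.Println(scanner.Text())
	}
	return scanner.Err()
}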

Describe alternatives you've considered:

Today, some vendors have specific solutions, like the EKS Fargate Fluent Bit log router, but they are limited and do not support a wide range of backends out of the box. AKS virtual nodes and GCP autopilot do not have a simple solution for log forwarding. I think the goal of the new component is to make serverless Kubernetes log collection vendor-agnostic.

Example configuration for the component

Draft configuration (any suggestions from the community are welcome):

receivers:
  kube_api_logs:
    collection_interval: 10s
    namespaces: 
      - default
      - dev
      - prod
    filters:
      - pod_labels:
          app: my-app
      - pod_names:
        - server-*
    operators:
      - type: json_parser
        timestamp:
          parse_from: attributes.time
          layout: '%Y-%m-%d %H:%M:%S'

The suggested draft supports filtering logs according to namespaces, pod labels and pod names, and also supports stanza operators for log parsing.

Telemetry data types supported

logs

Code Owner(s)

No response

Sponsor (optional)

No response

Additional context

I'm curious if any other community members face these challenges and what solutions they use to overcome them.

I would welcome any feedback on this suggestion.

I would be happy to contribute to this feature and open a PR for it.

@yotamloe yotamloe added the needs triage (New item requiring triage) and Sponsor Needed (New component seeking sponsor) labels Jul 27, 2023
@yotamloe yotamloe changed the title to New component: Kubernetes api logs receiver Jul 27, 2023
@atoulme atoulme removed the needs triage (New item requiring triage) label Jul 27, 2023
@jinja2
Contributor

jinja2 commented Jul 27, 2023

Hi @yotamloe, this sounds interesting but I am wondering if you have tried running the collector as a sidecar with your application container? Are there any specific challenges with the collector when run in sidecar mode other than the additional config a developer might need to add to the pod specs?

@jpkrohling
Member

Question: instead of specifying a reload interval, can you implement this using a watcher?

@yotamloe
Contributor Author

Are there any specific challenges with the collector when run in sidecar mode other than the additional config a developer might need to add to the pod specs?

@jinja2 Thank you for reading the proposal; this is great feedback 😁 I haven't found any receiver that is capable of collecting pod logs without reading the log files on the underlying node (the way filelogreceiver does), and that is the main challenge in serverless k8s environments. Do you know of a specific receiver that can achieve this without node-level access?
Besides that, I think there are some challenges with the sidecar approach, especially at scale:

  1. Complexity: As you've pointed out, additional configuration is required in the pod specs. This increases the complexity of the deployment configuration, and it also adds to the complexity of managing and updating these configurations, especially if you have a large number of pods.
  2. Increased Resource Consumption: Each collector instance will require its own set of resources. Running a collector in each pod will consume additional CPU, memory, and storage resources.
  3. Isolation: While deploying the collector as a sidecar provides better isolation compared to running it as a standalone service, it also means that if there's an issue with the collector, it could potentially impact the main application running in the same pod.

@yotamloe
Contributor Author

yotamloe commented Jul 29, 2023

Question: instead of specifying a reload interval, can you implement this using a watcher?

@jpkrohling Yes, I think it's possible.
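
For illustration, a rough client-go sketch of the watcher approach; the label selector mirrors the pod_labels filter from the draft config, and the follow callback stands in for the per-container follow loop (such as the earlier streamPodLogs sketch). Both are assumptions, not a committed design.

// Sketch only: discover pods with a watch instead of polling on an interval.
package kubeapilogs

import (
	"context"

	corev1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/apimachinery/pkg/watch"
	"k8s.io/client-go/kubernetes"
)

// follow is whatever streams one container's logs (e.g. the streamPodLogs sketch above).
func watchAndFollow(ctx context.Context, client kubernetes.Interface, namespace string,
	follow func(ctx context.Context, namespace, pod, container string) error) error {
	w, err := client.CoreV1().Pods(namespace).Watch(ctx, metav1.ListOptions{
		LabelSelector: "app=my-app", // assumed: maps to the pod_labels filter in the draft config
	})
	if err != nil {
		return err
	}
	defer w.Stop()

	for event := range w.ResultChan() {
		pod, ok := event.Object.(*corev1.Pod)
		if !ok {
			continue
		}
		switch event.Type {
		case watch.Added:
			// Start following logs for each container of the new pod.
			for _, c := range pod.Spec.Containers {
				go follow(ctx, pod.Namespace, pod.Name, c.Name)
			}
		case watch.Deleted:
			// Stop the corresponding streams (cancellation bookkeeping omitted).
		}
	}
	return nil
}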

@JaredTan95
Member

JaredTan95 commented Jul 30, 2023

Proposing a new receiver to gather logs directly from the Kubernetes API, bypassing the need for node-level access.

@yotamloe The kube-apiserver can fall over and become unresponsive when the cluster is too large and too many requests are sent to it.

I think we could add an option (in daemonset mode) to send requests to the kubelet's /pods endpoint instead of the kube-apiserver to retrieve the logs, if possible.

Since the kubelet runs locally on each node, requests would be answered faster, and each node would only handle the requests for its own pods. This would free up the kube-apiserver to handle other requests, so the kube-apiserver bottleneck should be avoided when the cluster is large.
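
To illustrate, a very rough sketch of what that daemonset-mode path might look like, assuming the kubelet's authenticated port (10250) and its /containerLogs/{namespace}/{pod}/{container} endpoint; the exact endpoint, query parameters, and authorization handling would need to be verified against the kubelet API:

// Sketch only: read container logs from the local kubelet instead of the kube-apiserver.
package kubeapilogs

import (
	"crypto/tls"
	"fmt"
	"io"
	"net/http"
	"os"
)

func fetchFromKubelet(nodeIP, namespace, pod, container string) error {
	// Service account token used as a bearer token against the kubelet (authorization details omitted).
	token, err := os.ReadFile("/var/run/secrets/kubernetes.io/serviceaccount/token")
	if err != nil {
		return err
	}
	// InsecureSkipVerify keeps the sketch short; a real receiver should verify the kubelet certificate.
	httpClient := &http.Client{Transport: &http.Transport{
		TLSClientConfig: &tls.Config{InsecureSkipVerify: true},
	}}

	url := fmt.Sprintf("https://%s:10250/containerLogs/%s/%s/%s?follow=true",
		nodeIP, namespace, pod, container)
	req, err := http.NewRequest(http.MethodGet, url, nil)
	if err != nil {
		return err
	}
	req.Header.Set("Authorization", "Bearer "+string(token))

	resp, err := httpClient.Do(req)
	if err != nil {
		return err
	}
	defer resp.Body.Close()

	// Stream the body; a real receiver would parse the lines into log records.
	_, err = io.Copy(os.Stdout, resp.Body)
	return err
}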

@yotamloe
Contributor Author

yotamloe commented Jul 31, 2023

Thanks, @JaredTan95, this is great feedback. Your proposal sounds interesting and could indeed reduce the pressure on the kube-apiserver, especially in larger clusters. My main concern, since we are dealing with serverless k8s environments, is that some vendors do not support daemonsets. For example, this is stated in the EKS Fargate docs:

  • Daemonsets aren't supported on Fargate. If your application requires a daemon, reconfigure that daemon to run as a sidecar container in your Pods.

And since the kubelet runs locally on the Fargate nodes, it could be hard to ensure we have communication with all of the kubelets without using a daemonset (or a sidecar for all containers). I would be happy to hear your thoughts.

@jinja2
Contributor

jinja2 commented Aug 2, 2023

Do you know a specific receiver that can achieve this without node-level access?

@yotamloe You can use the filelog receiver, but the app needs to write to a file. The general practice is to set up an emptyDir volume mounted by both the app and the otel container. This might require application changes to get it to log to a file, handle rotation, etc. To reduce the config overhead, I recommend looking into auto-injecting the otel-collector sidecar with the OpenTelemetry Operator, if the distro allows installing mutating webhooks. Here's a simple example you can build on.
Install the operator, and add an OpenTelemetryCollector custom resource like the one below.

apiVersion: opentelemetry.io/v1alpha1
kind: OpenTelemetryCollector
metadata:
  name: logging-sidecar
spec:
  mode: sidecar
  volumeMounts:
  - name: varlog
    mountPath: /var/log
  resources:
    limits:
      cpu: 100m
      memory: 50Mi
  config: |
    extensions:
      file_storage:
        directory: /var/log

    receivers:
      filelog:
        include: ["/var/log/*.log"]
        include_file_path: true
        start_at: beginning
        storage: file_storage
        operators:
        - type: add
          field: resource["k8s.pod.name"]
          value: ${OTEL_RESOURCE_ATTRIBUTES_POD_NAME}

    exporters:
      logging:
        verbosity: detailed

    service:
      extensions: [file_storage]
      pipelines:
        logs:
          receivers: [filelog]
          processors: []
          exporters: [logging]

Example application pod spec, which uses an annotation to indicate which collector config to inject.

apiVersion: v1
kind: Pod
metadata:
  name: counter
  annotations:
    sidecar.opentelemetry.io/inject: "logging-sidecar"
spec:
  containers:
  - name: count
    image: busybox:1.28
    args:
    - /bin/sh
    - -c
    - >
      i=0;
      while true;
      do
        echo "$(date) INFO $i" 2>&1 | tee /var/log/1.log
        i=$((i+1));
        sleep 1;
      done
    volumeMounts:
    - name: varlog
      mountPath: /var/log
  volumes:
  - name: varlog
    emptyDir:
      sizeLimit: 50Mi

Re: the proposed receiver, imho it might not be a sustainable solution for clusters running at any real production scale, but it looks like a good addition for smaller development clusters. A few things to consider for the receiver:

  1. I think exposing the sinceSeconds and tailLines options from the logs API as receiver config will be useful, so that on startup the receiver can be configured not to fetch all logs from the start while still giving the user the option to pull some historical logs. Another hacky option could be leveraging the sinceTime parameter supported by this API: have the receiver persist the time it shut down and, on restart, request logs from that timestamp, giving some sort of checkpointing (see the sketch after this list).
  2. We might want to allow filtering by container/initContainer names. There may be sidecar containers (a pattern used a lot in serverless clusters) that users want to exclude, and filtering these out will help reduce API requests, since we follow logs with one request per container.
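
A small sketch of how point 1 could map onto the logs API's PodLogOptions; the field names come from the Kubernetes API, but the config wiring and checkpoint persistence are assumptions:

// Sketch only: translate receiver config / a persisted checkpoint into PodLogOptions.
package kubeapilogs

import (
	corev1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

func buildLogOptions(container string, sinceSeconds, tailLines int64, checkpoint *metav1.Time) *corev1.PodLogOptions {
	opts := &corev1.PodLogOptions{
		Container:  container,
		Follow:     true,
		Timestamps: true,
	}
	if checkpoint != nil {
		// Resume from the time persisted at shutdown: a rough form of checkpointing.
		opts.SinceTime = checkpoint
	} else {
		// First start: bound how much history is pulled.
		opts.SinceSeconds = &sinceSeconds
		opts.TailLines = &tailLines
	}
	return opts
}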

@jpkrohling
Member

I believe there's still a need to at least explore having a component that grabs logs from the API server, as it would remove the requirement to run the collector as a daemonset. Requiring daemonsets is a no-go in some setups and is especially problematic for multi-tenant settings.

@TylerHelmuth
Member

I would be curious if the k8sobjectsreceiver could be made to do it.

@jinja2
Contributor

jinja2 commented Aug 8, 2023

I would be curious if the k8sobjectsreceiver could be made to do it.

@TylerHelmuth Are you asking if the object-watching feature of the k8sobjects receiver can be leveraged here? This receiver would list + watch k8s pods but would use a different API to get/follow the logs. The kubelet will be reading the logs from files local to the nodes, and etcd won't be involved beyond tracking the pods for which to collect logs.

@yotamloe
Contributor Author

Thanks, everybody, for all of your feedback; it's super helpful! I understand that there is interest in the community in exploring a component that collects logs from the Kubernetes API server.
To summarise the points raised in the conversation so far:

  1. Collect logs from the kubelet in daemonset mode
  2. Implement log collection using a watcher
  3. Expose sinceSeconds and tailLines (not sure if this will be possible if we use watchers)
  4. Allow filtering by container/initContainer names

Draft configuration (any suggestions from the community are welcome):

receivers:
  kube_api_logs:
    namespaces: 
      - default
      - dev
      - prod
    filters:
      - container_names:
        - server-*
      - pod_labels:
          app: my-app
      - pod_names:
        - server-*
    operators:
      - type: json_parser
        timestamp:
          parse_from: attributes.time
          layout: '%Y-%m-%d %H:%M:%S'
    daemonset_mode: true

I will start working on a PR for it (if there are no objections); any help from the community would be amazing!

@dmitryax
Member

dmitryax commented Aug 28, 2023

There is another initiative to introduce a receiver with similar functionality: #24439. I believe we should consolidate the efforts and have one receiver for both use cases.

@github-actions
Contributor

This issue has been inactive for 60 days. It will be closed in 60 days if there is no activity. To ping code owners by adding a component label, see Adding Labels via Comments, or if you are unsure of which component this issue relates to, please ping @open-telemetry/collector-contrib-triagers. If this issue is still relevant, please ping the code owners or leave a comment explaining why it is still relevant. Otherwise, please close it.

@github-actions
Contributor

This issue has been closed as inactive because it has been stale for 120 days with no activity.

@github-actions github-actions bot closed this as not planned Dec 29, 2023