
New component: Kubernetes api logs receiver #24641

Closed
yotamloe opened this issue Jul 27, 2023 · 14 comments

@yotamloe
Contributor

yotamloe commented Jul 27, 2023

The purpose and use-cases of the new component

The problem:

Logging is a challenge in serverless Kubernetes frameworks (like AWS EKS Fargate, AKS virtual nodes, GCP autopilot, etc...) due to the lack of direct access to the underlying nodes. This limitation means that traditional log collection methods, which often rely on reading log files directly from the nodes, are ineffective.

Example use cases:

  1. Consider a developer working in an AKS virtual nodes or GCP autopilot environment. As of today, there is no simple solution for forwarding logs from the serverless environment to other logging backends.
  2. Consider a developer working in an EKS Fargate environment. At present, EKS Fargate supports a limited set of outputs for logs: Elasticsearch, Firehose, Kinesis Firehose, CloudWatch, and CloudWatch Logs. If the developer's backend system doesn't align with any of these outputs, they face challenges in log collection.

Describe the solution you'd like:

I propose a new receiver that gathers logs directly from the Kubernetes API, bypassing the need for node-level access. This receiver would help developers working in serverless Kubernetes environments who need access to workload logs for troubleshooting and monitoring.
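
To make the idea concrete, here is a minimal client-go sketch of the core mechanism, for illustration only: stream a container's logs through the API server's logs subresource, with no access to the node's filesystem. The streamPodLogs helper and the in-cluster setup are assumptions for the sketch, not a committed design.

// Sketch only: follow one container's logs via the Kubernetes API using client-go.
package kubeapilogs

import (
	"bufio"
	"context"
	"fmt"

	corev1 "k8s.io/api/core/v1"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/rest"
)

func streamPodLogs(ctx context.Context, namespace, pod, container string) error {
	// In-cluster config is enough on Fargate/virtual nodes, since only API access is needed.
	cfg, err := rest.InClusterConfig()
	if err != nil {
		return err
	}
	client, err := kubernetes.NewForConfig(cfg)
	if err != nil {
		return err
	}

	// The API server proxies this request to the kubelet that runs the pod.
	req := client.CoreV1().Pods(namespace).GetLogs(pod, &corev1.PodLogOptions{
		Container:  container,
		Follow:     true,
		Timestamps: true,
	})
	stream, err := req.Stream(ctx)
	if err != nil {
		return err
	}
	defer stream.Close()

	scanner := bufio.NewScanner(stream)
	for scanner.Scan() {
		// A real receiver would turn each line into a log record instead of printing it.
		fmt.Println(scanner.Text())
	}
	return scanner.Err()
}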

Describe alternatives you've considered:

Today, some vendors have specific solutions, like the EKS Fargate Fluent Bit log router, but they are limited and do not support a wide range of backends out of the box. AKS virtual nodes and GCP autopilot do not have a simple solution for log forwarding. I think the goal of the new component is to make serverless Kubernetes log collection vendor-agnostic.

Example configuration for the component

Draft configuration (any suggestions from the community are welcome):

receivers:
  kube_api_logs:
    collection_interval: 10s
    namespaces: 
      - default
      - dev
      - prod
    filters:
      - pod_labels:
          app: my-app
      - pod_names:
        - server-*
    operators:
      - type: json_parser
        timestamp:
          parse_from: attributes.time
          layout: '%Y-%m-%d %H:%M:%S'

The suggested draft supports filtering logs according to namespaces, pod labels and pod names, and also supports stanza operators for log parsing.

Telemetry data types supported

logs

Code Owner(s)

No response

Sponsor (optional)

No response

Additional context

I'm curious if any other community members face these challenges and what solutions they use to overcome them.

I would welcome any feedback on this suggestion.

I would be happy to contribute to this feature and open a PR for it.

@yotamloe yotamloe added the needs triage (New item requiring triage) and Sponsor Needed (New component seeking sponsor) labels Jul 27, 2023
@yotamloe yotamloe changed the title to New component: Kubernetes api logs receiver Jul 27, 2023
@atoulme atoulme removed the needs triage (New item requiring triage) label Jul 27, 2023
@jinja2
Contributor

jinja2 commented Jul 27, 2023

Hi @yotamloe, this sounds interesting but I am wondering if you have tried running the collector as a sidecar with your application container? Are there any specific challenges with the collector when run in sidecar mode other than the additional config a developer might need to add to the pod specs?

@jpkrohling
Member

Question: instead of specifying a reload interval, can you implement this using a watcher?

@yotamloe
Contributor Author

Are there any specific challenges with the collector when run in sidecar mode other than the additional config a developer might need to add to the pod specs?

@jinja2 Thank you for reading the proposal; this is great feedback 😁 I haven't found any receiver that is capable of collecting pod logs without reading the log files on the underlying node (the way filelogreceiver does), and that is the main challenge in serverless k8s environments. Do you know of a specific receiver that can achieve this without node-level access?
Besides that, I think there are some challenges with the sidecar approach, especially at scale:

  1. Complexity: As you've pointed out, additional configuration is required in the pod specs. This increases the complexity of the deployment configuration, and it also adds to the complexity of managing and updating these configurations, especially if you have a large number of pods.
  2. Increased Resource Consumption: Each collector instance will require its own set of resources. Running a collector in each pod will consume additional CPU, memory, and storage resources.
  3. Isolation: While deploying the collector as a sidecar provides better isolation compared to running it as a standalone service, it also means that if there's an issue with the collector, it could potentially impact the main application running in the same pod.

@yotamloe
Contributor Author

yotamloe commented Jul 29, 2023

Question: instead of specifying a reload interval, can you implement this using a watcher?

@jpkrohling Yes, I think it's possible.
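
For illustration, a rough client-go sketch of the watcher approach; the label selector mirrors the pod_labels filter from the draft config, and the follow callback stands in for the per-container follow loop (such as the earlier streamPodLogs sketch). Both are assumptions, not a committed design.

// Sketch only: discover pods with a watch instead of polling on an interval.
package kubeapilogs

import (
	"context"

	corev1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/apimachinery/pkg/watch"
	"k8s.io/client-go/kubernetes"
)

// follow is whatever streams one container's logs (e.g. the streamPodLogs sketch above).
func watchAndFollow(ctx context.Context, client kubernetes.Interface, namespace string,
	follow func(ctx context.Context, namespace, pod, container string) error) error {
	w, err := client.CoreV1().Pods(namespace).Watch(ctx, metav1.ListOptions{
		LabelSelector: "app=my-app", // assumed: maps to the pod_labels filter in the draft config
	})
	if err != nil {
		return err
	}
	defer w.Stop()

	for event := range w.ResultChan() {
		pod, ok := event.Object.(*corev1.Pod)
		if !ok {
			continue
		}
		switch event.Type {
		case watch.Added:
			// Start following logs for each container of the new pod.
			for _, c := range pod.Spec.Containers {
				go follow(ctx, pod.Namespace, pod.Name, c.Name)
			}
		case watch.Deleted:
			// Stop the corresponding streams (cancellation bookkeeping omitted).
		}
	}
	return nil
}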

@JaredTan95
Member

JaredTan95 commented Jul 30, 2023

Proposing a new receiver to gather logs directly from the Kubernetes API, bypassing the need for node-level access.

@yotamloe The kube-apiserver can fall over and become unresponsive when the cluster is too large and too many requests are sent to it.

I think we could add an option (in daemonset mode) to send requests to the kubelet's /pods endpoint instead of the kube-apiserver to retrieve the logs, if possible.

Since the kubelet runs locally on each node, requests would be answered faster, and each node would only handle the requests for its own pods. This would free up the kube-apiserver to handle other requests, so the kube-apiserver bottleneck should be avoided when the cluster is large.
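
To illustrate, a very rough sketch of what that daemonset-mode path might look like, assuming the kubelet's authenticated port (10250) and its /containerLogs/{namespace}/{pod}/{container} endpoint; the exact endpoint, query parameters, and authorization handling would need to be verified against the kubelet API:

// Sketch only: read container logs from the local kubelet instead of the kube-apiserver.
package kubeapilogs

import (
	"crypto/tls"
	"fmt"
	"io"
	"net/http"
	"os"
)

func fetchFromKubelet(nodeIP, namespace, pod, container string) error {
	// Service account token used as a bearer token against the kubelet (authorization details omitted).
	token, err := os.ReadFile("/var/run/secrets/kubernetes.io/serviceaccount/token")
	if err != nil {
		return err
	}
	// InsecureSkipVerify keeps the sketch short; a real receiver should verify the kubelet certificate.
	httpClient := &http.Client{Transport: &http.Transport{
		TLSClientConfig: &tls.Config{InsecureSkipVerify: true},
	}}

	url := fmt.Sprintf("https://%s:10250/containerLogs/%s/%s/%s?follow=true",
		nodeIP, namespace, pod, container)
	req, err := http.NewRequest(http.MethodGet, url, nil)
	if err != nil {
		return err
	}
	req.Header.Set("Authorization", "Bearer "+string(token))

	resp, err := httpClient.Do(req)
	if err != nil {
		return err
	}
	defer resp.Body.Close()

	// Stream the body; a real receiver would parse the lines into log records.
	_, err = io.Copy(os.Stdout, resp.Body)
	return err
}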

@yotamloe
Contributor Author

yotamloe commented Jul 31, 2023

Thanks, @JaredTan95, this is great feedback. Your proposal sounds interesting and could indeed reduce the pressure on the kube-apiserver, especially in larger clusters. My main concern, since we are dealing with serverless k8s environments, is that some vendors do not support daemonsets. For example, this is stated in the EKS Fargate docs:

  • Daemonsets aren't supported on Fargate. If your application requires a daemon, reconfigure that daemon to run as a sidecar container in your Pods.

And since the kubelet runs locally on the Fargate nodes, it could be hard to ensure we have communication with all of the kubelets without using a daemonset (or a sidecar for all containers). I would be happy to hear your thoughts.

@jinja2
Contributor

jinja2 commented Aug 2, 2023

Do you know a specific receiver that can achieve this without node-level access?

@yotamloe You can use the filelog receiver, but the app needs to write to a file. The general practice is to set up an emptyDir volume mounted by both the app and the otel container. This might require application changes to get it to log to a file, handle rotation, etc. To reduce the config overhead, I recommend looking into auto-injecting the otel-collector sidecar with the OpenTelemetry Operator, if the distro allows installing mutating webhooks. Here's a simple example you can build on.
Install the operator, and add an OpenTelemetryCollector custom resource like the one below.

apiVersion: opentelemetry.io/v1alpha1
kind: OpenTelemetryCollector
metadata:
  name: logging-sidecar
spec:
  mode: sidecar
  volumeMounts:
  - name: varlog
    mountPath: /var/log
  resources:
    limits:
      cpu: 100m
      memory: 50Mi
  config: |
    extensions:
      file_storage:
        directory: /var/log

    receivers:
      filelog:
        include: ["/var/log/*.log"]
        include_file_path: true
        start_at: beginning
        storage: file_storage
        operators:
        - type: add
          field: resource["k8s.pod.name"]
          value: ${OTEL_RESOURCE_ATTRIBUTES_POD_NAME}

    exporters:
      logging:
        verbosity: detailed

    service:
      extensions: [file_storage]
      pipelines:
        logs:
          receivers: [filelog]
          processors: []
          exporters: [logging]

Example application pod spec, which uses an annotation to indicate which collector config to inject.

apiVersion: v1
kind: Pod
metadata:
  name: counter
  annotations:
    sidecar.opentelemetry.io/inject: "logging-sidecar"
spec:
  containers:
  - name: count
    image: busybox:1.28
    args:
    - /bin/sh
    - -c
    - >
      i=0;
      while true;
      do
        echo "$(date) INFO $i" 2>&1 | tee /var/log/1.log
        i=$((i+1));
        sleep 1;
      done
    volumeMounts:
    - name: varlog
      mountPath: /var/log
  volumes:
  - name: varlog
    emptyDir:
      sizeLimit: 50Mi

Re: the proposed receiver, imho it might not be a sustainable solution for clusters running at any real production scale, but it looks like a good addition for smaller development clusters. A few things to consider for the receiver:

  1. I think exposing the sinceSeconds and tailLines options from the logs API as receiver config will be useful, so that on startup the receiver can be configured not to fetch all logs from the start while still giving the user the option to pull some historical logs. Another hacky option could be leveraging the sinceTime parameter supported by this API: have the receiver persist the time it shut down and, on restart, request logs from that timestamp, giving some sort of checkpointing (see the sketch after this list).
  2. We might want to allow filtering by container/initContainer names. There may be sidecar containers (a pattern used a lot in serverless clusters) that users want to exclude, and filtering these out will help reduce API requests, since we follow logs with one request per container.
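
A small sketch of how point 1 could map onto the logs API's PodLogOptions; the field names come from the Kubernetes API, but the config wiring and checkpoint persistence are assumptions:

// Sketch only: translate receiver config / a persisted checkpoint into PodLogOptions.
package kubeapilogs

import (
	corev1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

func buildLogOptions(container string, sinceSeconds, tailLines int64, checkpoint *metav1.Time) *corev1.PodLogOptions {
	opts := &corev1.PodLogOptions{
		Container:  container,
		Follow:     true,
		Timestamps: true,
	}
	if checkpoint != nil {
		// Resume from the time persisted at shutdown: a rough form of checkpointing.
		opts.SinceTime = checkpoint
	} else {
		// First start: bound how much history is pulled.
		opts.SinceSeconds = &sinceSeconds
		opts.TailLines = &tailLines
	}
	return opts
}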

@jpkrohling
Member

I believe there's still a need to at least explore having a component that grabs logs from the API server, as it would remove the requirement to run the collector as a daemonset. Requiring daemonsets is a no-go in some setups and is especially problematic for multi-tenant settings.

@TylerHelmuth
Member

I would be curious if the k8sobjectsreceiver could be made to do it.

@jinja2
Contributor

jinja2 commented Aug 8, 2023

I would be curious if the k8sobjectsreceiver could be made to do it.

@TylerHelmuth Are you asking if the object-watching feature of the k8sobjects receiver can be leveraged here? This receiver would list + watch k8s pods but would use a different API to get/follow the logs. The kubelet will be reading the logs from files local to the nodes, and etcd won't be involved beyond tracking the pods for which to collect logs.

@yotamloe
Contributor Author

Thanks, everybody, for all of your feedback; it's super helpful! I understand that there is interest in the community in exploring a component that collects logs from the Kubernetes API server.
To summarise the points raised in the conversation so far:

  1. Collect logs from the kubelet in daemonset mode
  2. Implement log collection using a watcher
  3. Expose sinceSeconds and tailLines (not sure if this will be possible if we use watchers)
  4. Allow filtering by container/initContainer names

Draft configuration (any suggestions from the community are welcome):

receivers:
  kube_api_logs:
    namespaces: 
      - default
      - dev
      - prod
    filters:
      - container_names:
        - server-*
      - pod_labels:
          app: my-app
      - pod_names:
        - server-*
    operators:
      - type: json_parser
        timestamp:
          parse_from: attributes.time
          layout: '%Y-%m-%d %H:%M:%S'
    daemonset_mode: true

I will start working on a PR for it (if there are no objections); any help from the community would be amazing!

@dmitryax
Member

dmitryax commented Aug 28, 2023

There is another initiative to introduce a receiver with similar functionality: #24439. I believe we should consolidate the efforts and have one receiver for both use cases.

@github-actions
Contributor

This issue has been inactive for 60 days. It will be closed in 60 days if there is no activity. To ping code owners by adding a component label, see Adding Labels via Comments, or if you are unsure of which component this issue relates to, please ping @open-telemetry/collector-contrib-triagers. If this issue is still relevant, please ping the code owners or leave a comment explaining why it is still relevant. Otherwise, please close it.

@github-actions
Contributor

This issue has been closed as inactive because it has been stale for 120 days with no activity.

@github-actions github-actions bot closed this as not planned Dec 29, 2023