This repository has been archived by the owner on Nov 30, 2021. It is now read-only.

Fluentd pod crashing on Azure Container Service #847

Open
sbulman opened this issue Aug 5, 2017 · 6 comments

Comments


sbulman commented Aug 5, 2017

Hi All,

I'm following the instructions to set up Deis on Azure Container Service. One of the deis-logger-fluentd pods is crashing with the following log.

2017-08-05 07:21:26 +0000 [info]: reading config file path="/opt/fluentd/conf/fluentd.conf"
2017-08-05 07:22:27 +0000 [error]: config error file="/opt/fluentd/conf/fluentd.conf" error_class=Fluent::ConfigError error="Invalid Kubernetes API v1 endpoint https://10.0.0.1:443: Timed out connecting to server"

Any ideas?

Thanks.
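The error says fluentd timed out reaching the Kubernetes API at https://10.0.0.1:443. One way to confirm whether the pod itself can reach that endpoint is to probe it from inside the crashing pod (a diagnostic sketch — the pod name suffix is a placeholder; substitute the real one from `kubectl get pods --namespace=deis`):

```shell
# Probe the API endpoint named in the fluentd error from inside the pod.
# "deis-logger-fluentd-xxxxx" is a placeholder pod name.
kubectl exec --namespace=deis deis-logger-fluentd-xxxxx -- \
  curl -sk --max-time 10 https://10.0.0.1:443/version
# A timeout here confirms the pod cannot reach the API server at all,
# matching the "Timed out connecting to server" in the fluentd log.
```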

@sbulman sbulman changed the title Fluentd pod crashing Azure Container Service Fluentd pod crashing on Azure Container Service Aug 5, 2017

sbulman commented Aug 5, 2017

A bit more info. I created the ACS cluster with 1 agent. The fluentd pod that is crashing is on the master node. The pod running on the agent appears to be working fine.
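To see which node each fluentd pod landed on, the wide pod listing is enough (the label selector here is an assumption — adjust it to whatever labels the Deis chart actually sets):

```shell
# Show each deis-logger-fluentd pod together with the node it was scheduled on.
# The -l selector is an assumed label; drop it to list all pods in the namespace.
kubectl get pods --namespace=deis -l app=deis-logger-fluentd -o wide
```

The NODE column should make it obvious whether one of the DaemonSet pods was scheduled on the master.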


ghost commented Sep 25, 2017

We're facing the same issue, same symptoms and circumstances as @sbulman.
The fluentd logger pod continually crashes on the master node on Azure ACS.


bacongobbler commented Sep 25, 2017

There should not be a fluentd pod running on the master node. There was an upstream ticket about DaemonSet pods being accidentally scheduled on the Kubernetes master node, which was eventually resolved.

More background context in this ticket, which was resolved in Kubernetes 1.5.0+ via kubernetes/kubernetes#35526.


ghost commented Sep 25, 2017

OK, thanks @bacongobbler for the context. It still appears to be an issue on ACS today, though. Any thoughts are much appreciated!

The fluentd logger pod event for the master node indicates the following error:

Error syncing pod, skipping: failed to "StartContainer" for "deis-logger-fluentd" with CrashLoopBackOff: "Back-off 10s restarting failed container=deis-logger-fluentd pod=deis-logger-fluentd-swjnl_deis

K8S versions (client and Azure Container Service):

Client Version: version.Info{Major:"1", Minor:"7", GitVersion:"v1.7.6", GitCommit:"4bc5e7f9a6c25dc4c03d4d656f2cefd21540e28c", GitTreeState:"clean", BuildDate:"2017-09-14T06:55:55Z", GoVersion:"go1.8.3", Compiler:"gc", Platform:"darwin/amd64"}
Server Version: version.Info{Major:"1", Minor:"6", GitVersion:"v1.6.6", GitCommit:"7fa1c1756d8bc963f1a389f4a6937dc71f08ada2", GitTreeState:"clean", BuildDate:"2017-06-16T18:21:54Z", GoVersion:"go1.7.6", Compiler:"gc", Platform:"linux/amd64"}

Deis version 2.18.0

The fluentd pod is definitely running on the master node on ACS, as shown by the event logs; in this case it was created by: k8s-master-47933ef9-0


monaka commented Dec 25, 2017

I also hit the same issue on my K8s/CoreOS cluster.
It's not on ACS, but it might share the same root cause.

In my case, it was fixed by adding the option --register-with-taints=node-role.kubernetes.io/master=true:NoSchedule to hyperkube.

The unschedulable field of a node is not respected by the DaemonSet controller.
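For an already-running cluster, the same effect as the kubelet flag can be achieved by tainting the master node directly, without restarting the kubelet (a sketch — the node name below is the master reported earlier in this thread; substitute your own from `kubectl get nodes`):

```shell
# Apply the NoSchedule taint to the master at runtime.
# DaemonSet pods without a matching toleration will no longer be scheduled there.
kubectl taint nodes k8s-master-47933ef9-0 node-role.kubernetes.io/master=true:NoSchedule
```

Unlike the node's `unschedulable` field (which the DaemonSet controller ignores, as quoted above), a NoSchedule taint is respected unless the pod spec carries a matching toleration.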

@Cryptophobia

This issue was moved to teamhephy/workflow#6
