
Commit

support containerLogMax[Size|Files] kubelet configurations (#6702)
* support containerLogMaxSize and containerLogMaxFiles configurations in kubelet

* support containerLogMaxSize and containerLogMaxFiles configurations in kubelet

* add default for ContainerLogMaxSize and switch it to resource quantity

* leverage kubelet logrotation in containerd runtime case

* add documentation clarification

* rerun generate

* adding a comment to container log max size default const

* update skaffold dependencies

* Update docs/usage/logging.md

Co-authored-by: Rafael Franzke <rafael.franzke@sap.com>

* Update docs/usage/logging.md

Co-authored-by: Rafael Franzke <rafael.franzke@sap.com>

* Update docs/usage/logging.md

Co-authored-by: Rafael Franzke <rafael.franzke@sap.com>

* Update pkg/operation/botanist/component/extensions/operatingsystemconfig/original/components/containerd/logrotate/logrotate_suite_test.go

Co-authored-by: Rafael Franzke <rafael.franzke@sap.com>

* Update pkg/operation/botanist/component/extensions/operatingsystemconfig/original/components/containerd/logrotate/logrotate_test.go

Co-authored-by: Rafael Franzke <rafael.franzke@sap.com>

* Update docs/usage/logging.md

Co-authored-by: Rafael Franzke <rafael.franzke@sap.com>

* Update docs/usage/logging.md

Co-authored-by: Rafael Franzke <rafael.franzke@sap.com>

* Update docs/usage/logging.md

Co-authored-by: Rafael Franzke <rafael.franzke@sap.com>

* Update docs/usage/logging.md

Co-authored-by: Rafael Franzke <rafael.franzke@sap.com>

* Update pkg/operation/botanist/component/extensions/operatingsystemconfig/original/components/containerd/logrotate/logrotate.go

Co-authored-by: Rafael Franzke <rafael.franzke@sap.com>

* Update pkg/operation/botanist/component/extensions/operatingsystemconfig/original/components/kubelet.go

Co-authored-by: Rafael Franzke <rafael.franzke@sap.com>

* Update pkg/operation/botanist/component/extensions/operatingsystemconfig/original/components/containerd/logrotate/logrotate.go

Co-authored-by: Rafael Franzke <rafael.franzke@sap.com>

* Update pkg/operation/botanist/component/extensions/operatingsystemconfig/original/components/docker/logrotate/logrotate.go

Co-authored-by: Rafael Franzke <rafael.franzke@sap.com>

* Update pkg/operation/botanist/component/extensions/operatingsystemconfig/original/components/docker/logrotate/logrotate_suite_test.go

Co-authored-by: Rafael Franzke <rafael.franzke@sap.com>

* Update pkg/operation/botanist/component/extensions/operatingsystemconfig/original/components/containerd/logrotate/logrotate_suite_test.go

Co-authored-by: Rafael Franzke <rafael.franzke@sap.com>

* revert default-admin.conf

* move defaults to workers kubelet configs, unify internal api and remove obsolete conversions

* Update docs/usage/logging.md

Co-authored-by: Ismail Alidzhikov <i.alidjikov@gmail.com>

* Update docs/usage/logging.md

Co-authored-by: Ismail Alidzhikov <i.alidjikov@gmail.com>

* Update docs/usage/logging.md

Co-authored-by: Ismail Alidzhikov <i.alidjikov@gmail.com>

* Update pkg/apis/core/v1alpha1/types_shoot.go

Co-authored-by: Ismail Alidzhikov <i.alidjikov@gmail.com>

* Update pkg/apis/core/v1alpha1/types_shoot.go

Co-authored-by: Ismail Alidzhikov <i.alidjikov@gmail.com>

* Update pkg/apis/core/types_shoot.go

Co-authored-by: Ismail Alidzhikov <i.alidjikov@gmail.com>

* Update pkg/apis/core/types_shoot.go

Co-authored-by: Ismail Alidzhikov <i.alidjikov@gmail.com>

* Update pkg/apis/core/v1alpha1/defaults_test.go

Co-authored-by: Ismail Alidzhikov <i.alidjikov@gmail.com>

* Update pkg/apis/core/v1alpha1/defaults_test.go

Co-authored-by: Ismail Alidzhikov <i.alidjikov@gmail.com>

* Update pkg/apis/core/v1alpha1/defaults_test.go

Co-authored-by: Ismail Alidzhikov <i.alidjikov@gmail.com>

* Update pkg/apis/core/v1alpha1/defaults_test.go

Co-authored-by: Ismail Alidzhikov <i.alidjikov@gmail.com>

* Update pkg/apis/core/v1alpha1/defaults_test.go

Co-authored-by: Ismail Alidzhikov <i.alidjikov@gmail.com>

* Update pkg/apis/core/v1alpha1/defaults_test.go

Co-authored-by: Ismail Alidzhikov <i.alidjikov@gmail.com>

* Update pkg/apis/core/v1alpha1/types_shoot.go

Co-authored-by: Ismail Alidzhikov <i.alidjikov@gmail.com>

* unify naming convention in tests

* improve readability

* comments update

* Update pkg/operation/botanist/component/extensions/operatingsystemconfig/original/components/containerd/logrotate/logrotate.go

Co-authored-by: Ismail Alidzhikov <i.alidjikov@gmail.com>

* Update pkg/operation/botanist/component/extensions/operatingsystemconfig/original/components/docker/component.go

Co-authored-by: Ismail Alidzhikov <i.alidjikov@gmail.com>

* refactor dedicated const type to a string

* Update pkg/operation/botanist/component/extensions/operatingsystemconfig/original/components/containerd/logrotate/logrotate.go

Co-authored-by: Ismail Alidzhikov <i.alidjikov@gmail.com>

* restore copytruncate in logrotate

* add validation for containerLogMaxFiles

* rerun generate

* adjust the example configuration in the documentation

* set to docker runtime when cri is not defined

Co-authored-by: Rafael Franzke <rafael.franzke@sap.com>

* refactor the defaulting of kubelet configuration

* refactored the defaulting of containerLogMaxSize with a complete set of test cases

* adding containerLog fields to shoot yaml example

* use the const value in the case of docker runtime

* run generate

* deny containerLog fields when runtime is docker

* default workers only when global config is not set

* refactor validation message

* keep formatting

Co-authored-by: Rafael Franzke <rafael.franzke@sap.com>
Co-authored-by: Ismail Alidzhikov <i.alidjikov@gmail.com>
3 people committed Oct 24, 2022
1 parent 31c4ef3 commit 36a73ee
Showing 35 changed files with 2,153 additions and 1,175 deletions.
27 changes: 27 additions & 0 deletions docs/api-reference/core.md
@@ -4876,6 +4876,33 @@ This requires the corresponding SeccompDefault feature gate to be enabled as wel
This field is only available for Kubernetes v1.25 or later.</p>
</td>
</tr>
<tr>
<td>
<code>containerLogMaxSize</code></br>
<em>
<a href="https://godoc.org/k8s.io/apimachinery/pkg/api/resource#Quantity">
k8s.io/apimachinery/pkg/api/resource.Quantity
</a>
</em>
</td>
<td>
<em>(Optional)</em>
<p>A quantity defines the maximum size of the container log file before it is rotated. For example: &ldquo;5Mi&rdquo; or &ldquo;256Ki&rdquo;.
Default: 100Mi</p>
</td>
</tr>
<tr>
<td>
<code>containerLogMaxFiles</code></br>
<em>
int32
</em>
</td>
<td>
<em>(Optional)</em>
<p>Maximum number of container log files that can be present for a container.</p>
</td>
</tr>
</tbody>
</table>
<h3 id="core.gardener.cloud/v1beta1.KubeletConfigEviction">KubeletConfigEviction
38 changes: 35 additions & 3 deletions docs/usage/logging.md
@@ -10,6 +10,37 @@ Kubernetes uses the underlying container runtime logging, which does not persist
* One Loki Statefulset in the `garden` namespace which contains logs for the seed cluster and one per shoot namespace which contains logs for shoot's controlplane.
* One Grafana Deployment in `garden` namespace and two Deployments per shoot namespace (one exposed to the end users and one for the operators). Grafana is the UI component used in the logging stack.

### Container Logs rotation and retention

Container [log rotation](https://kubernetes.io/docs/concepts/cluster-administration/logging/#log-rotation) in Kubernetes hinges on a subtle but important implementation detail: the type of high-level container runtime in use. When the container runtime is not CRI compliant (such as `dockershim`), the `kubelet` does not provide any rotation or retention implementation and leaves those aspects to downstream components. When the container runtime is CRI compliant (such as `containerd`), the `kubelet` provides the necessary implementation with two configuration options:
- `ContainerLogMaxSize` for rotation
- `ContainerLogMaxFiles` for retention.

#### Docker container runtime

In this case, log rotation and retention are implemented by a `logrotate` service provisioned by Gardener, which rotates logs once they reach a size of `100M`. Logs are compressed on a daily basis and retained for a maximum period of `14d`.

#### ContainerD runtime

In this case, it is possible to configure the `containerLogMaxSize` and `containerLogMaxFiles` fields in the Shoot specification. Both fields are optional; if nothing is specified, the `kubelet` rotates at the same `100M` size as in the `docker` container runtime case. These fields are part of the provider's workers definition. Here is an example:

```yaml
spec:
  provider:
    workers:
      - cri:
          name: containerd
        kubernetes:
          kubelet:
            # accepted values are of resource.Quantity
            containerLogMaxSize: 150Mi
            containerLogMaxFiles: 10
```

The values of the `containerLogMaxSize` and `containerLogMaxFiles` fields need to be chosen with care, since container log files claim disk space on the host. On the other hand, rotating at too small a size may result in frequent rotations, which can be missed by other components (log shippers) observing these rotations.

In the majority of cases, the defaults should do just fine. Custom configuration might be useful under rare conditions.
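
To get a feeling for the disk space trade-off, the following is a minimal sketch (not part of Gardener) that multiplies the two settings into a rough per-node upper bound. The helper `maxLogBytes` and the figure of 50 containers per node are assumptions made for illustration; rotated files may be compressed, so real usage is typically lower.

```go
package main

import "fmt"

// mi is one mebibyte in bytes, matching the "Mi" suffix used in the Shoot spec.
const mi int64 = 1 << 20

// maxLogBytes is a hypothetical helper: it returns a rough upper bound on the
// disk space container logs can claim, assuming every container fills all of
// its log files up to the configured maximum size.
func maxLogBytes(maxSizeBytes int64, maxFiles int32, containers int) int64 {
	return maxSizeBytes * int64(maxFiles) * int64(containers)
}

func main() {
	// containerLogMaxSize: 150Mi and containerLogMaxFiles: 10 as in the
	// example above; 50 containers per node is an assumed figure.
	fmt.Printf("%d Mi\n", maxLogBytes(150*mi, 10, 50)/mi)
}
```

With these numbers the bound is far larger than what the defaults would allow, which is why larger values should be paired with a check of the node's disk size.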

### Extension of the logging stack
![](images/shoot-node-logging-architecture.png)
The logging stack is extended to scrape logs from the systemd services of each shoots' nodes and from all Gardener components in the shoot `kube-system` namespace. These logs are exposed only to the Gardener operators.
@@ -24,7 +55,7 @@ There are two Grafana instances where the logs are accessible from.
The user Grafana URL can be found in the `Logging and Monitoring` section of a cluster in the Gardener Dashboard alongside with the credentials, when opened as cluster owner/user.
The secret with the credentials can be found in `garden-<project>` namespace under `<shoot-name>.monitoring` in the garden cluster or in the `control-plane` (shoot--project--shoot-name) namespace under `observability-ingress-users-<hash>` secrets in the seed cluster.
Also, the Grafana URL can be found in the `control-plane` namespace under the `grafana-users` ingress in the seed.
The end-user has access only to the logs of some of the control-plane components.

2. In addition to the dashboards in the User Grafana, the Operator Grafana contains several other dashboards that aim to facilitate the work of operators.
The operator Grafana URL can be found in the `Logging and Monitoring` section of a cluster in the Gardener Dashboard alongside with the credentials, when opened as Gardener operator.
@@ -81,6 +112,7 @@ Examples:
### Expose logs for component to User Grafana
Exposing logs for a new component to the User's Grafana is described [here](../extensions/logging-and-monitoring.md#how-to-expose-logs-to-the-users)
### Configuration

#### Fluent-bit

The Fluent-bit configurations can be found on `charts/seed-bootstrap/charts/fluent-bit/templates/fluent-bit-configmap.yaml`
@@ -138,7 +170,7 @@ The main specifications there are:

* chunk_store_config Configuration
```
chunk_store_config:
  max_look_back_period: 336h
```
**`chunk_store_config.max_look_back_period` should be the same as the `retention_period`**
@@ -152,7 +184,7 @@ The main specifications there are:
`table_manager.retention_period` is the living time for each log message. Loki will keep messages for sure for (`table_manager.retention_period` - `index.period`) time due to specification in the Loki implementation.

#### Grafana
The Grafana configurations can be found on `charts/seed-bootstrap/charts/templates/grafana/grafana-datasources-configmap.yaml` and
`charts/seed-monitoring/charts/grafana/tempates/grafana-datasources-configmap.yaml`

This is the Loki configuration that Grafana uses:
2 changes: 2 additions & 0 deletions example/90-shoot.yaml
@@ -70,6 +70,8 @@ spec:
# kubernetes:
#   version: 1.14.3
#   kubelet:
#     containerLogMaxSize: 100Mi
#     containerLogMaxFiles: 5
#     cpuCFSQuota: true
#     failSwapOn: true
#     cpuManagerPolicy: none
4 changes: 4 additions & 0 deletions pkg/apis/core/types_shoot.go
@@ -755,6 +755,10 @@ const (
// KubeletConfig contains configuration settings for the kubelet.
type KubeletConfig struct {
	KubernetesConfig
	// ContainerLogMaxSize defines the maximum size of the container log file before it is rotated. For example: "5Mi" or "256Ki".
	ContainerLogMaxSize *resource.Quantity
	// ContainerLogMaxFiles is the maximum number of container log files that can be present for a container.
	ContainerLogMaxFiles *int32
	// CPUCFSQuota allows you to disable/enable CPU throttling for Pods.
	CPUCFSQuota *bool
	// CPUManagerPolicy allows to set alternative CPU management policies (default: none).
28 changes: 26 additions & 2 deletions pkg/apis/core/v1alpha1/defaults.go
@@ -304,11 +304,31 @@ func SetDefaults_Shoot(obj *Shoot) {
			continue
		}

		if worker.CRI == nil {
			obj.Spec.Provider.Workers[i].CRI = &CRI{Name: CRINameContainerD}
		}

		if isDockerRuntime(worker.CRI) {
			continue
		}

		// When CRI runtime is used and there is no explicit kubelet configuration, the ContainerLogMaxSize in Workers kubelet is set to 10Mi.
		// To align both container runtime configurations, the default log max size in containerd case is also set to 100Mi.

		if worker.Kubernetes == nil {
			obj.Spec.Provider.Workers[i].Kubernetes = &WorkerKubernetes{}
		}

		if obj.Spec.Provider.Workers[i].Kubernetes.Kubelet == nil &&
			obj.Spec.Kubernetes.Kubelet.ContainerLogMaxSize == nil {
			obj.Spec.Provider.Workers[i].Kubernetes.Kubelet = &KubeletConfig{}
		}

		if obj.Spec.Kubernetes.Kubelet.ContainerLogMaxSize == nil &&
			obj.Spec.Provider.Workers[i].Kubernetes.Kubelet.ContainerLogMaxSize == nil {
			defaultContainerLogMaxSize := resource.MustParse(DefaultContainerLogMaxSize)
			obj.Spec.Provider.Workers[i].Kubernetes.Kubelet.ContainerLogMaxSize = &defaultContainerLogMaxSize
		}
	}

	if obj.Spec.SystemComponents == nil {
@@ -325,6 +345,10 @@ func SetDefaults_Shoot(obj *Shoot) {
	}
}

func isDockerRuntime(cri *CRI) bool {
	return cri == nil || cri.Name == CRINameDocker
}

// SetDefaults_Maintenance sets default values for Maintenance objects.
func SetDefaults_Maintenance(obj *Maintenance) {
	if obj.AutoUpdate == nil {
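
The precedence applied by the `ContainerLogMaxSize` defaulting in `SetDefaults_Shoot` can be sketched in isolation as follows. This is a simplified illustration, not the Gardener code: the types `cri`, `kubeletConfig` and `worker` and the function `defaultLogMaxSize` are hypothetical stand-ins, the size is a plain string instead of a `resource.Quantity`, the intermediate `WorkerKubernetes` level is folded away, and the CRI defaulting itself is omitted.

```go
package main

import "fmt"

// Hypothetical, simplified stand-ins for the Gardener API types.
type cri struct{ name string }

type kubeletConfig struct{ containerLogMaxSize *string }

type worker struct {
	cri     *cri
	kubelet *kubeletConfig
}

// Assumed default value; the API reference in this commit documents 100Mi.
const defaultContainerLogMaxSize = "100Mi"

// defaultLogMaxSize sets the worker's log max size only when the worker uses
// a non-docker CRI and neither the shoot-wide nor the worker-level kubelet
// configuration already carries a value.
func defaultLogMaxSize(global *kubeletConfig, w *worker) {
	// Workers without an explicit CRI, or with docker, are skipped.
	if w.cri == nil || w.cri.name == "docker" {
		return
	}
	// A shoot-wide value wins: the worker is left untouched.
	if global != nil && global.containerLogMaxSize != nil {
		return
	}
	if w.kubelet == nil {
		w.kubelet = &kubeletConfig{}
	}
	// An explicit worker-level value is never overwritten.
	if w.kubelet.containerLogMaxSize == nil {
		size := defaultContainerLogMaxSize
		w.kubelet.containerLogMaxSize = &size
	}
}

func main() {
	w := &worker{cri: &cri{name: "containerd"}}
	defaultLogMaxSize(&kubeletConfig{}, w)
	fmt.Println(*w.kubelet.containerLogMaxSize)
}
```

The four outcomes (defaulted, explicit worker value kept, docker skipped, shoot-wide value respected) correspond to the test cases added in `defaults_test.go`.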
84 changes: 84 additions & 0 deletions pkg/apis/core/v1alpha1/defaults_test.go
@@ -639,6 +639,90 @@ var _ = Describe("Defaults", func() {
			Expect(obj.Spec.Kubernetes.EnableStaticTokenKubeconfig).To(PointTo(BeTrue()))
		})

		It("should default the workers's kubelet containerLogMaxSize field when cri is containerd", func() {
			obj.Spec.Provider.Workers = []Worker{
				{Name: "DefaultWorker"},
				{Name: "containerd-worker",
					CRI: &CRI{Name: CRINameContainerD}},
				{Name: "containerd-worker-with-kubernetes",
					CRI: &CRI{Name: CRINameContainerD},
					Kubernetes: &WorkerKubernetes{},
				},
				{Name: "containerd-worker-with-kubelet",
					CRI: &CRI{Name: CRINameContainerD},
					Kubernetes: &WorkerKubernetes{
						Kubelet: &KubeletConfig{},
					},
				},
			}

			SetDefaults_Shoot(obj)

			Expect(obj.Spec.Kubernetes.Kubelet.ContainerLogMaxSize).To(BeNil())
			Expect(obj.Spec.Provider.Workers[0].Kubernetes).To(BeNil())

			Expect(obj.Spec.Provider.Workers[1].Kubernetes.Kubelet).ToNot(BeNil())
			Expect(obj.Spec.Provider.Workers[1].Kubernetes.Kubelet.ContainerLogMaxSize).ToNot(BeNil())
			Expect(obj.Spec.Provider.Workers[1].Kubernetes.Kubelet.ContainerLogMaxSize.String()).
				To(Equal(DefaultContainerLogMaxSize))

			Expect(obj.Spec.Provider.Workers[2].Kubernetes.Kubelet).ToNot(BeNil())
			Expect(obj.Spec.Provider.Workers[2].Kubernetes.Kubelet.ContainerLogMaxSize).ToNot(BeNil())
			Expect(obj.Spec.Provider.Workers[2].Kubernetes.Kubelet.ContainerLogMaxSize.String()).
				To(Equal(DefaultContainerLogMaxSize))

			Expect(obj.Spec.Provider.Workers[3].Kubernetes.Kubelet).ToNot(BeNil())
			Expect(obj.Spec.Provider.Workers[3].Kubernetes.Kubelet.ContainerLogMaxSize).ToNot(BeNil())
			Expect(obj.Spec.Provider.Workers[3].Kubernetes.Kubelet.ContainerLogMaxSize.String()).
				To(Equal(DefaultContainerLogMaxSize))
		})

		It("should not overwrite the workers's kubelet containerLogMaxSize field when it is set", func() {
			r := resource.MustParse("1M")
			obj.Spec.Provider.Workers = []Worker{
				{Name: "containerd-worker-with-kubelet",
					CRI: &CRI{Name: CRINameContainerD},
					Kubernetes: &WorkerKubernetes{
						Kubelet: &KubeletConfig{
							ContainerLogMaxSize: &r,
						},
					},
				},
			}

			SetDefaults_Shoot(obj)

			Expect(obj.Spec.Kubernetes.Kubelet.ContainerLogMaxSize).To(BeNil())
			Expect(obj.Spec.Provider.Workers[0].Kubernetes.Kubelet.ContainerLogMaxSize.String()).
				To(Equal("1M"))
		})

		It("should not default the workers's kubelet containerLogMaxSize field when cri is docker", func() {
			obj.Spec.Provider.Workers = []Worker{
				{Name: "docker-worker",
					CRI: &CRI{Name: CRINameDocker}},
			}

			SetDefaults_Shoot(obj)

			Expect(obj.Spec.Kubernetes.Kubelet.ContainerLogMaxSize).To(BeNil())
			Expect(obj.Spec.Provider.Workers[0].Kubernetes).To(BeNil())
		})

		It("should not default the workers's kubelet containerLogMaxSize field when global config is set", func() {
			r := resource.MustParse("10M")
			obj.Spec.Kubernetes.Kubelet = &KubeletConfig{
				ContainerLogMaxSize: &r,
			}
			obj.Spec.Provider.Workers = []Worker{
				{Name: "containerd",
					CRI: &CRI{Name: CRINameContainerD}},
			}

			SetDefaults_Shoot(obj)
			Expect(obj.Spec.Provider.Workers[0].Kubernetes.Kubelet).To(BeNil())
		})

		Context("k8s version < 1.25", func() {
			BeforeEach(func() {
				obj.Spec.Kubernetes = Kubernetes{
