rootfs/node-fencing

Kubernetes Node Fence controller

Based on the fence controller design proposal.

Status

alpha

Goal

The goal is to run a controller that monitors partitioned nodes (i.e. not-ready nodes or nodes that raise problem events). Once a partitioned node is detected, the controller posts a NodeFence CRD object.
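As a rough illustration of the "not-ready" check described above, here is a minimal sketch in Go. It is not the controller's actual code: the NodeCondition type is a hypothetical, simplified stand-in for the Kubernetes node condition, and treating a missing Ready condition as partitioned is an assumption.

```go
package main

import "fmt"

// NodeCondition is a hypothetical, simplified stand-in for a Kubernetes
// node condition; only the fields needed for this sketch are modelled.
type NodeCondition struct {
	Type   string // e.g. "Ready"
	Status string // "True", "False" or "Unknown"
}

// isPartitioned reports whether a node should be treated as partitioned:
// its Ready condition is anything other than "True".
func isPartitioned(conds []NodeCondition) bool {
	for _, c := range conds {
		if c.Type == "Ready" {
			return c.Status != "True"
		}
	}
	// Assumption: a node that reports no Ready condition at all is
	// treated as partitioned as well.
	return true
}

func main() {
	healthy := []NodeCondition{{Type: "Ready", Status: "True"}}
	unknown := []NodeCondition{{Type: "Ready", Status: "Unknown"}}
	fmt.Println(isPartitioned(healthy)) // false
	fmt.Println(isPartitioned(unknown)) // true
}
```

A real controller would read these conditions from the node objects delivered by its informers and would also react to problem events, which this sketch leaves out.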

Deployment

The deployment is based on the quay.io/bronhaim/standalone-fence-controller image.

kubectl create -f standalone-controller/fence-controller-deployment.yaml

The deployment starts the controller with permissions to read and start agent pods (fence executors).

Agent pods are based on CentOS 7 and include all fence agent scripts (maintained in docker.io/bronhaim/agent-image).

Configuration

The following defines the fence config for node1, which runs fence-rhevm in the power-management step.

Each step lists the names of the fence methods to perform, separated by spaces. Each method needs its own fence-method config.

The controller moves between steps every grace_period, which is defined in fence-cluster-config (the default is 5 seconds; the example below sets grace_timeout=10). For more available configs, such as defining method templates, see examples.

- kind: ConfigMap
  apiVersion: v1
  metadata:
   name: fence-config-node1
   namespace: default
  data:
   config.properties: |-
    node_name=node1
    isolation=
    power_management=fence-rhevm
    recovery=

- kind: ConfigMap
  apiVersion: v1
  metadata:
   name: fence-method-fence-rhevm-node1
   namespace: default
  data:
   method.properties: |
          agent_name=fence-rhevm
          namespace=default
          ip=ovirt.com  # address to the rhevm management
          username=admin@internal
          password-script=/usr/sbin/fetch_passwd
          ssl-insecure=true
          plug=vm-node1  # the vm name
          action=reboot
          ssl=true
          disable-http-filter=true

- kind: Secret
  apiVersion: v1
  metadata:                                 
    name: secret-fence-method-fence-rhevm-node1
  type: Opaque
  data:                                                              
    password: MTIz # base64-encoded password - see https://kubernetes.io/docs/concepts/configuration/secret/ for more info

- kind: ConfigMap
  apiVersion: v1
  metadata:
   name: fence-cluster-config
   namespace: default
  data:
   config.properties: |-
    grace_timeout=10
    giveup_retries=5
    roles=
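To make the step lists above more concrete, the following Go sketch parses a config.properties body and splits each step value into method names. It is only an illustration under stated assumptions: parseProperties, methodsForStep and stepOrder are hypothetical names, and the step order (isolation, then power_management, then recovery) is assumed from the order of the keys in the example, not taken from the controller source.

```go
package main

import (
	"bufio"
	"fmt"
	"strings"
)

// stepOrder is the assumed order in which the controller walks the steps.
var stepOrder = []string{"isolation", "power_management", "recovery"}

// parseProperties reads key=value lines into a map, skipping blank lines
// and lines without an "=" separator.
func parseProperties(text string) map[string]string {
	props := make(map[string]string)
	scanner := bufio.NewScanner(strings.NewReader(text))
	for scanner.Scan() {
		line := strings.TrimSpace(scanner.Text())
		key, value, found := strings.Cut(line, "=")
		if line == "" || !found {
			continue
		}
		props[strings.TrimSpace(key)] = strings.TrimSpace(value)
	}
	return props
}

// methodsForStep splits a step's value into fence method names on
// whitespace; an empty step yields an empty list.
func methodsForStep(props map[string]string, step string) []string {
	return strings.Fields(props[step])
}

func main() {
	// Same content as the fence-config-node1 example above.
	config := "node_name=node1\nisolation=\npower_management=fence-rhevm\nrecovery=\n"
	props := parseProperties(config)
	for _, step := range stepOrder {
		fmt.Printf("%s: %v\n", step, methodsForStep(props, step))
	}
}
```

For the node1 example, only the power_management step yields a method (fence-rhevm); the isolation and recovery steps are empty.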

Build

# make
go build -i -o standalone-controller/_output/bin/node-fencing-controller cmd/node-fencing-controller.go

Build Images

To rebuild images after changes:

export REGISTRY=[registry name]
export VERSION=[build version]
# make images

Run

# $ ./standalone-controller/_output/bin/node-fencing-controller -kubeconfig=[config path]
  I1228 10:29:01.180885    6376 controller.go:120] Fence controller starting
  I1228 10:29:01.181089    6376 controller.go:123] Waiting for pod informer initial sync
  I1228 10:29:02.181297    6376 controller.go:133] Waiting for node informer initial sync
  I1228 10:29:03.181527    6376 controller.go:144] Waiting for event informer initial sync
  I1228 10:29:04.335665    6376 controller.go:170] Controller monitor is running every 10 seconds
   ...
  W1228 10:34:55.392293    6376 controller.go:272] Node node1-48dw ready status is unknown
  I1228 10:34:55.547233    6376 controller.go:447] Posted NodeFence CRD object for node node1-48dw - starting Isolation
  ...

Demo

The demo video shows a running example on a GCE Kubernetes cluster. Enable subtitles to follow the process.