
Automating Host Upgrades With Ansible #8596

Open · djryanj opened this issue May 17, 2024 · 0 comments

djryanj commented May 17, 2024

Note: This is a continuation of a discussion originally started by @aureq in #8593 (comment), which has since been moved to discussions, but I can't post a comment there.

@aureq, I have managed to automate this with Ansible without using any kind of wrapped Python. Here's my playbook:

- hosts: k8s
  gather_facts: false
  serial: 1
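  # serial: 1 -> upgrade one node at a time so the rest of the cluster stays available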

  tasks:

  - name: Update apt cache on {{ inventory_hostname_short }}
    ansible.builtin.apt:
      update_cache: yes

  - name: Check if there are updates for {{ inventory_hostname_short }}
    ansible.builtin.command:
      cmd: apt list --upgradable
    register: updates
    changed_when: false
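    # apt list --upgradable always prints a "Listing..." header line; the
    # when: conditions on the tasks below filter it out, so any remaining
    # stdout lines mean there are pending upgrades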

  - name: Cordon node {{ inventory_hostname_short }}
    delegate_to: localhost
    kubernetes.core.k8s_drain:
      state: cordon
      name: "{{ inventory_hostname_short }}"
    when: updates.stdout_lines | reject('search','Listing...') | list | length > 0

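  # Flipping allowScheduling off and requesting eviction on the Longhorn node
  # object asks Longhorn to rebuild this node's replicas on other nodes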
  - name: Evict Longhorn volumes from {{ inventory_hostname_short }}
    delegate_to: localhost
    kubernetes.core.k8s_json_patch:
      kind: nodes
      namespace: longhorn-system
      api_version: longhorn.io/v1beta2
      name: "{{ inventory_hostname_short }}"
      patch:
        - op: replace
          path: /spec/allowScheduling
          value: false
        - op: replace
          path: /spec/evictionRequested
          value: true
    when: updates.stdout_lines | reject('search','Listing...') | list | length > 0

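  # Poll the Longhorn node until every disk's scheduledReplica map is empty,
  # i.e. no replicas are still scheduled here (60 retries x 10s = up to 10 minutes)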
  - name: Wait for Longhorn volume eviction on {{ inventory_hostname_short }}
    delegate_to: localhost
    kubernetes.core.k8s_info:
      kind: nodes
      namespace: longhorn-system
      api_version: longhorn.io/v1beta2
      name: "{{ inventory_hostname_short }}"
    register: replica_list
    until: "replica_list.resources[0] | community.general.json_query('status.diskStatus.*.scheduledReplica') | unique == [{}]"
    retries: 60
    delay: 10
    when: updates.stdout_lines | reject('search','Listing...') | list | length > 0

  - name: Drain node {{ inventory_hostname_short }}
    delegate_to: localhost
    # unfortunately the k8s_drain module from kubernetes.core really struggles with Longhorn: it frequently throws a 429 Too Many Requests error, in spite of all the attempts to cleanly migrate volumes in Longhorn beforehand
    ansible.builtin.shell: kubectl drain {{ inventory_hostname_short }} --ignore-daemonsets --delete-emptydir-data
    when: updates.stdout_lines | reject('search','Listing...') | list | length > 0

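  # force-confdef/force-confold tell dpkg to keep existing config files
  # rather than prompting during the unattended upgrade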
  - name: Upgrade all packages on node {{ inventory_hostname_short }}
    ansible.builtin.apt:
      update_cache: no
      upgrade: yes
      force: yes
      dpkg_options: 'force-confdef,force-confold'
    when: updates.stdout_lines | reject('search','Listing...') | list | length > 0

  # Restart required?
  - name: Check if reboot is needed for {{ inventory_hostname_short }}
    ansible.builtin.stat:
      path: /var/run/reboot-required
    register: check_reboot
    when: updates.stdout_lines | reject('search','Listing...') | list | length > 0

  - name: Reboot node {{ inventory_hostname_short }}
    ansible.builtin.reboot:
      connect_timeout: 5
      reboot_timeout: 600
      pre_reboot_delay: 0
      post_reboot_delay: 30
      test_command: whoami
      msg: "Reboot complete"
    when: check_reboot.stat is defined and check_reboot.stat.exists and updates.stdout_lines | reject('search','Listing...') | list | length > 0

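  # tags: always makes sure the node gets uncordoned even if the play is
  # run with a tag subset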
  - name: Uncordon node {{ inventory_hostname_short }}
    delegate_to: localhost
    kubernetes.core.k8s_drain:
      state: uncordon
      name: "{{ inventory_hostname_short }}"
    tags:
      - always
    when: updates.stdout_lines | reject('search','Listing...') | list | length > 0

  - name: Re-enable Longhorn volumes on {{ inventory_hostname_short }}
    delegate_to: localhost
    kubernetes.core.k8s_json_patch:
      kind: nodes
      namespace: longhorn-system
      api_version: longhorn.io/v1beta2
      name: "{{ inventory_hostname_short }}"
      patch:
        - op: replace
          path: /spec/allowScheduling
          value: true
        - op: replace
          path: /spec/evictionRequested
          value: false
    when: updates.stdout_lines | reject('search','Listing...') | list | length > 0

The magic is in the kubernetes.core.k8s_json_patch tasks, which patch the Longhorn nodes and evict the volumes running on them. This causes Longhorn to rebalance them onto other nodes if additional nodes are available (in my case, they are), and things continue as you'd expect.
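
If you want to watch the eviction happen while the playbook runs, the same objects can be inspected from the CLI. A couple of read-only examples (nodes.longhorn.io is the Longhorn node CRD; <node-name> is a placeholder for the node's short hostname):

kubectl -n longhorn-system get nodes.longhorn.io <node-name> -o jsonpath='{.spec.evictionRequested}'
kubectl -n longhorn-system get nodes.longhorn.io <node-name> -o jsonpath='{.status.diskStatus}'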

There's almost certainly room for improvement in this playbook (one idea sketched below), but it works for me as a way to automatically update my hosts.
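
For example, since almost every task repeats the same when: condition, the conditional tasks could probably be grouped under a single block so the condition is written (and evaluated) once. A rough, untested sketch using the same tasks as above:

  - name: Upgrade node {{ inventory_hostname_short }} if updates are pending
    when: updates.stdout_lines | reject('search','Listing...') | list | length > 0
    block:

      - name: Cordon node {{ inventory_hostname_short }}
        delegate_to: localhost
        kubernetes.core.k8s_drain:
          state: cordon
          name: "{{ inventory_hostname_short }}"

      # ...the remaining tasks from the playbook above, unchanged, minus their individual when: lines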
