
Automating Host Upgrades With Ansible #8596

Open · djryanj opened this issue May 17, 2024 · 0 comments

djryanj commented May 17, 2024

Note: This is a continuation of a discussion originally started by @aureq in #8593 (comment), which has since been moved to discussions, but I can't post a comment there.

@aureq, I have managed to automate this with Ansible without using any kind of wrapped Python. Here's my playbook:

- hosts: k8s
  gather_facts: false
  serial: 1
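  # serial: 1 -> upgrade one node at a time so the rest of the cluster stays available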

  tasks:

  - name: Update apt cache on {{ inventory_hostname_short }}
    ansible.builtin.apt:
      update_cache: yes

  - name: Check if there are updates for {{ inventory_hostname_short }}
    ansible.builtin.command:
      cmd: apt list --upgradable
    register: updates
    changed_when: false
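    # apt list --upgradable always prints a "Listing..." header line; the
    # when: conditions on the tasks below filter it out, so any remaining
    # stdout lines mean there are pending upgrades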

  - name: Cordon node {{ inventory_hostname_short }}
    delegate_to: localhost
    kubernetes.core.k8s_drain:
      state: cordon
      name: "{{ inventory_hostname_short }}"
    when: updates.stdout_lines | reject('search','Listing...') | list | length > 0

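  # Flipping allowScheduling off and requesting eviction on the Longhorn node
  # object asks Longhorn to rebuild this node's replicas on other nodes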
  - name: Evict Longhorn volumes from {{ inventory_hostname_short }}
    delegate_to: localhost
    kubernetes.core.k8s_json_patch:
      kind: nodes
      namespace: longhorn-system
      api_version: longhorn.io/v1beta2
      name: "{{ inventory_hostname_short }}"
      patch:
        - op: replace
          path: /spec/allowScheduling
          value: false
        - op: replace
          path: /spec/evictionRequested
          value: true
    when: updates.stdout_lines | reject('search','Listing...') | list | length > 0

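  # Poll the Longhorn node until every disk's scheduledReplica map is empty,
  # i.e. no replicas are still scheduled here (60 retries x 10s = up to 10 minutes)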
  - name: Wait for Longhorn volume eviction on {{ inventory_hostname_short }}
    delegate_to: localhost
    kubernetes.core.k8s_info:
      kind: nodes
      namespace: longhorn-system
      api_version: longhorn.io/v1beta2
      name: "{{ inventory_hostname_short }}"
    register: replica_list
    until: "replica_list.resources[0] | community.general.json_query('status.diskStatus.*.scheduledReplica') | unique == [{}]"
    retries: 60
    delay: 10
    when: updates.stdout_lines | reject('search','Listing...') | list | length > 0

  - name: Drain node {{ inventory_hostname_short }}
    delegate_to: localhost
    # unfortunately the k8s_drain module from kubernetes.core really struggles with Longhorn: it frequently throws a 429 Too Many Requests error, in spite of all the attempts to cleanly migrate volumes in Longhorn beforehand
    ansible.builtin.shell: kubectl drain {{ inventory_hostname_short }} --ignore-daemonsets --delete-emptydir-data
    when: updates.stdout_lines | reject('search','Listing...') | list | length > 0

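  # force-confdef/force-confold tell dpkg to keep existing config files
  # rather than prompting during the unattended upgrade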
  - name: Upgrade all packages on node {{ inventory_hostname_short }}
    ansible.builtin.apt:
      update_cache: no
      upgrade: yes
      force: yes
      dpkg_options: 'force-confdef,force-confold'
    when: updates.stdout_lines | reject('search','Listing...') | list | length > 0

  # Restart required?
  - name: Check if reboot is needed for {{ inventory_hostname_short }}
    ansible.builtin.stat:
      path: /var/run/reboot-required
    register: check_reboot
    when: updates.stdout_lines | reject('search','Listing...') | list | length > 0

  - name: Reboot node {{ inventory_hostname_short }}
    ansible.builtin.reboot:
      connect_timeout: 5
      reboot_timeout: 600
      pre_reboot_delay: 0
      post_reboot_delay: 30
      test_command: whoami
      msg: "Reboot complete"
    when: check_reboot.stat is defined and check_reboot.stat.exists and updates.stdout_lines | reject('search','Listing...') | list | length > 0

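  # tags: always makes sure the node gets uncordoned even if the play is
  # run with a tag subset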
  - name: Uncordon node {{ inventory_hostname_short }}
    delegate_to: localhost
    kubernetes.core.k8s_drain:
      state: uncordon
      name: "{{ inventory_hostname_short }}"
    tags:
      - always
    when: updates.stdout_lines | reject('search','Listing...') | list | length > 0

  - name: Re-enable Longhorn volumes on {{ inventory_hostname_short }}
    delegate_to: localhost
    kubernetes.core.k8s_json_patch:
      kind: nodes
      namespace: longhorn-system
      api_version: longhorn.io/v1beta2
      name: "{{ inventory_hostname_short }}"
      patch:
        - op: replace
          path: /spec/allowScheduling
          value: true
        - op: replace
          path: /spec/evictionRequested
          value: false
    when: updates.stdout_lines | reject('search','Listing...') | list | length > 0

The magic is in the kubernetes.core.k8s_json_patch tasks, which patch the Longhorn nodes and evict the volumes running on them. This causes Longhorn to rebalance them onto other nodes if additional nodes are available (in my case, they are), and things continue as you'd expect.
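
If you want to watch the eviction happen while the playbook runs, the same objects can be inspected from the CLI. A couple of read-only examples (nodes.longhorn.io is the Longhorn node CRD; <node-name> is a placeholder for the node's short hostname):

kubectl -n longhorn-system get nodes.longhorn.io <node-name> -o jsonpath='{.spec.evictionRequested}'
kubectl -n longhorn-system get nodes.longhorn.io <node-name> -o jsonpath='{.status.diskStatus}'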

There's almost certainly room for improvement in this playbook (one idea sketched below), but it works for me as a way to automatically update my hosts.
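
For example, since almost every task repeats the same when: condition, the conditional tasks could probably be grouped under a single block so the condition is written (and evaluated) once. A rough, untested sketch using the same tasks as above:

  - name: Upgrade node {{ inventory_hostname_short }} if updates are pending
    when: updates.stdout_lines | reject('search','Listing...') | list | length > 0
    block:

      - name: Cordon node {{ inventory_hostname_short }}
        delegate_to: localhost
        kubernetes.core.k8s_drain:
          state: cordon
          name: "{{ inventory_hostname_short }}"

      # ...the remaining tasks from the playbook above, unchanged, minus their individual when: lines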
