okd release-3.11 #70

Closed
mighani opened this issue Nov 22, 2018 · 9 comments

Comments

@mighani

mighani commented Nov 22, 2018

How can 3.11 be supported? Just changing the version in install-from-bastion.sh is not enough. Is there anything else to change?

@VineetReynolds

VineetReynolds commented Dec 4, 2018

Got the OKD 3.11 installation to work to a fair degree on CentOS 7.6 using the following changes:

  • Set the branch to release-3.11 in install-from-bastion.sh.
  • Set openshift_release=v3.11 in inventory.template.cfg#L23
  • Add an additional repo pointing to http://mirror.centos.org/centos/7/paas/x86_64/openshift-origin311/, via the following entry in inventory.template.cfg: openshift_additional_repos=[{'id': 'centos-okd-311', 'name': 'centos-okd-311', 'baseurl' :'http://mirror.centos.org/centos/7/paas/x86_64/openshift-origin311/', 'gpgcheck' :'0', 'enabled' :'1'}]

The repo is needed only for CentOS, not for RHEL. Details about the Origin 3.11 repo come from: https://lists.openshift.redhat.com/openshift-archives/users/2018-November/msg00007.html
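
For readability, the two inventory settings above can also be written out in YAML. This is only a sketch: the repository in this thread keeps them in inventory.template.cfg, and the file name group_vars/OSEv3.yml is an assumption for illustration, not something the project is known to use.

```yaml
# Hypothetical group_vars/OSEv3.yml (assumed file name).
# Same data as the inventory.template.cfg entries quoted above.
openshift_release: v3.11
openshift_additional_repos:
  - id: centos-okd-311
    name: centos-okd-311
    baseurl: http://mirror.centos.org/centos/7/paas/x86_64/openshift-origin311/
    gpgcheck: '0'   # GPG checking disabled, as in the original entry
    enabled: '1'
```

As noted above, the extra repo entry applies to CentOS only and would be dropped on RHEL.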

After installation, the master node(s) were noticeably less stable than with older versions of OpenShift Origin, especially the etcd and API server pods, and those failures cascaded to every other OKD component. For end users, oc commands and the web console failed, with logs showing messages like Failed to list *v1.Service or Failed to list *v1.Pod and dial tcp 10.0.1.83:8443: connect: connection refused; the API server pods were restarting frequently. I managed to recover by restarting the docker and origin-node services on the master node, but I'm not confident this is either recommended or sufficient, so these may not be the only changes needed.
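
The manual recovery described above could be sketched as a small Ansible play. This is an assumption-laden illustration rather than a recommended fix: it assumes the inventory defines a masters group, and it only restarts the two services named in the comment (docker and origin-node).

```yaml
---
# Hypothetical recovery play; "masters" is an assumed inventory group name.
- hosts: masters
  become: yes
  serial: 1   # handle one master at a time
  tasks:
  - name: Restart Docker
    systemd:
      name: docker
      state: restarted
  - name: Restart the OKD node service
    systemd:
      name: origin-node
      state: restarted
```

Running with serial: 1 is a design choice so that, on a multi-master cluster, not every API server is restarted at the same moment.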

@VineetReynolds

Component failures in OKD 3.11 installations on CentOS seem to be related to a newer version of Docker. See: kubernetes/kubeadm#1299 (comment)

@mariusfilipowski

On CentOS 7.5, the "Verify Node Network Manager" step fails on each node with this error message: Currently, NetworkManager must be installed and enabled prior to installation.

Do you have any experience fixing this error?

@VineetReynolds

@mariusfilipowski Yes, the openshift-ansible scripts need some modification for CentOS 7; see VineetReynolds/openshift-ansible@2c54d74

@zoobab
Contributor

zoobab commented Dec 20, 2018

@mariusfilipowski I had to write a custom playbook to install Docker and NetworkManager; the latter was problematic since it required a reboot (!) to work properly. I can share the playbooks that set up a basic CentOS 7 install as a pre-step if you are interested.

@dwmkerr
Owner

dwmkerr commented Dec 20, 2018

It'd be great to see how you did it, @zoobab. I'm sure it'd help others coming across these issues!

@mariusfilipowski

@zoobab That would be very helpful.
I also tried it with Red Hat 7.5, but that failed too.

@zoobab
Contributor

zoobab commented Jan 30, 2019

Here is my ad hoc YAML playbook, openshift-ansible/playbooks/adhoc/bootstrap-centos.yaml, for CentOS 7; feel free to adapt it as you wish:

# Notes: tested against CentOS 7. Some parts are specific (pip and j2cli), and the public-hostname part works on AWS and OpenStack.

---
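# As a host pattern, "OSEv3:children" means "OSEv3 or a group named children"; the OSEv3 group already covers every host.
- hosts: OSEv3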
  gather_facts: False
  become: yes
  tasks:
  - name: Wait that the machines are reachable
    wait_for_connection:
      timeout: 300
  - name: Add Epel repo
    copy:
      dest: "/etc/yum.repos.d/epel.repo"
      content: |
        [epel]
        name=epel
        baseurl=http://dl.fedoraproject.org/pub/epel/7/x86_64/
        gpgcheck=0
  - name: Run yum update
    yum: name=* state=latest update_cache=yes
  - name: Install old version of pip
    yum:
      name: python-pip
  - name: Install the latest version of pip
    pip:
      name: pip
      extra_args: --upgrade
  - name: Install the j2cli via pip
    pip:
      name: j2cli
  - name: Install required packages (docker, curl, httpd-tools, etc...)
    yum:
      name: "{{ packages }}"
    vars:
      packages:
      - wget
      - git
      - net-tools
      - bind-utils
      - iptables-services
      - bridge-utils
      - bash-completion
      - kexec-tools
      - sos
      - psacct
      - jq
      - docker-1.13.1
      - skopeo
      - python-docker-py
      - openvswitch
      - awscli
      - NetworkManager
      - unzip
      - vim
      - python-virtualenv
      - gcc
      - httpd-tools
  - name: Systemd enable NetworkManager
    systemd:
      name: NetworkManager
      enabled: yes
      masked: no
  - name: Systemd enable Docker
    systemd:
      name: docker
      enabled: yes
  - name: Create directory
    file:
      path: /etc/systemd/system/docker.service.d
      state: directory
      owner: root
      group: root
  - name: Restart Docker
    systemd:
      state: restarted
      daemon_reload: yes
      name: docker
  - name: Get this instance public hostname
    get_url:
      url: http://169.254.169.254/latest/meta-data/public-hostname
      dest: /tmp/public-hostname
  - name: Get this instance public hostname bis
    command: cat /tmp/public-hostname
    register: myhostname
  - debug:
      msg: "Public Hostname {{ myhostname.stdout }}"
  - name: Set the hostname
    shell: hostnamectl set-hostname {{ myhostname.stdout }}
  - name: Rebooting
    command: /sbin/shutdown -r +1 "Ansible-triggered Reboot"
    async: 0
    poll: 0
  - name: Wait for server to come back
    wait_for_connection:
      delay: 120
  - name: Check NetworkManager
    systemd:
      state: started
      name: NetworkManager
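
The last three tasks above (shutdown, wait, check) could alternatively be sketched with Ansible's built-in reboot module, which handles the wait for reconnection itself. This assumes Ansible 2.7 or later, where that module was introduced; the 600-second timeout is an arbitrary illustrative value, not one taken from the playbook above.

```yaml
---
# Sketch only: requires Ansible >= 2.7 for the "reboot" module.
- hosts: OSEv3
  gather_facts: False
  become: yes
  tasks:
  - name: Reboot and wait for the host to come back
    reboot:
      msg: "Ansible-triggered Reboot"
      reboot_timeout: 600   # assumed value; tune for your environment
  - name: Check NetworkManager
    systemd:
      name: NetworkManager
      state: started
```

On older Ansible versions, the shutdown plus wait_for_connection pair from the original playbook remains the way to do this.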

dwmkerr added a commit that referenced this issue Mar 6, 2019
@dwmkerr dwmkerr mentioned this issue Mar 6, 2019
@dwmkerr
Owner

dwmkerr commented Mar 6, 2019

Hi all, thanks for the comments! I've closed this issue as OKD 3.11 is working fine now, but created a new, more specific issue to track the CentOS 7.5 challenges (#79).

welshstew pushed a commit to Purple-Sky-Pirates/terraform-aws-openshift that referenced this issue May 16, 2020