Differences between podified and appliance #595

Open
3 of 18 tasks
Fryguy opened this issue Jul 28, 2020 · 5 comments

Fryguy (Member) commented Jul 28, 2020

This is a checklist of the things we know do not work in podified, but do work in appliances. If you know of more features, please comment here.

This section describes features that work differently in podified vs. appliances. These differences could mean a less desirable feature set and/or require different documentation.

  • Documentation for enabling SSA (SmartState Analysis) in podified vs. appliances is different. As an example:
    • For VMware SSA in podified, users must create their own container image based on ours: they install the VDDK (following whatever steps are required to agree to VMware's terms and redistribution rules), then publish and test that image for deployment (a rough build sketch follows this list). This PR added a hook to look for the VDDK installation materials in the image: Add simple installation process for the VMware VDDK #666
    • For VMware SSA in appliances, the process is to agree to the terms, install the VDDK in the base appliance we ship, and then use that appliance as a template to deploy in their environment.
  • Documentation for upgrading PostgreSQL versions is different.
    • In podified, the PVC is a shared resource in the CR, so there is a workflow to back up the existing database, bring down the old PostgreSQL pod, clear the PVC, bring up the new PostgreSQL pod, and restore the database (a command-level sketch follows this list).
    • In appliances, the storage is not shared, so the database backup is all that needs to be brought over to the new appliance.
  • Resource limits in podified and appliances are different.
    • In appliances, resource limits are handled by the operating system (swap) and by our internal worker management, which asks workers to restart when they exceed their memory thresholds and stops starting new workers when swap usage is excessive.
    • In podified, resource limits are specified in the CR and follow Kubernetes rules. Within the pod, we have no real idea how much CPU or memory we're using. CPU limits cause throttling of the pod, which is only evident by looking at /sys/fs/cgroup/cpu/cpu.stat; memory limits cause the pod to be OOMKilled via kill -9 with no graceful cleanup (see the cgroup checks after this list). CPU and memory requests will be used for autoscaling when we get there.
  • Log retention for podified is different.
    • In podified, it depends on the cluster-level settings and whether cluster logging has been set up; it's up to the administrator to configure that and verify it's working. Otherwise, the default appears to be that each pod keeps only a current and a previous log: if a pod restarts more than once, the older previous log is lost, and if pods are deleted because a deployment is removed, those logs are lost as well (see the log commands after this list).
    • In appliances, logrotate uniformly keeps logs for a set number of days on each appliance. The only ways we lose logs are if more than that many days have passed since the event we care about, or if the disk fills up.
  • System events / notifications are different in podified compared to appliances. A workaround similar to the log retention one might apply to podified.
    • In podified, system events tied to current pods/deployments/etc. (oc get events) are retained for perhaps 4 hours or more. If an object is removed from the system, its events may disappear sooner. If a node or a storage volume has an issue, there might not be an event indicating that something is wrong; even worse, that event may not live very long, so we could lose post-mortem information (see the events example after this list).
    • In appliances, almost everything goes through journald, so you can get a system-wide view of most things, including events from other failing services, by looking at the journald logs; /var/log likely contains other relevant logs as well. Additionally, these event-like logs should be retained with the same logrotate approach as normal application logging.
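For the VMware SSA item above, here is a rough sketch of what building the custom image could look like, assuming podman and a published base worker image; the registry, image name, tag, and VDDK tarball location are placeholders, not the documented process from #666:

```sh
# Hypothetical layering of the VMware VDDK (obtained from VMware after accepting
# their license terms) on top of our base worker image. All names/paths are placeholders.
cat > Containerfile <<'EOF'
FROM docker.io/manageiq/manageiq-base-worker:latest
COPY VMware-vix-disklib-*.tar.gz /opt/vddk-install/
EOF
podman build -t registry.example.com/manageiq-base-worker:vddk .
podman push registry.example.com/manageiq-base-worker:vddk
# The custom image then has to be referenced from the CR and tested before deployment.
```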
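For the PostgreSQL upgrade item, a minimal sketch of the podified workflow, assuming the database deployment and PVC are both named postgresql, the DB user is root, and the operator recreates the PVC and pod from the CR; none of these names are guaranteed by the operator:

```sh
oc scale deployment/orchestrator --replicas=0                      # quiesce workers (deployment name assumed)
oc exec deployment/postgresql -- pg_dumpall -U root > backup.sql   # back up the existing database
oc scale deployment/postgresql --replicas=0                        # bring down the old pg pod
oc delete pvc/postgresql                                           # clear the shared PVC
# ...update the CR to the new PostgreSQL version and let the operator recreate the PVC/pod...
oc exec -i deployment/postgresql -- psql -U root -d postgres < backup.sql   # restore the database
oc scale deployment/orchestrator --replicas=1
```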
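For the resource limits item, a couple of read-only checks; the pod name is a placeholder, and the path is the cgroup v1 layout named above (cgroup v2 exposes cpu.stat directly under /sys/fs/cgroup):

```sh
# CPU throttling is only visible in the cgroup stats (nr_throttled / throttled_time):
oc exec <worker-pod> -- cat /sys/fs/cgroup/cpu/cpu.stat
# After a memory-limit kill, the container's last state shows OOMKilled:
oc get pod <worker-pod> -o jsonpath='{.status.containerStatuses[0].lastState.terminated.reason}'
```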
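For the log retention item, without cluster logging the only per-container logs available are the current one and the one from the most recent restart:

```sh
oc logs <worker-pod>               # current container log
oc logs <worker-pod> --previous    # log from the last restart only; anything older is gone
```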
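For the events item, the podified view comes down to oc get events (with the retention caveats above), while an appliance gets a system-wide view from journald; evmserverd is the appliance service unit, and the time window is just an example:

```sh
oc get events --sort-by=.lastTimestamp          # podified: short-lived, tied to live objects
journalctl -u evmserverd --since "2 hours ago"  # appliance: system-wide view via journald
```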
@Fryguy Fryguy added the bug label Jul 28, 2020
@chessbyte chessbyte added this to To do in Roadmap Jul 28, 2020
@chessbyte chessbyte added this to the Kasparov milestone Jul 28, 2020
agrare (Member) commented Jul 28, 2020

I believe we had issues with VM remote consoles as well, right @skateman?

skateman (Member) commented:
I don't know, Nick might have sorted them out.

Fryguy (Member, Author) commented Jul 28, 2020

Remote console support was added in #540, but Nick said he hasn't seen it working end to end, since RHV consoles are different and he didn't have the VMware libs. #538 tracks documenting how to install the right libraries.

skateman (Member) commented:
Not sure if we can support HTML5 consoles in RHV any longer; the provider has this option disabled 😕

@gtanzillo gtanzillo self-assigned this Jul 30, 2020
@gtanzillo gtanzillo mentioned this issue Aug 3, 2020
@Fryguy Fryguy removed this from the Kasparov milestone Oct 20, 2020
@chessbyte chessbyte moved this from To do to Backlog in Roadmap Oct 21, 2020
@chessbyte chessbyte assigned Fryguy and unassigned gtanzillo Apr 7, 2021
@jrafanie jrafanie changed the title Things that do not work in pods Differences between podified and appliance Nov 23, 2022
@jrafanie jrafanie self-assigned this Nov 23, 2022
@Fryguy Fryguy moved this from Backlog to In progress in Roadmap Dec 7, 2022
miq-bot (Member) commented Mar 6, 2023

This issue has been automatically marked as stale because it has not been updated for at least 3 months.

If you can still reproduce this issue on the current release or on master, please reply with all of the information you have about it in order to keep the issue open.

Thank you for all your contributions! More information about the ManageIQ triage process can be found in the triage process documentation.

@miq-bot miq-bot added the stale label Mar 6, 2023
@agrare agrare added pinned and removed stale labels Mar 7, 2023
@Fryguy Fryguy moved this from In progress to To do in Roadmap Apr 26, 2023