Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix case when switching license from any not empty non SLES_BYOS to SLES_BYOS without cache file #138

Open
wants to merge 16 commits into
base: master
Choose a base branch
from

Conversation

jesusbv
Copy link
Contributor

@jesusbv jesusbv commented Dec 20, 2023

When switching from SLES to BYOS and no cache file exists, the license changes to BYOS and no other action is done, instance/system is still registered

This change fixes that scenario

@jesusbv jesusbv self-assigned this Dec 20, 2023
@smarlowucf
Copy link

I don't know the code enough to give a good review here. Also, is it intended to be s/SLES to BYOS/PAYG to BYOS?

@jesusbv
Copy link
Contributor Author

jesusbv commented Dec 20, 2023

Also, is it intended to be s/SLES to BYOS/PAYG to BYOS?

Well, it is a license change from SLES to SLES_BYOS
I will edit the title of the PR to clarify it

@jesusbv jesusbv changed the title Fix case when switching from SLES to BYOS without cache file Fix case when switching license from any not empty non SLES_BYOS to SLES_BYOS without cache file Dec 20, 2023
@jgleissner
Copy link

I'm trying to understand what the license cache is. So the pre-PR version simply updates the cache when the file is missing, which is kinda what you expect when it's trying to access a non-existent cache. But it didn't return anything in that case, from a function that is clearly supposed to, so that seems off. Plus it sets the license type to what it's supposed to compare it with, which also seems off.

It seems to me that the license cache isn't actually a cache (== stored data that is time-consuming to compute or retrieve) but actually the registration information (I vaguely recall we've had this conversation before). So that would mean, missing cache means it's assumed the system is not registered?

@jesusbv
Copy link
Contributor Author

jesusbv commented Dec 21, 2023

I'm trying to understand what the license cache is. So the pre-PR version simply updates the cache when the file is missing, which is kinda what you expect when it's trying to access a non-existent cache. But it didn't return anything in that case, from a function that is clearly supposed to, so that seems off. Plus it sets the license type to what it's supposed to compare it with, which also seems off.

So, pre-PR, the cache gets checked and if the file didn't exist, it gets created and the check method returned None
meaning license has not changed, whenever the check returned True cache file gets updated outside the check

This leads to an scenario where switching from SLES (or any PAYG license) to SLES_BYOS and no cache file, the instance is registered against the infrastructure and benefiting from that while being BYOS

It seems to me that the license cache isn't actually a cache (== stored data that is time-consuming to compute or retrieve) but actually the registration information (I vaguely recall we've had this conversation before).

I'm not sure I understand, the cache file stores the license value only

So that would mean, missing cache means it's assumed the system is not registered?

Missing the cache or not missing the cache does not affect registration per se, it affects registration when switching licenses. For example, if the instance is registered and license switched to SLES_BYOS (and cache had SLES for the sake of the example) that check and code would take care of de-registering the instance

Another approach to fix this issue is to check for BYOS license in the license check and de-register + disable the service

except FileNotFoundError:
   if 'BYOS' in license_type:
       run_command(['registercloudguest', '--clean'])
       run_command(['systemctl', 'disable', 'guestregister'])
   update_license_cache(license_type)
   # Notice we would keep returning None

This would put some responsibility into has_license_changed which IMHO does not belong
but we may be open to that trade-off to fix the scenario of "BYOS benefiting from infra", again
this would add a side-effect into the check license code where one would expect the method to
check the license or do things license related, not to do things like de-registering the instance for example

@jgleissner
Copy link

A function called has_license_changed should not return None, but only True, False, or throw an exception if it can't tell.

Generally how I would expect it to work is to populate the cache when trying to access data in the cache that is not there. So the question is, can it figure out the current license type from the active registration (not by re-registering)? If not, it's not actually a cache.

When switching from SLES to BYOS and no cache file exists,
the license changes to BYOS and no other action is done,
instance/system is still registered

- Update version for addon-azure

This change fixes that scenario
Copy link

@jgleissner jgleissner left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. I'd still suggest moving the stored license type into /var/lib and not calling it cache, as it really persists the last known license type and rather than caching metadata, which got me confused in the first place.

usr/sbin/regionsrv-enabler-azure Outdated Show resolved Hide resolved
usr/sbin/regionsrv-enabler-azure Outdated Show resolved Hide resolved
usr/sbin/regionsrv-enabler-azure Outdated Show resolved Hide resolved
usr/sbin/regionsrv-enabler-azure Outdated Show resolved Hide resolved
usr/sbin/regionsrv-enabler-azure Outdated Show resolved Hide resolved
Return False when no cache file
If it BYOS and we do not know for sure license type has changed
then check if system is registered
usr/sbin/regionsrv-enabler-azure Show resolved Hide resolved
usr/sbin/regionsrv-enabler-azure Outdated Show resolved Hide resolved
usr/sbin/regionsrv-enabler-azure Outdated Show resolved Hide resolved
Add BYOS unregistered cases
Add license type from IMDS starting value info
@jesusbv jesusbv requested a review from rjschwei January 3, 2024 10:21
usr/sbin/regionsrv-enabler-azure Show resolved Hide resolved

BYOS-2 SLES N/A Do nothing
BYOS-2 SLES BYOS Re-register
BYOS-2 BYOS-1 N/A De-register
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What are the actions taken by the user here? I do not understand this state transition in the context of this code. When a BYOS instance is registered, either to SCC or the update infrastructure then in order to transition from the registered state to the unregistered state BYOS-2 to BYOS-1 the user has to run registercloudguest --clean meaning this code when it runs again will find a cache and no registration.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

True that the user has to run registercloudguest --clean command, but the code would check if license is BYOS,
which it would, it would check if it uses RMT as SCC proxy, which it wouldn't, then re-run the same command and
run the command to disable guestregister, no cache would be found on that scenario

If it did run again, after that scenario, yes there would be a cache saying SLES_BYOS on an unregistered system

usr/sbin/regionsrv-enabler-azure Outdated Show resolved Hide resolved
usr/sbin/regionsrv-enabler-azure Outdated Show resolved Hide resolved
usr/sbin/regionsrv-enabler-azure Outdated Show resolved Hide resolved
usr/sbin/regionsrv-enabler-azure Outdated Show resolved Hide resolved
):
# either license has changed or
# system was previously registered to update infrastructure
if license_has_changed:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This condition is tested twice in consecutive conditionals, on line 138 and then again here. Can be changed to (starting at line 137)

if has_license_changed(license_type):
    update_license_cache(license_type)
    if update_server and utils.is_registered(update_server.get_FQDN()):
        run_command(['registercloudguest', '--clean'])
        run_command(['systemctl', 'disable', service_name])

to avoid checking the same condition in consecutive instructions and make the flow a little easier to follow.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should not do that, the de-registration must happen even if license hasnt changed
I will think a better way to check the conditions

Copy link
Contributor Author

@jesusbv jesusbv Jan 5, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you meant outside the if of the license change

if has_license_changed(license_type):
    update_license_cache(license_type)
if update_server and utils.is_registered(update_server.get_FQDN()):
    run_command(['registercloudguest', '--clean'])
    run_command(['systemctl', 'disable', service_name])

@jesusbv jesusbv requested a review from rjschwei January 12, 2024 11:42
PAYG PAYG PAYG Do nothing

BYOS-1 -> BYOS unregistered
BYOS-2 -> BYOS registered
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We need to consider the a BYOS-3 case where BYOS-3 is defined as registered to SCC and we need to handle that transition for example

BYOS-3 PAYG No dergister from SCC register with update infrastructure.

is one of the state transitions we need to handle.

It may also be a good idea to number the state transitions and then in the implementation reference the numbers the various if conditions handle.

Let's focus on getting the state transitions captured and then take another look at the implementation.

- Move START definition to the top of the table
- Number table entries
- Add comments to reference the entries on the code
Copy link
Contributor

@rjschwei rjschwei left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We need another iteration on the state transition table. Thanks for your patience.

3 BYOS-1 BYOS-1 No Do nothing
4 BYOS-1 BYOS-1 BYOS Do nothing
5 BYOS-1 BYOS-2 No Do nothing
6 BYOS-1 BYOS-2 BYOS Do nothing
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Case 5 & 6 collapse to cases 3 and 4 because the customer cannot change the licence via AHB to a registered state. The action is always

  • Change license to BYOS, this is always BYOS-1 to BYOS-1
  • Register, this would then result in BYOS-2 or BYOS-3

Given the table we have this is hard to express. At the same time adding more columns such as "Additional user action" is not conducive to making it more readable. I suggest we make entries 3 & 4 look like this:

BYOS-1 BYOS-1 (BYOS_2, BYOS-3)* .....

and the add a footnote like so

* The user has manually registered he instance with their registration key

5 BYOS-1 BYOS-2 No Do nothing
6 BYOS-1 BYOS-2 BYOS Do nothing

7 BYOS-2 PAYG No Do nothing
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If the instance is switched to PAYG and the cache is not PAYG we always have to re-register.

7 BYOS-2 PAYG No Do nothing
8 BYOS-2 PAYG BYOS Re-register - Update Infra
9 BYOS-2 BYOS-1 No De-register - Update Infra
10 BYOS-2 BYOS-1 BYOS De-register - Update Infra
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But the customer cannot switch to an unregistered state via AHB. So the condition expressed here really is

  • Customer ran registercloudguest --clean
  • Customer used AHB to explicitly set the BYOS license

As such there is nothing to do. If the starting state is BYOS, any form not registered or registered to the update infra or SCC and the cache says BYOS we have nothing to do. If there is no cache we also do nothing because we cannot do anyting.

13 PAYG BYOS-1 No De-register - Update Infra
14 PAYG BYOS-1 PAYG De-register - Update Infra
15 PAYG BYOS-2,3 No Do nothing
16 PAYG BYOS-2,3 PAYG Do nothing
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here too, the customer cannot change the license state to BYOS-2 or BYOS-3. Since we have trouble with this I think the column should be renamed from LICENSE CHANGED TO to Reg State at runtime this would indicate that we start with a given state and then detect the state during runtime of the code. And the state is set by the user as a combination of running AHB commands and registration commands. This should get us out of the quandary we are in. Let's try that and see what that looks like.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants