Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

1.12.62: too strict check for feature template existence #111

Open
3 tasks
jouvin opened this issue Jul 14, 2018 · 9 comments
Open
3 tasks

1.12.62: too strict check for feature template existence #111

jouvin opened this issue Jul 14, 2018 · 9 comments

Comments

@jouvin
Copy link
Contributor

jouvin commented Jul 14, 2018

aq bind feature and aq manage , when they act on hosts that are part of a domain (through the archetype/personality selected in the case of aq bind feature), check that the config template for the feature exists in the selected archetype directory (in the Git repo associated with the domain). The goal of this check is to avoid breaking the configuration of the domain hosts with this archetype by requiring something that doesn't exist.

There are 2 identified drawbacks to this strict check

  • You may want to use archetypes to define host categories that may share some features. In this case, the natural way of doing it is to create a feature namespace/directory at the same level as archetype namespaces/directories. Despite the feature template does exist in the domain Git repo, the command will fail. Addressing this is probably trivial: it should be enough to look for the config template first in the archetype features directory, then in the features directory at the root of the Git repo. We need to understand if this can be the default behaviour or if it needs to be bound to a config option. If this is the default behaviour, do we need an option to enforce that we want the config template to be in only one of the locations (not both) to avoid overloading one template definition by the other accidently.
  • The second problem is related to the template library (TL) usage. The TL is normally part of the plenary templates and thus is not visible from the domain Git repo. But the TL may provide some features that a site would like to use directly rather than adding a site-specific feature that just includes the TL feature template. There is no trivial fix for this IMO (using the .dep files to guess what is the TL version/path used by each node of the archetype may be very cumbersome). But it should be left to a site to decide whether it wants a strict check of feature existence (at the price of creating one site feature for each TL feature) or if it agrees to live with a relaxed check that would only display a warning if the feature template is not found it the domain Git repo but not prevent the command execution (at the price of breaking the future host/domain deployment but not at the risk of pushing the wrong configuration as it would not compile).

In addition, there is one issue specific to aq manage:

  • it is discussable whether aq manage should do the check for feature template existence when option --skip_auto_compile is not specified. As it implies a compile of the managed host in its new environment, it will be the definitive check, including when using TL features directly (see previous point). I have the feeling that the check should not be done in this case as the host is not moved to the new domain/sandbox if it cannot be compiled successfully.
jouvin added a commit to jouvin/aquilon that referenced this issue Jul 14, 2018
- Enable sharing of feature definitions between several archetypes

Contributes to quattor#111

Change-Id: Ic15880018c615add765724ff82e24acf337d4912
jouvin added a commit to jouvin/aquilon that referenced this issue Jul 14, 2018
- Enable sharing of feature definitions between several archetypes

Contributes to quattor#111

Change-Id: Ic15880018c615add765724ff82e24acf337d4912
jouvin added a commit to jouvin/aquilon that referenced this issue Jul 14, 2018
When config option relaxed_feature_template_check is true
(default: false), only display a warning if the template existence
cannot be assessed rather than raising an exception. This allows
to support direct use of features defined in the plenary templates
(e.g. features from the template library).

Contributes to quattor#111

Change-Id: I0b3319cd39a2c962da0464c511a2c42de2595407
jouvin added a commit to jouvin/aquilon that referenced this issue Jul 14, 2018
When config option relaxed_feature_template_check is true
(default: false), only display a warning if the template existence
cannot be assessed rather than raising an exception. This allows
to support direct use of features defined in the plenary templates
(e.g. features from the template library).

Contributes to quattor#111

Change-Id: I0b3319cd39a2c962da0464c511a2c42de2595407
@ned21
Copy link
Contributor

ned21 commented Jul 15, 2018

My usual comment here: creating more than one way to do something creates confusion, makes it hard to write working documentation, and complicates answering support queries (because you first have to get the person asking for support to send you their config file). Personalities are composed of features and personalities are namespaced by archetype. Each archetype has its own schema so cross-archetype features are likely to be a very niche use case.

IMHO this should be closed won't fix.

@jouvin
Copy link
Contributor Author

jouvin commented Jul 16, 2018

@ned21 I'd request that this issue is not closed before further discussion. I understand your point of view but we also need to take intot account that having one way for everybody won't work... We need to find the right balance between making the code maintenance not unnecessarily complex, the user support (/document) as easy as possible and the production flexible enough to accomodate different use cases and site configurations.

This is again an issue related to the possibility to use (efficiently) the template library with Aquilon and I consider this a serious issue for the community which has been relying on the TL. TL is installed as part of the plenary templates (see https://www.quattor.org/aquilon/configuration.html#configuring-the-template-library) which means that as soon as you use it you rely on a component that you cannot completely validate as part of your site templates. Even with the current check which tries to prevent an inconsistent configuration you can add a feature to an archetype/personality successfully with site features that are just including features from the TL and later one delete the TL version from the plenary (because you thought it was no longer used) and break things. This is why I propose to let as a site decision whether you want strict checks (as MS does today) or if you want a bit more flexibility. Every solution has a price and I don't think this is so difficult to document.

@ned21
Copy link
Contributor

ned21 commented Jul 16, 2018

Even with the current check which tries to prevent an inconsistent configuration you can add a feature to an archetype/personality successfully with site features that are just including features from the TL and later one delete the TL version from the plenary (because you thought it was no longer used) and break things.

I don't follow this logic. Deleting things from source code requires you to test the change, probably via some sort of automated framework. The protection provided by this check is because we learned through years of usage that people will put hosts in sandboxes and bind the feature "for their host" and not realise that they inadvertently broke the domain. Or they moved hosts out of sandboxes into the domain and again broke it.

I realise this is partly a problem of scale where we have many SAs globally distributed so we need more safety buffers, but by turning off this protection you are disregarding the years of experience we built up in making Aquilon easy to use.

Back to the original points as I see now there are actually two distinct issues.

  1. Feature templates being shared between archetypes. As I said above, I think this is a bad idea because it goes against the Aquilon data model.
  2. Disabling the check-for-file existence to allow including features directly from the TL. I can see why you want to do this. I assume the TL appears in the LOADPATH after the archetype so a local feature will always take precedence over the community one. If this was a config option to skip these checks (default = false) then I think it should be marked "dangerous" but it would acceptable to exist for those that want the flexibility of operating without safety nets. :)

@jouvin
Copy link
Contributor Author

jouvin commented Jul 17, 2018

@ned21 I think we are not far from an agreement! As far as I'm concerned, I certainly don't want to disregard the years of experience you put in Aquilon but I'd rather want to thank you for the great product you built. Honnestly, I've been very positively surprised by how good Aquilon is. That said, as you mentioned scale is impacting the balance you choose between safety and flexibility and my point is only that we must accomodate for several choices, based on site decision, making clear what the impact of any choice is. Note that I try not to change the default behavior of anything.

  • On the features shared between archetypes, I don't agree that this is against Aquilon model. Aquilon model for me is that archetypes are a group of personalities. MS, for very good reasons, decided that archetypes must correspond to different kind of HW devices, with different schemas. But other sites may want to use archetypes for categorizing (Linux) hosts based on some use cases they have. I don't think it breaks anything in the Aquilon model. As I mentioned in my first comments and in the PR description, I'm happy to make the change related to shared features optional (which is not yet the case to avoid unnecessary multiplication of options if not needed). I see 2 possibilities and I'd like to get feedback on them:
    • Have an option that enforces that you can have only one feature template, no matter whether it is in the archetype or as a shared feature.
    • Have an option that keeps exactly the current behaviour (feature templates must be in the archetype directory). In this case, we must agree if it can be the same option as the one used for the second change or if we see a use case for not linking the 2 changes in term of configuration options. I've the feeling that your preference would be for this one.
  • TL features: yes the TL is at the end of the loadpath, as anything coming from the plenary templates. I am not sure we must flag it as dangerous but we should make clear what the consequence is. This is why, in the implementation I proposed, when the relax checks are enabled the checks are still done and if they fail, there is a warning displayed to the user saying that the broker cannot check the consistency of the configuration. Have you looked at it?

BTW, after doing the implementation, I think it is good to keep the check in manage command, conversely to what I suggested initially. It is harmless and it is insisting that there is a potential risk.

jouvin added a commit to jouvin/aquilon that referenced this issue Jul 17, 2018
- Enable sharing of feature definitions between several archetypes

Contributes to quattor#111

Change-Id: Ic15880018c615add765724ff82e24acf337d4912
jouvin added a commit to jouvin/aquilon that referenced this issue Jul 17, 2018
When config option relaxed_feature_template_check is true
(default: false), only display a warning if the template existence
cannot be assessed rather than raising an exception. This allows
to support direct use of features defined in the plenary templates
(e.g. features from the template library).

Contributes to quattor#111

Change-Id: I0b3319cd39a2c962da0464c511a2c42de2595407
@jrha
Copy link
Member

jrha commented Jul 24, 2018

FWIW at RAL we use archetypes for separating hosts that are managed by different organisational structures, we have some desire to eventually move away from this, but nothing will happen in the short to medium term.

Archetypes work really well for us in this use case, allowing us to share a broker and the associated infrastructure with groups who have different ways of working and whom we have no managerial authority over, allowing them to manage their template library versions, schema changes, local forks etc.

We do however like to avoid duplicating effort, so we have a fake archetype called shared where we maintain shared templates, we extended the check to always look for the shared area:

@@ -122,8 +122,11 @@ def check_feature_template(config, dbarchetype, dbfeature, dbdomain):
     # The broker has no control over the extension used, so we check for
     # everything panc accepts
     for ext in ('pan', 'tpl'):
-        if os.path.exists("%s/%s/%s/config.%s" % (basedir, dbarchetype.name,
-                                                  dbfeature.cfg_path, ext)):
+        if os.path.exists("%s/%s/%s/config.%s" % (basedir, dbarchetype.name, dbfeature.cfg_path, ext)):
+            return
+
+        # STFC shared archetype
+        if os.path.exists("%s/shared/%s/config.%s" % (basedir, dbfeature.cfg_path, ext)):
             return
 
         # Legacy path for hardware features

I'm not sure if supporting this would be a good idea for the upstream broker code, if it were then shared should probably be a reserved name.

I guess something similar to this could work for the template library, but we never use bare features from the library, we always wrap them in a local feature template, this allows us to customise them more easily (e.g. setting global variables before the include) and lets people know that they are available for use, even though we do sometimes trip over naming conflicts.

@jouvin
Copy link
Contributor Author

jouvin commented Jul 24, 2018

I am not sure I completely share the view that a sandbox should contain only archetypes... For example, you may want to add HW definitions (in hardware namespace) and they are clearly not archetype-related. For me, archetypes should have been inside an archetypes directory (with the broker defining the appropriate loadpath). The drawback to your approach is that it requires adding something to the loadpath this probably just makes life more complex... At least this is my opinion!

Anyway, I don't really care at the end: what is important, IMO, is that we support (optionally at least) the possibility of features shared between archetypes. If there is a wider consensus on the RAL approach, I can live with it! May be the option to enable this feature could be the name of the directory used for the shared information rather than a boolean.

@jrha
Copy link
Member

jrha commented Jul 24, 2018

For us at least, adding things to the LOADPATH was desirable because it is important that archetypes opt-in to the shared area. Most of our sharing of work is more grass-roots than top-down.

@jouvin
Copy link
Contributor Author

jouvin commented Jul 24, 2018

@jrha I understand the potential benefit of your approach but in this case it is not possible to implement a reliable check of template existence (the original issue discussed here!) as you need to know the loadpath set in the context of the archetype. So if you have an archetype that doesn't use the shared templates and you are running your modified broker, the missing template will not be detected by the broker (as it will check the shared area that will not actually be used). In this sense, you are advocating for relaxed checks!!

@jrha
Copy link
Member

jrha commented Jul 25, 2018

Sure, though ideally we wouldn't need to use archetypes in this way.

jouvin added a commit to jouvin/aquilon that referenced this issue Aug 16, 2018
- Enable sharing of feature definitions between several archetypes

Contributes to quattor#111

Change-Id: Ic15880018c615add765724ff82e24acf337d4912
jouvin added a commit to jouvin/aquilon that referenced this issue Aug 16, 2018
When config option relaxed_feature_template_check is true
(default: false), only display a warning if the template existence
cannot be assessed rather than raising an exception. This allows
to support direct use of features defined in the plenary templates
(e.g. features from the template library).

Contributes to quattor#111

Change-Id: I0b3319cd39a2c962da0464c511a2c42de2595407
jouvin added a commit to jouvin/aquilon that referenced this issue Oct 22, 2018
- Enable sharing of feature definitions between several archetypes

Contributes to quattor#111

Change-Id: Ic15880018c615add765724ff82e24acf337d4912
jouvin added a commit to jouvin/aquilon that referenced this issue Oct 22, 2018
When config option relaxed_feature_template_check is true
(default: false), only display a warning if the template existence
cannot be assessed rather than raising an exception. This allows
to support direct use of features defined in the plenary templates
(e.g. features from the template library).

Contributes to quattor#111

Change-Id: I0b3319cd39a2c962da0464c511a2c42de2595407
jouvin added a commit to jouvin/aquilon that referenced this issue Nov 2, 2018
- Enable sharing of feature definitions between several archetypes

Contributes to quattor#111

Change-Id: Ic15880018c615add765724ff82e24acf337d4912
jouvin added a commit to jouvin/aquilon that referenced this issue Nov 2, 2018
When config option relaxed_feature_template_check is true
(default: false), only display a warning if the template existence
cannot be assessed rather than raising an exception. This allows
to support direct use of features defined in the plenary templates
(e.g. features from the template library).

Contributes to quattor#111

Change-Id: I0b3319cd39a2c962da0464c511a2c42de2595407
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Development

No branches or pull requests

3 participants