Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve error checking, StaticFW doesn't pass vasp input set params correctly(?) #295

Closed
mkhorton opened this issue Jun 10, 2019 · 29 comments

Comments

@mkhorton
Copy link
Contributor

Am I reading this incorrectly, or does this just not work for custom input set params? The keyword arg vasp_input_set_params should be vasp_input_params?

vasp_input_set_params=vasp_input_set_params))

optional_params = ["vasp_input_params"]

Just lost some CPU time because of this. I think how we pass around keyword args really needs to be re-thought, it's very brittle and doesn't scale well when you have a lot of nested classes.

At the very least we need more error checking, it's a shame my FW made it all the way to being computed before I realized. I imagine it may be possible to use this option and never realize your settings weren't picked up properly.

Will have to have a read through to see if other FWs are affected.

@bocklund
Copy link
Contributor

bocklund commented Jun 10, 2019

I just did a quick grep and it seems like this is in StaticInterpolateFW and LcalcpolFW corresponding to WriteVaspFromIOSet and WriteVaspFromIOSetFromInterpolatedPOSCAR.

This kind of thing would make for a good test and is a good example of how using atomate to make test files for atomate can be dangerous.

Maybe the correct solution for this is for FiretaskBase to raise if any keyword arguments to the initializer of the subclass are not in required_params or optional_params

@computron
Copy link
Contributor

computron commented Jun 11, 2019

Hi all

  1. The bug seems real - sorry for the problem. The fix should be simple enough

  2. As @bocklund suggested we probably want to just check in FireTaskBase. Actually, FireTaskBase already checks to make sure that anything in required_params is present. We can easily add a check to make sure that all of the keyword arguments are in either required_params or optional_params. However, previously we let people be more lenient as to whether they felt like filling out the required/optional_params or not. Perhaps we want to check as follows:

  • If someone fills out neither required_params nor optional_params, FiretaskBase doesn't check anything
  • If someone only fills out required_params, we will only check to make sure all the required_params are in place.
  • If someone fills out optional_params (including an empty list for optional_params), then we will make sure that both (i) all required params, if specified, are present AND (ii) no keyword parameters are specified that are in neither required_params nor optional_params.

Thus, if you take the time to fill out optional_params, then that is actually used as a check. But there might be reasons you don't want to list out all the parameters explicitly - e.g., if you want to pass in **kwargs style things to a Firetask which gets passed on to some object in the Firetask. Here a FWS developer would not necessarily want to write out a list of every possible kwarg. So they'd have the option of not filling out optional_params.

Thoughts?

@mkhorton
Copy link
Contributor Author

e.g., if you want to pass in **kwargs style things to a Firetask which gets passed on to some object in the Firetask. Here a FWS developer would not necessarily want to write out a list of every possible kwarg.

I think that's part of the problem here.

I think allowing flexibility in FireTaskBase in general is good, but in core atomate specifically can bite us. We could be more strict here.

The calculations I was running this weekend I had to cross-reference with allowed kwargs from DictSet regardless, and this is very error prone (I missed a plural in the first instance, and no warning was raised). As part of #290, I'm still thinking about ways we can add additional checks ... maybe some kind of validate_dictionary_keys convenience function. For example, INCAR can only take a set of known keys, vasp_input_set_params can only take a set of known keys, we could validate both -- or at least warn if the key isn't known, while still allowing flexibility for power users.

@bocklund
Copy link
Contributor

If someone fills out neither required_params nor optional_params, FiretaskBase doesn't check anything
If someone only fills out required_params, we will only check to make sure all the required_params are in place.
If someone fills out optional_params (including an empty list for optional_params), then we will make sure that both (i) all required params, if specified, are present AND (ii) no keyword parameters are specified that are in neither required_params nor optional_params.

I agree with the second two points, but not the first. If required parameters don’t need to be specified, doesn’t that mean they aren’t actually required, but optional? Required parameters should be required and optional should be optional? It should be super simple and obvious to the user - no magic:

  1. Raise if any required parameters are not specified
  2. Raise if any kwargs are not in Union(required parameters, optional parameters)

#2 would catch this issue.

@computron
Copy link
Contributor

If required parameters don’t need to be specified, doesn’t that mean they aren’t actually required, but optional?

No, but I think you are getting at something similar.

I am saying that required_params itself does not need to be specified by the person who writes the FireTask. Meaning you as a developer can write a FireTask and not bother with specifying what are the required_params or optional_params for that FireTask (required_params itself is not required). But, if you do bother to write required_params, then of course those parameters are required. And, if you do bother to write optional_params, we will additionally check to make sure that no "extraneous" parameters that you did not specify are there.

@bocklund
Copy link
Contributor

bocklund commented Jun 11, 2019

I see what you mean now. I think it's reasonable to ask that Firetask developers be explicit about the Firetask's API in the form of required and optional parameters. It's a little extra effort when writing the Firetask, but that extra effort will be worth it if the Firetask will be relatively unchanged (as many are now) since they will be read by users (Firework developers) many more times than they were written.

@computron
Copy link
Contributor

I already wrote why optional parameters should not be required, please read my explanation from before. Matt seems to get it. Note that there is a difference between requiring something in FireWorks (requires careful thought) versus having a convention in atomate.

@shyuep
Copy link
Collaborator

shyuep commented Jun 11, 2019

I hate to jump into the fray here, but in my opinion, everything should as far as possible be required. The fewer opportunities there are for people to be lazy, the more robust the code. Re @computron about people wanting to pass kwargs to objects within FireTasks/Works, there has to be some sane guidelines on what should be allowed.

My suggested guidelines are pretty simple: if the called method has pretty much the same purpose as an existing method, or is a subclass, kwargs passthrough is reasonable. E.g., if you are writing a PDPlotter with a plot method, then passing through **kwargs to matplotlib.pyplot.plot is fine. Otherwise, the preference should be to use a proper dict arg/kwarg. E.g., allowing passthrough from FireTask to VaspInputSets would be a bad idea (this does not seem to be the case in atomate, and something like WriteVaspInputTask actually has a proper vasp_input_params). kwarg passthrough in atomate's FireWorks basically goes to the parent FireWork superclass, which is fine. There can be only one kwarg passthrough and no other objects are allowed to use it. So kwarg passthrough should be used very judiciously.

In short, I would strongly discourage use of kwargs passthrough in most cases. In which case, the default should be to require everyone to specify all required and optional parameters unless there is great reason not to do so. You should also specify a rule that if a variable is named something in the core FireTasks, that parameter naming should be carried through. E.g., vasp_input_params should be named that in all FireTasks, rather than having someone name it input_params and another just name it params.

@computron
Copy link
Contributor

  1. I have fixed the immediate issue via: 01558e8 . However, note that different classes in atomate have different conventions for vasp_input_params vs vasp_input_set_params in terms of naming. If we wanted to have consistent names, we could either introduce a backward incompatible refactor to enforce consistency, or include duplicate parameters (e.g., add a "vasp_input_set_params" argument to classes that are currently "vasp_input_params") to allow either/or usage and thus allow the same name to be used across atomate Firetasks. For now, I have done neither and will simply try to enforce consistent names better in atomate_v2.

  2. There is a difference between "guidelines" (i.e. "discouraged in most cases", "should be used judiciously") and enforced behavior. Sure, we can have guidelines that people shouldn't pass kwargs passthrough is only tolerable under certain circumstances. However, according to the proposed changes from others, this would be explicitly prohibited since a developer of a FireTask would be required to explicitly list out every possible optional parameter that might go into a FireTask. I don't think we should be confusing enforced changes (no coloring outside the lines, FireWorks will start throwing errors if you try to do so) versus guidelines (do it only if you have a good reason to). There is no problem from me in terms of guidelines.

I have thus gone my original suggestion. If a developer specifies required_params, FireWorks will enforce that required_params are in place (this was already there before). If a developer specified optional_params for their FireTask, FireWorks will enforce that a user does not put any kwarg parameters (outside required or optional params) when initializing that FireTask.

Note there are good reasons not to require developers to add required_params and optional_params for every single Firetask they write, but rather to simply enforce those parameters if the developer does choose to add those variables to their Firetask. Note that since pretty much every atomate FireTask (probably all of them?) include both of these, in practice every atomate Firetask will be checked.

@computron
Copy link
Contributor

Also, after making the FWs change, looks like @shyuep 's PyTask fails the tests since this is one of the few FireTasks that was actually doing the whole "pass-through" thing - meaning, PyTask allows a user to put in kwargs that are not explicitly listed in required_params or in optional_params. I've opened a separate ticket on FireWorks for this where we can resolve.

@shyuep
Copy link
Collaborator

shyuep commented Jun 11, 2019

@computron I think we can have an Atomate FiretaskBase class which other atomate firetasks must subclass. The default would be to enforce all parameters, unless someone turn on a flag called no_arg_check. Opting out of arg checking should require a cognitive effort, not simple laziness.

@bocklund
Copy link
Contributor

If it doesn't make sense to be more strict in FireWorks, I'm on board with atomate having a more strict FiretaskBase in atomate v2. That would be a particularly nice feature since we can catch bugs at the time the Fireworks and workflows are instantiated, rather than at runtime. Anything we can do that direction would be valuable.

@computron
Copy link
Contributor

I don't think it makes sense to introduce a new FireTaskBase in atomate, that is just silly. It creates yet another thing for people to learn about - anyone outside this group of people will be utterly confused as to why atomate FireTasks should be subclassing some other base class than the normal one.

If you really want to police this, I would suggest just adding a unit test that goes through all the FireTasks and makes sure required_params and optional_params are set. Then if someone adds a new FireTask that doesn't have these set, the test will fail and we'll be notified.

Note that even if we did set up a different FireTaskBase, e.g. AtomateFireTaskBase (which I think there are many reasons not to), there'd be no automated way to detect that someone subclassed the original FireTaskBase by mistake.

@computron
Copy link
Contributor

Note that even if we did set up a different FireTaskBase, e.g. AtomateFireTaskBase (which I think there are many reasons not to), there'd be no automated way to detect that someone subclassed the original FireTaskBase by mistake.

(unless you also did an automated test to make sure all FireTasks subclassed AtomateFireTaskBase - which at that point, you might as well just have that test check whether required_params and optional_params are present instead of checking the class that it is subclassing)

@bocklund
Copy link
Contributor

The problem with just trying to test it is that it's hard to test the entire surface of arguments that every Firework tries to send to a Firetask. A plain test is fine, but what about a Workflow that has control flow that sets an option in a Firework that gets propagated to a Firetask (the from_prev is a simple example). How do you reliably test that every possible way a Firetask can be instantiated is made with correct input? I think you cannot and having a check in FiretaskBase.__init__ precludes the need to test the entire surface of inputs

All of atomate has 66 classes that inherit from FIretaskBase. They do not get added very frequently and since all things should go through PRs, it shouldn't be a problem to check it by hand, although (as you said) automated testing is possible.

@bocklund
Copy link
Contributor

It should be impossible to write

class DoSomethingFW(Firework):
    def __init__(...):
        # do stuff
        tasks = []
        ...
        tasks.append(MyCustomFiretask(..., bad_input_arg=foo))
        super(DoSomethingFW, self).__init___(...)

If bad_input_arg is there, or if any required_params are missing, it needs to fail on the line tasks.append(MyCustomFiretask(..., bad_input_arg=foo)) and I should not have to run the Firework to find out.

But yeah, it does seem silly to have AtomateFiretaskBase, which is why I think the behavior belongs in FireWorks to begin with

@computron
Copy link
Contributor

@bocklund That is already the case with the FWS changes I just pushed....

Look at the recent changes I pushed FWS recently and have been talking about since several comments up.

I have thus gone my original suggestion. If a developer specifies required_params, FireWorks will enforce that required_params are in place (this was already there before). If a developer specified optional_params for their FireTask, FireWorks will enforce that a user does not put any kwarg parameters (outside required or optional params) when initializing that FireTask.

@bocklund
Copy link
Contributor

Sorry I missed that, thanks for highlighting it

@shyuep
Copy link
Collaborator

shyuep commented Jun 11, 2019

@computron I don't think it is silly to have a separate AtomateFireTaskBase that subclasses FireTaskBase and just adds a level of check. This is in fact frequently done in many other classes. The requirements of FireWorks and Atomate are different. Your solution of writing a test to check that all required/optional args are specified is impossible to implement (go ahead and tell me how you are going to inspect that a FireWork is calling a FireTask with the right parameters within a unittest framework, other than having FireWorks itself do the policing). It is also inflexible since exceptions (e.g., PyTask) has to be hard-coded into the test. My suggested solution merely forces the error to come up during runtime (whether in unittest or in normal use) when wrong args/kwargs are passed to an AtomateFireTask, like practically all methods where you supply garbage args/kwargs. e.g., if you supply a c=3 kwarg to a method defined as sum(a, b). The inherent problem with FireWorks has always been that the args/kwargs are so flexible that anything can go in without any checks (other than the metaclass required/optional params that were implemented by me during the early days to enforce some semblance of checking).

Anyway, I have said my piece.

@computron
Copy link
Contributor

I think you are misunderstanding.

Your solution of writing a test to check that all required/optional args are specified is impossible to implement

My suggestion was just to make sure that required_params and optional_params are defined (not None). It is up to the developer to make sure that those parameters are correct.

It is also inflexible since exceptions (e.g., PyTask) has to be hard-coded into the test.

This is true, but it's not much different than having to manually review which FireTasks must subclass AtomateFireTaskBase and which ones are OK to subclass FireTaskBase. In my suggestions, at least those decisions are at least encoded in a unit test. i.e. we enforce that all FireTasks in atomate must include checks on required_params and optional_params, unless we make a real exception. This is in-line with your own suggestion to only allow flexible optional_params on a very-case-by-case basis. Note that in the current atomate, I don't think a single exception is needed. So it's not like the unit test is going to be littered with exceptions.

My suggested solution merely forces the error to come up during runtime (whether in unittest or in normal use) when wrong args/kwargs are passed to an AtomateFireTask, like practically all methods where you supply garbage args/kwargs. e.g., if you supply a c=3 kwarg to a method defined as sum(a, b).

See my message to Brandon. This is already the case in the recent changes I pushed to FireWorks. As long as the user supplies required_params and/or optional_params, garbage arguments will throw an error at Runtime. No unit test is needed. For example, you can try to pull the most recent Fireworks (bleeding edge master) and just try to instantiate any atomate Firetask with garbage arguments. The code will fail immediately. No unit test is needed. The only Firetasks that are exceptions to this "fail immediately" behavior are the ones where the person who writes the FireTask does not supply required_params or optional_params. In those cases, the strict checking is skipped. For example, PyTask supplies required_params but not optional_params. FireWorks will thus make sure all required_params are present, but since there is no optional_params, it will not check for extraneous params (since PyTask wants to allow any parameter through its auto_kwarg argument). In short, just check the most recent FireWorks, it already has this.

@computron
Copy link
Contributor

computron commented Jun 11, 2019

Also, since it seems multiple people seem to have missed my comment made on FireWorks changes, I'll explicitly paste the new unit test that it's in FireWorks now that does parameter checking:

    def test_param_checks(self):

        class DummyTask(FiretaskBase):
            _fw_name = "DummyTask"
            required_params = ["param1"]
            optional_params = ["param2"]

        self.assertRaises(RuntimeError, DummyTask, param2=3)  # missing required param
        self.assertRaises(RuntimeError, DummyTask, param1=3, param3=5)  # extraneous param
        DummyTask(param1=1)  # OK
        DummyTask(param1=1, param2=1)  # OK

Hopefully this will avoid any further questions / confusion on the same topic.

@shyuep
Copy link
Collaborator

shyuep commented Jun 11, 2019

@computron I did understand your changes. The fact is that it requires someone to have actually specified required and optional parameters for there to have any checks. My suggestion is a minor mod that even if someone did not supply required and optional params, garbage arguments are not allowed by default in Atomate. This is the subtle difference. In Fireworks, this would be allowed.

@computron
Copy link
Contributor

We both want atomate FireTasks to include basic argument checking to help prevent problems like what Matt saw.

You want to have a separate base class that is almost the same as FireTaskBase, but in which the default required_params and optional_params are essentially both [] (currently, they are None, which means unspecified). e.g., in the absence of no other information, assume that no parameters at all are allowed. And then also to add another variable which would toggle off the parameter checking if needed.

What I want is for everyone in atomate to specify required_params and optional_params unless they have a good reason not to. So if no parameters are allowed, one must explicitly set required_params and optional_params to [] to make the clear to everyone instead of allowing someone to neglect putting those parameters. This all works without requiring a new base class or a new parameter.

In your case, to enforce that people are following the rules, you must somehow make sure they are subclassing AtomateFiretaskBase. Either at PR time or through a unit test.

In my case, to enforce that people are following the rules, I must somehow make sure they are specifying "required_params" and "optional_params". Either at PR time or through a unit test.

Functionally, they are both the same. The difficulty of enforcement seems the same to me. I don't know how to convince you that my way will be more intuitive to most people. But, if you feel strongly about your way then you can open an issue to modify the way that it works. If other developers agree that it is more intuitive then I will switch it.

@bocklund
Copy link
Contributor

bocklund commented Jun 11, 2019

What I want is for everyone in atomate to specify required_params and optional_params unless they have a good reason not to. So if no parameters are allowed, one must explicitly set required_params and optional_params to [] to make the clear to everyone instead of allowing someone to neglect putting those parameters. This all works without requiring a new base class or a new parameter.

It should be possible to write a test that asserts that cls.required_params is not None (same for optional) for all classes in all Firetask-containing atomate modules if issubclass(cls, FiretaskBase)

@shyuep
Copy link
Collaborator

shyuep commented Jun 11, 2019

@computron Functionally they are the same, but the intuition and ease of bad coding are different. Say I am a lazy coder who just wants to write something that will pass all the unit tests. I write BadFireTask and just set required_params and optional_params to []. This will basically allow me to pass any kind of garbage to the FireTask. Atomate's BDFL sees my FireTask and interprets that I have done my job to specify required_params and optional_params (even though both are []) and merges. Yay, everyone is happy, until Horton the Elephant comes along and stumbles on the Who bug above that is nigh impossible to catch otherwise.

In my world, everyone subclasses AtomateFireTaskBase, which is in Atomate itself. I don't have to refer to FireWorks doc to find out what AtomateFireTaskBase is. There is no way I can be lazy and write required_params = [] and optional_params = [] to pass tests, because if something is not in there, it is not allowed. If I instead writes no_arg_check = True, it is immediately a red flag that the BDFL is going to ask - why do you need that special flexibility? Self-documentation of classes dramatically improves because Horton sees a required_params and optional_params. To make sure doubly sure, I write a self-inspecting unittest that checks that all FireTasks that can be imported within the atomate package is a subclass of AtomateFireTaskBase.

Anyway, that's just my world view. It comes from experience that making a choice is always a cognitive load. Whether you opt-in vs opt-out of organ donations, or 401k participation, for example. Studies have time and time shown again that the path of least resistance must push people to the desired behavior.

@bocklund
Copy link
Contributor

bocklund commented Jun 11, 2019

required_params = [] and optional_params = [] will raise an error if any parameters are passed based on my understanding of the FireWorks implementation

@shyuep
Copy link
Collaborator

shyuep commented Jun 11, 2019

@bocklund Ah yes, sorry my mistake. I meant if required and optional args are not specified.

@computron
Copy link
Contributor

Brandon is correct. There is no way you can write required_params = [] and optional_params = [] to pass tests in my solution, since in that case if someone passes in any argument at all to the Firetask the code will throw a RuntimeError.

I am done speaking on this thread, as there is only so many times I can say the same thing ... additional comments will not get any responses from me.

@bocklund
Copy link
Contributor

If they aren't specified, then the default FiretaskBase.required_params = None and FiretaskBase.optional_params = None. So an automated test that introspects all firetasks modules and asserts that all subclasses of FiretaskBase have required and optional params are not None would be sufficient.

I think the solution implemented should make everyone here happy.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants