Skip to content
This repository has been archived by the owner on May 7, 2020. It is now read-only.

Improve unhelpful logging of rule engine #4317

Closed
ThomDietrich opened this issue Sep 21, 2017 · 20 comments
Closed

Improve unhelpful logging of rule engine #4317

ThomDietrich opened this issue Sep 21, 2017 · 20 comments

Comments

@ThomDietrich
Copy link
Contributor

ThomDietrich commented Sep 21, 2017

https://github.com/eclipse/smarthome/blob/a4dea98a1c91a36d4dbe6d5d616879988f176459/bundles/model/org.eclipse.smarthome.model.rule.runtime/src/org/eclipse/smarthome/model/rule/runtime/internal/engine/RuleEngineImpl.java#L341-L345

Hello, the above logging lines are often encountered while DSL rule development. Their minimalistic form can sometimes be confusing or non-helpful. One interesting example is:

Rule 'myRuleTest': java.lang.Number

Also quite common is:

Rule 'myRuleTest': The name '<XCastedExpressionImpl>.blaaa' cannot be resolved to an item or type.

I believe this could easily be improved for the sake of user experience. One simple approach would be to distinguish commonly known error reasons and add constructive feedback to the logging message.

@SJKA I believe you are the right person to trigger in this matter? What do you think?
@martinvw I think we discussed something similar a while back.

@martinvw
Copy link
Contributor

@ThomDietrich yes we should work on this, maybe we can start by collecting some example which lead to those errors and work from there on improving the error handling. If needed we can even pair up around the smarthome day 👍

@sjsf
Copy link
Contributor

sjsf commented Sep 23, 2017

I absolutely agree, it would help a lot if these error messages would be more helpful. However, while this sounds like a trivial task on the first glance, it probably will turn out to be pretty nasty in the end: these error messages are not created by us, but by the xtext parser, lexer and the scoping. Therefore I second @martinvw, we need to collect them (ideally a sample rule + error message) and then have a deeper look individually on what can be done in order to support xtext in that situation to generate better error messages. Sometimes there might have been taken a short-cut so that the scoping is clueless what to expect and we can change it to the better. Sometimes we might be able to add our custom validations with also our own error messages. In any case it's gonna be some effort.
Nevertheless, I'm looking forward to meeting you guys at the Smart Home Day in Ludwigsburg!

@rkoshak
Copy link

rkoshak commented Sep 25, 2017

One common misleading error that comes to mind is, for example, when I pass something to a Number Item that cannot be parsed into a Number I end up getting a NullPointerException (NPE) rather than a ParseException (or whatever would be appropriate). In fact, NPE seems to be the default exception thrown for pretty much all type and parsing problems.

One thing I have noticed is when I wrap my rules in a try/catch myself, I will actually get the ParseException so somewhere along the way that exception is being lost and replaced with a NPE.

I agree with @SJKA though, I suspect the root of the problem is the underlying Rules engine, and not that the developers have just ignored this issue.

One thing that would go a really long way towards more helpful errors would be the line in the Rule that generated the error. I suspect that once the Rule gets parsed that information gets lost, but if not just knowing specifically what line generated the error can be very helpful.

I am not completely up on what is going on with the Experimental Rules Engine but I hope that this sort of thing is being taken into account.

sjsf added a commit to sjsf/smarthome that referenced this issue Sep 26, 2017
...so that sendCommand or postUpdate don't fail with a weird NPE anymore
if e.g. a non-parsable string is attempted to be sent to a number item.

See eclipse-archived#4317 (comment)
Signed-off-by: Simon Kaufmann <simon.kfm@googlemail.com>
@sjsf
Copy link
Contributor

sjsf commented Sep 26, 2017

when I pass something to a Number Item that cannot be parsed into a Number I end up getting a NullPointerException (NPE) rather than a ParseException (or whatever would be appropriate). In fact, NPE seems to be the default exception thrown for pretty much all type and parsing problems.

That's indeed a good example for something which we have in our hands and can easily fix: In
BusEvent.java:142 the TypeParser does its best to turn the given argument into a type which the item accepts. It that doesn't work out, it returns null silently. Then the sending of the event miserably fails because it cannot handle a null state.
As I was there already, I quickly created #4346 for this.

@martinvw
Copy link
Contributor

Great, that we can already make steps!

maggu2810 pushed a commit that referenced this issue Sep 26, 2017
...so that sendCommand or postUpdate don't fail with a weird NPE anymore
if e.g. a non-parsable string is attempted to be sent to a number item.

See #4317 (comment)
Signed-off-by: Simon Kaufmann <simon.kfm@googlemail.com>
@martinvw
Copy link
Contributor

@ThomDietrich do you have the rules belonging to these exception then I can take a quick look as well :-)

@ThomDietrich
Copy link
Contributor Author

ThomDietrich commented Sep 27, 2017

Sure thing.

One thing I need to add because I ran into it while testing: There seems to be a huge difference between the execution of Item-triggered rules and "System started" rules. The below examples will show that. This needs to be addressed as well or we will fight a battle on multiple fronts forever.

1. "Rule 'myRuleTest': java.lang.Number"

Short: Type casting of NULL

Example:

rule "Test"
when
    System started
then
    logInfo("Test", "Item:   " + Test_Item_1.name)
    logInfo("Test", "State:  " + Test_Item_1.state)
    logInfo("Test", "Number: " + (Test_Item_1.state as Number))
    logInfo("Test", "Item:   " + Test_Item_2.name)
    logInfo("Test", "State:  " + Test_Item_2.state)
    logInfo("Test", "Number: " + (Test_Item_2.state as Number))
end

Results in:

2017-09-27 14:45:59.870 [INFO ] [.eclipse.smarthome.model.script.Test] - Item:   Test_Item_1
2017-09-27 14:45:59.875 [INFO ] [.eclipse.smarthome.model.script.Test] - State:  17
2017-09-27 14:45:59.881 [INFO ] [.eclipse.smarthome.model.script.Test] - Number: 17
2017-09-27 14:45:59.886 [INFO ] [.eclipse.smarthome.model.script.Test] - Item:   Test_Item_2
2017-09-27 14:45:59.892 [INFO ] [.eclipse.smarthome.model.script.Test] - State:  NULL
2017-09-27 14:45:59.897 [ERROR] [ntime.internal.engine.RuleEngineImpl] - Error during the execution of startup rule 'Test': java.lang.Number

Pay attention to the exact error.

  1. Item-triggered: this line of code
  2. System started triggered: that line of code

2. "Rule 'myRuleTest': The name '.blaaa' cannot be resolved to an item or type."

Short: non-existent method

Example:

rule "Test"
when
    System started
    or Item Test_Item_1 changed
then
    logInfo("Test", "Item:   " + Test_Item_1.name)
    logInfo("Test", "Number: " + (Test_Item_1.state as Number).blaaa)
end
  1. Item-triggered: "Rule 'Test': An error occurred during the script execution: The name '.blaaa' cannot be resolved to an item or type."
  2. System started triggered: no error logged. Rule crashes silently.

3. Non-existent Item

While we are at it, let's add yet another.

rule "Test"
when
    System started
    or Item Test_Item_1 changed
then
    logInfo("Test", "Item:   " + Non_Existent_Item.name)
end
  1. Item-triggered: "Rule 'Test': An error occurred during the script execution: The name 'Non_Existent_Item' cannot be resolved to an item or type."
  2. System started triggered: no error logged. Rule crashes silently.

@ThomDietrich
Copy link
Contributor Author

@SJKA as promissed 😉

@sjsf
Copy link
Contributor

sjsf commented Oct 27, 2017

Thanks 😄

Your first point about the two different places where the script interpreter is used obviously is very valid. This looks suspicious! I will check if I can figure out if there is/was a good reason for this. Otherwise this should be unified, of course.

The other two examples look perfectly fine to me when run at run-time. For the system started cases we will have to see if after fixing the above it gets any better. But to be honest I doubt it and assume that this rather happens because the startup rules are executed too early while not all registries are filled. We have #1896 to address this (and yes, I plan to put some more focus on it again now...).

@sjsf
Copy link
Contributor

sjsf commented Oct 27, 2017

Oh, I just found out some additional details regarding points 2.) and 3.). Apparently there is this "special handling" for startup rules here:

https://github.com/eclipse/smarthome/blob/a4dea98a1c91a36d4dbe6d5d616879988f176459/bundles/model/org.eclipse.smarthome.model.rule.runtime/src/org/eclipse/smarthome/model/rule/runtime/internal/engine/RuleEngineImpl.java#L312-L320

It basically intends to detect if a rule failed because of a missing item and then will re-execute it later, once a new item was added. It will just log a Execution of startup rule '{}' has been postponed as items are still missing message on debug level.

This obviously is a good idea if the item really exists and will be added later to the system. However, as you used a wrong item in 2.) which will never show up the rule will never get executed in a way that it fails "loudly". Which is a little sad if you didn't do this on purpose but accidentally have a typo in your item names (but remember, this only happens at startup).

The 3.)rd case is a little sad because the error message apparently is the same (the interpreter cannot know if it is a item or not which it is looking for). So it gets also "tricked" into the same repeat-once-an-item-gets-added-behavior like above.

To be honest, I'm not really sure what to do about it nevertheless. I still think solving #1896 would be the best choice because then we can completely eliminate this "special" behavior and get the same error messages in both cases.

@kaikreuzer
Copy link
Contributor

I still think solving #1896 would be the best choice

👍

@ThomDietrich
Copy link
Contributor Author

Yes sounds like it. I also agree that "...cannot be resolved to an item or type." is an acceptable result. I was wondering if they can be further distinguished. ".blaaa" is clearly neither an item nor a type, correct?

What's your opinion about the first case ("java.lang.Number")?

@sjsf
Copy link
Contributor

sjsf commented Oct 27, 2017

I did some interpreter archeology and I think I found a place to hook into (#4462) for those cases. Not exactly perfect, but good enough I think.

How about

Error during the execution of rule Initialize demo items: Could not cast NULL to java.lang.Number

and

Error during the execution of rule Initialize demo items: 'blaaa' is not a member of 'org.eclipse.smarthome.core.types.UnDefType'.

?

@sjsf
Copy link
Contributor

sjsf commented Oct 27, 2017

Oh, wait, hang on... I just figured out how to find out where it happened. So we could have something like this:

Error during the execution of rule Initialize demo items: The name 'testNumber1' cannot be resolved to an item or type.; line 6, column 34, length 11

@martinvw
Copy link
Contributor

Sounds great 👌

@ThomDietrich
Copy link
Contributor Author

ThomDietrich commented Oct 27, 2017

I have to second the 👌

line 6, column 34, length 11

This will make a lot of people very happy!! Thanks Simon!

Minor details: Where does the semicolon come from?

@sjsf
Copy link
Contributor

sjsf commented Oct 27, 2017

Good question. The ScriptError.java adds it by "itself". I'm not 100% sure though in which contexts it is used and how it is intended to look like where (no unit tests unfortunately). So changing this just needs a little more archeology to be done.

@ThomDietrich
Copy link
Contributor Author

I see. Well it's not that important. From the looks of it we'd be fine if you removed the periods at the end of your error strings.

@sjsf
Copy link
Contributor

sjsf commented Oct 27, 2017

You're right, that made it look weird

@ThomDietrich
Copy link
Contributor Author

@SJKA Thanks for your changes! I had the chance to see those additions in action yesterday and it's a huge improvement in the overall workflow. I'll close this issue.

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

No branches or pull requests

5 participants