Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

More flexible result handling for other properties, e.g., quantitative properties #996

Open
2 tasks
PhilippWendler opened this issue Feb 20, 2024 · 3 comments

Comments

@PhilippWendler
Copy link
Member

Factored out from #994:

In use cases like QComp the existing ways to define an expected verdict and classify the actual result into correct/wrong are not enough, because for example the expected verdict consists of a numeric value and an allowed error interval. To support this in BenchExec we need at least

  • a way to define different kinds of expected verdicts in the task-definition format
  • make [get_result_category] depend on the used property, e.g., by moving it into the Property class as a method

Then it also makes sense to create subclasses of Property as a way to group this nicely, e.g., have an SvcompProperty etc.

Likely we can keep the current assumption that a result category is one of correct, wrong, unknown, or error.

@incaseoftrouble
Copy link
Contributor

incaseoftrouble commented Feb 26, 2024

Related comment: When an expected_verdict can be anything, does it make sense to write expected_verdict: "False(xy)" instead of adding subproperty: xy to the task-definition? (reasoning: subproperty is domain-specific)

Also: Right now, <propertyfile> can be filtered by expectedverdict. Should the expected verdict then be always a string / the value of that tag compared to the string representation of the expected_verdict? This would effectively limit to one-dimensional results.

@PhilippWendler
Copy link
Member Author

Related comment: When an expected_verdict can be anything, does it make sense to write expected_verdict: "False(xy)" instead of adding subproperty: xy to the task-definition? (reasoning: subproperty is domain-specific)

Well, let me quote you from #994:

Definitely in favor of additional keys (in general, I am against magic inside strings :) ). For example, we also would want to specify relative or absolute error, or might consider statistical methods (which take two parameters). Just seems more extensible this way :)

Of course in BenchExec we already have precedent of false(xy) as actual result. So we could consider this.
But if we put something like the precision into a separate key, then we should make it consistently always in separate keys.

Also: Right now, <propertyfile> can be filtered by expectedverdict. Should the expected verdict then be always a string / the value of that tag compared to the string representation of the expected_verdict? This would effectively limit to one-dimensional results.

Hm, what use cases would there even be for filtering numeric expected verdicts? I guess we can leave this out in beginning?

@incaseoftrouble
Copy link
Contributor

But if we put something like the precision into a separate key, then we should make it consistently always in separate keys.

But that would be part of the query, not the expected verdict, right? So, for (single dimensional) quantitative properties, I think the expected_verdict would always be float.

Definitely in favor of additional keys (in general, I am against magic inside strings :) )

That was under the (wrong) assumption that precision would go into the verdict (which doesn't make sense after thinking about it). I like False(xy) more than a "sibling" key to expected verdict. Allowing for domain-specific magic is IMO ok for conciseness.

Hm, what use cases would there even be for filtering numeric expected verdicts?

Not much, I think. On that note, I think that the expectedverdict filter is a bit ad-hoc in this regard. (a special case of (derived) tags :) )

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Development

No branches or pull requests

2 participants