More flexible result handling for other properties, e.g., quantitative properties #996

PhilippWendler · 2024-02-20T14:32:27Z

Factored out from #994:

In use cases like QComp the existing ways to define an expected verdict and classify the actual result into correct/wrong are not enough, because for example the expected verdict consists of a numeric value and an allowed error interval. To support this in BenchExec we need at least

a way to define different kinds of expected verdicts in the task-definition format
make [get_result_category] depend on the used property, e.g., by moving it into the Property class as a method

Then it also makes sense to create subclasses of Property as a way to group this nicely, e.g., have an SvcompProperty etc.

Likely we can keep the current assumption that a result category is one of correct, wrong, unknown, or error.

The text was updated successfully, but these errors were encountered:

incaseoftrouble · 2024-02-26T19:43:04Z

Related comment: When an expected_verdict can be anything, does it make sense to write expected_verdict: "False(xy)" instead of adding subproperty: xy to the task-definition? (reasoning: subproperty is domain-specific)

Also: Right now, <propertyfile> can be filtered by expectedverdict. Should the expected verdict then be always a string / the value of that tag compared to the string representation of the expected_verdict? This would effectively limit to one-dimensional results.

PhilippWendler · 2024-02-27T08:11:33Z

Related comment: When an expected_verdict can be anything, does it make sense to write expected_verdict: "False(xy)" instead of adding subproperty: xy to the task-definition? (reasoning: subproperty is domain-specific)

Well, let me quote you from #994:

Definitely in favor of additional keys (in general, I am against magic inside strings :) ). For example, we also would want to specify relative or absolute error, or might consider statistical methods (which take two parameters). Just seems more extensible this way :)

Of course in BenchExec we already have precedent of false(xy) as actual result. So we could consider this.
But if we put something like the precision into a separate key, then we should make it consistently always in separate keys.

Also: Right now, <propertyfile> can be filtered by expectedverdict. Should the expected verdict then be always a string / the value of that tag compared to the string representation of the expected_verdict? This would effectively limit to one-dimensional results.

Hm, what use cases would there even be for filtering numeric expected verdicts? I guess we can leave this out in beginning?

incaseoftrouble · 2024-02-27T10:22:02Z

But if we put something like the precision into a separate key, then we should make it consistently always in separate keys.

But that would be part of the query, not the expected verdict, right? So, for (single dimensional) quantitative properties, I think the expected_verdict would always be float.

Definitely in favor of additional keys (in general, I am against magic inside strings :) )

That was under the (wrong) assumption that precision would go into the verdict (which doesn't make sense after thinking about it). I like False(xy) more than a "sibling" key to expected verdict. Allowing for domain-specific magic is IMO ok for conciseness.

Hm, what use cases would there even be for filtering numeric expected verdicts?

Not much, I think. On that note, I think that the expectedverdict filter is a bit ad-hoc in this regard. (a special case of (derived) tags :) )

This was referenced Feb 20, 2024

Additional values/columns derived from tool result #997

Open

Discussion: Qualitative properties and extended task definitions #994

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

More flexible result handling for other properties, e.g., quantitative properties #996

More flexible result handling for other properties, e.g., quantitative properties #996

PhilippWendler commented Feb 20, 2024

incaseoftrouble commented Feb 26, 2024 •

edited

PhilippWendler commented Feb 27, 2024

incaseoftrouble commented Feb 27, 2024

More flexible result handling for other properties, e.g., quantitative properties #996

More flexible result handling for other properties, e.g., quantitative properties #996

Comments

PhilippWendler commented Feb 20, 2024

incaseoftrouble commented Feb 26, 2024 • edited

PhilippWendler commented Feb 27, 2024

incaseoftrouble commented Feb 27, 2024

incaseoftrouble commented Feb 26, 2024 •

edited