
Failure message appears to contradict itself #18

Open
davidimoore opened this issue Oct 22, 2021 · 2 comments
@davidimoore

davidimoore commented Oct 22, 2021

Describe the problem

When I run a spec that uses the perform_power matcher, I receive a failure message that appears to contradict itself.

Steps to reproduce the problem

Create a spec that uses the perform_power matcher.


      it "tests complexity" do
        expect{request}.to perform_power
      end

Actual behaviour


     Failure/Error: expect{request}.to perform_power
       expected block to perform power, but performed power

Expected behaviour

A passing test, or a failing test stating that the request did not perform power.

Describe your environment

  • OS version: macOS Big Sur 11.6
  • Ruby version: 2.7.4
  • RSpec::Benchmark version: rspec-benchmark (0.6.0)
@piotrmurach
Owner

Hi David,

Thank you for using rspec-benchmark and reporting this issue.

Would you be able to provide a minimal reproduction test case?

@piotrmurach
Owner

OK, I've spent some time investigating the reasons behind this nonsensical error message.

When assessing whether the expectation matches, two things are taken into account:

  • the fit type, e.g. logarithmic
  • the quality of the fit, i.e. the threshold: how well the fitted function approximates the observed trend.
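To make the second criterion more concrete, here is a minimal sketch of one common way to measure how well a power function y = a * x**b fits a set of measurements: an ordinary least-squares line through (ln x, ln y), with the coefficient of determination (R², between 0 and 1) as the fit quality. The helper name `fit_power` is hypothetical, and computing R² in log space is an assumption of this sketch, not necessarily how rspec-benchmark computes it.

```ruby
# Hypothetical helper, not part of rspec-benchmark: fits y = a * x**b by
# ordinary least squares on (ln x, ln y) and returns [a, b, r_squared].
# r_squared (the fit quality) is measured in log space, which is an
# assumption of this sketch.
def fit_power(xs, ys)
  lx = xs.map { |x| Math.log(x) }
  ly = ys.map { |y| Math.log(y) }
  n = lx.size.to_f
  mean_x = lx.sum / n
  mean_y = ly.sum / n

  # Slope b and intercept ln(a) of the regression line ln y = ln a + b * ln x.
  sxy = lx.zip(ly).sum { |x, y| (x - mean_x) * (y - mean_y) }
  sxx = lx.sum { |x| (x - mean_x)**2 }
  b = sxy / sxx
  ln_a = mean_y - b * mean_x

  # Coefficient of determination: 1 - residual sum of squares / total sum of squares.
  ss_res = lx.zip(ly).sum { |x, y| (y - (ln_a + b * x))**2 }
  ss_tot = ly.sum { |y| (y - mean_y)**2 }
  r_squared = ss_tot.zero? ? 1.0 : 1.0 - ss_res / ss_tot

  [Math.exp(ln_a), b, r_squared]
end

sizes = [10, 100, 1_000, 10_000]
clean = sizes.map { |n| 2.0 * n**1.5 } # synthetic data that follows a power law exactly
_a, exponent, quality = fit_power(sizes, clean)
puts format("exponent %.2f, fit quality %.3f", exponent, quality)
# prints "exponent 1.50, fit quality 1.000"

# Synthetic noise: alternately scale the points up and down.
noisy = clean.each_with_index.map { |t, i| i.even? ? t * 1.3 : t * 0.7 }
_a, exponent, quality = fit_power(sizes, noisy)
puts format("exponent %.2f, fit quality %.3f", exponent, quality)
```

The power trend is still the best description of the noisy data, but its fit quality drops below 1.0; how far it drops relative to the threshold decides whether the expectation passes.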

The fit quality threshold is a number between 0 and 1. The default is 0.9; values above it mean that the fit is very good. The threshold can be changed globally or per test. For example, to lower it to 0.8 for a single test you can do:

it "tests complexity" do
  expect { request }.to perform_power.threshold(0.8)
end

So the message "expected block to perform power, but performed power" means that the fit quality was below the 0.9 threshold.
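The contradiction comes from checking two conditions but reporting only one of them. The sketch below models that decision under stated assumptions: `FitResult`, `matches_trend?` and `failure_message` are hypothetical names, not rspec-benchmark internals, and treating the threshold boundary as inclusive (`>=`) is also an assumption.

```ruby
# Hypothetical model of the two-part check: the best-fitting trend type
# plus the quality of that fit.
FitResult = Struct.new(:type, :quality)

# Passes only if the trend type matches AND the fit quality meets the
# threshold (inclusive boundary is an assumption of this sketch).
def matches_trend?(expected_type, fit, threshold = 0.9)
  fit.type == expected_type && fit.quality >= threshold
end

# The message mentions only the trend type, never the fit quality.
def failure_message(expected_type, fit)
  "expected block to perform #{expected_type}, but performed #{fit.type}"
end

fit = FitResult.new(:power, 0.87)
puts matches_trend?(:power, fit)      # prints "false": type matches, but 0.87 < 0.9
puts failure_message(:power, fit)     # prints "expected block to perform power, but performed power"
puts matches_trend?(:power, fit, 0.8) # prints "true", like .threshold(0.8)
```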

Why is this taken into account at all? A poor fit quality means that the range of values is hard to approximate with any trend line, so the reported complexity is only a best estimate. It would be better to 'improve' the test to get a more definite approximation and thus gain confidence in the measured complexity.

Now, this is not ideal, and can be resolved in two ways:

  • Expand the error message to include the fit quality. For example:
    expected block to perform power above 0.9 fit quality, but performed power at 0.87 fit quality
  • Remove the threshold from the equation and compare only the trend line.
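The first option could look roughly like the following minimal sketch, which reproduces the example message from the list. The method name and its parameters are hypothetical; rspec-benchmark may structure this differently.

```ruby
# Hypothetical sketch of option 1: report the fit quality alongside the
# trend type so a quality failure no longer reads as a type mismatch.
def failure_message_with_quality(expected_type, threshold, actual_type, quality)
  format(
    "expected block to perform %s above %s fit quality, " \
    "but performed %s at %.2f fit quality",
    expected_type, threshold, actual_type, quality
  )
end

puts failure_message_with_quality(:power, 0.9, :power, 0.87)
# prints "expected block to perform power above 0.9 fit quality, but performed power at 0.87 fit quality"
```

Formatting the threshold with `%s` prints it exactly as configured (for example 0.95), while the measured quality is rounded to two decimal places.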

I'm reluctant to go the route of removing the threshold because such tests may become very brittle: with low threshold values, the trend line can be hard to estimate and may change with each test run. We want high confidence. Hence, I'm more inclined to improve the message and 'educate' users about this parameter. Any thoughts?
