Unit testing overhaul - new algorithm and tolerance proposal #1309

Open · MH-ZShearer opened this issue Aug 23, 2023 · 4 comments

MH-ZShearer commented Aug 23, 2023

Is your feature request related to a problem? Please describe.
I originally thought the existing method of unit testing round-trip conversion rates via percent error comparison was inadequate. However, the more time I have spent thinking about this issue and working on an alternative, the less sure I am what the actual issue is.

Describe the solution you'd like
Personally, I like the significant figures approach over the percent error approach. I'm struggling to justify one approach over the other though.

Describe alternatives you've considered
Minimally, tighten the default accepted percent error from 1e-5; ideally, to 1e-9.

Additional context
I was initially convinced that the percent error comparison was inadequate, but I'm no longer so sure. Still, I don't want my effort to look at an alternative to go to waste, so I'm presenting my thoughts and a potential solution for your review and feedback.


Edit: Two "advantages" I see with significant figure comparison are that the output can be displayed "lined up" to quickly spot the difference, and that the arbitrarily selected percent error would be replaced by an integer.


The algorithm below needs to be given the expected and actual values as well as the number of significant digits to compare. The expected value's log10 determines the allowed difference: maxDelta is computed as 10^power / 2, so actual is effectively allowed to deviate by ±0.5 relative to the position of the last significant digit.

The only use case that I've been able to identify as being "better" than percent error comparison is when expected is 0.0. With the percent error comparison there is no tolerance, and the actual value must be exactly 0.0. With the significant figures approach, n significant figures translate to expected ± 0.5 × 10^(1 - n).
Note: After further review, this is only because a special case was added to handle zero as if the ones place were a significant digit.

Something to keep in mind: the expected precision for a round-trip calculation should only be concerned with floating-point error, assuming the to/from equations are inverse operations. However, they are not always exact inverses, which translates into a tolerance that should be stricter than today's but must still allow some leniency. Additionally, higher round-trip precision should be encouraged, as all calculations go through base-unit conversions.

private static bool IsAccurateToSignificantFigures(double expected, double actual, int significantFigures)
{
    // Power of 10 of the last significant digit. When expected is 0, the ones place is
    // treated as the first significant digit (see the note above).
    var power = 0.0 - significantFigures + 1;

    if (expected != 0)
        power = Math.Floor(Math.Log10(Math.Abs(expected))) - significantFigures + 1;

    // Allow the actual value to deviate by half a unit in the last significant digit's position.
    var maxDelta = Math.Pow(10, power) / 2;
    var delta = Math.Abs(expected - actual);

    return delta < maxDelta;
}
> IsAccurateToSignificantFigures(1004, 1005, 3);
// true
> IsAccurateToSignificantFigures(1001, 1009, 3);
// false
> IsAccurateToSignificantFigures(1, -1, 1);
// false
> IsAccurateToSignificantFigures(0, 0.0001, 3)
// true
> IsAccurateToSignificantFigures(9.995, 9.99499999999, 3);
// true
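
For illustration, here is a rough sketch of how the helper could be plugged into an xUnit-style round-trip test. The test name, the choice of units, and the 12-significant-figure target are illustrative assumptions, not a proposed final form:

[Fact]
public void NewtonToPoundForceRoundTrip_IsAccurateTo12SignificantFigures()
{
    var newtons = 1.0;

    // Convert to pounds-force and back through the public API, then compare to the original.
    var roundTripped = Force.FromPoundsForce(Force.FromNewtons(newtons).PoundsForce).Newtons;

    Assert.True(IsAccurateToSignificantFigures(newtons, roundTripped, 12));
}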

I was concerned that this new method would be significantly slower due to the additional math being performed, so I ran some benchmarks. Even though execution time matters less for a testing library, there are a lot of tests to run, and this change shouldn't meaningfully affect the overall testing time. It appears the new method is roughly 3.5x slower than the current method. I would deem that acceptable, but it is not my call.

Note: Upon running multiple benchmarks, the results varied significantly. See below for details regarding performance benchmarking.

|    Method |      Mean |     Error |    StdDev |    Median | Rank | Allocated |
|---------- |----------:|----------:|----------:|----------:|-----:|----------:|
| OldMethod | 0.0020 ns | 0.0027 ns | 0.0022 ns | 0.0012 ns |    1 |         - |
| NewMethod | 0.0068 ns | 0.0059 ns | 0.0052 ns | 0.0055 ns |    2 |         - |

Edit: Upon further investigation, it turns out I overlooked the following warnings.

ZeroMeasurement
  BenchmarkDictionaryLookup.NewMethod: Default -> The method duration is indistinguishable from the empty method duration
  BenchmarkDictionaryLookup.OldMethod: Default -> The method duration is indistinguishable from the empty method duration

Effectively, this means the times are so small that BenchmarkDotNet is unable to provide accurate results. So I ran it again, iterating a million times per invocation and removing the Debug.Assert call (I don't benchmark often), and got the results below. They're effectively equivalent.

|    Method |     Mean |     Error |    StdDev | Rank | Allocated |
|---------- |---------:|----------:|----------:|-----:|----------:|
| NewMethod | 1.267 ms | 0.0052 ms | 0.0046 ms |    1 |       1 B |
| OldMethod | 1.274 ms | 0.0074 ms | 0.0069 ms |    1 |       1 B |
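
For reference, a minimal BenchmarkDotNet sketch along these lines is shown below. The sample values, the 1e-2 tolerance in the old method, and the iteration count are illustrative assumptions, not the exact benchmark code used:

using System;
using BenchmarkDotNet.Attributes;

// Sketch of a percent-error vs. significant-figures comparison benchmark.
public class ComparisonBenchmarks
{
    private const int Iterations = 1_000_000;
    private readonly double _expected = 1004;
    private readonly double _actual = 1005;

    [Benchmark(Baseline = true)]
    public bool OldMethod()
    {
        var result = false;
        for (var i = 0; i < Iterations; i++)
            result = Math.Abs(_expected - _actual) / _expected < 1e-2; // percent error comparison
        return result;
    }

    [Benchmark]
    public bool NewMethod()
    {
        var result = false;
        for (var i = 0; i < Iterations; i++)
            result = IsAccurateToSignificantFigures(_expected, _actual, 3); // helper from above
        return result;
    }

    private static bool IsAccurateToSignificantFigures(double expected, double actual, int significantFigures)
    {
        var power = expected != 0
            ? Math.Floor(Math.Log10(Math.Abs(expected))) - significantFigures + 1
            : 0.0 - significantFigures + 1;

        return Math.Abs(expected - actual) < Math.Pow(10, power) / 2;
    }
}

// Run with: BenchmarkRunner.Run<ComparisonBenchmarks>(); (requires BenchmarkDotNet.Running)
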
MH-ZShearer changed the title from "Alternative algorithm to unit testing percent error comparison" to "Alternative algorithm to unit testing - percent error comparison" on Aug 23, 2023

MH-ZShearer (Contributor, Author) commented:

As a follow-up regarding other unit testing "issues", I would like to further address the library's default error tolerance as well as how unit test comparison values are represented. Inaccurate expected values can have a negative impact on the unit testing as a whole. Take, for example, PoundsForceInOneNewton's current value compared to the precise value (1 / 4.4482216152605):

Current: 0.22481
Precise: 0.2248089430997104829100394134 (approx.)
Error:   0.00047% (5 sig. digits)

This error compounds with the error tolerance of the library (default 0.001%). That measurement alone would move the library to almost 0.0015% error, 50% more than currently stated. On top of this lies an inherent issue with the unit testing. Unit conversion regularly uses unsavory conversion rates, some even resulting in repeating decimals. If you showed me the two similar numbers above and asked which one was more precise, I might pick the "Current" one, because conciseness is sometimes mistaken for precision.
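
For concreteness, the 0.00047% figure comes from a quick calculation like the following (a minimal sketch using the numbers above):

var precise = 1.0 / 4.4482216152605;  // ≈ 0.2248089430997...
var current = 0.22481;                // value currently used in the tests

var relativeError = Math.Abs(current - precise) / precise;  // ≈ 4.7e-6, i.e. about 0.00047%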

In #1308, the unit test values are represented as equations to prevent the above concise/precise confusion for maintainers. I understand it is easier to copy/paste plain values into Wolfram Alpha, or similar, but if a unit test is under scrutiny, I believe it is acceptable for it to take more time to evaluate.

My proposal around all of this, even if the comparison method isn't changed from percent error (I'm not sure it should be), is that library-wide precision needs to be increased and the approach to "acceptable" unit test values should be re-evaluated. Take, for example, PR #1308, which moves the unit conversion rates toward their real-world values. Based on some quick tests, round-trip conversions pass a 0.000000001% (1e-11) error tolerance without issue. I believe this is what should be strived for, as it incurs no performance degradation and gives users of Units.NET additional confidence that the numbers are precise.

But to address the elephant in the room: many of the conversion rates are not precise, yet are good enough for most use cases. In those cases, I believe the tolerance of those units should have to be overridden manually (e.g., relaxed from 1e-9 to 1e-5). I understand this may take some (read: a lot of) work, but if accepted, I will make the necessary changes. I would like more of the user base to provide feedback on this, as I understand that our position in our industry, and the requirements that follow from it, are stricter than most users'.

Ultimately, you're the maintainers and I will respect your final decision on this but wanted to provide my feedback beforehand. Worst case scenario, our application can create its own, similar unit testing that requires tighter tolerances.

MH-ZShearer changed the title from "Alternative algorithm to unit testing - percent error comparison" to "Unit testing overhaul - new algorithm and tolerance proposal" on Aug 25, 2023

angularsen (Owner) commented Aug 26, 2023

A lot of good information here, but there is a lot of text to digest, and the ideas go in several directions.

Can you help me better understand your proposal by summarizing your exact proposed changes, and any open questions to discuss, in a bullet list?

If I read it right, there are two main ideas (and possibly a third) that each need a summary of changes and open discussions:

  1. Error as significant digits vs relative/percentage (you mention you are not sure the change is worth it)
  2. Tighter error tolerance by default, f.ex. 1e-11 instead of 1e-5, requiring manual overrides for some units
  3. Also, in Fix force conversion rates #1308 (comment) you mention you do not agree with fixed constants in test assertions and calculations in JSON conversion functions; this could be a third discussion

MH-ZShearer (Contributor, Author) commented:

Sorry, I can get a bit carried away at times. I think you have the gist of it, though. I hope this helps clarify things, or at least makes referencing the different points easier.

Note: I'm still not convinced that one algorithm is better than the other and the following is based on my observations and/or understandings.


Comparison Algorithm

Criteria for comparing the two approaches (% Error vs. Sig. Fig.):

  • Industry standard: most industries list error as percent error.
  • Simple algorithm: easy to implement in software or calculate by hand.
  • Linearly scales with value: the further from 0, the larger the acceptable difference.
  • Comparable to the data type's precision: data type precision is specified in programming as significant figures.
  • More specific tolerance: limited number of possibilities (0 to <dataTypePrecision>), whereas percent error has unlimited possibilities.
  • Easily observed: assuming the values are aligned around a decimal place, a human can quickly identify the error themselves.
  • Performant: based on testing, the performance of the two is indistinguishable.

Concept

Compare the first n significant digits of two numbers. This allows the tolerance to scale with the magnitude of a given number while accepting "half" of a significant figure in either direction. This approach is typically preferred when working with limited-precision values, e.g., values with only about 7 digits of precision.
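
As a worked example, reusing the 1004 vs. 1005 call from the earlier samples:

// expected = 1004, actual = 1005, significantFigures = 3
// power    = floor(log10(1004)) - 3 + 1 = 3 - 3 + 1 = 1
// maxDelta = 10^1 / 2 = 5
// delta    = |1004 - 1005| = 1 < 5  =>  accurate to 3 significant figures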


Tighter Default Tolerances

Advertised as the "accuracy" of the library; closely tied to the Unit Testing Values section below.

  • Currently 1e-5 (4 sig figures)
  • Preferred 1e-10 (9 sig figures)
  • Ideally 1e-12+ (11+ sig figures)

Note: Many conversion rate tolerances would need to be overridden to compensate for new default tolerances.
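
A minimal sketch of what the manual opt-out could look like; the class and property names here are assumptions for illustration and do not necessarily match the generated Units.NET test code:

public abstract class ForceTestsBase
{
    // Proposed tighter library-wide default.
    protected virtual double DefaultTolerance => 1e-10;

    // Per-unit tolerances fall back to the default unless overridden.
    protected virtual double ShortTonsForceTolerance => DefaultTolerance;
}

public class ForceTests : ForceTestsBase
{
    // Known-imprecise conversion rate: the override is an explicit, reviewable decision.
    protected override double ShortTonsForceTolerance => 1e-5;
}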


Unit Testing Values

This is a bit more challenging and maybe should be a completely separate discussion. Essentially, unit testing should hold the JSON values to a very high standard, so that if someone finds a "more precise" conversion rate, a failing unit test forces them to question the change. I believe this is only achievable by using the highest-precision values in the unit tests. For comparison, the top diff is what I propose; the bottom is what is currently recommended:

- protected override double ShortTonsForceInOneNewton => 1.12404471549816e-4;
+ protected override double ShortTonsForceInOneNewton => 1.0 / (PoundsConversionRate * 2_000.0);
- protected override double ShortTonsForceInOneNewton => 1.12404471549816e-4;
+ protected override double ShortTonsForceInOneNewton => 1.124044715498552414550197e-4;

Readability takes a hit when using a double literal, as it's unclear where it came from (unless specified in a comment). Truncating the value reduces the possible precision, which should be avoided. I believe the most important question to answer here is: "What is the purpose of the round-trip conversion unit tests?" Remember, Units.NET technically doesn't know what the precise value is (I could enter 1 N <-> 1 kgf, set the tests to match, and they would pass).
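
One way to keep both readability and precision would be to back the equation form with a named, documented constant. A rough sketch, reusing the PoundsConversionRate name from the diff above and the exact 4.4482216152605 N per lbf definition:

// 1 lbf = 0.45359237 kg × 9.80665 m/s² = 4.4482216152605 N (exact by definition).
private const double PoundsConversionRate = 4.4482216152605; // newtons per pound-force

protected override double PoundsForceInOneNewton => 1.0 / PoundsConversionRate;
protected override double ShortTonsForceInOneNewton => 1.0 / (PoundsConversionRate * 2_000.0);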

angularsen (Owner) commented:

I will try to find time to get back to this sometime soon.
