Is statistical confidence computation correct? #71

Open
lorenzwalthert opened this issue Oct 26, 2021 · 11 comments

@lorenzwalthert
Owner

The following seems suspicious:

@assignUser
Collaborator

You mean the significance marker on +.25 +3.25%?

We actually don't calculate anything (we could of course do a t.test and check the p-value) but rather check whether the confidence interval contains zero or not (as you laid out in the doc). And this meets the requirements.
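
In other words (a minimal sketch with hypothetical function and argument names, not touchstone's actual code), the rule amounts to:

```r
# A change is flagged as significant only when the confidence interval for
# the speed difference lies entirely on one side of zero.
ci_excludes_zero <- function(ci_lower, ci_upper) {
  ci_lower > 0 || ci_upper < 0
}

ci_excludes_zero(-0.10, 0.40) # FALSE: CI contains zero, no significance marker
ci_excludes_zero( 0.05, 0.40) # TRUE: flagged as a significant difference
```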

@lorenzwalthert
Owner Author

Not the marker, but the result itself. This PR has no speed implications, yet our confidence interval does not overlap with 0. If our computation is correct, this should happen once in a blue moon, but I've seen it more than once already.

@assignUser
Collaborator

assignUser commented Oct 26, 2021 via email

@lorenzwalthert
Owner Author

It's a linear model. Nothing fancy. Maybe it was just chance, but let's keep an eye on this.
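
For illustration (a minimal sketch with simulated timings and hypothetical variable names, not touchstone's actual model), such an interval could come from a linear model like this:

```r
# Regress elapsed time on branch; the confidence interval of the branch
# coefficient estimates the speed difference between the two branches.
set.seed(1)
timings <- data.frame(
  branch  = rep(c("main", "pr"), each = 30),
  elapsed = c(rnorm(30, mean = 1.00, sd = 0.05),  # no true speed difference
              rnorm(30, mean = 1.00, sd = 0.05))
)
fit <- lm(elapsed ~ branch, data = timings)
confint(fit)["branchpr", ]  # should contain 0 in roughly 95% of such runs
```

If the interval excludes zero much more often than ~5% on PRs with no real speed implications, that would point at the computation (or the runner-variance assumptions) being off.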

@assignUser
Collaborator

It is a blue moon:
kgoldfeld/simstudy#122

Tangential to this issue: I have started a little project to collect data about GHA benchmarking; we could also use it to check this issue. You can have a look here: https://github.com/assignUser/touchstone.collect
The data is stored on the data branch.

@assignUser
Collaborator

On another tangent: while working on the project above I streamlined the GitHub Actions for benchmarking and commenting into a single YAML file, without using the cancel action: https://github.com/assignUser/simstudy/blob/new-gha/.github/workflows/touchstone-receive.yaml
Should I PR that?

@lorenzwalthert
Owner Author

Re one action: I don't think it's documented (apart from maybe in commits and PRs), but the reason there are two actions is purely security. It used to be one, but that is gauged unsafe:

https://securitylab.github.com/research/github-actions-preventing-pwn-requests/

@lorenzwalthert
Owner Author

lorenzwalthert commented Nov 8, 2021

You can always open an issue and ask questions about the current design and challenge it. This will only improve code quality. There are no stupid questions. I just did not expect anyone to contribute to {touchstone} so soon, but I am glad we are two now. 😊

@lorenzwalthert
Owner Author

lorenzwalthert commented Nov 8, 2021

Re: {touchstone.collect}, cool. I am not sure I fully understand, but I'll watch this space.

@assignUser
Collaborator

Re one action: I don't think it's documented (apart from maybe in commits and PRs), but the reason there are two actions is purely security. It used to be one, but that is gauged unsafe:

https://securitylab.github.com/research/github-actions-preventing-pwn-requests/

Interesting read, thanks for the link!
There is a comment about tokens at the top of the comment YAML; now I understand.
My file would fail for PRs from outside of the repo due to the missing secret access. I'll add that as a documentation to-do.

{touchstone.collect}: I wanted to have some data/graphs to show the inconsistency in runner performance that you mention anecdotally in the doc, possibly to add some meat to a JOSS paper. But I also have a benchmark in there with the same speed in both branches, so we could use that data to investigate this issue or to test/dial in another way to analyse or display the results.

@lorenzwalthert
Owner Author

I just looked at kgoldfeld/simstudy#129 and I wondered if the user could be able to specify a threshold themselves for the icon. Does it always make sense to check if 0 is contained? One could argue that a CI that contains a 1% performance drop is still not enough to justify a 🐌 icon, so if the user specifies a custom value for null_boundary (better name needed, I think), we check whether the confidence interval overlaps with it.
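
A minimal sketch of that idea (null_boundary is the name proposed above; the helper function and its signature are hypothetical):

```r
# Show the icon only if the confidence interval lies entirely outside
# [-null_boundary, null_boundary]; null_boundary = 0 reproduces the current
# "does the CI contain zero" check.
exceeds_boundary <- function(ci_lower, ci_upper, null_boundary = 0) {
  ci_lower > null_boundary || ci_upper < -null_boundary
}

exceeds_boundary(0.005, 0.020)                       # TRUE under the current rule
exceeds_boundary(0.005, 0.020, null_boundary = 0.01) # FALSE: too small for a 🐌
```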
