
smart analysis improvements #921

Open
t184256 opened this issue Apr 5, 2024 · 3 comments

@t184256
Collaborator

t184256 commented Apr 5, 2024

Feature request

Is your feature request related to a problem? Please describe

The feature works as described, in that when one uses the defaults, it is equivalent to asking for 1 ns CIs, and the resulting CIs are ~1 ns wide.

The feature doesn't work as expected, in that one can approach a CI close to, yet larger than, a nanosecond, keep collecting data, and never have the CIs improve, because the feature rejects the extra data.

Describe the solution you'd like

  1. Estimating the amount of data to process should use random sampling, so that the beginning of the run doesn't skew the whole picture.
  2. Some overshoot coefficient should be introduced (~4x?), just in case the estimation got lucky and the whole run is noisier than the estimate suggested.
  3. The feature should log prominently in the report that it has discarded, e.g., 87.95% of the data, so that users who hit the limit and want to go deeper have a better chance of noticing why they hit a wall (see the sketch after this list).
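A minimal sketch of how points 1-3 could fit together, assuming a hypothetical `estimate_sample_size()` helper; the function names, the 100k pilot size, and the 4x overshoot factor are illustrative assumptions, not existing tlsfuzzer code:

```python
import logging
import random

# overshoot factor from point 2 above (illustrative)
OVERSHOOT = 4

def plan_analysis(observations, estimate_sample_size, rng=random.Random(42)):
    """Pick which observations to analyze, sizing the run from a random pilot."""
    total = len(observations)
    # 1. estimate from a random subsample, not from the start of the run
    pilot = rng.sample(observations, min(total, 100_000))
    needed = estimate_sample_size(pilot)
    # 2. overshoot, in case the pilot was luckier than the full run
    needed = min(total, int(needed * OVERSHOOT))
    # 3. report loudly how much data is being discarded
    discarded = 1 - needed / total
    if discarded > 0:
        logging.warning(
            "Analyzing %d of %d observations (%.2f%% discarded); "
            "the requested CI limits how much data is used.",
            needed, total, discarded * 100)
    return rng.sample(observations, needed)
```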

Describe alternatives you've considered

  • Not using the feature at all - not cool.
  • Disabling the feature by default and documenting that one needs to overshoot - maybe that's a sensible option.

Additional context

@tomato42
Member

tomato42 commented Apr 5, 2024

If we use random sampling for both the initial estimation and the later analysis, then the resulting CI should be really close to the specified desired value.
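For intuition only, a toy sketch (not the actual analysis code, which uses different statistics) of why a randomly drawn pilot subsample sizes the run well: for a mean, the usual n ≈ (z*s/h)^2 rule uses the pilot's standard deviation, and a random pilot estimates that without warm-up bias. All names here are made up:

```python
import math
import random
import statistics

def required_n(values, half_width, z=1.96, pilot=10_000, rng=random.Random(0)):
    """Rough count of observations needed for a +/- half_width CI on the mean."""
    # random pilot subsample, so start-of-run warm-up doesn't bias the estimate
    pilot_sample = rng.sample(values, min(len(values), pilot))
    s = statistics.stdev(pilot_sample)
    return math.ceil((z * s / half_width) ** 2)
```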

@GeorgePantelakis
Contributor

I do not think disabling it by default would be good; I would say most regular users want okay-ish results in a short amount of time. The other features I do like. It might also be useful to add an option to limit the amount of data analyzed yourself: for example, if the script says that you need 230,000 points but you want to push it a bit more, you could specify (via a flag) to use 250,000 points. What do you think about that?
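One possible shape for that option, purely as a sketch; the `--sample-limit` flag name and its wiring are hypothetical, not an existing analysis.py option:

```python
import argparse

parser = argparse.ArgumentParser(description="timing analysis (sketch)")
# hypothetical flag, named here only for illustration
parser.add_argument(
    "--sample-limit", type=int, default=None,
    help="analyze this many observations instead of the automatically "
         "estimated count (e.g. 250000)")
args = parser.parse_args()

# later, after the automatic estimation (pseudo-wiring):
# needed = estimate_sample_size(...)
# if args.sample_limit is not None:
#     needed = args.sample_limit  # let the user push past the estimate
```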

@GeorgePantelakis
Contributor

GeorgePantelakis commented May 16, 2024

Notes:

  • If there are fewer samples than needed for the first estimation, skip the estimation and use the whole sample.
  • Use random sampling (in parallel, like Skillings-Mack).
  • For the estimation, create a random subsample of 10k or 100k tuples (take each tuple randomly with probability 1 in (total/100k)).
  • From that, estimate the sample size needed to reach the requested CI.
  • Add 10% to the calculated size to make sure we end up with a smaller CI (aim for a CI 5% smaller than requested).
  • Do the rest of the steps as normal (a rough sketch of the estimation step follows).
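Roughly, the estimation step from these notes could look like the sketch below; `ci_width_for` is a stand-in for whatever computes the achieved CI width on a subsample, and the 100k subsample size and 10% margin are the numbers from the notes:

```python
import math
import random

def estimation_subsample(tuples, target=100_000, rng=random.Random(0)):
    """Keep each tuple with probability 1 in (total/target), i.e. ~target tuples."""
    total = len(tuples)
    if total <= target:
        # fewer samples than needed for the estimation: use the whole sample
        return list(tuples)
    keep_probability = 1 / (total / target)
    return [t for t in tuples if rng.random() < keep_probability]

def needed_sample_size(tuples, requested_ci, ci_width_for, rng=random.Random(0)):
    """Estimate how many tuples are needed to reach requested_ci, plus 10%."""
    sub = estimation_subsample(tuples, rng=rng)
    achieved = ci_width_for(sub)
    # CI width shrinks roughly as 1/sqrt(n), so scale the subsample size up
    needed = math.ceil(len(sub) * (achieved / requested_ci) ** 2)
    # add 10%, aiming for a CI about 5% smaller than requested
    return min(len(tuples), math.ceil(needed * 1.1))
```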
