Add information to logs with a custom runner #205

shym · 2022-11-21T21:08:10Z

Add a bash script to use as runner for our test suite

This script adds information to the logs:

- explicit names of the test run when they start
- some explicit messages for crashes (SIGSEGV, SIGBUS, ...)
- and, maybe most importantly, anchors in CI logs, so that the main webpage contains the most important information and direct links to precise points of interest in the log

It is written as a bash script, with care taken that it can also be used as a runner on Windows CI (notably by commenting out the end of every line, without which it fails with errors about '\r's that don't exist...)

Examples of the result of the changes: the webpage for this run directly shows that we got a SEGV in Ephemeron on trunk (Windows). Or this run, with a SIGABRT in buffers on trunk on macOS.
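To make the description above concrete, here is a minimal sketch of what such a runner could look like. It is not the script from this PR: the function name `run_test` and the message wording are made up for illustration. It only covers the "announce the test before it starts" part, and it shows the end-of-line trick as I understand it: every line ends with a `#`, so that a stray '\r' from a CRLF checkout lands inside a comment.

```shell
# Hypothetical sketch, not the actual runner from this PR. #
# Every line ends with a comment marker so a trailing '\r' is ignored. #
run_test() { #
  printf 'Starting test: %s\n' "$*" #
  status=0 #
  "$@" || status=$? #
  printf 'Finished test: %s (exit status %s)\n' "$*" "$status" #
  return "$status" #
} #
```

It would be called as `run_test ./some_test.exe`, where the executable path is a placeholder; the exit status of the test is passed through unchanged so that the build still fails when the test does.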


jmid · 2022-11-25T10:23:22Z

I'm a bit hesitant about this one.
As someone who has scanned CI logs for the past two weeks I certainly follow you on the benefits!

On the negative side,

- this adds back in a custom runner now that we finally got rid of the clunky check_error_count
- it is in bash - which is untyped, has lots of quirks (witness the newline issue), and has caused problems for people before (I think there was even an incident at JaneStreet)

I'm wondering if this is something

- that could/should be handled in dune or in QCheck_base_runner instead
- or something we could handle with a custom runner?

I've been thinking of the latter to avoid printing counterexamples twice (with and without return values)...

shym · 2022-11-28T17:50:13Z

I do agree it’d be nicer to go without bash. I first tried to see whether it could be done in dune, but I couldn’t figure it out (for instance, the error code shown on Windows for a segfault is completely different from the one reported on Linux).
The custom runner would be the nicest solution, especially if we also gain on the actual output.

shym · 2022-11-29T16:28:53Z

Now, to plead the case for that PR:

- we are the end users of the test suite and can make choices that would otherwise be dangerously fragile, so I thought that going with bash would not be a problem as long as it works in the environments where we run the suite,
- as we see in our logs, segfaults are logged differently on Windows and on Unix (a negative exit value instead of the actual signal) and, according to a few small tests, this problem actually comes with using Unix.system or Sys.command; so writing a custom (OCaml) runner would entail merging both behaviours, which bash is already doing for us.

shym · 2022-12-01T11:09:54Z

For the record, I thought using the echo dune stanza could at least provide the feature of having the path of the test being run before it starts but my attempt failed: https://github.com/shym/multicoretests/actions/runs/3591271602/jobs/6045600378#step:10:41

jmid · 2023-01-31T11:51:25Z

Overall, I can see benefit to the CI log anchors and would like to salvage them!

I'm wondering whether a good solution would be to extend QCheck to output the right format when running under a suitable environment variable, e.g., QCHECK_GA_CI as that could benefit more QCheck CI users and avoid introducing a specialized runner script. 🤔

What do you think?

How different would the printed output need to be to achieve that, e.g., for the output below? (I see some "::-header" printing in the PR code)

random seed: 337549857
generated error fail pass / total     time test name
[✓] 1000    0    0 1000 / 1000     1.2s STM Bytes test sequential
[✓]   11    0    1   10 / 1000    57.4s STM Bytes test parallel

--- Info -----------------------------------------------------------------------

Negative test STM Bytes test parallel failed as expected (21 shrink steps):

                             |           
                             |           
                  .---------------------.
                  |                     |           
                To_seq          (Fill (7, 5, 'K'))  


+++ Messages ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++

Messages for test STM Bytes test parallel:

  Results incompatible with linearized model

                                                                                               |                                            
                                                                                               |                                            
                                                 .------------------------------------------------------------------------------------------.
                                                 |                                                                                          |                                            
     To_seq : ['a'; 'a'; 'a'; 'a'; 'a'; 'a'; 'a'; 'a'; 'K'; 'K'; 'K'; 'K'; 'a'; 'a'; 'a'; 'a']                                (Fill (7, 5, 'K')) : Ok (())                               

================================================================================
success (ran 2 tests)

shym · 2023-01-31T13:01:26Z

The documentation explains here the syntax to report an error and create the anchor; tldr: echo "::error title=Failure in STM Bytes test parallel::STM Bytes test parallel failed on its 11th run after 57.4 seconds with seed 337549857".
That would be nice indeed.
The bash runner provides one thing particularly useful for multicoretests that would be harder to bring into QCheck: it can report when a test crashes (on a signal). That would require wrapping all our tests in some sort of fork_prop (and ensuring we manage to report it as the proper signal even on Windows, something that dune fails at, by the way).
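As a small illustration of the `::error` syntax mentioned above, a helper along these lines would print the annotation. The function name `gha_error` is hypothetical, and the sketch assumes that the title and the message are single-line strings that need no escaping.

```shell
# Hypothetical helper: print a GitHub Actions error annotation.
# Assumption: title and message are single-line and need no escaping.
gha_error() {
  title=$1
  message=$2
  printf '::error title=%s::%s\n' "$title" "$message"
}

# Example call, reusing the failure from the log quoted earlier in the thread.
gha_error "Failure in STM Bytes test parallel" \
  "STM Bytes test parallel failed on its 11th run after 57.4 seconds with seed 337549857"
```

Outside of GitHub Actions this is just a line of text in the log, so printing it is harmless in other environments.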

jmid · 2023-01-31T14:33:24Z

Ah, this old brain is finally starting to understand 😅
That's why you need a "parent process" to catch crashes and print the anchor (rather than just crash and burn).
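A rough sketch of that parent-process idea, for the Unix side only: the shell sees a child killed by signal N as exit status 128+N, so the parent can turn the status back into a signal name. The function name `describe_status` and the messages are made up for illustration, and nothing here addresses the Windows exit codes discussed in this thread.

```shell
# Hypothetical helper: describe an exit status as seen by the parent shell.
# Assumption: a child killed by signal N is reported as status 128+N (Unix shells).
describe_status() {
  status=$1
  if [ "$status" -eq 0 ]; then
    echo "success"
  elif [ "$status" -gt 128 ] && name=$(kill -l "$((status - 128))" 2>/dev/null); then
    echo "crashed on signal SIG$name"
  else
    echo "failed with exit status $status"
  fi
}
```

A runner would call the test, capture `$?`, and pass it to `describe_status` before printing the anchor, instead of letting the crash go by with only a bare exit code in the log.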

For the weird Windows signalling, I think we should report it to the dune developers, along with a test case if possible.

jmid · 2023-01-31T14:35:19Z

BTW, thanks for the documentation link - much appreciated! 🙏
Potentially QCheck could ::group ... ::endgroup a list of tests in a test suite in such a mode.
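For reference, the grouping commands are just two marker lines around the output to fold. A shell sketch of the idea follows; the function name `run_group` is hypothetical, and this wraps an arbitrary command rather than anything inside QCheck.

```shell
# Hypothetical helper: fold the output of a command into a collapsible log group.
run_group() {
  title=$1
  shift
  printf '::group::%s\n' "$title"
  status=0
  "$@" || status=$?
  printf '::endgroup::\n'
  return "$status"
}
```

For instance, `run_group "STM Bytes tests" ./some_test.exe` (placeholder path) would fold that test's output under one heading in the CI log while still returning the test's exit status.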

shym force-pushed the bash-runner branch from 2ae5cb5 to b36ad1c on November 24, 2022 16:47

shym mentioned this pull request Feb 28, 2023

Add (and use) a runner wrapping every test, to improve error reporting #303

Open
