
Summary of tests in machine readable format #126523

Open
Flamefire opened this issue May 17, 2024 · 7 comments
Labels
module: tests Issues related to tests (not the torch.testing module) triaged This issue has been looked at by a team member, and triaged and prioritized into an appropriate module

Comments

@Flamefire
Collaborator

Flamefire commented May 17, 2024

🐛 Describe the bug

We are running the test suite (python run_test.py --continue-through-error) after installing PyTorch on our (many) clusters. For that we need to know how many tests were run, how many succeeded, and how many failed, to assess whether the installation works at all, especially in that environment. Many of the issues I reported and PRs I submitted are based on that.

Until recently we were able to parse the build output containing summary lines such as

    # ===================== 2 failed, 128 passed, 2 skipped, 2 warnings in 3.43s =====================
    # test_quantization failed!
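
For illustration only, a line like that can be consumed with a small regex on our side; this is a sketch of our own parsing, not anything provided by run_test.py:

    import re

    # Matches old-style pytest summary lines such as
    # "===== 2 failed, 128 passed, 2 skipped, 2 warnings in 3.43s ====="
    SUMMARY_RE = re.compile(r"=+ (?P<counts>.+) in [\d.]+s =+")
    COUNT_RE = re.compile(r"(\d+) (failed|passed|skipped|errors?|warnings?|rerun)")

    def parse_summary(line):
        """Return e.g. {'failed': 2, 'passed': 128, 'skipped': 2, 'warnings': 2} or None."""
        m = SUMMARY_RE.search(line)
        if m is None:
            return None
        return {kind: int(num) for num, kind in COUNT_RE.findall(m.group("counts"))}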

Such a line is easy to grep for and even to extract manually. However, since the 2.3.0 release we get something like the following (I'll file a separate bug report for that failure once I know more; the specific failure is only an example here):

=======================short test summary info =======================
FAILED [0.0003s] export/test_lift_unlift.py::TestLift::test_duplicate_constant_access - OSError: /caffe2/test/cpp/jit:test_custom_class_registrations: cannot open shared object file: No such file or directory
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
======================= 1 failed, 2 rerun in 0.07s =======================
Got exit code 1
Retrying...
[above repeated a few times]
Got exit code 1
Retrying...
======================short test summary info ======================
FAILED [0.0003s] export/test_lift_unlift.py::TestLift::test_unlift_nonpersistent_buffer - OSError: /caffe2/test/cpp/jit:test_custom_class_registrations: cannot open shared object file: No such file or dire...
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
===================== 1 failed, 3 deselected, 2 rerun in 0.07s =============
Got exit code 1
Retrying...
[Similar until]
==================== 5 deselected in 0.06s ====================
The following tests failed consistently: ['test/export/test_lift_unlift.py::TestLift::test_duplicate_constant_access', 'test/export/test_lift_unlift.py::TestLift::test_lift_basic', 'test/export/test_lift_unlift.py::TestLift::test_lift_nested', 'test/export/test_lift_unlift.py::TestLift::test_unlift_nonpersistent_buffer', 'test/export/test_lift_unlift.py::ConstantAttrMapTest::test_dict_api']
export/test_lift_unlift 1/1 failed!

This is mostly due to 3b7d60b, which removed the following and replaced the whole logic with a custom pytest plugin that reruns tests differently (basically: manually rerun the suite, continuing after the first failure until the next one, then repeat):

    elif options.continue_through_error:
        # If continue through error, don't stop on first failure
        rerun_options = ["--reruns=2"]
[Other test]
==================49 passed, 4 skipped, 41 deselected in 7.04s ==================

So the output can no longer reasonably be parsed to get the number of failures or the number of tests.
Additionally, as tests are run in parallel, partial outputs from different tests appear interleaved, making it virtually impossible to attribute a given piece of output to a specific test.

I have seen the --save-xml option, which seems to be intended for such purposes, but it isn't passed by run_test.py to the tests it spawns. Hence I'm reporting this as a bug rather than only a feature request.

The main question: Is there any reliable way to extract the number of failed and run tests after running run_test.py?
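
For context, a bare pytest session can be made to emit machine-readable counts with a small conftest.py hook; the sketch below only shows the generic pytest mechanism (run_test.py does not do this today, and the output file name is made up):

    # conftest.py (sketch): dump pass/fail/skip counts as JSON after the run.
    import json

    def pytest_terminal_summary(terminalreporter, exitstatus, config):
        stats = terminalreporter.stats  # maps outcome name -> list of reports
        summary = {
            outcome: len(reports)
            for outcome, reports in stats.items()
            if outcome in ("passed", "failed", "error", "skipped", "rerun")
        }
        summary["exitstatus"] = int(exitstatus)
        # "pytest_summary.json" is an arbitrary name chosen for this sketch
        with open("pytest_summary.json", "w") as f:
            json.dump(summary, f)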

Versions

PyTorch 2.2.0-2.3.1 and main as of now

cc @mruberry @ZainRizvi

@mikaylagawarecki
Contributor

cc @clee2000 since you added 3b7d60b

@mikaylagawarecki added module: tests Issues related to tests (not the torch.testing module) triaged This issue has been looked at by a team member, and triaged and prioritized into an appropriate module labels May 20, 2024
@clee2000
Contributor

I can move the option to save xml into an argument. The paths to the xmls are generated by run_test.py though, is that ok?

@Flamefire
Collaborator Author

I can move the option to save xml into an argument. The paths to the xmls are generated by run_test.py though, is that ok?

You mean this code here:

def get_report_path(argv=UNITTEST_ARGS, pytest=False):
    test_filename = sanitize_test_filename(argv[0])
    test_report_path = TEST_SAVE_XML + LOG_SUFFIX
    test_report_path = os.path.join(test_report_path, test_filename)
    if pytest:
        test_report_path = test_report_path.replace('python-unittest', 'python-pytest')
        os.makedirs(test_report_path, exist_ok=True)
        test_report_path = os.path.join(test_report_path, f"{test_filename}-{os.urandom(8).hex()}.xml")
        return test_report_path
    os.makedirs(test_report_path, exist_ok=True)
    return test_report_path

To me it looks like all XML files go into the folder passed via --save-xml, but they might be in subfolders and might have random names. Unless --log-suffix is used (e.g.

command = [sys.executable] + argv + [f'--log-suffix=-shard-{i + 1}'] + test_batches[i]
), in which case a string is appended to the folder name. Still usable: --save-xml=/tmp/test-reports/report could be used and then all XMLs end up under /tmp/test-reports.
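
If that holds, collecting the reports afterwards would be simple; a minimal sketch assuming the /tmp/test-reports location from above:

    from pathlib import Path

    # Assumes --save-xml=/tmp/test-reports/report as described above, so every
    # shard/suffix variant ends up somewhere below /tmp/test-reports.
    report_files = sorted(Path("/tmp/test-reports").rglob("*.xml"))
    print(f"found {len(report_files)} XML report files")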

The main question is: Is that even a feasible approach? I.e., does this work to get all failed tests, especially with the retry logic, where I'm not sure whether it will skip the "consistently failing" test and still try all the others (before and after that failing one)?

As those XMLs don't seem to be used (otherwise it would already have been noticed that this got broken), is there any other/better approach?

@clee2000
Contributor

clee2000 commented May 20, 2024

I'm thinking something more along the lines of #126690. Ignore my previous comment as well.

Have you tried using --save-xml as an additional unittest arg?

Our CI makes xmls and saves them to s3; you can download them on HUD to see what format and file paths they take. There are already a couple of scripts in the repository that parse the xmls to get certain information, so you can probably copy a bunch of code from those.

It doesn't get all failed tests; there are some cases where xmls don't get made (#123882), but I think the log parsing method you used previously would also not have been able to handle this. It will handle most retries correctly, though you will have to dedup based on test name.
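
To make the dedup concrete, here is a rough sketch; it assumes standard pytest JUnit XML (testcase elements with classname/name and optional failure/error/skipped children) and simply lets the last report seen for a test id win, which is only one possible dedup policy:

    import xml.etree.ElementTree as ET
    from pathlib import Path

    def collect_results(report_dir):
        """Aggregate JUnit XMLs under report_dir, deduping reruns by test id."""
        results = {}  # "classname.name" -> outcome; later files overwrite earlier ones
        for xml_file in sorted(Path(report_dir).rglob("*.xml")):
            for case in ET.parse(xml_file).iter("testcase"):
                test_id = f"{case.get('classname')}.{case.get('name')}"
                if case.find("failure") is not None or case.find("error") is not None:
                    outcome = "failed"
                elif case.find("skipped") is not None:
                    outcome = "skipped"
                else:
                    outcome = "passed"
                results[test_id] = outcome
        return results

    results = collect_results("/tmp/test-reports")  # path is an assumption
    failed = [t for t, outcome in results.items() if outcome == "failed"]
    print(f"{len(results)} tests, {len(failed)} failed")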

I guess it's also worth noting that you could parse the logs since all the test names should be there if you use the verbose option?

@Flamefire
Collaborator Author

Our CI makes xmls and saves them to s3

Where does that happen, i.e. where are the xmls created?

It doesn't get all failed tests; there are some cases where xmls don't get made (#123882)

There seem to be two issues: force-terminated tests and reruns. I'd expect the XML files to also contain the rerun tests even currently, as the file names are randomized.

but I think the log parsing method you used previously would also not have been able to handle this

IIRC, when pytest-rerunfailures was used (plain, prior to 3b7d60b), there was still a full summary line for each test file, like 2 failed, 128 passed, 2 rerun, which we could use.
For force-terminated files we treated it as a hard failure or excluded the test(s).

I guess it's also worth noting that you could parse the logs since all the test names should be there if you use the verbose option?

We actually do that already, to be able to print individual failed tests in addition to the summaries (per test file and total). It even allowed us to validate that we got all individual failed tests by checking against the summary. But that already turned out to be error-prone due to the different formats. Some examples (a rough matcher sketch follows below):

    # === FAIL: test_add_scalar_relu (quantization.core.test_quantized_op.TestQuantizedOps) ===
    # --- ERROR: test_all_to_all_group_cuda (__main__.TestDistBackendWithSpawn) ---
    # FAILED test_ops_gradients.py::TestGradientsCPU::test_fn_grad_linalg_det_singular_cpu_complex128 - [snip]
    # FAILED [22.8699s] test_sparse_csr.py::TestSparseCompressedCPU::test_invalid_input_csr_large_cpu - [snip]
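
A matcher covering just these formats ends up looking like the sketch below; anything not matching one of the patterns is silently missed, which is exactly what makes this fragile:

    import re

    # Patterns for the failure-line formats listed above (applied to raw log
    # lines, i.e. without the leading "# " used in the examples here).
    FAILED_LINE_PATTERNS = [
        # "=== FAIL: test_name (module.Class) ===" / "--- ERROR: ... ---"
        re.compile(r"^[=-]+ (FAIL|ERROR): (?P<test>\S+) \((?P<cls>[\w.]+)\)"),
        # "FAILED [22.87s] file.py::Class::test_name - message" (timing optional)
        re.compile(r"^FAILED (\[[\d.]+s\] )?(?P<id>\S+::\S+)"),
    ]

    def match_failed_line(line):
        for pattern in FAILED_LINE_PATTERNS:
            m = pattern.match(line)
            if m:
                return m.groupdict()
        return None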

@clee2000
Contributor

clee2000 commented May 20, 2024

The save xml argument is set by default by the CI environment variable, which is true in our CI. I guess the other option would be for you to set it as well, but it also turns on a couple of other things so you'd have to be careful

I'm not sure what a force terminated test is but the case where it doesn't generate xml is when a test segfaults, which leads to xml not getting created and also the === x failed, y skipped === line not getting printed. Then the rerun starts from the segfaulted test so all information about the tests before the segfaulted test gets lost. I can't remember what the previous behavior was, but depending on how far back it was, I'm pretty sure the previous test information still gets lost. Other than this, the reruns should get xml, it's just that it'll be in a different file so it's a bit more parsing.

I edited my comment because I realized I understood the usage of the save xml argument incorrectly, so I'm not sure if you saw this, but have you tried using --save-xml as an additional unittest arg? nvm, this also doesn't work the way I think it does.

@Flamefire
Collaborator Author

The save xml argument is set by default by the CI environment variable, which is true in our CI. I guess the other option would be for you to set it as well, but it also turns on a couple of other things so you'd have to be careful

Is that export CI=true set by e.g. GitHub? I found

pytorch/test/run_test.py

Lines 993 to 997 in b36e018

if IS_CI:
    # Add the option to generate XML test report here as C++ tests
    # won't go into common_utils
    test_report_path = get_report_path(pytest=True)
    pytest_args.extend(["--junit-xml-reruns", test_report_path])
in run_test.py, setting --junit-xml-reruns, but the same flag is set in common_utils only when --save-xml is passed (https://github.com/pytorch/pytorch/blob/b36e01801b89a516f4271f796773d5f4b43f1186/torch/testing/_internal/common_utils.py#L1108-L1111), and the former only applies to tests run with pytest. So it looks incomplete to me.
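
If IS_CI really is driven by the CI environment variable, a workaround on our side might be to force that code path when invoking run_test.py; this is only a sketch under that assumption, and setting CI may switch on other CI-only behaviour as well:

    import os
    import subprocess
    import sys

    # Force the IS_CI code path so run_test.py adds --junit-xml-reruns for the
    # pytest subprocesses it spawns (assumption: IS_CI is read from the CI env var).
    env = dict(os.environ, CI="1")
    subprocess.run(
        [sys.executable, "test/run_test.py", "--continue-through-error"],
        env=env,
        check=False,
    )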

I'm not sure what a force terminated test is but the case where it doesn't generate xml is when a test segfaults, which leads to xml not getting created

Yes, I meant test (files) terminated by a signal like SIGSEGV, but also SIGIOT (which we have observed).

all information about the tests before the segfaulted test gets lost. I can't remember what the previous behavior was, but depending on how far back it was, I'm pretty sure the previous test information still gets lost.

Yes, it was the same here: most of the information was lost (e.g. that summary line), but some individual tests could be found by parsing the log. That is fragile, though, see my previous comment.

Other than this, the reruns should get xml, it's just that it'll be in a different file so it's a bit more parsing.

Just to be sure I understood this correctly: for segfaulted tests there won't be any xmls even with the new rerun system?
