
Summary of tests in machine readable format #126523

Open
Flamefire opened this issue May 17, 2024 · 7 comments
Labels
module: tests Issues related to tests (not the torch.testing module) triaged This issue has been looked at by a team member, and triaged and prioritized into an appropriate module

Comments

@Flamefire
Collaborator

Flamefire commented May 17, 2024

🐛 Describe the bug

We are running the test suite (python run_test.py --continue-through-error) after installing PyTorch on our (many) clusters. For that we need to know how many tests were run, how many succeeded, and how many failed, to assess whether the installation works at all, especially in that environment. Many of the issues I reported and PRs I submitted are based on that.

Until recently we were able to parse the build output containing summary lines such as

    # ===================== 2 failed, 128 passed, 2 skipped, 2 warnings in 3.43s =====================
    # test_quantization failed!
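
For illustration only, a line like that can be consumed with a small regex on our side; this is a sketch of our own parsing, not anything provided by run_test.py:

    import re

    # Matches old-style pytest summary lines such as
    # "===== 2 failed, 128 passed, 2 skipped, 2 warnings in 3.43s ====="
    SUMMARY_RE = re.compile(r"=+ (?P<counts>.+) in [\d.]+s =+")
    COUNT_RE = re.compile(r"(\d+) (failed|passed|skipped|errors?|warnings?|rerun)")

    def parse_summary(line):
        """Return e.g. {'failed': 2, 'passed': 128, 'skipped': 2, 'warnings': 2} or None."""
        m = SUMMARY_RE.search(line)
        if m is None:
            return None
        return {kind: int(num) for num, kind in COUNT_RE.findall(m.group("counts"))}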

Such a line is easy to grep for and even to extract manually. However, since the 2.3.0 release we get something like the following (I'll file a separate bug report for that failure once I know more; the specific failure is only an example here):

=======================short test summary info =======================
FAILED [0.0003s] export/test_lift_unlift.py::TestLift::test_duplicate_constant_access - OSError: /caffe2/test/cpp/jit:test_custom_class_registrations: cannot open shared object file: No such file or directory
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
======================= 1 failed, 2 rerun in 0.07s =======================
Got exit code 1
Retrying...
[above repeated a few times]
Got exit code 1
Retrying...
======================short test summary info ======================
FAILED [0.0003s] export/test_lift_unlift.py::TestLift::test_unlift_nonpersistent_buffer - OSError: /caffe2/test/cpp/jit:test_custom_class_registrations: cannot open shared object file: No such file or dire...
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
===================== 1 failed, 3 deselected, 2 rerun in 0.07s =============
Got exit code 1
Retrying...
[Similar until]
==================== 5 deselected in 0.06s ====================
The following tests failed consistently: ['test/export/test_lift_unlift.py::TestLift::test_duplicate_constant_access', 'test/export/test_lift_unlift.py::TestLift::test_lift_basic', 'test/export/test_lift_unlift.py::TestLift::test_lift_nested', 'test/export/test_lift_unlift.py::TestLift::test_unlift_nonpersistent_buffer', 'test/export/test_lift_unlift.py::ConstantAttrMapTest::test_dict_api']
export/test_lift_unlift 1/1 failed!

This is mostly due to 3b7d60b, which removed the following and replaced the whole logic with a custom pytest plugin that reruns tests differently (basically: manually rerun the suite, continuing after the first failure until the next one, then repeat):

    elif options.continue_through_error:
        # If continue through error, don't stop on first failure
        rerun_options = ["--reruns=2"]
[Other test]
==================49 passed, 4 skipped, 41 deselected in 7.04s ==================

So the output can no longer reasonably be parsed to get the number of failures or the number of tests.
Additionally, as tests are run in parallel, partial outputs from different tests appear interleaved, making it virtually impossible to attribute a given piece of output to a specific test.

I have seen the --save-xml option, which seems to be intended for such purposes, but it isn't passed by run_test.py to the tests it spawns. Hence I'm reporting this as a bug rather than only a feature request.

The main question: Is there any reliable way to extract the number of failed and run tests after running run_test.py?
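
For context, a bare pytest session can be made to emit machine-readable counts with a small conftest.py hook; the sketch below only shows the generic pytest mechanism (run_test.py does not do this today, and the output file name is made up):

    # conftest.py (sketch): dump pass/fail/skip counts as JSON after the run.
    import json

    def pytest_terminal_summary(terminalreporter, exitstatus, config):
        stats = terminalreporter.stats  # maps outcome name -> list of reports
        summary = {
            outcome: len(reports)
            for outcome, reports in stats.items()
            if outcome in ("passed", "failed", "error", "skipped", "rerun")
        }
        summary["exitstatus"] = int(exitstatus)
        # "pytest_summary.json" is an arbitrary name chosen for this sketch
        with open("pytest_summary.json", "w") as f:
            json.dump(summary, f)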

Versions

PyTorch 2.2.0-2.3.1 and main as of now

cc @mruberry @ZainRizvi

@mikaylagawarecki
Contributor

cc @clee2000 since you added 3b7d60b

@mikaylagawarecki added module: tests Issues related to tests (not the torch.testing module) triaged This issue has been looked at by a team member, and triaged and prioritized into an appropriate module labels May 20, 2024
@clee2000
Contributor

I can move the option to save xml into an argument. The paths to the xmls are generated by run_test.py though, is that ok?

@Flamefire
Collaborator Author

I can move the option to save xml into an argument. The paths to the xmls are generated by run_test.py though, is that ok?

You mean this code here:

def get_report_path(argv=UNITTEST_ARGS, pytest=False):
    test_filename = sanitize_test_filename(argv[0])
    test_report_path = TEST_SAVE_XML + LOG_SUFFIX
    test_report_path = os.path.join(test_report_path, test_filename)
    if pytest:
        test_report_path = test_report_path.replace('python-unittest', 'python-pytest')
        os.makedirs(test_report_path, exist_ok=True)
        test_report_path = os.path.join(test_report_path, f"{test_filename}-{os.urandom(8).hex()}.xml")
        return test_report_path
    os.makedirs(test_report_path, exist_ok=True)
    return test_report_path

To me it looks like all XML files go into the folder passed via --save-xml, but they might be in subfolders and might have random names. Unless --log-suffix is used (e.g.

command = [sys.executable] + argv + [f'--log-suffix=-shard-{i + 1}'] + test_batches[i]
), in which case a string is appended to the folder name. Still usable: --save-xml=/tmp/test-reports/report could be used and then all XMLs end up under /tmp/test-reports.
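
If that holds, collecting the reports afterwards would be simple; a minimal sketch assuming the /tmp/test-reports location from above:

    from pathlib import Path

    # Assumes --save-xml=/tmp/test-reports/report as described above, so every
    # shard/suffix variant ends up somewhere below /tmp/test-reports.
    report_files = sorted(Path("/tmp/test-reports").rglob("*.xml"))
    print(f"found {len(report_files)} XML report files")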

The main question is: Is that even a feasible approach? I.e., does this work to get all failed tests, especially with the retry logic, where I'm not sure whether it will skip the "consistently failing" test and still try all the others (before and after that failing one)?

As those XMLs don't seem to be used (otherwise it would already have been noticed that this got broken), is there any other/better approach?

@clee2000
Contributor

clee2000 commented May 20, 2024

I'm thinking something more along the lines of #126690. Ignore my previous comment as well.

Have you tried using --save-xml as an additional unittest arg?

Our CI makes xmls and saves them to s3; you can download them on HUD to see what format and file paths they take. There are already a couple of scripts in the repository that parse the xmls to get certain information, so you can probably copy a bunch of code from those.

It doesn't get all failed tests; there are some cases where xmls don't get made (#123882), but I think the log parsing method you used previously would also not have been able to handle this. It will handle most retries correctly, though you will have to dedup based on test name.
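
To make the dedup concrete, here is a rough sketch; it assumes standard pytest JUnit XML (testcase elements with classname/name and optional failure/error/skipped children) and simply lets the last report seen for a test id win, which is only one possible dedup policy:

    import xml.etree.ElementTree as ET
    from pathlib import Path

    def collect_results(report_dir):
        """Aggregate JUnit XMLs under report_dir, deduping reruns by test id."""
        results = {}  # "classname.name" -> outcome; later files overwrite earlier ones
        for xml_file in sorted(Path(report_dir).rglob("*.xml")):
            for case in ET.parse(xml_file).iter("testcase"):
                test_id = f"{case.get('classname')}.{case.get('name')}"
                if case.find("failure") is not None or case.find("error") is not None:
                    outcome = "failed"
                elif case.find("skipped") is not None:
                    outcome = "skipped"
                else:
                    outcome = "passed"
                results[test_id] = outcome
        return results

    results = collect_results("/tmp/test-reports")  # path is an assumption
    failed = [t for t, outcome in results.items() if outcome == "failed"]
    print(f"{len(results)} tests, {len(failed)} failed")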

I guess it's also worth noting that you could parse the logs since all the test names should be there if you use the verbose option?

@Flamefire
Collaborator Author

Our CI makes xmls and saves them to s3

Where does that happen, i.e. where are the xmls created?

It doesn't get all failed tests; there are some cases where xmls don't get made (#123882)

There seem to be two issues: force-terminated tests and reruns. I'd expect the XML files to also contain the rerun tests even currently, as the file names are randomized.

but I think the log parsing method you used previously would also not have been able to handle this

IIRC, when pytest-rerunfailures was used (plain, prior to 3b7d60b), there was still a full summary line for each test file, like 2 failed, 128 passed, 2 rerun, which we could use.
For force-terminated files we treated it as a hard failure or excluded the test(s).

I guess it's also worth noting that you could parse the logs since all the test names should be there if you use the verbose option?

We actually do that already, to be able to print individual failed tests in addition to the summaries (per test file and total). It even allowed us to validate that we got all individual failed tests by checking against the summary. But that already turned out to be error-prone due to the different formats. Some examples (a rough matcher sketch follows below):

    # === FAIL: test_add_scalar_relu (quantization.core.test_quantized_op.TestQuantizedOps) ===
    # --- ERROR: test_all_to_all_group_cuda (__main__.TestDistBackendWithSpawn) ---
    # FAILED test_ops_gradients.py::TestGradientsCPU::test_fn_grad_linalg_det_singular_cpu_complex128 - [snip]
    # FAILED [22.8699s] test_sparse_csr.py::TestSparseCompressedCPU::test_invalid_input_csr_large_cpu - [snip]
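
A matcher covering just these formats ends up looking like the sketch below; anything not matching one of the patterns is silently missed, which is exactly what makes this fragile:

    import re

    # Patterns for the failure-line formats listed above (applied to raw log
    # lines, i.e. without the leading "# " used in the examples here).
    FAILED_LINE_PATTERNS = [
        # "=== FAIL: test_name (module.Class) ===" / "--- ERROR: ... ---"
        re.compile(r"^[=-]+ (FAIL|ERROR): (?P<test>\S+) \((?P<cls>[\w.]+)\)"),
        # "FAILED [22.87s] file.py::Class::test_name - message" (timing optional)
        re.compile(r"^FAILED (\[[\d.]+s\] )?(?P<id>\S+::\S+)"),
    ]

    def match_failed_line(line):
        for pattern in FAILED_LINE_PATTERNS:
            m = pattern.match(line)
            if m:
                return m.groupdict()
        return None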

@clee2000
Contributor

clee2000 commented May 20, 2024

The save xml argument is set by default by the CI environment variable, which is true in our CI. I guess the other option would be for you to set it as well, but it also turns on a couple of other things so you'd have to be careful

I'm not sure what a force terminated test is but the case where it doesn't generate xml is when a test segfaults, which leads to xml not getting created and also the === x failed, y skipped === line not getting printed. Then the rerun starts from the segfaulted test so all information about the tests before the segfaulted test gets lost. I can't remember what the previous behavior was, but depending on how far back it was, I'm pretty sure the previous test information still gets lost. Other than this, the reruns should get xml, it's just that it'll be in a different file so it's a bit more parsing.

I edited my comment because I realized I understood the usage of the save xml argument incorrectly, so I'm not sure if you saw this, but have you tried using --save-xml as an additional unittest arg? nvm, this also doesn't work the way I think it does.

@Flamefire
Collaborator Author

The save xml argument is set by default by the CI environment variable, which is true in our CI. I guess the other option would be for you to set it as well, but it also turns on a couple of other things so you'd have to be careful

Is that export CI=true set by e.g. GitHub? I found

pytorch/test/run_test.py

Lines 993 to 997 in b36e018

if IS_CI:
    # Add the option to generate XML test report here as C++ tests
    # won't go into common_utils
    test_report_path = get_report_path(pytest=True)
    pytest_args.extend(["--junit-xml-reruns", test_report_path])
in run_test.py, setting --junit-xml-reruns, but the same flag is set in common_utils only when --save-xml is passed (https://github.com/pytorch/pytorch/blob/b36e01801b89a516f4271f796773d5f4b43f1186/torch/testing/_internal/common_utils.py#L1108-L1111), and the former only applies to tests run with pytest. So it looks incomplete to me.
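
If IS_CI really is driven by the CI environment variable, a workaround on our side might be to force that code path when invoking run_test.py; this is only a sketch under that assumption, and setting CI may switch on other CI-only behaviour as well:

    import os
    import subprocess
    import sys

    # Force the IS_CI code path so run_test.py adds --junit-xml-reruns for the
    # pytest subprocesses it spawns (assumption: IS_CI is read from the CI env var).
    env = dict(os.environ, CI="1")
    subprocess.run(
        [sys.executable, "test/run_test.py", "--continue-through-error"],
        env=env,
        check=False,
    )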

I'm not sure what a force terminated test is but the case where it doesn't generate xml is when a test segfaults, which leads to xml not getting created

Yes, I meant test (files) terminated by a signal like SIGSEGV, but also SIGIOT (which we have observed).

all information about the tests before the segfaulted test gets lost. I can't remember what the previous behavior was, but depending on how far back it was, I'm pretty sure the previous test information still gets lost.

Yes, it was the same here: most of the information was lost (e.g. that summary line), but some individual tests could be found by parsing the log. That is fragile, though, see my previous comment.

Other than this, the reruns should get xml, it's just that it'll be in a different file so it's a bit more parsing.

Just to be sure I understood this correctly: for segfaulted tests there won't be any xmls even with the new rerun system?
