Failing tests #6955

tripleee · 2022-05-06T10:37:30Z

What problem has occurred? What issues has it caused?

Several of the recent tests are failures, possibly due to the recent refactoring #6948

For example, https://github.com/Charcoal-SE/SmokeDetector/runs/6320800336?check_suite_focus=true failed with the error traceback below.

What would you like to happen/not happen?

Tests should succeed when nothing is wrong (-:

May be relevant: When I run the tests locally, I get this warning:

test/test_blacklists.py:215
  /home/tripleee/charcoal/SmokeDetector/test/test_blacklists.py:215: PytestUnknownMarkWarning: Unknown pytest.mark.xdist_group - is this a typo?  You can register custom marks to avoid this warning - for details, see https://docs.pytest.org/en/stable/mark.html
    @pytest.mark.xdist_group(name="long_runner_1")

Here is the traceback from the example:

def test_metasmoke():
        orig_tell_rooms_with = chatcommunicate.tell_rooms_with
        chatcommunicate.tell_rooms_with = dummy_tell_rooms_with
        msg = Fake({
            "owner": {
                "name": "ArtOfCode",
                "id": 121520,
                "is_moderator": False
            },
            "room": {
                "id": 11540,
                "_client": {
                    "host": "stackexchange.com"
                }
            },
            "_client": {
                "host": "stackexchange.com"
            }
        })
        msg_source = "metasmoke is {}. Current failure count: {} " + "({id})".format(id=GlobalVars.location)
    
        chatcommands.metasmoke(original_msg=msg, alias_used="ms-up")
        assert GlobalVars.MSStatus.is_up()
        chatcommands.metasmoke(original_msg=msg, alias_used="ms-down")
>       assert GlobalVars.MSStatus.is_down()
E       AssertionError: assert False
E        +  where False = <function GlobalVars.MSStatus.is_down at 0x7f41f81237f0>()
E        +    where <function GlobalVars.MSStatus.is_down at 0x7f41f81237f0> = <class 'globalvars.GlobalVars.MSStatus'>.is_down
E        +      where <class 'globalvars.GlobalVars.MSStatus'> = GlobalVars.MSStatus

test/test_chatcommands.py:656: AssertionError
----------------------------- Captured stderr call -----------------------------
[09:55:04.7[34](https://github.com/Charcoal-SE/SmokeDetector/runs/6320800336?check_suite_focus=true#step:10:34)] KeyError: ('stackexchange.com', 11540)
2022-05-06 09:55:04.73[36](https://github.com/Charcoal-SE/SmokeDetector/runs/6320800336?check_suite_focus=true#step:10:36)09 UTC
  File "/home/runner/work/SmokeDetector/SmokeDetector/chatcommunicate.py", line 482, in __call__
    result = self.__func__(*processed_args, alias_used=alias_used)
  File "/home/runner/work/SmokeDetector/SmokeDetector/chatcommands.py", line 926, in metasmoke
    if not is_privileged(msg.owner, msg.room):
  File "/home/runner/work/SmokeDetector/SmokeDetector/chatcommunicate.py", line 432, in is_privileged
    return user.id in _privileges[(room._client.host, room.id)] or user.is_moderator



  File "/home/runner/work/SmokeDetector/SmokeDetector/chatcommunicate.py", line 482, in __call__
    result = self.__func__(*processed_args, alias_used=alias_used)
  File "/home/runner/work/SmokeDetector/SmokeDetector/chatcommands.py", line 926, in metasmoke
    if not is_privileged(msg.owner, msg.room):
  File "/home/runner/work/SmokeDetector/SmokeDetector/chatcommunicate.py", line 432, in is_privileged
    return user.id in _privileges[(room._client.host, room.id)] or user.is_moderator

[09:55:04.738] KeyError: ('stackexchange.com', 11540)
2022-05-06 09:55:04.7[37](https://github.com/Charcoal-SE/SmokeDetector/runs/6320800336?check_suite_focus=true#step:10:37)917 UTC
  File "/home/runner/work/SmokeDetector/SmokeDetector/chatcommunicate.py", line 482, in __call__
    result = self.__func__(*processed_args, alias_used=alias_used)
  File "/home/runner/work/SmokeDetector/SmokeDetector/chatcommands.py", line 926, in metasmoke
    if not is_privileged(msg.owner, msg.room):
  File "/home/runner/work/SmokeDetector/SmokeDetector/chatcommunicate.py", line [43](https://github.com/Charcoal-SE/SmokeDetector/runs/6320800336?check_suite_focus=true#step:10:43)2, in is_privileged
    return user.id in _privileges[(room._client.host, room.id)] or user.is_moderator



  File "/home/runner/work/SmokeDetector/SmokeDetector/chatcommunicate.py", line [48](https://github.com/Charcoal-SE/SmokeDetector/runs/6320800336?check_suite_focus=true#step:10:48)2, in __call__
    result = self.__func__(*processed_args, alias_used=alias_used)
  File "/home/runner/work/SmokeDetector/SmokeDetector/chatcommands.py", line 926, in metasmoke
    if not is_privileged(msg.owner, msg.room):
  File "/home/runner/work/SmokeDetector/SmokeDetector/chatcommunicate.py", line 432, in is_privileged
    return user.id in _privileges[(room._client.host, room.id)] or user.is_moderator

The text was updated successfully, but these errors were encountered:

tripleee · 2022-05-06T10:45:04Z

Not every test fails; for example, https://github.com/Charcoal-SE/SmokeDetector/commit/2e6783c67371f43192757b332345550944508ce3/checks succeeded just now.

This should partially resolve Charcoal-SE#6955. There's at least one more test failing intermittently, but this should resolve the majority. autopull

makyen · 2022-05-08T16:28:48Z

The closure of this issue was automated by GitHub as a result of my saying "partially resolve #[issue number]" in the commit. Apparently, it's just looking for "resolve #[issue number]" in order to invoke auto-close. That commit should resolve the majority of the failures we've seen, but we'll have to wait to see, as the issue was that tests that didn't fully set up the test environment were being run without having previously run a test earlier in the file which did set it up (i.e. the tests were not independent).

That's very likely what's happening with the other test which appears to be failing:

Test name	Link(s) to failing test(s)
`test/test_chatcommunicate.py::test_init`	1 [There's at least one prior to this, which, IIRC, looked basically the same.]

I'll take a look at that one too, but probably later.

makyen · 2022-05-22T02:59:17Z

Another potentially related CI failure:

https://app.circleci.com/pipelines/github/Charcoal-SE/SmokeDetector/71161/workflows/ccff8a1d-3aa6-46b2-9ed0-0438f1e95a75/jobs/109367

Failed: Privileged user 22202197 does not have a corresponding entry in users.yml

test/test_chatcommunicate.py:45: Failed
=========================== short test summary info ============================
FAILED test/test_chatcommunicate.py::test_validate_yaml - Failed: Privileged ...

There have been more of these: on 2022-05-24: 1, 2, 3

makyen · 2022-05-25T13:40:11Z

There have been multiple additional failures in test/test_chatcommunicate.py (e.g. 1, 2, 3).

While this is an annoyance wrt. having the CI tests fail, it does reveal a real issue, which is that chatcommunicate.py is substantially lacking wrt. thread safety. I've looked through the code in chatcommunicate.py and made edits to fix that. Before submitting a PR, I still need to check if there are other portions of the code which access or modify chatcommunicate.py's globals and make sure they use the new locks. Then, some testing to verify that there aren't any obvious deadlocks created.

makyen · 2022-05-27T18:46:01Z

For the issue you mention in the initial comment here when running the tests locally, that appears to be that the pytest-xdist package is not being applied during the tests. The error is saying that it doesn't understand a pytest.mark which is defined in that package. I added the pytest-xdist package to requirements.txt when the CI testing was changed to running tests in parallel. Have you installed the pytest-xdist package? If not, installing it will probably be the solution. If it is installed, then we'll need to see if we can figure out why it isn't automatically incorporated in the testing.

I'm sorry, I should have explicitly mentioned to you that the requirement.txt file had been updated. I also recently added psutil to requirements.txt, if you haven't installed that. That package is used in bodyfetcher.py to prevent starting additional threads when the CPU is already heavily loaded.

As for testing locally, my personal choice was to change the script which I use to start the tests to have pytest run the tests in parallel in order to substantially reduce the overall elapsed time, but use OS scripting methods to reduce the priority of the tests, so that running them doesn't interfere with my local system being interactive. If you do choose to run them in parallel, I'd suggest setting the number of threads to use to one more than the number of CPU cores which you are willing to have working on the task. The tests are configured such that one thread starts up first running the long DNS based verification test, which is very network bound and takes very little CPU. All remaining threads will tend to be first occupied by CPU intensive tasks. For example, when wanting to primarily allocate 4 CPU cores to the testing, I use python3 -W default::Warning -m pytest -n 5 --dist loadgroup to start up 1 thread which isn't CPU intensive and 4 which are CPU intensive. You should, of course, adjust the -n <x> setting to whatever fits for your setup. If you don't desire to have them run in parallel, then the existing way which you are invoking the tests should work, as long as pytest-xdist is installed.

tripleee · 2022-05-29T16:02:28Z

Thanks for the tips. For the record, I probably installed psutil manually at the time, but probably hadn't noticed pytest-xdist. (Not near my computer now.)

makyen · 2022-05-31T13:40:27Z

For the failure which is

FAILED test/test_chatcommunicate.py::test_validate_yaml - Failed: Privileged user 22202197 does not have a corresponding entry in users.yml

I added some extra printing, both of the data structures being tested and the raw file contents, as read within the test. The most recent error on Circle CI of this type shows the raw text contents of the file, as read by SD, doesn't have the line for that user (last lines of output in the "Pytest" section). However, the line for that user does, in fact, exist in the file when is checked out of git outside of the CI environment. It's unclear to me why that line is not being read. It's almost as if it's reading an older version of the file. The next thing to determine is if the file, as stored in the checked out repository, really doesn't contain that line or if the problem is it's not being read properly. In other words, is it a problem with the git checkout not putting the right version of the file on disk, or is it a problem with Python reading the file.

There's another issue with tests which involve reading from disk which also is intermittently causing errors. I haven't tracked it as far as this one, but the test_check_if_spam_json() test in test_spamhandling.py sometimes fails, due to being called with no test JSON data. That data is supposed to be read at the start of testing from the test/data_test_spamhandling.txt file, but the data when the test is called is empty. It's not clear these issues are related. I'm just noting it here as they both involve reading data from disk within the tests.

makyen · 2022-05-31T16:20:25Z

I've added cating the two files, users.yml and test/data_test_spamhandling.txt to the CircleCI tests just prior to executing pytest. That should at least tell us what's on the disk at that time.

Hmmmm... what could be happening for these is that the file contents are changing due to us performing git operations in other tests. I wouldn't have expected that to affect test/data_test_spamhandling.txt, as that data should be read prior any tests being run. But, it could affect users.yml, as that's read within the test.

makyen · 2022-06-02T16:59:05Z

It looks like this CircleCI test run confirms that the contents of the users.yml file which are read in the test/test_chatcommunicate.py::test_validate_yaml() test are different than what was on the disk prior to running pytest. So, either that test is somehow reading the contents of the file incorrectly, or we have something that's changing the content that's on the disk (which seems more likely). Given that we do have tests which do git operations, we should take a look at those to see if any of them change the contents of the files. It may mean we need to run those tests separately, or in some way prevent interference between tests.

makyen · 2022-06-08T14:40:17Z

As to the more prevalent sporadic errors (at least 3 today: 1, 2, 3:

FAILED test/test_chatcommunicate.py::test_validate_yaml - Failed: Privileged user 22202197 does not have a corresponding entry in users.yml

which I mention above, I've been working on adding threading locks on quite a bit of stuff which we use across threads, both in CI testing, but more importantly in normal operation of SD. Some of those changes should address the above issue, or at least reduce it. Given that those changes are adding a bunch of locks, I'd prefer to wait for the weekend to put them in. I'll try to get what I have so far into a PR in the next day or two. While I've made a lot of changes, there's still substantial sections of code which need to be reviewed for thread safety.

In addition to the changes I have so far for our code, I also have changes for the regex package. I still need to look in detail at both the ChatExchange and phonenumbers packages wrt. thread safety (a cursory check makes it appear both have issues which will need to be resolved).

makyen self-assigned this May 6, 2022

makyen closed this as completed in 2a92089 May 8, 2022

makyen reopened this May 8, 2022

makyen added type: bug Aaaah! Kill it! area: CI testing status: confirmed Confirmed as something that needs working on. labels May 22, 2022

makyen added the area: thread safety label May 25, 2022

makyen mentioned this issue May 27, 2022

Use threading.RLock() locks for global values in chatcommunicate.py #7033

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Failing tests #6955

Failing tests #6955

tripleee commented May 6, 2022

tripleee commented May 6, 2022

makyen commented May 8, 2022 •

edited

makyen commented May 22, 2022 •

edited

makyen commented May 25, 2022

makyen commented May 27, 2022

tripleee commented May 29, 2022

makyen commented May 31, 2022

makyen commented May 31, 2022

makyen commented Jun 2, 2022

makyen commented Jun 8, 2022

Failing tests #6955

Failing tests #6955

Comments

tripleee commented May 6, 2022

What problem has occurred? What issues has it caused?

What would you like to happen/not happen?

tripleee commented May 6, 2022

makyen commented May 8, 2022 • edited

makyen commented May 22, 2022 • edited

makyen commented May 25, 2022

makyen commented May 27, 2022

tripleee commented May 29, 2022

makyen commented May 31, 2022

makyen commented May 31, 2022

makyen commented Jun 2, 2022

makyen commented Jun 8, 2022

makyen commented May 8, 2022 •

edited

makyen commented May 22, 2022 •

edited