Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

test(robot-server): create isolation for tests and run them in parallel #11517

Closed
wants to merge 3 commits into from

Conversation

y3rsh
Copy link
Collaborator

@y3rsh y3rsh commented Sep 28, 2022

Overview

  • Massive speed up locally
    • M1 mac now takes ~1 minute
    • WSL takes ~2 minutes
  • Not much speed up in CI as the runner only goes to 2 pools

Changelog

  • Dynamically grab ports
  • Explicitly scope fixtures and confine tests as needed
  • Use built in temporary path from pytest

Review Requests

  • Is there a way not to repeat the code in *run_server? Since the params are fixtures I couldn't see a way.

Risk assessment

Low, test only. Not sure the port cleanup is great? Especially if ctrl-c or on failure?

@y3rsh y3rsh requested review from a team as code owners September 28, 2022 19:24
robot-server/Makefile Outdated Show resolved Hide resolved
tests ?= tests
cov_opts ?= --cov=$(SRC_PATH) --cov-report term-missing:skip-covered --cov-report xml:coverage.xml
test_opts ?=
test_opts ?= -n auto --dist loadscope
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In your experimentation, does it seem like loadscope has the intended effect? The online docs suggest it doesn't actually follow fixture scope; it's something coarser-grained based on files?

I guess if it misbehaves, the effect would be that expensive fixtures are redundantly set up across processes, which will be slower than expected but should at least be safe if the tests are written properly?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking into this more, I think we have performance concerns and safety concerns, but it's not loadscope's fault, per se.


On my dual-core 2018 MacBook Air:

Serial (old) Parallel (pytest-xdist)
Run time (wall clock) 3m25s 3m45s
CPU utilization 50% 100%
Worker processes 1 4

The parallel test suite is slower despite higher CPU utilization. This suggests to me that the tests are being distributed badly.

pytest-xdist worker process can initialize fixtures redundantly if test functions are distributed badly to them. For example, say you have 4 test functions that use the same scope="session" fixture. Then, pytest-xdist happens to distribute each of those test functions to a different worker process. This will cause the fixture to execute 4 times, doing 4x more work than necessary.

My theory is that we're running into this with some of our expensive fixtures, like run_server, and the penalty of doing more work is outweighing the benefit of doing that work in parallel.

If this theory is true, any performance improvements or regressions introduced by this PR would be largely luck-based: how many cores does the machine have, and how did pytest-xdist decide to distribute across them?

To fix this, we could either:

  • Make run_server (and other expensive scope="session" fixtures) truly global across worker processes, so they're not executed redundantly even if pytest-xdists distributes tests badly.
    • This would make scope="session" fixtures behave more closely to how we'd all hope and intuitively expect.
    • However, because run_server is so stateful, I think this is a dangerous path. Our integration tests can and do leak into each other, and if they do that in parallel or in an unpredictable order, it will cause flakiness and confusing failures.
  • Help pytest-xdist distribute tests better.
    • Ideally, pytest-xdist would schedule things intelligently based on which tests use which fixtures. There's a good and very old proposal for this, but it seems to have stalled.
    • So we might have to write a custom pytest-xdist scheduler, which is a quasi-documented part of the pytest-xdist API.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for the insight as always Max. If tavern supported parameterized marks other than skipif we could upgrade pytest-xdist to 2.5 and use --dist loadgroup with @pytest.mark.xdist_group(name="group1") but it does not.
So I think we can keep the fixture scoping, add marks to tests optimally run in xdist, then run some tests using xdist and some not.

tests ?= tests
cov_opts ?= --cov=$(SRC_PATH) --cov-report term-missing:skip-covered --cov-report xml:coverage.xml
test_opts ?=
test_opts ?= -n auto --dist loadscope
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nitpick: Can we spell out full-length command line options, for readability?

Suggested change
test_opts ?= -n auto --dist loadscope
test_opts ?= --numprocesses auto --dist loadscope

Comment on lines +115 to 118
def _request_session() -> requests.Session:
session = requests.Session()
session.headers.update({API_VERSION_HEADER: LATEST_API_VERSION_HEADER_VALUE})
return session
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks like this existed before this PR, but a Session is a resource that should be close()d, since it represents a connection pool (among other things), and we don't want to leak TCP connections.

I think we should rewrite this like:

@contextmanager
def _request_session() -> Generator[requests.Session, None, None]:
    with requests.Session() as session:
        session.headers.update({API_VERSION_HEADER: LATEST_API_VERSION_HEADER_VALUE})
        yield session

This ties into my other comments about using regular Python functions more and Pytest fixtures less.

Comment on lines 121 to +128
@pytest.fixture(scope="session")
def server_temp_directory() -> Iterator[str]:
new_dir = tempfile.mkdtemp()
def request_session() -> requests.Session:
return _request_session()


@pytest.fixture(scope="function")
def function_scope_request_session() -> requests.Session:
return _request_session()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These request_session and function_scope_request_session fixtures are only used in this file, correct? If I'm reading things correctly, they're just helpers to set up run_server and function_scope_run_server, which is what the tests actually care about?

I think our life gets a bit easier if we don't have fixtures for request_session and function_scope_request_session. If something in this file needs a Session, let it call _request_session() directly. No need to use Pytest's dependency injection machinery for it.

Per your review request, this will help deduplicate the code between run_server and function_scope_run_server. See my other comment.

Comment on lines +147 to +151
def _free_port() -> str:
with closing(socket.socket(socket.AF_INET, socket.SOCK_STREAM)) as sock:
sock.bind(("localhost", 0))
sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
return str(sock.getsockname()[1])
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks like this came from https://stackoverflow.com/a/45690594/497934, right?

I need to refresh my memory on how port allocation stuff like this works, but does it seem fishy that we close the socket when we return the port number? I worry that we're doing this:

  1. Open a socket and bind it to an automatically-chosen port
  2. Get the port that we just chose automatically
  3. Return the port and close the socket, thereby unintentionally freeing the port?

I'll dig around and see if there's a better way to do this, but I wonder if this whole thing needs to become a context manager.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here's what I've gathered:

Given all of that, I propose we ditch this port auto-allocation and just use random() port numbers. I figure if things are going to be dodgy no matter what, we can at least keep the code simple.

If random port numbers prove insufficient, we can either:

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will run with the random port and see how it goes.

Comment on lines +223 to +228
@pytest.fixture(scope="function")
def function_scope_run_server(
function_scope_request_session: requests.Session,
function_scope_server_temp_directory: str,
function_scope_free_port: str,
) -> Iterator["subprocess.Popen[Any]"]:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Addressing your review question about deduplication, I think this gets a lot easier if we use regular Python functions more and Pytest fixtures less.

For example, we could have a _run_server() function that's a regular Python context manager that runs the server and returns its URI or something.

And then, if we want to reuse the same server across all tests in a file, we can easily wrap that plain Python function in a 5-line module-scope fixture.

For example, this is what we do in one of our integration tests:

# Module-scope to avoid the overhead of restarting the server between test functions.
# This relies on the test functions only reading, never writing.
@pytest.fixture(scope="module")
def dev_server(module_scope_free_port: str) -> Generator[DevServer, None, None]:
port = module_scope_free_port
with DevServer(
port=port,
persistence_directory=_OLDER_PERSISTENCE_DIR,
) as server:
server.start()
yield server

In other words, the idea is to keep our reusable building blocks defined as plain Python functions, because they're easy to compose; and only wrap them all up in a properly-scoped fixture at the top level.

Comment on lines +5 to 6
- free_port
- run_server
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would refactor the free_port and run_server fixtures into a common fixture called something like server_url_base or global_server_url_base.

The value of this fixture would be a string like http://localhost:12345, pointing to a running dev server. The Tavern tests would use it like url: '{server_url_base}/runs'.

The Tavern tests would not explicitly use a run_server fixture.

tests ?= tests
cov_opts ?= --cov=$(SRC_PATH) --cov-report term-missing:skip-covered --cov-report xml:coverage.xml
test_opts ?=
test_opts ?= -n auto --dist loadscope
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking into this more, I think we have performance concerns and safety concerns, but it's not loadscope's fault, per se.


On my dual-core 2018 MacBook Air:

Serial (old) Parallel (pytest-xdist)
Run time (wall clock) 3m25s 3m45s
CPU utilization 50% 100%
Worker processes 1 4

The parallel test suite is slower despite higher CPU utilization. This suggests to me that the tests are being distributed badly.

pytest-xdist worker process can initialize fixtures redundantly if test functions are distributed badly to them. For example, say you have 4 test functions that use the same scope="session" fixture. Then, pytest-xdist happens to distribute each of those test functions to a different worker process. This will cause the fixture to execute 4 times, doing 4x more work than necessary.

My theory is that we're running into this with some of our expensive fixtures, like run_server, and the penalty of doing more work is outweighing the benefit of doing that work in parallel.

If this theory is true, any performance improvements or regressions introduced by this PR would be largely luck-based: how many cores does the machine have, and how did pytest-xdist decide to distribute across them?

To fix this, we could either:

  • Make run_server (and other expensive scope="session" fixtures) truly global across worker processes, so they're not executed redundantly even if pytest-xdists distributes tests badly.
    • This would make scope="session" fixtures behave more closely to how we'd all hope and intuitively expect.
    • However, because run_server is so stateful, I think this is a dangerous path. Our integration tests can and do leak into each other, and if they do that in parallel or in an unpredictable order, it will cause flakiness and confusing failures.
  • Help pytest-xdist distribute tests better.
    • Ideally, pytest-xdist would schedule things intelligently based on which tests use which fixtures. There's a good and very old proposal for this, but it seems to have stalled.
    • So we might have to write a custom pytest-xdist scheduler, which is a quasi-documented part of the pytest-xdist API.

Comment on lines 169 to 170
@pytest.fixture(scope="session")
def run_server(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For flakiness reasons, I don't think run_server can remain scope="session" in a parallel world.

My understanding is that run_server is used primarily (or exclusively?) by our Tavern integration tests. Historically, these tests have shared a single run_server per test session, for performance reasons.

Our Tavern tests occasionally leak and affect each other. For example, maybe one test leaves behind a run resource, which affects what subsequent tests see when they do GET /runs.

So far, the combination of these facts hasn't caused any flakiness, because pytest always runs the tests serially and in an order that's consistent in practice.

But in a pytest-xdist parallelized world, tests will be allocated to dev servers unpredictably. An integration test might flakily succeed or fail depending on what other tests happened to get allocated to the same server before it.

I think we either need to:

@y3rsh
Copy link
Collaborator Author

y3rsh commented Nov 22, 2022

closing, will reopen after #11682 is merged, will be pushing on that 11/29

@y3rsh y3rsh closed this Nov 22, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants