parallelism of pytest tests #1702

Closed
themr0c opened this issue Jan 24, 2019 · 20 comments

@themr0c
Contributor

themr0c commented Jan 24, 2019

Issue Type

  • Bug report

To execute the tests, pytest-xdist is installed as a requirement, but pytest is invoked without the -n option, which would allow some parallelism during the tests.

See: https://pypi.org/project/pytest-xdist/

In pytest.ini, we could complete the addopts line with an option suited to the Travis runner (to be determined).

[pytest]
addopts = -n3

This option has to be tested: it might reduce the duration of Travis builds.

@themr0c
Contributor Author

themr0c commented Jan 24, 2019

@webknjaz
Member

FTR, xdist internally identifies Travis and knows that it has two cores.
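
Given that, pytest-xdist also accepts auto as the worker count, so instead of hard-coding a number the addopts line could read as follows (untested here, worth benchmarking against -n3 on Travis):

[pytest]
addopts = -n auto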

@webknjaz
Member

Depends on: #1573
Fix PoC: webknjaz@87ccd81

@decentral1se decentral1se added this to the v.2.21 milestone Jan 31, 2019
@decentral1se
Contributor

Added #1573 (comment). Any further ideas on this? I'm wondering if we can use State to resolve this whole naming issue internally (some k:v mapping where the instance maps to the instance+uuid name, or so) and keep the end-user-facing instance names.

@webknjaz
Member

Not sure, it'd also be weird: when users run docker ps they'll see instance names with UUIDs...

@webknjaz
Member

@lwm after rethinking what you wrote, I think you're actually on the right track and that using state would probably be beneficial.

@decentral1se
Contributor

Further data points: #1715 (comment).

@ssbarnea
Member

I doubt we are facing a pytest-xdist issue here; it is mostly the lack of concurrency support in molecule that prevents us from running tests in parallel. I attempted to run the functional ones and it mostly fell apart. Still, this does not mean that we should not sort this out. It just means that it will take a lot of time to make the changes required to allow molecule to run multiple instances in parallel, under the same user on a single machine.

I do want to be able to run molecule in parallel on multiple roles and multiple scenarios, and I already have repositories where this is desirable. At the moment we run sequentially, but this increases the runtime considerably as the number of scenarios and roles grows.

@ssbarnea
Member

I created the feature/parallel branch so we can start working towards this goal.

@decentral1se
Contributor

decentral1se commented Jun 22, 2019

We need to solve two things here:

  • Making the scenario /tmp/... folder unique (files overwritten)
  • Making the instance name unique for the underlying driver (instances overwritten)
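
A rough sketch of what this could look like, assuming a per-run identifier is generated once and threaded through both the ephemeral directory and the instance names (the function names and the directory layout below are hypothetical, not molecule's actual API):

import os
import uuid


def make_run_id():
    # One identifier per molecule invocation; everything derived from it is unique.
    return uuid.uuid4().hex[:8]


def unique_ephemeral_directory(scenario_name, run_id):
    # e.g. /tmp/molecule/default-3f2a1c9b instead of a shared /tmp/... folder
    return os.path.join("/tmp", "molecule", "{}-{}".format(scenario_name, run_id))


def unique_instance_name(instance_name, run_id):
    # e.g. "instance-3f2a1c9b" so concurrent runs never collide on the driver side
    return "{}-{}".format(instance_name, run_id)


run_id = make_run_id()
print(unique_ephemeral_directory("default", run_id))
print(unique_instance_name("instance", run_id))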

@ssbarnea
Member

I am a bit worried about unique instance names because it means no caching would be available. Also remember that users may run different steps at any time and in any order (test, verify, converge, destroy, ...). This makes me think that, at most, we could make the temp folder unique per scenario path (checksum of the full path?).
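
A minimal sketch of that checksum idea, assuming the scenario's absolute path is the key (the directory layout is illustrative, not molecule's actual one):

import hashlib
import os


def scenario_temp_dir(scenario_path):
    # The same scenario path always maps to the same temp folder, so caching still
    # works, while different scenarios (or checkouts) get separate folders.
    digest = hashlib.sha256(os.path.abspath(scenario_path).encode("utf-8")).hexdigest()[:12]
    return os.path.join("/tmp", "molecule", digest)


print(scenario_temp_dir("roles/myrole/molecule/default"))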

Another concern is leftovers. Who is going to ensure that we do not end up with huge leftovers in molecule temp folders (from partial executions)?

BTW, I am considering having a look at how pre-commit does the caching of repos; maybe I can learn something from it. But as my time is limited, I would appreciate it if someone else could look into that.

@decentral1se
Contributor

I've been trying to make progress based on #2108 (review). Molecule partial sequence runs (create_sequence, for example) cannot maintain state concurrently because the following sequence can't tell which cache it should load from the last run.

So, I've been thinking about this proposal:

  • We add a --parallel flag to only those molecule invocations which result in a destroy step in the sequence. These do not require a file-based cache, so we can keep that state in memory. Each instance gets a unique UUID name for the duration of execution. No file cache, no problems.

We have the following sequences now: create_sequence, check_sequence, converge_sequence, destroy_sequence and test_sequence. Of those, only create_sequence and converge_sequence cannot be made concurrent.

We'd have to adapt our functional tests to only test sequences that we can run concurrently and rely on the fact that the create/converge moving parts are tested in the context of another functional test. I think that is fine in practice.

Thoughts?
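
To illustrate the gating part of the proposal, here is a sketch of how a hypothetical --parallel flag could be allowed only for sequences whose last step is destroy (the sequence contents are shortened placeholders, not molecule's real definitions):

# Shortened sequences for illustration; the real ones contain more steps.
SEQUENCES = {
    "create_sequence": ["dependency", "create", "prepare"],
    "converge_sequence": ["dependency", "create", "prepare", "converge"],
    "check_sequence": ["dependency", "create", "converge", "check", "destroy"],
    "destroy_sequence": ["dependency", "cleanup", "destroy"],
    "test_sequence": ["dependency", "create", "converge", "verify", "destroy"],
}


def supports_parallel(sequence_name):
    # Only sequences that clean up after themselves can safely use throwaway UUID names.
    return SEQUENCES[sequence_name][-1] == "destroy"


for name in SEQUENCES:
    print(name, supports_parallel(name))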

@decentral1se decentral1se pinned this issue Jun 24, 2019
@decentral1se
Contributor

OK, after a quick spike, the above plan is not workable because something must be stored to disk during the sequence run (like the downloaded role files). So, I'm going to try to adjust the internal APIs to be parallelizable (use random UUIDs for paths and instances) instead of trying to keep all the internal state in memory. This seems doable and might actually be easier to manage.

@decentral1se
Contributor

I've got a proof of concept over at #2135.

decentral1se added a commit to decentral1se/molecule that referenced this issue Jun 28, 2019
decentral1se added a commit to decentral1se/molecule that referenced this issue Jun 28, 2019
decentral1se added a commit to decentral1se/molecule that referenced this issue Jun 28, 2019
@decentral1se
Contributor

decentral1se commented Jun 28, 2019

OK, #2137 is out now, working on this. After that is merged, I think we can start to address this issue. We'll need to figure out how to allow pytest to select those tests that can be run in parallel: anything that does molecule test but not something that does molecule create, for example. I am not sure how best to achieve this right now.

Looks like this is where we need to look ... pytest-dev/pytest-xdist#18 ...
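
One way to express that selection with pytest itself would be a custom marker on the tests that must stay serial, along these lines (the marker name and test names are just placeholders, not our actual test suite):

import pytest


@pytest.mark.serial
def test_create_scenario():
    # Runs `molecule create` under the hood; cannot share instances with other workers.
    ...


def test_full_scenario():
    # Runs `molecule test`, which destroys its instances, so it can run on any worker.
    ...

The marker would also need to be registered under markers in pytest.ini to avoid warnings.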

@decentral1se
Contributor

decentral1se commented Jun 28, 2019

We could perhaps arrange it another way: we mark the tests that we know cannot be run in parallel and then arrange for -n auto to be applied to all the rest. This might require us to rearrange our CI configuration further. If anyone has time to brainstorm this, it'd be great.

An example of what we need to avoid: https://travis-ci.com/ansible/molecule/jobs/174056358#L1822.
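
If we go with a marker like the one sketched above, the CI job could then be split into two invocations along these lines (a shell sketch; the marker name and test paths are assumptions):

pytest -m "serial" test/functional/
pytest -m "not serial" -n auto test/functional/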

decentral1se added a commit to decentral1se/molecule that referenced this issue Jul 2, 2019
@decentral1se

This comment has been minimized.

@decentral1se
Contributor

PR to reduce workload over at #2146.

Also, I've experimented with pytest markers to run some functional tests in parallel and some not. I can only get ~25 to run in parallel; the remaining ~200+ are still in serial mode. Maybe we can't remove sharding... I might just go with the markers instead.

decentral1se added a commit to decentral1se/molecule that referenced this issue Jul 3, 2019
@decentral1se
Contributor

See #2147 as an example of what we can do to parallelize the functional tests!

ssbarnea pushed a commit that referenced this issue Jul 4, 2019
@decentral1se
Contributor

OK, #2155 is as far as we can take this for now.
