Test for affected codebase subset instead of everything in the codebase #4453
Replies: 5 comments 9 replies
-
This is an excellent idea; it was also discussed before in two "strands" of discussion: a general discussion thread on how best to reduce test time, and a discussion on how to use test-impact tooling. @tarpas, in case you are still active and/or watching this: where did we end up getting stuck? Anything we should be doing next, in your opinion?
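For context, @tarpas maintains pytest-testmon, which selects tests affected by code changes, so I assume that is the tool referenced above. A minimal sketch of how it could slot into a GitHub Actions job, assuming its `.testmondata` database is cached between runs (step names and cache key are illustrative):

```yaml
# Hedged sketch: run only the tests pytest-testmon considers affected.
# testmon records which tests depend on which code in a .testmondata
# database; caching that file between CI runs is what enables selection.
- name: Restore testmon database
  uses: actions/cache@v3
  with:
    path: .testmondata
    key: testmon-${{ runner.os }}-${{ github.ref }}

- name: Run affected tests only
  run: |
    pip install pytest-testmon
    pytest --testmon
```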
-
There are also other failure conditions for code that is not directly changed.
-
There is also a very hacky thing core devs tend to do: if we know the test scope of some CI elements is superfluous, we cancel them manually and consider them as passed.
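Not the manual trick itself, but a related automation that removes one class of superfluous runs: GitHub Actions' `concurrency` setting can auto-cancel a run once a newer commit on the same ref starts another run of the same workflow. A minimal sketch (the group name is illustrative):

```yaml
# Cancel an in-progress run when it is superseded by a newer run of the
# same workflow on the same ref. This does not mark anything as "passed";
# it only stops runs whose results no longer matter.
concurrency:
  group: ${{ github.workflow }}-${{ github.ref }}
  cancel-in-progress: true
```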
-
I've zero experience with that. (What I am going to say below is what I have used quite a lot on GitLab CI for my work; there should be GitHub Actions counterparts for achieving the same.)
The second point can be achieved quickly, and there is existing tooling for the first point as well. Admittedly, this still does not get us down to very specific, filtered tests; we are still at the coarser level of forecasting vs classification, etc. But hopefully it would save some time.
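Since the selection described here operates at module level ("forecasting vs classification"), it maps naturally onto path filters. Here is a hedged sketch of the GitHub Actions counterpart to GitLab CI's `rules:changes`; the workflow name, Python version, and install commands are illustrative, not sktime's actual CI:

```yaml
# Illustrative workflow: run forecasting tests only when a pull request
# touches files under sktime/forecasting/.
name: forecasting-tests
on:
  pull_request:
    paths:
      - "sktime/forecasting/**"

jobs:
  test:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v3
      - uses: actions/setup-python@v4
        with:
          python-version: "3.10"
      # Real CI would install the full test dependency set.
      - run: pip install -e . pytest
      - run: pytest sktime/forecasting
```

A job like this simply never starts when no matching path changed, which is the quick win; the trade-off is that the path list must be kept in sync with the module layout by hand.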
-
I can't find the source, but there has been a limit of 20 concurrent jobs (running 24/7, with no limit on overall capacity). I think GitHub doesn't advertise this because they want leeway to restrict it.
I have a couple of thoughts which I think haven't been discussed yet.
-
As of now, all pull requests in `sktime` run tests for each and every functionality. This is probably (actually quite certainly) safer, but it takes a LOT of time, especially since the same tests run on multiple platforms. I've read in a few issues that some tests are split across different platforms, but it still takes long.

For example, take https://github.com/sktime/sktime/actions/runs/4615853258. The PR had no code changes, so none of the configured workflows (except that of Read the Docs) could ever fail, except in a few very specific scenarios:

- changes in `sktime` dependencies (like [BUG] `attrs` import error under minimal dependencies #4449)

I would like to think that these are extremely rare scenarios and not worth running tests for 2 hours and 35 minutes. Also, they would never be in scope to fix in the same PR, given the modular PR style followed here.

So I was wondering if there are any reasons not to run tests selectively. For example, if I modify the `SARIMAX` estimator, CI should run only the `SARIMAX` tests, on every supported Python version and every supported platform, but it should not run tests in `clustering` or `annotation`. Ideally, I'd prefer that not even other `forecasting` tests run unnecessarily, but that may be difficult to detect.

This is an idea for discussion. Please let me know if it's worth considering to add such test filtering, or if it has some other implications that I failed to notice.
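For illustration, a hedged sketch of what module-level selection could look like as a GitHub Actions step: derive the changed top-level `sktime` modules from the PR diff and run only their tests. The mapping logic is my own illustration, not sktime's actual CI, and it assumes a full-history checkout (`actions/checkout` with `fetch-depth: 0`) so that `origin/main` is available for the diff:

```yaml
# Illustrative step: map each changed file to its top-level sktime module,
# e.g. sktime/forecasting/sarimax.py -> sktime/forecasting, then run only
# the tests under those modules.
- name: Run tests for changed modules only
  run: |
    # "|| true" keeps the step alive when grep matches nothing,
    # so the no-changes branch below can run.
    modules=$(git diff --name-only origin/main...HEAD \
      | grep '^sktime/' | cut -d/ -f1-2 | sort -u || true)
    if [ -n "$modules" ]; then
      pytest $modules
    else
      echo "No sktime code changed; skipping tests."
    fi
```

Selection down to individual estimators such as `SARIMAX` would need dependency-aware tooling (like the test-impact approach mentioned in the first comment) rather than plain path matching.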