Test for affected codebase subset instead of everything in the codebase #4453
Replies: 5 comments 9 replies
-
This is an excellent idea; it was also discussed before in two "strands" of discussion: a general discussion thread on how best to reduce test time, and a discussion on how to use test-impact tooling. @tarpas, in case you are still active and/or watching this: where did we end up getting stuck? Anything we should be doing next, in your opinion?
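For context, @tarpas maintains pytest-testmon, which selects tests affected by code changes, so I assume that is the tool referenced above. A minimal sketch of how it could slot into a GitHub Actions job, assuming its `.testmondata` database is cached between runs (step names and cache key are illustrative):

```yaml
# Hedged sketch: run only the tests pytest-testmon considers affected.
# testmon records which tests depend on which code in a .testmondata
# database; caching that file between CI runs is what enables selection.
- name: Restore testmon database
  uses: actions/cache@v3
  with:
    path: .testmondata
    key: testmon-${{ runner.os }}-${{ github.ref }}

- name: Run affected tests only
  run: |
    pip install pytest-testmon
    pytest --testmon
```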
-
There are also other failure conditions for code that is not directly changed.
-
There is also a very hacky thing core devs tend to do: if we know the test scope of some CI elements is superfluous, we cancel them manually and consider them as passed.
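Not the manual trick itself, but a related automation that removes one class of superfluous runs: GitHub Actions' `concurrency` setting can auto-cancel a run once a newer commit on the same ref starts another run of the same workflow. A minimal sketch (the group name is illustrative):

```yaml
# Cancel an in-progress run when it is superseded by a newer run of the
# same workflow on the same ref. This does not mark anything as "passed";
# it only stops runs whose results no longer matter.
concurrency:
  group: ${{ github.workflow }}-${{ github.ref }}
  cancel-in-progress: true
```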
-
I've zero experience with that. (What I am going to say below is what I have used quite a lot on GitLab CI for my work; there should be GitHub Actions counterparts for achieving the same.)
The second point can be achieved quickly, and there is existing tooling for the first point as well. Admittedly, this still does not get us down to very specific, filtered tests; we are still at the coarser level of forecasting vs classification, etc. But hopefully it would save some time.
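Since the selection described here operates at module level ("forecasting vs classification"), it maps naturally onto path filters. Here is a hedged sketch of the GitHub Actions counterpart to GitLab CI's `rules:changes`; the workflow name, Python version, and install commands are illustrative, not sktime's actual CI:

```yaml
# Illustrative workflow: run forecasting tests only when a pull request
# touches files under sktime/forecasting/.
name: forecasting-tests
on:
  pull_request:
    paths:
      - "sktime/forecasting/**"

jobs:
  test:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v3
      - uses: actions/setup-python@v4
        with:
          python-version: "3.10"
      # Real CI would install the full test dependency set.
      - run: pip install -e . pytest
      - run: pytest sktime/forecasting
```

A job like this simply never starts when no matching path changed, which is the quick win; the trade-off is that the path list must be kept in sync with the module layout by hand.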
-
I can't find the source, but there has been a limit of 20 concurrent jobs (running 24/7, with no limit on overall capacity). I think GitHub doesn't advertise this because they want leeway to restrict it.
I have a couple of thoughts which I think haven't been discussed yet.
-
As of now, all pull requests in `sktime` run tests for each and every functionality. This is probably (actually quite certainly) safer, but it takes a LOT of time, especially since the same tests run on multiple platforms. I've read in a few issues that some tests are split across different platforms, but it still takes long.

For example, take https://github.com/sktime/sktime/actions/runs/4615853258. The PR had no code changes, so none of the configured workflows (except that of Read the Docs) could ever fail, except in a few very specific scenarios:

- changes in `sktime` dependencies (like [BUG] `attrs` import error under minimal dependencies #4449)

I would like to think that these are extremely rare scenarios and not worth running tests for 2 hours and 35 minutes. Also, they would never be in scope to fix in the same PR, given the modular PR style followed here.

So I was wondering if there are any reasons not to run tests selectively. For example, if I modify the `SARIMAX` estimator, CI should run only the `SARIMAX` tests, on every supported Python version and every supported platform, but it should not run tests in `clustering` or `annotation`. Ideally, I'd prefer that not even other `forecasting` tests run unnecessarily, but that may be difficult to detect.

This is an idea for discussion. Please let me know if it's worth considering to add such test filtering, or if it has some other implications that I failed to notice.
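For illustration, a hedged sketch of what module-level selection could look like as a GitHub Actions step: derive the changed top-level `sktime` modules from the PR diff and run only their tests. The mapping logic is my own illustration, not sktime's actual CI, and it assumes a full-history checkout (`actions/checkout` with `fetch-depth: 0`) so that `origin/main` is available for the diff:

```yaml
# Illustrative step: map each changed file to its top-level sktime module,
# e.g. sktime/forecasting/sarimax.py -> sktime/forecasting, then run only
# the tests under those modules.
- name: Run tests for changed modules only
  run: |
    # "|| true" keeps the step alive when grep matches nothing,
    # so the no-changes branch below can run.
    modules=$(git diff --name-only origin/main...HEAD \
      | grep '^sktime/' | cut -d/ -f1-2 | sort -u || true)
    if [ -n "$modules" ]; then
      pytest $modules
    else
      echo "No sktime code changed; skipping tests."
    fi
```

Selection down to individual estimators such as `SARIMAX` would need dependency-aware tooling (like the test-impact approach mentioned in the first comment) rather than plain path matching.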