
Performance: Cross-test caching of loaded scripts #606

Open · reitzig opened this issue Jul 11, 2023 · 3 comments

Comments

reitzig commented Jul 11, 2023

What feature do you want to see added?

Currently, PipelineTestHelper#loadScript (or, more specifically, GroovyScriptEngine#loadScriptByName) accounts for roughly 95% of the running time of our test suite (>200 tests).
[Profiler screenshot: loadScriptByName dominating the test suite's running time]

Now, I'm sure our setup works against anything that even remotely resembles best practice, but still.

In a @BeforeEach method (see the sketch after this list), we

  • run DeclarativePipelineTestHelper#setUp,
  • set up some mocks,
  • register a shared library, and
  • load the pipeline script under test.
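
Concretely, that setup looks roughly like the following minimal sketch (the step, library, script, and path names are invented for illustration):

import com.lesfurets.jenkins.unit.declarative.DeclarativePipelineTest
import org.junit.jupiter.api.BeforeEach

import static com.lesfurets.jenkins.unit.global.lib.LibraryConfiguration.library
import static com.lesfurets.jenkins.unit.global.lib.LocalSource.localSource

class MyPipelineTest extends DeclarativePipelineTest {
    Script pipeline

    @BeforeEach
    void setUp() {
        super.setUp()
        // Mock a step that the pipeline calls (example name).
        helper.registerAllowedMethod('httpRequest', [Map]) { [status: 200] }
        // Register the shared library from a local checkout (hypothetical path).
        helper.registerSharedLibrary(library().name('acme-lib')
                .defaultVersion('master')
                .allowOverride(true)
                .implicit(false)
                .targetPath('/var/tmp/libs')
                .retriever(localSource('/var/tmp/libs'))
                .build())
        // The expensive part: re-loads and re-compiles the script for every test.
        pipeline = loadScript('pipelines/build.groovy')
    }
}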

In the test class at hand, we have 27 tests, each of which performs a run of the pipeline (with different parameters, alas).

I see that GroovyScriptEngine has a cache, but that one is reset for every test.
I tried adding a cache for the class objects myself:

import org.codehaus.groovy.runtime.InvokerHelper

import com.lesfurets.jenkins.unit.InterceptingGCL
import com.lesfurets.jenkins.unit.PipelineTestHelper

class CachingPipelineTestHelper extends PipelineTestHelper {
    // Cache compiled script classes across tests, so each pipeline
    // script is only compiled once per JVM.
    private static final Map<String, Class> scriptClassCache = new HashMap<>()

    @Override
    Script loadScript(String scriptName, Binding binding) {
        Objects.requireNonNull(binding, 'Binding cannot be null.')

        Class scriptClass = scriptClassCache[scriptName]
        if (scriptClass == null) {
            Objects.requireNonNull(gse,
                    'GroovyScriptEngine is not initialized: Initialize the helper by calling init().')
            scriptClass = gse.loadScriptByName(scriptName)
            scriptClassCache[scriptName] = scriptClass
        }

        setGlobalVars(binding)
        Script script = InvokerHelper.createScript(scriptClass, binding)
        InterceptingGCL.interceptClassMethods(script.getMetaClass(), this, binding)
        return script
    }
}

When I use that, I get java.lang.IllegalStateException: No agent description found in all but the first test; basically, they fail on the first expression inside pipeline { ... }.

I don't really understand what's going on, so I'll go ahead and phrase it as a feature request: Please provide facilities that allow us to load pipeline scripts only once per test run (or at least test class).

Upstream changes

No response

nre-ableton (Contributor) commented

(>200 tests)

Just out of curiosity, how long (in absolute time) does your test suite take to run? 200 tests isn't very much. We have a library with 218 tests and it takes 16 seconds to run.

reitzig (Author) commented Jul 13, 2023

Huh. 😲 🤔

On my laptop, the whole suite runs for 3-4min; on the build server, 6-7min.

Most of that time (2-3min locally) is taken up by that one test (class) I profiled above with 20-ish tests, each of which ends up re-loading a ~500 line pipeline script (which liberally uses classes from the library, in case that matters).

I don't think I can share any code, unfortunately. 😕 Anything I should be on the lookout for?

nre-ableton (Contributor) commented Jul 13, 2023

Yeah, your analysis is on point. The difference in performance is definitely due to loading the pipeline script for every test. And although I wouldn't consider a ~500-line pipeline script to be large, it's not small either. I'm not sure how complex the pipeline code is, but that might play a role here as well.

Regardless, the reason our tests pass so quickly is because of the way that we test. We don't test any of our pipelines, but instead we build libraries and move any significant logic from the pipelines to them (see https://github.com/jenkinsci/JenkinsPipelineUnit/#writing-testable-libraries). This means that for our test cases we are literally loading an empty library and just testing class functionality, which explains the performance difference. Unfortunately, that also means we are comparing apples to oranges here...
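
For illustration, this is roughly what such a library-focused test looks like (a minimal sketch; the class and method names are made up):

import org.junit.jupiter.api.Test

// Hypothetical library class; in a shared library it would live under
// src/, e.g. src/com/example/VersionUtil.groovy, and be tested straight
// off the classpath.
class VersionUtil {
    static String bumpPatch(String version) {
        def parts = version.tokenize('.')
        parts[-1] = (parts[-1].toInteger() + 1).toString()
        return parts.join('.')
    }
}

class VersionUtilTest {
    @Test
    void bumpsThePatchComponent() {
        // No pipeline script and no GroovyScriptEngine involved, which is
        // why suites structured this way run fast.
        assert VersionUtil.bumpPatch('1.2.3') == '1.2.4'
    }
}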

I'm open to the idea of caching scripts, but this would need to be done very carefully and should probably be disabled by default. The obvious concern is that pipeline scripts could contain static state; in the worst case, a test suite would then only run properly when the tests are executed in a certain order, for example. Generally speaking, the desired behavior for any unit testing framework is to start fresh with each test and not retain any state. If you can get your above code working and make a PR, I'd be happy to review it.
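
For the record, a sketch of what an opt-in variant of the cache from above could look like; the cacheScriptClasses flag and the clearScriptClassCache() hook are invented for illustration, not existing API:

import org.codehaus.groovy.runtime.InvokerHelper

import com.lesfurets.jenkins.unit.InterceptingGCL
import com.lesfurets.jenkins.unit.PipelineTestHelper

class OptInCachingPipelineTestHelper extends PipelineTestHelper {
    // Caching is off by default; a suite has to opt in explicitly.
    boolean cacheScriptClasses = false

    private static final Map<String, Class> scriptClassCache = [:]

    // Lets a suite reset the cache (e.g. from an @AfterAll method) so that
    // stale classes cannot leak into other test classes.
    static void clearScriptClassCache() {
        scriptClassCache.clear()
    }

    @Override
    Script loadScript(String scriptName, Binding binding) {
        if (!cacheScriptClasses) {
            // Fall back to the stock per-test behavior.
            return super.loadScript(scriptName, binding)
        }
        Objects.requireNonNull(binding, 'Binding cannot be null.')
        Class scriptClass = scriptClassCache.computeIfAbsent(scriptName) { name ->
            Objects.requireNonNull(gse,
                    'GroovyScriptEngine is not initialized: Initialize the helper by calling init().')
            gse.loadScriptByName(name)
        }
        setGlobalVars(binding)
        Script script = InvokerHelper.createScript(scriptClass, binding)
        InterceptingGCL.interceptClassMethods(script.getMetaClass(), this, binding)
        return script
    }
}

Note that this sketch would still hit the static-state problem described above: the IllegalStateException from the original report suggests that some per-run state lives on the cached class, and a real implementation would have to reset that state between tests.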

However, I think that you might have better luck refactoring the pipeline (i.e., breaking some of the logic up into individually tested libraries or into shell/Python scripts, etc.). I realize that this may be a case of "easier said than done" given the various constraints of your codebase.
