Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Reduce resource footprint of data-driven tests #3736

Closed
7 of 20 tasks
epdenouden opened this issue Jun 25, 2019 · 14 comments
Closed
7 of 20 tasks

Reduce resource footprint of data-driven tests #3736

epdenouden opened this issue Jun 25, 2019 · 14 comments
Assignees
Labels
feature/data-provider Data Providers feature/test-dependencies Issues related to explicitly declared dependencies between tests feature/test-runner CLI test runner type/performance Issues related to resource consumption (time and memory) type/refactoring A refactoring that should be applied to make the code easier to understand and maintain

Comments

@epdenouden
Copy link
Contributor

epdenouden commented Jun 25, 2019

🏭 Data-driven tests reengineered for large-scale automation

The current implementation of running data sets through tests using @dataprovider has not been designed for efficiency. Optimizing this part of PHPUnit would greatly benefit build pipelines and is a regular request from developers. The proposed changes provide a solid improvement to data providers in general and generators in particular.

  • equal-or-lower peak memory usage of PHPUnit test collections
  • equal-or-faster speed of PHPUnit test collections
  • smoother resource usage curve
  • progressive loading makes repeating defects-first testing significantly more efficient
  • no changes to tests or configuration required by users

🚀 Benefits of improved data provider handling

A picture is worth a thousand data points:
image

  • prefetched is the @dataprovider implementation from 8.1
  • just-in-time is the experimental just-in-time loading mechanism
  • JIT with unload is the JIT combined with data unloading during the run
  • simulated Generator are data providers that are (un)loaded row-by-row

Data gathered with a custom TestCase and Xdebug. See the technical notes below for details.

🏗 Design considerations and implementation

The change that makes everything possible is postponing the loading of data sets from the initialization phase to just before running. Immediately after each TestCase finishes the associated data is removed from memory. For all this to work a lot of plumbing and wiring has to be reviewed.

  • data provider loading has been moved from \PHPUnit\Util\Test to DataProviderTestSuite
  • during initialization every DataProviderTestSuite is created as an empty TestSuite
  • just-in-time loading is triggered in DataProviderTestSuite::run()
  • after TestCase::run() there is now a call to TestCase::unloadData()
  • special cases like --list-tests that require the whole test collection to be available for traversal can force-load all data providers by poking the root TestSuite
  • the --filter selector requires access to both the keys and data of a @dataprovider to be able to pick out the right ones. The MVP handles this by reverting back to the old load on init behaviour when a filter is detected

⚗️ Research

  • make sure test execution reordering still works for @dataprovider
  • look into how --filter (and other selectors) work in detail and see what efficiency gains are possible without introducing new edge cases
  • investigate generators which requires more logic in DataProviderTestSuite::run() which already work but are currently unrolled into an Array
  • possibility of non-static data providers with access to runtime test fixtures

📐Testing and benchmarking the prototype

  • the prototype was based on PHPUnit version 8.1, behaviour in newer versions is the same
  • to validate the resource allocation behaviour of the different DataProviderTestSuite implementations I've created SyntheticDataprovidersTest
  • the prototype does not yet support Generator row-by-row testing
    This benchmark is accurately simulated by unrolling the SyntheticDataprovidersTest into a series of tests with single-row @dataprovider. See DummyGeneratorDataprovidersTest

🔬 Technical notes

  • fresh master based development branch
  • resource usage is measured with xdebug_trace() and mangled into the graph above with some scripting and a spreadsheet
  • exact resource measurements differ slightly between PHP versions, loaded modules, OS type, etc

📚 References

📋 Tasks

  • create master based MVP
  • add data provider mode switch in options and config
    • validate current self-test suite works with lazy loading 🥂
    • add tests for event sequence differences for prefetch vs jit
  • clean up @dataprovider loading mechanism
    • move relevant part of the loading mechanism to DataProviderTestSuite
    • refactor @testWith to return a Generator
    • add Generator row-by-row loading
  • implement prefetch/jit switching logic
    • add DataProviderTestSuite::run()
    • add basic injectFilter functionality to DataProviderTestSuite
    • refine filter handling to only trigger loading when required, e.g. not for @group
  • add loadAllDataProviders() to make --list-tests work
  • upgrade TestSuiteSorter to handle sorting unloaded DataProviderTestSuite with defects
  • fix up basic ResultPrinter
  • validate TestDox output buffering still works with data tests popping into existence

(more tasks to be added)

@epdenouden epdenouden self-assigned this Jun 25, 2019
@epdenouden epdenouden added type/performance Issues related to resource consumption (time and memory) and removed experimental labels Jul 2, 2019
@epdenouden epdenouden changed the title WIP improved data provider handling Improve data set resource footprint Jul 8, 2019
@epdenouden epdenouden changed the title Improve data set resource footprint Improve resource footprint of data-driven tests Jul 8, 2019
@epdenouden epdenouden changed the title Improve resource footprint of data-driven tests Reduce resource footprint of data-driven tests Jul 12, 2019
@theofidry
Copy link
Contributor

Out of curiosity, does it mean you will have access to the test case properties set in the setUp() or setUpBeforeClass() within the providers?

@epdenouden
Copy link
Contributor Author

Out of curiosity, does it mean you will have access to the test case properties set in the setUp() or setUpBeforeClass() within the providers?

Hi again @theofidry :)
Yes, I am working towards making that a possibility. For the fixtures to be accessible we need two things:

  1. sequencing: execute data providers after the fixtures
  2. runtime context: find the correct scope to execute the data providers

The first is a matter of simply moving the ::getData() to the correct step in the process. For static calls this should be completely transparent. Except we all know it isn't, of course, as developers often use the fixtures to set up and tear down whole databases and environments. I will make sure to add a @useOldDataLogic toggle so people don't have to refactor large test collections all at once to upgrade.

Getting the runtime context right is trickier. setUpBeforeClass and setUp aren't quite the same. It will take me some fiddling to provide some stable proposed solutions, which can then be experimented with by the community.

And then there are generators. Turning a /* @dataProvider someGenerator */ into an actual row-by-row provider without quietly unrolling it, is possible but tricky.

Thanks for the comment! You always have a keen eye for detail and I know you will be one of the first to test the pull request when it's up. Much appreciated. :)

@danon
Copy link

danon commented Jul 28, 2019

I have no idea about internal phpunit structure, but maybe there is a one-off standing tasks, that doesn't require phpunit knowledge to solve?

I could contribute :)

@epdenouden
Copy link
Contributor Author

epdenouden commented Jul 29, 2019

Hi @danon!

I have no idea about internal phpunit structure, but maybe there is a one-off standing tasks, that doesn't require phpunit knowledge to solve?

Refactoring the data provider mechanism is all about untangling parts of the core of PHPUnit and updating it to support on-demand data initialization. At the same time I want to add generator-based providers to the developers' toolbox and maintain backwards compatibility. It is a bit of a mess.

I could contribute :)

Just reading your positive message already contributes so much! Often when people make the effort to get into the Github issues something doesn't work as expected and they have become frustrated enough to write about it.

This is a good reminder to create some extra ticket triage labels like Good first ticket. There are lots of wishes floating around and I would love for other developers to feel not just welcome but invited. Please don't hestitate to file an issue or pull request of your own; hit me up on twitter if you want to float some ideas.

@danon
Copy link

danon commented Jul 29, 2019

@epdenouden Right now my only issue is that I have custom @dataPrvider, which loads data from files. The parsing of files can be difficult, and I'd like to debug it. But I can't debug a single dataprovider, without also debugging all of them - since data providers are resolved before running tests.

I need to manually edit the dataProvider and add array_slice($data, x, 1); to only run one test.

But I can imagine that's not a trivial task at the best of circumstances, and made extra hard by me not nowing anything about phpunit :D

@epdenouden
Copy link
Contributor Author

epdenouden commented Jul 29, 2019

I need to manually edit the dataProvider and add array_slice($data, x, 1); to only run one test.

See, this is why the comments are just gold: I had not thought about it this way. Yes, the data provider refactoring would help you out with its improved row-by-row loading. However, when a data provider fails it should/could already give you an accurate code location.

I have a request: can you post a screenshot copy+paste the output of the exception+stacktrace you get when a data provider fails? I doesn't have to contain any sensitive data of your client/employer. I just want to know what code location hint you get when the provider crashes.

With a bit of luck I can help you out with a small improvement along the lines of the extra protection against tear down exceptions or the improved code location hints.

@sebastianbergmann
Copy link
Owner

can you post a screenshot of the exception+stacktrace you get when a data provider fails?

Please: no screenshots. Copy&Paste instead (of course using the right markup).

@epdenouden
Copy link
Contributor Author

Please: no screenshots.

Good point, text is much easier to work with. I've been guilty of posting screenshots as the highlighting is quite handy. Just found out how to use Markdown formatting in collapisble blocks like this one:

github collapsible content with formatting
DataProviderInstance
 ✘ Data provider uses running test instance with data set 0  15 ms
   ┐
   ├ Failed asserting that two arrays are equal.
   ┊ ---·Expected
   ┊ +++·Actual
   ┊ @@ @@
   ┊  Array (
   ┊      0 => '__construct'
   ┊ -····1·=>·'setUp'
   ┊ -····2·=>·'testDataProviderUsesRunningTe...stance'
   ┊ +····1·=>·'objectHashProvider'
   ┊  )
   │
   ╵ /Users/ewout/proj/phpunit/src/Framework/Constraint/IsEqual.php:96
   ╵ /Users/ewout/proj/phpunit/src/Framework/Assert.php:2663
   ╵ /Users/ewout/proj/phpunit/src/Framework/Assert.php:602
   ╵ /Users/ewout/proj/phpunit/tests/_files/DataProviderInstanceTest.php:35
   ╵ /Users/ewout/proj/phpunit/src/Framework/TestCase.php:1189
   ╵ /Users/ewout/proj/phpunit/src/Framework/TestCase.php:856
   ╵ /Users/ewout/proj/phpunit/src/Framework/TestResult.php:695
   ╵ /Users/ewout/proj/phpunit/src/Framework/TestCase.php:814
   ╵ /Users/ewout/proj/phpunit/src/Framework/TestSuite.php:576
   ╵ /Users/ewout/proj/phpunit/src/Framework/DataProviderTestSuite.php:93
   ╵ /Users/ewout/proj/phpunit/src/Framework/TestSuite.php:576
   ╵ /Users/ewout/proj/phpunit/src/TextUI/TestRunner.php:620
   ╵ /Users/ewout/proj/phpunit/src/TextUI/Command.php:202
   ╵ /Users/ewout/proj/phpunit/src/TextUI/Command.php:161
   ┴

Time: 253 ms, Memory: 6.00 MB

@danon
Copy link

danon commented Jul 30, 2019

I'm the author of T-Regx library. I've also written a documentation which has a lot of code markups. For example here: https://t-regx.com/docs/match

I don't want ever for the code markups to be invalid or return unexpected values. So I've created a parser that iterates throuch each of the snippets asserting that they return expected values: https://github.com/T-Regx/T-Regx.github.io/blob/source/test/Docs/MarkupResultConsistencyTest.php

@epdenouden
Copy link
Contributor Author

epdenouden commented Jul 30, 2019

I don't want ever for the code markups to be invalid or return unexpected values.

Thanks for the example, I'll have a look later this week. It is data provider related and I will create a separate ticket for it to keep the comments here for lazy-loading specific discussion.

In any case: @danon, thanks for bringing up another use case. The whole array vs row by row handling is something that rears its head in multiple places in the code.

@danon
Copy link

danon commented Jul 30, 2019

@epdenouden but yea, basically I'd like the phpunit with iterator to call current() , invoke a test, call next and current itd.

@epdenouden
Copy link
Contributor Author

epdenouden commented Jul 31, 2019

@epdenouden but yea, basically I'd like the phpunit with iterator to call current() , invoke a test, call next and current itd.

Yes, that will be a very nice improvement and it will get there. It will take a few steps to get every detail sorted as development on PHPUnit has to keep backwards compatibility in mind.

BTW, T-Regx looks really nice; I grew up on too much Perl and it just looks so... clean. Will give it a go in a private project. Also, the Cross Data Providers project I saw is something that I have implemented myself more than once. Reimplmenting something like that as an iterator with proper memory management would be really sweet.

If you don't mind, I'll ping you once I have finished laying the groundwork here.

@danon
Copy link

danon commented Aug 1, 2019

Yes, that will be a very nice improvement and it will get there. It will take a few steps to get every detail sorted as development on PHPUnit has to keep backwards compatibility in mind.

Can't wait :) If I can be of help, please ask.

BTW, T-Regx looks really nice; I grew up on too much Perl and it just looks so... clean. Will give it a go in a private project.

Thank you, I really appreciate it :Dyou have no idea what it means to me. do you know this from checking out the docs?

Also, the Cross Data Providers project I saw is something that I have implemented myself more than once. Reimplmenting something like that as an iterator with proper memory management would be really sweet.

We could merge it and create one common repo for users to use.

If you don't mind, I'll ping you once I have finished laying the groundwork here.

Waiting :)

@sebastianbergmann sebastianbergmann added feature/data-provider Data Providers feature/test-dependencies Issues related to explicitly declared dependencies between tests feature/test-runner CLI test runner type/refactoring A refactoring that should be applied to make the code easier to understand and maintain and removed PHPUnit 8 labels Feb 11, 2020
@danon
Copy link

danon commented Jul 8, 2020

@epdenouden Hello, what's the status of the data provider handling revamp? What's new? :)

Do you need contributors?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
feature/data-provider Data Providers feature/test-dependencies Issues related to explicitly declared dependencies between tests feature/test-runner CLI test runner type/performance Issues related to resource consumption (time and memory) type/refactoring A refactoring that should be applied to make the code easier to understand and maintain
Projects
None yet
Development

No branches or pull requests

4 participants