
[WIP] Add possibility to switch off use of external data in tests #1801

Closed

Conversation

mrbean-bremen
Member

  • depends on env var NO_EXTERNAL_DATA being 1
  • moved all usages of get_testdata_file() into tests and fixtures
  • changed old-style setup() to fixtures

Tasks

  • Unit tests added that reproduce the issue or prove feature is working
  • Fix or feature added
  • Code typed and mypy shows no errors
  • Documentation updated (if relevant)
    • No warnings during build
    • Preview link (CircleCI -> Artifacts -> doc/_build/html/index.html)
  • Unit tests passing and overall coverage the same or better

@mrbean-bremen mrbean-bremen marked this pull request as draft May 28, 2023 08:47
@mrbean-bremen
Member Author

mrbean-bremen commented May 28, 2023

@darcymason - this is just to show what I did. About 350 of the 2800+ tests seem to depend on external data and will not be run if NO_EXTERNAL_DATA is set to 1.
I haven't really finished this - as I wrote, I'm not too happy with that.
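
For illustration, a minimal sketch of what such an opt-out could look like with pytest - the marker name, the placeholder filename and the test are made up here, not taken from the PR:

import os
import pytest
from pydicom import dcmread
from pydicom.data import get_testdata_file

# Hypothetical marker: skip tests that need externally hosted data
# whenever NO_EXTERNAL_DATA=1 is set in the environment.
require_external_data = pytest.mark.skipif(
    os.environ.get("NO_EXTERNAL_DATA") == "1",
    reason="external test data disabled via NO_EXTERNAL_DATA",
)

@require_external_data
def test_read_external_file():
    # "some_external_file.dcm" stands in for a file that is only
    # available from the pydicom-data repository
    ds = dcmread(get_testdata_file("some_external_file.dcm"))
    assert ds is not None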

@codecov

codecov bot commented May 28, 2023

Codecov Report

Merging #1801 (33eb8b5) into master (1f59d85) will decrease coverage by 0.03%.
The diff coverage is 16.66%.

@@            Coverage Diff             @@
##           master    #1801      +/-   ##
==========================================
- Coverage   97.41%   97.39%   -0.03%     
==========================================
  Files          66       66              
  Lines       10839    10845       +6     
==========================================
+ Hits        10559    10562       +3     
- Misses        280      283       +3     
Impacted Files Coverage Δ
pydicom/data/data_manager.py 95.96% <16.66%> (-4.04%) ⬇️

... and 1 file with indirect coverage changes


@darcymason
Member

Thanks for looking into this and experimenting with options.

I haven't really finished this - as I wrote, I'm not too happy with that.

So, any thoughts on which way to go?

I've had a quick look through here and have some questions/observations, to at least help with discussion about how to approach this:

  • in these commits you've replaced some base filenames with global constants, some with fixtures, some have gone back to an explicit get_testdata_file("base filename"). What is the need for these variations? (I could probably figure it out with a little more inspection, but probably easier for you to comment)
  • The ~350 tests using external data are probably pretty core to pydicom's functionality. This leads me again to think that just requiring them is reasonable. A "test" of pydicom with so many missing may not really be testing all that much anyway.
  • similar to the last point, I wonder whether even numpy could be just required for testing (perhaps for pydicom v3.0 anyway). I can see other libraries like gdcm, the libjpeg libraries etc. staying as skipif tests, since they are truly optional and in some cases alternative choices to each other.

Some more nitpicking things:

  • setup could also be replaced with setup_method rather than a fixture (a minimal comparison sketch follows this list). Pytest is great in many ways, but I don't like the way it often works implicitly rather than explicitly. I'd prefer to avoid fixtures unless the gain is really worth it. Of course, setup_method is also effectively implicit (one has to know that it is called automatically), but at least there is a long history of setup and teardown methods.
  • similarly, I'm not so big on autouse fixtures either; I'd rather see them explicit (even just passing a fixture to the test function already triggers implicit behavior one has to know about, but at least it is clear which things are being used).
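
For context, a minimal side-by-side sketch of the two styles being discussed (the file name and assertions are illustrative, not from the PR):

import pytest
from pydicom import dcmread
from pydicom.data import get_testdata_file

class TestWithSetupMethod:
    def setup_method(self):
        # pytest calls this implicitly before every test method
        self.ds = dcmread(get_testdata_file("CT_small.dcm"))

    def test_modality(self):
        assert self.ds.Modality == "CT"

class TestWithFixture:
    @pytest.fixture
    def ct_dataset(self):
        # requested explicitly by each test that needs it
        return dcmread(get_testdata_file("CT_small.dcm"))

    def test_modality(self, ct_dataset):
        assert ct_dataset.Modality == "CT"

Both variants read the dataset once per test; the difference is mainly whether the dependency is implicit (setup_method) or spelled out in the test signature (fixture).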

This is a complex issue, again thanks for taking some steps, hopefully we can close in on some consensus.

My $0.02 Cdn.

@mrbean-bremen
Member Author

in these commits you've replaced some base filenames with global constants, some with fixtures, some have gone back to an explicit get_testdata_file("base filename"). What is the need for these variations?

This was not planned. I started by replacing the global calls of get_testdata_file that are used more than once with fixtures, which worked for most of the common tests. Then I realized that most of the pixel data tests use these calls in constants used for parametrizing tests. As this cannot be replaced with fixtures (at least not in a straightforward way), I just replaced them with constants and moved get_testdata_file into the tests. This was crudely done using copy/paste without much thinking - I just wanted to make sure that it works.
This all resulted in a hodgepodge of different ways to get the test data. Should we decide to go this way (which seems unlikely to me), I would consolidate this mess.
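
A sketch of the pattern described above - filenames kept as plain constants for parametrization, with the actual lookup moved into the test body (names and files are illustrative):

import pytest
from pydicom import dcmread
from pydicom.data import get_testdata_file

# plain string constants can be used for parametrization without
# touching the data files at import/collection time
PIXEL_FILES = ["MR_small.dcm", "CT_small.dcm"]

@pytest.mark.parametrize("name", PIXEL_FILES)
def test_has_pixel_data(name):
    ds = dcmread(get_testdata_file(name))
    assert "PixelData" in ds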

The ~350 tests using external data are probably pretty core to pydicom's functionality. This leads me again to think that just requiring them is reasonable.

Agreed. As the use case here is mainly packaging, we could indeed make different packages - with and without tests.

As an aside: I'm not much in favour of deploying tests within the wheel packages (what we currently do, though not the tests in pydicom-data). They make more sense in source dist packages, and in that case there should also be the possibility to download the test data.

I'm inclined to think that for installing the pydicom package in a distribution, running the tests is not really needed - as a Python-only package, our CI tests should be sufficient. Most of the tests using external test data are pixel handler tests, which can only be tested if all the needed packages are installed, which would further complicate packaging.

So, at least for the 2.4 release, I tend towards just removing the tests from packaging entirely (not sure how to do this, but it can certainly be done).

similar to the last point, I wonder whether even numpy could be just required for testing

Agreed. I added a few skipif statements because I removed numpy to be able to run the non-numpy tests, but didn't want to remove the other packages; these tests would have been failing otherwise.
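
The skipif pattern referred to here is roughly the following - a generic sketch, not the exact code from the PR:

import pytest

try:
    import numpy  # noqa: F401
    HAVE_NUMPY = True
except ImportError:
    HAVE_NUMPY = False

@pytest.mark.skipif(not HAVE_NUMPY, reason="numpy is not available")
def test_pixel_array():
    ...  # any test that needs Dataset.pixel_array, and hence numpy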

setup could also be replaced with setup_method rather than a fixture

Yes - but as I call fixtures in the setup methods, those had to become fixtures themselves. In some cases I also prefer fixtures, because there is no need to add instance variables just to be cleaned up in teardown - there is no separate teardown. I realize that this could also be done using finalizers, but I find these awkward.

similarly, I'm not so big on autouse methods either, would rather see them explicit. (even just passing a fixture to the function already triggers implicit behavior one has to know about, but at least it is clear which things are being used).

Well, it depends. In the case of setup_method I agree, provided no fixtures are needed in the setup and nothing has to be saved for teardown.
Generally, I'm a fan of fixtures - they are the Swiss Army knife of pytest. In this case (looking up test data names) the performance does not matter, but if we used session-scoped fixtures for reading datasets, they would be read only once. Again, this is not the best use case here, as it would imply that the dataset is not changed.
Also, fixtures are only evaluated if used, as opposed to globals, which are evaluated when the test module is loaded.
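
To illustrate the session-scope point, a sketch (again with an illustrative file, and only valid as long as no test mutates the shared dataset):

import pytest
from pydicom import dcmread
from pydicom.data import get_testdata_file

@pytest.fixture(scope="session")
def ct_small():
    # read once for the whole test session instead of once per test
    return dcmread(get_testdata_file("CT_small.dcm"))

def test_modality(ct_small):
    assert ct_small.Modality == "CT"

def test_has_pixel_data(ct_small):
    assert "PixelData" in ct_small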

@darcymason
Member

I'm not much in favour of deploying tests within the wheel packages (what we currently do, though not the tests in pydicom-data). They make more sense in source dist packages

I'm open to that idea. I had originally always bundled tests with pydicom because I thought they might serve as a form of example for users (and the test files provided various kinds of DICOM SOP classes to experiment with). But I think we are long past that now; there is much more documentation, and various blogs and other code on the internet with pydicom examples.

as a Python only package, our CI tests should be sufficient.

That's an interesting point I hadn't thought of. But this gets at what I was asking in issue #1800: whether the tests are really necessary. We don't release without our full CI passing, so there is no reason to run the tests on any pydicom distribution - but as you point out, that's likely only really true for pure Python packages.

similar to the last point, I wonder whether even numpy could be just required for testing

Agreed.

One problem I realized with this is that I still think we should be able to install pydicom without numpy, e.g. for very minimal environments like pyodide. I think if the default install is without numpy, then perhaps a set of non-numpy tests should be runnable to ensure things work in that environment.

In some cases I also prefer fixtures, because there is no need to add instance variables just to be cleaned up in teardown, as there is no separate teardown

Fair point. I agree that in some cases one self-contained fixture is easier to grok than a separate setup and teardown.

@mrbean-bremen
Member Author

mrbean-bremen commented May 28, 2023

So maybe for the release we can just remove the tests and test data from packaging (e.g. from the manifest), change setup to setup_method where needed to avoid the warnings, and leave it at that?

@darcymason
Member

So maybe for the release we can just remove the tests and test data from packaging (e.g. from the manifest), change setup to setup_method where needed to avoid the warnings, and leave it at that?

Possibly. I'm not clear if the manifest is enough or whether we have to specify to not include package data as well (or specifically exclude tests?).

The other possibility, mentioned before (I'll try to investigate tomorrow), is whether there is a way to simply produce an sdist both with and without the tests, which would solve #1800 while minimally changing everything else.

@darcymason
Member

The other possibility, mentioned before (I'll try to investigate tomorrow), is whether there is a way to simply produce an sdist both with and without the tests, which would solve #1800 while minimally changing everything else.

I was reading about Python packaging for a while yesterday and my head was starting to hurt 🙉. I don't see anything simple that can produce alternate build paths, so I guess we are left with removing the tests from the build for the next release. I'm not even clear on how to do that with the new pyproject.toml style, so suggestions welcome.

However, I've been reading up on the whole packaging ecosystem, and trying to think defensively for the longer term, I've opened a new issue to track/discuss options...

Please see #1802.

@mrbean-bremen
Member Author

I was reading about Python packaging for a while yesterday and my head was starting to hurt 🙉.

Yes, packaging in Python feels like rocket science... I also don't have a real grasp on it yet.

I'm not even clear on how to do that with the new pyproject.toml style, so suggestions welcome.

I think this part is still in setup.py (plus the manifest) - package_data currently includes the test data, and packages uses the default find_packages, which has to be adapted.

@darcymason
Member

I think this part is still in setup.py (plus the manifest) - package_data currently includes the test data, and packages uses the default find_packages, which has to be adapted.

There is also package discovery in pyproject.toml, though I am no expert at using it. Right now it has:

[tool.setuptools.packages]
find = {}

I remember having to add that, but I don't remember why. There is a syntax for finding/excluding packages etc. that I came across, but this was the one needed in that PR.
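
For reference, setuptools' pyproject-based discovery does accept an exclude list, along these lines (untested here, and the exact package name to exclude is an assumption):

[tool.setuptools.packages.find]
exclude = ["pydicom.tests*"]

The test data files listed under package_data / the manifest would still need to be handled separately, as noted above.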

However, maybe it doesn't matter - if we move the tests out in v3.0, they won't be in the project folder to be found anyway. So if there is a way to get it working now, we can adapt as needed afterwards.

@mrbean-bremen
Member Author

@darcymason - I will be travelling for a week, so from tomorrow onwards I will have no access to a computer (apart from my phone). So, if you plan to make the release soon, I won't be able to help (apart from trivial reviews).

As they closed the team discussions, I find no convenient place to put this kind of information... or did we discuss this already? I dimly remember some related discussion, but it may have been elsewhere...
This PR currently serves as a kind of discussion forum; I will close it soon, as there is nothing to be merged here.

@darcymason
Member

So, if you plan to make the release soon, I won't be able to help (apart from trivial reviews).

Thanks for letting me know ... I do plan on releasing soon, but we'll see if I can get around to it. I'm now thinking of just leaving the remaining issues tagged for this release until the next one, except for making a build without the tests. That is, unless you (or anyone else) have an urgent issue which you think should be included.

@mrbean-bremen
Member Author

That is, unless you (or anyone else) has an urgent issue which you think should be included.

No, I'm fine :)
