Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

TypeError: dcmread: Expected a file ... but got NoneType test failures on =pydicom-2.3.1 #1800

Open
TheChymera opened this issue May 25, 2023 · 15 comments

Comments

@TheChymera
Copy link

Full build log: https://ppb.chymera.eu/084827.log

Any ideas what's going on?

@mrbean-bremen
Copy link
Member

At a first glance it looks like the test data cannot be accessed - maybe this warning:

_Warning: Package 'pydicom.data.test_files' is absent from the `packages` configuration.

has something to do with it.
Are you packaging a distribution?

@TheChymera
Copy link
Author

@mrbean-bremen yes, I'm packaging it for Gentoo Linux. Is there perhaps a static testdata archive somewhere? The log above is using the PyPI tarball, I also tried it with the GH one, but I get the same errors.

@mrbean-bremen
Copy link
Member

Yes, there is both testdata in the package (the one mentioned in the warning), and a static test data archive in a separate repository (pydicom/pydicom-data). I thought that the tests that use the external data are excluded if it cannot be accessed, but I may be wrong. I may have a closer look tonight (at work right now), or @darcymason may check this if he finds the time.

@darcymason
Copy link
Member

Instead of trying to completely understand what is going on here, I have been wondering anyway about being more explicit for optional packages in pydicom's new pyproject.toml, which might help solve this.

Specifically, I'm thinking to options so one could pip install pydicom[test_files], or pip install pydicom[all], etc. although I'm not clear what the script does here. If it starts with the source tarball, then maybe that is not possible. Alternatively, can your packaging be done without running pydicom's tests?

I'm hoping to do a pydicom 2.4 release in the next week or so. If you could wait for that, we could ensure that it is configured correctly to work for your build.

I thought that the tests that use the external data are excluded if it cannot be accessed, but I may be wrong.

I'll try to have a look at that too later today if @mrbean-bremen doesn't get to it first

@TheChymera
Copy link
Author

@darcymason yes, I can temporarily disable the tests. Can you ping me on this with the new release?

@darcymason
Copy link
Member

I thought that the tests that use the external data are excluded if it cannot be accessed, but I may be wrong.

I've had a look at this and there is are classes of tests of the data manager code itself that skip if EXT_PYDICOM (whether pydicom-data installed) is True and another class if it is False, but no such check in the other tests AFAICT. I haven't followed it through completely, but I believe the files will just be downloaded as needed by each test.

@darcymason
Copy link
Member

Can you ping me on this with the new release?

@TheChymera, sure, we will likely enlist your help pre-release, to make sure it is okay before publishing.

@mrbean-bremen
Copy link
Member

Thanks @darcymason - I got distracted with other not pydicom-related stuff and didn't get to it.

I've had a look at this and there is are classes of tests of the data manager code itself that skip if EXT_PYDICOM (whether pydicom-data installed) is True and another class if it is False, but no such check in the other tests AFAICT.

Yes, you are right. It is probably the best to also skip all tests relying on this data, maybe using an auto-use fixture that detects if the files are available.
Speaking of fixtures - I also noticed a whole lot of pytest warnings about the obsolete setup method. We may also want to fix this. I'm not sure if I find the time before the release, but these should be easy things to do.

@mrbean-bremen
Copy link
Member

@darcymason - I've started to try this out locally. Depending on an environment variable, get_test_data will skip the test if the data is not available and external data shall not be fetched (probably making this the default, so that for packaging it will never be attempted).
Unfortunately, there are a lot of get_test_data used globally in the test modules, which will not work for skipping single tests, so this will take a bit to change. Nevertheless, I think it is better to do this in a general way, so we don't have to mark each single test that uses external data. What do you think?

I will try to put something together over the weekend.

@darcymason
Copy link
Member

What do you think?

I will try to put something together over the weekend.

Sure, if you are willing to come up with something, that would be great.

skip the test if the data is not available and external data shall not be fetched (probably making this the default, so that for packaging it will never be attempted).

I'm not sure how this should work -- it seems making the default skip a large number of tests might be awkward. How do we set that for our CI, etc.? Does that mean adding an environment variable change to all the workflows? How does that work for a user who just installed and wants to run the tests? -- in partial answer to my own question, I think we could have a [dev] extras option in the install, or [test], and we could add that to the contributing guidelines (and/or installation documentation)

@darcymason
Copy link
Member

Unfortunately, there are a lot of get_test_data used globally in the test modules,

Forget to mention this in my previous comment. I did wonder whether we should simplify that anyway. I don't see the point of setting a global variable, e.g. rtplan_name = get_testdata_file("rtplan.dcm") when each test using it could just call the get_testdata_file directly. I doubt there is a significant time cost in getting the full filepath again for files that are used in multiple tests, and if even if there is, some kind of caching could be done to avoid that penalty.

@darcymason
Copy link
Member

So, another thought...what if the tests simply required pydicom-data to be installed and no tests are run if it is not, or the user has to specify it is okay to install pydicom-data. I think all or almost all files are used in the tests, so it would be more efficient anyway to just get them all at once.

I'm not sure if that can work in this issue's build from a source tarball scenario ... but if there were a way to do something parallel to pip install[tests] or pip install[all] kind of behavior (maybe produce two different source builds - one with pydicom-data, one without), then perhaps it could work out.

@mrbean-bremen
Copy link
Member

I'm not sure how this should work -- it seems making the default skip a large number of tests might be awkward.

Agreed.

I doubt there is a significant time cost in getting the full filepath again for files that are used in multiple tests, and if even if there is, some kind of caching could be done to avoid that penalty.

Yes, I ended up doing exactly this for many cases (specifically the pixel handler tests, as they use the names to parametrize the tests). I started by adding fixtures for the common cases, but I'm now not so sure if this is even needed.

So, another thought...what if the tests simply required pydicom-data to be installed and no tests are run if it is not, or the user has to specify it is okay to install pydicom-data.

You mean no tests at all are run? This would make some sense, as with my current implementation, only the tests with external data are skipped, which is a rather arbirary collection of tests. I'm inclined to throw away my local changes in favour of an easier way that will just skip all tests if pydicom-data is not available. I'm not sure about the best way yet...

@darcymason
Copy link
Member

You mean no tests at all are run?

Yes, that is the suggestion. Throw an error message stating that testing requires pydicom-data to be installed. But as I said, I'm not sure how that can work in a source distribution as in this issue.

@darcymason
Copy link
Member

@TheChymera, I'm attaching a source tarball from my latest attempts with the prep2.4 branch. Do you mind giving it a try and making sure it works in your environment? It should be the same as a normal sdist, just without the tests folder.

pydicom-2.4.0.tar.gz

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants