Use numpy.memmap() instead of loading pixel data to memory #139

Closed
darcymason opened this issue Nov 27, 2014 · 7 comments · May be fixed by #1267

Comments

@darcymason
Member

From thomas.p...@gmail.com on January 23, 2014 09:18:37

Hello all,
I have to deal with rather large DICOM files (1GB+). Most of the time, I don't need the entire pixel data all at once, so using numpy.memmap() would be a huge time saver.

Initially, I wanted to change _pixel_data_numpy() in dataset.py accordingly, but unfortunately, the pixel data has already been loaded in this stage. Maybe one of the developers sees a simple way to integrate this consistently.

Cheers,
Tom

Original issue: http://code.google.com/p/pydicom/issues/detail?id=138
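
To illustrate the request above, here is a minimal sketch of what numpy.memmap offers when the byte offset and shape of uncompressed pixel data are already known. The file layout, `HEADER_SIZE`, `SHAPE`, and the helper names are made up for the demo; in a real DICOM file the offset would first have to be found by parsing the data set.

```python
import os
import tempfile

import numpy as np

# Hypothetical layout: a fixed-size header followed by raw uint16 frames.
HEADER_SIZE = 132      # assumed offset of the first pixel byte
SHAPE = (4, 8, 8)      # assumed (frames, rows, columns)


def write_demo_file(path):
    """Write a fake header plus raw little-endian uint16 pixels."""
    pixels = np.arange(int(np.prod(SHAPE)), dtype="<u2").reshape(SHAPE)
    with open(path, "wb") as f:
        f.write(b"\0" * HEADER_SIZE)
        f.write(pixels.tobytes())
    return pixels


def open_pixels(path):
    """Map only the pixel region; data is paged in from disk on access."""
    return np.memmap(path, dtype="<u2", mode="r",
                     offset=HEADER_SIZE, shape=SHAPE)


demo_path = os.path.join(tempfile.mkdtemp(), "demo.raw")
expected = write_demo_file(demo_path)
frames = open_pixels(demo_path)
third_frame = np.array(frames[2])  # copies just one frame into memory
```

Only the frame that is indexed gets copied; the rest of the file stays on disk until it is touched.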

@darcymason
Member Author

From darcymason@gmail.com on January 23, 2014 19:01:28

This has come up from time to time, and I think it is a reasonable suggestion. Will give some thought as to how best to work it in. I don't like assuming numpy is available (many users have much simpler needs), so it should be some kind of configurable option.

Status: Accepted
Labels: -Type-Defect Type-Enhancement

@darcymason
Member Author

From thomas.p...@gmail.com on January 23, 2014 23:37:35

Thanks for considering.
If someone doesn't have or does not want to use numpy, chances are that the pixel data is not of interest anyway or that a plain mmap handle would be sufficient.

@darcymason darcymason added this to the v1.0.0 milestone Dec 17, 2014
@samueljohn

That would be fantastic; I could use it, too. Hopefully, Thomas, you are reading this and will consider a PR ...

@mshunshin
Contributor

As someone who abuses mmap to speed things up in pydicom, here are some thoughts on where we could go:

  1. A fair amount of the "slowness" in reading large files came from the small read window that was being used (see #436, "Default read_size in read_undefined_length_value() in fileutil.py is too small and slows down reading large DICOM files"). That should now be fixed.

  2. If you don't need the image, passing stop_before_pixels=True means you can read the non-pixel data within a handful of ms.

Thinking about strategy:

I don't think numpy.memmap is the best way, as we need to specify not only a start byte in the file but also an end (DICOM allows padding / junk after the pixel data).

We would probably have to mmap the whole file, then pass a view to numpy, then reshape.
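
A rough sketch of that strategy, using a synthetic file rather than real DICOM. The `pixel_view` helper, the byte offsets, and the layout are all assumptions for illustration; in practice the start offset and dimensions would come from the parsed data set.

```python
import mmap
import os
import tempfile

import numpy as np


def pixel_view(mm, start, rows, cols, frames=1, dtype="<u2"):
    """Return a zero-copy array over exactly the pixel bytes in `mm`.

    `count` bounds the end of the view, so trailing padding is ignored.
    """
    count = frames * rows * cols
    flat = np.frombuffer(mm, dtype=dtype, count=count, offset=start)
    return flat.reshape(frames, rows, cols)


# Synthetic file: 100 header bytes, 2 frames of 4x4 uint16, 10 junk bytes.
demo_path = os.path.join(tempfile.mkdtemp(), "demo.bin")
pixels = np.arange(2 * 4 * 4, dtype="<u2")
with open(demo_path, "wb") as out:
    out.write(b"H" * 100)        # stand-in for the non-pixel elements
    out.write(pixels.tobytes())  # the pixel bytes
    out.write(b"\xff" * 10)      # trailing padding that must be excluded

# The mapping has to stay open for as long as the view is in use.
src = open(demo_path, "rb")
mm = mmap.mmap(src.fileno(), 0, access=mmap.ACCESS_READ)
view = pixel_view(mm, start=100, rows=4, cols=4, frames=2)
```

Because the view borrows the mapping's buffer, the mmap object cannot be closed while the array is still alive; delete the array first.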

Whether it is worth pulling out the existing code and replacing it with mmap, or closing the file and reopening it with mmap when you want the PixelData, I don't know. What do you think?

Ideally, I am aiming to fix this for compressed data as well: I want to be able to load only the first frame very quickly (when I show all files in a directory), and then, when playing a very large file, decompress the frames as they are first shown and cache the results.

Does anyone know how to find the frame/slice boundaries in compressed PixelData without reading it? Is there an index, or do I have to search for delimiters?

Matt

@hackermd
Contributor

@mshunshin The value of the Basic Offset Table item can be used to determine the frame boundaries within the Pixel Data element value (see PR #534). The item value is not required, however (see PS3.5 A.4).
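
As a sketch of how the Basic Offset Table could be read by hand, assuming a little-endian encapsulated Pixel Data value laid out as described in PS3.5 A.4 (a first item holding 32-bit offsets, followed by fragment items). The function name and the synthetic bytes are made up for the demo, and this is not pydicom's implementation.

```python
import struct

ITEM_TAG = b"\xfe\xff\x00\xe0"       # (FFFE,E000), little endian
SEQ_DELIM_TAG = b"\xfe\xff\xdd\xe0"  # (FFFE,E0DD), little endian


def read_basic_offset_table(value):
    """Return (offsets, first_fragment_pos) for an encapsulated value.

    Offsets are relative to `first_fragment_pos`. An empty list means
    the table item is present but carries no entries.
    """
    if value[:4] != ITEM_TAG:
        raise ValueError("expected an Item tag at the start of the value")
    (length,) = struct.unpack_from("<L", value, 4)
    if length % 4:
        raise ValueError("offset table length must be a multiple of 4")
    offsets = struct.unpack_from(f"<{length // 4}L", value, 8)
    return list(offsets), 8 + length


# Synthetic value: two frames, one fragment each.
frag1 = ITEM_TAG + struct.pack("<L", 4) + b"AAAA"    # 12 bytes in total
frag2 = ITEM_TAG + struct.pack("<L", 6) + b"BBBBBB"  # 14 bytes in total
table = ITEM_TAG + struct.pack("<L", 8) + struct.pack("<2L", 0, len(frag1))
value = table + frag1 + frag2 + SEQ_DELIM_TAG + struct.pack("<L", 0)

offsets, first = read_basic_offset_table(value)
frame_starts = [first + off for off in offsets]  # positions of Item tags
```

When the table is empty, which the standard allows, the boundaries have to be found by walking the fragment items instead.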

@darcymason darcymason added this to the v1.5 milestone Jun 29, 2019
@darcymason darcymason modified the milestones: v2.0, v2.2 May 18, 2020
@darcymason
Member Author

Pushing this milestone back again to get v2.0 out first.

@darcymason
Member Author

Closing this - I believe it has been shown elsewhere that you can open a memmap file and pass the handle to dcmread, which gives a workable solution.


6 participants