Use numpy.memmap() instead of loading pixel data to memory #139

darcymason · 2014-11-27T01:13:43Z

From thomas.p...@gmail.com on January 23, 2014 09:18:37

Hello all,
I have to deal with rather large DICOM files (1GB+). Most of the time, I don't need the entire pixel data all at once, so using numpy.memmap() would be a huge time saver.

Initially, I wanted to change _pixel_data_numpy() in dataset.py accordingly, but unfortunately the pixel data has already been loaded at this stage. Maybe one of the developers sees a simple way to integrate this consistently.

Cheers,
Tom

Original issue: http://code.google.com/p/pydicom/issues/detail?id=138

darcymason · 2014-11-27T01:13:43Z

From darcymason@gmail.com on January 23, 2014 19:01:28

This has come up from time to time, and I think it is a reasonable suggestion. Will give some thought as to how best to work it in. I don't like assuming numpy is available (many users have much simpler needs), so it should be some kind of configurable option.

Status: Accepted
Labels: -Type-Defect Type-Enhancement

darcymason · 2014-11-27T01:13:44Z

From thomas.p...@gmail.com on January 23, 2014 23:37:35

Thanks for considering.
If someone doesn't have or does not want to use numpy, chances are that the pixel data is not of interest anyway or that a plain mmap handle would be sufficient.

samueljohn · 2015-02-10T18:43:27Z

That would be fantastic - I could use it, too. Hopefully, Thomas, you are reading this and will consider a PR ...

mshunshin · 2017-07-25T16:00:58Z

As someone who abuses mmap to speed things up in pydicom, here are some thoughts on where we could go...

A fair amount of the "slowness" in reading large files came from the small read window that was being used (see "Default read_size in read_undefined_length_value() in fileutil.py is too small and slows down reading large DICOM files", #436); that should now be fixed.
If you don't need the image, passing stop_before_pixels=True means you can read the non-pixel data within a handful of ms.

Thinking about strategy:

I don't think numpy.memmap is the best way, as we need to specify not only a start byte in the file but also an end (DICOM allows padding / junk after the pixel data).

We would probably have to mmap the whole file, then pass a view to numpy, then reshape.
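
A minimal sketch of that strategy, assuming the byte offset and size of the pixel data are already known (for example from parsing the header). The helper `map_pixels`, its parameters and the toy file layout are made up for illustration and are not part of pydicom:

```python
import mmap
import tempfile

import numpy as np


def map_pixels(path, pixel_offset, rows, cols, dtype=np.uint16):
    """Return a read-only (rows, cols) view of pixels starting at pixel_offset."""
    with open(path, "rb") as f:
        # Map the whole file; the mapping stays valid after the file is closed.
        mm = mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)
    # count bounds the end of the view, so trailing padding is never touched.
    flat = np.frombuffer(mm, dtype=dtype, count=rows * cols, offset=pixel_offset)
    return flat.reshape(rows, cols)


# Hypothetical file: 128 bytes of "header", a 4x6 uint16 image, 10 bytes of junk.
pixels = np.arange(24, dtype="<u2").reshape(4, 6)
with tempfile.NamedTemporaryFile(delete=False, suffix=".bin") as tmp:
    tmp.write(b"\x00" * 128 + pixels.tobytes() + b"\xff" * 10)

view = map_pixels(tmp.name, pixel_offset=128, rows=4, cols=6, dtype="<u2")
first_row = view[0]  # only the pages backing this row need to be read
```

Because the array is backed by the mapping, slicing it should only pull in the pages that are actually accessed, and the `count` argument keeps any junk after the pixel data out of the view.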

Whether it is worth pulling out the existing code and replacing it with mmap, or closing the file and reopening it with mmap when you want the PixelData, I don't know - what do you think?

Ideally, I am aiming to fix this for compressed data as well. I want to be able to load only the first frame very quickly (when I show all files in a directory), and then, when playing a very large file, decompress each frame the first time it is shown and cache the result.
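
A rough sketch of that decode-on-first-access idea, with caching. The `LazyFrames` class is hypothetical, and zlib only stands in for a real JPEG/RLE decoder so that the example is self-contained:

```python
import zlib
from functools import lru_cache


class LazyFrames:
    """Decode frames on first access and cache the decoded bytes."""

    def __init__(self, compressed_frames, cache_size=64):
        self._compressed = compressed_frames
        self.decode_calls = 0  # counts real decodes, for illustration only
        # Wrap per instance so each object gets its own bounded cache.
        self._decode = lru_cache(maxsize=cache_size)(self._decode_uncached)

    def _decode_uncached(self, index):
        self.decode_calls += 1
        # zlib is only a placeholder for a real JPEG/RLE decoder.
        return zlib.decompress(self._compressed[index])

    def __len__(self):
        return len(self._compressed)

    def __getitem__(self, index):
        return self._decode(index)


raw = [bytes([i]) * 1000 for i in range(3)]
frames = LazyFrames([zlib.compress(r) for r in raw])
thumbnail = frames[0]  # only frame 0 is decoded here
again = frames[0]      # served from the cache, no second decode
```

A bounded cache keeps memory use predictable when playing very large files; the right `cache_size` would depend on frame size and available memory.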

Does anyone know how to find the frame/slice boundaries in compressed PixelData without reading it all - is there an index, or do I have to search for delimiters?

Matt

hackermd · 2018-01-18T23:28:27Z

@mshunshin The value of the Basic Offset Table item can be used to determine the frame boundaries within the Pixel Data element value (see PR #534). The item value is not required, however (see PS3.5 A.4).
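
As an illustration of how the Basic Offset Table could be used, here is a minimal sketch that reads it from the start of an encapsulated Pixel Data value. It assumes the usual little-endian item encoding (Item tag (FFFE,E000) followed by a 4-byte length); the function name `read_basic_offset_table` and the toy data are made up for this example and are not pydicom's API:

```python
import struct

ITEM_TAG = b"\xfe\xff\x00\xe0"  # (FFFE,E000) in little-endian byte order


def read_basic_offset_table(buf):
    """Return (frame_offsets, first_fragment_pos) for an encapsulated value.

    buf holds the element value, starting at the Basic Offset Table item.
    """
    if buf[:4] != ITEM_TAG:
        raise ValueError("expected an Item tag at the start of the value")
    (length,) = struct.unpack_from("<L", buf, 4)
    if length % 4:
        raise ValueError("Basic Offset Table length must be a multiple of 4")
    offsets = list(struct.unpack_from(f"<{length // 4}L", buf, 8))
    return offsets, 8 + length


def item(payload):
    """Wrap payload in an Item tag and length (helper for the toy data)."""
    return ITEM_TAG + struct.pack("<L", len(payload)) + payload


frame_a, frame_b = b"A" * 6, b"B" * 10  # stand-ins for compressed frames
# Offsets are relative to the first byte of the first item after the table.
table = struct.pack("<2L", 0, 8 + len(frame_a))
value = item(table) + item(frame_a) + item(frame_b) + b"\xfe\xff\xdd\xe0" + b"\x00" * 4

offsets, base = read_basic_offset_table(value)
# Each offset points at the Item tag of a frame's first fragment.
starts = [base + off for off in offsets]
```

If the table is empty (which is allowed), the returned list is empty and the frame boundaries would have to be found some other way, such as walking the item tags and lengths one by one.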

darcymason · 2020-05-18T20:51:36Z

Pushing this milestone back again to get v2.0 out first.

darcymason · 2023-04-21T17:05:33Z

Closing this - I believe it has been shown elsewhere that you can open a memmap file and pass the handle to dcmread, which gives a workable solution.

darcymason added enhancement imported Difficulty-Medium labels Nov 27, 2014

darcymason added this to the v1.0.0 milestone Dec 17, 2014

darcymason mentioned this issue Apr 10, 2016

Provide PixelData decompression (JPEG, RLE, MPEG, etc) #18

Closed

darcymason mentioned this issue May 8, 2016

Use os.stat instead of stat #260

Merged

darcymason mentioned this issue Jun 7, 2016

pypi release with new package name #240

Closed

darcymason mentioned this issue Jan 15, 2017

DeferredDataElement deprecation #291

Closed

darcymason modified the milestones: v1.1, v1.0.0 Jul 11, 2017

darcymason mentioned this issue Jul 14, 2017

Update filereader.py #227

Closed

mrbean-bremen mentioned this issue Jul 24, 2017

Default read_size in read_undefined_length_value() in fileutil.py is too small and slows down reading large DICOM files #436

Closed

darcymason mentioned this issue Dec 15, 2017

Integrated handling of decompression and Dataset state #525

Closed

mrbean-bremen mentioned this issue Feb 18, 2018

Chuncked reads for large elements #222

Closed

mrbean-bremen modified the milestones: v1.1, v1.2 Jul 18, 2018

darcymason mentioned this issue Jul 19, 2018

[MRG+1] Deprecate DeferredDataElement #683

Merged

mrbean-bremen removed this from the v1.2 milestone Sep 10, 2018

darcymason mentioned this issue Sep 17, 2018

"underlying array is read-only" while modifying pixel value in pydicom version 1.1.0 #717

Closed

darcymason mentioned this issue Sep 20, 2018

How to specify pixel data as read-only #746

Closed

darcymason added this to the v1.5 milestone Jun 29, 2019

darcymason modified the milestones: v2.0, v2.2 May 18, 2020

scaramallion added the pixel-data label Nov 1, 2020

darcymason mentioned this issue Nov 27, 2020

[WIP] Add memmap capability for binary data element values #1267

Open

6 tasks

mrbean-bremen removed this from the v2.2 milestone Apr 19, 2022

darcymason closed this as completed Apr 21, 2023
