
Update filereader.py #227

Closed
wants to merge 1 commit into darcymason:master from parneshr:master

Conversation

parneshr

Added chunked reads when reading an element with fixed length, to overcome issues with large data elements. In particular, elements over 4 GB result in errant reads, while reading the same elements in smaller chunks of, say, 1 GB produces no errors. This should have no performance impact for standard-size data elements.
@landscape-bot

Code Health
Repository health decreased by 0.06% when pulling 39a9908 on parneshr:master into 43f2784 on darcymason:master.

len_to_read = length - buf_size
while len_to_read > 0:  # read the remainder in buf_size chunks
    if len_to_read > buf_size:
        value = value + fp_read(buf_size)
        len_to_read -= buf_size
    else:
        value = value + fp_read(len_to_read)
        len_to_read = 0
cancan101 (Contributor)
Is there something better than a string that can be used here?

@darcymason (Member)

I like this chunked-read idea in principle, but I echo cancan101's question about using a string. String concatenation would probably be very slow. Another problem is that a binary file read in Python 3 returns a bytes object, so I think using a string will generate a type error in Python 3.

There are a couple of other minor things: the landscape-bot new issues (whitespace problems - just style issues), and the setting of buf_size, which should happen at the top of the generator rather than inside the loop, both to optimize speed and to be a little cleaner.
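
For illustration, a rough sketch with those points addressed - accumulating into a bytearray (cheap appends, converts directly to bytes) rather than a str, with buf_size fixed once before the loop. Note that read_in_chunks and fp are placeholder names, not pydicom API:

    def read_in_chunks(fp, length, buf_size=1024 * 1024):
        # Sketch only: read `length` bytes in chunks of at most `buf_size`,
        # returning bytes so the result type matches binary reads on both
        # Python 2 and Python 3.
        value = bytearray()
        remaining = length
        while remaining > 0:
            chunk = fp.read(min(remaining, buf_size))
            if not chunk:  # unexpected end of file
                raise EOFError("%d bytes still unread" % remaining)
            value.extend(chunk)
            remaining -= len(chunk)
        return bytes(value)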

@darcymason (Member)

Is the need for this removed if memmap (#139) is implemented?

@mrbean-bremen (Member)

I think that memmap could help to prevent this problem - you could just do the partial reads and handle the concatenation outside of pydicom, and would avoid any unneeded performance degradation. The only downside is that it still has to be implemented...
Also, I'm not entirely sure about the original problem - AFAIK, the maximum size of a DICOM element is 4 GB, but here it was stated that it was larger than that.
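
As an illustration of the memmap route (a sketch only - numpy.memmap is one possible mechanism, not an existing pydicom feature; the offset and length values here are hypothetical and would come from the element parse):

    import numpy as np

    # Hypothetical: byte offset and length of the large element's value,
    # as recorded while parsing the file.
    offset, length = 0x3000, 6 * 1024 ** 3

    # Map only the element's bytes; nothing is read until a slice is touched.
    element = np.memmap("large_file.dcm", dtype=np.uint8, mode="r",
                        offset=offset, shape=(length,))
    first_chunk = bytes(element[:1024 * 1024])  # pages in just this slice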

@darcymason (Member)

AFAIK, the maximum size of a DICOM element is 4 GB, but here it was stated that it was larger than that.

Agreed. There are at most four bytes stored in a DICOM file for Length, so the maximum possible value length should be 4 GB.

@mrbean-bremen (Member)

I looked at this again, and I would add this only as an option (probably something like chunk_size) which defaults to None, i.e. no chunked read - if it is needed at all, which has to be decided.
The reason is that it seems not to be needed on most (64-bit) systems, and it introduces additional memory and CPU load for large files.
For reading, byte strings should be used, as has already been noted, and instead of concatenation the byte strings could be collected in a list and joined at the end - this seems to be the fastest way according to Stack Overflow (though it probably will double the memory usage during joining - I haven't checked this).
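
For concreteness, the collect-and-join pattern described above might look like this (a sketch; the chunk_size=None default mirrors the opt-in behaviour suggested, and read_value is a placeholder name):

    def read_value(fp, length, chunk_size=None):
        # chunk_size=None keeps the current behaviour: one single read.
        if chunk_size is None:
            return fp.read(length)
        # Otherwise collect byte-string chunks and join them once at the
        # end, avoiding repeated reallocation from incremental concatenation.
        chunks = []
        remaining = length
        while remaining > 0:
            chunk = fp.read(min(remaining, chunk_size))
            if not chunk:  # unexpected end of file
                break
            chunks.append(chunk)
            remaining -= len(chunk)
        return b"".join(chunks)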

@darcymason (Member)

Trying to clean up old issues ... this relates to #222. On reading this again, I remembered that length can be "undefined length", so maybe that allows more than 4 GB (although the original post talks about fixed-length elements).

To close this, I think we need a solution to the string question (does BytesIO work more efficiently?), or a memmap solution.
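
On the BytesIO question, a comparable sketch using io.BytesIO as the accumulator (it also avoids quadratic reallocation; whether it actually beats list-and-join would need measuring):

    from io import BytesIO

    def read_value_bytesio(fp, length, chunk_size=1024 * 1024):
        # Sketch: write chunks into an in-memory buffer, then take the
        # assembled value out once with getvalue().
        buf = BytesIO()
        remaining = length
        while remaining > 0:
            chunk = fp.read(min(remaining, chunk_size))
            if not chunk:  # unexpected end of file
                break
            buf.write(chunk)
            remaining -= len(chunk)
        return buf.getvalue()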

@mrbean-bremen (Member)

@darcymason - I think this one can be closed, because the problem can be handled by reading frame by frame, once the respective PR has been merged. A possible memmap feature would also help, as discussed.

@darcymason (Member)

@mrbean-bremen, agreed. Closing.
