Update filereader.py #227
Conversation
Added chunked reads when reading an element with a fixed length, to overcome issues with large data elements. In particular, elements over 4 GB result in errant reads, while reading the same elements in smaller chunks of, say, 1 GB produces no errors. This should have no performance impact for standard-size data elements.
len_to_read = length - buf_size
while len_to_read > 0:
    if len_to_read > buf_size:
        value = value + fp_read(buf_size)
        len_to_read -= buf_size
    else:
        value = value + fp_read(len_to_read)
        len_to_read = 0
Is there something better than a string that can be used here?
I like this chunked-read idea in principle, but I echo cancan101's question about using a string. String concatenation would probably be very slow. Another problem is that a binary file read in Python 3 returns a bytes object, so using a string will likely raise a type error in Python 3. There are a couple of other minor things: the landscape-bot flagged new issues (whitespace problems -- just style issues), and the setting of buf_size should happen at the top of the generator rather than inside the loop, both to optimize speed and to be a little cleaner.
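A minimal sketch of the bytes-safe alternative, assuming the same fp_read callable and buf_size as the diff above; read_chunked is a hypothetical helper for illustration, not pydicom's actual API. io.BytesIO accumulates the chunks without the quadratic copying of repeated concatenation, and it works naturally with Python 3 bytes objects:

```python
import io

def read_chunked(fp_read, length, buf_size=1024 * 1024):
    """Read `length` bytes via fp_read in chunks of at most buf_size."""
    buf = io.BytesIO()
    remaining = length
    while remaining > 0:
        chunk = fp_read(min(buf_size, remaining))
        if not chunk:  # truncated file: fewer bytes than the declared length
            raise EOFError("unexpected end of file during chunked read")
        buf.write(chunk)
        remaining -= len(chunk)
    return buf.getvalue()  # a single bytes object, as the caller expects
```

Something like `read_chunked(fp.read, length)` would then replace the concatenation loop in the diff.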
Is the need for this removed if memmap (#139) is implemented?
I think that memmap could help to prevent this problem - you could just do the partial reads and handle the concatenation outside of the reader.
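For context, a rough sketch of the memmap idea using only the standard library; map_element_value, the offset handling, and the path argument are hypothetical, and #139 may have a different design in mind:

```python
import mmap

def map_element_value(path, offset, length):
    # Map the whole file read-only; the OS pages bytes in on demand,
    # so a multi-GB element value is never read (or concatenated) up front.
    with open(path, "rb") as f:
        mm = mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)
    # Return a zero-copy view of just this element's value bytes.
    return memoryview(mm)[offset:offset + length]
```

The mapping stays valid after the file object is closed, and the returned memoryview keeps the mapping alive.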
Agreed. There are only four bytes at most stored in a dicom file for the length of an element.
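To make the limit concrete (a quick check, relying on the DICOM rule that 0xFFFFFFFF in the 32-bit length field means "undefined length"):

```python
# Largest fixed length encodable in the 4-byte length field:
MAX_FIXED_LENGTH = 0xFFFFFFFE    # 0xFFFFFFFF is reserved for "undefined length"
print(MAX_FIXED_LENGTH)          # 4294967294
print(MAX_FIXED_LENGTH / 2**30)  # ~4.0, i.e. just under 4 GiB
```

So any element advertising a fixed length is already capped at just under 4 GiB by the encoding itself.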
I looked at this again, and I would add this only as an option (probably something like ...).
Trying to clean up old issues ... this relates to #222. On reading this again, I remembered that length can be "undefined length", so maybe that allows more than 4 GB (although the original post talks about fixed-length elements). To close this, I think we need a solution for the string question (does BytesIO work more efficiently?), or a memmap solution.
@darcymason - I think this one can be closed, because the problem can be handled by reading frame by frame, once the respective PR has been merged. A possible memmap feature would also help, as discussed.
@mrbean-bremen, agreed. Closing.