Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

"underlying array is read-only" while modifying pixel value in pydicom version 1.1.0 #717

Closed
xulunk opened this issue Aug 17, 2018 · 23 comments · Fixed by #742
Closed

"underlying array is read-only" while modifying pixel value in pydicom version 1.1.0 #717

xulunk opened this issue Aug 17, 2018 · 23 comments · Fixed by #742
Assignees
Labels
Milestone

Comments

@xulunk
Copy link

xulunk commented Aug 17, 2018

I tried to run a demo about reading dicom file, modify the pixel value and write new pixel data to another dicom file, however it failed in pydicom version 1.1.0 while works well for version 0.9.9-1.
In order to modify the pixel data in version 1.1.0 or later, what modification should I do?
Looking forward to your reply, thanks.

p.s. in order to run version 0.9.9-1, I just changed the module name from pydicom to dicom as recommended.

Description

Steps/Code to Reproduce

import pydicom
import pylab
ds = pydicom.dcmread("IMG.DCM")
 
##查看有哪些属性
print(ds.dir("pat"))
 
##查看对应属性的具体值
print(ds.PatientName)
 
##将属性值给某个元素
data_element = ds.data_element("PatientName")  # or data_element = ds[0x10,0x10]
print(data_element.VR, data_element.value)
 
##删除属性
#del ds.SoftwareVersions
 
##原始二进制文件
pixel_bytes = ds.PixelData
 
##CT值组成了一个矩阵
pix = ds.pixel_array
 
##读取显示图片
pylab.imshow(ds.pixel_array, cmap=pylab.cm.bone)
#pylab.show()
 
##修改图片中的元素,不能直接使用data_array,需要转换成PixelData
print('ds.pixel_array.shape =', ds.pixel_array.shape)
for n,val in enumerate(ds.pixel_array.flat): #example: zero anything < 300
    if val < 300:
        ds.pixel_array.flat[n] = 0
ds.PixelData = ds.pixel_array.tostring()
ds.save_as("IMG_NEW.DCM")

Expected Results

IMG_NEW.DCM is written successfully

Actual Results

Traceback (most recent call last):
File "readDicom.py", line 34, in
ds.pixel_array.flat[n] = 0
ValueError: underlying array is read-only

Versions

python 3.6.3
pydicom 1.1.0

@scaramallion
Copy link
Member

scaramallion commented Aug 18, 2018

Reproducible with current master (87b40f8) and python 2.7/3.6, numpy 1.14.5/1.14.3.

>>> from pydicom import dcmread
>>> ds = dcmread('CT_small.dcm')
>>> arr = ds.pixel_array
>>> arr[0, 4]
139
>>> arr[0, 4] = 12
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ValueError: assignment destination is read-only
>>> arr.flags
  C_CONTIGUOUS : True
  F_CONTIGUOUS : False
  OWNDATA : False
  WRITEABLE : False
  ALIGNED : True
  WRITEBACKIFCOPY : False
  UPDATEIFCOPY : False

WRITEABLE : False can be fixed with arr.setflags(write=1), but I have no idea why the array is read-only to begin with.

@scaramallion
Copy link
Member

scaramallion commented Aug 18, 2018

We use frombuffer to create the array, which uses the memory buffer of the input data and as str/bytes are immutable in python this makes the resulting array read-only.

>>> import numpy as np
>>> np.frombuffer(b'\x00\x01', dtype='uint8').flags
  C_CONTIGUOUS : True
  F_CONTIGUOUS : True
  OWNDATA : False
  WRITEABLE : False
  ALIGNED : True
  WRITEBACKIFCOPY : False
  UPDATEIFCOPY : False
>>> np.fromstring(b'\x00\x01', dtype='uint8').flags
  C_CONTIGUOUS : True
  F_CONTIGUOUS : True
  OWNDATA : True
  WRITEABLE : True
  ALIGNED : True
  WRITEBACKIFCOPY : False
  UPDATEIFCOPY : False
>>> np.frombuffer(bytearray(b'\x00\x00'), dtype='uint8').flags
  C_CONTIGUOUS : True
  F_CONTIGUOUS : True
  OWNDATA : False
  WRITEABLE : True
  ALIGNED : True
  WRITEBACKIFCOPY : False
  UPDATEIFCOPY : False

v0.9.9 used fromstring, looks like it was changed in 28e7459 (#577) @mrbean-bremen do you remember why the change was made? I can see the benefit in reusing the memory, does changing the flag to make the array writable negate that?

@scaramallion
Copy link
Member

scaramallion commented Aug 18, 2018

Oh yikes (remember that in pydicom original is usually ds.PixelData):

>>> original = b'\x00\x01'
>>> arr = np.frombuffer(original, dtype='uint8')
>>> arr.setflags(write=1)
>>> arr[0] = 12
>>> original
b'\x0c\x01'

I think we should change back to fromstring.

scaramallion added a commit to scaramallion/pydicom that referenced this issue Aug 18, 2018
scaramallion added a commit to scaramallion/pydicom that referenced this issue Aug 18, 2018
@mrbean-bremen
Copy link
Member

mrbean-bremen commented Aug 18, 2018

@xulunk - if you want to modify the returned pixel array, you have to copy it first using numpy.copy. Sorry for breaking that...

@scaramallion - this has been changed because tostring () is deprecated (there have been depreciation warnings). I also thought it a good idea at that time because of the performance improvement.
I'm still not sure if we shall update the code to copy the data, or update the documentation to habe the user copy it. I have no idea how common the use case of directly modifying the returned pixel data is.

@scaramallion
Copy link
Member

Ah, that makes sense. I think its better to return a copy given in-place changes result in PixelData being changed as well.

@darcymason
Copy link
Member

I think its better to return a copy given in-place changes result in PixelData being changed as well

I don't see that as a problem. No one can use PixelData directly, it makes no sense on its own as 1d bytes.

Copying the array is really not reasonable for very large images, which is a use case that many people do have. I think we need to avoid making copies.

@xulunk xulunk changed the title "underlying array is read-only" while modifying pixel value in version 1.1.0 "underlying array is read-only" while modifying pixel value in pydicom version 1.1.0 Aug 20, 2018
@xulunk
Copy link
Author

xulunk commented Aug 20, 2018

@scaramallion @mrbean-bremen @darcymason 👍

  • Thanks a lot to your reply;

  • I modified the code as follows and now it runs well;

  • The demo I used referred to the official document of pydicom: working_with_pixel_data, could you please update the user guide?

  • Shall I close the issue now?

    new_array = ds.pixel_array.copy()
    for n,val in enumerate(new_array.flat): #example: zero anything < 300
        if val < 300:
            new_array.flat[n] = 0
    ds.PixelData = new_array.tostring()
    ds.save_as("IMG_NEW.DCM")

@mrbean-bremen
Copy link
Member

@xulunk - glad to hear you got it to work.
Leave the issue open please - we still have to fix it.

@mrbean-bremen
Copy link
Member

@darcymason @scaramallion - do we have a decision for this one? Leave it as is?

@darcymason
Copy link
Member

darcymason commented Sep 15, 2018

I think making the buffer writable restores previous behaviour. It does mean that PixelData is modified in-place, but as I said before, I don't see that as a problem - I don't see why anyone would be directly manipulating PixelData's 1-d array (except to write to it with tobytes()/ tostring() after using pixel_array, which will still work).

Edit: we should still of course warn about this change in behaviour.

@mrbean-bremen
Copy link
Member

So, setting setflags(write=1) as proposed by @scaramallion and adding a warning to the release notes/documentation is the way to go, if I understand that correctly, right? Is this a candidate for 1.2?

@darcymason
Copy link
Member

So, setting setflags(write=1) as proposed by @scaramallion and adding a warning to the release notes/documentation is the way to go, if I understand that correctly, right? Is this a candidate for 1.2?

Yes, that's what I think is the best way forward. I just realized too that a test for this is very important -- it should have been caught before 1.1 went out.

And yes, IMO it's very important for 1.2 because it "broke" 1.1.

@scaramallion
Copy link
Member

scaramallion commented Sep 16, 2018

There's a fair bit of inconsistency in that some of the handlers retain a link to the buffer and some don't. The numpy handler will sometimes return a copy (as with bit packed data) and sometimes not (may also depend on the exact parameters of the data). I don't know for sure about the other handlers, but given they decompress data they're almost certainly not linked to the original buffer.

I'd prefer it if the behaviour was more predictable (either in-place modification never affects Pixel Data or always does), but I don't think the 'always affects Pixel Data' option is even possible.

Also, as long as only the view (i.e. a reshape/transpose/dtype) on the read-only 1D array is changed (and I believe even then some particular view operations will return a copy instead), the returned ndarray will be read-only.

@scaramallion scaramallion added this to the v1.2 milestone Sep 16, 2018
@mrbean-bremen
Copy link
Member

@scaramallion - so what do you propose? Return a copy in the cases where the original pixel data is affected, and make the array writable otherwise?

@scaramallion
Copy link
Member

scaramallion commented Sep 16, 2018

Returning a copy for the numpy handler would make it consistent with the others, which should all be writable. I guess its up to @darcymason to decide.

I'll add some tests to try and nail down the circumstances where the returned array is read-only.

@scaramallion
Copy link
Member

scaramallion commented Sep 16, 2018

Right, so:

  • Numpy handler arrays with BitsAllocated = 1 are writeable
  • Numpy handler arrays with BitsAllocated > 1 are read-only and linked to PixelData
  • RLE handler arrays are writeable, except in PyPy2 (a numpypy issue? not sure why it only affects RLE)
  • Pillow handler arrays are writeable
  • JPEG-LS handler arrays with NumberOfFrames = 1 are read-only and not linked to PixelData
  • JPEG-LS handler arrays with NumberOfFrames > 1 are writeable
  • GDCM handler arrays are read-only and not linked to PixelData

The single frame JPEG-LS and the GDCM handlers return read-only arrays because the decompressed output from gdcm and jpeg_ls are turned into a numpy array from a bytes, they're disconnected from the original PixelData memory. The multi-frame JPEG-LS arrays use bytearray instead and hence are writeable.

@mrbean-bremen
Copy link
Member

So if I understand that correctly, the returned arrays for the GDCM and JPEG-LS handlers could be made writable without problems, and the question is about the NumPy handler with the alternatives:

  • copy it to be consistent (possible performance problem)
  • make it writable and document that original pixel data will be changed (possible unintended modification)
  • leave it read-only and document that you have to copy it for modification (inconsistency between handlers)

@darcymason
Copy link
Member

To step back from detail for a moment, I'd like pixel data to work exactly like this 1-D array example:

>>> # load CT_small.dcm from testdata...
>>> pos = ds.ImagePositionPatient
>>> pos
['-158.135803', '-179.035797', '-75.699997']
>>> pos[0] = 0
>>> ds.ImagePositionPatient
['0.0', '-179.035797', '-75.699997']

The fully converted data element value should be a reference to a (mutable) numpy array. That's what we have given in the past for pixel_array, which we are now finding in some cases has not been mutable.

... the question is about the NumPy handler with the alternatives:

  • copy it to be consistent (possible performance problem)

Yes, if necessary. We used fromstring() pre-1.1, that must also have been a copy.

  • make it writable and document that original pixel data will be changed (possible unintended modification)

Yes. But I'd be interested to hear specific code this would break. If there are common cases, we could re-think this.

  • leave it read-only and document that you have to copy it for modification (inconsistency between handlers)

I don't like this option. Consistency is important - I think it would be very confusing to users to have to copy pixel data themselves as a normal operation.

And I can't help but put in a plug again for what I have argued before: this confusion all goes away if we indeed treat PixelData like all other data elements. That we should (very long term, major version release) deprecate pixel_array and have PixelData work just like ImagePositionPatient did in my example above. Now that pydicom can handle almost any pixel data using external libraries, the artificial split is no longer necessary.

@mrbean-bremen
Copy link
Member

I don't like this option. Consistency is important

Agreed. I put this here to have all options listed (at least which I see), but I would also tend to option 1 or 2. I don't know of the specific use cases, so I have no strong opinion here.

this confusion all goes away if we indeed treat PixelData like all other data elements.

This sounds very sensible to me - though there has been some arguing against that, IIRC. This is something that can be discussed again after the release, and it may not be that long term in my opinion, but that remains to be discussed.

@scaramallion
Copy link
Member

scaramallion commented Sep 17, 2018

I'm in favour of the first option. For the numpy handler we could add a keyword argument to get_pixeldata.

# Default, returns mutable array
arr = get_pixeldata(ds, copy=True)
# Returns immutable array linked to PixelData (if not bit-packed)
arr = get_pixeldata(ds, copy=False)

I also agree with eventually deprecating pixel_array. It might be a good exercise to implement this first for OverlayData to get an idea of the best way to go about it (and/or convince those against it).

@mrbean-bremen
Copy link
Member

For the numpy handler we could add a keyword argument to get_pixeldata.

+1
I thought about the same thing, so you still have the possibility to avoid the copying.

@darcymason
Copy link
Member

Can I suggest read_only=True/False rather than copy? The distinction is better for eventual memory-mapped files (#139) where that would control how the file is opened. Also, I think it would be more natural to raise an error with the message that the pixel data was converted read-only, if someone attempts to write to a read-only array. In either case, the documentation has to point out the other effect -- i,e, if read-only is used, the documentation would explain the advantage in terms of memory saving.

@scaramallion
Copy link
Member

Ok, sounds good. I'll take care of this

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants