
Cell values into shapes? #2

Open
josenimo opened this issue Feb 4, 2022 · 32 comments

@josenimo
Collaborator

josenimo commented Feb 4, 2022

Dear Sophia and George,
I am Jose, a PhD student in Fabian's lab in Berlin. I am very happy to hear about your progress with the current py-lmd; its functionalities are great! For us, however, there is a key step missing: obtaining the shape coordinates for each cell from the analysis pipeline output. I have uploaded a sample output table. I tried Googling a simple way to translate the cell shape measurements into coordinates, but I was not able to find anything. Let me know if there is any way I can help.

Thank you again for the wonderful package, I look forward to hearing from you.
Best,
Jose

MCMICRO17_output.csv

@GeorgWa
Collaborator

GeorgWa commented Feb 4, 2022

Dear Jose,

Good to hear that you checked out our package. At the moment the module gives you a basic framework for creating more complex workflows; the library is highly customizable, and there are no real 'out of the box' solutions yet.

We developed it to generate cutting files based on segmentation masks, and I will add this workflow over the course of the next days. This segmentation-based workflow can be used with a segmentation map and has a couple of more complex features which are very beneficial for cutting large numbers of single cells:

  • Combining touching or intersecting shapes
  • Applying filters like smoothing or binary erosion, dilation
  • Compression of vertices for improved cutting time
  • Optimization of cutting order for improved cutting time

The alternative to using segmentation maps is what you suggested: using a list of coordinates with width, height, and rotation, which is already quite straightforward with the package. You could, for example, use the tools.rectangle() function to generate a rectangle for every shape in the list.

import pandas as pd
from lmd import tools

df = pd.read_csv('sample_locations.csv')
for row_dict in df.to_dict(orient="records"):
    cell_shape = tools.rectangle(row_dict['MinorAxisLength'],
                                 row_dict['MajorAxisLength'],
                                 offset = (row_dict['X_centroid'], row_dict['Y_centroid']),
                                 rotation = row_dict['Orientation'])

Looking at your data, and from our experience with cutting single cells, smooth shapes are generally easier to cut. I therefore implemented a function to generate an ellipse, analogous to the tools.rectangle function. You can find the function in the documentation, and I included a notebook under /notebooks/Centroid_Cell_list.

[Screenshot: tools.ellipse documentation]

If we use this tools.ellipse function we can generate cutting data which would be a very good start for what you plan to do. I hope this can serve as a first inspiration.

import pandas as pd
import numpy as np
from lmd.lib import Collection, Shape
from lmd import tools

calibration = np.array([[0, 0], [0, 13000], [13000, 13000]])
my_first_collection = Collection(calibration_points = calibration)

# load csv
df = pd.read_csv('sample_locations.csv')

# iterate all rows
for row_dict in df.to_dict(orient="records"):
    
    # generate a shape for each row
    cell_shape = tools.ellipse(row_dict['MinorAxisLength'],
                               row_dict['MajorAxisLength'],
                               offset = (row_dict['X_centroid'], row_dict['Y_centroid']),
                               rotation = row_dict['Orientation'])
    
    # add shape to collection
    my_first_collection.add_shape(cell_shape)
    
my_first_collection.plot(calibration = True, fig_size = (20, 20))

[Figure: plotted collection of ellipse shapes with calibration points]

In @sophiamaedler's and my experience, it is good to start with the calibration workflow and make sure that the transfer of coordinates from single images -> whole-slide images -> cell locations -> cutting data -> LMD microscope works. We started by generating a custom calibration layout which we use with every membrane slide before putting samples on it. If you would like, we could have a call together and discuss the best use of the package.

Best, Georg

@GeorgWa
Collaborator

GeorgWa commented Feb 4, 2022

@josenimo Do you have a segmentation for the sample you shared?

@josenimo
Collaborator Author

josenimo commented Feb 7, 2022

Dear Georg,
Thank you for your prompt response; I am very impressed by the quick results you got. The workflow is clear and easy to follow. The final image looks good, even though some shapes seem to be overlapping. I will try out some things inspired by what you have done.

Regarding segmentation data, yes, I have a segmentation file. Attaching a download link.
https://filetransfer.mdc-berlin.de/?u=CwDbNwC7&p=WF2MfDBj

Best,
Jose

@sophiamaedler
Collaborator

Dear Jose,
Glad to hear that you are finding the py-lmd library useful.
Georg also added some additional documentation/functions this weekend which let you work directly with segmentation masks to generate the XMLs. I am not sure if you already saw this, but it could be quite interesting for you. You can check out the last two commits for more details. We can then also discuss in more detail next week in person.
Cheers
Sophia

@GeorgWa
Collaborator

GeorgWa commented Feb 7, 2022

Dear Jose,

Thanks for providing the segmentation. In the first workflow I shared, we only have information on the location and rough shape of the cells, so the shapes we place can still overlap. The best practice is to use a segmentation like the one you provided. I have added the workflow and some preliminary information to the documentation, and I will extend it to include information on the different processing parameters for optimizing the shapes and the cutting performance. Processing the segmentation is very straightforward and takes about 3 minutes.

In the example I select all 13803 cells and assign them to well A1.
The first image shows the generated cutting data.

import numpy as np
from PIL import Image
from lmd.lib import SegmentationLoader

import tifffile
im = tifffile.imread('/Users/georgwallmann/Documents/testdaten/cell.ome.tif')
segmentation = np.array(im).astype(np.uint32)

all_classes = np.unique(segmentation)

cell_sets = [{"classes": all_classes, "well": "A1"}]

calibration_points = np.array([[0,0],[0,13000],[13000,13000]])

loader_config = {
    'orientation_transform': np.array([[0, -1],[1, 0]])
}

sl = SegmentationLoader(config = loader_config, verbose=True)
shape_collection = sl(segmentation,
                      cell_sets,
                      calibration_points)

shape_collection.plot(fig_size = (50, 50), save_name='big.png')

[Figure: big.png — generated cutting data for all 13803 cells]

@GeorgWa
Collaborator

GeorgWa commented Feb 7, 2022

In the different plots you can also see the effect of the orientation transform. In the first workflow we mapped x and y directly to the LMD x- and y-axis without applying any transform. In the SegmentationLoader workflow we specify the transform as described in the documentation and get the same orientation as in the image.
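The effect of the orientation transform can be illustrated directly. A minimal sketch, assuming the transform is applied as a matrix product to each (x, y) vertex (py-lmd's internal handling may differ):

```python
import numpy as np

# The orientation transform from the SegmentationLoader config above:
# a 90-degree rotation matrix.
T = np.array([[0, -1],
              [1, 0]])

# A hypothetical vertex in image coordinates (x, y).
vertex = np.array([100, 40])

# The matrix maps (x, y) -> (-y, x).
transformed = T @ vertex
print(transformed)  # [-40 100]
```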

@josenimo
Collaborator Author

josenimo commented Feb 8, 2022

Dear Georg and Sophia,

I had to add from skimage import segmentation for shape_collection = sl(segmentation, cell_sets, calibration_points) to run; otherwise it raised an AttributeError.

Otherwise I was able to replicate everything shown by Georg. It is quite impressive, and I am excited to try out the entire open-source workflow on a new sample. I have some concerns that you could probably help me with: we etch our reference points using the LMD, so what would be the simplest way to obtain the coordinates of these points, which can only be recognized visually? Could I just load the merged picture in ImageJ/Napari and check the pixel coordinates?

We will prepare a tissue sample in a Frame slide, and try it out. I will keep you updated.

Thank you again for the amazing support!
Jose

@sophiamaedler
Collaborator

sophiamaedler commented Feb 8, 2022

Dear Jose,
Thanks for the feedback. Glad you were able to replicate Georg's results with some small modifications.

Yes, this would be the simplest way to obtain your calibration coordinates, and it is also how we do it in our workflow. We first calibrate our frame slides with a cross pattern and a unique sample ID number before seeding our cells on the slides. After staining and imaging, we assemble whole-slide stitched images and open these in ImageJ. I can then visually find the calibration crosses (since I use the same calibration mask for all slides, they are always in approximately the same location) and write down the pixel coordinates.

When I stitch the images, I ensure that the generated image is in the same orientation as the slide will be when I place it in the LMD7. As Georg explains in the documentation here, this ensures that the py-lmd software automatically handles the transformation of the coordinates, meaning you do not need to make any calculations/transformations before inputting the coordinates into the workflow Georg posted above. What we have found to be the critical step in this process is the image stitching: even small errors in the stitching will lead to incorrect shape alignment when importing the XML on the LMD7.
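The pixel coordinates written down in ImageJ can then be passed directly to the workflow above. A minimal sketch (the coordinate values and variable names here are made up for illustration):

```python
import numpy as np

# Hypothetical pixel coordinates of three calibration crosses, read off
# manually in ImageJ from the stitched whole-slide image. Replace these
# with the values noted for your own slide.
cross_1 = (250, 310)
cross_2 = (250, 19800)
cross_3 = (19750, 19800)

# This array can be handed to SegmentationLoader / Collection as
# calibration_points in the workflows posted above.
calibration_points = np.array([cross_1, cross_2, cross_3])
print(calibration_points.shape)  # (3, 2)
```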

I am in the process of making a tutorial to quickly generate a calibration mask (i.e. an XML which you can load to calibrate slides) and will also describe the process of finding calibration points and inputting them into the XML generation workflow. I'll let you know as soon as it is finished.

Cheers
Sophia

@GeorgWa
Collaborator

GeorgWa commented Feb 8, 2022

Dear Jose,

Glad to see that you were able to reproduce the results.
Could you post the whole stack trace for the error you encountered? The segmentation variable should contain the labels from the image as a numpy array, so I don't understand why you had to import the segmentation package from scikit-image.

Best,
Georg

@josenimo
Collaborator Author

josenimo commented Feb 9, 2022

Dear Georg,

I can't really say why it behaves like this. I have copied the error message into a notepad file; hopefully it helps. I am not really sure what you mean by stack trace.
pylmd_bug_1.txt
Best,
Jose

@GeorgWa
Collaborator

GeorgWa commented Feb 9, 2022

Thanks a lot, I will fix this with the next version.

@josenimo
Collaborator Author

josenimo commented Mar 3, 2022

Dear @GeorgWa @sophiamaedler ,

I am trying to replicate the export of contours from segmentation data, but I keep running into AttributeErrors. Like I mentioned, I first ran into the missing skimage.segmentation import, but now I also get:
AttributeError: module 'skimage.segmentation' has no attribute 'astype'
I am posting the entire stack trace here:
lmd_py_bug2_MCMICRO_19.txt
I will keep trying other things and will let you know if I solve it.

@sophiamaedler
Collaborator

@josenimo Could you please post the code you are running, ideally with input files? Alternatively, send them to me via email. Then I will try to replicate your issue.

@josenimo
Collaborator Author

josenimo commented Mar 3, 2022

Alright, here is the link to the Jupyter notebook and the segmentation file; perhaps something is not up to date?
https://filetransfer.mdc-berlin.de/?u=dueURtvt&p=h3cnNnMe

@sophiamaedler
Collaborator

Great, I will check it out and let you know.

@josenimo
Collaborator Author

josenimo commented Mar 3, 2022

Importing skimage.segmentation worked with the first sample data that I sent you (the colon cancer sample), but with this one it is not working...
Also, I think there might be something wrong, or at least different, with the segmentation file; for some reason I can't even open it in ImageJ.

@josenimo
Collaborator Author

josenimo commented Mar 3, 2022

I was able to stop the error from coming up by using the quality control file. This file usually has the segmentation mask and the real image together in one stack; I just took the segmentation mask and saved it as an image by itself. It ran, but it got stuck at the 'Calculating polygons' step.
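Extracting the mask page from such a QC stack can be sketched with tifffile (the file names, the synthetic data, and the page order are assumptions for illustration, not part of the original workflow):

```python
import numpy as np
import tifffile

# Build a tiny synthetic two-page stack standing in for the QC file:
# page 0 = segmentation mask, page 1 = intensity image (order assumed).
mask = np.arange(16, dtype=np.uint16).reshape(4, 4)
image = np.ones((4, 4), dtype=np.uint16)
tifffile.imwrite('qc_stack.tif', np.stack([mask, image]))

# Keep only the segmentation page and save it as its own file,
# cast to uint32 as in the SegmentationLoader example above.
stack = tifffile.imread('qc_stack.tif')
segmentation = np.asarray(stack[0]).astype(np.uint32)
tifffile.imwrite('segmentation_only.tif', segmentation)
print(segmentation.shape)  # (4, 4)
```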

lmdpy_bug3_MCMICRO19.txt

@sophiamaedler
Collaborator

I was able to run your notebook on my MacBook with the original file you sent without any issues so far. It is currently on the 'Calculating polygons' step. For some reason this is also processing very slowly on my MacBook (less than 1 it/s, so the total processing time would be over 4 hours). I am currently looking into that and will get back to you on what I find. It normally does not take this long when I run it on my Linux workstation (something like this processed there in minutes).

I also did not need to import skimage.segmentation to get your notebook to run, so I think something might be incorrect in your environment. What setup are you running it on? Could you maybe also post the specs of your conda environment?

I have generated a yaml from my current working environment. You could try generating a new conda environment from it using conda env create -f pylmd_environment.yml and see if the import skimage.segmentation issue still occurs. Regarding the very slow processing speed of the polygon creation, I will get back to you; I am checking the same setup on my Linux workstation now.

@josenimo
Collaborator Author

josenimo commented Mar 3, 2022

Ok, at least it is just an environment thing. I am trying to recreate the environment, but conda is failing. I will try to recreate it on my home desktop to see if the problem is workstation-specific.

@sophiamaedler
Collaborator

It might be that some of my conda package specs are M1-Mac-specific. If you keep having issues, I would start from a clean conda env and try again with that.

@josenimo
Collaborator Author

josenimo commented Mar 3, 2022

Hmm, I am not able to replicate your conda env. I tried moving the packages that fail to install over to pip, and installing them manually, but without success. Could you send me the env from your Linux workstation? I cannot think of anything else.
I also tried reinstalling py-lmd from scratch in a new conda env, and I got the following error message.
pylmd_bug4.txt

@GeorgWa
Collaborator

GeorgWa commented Mar 3, 2022

Hi Jose,

First, regarding the AttributeError:

In scikit-image you have to import all submodules explicitly e.g.:
from skimage.morphology import binary_erosion, disk

If you only import the whole package like this, the submodules are not guaranteed to be available:
import skimage as sk

I wasn't aware of this behavior and fixed it by changing all sk imports in the following commit:
a210562#diff-7be789b257496f870868616568b1c1f72d3a36ff3d43f433c2504fbcff858feb

Therefore you don't have to include the line from skimage import segmentation anymore.
Using that line gave you two different objects named segmentation (a python module and the segmentation array we try to process), which led to the interference.
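The explicit-import pattern looks like this; clear_border is used here only as a demo function from the submodule (it is not part of py-lmd):

```python
import numpy as np
# Import the submodule explicitly instead of relying on `import skimage`:
from skimage import segmentation

labels = np.array([[1, 0, 0],
                   [0, 2, 0],
                   [0, 0, 0]])

# clear_border removes labelled objects that touch the image border;
# only the interior label (2) survives.
cleaned = segmentation.clear_border(labels)
print(cleaned[1, 1])  # 2
print(cleaned[0, 0])  # 0
```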

@GeorgWa
Collaborator

GeorgWa commented Mar 3, 2022

Regarding a functional conda environment:
I am very sorry you are experiencing these issues.
Unfortunately, installing the package based only on pip dependencies does not work for some reason.
I have updated the instructions in the README.md.

Please make sure you have the latest version of py-lmd (git pull) and try the following:

conda create -n "dep-test"
conda activate dep-test
conda install python=3.9 scipy scikit-image numpy numba
pip install -e .

I just verified this. Please let me know if this resolves the errors.

@josenimo
Collaborator Author

josenimo commented Mar 4, 2022

Hey @GeorgWa @sophiamaedler,

Georg's last advice was exactly what I needed. I did, however, have to install extra packages to run that specific conda env in a Jupyter notebook; I am not sure if there is an easier way of doing this. I installed nb_conda and ipykernel with conda, and environment_kernels with pip. Then I could run the env in Jupyter without any problems.

As of now, python is still running; it is using about 180 GB of RAM and plenty of CPU. It says it will take 5 hours. I hope I only have to do this once :)

Calculating polygons
2%|█▌ | 307/16400 [09:49<5:38:33, 1.26s/it]

@GeorgWa
Collaborator

GeorgWa commented Mar 4, 2022

Good to hear @josenimo that it worked.
Now that we solved the issues regarding the environment, I will take a look at the performance.

Sophia already told me that she is experiencing the same bottleneck at the polygon generation step. I can't reproduce this on my Mac, where I get 80 iterations/s. I am currently trying to connect to my Linux environment and will update you on the issue.

The step should be very fast and only consume a couple of gigabytes of RAM. I'm very confused. 😀
Best,
Georg

@GeorgWa GeorgWa closed this as completed Mar 4, 2022
@GeorgWa GeorgWa reopened this Mar 4, 2022
@josenimo
Collaborator Author

josenimo commented Mar 4, 2022

Alright, good to hear.
My Jupyter notebook crashed after about 2 hours, so I could not see the end result.
Have a good weekend!

@GeorgWa
Collaborator

GeorgWa commented Mar 4, 2022

That sounds a lot like Python ran out of memory.
Have a good weekend!

@GeorgWa
Collaborator

GeorgWa commented Mar 19, 2022

Hi @josenimo,

we also observed the bug on our Linux system. It led to very slow processing after the first 3000 shapes and very high memory utilization. I released a new version of the py-lmd library, and this bug should now be fixed.

It would be great if you could update your repository and check whether the segmentation works for you now!

Best,
Georg

@josenimo
Collaborator Author

Hey @GeorgWa ,

I have run git pull and pip install -e . in my conda environment. However, running the shape collection command crashes with the following trace:
20220321_lmdpy_bug.txt

Hopefully it is something simple, or most likely a simple mistake I am making (I am running the exact same Jupyter notebook).
Best,
Jose

@GeorgWa
Collaborator

GeorgWa commented Mar 21, 2022

Hi Jose,

This is due to an old version of scikit-image.

I will try to include a warning.
Can you try updating scikit-image to a version greater than 0.19?

Best,
Georg

@GeorgWa
Collaborator

GeorgWa commented Mar 21, 2022

I've updated the instructions in the readme, sorry for the inconvenience.
It seems like scikit-image greater than 0.19 is only available on conda-forge.

git clone https://github.com/HornungLab/py-lmd

conda create -n "py-lmd-env"
conda activate py-lmd-env
conda install python=3.9 scipy "scikit-image>=0.19" numpy numba -c conda-forge
pip install -e .

@josenimo
Collaborator Author

Dear @GeorgWa,

Now it works wonderfully. I will start looking into filtering cells and applying that to the contour creation.
Is there an easy way for me to superimpose the XML contour data in a picture viewer, say napari or ImageJ? I know you coded an XML reader, but I haven't had the time to look it up.

Best,
Jose
