
Optimisation approaches for cutting many cells in close proximity while preserving membrane integrity #5

Open
sophiamaedler opened this issue Feb 21, 2022 · 6 comments

@sophiamaedler
Collaborator

As discussed offline, it would be ideal to develop an optimisation approach that allows cutting cells in close proximity while preserving membrane integrity. I have created this GitHub issue to discuss the details with all interested parties (@fabsen-87, @josenimo, @LisaSchweizer) and make sure we implement a tool that meets everyone's needs.

Based on the feedback I have received so far, a first approach could be to selectively eliminate individual cells from densely segmented slide areas so that the membrane stays intact and we neither collect any "wrong" membrane areas nor lose membrane integrity. This would probably be implemented iteratively using proximity and area filters. We would of course lose some cells, but in return we would be able to collect all remaining cells without the risk of contaminating or destroying the sample.
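A very rough sketch of how such an iterative filter could look (everything here is hypothetical, i.e. none of these names exist in the codebase, and the geometry handling just uses shapely for illustration):

from shapely.geometry import box

def filter_dense_cells(polygons, min_gap=2.0, min_area=50.0):
    """Return indices of cells that can be cut without disturbing their neighbours."""
    # area filter: drop very small shapes right away
    keep = [i for i, p in enumerate(polygons) if p.area >= min_area]
    while len(keep) > 1:
        # count close neighbours for every remaining cell
        counts = {i: sum(1 for j in keep
                         if i != j and polygons[i].distance(polygons[j]) < min_gap)
                  for i in keep}
        worst = max(counts, key=counts.get)
        if counts[worst] == 0:
            break                 # no remaining conflicts, all kept cells are safe to cut
        keep.remove(worst)        # drop the most crowded cell and repeat
    return keep

# toy example: three touching squares and one isolated square
cells = [box(0, 0, 10, 10), box(10, 0, 20, 10), box(20, 0, 30, 10), box(50, 0, 60, 10)]
print(filter_dense_cells(cells))  # -> [0, 2, 3], the middle square is sacrificed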

One question that @GeorgWa and I had was at what point in the processing pipeline it would make the most sense from your perspective to implement such a filter:
(1) when loading the segmentation mask or
(2) when actually generating the cutting XML

In addition, if you have any specific requirements that such a tool would need to fulfil, it would be great if you could quickly outline them here.

sophiamaedler added the enhancement (New feature or request) label on Feb 21, 2022
@fabsen-87

fabsen-87 commented Feb 22, 2022 via email

@GeorgWa
Collaborator

GeorgWa commented Feb 22, 2022

Hi Fabian,

There is already a function for shape collections called Collection.stats().
You could call it after loading a segmentation, similar to:

# build the shape collection from the segmentation mask, cell sets and calibration points
sl = SegmentationLoader(config=loader_config, verbose=False)
shape_collection = sl(segmentation, cell_sets, calibration_points)

# print summary statistics for the generated shapes
shape_collection.stats()

It will give you information on the number of shapes and the number of vertices:

===== Collection Stats =====
Number of shapes: 7
Number of vertices: 4,913
============================
Mean vertices: 702
Min vertices: 599
5% percentile vertices: 617
Median vertices: 687
95% percentile vertices: 811
Max vertices: 839

I've used it so far to optimize the compression of shapes.
Let me know if this is what you are looking for.

Best,
Georg

@fabsen-87

fabsen-87 commented Feb 24, 2022 via email

@josenimo
Collaborator

Regarding the cutting strategy, @sophiamaedler @GeorgWa: to me it makes more sense to implement the cutting-path optimisation at the XML export step. I am not sure exactly what "loading the segmentation mask" means.

Is this implementation only changing the order in which the contours are cut, or are some contours simply skipped for the sake of membrane integrity?

Best,
Jose

@sophiamaedler
Collaborator Author

Hi @josenimo,
This implementation would skip some contours, so you would indeed lose some shapes. In our opinion this more radical approach is necessary because only optimising the cutting order will not ensure membrane integrity in all cases. For example, if you have a fully connected circle of cells, the enclosed middle area will fall down and be incorrectly collected as soon as the last cell of the circle is cut, even if we optimise the order so that this happens as late as possible, and that is something we would like to avoid at all costs. Having fewer cells available is something users can address by getting more input material, segmenting more cells, etc., whereas an incorrect collection could ruin an entire experiment. What are your thoughts on this?
With "loading the segmentation mask" I was referring to the very first step in the pipeline, where you import an array defining the areas of individual cells/contours so that they can be converted to an XML in later steps. The advantage of adjusting the shapes to ensure membrane integrity already in this step is that you get feedback as early as possible on the maximum number of shapes available for cutting in your segmentation.

One concern we had about implementing the algorithm in the final export step is that users would suddenly have far fewer cells available than they expect based on what they loaded and selected in previous steps, and might end up with unequally distributed classes. For example, you load your segmentation, look at the classes, see that at least 500 cells of each type are available, and choose to export 500 of each class to your XML. Unluckily, class 1 is clustered much more tightly, so we have to filter out more of its cells than in class 2 to preserve membrane integrity, and you actually end up with 350 cells in class 1 and 450 cells in class 2. Going back to re-optimise the cell selection so that you really end up with 500 cells of each class could be quite cumbersome, and at least in our application it is usually quite important to have balanced classes.

I would love to hear more of your thoughts on this, though! Since the pipeline is quite flexible, I am sure it would also be easy to implement a more flexible solution if that benefits users.
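To make the class-balancing concern a bit more concrete, a balanced selection would essentially have to filter each class first and only sample afterwards (purely illustrative sketch, none of these names exist in the pipeline):

import random

def select_balanced(cells_by_class, integrity_filter, n_target=500):
    """Filter every class first, then sample an equal number of cells from each class."""
    filtered = {name: integrity_filter(cells) for name, cells in cells_by_class.items()}
    # the largest balanced selection is limited by the most heavily filtered class
    n = min(n_target, min(len(cells) for cells in filtered.values()))
    return {name: random.sample(list(cells), n) for name, cells in filtered.items()}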
Cheers
Sophia

@josenimo
Collaborator

josenimo commented Mar 4, 2022

Hey @sophiamaedler,
This all makes sense now, I was a bit confused at first.
I think that running the algorithm right after applying the segmentation mask makes the most sense. As you say, it is important to keep the sample number consistent and comparable, and the earlier the better. Would it be possible to ask the algorithm for a certain number of cells and have it take the cell positions into account? I imagine it would take some trial and error to land on an exact number of cells, although I guess I could just request more than needed and leave the rest behind... just throwing ideas :D
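Just to make that idea concrete (again purely hypothetical, not an existing API): one could request a somewhat larger candidate pool, run the proximity filter on it, and then trim the survivors back to the target count:

def request_cells(candidates, proximity_filter, n_target, oversample=1.5):
    """Request more cells than needed, filter them, then trim back to the target count."""
    pool = candidates[:int(n_target * oversample)]  # take extra candidates up front
    survivors = proximity_filter(pool)              # drop cells that endanger the membrane
    return survivors[:n_target]                     # keep at most n_target cells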

In my mind, having an exact number of cells for each group is not essential; our goal is to get enough cells per group to observe their proteins. Thoughts, @fabsen-87?
Best,
Jose
