Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[feature] : Better Tracing & Skeletonisation Merger #800

Open
7 of 24 tasks
MaxGamill-Sheffield opened this issue Feb 12, 2024 · 8 comments
Open
7 of 24 tasks

[feature] : Better Tracing & Skeletonisation Merger #800

MaxGamill-Sheffield opened this issue Feb 12, 2024 · 8 comments
Assignees
Labels
enhancement New feature or request

Comments

@MaxGamill-Sheffield
Copy link
Collaborator

MaxGamill-Sheffield commented Feb 12, 2024

Is your feature request related to a problem?

Users currently require the better skeletonisation, pruning and tracing found within the maxgamill-sheffield/cats branch along with the height-traces Sylvia has made together into one branch.

Describe the solution you would like.

The coding group has decided the best approach is to cherry pick the relevant code (a.k.a no topology stuff yet, or NodeStats), and push these into main, then when we have the time we can write tests. This will hold off any releases to PyPi in the meantime as the Software will be untested.

The tasks are as below:

Current ToDo list:

Outputs

  • Get advice from experimentalists - How do you want the data saving in the csv if any? Due to multiple molecules per grain.
  • Get Sylvia to check and comment on the hdf5 structure.
  • Return crops and overlay traces.

Skeletons

  • Look into skeleton bias not seeming to work.

Tracing

  • Look into why fitted traces don't lie on the backbones (ordered traces do).
  • Get the contingency using ordinary tracing working - need to handle extra dict fields that aren't generated this way.

Config

  • Possibly move the save_fig config dpi param to the plotting dict as a high setting takes a while but are needed for skeletal plots. Or change so "null" takes the plotting config defaults, not "Figure".
  • Work out something for the mask_cmap to produce > 1 colour for crossing plots.

Code Tidy

  • Add docstrings.
  • Maybe tidy up the behemoth analyse_nodes function.

pre-commit hooks

The following checks have been disabled whilst @ns-rse started working on adding/developing tests (under #818) and will need re-enabling and the problems/errors they highlight addressing (doing so off the bat would be a massive amount of work and @ns-rse is focusing on writing unit-tests for the changes on this branch).

Specific hooks can be run on specific files with the following.

pre-commit run <hook-name> --file <path-to-file>

Once they pass the relevant checks the task can be checked off.

numpydoc-validation

Disabled for the following files by adding patterns to pyproject.toml configuration file. Lines that exclude are noted to assist with their removal. Once removed run the pre-commit run numpydoc-validation --file <path-to-file> locally to check everything.

  • topostats/tracing/dnatracing.py (line 275)
  • topostats/tracing/nodestats.py (line 276)
  • topostats/tracing/tracinfuncs.py (line 277) - may be redundant as code may have moved to other modules.
  • topostats/theme.py (line 279)

codespell

The following files have been excluded from this check for now, the relevant line to remove them from in pyproject.toml is 258...

[tool.codespell]
skip = '*.spm*,*.mplstyle,*.svg,nodestats.py,dnatracing.py'
count = ''
quiet-level = 3
  • topostats/tracing/dnatracing.py
  • topostats/tracing/nodestats.py

pylint

Currently the following files are explicitly excluded from being checked by pylint this will need reinstating and issues flagged addressing.

  • topostats/tracing/nodestats.py
  • topostats/tracing/dnatracing.py
@ns-rse
Copy link
Collaborator

ns-rse commented Feb 12, 2024

This will hold off any releases to PyPi in the meantime as the Software will be untested.

It will however mean people can work off of a single main branch installed from Git which will hold all the desired features.

Height-tracing and tests were merged with #790 this morning, closing the original issue #488.

@MaxGamill-Sheffield
Copy link
Collaborator Author

MaxGamill-Sheffield commented Feb 23, 2024

Skeletonisation and pruning added together with the nodestats tracing due to their intertwining and is on branch maxgamill-sheffield/800-better-tracing (linked in the development section).

It currently runs to completion when testing on the nice cat image and minicircles.spm, but a few things have not translated over well...

Current ToDo list:
Outputs

  • Get advice from experimentalists - How do you want the data saving in the csv if any? Due to multiple molecules per grain.
  • Get Sylvia to check and comment on the hdf5 structure.
  • Return crops and overlay traces.

Skeletons

  • Look into skeleton bias not seeming to work.

Tracing

  • Look into why fitted traces don't lie on the backbones (ordered traces do).
  • Get the contingency using ordinary tracing working - need to handle extra dict fields that aren't generated this way.

Config

  • Possibly move the save_fig config dpi param to the plotting dict as a high setting takes a while but are needed for skeletal plots. Or change so "null" takes the plotting config defaults, not "Figure".
  • Work out something for the mask_cmap to produce > 1 colour for crossing plots.

Code Tidy

  • Add docstrings.
  • Maybe tidy up the behemoth analyse_nodes function.

@MaxGamill-Sheffield
Copy link
Collaborator Author

MaxGamill-Sheffield commented Feb 23, 2024

Get advice from experimentalists

Happy with the state of what goes into the grainstats file and the .topostats files and it makes sense.
Distinction between grain and molecule seems clear.
Only thing they wanted to add was to understand how to load data from the .topostats file to plot.

@llwiggins
Copy link
Collaborator

Should U-net masking be included as part of this merger, or perhaps as a separate issue/PR?

@MaxGamill-Sheffield
Copy link
Collaborator Author

Should U-net masking be included as part of this merger, or perhaps as a separate issue/PR?

Great point! There are a few edits within the cats branch to improve the U-Net code (1 object from the segmentation and square crop edge-cases) but I think as it's quite separated from the rest of the code to be added, it might be more beneficial to start that as a separate branch and then merge it.

What do you think @SylviaWhittle?

@ns-rse
Copy link
Collaborator

ns-rse commented May 21, 2024

Hi @MaxGamill-Sheffield,

I'm back on writing more tests for pruning and looking at the topostats.tracing.pruning.topostatsPrune() class. There is one method prune_all_skeletons() that I hadn't previously written tests for but now I look at it I've got some questions about it.

Question 1 - Going loopy

This method uses skimage.morphology.label() to label grains and then loops over them. This shouldn't be needed though since in #600 I refactored the tracing class to work with single grains and the control of tracing multiple grains from a single image is now done in the processing.run_dnatracing() function.

Is this looping a throwback to the older code from before the refactoring or is there a specific reason to loop over multiple grains? Currently I don't think it would do more than one loop since skimage.morphology.label() would label entangled grains as a single item.

Question 2 - Length v Height v both?

The logic in the prune_all_skeletons() isn't clear to me.

Currently it has...

    def prune_all_skeletons(self) -> npt.NDArray:
        """
        Prune all skeletons.
        Returns
        -------
        npt.NDArray
            A single mask with all pruned skeletons.
        """
        pruned_skeleton_mask = np.zeros_like(self.skeleton)
        labeled_skel = morphology.label(self.skeleton)
        for i in range(1, labeled_skel.max() + 1):
            single_skeleton = np.where(labeled_skel == i, 1, 0)
            if self.max_length is not None:
                single_skeleton = self._prune_by_length(single_skeleton, max_length=self.max_length)
            if self.height_threshold is not None:
                single_skeleton = heightPruning(
                    self.img,
                    single_skeleton,
                    height_threshold=self.height_threshold,
                    method_values=self.method_values,
                    method_outlier=self.method_outlier,
                ).remove_bridges()
            # skeletonise to remove nibs
            pruned_skeleton_mask += getSkeleton(self.img, single_skeleton, method="zhang").get_skeleton()
        return pruned_skeleton_mask

Within the for loop there are two if statements.

  1. Prune single_skeleton if the self.max_length is not None. The value for this parameter set in the topostats/default_config.yaml of your branch is -1 (which triggers length based pruning based on 15% of grain length).
  2. Prune single_skeleton if the self.height_threshold is not None. The value for this parameter set in the topostats/default_config.yaml of your branch is None.

This sounds ok, but what if a configuration such as...

dnatracing:
  run: true # Options : true, false
  ...
  pruning_params:
    pruning_method: topostats
    max_length: -1
    height_threshold: 789
    method_values: mid
    method_outlier: mean_abs

...were used by a user? Its not clear to the user, based on the docstrings as they stand what would happen.

I can say looking at the code that only the height-based pruning would be undertaken since the results of pruning based on length are not passed into the heightPruning() class, but is this what is expected?

Are there cases where both length and height based pruning should be applied?

Background

I realised this when starting to remove the for: loop and refactor and write out paramterised tests to check the effects of the different combinations of options for max_length / height_threshold / method_values / method_outlier as ideally tests should test the effects of different options but then I realised the lack of clarity about what behaviour is intended or required.

@MaxGamill-Sheffield
Copy link
Collaborator Author

Hey @ns-rse ,

A1: Yep just old code that I didn't have time to refactor, and I think it was designed similarly to the skeletonisation code which did the same thing as that's what the skimage methods did.

A2: This should absolutely take both the parameters from the config and performing both types of pruning on the skeleton, but concurrently (obvs there may be some issue with which comes first but this is sensible to me as they address different problems). However, maybe I'm misunderstanding but both pruning types update single_skeleton and use single_skeleton so I thought this would be working as intended and both will run under your config?

Also unless there are catastrophic things wrong with the code (e.g. if I've misunderstood here and both pruning stages can't happen together) I'm not sure it's worth your time refactoring the code so we can make the most of your time with us.

@ns-rse
Copy link
Collaborator

ns-rse commented May 22, 2024

D'oh! 🤦 I clearly hadn't engaged my brain when I sat down and looked at the code yesterday as it will indeed apply both pruning types. I'm an idiot sometimes, thanks for pointing out my error so politely.

As I'm unlikely to look at this code again any time soon I will refactor after writing tests to remove the unnecessary loop, otherwise it will persist in the code base and need removing somewhen which may never happen. By writing a test first and then refactoring it will ensure I don't break anything.

Cheers,

@ns-rse

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

When branches are created from issues, their pull requests are automatically linked.

4 participants