Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Added ThresholdTiler #539

Open
wants to merge 5 commits into
base: main
Choose a base branch
from
Open

Conversation

erich-r
Copy link

@erich-r erich-r commented Mar 21, 2023

Description

Added a new tiler that, given a scorer and a threshold, extracts all tiles whose score is above the threshold.
There may be instances when the user does not want the top n tiles from a slide, but rather every tile with a score over a specified threshold.
In this instance, the user does not know beforehand how many tiles must be extracted.

Types of Changes

  • Core
  • Bugfix
  • New feature
  • Enhancement/optimization
  • Documentation

Issues Fixed or Closed by This PR

Checklist

  • My code follows the code style of this project.
  • My change requires a change to the documentation.
  • I have updated the documentation accordingly.
  • I have read the CONTRIBUTING document.
  • I have added tests to cover my changes.
  • I have tested the changes and verified that they work and don't break anything (as well as I can manage).

@codecov
Copy link

codecov bot commented Mar 21, 2023

Codecov Report

Patch coverage: 98.36% and project coverage change: -0.07% ⚠️

Comparison is base (7fc5fcd) 100.00% compared to head (dc6cfd5) 99.93%.
Report is 64 commits behind head on master.

❗ Current head dc6cfd5 differs from pull request most recent head 2b8e315. Consider uploading reports for the commit 2b8e315 to get more accurate results

Additional details and impacted files
@@             Coverage Diff             @@
##            master     #539      +/-   ##
===========================================
- Coverage   100.00%   99.93%   -0.07%     
===========================================
  Files           19       19              
  Lines         1576     1637      +61     
  Branches       165      175      +10     
===========================================
+ Hits          1576     1636      +60     
- Partials         0        1       +1     
Files Changed Coverage Δ
histolab/tiler.py 99.70% <98.36%> (-0.30%) ⬇️

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

Copy link
Member

@ernestoarbitrio ernestoarbitrio left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @erich-r 👋

Thanks for your contribution, we really appreciate it. ❤️

I had a few time to do a first round of code review and left some comments here.

Please take a look at my comments and also I believe that you miss the integration tests for this new tiler. Unit tests are not enough to prevent regressions.

My comments are mostly implementation and design wise ... waiting for @alessiamarcolini or @nicolebussola for a paired CR.

histolab/tiler.py Outdated Show resolved Hide resolved
tile = Tile(image, coords)
_tiles_generator = method_mock(request, GridTiler, "_tiles_generator")
# it needs to be a generator
_tiles_generator.return_value = ((tile, coords) for i in range(3))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The _tiles_generator return value in the signatures is Tuple[List[Tuple[float, CoordinatePair]], List[Tuple[float, CoordinatePair]]] and here you are assigning (tile, coords) I don't think it's quite correct.
So or the test is wrong or the signature is wrong. :D

Copy link
Author

@erich-r erich-r Mar 22, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The method being mocked is _tiles_generator of the GridTiler class, which I think has the signature Tuple[Tile, CoordinatePair], so I think both test and signatura are correct (?)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh ok, but now the question is why are you mocking the GridTiler? You should mock the one from the ThresholdTiler

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh ok, but now the question is why are you mocking the GridTiler?

Because the aim of the test is to check whether the tiler can extract the scores from tiles, so I think it does not matter if the tiles come from a GridTiler.

In the test it_can_calculate_filtered_tiles I am calling the _tiles_generator from a ThresholdTiler.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes but if you are testing the ThresholdTiler that has its own _tiles_generator you should mock that one not the super() one. This way it's more readable. Furthermore I do have doubts (as written in another message) that GridTiler inheritance is weird. 🤷🏽‍♂️

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree with @ernestoarbitrio: here we should mock the _tiles_generator from the ThresholdTiler to be more precise (and the tests for the ScoreTiler should do the same, oops) and assign the correct return value

histolab/tiler.py Outdated Show resolved Hide resolved
Comment on lines +1237 to +1242
tile = slide.extract_tile(
tile_wsi_coords,
tile_size=self.final_tile_size,
mpp=self.mpp,
level=self.level if self.mpp is None else None,
)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IMHO the tile should be extracted in the _tiles_generator and then iterating over the tiles as we do in the GridTiler.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I hope I did understand what you meant, but I think this is the current behavior.
This is because the ThresholdTiler object calls _scores, which calls the _tiles_generator of GridTiler , and lastly assign each extracted tile a score.

histolab/tiler.py Show resolved Hide resolved
@nicolebussola nicolebussola self-requested a review March 27, 2023 13:07
@nicolebussola
Copy link
Collaborator

Hi Erich, thank you for your contribution! I think it could be more intuitive to select a threshold when the scores are scaled between 0 and 1 rather than on raw scores, what do you think?

1 similar comment
@nicolebussola
Copy link
Collaborator

Hi Erich, thank you for your contribution! I think it could be more intuitive to select a threshold when the scores are scaled between 0 and 1 rather than on raw scores, what do you think?

Copy link
Collaborator

@alessiamarcolini alessiamarcolini left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @erich-r thank you so much for your contribution ⭐

As @ernestoarbitrio pointed out, it's a bit weird that the ThresholdTiler inherits from the GridTiler and not the ScoreTiler - though I understand that the fact that the ScoreTiler imposes a n_tiles parameter and that is not acceptable in this case.

Anyway this is bringing a lot of code duplication, which makes me wonder whether we should move the common code to some other "support" Tiler, or move out the static methods to standalone functions... What do we think?

Comment on lines +1308 to +1311
for i in range(len(all_scores)):
if all_scores[i][0] > self.threshold:
filtered_tiles_scores.append(all_scores[i])
filtered_tiles_scaled_scores.append(scaled_scores[i])
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This could be refactored into this, to make it more Pythonic

Suggested change
for i in range(len(all_scores)):
if all_scores[i][0] > self.threshold:
filtered_tiles_scores.append(all_scores[i])
filtered_tiles_scaled_scores.append(scaled_scores[i])
for score in all_scores:
if score[0] > self.threshold:
filtered_tiles_scores.append(score)
filtered_tiles_scaled_scores.append(score)

) -> None:
"""Save to ``filename`` the report of the saved tiles with the associated score.

The CSV file
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

oh, I see this line is cut (same as in the ScoreTiler). I'm not sure what we wanted to write here (?)
Could you remove this from both here and the ScoreTiler?

expectation,
):
slide = Slide(fixture_slide, "")
scored_tiles_extractor = ThresholdTiler(
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe we want to call it thresholded_tiles_extractor to be more on the theme?

tile = Tile(image, coords)
_tiles_generator = method_mock(request, GridTiler, "_tiles_generator")
# it needs to be a generator
_tiles_generator.return_value = ((tile, coords) for i in range(3))
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree with @ernestoarbitrio: here we should mock the _tiles_generator from the ThresholdTiler to be more precise (and the tests for the ScoreTiler should do the same, oops) and assign the correct return value

Comment on lines +1332 to +1334
_tiles_generator = method_mock(request, GridTiler, "_tiles_generator")
# it needs to be an empty generator
_tiles_generator.return_value = (n for n in [])
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same here

assert str(err.value) == "'threshold' cannot be negative (-1)"
_scores.assert_called_once_with(threshold_tiler, slide, binary_mask)

def it_can_extract_score_tiles(self, request, tmpdir):
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
def it_can_extract_score_tiles(self, request, tmpdir):
def it_can_extract_thresholded_tiles(self, request, tmpdir):

[(0.8, coords), (0.7, coords)],
[(0.8, coords), (0.7, coords)],
)
_tile_filename = method_mock(request, GridTiler, "_tile_filename")
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
_tile_filename = method_mock(request, GridTiler, "_tile_filename")
_tile_filename = method_mock(request, ThresholdTiler, "_tile_filename")

@ernestoarbitrio
Copy link
Member

Hi 👋,
just checking @erich-r do you have any news on this PR?

@erich-r
Copy link
Author

erich-r commented Apr 7, 2023

Hi wave, just checking @erich-r do you have any news on this PR?

Hi!

  • I wrote the integration tests
  • Now the threshold is being compared with the normalized scores (as suggested by @nicolebussola )
    Next I'm going to apply @alessiamarcolini suggestions (as soon as possible)

@ernestoarbitrio
Copy link
Member

Hi wave, just checking @erich-r do you have any news on this PR?

Hi!

  • I wrote the integration tests
  • Now the threshold is being compared with the normalized scores (as suggested by @nicolebussola )
    Next I'm going to apply @alessiamarcolini suggestions (as soon as possible)

awesome thanks a lot

@alessiamarcolini
Copy link
Collaborator

hey @erich-r do you have any updates about this PR?

@alessiamarcolini alessiamarcolini deleted the branch histolab:main August 17, 2023 09:04
@alessiamarcolini alessiamarcolini changed the base branch from master to main August 17, 2023 09:21
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants