Added custom semantic segmentation trainer tutorial #1897

Open
wants to merge 18 commits into main

Conversation

calebrob6
Member

This is a tutorial notebook that shows users how to create a custom semantic segmentation task (by subclassing SemanticSegmentationTask) for training on LandCoverAI.

@github-actions github-actions bot added the documentation Improvements or additions to documentation label Feb 21, 2024
@calebrob6
Member Author

@adamjstewart I think there is a bug (or at least some weird behavior going on) here

I can create a new class that extends SemanticSegmentationTask:

from torchgeo.trainers import SemanticSegmentationTask

class CustomSemanticSegmentationTask(SemanticSegmentationTask):

    # any keywords we add here between *args and **kwargs will be found in self.hparams
    def __init__(self, *args, tmax=50, eta_min=1e-6, **kwargs) -> None:
        super().__init__(*args, **kwargs)  # pass args and kwargs to the parent class

Then instantiate it, and everything works great:

task = CustomSemanticSegmentationTask(model="unet", tmax=100, ...)
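
(For context, a minimal sketch -- assuming Lightning's usual self.hparams behavior and that the parent task stores an lr hyperparameter -- of how such extra keywords might then be consumed, e.g. in a configure_optimizers override:)

from torch.optim import AdamW
from torch.optim.lr_scheduler import CosineAnnealingLR

class CustomSemanticSegmentationTask(SemanticSegmentationTask):
    def __init__(self, *args, tmax=50, eta_min=1e-6, **kwargs) -> None:
        super().__init__(*args, **kwargs)

    def configure_optimizers(self):
        # tmax and eta_min were captured by save_hyperparameters in the parent class
        optimizer = AdamW(self.parameters(), lr=self.hparams["lr"])
        scheduler = CosineAnnealingLR(
            optimizer, T_max=self.hparams["tmax"], eta_min=self.hparams["eta_min"]
        )
        return {"optimizer": optimizer, "lr_scheduler": {"scheduler": scheduler}}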

However, when I go to load it from a checkpoint file:

task = CustomSemanticSegmentationTask.load_from_checkpoint("lightning_logs/version_3/checkpoints/epoch=0-step=117.ckpt")

I get an error:

TypeError: SemanticSegmentationTask.__init__() got an unexpected keyword argument 'ignore'

I can add del kwargs["ignore"] before the super() call in the constructor of CustomSemanticSegmentationTask, but this feels like a bad hack.
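
For illustration, the hack might look roughly like this (a sketch only, not a recommended fix):

class CustomSemanticSegmentationTask(SemanticSegmentationTask):
    def __init__(self, *args, tmax=50, eta_min=1e-6, **kwargs) -> None:
        # drop the stale "ignore" entry restored from the checkpoint's saved hparams
        kwargs.pop("ignore", None)
        super().__init__(*args, **kwargs)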

(@isaaccorley in case you've seen this)

@robmarkcole
Contributor

robmarkcole commented Feb 23, 2024

Note that https://www.reviewnb.com/ is free for 'educational' use -- it would enable previewing the notebook.


# ## Test model
#
# Finally, we test the model on the test set and visualize the results.
Contributor

"visualize the results" would make me expect some plots; it is a table of metrics, however.

Member Author

Yep -- not finished here. Need to get some agreement on:

  1. How to review.
  2. How to deal with long-running notebooks in CI. If I run this notebook for 250 epochs (7 hours on my machine), I get 85.95 test mIoU, which is better than the results reported in https://arxiv.org/pdf/2005.02264.pdf. I think fake datasets are less interesting.
  3. How to deal with the issue I pointed out above, where the ignore parameter is saved as an hparam and then breaks the subclass constructor.

Contributor

I think it would be fine to train for 1 batch, since this notebook is about demonstrating the technique; in that case a CPU can be used.

@calebrob6
Member Author

@robmarkcole thanks for the review! Was that easy enough to do (vs reviewnb)?

On my side, I just have to run jupytext --sync custom_segmentation_trainer.ipynb whenever I change either file and commit both (which seems simple enough). We could have CI that makes sure that all .py and .ipynb files are in sync.

@robmarkcole
Contributor

Yep easy enough

@isaaccorley
Collaborator

> @adamjstewart I think there is a bug (or at least some weird behavior going on) here ... I can add del kwargs["ignore"] before the super() call in the constructor of CustomSemanticSegmentationTask but this feels like a bad hack.

I don't see an ignore param in your custom class. Did you modify the class code after training that checkpoint? If so, one thing you can do is load the checkpoint, delete that param from the hparam dict, and then save the checkpoint, which would fix the error.
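
For example, a minimal sketch of that checkpoint surgery (assuming the standard Lightning checkpoint layout; the path is the one from the traceback above):

import torch

ckpt_path = "lightning_logs/version_3/checkpoints/epoch=0-step=117.ckpt"
ckpt = torch.load(ckpt_path, map_location="cpu")

# Lightning stores the saved hyperparameters under the "hyper_parameters" key
ckpt["hyper_parameters"].pop("ignore", None)

torch.save(ckpt, ckpt_path)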

@calebrob6
Member Author

The ignore param is stored in task.hparams because SemanticSegmentationTask passes it to BaseTask -- see https://github.com/microsoft/torchgeo/blob/main/torchgeo/trainers/segmentation.py#L98.

I want to be able to do something like this:

class CustomSemanticSegmentationTask(SemanticSegmentationTask):
    def __init__(self, *args, my_custom_arg, **kwargs) -> None:
        super().__init__(*args, **kwargs)

i.e., reuse the constructor from SemanticSegmentationTask and not have to copy-paste its arguments and logic.

This works fine, but when I try to load a version of this class from a checkpoint, ignore is passed through to SemanticSegmentationTask because it is a kwarg saved in hparams.
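
Roughly speaking (glossing over Lightning internals), load_from_checkpoint re-instantiates the class with the saved hparams as keyword arguments, which is where the failure comes from:

import torch

checkpoint = torch.load("lightning_logs/version_3/checkpoints/epoch=0-step=117.ckpt", map_location="cpu")
# conceptually, load_from_checkpoint re-creates the task from the saved hparams:
task = CustomSemanticSegmentationTask(**checkpoint["hyper_parameters"])
# ...and that dict contains "ignore", which SemanticSegmentationTask.__init__ does not accept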

@calebrob6
Member Author

calebrob6 commented Feb 23, 2024

One workaround that I'm checking is just adding "ignore" to the list of args ignored in save_hyperparameters in BaseTask.
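
A rough sketch of the idea (simplified; the real BaseTask does more than this, and the exact signature here is an assumption):

from collections.abc import Sequence
from lightning.pytorch import LightningModule

class BaseTask(LightningModule):
    def __init__(self, ignore: Sequence[str] | str | None = None) -> None:
        super().__init__()
        extra = [ignore] if isinstance(ignore, str) else list(ignore or [])
        # always exclude the "ignore" argument itself from the saved hparams
        self.save_hyperparameters(ignore=["ignore", *extra])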

@isaaccorley
Collaborator

> One workaround that I'm checking is just adding "ignore" to the list of args ignored in save_hyperparameters in BaseTask.

I think this makes the most sense since it's not an actual hparam.

@github-actions github-actions bot added the trainers PyTorch Lightning trainers label Feb 23, 2024
@calebrob6
Member Author

The way I've come up with to check whether an .ipynb is in sync with the corresponding .py is diff <(jupytext --output - --to py custom_segmentation_trainer.ipynb) custom_segmentation_trainer.py. This should only output something like:

3a4
> #     formats: ipynb,py
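
As a sketch of what such a CI check could look like (a hypothetical helper; the docs/tutorials/ path and the header filtering are assumptions), the same diff idea in Python:

import pathlib
import subprocess

def _strip_header(text: str) -> list[str]:
    # tolerate the extra "formats: ipynb,py" header line noted above
    return [line for line in text.splitlines() if "formats:" not in line]

def notebook_in_sync(ipynb: pathlib.Path) -> bool:
    converted = subprocess.run(
        ["jupytext", "--output", "-", "--to", "py", str(ipynb)],
        capture_output=True, text=True, check=True,
    ).stdout
    paired = ipynb.with_suffix(".py").read_text()
    return _strip_header(converted) == _strip_header(paired)

for nb in pathlib.Path("docs/tutorials").glob("*.ipynb"):
    assert notebook_in_sync(nb), f"{nb} is out of sync with its paired .py file"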

@github-actions github-actions bot added the testing Continuous integration testing label Feb 23, 2024
@calebrob6
Member Author

I don't know why the notebook test is being cancelled (maybe because it is trying to run the LandCoverAI split script?).

@adamjstewart
Collaborator

> I don't know why the notebook test is being cancelled (maybe because it is trying to run the LandCoverAI split script?).

Tests being canceled means the job either ran out of time, space, or memory. Here, my guess would be space. We want to use the smallest datasets possible, such as EuroSAT100. Might be worth creating a LandCoverAI100 or something like that.

@adamjstewart
Collaborator

Haven't yet had time to evaluate jupytext to decide whether or not it's what we should use. @nilsleh what did you end up using for lightning-uq-box?

@adamjstewart adamjstewart added this to the 0.6.0 milestone Feb 25, 2024
@calebrob6
Member Author

I really don't like the idea of making custom datasets/datamodules just to have pretty CI -- it is a large overhead for something that makes the tutorial less cool.

@adamjstewart
Collaborator

And I really don't like having tutorials that take 30+ minutes to download a dataset and train a model for hundreds of epochs, or tutorials that can't be tested in CI because they involve more data than our runners can store. There's always a tradeoff. You can also find a smaller dataset instead of making your own.

#
# The remainder of the tutorial is straightforward and follows the typical [PyTorch Lightning](https://lightning.ai/) training routine. We instantiate a `DataModule` for the LandCover.AI dataset, instantiate a `CustomSemanticSegmentationTask` with a U-Net and ResNet-50 backbone, then train the model using a Lightning trainer.

dm = LandCoverAIDataModule(root="data/", batch_size=64, num_workers=8, download=True)
Member Author

Suggested change
- dm = LandCoverAIDataModule(root="data/", batch_size=64, num_workers=8, download=True)
+ dm = LandCoverAIDataModule(root="data/", batch_size=2, num_workers=8, download=True)

@calebrob6
Member Author

Luckily none of that happens here ;). LandCover.ai is 1.5GB (this will take 30+ minutes to download if your download speed is < 0.83 MB/s), training happens for 1 batch (and I can reduce the batch size to make this faster).

I'm saying that we shouldn't be catching bugs with overly sanitized examples -- if LandCoverAI breaks or Lightning training breaks then our other tests will break. If the example notebooks are catching bugs then we should ask ourselves why. Downloading LandCoverAI and running this now and per release doesn't seem to be a big burden.

@calebrob6
Member Author

How about this -- is it possible to change the LandCoverAI datamodule to use the test data that we have already spent time creating for this notebook (with comments saying that if you actually want to play with real data, do this other thing instead)?

@nilsleh
Collaborator

nilsleh commented Feb 26, 2024

> Haven't yet had time to evaluate jupytext to decide whether or not it's what we should use. @nilsleh what did you end up using for lightning-uq-box?

So I tried jupytext, but for my tutorials I couldn't get the jupytext scripts to execute and be displayed in the documentation. I went back to notebooks, and the notebooks are now run when the documentation builds. However, I don't need to download any data, and model fitting is fast, since it's just toy problems.

@adamjstewart
Collaborator

> How about this -- is it possible to change the LandCoverAI datamodule to use the test data that we have already spent time creating for this notebook (with comments saying that if you actually want to play with real data, do this other thing instead)?

This is also undesirable because the notebook by default will train and display predictions on random noise. I would much rather have a tiny dataset with real data. I'm happy to make this myself (maybe next week).

@calebrob6
Member Author

I don't think it needs to display predictions -- if we're only training for a batch for "making CI pretty" reasons, then it will display noise regardless.

@adamjstewart
Collaborator

We can train for multiple epochs in the tutorial, but use fast_dev_run in CI. The trainers tutorial does this with nbmake variable mocking. This also means we could use the synthetic dataset during CI and the real dataset during the tutorial. My only hope is that we don't train for longer than it takes to explain what we're doing in a live tutorial session. If we use this notebook in a tutorial and end up sitting there awkwardly watching it for an hour, then it gets tedious.
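
A minimal sketch of that pattern (the variable names here are hypothetical; they would be the values CI overrides via nbmake variable mocking):

import lightning.pytorch as pl

# Values CI could override via nbmake variable mocking (hypothetical names).
max_epochs = 50
fast_dev_run = False  # set to True in CI to run a single train/val/test batch

trainer = pl.Trainer(fast_dev_run=fast_dev_run, max_epochs=max_epochs)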
