
ConvTranspose2d layers not being tracked #43

Open
marthinwurer opened this issue Sep 12, 2021 · 6 comments


@marthinwurer

from torch import nn
from delve import CheckLayerSat

class simple(nn.Module):
    def __init__(self):
        super().__init__()
        self.conv1 = nn.Conv2d(3, 32, 3)
        self.deconv1 = nn.ConvTranspose2d(32, 3, 3)

    def forward(self, x):
        return self.deconv1(self.conv1(x))

simple_model = simple()
tracker2 = CheckLayerSat("my_experiment", save_to="plotcsv",
                         modules=simple_model, device=image.device)

output:

added layer conv1
Skipping deconv1

This is an awesome tool, but I'd love to see how well the decoder part of my autoencoder works.
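For what it's worth, the skip looks consistent with an `isinstance` allow-list over layer types. A minimal sketch (hypothetical, not delve's actual code; `TRACKED_TYPES` and `partition_layers` are made-up names) of how such filtering produces "added layer conv1" / "Skipping deconv1":

```python
import torch.nn as nn

# Hypothetical allow-list: only these layer types get a forward hook.
# Note that nn.ConvTranspose2d is absent, so deconv layers are skipped.
TRACKED_TYPES = (nn.Conv2d, nn.Linear, nn.LSTM)

def partition_layers(model):
    tracked, skipped = [], []
    for name, module in model.named_modules():
        if isinstance(module, TRACKED_TYPES):
            tracked.append(name)
        elif not list(module.children()):  # leaf modules only
            skipped.append(name)
    return tracked, skipped

model = nn.Sequential(nn.Conv2d(3, 32, 3), nn.ConvTranspose2d(32, 3, 3))
tracked, skipped = partition_layers(model)
print(tracked)   # ['0'] -- the Conv2d is tracked
print(skipped)   # ['1'] -- the ConvTranspose2d is skipped
```
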

@MLRichter
Collaborator

Hi,
thanks for pointing this out.
I think this should be an easy fix / new feature.
I'll try to get a PR done this week.

@marthinwurer
Author

Awesome!

[image: saturation plot after swapping the deconv for upscaling + regular conv]

If I can ask some more questions, this is the result from swapping the deconv for upscaling and regular conv. When you have high saturation like shown, is it best to add more filters or add more layers, or both?

@MLRichter
Collaborator

MLRichter commented Sep 12, 2021

My research primarily focuses on classifiers, so take everything I say with a grain of salt.

With that said, you should manipulate the network as a whole and never individual layers.
So if you increase the number of filters, you should scale them globally.
This will change the saturation level (you should aim for something around 20-40%, depending on the dataset).

[image: example saturation plot]

However, the width of a network (the number of filters per layer) behaves somewhat like the number of trees in a decision forest, in the sense that increasing it beyond what is needed mainly hurts computational efficiency, while any performance decreases are generally rather minor.
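As a sketch of what "scaling filters globally" could look like in practice (a hypothetical `width_mult` helper, not part of delve):

```python
import torch.nn as nn

def make_encoder(width_mult=1.0):
    """Scale every channel count by one global multiplier instead of
    widening individual layers in isolation."""
    def c(channels):
        return max(1, int(round(channels * width_mult)))
    return nn.Sequential(
        nn.Conv2d(3, c(32), 3, padding=1), nn.ReLU(),
        nn.Conv2d(c(32), c(64), 3, padding=1), nn.ReLU(),
        nn.Conv2d(c(64), c(128), 3, padding=1), nn.ReLU(),
    )

wide = make_encoder(width_mult=2.0)
print(wide[0].out_channels)  # 64  (32 * 2)
print(wide[4].out_channels)  # 256 (128 * 2)
```

Re-measuring saturation after each global change then tells you whether the new width moved the average toward the 20-40% range.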

What is more worrisome is that the first part of the network (which has the explicit purpose of compressing information) is barely utilized according to the saturation values.
This may indicate that the system is too deep, which means you should be able to remove layers from the compressing part of the network without losing too much performance.
Here is an example of what this looks like for a CNN classifier:

[image: saturation plot for a CNN classifier]

The best way of optimizing performance is usually to first reduce the number of layers and then fiddle with the width of the network.
In the end, your goal should be a roughly evenly distributed saturation with an average between roughly 20 and 40%.
Could you maybe provide some additional details on the architecture?

@marthinwurer
Author

marthinwurer commented Sep 12, 2021

So, I'm trying to implement the autoencoder from world models (https://worldmodels.github.io/):

[image: World Models autoencoder diagram]

This was the output from delve on that architecture (minus the deconv, plus an extra layer because I misread the paper, and a straight-through rather than variational latent):

[image: delve saturation output for that architecture]

I wasn't getting reconstruction results as good as the paper's, so I wanted to use delve to try to figure out the bottlenecks. I switched to a different architecture that mostly used regular conv layers (well, CoordConv, because that paper looked useful and I'm already a mess) so I could use it.

The arch in the graph I first posted:

        self.encoder = nn.Sequential(
            CoordConv2d(3, 32, 3, padding=1),  # 64
            activation(),
            SpectralPool2d(.5),  # 32
            CoordConv2d(32, 64, 3, padding=1),
            activation(),
            SpectralPool2d(.5),  # 16
            CoordConv2d(64, 128, 3, padding=1),
            activation(),
            SpectralPool2d(.5),  # 8
            CoordConv2d(128, 256, 3, padding=1),
            activation(),
            SpectralPool2d(.25),  # 2
            CoordConv2d(256, 256, 3, padding=1),
            activation(),
        )
        self.compress = nn.Linear(1024, latent_size)
        self.decompress = nn.Linear(latent_size, 1024)
        self.decoder = nn.Sequential(
            nn.ConvTranspose2d(1024, 256, 4),  # 4x4
            activation(),
            SpectralPool2d(2),  # 8x8
            CoordConv2d(256, 128, 3, padding=1),
            activation(),
            SpectralPool2d(2),  # 16
            CoordConv2d(128, 64, 3, padding=1),
            activation(),
            SpectralPool2d(2),  # 32
            CoordConv2d(64, 32, 3, padding=1),
            activation(),
            SpectralPool2d(2),  # 64
            CoordConv2d(32, 3, 3, padding=1),
        )
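The spatial bookkeeping in the comments checks out, assuming each `SpectralPool2d(f)` rescales the spatial dimensions by factor `f` and the padded convs preserve resolution: the encoder output flattens to exactly the 1024 features that `nn.Linear(1024, latent_size)` expects.

```python
# Shape walk for the encoder above, starting from a 3x64x64 input.
side = 64
for factor in (0.5, 0.5, 0.5, 0.25):  # the four SpectralPool2d stages
    side = int(side * factor)
print(side)  # 2  (matches the "# 2" comment)

channels = 256                 # output channels of the last CoordConv2d
flat = channels * side * side
print(flat)  # 1024 -> matches nn.Linear(1024, latent_size)
```
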

I'm mostly just messing around at this point, I'm definitely not a professional in this field. Thanks for the help!

@marthinwurer
Author

For closure, I figured out what my issue was!

I ended up going back to basics and implementing the network without any modifications. That didn't solve my issue; it actually made it worse! It looked like the network wasn't training at all, and I ended up using some code from here to look at the gradients in the network, which showed that they were tiny. The issue ended up being with my optimizer: I had messed with the Adam hyperparameters in a misguided attempt to fix some previous issue. Resetting those fixed it, and I got much better results than I had before. Now the graph from delve looks like this:

[image: updated delve saturation graph after fixing the optimizer]

Getting a graph of the gradients was super helpful; it might be another good statistic to track with delve. If you want, I can open another issue with a feature request for that.
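The kind of per-layer gradient check described above can be sketched in a few lines (a hypothetical `grad_norms` helper, not the linked code): uniformly tiny norms across layers are the vanishing-gradient signature that pointed to the optimizer problem.

```python
import torch
from torch import nn

def grad_norms(model):
    """Per-parameter gradient L2 norms after a backward pass."""
    return {name: p.grad.norm().item()
            for name, p in model.named_parameters()
            if p.grad is not None}

model = nn.Sequential(nn.Linear(8, 8), nn.Tanh(), nn.Linear(8, 1))
loss = model(torch.randn(4, 8)).pow(2).mean()
loss.backward()
for name, g in grad_norms(model).items():
    print(f"{name}: {g:.2e}")
```
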

@MLRichter
Collaborator

MLRichter commented Sep 25, 2021

I have two minor updates regarding this.
First, recording the gradients is a bit trickier, since the backward-hook logic is substantially less stable in torch than the forward hook logic.
In the meantime I have built some prototypes, but it will take some time before this becomes a stable feature.

After digging through the code, I think a mid-sized refactoring is necessary to include non-standard layers in the saturation statistics, such as transposed convolutions or more custom, functional-style convolutions like the ones used in the EfficientNet models.
This requires a more modularized, less monolithic approach to layer recordings.
I am currently working on the concept.
If everything goes well, you can expect a PR in the next month or so.
