Memory efficient implementation of SWISH and MISH

Mish: x * tanh(softplus(x))

Swish: x * sigmoid(x)

Implementations

Both activation functions are implemented with a custom PyTorch autograd Function (torch.autograd.Function). Compared to a straightforward implementation, this saves roughly 20% of GPU memory during training.

For example:

This Swish implementation: 1816 MB vs. simple Swish implementation: 2072 MB

This Mish implementation: 1816 MB vs. simple Mish implementation: 2328 MB
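The core trick, shown below as a minimal sketch (not necessarily the exact code in this repository), is to save only the input tensor in forward and recompute the cheap elementwise intermediates (sigmoid, tanh, softplus) in backward, instead of letting autograd keep those intermediate tensors alive between the forward and backward passes.

import torch
import torch.nn.functional as F

class SwishFunction(torch.autograd.Function):
    # Swish(x) = x * sigmoid(x), recomputing sigmoid in backward.
    @staticmethod
    def forward(ctx, x):
        ctx.save_for_backward(x)  # keep only the input
        return x * torch.sigmoid(x)

    @staticmethod
    def backward(ctx, grad_output):
        x, = ctx.saved_tensors
        s = torch.sigmoid(x)
        # d/dx [x * sigmoid(x)] = sigmoid(x) * (1 + x * (1 - sigmoid(x)))
        return grad_output * s * (1 + x * (1 - s))

class MishFunction(torch.autograd.Function):
    # Mish(x) = x * tanh(softplus(x)), recomputing everything in backward.
    @staticmethod
    def forward(ctx, x):
        ctx.save_for_backward(x)  # keep only the input
        return x * torch.tanh(F.softplus(x))

    @staticmethod
    def backward(ctx, grad_output):
        x, = ctx.saved_tensors
        t = torch.tanh(F.softplus(x))
        # d/dx [x * tanh(softplus(x))]
        #   = tanh(softplus(x)) + x * (1 - tanh(softplus(x))**2) * sigmoid(x)
        return grad_output * (t + x * (1 - t * t) * torch.sigmoid(x))

class Swish(torch.nn.Module):
    def forward(self, x):
        return SwishFunction.apply(x)

class Mish(torch.nn.Module):
    def forward(self, x):
        return MishFunction.apply(x)

Saving only the input is what buys the memory: the naive composition x * torch.sigmoid(x) makes autograd store the sigmoid output (and, for Mish, the softplus and tanh outputs) for the backward pass.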

Usage

Usage is similar to torch.nn.ReLU(): Swish and Mish are drop-in nn.Module activations, built on top of torch.autograd.Function.

import torch.nn as nn

from swish import Swish
from mish import Mish

# Inside your model's __init__:
self.conv1 = nn.Sequential(
    nn.Linear(256, width),
    Swish(),
    nn.BatchNorm1d(width),
    nn.Linear(width, 1)
)

self.conv2 = nn.Sequential(
    nn.Linear(256, width),
    Mish(),
    nn.BatchNorm1d(width),
    nn.Linear(width, 1)
)
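To reproduce memory numbers like the ones above, a rough sketch could look like the following (assuming a CUDA device; the layer width, batch size, and step count here are arbitrary, not the settings used for the reported measurements):

import torch
import torch.nn as nn
from swish import Swish  # the module provided by this repo

def peak_memory_mb(model, x, steps=20):
    # Run a few forward/backward passes and report the peak allocation in MB.
    model, x = model.cuda(), x.cuda()
    torch.cuda.reset_peak_memory_stats()
    for _ in range(steps):
        model(x).sum().backward()
        model.zero_grad(set_to_none=True)
    return torch.cuda.max_memory_allocated() / 1024 ** 2

width = 4096
model = nn.Sequential(nn.Linear(256, width), Swish(), nn.Linear(width, 1))
x = torch.randn(8192, 256)
print(f"peak memory: {peak_memory_mb(model, x):.0f} MB")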

Performance

More details on the two activation functions can be found in their respective papers.

In my experiments on monocular depth estimation, both perform on par with or better than ReLU6, with Mish slightly ahead of both Swish and ReLU6.

Performance Comparison
