MFM

Unofficial code for the paper "Masked Feature Prediction for Self-Supervised Visual Pre-Training" (https://arxiv.org/pdf/2206.07706.pdf).

Below are experiments with resnet50. Though a better result is achieved, the baseline is also much higher than in the paper.

|  | top-1 acc | pretrain | finetune |
| :-- | :-- | :-- | :-- |
| paper scratch | 78.1 | - | - |
| paper mfm pretrain | 78.5 | - | - |
| scratch | 78.542 | - | link |
| supervised pretrain | 78.942 | - | link |
| mfm pretrain | 78.826 | link | link |

Note: "supervised pretrain" means finetuning from the torchvision resnet weights (by setting pretrained=True). It seems that supervised pretraining works better than the proposed mfm pretraining.
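
For reference, a minimal sketch of this initialization with torchvision (an illustration only; the exact loading code in this repo may differ, and with torchvision 0.14 the explicit weights enum is preferred over the deprecated pretrained=True flag):

    import torchvision

    # "Supervised pretrain": start the finetune run from the supervised
    # ImageNet weights shipped with torchvision. Equivalent to the
    # deprecated torchvision.models.resnet50(pretrained=True).
    model = torchvision.models.resnet50(
        weights=torchvision.models.ResNet50_Weights.IMAGENET1K_V1)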

Platform

  • pytorch 1.13.1
  • torchvision 0.14.1
  • dali 1.21.0
  • cuda 11.6
  • V100 GPU (32G) x 8
  • driver: 470.82.01

Dataset

Prepare the imagenet train and val sets in the same way as the pytorch official classification example, and then link them into the folder of this repo:

    $ mkdir -p imagenet
    $ ln -s /path/to/imagenet/train ./imagenet/train
    $ ln -s /path/to/imagenet/val ./imagenet/val

Train

The pretraining and finetuning commands are here.

More ablations

Here are some points that affect the results:

  1. finetune --val-resize-size
    When we evaluate the model after finetuning, we always resize the short side of the image to a fixed value before a center crop. I find that the value of this fixed short-side size sometimes affects the accuracy by a noticeable margin. Taking "supervised pretrain" as an example (a transform sketch is given after this list):

    | val-resize-size | 234 | 235 | 236 |
    | :-- | :-- | :-- | :-- |
    | top-1 acc | 78.856 | 78.942 | 78.794 |
  2. finetune with bce loss is important
    We can see this by finetuning from scratch with CE (cross entropy) loss and BCE (binary cross entropy) loss (a loss sketch is given after this list); the results are:

    | loss | CE | BCE |
    | :-- | :-- | :-- |
    | top-1 acc | 78.542 | 78.952 |
  3. pretrain random crop area
    We usually crop a part of the image with a certain area ratio relative to the original image; the default range of this ratio is 0.08-1.0 in torchvision RandomResizedCrop (see the sketch after this list). Different self-supervised learning methods tend to prefer different random area ratios: for example, MAE uses 0.2-1.0, MAE3d uses 0.5-1.0, and SimMIM uses 0.67-1.0. Here I find that a smaller lower bound, i.e. 0.2-1.0, is better:

    | random area ratio | 0.67-1.0 | 0.2-1.0 | 0.1-1.0 |
    | :-- | :-- | :-- | :-- |
    | top-1 acc | 78.770 | 78.826 | 78.842 |

    Though 0.1-1.0 is better than 0.2-1.0 here, I still use the latter, since with 0.1-1.0 the finetuning eval result is more affected by val-resize-size:

    | val-resize-size | 234 | 235 | 236 |
    | :-- | :-- | :-- | :-- |
    | 0.2-1.0 | 78.816 | 78.826 | 78.796 |
    | 0.1-1.0 | 78.730 | 78.842 | 78.738 |
  4. model variance
    I pretrain the model 4 times (2 runs on 8 V100 GPUs and 2 runs on 8 P40 GPUs) with identical configurations, and then finetune 3 times from each pretrained model (on 8 P40 GPUs). Results are listed below. We can see that the results vary by a big margin, so the good results above may just come from good luck. Hence, I cannot claim that I have certainly reproduced the results in the paper.

    | pretrain | finetune | acc1 (val-resize-size 235) | mean/std |
    | :-- | :-- | :-- | :-- |
    | round 1 | round 1 | 78.654 | 78.644/0.024 |
    |  | round 2 | 78.61 |  |
    |  | round 3 | 78.668 |  |
    | round 2 | round 1 | 78.646 | 78.642/0.122 |
    |  | round 2 | 78.79 |  |
    |  | round 3 | 78.49 |  |
    | round 3 | round 1 | 78.516 | 78.612/0.073 |
    |  | round 2 | 78.626 |  |
    |  | round 3 | 78.694 |  |
    | round 4 | round 1 | 78.608 | 78.584/0.080 |
    |  | round 2 | 78.668 |  |
    |  | round 3 | 78.476 |  |

    The overall mean/std across all 12 finetune runs is 78.621/0.08.
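
As a reference for point 1, here is a minimal torchvision sketch of the assumed eval pipeline (the repo lists DALI among its dependencies, so this is only an equivalent illustration; the 224 crop size and ImageNet normalization are assumptions):

    import torchvision.transforms as T

    def build_val_transform(val_resize_size=235, crop_size=224):
        # Resize the short side to --val-resize-size, then take a
        # center crop at the eval resolution.
        return T.Compose([
            T.Resize(val_resize_size),
            T.CenterCrop(crop_size),
            T.ToTensor(),
            T.Normalize(mean=[0.485, 0.456, 0.406],
                        std=[0.229, 0.224, 0.225]),
        ])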
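
For point 2, a minimal sketch of a BCE-style classification loss, assuming plain one-hot targets (the actual finetune recipe may additionally involve mixup/cutmix or label smoothing):

    import torch
    import torch.nn.functional as F

    def bce_cls_loss(logits: torch.Tensor, labels: torch.Tensor,
                     num_classes: int = 1000) -> torch.Tensor:
        # Treat C-way classification as C one-vs-all binary problems:
        # logits have shape (N, C), labels are integer ids of shape (N,).
        targets = F.one_hot(labels, num_classes).float()
        return F.binary_cross_entropy_with_logits(logits, targets)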
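
And for point 3, the random crop area ratio corresponds to the scale argument of RandomResizedCrop (again a torchvision illustration of the setting kept in this repo):

    import torchvision.transforms as T

    # scale is the sampled area-ratio range; torchvision's default is
    # (0.08, 1.0), while this repo keeps (0.2, 1.0) for pretraining.
    pretrain_crop = T.RandomResizedCrop(224, scale=(0.2, 1.0))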