How to train vit_tiny (Light HQ-SAM for real-time need): ViT-Tiny HQ-SAM model? #130

Open
Andy718811 opened this issue Apr 20, 2024 · 0 comments


I tried to train vit_tiny with this command, "python -m torch.distributed.launch --nproc_per_node=1 train.py --checkpoint ./pretrained_checkpoint/sam_hq_vit_tiny.pth --model-type vit_b --output work_dirs/hq_sam_tiny_l", but it failed with the error below. It seems train.py can't be used to train the vit_tiny model. The full error message follows.
Traceback (most recent call last):
File "train.py", line 700, in <module>
main(net, train_datasets, valid_datasets, args)
File "train.py", line 366, in main
train(args, net, optimizer, train_dataloaders, valid_dataloaders, lr_scheduler)
File "train.py", line 393, in train
sam = sam_model_registry[args.model_type](checkpoint=args.checkpoint)
File "/data/4TB/FENG/sam-hq-main/train/segment_anything_training/build_sam.py", line 38, in build_sam_vit_b
return _build_sam(
File "/data/4TB/FENG/sam-hq-main/train/segment_anything_training/build_sam.py", line 106, in _build_sam
sam.load_state_dict(state_dict)
File "/home/server3/anaconda3/envs/sam-hq/lib/python3.8/site-packages/torch/nn/modules/module.py", line 2153, in load_state_dict
raise RuntimeError('Error(s) in loading state_dict for {}:\n\t{}'.format(
RuntimeError: Error(s) in loading state_dict for Sam:
Missing key(s) in state_dict: "image_encoder.pos_embed", "image_encoder.patch_embed.proj.weight", "image_encoder.patch_embed.proj.bias", "image_encoder.blocks.0.norm1.weight", "image_encoder.blocks.0.norm1.bias", "image_encoder.blocks.0.attn.rel_pos_h", "image_encoder.blocks.0.attn.rel_pos_w", "image_encoder.blocks.0.attn.qkv.weight", "image_encoder.blocks.0.attn.qkv.bias", "image_encoder.blocks.0.attn.proj.weight", "image_encoder.blocks.0.attn.proj.bias", "image_encoder.blocks.0.norm2.weight", "image_encoder.blocks.0.norm2.bias", "image_encoder.blocks.0.mlp.lin1.weight", "image_encoder.blocks.0.mlp.lin1.bias", "image_encoder.blocks.0.mlp.lin2.weight", "image_encoder.blocks.0.mlp.lin2.bias", "image_encoder.blocks.1.norm1.weight", "image_encoder.blocks.1.norm1.bias", "image_encoder.blocks.1.attn.rel_pos_h", "image_encoder.blocks.1.attn.rel_pos_w", "image_encoder.blocks.1.attn.qkv.weight", "image_encoder.blocks.1.attn.qkv.bias", "image_encoder.blocks.1.attn.proj.weight", "image_encoder.blocks.1.attn.proj.bias", "image_encoder.blocks.1.norm2.weight", "image_encoder.blocks.1.norm2.bias", "image_encoder.blocks.1.mlp.lin1.weight", "image_encoder.blocks.1.mlp.lin1.bias", "image_encoder.blocks.1.mlp.lin2.weight", "image_encoder.blocks.1.mlp.lin2.bias", "image_encoder.blocks.2.norm1.weight", "image_encoder.blocks.2.norm1.bias", "image_encoder.blocks.2.attn.rel_pos_h", "image_encoder.blocks.2.attn.rel_pos_w", "image_encoder.blocks.2.attn.qkv.weight", "image_encoder.blocks.2.attn.qkv.bias", "image_encoder.blocks.2.attn.proj.weight", "image_encoder.blocks.2.attn.proj.bias", "image_encoder.blocks.2.norm2.weight", "image_encoder.blocks.2.norm2.bias", "image_encoder.blocks.2.mlp.lin1.weight", "image_encoder.blocks.2.mlp.lin1.bias", "image_encoder.blocks.2.mlp.lin2.weight", "image_encoder.blocks.2.mlp.lin2.bias", "image_encoder.blocks.3.norm1.weight", "image_encoder.blocks.3.norm1.bias", "image_encoder.blocks.3.attn.rel_pos_h", "image_encoder.blocks.3.attn.rel_pos_w", 
"image_encoder.blocks.3.attn.qkv.weight", "image_encoder.blocks.3.attn.qkv.bias", "image_encoder.blocks.3.attn.proj.weight", "image_encoder.blocks.3.attn.proj.bias", "image_encoder.blocks.3.norm2.weight", "image_encoder.blocks.3.norm2.bias", "image_encoder.blocks.3.mlp.lin1.weight", "image_encoder.blocks.3.mlp.lin1.bias", "image_encoder.blocks.3.mlp.lin2.weight", "image_encoder.blocks.3.mlp.lin2.bias", "image_encoder.blocks.4.norm1.weight", "image_encoder.blocks.4.norm1.bias", "image_encoder.blocks.4.attn.rel_pos_h", "image_encoder.blocks.4.attn.rel_pos_w", "image_encoder.blocks.4.attn.qkv.weight", "image_encoder.blocks.4.attn.qkv.bias", "image_encoder.blocks.4.attn.proj.weight", "image_encoder.blocks.4.attn.proj.bias", "image_encoder.blocks.4.norm2.weight", "image_encoder.blocks.4.norm2.bias", "image_encoder.blocks.4.mlp.lin1.weight", "image_encoder.blocks.4.mlp.lin1.bias", "image_encoder.blocks.4.mlp.lin2.weight", "image_encoder.blocks.4.mlp.lin2.bias", "image_encoder.blocks.5.norm1.weight", "image_encoder.blocks.5.norm1.bias", "image_encoder.blocks.5.attn.rel_pos_h", "image_encoder.blocks.5.attn.rel_pos_w", "image_encoder.blocks.5.attn.qkv.weight", "image_encoder.blocks.5.attn.qkv.bias", "image_encoder.blocks.5.attn.proj.weight", "image_encoder.blocks.5.attn.proj.bias", "image_encoder.blocks.5.norm2.weight", "image_encoder.blocks.5.norm2.bias", "image_encoder.blocks.5.mlp.lin1.weight", "image_encoder.blocks.5.mlp.lin1.bias", "image_encoder.blocks.5.mlp.lin2.weight", "image_encoder.blocks.5.mlp.lin2.bias", "image_encoder.blocks.6.norm1.weight", "image_encoder.blocks.6.norm1.bias", "image_encoder.blocks.6.attn.rel_pos_h", "image_encoder.blocks.6.attn.rel_pos_w", "image_encoder.blocks.6.attn.qkv.weight", "image_encoder.blocks.6.attn.qkv.bias", "image_encoder.blocks.6.attn.proj.weight", "image_encoder.blocks.6.attn.proj.bias", "image_encoder.blocks.6.norm2.weight", "image_encoder.blocks.6.norm2.bias", "image_encoder.blocks.6.mlp.lin1.weight", 
"image_encoder.blocks.6.mlp.lin1.bias", "image_encoder.blocks.6.mlp.lin2.weight", "image_encoder.blocks.6.mlp.lin2.bias", "image_encoder.blocks.7.norm1.weight", "image_encoder.blocks.7.norm1.bias", "image_encoder.blocks.7.attn.rel_pos_h", "image_encoder.blocks.7.attn.rel_pos_w", "image_encoder.blocks.7.attn.qkv.weight", "image_encoder.blocks.7.attn.qkv.bias", "image_encoder.blocks.7.attn.proj.weight", "image_encoder.blocks.7.attn.proj.bias", "image_encoder.blocks.7.norm2.weight", "image_encoder.blocks.7.norm2.bias", "image_encoder.blocks.7.mlp.lin1.weight", "image_encoder.blocks.7.mlp.lin1.bias", "image_encoder.blocks.7.mlp.lin2.weight", "image_encoder.blocks.7.mlp.lin2.bias", "image_encoder.blocks.8.norm1.weight", "image_encoder.blocks.8.norm1.bias", "image_encoder.blocks.8.attn.rel_pos_h", "image_encoder.blocks.8.attn.rel_pos_w", "image_encoder.blocks.8.attn.qkv.weight", "image_encoder.blocks.8.attn.qkv.bias", "image_encoder.blocks.8.attn.proj.weight", "image_encoder.blocks.8.attn.proj.bias", "image_encoder.blocks.8.norm2.weight", "image_encoder.blocks.8.norm2.bias", "image_encoder.blocks.8.mlp.lin1.weight", "image_encoder.blocks.8.mlp.lin1.bias", "image_encoder.blocks.8.mlp.lin2.weight", "image_encoder.blocks.8.mlp.lin2.bias", "image_encoder.blocks.9.norm1.weight", "image_encoder.blocks.9.norm1.bias", "image_encoder.blocks.9.attn.rel_pos_h", "image_encoder.blocks.9.attn.rel_pos_w", "image_encoder.blocks.9.attn.qkv.weight", "image_encoder.blocks.9.attn.qkv.bias", "image_encoder.blocks.9.attn.proj.weight", "image_encoder.blocks.9.attn.proj.bias", "image_encoder.blocks.9.norm2.weight", "image_encoder.blocks.9.norm2.bias", "image_encoder.blocks.9.mlp.lin1.weight", "image_encoder.blocks.9.mlp.lin1.bias", "image_encoder.blocks.9.mlp.lin2.weight", "image_encoder.blocks.9.mlp.lin2.bias", "image_encoder.blocks.10.norm1.weight", "image_encoder.blocks.10.norm1.bias", "image_encoder.blocks.10.attn.rel_pos_h", "image_encoder.blocks.10.attn.rel_pos_w", 
"image_encoder.blocks.10.attn.qkv.weight", "image_encoder.blocks.10.attn.qkv.bias", "image_encoder.blocks.10.attn.proj.weight", "image_encoder.blocks.10.attn.proj.bias", "image_encoder.blocks.10.norm2.weight", "image_encoder.blocks.10.norm2.bias", "image_encoder.blocks.10.mlp.lin1.weight", "image_encoder.blocks.10.mlp.lin1.bias", "image_encoder.blocks.10.mlp.lin2.weight", "image_encoder.blocks.10.mlp.lin2.bias", "image_encoder.blocks.11.norm1.weight", "image_encoder.blocks.11.norm1.bias", "image_encoder.blocks.11.attn.rel_pos_h", "image_encoder.blocks.11.attn.rel_pos_w", "image_encoder.blocks.11.attn.qkv.weight", "image_encoder.blocks.11.attn.qkv.bias", "image_encoder.blocks.11.attn.proj.weight", "image_encoder.blocks.11.attn.proj.bias", "image_encoder.blocks.11.norm2.weight", "image_encoder.blocks.11.norm2.bias", "image_encoder.blocks.11.mlp.lin1.weight", "image_encoder.blocks.11.mlp.lin1.bias", "image_encoder.blocks.11.mlp.lin2.weight", "image_encoder.blocks.11.mlp.lin2.bias".
Unexpected key(s) in state_dict: "image_encoder.layers.0.blocks.0.conv1.c.weight", "image_encoder.layers.0.blocks.0.conv1.bn.weight", "image_encoder.layers.0.blocks.0.conv1.bn.bias", "image_encoder.layers.0.blocks.0.conv1.bn.running_mean", "image_encoder.layers.0.blocks.0.conv1.bn.running_var", "image_encoder.layers.0.blocks.0.conv1.bn.num_batches_tracked", "image_encoder.layers.0.blocks.0.conv2.c.weight", "image_encoder.layers.0.blocks.0.conv2.bn.weight", "image_encoder.layers.0.blocks.0.conv2.bn.bias", "image_encoder.layers.0.blocks.0.conv2.bn.running_mean", "image_encoder.layers.0.blocks.0.conv2.bn.running_var", "image_encoder.layers.0.blocks.0.conv2.bn.num_batches_tracked", "image_encoder.layers.0.blocks.0.conv3.c.weight", "image_encoder.layers.0.blocks.0.conv3.bn.weight", "image_encoder.layers.0.blocks.0.conv3.bn.bias", "image_encoder.layers.0.blocks.0.conv3.bn.running_mean", "image_encoder.layers.0.blocks.0.conv3.bn.running_var", "image_encoder.layers.0.blocks.0.conv3.bn.num_batches_tracked", "image_encoder.layers.0.blocks.1.conv1.c.weight", "image_encoder.layers.0.blocks.1.conv1.bn.weight", "image_encoder.layers.0.blocks.1.conv1.bn.bias", "image_encoder.layers.0.blocks.1.conv1.bn.running_mean", "image_encoder.layers.0.blocks.1.conv1.bn.running_var", "image_encoder.layers.0.blocks.1.conv1.bn.num_batches_tracked", "image_encoder.layers.0.blocks.1.conv2.c.weight", "image_encoder.layers.0.blocks.1.conv2.bn.weight", "image_encoder.layers.0.blocks.1.conv2.bn.bias", "image_encoder.layers.0.blocks.1.conv2.bn.running_mean", "image_encoder.layers.0.blocks.1.conv2.bn.running_var", "image_encoder.layers.0.blocks.1.conv2.bn.num_batches_tracked", "image_encoder.layers.0.blocks.1.conv3.c.weight", "image_encoder.layers.0.blocks.1.conv3.bn.weight", "image_encoder.layers.0.blocks.1.conv3.bn.bias", "image_encoder.layers.0.blocks.1.conv3.bn.running_mean", "image_encoder.layers.0.blocks.1.conv3.bn.running_var", "image_encoder.layers.0.blocks.1.conv3.bn.num_batches_tracked", 
"image_encoder.layers.0.downsample.conv1.c.weight", "image_encoder.layers.0.downsample.conv1.bn.weight", "image_encoder.layers.0.downsample.conv1.bn.bias", "image_encoder.layers.0.downsample.conv1.bn.running_mean", "image_encoder.layers.0.downsample.conv1.bn.running_var", "image_encoder.layers.0.downsample.conv1.bn.num_batches_tracked", "image_encoder.layers.0.downsample.conv2.c.weight", "image_encoder.layers.0.downsample.conv2.bn.weight", "image_encoder.layers.0.downsample.conv2.bn.bias", "image_encoder.layers.0.downsample.conv2.bn.running_mean", "image_encoder.layers.0.downsample.conv2.bn.running_var", "image_encoder.layers.0.downsample.conv2.bn.num_batches_tracked", "image_encoder.layers.0.downsample.conv3.c.weight", "image_encoder.layers.0.downsample.conv3.bn.weight", "image_encoder.layers.0.downsample.conv3.bn.bias", "image_encoder.layers.0.downsample.conv3.bn.running_mean", "image_encoder.layers.0.downsample.conv3.bn.running_var", "image_encoder.layers.0.downsample.conv3.bn.num_batches_tracked", "image_encoder.layers.1.blocks.0.attn.attention_biases", "image_encoder.layers.1.blocks.0.attn.norm.weight", "image_encoder.layers.1.blocks.0.attn.norm.bias", "image_encoder.layers.1.blocks.0.attn.qkv.weight", "image_encoder.layers.1.blocks.0.attn.qkv.bias", "image_encoder.layers.1.blocks.0.attn.proj.weight", "image_encoder.layers.1.blocks.0.attn.proj.bias", "image_encoder.layers.1.blocks.0.mlp.norm.weight", "image_encoder.layers.1.blocks.0.mlp.norm.bias", "image_encoder.layers.1.blocks.0.mlp.fc1.weight", "image_encoder.layers.1.blocks.0.mlp.fc1.bias", "image_encoder.layers.1.blocks.0.mlp.fc2.weight", "image_encoder.layers.1.blocks.0.mlp.fc2.bias", "image_encoder.layers.1.blocks.0.local_conv.c.weight", "image_encoder.layers.1.blocks.0.local_conv.bn.weight", "image_encoder.layers.1.blocks.0.local_conv.bn.bias", "image_encoder.layers.1.blocks.0.local_conv.bn.running_mean", "image_encoder.layers.1.blocks.0.local_conv.bn.running_var", 
"image_encoder.layers.1.blocks.0.local_conv.bn.num_batches_tracked", "image_encoder.layers.1.blocks.1.attn.attention_biases", "image_encoder.layers.1.blocks.1.attn.norm.weight", "image_encoder.layers.1.blocks.1.attn.norm.bias", "image_encoder.layers.1.blocks.1.attn.qkv.weight", "image_encoder.layers.1.blocks.1.attn.qkv.bias", "image_encoder.layers.1.blocks.1.attn.proj.weight", "image_encoder.layers.1.blocks.1.attn.proj.bias", "image_encoder.layers.1.blocks.1.mlp.norm.weight", "image_encoder.layers.1.blocks.1.mlp.norm.bias", "image_encoder.layers.1.blocks.1.mlp.fc1.weight", "image_encoder.layers.1.blocks.1.mlp.fc1.bias", "image_encoder.layers.1.blocks.1.mlp.fc2.weight", "image_encoder.layers.1.blocks.1.mlp.fc2.bias", "image_encoder.layers.1.blocks.1.local_conv.c.weight", "image_encoder.layers.1.blocks.1.local_conv.bn.weight", "image_encoder.layers.1.blocks.1.local_conv.bn.bias", "image_encoder.layers.1.blocks.1.local_conv.bn.running_mean", "image_encoder.layers.1.blocks.1.local_conv.bn.running_var", "image_encoder.layers.1.blocks.1.local_conv.bn.num_batches_tracked", "image_encoder.layers.1.downsample.conv1.c.weight", "image_encoder.layers.1.downsample.conv1.bn.weight", "image_encoder.layers.1.downsample.conv1.bn.bias", "image_encoder.layers.1.downsample.conv1.bn.running_mean", "image_encoder.layers.1.downsample.conv1.bn.running_var", "image_encoder.layers.1.downsample.conv1.bn.num_batches_tracked", "image_encoder.layers.1.downsample.conv2.c.weight", "image_encoder.layers.1.downsample.conv2.bn.weight", "image_encoder.layers.1.downsample.conv2.bn.bias", "image_encoder.layers.1.downsample.conv2.bn.running_mean", "image_encoder.layers.1.downsample.conv2.bn.running_var", "image_encoder.layers.1.downsample.conv2.bn.num_batches_tracked", "image_encoder.layers.1.downsample.conv3.c.weight", "image_encoder.layers.1.downsample.conv3.bn.weight", "image_encoder.layers.1.downsample.conv3.bn.bias", "image_encoder.layers.1.downsample.conv3.bn.running_mean", 
"image_encoder.layers.1.downsample.conv3.bn.running_var", "image_encoder.layers.1.downsample.conv3.bn.num_batches_tracked", "image_encoder.layers.2.blocks.0.attn.attention_biases", "image_encoder.layers.2.blocks.0.attn.norm.weight", "image_encoder.layers.2.blocks.0.attn.norm.bias", "image_encoder.layers.2.blocks.0.attn.qkv.weight", "image_encoder.layers.2.blocks.0.attn.qkv.bias", "image_encoder.layers.2.blocks.0.attn.proj.weight", "image_encoder.layers.2.blocks.0.attn.proj.bias", "image_encoder.layers.2.blocks.0.mlp.norm.weight", "image_encoder.layers.2.blocks.0.mlp.norm.bias", "image_encoder.layers.2.blocks.0.mlp.fc1.weight", "image_encoder.layers.2.blocks.0.mlp.fc1.bias", "image_encoder.layers.2.blocks.0.mlp.fc2.weight", "image_encoder.layers.2.blocks.0.mlp.fc2.bias", "image_encoder.layers.2.blocks.0.local_conv.c.weight", "image_encoder.layers.2.blocks.0.local_conv.bn.weight", "image_encoder.layers.2.blocks.0.local_conv.bn.bias", "image_encoder.layers.2.blocks.0.local_conv.bn.running_mean", "image_encoder.layers.2.blocks.0.local_conv.bn.running_var", "image_encoder.layers.2.blocks.0.local_conv.bn.num_batches_tracked", "image_encoder.layers.2.blocks.1.attn.attention_biases", "image_encoder.layers.2.blocks.1.attn.norm.weight", "image_encoder.layers.2.blocks.1.attn.norm.bias", "image_encoder.layers.2.blocks.1.attn.qkv.weight", "image_encoder.layers.2.blocks.1.attn.qkv.bias", "image_encoder.layers.2.blocks.1.attn.proj.weight", "image_encoder.layers.2.blocks.1.attn.proj.bias", "image_encoder.layers.2.blocks.1.mlp.norm.weight", "image_encoder.layers.2.blocks.1.mlp.norm.bias", "image_encoder.layers.2.blocks.1.mlp.fc1.weight", "image_encoder.layers.2.blocks.1.mlp.fc1.bias", "image_encoder.layers.2.blocks.1.mlp.fc2.weight", "image_encoder.layers.2.blocks.1.mlp.fc2.bias", "image_encoder.layers.2.blocks.1.local_conv.c.weight", "image_encoder.layers.2.blocks.1.local_conv.bn.weight", "image_encoder.layers.2.blocks.1.local_conv.bn.bias", 
"image_encoder.layers.2.blocks.1.local_conv.bn.running_mean", "image_encoder.layers.2.blocks.1.local_conv.bn.running_var", "image_encoder.layers.2.blocks.1.local_conv.bn.num_batches_tracked", "image_encoder.layers.2.blocks.2.attn.attention_biases", "image_encoder.layers.2.blocks.2.attn.norm.weight", "image_encoder.layers.2.blocks.2.attn.norm.bias", "image_encoder.layers.2.blocks.2.attn.qkv.weight", "image_encoder.layers.2.blocks.2.attn.qkv.bias", "image_encoder.layers.2.blocks.2.attn.proj.weight", "image_encoder.layers.2.blocks.2.attn.proj.bias", "image_encoder.layers.2.blocks.2.mlp.norm.weight", "image_encoder.layers.2.blocks.2.mlp.norm.bias", "image_encoder.layers.2.blocks.2.mlp.fc1.weight", "image_encoder.layers.2.blocks.2.mlp.fc1.bias", "image_encoder.layers.2.blocks.2.mlp.fc2.weight", "image_encoder.layers.2.blocks.2.mlp.fc2.bias", "image_encoder.layers.2.blocks.2.local_conv.c.weight", "image_encoder.layers.2.blocks.2.local_conv.bn.weight", "image_encoder.layers.2.blocks.2.local_conv.bn.bias", "image_encoder.layers.2.blocks.2.local_conv.bn.running_mean", "image_encoder.layers.2.blocks.2.local_conv.bn.running_var", "image_encoder.layers.2.blocks.2.local_conv.bn.num_batches_tracked", "image_encoder.layers.2.blocks.3.attn.attention_biases", "image_encoder.layers.2.blocks.3.attn.norm.weight", "image_encoder.layers.2.blocks.3.attn.norm.bias", "image_encoder.layers.2.blocks.3.attn.qkv.weight", "image_encoder.layers.2.blocks.3.attn.qkv.bias", "image_encoder.layers.2.blocks.3.attn.proj.weight", "image_encoder.layers.2.blocks.3.attn.proj.bias", "image_encoder.layers.2.blocks.3.mlp.norm.weight", "image_encoder.layers.2.blocks.3.mlp.norm.bias", "image_encoder.layers.2.blocks.3.mlp.fc1.weight", "image_encoder.layers.2.blocks.3.mlp.fc1.bias", "image_encoder.layers.2.blocks.3.mlp.fc2.weight", "image_encoder.layers.2.blocks.3.mlp.fc2.bias", "image_encoder.layers.2.blocks.3.local_conv.c.weight", "image_encoder.layers.2.blocks.3.local_conv.bn.weight", 
"image_encoder.layers.2.blocks.3.local_conv.bn.bias", "image_encoder.layers.2.blocks.3.local_conv.bn.running_mean", "image_encoder.layers.2.blocks.3.local_conv.bn.running_var", "image_encoder.layers.2.blocks.3.local_conv.bn.num_batches_tracked", "image_encoder.layers.2.blocks.4.attn.attention_biases", "image_encoder.layers.2.blocks.4.attn.norm.weight", "image_encoder.layers.2.blocks.4.attn.norm.bias", "image_encoder.layers.2.blocks.4.attn.qkv.weight", "image_encoder.layers.2.blocks.4.attn.qkv.bias", "image_encoder.layers.2.blocks.4.attn.proj.weight", "image_encoder.layers.2.blocks.4.attn.proj.bias", "image_encoder.layers.2.blocks.4.mlp.norm.weight", "image_encoder.layers.2.blocks.4.mlp.norm.bias", "image_encoder.layers.2.blocks.4.mlp.fc1.weight", "image_encoder.layers.2.blocks.4.mlp.fc1.bias", "image_encoder.layers.2.blocks.4.mlp.fc2.weight", "image_encoder.layers.2.blocks.4.mlp.fc2.bias", "image_encoder.layers.2.blocks.4.local_conv.c.weight", "image_encoder.layers.2.blocks.4.local_conv.bn.weight", "image_encoder.layers.2.blocks.4.local_conv.bn.bias", "image_encoder.layers.2.blocks.4.local_conv.bn.running_mean", "image_encoder.layers.2.blocks.4.local_conv.bn.running_var", "image_encoder.layers.2.blocks.4.local_conv.bn.num_batches_tracked", "image_encoder.layers.2.blocks.5.attn.attention_biases", "image_encoder.layers.2.blocks.5.attn.norm.weight", "image_encoder.layers.2.blocks.5.attn.norm.bias", "image_encoder.layers.2.blocks.5.attn.qkv.weight", "image_encoder.layers.2.blocks.5.attn.qkv.bias", "image_encoder.layers.2.blocks.5.attn.proj.weight", "image_encoder.layers.2.blocks.5.attn.proj.bias", "image_encoder.layers.2.blocks.5.mlp.norm.weight", "image_encoder.layers.2.blocks.5.mlp.norm.bias", "image_encoder.layers.2.blocks.5.mlp.fc1.weight", "image_encoder.layers.2.blocks.5.mlp.fc1.bias", "image_encoder.layers.2.blocks.5.mlp.fc2.weight", "image_encoder.layers.2.blocks.5.mlp.fc2.bias", "image_encoder.layers.2.blocks.5.local_conv.c.weight", 
"image_encoder.layers.2.blocks.5.local_conv.bn.weight", "image_encoder.layers.2.blocks.5.local_conv.bn.bias", "image_encoder.layers.2.blocks.5.local_conv.bn.running_mean", "image_encoder.layers.2.blocks.5.local_conv.bn.running_var", "image_encoder.layers.2.blocks.5.local_conv.bn.num_batches_tracked", "image_encoder.layers.2.downsample.conv1.c.weight", "image_encoder.layers.2.downsample.conv1.bn.weight", "image_encoder.layers.2.downsample.conv1.bn.bias", "image_encoder.layers.2.downsample.conv1.bn.running_mean", "image_encoder.layers.2.downsample.conv1.bn.running_var", "image_encoder.layers.2.downsample.conv1.bn.num_batches_tracked", "image_encoder.layers.2.downsample.conv2.c.weight", "image_encoder.layers.2.downsample.conv2.bn.weight", "image_encoder.layers.2.downsample.conv2.bn.bias", "image_encoder.layers.2.downsample.conv2.bn.running_mean", "image_encoder.layers.2.downsample.conv2.bn.running_var", "image_encoder.layers.2.downsample.conv2.bn.num_batches_tracked", "image_encoder.layers.2.downsample.conv3.c.weight", "image_encoder.layers.2.downsample.conv3.bn.weight", "image_encoder.layers.2.downsample.conv3.bn.bias", "image_encoder.layers.2.downsample.conv3.bn.running_mean", "image_encoder.layers.2.downsample.conv3.bn.running_var", "image_encoder.layers.2.downsample.conv3.bn.num_batches_tracked", "image_encoder.layers.3.blocks.0.attn.attention_biases", "image_encoder.layers.3.blocks.0.attn.norm.weight", "image_encoder.layers.3.blocks.0.attn.norm.bias", "image_encoder.layers.3.blocks.0.attn.qkv.weight", "image_encoder.layers.3.blocks.0.attn.qkv.bias", "image_encoder.layers.3.blocks.0.attn.proj.weight", "image_encoder.layers.3.blocks.0.attn.proj.bias", "image_encoder.layers.3.blocks.0.mlp.norm.weight", "image_encoder.layers.3.blocks.0.mlp.norm.bias", "image_encoder.layers.3.blocks.0.mlp.fc1.weight", "image_encoder.layers.3.blocks.0.mlp.fc1.bias", "image_encoder.layers.3.blocks.0.mlp.fc2.weight", "image_encoder.layers.3.blocks.0.mlp.fc2.bias", 
"image_encoder.layers.3.blocks.0.local_conv.c.weight", "image_encoder.layers.3.blocks.0.local_conv.bn.weight", "image_encoder.layers.3.blocks.0.local_conv.bn.bias", "image_encoder.layers.3.blocks.0.local_conv.bn.running_mean", "image_encoder.layers.3.blocks.0.local_conv.bn.running_var", "image_encoder.layers.3.blocks.0.local_conv.bn.num_batches_tracked", "image_encoder.layers.3.blocks.1.attn.attention_biases", "image_encoder.layers.3.blocks.1.attn.norm.weight", "image_encoder.layers.3.blocks.1.attn.norm.bias", "image_encoder.layers.3.blocks.1.attn.qkv.weight", "image_encoder.layers.3.blocks.1.attn.qkv.bias", "image_encoder.layers.3.blocks.1.attn.proj.weight", "image_encoder.layers.3.blocks.1.attn.proj.bias", "image_encoder.layers.3.blocks.1.mlp.norm.weight", "image_encoder.layers.3.blocks.1.mlp.norm.bias", "image_encoder.layers.3.blocks.1.mlp.fc1.weight", "image_encoder.layers.3.blocks.1.mlp.fc1.bias", "image_encoder.layers.3.blocks.1.mlp.fc2.weight", "image_encoder.layers.3.blocks.1.mlp.fc2.bias", "image_encoder.layers.3.blocks.1.local_conv.c.weight", "image_encoder.layers.3.blocks.1.local_conv.bn.weight", "image_encoder.layers.3.blocks.1.local_conv.bn.bias", "image_encoder.layers.3.blocks.1.local_conv.bn.running_mean", "image_encoder.layers.3.blocks.1.local_conv.bn.running_var", "image_encoder.layers.3.blocks.1.local_conv.bn.num_batches_tracked", "image_encoder.norm_head.weight", "image_encoder.norm_head.bias", "image_encoder.head.weight", "image_encoder.head.bias", "image_encoder.patch_embed.seq.0.c.weight", "image_encoder.patch_embed.seq.0.bn.weight", "image_encoder.patch_embed.seq.0.bn.bias", "image_encoder.patch_embed.seq.0.bn.running_mean", "image_encoder.patch_embed.seq.0.bn.running_var", "image_encoder.patch_embed.seq.0.bn.num_batches_tracked", "image_encoder.patch_embed.seq.2.c.weight", "image_encoder.patch_embed.seq.2.bn.weight", "image_encoder.patch_embed.seq.2.bn.bias", "image_encoder.patch_embed.seq.2.bn.running_mean", 
"image_encoder.patch_embed.seq.2.bn.running_var", "image_encoder.patch_embed.seq.2.bn.num_batches_tracked", "mask_decoder.hf_token.weight", "mask_decoder.hf_mlp.layers.0.weight", "mask_decoder.hf_mlp.layers.0.bias", "mask_decoder.hf_mlp.layers.1.weight", "mask_decoder.hf_mlp.layers.1.bias", "mask_decoder.hf_mlp.layers.2.weight", "mask_decoder.hf_mlp.layers.2.bias", "mask_decoder.compress_vit_feat.0.weight", "mask_decoder.compress_vit_feat.0.bias", "mask_decoder.compress_vit_feat.1.weight", "mask_decoder.compress_vit_feat.1.bias", "mask_decoder.compress_vit_feat.3.weight", "mask_decoder.compress_vit_feat.3.bias", "mask_decoder.embedding_encoder.0.weight", "mask_decoder.embedding_encoder.0.bias", "mask_decoder.embedding_encoder.1.weight", "mask_decoder.embedding_encoder.1.bias", "mask_decoder.embedding_encoder.3.weight", "mask_decoder.embedding_encoder.3.bias", "mask_decoder.embedding_maskfeature.0.weight", "mask_decoder.embedding_maskfeature.0.bias", "mask_decoder.embedding_maskfeature.1.weight", "mask_decoder.embedding_maskfeature.1.bias", "mask_decoder.embedding_maskfeature.3.weight", "mask_decoder.embedding_maskfeature.3.bias".
size mismatch for image_encoder.neck.0.weight: copying a param with shape torch.Size([256, 320, 1, 1]) from checkpoint, the shape in current model is torch.Size([256, 768, 1, 1]).
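
The key names in the error suggest a mismatch between the checkpoint and the model that was built. The command passes `--model-type vit_b`, so the model expects plain ViT keys such as `image_encoder.blocks.N...`, while the checkpoint holds `image_encoder.layers.N...` keys plus HQ decoder keys such as `mask_decoder.hf_token.weight`. The size mismatch on `image_encoder.neck.0.weight` (320 input channels in the checkpoint, 768 in the model) points the same way. The sketch below classifies a checkpoint by its key prefixes. The helper names `guess_encoder_family` and `has_hq_decoder_keys` are hypothetical and not part of the sam-hq code, and the labels `tiny_vit` and `plain_vit` are assumptions made only for illustration.

```python
from typing import Iterable, List


def guess_encoder_family(keys: Iterable[str]) -> str:
    """Guess the image encoder family from state_dict key names.

    Hypothetical helper: the prefixes are taken from the error message,
    the returned labels are illustrative only.
    """
    key_list: List[str] = list(keys)
    # Hierarchical encoder keys, as listed under "Unexpected key(s)".
    if any(k.startswith("image_encoder.layers.") for k in key_list):
        return "tiny_vit"
    # Flat transformer block keys, as listed under "Missing key(s)".
    if any(k.startswith("image_encoder.blocks.") for k in key_list):
        return "plain_vit"
    return "unknown"


def has_hq_decoder_keys(keys: Iterable[str]) -> bool:
    """Return True if any key looks like an HQ mask decoder weight."""
    return any(k.startswith("mask_decoder.hf_") for k in keys)


# Key names copied from the error message above.
checkpoint_keys = [
    "image_encoder.layers.0.blocks.0.conv1.c.weight",
    "image_encoder.patch_embed.seq.0.c.weight",
    "mask_decoder.hf_token.weight",
]
model_keys = [
    "image_encoder.pos_embed",
    "image_encoder.blocks.0.norm1.weight",
]

checkpoint_family = guess_encoder_family(checkpoint_keys)
model_family = guess_encoder_family(model_keys)
print("checkpoint encoder:", checkpoint_family)
print("model encoder:", model_family)
print("checkpoint has HQ decoder keys:", has_hq_decoder_keys(checkpoint_keys))
```

With a real checkpoint, the keys could be read with `torch.load(path, map_location="cpu").keys()`, assuming the file stores a plain state dict, which is what the `sam.load_state_dict(state_dict)` call in the traceback suggests. Whether the training code's `sam_model_registry` offers a matching entry for this encoder is not shown in the traceback.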
