Questions about multimodal fusion and reproducing the results #15

Open
xiezexun opened this issue Oct 5, 2023 · 2 comments


xiezexun commented Oct 5, 2023

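Hello, I read your paper and found it inspiring and very well written. I have two questions.
1. I have successfully reproduced the code using the vit-l-14 pretrained model on two 4090 GPUs. My results are top-1: 95.3% / top-5: 99.2%, which may still fall short of yours.
2. When fusing the visual and text features, you use the CLIP model's default cosine-similarity computation, but I do not quite follow the idea behind this code, and it does not seem to match the pseudocode in the original CLIP paper. Could you explain what logit_scale does, what it is for, and why it is initialized this way?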
self.logit_scale = nn.Parameter(torch.ones([]) * np.log(1 / 0.07))
logit_scale = self.logit_scale.exp()
logits = logit_scale * image_emb @ text_emb.t()

whwu95 (Owner) commented Oct 13, 2023

Thanks for your interest in our work.

  1. Which dataset are these results on?
  2. For logit_scale, please refer to the official CLIP code: https://github.com/openai/CLIP/blob/a1d071733d7111c9c014f024669f959182114e33/clip/model.py#L295
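
For readers following the logit_scale question, here is a minimal framework-free sketch of what the quoted lines appear to do. As far as the CLIP paper describes it, logit_scale is a learnable temperature stored in log space: it is initialized to log(1 / 0.07), so exp() gives roughly 14.29, and the cosine similarities are multiplied by that value before the softmax. The paper's pseudocode multiplies the similarity matrix by np.exp(t), which looks like the same operation under a different name. The toy embeddings and the helper names (l2_normalize, softmax, cosine_logits) are made up for illustration and are not taken from this repository or from CLIP.

```python
import math

def l2_normalize(v):
    """Scale a vector to unit length so a dot product equals cosine similarity."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def softmax(xs):
    """Numerically stable softmax over a list of logits."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def cosine_logits(image_emb, text_embs, logit_scale):
    """Cosine similarity between one image and each text, times logit_scale."""
    img = l2_normalize(image_emb)
    return [
        logit_scale * sum(a * b for a, b in zip(img, l2_normalize(t)))
        for t in text_embs
    ]

# The parameter is kept in log space, mirroring the snippet in the question;
# exp() keeps the effective scale positive however the parameter is updated.
log_logit_scale = math.log(1 / 0.07)
logit_scale = math.exp(log_logit_scale)  # about 14.29, i.e. 1 / temperature

# Hypothetical toy embeddings (not produced by any real model).
image_emb = [1.0, 0.2, 0.0]
text_embs = [[0.9, 0.1, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]

# Cosine similarities lie in [-1, 1], so an unscaled softmax stays nearly flat;
# multiplying by logit_scale sharpens it toward the best-matching text.
unscaled = softmax(cosine_logits(image_emb, text_embs, 1.0))
scaled = softmax(cosine_logits(image_emb, text_embs, logit_scale))

print("logit_scale:", logit_scale)
print("softmax without scaling:", unscaled)
print("softmax with scaling:", scaled)
```

Note that the quoted line `logit_scale * image_emb @ text_emb.t()` only yields scaled cosine similarities if image_emb and text_emb were L2-normalized beforehand; the sketch normalizes inside cosine_logits to make that assumption explicit.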

xiezexun (Author) commented

I reproduced the results on the UCF101 dataset.
