
How to improve detr #12

Open
zhangzhen119 opened this issue Oct 13, 2022 · 14 comments

@zhangzhen119

Could you please share your improved code for BatchFormer on DETR? I would like to learn how the improvement is applied to DETR.

@zhihou7
Owner

zhihou7 commented Oct 14, 2022

Hi @zhangzhen119,
Thanks for your comment, and sorry for getting back to you late. The improvement in DETR is similar to Deformable-DETR. The main difference between the two code bases is that Deformable-DETR uses a batch-first transformer, while in DETR the batch is in the second dimension. If you want a clean view of the changes, you can diff against the original Deformable-DETR as follows:

diff models/deformable_transformer.py <(curl https://raw.githubusercontent.com/fundamentalvision/Deformable-DETR/main/models/deformable_transformer.py)

For the code on DETR, I have shared the repository with you. It might include some code from my previous project (HOI compositional learning), and I have not cleaned it up. The BatchFormerV2 code is provided in https://github.com/zhihou7/detr/blob/master/models/transformer.py. I will release a clean DETR version when I have time.
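
Roughly, the batch-dimension attention looks like the following simplified sketch (not the exact code from the repository above; the class name and shapes are only for illustration). It assumes DETR's encoder memory of shape (HW, B, C), i.e. the batch in the second dimension:

```python
import torch.nn as nn

class BatchFormerV2Sketch(nn.Module):
    """Illustrative sketch: a transformer layer that attends across the batch."""
    def __init__(self, dim=256, nhead=8):
        super().__init__()
        # With batch_first=True we can feed (HW, B, C) directly: the HW spatial
        # positions play the role of the "batch", and attention runs over the
        # B samples at each position.
        self.layer = nn.TransformerEncoderLayer(dim, nhead,
                                                dim_feedforward=4 * dim,
                                                batch_first=True)

    def forward(self, x):
        # x: (HW, B, C) as in DETR (batch in the second dimension), so no transpose
        # is needed. A batch-first model such as Deformable-DETR, with (B, HW, C)
        # features, would need x.transpose(0, 1) before and after this call.
        return self.layer(x)
```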

If you have further questions, feel free to ask.
Regards,

@zhangzhen119
Author

Thank you very much for your help, and congratulations on the work you have done here.

@zhihou7
Owner

zhihou7 commented Oct 15, 2022

My pleasure

@zhangzhen119
Author

Hello, while reproducing your BatchFormer improvement of DETR, I found that you added some parameters to the main function. Are these the parameters that gave the best improvement in your paper? If I want to get the same improvement, how should I set these parameters? Sorry to bother you.
[screenshot of the added command-line parameters]

@zhihou7
Owner

zhihou7 commented Oct 19, 2022

I set bf to 3, which is BatchFormerV2; I use bf == 1 to indicate the experiment without shared prediction modules.
base_bf is used for the baseline, so you can ignore it.
start_idx is unused once you set insert_idx. insert_idx indicates the layer in the transformer encoder where BatchFormer is inserted; I set insert_idx to 0 (the first layer).
use_checkpoint enables gradient checkpointing to reduce memory, because I usually have four 16G V100s; for the DETR experiments I therefore set the batch size to 2.
share_bf means we share the BatchFormer across different layers. Interestingly, this does not degrade the performance too much.
The other parameters do not affect the performance and I do not use them; they exist only because I thought weight decay might affect the performance, based on my experience with BatchFormerV1.
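
As a rough illustration of how these flags fit together (placeholder code, not copied from main.py; only the parameter names follow the list above), the encoder loop looks roughly like this:

```python
import torch.nn as nn
from torch.utils.checkpoint import checkpoint

class EncoderSketch(nn.Module):
    """Illustrative sketch of how insert_idx and use_checkpoint could be used."""
    def __init__(self, layers, batchformer, insert_idx=0, use_checkpoint=False):
        super().__init__()
        self.layers = nn.ModuleList(layers)   # standard DETR encoder layers
        self.batchformer = batchformer        # e.g. the batch-attention sketch above
        self.insert_idx = insert_idx          # 0 = after the first encoder layer
        self.use_checkpoint = use_checkpoint

    def forward(self, src):
        out = src                             # (HW, B, C)
        for i, layer in enumerate(self.layers):
            if self.use_checkpoint and self.training:
                # Gradient checkpointing trades compute for memory (16G V100s).
                out = checkpoint(layer, out)
            else:
                out = layer(out)
            if i == self.insert_idx:
                # Simplified: in BatchFormerV2 this output is combined with the
                # original stream through shared modules (see later in this thread).
                out = self.batchformer(out)
        return out
```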

@zhangzhen119
Author

I am still quite new to the code, so I asked some relatively simple questions. I really appreciate your prompt and effective reply. I will run experiments and keep learning based on your suggestions, and I wish you even greater achievements.

@zhihou7
Owner

zhihou7 commented Oct 19, 2022

Thanks. It is mainly because my code is too messy.

@zhangzhen119
Author

Excuse me, I used your BatchFormerV2 in a transformer structure similar to DETR, with the parameters set according to your suggestions, but in the end there is only about a 0.1 improvement. Is this improvement reasonable? Due to equipment limitations I set the batch size to 4 and did not use the batch size of 24 mentioned as optimal in your paper. Is the small batch size the main reason for the small improvement? If I can only use batch size 4, are there any other possible solutions? Sorry to trouble you.

@zhihou7
Owner

zhihou7 commented Oct 26, 2022

Hi, how many epochs did you train the network for? Could you provide the logs? Also, did you run the experiments on a single GPU with batch size 4, or on 4 GPUs with batch size 4?

Here is the baseline log: https://drive.google.com/file/d/1PrLn5SOeSbpW-UvjJgVCTLmOSkerIRRd/view?usp=sharing and here is the BatchFormer log: https://drive.google.com/file/d/1t60MSkNCv5eLOo-2TR0fYfbYTPQpcU_E/view?usp=sharing

The two logs were trained with batch size 16 on 8 GPUs. I did not implement multi-GPU distributed training for the BatchFormer stream, so what matters is the batch size on a single GPU.

@zhangzhen119
Author

Sorry, my log files were not saved. I ran on a single GPU with batch size 4 and trained for 80 epochs; at epoch 17 the performance started to drop, so I stopped. My device is a 3060 Ti. I have also adjusted other related parameters, but there is basically no change, so I suspect the expected effect is not achieved because the batch size is relatively small.

@zhihou7
Owner

zhihou7 commented Oct 26, 2022 via email

Do you mean the performance drops after 17 epochs? Do you use the shared prediction modules? I mean the siamese stream.

@zhangzhen119
Author

Yes, my model started to drop at the 17th epoch even without BatchFormer, so I think this is normal. I used the code you shared with me in DETR and built my improvement on it. Sorry, I did not notice the use of shared modules; when I looked at the code you shared, I found that I was not using BatchFormer only in the training phase, which caused this problem. I am going to use it only in the training phase and try again.

@zhihou7
Owner

zhihou7 commented Oct 26, 2022

If you do not share the other modules in the network, you will suffer a performance drop when you do not use the BatchFormer in the test phase.

I copy the batch into the BatchFormerV2 stream, then feed both the original feature batch and the feature batch from BatchFormerV2 into the following (shared) modules.
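
In pseudocode, the two-stream idea is roughly the following (a simplified sketch; the actual implementation is in the transformer.py linked above):

```python
import torch

def two_stream_batchformer(features, batchformer, training):
    """Sketch of the BatchFormerV2 siamese / two-stream trick.

    features: (HW, B, C) encoder features (batch in the second dimension).
    During training the batch is duplicated: the original stream and the
    BatchFormer stream are concatenated along the batch dimension, so all the
    following (shared) modules are trained on both. At test time BatchFormer
    is skipped entirely, which is safe because those shared modules have also
    seen the original, non-BatchFormer stream.
    """
    if not training:
        return features
    bf_features = batchformer(features)               # batch-attention stream
    return torch.cat([features, bf_features], dim=1)  # 2B samples downstream
```

The targets then also need to be duplicated accordingly when computing the loss.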

@zhangzhen119
Author

OK, thank you, I'll try again. Sorry for the inconvenience.
