
How to improve detr #12

Open
zhangzhen119 opened this issue Oct 13, 2022 · 14 comments

@zhangzhen119

Could you please share your improved code for BatchFormer on DETR? I would like to learn how the improvement is applied to DETR.

@zhihou7
Owner

zhihou7 commented Oct 14, 2022

Hi @zhangzhen119,
Thanks for your comment, and sorry for getting back to you late. The improvement in DETR is similar to Deformable-DETR. The main difference between the two code bases is that Deformable-DETR uses a batch-first transformer, while in DETR the batch is in the second dimension. If you want a clean view of the changes, you can diff against the original Deformable-DETR as follows:

diff models/deformable_transformer.py <(curl https://raw.githubusercontent.com/fundamentalvision/Deformable-DETR/main/models/deformable_transformer.py)

For the code on DETR, I have shared the repository with you. It might include some code from my previous project (HOI compositional learning), and I have not cleaned it up. The BatchFormerV2 code is provided in https://github.com/zhihou7/detr/blob/master/models/transformer.py. I will release a clean DETR version when I have time.
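
Roughly, the batch-dimension attention looks like the following simplified sketch (not the exact code from the repository above; the class name and shapes are only for illustration). It assumes DETR's encoder memory of shape (HW, B, C), i.e. the batch in the second dimension:

```python
import torch.nn as nn

class BatchFormerV2Sketch(nn.Module):
    """Illustrative sketch: a transformer layer that attends across the batch."""
    def __init__(self, dim=256, nhead=8):
        super().__init__()
        # With batch_first=True we can feed (HW, B, C) directly: the HW spatial
        # positions play the role of the "batch", and attention runs over the
        # B samples at each position.
        self.layer = nn.TransformerEncoderLayer(dim, nhead,
                                                dim_feedforward=4 * dim,
                                                batch_first=True)

    def forward(self, x):
        # x: (HW, B, C) as in DETR (batch in the second dimension), so no transpose
        # is needed. A batch-first model such as Deformable-DETR, with (B, HW, C)
        # features, would need x.transpose(0, 1) before and after this call.
        return self.layer(x)
```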

If you have further questions, feel free to ask.
Regards,

@zhangzhen119
Author

Thank you very much for your help, and congratulations on the work you have done here.

@zhihou7
Owner

zhihou7 commented Oct 15, 2022

My pleasure

@zhangzhen119
Author

Hello, while reproducing your BatchFormer improvement of DETR, I found that you added some parameters to the main function. Are these the parameters that gave the best improvement in your paper? If I want to get the same improvement, how should I set these parameters? Sorry to bother you.
[screenshot of the added command-line parameters]

@zhihou7
Owner

zhihou7 commented Oct 19, 2022

I set bf to 3, which is BatchFormerV2; I use bf == 1 to indicate the experiment without shared prediction modules.
base_bf is used for the baseline, so you can ignore it.
start_idx is unused once you set insert_idx. insert_idx indicates the layer in the transformer encoder where BatchFormer is inserted; I set insert_idx to 0 (the first layer).
use_checkpoint enables gradient checkpointing to reduce memory, because I usually have four 16G V100s; for the DETR experiments I therefore set the batch size to 2.
share_bf means we share the BatchFormer across different layers. Interestingly, this does not degrade the performance too much.
The other parameters do not affect the performance and I do not use them; they exist only because I thought weight decay might affect the performance, based on my experience with BatchFormerV1.
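
As a rough illustration of how these flags fit together (placeholder code, not copied from main.py; only the parameter names follow the list above), the encoder loop looks roughly like this:

```python
import torch.nn as nn
from torch.utils.checkpoint import checkpoint

class EncoderSketch(nn.Module):
    """Illustrative sketch of how insert_idx and use_checkpoint could be used."""
    def __init__(self, layers, batchformer, insert_idx=0, use_checkpoint=False):
        super().__init__()
        self.layers = nn.ModuleList(layers)   # standard DETR encoder layers
        self.batchformer = batchformer        # e.g. the batch-attention sketch above
        self.insert_idx = insert_idx          # 0 = after the first encoder layer
        self.use_checkpoint = use_checkpoint

    def forward(self, src):
        out = src                             # (HW, B, C)
        for i, layer in enumerate(self.layers):
            if self.use_checkpoint and self.training:
                # Gradient checkpointing trades compute for memory (16G V100s).
                out = checkpoint(layer, out)
            else:
                out = layer(out)
            if i == self.insert_idx:
                # Simplified: in BatchFormerV2 this output is combined with the
                # original stream through shared modules (see later in this thread).
                out = self.batchformer(out)
        return out
```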

@zhangzhen119
Author

I am still quite new to the code, so I asked some relatively simple questions. I really appreciate your prompt and effective reply. I will run experiments and keep learning based on your suggestions, and I wish you even greater achievements.

@zhihou7
Owner

zhihou7 commented Oct 19, 2022

Thanks. It is mainly because my code is too messy.

@zhangzhen119
Author

Excuse me, I used your BatchFormerV2 in a transformer structure similar to DETR, with the parameters set according to your suggestions, but in the end there is only about a 0.1 improvement. Is this improvement reasonable? Due to equipment limitations I set the batch size to 4 and did not use the batch size of 24 mentioned as optimal in your paper. Is the small batch size the main reason for the small improvement? If I can only use batch size 4, are there any other possible solutions? Sorry to trouble you.

@zhihou7
Owner

zhihou7 commented Oct 26, 2022

Hi, how many epochs did you train the network for? Could you provide the logs? Also, did you run the experiments on a single GPU with batch size 4, or on 4 GPUs with batch size 4?

Here is the baseline log: https://drive.google.com/file/d/1PrLn5SOeSbpW-UvjJgVCTLmOSkerIRRd/view?usp=sharing and here is the BatchFormer log: https://drive.google.com/file/d/1t60MSkNCv5eLOo-2TR0fYfbYTPQpcU_E/view?usp=sharing

The two logs were trained with batch size 16 on 8 GPUs. I did not implement multi-GPU distributed training for the BatchFormer stream, so what matters is the batch size on a single GPU.

@zhangzhen119
Author

Sorry, my log files were not saved. I ran on a single GPU with batch size 4 and trained for 80 epochs; at epoch 17 the performance started to drop, so I stopped. My device is a 3060 Ti. I have also adjusted other related parameters, but there is basically no change, so I suspect the expected effect is not achieved because the batch size is relatively small.

@zhihou7
Owner

zhihou7 commented Oct 26, 2022 via email

Do you mean the performance drops after 17 epochs? Do you use the shared prediction modules? I mean the siamese stream.

@zhangzhen119
Author

Yes, my model started to drop at the 17th epoch even without BatchFormer, so I think this is normal. I used the code you shared with me in DETR and built my improvement on it. Sorry, I did not notice the use of shared modules; when I looked at the code you shared, I found that I was not using BatchFormer only in the training phase, which caused this problem. I am going to use it only in the training phase and try again.

@zhihou7
Owner

zhihou7 commented Oct 26, 2022

If you do not share the other modules in the network, you will suffer a performance drop when you do not use the BatchFormer in the test phase.

I copy the batch into the BatchFormerV2 stream, then feed both the original feature batch and the feature batch from BatchFormerV2 into the following (shared) modules.
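
In pseudocode, the two-stream idea is roughly the following (a simplified sketch; the actual implementation is in the transformer.py linked above):

```python
import torch

def two_stream_batchformer(features, batchformer, training):
    """Sketch of the BatchFormerV2 siamese / two-stream trick.

    features: (HW, B, C) encoder features (batch in the second dimension).
    During training the batch is duplicated: the original stream and the
    BatchFormer stream are concatenated along the batch dimension, so all the
    following (shared) modules are trained on both. At test time BatchFormer
    is skipped entirely, which is safe because those shared modules have also
    seen the original, non-BatchFormer stream.
    """
    if not training:
        return features
    bf_features = batchformer(features)               # batch-attention stream
    return torch.cat([features, bf_features], dim=1)  # 2B samples downstream
```

The targets then also need to be duplicated accordingly when computing the loss.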

@zhangzhen119
Author

OK, thank you, I'll try again. Sorry for the inconvenience.
