
【Hackathon 6th No.34】support return micro batch loss for dygraph train_batch #64218

Merged

Merged 6 commits into PaddlePaddle:develop on May 16, 2024

Conversation

AndSonder
Contributor

@AndSonder AndSonder commented May 11, 2024

PR Category

Auto Parallel

PR Types

New features

Description

Support returning per-micro-batch losses during dygraph pipeline parallel training.

The main idea is to change the accumulation strategy of self.total_loss: instead of keeping a running sum, it now stores the loss of every micro batch. When the switch is enabled, the stored losses are merged into a single tensor and returned; otherwise the losses are merged (averaged) following the original logic.

When the switch is enabled, the loss broadcast to the other ranks also contains all micro batch losses. The loss that participates in the backward computation is still the original loss; that part of the logic is unchanged.
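The accumulation change described above can be sketched as follows. This is a minimal, simplified illustration, not Paddle's actual PipelineParallel code: the class name `PipelineLossAccumulator` and method names are hypothetical, and plain Python numbers stand in for loss tensors (the real implementation would concatenate tensors).

```python
class PipelineLossAccumulator:
    """Hypothetical sketch of the PR's loss-accumulation strategy."""

    def __init__(self, return_micro_batch_loss=False):
        # Instead of accumulating a running sum in total_loss,
        # keep every micro batch loss in a list.
        self.return_micro_batch_loss = return_micro_batch_loss
        self.micro_batch_losses = []

    def add_micro_batch_loss(self, loss):
        self.micro_batch_losses.append(loss)

    def reduce(self):
        # Switch on: return all micro batch losses together
        # (a tensor concat in the real implementation).
        if self.return_micro_batch_loss:
            return list(self.micro_batch_losses)
        # Switch off: average, matching the original behavior.
        return sum(self.micro_batch_losses) / len(self.micro_batch_losses)
```

For example, with micro batch losses `[0.5, 0.3, 0.4]`, enabling the switch yields the full list `[0.5, 0.3, 0.4]`, while the default path yields the average `0.4`, matching the pre-PR behavior.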


paddle-bot bot commented May 11, 2024

Your PR has been submitted. Thanks for your contribution!
Please wait for the result of CI firstly. See Paddle CI Manual for details.

@paddle-bot paddle-bot bot added the contributor External developers label May 11, 2024
@AndSonder AndSonder marked this pull request as ready for review May 11, 2024 07:39
@AndSonder
Contributor Author

@ForFishes CI looks fine now. Could you please help review this?

Member

@ForFishes ForFishes left a comment

LGTM

@luotao1
Contributor

luotao1 commented May 15, 2024

@AndSonder Please take a look at the coverage.

@AndSonder
Contributor Author

@luotao1 Requesting an exemption for PR-CI-Coverage. Covering PipelineParallelWithInterleave would require adding the unit test to test_parallel_dygraph_pipeline_parallel_with_virtual_stage, but the 2-GPU CI environment cannot run that test.

@luotao1 luotao1 merged commit 84fb07d into PaddlePaddle:develop May 16, 2024
31 checks passed
co63oc pushed a commit to co63oc/Paddle that referenced this pull request May 18, 2024
…n_batch (PaddlePaddle#64218)

* support return micro batch loss

* fix codestyle

* recover some code