PPYOLOE _bbox_loss raises "ValueError: Target -6 is out of lower bound" when training on a custom dataset #8963

Closed
YJH1108 opened this issue May 9, 2024 · 5 comments

YJH1108 commented May 9, 2024

Search before asking

  • I have searched the issues and found no similar bug report.

Bug Component

Training

Describe the Bug

When training PPYOLOE on my own dataset, the following error is raised while computing bbox_loss:
```
Traceback (most recent call last):
  File ".\tools\train.py", line 211, in <module>
    main()
  File ".\tools\train.py", line 207, in main
    run(FLAGS, cfg)
  File ".\tools\train.py", line 160, in run
    trainer.train(FLAGS.eval)
  File "E:\jingsai\PaddleDetection\ppdet\engine\trainer.py", line 577, in train
    outputs = model(data)
  File "E:\jingsai\PaddleDetection\venv_pd\lib\site-packages\paddle\fluid\dygraph\layers.py", line 930, in __call__
    return self._dygraph_call_func(*inputs, **kwargs)
  File "E:\jingsai\PaddleDetection\venv_pd\lib\site-packages\paddle\fluid\dygraph\layers.py", line 915, in _dygraph_call_func
    outputs = self.forward(*inputs, **kwargs)
  File "E:\jingsai\PaddleDetection\ppdet\modeling\architectures\meta_arch.py", line 60, in forward
    out = self.get_loss()
  File "E:\jingsai\PaddleDetection\ppdet\modeling\architectures\yolo.py", line 147, in get_loss
    return self._forward()
  File "E:\jingsai\PaddleDetection\ppdet\modeling\architectures\yolo.py", line 93, in _forward
    yolo_losses = self.yolo_head(neck_feats, self.inputs)
  File "E:\jingsai\PaddleDetection\venv_pd\lib\site-packages\paddle\fluid\dygraph\layers.py", line 930, in __call__
    return self._dygraph_call_func(*inputs, **kwargs)
  File "E:\jingsai\PaddleDetection\venv_pd\lib\site-packages\paddle\fluid\dygraph\layers.py", line 915, in _dygraph_call_func
    outputs = self.forward(*inputs, **kwargs)
  File "E:\jingsai\PaddleDetection\ppdet\modeling\heads\ppyoloe_head.py", line 264, in forward
    return self.forward_train(feats, targets, aux_pred)
  File "E:\jingsai\PaddleDetection\ppdet\modeling\heads\ppyoloe_head.py", line 198, in forward_train
    return self.get_loss([
  File "E:\jingsai\PaddleDetection\ppdet\modeling\heads\ppyoloe_head.py", line 455, in get_loss
    assign_out_dict = self.get_loss_from_assign(
  File "E:\jingsai\PaddleDetection\ppdet\modeling\heads\ppyoloe_head.py", line 500, in get_loss_from_assign
    self._bbox_loss(pred_distri, pred_bboxes, anchor_points_s,
  File "E:\jingsai\PaddleDetection\ppdet\modeling\heads\ppyoloe_head.py", line 364, in _bbox_loss
    loss_dfl = self._df_loss(pred_dist_pos, assigned_ltrb_pos,
  File "E:\jingsai\PaddleDetection\ppdet\modeling\heads\ppyoloe_head.py", line 319, in _df_loss
    loss_left = F.cross_entropy(
  File "E:\jingsai\PaddleDetection\venv_pd\lib\site-packages\paddle\nn\functional\loss.py", line 1719, in cross_entropy
    raise ValueError("Target {} is out of lower bound.".format(
ValueError: Target -1 is out of lower bound.
```

The failing line is this one in ppyoloe_head.py:

```python
loss_dfl = self._df_loss(pred_dist_pos, assigned_ltrb_pos,
                         self.reg_range[0]) * bbox_weight
```

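I printed the two variables pred_dist_pos and assigned_ltrb_pos and found that assigned_ltrb_pos often contains large values.
[Screenshots: printed values of pred_dist_pos and assigned_ltrb_pos]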

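I can't tell whether this is a bug or whether I'm missing some parameter setting needed to train on a custom dataset.
Also, what do pred_dist_pos and assigned_ltrb_pos actually represent?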

Any help would be appreciated.
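On the question of what the two tensors mean, here is my reading, which should be checked against ppyoloe_head.py. For each positive anchor, assigned_ltrb_pos holds the distances from the anchor point to the left, top, right and bottom edges of its assigned box, expressed in units of the feature-map stride. pred_dist_pos holds the predicted logits over the discrete distance bins for each of those four sides. Distribution focal loss (DFL) splits each continuous target distance into the two bins that bracket it and applies a weighted cross-entropy to both; the `loss_left = F.cross_entropy(` frame in the traceback is consistent with that. The stdlib sketch below is an illustration of the idea, not PaddleDetection's code. It assumes 17 bins indexed 0..16 (matching the default reg_range of 0~17 mentioned in this thread), and the names `dfl_targets`, `log_softmax` and `dfl_loss` are hypothetical.

```python
import math

NUM_BINS = 17  # assumption: reg_range (0, 17), i.e. bins 0..16


def dfl_targets(distance, num_bins=NUM_BINS):
    """Split a continuous distance into its two neighbouring bins and their weights."""
    left = math.floor(distance)
    right = left + 1
    if left < 0 or right > num_bins - 1:
        # cross_entropy only accepts class indices in [0, num_bins - 1];
        # a negative distance gives left = -1 ("Target -1 is out of lower bound").
        raise ValueError(
            f"distance {distance} maps to bins ({left}, {right}), "
            f"outside [0, {num_bins - 1}]"
        )
    weight_left = right - distance  # the closer bin gets the larger weight
    weight_right = distance - left
    return left, right, weight_left, weight_right


def log_softmax(logits):
    """Numerically stable log-softmax over a list of floats."""
    m = max(logits)
    log_sum = m + math.log(sum(math.exp(v - m) for v in logits))
    return [v - log_sum for v in logits]


def dfl_loss(logits, distance):
    """DFL for one side (l, t, r or b) of one positive anchor."""
    left, right, w_left, w_right = dfl_targets(distance, len(logits))
    log_probs = log_softmax(logits)
    # Weighted cross-entropy on the two bins that bracket the target distance.
    return -(w_left * log_probs[left] + w_right * log_probs[right])


print(dfl_targets(3.25))  # (3, 4, 0.75, 0.25)
```

With this framing, a target such as 28, 60 or 92 would fall outside the bin range on the upper side, and any negative target floors to a negative class index, which is exactly what cross_entropy rejects.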

Environment

nothing

Bug description confirmation

  • I confirm that the bug replication steps, code change instructions, and environment information have been provided, and the problem can be reproduced.

Are you willing to submit a PR?

  • I'd like to help by submitting a PR!

YJH1108 commented May 9, 2024

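Debugging further, I found that assigned_ltrb itself is in a normal range: its values lie within reg_range (default 0~17). So why do out-of-range values, such as 28, 60, 92, ... or negative numbers, show up in assigned_ltrb_pos after masked_select, as in the screenshots below?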

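My understanding is that masked_select only picks values out of the original tensor according to the mask. I'm not sure whether I have misunderstood it.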

[Screenshots: values of assigned_ltrb and assigned_ltrb_pos]
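The understanding of masked_select described above matches its intended semantics: every output element is copied from an input position where the mask is true, in order, so outputs can never leave the input's value range. The stdlib sketch below spells out that reference behaviour plus a check one could run on flattened data. Exporting a Paddle tensor with something like `t.numpy().ravel().tolist()` is an assumption about how you would feed it in; `masked_select` and `is_valid_selection` here are hypothetical helpers, not Paddle APIs.

```python
def masked_select(values, mask):
    """Reference semantics: keep values[i] wherever mask[i] is True, preserving order."""
    if len(values) != len(mask):
        raise ValueError("values and mask must have the same number of elements")
    return [v for v, keep in zip(values, mask) if keep]


def is_valid_selection(values, mask, selected):
    """True if `selected` is exactly what a correct masked_select would return."""
    return selected == masked_select(values, mask)


# Toy data in the spirit of the report: inputs lie in [0, 17), so outputs must too.
values = [0.5, 3.0, 16.9, 2.0]
mask = [True, False, True, False]
print(masked_select(values, mask))                     # [0.5, 16.9]
print(is_valid_selection(values, mask, [28.0, 60.0]))  # False
```

If a GPU result fails this check while the CPU result passes for the same inputs, the problem lies in the operator rather than in the dataset or config.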


YJH1108 commented May 10, 2024

With the CPU build, masked_select returns correct results.
My environment is:
paddlepaddle-gpu 2.3.2
CUDA 11.2
cuDNN 8.2

Code:

```python
import paddle

print(paddle.__version__)
x = paddle.randn((10,))
mask = x >= 0
# y should contain only the non-negative entries of x, in their original order
y = paddle.masked_select(x, mask)
print(x)
print(mask)
print(y)
```
[Screenshots: output of the snippet]

@lyuwenyu (Collaborator)

Which GPU are you using?


YJH1108 commented May 10, 2024

> Which GPU are you using?

3050 Ti, driver version 546.80.

To install the CPU build (paddlepaddle) I used:
python -m pip install paddlepaddle==2.3.2 -i https://mirror.baidu.com/pypi/simple

To install paddlepaddle-gpu 2.3 I used:
python -m pip install paddlepaddle-gpu==2.3.2.post112 -f https://www.paddlepaddle.org.cn/whl/windows/mkl/avx/stable.html

Later I found that the problem does not occur with Paddle 2.6.
To install paddlepaddle-gpu 2.6:
python -m pip install paddlepaddle-gpu==2.6.1.post112 -f https://www.paddlepaddle.org.cn/whl/windows/mkl/avx/stable.html

However, I'm currently taking part in a competition that only allows versions up to 2.3.

@lyuwenyu (Collaborator)

This is probably a bug in older Paddle releases that was fixed in later versions. Try changing the DFL range to [0-17].
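One way to read this suggestion is to make sure every DFL target stays inside the bin range before it reaches cross_entropy. The stdlib sketch below clamps a target distance into [0, NUM_BINS - 1 - EPS] so that both bracketing bins, floor(d) and floor(d) + 1, are valid class indices. It assumes 17 bins indexed 0..16 and a small margin of 0.01; `clamp_distance` and `bin_pair` are hypothetical names, not PaddleDetection functions.

```python
import math

NUM_BINS = 17  # assumption: 17 bins indexed 0..16, matching reg_range (0, 17)
EPS = 0.01     # keeps the right-hand bin at most NUM_BINS - 1


def clamp_distance(distance, num_bins=NUM_BINS, eps=EPS):
    """Clip a target distance into [0, num_bins - 1 - eps]."""
    return min(max(distance, 0.0), num_bins - 1 - eps)


def bin_pair(distance):
    """The two bins DFL would use for this distance: floor(d) and floor(d) + 1."""
    left = math.floor(distance)
    return left, left + 1


for raw in [-1.0, 3.25, 16.0, 28.0, 92.0]:
    left, right = bin_pair(clamp_distance(raw))
    # After clamping, both indices are valid cross_entropy labels.
    assert 0 <= left and right <= NUM_BINS - 1
    print(raw, "->", (left, right))
```

Clamping only prevents the exception. If masked_select has returned wrong values, the clamped targets are still wrong, so the version difference reported in this thread (2.3.2 fails, 2.6.1 works) remains the underlying issue.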
