DNeRF -> loss is nan #132

Closed
yeonjisong opened this issue May 3, 2024 · 3 comments

@yeonjisong

yeonjisong commented May 3, 2024

I get a "loss is nan" error when I run the DNeRF lego scene.
The run restarts several times and then exits with an error message.

python train.py -s /data/dnerf/lego/ --port 6017 --expname "dnerf/lego" --configs arguments/dnerf/lego.py

Optimizing Output folder: ./output/dnerf/lego [03/05 14:16:13]
Tensorboard not available: not logging progress [03/05 14:16:13]
feature_dim: 64 [03/05 14:16:13]
Found transforms_train.json file, assuming Blender data set! [03/05 14:16:13]
Reading Training Transforms [03/05 14:16:13]
Reading Test Transforms [03/05 14:16:17]
Generating Video Transforms [03/05 14:16:18]
hello!!!! [03/05 14:16:18]
Generating random point cloud (2000)... [03/05 14:16:18]
Loading Training Cameras [03/05 14:16:18]
Loading Test Cameras [03/05 14:16:18]
Loading Video Cameras [03/05 14:16:18]
Deformation Net Set aabb [1.29982098 1.29990645 1.29988719] [-1.29980838 -1.29981163 -1.29872349] [03/05 14:16:18]
Voxel Plane: set aabb= Parameter containing: tensor([[ 1.2998, 1.2999, 1.2999], [-1.2998, -1.2998, -1.2987]]) [03/05 14:16:18]
Number of points at initialisation : 2000 [03/05 14:16:19]
Training progress: 0%| | 0/3000 [00:00<?, ?it/s]data loading done [03/05 14:16:22]
loss is nan,end training, reexecv program now. [03/05 14:16:22]

Optimizing Output folder: ./output/dnerf/lego [03/05 14:16:27]
Tensorboard not available: not logging progress [03/05 14:16:27]
feature_dim: 64 [03/05 14:16:27]
Found transforms_train.json file, assuming Blender data set! [03/05 14:16:27]
Reading Training Transforms [03/05 14:16:27]
Reading Test Transforms [03/05 14:16:30]
Generating Video Transforms [03/05 14:16:32]
hello!!!! [03/05 14:16:32]
Generating random point cloud (2000)... [03/05 14:16:32]
Loading Training Cameras [03/05 14:16:32]
Loading Test Cameras [03/05 14:16:32]
Loading Video Cameras [03/05 14:16:32]
Deformation Net Set aabb [1.29982098 1.29990645 1.29988719] [-1.29980838 -1.29981163 -1.29872349] [03/05 14:16:32]
Voxel Plane: set aabb= Parameter containing: tensor([[ 1.2998, 1.2999, 1.2999], [-1.2998, -1.2998, -1.2987]]) [03/05 14:16:32]
Number of points at initialisation : 2000 [03/05 14:16:32]
Training progress: 0%| | 0/3000 [00:00<?, ?it/s]data loading done [03/05 14:16:35]
loss is nan,end training, reexecv program now. [03/05 14:16:36]

@guanjunwu
Collaborator

guanjunwu commented May 5, 2024

Interesting. I'm not sure why this happens, since most cases run successfully. Here are some things you can try:

  1. Reinstall the environment we recommend.
  2. Try vanilla 3DGS to rule out a GPU error.

@RNGrunshen

For the same dataset, if I disable the image resizing, I get a NaN loss. Maybe that is the cause?
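
If it helps anyone narrow this down, one generic way to see where the NaN first appears is to check each intermediate value in the order it is computed. This is only a minimal sketch using the Python standard library, not code from this repository: `first_non_finite` and the checkpoint names (`ground_truth_image`, `rendered_image`, `loss`) are hypothetical, and the values are made-up placeholders. In an actual PyTorch training loop the same check would typically be done on tensors with `torch.isfinite(tensor).all()`.

```python
import math
from typing import Iterable, Optional, Tuple


def first_non_finite(
    named_values: Iterable[Tuple[str, Iterable[float]]]
) -> Optional[str]:
    """Return the name of the first entry containing NaN or inf, else None."""
    for name, values in named_values:
        if any(not math.isfinite(v) for v in values):
            return name
    return None


# Hypothetical checkpoints for one iteration, listed in the order
# they would be computed. The numbers are placeholders.
checkpoints = [
    ("ground_truth_image", [0.2, 0.5, 0.9]),
    ("rendered_image", [0.1, float("nan"), 0.8]),
    ("loss", [float("nan")]),
]

culprit = first_non_finite(checkpoints)
print(culprit)  # prints "rendered_image"
```

If the first non-finite value shows up before the loss (for example in the rendered image), that would point at the inputs or the rendering step rather than the loss function itself.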

@guanjunwu
Collaborator

guanjunwu commented May 19, 2024

Oh, yes! I haven't fixed the resize bug in my codebase!
