ScanNet data generation #18

Open
sangrockEG opened this issue Dec 12, 2022 · 1 comment

@sangrockEG

First of all, thank you for publishing your implementation.

I want to generate the ScanNet dataset using the learned weights.
For this, I downloaded the files, including last.ckpt, from Hugging Face.

Then, using the demo code, I tried to render the images of the first scene (scene0000_00).
For rendering without additional training or evaluation, I slightly modified the final block of scannet.gin as follows:

run.run_render = True
run.run_train = False
run.run_eval = False

After that, I ran the demo code with:

python -m run --ginc configs/scannet.gin --scene_name scene0000_00

However, when I run the demo code, it seems to take too much memory and fails with the following message:

Unable to allocate array with shape (1210619520, 3) and data type float64

This issue was also mentioned in #11.
The rendering loop (predict_step in /model/plenoxel_torch/model.py) seems to render the image tensors sequentially and keep all of them in RAM.
It might be better to fix this part for better accessibility of the dataset; a sketch of a streaming alternative is below.
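A minimal sketch of that streaming alternative: write each frame to disk inside the loop instead of stacking everything in memory. Here render_one_image and render_poses are hypothetical placeholders, not the actual names used by predict_step:

import os
import numpy as np
import imageio.v2 as imageio

out_dir = "rendered/scene0000_00"
os.makedirs(out_dir, exist_ok=True)

for i, pose in enumerate(render_poses):
    # render_one_image is a hypothetical stand-in for the per-pose render call
    rgb = render_one_image(pose)  # (H, W, 3) floats in [0, 1]
    rgb8 = (np.clip(rgb, 0.0, 1.0) * 255).astype(np.uint8)
    imageio.imwrite(os.path.join(out_dir, f"{i:06d}.png"), rgb8)
    # each frame is written immediately, so peak memory stays at one image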

Anyway, in my case, I just picked one pose (frame_id=0) and rendered a single image.
The code runs without error, but it returns an unexpected result.
Fortunately, I can at least see a room-like shape (probably the room of scene0000_00, right?).

[Image: rendered result of scene0000_00, frame 0]
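For context, picking the pose amounted to slicing the pose array. A sketch, assuming render_poses is an (N, 4, 4) array of camera-to-world matrices (the names here are placeholders):

frame_id = 0
single_pose = render_poses[frame_id : frame_id + 1]  # keep the batch dim: shape (1, 4, 4)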

It seems that there is a pose-related problem.
The following (intermediate) pose tensors might be helpful for figuring out what is wrong.

original pose (before the point-cloud-related processing)

[[[-9.554210e-01  1.196160e-01 -2.699320e-01  2.655830e+00]
  [ 2.952480e-01  3.883390e-01 -8.729390e-01  2.981598e+00]
  [ 4.080000e-04 -9.137200e-01 -4.063430e-01  1.368648e+00]
  [ 0.000000e+00  0.000000e+00  0.000000e+00  1.000000e+00]]]

render_pose (the one finally returned)

[[[-9.80858835e-01  2.35084399e-18 -1.94721569e-01  2.96767746e-01]
  [-1.16803752e-07  9.99999718e-01 -7.10082718e-07  3.07291136e-02]
  [ 1.94722179e-01 -1.46270149e-17 -9.80858767e-01  1.29165942e+00]
  [ 0.00000000e+00  0.00000000e+00  0.00000000e+00  1.00000000e+00]]]

I'm not very familiar with NeRF-related things, so the aforementioned trials might be wrong somewhere.
Any help would be greatly appreciated.

@Minhluu2911

Have you tried using trans_info.npz to convert the poses? After loading the poses from ScanNet, convert them using the code below:

import numpy as np

# Load the normalization info saved during training.
trans_info = np.load("path/to/trans_info.npz")
T = trans_info['T']                      # 4x4 alignment transform
pcd_mean = trans_info['pcd_mean']        # point-cloud centroid
scene_scale = trans_info['scene_scale']  # scene normalization scale

# Apply the same normalization to the (N, 4, 4) ScanNet camera-to-world poses.
poses = T @ poses
poses[:, :3, 3] -= pcd_mean              # re-center the translations
poses[:, :3, 3] *= scene_scale           # rescale the translations
poses = poses.astype(np.float32)
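If it helps with debugging the pose mismatch, the inverse mapping (back from the normalized space to the original ScanNet frame) should just be these steps in reverse. A sketch derived only from the snippet above, not from the repo:

poses_nerf = poses.copy()
poses_nerf[:, :3, 3] /= scene_scale      # undo the rescaling
poses_nerf[:, :3, 3] += pcd_mean         # undo the re-centering
poses_scannet = np.linalg.inv(T) @ poses_nerf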
