Training issues #95
Hi @lzbushicai,
Thank you for your suggestion. After resizing the image, I did not adjust the camera parameters.
Hello, your voxelization code appears to be correct. However, have you verified that your x, y, z axes align with those of Semantic KITTI? Without using camera parameters, it's unclear how you would be able to aggregate the point cloud to construct a dense scene. You will still require the calibration parameters to project the voxels onto the image.
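For reference, a minimal sketch of that voxel-to-image projection (the names `project_voxels_to_image`, `T_velo_to_cam`, and `K` are illustrative assumptions, not MonoScene's exact API):

```python
import numpy as np

def project_voxels_to_image(voxel_centers, T_velo_to_cam, K):
    """Project (N, 3) voxel centers in the velodyne frame to pixel coordinates.

    T_velo_to_cam : (4, 4) extrinsic matrix from the KITTI calibration (Tr)
    K             : (3, 3) camera intrinsic matrix
    """
    N = voxel_centers.shape[0]
    pts_h = np.hstack([voxel_centers, np.ones((N, 1))])  # homogeneous coords
    pts_cam = (T_velo_to_cam @ pts_h.T).T[:, :3]         # velodyne -> camera frame
    in_front = pts_cam[:, 2] > 0                         # keep points in front of the camera
    pix = (K @ pts_cam.T).T                              # perspective projection
    pix = pix[:, :2] / pix[:, 2:3]                       # normalize by depth
    return pix, in_front
```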
Thanks, I am projecting the labeled point cloud onto the image to troubleshoot the problem and check whether the issue is with the KITTI script or with the extrinsic parameters.
Hi, I'm using my own script to voxelize the point cloud, but the points I get are particularly sparse (relative to the KITTI tool). Can I aggregate the preceding and following frames to solve this problem?
Here is the label distribution after processing by the KITTI tool.
It's obvious that the file processed by my own script is very sparse.
I think you need to aggregate the point clouds from many consecutive frames to get dense scenes, not only the frames immediately before and after.
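A rough sketch of that multi-frame aggregation, assuming KITTI-odometry-style camera poses from `poses.txt` and the `Tr` velodyne-to-camera calibration (the function name is illustrative):

```python
import numpy as np

def aggregate_scans(scans, poses, Tr, ref_idx=0):
    """Accumulate several LiDAR scans into the velodyne frame of scan ref_idx.

    scans : list of (N_i, 3) point arrays, each in its own velodyne frame
    poses : list of (4, 4) camera-frame poses from KITTI poses.txt
    Tr    : (4, 4) velodyne-to-camera calibration matrix
    """
    Tr_inv = np.linalg.inv(Tr)
    ref_pose_inv = np.linalg.inv(poses[ref_idx])
    merged = []
    for pts, pose in zip(scans, poses):
        pts_h = np.hstack([pts, np.ones((pts.shape[0], 1))])
        # velodyne_i -> cam_i -> world -> cam_ref -> velodyne_ref
        T = Tr_inv @ ref_pose_inv @ pose @ Tr
        merged.append((T @ pts_h.T).T[:, :3])
    return np.vstack(merged)
```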
Hi, it takes 30 epochs to converge (https://github.com/astra-vision/MonoScene/blob/master/monoscene/scripts/train_monoscene.py#L50). You should try overfitting a single example to see if your network can converge. Usually, this problem arises from an incorrect camera projection of voxels onto the image. You should also visualize the projected voxels on the image to check if they are correct.
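A minimal sketch of that single-example overfitting test (`model`, `criterion`, and `train_loader` are placeholders for your own objects, not the exact training script):

```python
import torch

# Grab one batch and repeatedly fit it; a healthy pipeline should reach
# near-zero loss and near-perfect accuracy on that single sample.
batch = next(iter(train_loader))
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)

for step in range(500):
    optimizer.zero_grad()
    pred = model(batch)                     # placeholder forward pass
    loss = criterion(pred, batch["target"])
    loss.backward()
    optimizer.step()
    if step % 50 == 0:
        print(step, loss.item())            # should steadily drop toward 0
```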
Thank you!!! I tried overlaying 10 frames of point clouds and used SurroundOcc's pipeline to generate dense labels, but the training results were still very poor.
Hi, Cao Anh Quan!
Hi, you should sample 1 point per voxel in 3D space, then project them onto the image using the camera parameters to visualize them. This is the only way to know if your projection is correct. Otherwise, you can visualize pix_x and pix_y here to see if they are correct. Then, you should try to overfit a single example to see if everything is working. You should expect very high overfit performance.
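For instance, a quick matplotlib sketch of that check, assuming you already have `pix_x`, `pix_y`, and a boolean `fov_mask` for the in-view voxels:

```python
import matplotlib.pyplot as plt

# Overlay the projected voxel samples on the RGB image; if the projection is
# correct, the points should line up with the visible scene geometry.
plt.imshow(img)
plt.scatter(pix_x[fov_mask], pix_y[fov_mask], s=0.5, c="red")
plt.title("Projected voxel samples (1 point per voxel)")
plt.show()
```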
Thank you. I plotted the fov_mask, pix, and target on a blank image, and the result showed that I did not have the correct projection.
You are welcome! It also took me quite some time to make the projection work correctly.
Hi, Cao Anh Quan! I have solved this problem. Without your foundation, it's hard to imagine how long it would have taken me to solve it. Thank you!!!!
Hi, Cao Anh Quan! I have processed my data into KITTI format, and I am now sure that my voxels are correctly projected onto the image. Why is my training loss unable to converge, and why is my mIoU almost zero? I cannot find the reason why the network is not working. Even though mine is a field dataset, the results might be slightly worse, but the network shouldn't fail entirely, right?
Have you tried to optimize on only 1 frame to see if you can overfit it?
Yes, I trained on 10 frames of data using MonoScene, but the loss did not converge and the mIoU was very low.
I suggest you visualize the output.
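One way to do that, as a rough sketch (assuming `pred` is the (X, Y, Z) grid of predicted class indices, with 0 meaning empty):

```python
import numpy as np

# Dump occupied voxels to a text point cloud that any viewer (e.g. CloudCompare)
# can open; a collapsed or empty prediction is immediately visible.
occupied = np.argwhere(pred > 0)            # indices of non-empty voxels
labels = pred[pred > 0]
np.savetxt("pred_voxels.txt",
           np.hstack([occupied, labels[:, None]]),
           fmt="%d", header="x y z label")
```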
I set the other parameters in monoscene.yaml, but the loss still stays constant.
Hi, Cao Anh Quan! I have solved the above problem. Thank you for your long-term guidance. The network works well on my dataset!!!
Hello, Cao Anh Quan!! I organized my data into KITTI format and trained on it, but I found that the relation loss often comes out as NaN.
```
sequences: ['01'], frame_id: ['002205'], relation_loss: nan
sequences: ['01'], frame_id: ['002205'], loss_sem_scal: 5.660839080810547
sequences: ['01'], frame_id: ['002205'], loss_geo_scal: 2.1648528575897217
Warning: frustum_nonempty is zero; the division operation will be skipped or assigned a default value.
sequences: ['01'], frame_id: ['002205'], total_loss: nan
```
I suspect this may be related to my data (I did not change the loss computation code). In my labels, there is a particularly large number of 255 values, as shown in the figure below. May I ask whether 255 has any special meaning or handling in your design?
My dataset's label distribution:
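For context, in SemanticKITTI-style labels 255 conventionally marks unknown/ignored voxels, and losses typically exclude it. A minimal sketch of that masking, assuming a standard cross-entropy setup (the variable names are illustrative):

```python
import torch.nn as nn

# 255 is treated as "unknown / outside evaluation" and excluded from the loss;
# if almost every voxel is 255, very little supervision reaches the network.
criterion = nn.CrossEntropyLoss(ignore_index=255)

logits = model_output               # (B, n_classes, X, Y, Z), placeholder
target = batch_target.long()        # (B, X, Y, Z), with 255 for unknown voxels
loss = criterion(logits, target)
```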
Another issue is that, due to the large size of my images, I do not have enough memory to train the network, so I resized the image to 450×720. Here is my code:
```python
import numpy as np
from PIL import Image

img = np.array(img, dtype=np.float32, copy=False) / 255.0
# Resize to 720x450; PIL's resize takes (width, height)
img_resized = Image.fromarray((img * 255).astype(np.uint8)).resize((720, 450))
img = np.array(img_resized, dtype=np.float32) / 255.0  # use the resized image
```
I'm unsure whether this will substantially influence the outcomes of my training, and I would greatly appreciate your insights and guidance on this matter.
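If the images are resized, the intrinsics must be rescaled to match; a minimal sketch, assuming a standard 3×3 `K` matrix (the helper name is illustrative):

```python
import numpy as np

def scale_intrinsics(K, orig_wh, new_wh):
    """Rescale a camera intrinsic matrix after resizing the image.

    K       : (3, 3) intrinsic matrix of the original image
    orig_wh : (width, height) of the original image
    new_wh  : (width, height) after resizing, e.g. (720, 450)
    """
    sx = new_wh[0] / orig_wh[0]
    sy = new_wh[1] / orig_wh[1]
    K_scaled = K.copy()
    K_scaled[0, :] *= sx   # fx and cx scale with width
    K_scaled[1, :] *= sy   # fy and cy scale with height
    return K_scaled
```

Note that cropping instead of resizing needs a different adjustment: fx and fy stay unchanged, while cx and cy shift by the crop offset.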