GCC loss implementation #22

Open · meijie0401 opened this issue Mar 8, 2021 · 0 comments

meijie0401 commented Mar 8, 2021

I want to implement GCC loss in my project and I'm trying to figure out the details of your implementation.

Question-0: the branch that predicts UV for each foreground pixel (used for the GCC loss) is only active during training, so it serves only to indirectly help train the 'camera pose' and 'deformation' branches and is not used at inference time. Is my understanding correct?

Here is your relevant code, along with my understanding of and questions about the GCC loss implementation.

Step-1: set the ground truth 'self.codes_gt['xy_map']' with torch.linspace (easy to understand~):

def get_sample_grid(img_size):
    # Build an (H, W, 2) grid of normalized pixel coordinates in [-1, 1].
    x = torch.linspace(-1, 1, img_size[1]).view(1, -1).repeat(img_size[0], 1)
    y = torch.linspace(-1, 1, img_size[0]).view(-1, 1).repeat(1, img_size[1])
    grid = torch.cat((x.unsqueeze(2), y.unsqueeze(2)), 2)
    grid = grid.unsqueeze(0)  # add a leading batch dimension -> (1, H, W, 2)
    return grid

self.grid = cub_parse.get_sample_grid(self.upsample_img_size).repeat(1, 1, 1, 1).to(self.device)

grid = self.grid.repeat(b_size, 1, 1, 1)  # broadcast to the batch: (b_size, H, W, 2)
self.codes_gt['xy_map'] = grid
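
For intuition, here is a quick check (hypothetical, not from the repo) of what 'xy_map' contains, using a tiny 4x4 image:

grid = cub_parse.get_sample_grid((4, 4))  # (1, 4, 4, 2), values in [-1, 1]
print(grid[0, 0, 0])    # tensor([-1., -1.])  -> top-left pixel
print(grid[0, -1, -1])  # tensor([1., 1.])    -> bottom-right pixel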

Step-2: map the predicted UV of each foreground pixel to its 3D location on the template surface, getting 'points3d':

def project_uv_to_3d(uv2points, uv_map):
    # uv_map: (B, H, W, 2) predicted UV coordinates, one per pixel
    B = uv_map.size(0)
    H = uv_map.size(1)
    W = uv_map.size(2)
    uv_map_flatten = uv_map.view(-1, 2)
    points3d = uv2points.forward(uv_map_flatten)  # (B*H*W, 3)
    points3d = points3d.view(B, H * W, 3)
    return points3d

self.uv2points = cub_parse.UVTo3D(self.mean_shape)
points3d = geom_utils.project_uv_to_3d(self.uv2points, codes_pred['uv_map'])

Question-1: Why do you first map UV to 3D with 'UVTo3D' and then wrap it in 'project_uv_to_3d'? What is the graphics formula/theory behind these two functions?
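
My current guess is that 'UVTo3D' does barycentric interpolation over the template mesh: find the face whose UV triangle contains the query UV, then blend that face's three vertex positions with the barycentric weights ('project_uv_to_3d' then being just the batched wrapper). A rough self-contained sketch of that guess (my own names and shapes, not the repo's code):

import torch

def barycentric(p, tri):
    # p: (N, 2) query UV points; tri: (N, 3, 2) UV triangle corners.
    a, b, c = tri[:, 0], tri[:, 1], tri[:, 2]
    v0, v1, v2 = b - a, c - a, p - a
    d00 = (v0 * v0).sum(-1)
    d01 = (v0 * v1).sum(-1)
    d11 = (v1 * v1).sum(-1)
    d20 = (v2 * v0).sum(-1)
    d21 = (v2 * v1).sum(-1)
    denom = d00 * d11 - d01 * d01
    w1 = (d11 * d20 - d01 * d21) / denom
    w2 = (d00 * d21 - d01 * d20) / denom
    w0 = 1.0 - w1 - w2
    return torch.stack([w0, w1, w2], dim=-1)  # (N, 3) barycentric weights

def uv_to_3d_guess(uv, face_idx, faces_uv, faces_3d):
    # uv: (N, 2); face_idx: (N,) index of the face containing each uv (assumed known);
    # faces_uv: (F, 3, 2) per-face UV corners; faces_3d: (F, 3, 3) per-face 3D corners.
    w = barycentric(uv, faces_uv[face_idx])                   # (N, 3)
    return (w.unsqueeze(-1) * faces_3d[face_idx]).sum(dim=1)  # (N, 3)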

Step-3: orthographically project 'points3d' onto the image plane, getting 'codes_pred['project_points']':

def project_3d_to_image(points3d, cam, offset_z):
    projected_points = orthographic_proj_withz(points3d, cam, offset_z)
    return projected_points

def orthographic_proj_withz(X, cam, offset_z=0.):
    """
    X: B x N x 3
    cam: B x 7: [sc, tx, ty, quaternions]
    Orth preserving the z.
    """
    quat = cam[:, -4:]
    X_rot = quat_rotate(X, quat)
    scale = cam[:, 0].contiguous().view(-1, 1, 1)
    trans = cam[:, 1:3].contiguous().view(cam.size(0), 1, -1)

    proj = scale * X_rot

    proj_xy = proj[:, :, :2] + trans
    proj_z = proj[:, :, 2, None] + offset_z

    return torch.cat((proj_xy, proj_z), 2)

codes_pred['project_points_cam_pred'] = geom_utils.project_3d_to_image(points3d, codes_pred['cam'], self.offset_z)
codes_pred['project_points_cam_pred'] = codes_pred['project_points_cam_pred'][..., 0:2].view(self.codes_gt['xy_map'].size())
codes_pred['project_points'] = codes_pred['project_points_cam_pred']
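
To check my reading of the camera format, here is a hypothetical sanity check (assuming the quaternion is stored as [w, x, y, z] and that 'quat_rotate' leaves points unchanged for the identity quaternion):

X = torch.tensor([[[0.5, -0.5, 2.0]]])                      # B=1, N=1, 3
cam = torch.tensor([[1.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0]])   # [sc, tx, ty, qw, qx, qy, qz]
print(orthographic_proj_withz(X, cam))                      # tensor([[[0.5, -0.5, 2.0]]])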

Step-4: compute the GCC L2 loss between 'codes_pred['project_points']' and 'self.codes_gt['xy_map']':

# Reprojection Loss
project_points = codes_pred['project_points']
if opts.ignore_mask_gcc:
    reproject_loss = reproject_loss_l2(project_points, codes_gt['xy_map'], seg_mask * 0 + 1)
else:
    reproject_loss = reproject_loss_l2(project_points, codes_gt['xy_map'], seg_mask)
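
For reference, my understanding is that 'reproject_loss_l2' is a mask-weighted L2; a minimal sketch of that assumption (my own shapes and names, not the repo's code):

import torch

def masked_l2(pred_xy, gt_xy, mask):
    # pred_xy, gt_xy: (B, H, W, 2); mask: (B, H, W) foreground weights.
    err = ((pred_xy - gt_xy) ** 2).sum(-1)               # per-pixel squared error
    return (err * mask).sum() / mask.sum().clamp(min=1)  # clamp avoids divide-by-zero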

Question-2: Why does 'codes_pred['project_points']', obtained from 'orthographic_proj_withz', range from 0 to 255 (e.g., when the input image size is 256x256)? I think 'points3d' is already in this range, but how does Step-2 make that happen?
