
Do final_A, final_D and final_D2 in lines 361–368 of cuda_rasterizer/backward.cu need to change with the loop? #28

zhuyu2015 opened this issue May 14, 2024 · 8 comments


@zhuyu2015

#else
			dL_dweight += (final_D2 + m_d * m_d * final_A - 2 * m_d * final_D) * dL_dreg;
#endif
			dL_dalpha += dL_dweight - last_dL_dT;
			// propagate the current weight W_{i} to next weight W_{i-1}
			last_dL_dT = dL_dweight * alpha + (1 - alpha) * last_dL_dT;
			float dL_dmd = 2.0f * (T * alpha) * (m_d * final_A - final_D) * dL_dreg;
			dL_dz += dL_dmd * dmd_dd;

In particular, do final_A and final_D in line 367 need to change as the loop iterates?

@zhuyu2015
Author

Line 367 is this one:

float dL_dmd = 2.0f * (T * alpha) * (m_d * final_A - final_D) * dL_dreg;

@hbb1
Owner

hbb1 commented May 14, 2024

Hi, here is the formulation for back-propagation of the depth-distortion loss:

$$L= \sum_{i=0}^{N-1}\sum_{j=0}^{N-1}w_iw_j(d_i-d_j)^2$$

Now let's collect the terms that involve the $k$-th Gaussian, denoted $L_k$, and simplify:

$$ L_k = \sum_{j=0}^{k-1}w_kw_j(d_k-d_j)^2 + \sum_{i=k+1}^{N-1}w_iw_k(d_i-d_k)^2 $$

$$ L_k = \sum_{j=0}^{k-1}w_kw_j(d_k^2-2d_kd_j + d_j^2) + \sum_{i=k+1}^{N-1}w_iw_k(d_i^2-2d_id_k+d_k^2)$$

Factor out $w_k$ and we get:
$$dL_k / dw_k = \sum_{j=0}^{k-1}w_j(d_k^2-2d_kd_j+d_j^2) + \sum_{i=k+1}^{N-1}w_i(d_i^2-2d_id_k+d_k^2)$$

Defining $A=\sum_{i=0}^{N-1}w_i$, $D=\sum_{i=0}^{N-1}w_id_i$ and $D_2=\sum_{i=0}^{N-1}w_id_i^2$ (the final_A, final_D and final_D2 in the code), and completing each partial sum with its missing $i=k$ term, the extra $w_k$ terms cancel:

$$dL_k / dw_k = (D_2-w_kd_k^2) + d_k^2(A-w_k) - 2d_k(D - w_kd_k) = D_2 +d_k^2A-2d_kD$$

$$dL_k / dd_k = 2(\sum_{j=0}^{k-1}w_kw_j(d_k-d_j) + \sum_{i=k+1}^{N-1}w_iw_k(d_k-d_i)) = 2(w_k(d_k(A-w_k) - (D-d_kw_k))) = 2(w_k(d_kA - D))$$
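To connect this to the snippet in the question: $A$, $D$ and $D_2$ are totals over all contributing Gaussians, so they do not need to change as the loop iterates, and the snippet's 2.0f * (T * alpha) * (m_d * final_A - final_D) is exactly $2w_k(d_kA - D)$ with $w_k = T\,\alpha$. Below is a minimal stand-alone C++ check of the two closed forms against finite differences (an illustrative sketch with made-up values, not the actual backward.cu):

```cpp
// Check the closed-form gradients of the depth-distortion loss against finite
// differences. A, D, D2 are the *final* accumulated sums (final_A, final_D,
// final_D2 in the snippet) and stay fixed inside the per-Gaussian loop.
#include <cstdio>
#include <vector>

// Pairwise distortion loss, Eq. (17): L = sum_i sum_{j<i} w_i w_j (d_i - d_j)^2
static double lossL(const std::vector<double>& w, const std::vector<double>& d) {
    double L = 0.0;
    for (size_t i = 0; i < w.size(); ++i)
        for (size_t j = 0; j < i; ++j)
            L += w[i] * w[j] * (d[i] - d[j]) * (d[i] - d[j]);
    return L;
}

int main() {
    std::vector<double> w = {0.30, 0.20, 0.25, 0.10};   // example blending weights
    std::vector<double> d = {0.90, 1.10, 1.40, 2.00};   // example mapped depths
    const size_t N = w.size();

    // Final accumulators, fixed for every k
    double A = 0.0, D = 0.0, D2 = 0.0;
    for (size_t i = 0; i < N; ++i) { A += w[i]; D += w[i] * d[i]; D2 += w[i] * d[i] * d[i]; }

    const double eps = 1e-6;
    for (size_t k = 0; k < N; ++k) {
        // Closed forms derived above
        double dL_dw = D2 + d[k] * d[k] * A - 2.0 * d[k] * D;
        double dL_dd = 2.0 * w[k] * (d[k] * A - D);

        // Central finite differences
        std::vector<double> wp = w, wm = w, dp = d, dm = d;
        wp[k] += eps; wm[k] -= eps; dp[k] += eps; dm[k] -= eps;
        double fd_w = (lossL(wp, d) - lossL(wm, d)) / (2.0 * eps);
        double fd_d = (lossL(w, dp) - lossL(w, dm)) / (2.0 * eps);

        printf("k=%zu  dL/dw: %.6f vs %.6f   dL/dd: %.6f vs %.6f\n",
               k, dL_dw, fd_w, dL_dd, fd_d);
    }
    return 0;
}
```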

@YanhaoZhang

Hi @hbb1 The above formulation is very helpful in understanding the implementation of the backward function. May I further ask about the formulation of the normal loss?

@YanhaoZhang

Another quick question: when calculating the derivative, why not consider the case $\partial L_k / \partial w_m$ (when $m \ne k$)?

@hbb1
Owner

hbb1 commented Jun 7, 2024

Because we will instead compute something like $\partial L / \partial w_{k-1}$, so that the algorithm can run efficiently back-to-front.
Note that $\partial L_k / \partial w_k \cdot \partial w_k / \partial w_{k-1} = \partial L_k / \partial w_{k-1}$.

last_dL_dT = dL_dweight * alpha + (1 - alpha) * last_dL_dT;
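One way to read that recursion (an interpretation, not a quote from the code): write $T_k = \prod_{j<k}(1-\alpha_j)$ and $w_k = \alpha_k T_k$, and let $G_k$ denote the value of last_dL_dT after Gaussian $k$ has been processed in the back-to-front loop. Then

$$G_k = \alpha_k \frac{\partial L}{\partial w_k} + (1-\alpha_k)\,G_{k+1} = \frac{1}{T_k}\sum_{i\ge k}\frac{\partial L}{\partial w_i}\,w_i,$$

so when Gaussian $k-1$ is processed, the subtracted last_dL_dT already carries the gradient flowing back from every later weight $w_i$ with $i \ge k$, and (assuming the per-Gaussian result is scaled by $T_{k-1}$ elsewhere in the kernel) $\partial L/\partial \alpha_{k-1} = T_{k-1}\,(\partial L/\partial w_{k-1} - G_k)$.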

@YanhaoZhang

@hbb1 Thanks a lot for your prompt reply. I have been trying to understand how the backward pass is implemented these days, but I am still a bit confused. I would really appreciate it if you could explain a bit more.

Based on (17)
$$\mathcal{L} = \sum^{N-1}_{i=0} \sum^{i-1}_{j=0} w_i w_j (d_i - d_j)^2$$.

$$w_k = \alpha_k \prod_{i=0}^{k-1}(1-\alpha_i)$$
The derivative of $w_k$ is
$$\frac{\partial \mathcal{L}}{\partial w_k} = \frac{\partial \mathcal{L}_k}{\partial w_k} = D^2 + d_k^2 A - 2d_kD $$
As the answer above, $\mathcal{L}_k$ is the k-th term of
$$\sum^{N-1}_i \sum^{N-1}_j w_i w_j (d_i - d_j)^2 $$,
where the second sum runs to $N-1$ rather than to $i-1$ as in (17).

The following code calculates $\frac{\partial \mathcal{L}}{\partial \alpha_k} $

dL_dalpha += dL_dweight - last_dL_dT;
// propagate the current weight W_{i} to next weight W_{i-1}
last_dL_dT = dL_dweight * alpha + (1 - alpha) * last_dL_dT;

My understanding is that
$$\frac{\partial \mathcal{L}}{\partial \alpha_k} = \sum_{i=0}^{N-1} \frac{\partial \mathcal{L}}{\partial w_i} \frac{\partial w_i}{\partial \alpha_k}$$
I know that $\frac{\partial w_i}{\partial \alpha_k}=0$ when $i<k$. Considering a back-to-front process, we iteratively calculate the following derivatives inside the loop for (int j = 0; !done && j < min(BLOCK_SIZE, toDo); j++)
$$\frac{\partial \mathcal{L}}{\partial \alpha_{N-1}}=\frac{\partial \mathcal{L}}{\partial w_{N-1}} \frac{\partial w_{N-1}}{\partial \alpha_{N-1}}=\frac{\partial \mathcal{L}}{\partial w_{N-1}} \prod_{i=0}^{N-2}(1-\alpha_i)$$
$$\frac{\partial \mathcal{L}}{\partial \alpha_{N-2}}=\frac{\partial \mathcal{L}}{\partial w_{N-1}} \frac{\partial w_{N-1}}{\partial w_{N-2}}\frac{\partial w_{N-2}}{\partial \alpha_{N-2}}+\frac{\partial \mathcal{L}}{\partial w_{N-2}} \frac{\partial w_{N-2}}{\partial \alpha_{N-2}}=\left( \frac{\alpha_{N-1}}{\alpha_{N-2}}(1-\alpha_{N-2})\frac{\partial \mathcal{L}}{\partial w_{N-1}} +\frac{\partial \mathcal{L}}{\partial w_{N-2}}\right)\prod_{i=0}^{N-3}(1-\alpha_i)$$

I was wondering if I made any mistakes. If not, how do we get the implementation above from these formulations? Thanks a lot.

@hbb1
Owner

hbb1 commented Jun 10, 2024

Hi, I realize I did not write it clearly.
$L$ can be seen as the summation of an $N \times N$ matrix. When we derive the gradient of $w_k$, only some entries are involved; I denote their sum as $L_k$, so that $\partial L / \partial w_k = \partial L_k / \partial w_k$.

Now let's think about the gradient of $a_k$: it receives contributions both from $L$ through $w_k$ and through the later $w_j$ with $j>k$ (occlusion). The direct $\partial L_k / \partial a_k$ part is computed as above, and the gradient coming through the later $w_j$ is accumulated as

dL_dalpha += dL_dweight - last_dL_dT;
// pass to the next
last_dL_dT = dL_dweight * alpha + (1 - alpha) * last_dL_dT;
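To make the two paths concrete, here is a small stand-alone C++ sketch (made-up values, not the actual backward.cu) that runs the recursion from the snippet back-to-front and compares the resulting $\partial L/\partial \alpha_k$ against finite differences of the pairwise loss; it assumes the per-Gaussian value (dL_dweight - last_dL_dT) is additionally scaled by $T_k$ outside the snippet:

```cpp
// Back-to-front recursion for dL/dalpha of the depth-distortion loss,
// verified against central finite differences. Illustrative sketch only.
#include <cstdio>
#include <vector>

// w_k = alpha_k * prod_{j<k} (1 - alpha_j)
static std::vector<double> weights(const std::vector<double>& a) {
    std::vector<double> w(a.size());
    double T = 1.0;
    for (size_t k = 0; k < a.size(); ++k) { w[k] = a[k] * T; T *= 1.0 - a[k]; }
    return w;
}

// Pairwise distortion loss, Eq. (17), as a function of the alphas
static double lossL(const std::vector<double>& a, const std::vector<double>& d) {
    std::vector<double> w = weights(a);
    double L = 0.0;
    for (size_t i = 0; i < w.size(); ++i)
        for (size_t j = 0; j < i; ++j)
            L += w[i] * w[j] * (d[i] - d[j]) * (d[i] - d[j]);
    return L;
}

int main() {
    std::vector<double> a = {0.40, 0.30, 0.50, 0.20};   // example opacities
    std::vector<double> d = {0.90, 1.10, 1.40, 2.00};   // example mapped depths
    const size_t N = a.size();

    std::vector<double> w = weights(a);
    double A = 0.0, D = 0.0, D2 = 0.0;                  // final accumulators
    for (size_t i = 0; i < N; ++i) { A += w[i]; D += w[i] * d[i]; D2 += w[i] * d[i] * d[i]; }

    // T_k = prod_{j<k} (1 - alpha_j), the scaling applied outside the snippet
    std::vector<double> T(N, 1.0);
    for (size_t k = 1; k < N; ++k) T[k] = T[k - 1] * (1.0 - a[k - 1]);

    double last_dL_dT = 0.0;                            // gradient collected from later Gaussians
    const double eps = 1e-6;
    for (size_t k = N; k-- > 0; ) {                     // back-to-front
        double dL_dweight = D2 + d[k] * d[k] * A - 2.0 * d[k] * D;
        double dL_dalpha  = (dL_dweight - last_dL_dT) * T[k];
        last_dL_dT = dL_dweight * a[k] + (1.0 - a[k]) * last_dL_dT;

        // Central finite difference with respect to alpha_k
        std::vector<double> ap = a, am = a;
        ap[k] += eps; am[k] -= eps;
        double fd = (lossL(ap, d) - lossL(am, d)) / (2.0 * eps);
        printf("k=%zu  dL/dalpha: %.6f vs finite diff %.6f\n", k, dL_dalpha, fd);
    }
    return 0;
}
```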

@YanhaoZhang

YanhaoZhang commented Jun 11, 2024

@hbb1 Really appreciate your prompt reply. I can understand most of the parts until the last sentence on $dL/d a_k$. Because of occlusion $\frac{\partial w_i}{\partial a_k}=0$ where $i < k$, therefore
$$\frac{\partial L}{\partial a_k} = \sum_{i=k}^{N-1} \frac{\partial L}{\partial w_i} \frac{\partial w_i}{\partial a_k}$$
Also
$$w_k = a_k \prod_{i=0}^{k-1}(1-a_i)$$
which means $w_k = \frac{a_k(1-a_{k-1})}{a_{k-1}}w_{k-1}$. Therefore we have $\frac{\partial w_k}{\partial w_{k-1}}=\frac{a_k(1-a_{k-1})}{a_{k-1}}$ and $\frac{\partial w_k}{\partial a_k}=\prod_{i=0}^{k-1}(1-a_i)=T_{k-1}$.
Considering a back-to-front process and starting from the final $n$-th Gaussian:
$$\frac{\partial L}{\partial a_n}=\frac{\partial L}{\partial w_n} \frac{\partial w_n}{\partial a_n}=\frac{\partial L}{\partial w_n}T_{n-1}$$
$$\frac{\partial L}{\partial a_{n-1}}= \frac{\partial L}{\partial w_{n-1}} \frac{\partial w_{n-1}}{\partial a_{n-1}} + \frac{\partial L}{\partial w_n} \frac{\partial w_n}{\partial w_{n-1}} \frac{\partial w_{n-1}}{\partial a_{n-1}}=(\frac{\partial L}{\partial w_{n-1}} - \frac{a_n(a_{n-1}-1)}{a_{n-1}}\frac{\partial L}{\partial w_{n}} )T_{n-1}$$.

Based on the code

dL_dalpha += dL_dweight - last_dL_dT;
// pass to the next
last_dL_dT = dL_dweight * alpha + (1 - alpha) * last_dL_dT;

Denoting last_dL_dT as $\frac{d L}{d T_{k+1}}$, and ignoring $T_k$, within the loop:
$$\frac{d L}{d T_{n+1}}=0$$
$$\frac{\partial L}{\partial a_n}=\frac{\partial L}{\partial w_n} - \frac{d L}{d T_{n+1}} =\frac{\partial L}{\partial w_n} $$
$$\frac{d L}{d T_{n}}=\frac{\partial L}{\partial w_n}a_n+(1-a_n)\frac{d L}{d T_{n+1}}=a_n\frac{\partial L}{\partial w_n}$$
$$\frac{\partial L}{\partial a_{n-1}}= \frac{\partial L}{\partial w_{n-1}} - \frac{d L}{d T_{n}} = \frac{\partial L}{\partial w_{n-1}} - a_n\frac{\partial L}{\partial w_n}$$.
I find $\frac{a_n}{a_{n-1}}\frac{\partial L}{\partial w_{n}}$ is missing.

May I know if I made any mistake? Thanks a lot.
