
Do final_A, final_D and final_D2 in lines 361–368 of cuda_rasterizer/backward.cu need to change with the loop? #28

zhuyu2015 opened this issue May 14, 2024 · 8 comments


@zhuyu2015

#else
			dL_dweight += (final_D2 + m_d * m_d * final_A - 2 * m_d * final_D) * dL_dreg;
#endif
			dL_dalpha += dL_dweight - last_dL_dT;
			// propagate the current weight W_{i} to next weight W_{i-1}
			last_dL_dT = dL_dweight * alpha + (1 - alpha) * last_dL_dT;
			float dL_dmd = 2.0f * (T * alpha) * (m_d * final_A - final_D) * dL_dreg;
			dL_dz += dL_dmd * dmd_dd;

In particular, do final_A and final_D in line 367 need to change as the loop iterates?

@zhuyu2015
Author

Line 367 is this one:

float dL_dmd = 2.0f * (T * alpha) * (m_d * final_A - final_D) * dL_dreg;

@hbb1
Owner

hbb1 commented May 14, 2024

Hi, here is the formulation for back-propagation of the depth-distortion loss:

$$L= \sum_{i=0}^{N-1}\sum_{j=0}^{N-1}w_iw_j(d_i-d_j)^2$$

Now let's collect the terms that involve the $k$-th Gaussian, denoted $L_k$, and simplify:

$$ L_k = \sum_{j=0}^{k-1}w_kw_j(d_k-d_j)^2 + \sum_{i=k+1}^{N-1}w_iw_k(d_i-d_k)^2 $$

$$ L_k = \sum_{j=0}^{k-1}w_kw_j(d_k^2-2d_kd_j + d_j^2) + \sum_{i=k+1}^{N-1}w_iw_k(d_i^2-2d_id_k+d_k^2)$$

Factor out $w_k$ and we get:
$$dL_k / dw_k = \sum_{j=0}^{k-1}w_j(d_k^2-2d_kd_j+d_j^2) + \sum_{i=k+1}^{N-1}w_i(d_i^2-2d_id_k+d_k^2)$$

Defining $A=\sum_{i=0}^{N-1}w_i$, $D=\sum_{i=0}^{N-1}w_id_i$ and $D_2=\sum_{i=0}^{N-1}w_id_i^2$ (the final_A, final_D and final_D2 in the code), and completing each partial sum with its missing $i=k$ term, the extra $w_k$ terms cancel:

$$dL_k / dw_k = (D_2-w_kd_k^2) + d_k^2(A-w_k) - 2d_k(D - w_kd_k) = D_2 +d_k^2A-2d_kD$$

$$dL_k / dd_k = 2(\sum_{j=0}^{k-1}w_kw_j(d_k-d_j) + \sum_{i=k+1}^{N-1}w_iw_k(d_k-d_i)) = 2(w_k(d_k(A-w_k) - (D-d_kw_k))) = 2(w_k(d_kA - D))$$
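To connect this to the snippet in the question: $A$, $D$ and $D_2$ are totals over all contributing Gaussians, so they do not need to change as the loop iterates, and the snippet's 2.0f * (T * alpha) * (m_d * final_A - final_D) is exactly $2w_k(d_kA - D)$ with $w_k = T\,\alpha$. Below is a minimal stand-alone C++ check of the two closed forms against finite differences (an illustrative sketch with made-up values, not the actual backward.cu):

```cpp
// Check the closed-form gradients of the depth-distortion loss against finite
// differences. A, D, D2 are the *final* accumulated sums (final_A, final_D,
// final_D2 in the snippet) and stay fixed inside the per-Gaussian loop.
#include <cstdio>
#include <vector>

// Pairwise distortion loss, Eq. (17): L = sum_i sum_{j<i} w_i w_j (d_i - d_j)^2
static double lossL(const std::vector<double>& w, const std::vector<double>& d) {
    double L = 0.0;
    for (size_t i = 0; i < w.size(); ++i)
        for (size_t j = 0; j < i; ++j)
            L += w[i] * w[j] * (d[i] - d[j]) * (d[i] - d[j]);
    return L;
}

int main() {
    std::vector<double> w = {0.30, 0.20, 0.25, 0.10};   // example blending weights
    std::vector<double> d = {0.90, 1.10, 1.40, 2.00};   // example mapped depths
    const size_t N = w.size();

    // Final accumulators, fixed for every k
    double A = 0.0, D = 0.0, D2 = 0.0;
    for (size_t i = 0; i < N; ++i) { A += w[i]; D += w[i] * d[i]; D2 += w[i] * d[i] * d[i]; }

    const double eps = 1e-6;
    for (size_t k = 0; k < N; ++k) {
        // Closed forms derived above
        double dL_dw = D2 + d[k] * d[k] * A - 2.0 * d[k] * D;
        double dL_dd = 2.0 * w[k] * (d[k] * A - D);

        // Central finite differences
        std::vector<double> wp = w, wm = w, dp = d, dm = d;
        wp[k] += eps; wm[k] -= eps; dp[k] += eps; dm[k] -= eps;
        double fd_w = (lossL(wp, d) - lossL(wm, d)) / (2.0 * eps);
        double fd_d = (lossL(w, dp) - lossL(w, dm)) / (2.0 * eps);

        printf("k=%zu  dL/dw: %.6f vs %.6f   dL/dd: %.6f vs %.6f\n",
               k, dL_dw, fd_w, dL_dd, fd_d);
    }
    return 0;
}
```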

@YanhaoZhang

Hi @hbb1 The above formulation is very helpful in understanding the implementation of the backward function. May I further ask about the formulation of the normal loss?

@YanhaoZhang

Another quick question: when calculating the derivative, why not consider the case $\partial L_k / \partial w_m$ (when $m \ne k$)?

@hbb1
Owner

hbb1 commented Jun 7, 2024

Because we will instead compute something like $\partial L / \partial w_{k-1}$, so that the algorithm can run efficiently back-to-front.
Note that $\partial L_k / \partial w_k \cdot \partial w_k / \partial w_{k-1} = \partial L_k / \partial w_{k-1}$.

last_dL_dT = dL_dweight * alpha + (1 - alpha) * last_dL_dT;
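One way to read that recursion (an interpretation, not a quote from the code): write $T_k = \prod_{j<k}(1-\alpha_j)$ and $w_k = \alpha_k T_k$, and let $G_k$ denote the value of last_dL_dT after Gaussian $k$ has been processed in the back-to-front loop. Then

$$G_k = \alpha_k \frac{\partial L}{\partial w_k} + (1-\alpha_k)\,G_{k+1} = \frac{1}{T_k}\sum_{i\ge k}\frac{\partial L}{\partial w_i}\,w_i,$$

so when Gaussian $k-1$ is processed, the subtracted last_dL_dT already carries the gradient flowing back from every later weight $w_i$ with $i \ge k$, and (assuming the per-Gaussian result is scaled by $T_{k-1}$ elsewhere in the kernel) $\partial L/\partial \alpha_{k-1} = T_{k-1}\,(\partial L/\partial w_{k-1} - G_k)$.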

@YanhaoZhang

@hbb1 Thanks a lot for your prompt reply. I have been trying to understand how the backward pass is implemented these days, but I am still a bit confused. I would really appreciate it if you could explain a bit more.

Based on (17)
$$\mathcal{L} = \sum^{N-1}_{i=0} \sum^{i-1}_{j=0} w_i w_j (d_i - d_j)^2$$.

$$w_k = \alpha_k \prod_{i=0}^{k-1}(1-\alpha_i)$$
The derivative of $w_k$ is
$$\frac{\partial \mathcal{L}}{\partial w_k} = \frac{\partial \mathcal{L}_k}{\partial w_k} = D^2 + d_k^2 A - 2d_kD $$
As the answer above, $\mathcal{L}_k$ is the k-th term of
$$\sum^{N-1}_i \sum^{N-1}_j w_i w_j (d_i - d_j)^2 $$,
where the second sum runs to $N-1$ rather than to $i-1$ as in (17).

The following code calculates $\frac{\partial \mathcal{L}}{\partial \alpha_k} $

dL_dalpha += dL_dweight - last_dL_dT;
// propagate the current weight W_{i} to next weight W_{i-1}
last_dL_dT = dL_dweight * alpha + (1 - alpha) * last_dL_dT;

My understanding is that
$$\frac{\partial \mathcal{L}}{\partial \alpha_k} = \sum_{i=0}^{N-1} \frac{\partial \mathcal{L}}{\partial w_i} \frac{\partial w_i}{\partial \alpha_k}$$
I know that $\frac{\partial w_i}{\partial \alpha_k}=0$ when $i<k$. Considering a back-to-front process, we iteratively calculate the following derivatives inside the loop for (int j = 0; !done && j < min(BLOCK_SIZE, toDo); j++)
$$\frac{\partial \mathcal{L}}{\partial \alpha_{N-1}}=\frac{\partial \mathcal{L}}{\partial w_{N-1}} \frac{\partial w_{N-1}}{\partial \alpha_{N-1}}=\frac{\partial \mathcal{L}}{\partial w_{N-1}} \prod_{i=0}^{N-2}(1-\alpha_i)$$
$$\frac{\partial \mathcal{L}}{\partial \alpha_{N-2}}=\frac{\partial \mathcal{L}}{\partial w_{N-1}} \frac{\partial w_{N-1}}{\partial w_{N-2}}\frac{\partial w_{N-2}}{\partial \alpha_{N-2}}+\frac{\partial \mathcal{L}}{\partial w_{N-2}} \frac{\partial w_{N-2}}{\partial \alpha_{N-2}}=\left( \frac{\alpha_{N-1}}{\alpha_{N-2}}(1-\alpha_{N-2})\frac{\partial \mathcal{L}}{\partial w_{N-1}} +\frac{\partial \mathcal{L}}{\partial w_{N-2}}\right)\prod_{i=0}^{N-3}(1-\alpha_i)$$

I was wondering if I made any mistakes. If not, how do we get the implementation above from these formulations? Thanks a lot.

@hbb1
Owner

hbb1 commented Jun 10, 2024

Hi, I realize I did not write it clearly.
$L$ can be seen as the summation of an $N \times N$ matrix. When we derive the gradient of $w_k$, only some entries are involved; I denote their sum as $L_k$, so that $\partial L / \partial w_k = \partial L_k / \partial w_k$.

Now let's think about the gradient of $a_k$: it receives contributions both from $L$ through $w_k$ and through the later $w_j$ with $j>k$ (occlusion). The direct $\partial L_k / \partial a_k$ part is computed as above, and the gradient coming through the later $w_j$ is accumulated as

dL_dalpha += dL_dweight - last_dL_dT;
// pass to the next
last_dL_dT = dL_dweight * alpha + (1 - alpha) * last_dL_dT;
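To make the two paths concrete, here is a small stand-alone C++ sketch (made-up values, not the actual backward.cu) that runs the recursion from the snippet back-to-front and compares the resulting $\partial L/\partial \alpha_k$ against finite differences of the pairwise loss; it assumes the per-Gaussian value (dL_dweight - last_dL_dT) is additionally scaled by $T_k$ outside the snippet:

```cpp
// Back-to-front recursion for dL/dalpha of the depth-distortion loss,
// verified against central finite differences. Illustrative sketch only.
#include <cstdio>
#include <vector>

// w_k = alpha_k * prod_{j<k} (1 - alpha_j)
static std::vector<double> weights(const std::vector<double>& a) {
    std::vector<double> w(a.size());
    double T = 1.0;
    for (size_t k = 0; k < a.size(); ++k) { w[k] = a[k] * T; T *= 1.0 - a[k]; }
    return w;
}

// Pairwise distortion loss, Eq. (17), as a function of the alphas
static double lossL(const std::vector<double>& a, const std::vector<double>& d) {
    std::vector<double> w = weights(a);
    double L = 0.0;
    for (size_t i = 0; i < w.size(); ++i)
        for (size_t j = 0; j < i; ++j)
            L += w[i] * w[j] * (d[i] - d[j]) * (d[i] - d[j]);
    return L;
}

int main() {
    std::vector<double> a = {0.40, 0.30, 0.50, 0.20};   // example opacities
    std::vector<double> d = {0.90, 1.10, 1.40, 2.00};   // example mapped depths
    const size_t N = a.size();

    std::vector<double> w = weights(a);
    double A = 0.0, D = 0.0, D2 = 0.0;                  // final accumulators
    for (size_t i = 0; i < N; ++i) { A += w[i]; D += w[i] * d[i]; D2 += w[i] * d[i] * d[i]; }

    // T_k = prod_{j<k} (1 - alpha_j), the scaling applied outside the snippet
    std::vector<double> T(N, 1.0);
    for (size_t k = 1; k < N; ++k) T[k] = T[k - 1] * (1.0 - a[k - 1]);

    double last_dL_dT = 0.0;                            // gradient collected from later Gaussians
    const double eps = 1e-6;
    for (size_t k = N; k-- > 0; ) {                     // back-to-front
        double dL_dweight = D2 + d[k] * d[k] * A - 2.0 * d[k] * D;
        double dL_dalpha  = (dL_dweight - last_dL_dT) * T[k];
        last_dL_dT = dL_dweight * a[k] + (1.0 - a[k]) * last_dL_dT;

        // Central finite difference with respect to alpha_k
        std::vector<double> ap = a, am = a;
        ap[k] += eps; am[k] -= eps;
        double fd = (lossL(ap, d) - lossL(am, d)) / (2.0 * eps);
        printf("k=%zu  dL/dalpha: %.6f vs finite diff %.6f\n", k, dL_dalpha, fd);
    }
    return 0;
}
```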

@YanhaoZhang

YanhaoZhang commented Jun 11, 2024

@hbb1 Really appreciate your prompt reply. I can understand most of the parts until the last sentence on $dL/d a_k$. Because of occlusion $\frac{\partial w_i}{\partial a_k}=0$ where $i < k$, therefore
$$\frac{\partial L}{\partial a_k} = \sum_{i=k}^{N-1} \frac{\partial L}{\partial w_i} \frac{\partial w_i}{\partial a_k}$$
Also
$$w_k = a_k \prod_{i=0}^{k-1}(1-a_i)$$
which means $w_k = \frac{a_k(1-a_{k-1})}{a_{k-1}}w_{k-1}$. Therefore we have $\frac{\partial w_k}{\partial w_{k-1}}=\frac{a_k(1-a_{k-1})}{a_{k-1}}$ and $\frac{\partial w_k}{\partial a_k}=\prod_{i=0}^{k-1}(1-a_i)=T_{k-1}$.
Considering a back-to-front process and starting from the final $n$-th Gaussian:
$$\frac{\partial L}{\partial a_n}=\frac{\partial L}{\partial w_n} \frac{\partial w_n}{\partial a_n}=\frac{\partial L}{\partial w_n}T_{n-1}$$
$$\frac{\partial L}{\partial a_{n-1}}= \frac{\partial L}{\partial w_{n-1}} \frac{\partial w_{n-1}}{\partial a_{n-1}} + \frac{\partial L}{\partial w_n} \frac{\partial w_n}{\partial w_{n-1}} \frac{\partial w_{n-1}}{\partial a_{n-1}}=(\frac{\partial L}{\partial w_{n-1}} - \frac{a_n(a_{n-1}-1)}{a_{n-1}}\frac{\partial L}{\partial w_{n}} )T_{n-1}$$.

Based on the code

dL_dalpha += dL_dweight - last_dL_dT;
// pass to the next
last_dL_dT = dL_dweight * alpha + (1 - alpha) * last_dL_dT;

Denoting last_dL_dT as $\frac{d L}{d T_{k+1}}$, and ignoring $T_k$, within the loop:
$$\frac{d L}{d T_{n+1}}=0$$
$$\frac{\partial L}{\partial a_n}=\frac{\partial L}{\partial w_n} - \frac{d L}{d T_{n+1}} =\frac{\partial L}{\partial w_n} $$
$$\frac{d L}{d T_{n}}=\frac{\partial L}{\partial w_n}a_n+(1-a_n)\frac{d L}{d T_{n+1}}=a_n\frac{\partial L}{\partial w_n}$$
$$\frac{\partial L}{\partial a_{n-1}}= \frac{\partial L}{\partial w_{n-1}} - \frac{d L}{d T_{n}} = \frac{\partial L}{\partial w_{n-1}} - a_n\frac{\partial L}{\partial w_n}$$.
I find $\frac{a_n}{a_{n-1}}\frac{\partial L}{\partial w_{n}}$ is missing.

May I know if I made any mistake? Thanks a lot.
