
transducer grad compute formula #37

Open

zh794390558 opened this issue Jul 25, 2022 · 9 comments

@zh794390558 commented Jul 25, 2022

The formula for the gradient in warprnnt_numba and in the warp_transducer CPU implementation is:

    T, U, _ = log_probs.shape
    grads = np.full(log_probs.shape, -float("inf"))
    log_like = betas[0, 0]  # == alphas[T - 1, U - 1] + betas[T - 1, U - 1]

    # gradient for the final blank transition
    grads[T - 1, U - 1, blank] = alphas[T - 1, U - 1]
    grads[: T - 1, :, blank] = alphas[: T - 1, :] + betas[1:, :]

    # gradient for the label transitions
    for u, l in enumerate(labels):
        grads[:, u, l] = alphas[:, u] + betas[:, u + 1]

    grads = -np.exp(grads + log_probs - log_like)

That is not the same as torchaudio, optimized_transducer, and the warp_transducer GPU implementation.
But you said that the warp_transducer CPU gradient is the same as optimized_transducer and torchaudio; how is that achieved?
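For reference, here is a self-contained NumPy sketch of my understanding (my own code, not taken from any of the libraries): it computes alphas and betas with the standard forward-backward recursions for a single utterance and then applies the quoted formula, so the numbers can be checked directly. The function name and the toy inputs are made up for illustration.

    import numpy as np

    def rnnt_alphas_betas_grads(log_probs, labels, blank=0):
        # log_probs: (T, U, V) log-probabilities for one utterance, with U = len(labels) + 1
        # returns alphas, betas, and the gradient w.r.t. log_probs
        T, U, _ = log_probs.shape
        alphas = np.zeros((T, U))
        betas = np.zeros((T, U))

        # forward variables
        for t in range(1, T):
            alphas[t, 0] = alphas[t - 1, 0] + log_probs[t - 1, 0, blank]
        for u in range(1, U):
            alphas[0, u] = alphas[0, u - 1] + log_probs[0, u - 1, labels[u - 1]]
        for t in range(1, T):
            for u in range(1, U):
                alphas[t, u] = np.logaddexp(
                    alphas[t - 1, u] + log_probs[t - 1, u, blank],
                    alphas[t, u - 1] + log_probs[t, u - 1, labels[u - 1]],
                )

        # backward variables
        betas[T - 1, U - 1] = log_probs[T - 1, U - 1, blank]
        for t in range(T - 2, -1, -1):
            betas[t, U - 1] = betas[t + 1, U - 1] + log_probs[t, U - 1, blank]
        for u in range(U - 2, -1, -1):
            betas[T - 1, u] = betas[T - 1, u + 1] + log_probs[T - 1, u, labels[u]]
        for t in range(T - 2, -1, -1):
            for u in range(U - 2, -1, -1):
                betas[t, u] = np.logaddexp(
                    betas[t + 1, u] + log_probs[t, u, blank],
                    betas[t, u + 1] + log_probs[t, u, labels[u]],
                )

        # gradient w.r.t. log_probs, exactly the quoted formula
        grads = np.full(log_probs.shape, -float("inf"))
        log_like = betas[0, 0]  # == alphas[T - 1, U - 1] + betas[T - 1, U - 1]
        grads[T - 1, U - 1, blank] = alphas[T - 1, U - 1]
        grads[: T - 1, :, blank] = alphas[: T - 1, :] + betas[1:, :]
        for u, l in enumerate(labels):
            grads[:, u, l] = alphas[:, u] + betas[:, u + 1]
        grads = -np.exp(grads + log_probs - log_like)
        return alphas, betas, grads

    # toy check: the posterior of the final blank must be 1, so its gradient is -1
    T, U, V, blank = 4, 3, 5, 0
    rng = np.random.default_rng(0)
    log_probs = np.log(rng.dirichlet(np.ones(V), size=(T, U)))  # normalized per (t, u)
    alphas, betas, grads = rnnt_alphas_betas_grads(log_probs, labels=[1, 2], blank=blank)
    assert np.isclose(grads[T - 1, U - 1, blank], -1.0)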

@csukuangfj (Owner)

> but you said that the warp_transducer CPU gradient is the same as optimized_transducer and torchaudio

Where did you find that?

@csukuangfj (Owner)

The README.md says:

> Therefore, optimized_transducer produces the same alpha and beta as warp-transducer for the same input.

It only says alpha and beta, not grad.

@zh794390558 (Author) commented Jul 25, 2022

> It borrows the methods of computing alpha and beta from warp-transducer. Therefore, optimized_transducer produces the same alpha and beta as warp-transducer for the same input.
>
> However, warp-transducer produces different gradients for CPU and CUDA when using the same input. See https://github.com/HawkAaron/warp-transducer/issues/93. I also created a [colab notebook](https://colab.research.google.com/drive/1vMkH8LmiCCOiCo4KTTEcv-NU8_OGn0ie?usp=sharing) to reproduce that issue.
>
> This project produces consistent gradient on CPU and CUDA for the same input, just like what torchaudio is doing. (We borrow the gradient computation formula from torchaudio.)

Sorry, I got it wrong. So the known conclusion is that torchaudio is aligned with optimized_transducer. Will warp_transducer GPU have the same gradient result as optimized_transducer, while warp_transducer CPU does not, because its gradient formula is not right?
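A quick way to check the torchaudio side of this (a sketch of mine, assuming torchaudio >= 0.10, which provides torchaudio.functional.rnnt_loss, and an available CUDA device) is to compare its CPU and CUDA gradients on the same random input:

    import torch
    import torchaudio

    torch.manual_seed(0)

    B, T, U, V = 2, 6, 4, 5  # batch, frames, len(labels) + 1, vocab size
    blank = 0

    logits = torch.randn(B, T, U, V, dtype=torch.float32)
    targets = torch.randint(1, V, (B, U - 1), dtype=torch.int32)
    logit_lengths = torch.full((B,), T, dtype=torch.int32)
    target_lengths = torch.full((B,), U - 1, dtype=torch.int32)

    def loss_and_grad(device):
        # leaf copy of the logits on the requested device
        x = logits.clone().to(device).requires_grad_(True)
        loss = torchaudio.functional.rnnt_loss(
            x,
            targets.to(device),
            logit_lengths.to(device),
            target_lengths.to(device),
            blank=blank,
        )
        loss.sum().backward()
        return loss.detach().cpu(), x.grad.detach().cpu()

    cpu_loss, cpu_grad = loss_and_grad("cpu")
    if torch.cuda.is_available():
        cuda_loss, cuda_grad = loss_and_grad("cuda")
        print("loss close:", torch.allclose(cpu_loss, cuda_loss, atol=1e-5))
        print("grad close:", torch.allclose(cpu_grad, cuda_grad, atol=1e-5))

The same pattern could be repeated for the other implementations, but their APIs differ, so they are not included here.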

@zh794390558 (Author) commented Jul 25, 2022

[screenshot]

Why are the CPU and GPU losses for warp_transducer not equal in the colab?

I think my wrong conclusion above came from here:

[screenshot]

@csukuangfj (Owner)

> warp_transducer GPU will have the same gradient result as optimized_transducer

No. You can find the conclusions in the colab (listed in the README.md).


> Why are the CPU and GPU losses for warp_transducer not equal in the colab?

Please ask the author of warp-transducer.

@zh794390558 (Author)

[screenshot]

I used the case from the colab with espnet's RNN-T, and the results are consistent. Is there something wrong with how I am using it?

[screenshot]

[screenshot]

@csukuangfj (Owner)

I just ran the colab notebook above again and found that I can no longer reproduce the previous results. I am not sure what went wrong.

@zh794390558 (Author) commented Jul 29, 2022

So does this issue still exist? Could it be a CUDA version problem?

BTW, can the torch version in the colab be pinned? Last time I ran it, it failed to run.

@csukuangfj (Owner)

> colab

The colab notebook given in README.md used a Tesla K80 GPU.

The colab notebook I tried today was assigned a Tesla T4, so the test environment is different.

If you can reproduce the issue on a Tesla K80 GPU, then it still exists; if not, it probably does not exist anymore.

(I will try later on a local V100 GPU to see whether I can reproduce it.)


> BTW, can the torch version in the colab be pinned? Last time I ran it, it failed to run.

Yes, that is doable.
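For example, the first cell of the notebook could look like the sketch below (the version numbers are placeholders, not necessarily the versions the original notebook used): pin the packages and record the runtime that was actually assigned, so results can be compared across runs.

    # first run `pip install torch==1.10.0 torchaudio==0.10.0` (placeholder versions),
    # then print the environment that colab actually assigned
    import torch

    print("torch:", torch.__version__, "cuda:", torch.version.cuda)
    if torch.cuda.is_available():
        print("gpu:", torch.cuda.get_device_name(0))
    else:
        print("no GPU assigned")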
