Inference in half precision #380

Open
Tomsen1410 opened this issue Feb 28, 2024 · 3 comments

Comments

@Tomsen1410

Hey,

just wanted to ask whether it is safe to run DINOv2 in half precision for inference. Is there any degradation in the quality of the features?

Thanks!

@heyoeyo

heyoeyo commented Feb 28, 2024

Whether or not float16 leads to a degradation probably depends on the use case. For example, if you pass the features into a model that's very heavily fit to the output of the float32 version, maybe it breaks on float16. This seems to (very rarely) happen on some depth estimation models I've tried. Using bfloat16 usually fixes the problem.

Following from issue #373, here's a comparison of the token norms at each block of vit-l running on the same input image, with float32 on the left & float16 on the right. Qualitatively at least, they're identical. Even the 'high norm' tokens are the same, which suggests that the float16 conversion doesn't lead to unstable results.
[Image: vitl_f32_vs_f16 — per-block token norms for ViT-L, float32 (left) vs float16 (right)]
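
If it helps, here's a minimal sketch of the kind of check I mean, comparing the float32 and float16 outputs on the same input. The `dinov2_vitl14` torch.hub entrypoint and the use of a CUDA device are assumptions here, not something tied to this issue:

```python
import torch

# Rough sketch (assumptions noted above): compare DINOv2 ViT-L/14 features
# computed in float32 vs float16 on the same input tensor.
device = "cuda"
model = torch.hub.load("facebookresearch/dinov2", "dinov2_vitl14").to(device).eval()

# Random stand-in image; spatial size should be a multiple of the 14px patch size.
x = torch.randn(1, 3, 518, 518, device=device)

with torch.inference_mode():
    feats_f32 = model(x)  # full-precision reference features
    with torch.autocast(device_type="cuda", dtype=torch.float16):
        feats_f16 = model(x)  # mixed-precision (float16) features

# Cosine similarity close to 1.0 means the float16 run barely changes the output.
cos = torch.nn.functional.cosine_similarity(feats_f32, feats_f16.float(), dim=-1)
print(f"cosine similarity (f32 vs f16): {cos.item():.5f}")
```

Swapping `torch.float16` for `torch.bfloat16` in the autocast context is the easiest way to compare the two half-precision formats on your own inputs.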

@Tomsen1410
Author

Thank you very much!

@heyoeyo

heyoeyo commented Feb 29, 2024

Just as a follow-up, so as not to give the impression that bfloat16 is always the better choice, here's a small zoomed/cropped section of a depth estimate of some rocks on the ground. There's a plane-of-best-fit removal and a contrast boost, so it's a bit of an extreme example, just to show the differences. Float32 is on the left, then float16, then bfloat16:

[Image: f32_f16_bf16 — cropped depth estimate, float32 vs float16 vs bfloat16]

Float32 and float16 look very similar (though there are slight differences when flipping between the images). On the other hand, bfloat16 has visible grid-like artifacts.

From what I've seen, float16 is usually fine but will occasionally give inf or NaN results, in which case bfloat16 tends to give more reasonable results (though it otherwise always has the small artifacts shown above).
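
In code, that suggests a simple fallback pattern, sketched below. The function and its arguments are purely illustrative (not part of this repo): try float16 first, and only re-run in bfloat16 if the output contains non-finite values.

```python
import torch

# Illustrative fallback (assumption, not from the repo): run in float16 first,
# re-run in bfloat16 only if the float16 output contains inf/NaN values.
def forward_half_precision(model, x, device_type="cuda"):
    with torch.inference_mode(), torch.autocast(device_type, dtype=torch.float16):
        out = model(x)
    if torch.isfinite(out).all():
        return out
    # float16 over/underflowed somewhere; bfloat16 keeps float32's exponent
    # range (at lower precision), which avoids the inf/NaN results.
    with torch.inference_mode(), torch.autocast(device_type, dtype=torch.bfloat16):
        out = model(x)
    return out
```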
