Inference in half precision #380
Comments
Whether or not float16 leads to a degradation probably depends on the use case. For example, if you pass the features into a model that's very heavily fit to the output of the float32 version, maybe it breaks on float16. This seems to (very rarely) happen on some depth estimation models I've tried. Using bfloat16 usually fixes the problem.

Following on from issue #373, here's a comparison of the token norms at each block of ViT-L running on the same input image, with float32 on the left and float16 on the right. Qualitatively at least, they're identical. Even the 'high norm' tokens are the same, which suggests that the float16 conversion doesn't lead to unstable results.
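A norm comparison like this is easy to reproduce. As a minimal numerical sketch (using random numpy arrays as a stand-in for a block's actual token features, with the shape and a few scaled-up 'high norm' tokens chosen just for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
# Stand-in for one block's token features: (tokens, dim) in float32.
tokens_f32 = rng.standard_normal((257, 1024)).astype(np.float32)
tokens_f32[:4] *= 50.0  # mimic a few 'high norm' outlier tokens

# Cast to float16 and compare per-token L2 norms against float32.
tokens_f16 = tokens_f32.astype(np.float16)
norms_f32 = np.linalg.norm(tokens_f32, axis=1)
norms_f16 = np.linalg.norm(tokens_f16.astype(np.float32), axis=1)

rel_err = np.abs(norms_f16 - norms_f32) / norms_f32
print(rel_err.max())
```

With values in this range, well inside float16's representable span, the per-token norm error stays tiny, which matches the plots looking identical.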
Thank you very much!
Just as a follow-up, so as not to give the impression that bfloat16 is always better to use, here's a small zoomed/cropped section of a depth estimate of some rocks on the ground. There's a plane-of-best-fit removal and a contrast boost, so it's a bit of an extreme example, just to show the differences. Float32 is on the left, then float16, then bfloat16. Float32 and float16 look very similar (there are slight differences when flipping between the images, though). On the other hand, bfloat16 has visible grid-like artifacts. From what I've seen, float16 is usually fine, but will rarely give random artifacts.
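For anyone wanting to reproduce this kind of visualization, the plane-of-best-fit removal is just a least-squares plane fit subtracted from the depth map. A self-contained sketch (synthetic tilted 'ground' in place of a real depth estimate; the function itself is generic):

```python
import numpy as np

def remove_plane_of_best_fit(depth):
    """Subtract the least-squares plane z = a*x + b*y + c from a depth map."""
    h, w = depth.shape
    ys, xs = np.mgrid[0:h, 0:w]
    A = np.column_stack([xs.ravel(), ys.ravel(), np.ones(h * w)])
    coeffs, *_ = np.linalg.lstsq(A, depth.ravel(), rcond=None)
    return depth - (A @ coeffs).reshape(h, w)

# Synthetic tilted 'ground plane' with small bumps riding on it:
ys, xs = np.mgrid[0:64, 0:64]
depth = 0.01 * xs + 0.02 * ys + 0.001 * np.sin(xs / 2.0)
flat = remove_plane_of_best_fit(depth)
```

After the subtraction only the small bumps remain, which is why a contrast boost on `flat` makes tiny precision differences (like the bfloat16 grid artifacts) pop out.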
Hey,
just wanted to ask whether it is safe to run DINOv2 in half precision for inference. Is there any degradation in the quality of the features?
Thanks!
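For context, 'half precision inference' here typically means calling `model.half()` (or wrapping the forward pass in `torch.autocast`) before running inference. As a tiny numerical illustration of what the float16 cast does to a forward pass, here's a single linear layer as a numpy stand-in (hypothetical dimensions, not DINOv2's actual architecture):

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.standard_normal((1, 384)).astype(np.float32)                 # 'features'
w = (rng.standard_normal((384, 384)) / 384**0.5).astype(np.float32)  # 'weights'

y32 = x @ w  # float32 reference forward pass

# 'Half precision': cast inputs and weights to float16, then compare.
y16 = (x.astype(np.float16) @ w.astype(np.float16)).astype(np.float32)

rel = np.linalg.norm(y16 - y32) / np.linalg.norm(y32)
print(rel)
```

The relative error from the cast is small for well-scaled values like these; whether that matters downstream is exactly what the comparisons in this thread probe.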