This repository has been archived by the owner on Sep 2, 2023. It is now read-only.

Float to int conversion: Clamp the result to the integer range #79

Open

mbitsnbites opened this issue Feb 20, 2019 · 1 comment


mbitsnbites commented Feb 20, 2019 (edited)

Instead of always returning 0xffffffff for every out-of-range or overflow situation, return the following:

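| Result type | Input < min | Input > max | Input is NaN |
|-------------|-------------|-------------|--------------|
| Signed      | −2^31       | 2^31 − 1    | 0            |
| Unsigned    | 0           | 2^32 − 1    | 0            |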
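A minimal Python sketch of the clamping rule in the table, for 32-bit results. It assumes that in-range values are truncated toward zero (as in a C cast); the issue does not specify the rounding mode. The names `float_to_int_clamped`, `INT32_MIN`, etc. are hypothetical and only used for illustration.

```python
import math

# 32-bit integer ranges used by the clamping rule.
INT32_MIN, INT32_MAX = -(2**31), 2**31 - 1
UINT32_MIN, UINT32_MAX = 0, 2**32 - 1


def float_to_int_clamped(x: float, signed: bool = True) -> int:
    """Saturating float -> 32-bit integer conversion (hypothetical helper).

    Assumption: in-range values are truncated toward zero.
    """
    lo, hi = (INT32_MIN, INT32_MAX) if signed else (UINT32_MIN, UINT32_MAX)
    if math.isnan(x):
        return 0          # NaN -> 0
    if x <= lo:
        return lo         # below the range (including -inf) -> min
    if x >= hi:
        return hi         # above the range (including +inf) -> max
    return math.trunc(x)  # in range: truncate toward zero
```

For example, `float_to_int_clamped(1e10)` saturates to 2147483647 and `float_to_int_clamped(-1.0, signed=False)` saturates to 0. For comparison, Rust's `as` casts from floating-point to integer types have used these same saturating semantics (with NaN mapped to 0) since Rust 1.45.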

Note: Several data processing systems (OpenCL, CUDA, SIMD, ...) prefer that NaN is converted to zero.
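To show how the NaN and overflow cases can be detected from the raw IEEE-754 single-precision bit pattern (roughly the way a hardware implementation might see the operand), here is a bit-level Python sketch. It is an illustration under stated assumptions, not the project's implementation: in-range values are assumed to be truncated toward zero, and the helper names `f32_bits` and `f32_to_int_clamped` are hypothetical.

```python
import struct

INT32_MIN, INT32_MAX = -(2**31), 2**31 - 1
UINT32_MAX = 2**32 - 1


def f32_bits(x: float) -> int:
    """Return the binary32 bit pattern of x (x must fit in the float32 range)."""
    return struct.unpack("<I", struct.pack("<f", x))[0]


def f32_to_int_clamped(bits: int, signed: bool = True) -> int:
    """Saturating float32 -> 32-bit integer conversion on raw bits.

    Assumption: in-range values are truncated toward zero.
    """
    lo, hi = (INT32_MIN, INT32_MAX) if signed else (0, UINT32_MAX)
    sign = bits >> 31
    exponent = (bits >> 23) & 0xFF
    fraction = bits & 0x7FFFFF

    if exponent == 0xFF:
        if fraction != 0:
            return 0                  # NaN (any payload, either sign) -> 0
        return lo if sign else hi     # -inf -> min, +inf -> max

    if exponent == 0:
        magnitude = 0                 # zero or subnormal: |x| < 1 truncates to 0
    else:
        mantissa = fraction | 0x800000   # restore the implicit leading 1
        shift = exponent - 150           # value = mantissa * 2**(exponent - 127 - 23)
        magnitude = mantissa << shift if shift >= 0 else mantissa >> -shift

    value = -magnitude if sign else magnitude
    return max(lo, min(hi, value))    # saturate to the integer range
```

A NaN is any pattern with an all-ones exponent and a nonzero fraction, so `f32_to_int_clamped(0x7FC00000)` returns 0, while `f32_to_int_clamped(f32_bits(-3.5))` returns -3. Truncating the magnitude before applying the sign gives rounding toward zero for both signs.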



mbitsnbites commented Mar 27, 2020

According to: https://github.com/mbitsnbites/leanfloat
