Floating point fill values' endianness #279

Open
clbarnes opened this issue Nov 1, 2023 · 7 comments

Comments

@clbarnes
Contributor

clbarnes commented Nov 1, 2023

Following on from #236

IEEE 754 doesn't specify an endianness for float representations. Does this mean that the hex string representation of a float dataset's fill value depends on the endianness of the codecs? If so, it would be much more convenient to say that it always uses one particular endianness.

@jbms
Contributor

jbms commented Nov 1, 2023

No, the hex string always has the sign bit as the most significant bit (i.e. first) and does not depend on endianness. Perhaps you can create a PR to clarify.

@clbarnes
Contributor Author

clbarnes commented Nov 1, 2023

Is that an implementation detail of the C function referenced in the spec?

@jbms
Contributor

jbms commented Nov 1, 2023

> Is that an implementation detail of the C function referenced in the spec?

No, and actually the warning about strtod was in relation to the NaN syntax nan(1234) that I previously proposed but was rejected.

strtod accepts the "0xYYYYYYYY[.ZZZZZZ]" hex floating point syntax, which has a different meaning. Unfortunately, strtod does not guarantee that every distinct NaN value has a corresponding string representation, so we can't rely on the strtod spec.
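To illustrate the difference in meaning, here is a small sketch in Python. It uses `float.fromhex`, which accepts a hexadecimal floating-point syntax similar to the one strtod accepts; treating it as a stand-in for strtod is an assumption made for illustration only. The same text gives two very different numbers depending on whether it is read as a hex float literal or as the bit pattern of a 32-bit float.

```python
import struct

text = "0x3f800000"

# Read as a hexadecimal floating-point literal (the strtod-style meaning):
# the hex digits are the numeric value itself.
as_hex_literal = float.fromhex(text)

# Read as the bit pattern of a 32-bit float (the meaning intended for fill values):
# the hex digits are the sign, exponent and mantissa bits.
as_bit_pattern = struct.unpack(">f", int(text, 16).to_bytes(4, "big"))[0]

print(as_hex_literal)  # 1065353216.0
print(as_bit_pattern)  # 1.0
```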

I intended to convey what I said in #279 (comment) with the language "specifying the byte representation of the floating point number as an unsigned integer", where I was assuming the usual endian-agnostic representation of the floating point number as a sequence of bits, where the first (most significant) bit is the sign bit, followed by the exponent bits, followed by the mantissa bits. The NaN example also serves to clarify. Perhaps there is a better way to state it, though.
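A minimal Python sketch of this reading follows. The helper name `float32_from_hex` and its `byte_order` parameter are hypothetical, not part of any Zarr implementation. The hex string is parsed as an unsigned integer, and that integer's bits are reinterpreted as a float. Byte order only matters for how the integer and the float are laid out in memory, so as long as both use the same order, the choice cancels out.

```python
import struct


def float32_from_hex(hex_string, byte_order):
    """Hypothetical helper: reinterpret a "0x..." bit pattern as a 32-bit float.

    byte_order is a struct prefix: "<" (little-endian) or ">" (big-endian).
    """
    bits = int(hex_string, 16)                   # unsigned integer, sign bit is the MSB
    raw = struct.pack(byte_order + "I", bits)    # lay the integer out in memory
    return struct.unpack(byte_order + "f", raw)[0]  # read the same bytes back as a float


# The result is identical for either in-memory byte order.
print(float32_from_hex("0xc0000000", "<"))  # -2.0
print(float32_from_hex("0xc0000000", ">"))  # -2.0
```

Here `0xc0000000` has the sign bit set, an exponent field of `10000000`, and a zero mantissa, which is -2.0 in either layout.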

@clbarnes
Contributor Author

clbarnes commented Nov 1, 2023

> the usual endian-agnostic representation of the floating point number

This norm is what I was struggling to find details of; searching only turned up ambiguity, e.g. https://stackoverflow.com/questions/2945174/floating-point-endianness

@clbarnes
Contributor Author

clbarnes commented Nov 1, 2023

Writing the PR using this language

> where the first (most significant) bit is the sign bit, followed by the exponent bits, followed by the mantissa bits

and had another question - different languages may default to different NaN values when using their respective NaN-creation routines. Are we taking a "NaN" fill to mean that any NaN value is valid, or are we specifying a specific NaN as implied by the example in the "0x..." point? If the former, implementations probably shouldn't ever write "NaN" (opting for the byte string instead) because they don't necessarily know the intention of other readers/writers. The alternative is to disallow specific NaNs entirely.

@jbms
Contributor

jbms commented Nov 2, 2023

> Writing the PR using this language
>
> > where the first (most significant) bit is the sign bit, followed by the exponent bits, followed by the mantissa bits
>
> and had another question - different languages may default to different NaN values when using their respective NaN-creation routines. Are we taking a "NaN" fill to mean that any NaN value is valid, or are we specifying a specific NaN as implied by the example in the "0x..." point? If the former, implementations probably shouldn't ever write "NaN" (opting for the byte string instead) because they don't necessarily know the intention of other readers/writers. The alternative is to disallow specific NaNs entirely.

"NaN" means the specific value as defined in the specification:

"NaN", denoting thenot-a-number (NaN) value where the sign bit is 0 (positive), the most significant bit (MSB) of the mantissa is 1, and all other bits of the mantissa are zero.

(There is a missed space.)

@jbms
Contributor

jbms commented Nov 2, 2023

Note that an IEEE 754 NaN value is indicated by any sign bit, all exponent bits set to 1, and any non-zero mantissa. By specifying the sign and mantissa we fully specify the value.
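As a sketch of that rule for 32-bit floats (1 sign bit, 8 exponent bits, 23 mantissa bits), the Python below splits a bit pattern into its fields, tests whether it is a NaN, and builds the one pattern described by the quoted "NaN" definition. All function names are hypothetical and chosen for illustration.

```python
def split_float32_bits(bits):
    """Split a 32-bit pattern into (sign, exponent, mantissa) fields."""
    sign = (bits >> 31) & 0x1
    exponent = (bits >> 23) & 0xFF
    mantissa = bits & 0x7FFFFF
    return sign, exponent, mantissa


def is_nan_bits(bits):
    """Any sign bit, all exponent bits set to 1, and a non-zero mantissa."""
    _, exponent, mantissa = split_float32_bits(bits)
    return exponent == 0xFF and mantissa != 0


def spec_nan_bits():
    """Sign bit 0, mantissa MSB 1, all other mantissa bits 0."""
    return (0 << 31) | (0xFF << 23) | (1 << 22)


print(hex(spec_nan_bits()))     # 0x7fc00000
print(is_nan_bits(0xFFC00001))  # True: a different NaN (negative sign, extra mantissa bit)
print(is_nan_bits(0x7F800000))  # False: zero mantissa means infinity
```

Many bit patterns pass `is_nan_bits`, but only one equals `spec_nan_bits()`, which is why fixing the sign and mantissa pins down a single value.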

LDeakin added a commit to LDeakin/zarrs that referenced this issue Nov 4, 2023
This accounts for the fact that f32/f64::NAN is not guaranteed to match the byte representation of a NaN as specified in the zarr spec.
zarr-developers/zarr-specs#279
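The concern behind that commit can be sketched in Python as follows. This is not the zarrs code (which is Rust); the helper names are hypothetical, and the choice to fall back to the "0x..." hex form for any NaN that differs from the specified one is an assumption based on the discussion above. The idea is to compare the actual bit pattern against the specified NaN rather than trusting the language's default NaN constant.

```python
import math
import struct

# 64-bit NaN with sign bit 0, mantissa MSB 1, all other mantissa bits 0.
SPEC_NAN_F64 = 0x7FF8000000000000


def float64_bits(value):
    """Return the bit pattern of a 64-bit float as an unsigned integer."""
    return struct.unpack(">Q", struct.pack(">d", value))[0]


def float64_from_bits(bits):
    """Build a 64-bit float from an unsigned integer bit pattern."""
    return struct.unpack(">d", struct.pack(">Q", bits))[0]


def nan_fill_value_text(value):
    """Hypothetical helper: write "NaN" only for the specified bit pattern."""
    if not math.isnan(value):
        raise ValueError("expected a NaN value")
    bits = float64_bits(value)
    return "NaN" if bits == SPEC_NAN_F64 else f"0x{bits:016x}"


# What the language's own NaN serializes to depends on the platform.
print(nan_fill_value_text(float("nan")))
```

A reader would apply the inverse: map "NaN" to `SPEC_NAN_F64` explicitly instead of calling the language's NaN-creation routine.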