New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Support something like OPT_STRICT_INTEGER for deserialization #473
Comments
Do you have a use case for strictly 53 bit numbers, or is this more that 64-bit integers, 128-bit integers and how floats can be coerced is not done well? Does deserializing a number that doesn't have a dot or exponent strictly as an integer handle the concern? In general, stricter behavior and support for 128-bit integers and floats would be good. I think the implementation is likely on the scale of writing a yyjson-like parser in Rust, though--the parser passing "raw numbers" probably being a big regression and definitely more difficult to handle now that it's fallible after the tape has been constructed. |
I do have a use case for strictly 53-bit integers: I'm interoperating with JavaScript. Specifically, I'm round-tripping JSON to/from JavaScript and Python, and I want to be able to do so losslessly. Here's a sequence where things would fail with
>>> orjson.dumps({ "number": 2.8e18 }, option=orjson.OPT_STRICT_INTEGER)
b'{"number":2.8e18}'
> JSON.parse('{"number":2.8e18}')
{ number: 2800000000000000000 }
> JSON.stringify({ number: 2800000000000000000 })
'{"number":2800000000000000000}'
>>> data = orjson.loads('{"number":2800000000000000000}')
>>> data
{'number': 2800000000000000000}
>>> orjson.dumps(data, option=orjson.OPT_STRICT_INTEGER)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
TypeError: Integer exceeds 53-bit range If I could add the proposed option on step 4 above, things would work perfectly: the value would be deserialized to As for an implementation, I spent some time looking through the codebase, and while I'm not particularly familiar with either Rust or yyjson, I think perhaps we could hook into things here: orjson/src/deserialize/yyjson.rs Lines 184 to 196 in 632345a
|
Bumping this so the stalebot doesn't close this out prematurely! |
I think I follow your use case. I don't follow how adding the option is a real solution given you can specify >53 bit integers as floats, yes. I'm not a numerical computing person and I suppose I don't follow how this doesn't quickly become a much wider issue of having defined behavior for all of it, including 128-bit integers and floats. And because none of that is in the spec it would require understanding what is done across as many libraries as possible and looking for a "most compatible without being incorrect" option. If this is an important issue for you I would suggest experimenting with your own fork instead of waiting on this. I think I bias to not touching number parsing unless there's a general plan. |
I'd be happy to work in a fork to try this out, but I'd be hesitant to maintain a fork indefinitely. Do you see any path to this landing in |
@ijl any chance I could get a response from you so I can know if this is worth pursuing? |
Consider this code:
This will fail, which is currently the expected behavior. The value is deserialized to
int(2800000000000000000)
, which is>= 2 ** 53
, and so cannot be serialized perorjson.OPT_STRICT_INTEGER
.However, the following code succeeds:
This is because
orjson
parses that value asfloat(18446744073709551616)
, and it can always serialize floats back to strings, albeit not losslessly.What I'd like is to be able to opt-in to parsing any numbers that are
>= 2 ** 53
as floats so that they can be round-tripped throughorjson
without error. We could do this via roughly the same API asorjson.dumps
uses:We could of course use a different constant, which might be reasonable given that the semantics are different ("strict" could imply failure, but what I really want are just stricter bounds on what is parsed as an
int
instead of afloat
).Would you be open to a PR implementing his feature? If so, do you have thoughts on what the API should look like?
The text was updated successfully, but these errors were encountered: