Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add impls for i128 16-byte integer db conversions (behind feature gate) #392

Closed
wants to merge 1 commit into from

Conversation

mqudsi
Copy link
Contributor

@mqudsi mqudsi commented Sep 5, 2018

rust nightly has i128/u128 as 16-byte integer types, expected to land
into stable at some point in the (near?) future. SQLite's maximum
variable-length integer size is 8-bytes, so this needs to be serialized
into a blob for storage in the database.

Obviously endianness is a concern here; I've opted to use the platform's
native endianness rather than a hard-coded BE/LE choice. This is
contracted out to i128's from_bytes/to_bytes, which will take care
of the platform-specific endianness for us.

The usage of unsafe in the call to i128::from_bytes is to avoid a
potentially expensive extra allocation and looping over the contents of
the vector to copy them to a [u8; 16] intermediate. Once RFC #2000 lands in nightly (tracked in #44580), const generics should make it possible to convert the Vec<u8> to a fixed-length [u8; 16] array in a nicer fashion.

rust nightly has i128/u128 as 16-byte integer types, expected to land
into stable at some point in the (near?) future. SQLite's maximum
variable-length integer size is 8-bytes, so this needs to be serialized
into a blob for storage in the database.

Obviously endianness is a concern here; I've opted to use the platform's
native endianness rather than a hard-coded BE/LE choice. This is
contracted out to i128's `from_bytes`/`to_bytes`, which will take care
of the platform-specific endianness for us.

The usage of `unsafe` in the call to i128::from_bytes is to avoid a
potentially expensive extra allocation and looping over the contents of
the vector to copy them to a `[u8; 16]` intermediate. Once [RFC #2000](rust-lang/rfcs#2000) lands in nightly (tracked in [#44580](rust-lang/rust#44580)), [const generics](https://internals.rust-lang.org/t/lang-team-minutes-const-generics/5090) should make it possible to convert the `Vec<u8>` to a fixed-length `[u8; 16]` array in a nicer fashion.
@mqudsi
Copy link
Contributor Author

mqudsi commented Sep 5, 2018

There's no existing #[cfg(nightly)] for rusqlite, so I just stuck this in its own feature (i128).

@gwenn
Copy link
Collaborator

gwenn commented Sep 6, 2018

Thanks.
My concern is that SQLite is a Single-file Cross-platform Database.
But here, we are breaking the rule.

Blob seems ok: https://www.mail-archive.com/sqlite-users@mailinglists.sqlite.org/msg07864.html.

I need to check other bindings (apsw, go-sqlite3, sqlite-jdbc, System.Data.SQLite, ...).

@mqudsi
Copy link
Contributor Author

mqudsi commented Sep 7, 2018

I had the same concerns. Ideally, something like "if number fits within 64-bits, store as numeric. Else, store as 128-bit blob with the same endianness sqlite uses internally for ints (with leading zero bytes lopped off?)" would seem to be the most pragmatic approach and would be future compatible should sqlite ever choose to adopt 16-byte integers, but it was a bit too much runtime logic for me to be comfortable with.

@gwenn
Copy link
Collaborator

gwenn commented Sep 7, 2018

It seems that the only binding supporting big number is sqlite-jdbc:
https://github.com/xerial/sqlite-jdbc/blob/master/src/main/java/org/sqlite/jdbc3/JDBC3PreparedStatement.java#L388

BigInteger bigint = ...;
PreparedStatement stmt = ...;
stmt.setObject(1, bigint);

A CLOB is used.

@rmini
Copy link

rmini commented Sep 24, 2018

I've got some experience trying to do this with SQLite in C, so here are my two cents:

Here the appropriate thing is to look at the SQLite C API's behavior, as opposed to any assumed internal or on-disk representation. SQLite's API allows passing in integer values as ints (sqlite3_bind_int), 64-bit ints (sqlite3_bind_int64), or as text of digits (sqlite3_bind_text). Passing in BLOBs or other forms of digits won't result in a numeric value being stored. Type conversion happens automatically according to the defined type of the column, but only between numeric and text formats. SQLite's actual storage of numeric values uses a variable-length representation, and isn't important as it's transparent to the user of the API.

SQLite stores integer values outside of the range of i64 when passed in as text (sqlite3_bind_text) as the REAL storage class when inserted into columns with INTEGER or NUMERIC affinity, with the concomitant loss of precision (SQLite guarantees only 15 digits of precision). For columns with TEXT affinity, the values are stored losslessly as TEXT. This would be the forward-compatible way to provide these values as parameters if SQLite were to natively support higher precision integers in the future (there's no indication that is the case, however). With numeric columns, values in the range of i64 will sort and operate properly on other numeric values but, outside of this range, precision is lost. With text columns, values in the range of i64 will not sort or compare as expected without a custom collation, although it's likely that numeric operations would still work properly with implicit conversions, and outside of this range, precision would be lost during numeric operations.

By passing i128 as a 16-byte big-endian BLOB, the columns would sort and compare properly, but any other numeric operations in SQLite would not work, and this format would not likely be forward-compatible if SQLite were to support higher-precision numeric formats in the future, but at least there wouldn't be any surprises with loss of precision.

Python has arbitrary-precision integers and handles those with its sqlite3 library by throwing an exception when attempting to pass an int outside of the i64 range, but I wouldn't expect that behavior (either an error or a panic) in Rust if the API is accepting i128 values.

I'd suggest renaming the feature i128_blob, and forcing the representation to BE so sorts/comparisons still work as expected, at least for non-negative numbers. For a representation that sorts/compares negatives properly, an excess-2^128 BE representation would work.

@gwenn
Copy link
Collaborator

gwenn commented Dec 17, 2018

Would you mind closing the PR ?

@mqudsi mqudsi closed this Dec 17, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants