Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Int32Type: avoid unnecessary boxing for common cases #3513

Open
wants to merge 3 commits into
base: master
Choose a base branch
from

Conversation

wilbit
Copy link
Contributor

@wilbit wilbit commented Mar 29, 2024

Indexed properties of DbDataReader returns a boxed value, which we convert to Int32 (and then box again, but it is a different story).
This is a performance hit and produces memory traffic. Why? Because, for example, SqlDataReader in Microsoft.Data.SqlClient stores int (and all other reference types) as a reference value, and reading of value as object forces SqlDataReader to box it.

@wilbit wilbit changed the title Int32Type: avoid unnecessary boxing for common types Int32Type: avoid unnecessary boxing for common cases Mar 31, 2024
@hazzik
Copy link
Member

hazzik commented Apr 10, 2024

@wilbit can you show some numbers?

@wilbit
Copy link
Contributor Author

wilbit commented Apr 10, 2024

@hazzik , sure! What numbers would you like to see? Memory traffic difference, performance difference, something else?

@gliljas
Copy link
Member

gliljas commented Apr 10, 2024

A agree with this effort, but I believe it should go beyond just Int32Type. Every primitive type suffers from this to some extent. However, the only thing that significantly improves performance (as in CPU performance, time taken) is to not be permissive about the type returned in the reader. GetInt32 is at least twice as fast as GetValue (rs[...]), but that performance advantage is gone when preceding it with a GetFieldType. What remains is getting rid of the unnecessary boxing, which is of course very good in itself (probably more important) and the stated purpose of the PR.

@fredericDelaporte
Copy link
Member

Would it be worth it to add a new option, something like sql_types.strict, disabled by default, which, when enabled, would cause the basic types to assume the data type loaded from the database is a perfect match?

That would allow for a simpler implementation of the optimization, which would be opt-in for those ensuring their database typing matches their mapping.

@hazzik
Copy link
Member

hazzik commented May 2, 2024

I think there will be boxing anyway. What we want to achieve, if I understand correctly, is unnecessary (object)int => int => (object)int conversion.

I think following code should suffice

var value = rs[name]; // object
return rs[name] switch
{
        int _ => value,
	BigInteger bi => (int) bi,
	var c => Convert.ToInt32(c)
};

@wilbit could you please check if that would would improve your scenario?

@wilbit
Copy link
Contributor Author

wilbit commented May 2, 2024

A agree with this effort, but I believe it should go beyond just Int32Type

One step at a time.
Also, for my use cases I don't see so big memory traffic for other types (but we do not use bytes, longs, etc widely)...

Would it be worth it to add a new option, something like sql_types.strict, disabled by default, which, when enabled, would cause the basic types to assume the data type loaded from the database is a perfect match?

That would allow for a simpler implementation of the optimization, which would be opt-in for those ensuring their database typing matches their mapping.

This new options seems to me fragile, because different DbReaders have different behaviours how they handle small type mismatchings (like int <> long <> decimal).
In terms of performance it would be great

  • to have implementations of Int32Type (and others) per db driver.
  • or the ability to create/assign custom implementations of this type "convertors"

I think there will be boxing anyway. What we want to achieve, if I understand correctly, is unnecessary (object)int => int => (object)int conversion.

I think following code should suffice

var value = rs[name]; // object
return rs[name] switch
{
        int _ => value,
	BigInteger bi => (int) bi,
	var c => Convert.ToInt32(c)
};

@wilbit could you please check if that would would improve your scenario?

Interesting. It should be the next:

var value = rs[name]; // object
return value switch
{
        int _ => value,
	BigInteger bi => (int) bi,
	var c => Convert.ToInt32(c)
};

to avoid 1 boxing, but I got you point.

Yes, it will work for what is said in the PR.
But, honestly, I proposed my changes in a way they are because:

  • in my fork I also have a cache of boxed int values between 0 and 1000. I do not propose this cache here because it is very specific to my case and will make performance even worse for majority of users.
  • I hope that one day there will be a way to return something more light than boxed values.

So, it won't work for my scenario, but will work for the improvement in this PR.

PS: after some considerations, I think your approach also will work for me (it will change nothing for Firebird 4/BigInteger case, but I'm OK with that).

@fredericDelaporte
Copy link
Member

The double boxing has been introduced by 990cd6a twenty years ago due to some deficiencies of a driver at that time. Prior to that, the code was using the typed methods of the data reader, without attempting to support unexpected types.

I consider it to be a bad legacy. It is neither sound nor safe to have a type mismatch between the model property type and the underlying table column type. Or am I missing cases where that is legit?

@hazzik
Copy link
Member

hazzik commented May 14, 2024

Or am I missing cases where that is legit?

Yes: SQLite.

It was long on my todo list to redesign the type system to uncouple the Get and Set logic from the types themselves. Hibernate done this in I think v 6.0.

@fredericDelaporte
Copy link
Member

The SQLite driver handles SQLite lax typing itself. It does perform the required conversion itself when a typed method is used, according to deAtog in #3530.

@deAtog
Copy link
Contributor

deAtog commented May 15, 2024

I like the concept, but I do not agree with the approach taken here. In my opinion, NHibernate should always call the type specific methods of the DbDataReader. If and only if that fails, should any other conversions be performed using the value of the indexer property. Any calls to Convert must specify a locale which must be configurable by the user and default to CultureInfo.InvariantCulture. Any driver agnostic conversion must not rely on the type returned from calling GetFieldType of a DbDataReader.

The System.Data.SQLite driver for instance will automatically convert numeric types to a text form when inserting into a TEXT column and automatically convert from the text form to the numeric type when requested to do so. In such cases, a call to GetFieldType would return a String while calling SQLiteDataReader.GetDouble would return a double after converting the value.

The underlying database driver must be given the opportunity to convert the value to the correct type if necessary prior to any other conversion. This allows the underlying driver to perform the conversion using the locale of the database, or perform any other database specific conversions.

The GetDateTime method of the System.Data.SQLite driver, for instance, will attempt to convert a value stored in a TEXT, INTEGER, or NUMERIC column to a DateTime type using the DateTimeFormat, DateTimeFormatString, and DateTimeKind properties of the connection string. The indexer method of the SQLiteDbDataReader will not perform this conversion unless the DetectTextAffinity or DetectStringType flags are set on the Connection or Command object. These flags cause the driver to examine the values of all string columns returned from the database and negatively impact performance dramatically. The GetDateTime method does not have this problem as it only attempts conversions of requested columns.

990cd6a indicates there might be issues with the Oracle driver in regards to the type specific methods. If those methods throw exceptions, then any subsequent conversion using the indexer property of the DbDataReader would work as expected. However, this would be extremely inefficient for the Oracle driver as exceptions are inherently slow in .Net and this would likely cause a flurry of exceptions if these methods throw an exception on every call. In my opinion, the problematic driver should have been encapsulated by a proxy that implemented the type specific functions of the DbDataReader by converting the value from the indexer method to the appropriate type in the Oracle specific way. This would eliminate the flurry of exceptions for that driver.

The above approach could be taken here if the Microsoft.Data.SqlClient driver cannot perform the requested conversions without throwing exceptions for common type conversions. Such an implementation could rely on the value of calling GetFieldType of the underlying DbDataReader if necessary.

@deAtog
Copy link
Contributor

deAtog commented May 16, 2024

For examples of why calling Convert without specifying a locale is wrong, see all the failing test cases I added in this pull request: #3548

The original failing case involved System.Data.SQLite's handling of DateTime which stores DateTime values as text by default.

I expanded my tests to include other common cases where a user might map a numeric property to a text column. A majority of these tests fail for the databases in the test suite here as the database will coerce numeric and date values to text upon insert, but NHibernate is not able to correctly retrieve the value. I suspect many of the failing tests would pass if the type specific methods of the DbDataReader were called instead of using Convert. This is certainly the case for SQLite.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants