HHH-16531 HHH-18004 ColumnDefinitions should respect @Column(columnDefinition) #8137

quaff · 2024-04-10T05:33:36Z

https://hibernate.atlassian.net/browse/HHH-16531
https://hibernate.atlassian.net/browse/HHH-18004

hibernate-core/src/main/java/org/hibernate/type/SqlTypes.java

hibernate-core/src/main/java/org/hibernate/tool/schema/internal/ColumnDefinitions.java

gavinking · 2024-04-10T08:45:23Z

hibernate-core/src/main/java/org/hibernate/tool/schema/internal/ColumnDefinitions.java

+			String required = new StringTokenizer( normalize( column.getTypeName() ), "( " ).nextToken();
+			String actual = normalize( columnInformation.getTypeName() );
+			return actual.equals( required );
+		}


I don't understand the purpose of this code, and naively it doesn't look like it should be necessary.

It should already be taken care of by the existing:

normalize( stripArgs( column.getSqlType( metadata ) ) ).equals( normalize( columnInformation.getTypeName() ) );

Since column.getSqlType( metadata ) returns the sqlTypeName if there is one, and since stripArgs(x) does the same thing as new StringTokenizer(x , "( " ).nextToken().

It should already be taken care of by the existing:

normalize( stripArgs( column.getSqlType( metadata ) ) ).equals( normalize( columnInformation.getTypeName() ) );

It should return false immediately instead of fallback if @Column(columnDefinition) present.

stripArgs() doesn't strip not null.

Well, if that's the case then better fixes would be:

Add else if (column.getSqlTypeName()!=null) { return false; } which is much simpler.

Change the stripArgs() to handle the additional cases you're concerned about.

But the thing is I just don't see how this can be made robust. Your implementation reduces character varying (10) not null to just character, which is a different type. So that's no good. Now, sure, you could look for not null explicitly and remove it, but there's many more complicated things apart from not null that this freeform columnDefinition could contain.

I mean, if you can think of something, I'm all ears, but just "take the first token" definitely isn't good enough.

I added dedicated method extractTypeName(String typeExpression) to handle this.

hibernate-core/src/main/java/org/hibernate/tool/schema/internal/ColumnDefinitions.java

hibernate-core/src/main/java/org/hibernate/mapping/Column.java

gavinking · 2024-04-10T09:56:10Z

OK, this LGTM now. Thanks!

quaff · 2024-04-10T10:15:55Z

OK, this LGTM now. Thanks!

Updated, fix that column not updated if columnDefinition added and differs to inferred type but the inferred type is equals to actual type in database, please review again.

gavinking · 2024-04-10T10:18:12Z

hibernate-core/src/main/java/org/hibernate/tool/schema/internal/ColumnDefinitions.java

 		boolean typesMatch = dialect.equivalentTypes( column.getSqlTypeCode(metadata), columnInformation.getTypeCode() )
-				|| normalize( stripArgs( column.getSqlType( metadata ) ) ).equals( normalize( columnInformation.getTypeName() ) );
+				|| sqlTypeNameEquals;


I mean after this latest change you are now optimizing for the uncommon case, putting all the string manipulation up front. I have no idea if it's a significant performance hit, most probably not, but I did prefer the previous version.

I mean after this latest change you are now optimizing for the uncommon case, putting all the string manipulation up front. I have no idea if it's a significant performance hit, most probably not, but I did prefer the previous version.

Previous version doesn’t update existing data type if columnDefinition added, I will add tests to cover it tomorrow.

Previous version doesn’t update existing data type if columnDefinition added

Ok, so let's take a half-step back here. Consider:

@ColumnDefinition(columnDefinition="character varying(255) not null default '-'") String name;

Schema Update, as it's written today, simply can't do anything to help with things like this. The only things it knows how do deal with are column types and lengths. And to do any more that, it would need to be able to parse and understand the native DDL of eleven different databases.

I think you're thinking that people use columnDefinition to override just the column type and length. And perhaps some poor misguided souls do indeed use it that way. But that's something we tell them not to do.

So I don't think we should go beyond checking that the SQL column type and length is sensible. We can't validate that the whole columnDefinition is respected, because JDBC metadata doesn't give us access to that.

Long version;

First run, It's expected that version column added as bigint.

Now I want it be integer but keep java type as Long.

@ColumnDefinition(columnDefinition="integer") Long version;

Second run, It detect Long equals to bigint, and schema will not be updated, that's the behavior of previous version, I think it's a bug, WDYT?

Right, but this is not a way of using columnDefinition that we encourage or even really support.

For this we tell you to use @JdbcTypeCode(INTEGER), so that we can:

generate DDL that is portable across different databases,

correctly add not null constraints and sometimes other things, and

perform any appropriate type conversions on the Java side, and use the correct methods when writing to/from JDBC (not a problem in this particular very trivial case, but important in general).

And the issue is that since columnDefinition is not a SQL type name, but a fragment of native DDL from one ~10 different dialects of SQL, even just recognizing that all we have here in your case is a type name is a nontrivial problem.

I mean, really it's a failing of JPA itself, which I guess should let @Column specify a columnType from the range of well-defined JDBC types, similar to what our @JdbcTypeCode does. Back in the day this was not added to the spec because TopLink didn't support it, but perhaps EclipseLink does now (I have no idea).

Now, what could we do? Well, in principle, I suppose, we could add support to the Dialects for parsing and interpreting columnDefinitions, something sorta similar to what you proposed here earlier, but obviously it would need to be much more robust. And then we could build that into Column itself, so that your code:

@Column(columnDefinition="integer") Long version;

would be interpreted as if you had written:

@JdbcTypeCode(INTEGER) @Column(columnDefinition="integer") Long version;

and:

@Column(columnDefinition="decimal(10,2) not null") Long version;

would be interpreted as if you had written:

@JdbcTypeCode(DECIMAL) @Column(columnDefinition="decimal(10,2) not null", nullable=false, precision=10, scale=2) Long version;

So how hard would it be to implement this? Well, not necessarily very hard I suppose. There are some type names which are a bit nontrivial to parse, e.g. timestamp(6) with time zone, char(128) for bit data, etc, and lots of dialect-specific variation in naming, e.g. timestamptz, varchar2, etc, but it's not difficult work, just a bit of tedium involved in going through all the databases and compiling all the information.

I think it would be a very nice thing to have, to be honest, but I'm not motivated to lose a week of my life to this.

What I'm saying is: if you want to do this, do it properly, instead of hacking in something that sorta works in only the most trivial cases.

I agree with you, but columnDefinition is JPA specification and @JdbcTypeCode is Hibernate specific that most people like me doesn't even know.

I mean after this latest change you are now optimizing for the uncommon case, putting all the string manipulation up front. I have no idea if it's a significant performance hit.

Please review latest commit, It should avoid string manipulation for common case.

What I'm saying is that I don't want to do anything here unless we do it properly. In particular I don't want to make this code more complex than it already is just to handle 1% of possible cases. That's just technical debt.

If you really believe that this is important, then it should be done according to something like the approach I've outlined.

quaff · 2024-04-16T02:21:27Z

JDBC type names and codes are chaos, It wasted my several days, this may be my final attempt, I chose another safer way that differentiate updating from validating schema which should ignore user-specified definition, It will minimize impact.

Meanwhile I added a dedicate field columnDefinition instead of existing sqlType to store user-specified definition, It's not required(not confirmed) but worthwhile to not mix user-specified and inferred one.

WDYT @gavinking

quaff · 2024-04-24T02:51:45Z

It fix https://hibernate.atlassian.net/browse/HHH-18004 also.

…finition)

quaff marked this pull request as draft April 10, 2024 05:33

quaff force-pushed the HHH-16531 branch 3 times, most recently from 7d4ff56 to 4eb1925 Compare April 10, 2024 08:08