Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

have default collation be utf8mb4_0900_ai_ci #7856

Open
jycor opened this issue May 13, 2024 · 3 comments
Open

have default collation be utf8mb4_0900_ai_ci #7856

jycor opened this issue May 13, 2024 · 3 comments
Labels
correctness We don't return the same result as MySQL good repro Easily reproducible bugs

Comments

@jycor
Copy link
Contributor

jycor commented May 13, 2024

By default, MySQL's collation is utf8mb4_0900_ai_ci
https://dev.mysql.com/doc/refman/8.0/en/charset.html#:~:text=The%20default%20MySQL%20server%20character%20set%20and%20collation%20are%20utf8mb4%20and%20utf8mb4_0900_ai_ci

Dolt has it set as utf8mb4_0900_bin.

As a result, we have to set the collation through @@collation_connection (@@persist.collation_connection if you want it to stick for dolt sql and dolt sql -q) to get the same behavior out of the box.

tmp/main*> select @@collation_connection;
+------------------------+
| @@collation_connection |
+------------------------+
| utf8mb4_0900_bin       |
+------------------------+
1 row in set (0.00 sec)

tmp/main*> select 'abc' like 'ABC';
+------------------+
| 'abc' like 'ABC' |
+------------------+
| false            |
+------------------+
1 row in set (0.00 sec)

tmp/main*> set @@collation_connection = utf8mb4_0900_ai_ci;
tmp/main*> select 'abc' like 'ABC';
+------------------+
| 'abc' like 'ABC' |
+------------------+
| true             |
+------------------+
1 row in set (0.00 sec)

Additionally, MySQL's LIKE operator is always case-insensitive, regardless of @@collation_connection
https://stackoverflow.com/questions/14007450/how-do-you-force-mysql-like-to-be-case-sensitive

related: #7851

@jycor jycor added good repro Easily reproducible bugs correctness We don't return the same result as MySQL labels May 13, 2024
@nicktobey
Copy link
Contributor

nicktobey commented May 13, 2024

I believe this is a deliberate decision that we made for performance reasons / compatibility with older versions of Dolt.

@timsehn
Copy link
Sponsor Contributor

timsehn commented May 15, 2024

@Hydrocharged did this for a reason. Please elaborate Daylon.

@Hydrocharged
Copy link
Contributor

Responded to James and forgot to write it here too.

For utf8mb4_0900_bin, it's both legacy and performance. Before we had collations, Go's default string handling operates the exact same as utf8mb4_0900_bin, so we put that collation everywhere since it's technically correct.

Now though it's for performance, since we don't have to do anything special for those strings, but for all other collations we have to handle them in some special way. IIRC _ai_ci collations are pretty heavy, so we avoid them when we can as far as defaults go. In addition, the change would mean that importing the same table in a newer version would create a different table, which is its own issue.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
correctness We don't return the same result as MySQL good repro Easily reproducible bugs
Projects
None yet
Development

No branches or pull requests

4 participants