Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Could sqitch tables use C collation rather than the default collation? #763

Open
datafoo opened this issue Jun 14, 2023 · 4 comments · May be fixed by #765
Open

Could sqitch tables use C collation rather than the default collation? #763

datafoo opened this issue Jun 14, 2023 · 4 comments · May be fixed by #765
Assignees
Labels
engine patched Fixed in a branch but not yet merged.

Comments

@datafoo
Copy link

datafoo commented Jun 14, 2023

PostgreSQL indicates:

The drawback of using locales other than C or POSIX in PostgreSQL is its performance impact. It slows character handling and prevents ordinary indexes from being used by LIKE. For this reason use locales only if you actually need them.

Could Sqitch create its tables using the C collation rather than the default collation?

@datafoo
Copy link
Author

datafoo commented Jun 14, 2023

That would remove the need from re-indexing due to collation version mismatch (see https://www.postgresql.org/docs/current/sql-altercollation.html#SQL-ALTERCOLLATION-NOTES) since the C collation does not change.

@theory
Copy link
Collaborator

theory commented Jun 19, 2023

Yeah, probably a good idea, although even the C and POSIX collations can change when you upgrade glibc. There was a ton of discussion of the issue at PGCon this year, including this session. Maybe C locale would be the least bad, tho, considering that Sqitch mainly relies on indexes for uniqueness rather than sorting (it mainly sorts by dates).

@datafoo
Copy link
Author

datafoo commented Jun 20, 2023

even the C and POSIX collations can change when you upgrade glibc

I was under the impression that C collation doesn't ever change after reading a comment from @laurenz on a blog article he wrote at https://www.cybertec-postgresql.com/en/icu-collations-against-postgresql-data-corruption/#comment-6208123551

[the C collation] doesn't ever change. it is just the order defined by memcmp() of the strings, which only depends on the encoding.

Look here:

The C and POSIX collations both specify “traditional C” behavior, in which only the ASCII letters “A” through “Z” are treated as letters, and sorting is done strictly by character code byte values.

@theory
Copy link
Collaborator

theory commented Jun 20, 2023

Oh, perhaps you're right, I forgot about that. The examples from the presentation use en-US.UTF-8.

theory added a commit that referenced this issue Jun 25, 2023
Applies also to Yugabyte, and separately add for Cockroach. Using the
POSIX collation ensures that index ordering never changes when the
database is upgraded, which is a particular problem with glibc
collations, but since POSIX/C collation is strictly byte-ordered, it
should be fine. Of course, any of use of `ORDER BY` on such columns will
return unexpected results when users are used to other locales, but
since Sqitch itself only ever orders by timestamp, it should not be an
issue in its own use.

Closes #763.
@theory theory linked a pull request Jun 25, 2023 that will close this issue
@theory theory self-assigned this Jun 25, 2023
@theory theory added engine patched Fixed in a branch but not yet merged. labels Jun 25, 2023
theory added a commit that referenced this issue Jun 25, 2023
Applies to Postgres 9.1 and higher and Yugabyte 2.9 and higher.

Using the POSIX collation ensures that index ordering never changes when
the database is upgraded, which is a particular problem with glibc
collations, but since POSIX/C collation is strictly byte-ordered, it
should be fine. Of course, any of use of `ORDER BY` on such columns will
return unexpected results when users are used to other locales, but
since Sqitch itself only ever orders by timestamp, it should not be an
issue in its own use.

Closes #763.
theory added a commit that referenced this issue Jul 1, 2023
Applies to Postgres 9.1 and higher and Yugabyte 2.9 and higher.

Using the POSIX collation ensures that index ordering never changes when
the database is upgraded, which is a particular problem with glibc
collations, but since POSIX/C collation is strictly byte-ordered, it
should be fine. Of course, any of use of `ORDER BY` on such columns will
return unexpected results when users are used to other locales, but
since Sqitch itself only ever orders by timestamp, it should not be an
issue in its own use.

Closes #763.
theory added a commit that referenced this issue Jul 29, 2023
Applies to Postgres 9.1 and higher and Yugabyte 2.9 and higher.

Using the POSIX collation ensures that index ordering never changes when
the database is upgraded, which is a particular problem with glibc
collations, but since POSIX/C collation is strictly byte-ordered, it
should be fine. Of course, any of use of `ORDER BY` on such columns will
return unexpected results when users are used to other locales, but
since Sqitch itself only ever orders by timestamp, it should not be an
issue in its own use.

Closes #763.
theory added a commit that referenced this issue Sep 2, 2023
Applies to Postgres 9.1 and higher and Yugabyte 2.9 and higher.

Using the POSIX collation ensures that index ordering never changes when
the database is upgraded, which is a particular problem with glibc
collations, but since POSIX/C collation is strictly byte-ordered, it
should be fine. Of course, any of use of `ORDER BY` on such columns will
return unexpected results when users are used to other locales, but
since Sqitch itself only ever orders by timestamp, it should not be an
issue in its own use.

Closes #763.
theory added a commit that referenced this issue Feb 4, 2024
Applies to Postgres 9.1 and higher and Yugabyte 2.9 and higher.

Using the POSIX collation ensures that index ordering never changes when
the database is upgraded, which is a particular problem with glibc
collations, but since POSIX/C collation is strictly byte-ordered, it
should be fine. Of course, any of use of `ORDER BY` on such columns will
return unexpected results when users are used to other locales, but
since Sqitch itself only ever orders by timestamp, it should not be an
issue in its own use.

Closes #763.
theory added a commit that referenced this issue Feb 5, 2024
Applies to Postgres 9.1 and higher and Yugabyte 2.9 and higher.

Using the POSIX collation ensures that index ordering never changes when
the database is upgraded, which is a particular problem with glibc
collations, but since POSIX/C collation is strictly byte-ordered, it
should be fine. Of course, any of use of `ORDER BY` on such columns will
return unexpected results when users are used to other locales, but
since Sqitch itself only ever orders by timestamp, it should not be an
issue in its own use.

Closes #763.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
engine patched Fixed in a branch but not yet merged.
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants