Indices on VARCHAR(255) break if charset "utf8mb4" is used with MariaDB #831

Open
phish108 opened this issue Feb 9, 2019 · 1 comment

phish108 commented Feb 9, 2019

I ran across an issue with the MariaDB/MySQL backend.

The old-style VARCHAR(255) columns break the database spec if the utf8mb4 charset is used.

utf8mb4 supports the full UTF-8 range, with characters of up to 4 bytes each.
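To make the "up to 4 bytes" point concrete, here is a minimal Python sketch that measures how many bytes single characters take in UTF-8. The helper name `utf8_width` and the sample characters are only illustrative and are not part of Movable Type.

```python
def utf8_width(ch: str) -> int:
    """Return the number of bytes one character occupies in UTF-8."""
    return len(ch.encode("utf-8"))


# Illustrative samples: ASCII, accented Latin, a BMP symbol, and an emoji.
samples = ["a", "é", "€", "😀"]
widths = {ch: utf8_width(ch) for ch in samples}

for ch, width in widths.items():
    print(f"{ch!r}: {width} byte(s)")
```

Only the emoji needs the fourth byte, which is the case that utf8mb4 exists to cover.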

If the default charset of a database is set to utf8mb4, the MT database specification breaks, because the maximum size of an index column is 1000 bytes. With utf8mb4, however, an indexed VARCHAR(255) field can need up to 255 × 4 = 1020 bytes.
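The arithmetic can be checked with a short Python sketch. It assumes the 1000-byte index limit quoted in this issue; the actual limit depends on the storage engine and row format, so treat `INDEX_LIMIT_BYTES` as an assumption rather than a fixed constant. The function names are hypothetical.

```python
# Index key limit as quoted in this issue (assumption: varies by engine/row format).
INDEX_LIMIT_BYTES = 1000


def worst_case_index_bytes(length_chars: int, max_bytes_per_char: int) -> int:
    """Worst-case byte size of an indexed VARCHAR(length_chars) column."""
    return length_chars * max_bytes_per_char


def fits_in_index(length_chars: int, max_bytes_per_char: int) -> bool:
    """True if the column's worst case stays within the index limit."""
    return worst_case_index_bytes(length_chars, max_bytes_per_char) <= INDEX_LIMIT_BYTES


# utf8mb4 reserves up to 4 bytes per character.
print(worst_case_index_bytes(255, 4), fits_in_index(255, 4))
```

With 4 bytes per character the worst case is 1020 bytes, which exceeds the assumed limit by 20 bytes.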

This is a problem if emojis are used, for example in titles, author names, and other user-editable metadata. Unfortunately, users love using emojis pretty much everywhere ;)

There are two approaches to fix this:

  1. Reduce the field size to 250 and trim existing data if necessary (would not be a problem here).

  2. Keep the user-editable fields at length 255 in utf8mb4 and downsize them into 255-character, 3-byte utf8 strings for the corresponding index fields (not sure if that makes sense).
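As a rough illustration of the first approach, the following Python sketch derives the longest column length that fits the limit and trims a value to it. It again assumes the 1000-byte limit from this issue; `max_indexable_chars` and `trim_for_index` are hypothetical helpers, not Movable Type functions, and a real migration would do the trimming in SQL.

```python
INDEX_LIMIT_BYTES = 1000        # limit quoted in this issue (assumption)
UTF8MB4_MAX_BYTES_PER_CHAR = 4  # worst case per character in utf8mb4


def max_indexable_chars(limit_bytes: int = INDEX_LIMIT_BYTES,
                        bytes_per_char: int = UTF8MB4_MAX_BYTES_PER_CHAR) -> int:
    """Largest VARCHAR length whose worst case still fits the index limit."""
    return limit_bytes // bytes_per_char


def trim_for_index(value: str, max_chars: int = max_indexable_chars()) -> str:
    """Trim a value to max_chars code points (the unit VARCHAR lengths count)."""
    return value[:max_chars]


title = "😀" * 255                 # worst case: every character needs 4 bytes
trimmed = trim_for_index(title)
print(max_indexable_chars(), len(trimmed), len(trimmed.encode("utf-8")))
```

Trimming by code points can cut a multi-code-point emoji sequence in half, so existing data would be worth inspecting before it is shortened.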


ghost commented Feb 12, 2019

@phish108
Thanks for your comment.
Movable Type does not support the utf8mb4 encoding yet. Sorry for the inconvenience.
