Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[opt] : when composite primary key has a column that is auto-incr, and this column does not specify a value, no dup check is needed #15810

Closed
wants to merge 8 commits into from

Conversation

jensenojs
Copy link
Contributor

What type of PR is this?

  • API-change
  • BUG
  • Improvement
  • Documentation
  • Feature
  • Test and CI
  • Code Refactoring

Which issue(s) this PR fixes:

issue #15809

What this PR does / why we need it:

when composite primary key has a column that is auto-incr, and this column does not specify a value, no dup check is needed

@matrix-meow matrix-meow added the size/XS Denotes a PR that changes [1, 9] lines label Apr 30, 2024
@jensenojs jensenojs requested a review from fengttt April 30, 2024 06:45
Copy link
Contributor

@fengttt fengttt left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For this PR, I have a concern on correctness.

If we disable the dup check, and someone insert a row with a value, later, auto_incr reuse that value, we will have a data corruption.

@jensenojs
Copy link
Contributor Author

For this PR, I have a concern on correctness.

If we disable the dup check, and someone insert a row with a value, later, auto_incr reuse that value, we will have a data corruption.

if we allow user insert an id value, we invalidate all id caches and start from a value that is bigger than this id. We have already done this and it was confirmed by @zhangxu19830126

@jensenojs
Copy link
Contributor Author

We can supplement relevant bvt test cases

@fengttt
Copy link
Contributor

fengttt commented Apr 30, 2024

For this PR, I have a concern on correctness.
If we disable the dup check, and someone insert a row with a value, later, auto_incr reuse that value, we will have a data corruption.

if we allow user insert an id value, we invalidate all id caches and start from a value that is bigger than this id. We have already done this and it was confirmed by @zhangxu19830126

Now,

  1. add test in bvt that proves you are correct.
  2. This is a high risk, high impact change. So I ask you please to add a table level config, so that we can enable/disable. I would like this config to be OFF by default, but you may insist this to be ON by default.

@jensenojs jensenojs marked this pull request as draft April 30, 2024 10:17
@matrix-meow matrix-meow added size/S Denotes a PR that changes [10,99] lines and removed size/XS Denotes a PR that changes [1, 9] lines labels Apr 30, 2024
@matrix-meow matrix-meow added size/L Denotes a PR that changes [500,999] lines and removed size/S Denotes a PR that changes [10,99] lines labels May 6, 2024
@jensenojs
Copy link
Contributor Author

jensenojs commented May 6, 2024

我理解这里可能有风险的根本原因是

  • 针对自增类型列的插入
    • 如果允许在插入的时候指定值
      • incr service能不能让对应的column cache失效, 并从插入的值中最大的值开始自增

上面这个事情需要有一个足够强的测试去覆盖这个场景, 目前没有, 所以当这里触发bug的时候, 需要dup check兜底

学习了一下incrservice的代码, 这个机制在mo中具体实现的位置大概在 :

if maxValue > 0 {

incr service 会维护插入数据中有值的最大值, 然后拿去更新. 但在column_cache.test中好像确实没有关于这方面的测试, 即并发地调用insertAutoValues, 观察lastInsertValue是否正确 —— 即lastInsertValue是否大于insert时指定/没指定的所有值

所以在正式开始配置前, 我补充了相关的bvtut测试, 其中ut可以(或者说已经.?)做到高并发的检测, bvt只有两个session, 强度不够高, 后面可能需要继续补充.

  • ut详见func testColumnCacheConcurrentlyManualInsert这个工具函数

在实现配置之前, 想再向田博士 @fengttt 请教和确认一下

  • 如果是table level config的话, 假设有N个表, 为了测不是默认配置的场景, 是不是需要手动配置N次之后才能开始测试? 不确定我理解错了没有
    • 如果是这样的话好像会有点啰嗦, session level的配置会不会简单一些?

For this PR, I have a concern on correctness.
If we disable the dup check, and someone insert a row with a value, later, auto_incr reuse that value, we will have a data corruption.

if we allow user insert an id value, we invalidate all id caches and start from a value that is bigger than this id. We have already done this and it was confirmed by @zhangxu19830126

Now,

  1. add test in bvt that proves you are correct.
  2. This is a high risk, high impact change. So I ask you please to add a table level config, so that we can enable/disable. I would like this config to be OFF by default, but you may insist this to be ON by default.

@fengttt
Copy link
Contributor

fengttt commented May 7, 2024

在实现配置之前, 想再向田博士 @fengttt 请教和确认一下

  • 如果是table level config的话, 假设有N个表, 为了测不是默认配置的场景, 是不是需要手动配置N次之后才能开始测试? 不确定我理解错了没有

    • 如果是这样的话好像会有点啰嗦, session level的配置会不会简单一些?

If you can, this should be a table level config. We accept the manual work if we need to configure it on several tables during test.

This should not a session level config. Cannot depend on user to set this one correctly.

@matrix-meow matrix-meow added size/XXL Denotes a PR that changes 2000+ lines and removed size/L Denotes a PR that changes [500,999] lines labels May 7, 2024
test/distributed/cases/dml/insert/insert_auto.sql Outdated Show resolved Hide resolved
pkg/frontend/authenticate.go Outdated Show resolved Hide resolved
Copy link
Contributor

@fengttt fengttt left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Test has no coverage on the config. Please add.

@jensenojs
Copy link
Contributor Author

还有一点需要补充的是, 目前auto incr unique key是没有这个优化的, 也就是说unique key无论如何都会进行去重, 已经建立了相关的issue 进行跟踪, 后续close对应issue的时候会补充相关的测试用例

@jensenojs jensenojs force-pushed the autopk_opt2 branch 2 times, most recently from 504af7c to fe8da77 Compare May 11, 2024 10:25
@jensenojs jensenojs force-pushed the autopk_opt2 branch 3 times, most recently from f0886a5 to 0e3ab6e Compare May 14, 2024 08:09
@jensenojs jensenojs force-pushed the autopk_opt2 branch 5 times, most recently from 13ee4ec to 7daa23b Compare May 23, 2024 10:22
@jensenojs
Copy link
Contributor Author

jensenojs commented May 23, 2024

性能对比 :

sysbench-insert 在深圳的台式机上跑了三次, 针对主键带有auto incr的看, 性能有所提升.

image

配置行为描述

现在给mo添加了一个DB level级别的配置 : unique_check_on_autoincr, 它主要针对下面的场景

针对场景

create table t(a int auto_increment, b int, primary key(a));
insert into t(b) values (1); -- *

选项, 以及默认值

对于insert auto pk的时候, 如果没有指定值, 理论上是不会有重复数据的, 可以取消掉去重的计划. 现在这个配置就是控制是否需要避免去重, 有三个合法的values :

  • None : 在insert auto pk 且值是由incr service指定的时候, 不去重

  • Check : 在insert auto pk 且值是由incr service指定的时候, 仍然去重

  • Error : 在insert auto pk 且值是用户指定的时候, 报错When unique_check_on_autoincr is set to error, insertion of the specified value into auto-incr pk column is not allowed

  • 默认值说明

    • 对于用户表来说, 默认值是None
    • 对于mo内部的表来说, 永远是Check

读写方式 :

-- 读取
select `variable_value` from mo_catalog.mo_mysql_compatibility_mode where dat_name =tblName and `variable_name`="unique_check_on_autoincr";

-- 修改
alter database insert_auto_pk set unique_check_on_autoincr = "Check";

上面的读写方式会通过后台SQL的方式将配置写入mo_mysql_compatibility_mode表中, , insert的时候读取这个配置信息的方式是通过 session 字段中 cache住的信息, 在多session或者多cn下可能会outdated. 如果session 1对这个字段进行了修改, session 2上的配置如果要同步, 可以再执行一次use xxx, use xxxalter xxx都会通过后台SQL获取最新的配置信息.

遗留工作

这个pr给前端添加了一个获取config的接口, 和resloveVariable那一套的体系是分开的, 第一个DB level config的配置编写是硬hack的, 所以现在的实现其实也很不优雅, 需要重构, 不然后面table level config更不可能实现, 下面的issue 待分配.

@jensenojs jensenojs requested a review from fengttt May 24, 2024 08:05
@jensenojs
Copy link
Contributor Author

现在restore相关的bvt并不稳定, 而且这个pr的内容也太大了, 我会把它拆成两个部分的内容去提交.

  • Database level config
  • unique_check_on_autoincr 的配置

@jensenojs jensenojs closed this May 24, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
do-not-merge/wip kind/enhancement size/XXL Denotes a PR that changes 2000+ lines
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants