Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Bug]: run tpcc test and then restore report Duplicate entry #16100

Open
1 task done
YANGGMM opened this issue May 14, 2024 · 22 comments
Open
1 task done

[Bug]: run tpcc test and then restore report Duplicate entry #16100

YANGGMM opened this issue May 14, 2024 · 22 comments
Assignees
Labels
kind/bug Something isn't working resolved/v1.2.1 severity/s0 Extreme impact: Cause the application to break down and seriously affect the use
Milestone

Comments

@YANGGMM
Copy link
Contributor

YANGGMM commented May 14, 2024

Is there an existing issue for the same bug?

  • I have checked the existing issues.

Branch Name

main

Commit ID

newest

Other Environment Information

- Hardware parameters:
- OS type:
- Others:

Actual Behavior

mysql> restore account sys from snapshot sp_01;
ERROR 1062 (HY000): Duplicate entry '1' for key 'hist_id'

mysql> restore account sys database tpcc from snapshot sp_01;
ERROR 1062 (HY000): Duplicate entry '8193' for key 'hist_id'

Expected Behavior

No response

Steps to Reproduce

1.create snapshot 

2.run tpcc test

3.restore

Additional information

No response

@YANGGMM YANGGMM added kind/bug Something isn't working needs-triage labels May 14, 2024
@YANGGMM YANGGMM assigned matrix-meow and YANGGMM and unassigned matrix-meow May 14, 2024
@matrix-meow matrix-meow added this to the 1.2.0 milestone May 14, 2024
@YANGGMM YANGGMM added the severity/s0 Extreme impact: Cause the application to break down and seriously affect the use label May 14, 2024
@YANGGMM
Copy link
Contributor Author

YANGGMM commented May 14, 2024

尝试复现

mysql> show create table bmsql_history ;
+---------------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Table         | Create Table                                                                                                                                                                                                                                                                                                                                         |
+---------------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| bmsql_history | CREATE TABLE `bmsql_history` (
`hist_id` INT NOT NULL AUTO_INCREMENT,
`h_c_id` INT DEFAULT NULL,
`h_c_d_id` INT DEFAULT NULL,
`h_c_w_id` INT DEFAULT NULL,
`h_d_id` INT DEFAULT NULL,
`h_w_id` INT DEFAULT NULL,
`h_date` TIMESTAMP DEFAULT NULL,
`h_amount` DECIMAL(6,2) DEFAULT NULL,
`h_data` VARCHAR(24) DEFAULT NULL,
PRIMARY KEY (`hist_id`)
) |
+---------------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
1 row in set (0.00 sec)

mysql> create table `bmsql_history_restore`(`hist_id` INT NOT NULL AUTO_INCREMENT,`h_c_id` INT DEFAULT NULL,`h_c_d_id` INT DEFAULT NULL,`h_c_w_id` INT DEFAULT NULL,`h_d_id` INT DEFAULT NULL,`h_w_id` INT DEFAULT NULL,`h_date` TIMESTAMP DEFAULT NULL,`h_amount` DECIMAL(6,2) DEFAULT NULL,`h_data` VARCHAR(24) DEFAULT NULL,PRIMARY KEY (`hist_id`));
Query OK, 0 rows affected (0.07 sec)

mysql> insert into `bmsql_history_restore` select * from `bmsql_history` {snapshot = 'sp_01'};
Query OK, 300000 rows affected (3.27 sec)

恢复成功,而且数据量正确

@YANGGMM
Copy link
Contributor Author

YANGGMM commented May 14, 2024

用restore 报错

mysql> restore account sys database tpcc table `bmsql_history` from snapshot sp_01;
ERROR 1062 (HY000): Duplicate entry '8193' for key 'hist_id'

@YANGGMM
Copy link
Contributor Author

YANGGMM commented May 14, 2024

image

@YANGGMM
Copy link
Contributor Author

YANGGMM commented May 15, 2024

测试数据量最大的表restore,正常恢复

mysql> restore account sys database tpcc table bmsql_order_line from snapshot sp_01;
Query OK, 0 rows affected (12 min 57.49 sec)

恢复后数据一切正常

mysql> select count(*) from bmsql_order_line;
+----------+
| count(*) |
+----------+
|  2998810 |
+----------+
1 row in set (0.04 sec)

mysql> select count(*) from bmsql_order_line{snapshot = 'sp_01'};
+----------+
| count(*) |
+----------+
|  2998810 |
+----------+
1 row in set (0.04 sec)

@YANGGMM
Copy link
Contributor Author

YANGGMM commented May 15, 2024

mysql>  restore account sys database tpcc table `bmsql_history` from snapshot sp_01;
ERROR 1062 (HY000): Duplicate entry '16385' for key 'hist_id'
mysql> 
{"level":"ERROR","time":"2024/05/15 11:08:30.914112 +0800","caller":"fuzzyfilter/filter.go:275","msg":"error: Duplicate entry '16385' for key 'hist_id'","span":{"trace_id":"3df833af-1e84-c416-e634-d8f9cc4a1ed0","span_id":"d03e90d309a9ddaf"}}

@YANGGMM
Copy link
Contributor Author

YANGGMM commented May 15, 2024

cerate database tpcc;
use tpcc;
create table `bmsql_history` (`hist_id` INT NOT NULL AUTO_INCREMENT,`h_c_id` INT DEFAULT NULL,`h_c_d_id` INT DEFAULT NULL,`h_c_w_id` INT DEFAULT NULL,`h_d_id` INT DEFAULT NULL,`h_w_id` INT DEFAULT NULL,`h_date` TIMESTAMP DEFAULT NULL,`h_amount` DECIMAL(6,2) DEFAULT NULL,`h_data` VARCHAR(24) DEFAULT NULL,PRIMARY KEY (`hist_id`));


--load file
tpcccust-hist.csv

create snapshot sp_01 for account sys;

 restore account sys database tpcc table `bmsql_history` from snapshot sp_01;

@jensenojs
Copy link
Contributor

mysql>  restore account sys database tpcc table `bmsql_history` from snapshot sp_01;
ERROR 1062 (HY000): Duplicate entry '16385' for key 'hist_id'
mysql> 
{"level":"ERROR","time":"2024/05/15 11:08:30.914112 +0800","caller":"fuzzyfilter/filter.go:275","msg":"error: Duplicate entry '16385' for key 'hist_id'","span":{"trace_id":"3df833af-1e84-c416-e634-d8f9cc4a1ed0","span_id":"d03e90d309a9ddaf"}}
  1. 和钊哥一起看了一下, fuzzy 的table scan不应该读取上来数据, 但读上来了.

  2. 在后台SQL的执行过程中用select count(*) from tblName试了一下, 没有读取上来数据

@Ariznawlll
Copy link
Contributor

Ariznawlll commented May 15, 2024

情景一:
单机测试了sysbench 点查
commit:16a954e8eaa0c37f36c1fd6cc01691511758b69f
数据量:10w
测试步骤1:

sys租户连接mo:
create database sysbench_db;

本地:git clone https://github.com/matrixorigin/mo-sysbench.git

cd mo-sysbench
sysbench  --mysql-host=127.0.0.1 --mysql-port=6001 --mysql-user=dump --mysql-password=111 oltp_point_select.lua --mysql-db=sysbench_db --tables=10 --table_size=100000 --threads=100 --time=30 --report-interval=10    cleanup

建db并插入数据
sysbench  --mysql-host=127.0.0.1 --mysql-port=6001 --mysql-user=dump --mysql-password=111 oltp_point_select.lua --mysql-db=sysbench_db --tables=10 --table_size=100000 --threads=100 --time=30 --report-interval=10    prepare

sys租户连接mo:
create snapshot sp01 for account sys;

执行点查测试
cd mo-sysbench
sysbench  --mysql-host=127.0.0.1 --mysql-port=6001 --mysql-user=dump --mysql-password=111   oltp_point_select.lua --mysql-db=sysbench_db --tables=10 --table_size=100000 --threads=100 --time=300 --report-interval=10    --range_selects=off --point_selects=1 run

sys租户连接mo:
restore account sys from snapshot sp01;

表schema:
image

restore结果:
image

测试步骤2:

sys租户连接mo:
create database sysbench_db;

本地:git clone https://github.com/matrixorigin/mo-sysbench.git

cd mo-sysbench
sysbench  --mysql-host=127.0.0.1 --mysql-port=6001 --mysql-user=dump --mysql-password=111 oltp_point_select.lua --mysql-db=sysbench_db --tables=10 --table_size=100000 --threads=100 --time=30 --report-interval=10    cleanup

建db并插入数据
sysbench  --mysql-host=127.0.0.1 --mysql-port=6001 --mysql-user=dump --mysql-password=111 oltp_point_select.lua --mysql-db=sysbench_db --tables=10 --table_size=100000 --threads=100 --time=30 --report-interval=10    prepare

sys租户连接mo:
create snapshot sp01 for account sys;

什么测试也不执行直接执行下面的restore

sys租户连接mo:
restore account sys from snapshot sp01;

表schema:
image

image

情景二:
单机执行tpcc 10-1000测试:

建表:https://github.com/matrixorigin/mo-tpcc/blob/main/sql/tableCreates.sql
mysql -h 127.0.0.1 -P 6001 -udump -p111
create snapshot sp03 for account sys;

执行tpcc 10-1000测试
restore account sys from snapshot sp03;
image

表schema:
image

@YANGGMM
Copy link
Contributor Author

YANGGMM commented May 15, 2024

单独开事务执行是没有问题的

mysql> begin;
Query OK, 0 rows affected (0.00 sec)

mysql> drop table if exists `bmsql_history`;
Query OK, 0 rows affected (0.01 sec)

mysql> use tpcc;
Database changed
mysql> 
mysql> create table `bmsql_history` (`hist_id` INT NOT NULL AUTO_INCREMENT,`h_c_id` INT DEFAULT NULL,`h_c_d_id` INT DEFAULT NULL,`h_c_w_id` INT DEFAULT NULL,`h_d_id` INT DEFAULT NULL,`h_w_id` INT DEFAULT NULL,`h_date` TIMESTAMP DEFAULT NULL,`h_amount` DECIMAL(6,2) DEFAULT NULL,`h_data` VARCHAR(24) DEFAULT NULL,PRIMARY KEY (`hist_id`));
Query OK, 0 rows affected (0.02 sec)

mysql> insert into `tpcc`.`bmsql_history` SELECT * FROM `tpcc`.`bmsql_history` {snapshot = 'sp_01'};
Query OK, 300000 rows affected (1.93 sec)

mysql> commit;
Query OK, 0 rows affected (0.48 sec)

@YANGGMM
Copy link
Contributor Author

YANGGMM commented May 15, 2024

image

@Ariznawlll Ariznawlll modified the milestones: 1.2.0, 1.2.1 May 15, 2024
@jensenojs
Copy link
Contributor

企业微信截图_5145987b-4371-4b75-a315-01ec85f513b4

mo-service-backend.log
mo-service-frontend.log

分别打印了frontend/backend执行的snapshot进行对比, 发现frontend比backend多了点东西, 需要向相关同事请教一下是什么, 打印日志的基本格式如下所示

// func restoreToDatabaseOrTable
	if tblName != "" {
		if b, _  := bh.(*backExec); b != nil {
			fmt.Println("---")
			fmt.Println("drop table")	
			fmt.Println(b.backSes.txnHandler.txnOp.GetOverview().Meta.SnapshotTS)
			fmt.Println("---")
		}
		if err = bh.Exec(toCtx, "drop table if exists "+dbName+"."+tblName); err != nil {
			return
		}
	}

单独开事务执行是没有问题的

mysql> begin;
Query OK, 0 rows affected (0.00 sec)

mysql> drop table if exists `bmsql_history`;
Query OK, 0 rows affected (0.01 sec)

mysql> use tpcc;
Database changed
mysql> 
mysql> create table `bmsql_history` (`hist_id` INT NOT NULL AUTO_INCREMENT,`h_c_id` INT DEFAULT NULL,`h_c_d_id` INT DEFAULT NULL,`h_c_w_id` INT DEFAULT NULL,`h_d_id` INT DEFAULT NULL,`h_w_id` INT DEFAULT NULL,`h_date` TIMESTAMP DEFAULT NULL,`h_amount` DECIMAL(6,2) DEFAULT NULL,`h_data` VARCHAR(24) DEFAULT NULL,PRIMARY KEY (`hist_id`));
Query OK, 0 rows affected (0.02 sec)

mysql> insert into `tpcc`.`bmsql_history` SELECT * FROM `tpcc`.`bmsql_history` {snapshot = 'sp_01'};
Query OK, 300000 rows affected (1.93 sec)

mysql> commit;
Query OK, 0 rows affected (0.48 sec)

@jensenojs
Copy link
Contributor

企业微信截图_5145987b-4371-4b75-a315-01ec85f513b4 [mo-service-backend.log](https://github.com/matrixorigin/matrixone/files/15318756/mo-service-backend.log) [mo-service-frontend.log](https://github.com/matrixorigin/matrixone/files/15318757/mo-service-frontend.log)

分别打印了frontend/backend执行的snapshot进行对比, 发现frontend比backend多了点东西, 需要向相关同事请教一下是什么, 打印日志的基本格式如下所示

// func restoreToDatabaseOrTable
	if tblName != "" {
		if b, _  := bh.(*backExec); b != nil {
			fmt.Println("---")
			fmt.Println("drop table")	
			fmt.Println(b.backSes.txnHandler.txnOp.GetOverview().Meta.SnapshotTS)
			fmt.Println("---")
		}
		if err = bh.Exec(toCtx, "drop table if exists "+dbName+"."+tblName); err != nil {
			return
		}
	}

单独开事务执行是没有问题的

mysql> begin;
Query OK, 0 rows affected (0.00 sec)

mysql> drop table if exists `bmsql_history`;
Query OK, 0 rows affected (0.01 sec)

mysql> use tpcc;
Database changed
mysql> 
mysql> create table `bmsql_history` (`hist_id` INT NOT NULL AUTO_INCREMENT,`h_c_id` INT DEFAULT NULL,`h_c_d_id` INT DEFAULT NULL,`h_c_w_id` INT DEFAULT NULL,`h_d_id` INT DEFAULT NULL,`h_w_id` INT DEFAULT NULL,`h_date` TIMESTAMP DEFAULT NULL,`h_amount` DECIMAL(6,2) DEFAULT NULL,`h_data` VARCHAR(24) DEFAULT NULL,PRIMARY KEY (`hist_id`));
Query OK, 0 rows affected (0.02 sec)

mysql> insert into `tpcc`.`bmsql_history` SELECT * FROM `tpcc`.`bmsql_history` {snapshot = 'sp_01'};
Query OK, 300000 rows affected (1.93 sec)

mysql> commit;
Query OK, 0 rows affected (0.48 sec)

从日志看, 不管是前端执行还是后端执行, snapshot都不止一个. 都是有问题的

@YANGGMM
Copy link
Contributor Author

YANGGMM commented May 16, 2024

mysql> begin;
Query OK, 0 rows affected (0.01 sec)

mysql> 
mysql> drop table if exists `bmsql_history`;
Query OK, 0 rows affected (0.03 sec)

mysql> use tpcc;
Database changed
mysql> 
mysql> show full tables from `tpcc` {snapshot = 'sp_01'};
+------------------+------------+
| Tables_in_tpcc   | Table_type |
+------------------+------------+
| bmsql_config     | BASE TABLE |
| bmsql_customer   | BASE TABLE |
| bmsql_district   | BASE TABLE |
| bmsql_history    | BASE TABLE |
| bmsql_item       | BASE TABLE |
| bmsql_new_order  | BASE TABLE |
| bmsql_oorder     | BASE TABLE |
| bmsql_order_line | BASE TABLE |
| bmsql_stock      | BASE TABLE |
| bmsql_warehouse  | BASE TABLE |
+------------------+------------+
10 rows in set (0.02 sec)

mysql> show create table `tpcc`.`bmsql_history` {snapshot = 'sp_01'};
+---------------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Table         | Create Table                                                                                                                                                                                                                                                                                                                                         |
+---------------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| bmsql_history | CREATE TABLE `bmsql_history` (
`hist_id` INT NOT NULL AUTO_INCREMENT,
`h_c_id` INT DEFAULT NULL,
`h_c_d_id` INT DEFAULT NULL,
`h_c_w_id` INT DEFAULT NULL,
`h_d_id` INT DEFAULT NULL,
`h_w_id` INT DEFAULT NULL,
`h_date` TIMESTAMP DEFAULT NULL,
`h_amount` DECIMAL(6,2) DEFAULT NULL,
`h_data` VARCHAR(24) DEFAULT NULL,
PRIMARY KEY (`hist_id`)
) |
+---------------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
1 row in set (0.01 sec)

mysql> CREATE TABLE `bmsql_history` (
    -> `hist_id` INT NOT NULL AUTO_INCREMENT,
    -> `h_c_id` INT DEFAULT NULL,
    -> `h_c_d_id` INT DEFAULT NULL,
    -> `h_c_w_id` INT DEFAULT NULL,
    -> `h_d_id` INT DEFAULT NULL,
    -> `h_w_id` INT DEFAULT NULL,
    -> `h_date` TIMESTAMP DEFAULT NULL,
    -> `h_amount` DECIMAL(6,2) DEFAULT NULL,
    -> `h_data` VARCHAR(24) DEFAULT NULL,
    -> PRIMARY KEY (`hist_id`)
    -> );
Query OK, 0 rows affected (0.03 sec)

mysql> insert into `tpcc`.`bmsql_history` SELECT * FROM `tpcc`.`bmsql_history` {snapshot = 'sp_01'};
Query OK, 300000 rows affected (8.63 sec)

mysql> commit;
Query OK, 0 rows affected (0.48 sec)

@YANGGMM
Copy link
Contributor Author

YANGGMM commented May 16, 2024

在执行restore中insert之前先提交事务,restore就可以顺利通过,感觉是backGroundExec的事务问题
image

image

@YANGGMM
Copy link
Contributor Author

YANGGMM commented May 16, 2024

在statement 执行入口添加log

-----------------------
executeStmtWithIncrStmt
sql : restore account sys database tpcc table `bmsql_history` from snapshot sp_01
snapshotTs : 1715844568861921000
-----------------------

-----------------------
executeStmtWithIncrStmt
sql : drop table if exists tpcc.bmsql_history
snapshotTs : 1715844568865944000
-----------------------


-----------------------
executeStmtWithIncrStmt
sql : show full tables from `tpcc` {snapshot = 'sp_01'}
snapshotTs : 1715844568884104000
-----------------------

-----------------------
executeStmtWithIncrStmt
sql : show create table `tpcc`.`bmsql_history` {snapshot = 'sp_01'}
snapshotTs : 1715844568887886000
-----------------------

-----------------------
executeStmtWithIncrStmt
sql : CREATE TABLE `bmsql_history` (
`hist_id` INT NOT NULL AUTO_INCREMENT,
`h_c_id` INT DEFAULT NULL,
`h_c_d_id` INT DEFAULT NULL,
`h_c_w_id` INT DEFAULT NULL,
`h_d_id` INT DEFAULT NULL,
`h_w_id` INT DEFAULT NULL,
`h_date` TIMESTAMP DEFAULT NULL,
`h_amount` DECIMAL(6,2) DEFAULT NULL,
`h_data` VARCHAR(24) DEFAULT NULL,
PRIMARY KEY (`hist_id`)
)
snapshotTs : 1715844568887886000
-----------------------


-----------------------
executeStmtWithIncrStmt
sql : insert into `tpcc`.`bmsql_history` SELECT * FROM `tpcc`.`bmsql_history` {snapshot = 'sp_01'}
snapshotTs : 1715844568903943000
-----------------------

@YANGGMM
Copy link
Contributor Author

YANGGMM commented May 16, 2024

frontend 执行的log

-----------------------
executeStmtWithIncrStmt
sql : drop table if exists tpcc.bmsql_history
snapshotTs : 1715845293989984000
-----------------------

-----------------------
executeStmtWithIncrStmt
sql : show full tables from `tpcc` {snapshot = 'sp_01'}
snapshotTs : 1715845319434256000
-----------------------


-----------------------
executeStmtWithIncrStmt
sql : show create table `tpcc`.`bmsql_history` {snapshot = 'sp_01'}
snapshotTs : 1715845329498667000
-----------------------
-----------------------
executeStmtWithIncrStmt
sql : CREATE TABLE `bmsql_history` ( `hist_id` INT NOT NULL AUTO_INCREMENT, `h_c_id` INT DEFAULT NULL, `h_c_d_id` INT DEFAULT NULL, `h_c_w_id` INT DEFAULT NULL, `h_d_id` INT DEFAULT NULL, `h_w_id` INT DEFAULT NULL, `h_date` TIMESTAMP DEFAULT NULL, `h_amount` DECIMAL(6,2) DEFAULT NULL, `h_data` VARCHAR(24) DEFAULT NULL, PRIMARY KEY (`hist_id`) )
snapshotTs : 1715845333868618000
-----------------------

-----------------------
executeStmtWithIncrStmt
sql : insert into `tpcc`.`bmsql_history` SELECT * FROM `tpcc`.`bmsql_history` {snapshot = 'sp_01'}
snapshotTs : 1715845337754797000
-----------------------

@YANGGMM
Copy link
Contributor Author

YANGGMM commented May 16, 2024

compile阶段的ts 跟 前端的ts 以及 snapshot的ts都对得上

-----------------------
executeStmtWithIncrStmt
sql : insert into `tpcc`.`bmsql_history` SELECT * FROM `tpcc`.`bmsql_history` {snapshot = 'sp_01'}
snapshotTs : 1715847008536497000
-----------------------

-----------------------
compileTableScanDataSource
schemaName : tpcc
relationName : bmsql_history
snapshotTs : 1715847008536497000
-----------------------


-----------------------
compileTableScanDataSource
schemaName : tpcc
relationName : bmsql_history
is snapshot op: true
snapshotTs : 1715846766094095000
-----------------------
image

@triump2020
Copy link
Contributor

The cause had been located, wait for PR

@triump2020
Copy link
Contributor

triump2020 commented May 17, 2024

  1. 基本上可以判定自增列有关系, 事务 的 insert into select xxx statement 执行 与 update mo_incremental_columns 并行执行, 每个statement 在 compile.Run 前会更新 txn.snapshotWriteOffset, 导致了reader 出现了万圣节问题.
  2. 这个问题 跟snaphsot read 无关,只是 snapshot read 更容易暴露问题.

@YANGGMM YANGGMM assigned qingxinhome and unassigned YANGGMM May 21, 2024
@qingxinhome
Copy link
Contributor

Got it, deal with it as soon as possible

@qingxinhome
Copy link
Contributor

Positioning

@qingxinhome
Copy link
Contributor

qingxinhome commented May 29, 2024

问题原因详情已经
FixRestoreDuplicateEntry#16100.zip

确认,请参考文档

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
kind/bug Something isn't working resolved/v1.2.1 severity/s0 Extreme impact: Cause the application to break down and seriously affect the use
Projects
None yet
Development

No branches or pull requests

8 participants