Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

MySQL SQL文件审核定位行数错误 #2240

Open
taolx0 opened this issue Jan 31, 2024 · 2 comments
Open

MySQL SQL文件审核定位行数错误 #2240

taolx0 opened this issue Jan 31, 2024 · 2 comments
Assignees
Labels
bug Something isn't working
Milestone

Comments

@taolx0
Copy link
Collaborator

taolx0 commented Jan 31, 2024

verison

UI: main aeee2f9d
DMS: main-ee 19ee2c1ea2
SQLE: main-ee 342c635080

错误复现

发现两个错误

1. 不能解析的SQL会被认为是同一行(github不能上传.sql文件,将下面的sql复制到sql文件复现)

create table point_trans_shard_00_part_202401(like point_trans_shard_00 including all) inherits(point_trans_shard_00);
Alter table point_trans_shard_00_part_202401 ADD CONSTRAINT chk_point_trans_shard_202401 CHECK (processedtime >= '1704038400000'::bigint AND processedtime < '1706716800000'::bigint );
create table point_trans_source_shard_00_part_202401(like point_trans_source_shard_00 including all) inherits(point_trans_source_shard_00);
Alter table point_trans_source_shard_00_part_202401 ADD CONSTRAINT chk_point_trans_source_shard_202401 CHECK (processedtime >= '1704038400000'::bigint AND processedtime < '1706716800000'::bigint );
grant all on point_trans_shard_00_part_202401 to kgoldpointapp;
grant select on point_trans_shard_00_part_202401 to prd_fin, dbsec, sec_db_scan;
grant all on point_trans_source_shard_00_part_202401 to kgoldpointapp;
grant select on point_trans_source_shard_00_part_202401 to prd_fin, dbsec, sec_db_scan;

20240131103313

2. sql末尾有“;”的sql,影响行数错误

grant all on point_trans_shard_00_part_202401 to kgoldpointapp;
grant select on point_trans_shard_00_part_202401 to prd_fin, dbsec, sec_db_scan;
grant all on point_trans_source_shard_00_part_202401 to kgoldpointapp;;
grant select on point_trans_source_shard_00_part_202401 to prd_fin, dbsec, sec_db_scan;

20240131103815

3 当有语法错误时,展示的行数应该是sql的起始行数,而不是语法出错的行数

有语法错误的sql起始行数是第2行
image

这里应该显示sql定位行数为2,但显示为出错的行数
image

问题1原因

问题背景:问题一中的第一到四条sql都属于解析器无法解析的sql,第5到8条属于解析器可以解析的sql.

当解析器遇到不能解析的sql时,会走到图1不能解析sql的逻辑,在这个逻辑中,会将Line值设为默认值1,遇到\n时解析器会默认将Line值加1,每次遇到解析解析器不支持的sql时,解析器都会将Line值设为1,所以第二到四条sql的行数都为2.
img_4
img_5
img_6
img_7

问题1解决方案

每次当解析器遇到不能解析的sql时,累加startLineOffset行数的值,这样就可以解决问题1

问题1影响面

问题1中的开始行数显示正常
image
image

问题3原因

因为为Line值是由\n的个数决定的,问题3的sql存在两个\n,所以Line值为3,导致页面行号行号显示为3
image

问题3解决方案

@taolx0 taolx0 added the bug Something isn't working label Jan 31, 2024
@ColdWaterLW ColdWaterLW added this to the 3.2404.0 milestone Apr 16, 2024
@taolx0 taolx0 self-assigned this Apr 17, 2024
@actiontech actiontech deleted a comment from ColdWaterLW Apr 17, 2024
This was referenced Apr 18, 2024
@taolx0
Copy link
Collaborator Author

taolx0 commented Apr 23, 2024

问题3备注

背景

目前使用的是 parser.lexer.r.pos().Line 的值用作当前行号,parser.lexer.r.pos().Line的值实际是指当前解析的字符的行号,而不是开始行数.所以当遇到有换行符且不能解析的sql时, 解析器的行号会停在出错的行号,导致行号不准确.例如有如下sql,解析器会停在第3行,而不是第2行,导致开始行号不准确

select 1;
select *
fromd
tt;

一些尝试

思路

既然通过解析器的 parser.lexer.r.pos().Line 值来获取开始行号不准确,那么可以换一种思路,先获取当前行数解析点之前的sql,
然后计算当前解析点之前sql的总的行号,再加上当前解析的sql的换行符符前缀数量,就是当前sql的开始行号.

实现步骤

  1. Parser struct 增加 endLineOffset 字段,用于记录当前解析点之前的sql的总行数
type Parser struct {
	charset    string
	collation  string
	result     []ast.StmtNode
	src        string
	lexer      Scanner
	hintParser *hintParser

	// the following fields are used by yyParse to reduce allocation.
	cache  []yySymType
	yylval yySymType
	yyVAL  *yySymType

	startLineOffset int
	endLineOffset   int
}
  1. 实现思路 (perfect_parser.go)
    实现思路如下, 通过解析器的解析结果,获取当前解析点之前的sql的总行数,然后计算当前解析的sql的开始行号.
    但是这种思路依赖对解析器的解析结果进行分析处理,处理能力有限且容易出错,我觉得解决思路还是应该在解析器内部实现.
package parser

import (
	"bytes"

	"github.com/pingcap/parser/ast"
)

// PerfectParse parses a query string to raw ast.StmtNode. support parses query string
// who contains unparsed SQL, the unparsed SQL will be parses to ast.UnparsedStmt.
func (parser *Parser) PerfectParse(sql, charset, collation string) (stmt []ast.StmtNode, warns []error, err error) {
	_, warns, err = parser.Parse(sql, charset, collation)
	stmts := parser.result
	parser.updateStartLineWithOffset(stmts)
	if err == nil {
		return stmts, warns, nil
	}
	// if err is not nil, the query string must be contains unparsed sql.

	if len(stmts) > 0 {
		for _, stmt := range stmts {
			ast.SetFlag(stmt)
		}
		stmt = append(stmt, stmts...)
	}
	// The origin SQL text(input args `sql`) consists of many SQL segments,
	// each SQL segments is a complete SQL and be parsed into `ast.StmtNode`.
	//
	//     good SQL segment       bad SQL segment
	// |---------------------|---------------------|---------------------|---------------------|    origin SQL text
	//			     		 ^				^
	//		            stmtStartPos   lastScanOffset
	//										|------|---------------------|---------------------|    remaining SQL text
	//
	//                       |<   unparsed stmt   >|<          continue to parse it           >|

	start := parser.lexer.stmtStartPos
	cur := parser.lexer.lastScanOffset

	remainingSql := sql[cur:]
	l := NewScanner(remainingSql)
	var v yySymType
	var endOffset int
	var scanEnd = 0
	var defaultDelimiter int = ';'
	delimiter := defaultDelimiter
ScanLoop:
	for {
		result := l.Lex(&v)
		switch result {
		case scanEnd:
			endOffset = l.lastScanOffset - 1
			break ScanLoop
		case delimiter:
			endOffset = l.lastScanOffset
			break ScanLoop
		case begin:
			// ref: https://dev.mysql.com/doc/refman/8.0/en/begin-end.html
			// ref: https://dev.mysql.com/doc/refman/8.0/en/stored-programs-defining.html
			// Support match:
			// BEGIN
			// ...
			// END;
			//
			delimiter = scanEnd
		case end:
			// match `end;`
			var ny yySymType
			next := l.Lex(&ny)
			if next == defaultDelimiter {
				delimiter = defaultDelimiter
				endOffset = l.lastScanOffset
				break ScanLoop
			}
		case invalid:
			// `Lex`内`scan`在进行token遍历时,当有特殊字符时返回invalid,此时未调用`inc`进行滑动,导致每次遍历同一个pos点位触发死循环。有多种情况会返回invalid。
			// 对于解析器本身没影响,因为 token 提取失败就退出了,但是我们需要继续遍历。
			if l.lastScanOffset == l.r.p.Offset {
				l.r.inc()
			}
		}
	}
	unparsedStmtBuf := bytes.Buffer{}
	unparsedStmtBuf.WriteString(sql[start:cur])
	unparsedStmtBuf.WriteString(remainingSql[:endOffset+1])

    // 不能解析sql之前的sql的总行数
	if start != 0 {
		parser.endLineOffset += getLineNumber(sql, start)
	}

	unparsedSql := unparsedStmtBuf.String()
	// 当前sql的开始行号parser.endLineOffset加上当前sql的换行符前缀数量countNewLinePrefix(unparsedStmtBuf.String())
	// 获得当前sql的开始行号
	parser.startLineOffset = parser.endLineOffset + countNewLinePrefix(unparsedStmtBuf.String()) - 1
	// 如果出现所有sql都是无法解析的sql,则当前sql的开始行号startLineOffset为0,结束行号endLineOffset为1
	if parser.endLineOffset == 0 && countNewLinePrefix(unparsedStmtBuf.String()) == 0 {
		parser.startLineOffset = 0
		parser.endLineOffset = 1
	}

    // 更新结束行号
    parser.endLineOffset += getTotalLine(unparsedStmtBuf.String())

	if len(unparsedSql) > 0 {
		un := &ast.UnparsedStmt{}
		un.SetStartLine(parser.startLineOffset + 1)
		un.SetText(unparsedSql)
		stmt = append(stmt, un)
	}

	if len(remainingSql) > endOffset {
		cStmt, cWarn, cErr := parser.PerfectParse(remainingSql[endOffset+1:], charset, collation)
		warns = append(warns, cWarn...)
		if len(cStmt) > 0 {
			stmt = append(stmt, cStmt...)
		}
		if cErr == nil {
			return stmt, warns, cErr
		}
	}
	return stmt, warns, nil
}

func (parser *Parser) updateStartLineWithOffset(stmts []ast.StmtNode) {
	for i := range stmts {
		stmts[i].SetStartLine(stmts[i].StartLine() + parser.startLineOffset)
	}
}

func getTotalLine(remainingSql string) int {
	count := 0
	for _, char := range remainingSql {
		if char == '\n' {
			count++
		}
	}
	return count
}

func getLineNumber(s string, pos int) int {
	if pos == 0 {
		return 0
	}

	lineNumber := 0
	for i := 0; i <= pos; i++ {
		if s[i] == '\n' {
			lineNumber++
		}
	}
	return lineNumber
}

func countNewLinePrefix(s string) int {
	count := 0
	for _, char := range s {
		if char == '\n' {
			count++
		} else {
			break
		}
	}
	return count
}

@ColdWaterLW ColdWaterLW modified the milestones: 3.2404.0, 3.2405.0 Apr 24, 2024
@ColdWaterLW
Copy link
Collaborator

ColdWaterLW commented May 7, 2024

处理方案考虑
把处理行数的逻辑从解析器中剥离出来,和解析器解耦合,方便解析器升级

相关issue:
#2354
#2406

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

No branches or pull requests

2 participants