Semicolon Detection Fails for "--" in Strings in SQL Statements #699

Open
Ynng opened this issue Feb 12, 2024 · 3 comments

@Ynng
Ynng commented Feb 12, 2024

When executing SQL migrations, the parser misses the semicolon and incorrectly merges statements if a string contains "--" after a space, mistakenly interpreting it as the start of a comment. This issue occurs even though the "--" is part of a string and should not be treated as a comment. For example:

-- +goose Up
CREATE TABLE t1 (
    c1 INT NOT NULL AUTO_INCREMENT PRIMARY KEY,
    c2 VARCHAR(100))
ENGINE=InnoDB
COMMENT='Look at this cool arrow -->';

CREATE TABLE t2 (
    c1 INT NOT NULL AUTO_INCREMENT PRIMARY KEY,
    c2 VARCHAR(100))
ENGINE=InnoDB;

-- +goose Down
DROP TABLE t1;

This results in syntax errors when running migrations, as seen below:

partial migration error (type:sql,version:1): Error 1064 (42000): You have an error in your SQL syntax; check the manual that corresponds to your MySQL server version for the right syntax to use near 'CREATE TABLE t2 (
    c1 INT NOT NULL AUTO_INCREMENT PRIMARY KEY,
    c2 VARCHAR' at line 7 

The issue appears to be in the semicolon detection logic endsWithSemicolon(line string) bool:

func endsWithSemicolon(line string) bool {
	scanBufPtr := bufferPool.Get().(*[]byte)
	scanBuf := *scanBufPtr
	defer bufferPool.Put(scanBufPtr)
	prev := ""
	scanner := bufio.NewScanner(strings.NewReader(line))
	scanner.Buffer(scanBuf, scanBufSize)
	scanner.Split(bufio.ScanWords)
	for scanner.Scan() {
		word := scanner.Text()
		if strings.HasPrefix(word, "--") {
			break
		}
		prev = word
	}
	return strings.HasSuffix(prev, ";")
}
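
To make the failure concrete, here is a minimal standalone sketch of the same word-scanning logic. The name endsWithSemicolonNaive is made up for illustration, and the buffer pool is dropped so the snippet runs on its own; the rest follows the function quoted above. The last word of the COMMENT line is `-->';`, which starts with `--`, so the scan stops before it ever sees the semicolon.

```go
package main

import (
	"bufio"
	"fmt"
	"strings"
)

// endsWithSemicolonNaive mirrors the word-scanning logic quoted above,
// minus the buffer pool. The name is hypothetical, chosen for this sketch.
func endsWithSemicolonNaive(line string) bool {
	prev := ""
	scanner := bufio.NewScanner(strings.NewReader(line))
	scanner.Split(bufio.ScanWords)
	for scanner.Scan() {
		word := scanner.Text()
		if strings.HasPrefix(word, "--") {
			break // every word from here on is treated as a comment
		}
		prev = word
	}
	return strings.HasSuffix(prev, ";")
}

func main() {
	// The last word is `-->';`, so the scan stops at "arrow".
	fmt.Println(endsWithSemicolonNaive("COMMENT='Look at this cool arrow -->';")) // false
	// With a single dash the last word is `->';` and the semicolon is seen.
	fmt.Println(endsWithSemicolonNaive("COMMENT='Look at this cool arrow ->';")) // true
	// A real trailing comment is handled as intended.
	fmt.Println(endsWithSemicolonNaive("ENGINE=InnoDB; -- trailing comment")) // true
}
```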

@mfridman
Collaborator

Yeah, this is a bit unfortunate. As a workaround, you could move the semicolon to a new line, or wrap that statement in +goose StatementBegin / +goose StatementEnd annotations.

The SQL parser is quite basic and does the bare minimum. Open to suggestions on how this could be improved.

@Ynng
Author

Ynng commented Feb 15, 2024

Certainly, I can manually adjust the SQL, but for my scenario, which involves programmatically generating migrations from mysqldump, the task becomes more challenging.

For now, my temporary workaround is to rewrite the arrows from --> to -> with a regex, but this is obviously not a universal fix for --.

I don't really see any solution that doesn't require complicating ParseSQLMigration.
Maybe we could track whether we are inside a string by looking for the ' character? But there are many edge cases...
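
A rough sketch of that quote-tracking idea, to show what it might involve. This is not goose's code: endsWithSemicolonQuoteAware is a hypothetical name, and it assumes single-quoted strings only, quotes escaped by doubling ('') or with a backslash (MySQL style), and strings that never span lines. It also treats any `--` outside a string as a comment start, which is stricter than some dialects.

```go
package main

import (
	"fmt"
	"strings"
)

// endsWithSemicolonQuoteAware is a hypothetical, quote-aware variant.
// It walks the line byte by byte, tracks whether it is inside a
// single-quoted string, and only treats "--" as a comment outside of one.
func endsWithSemicolonQuoteAware(line string) bool {
	inString := false
	end := len(line) // index where code stops and a trailing comment starts
scan:
	for i := 0; i < len(line); i++ {
		c := line[i]
		switch {
		case inString && c == '\\':
			i++ // skip the backslash-escaped character
		case c == '\'':
			if inString && i+1 < len(line) && line[i+1] == '\'' {
				i++ // a doubled quote inside a string is an escaped quote
				continue
			}
			inString = !inString
		case !inString && c == '-' && i+1 < len(line) && line[i+1] == '-':
			end = i
			break scan
		}
	}
	// A line that ends inside an open string cannot end a statement.
	return !inString && strings.HasSuffix(strings.TrimSpace(line[:end]), ";")
}

func main() {
	fmt.Println(endsWithSemicolonQuoteAware("COMMENT='Look at this cool arrow -->';")) // true
	fmt.Println(endsWithSemicolonQuoteAware("ENGINE=InnoDB; -- trailing comment"))     // true
	fmt.Println(endsWithSemicolonQuoteAware("c2 VARCHAR(100)) -- done;"))              // false
}
```

Even this small version leaves out double-quoted and backtick identifiers, block comments, dollar-quoted strings in PostgreSQL, and strings that continue onto the next line, which is where the edge cases start piling up.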

@mfridman
Collaborator

mfridman commented Feb 22, 2024

> Certainly, I can manually adjust the SQL, but for my scenario, which involves programmatically generating migrations from mysqldump, the task becomes more challenging.

Yep, that's an excellent example.

> I don't really see any solution that doesn't require complicating ParseSQLMigration.

Pretty much. That gets us into the territory of writing a full-blown SQL parser; otherwise we're always fighting a new edge case. To make matters worse, there's always some subtle dialect-specific difference.

I'll keep this issue open and continue to think this through in the background.

I wonder if you could wrap your entire dumped schema with all statements within:

-- +goose Up
-- +goose StatementBegin

... your entire schema here

-- +goose StatementEnd

This tells goose to send the entire set of queries as a single semicolon-separated query. This usually just works, unless you have an extensive schema, exceed a database limit, or have a specific query that can't be run in the same transaction.
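
Since the migrations here are generated programmatically, the wrapping could be done in the generator itself. A minimal sketch, assuming the dump is already available as a string; wrapDump is a hypothetical helper, not part of goose:

```go
package main

import (
	"fmt"
	"strings"
)

// wrapDump is a hypothetical helper (not part of goose). It wraps a dumped
// schema in the annotations shown above so the whole dump is sent as one
// query instead of being split on semicolons.
func wrapDump(dump string) string {
	var b strings.Builder
	b.WriteString("-- +goose Up\n")
	b.WriteString("-- +goose StatementBegin\n")
	b.WriteString(strings.TrimSpace(dump))
	b.WriteString("\n-- +goose StatementEnd\n")
	return b.String()
}

func main() {
	dump := "CREATE TABLE t1 (c1 INT) COMMENT='Look at this cool arrow -->';\n" +
		"CREATE TABLE t2 (c1 INT);\n"
	fmt.Print(wrapDump(dump))
}
```

One caveat worth checking for MySQL: as far as I know, go-sql-driver/mysql only accepts several statements in one query when the DSN sets multiStatements=true, so this approach may need that option.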

A bit more background on these annotations can be found here:

https://pressly.github.io/goose/blog/2022/overview-sql-file/#multiple-statements
