Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add SQL code formatter #45

Open
Ngalstyan4 opened this issue Aug 11, 2023 · 6 comments
Open

Add SQL code formatter #45

Ngalstyan4 opened this issue Aug 11, 2023 · 6 comments

Comments

@Ngalstyan4
Copy link
Contributor

This should format lantern.sql and regression test SQL files.
Sqlfluff is one good option to use for this, but have not looked too much into it.

@Ngalstyan4 Ngalstyan4 added the good first issue Good for newcomers label Aug 14, 2023
@dqii
Copy link
Contributor

dqii commented Aug 21, 2023

I tried adding sqlfluff to Dockerfile.dev with

pip install sqlfluff

and to CMakeLists.txt with

# Find all SQL files in your project
file(GLOB_RECURSE SQL_FILES ${CMAKE_SOURCE_DIR}/*.sql)

# Add custom target to lint SQL files
add_custom_target(
  lint-sql
  COMMAND sqlfluff lint ${SQL_FILES} --dialect postgres
  COMMENT "Linting SQL files with SQLFluff"
)

# Add custom target to format SQL files
add_custom_target(
  format-sql
  COMMAND sqlfluff fix --force ${SQL_FILES} --dialect postgres
  COMMENT "Formatting SQL files with SQLFluff"
)

I was able to run it with make format-sql, but there's a lot of issues from our custom operator, such as:

L:  45 | P:  21 |  PRS | Line 45, Position 21: Found unparsable section: '<->
                       | array[0,1,0] LIMIT 7'

@dqii
Copy link
Contributor

dqii commented Aug 21, 2023

I think pgFormatter will be a better fit, since there is also pl/pgsql code.

@dqii dqii linked a pull request Aug 22, 2023 that will close this issue
@dqii
Copy link
Contributor

dqii commented Aug 24, 2023

I'm stopping work on this. Some comments for visibility.

The overall reason is that there isn't sufficient support for auto-formatting SQL + psql + pl/pgsql.

sqlfluff is one of the more customizable SQL formatters, but it doesn't for example support custom operators (e.g., <->), psql commands like \gset and \echo, and variable substitution like LIMIT :LIMIT.

pgFormatter worked fine with custom operators, but I think it was inadequate. It didn't for example handle CTEs well, and I didn't see a good way to fix that. Also, it didn't support "magic comments". They do support placeholders, which I tried to use for magic comments, but this caused formatting issues. (e.g., next line always started with a space)

pglast I liked because it was based on libpg_query, which is a port of Postgres's parser. I wasn't the biggest fan of all their formatting choices, but I thought it'd be easier to hack together support for the psql commands on top of it + magic comments. My attempt is included in the closed PR. In the end, it didn't handle \gset ideally, it didn't handle :'v4444' (but did handle limit :limit). I could spend some more time getting those cases to work, but then there's the additional pain of, sometimes it doesn't format \gset in the way you might like.

At that point, if you're constantly using magic comments to force it to format in an aesthetic way, what is the point?

I think if there's a nice formatter with psql support in the future, it would be great to use that.

@bobdevine
Copy link

Hi, I'm taking a fresh look at this issue. Two solutions seem possible:

  1. use a third-party formatter/beautifier to standardize the SQL statements. But while there are dozens to consider, most fail to handle the PostgreSQL-specific syntax.
  2. Use the PostgreSQL lexer/parser. This solution guarantees full comparability. If the query-string's syntax is correct, a parse tree is created and then the tree can be printed in standard format.

I suggest that choice (2) be implemented using a library (https://github.com/pganalyze/libpg_query).

If this approach is acceptable, I can write a testing tool.

@Ngalstyan4
Copy link
Contributor Author

This sounds interesting, @bobdevine !

Quick question:

Use the PostgreSQL lexer/parser. This solution guarantees full comparability

Does this guarantee compatibility with PSQL magic commands (\d, \t+, etc) or is it only about SQL?
Our formatter needs to handle PSQL magic commands since we use those in our tests.

It seems @dqii considered an approach similar to your proposal (2) via pglast (which is also based on libpg_query!) but that approach did not handle PSQL magic commands.

@bobdevine
Copy link

formatter.zip

Here's a zip-file containing 2 C files to read a test file and then format all lines that are SQL. Formatting uses the PostgreSQL parser to validate the SQL command. Then the parsed SQL is "deparsed" to a canonical string.

@Ngalstyan4 Ngalstyan4 removed the good first issue Good for newcomers label Oct 1, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants