Ingest limited columns in basic SQL queries #71

Open
eatonphil opened this issue Jun 20, 2022 · 4 comments
Labels
good first issue Good for newcomers

Comments

@eatonphil
Member

There may not be an existing SQLite parser we can use from Go, but for simple queries we can use a PostgreSQL parser; see here for a good one and an example of its use.

The way this would work: attempt to parse the query. If it parses and consists only of syntax we support, return every field referenced in the query, then pass this list of fields to the SQLiteWriter. If the list is set in the SQLiteWriter, then when we write fields to SQLite we write only the ones in the list.
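A minimal sketch of the writer-side filtering described above. The name `filterRow` and the map-based row shape are assumptions for illustration; they are not taken from the actual SQLiteWriter code.

```go
package main

import "fmt"

// filterRow keeps only the keys listed in fields.
// A nil fields slice means "no restriction" (for example, the query could
// not be analyzed), in which case the row is returned unchanged.
// Hypothetical helper, not the real SQLiteWriter API.
func filterRow(row map[string]any, fields []string) map[string]any {
	if fields == nil {
		return row
	}
	out := make(map[string]any, len(fields))
	for _, f := range fields {
		if v, ok := row[f]; ok {
			out[f] = v
		}
	}
	return out
}

func main() {
	row := map[string]any{"x": 1, "y": 2, "z": 3}

	// Fields from: SELECT x FROM {} WHERE y = 1
	fmt.Println(filterRow(row, []string{"x", "y"})) // map[x:1 y:2]

	// No field list set: every column is written.
	fmt.Println(len(filterRow(row, nil))) // 3
}
```

Treating a nil list as "write everything" keeps the unsupported-query path identical to the current behavior.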

For a first pass I'd suggest supporting:

  • SELECT x FROM {} WHERE y = 1 where this returns ['x', 'y']

Additional ones that won't be too bad:

  • SELECT COUNT(x) FROM {} WHERE y = 2 returns ['x', 'y']
  • SELECT x FROM {} GROUP BY z returns ['x', 'z']

Harder but reasonable examples:

  • SELECT a.x FROM {0} a JOIN {1} b ON a.id = b.json_id returns {'a': ['x', 'id'], 'b': ['json_id']}

Examples this must fail on (this is not a comprehensive list):

  • SELECT x, * FROM {} (because of the star operator)
  • SELECT x FROM {0} JOIN {1} ON id=json_id (ambiguous where x, id, and json_id come from; also requires supporting different columns for different tables)
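The accept and reject rules in the examples above could be sketched roughly as follows. This deliberately does not use a real parser: `columnRef` is a hypothetical, simplified stand-in for whatever column references the parser produces, and `fieldsByTable` is an illustrative name.

```go
package main

import (
	"errors"
	"fmt"
)

// columnRef is a hypothetical stand-in for a column reference taken from a
// parsed query. Table is the table alias, or "" when the column is unqualified.
type columnRef struct {
	Table string
	Name  string
}

var errUnsupported = errors.New("unsupported query: ingest all columns")

// fieldsByTable groups the referenced columns by table alias. It fails on the
// star operator, on unqualified columns when more than one table is involved,
// and on aliases that are not in the FROM/JOIN list.
func fieldsByTable(tables []string, refs []columnRef) (map[string][]string, error) {
	known := make(map[string]bool, len(tables))
	for _, t := range tables {
		known[t] = true
	}
	out := map[string][]string{}
	seen := map[columnRef]bool{}
	for _, r := range refs {
		if r.Name == "*" {
			return nil, errUnsupported // star needs every column
		}
		if r.Table == "" {
			if len(tables) != 1 {
				return nil, errUnsupported // ambiguous owner
			}
			r.Table = tables[0]
		}
		if !known[r.Table] {
			return nil, errUnsupported
		}
		if !seen[r] {
			seen[r] = true
			out[r.Table] = append(out[r.Table], r.Name)
		}
	}
	return out, nil
}

func main() {
	// SELECT a.x FROM {0} a JOIN {1} b ON a.id = b.json_id
	got, err := fieldsByTable([]string{"a", "b"},
		[]columnRef{{"a", "x"}, {"a", "id"}, {"b", "json_id"}})
	fmt.Println(got, err) // map[a:[x id] b:[json_id]] <nil>

	// SELECT x FROM {0} JOIN {1} ON id=json_id (must fail)
	_, err = fieldsByTable([]string{"{0}", "{1}"},
		[]columnRef{{"", "x"}, {"", "id"}, {"", "json_id"}})
	fmt.Println(err != nil) // true
}
```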
@eatonphil added the good first issue label Jun 20, 2022
@eatonphil
Member Author

This could also be extended to support LIMIT x without an ORDER BY clause to have it ingest only x rows.

@eatonphil
Member Author

Also, this mode must be disabled when -C/--cache is on.

@mc-borscht

I wouldn't mind having a go at this. I've been doing some initial investigation with the pg_query_go library and I think I can get something together to cover some of the above cases.

@eatonphil
Member Author

Hey @mc-borscht, there's a PR open for this, #76, but I got stuck because pg_query_go doesn't build on Windows.

If you want, you can pick up that PR and get it working. Before merging it, though, I wanted to have some benchmarks that show it's actually an improvement.

To deal with pg_query_go not building on Windows, we could either fix pg_query_go's build process or use build tags in Go so that this feature is left out on Windows.
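The second option, leaving the feature out on Windows at compile time, could look something like this with a Go build constraint. The file names, `limitedColumnsSupported`, and `limitedColumnsEnabled` are all hypothetical; the sketch only shows the gating, not the parser call, and it folds in the earlier note that the mode must be off when -C/--cache is on.

```go
//go:build !windows

// Hypothetical file: limited_columns_other.go, compiled on every platform
// except Windows. This is where the pg_query_go import would live.
//
// A sibling file, limited_columns_windows.go, would start with
// "//go:build windows", define limitedColumnsSupported = false, and import
// no parser, so the Windows build never touches pg_query_go.
package main

import "fmt"

// limitedColumnsSupported reports whether the feature is compiled in.
const limitedColumnsSupported = true

// limitedColumnsEnabled decides whether to attempt column limiting at all:
// the feature must be compiled in and the cache must be off.
func limitedColumnsEnabled(cacheOn bool) bool {
	return limitedColumnsSupported && !cacheOn
}

func main() {
	fmt.Println(limitedColumnsEnabled(false)) // true on non-Windows builds
	fmt.Println(limitedColumnsEnabled(true))  // false
}
```

A runtime check on the operating system would not be enough here, since the parser package would still have to compile on Windows; the constraint removes it from that build entirely.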
