Performance ideas #72

Open
eatonphil opened this issue Jun 21, 2022 · 0 comments
Labels: good first issue (Good for newcomers)

Comments

eatonphil (Member) commented Jun 21, 2022

Catch-all, for now, for potential performance improvements to datastation/dsq.

  • SQL pre-processing
  • Support more input types using SQLiteWriter. This basically requires supporting expanded nested objects (see notes in "Regression between v0.19.0 and v0.20.0 around processing arrays in JSONL files?" #67)
  • Maybe handle JSONL in parallel, since a newline can never appear inside an individual JSON record
  • Get rid of map[string]any inside datastation
    • At the very least, put WriteRecord into the ResultWriter interface so that SQLiteWriter can avoid map[string]any, which it converts from anyway
  • CSV parser improvements
  • Add benchmarks for every file format, not just CSV. Basically, every file format needs to be worked on individually.
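
To make the parallel JSONL idea concrete, here is a rough sketch of what it could look like. This is not code from datastation: `decodeJSONLParallel` is a hypothetical helper, the worker count is arbitrary, and it decodes into `map[string]any` purely for brevity (a real version would presumably write straight to the result writer). It relies only on the fact that JSON strings must escape newlines, so splitting on raw `\n` bytes never cuts a record in half.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"sync"
)

// decodeJSONLParallel is a hypothetical helper, not part of datastation.
// It splits data on newlines and decodes each non-empty line on one of
// `workers` goroutines, keeping the records in input order.
func decodeJSONLParallel(data []byte, workers int) ([]map[string]any, error) {
	if workers < 1 {
		workers = 1
	}
	// Safe to split on raw newlines: JSON strings must escape them as \n.
	var lines [][]byte
	for _, l := range bytes.Split(data, []byte("\n")) {
		if l = bytes.TrimSpace(l); len(l) > 0 {
			lines = append(lines, l)
		}
	}

	records := make([]map[string]any, len(lines))
	errs := make([]error, len(lines))
	jobs := make(chan int)

	var wg sync.WaitGroup
	for w := 0; w < workers; w++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for i := range jobs {
				// Each index is written by exactly one goroutine, so no lock is needed.
				errs[i] = json.Unmarshal(lines[i], &records[i])
			}
		}()
	}
	for i := range lines {
		jobs <- i
	}
	close(jobs)
	wg.Wait()

	// Report the first failing record (1-based, counting non-empty lines).
	for i, err := range errs {
		if err != nil {
			return nil, fmt.Errorf("record %d: %w", i+1, err)
		}
	}
	return records, nil
}

func main() {
	input := []byte("{\"a\": 1}\n{\"a\": 2}\n")
	records, err := decodeJSONLParallel(input, 4)
	if err != nil {
		panic(err)
	}
	fmt.Println(len(records), "records decoded")
}
```

This version holds the whole input in memory, which is probably not what we want for large files. A streaming variant would feed lines from a `bufio.Scanner` into the job channel instead, at the cost of some extra bookkeeping to keep output ordered.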
eatonphil added the "good first issue" label Jun 21, 2022