Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Relaxed mode? #298

Open
chapmanjacobd opened this issue Oct 15, 2022 · 3 comments
Open

Relaxed mode? #298

chapmanjacobd opened this issue Oct 15, 2022 · 3 comments

Comments

@chapmanjacobd
Copy link

chapmanjacobd commented Oct 15, 2022

I really like how fast octosql is and it seems like a great tool to bring unstructured data into a strict hierarchical, columnar format.

Unfortunately, wrangling unstructured data usually requires some flexibility and interpretation. I have two main issues so far:

  1. Selecting variables which are in some files but not others:

    ⭐ It would be really nice to have a --relaxed flag where unknown variables are silently replaced with null

unzstd --memory=2048MB --stdout RS_2008-02.zst | octosql "SELECT overridden_url, valid_column FROM stdin.json" -o csv

Error: typecheck error: unknown variable: 'overridden_url'

Ideally, that would:

  • with octosql --relaxed turn into "SELECT null, valid_column FROM stdin.json"
  • with normal, strict mode be the same error
  1. No way to cast booleans to int?
octosql "SELECT cast(false as int)"
Error: typecheck error: typecast of non-union type expression 'Boolean'

octosql "SELECT int(false)"
Error: typecheck error: unknown function: int(Boolean)

For example, in sqlite:

sqlite-utils memory -t 'select cast(false as int) as hello'
  hello
-------
      0
@chapmanjacobd
Copy link
Author

chapmanjacobd commented Oct 15, 2022

also, what's going on with len() ?

octosql "SELECT * FROM docs.function_signatures"
...
| 'len'            | []                       | 'NULL'        | true   | false            |
| 'len'            | []                       | 'NULL'        | true   | false            |
| 'len'            | []                       | 'NULL'        | true   | false            |
| 'len'            | ['String']               | 'Int'         | true   | true             |
...

octosql "SELECT len((0,1,2,3))"
+-------+
| col_0 |
+-------+
|     4 |
+-------+

seems to work fine...

octosql -v
octosql version dev

also, you might also want to store/print the git commit during compile time or something, for example:

'octosql version dev' + git describe --match=NeVeRmAtCh --always --abbrev=8 --dirty && git status --short

@cube2222
Copy link
Owner

cube2222 commented Oct 15, 2022

Hey!

So, multiple things you mentioned:

relaxed flag

The use case makes sense, I'll think about it.

No way to cast booleans to int

Good point, happy to add this.

what's going on with len()

When the simple_signature is false, then the arg types / return types have no meaning. But it's a good point that this way it's completely unreadable right now what's supported and what isn't. I'll see if I can add "synthetic" signatures there.

len works with strings, lists, tuples, and objects right now.

@cube2222
Copy link
Owner

No way to cast booleans to int

This is now added in cc43f25 and released in 0.11.1.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants