Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

selector syntax #5

Open
Gozala opened this issue Feb 26, 2024 · 21 comments
Open

selector syntax #5

Gozala opened this issue Feb 26, 2024 · 21 comments

Comments

@Gozala
Copy link

Gozala commented Feb 26, 2024

Here is my best attempt to describe jq inspired selector syntax and deliberate incompatibilities with jq.

Supported Forms

Name Selector Input Output jq Output
Identity . {x:1} {x:1} {x:1}
Iterator .[] [1, 2] 1, 2 (1, 2)
Null Iterator .[] null ⛔️ ⛔️
Optional Null Iterator .[]? null () ()
Nested Iterator .[][] [[1], 2, [3]] ⛔️ (1, ⛔️)
Optional Iterator .[][]? [[1], 2, [3]] (1, 3) (1, 3)
Object Key .x { x: 1 } 1 1
Missing Key .x {} ⛔️ null
Optional Missing Key .x? {} null null
Null Key .x null ⛔️ null
Optional Null Key .x? null null null
Array Key .x [] ⛔️ ⛔️
Optional Array Key .x [] null null
Quoted Key .["x"] {x: 1} 1 1
Optional Quoted Key .["x"]? {} null null
.length .length [1, 2] ⛔️ ⛔️
.length? .length? [1, 2] null null
Index .[0] [1, 2] 1 1
Out of bound Index .[4] [0, 1] ⛔️ null
Optional Index .[4]? [0, 1] null null
Negative Index .[-1] [1, 2] 2 2
String Index .[0] "Hi" "H" ⛔️
Bytes Index .[0] Bytes([0, 1]) 0 n/a
Array Slice .[0:2] [0, 1, 2] [0, 1] [0, 1]
Array Slice .[1:] [0, 1, 2] [1, 2] [1, 2]
Array Slice .[:2] [0, 1, 2] [0, 1] [0, 1]
String Slice .[0:2] "hello" "he" "he"
Bytes Index .[1:] Bytes([0, 1, 2]) Bytes([1, 2]) n/a

We treat bytes as u8 array. We also treat links as u8 array from selectors point of view.

@expede
Copy link
Member

expede commented Feb 27, 2024

Asking around, having this sugar...

[">", ".foo", 0]

...was universally loved ❤️

@expede
Copy link
Member

expede commented Feb 27, 2024

Lifting some conversation from DMs here:

  • Ranges (if we eventually support them)
    • I'd expect .foo[3:9] to produce a stream of values, not a smaller array
  • Asked around about in versus other options, and some came out on top
    • gozala suggested includes in DMs; let's run some vs includes past folks 👍

@expede
Copy link
Member

expede commented Feb 27, 2024

I'd expect .foo[3;9] to produce a stream of values, not a smaller array

UPDATE: that's not what jq does.

{foo: [0,1,2,3,4,5]}
.foo[1:3] // ..., [1,2,3], ...
.foo[1:3][] // ...1, 2, 3, ...

The refrain in feedback in basically any community I've asked is "just do what jq does unless you have a good reason not to", so let's respect jq behaviour here 👍

@Gozala
Copy link
Author

Gozala commented Feb 29, 2024

I have implemented selectors and put together this test vector so we could utilize it for compatibility
https://github.com/web3-storage/ucanto/blob/7bd8f8ec1aad147a70635ec3c6c97f297a0e281d/packages/core/test/policy/selector.vector.js

Few things I'd like to call out

  1. Object iterator MUST sort keys alphabetically to ensure deterministic order both across languages and to be immune to insertion order in JS.
  2. I end up with an return type like
    type SelectionResult = 
      | { one: Data }
      | { many: Data }
      | { error: ParseError|TypeError }
    You get many whenever selector contains [] and you get one otherwise. You get ParseError if selector is invalid and TypeError in cases where jq produces errors.
  3. I think jq is very inconsistent in it's behavior, e.g. pathing into null like .foo produces null, but same on array or string produces an error. I wish it just always returned null if path was not valid or always returned error.
    • Iteration also has some strange edge cases that is inconsistent I'll paste some of those below
  4. .length on arrays is produces errors, that is what jq does and probably makes sense in non-js contexts. On the other hand if we allowed .length on arrays we could avoid having to introduce operator in it's place.
  5. jq does not allow indexed access on strings, I found it surprising, maybe we can allow character access.
  6. out of bound array access produces null.

I think it would be a good idea to not follow jq blindly and make things bit more consistent. E.g. always result in null if path does not lead to a value and [] when iteration is run over non iterables. Empty iterables turn into never / bottom on some and every which seem like what you'd expect. In case of null it is not ideal but not completely unreasonable. If we don't want to return null perhaps we could just return errors in cases like (null).foo instead.

@Gozala
Copy link
Author

Gozala commented Feb 29, 2024

Some more WTF from jq land

echo '{}' | jq '.q[0]'
> null

echo '{"q": {"0":1} }' | jq '.q[0]'
jq: error (at <stdin>:1): Cannot index object with number

echo '{}' | jq '.[][][][]'

echo '{"q":{}}' | jq '.[][][][]'

echo '{"q":{"x":1}}' | jq '.[][][][]'
jq: error (at <stdin>:1): Cannot iterate over number (1)

@expede
Copy link
Member

expede commented Feb 29, 2024

Some more WTF from jq land

LOL yeah I don't love their heavy use of null. I guess they're using null as undefined?

echo '{"q": {"0":1} }' | jq '.q[0]'

If you use quotes it works:

$ echo '{"q": {"0":1} }' | jq '.q["0"]'
1

Which is maybe a fine enough way to distinguish between array index vs maps? I'm not against that behaviour being explicit, though I would have expected it to returning the first alphabetical element of a collection on a map 🤷‍♀️

This one seems reasonable to me, and IIRC matches how JS behaves

@expede
Copy link
Member

expede commented Feb 29, 2024

echo '{}' | jq '.[][][][]'

😆 wut

@expede
Copy link
Member

expede commented Feb 29, 2024

E.g. always result in null if path does not lead to a value

Do you mean like this?

// {a: 1}
".b" == null

I guess that makes sense. We want to be able to say that somethnig is unset. Do we need to distinguish between these cases in validators?

query = ".a"
{a: null}
// null

query = ".a"
{}
// null

@expede
Copy link
Member

expede commented Feb 29, 2024

Object iterator MUST sort keys alphabetically to ensure deterministic order both across languages and to be immune to insertion order in JS.

Agreed 👍

I end up with an return type like

Seems reasonable. From the chat in Discord, it probably needs to extend with some/every, but it's basically right yeah 👍

.length on arrays is produces errors

I think I see why: it looks identical to expecting a map key. {length: 0} is definitely a case that I'd like to support, and I woudl expect it to fail if run on a non-object

@Gozala
Copy link
Author

Gozala commented Feb 29, 2024

Do you mean like this?

// {a: 1}
".b" == null

I don't like this but it is jq behavior

echo '{"a":1}' | jq '.b'  
null

I guess that makes sense. We want to be able to say that somethnig is unset. Do we need to distinguish between these cases in validators?

query = ".a"
{a: null}
// null

query = ".a"
{}
// null

These behave as follows in jq, I don't like it but it does

echo '{"a":null}' | jq '.a'
null
echo '{}' | jq '.a'        
null

@Gozala
Copy link
Author

Gozala commented Feb 29, 2024

LOL yeah I don't love their heavy use of null. I guess they're using null as undefined?

JSON does not have undefined so it makes sense that they use null although there are bunch of cases where I'd expect error instead. Or they could also return null in cases where they produce errors today specifically it makes no sense to me that .a on null is null while on [] it is an error.

@Gozala
Copy link
Author

Gozala commented Feb 29, 2024

This one seems reasonable to me, and IIRC matches how JS behaves

In JS it will cast 0 into "0" and return corresponding property, which may not be ideal, but I do still find surprising that jq error instead. It is also less, but still odd that .foo on {} is null but .[0] on {} is an error. I would expect both to be errors or both to produce null 🤷‍♂️

@expede
Copy link
Member

expede commented Feb 29, 2024

How do you feel about this behaviour:

// {}
["not", ".foo"] // true

// {foo: null}
["not", ".foo"] // true

// {foo: false}
["not", ".foo"] // true

Which is a bit more like JS

@expede
Copy link
Member

expede commented Feb 29, 2024

(specifically trying to find a way to predicate omission)

@expede
Copy link
Member

expede commented Feb 29, 2024

still odd that .foo on {} is null but .[0] on {} is an error.

Yeah they fail on some (what I think they're treating as) type errors, but not others

@Gozala
Copy link
Author

Gozala commented Feb 29, 2024

I would personally leave not to logic operators and leave selectors out of it you could instead use [== .foo? null]

@expede
Copy link
Member

expede commented Feb 29, 2024

leave selectors out of it

Which I guess is an advantage of missing keys matching null, but I can't distinguish between "the field was null" and "the key was missing", right?

@Gozala
Copy link
Author

Gozala commented Feb 29, 2024

I had a typo I meant [== .foo? null], which if I understand ? correctly will turn missing key into null.

@Gozala
Copy link
Author

Gozala commented Feb 29, 2024

Which I guess is an advantage of missing keys matching null, but I can't distinguish between "the field was null" and "the key was missing", right?

I don't like that jq returns null when you don't have a corresponding element, I would personally prefer error there. That way adding ? in the end gives you a way to say no error just give me null. Which also aligns with JS optional chaining syntax.

@expede
Copy link
Member

expede commented Feb 29, 2024

I had a typo I meant [== .foo? null], which if I understand ? correctly will turn missing key into null.

Yes that's the behaviour in jq 👍 That works for my use case above :)

I don't like that jq returns null when you don't have a corresponding element, I would personally prefer error there. That way adding ? in the end gives you a way to say no error just give me null. Which also aligns with JS optional chaining syntax.

Yeah that makes a lot of sense. I do think that sticking fairly closely to jq is still a good idea, and having a CLI tool to test things against has been helpful even in this thread. IMO we can diverge from their behaviour where there's a good reason. Let's maybe draft a bullet point list on how we'd like to diverge from them?

@Gozala
Copy link
Author

Gozala commented Mar 2, 2024

Turns out jq utilized []? in a good way

➜ echo '[[1], 2]' | jq '.[].[]'
1
jq: error (at <stdin>:1): Cannot iterate over number (2)

➜ echo '[[1], 2]' | jq '.[].[]?'                                  1

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants