Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Specify array traversal using mappers #42

Open
judofyr opened this issue Jan 11, 2021 · 4 comments
Open

Specify array traversal using mappers #42

judofyr opened this issue Jan 11, 2021 · 4 comments
Assignees

Comments

@judofyr
Copy link
Contributor

judofyr commented Jan 11, 2021

Introduction

Today we have two different array traversal semantics:

  • The one in production in Sanity.io today (v1)
  • An improved one that's currently implemented in GROQ-JS and the latest internal version of GROQ in Sanity.io (v2).

This is a third proposal which attempts to make

The main problem with the original proposal for improved array traversal

In v2 this will no longer work: *[_type == "user"]._id. This is because .id accesses an attribute and arrays don't have attributes. Instead, you would have to write *[_type == "user"][]._id. This is a highly breaking change. However, the following still works: *[_type == "user"]{foo{bar}}. Why doesn't this have to be *[_type == "user"][]{foo{bar}}? (The answer is because {…} has special behavior.)

This leads to a strange situation: We are introducing a highly breaking change in one area, but we're not achieving full consistency.

It should also be mentioned that *[_type == "user"]._id doesn't have any other sensible meaning other than "apply ._id to each element". It seems unfortunate that we're discarding a syntax which is unambiguous and clear.

The new proposal

Goals

The goal of this proposal is to

  • try to break as little existing GROQ as possible.
  • be as consistent as possible.
  • enable new use cases
  • be completely determined at compile/parse time.

Overview

The main complication of supporting *[…]._id is knowing why ._id should be mapped inside the array without depending on run time information. In *[_type == "user"]{"a": bar._id} we want ._id to mean "access _id on the bar object" and never treat it as an array projection.

The solution in this proposal is to treat traversal ([]), filtering ([_type == "user"]) and slicing ([0..5]) as array coercing markers. These will coerce the left-hand side to an array (meaning that if it's not an array it will be converted to null), and as such we know that any .foo coming after it makes sense to treat as mapping inside the array.

Details

The core idea behind this proposal is to introduce a concept of a mapper. .foo, [_type == "bar"] and ->name are all mappers. A mapper is an operation which you can apply to a value, but are not valid expressions by themselves. We can group mappers into two categories: Simple mappers and array mappers, with the distinction being that array mappers work on arrays.

We have the following simple mappers:

  • Attribute access: .foo
  • Object projection: {foo} (in *{foo}, not as a standalone expression)
  • Dereferencing: -> and ->foo.
  • Group projection: .(foo).
    This is a new invention which would allow more flexibility in the way you compose mappers. This allows e.g. *[_type == "user"].(count(books)) which would return an array of numbers.

And then we have the following array mappers:

  • Slicing: [0..5]
  • Filtering: [_type == "user"]
  • Traversal: [].
    This is the same as a [true] or [0..-1].
    It acts merely as a way of marking that the value is an array.

Pipes (|) are supported in various places to handle backwards compatibility.

Here's the grammar for composing mappers:

MapSimple ::= …
MapArray  ::= …

MapSimpleMulti ::=
  MapSimple+ MapArrayMulti?

MapArrayMulti ::= 
  MapArray+ MapSimpleMulti?

ArrExpr ::= Star
BasicExpr ::= …

Expr ::=
  BasicExpr | ArrExpr |
  BasicExpr MapSimpleMulti |
  BasicExpr MapArrayMulti |
  ArrExpr MapSimpleMulti |
  ArrExpr MapArrayMulti

Explanation:

  • MapArrayMulti and MapSimpleMulti represents composed mappers. They are mappers which are built on top of other mappers.
  • MapSimpleMulti is quite simple: When applied to a value it will apply the simple mappers and the MapArrayMulti in order.
  • MapArrayMulti is a bit more complicated:
    • When applied to a value it will first coerce the value to an array. If the value is not an array then it returns null immediately.
    • Then it applies all the array mappers (e.g. filtering, slicing) on that array.
    • If there's a MapSimpleMulti it will apply that mapper on each element of the arrry.
  • In addition, * is interpreted as an array expression. The only impact this has is that a MapSimpleMulti applied on an ArrExpr will apply the mapping on each element instead of on the value itself. This casues *{foo} to be interpreted as intended.

Implications

  • *[_type == "user"].id returns the ID of all documents.
  • *[_type == "user"].slug.title returns slug.title.
  • *[_type == "user"].roles[].title returns a nested array of role titles. If there's two users who have the roles (A,B) and (C), then this will return [["A", "B"], ["C"]].
  • In *[_type == "user"]{foo{bar}}, then foo must be an object. If it's an array then it will end up being null.
  • In *[_type == "user"]{foo[]{bar}}, then foo must be an array.

How do we teach this?

Here are some phrases which can be used for explaining the behavior:

  • "When GROQ knows that you're dealing with an array then you can add .foo to get the foo attribute of all of the documents/elements/objects."
  • "We can also dereference array of objects the same way: Just add -> at the end."
  • "Here GROQ doesn't know that it's an array, so we'll have to add []."

How to deal with flattening?

There's never any flattening happening here. I propose that we separately introduce a flat-function (that's what it is called in JavaScript): flat(*[_type == "user"].roles[].title) will flatten it one level.

@judofyr
Copy link
Contributor Author

judofyr commented Jan 13, 2021

A few things I've discovered:

Element access is maybe a bit confusing

In this original proposal then element access (foo[0]) was not a mapper. In *[_type == "user"].books[0] the [0] refers to the first user. It's equivalent to (*[_type == "user"].books)[0]. This makes sense in e.g. *[_type == "user"]._id[0], but when the nested attribute is a an array it's a bit confusing. Especially because in *[_type == "user"].books[author == "holm"] the filtering is working on the nested array.

I therefore propose that element access should also be treated as an inner mapping as well. This means that:

  • *[_type == "user"].books[0] is parsed as *[_type == "user"].(books[0])

There will be two different ways to trigger the other behavior:

  • (*[_type == "user"].books)[0]
  • *[_type == "user"][0].books

The second one is probably the most sensible solution. It makes more sense to filter/slice/access before you map than after.

Projection can't be a simple mapper

In this proposal we have the rule that "an array mapper after a simple mapper will work on the inner value". This makes sense in *.foo[baz > 2].bar: Here we assume that .foo is an array that we will filter on baz > 2. However, the current rule states *{foo}[baz > 2}{bar} should be interpreted the same way: It assumes that {foo} is an array that we will filter on baz > 2:

*.foo[baz > 2].bar =>
  // For each document execute `foo`, then treat it as an array and filter it

*{foo}[baz > 2}{bar}
  // For each document execute `{foo}`, then treat it as an array and filter it

This obviously is not the semantics we want.

Go over the the mappers again I realize that there are more mappers than just "simple" and "array". It probably makes more sense to group them as:

  • Slicing, filtering, traversal: Array-to-array
  • Element access: Array-to-any
  • Projection ({foo}): Any-to-object
  • Dereference (->): Object-to-object
  • Dereference (->foo): Object-to-any
  • Attribute (.foo): Object-to-any
  • Group (.(whatever[0] + 5)): Any-to-any

(We also need to treat * as a sort of "to-array" mapper.)

The general rule we want for array traversal is: **When a X-to-array mapper is next to an any/object-to-Y mapper" then it invokes an array traversal.

The trickiness here is also that this array traversal is something that we want to be right-associative. I'm struggling a bit to represent it in CFG…

@atombender
Copy link
Member

I think this is pretty good, but I'm not sure it's easier to explain than previous efforts. Is it possible that we're overthinking it?

The previous idea of saying that [] (whether filter, slicing, or empty) puts the expression in an "array mode" seems to be something that people can intuitively understand, even though it might not make complete sense with associativeness.

@judofyr
Copy link
Contributor Author

judofyr commented Jan 28, 2021

Alright, here's a current set of rules which I think works the way we want:

Classify the mappers into three categories:

  • A: Array mappers. Filters, slicing and [].
  • B: Basic mappers. Attribute/element access, deref.
  • C: Projection. This mapper works in dual-mode (can both be applied to objects and arrays) and therefore needs special treatment.

Next we classify the composite mappers into four different categories. (Similar to the modes in the previous proposal.)

  • MA: A chain of mappers which start with a A.
  • MB: A chain of mappers which start with a B.
  • MC: A chain of mappers which start with a C.
  • MA-flat: Something which should be flattened when mapped.

The difference between these categories and the previous "modes" is that these are defined in a right-to-left order. This is needed to capture the associativity correctly. Another advantage of the right-to-left rules is that they are easier to parse with a recursive decent parser since the terminal is on the left-hand side in all of these rules.

When combining multiple mappers there are three possible choices:

  • join(lhs, rhs) => Return a mapper which applies lhs and then rhs. join(.foo, .bar) is $base.foo.bar.
  • arrayMap(m) => Return a mapper which applies m to each element. arrayMap(.foo) is $base.map(x => x.foo).
  • flatMap(m) => Return a mapper which applies m to each element and then flattens it out. flatMap(.foo) is $base.flatMap(x => x.foo).

Example expression: foo.bar[foo > 2]{baz}

  • Converting these into the different types we get: foo B A C.
    • .bar is a B
    • [] is an A
    • {baz} is a C.
  • Step 1: C => MC gives foo B A MC.
    • We take the last one and converts it into a "composite" mapper consisting of itself.
  • Step 2: A MC => MA gives foo B MA.
    • Next we combine the last pair: A MC.
    • The rules below say that this should do the action join(lhs, arrayMap(rhs)) which means that the rhs here will be applied inside an array mapping (array traversal!)
    • The mapper now corresponds to $base.filter(x => x.foo > 2).map(x => ({"baz": x.baz})).
    • The resulting category is MA.
  • Step 3: B MA => MA-flat
    • Once again we combine the last pair.
    • This corresponds to the action join(lhs, rhs): They should just be applied in order.
    • The current mapper is now $base.bar.filter(x => x.foo > 2).map(x => ({"baz": x.baz})).
    • It also returns a MA-flat. This doesn't have any impact right now, but if there were more mappers in-front here this would eventually trigger the automatic flattening.
  • Step 4: foo MA-flat
    • Now we have reduced all of the mappers into a single composite mapper and we can return a regular Node.
    • foo.bar.filter(x => x.foo > 2).map(x => ({"baz": x.baz}))

The rules below are designed to handle all of the following cases:

  • *[_type == "foo"].bar traverses inside the array.
  • *[_type == "foo"]{bar}[bar > 2}: You can use [] and {} in any order.
  • *[_type == "foo"]{bar{baz}}: You can still use {} inside a projection. This will not cause any array traversal.
  • *[_type == "foo"]{bar[]{baz}}: … unless you explicitly mark it as an array.
  • *[_type == "foo"].foo{bar}.bar[].baz: You can combine array traversals after a projection as well.
  • Everything is decided on compile-time.

Here are the full matrix of rules:

(For all of the examples assume that there is e.g. `*` in front)

MA := A
    | A MA
        Example: [foo] [bar]
    	Action: join(lhs, rhs)

    | A MB 
    	Example: [foo] .bar
    	Action: join(lhs, arrayMap(rhs))

    | A MC
    	Example: [foo] {bar}
    	Action: join(lhs, arrayMap(rhs))

    | C MA
    	Example: {bar} [foo]
    	Action: join(arrayMap(lhs), rhs)

    | A MA-flat
	Example: [foo] .bar[]
    	Action: join(lhs, flatMap(rhs))

MB := B
    | B MB
	Example: .foo .bar
    	Action: join(lhs, rhs)

    | B MC
    	Example: .foo {bar}
    	Action: join(lhs, rhs)

MC := C
    | C MB
	Example: {bar} .foo
    	Action: join(lhs, rhs)

    | C MC
    	Example: {bar} {bar}
    	Action: join(lhs, rhs)

MA-flat :=
      B MA
	Example: .foo []
	Action: join(lhs, rhs)

    | B MA-flat
    	Example: .foo .foo[]
	Action: join(lhs, rhs)

    | C MA-flat
    	Example: {} .foo[]
    	Action: join(lhs, rhs)

@judofyr
Copy link
Contributor Author

judofyr commented Jan 28, 2021

I think this is pretty good, but I'm not sure it's easier to explain than previous efforts. Is it possible that we're overthinking it?

The previous idea of saying that [] (whether filter, slicing, or empty) puts the expression in an "array mode" seems to be something that people can intuitively understand, even though it might not make complete sense with associativeness.

We haven't really had a full proposal with this "array mode" so it's a bit hard to compare. I also don't quite see how it changes any of the complexity. If you know that a LHS is in array-mode, there are still multiple choices between what can happen. *[foo].bar is a an array traversal while *[foo][bar] is two filters. So you need a set of rules(e.g. array-mode + identifier => array traversal) here as well.

In addition, considering that it's right-associative I don't think there's a nice way of dealing with this going left-to-right. When parsing *[foo].bar.baz[].bar you need to know that the right-hand side contains another array traversal to figure out that you should be flattening.

seems to be something that people can intuitively understand

Sure, and we can teach it like that. The specification only needs to be unambiguous.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants