Aggregation returns 0 rows #244

Open
andrewbaxter opened this issue Mar 6, 2024 · 3 comments

andrewbaxter commented Mar 6, 2024

Working off a random documentation example:

{:create hos {state: String, year: String}}
{?[count(year)] := *hos{state:state, year:year}, state="y"}

returns one row as expected:

count(year)
0

But after moving this into a rule:

{:create hos {state: String, year: String}}
{
x[state, count(year)] := *hos{state: state, year:year}
?[year] := x["y", year]
}

it returns 0 rows. AFAICT these should be semantically equivalent, so I'd expect one row with count(year) 0 as above.

In the example I used count since its definition is simple, but in my use case I need min_cost, and I want a null value if there are no rows in the aggregation. My actual query is:

{
    container[id, container_id, container_dist] := 
        *triple{subject:id @ 'NOW'}, 
        container_id = id,
        container_dist = 0
    container[id, container_id, container_dist] := 
        *triple{subject:container_id, predicate:"sunwet/1/element", object:id @ 'NOW'}, 
        container_dist = 1
    container[id, c2_id, c2_dist] :=
        container[id, c1_id, c1_dist],
        *triple{subject:c2_id, predicate:"sunwet/1/element", object:c1_id @ 'NOW'},
        c2_dist = c1_dist + 1

    nearest_val_[root_id, pred, min_cost(val)] := 
        container[root_id, container_id, container_dist],
        *triple{subject:container_id, predicate:pred, object:val0 @ 'NOW'},
        val = [val0, container_dist]
    nearest_val[root_id, pred, val] :=
        nearest_val_[root_id, pred, val0],
        val = first(val0)

    ?[album_id, album, artist, cover] := 
        *triple{subject:album_id, predicate:"sunwet/1/is", object:["id", "sunwet/1/album"]},
        nearest_val[album_id, "sunwet/1/name", album],
        nearest_val[album_id, "sunwet/1/artist", artist_id],
        nearest_val[artist_id, "sunwet/1/name", artist],
        nearest_val[album_id, "sunwet/1/cover", cover]
}

(there may be other issues with it, but IIRC it worked if there's at least one value for all of the fields: name/artist/cover/etc)

andrewbaxter commented Mar 6, 2024

Okay, I guess this is because of bottom-up evaluation: x is evaluated first with something like "for each state in hos, produce a row with the count of all years". Since hos is empty, it produces no rows, even though any value for state would satisfy the predicate.
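To make the difference concrete, here is a minimal Python model of the two queries. It is only a sketch of the behaviour observed above, not Cozo's actual implementation; `grouped_count` is a hypothetical helper, and the choice to return a single row when there are no grouping keys is an assumption taken from the first query's output.

```python
from collections import defaultdict


def grouped_count(rows, key_len):
    """Count rows per group; the first key_len columns form the group key."""
    if key_len == 0:
        # Assumption (matches the first query): with no grouping key the
        # aggregate always yields exactly one row, even for empty input.
        return [(len(rows),)]
    counts = defaultdict(int)
    for row in rows:
        counts[tuple(row[:key_len])] += 1
    # One output row per group that actually occurs in the input.
    return [key + (n,) for key, n in sorted(counts.items())]


hos = []  # the freshly created, empty relation: (state, year) tuples

# ?[count(year)] := *hos{state, year}, state = "y"
direct = grouped_count([(year,) for state, year in hos if state == "y"], key_len=0)

# x[state, count(year)] := *hos{state, year};  ?[year] := x["y", year]
x = grouped_count(hos, key_len=1)
via_rule = [(n,) for state, n in x if state == "y"]

# direct is [(0,)], via_rule is []
```

In this model the rule version has nothing to group, so no `("y", 0)` row is ever materialized for the outer query to select.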

I think the workaround is basically to not do any aggregation until the very end. I was using min_cost(x) in one rule and then first(x) in a second rule to get default/optional values in joins; I'm not sure if there's a better idiom for that. The extra rule is a fair amount of boilerplate with a risk of transposing values, it de-localizes the logic so that changes have to be made in multiple rules, and it prevents composition: what was a[x, y] := b[x, y] now needs something like a[x, y] := x_ = get(x,0), x_cost = get(x_, 1), b[x_, y_], y_1 = get(y_, 0), y_cost = get(y_, 1), y = [y_1, [x_cost, y_cost]] to propagate the costs for the final aggregation.

Could the execution model somehow be extended to support infinite or generated sets evaluated lazily? Or is there a better idiom for dealing with optional rules? Could some sort of custom macro function/macro rule thing help get rid of some of the boilerplate maybe?

andrewbaxter commented Mar 6, 2024

Actually

{
    ?[a, b, c] <- 
        [[1, 'a', 'A'], [2, 'b', 'B'], [3, 'c', 'C'], [4, 'd', 'D']]
    :create fd {a, b => c}
}
{
    x[a, b] <- []
    ?[a, count(b)] := *fd{a: a}, x[a, b]
}

produces 0 rows too. I assume here *fd{a: a} is a non-empty set, x[a, b] is an empty set, and it must be doing the join (conjunction) before doing the count(b), so there are no rows left to group by a.

I found a workaround, but it's very labor intensive:

{
    ?[a, b, c] <- 
        [[1, 'a', 'A'], [2, 'b', 'B'], [3, 'c', 'C'], [4, 'd', 'D']]
    :create fd {a, b => c}
}
{
    x[a, b] <- []
    y[a, b] := *fd{a}, b = null
    y[a, b] := x[a, b_], b = [b_, 0]
    z[a, count(b)] := y[a, b]
    ?[a, b] := z[a, b_], b = b_ - 1
}

i.e. use "or" (a second definition of the same rule) to add dummy rows based on the non-optional data, then adjust the results to remove the dummy rows as an additional step. For smallest_by this doesn't need extra code to remove them, but for count and possibly other aggregations you need to subtract 1 afterwards.
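The same trick can be sketched in Python to show why the subtraction is needed. This is only a model of the workaround above under the assumption that rules produce sets of rows; `count_with_defaults` is a hypothetical name, not anything Cozo provides.

```python
def count_with_defaults(keys, optional_rows):
    """Count optional rows per key, yielding 0 for keys with no rows.

    keys          -- values guaranteed to exist (the `a` column of fd)
    optional_rows -- (key, value) pairs that may be empty (the x rule)
    """
    # y[a, b] := *fd{a}, b = null        -> one dummy row per known key
    rows = {(k, None) for k in keys}
    # y[a, b] := x[a, b_], b = [b_, 0]   -> real rows, tagged so they can
    # never collide with the dummy row
    rows |= {(k, (v, 0)) for k, v in optional_rows}

    # z[a, count(b)] := y[a, b]
    counts = {}
    for k, _ in rows:
        counts[k] = counts.get(k, 0) + 1

    # ?[a, b] := z[a, b_], b = b_ - 1    -> remove the dummy from the count
    return sorted((k, n - 1) for k, n in counts.items())


empty_case = count_with_defaults([1, 2, 3, 4], [])
# empty_case is [(1, 0), (2, 0), (3, 0), (4, 0)]
```

Every key contributes exactly one dummy row, so the final counts are off by one until the last step corrects them.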

Again, I don't think it can be "wrapped up" - in my full example

    nearest_val_[root_id, pred, val] :=
        *triple{subject:root_id, predicate:pred @'NOW'},
        val = [null, 99999]
    nearest_val_[root_id, pred, val] := 
        container[root_id, container_id, container_dist],
        *triple{subject:container_id, predicate:pred, object:val0 @'NOW'},
        val = [val0, container_dist]
    nearest_val[id, pred, min_cost(val)] := nearest_val_[id, pred, val]

    ?[album_id, album, artist, cover] := 
        *triple{subject:album_id, predicate:"sunwet/1/is", object:["id", "sunwet/1/album"] @ 'NOW'},
        nearest_val[album_id, "sunwet/1/name", album],
        nearest_val[album_id, "sunwet/1/artist", artist_id], nearest_val[artist_id, "sunwet/1/name", artist],
        nearest_val[album_id, "sunwet/1/cover", cover]

doesn't work because the first nearest_val_ also produces no values, because no rows with predicate sunwet/1/artist exist, so the dummy [null, 99999] row is never put in the set. The workaround in the minimal example at the start of this reply only works because the only variable in the expression has the value that's guaranteed to exist (a).

andrewbaxter commented Mar 6, 2024

This works:

    dummy_null[n] <- [[null]]
    dummy_preds[p] <- [["sunwet/1/name"], ["sunwet/1/artist"], ["sunwet/1/cover"]]

    nearest_val_1[root_id, pred, val] :=
        dummy_null[root_id],
        dummy_preds[pred],
        val = [null, 99999]
    nearest_val_1[root_id, pred, val] :=
        *triple{subject:root_id @'NOW'},
        dummy_preds[pred],
        val = [null, 99999]
    nearest_val_1[root_id, pred, val] := 
        container[root_id, container_id, container_dist],
        *triple{subject:container_id, predicate:pred, object:val0 @'NOW'},
        val = [val0, container_dist]
    nearest_val_2[id, pred, min_cost(val)] := nearest_val_1[id, pred, val]
    nearest_val[id, pred, val] := nearest_val_2[id, pred, val_], val = first(val_)

    ?[album_id, album, artist, cover] := 
        *triple{subject:album_id, predicate:"sunwet/1/is", object:["id", "sunwet/1/album"] @ 'NOW'},
        nearest_val[album_id, "sunwet/1/name", album],
        nearest_val[album_id, "sunwet/1/artist", artist_id], nearest_val[artist_id, "sunwet/1/name", artist],
        nearest_val[album_id, "sunwet/1/cover", cover]

Basically I hardcode all predicates and the null id to produce dummy rows in the absence of data in the triples relation. It works because there's a finite set of predicates and chaining only has to deal with null ids. I'm not sure this is generally applicable.
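As a rough illustration of why the chained lookup now succeeds, here is a Python model of the sentinel approach. It is a sketch under the assumption that min_cost keeps the pair with the lowest cost; `nearest_val`, its parameters, and the sample data are hypothetical and only mirror the rules above.

```python
SENTINEL_COST = 99999  # same "infinitely far" cost as the dummy rows above


def nearest_val(root_ids, preds, found):
    """Return {(root_id, pred): value}, with None where nothing was found.

    found -- iterable of (root_id, pred, value, distance) candidates
    """
    # Dummy rows: every known id plus the null id, crossed with every
    # hardcoded predicate, each holding the [null, 99999] sentinel.
    best = {
        (root_id, pred): (None, SENTINEL_COST)
        for root_id in list(root_ids) + [None]
        for pred in preds
    }
    # min_cost: keep the candidate with the smallest distance per key.
    for root_id, pred, value, dist in found:
        key = (root_id, pred)
        if key not in best or dist < best[key][1]:
            best[key] = (value, dist)
    # first(val): drop the cost, keep the value.
    return {key: pair[0] for key, pair in best.items()}


table = nearest_val(
    ["album1"],
    ["name", "artist", "cover"],
    [("album1", "name", "Far", 2), ("album1", "name", "Near", 1)],
)
artist_id = table[("album1", "artist")]    # no artist triple, so None
artist_name = table[(artist_id, "name")]   # lookup on the null id, also None
```

The null-id rows are what keep the second lookup from eliminating the album: a missing artist resolves to None, and None itself has an entry for every predicate.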

I'm not sure, but maybe this ticket should be retitled to something like "Add tools for dealing with empty sets". That could be either a way to create an "infinite set" where rows are only generated during conjunction, or a way to provide a default for atoms that don't match (like nearest_val[album_id, "x", y] or y = null) that doesn't end up producing combinatorial expansion.
