Aggregation returns 0 rows #244

andrewbaxter · 2024-03-06T15:58:02Z

Working off a random documentation example:

{:create hos {state: String, year: String}}
{?[count(year)] := *hos{state:state, year:year}, state="y"}

returns one row as expected:

count(year)
0

But after moving this into a rule:

{:create hos {state: String, year: String}}
{
x[state, count(year)] := *hos{state: state, year:year}
?[year] := x["y", year]
}

it returns 0 rows. AFAICT these should be semantically equivalent, so I'd expect one row with count(year) 0 as above.

In the example I used count since the definition is simple, but in my use case I need min_cost and I want a null value if there are no rows in the aggregation... my actual query is:

{
    container[id, container_id, container_dist] := 
        *triple{subject:id @ 'NOW'}, 
        container_id = id,
        container_dist = 0
    container[id, container_id, container_dist] := 
        *triple{subject:container_id, predicate:"sunwet/1/element", object:id @ 'NOW'}, 
        container_dist = 1
    container[id, c2_id, c2_dist] :=
        container[id, c1_id, c1_dist],
        *triple{subject:c2_id, predicate:"sunwet/1/element", object:c1_id @ 'NOW'},
        c2_dist = c1_dist + 1

    nearest_val_[root_id, pred, min_cost(val)] := 
        container[root_id, container_id, container_dist],
        *triple{subject:container_id, predicate:pred, object:val0 @ 'NOW'},
        val = [val0, container_dist]
    nearest_val[root_id, pred, val] :=
        nearest_val_[root_id, pred, val0],
        val = first(val0)

    ?[album_id, album, artist, cover] := 
        *triple{subject:album_id, predicate:"sunwet/1/is", object:["id", "sunwet/1/album"]},
        nearest_val[album_id, "sunwet/1/name", album],
        nearest_val[album_id, "sunwet/1/artist", artist_id],
        nearest_val[artist_id, "sunwet/1/name", artist],
        nearest_val[album_id, "sunwet/1/cover", cover]
}

(there may be other issues with it, but IIRC it worked if there's at least one value for all of the fields: name/artist/cover/etc)


andrewbaxter · 2024-03-06T17:41:05Z

Okay, I guess this is because of bottom-up evaluation: x is evaluated first with something like "for each state in hos, produce a row with the count of all years". Since hos is empty there are no states to group by, so it produces no rows, even though any value for state satisfies the predicate.
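To make the difference concrete, here is a minimal Python model of the two query shapes. It is only a sketch of the behaviour described above, not Cozo's actual evaluator, and the helper names count_global and count_by_state are made up for illustration.

```python
from collections import defaultdict

def count_global(rows):
    """Aggregate with no grouping key: always exactly one output row."""
    return [(sum(1 for _ in rows),)]

def count_by_state(rows):
    """Aggregate grouped by state: one output row per state present in the input."""
    groups = defaultdict(int)
    for state, _year in rows:
        groups[state] += 1
    return sorted(groups.items())

hos = []  # the empty stored relation from the report

# Inline form: filter on state first, then aggregate without a grouping key.
inline = count_global([row for row in hos if row[0] == "y"])

# Rule form: x is materialized per state first, then the query selects state "y".
x = count_by_state(hos)
via_rule = [(n,) for state, n in x if state == "y"]

print(inline)    # [(0,)]
print(via_rule)  # []
```

In the rule form the grouping happens before the constant "y" is applied, so an empty input has no group for "y" to select.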

I think the workaround is basically to not do any aggregates until the very end. I was using min_cost(x) and then first(x) in a second rule to get default/optional values in joins; I'm not sure if there's a better idiom for that. The extra rule is a fair amount of boilerplate with a risk of transposing values, de-localizes the logic so that changes are needed in multiple rules, and prevents composition (what was a[x, y] := b[x, y] now needs something like a[x, y] := x_ = get(x,0), x_cost = get(x_, 1), b[x_, y_], y_1 = get(y_, 0), y_cost = get(y_, 1), y = [y_1, [x_cost, y_cost]] to propagate the costs for the final aggregation), etc.

Could the execution model somehow be extended to support infinite or generated sets evaluated lazily? Or is there a better idiom for dealing with optional rules? Could some sort of custom macro function/macro rule thing help get rid of some of the boilerplate maybe?

andrewbaxter · 2024-03-06T18:51:12Z

Actually

{
    ?[a, b, c] <- 
        [[1, 'a', 'A'], [2, 'b', 'B'], [3, 'c', 'C'], [4, 'd', 'D']]
    :create fd {a, b => c}
}
{
    x[a, b] <- []
    ?[a, count(b)] := *fd{a: a}, x[a, b]
}

produces 0 rows too. I assume here *fd{a: a} is a non-empty set, x[a, b] is an empty set, and it must be doing the join of the two before doing the count(b), so there is nothing left to count.

I found a workaround, but it's very labor intensive:

{
    ?[a, b, c] <- 
        [[1, 'a', 'A'], [2, 'b', 'B'], [3, 'c', 'C'], [4, 'd', 'D']]
    :create fd {a, b => c}
}
{
    x[a, b] <- []
    y[a, b] := *fd{a}, b = null
    y[a, b] := x[a, b_], b = [b_, 0]
    z[a, count(b)] := y[a, b]
    ?[a, b] := z[a, b_], b = b_ - 1
}

i.e. use "or" (a second rule body for the same head) to add dummy rows based on the non-optional data, then adjust the results to remove the dummy rows as an additional step (for smallest_by this doesn't need extra code to remove them, but for count and possibly other aggregations you need to subtract 1 afterwards).
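The same padding trick can be modelled in Python as a sanity check. This is a sketch under assumptions, not Cozo semantics verbatim: padded_count is a hypothetical helper, and the second call uses made-up rows for x.

```python
from collections import defaultdict

def padded_count(keys, optional_rows):
    """Count optional rows per key, yielding 0 for keys with no rows.

    Assumes every key in optional_rows also appears in keys, as in the
    query above where x is only meaningful for values of a present in fd.
    """
    # y: one dummy row per key, plus the real rows wrapped in a tuple so
    # they can never collide with the dummy value None.
    y = {(a, None) for a in keys}
    y |= {(a, (b, 0)) for a, b in optional_rows}
    # z: grouped count over y; every key now has at least one row.
    z = defaultdict(int)
    for a, _ in y:
        z[a] += 1
    # Final step: subtract the dummy row from each count.
    return sorted((a, n - 1) for a, n in z.items())

fd_keys = [1, 2, 3, 4]  # the a column of fd
print(padded_count(fd_keys, []))
# [(1, 0), (2, 0), (3, 0), (4, 0)]
print(padded_count(fd_keys, [(1, "p"), (1, "q")]))
# [(1, 2), (2, 0), (3, 0), (4, 0)]
```

The dummy rows guarantee one group per key, which is exactly what the grouped aggregate was missing when x was empty.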

Again, I don't think it can be "wrapped up" - in my full example

    nearest_val_[root_id, pred, val] :=
        *triple{subject:root_id, predicate:pred @'NOW'},
        val = [null, 99999]
    nearest_val_[root_id, pred, val] := 
        container[root_id, container_id, container_dist],
        *triple{subject:container_id, predicate:pred, object:val0 @'NOW'},
        val = [val0, container_dist]
    nearest_val[id, pred, min_cost(val)] := nearest_val_[id, pred, val]

    ?[album_id, album, artist, cover] := 
        *triple{subject:album_id, predicate:"sunwet/1/is", object:["id", "sunwet/1/album"] @ 'NOW'},
        nearest_val[album_id, "sunwet/1/name", album],
        nearest_val[album_id, "sunwet/1/artist", artist_id], nearest_val[artist_id, "sunwet/1/name", artist],
        nearest_val[album_id, "sunwet/1/cover", cover]

doesn't work because the first nearest_val_ also produces no values, because no rows with predicate sunwet/1/artist exist, so the dummy [null, 99999] row is never put in the set. The workaround in the minimal example at the start of this reply only works because the only variable in the expression has the value that's guaranteed to exist (a).

andrewbaxter · 2024-03-06T19:51:19Z

This works

    dummy_null[n] <- [[null]]
    dummy_preds[p] <- [["sunwet/1/name"], ["sunwet/1/artist"], ["sunwet/1/cover"]]

    nearest_val_1[root_id, pred, val] :=
        dummy_null[root_id],
        dummy_preds[pred],
        val = [null, 99999]
    nearest_val_1[root_id, pred, val] :=
        *triple{subject:root_id @'NOW'},
        dummy_preds[pred],
        val = [null, 99999]
    nearest_val_1[root_id, pred, val] := 
        container[root_id, container_id, container_dist],
        *triple{subject:container_id, predicate:pred, object:val0 @'NOW'},
        val = [val0, container_dist]
    nearest_val_2[id, pred, min_cost(val)] := nearest_val_1[id, pred, val]
    nearest_val[id, pred, val] := nearest_val_2[id, pred, val_], val = first(val_)

    ?[album_id, album, artist, cover] := 
        *triple{subject:album_id, predicate:"sunwet/1/is", object:["id", "sunwet/1/album"] @ 'NOW'},
        nearest_val[album_id, "sunwet/1/name", album],
        nearest_val[album_id, "sunwet/1/artist", artist_id], nearest_val[artist_id, "sunwet/1/name", artist],
        nearest_val[album_id, "sunwet/1/cover", cover]

Basically I hardcode all predicates and the null id to produce dummy rows in the absence of data in the triples relation. It works because there's a finite set of predicates and chaining only has to deal with null ids. I'm not sure this is generally applicable.
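For anyone following along, the idea can be sketched in Python roughly as follows. This is only a model of the rules above, not how Cozo evaluates them: the nearest_val function, the sample ids and the candidate rows are all hypothetical, and the container traversal is assumed to have already produced (id, pred, value, dist) tuples.

```python
SENTINEL_COST = 99999  # same dummy cost as in the query above
PREDS = ["sunwet/1/name", "sunwet/1/artist", "sunwet/1/cover"]

def nearest_val(ids, candidates):
    """Pick the lowest-cost value per (id, pred), defaulting to None.

    candidates is an iterable of (id, pred, value, dist) tuples.
    """
    best = {}
    # Dummy rows: every known id, plus the null id, crossed with every predicate.
    for root_id in [None, *ids]:
        for pred in PREDS:
            best[(root_id, pred)] = (None, SENTINEL_COST)
    # Real rows replace a dummy whenever their cost is lower (the min_cost step).
    for root_id, pred, value, dist in candidates:
        key = (root_id, pred)
        if key not in best or dist < best[key][1]:
            best[key] = (value, dist)
    # Keep only the value, dropping the cost (the first(val_) step).
    return {key: value for key, (value, _cost) in best.items()}

ids = ["album1"]  # hypothetical data: an album with no artist triple
candidates = [
    ("album1", "sunwet/1/name", "Blue", 0),
    ("album1", "sunwet/1/cover", "blue.png", 1),
]
nv = nearest_val(ids, candidates)
artist_id = nv[("album1", "sunwet/1/artist")]  # None: no artist recorded
row = (
    "album1",
    nv[("album1", "sunwet/1/name")],
    nv[(artist_id, "sunwet/1/name")],  # chained lookup on the null id
    nv[("album1", "sunwet/1/cover")],
)
print(row)  # ('album1', 'Blue', None, 'blue.png')
```

The dummy row for the null id is what keeps the chained artist lookup from dropping the whole album row.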

I'm not sure, but maybe this ticket should be retitled like "Add tools for dealing with empty sets" or something. Either a way to create an "infinite set" where rows are only generated during conjunction or a way to provide a default for atoms that don't match (like nearest_val[album_id, "x", y] or y = null) but that doesn't end up producing combinatorial expansion.
