Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Calling run() multiple times #30

Open
B-Lorentz opened this issue Dec 27, 2023 · 6 comments
Open

Calling run() multiple times #30

B-Lorentz opened this issue Dec 27, 2023 · 6 comments

Comments

@B-Lorentz
Copy link

B-Lorentz commented Dec 27, 2023

If I understand correctly, ascent is't a 'differential datalog'-style implementation, but is optimized for batch processing.
The API does however allow .run() to be called multiple times, so users will eventually do it.
I did a little experiment:

use std::time::Instant;

use ascent::ascent;

fn main() {
    ascent!{
        #![measure_rule_times]
        relation x(u64,u64);
        relation y(u64,u64);
        y(i,n) <-- x(i,n), x(i,n+100);
        x(i,n+200) <-- y(i,n), if *n < 400000;  
    }
    let mut prog = AscentProgram::default();
    fn time(prog: &mut AscentProgram, title: &str) {
        let start = Instant::now();
        prog.run();
        println!("{:?} {:?}", prog.x.len(), prog.y.len());
        println!("{}", prog.scc_times_summary());
        println!("{}: {:?}",title, start.elapsed());
    }
    prog.x = vec![(0, 1,), (0, 101,)];
    time(&mut prog, "Initial run");

    prog.x.push((1,1,));    
    time(&mut prog,"Add of non-consequential fact");

    prog.x.push((1,101,));
    time(&mut prog,"Add of consequential fact");

    let mut prog = AscentProgram::default();
    prog.x = vec![(0, 1,), (0, 101,), (1,1,), (1,101)];
    time(&mut prog,"Full rerun");
}

The output is:

4002 4001
scc 0 time: 1.436391002s
  some of rule times: 1.42136186s
  rule x <-- y_indices_none_delta, if ⋯
    time: 4.024887ms
  rule y <-- x_indices_none_delta, x_indices_0_1_total+delta
    time: 2.227919ms
  rule y <-- x_indices_none_total, x_indices_0_1_delta
    time: 1.415109054s
-----------------------------------------

Initial run: 1.436422348s
4003 4001
scc 0 time: 1.443135223s
  some of rule times: 1.42810157s
  rule x <-- y_indices_none_delta, if ⋯
    time: 5.573965ms
  rule y <-- x_indices_none_delta, x_indices_0_1_total+delta
    time: 7.418424ms
  rule y <-- x_indices_none_total, x_indices_0_1_delta
    time: 1.415109181s
-----------------------------------------

Add of non-consequential fact: 9.375828ms
8004 8002
scc 0 time: 11.459918911s
  some of rule times: 11.423892339s
  rule x <-- y_indices_none_delta, if ⋯
    time: 12.857089ms
  rule y <-- x_indices_none_delta, x_indices_0_1_total+delta
    time: 17.993869ms
  rule y <-- x_indices_none_total, x_indices_0_1_delta
    time: 11.393041381s
-----------------------------------------

Add of consequential fact: 10.019436059s
8004 8002
scc 0 time: 3.073175727s
  some of rule times: 3.047507618s
  rule x <-- y_indices_none_delta, if ⋯
    time: 7.811237ms
  rule y <-- x_indices_none_delta, x_indices_0_1_total+delta
    time: 4.606486ms
  rule y <-- x_indices_none_total, x_indices_0_1_delta
    time: 3.035089895s
-----------------------------------------

Full rerun: 3.073211278s

It seems to me that:

  • running after adding to the input relation sometimes (my second run), does a lot less work than a full rerun.
  • at other times, when a lot of new facts were implied by the new fact, it does more work than a full rerun.

While (as I expected), removing facts from relations isn't supported.

 ascent!{
        struct A;
        relation x(u64);
        relation y(u64);
        y(n) <-- x(n);
    }
    let mut prog = A::default();
    prog.x = vec![(1,), (2,)];
    prog.run();
    println!("{:?}", prog.y);
    prog.x = vec![(1,)];
    prog.run();
    println!("{:?}", prog.y)

produces
[(1,), (2,)]
[(1,), (2,)]

My question is, would it be possible to either:

  • add some support for incrmentalized computation, so that you could indicate 'I've added facts to this relation' and 'I've removed facts from this relation', causing some of the indices to be recomputed?
  • prevent the user from calling .run() multiple times, by making run take ownership of the program instead of taking it by mutable reference
@s-arash
Copy link
Owner

s-arash commented Dec 27, 2023

Great observations.

  • add some support for incrmentalized computation, so that you could indicate 'I've added facts to this relation' and 'I've removed facts from this relation', causing some of the indices to be recomputed?

This is what is done now. Ascent conservatively assumes that all relations may have changed, and recomputes all indices for the program. Here, it seems that this recomputing of indices is very expensive for some reason. This is something that can potentially be addressed. Although I'd have to think about its ramifications for BYODS.

  • prevent the user from calling .run() multiple times, by making run take ownership of the program instead of taking it by mutable reference.

This is also a possibility. But I prefer the other option.

@B-Lorentz
Copy link
Author

@s-arash thanks.
But there isn't only a performance issue.
At least in my last example, where input facts are removed, the result isn't consistent after calling run at the second time.

@bugeats
Copy link

bugeats commented Feb 15, 2024

I have yet to run into a case where calling run() multiple times does not cause a subsequent geometric increase in eval time, even when the row counts don't actually change.

My programs are large but not super complex, and I don't think I'm doing anything too off the beaten path (using enum instances for values).

My solution for now is to manage caches myself and break things down into multiple one-off ascent_run! calls.

@s-arash
Copy link
Owner

s-arash commented Mar 9, 2024

At least in my last example, where input facts are removed, the result isn't consistent after calling run at the second time.

Right, I can't think of a good way to prevent removing facts from relations. Even replacing the Vec<T> type for relations with an append-only vec type will not prevent removing facts.

I have yet to run into a case where calling run() multiple times does not cause a subsequent geometric increase in eval time.

Ascent wasn't really designed for incremental processing of data.

I can just disallow calling run() multiple times. But I feel like it may prevent a few legitimate use cases.

I have purposefully not hidden the internals of Ascent's relations and indices so they can be tinkered with by those who know what they are doing.

@B-Lorentz
Copy link
Author

@s-arash
I think if there is one way (calling once) that is the normal and designed for use of the API, and one that is more for power users who really know what they are doing, it may make sense to separate them, like:

fn run(self) -> ResultSet 

(where ResultSet provides the contents of the relations after run, and the indices, but can't be used to run again)

and

/// Runs the program and allows it to be run later. Runs in general are not incremental, and may take longer time than rerunning from scratch, or may fail to account for facts manually removed from the input relation after the first run.
fn run_and_keep(&mut self)

Alternatively, there could be some switch attribute on the macro itself, which would generate the fields by default as private, with appropriate getters (handing out shared references), but would allow advanced users to request an 'exposed internals' ascent program.

@StarGazerM
Copy link

I think mod/rerun will require add provenance (like add fact/remove fact) semantic.
for example in query

edge(1,2).
path(a, b) :- edge(a, b).

changing edge (like remove edge(1,2)) will cause datalog rule not holds logically (some path(1,2), can't be inferred).
with some semantic support like semi-ring should help with incremental computation.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants