New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
temporary QueryWorkflowAnalysis code #684
base: rku/aql-machine
Are you sure you want to change the base?
Conversation
fd9dbfc
to
ee74b6e
Compare
} | ||
} | ||
|
||
pub fn check(&'a self, query: &'a Query) -> Vec<QueryWorkflowAnalysisError> { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The query
argument should be the same as self.query
, right? In that case it should be removed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah yes.
.workflows | ||
.iter() | ||
.for_each(|(_ident, workflow)| errors.extend(self.check_workflow(workflow))); | ||
errors |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I’d use self.query.workflows.values().flat_map(|wf| self.check_workflow(wf)).collect()
} | ||
Call { | ||
workflow: workflow_ident, | ||
cases: _, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
cases need to be checked as well
.. | ||
} => { | ||
if let Some(workflow) = self.query.workflows.get(&*workflow_ident) { | ||
self.check_workflow(workflow) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wouldn’t do this here, since we have a reliable top-level enumeration of all workflows. This means the tracker isn’t needed.
} | ||
|
||
#[derive(Default)] | ||
pub(crate) struct WorkflowTracker<'a>(pub(crate) RefCell<BTreeSet<&'a Workflow<'a>>>); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Probably no longer needed, but still: why pub(crate)
? Such internal things should always stay as local as possible, especially with a RefCell inside.
pub(crate) query: &'a Query<'a>, | ||
pub(crate) workflow_tracker: WorkflowTracker<'a>, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think neither of these fields are needed: the (recursive) analysis functions can just get the outer Query
and the syntax element in question as arguments. When analysing the flow inside a workflow we’ll probably need to have some more context, though, so it might be appropriate to keep this struct with the query
field in any case.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would also imagine the type checker information to be stored in this struct too because it will be used inside while checking binders
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was faced with similar vec vs iter decisions while writing the code to dive down the workflow getting the steps. Curious to learn your take on that.
} | ||
|
||
fn check_step(&'a self, step: &'a WorkflowStep<'a>) -> Vec<QueryWorkflowAnalysisError> { | ||
use super::workflow::WorkflowStep::*; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nitpick: Not a fan of using all the enum members. In this case is not particularly problematic but take a look into this https://youtu.be/8j_FbjiowvE?si=DOchI2_0vieFkEjg&t=103 (not the full video)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
wat?? TIL thanks for the reference
}); | ||
} | ||
|
||
errors.extend(cases.iter().flat_map(|(_, steps)| self.check_steps(steps))); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since this always runs, you can do:
let mut errors = cases.iter().flat_map(|(_, steps)| self.check_steps(steps)).collect::<Vec<_>>();
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For some reason, I feel that workflow_identity error should go first before the cases
error. Not that I am sure why this ordering matter at all
My first thought was: if these methods can return an iterator rather than vec, then at the end a chain and collect is executed, then it might result in a better code. But I do not know how to do that nor I can find any references on the internet on how to do that |
6c6149c
to
866c948
Compare
07ecd0e
to
9470962
Compare
9470962
to
159f37b
Compare
a9def1a
to
832e473
Compare
} | ||
_ => Err(format!("{:?} cannot be accessed by {:?}", x, first)), | ||
Type::Atom(type_atom) => match type_atom { | ||
TypeAtom::Universal => return Ok(Type::Atom(TypeAtom::Universal)), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Shouldn't this be
return Ok(*cur_type) // or cur_type.clone() if Type doesn't implement Copy
Or even just:
return Err(anyhow!("UNIVERSAL type is not indexable"))
pub(crate) error_type: QueryTypeCheckErrorType, | ||
} | ||
|
||
type ExtraMessage = String; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is usually not done in Rust: type
aliases are used to write down complex types with less verbosity, otherwise newtypes like struct ExtraMessage(String)
are used to create a separate type with different semantics.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
With struct
how can I make it so that ExtraMessage
can be operated as if it is a string?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What do you want to achieve? In which way shall the new type be different from String
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I want to state that this is a string, but also an "ExtraMessage" for InvalidIndex
error.
The reason is, DrillDownResult
cannot be Result<Type, QueryTypeCheckErrorType::InvalidIndex>
and Result<Type, String>
isn't clear enough on what the string are
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In that case I’d just use Result<Type, QueryTypeCheckErrorType>
as the return type.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In general the control flow structure will need to be geared towards traversing the statements of a workflow in the same order as during evaluation, including a context that is enriched along the way. This enables checking that eventLabel @ whatever
refers to a valid whatever
— which may either be a workflow argument or a binding.
} | ||
} | ||
|
||
pub(crate) struct QueryTypeCheck<'a> { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Overall, type checking is a function, not an object, so I’d start from functions, not methods. If the argument lists become too long and repetitive, then some arguments can be passed within a single struct.
tail: vec![Index::String(String::from("some_string"))].try_into().unwrap(), | ||
}, | ||
) | ||
.is_err()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd add the second parameter to assert!
with a short explanation of what failed
} | ||
|
||
#[test] | ||
fn tuple() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: I'd separate all the asserts in this test into multiple tests. AFAIK you also get to take more advantage of parallelism that way
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this file is lacking some docs, they should help you pick the file back up in the future too as well as any potential contributors.
(Type::Record(a), Type::Record(b)) => { | ||
let a = a.iter().collect::<BTreeSet<_>>(); | ||
let b = b.iter().collect::<BTreeSet<_>>(); | ||
let intersection = BTreeSet::intersection(&a, &b).cloned().collect::<BTreeSet<_>>(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Consider the following records:
type A = { x: number, y: string }
type B = { y: string }
As far as your code goes, this makes it so that A supertypeof B
which I agree with, however, what if we have the following:
type A = { x: number, y: { y_x: string, y_z: string } }
type B = { x: number, y: { y_x: string } }
Logic follows that A supertypeof B
is still true, however I don't think this piece of code handles this case.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I haven’t yet read the code, but @jmg-duarte your comment above is incorrect: a subtype can be used in place of its supertype (Liskov substitution principle), which means that it must have at least the properties of the supertype. A is a subtype of B, in both cases.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
really nice catch. thanks for this
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I haven’t yet read the code, but @jmg-duarte your comment above is incorrect: a subtype can be used in place of its supertype (Liskov substitution principle), which means that it must have at least the properties of the supertype. A is a subtype of B, in both cases.
True, I applied the logic for refinements here where NUMBER is supertype of 10.
But I think the nesting issue still applies?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The nesting issue still applies I think.
One(T), | ||
} | ||
|
||
fn spread<T>(vec: Vec<T>, spreader: impl Fn(&T) -> Spread<T>) -> Vec<T> { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure I understand the point of this function, could you add docs or explain?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
will do. and it will be moved down because it doesn't make sense being between struct Type and impl Type
/// Spread union tree into a set of type references without collapsing | ||
fn spread_union((a, b): &(Type, Type)) -> BTreeSet<&Type> { | ||
let spread = spread(vec![a, b], |x| match x { | ||
Type::Union(union) => { | ||
let (a, b) = union.as_ref(); | ||
Spread::Many(vec![a, b]) | ||
} | ||
x => Spread::One(x), | ||
}); | ||
|
||
spread.into_iter().collect() | ||
} | ||
|
||
/// Spread union tree into a set of types with collapsing | ||
fn spread_union_collapsing((a, b): &(Type, Type)) -> BTreeSet<Type> { | ||
let spread = spread(vec![a.clone(), b.clone()], |x| match x.clone().collapse() { | ||
Type::Union(union) => { | ||
let (a, b) = union.as_ref(); | ||
Spread::Many(vec![a.clone(), b.clone()]) | ||
} | ||
x => Spread::One(x), | ||
}); | ||
|
||
spread.into_iter().collect() | ||
} | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I feel there must be a better way of writing this API. The double clone that ends up happening in line 337 seems wrong.
Maybe you could use a Collapse
trait and implement it for the various BTreeSet
, delaying the cloning until its really necessary for example. Furthermore, as we discussed, collapse
probably doesn't need to consume the type, you're just checking a bunch of things and returning a new type.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
while collapse doesn't consume, it still produces a new type.
this in turn forces this function to return BTreeSet and not BTreeSet<&Type>
because we cannot return references to objects created within the function (produced by collapse)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah that makes sense but my point was also about how (if possible) to make a slightly better API by just having a single function instead of these two.
1c8396e
to
b160519
Compare
No description provided.