-
Notifications
You must be signed in to change notification settings - Fork 153
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[DYOD] Only use valid UCCs in optimization and limit cacheability of plans optimized using UCCs #2600
base: j-hellenberg/maintain_data_dependencies
Are you sure you want to change the base?
Conversation
8016545
to
1ef774f
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A looooot of changes, but good job!
Again only minor comments and all in all great changes! Feel free to consider my comments and if they're not relevant, skip them.
For the future: Maybe, when touching so many files, it makes sense to specify an order in which the reviewer can work through the files.
Happy coding!
@@ -6,7 +6,11 @@ namespace hyrise { | |||
|
|||
FunctionalDependency::FunctionalDependency(ExpressionUnorderedSet init_determinants, | |||
ExpressionUnorderedSet init_dependents) | |||
: determinants(std::move(init_determinants)), dependents(std::move(init_dependents)) { | |||
: FunctionalDependency(std::move(init_determinants), std::move(init_dependents), true) {} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've recently seen this in somebody else's code and I thought it improves readability. Maybe you find it useful as well?
: FunctionalDependency(std::move(init_determinants), std::move(init_dependents), true) {} | |
: FunctionalDependency(std::move(init_determinants), std::move(init_dependents), /* permanent=*/ true) {} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I also wondered why this is being set to true. Maybe it makes sense to explain this in a comment?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've recently seen this in somebody else's code and I thought it improves readability. Maybe you find it useful as well?
While they improve readability, I personally dislike adding comments like this because they need to be maintained as well and can easily become out of sync. Also, in this case, the constructor that is called is just two lines below that, so one can easily look there to see how the parameter is named.
I also wondered why this is being set to true. Maybe it makes sense to explain this in a comment?
I added a comment for that
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(see comment in header)
@@ -26,16 +26,25 @@ namespace hyrise { | |||
* | |||
* Currently, the determinant expressions are required to be non-nullable to be involved in FDs. Combining null values | |||
* and FDs is not trivial. For more reference, see https://arxiv.org/abs/1404.4963. | |||
* | |||
* If the FD may become invalid in the future (because it is not based on a schema constraint, but on the data | |||
* incidentally fulfilling the constraint at the moment), the FD is marked as being not permanent. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As this is the first mention of the term permanent, it maybe makes sense to explain it and its consequences quickly?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I added a comment :)
IsCacheable operator&(IsCacheable lhs, IsCacheable rhs) { | ||
return static_cast<IsCacheable>(static_cast<bool>(lhs) & static_cast<bool>(rhs)); | ||
} | ||
|
||
IsCacheable& operator&=(IsCacheable& lhs, const IsCacheable rhs) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
out of curiosity: why is only IsCacheable
parameter passed as reference?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not quite sure if I get the question right: lhs
is passed by reference, because we actually want to modify this parameter with the operator. We don't want to modify rhs
, so this parameter is passed by value.
But I guess, the return type of the function can be IsCacheable
instead of IsCacheable&
. Good catch!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually, the explanation regarding the lhs
is correct, but returning a reference is just the normal way to implement that operator: returning the result of the assignment again instead of making a copy.
@@ -187,6 +197,8 @@ void DependentGroupByReductionRule::_apply_to_plan_without_subqueries( | |||
|
|||
return LQPVisitation::VisitInputs; | |||
}); | |||
|
|||
return rule_was_applied_using_non_permanent_ucc ? IsCacheable::No : IsCacheable::Yes; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure if this aligns with the coding guidelines. Personally, I'd prefer this expression, but maybe the teaching team can leave a comment here. The rule I'm referring to is this one:
Use braced control statements, even for single-line blocks. Moreover, unless the block is empty (e.g., while (!ready) {}), add line breaks. Instead of if (...) x(); (or if (...) { x(); }), write:
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think this coding guideline applies to ternaries. I think the point of this guideline is to not write something like
if (...)
foo()
and later accidentally change it to
if (...)
foo()
bar()
without adding the necessary brackets, which is interpreted as
if (...) {
foo()
}
bar()
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
absolutely right, conditional/ternary operator is perfectly fine.
src/test/lib/optimizer/strategy/join_to_predicate_rewrite_rule_test.cpp
Outdated
Show resolved
Hide resolved
d214251
to
b3b721c
Compare
…uaranteed to be currently valid Co-authored-by: kalathen <kalathen@jyu.fi> Co-authored-by: Margarete <margarete.dippel@student.hpi.de>
…y be false if they were derived from a table key constraint which can become invalid Co-authored-by: kalathen <kalathen@jyu.fi> Co-authored-by: Margarete <margarete.dippel@student.hpi.de>
…e resulting query plan is cacheable Co-authored-by: kalathen <kalathen@jyu.fi> Co-authored-by: Margarete <margarete.dippel@student.hpi.de>
… we need to be able to call is_permanent on it Co-authored-by: kalathen <kalathen@jyu.fi> Co-authored-by: Margarete <margarete.dippel@student.hpi.de>
…query plan and only cache cacheable query plans Co-authored-by: kalathen <kalathen@jyu.fi> Co-authored-by: Margarete <margarete.dippel@student.hpi.de>
Co-authored-by: kalathen <kalathen@jyu.fi> Co-authored-by: Margarete <margarete.dippel@student.hpi.de>
Co-authored-by: kalathen <kalathen@jyu.fi> Co-authored-by: Margarete <margarete.dippel@student.hpi.de>
Co-authored-by: kalathen <kalathen@jyu.fi> Co-authored-by: Margarete <margarete.dippel@student.hpi.de>
a5382c1
to
be582b7
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I really like your code and most comments are just minor remarks or ideas on design choices (and some code guidelines though). Did not really find functionality issues or bugs, nice work.
DebugAssert(!expressions.empty(), "Invalid input. Set of expressions should not be empty."); | ||
DebugAssert(has_output_expressions(expressions), | ||
"The given expressions are not a subset of the LQP's output expressions."); | ||
|
||
const auto& unique_column_combinations = this->unique_column_combinations(); | ||
if (unique_column_combinations.empty()) { | ||
return false; | ||
return std::nullopt; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
return {};
* sanity checks. | ||
*/ | ||
bool has_matching_ucc(const ExpressionUnorderedSet& expressions) const; | ||
std::optional<UniqueColumnCombination> get_matching_ucc(const ExpressionUnorderedSet& expressions) const; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
guess that's a matter of taste, but we also have find_column_id
with comparable semantics, so find_matching_ucc
?
@@ -139,7 +139,7 @@ UniqueColumnCombinations AggregateNode::unique_column_combinations() const { | |||
|
|||
// Make sure that we do not add an already existing or a superset UCC. | |||
if (unique_column_combinations.empty() || | |||
!contains_matching_unique_column_combination(unique_column_combinations, group_by_columns)) { | |||
!get_matching_unique_column_combination(unique_column_combinations, group_by_columns).has_value()) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we need has_value()
here?
*/ | ||
struct FunctionalDependency { | ||
FunctionalDependency(ExpressionUnorderedSet init_determinants, ExpressionUnorderedSet init_dependents); | ||
FunctionalDependency(ExpressionUnorderedSet init_determinants, ExpressionUnorderedSet init_dependents, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pick here, seen it multiple times: Just use a default parameter instead of two constructors
FunctionalDependency(ExpressionUnorderedSet init_determinants, ExpressionUnorderedSet init_dependents,
bool permanent = true);
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also, I'm not 100% happy with the naming. An FD, e.g., also stems from an Aggregate (group by columns determine aggregates). So permanent might be misleading. Maybe sth like time_independent
? Open for better ideas.
|
||
bool operator==(const FunctionalDependency& other) const; | ||
bool operator!=(const FunctionalDependency& other) const; | ||
size_t hash() const; | ||
|
||
bool is_permanent() const; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Any means to create an accessor for a public member?
@@ -446,4 +446,11 @@ TEST_F(BetweenCompositionTest, HandleMultipleEqualExpressions) { | |||
EXPECT_LQP_EQ(result_lqp, expected_lqp); | |||
} | |||
|
|||
TEST_F(BetweenCompositionTest, CheckCacheability) { | |||
const auto input_lqp = PredicateNode::make(equals_(_a_a, 100), PredicateNode::make(equals_(_a_b, 100), _node_a)); | |||
const auto lqp_result = StrategyBaseTest::apply_rule_with_cacheability_check(_rule, input_lqp); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
guess we don't need StrategyBaseTest::
prefix here?
const auto predicate_node = PredicateNode::make(equals_(lqp_column_(stored_table_node, ColumnID{0}), "zzz")); | ||
predicate_node->set_left_input(stored_table_node); | ||
|
||
const auto lqp_result = StrategyBaseTest::apply_rule_with_cacheability_check(_rule, predicate_node); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(see above)
stored_table_node); | ||
const auto lqp_result = apply_rule_with_cacheability_check(rule, lqp); | ||
const auto cacheable = lqp_result.cacheable; | ||
EXPECT_EQ(cacheable, true); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
EXPECT_TRUE
const auto lqp_result = StrategyBaseTest::apply_rule_with_cacheability_check(rule, input_lqp); | ||
const auto cacheable = lqp_result.cacheable; | ||
EXPECT_EQ(cacheable, true); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(see above, skip this comment for the next few occurrences)
@@ -274,17 +274,18 @@ bool AbstractLQPNode::is_column_nullable(const ColumnID column_id) const { | |||
return left_input()->is_column_nullable(column_id); | |||
} | |||
|
|||
bool AbstractLQPNode::has_matching_ucc(const ExpressionUnorderedSet& expressions) const { | |||
std::optional<UniqueColumnCombination> AbstractLQPNode::get_matching_ucc( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We never really access the UCCs, right? We only care about their cacheability later, don't we? So maybe return std::pair<bool, IsCacheable>
directly?
if (!table->constraint_guaranteed_to_be_valid(table_key_constraint)) { | ||
continue; | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You discard a possibly valid UCC in an edge case here: Imagine UCC was valid for CommitID 5. Our transaction runs on a snapshot w/ CommitID 7. Another transaction committed with CommitID 8 and inserted tuples that could possibly invalidate the UCC. However, the UCC is still valid for the snapshot that our transaction sees. We cannot derive this knowledge only using the max begin CID, so we should add a comment here.
Put this on hold until end of August. We plan to incorporate this feature, but there is currently no capacity for it. Will work on it when back from the internship. |
Make sure to only use UCCs in optimization that are
guaranteed_to_be_valid
.Also, we want to ensure that Hyrise only caches logical and physical query plans that cannot become invalid, because executing an invalid query plan might produce incorrect results. Query plans could become invalid if they have been optimized using UCCs that are not permanent, e.g., incidental UCCs discovered by the
UccDiscoveryPlugin
. The same can happen if a Functional Dependency (FD) was used that was derived from a non-permanent UCC. We ensure correct caching by having each optimization rule return whether the resulting Logical Query Plan can be cached.Note that this PR currently targets our first PR branch, in order to not include the changes from there again here :) Please review the other PR before this one. We will change the PR to target master before merging.