Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Refactoring to perform cardinality estimation specifically for YT. #4482

Merged
merged 1 commit into from
May 24, 2024

Conversation

alephonea
Copy link
Collaborator

Reviewed in
#3866

but accidentally closed that PR - reopening.

@alephonea alephonea requested review from a team as code owners May 13, 2024 22:24
Copy link

github-actions bot commented May 13, 2024

2024-05-13 22:27:36 UTC Pre-commit check for 44833f2 has started.
2024-05-13 22:27:38 UTC Build linux-x86_64-relwithdebinfo is running...
🔴 2024-05-13 22:44:20 UTC Build failed. see the build logs.
🔴 2024-05-13 22:45:58 UTC Tests run skipped.

Copy link

github-actions bot commented May 13, 2024

2024-05-13 22:27:50 UTC Pre-commit check for 44833f2 has started.
2024-05-13 22:27:53 UTC Build linux-x86_64-release-clang14 is running...
🔴 2024-05-13 22:43:42 UTC Build failed. see the build logs.

Copy link

github-actions bot commented May 13, 2024

2024-05-13 22:27:58 UTC Pre-commit check for 44833f2 has started.
2024-05-13 22:28:00 UTC Build linux-x86_64-release-asan is running...
🔴 2024-05-13 22:44:34 UTC Build failed. see the build logs.
🔴 2024-05-13 22:46:08 UTC Tests run skipped.

pavelvelikhov
pavelvelikhov previously approved these changes May 14, 2024
Copy link

github-actions bot commented May 14, 2024

2024-05-14 16:06:11 UTC Pre-commit check for 0d36e1c has started.
2024-05-14 16:06:14 UTC Build linux-x86_64-release-clang14 is running...
🟢 2024-05-14 16:35:59 UTC Build successful.

Copy link

github-actions bot commented May 14, 2024

2024-05-14 16:06:12 UTC Pre-commit check for 0d36e1c has started.
2024-05-14 16:06:15 UTC Build linux-x86_64-release-asan is running...
🟢 2024-05-14 16:22:58 UTC Build successful.
2024-05-14 16:24:45 UTC Tests are running...
🔴 2024-05-14 18:12:14 UTC Some tests failed, follow the links below.

Test history

TESTS PASSED ERRORS FAILED SKIPPED MUTED?
14057 13966 0 32 50 9

Copy link

github-actions bot commented May 14, 2024

2024-05-14 16:24:08 UTC Pre-commit check for 0d36e1c has started.
2024-05-14 16:24:09 UTC Build linux-x86_64-relwithdebinfo is running...
🟢 2024-05-14 16:42:34 UTC Build successful.
2024-05-14 16:44:30 UTC Tests are running...
🔴 2024-05-14 18:37:59 UTC Some tests failed, follow the links below.

Test history

TESTS PASSED ERRORS FAILED SKIPPED MUTED?
72892 59685 0 3 13188 16

TOptimizerStatistics() : KeyColumns(EmptyColumns) {}
TOptimizerStatistics(double nrows, int ncols): Nrows(nrows), Ncols(ncols), KeyColumns(EmptyColumns) {}
TOptimizerStatistics(double nrows, int ncols, double cost): Nrows(nrows), Ncols(ncols), Cost(cost), KeyColumns(EmptyColumns) {}
TOptimizerStatistics(EStatisticsType type, double nrows, int ncols, double cost): Type(type), Nrows(nrows), Ncols(ncols), Cost(cost), KeyColumns(EmptyColumns) {}
TOptimizerStatistics(EStatisticsType type, double nrows, int ncols, double byteSize, double cost): Type(type), Nrows(nrows), Ncols(ncols), ByteSize(byteSize), Cost(cost), KeyColumns(EmptyColumns) {}
TOptimizerStatistics(EStatisticsType type, double nrows, int ncols, double cost, const TVector<TString>& keyColumns): Type(type), Nrows(nrows), Ncols(ncols), Cost(cost), KeyColumns(keyColumns) {}
TOptimizerStatistics(EStatisticsType type, double nrows, int ncols, double byteSize, double cost, const TVector<TString>& keyColumns): Type(type), Nrows(nrows), Ncols(ncols), ByteSize(byteSize), Cost(cost), KeyColumns(keyColumns) {}
TOptimizerStatistics(EStatisticsType type, double nrows, int ncols, double byteSize, double cost, const TVector<TString>& keyColumns, IProviderStatistics* specific)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Почему нельзя все сделать одним конструктором с дефолтными значениями аргументов?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

Comment on lines 20 to 24
Undefined,
LookupJoin,
MapJoin,
GraceJoin,
StreamLookupJoin //Right part can be updated during an operation. Used mainly for joining streams with lookup tables. Currently impplemented in Dq by LookupInputTransform
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Здесь перечислены только DQ-специфичные алгоритмы. В YT провайдере есть другие реализации. Что с ними предполагается делать? Как будет расширяться этот enum?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Предполагается что enum будет общим для DQ и YT. Если YT понадобится алгоритм джойна, который здесь не перечислен, то для DQ IsJoinApplicable() должен вернуть false для этого алгоритма.

Сейчас YT делает либо Map join, либо merge join (GraceJoin в этом енуме).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

GraceJoin больше соответствует Common стратегии. Не понятно, как тут различать merge и common

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Добавил MergeJoin

В качестве альтернативы, можно привязать алгоритмы все джойна к системе, в которой они выполняются:

DqLookupJoin,
DqMapJoin,
..
YtMapJoin,
YtMergeJoin

чтобы было понятно из значения enum-а как конкретно он будет исполняться. Но мне кажется, это излишнее усложнение.

Comment on lines 40 to 41
TJoinColumn(TString relName, TString attributeName) : RelName(relName),
AttributeName(attributeName) {}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

std::move()

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

@@ -37,7 +23,7 @@ bool NDq::operator < (const NDq::TJoinColumn& c1, const NDq::TJoinColumn& c2) {
return false;
}

TString NYql::ConvertToJoinAlgoString(EJoinAlgoType joinAlgo) {
TString ConvertToJoinAlgoString(EJoinAlgoType joinAlgo) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Используй GENERATE_ENUM_SERIALIZATION в ya.make

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Эта функция используется для вывода в std::stringstream (как и ConvertToJoinString) который не поддерживается GENERATE_ENUM_SERIALIZATION

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Не понял про поддержку. Ты будешь просто писать ToString(joinAlgo) вместо ConvertToJoinAlgoString(joinAlgo). И ToString() тебе сгенерит GENERATE_ENUM_SERIALIZATION, а не руками перечислять все константы

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, done.

const std::set<std::pair<NDq::TJoinColumn, NDq::TJoinColumn>>& joinConditions,
EJoinAlgoType joinAlgo) const override;

static const TBaseProviderContext& instance();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Style: Instance

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

@@ -109,4 +109,123 @@ void TJoinOptimizerNode::Print(std::stringstream& stream, int ntabs) {
RightArg->Print(stream, ntabs+1);
}

bool IsPKJoin(const TOptimizerStatistics& stats, const TVector<TString>& joinKeys) {
if (stats.KeyColumns.size()==0) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Style: пробелы вокруг ==

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

return false;
}

for(size_t i=0; i<stats.KeyColumns.size(); i++){
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Style: пробелы вокруг =, < и ()

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

Copy link

github-actions bot commented May 15, 2024

2024-05-15 22:21:50 UTC Pre-commit check for 667db86 has started.
2024-05-15 22:21:53 UTC Build linux-x86_64-release-clang14 is running...
🟢 2024-05-15 22:37:04 UTC Build successful.

Copy link

github-actions bot commented May 15, 2024

2024-05-15 22:21:57 UTC Pre-commit check for 667db86 has started.
2024-05-15 22:21:59 UTC Build linux-x86_64-release-asan is running...
🟢 2024-05-15 22:39:03 UTC Build successful.
2024-05-15 22:41:00 UTC Tests are running...
🔴 2024-05-16 00:32:07 UTC Some tests failed, follow the links below.

Test history

TESTS PASSED ERRORS FAILED SKIPPED MUTED?
14365 14263 0 42 51 9

Copy link

github-actions bot commented May 15, 2024

2024-05-15 22:21:58 UTC Pre-commit check for 667db86 has started.
2024-05-15 22:22:01 UTC Build linux-x86_64-relwithdebinfo is running...
🟢 2024-05-15 22:38:26 UTC Build successful.
2024-05-15 22:40:27 UTC Tests are running...
🔴 2024-05-16 00:36:38 UTC Some tests failed, follow the links below.

Test history

TESTS PASSED ERRORS FAILED SKIPPED MUTED?
73204 60533 0 6 12657 8

@alephonea alephonea force-pushed the cbo-exp1 branch 2 times, most recently from e277117 to 14c68ba Compare May 21, 2024 22:05
Copy link

github-actions bot commented May 21, 2024

2024-05-21 22:07:37 UTC Pre-commit check for da0f35c has started.
2024-05-21 22:07:40 UTC Build linux-x86_64-release-clang14 is running...
🟢 2024-05-21 22:23:51 UTC Build successful.

Copy link

github-actions bot commented May 21, 2024

2024-05-21 22:07:52 UTC Pre-commit check for da0f35c has started.
2024-05-21 22:07:55 UTC Build linux-x86_64-release-asan is running...
🟢 2024-05-21 22:24:18 UTC Build successful.
2024-05-21 22:26:08 UTC Tests are running...
🔴 2024-05-22 00:23:16 UTC Some tests failed, follow the links below.

Test history

TESTS PASSED ERRORS FAILED SKIPPED MUTED?
14439 14335 0 42 50 12

Copy link

github-actions bot commented May 21, 2024

2024-05-21 22:10:05 UTC Pre-commit check for da0f35c has started.
2024-05-21 22:10:07 UTC Build linux-x86_64-relwithdebinfo is running...
🟢 2024-05-21 22:27:00 UTC Build successful.
2024-05-21 22:28:55 UTC Tests are running...
🔴 2024-05-22 00:24:13 UTC Some tests failed, follow the links below.

Test history

TESTS PASSED ERRORS FAILED SKIPPED MUTED?
73414 60714 0 3 12692 5

Copy link

github-actions bot commented May 22, 2024

2024-05-22 18:43:26 UTC Pre-commit check for 841ded6 has started.
2024-05-22 18:43:28 UTC Build linux-x86_64-release-asan is running...
2024-05-22 18:53:50 UTC Check cancelled

Copy link

github-actions bot commented May 22, 2024

2024-05-22 18:44:57 UTC Pre-commit check for 841ded6 has started.
2024-05-22 18:44:59 UTC Build linux-x86_64-release-clang14 is running...
2024-05-22 18:54:06 UTC Check cancelled

Copy link

github-actions bot commented May 22, 2024

2024-05-22 18:45:35 UTC Pre-commit check for 841ded6 has started.
2024-05-22 18:45:38 UTC Build linux-x86_64-relwithdebinfo is running...
2024-05-22 18:54:07 UTC Check cancelled

Copy link

github-actions bot commented May 22, 2024

2024-05-22 18:55:18 UTC Pre-commit check for e23d56f has started.
2024-05-22 18:55:20 UTC Build linux-x86_64-release-clang14 is running...
🟢 2024-05-22 19:00:18 UTC Build successful.

Copy link

github-actions bot commented May 22, 2024

2024-05-22 18:57:36 UTC Pre-commit check for e23d56f has started.
2024-05-22 18:57:39 UTC Build linux-x86_64-relwithdebinfo is running...
🟢 2024-05-22 19:05:01 UTC Build successful.
2024-05-22 19:06:52 UTC Tests are running...
🔴 2024-05-22 21:03:28 UTC Some tests failed, follow the links below.

Test history

TESTS PASSED ERRORS FAILED SKIPPED MUTED?
73449 60739 0 5 12695 10

Copy link

github-actions bot commented May 22, 2024

2024-05-22 18:57:36 UTC Pre-commit check for e23d56f has started.
2024-05-22 18:57:39 UTC Build linux-x86_64-release-asan is running...
🟢 2024-05-22 19:03:18 UTC Build successful.
2024-05-22 19:05:08 UTC Tests are running...
🔴 2024-05-22 21:06:50 UTC Some tests failed, follow the links below.

Test history

TESTS PASSED ERRORS FAILED SKIPPED MUTED?
14450 14342 0 44 55 9

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants