Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix: possible race condition on EXCHANGE TABLES query #61135

Open
wants to merge 22 commits into
base: master
Choose a base branch
from
Open
Show file tree
Hide file tree
Changes from 15 commits
Commits
Show all changes
22 commits
Select commit Hold shift + click to select a range
558d764
add storage cache to context and use it during query analysis and exe…
yakov-olkhovskiy Mar 10, 2024
f8b4ef0
test added, comment fixed
yakov-olkhovskiy Mar 11, 2024
2de26fa
style check typo
yakov-olkhovskiy Mar 11, 2024
2f051c7
fix for query having no query context
yakov-olkhovskiy Mar 11, 2024
067a8c8
move to DatabaseCatalog
yakov-olkhovskiy Mar 11, 2024
2825656
fix
yakov-olkhovskiy Mar 11, 2024
064eca6
use weak_ptr
yakov-olkhovskiy Mar 12, 2024
c66d8e1
use StorageID instead of database.table
yakov-olkhovskiy Mar 12, 2024
792a298
Merge remote-tracking branch 'origin/master' into fix-exchange-tables
yakov-olkhovskiy Mar 13, 2024
b24629d
Merge remote-tracking branch 'origin/master' into fix-exchange-tables
yakov-olkhovskiy Mar 16, 2024
5955cc5
add no-ordinary-database tag
yakov-olkhovskiy Mar 20, 2024
e121665
cache storage by UUID
yakov-olkhovskiy Mar 22, 2024
190823f
cleanup
yakov-olkhovskiy Mar 22, 2024
4eeff80
Merge branch 'master' into fix-exchange-tables
yakov-olkhovskiy Mar 22, 2024
c67a898
fix
yakov-olkhovskiy Mar 22, 2024
0eae42a
better argument name
yakov-olkhovskiy Mar 30, 2024
242dc60
Merge branch 'master' into fix-exchange-tables
yakov-olkhovskiy Apr 2, 2024
57627ec
Merge branch 'master' into fix-exchange-tables
yakov-olkhovskiy Apr 3, 2024
856cb21
add mutex to storage_cache
yakov-olkhovskiy Apr 6, 2024
1bd25d0
Merge branch 'master' into fix-exchange-tables
yakov-olkhovskiy Apr 6, 2024
c84b702
fix clang tidy
yakov-olkhovskiy Apr 6, 2024
f0d55ea
Merge branch 'master' into fix-exchange-tables
yakov-olkhovskiy Apr 8, 2024
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Jump to
Jump to file
Failed to load files.
Diff view
Diff view
17 changes: 17 additions & 0 deletions src/Interpreters/Context.cpp
Expand Up @@ -912,6 +912,23 @@ String Context::getFilesystemCacheUser() const
return shared->filesystem_cache_user;
}

DatabaseAndTable Context::getOrCacheStorage(const StorageID & id, std::function<DatabaseAndTable()> f, std::optional<Exception> * exception) const
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We could cache it in Context::resolveStorageID, but a new method is fine too

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think we want to enforce this cache everywhere where we call to Context::resolveStorageID - at least having a query context is a condition we need to use this cache.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can check this condition in resolveStorageID

{
if (auto storage_id = storage_cache.find(id); storage_id != storage_cache.end())
{
DatabaseAndTable storage = DatabaseCatalog::instance().tryGetByUUID(storage_id->uuid);
if (exception && !storage.second)
exception->emplace(Exception(ErrorCodes::UNKNOWN_TABLE, "Table {} does not exist anymore - maybe it was dropped", id.getNameForLogs()));
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Currently, a query will fail if it requests a table the second time, but the table was dropped after the first get. But we can improve it later

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think we need to improve it - if table was dropped concurrently then query is better to be interrupted than trying to reposes the table from drop queue - at this point we don't even know who is holding it from physical deletion...

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We do know, see addStorageHolder

return storage;
}

auto storage = f();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Consider a better name

if (storage.second)
if (const auto & new_id = storage.second->getStorageID(); new_id.hasUUID())
storage_cache.insert(new_id);
return storage;
}

Strings Context::getWarnings() const
{
Strings common_warnings;
Expand Down
7 changes: 7 additions & 0 deletions src/Interpreters/Context.h
Expand Up @@ -21,6 +21,7 @@
#include <Interpreters/Context_fwd.h>
#include <Interpreters/StorageID.h>
#include <Interpreters/MergeTreeTransactionHolder.h>
#include <Interpreters/DatabaseCatalog.h>
#include <Parsers/IAST_fwd.h>
#include <Server/HTTP/HTTPContext.h>
#include <Storages/ColumnsDescription.h>
Expand Down Expand Up @@ -430,6 +431,10 @@ class ContextData
/// mutation tasks of one mutation executed against different parts of the same table.
PreparedSetsCachePtr prepared_sets_cache;

/// Cache for storages resolved during query processing - we want to use the same storages we have resolved on
/// analysis stage for the execution - storage can be changed by concurrently running 'EXCHANGE TABLES' query.
mutable std::unordered_set<StorageID, StorageID::DatabaseAndTableNameHash, StorageID::DatabaseAndTableNameEqual> storage_cache;

public:
/// Some counters for current query execution.
/// Most of them are workarounds and should be removed in the future.
Expand Down Expand Up @@ -524,6 +529,8 @@ class Context: public ContextData, public std::enable_shared_from_this<Context>
String getFilesystemCachesPath() const;
String getFilesystemCacheUser() const;

DatabaseAndTable getOrCacheStorage(const StorageID & id, std::function<DatabaseAndTable()> f, std::optional<Exception> * exception) const;

/// A list of warnings about server configuration to place in `system.warnings` table.
Strings getWarnings() const;

Expand Down
20 changes: 14 additions & 6 deletions src/Interpreters/DatabaseCatalog.cpp
Expand Up @@ -949,29 +949,37 @@ bool DatabaseCatalog::isDictionaryExist(const StorageID & table_id) const
StoragePtr DatabaseCatalog::getTable(const StorageID & table_id, ContextPtr local_context) const
{
std::optional<Exception> exc;
auto res = getTableImpl(table_id, local_context, &exc);
if (!res.second)
auto table = local_context->hasQueryContext() ?
local_context->getQueryContext()->getOrCacheStorage(table_id, [&](){ return getTableImpl(table_id, local_context, &exc); }, &exc).second :
getTableImpl(table_id, local_context, &exc).second;
if (!table)
throw Exception(*exc);
return res.second;
return table;
}

StoragePtr DatabaseCatalog::tryGetTable(const StorageID & table_id, ContextPtr local_context) const
{
return getTableImpl(table_id, local_context, nullptr).second;
return local_context->hasQueryContext() ?
local_context->getQueryContext()->getOrCacheStorage(table_id, [&](){ return getTableImpl(table_id, local_context, nullptr); }, nullptr).second :
getTableImpl(table_id, local_context, nullptr).second;
}

DatabaseAndTable DatabaseCatalog::getDatabaseAndTable(const StorageID & table_id, ContextPtr local_context) const
{
std::optional<Exception> exc;
auto res = getTableImpl(table_id, local_context, &exc);
auto res = local_context->hasQueryContext() ?
local_context->getQueryContext()->getOrCacheStorage(table_id, [&](){ return getTableImpl(table_id, local_context, &exc); }, &exc) :
getTableImpl(table_id, local_context, &exc);
if (!res.second)
throw Exception(*exc);
return res;
}

DatabaseAndTable DatabaseCatalog::tryGetDatabaseAndTable(const StorageID & table_id, ContextPtr local_context) const
{
return getTableImpl(table_id, local_context, nullptr);
return local_context->hasQueryContext() ?
local_context->getQueryContext()->getOrCacheStorage(table_id, [&](){ return getTableImpl(table_id, local_context, nullptr); }, nullptr) :
getTableImpl(table_id, local_context, nullptr);
}

void DatabaseCatalog::loadMarkedAsDroppedTables()
Expand Down
Empty file.
27 changes: 27 additions & 0 deletions tests/queries/0_stateless/03007_exchange_tables_race.sh
@@ -0,0 +1,27 @@
#!/usr/bin/env bash
# Tags: no-ordinary-database

set -e

CURDIR=$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)
# shellcheck source=../shell_config.sh
. "$CURDIR"/../shell_config.sh

${CLICKHOUSE_CLIENT} --multiquery <<EOF
DROP TABLE IF EXISTS tbl_03007_1;
DROP TABLE IF EXISTS tbl_03007_2;
CREATE TABLE tbl_03007_1 (n Float64) ENGINE=Memory;
CREATE TABLE tbl_03007_2 (n Int256) ENGINE=Memory;
EOF

for _ in {1..50}; do
(! ${CLICKHOUSE_CLIENT} --query "SELECT n * 0.123 FROM (SELECT * FROM tbl_03007_1)" 2>&1 | grep LOGICAL_ERROR) &
${CLICKHOUSE_CLIENT} --query "EXCHANGE TABLES tbl_03007_1 AND tbl_03007_2" &
done

wait

${CLICKHOUSE_CLIENT} --multiquery <<EOF
DROP TABLE IF EXISTS tbl_03007_1;
DROP TABLE IF EXISTS tbl_03007_2;
EOF