
Creating per session cloud table on read requests #1232

Open
ozcelgozde opened this issue Feb 22, 2024 · 7 comments
Labels: question (Further information is requested)

Comments

@ozcelgozde (Contributor)

Question

I was looking into some timeouts during reads and noticed that we create a cloud table over and over again for each read transaction, even though the table has not changed at all. This adds a 3-10 second delay to each query and sometimes causes timeouts. I was wondering why the read replicas don't keep a copy of a queried table once it is loaded until a modify event occurs, at which point they could reload it?

@yuanzhubi (Collaborator)

Do you mean the StorageCloudMergeTree object in the worker? Different sub-queries from the server may search for different parts in the same table (addDataParts, loadDataParts), so it should not be used outside the query session.
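For readers following along, here is a minimal standalone sketch of the pattern being described. This is not ByConity code; SessionCloudTable and createForSession are hypothetical names. The point is that the worker-side table is instantiated per query session, with the transaction id folded into its name, because different sub-queries may carry different part lists for the same base table.

```cpp
// Hypothetical sketch of the per-session cloud table pattern discussed above.
// None of these types exist in ByConity; they only illustrate why the object
// is scoped to one query session rather than cached across sessions.
#include <cstdint>
#include <iostream>
#include <string>
#include <utility>
#include <vector>

struct SessionCloudTable
{
    std::string name;                 // e.g. "ed_metric_<uuid>_<txn_id>"
    std::vector<std::string> parts;   // only the parts this sub-query needs
};

static SessionCloudTable createForSession(const std::string & base_table,
                                          uint64_t txn_id,
                                          std::vector<std::string> parts)
{
    // Created fresh for each query session: two concurrent sub-queries on the
    // same base table may receive entirely different part lists.
    return SessionCloudTable{base_table + "_" + std::to_string(txn_id), std::move(parts)};
}

int main()
{
    auto t1 = createForSession("ed_metric_2d6be233", 448054137376211845ULL, {"part_0_1", "part_2_3"});
    auto t2 = createForSession("ed_metric_2d6be233", 448054137376211846ULL, {"part_4_5"});
    std::cout << t1.name << " holds " << t1.parts.size() << " parts\n";
    std::cout << t2.name << " holds " << t2.parts.size() << " parts\n";
}
```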

@ozcelgozde (Contributor, Author)

Yes, I mean the StorageCloudMergeTree object in the worker. I understand loading different parts, but recreating the table every time produces an interesting error on retry or fallback when a timeout or RPC socket connection issue occurs:

2024.02.29 07:15:03.325072 [ 1494590 ] {3ba693d9-d6d2-11ee-892c-ba3d820a74a5} (448054137376211845) TCPHandler: Code: 57, e.displayText() = DB::Exception: DB::Exception: Table ed_mt_v1.ed_metric_2d6be233-f7bb-4fe1-90a5-28a95c86ec9c_448054137376211845 already exists. SQLSTATE: 42P07.: While executing Remote SQLSTATE: 42P07, Stack trace:
0. Poco::Exception::Exception(std::__1::basic_string<char, std::__1::char_traits, std::__1::allocator > const&, int) @ 0x223c0252 in /opt/byconity/bin/clickhouse

  1. DB::Exception::Exception(std::__1::basic_string<char, std::__1::char_traits, std::__1::allocator > const&, int, bool) @ 0xebc1580 in /opt/byconity/bin/clickhouse
  2. DB::readException(DB::ReadBuffer&, std::__1::basic_string<char, std::__1::char_traits, std::__1::allocator > const&, bool) @ 0xec30dcf in /opt/byconity/bin/clickhouse
  3. DB::RPCHelpers::checkException(std::__1::basic_string<char, std::__1::char_traits, std::__1::allocator > const&) @ 0x1b706f8b in /opt/byconity/bin/clickhouse
  4. void DB::RPCHelpers::onAsyncCallDoneWithFailedInfoDB::Protos::SendResourcesResp(DB::Protos::SendResourcesResp*, brpc::Controller*, std::__1::shared_ptrDB::ExceptionHandlerWithFailedInfo, DB::WorkerId) @ 0x1b51522a in /opt/byconity/bin/clickhouse
  5. brpc::internal::FunctionClosure4<DB::Protos::SendResourcesResp*, brpc::Controller*, std::__1::shared_ptrDB::ExceptionHandlerWithFailedInfo, DB::WorkerId>::Run() @ 0x1b516fa1 in /opt/byconity/bin/clickhouse
  6. /home/ByConity/build_docker/../contrib/incubator-brpc/src/brpc/controller.h:704: brpc::Controller::EndRPC(brpc::Controller::CompletionInfo const&) @ 0x1e1ae9a9 in /opt/byconity/bin/clickhouse
  7. /home/ByConity/build_docker/../contrib/incubator-brpc/src/brpc/controller.cpp:0: brpc::Controller::OnVersionedRPCReturned(brpc::Controller::CompletionInfo const&, bool, int) @ 0x1e1ad506 in /opt/byconity/bin/clickhouse
  8. /home/ByConity/build_docker/../contrib/incubator-brpc/src/brpc/details/controller_private_accessor.h:0: brpc::policy::ProcessRpcResponse(brpc::InputMessageBase*) @ 0x1e1d6a30 in /opt/byconity/bin/clickhouse
  9. /home/ByConity/build_docker/../contrib/incubator-brpc/src/brpc/input_messenger.cpp:386: brpc::InputMessenger::OnNewMessages(brpc::Socket*) @ 0x1e1cd2ce in /opt/byconity/bin/clickhouse
  10. /home/ByConity/build_docker/../contrib/libcxx/include/memory:1655: brpc::Socket::ProcessEvent(void*) @ 0x1e2fd6cd in /opt/byconity/bin/clickhouse
  11. /home/ByConity/build_docker/../contrib/incubator-brpc/src/bthread/task_group.cpp:304: bthread::TaskGroup::task_runner(long) @ 0x1e19af2d in /opt/byconity/bin/clickhouse
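A toy illustration of how a retried SendResources call can collide with the session table left behind by the failed first attempt. None of this is ByConity's API; createSessionTable and the worker_tables set are invented stand-ins. It only shows why a non-idempotent create raises code 57, and how create-if-not-exists semantics would tolerate the retry.

```cpp
// Hypothetical stand-in for worker-side session-table registration, used only
// to illustrate the "Table ... already exists" (code 57) collision on retry.
#include <iostream>
#include <set>
#include <stdexcept>
#include <string>

static std::set<std::string> worker_tables;  // stands in for worker-side state

static void createSessionTable(const std::string & name, bool if_not_exists)
{
    if (worker_tables.count(name))
    {
        if (if_not_exists)
            return;  // a retry silently reuses the existing session table
        throw std::runtime_error("Code: 57, Table " + name + " already exists");
    }
    worker_tables.insert(name);
}

int main()
{
    const std::string name = "ed_mt_v1.ed_metric_..._448054137376211845";
    createSessionTable(name, false);      // first attempt succeeds
    try
    {
        createSessionTable(name, false);  // retry after an RPC timeout -> code 57
    }
    catch (const std::exception & e)
    {
        std::cout << "retry failed: " << e.what() << "\n";
    }
    createSessionTable(name, true);       // an idempotent retry would succeed
}
```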

@yuanzhubi (Collaborator)

@smmsmm1988 Could you check the exception and confirm whether it is a known issue?

@smmsmm1988 (Collaborator)

smmsmm1988 commented Mar 1, 2024

> @smmsmm1988 Could you check the exception and confirm whether it is a known issue?

@yuanzhubi No, it seems to be a new issue. @luffyhwl Could you help check this exception?

@luffyhwl (Collaborator)

luffyhwl commented Mar 1, 2024

> Yes, I mean the StorageCloudMergeTree object in the worker. I understand loading different parts, but recreating the table every time produces an interesting error on retry or fallback when a timeout or RPC socket connection issue occurs: (stack trace quoted above)

@ozcelgozde
This exception may occur during fallback. The commit below fixes this issue: when SQL fails to execute in optimizer mode, the worker resource is now deleted synchronously.

commit ac18e6a
Date: Thu Jan 18 10:32:52 2024 +0800

Merge branch 'zema/cnch-2.0-fixRemoveWorkerResource' into 'cnch-ce-merge'

fix(clickhousech@m-3000501254): remove SUBMIT_THREADPOOL for removeWorkerResource
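A rough sketch of the behavioural difference the commit message points at, not the actual patch: removeWorkerResource and retryQuery below are invented stand-ins. When the removal is handed off asynchronously it can still be pending while the fallback retry re-creates the session table; a synchronous removal finishes before the retry is issued.

```cpp
// Hypothetical illustration of async vs. synchronous worker-resource cleanup;
// not ByConity code. The deferred std::async stands in for a queued thread-pool
// task that has not run yet when the fallback retry arrives.
#include <future>
#include <iostream>
#include <set>
#include <string>

static std::set<std::string> worker_resources;

static void removeWorkerResource(const std::string & txn) { worker_resources.erase(txn); }

static bool retryQuery(const std::string & txn)
{
    // The retry re-creates the per-session table; it only succeeds if the
    // previous attempt's resource is already gone.
    return worker_resources.insert(txn).second;
}

int main()
{
    const std::string txn = "448054137376211845";
    worker_resources.insert(txn);  // left over from the failed attempt

    // Asynchronous removal: still pending when the retry arrives.
    auto pending = std::async(std::launch::deferred, removeWorkerResource, txn);
    std::cout << "retry with async cleanup: " << (retryQuery(txn) ? "ok" : "already exists") << "\n";
    pending.get();

    // Synchronous removal: finishes before the retry is issued.
    removeWorkerResource(txn);
    std::cout << "retry with sync cleanup:  " << (retryQuery(txn) ? "ok" : "already exists") << "\n";
}
```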

@ozcelgozde (Contributor, Author)

ozcelgozde commented Mar 4, 2024

We are actually running the latest master and still sometimes observe the same error. When it happens, I also see this error from the server:
2024.03.04 06:34:41.053854 [ 370177 ] {} SessionResource(448144097629438632): Error occurs when remove WorkerResource{448144097629438632} in worker 10.64.139.114:8124: Code: 2008, e.displayText() = DB::Exception: 112:[E112]Not connected to 10.64.139.114:8124 yet, server_id=904 [R1][E112]Not connected to 10.64.139.114:8124 yet, server_id=904 [R2][E112]Not connected to 10.64.139.114:8124 yet, server_id=904 [R3][E112]Not connected to 10.64.139.114:8124 yet, server_id=904 SQLSTATE: HY000, Stack trace (when copying this message, always include the lines below):
0. Poco::Exception::Exception(std::__1::basic_string<char, std::__1::char_traits, std::__1::allocator > const&, int) @ 0x223c0252 in /opt/byconity/bin/clickhouse

  1. DB::Exception::Exception(std::__1::basic_string<char, std::__1::char_traits, std::__1::allocator > const&, int, bool) @ 0xebc1580 in /opt/byconity/bin/clickhouse
  2. DB::RpcClientBase::assertController(brpc::Controller const&) @ 0x1b54489f in /opt/byconity/bin/clickhouse
  3. DB::CnchWorkerClient::removeWorkerResource(DB::TxnTimestamp) @ 0x1b5128ef in /opt/byconity/bin/clickhouse
  4. DB::CnchServerResource::cleanResourceInWorker() @ 0x1b6b46dc in /opt/byconity/bin/clickhouse
  5. DB::CnchServerResource::~CnchServerResource() @ 0x1b6b4a43 in /opt/byconity/bin/clickhouse
  6. DB::Context::~Context() @ 0x1be06937 in /opt/byconity/bin/clickhouse
  7. std::__1::__shared_ptr_pointer<DB::Context*, std::__1::shared_ptrDB::Context::__shared_ptr_default_delete<DB::Context, DB::Context>, std::__1::allocatorDB::Context >::__on_zero_shared() @ 0x1be50097 in /opt/byconity/bin/clickhouse
  8. DB::TCPHandler::runImpl() @ 0x1d4d8ab1 in /opt/byconity/bin/clickhouse
  9. DB::TCPHandler::run() @ 0x1d4e6b9c in /opt/byconity/bin/clickhouse
  10. Poco::Net::TCPServerConnection::start() @ 0x2233598c in /opt/byconity/bin/clickhouse
  11. Poco::Net::TCPServerDispatcher::run() @ 0x22335e6c in /opt/byconity/bin/clickhouse
  12. Poco::PooledThread::run() @ 0x2241cf7a in /opt/byconity/bin/clickhouse
  13. Poco::ThreadImpl::runnableEntry(void*) @ 0x2241ab2c in /opt/byconity/bin/clickhouse
  14. start_thread @ 0x7ea7 in /lib/x86_64-linux-gnu/libpthread-2.31.so
  15. clone @ 0xfca2f in /lib/x86_64-linux-gnu/libc-2.31.so
    (version 21.8.7.1)

From read workers:
2024.03.04 06:34:40.540414 [ 28444 ] {} BrpcRemoteBroadcastReceiver: Broadcast BrpcReciver[2_1_0_0_10.64.103.226:8124]:ExchangeDataKey[448144099350676295_0_0_18446744073709551615] finished and changed to SEND_UNKNOWN_ERROR with err:'Try close receiver grafully'

2024.03.04 06:34:40.540763 [ 28754 ] {} PlanSegmentExecutor: [420c95fe-d9f1-11ee-a2dd-a6d5e601c8a5_1]: Query has excpetion with code: 2010, detail
: Code: 2010, e.displayText() = DB::Exception: Fail to call DB.Protos.RegistryService.registry, error code: 1014, msg: [E1014]Got EOF of Socket{id=678 fd=178 addr=10.64.139.114:8124:52794} (0x0x7f44601c6980) SQLSTATE: HY000, Stack trace (when copying this message, always include the lines below):
0. Poco::Exception::Exception(std::__1::basic_string<char, std::__1::char_traits, std::__1::allocator > const&, int) @ 0x223c0252 in /opt/byconity/bin/clickhouse

  1. DB::Exception::Exception(std::__1::basic_string<char, std::__1::char_traits, std::__1::allocator > const&, int, bool) @ 0xebc1580 in /opt/byconity/bin/clickhouse
  2. DB::RpcClient::assertController(brpc::Controller const&, int) @ 0x1d72114d in /opt/byconity/bin/clickhouse
  3. DB::MultiPathReceiver::registerToSendersJoin() @ 0x1d718fa4 in /opt/byconity/bin/clickhouse
  4. DB::PlanSegmentExecutor::registerAllExchangeReceivers(DB::QueryPipeline const&, unsigned int) @ 0x1c9672c8 in /opt/byconity/bin/clickhouse
  5. DB::PlanSegmentExecutor::buildPipeline(std::__1::unique_ptr<DB::QueryPipeline,
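Reading the two logs together: if the removeWorkerResource RPC fails with the connection error shown in the server trace, the stale session resource presumably stays on the worker, which fits luffyhwl's description of the fallback path. Purely as an illustration, and not a proposed ByConity patch, a cleanup retry with backoff would look roughly like this (cleanupWithRetry is hypothetical):

```cpp
// Hypothetical mitigation sketch, not ByConity code: retry the cleanup RPC a
// few times with backoff instead of giving up on the first connection error.
#include <chrono>
#include <functional>
#include <iostream>
#include <thread>

static bool cleanupWithRetry(const std::function<bool()> & remove_worker_resource,
                             int max_attempts = 3)
{
    for (int attempt = 1; attempt <= max_attempts; ++attempt)
    {
        if (remove_worker_resource())
            return true;  // worker acknowledged the removal
        std::this_thread::sleep_for(std::chrono::milliseconds(100 * attempt));
    }
    return false;  // caller could log this and rely on a later GC pass
}

int main()
{
    int calls = 0;
    // Simulated RPC that fails twice ("Not connected yet") and then succeeds.
    bool ok = cleanupWithRetry([&calls] { return ++calls >= 3; });
    std::cout << "cleanup " << (ok ? "succeeded" : "failed") << " after " << calls << " call(s)\n";
}
```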

@ozcelgozde (Contributor, Author)

Any number of possible errors end up surfacing as "Table already exists" :)
