column format error in part gc thread #1198

Open
dogauzuncukoglu opened this issue Feb 8, 2024 · 8 comments
Labels
bug Something isn't working P2 minor issue

Comments

@dogauzuncukoglu
Contributor

Bug Report

Briefly describe the bug

We started observing an error being logged on the server component. Here is the error log. If we can collect anything else that might be helpful please let us know.

2024.02.08 11:44:31.089511 [ 2096 ] {} <Error> ed_mt_v1.`ed_metric_3b3e736f-59b3-4058-84d0-927e72f60e58_agg_5m` (72649060-8fba-4cbd-8c7f-744fc093fc89)(PartGCThread): Error occurs while remove part 20240205_447510457035260771_447510457035260771_1_447510459719091038: Code: 27, e.displayText() = DB::ParsingException: Cannot parse input: expected 'columns format version: 1\n' before: '�pDi\b�\0\0ersion: 1\n0 columns:\n', Stack trace (when copying this message, always include the lines below):
0. Poco::Exception::Exception(std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&, int) @ 0x21e25df2 in /opt/byconity/bin/clickhouse
1. DB::Exception::Exception(std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&, int, bool) @ 0xea63780 in /opt/byconity/bin/clickhouse
2. DB::ParsingException::ParsingException(std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&, int) @ 0xea67cb0 in /opt/byconity/bin/clickhouse
3. DB::throwAtAssertionFailed(char const*, DB::ReadBuffer&) @ 0xead4713 in /opt/byconity/bin/clickhouse
4. ? @ 0xead4df4 in /opt/byconity/bin/clickhouse
5. DB::NamesAndTypesList::readText(DB::ReadBuffer&) @ 0x1b2a3110 in /opt/byconity/bin/clickhouse
6. DB::NamesAndTypesList::parse(std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&) @ 0x1b2a430b in /opt/byconity/bin/clickhouse
7. DB::createPartFromModel(DB::MergeTreeMetaBase const&, DB::Protos::DataModelPart const&, std::__1::optional<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > >) @ 0x1b15af7b in /opt/byconity/bin/clickhouse
8. DB::ServerDataPart::toCNCHDataPart(DB::MergeTreeMetaBase const&, std::__1::optional<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > > const&) const @ 0x1ae44dcd in /opt/byconity/bin/clickhouse
9. void std::__1::__function::__policy_invoker<void ()>::__call_impl<std::__1::__function::__default_alloc_func<DB::CnchPartGCThread::doPhaseTwoGC(std::__1::shared_ptr<DB::IStorage> const&, DB::StorageCnchMergeTree&)::$_6::operator()(unsigned long, unsigned long, unsigned long, unsigned long) const::'lambda'(), void ()> >(std::__1::__function::__policy_storage const*) @ 0x1b8dfee0 in /opt/byconity/bin/clickhouse
10. ThreadPoolImpl<ThreadFromGlobalPool>::worker(std::__1::__list_iterator<ThreadFromGlobalPool, void*>) @ 0xeaa7011 in /opt/byconity/bin/clickhouse
11. ThreadFromGlobalPool::ThreadFromGlobalPool<void ThreadPoolImpl<ThreadFromGlobalPool>::scheduleImpl<void>(std::__1::function<void ()>, int, std::__1::optional<unsigned long>)::'lambda0'()>(void&&, void ThreadPoolImpl<ThreadFromGlobalPool>::scheduleImpl<void>(std::__1::function<void ()>, int, std::__1::optional<unsigned long>)::'lambda0'()&&...)::'lambda'()::operator()() @ 0xeaa8b28 in /opt/byconity/bin/clickhouse
12. ThreadPoolImpl<std::__1::thread>::worker(std::__1::__list_iterator<std::__1::thread, void*>) @ 0xeaa3720 in /opt/byconity/bin/clickhouse
13. void* std::__1::__thread_proxy<std::__1::tuple<std::__1::unique_ptr<std::__1::__thread_struct, std::__1::default_delete<std::__1::__thread_struct> >, void ThreadPoolImpl<std::__1::thread>::scheduleImpl<void>(std::__1::function<void ()>, int, std::__1::optional<unsigned long>)::'lambda0'()> >(void*) @ 0xeaa7b9a in /opt/byconity/bin/clickhouse
14. start_thread @ 0x7ea7 in /lib/x86_64-linux-gnu/libpthread-2.31.so
15. __clone @ 0xfca2f in /lib/x86_64-linux-gnu/libc-2.31.so
 (version 21.8.7.1)

The result you expected

How to Reproduce

Version

cc4e467

@dogauzuncukoglu dogauzuncukoglu added the bug Something isn't working label Feb 8, 2024
@nudles
Collaborator

nudles commented Feb 13, 2024

@dogauzuncukoglu it looks like the columns.txt file in part 20240205_447510457035260771_447510457035260771_1_447510459719091038 contains incorrectly encoded text.
Could you check it?

Does it happen for many parts or just this one?
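
For anyone checking a suspect part by hand, here is a minimal sketch of the header check. It only assumes what the error message states, namely that the text must begin with 'columns format version: 1\n'. The function name and the sample bytes are hypothetical; the corrupted sample is modelled on the log excerpts in this thread and its byte values are placeholders, not the real data.

```python
from typing import Optional

# Header the parser expects, taken from the error message in this issue.
EXPECTED_HEADER = b"columns format version: 1\n"


def first_header_mismatch(data: bytes) -> Optional[int]:
    """Return the offset of the first byte that deviates from the expected
    header, or None if the data starts with the complete header."""
    for offset, expected in enumerate(EXPECTED_HEADER):
        if offset >= len(data) or data[offset] != expected:
            return offset
    return None


# Hypothetical samples: placeholder bytes followed by the intact tail
# that is visible in the log message.
corrupted = b"\x00N\xff\xff\xff~\x00\x00ersion: 1\n0 columns:\n"
healthy = b"columns format version: 1\n0 columns:\n"

print(first_header_mismatch(corrupted))  # 0: the very first byte is wrong
print(first_header_mismatch(healthy))    # None: header is intact
```

A result of 0 matches what the stack trace suggests: the parser fails at the first byte of the header rather than somewhere later in the column list.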

@kevinthfang
Contributor

Hi @dogauzuncukoglu, would you please check the file nudles listed?

@ozcelgozde
Contributor

I see that happening in a few tables on different parts. How can I get the columns.txt file for specific parts on S3?

@smmsmm1988
Collaborator

smmsmm1988 commented Mar 5, 2024

Hi @ozcelgozde @dogauzuncukoglu. You can download the part data from S3 using the relative path "bucket/{YOUR_PREFIX}/part_id/data", where part_id is the UUID of the part that reports the error in the log. You can look up the part_id in system.cnch_parts by part name.
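
The path layout above could be scripted roughly as follows. This is a sketch, not ByConity tooling: `part_data_key` and `download_part_data` are hypothetical helper names, the bucket, prefix, and part_id values are placeholders, and the download step assumes the third-party boto3 package with credentials already configured.

```python
def part_data_key(prefix: str, part_id: str) -> str:
    """Build the object key "{prefix}/{part_id}/data" inside the bucket,
    following the relative path described above."""
    pieces = [prefix.strip("/"), part_id, "data"]
    return "/".join(p for p in pieces if p)


def download_part_data(bucket: str, prefix: str, part_id: str, dest: str) -> None:
    """Download the part's data object to a local file (assumes boto3)."""
    import boto3  # third-party; imported lazily so the key helper works without it

    s3 = boto3.client("s3")
    s3.download_file(bucket, part_data_key(prefix, part_id), dest)


# Placeholder values for illustration only.
print(part_data_key("my_prefix/", "00000000-0000-0000-0000-000000000000"))
# my_prefix/00000000-0000-0000-0000-000000000000/data
```

Once downloaded, the file can be inspected locally for the corrupted bytes.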

@kevinthfang
Contributor

@ozcelgozde @dogauzuncukoglu Any updates on this? Are you still encountering this issue?

@ozcelgozde
Contributor

ozcelgozde commented Apr 16, 2024

We are still encountering the issue. A blocking issue has prevented us from upgrading ByConity, so we have not picked up any fixes that may have been introduced since.
One interesting detail: our tables have a TTL of 30 days, but I still get errors like this:

2024.04.16 19:22:07.769537 [ 272590 ] {} <Error> ed_mt.`ed_log` (7edd3289-c5dc-4e71-8cf4-09be080424d6)(PartGCThread): Error occurs while remove part 20240224_447952891282457490_447952891282457490_1_447953046046244943: Code: 27, e.displayText() = DB::ParsingException: Cannot parse input: expected 'columns format version: 1\n' before: '\0N���~\0\0ersion: 1\n0 columns:\n', Stack trace (when copying this message, always include the lines below):

It is still trying to delete a part from February. Of course, this part no longer exists in the cnch_parts table, so I cannot look up its part_id to find it in S3. Something might be stuck in the cache or in FDB.

@smmsmm1988
Collaborator

Hi @ozcelgozde. That part has been moved to trashed items, so you need to query FDB to get the part_id required to download it. The key format for trashed items in the KV store is GCTRASH_{table_uuid}_{part_name}. In the latest version of ByConity, the part_id is also recorded in server_part_log.
The part is corrupted, which blocks GC from removing it. To work around this, delete the part's data from S3 using its part_id, and then remove its entry from FDB.
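
As a rough illustration, the table UUID and part name needed for that key can be pulled straight out of the PartGCThread error line. The regular expression and the function name below are assumptions based only on the two log lines posted in this issue, and the result covers just the GCTRASH_{table_uuid}_{part_name} portion; any extra prefix the catalog may add to the stored key is not handled here.

```python
import re
from typing import Optional

# Matches "(<table_uuid>)(PartGCThread): Error occurs while remove part <part_name>:"
# as it appears in the log lines quoted in this thread.
GC_ERROR = re.compile(
    r"\((?P<table_uuid>[0-9a-f]{8}(?:-[0-9a-f]{4}){3}-[0-9a-f]{12})\)"
    r"\(PartGCThread\): Error occurs while remove part (?P<part_name>[^\s:]+):"
)


def trash_key_from_log(line: str) -> Optional[str]:
    """Return GCTRASH_{table_uuid}_{part_name} for a PartGCThread error
    line, or None if the line does not match the expected shape."""
    match = GC_ERROR.search(line)
    if match is None:
        return None
    return f"GCTRASH_{match['table_uuid']}_{match['part_name']}"


line = (
    "2024.04.16 19:22:07.769537 [ 272590 ] {} <Error> ed_mt.`ed_log` "
    "(7edd3289-c5dc-4e71-8cf4-09be080424d6)(PartGCThread): Error occurs while "
    "remove part 20240224_447952891282457490_447952891282457490_1_447953046046244943: "
    "Code: 27, e.displayText() = DB::ParsingException: Cannot parse input"
)
print(trash_key_from_log(line))
```

Running this over the server log gives one candidate key per failing part, which helps when several tables are affected.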

@kevinthfang
Contributor

@ozcelgozde Did the method @smmsmm1988 suggested work for you?

@kevinthfang kevinthfang added the P2 minor issue label May 14, 2024