Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

add more detail info for metablk debug #426

Merged
merged 4 commits into from
May 30, 2024

Conversation

JacksonYao287
Copy link
Contributor

@JacksonYao287 JacksonYao287 commented May 15, 2024

the metablk crc mismatch issue is not easy to be reproduced, and one most possible reason is write failure, which will lead to crc mismatch.
as we discussed in homestore meeting, this PR add more detail debug info which will be helpful when the issue occurs. also in this PR, we will terminate the process with an assert failure so that it will not lead to later CRC mismatch when recovery

the strategy of handling write failures of different service of homestore will be written up to a doc and discussed later on

@JacksonYao287 JacksonYao287 marked this pull request as draft May 15, 2024 14:09
@JacksonYao287 JacksonYao287 self-assigned this May 16, 2024
@JacksonYao287 JacksonYao287 added the bug Something isn't working label May 16, 2024
@JacksonYao287 JacksonYao287 marked this pull request as ready for review May 16, 2024 06:29
@JacksonYao287 JacksonYao287 requested review from hkadayam and yamingk and removed request for hkadayam May 16, 2024 06:29
try {
m_sb_vdev->sync_write((const char*)m_ssb, block_size(), m_ssb->bid);
} catch (std::exception& e) { HS_REL_ASSERT(false, "exception happen during write {}", e.what()); }
auto error = m_sb_vdev->sync_write((const char*)m_ssb, block_size(), m_ssb->bid);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How about create a new function MetaBlkService::sync_write and move all error handling to that function, to avoid duplicated error handling?

void MetaBlkService::sync_write(const char* data, uint32_t size, BlkId const& bid) { 
        // copy line: 199~206 here 
}

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if we do that , we also have to handle the the error returned by MetaBlkService::sync_write whereever it is called. that is almost the same as we use m_sb_vdev->sync_write directly

Copy link
Contributor

@yamingk yamingk May 30, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No. MetaBlkService::sync_write::sync_write is returning void and only error handling happens in MetaBlkService::sync_write, not in the caller. So instead of 4 places doing the same error handling (e.g. a release assert), only one place is doing that.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok , will do this in another PR

Copy link
Contributor

@xiaoxichen xiaoxichen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm

@JacksonYao287 JacksonYao287 merged commit e9721a0 into eBay:master May 30, 2024
21 checks passed
@JacksonYao287 JacksonYao287 deleted the metablk-crc branch May 30, 2024 14:47
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

Successfully merging this pull request may close these issues.

recover meta block crashes sometimes in github CI and local machine build
3 participants