recover meta block crashes sometimes in github CI and local machine build #365

Open
yamingk opened this issue Mar 29, 2024 · 3 comments · May be fixed by #426
Labels
bug Something isn't working

Comments

@yamingk
Contributor

yamingk commented Mar 29, 2024

/home/runner/.conan/data/homestore/6.0.1/_/_/build/64eb2822c4463ffd57405ab6c52e9954705502c6/src/lib/meta/meta_blk_service.cpp:1126: void homestore::MetaBlkService::recover_meta_block(homestore::meta_blk*): Assertion `0' failed.
[03/25/24 19:47:05+00:00] [critical] [test_meta_blk_mgr] [7169] [meta_blk_service.cpp:1126] ******************** Assertion failure: =====> Expected '726044547' to be == to '3060308851' [type=Test_Rand_Load], CRC mismatch: 726044547/3060308851, on mblk bid: blk#=158081 count=1 chunk=0, context_sz: 152064

https://github.com/eBay/HomeStore/actions/runs/8423765023/job/23072604152

It can also be hit on a local build machine.

@yamingk yamingk added this to the MileStone4.2 milestone Mar 29, 2024
@yamingk yamingk added the bug Something isn't working label Mar 29, 2024
@yamingk
Contributor Author

yamingk commented May 9, 2024

@JacksonYao287 JacksonYao287 linked a pull request May 15, 2024 that will close this issue
@xiaoxichen
Contributor

xiaoxichen commented May 15, 2024

If we read the log carefully, there is always a write failure before the restart. I cannot tell whether the failed write is exactly the metablk write, since we don't log the metablk's offset, but the lengths do match.

In this run https://github.com/eBay/HomeStore/actions/runs/8423765023/job/23072604152
write failure

Warning:  19:46:59+00:00] [warning] [test_meta_blk_mgr] [7169] [drive_interface.cpp:435:sync_write] Error during write offset=284729344 write_size=152064 written_size=-1 errno=22 fd=35

assert

[03/25/24 19:47:05+00:00] [critical] [test_meta_blk_mgr] [7169] [meta_blk_service.cpp:1126] ******************** Assertion failure: =====> Expected '726044547' to be == to '3060308851' [type=Test_Rand_Load], CRC mismatch: 726044547/3060308851, on mblk bid: blk#=158081 count=1 chunk=0, context_sz: 152064

In this run https://github.com/eBay/HomeStore/actions/runs/9020268861/job/24785052596?pr=403
write failure

Warning:  16:42:56+00:00] [warning] [test_meta_blk_mgr] [6627] [drive_interface.cpp:435:sync_write] Error during write offset=428580864 write_size=33792 written_size=-1 errno=22 fd=37

assert

[05/09/24 16:43:01+00:00] [critical] [test_meta_blk_mgr] [6627] [meta_blk_service.cpp:1126] ******************** Assertion failure: =====> Expected '2845276984' to be == to '2914764139' [type=Test_Rand_Load], CRC mismatch: 2845276984/2914764139, on mblk bid: blk#=158125 count=1 chunk=0, context_sz: 33792
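A minimal sketch of checking that size correlation mechanically, assuming only the log formats quoted above: it pulls `write_size` and `written_size` out of the `sync_write` warning and `context_sz` out of the CRC assertion, then compares them. `extract_field` and `sizes_correlate` are hypothetical helpers, not part of HomeStore or iomgr.

```cpp
#include <cassert>
#include <optional>
#include <regex>
#include <string>

// Hypothetical helper: return the integer after "key=" or "key: " in a log line.
std::optional<long long> extract_field(const std::string& line, const std::string& key) {
    const std::regex re(key + R"([=:]\s*(-?\d+))");
    std::smatch m;
    if (std::regex_search(line, m, re)) {
        return std::stoll(m[1].str());
    }
    return std::nullopt;
}

// True when a failed sync_write (written_size=-1) has the same size as the
// context_sz reported by the later CRC-mismatch assertion.
bool sizes_correlate(const std::string& write_warning, const std::string& crc_assert) {
    const auto written = extract_field(write_warning, "written_size");
    const auto write_size = extract_field(write_warning, "write_size");
    const auto context_sz = extract_field(crc_assert, "context_sz");
    return written && *written == -1 && write_size && context_sz && *write_size == *context_sz;
}
```

Both runs above pass this check (152064 and 33792 bytes). A size match is only circumstantial; logging the metablk offset at write time would make the link definitive.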

@xiaoxichen
Contributor

As the error code is 22 (EINVAL), from the write(2) man page:

EINVAL
fd is attached to an object which is unsuitable for writing; or the file was opened with the O_DIRECT flag, and either the address specified in buf, the value specified in count, or the current file offset is not suitably aligned.
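To make the three O_DIRECT conditions in that man-page text concrete, here is a hedged sketch that tests the buffer address, byte count and file offset against a required alignment. The alignment value is an assumption (typically the device's logical block size, 512 or 4096), and `check_direct_io` is a hypothetical helper.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// True if `value` is a multiple of `align`; `align` must be a power of two.
constexpr bool is_aligned(std::uint64_t value, std::uint64_t align) {
    return (value & (align - 1)) == 0;
}

// Per-argument result of the O_DIRECT alignment rules.
struct DirectIoCheck {
    bool buf_ok;     // buffer address aligned
    bool count_ok;   // byte count aligned
    bool offset_ok;  // file offset aligned
};

// Hypothetical helper. `align` is the required alignment, assumed to be the
// device's logical block size (commonly 512 or 4096).
DirectIoCheck check_direct_io(const void* buf, std::size_t count, std::uint64_t offset,
                              std::uint64_t align) {
    assert(align != 0 && (align & (align - 1)) == 0);  // power of two only
    const auto addr = static_cast<std::uint64_t>(reinterpret_cast<std::uintptr_t>(buf));
    return {is_aligned(addr, align), is_aligned(count, align), is_aligned(offset, align)};
}
```

With the logged values, both offsets are multiples of 4096 and both sizes are multiples of 512 but not of 4096 (152064 = 37 * 4096 + 512, 33792 = 8 * 4096 + 1024). The count would therefore only be rejected if the required alignment were 4096, which the logs do not tell us, and the buffer address is not logged at all.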

I believe this is our issue, not an environment issue. I checked that the offset is aligned to 4K and the size to 512, but I am not sure whether the buffer is aligned. Also, the offset is well below 1GB, so we are unlikely to be hitting any size boundary. Regarding logs, I suggest adding more logging in iomgr around the write failure, especially dumping the buffer address as well as the FD's open flags.
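A sketch of the kind of extra logging proposed above, assuming a POSIX system: on failure it records the buffer address (and its remainder modulo an assumed 4096-byte alignment), the offset, size, errno, and the descriptor's status flags from `fcntl(fd, F_GETFL)`, including whether `O_DIRECT` is set. `describe_failed_write` is a hypothetical name; how it would hook into iomgr's `sync_write` path is not shown.

```cpp
#include <cassert>
#include <cerrno>
#include <cstddef>
#include <cstdint>
#include <cstdio>
#include <cstring>
#include <string>
#include <fcntl.h>
#include <sys/types.h>

// Hypothetical helper: format a diagnostic line for a failed write. It records
// the buffer address and its remainder modulo an assumed 4096-byte alignment,
// plus the descriptor's status flags from fcntl(F_GETFL) and the O_DIRECT bit.
std::string describe_failed_write(int fd, const void* buf, std::size_t size, off_t offset, int err) {
    const int flags = fcntl(fd, F_GETFL);  // -1 if fd is invalid
    bool direct = false;
#ifdef O_DIRECT
    direct = flags != -1 && (flags & O_DIRECT) != 0;
#endif
    const auto addr = reinterpret_cast<std::uintptr_t>(buf);
    char msg[256];
    std::snprintf(msg, sizeof(msg),
                  "write failed: fd=%d buf=%p buf_mod_4096=%zu offset=%lld size=%zu "
                  "errno=%d (%s) flags=0x%x O_DIRECT=%d",
                  fd, const_cast<void*>(buf), static_cast<std::size_t>(addr % 4096),
                  static_cast<long long>(offset), size, err, std::strerror(err),
                  static_cast<unsigned>(flags), direct ? 1 : 0);
    return std::string(msg);
}
```

Printing the remainder next to the raw pointer makes a misaligned buffer visible without doing the pointer arithmetic by hand.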

But since we get a CRC mismatch, we probably got a partial write, which is much more likely to happen than bit rot.
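A small illustration of why a write that does not fully land shows up as a CRC mismatch at recovery, assuming the checksum recorded for a block covers the new contents while the media may still hold some old bytes. `crc32` here is plain CRC-32 (reflected polynomial 0xEDB88320), a stand-in that may not match HomeStore's checksum, and `torn_write` is a hypothetical model: `persisted = 0` stands for a write that failed outright, anything below the block size for a partial one.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Plain bitwise CRC-32 (reflected polynomial 0xEDB88320, init/xorout 0xFFFFFFFF).
// Stand-in only: HomeStore's real checksum routine may differ.
std::uint32_t crc32(const std::vector<std::uint8_t>& data) {
    std::uint32_t crc = 0xFFFFFFFFu;
    for (std::uint8_t byte : data) {
        crc ^= byte;
        for (int bit = 0; bit < 8; ++bit) {
            // Shift right; XOR in the polynomial when the dropped bit was 1.
            crc = (crc >> 1) ^ (0xEDB88320u & (0u - (crc & 1u)));
        }
    }
    return ~crc;
}

// Hypothetical model of a torn write: only the first `persisted` bytes of
// `new_blk` reach the media; the rest of the block keeps `old_blk` contents.
std::vector<std::uint8_t> torn_write(const std::vector<std::uint8_t>& old_blk,
                                     const std::vector<std::uint8_t>& new_blk,
                                     std::size_t persisted) {
    std::vector<std::uint8_t> on_disk = old_blk;
    for (std::size_t i = 0; i < persisted && i < new_blk.size() && i < on_disk.size(); ++i) {
        on_disk[i] = new_blk[i];
    }
    return on_disk;
}

// Recovery-side check: recompute over what is on the media and compare with
// the checksum that was recorded for the intended (new) contents.
bool crc_matches(const std::vector<std::uint8_t>& on_disk, std::uint32_t stored_crc) {
    return crc32(on_disk) == stored_crc;
}
```

Because CRC-32 detects every error burst of 32 bits or fewer, even a tear that leaves only a few stale bytes in the block is caught on recovery.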


3 participants