3.13.0~rc1: epic fail? Growing amount of missing chunks; "replication status: IO error" #746
Oh no! :(:(:( Thanks for the warning and for being the sacrificial one. I'll wait for now. I hope you have a backup.
Thank you for the kind words, @njhurst. No, there is no backup. Where would you back up 100+ TiB? Systems like LizardFS are meant to protect from disasters, not cause them... Unforgivable... Now I have 100_000+ missing chunks... I'm guessing that 3.13 destroyed chunks that did not finish a goal change and had excessive chunks (a mix of replicated and EC chunks). Most of the damage occurred in the most precious data, with goals ec(2,2) and ec(3,2) - those files should have been protected by 2 redundant chunks (RAID-6 level of safety). All lost files were readable; no hardware failure was involved. Replicated goals were not affected as far as I can tell... Maybe a safe upgrade would be to change all EC goals to replicated ones, wait until no EC chunks are left, and then upgrade...
Any chance of a snapshot?
Yes, I've managed to retrieve some files from snapshots, though only some... I've also managed to recover some missing chunks from an older (recently replaced) HDD by connecting it to a 3.12 chunkserver (the 3.13 chunkserver rapidly deletes valid EC chunks).
Here is a quick summary of the devastating upgrade from 3.12.0: tebibytes of data destroyed; 100_000+ missing chunks; 80_000+ files damaged. Almost all data in EC goals is gone, due to either direct or collateral damage. The pattern of damage is
Before the upgrade I had fully replicated files with ec(2,2) goals, and most of them are gone despite there being no undergoal files prior to the upgrade. I also have significant loss (at least 50%) in ec(3,2) chunks, some of which were fully replicated and some of which were in the process of changing goal from std:3 to ec(3,2), so there were enough replicas to avoid data loss. This is how damaged ec(3,2) files look, according to
Snapshots were useless for recovering data unless the snapshots had different goals. In the aftermath I'll probably make an std:1 goal to use exclusively on snapshots and pin it to a slow-ish chunkserver. 3.13.0~rc1 is very unsafe for EC chunks. It deletes valid copies, causing massive replication of the remaining data. Beware...
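To make the redundancy claims above concrete: ec(k,m) splits a chunk into k data parts plus m parity parts and survives the loss of any m parts, while std:N keeps N full replicas and survives N-1 losses - which is why ec(2,2) is compared to RAID-6 above. A small illustrative sketch (the goal-string format here is just for this example, not the LizardFS API):

```python
# Illustrative sketch: failure tolerance and storage overhead of
# LizardFS-style goals. ec(k,m) = k data + m parity parts, survives
# any m lost parts; std:N = N full replicas, survives N-1 losses.

def tolerated_failures(goal: str) -> int:
    """How many lost parts/replicas the goal survives."""
    if goal.startswith("ec("):
        k, m = map(int, goal[3:-1].split(","))
        return m
    if goal.startswith("std:"):
        return int(goal[4:]) - 1
    raise ValueError(f"unknown goal: {goal}")

def storage_overhead(goal: str) -> float:
    """Bytes stored on disk per byte of user data."""
    if goal.startswith("ec("):
        k, m = map(int, goal[3:-1].split(","))
        return (k + m) / k
    if goal.startswith("std:"):
        return float(goal[4:])
    raise ValueError(f"unknown goal: {goal}")

for g in ["ec(2,2)", "ec(3,2)", "std:3"]:
    print(g, tolerated_failures(g), round(storage_overhead(g), 2))
```

So ec(2,2) and std:3 both tolerate two failures, but ec(2,2) stores 2.0x the data versus 3.0x for std:3 - redundancy the bug described above bypassed entirely, since the chunkservers were deleting valid parts rather than losing them to hardware.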
Hi guys. Now you've scared me shitless. What is the safest way forward? The whole point of using LizardFS was to use EC instead of btrfs/zfs parity.
If you are already on
@onlyjob
We updated to 3.13.0rc1 in order to get proper bandwidth-limit handling, but we did not notice this ticket prior to updating. We also use EC goals (ec(6,3) and ec(7,2)) with 660280 chunks (404057 fs objects). Looking into the issue I stumbled upon your ticket, and now I'm not sure how to proceed. To be honest, I would love to see at least some sign of life from Skytech here... at least a heartbeat showing that they acknowledge our findings and issues. @onlyjob: how did you ultimately proceed?
@creolis, I think a downgrade is the only option to save your data. It is especially important to avoid upgrading chunkservers (or to downgrade them ASAP). I don't know if anything else can be done. From memory, the ~rc chunkservers were aggressively removing valid EC chunks. I've lost terabytes of data to this bug and ultimately moved away from LizardFS. Knowing no better alternative, I recommend using MooseFS instead of LizardFS.
sigh I hate to accept this... but data loss is the only thing I can't cope with, and I have a hard time trusting a FS that allows this, even if we're talking about an RC. For me the real issue is two things:
Regarding 1, I'd be very reluctant to install software from source on production infrastructure, not to mention a pre-release. To some extent Debian users are protected from this regression, because I could not upload such a broken release knowing the severity of the problems. An official Debian package is not a panacea, but it is better/safer because at least the package maintainer double-checked the release.

As for 2, it seems there is nobody left who cares. I think either all the developers have left, or they were pulled away from the project. You can see from the commit history that the senior developers stopped committing a while ago, then there were no commits at all, then (after a while) a new (junior?) developer started to work on simple issues.

Indeed data loss is the worst, but there are other severe issues I've listed in the milestone: https://github.com/lizardfs/lizardfs/milestone/2. Notably #662 causes a lot of grief in CI because various

IMHO the priorities of this project drifted too far away from quality (towards features?), causing so much damage to trust that I'd be surprised if it ever recovers... :(
LizardFS is now under new management (see #805 (comment)), so hopefully LizardFS will start to get back on track again.
Is there any progress on this issue? |
Nah. Also, we ended up with another error, which presented itself as "bit rot" in replication (non-EC!) goals and was not detected by the chunk check loop - at that point we could not stay with LizardFS and had to migrate away. We disbanded our 20-node LizardFS array and switched to another storage solution (and no, it's not MooseFS, due to the lack of features that made LizardFS exactly what we needed in our - apparently weird - use case). We've waited several years now for any sign of progress or change; this regression has been open since 2018 (4 years at the time of this writing!), but since there seems to be no interest in fixing critical flaws that result in data loss, we had to close this chapter. I'm still reading here, hoping that the guys will eventually resurrect this project, but I don't think that is going to happen in the current situation. I had really high hopes for LizardFS... it's a pity.
Oh, that's no good! I am using both Moose and Lizard right now, but we plan on migrating our Moose cluster to Ceph as we have the hardware to do it... I still use Lizard at home, but am actively looking for SOMETHING that can do what it does and still let me migrate disks in and out... May I ask what you switched to?
Well, there's MooseFS, though no EC goals in the free version. If you're only using a single server, there's bcachefs, if you don't mind beta filesystems and building a custom kernel :)
Ya, EC is basically a must as well. Seaweed looked interesting, but the response I got from the dev when specifically asking for a commercial license was less than inspiring... I had NOT heard of bcachefs! And no, I don't mind a custom kernel at all. Thanks. :) |
@creolis what did you end up going with? |
What features would those be?? EC is overrated. It allows slightly more efficient utilisation of space at the expense of performance, higher administration and maintenance cost, more troubleshooting, more downtime, and the risk of paying the ultimate price -- data loss. Are those troubles worth the price of several (cheap/slow) high-capacity HDDs to accommodate non-EC replicas on MooseFS?

And not just that. MooseFS has an awesome feature to compensate for the lack of EC: Storage Classes, which allow pinning data to disks of different capacity/performance and designing tiers for efficient utilisation of hybrid storage with SSDs and rotational disks.

As for master/master replication, we've also found that it is not as useful as it seems. It can be re-implemented with fail-over to the metadata backup logger, with some downtime. However, the point is to avoid accidental/automatic switches between masters (as in the case of network-switch maintenance), hence it is safer to move the master manually when required.

So my conclusion is that MooseFS is massively superior to LizardFS even without EC, because TCO and reliability matter. Less reliable storage tends to be more costly to operate.

P.S. We've tried and considered almost every open-source storage solution (e.g. Ceph, GFarm, GlusterFS, RozoFS, SeaweedFS, XtreemFS and a few others that I don't recall at the moment), but nothing comes even close to MooseFS.
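The "several cheap HDDs vs. EC" argument above is easy to put in numbers: for a given amount of user data, an EC goal needs (k+m)/k times the raw capacity while std:N needs N times. A quick back-of-the-envelope sketch (the 100 TiB figure echoes this thread; all numbers are illustrative):

```python
# Illustrative sketch: raw capacity needed for ~100 TiB of user data
# under the EC goals mentioned in this thread vs. plain replication.
# Factor is (k+m)/k for ec(k,m), and N for std:N.

USER_DATA_TIB = 100

def raw_needed(user_tib: float, goal_factor: float) -> float:
    """Raw disk TiB required to hold user_tib under a goal."""
    return user_tib * goal_factor

goals = {
    "ec(6,3)": (6 + 3) / 6,   # 1.5x
    "ec(7,2)": (7 + 2) / 7,   # ~1.29x
    "std:2":   2.0,
    "std:3":   3.0,
}

for name, factor in goals.items():
    print(f"{name:8s} -> {raw_needed(USER_DATA_TIB, factor):6.1f} TiB raw")
```

The gap between ec(6,3) (150 TiB raw) and std:3 (300 TiB raw) is real, but as the comment argues, it may be cheaper to buy than the operational cost of a fragile EC implementation.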
Gotta admit, when I migrated my media server from LizardFS to MooseFS I didn't worry about giving up on EC goals. Had a large section of media that was on ec(2,1), converted it to Goal 2, added an extra 5TB disk, problem solved. |
A flexible number of replicas to be configured (in one extreme, with an 80-node LizardFS cluster, 80 replicas!).
As @onlyjob pointed out, we confirmed that this bug was introduced by the following commit: The issue only appears after upgrading from version 3.12 (the last officially released version) to version 3.13 (still in release-candidate status) with EC chunks under rebalancing. We have reproduced the issue in our testing infrastructure and are working on fixing it. The safe scenarios are:
Well, my issue was that while using replication goals (and again, we're talking about 80 replicas, 'cause we use it as a kind of sync) more and more replicas ended up with garbage inside the files, and LizardFS never noticed that this happened - it flagged all of them as available and okay. We could not find a clear reproducer; it "just happened" for individual files that were all around 2 GB in size... but not for all of them. Anyway - maybe some day LizardFS will release a new version - maybe even with the announced complete rewrite... then I will happily take a look at it again, as it served me really well for years. I'm looking forward to it :)
@creolis - did you, at any point, have non-ECC-memory systems connected as clients which process the data? Workstations, etc.
@borkd Yes, we had. The clients (which are at the same time chunkservers) whose VMware template cache partitions have been kept in sync using LizardFS replication targets are non-ECC. It worked flawlessly for so long... so, to be honest, I never even bothered to think bit rot could be a problem due to non-ECC memory... sheesh
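Since the cluster's own chunk check loop missed this corruption, one pragmatic safeguard is out-of-band verification: hash every replica of a file yourself and flag any disagreement. A minimal sketch, independent of LizardFS (the replica paths and layout are hypothetical - in practice you would compare copies exported from different chunkservers):

```python
# Sketch: out-of-band bit-rot detection for replicas the cluster
# reports as healthy. Hash each replica and flag disagreements.
import hashlib
from pathlib import Path

def sha256_of(path: Path) -> str:
    """Stream the file in 1 MiB blocks to keep memory flat."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for block in iter(lambda: f.read(1 << 20), b""):
            h.update(block)
    return h.hexdigest()

def find_divergent(replicas: list[Path]) -> bool:
    """True if the replicas do not all hash identically."""
    digests = {sha256_of(p) for p in replicas}
    return len(digests) > 1
```

This won't tell you which replica is the good one (a majority vote over three or more copies can), but it would have surfaced the silent "garbage inside the files" described above long before it spread.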
After upgrading chunkservers to 3.13.0~rc1 I'm afraid I'm not getting away without massive data loss: mfsmaster logs "replication status: IO error" all the time, and as replication progresses, the CGI's Chunks view reports a growing (!) number of missing chunks in ec and xor goals. Bloody hell... :( :( :(
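A quick way to quantify "all the time" is to count the matching lines in a saved syslog excerpt and watch whether the rate grows between samples. A minimal sketch (the sample log lines below are made up for illustration, not real mfsmaster output):

```python
# Sketch: count "replication status: IO error" lines in a saved
# syslog excerpt to track whether the error rate is growing.
import re

PATTERN = re.compile(r"replication status: IO error")

def count_errors(log_text: str) -> int:
    """Number of log lines containing the replication IO error."""
    return sum(1 for line in log_text.splitlines() if PATTERN.search(line))

# Hypothetical excerpt; real mfsmaster lines will differ in format.
sample = """\
Aug 27 10:01:02 host mfsmaster: replication status: IO error
Aug 27 10:01:03 host mfsmaster: chunk replicated ok
Aug 27 10:01:04 host mfsmaster: replication status: IO error
"""
print(count_errors(sample))  # 2 for the sample above
```

Running this against hourly log slices and comparing the counts gives a crude but useful trend line for the growing missing-chunk problem described in the report.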