3.13.0~rc1: epic fail? Growing amount of missing chunks; "replication status: IO error" #746

Open
onlyjob opened this issue Aug 27, 2018 · 26 comments

@onlyjob
Member

onlyjob commented Aug 27, 2018

After upgrading chunkservers to 3.13.0~rc1 I'm afraid I'm not getting away without massive data loss: mfsmaster logs "replication status: IO error" all the time, and as replication progresses, the CGI's Chunks view reports a growing (!) number of missing chunks in ec and xor goals.

Bloody hell... :( :( :(

@njhurst

njhurst commented Aug 27, 2018

Oh no! :(:(:( Thanks for the warning and for being the sacrificial one. I'll wait for now. I hope you have a backup.

@onlyjob
Member Author

onlyjob commented Aug 27, 2018

Thank you for the kind words, @njhurst. No, there is no backup. Where would you back up 100+ TiB? Systems like LizardFS are meant to protect from disasters, not cause them... Unforgivable...

Now I have 100_000+ missing chunks... I'm guessing that 3.13 destroyed chunks that had not finished a goal change and had excess chunks (a mix of replicated and EC chunks).

Most of the damage occurred in the most precious data with goals ec(2,2) and ec(3,2) - those files should have been protected by 2 redundant chunks (RAID-6 level of safety).
All lost files were readable. No hardware failure was involved.

Replicated goals were not affected as far as I can tell... Maybe a safe upgrade path would be to change all EC goals to replicated ones, wait until no EC chunks are left, and only then upgrade...
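
For anyone who wants to try that route, here is a rough sketch of the idea (the mount point and the replicated goal name "3" are assumptions; use whatever is defined in your mfsgoals.cfg):

    # Sketch only: recursively switch everything to a replicated goal, then wait
    # for the rebalance to finish before upgrading. Paths and goal name are assumptions.
    lizardfs getgoal -r /mnt/lizardfs      # summary of the goals currently in use
    lizardfs setgoal -r 3 /mnt/lizardfs    # switch the whole tree to replicated goal "3"
    # Periodically confirm that no EC parts remain, e.g. by sampling fileinfo output:
    find /mnt/lizardfs -type f -print0 | xargs -0 lizardfs fileinfo | grep -c 'of ec('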

@Blackpaw

Blackpaw commented Aug 27, 2018 via email

@onlyjob
Member Author

onlyjob commented Aug 28, 2018

Yes, I've managed to retrieve some files from snapshots, though only some...

I've managed to recover some missing chunks from an older (recently replaced) HDD by connecting it to a 3.12 chunkserver (chunkserver 3.13 rapidly deletes valid EC chunks).

@onlyjob
Member Author

onlyjob commented Sep 3, 2018

Here is a quick summary of the devastating upgrade from 3.12.0: tebibytes of data destroyed; 100_000+ missing chunks; 80_000+ files damaged. Almost all data in EC goals is gone, due to either direct or collateral damage.

The pattern of damage is "not enough parts available":

        chunk 0: 00000D9CA2DCB53A_00000001 / (id:14966398432570 ver:1)
                copy 1: 192.168.0.130:9422:wks part 4/4 of ec(2,2)
                not enough parts available
        chunk 0: 00000D9CA2DCB6EA_00000001 / (id:14966398433002 ver:1)
                copy 1: 192.168.0.250:9622:stor part 1/4 of ec(2,2)
                not enough parts available
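
A rough way to enumerate the affected files is to walk the tree and grep lizardfs fileinfo output for that string (sketch only; the mount point is an assumption):

    # Sketch: list every file with at least one chunk reporting
    # "not enough parts available".
    find /mnt/lizardfs -type f -print0 |
    while IFS= read -r -d '' f; do
        if lizardfs fileinfo "$f" | grep -q 'not enough parts available'; then
            printf '%s\n' "$f"
        fi
    done > damaged-files.txt
    wc -l damaged-files.txt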

Before the upgrade I had fully replicated files with ec(2,2) goals, and most of them are gone despite there being no undergoal files prior to the upgrade.

I also have significant loss (at least 50%) in ec(3,2) chunks, some of which were fully replicated and some of which were in the process of changing goal from std:3 to ec(3,2), so there were enough replicas to avoid data loss.

This is how damaged ec(3,2) files look, according to lizardfs fileinfo:

        chunk 0: 0000000000954D73_00000001 / (id:9784691 ver:1)
                copy 1: 192.168.0.204:9422:pool part 4/5 of ec(3,2)
                copy 2: 192.168.0.250:9622:stor part 1/5 of ec(3,2)
                not enough parts available
        chunk 0: 0000000000954B9D_00000001 / (id:9784221 ver:1)
                copy 1: 192.168.0.2:9422:pool part 4/5 of ec(3,2)
                copy 2: 192.168.0.3:9622:pool part 3/5 of ec(3,2)
                not enough parts available
        chunk 1: 0000000000954BA5_00000001 / (id:9784229 ver:1)
                copy 1: 192.168.0.2:9422:pool part 3/5 of ec(3,2)
                copy 2: 192.168.0.204:9422:pool part 4/5 of ec(3,2)
                not enough parts available
        chunk 2: 0000000000954BAC_00000001 / (id:9784236 ver:1)
                copy 1: 192.168.0.130:9422:wks part 2/5 of ec(3,2)
                not enough parts available
        chunk 3: 0000000000954BB8_00000001 / (id:9784248 ver:1)
                copy 1: 192.168.0.2:9422:pool part 4/5 of ec(3,2)
                copy 2: 192.168.0.3:9622:pool part 3/5 of ec(3,2)
                copy 3: 192.168.0.4:9422:wks part 1/5 of ec(3,2)
                copy 4: 192.168.0.204:9422:pool
                copy 5: 192.168.0.250:9422:stor part 2/5 of ec(3,2)
                copy 6: 192.168.0.250:9522:stor part 5/5 of ec(3,2)
                copy 7: 192.168.0.250:9622:stor
        chunk 4: 0000000000954BBC_00000001 / (id:9784252 ver:1)
                copy 1: 192.168.0.2:9422:pool part 3/5 of ec(3,2)
                copy 2: 192.168.0.3:9622:pool
                copy 3: 192.168.0.4:9422:wks part 2/5 of ec(3,2)
                copy 4: 192.168.0.204:9422:pool part 4/5 of ec(3,2)
                copy 5: 192.168.0.250:9422:stor part 5/5 of ec(3,2)
                copy 6: 192.168.0.250:9522:stor part 1/5 of ec(3,2)
                copy 7: 192.168.0.250:9622:stor
        chunk 5: 0000000000954BC4_00000002 / (id:9784260 ver:2)
                copy 1: 192.168.0.2:9422:pool part 1/5 of ec(3,2)
                copy 2: 192.168.0.3:9622:pool
                copy 3: 192.168.0.130:9422:wks part 2/5 of ec(3,2)
                copy 4: 192.168.0.204:9422:pool part 4/5 of ec(3,2)
                copy 5: 192.168.0.250:9422:stor
                copy 6: 192.168.0.250:9522:stor part 3/5 of ec(3,2)
                copy 7: 192.168.0.250:9622:stor part 5/5 of ec(3,2)

Snapshots were useless for recovering data unless the snapshots had different goals. In the aftermath I'll probably make an std:1 goal to use exclusively on snapshots and pin it to a slow-ish chunkserver.
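
Something along these lines should do it, if I remember the goal/label configuration correctly (the goal id/name, the "slow" label and the paths are assumptions):

    # On the master, in mfsgoals.cfg: a single-copy goal kept on "slow"-labelled servers
    #   11 snap1 : slow
    # On the slow chunkserver, in mfschunkserver.cfg:
    #   LABEL = slow
    # Then assign the goal to the snapshot tree from a client:
    lizardfs setgoal -r snap1 /mnt/lizardfs/snapshots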

3.13.0~rc1 is very unsafe for EC chunks. It deletes valid copies causing massive replication of remaining data. Beware...

@eleaner

eleaner commented Sep 14, 2018

Hi guys. Now you've scared me shitless.
I just started my adventure with LizardFS and, obviously, with 3.13.0~rc1.
200k chunks and I don't see any major problems yet, except maybe #765.

What is the safest way forward? The whole point of using LizardFS was to use EC instead of btrfs/zfs parity.
Is there a way to downgrade LizardFS to a working version?

@onlyjob
Member Author

onlyjob commented Sep 14, 2018

If you started on v3.13.0~rc1 then you might be safe. Between 3.12 and 3.13 they made a very unsafe change to how EC chunks created by earlier LizardFS versions are converted: e76c386. Unless there are other issues affecting EC chunks, this particular one is about the upgrade to 3.13.0~rc1.

@eleaner

eleaner commented Sep 14, 2018

@onlyjob
Yes. I started in v3.13.0~rc1
So I hope I am safe

@creolis

creolis commented Aug 10, 2019

We updated to 3.13.0rc1 in order to get proper bandwidth limit handling - but we did not notice this ticket prior to updating.

We also use EC goals (EC6,3 and EC7,2) with 660280 chunks (404057 fs objects).
Now, 3 days into the update, we have started losing chunks in EC6,3 for no apparent reason;
we have lost 5 chunks (3 files) so far.

Looking into the issue I stumbled upon your ticket, and now I'm not sure how to proceed.
I'm not sure whether downgrading is an option, since I don't know whether the recalculation is still running and whether we have to expect lost chunks to keep adding up if we stay on 3.13.0rc1.

To be honest I would love to see at least some sign of life from skytech here ... at least a heartbeat showing that they acknowledge our findings and issues.

@onlyjob: how did you ultimately proceed?

@onlyjob
Member Author

onlyjob commented Aug 11, 2019

@creolis, I think a downgrade is the only option to save your data. It is especially important to avoid upgrading chunkservers (or to downgrade them ASAP). I don't know if anything else can be done. From memory, the ~rc chunkservers were aggressively removing valid EC chunks.

I've lost terabytes of data due to this bug and ultimately moved away from LizardFS.
IMHO the current governance of LizardFS cannot be trusted, and even if they cared to repair the trust it would take a lot of time, expertise and communication with the community.
Skytech is hopeless. It's been almost a year and they couldn't care less... :(

Knowing no better alternatives, I recommend using MooseFS instead of LizardFS.

@onlyjob onlyjob pinned this issue Aug 11, 2019
@creolis

creolis commented Aug 11, 2019

*sigh* I hate to accept this ... but data loss is the only thing I can't cope with, and I have a hard time trusting a FS that allows this, even if we're talking about an RC.

For me the real issue comes down to two things:

  1. I wonder why 3.13.0-rc1 is still online and the preferred link if you hit "Download" on lizardfs.com, without the slightest note or warning that there is a chance of losing data on existing EC goals.

  2. I really don't know why an absolute showstopper like this is not handled as a priority. If your users lose data, you get a reputation problem, even if you're working on an updated branch. Just dedicate enough time to 3.13.0-rc2 to prevent data loss. Management features that do not work? I'll survive that. Lost performance? I can deal with that. Unstable chunkservers? I'll watchdog them with a shell script and restart them if necessary (rough sketch below). Data loss? I can't deal with this.
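
The watchdog I have in mind is nothing fancy; roughly something like this (process and service names are assumptions for a Debian-style install, adjust to your init system):

    # Rough watchdog sketch: restart the chunkserver if its process disappears.
    while true; do
        if ! pgrep -x mfschunkserver >/dev/null; then
            logger "mfschunkserver not running, restarting"
            systemctl restart lizardfs-chunkserver
        fi
        sleep 60
    done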

@onlyjob
Member Author

onlyjob commented Aug 12, 2019

Regarding 1, I'd be very reluctant to install software from source on production infrastructure, let alone a pre-release. To some extent Debian users are protected from this regression, because I could not upload such a broken release knowing the severity of the problems. The official Debian package is not a panacea, but it is better/safer because at least the package maintainer has double-checked the release.

As for 2, it seems there is nobody left who cares. I think the developers have either left or been pulled away from the project. You can see from the commit history that the senior developers stopped committing a while ago, then there were no commits at all, then (after a while) a new (junior?) developer started to work on simple issues.
IMHO under the current governance there is no hope for this project: #805.

Indeed data loss is the worst, but there are other severe issues I've listed in the milestone: https://github.com/lizardfs/lizardfs/milestone/2. Notably, #662 causes a lot of grief in CI because various git commands randomly fail. I'm not sure what else might be affected by #780 and #672, but it feels very insecure. #754 leaves even less confidence, and #742/#743 show how much worse the quality of ~rc1 has become compared to previous releases.

IMHO the priorities of this project have drifted too far away from quality (towards features?), causing so much damage to trust that I'd be surprised if it ever recovers... :(

@zicklag

zicklag commented Sep 11, 2019

LizardFS is now under new management (see #805 (comment)), so hopefully LizardFS will start to get back on track again.

@BloodBlight

Is there any progress on this issue?

@creolis

creolis commented Jun 30, 2022

Nah.

Also, we ended up with another error that presented itself as "bit rot" in replication (non-EC!) goals and was not detected by the chunk check loop. At that point we could not stay with LizardFS and had to migrate away.

We disbanded our 20 node LizardFS array and switched to another storage solution (and no, it's not MooseFS, due to the lack of features that made LizardFS exactly what we needed in our - apparently weird - use case).

We've waited several years now for any sign of progress or change; this regression has been open since 2018 (4 years at the time of writing!), but since there seems to be no interest in fixing critical flaws that result in data loss, we had to close this chapter. I'm still reading here, hoping that the guys will eventually resurrect this project, but I don't think that is going to happen in the current situation. I had really high hopes for LizardFS ... it's a pity.

  • Daniel

@BloodBlight

Oh, that's no good!

I am using both Moose and Lizard right now, but we plan on migrating our Moose cluster to Ceph as we have the hardware to do it...

I still use Lizard at home, but am actively looking for SOMETHING that can do what it does and still let me migrate disks in and out...

May I ask what you switched to?

@Blackpaw

Blackpaw commented Jul 1, 2022

I still use Lizard at home, but am actively looking for SOMETHING that can do what it does and still let me migrate disks in and out...

Well, there's MooseFS, though there are no EC goals in the free version.

If you're only using a single server, there's bcachefs, if you don't mind beta filesystems and building a custom kernel :)

@BloodBlight

Ya, EC is basically a must as well. Seaweed looked interesting, but the response I got from the dev when specifically asking for a commercial license was less than inspiring...

I had NOT heard of bcachefs! And no, I don't mind a custom kernel at all.

Thanks. :)

@jkiebzak

jkiebzak commented Jul 1, 2022

@creolis what did you end up going with?

@onlyjob
Member Author

onlyjob commented Jul 5, 2022

We disbanded our 20 node LizardFS array and switched to another storage solution (and no, it's not MooseFS, due to the lack of features that made LizardFS exactly what we needed in our - apparently weird - use case).

What features would those be??

EC is overrated. It allows slightly more efficient utilisation of space at the expense of performance, higher administration and maintenance costs, more troubleshooting, more downtime and the risk of paying the ultimate price -- data loss.

Are those troubles worth the price of several (cheap/slow) high-capacity HDDs to accommodate non-EC replicas on MooseFS?
In our case the answer is a definitive no. After switching to MooseFS we have more reliable storage, fewer bugs, greater availability, less administration effort, better performance, lower access latency, better support, etc.

And not just that. MooseFS has an awesome feature that compensates for the lack of EC: Storage Classes, which allow pinning data to disks of different capacity/performance and designing tiers for efficient utilisation of hybrid storage with SSDs and rotational disks.
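
For example, something along these lines (from memory; the "S" label, class name and paths are assumptions, and mfsscadmin(1)/mfssetsclass(1) have the authoritative syntax):

    # On SSD-backed chunkservers, in mfschunkserver.cfg:
    #   LABELS = S
    # On a client mount: define a class that keeps 2 copies on S-labelled servers
    # and assign it to a hot directory.
    mfsscadmin /mnt/mfs create -K 2S ssd2
    mfssetsclass -r ssd2 /mnt/mfs/hot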

As for master/master replication, we've also found that it is not as useful as it seems. It can be re-implemented as a fail-over to the metadata backup logger, with some downtime. However, the point is to avoid an accidental/automatic switch between masters (for example during network switch maintenance), hence it is safer to move the master manually when required.

So my conclusion is that MooseFS is massively superior to LizardFS even without EC, because TCO and reliability matter. Less reliable storage tends to be more costly to operate.

P.S. We've tried and considered almost every open-source storage solution (e.g. Ceph, GFarm, GlusterFS, RozoFS, SeaweedFS, XtreemFS and a few others that I don't recall at the moment), but nothing comes even close to MooseFS.
LeoFS was also considered, but I had spent so much time on trialling and comparing everything else, and was already so happy with MooseFS, that I never got a chance to try LeoFS... If anyone has tried it, please let me know your impressions. Thanks.

@Blackpaw

Blackpaw commented Jul 5, 2022

EC is overrated. It allows slightly more efficient utilisation of space at the expense of performance, higher administration and maintenance costs, more troubleshooting, more downtime and the risk of paying the ultimate price -- data loss.

Gotta admit, when I migrated my media server from LizardFS to MooseFS I didn't worry about giving up on EC goals. Had a large section of media that was on ec(2,1), converted it to Goal 2, added an extra 5TB disk, problem solved.

@creolis

creolis commented Jul 5, 2022

What features would those be??

A flexible number of replicas to be configured (in one extreme, with an 80-node LizardFS cluster, 80 replicas!).
Weird use case, I know. But it was exactly (!!) what we needed.

@lgsilva3087
Contributor

As @onlyjob pointed out, we confirmed that this bug was introduced by the following commit:

e76c386

The issue only appears after upgrading from version 3.12 (the last officially released version) to version 3.13 (still in release-candidate status) while EC chunks are being rebalanced. We have reproduced the issue in our testing infrastructure and are working on fixing it.

The Safe Scenarios are:

  • Installation of version 3.12 having files with EC replication goals.
  • Installation of version 3.13 having files with EC replication goals. (Clean installation, not upgraded from v3.12 if you have files with EC goals!).
  • Upgrade from version 3.12 to 3.13 if no EC replication goals are used on the v3.12 cluster (a quick check for this is sketched below).
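
A quick way to check whether the last scenario applies to an existing 3.12 cluster (sketch; the config path and mount point are assumptions):

    # Which goal definitions on the master use erasure coding:
    grep -n 'ec(' /etc/mfs/mfsgoals.cfg
    # And which goal names are actually set on files (recursive summary):
    lizardfs getgoal -r /mnt/lizardfs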

@creolis

creolis commented Jul 14, 2022

Well, my issue was that while using replication goals (and again, we're talking about 80 replicas, 'cause we use it as a kind of sync), more and more replicas ended up with garbage inside the files, and LizardFS never noticed that this had happened, flagging all of them as available and okay. We could not find a clear reproducer; it "just happened" for individual files that were all around 2 GB in size .. but not for all of them.

Anyway - maybe some day lizardFS will release a new version - maybe even with the announced complete rewrite .. then I will happily take a look at it again, as it served me really well for years. I'm looking forward to it :)

@borkd

borkd commented Jul 16, 2022

@creolis - did you, at any point, have any non-ECC-memory systems connected as clients which process the data? Workstations, etc.?

@creolis

creolis commented Jul 20, 2022

@borkd Yes, we had. The clients (which are also chunkservers) whose VMware template cache partitions have been kept in sync using LizardFS replication targets are non-ECC. It worked flawlessly for so long ... so to be honest I never even bothered to think bit rot could be a problem due to non-ECC memory ... sheesh
