
Add support for scrubbing all chunks on a chunkserver for CRC errors #624

Open
wants to merge 2 commits into master
Conversation

guestisp

Sending SIGUSR2 to a chunkserver process starts (or aborts) a scrub loop.
The scrub checks all chunks for inconsistencies (CRC errors) and marks any failed chunks as damaged, forcing LizardFS to repair them.
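Roughly, the trigger works like this (a simplified sketch, not the exact patch; the flag name is illustrative):

```cpp
#include <signal.h>
#include <atomic>

static std::atomic<bool> gScrubRequested{false};  // hypothetical flag polled by the scrub loop

static void scrub_signal_handler(int /*signum*/) {
	// Only flip an atomic flag here; anything heavier is not async-signal-safe.
	gScrubRequested.store(!gScrubRequested.load());  // start or abort the scrub
}

void scrub_install_signal_handler() {
	struct sigaction sa {};
	sa.sa_handler = scrub_signal_handler;
	sigemptyset(&sa.sa_mask);
	sa.sa_flags = 0;
	sigaction(SIGUSR2, &sa, nullptr);
}
```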

@psarna
Member

psarna commented Nov 12, 2017

The first few red lights:

1. This code runs in a thread, but it's not thread-safe at all. We have locks for our directory structure (folderlock), for our chunk hash table (hashlock), and others. If you don't use them, you'll end up with several threads operating on the same data structure without synchronization. That leads to memory corruption and inconsistent data.

1a. Also, remember that when you do add locks, they should be held for as little time as possible, because holding a lock for a long time leads to a huge performance drop (other threads do nothing until you release it); see the sketch after this list.

2. Instead of
+    // SIGUSR2 could be signal number 31, 12 or 17 based on platform
+    if ( signal == 31 || signal == 12 || signal == 17 ) {

it's just

+    if (signal == SIGUSR2) {

If somebody compiles a chunkserver on one system and runs it on another with totally different signal numbers, that's not something we should care about. The current check would unnecessarily reserve 3 signal numbers for one purpose.

3. There's a (very big) memory leak. You allocate the "cssorttab" structure, which has the size of all chunks in the directory, via malloc, but it's never freed with free.

4. The cssorttab allocation is also redundant here - you could just take chunks straight from the test list.

5. Finally, this code is very similar to the test thread, so another solution would be just to rewrite the current test thread code so that it can check more than 1 chunk per second. Then, starting a "scrub" would just be turning the test limits up to something like 1000 chunks per second.

To sum up, I think it's better if "scrubbing" is implemented by amending the existing test thread, not by adding another one.
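To illustrate points 1, 1a and 3 together, a minimal sketch (the lock and helper names are placeholders, not the real hddspacemgr.cc symbols): hold the lock only while copying chunk references, and let a std::vector manage the memory instead of a malloc'ed cssorttab:

```cpp
#include <cstdint>
#include <mutex>
#include <vector>

struct ChunkRef { uint64_t chunkId; uint32_t version; };

extern std::mutex hashlockMutex;                          // stand-in for the real hashlock
extern std::vector<ChunkRef> collect_chunk_refs_locked(); // hypothetical helper, caller holds the lock

std::vector<ChunkRef> scrub_snapshot_chunks() {
	std::vector<ChunkRef> snapshot;
	{
		std::lock_guard<std::mutex> guard(hashlockMutex); // held only while copying references
		snapshot = collect_chunk_refs_locked();
	}
	// The expensive CRC checks run outside the lock, and the vector frees
	// itself when it goes out of scope - no malloc/free pair to forget.
	return snapshot;
}
```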

@guestisp
Author

Thank you for your advice. As I wrote in the other thread, I'm not a C programmer.
The first version of the scrub was made by using the test thread; I dismissed that version due to some issues (that, honestly, I don't remember anymore).

I'll try again to reuse the tester thread; that would certainly be a better solution.

@guestisp
Author

guestisp commented Nov 12, 2017

One of the issues I had was how to detect the end of the loop. In other words: how can I know, inside the tester thread, that all chunks have been scrubbed?

I'm refactoring the whole procedure, but instead of increasing the chunks per second, I'm using a variable to check whether a scrub is running. If it is, no "rate limit" is applied and the chunks are checked one after the other.
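A minimal sketch of that idea, assuming a hypothetical gScrubRunning flag and the existing per-chunk delay:

```cpp
#include <atomic>
#include <chrono>
#include <thread>

extern std::atomic<bool> gScrubRunning;  // hypothetical flag toggled by SIGUSR2

// Called between two chunk checks in the tester loop.
void test_loop_pacing(unsigned testIntervalSeconds) {
	if (!gScrubRunning.load()) {
		// Normal background testing: obey the configured rate limit.
		std::this_thread::sleep_for(std::chrono::seconds(testIntervalSeconds));
	}
	// While scrubbing: fall through immediately and check the next chunk.
}
```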

@psarna
Member

psarna commented Nov 13, 2017

It's hard to define "all chunks", because the number of chunks constantly changes on a working installation.

I already have an old commit that makes test_thread more frequent; I'll dig it out and post it here. From there, it's not hard to create a script that changes the chunkserver config, reloads, and thus turns on "scrubbing".

After that, there are several ways to check if "all" chunks were checked. One of them is to add a counter that resets on reload and can be queried to check how many chunks have already been checked. Then, since each chunk is checked once, you can estimate when the scrub has covered all chunks.
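Something along these lines (a sketch only, names are illustrative):

```cpp
#include <atomic>
#include <cstdint>

static std::atomic<uint64_t> gChunksTestedSinceReload{0};  // hypothetical counter

void on_chunk_tested() {
	gChunksTestedSinceReload.fetch_add(1, std::memory_order_relaxed);
}

void on_reload() {
	gChunksTestedSinceReload.store(0, std::memory_order_relaxed);
}

// Query side: once this approaches the number of chunks the server holds,
// the scrub has most likely covered everything at least once.
uint64_t chunks_tested_since_reload() {
	return gChunksTestedSinceReload.load(std::memory_order_relaxed);
}
```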

@psarna
Member

psarna commented Nov 13, 2017

@4Dolio

4Dolio commented Nov 13, 2017

I still think the easiest scrub is to null-read all data from a mount at each chunk server... but that's just me leveraging what's already available. Shrugs.

@guestisp
Author

> It's hard to define "all chunks", because the number of chunks constantly changes on a working installation.

That's why my first version used a static array populated at scrub start.

> I already have an old commit that makes test_thread more frequent; I'll dig it out and post it here. From there, it's not hard to create a script that changes the chunkserver config, reloads, and thus turns on "scrubbing".

This is cool and should be merged, but it is not a proper solution for scrubbing.
Changing configuration on the fly is error prone. You can script it, but I think it's obvious that changing a configuration file on a production system is much riskier than triggering a SIGUSR2 (or anything else). Additionally, you won't be able to automatically stop the scrubbing when all chunks have been scrubbed. You'll end up with a permanent scrub at a very low interval, which will be a performance hit.

Our current test system has about 9,000,000 chunks.
Even at 1 chunk per second, I'd need about 104 days to scrub them all, and with that check running forever I'd have a constant performance hit.
Something different is absolutely needed, and it must report progress somewhere (like my implementation did, with progress and ETA), because if a scrub runs for many days you must be notified about its progress, like with a RAID controller:

11/13/17  7:13:59: ld sync: all LDs sync'd:53:  66=Consistency Check started on VD 00/0
11/13/17  7:13:59: ld sync: all LDs sync'd:59:  65=Consistency Check progress on VD 00/0 is 0.00%(6s)
11/13/17  7:14:16: EVT#22150-11/13/17  7:14:16:  65=Consistency Check progress on VD 00/0 is 0.24%(23s)
11/13/17  7:14:50: EVT#22152-11/13/17  7:14:50:  65=Consistency Check progress on VD 00/0 is 0.78%(57s)
11/13/17  7:15:24: EVT#22154-11/13/17  7:15:24:  65=Consistency Check progress on VD 00/0 is 1.24%(91s)
11/13/17  7:27:14: EVT#22187-11/13/17  7:27:14:  65=Consistency Check progress on VD 00/0 is 10.20%(801s)
11/13/17  7:28:28: EVT#22189-11/13/17  7:28:28:  65=Consistency Check progress on VD 00/0 is 10.69%(875s)
11/13/17  7:29:05: EVT#22191-11/13/17  7:29:05:  65=Consistency Check progress on VD 00/0 is 11.23%(912s)
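The progress/ETA reporting I mean is nothing more than this kind of arithmetic (a sketch with hypothetical names):

```cpp
#include <cstdint>
#include <cstdio>

// Report percent done and a rough ETA, RAID-controller style.
void report_scrub_progress(uint64_t scrubbed, uint64_t total, double elapsedSeconds) {
	if (total == 0 || scrubbed == 0) {
		return;  // nothing meaningful to report yet
	}
	double percent = 100.0 * static_cast<double>(scrubbed) / static_cast<double>(total);
	double etaSeconds = elapsedSeconds * static_cast<double>(total - scrubbed)
	                    / static_cast<double>(scrubbed);
	// e.g. "scrub progress: 10.20% (ETA 801s)"
	std::printf("scrub progress: %.2f%% (ETA %.0fs)\n", percent, etaSeconds);
}
```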

> After that, there are several ways to check if "all" chunks were checked. One of them is to add a counter that resets on reload and can be queried to check how many chunks have already been checked. Then, since each chunk is checked once, you can estimate when the scrub has covered all chunks.

That's what I did here: https://github.com/guestisp/lizardfs/blob/a7b3f02c7b7564da8b6c481bc805903f7425e391/src/chunkserver/hddspacemgr.cc#L2975

But if chunks always change (as they should), the current tester implementation will never finish (which is fine for the tester thread). The scrub, however, should finish. What I'm trying to do is fetch a list of the chunks existing at a given time (the time the scrub starts), loop through them, and then stop at the end.

I can read the first chunk path, then go on until the same chunk is seen again. When the first scrubbed chunk is seen for the second time, all chunks have been scrubbed.
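Roughly, the wraparound check would look like this (a sketch with illustrative names, not the real hddspacemgr.cc types):

```cpp
#include <cstdint>

struct ScrubState {
	bool started = false;
	uint64_t firstChunkId = 0;  // id of the first chunk scrubbed in this pass
};

// Returns true when the scrub pass has wrapped around to its starting chunk.
bool scrub_pass_complete(ScrubState& state, uint64_t currentChunkId) {
	if (!state.started) {
		state.started = true;
		state.firstChunkId = currentChunkId;  // remember where the pass began
		return false;
	}
	// Seeing the first chunk again means the test loop wrapped around:
	// everything that existed at scrub start has been visited once.
	return currentChunkId == state.firstChunkId;
}
```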

> I still think the easiest scrub is to null-read all data from a mount at each chunk server... but that's just me leveraging what's already available. Shrugs.

Yes, but the same argument applies to ZFS: by that logic the native ZFS scrub would be pointless, because you could just read all the data from disk (ZFS verifies checksums on every read anyway).
Data security/reliability must not be delegated to an external system.

@guestisp
Author

OK, I'm still trying to implement this feature by reusing hdd_tester_thread here: https://github.com/lizardfs/lizardfs/blob/master/src/chunkserver/hddspacemgr.cc#L2847

If I understood properly, this is one big infinite loop
https://github.com/lizardfs/lizardfs/blob/master/src/chunkserver/hddspacemgr.cc#L2859
that cycles through all chunks here https://github.com/lizardfs/lizardfs/blob/master/src/chunkserver/hddspacemgr.cc#L2897 and waits up to the defined interval here: https://github.com/lizardfs/lizardfs/blob/master/src/chunkserver/hddspacemgr.cc#L2906

Is this correct?
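In rough pseudocode, this is how I read that loop (not the actual code; see the links above for the real implementation):

```cpp
#include <chrono>
#include <thread>

// Shape of the loop only - the real work is described in the comments.
void tester_thread_sketch() {
	for (;;) {                                   // the big infinite loop
		// 1. pick the next chunk from the per-folder test lists (under folderlock/hashlock)
		// 2. read it back and verify its CRC, marking the chunk damaged on mismatch
		// 3. sleep so that only the configured number of chunks is tested per interval
		std::this_thread::sleep_for(std::chrono::seconds(1));
	}
}
```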

Now, what I'm unable to understand is what the following code does:
https://github.com/lizardfs/lizardfs/blob/master/src/chunkserver/hddspacemgr.cc#L2876-L2893

The variable names are too "short" and not self-explanatory.

Another question: is that thread self-updating when chunks are added or removed? How can I "fix" that? I don't want to create an infinite loop for the scrub. Scrubbing should start, scrub what already exists at the time it starts, and then finish.
