[BUG] : Storage Class change from 2 replicas to 1 removes second copy before data is moved #484
Comments
Sounds very strange. I'm setting up our test mfs3 instance right now to have a storage class like yours and I will try to repeat your steps.
I tried to repeat your steps and it works for me. To be exact:
What I got: files that had 2 chunk copies now have 1 chunk copy. This happened rather fast (the instance has some other data on it, but not much, and it doesn't do anything else at the moment, no clients actively reading or writing). This is correct, as the new class keeps only one copy.

So, obviously, MooseFS is not just deleting chunks willy-nilly. Something else must have happened. With only 1 copy of each chunk, if that copy gets corrupted somehow (invalid, missing), you no longer have a file. It could have happened to a few files. With all files in one directory - that's a bit suspicious. Is there no trace of what might have happened in the master or chunkserver logs? When did the missing chunk messages start to appear? Were there any restarts in your instance, any other new chunkservers (non-L-labelled) added? Did you run out of space on any chunkserver at any moment? Were there I/O operations performed on the chunks in the `zfs` class?

Side note: STRICT mode is the only mode in which MooseFS can actually, kind of deliberately, lose chunks. Scenario: you have a chunk in many copies (let's say 3), but all on wrongly labelled chunkservers, in a class that is STRICT, and correctly labelled chunkservers for that class do not exist or are otherwise unavailable (nearly 100% full or non-responsive). If any of those copies goes missing, MooseFS will not replicate it, because the class is STRICT. In NORMAL and LOOSE modes it would, but not in STRICT. So if, over time, all available copies get corrupted, the chunk will be gone. This is a dangerous behaviour, but the possibility of losing data is described in the manpage.
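The STRICT-mode loss scenario described above can be sketched as a toy simulation (this is an illustrative model, not MooseFS source code): all copies sit on wrongly labelled chunkservers, no correctly labelled server is available, and STRICT forbids the fallback replication that NORMAL and LOOSE would allow.

```python
def can_replicate(mode, correct_servers_available):
    """Return True if a missing/undergoal copy may be re-replicated.

    In NORMAL and LOOSE modes MooseFS would fall back to any server;
    in STRICT mode it replicates only to correctly labelled servers.
    """
    if correct_servers_available:
        return True
    return mode in ("NORMAL", "LOOSE")  # STRICT refuses the fallback

# A chunk with 3 copies, all on wrongly labelled servers, in a STRICT
# class with no correctly labelled chunkserver available.
copies = 3
for _ in range(3):      # copies get corrupted one by one over time
    copies -= 1         # a copy goes missing/invalid
    if can_replicate("STRICT", correct_servers_available=False):
        copies += 1     # NORMAL/LOOSE would restore the copy count
print(copies)  # 0 -> the chunk is gone
```

In NORMAL or LOOSE mode the loop would restore each lost copy, so the count would stay at 3; only STRICT lets it decay to zero.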
Forgot to ask: are your chunks really missing? Or just invalid/wrong version?
I've noticed the loss of data a few days later. Absolutely nothing extraordinary happened during that time - no lack of space on chunkservers or anything. All chunks in the `zfs` class have no valid copies.
The original class makes absolutely no difference: once files are moved to the new class, there is no information about the old class. The only "consequence" of the old class is where the chunks were stored prior to the change (because this is where the 1 copy stayed or should have stayed), but you said "non C-labeled" so that is what I did in my test instance. It doesn't matter whether the previous class was strict or not. "My" test files are still there, nothing is gone. I tried some I/O on them today: I read them all several times and modified a couple. Nothing bad happened. Since I'm not able to replicate the problem, the only way I could help would be with access to system logs, changelogs and metadata of the affected instance (assuming you still have the information from the time period).
@onlyjob can you show your mfshdd.cfg containing the ZFS mountpoint on the affected chunkserver? Does your ZFS have compression enabled? I'm just wondering (not tested yet) whether having compression-enabled ZFS and an mfshdd.cfg entry without the ~ prefix may cause MooseFS to think the "disk" is damaged and in effect cause an issue like yours?
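For reference, an mfshdd.cfg entry with the ~ prefix mentioned here might look like the fragment below (the mountpoint path is hypothetical):

```
# /etc/mfs/mfshdd.cfg on the chunkserver -- hypothetical ZFS mountpoint.
# The '~' prefix tells the chunkserver that significant changes in the
# reported total space are expected (as with compressed ZFS datasets),
# so the drive is not marked as damaged when the numbers fluctuate:
~/mnt/zfs-chunks
```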
I don't know how I can make it any more clear: I don't have a ZFS chunkserver! That's the point. I've created a storage class that refers to a non-existing label, assigned to no chunkserver. The data re-assigned to that new storage class was gone to nowhere - all of it. There was no chunkserver where it could be sent, according to the storage class definition.
Oh, understood.
So pretty much similar to @chogata's tests and results. Some questions that came to my mind:
Agata, you are right, I have missed/forgotten something important: some time around changing the storage class I stopped and removed a chunkserver that held one copy of the data. Your comment made me think that the data might have survived there, and it did -- once I started that chunkserver again, it had one (unmigrated) copy of all chunks in the `zfs` class.

So the problem is not as severe as I originally thought, but still bad enough: when a storage class is changed from 2 replicas to 1, one replica is removed before(!) the data is physically moved to the destination chunkserver. This is why I thought all data was lost: I expected one copy of the data to remain available on active chunkservers until the data is replicated to the destination. That was not the case, so the problem reminded me of #233.

@tokru66: 1) There were no outages, the cluster was operating normally; 2) it was not the default storage class but a custom one, which I have since modified.

@chogata, could you please check/confirm whether archiving moves the data to the correct label before removing the redundant copy?
@onlyjob no, it doesn't. It will remove 1 copy of data and try to replicate the other copy to the correct label - independently. MooseFS kind of tries to keep the data in the required number of copies as a first priority and on correct labels as a second priority. The only thing is, if the class is strict, it might, in certain circumstances, give up on the first goal - number of copies - if no correct servers are available (like in the example I gave a couple of posts ago).
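A simplified sketch of the behaviour described here (an illustrative model, not MooseFS code): the master fixes the copy count and the copy placement as independent steps, so a transition from goal 2 to "1 copy on label L" can delete the second copy before any copy has reached an L-labelled server.

```python
def master_step(copies, goal, target_label, label_servers_exist):
    """One maintenance pass over a chunk's copy list.

    copies: list of labels of the servers currently holding a copy.
    """
    # Priority 1: copy count -- delete overgoal copies first.
    while len(copies) > goal:
        copies.pop()
    # Priority 2: placement -- replicate toward the required label,
    # but only if a correctly labelled server actually exists.
    if label_servers_exist and target_label not in copies:
        copies.append(target_label)
    return copies

# A chunk starts with 2 copies on non-L servers; the new class wants
# 1 copy on L, but no L-labelled chunkserver exists yet.
copies = master_step(["A", "B"], goal=1, target_label="L",
                     label_servers_exist=False)
print(copies)  # ['A'] -- already down to one copy, still on the wrong label
```

With `label_servers_exist=True` the same pass would also append a copy on `L`, but the window with a single copy on the wrong server remains.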
When storage classes are designed to reflect the reliability of chunkservers, data placement is crucial to safety. Therefore reducing the number of replicas before data is moved to its designated location is a significant risk factor. One copy of data on a non-redundant HDD is endangered to a much greater extent than one pinned to a RAID-6 backed chunkserver. MooseFS does not have to know that, but it should be able to make a safe transition from 2 replicas to 1 when relocation is involved. And that requires moving the data first and only then deleting the redundant replica (not vice versa).
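The ordering proposed above ("move first, delete later") can be sketched as follows (this is the author's proposal, not current MooseFS behaviour; the model is hypothetical):

```python
def safe_transition(copies, goal, target_label, label_servers_exist):
    """Proposed ordering: replicate to the destination label FIRST,
    and only delete redundant copies once placement is compliant."""
    # Step 1: replicate to the required label before anything else.
    if target_label not in copies:
        if not label_servers_exist:
            return copies        # nowhere to go -> keep all old copies
        copies.append(target_label)
    # Step 2: placement is now compliant; trim overgoal copies,
    # preferring the wrongly labelled ones.
    while len(copies) > goal:
        wrong = [c for c in copies if c != target_label]
        copies.remove(wrong[0] if wrong else copies[0])
    return copies

print(safe_transition(["A", "B"], 1, "L", label_servers_exist=False))
# ['A', 'B'] -- both replicas survive until an L server appears
print(safe_transition(["A", "B"], 1, "L", label_servers_exist=True))
# ['L'] -- redundant copies deleted only after the move
```

Compared to the current behaviour, the only cost is that overgoal copies linger until the destination server is reachable, which trades some disk space for safety during the transition.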
MooseFS does not assume any levels of safety of the underlying storage, in that it deems it all safe or unsafe to the same degree. So it does not care that the user thinks one copy on this label is safer than one copy on that label. One copy is unsafe, period ;) Some of our team even wanted to make "goal 1" (or its equivalent in labels) impossible to use - start with 2 copies, always :) We can discuss the change you propose internally, but I'm not sure how it would go with the efficiency/speed of the current replication process. We don't consider the current behaviour of MooseFS a bug - it's just designed like that.
I've demonstrated that the goal change is done in an unsafe manner that sometimes leads to data loss, and you still say "We don't consider current behaviour of MooseFS as a bug"... Really? Bug or not, this can be improved, probably without much difficulty. Does it make sense to you that redundant replicas should be removed only after the placement of the data is compliant with the storage class?
Using one copy and strict storage classes is using MooseFS in an unsafe manner. Yes, achieving both targets of replication in the same step would be better; that's why I wrote that we will analyse it, and if it is possible without a significant performance downgrade, we will change it. But the current behaviour in itself cannot be considered unsafe. One copy is one copy, and MooseFS does not have any clue that one location might be "safer" than another. From the system's point of view, one copy is equally unsafe on any server it is kept on, which is why 2 copies (and for larger instances 3 copies) is the recommended minimum in any scenario.
Related question: in the storage class manual, section 2.9, it sounds like the label matching rules apply when chunks are created. Do they also apply later, e.g. during replication or archiving?
@inkdot7 this section of the manual only describes what happens at creation time. However, if you look into the manpage, there is a table describing the behaviour of each mode.
It shows both what happens at chunk creation time and what happens when a chunk needs to be replicated. Changing storage mode (CREATE to KEEP or KEEP to ARCHIVE) consists of a combination of deletions and replications (or only one of those, if the others are not necessary). And any and all replications in MooseFS follow the rules in the above table. So in STRICT mode, which is of interest in this thread: if a chunk is created in a STRICT class and there are no appropriately labelled servers available, the chunk write operation will either hang (if there are servers with space, but just too busy at the moment to accept another write request) or return ENOSPC (if there is no space, or simply no chunkservers with the requested label(s)). If a chunk in a STRICT class needs to be replicated (for whatever reason: it is endangered, undergoal, on wrong labels, or on an unevenly balanced server): if all servers with the appropriate label(s) are busy, the replication will wait; if there are no servers with the appropriate label(s), or no space on them, the replication will not happen at all.
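The STRICT rules described in this comment can be summarized in a small sketch (a hypothetical encoding for illustration, not MooseFS source):

```python
def strict_create(labelled_servers_with_space, labelled_servers_busy):
    """Chunk creation in a STRICT class, per the description above."""
    if not labelled_servers_with_space:
        return "ENOSPC"   # no space, or no server with the label at all
    if labelled_servers_busy:
        return "hang"     # wait for a correctly labelled server
    return "write"

def strict_replicate(labelled_servers_with_space, labelled_servers_busy):
    """Replication in a STRICT class (endangered, undergoal,
    wrong labels, or rebalancing)."""
    if not labelled_servers_with_space:
        return "skip"     # replication will not happen at all
    if labelled_servers_busy:
        return "wait"
    return "replicate"

print(strict_create(False, False))     # ENOSPC
print(strict_replicate(False, False))  # skip
```

The asymmetry is the dangerous part discussed in this thread: a failed create surfaces an error to the client, while a skipped replication is silent.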
@chogata Thanks for the more detailed info. What would then happen with data marked for several archive copies if all such servers are permanently full in strict mode? I did notice that the manual page is rather clear about strict having the potential for losing data when labels that run full are used. In that sense, MooseFS is not breaking any promises from the manual. However, since strict cannot be set to apply only to creation or to archiving separately(?), if one wants the ability to prevent user creation of data when a certain storage class is full (by using strict), there is currently no way to avoid the dangerous behaviour at the archive stage.
@inkdot7 You are right, strict mode is always applied to the whole class definition. Actually, you raised an interesting point and we had a short discussion about it. We made a note to research the possibility of applying different modes to different storage stages (CREATE, KEEP, ARCHIVE). Of course, in theory everything is possible ;), we just need to evaluate the performance part of this idea.
Original issue description:

I have a server with a dozen disks in a RAID6-like ZFS configuration (`raidz2`) and I thought of moving some of the least important data to a ZFS-based chunkserver, as a single replica. In order to facilitate that I created a `STRICT` storage class as follows: `-C C,C -K L -A L -d0 zfs`. Chunkservers labelled `C` are SSD-based, so newly created chunks first land on fast chunkservers and are eventually moved to the only chunkserver labelled `L`, on ZFS, which I was about to make but never did.

I had some data I'd selected to be moved first. The data had been assigned to storage class `2`, with two available replicas sitting on non-SSD (non-`C` labelled) chunkservers. I re-assigned it to the newly created `zfs` storage class (`mfssetsclass -r zfs {unfortunate_data_folder}`) with the chunkserver yet to be made. At that moment, there was no chunkserver labelled `L`, and there never had been. I was going to configure it later.

I walked away thinking that the data would not migrate to nowhere, assuming it was still safe with two replicas. Imagine my surprise when, a few days later, I found that all the data is gone, and all chunks have `0` (zero) replicas, i.e. `no valid copies`!!!