
[BUG] 3.0.117: delayed standard chunks compliance. #524

Open
onlyjob opened this issue Feb 25, 2023 · 8 comments
@onlyjob
Contributor

onlyjob commented Feb 25, 2023

We have a STRICT class defined as -C C+N,C+N -K N,N,S -d0 -A N,N,S (the standard and archive chunk definitions are identical). The intention of this setup is to quickly create two replicas on SSD-backed chunkservers labelled C, and then create a third replica as the data is moved to its permanent location according to the storage class definition for standard chunks.
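For reference, such a class could be created along these lines (a sketch only: the label flags are the ones quoted above, "-m s" is assumed to be mfsscadmin's strict-mode switch on this version, and the class name "fastwrite" is a placeholder):

```sh
# Sketch, not taken from the report: "-m s" is assumed to select STRICT mode
# and "fastwrite" is a placeholder class name; the label flags are quoted above.
mfsscadmin create -m s -C C+N,C+N -K N,N,S -A N,N,S -d 0 fastwrite
```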

Occasionally an unrelated chunkserver labelled M is down (we only power it on a few times a week). While chunkserver M is unavailable (in temporary maintenance mode), the above storage class shows undergoal standard chunks, indicating that C --> K replication is NOT happening as it should.
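One way to watch the undergoal state is the master's chunk state matrix (a hedged example: "-SIC" is assumed to be the chunks-info section of mfscli on this version, and "mfsmaster" is a placeholder hostname; check mfscli -h if it differs):

```sh
# Print the chunk state matrix (goal vs. valid copies) from the master;
# "-SIC" and the "mfsmaster" hostname are assumptions, not from the report.
mfscli -H mfsmaster -SIC
```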

(Un-)availability of a chunkserver that is not part of a storage class definition should not affect migration, replication or archiving.
Standard chunks' compliance with the storage class should be ensured as soon as possible once all required chunkservers are available, regardless of the availability of other chunkservers.

@chogata
Member

chogata commented Mar 13, 2023

It's not a bug, it's a feature ;)

Maintenance mode is designed as a TEMPORARY condition of maintenance in your MooseFS instance, during which there should be as little "movement" in the system as possible. Hence the block on replications, any replications. What you need is a different feature, one that will allow you to keep a chunkserver offline and have MooseFS not treat it as an "error". But to even design such a feature, we would have to understand your need. Why exactly do you keep a CS down? What are you achieving with such a setup?

@eleaner

eleaner commented Mar 13, 2023

Although I don't know @onlyjob's exact use case, I can see a need for something like an "extra copy" functionality,
e.g. a class with goal 3 "+1 extra".
The chunkserver would have to be marked as extra and, I assume, also "assigned" to a class.

If the extra server is connected, I have an overgoal that does not need to be rebalanced down; actually, it needs to be rebalanced up to 4.
But when the extra server is disconnected, the goal of 3 is perfectly OK.

Combined with a copy of the metadata, it could be used, e.g., as an off-site copy that is periodically re-connected (a hypothetical syntax sketch follows).
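To make the idea concrete, it might look something like this (purely HYPOTHETICAL syntax: the "-X" extra-copy option does not exist in mfsscadmin and only illustrates the "goal 3 plus 1 extra" proposal):

```sh
# HYPOTHETICAL sketch: keep 3 copies on A-labelled servers plus one "extra"
# copy on a B-labelled server; "-X" is invented here and not a real option.
mfsscadmin create -K A,A,A -X B offsite_extra
```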

@onlyjob
Contributor Author

onlyjob commented Jun 23, 2023

It's not a bug, it's a feature ;)

Not only is it a bug, it is a nasty one that potentially endangers data.

I've explained a situation in which undergoal chunks are not replicated while an unrelated chunkserver is offline (after being shut down gracefully). According to the defined storage classes, an offline chunkserver is never going to be a destination for the (suspended) replication, which could (and should be able to) succeed.

A (large) cluster should be as operational as possible when some chunkservers are down for maintenance. The MooseFS master already knows the goals, labels and storage classes it needs to behave gracefully (but it does not).

It should be perfectly fine to have a chunkserver dedicated/reserved exclusively for archival purposes that is not up all the time.

@chogata
Member

chogata commented Jun 26, 2023

It should be perfectly fine to have a chunkserver dedicated/reserved exclusively for archival purposes that is not up all the time.

Yes, but that would be a DIFFERENT FEATURE from Maintenance Mode, which is dedicated to short maintenance breaks, not to keeping a chunkserver offline for long periods of time. The latter is a feature that MooseFS does not (yet) have. We could add it, but to make the best of it we need to understand how our users need it to work. We need feedback, i.e. answers to the questions we ask. @eleaner supplied some, thank you :)

@borkd
Collaborator

borkd commented Jun 26, 2023

"Short" break is a relative term that hinges on file system activity, size and characteristics of underlying filesystems and media.

Automatic maintenance mode is useful for graceful reboots/maintenance on smaller, read-mostly instances, where the end-to-end maintenance cycle is merely a blip on the charts.

As the end-to-end maintenance cycle of stop/deregister/actual maintenance/register/online gets longer, the current approach becomes a liability of its own.

I have worked around it on larger, busy clusters, but that requires additional out-of-band orchestration and a way to express conditions for a minimum viable storage class definition whose goal is a best effort to safeguard newly created chunks and to ensure timely replication of any chunks that became endangered due to the maintenance itself (a rough sketch of such orchestration is below). Having knobs to describe such a "while_in_maintenance" policy would be quite useful.
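Such out-of-band orchestration might look roughly like this (a sketch under stated assumptions: the -CM1/-CM0 maintenance-mode commands of mfscli, the chunkserver address, and the systemd unit name are all assumptions to verify against your version, e.g. with mfscli -h):

```sh
#!/bin/sh
# Rough orchestration sketch, not a supported MooseFS feature.
CS_IP=10.0.0.21    # hypothetical chunkserver address
CS_PORT=9422       # default chunkserver port

mfscli -CM1/$CS_IP/$CS_PORT   # assumed: switch the chunkserver into maintenance mode
ssh "$CS_IP" systemctl stop moosefs-chunkserver
# ... actual maintenance work happens here ...
ssh "$CS_IP" systemctl start moosefs-chunkserver
mfscli -CM0/$CS_IP/$CS_PORT   # assumed: switch maintenance mode back off
```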

@chogata
Member

chogata commented Jun 27, 2023

@borkd - MooseFS 4 (yes, CE is coming, and it is actually not that far away right now) changes that a bit: chunks that are being modified while one copy is in maintenance mode are replicated (unlike in 3). So even a longer break will not endanger your data: what is newly written and/or modified is still kept in the required number of valid copies (unless STRICT mode interferes, but that is the nature of STRICT). What is unmodified is not replicated, but when the offline copy re-connects, it will still be valid.

BTW, historically maintenance mode was not present in MooseFS - every disconnection triggered replications. It was our users' idea to add it, because, and I quote, "I don't need these chunks to replicate when I just restart the server after some operating system upgrade" (said by more than one MooseFS user) :)

But the OP did not mean that. He wants to keep a chunkserver offline, not for maintenance, but because it is an "archive" chunkserver. And this is something we do not have special provisions for in MooseFS. We could add a new feature, but to design a good one we need to know why our users need it and how they want to use it.

For example, I see 3 approaches. The first could be something like @eleaner described: an extra copy (possibly off-site). So a class would be defined like: "keep 3 copies on servers with label A, one copy on servers with label B, but if all B-labelled servers are offline, don't bother to make a 4th copy elsewhere". And actually this is possible right now with STRICT mode (and the offline server NOT in maintenance mode), but STRICT mode is dangerous (what if you run out of A servers too?). So maybe this would need some tweaks (a sketch follows).
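A sketch of such a class with the existing syntax (assuming "-m s" is the strict-mode switch; the class name is a placeholder):

```sh
# Keep three copies on A-labelled servers and one on a B-labelled server;
# strict mode prevents the B copy from being placed elsewhere.
mfsscadmin create -m s -K A,A,A,B offsite_archive
```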

The second would be to mark a chunkserver as "possibly offline, do not replicate" (aka archive or something like that) - whenever this chunkserver is offline, the rest of the system works as usual, but does not try to replicate the chunks that have copies there.

The third would be to have a special flag for data (files and directories) that would mark it as "possibly offline", not to be replicated (and also probably read-only, because modifications would clash with that).

Or maybe there is another approach that neither I nor anybody from my team thought about, but the users did? This is why this community forum exists - to discuss how new features could work and which features would be useful. Not to get angry that existing ones work differently from how we want them to - someone, somewhere needs them exactly as they are :)

@borkd
Collaborator

borkd commented Jun 27, 2023

@chogata - yay for good news! I've played with a number of the scenarios described above, including systems taken offline for extended periods, and my feedback stands for both v3 and v4.
The new maintenance mode knobs I am looking for stem from situations where STRICT placement is in use (and must be, until there is a way to assert "convergence" of chunk migration to the desired labels, or at least to measure the percentage of chunks that would pass the STRICT placement check if it were enforced at the time of the query) and where rolling, partially serialized maintenance tasks across the entire cluster take many hours on large, busy systems.

@onlyjob
Contributor Author

onlyjob commented Jun 28, 2023

No "while_in_mainenance knob" is NOT needed. What is required is fixing bugs that prevent or delay replication according to well defined STRICT storage classes. Like in my case when there is only one chunkserver to accommodate one archival replica -- so replication is not happening when it is offline, regardless whether it is in maintenance mode or not.

On an unrelated note, STRICT should be the default mode of operation. Spilling chunks onto chunkservers where they do not belong is terrible. If there is no space left, then either the cluster is under-provisioned or there is a flaw in the design of the storage classes. There should already be a few relevant issues open about that. One can design a storage class as -C B+C+D,E+F+G -K D,G to allow creation of chunks on other chunkservers (B, C, E, F) when the D and G chunkservers are not 100% available -- or something like that (sketched below).
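A sketch of that proposal with the existing create syntax (assuming "-m s" for strict mode; the class name is a placeholder):

```sh
# Create copies on any of B/C/D and any of E/F/G, but keep them strictly
# on D and G once written; "spillproof" is a placeholder class name.
mfsscadmin create -m s -C B+C+D,E+F+G -K D,G spillproof
```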
