Branch mounted in specific pool sub directory, with priority #1042

Open
magma1447 opened this issue Jun 12, 2022 · 5 comments
@magma1447

magma1447 commented Jun 12, 2022

Describe the solution you'd like
A clear and concise description of what you want to happen.
I have a huge mergerfs file system built on four RAID arrays as branches, all rotating disks. I would like to use SSD storage for incoming data, in my case for one specific directory, though it could also be several directories.

Imagine raids being mounted in /mnt as /mnt/md[0-3], and then the pool mounted in /data. I would then like to have /mnt/ssd0 used for /data/incoming.

I imagine this could be implemented by adding a new branch type, though I also assume that it might not be super easy, and might even be out of scope for the project. I am thinking a new branch type with syntax like this could be added:
/mnt/ssd0=PO,/data/incoming,/data/incoming2
PO = Policy override, but there could probably be a much better name for it. I was first thinking PRIO (priority), which I find worse.

Then every time mergerfs creates new data in those directories, it should use ssd0 instead of following the configured policy.
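As a concrete illustration, an fstab entry using this proposed PO syntax (purely hypothetical, not implemented) might look like the following; category.create=mfs is just an example global policy:

```
# Hypothetical: the PO branch pins creates under /data/incoming to the SSD,
# while the configured mfs policy governs everything else.
/mnt/md0:/mnt/md1:/mnt/md2:/mnt/md3:/mnt/ssd0=PO,/data/incoming  /data  fuse.mergerfs  defaults,category.create=mfs  0 0
```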

The next issue is that when data is moved out of those directories, it shouldn't remain on the SSD, regardless of the policy set.

Describe alternatives you've considered
The best alternative I have found is to use mount --bind /mnt/ssd0 /data/incoming, though that has other downsides: being able to run out of space while the pool still has plenty, or "hiding" underlying data if the mount is missing for a while.
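For reference, a minimal sketch of that bind-mount workaround, using the paths from this thread (the mergerfs pool has to be mounted before the bind):

```
# Overlay the SSD on top of the pool's incoming directory.
mount --bind /mnt/ssd0 /data/incoming

# Or persistently, via fstab:
/mnt/ssd0  /data/incoming  none  bind  0 0
```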

I have also considered bcache, but that adds a whole lot of complexity that I don't want. Also I can't keep the SSD "cache" to just a single directory.

Additional context
If you feel that this is too cumbersome or too far out of scope, just close the request without further explanation. I am fine with that, since I understand this probably isn't a hack done in a day. I am also afraid it could have consequences I haven't been able to figure out myself.

@trapexit
Owner

I understand what you're asking for, but I'm not sure I understand your proposal. There is no such thing as a 'branch type'. What you're asking for is per-directory policies vs. the de facto "global" policies.

  1. Managing the config for this isn't easy. Policies apply per function. Right now you have "number of policies" * "number of functions". Adding directories means you're multiplying that again, per directory. It could be very verbose, and the existing config setup just doesn't make this easy to add.
  2. I'm working on a massive rewrite of mergerfs and really not spending much time on mergerfs 2.x (I'm also working on other projects). I was adding a "list of lists" of branches but wasn't intending to have per-directory behavior. The intent of the list of lists was specifically for these kinds of situations. Perhaps a per-directory policy would be better. I need to think about it.

A list of lists would work by having the SSD in the first list with an epmfs policy, and the second list being everything else, however you want. Per-directory policies don't provide exactly the same behavior. You'd need to be able to reorder or define the branches for it to be fully useful, and having to define a list of branches per directory seems like a lot to configure. I'm moving to TOML for config, so it's possible to articulate a lot more than currently, but branches are kind of verbose. Perhaps if I made some branch indirection: a branch is defined and given a name, and then you reference that name elsewhere?
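A rough sketch of that ordering, in the style of the prototype TOML config shown later in this thread (hypothetical; per-group policies are not yet supported, as noted): the SSD tier is the first group, the rotating storage the second:

```toml
# Hypothetical list-of-lists layout: first group = fast tier, second = slow tier.
[[branches.group]]            # e.g. paired with an epmfs create policy
[[branches.group.branch]]
path = '/mnt/ssd0'
path-type = 'literal'

[[branches.group]]            # everything else
[[branches.group.branch]]
path = '/mnt/md*'
path-type = 'glob'
```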

It's not that per-directory behavior isn't useful in some situations; it's just 1) are there other ways to solve the problem, and 2) how to keep the config from getting too complex.

BTW... this kind of situation is worked around in the docs by having two separate mounts. Not ideal, but it is used by people who want an SSD as the target for downloads or whatnot.
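A sketch of that two-mount workaround in fstab form (illustrative paths and policies, not taken from the docs verbatim; ff = "first found", so creates land on the SSD listed first):

```
# Pool of only the slow storage, used as the long-term location:
/mnt/md*            /data-archive  fuse.mergerfs  defaults,category.create=mfs  0 0

# Pool with the SSD first and a first-found create policy, used for incoming data:
/mnt/ssd0:/mnt/md*  /data          fuse.mergerfs  defaults,category.create=ff   0 0
```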

@magma1447
Author

I understand what you're asking for but I'm not sure I understand your proposal. There is no such thing as a 'branch type'. What you're asking for is per directory policies vs the defacto "global" policies.

The docs referred to 'branch type' regarding RO/RW/NC. But whatever, the important thing is that we understand each other.

It's not that a per directory behavior isn't useful in some situations it's just 1) are there other ways to solve the problem 2) how to make it so config is not too complex.

I fully understand your arguments and am not complaining at all. mergerfs is awesome in most other ways; after all these years I am still impressed by it, and surprised I don't know more people who use it.

I will probably mount my /mnt/ssd0 into /data/incoming on top of my mergerfs and see how that works out for me. It would just have been nice to have the big data array as a backing storage. I could always just buy a bigger SSD instead. :)

Thank you for taking your time.

@trapexit
Owner

The docs referred to 'branch type' regarding RO/RW/NC. But whatever, the important thing is that we understand each other.

That's a typo. It should say "mode". I'll fix that.

What exactly is the behavior you're looking for? For that specific directory you want it to prioritize the SSD for create and if that fails due to minfreespace or whatnot it would choose the rest of the pool?

@magma1447
Author

What exactly is the behavior you're looking for? For that specific directory you want it to prioritize the SSD for create and if that fails due to minfreespace or whatnot it would choose the rest of the pool?

Prioritizing it for a specific directory yes. But also:

  1. Don't fill it with anything else, for example when using a policy that picks the drive with the most free space.
  2. When moving things out of the directory, let the configured global policy decide which branch/source device it ends up on.

If I am not missing anything myself, it sounds like just mounting it on top of mergerfs solves everything but the minfreespace part. Which probably means this would add too much complexity to mergerfs for a fairly small gain.

@trapexit
Owner

When moving things out of the directory, let the set global policy decide on which branch/source device it ends up.

There isn't any "move out of directory" feature in mergerfs currently, except for the moveonenospc feature.

Don't fill it with anything else.

This is different from what most people have requested. They tend to want a general tiered setup. With list of lists you would put the faster storage in the first list and slower storage in the following lists; files are then moved down the lists by whatever algorithm is preferred. They want creates to go to slower branches if the first set is full, and files to be included in general. If you only put one directory on N filesystems, then mergerfs would have to return EXDEV when a move/rename/link happens, which would break most people's setup: download to fast storage, "move"/link to the target location, and then, when the fast storage fills or after some time, move those files to the slower storage. Your setup would copy from fast storage in /data/incoming to slow storage in /data/isos.

I still wonder how one could make something that isn't overly complicated to configure. Even my existing prototype code for list of lists doesn't yet support different policies per list. Every time you add something, you multiply the amount of config by that number.

Each function has a set of policies: over a dozen functions with several policies each, multiplied by each group of branches. In a per-directory setup you'd need both. For each directory: function policies + branches.

In TOML I've got something like:

[branches]
min-free-space = 123

[[branches.group]]

[[branches.group.branch]]
active = true
path = '/mnt/hdd/fs0'
path-type = 'literal'
mode = 'RW'
if-not-mountpoint = 'fail'

[[branches.group.branch]]
path = '/mnt/hdd/drives*'
path-type = 'glob'

[[branches.group]]
[[branches.group.branch]]
....

Having that multiple times seems like a nightmare. I would have to rethink this. Maybe allow branches to be created with unique names that are then separately referenced in groups, or something.
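That indirection might look something like this (purely hypothetical syntax sketched from the idea above, not part of any prototype):

```toml
# Hypothetical: branches defined once, each under a unique name...
[branch.ssd0]
path = '/mnt/ssd0'
path-type = 'literal'

[branch.hdds]
path = '/mnt/hdd/drives*'
path-type = 'glob'

# ...then referenced by name in ordered groups.
[[branches.group]]
members = ['ssd0']

[[branches.group]]
members = ['hdds']
```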
