Allow allocation of persistent volumes on non-root mesos disk resources #3765

Closed
timcharper opened this issue Apr 13, 2016 · 8 comments

Comments

@timcharper
Contributor

Mesos supports multiple disk resources of three kinds: "Root", "Path", and "Mount". In the present release candidate, it is impossible to allocate a persistent volume on any disk resource other than Root: Marathon sees non-Root disk resources in offers and completely ignores them.

I believe this could be resolved somewhat easily by:

  • Adding an optional string parameter, kind, to PersistentVolumeInfo (and updating serializers / json-schema)
  • Updating ResourceMatcher to consider non-Root disk resource offers (in my experience these are currently ignored, but I can't tell exactly where in the code they are being filtered out). I would be inclined to use a Root volume if kind is unspecified, although there is no reason not to consider Path disk resources as well. (The biggest concern would be allocating an entire disk to a process that doesn't need or want one.)
  • Updating OfferOperationFactory createVolumes and reserve to specify the appropriate disk source for the selected disk resource(s).
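A minimal Scala sketch of the first two steps, assuming a simplified model: `DiskKind`, `DiskResource`, `PersistentVolumeInfo` (reduced here to a size plus the proposed optional `kind`) and `DiskMatching.candidates` are hypothetical stand-ins, not Marathon's real classes or its ResourceMatcher. An unspecified kind falls back to Root, as suggested above.

```scala
// Hypothetical, simplified model; not Marathon's real PersistentVolumeInfo.
sealed trait DiskKind
object DiskKind {
  case object Root extends DiskKind
  case object Path extends DiskKind
  case object Mount extends DiskKind
}

// One disk resource from an offer: its kind and available size in MB.
final case class DiskResource(kind: DiskKind, sizeMB: Double)

// The proposed optional `kind` field; None means "not specified".
final case class PersistentVolumeInfo(sizeMB: Double, kind: Option[DiskKind] = None)

object DiskMatching {
  // An unspecified kind falls back to a Root disk, as proposed.
  def requestedKind(volume: PersistentVolumeInfo): DiskKind =
    volume.kind.getOrElse(DiskKind.Root)

  // Keep only offered disks of the requested kind that are large enough.
  def candidates(volume: PersistentVolumeInfo, offered: Seq[DiskResource]): Seq[DiskResource] =
    offered.filter(d => d.kind == requestedKind(volume) && d.sizeMB >= volume.sizeMB)
}
```

With a 100 MB Root disk and a 65536 MB Path disk on offer, a 256 MB volume without a kind matches nothing, while one requesting `Some(DiskKind.Path)` matches the Path disk.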

Am I totally off in thinking this should be fairly easy to do? I'm an experienced Scala programmer and am willing to take a stab at it, but I want to discuss it and solicit any guidance before starting. This feature would drastically simplify my Mesos cluster deployment.

@meichstedt
Contributor

Hi @timcharper, thanks for offering your help here, we'd be happy about your contribution.

As for the current state, I am not sure why disks other than Root are not used; they are not filtered out explicitly.

For Path disks, the approach sounds good, and I agree that it seems rather easy to implement. For Mount disks, however, I'm not sure about the simplicity or the best approach. There are things to consider:

  • Mount disks cannot be carved up into smaller chunks. If the disk has 100GB, how can the volume specify "take all"? It will get the whole disk, but it needs to know how big that is ...
  • therefore, a Mount disk seems somewhat special (think fast SSD)
  • there might be several mount disks with different specs in the cluster or even on one agent
  • how would we differentiate/match these?

So, by simply adding a kind or kinds, you could only specify which kind will be considered for a match. I don't think that's sufficient. Given that a Mount disk might have a special purpose, I wouldn't want a task that doesn't specify kind to take my special purpose SSD. Maybe the default would be Root or Path but NOT Mount?
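The two concerns above (Mount disks are indivisible, and a task that doesn't specify a kind should not land on a special-purpose Mount disk) could be sketched as follows. This is a hypothetical illustration of the suggested policy only; `MountAwareAllocation` and its types are invented for the example and are not part of Marathon.

```scala
// Hypothetical sketch; names are illustrative, not Marathon's API.
sealed trait DiskKind
object DiskKind {
  case object Root extends DiskKind
  case object Path extends DiskKind
  case object Mount extends DiskKind
}

final case class DiskResource(kind: DiskKind, sizeMB: Double)

object MountAwareAllocation {
  // Without an explicit kind, consider Root and Path but never Mount,
  // so a task that does not ask for it cannot grab a special-purpose disk.
  def allowedKinds(requested: Option[DiskKind]): Set[DiskKind] = requested match {
    case Some(k) => Set[DiskKind](k)
    case None    => Set[DiskKind](DiskKind.Root, DiskKind.Path)
  }

  // Returns the size actually allocated, or None if this disk cannot serve the request.
  // Root and Path disks can be carved into a volume of exactly the requested size;
  // a Mount disk is indivisible, so the volume receives the whole disk.
  def allocate(disk: DiskResource, requestedMB: Double, requested: Option[DiskKind]): Option[Double] =
    if (!allowedKinds(requested).contains(disk.kind) || disk.sizeMB < requestedMB) None
    else if (disk.kind == DiskKind.Mount) Some(disk.sizeMB)
    else Some(requestedMB)
}
```

Note that the allocated size for a Mount disk is only known once a concrete disk is picked, which is exactly the "it needs to know how big that is" problem raised above.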

For using a Mount disk, we would probably want to specify the disk further: this task should have a persistent volume on a Mount disk that's labelled SSD, and the persistent volume should include the whole disk.

Any thoughts on this?

@timcharper
Contributor Author

timcharper commented Apr 18, 2016

I believe that you could, in theory, use resource roles to control which disks are available to be allocated by an application. However, that might complicate things, as the task's other resources, such as memory, would need to be allocated from the same resource role. ¯\_(ツ)_/¯

I don't know of any other signals we can use. Perhaps resource roles could be specified for persistent storage only?

@meichstedt
Contributor

We can't use resource roles for that – as soon as you statically reserve a resource by assigning it a role, you can no longer dynamically reserve it. I guess we could change the logic to match on statically reserved disks, but then we couldn't add reservation labels (to identify which task uses it).
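A simplified sketch of the reservation rule described in this comment, for the Mesos versions under discussion: only unreserved resources (role `*`) can be dynamically reserved, and only a dynamic reservation carries the labels used to identify the owning task. `OfferedResource` and `ReservationRules` are hypothetical and are not the Mesos protobuf types.

```scala
// Hypothetical, simplified view of a resource's reservation state;
// these are not the real Mesos protobuf types.
final case class OfferedResource(
    name: String,
    role: String, // "*" = unreserved; another role with no labels models a static reservation
    reservationLabels: Option[Map[String, String]] // Some(...) only for dynamic reservations
)

object ReservationRules {
  // Once a resource is statically reserved for a role, it can no longer be
  // dynamically reserved, so it cannot carry labels identifying the task.
  def canDynamicallyReserve(r: OfferedResource): Boolean =
    r.role == "*" && r.reservationLabels.isEmpty

  // Dynamically reserve an unreserved resource for `role`, tagging it with labels.
  def reserve(r: OfferedResource, role: String, labels: Map[String, String]): Option[OfferedResource] =
    if (canDynamicallyReserve(r)) Some(r.copy(role = role, reservationLabels = Some(labels)))
    else None
}
```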

@andlaz
Copy link

andlaz commented May 20, 2016

Any update on implementing consumption of Path disk resources? I'm seeing offers logged with 0.0 disk resources, while a few lines later the correct offered resources are logged:

May 20 16:57:47 daasinoa-vls01 marathon[13795]: [2016-05-20 16:57:47,753] INFO Offer [8054cda3-39a1-4af3-aa5b-fb1020033a7b-O13934]. Considering unreserved resources with roles {*}. Not all basic resources satisfied: cpus SATISFIED (1.0 <= 1.0), mem SATISFIED (128.0 <= 128.0), disk including volumes NOT SATISFIED (256.0 > 0.0) (mesosphere.mesos.ResourceMatcher$:marathon-akka.actor.default-dispatcher-155)

Then, on the next log line, you can see the complete offer, including the non-Root disk (disk(*) 65536.0):

May 20 16:57:47 daasinoa-vls01 marathon[13795]: [2016-05-20 16:57:47,754] INFO Finished processing 8054cda3-39a1-4af3-aa5b-fb1020033a7b-O13934. Matched 0 ops after 1 passes. ports(*) 20000->65535; disk(*) 65536.0; cpus(*) 8.0; mem(*) 30989.0 left. (mesosphere.marathon.core.matcher.manager.impl.OfferMatcherManagerActor:marathon-akka.actor.default-dispatcher-103)
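One plausible reading of these two log lines, offered as an assumption rather than a diagnosis (an earlier comment notes that non-Root disks are not filtered out explicitly): if the "disk including volumes" check only sums Root disk resources, a 65536 MB non-Root disk contributes 0.0. The types below and the `/mnt/data` path are hypothetical.

```scala
// Hypothetical reconstruction of the symptom; not Marathon's ResourceMatcher code.
sealed trait DiskSource
case object RootSource extends DiskSource
final case class PathSource(root: String) extends DiskSource

final case class Disk(role: String, sizeMB: Double, source: DiskSource)

object DiskSatisfaction {
  // Counting only Root disks: a Path disk contributes 0.0 MB,
  // matching "disk including volumes NOT SATISFIED (256.0 > 0.0)".
  def rootOnlyDisk(disks: Seq[Disk]): Double =
    disks.collect { case Disk(_, size, RootSource) => size }.sum

  // Counting every offered disk resource sees the full 65536.0 MB.
  def allDisk(disks: Seq[Disk]): Double =
    disks.map(_.sizeMB).sum

  def satisfied(requiredMB: Double, availableMB: Double): Boolean =
    requiredMB <= availableMB
}
```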

@timcharper
Contributor Author

I've got a work-in-progress commit for this:

https://github.com/timcharper/marathon/tree/multi-disk

It compiles (except for the tests). It's not really pretty; I'm just shoveling code around to get a sense of the general direction at this point. I had to change the disk resource allocation strategy to pair each allocation with a persistent volume, so that the persistent volume could be created on the corresponding disk. The allocation strategy is myopic in that it won't try different combinations of allocations to make it succeed, but that can be improved once this is ironed out.
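To illustrate what "myopic" means here, a hypothetical greedy first-fit allocator that pairs each persistent volume with one disk and never backtracks. `DiskOffer`, `VolumeRequest` and `GreedyVolumeAllocator` are invented for this sketch and are not the code in the branch above.

```scala
import scala.collection.mutable

final case class DiskOffer(id: String, sizeMB: Double)
final case class VolumeRequest(containerPath: String, sizeMB: Double)

object GreedyVolumeAllocator {
  // Walks the volumes in order and gives each the first disk with enough space left.
  // There is no backtracking: if an early choice blocks a later volume, the whole
  // allocation fails even when another assignment would have worked.
  def allocate(volumes: Seq[VolumeRequest], disks: Seq[DiskOffer]): Option[Seq[(VolumeRequest, String)]] = {
    // Free space per disk id, kept in offer order so "first fit" is well defined.
    val remaining = mutable.LinkedHashMap.empty[String, Double]
    remaining ++= disks.map(d => d.id -> d.sizeMB)
    val pairs = mutable.ListBuffer.empty[(VolumeRequest, String)]
    val allPlaced = volumes.forall { v =>
      remaining.collectFirst { case (id, free) if free >= v.sizeMB => id } match {
        case Some(id) =>
          remaining(id) = remaining(id) - v.sizeMB
          pairs += ((v, id))
          true
        case None => false
      }
    }
    if (allPlaced) Some(pairs.toList) else None
  }
}
```

With disks A (100 MB) and B (50 MB), requesting volumes of 50 MB then 100 MB fails, because the 50 MB volume takes A first, even though placing it on B would let both fit. Trying larger volumes first, or searching over assignments, would avoid this particular case.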

As for resource tagging, I'm thinking of the following:

  • add a "whole disk" flag to persistent volumes. If set, use a Mount disk; if not, use a Root or Path volume.
  • add constraints to persistent volumes to compensate for the lack of resource tags in Marathon, e.g. "constraints": [["path", "LIKE", "^.+/ssd.*"]]. This would allow the user to put disk property information in the path at which the disk is mounted (which probably isn't a bad idea anyway) and then select the disk accordingly.
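A rough sketch of how the two proposed knobs might combine when selecting a disk, assuming each offered disk exposes the path it is mounted at. `VolumeSpec`, `PathLike` and `DiskSelector` are hypothetical names, and treating LIKE as a full-string regex match against the path is an assumption of this sketch.

```scala
// Hypothetical sketch of the two proposed knobs; field names are illustrative.
sealed trait DiskKind
object DiskKind {
  case object Root extends DiskKind
  case object Path extends DiskKind
  case object Mount extends DiskKind
}

// `root` is the directory the disk is mounted at (None for the Root disk).
final case class DiskResource(kind: DiskKind, root: Option[String], sizeMB: Double)

// A constraint in the proposed ["path", "LIKE", regex] form.
final case class PathLike(regex: String)

final case class VolumeSpec(sizeMB: Double, wholeDisk: Boolean, constraints: Seq[PathLike] = Nil)

object DiskSelector {
  // "whole disk" selects Mount disks; otherwise Root or Path disks are eligible.
  private def kindAllowed(spec: VolumeSpec, disk: DiskResource): Boolean =
    if (spec.wholeDisk) disk.kind == DiskKind.Mount
    else disk.kind == DiskKind.Root || disk.kind == DiskKind.Path

  // LIKE is treated here as a full-string regex match against the disk's path.
  private def constraintsMet(spec: VolumeSpec, disk: DiskResource): Boolean =
    spec.constraints.forall(c => disk.root.exists(_.matches(c.regex)))

  def select(spec: VolumeSpec, offered: Seq[DiskResource]): Option[DiskResource] =
    offered.find(d => kindAllowed(spec, d) && constraintsMet(spec, d) && d.sizeMB >= spec.sizeMB)
}
```

For example, with Mount disks at `/mnt/hdd0` and `/mnt/ssd0`, a whole-disk volume constrained by `^.+/ssd.*` picks `/mnt/ssd0`, while an unconstrained non-whole-disk volume falls through to the Root disk.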

@timcharper
Contributor Author

If you're watching this, please take a look at my PR ^^

@joerg84
Contributor

joerg84 commented Aug 22, 2016

@timcharper Did you see @meichstedt's comments on your PR?
Kudos to both of you for tackling this!

@timcharper
Contributor Author

This has been merged and should be released in 1.4.0.

mesosphere locked and limited conversation to collaborators on Mar 27, 2017.

5 participants