avoid accounting for Mesos Executor resources when supervisor already exists #160

Open
erikdw opened this issue Jul 19, 2016 · 0 comments

erikdw commented Jul 19, 2016

Right now the MesosNimbus scheduler code is necessarily naive about the resources actually required to launch tasks, because it has no consistent way to account for supervisors that already exist.

For example, say you need 1 CPU per worker/task and 0.5 CPUs per supervisor/executor, and there is already a supervisor on the target host. In that case we really only need 1 CPU to schedule a task, but we currently have to assume we need 1.5 CPUs so that we are able to launch the executor as well.
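To make the arithmetic concrete, here is a minimal sketch of the accounting we would like to be able to do. The class and method names are hypothetical (nothing here is existing storm-mesos code), and the CPU values are just the ones from the example above.

```java
import java.util.Set;

// Hypothetical illustration only; not part of storm-mesos.
class ExecutorAwareCpuNeeds {
    // Values from the example: 1 CPU per worker/task, 0.5 CPUs per supervisor/executor.
    static final double WORKER_CPU = 1.0;
    static final double EXECUTOR_CPU = 0.5;

    // CPU needed to launch one worker on the given host. The executor's share
    // is only charged when no supervisor is known to exist on that host.
    static double cpuNeeded(String host, Set<String> hostsWithSupervisor) {
        boolean supervisorExists = hostsWithSupervisor.contains(host);
        return WORKER_CPU + (supervisorExists ? 0.0 : EXECUTOR_CPU);
    }

    public static void main(String[] args) {
        Set<String> existing = Set.of("host1.example.org");
        System.out.println(cpuNeeded("host1.example.org", existing)); // prints 1.0
        System.out.println(cpuNeeded("host2.example.org", existing)); // prints 1.5
    }
}
```

Today we effectively behave as if `hostsWithSupervisor` were always empty, so every host is charged the full 1.5 CPUs.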

This is necessitated by the disconnect in knowledge between INimbus.allSlotsAvailableForScheduling and INimbus.assignSlots. That is, INimbus.assignSlots doesn't receive the existingSupervisors parameter that allSlotsAvailableForScheduling does.

The likely solution for this problem is to first solve #158 by ensuring that MesosNimbus is aware of all of the existing Mesos Executors and Tasks for this framework.
Another possible solution is "memoizing" (i.e., recording) the Cluster that is passed into IScheduler.schedule(), and then consulting it in assignSlots().
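A rough sketch of what the memoizing option could look like follows. This is only an illustration under assumptions: `SupervisorMemo` and its methods are hypothetical names, and plain host strings stand in for whatever would actually be extracted from the Cluster object. It does not use the real Storm or storm-mesos types.

```java
import java.util.Collection;
import java.util.List;
import java.util.Set;

// Hypothetical sketch; the real code would read supervisor info from the
// Cluster passed to IScheduler.schedule() rather than take host strings.
class SupervisorMemo {
    // Immutable snapshot of hosts that had a supervisor at the last schedule() call.
    // Volatile in case the two callbacks run on different threads (an assumption).
    private volatile Set<String> hostsWithSupervisor = Set.of();

    // Would be called from schedule(): record what the cluster reported.
    void recordFromSchedule(Collection<String> supervisorHosts) {
        hostsWithSupervisor = Set.copyOf(supervisorHosts);
    }

    // Would be called from assignSlots(): executor resources are only
    // needed when the snapshot shows no supervisor on the host.
    boolean needsExecutorResources(String host) {
        return !hostsWithSupervisor.contains(host);
    }

    public static void main(String[] args) {
        SupervisorMemo memo = new SupervisorMemo();
        memo.recordFromSchedule(List.of("host1.example.org"));
        System.out.println(memo.needsExecutorResources("host1.example.org")); // prints false
        System.out.println(memo.needsExecutorResources("host2.example.org")); // prints true
    }
}
```

One caveat with this approach: the snapshot can be stale by the time assignSlots() consults it, so a supervisor that died in between would be assumed present. Solving #158 first avoids relying on a possibly outdated view.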

erikdw added a commit to erikdw/storm-mesos that referenced this issue Jul 22, 2016
…e being

1. Our floating point tabulations in AggregatedOffers can lead to confusing results
where a 1.0 becomes 0.999999999996 for example.  We add a fudge factor by adding 0.01
more CPU resources in one test to work around this until the following issue is fixed:
  * mesos#161
2. Since we are disabling the ability of allSlotsAvailableForScheduling to account
for supervisors already existing when calculating resource needs, we disable the test
that is validating that behavior.  That will be reenabled once we fix this issue:
* mesos#160
erikdw added a commit to erikdw/storm-mesos that referenced this issue Jul 22, 2016
…e being

1. Our floating point tabulations in AggregatedOffers can lead to confusing results
where a 1.0 becomes 0.999999999996 for example.  We add a fudge factor by adding 0.01
more CPU resources in one test to work around this until the following issue is fixed:
  * mesos#161
2. Since we are disabling the ability of allSlotsAvailableForScheduling to account
for supervisors already existing when calculating resource needs, we disable the test
that is validating that behavior.  That will be reenabled once we fix this issue:
* mesos#160

Also a few cosmetic-ish changes:
1. Use more standard hostnames with a domain of "example.org", a standard domain
for documentation.  See http://example.org
2. Rearrange/renumber some of the offers to prevent confusion.
erikdw added a commit to erikdw/storm-mesos that referenced this issue Jul 23, 2016
JessicaLHartog pushed a commit to JessicaLHartog/mesos-storm that referenced this issue Jul 23, 2016
JessicaLHartog pushed a commit to JessicaLHartog/mesos-storm that referenced this issue Jul 28, 2016
JessicaLHartog pushed a commit to JessicaLHartog/mesos-storm that referenced this issue Jul 29, 2016