Algorithm for RH limit checks #2676

Open
dilshat opened this issue Oct 8, 2018 · 4 comments

Comments

@dilshat
Member

dilshat commented Oct 8, 2018

We need to improve the algorithm used for RH limit checks.
The current implementation is too restrictive.
For details on the current algorithm, see https://github.com/subutai-io/peer-os/wiki/RH-checking-algorithm

@samsonbek
Member

Yes, the current algorithm is not flexible. For example, if a Resource Host has a weak CPU but huge RAM and Disk, the limit check will prevent using the RAM and Disk fully, because the CPU quota will be exhausted quickly by new containers. The same can happen the other way around: a Resource Host may have a strong CPU but small RAM or Disk storage.

Of CPU, RAM and Disk, the most volatile resource is CPU, followed by RAM, while Disk is the resource that requires exact measuring. So, to begin with, I think we should skip checking CPU limits.
For RAM, we could introduce a volatility factor, say 80%. I.e. if an existing container's quota is 4 GB and its historical consumption over the last hour is 2 GB, the "limit check algorithm" should subtract 4 * 0.8 GB (not the whole 4 GB) from the available resources.

Also, the limit check values may vary depending on the ratio of RAM to Disk and vice versa. I.e. if a Resource Host has small RAM but a huge Disk, the RAM volatility factor might be lower.
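
A minimal sketch of the volatility-factor idea, assuming illustrative class and method names (this is not the actual peer-os code):

```java
// Hypothetical sketch of the RAM volatility-factor idea; names are illustrative only.
public class RamLimitCheck
{
    // Fraction of an existing container's RAM quota that the check subtracts
    // from the host's free RAM (instead of the full quota).
    private static final double RAM_VOLATILITY_FACTOR = 0.8;

    /**
     * RAM (in GB) still available for new containers on a Resource Host.
     *
     * @param totalRamGb       total RAM of the Resource Host
     * @param existingQuotasGb RAM quotas of containers already running on the host
     */
    public double availableRamGb( double totalRamGb, double[] existingQuotasGb )
    {
        double reserved = 0;

        for ( double quota : existingQuotasGb )
        {
            // e.g. a 4 GB quota reserves only 4 * 0.8 = 3.2 GB against the host
            reserved += quota * RAM_VOLATILITY_FACTOR;
        }

        return totalRamGb - reserved;
    }
}
```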

@dilshat
Member Author

dilshat commented Oct 15, 2018

@lbthomsen @niclash, your comments are requested.

@niclash
Member

niclash commented Oct 15, 2018

I think the underlying issue is that CPU is considered "reserved" rather than measured, per container. And users can't select whether they want to "reserve" CPU or are OK with using "shared" CPU. If we treat the containers as "reserved", then utilization will probably look dismal, as many containers will use very little CPU.
Fixing this would require various changes, such as allowing users to (for a fee) reserve CPU rather than just "get some CPU". In the current situation such a change is rather big, and it quickly leads into the full resource management system that should be in place for RH owners and container deployers.

So, in the short term, I recommend:
a. RAM is reserved per container, and not by demand at all. It is the primary "selector" of a container (tiny SIZE, not tiny SPEED).
b. CPU load limits per container already exist and can remain as per the "RH-checking-algorithm" page.
c. Initially, only allocate containers against the available "CPU capacity", i.e. 100% * NoOfCPUs.
d. Measure each container's usage and, over a "long" period, say a week, add 80% of the "not used" part back to the "CPU capacity" of the RH (see the sketch below).
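
A rough sketch of points (c) and (d), with hypothetical names that are not the actual peer-os API:

```java
// Rough sketch of points (c) and (d); class and method names are illustrative only.
public class CpuCapacityCheck
{
    // Share of measured-but-unused CPU that is given back to the host's capacity.
    private static final double RECLAIM_FACTOR = 0.8;

    /**
     * CPU capacity of the RH available for new containers, in "percent of one core" units.
     *
     * @param numberOfCores      number of CPU cores on the Resource Host
     * @param containerCpuQuotas CPU quotas of existing containers (percent of one core)
     * @param containerCpuUsage  average usage of those containers over a long period,
     *                           e.g. one week (same units as the quotas)
     */
    public double availableCpuCapacity( int numberOfCores, double[] containerCpuQuotas,
                                        double[] containerCpuUsage )
    {
        // (c) start from the raw capacity: 100% * NoOfCPUs
        double capacity = 100.0 * numberOfCores;

        for ( int i = 0; i < containerCpuQuotas.length; i++ )
        {
            capacity -= containerCpuQuotas[i];

            // (d) add back 80% of the part the container did not actually use
            double notUsed = Math.max( 0, containerCpuQuotas[i] - containerCpuUsage[i] );
            capacity += RECLAIM_FACTOR * notUsed;
        }

        return capacity;
    }
}
```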

@dilshat
Member Author

dilshat commented Oct 15, 2018

a. The algorithm already considers RAM as reserved, so this is done.
b. Then this is done too.
c. Currently we calculate available CPU as availableCpu == numberOfCores * idleCpu (sketched below).
d. OK, I can increase the historical metric window to 1 week instead of 2 hours.
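
For reference, a minimal sketch of the calculation mentioned in (c), with illustrative names (not the actual peer-os method):

```java
// Minimal illustration of the current calculation in (c); names are hypothetical.
public class CpuCheck
{
    /**
     * Available CPU in "percent of one core" units.
     *
     * @param numberOfCores  number of CPU cores on the Resource Host
     * @param idleCpuPercent measured idle CPU of the host, as a percentage (0-100)
     */
    public static double availableCpu( int numberOfCores, double idleCpuPercent )
    {
        return numberOfCores * idleCpuPercent;
    }
}
```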

What about the other resources, e.g. disk? Do we use them in the calculations? @niclash

@dilshat dilshat added this to the 8.0.1 milestone Oct 30, 2018
@dilshat dilshat removed this from the 8.0.2 milestone Nov 19, 2018