Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

CP-48625, CP-48752: VM anti-affinity support for host evacuation and related UT #5599

Open
wants to merge 8 commits into
base: feature/vm-anti-affinity
Choose a base branch
from

Conversation

gangj
Copy link
Contributor

@gangj gangj commented Apr 29, 2024

Host evacuation plan with anti-affinity support will be carried out in 3 steps:

  1. Try to get a "spread evenly" plan for anti-affinity VMs, done if
    all the rest VMs got planned using binpack, otherwise continue.
  2. Try to get a "no breach" plan for anti-affinity VMs, done if all the rest VMs
    got planned using binpack, otherwise continue.
  3. Carry out a binpack plan ignoring VM anti-affinity.

@gangj gangj force-pushed the private/gangj/CP-48625 branch 3 times, most recently from bf8cf27 to e35169d Compare April 29, 2024 09:45
ocaml/xapi/xapi_vm_helpers.ml Outdated Show resolved Hide resolved
ocaml/xapi/xapi_ha_vm_failover.ml Outdated Show resolved Hide resolved
ocaml/xapi/xapi_ha_vm_failover.ml Outdated Show resolved Hide resolved
ocaml/xapi/xapi_ha_vm_failover.ml Outdated Show resolved Hide resolved
ocaml/xapi/xapi_ha_vm_failover.ml Outdated Show resolved Hide resolved
ocaml/xapi/xapi_ha_vm_failover.ml Outdated Show resolved Hide resolved
ocaml/xapi/xapi_ha_vm_failover.ml Outdated Show resolved Hide resolved
ocaml/xapi/xapi_ha_vm_failover.ml Outdated Show resolved Hide resolved
@gangj

This comment was marked as resolved.

@gangj

This comment was marked as resolved.

in
vm_anti_affinity_spread_evenly_plan ~__context hosts_increasing
remaining_vms vm_can_boot_on_host
(grp_ranked_hosts |> update_anti_affinity_grp_ranked_hosts group h)
Copy link
Member

@minglumlu minglumlu May 6, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If the data structure could be HostMap (instead of IntMap), then the update would be as simple as:

HostMap.update h (fun c -> Option.(value ~default:0 c |> succ |> some)) hosts

And the only cost is to generate the ranks (IntMap, required by select_host_from_ranked_groups) from the HostMap by:

     let h =
       HostMap.fold (fun h' vm_cnt acc ->
         IntMap.update vm_cnt (fun hs ->
           Option.value hs ~default:[] |> List.cons h' |> some
         ) acc
       ) hosts IntMap.empty
       |> IntMap.bindings
       |> List.map snd
       |> Xapi_vm_helpers.select_host_from_ranked_groups vm host_selector
     in

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, I would prefer the current implementation:

  1. Don't need to regenerate the whole ranked_list everytime a VM is planned on a host.
  2. Reuse(share) the same code: Xapi_vm_helpers.rank_hosts_by_vm_cnt_in_group with VM start, this can make sure they behave the same as code changes.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think updating the grp_ranked_hosts is really painful.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The time complexity of update_anti_affinity_grp_ranked_hosts is:
log N + (N * log N * log N) + log N * (N * log N + log N)

But after choosing a proper data structure, the time complexity is:
update: log N
re-generate: N * log N

Copy link
Contributor Author

@gangj gangj May 11, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In current implementation, I think the time complexity of update_anti_affinity_grp_ranked_hosts is O(N) (the number of hosts in the pool):

The functionlocate_host is N, as the total number of hosts to find from is N.

the operation to update ranked_grp for the host's original rank:

          ranked_hosts
          |> IntMap.update vm_cnt_on_host (fun hosts ->
                 match
                   get_hosts hosts |> List.filter (fun h -> fst h <> host)
                 with
                 | [] ->
                     None
                 | hs ->
                     Some hs
             )

IntMap to locate vm_cnt_on_host : log N
Filter it out from the original list: N

then add it to the next rank:

          |> IntMap.update (vm_cnt_on_host + 1) (fun hosts ->
                 Some
                   (Binpack.insert bigger_than (host, updated_host_size)
                      (get_hosts hosts)
                   )
             )

time complexity:
IntMap to locate vm_cnt_on_host +1 : log N
Insert it into the ordered list: N

For the other VM groups(at most 4 now)

          ranked_hosts
          |> IntMap.update vm_cnt_on_host (fun hosts ->
                 Some
                   (Binpack.insert bigger_than (host, updated_host_size)
                      (List.remove_assoc host (get_hosts hosts))
                   )
             )

IntMap to locate vm_cnt_on_host : log N
Update it in the ordered list: N


While for re-generating from a HostMap (host -> numOfVMs), I think the time complexity might be more than N * log N, for each rank, we still need to sort the hosts in terms of each host's free memory (we at least need log N to insert a host into a sorted list, but if insert with List, I think it might be N), then the overall time complexity will be: N * log N or even N * N.

I think it is simply because for re-generating, we have to "update" for each host, while for update_anti_affinity_grp_ranked_hosts, only need to update for 1 host.

ocaml/tests/test_ha_vm_failover.ml Outdated Show resolved Hide resolved
ocaml/tests/test_ha_vm_failover.ml Outdated Show resolved Hide resolved
ocaml/tests/test_ha_vm_failover.ml Outdated Show resolved Hide resolved
ocaml/tests/test_ha_vm_failover.ml Outdated Show resolved Hide resolved

(** Update the 2 layers Map - grp_ranked_hosts when a VM, its memory size is vm_size and
it is in anti-affinity group, will be planned to run on the host. *)
let update_anti_affinity_grp_ranked_hosts vm_size group host grp_ranked_hosts =
Copy link
Member

@minglumlu minglumlu May 10, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The function is even larger than before. 😟
Wouldn't the difficulty of updating on such a data structure (2-layer map) imply that the choice on the data structure is not good?

Copy link
Contributor Author

@gangj gangj May 11, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think updating the 2 layers map is not difficult. :)

I updated the comments as:

(** Generate 2 layers Map: the outer layer is a Map from `VM_group to ranked_hosts, while
    each ranked_hosts is a Map from vm_count to sorted host(`host, free_memory_size) List
    in increasing order in terms of the host's free memory, where every host in the list
    has the same number of running anti-affinity VMs resident on it: vm_count, for example:
    { grp1 ->
       { 0 -> [ (master, 500), (slave2, 508) ] }
    , grp2 ->
       { 0 -> [ (slave2, 508) ]
       , 1 -> [ (master, 500) ]
       }
    }
    indicates that there are 2 VM groups. For grp1, all the 2 hosts has 0 running VM
    from grp1, where master has 500 free memory, slave2 has 508 free memory, the 2 hosts are
    in increasing order in terms of the host's free memory. For grp2, slave2 has 0 running
    VM from grp2, and master has 1 running VM from grp2. *)

Wish it will be helpful in understanding the data structure.

Copy link
Member

@minglumlu minglumlu May 13, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The compare could be based on a two-dimension key (vm_cnt, free_mem). Then the data structure will be a sorted list (or a Set) still. Creating and updating the list would be easier?
The host free memory could be in a Hashtbl to gain O(1) in comparison. - (This is wrong)

Copy link
Contributor Author

@gangj gangj May 13, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

PR updated this way: choose host from HostsSet, the min_elt one in the Set is the most suit host for the VM.

@gangj
Copy link
Contributor Author

gangj commented May 10, 2024

@gangj, there have been so many fixup commits now so that the only way to review this PR is to look at the overall diff, which is very large and includes refactoring and code re-organisation. This makes it very hard to verify that what has changed relative to the original code is correct (especially in xapi_ha_vm_failover.ml). Could you please squash the fixups and create a set of commits that can be reviewed one by one?

@robhoes Sure, the commits are squashed.

@gangj gangj force-pushed the private/gangj/CP-48625 branch 5 times, most recently from 8507f71 to 618c1e8 Compare May 13, 2024 12:13
Gang Ji added 8 commits May 16, 2024 17:06
Add func host_to_vm_count_map to be used, rename RefMap to HostMap

Signed-off-by: Gang Ji <gang.ji@citrix.com>
fixup review comments

Signed-off-by: Gang Ji <gang.ji@citrix.com>
Host evacuation plan with anti-affinity support will be carried out in 3
steps:

1. Try to get a "spread evenly" plan for anti-affinity VMs, done if
   all the rest VMs got planned using binpack, otherwise continue.
2. Try to get a "no breach" plan for anti-affinity VMs, done if all the
   rest VMs got planned using binpack, otherwise continue.
3. Carry out a binpack plan ignoring VM anti-affinity.

Signed-off-by: Gang Ji <gang.ji@citrix.com>
Signed-off-by: Gang Ji <gang.ji@citrix.com>
Signed-off-by: Gang Ji <gang.ji@citrix.com>
Signed-off-by: Gang Ji <gang.ji@citrix.com>
Add "keep_localhost" to enable test pool with more than 3 hosts.

Signed-off-by: Gang Ji <gang.ji@citrix.com>
Signed-off-by: Gang Ji <gang.ji@citrix.com>
let compare = Ref.compare
end)

module IntMap = Map.Make (struct
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Useless module?

module HostsSet = Set.Make (struct
type t = int * int64 * [`host] Ref.t

let compare (vm_cnt_0, size_0, host_ref_0) (vm_cnt_1, size_1, host_ref_1) =
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could get rid of the nested match:

match (Int.compare x1 x2, Int.compare y1 y2, Ref.compare z1 z2) with
| n, _, _ when n <> 0 -> n
| _, n, _ when n <> 0 -> n
| _, _, n -> n

, Memory_check.host_compute_free_memory_with_maximum_compression
~__context ~host None
(* Used to sort pairs into increasing order of their second component *)
let compare_snd (_, a) (_, b) = compare a b
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Useless function?

let update_host_size_map host vm_size host_size_map =
match Xapi_vm_helpers.HostMap.mem host host_size_map with
| false ->
error "Failed to find free memory of host in host_size_map for host." ;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This error is for almost impossible case. Either using monad or catching exceptions outside to get the code compact.

Db.VM_group.get_placement ~__context ~self:group = `anti_affinity
)

let init_anti_affinity_grp_host_vm_cnt_map ~__context =
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe we can use a module to wrap all of these VM anti-affinity functions and types.
So that in the module, we don't need to add anti_affinity to the names of functions and variables.

(Option.value ~default:Xapi_vm_helpers.HostMap.empty host_vm_cnt_map)
)
)
grp_host_vm_cnt_map
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If the impossible Not_found being caught outside, how about the following?

let update_grp_host_vm_cnt group host grp_host_vm_cnt_map =
  let M = Xapi_vm_helpers.HostMap in
  let hs =
    VMGroupMap.find group grp_host_vm_cnt_map in
    |> M.update host (fun c -> value ~default:0 c |> succ |> some)
  in
  VMGroupMap.update group (fun _ -> Some hs) grp_host_vm_cnt_map

in
host_vm_cnt_map grp
|> Xapi_vm_helpers.HostMap.find_opt host
|> Option.value ~default:0
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Again, with catching the impossible Not_found, this could as simple as:

let vm_cnt =
  VMGroupMap.find group grp_host_vm_cnt_map
  |> Xapi_vm_helpers.HostMap.find host
in

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is no entry for host in the map when there is no VM from the same group running on it.

2. For each anti-affinity VM, try to select a host for it so that there are at least 2 hosts which
has running VMs in the same anti-affinity group. If there are already 2 hosts having running VMs
in that group, skip planning for the VM. *)
let rec vm_anti_affinity_no_breach_plan ~__context total_hosts
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There are some unnecessary parameters for this function. Say,
Initially, we have

A: HostMap: host -> free_mem
B: GroupMap: group -> (HostMap : host -> vm_cnt)

The plan_for_spead_evenly requires:

A: HostMap: host -> free_mem
B: GroupMap: group -> (HostMap : host -> vm_cnt)
B -> C: GroupMap: group -> (vm_cnt, free_mem, h)

The plan_for_no_breach requires:

A: HostMap: host -> free_mem
B -> D: GroupMap: group -> (hosts, inhabited_hosts)

And the B -> D is as simple as:

  GroupMap.map (fun hosts ->
    let m1, m2 = HostMap.partition (fun _ vm_cnt -> vm_cnt = 0) hosts in
    (List.map fst HostMap.bindings, HostMap.cardinal m2)
  ) 

)
grp_host_vm_cnt_map

let init_anti_affinity_grp_hosts_set ~__context grp_host_vm_cnt_map hosts =
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this performs the following mapping:

B: GroupMap: group -> (HostMap : host -> vm_cnt)
->
C: GroupMap: group -> (vm_cnt, free_mem, h)

So it could be just as the B is already a map, a simple map will do the trick.

GroupMap.map (fun hosts ->
  HostMap.to_seq
  |> Seq.map (fun (h, vm_cnt) -> (vm_cnt, (HostMap.find h free_mems), h))
  |> HostSet.of_seq
) GroupMap.empty

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is no entry for host in the map B when vm_cnt of the host is 0 .

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants