hwloc: Add weighted interleave support #662

Open
wants to merge 1 commit into
base: master

honggyukim

Since the new memory policy MPOL_WEIGHTED_INTERLEAVE was added in [1], hwloc needs to support this flag.

This new flag is expected to be released in Linux v6.9.

[1] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=fa3bea4e1f8202d787709b7e3654eb0a99aed758
Signed-off-by: Honggyu Kim <honggyu.kim@sk.com>

@honggyukim
Author

Hi, this is my first contribution to this project, so please let me know if anything needs fixing. Thanks.

@bgoglin
Contributor

bgoglin commented Apr 20, 2024

Hello. Thanks for the reminder. I followed early versions of these patches but forgot about it when it became close to ready for inclusion. Given that this interface is supposed to interleave in a more clever way, should we use it by default for hwloc's interleave policy when supported? As long as the kernel doesn't set buggy weights on nodes, it should work fine, right?

@honggyukim
Author

Hi @bgoglin, thanks for the quick response.

> should we use it by default for hwloc's interleave policy when supported?

IMHO, MPOL_WEIGHTED_INTERLEAVE behaves the same as MPOL_INTERLEAVE when the weight is set to 1 for all nodes. But it is a separate, newly added policy, so I think it's better not to change the default policy.
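To illustrate the point about equal weights, here is a minimal sketch in C. It uses a simplified model that is my own assumption, not the kernel's actual allocator code: pages are handed out in rounds, and in each round node i receives weights[i] consecutive pages. The function names are hypothetical. With every weight set to 1, the model reduces to plain round-robin interleaving.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Simplified model (an assumption, not the kernel's code): pages are
 * handed out in rounds; in each round node i gets weights[i] pages.
 * Returns the index of the node that page number `page` lands on.
 */
static int weighted_interleave_node(const unsigned *weights, size_t nnodes,
                                    unsigned long page)
{
    unsigned long total = 0;
    unsigned long pos;
    size_t i;

    assert(nnodes > 0);
    for (i = 0; i < nnodes; i++)
        total += weights[i];
    assert(total > 0);

    /* Position of this page inside the current round. */
    pos = page % total;
    for (i = 0; i < nnodes; i++) {
        if (pos < weights[i])
            return (int)i;
        pos -= weights[i];
    }
    return -1; /* not reached when total > 0 */
}

/* Plain round-robin interleave, for comparison. */
static int plain_interleave_node(size_t nnodes, unsigned long page)
{
    assert(nnodes > 0);
    return (int)(page % nnodes);
}
```

With weights {1, 1, 1} both functions return the same node for every page. With weights {3, 1}, pages 0 to 2 land on node 0 and page 3 lands on node 1 before the round repeats.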

> As long as the kernel doesn't set buggy weights on nodes, it should work fine, right?

The weights are set to 1 for all nodes by default unless a system admin changes the values at /sys/kernel/mm/mempolicy/weighted_interleave/node*, an interface introduced by https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=dce41f5ae2539d1c20ae8de4e039630aec3c3f3c.

And yeah, it would work fine when weights have sane values.
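For anyone who wants to check the current weights, here is a rough sketch in C of reading one of the per-node sysfs files mentioned above. The helper name read_node_weight and the WEIGHT_DIR macro are hypothetical, and I am assuming each file holds a single unsigned integer. The helper returns -1 when the file is missing, which is what I would expect on kernels without this interface.

```c
#include <assert.h>
#include <stdio.h>

/* Directory named in the comment above; the macro name is hypothetical. */
#define WEIGHT_DIR "/sys/kernel/mm/mempolicy/weighted_interleave"

/*
 * Read the weight of one NUMA node from <dir>/node<N>.
 * Returns the weight, or -1 if the file is missing or cannot be parsed.
 * Assumption: the file contains a single unsigned integer.
 */
static int read_node_weight(const char *dir, int node)
{
    char path[256];
    unsigned weight;
    FILE *f;
    int n;

    assert(dir != NULL && node >= 0);
    n = snprintf(path, sizeof(path), "%s/node%d", dir, node);
    if (n < 0 || (size_t)n >= sizeof(path))
        return -1;

    f = fopen(path, "r");
    if (!f)
        return -1;
    if (fscanf(f, "%u", &weight) != 1) {
        fclose(f);
        return -1;
    }
    fclose(f);
    return (int)weight;
}
```

Calling read_node_weight(WEIGHT_DIR, 0) would then give the weight of node 0, or -1 if the interface is not available.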

@bgoglin
Contributor

bgoglin commented Apr 24, 2024

When MPOL_PREFERRED_MANY was added, hwloc just used it instead of MPOL_PREFERRED because it was supposedly better in all cases. The situation isn't exactly the same here, but we could still envision using WEIGHTED by default in hwloc since it doesn't break the current specification of hwloc's INTERLEAVE flag (we could also switch back to the old MPOL_INTERLEAVE when the HWLOC_MEMBIND_STRICT flag is given). I need to think more about it.
One thing to keep in mind is the future extension of the mbind() syscall for custom interleaving. This won't fit in hwloc's current API at all; we may have to add a Linux-specific hwloc binding call. I'd like to see the whole picture more clearly before making a decision on this new WEIGHTED flag alone (but I still want to get it into hwloc 2.11 in the near future).
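The default-with-strict-fallback idea above could be sketched roughly as follows. This is only an illustration of the selection logic, not hwloc's actual code: choose_interleave_policy, SKETCH_MEMBIND_STRICT and the kernel_has_weighted parameter are hypothetical names, and the MPOL_* values are copied from the Linux uapi header so the snippet stays self-contained.

```c
#include <assert.h>

/* Values as defined in the Linux uapi header <linux/mempolicy.h>. */
#ifndef MPOL_INTERLEAVE
#define MPOL_INTERLEAVE 3
#endif
#ifndef MPOL_WEIGHTED_INTERLEAVE
#define MPOL_WEIGHTED_INTERLEAVE 6
#endif

/* Hypothetical stand-in for hwloc's HWLOC_MEMBIND_STRICT flag. */
#define SKETCH_MEMBIND_STRICT (1 << 2)

/*
 * Pick the Linux policy backing an interleave request:
 * - weighted interleave when the kernel supports it,
 * - plain MPOL_INTERLEAVE when the caller asked for strict behavior
 *   or the kernel lacks the weighted policy.
 * `kernel_has_weighted` is assumed to be detected elsewhere (0 or 1).
 */
static int choose_interleave_policy(int flags, int kernel_has_weighted)
{
    assert(kernel_has_weighted == 0 || kernel_has_weighted == 1);
    if ((flags & SKETCH_MEMBIND_STRICT) || !kernel_has_weighted)
        return MPOL_INTERLEAVE;
    return MPOL_WEIGHTED_INTERLEAVE;
}
```

How support would be detected at runtime is left open here; one option might be to try the new policy once and remember whether the kernel rejected it.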

@honggyukim
Author

Hi @bgoglin,

> When MPOL_PREFERRED_MANY was added, hwloc just used it instead of MPOL_PREFERRED because it was supposedly better in all cases.

Do you mean commit 6abf03d?

> The situation isn't exactly the same here, but we could still envision using WEIGHTED by default in hwloc since it doesn't break the current specification of hwloc's INTERLEAVE flag (we could also switch back to the old MPOL_INTERLEAVE when the HWLOC_MEMBIND_STRICT flag is given). I need to think more about it.

MPOL_WEIGHTED_INTERLEAVE works with a single global set of weights written through sysfs. If that isn't a problem for hwloc, then I think it is fine to make it the default interleave policy. But I don't have much experience with the hwloc project yet, so please take your time to consider this further before making a decision.

> One thing to keep in mind is the future extension of the mbind() syscall for custom interleaving. This won't fit in hwloc's current API at all; we may have to add a Linux-specific hwloc binding call. I'd like to see the whole picture more clearly before making a decision on this new WEIGHTED flag alone (but I still want to get it into hwloc 2.11 in the near future).

It looks like you mean weighted interleaving per VMA range. It was previously proposed at https://lore.kernel.org/linux-mm/20240103224209.2541-1-gregory.price@memverge.com with an mbind2 syscall, but more maintainers would first need to be convinced that it is needed, with more practical use cases. I don't think it will be accepted anytime soon, so we don't have to worry about it for now.
