Distrib #370

Open
wants to merge 135 commits into
base: master

NicolasDenoyelle

This branch extends the existing hwloc_distrib() function, which distributes cpusets of the topology.
It adds a new way to iterate over the topology objects of a single level with a hierarchical policy.
The hwloc-distrib utility has been modified to reflect the new capabilities.
The purpose of this branch is to bring thread binding policies to the hwloc toolset when it is used together with the hwloc-thread-bind branch.
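
As a rough illustration of what distributing items hierarchically over a topology can look like, here is a minimal standalone sketch. It does not use hwloc and is not the code from this branch: the `struct node` type, the `distribute()` function, and the proportional-split rule are all assumptions made for the example. Each subtree receives a share of the items proportional to the number of leaves below it, and the recursion stops when a single item remains or a leaf is reached.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical toy topology node; this is not an hwloc type. */
struct node {
    int id;
    unsigned weight;          /* number of leaves (PUs) below this node, > 0 */
    size_t n_children;
    struct node **children;   /* child weights must sum to this node's weight */
};

/* Assign n items to nodes below root, writing one node id per item into out.
 * Returns the number of entries written, which is always n. */
size_t distribute(const struct node *root, unsigned n, int *out)
{
    assert(root != NULL && out != NULL);
    if (n == 0)
        return 0;
    /* A single item, or a leaf: the items get this whole subtree. */
    if (n == 1 || root->n_children == 0) {
        for (unsigned i = 0; i < n; i++)
            out[i] = root->id;
        return n;
    }
    size_t written = 0;
    unsigned given = 0; /* items handed to previous children */
    unsigned seen = 0;  /* leaves covered by previous children plus this one */
    for (size_t c = 0; c < root->n_children; c++) {
        const struct node *child = root->children[c];
        seen += child->weight;
        /* Cumulative proportional share, so that rounding never loses an item. */
        unsigned upto = (unsigned)(((unsigned long long)n * seen) / root->weight);
        written += distribute(child, upto - given, out + written);
        given = upto;
    }
    return written;
}
```

With a root holding two packages of two cores each, asking for four items yields the four cores, while asking for two items yields the two packages.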

bgoglin and others added 29 commits October 22, 2019 13:41
No code change, just add comments about things being official instead of assumptions.

The Gen5 spec is pretty much finalized; things won't change anymore.

Signed-off-by: Brice Goglin <Brice.Goglin@inria.fr>
Check that Linux can add NUMA nodes to x86 CPU information.

And check that Linux can annotate x86 AMD topoext NUMA nodes.

Signed-off-by: Brice Goglin <Brice.Goglin@inria.fr>
If the workspace clone ever ran on another branch (e.g. in my zbgoglin jobs),
git branch returns multiple lines, which causes the second branch name
to be run as a command line after the only expected line, "job-0-tarball.sh <firstbranch>".

Use git rev-parse --abbrev-ref HEAD instead.

Signed-off-by: Brice Goglin <Brice.Goglin@inria.fr>
This tells the code not to ever merge that group with structurally-identical
parent or children.

This is useful for Groups implementing new "types" that cannot be backported
to stable releases. New types won't be merged by default, but Groups would.

Requested by Intel for Die objects.

This doesn't break the ABI because the attribute structure has always been
calloc'ed, which means this attribute was "0", which matches the default
"merge group" behavior.

Signed-off-by: Brice Goglin <Brice.Goglin@inria.fr>
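
The ABI argument in the commit message above can be sketched in a few lines. This is a minimal illustration, not hwloc's actual definition: the `struct group_attr` layout and the helper names are hypothetical. The point is that a zero-initialized allocation makes a newly added field read as 0, which is chosen to mean the old default behavior.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical group attributes; field names are illustrative only. */
struct group_attr {
    unsigned depth;
    unsigned kind;
    unsigned char dont_merge; /* newly added field: 0 means "may be merged" */
};

/* Allocate zero-initialized attributes, so any field added later starts at 0. */
struct group_attr *group_attr_alloc(void)
{
    return calloc(1, sizeof(struct group_attr));
}

/* A group may be merged unless the caller explicitly set dont_merge. */
int group_may_merge(const struct group_attr *attr)
{
    assert(attr != NULL);
    return attr->dont_merge == 0;
}
```

Code that never touches the new field keeps the merge behavior it had before, because calloc() already gave the field the value 0.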
Update CPUID.1f x86 test case not to merge Die groups anymore.
Hence there's no need to ignore Caches anymore.

Signed-off-by: Brice Goglin <Brice.Goglin@inria.fr>
…ile/module types

Make them groups.

Signed-off-by: Brice Goglin <Brice.Goglin@inria.fr>
…rectories

Signed-off-by: Brice Goglin <Brice.Goglin@inria.fr>
I managed to convince Intel that adding another foo_siblings
between core_siblings and thread_siblings would break userspace,
and that the situation could be even worse if they ever add another
intermediate level in the future.

So they are finally renaming to filenames whose semantics don't
depend on intermediate levels: core_cpus and package_cpus.

Signed-off-by: Brice Goglin <Brice.Goglin@inria.fr>
Linux 5.3 will have new "die_cpus" and "die_id" sysfs files
for upcoming architectures with multiple dies per package.

When the die cpuset is different from the package, add a "Die" group.

Don't add it when there's a single Die per package because
most CPUs don't want to show a useless additional Die level.

We don't want to set the Die level to keep_structure because
it would get automerged in L3 caches on CLX, and lstopo displays
everything by default anyway.

Set the "dont_merge" group flag if HWLOC_DONT_MERGE_DIE_GROUPS
is set in the environment, just like in the x86 backend.

Signed-off-by: Brice Goglin <Brice.Goglin@inria.fr>
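
The decision described in the Die commit message above could look roughly like the following standalone sketch. The 64-bit `cpuset_t`, the `struct die_group` type, and the `maybe_make_die_group()` helper are hypothetical and only stand in for hwloc's own bitmap and object types. The environment variable name comes from the commit message, and the sketch assumes that only its presence matters, not its value.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Toy cpuset: one bit per PU, at most 64 PUs. hwloc uses its own bitmap type. */
typedef uint64_t cpuset_t;

/* Hypothetical stand-in for a "Die" group object. */
struct die_group {
    cpuset_t cpuset;
    unsigned char dont_merge;
};

/* Returns 1 and fills *out when a Die group should be added, 0 otherwise. */
int maybe_make_die_group(cpuset_t die_cpus, cpuset_t package_cpus,
                         struct die_group *out)
{
    assert(out != NULL);
    /* A single die covering the whole package adds no information: skip it. */
    if (die_cpus == package_cpus)
        return 0;
    out->cpuset = die_cpus;
    /* Assumption: the variable only needs to be present, whatever its value. */
    out->dont_merge = (getenv("HWLOC_DONT_MERGE_DIE_GROUPS") != NULL);
    return 1;
}
```

A package with PUs 0 to 7 and a die with PUs 0 to 3 produces a group, while a die that covers all eight PUs does not.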
Old kernels exposed two packages on E5v3 in Cluster-on-Die mode
because the package core_siblings was wrong.
We detected that case when two packages had the same physical_package_id.

This was fixed in Linux 3.18, backported in RHEL7.
Other important distros use a more recent kernel now.

Signed-off-by: Brice Goglin <Brice.Goglin@inria.fr>
Signed-off-by: Brice Goglin <Brice.Goglin@inria.fr>
Signed-off-by: Brice Goglin <Brice.Goglin@inria.fr>
Signed-off-by: Brice Goglin <Brice.Goglin@inria.fr>
Signed-off-by: Brice Goglin <Brice.Goglin@inria.fr>
Just like for cores and packages.

Signed-off-by: Brice Goglin <Brice.Goglin@inria.fr>
Signed-off-by: Brice Goglin <Brice.Goglin@inria.fr>
Signed-off-by: Brice Goglin <Brice.Goglin@inria.fr>
Fixes commit c1c34a6

Signed-off-by: Brice Goglin <Brice.Goglin@inria.fr>
Otherwise the matrix would be wrong.

Further fixes commit c1c34a6

Signed-off-by: Brice Goglin <Brice.Goglin@inria.fr>
Signed-off-by: Brice Goglin <Brice.Goglin@inria.fr>
Signed-off-by: Brice Goglin <Brice.Goglin@inria.fr>
Signed-off-by: Brice Goglin <Brice.Goglin@inria.fr>
…vailable

Hence, we don't have to run both on Linux/x86 anymore,
and we don't have to manually tarball the CPUID files.

Refs open-mpi#186

Signed-off-by: Brice Goglin <Brice.Goglin@inria.fr>
Signed-off-by: Brice Goglin <Brice.Goglin@inria.fr>
Use realpath so that we can change the current directory
without breaking the destination relative directory.

Signed-off-by: Brice Goglin <Brice.Goglin@inria.fr>
Reported by Intel from the output of klocwork.

Signed-off-by: Brice Goglin <Brice.Goglin@inria.fr>
Reported by Intel from the output of klocwork.

Signed-off-by: Brice Goglin <Brice.Goglin@inria.fr>
Reported by Intel from the output of klocwork.

Signed-off-by: Brice Goglin <Brice.Goglin@inria.fr>
bgoglin and others added 26 commits October 22, 2019 13:42
…objects during build

Thanks to Eloi Gaudry for the patch.

Signed-off-by: Brice Goglin <Brice.Goglin@inria.fr>
Instead of having all of them in the main solution file.

Thanks to Eloi Gaudry for the patch.

Signed-off-by: Brice Goglin <Brice.Goglin@inria.fr>
Defined with recent VS.

Signed-off-by: Brice Goglin <Brice.Goglin@inria.fr>
Thanks to Eloi Gaudry for the patch.

We force retarget to an old vs110 for ci.inria.fr.

Signed-off-by: Brice Goglin <Brice.Goglin@inria.fr>
Thanks to Eloi Gaudry for the patch.

Signed-off-by: Brice Goglin <Brice.Goglin@inria.fr>
Move idea of hwloc-ps to a github issue.
Update some comments, add details for command-line build.

Thanks to Eloi Gaudry for the suggestion.

Signed-off-by: Brice Goglin <Brice.Goglin@inria.fr>
Signed-off-by: Brice Goglin <Brice.Goglin@inria.fr>
Signed-off-by: Brice Goglin <Brice.Goglin@inria.fr>
Will run in the extended nightly tests.

Runs only on master on the main repo by default.

Signed-off-by: Brice Goglin <Brice.Goglin@inria.fr>
Signed-off-by: Brice Goglin <Brice.Goglin@inria.fr>
…-processors

Closes open-mpi#368

Signed-off-by: Brice Goglin <Brice.Goglin@inria.fr>
Signed-off-by: Brice Goglin <Brice.Goglin@inria.fr>
They are renamed to PREFIX_hwloc_FOO instead of PREFIX_HWLOC_FOO
We could fix it but it doesn't matter much (people aren't supposed to
use those renamed names anyway) and it could break existing hacks
(if anybody actually depends on such renamed name).

Thanks to Samuel K. Gutierrez for the report.

Signed-off-by: Brice Goglin <Brice.Goglin@inria.fr>
Don't AND(normal, topology_allowed) in the normal (v2) case
to avoid hiding internal allowed set bugs.

Signed-off-by: Brice Goglin <Brice.Goglin@inria.fr>
In some (old?) corner cases, Linux cpusets may return offline PUs
in the allowed sets of cpusets/cgroups.

Signed-off-by: Brice Goglin <Brice.Goglin@inria.fr>
…ectory

fsroot and cpuid are implemented in tools using environment variables
(those debug cases are not in the API since v2).
Those backends forced by environment variable override the normal
topology thissystem flag that may be set with set_flags() in the API
and with --flags or --thissystem in CLI tools. One must use the
HWLOC_THISSYSTEM envvar to force the thissystem flag.
Implement this automatically in the tools (common helpers).

Signed-off-by: Brice Goglin <Brice.Goglin@inria.fr>
Signed-off-by: Brice Goglin <Brice.Goglin@inria.fr>
Signed-off-by: ndenoyelle <ndenoyel@anl.gov>
Signed-off-by: ndenoyelle <ndenoyel@anl.gov>
Signed-off-by: ndenoyelle <ndenoyel@anl.gov>
Signed-off-by: ndenoyelle <ndenoyel@anl.gov>
Signed-off-by: ndenoyelle <ndenoyel@anl.gov>
Signed-off-by: ndenoyelle <ndenoyel@anl.gov>
@bgoglin
Contributor

bgoglin commented Dec 4, 2019

@NicolasDenoyelle can you rebase/squash these commits to ease review?

@NicolasDenoyelle
Author

NicolasDenoyelle commented Dec 4, 2019 via email

@bgoglin
Contributor

bgoglin commented Mar 24, 2020

Note to open pull requests: some things changed in the CI yesterday, you'll need to rebase on top of master to avoid total CI failure.

3 participants