Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

coll: enable topology-aware CVARs in json #6782

Merged
merged 2 commits into from
Apr 11, 2024
Merged

Conversation

dycz0fx
Copy link
Contributor

@dycz0fx dycz0fx commented Nov 6, 2023

Pull Request Description

Allow setting up topology-aware CVARs in collective tuning json file for bcast, reduce, and allreduce.

Author Checklist

  • Provide Description
    Particularly focus on why, not what. Reference background, issues, test failures, xfail entries, etc.
  • Commits Follow Good Practice
    Commits are self-contained and do not do two things at once.
    Commit message is of the form: module: short description
    Commit message explains what's in the commit.
  • Passes All Tests
    Whitespace checker. Warnings test. Additional tests via comments.
  • Contribution Agreement
    For non-Argonne authors, check contribution agreement.
    If necessary, request an explicit comment from your companies PR approval manager.

@dycz0fx dycz0fx marked this pull request as ready for review November 15, 2023 19:16
@dycz0fx dycz0fx force-pushed the json_topo branch 2 times, most recently from 4dd2e38 to 34309ec Compare February 27, 2024 21:25
@dycz0fx
Copy link
Contributor Author

dycz0fx commented Mar 1, 2024

test:mpich/ch4/gpu

@dycz0fx
Copy link
Contributor Author

dycz0fx commented Mar 1, 2024

test:mpich/ch4/gpu

@dycz0fx
Copy link
Contributor Author

dycz0fx commented Mar 19, 2024

test:mpich/ch4/gpu

@dycz0fx
Copy link
Contributor Author

dycz0fx commented Mar 28, 2024

test:mpich/ch4/gpu

@dycz0fx
Copy link
Contributor Author

dycz0fx commented Mar 29, 2024

test:mpich/ch4/gpu

@hzhou
Copy link
Contributor

hzhou commented Apr 2, 2024

Note: we need make sure that tests failures are not related. This could be due to the Jenkins node issue, but we need resolve that first before passing this PR.

@dycz0fx
Copy link
Contributor Author

dycz0fx commented Apr 3, 2024

I will try the tests on main to see if these errors appear.

@dycz0fx
Copy link
Contributor Author

dycz0fx commented Apr 4, 2024

It seems like these test failures exist in the main branch. I created an empty PR and run test:mpich/ch4/gpu (#6960). The test failures in this PR can be found on that PR as well.
What should we do about these errors? I feel like some of the errors are related to the machine used for CI.

@hzhou
Copy link
Contributor

hzhou commented Apr 11, 2024

test:mpich/ch4/gpu/ofi

@hzhou
Copy link
Contributor

hzhou commented Apr 11, 2024

Refer to #6648 (comment) on test failures.

Allow setting up topology-aware CVARs in collective tuning json file for bcast, ireduce and allreduce.
Add k value when caching the topo_aware and topo_aware_k trees.
Add overhead, lat_diff_groups, lat_diff_switches and lat_same_switches
when caching topo_wave tree.
Copy link
Contributor

@hzhou hzhou left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@hzhou hzhou merged commit c7fdb53 into pmodels:main Apr 11, 2024
4 checks passed
@abrooks98 abrooks98 mentioned this pull request May 3, 2024
4 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants