Switched the test cases to use threads instead of processes #115032

Closed
wants to merge 1 commit into from

@Pouya0079 Pouya0079 commented Dec 3, 2023

I modified the test cases to run on threads instead of processes, using MultithreadTestCase with DTensorOpTestBase. I also replaced device_mesh with self.build_mesh and removed the with_comms wrapper, as suggested in other discussions.
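To illustrate the general idea of running each rank of a test on a thread rather than in a separate process, here is a minimal standard-library sketch. It is not PyTorch's implementation: `run_ranks_in_threads` and `fake_rank_body` are hypothetical names introduced only for this example, and no collective communication is modeled.

```python
import threading


def run_ranks_in_threads(world_size, fn):
    """Run fn(rank, world_size) once per rank, each on its own thread.

    Hypothetical helper for illustration only; not part of PyTorch.
    """
    results = [None] * world_size
    errors = []

    def worker(rank):
        try:
            results[rank] = fn(rank, world_size)
        except Exception as exc:
            # Collect failures so the caller can report them after joining.
            errors.append((rank, exc))

    threads = [
        threading.Thread(target=worker, args=(rank,))
        for rank in range(world_size)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()

    if errors:
        rank, exc = errors[0]
        raise RuntimeError(f"rank {rank} failed") from exc
    return results


def fake_rank_body(rank, world_size):
    # Stand-in for a per-rank test body; a real test would run tensor ops here.
    return rank * 10


print(run_ranks_in_threads(4, fake_rank_body))
```

Because all ranks share one interpreter, a harness like this avoids the per-test cost of spawning processes, which is the usual motivation for thread-based distributed tests.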

Fixes #108744

cc @mrshenli @pritamdamania87 @zhaojuanmao @satgera @rohan-varma @gqchen @aazzolini @osalpekar @jiayisuse @H-Huang @kwen2501 @awgu @penguinwu @fegin @XilunWu @wanchaol @fduwjj @wz337 @tianyu-l @wconstab @yf225 @kiukchung @LucasLLC @d4l3k

…e changes to use self.build_mesh instead of device_mesh and also removed the with_comms wrapper
@pytorch-bot pytorch-bot bot added the topic: not user facing topic category label Dec 3, 2023

pytorch-bot bot commented Dec 3, 2023

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/115032

Note: Links to docs will display an error until the docs builds have been completed.

❌ 7 New Failures, 2 Unrelated Failures

As of commit 4df96a5 with merge base 12f95df (image):

NEW FAILURES - The following jobs have failed:

FLAKY - The following jobs failed but were likely due to flakiness present on trunk:

This comment was automatically generated by Dr. CI and updates every 15 minutes.

linux-foundation-easycla bot commented Dec 3, 2023

CLA Signed

The committers listed above are authorized under a signed CLA.

  • ✅ login: Pouya0079 / name: Pouya Nekou (4df96a5)

@awgu
Contributor

awgu commented Dec 3, 2023

I was curious to learn more about the motivation behind changing device_mesh to self.build_mesh as an attribute.

  def test_addmm(self):
-     device_mesh = DeviceMesh(self.device_type, list(range(self.world_size)))
+     self.build_mesh = DeviceMesh(self.device_type, list(range(self.world_size)))
Contributor

@fduwjj fduwjj Dec 3, 2023

On top of Andrew's question: DeviceMesh returns a device mesh, so maybe you want to call it self.device_mesh or self.mesh?
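The two review questions above can be made concrete with a small sketch. It uses a hypothetical `FakeMesh` stand-in rather than torch's `DeviceMesh`, and the class and method names are assumptions made for illustration; it only shows the difference between a local variable and a noun-named attribute.

```python
class FakeMesh:
    """Hypothetical stand-in for a device mesh; not torch's DeviceMesh."""

    def __init__(self, device_type, ranks):
        self.device_type = device_type
        self.ranks = list(ranks)


class FakeTestBase:
    device_type = "cpu"
    world_size = 4

    def test_with_local(self):
        # The mesh is only used inside this test, so a local name is enough.
        device_mesh = FakeMesh(self.device_type, range(self.world_size))
        return device_mesh.ranks

    def test_with_attribute(self):
        # Storing it on self also works; a noun such as `mesh` describes the
        # object, whereas a verb-like name such as `build_mesh` reads like a method.
        self.mesh = FakeMesh(self.device_type, range(self.world_size))
        return self.mesh.ranks
```

Both variants build the same object; the attribute form only pays off if other methods of the test need to read the mesh afterwards.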

@fduwjj fduwjj added the ciflow/periodic Trigger jobs ran periodically on master (periodic.yml) on the PR label Dec 3, 2023
@janeyx99 janeyx99 added the triaged This issue has been looked at a team member, and triaged and prioritized into an appropriate module label Dec 6, 2023
@albanD albanD added oncall: distributed Add this issue/PR to distributed oncall triage queue and removed module: distributed labels Dec 8, 2023

github-actions bot commented Feb 6, 2024

Looks like this PR hasn't been updated in a while so we're going to go ahead and mark this as Stale.
Feel free to remove the Stale label if you feel this was a mistake.
If you are unable to remove the Stale label please contact a maintainer in order to do so.
If you want the bot to never mark this PR stale again, add the no-stale label.
Stale pull requests will automatically be closed after 30 days of inactivity.

@github-actions github-actions bot added the Stale label Feb 6, 2024
@wconstab
Contributor

wconstab commented Feb 7, 2024

@wanchaol @tianyu-l do we want to land this? Looks like it got lost

@github-actions github-actions bot closed this Mar 8, 2024
Labels
ciflow/periodic Trigger jobs ran periodically on master (periodic.yml) on the PR oncall: distributed Add this issue/PR to distributed oncall triage queue open source Stale topic: not user facing topic category triaged This issue has been looked at a team member, and triaged and prioritized into an appropriate module
Development

Successfully merging this pull request may close these issues.

switch more test cases to use MultithreadTestCase
7 participants