Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[ETFeeder] Resolve deps based on not only data_deps, but also ctrl_deps #11

Open
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

changhai0109
Copy link
Contributor

Summary

In ETFeederNode, add fields of all_deps, which is a complete set of data_deps and ctrl_deps.
Add unreleased_deps to track the parents of a node which is not issued yet.
Update deps resolving reference to all_deps instead of data_deps.

Test Plan

Add following function to et_generator

def two_comp_nodes_ctrl_dependent(num_npus: int, runtime: int) -> None:
    for npu_id in range(num_npus):
        output_filename = f"two_comp_nodes_ctrl_dependent.{npu_id}.et"
        with open(output_filename, "wb") as et:
            encode_message(et, GlobalMetadata(version="0.0.4"))

            parent_node = get_node("COMP_NODE", COMP_NODE)
            parent_node.duration_micros = runtime
            parent_node.attr.append(ChakraAttr(name="is_cpu_op", bool_val=False))
            encode_message(et, parent_node)

            child_node = get_node("COMP_NODE", COMP_NODE)
            child_node.duration_micros = runtime
            child_node.ctrl_deps.append(parent_node.id)
            child_node.attr.append(ChakraAttr(name="is_cpu_op", bool_val=False))
            encode_message(et, child_node)

register this function in main as follows

def main():
   ...
  two_comp_nodes_ctrl_dependent(args.num _npus, args.default_runtime)
  ...

Run et_generator and get two_comp_nodes_ctrl_dependent.{npu_id}.et, then run with astrasim.

@changhai0109 changhai0109 requested a review from a team as a code owner December 12, 2023 00:05
Copy link

github-actions bot commented Dec 12, 2023

MLCommons CLA bot All contributors have signed the MLCommons CLA ✍️ ✅

@JoongunPark
Copy link
Contributor

JoongunPark commented Jan 30, 2024

@changhai0109,
Does taking into account both control dependency and data dependency enhance the accuracy of simulations?
Are there any instances where an order meets the data dependency requirements but not the control dependency requirements?

@TaekyungHeo
Copy link
Contributor

@changhai0109, I appreciate your contribution to Chakra.

I believe we should solely rely on data dependencies, rather than control dependencies. Control dependencies refer to dependencies used in the host Chakra execution traces (PyTorch execution traces). Although the final Chakra host + device execution traces include control dependencies, these are encoded for compatibility purposes rather than for simulation. Thus, it would be preferable to avoid depending on control dependencies for simulation purposes.

@TaekyungHeo TaekyungHeo self-requested a review February 6, 2024 19:46
@srinivas212
Copy link
Contributor

@changhai0109 kindly answer @TaekyungHeo and @JoongunPark questions. We can review this PR and see if this is required. As things stand, we do not need to encode this additional deps for simulation use cases.

@changhai0109
Copy link
Contributor Author

@TaekyungHeo @JoongunPark

I understand for now we do not need to rely on ctrl deps for now. However, support ctrl deps might be beneficial for ppl who study scheduling problems.

So instead of supporting only data deps, how about adding a flag in ETFeeder, so that users can choose whether enable ctrl deps, then for current usages just disable ctrl deps.

@JoongunPark
Copy link
Contributor

@TaekyungHeo @JoongunPark

I understand for now we do not need to rely on ctrl deps for now. However, support ctrl deps might be beneficial for ppl who study scheduling problems.

So instead of supporting only data deps, how about adding a flag in ETFeeder, so that users can choose whether enable ctrl deps, then for current usages just disable ctrl deps.

Your suggestion to add a flag in ETFeeder for enabling control dependencies raises an interesting point. Have you seen any examples or scenarios where the scheduling process is significantly impacted by the choice of dependency option?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants