Utilize ctrl_deps for operator dependencies in simulation #25

TaekyungHeo · 2024-02-22T21:55:07Z

Summary

Previously, pytorch2chakra_converter.py encoded operator dependencies for simulation in the data_deps field. However, data_deps should be reserved for actual data dependencies. This commit updates pytorch2chakra_converter.py to encode operator dependencies in ctrl_deps instead.
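
For illustration, the distinction between the two fields can be sketched as follows. This is a minimal sketch, not the converter's actual code: the `Node` dataclass and the `add_operator_dependency` helper are hypothetical stand-ins, and only the field names `ctrl_deps` and `data_deps` come from this PR.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    """Hypothetical stand-in for a trace node (not the real Chakra message)."""
    id: int
    name: str
    # Ordering constraints: this node may only start after these nodes finish.
    ctrl_deps: List[int] = field(default_factory=list)
    # Producer/consumer links: reserved for real data dependencies.
    data_deps: List[int] = field(default_factory=list)


def add_operator_dependency(node: Node, parent_id: int) -> None:
    """Record an operator (ordering) dependency in ctrl_deps, never in data_deps."""
    if parent_id not in node.ctrl_deps:
        node.ctrl_deps.append(parent_id)


# Example: the second operator must run after the first one.
first = Node(id=1, name="aten::mm")
second = Node(id=2, name="aten::relu")
add_operator_dependency(second, first.id)
```

With this split, `data_deps` stays empty unless a genuine data dependency is recorded, so a consumer of the trace can tell ordering constraints apart from data flow.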

Test Plan

$ cd ~/param
$ cd param/train/comms/pt
$ pip install .
$ cd ../../compute/python
$ pip install -r requirements.txt
$ python setup.py install
$ python tools/trace_link.py --pytorch-et-file ~/llama_pytorch_et/llama_et_0.json --kineto-file ~/llama_kineto/worker0_step_12.1697596714999.pt.trace.json --output-file ~/rank0.json &
$ python tools/trace_link.py --pytorch-et-file ~/llama_pytorch_et/llama_et_1.json --kineto-file ~/llama_kineto/worker1_step_12.1697596715001.pt.trace.json --output-file ~/rank1.json &
$ python tools/trace_link.py --pytorch-et-file ~/llama_pytorch_et/llama_et_2.json --kineto-file ~/llama_kineto/worker2_step_12.1697596714848.pt.trace.json --output-file ~/rank2.json &
$ python tools/trace_link.py --pytorch-et-file ~/llama_pytorch_et/llama_et_3.json --kineto-file ~/llama_kineto/worker3_step_12.1697596714880.pt.trace.json --output-file ~/rank3.json &
$ python tools/trace_link.py --pytorch-et-file ~/llama_pytorch_et/llama_et_4.json --kineto-file ~/llama_kineto/worker4_step_12.1697596714944.pt.trace.json --output-file ~/rank4.json &
$ python tools/trace_link.py --pytorch-et-file ~/llama_pytorch_et/llama_et_5.json --kineto-file ~/llama_kineto/worker5_step_12.1697596714871.pt.trace.json --output-file ~/rank5.json &
$ python tools/trace_link.py --pytorch-et-file ~/llama_pytorch_et/llama_et_6.json --kineto-file ~/llama_kineto/worker6_step_12.1697596714614.pt.trace.json --output-file ~/rank6.json &
$ python tools/trace_link.py --pytorch-et-file ~/llama_pytorch_et/llama_et_7.json --kineto-file ~/llama_kineto/worker7_step_12.1697596714853.pt.trace.json --output-file ~/rank7.json &

$ cd ~/chakra
$ pip install .
$ python3 -m chakra.et_converter.et_converter --input_type PyTorch --input_filename ~/rank0.json --output_filename ~/rank.0.et --num_dims 1
$ python3 -m chakra.et_converter.et_converter --input_type PyTorch --input_filename ~/rank1.json --output_filename ~/rank.1.et --num_dims 1
$ python3 -m chakra.et_converter.et_converter --input_type PyTorch --input_filename ~/rank2.json --output_filename ~/rank.2.et --num_dims 1
$ python3 -m chakra.et_converter.et_converter --input_type PyTorch --input_filename ~/rank3.json --output_filename ~/rank.3.et --num_dims 1
$ python3 -m chakra.et_converter.et_converter --input_type PyTorch --input_filename ~/rank4.json --output_filename ~/rank.4.et --num_dims 1
$ python3 -m chakra.et_converter.et_converter --input_type PyTorch --input_filename ~/rank5.json --output_filename ~/rank.5.et --num_dims 1
$ python3 -m chakra.et_converter.et_converter --input_type PyTorch --input_filename ~/rank6.json --output_filename ~/rank.6.et --num_dims 1
$ python3 -m chakra.et_converter.et_converter --input_type PyTorch --input_filename ~/rank7.json --output_filename ~/rank.7.et --num_dims 1

$ cd ~/astra-sim
$ ./build/astra_analytical/build.sh
$ ./build/astra_analytical/build/bin/AstraSim_Analytical_Congestion_Unaware \
  --workload-configuration=/Users/theo/rank \
  --system-configuration=./inputs/system/Switch.json \
  --network-configuration=./inputs/network/analytical/Switch.yml \
  --remote-memory-configuration=./inputs/remote_memory/analytical/no_memory_expansion.json
ring of node 0, id: 0 dimension: local total nodes in ring: 8 index in ring: 0 offset: 1total nodes in ring: 8
ring of node 0, id: 0 dimension: local total nodes in ring: 8 index in ring: 0 offset: 1total nodes in ring: 8
ring of node 0, id: 0 dimension: local total nodes in ring: 8 index in ring: 0 offset: 1total nodes in ring: 8
ring of node 0, id: 0 dimension: local total nodes in ring: 8 index in ring: 0 offset: 1total nodes in ring: 8
sys[2] finished, 7213509000 cycles
sys[6] finished, 7226613000 cycles
sys[0] finished, 7269182000 cycles
sys[4] finished, 7276689000 cycles
sys[1] finished, 7340042000 cycles
sys[7] finished, 7367494000 cycles
sys[5] finished, 7374663000 cycles
sys[3] finished, 7375565000 cycles
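
When comparing runs, the per-rank finish lines above can be summarized with a small script. The following is a rough sketch that assumes the log format shown in this test plan (`sys[N] finished, C cycles`). The helper names `parse_finish_cycles` and `slowest_rank` are hypothetical and not part of ASTRA-sim.

```python
import re
from typing import Dict, Tuple

# Matches lines such as "sys[3] finished, 7375565000 cycles".
FINISH_RE = re.compile(r"sys\[(\d+)\] finished, (\d+) cycles")


def parse_finish_cycles(log_text: str) -> Dict[int, int]:
    """Map each rank index to the cycle count at which it finished."""
    return {int(m.group(1)): int(m.group(2)) for m in FINISH_RE.finditer(log_text)}


def slowest_rank(cycles: Dict[int, int]) -> Tuple[int, int]:
    """Return (rank, cycles) for the rank that finished last."""
    rank = max(cycles, key=cycles.get)
    return rank, cycles[rank]
```

Passing the captured simulator output to `parse_finish_cycles` and then `slowest_rank` gives the rank that determines the end-to-end simulated time.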

github-actions · 2024-02-22T21:55:20Z

MLCommons CLA bot All contributors have signed the MLCommons CLA ✍️ ✅

TaekyungHeo force-pushed the ctrl-dep branch from 8977da7 to fdf1894 Compare February 23, 2024 01:03

TaekyungHeo marked this pull request as ready for review February 23, 2024 01:30

TaekyungHeo requested a review from a team as a code owner February 23, 2024 01:30

JoongunPark closed this May 10, 2024

github-actions bot locked and limited conversation to collaborators May 10, 2024

JoongunPark reopened this May 10, 2024
