
Definition and shape inference for ONNXParallelOp and ONNXForkOp #2810

Open
wants to merge 5 commits into main
Conversation

@imaihal (Collaborator) commented on May 1, 2024

This PR introduces new operations for operator-level parallelization. The overall implementation plan for operator-level parallelization is described in issue #2811; this PR is the first in the series.

  • ONNXParallelOp
    • Specifies a parallel region. ONNXForkOps should be included in its body region. The threads created by ONNXForkOp are synchronized with the main thread at the end of this operation. Results needed by subsequent operations must be returned as results of this operation via ONNXYieldOp.
  • ONNXForkOp
    • Creates a thread that runs the operations included in its body region.
%Y0, %Y1 = onnx.Parallel {
  %Y0 = onnx.Fork () -> tensor<> {
    // ONNX operations
    onnx.yield %Y
  }
  %Y1 = onnx.Fork () -> tensor<> {
    // ONNX operations
    onnx.yield %Y
  }
  onnx.yield %Y0, %Y1
}

Signed-off-by: Haruki Imai <imaihal@jp.ibm.com>
@AlexandreEichenberger (Collaborator) left a comment


@imaihal very clean interface. A high-level question before I go into the details of the PR: how are the outputs of ONNX.Parallel defined? Is the first output the output of the first fork, and the second output the output of the second fork?

Because in MLIR, parallel already exists in the scf dialect as the "equivalent" of omp parallel for, maybe we should name this construct ONNX.ParallelForks? Just to avoid future conflicts if we were to introduce a parallel loop.

In terms of OMP, the functionality provided here is closest to omp parallel sections with omp section inside. So that could be an alternative naming scheme (namely ONNX.ParallelSections and ONNX.Section).

@imaihal (Collaborator, Author) commented on May 2, 2024

@AlexandreEichenberger Thanks for your comments!

how are the outputs of ONNX.Parallel defined? Is the first output the output of the first fork, and the second output the output of the second fork?

I didn't consider it. The output order of ONNX.Parallel is the same as the input order of ONNX.Yield. That is, the output of the second fork can be placed as the first output of ONNX.Parallel. Do you think we need to define the order?

Regarding naming, the current ONNX.Parallel is too generic, so I agree with your suggestion to change it. I prefer ONNX.ParallelSections because it clearly expresses that there is a parallel region, but I would like to hear others' opinions.

@AlexandreEichenberger (Collaborator) commented:

I didn't consider it. The output order of ONNX.Parallel is the same as the input order of ONNX.Yield. That is, the output of the second fork can be placed as the first output of ONNX.Parallel. Do you think we need to define the order?

The way the values of each "yield" get reflected in the outputs of the "parallel" needs to be defined in the text associated with the operations. If the current approach is that the "first" fork/yield produces the "first" output value of the "parallel", and the "second" fork/yield generates the "second" output value of the parallel, that is perfectly fine. We just need to make sure the forks are not reordered (I am sure they are not), and clearly explain this ordering in the definition/example associated with the new operations.
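To make the intended ordering concrete, here is a minimal sketch in the notation of the PR description (the tensor types, value names, and exact assembly syntax are illustrative assumptions, not the final definition):

    // The i-th operand of the final onnx.yield becomes the i-th result of
    // onnx.Parallel, so %A maps to %P0 and %B maps to %P1.
    %P0, %P1 = onnx.Parallel {
      %A = onnx.Fork () -> tensor<2xf32> {
        // ... ONNX operations producing %a ...
        onnx.yield %a
      }
      %B = onnx.Fork () -> tensor<2xf32> {
        // ... ONNX operations producing %b ...
        onnx.yield %b
      }
      onnx.yield %A, %B   // this operand order fixes the result order of onnx.Parallel
    }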

@tungld (Collaborator) commented on May 7, 2024

Thanks for proposing these operations. I have some high-level comments.

%Y0, %Y1 = onnx.Parallel {
  %Y0 = onnx.Fork () -> tensor<> {
    // ONNX operations
    onnx.yield %Y
  }
  %Y1 = onnx.Fork () -> tensor<> {
    // ONNX operations
    onnx.yield %Y
  }
  onnx.yield %Y0, %Y1
}

onnx.Parallel and onnx.Fork do not have inputs, so it looks to me like you assume global variables in ONNX so that operations inside onnx.Fork can use them directly, not via function arguments. In general, it is more difficult to reason about a program with global variables. Is it acceptable in the ONNX specification? E.g., are there any ONNX ops you know of that use global variables?

There is a subtle difference between yield and return. return often stops the function and quits, while yield is like a generator: it just pauses the execution, and the next execution (e.g. the next iteration in a loop) can take the yielded value as input. onnx.Return seems more suitable here.

It looks to me like adding onnx.Parallel and onnx.Fork will change the ONNX model a lot. By introducing regions, we need to move the original ONNX operations into the regions of onnx.Parallel and onnx.Fork, which potentially makes optimizations (e.g. removing a Stick/Unstick pair) difficult. I am not sure whether the lowering to the krnl dialect is easy or not. Have you considered any other approaches, e.g. annotating operations like the ones the ONNX Multi-Device group is working on?

@chentong319 (Collaborator) commented:
The other structural ONNX ops, such as ONNX.If and ONNX.Loop, can read values from outside, but all the generated values can be used outside only through yield. Do the new ops have the same behavior?

@chentong319 (Collaborator) commented:
To me, the parallel op is a scheduling decision and should be added in a later phase. It should not be in the onnx dialect if there is another choice. By the way, can we use omp.sections for this purpose?

@imaihal (Collaborator, Author) commented on May 10, 2024

@AlexandreEichenberger

The way the values of each "yield" get reflected in the outputs of the "parallel" needs to be defined in the text associated with the operations.

I updated the description for the operations. Thanks.
However, we may need to consider the case where an onnx.Fork op has multiple results. In that case, the i-th result of onnx.Parallel is not necessarily the result of the i-th onnx.Fork, as sketched below.
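As an illustration of that concern (a hypothetical sketch only; the multi-result syntax and tensor types are assumptions made for this example):

    %P0, %P1, %P2 = onnx.Parallel {
      %A:2 = onnx.Fork () -> (tensor<2xf32>, tensor<2xf32>) {
        // the first fork produces two results
        onnx.yield %a0, %a1
      }
      %B = onnx.Fork () -> tensor<2xf32> {
        onnx.yield %b
      }
      // %P2 is produced by the second fork even though it is the third result
      // of onnx.Parallel, so "i-th result of Parallel == result of the i-th
      // Fork" no longer holds.
      onnx.yield %A#0, %A#1, %B
    }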

@imaihal (Collaborator, Author) commented on May 10, 2024

@chentong319

The other structural ONNX ops, such as ONNX.If and ONNX.Loop, can read values from outside, but all the generated values can be used outside only through yield. Do the new ops have the same behavior?

Yes. ONNXParallelOp and ONNXForkOp have the same behavior.
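A minimal sketch of that behavior, reusing the assumed syntax from the PR description (%X and the onnx.Relu op are placeholders for illustration):

    %X = ... : tensor<2xf32>              // value defined outside the parallel region
    %P = onnx.Parallel {
      %A = onnx.Fork () -> tensor<2xf32> {
        // %X is read directly from the enclosing scope, just like values
        // captured by the body regions of ONNX.If and ONNX.Loop
        %r = "onnx.Relu"(%X) : (tensor<2xf32>) -> tensor<2xf32>
        onnx.yield %r
      }
      onnx.yield %A
    }
    // %A and %r are not visible outside the regions; only %P escapes.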

To me, the parallel op is a scheduling decision and should be added in a later phase. It should not be in the onnx dialect if there is another choice. By the way, can we use omp.sections for this purpose?

We assume these operations are inserted just before the ONNXToKrnl pass, after the optimizations at the ONNX IR level have been applied. (Currently we are using a rewriting tool for testing: we generate ONNX IR with -EmitONNXIR, then insert ONNXParallelOp and ONNXForkOp using the tool.)

I think omp.section is a memref-level operation. We wanted tensor-level operations so that parallelization happens at the tensor level. Currently, in PR #2756, we used krnl.parallel at the memref level, reusing the existing OpenMP parallelization, but we should be able to lower these operations into omp.section.
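For reference, a rough sketch of what such a lowering could look like with MLIR's OpenMP dialect (omp.parallel, omp.sections, and omp.section exist upstream, but the bufferization and terminator details shown here are assumptions, not what this PR implements):

    omp.parallel {
      omp.sections {
        omp.section {
          // bufferized body of the first onnx.Fork, writing its result into
          // a preallocated memref instead of yielding a tensor
          omp.terminator
        }
        omp.section {
          // bufferized body of the second onnx.Fork
          omp.terminator
        }
        omp.terminator
      }
      omp.terminator
    }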
