
rfc: propose split op for oneDNN Graph API #1628

Open

xiang1guo wants to merge 1 commit into rfcs from xiang/rfcs/graph-split-op
Conversation

xiang1guo
Contributor

This RFC is to add a new Split operation to oneDNN graph API opset.

The rendered version: link

@xiang1guo
Contributor Author

Can you help to review this RFC? Thanks! @igorsafo @mgouicem @dzarukin

igorsafo added the RFC (A design document) label on Apr 24, 2023
@chunyuan-w

  1. PyTorch split OP has a split_size_or_sections input:
  • When it is a list, it is aligned with the size_splits attr here.
  • When it is an int, it seems that there's no corresponding attr:
    • PyTorch int split_size_or_sections:

      • It is the size of each chunk and denotes that the input will be split into equally sized chunks (if possible).
      • The last chunk will be smaller if the tensor size along the given dimension dim is not divisible by split_size.
    • num_splits in oneDNN graph:

      • It is the number of chunks.
      • The dimension of the input tensor's data shape along axis must be evenly divisible by the num_splits attribute.

In static shape cases, if the input shape is known, the integration could try to convert an int split_size_or_sections to num_splits, provided the dimension is evenly divisible by num_splits. In dynamic shape cases, I feel it will be hard for the integration to map the OP to oneDNN graph.

  2. For PyTorch, each output chunk is a view of the original tensor. If that is not the case in oneDNN graph, may I know if this will be mentioned anywhere in the document?
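As an illustration of the static-shape conversion described in point 1, here is a minimal sketch (the helper name is hypothetical, not existing bridge code):

```python
def split_size_to_num_splits(dim_along_axis: int, split_size: int) -> int:
    """Map PyTorch's int split_size_or_sections to oneDNN Graph's num_splits (static shapes only)."""
    if dim_along_axis % split_size != 0:
        # The last chunk would be smaller, which num_splits cannot express.
        raise ValueError("dimension not evenly divisible; cannot map to num_splits")
    return dim_along_axis // split_size

print(split_size_to_num_splits(12, 4))  # 3 chunks of size 4
```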

@sanchitintel

@xiang1guo, we plan to emulate dynamic shape support in IPEX with multiple compiled partitions for various static shapes.
Since the num_splits attribute is to be provided at the time of graph creation, IPEX would provide its value for the first input but then it might change for a later input! :(

cc @chunyuan-w

@xiang1guo
Contributor Author

xiang1guo commented Apr 26, 2023

> 1. PyTorch split OP has a split_size_or_sections input:
>    • When it is a list, it is aligned with the size_splits attr here.
>    • When it is an int, it seems that there's no corresponding attr:
>      • PyTorch int split_size_or_sections: it is the size of each chunk, denoting that the input will be split into equally sized chunks (if possible); the last chunk will be smaller if the tensor size along the given dimension dim is not divisible by split_size.
>      • num_splits in oneDNN graph: it is the number of chunks; the dimension of the input tensor's data shape along axis must be evenly divisible by the num_splits attribute.
>
> In static shape cases, if the input shape is known, the integration could try to convert an int split_size_or_sections to num_splits, provided the dimension is evenly divisible by num_splits. In dynamic shape cases, I feel it will be hard for the integration to map the OP to oneDNN graph.
>
> 2. For PyTorch, each output chunk is a view of the original tensor. If that is not the case in oneDNN graph, may I know if this will be mentioned anywhere in the document?

For 1, as you mentioned in the meeting, we can add another attribute to align with PyTorch's meaning. It helps with mapping, but it still can't handle the case where the axis dimension of the input tensor is dynamic. @ZhennanQin @yifeizh2 Do you have any comments?

For 2, yes, the graph API doesn't guarantee view or copy semantics. I will add more description of this. Thanks for the reminder!

@ZhennanQin
Contributor

> For 1, as you mentioned in the meeting, we can add another attribute to align with PyTorch's meaning. It helps with mapping, but it still can't handle the case where the axis dimension of the input tensor is dynamic.

What do you mean by "the axis dimension of the input tensor is dynamic"? Can you provide an example for better understanding?

@xiang1guo
Contributor Author

> For 1, as you mentioned in the meeting, we can add another attribute to align with PyTorch's meaning. It helps with mapping, but it still can't handle the case where the axis dimension of the input tensor is dynamic.
>
> What do you mean by "the axis dimension of the input tensor is dynamic"? Can you provide an example for better understanding?

For dynamic input data [1, 28, ?] (? means this dimension is only known at the execution stage) and axis=2.

@ZhennanQin
Contributor

> For 1, as you mentioned in the meeting, we can add another attribute to align with PyTorch's meaning. It helps with mapping, but it still can't handle the case where the axis dimension of the input tensor is dynamic.
>
> What do you mean by "the axis dimension of the input tensor is dynamic"? Can you provide an example for better understanding?
>
> For dynamic input data [1, 28, ?] (? means this dimension is only known at the execution stage) and axis=2.

I see. Thanks for the example. In this case, the number of outputs from split is dynamic; how is this represented in the framework graph?

xiang1guo closed this on Apr 26, 2023
xiang1guo reopened this on Apr 26, 2023
@xiang1guo
Contributor Author

> For 1, as you mentioned in the meeting, we can add another attribute to align with PyTorch's meaning. It helps with mapping, but it still can't handle the case where the axis dimension of the input tensor is dynamic.
>
> What do you mean by "the axis dimension of the input tensor is dynamic"? Can you provide an example for better understanding?
>
> For dynamic input data [1, 28, ?] (? means this dimension is only known at the execution stage) and axis=2.
>
> I see. Thanks for the example. In this case, the number of outputs from split is dynamic; how is this represented in the framework graph?

It seems we don't have any real case for this yet. @chunyuan-w @sanchitintel Do you have any real case about it?

@wuxun-zhang
Contributor

> @xiang1guo, we plan to emulate dynamic shape support in IPEX with multiple compiled partitions for various static shapes. Since the num_splits attribute is to be provided at the time of graph creation, IPEX would provide its value for the first input but then it might change for a later input! :(
>
> cc @chunyuan-w

Hi @sanchitintel, by saying "then it might change for a later input", do you mean that num_splits will no longer be an attribute? How is it handled in the bridge?

@sanchitintel

> by saying "then it might change for a later input", do you mean that num_splits will no longer be an attribute? How is it handled in the bridge?

Hi @wuxun-zhang, if the input size changes, then num_splits may change as well, but since the framework would have passed a value of num_splits corresponding to the first input encountered at graph creation time, I think we would run into a shape-mismatch issue at runtime.

So, it'd be great if we could pass this value at compile time instead. IIRC, something similar is done for DynamicReshape as well.

@sanchitintel

sanchitintel commented Apr 27, 2023

> In this case, the number of outputs from split is dynamic; how is this represented in the framework graph?

@ZhennanQin, currently, the IPEX/PyTorch JIT framework graph is created by tracing with a sample input, so the framework graph's nodes are specific to that input; for the case you mentioned, another input with a different shape would have generated a different graph.

So, if we were to lower such an op (one that would generate a different number of tensors) to oneDNN Graph, we would encounter a runtime error with an input tensor of a different shape than the one profiled. (However, if we didn't lower such an op to oneDNN Graph, those ops would be handled by PyTorch/IPEX eager mode.)

@frost-intel @chunyuan-w, do you know how Dynamo handles such cases? Thanks!

@chunyuan-w

chunyuan-w commented Apr 27, 2023

> In this case, the number of outputs from split is dynamic; how is this represented in the framework graph?
>
> @ZhennanQin, currently, the IPEX/PyTorch JIT framework graph is created by tracing with a sample input, so the framework graph's nodes are specific to that input; for the case you mentioned, another input with a different shape would have generated a different graph.
>
> So, if we were to lower such an op (one that would generate a different number of tensors) to oneDNN Graph, we would encounter a runtime error with an input tensor of a different shape than the one profiled. (However, if we didn't lower such an op to oneDNN Graph, those ops would be handled by PyTorch/IPEX eager mode.)
>
> @frost-intel @chunyuan-w, do you know how Dynamo handles such cases? Thanks!

I would suggest adding an attribute split_size to match the int type split_size_or_sections in PyTorch.

  • When the TorchScript IR indicates that this int split_size_or_sections is a Constant, the bridge will set the split_size attr of the Split OP to the Constant value, which is known at compilation time.
  • If it is not a Constant according to the TorchScript IR, we're not able to map it to the static attr. Another runtime representation of split_size will be needed, a bit similar to the discussion on the exponent of the Pow OP.
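A minimal sketch of the bridge-side decision described above; the Value type and map_split helper are hypothetical stand-ins, not existing TorchScript or bridge APIs:

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Value:
    """Stand-in for a TorchScript IR value; `constant` is set when it is a prim::Constant."""
    constant: Optional[int] = None

def map_split(split_size_or_sections: Value) -> dict:
    """Return the attribute the bridge would set on the proposed oneDNN Graph Split op."""
    if split_size_or_sections.constant is not None:
        # Known at graph build time: map directly to the proposed `split_size` attr.
        return {"split_size": split_size_or_sections.constant}
    # Not a Constant in the TorchScript IR: a static attribute cannot express it,
    # so a runtime representation would be needed (cf. the Pow exponent discussion).
    raise NotImplementedError("runtime split_size is not representable as a static attr")

print(map_split(Value(constant=64)))  # {'split_size': 64}
```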

@frost-intel

> @frost-intel @chunyuan-w, do you know how Dynamo handles such cases? Thanks!

Dynamo can either use a constant input size similar to PT JIT, or it can have a dynamic size marked as a SymInt object with undetermined size. We haven't explored dynamic sizes in our Dynamo/oneDNN Graph bridge, but an attribute or flag marking a dimension as Dynamic would likely be required.

@sanchitintel

> • If it is not a Constant according to the TorchScript IR, we're not able to map it to the static attr. Another runtime representation of split_size will be needed, a bit similar to the discussion on the exponent of the Pow OP.

Thanks, @chunyuan-w! If split_size_or_sections varies for different input-sizes, then it'd not be constant in TorchScript IR, but would be generated with a combination of ops including aten::size. So, I agree with your proposal, except that in the multiple-compiled-partitions-for-different-static-input-shapes implementation, split_size would also be available at compile time (as opposed to run-time).

Thanks for your inputs, @frost-intel!

xiang1guo force-pushed the xiang/rfcs/graph-split-op branch 2 times, most recently from a8d4b09 to 4d27c53 on May 6, 2023 06:54
@xiang1guo
Contributor Author

xiang1guo commented May 6, 2023

@chunyuan-w @sanchitintel @retonym @ZhennanQin @yifeizh2 I just updated the RFC and added an option 2 per the PyTorch team's request. Please help review option 2 again; if confirmed, our final implementation will be based on this option.

| Kind | Name | Description | Required or optional | Data types |
|:--|:--|:--|:--|:--|
| input | `src` | Input data. | Required | f32, f16, bf16* |
| attribute | `axis` | Specifies which dimension to split along. | Required | s64 |
| attribute | `num_splits` | Number of outputs into which the input tensor data will be split along the axis dimension. | Optional | s64 |
| attribute | `size_split` | Size of a single output tensor along the axis dimension. | Optional | s64 |
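As an illustration (not part of the RFC text), a small NumPy sketch of how the two optional attributes would determine the outputs, assuming size_split follows PyTorch's int split_size behavior where the last chunk may be smaller:

```python
import numpy as np

x = np.arange(12).reshape(1, 1, 12)
axis = 2

# num_splits = 3: the axis dimension (12) must be evenly divisible, giving 3 outputs of size 4.
outs_by_num = np.split(x, 3, axis=axis)

# size_split = 5: chunks of size 5 along the axis, with a smaller last chunk (sizes 5, 5, 2).
split_points = np.arange(5, x.shape[axis], 5)
outs_by_size = np.split(x, split_points, axis=axis)

print([o.shape[axis] for o in outs_by_num])   # [4, 4, 4]
print([o.shape[axis] for o in outs_by_size])  # [5, 5, 2]
```
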
Contributor

During the graph build stage, what if size_split is specified while the input shape is unknown? How do we determine the number of outputs?

@xiang1guo
Contributor Author
May 8, 2023

The case you mentioned is a dynamic-shape case that is not supported and has not been encountered yet. Once the dynamic shape feature lands in a future release, we can take it into consideration. cc @wuxun-zhang.
Option 2 here is for the convenience of integration on the IPEX side, which can simply map split_size_or_sections into size_split during the build stage.

Contributor

I think it's not a dynamic shape case. The framework won't provide concrete shapes during the graph build stage, even for models with static shapes.

Contributor

> During the graph build stage, what if size_split is specified while the input shape is unknown? How do we determine the number of outputs?

Why do we need to determine the number of outputs at the graph build stage?

Contributor

> Why do we need to determine the number of outputs at the graph build stage?

If you don't know the number of outputs, how can you build the graph after this op? A simple example is split+concat: how do you collect all the outputs from split and feed them into the concat op?
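To make the split+concat concern concrete, here is a toy sketch (hypothetical names, not the oneDNN Graph API) showing that a downstream concat can only be wired up once the number of split outputs is known, assuming PyTorch-style size_split where the last chunk may be smaller:

```python
import math

def wire_split_concat(dim_along_axis: int, size_split: int) -> list:
    """Enumerate split outputs so a downstream concat can consume them."""
    num_outputs = math.ceil(dim_along_axis / size_split)
    split_outputs = [f"split_out_{i}" for i in range(num_outputs)]
    return split_outputs  # the concat needs this full, fixed list of inputs

print(wire_split_concat(12, 5))  # ['split_out_0', 'split_out_1', 'split_out_2']
# If the axis dimension is unknown ('?') at graph build time, num_outputs cannot be
# computed, so the concat's inputs cannot be enumerated while building the graph.
```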

Contributor

I see, it's related to how the bridge passes the framework's graph to oneDNN Graph. We may need the framework team to help answer. cc @chunyuan-w @sanchitintel @retonym.


Do I understand correctly that this is a case where the graph structure depends on the input shapes? In the current integration, the graph is only built once; I feel like that no longer works in such cases. Let me check with @jgong5 & @sanchitintel on it.


@chunyuan-w, I agree that this case would encounter a runtime error with our multiple compiled partitions per partition approach. I think we can't solve this problem without re-creating the graph.


#### operations used in Frameworks and toolkit

| Tensorflow | Argument Name | Description |


As TF workloads move to the XLA compiler path, and PyTorch moves to TorchDynamo/TorchInductor, framework operations are decomposed into XLA HLO ops or PyTorch PrimTorch ops. These operations usually have simpler semantics. The oneDNN Graph operation design should first consider which basic operations we need to support; those operation sets usually give better insight into which operations are popular (unpopular ones may not be lowered to these simple ops) and what the basic operations look like.

Could you please also include how these two sets of operations support split?

@xiang1guo
Contributor Author
May 10, 2023

Hi @Jianhui-Li, I just checked XLA and PrimTorch; these two operation sets don't directly support Split.

  • TensorFlow's Split will be lowered to the XLA HLO Slice op. I have updated this section with the definition of XLA HLO Slice.
  • For PyTorch/PrimTorch, when Torch IR is lowered to ATen IR/Prim IR, torch.split calls aten.split and no decomposition to PrimTorch ops happens during this stage. Inductor then lowers aten.split to slice in its implementation.
    cc @retonym @chunyuan-w
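For reference, a small PyTorch snippet (illustrative, not from the RFC) showing that an int-size split is equivalent to the slice-based lowering mentioned above:

```python
import torch

x = torch.randn(2, 6)
# torch.split with an int split_size: chunks of size 2 along dim 1.
chunks = torch.split(x, 2, dim=1)
# The slice-based lowering that Inductor effectively performs.
slices = [x[:, i:i + 2] for i in range(0, x.size(1), 2)]
assert all(torch.equal(a, b) for a, b in zip(chunks, slices))
```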
