[RC] Release candidate for version 0.3.1 #442

Merged
merged 17 commits into main on Apr 3, 2024

Conversation

yaoyaoding
Member

No description provided.

vadiklyutiy and others added 17 commits March 5, 2024 11:57
A simple model with one conv2d failed.
- fix signature for conv* ops to correspond to torch.nn.functional
- add missing padding normalization

After that, the model works.
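For reference, a minimal sketch of the `torch.nn.functional.conv2d` signature the conv ops were aligned with (tensor shapes are illustrative):

```python
import torch
import torch.nn.functional as F

x = torch.randn(1, 3, 32, 32)  # NCHW input
w = torch.randn(16, 3, 3, 3)   # (out_channels, in_channels/groups, kH, kW)

# torch.nn.functional.conv2d(input, weight, bias=None, stride=1,
#                            padding=0, dilation=1, groups=1)
y = F.conv2d(x, w, bias=None, stride=1, padding=1, dilation=1, groups=1)
print(y.shape)  # torch.Size([1, 16, 32, 32])
```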
Add .vscode to .gitignore
Previously, if a performance regression run failed due to an exception, the
job that stops the runner VM instances was skipped, leaving the
instances on. This change makes the stop_instances job run even when
previous jobs failed. It is unclear whether always() overrides the
inputs.shutdown_instances flag; if it does, we can move it into the step
scope.
Module wrapper around the groupnorm operator. Supports compiled app
development.
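A sketch of the wrapper pattern, shown with torch for concreteness under the assumption that the Module API here mirrors torch.nn; the hidet-side class itself is not reproduced:

```python
import torch
import torch.nn.functional as F

class GroupNorm(torch.nn.Module):
    """Module wrapper delegating to a functional group-norm operator."""

    def __init__(self, num_groups: int, num_channels: int, eps: float = 1e-5):
        super().__init__()
        self.num_groups = num_groups
        self.eps = eps
        # learnable per-channel affine parameters
        self.weight = torch.nn.Parameter(torch.ones(num_channels))
        self.bias = torch.nn.Parameter(torch.zeros(num_channels))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # delegate to the functional operator
        return F.group_norm(x, self.num_groups, self.weight, self.bias, self.eps)

y = GroupNorm(num_groups=8, num_channels=32)(torch.randn(2, 32, 16, 16))
```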
Adds ResNet model functionality and a model hierarchy for compiled apps.

Some comments in the files are artifacts left in place for the pipeline
interface (part 2 of this PR).

See the huggingface implementation for the original API inspiration.

Resolves #59
- move scripts from `.github/scripts` to `tests/benchmarks`
- move `run_configs.json` (describes which perf tests we run) from the
hidet-ci repo to this repo
- add individual operators' benchmarks via the torch API (not added to the
CI run yet)
- unify scripts to run either hidet or inductor as the backend
Increase batch size for vision benchmarks from 1 to 128 to
 - be closer to a real-life example
 - decrease fluctuation in timing
Add bias to the Conv2d Module.

Defaults to false for backward compatibility; **this is different from the
torch default**.

Towards #57
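For comparison, torch's own default (the hidet-side constructor in the last comment line is illustrative only):

```python
import torch

# torch.nn.Conv2d defaults to bias=True:
conv = torch.nn.Conv2d(in_channels=3, out_channels=16, kernel_size=3)
print(conv.bias is not None)  # True

# the Conv2d Module described above defaults to bias=False, so matching
# torch behaviour requires the explicit flag (spelling is illustrative):
# conv = Conv2d(3, 16, kernel_size=3, bias=True)
```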
Flagging slow tests caused by the huggingface dependency
(2 hrs), to debug in private CI runs.

Resolves #87.
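One common way to flag such tests is a pytest marker; the marker name below is an assumption, not necessarily the one used in this repo:

```python
import pytest

# flag huggingface-heavy tests so a CI run can deselect them
# with `pytest -m "not slow"`:
@pytest.mark.slow
def test_bert_end_to_end():
    ...  # pulls a large huggingface checkpoint
```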
Add some frequently used module components needed for Stable
Diffusion's UNet.

Includes fixes to module attribute access from the LLM branch and
workarounds for torch weight copying.

Towards #57.
The CentML compilation backend I am working on wants to wrap the
CompiledGraph's forward function (the one returned by get_wrapper) in a
torch.fx.GraphModule. This GraphModule would then be pickled and sent
from a server to a client.

However, it isn't possible to pickle the lambda/local function returned
by get_wrapper. Therefore, I am turning get_wrapper into a class,
CompiledForwardFunction, whose forward function behaves like the wrapper
returned by get_wrapper.

Additionally, in order to pickle CompiledForwardFunction, I have defined
pickling and unpickling behaviour for CompiledGraph using __getstate__
and __setstate__ respectively. These just call CompiledGraph's existing
save and load functions.
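A minimal sketch of that pattern, with stand-in save/load helpers (the real CompiledGraph serializers are not reproduced here):

```python
import io
import pickle

class CompiledGraphLike:
    """Stand-in class that delegates pickling to existing save/load."""

    def __init__(self, payload):
        self.payload = payload

    def save(self, f):                      # stand-in for CompiledGraph.save
        f.write(pickle.dumps(self.payload))

    @classmethod
    def load(cls, f):                       # stand-in for CompiledGraph.load
        return cls(pickle.loads(f.read()))

    def __getstate__(self):
        buf = io.BytesIO()
        self.save(buf)                      # reuse the existing serializer
        return buf.getvalue()

    def __setstate__(self, state):
        loaded = type(self).load(io.BytesIO(state))
        self.__dict__.update(loaded.__dict__)

restored = pickle.loads(pickle.dumps(CompiledGraphLike({"w": [1, 2, 3]})))
print(restored.payload)  # {'w': [1, 2, 3]}
```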
(#104)

Add `@cached_property` for constants in IR data types to improve
compilation time.

Measured with
`$ python bench_op.py matmul_f16 --params 1x4096x4096,1x4096x4096
--dtype float16`
with `hidet.option.parallel_tune(max_parallel_jobs=1)`:

**before: 152.5 sec
after: 132.5 sec
improvement: 15%**
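The mechanism, sketched with the standard library (`functools.cached_property` computes once per instance and then serves the stored value; the actual property cached in hidet's IR types is not shown):

```python
from functools import cached_property

class IntType:
    """Sketch: a constant that used to be recomputed on every access."""

    def __init__(self, bits: int):
        self.bits = bits

    @cached_property
    def max_value(self) -> int:
        print("computing once")
        return 2 ** (self.bits - 1) - 1  # stored on the instance after this

i16 = IntType(16)
i16.max_value  # prints "computing once"
i16.max_value  # served from the instance dict, no recompute
```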
Add a graph module for using flash attention and clarify some differences
between flash attention and torch sdpa.

**Attention (pun intended):**

Softmax has a temperature scaling option: it divides the inputs by a
scalar. A good explanation of the numerical effects is
[here](https://medium.com/@harshit158/softmax-temperature-5492e4007f71).

It is used when the softmax inputs QK are too big for float16 (abs value >
65504). This usually means the numbers are so large that dividing by a
small (< 4) scalar has little effect.

Stable diffusion does not use this, as torch sdpa supports float32 (or
somehow avoids NaNs from large values). No visual or significant numeric
differences were noticed in this output layer.

Towards #57.
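A numpy sketch of the scaling described (values and temperature are illustrative):

```python
import numpy as np

def softmax(x: np.ndarray, temperature: float = 1.0) -> np.ndarray:
    x = x / temperature                    # temperature scaling
    x = x - x.max(axis=-1, keepdims=True)  # usual max-subtraction trick
    e = np.exp(x)
    return e / e.sum(axis=-1, keepdims=True)

qk = np.array([[70000.0, 68000.0]], dtype=np.float32)
print(qk.astype(np.float16))          # [[inf inf]] -- exceeds float16 max (~65504)
print((qk / 2.0).astype(np.float16))  # finite after dividing by a small scalar
print(softmax(qk, temperature=2.0))   # ~[[1., 0.]] -- the larger logit dominates
```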
@yaoyaoding yaoyaoding merged commit 33d8bdd into main Apr 3, 2024
2 checks passed