[RC] Release candidate for version 0.3.1 #442

Merged
merged 17 commits into main on Apr 3, 2024

Conversation

yaoyaoding
Member

No description provided.

vadiklyutiy and others added 17 commits March 5, 2024 11:57
A simple model with one conv2d failed.
- fix signature for conv* ops to correspond to torch.nn.functional
- add missing padding normalization

After that, the model works.
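For reference, a minimal sketch of the `torch.nn.functional.conv2d` signature the conv ops were aligned with (tensor shapes are illustrative):

```python
import torch
import torch.nn.functional as F

x = torch.randn(1, 3, 32, 32)  # NCHW input
w = torch.randn(16, 3, 3, 3)   # (out_channels, in_channels/groups, kH, kW)

# torch.nn.functional.conv2d(input, weight, bias=None, stride=1,
#                            padding=0, dilation=1, groups=1)
y = F.conv2d(x, w, bias=None, stride=1, padding=1, dilation=1, groups=1)
print(y.shape)  # torch.Size([1, 16, 32, 32])
```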
Add .vscode to .gitignore
Previously, if a performance regression run failed due to an exception, the
job that stops the runner VM instances was skipped, leaving the
instances on. This change makes the stop_instances job run even when
previous jobs failed. It is unclear whether always() overrides the
inputs.shutdown_instances flag; if it does, we can move it into the step
scope.
Module wrapper around the groupnorm operator. Supports compiled app
development.
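A sketch of the wrapper pattern, shown with torch for concreteness under the assumption that the Module API here mirrors torch.nn; the hidet-side class itself is not reproduced:

```python
import torch
import torch.nn.functional as F

class GroupNorm(torch.nn.Module):
    """Module wrapper delegating to a functional group-norm operator."""

    def __init__(self, num_groups: int, num_channels: int, eps: float = 1e-5):
        super().__init__()
        self.num_groups = num_groups
        self.eps = eps
        # learnable per-channel affine parameters
        self.weight = torch.nn.Parameter(torch.ones(num_channels))
        self.bias = torch.nn.Parameter(torch.zeros(num_channels))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # delegate to the functional operator
        return F.group_norm(x, self.num_groups, self.weight, self.bias, self.eps)

y = GroupNorm(num_groups=8, num_channels=32)(torch.randn(2, 32, 16, 16))
```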
Adds ResNet model functionality and a model hierarchy for compiled apps.

Some comments in the files are artifacts left in place for the pipeline
interface (part 2 of this PR).

See the huggingface implementation for the original API inspiration.

Resolves #59
- move scripts from `.github/scripts` to `tests/benchmarks`
- move `run_configs.json` (describes which perf tests we run) from the
hidet-ci repo to this repo
- add individual operators' benchmarks via the torch API (not added to the
CI run yet)
- unify scripts to run either hidet or inductor as the backend
Increase batch size for vision benchmarks from 1 to 128 to
 - be closer to a real-life example
 - decrease fluctuation in timing
Add bias to the Conv2d Module.

Defaults to false for backward compatibility; **this is different from the
torch default**.

Towards #57
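For comparison, torch's own default (the hidet-side constructor in the last comment line is illustrative only):

```python
import torch

# torch.nn.Conv2d defaults to bias=True:
conv = torch.nn.Conv2d(in_channels=3, out_channels=16, kernel_size=3)
print(conv.bias is not None)  # True

# the Conv2d Module described above defaults to bias=False, so matching
# torch behaviour requires the explicit flag (spelling is illustrative):
# conv = Conv2d(3, 16, kernel_size=3, bias=True)
```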
Flagging slow tests caused by the huggingface dependency
(2 hrs), to debug in private CI runs.

Resolves #87.
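One common way to flag such tests is a pytest marker; the marker name below is an assumption, not necessarily the one used in this repo:

```python
import pytest

# flag huggingface-heavy tests so a CI run can deselect them
# with `pytest -m "not slow"`:
@pytest.mark.slow
def test_bert_end_to_end():
    ...  # pulls a large huggingface checkpoint
```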
Add some frequently used module components needed for Stable
Diffusion's UNet.

Includes fixes to module attribute access from the LLM branch and
workarounds for torch weight copying.

Towards #57.
The CentML compilation backend I am working on wants to wrap the
CompiledGraph's forward function (the one returned by get_wrapper) in a
torch.fx.GraphModule. This GraphModule would then be pickled and sent
from a server to a client.

However, it isn't possible to pickle the lambda/local function returned
by get_wrapper. Therefore, I am turning get_wrapper into a class,
CompiledForwardFunction, whose forward function behaves like the wrapper
returned by get_wrapper.

Additionally, in order to pickle CompiledForwardFunction, I have defined
pickling and unpickling behaviour for CompiledGraph using __getstate__
and __setstate__ respectively. These just call CompiledGraph's existing
save and load functions.
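A minimal sketch of that pattern, with stand-in save/load helpers (the real CompiledGraph serializers are not reproduced here):

```python
import io
import pickle

class CompiledGraphLike:
    """Stand-in class that delegates pickling to existing save/load."""

    def __init__(self, payload):
        self.payload = payload

    def save(self, f):                      # stand-in for CompiledGraph.save
        f.write(pickle.dumps(self.payload))

    @classmethod
    def load(cls, f):                       # stand-in for CompiledGraph.load
        return cls(pickle.loads(f.read()))

    def __getstate__(self):
        buf = io.BytesIO()
        self.save(buf)                      # reuse the existing serializer
        return buf.getvalue()

    def __setstate__(self, state):
        loaded = type(self).load(io.BytesIO(state))
        self.__dict__.update(loaded.__dict__)

restored = pickle.loads(pickle.dumps(CompiledGraphLike({"w": [1, 2, 3]})))
print(restored.payload)  # {'w': [1, 2, 3]}
```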
(#104)

Add `@cached_property` for constants in IR data types to improve
compilation time.

Measured with
`$ python bench_op.py matmul_f16 --params 1x4096x4096,1x4096x4096
--dtype float16`
with `hidet.option.parallel_tune(max_parallel_jobs=1)`:

**before: 152.5 sec
after: 132.5 sec
improvement: 15%**
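The mechanism, sketched with the standard library (`functools.cached_property` computes once per instance and then serves the stored value; the actual property cached in hidet's IR types is not shown):

```python
from functools import cached_property

class IntType:
    """Sketch: a constant that used to be recomputed on every access."""

    def __init__(self, bits: int):
        self.bits = bits

    @cached_property
    def max_value(self) -> int:
        print("computing once")
        return 2 ** (self.bits - 1) - 1  # stored on the instance after this

i16 = IntType(16)
i16.max_value  # prints "computing once"
i16.max_value  # served from the instance dict, no recompute
```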
Add a graph module for using flash attention and clarify some differences
between flash attention and torch sdpa.

**Attention (pun intended):**

Softmax has a temperature scaling option: it divides the inputs by a
scalar. A good explanation of the numerical effects is
[here](https://medium.com/@harshit158/softmax-temperature-5492e4007f71).

It is used when the softmax inputs QK are too big for float16 (abs value >
65504). This usually means the numbers are so large that dividing by a
small (< 4) scalar has little effect.

Stable diffusion does not use this, as torch sdpa supports float32 (or
somehow avoids NaNs from large values). No visual or significant numeric
differences were noticed in this output layer.

Towards #57.
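A numpy sketch of the scaling described (values and temperature are illustrative):

```python
import numpy as np

def softmax(x: np.ndarray, temperature: float = 1.0) -> np.ndarray:
    x = x / temperature                    # temperature scaling
    x = x - x.max(axis=-1, keepdims=True)  # usual max-subtraction trick
    e = np.exp(x)
    return e / e.sum(axis=-1, keepdims=True)

qk = np.array([[70000.0, 68000.0]], dtype=np.float32)
print(qk.astype(np.float16))          # [[inf inf]] -- exceeds float16 max (~65504)
print((qk / 2.0).astype(np.float16))  # finite after dividing by a small scalar
print(softmax(qk, temperature=2.0))   # ~[[1., 0.]] -- the larger logit dominates
```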
@yaoyaoding yaoyaoding merged commit 33d8bdd into main Apr 3, 2024
2 checks passed