Releases: hidet-org/hidet

Hidet v0.3.1

03 Apr 15:21
33d8bdd

What's Changed

  • [Version] Bump version to v0.3.1.dev by @yaoyaoding in #361
  • [Option] Add an option to disable imperative execution by @serach24 in #362
  • [Graph][Benchmark] Update benchmark function by @Aalanli in #363
  • [Compile Server] Update deps for compilation server by @xinli-git in #365
  • [Utils] Changed the multiprocessing context by @destefy in #367
  • [Dynamo] Refactoring code for Hidet remote compilation by @destefy in #369
  • [Graph][Dynamo Backend] Lshift/Rshift/Mod by @Aalanli in #371
  • [Graph][Operator] Fix reduce bug, add uint8x4 by @Aalanli in #372
  • [CompiledGraph] Add option to store the dispatch table by @destefy in #377
  • [Graph][Tensor] remove unnecessary synchronization by @xiaocenxiaocen in #374
  • [Graph][Dynamo Backend] Minor imperative run bug fix by @Aalanli in #383
  • [Graph] Fix CompiledGraph aliasing bug by @Aalanli in #384
  • [Frontend] Add mapping for torch.sqrt by @yaoyaoding in #387
  • [Fix][Graph] Write compiled graph to tempfile first by @destefy in #392
  • [Operators] Improving fp32 matrix multiplication on x86 CPUs by @BolinSNLHM in #378
  • [Fixbug] Fix a bug related to c/c++ integer promotion by @yaoyaoding in #391
  • [Option] Add option to set class Var id attribute to 0 by default by @destefy in #393
  • [CI] Add CI workflow and scripts by @hjjq in #394
  • [CI] Fix deadlock by @hjjq in #395
  • [Operator] Enhancements to Reduce by @hjjq in #366
  • [CI] Launch and stop compile server via workflow by @hjjq in #396
  • [Operator] Support advanced options for pooling operators by @yaoyaoding in #399
  • [Torch] Implements torch_func protocol by @yaoyaoding in #400
  • [Docs] Add more documentation by @yaoyaoding in #401
  • [Fixbug] Fix a performance bug in auto-scheduler by @yaoyaoding in #402
  • [Library] Add cublas library by @yaoyaoding in #404
  • [Operator] Add hidet.ops.matmul_cublas operator by @yaoyaoding in #405
  • [Fusion] Allow shallow fusion of cublas operator by @yaoyaoding in #407
  • [CI] Clear op cache by @hjjq in #406
  • [Runtime] Add a new compiled format CompiledApp by @yaoyaoding in #408
  • CPU AVX implementation for Softmax, Norm by @fishingguy456 in #357
  • [CI] Reduce scope of secrets by @hjjq in #413
  • [Operator] Add an opaque operator base class by @yaoyaoding in #414
  • [IR] Support inplace operators by @yaoyaoding in #416
  • [Graph][Quantization] Multi-stage software pipelining and update parallel k rule by @Aalanli in #364
  • [CI] Trigger workflow by @hjjq in #417
  • [Scheduler] Add the fused task name to auto-scheduled kernels by @yaoyaoding in #418
  • [CI] Use cudagraph for benchmarks by @hjjq in #419
  • [CI] Remove unnecessary synchronization by @hjjq in #420
  • Update Netron viewer link by @KTong821 in #421
  • [Operator] Add cublas to matmul tune space by @hjjq in #422
  • [IR] Support integer subbyte by @xiaocenxiaocen in #403
  • [README] Fix ONNX link by @dbabokin in #425
  • [cuBLAS] Add cublas_gemm_batched and use cublasSetStream to set stream to the current stream in all cublas API calls by @yudi0201 in #423
  • [Fixbug] Fix dynamic memcpy bug by @KTong821 in #427
  • [Compile Server] Fetch repo before checking out by @hjjq in #429
  • [CI] Use slurm for runners by @hjjq in #430
  • [CI] CI migration by @hjjq in #433
  • [Fixbug] Fix graph metadata hash by @KTong821 in #428
  • [CI] Add back tests by @hjjq in #436
  • [Fix] Skip a failed test due to huggingface transformers update by @yaoyaoding in #439
  • [RC] Release candidate for version 0.3.1 by @yaoyaoding in #442

New Contributors

Full Changelog: v0.3.0...v0.3.1

Hidet v0.3.0

28 Sep 15:53
ea32c5c

Notes

In this release, we add more support for large language model inference, distributed inference, and quantization. We also make Hidet Script more stable and add more documentation for it. More operators and models are supported. See below for more details.
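
As a quick, hedged illustration of the PyTorch frontend work listed under "Frontend" below, the sketch compiles a small placeholder model through torch.compile with the hidet backend. The model and shapes are arbitrary placeholders, not code from this release; consult the documentation for the exact options.

    import torch
    import hidet  # importing hidet makes the 'hidet' backend available to torch.compile

    # Placeholder model and input; any traceable nn.Module works the same way.
    model = torch.nn.Sequential(
        torch.nn.Linear(1024, 1024),
        torch.nn.ReLU(),
    ).cuda().eval()
    x = torch.randn(16, 1024, device='cuda')

    # Compile and run with hidet as the torch.compile backend.
    model_opt = torch.compile(model, backend='hidet')
    with torch.no_grad():
        y = model_opt(x)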

Frontend

  • [Frontend] Dynamic shape fx trace by @Aalanli in #294
  • [Torch] Steal Pytorch weights by @hjjq in #310
  • [Dynamo Frontend] Refactor the dynamic shape support by @yaoyaoding in #319
  • [Torch][Graph][Operator] Add and fix various items for torchvision model support by @hjjq in #347
  • [Dynamo] minor enhancements to attention and register a few functions by @xinli-git in #345

Operators and models

  • [Operator] Further performance enhancements for conv2D by @Aalanli in #290
  • [Operator] Refactoring matrix multiplication implementation by @yaoyaoding in #296
  • [Model Support] Add support for wav2vec by @yaoyaoding in #303
  • [Operator] Update attention for dynamic shape by @hjjq in #307
  • [Operator] Resolve Adaptive Pool to reduce by @hjjq in #308
  • [Reduce] optimize and unify reduce operator to a single place by @xinli-git in #311
  • [Operator] optimize normalize op with vectorized load, dynamic shape and more by @xinli-git in #316
  • [Model] Add missing operators for T5 by @yaoyaoding in #322
  • [Fixbug] Reduce should perform syncthread after initializing shared memory to zero by @xinli-git in #325
  • [Models] Llama 2 support by @Aalanli in #324
  • [Models] Llama2 fix by @Aalanli in #333
  • [Operator] Composite Elementwise Operation by @hjjq in #337
  • [Operator] Add clamp/isinf/any/all op, enhance where op by @yaoyaoding in #343
  • [Torch][Operator] More torchvision model support by @hjjq in #348
  • [Operator] Add einsum by @hjjq in #349
  • [Operator][Graph][Regression] CNN optimizations by @hjjq in #356
  • [Graph] Minor bug fixes by @hjjq in #358

Distributed inference

  • [Distributed] all_reduce op and distributed info in graphs by @soodoshll in #284
  • [Distributed] Add more runtime distributed communication functions by @soodoshll in #314
  • [Fixbug] group_start and group_end should be importable without nccl by @soodoshll in #317

Quantization

  • [Operators] preliminary symmetric weight quantization by @Aalanli in #298
  • [Quantization] Quantization API by @Aalanli in #309
  • [Quantization] fix quantization pass bug by @Aalanli in #355

IR and passes

  • [FixBug] Don't instantiate symbol for primitive functions by @hjjq in #291
  • [Fix] NCCL API mismatch and NCCL primitive fix by @soodoshll in #301
  • [Fixbug] Prevent allreduce op from being fused by @soodoshll in #304
  • [Enhancements] add a vcuda device to help mitigate compile-time GPU memory usage by @xinli-git in #302
  • [Task] More descriptive kernel names for nsys/ncu by @Aalanli in #315
  • [Fixbug][Hidet Script] Fix a bug that hidet script does not recognize return type by @yaoyaoding in #329
  • [Hidet script] Add hidet.lang.types submodule by @yaoyaoding in #340
  • [IR][Parser] Hidet IR grammar, parser and ir reconstructor by @Aalanli in #354

Runtime

Backends

  • [Fixbug] Fix the c++ standard to c++11 for both nvcc and gcc compilers by @yaoyaoding in #327
  • [CPU][Scheduler] Use multiple threads for the auto-scheduler by @yaoyaoding in #341

Documentation

Others

  • [Version] Bump version to 0.3.0.dev by @yaoyaoding in #286
  • [Tools] simple benchmarking utility by @Aalanli in #292
  • [Compile Server] Support remote compilation via compilation server by @yaoyaoding in #297
  • [Compile Server] Allow the user to specify the repo and branch/tag to use by @yaoyaoding in #300
  • [Compile Server] Add a new option to specify the cuda arch by @yaoyaoding in #305
  • [Fixbug] Fix a bug in compile server by @yaoyaoding in #306
  • [Graph] Minor graph benchmark fix by @Aalanli in #313
  • [Regression] Local performance regression by @hjjq in #321
  • [Regression] Increase benchmark iters and update perf data by @hjjq in #328
  • [CI] List package versions in ci by @yaoyaoding in #334
  • [Fixbug] Clear the intermediate object files for kernel tuning by @yaoyaoding in #339
  • [Config] Add configuration file by @Aalanli in #359

Full Changelog: v0.2.4...v0.3.0

Hidet v0.2.4

21 Jun 02:00
289377a

What's Changed

  • [Version] Bump version to v0.2.4.dev by @yaoyaoding in #188
  • [Dynamo] module tests + operator support by @AndreSlavescu in #148
  • Refactor compilation workflow to support CPU without CUDA by @LDY1998 in #189
  • [Stack] Allow the ulimit stack size to be less than expected by @yaoyaoding in #195
  • [Readme] Add platform requirements by @yaoyaoding in #196
  • [DataType] Add complex64 and complex128 data type by @yaoyaoding in #200
  • [Example] Add an example of running GPT-2 model by @yaoyaoding in #203
  • [Fusion] Use inline pass in fusion to allow template call functions with kernel params by @yaoyaoding in #197
  • [Frontend][Operator] Add missing operators for dinov2 by @yaoyaoding in #206
  • [Backend] Add openmp support by @yaoyaoding in #208
  • [Operator] Update batch_matmul to use Hidet Script by @hjjq in #207
  • [Cache] Add cache management command line interface by @yaoyaoding in #212
  • [IR] Creation-time constant fold for constant expressions by @yaoyaoding in #209
  • [Torch][Operator] Allow change torch tensor device when possible by @yaoyaoding in #214
  • [Torch][Operator] Add op mapping for torch.min/max/minimum/maximum by @yaoyaoding in #216
  • [Typo] Fix a typo in resnext.py by @eltociear in #210
  • [Operator] Adding missing operators for llama by @yaoyaoding in #219
  • [IR] Adding more support for dynamic shape on Task and FlowGraph level by @yaoyaoding in #220
  • [Torch] Add mapping for torch.ops.aten.add and torch.ops.aten.cos by @yaoyaoding in #223
  • [Operator][Backend] Add nvcc flags for faster math and update Attention schedule by @hjjq in #221
  • [CI] Always clear the cache before tests by @yaoyaoding in #224
  • fix batch_matmul for invalid mma config for sm < 80 by @xinli-git in #227
  • [Dynamic Shape] Adding more dynamic shape support by @yaoyaoding in #228
  • [CI] Add importlib_metadata to requirements-dev.txt by @yaoyaoding in #233
  • [Script] Add list comprehension support in hidet script by @yaoyaoding in #235
  • [Refactor][Dynamic Shape] Introduce SymbolVar to implement dynamic shape by @yaoyaoding in #236
  • [Script] Add pointer arithmetic by @yaoyaoding in #237
  • [Operator][Torch] Add causal fmha and torch sdpa mapping by @hjjq in #238
  • [Fixbug][Pass] Fix a bug in the inline_let_stmt pass by @yaoyaoding in #240
  • [Options] Add option for controlling parallel build with number of jobs or memory reserved for each job by @xinli-git in #230
  • [Typo] Fix a typo by @BolinSNLHM in #245
  • [Typo] Fix minor spelling mistake by @Aalanli in #246
  • [Fixbug] Fix a bug in StmtRewriter which discard declare scope information by @yaoyaoding in #248
  • [Refactor] Adding support for compiled model by @yaoyaoding in #247
  • [Operator] batch_matmul: Remove duplicate smem declaration by @hjjq in #249
  • [Operator] Adding CPU support for matrix multiplication by @BolinSNLHM in #251
  • [Hidet Script] Allow bind_tuple argument in mapping.on(...) and grid(...) by @yaoyaoding in #254
  • [Hidet Script] Add in and not in expression in hidet script by @yaoyaoding in #255
  • [Codegen] Include header files as needed by @yaoyaoding in #256
  • [Operator] Add new operator "normalize" that makes a group of layers (layer norm, group norm and instance norm) faster using hidet script by @xinli-git in #257
  • [Testing][Models] Add gpt2 module in testing models by @yaoyaoding in #252
  • [Fixbug] Fix test warnings and the incompatibility of two recent PRs by @yaoyaoding in #258
  • [Operator] Add sm75 support for attention by @hjjq in #259
  • [Operator] batch_matmul: Remove unroll and reduce tuning space by @hjjq in #260
  • [Fixbug] Fix a bug when fused operator has no input by @yaoyaoding in #263
  • [Graph] Translate softmax and reduce to hidet script by @Aalanli in #242
  • [Fixbug] batch_matmul: move cc checking inside schedule by @hjjq in #264
  • [Refactor] Refactor building system and adding compiled products by @yaoyaoding in #261
  • [Fixbug] Reduce the default unroll factor to 4 by @yaoyaoding in #266
  • [Torch] Add some torch frontend mappings for roberta-base by @hjjq in #267
  • [Refactor] Remove schedules submodule under hidet.graph.ops by @yaoyaoding in #269
  • [Device] Add support for mixed cpu and cuda kernels in the same flow graph by @yaoyaoding in #270
  • [Dynamic Shape] Adding dynamic shape support for reduce by @Aalanli in #268
  • [Complex Type] Add more support for complex data type by @yaoyaoding in #271
  • [Tools] Model translator by @Aalanli in #273
  • [Model] Llama model implementation in hidet by @Aalanli in #243
  • [Operator] Add support for cross attention by @hjjq in #275
  • [Operator] Add dynamic shape support and tests for Operators. by @Aalanli in #274
  • [Fusion] Enhance the prologue epilogue fusion by @yaoyaoding in #277
  • [Drivers] Suppress OSError by @hjjq in #278
  • [Dynamic Shape] More correctness guards by @Aalanli in #276
  • [Operator] Make Convolution gemms fusible by resolving to batch_matmul by @hjjq in #279
  • [External Tasks] Move task build into method call for external kernel support by @xinli-git in #282
  • [Distributed] add nccl primitives by @soodoshll in #280
  • [Operators] Conv2d fp16 implicit gemm kernel by @Aalanli in #283

New Contributors

Full Changelog: v0.2.3...v0.2.4

Hidet v0.2.3

24 Apr 20:31
9a65fa2

What's Changed

New Contributors

Full Changelog: v0.2.2...v0.2.3

Hidet v0.2.2

24 Mar 00:51
3f15236

What's Changed

New Contributors

Full Changelog: v0.2.1...v0.2.2

Hidet v0.2.1

18 Feb 06:26
0617089

What's Changed

  • [Version] Bump version to 0.2.1.dev by @yaoyaoding in #73
  • [CI] Prevent fork repos from running workflow by @yaoyaoding in #74
  • [Fixbug] Fix a bug in trace_from when the inputs are directly used as outputs by @yaoyaoding in #76
  • [Operator] Add reduce_f16 and squeeze as Reduce's resolve variants by @hjjq in #75
  • [IR] Input specification assertion message for valid IR check by @AndreSlavescu in #78
  • [Operator] Add conv3d, max_pool3d, avg_pool3d by @hjjq in #79
  • [Dynamo] Add the entry point registration for dynamo by @yaoyaoding in #80
  • [Fix] Update shape utility functions to expect Sequence instead of List by @yaoyaoding in #86
  • [Bugfix] 'double'->'float64' in onnx dtype conversion by @soodoshll in #88
  • [Fix] Mark the reduce fp16 operator not fusible by @yaoyaoding in #100
  • [Fixbug] Use uint64_t instead of unsigned long long for literals by @yaoyaoding in #101
  • [Fixbug] Fix a bug in the minimum and maximum operator by @yaoyaoding in #102
  • [Dynamo] Update dynamo registration after pytorch refactored that part by @yaoyaoding in #84
  • [Fixbug] Fix bugs in binary_arithmetic op and swizzle layout by @hjjq in #104
  • [Fixbug] Call fuse in reduce_fp16 operator by @yaoyaoding in #105
  • [ONNX] Fix the out of bound error in onnx slice function during importing by @yaoyaoding in #106
  • [Fixbug] Reverse map of binary operator by @yaoyaoding in #107
  • [Fixbug] Add attributes to Clip operator by @yaoyaoding in #108
  • [Fixbug] Binary arithmetic ops raise an error when one operand is a scalar on GPU by @yaoyaoding in #109
  • [Graph] Refactor forward function of FlowGraph by @yaoyaoding in #110
  • [Fixbug] Use int64 as the output of arg-reduce by @yaoyaoding in #111
  • [README] Update readme by @yaoyaoding in #114
  • [Fixbug] Fix a bug when a graph output is constant by @yaoyaoding in #113
  • [Community] Create CODE_OF_CONDUCT.md by @yaoyaoding in #115
  • [Community] Update issue templates by @yaoyaoding in #116
  • [Fixbug] Resolve the min/max function according to compute capability by @yaoyaoding in #112
  • [Workflow] Update workflow by @yaoyaoding in #117
  • [Workflow] Update publish workflow by @yaoyaoding in #119

New Contributors

Full Changelog: v0.2.0...v0.2.1

Hidet v0.2.0

13 Jan 23:59

What's Changed

New Contributors

Full Changelog: v0.1...v0.2.0

Hidet v0.1

06 Jan 02:57
001c438

This is the first release of hidet.

For the usage of hidet, please visit: https://docs.hidet.org
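
For orientation, here is a minimal sketch of typical hidet usage following the documented flow-graph API; tensor shapes are arbitrary, and exact names may have changed slightly since this release.

    import hidet

    # Eager execution: create tensors on the GPU and call an operator directly.
    a = hidet.randn([1024, 1024], device='cuda')
    b = hidet.randn([1024, 1024], device='cuda')
    c = hidet.ops.matmul(a, b)

    # Lazy execution: trace a flow graph from a symbolic input and reuse it.
    x = hidet.symbol([1024, 1024], device='cuda')
    y = hidet.ops.relu(hidet.ops.matmul(x, b))
    graph = hidet.trace_from(y, inputs=[x])
    out = graph(a)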

What's Changed

  • [Docs] Update documentation by @yaoyaoding in #2
  • [Operator] Add leaky_relu and conv2d_transpose operator by @yaoyaoding in #3
  • [Doc] Add doc on how to define operator computation by @yaoyaoding in #4
  • [Bug] fix bugs in reshape and conv2d_transpose by @yaoyaoding in #5
  • [Option] Add option module by @yaoyaoding in #6
  • [Docs] Add documentation on how to add new operators by @yaoyaoding in #7
  • [Operator] Add PRelu op by @hjjq in #8
  • [Docs] Add documentation for operator cache & fix a typo by @yaoyaoding in #9
  • [Operator] Add Abs and And operator by @hjjq in #10
  • [CI] Update github workflow by @yaoyaoding in #11
  • [CI] Update docs workflow to not delete the remote dest dir by @yaoyaoding in #12
  • [Operator] Add conv2d_transpose_gemm operator & fix a bug by @yaoyaoding in #13
  • [Runtime] force to use gpu tensor buffer in cuda graph by @yaoyaoding in #14
  • [Functor] Fix a bug in IR functor by @yaoyaoding in #15
  • [Graph] Force users to give an input order when multiple symbolic inputs are found in traced graph by @yaoyaoding in #17
  • [Operator] Add BitShift, Bitwise*, Ceil Operators by @hjjq in #19
  • [IR] Refactor scalar type system by @yaoyaoding in #18
  • [IR] Refactoring math functions by @yaoyaoding in #20
  • [Operator] Fix a bug when resolving matmul to batch_matmul by @yaoyaoding in #21
  • [Operator] Add cubic interpolation to Resize Operator by @hjjq in #22
  • [Packfunc] Refactor packed func & add vector type by @yaoyaoding in #23
  • [Pass] Add lower_special_cast pass and refactor resolve rule registration by @yaoyaoding in #24
  • [Docs] Change github repo url by @yaoyaoding in #25
  • [Operator] Add float16 precision matrix multiplication by @yaoyaoding in #26
  • [Docs] Add a guide on operator resolving by @yaoyaoding in #27
  • [CI] Avoid interactive query in apt installation of tzdata by @yaoyaoding in #28
  • [Docs] Add sub-graph rewrite tutorial by @yaoyaoding in #29
  • [Tensor] Implement dlpack tensor exchange protocol by @yaoyaoding in #30
  • [Frontend] Add a torch dynamo backend based on hidet "onnx2hidet" by @yaoyaoding in #31
  • [Frontend] Add hidet dynamo backend based on torch.fx by @yaoyaoding in #32
  • [Frontend] Make onnx dependency optional by @yaoyaoding in #33
  • [Frontend] Add more operator mappings for pytorch frontend by @yaoyaoding in #34
  • [Operator] Fix a bug in take (index can be in [-r, r-1]) by @yaoyaoding in #35
  • [Frontend] Add an option to print correctness report in hidet backend of torch dynamo by @yaoyaoding in #36
  • [IR] Refactor the attribute 'dtype' of hidet.Tensor from 'str' to 'DataType' by @yaoyaoding in #37
  • [Operator] Add a constant operator and deprecate the manually implemented fill cuda kernel by @yaoyaoding in #38
  • [ONNX] Add reduce l2 onnx operator by @yaoyaoding in #40
  • [CLI] Add the 'hidet' command line interface by @yaoyaoding in #39
  • [Codegen] Add explicit conversion type for float16 by @yaoyaoding in #41
  • [Docs] Add the documentation for 'hidet' backend of PyTorch dynamo by @yaoyaoding in #42
  • [Runtime] Refactor the cuda runtime api used in hidet by @yaoyaoding in #43
  • [Testing] Remove redundant models in hidet.testing by @yaoyaoding in #44
  • [Runtime][IR] Refactor the device attribute of Tensor object by @yaoyaoding in #45
  • [Array-API][Phase 0] Adding the declarations of missing operators in Array API by @yaoyaoding in #46
  • [Operator] Add arange and linspace operator by @yaoyaoding in #47
  • [Bug] Fix a bug related to memset by @yaoyaoding in #49
  • [Docs] Add and update documentation by @yaoyaoding in #48
  • [Docs][Operator] Add more pytorch operator bindings and docs by @yaoyaoding in #50
  • [License][Docs] Add license header and update README.md by @yaoyaoding in #51
  • [Docs] Update docs by @yaoyaoding in #52
  • [IR] Add LaunchKernelStmt by @yaoyaoding in #53
  • [Operator] Add some torch operator mapping by @yaoyaoding in #54
  • [Bug] Fix a bug in hidet dynamo backend when cuda graph is not used by @yaoyaoding in #55
  • [Dynamo] Allow torch dynamo backend to accept non-contiguous input by @yaoyaoding in #56
  • [Graph] Add to_cuda() for Module class by @hjjq in #57
  • [Bug] Fix a bug where the shared memory becomes zero in LaunchKernelStmt by @yaoyaoding in #58
  • [Release] Prepare to release the first version of hidet to public by @yaoyaoding in #59

New Contributors

  • @hjjq made their first contribution in #8

Full Changelog: https://github.com/hidet-org/hidet/commits/v0.1