Hidet v0.3.0

@yaoyaoding yaoyaoding released this 28 Sep 15:53
· 71 commits to main since this release
ea32c5c

Notes

In this release, we add more support for large language model inference, distributed inference, and quantization. We also make Hidet Script more stable and add more documentation for it. More operators and models are supported. See below for details.

Frontend

  • [Frontend] Dynamic shape fx trace by @Aalanli in #294
  • [Torch] Steal PyTorch weights by @hjjq in #310
  • [Dynamo Frontend] Refactor the dynamic shape support by @yaoyaoding in #319
  • [Torch][Graph][Operator] Add and fix various items for torchvision model support by @hjjq in #347
  • [Dynamo] minor enhancements to attention and register a few functions by @xinli-git in #345

Operators and models

  • [Operator] Further performance enhancements for conv2D by @Aalanli in #290
  • [Operator] Refactoring matrix multiplication implementation by @yaoyaoding in #296
  • [Model Support] Add support for wav2vec by @yaoyaoding in #303
  • [Operator] Update attention for dynamic shape by @hjjq in #307
  • [Operator] Resolve Adaptive Pool to reduce by @hjjq in #308
  • [Reduce] optimize and unify reduce operator to a single place by @xinli-git in #311
  • [Operator] optimize normalize op with vectorized load, dynamic shape and more by @xinli-git in #316
  • [Model] Add missing operators for T5 by @yaoyaoding in #322
  • [Fixbug] Reduce should perform syncthread after initializing shared memory to zero by @xinli-git in #325
  • [Models] Llama 2 support by @Aalanli in #324
  • [Models] Llama2 fix by @Aalanli in #333
  • [Operator] Composite Elementwise Operation by @hjjq in #337
  • [Operator] Add clamp/isinf/any/all op, enhance where op by @yaoyaoding in #343
  • [Torch][Operator] More torchvision model support by @hjjq in #348
  • [Operator] Add einsum by @hjjq in #349
  • [Operator][Graph][Regression] CNN optimizations by @hjjq in #356
  • [Graph] Minor bug fixes by @hjjq in #358

Distributed inference

  • [Distributed] all_reduce op and distributed info in graphs by @soodoshll in #284
  • [Distributed] Add more runtime distributed communication functions by @soodoshll in #314
  • [Fixbug] group_start and group_end should be importable without nccl by @soodoshll in #317

Quantization

  • [Operators] preliminary symmetric weight quantization by @Aalanli in #298
  • [Quantization] Quantization API by @Aalanli in #309
  • [Quantization] fix quantization pass bug by @Aalanli in #355

IR and passes

  • [FixBug] Don't instantiate symbol for primitive functions by @hjjq in #291
  • [Fix] NCCL API mismatch and NCCL primitive fix by @soodoshll in #301
  • [Fixbug] Prevent allreduce op from being fused by @soodoshll in #304
  • [Enhancements] add a vcuda device to help mitigate compile time GPU memory usage by @xinli-git in #302
  • [Task] More descriptive kernel names for nsys/ncu by @Aalanli in #315
  • [Fixbug][Hidet Script] Fix a bug that hidet script does not recognize return type by @yaoyaoding in #329
  • [Hidet script] Add hidet.lang.types submodule by @yaoyaoding in #340
  • [IR][Parser] Hidet IR grammar, parser and ir reconstructor by @Aalanli in #354

Runtime

Backends

  • [Fixbug] Fix the c++ standard to c++11 for both nvcc and gcc compilers by @yaoyaoding in #327
  • [CPU][Scheduler] Use multi-threads for auto-scheduler by @yaoyaoding in #341

Documentation

Others

  • [Version] Bump version to 0.3.0.dev by @yaoyaoding in #286
  • [Tools] simple benchmarking utility by @Aalanli in #292
  • [Compile Server] Support remote compilation via compilation server by @yaoyaoding in #297
  • [Compile Server] Allow the user to specify the repo and branch/tag to use by @yaoyaoding in #300
  • [Compile Server] Add a new option to specify the cuda arch by @yaoyaoding in #305
  • [Fixbug] Fix a bug in compile server by @yaoyaoding in #306
  • [Graph] Minor graph benchmark fix by @Aalanli in #313
  • [Regression] Local performance regression by @hjjq in #321
  • [Regression] Increase benchmark iters and update perf data by @hjjq in #328
  • [CI] List package versions in ci by @yaoyaoding in #334
  • [Fixbug] Clear the intermediate object files for kernel tuning by @yaoyaoding in #339
  • [Config] Add configuration file by @Aalanli in #359

Full Changelog: v0.2.4...v0.3.0