What's Changed
- [Version] Bump version to v0.3.1.dev by @yaoyaoding in #361
- [Option] Add an option to disable imperative execution by @serach24 in #362
- [Graph][Benchmark] Update benchmark function by @Aalanli in #363
- [Compile Server] Update deps for compilation server by @xinli-git in #365
- [Utils] Changed the multiprocessing context by @destefy in #367
- [Dynamo] Refactoring code for Hidet remote compilation by @destefy in #369
- [Graph][Dynamo Backend] Lshift/Rshift/Mod by @Aalanli in #371
- [Graph][Operator] Fix reduce bug, add uint8x4 by @Aalanli in #372
- [CompiledGraph] Add option to store dispatch table option by @destefy in #377
- [Graph][Tensor] remove unnecessary synchronization by @xiaocenxiaocen in #374
- [Graph][Dynamo Backend] Minor imperative run bug fix by @Aalanli in #383
- [Graph] Fix CompiledGraph aliasing bug by @Aalanli in #384
- [Frontend] Add mapping for torch.sqrt by @yaoyaoding in #387
- [Fix][Graph] Write compiled graph to tempfile first by @destefy in #392
- [Operators] Improving fp32 matrix multiplication on x86 CPUs by @BolinSNLHM in #378
- [Fixbug] Fix a bug related to c/c++ integer promotion by @yaoyaoding in #391
- [Option] Add option to set class Var id attribute to 0 by default by @destefy in #393
- [CI] Add CI workflow and scripts by @hjjq in #394
- [CI] Fix deadlock by @hjjq in #395
- [Operator] Enhancements to Reduce by @hjjq in #366
- [CI] Launch and stop compile server via workflow by @hjjq in #396
- [Operator] Support advanced options for pooling operators by @yaoyaoding in #399
- [Torch] Implements torch_func protocol by @yaoyaoding in #400
- [Docs] Add more documentation by @yaoyaoding in #401
- [Fixbug] Fix a performance bug in auto-scheduler by @yaoyaoding in #402
- [Library] Add cublas library by @yaoyaoding in #404
- [Operator] Add hidet.ops.matmul_cublas operator by @yaoyaoding in #405
- [Fusion] Allow shallow fusion of cublas operator by @yaoyaoding in #407
- [CI] Clear op cache by @hjjq in #406
- [Runtime] Add a new compiled format CompiledApp by @yaoyaoding in #408
- CPU AVX implementation for Softmax, Norm by @fishingguy456 in #357
- [CI] Reduce scope of secrets by @hjjq in #413
- [Operator] Add an opaque operator base class by @yaoyaoding in #414
- [IR] Support inplace operators by @yaoyaoding in #416
- [Graph][Quantization] Multi-stage software pipelining and update parallel k rule by @Aalanli in #364
- [CI] Trigger workflow by @hjjq in #417
- [Scheduler] Add the fused task name to auto-scheduled kernels by @yaoyaoding in #418
- [CI] Use cudagraph for benchmarks by @hjjq in #419
- [CI] Remove unnecessary synchronization by @hjjq in #420
- Update Netron viewer link by @KTong821 in #421
- [Operator] Add cublas to matmul tune space by @hjjq in #422
- [IR] Support integer subbyte by @xiaocenxiaocen in #403
- [README] Fix ONNX link by @dbabokin in #425
- [cuBLAS] Add cublas_gemm_batched and use cublasSetStream to set stream to the current stream in all cublas API calls by @yudi0201 in #423
- [Fixbug] Fix dynamic memcpy bug by @KTong821 in #427
- [Compile Server] Fetch repo before checking out by @hjjq in #429
- [CI] Use slurm for runners by @hjjq in #430
- [CI] CI migration by @hjjq in #433
- [Fixbug] Fix graph metadata hash by @KTong821 in #428
- [CI] Add back tests by @hjjq in #436
- [Fix] Skip a failed test due to huggingface transformers update by @yaoyaoding in #439
- [RC] Release candidate for version 0.3.1 by @yaoyaoding in #442
New Contributors
- @destefy made their first contribution in #367
- @xiaocenxiaocen made their first contribution in #374
- @fishingguy456 made their first contribution in #357
- @KTong821 made their first contribution in #421
- @dbabokin made their first contribution in #425
- @yudi0201 made their first contribution in #423
Full Changelog: v0.3.0...v0.3.1