Increase the inference speed of the model
Updated Jun 7, 2022 - Python
Trying to write a mini Triton backend in Rust
Deploy KoGPT with Triton Inference Server
A library for interfacing with Triton.
WPF application for editing XML based configuration files
Run CI jobs in Manta when triggered by Pull Requests
Package for running NVIDIA Triton within Python tests, with features like a Dockerfile DSL and building images on the fly.
This bootcamp is designed to give NLP researchers an end-to-end overview of the fundamentals of the NVIDIA NeMo framework, a complete solution for building large language models. It also includes hands-on exercises complemented by tutorials, code snippets, and presentations to help researchers get started with the NeMo LLM Service and Guardrails.
Learnings and experimentation with GPU programming
This repository contains everything regarding the bachelor thesis: NLPiP (NLP in Production).
Manta adapter for Spine models running in NodeJS
Add some extra features to transformers
The benchmark for OpenAI Triton.
Triton reimplementation of the Smooth ReLU activation function proposed in the paper "Real World Large Scale Recommendation Systems Reproducibility and Smooth Activations" [arXiv 2022].
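The repository above reimplements Smooth ReLU (SmeLU) as a Triton kernel; for reference, a minimal CPU/NumPy sketch of the same piecewise activation is below. The piecewise definition follows the cited paper; the function and parameter names (`smelu`, `beta`) are assumptions for illustration, not the repository's API.

```python
import numpy as np

def smelu(x: np.ndarray, beta: float = 1.0) -> np.ndarray:
    """Smooth ReLU (SmeLU), a sketch of the activation from the paper:
    0 for x <= -beta, (x + beta)^2 / (4 * beta) for |x| < beta,
    and x for x >= beta. Continuous and differentiable everywhere.
    The name `beta` for the half-width parameter is an assumption."""
    quad = (x + beta) ** 2 / (4.0 * beta)  # quadratic blend near zero
    return np.where(x <= -beta, 0.0, np.where(x >= beta, x, quad))
```

At the knot points the pieces agree: at x = beta the quadratic gives (2*beta)^2 / (4*beta) = beta, matching the identity branch, and at x = -beta it gives 0, matching the zero branch, which is what makes the activation smooth.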