Indic-LLM is a framework that provides the foundation for adapting large language models (LLMs) to Indic languages. It supports models such as Llama 2, Mistral, and Gemma.
git clone https://github.com/adithya-s-k/Indic-llm.git
cd Indic-llm
conda create -n indic-venv python=3.10
conda activate indic-venv
pip3 install -r requirements.txt
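After installation, it can be useful to confirm that the environment matches what the steps above set up. The following is a minimal sketch, not part of Indic-LLM itself: the helper names `requirement_names`, `missing_packages`, and `main` are hypothetical, and the script assumes it is run from the repository root where `requirements.txt` lives. It uses only the Python standard library and handles simple requirement lines, not every form pip accepts.

```python
import re
import sys
from importlib import metadata
from pathlib import Path


def requirement_names(lines):
    """Extract bare package names from requirements-style lines."""
    names = []
    for raw in lines:
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        if not line or line.startswith("-"):
            continue  # skip blank lines and pip options such as -r or -e
        if "://" in line and "@" not in line.split("://", 1)[0]:
            continue  # bare URL requirement with no package name to check
        # The name is whatever precedes a version specifier, extras, or marker.
        match = re.match(r"[A-Za-z0-9][A-Za-z0-9._-]*", line)
        if match:
            names.append(match.group(0))
    return names


def missing_packages(names):
    """Return the names that importlib.metadata cannot find in this environment."""
    missing = []
    for name in names:
        try:
            metadata.version(name)
        except metadata.PackageNotFoundError:
            missing.append(name)
    return missing


def main(path="requirements.txt"):
    # The conda environment above is created with python=3.10.
    if sys.version_info[:2] != (3, 10):
        print(f"Warning: expected Python 3.10, found {sys.version.split()[0]}")
    req_file = Path(path)
    if not req_file.exists():
        print(f"{path} not found; run this from the repository root")
        return
    names = requirement_names(req_file.read_text().splitlines())
    missing = missing_packages(names)
    if missing:
        print(f"Missing: {', '.join(missing)}")
    else:
        print("All requirements installed")


if __name__ == "__main__":
    main()
```

Save the script under any name in the repository root and run it inside the activated `indic-venv` environment. It only checks that each package is installed, not that the installed version satisfies the pinned specifier.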
Model | Tokeniser | Pretraining (LoRA) | SFT | DPO | Evaluation |
---|---|---|---|---|---|
Llama 2 | ✅ | ✅ | ✅ | ✅ | ✅ |
Mistral | ✅ | ✅ | ✅ | ✅ | ✅ |
Gemma | - | ✅ | ✅ | ✅ | ✅ |
Qwen | - | - | - | - | - |
Please refer to the docs.
The models provided in this framework have not undergone detoxification. While they showcase impressive linguistic capabilities, they may generate content that is harmful or offensive. Users are strongly advised to exercise discretion and closely monitor the models' outputs, especially in public or sensitive applications.
We welcome contributions to enhance and expand this project. If you have suggestions or improvements, please open an issue or submit a pull request.
This project is licensed under the GNU GPL v3.0 license. For details, refer to the LICENSE.md file.
IMPORTANT: The GPL 3.0 license applies solely to the source code and datasets provided in this repository. As Indic-LLM is a derivative of Meta's Llama 2 model, it is subject to the original Llama 2 license, which cannot be altered. For full details on the licensing of the model, please consult the LLAMA2-LICENSE file.
This repository draws inspiration from the following repositories: