Integrate IndicTrans2 models and tokenizer into HF Transformers #30818

Open
2 tasks done
VarunGumma opened this issue May 15, 2024 · 4 comments


VarunGumma commented May 15, 2024

Model description

IndicTrans2 is a multilingual transformer model developed by AI4Bharat, and is available in 3 flavors: indic-en, en-indic and indic-indic. Each flavor has 2 versions, a large 1B model and a distilled 200M model. The architecture is a standard transformer, very similar to the NLLB and M2M models. However, the major difference is that the vocabularies of the encoder and decoder are not shared, as they cover different sets of languages.

Unlike the NLLB and M2M models, IndicTrans2 requires specific preprocessing of the inputs. Hence, a custom processor class has been developed and is required for training/inference. More examples can be found in the aforementioned repository.
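
For reference, a minimal usage sketch under some assumptions: the hub checkpoint id (`ai4bharat/indictrans2-en-indic-1B`) and the `IndicProcessor` helper with its `preprocess_batch`/`postprocess_batch` methods are taken from the authors' toolkit as I understand it, so the exact names and import paths may differ.

```python
import torch
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer
# IndicProcessor is the custom pre/postprocessing helper mentioned above;
# the package and import path below are an assumption based on the authors' toolkit.
from IndicTransToolkit import IndicProcessor

ckpt = "ai4bharat/indictrans2-en-indic-1B"  # assumed hub checkpoint id
tokenizer = AutoTokenizer.from_pretrained(ckpt, trust_remote_code=True)
model = AutoModelForSeq2SeqLM.from_pretrained(ckpt, trust_remote_code=True)
ip = IndicProcessor(inference=True)

sentences = ["This is a test sentence."]
# Preprocess: add language tags, normalize/transliterate scripts, etc.
batch = ip.preprocess_batch(sentences, src_lang="eng_Latn", tgt_lang="hin_Deva")
inputs = tokenizer(batch, padding="longest", truncation=True, return_tensors="pt")

with torch.inference_mode():
    generated = model.generate(**inputs, num_beams=5, max_length=256)

decoded = tokenizer.batch_decode(generated, skip_special_tokens=True)
# Postprocess: strip tags and restore the target-language text.
translations = ip.postprocess_batch(decoded, lang="hin_Deva")
print(translations)
```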

Open source status

  • The model implementation is available
  • The model weights are available

Provide useful links for the implementation

Authors: @AI4Bharat @jaygala24 @PranjalChitale @oneraghavan @VarunGumma @sumanthd17 @prajdabre @anoopkunchukuttan

Official GitHub Repository: AI4Bharat/IndicTrans2

The HF-compatible models and tokenizer are available here as of now:

@amyeroberts
Collaborator

Hi @VarunGumma, thanks for opening this model request!

This looks like a great candidate for adding the model on the hub. This is the easiest and recommended way to make a model available in transformers and means, once working, the model can be found and used immediately without having to go through the PR process. We find this is a lot quicker as the bar for adding code into the library is high due to the maintenance cost of every new model, and so reviews take quite a while.

We'll provide as much support as we can for this - let us know if there are any issues in the implementation. Here is a tutorial if that sounds good to you!
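
For readers following this route, here is a rough sketch of what the "code on the hub" path looks like, based on the custom-model tutorial; the module/class names for IndicTrans2 and the repo id below are hypothetical placeholders, not the authors' actual layout.

```python
from transformers import AutoModelForSeq2SeqLM

# Custom config/model classes live in the repo alongside the weights; the module
# and class names here are hypothetical placeholders for the IndicTrans2 code.
from configuration_indictrans import IndicTransConfig
from modeling_indictrans import IndicTransForConditionalGeneration

# Register the custom classes so Auto* loading works from the hub repo.
IndicTransConfig.register_for_auto_class()
IndicTransForConditionalGeneration.register_for_auto_class("AutoModelForSeq2SeqLM")

config = IndicTransConfig()
model = IndicTransForConditionalGeneration(config)  # or load converted weights here
model.push_to_hub("my-org/indictrans2-en-indic-1B")  # hypothetical repo id

# Downstream users can then load it directly, with no transformers PR needed:
# AutoModelForSeq2SeqLM.from_pretrained("my-org/indictrans2-en-indic-1B", trust_remote_code=True)
```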

@VarunGumma
Author

Hi @amyeroberts,

Thank you for your reply. We also need some help adding flash_attention_2 to our model. We were able to modify the modeling script for it, but it throws an error that our model class IndicTransForConditionalGeneration itself is not supported. How can we proceed in this case?

@amyeroberts
Collaborator

@VarunGumma Could you share the error message and full traceback?

@VarunGumma
Author

@amyeroberts , thank you. We were able to resolve it on our end.
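
For anyone hitting the same error: the "model does not support Flash Attention 2.0" check in transformers is gated on a class attribute of the custom PreTrainedModel subclass. A minimal sketch of the kind of change involved, assuming the custom modeling file follows the standard transformers layout (this is not necessarily the exact fix applied here):

```python
from transformers import PretrainedConfig, PreTrainedModel

class IndicTransConfig(PretrainedConfig):  # stand-in for the custom config class
    model_type = "IndicTrans"

class IndicTransPreTrainedModel(PreTrainedModel):
    config_class = IndicTransConfig
    # Declares FlashAttention-2 support; without this flag, loading with
    # attn_implementation="flash_attention_2" raises the "not supported" error.
    _supports_flash_attn_2 = True

# With the flag set (and a flash-attention attention class wired into the model),
# FA2 can then be requested at load time, e.g.:
# model = AutoModelForSeq2SeqLM.from_pretrained(
#     ckpt, trust_remote_code=True,
#     attn_implementation="flash_attention_2", torch_dtype=torch.float16,
# )
```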
