Hi @VarunGumma, thanks for opening this model request!
This looks like a great candidate for adding the model directly on the Hub. This is the easiest and recommended way to make a model available in transformers: once working, the model can be found and used immediately without having to go through the PR process. We find this is a lot quicker, as the bar for adding code into the library is high due to the maintenance cost of every new model, so reviews take quite a while.
We'll give as much support as we can for this - let us know if there are any issues in the implementation. Here is a tutorial if that sounds good to you!
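Roughly, the Hub route from that tutorial boils down to registering the custom classes with the Auto API and pushing the code together with the weights. The sketch below is only illustrative; the modeling_indictrans module, the IndicTransConfig class and the repo id are placeholders, not the exact names used by IndicTrans2.

```python
from transformers import AutoModelForSeq2SeqLM

# Placeholder: the custom modeling/configuration code that lives next to the weights.
from modeling_indictrans import IndicTransConfig, IndicTransForConditionalGeneration

# Register the custom classes so the Auto* API can resolve them when users
# pass trust_remote_code=True.
IndicTransConfig.register_for_auto_class()
IndicTransForConditionalGeneration.register_for_auto_class("AutoModelForSeq2SeqLM")

config = IndicTransConfig()  # placeholder: fill in the real hyperparameters
model = IndicTransForConditionalGeneration(config)

# push_to_hub uploads the modeling/configuration files alongside the weights,
# so downstream users never need a transformers PR.
model.push_to_hub("your-org/indictrans2-en-indic")  # placeholder repo id

# Downstream usage then looks like any other checkpoint:
model = AutoModelForSeq2SeqLM.from_pretrained(
    "your-org/indictrans2-en-indic", trust_remote_code=True
)
```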
Thank you for your reply. We also need some help to add flash_attention_2 to our model. We were able to modify the modeling script for it, but it throws an error that our model class IndicTransForConditionalGeneration itself is not supported. How can we proceed in this case?
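For context, that error usually comes from a capability check on the model class rather than from the attention code itself: newer transformers releases only enable FlashAttention-2 if the class explicitly opts in. A rough sketch of the two pieces involved, assuming the flag name used by recent transformers versions and a placeholder repo id:

```python
import torch
from transformers import AutoModelForSeq2SeqLM, PreTrainedModel


# In the custom modeling file: the base model class has to advertise support,
# otherwise from_pretrained refuses to enable FlashAttention-2. The flag name
# below matches recent transformers versions (roughly >= 4.36); older releases
# gated FA2 on use_flash_attention_2=True at load time instead.
class IndicTransPreTrainedModel(PreTrainedModel):
    _supports_flash_attention_2 = True


# Loading once the flag (and the FA2 attention module) are wired into the
# custom modeling code.
model = AutoModelForSeq2SeqLM.from_pretrained(
    "your-org/indictrans2-en-indic",      # placeholder repo id
    trust_remote_code=True,
    torch_dtype=torch.float16,            # FA2 requires fp16/bf16
    attn_implementation="flash_attention_2",
)
```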
Model description
IndicTrans2 is a multilingual transformer model developed by AI4Bharat, and is available in 3 flavors: indic-en, en-indic and indic-indic. Each flavor has 2 versions: a large 1B model and a distilled 200M model. The architecture is a standard transformer, very similar to the NLLB and M2M models. However, the major difference is that the vocabularies of the encoder and decoder are not shared, as they cover different languages.

Unlike NLLB and M2M models, IndicTrans2 requires specific preprocessing of the inputs. Hence, a custom processor class has been developed and is required for training/inference. More examples can be found in the repository linked below.
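To illustrate the intended usage, here is a minimal inference sketch; the checkpoint name, the IndicTransToolkit import path and the IndicProcessor API are assumptions based on the public repository, not something fixed by this request.

```python
import torch
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

# The processor handles the IndicTrans2-specific pre/post-processing
# (language tags, normalization, detokenization). The import path below
# assumes the companion toolkit package; adjust to wherever the processor
# class actually lives.
from IndicTransToolkit import IndicProcessor

repo = "ai4bharat/indictrans2-en-indic-dist-200M"  # assumed checkpoint name

tokenizer = AutoTokenizer.from_pretrained(repo, trust_remote_code=True)
model = AutoModelForSeq2SeqLM.from_pretrained(repo, trust_remote_code=True)
ip = IndicProcessor(inference=True)

sentences = ["This is a test sentence."]

# Preprocessing adds the source/target language tags the model expects.
batch = ip.preprocess_batch(sentences, src_lang="eng_Latn", tgt_lang="hin_Deva")
inputs = tokenizer(batch, padding="longest", truncation=True, return_tensors="pt")

with torch.inference_mode():
    generated = model.generate(**inputs, num_beams=5, max_length=256)

decoded = tokenizer.batch_decode(generated, skip_special_tokens=True)

# Postprocessing strips the tags and restores native-script text.
print(ip.postprocess_batch(decoded, lang="hin_Deva"))
```

The key point is that the processor supplies the language-tag handling that NLLB/M2M perform inside their tokenizers, which is why it is required for both training and inference.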
Open source status
Provide useful links for the implementation
Authors: @AI4Bharat @jaygala24 @PranjalChitale @oneraghavan @VarunGumma @sumanthd17 @prajdabre @anoopkunchukuttan
Official GitHub Repository: AI4Bharat/IndicTrans2
The HF-compatible models and tokenizer are available here as of now: