This is a deep learning project created with the help of Tensorflow that predicts the language of a given text snippet. Currently, this language prediction model supports a total of 22 languages as of now, which include: Arabic, Chinese, Dutch, English, Estonian, French, Hindi, Indonesian, Japanese, Korean, Latin, Persian, Portuguese, Pashto, Romanian, Russian, Spanish, Swedish, Tamil, Thai, Turkish, and Urdu.
The dataset used in this project is taken from kaggle: https://www.kaggle.com/datasets/zarajamshaid/language-identification-datasst
- Vanilla Sequential model
- TextCNN model
- Bidirectional SimpleRNN model
- Bidirectional LSTM model
- Bidirectional GRU model
- Ensemble Learning(Bidirectional LSTM + Bidirectional GRU) model
Out of the all the above models, textCNN proved to be the most effective one with a training accuracy of around 78.99% and testing accuracy of around 73.65%
The deep learning model of this project is connected with an application created with Gradio for real time prediction and it is deployed on HuggingFace Spaces.
Live Preview: https://som11-language-predictor.hf.space/
While the model of this project can classify languages correctly, but in some cases, the model may misclassify languages, therefore, it is strongly advised not to rely solely on the output of this model.