Language Identifier

Introduction

This deep learning project, built with TensorFlow, predicts the language of a given text snippet. The model currently supports 22 languages: Arabic, Chinese, Dutch, English, Estonian, French, Hindi, Indonesian, Japanese, Korean, Latin, Persian, Portuguese, Pashto, Romanian, Russian, Spanish, Swedish, Tamil, Thai, Turkish, and Urdu.

Dataset used in this project

The dataset used in this project is taken from Kaggle: https://www.kaggle.com/datasets/zarajamshaid/language-identification-datasst

Models used in this project

Vanilla Sequential model
TextCNN model
Bidirectional SimpleRNN model
Bidirectional LSTM model
Bidirectional GRU model
Ensemble Learning (Bidirectional LSTM + Bidirectional GRU) model

Of all the models above, TextCNN proved the most effective, with a training accuracy of around 78.99% and a testing accuracy of around 73.65%.
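
The ensemble entry in the model list combines a Bidirectional LSTM with a Bidirectional GRU. This README does not say how the two outputs are merged, so the sketch below assumes one common approach: averaging the per-class probabilities of both models and picking the highest. The function names, the three-language label list, and the probability values are made up for illustration and are not taken from the notebooks.

```python
def average_probabilities(*model_outputs):
    """Return the element-wise mean of several per-class probability vectors."""
    if not model_outputs:
        raise ValueError("at least one model output is required")
    n_classes = len(model_outputs[0])
    if any(len(output) != n_classes for output in model_outputs):
        raise ValueError("all model outputs must cover the same classes")
    return [sum(values) / len(model_outputs) for values in zip(*model_outputs)]


def argmax(values):
    """Index of the largest value in a list."""
    return max(range(len(values)), key=values.__getitem__)


# Illustrative values only (assumption): softmax outputs of the two models
# for a single text snippet over three of the supported languages.
labels = ["English", "French", "Spanish"]
lstm_probs = [0.50, 0.30, 0.20]
gru_probs = [0.20, 0.60, 0.20]

ensemble = average_probabilities(lstm_probs, gru_probs)
predicted = labels[argmax(ensemble)]
print(predicted)
```

With these made-up numbers the LSTM alone would pick English, while the averaged probabilities favour French, which shows how one model's confident prediction can outweigh the other's weaker one.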

About the web application of the deep learning model

The trained model is served through a Gradio application for real-time prediction, and the application is deployed on Hugging Face Spaces.
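
As a rough illustration of how a classifier can be wired to Gradio, here is a minimal sketch. It is not the repository's app.py: `uniform_stub` is a hypothetical stand-in for the trained model, the helper names are invented, and the order of `LANGUAGES` is assumed rather than taken from the training notebooks. Gradio is imported only inside `build_demo` so the helper functions can be tried without it installed.

```python
# The 22 supported languages; the order is an assumption and would have to
# match the label encoding used when the real model was trained.
LANGUAGES = [
    "Arabic", "Chinese", "Dutch", "English", "Estonian", "French", "Hindi",
    "Indonesian", "Japanese", "Korean", "Latin", "Persian", "Portuguese",
    "Pashto", "Romanian", "Russian", "Spanish", "Swedish", "Tamil", "Thai",
    "Turkish", "Urdu",
]


def uniform_stub(text):
    """Hypothetical placeholder model: equal probability for every language."""
    return [1.0 / len(LANGUAGES)] * len(LANGUAGES)


def predict_language(text, model_fn=uniform_stub):
    """Map a model's probability vector to a {language: score} dictionary."""
    if not text.strip():
        return None  # nothing to classify
    probabilities = model_fn(text)
    if len(probabilities) != len(LANGUAGES):
        raise ValueError("model output does not match the number of languages")
    return {lang: float(p) for lang, p in zip(LANGUAGES, probabilities)}


def build_demo(model_fn=uniform_stub):
    """Create the Gradio interface (requires the gradio package)."""
    import gradio as gr

    return gr.Interface(
        fn=lambda text: predict_language(text, model_fn),
        inputs=gr.Textbox(lines=3, label="Text snippet"),
        outputs=gr.Label(num_top_classes=3, label="Predicted language"),
        title="Language Identifier",
    )
```

Calling `build_demo().launch()` would start the interface locally. In a real deployment, `model_fn` would wrap the saved TensorFlow model together with the same text preprocessing used during training.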

Links

Live Preview: https://som11-language-predictor.hf.space/

Warning

Although the model classifies languages correctly in many cases, it can still misclassify some inputs. It is therefore strongly advised not to rely solely on the output of this model.
