Fluency

A desktop notification app that uses a sequence-to-sequence (Seq2Seq) model to let users create notifications in different languages.

English-to-French Translation

Data Preparation

In this section, we will prepare our dataset for training by performing the following tasks:

Clean the text data by removing punctuation and numbers and converting all characters to lowercase.
Replace Unicode characters with their ASCII equivalents.
Determine the maximum length of the English and French phrases to set the model's input and output sequence lengths.
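The README lists these cleaning steps but does not show the code. A minimal sketch, assuming the raw data is a list of (English, French) string tuples; `clean` and `prepare_pairs` are hypothetical helper names. Accents are folded to ASCII before punctuation is stripped, so "é" becomes "e" instead of being dropped, and each French target is wrapped in the [start]/[end] markers seen in the formatted data.

```python
import string
import unicodedata

# Characters removed during cleaning: ASCII punctuation and digits
_STRIP_TABLE = str.maketrans("", "", string.punctuation + string.digits)

def clean(text):
    """Fold accents to ASCII, lowercase, and drop punctuation and digits."""
    # NFKD splits "é" into "e" + a combining accent; the ASCII encode drops the accent
    ascii_text = unicodedata.normalize("NFKD", text).encode("ascii", "ignore").decode("ascii")
    stripped = ascii_text.lower().translate(_STRIP_TABLE)
    return " ".join(stripped.split())  # collapse repeated whitespace

def prepare_pairs(pairs):
    """Clean (english, french) pairs and wrap each French target in [start]/[end]."""
    return [(clean(en), f"[start] {clean(fr)} [end]") for en, fr in pairs]

pairs = prepare_pairs([
    ("You're very clever.", "Vous êtes fort ingénieuse."),
    ("Are there kids?", "Y a-t-il des enfants ?"),
])

# Longest phrase (in words) on each side sets the model's sequence lengths
max_en_len = max(len(en.split()) for en, _ in pairs)
max_fr_len = max(len(fr.split()) for _, fr in pairs)
```

Note that hyphens count as punctuation here, which is why "a-t-il" collapses to "atil", matching the formatted sample data.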

Handling language data formatting

   english_text          french_text
0  youre very clever     [start] vous etes fort ingenieuse [end]
1  are there kids        [start] y atil des enfants [end]
2  come in               [start] entrez [end]
3  wheres boston         [start] ou est boston [end]
4  you see what i mean   [start] vous voyez ce que je veux dire [end]

Language Tokenization

⚒️ We will tokenize the English and French phrases using separate Tokenizer instances and generate padded sequences for model training. The steps involved are as follows:

Fit a Tokenizer to the English phrases and another Tokenizer to their French equivalents.
Compute the vocabulary sizes based on the Tokenizer instances.
Create padded sequences for all phrases.
Prepare features and labels for training:

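The features consist of the padded English sequences and the padded French sequences excluding the [end] tokens.
The labels consist of the padded French sequences excluding the [start] tokens, i.e., the French input shifted one position ahead, so the model learns to predict the next French word at each step.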
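The README does not say which Tokenizer it uses. To keep the example dependency-free, the sketch below uses hypothetical helpers `fit_vocab` and `to_padded` that roughly mirror a Keras-style Tokenizer plus post-padding: ids are assigned by descending word frequency and id 0 is reserved for padding. One difference worth knowing: Keras' `Tokenizer` with its default `filters` strips square brackets, turning "[start]" into "start", whereas this sketch keeps the markers intact.

```python
from collections import Counter

def fit_vocab(texts):
    """Assign ids by descending word frequency; id 0 is reserved for padding."""
    counts = Counter(word for text in texts for word in text.split())
    # most_common() keeps first-seen order for ties, so ids are deterministic
    return {word: i for i, (word, _) in enumerate(counts.most_common(), start=1)}

def to_padded(texts, vocab, length):
    """Map words to ids, then truncate or zero-pad at the end ("post") to `length`."""
    padded = []
    for text in texts:
        ids = [vocab[w] for w in text.split() if w in vocab][:length]
        padded.append(ids + [0] * (length - len(ids)))
    return padded

english = ["come in", "wheres boston"]
french = ["[start] entrez [end]", "[start] ou est boston [end]"]

en_vocab, fr_vocab = fit_vocab(english), fit_vocab(french)
en_vocab_size = len(en_vocab) + 1  # +1 for the padding id
fr_vocab_size = len(fr_vocab) + 1

en_len = max(len(t.split()) for t in english)
fr_len = max(len(t.split()) for t in french)
en_padded = to_padded(english, en_vocab, en_len)
fr_padded = to_padded(french, fr_vocab, fr_len)

# Features: English ids plus French ids without the final position
# (the [end] token of the longest phrase).
# Labels: French ids without [start], i.e. one step ahead of the decoder input.
features = (en_padded, [seq[:-1] for seq in fr_padded])
labels = [seq[1:] for seq in fr_padded]
```

Slicing off the last column (rather than deleting each [end] token individually) keeps every decoder input and label row the same length, which is what a batched model expects.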

Model Training and Evaluation

We train 🚂 the model and evaluate its performance on the validation set. The current evaluation metrics are shown below.

Evaluate the model's performance

1563/1563 [==============================] - 14s 9ms/step - loss: 0.2290 - accuracy: 0.8512
Test Loss: 0.22895030677318573
Validation Accuracy: 0.8511516451835632
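The log does not say whether padding positions count toward `accuracy`. Short phrases padded to a common length can contain many padding positions, and counting them can make token accuracy look higher than it is. A sketch of a padding-aware token accuracy, assuming padding id 0 (the convention of Keras' `pad_sequences`); `masked_token_accuracy` is a hypothetical helper, not part of the project.

```python
def masked_token_accuracy(labels, predictions, pad_id=0):
    """Fraction of non-padding label positions where the predicted id matches."""
    correct = total = 0
    for label_seq, pred_seq in zip(labels, predictions):
        for label, pred in zip(label_seq, pred_seq):
            if label == pad_id:
                continue  # skip padding so it cannot inflate the score
            total += 1
            correct += label == pred
    return correct / total if total else 0.0

labels = [[3, 2, 0, 0], [4, 5, 6, 2]]
predictions = [[3, 2, 0, 0], [4, 5, 1, 2]]

masked = masked_token_accuracy(labels, predictions)  # 5 correct of 6 real tokens
# pad_id=None never matches a label, so every position (padding included) counts
unmasked = masked_token_accuracy(labels, predictions, pad_id=None)  # 7 of 8
```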

Assess the model's learning accuracy

Translation Testing

Translate sample English phrases using the trained model's predictions.

English: let us out of here => French: laissenous sortir dici
English: it could be fun => French: ca pourrait etre marrant
English: this is my new video => French: cest ma nouvelle video
English: do you like fish => French: aimestu le poisson
English: you were in a coma => French: vous etiez dans le coma
English: dont be upset => French: ne soyez pas fache
English: didnt you know that => French: le saviezvous
English: im not exactly sure => French: je nen suis pas a la tete
English: i put it on your desk => French: je lai mise sur votre bureau
English: somehow tom knew => French: pourtant tom savait
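The outputs above come from the project's model; the README does not show its decoding code. A common approach for Seq2Seq models of this kind is greedy decoding: start from [start], feed the partial French sequence back into the model, append the most probable next word, and stop at [end]. The sketch below assumes that approach. `predict_next` is a hypothetical stand-in for the trained model, and `fake_predict` exists only to make the example runnable.

```python
def greedy_decode(predict_next, source_ids, fr_vocab, max_len=20):
    """Build a translation one token at a time, always taking the most probable id.

    predict_next(source_ids, target_ids) is a hypothetical stand-in for the trained
    model: it must return a list of probabilities indexed by French token id.
    """
    id_to_word = {i: w for w, i in fr_vocab.items()}
    target_ids = [fr_vocab["[start]"]]
    words = []
    for _ in range(max_len):
        probs = predict_next(source_ids, target_ids)
        next_id = max(range(len(probs)), key=probs.__getitem__)  # argmax
        word = id_to_word.get(next_id)  # None for the padding id 0
        if word is None or word == "[end]":
            break
        words.append(word)
        target_ids.append(next_id)
    return " ".join(words)

# Toy usage: a fake "model" that emits "entrez" and then [end]
fr_vocab = {"[start]": 1, "[end]": 2, "entrez": 3}

def fake_predict(source_ids, target_ids):
    return [0.0, 0.0, 0.1, 0.9] if len(target_ids) == 1 else [0.0, 0.0, 0.9, 0.1]

translation = greedy_decode(fake_predict, [1, 2], fr_vocab)
```

Greedy decoding is the simplest choice; beam search keeps several candidate sequences per step and can produce more fluent output at extra cost.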

Translation Comparison

Baseline for comparison: LibreTranslate, which uses a neural machine translation (NMT) architecture.

English: let us out of here => French: laissez-nous sortir d'ici
English: it could be fun => French: ça pourrait être amusant
English: this is my new video => French: c'est ma nouvelle vidéo
English: do you like fish => French: vous aimez le poisson
English: you were in a coma => French: tu étais dans le coma
English: dont be upset => French: ne soyez pas contrarié
English: didnt you know that => French: tu ne savais pas que
English: im not exactly sure => French: im pas exactement sûr
English: i put it on your desk => French: je l'ai mis sur ton bureau
English: somehow tom knew => French: tom le savait
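Some differences between the two lists are only surface form: Fluency's training data has accents and apostrophes stripped, while LibreTranslate keeps them. A rough way to compare the word choices is to normalize the baseline with the same cleaning and measure word overlap. The sketch below is hypothetical (the README reports no such score); Jaccard overlap of word sets is a crude measure, and the baseline is not a ground-truth reference.

```python
import string
import unicodedata

_STRIP_TABLE = str.maketrans("", "", string.punctuation + string.digits)

def clean(text):
    """Same normalization as the training data: ASCII, lowercase, no punctuation."""
    ascii_text = unicodedata.normalize("NFKD", text).encode("ascii", "ignore").decode("ascii")
    return " ".join(ascii_text.lower().translate(_STRIP_TABLE).split())

def word_overlap(candidate, reference):
    """Jaccard overlap of the two word sets after normalization (1.0 = same words)."""
    a, b = set(clean(candidate).split()), set(clean(reference).split())
    return len(a & b) / len(a | b) if a | b else 1.0

fluency = ["ca pourrait etre marrant", "cest ma nouvelle video"]
baseline = ["ça pourrait être amusant", "c'est ma nouvelle vidéo"]
scores = [word_overlap(f, b) for f, b in zip(fluency, baseline)]
```

After normalization the second pair is identical, while the first differs only in "marrant" versus "amusant", two valid renderings of "fun".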

Repository: Blakley/Fluency