Preparation of French Corpus

The French corpus is a translated version of the English WebNLG release 3.0 dataset. We used the English-to-French [NMT model](https://storage.googleapis.com/samanantar-public/V0.3/models/en-indic.zip) provided at https://pytorch.org/hub/pytorch_fairseq_translation/ to generate the French sentences.

To generate the French corpus

Install the required packages

pip install -r requirements.txt

Generate files for the train, dev and test folders

python3 run.py <path to the folder containing english xml files>
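
As an illustration of what the folder argument is expected to contain, the sketch below gathers the XML files under each split. It is not the repository's `run.py`; the function name `collect_xml_files` and the assumption that the splits live in subfolders named `train`, `dev` and `test` are hypothetical.

```python
from pathlib import Path

# Assumed split folder names; adjust if the corpus layout differs.
SPLITS = ("train", "dev", "test")


def collect_xml_files(root):
    """Map each split name to the sorted XML files found beneath it."""
    root = Path(root)
    files = {}
    for split in SPLITS:
        split_dir = root / split
        # rglob also picks up files in nested subfolders of the split.
        files[split] = sorted(split_dir.rglob("*.xml")) if split_dir.is_dir() else []
    return files
```

A missing split folder yields an empty list rather than an error, so a partial download can still be inspected.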

In our case, we used the English data path, since it is easy to replace the English lex entries with French ones. The WebNLG corpus can be downloaded from this repository.
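
The lex replacement described above can be sketched roughly as follows. This is a minimal illustration, not the code in `run.py`: the function name `translate_lex`, the `translate` callable, and the choice to mark translated entries with `lang="fr"` are assumptions, and the sketch only relies on the verbalisations being stored in `<lex>` elements.

```python
import xml.etree.ElementTree as ET


def translate_lex(xml_text, translate):
    """Return XML text in which every non-empty <lex> element holds translated text."""
    root = ET.fromstring(xml_text)
    for lex in root.iter("lex"):
        if lex.text and lex.text.strip():
            # Swap the English sentence for its translation, keeping other attributes.
            lex.text = translate(lex.text.strip())
            # Assumption: tag the entry as French so it is distinguishable downstream.
            lex.set("lang", "fr")
    return ET.tostring(root, encoding="unicode")


# Stand-in translator so the sketch runs offline; a real run would pass an NMT model.
sample = '<benchmark><entry eid="1"><lex lid="Id1">Hello world</lex></entry></benchmark>'
print(translate_lex(sample, lambda sentence: "FR:" + sentence))
```

Passing the translator in as a callable keeps the XML handling independent of the model. With a fairseq model loaded through `torch.hub.load`, the model's `translate` method could presumably be supplied in place of the stand-in lambda.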
