NLP-Grammar-Checker

We write a grammar and a parser to parse the POS tag sequence.

Data

Input data: sentences with POS tags. The input is a TSV (tab-separated values) file like the sample:

id	label	sentence	pos
73	0	Many thanks in advance for your cooperation .	JJ NNS IN NN IN PRP$ NN .

The id column is the unique id for each sentence. The label column indicates whether a sentence contains grammar errors (1 means it has errors, 0 means it is error-free). The pos column contains the POS tag for each token in the sentence; like the tokens, the tags are separated by a single space.
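As a minimal sketch of how a file in this format could be read (not necessarily how src/main.py does it), the snippet below parses the sample row with the standard csv module. The helper name read_rows and the output keys tokens and tags are illustrative assumptions; only the column names id, label, sentence and pos come from the format above.

```python
import csv
import io

# The sample row from above, embedded so the sketch runs without a data file.
SAMPLE = (
    "id\tlabel\tsentence\tpos\n"
    "73\t0\tMany thanks in advance for your cooperation .\t"
    "JJ NNS IN NN IN PRP$ NN .\n"
)


def read_rows(handle):
    """Yield one dict per sentence (hypothetical helper, not from the repo)."""
    # QUOTE_NONE keeps quotation marks inside sentences as literal characters.
    reader = csv.DictReader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
    for row in reader:
        yield {
            "id": row["id"],
            "label": int(row["label"]),
            "tokens": row["sentence"].split(" "),
            "tags": row["pos"].split(" "),
        }


rows = list(read_rows(io.StringIO(SAMPLE)))
print(rows[0]["tags"])
```

To read a real file, the same helper could be given a handle from open(path, newline="", encoding="utf-8") instead of the in-memory sample.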

The POS tags follow the Penn Treebank (PTB) tagging scheme, described in the Penn Treebank P.O.S. Tags page listed under Resources Consulted.

Tasks

Task 1: Building a toy grammar

We wrote a toy CFG for English in NLTK’s .cfg format.
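For readers unfamiliar with the format, here is a small illustrative grammar, assuming the terminals are PTB tags written as quoted strings. The rules are hypothetical and far smaller than a usable grammar; they are not the contents of grammars/toy.cfg.

```python
import nltk

# Hypothetical toy rules. Terminals are PTB tags and must be quoted,
# which also lets tags such as 'PRP$' and '.' appear as terminals.
TOY_GRAMMAR = """
S -> NP VP PUNCT
NP -> 'DT' 'NN' | 'PRP' | 'JJ' 'NNS'
VP -> 'VBZ' NP | 'VBD'
PUNCT -> '.'
"""

grammar = nltk.CFG.fromstring(TOY_GRAMMAR)

# Each alternative separated by '|' becomes its own production.
for production in grammar.productions():
    print(production)
```

The same rule text can be kept in a .cfg file and read into a string before calling nltk.CFG.fromstring.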

Task 2: Constituency parsing

We used the chart parser from NLTK to parse each of the POS sequences in the dataset with the toy grammar we wrote in task 1. We stored results in a TSV file with three columns:

Column name	Description
id	The id of the input sentence.
prediction	1 if the sentence has grammar errors, 0 if not. In other words, 0 if the POS sequence can be parsed successfully with our grammar and parser, and 1 if it cannot.
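The parse-or-fail decision can be sketched as follows, assuming a tiny stand-in grammar rather than the project's own. The function name predict is hypothetical. One detail worth noting is that NLTK's chart parser raises ValueError when the input contains a tag the grammar does not cover, so the sketch treats that case as a failed parse.

```python
import nltk

# Stand-in grammar for illustration only; not the project's toy.cfg.
grammar = nltk.CFG.fromstring("""
S -> NP VP PUNCT
NP -> 'DT' 'NN' | 'PRP'
VP -> 'VBZ' NP | 'VBD'
PUNCT -> '.'
""")
parser = nltk.ChartParser(grammar)


def predict(pos_sequence: str) -> int:
    """Return 0 if the tag sequence parses, 1 otherwise (hypothetical helper)."""
    tags = pos_sequence.split()
    try:
        # parse() yields trees lazily; one tree is enough to accept the input.
        has_parse = next(iter(parser.parse(tags)), None) is not None
    except ValueError:
        # Raised when a tag is not a terminal of the grammar.
        has_parse = False
    return 0 if has_parse else 1


for sequence in ["PRP VBZ DT NN .", "VBZ DT NN .", "PRP VBZ XX ."]:
    print(sequence, "->", predict(sequence))
```

Stopping at the first tree avoids enumerating every parse of an ambiguous sequence, which matters because only the existence of a parse decides the prediction.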

Task 3: Evaluation and error analysis

We evaluated the performance of our grammar checker by calculating its precision and recall on the data available to us. To do that, we compared the system's prediction for each sentence with the corresponding label in the dataset.
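A minimal sketch of that comparison is shown below. It assumes that label 1 (the sentence has errors) is the positive class and that undefined ratios are reported as 0.0; the function name and the example lists are made up for illustration and are not results from the project.

```python
def precision_recall(labels, predictions):
    """Precision and recall with 1 ("has errors") as the positive class."""
    pairs = list(zip(labels, predictions))
    tp = sum(1 for y, p in pairs if y == 1 and p == 1)  # errors caught
    fp = sum(1 for y, p in pairs if y == 0 and p == 1)  # false alarms
    fn = sum(1 for y, p in pairs if y == 1 and p == 0)  # errors missed
    # Assumption: report 0.0 when a denominator is zero.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


# Made-up labels and predictions, purely to exercise the function.
example_labels = [1, 1, 0, 0, 1]
example_predictions = [1, 0, 1, 0, 1]
precision, recall = precision_recall(example_labels, example_predictions)
print(f"precision={precision:.3f} recall={recall:.3f}")
```

With these made-up lists there are two true positives, one false positive and one false negative, so both values come out to 2/3.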

Report and Results

Further details and results can be found in REPORT.md.

Contributors

Leen Alzebdeh @Leen-Alzebdeh

Sukhnoor Khehra @Sukhnoor-K

Resources Consulted

Penn Treebank P.O.S. Tags

Jurafsky, D., & Martin, J. H. (2009). Speech and language processing: An introduction to natural language processing, computational linguistics, and speech recognition. Pearson Prentice Hall.

GitHub Copilot

Libraries

We run this project using the Python standard-library modules csv and sys, plus the third-party library nltk.

Instructions to execute code

Ensure Python 3 is installed; the Python Standard Library ships with it.
Ensure the nltk library is installed. It can be installed using the following command:

pip install --user -U nltk

Ensure your input data is in the format outlined above and saved in the file 'data/train.tsv'.

Example usage: run the following command from the project's root directory.

python3 src/main.py data/train.tsv grammars/toy.cfg output/train.tsv
