kurd-cc/kurdish-words-corrector

Kurdish words corrector

Corrects typos and Unicode problems in Kurdish (Kurmanji) text by brute-forcing candidate spellings and comparing them against a dictionary.
The brute-force search has 3 different depths and targets the most common typos, such as writing s instead of ş or e instead of ê.
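
The idea can be sketched in a few lines of Python. This is only an illustration and not the script's actual code: the CONFUSIONS map, the function names, and the reading of "depth" as the maximum number of letters substituted at once are all assumptions.

```python
from itertools import combinations

# Hypothetical confusion map: ASCII letter -> Kurmanji letter it is often typed for.
CONFUSIONS = {"s": "ş", "c": "ç", "e": "ê", "u": "û", "i": "î"}


def candidates(word, depth=1):
    """Yield variants of `word` with 1..depth letters substituted (assumed meaning of depth)."""
    word = word.lower()
    positions = [i for i, ch in enumerate(word) if ch in CONFUSIONS]
    for k in range(1, min(depth, len(positions)) + 1):
        for combo in combinations(positions, k):
            chars = list(word)
            for i in combo:
                chars[i] = CONFUSIONS[chars[i]]
            yield "".join(chars)


def correct(word, dictionary, depth=1):
    """Return the word itself if it is known, else every candidate found in the dictionary."""
    lowered = word.lower()
    if lowered in dictionary:
        return [lowered]
    return [c for c in candidates(lowered, depth) if c in dictionary]


# Tiny stand-in dictionary, for illustration only.
words = {"reşo", "çû", "dûr"}
print(correct("Reso", words, depth=1))  # ['reşo']
print(correct("cu", words, depth=1))    # [] (needs two substitutions)
print(correct("cu", words, depth=2))    # ['çû']
```

In this sketch a higher depth tries more substitution combinations per word, which is why it finds more corrections but takes longer.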

Incorrect sentence:

Reso cu dur.

Corrected sentence:

Reşo çû dûr.

Usage:

python kurdish-words-corrector.py -t "Reso cu dur." -o "results.txt"
  • When no -o path is given, you can still read the results in YAML or JSON format.
  • The script also saves a stats file in the same path, containing the JSON- or YAML-formatted results along with some statistics.

An example of the resulting JSON:

{
  "correct_words": [],
  "incorrect_words_with_possible_corrections": [
    {
      "word": "Reso",
      "message": "Is not in our database, and we found similar word/s",
      "status": 1,
      "possibilities": [
        "reşo"
      ]
    },
    {
      "word": "cu",
      "message": "Is not in our database, and we found similar word/s",
      "status": 1,
      "possibilities": [
        "çû"
      ]
    },
    {
      "word": "dur",
      "message": "Is not in our database, and we found similar word/s",
      "status": 1,
      "possibilities": [
        "dûr"
      ]
    }
  ],
  "incorrect_words_without_possible_corrections": [],
  "total_words": 3,
  "total_incorrect": 3,
  "total_incorrect_with_corrections": 3,
  "total_incorrect_without_corrections": 0,
  "incorrect_percentage": 100
}
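
A result in this shape can be applied back to the input text. The following is a minimal sketch, assuming the first entry in "possibilities" is the one to use and that the original capitalization should be restored (the possibilities are lowercase). apply_corrections is a hypothetical helper, not part of the script.

```python
import json
import re

# Trimmed copy of the example result above.
REPORT_JSON = """
{
  "incorrect_words_with_possible_corrections": [
    {"word": "Reso", "status": 1, "possibilities": ["reşo"]},
    {"word": "cu", "status": 1, "possibilities": ["çû"]},
    {"word": "dur", "status": 1, "possibilities": ["dûr"]}
  ]
}
"""


def apply_corrections(text, report):
    """Replace each reported word with its first suggested correction."""
    best = {
        entry["word"]: entry["possibilities"][0]
        for entry in report["incorrect_words_with_possible_corrections"]
        if entry["possibilities"]
    }

    def replace(match):
        word = match.group(0)
        fixed = best.get(word)
        if fixed is None:
            return word  # word was not reported, leave it unchanged
        if word[0].isupper():
            fixed = fixed[0].upper() + fixed[1:]  # keep the original capitalization
        return fixed

    # \w+ matches Unicode letters in Python 3, so ş, ç, û and ê stay inside words.
    return re.sub(r"\w+", replace, text)


report = json.loads(REPORT_JSON)
print(apply_corrections("Reso cu dur.", report))  # Reşo çû dûr.
```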

Arguments:

| Argument | Description |
| --- | --- |
| -w or --word | A single word to correct |
| -t or --text | The text whose words should be corrected |
| -o or --output | The path of the output file for the results and corrected text |
| -f or --file | The path of a file whose text should be corrected |
| -d or --depth | 1, 2 or 3; higher values increase the level of brute forcing, but also the processing time |
| -p or --parser | The format of the output file: yaml (default) or json |
| -wr or --workers | The number of workers (threads) to use, default=100 |
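
For reference, the arguments above could be declared with argparse roughly as follows. This is a sketch reconstructed from the table, not the script's actual parser; in particular, the default depth of 1 is an assumption, since the table does not state one.

```python
import argparse


def build_parser():
    """Hypothetical parser mirroring the arguments table."""
    parser = argparse.ArgumentParser(prog="kurdish-words-corrector.py")
    parser.add_argument("-w", "--word", help="a single word to correct")
    parser.add_argument("-t", "--text", help="text whose words should be corrected")
    parser.add_argument("-o", "--output", help="path of the output results file")
    parser.add_argument("-f", "--file", help="path of a file whose text should be corrected")
    # Default depth is assumed; the table only lists the allowed values.
    parser.add_argument("-d", "--depth", type=int, choices=[1, 2, 3], default=1,
                        help="brute-force level; higher is slower")
    parser.add_argument("-p", "--parser", choices=["yaml", "json"], default="yaml",
                        help="format of the output file")
    parser.add_argument("-wr", "--workers", type=int, default=100,
                        help="number of worker threads")
    return parser


args = build_parser().parse_args(["-t", "Reso cu dur.", "-d", "2", "-p", "json"])
print(args.text, args.depth, args.parser, args.workers)
```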