corpusDenominator

This is a small program for bringing corpora written in the same language but in different orthographies to a common orthographic denominator. It creates a “deformed” but orthographically uniform corpus for stylometric analysis with the R package stylo (https://github.com/computationalstylistics/stylo).

Parameters in `corpusDenominator.py`

# Define parameters here

separator = "\t" # you can change the separator here ("\t" for TAB, "," for COMMA, etc.)
schemeFile = "conversionList.txt" # you can change the file name of the scheme
key = "RE" # use "RE" for regular expressions, "PLAIN" for simple find/replace

# Folder variables

folderOld = "./textsOld/" # folder for texts in old orthography
folderNew = "./textsNew/" # folder for texts in new orthography
folderMod = "./textsMod/" # folder for texts in mod orthography
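
The sketch below illustrates how the `separator` and `key` parameters could drive a conversion. It is not taken from `corpusDenominator.py`: the helper names `parse_scheme` and `apply_scheme` are hypothetical, and it assumes that each line of the scheme file holds one `find<separator>replace` pair, which the description above does not confirm.

```python
import re


def parse_scheme(scheme_text, separator="\t"):
    """Turn 'find<separator>replace' lines into a list of (find, replace) pairs.

    Assumed format: one rule per line; blank lines and lines without the
    separator are skipped.
    """
    rules = []
    for line in scheme_text.splitlines():
        if not line.strip() or separator not in line:
            continue
        find, replace = line.split(separator, 1)
        rules.append((find, replace))
    return rules


def apply_scheme(text, rules, key="RE"):
    """Apply the rules in order, as regular expressions or as plain strings."""
    for find, replace in rules:
        if key == "RE":
            text = re.sub(find, replace, text)
        else:  # "PLAIN": literal find/replace
            text = text.replace(find, replace)
    return text


# Hypothetical two-rule scheme: "ph" -> "f", and word-final "th" -> "t"
scheme = "ph\tf\nth\\b\tt\n"
rules = parse_scheme(scheme, separator="\t")

print(apply_scheme("philosoph doth", rules, key="RE"))
print(apply_scheme("philosoph doth", rules, key="PLAIN"))
```

With `key = "RE"` the second rule is read as a pattern (`\b` is a word boundary), while with `key = "PLAIN"` the same rule is searched for literally and therefore matches nothing in this sample.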
