Skip to content

aminraz/word_stats

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

6 Commits
 
 
 
 
 
 
 
 

Repository files navigation

The Python script provided in this repository is a text analyzer or a Python filter. It gives you some basic stats and an advanced measure called Type-Token Ratio (TTR).

TTR is a measure used to assess the diversity or richness of vocabulary in a text. TTR = (Number of unique word types) / (Total number of words).

In order to use the script, find your text file and in a terminal type the following:

python word_stats.py my_file.txt

Here is a sample output from the script:

The text has 7077 characters, 1183 words, and 52 sentences.
The average lenght of a sentence is 22 word.
Standrd deviation of the length of sentence is 11.5 word.
Type-Token Ratio (TTR): 0.49

About

Corpus analysis of plain text and providing Type-Token Ratio as well as some other statistics.

Topics

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages