Application of increasing metrics #3

Open
BradKML opened this issue Apr 14, 2021 · 0 comments

BradKML commented Apr 14, 2021

Here is a list of metrics from other heavily cited papers that could be added to the project. If any individual metric is of particular interest, it can get its own issue.

http://www.stolerman.net/studies/cs613/cs613_Writeprints_Ariel_Stolerman_paper.pdf and https://dl.acm.org/doi/10.1145/1344411.1344413

  • Letters
    • Unigrams
    • Bigrams
    • Trigrams
  • Digits
    • Unigrams
    • Bigrams
    • Trigrams
  • Word Length Distribution
  • Vocabulary Richness
    • Hapax legomena
    • Dis legomena
    • Yule’s Characteristic K
    • Simpson’s diversity index
  • Special Characters
    • Punctuation
  • Function words
  • POS tags
    • Unigrams
    • Bigrams
    • Trigrams
  • Words
    • Unigrams
    • Bigrams
    • Trigrams
  • Misspellings
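
As a rough starting point for the Writeprints-style features above, here is a minimal Python sketch of letter/digit n-grams, word n-grams and the frequency-spectrum richness measures. It is not taken from either paper: the naive tokenizer, the choice to keep character n-grams inside a single word or number, and all function names (`char_ngrams`, `tokenize`, `word_ngrams`, `vocabulary_richness`) are assumptions for illustration. POS-tag n-grams could reuse `word_ngrams` on the tag sequence produced by whatever tagger the project picks.

```python
import re
from collections import Counter


def char_ngrams(text, pattern, n):
    """Count character n-grams inside each run matched by `pattern`.

    Assumption: n-grams never cross run boundaries, so letter n-grams
    stay within a word and digit n-grams stay within a number.
    """
    counts = Counter()
    for run in re.findall(pattern, text.lower()):
        counts.update(run[i:i + n] for i in range(len(run) - n + 1))
    return counts


def tokenize(text):
    # Deliberately naive: lower-cased runs of ASCII letters and apostrophes.
    return re.findall(r"[a-z']+", text.lower())


def word_ngrams(tokens, n):
    """Count n-grams over any token sequence (words, or POS tags from a tagger)."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def vocabulary_richness(tokens):
    """Hapax/dis legomena, Yule's K and Simpson's D from the frequency spectrum."""
    N = len(tokens)
    if N < 2:
        raise ValueError("need at least two tokens")
    freqs = Counter(tokens)              # word type -> frequency
    spectrum = Counter(freqs.values())   # i -> V_i, number of types seen exactly i times
    m2 = sum(i * i * v for i, v in spectrum.items())
    return {
        "hapax_legomena": spectrum[1],
        "dis_legomena": spectrum[2],
        # Yule's K = 10^4 * (sum_i i^2 V_i - N) / N^2
        "yule_k": 1e4 * (m2 - N) / (N * N),
        # Simpson's D: chance that two tokens drawn without replacement share a type
        "simpson_d": sum(v * i * (i - 1) for i, v in spectrum.items()) / (N * (N - 1)),
    }


text = "The cat and the hat and the bat"
tokens = tokenize(text)
print(char_ngrams(text, r"[a-z]+", 2).most_common(3))    # letter bigrams
print(char_ngrams("Call 555-1234", r"[0-9]+", 2))         # digit bigrams
print(word_ngrams(tokens, 2)[("and", "the")])             # word bigram count
print(vocabulary_richness(tokens))
```

Simpson's index is reported here as D (the probability of a repeat); some papers report 1 - D or 1/D instead, so the exact definition is worth checking before comparing numbers across papers.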

https://spectrum.library.concordia.ca/36253/1/2010_Mining_Writeprints_from_Anonymous_E-mails.pdf

  • Character Count
  • Digit ratio
  • Total Letter ratio
  • Upper case ratio
  • Whitespace ratio
    • Space ratio
    • Tab ratio
    • Newline ratio
  • Individual Letter Ratio
  • Special Characters Count
  • Token
  • Character-based sentence length
  • Token length
  • Word Length Frequency (from 1-20 characters)
  • Short word (1-3 character) ratio
  • Word Type ratio
  • Vocabulary Richness
  • Yule's K richness
  • Hapax Legomena
  • Hapax Dislegomena
  • Punctuation occurrence
  • Function Words
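
Most of the lexical features in this second list are simple counts divided by a length. A minimal sketch follows, assuming the denominator for every character ratio is the total character count C, that "Word Type ratio" means distinct (case-folded) words divided by total words, and that sentences can be split naively on `.`, `!` and `?`. The regexes and function names are hypothetical choices, not the paper's exact definitions.

```python
import re
from collections import Counter


def character_features(text):
    """Character-level ratios; every denominator is the total character count C (assumed)."""
    C = len(text)
    if C == 0:
        raise ValueError("empty text")
    letters = Counter(c for c in text.lower() if c.isalpha())
    return {
        "char_count": C,
        "digit_ratio": sum(c.isdigit() for c in text) / C,
        "letter_ratio": sum(c.isalpha() for c in text) / C,
        "upper_ratio": sum(c.isupper() for c in text) / C,
        "whitespace_ratio": sum(c.isspace() for c in text) / C,
        "space_ratio": text.count(" ") / C,
        "tab_ratio": text.count("\t") / C,
        "newline_ratio": text.count("\n") / C,
        # Individual letter ratio, case-folded: 'a'..'z' -> count / C
        "letter_ratios": {ch: letters[ch] / C for ch in "abcdefghijklmnopqrstuvwxyz"},
        # Anything neither alphanumeric nor whitespace is treated as "special"
        "special_count": sum(not c.isalnum() and not c.isspace() for c in text),
    }


def word_features(text, max_len=20):
    """Word-level counts; words are runs of ASCII letters/apostrophes (assumed)."""
    words = re.findall(r"[A-Za-z']+", text)
    M = len(words)
    if M == 0:
        raise ValueError("no words found")
    lengths = Counter(len(w) for w in words)
    return {
        "word_count": M,
        # Relative frequency of word lengths 1..max_len; longer words are ignored
        "length_freq": [lengths[L] / M for L in range(1, max_len + 1)],
        "short_word_ratio": sum(len(w) <= 3 for w in words) / M,
        # Distinct case-folded words over total words (type/token ratio)
        "type_ratio": len({w.lower() for w in words}) / M,
    }


def avg_sentence_length_chars(text):
    # Naive split on ., ! or ?; terminal punctuation is not counted
    sentences = [s.strip() for s in re.split(r"[.!?]+", text) if s.strip()]
    return sum(len(s) for s in sentences) / len(sentences) if sentences else 0.0


sample = "The cat sat.\nIt was 9 PM!"
print(character_features(sample)["upper_ratio"])
print(word_features("The cat saw the enormous dog.")["short_word_ratio"])
print(avg_sentence_length_chars(sample))
```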

https://commons.erau.edu/cgi/viewcontent.cgi?article=1299&context=adfsl

  • Individual Occurrence
    • Special Characters
    • Digit Frequency
    • Lowercase
    • Uppercase
    • Spaces
  • Total Occurrence
    • Special Characters
    • Digit Frequency
    • Lowercase
    • Uppercase
    • Spaces
  • Ratio
    • Special Characters
    • Digit Frequency
    • Lowercase
    • Uppercase
    • Spaces
  • Total Short Words (1-3)
  • Average Word Length
  • Total word count
  • Abbreviation Frequency
  • Emoticon Frequency
  • Function Words
  • Punctuation
  • Words per Message
  • Characters per Message
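
This third list organises character features as individual occurrence (per distinct character), total occurrence (per class) and ratio (total divided by message length), plus per-message averages. The sketch below follows that structure under some assumptions: "special" means neither alphanumeric nor whitespace, words are runs of ASCII letters and apostrophes, and `EMOTICON` and `ABBREVIATIONS` are small hypothetical lexicons standing in for whatever lists the project would actually curate.

```python
import re
from collections import Counter
from statistics import mean

# Character classes; "special" = neither alphanumeric nor whitespace (assumed).
CLASSES = {
    "special": lambda c: not c.isalnum() and not c.isspace(),
    "digit": str.isdigit,
    "lower": str.islower,
    "upper": str.isupper,
    "space": lambda c: c == " ",
}

WORD = re.compile(r"[A-Za-z']+")
# Hypothetical lexicons for illustration only; real work needs curated lists.
EMOTICON = re.compile(r"[:;]-?[()DPp]")
ABBREVIATIONS = {"u", "ok", "lol", "btw", "imo"}


def class_profile(text):
    """Individual occurrence, total occurrence and ratio for each character class."""
    n = len(text)
    profile = {}
    for name, belongs in CLASSES.items():
        individual = Counter(c for c in text if belongs(c))  # per distinct character
        total = sum(individual.values())                     # whole class
        profile[name] = {
            "individual": individual,
            "total": total,
            "ratio": total / n if n else 0.0,
        }
    return profile


def message_stats(messages):
    """Aggregate word/character statistics over a non-empty list of messages."""
    word_lists = [WORD.findall(m) for m in messages]
    words = [w for ws in word_lists for w in ws]
    return {
        "total_words": len(words),
        "total_short_words": sum(len(w) <= 3 for w in words),
        "avg_word_length": mean(len(w) for w in words) if words else 0.0,
        "words_per_message": mean(len(ws) for ws in word_lists),
        "chars_per_message": mean(len(m) for m in messages),
        "abbreviation_freq": sum(w.lower() in ABBREVIATIONS for w in words),
        "emoticon_freq": sum(len(EMOTICON.findall(m)) for m in messages),
    }


msgs = ["hi :) see u at 5", "OK!! Running late ;-)"]
print(class_profile(msgs[1])["special"])
print(message_stats(msgs))
```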

https://dl.acm.org/doi/10.1145/3132039

  • Short words (1-3)
  • Word Length (1-20)
  • POS N-grams
    • Unigrams
    • Bigrams
    • Trigrams
  • Function words
  • Punctuation
  • Word bigrams/trigrams
  • Unique word ratio
  • N-grams
    • Unigrams
    • Bigrams
    • Trigrams
    • Tetragrams
  • Case Function
    • all lower
    • all caps
    • Capital Case
    • Camel Case
    • non-traditional
  • Vocabulary Richness
    • Hapax Legomena
    • Yule's I
    • Sichel's S
    • Brunet's W
    • Honoré's R
  • Syntactic Pairs
  • Frequency
    • Digits
    • Special Characters
    • Alphabets
    • Whitespace/Tabs
  • Count
    • Tab/Space
    • Uppercase
    • All letters
    • Characters/digits
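
For this last list, two parts are less obvious than plain counts: the case-function buckets and the extra vocabulary-richness measures. Below is a hedged sketch. The camel-case regex, lumping PascalCase in with camel case, and the Brunet constant a = 0.172 (some implementations use 0.165) are assumptions. The formulas used are Yule's I = N^2 / (M2 - N), Sichel's S = V2 / V, Brunet's W = N^(V^-a) and Honoré's R = 100 ln N / (1 - V1 / V), where N is the token count, V the number of types, Vi the number of types occurring exactly i times, and M2 = sum of i^2 Vi.

```python
import math
import re
from collections import Counter

# Assumed camel-case shape: a first segment, then one or more Capitalised segments.
CAMEL = re.compile(r"[A-Za-z][a-z]*(?:[A-Z][a-z]+)+")


def case_class(word):
    """Bucket a non-empty, purely alphabetic word by its capitalisation pattern."""
    if word.islower():
        return "all_lower"
    if word.isupper():
        return "all_caps"
    if word[0].isupper() and word[1:].islower():
        return "capital"
    if CAMEL.fullmatch(word):
        return "camel"  # covers both camelCase and PascalCase here
    return "non_traditional"


def richness_measures(tokens, brunet_a=0.172):
    N = len(tokens)
    freqs = Counter(tokens)              # type -> frequency
    V = len(freqs)
    spectrum = Counter(freqs.values())   # i -> V_i
    V1, V2 = spectrum[1], spectrum[2]
    M2 = sum(i * i * v for i, v in spectrum.items())
    return {
        "hapax_legomena": V1,
        # Yule's I = N^2 / (M2 - N); undefined when every token is a hapax
        "yule_i": N * N / (M2 - N) if M2 != N else math.inf,
        # Sichel's S: share of types that occur exactly twice
        "sichel_s": V2 / V,
        # Brunet's W = N ** (V ** -a)
        "brunet_w": N ** (V ** -brunet_a),
        # Honoré's R = 100 ln N / (1 - V1 / V); diverges when V1 == V
        "honore_r": 100 * math.log(N) / (1 - V1 / V) if V1 != V else math.inf,
    }


words = "The iPhone and ThinkPad are NOT cheap".split()
print(Counter(case_class(w) for w in words))
print(richness_measures("the cat and the hat and the bat".split()))
```

Since I = 10^4 / K, Yule's I carries the same information as Yule's Characteristic K from the first list, only inverted.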