Skip to content

taylorwood/ADRDemo

Repository files navigation

ADRDemo

A simple F# demonstration of Automated Document Recognition using techniques like text tokenization, n-grams, TF-IDF weighting, CSV parsing, and text classification.

The code assumes the existence of some training data, in the form of plaintext files organized into folders by category:

  • \TrainingData
    • \CategoryA
      • \Sample1.txt
      • \Sample2.txt
    • \CategoryB
      • \SampleA.txt

...and a plain text file to be classified: "unknown.txt".

It also assumes the existence of a word whitelist CSV file, but this can be easily changed to a blacklist ("stopwords") or removed altogether.

Releases

No releases published

Packages

No packages published

Languages