juanpablocruz/tf-idf

TFIDF

Python 2.7

This program watches a directory and returns the N top-ranked files for a given query string.

Algorithm

Term Frequency - Inverse Document Frequency (TF-IDF) scores how relevant a word is to a file by weighing how often the word occurs in that file against how common it is across all the other files in the directory.
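
The scoring described above can be sketched as follows. This is a minimal illustration and not necessarily how this repository implements it: the names `tokenize` and `rank`, the whitespace tokenizer, the length-normalised term frequency and the natural-log IDF are all assumptions made for the example.

```python
from __future__ import division  # true division on Python 2.7 as well

import heapq
import math
from collections import Counter


def tokenize(text):
    """Lowercase and split on whitespace (a deliberately simple tokenizer)."""
    return text.lower().split()


def rank(documents, query, n):
    """Return the n best (name, score) pairs for query, highest score first.

    documents: dict mapping a file name to its text (hypothetical layout).
    """
    counts = {name: Counter(tokenize(text)) for name, text in documents.items()}
    total_docs = len(counts)
    terms = tokenize(query)

    # Document frequency: in how many files does each query term appear?
    doc_freq = {
        term: sum(1 for c in counts.values() if term in c) for term in terms
    }

    scores = {}
    for name, tf in counts.items():
        length = sum(tf.values())
        score = 0.0
        for term in terms:
            if length == 0 or doc_freq[term] == 0:
                continue  # empty file, or term absent from the whole corpus
            idf = math.log(total_docs / doc_freq[term])
            score += (tf[term] / length) * idf
        scores[name] = score

    # Keep only the n highest-scoring files.
    return heapq.nlargest(n, scores.items(), key=lambda item: item[1])
```

A term that occurs in every file gets an IDF of `log(1) = 0` under this formulation, so it contributes nothing to the ranking; real implementations often smooth the IDF to avoid that.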

The worst-case time complexity is:

  • equation
  • equation, assuming the number of terms, the number of files, and the number of words per file are all equal

The space complexity is equation, since an array and a dict of files are stored.

Dependencies

TFIDF uses the watchdog module to watch a directory for changes.
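
For readers unfamiliar with watchdog, the sketch below shows one way a directory watcher could keep an in-memory index of file contents up to date. It is not taken from this repository: `read_text`, `apply_event`, `watch` and `IndexHandler` are hypothetical names, and the index layout (a dict from path to text) is an assumption. Only `Observer`, `FileSystemEventHandler`, `on_any_event` and the event attributes `event_type`, `src_path` and `is_directory` come from watchdog itself.

```python
import io
import time


def read_text(path):
    """Read a file as UTF-8, replacing undecodable bytes."""
    with io.open(path, encoding="utf-8", errors="replace") as handle:
        return handle.read()


def apply_event(index, event_type, path, read=read_text):
    """Update index (dict: path -> text) for a single file-system event."""
    if event_type == "deleted":
        index.pop(path, None)
    elif event_type in ("created", "modified"):
        index[path] = read(path)
    # Any other event type (e.g. "moved") is ignored in this sketch.


def watch(directory, index):
    """Block and keep index in sync with directory until Ctrl-C."""
    # Imported lazily so apply_event stays usable without watchdog installed.
    from watchdog.events import FileSystemEventHandler
    from watchdog.observers import Observer

    class IndexHandler(FileSystemEventHandler):
        def on_any_event(self, event):
            if not event.is_directory:
                apply_event(index, event.event_type, event.src_path)

    observer = Observer()
    observer.schedule(IndexHandler(), directory, recursive=False)
    observer.start()
    try:
        while True:
            time.sleep(1)
    except KeyboardInterrupt:
        observer.stop()
    observer.join()
```

Keeping the index update in a plain function separate from the watchdog handler makes it possible to exercise the logic without touching the file system.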

Installation

$ python setup.py install

This adds the tfidf script to your PATH. On OS X/UNIX it is installed to /usr/local/bin.

Usage

$ python tfidf.py -d dir -n N -p P -t "terms"

Run tests

$ python -m unittest discover -s test -t tfidf

About

TF-IDF report over a watched directory
