gbrsouza/TF-iDF

TF-iDF

TF-IDF stands for term frequency-inverse document frequency. The TF-IDF weight is a statistical measure, widely used in information retrieval and text mining, of how important a word is to a document in a collection or corpus. The importance increases proportionally to the number of times the word appears in the document, but it is offset by how frequently the word occurs across the corpus. Variations of the TF-IDF weighting scheme are often used by search engines as a central tool in scoring and ranking a document's relevance to a user query. Reference
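
As an illustration of the weighting described above, here is a minimal single-threaded sketch. It assumes one common variant of the formula (term count divided by document length, multiplied by the natural log of the corpus size over the number of documents containing the term). The class `TfIdfSketch` and its method names are hypothetical and are not taken from this repository's code.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration; not the repository's implementation.
class TfIdfSketch {

    // Term frequency: occurrences of the term divided by the document length.
    static double tf(List<String> doc, String term) {
        if (doc.isEmpty()) {
            return 0.0;
        }
        long count = doc.stream().filter(term::equals).count();
        return (double) count / doc.size();
    }

    // Inverse document frequency: ln(N / number of documents containing the term).
    static double idf(List<List<String>> corpus, String term) {
        long containing = corpus.stream().filter(d -> d.contains(term)).count();
        if (containing == 0) {
            return 0.0; // assumption: a term absent from the corpus scores zero
        }
        return Math.log((double) corpus.size() / containing);
    }

    // TF-IDF weight of a term for one document of the corpus.
    static double tfIdf(List<String> doc, List<List<String>> corpus, String term) {
        return tf(doc, term) * idf(corpus, term);
    }

    public static void main(String[] args) {
        List<List<String>> corpus = Arrays.asList(
            Arrays.asList("the", "cat", "sat"),
            Arrays.asList("the", "dog", "barked"),
            Arrays.asList("the", "cat", "and", "the", "dog"));
        // "the" occurs in every document, so its idf is ln(1) = 0.
        System.out.println(tfIdf(corpus.get(0), corpus, "the"));
        // "cat" occurs in two of three documents, so it gets a positive weight.
        System.out.println(tfIdf(corpus.get(0), corpus, "cat"));
    }
}
```

A word that appears in every document gets a weight of zero under this variant, which is the "offset" effect the paragraph describes.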

How to run

  • Running this software requires a database of files (use the archives folder for this).
  • Add the links of all files that you want to read to the file "forRead.txt". To do this, run the script "read.py".
  • Modify the parameters so that the links are generated correctly.
  • Open the code in a Java IDE as a Maven project.
  • Run the file App.java in the path src/main/java/bigdata/TFidF as a Java application.

Concurrent Techniques

Mutex
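
To illustrate the general idea of a mutex in this setting, here is a minimal sketch in which several threads update one shared term-count map and a `ReentrantLock` guards every access. It assumes one thread per document. The class `MutexCountSketch` and its methods are hypothetical and do not reproduce this repository's code.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical illustration; not the repository's implementation.
class MutexCountSketch {
    private final Map<String, Integer> counts = new HashMap<>();
    private final ReentrantLock lock = new ReentrantLock();

    // Only one thread at a time may modify the shared map.
    void add(String term) {
        lock.lock();
        try {
            counts.merge(term, 1, Integer::sum);
        } finally {
            lock.unlock();
        }
    }

    int get(String term) {
        lock.lock();
        try {
            return counts.getOrDefault(term, 0);
        } finally {
            lock.unlock();
        }
    }

    // Starts one thread per document and waits for all of them to finish.
    static MutexCountSketch countAll(List<List<String>> docs) {
        MutexCountSketch shared = new MutexCountSketch();
        List<Thread> threads = new ArrayList<>();
        for (List<String> doc : docs) {
            Thread t = new Thread(() -> doc.forEach(shared::add));
            threads.add(t);
            t.start();
        }
        for (Thread t : threads) {
            try {
                t.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException("interrupted while counting", e);
            }
        }
        return shared;
    }
}
```

Without the lock, concurrent `merge` calls on a plain `HashMap` could lose updates or corrupt the map.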

Semaphore
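
As a sketch of the semaphore technique, the example below limits how many worker threads may process documents at the same time. It assumes a fixed number of permits and one thread per document. The class `SemaphoreSketch`, the `processAll` method, and its parameters are hypothetical and are not drawn from this repository.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Semaphore;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical illustration; not the repository's implementation.
class SemaphoreSketch {

    // Adds every document's length to totalTerms and returns the highest
    // number of workers that were inside the guarded section at once.
    static int processAll(List<List<String>> docs, int permits, AtomicInteger totalTerms) {
        Semaphore slots = new Semaphore(permits);
        AtomicInteger active = new AtomicInteger();
        AtomicInteger peak = new AtomicInteger();
        List<Thread> threads = new ArrayList<>();
        for (List<String> doc : docs) {
            Thread t = new Thread(() -> {
                try {
                    slots.acquire(); // blocks while all permits are taken
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    return;
                }
                try {
                    int now = active.incrementAndGet();
                    peak.accumulateAndGet(now, Math::max);
                    totalTerms.addAndGet(doc.size());
                } finally {
                    active.decrementAndGet();
                    slots.release();
                }
            });
            threads.add(t);
            t.start();
        }
        for (Thread t : threads) {
            try {
                t.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException("interrupted while waiting", e);
            }
        }
        return peak.get();
    }
}
```

Unlike a mutex, a semaphore with more than one permit lets a bounded group of threads work in parallel, which can be useful for capping the number of files open at once.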

Fork Join
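
To show what the fork/join approach can look like, here is a minimal sketch that computes a document frequency (the number of documents containing a term) by recursively splitting the document list with a `RecursiveTask`. The class `DocFrequencyTask`, the `THRESHOLD` value, and the helper `documentFrequency` are hypothetical and are not taken from this repository.

```java
import java.util.List;
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.RecursiveTask;

// Hypothetical illustration; not the repository's implementation.
class DocFrequencyTask extends RecursiveTask<Integer> {
    private static final int THRESHOLD = 2; // assumed cut-off for direct counting
    private final List<List<String>> docs;
    private final String term;

    DocFrequencyTask(List<List<String>> docs, String term) {
        this.docs = docs;
        this.term = term;
    }

    @Override
    protected Integer compute() {
        // Small enough: count sequentially.
        if (docs.size() <= THRESHOLD) {
            int n = 0;
            for (List<String> doc : docs) {
                if (doc.contains(term)) {
                    n++;
                }
            }
            return n;
        }
        // Otherwise split in half, fork one half, and compute the other here.
        int mid = docs.size() / 2;
        DocFrequencyTask left = new DocFrequencyTask(docs.subList(0, mid), term);
        DocFrequencyTask right = new DocFrequencyTask(docs.subList(mid, docs.size()), term);
        left.fork();
        return right.compute() + left.join();
    }

    // Convenience entry point that runs the task on the common pool.
    static int documentFrequency(List<List<String>> docs, String term) {
        return ForkJoinPool.commonPool().invoke(new DocFrequencyTask(docs, term));
    }
}
```

Because each subtask returns its own partial count, no shared mutable state or locking is needed in this variant.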

About

A term frequency-inverse document frequency (TF-IDF) algorithm in Java using concurrent techniques
