Text Mining code using TF-IDF algorithm for finding keywords and Apriori algorithm to produce association rules

MrPatel95/Text-Mining

Text-Mining

This code assigns keywords to documents and finds association rules between words across a database of documents. With small modifications, it could also serve as the basis for a document suggestion system driven by search keywords.

Getting Started

  • Clone this repository
  • Execute textMining.py
  • You will be asked for support and confidence values. Once you enter them, you'll get the association rules as output.
  • That's pretty much it. Good Job!

Prerequisites

Python 3.6 must be installed on your machine.

Running the tests

  • When you execute TextMining.py, it looks for a folder named documentDatabase and reads every .txt file in it. Each text file is treated as a separate document. Because the input must be a database of documents, the documentDatabase folder contains multiple files.
  • Once all the documents are read, they are cleaned by removing stop words, and each remaining word is reduced further by stemming. The list of stop words is in listOfStopWords.txt.
Example of stemming: fill, filled, and filling are all reduced to fill.
  • Each document is then assigned a few keywords using the tf-idf algorithm, and the keywords are written to a file named aprioriInput.txt. Finally, the Apriori algorithm reads aprioriInput.txt and generates association rules based on the minimum support and minimum confidence.
  • Minimum Support: the threshold applied to find all frequent itemsets in the database.
  • Minimum Confidence: the threshold applied to these frequent itemsets in order to form rules.
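
The steps above can be sketched in a few lines of Python. This is a minimal illustration and not the code in this repository: the sample documents, the one-word stop list, and the function names tokenize, tfidf_keywords, and pair_rules are all hypothetical. Stemming is omitted, and rules are limited to pairs of words, whereas a full Apriori implementation grows frequent itemsets level by level. Support is assumed here to be the fraction of documents containing an itemset.

```python
import math
from collections import Counter
from itertools import combinations

# Hypothetical stand-ins for the .txt files in documentDatabase.
DOCS = {
    "doc1": "the cat chased the mouse",
    "doc2": "the cat ate the fish",
    "doc3": "the dog chased the cat",
    "doc4": "the dog ate the bone",
}
STOP_WORDS = {"the"}  # hypothetical stand-in for listOfStopWords.txt


def tokenize(text):
    """Lowercase, split on whitespace, and drop stop words (no stemming)."""
    return [w for w in text.lower().split() if w not in STOP_WORDS]


def tfidf_keywords(docs, top_k=2):
    """Return the top_k highest-scoring tf-idf terms for each document."""
    tokens = {name: tokenize(text) for name, text in docs.items()}
    n_docs = len(tokens)
    # Document frequency: in how many documents each word appears.
    doc_freq = Counter(w for words in tokens.values() for w in set(words))
    keywords = {}
    for name, words in tokens.items():
        counts = Counter(words)
        scores = {
            w: (c / len(words)) * math.log(n_docs / doc_freq[w])
            for w, c in counts.items()
        }
        # Highest score first; ties are broken alphabetically.
        ranked = sorted(scores, key=lambda w: (-scores[w], w))
        keywords[name] = set(ranked[:top_k])
    return keywords


def pair_rules(transactions, min_support, min_confidence):
    """Return rules lhs -> rhs between single words as
    (lhs, rhs, support, confidence) tuples that meet both thresholds."""
    n = len(transactions)
    item_count = Counter(i for t in transactions for i in t)
    pair_count = Counter(
        pair for t in transactions for pair in combinations(sorted(t), 2)
    )
    rules = []
    for (a, b), count in pair_count.items():
        support = count / n  # fraction of documents containing both words
        if support < min_support:
            continue
        for lhs, rhs in ((a, b), (b, a)):
            confidence = count / item_count[lhs]  # estimate of P(rhs | lhs)
            if confidence >= min_confidence:
                rules.append((lhs, rhs, support, confidence))
    return sorted(rules)


if __name__ == "__main__":
    keywords = tfidf_keywords(DOCS)
    print(keywords)
    # Each document's keyword set acts as one transaction.
    print(pair_rules(list(keywords.values()), 0.25, 0.5))
```

In this sketch a word that appears in most documents (such as "cat") gets a low tf-idf score and is not chosen as a keyword, while a word unique to one document scores highest. Raising the minimum support or minimum confidence shrinks the list of rules returned.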

Built With

Fork the repo and try to come up with an optimized version of the algorithm.

Author

Social

It is crucial to stay social ;)
