DamiPayne/Feature-Agglomeration-Clustering

Feature Agglomeration Clustering

Reads CSV text files and uses a TF-IDF vectoriser to turn the sentences into weighted word features, so that similar sentences can be grouped, then uses a hierarchical clustering algorithm to assign the words to n clusters.

I have also included a K-means clustering example for comparison.
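
For comparison, a K-means run over the same kind of TF-IDF matrix might look like the sketch below. It is an illustration under assumptions, not the example shipped in this repository: the sentences are made up, and the n_init and random_state values are arbitrary choices for reproducibility.

```python
from sklearn.cluster import KMeans
from sklearn.feature_extraction.text import TfidfVectorizer

# Made-up input sentences, for illustration only.
sentences = [
    "the cat sat on the mat",
    "a cat chased the mouse",
    "stock prices rose sharply",
    "investors sold stock today",
]

# K-means accepts the sparse TF-IDF matrix directly.
X = TfidfVectorizer(stop_words="english").fit_transform(sentences)

# Partition the rows (sentences) into n_clusters groups.
n_clusters = 2
kmeans = KMeans(n_clusters=n_clusters, n_init=10, random_state=0)
labels = kmeans.fit_predict(X)

for sentence, label in zip(sentences, labels):
    print(int(label), sentence)
```

Unlike feature agglomeration, which groups the word columns, K-means here labels the rows, so each sentence ends up in exactly one cluster.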

Dependencies

NumPy (http://www.numpy.org/)

scikit-learn (http://scikit-learn.org/stable/index.html) (on Windows, you may need to compile NumPy and scikit-learn from source)

Pandas (http://pandas.pydata.org/)

NLTK (http://www.nltk.org/)

Matplotlib (https://github.com/matplotlib/matplotlib)

virtualenv (https://virtualenv.pypa.io/en/stable/) (creating a virtual environment is my preferred method of installing dependencies)

How to use it?

  1. Install dependencies using pip
  2. Start a Python interpreter (python.exe on Windows), then run import nltk followed by nltk.download()
  3. Download the 'stopwords' corpus
  4. Run Cluster.py, then choose the CSV file you want to cluster and the number of clusters
  5. View results
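
Steps 2 and 3 can also be done without the interactive downloader. A minimal sketch, assuming NLTK is installed; ensure_stopwords is a hypothetical helper name, not something from this repository:

```python
import nltk


def ensure_stopwords():
    """Download the 'stopwords' corpus only if it is not already installed."""
    try:
        # Raises LookupError when the corpus is missing from NLTK's data paths.
        nltk.data.find("corpora/stopwords")
    except LookupError:
        nltk.download("stopwords")
```

Calling ensure_stopwords() once before running Cluster.py should be enough; after that, nltk.corpus.stopwords.words("english") returns the English stop-word list.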
