Feature Agglomeration Clustering

Reads text from CSV files, uses a TF-IDF vectoriser to represent the sentences so that similar ones can be grouped, then uses a hierarchical clustering algorithm (feature agglomeration) to assign the words to n clusters.

I have also included a K-means clustering example for comparison.
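For comparison, a K-means pass over the same kind of TF-IDF matrix might look like the sketch below. It assumes the goal is to group sentences (rows) rather than words; the function name kmeans_sentences and the seed parameter are hypothetical, and the included K-means example may do this differently.

```python
from sklearn.cluster import KMeans
from sklearn.feature_extraction.text import TfidfVectorizer


def kmeans_sentences(sentences, n_clusters, seed=0):
    """Return one cluster label per sentence (hypothetical helper)."""
    # KMeans accepts the sparse TF-IDF matrix directly.
    tfidf = TfidfVectorizer().fit_transform(sentences)

    # A fixed random_state makes the sketch repeatable between runs.
    model = KMeans(n_clusters=n_clusters, n_init=10, random_state=seed)
    return model.fit_predict(tfidf).tolist()


if __name__ == "__main__":
    sample = [
        "the cat sat on the mat",
        "dogs chase cats",
        "stock prices rise",
        "markets and stocks fall",
    ]
    print(kmeans_sentences(sample, 2))
```

Unlike the hierarchical approach, K-means needs a random initialisation, so results can vary unless the seed is fixed.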

Dependencies

NumPy (http://www.numpy.org/)

scikit-learn (http://scikit-learn.org/stable/index.html) (on Windows, you may need to compile NumPy and scikit-learn from source if prebuilt wheels are not available for your Python version)

Pandas (http://pandas.pydata.org/)

NLTK (http://www.nltk.org/)

Matplotlib (https://github.com/matplotlib/matplotlib)

virtualenv (https://virtualenv.pypa.io/en/stable/) (creating a virtual environment is my preferred method of installing dependencies)

How to use it?

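1. Install the dependencies using pip.
2. Start a Python interpreter (python.exe on Windows), then run import nltk followed by nltk.download().
3. In the downloader, download the 'stopwords' corpus.
4. Run AGG Cluster.py, then choose the CSV file you want to cluster and the number of clusters.
5. View the results.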
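The setup steps above can also be scripted. The sketch below is an assumption-laden illustration, not the script's own code: load_stopwords, read_sentences and strip_stopwords are hypothetical helper names, the CSV is assumed to have no header row with the text in its first column, and load_stopwords needs network access the first time it runs.

```python
import pandas as pd


def load_stopwords(language="english"):
    """Download the NLTK 'stopwords' corpus if needed and return it as a set."""
    import nltk
    from nltk.corpus import stopwords

    # Non-interactive equivalent of picking 'stopwords' in nltk.download().
    nltk.download("stopwords", quiet=True)
    return set(stopwords.words(language))


def read_sentences(csv_source, column=0):
    """Read one text column from a CSV file (assumed to have no header row)."""
    frame = pd.read_csv(csv_source, header=None)
    return frame[column].dropna().astype(str).tolist()


def strip_stopwords(sentence, stop_words):
    """Drop any whitespace-separated word found in stop_words (case-insensitive)."""
    kept = [word for word in sentence.split() if word.lower() not in stop_words]
    return " ".join(kept)
```

A typical call would be read_sentences("my_data.csv") followed by strip_stopwords(sentence, load_stopwords()) for each sentence, where my_data.csv stands in for whichever file you choose.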

DamiPayne/Feature-Agglomeration-Clustering