cjdd3b/car-datascience-toolkit

CAR Data Science Toolkit

The CAR data science toolkit is a collection of common data science tools and algorithms, implemented and documented as simply as possible so that data journalists can learn from and understand them.

Tools currently implemented include:

  • Clustering algorithms: DBSCAN; k-means clustering
  • Classification: Naive Bayes classifier; k-nearest neighbors
  • Similarity metrics: Euclidean distance; Jaccard similarity; cosine similarity; Pearson similarity; Hamming distance
  • MapReduce: a workflow that calculates pairwise document similarity based on TF-IDF weights
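
To give a feel for the similarity metrics listed above, here is a minimal sketch in plain Python. It is not taken from the toolkit's source: the function names and signatures are hypothetical, chosen only for illustration, and the repository's own implementations may differ in naming, input handling, and edge cases.

```python
import math


def euclidean_distance(a, b):
    """Straight-line distance between two equal-length numeric vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def jaccard_similarity(a, b):
    """Size of the intersection divided by size of the union of two sets."""
    a, b = set(a), set(b)
    if not a and not b:
        return 1.0  # assumption: two empty sets are treated as identical
    return len(a & b) / len(a | b)


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length numeric vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # assumption: a zero vector is similar to nothing
    return dot / (norm_a * norm_b)


def pearson_similarity(a, b):
    """Pearson correlation, computed as cosine similarity of mean-centered vectors."""
    mean_a = sum(a) / len(a)
    mean_b = sum(b) / len(b)
    centered_a = [x - mean_a for x in a]
    centered_b = [y - mean_b for y in b]
    return cosine_similarity(centered_a, centered_b)


def hamming_distance(a, b):
    """Number of positions at which two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must have the same length")
    return sum(x != y for x, y in zip(a, b))
```

For example, `hamming_distance("karolin", "kathrin")` counts the three positions where the two strings differ, and `jaccard_similarity({"a", "b"}, {"b", "c"})` compares one shared element against three distinct elements overall. Euclidean and Hamming are distances (smaller means more alike), while Jaccard, cosine, and Pearson are similarities (larger means more alike).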

About

Simple implementations of data science tools for use by newspaper reporters.
