timasjov/scala-spark-clustering

Spark clustering algorithms

Implementation of the DBSCAN and K-means clustering algorithms in Scala using the Spark framework. The algorithms handle only two-dimensional (x and y) data.

DBSCAN

Program arguments: <input_file> <min_points_in_cluster> <epsilon>
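
To illustrate how the two numeric arguments shape the result, here is a minimal sketch of DBSCAN over two-dimensional points. It uses plain Scala collections rather than Spark RDDs so it can run without a cluster, and it is not the code from this repository. The names DbscanSketch, dbscan, minPoints and Noise are hypothetical, and it assumes that min_points_in_cluster is the neighbour count (the point itself included) needed for a point to be a core point and that epsilon is a Euclidean radius.

```scala
import scala.collection.mutable

object DbscanSketch {
  type Point = (Double, Double)

  // Label given to points that belong to no cluster.
  val Noise = -1
  private val Unvisited = -2

  def distance(a: Point, b: Point): Double =
    math.hypot(a._1 - b._1, a._2 - b._2)

  // Returns one label per input point: a cluster id (0, 1, ...) or Noise.
  // Assumption: a point is a core point when at least `minPoints` points
  // (itself included) lie within `epsilon` of it.
  def dbscan(points: Vector[Point], minPoints: Int, epsilon: Double): Vector[Int] = {
    val labels = Array.fill(points.length)(Unvisited)

    // Brute-force neighbourhood query, O(n) per call.
    def neighbours(i: Int): Seq[Int] =
      points.indices.filter(j => distance(points(i), points(j)) <= epsilon)

    var clusterId = 0
    for (i <- points.indices) {
      if (labels(i) == Unvisited) {
        val seeds = neighbours(i)
        if (seeds.size < minPoints) {
          labels(i) = Noise // may later be relabelled as a border point
        } else {
          labels(i) = clusterId
          val queue = mutable.Queue(seeds: _*)
          while (queue.nonEmpty) {
            val j = queue.dequeue()
            if (labels(j) == Noise) {
              labels(j) = clusterId // border point: reachable but not core
            } else if (labels(j) == Unvisited) {
              labels(j) = clusterId
              val reachable = neighbours(j)
              // Only core points extend the cluster further.
              if (reachable.size >= minPoints) queue ++= reachable
            }
          }
          clusterId += 1
        }
      }
    }
    labels.toVector
  }
}
```

Calling DbscanSketch.dbscan(points, 3, 1.5) labels each point with its cluster id. Raising minPoints or lowering epsilon turns more points into noise.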

KMeans

Program arguments: <input_file> <number_of_clusters> <converge_distance>
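
The role of number_of_clusters and converge_distance can be sketched in the same way. The example below is a plain-Scala illustration of Lloyd's K-means iteration, not the repository's Spark implementation. KMeansSketch and its members are hypothetical names. It assumes that the first k points serve as initial centres and that iteration stops once the summed movement of all centres is no greater than convergeDistance; the actual program may initialise and measure convergence differently.

```scala
object KMeansSketch {
  type Point = (Double, Double)

  def distance(a: Point, b: Point): Double =
    math.hypot(a._1 - b._1, a._2 - b._2)

  // Index of the centre nearest to p (ties go to the lowest index).
  def closest(p: Point, centres: Vector[Point]): Int =
    centres.indices.minBy(i => distance(p, centres(i)))

  // Coordinate-wise mean of a non-empty group of points.
  def mean(ps: Vector[Point]): Point =
    (ps.map(_._1).sum / ps.size, ps.map(_._2).sum / ps.size)

  // Returns the final cluster centres.
  def kMeans(points: Vector[Point], k: Int, convergeDistance: Double): Vector[Point] = {
    require(k > 0 && k <= points.size, "k must be between 1 and the number of points")
    require(convergeDistance >= 0.0, "convergeDistance must be non-negative")

    // Assumption: deterministic initialisation from the first k points.
    var centres = points.take(k)
    var moved = Double.MaxValue

    while (moved > convergeDistance) {
      // Assignment step: group points by their nearest centre.
      val groups = points.groupBy(p => closest(p, centres))
      // Update step: move each centre to the mean of its group;
      // a centre with no assigned points stays where it is.
      val updated = centres.indices
        .map(i => groups.get(i).map(mean).getOrElse(centres(i)))
        .toVector
      // Assumption: convergence is measured as total centre movement.
      moved = centres.zip(updated).map { case (a, b) => distance(a, b) }.sum
      centres = updated
    }
    centres
  }
}
```

A smaller convergeDistance lets the loop run until the centres have almost stopped moving, at the cost of more iterations.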

Dataset

A sample dataset file, data.txt, is included.

Running

  • When launching on a cluster, refer to the official Spark documentation.
  • To run on a local machine, use the -Dspark.master=local VM option.
