GitHub - gabrielspmoreira/kmeans_mapreduce_thunders: K-means on MapReduce implementation (Python) for clustering thunder locations in the Southern Hemisphere

This is a Python implementation of the K-Means clustering algorithm using the MapReduce paradigm. It is customized to process a dataset of thunders (lightning strike locations) extracted from STARNET (Sferics Timing And Ranging NETwork).
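To illustrate how K-Means maps onto MapReduce, here is a minimal in-memory sketch. It is not the repository's code. Each iteration is one job: the map step assigns every point to its nearest centroid, the reduce step averages each cluster's points into a new centroid, and a driver loop repeats the job until the centroids stop moving. The sketch assumes points are `(latitude, longitude)` tuples and uses plain Euclidean distance on the raw coordinates, which may differ from the distance the repository uses. All names (`nearest`, `map_phase`, `reduce_phase`, `kmeans`, `max_iter`, `tol`) are hypothetical.

```python
import math
from collections import defaultdict

# Assumption: a point is a (latitude, longitude) pair.
Point = tuple[float, float]


def nearest(point: Point, centroids: list[Point]) -> int:
    """Index of the closest centroid (Euclidean distance on raw coordinates)."""
    return min(range(len(centroids)), key=lambda j: math.dist(point, centroids[j]))


def map_phase(points, centroids):
    """Map: emit (cluster_id, (point, count)) for every point."""
    for p in points:
        yield nearest(p, centroids), (p, 1)


def reduce_phase(pairs):
    """Reduce: sum coordinates and counts per cluster, then average them."""
    sums = defaultdict(lambda: [0.0, 0.0, 0])
    for cid, ((lat, lon), n) in pairs:
        s = sums[cid]
        s[0] += lat
        s[1] += lon
        s[2] += n
    return {cid: (s[0] / s[2], s[1] / s[2]) for cid, s in sums.items()}


def kmeans(points, centroids, max_iter=20, tol=1e-6):
    """Driver: run one map/reduce round per iteration until centroids settle."""
    centroids = list(centroids)
    for _ in range(max_iter):
        new = reduce_phase(map_phase(points, centroids))
        # A cluster that received no points keeps its previous centroid.
        updated = [new.get(j, c) for j, c in enumerate(centroids)]
        shift = max(math.dist(a, b) for a, b in zip(centroids, updated))
        centroids = updated
        if shift < tol:
            break
    return centroids
```

Carrying a count next to each point lets a combiner pre-aggregate partial sums on the map side, because the reducer adds up coordinates and counts rather than assuming one point per value.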

The src/ folder contains the mapper and reducer scripts, which can be run in a Hadoop environment. There is also a script to run the Hadoop job on Amazon Elastic MapReduce (run_kmeans_emr.py).
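Mapper and reducer scripts for a Hadoop environment usually follow the Hadoop Streaming text protocol. The mapper reads input lines and writes `key<TAB>value` lines. Hadoop then sorts those lines by key, so every value for a key reaches the reducer consecutively. The sketch below illustrates that protocol for one K-Means iteration. It is not the repository's scripts. The input format (`lat,lon` per line), the centroids file name `centroids.txt`, and the `map`/`reduce` command-line modes are assumptions made for illustration.

```python
import sys
from itertools import groupby


def load_centroids(lines):
    """Parse 'lat,lon' lines (assumed format) into (lat, lon) tuples."""
    return [tuple(map(float, ln.split(","))) for ln in lines if ln.strip()]


def mapper(lines, centroids):
    """Emit 'cluster_id<TAB>lat,lon,1' for each input point."""
    for ln in lines:
        if not ln.strip():
            continue
        lat, lon = map(float, ln.split(",")[:2])
        # Squared Euclidean distance is enough to pick the nearest centroid.
        cid = min(
            range(len(centroids)),
            key=lambda j: (lat - centroids[j][0]) ** 2 + (lon - centroids[j][1]) ** 2,
        )
        yield f"{cid}\t{lat},{lon},1"


def reducer(lines):
    """Average each cluster's points; relies on input being sorted by key."""
    pairs = (ln.rstrip("\n").split("\t", 1) for ln in lines if ln.strip())
    for cid, group in groupby(pairs, key=lambda kv: kv[0]):
        s_lat = s_lon = 0.0
        n = 0
        for _, value in group:
            lat, lon, cnt = value.split(",")
            s_lat += float(lat)
            s_lon += float(lon)
            n += int(cnt)
        yield f"{cid}\t{s_lat / n},{s_lon / n}"


def main(argv):
    """Hypothetical entry point: 'map' or 'reduce' mode; does nothing otherwise."""
    if argv[:1] == ["map"]:
        # Assumes centroids.txt was shipped to the task's working directory.
        with open("centroids.txt") as f:
            out = mapper(sys.stdin, load_centroids(f))
    elif argv[:1] == ["reduce"]:
        out = reducer(sys.stdin)
    else:
        return
    for line in out:
        print(line)


if __name__ == "__main__":
    main(sys.argv[1:])
```

This reducer emits averages rather than partial sums, so it should not be reused as a combiner. A combiner would need to output summed coordinates together with their count.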
The data/ folder contains a sample of the thunders that struck on February 28, 2014.
The results/ folder contains CSV datasets: one file with all thunders of that date, and the resulting clusters (for k=10 and k=50).

The pictures below, generated with ArcGIS Desktop, show a distribution map of the thunders and a heat map of their concentration.

The following maps show the generated cluster distribution (for k=10 and k=50) against the heat map. They show that the K-Means algorithm worked as expected: it minimizes the distance between each data point and the centroid of the cluster it is assigned to.
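Formally, K-Means minimizes the within-cluster sum of squared errors, J = Σ_j Σ_{x ∈ C_j} ‖x − μ_j‖², where μ_j is the centroid of cluster C_j. A small helper like the hypothetical `sse` below computes J for a given set of centroids, which gives a number to go with the visual comparison. As in the earlier assumptions, it treats points as coordinate tuples and uses Euclidean distance. The optimal J can only decrease as k grows, so a lower value for k=50 than for k=10 is expected and does not by itself mean that k=50 is the better choice.

```python
import math


def sse(points, centroids):
    """Within-cluster sum of squared errors: each point contributes the
    squared Euclidean distance to its nearest centroid."""
    return sum(min(math.dist(p, c) ** 2 for c in centroids) for p in points)
```

For example, `sse([(0, 0), (2, 0), (10, 0), (12, 0)], [(1, 0), (11, 0)])` returns `4.0`. A single centroid at `(6, 0)` gives `104.0` for the same points.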

A detailed description of the problem and this implementation is available in this post.
