Clustering via Dirichlet Process Mixture Models

ekinakyurek/DPMM.jl

DPMM.jl

This repository contains research code for parallel Dirichlet process mixture models and clustering in Julia, written by Ekin Akyürek under the supervision of John W. Fischer III.

Getting Started

Demo:

  using DPMM
  gm = GridMixture(2)
  X, clabels = rand_with_label(gm, 100000) # draw 100000 samples together with their true labels
  fit(X; ncpu=3) # runs parallel split-merge algorithm

Visual Demo (requires OpenGL):

  gm = GridMixture(2)
  X, clabels = rand_with_label(gm,100000)
  scene = setup_scene(X)
  fit(X; ncpu=3, scene=scene) # visualize parallel split-merge algorithm

For details, please see the function documentation.

Algorithms

  1. Collapsed Gibbs Sampler
     labels = fit(X; algorithm=CollapsedAlgorithm) # serial collapsed
  2. Quasi-Collapsed Gibbs Sampler
     labels = fit(X; algorithm=CollapsedAlgorithm, quasi=true) # quasi & serial collapsed
     labels = fit(X; algorithm=CollapsedAlgorithm, quasi=true, ncpu=4) # quasi & parallel collapsed
  3. Direct Gibbs Sampler
     labels = fit(X; algorithm=DirectAlgorithm) # serial direct
     labels = fit(X; algorithm=DirectAlgorithm, ncpu=4) # parallel direct
  4. Quasi-Direct Gibbs Sampler
     labels = fit(X; algorithm=DirectAlgorithm, quasi=true) # quasi & serial direct
     labels = fit(X; algorithm=DirectAlgorithm, quasi=true, ncpu=4) # quasi & parallel direct
  5. Split-Merge Gibbs Sampler
     labels = fit(X; algorithm=SplitMergeAlgorithm) # serial split-merge
     labels = fit(X; algorithm=SplitMergeAlgorithm, ncpu=4) # parallel split-merge
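
Since the demo also returns ground-truth labels (clabels), one way to sanity-check any of the samplers above is to score the predicted labels against them. The sketch below computes cluster purity using only Base Julia. It is not part of DPMM.jl: the function name purity is hypothetical, and it assumes that both fit and rand_with_label yield plain integer label vectors with one entry per data point, which the README does not state. The small pred and truth vectors are stand-ins for labels and clabels.

```julia
# Purity: for each predicted cluster, count the points that carry its most
# common true label, then divide the total by the number of points.
# NOTE: hypothetical helper, not part of DPMM.jl.
function purity(pred::AbstractVector{<:Integer}, truth::AbstractVector{<:Integer})
    length(pred) == length(truth) ||
        throw(ArgumentError("label vectors must have equal length"))

    # counts[(p, t)] = number of points with predicted label p and true label t
    counts = Dict{Tuple{Int,Int},Int}()
    for (p, t) in zip(pred, truth)
        counts[(p, t)] = get(counts, (p, t), 0) + 1
    end

    # For each predicted cluster, keep the size of its largest true-label group.
    best = Dict{Int,Int}()
    for ((p, _), c) in counts
        best[p] = max(get(best, p, 0), c)
    end

    return sum(values(best)) / length(pred)
end

# Stand-in vectors; in practice these would be `labels` and `clabels`
# (assuming both are integer vectors of the same length).
pred  = [1, 1, 1, 2, 2, 3]
truth = [1, 1, 2, 2, 2, 2]
println("purity = ", purity(pred, truth))
```

Purity does not depend on how cluster labels are numbered, which matters here because a sampler has no reason to reuse the numbering of the generating mixture. It does reward over-splitting (every point in its own cluster scores 1.0), so it is best read alongside the number of clusters found.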

Parallel Benchmarking

Run the command below:

julia --project test/parallel_benchmark.jl  --N 1000000 --K 6 --Kinit 1 --ncpu 4
  • Results-I: Time (sec) to run 100 DP-GMM iterations for d=2, N=1e6, K=6.

| Code | ncpu=1 | ncpu=2 | ncpu=4 | ncpu=8 |
|---|---|---|---|---|
| C++ | 76.94 | 40.57 | 22.23 | 13.01 |
| DPMM.jl | 75.71 | 41.54 | 20.86 | 12.77 |
| Julia-BNP | 1101.97 | 572.50 | 345.58 | 172.30 |

  • Results-II: Time (sec) to run 100 DP-MNMM iterations for d=100, N=1e6, K=6.

| Code | ncpu=1 | ncpu=2 | ncpu=4 | ncpu=8 |
|---|---|---|---|---|
| C++ | 134.25 | 77.55 | 40.97 | 23.60 |
| DPMM.jl | 113.131 | 68.46 | 45.55 | 30.79 |
| Julia-BNP | 234.40 | 136.43 | 87.34 | 55.10 |
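
The timings above are easier to compare as speedup and parallel efficiency. The following is a minimal sketch in Base Julia that derives both from the Results-I numbers, copied verbatim from the table. The helper names speedup and efficiency are illustrative and are not part of DPMM.jl or the benchmark script.

```julia
# Wall-clock times (sec) copied from Results-I (DP-GMM, d=2, N=1e6, K=6).
ncpus = [1, 2, 4, 8]
times = Dict(
    "C++"       => [76.94, 40.57, 22.23, 13.01],
    "DPMM.jl"   => [75.71, 41.54, 20.86, 12.77],
    "Julia-BNP" => [1101.97, 572.50, 345.58, 172.30],
)

# Speedup relative to the single-worker run: S(p) = T(1) / T(p).
speedup(t) = t[1] ./ t

# Parallel efficiency: E(p) = S(p) / p, where 1.0 means perfect linear scaling.
efficiency(t, p) = speedup(t) ./ p

for (name, t) in times
    println(rpad(name, 10),
            " speedup: ", round.(speedup(t); digits=2),
            "  efficiency: ", round.(efficiency(t, ncpus); digits=2))
end
```

For instance, the DPMM.jl row gives a speedup of about 5.9 at ncpu=8 relative to ncpu=1. The same two helpers can be applied to the Results-II rows by swapping in those times.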
