Skip to content
/ CLICKS Public

CLICKS finds subspace clusters in categorical data using a k-partite clique mining approach.

License

Notifications You must be signed in to change notification settings

zakimjz/CLICKS

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

2 Commits
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

CLICKS Categorical Subspace Clustering Algorithm

CLICKS finds subspace clusters in categorical data using a k-partite clique mining approach.

Mohammed J. Zaki, Markus Peters, Ira Assent, and Thomas Seidl. CLICKS: an effective algorithm for mining subspace clusters in categorical datasets. Data and Knowledge Engineering, 60(1):51–70, January 2007. special issue on Intelligent Data Mining. doi:10.1016/j.datak.2006.01.005.

Mohammed J. Zaki, Markus Peters, Ira Assent, and Thomas Seidl. CLICKS: an effective algorithm for mining subspace clusters in categorical datasets. In 11th ACM SIGKDD International Conference onKnowledge Discovery and Data Mining. August 2005.

HOW TO

cd click/code and run make for kcdriver

cd convert and run make for mconvert

Here's a sample session for the

mushroom dataset

1.) mconvert CSVTOCLICK mush.data mush.confusion mush.mapping 23 0 0 > mush.click

    -> Converts mush.data (the space-separated data) to mush.click. 
    -> mush.confusion is a file where mconvert writes - for each line in
    mush.data - an integer class label.
    -> mush.mapping is a file where mconvert saves the mapping between
    integer classes (which are used by click) and the literal classes in the
    actual data file.
    -> 23: total number of columns in mush.data
    -> 0: Index of the label column (used to derive the class for
    mush.confusion)
    -> 0: Omit column 0 in the output data file

2.) ../../code/kcdriver mush.click 0.1 0.1 click.bench SUB CONFUSION MAP mush.mapping

    -> Runs CLICK(S) with alpha = 0.1 and minsup = 0.1, benchmark results
    are appended to click.bench, subspace clusters are enabled
    -> A confusion file is created in click_confusion.txt
    -> CLICKS uses the mapping information contained in mush.mapping (see
    step 1) to output clusters in terms of actual classes (from mush.data)
    instead of integer class labels

NOTE: just type ./kcdriver for all the options. The code allows you to mine clusters using alpha*expectation or you can also treat alpha as just a minsup value (between [0,1]).

3.) mconvert CONFUSION mush.confusion click_confusion.txt dummymap.txt

    -> Output confusion matrix in LaTeX table format. The first file
    (mush.confusion) contains the actual classes, the second one the classes
    derived by CLICKS.
    -> dummymap.txt is an empty file here. You can use is to map the integer
    class labels between the two confusion files (e.g. class 1 in
    mush.confusion corresponds to class 0 in click_confusion, ...)

More infos on the two commands can be found in the appendices of the thesis, which is attached in the most recent version.

Releases

No releases published

Packages

No packages published

Languages