Implementation of CMI

Title: CMI: An Information-Theoretic Contrast Measure for Enhancing Subspace Cluster and Outlier Detection
Authors: Hoang Vu Nguyen, Emmanuel Müller, Jilles Vreeken, Fabian Keller, Klemens Böhm
 
This page provides information on how to run the implementation of our CMI algorithm.
 
To run CMI, pass the following parameters on the command line:

Parameter             Meaning
-FILE_INPUT           name of the input file
-FILE_SUB_OUTPUT      name of the output file for subspaces
-FILE_OSCORES_OUTPUT  name of the output file for outlier scores
-NUM_ROWS             number of records
-NUM_MEASURE_COLS     number of columns
-FIELD_DELIMITER      field delimiter of the input file
-NUM_NEIGHBORS        number of nearest neighbors
-USE_DUSO_SEED        set to 'true' to use CMIC
-MAX_NUM_SUBSPACES    number of top subspaces used
-ALPHA                subsample size
-NUM_SUBSAMPLING      number of subsamples
-NUM_SEEDS            number of clusters
-CANDIDATE_CUTOFF     beam size
-MIN_PTS              MinPts parameter of the clustering step
-EPSILON              epsilon parameter of the clustering step
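The values of -NUM_ROWS and -NUM_MEASURE_COLS have to match the input file. The Java sketch below is a hypothetical helper (InputShape is not part of cmi.jar) that counts them. It assumes a plain delimited text file with one record per line, no header row, and the same number of fields on every line; whether extra columns such as a class label count toward -NUM_MEASURE_COLS is not stated on this page.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.regex.Pattern;

/**
 * Hypothetical helper (not part of cmi.jar): derives candidate values for
 * -NUM_ROWS and -NUM_MEASURE_COLS from a delimited text file.
 * Assumes one record per line, no header row, equal field counts per line.
 */
public class InputShape {

    /** Returns {numRows, numCols}; throws if records have differing field counts. */
    public static int[] shape(Path file, String delimiter) throws IOException {
        // Quote the delimiter so characters such as "|" or "." are matched literally.
        Pattern sep = Pattern.compile(Pattern.quote(delimiter));
        int rows = 0;
        int cols = -1;
        for (String line : Files.readAllLines(file)) {
            if (line.isBlank()) {
                continue; // ignore empty lines
            }
            // Limit -1 keeps trailing empty fields, so "1,2," counts as 3 fields.
            int fields = sep.split(line, -1).length;
            if (cols == -1) {
                cols = fields;
            } else if (fields != cols) {
                throw new IllegalArgumentException(
                        "record " + (rows + 1) + " has " + fields + " fields, expected " + cols);
            }
            rows++;
        }
        return new int[] {rows, Math.max(cols, 0)};
    }

    public static void main(String[] args) throws IOException {
        // Usage: java InputShape <file> [delimiter]
        int[] s = shape(Path.of(args[0]), args.length > 1 ? args[1] : ",");
        System.out.println("-NUM_ROWS " + s[0] + " -NUM_MEASURE_COLS " + s[1]);
    }
}
```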
 

Default Parameter Settings

  • CMI:
    • Beam size M = 400 (set M = 32 for data sets with fewer than 10 dimensions)
    • Number of clusters Q = 10
    • Expected subsample size ε = 0.1
    • Number of subspaces = 100
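The defaults above, including the beam-size rule, can be turned into a command line as in the following Java sketch. It is hypothetical and rests on assumptions this page does not confirm: that cmi.jar runs with "java -jar", that each flag is followed by its value as a separate argument, and that -CANDIDATE_CUTOFF is the beam size M, -NUM_SEEDS is Q, -ALPHA is the expected subsample size ε, and -MAX_NUM_SUBSPACES is the number of subspaces. The output-file flags and the remaining parameters still have to be added.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical sketch: assembles a CMI command line using the default settings.
 * ASSUMPTIONS: "java -jar cmi.jar" works, flags take their value as the next
 * argument, and the flag-to-parameter mapping is the one described in the text.
 */
public class CmiCommand {

    /** Beam size M: 400 by default, 32 for data with fewer than 10 dimensions. */
    static int defaultBeamSize(int numDims) {
        return numDims < 10 ? 32 : 400;
    }

    static List<String> build(String input, int numRows, int numCols, String delimiter) {
        List<String> cmd = new ArrayList<>(List.of("java", "-jar", "cmi.jar"));
        addFlag(cmd, "-FILE_INPUT", input);
        addFlag(cmd, "-NUM_ROWS", Integer.toString(numRows));
        addFlag(cmd, "-NUM_MEASURE_COLS", Integer.toString(numCols));
        addFlag(cmd, "-FIELD_DELIMITER", delimiter);
        addFlag(cmd, "-CANDIDATE_CUTOFF", Integer.toString(defaultBeamSize(numCols))); // M
        addFlag(cmd, "-NUM_SEEDS", "10");          // Q, number of clusters
        addFlag(cmd, "-ALPHA", "0.1");             // assumed to be the subsample size ε
        addFlag(cmd, "-MAX_NUM_SUBSPACES", "100"); // number of subspaces
        return cmd;
    }

    private static void addFlag(List<String> cmd, String flag, String value) {
        cmd.add(flag);
        cmd.add(value);
    }

    public static void main(String[] args) {
        // Print the command only; new ProcessBuilder(cmd).inheritIO().start() would run it.
        System.out.println(String.join(" ", build("data.csv", 1000, 8, ",")));
    }
}
```

Building the arguments as a list rather than one string avoids shell quoting problems with delimiters such as ";" or a tab.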
 

Implementation and Code of CMI

We provide the executables and source code of our method in a single file: cmi.jar.
To access the source files, rename cmi.jar to cmi.zip and unzip it (a JAR file is a ZIP archive).
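Because a JAR is a ZIP archive, its entries can also be read directly with the standard java.util.jar.JarFile class, without renaming the file. The sketch below lists the Java source entries; it assumes the sources are stored in cmi.jar as entries ending in ".java".

```java
import java.io.IOException;
import java.util.List;
import java.util.jar.JarFile;
import java.util.stream.Collectors;

/**
 * Lists Java source entries inside a JAR without extracting it.
 * ASSUMPTION: the source files are stored as ".java" entries.
 */
public class ListSources {

    static List<String> javaSources(String jarPath) throws IOException {
        try (JarFile jar = new JarFile(jarPath)) {
            // Collect inside the try block: the stream is invalid once the file is closed.
            return jar.stream()
                    .map(entry -> entry.getName())
                    .filter(name -> name.endsWith(".java"))
                    .sorted()
                    .collect(Collectors.toList());
        }
    }

    public static void main(String[] args) throws IOException {
        String path = args.length > 0 ? args[0] : "cmi.jar";
        javaSources(path).forEach(System.out::println);
    }
}
```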

To evaluate outlier results as in our paper, we provide an additional executable for computing AUC values: run.jar. It takes two input parameters: the input file name and the output file name.
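For readers who want to see what the AUC measure computes, the Java sketch below calculates the ROC AUC of outlier scores against ground-truth labels. It is a generic formulation and not the code inside run.jar, whose input file format is not described here: AUC is the probability that a randomly chosen outlier receives a higher score than a randomly chosen inlier, with ties counted as one half, computed via the Mann-Whitney U statistic on average ranks.

```java
import java.util.Arrays;
import java.util.Comparator;

/**
 * Minimal ROC AUC sketch for outlier scores (not the implementation in run.jar).
 * Uses the Mann-Whitney U statistic; tied scores receive their average rank.
 */
public class OutlierAuc {

    /** scores[i] is the outlier score of record i; isOutlier[i] is its ground-truth label. */
    static double auc(double[] scores, boolean[] isOutlier) {
        int n = scores.length;
        Integer[] idx = new Integer[n];
        for (int i = 0; i < n; i++) idx[i] = i;
        // Sort record indices by ascending score.
        Arrays.sort(idx, Comparator.comparingDouble(i -> scores[i]));

        // Assign 1-based ranks; a run of tied scores shares the average rank of the run.
        double[] rank = new double[n];
        for (int i = 0; i < n; ) {
            int j = i;
            while (j + 1 < n && scores[idx[j + 1]] == scores[idx[i]]) j++;
            double avg = (i + j) / 2.0 + 1.0;
            for (int k = i; k <= j; k++) rank[idx[k]] = avg;
            i = j + 1;
        }

        long pos = 0;
        double rankSum = 0.0;
        for (int i = 0; i < n; i++) {
            if (isOutlier[i]) {
                pos++;
                rankSum += rank[i];
            }
        }
        long neg = n - pos;
        if (pos == 0 || neg == 0) {
            throw new IllegalArgumentException("need at least one outlier and one inlier");
        }
        // U statistic of the outliers, normalized by the number of outlier/inlier pairs.
        double u = rankSum - pos * (pos + 1) / 2.0;
        return u / ((double) pos * neg);
    }

    public static void main(String[] args) {
        double[] scores = {0.9, 0.8, 0.3, 0.1};
        boolean[] labels = {true, false, true, false};
        System.out.println(auc(scores, labels)); // prints 0.75
    }
}
```

In the example, the outlier scored 0.9 beats both inliers and the outlier scored 0.3 beats one of them, giving 3 of 4 pairs ranked correctly.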