SAND

SAND: Semi-Supervised Adaptive Novel Class Detection and Classification over Data Stream

Synopsis

SAND is a semi-supervised framework for classifying evolving data streams. Unlike many existing approaches, it detects concept drift in an unsupervised way, by detecting changes in the classifier's confidence when classifying test instances. It also addresses the concept evolution problem by detecting outliers that have strong cohesion among themselves. Please refer to the paper given below for a detailed description of the approach.

Requirements

SAND requires the following:

The input file must be provided in .arff format.
All features must be numeric. Any non-numeric features should be converted using standard techniques before use with SAND.
Features should be normalized for better performance.
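
The normalization requirement above can be met in several ways, and the README does not say which method the authors used. As one possibility, here is a minimal sketch of min-max scaling of each feature to [0, 1], applied offline before the data is written to .arff. The class name MinMaxNormalizer and the choice of min-max scaling are assumptions made for illustration and are not part of SAND.

```java
import java.util.Arrays;

// Hypothetical helper, not part of SAND: rescales every feature column to [0, 1].
public class MinMaxNormalizer {

    /** Returns a new matrix in which each column is min-max scaled to [0, 1]. */
    public static double[][] normalize(double[][] data) {
        if (data.length == 0) {
            return new double[0][];
        }
        int cols = data[0].length;
        double[] min = new double[cols];
        double[] max = new double[cols];
        Arrays.fill(min, Double.POSITIVE_INFINITY);
        Arrays.fill(max, Double.NEGATIVE_INFINITY);

        // First pass: find the per-column minimum and maximum.
        for (double[] row : data) {
            for (int j = 0; j < cols; j++) {
                min[j] = Math.min(min[j], row[j]);
                max[j] = Math.max(max[j], row[j]);
            }
        }

        // Second pass: rescale. A constant column is mapped to 0.0.
        double[][] out = new double[data.length][cols];
        for (int i = 0; i < data.length; i++) {
            for (int j = 0; j < cols; j++) {
                double range = max[j] - min[j];
                out[i][j] = range == 0.0 ? 0.0 : (data[i][j] - min[j]) / range;
            }
        }
        return out;
    }

    public static void main(String[] args) {
        double[][] raw = { {1.0, 200.0}, {3.0, 400.0}, {5.0, 300.0} };
        // prints [[0.0, 0.0], [0.5, 1.0], [1.0, 0.5]]
        System.out.println(Arrays.deepToString(normalize(raw)));
    }
}
```

This sketch computes the minimum and maximum over the whole dataset, which is only possible when the stream has been recorded in advance.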

Environment

Java SDK v1.7+
Weka 3.6+
Common Math library v2.2
Apache Logging Services v1.2.15

All of the above except the Java SDK are included in the SRC_SAND_v_0_1 and DIST_SAND_v_0_1 folders.

Execution

To execute the program, use the following steps:

Open a command prompt inside the DIST_SAND_v_0_1 folder.
Run the command "java -jar SAND_v_0_1.jar [OPTION(S)]"

Option(s):

-F
- Input file path. Do not include the .arff file extension in the file path.

Optional option(s):

-S
- Size of warm-up period chunks. Default size is 2000 instances.
-L
- Maximum number of models in the ensemble. Default value is 6.
-U
- Value for the confidence threshold. Default value is 0.90. Please refer to the paper for a description of the confidence threshold.
-D
- Use 1 to execute SAND-D, or 0 to execute SAND-F. Default value is 1. Please refer to the paper for a description of SAND-D and SAND-F.
-T
- Labeling delay in number of instances. Default value for classification only is 1. Use an appropriate value for novel class detection.
-C
- Classification delay in number of instances. Default value for classification only is 0. Use an appropriate value for novel class detection.
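
As an illustration of how the options above combine, the following sketch assembles a command line for the jar from a Java test harness. It assumes that each flag and its value are passed as separate tokens (for example "-S 2000"), which the README does not state explicitly. The class name SandLauncher and the input path data/stream.arff are hypothetical. The jar name is the one given in the Execution steps.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of SAND: builds the argument list for the jar.
public class SandLauncher {

    /** Builds the command; assumes flag and value are separate tokens. */
    public static List<String> buildCommand(String inputPath, int warmUpChunkSize,
                                            int maxModels, double confidenceThreshold) {
        // -F expects the path without the .arff extension, so strip it if present.
        String path = inputPath.endsWith(".arff")
                ? inputPath.substring(0, inputPath.length() - ".arff".length())
                : inputPath;

        List<String> cmd = new ArrayList<>();
        cmd.add("java");
        cmd.add("-jar");
        cmd.add("SAND_v_0_1.jar");
        cmd.add("-F");
        cmd.add(path);
        cmd.add("-S");
        cmd.add(Integer.toString(warmUpChunkSize));
        cmd.add("-L");
        cmd.add(Integer.toString(maxModels));
        cmd.add("-U");
        cmd.add(Double.toString(confidenceThreshold));
        return cmd;
    }

    public static void main(String[] args) {
        List<String> cmd = buildCommand("data/stream.arff", 2000, 6, 0.9);
        // prints java -jar SAND_v_0_1.jar -F data/stream -S 2000 -L 6 -U 0.9
        System.out.println(String.join(" ", cmd));
    }
}
```

The resulting list can be handed to java.lang.ProcessBuilder, with its working directory set to the DIST folder, to launch the run.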

Output

Console output

Progress and any change points detected during execution.
At the end, the percentage of labeled data used is reported.

File output

The .log file contains important debug information.
The .tmpres file contains the error rates for each chunk. There are six columns, as follows:

Chunk # = The current chunk number. Each chunk contains 1000 instances.
FP = Number of existing class instances misclassified as novel class in this chunk.
FN = Number of novel class instances misclassified as existing class in this chunk.
NC = Number of novel class instances actually present in this chunk.
Err = Number of instances misclassified (including FP and FN) in this chunk.
GlobErr = Cumulative % Err up to the current chunk.
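
To make the relationship between the Err and GlobErr columns concrete, here is a small sketch of one plausible reading: GlobErr for chunk k is the total number of misclassified instances in chunks 1 through k, divided by the number of instances seen so far, as a percentage. This formula is an interpretation of the column description and is not confirmed by the source. The class name GlobalError is hypothetical.

```java
import java.util.Arrays;

// Hypothetical helper, not part of SAND: recomputes a cumulative error percentage.
public class GlobalError {

    /** Cumulative error percentage after each chunk, assuming equally sized chunks. */
    public static double[] cumulativeErrorPercent(int[] errPerChunk, int chunkSize) {
        double[] out = new double[errPerChunk.length];
        long totalErr = 0;
        for (int k = 0; k < errPerChunk.length; k++) {
            totalErr += errPerChunk[k];
            long seen = (long) chunkSize * (k + 1); // instances processed so far
            out[k] = 100.0 * totalErr / seen;
        }
        return out;
    }

    public static void main(String[] args) {
        int[] err = { 50, 30, 10 }; // made-up Err values for three chunks
        // prints [5.0, 4.0, 3.0]
        System.out.println(Arrays.toString(cumulativeErrorPercent(err, 1000)));
    }
}
```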

The .res file contains the summary result, i.e., the following error rates:

FP% = % of existing class instances misclassified as novel class instances.
FN% = % of novel class instances misclassified as existing class instances.
NC (total) = Total number of (actual) novel class instances.
ERR% = % classification error (including FP, FN, and misclassification within existing classes).
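
The summary rates above can be reproduced from raw counts as in the sketch below. The denominators are a reading of the definitions: FP% is taken over existing class instances (total minus NC), FN% over novel class instances, and ERR% over all instances. The class name SummaryRates and the example counts are hypothetical.

```java
import java.util.Arrays;

// Hypothetical helper, not part of SAND: derives FP%, FN% and ERR% from counts.
public class SummaryRates {

    /** Returns {FP%, FN%, ERR%}; a rate with an empty denominator is reported as 0. */
    public static double[] rates(long fp, long fn, long nc, long err, long total) {
        long existing = total - nc; // instances that belong to existing classes
        double fpPct = existing == 0 ? 0.0 : 100.0 * fp / existing;
        double fnPct = nc == 0 ? 0.0 : 100.0 * fn / nc;
        double errPct = total == 0 ? 0.0 : 100.0 * err / total;
        return new double[] { fpPct, fnPct, errPct };
    }

    public static void main(String[] args) {
        // Made-up counts: 1000 instances, 100 of them novel.
        // prints [2.0, 20.0, 5.0]
        System.out.println(Arrays.toString(rates(18, 20, 100, 50, 1000)));
    }
}
```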

Reference

SAND: Semi-supervised Adaptive Novel Class Detection and Classification over Data Stream
