HamedMP/CompreX

CompreX

Parameter-free anomaly detection using pattern-based compression

Anomaly detection, as an unsupervised learning task, is hard on its own. Categorical data makes it harder, because most algorithms require the categories to be encoded numerically first.

With high-dimensional categorical data, where features can have more than 1 million distinct values (as in my case), encoding them (e.g. one-hot) becomes impractical. While researching the best approaches for this kind of dataset, I found the CompreX approach [1] the most intuitive: it encodes the data using Shannon entropy and the MDL (Minimum Description Length) principle, and records that are expensive to encode are flagged as anomalies.
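To make the entropy/MDL idea concrete, here is a minimal sketch in plain pandas. It is not this repository's API, and the function names (`code_lengths`, `encode_groups`, `total_description_length`) are hypothetical. It assumes each group of features gets one code table whose code lengths are Shannon-optimal (-log2 of the value's frequency), scores each row by its total code length, and uses a crude model cost (spelling out every table entry with per-feature codes) as a stand-in for the paper's model cost. The grouping is fixed by hand here, whereas CompreX searches for feature groupings itself.

```python
import math
from collections import Counter

import pandas as pd


def code_lengths(symbols):
    """Shannon-optimal code length, -log2(p), in bits for each distinct symbol."""
    counts = Counter(symbols)
    total = sum(counts.values())
    return {s: -math.log2(n / total) for s, n in counts.items()}


def encode_groups(df, groups):
    """Bits needed to encode each row, with one code table per feature group."""
    costs = [0.0] * len(df)
    for group in groups:
        # A row's values for the features in one group form a single joint symbol.
        symbols = list(df[list(group)].itertuples(index=False, name=None))
        lengths = code_lengths(symbols)
        for i, s in enumerate(symbols):
            costs[i] += lengths[s]
    return pd.Series(costs, index=df.index)


def total_description_length(df, groups):
    """Crude two-part MDL cost L(M) + L(D|M) for a fixed feature grouping (assumption)."""
    # "Standard" per-feature codes, used to spell out each code-table entry.
    standard = {col: code_lengths(df[col]) for col in df.columns}
    model_bits = 0.0
    for group in groups:
        cols = list(group)
        entries = set(df[cols].itertuples(index=False, name=None))
        for entry in entries:
            model_bits += sum(standard[c][v] for c, v in zip(cols, entry))
    return model_bits + encode_groups(df, groups).sum()


# Toy data: "chrome" and "linux" are each common, but appear together only in row 9.
df = pd.DataFrame({
    "browser": ["chrome"] * 6 + ["firefox"] * 3 + ["chrome"],
    "os": ["windows"] * 6 + ["linux"] * 3 + ["linux"],
})
separate = [("browser",), ("os",)]
joint = [("browser", "os")]

# Higher cost = harder to compress = more anomalous.
print(encode_groups(df, separate).idxmax())  # → 6 (a firefox row)
print(encode_groups(df, joint).idxmax())     # → 9 (chrome on linux)
# MDL keeps the grouping only if it shortens the total description.
print(total_description_length(df, joint) < total_description_length(df, separate))  # → True
```

The toy data shows why pattern-based code tables matter: encoded feature by feature, row 9 looks ordinary because each of its values is frequent, but once the two correlated features share one code table, its rare combination becomes the most expensive record to encode.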

Dependencies

  1. NumPy
  2. pandas
  3. scikit-learn

Roadmap

  • Initial implementation, v1.0.0
  • Add tests
  • Add docstrings
  • Publish documentation
  • Fix scikit-learn-style API compatibility issues
  • Upload to PyPI

Reference

[1] L. Akoglu, H. Tong, J. Vreeken, and C. Faloutsos. Fast and reliable anomaly detection in categorical data. 2012.
