Anomaly-Detection-in-Logs

Anomaly (outlier) detection has gained traction recently, owing to its applications in cybersecurity and server monitoring. This repo explores how to build count vectors from logs and identify anomalies through unsupervised learning.

About Dataset

The dataset consists of log data from a remote server, generated over 15 days. It was created after cleaning the raw logs and keeping only the relevant events on which we wish to identify anomalies.

Columns:

Timestamp of the log
Unique identifier of the request
User IP from which the request is made

Approach

We create profiles for each user IP over a certain time period, which can vary from a few hours to a few weeks. Depending on the use case, a profile can include basic count vectors, such as total counts and average counts per unit of time (hour/day/week), or more complex network-call vectors, such as the upload/download ratio.

In this repo we use basic count and frequency vectors. With profiles in hand, we can use ML algorithms to identify anomalies.
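
As a rough illustration of the profiling step, the sketch below aggregates raw log rows into one row of count and frequency features per IP using pandas. The column names (`timestamp`, `request_id`, `user_ip`), the helper `build_ip_profiles`, and the chosen features are assumptions made for this example; the actual CSV headers and the features used in the notebook may differ.

```python
import pandas as pd


def build_ip_profiles(logs: pd.DataFrame) -> pd.DataFrame:
    """Aggregate raw log rows into one count/frequency profile per IP.

    Assumes hypothetical columns: timestamp, request_id, user_ip.
    """
    logs = logs.copy()
    logs["timestamp"] = pd.to_datetime(logs["timestamp"])
    logs["day"] = logs["timestamp"].dt.date

    # Number of requests each IP made on each day it was active.
    daily = (
        logs.groupby(["user_ip", "day"])
        .size()
        .rename("requests")
        .reset_index()
    )

    # Collapse the daily counts into one feature row per IP.
    return daily.groupby("user_ip")["requests"].agg(
        total_requests="sum",
        active_days="count",
        avg_daily_requests="mean",
        max_daily_requests="max",
    )


# Toy data standing in for the real log file.
logs = pd.DataFrame(
    {
        "timestamp": [
            "2021-01-01 10:00:00",
            "2021-01-01 11:00:00",
            "2021-01-02 09:30:00",
            "2021-01-01 10:05:00",
        ],
        "request_id": ["r1", "r2", "r3", "r4"],
        "user_ip": ["10.0.0.1", "10.0.0.1", "10.0.0.1", "10.0.0.2"],
    }
)
profiles = build_ip_profiles(logs)
print(profiles)
```

The same pattern extends to hourly or weekly windows by changing how the `day` column is derived.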

ML Approach

Once the feature space is generated, we use k-means to cluster the profiles; the points that are farthest from all cluster centroids combined are considered anomalous. In this repo, the anomaly score is the sum of squared distances from a point to the centroids. We use squared rather than absolute distance so that outliers are weighted more heavily than other points (similar to choosing MSE over MAE).
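
The scoring idea can be sketched with scikit-learn as follows. This is a minimal example under stated assumptions, not the notebook's code: the synthetic data, the choice of three clusters, the standardization step, and the helper name `kmeans_anomaly_scores` are all introduced here for illustration.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler


def kmeans_anomaly_scores(features, n_clusters=3, random_state=0):
    """Score each row by its summed squared distance to all centroids."""
    # Assumption: features are standardized so no single column dominates.
    scaled = StandardScaler().fit_transform(features)
    model = KMeans(n_clusters=n_clusters, n_init=10, random_state=random_state)
    model.fit(scaled)
    # transform() gives the Euclidean distance from each row to each centroid.
    distances = model.transform(scaled)
    return (distances ** 2).sum(axis=1)


# Synthetic profiles: 200 ordinary points plus one far-away point.
rng = np.random.default_rng(0)
normal = rng.normal(loc=0.0, scale=1.0, size=(200, 2))
outlier = np.array([[25.0, 25.0]])
X = np.vstack([normal, outlier])

scores = kmeans_anomaly_scores(X)
most_anomalous = int(np.argmax(scores))  # index of the highest-scoring row
print(most_anomalous, scores[most_anomalous])
```

Because the score sums distances to every centroid, it is sensitive to the number of clusters: with very few clusters, an extreme point can capture a centroid of its own and score lower than expected, so the cluster count is worth checking on real data.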

While Euclidean distance in the feature space is one way to look at the problem, Isolation Forest offers a different approach. Isolation trees count the number of random splits it takes to isolate a given point: the fewer splits required, the more isolated the point is and hence the more anomalous.
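
A minimal sketch of the Isolation Forest alternative, using scikit-learn's `IsolationForest`, is shown below. The synthetic data and the parameter values (`n_estimators=100`, `contamination="auto"`) are assumptions for illustration and are not taken from the notebook.

```python
import numpy as np
from sklearn.ensemble import IsolationForest

# Synthetic profiles: 200 ordinary points plus one far-away point.
rng = np.random.default_rng(0)
normal = rng.normal(loc=0.0, scale=1.0, size=(200, 2))
outlier = np.array([[25.0, 25.0]])
X = np.vstack([normal, outlier])

forest = IsolationForest(
    n_estimators=100,
    contamination="auto",
    random_state=0,
)
forest.fit(X)

# score_samples(): lower values mean shorter average path length,
# i.e. the point was isolated in fewer splits and is more anomalous.
scores = forest.score_samples(X)

# predict(): -1 marks an anomaly, 1 marks an inlier.
labels = forest.predict(X)

most_anomalous = int(np.argmin(scores))
print(most_anomalous, scores[most_anomalous], labels[most_anomalous])
```

Unlike the k-means score, this approach does not depend on a distance metric or on feature scaling, since each split considers one feature at a time.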

chaiitanyasangani88/Anomaly-Detection-in-Logs
