spetrescu/literature-survey-log-parsing

literature-survey-log-parsing

This repository contains all the relevant information referenced in Log Parsing Literature Survey.
To run the experiments, please navigate to Experiments.
To understand more about the search method of the survey, please navigate to Method.

This section contains detailed information regarding the Experiments section of Log Parsing Literature Survey.
Two environments are available for running the experiments: Python 2 and Python 3. Follow the setup that matches the method you want to run.

Results

Each method below has been run 10 times for each of the dataset sizes.
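
The repetition scheme described above can be sketched as a small timing harness. This is a minimal illustration under stated assumptions, not the repository's benchmark code: `toy_parse`, `time_method`, and `benchmark` are hypothetical names, the parser is a trivial stand-in that masks numeric tokens, and the log lines are synthetic. It is written for Python 3.

```python
import statistics
import time

# A few of the dataset sizes listed in this README (number of log lines).
SIZES = [1_000, 2_000, 4_000, 10_000, 20_000]
RUNS = 10  # each method is run 10 times per dataset size


def toy_parse(lines):
    """Hypothetical stand-in for a real parser: masks digit-only tokens."""
    templates = set()
    for line in lines:
        tokens = ["<*>" if tok.isdigit() else tok for tok in line.split()]
        templates.add(" ".join(tokens))
    return templates


def time_method(parse, lines, runs=RUNS):
    """Return the wall-clock duration in seconds of each of `runs` calls."""
    durations = []
    for _ in range(runs):
        start = time.perf_counter()
        parse(lines)
        durations.append(time.perf_counter() - start)
    return durations


def benchmark(parse, all_lines, sizes=SIZES, runs=RUNS):
    """Map each dataset size to the mean runtime over `runs` repetitions."""
    results = {}
    for size in sizes:
        subset = all_lines[:size]  # first `size` lines of the dataset
        results[size] = statistics.mean(time_method(parse, subset, runs))
    return results


if __name__ == "__main__":
    # Synthetic log lines, used only so the sketch runs without a dataset.
    logs = [f"Received block {i} of size {i * 7} from node" for i in range(20_000)]
    for size, mean_s in benchmark(toy_parse, logs).items():
        print(f"{size:>6} lines: {mean_s:.4f} s (mean of {RUNS} runs)")
```

Averaging over repeated runs smooths out timing noise from the machine; swapping `toy_parse` for a real parser's entry point would keep the same structure.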

Scalability

AEL
  • BGL
    • [1k, ..., 300k] => ALL

      [1k, 2k, 4k, 10k, 20k, 50k, 100k, 200k, 300k]

  • HDFS
    • [1k, ..., 500k] => ALL

      [1k, 2k, 4k, 10k, 20k, 50k, 100k, 200k, 300k, 500k]

  • OpenSSH
    • [1k, ..., 500k] => ALL

      [1k, 2k, 4k, 10k, 20k, 50k, 100k, 200k, 300k, 500k]

  • Thunderbird
    • [1k, ..., 500k] => ALL

      [1k, 2k, 4k, 10k, 20k, 50k, 100k, 200k, 300k, 500k]

  • Windows
    • [1k, ..., 500k] => ALL

      [1k, 2k, 4k, 10k, 20k, 50k, 100k, 200k, 300k, 500k]

Spell
  • BGL
    • [1k, ..., 300k] => NO 10k

      [1k, 2k, 4k, 20k, 50k, 100k, 200k, 300k]

  • HDFS
    • [1k, ..., 1M] => ALL

      [1k, 2k, 4k, 10k, 20k, 50k, 100k, 200k, 300k, 500k, 1M]

  • OpenSSH
    • [1k, ..., 500k] => NO 10k

      [1k, 2k, 4k, 20k, 50k, 100k, 200k, 300k, 500k]

  • Thunderbird
    • [1k, ..., 1M] => ALL

      [1k, 2k, 4k, 10k, 20k, 50k, 100k, 200k, 300k, 500k, 1M]

  • Windows
    • [1k, ..., 20k] => NO 50k

      [1k, 2k, 4k, 10k, 20k]

LogMine
  • Android
    • [1k, ..., 20k] => NO 10k

      [1k, 2k, 4k, 20k]

  • BGL
    • [1k, ..., 20k] => ALL

      [1k, 2k, 4k, 10k, 20k]

  • HDFS
    • [1k, ..., 20k] => ALL

      [1k, 2k, 4k, 10k, 20k]

  • Thunderbird
    • [1k, ..., 20k] => ALL

      [1k, 2k, 4k, 10k, 20k]

  • Windows
    • [1k, ..., 20k] => ALL

      [1k, 2k, 4k, 10k, 20k]

SLCT
  • HDFS
    • [1k, ..., 1M] => ALL

      [1k, 2k, 4k, 10k, 20k, 50k, 100k, 200k, 300k, 500k, 1M]

  • Thunderbird
    • [1k, ..., 20k] => ALL

      [1k, 2k, 4k, 10k, 20k]

  • Windows
    • [1k, ..., 20k] => ALL

      [1k, 2k, 4k, 10k, 20k]

Drain
  • Android
    • [1k, ..., 20k] => ALL

      [1k, 2k, 4k, 10k, 20k]

  • BGL
    • [1k, ..., 300k] => ALL

      [1k, 2k, 4k, 10k, 20k, 50k, 100k, 200k, 300k]

  • HDFS
    • [1k, ..., 1M] => ALL

      [1k, 2k, 4k, 10k, 20k, 50k, 100k, 200k, 300k, 500k, 1M]

  • OpenSSH
    • [1k, ..., 500k] => ALL

      [1k, 2k, 4k, 10k, 20k, 50k, 100k, 200k, 300k, 500k]

  • Thunderbird
    • [1k, ..., 1M] => ALL

      [1k, 2k, 4k, 10k, 20k, 50k, 100k, 200k, 300k, 500k, 1M]

  • Windows
    • [1k, ..., 1M] => ALL

      [1k, 2k, 4k, 10k, 20k, 50k, 100k, 200k, 300k, 500k, 1M]

IPLoM
  • BGL
    • [1k, ..., 300k] => ALL

      [1k, 2k, 4k, 10k, 20k, 50k, 100k, 200k, 300k]

  • HDFS
    • [1k, ..., 1M] => ALL

      [1k, 2k, 4k, 10k, 20k, 50k, 100k, 200k, 300k, 500k, 1M]

  • OpenSSH
    • [1k, ..., 500k] => ALL

      [1k, 2k, 4k, 10k, 20k, 50k, 100k, 200k, 300k, 500k]

  • Thunderbird
    • [1k, ..., 1M] => ALL

      [1k, 2k, 4k, 10k, 20k, 50k, 100k, 200k, 300k, 500k, 1M]

  • Windows
    • [1k, ..., 1M] => ALL

      [1k, 2k, 4k, 10k, 20k, 50k, 100k, 200k, 300k, 500k, 1M]

LenMa
  • Android
    • [1k, ..., 200k] => ALL

      [1k, 2k, 4k, 10k, 20k, 50k, 100k, 200k]

  • BGL
    • [1k, ..., 300k] => ALL

      [1k, 2k, 4k, 10k, 20k, 50k, 100k, 200k, 300k]

  • HDFS
    • [1k, ..., 1M] => ALL

      [1k, 2k, 4k, 10k, 20k, 50k, 100k, 200k, 300k, 500k, 1M]

  • OpenSSH
    • [1k, ..., 500k] => ALL

      [1k, 2k, 4k, 10k, 20k, 50k, 100k, 200k, 300k, 500k]

  • Thunderbird
    • [1k, ..., 1M] => ALL

      [1k, 2k, 4k, 10k, 20k, 50k, 100k, 200k, 300k, 500k, 1M]

  • Windows
    • [1k, ..., 1M] => ALL

      [1k, 2k, 4k, 10k, 20k, 50k, 100k, 200k, 300k, 500k, 1M]

MoLFI
  • BGL
    • [1k, ..., 50k] => ALL

      [1k, 2k, 4k, 10k, 20k, 50k]

  • HDFS
    • [1k, ..., 100k] => ALL

      [1k, 2k, 4k, 10k, 20k, 50k, 100k]

  • OpenSSH
    • [1k, ..., 50k] => ALL

      [1k, 2k, 4k, 10k, 20k, 50k]

  • Thunderbird
    • [1k, ..., 50k] => ALL

      [1k, 2k, 4k, 10k, 20k, 50k]

  • Windows
    • [1k, ..., 50k] => ALL

      [1k, 2k, 4k, 10k, 20k, 50k]

SHISO
  • BGL
    • [1k, ..., 300k] => ALL

      [1k, 2k, 4k, 10k, 20k, 50k, 100k, 200k, 300k]

  • HDFS
    • [1k, ..., 300k] => ALL

      [1k, 2k, 4k, 10k, 20k, 50k, 100k, 200k, 300k]

  • OpenSSH
    • [1k, ..., 300k] => ALL

      [1k, 2k, 4k, 10k, 20k, 50k, 100k, 200k, 300k]

  • Thunderbird
    • [1k, ..., 300k] => ALL

      [1k, 2k, 4k, 10k, 20k, 50k, 100k, 200k, 300k]

  • Windows
    • [1k, ..., 300k] => ALL

      [1k, 2k, 4k, 10k, 20k, 50k, 100k, 200k, 300k]

LogCluster
  • BGL
    • [1k, ..., 300k] => ALL

      [1k, 2k, 4k, 10k, 20k, 50k, 100k, 200k, 300k]

  • HDFS
    • [1k, ..., 1M] => ALL

      [1k, 2k, 4k, 10k, 20k, 50k, 100k, 200k, 300k, 500k, 1M]

  • OpenSSH
    • [1k, ..., 500k] => ALL

      [1k, 2k, 4k, 10k, 20k, 50k, 100k, 200k, 500k]

  • Thunderbird
    • [1k, ..., 1M] => ALL

      [1k, 2k, 4k, 10k, 20k, 50k, 100k, 200k, 300k, 500k, 1M]

  • Windows
    • [1k, ..., 1M] => ALL

      [1k, 2k, 4k, 10k, 20k, 50k, 100k, 200k, 300k, 500k, 1M]

LogSig
  • OpenSSH
    • [1k, ..., 200k] => ALL

      [1k, 2k, 4k, 10k, 20k, 50k, 100k, 200k]

  • Thunderbird
    • [1k, ..., 200k] => ALL

      [1k, 2k, 4k, 10k, 20k, 50k, 100k, 200k]

  • Windows
    • [1k, ..., 200k] => ALL

      [1k, 2k, 4k, 10k, 20k, 50k, 100k, 200k]

Accuracy

Spell
  • BGL
    • [1k, ..., 300k] => NO 10k

      [1k, 2k, 4k, 20k, 50k, 100k, 200k, 300k]

  • HDFS
    • [1k, ..., 1M] => ALL

      [1k, 2k, 4k, 10k, 20k, 50k, 100k, 200k, 300k, 500k, 1M]

  • OpenSSH
    • [1k, ..., 500k] => NO 10k

      [1k, 2k, 4k, 20k, 50k, 100k, 200k, 300k, 500k]

  • Thunderbird
    • [1k, ..., 1M] => ALL

      [1k, 2k, 4k, 10k, 20k, 50k, 100k, 200k, 300k, 500k, 1M]

  • Windows
    • [1k, ..., 20k] => NO 50k

      [1k, 2k, 4k, 10k, 20k]
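
The `=> ALL` and `=> NO 10k` markers in the lists above appear to indicate whether every size up to the upper bound of the range has a result. A minimal sketch of that check follows, assuming this reading of the notation; `GRID` and `missing_sizes` are hypothetical names introduced for illustration, and the two sample entries are copied from the Spell lists.

```python
# Full size grid used across the lists above (k = thousand, M = million).
GRID = ["1k", "2k", "4k", "10k", "20k", "50k", "100k", "200k", "300k", "500k", "1M"]


def missing_sizes(largest, completed, grid=GRID):
    """Return the grid sizes up to `largest` that are absent from `completed`."""
    expected = grid[: grid.index(largest) + 1]
    return [size for size in expected if size not in completed]


# Two entries copied from the Spell lists above.
spell_bgl = ["1k", "2k", "4k", "20k", "50k", "100k", "200k", "300k"]
spell_hdfs = GRID

print(missing_sizes("300k", spell_bgl))  # ['10k']
print(missing_sizes("1M", spell_hdfs))   # []
```

An empty result corresponds to `=> ALL`; a non-empty result lists the sizes that follow `NO`.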

Algorithms

Python 2 methods
  • AEL
  • Drain
  • IPLoM
  • LenMa
  • LFA
  • LKE
  • LogCluster
  • LogMine
  • LogSig
  • SHISO
  • SLCT
  • Spell

Although implemented, methods with * are not scalable.

Python 3 methods
  • MoLFI
  • NuLog

This section contains detailed information regarding the Method section of Log Parsing Literature Survey.
The queries used for the survey can be found under Queries.
Overall statistics can be found below.

Databases queried: Google Scholar, Scopus.

Number of queries (Google Scholar): 7
Number of queries (Scopus): 1

Number of papers selected after running queries (Google Scholar): 59
Number of papers selected after running queries (Scopus): 13

Number of papers selected after snowballing (Google Scholar): 34
Number of papers selected after snowballing (Scopus): 0

Total references checked while snowballing (Google Scholar): 1707
Total references checked while snowballing (Scopus): 344
Total references checked while snowballing: 2051

Total number of papers selected for survey: 93

Queries

Find below the queries used for the survey.

Google Scholar

  1. [Query 1] log parsing

    1. Tools and Benchmarks for Automated Log Parsing (57)
      1. SherLog: Error Diagnosis by Connecting Clues from Run-time Logs
      2. DeepLog: Anomaly Detection and Diagnosis from System Logs through Deep Learning
      3. Detecting Large-Scale System Problems by Mining Console Logs
      4. A Data Clustering Algorithm for Mining Patterns From Event Logs
      5. LogCluster - A Data Clustering and Pattern Mining Algorithm for Event Logs
      6. Clustering Event Logs Using Iterative Partitioning
      7. Length Matters: Clustering System Log Messages using Length of Words
      8. LogMine: Fast Pattern Recognition for Log Analytics
      9. Abstracting Log Lines to Log Event Types for Mining Software System Logs
      10. LogSig: Generating System Events from Raw Textual Logs
      11. Incremental Mining of System Log Format
      12. Abstracting Execution Logs to Execution Events for Enterprise Applications (Short Paper)
    2. Towards Automated Log Parsing for Large-Scale Log Data Analysis (54)
      1. Execution Anomaly Detection in Distributed Systems through Unstructured Log Analysis
      2. A Lightweight Algorithm for Message Type Extraction in System Application Logs
    3. An Evaluation Study on Log Parsing and Its Use in Log Mining (41)
      1. Mining Event Logs with SLCT and LogHound
    4. Drain: An Online Log Parsing Approach with Fixed Depth Tree (35)

    5. A Directed Acyclic Graph Approach to Online Log Parsing (41)

    6. Logram: Efficient Log Parsing Using n-Gram Dictionaries (74)
      1. Mining Invariants from Console Logs for System Problem Detection
      2. An automated approach for abstracting execution logs to execution events
      3. Efficiently Extracting Operational Profiles from Execution Logs Using Suffix Arrays
    7. Self-Supervised Log Parsing (20)

    8. LogParse: Making Log Parsing Adaptive through Word Classification (34)
      1. Learning Latent Events from Network Message Logs
    9. Improving Performances of Log Mining for Anomaly Prediction Through NLP-Based Log Parsing (19)

    10. Spell: Streaming Parsing of System Event Logs (18)
      1. LogTree: A Framework for Generating System Events from Raw Textual Logs
      2. HLAer: a System for Heterogeneous Log Analysis
    11. LPV: A Log Parser Based on Vectorization for Offline and Online Log Parsing (21)

    12. An Efficient Log Parsing Algorithm Based on Heuristic Rules (30)

    13. Paddy: An Event Log Parsing Approach using Dynamic Dictionary (21)

    14. A Theoretical Framework for Understanding the Relationship Between Log Parsing and Anomaly Detection (25)

    15. Spell: Online Streaming Parsing of Large Unstructured System Logs (36)

    16. A Confidence-Guided Evaluation for Log Parsers Inner Quality (48)

    17. AWSOM-LP: An Effective Log Parsing Technique Using Pattern Recognition and Frequency Analysis (45)
      1. Towards an NLP-based log template generation algorithm for system log analysis
    18. Prefix-Graph: A Versatile Log Parsing Approach Merging Prefix Tree with Probabilistic Graph (27)
      1. LogAnomaly: Unsupervised Detection of Sequential and Quantitative Anomalies in Unstructured Logs
      2. Logan: A Distributed Online Log Parser
    19. Efficient and Robust Syslog Parsing for Network Devices in Datacenter Networks (47)
      1. Device-Agnostic Log Anomaly Classification with Partial Labels
    20. Robust Log-Based Anomaly Detection on Unstable Log Data (48)
      1. Experience Report: Log Mining Using Natural Language Processing and Application to Anomaly Detection
    21. LogStamp: Automatic Online Log Parsing Based on Sequence Labelling (23)

    22. A Review of Unstructured Data Analysis and Parsing Methods (33)

    23. OLMPT: Research on Online Log Parsing Method Based on Prefix Tree (13)

    24. A Parallel Approach of Weighted Edit Distance Calculation for Log Parsing (10)
      1. LogMaster: Mining Event Correlations in Logs of Large-scale Cluster Systems
    25. Flexible Log File Parsing using Hidden Markov Models (13)
      1. A Breadth-First Algorithm for Mining Frequent Patterns from Event Logs
    26. Log Clustering Based Problem Identification for Online Service Systems (30)
      1. Experience Mining Google’s Production Console Logs
    27. Unsupervised Noise Detection in Unstructured data for Automatic Parsing (21)

  2. [Query 2] log parsing survey

    1. System log clustering approaches for cyber security applications: A survey (80)
      1. One Graph Is Worth a Thousand Logs: Uncovering Hidden Structures in Massive System Event Logs
      2. GenLog: Accurate Log Template Discovery for Stripped X86 Binaries
    2. A Survey on Automated Log Analysis for Reliability Engineering (205)

  3. [Query 3] log abstraction

    1. A systematic literature review on automated log abstraction techniques (55)
      1. A Method of Large-Scale Log Pattern Mining
    2. Symptom-based Problem Determination Using Log Data Abstraction (37)

    3. Unsupervised Event Abstraction using Pattern Abstraction and Local Process Models (15)

    4. Automatic Event Log Abstraction to Support Forensic Investigation (28)

    5. Event-Log Abstraction using Batch Session Identification and Clustering (20)

    6. Event Log Abstraction in Client-Server Applications (24)

    7. Log Abstraction for Information Security: Heuristics and Reproducibility (39)
      1. amulog: A General Log Analysis Framework for Diverse Template Generation Methods
    8. Practical Multi-pattern Matching Approach for Fast and Scalable Log Abstraction (15)

  4. [Query 4] log abstraction survey

  5. [Query 5] event log parsing

    1. LogLens: A Real-Time Log Analysis System (36)

    2. Loghub: A Large Collection of System Log Datasets towards Automated Log Analytics (69)

    3. Experience Report: System Log Analysis for Anomaly Detection (49)

    4. LOGAIDER: A Tool for Mining Potential Correlations of HPC Log Events (23)

    5. LogGAN: a Log-level Generative Adversarial Network for Anomaly Detection using Permutation Event Modeling (32)
      1. Event Extraction from Streaming System Logs
    6. A Search-based Approach for Accurate Identification of Log Message Formats (36)

  6. [Query 6] log signature extraction

    1. Unsupervised Signature Extraction from Forensic Logs (27)
    2. Towards a neural language model for signature extraction from forensic logs (16)
    3. A hybrid approach for log signature generation (17)
  7. [Query 7] event log signature extraction

Scopus

  1. [Query Scopus] TITLE-ABS-KEY(log AND parsing) OR ((logs OR log OR logging OR events OR "event log" OR "event logs" OR "event logs templates" OR "event log signatures") AND (abstraction OR parsing))
    1. Log and Execution Trace Analytics System (26)
    2. Virtual Knowledge Graphs for Federated Log Analysis (23)
    3. The Use of Template Miners and Encryption in Log Message Compression (39)
    4. LogEA: Log Extraction and Analysis Tool to Support Forensic Investigation of Linux-based System (27)
    5. On Automatic Parsing of Log Records (36)
    6. MoniLog: An Automated Log-Based Anomaly Detection System for Cloud Computing Infrastructures (38)
    7. An Improved KNN-Based Efficient Log Anomaly Detection Method with Automatically Labeled Samples (34)
    8. An Extensible Parsing Pipeline for Unstructured Data Processing (22)
    9. A Dynamic Processing Algorithm for Variable Data in Intranet Security Monitoring (14)
    10. METING: A Robust Log Parser Based on Frequent n-Gram Mining (19)
    11. Log Parser with One-to-One Markup (36)
    12. FastLogSim: A Quick Log Pattern Parser Scheme Based on Text Similarity (17)
    13. AECID-PG: A Tree-Based Log Parser Generator To Enable Log Analysis (13)

Acknowledgements

About

Literature survey on log parsing. Code for accuracy and scalability experiments, and also details on methodology.
