Wes edited this page Apr 25, 2018 · 9 revisions

General Implementation Information

Checking Container State

 sudo docker-compose ps

Stopping Containers

 sudo docker-compose down

Configuration

The configuration file at cli/config/config.yaml is read by the CLI tool and defines several settings, including:

  • Certstream logging options (whether to include the issuer CA, root CA, log source, etc.)
  • Active classifier
  • Data sources for features and training data
  • Classifier thresholds for phishing predictions

Here's an example of what it looks like:

certstream:
  colors: true
  include_issuer_ca_name: true
  include_log_source: false
  include_root_ca_name: false
  include_seen_timestamp: false
classifier:
  active: 4_24_v1
data:
  benign_dir: /opt/streamingphish/training_data/benign/
  fqdn_keywords_dir: /opt/streamingphish/training_data/fqdn_keywords/
  keywords_dir: /opt/streamingphish/training_data/keywords/
  malicious_dir: /opt/streamingphish/training_data/malicious/
  similarity_words_dir: /opt/streamingphish/training_data/similarity_words/
  targeted_brands_dir: /opt/streamingphish/training_data/targeted_brands/
  tld_dir: /opt/streamingphish/training_data/tlds/
logging:
  enabled: true
  path: /opt/streamingphish/predictions/
logging_tiers:
  high:
    color: red
    threshold: 0.9
  low:
    color: cyan
    threshold: 0.6
  suspicious:
    color: yellow
    threshold: 0.75
system:
  log_path: /opt/streamingphish/system/
version: 1
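
Because the data section points at several directories, a quick check that each one exists can help catch a missing bind mount early. The following is a minimal sketch, not part of the tool itself: missing_data_dirs is a hypothetical helper, and it assumes the file has already been parsed into a dictionary (for example with PyYAML's yaml.safe_load).

```python
import os


def missing_data_dirs(config):
    """Return the keys under 'data' whose directory does not exist.

    Hypothetical helper for illustration. 'config' is assumed to be the
    already-parsed contents of cli/config/config.yaml as a dict.
    """
    data_section = config.get("data", {})
    return [
        key
        for key, path in data_section.items()
        if not os.path.isdir(path)
    ]


if __name__ == "__main__":
    # Two entries taken from the example configuration above.
    example = {
        "data": {
            "benign_dir": "/opt/streamingphish/training_data/benign/",
            "malicious_dir": "/opt/streamingphish/training_data/malicious/",
        }
    }
    for key in missing_data_dirs(example):
        print("missing directory for", key)
```

Run inside the cli container, an empty result would suggest that every configured data directory is present.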

Training Data

The training_data/ folder is bind-mounted from the host system directly into the cli container. Any changes to the training data, features, keywords, targeted brands, or TLDs will persist to the host system in the training_data/ folder regardless of the state of the underlying container.

The list of phishing domains used for training is in the training_data/malicious/ folder. The list of benign domains used for training is in the training_data/benign/ folder.
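
To illustrate how the two folders could be combined into a labeled data set, here is a minimal sketch. It assumes each file holds one domain per line and that blank lines and lines starting with # can be skipped; the actual file layout is not described here, so treat both as assumptions. The helper names parse_domains and load_labeled_domains are hypothetical, and the labels (1 for phishing, 0 for benign) are chosen for illustration.

```python
from pathlib import Path


def parse_domains(lines):
    """Strip and lowercase each line; skip blanks and '#' comment lines.

    The one-domain-per-line format is an assumption for this sketch.
    """
    domains = []
    for line in lines:
        entry = line.strip().lower()
        if entry and not entry.startswith("#"):
            domains.append(entry)
    return domains


def load_labeled_domains(malicious_dir, benign_dir):
    """Return (domain, label) pairs: label 1 for phishing, 0 for benign."""
    samples = []
    for directory, label in ((malicious_dir, 1), (benign_dir, 0)):
        # Sort file names so the result order is deterministic.
        for path in sorted(Path(directory).iterdir()):
            if path.is_file():
                with path.open(encoding="utf-8") as handle:
                    samples.extend((d, label) for d in parse_domains(handle))
    return samples
```

For example, load_labeled_domains("training_data/malicious/", "training_data/benign/") would return every phishing domain paired with 1, followed by every benign domain paired with 0.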

Phishing Predictions from Certstream

Fully-qualified domain names (FQDNs) predicted as phishing will be written to a bind-mounted folder named predictions/. Log files will be generated in this folder based on the scoring thresholds defined in cli/config/config.yaml. The score produced by the classifier when evaluating a fully-qualified domain name will always be between 0 and 1 (1 == phishing, 0 == benign). FQDNs with higher scores are more likely to be phishing. The default thresholds are as follows:

  • "High": scores of 0.90 and above
  • "Suspicious": scores between 0.75 and 0.90
  • "Low": scores between 0.60 and 0.75

Any FQDN with a score of 0.60 or lower will not be logged.
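
The tier rules can be sketched as a small lookup. This is an illustration under stated assumptions, not the tool's actual code: tier_for_score is a hypothetical helper, the tiers dictionary mirrors the logging_tiers section of the example configuration, and a score exactly equal to a threshold is assumed to belong to that tier, except at the lowest threshold, where the score is not logged.

```python
# Mirrors the logging_tiers section of the example configuration.
DEFAULT_TIERS = {
    "high": {"color": "red", "threshold": 0.9},
    "low": {"color": "cyan", "threshold": 0.6},
    "suspicious": {"color": "yellow", "threshold": 0.75},
}


def tier_for_score(score, tiers):
    """Return the tier name for a score, or None if it is not logged.

    Assumption: thresholds are inclusive lower bounds, except that a
    score at or below the lowest threshold is not logged.
    """
    # Check tiers from the highest threshold down to the lowest.
    ordered = sorted(
        tiers.items(), key=lambda item: item[1]["threshold"], reverse=True
    )
    lowest_threshold = ordered[-1][1]["threshold"]
    if score <= lowest_threshold:
        return None
    for name, tier in ordered:
        if score >= tier["threshold"]:
            return name
    return None


if __name__ == "__main__":
    for example_score in (0.95, 0.8, 0.7, 0.4):
        print(example_score, tier_for_score(example_score, DEFAULT_TIERS))
```

Sorting by threshold means the lookup keeps working if the values in logging_tiers are changed, as long as each tier has a numeric threshold.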