This repository has been archived by the owner on May 21, 2024. It is now read-only.

Contains the materials used for the blog post about the usage of the security.txt file.

ExcelliumSA/SecurityTXT-Study

Description

This project contains the materials used for the following blog post about the usage of the security.txt file:

https://excellium-services.com/2021/05/18/security-txt/

Scripts

Script call chain: generate-source-(ct|majestic).sh > generate-stats.py.

Requirements for the Python (>= 3.7) script are installed with the command: pip install -r requirements.txt

generate-source-ct.sh:

Extracts a list of LU domains from the Certificate Transparency log using the crt.sh data provider.
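
As an illustration only, the extraction step could be sketched in Python as below. This is not the content of generate-source-ct.sh. It assumes the JSON output that crt.sh returns when `output=json` is requested, where each record carries a `name_value` field that may hold several names separated by newlines. The function name `domains_from_crtsh` is hypothetical.

```python
import json


def domains_from_crtsh(payload: str, tld: str = "lu"):
    """Return the sorted, de-duplicated names ending in the given TLD.

    Assumption: `payload` is a JSON array of records, each with a
    `name_value` string, as produced by the crt.sh JSON output.
    """
    suffix = "." + tld
    found = set()
    for record in json.loads(payload):
        # One record may list several names, one per line.
        for name in record.get("name_value", "").splitlines():
            name = name.strip().lower()
            if name.startswith("*."):
                # Drop the wildcard label of wildcard certificates.
                name = name[2:]
            if name.endswith(suffix):
                found.add(name)
    return sorted(found)
```

Mail addresses that appear in the certificate data are kept as-is here, so a later step can still extract their domain.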

generate-source-majestic.sh:

Extracts a list of LU domains from the Majestic Top 1 million most visited sites data provider.
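
A minimal sketch of the same filtering in Python, given here only for illustration since the actual script is a shell script. It assumes the Majestic CSV export has a header row with `Domain` and `TLD` columns. These column names and the function name `lu_domains_from_majestic` are assumptions to check against the real file.

```python
import csv
import io


def lu_domains_from_majestic(csv_text: str, tld: str = "lu"):
    """Return the domains whose TLD column matches `tld`, in file order.

    Assumption: the CSV has a header row containing `Domain` and `TLD`.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    return [
        row["Domain"].lower()
        for row in reader
        if row.get("TLD", "").lower() == tld
    ]
```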

generate-source.ps1:

Same goal as generate-source-ct.sh, but using direct database access in order to extract more records. This script works around the limit on the execution time allowed for a SQL query.

💬 However, after several attempts, it proved more efficient to use the web API via the advanced search, because the query execution time limits were too restrictive.

generate-stats.py:

Checks for the presence of the security.txt file on the different domains.
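
The check itself might look like the sketch below. It is not taken from generate-stats.py: it assumes the file is looked up over HTTPS at the location defined by RFC 9116 (`/.well-known/security.txt`) and then at the legacy top-level path, and it treats a response as valid only if it contains a `Contact:` field. The function names are hypothetical.

```python
import http.client
import urllib.error
import urllib.request

# RFC 9116 location first, then the legacy top-level path.
CANDIDATE_PATHS = ("/.well-known/security.txt", "/security.txt")


def candidate_urls(domain: str):
    """Build the URLs to probe for a given host name."""
    return [f"https://{domain}{path}" for path in CANDIDATE_PATHS]


def looks_like_security_txt(body: str) -> bool:
    """Heuristic: a security.txt file has at least one Contact field."""
    return any(
        line.strip().lower().startswith("contact:")
        for line in body.splitlines()
    )


def has_security_txt(domain: str, timeout: float = 10.0) -> bool:
    """Return True if any candidate URL serves something that looks valid."""
    for url in candidate_urls(domain):
        try:
            with urllib.request.urlopen(url, timeout=timeout) as response:
                # Read a bounded amount to avoid downloading large pages.
                body = response.read(65536).decode("utf-8", errors="replace")
        except (urllib.error.URLError, http.client.HTTPException, OSError, ValueError):
            # Unreachable host, TLS failure, 404, timeout: try the next URL.
            continue
        if looks_like_security_txt(body):
            return True
    return False
```

Checking the body rather than only the status code avoids counting sites that answer every path with an HTML page.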

💬 The LU domains obtained are processed as follows:

  • If the domain is a non-web one (pop, ftp, lync, etc.), then the subdomain is replaced by www: sip.excellium.lu becomes www.excellium.lu
  • If the entry is a mail address, then the domain is extracted and prefixed with the www subdomain: info@excellium.lu becomes www.excellium.lu
  • Duplicate domains are removed so that each domain is tested only once.
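
The three rules above can be sketched as follows. This is a hypothetical illustration rather than the code of generate-stats.py: the set of non-web labels (beyond pop, ftp, lync and sip, which appear above) and the function names are assumptions.

```python
# Assumed set of non-web subdomain labels; extend as needed.
NON_WEB_LABELS = {"pop", "ftp", "lync", "sip", "smtp", "imap"}


def normalize_entry(entry: str) -> str:
    """Map a raw entry (host name or mail address) to a web host name."""
    entry = entry.strip().lower()
    if "@" in entry:
        # Mail address: keep the domain part and prefix it with www.
        return "www." + entry.rsplit("@", 1)[1]
    first_label, _, rest = entry.partition(".")
    if first_label in NON_WEB_LABELS and "." in rest:
        # Non-web subdomain: replace the first label with www.
        return "www." + rest
    return entry


def unique_domains(entries):
    """Normalize all entries and drop duplicates, keeping first-seen order."""
    return list(dict.fromkeys(normalize_entry(e) for e in entries if e.strip()))
```

For example, `unique_domains(["sip.excellium.lu", "info@excellium.lu"])` yields a single entry, `www.excellium.lu`, so the host is tested once.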

test-script.sh:

Used by the GitHub action workflow to ensure that the Python script performs its duty correctly.

Data file

File test-source.txt is the same kind of file as the source-(ct|majestic).txt files. However, it contains only a subset of the domains because it is only used by the GitHub action workflow. This workflow allows the GitHub dependency checker to verify that upgrading a dependency does not break the Python script.

source-ct.txt:

Contains the list of all LU domains gathered from Certificate Transparency log.

source-majestic.txt:

Contains the list of all LU domains gathered from Majestic Top 1 million most visited sites.

Images

Files named IMG*.png are images used for the blog post.

IDE

Visual Studio Code is used for all the scripts.

A workspace file as well as a Python debug configuration file are provided.

References