AdguardTeam/companiesdb

Companies DB

This is a companies DB that we use in AdGuard Home and AdGuard DNS. It is basically the Whotracks.me database converted to a simple JSON format with some additions from us.

There is also a file with company metadata that we use in AdGuard VPN.

Workflow

  • create a fork of the repository on GitHub.
  • create a branch from the latest main branch.
  • add a tracker.
  • create a Pull Request.

Naming of branches and commits

  • the branch name format: fix/issueNumber_domain
fix/34_showrss.info
  • the commit message format: Fix #issueNumber domain
Fix #34 showrss.info
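The naming conventions above can be expressed as two small helpers. This is only a sketch: `branch_name` and `commit_message` are hypothetical names and are not part of the repository.

```python
# Hypothetical helpers (not part of the repository) that apply the naming
# conventions: branch "fix/<issueNumber>_<domain>", commit "Fix #<issueNumber> <domain>".


def branch_name(issue_number: int, domain: str) -> str:
    """Return a branch name such as 'fix/34_showrss.info'."""
    return f"fix/{issue_number}_{domain}"


def commit_message(issue_number: int, domain: str) -> str:
    """Return a commit message such as 'Fix #34 showrss.info'."""
    return f"Fix #{issue_number} {domain}"


print(branch_name(34, "showrss.info"))     # fix/34_showrss.info
print(commit_message(34, "showrss.info"))  # Fix #34 showrss.info
```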

Assignment of files

The list of trackers and companies is generated from the whotracks.me database.

Trackers:

Companies:

VPN Services:

How to add new data or overwrite whotracks.me data

If you need to add new data or overwrite existing whotracks.me data, follow the instructions below.

Warning

Add companies and tracker names in alphabetical order. Add tracker domains alphabetically by value.
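You can check the ordering rules before opening a Pull Request with a sketch like the following. It assumes that the companies mapping is the object keyed by companyId and that the trackers file has top-level trackers and trackerDomains sections, as in the examples below. It also assumes case-insensitive comparison, which the rule does not specify. `check_order` is a hypothetical helper, not part of the repository's build.

```python
import json


def is_sorted(items: list[str]) -> bool:
    """Return True if items are in case-insensitive alphabetical order (an assumption)."""
    keys = [item.casefold() for item in items]
    return all(a <= b for a, b in zip(keys, keys[1:]))


def check_order(companies: dict, trackers_file: dict) -> list[str]:
    """Return a list of ordering problems; an empty list means the order looks fine."""
    problems = []
    # Company IDs are the keys of the companies object (dicts keep JSON key order).
    if not is_sorted(list(companies)):
        problems.append("companies are not in alphabetical order")
    # Tracker names are the keys of the trackers section.
    if not is_sorted(list(trackers_file.get("trackers", {}))):
        problems.append("trackers are not in alphabetical order")
    # Tracker domains are ordered by their value (the tracker name), not by domain.
    if not is_sorted(list(trackers_file.get("trackerDomains", {}).values())):
        problems.append("trackerDomains are not sorted by value")
    return problems


# Usage: json.loads keeps the key order of the file, so the check sees it as written.
trackers_file = json.loads(
    '{"trackers": {"a_tracker": {}, "b_tracker": {}},'
    ' "trackerDomains": {"x.example": "b_tracker", "y.example": "a_tracker"}}'
)
print(check_order({"acme": {}, "zeta": {}}, trackers_file))
# ['trackerDomains are not sorted by value']
```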

How to add a new company or overwrite whotracks.me data

Company data is added to the source/companies.json file under a JSON key whose name is the companyId. Trackers refer to the company by this ID. Each entry has the following fields:

  • name - the official company name, displayed in the filter log.
  • websiteUrl - the address of the company website, also used to determine the company icon.
  • description - a description of the company, not displayed anywhere.
"companyincID": {
    "name": "Company inc.",
    "websiteUrl": "https://www.company.org/",
    "description": "Description of Company inc."
}
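Before committing, you can sanity-check a new entry with a sketch like this one. `validate_company` is a hypothetical helper. Its checks are assumptions about what a well-formed entry looks like, not rules enforced by the build: all three fields must be present, and websiteUrl must be an absolute http(s) URL unless it is null.

```python
from urllib.parse import urlparse

# Fields described above; the validation rules themselves are assumptions.
REQUIRED_COMPANY_FIELDS = ("name", "websiteUrl", "description")


def validate_company(company_id: str, entry: dict) -> list[str]:
    """Return human-readable problems with one companies.json entry."""
    errors = []
    for field in REQUIRED_COMPANY_FIELDS:
        if field not in entry:
            errors.append(f"{company_id}: missing field '{field}'")
    url = entry.get("websiteUrl")
    # null is allowed for unknown values; otherwise expect an absolute http(s) URL.
    if url is not None:
        parsed = urlparse(url)
        if parsed.scheme not in ("http", "https") or not parsed.netloc:
            errors.append(f"{company_id}: websiteUrl is not an absolute http(s) URL")
    return errors


entry = {
    "name": "Company inc.",
    "websiteUrl": "https://www.company.org/",
    "description": "Description of Company inc.",
}
print(validate_company("companyincID", entry))  # []
print(validate_company("broken", {"name": "Broken", "websiteUrl": "company.org"}))
```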

How to add a new tracker or overwrite whotracks.me data

Tracker data is added to the source/trackers.json file as a nested JSON key inside the trackers section. The key is the tracker ID (company_trackername in the example below), which is also used to link domains to the tracker in the trackerDomains section:

"trackers": {
    "company_trackername": {
        "name": "Company inc. Analytics",
        "categoryId": 6,
        "url": "https://analytics.company.org/",
        "companyId": "companyincID"
    }
}

Add tracker domains to the trackerDomains section:

  • key - the tracker domain.
  • value - the tracker ID, that is, the key of the tracker in the trackers section.
"trackerDomains": {
    "collect.company.org": "company_trackername"
}

Warning

If a value is unknown or does not exist, set it to null:

"url": null
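The parts of a tracker entry reference each other: trackerDomains values point to keys in trackers, and each tracker's companyId points to a company. A minimal cross-reference check might look like the sketch below. `check_references` is hypothetical. It assumes you pass the merged data, because companies and trackers that come from whotracks.me are also valid targets. It also assumes that valid categoryId values are 0 to 16, as listed in the category table.

```python
VALID_CATEGORY_IDS = range(0, 17)  # 0-16, taken from the category table


def check_references(companies: dict, trackers_file: dict) -> list[str]:
    """Return broken references between trackerDomains, trackers and companies."""
    errors = []
    trackers = trackers_file.get("trackers", {})
    for tracker_id, tracker in trackers.items():
        company_id = tracker.get("companyId")
        # null is allowed for unknown values.
        if company_id is not None and company_id not in companies:
            errors.append(f"{tracker_id}: unknown companyId '{company_id}'")
        category = tracker.get("categoryId")
        if category is not None and category not in VALID_CATEGORY_IDS:
            errors.append(f"{tracker_id}: categoryId {category} is out of range")
    # Every domain must map to a tracker ID that exists in the trackers section.
    for domain, tracker_id in trackers_file.get("trackerDomains", {}).items():
        if tracker_id not in trackers:
            errors.append(f"{domain}: unknown tracker '{tracker_id}'")
    return errors


companies = {"companyincID": {"name": "Company inc."}}
trackers_file = {
    "trackers": {
        "company_trackername": {
            "name": "Company inc. Analytics",
            "categoryId": 6,
            "url": "https://analytics.company.org/",
            "companyId": "companyincID",
        }
    },
    "trackerDomains": {
        "collect.company.org": "company_trackername",
        "cdn.company.org": "company_cdn",  # deliberately broken reference
    },
}
print(check_references(companies, trackers_file))
# ["cdn.company.org: unknown tracker 'company_cdn'"]
```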

Tracker categories

| ID | Name | Purpose |
|----|------|---------|
| 0 | audio_video_player | Enables websites to publish, distribute, and optimize video and audio content |
| 1 | comments | Enables comments sections for articles and product reviews |
| 2 | customer_interaction | Includes chat, email messaging, customer support, and other interaction tools |
| 3 | pornvertising | Delivers advertisements that generally appear on sites with adult content |
| 4 | advertising | Provides advertising or advertising-related services such as data collection, behavioral analysis or re-targeting |
| 5 | essential | Includes tag managers, privacy notices, and technologies that are critical to the functionality of a website |
| 6 | site_analytics | Collects and analyzes data related to site usage and performance |
| 7 | social_media | Integrates features related to social media sites |
| 8 | misc | This tracker does not fit in other categories |
| 9 | cdn | Content delivery network that delivers resources for different site utilities and usually for many different customers |
| 10 | hosting | This is a service used by the content provider or site owner |
| 11 | unknown | This tracker has either not been labelled yet, or we do not have enough information to label it |
| 12 | extensions | - |
| 13 | email | Includes webmail and email clients |
| 14 | consent | - |
| 15 | telemetry | - |
| 16 | mobile_analytics | Collects and analyzes data related to mobile app usage and performance |
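When you fill in categoryId, the table can be turned into a lookup. The dictionary below is a transcription of the category table. `category_id` is a hypothetical helper.

```python
# Transcription of the category table above (categoryId -> name).
TRACKER_CATEGORIES = {
    0: "audio_video_player",
    1: "comments",
    2: "customer_interaction",
    3: "pornvertising",
    4: "advertising",
    5: "essential",
    6: "site_analytics",
    7: "social_media",
    8: "misc",
    9: "cdn",
    10: "hosting",
    11: "unknown",
    12: "extensions",
    13: "email",
    14: "consent",
    15: "telemetry",
    16: "mobile_analytics",
}


def category_id(name: str) -> int:
    """Return the categoryId for a category name, e.g. 'site_analytics' -> 6."""
    for cid, cname in TRACKER_CATEGORIES.items():
        if cname == name:
            return cid
    raise KeyError(f"unknown tracker category: {name}")


print(category_id("site_analytics"))  # 6
```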

How to build trackers data

yarn install
yarn convert

The result is:

  • dist/companies.json - companies data JSON file. This file contains the companies list from whotracks.me merged with AdGuard companies from source/companies.json.

  • dist/trackers.json - trackers data JSON file. Combined data from two files:

    • source/trackers.json
    • dist/whotracksme.json.

    Entries that come from AdGuard files get an additional key: "source": "AdGuard".

  • dist/trackers.csv - trackers data CSV file. This file is used by the ETL process of AdGuard DNS, so be very careful when changing its structure.

  • dist/whotracksme.json - the current whotracks.me trackers data JSON file, compiled from trackerdb.sql.

During the build, a list of warnings and errors is displayed; these should be fixed.
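The merge that yarn convert performs for dist/trackers.json can be pictured roughly as follows. This is a simplified sketch, not the repository's converter. It assumes that dist/whotracksme.json has the same trackers/trackerDomains shape as source/trackers.json. It also assumes that an AdGuard entry replaces the whotracks.me entry with the same key as a whole, rather than field by field. `merge_trackers` is a hypothetical name.

```python
import copy


def merge_trackers(whotracksme: dict, adguard: dict) -> dict:
    """Merge AdGuard data over whotracks.me data (simplified sketch)."""
    merged = {
        # Deep-copy so the whotracks.me input is never mutated.
        "trackers": copy.deepcopy(whotracksme.get("trackers", {})),
        "trackerDomains": dict(whotracksme.get("trackerDomains", {})),
    }
    for tracker_id, tracker in adguard.get("trackers", {}).items():
        # An AdGuard entry replaces the whotracks.me entry with the same key
        # and is tagged with "source": "AdGuard".
        merged["trackers"][tracker_id] = {**tracker, "source": "AdGuard"}
    # AdGuard domain mappings take precedence over whotracks.me ones.
    merged["trackerDomains"].update(adguard.get("trackerDomains", {}))
    return merged


whotracksme = {
    "trackers": {
        "company_trackername": {"name": "Old name", "categoryId": 11},
        "other": {"name": "Other", "categoryId": 4},
    },
    "trackerDomains": {"collect.company.org": "other"},
}
adguard = {
    "trackers": {"company_trackername": {"name": "Company inc. Analytics", "categoryId": 6}},
    "trackerDomains": {"collect.company.org": "company_trackername"},
}
result = merge_trackers(whotracksme, adguard)
print(result["trackers"]["company_trackername"]["source"])  # AdGuard
print(result["trackerDomains"]["collect.company.org"])     # company_trackername
```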

Company icons

The favicon of the company website is used as the company icon. It can be checked using our icon service:

https://icons.adguard.org/icon?domain=adguard.com
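Because the icon is derived from the company's websiteUrl, you can build the icon-service URL for an entry as shown below. `icon_url` is a hypothetical helper. Passing the full hostname (including any www. prefix) as the domain parameter is an assumption about what the service expects.

```python
from urllib.parse import urlencode, urlparse

ICON_SERVICE = "https://icons.adguard.org/icon"


def icon_url(website_url: str) -> str:
    """Build an icon-service URL from a company's websiteUrl (hostname used as-is)."""
    host = urlparse(website_url).hostname
    if not host:
        raise ValueError(f"cannot extract a hostname from {website_url!r}")
    return f"{ICON_SERVICE}?{urlencode({'domain': host})}"


print(icon_url("https://adguard.com/"))
# https://icons.adguard.org/icon?domain=adguard.com
```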

Policy

A detailed policy is currently under development. The decision to add a company is at the discretion of the maintainers, and each request is reviewed on a case-by-case basis. Factors such as the company's industry, reputation, and relevance are taken into account during the evaluation.

Currently, we avoid adding personal websites or blogs, as well as services that do not appear to be sufficiently popular.

Acknowledgements

We would like to thank the whotracks.me team for their work. Initially, our database was built on top of the whotracks.me database, using their extensive data collection. However, we would like to emphasise that our database is now independent and is updated separately from whotracks.me.
