David Callies edited this page Jan 25, 2024 · 70 revisions

Introduction

A diagram of HMA

Hasher-Matcher-Actioner (HMA) is an open-source trust and safety tool. You submit content from your platform to your own instance of HMA, which scans it and flags potential community standards violations. These flags can surface content that you would otherwise have missed, and you can choose to route flagged content to manual review.

HMA looks for copies of content that it has been configured to look for. While you can curate your own lists of content to scan for, the true power of HMA comes from its ability to load and share hash lists with existing programs.

See also: Meta's newsroom post about HMA

Why would I want HMA?

HMA can be either a tool in your existing content moderation strategy or the starting point of a wider moderation ecosystem on your platform. At its core, HMA provides "similarity detection" or "copy detection": it detects content that is perceptually identical or similar to content that you (or someone else) have already seen. If the specific algorithms that HMA comes with only meet some of your needs, there are interfaces to plug into other solutions (including solutions that aren't just copy detection and can handle never-before-seen content), or to use only the subset of functionality you need. HMA may be most useful when integrated with collaborative trust & safety solutions, such as ThreatExchange.
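As a rough illustration of what "similarity detection" means for photos: PDQ produces 256-bit hashes, and two hashes are typically compared by counting the bits in which they differ (the Hamming distance). The sketch below is not HMA's code. The function names and the threshold of 31 bits are assumptions chosen for illustration, and the hashes are made-up values rather than real PDQ output.

```python
def hamming_distance(hash_a: str, hash_b: str) -> int:
    """Count the differing bits between two equal-length hex-encoded hashes."""
    if len(hash_a) != len(hash_b):
        raise ValueError("hashes must have the same length")
    return bin(int(hash_a, 16) ^ int(hash_b, 16)).count("1")


def is_similar(hash_a: str, hash_b: str, threshold: int = 31) -> bool:
    """Treat two hashes as a match when they differ in at most `threshold` bits.

    The default of 31 is an assumed threshold for this sketch; tune it for your data.
    """
    return hamming_distance(hash_a, hash_b) <= threshold


# Made-up 256-bit hashes (64 hex characters each), not real PDQ output.
seen = "f" * 64
near_copy = "f" * 62 + "f0"   # last hex digit differs: 4 bits flipped
unrelated = "0" * 64          # every bit differs

print(hamming_distance(seen, near_copy))  # 4
print(is_similar(seen, near_copy))        # True
print(is_similar(seen, unrelated))        # False
```

A small distance means the two images are likely copies with minor edits (resizing, recompression), which is what lets copy detection tolerate small changes that would defeat an exact-match hash.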

What kinds of capabilities does HMA have?

Over time, HMA may gain more functionality. Additionally, HMA was designed to play nicely with other systems, and so missing functionality can be added by interfacing with other solutions.

  • ✅ Ready
  • 🚧 In development or planned 2022
  • 📋 Planned / Long Term

Content Type | Matching Capability
Photos       | ✅ PDQ
Videos       | ✅ MD5
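Unlike PDQ, MD5 is a cryptographic hash, so it only matches byte-for-byte identical files: a video that has been re-encoded or trimmed produces a completely different digest. The following minimal sketch uses Python's standard `hashlib` to show this. The helper name `md5_of_stream` and the sample bytes are assumptions for illustration, not part of HMA.

```python
import hashlib
import io


def md5_of_stream(stream, chunk_size: int = 1 << 20) -> str:
    """Hash a binary file-like object in chunks so large videos need not fit in memory."""
    digest = hashlib.md5()
    for chunk in iter(lambda: stream.read(chunk_size), b""):
        digest.update(chunk)
    return digest.hexdigest()


# Stand-in byte strings; in practice you would open the video file in "rb" mode.
original_md5 = md5_of_stream(io.BytesIO(b"example video bytes"))
exact_copy_md5 = md5_of_stream(io.BytesIO(b"example video bytes"))
altered_md5 = md5_of_stream(io.BytesIO(b"example video bytes!"))

print(original_md5 == exact_copy_md5)  # True: identical bytes, identical digest
print(original_md5 == altered_md5)     # False: one extra byte changes the digest
```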

Where is the data hosted?

You run your own instance of HMA and control the content you evaluate, which also means you pay the hosting costs. If someone else runs an instance and lets you call it, they host the data.

HMA can download matching signals from APIs hosted by someone else.

How does HMA use external APIs?

If you configure it to, HMA will connect to external APIs (like ThreatExchange) to get signals and hashes to compare against.

HMA shares only the data that you explicitly configure it to share. It sends no metrics, no telemetry, etc.

Can I use HMA without connecting to external APIs?

Yes, you can create your own collections of content (called "banks") and match against them without sending data outside of your platform.
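Conceptually, a bank is a named collection of hashes that stays on your side, and matching is a lookup against that collection. The sketch below is a hypothetical in-memory stand-in, not HMA's bank API: `LocalBank`, its methods, the distance threshold, and the sample hashes are all invented for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class LocalBank:
    """Hypothetical stand-in for a bank: a named set of hex-encoded hashes."""

    name: str
    entries: dict = field(default_factory=dict)  # maps hash -> free-form note

    def add(self, hex_hash: str, note: str = "") -> None:
        self.entries[hex_hash] = note

    def match(self, query: str, max_distance: int = 31) -> list:
        """Return (hash, distance) pairs within max_distance bits, closest first."""
        query_bits = int(query, 16)
        hits = []
        for banked in self.entries:
            distance = bin(int(banked, 16) ^ query_bits).count("1")
            if distance <= max_distance:
                hits.append((banked, distance))
        return sorted(hits, key=lambda hit: hit[1])


bank = LocalBank("example-bank")
bank.add("0" * 64, note="made-up hash A")
bank.add("f" * 64, note="made-up hash B")

# A query that differs from hash A by a single bit matches only hash A.
print(bank.match("0" * 63 + "1"))
```

Nothing in this flow leaves the process, which mirrors the point above: matching against your own banks requires no external API. A linear scan like this is only suitable for small collections; it is shown here for clarity rather than speed.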

How long does it take to start using HMA?

You can get a test deployment on one machine in a few minutes, and test its matching capabilities using a debug UI.

Most platforms can do performance testing on a sampled set of traffic with a single engineer and roughly a day of engineering time.

For a full-scale deployment on your platform, scanning every image and video, you'll need to use your own infrastructure management tools like Terraform, Kubernetes, or others, which may require some design work. Depending on a number of factors about your platform, it may take 1-2 engineers about a month.

What scale can HMA run at?

HMA is horizontally scalable: to get more throughput, you can deploy more machines. The Docker image can also be used in serverless deployments. It has been tested with up to 10M banked photos at about 30 ms per lookup.
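For rough capacity planning, the 30 ms per lookup figure can be turned into a worker count. This is a back-of-the-envelope sketch under a simplifying assumption: each worker handles one lookup at a time, and hashing, network, and queueing overhead are ignored. The function name and the traffic numbers are hypothetical.

```python
import math


def workers_needed(target_lookups_per_second: float, latency_ms: float = 30.0) -> int:
    """Estimate sequential workers needed to sustain a target lookup rate.

    Assumes one in-flight lookup per worker and no overhead beyond the lookup itself.
    """
    return math.ceil(target_lookups_per_second * latency_ms / 1000)


# At 30 ms per lookup, one sequential worker handles roughly 33 lookups per second.
print(workers_needed(100))    # 3
print(workers_needed(1000))   # 30
```

Real deployments would need headroom on top of this estimate, so treat the result as a lower bound rather than a sizing recommendation.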

How expensive is it to run HMA?

This depends on how you deploy HMA. For testing purposes, you might only need a single virtual machine with a small amount of resources.

In benchmarking tests in March 2021, when deployed in a serverless mode, the cost for checking 1 MB images was 1 cent per 1,000 images, or about $10 per million images. Computing hashes in your own infrastructure can reduce the cost, as hashing is the most expensive component.
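To project the benchmark figure onto your own traffic, you can scale it linearly. The sketch below assumes the March 2021 rate of 1 cent per 1,000 images still applies and that cost grows linearly with volume, neither of which is guaranteed. The function name and the example volumes are hypothetical.

```python
from decimal import Decimal

# Benchmark rate quoted above: 1 cent (USD 0.01) per 1,000 images of about 1 MB each.
COST_PER_THOUSAND_IMAGES = Decimal("0.01")


def estimated_cost_usd(image_count: int) -> Decimal:
    """Linearly scale the benchmark rate to a given number of images."""
    return COST_PER_THOUSAND_IMAGES * image_count / 1000


print(estimated_cost_usd(1_000_000))        # about $10 per million images
print(estimated_cost_usd(5_000_000 * 30))   # a hypothetical 5M images/day for 30 days
```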