whymarrh/private-relay

Private Relay

A privacy-preserving TCP proxy based on Signal's "Expanding Signal GIF search" article.

See the image on Docker Hub →

See the Terraform module →

This README outlines the high-level ideas. See CONTRIBUTING.md for information about how to build and contribute to the project.

What's the idea here?

From Signal's article:[1]

In order to hide your search term from GIPHY, the Signal service acts as a privacy-preserving proxy.

When querying GIPHY:

  1. The Signal app opens a TCP connection to the Signal service.
  2. The Signal service opens a TCP connection to the GIPHY HTTPS API endpoint and relays bytes between the app and GIPHY.
  3. The Signal app negotiates TLS through the proxied TCP connection all the way to the GIPHY HTTPS API endpoint.

Since communication is done via TLS all the way to GIPHY, the Signal service never sees the plaintext contents of what is transmitted or received. Since the TCP connection is proxied through the Signal service, GIPHY doesn't know who issued the request.

The Signal service essentially acts as a VPN for GIPHY traffic: the Signal service knows who you are, but not what you're searching for or selecting. The GIPHY API service sees the search term, but not who you are.

This proxy is an implementation of exactly that.
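The three steps quoted above can be sketched in a few lines of Python. This is only an illustration of the byte-relaying idea, not the code this project runs (the project uses HAProxy); the function names `pipe`, `handle_client`, and `serve` are hypothetical. Note that the relay only copies opaque bytes: it never parses or decrypts what passes through.

```python
import socket
import threading
from typing import Tuple


def pipe(src: socket.socket, dst: socket.socket) -> None:
    """Copy bytes from src to dst until src closes, then half-close dst."""
    try:
        while True:
            chunk = src.recv(65536)
            if not chunk:
                break
            dst.sendall(chunk)
    except OSError:
        pass  # treat a reset connection like a normal close
    finally:
        try:
            dst.shutdown(socket.SHUT_WR)
        except OSError:
            pass


def handle_client(client: socket.socket, upstream_addr: Tuple[str, int]) -> None:
    """Open the upstream TCP connection and relay bytes in both directions."""
    with client, socket.create_connection(upstream_addr) as upstream:
        forward = threading.Thread(target=pipe, args=(client, upstream), daemon=True)
        forward.start()
        pipe(upstream, client)
        forward.join()


def serve(listener: socket.socket, upstream_addr: Tuple[str, int]) -> None:
    """Accept clients until the listener is closed, one thread per client."""
    while True:
        try:
            client, _ = listener.accept()
        except OSError:
            return  # listener was closed
        threading.Thread(
            target=handle_client, args=(client, upstream_addr), daemon=True
        ).start()
```

A client that speaks TLS through such a relay negotiates the session with the upstream endpoint itself, so the relay sees only ciphertext.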

What does this mean in practice?

Suppose you deployed an example proxy for httpbin.org at relay.privaterelay.technology. You could then send requests to httpbin.org through that proxy to hide your IP address from the service.

httpbin.org has a /ip endpoint that will return the requester's IP address:

curl -sSL 'https://httpbin.org/ip' | jq '.origin'
# => $ADDRESS1
curl -sSL --connect-to httpbin.org:443:relay.privaterelay.technology:443 'https://httpbin.org/ip' | jq '.origin'
# => $ADDRESS2
# Note that $ADDRESS1 ≠ $ADDRESS2

In the example above, $ADDRESS1 is your external IP address, as expected, while $ADDRESS2 is the IP address of the proxy.

(See the cURL man page for: --connect-to <HOST1:PORT1:HOST2:PORT2>)
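For readers who want to see what --connect-to amounts to, here is a rough Python equivalent: dial the relay for the TCP connection, but keep using the origin's hostname for TLS (SNI and certificate verification) and for the Host header. It is a sketch under assumptions: `ConnectTo`, `parse_connect_to`, and `get_via_relay` are hypothetical names, IPv6 literals are not handled, and it presumes the example relay above actually exists.

```python
import socket
import ssl
from typing import NamedTuple


class ConnectTo(NamedTuple):
    host1: str  # hostname the request is addressed to (the origin)
    port1: int
    host2: str  # hostname actually dialed (the relay)
    port2: int


def parse_connect_to(spec: str) -> ConnectTo:
    """Parse a curl-style HOST1:PORT1:HOST2:PORT2 string (no IPv6 literals)."""
    host1, port1, host2, port2 = spec.split(":")
    return ConnectTo(host1, int(port1), host2, int(port2))


def get_via_relay(rule: ConnectTo, path: str) -> bytes:
    """GET https://HOST1<path>, but open the TCP connection to HOST2:PORT2."""
    context = ssl.create_default_context()
    with socket.create_connection((rule.host2, rule.port2), timeout=10) as tcp:
        # SNI and certificate checks still use HOST1, so the TLS session is
        # negotiated with the origin and the relay only sees ciphertext.
        with context.wrap_socket(tcp, server_hostname=rule.host1) as tls:
            request = (
                f"GET {path} HTTP/1.1\r\n"
                f"Host: {rule.host1}\r\n"
                "Connection: close\r\n\r\n"
            )
            tls.sendall(request.encode("ascii"))
            response = b""
            while True:
                chunk = tls.recv(4096)
                if not chunk:
                    break
                response += chunk
            return response  # raw HTTP response: status line, headers, body
```

Calling `get_via_relay(parse_connect_to("httpbin.org:443:relay.privaterelay.technology:443"), "/ip")` would correspond to the second curl command above.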

How does it work?

"It's just HAProxy"

The proxy server runs HAProxy in TCP mode, and the TLS connection passes through. A useful diagram from the HAProxy docs:[1]

[Figure: HAProxy TLS pass-through diagram]

Because HAProxy never terminates the TLS session and holds none of its keys, it does not and cannot decipher the traffic.

You can see the full HAProxy configuration used in proxy/haproxy.cfg.

What is the benefit?

Privacy, mostly, at the cost of an extra TCP connection.

From Signal's article, again:[1]

[The proxy service] knows who you are, but not what you're searching for or selecting. The GIPHY API service sees the search term, but not who you are.

How can I host my own proxy?

Fork the repo and configure it!

Architecture

There are two main components:

  1. Cloudflare Load Balancing
  2. HAProxy servers

Cloudflare Load Balancing

(Note: this is not used for load balancing per se, but rather as a way of routing users to the closest HAProxy instance.)

The first component is a Cloudflare Load Balancer in DNS-Only mode with a 30-second TTL.

Operating in this mode does have a caveat:

[This] relies on DNS resolvers respecting the short TTL to re-query Cloudflare’s DNS for an updated list of healthy addresses.

The DNS-only load balancer does dynamic latency-based DNS resolution via Dynamic Steering:

Dynamic Steering uses health check data to identify the fastest pool for a given Cloudflare Region [...]

Dynamic Steering creates Round Trip Time (RTT) profiles based on an exponential weighted moving average (EWMA) of RTT to determine the fastest pool. If there is no current RTT data for your pool in a region or colocation center, Cloudflare directs traffic to the pools in failover order.
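To make the quoted description more concrete, the following sketch computes an EWMA over RTT samples per pool and picks the pool with the lowest average, falling back to failover order when there is no RTT data. It is an illustration only, not Cloudflare's implementation: the smoothing factor `ALPHA`, the function names, and the rule of ignoring pools without samples are all assumptions.

```python
from typing import Dict, List, Optional

ALPHA = 0.5  # hypothetical smoothing factor; the real value is not stated above


def ewma(previous: Optional[float], sample: float, alpha: float = ALPHA) -> float:
    """Fold one RTT sample into the running average."""
    if previous is None:
        return sample  # the first sample seeds the average
    return alpha * sample + (1 - alpha) * previous


def rtt_profile(samples: List[float], alpha: float = ALPHA) -> Optional[float]:
    """Return the EWMA of the samples (oldest first), or None if there are none."""
    average: Optional[float] = None
    for sample in samples:
        average = ewma(average, sample, alpha)
    return average


def pick_pool(failover_order: List[str], samples_by_pool: Dict[str, List[float]]) -> str:
    """Pick the pool with the lowest EWMA RTT, else the first pool in failover order."""
    profiles: Dict[str, float] = {}
    for pool in failover_order:
        profile = rtt_profile(samples_by_pool.get(pool, []))
        if profile is not None:
            profiles[pool] = profile
    if not profiles:
        return failover_order[0]  # no RTT data at all: use failover order
    return min(profiles, key=profiles.get)
```

With the assumed `ALPHA` of 0.5, newer samples weigh as much as all older history combined; a smaller value would make the profile react more slowly to latency changes.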

HAProxy servers

As described above, HAProxy runs in TCP mode, and the TLS connection passes through. DigitalOcean hosts the HAProxy servers.

Costs

"How much does this cost to host?"

(All amounts are USD.)

The hosting costs depend on the configured regions and bandwidth usage.

The individual monthly costs:

  • DigitalOcean droplet costs vary depending on the droplet size used
    • $5/month for s-1vcpu-1gb
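  • DigitalOcean bandwidth costs: (GB used − (1024 GB × # of droplets per region × # of regions)) × $0.01 per GB
    • e.g. with the 5 droplets in this repository's config, 10 TB (10,240 GB) of total outbound transfer would be (10,240 − 5,120) × $0.01, or ~$50/month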
  • Cloudflare Load Balancing costs
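The bandwidth formula in the list above can be written as a small function. This is a sketch of the arithmetic only: the function and constant names are hypothetical, clamping the billable amount at zero is an assumption the formula does not state, and the layout of one droplet in each of five regions is assumed for the example.

```python
FREE_GB_PER_DROPLET = 1024   # from the formula above
OVERAGE_USD_PER_GB = 0.01    # from the formula above


def bandwidth_cost_usd(gb_used: float, droplets_per_region: int, regions: int) -> float:
    """Monthly bandwidth cost in USD for outbound transfer beyond the pooled allowance."""
    free_gb = FREE_GB_PER_DROPLET * droplets_per_region * regions
    billable_gb = max(0, gb_used - free_gb)  # assumption: no credit for unused allowance
    return billable_gb * OVERAGE_USD_PER_GB


# Assumed layout: 1 droplet in each of 5 regions.
ten_tb = bandwidth_cost_usd(10 * 1024, droplets_per_region=1, regions=5)
eight_tb = bandwidth_cost_usd(8 * 1024, droplets_per_region=1, regions=5)
print(f"10 TB: ${ten_tb:.2f}, 8 TB: ${eight_tb:.2f}")
```

Under these assumptions, 10 TB works out to roughly $51 and 8 TB to roughly $31.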

The total monthly costs for the config in this repository:

  • DigitalOcean
    • Droplets: $25
    • Bandwidth (~8 TB): ~$30
  • Cloudflare
    • Basic: $5
    • 5 origin servers: $15
    • 15s checks: $15
    • RTT from 8 regions: $15
    • Latency-based traffic steering: $10
    • DNS (~5.5M queries): $5
  • Total: ~$120
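As a quick check of the total, the line items listed above can be summed directly. The dictionary keys below simply mirror those items; the variable names are hypothetical, and the approximate figures are used as given.

```python
MONTHLY_COSTS_USD = {
    "DigitalOcean droplets": 25,
    "DigitalOcean bandwidth (~8 TB)": 30,
    "Cloudflare Basic": 5,
    "Cloudflare 5 origin servers": 15,
    "Cloudflare 15s checks": 15,
    "Cloudflare RTT from 8 regions": 15,
    "Cloudflare latency-based traffic steering": 10,
    "Cloudflare DNS (~5.5M queries)": 5,
}

total_usd = sum(MONTHLY_COSTS_USD.values())
print(f"Total: ~${total_usd}/month")  # prints "Total: ~$120/month"
```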

Resources:

This repository is available under the ISC License. See LICENSE.md.