Skip to content

OpenMined/PSI

Repository files navigation

om-logo

CD License OpenCollective

PSI

Private Set Intersection protocol based on ECDH and Golomb Compressed Sets or Bloom Filters.

Protocol

The Private Set Intersection (PSI) protocol involves two parties, a client and a server, each holding a dataset. The goal of the protocol is for the client to determine the intersection between their dataset and the server's dataset, without revealing any information about their respective datasets to each other.

The protocol proceeds as follows:

  1. Setup (server)

The server encrypts all its elements x under a commutative encryption scheme, computing H(x)^s where s is its secret key. The encrypted elements are then inserted into a container and sent to the client in the form of a serialized protobuf and resembles* the following:

[ H(x_1)^(s), H(x_2)^(s), ... , H(x_n)^(s) ]
  1. Request (client)

The client encrypts all their elements x using the commutative encryption scheme, computing H(x)^c, where c is its secret key. The client sends its encrypted elements to the server along with a boolean flag, reveal_intersection, indicating whether the client wants to learn the elements in the intersection or only its size (cardinality). The payload is sent as a serialized protobuf and resembles* the following:

[ H(x_1)^(c), H(x_2)^(c), ... , H(x_n)^(c) ]
  1. Response (server)

For each encrypted element H(x)^c received from the client, the server encrypts it again under the commutative encryption scheme with its secret key s, computing (H(x)^c)^s = H(x)^(cs). The result is sent back to the client in a serialized protobuf and resembles* the following:

[ H(x_1)^(cs), H(x_2)^(cs), ... , H(x_n)^(cs) ]
  1. Compute intersection (client)

The client decrypts each element received from the server's response using its secret key c, computing (H(x)^(cs))^(1/c) = H(x)^s. It then checks whether each decrypted element is present in the container received from the server, and reports the number of matches as the intersection size.

It's worth noting that the protocol has several variants, some of which introduce a small false-positive rate, while others do not generate false positives. This behavior is selective, and the false-positive rate can be tuned. The selection has implications on communication costs as well.

NOTE resembles*: The protocol has configurable containers. Golomb Compressed Sets (Gcs) is the default container but it can be overridden to be BloomFilter or Raw encrypted strings. Gcs and BloomFilter will have false positives whereas Raw will not. Using Raw increases the communication cost as it is sending raw strings over the wire while the other two options drastically reduce the cost at the price of having false positives.

Security

See SECURITY.md.

Requirements

There are requirements for the entire project which each language shares. There also could be requirements for each target language:

Global Requirements

These are the common requirements across all target languages of this project.

  • A compiler such as clang or gcc
  • Bazel

Installation

The repository uses a folder structure to isolate the supported targets from one another:

private_set_intersection/<target language>/<sources>

C++

See the C++ README.md

JavaScript

See the JavaScript README.md

Go

See the Go README.md

Python

See the Python README.md

Rust

See the Rust README.md

Usage

To use this library in another Bazel project, add the following to your WORKSPACE file:

load("@bazel_tools//tools/build_defs/repo:git.bzl", "git_repository")

git_repository(
   name = "org_openmined_psi",
   remote = "https://github.com/OpenMined/PSI",
   branch = "master",
)

load("@org_openmined_psi//private_set_intersection:preload.bzl", "psi_preload")

psi_preload()

load("@org_openmined_psi//private_set_intersection:deps.bzl", "psi_deps")

psi_deps()

load("@rules_python//python:pip.bzl", "pip_parse")

pip_parse(
    name = "pip_deps",
    # Generated via pip-compile requirements.in
    requirements_lock = "@org_openmined_psi//private_set_intersection/python:requirements.txt",
)

load("@pip_deps//:requirements.bzl", "install_deps")

install_deps()

load("@build_bazel_rules_nodejs//:index.bzl", "node_repositories", "npm_install")

node_repositories()

npm_install(
    name = "npm",
    package_json = "//:package.json",
    package_lock_json = "//:package-lock.json",
)

load("@emsdk//:emscripten_deps.bzl", emsdk_emscripten_deps = "emscripten_deps")

emsdk_emscripten_deps()

A full description of the protocol can be found in the documentation of the PsiClient class. The corresponding server class is PsiServer. An example of how to interleave the different phases of the protocol can be found in psi_server_test.cpp.

Changes

See CHANGES.md.

Contributing

Pull requests are welcome. For major changes, please open an issue first to discuss what you would like to change.

Please make sure to update tests as appropriate.

Contributors

See CONTRIBUTORS.md.

License

Apache License 2.0