
Tech Meeting Notes 2020 07 30

Erik Moeller edited this page Jul 30, 2020 · 4 revisions

SecureDrop Tech Meeting, 2020-07-30

Topic: Reproducible builds

Facilitator: Kushal

Notetaker: Erik

Agenda

  1. What are reproducible builds?
  2. Why do we want reproducible builds?
  3. What is stopping us from having reproducible builds easily?
  4. How did we achieve reproducible builds for Debian packages (except the securedrop-app-code package)?
  5. What are the pain points for developers with the current workflow?

What are reproducible builds? - https://reproducible-builds.org/

Kushal: Goal: Follow specified steps and always get exactly the same binary Debian package, whether it's built by someone at FPF or by a third party. It's an old problem with many moving parts. Even in the Debian project it's not yet accomplished for all packages, and Red Hat / RPM is even further behind.
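
The "exactly the same binary" goal is typically checked by hashing independently built artifacts; a minimal sketch (the file paths in the comment are hypothetical):

```python
import hashlib

def sha256sum(path):
    """Return the hex SHA-256 digest of a file, read in chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            h.update(chunk)
    return h.hexdigest()

# Two builds are reproducible only if the digests match bit-for-bit, e.g.:
# sha256sum("fpf-build/securedrop-app.deb") == sha256sum("third-party-build/securedrop-app.deb")
```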

Why do we want reproducible builds?

Kushal: Security is a top concern for SecureDrop. Reproducible builds provide our users with assurances that what they are using is what we, the SecureDrop project, developed; they minimize the risk of tampering and make tampering easier to detect. This strengthens both actual security and the perception thereof.

What is stopping us from having reproducible builds easily?

Kushal: We're still using older build tooling, e.g., Ubuntu Xenial and Debian Buster. For build flags, generation of source tarballs, and final distribution methods, we're reliant on the older tooling in the distributions we're using. That makes our work more difficult than if we used the bleeding edge.

Debian Buster is the latest stable version -- its packages were frozen ~18 months ago. It's worth comparing Debian Buster vs. Debian Unstable to monitor specific improvements that make builds more reproducible.

Conor:

  • One compromise is to enable the stable-backports repo.
  • Can you give some specific examples here? It seems to me that we have support for features like SOURCE_DATE_EPOCH.
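
The SOURCE_DATE_EPOCH convention Conor mentions is simple: when the environment variable is set, a build tool embeds that timestamp instead of the current time, so repeated builds agree. A minimal sketch of how a tool honors it:

```python
import os
import time

def build_timestamp():
    """Return the timestamp a build tool should embed in its output:
    SOURCE_DATE_EPOCH when set (for reproducibility), otherwise the
    current time."""
    epoch = os.environ.get("SOURCE_DATE_EPOCH")
    return int(epoch) if epoch is not None else int(time.time())
```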

Kushal: Example: When we decided on build practices for SecureDrop Workstation, we agreed that we want to start from a source tarball. This is done using commands like python setup.py sdist. These tarballs are signed by our actual keys and are the starting point. The missing part is that the python setup.py sdist tarball procedure is not yet reproducible.

https://github.com/python/cpython/pull/20331 will fix the .tar part, but not the .tar.gz part.

Conor: That would still be a big win because there are other tools for the .gz part, like the "strip-nondeterminism" tool maintained by the reproducible builds initiative ( https://salsa.debian.org/reproducible-builds/strip-nondeterminism ). We could also make the output a ZIP file.

Given the minor variation we observe in the source tarball, we could code our way around it -- we could munge the problematic non-reproducible bits in post-sdist tooling we'd have to create/maintain.
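
The post-sdist munging tool described here would normalize the metadata that makes tarballs differ between runs: member mtimes, uid/gid, ordering, and the timestamp in the gzip header. A hypothetical sketch (function name and defaults are assumptions, not existing tooling):

```python
import gzip
import io
import tarfile

def normalize_targz(src, dst, epoch=0):
    """Rewrite src (.tar.gz) as dst with deterministic metadata:
    sorted member order, fixed mtimes, uid/gid 0, empty user/group
    names, and a zeroed gzip header timestamp. A sketch of the kind
    of post-sdist normalization discussed above."""
    payload = io.BytesIO()
    with tarfile.open(src, "r:gz") as tin:
        with tarfile.open(fileobj=payload, mode="w") as tout:
            for member in sorted(tin.getmembers(), key=lambda m: m.name):
                member.mtime = epoch
                member.uid = member.gid = 0
                member.uname = member.gname = ""
                member.pax_headers = {}  # drop stale extended headers (e.g. float mtimes)
                fileobj = tin.extractfile(member) if member.isfile() else None
                tout.addfile(member, fileobj)
    with open(dst, "wb") as out:
        # filename="" keeps the output path out of the gzip FNAME field;
        # mtime=0 zeroes the gzip header timestamp.
        with gzip.GzipFile(filename="", fileobj=out, mode="wb", mtime=0) as gz:
            gz.write(payload.getvalue())
```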

Kushal: There is another upstream pending change, https://github.com/pypa/setuptools/pull/2136 , which would modify sdist itself.

Let's say our source distribution is reproducible. Because we are writing Python applications, we depend on various Python modules. To ensure we can ship the latest versions rather than outdated system packages, we use dh-virtualenv to create a virtual environment with all our dependencies packaged inside our Debian package.

To build those packages, we download sources and rebuild. The results of building wheels vary between different operating systems. We solve this by creating wheels beforehand, signing them, and uploading them to an LFS repo. This way, we know that everyone is using the same wheels.
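
The "everyone is using the same wheels" guarantee boils down to verifying each wheel's digest against a signed manifest before use. A sketch of that check, assuming a sha256sums-style manifest (the manifest format and file names here are assumptions, not the actual tooling):

```python
import hashlib
import os

def verify_wheels(wheel_dir, sums_file):
    """Check every wheel listed in a sha256sums-style manifest
    ('<hexdigest>  <filename>' per line) against the files in
    wheel_dir, raising ValueError on any mismatch."""
    with open(sums_file) as f:
        for line in f:
            if not line.strip():
                continue
            digest, name = line.split()
            path = os.path.join(wheel_dir, name)
            with open(path, "rb") as wheel:
                actual = hashlib.sha256(wheel.read()).hexdigest()
            if actual != digest:
                raise ValueError("checksum mismatch for %s" % name)
    return True
```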

Mickael: Does dh-virtualenv use any Python dependencies itself, and if so, how does it manage them?

Kushal: dh-virtualenv can use the virtualenv command or the venv module. We are still using the virtualenv project to create the final virtual environment.

Mickael: What I meant was, at build time, does dh-virtualenv itself need any dependencies, and if so, what dependencies does it use for that?

Kushal: dh-virtualenv creates a virtualenv of its own using the virtualenv command or the venv module. Each has its own way of installing Python modules. venv gets a package called python3-wheels, which contains all the required wheels to bootstrap the package.

As Debian decided to remove venv from the standard installation, we have to install python3-venv and python3-pip at the system level.

python3 -m venv .venv -- will create a virtual environment.
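
For reference, the same step can be done programmatically with the standard-library venv module; this is a sketch, not part of the actual packaging tooling:

```python
import venv

# Programmatic equivalent of `python3 -m venv .venv`. Setting
# with_pip=True triggers the ensurepip bootstrap, which is why Debian
# (having split venv out of the default install) requires the
# python3-venv package at the system level.
builder = venv.EnvBuilder(with_pip=False)  # flip to True to also bootstrap pip
builder.create(".venv")
```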

How did we achieve reproducible builds for Debian packages (except the securedrop-app-code package)?

Kushal: dh-virtualenv takes the provided wheels and installs them in the provided virtual environment. Because they are already signed and built, the final packages are reproducible.

What are the pain points for developers with the current workflow?

Let's say I want to introduce a new dependency for securedrop-client. If we add it as a dependency, the Debian package build will fail, because dh-virtualenv will try and fail to find this dependency in our repository. We first have to get that wheel built, which slows down development.

https://github.com/freedomofpress/securedrop-debian-packaging/wiki

Wheels go to https://pypi.securedrop.org/localwheels/

Conor: It would be good to be able to track what we need to make wheels reproducible -- some of them may already be there. The wheel tool supports reproducible wheels as of 0.27.0 (2016-02-05): https://wheel.readthedocs.io/en/stable/news.html

Kushal: Agree - that seems like a potential ACTION. More work can be done as part of the Ubuntu 18.04/20.04 transition.

Most major companies have their own internal repos of wheels.

Erik: Do we imagine we will maintain pypi.securedrop.org indefinitely?

Kushal: If/when wheels are fully reproducible we may stop maintaining our own repository. Having our own repository provides additional security, e.g., we won't build maliciously added dependencies.

Erik: If we must maintain pypi.securedrop.org, what velocity advantages (if any) do reproducible builds offer? It seems we're committed to maintaining that work.

Conor: In an ideal world, I'd like to see all the rungs on the ladder fully reproducible. Right now we're drilling into the wheels part. If the Python ecosystem adopts the reproducible builds philosophy sufficiently, all future versions of wheels will be fully reproducible. We may no longer need to maintain pypi.securedrop.org and will then need to decide whether it's worth continuing to do so.

Kushal: Maintaining the mirror provides an important defense.

Mickael: Part of the value of reproducible builds is trusting that the system has built the artifact we're expecting. If we can control the input and say "build SecureDrop package X", then the system can compile it fully from source and package it into a deb. In the current process we need to verify every single step, but if the whole pipeline is deterministic, we only need to verify/approve what CI does.

Kushal: The CI point of view is very helpful for reproducible builds. In a lot of places we can use developer time to verify what CI built.

Allie: Say the package building logic for the Python packages is fully reproducible -- is there a case where a new dependency may not be reproducible, even though the packaging logic ensures reproducibility? Could there be value keeping the mirror up just for the packages we have issues with?

Kushal: I think that might still happen.

Mickael: Is the question: Is there a way for us to mirror only wheels that are not reproducible?

Allie: In part. For the tarball problem it seemed very clear what exactly is not reproducible about it, e.g., the timestamp. For the wheels, I'm less familiar with the reasons they're not reproducible. Is it a packaging-logic issue, e.g., we need to change one place and then all our wheels will be reproducible? Or could there be cases where we want to add a new package and we can't make it reproducible?

Conor: Wheels are just ZIP archives, compressing down code that needs to be shipped. The reproducible build folks maintain a tool called diffoscope. It gives you nice colorized diff output, e.g., "look at these timestamps in this subdirectory fluctuating". There's another tool called reprotest that lets you run through build logic repeatedly.
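
Since wheels are just ZIP archives, the fluctuating timestamps diffoscope flags can also be inspected directly with the standard library; a small sketch:

```python
import zipfile

def zip_timestamps(path):
    """List (name, date_time) for each archive member of a wheel/zip.
    Differing date_time values between two otherwise identical wheels
    are a classic reproducibility failure diffoscope highlights."""
    with zipfile.ZipFile(path) as z:
        return [(info.filename, info.date_time) for info in z.infolist()]
```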

We should test this in practice by fixing tarball reproducibility and checking whether we in fact have reproducibility issues with our wheels.

Kushal: Related question as an action item -- should we become an official part of the Reproducible Builds project? E.g. get listed on https://reproducible-builds.org/projects/

Conor: Yes, we could do this after continued investigation.

Erik: Sounds like a good idea, esp. if we can link to clear tracking issue.

Erik: Touching base on https://github.com/freedomofpress/securedrop-debian-packaging/pull/185

Kushal: At minimum we need to create clear documentation for how to create the tarballs.

Conor: We could merge the tarball step with the tag creation step.

Kushal: That sounds very much like the original plan when we first developed the build strategy.

Mickael: One clear advantage of having the tarball in the packaging repo: we need to update the packaging repo with the changelog etc. anyway. When someone builds a package, they just need one tag, then run the build.

Conor: The proposed dynamic build process does include scripting for cloning repos etc.

NO CONSENSUS yet on removing tarballs directory from securedrop-debian-packaging repo.

ACTION: Conor to continue tarball reproducibility spike, with an eye to a wrapper to make tarballs themselves reproducible.

Mickael: What are the cross-compilation options, given that we have to build for Ubuntu 16.04 + 18.04/20.04?

Kushal: It seems to me that the differences are relatively minor, e.g. Python versions.

Erik: We'll need to have follow-up discussion about build strategies, e.g. continued use of the Docker-based build strategy in SecureDrop Core. In follow-up, I'd also like to better understand when we need full signing procedures, so we can delegate as much work to machines as possible.
