API safety and soundness #29

vi · 2023-07-10T22:46:04Z

Description mentions that the library uses mmap, also links to "Single-level store" article, which suggests that jammdb may give users references leading to mmaped files.

I expected e.g. jammdb::DB::open to be unsafe fn, as full protection against unreasonable things that is needed to make it completely sound (or maybe even enough-for-practical-considerations sound) may not be viable gived the architecture.

But it seems to be not the case, the crate API does not hint at potentials undefined behaviours users may face with the library. For example, what worst could happen if the database file is mangled by external process while in use? What if multiple processes open the same database for writing (including using networked filesystem without locking)?

Shall the entry function be marked unsafe to make users commit to not doing unreasonable things to the database? Or is jammdb actually fully handles all possible complications of memory mapping, so the API is indeed sound even when misused?

The text was updated successfully, but these errors were encountered:

pjtatlow · 2023-07-11T01:46:36Z

Hey @vi! Thanks for bringing this up. I don't know that it's realistic to try and handle all possible complications involving memory mapping, and it's definitely possible that by manipulating the underlying file you can mess up your database and cause a panic or other undefined behavior. However, I do believe that the steps I've taken (locking the file if possible) should be "safe enough" provided you're not doing something ridiculous, which is why I chose to not to make the function unsafe. unsafe is no "protection" as you've put it, so I don't see the point of marking the function that way. In my opinion it would just cause uncertainty in the library's stability rather than warning users to not do something dumb. I'm happy to debate this point though!

Perhaps some more documentation would be appropriate though? Some big warning that says not to mess with the file while it's open by another process?

vi · 2023-07-11T07:32:44Z

unsafe is no "protection" as you've put it, so I don't see the point of marking the function that way.

unsafe is not a protection, but moving the responsibility from authors of the crate to its users.

I do believe that the steps I've taken (locking the file if possible) should be "safe enough" provided you're not doing something ridiculous, which is why I chose to not to make the function unsafe.

For example, would it be wise to use the app from privileged suid executable that accesses a database which is created by unprivileged users? Will straightforward usage of the API lead to a secure application? Users of the crate may think "if database is adversarial then it may return fake data, so we need e.g. to verify signatures of those data before using it" while in reality mere opening the database leads to code execution.

Perhaps some more documentation would be appropriate though?
Some big warning that says not to mess with the file while it's open by another process?

Documentation should be a starting point. The documentation should mention what worst can happen if the rules are not being followed. If this "worst" involves undefined behaviour, then it invites unsafe fn.

If, however, the function is a gateway to something large and potential unsoundness only reachable in really tricky scenarios, then the function is often not marked unsafe. For example, accessing debugging tools like /proc/self/mem from filesystem API or /usr/bin/gdb from process spawn API, executing Python code (which may involve unsafe ctypes module) are not typically marked unsafe.

But, for example, opening attacker-provided or corrupted database file should not be considered as too exotic scenario and should invite unsafe, unless it is guaranteed that problems would be encapsulated only to the database and data returned by it (i.e. OK ways to misbehave: infinite loop, deadlock, process abort, large memory allocation, returning arbitrary, but well-formed data to users, infinite recursion. Not OK ways to misbehave: misaligned/dangling references, returning non-UTF-8 byte blocks as strings to users; mangling other, neighbouring, unrelated and uncorrupted jammdb::DB instance).

Precedent: BTreeMap/HashMap in std and bad Ord/Hash implementations:

It is a logic error for a key to be modified ... The behavior resulting from such a logic error is not specified, but will be encapsulated to the BTreeMap that observed the logic error and not result in undefined behavior. This could include panics, incorrect results, aborts, memory leaks, and non-termination.

This encapsulation allows BTreeMap to stay sound even when misbehaving. Can the same be stated for jammdb::DB?

(locking the file if possible)
if possible

There may be two functions for opening the database: usual pub fn open which enforces proper mandatory locking, overridable only with root access or debuggers and failing to open the database if it is not possible; and pub unsafe fn open_unchecked which skips locking or makes it optional.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

API safety and soundness #29

API safety and soundness #29

vi commented Jul 10, 2023 •

edited

pjtatlow commented Jul 11, 2023

vi commented Jul 11, 2023 •

edited

API safety and soundness #29

API safety and soundness #29

Comments

vi commented Jul 10, 2023 • edited

pjtatlow commented Jul 11, 2023

vi commented Jul 11, 2023 • edited

vi commented Jul 10, 2023 •

edited

vi commented Jul 11, 2023 •

edited