Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

thread-safe access to Index #51

Open
danieleades opened this issue Apr 15, 2020 · 2 comments
Open

thread-safe access to Index #51

danieleades opened this issue Apr 15, 2020 · 2 comments
Labels
C-bug Category: Bug

Comments

@danieleades
Copy link
Contributor

danieleades commented Apr 15, 2020

there's a lot of unwraps in the git2 index.

Internally the git2::Repository is held in a mutex, which is unlocked and unwrapped in every method. It's not clear what purpose the Mutex serves in this module.

I suspect that the mutex is added so that the index itself is Sync, meaning it can be used "naked" in the tide::Server state. is that correct?

It might be better to move this into the calling code. I think you're kind of cheating by placing a mutex here.

The Indexer trait only has immutable references in the in the interface, which gives the appearance (as far as Rust's type system is concerned) that types which implement Indexer can safely be used across threads from inside an Arc. But since these objects are actually mutating the filesystem, you can end up with all sorts of race conditions. Neither the CLI or the git2 implementation are actually thread safe. (you might consider explicitly marking the Tree as !Sync!

for example you have this code in api/publish.rs-

state.index.add_record(crate_desc)?;
state.index.commit_and_push(commit_msg.as_str())?;

since there are no guarantees there is only one thread running this code at a time in your server, you could find yourself in a position where the commit contains more (or less!) than 1 change to the index. You'd have to be pretty unlucky to run into this, but the likelihood will scale with the number of users.

The other problem with this (as i'm sure you've discovered!) is having only immutable references in the Indexer trait precludes you from caching any information about your index. You have to do a round-trip to the filesystem for every type of query.

The simplest solution would be to wrap the config::State object in a Mutex. This seems to be a pretty common way to handle state in Tide.
While that won't hurt, this is maybe a little coarse for you since there are other objects in your State struct that are already thread safe (like the database connection pool).

alternatively, you could wrap just the Index object in the State struct. Since an Arc<Mutex> can take a trait object, this is a nice way to make your app slightly more composable. instead of Arc<Mutex<Index>>, you could simply have an Arc<Mutex<dyn Indexer>> (ie you no longer care what concrete type the index object is).

If you want to get really clever you could look at using a RwLock to only lock access on the methods that take &mut self. My gut feeling is this might not work since RwLock<T> is only Sync if T is Sync, and you're back to square 1. A simple mutex is probably fine until the number of concurrent users scales by 3 or 4 orders of magnitude.

Either way, you'll then be able to remove the mutex from the git2 implementation (since enforcing single access to the git2::Repository is only part of the problem anyway!)

Architecturally it's cleaner also, the Index is responsible for accessing and mutating the filesystem only, and it is the responsibility of the caller to ensure only one Index is doing its thing at any one time

@danieleades danieleades changed the title remove unwraps from git2Index thread-safe access to Index Apr 17, 2020
@Hirevo Hirevo added the C-bug Category: Bug label Apr 25, 2020
@Hirevo
Copy link
Owner

Hirevo commented Apr 25, 2020

Yes, the Mutex in Git2Index was indeed to have it be Sync.
At the time, I thought it would be fine but you made very good points about how that lock is not actually useful and how having it Sync is not accurate. I didn't realize that and that's completely on me.
I am sold on removing the lock from there and using an Arc<Mutex<config::State>> as Tide's state.
Thank you for pointing that out, that is quite a big deal and could easily lead to consistency issues in the registry's state.

@danieleades
Copy link
Contributor Author

no problem! it also lifts the restriction that all the methods in the Indexer trait should take &self. Being able to take &mut self allows implementors of Index to avoid using interior mutability to modify themselves.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
C-bug Category: Bug
Projects
None yet
Development

No branches or pull requests

2 participants