Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Do not register fake repository as owner of ODB #6443

Open
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

oreiche
Copy link
Contributor

@oreiche oreiche commented Jan 4, 2023

Accessing ODB is guaranteed to be thread-safe. Registering a fake repository (created via git_repository_wrap_odb()) may not register itself as owner of the ODB in order to maintain that guarantee. Otherwise, accessing objects from ODB will try to obtain the cache from the owning repository (via odb_cache()) and produce a race if this ODB is concurrently used to create fake repositories in other threads. Consequently, operations on fake repositories will interact with the ODB's cache instead of the repository's cache.

Accessing ODB is guaranteed to be thread-safe. Registering a
fake repository (created via git_repository_wrap_odb()) may
not register itself as owner of the ODB in order to maintain
that guarantee. Otherwise, accessing objects from ODB will
try to obtain the cache from the owning repository (via
odb_cache()) and produce a race if this ODB is concurrently
used to create fake repositories in other threads.
Consequently, operations on fake repositories will interact
with the ODB's cache instead of the repository's cache.
@ethomson
Copy link
Member

Sorry for the delay. I've come back to this PR a few times to try to understand what's going on and what you're really trying to solve. More information there would be helpful, because ultimately, it seems like you have an object database and are trying to create multiple fake repositories on top of it. The immediate problem to this approach is that wrap_odb sets the odb owner to the new fake repository, which means that trying to call it multiple times means that the odb will always look to the most recently created repository's cache.

If we wanted to try to change this, we should do so without breaking the API of set_odb. There's no need for that, we should just change wrap_odb to handle the updates itself. For example:

diff --git a/src/libgit2/repository.c b/src/libgit2/repository.c
index 97f776c4a349..9a6083c399cf 100644
--- a/src/libgit2/repository.c
+++ b/src/libgit2/repository.c
@@ -1184,9 +1184,17 @@ int git_repository__wrap_odb(
 
 	repo->oid_type = oid_type;
 
-	git_repository_set_odb(repo, odb);
-	*out = repo;
+	/*
+	 * Do not GIT_REFCOUNT_OWN this odb; doing so would
+	 * cause the odb to use this repository's object cache
+	 * instead of its own cache. Using the odb cache allows
+	 * multiple fake repositories to be created via `wrap_odb`
+	 * and they will not try to override the odb cache.
+	 */
+	GIT_REFCOUNT_INC(odb);
+	GIT_ASSERT(git_atomic_swap(repo->_odb, odb) == NULL);
 
+	*out = repo;
 	return 0;
 }

However, this is only part of the problem. The objects in the cache themselves also have an owner, and the odb cache shouldn't have objects from different repositories inside of it.

So - I'm curious why you can't new up multiple object databases and wrap those? Are they in-memory object databases?

@oreiche
Copy link
Contributor Author

oreiche commented Aug 29, 2023

Thanks! I'm also very sorry for my delayed response.

So yes, you got it mostly right. This is what we are doing exactly:
We have a repository on disk from that we read the ODB and distribute its pointer across multiple threads. Accessing the OBD in parallel is of course thread-safe and fine for most operations. However, for some operations (e.g., commit/tree lookup) we have to use the shared ODB to create a (temporary) thread-local fake repository, which causes races due to that odb_cache (de)registration.

Due to that fake repo being local/private per thread, we were hoping to avoid this race without locking the fake repo (de)registration for shared ODBs (which is what we currently do as a workaround).

It seemed weird to us that creating the fake repo is not thread-safe, although it is just another ODB operation (from our point of view), and despite the resulting repo not being shared, it is still modifying a shared state (the ODB). This is what we were trying to fix, but if it is intended behaviour, we would also appreciate any other suggestion that avoids unnecessary locking. :)

Again, many thanks for taking the time to have a look it this.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants