Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

refs: optimise writing unchanged refs #1120

Open
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

danchr
Copy link
Contributor

@danchr danchr commented Jan 16, 2023

This is inspired by an issue filed in hg-git; essentially, writing a lot of unchanged references is quite slow, due to the fsync() implied in re-writing the file. Although we could fix this within hg-git, it seems to me that the general optimisation might apply to other users of Dulwich, so this PR makes it so that it avoids acquiring the lock to a ref if that ref already matches the expected update.

@jelmer
Copy link
Owner

jelmer commented Jan 16, 2023

Thanks! I'll take a look later today

As an aside, I think one of the things currently missing from Dulwich is the explicit (user-accessible) concept of a (read) transaction during we could lock files and cache results.

@danchr
Copy link
Contributor Author

danchr commented Jan 16, 2023

As an aside, I think one of the things currently missing from Dulwich is the explicit (user-accessible) concept of a (read) transaction during we could lock files and cache results.

Well, looking at it from the hg-git perspective, the “transactions” in Mercurial and Dulwich are quite different. A Mercurial transaction is global and atomic; you either perform all changes or none at all. A Dulwich transaction, on the other hand, is file-specific. I guess some of this difference is driven by how Git itself handles transaction, but it'd be nice to have a similar, global transaction, if possible.

EDIT: One option, and a case where this would be useful, is for packed refs. There isn't really any support for locking that file during a transaction, bunching up updates, and writing them all at once. Instead, the client has to do that manually, which might easily lead to accidental O(n²) behaviour…

@jelmer
Copy link
Owner

jelmer commented Jan 16, 2023

As an aside, I think one of the things currently missing from Dulwich is the explicit (user-accessible) concept of a (read) transaction during we could lock files and cache results.

Well, looking at it from the hg-git perspective, the “transactions” in Mercurial and Dulwich are quite different. A Mercurial transaction is global and atomic; you either perform all changes or none at all. A Dulwich transaction, on the other hand, is file-specific. I guess some of this difference is driven by how Git itself handles transaction, but it'd be nice to have a similar, global transaction, if possible.

EDIT: One option, and a case where this would be useful, is for packed refs. There isn't really any support for locking that file during a transaction, bunching up updates, and writing them all at once. Instead, the client has to do that manually, which might easily lead to accidental O(n²) behaviour…

I suppose it matters more for pack files, where we currently have to repeatedly check which pack files exist if you're doing lots of random access to them - since new ones may have appeared. If we could just remember which ones existed at the start of a (read) transaction, that would probably help performance a lot.

That said, for writing we could also provide transactions that cross individual ref files - we can just keep multiple locks open at the same time.

# first, check if we need to write anything at all without
# acquiring the lock, which triggers a fs write, close and
# thus fsync()
if self.read_loose_ref(realname) != new_ref:
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually, what about the case where somebody else has the lock open? Previously you'd get a FileLocked error, but now the code would run fine.

Perhaps it's better to just do what we previously did but f.abort() if orig_ref == new_ref ? That way you'd still avoid the fsync() - and you'd avoid an extra read if things have changed.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants