Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Pushing to josh produces non-roundtrip commit #1325

Closed
RalfJung opened this issue Apr 20, 2024 · 14 comments
Closed

Pushing to josh produces non-roundtrip commit #1325

RalfJung opened this issue Apr 20, 2024 · 14 comments

Comments

@RalfJung
Copy link
Contributor

RalfJung commented Apr 20, 2024

To reproduce, I assume josh r23.12.04 is installed via

RUSTFLAGS="--cap-lints=warn" cargo +stable install josh-proxy --git https://github.com/josh-project/josh --tag r23.12.04

and is running as josh-proxy --local ~/.cache/rust-analyzer-josh --remote https://github.com --port 42042 --no-background.

I prepare a branch sync-from-ra in my rustc fork that is at commit 84e729a59f216cc64755788a470f165429a731f4.

Then I check out this branch (commit d39b00da8bd89741250eda3350450ffa6bd03565), and do

$ git push 'http://localhost:42042/RalfJung/rust.git:rev(f5a9250147f6569d8d89334dc9cca79c0322729f:prefix=src/tools/rust-analyzer):/src/tools/rust-analyzer.git' HEAD:sync-from-ra
Enumerating objects: 3930, done.
Counting objects: 100% (2465/2465), done.
Delta compression using up to 20 threads
Compressing objects: 100% (512/512), done.
Writing objects: 100% (1572/1572), 326.55 KiB | 108.85 MiB/s, done.
Total 1572 (delta 1035), reused 1546 (delta 1016), pack-reused 0
remote: Resolving deltas: 100% (1035/1035), completed with 286 local objects.
remote: josh-proxy
remote: response from upstream:
remote: To github.com:RalfJung/rust.git
remote:    84e729a59f2..f1430d6b25b  JOSH_PUSH -> sync-from-ra
remote: REWRITE(6efdf26a001b6baf2ad169c489bd37a7d1d16ad5 -> ec575f8bae97b13f75c3724e2a42d072025066f0)
remote: 
remote: 
To http://localhost:42042/RalfJung/rust.git:rev(f5a9250147f6569d8d89334dc9cca79c0322729f:prefix=src/tools/rust-analyzer):/src/tools/rust-analyzer.git
   93a578e30b..6efdf26a00  HEAD -> sync-from-ra

However, now when I fetch that branch again, I do not get my commit back out -- I expected d39b00da8bd89741250eda3350450ffa6bd03565, but I got 4037ab599d4e50aefbf476a9d492b57834b7ae4a:

$ git fetch 'http://localhost:42042/RalfJung/rust.git:rev(f5a9250147f6569d8d89334dc9cca79c0322729f:prefix=src/tools/rust-analyzer):/src/tools/rust-analyzer.git' sync-from-ra
From http://localhost:42042/RalfJung/rust.git:rev(f5a9250147f6569d8d89334dc9cca79c0322729f:prefix=src/tools/rust-analyzer):/src/tools/rust-analyzer
 * branch                  sync-from-ra -> FETCH_HEAD
$ git rev-parse FETCH_HEAD
4037ab599d4e50aefbf476a9d492b57834b7ae4a

These commits have the same tree, but not the same history.

Usually this means something went very wrong and we'll duplicate history if we now merge this into the rustc master branch and keep synchronizing, so I stopped here.

d39b00da8bd89741250eda3350450ffa6bd03565 was created by pulling 84e729a59f216cc64755788a470f165429a731f4 from rustc and merging the result, but somehow even pushing back that merged branch to the exact same rustc commit produces an invalid result.

Cc @lnicola

@RalfJung
Copy link
Contributor Author

RalfJung commented Apr 20, 2024

This probably has something to do with rust-lang/rust-analyzer#16983: josh creates non-roundtrip pushes exactly if it is asked to push something that includes rust-lang/rust-analyzer#16983. That PR was created by git subtree and it is pretty strange -- the overall diff is empty but it contains a commit that has a huge diff. Still, josh shouldn't just silently produce wrong history even when there are weird commits.

@christian-schilling
Copy link
Member

Hi @RalfJung thanks for the report, I'll look into it. Somehow it seems you like finding bugs only while I'm on vacation ;)

@RalfJung
Copy link
Contributor Author

Sorry for that.^^

FWIW we've been hitting more non-roundtrip issues recently that do not seem to involve git subtree syncs. I haven't looked into those at all yet though.

@RalfJung
Copy link
Contributor Author

I realized something is already wrong with the branch we pulled (josh-non-roundtrip). The commit Merge commit 'e36a20c24f35a4cee82bbdc600289104c9237c22' into ra-sync-and-pms-component exists twice in that branch. That's not intended -- probably a mistake with the filter, or so?

@RalfJung
Copy link
Contributor Author

RalfJung commented Apr 27, 2024

Yeah there's definitely something odd with the history in the branch we are trying to push from.

Specifically, there's something odd when doing a pull via

git fetch 'http://localhost:42042/rust-lang/rust.git:rev(55d9a533b309119c8acd13061581b43ae8840823:prefix=src/tools/rust-analyzer):/src/tools/rust-analyzer.git'

(The rev commit changed since new things landed in rustc since this issue was created)

The resulting history contains two roots!

$ git rev-list FETCH_HEAD --max-parents=0
8a7cde45522e6c3de5693b3c287aca3ef00dd172
a63222cd240d9b5405826783603f3b391c90885d

a63222cd240d9b5405826783603f3b391c90885d is expected, the other one is new.

8a7cde45522e6c3de5693b3c287aca3ef00dd172 corresponds to rust-lang/rust@43acb50 which did create this folder -- but then, the situation seems similar for Miri where rust-lang/rust@f45b570 created the Miri folder, and yet extracting the Miri history via :rev(75dd959a3a40eb5b4574f8d2e23aa6efbeb33573:prefix=src/tools/miri):/src/tools/miri only produces a single root. Specifically, rust-lang/rust@f45b570 does not show up in the extracted history at all.

Notably in Miri, rust-lang/rust@f45b570 is a git subtree merge of exactly 75dd959a3a40eb5b4574f8d2e23aa6efbeb33573, the commit showing up in rev... in RA, rust-lang/rust@43acb50 is a subtree merge of 977e12a0bdc3e329af179ef3a9d466af9eb613bb which is an ancestor of the commit showing up in rev.

@RalfJung
Copy link
Contributor Author

I have tried to use exclude to make josh skip the part of the history that was already synced via git subtree, but that doesn't seem to have any effect.

:rev(688c30dc9f8434d63bddb65bd6a4d2258d19717c:exclude[:/],55d9a533b309119c8acd13061581b43ae8840823:prefix=src/tools/rust-analyzer):/src/tools/rust-analyzer

43acb501b93c637b32fa134ba8cf586d82e96732 is an ancestor of 688c30dc9f8434d63bddb65bd6a4d2258d19717c and still it gets synced. No idea why.

@RalfJung
Copy link
Contributor Author

More specifically, the exclude does have an effect, but not enough -- it excludes some parents of 688c30dc9f8434d63bddb65bd6a4d2258d19717c, but if a parent commit has multiple paths to the HEAD and some of them bypass 688c30dc9f8434d63bddb65bd6a4d2258d19717c then it is still included.

Is there a way to fully exclude all parents of a given commit?

@christian-schilling
Copy link
Member

There is: :SQUASH. (note uppercase means this was meant for "internal use only" but it can probably be made "offical" at this point)
I have not tried this myself, but combining it with :rev(..) should work:
:rev(688c30dc9f8434d63bddb65bd6a4d2258d19717c:SQUASH)

@RalfJung
Copy link
Contributor Author

RalfJung commented Apr 27, 2024

That doesn't seem to help. I suspect what happens is that josh walks up the graph (from the HEAD via the parents towards the root), and when it hits 688c30dc9f8434d63bddb65bd6a4d2258d19717c it applies the filter -- but if there is a path that bypasses 688c30dc9f8434d63bddb65bd6a4d2258d19717c and reaches 43acb501b93c637b32fa134ba8cf586d82e96732 another way, that still gets included. In a complex history it is not easy to find a clean "cut", i.e. a set of commits such that on the way from HEAD to 43acb501b93c637b32fa134ba8cf586d82e96732 we'll definitely go through one of these commits. (And also that is fragile, as new PRs landing in rustc could have an old parent and thus add a new bypass to our existing "cut".)

@RalfJung
Copy link
Contributor Author

RalfJung commented Apr 27, 2024

FWIW my original precursor to this rev filter in #961 did involve an is_ancestor_of query that would also notice paths "bypassing" a particular commit. Not sure if that is in any way coherent with the design of josh though. ;)

I tried working around this by adding a lot of commits (specifically, I added 688c30dc9f8434d63bddb65bd6a4d2258d19717c and the 64 merge commits that precede it) to ensure we can't "bypass" them, but that seems to make josh extremely slow. I guess this somehow defeats caching, or so. Anyway that wouldn't be a proper solution, just an experiment.

SQUASH also doesn't seem to stop the traversal, or does it? (I tried to understand its implementation and failed.^^) Basically what I am looking for is a filter that returns the empty tree with no parents. Though I have no idea if that's actually right...

@RalfJung
Copy link
Contributor Author

RalfJung commented Apr 27, 2024

Ideally ofc we wouldn't need any of this, we'd "just" get a nice history out of rustc. :)

I am surprised that this becomes a root in the first place. One of the parents of rust-lang/rust@43acb50 is 977e12a0bdc3e329af179ef3a9d466af9eb613bb which is a child of 55d9a533b309119c8acd13061581b43ae8840823 so it should get included verbatim....

... oh wait. We have the same "bypass" issue here. rev(55d9a533b309119c8acd13061581b43ae8840823...) only applies to 55d9a533b309119c8acd13061581b43ae8840823 directly, not to any other commit that references a parent of 55d9a533b309119c8acd13061581b43ae8840823. So I think all we need is a version of rev that recognizes parent commits properly.

It is even possible that the reason this worked for Miri is that the Miri merge was done with my version of the filter (#961), so it would correctly deal with parents of "the git subtree root" (55d9a533) being referenced further up the rustc history. Then later we switched to your version of the filter but at that point we already had nicely aligned history so the filter's job was simpler? Or maybe in the case of Miri, the "git subtree root" is indeed a cut, i.e. none of the parents of the Miri "git subtree root" are referenced in rustc. (EDIT: neither of these can be entirely right. There is in fact a "bypass" commit in the rustc history. But nothing in that part of the rustc history has files in src/tools/miri so it all comes out as empty with the filter and gets discarded, so there is no problem.)

For rust-analyzer with the current josh semantics, we'd have to figure out all the RA commits that are referenced from rustc, and add a :prefix=src/tools/rust-analyzer filter to them.

@RalfJung
Copy link
Contributor Author

I sketched a version of rev that avoids this "bypass" issue: #1326.

That seems to be able to handle RA's history the way I was hoping; I can pull and then push (and even merge RA changes that happened in the mean time and then push) without getting the non-roundtrip error.

@christian-schilling
Copy link
Member

Thanks for the thorough investigation of this issue. I think you got it completely right. If the same commit is reachable via multiple paths and some of them are in :rev and others are not we get duplicated history and the roundtrip issue.

@christian-schilling
Copy link
Member

fixed in #1329

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants