Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Peru/SVN: Can not import files from a repo-path that existed in the past but has been moved #209

Open
Tweakbert opened this issue Nov 23, 2020 · 8 comments

Comments

@Tweakbert
Copy link

Tweakbert commented Nov 23, 2020

Hi,
I just started using peru (Version V1.2.0, Windows 10) and did various tests in order to get some experiences with it.
For the time being, I use SVN only (I am not familiar with git).

The issue I experienced is, that I can not import files from svn repository paths, that have been moved to a different location within the same repository and do not exist in the head revision.
I explicitly specify the corresponding revision of when the path still existed in the past.

The corresponding peru.yaml file looks like this:

imports:
    txtTest: common

#This does not work although it should
svn module txtTest:
    rev: "32"
    url: https://<someRepoPath>/dir1_bornRev32

someRepoPath is obviously just a place holder. The sub directory dir1_bornRev32 exists in Revision 32 at the specified location, but it does not exist in the head (latest) revision, because it has been moved elsewhere.

However, if I start peru,
peru sync
the operation can not be completed successfully. Peru passes the following error message to the command shell:

svn: E160013: '<someRepoPath>/!svn/rvr/35/trunk/dir1_bornRev32' path not found

What meets the eye, is that the Revision that is passed back is not the one explicitly specified (32), but the head revision (35). So the information in the error message is correct: that path does not exist in revision 35 (head).

I assume that at some point the specified revision is not explicitly passed to SVN.

It would be great, if you could try to reproduce the issue. My colleague was able to reproduce the issue (with the same repository).

Thank you in advance.

Best Regards,
Tweakbert

@olson-sean-k
Copy link
Member

Perhaps using the pick rule to isolate the dir1_bornRev32 subdirectory would work here (instead of including the subdirectory in the repository URL, which is not generally supported). Using pick in the peru.yaml manifest would look something like this:

imports:
    txtTest: common

svn module txtTest:
    rev: 32
    url: https://<someRepoPath>
    pick: dir1_bornRev32

Does that work?

@oconnor663
Copy link
Member

Or alternatively an export rule. The difference is that pick preserves the directory path (and you can pick more than one thing), while export makes that directory the new root of your module (and so there can only be one).

@oconnor663
Copy link
Member

Also if you want to see exactly what svn command peru is running on your behalf, take a look at this line: https://github.com/buildinspace/peru/blob/master/peru/resources/plugins/svn/svn_plugin.py#L44-L45 You might be able to try running the same command by hand, to see if it behaves the way you expect it to.

@Tweakbert
Copy link
Author

Thank you both for your replies, Sean and Jack.

Sean: Yes, in my specific example, it does work with pick, because the rest of the path in "someRepoPath" is still valid in the head-revision.

But I see the issue somewhere else. I take it, that the primary goal is the reproducibility of source/builds and I did not get, that either pick or export are required in order to make the source reproducible, I rather understood them as something like filters.
I specified modules in peru.yaml that could be synced and built fine in one revision.
When I re-organized directories in the next revision, I was not able to sync / build anymore, although all the necessary information was in peru.yaml and in the repository. The paths were not found anymore, although they existed in the revisions I explicitly specified in peru.yaml. This is not desireable.

Jack: I checked the parameters that peru passes to svn. Hope I did not get it wrong, I don't have any python experience, nor did I use svn from the command line before.
I tried the parameters that I understand peru passes to svn in the command shell - and I got the same result / error message from svn.
I took a closer look at svn export and noticed an important detail: the peg revision, which is currently not passed to SVN.
To my understanding: the peg revision defines path, that existed at a certain point (specific revision) in the history of an svn repository.

I guess it is crucial, to pass this peg revision, which shall be the same as the module revision in order to enable SVN to export from the specified path, that might not exist in the head revision anymore (I guess this is what it defaults to, if not specified).

Here is more information about it:
https://edoras.sdsu.edu/doc/svn-book-html-chunk/svn.advanced.pegrevs.html

Currently, peru passes the following parameters (I use the example from before):
svn export --force --revision 32 https://<someRepoPath>/dir1_bornRev32 common
But as mentioned before, that does not work, when the specified path to dir1_bornRev32 has been moved and does not exist anymore in the head revision.

Here is a proposal that fixed the issue in my attempts:
svn export --force --revision 32 https://<someRepoPath>/dir1_bornRev32@32 common

Please pay attention to the @32 appended to the url (no spaces between url and peg revision). When the optional peg revision is added like this, it works and the files can be retrieved from the path that existed for this specific revision.

@oconnor663
Copy link
Member

Peg revisions are specified to the Subversion command-line client using at syntax, so called because the syntax involves appending an “at sign” (@) and the peg revision to the end of the path with which the revision is associated.

But what of the --revision (-r) of which we've spoken so much in this book? That revision (or set of revisions) is called the operative revision (or operative revision range). Once a particular line of history has been identified using a path and peg revision, Subversion performs the requested operation using the operative revision(s).

Fascinating! Thanks for figuring this out.

I suppose this raises a tricky question. On the one hand, peru could make the simplifying assumption that when you specify rev, you mean it be both the "operative revision" and the "peg revision". That would make your example work, and arguably it'd be the behavior that most people (who aren't SVN experts) expect. On the other hand, if the designers of SVN considered it important to make peg revisions part of the fetch URL, peru could respect that choice. That would be the "unopinionated" approach, where we conform to the behavior of the underlying tool as closely as possible.

It's not immediately obvious to me which of those behaviors is better. I think the documentation you linked to has some important context here. SVN's treatment of paths is more complex than I realized. Apparently if you ask for "directory foo at historical revision R", then SVN is going to give you the contents of the directory currently called foo, even if it had a different name at revision R. This is quite a powerful feature, if you want it :) Though I think I remember reading some discussions years ago about how that feature can make merge conflicts very difficult to resolve, if two different branches create directories of the same name. That might've influenced the design decisions they made in Git, which doesn't track identities across renames like this.

Thoughts?

@olson-sean-k
Copy link
Member

I took a closer look at svn export and noticed an important detail: the peg revision

Wow, thanks for finding this! I think I encountered something like this years ago, but it's been ages and I remember very little about using Subversion.

On the one hand, peru could make the simplifying assumption that when you specify rev, you mean it be both the "operative revision" and the "peg revision". ... On the other hand, if the designers of SVN considered it important to make peg revisions part of the fetch URL, peru could respect that choice.

My gut reaction is that peru should not map the rev field to both the operative and peg revisions and the svn plugin should allow for peg revisions in the url field. This seems to be a Subversion pitfall and I'm not immediately convinced that peru should try to paper over it. Doing so may lead to bad interactions between fields too, since the peg revision is encoded into the URL.

If we decide not to be opinionated and roll with Subversion, then we'll probably want to briefly document support for peg revisions in the svn plugin. That way, experts can see that it works as they expect and everyone else has a hint when something breaks.

@Tweakbert
Copy link
Author

I also did not imagine it being this complex.
You both brought up some pretty good points about it.

Here are my thoughts on this:
Just having started with peru, my perspective is probably a bit narrow. For me, it is primarily intended as a tool to use common sources in different projects and to define these sources and revisions in a reproducible way and that has top priority for me. So even if there are major changes to the structure/directories of the repositories, it must still be possible to reproduce the intended source revisions. I understood the role of the peru.yaml file in the same way as an svn tag - a snapshot of a project (or parts of it) in time.

I tried to get a better understanding for the use of operative and peg revisions (reading the article referenced before https://edoras.sdsu.edu/doc/svn-book-html-chunk/svn.advanced.pegrevs.html) and forgot about "peru reup" for now, to not burden the mind with possible consequences.

To my understanding, in order to make a snapshot of a project, the operative and peg revision should be identical. If both versions match, there is no room for interpretation of what components / revisions to use. Also the referenced path/url should represent the current path of when the snapshot was made.

Here's just a first thought:
If svn would be called in a different way - just specifying the peg revision without the operative revision - the operative revision would default to the peg revision. My point is, that peru currently uses just one revision (rev). This revision is currently applied to the operative revision, which is explicitly passed to svn. The peg revision is currently not passed.
If instead, the revision would be assigned to the peg revision and this one would be explicitly passed to svn, the operative revision would default to the peg revision. This is probably, what most users would expected svn export to do in the first place. It is not even necessary to specify two revisions at all.
So in short a change from
svn export --force --revision 32 https://subversion.regau.abatec.at/svn/abatec_common_uC/trunk/testDeleteMePlease2 common
to
svn export --force https://subversion.regau.abatec.at/svn/abatec_common_uC/trunk/testDeleteMePlease2@32 common
would work and is probably what most users expect peru to work like.

Doing so may lead to bad interactions between fields too, since the peg revision is encoded into the URL.

If I did not get it wrong, there might be already a minor latent issue here: I would recommend to append an at character to the url anyway, because svn will always try to interpret the last at character in the url as the peg revision, even if it is part of the actual url/path name. Appending an at character to the path should fix the danger of a path containing at characters unintentionally being interpreted as a peg revision, because only the very last at character is used for this (this is also well explained in the referenced article).

Currently it matters a lot what the operative / peg revision defaults to. If the peru.yaml should be considered as a tag (I assume svn and git tags basically work in a similar way), the default value of the peg revision is not suitable in my opinion (according to the article it defaults to BASE) - at least not, when a rev differs from "head".

We are not the first to worry about this svn pitfall. Apparently there have been changes to the format / order of parameters in svn that led to this pitfall. The beginning of this thread on stackoverflow is quite interesting and points out the dangers that emerged with the "new" format:
https://stackoverflow.com/questions/19156464/best-practice-use-of-svnexternals
Also there are some similar thought's on the reproducibility of projects.

Still not worrying about the implications it might have on "peru reup", my proposal would go in this direction:
The support for an additional revision would be welcome, e.g. pegRev.
The current rev is currently used as operative revision and could be used in the same way as before.

  • If neither revision is specified, both rev and pegRevare "head".
  • If only one is defined (rev or pegRev) the other revision should default to the same value. This would probably remove the pitfall and would limit the impact from the current to a coming revision to a minimum (in a way that no changes would be necessary, but the reproducilbilty should work, even if directories have been moved).
  • If both rev and pegRev revisions are defined explicitly, that is fine too they are passed as is to svn. However, I don't think it is good practice to define two revisions that differ in the context of peru (maybe just for a quick test). This would be something that only experienced svn users should play around with.

These are at least my thought's on this (sorry for the long comment).

@oconnor663
Copy link
Member

(Apologies for leaving this hanging for a while. I'm still leaning towards keeping the current behavior, but I want to think more carefully about this and maybe play around with both before we make a final call, and I might not find time to do that soon.)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants