Getting SourceLink to work with Bonobo (git archive problem?) #889

Open
eberleitguy opened this issue Aug 11, 2021 · 7 comments

@eberleitguy

First, I'm not terribly savvy with Git overall, and as a result I'm poking around in the dark a little. I use Git for source control at what could probably be characterized as a beginner level.

As background for others, see https://github.com/dotnet/sourcelink

I've been trying to work out how SourceLink could be used with Bonobo, but I believe my problems stem from not knowing enough about Git.

I've set up Bonobo Git Server at the root of an IIS website, so my Bonobo URLs look like https://host.domain.tld/Repository.git

I've also set up BaGet for NuGet packages on the same web server under a vanity domain, so my package feed is at https://baget.domain.tld/v3/index.json

I've managed to work out packaging my assemblies and creating .snupkg packages for symbols.

The last piece of the puzzle in making my packages fully debuggable appears to be getting SourceLink information into the packages.

I've cloned the SourceLink repository locally, duplicated the GitLab project (and its unit tests), and updated the names therein.

However, I don't know what URI format I need to translate to in order to download a set of source files from Bonobo. I suspect the mechanism uses git archive, but every attempt to run git archive against Bonobo with various URIs has failed to produce anything that looks like success.

Here is what I've tried:

```
git archive -v -o Repository.tar --remote=https://host.domain.tld/Repository.git 40-char-sha-of-commit /*
fatal: operation not supported by protocol

git archive -v -o Repository.tar --remote=https://host.domain.tld/Repository.git 40-char-sha-of-commit
fatal: operation not supported by protocol

git archive -v -o Repository.tar --remote=https://host.domain.tld/Repository.git
fatal: operation not supported by protocol

git archive -v -o Repository.tar --remote=http://host.domain.tld/Repository.git
fatal: operation not supported by protocol

git archive -v -o Repository.tar --remote=git://host.domain.tld/Repository.git
fatal: unable to connect to host.domain.tld:
host.domain.tld[ip.ad.dr.ess]: errno=Unknown error

git archive -v -o Repository.tar --remote=host.domain.tld/Repository.git
remote: fatal: 'host.domain.tld/Repository.git' does not appear to be a git repository
remote: git upload-archive: archiver died with error
fatal: sent error to the client: git upload-archive: archiver died with error
```

TL;DR: on the SourceLink side, as far as I understand, the .nuspec file defines the source-control type, the repository URI, and a commit SHA in the following format:
`<repository type="git" url="proto://host.tld/virtualdir/Repository.git" commit="40-char-sha-of-commit" />`
Example (from a more or less randomly selected repository):
`<repository type="git" url="https://github.com/ctaggart/SourceLink.git" commit="aac0493939bc3e82e4fd9651b6a5025e37affa92" />`

During creation of the PDB, this is translated by the SourceLink package (I'm not quite certain what the correct term is here; you include an implementation-specific SourceLink reference in your project) into a URI such as:
https://raw.githubusercontent.com/ctaggart/SourceLink/aac0493939bc3e82e4fd9651b6a5025e37affa92/*
This is embedded in the PDB as JSON.
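For anyone following along, the JSON embedded in the PDB maps a local source-path prefix to a URL prefix, with `*` as the wildcard on both sides. The sketch below builds such a document with `System.Text.Json`. It is a standalone .NET 6+ console program and not SourceLink's own code. The local path `C:\src\SourceLink\` is a hypothetical checkout location.

```csharp
using System;
using System.Collections.Generic;
using System.Text.Json;

// Build a SourceLink-style JSON document of the form
//   { "documents": { "<local path prefix>*": "<URL prefix>*" } }
// The '*' on each side is the wildcard: the remainder of a local file path
// is substituted into the URL.
static string BuildSourceLinkJson(string localRoot, string urlPrefix)
{
    var documents = new Dictionary<string, string>
    {
        [localRoot + "*"] = urlPrefix + "*"
    };
    var root = new Dictionary<string, Dictionary<string, string>>
    {
        ["documents"] = documents
    };
    return JsonSerializer.Serialize(root);
}

// C:\src\SourceLink\ is a hypothetical local checkout path.
string json = BuildSourceLinkJson(
    @"C:\src\SourceLink\",
    "https://raw.githubusercontent.com/ctaggart/SourceLink/aac0493939bc3e82e4fd9651b6a5025e37affa92/");
Console.WriteLine(json);
```

Suppose the debugger needs `C:\src\SourceLink\LICENSE`. The `*` then stands for `LICENSE`, which gives the per-file raw URL.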

Frankly, this is where I lose the trail. I don't know where SourceLink gets the value "raw.githubusercontent.com" from, and I'm not sure precisely what mechanism SourceLink uses to retrieve files, so I don't know what format the content URI should take for Bonobo.
This is especially true because the individual file URIs in Bonobo's repository-browsing web interface identify the repository by a GUID. I can't work out whether I first need to map the "Repository" name to its GUID or whether the "Repository" name itself will be usable. Git seems to get by without this, so I have to assume two things. Either the GUID isn't necessary for Git's protocol, or something internal translates it. Either way, the GUIDs are presumably for the benefit of the web portal rather than for source control itself.

Of course, while the "raw.githubusercontent.com" URI above works, raw.githubusercontent.com itself returns a 404 in a browser, so I assume this is specific to the Git protocol.

My apologies if this is overly rambling, but from reading other issues it seems more common to err on the side of too little information, and I wanted to avoid that if possible.

@eberleitguy
Author

This is probably obvious to anyone who has used SourceLink, but while the URL
https://raw.githubusercontent.com/ctaggart/SourceLink/aac0493939bc3e82e4fd9651b6a5025e37affa92/*
returns a 404, the URL
https://raw.githubusercontent.com/ctaggart/SourceLink/aac0493939bc3e82e4fd9651b6a5025e37affa92/LICENSE
actually returns the expected file for that commit.

The equivalent URL for Bonobo appears to be something like
https://host.domain.tld/Repository/guid-for-repository/40-char-sha-of-commit/Raw/file.ext?display=True
(Presumably the ?display=True parameter is irrelevant for SourceLink, since omitting it results in a direct download.)
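The sketch below assumes that the URL shape observed above holds. That shape is inferred from Bonobo's browser UI, not from a documented API. The sketch also assumes the Raw route accepts a commit SHA in the position shown. Under those assumptions, a per-file URL and a SourceLink content-URL template for Bonobo could be built as follows. `BonoboRawUrl` and `BonoboContentUrlTemplate` are hypothetical helpers, and the repository GUID still has to come from somewhere.

```csharp
using System;
using System.Linq;

// Build a Bonobo "Raw" file URL of the shape seen in the repository browser:
//   {server}/Repository/{repository GUID}/{commit}/Raw/{path}
// The shape is inferred from browser URLs, not from a documented Bonobo API.
static string BonoboRawUrl(string server, Guid repositoryId, string commit, string relativePath)
{
    // Normalise Windows separators, then escape each segment so '/' survives.
    string escapedPath = string.Join("/",
        relativePath.Replace('\\', '/').Split('/').Select(segment => Uri.EscapeDataString(segment)));
    return $"{server.TrimEnd('/')}/Repository/{repositoryId}/{commit}/Raw/{escapedPath}";
}

// The same shape with SourceLink's '*' wildcard in place of the file path.
static string BonoboContentUrlTemplate(string server, Guid repositoryId, string commit) =>
    $"{server.TrimEnd('/')}/Repository/{repositoryId}/{commit}/Raw/*";

var id = Guid.Parse("ef42f3d2-abb1-42e6-9758-5dcce5ad62c3");
const string sha = "aac0493939bc3e82e4fd9651b6a5025e37affa92";
Console.WriteLine(BonoboRawUrl("https://host.domain.tld", id, sha, @"src\My File.cs"));
Console.WriteLine(BonoboContentUrlTemplate("https://host.domain.tld", id, sha));
```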

This leaves me with the question: how do I get the GUID for a given Repository.git expediently, if indeed I need to at all?

@eberleitguy
Author

I've worked out that the host-to-content-URL translation is configured by a structure similar to the following, in either a .props file or the .*proj file:

```xml
<Project>
  <ItemGroup>
    <SourceLinkGitHubHost Include="github.com" ContentUrl="https://raw.githubusercontent.com"/>
  </ItemGroup>
</Project>
```
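To illustrate what that item declares, here is a rough sketch of a GitHub-style translation from a repository URL and commit to a content-URL template. It imitates the observable result shown earlier in this thread and does not reproduce SourceLink's implementation. The host table and `ToContentUrlTemplate` are hypothetical.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical host table mirroring
// <SourceLinkGitHubHost Include="github.com" ContentUrl="https://raw.githubusercontent.com"/>
var contentUrls = new Dictionary<string, string>(StringComparer.OrdinalIgnoreCase)
{
    ["github.com"] = "https://raw.githubusercontent.com"
};

// GitHub-style translation:
//   https://github.com/{owner}/{repo}.git + commit -> {ContentUrl}/{owner}/{repo}/{commit}/*
static string ToContentUrlTemplate(IReadOnlyDictionary<string, string> hosts, string repositoryUrl, string commit)
{
    var uri = new Uri(repositoryUrl);
    if (!hosts.TryGetValue(uri.Host, out var contentUrl))
    {
        throw new ArgumentException($"No content URL configured for host '{uri.Host}'.");
    }

    // Drop a trailing ".git" from the path, e.g. "/ctaggart/SourceLink.git".
    string path = uri.AbsolutePath.TrimEnd('/');
    if (path.EndsWith(".git", StringComparison.OrdinalIgnoreCase))
    {
        path = path.Substring(0, path.Length - 4);
    }

    return $"{contentUrl.TrimEnd('/')}{path}/{commit}/*";
}

Console.WriteLine(ToContentUrlTemplate(contentUrls,
    "https://github.com/ctaggart/SourceLink.git",
    "aac0493939bc3e82e4fd9651b6a5025e37affa92"));
```

The GitHub layout works because the owner and repository name appear directly in the raw URL. Bonobo's browser URLs need a GUID instead, which is why the host mapping alone is not enough there.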

This leaves, I believe, only one question:

Given a .git URI (https://mybonoboserver.local/myBonoboDirectory/myProject.git) and a 40-character commit SHA, is there a URI that lets me retrieve an arbitrary file using only those two strings? Or does a repository GUID need to be retrieved first, and if so, what mechanism can be used to do that?
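Whatever the answer, the first step with only the clone URL in hand is to recover the repository name. The sketch below assumes, as in the URLs in this thread, that the name is the last path segment with `.git` stripped. `RepositoryNameFromCloneUrl` is a hypothetical helper.

```csharp
using System;

// Hypothetical helper: recover the repository name from a clone URL by taking
// the last path segment and stripping a trailing ".git".
static string RepositoryNameFromCloneUrl(string cloneUrl)
{
    var uri = new Uri(cloneUrl);
    string last = uri.Segments[^1].TrimEnd('/');
    if (last.EndsWith(".git", StringComparison.OrdinalIgnoreCase))
    {
        last = last[..^4];
    }
    // Path segments may be percent-encoded (e.g. "My%20Repo.git"), so decode.
    return Uri.UnescapeDataString(last);
}

Console.WriteLine(RepositoryNameFromCloneUrl("https://mybonoboserver.local/myBonoboDirectory/myProject.git"));
```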

@willdean
Collaborator

I don't think there's a URL which just lets you fetch a file like this - you'd be expected to either use a Git client to talk to the repository, or to use the browser-based interactive stuff to view files as a human. You perhaps need something which is a little bit of both?

@eberleitguy
Author

eberleitguy commented Aug 12, 2021

> I don't think there's a URL which just lets you fetch a file like this - you'd be expected to either use a Git client to talk to the repository, or to use the browser-based interactive stuff to view files as a human. You perhaps need something which is a little bit of both?

I've already provided an example of a URL for Bonobo, along with the equivalent examples from GitHub.
Example:
https://host.domain.tld/Repository/guid-for-repository/40-char-sha-of-commit/Raw/file.ext

Obviously, the tool VS ultimately uses to implement SourceLink uses Git (as authentication or other requirements may apply), but then again Git is largely HTTP/HTTPS. Git is able to download individual files, though one could argue it does so via Git's internal snapshot objects, which are then resolved locally into files for the checked-out commit.

Basically, since I don't yet understand how Bonobo executes (that is, which code to examine to answer my question), a simpler question may be:

Internally, how does Bonobo generate the URLs for the links to each repository on https://bonoboserver.local/Bonobo-Git-Server/Repository/Index?
Example link: https://bonoboserver.local/Bonobo-Git-Server/Repository/Detail/ef42f3d2-abb1-42e6-9758-5dcce5ad62c3

The GUID in that link is what I ultimately need to plumb this through, if there's no service URL or other direct URI for accessing the files.
(EDIT: And I need to be able to get that GUID using only the data in the string "https://bonoboserver.local/Bonobo-Git-Server/Repository.git", just like Git does.)

@willdean
Collaborator

willdean commented Aug 12, 2021

> I've already provided an example of a URL for bonobo,

Yes, but that was a URL which is part of the interactive repo browser, and it's not really using Git-over-http to get that file at all - the Bonobo server is using its own Git client to retrieve the file locally and then serve it up.

The GUIDs you're seeing in the URLs are keys into the database Bonobo is maintaining to keep track of repositories, and I don't think there's an easy way to convert a repo name into a GUID that's exposed via HTTP.

When you make a "Git" connection to Bonobo, Git is never really aware of this GUID - it's used within Bonobo as part of looking up the repo and its user permissions, but once permissions have been granted, Git just gets on with doing its Git protocol stuff and never needs to know about the GUID.

If you're happy to build yourself a new version of Bonobo, then it would be easy enough to add something which could convert a repository name into a GUID. But probably even better would be to modify the Detail action on the RepositoryController so that it could take either a GUID or a repository name. Then you'd be able to pass the repo name into those Detail URLs rather than the GUID. Perhaps something like this (from RepositoryController.cs):

    public ActionResult Detail(string id)
    {
        Guid repoId;
        if (!Guid.TryParse(id, out repoId))
        {
            // This doesn't seem to be a GUID - let's assume it's a repository name
            var repo = RepositoryRepository.GetRepository(id);
            if (repo == null)
            {
                return new HttpNotFoundResult();
            }
            repoId = repo.Id;
        }

        ViewBag.ID = repoId;

        var model = ConvertRepositoryModel(RepositoryRepository.GetRepository(repoId), User);
        if (model == null)
        {
            // Unknown repository: stop here, because model.Name is used below
            return new HttpNotFoundResult();
        }

        model.IsCurrentUserAdministrator = RepositoryPermissionService.HasPermission(User.Id(), model.Id, RepositoryAccessLevel.Administer);
        SetGitUrls(model);

        // Record the repository's default reference name in the route data for the view
        using (var browser = new RepositoryBrowser(Path.Combine(UserConfiguration.Current.Repositories, model.Name)))
        {
            string defaultReferenceName;
            browser.BrowseTree(null, null, out defaultReferenceName);
            RouteData.Values.Add("encodedName", defaultReferenceName);
        }

        return View(model);
    }

@eberleitguy
Author

I'll have a look at this avenue and see if I can put together something that works. This is quite helpful (or at least appears to be on the surface).

Any thoughts on why Bonobo isn't structured this way by default? Is there some problem that obscuring the repo name as a GUID during browse operations is supposed to solve, or is this an optimization to avoid looking up the database record's GUID from the name on every page hit?

Any thoughts on where the navigation URLs are generated (as a possible inroad to an optional configuration in which names are used in normal navigation rather than GUIDs, at least for testing)?

@willdean
Collaborator

I don't know the history of it because I wasn't involved in the project that far back, but Git repo names are perhaps not as obvious a choice as UUIDs for a database key, which is what that 'Id' action parameter is. There may also be URL-safety issues, since some repo names might need escaping. The URLs for the interactive repo browser just use the database ID because that's the easiest thing to use, and they probably weren't intended to provide an API for people to fetch files with other clients. A lot of the "per repository" page serving in Bonobo never actually touches the Git repo; it's all about permissions, etc., and just manipulates the Bonobo database.
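To make the escaping point concrete, suppose the Detail action accepted names as in the example above. A name would then need percent-encoding before being placed in a URL, whereas a GUID (hex digits and hyphens) never does. The sketch below uses `Uri.EscapeDataString` with made-up repository names and a hypothetical `DetailPath` helper.

```csharp
using System;

// Hypothetical helper: the Detail URL path for a repository addressed by name.
// Uri.EscapeDataString percent-encodes everything outside RFC 3986's
// unreserved set (letters, digits, '-', '.', '_', '~').
static string DetailPath(string repositoryName) =>
    "/Repository/Detail/" + Uri.EscapeDataString(repositoryName);

foreach (var name in new[] { "Bonobo-Git-Server", "My Repo", "C#Tools" })
{
    Console.WriteLine($"{name,-18} -> {DetailPath(name)}");
}
```

Without escaping, `C#Tools` would be cut off at the `#`. Everything after it would be treated as a URL fragment and would never reach the server.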

The URLs on the index page are just generated using various ActionLink calls in RepositoryController\index.cshtml - the "id" part of the URL is set to the Id property of the database object (which is the key).

If you were to change the meaning of "Id" within the URLs at the point they're generated then you'd have to change all the corresponding action methods, like my example above. Simply making the relevant action methods happy to take either a GUID or a repo name seems like a nice safe straightforward starting point, though.
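A shared resolver is one way to let several action methods accept either form without repeating the logic. The sketch below is hypothetical: a dictionary stands in for Bonobo's repository table, and `ResolveRepositoryId` is not an existing Bonobo method.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-in for Bonobo's repository table (name -> database key).
var repositoriesByName = new Dictionary<string, Guid>
{
    ["Bonobo-Git-Server"] = Guid.Parse("ef42f3d2-abb1-42e6-9758-5dcce5ad62c3")
};

// Accept either a GUID or a repository name. A parsed GUID is returned as-is
// (whether it exists is checked later, as in the Detail example); otherwise the
// name is looked up, and null means "not found" so the caller can return a 404.
static Guid? ResolveRepositoryId(string id, Func<string, Guid?> findIdByName)
{
    if (Guid.TryParse(id, out var guid))
    {
        return guid;
    }
    return findIdByName(id);
}

// Name lookup against the in-memory table above.
Guid? FindIdByName(string name) =>
    repositoriesByName.TryGetValue(name, out var found) ? found : (Guid?)null;

Console.WriteLine(ResolveRepositoryId("Bonobo-Git-Server", FindIdByName));
Console.WriteLine(ResolveRepositoryId("no-such-repo", FindIdByName)?.ToString() ?? "not found");
```

A successful GUID parse always wins. As a result, a repository whose name is itself a valid GUID could only be reached by its database ID, never by name.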
