
Can't git clone from Gitlab from within a grain #3644

Open
moyamo opened this issue Jul 16, 2022 · 11 comments
Labels
app-platform App/Sandstorm integration features question

Comments

@moyamo

moyamo commented Jul 16, 2022

I'm trying to write an app that runs git clone against a private Gitlab repo from inside a grain. I've used the Powerbox to pass the URL of the repo and the Personal Access Token to my grain. If I git clone --depth=1, everything works fine, so I believe I've set everything up correctly.

However, if I git clone without the --depth flag, I get a 500 Internal Server Error from Gitlab. The 500 error is the response to git's POST <repo>/git-upload-pack request.

Is there anything weird that the sandstorm-http-bridge could be doing that could cause git clone to fail?

P.S. I've written an HTTP proxy that sits between git and sandstorm-http-bridge to add the Authorization: Bearer <token> header; otherwise it passes requests verbatim to sandstorm-http-bridge.

ocdtrekkie added the app-platform (App/Sandstorm integration features) and question labels on Jul 16, 2022
@ocdtrekkie
Collaborator

@zenhack Do you have any suggestions on this one?

@zenhack
Collaborator

zenhack commented Jul 18, 2022

Is the code for this publicly available?

Is there anything of interest in sandstorm.log? Because of the way HTTP proxying over capnp works, information about the actual error sometimes gets obscured by the time the grain sees a response, but errors in the logs can be illuminating. (It is possible that GitLab is not giving you a 500 at all, and that some error is occurring somewhere inside Sandstorm.)

@moyamo
Author

moyamo commented Jul 18, 2022

Is the code for this publicly available?

No, it's very WIP at the moment.

Is there anything of interest in sandstorm.log?

I checked the grain's log and only saw a 500 Internal Server Error that looked like it probably came from Gitlab. I didn't think to check the Sandstorm log; I'll go digging later and report back.

I've managed to get git clone to work through the IpNetwork interface. I didn't bother trying to use the ApiSession interface. I think that means the problem is either in the sandstorm-http-bridge or in ApiSession.

Using the IpNetwork, of course, is less than ideal, since only the server admin can grant access to it.

I know the main use case for ApiSession/sandstorm-http-bridge is accessing REST APIs.
So I guess my question is: is git clone over HTTPS something ApiSession supports, or is it too weird a use case?

@ocdtrekkie
Collaborator

We have multiple apps which use Git repos, none of which use IpNetwork, so you should be okay there. You may want to look at our GitWeb package. (We have GitLab too, but it is much older, I believe.)

@zenhack
Collaborator

zenhack commented Jul 18, 2022

@ocdtrekkie, I believe @moyamo is trying to use a git client from inside a grain, rather than serve a git repo from a grain, so as far as I know we don't actually have existing apps that do this.

I would hazard a guess that the issue is either in our implementation of ApiSession (it seems entirely possible that git clone does something our current implementation isn't handling correctly) or in the way you're trying to use it in the app code.

Do you have an objection to publishing the code (or some simplified version that exhibits the problem)? I feel like it would save a lot of time for me to just be able to see what you're doing.

@moyamo
Author

moyamo commented Jul 19, 2022

So I checked the Sandstorm logs, and they didn't show any errors, so I'm 90% sure the 500 error is coming from Gitlab and not from the sandstorm-http-bridge.

Here are some snippets of the code:

First, we define a Powerbox descriptor that the grain uses to request access to the Gitlab API.

@0x9759ad011d40ab4c;  # generated using `capnp id`

using Powerbox = import "/sandstorm/powerbox.capnp";
using ApiSession = import "/sandstorm/api-session.capnp".ApiSession;

const myTagValue :ApiSession.PowerboxTag = (
  canonicalUrl = "https://gitlab.com/api/v4",
  authentication = "basic",
);
const myDescriptor :Powerbox.PowerboxDescriptor = (
  tags = [
    (id = 0xc879e379c625cdc7, value = .myTagValue)
  ],
);

NOTE: we set authentication = "basic" since that's how Gitlab wants us to authenticate.

We follow the procedure described in the Sandstorm docs to get the base64-encoded version of this constant:

descriptor = "EA5QAQEAABEBF1EEAQH/x80lxnnjecgAQAMRCdIAABERMv9odHRwczovLwJnaXRsYWIuY29tL2FwaS92ATQfYmFzaWM="
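The docs' recipe is, roughly, to evaluate the constant as a packed Cap'n Proto message and base64-encode the bytes. The Python sketch below wraps that step, assuming the capnp tool is on PATH, that the Sandstorm schemas live under /opt/sandstorm/latest/usr/include, and that the schema above is saved in the file you pass in; encode_descriptor and descriptor_mentions are hypothetical helper names. Because packed encoding keeps runs of non-zero bytes intact, most of the tag's text survives in the decoded bytes, which gives a quick sanity check on the string above.

```python
import base64
import subprocess

# Assumed location of the Sandstorm .capnp schemas; adjust to your setup.
SANDSTORM_INCLUDE = "/opt/sandstorm/latest/usr/include"


def encode_descriptor(capnp_file: str, const_name: str = "myDescriptor") -> str:
    """Evaluate a Cap'n Proto constant as a packed message, then base64 it."""
    packed = subprocess.run(
        ["capnp", "eval", f"-I{SANDSTORM_INCLUDE}", "-p", capnp_file, const_name],
        check=True,
        capture_output=True,
    ).stdout
    return base64.b64encode(packed).decode("ascii")


def descriptor_mentions(descriptor: str, *needles: bytes) -> bool:
    """Cheap sanity check: packed encoding leaves runs of non-zero bytes
    (such as most of a URL) intact, so they appear in the decoded bytes."""
    raw = base64.b64decode(descriptor)
    return all(needle in raw for needle in needles)


descriptor = "EA5QAQEAABEBF1EEAQH/x80lxnnjecgAQAMRCdIAABERMv9odHRwczovLwJnaXRsYWIuY29tL2FwaS92ATQfYmFzaWM="
print(descriptor_mentions(descriptor, b"https://", b"gitlab.com/api/v", b"basic"))  # True
```

The check only looks for substrings, so it confirms the URL and authentication fields made it into the descriptor, not that the whole message is well-formed.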

Then we use a Jinja2 template to put the descriptor into the HTML page, and again follow the docs to make the Powerbox request and get the claim token.

<!doctype HTML>
<html>
    <head>
        <meta charset="utf-8">
        <meta http-equiv="X-UA-Compatible" content="IE=edge">
        <meta name="viewport" content="width=device-width, initial-scale=1">
    </head>
    <body>
        <button onclick="connectGitlab()">Connect to Gitlab</button>
        <button onclick="clone()">Clone</button>
        <script>
            function connectGitlab() {
              window.parent.postMessage({
                powerboxRequest: {
                  rpcId: 1,
                  query: [
                    "{{descriptor}}"
                  ],
                  saveLabel: {defaultText: "gitlab API access"},
                }
              }, "*");
            }
            window.addEventListener("message", function (event) {
              if (event.source !== window.parent) {
                // SECURITY: ignore postMessages that didn't come from the parent frame.
                return;
              }

              var response = event.data;

              if (response.rpcId !== 1) {
                // Ignore RPC IDs that don't match our request. (In real code you'd
                // probably have a table of outstanding RPCs so that you don't have to
                // register a separate handler for each one.)
                return;
              }

              if (response.error) {
                // Oops, something went wrong.
                alert(response.error);
                return;
              }

              if (response.canceled) {
                // The user closed the Powerbox without making a selection.
                return;
              }
              // We now have a claim token. We need to send this to our server
              // where we can exchange it for access to the remote API!
              doClaimToken(response.token);
            });
            async function doClaimToken(token) {
              await fetch("/token", {method: "POST", body: token});
            }

            async function clone() {
              await fetch("/clone", {method: "POST"});
            }
        </script>
    </body>
</html>

Then we claim the token and store it in the file /var/bearer.txt for later use.

import requests
from flask import request  # `app` is the Flask application defined elsewhere

@app.route("/token", methods=["POST"])
def token():
    tok = request.data.decode('utf-8')
    session_id = request.headers.get("X-Sandstorm-Session-Id")
    # Exchange the Powerbox claim token for a capability token via the bridge.
    r = requests.post(f"http://http-bridge/session/{session_id}/claim",
                      headers={"Content-Type": "application/json"},
                      json={"requestToken": tok, "requiredPermissions": ["read"]})
    gitlab_cap = r.json()['cap']
    with open('/var/bearer.txt', 'w') as f:
        f.write(gitlab_cap)
    return ''

Now comes the interesting part. Unfortunately, git doesn't allow you to use Bearer authentication, so we use mitmproxy to add the Bearer token in between git and the sandstorm-http-bridge.

HOME=/var/ mitmdump  --mode upstream:$HTTP_PROXY -s /opt/app/gitproxy.py  &

NOTE: --mode upstream:$HTTP_PROXY tells mitmdump to forward every request to an upstream proxy, here the sandstorm-http-bridge proxy given by $HTTP_PROXY.

-s /opt/app/gitproxy.py tells it to load this simple addon script, which adds the Bearer token (read from /var/bearer.txt, where we stored it earlier):

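#!/usr/bin/env python
# mitmproxy addon: mitmdump calls request() for every request git sends.

def request(flow):
    try:
        # The capability token stored by the /token route.
        with open("/var/bearer.txt") as f:
            bear = f.read().strip()
        flow.request.headers["Authorization"] = "Bearer " + bear
    except Exception:
        # E.g. no token claimed yet: forward the request unchanged.
        pass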

Now we try to git clone the repository:

import os
import subprocess

@app.route("/clone", methods=["POST"])  # `app` is the Flask application defined elsewhere
def go():
    os.chdir('/var')
    proxy = "http://localhost:8080" # This is mitmdump
    subprocess.run(['rm', '-r', 'myrepo'])
    # Note: env= replaces the whole environment, so git sees only these two variables.
    # Try with --depth=1 (succeeds for some reason)
    subprocess.run(["git", "clone", "http://http-proxy/moyamo/myrepo.git", "--depth=1"], env={"http_proxy": proxy, "HTTP_PROXY": proxy})
    subprocess.run(['ls', '-l']) # Verify repo is cloned
    subprocess.run(['rm', '-r', 'myrepo'])
    # Try again without --depth=1 (fails for some reason)
    subprocess.run(["git", "clone", "http://http-proxy/moyamo/myrepo.git"], env={"http_proxy": proxy, "HTTP_PROXY": proxy})
    subprocess.run(['ls', '-l']) # Show repo is not cloned
    return ''
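As an untested alternative to the mitmdump hop: git's http.extraHeader setting (available since git 2.9) adds an arbitrary header to every HTTP request git makes, so git itself could send the Authorization: Bearer header. The sketch below is hypothetical (git_clone_argv, proxy_env and clone_with_bearer are illustrative names). It keeps the grain's environment rather than replacing it, and mirrors HTTP_PROXY into the lowercase http_proxy because libcurl, which git uses, only honours the lowercase form for http:// URLs. It only removes mitmproxy from the path, which can help rule it out as a cause.

```python
import os
import subprocess


def git_clone_argv(url, token, dest, depth=None):
    """Build a git clone command that sends 'Authorization: Bearer <token>'."""
    argv = ["git", "-c", f"http.extraHeader=Authorization: Bearer {token}", "clone"]
    if depth is not None:
        argv.append(f"--depth={depth}")
    argv += [url, dest]
    return argv


def proxy_env(environ):
    """Copy the environment; libcurl only honours lowercase http_proxy for
    http:// URLs, so fall back to the bridge's HTTP_PROXY if it is unset."""
    env = dict(environ)
    if "HTTP_PROXY" in env:
        env.setdefault("http_proxy", env["HTTP_PROXY"])
    return env


def clone_with_bearer(url, dest, token_path="/var/bearer.txt", depth=None, cwd="/var"):
    with open(token_path) as f:
        token = f.read().strip()
    return subprocess.run(git_clone_argv(url, token, dest, depth),
                          env=proxy_env(os.environ), cwd=cwd)


# Hypothetical usage inside the grain:
# clone_with_bearer("http://http-proxy/moyamo/myrepo.git", "myrepo")
```

One trade-off: the token ends up on git's command line, where other processes in the grain can see it; git 2.31+ can take the same setting from the GIT_CONFIG_COUNT / GIT_CONFIG_KEY_0 / GIT_CONFIG_VALUE_0 environment variables instead.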

The output in the grain's log is:

rm: cannot remove 'myrepo': No such file or directory
Cloning into 'myrepo'...
127.0.0.1:40290: clientconnect
127.0.0.1:40290: GET http://http-proxy/moyamo/myrepo.git/inf…
              << 200 OK 17.59k
127.0.0.1:40290: POST http://http-proxy/moyamo/myrepo.git/git…
              << 200 OK 56b
127.0.0.1:40290: POST http://http-proxy/moyamo/myrepo.git/git…
              << 200 OK 3.3m
127.0.0.1:40290: clientdisconnect
total 28
-rw-rw---- 1 723 463   60 Jul 19 18:41 bearer.txt
drwxrwx--- 9 723 463 4096 Jul 19 19:41 myrepo
drwxrwx--- 5 723 463 4096 Jul 19 18:15 lib
drwxrwx--- 4 723 463 4096 Jul 19 18:15 log
drwxrwx--- 3 723 463 4096 Jul 19 19:41 run
-rw-rw---- 1 723 463   50 Jul 19 18:43 thing.txt
drwxrwx--- 2 723 463 4096 Jul 19 19:41 tmp
Cloning into 'myrepo'...
127.0.0.1:40292: clientconnect
127.0.0.1:40292: GET http://http-proxy/moyamo/myrepo.git/inf…
              << 200 OK 17.59k
127.0.0.1:40292: POST http://http-proxy/moyamo/myrepo.git/git…
              << 500 Internal Server Error 0b
error: RPC failed; HTTP 500 curl 22 The requested URL returned error: 500
fatal: the remote end hung up unexpectedly
127.0.0.1:40292: clientdisconnect
total 24
-rw-rw---- 1 723 463   60 Jul 19 18:41 bearer.txt
drwxrwx--- 5 723 463 4096 Jul 19 18:15 lib
drwxrwx--- 4 723 463 4096 Jul 19 18:15 log
drwxrwx--- 3 723 463 4096 Jul 19 19:41 run
-rw-rw---- 1 723 463   50 Jul 19 18:43 thing.txt
drwxrwx--- 2 723 463 4096 Jul 19 19:41 tmp
[pid: 14|app: 0|req: 2/2] 127.0.0.1 () {62 vars in 1180 bytes} [Tue Jul 19 19:41:33 2022] POST /clone => generated 0 bytes in 5426 msecs (HTTP/1.1 200) 2 headers in 78 bytes (1 switches on core 0

Given that git clone --depth=1 succeeded, I'm pretty sure I did the authentication properly.

@zenhack
Collaborator

zenhack commented Jul 20, 2022

Yeah, if it works with --depth=1 then it's probably not an auth problem. Best guess is that git is hitting some edge case that our implementation of ApiSession doesn't handle correctly. I'll have to remind myself how git clone actually does stuff at the protocol level and see if I can't figure out what we're missing (I knew the details at one point, but it's been a while). Hopefully I will have time to investigate soonish.
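For anyone retracing the protocol: git's smart HTTP transport frames the git-upload-pack request and response bodies as pkt-lines. Each pkt-line starts with four hex digits giving its total length, including those four bytes; 0000 is a flush packet, and protocol v2 adds 0001 (delimiter) and 0002 (response end). The minimal Python sketch below encodes and decodes that framing, e.g. for inspecting a body captured by mitmproxy; the function names are illustrative, and the example body is only loosely modelled on a v2 ls-refs request, not a byte-exact copy of what git sends.

```python
FLUSH = b"0000"           # end of a message section
DELIM = b"0001"           # protocol v2: separates capabilities from arguments
RESPONSE_END = b"0002"    # protocol v2: end of a response over stateless HTTP
LARGE_PACKET_MAX = 65520  # largest pkt-line git allows, prefix included

_SPECIAL = {0: "flush", 1: "delim", 2: "response-end"}


def pkt_line(payload: bytes) -> bytes:
    """Frame one payload: 4 lowercase hex digits of total length, then the data."""
    size = len(payload) + 4
    if size > LARGE_PACKET_MAX:
        raise ValueError("payload too large for a single pkt-line")
    return b"%04x" % size + payload


def parse_pkt_lines(data: bytes) -> list:
    """Split a pkt-line stream into payloads; special packets become names."""
    out, pos = [], 0
    while pos < len(data):
        if pos + 4 > len(data):
            raise ValueError("truncated length prefix")
        size = int(data[pos:pos + 4], 16)
        if size in _SPECIAL:
            out.append(_SPECIAL[size])
            pos += 4
            continue
        if size < 4 or pos + size > len(data):
            raise ValueError(f"bad pkt-line length {size} at offset {pos}")
        out.append(data[pos + 4:pos + size])
        pos += size
    return out


# Illustrative body, loosely modelled on a protocol v2 ls-refs request.
body = pkt_line(b"command=ls-refs\n") + DELIM + pkt_line(b"peel\n") + FLUSH
print(parse_pkt_lines(body))  # [b'command=ls-refs\n', 'delim', b'peel\n', 'flush']
```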

@moyamo
Author

moyamo commented Jul 21, 2022

So if I set --depth=260, it actually gives an error instead of just failing silently:

capnp/rpc.c++:160: info: returning failure over rpc; exception = capnp/arena.c++:153: failed: Exceeded message traversal limit. See capnp::ReaderOptions.

If I set --depth=259, it succeeds, with Gitlab giving a response of 31.99MB. So it's clearly tapping out at a 32MB response. Weird that IpNetwork doesn't have the same limitation.
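Narrowing the limit down to 259 vs 260 by hand takes many clones. Assuming success is monotonic in --depth (if depth d clones, every smaller depth does too), a binary search finds the boundary in about log2 of the search range attempts. The sketch below is hypothetical: largest_working_depth and try_depth are illustrative names, try_depth's defaults reuse the repo URL and mitmdump address from the /clone route above, and any non-zero git exit status counts as a failure, so it cannot tell a Gitlab 500 apart from the capnp traversal-limit error.

```python
import os
import shutil
import subprocess


def largest_working_depth(clone_ok, lo=1, hi=1024):
    """Return the largest depth in [lo, hi] for which clone_ok(depth) is True.

    Assumes success is monotonic in depth: if depth d works, so does d - 1.
    """
    if not clone_ok(lo):
        raise ValueError("clone fails even at the lower bound")
    if clone_ok(hi):
        return hi
    # Invariant: clone_ok(lo) is True and clone_ok(hi) is False.
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if clone_ok(mid):
            lo = mid
        else:
            hi = mid
    return lo


def try_depth(depth, url="http://http-proxy/moyamo/myrepo.git",
              dest="/var/depth-test", proxy="http://localhost:8080"):
    """One shallow clone through mitmdump; True if git exits successfully."""
    shutil.rmtree(dest, ignore_errors=True)  # start from an empty directory
    env = {**os.environ, "http_proxy": proxy, "HTTP_PROXY": proxy}
    result = subprocess.run(["git", "clone", f"--depth={depth}", url, dest], env=env)
    return result.returncode == 0


# Hypothetical usage inside the grain: largest_working_depth(try_depth, 1, 512)
```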

@moyamo
Author

moyamo commented Jul 21, 2022

On the other hand, maybe this is a different error: perhaps a plain git clone makes Gitlab return a 500 Internal Server Error without sending much data, whereas with git clone --depth=260 Gitlab is fine but capnp gives up.

@zenhack
Collaborator

zenhack commented Aug 1, 2022

So it's clearly tapping out at a 32MB response

Hm, I know we've had problems with larger requests in the other direction, but I can never remember if those got solved (@ocdtrekkie, ring any bells?)

@ocdtrekkie
Collaborator

The only thing I know of with large transfers was the whole range-request thing, which is still an open PR. I'm turned around enough that I don't know which direction that was or which direction this is, so this may not be much help to anyone.
