S3g sidecar passthru #8287
Conversation
…ng s2 uses in more detail.
// body into memory, if it does that's bad for large writes
// and we should figure out how we can stream it to disk
// first...
awsauth.Sign4(req, awsauth.Credentials{
Looks like we probably need to plumb the volume with the web identity token into it. See https://pachyderm.slack.com/archives/C01LBA4NSJU/p1666033437705929?thread_ts=1666032276.294569&cid=C01LBA4NSJU
sc = SparkContext(conf=conf)
sc.setLogLevel("ERROR")
# sc.setLogLevel("DEBUG")
sc.setSystemProperty("com.amazonaws.services.s3.disablePutObjectMD5Validation", "true")
probably don't need this any more
import os

conf = SparkConf()
minio = False
could strip this out
Writes to `s3_out` will then work with Spark, especially when Spark is writing a large amount of data. (With the normal S3 gateway, you see slow-downs and errors relating to "copyFile" failing.)
This directory contains a worked example. We've built and pushed the Docker image for you already, so all you need to do is run: |
link to the bits?
var CurrentBucket string = "out"

// This is like pipeline_name-<job-id>
var CurrentTargetPath string = ""
JobScopedPrefix
awsauth "github.com/smartystreets/go-aws-auth"
)

type RawS3Proxy struct {
put the global variables inside the struct, duh
Wireshark: https://github.com/eldadru/ksniff#ksniff
Set a filter in Wireshark to
// transform each of the response headers
for k, v := range resp.Header {
	for i, vv := range v {
		v[i] = transform(vv)
should be untransform
… (see user container pipeline logs grep for PROXY)
… into s3g-sidecar-passthru
…e storage secrets, and we don't want to leak them to the user code. However, triggering the copy and blocking on it needs to happen from the user code. So we make an HTTP call from the user container on localhost to a new API endpoint on the storage container called /finish, BEFORE we finish the commit in the worker, and block until it completes.
This reverts commit 846176d.
This reverts commit 2296eb5.
This reverts commit ceb09bb.
This reverts commit 28b673a.
Closing PR as it's obsolete (though we should consider revisiting the project someday)
Draft for collaboration/knowledge sharing