S3 Replicator appends a / to target and source prefix #3499

Open
reweeden opened this issue Oct 19, 2023 · 0 comments
Use case

I'm currently trying to set up a dedicated log bucket where all logs go. Logs are prefixed by the bucket name that the logs came from, so the bucket looks something like this:

public-bucket1/log-file
public-bucket2/log-file
protected-bucket1/log-file
protected-bucket2/log-file

These logs then need to be copied to the EMS distribution bucket. However, the s3 replicator cannot handle an empty source_prefix, which is what it would take to grab everything from the logs bucket.

The issue

The way the policy is generated, the format string contains a trailing / after the source_prefix.

resources = [
  "arn:aws:s3:::${var.source_bucket}/${var.source_prefix}/*",
  "arn:aws:s3:::${var.target_bucket}/${var.target_prefix}/*"
]

This effectively means it's impossible to replicate logs from the root of the bucket (empty prefix), because the generated ARN will look like this: arn:aws:s3:::source-bucket-name//*.

The target prefix is handled the same way in the code:

Key: `${process.env.TARGET_PREFIX}/${path.basename(srcKey)}`,

meaning that the s3 replicator will always add an extra / into the object key.
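
To make the effect concrete, the two format strings can be mirrored in a few lines of JavaScript. This is only a sketch, not the replicator's actual code: currentResourceArn and currentTargetKey are hypothetical helper names, the prefix is passed as a parameter rather than read from process.env, and the bucket and key values are made up for illustration.

```javascript
const path = require('path');

// Hypothetical helper mirroring the policy format string quoted above.
function currentResourceArn(bucket, prefix) {
  return `arn:aws:s3:::${bucket}/${prefix}/*`;
}

// Hypothetical helper mirroring the Key template quoted above.
function currentTargetKey(targetPrefix, srcKey) {
  return `${targetPrefix}/${path.basename(srcKey)}`;
}

// With an empty prefix, both results pick up an unwanted slash.
console.log(currentResourceArn('source-bucket-name', '')); // arn:aws:s3:::source-bucket-name//*
console.log(currentTargetKey('', 'public-bucket1/log-file')); // /log-file
```

The second result matters because S3 keys are plain strings: /log-file and log-file are different objects.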

The correct way to handle prefixes would be like this:

"arn:aws:s3:::${var.source_bucket}/${var.source_prefix}*"

and

`${process.env.TARGET_PREFIX}${path.basename(srcKey)}`, 

and to add the trailing slash to the source_prefix variable itself, as is the expected AWS convention. This would also allow more flexibility with prefixes that don't use a /.
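
As a rough illustration of the proposed key handling, the sketch below concatenates the prefix and the file name with no implicit separator. proposedTargetKey is a hypothetical helper name, and the prefixes logs/ and archive- are invented examples, not values from the replicator.

```javascript
const path = require('path');

// Hypothetical helper: the prefix is used exactly as given,
// so the caller decides whether it ends with a slash.
function proposedTargetKey(targetPrefix, srcKey) {
  return `${targetPrefix}${path.basename(srcKey)}`;
}

console.log(proposedTargetKey('', 'public-bucket1/log-file'));         // log-file
console.log(proposedTargetKey('logs/', 'public-bucket1/log-file'));    // logs/log-file
console.log(proposedTargetKey('archive-', 'public-bucket1/log-file')); // archive-log-file
```

The empty prefix, the slash-terminated prefix, and the prefix without a slash all produce sensible keys, which is the flexibility described above.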

Another edge case

Since the bucket notification trigger just uses the source_prefix variable as is, it is possible for undesired objects to be copied.

lambda_function {
  lambda_function_arn = aws_lambda_function.s3_replicator.arn
  events              = ["s3:ObjectCreated:*"]
  filter_prefix       = var.source_prefix
}

For instance, suppose the source bucket has logs in two directories like this:

ems-distribution/s3-server-access-logs/log-file
ems-distribution/s3-server-access-logs-do-not-copy/log-file

To copy logs from ems-distribution/s3-server-access-logs/, the source_prefix must be set to ems-distribution/s3-server-access-logs (without the trailing slash, since the replicator adds one itself). On the bucket notification, that same value also matches anything under ems-distribution/s3-server-access-logs-do-not-copy/.
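
The overlap follows from S3 prefix filters being plain string-prefix matches on the object key. A small sketch, using a hypothetical matchesFilterPrefix helper to stand in for the notification filter, shows the difference a trailing slash makes:

```javascript
// Hypothetical stand-in for the notification's filter_prefix check:
// a key matches when it starts with the prefix string.
function matchesFilterPrefix(key, filterPrefix) {
  return key.startsWith(filterPrefix);
}

const keys = [
  'ems-distribution/s3-server-access-logs/log-file',
  'ems-distribution/s3-server-access-logs-do-not-copy/log-file',
];

// Without the trailing slash, both directories match.
const withoutSlash = keys.filter((k) =>
  matchesFilterPrefix(k, 'ems-distribution/s3-server-access-logs'));

// With the trailing slash, only the intended directory matches.
const withSlash = keys.filter((k) =>
  matchesFilterPrefix(k, 'ems-distribution/s3-server-access-logs/'));

console.log(withoutSlash.length); // 2
console.log(withSlash.length);    // 1
```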

Workaround

The current workaround is to put everything in the logs bucket under a shared prefix ending with a /. This requires copying the existing logs in the bucket to the new prefix, which can take a while since S3 access logging generates a huge number of objects.

Ideal solution

Ideally the s3 replicator would treat s3 prefixes in the same way that AWS does, with no special logic for adding slashes implicitly, allowing the use of empty prefixes or prefixes that don't end with a /.
