AWS S3 file create HTTP 400 errors #693

Open
wpeterson opened this issue Oct 3, 2023 · 8 comments

@wpeterson

We recently upgraded a Rails app from fog-aws 3.20.0 to 3.21.0 and began seeing all of our write/file-creation calls to AWS S3 fail with a 400 error from S3's service calls:

Example exception raised from the excon client layer:

Excon::Error::BadRequest Expected(200) <=> Actual(400 Bad Request)
excon.error.response
  :body              => "<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n<Error><Code>IncompleteBody</Code><Message>You did not provide the number of bytes specified by the Content-Length HTTP header</Message><NumberBytesExpected>228</NumberBytesExpected><NumberBytesProvided>0</NumberBytesProvided><RequestId>3SG15S95XAWWA2ZA</RequestId><HostId>g0MdNHKuphyF1e5ee7zt0g/C+4RBv4eKguXKul5TPkkK3esrFfzrEgiu63uDyIgSgl9f0FhJNpo=</HostId></Error>"

I don't see any changes in fog-aws or excon that appear related, but I confirmed that upgrading to fog-aws 3.21.0 and excon 0.104.0 causes this issue for us. Apologies for not having a simpler test case/reproduction, but I thought I would log an issue in case other folks encounter this problem.

@geemus
Member

geemus commented Oct 3, 2023

@wpeterson Thanks for the report. Are you using fog directly or through an intermediary like carrierwave? Could you share an example of how you are calling it in the vicinity of one of the errors?

A couple of guesses:

  1. We changed it to no longer send back a couple of the AWS headers; potentially this could mess with the length of things.
  2. We changed it to no longer retry client errors, but I would mostly expect what you describe to either work or not (and that retries wouldn't make a difference).

I'm not sure about either, though, and could probably benefit from a bit more detail on your usage if you can share it.

@wpeterson
Author

@geemus Thank you for looking/thinking further about this. We are using both fog-aws and carrierwave, but the usage which produces this error appears to be a direct invocation of Fog via Fog::Storage#bucket / bucket.files.create:

    bucket.files.create(
      key: key,
      body: file,
      multipart_chunk_size: 5242880
    )
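
For context, here is a minimal sketch of how a call like this is typically wired up (the credential handling and bucket name below are illustrative assumptions, not our exact wrapper code):

    # Illustrative sketch only -- credentials and bucket name are placeholders
    storage = Fog::Storage.new(
      provider:              'AWS',
      aws_access_key_id:     ENV['AWS_ACCESS_KEY_ID'],
      aws_secret_access_key: ENV['AWS_SECRET_ACCESS_KEY']
    )
    bucket = storage.directories.get('example-bucket')

    bucket.files.create(
      key: key,                        # S3 object path
      body: file,                      # an open File/IO object
      multipart_chunk_size: 5242880    # 5 MB
    )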

I've also included a redacted version of our Gemfile.lock if that's helpful (without the fog-aws upgrade):

https://gist.github.com/wpeterson/d8785654dc8c6c1a9061eb9f9690498f

@geemus
Member

geemus commented Oct 11, 2023

Thanks for the extra details. I've just been a little slow getting back to it, as I'm on parental leave right now with a new baby, so things are a little chaotic.

A couple of follow up questions:

  1. Are the failing files bigger than the specified multipart_chunk_size? (I'm pretty sure this must be the case, but it seems best to be explicit and certain.)
  2. Can you tell from the error messages which underlying call is failing? I.e., I'm guessing this is using multipart uploading; if so, is it failing on initiating the upload, uploading a part, finalizing the parts, or something else altogether?

The only difference between 3.20.0 and 3.21.0 should be #691. On the plus side, it's a fairly constrained change; on the less-plus side, it's not immediately apparent to me how or why it would cause the problem. But hopefully, since it's narrow, we'll be able to pin it down without too much trouble.

@wpeterson
Author

@geemus Thanks for looking into this and for all the work that goes into open source. This isn't urgent and you shouldn't feel the need to look at this if you've got a new baby at home.

Here's a slightly redacted stack trace showing the 400 from the AWS S3 layer. You can see a request from our controller calling a helper class, which is calling a Fog layer directly:

Excon::Error::BadRequest: Expected(200) <=> Actual(400 Bad Request)
excon.error.response
  :body              => "<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n<Error><Code>IncompleteBody</Code><Message>You did not provide the number of bytes specified by the Content-Length HTTP header</Message><NumberBytesExpected>228</NumberBytesExpected><NumberBytesProvided>0</NumberBytesProvided><RequestId>3SG15S95XAWWA2ZA</RequestId><HostId>g0MdNHKuphyF1e5ee7zt0g/C+4RBv4eKguXKul5TPkkK3esrFfzrEgiu63uDyIgSgl9f0FhJNpo=</HostId></Error>"
  :cookies           => [
  ]
  :headers           => {
    "Connection"       => "close"
    "Content-Type"     => "application/xml"
    "Date"             => "Mon, 02 Oct 2023 20:42:25 GMT"
    "Server"           => "AmazonS3"
    "x-amz-id-2"       => "g0MdNHKuphyF1e5ee7zt0g/C+4RBv4eKguXKul5TPkkK3esrFfzrEgiu63uDyIgSgl9f0FhJNpo="
    "x-amz-request-id" => "3SG15S95XAWWA2ZA"
  }
  :host              => "s3.amazonaws.com"
  :local_address     => "REDACTED"
  :local_port        => 58182
  :method            => "PUT"
  :omit_default_port => false
  :path              => "/admin-uploads.bookbub.com/bulk_notable_listing_requests/739/Sourcebooks%20SL%20Nov%20add%20-%20Sheet1.csv"
  :port              => 443
  :query             => nil
  :reason_phrase     => "Bad Request"
  :remote_ip         => "REDACTED"
  :scheme            => "https"
  :status            => 400
  :status_line       => "HTTP/1.1 400 Bad Request\r\n"

response_call(/srv/app/lello/vendor/bundle/ruby/3.1.0/gems/excon-0.104.0/lib/excon/middlewares/expects.rb:13)

response_call(/srv/app/lello/vendor/bundle/ruby/3.1.0/gems/ddtrace-1.11.0/lib/datadog/tracing/contrib/excon/middleware.rb:47)

response_call(/srv/app/lello/vendor/bundle/ruby/3.1.0/gems/excon-0.104.0/lib/excon/middlewares/response_parser.rb:12)

response(/srv/app/lello/vendor/bundle/ruby/3.1.0/gems/excon-0.104.0/lib/excon/connection.rb:460)

request(/srv/app/lello/vendor/bundle/ruby/3.1.0/gems/excon-0.104.0/lib/excon/connection.rb:291)

request(/srv/app/lello/vendor/bundle/ruby/3.1.0/gems/fog-core-2.3.0/lib/fog/core/connection.rb:79)

request(/srv/app/lello/vendor/bundle/ruby/3.1.0/gems/fog-xml-0.1.4/lib/fog/xml/connection.rb:9)

_request(/srv/app/lello/vendor/bundle/ruby/3.1.0/gems/fog-aws-3.21.0/lib/fog/aws/storage.rb:686)

request(/srv/app/lello/vendor/bundle/ruby/3.1.0/gems/fog-aws-3.21.0/lib/fog/aws/storage.rb:681)

put_object(/srv/app/lello/vendor/bundle/ruby/3.1.0/gems/fog-aws-3.21.0/lib/fog/aws/requests/storage/put_object.rb:48)

save(/srv/app/lello/vendor/bundle/ruby/3.1.0/gems/fog-aws-3.21.0/lib/fog/aws/models/storage/file.rb:283)

create(/srv/app/lello/vendor/bundle/ruby/3.1.0/gems/fog-core-2.3.0/lib/fog/core/collection.rb:50)

put_file(/srv/app/lello/lib/fog_storage_handler.rb:21)

upload_csv_to_s3(/srv/app/lello/app/services/partners/bulk_uploads/notable_listing_request_creator.rb:137)

save(/srv/app/lello/app/services/partners/bulk_uploads/notable_listing_request_creator.rb:15)

create(/srv/app/lello/app/controllers/partners/radmin/bulk_notable_listing_requests_controller.rb:93)

@geemus
Member

geemus commented Nov 17, 2023

@wpeterson Hey, finally trying to circle back to this now that I'm back to work. I strongly suspect that if you lock to that earlier fog-aws release it should avoid the issue (I wouldn't think the excon changes would be impactful for this in particular, but definitely let me know if that seems not to be the case).
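
For example, pinning in the Gemfile (the version below is just the last known-good release mentioned above):

    # Gemfile -- hold fog-aws at the release that predates the change
    gem 'fog-aws', '3.20.0'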

@geemus
Member

geemus commented Nov 17, 2023

@wpeterson Thanks for the stack trace; that is helpful. For instance, we can see that it goes to put_object, which is the simpler non-multipart version (and should indicate that the file is less than the chunk size).
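
In other words, the save path branches roughly like this (a paraphrased sketch, not the exact fog-aws source; body_size stands in for however the body length is measured):

    # Paraphrased sketch of the branch described above, not the exact fog-aws source
    if multipart_chunk_size && body_size(body) >= multipart_chunk_size
      multipart_save(options)                                 # initiate, upload parts, complete
    else
      service.put_object(directory.key, key, body, options)   # single PUT -- the call seen in the trace above
    end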

@geemus
Member

geemus commented Nov 17, 2023

A couple of follow up questions to help narrow this down when you get the chance:

  1. If you leave the fog-aws version, but bump excon, what happens? (trying to further narrow down which changes might be in play).
  2. Relatedly, if it is the fog change, does adding { retry_limit: 5, retry_interval: 1 } to connection options fix it? (There's a sketch of where those options go at the end of this comment.)

Sorry I don't have a ready answer; it's kind of a weird case, and definitely something that we only see through a lot of layers of indirection, so I'm just trying to narrow things down a bit more. Thanks.
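
To be concrete about question 2, connection options are passed when the storage service is constructed; a minimal sketch (credentials are placeholders):

    # Sketch: retry_limit/retry_interval are Excon options passed through
    # Fog's connection_options; credentials here are placeholders
    storage = Fog::Storage.new(
      provider:              'AWS',
      aws_access_key_id:     ENV['AWS_ACCESS_KEY_ID'],
      aws_secret_access_key: ENV['AWS_SECRET_ACCESS_KEY'],
      connection_options:    { retry_limit: 5, retry_interval: 1 }
    )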


This issue has been marked inactive and will be closed if no further activity occurs.
