Improve handling of asynchronous DNS callback request failures #36

Open
cgunther opened this issue Apr 12, 2021 · 0 comments

@cgunther

Rackspace treats non-GET requests dealing with DNS (and maybe other services?) as asynchronous:
https://docs.rackspace.com/docs/cloud-dns/v1/general-api-info/synchronous-and-asynchronous-responses

This gem handles that via:

```ruby
response = wait_for_job service.add_records(@zone.identity, [options]).body['jobId']
```

```ruby
def wait_for_job(job_id, timeout=Fog.timeout, interval=1)
  retries = 5
  response = nil
  Fog.wait_for(timeout, interval) do
    response = service.callback job_id
    if response.body['status'] == 'COMPLETED'
      true
    elsif response.body['status'] == 'ERROR'
      raise Fog::DNS::Rackspace::CallbackError.new(response)
    elsif retries == 0
      raise Fog::Errors::Error.new("Wait on job #{job_id} took too long")
    else
      retries -= 1
      false
    end
  end
  response
end
```

```ruby
def callback(job_id, show_details=true)
  validate_path_fragment :job_id, job_id
  request(
    :expects => [200, 202, 204],
    :method => 'GET',
    :path => "status/#{job_id}",
    :query => "showDetails=#{show_details}"
  )
end
```

However, `callback` only treats a 200, 202, or 204 response to the status poll as successful; any other status code makes the request raise an error.

Rackspace also enforces rate limits on requests (5 per second for status polling) and returns a 413 code when the limit is exceeded:
https://docs.rackspace.com/docs/cloud-dns/v1/general-api-info/limits#rate-limits

This can lead to the following scenario. Your code tries to create a record, and the initial request to Rackspace succeeds, but your application is still waiting on the job. The gem starts polling for status, and one of the polling requests returns something other than a 200, 202, or 204 (more common if you have multiple background jobs dealing with DNS simultaneously). The gem treats that as a failure, which surfaces as though your code failed to create the record. That isn't fully accurate: only a status request failed. The underlying job to create the record may still be processing and may eventually succeed on its own, but an error has already been raised in your application.
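To make the scenario concrete, here is a small self-contained Ruby model that does not use fog. It assumes a hypothetical list of poll results, each pairing the HTTP code of one status request with the job status it reported, and applies the same strict rule as `callback`'s `:expects => [200, 202, 204]`. `strict_wait`, `POLLS`, and `UnexpectedStatus` are illustrative names, not part of the gem.

```ruby
# Hypothetical poll results: [HTTP code of the status request, job status reported].
# The job is still running on poll 1, poll 2 hits the rate limit (413),
# and the job has finished by poll 3.
POLLS = [[202, "RUNNING"], [413, nil], [200, "COMPLETED"]].freeze

# Same codes the gem's callback passes as :expects.
EXPECTED_CODES = [200, 202, 204].freeze

# Illustrative stand-in for the error raised on an unexpected status code.
class UnexpectedStatus < StandardError; end

# Walks the polls in order and fails on the first unexpected HTTP code,
# the way the strict :expects check does.
def strict_wait(polls)
  polls.each_with_index do |(code, job_status), i|
    raise UnexpectedStatus, "poll #{i + 1} returned #{code}" unless EXPECTED_CODES.include?(code)
    return :completed if job_status == "COMPLETED"
  end
  :still_waiting
end
```

Calling `strict_wait(POLLS)` raises `UnexpectedStatus` ("poll 2 returned 413") before the third poll is made, so the caller sees a failure even though the job itself completes.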

Given that the callback (status) request is idempotent and only polls, I wonder whether it should be less strict about the codes it expects, treating a response other than 200, 202, or 204 as a silent failure that triggers another retry. For example, if a callback request returns a 413 or 500 code, we likely don't need to treat the outer operation (adding a record) as a failure. We could count the callback as a failed attempt and hope for a better response on the next retry, only raising an error once the retries are exhausted.
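A minimal sketch of the proposed behavior, in self-contained Ruby that does not depend on fog or Excon. `PollFailed` is a hypothetical stand-in for the error raised when a status request returns a code outside `:expects`, `ScriptedService` fakes `callback` with a scripted list of outcomes, and the set of codes treated as transient (`RETRYABLE`) is an assumption. The retry accounting mirrors the gem's `wait_for_job` (up to 5 retries after the first poll), but the overall `Fog.wait_for` timeout is left out for brevity.

```ruby
# Hypothetical error classes, standing in for the gem's own errors.
class CallbackError < StandardError; end # the job reported status ERROR
class JobTimeout < StandardError; end    # retries exhausted

# Raised when a status request comes back with an unexpected HTTP code.
class PollFailed < StandardError
  attr_reader :http_status

  def initialize(http_status)
    @http_status = http_status
    super("status poll returned HTTP #{http_status}")
  end
end

Response = Struct.new(:body)

# Fake service: each callback call consumes the next scripted outcome.
# An Integer simulates a failed poll with that HTTP code; a String is a job status.
class ScriptedService
  def initialize(outcomes)
    @outcomes = outcomes.dup
  end

  def callback(_job_id)
    outcome = @outcomes.shift
    raise PollFailed.new(outcome) if outcome.is_a?(Integer)
    Response.new({ "status" => outcome })
  end
end

# Assumption: rate limiting and server errors are worth another poll.
RETRYABLE = [413, 500, 502, 503, 504].freeze

def wait_for_job(service, job_id, retries: 5, interval: 1)
  loop do
    begin
      response = service.callback(job_id)
      case response.body["status"]
      when "COMPLETED" then return response
      when "ERROR" then raise CallbackError, "job #{job_id} failed"
      end
    rescue PollFailed => e
      # A transient poll failure counts as one more retry; anything else propagates.
      raise unless RETRYABLE.include?(e.http_status)
    end
    raise JobTimeout, "Wait on job #{job_id} took too long" if retries.zero?
    retries -= 1
    sleep(interval)
  end
end
```

With outcomes `["RUNNING", 413, "RUNNING", 500, "COMPLETED"]`, `wait_for_job(service, "job-1", interval: 0)` returns the completed response instead of raising on the 413, while a code outside `RETRYABLE` such as 404 still raises immediately. Retrying only rate-limit and server errors keeps genuine request mistakes visible; whether other 4xx codes deserve a retry is a judgment call.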
