Tasks/WP-190: Handle concurrency with Tapis OAuth Token Refresh #932
base: main
* Fix and enable shared workspaces unit test * Remove submodule added in a previous PR
initial commit
initial commit
…l into tasks/WP-190-Tapis-Mutex
Codecov Report
Attention: Patch coverage is
Additional details and impacted files
@@            Coverage Diff             @@
##             main     #932      +/-   ##
==========================================
- Coverage   64.81%   64.71%   -0.10%
==========================================
  Files         434      434
  Lines       12482    12509     +27
  Branches     2573     2608     +35
==========================================
+ Hits         8090     8095      +5
- Misses       4161     4183     +22
  Partials      231      231
Looks good I think
Co-authored-by: Sal Tijerina <r.sal.tijerina@gmail.com>
I haven’t had the chance to test it yet, but I appreciate the detailed PR description, particularly the 'Possible Solutions' section 💯
👍
try:
    client.refresh_tokens()
except Exception:
    logger.exception('Tapis Token refresh failed')
    raise
On failure, perhaps we should call logout
Good idea, will test it and share info.
The current logic is all in a model, so it can't do an HTTP redirect or control view responses from there; that has to happen in the view. I set up a custom exception and handled it in the Base View to send a 401 back to the client. In testing, when I forced an error, the 401 turned into a redirect to Tapis OAuth, but that failed due to a CORS policy issue. I have to check whether this is specific to my local setup or something else.
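For readers following along, here is a minimal sketch of that model-to-view handoff. It is not the code in this PR: `TapisTokenRefreshError`, `get_client`, and `handle_request` are hypothetical names, and plain `(status, body)` tuples stand in for Django response objects so the example runs without Django.

```python
import logging

logger = logging.getLogger(__name__)


class TapisTokenRefreshError(Exception):
    """Hypothetical exception raised by the model layer when a refresh fails."""


def get_client(refresh):
    """Model-layer helper: attempt the refresh and translate any failure."""
    try:
        refresh()
    except Exception as exc:
        logger.exception('Tapis Token refresh failed')
        raise TapisTokenRefreshError('token refresh failed') from exc
    return 'client'  # placeholder for a real Tapis client object


def handle_request(view, refresh):
    """View-layer wrapper: map the custom exception to a 401 response.

    Returns (status_code, body) as a stand-in for an HTTP response.
    """
    try:
        return 200, view(get_client(refresh))
    except TapisTokenRefreshError:
        return 401, 'Unauthorized'
```

The model only raises; the decision about what the client sees stays in the view layer, which is the constraint described above.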
I came to that realization as well today. I tried another solution with DesignSafe, which is to put the refresh logic in a middleware. In CEP we originally moved that logic from middleware to the client() method, but I think the solution you propose here might solve the original issue there?
Haven't tested yet, what are your thoughts?
https://github.com/DesignSafe-CI/portal/blob/task/DES-2702--tapis-v3-oauth/designsafe/apps/auth/middleware.py
@rstijerina - sorry for delay in response, I missed this note.
I looked at the code in that branch. It looks good, one comment on overall integration:
- Should you do this also for logout?
logout(request)
return HttpResponseRedirect(reverse('designsafe_auth:login'))
Some testing aspects:
- Behavior on xhr requests when tapis token expiry fails. If xhr does not handle 302 cleanly, some extra check and redirect might be needed.
- Walking through the code, it is protected from an infinite loop (which is good): if the refresh fails, the user goes to login, which has to authenticate with Tapis; if authentication works, this middleware returns immediately (because the token is not expired) and the request moves on.
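The flow walked through above can be sketched roughly as follows. This is an illustration only, not the DesignSafe middleware: `Token`, the `refresh` and `logout` callables, and the `'/login'` target are hypothetical stand-ins, and the function returns either `None` (continue the request) or a redirect target.

```python
import time
from dataclasses import dataclass


@dataclass
class Token:
    """Hypothetical stand-in for the stored OAuth token."""
    expires_at: float

    @property
    def expired(self):
        return time.time() >= self.expires_at


def process_request(token, refresh, logout):
    """Return None to continue the request, or a redirect target."""
    if token is None or not token.expired:
        return None          # nothing to refresh: fall straight through
    try:
        refresh(token)
    except Exception:
        logout()             # drop the session before redirecting
        return '/login'      # hypothetical login URL
    return None
```

After a successful login the token is fresh, so the first branch returns immediately; that is what keeps a failed refresh from looping back into the middleware.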
Should you do this also for logout?
logout(request)
return HttpResponseRedirect(reverse('designsafe_auth:login'))
Yes, thanks. Added:
https://github.com/DesignSafe-CI/portal/blob/task/DES-2709--v3-apps-views/designsafe/apps/auth/middleware.py
Behavior on xhr requests when tapis token expiry fails. If xhr does not handle 302 cleanly, some extra check and redirect might be needed.
Can you expand more on this part? Where would tapis token expiry fail?
Behavior on xhr requests when tapis token expiry fails. If xhr does not handle 302 cleanly, some extra check and redirect might be needed.
Can you expand more on this part? Where would tapis token expiry fail?
I meant: "Behavior on XHR requests when the Tapis token expires and the refresh fails. This will hit the logout code and send a 302 back to the client. If the JavaScript response handling does not handle the 302 cleanly (page rendering after the 302, etc.), extra logic may be needed to check for the 302 status and a specific error type (token expired), and then set location href to logout."
@rstijerina - regarding this PR, if middleware is the right place for auth and it is working in DesignSafe, should I do the same here and start testing? What is your opinion?
The solution I have in DesignSafe does work; here are example logs from a refresh that just occurred for me:
des_django | [DJANGO] INFO 2024-04-12 14:28:57,764 middleware designsafe.apps.auth.middleware.process_request:49: Tapis OAuth token expired for user sal. Refreshing token
des_django | [DJANGO] INFO 2024-04-12 14:28:57,769 middleware designsafe.apps.auth.middleware.process_request:49: Tapis OAuth token expired for user sal. Refreshing token
des_django | [DJANGO] INFO 2024-04-12 14:28:57,775 middleware designsafe.apps.auth.middleware.process_request:61: Refreshing Tapis OAuth token
des_django | [DJANGO] INFO 2024-04-12 14:29:02,626 middleware designsafe.apps.auth.middleware.process_request:72: Token updated by another request. Refreshing token from DB.
It might not be fool-proof though, and could definitely use more testing
Perhaps we could talk about best place for token refresh in the next infra scrum?
Looks great! And tested well
Re-working this PR as middleware similar to https://github.com/DesignSafe-CI/portal/blob/task/DES-2709--v3-apps-views/designsafe/apps/auth/middleware.py
Looks good!
Overview
When the Tapis OAuth token expires for a user and multiple Tapis API calls are requested concurrently (example: page refresh), all of them send requests to Tapis to refresh the token. These duplicate requests waste resources and slow performance.
The solution is to make only one refresh request per user when the token expires. Any solution should work with requests distributed across multiple processes.
Possible Solutions
django select_for_update
django-db-mutex
This PR uses select_for_update since it is readily available in Django and waits for the row lock.
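The core idea, lock then re-check before refreshing, can be sketched without a database. This is a simplified illustration under stated assumptions, not the PR's implementation: a per-user `threading.Lock` stands in for the row lock that `select_for_update()` takes inside a transaction, a dict stands in for the token table, and all function names are hypothetical. A thread lock only coordinates within one process, which is why the PR relies on a database row lock instead.

```python
import threading
import time

# Hypothetical in-memory stand-ins: in the PR the token lives in a database
# row and the lock is the row lock taken by select_for_update().
_locks = {}
_locks_guard = threading.Lock()


def _lock_for(username):
    with _locks_guard:
        return _locks.setdefault(username, threading.Lock())


def get_valid_token(username, store, refresh, now=time.time):
    """Return a non-expired token, refreshing at most once per expiry."""
    token = store[username]
    if token['expires_at'] > now():
        return token                      # fast path: no lock needed
    with _lock_for(username):             # concurrent callers wait here
        token = store[username]           # re-read after acquiring the lock
        if token['expires_at'] <= now():  # still expired: this caller refreshes
            token = refresh(username)
            store[username] = token
        return token


def simulate(n_requests=5):
    """Fire concurrent requests against one expired token."""
    store = {'user1': {'access': 'old', 'expires_at': 0}}
    calls = []

    def refresh(username):
        calls.append(username)
        time.sleep(0.05)                  # pretend the OAuth round trip is slow
        return {'access': 'new', 'expires_at': time.time() + 3600}

    threads = [
        threading.Thread(target=get_valid_token, args=('user1', store, refresh))
        for _ in range(n_requests)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return len(calls), store['user1']['access']
```

Calling `simulate()` issues five concurrent requests but only one refresh; the waiting callers re-read the stored token once the lock is released and find it already valid.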
Related
Changes
Testing
Expire token
Enable Setting
In settings_local.py, add entry:
ENABLE_OPTIMIZED_OAUTH_REFRESH = True
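As a rough illustration of how such a flag is commonly read (an assumption, not necessarily how this PR does it), the check can default to the old behaviour when the entry is absent. `SimpleNamespace` stands in for `django.conf.settings` so the sketch runs without Django, and `use_optimized_refresh` is a hypothetical helper name.

```python
from types import SimpleNamespace

# Stand-in for django.conf.settings so the sketch runs without Django.
settings = SimpleNamespace(ENABLE_OPTIMIZED_OAUTH_REFRESH=True)


def use_optimized_refresh(settings):
    """Fall back to the previous refresh behaviour when the flag is absent."""
    return getattr(settings, 'ENABLE_OPTIMIZED_OAUTH_REFRESH', False)
```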
Test cases:
- Whole page refresh - triggers multiple concurrent requests with an expired token:
  - Only one "Refreshing tapis oauth token" message is seen; the rest are all waiting for the row lock
  - 2 of the waiting transactions get the updated access token info and process the request
- Single request
  - Only one "Refreshing tapis oauth token" message is seen; the rest are all waiting for the row lock
  - No other request acquired the row lock because they had already seen the non-expired token
Basic UI Sanity Tests
Ran through basic UI sanity tests to ensure acquiring client does not fail
UI
Notes