Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

LG-13136: Fix rate limiting logic when user is on last try #10538

Closed
wants to merge 14 commits into from

Conversation

night-jellyfish
Copy link
Member

@night-jellyfish night-jellyfish commented May 1, 2024

🎫 Ticket

LG-13136

🛠 Summary of changes

When we used limited? we found the user was prevented from completing their last chance at a request, even if the request came back from TrueID as a success. Due to the use of >= in limited?, in this situation the attempts and max_attempts were equal. While polling for the results, the user would be short-circuited to the too_many_requests logic.

Instead we needed a way to only prevent that if the user had exceeded their max amount of tries.

📜 Testing Plan

  • Set doc_auth_max_attempts to a low number locally in your application.yml
  • Fail an attempt due to doc auth errors up to minus one the amount of times you allowed in the first step (so if you changed to 5, fail 4 times with doc auth errors)
  • For your last attempt, use a successful image / yml and make sure you're able to get through to the SSN page.

@@ -28,7 +28,7 @@ def status
:unauthorized
elsif document_capture_session.cancelled_at
:gone
elsif rate_limiter.limited?
elsif rate_limiter.exceed_max?
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am thinking whether we should completely remove this check, due to the way rate_limiter implemented.

And technically this controller wont be calle, since when doc_auth is submitted then the rate_limit should not be exceeded.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am trying out removing this call locally. Everything seems to be passing so far.

My main concern is that - if we shouldn't ever call this controller, is there some sort of earlier check that's misfiring? Because if it shouldn't have ever been called, then this user should never have encountered this issue...

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@night-jellyfish , i meant this wont be called technically if rate limited, unless something goes wrong or called intentionally.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you point me to the earlier check that would filter out rate limited requests? I am having a hard time finding it.

Copy link
Contributor

@dawei-nava dawei-nava May 1, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@night-jellyfish here, it's a controller before_action hook

before_action :confirm_not_rate_limited, except: [:update]

also

before_action :confirm_not_rate_limited

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you!! I also found this relevant conversation that further supports your point.

With that in mind, I updated this PR to instead delete the check for rate limiting, as you suggested. I still need to test this manually.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think I need to also add a feature test for this, perhaps.

When we used `limited?` we found the user was
prevented from completing their last chance at a
request, due to the `>=`, since the attempts and
max_attempts were equal.

Instead we needed a way to only prevent that if
the user had exceeded their max amount of tries.
In the previous commit, we were looking to prevent the
`capture_doc_status_controller` from setting the rate limit too early.

In this commit, we are instead removing the logic to even look at rate
limiting here, since [as Dawei pointed out](#10538 (comment)), and as I found [a past
discussion for](#9370 (comment)), this code should never be reached if rate limited.
@night-jellyfish night-jellyfish force-pushed the brittany/lg-1316-fix-rate-limit-bug branch from 7753cb3 to 67569af Compare May 1, 2024 23:14
Copy link
Contributor

@amirbey amirbey left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👀

@amirbey amirbey self-requested a review May 7, 2024 16:05
Copy link
Contributor

@amirbey amirbey left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When I get rate limited during they hybrid flow, the desktop is stuck on the polling page. It'd help to add a feature test for hybrid rate limiting

Copy link
Contributor

@eileen-nava eileen-nava left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was able to successfully complete all the steps in the testing plan. I can approve this once you address Amir's feedback.

spec/features/idv/hybrid_mobile/hybrid_mobile_spec.rb Outdated Show resolved Hide resolved
@eileen-nava
Copy link
Contributor

When I get rate limited during the hybrid flow, the desktop is stuck on the polling page. It'd help to add a feature test for hybrid rate limiting

@amirbey When you were testing the hybrid flow and got rate limited, did the desktop still display the We sent a message to your phone screen? I think we might have a ticket/figma related to this, but I wanted to be sure I understood what you were referring to. Thanks!

@amirbey
Copy link
Contributor

amirbey commented May 7, 2024

When I get rate limited during the hybrid flow, the desktop is stuck on the polling page. It'd help to add a feature test for hybrid rate limiting

@amirbey When you were testing the hybrid flow and got rate limited, did the desktop still display the We sent a message to your phone screen? I think we might have a ticket/figma related to this, but I wanted to be sure I understood what you were referring to. Thanks!

hi @eileen-nava ... yes, that screen remains unchanged after the user gets rate limited
Screenshot 2024-05-07 at 3 28 41 PM
Screenshot 2024-05-07 at 3 29 57 PM

@eileen-nava
Copy link
Contributor

@amirbey @night-jellyfish I followed up with Kelli and realized that we had a Figma and ticket for a different issue. Kelli documented the past behavior in this figma and Team Ada fixed it in LG-9813.

@night-jellyfish
Copy link
Member Author

I pushed up a commit to restore the original test that looked for that page.

Unfortunately I'm struggling to get both tests to pass. @amirbey or @dawei-nava perhaps we could look at it tomorrow?

@eileen-nava eileen-nava self-requested a review May 9, 2024 18:25
timer = JobHelpers::Timer.new
# user submit a request, set the requested_at timestamp
document_capture_session.requested_at = Time.zone.now
document_capture_session.save!
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

persist it immediately in case we get racing condition in poller

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This timestamp is used at another place

document_capture_session.requested_at = Time.zone.now

In this case, guess there is no racing concern, so no need to persist immediately.

Also, should we reset it after doc auth since it use used at this place also, maybe not.

# doc auth failed due to non network error or doc_pii is not valid
if client_response && !client_response.success? && !client_response.network_error?
errors_hash = client_response.errors&.to_h || {}
failed_front_fingerprint = nil
failed_back_fingerprint = nil
if errors_hash[:front] || errors_hash[:back]
if errors_hash[:front]
selfie_image_fingerprint = nil
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Need explicitly save failed result and result timestamp when finger print check disabled.

@@ -21,10 +22,16 @@ def self.mock_response!(method:, response:)
@response_mocks[method.to_sym] = response
end

def self.response_delay(delay)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Adding delays in mock client to facilitate tests that reflect reality, useful for hybrid flow poller testing.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a good idea. 👍🏻

@@ -142,20 +140,25 @@

before do
allow(IdentityConfig.store).to receive(:doc_auth_max_attempts).and_return(max_attempts)
allow(IdentityConfig.store).to receive(:doc_auth_check_failed_image_resubmission_enabled).
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

disable check so we can resubmit same images, easier to test.

# final failure
# reset to return mocked normal success response for the last attempt
DocAuth::Mock::DocAuthMockClient.reset!
DocAuth::Mock::DocAuthMockClient.response_delay(8)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Add delays longer than a polling cycle which is 5 sec.

@dawei-nava dawei-nava force-pushed the brittany/lg-1316-fix-rate-limit-bug branch from 547f055 to 7ff8083 Compare May 10, 2024 14:46
@dawei-nava dawei-nava force-pushed the brittany/lg-1316-fix-rate-limit-bug branch from 7ff8083 to 9a22735 Compare May 10, 2024 14:48
@dawei-nava
Copy link
Contributor

dawei-nava commented May 10, 2024

Future works:

  • network exception should not be counted for rate limiting, this includes LN service return 200 but indicate service not available in response message sometimes.
  • examine behavior of exceptions on hybrid flow? like image upload controller exception, capture status controller exception/availability when polling

@@ -98,10 +98,15 @@ def had_barcode_attention_result?
end

def redo_document_capture_pending?
return true if !session_result && document_capture_session&.requested_at.present?
return unless session_result&.dig(:captured_at)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we may take a detailed look, is it necessary? what's our assumption when this method is invoked? what are the edge cases?

@eileen-nava
Copy link
Contributor

Team Timnit paused work on this bug in anticipation of the upcoming team transition. The comments provide guidance for whoever picks it up in the future. Thanks, @night-jellyfish and @dawei-nava, for all your work on this bug!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants