Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Incorrect transaction duration for pages after redirect #1279

Closed
trainings opened this issue Sep 21, 2022 · 16 comments · Fixed by #1400
Closed

Incorrect transaction duration for pages after redirect #1279

trainings opened this issue Sep 21, 2022 · 16 comments · Fixed by #1400
Assignees
Labels

Comments

@trainings
Copy link

We use token handler pattern for enforcing authentication for web applications.
Steps:

  1. user types abc.xxx.com and sends a request
  2. on the server side it checks for the presence of a token in the cookie. if there is no token, the user is redirected to authorization service auth.xxx.com
  3. user enters his credentials
  4. user redirected back to the abc.xxx.com

everything is standard, but:

page load transaction measured from point 1 (not 4)
on the sample of a transaction, gaps for items 2 & 3

@devcorpio
Copy link
Contributor

devcorpio commented Sep 28, 2022

Hi @trainings,

Thanks for reaching out!

Bear in mind that the RUM page-load transaction will end just after the browser fires the page load event.

The agent does not keep state between full-page refreshes which means that every page will have each own page-load transaction.

Important to mention that it's possible to disable the page-load instrumentation so you can start and end such transaction manually, that tends to be common in SPAs given that you might want to decouple "my SPA is fully loaded/bootstrapped" from the load event the browser fires. On your particular scenario this might not help since it seems that you are doing redirects that force the browser to fully refresh.

In a future version of the agent we will introduce the concept of user sessions which will provide (among other things) with the possibility of seeing every page-load transaction grouped together under the same session id.

Thanks,
Alberto

@trainings
Copy link
Author

I feel like I've been misunderstood.
Now, if an anonymous user, in the process of gaining access to a web page, goes to an external service (e.g. keycloak) for authentication, then all redirects (to keycloak and back to my service callback, etc.) time are counted as a transaction time.

For example - I have a page with page-load-time 3sec.
And we have the following:
User is opening a browser and typing xxx.com and sending it
My service processes the request and checks that the user isn't authenticated
My service redirect (302 http code) the user to the IDP (keycloack) on another domain auth.com
The user gets to this page and enters his password there (5 sec) and submits it
Auth service verifies user credentials and redirect user back to the xxx.com/callback with a authorization code
User goes to xxx.com/callback, xxx application completes the phase post-login (extracts jwt token from auth service) and only after that redirects the user to the originally requested page (xxx.com)

What we have:
I have a page with page-load-time 3sec (for authenticated users)
But for users who was not authenticated before and who should pass authentication transactions time will be:
5sec (enter credentials) + 3sec (real page load time) + some time for redirects
Transaction duration will be: 8-9 sec instead of 3

This negatively affects the metric
This distorts the metric, because my application has not become worse
I can't figure out on my own how much time the user now spends on going through the authentication path

@litwin90
Copy link

Hello everyone, seems we have the same problem
Are there any updates on this issue?

@litwin90
Copy link

litwin90 commented Nov 24, 2022

As I could see the problem could be located here
image

I mean that earliest span is highly like being got from Performance observer (which initiated earlier then the final navigation is started in case of auth with redirections)

image

@TELEUZI
Copy link

TELEUZI commented Nov 28, 2022

Hi everyone, our team met the same problems, what is the current state of issue and may be there is some temporary solution?

@devcorpio devcorpio added the bug label Nov 28, 2022
@devcorpio
Copy link
Contributor

devcorpio commented Nov 28, 2022

Hi everyone,

The bug has been found.

Unlike createNavigationTimingSpans, when we create createResourceTimingSpan we are NOT substracting fetchstart from the span start and end properties. In a nutshell, we are not creating these spans relative to when the current page has been requested.

This is important since performance entry times include the time spent before the redirection happened.

Example (node.js server):

http://localhost:3000/the-server-will-redirect-this.html

  setTimeout(() => {
    res.redirect("http://localhost:3000/index.html")
  }, 2000)
})

Once the redirection is performed, we can see how the navigation spans are correctly set:

Screenshot 2022-11-28 at 13 37 09

However, the resource timing spans are not:

Screenshot 2022-11-28 at 13 37 53

When skipping the redirection, we can see how the resource timing spans are correctly set:

Screenshot 2022-11-28 at 13 39 16

We will give priority to this fix

Thanks,
Alberto

@trainings
Copy link
Author

Hi all!

Are there any news and when is it planned to fix this problem?

Last release was June 14th,
Does it mean that the service/agent is dead and it's time to look at an alternative?

@oxcafedead
Copy link

oxcafedead commented Apr 5, 2023

Hi @devcorpio!
It seems that the issue is clear now, but is the fix ready or some contribution wanted? Maybe you could provide some estimated time when this going to be fixed?
In your last comment, you mentioned that:

We will give priority to this fix

Thanks!

@litwin90
Copy link

litwin90 commented Apr 5, 2023

Would be great to get an update on this issue
@devcorpio

@drewpost
Copy link

Just a quick update on this. We are aiming to get this issue resolved and shipped in the next couple of months alongside several other RUM agent improvements.

fonshirin added a commit to fonshirin/apm-agent-rum-js that referenced this issue Jul 4, 2023
@devcorpio
Copy link
Contributor

devcorpio commented Jul 7, 2023

Hi everyone,

The team is currently working on this, too.

To give you a little more context, after further analysis, we don't consider this a "bug", but an important enhancement that we need to do. The problem is that we weren't including the redirect duration in the agent beacons and hence the APM UI waterfall was not showing it.

In fact, that's how browsers work, for instance, take this example from Chrome:

The first screenshot shows the duration of the redirect:

Screenshot 2023-07-07 at 12 40 00

The second screenshot shows when the actual page, "index.html", starts loading (just after the redirection):

Screenshot 2023-07-07 at 12 42 04

So, all those timings are part of the user experience to end up in the "final" page, that's why the redirect will be also included.

We will start adding a span named "Redirect" (if there is a redirect) and you will be able to see why the transaction lasts more than usual:

E.g.

Screenshot 2023-07-07 at 12 11 22

Thanks,
Alberto

@oxcafedead
Copy link

oxcafedead commented Jul 7, 2023

@devcorpio can we make a difference for redirects within the same site/origin and redirects to third party systems?
I mean, including a same-origin redirect into a transaction does make sense to me, but for third party systems this is not something we can control at our end. For example, I found in navigation timing API the following words:

Though redirectStart and redirectEnd are exposed in PerformanceResourceTiming, they have a different meaning in Navigation Timing, where they return zero for navigations with cross-origin redirects.

If I understand right, navigation timing browser API should return 0 for redirects to cross-origin, can we have similar behavior?

UPD: Seems like 3rd-party redirects are not counted. Sorry for confusion.

@devcorpio
Copy link
Contributor

devcorpio commented Jul 18, 2023

Hi everyone,

There has been more discussions and investigation into this within the team.

It's not possible programmatically to identify if there has been a cross-origin redirect (something that also @oxcafedead stated recently), so browser APIs set everything related to redirect to 0. Because of this, an eventual solution for the original problem described in the issue would be hacky (using navigationStart for this is not as reliable as it might seem at first instance) and prone to cause bugs (even breaking changes) that might affect different scenarios.

On top of that, the current behaviour is aligned with how a browser and other tools work and we want to be aligned with how they do things, let me show you an example:

redirect-demo.mov

The video shows a cross-origin redirect (from localhost:3001 to localhost:3000). This redirection takes place (it lasts 1,5s) while we are doing a performance recording with Chrome. In the late stages of the video, you will see how all the marks identified by the browser (e.g. LCP, FPC, DLC) also include the redirection timing in the marks, for instance, LCP happens in 1,87s

Although we understand that it would be great to differentiate the time a user spent in a different origin and the time spent on the actual page, that's something that browsers currently don't allow us to identify. In fact, for browsers, the time a user spent in the browser before the redirection is also part of the user experience.

As previously mentioned, what we can confirm we are going to do is to fix the agent behaviour if redirect info is available (this happens when it is a same-origin one), this can be broken down in a few steps:

  • add a span mentioning a redirect happened
  • make sure all marks (LCP, TTFB, etc) show up properly in the waterfall - they should appear after the new span and never before

So, instead of this:

before

We should have something like this:

after

I understand that not doing what you asked for is unfortunate, hope you find the motives described above reasonable, we can assure you we have reflected plenty on this.

Additionally, I would like to thank you again for helping us to keep improving the product, the redirect (same-origin) behaviour we are going to fix is a clear example of how useful your feedback is.

Cheers,
Alberto

@fonshirin
Copy link

fonshirin commented Jul 20, 2023

@devcorpio, maybe we can use PerformanceNavigationTiming interface instead of PerformanceTiming to create navigation spans. PerformanceNavigationTiming uses relative timing values instead of timestamps so its timings will be aligned with Resource Entry timings, LCP timings, etc. In case of cross-domain redirect PerformanceNavigationTiming properties values will also include the duration of the redirect. So we will be able to detect that some action was happened before the navigation events were really triggered. In this case we will be able to draw this action as the first span in the waterfall, like in the example of the future fix provided by you.

Thx.

@fonshirin
Copy link

@devcorpio, maybe we can use PerformanceNavigationTiming interface instead of PerformanceTiming to create navigation spans. PerformanceNavigationTiming uses relative timing values instead of timestamps so its timings will be aligned with Resource Entry timings, LCP timings, etc. In case of cross-domain redirect PerformanceNavigationTiming properties values will also include the duration of the redirect. So we will be able to detect that some action was happened before the navigation events were really triggered. In this case we will be able to draw this action as the first span in the waterfall, like in the example of the future fix provided by you.

Thx.

Dear @devcorpio, and how about the fix for cross-domain redirect?
Please see my previous comment above in quote.

Thank you in advance.

@devcorpio
Copy link
Contributor

devcorpio commented Jul 31, 2023

Hi @fonshirin,

Sorry for the delay, I was waiting for the release before adding here a comment.

When we discussed the cross-origin with the team. We also mentioned the possibility of switching to PerformanceNavigationTiming, but for now, we discarded that option for how big and sensitive the change might be for the agent.

Additionally, we decided that for now (as I also mentioned in the previous comment), we want the RUM agent to behave like browsers do (meaning, including the extra redirection to the total time)

We will retackle this in future versions of the agent (most likely when we switch PerformanceNavigationTiming, but that's not going to happen in the short/mid-term)

Additional actions:

  • We will create shortly a new enhancement ticket to switch navigation API 2
  • We will also create a subtask for fixing the cross domain issue

I'll link them to this ticket (for reference) once we create them

Thanks,
Alberto

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging a pull request may close this issue.

7 participants