chore: run unauthenticated fetch before authenticated #127

jeswr · 2022-09-06T10:17:55Z

closes #126

jeswr · 2022-09-06T10:18:45Z

@NSeydoux can you review this quickly

rubensworks

I'm not sure this is actually the fix we really want.

The problem with this solution is that we can increase the number of HTTP requests per query two-fold. Especially in cases where we're doing link traversal, where the number of requests can go in the order of thousands, this will become a significant bottleneck, as query execution will basically take twice as long in some cases.

Alternatively, we could have a (non-default) option in the web client that enables this fallback-based fetch implementation. This would ensure that the general case still performs as fast as possible, and only those queries that require this specific workaround can make use of this mode.

In any case, I think this issue is broader than just the web client here, and should eventually be tackled on a higher level.

jeswr · 2022-09-06T11:03:23Z

I'm not sure this is actually the fix we really want. | In any case, I think this issue is broader than just the web client here, and should eventually be tackled on a higher level.

@NSeydoux - correct me if I'm wrong; but won't this become the default logic of the Inrupt libraries in a few months anyway?

Especially in cases where we're doing link traversal, where the number of requests can go in the order of thousands

I see your point here; does it make sense to then use a caching mechanism like; and if so should we implement something similar when we add this kind of logic to solid-client-authn

  if (config.context.workerSolidAuth) {
    const authenticatedFetch = workerToWindowHandler.buildAuthenticatedFetch();
    const authUrls: Record<string, boolean> = {}

    async function comunicaFetch(...args) {
    if (authUrls[getNamesapce(args[0])]) {
            // In future we should check if this is a 403? (double check code) response
            // and if so try the *unathenticated* fetch. This handles cases like a qpf endpoint
            // changing an endpoint from being auth-required to public halfway through a session
           return await authenticatedFetch(...args);
    }
    
    
      const response = await global.fetch(...args);

      if (response.status === 401) {
                 authUrls[getNamespace(args[0])] = true;
                                  
                 return await authenticatedFetch(...args);
      }

      return response;
    }

    config.context.fetch = comunicaFetch;
  }

^^this solves performance when there is large number of requests to a QPF endpoint.

Especially in cases where we're doing link traversal

So if we are doing lots of traversal outside then this isn't an issue a Pod this isn't an issue, as all the documents we get should be able to retrieved with the first call; but if we are traversing documents within a Pod I do see your point. I'm not sure if it makes sense to cache origins rather than namespaces to make particular authenticated calls on (though that seems fraught with potential problems).

Alternatively, we could have a (non-default) option in the web client that enables this fallback-based fetch implementation.

I agree with the idea of an option; but strongly disagree with the default order. I think this fallback-based implementation should be the default as we should favour correctness over performance (which I know is bold of me to say given how many performance rabbitholes I've gone down in the last few months).

rubensworks · 2022-09-06T12:04:23Z

Good idea, I think this namespace/origin-based caching should solve all major performance issues that could arise from this mechanism. And this could then become the default indeed.

I'm not sure yet though what you have in mind exactly for this getNamespace implementation. Just removing the path of the URL?

NSeydoux · 2022-09-06T12:19:16Z

The next version of solid-client-authn will indeed rely on a similar logic (defaulting to unauthenticated fetch, and only authenticating if necessary), because preemptive authentication is causing all sorts of issues, in addition of requiring the client to make a guess about the authentication mechanism, which should be discovered from the resource server.

However, you are absolutely right, it would not be acceptable to issue twice as much requests, and optimizations such as the caching @jeswr proposed will be required.

jeswr · 2022-09-06T12:32:08Z

My stance is that we should be caching by namespace rather than origin, i.e. http://example.org/path?query=1 goes to http://example.org/path rather than http://example.org/

This could be implemented as

function getNamespace(input) {
  const url = new URL(new Request(input).url);
  return url.origin + url.pathname;
}

jeswr · 2022-09-06T12:52:19Z

@rubensworks I've added the caching optimisations now. Still won't be as optimal as you want for link traversal to different paths within the same Pod. However, the auth requirements for each of those files does need to be checked independentally as they will have different permissions.

In the future I guess we can use Pod metadata to optimise this.

rubensworks · 2022-09-06T13:45:45Z

My stance is that we should be caching by namespace rather than origin, i.e. http://example.org/path?query=1 goes to http://example.org/path rather than http://example.org/

But in that case we still suffer from the performance problems, right?

For example, when traversing over a pod with 100 Turtle files in http://example.org/container/, this would require 200 HTTP requests (instead of 100).

So perhaps origin-based caching may be better after all?

However, the auth requirements for each of those files does need to be checked independentally as they will have different permissions.

If I'm correct, using authenticated fetch is never a problem on pods, right? I've successfully done authenticated queries over pods with public resources in the past.
So the false-positives in that case may not be a problem.

In any case, a way to disable this behaviour seems important.

In the future I guess we can use Pod metadata to optimise this.

Definitely, but we just have to be sure that we don't break existing solutions in order to fix other problems :-)

jeswr · 2022-09-06T14:02:23Z

If I'm correct, using authenticated fetch is never a problem on pods, right?

I have a hunch that this is implementation dependent - and it just so happens that our current implementations are ok with it. But that is more of a questions for someone like @NSeydoux .

In any case, a way to disable this behaviour seems important.

Agreed - I think we were just in disagreance on which should be the default. I can add in that option here & to the website once we make a decision on whether this is the correct caching strategy for us right now.

rubensworks · 2022-11-09T13:22:52Z

Any updates on this?

chore: run unauthenticated fetch before authenticated

1e17e90

chore: remove unecessary line break

b16073c

NSeydoux approved these changes Sep 6, 2022

View reviewed changes

jeswr mentioned this pull request Sep 6, 2022

Remove old deployment file #128

Closed

rubensworks requested changes Sep 6, 2022

View reviewed changes

perf: cache authenticated URLs

0ff5dd7

chore: update CI script

b4bc80f

rubensworks force-pushed the master branch from d478f1d to cd53dcc Compare November 10, 2022 10:45

rubensworks closed this May 30, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

chore: run unauthenticated fetch before authenticated #127

chore: run unauthenticated fetch before authenticated #127

jeswr commented Sep 6, 2022

jeswr commented Sep 6, 2022 •

edited

rubensworks left a comment

jeswr commented Sep 6, 2022 •

edited

rubensworks commented Sep 6, 2022

NSeydoux commented Sep 6, 2022

jeswr commented Sep 6, 2022

jeswr commented Sep 6, 2022

rubensworks commented Sep 6, 2022

jeswr commented Sep 6, 2022

rubensworks commented Nov 9, 2022

chore: run unauthenticated fetch before authenticated #127

chore: run unauthenticated fetch before authenticated #127

Conversation

jeswr commented Sep 6, 2022

jeswr commented Sep 6, 2022 • edited

rubensworks left a comment

Choose a reason for hiding this comment

jeswr commented Sep 6, 2022 • edited

rubensworks commented Sep 6, 2022

NSeydoux commented Sep 6, 2022

jeswr commented Sep 6, 2022

jeswr commented Sep 6, 2022

rubensworks commented Sep 6, 2022

jeswr commented Sep 6, 2022

rubensworks commented Nov 9, 2022

jeswr commented Sep 6, 2022 •

edited

jeswr commented Sep 6, 2022 •

edited