Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

chore: run unauthenticated fetch before authenticated #127

Closed
wants to merge 4 commits into from

Conversation

jeswr
Copy link
Member

@jeswr jeswr commented Sep 6, 2022

closes #126

@jeswr
Copy link
Member Author

jeswr commented Sep 6, 2022

@NSeydoux can you review this quickly

@jeswr jeswr mentioned this pull request Sep 6, 2022
Copy link
Member

@rubensworks rubensworks left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure this is actually the fix we really want.

The problem with this solution is that we can increase the number of HTTP requests per query two-fold. Especially in cases where we're doing link traversal, where the number of requests can go in the order of thousands, this will become a significant bottleneck, as query execution will basically take twice as long in some cases.

Alternatively, we could have a (non-default) option in the web client that enables this fallback-based fetch implementation. This would ensure that the general case still performs as fast as possible, and only those queries that require this specific workaround can make use of this mode.

In any case, I think this issue is broader than just the web client here, and should eventually be tackled on a higher level.

@jeswr
Copy link
Member Author

jeswr commented Sep 6, 2022

I'm not sure this is actually the fix we really want. | In any case, I think this issue is broader than just the web client here, and should eventually be tackled on a higher level.

@NSeydoux - correct me if I'm wrong; but won't this become the default logic of the Inrupt libraries in a few months anyway?

Especially in cases where we're doing link traversal, where the number of requests can go in the order of thousands

I see your point here; does it make sense to then use a caching mechanism like; and if so should we implement something similar when we add this kind of logic to solid-client-authn

  if (config.context.workerSolidAuth) {
    const authenticatedFetch = workerToWindowHandler.buildAuthenticatedFetch();
    const authUrls: Record<string, boolean> = {}

    async function comunicaFetch(...args) {
    if (authUrls[getNamesapce(args[0])]) {
            // In future we should check if this is a 403? (double check code) response
            // and if so try the *unathenticated* fetch. This handles cases like a qpf endpoint
            // changing an endpoint from being auth-required to public halfway through a session
           return await authenticatedFetch(...args);
    }
    
    
      const response = await global.fetch(...args);

      if (response.status === 401) {
                 authUrls[getNamespace(args[0])] = true;
                                  
                 return await authenticatedFetch(...args);
      }

      return response;
    }

    config.context.fetch = comunicaFetch;
  }

^^this solves performance when there is large number of requests to a QPF endpoint.

Especially in cases where we're doing link traversal

So if we are doing lots of traversal outside then this isn't an issue a Pod this isn't an issue, as all the documents we get should be able to retrieved with the first call; but if we are traversing documents within a Pod I do see your point. I'm not sure if it makes sense to cache origins rather than namespaces to make particular authenticated calls on (though that seems fraught with potential problems).

Alternatively, we could have a (non-default) option in the web client that enables this fallback-based fetch implementation.

I agree with the idea of an option; but strongly disagree with the default order. I think this fallback-based implementation should be the default as we should favour correctness over performance (which I know is bold of me to say given how many performance rabbitholes I've gone down in the last few months).

@rubensworks
Copy link
Member

Good idea, I think this namespace/origin-based caching should solve all major performance issues that could arise from this mechanism. And this could then become the default indeed.

I'm not sure yet though what you have in mind exactly for this getNamespace implementation. Just removing the path of the URL?

@NSeydoux
Copy link

NSeydoux commented Sep 6, 2022

The next version of solid-client-authn will indeed rely on a similar logic (defaulting to unauthenticated fetch, and only authenticating if necessary), because preemptive authentication is causing all sorts of issues, in addition of requiring the client to make a guess about the authentication mechanism, which should be discovered from the resource server.

However, you are absolutely right, it would not be acceptable to issue twice as much requests, and optimizations such as the caching @jeswr proposed will be required.

@jeswr
Copy link
Member Author

jeswr commented Sep 6, 2022

My stance is that we should be caching by namespace rather than origin, i.e. http://example.org/path?query=1 goes to http://example.org/path rather than http://example.org/

This could be implemented as

function getNamespace(input) {
  const url = new URL(new Request(input).url);
  return url.origin + url.pathname;
}

@jeswr
Copy link
Member Author

jeswr commented Sep 6, 2022

@rubensworks I've added the caching optimisations now. Still won't be as optimal as you want for link traversal to different paths within the same Pod. However, the auth requirements for each of those files does need to be checked independentally as they will have different permissions.

In the future I guess we can use Pod metadata to optimise this.

@rubensworks
Copy link
Member

My stance is that we should be caching by namespace rather than origin, i.e. http://example.org/path?query=1 goes to http://example.org/path rather than http://example.org/

But in that case we still suffer from the performance problems, right?

For example, when traversing over a pod with 100 Turtle files in http://example.org/container/, this would require 200 HTTP requests (instead of 100).

So perhaps origin-based caching may be better after all?

However, the auth requirements for each of those files does need to be checked independentally as they will have different permissions.

If I'm correct, using authenticated fetch is never a problem on pods, right? I've successfully done authenticated queries over pods with public resources in the past.
So the false-positives in that case may not be a problem.

In any case, a way to disable this behaviour seems important.

In the future I guess we can use Pod metadata to optimise this.

Definitely, but we just have to be sure that we don't break existing solutions in order to fix other problems :-)

@jeswr
Copy link
Member Author

jeswr commented Sep 6, 2022

If I'm correct, using authenticated fetch is never a problem on pods, right?

I have a hunch that this is implementation dependent - and it just so happens that our current implementations are ok with it. But that is more of a questions for someone like @NSeydoux .

In any case, a way to disable this behaviour seems important.

Agreed - I think we were just in disagreance on which should be the default. I can add in that option here & to the website once we make a decision on whether this is the correct caching strategy for us right now.

@rubensworks
Copy link
Member

Any updates on this?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Website does not work with DBpedia fragments after login to Solid authentication
3 participants