Allow for easily matching rules using path prefixes #1089

davidspek · 2023-04-11T13:09:13Z

Preflight checklist

I could not find a solution in the existing issues, docs, nor discussions.
I agree to follow this project's Code of Conduct.
I have read and am following this repository's Contribution Guidelines.
This issue affects my Ory Network project.
I have joined the Ory Community Slack.
I am signed up to the Ory Security Patch Newsletter.

Context and scope

Currently rules are matched to an incoming request based on regex or glob pattern matching. Since only a single rule is allowed to match a request, the regex or glob patterns must be very precise and become increasingly complex as different paths need to be matched. While the used regex library supports negative lookahead, creating rules where requests are matched based on path prefixes is still difficult since you'd need to provide negative matches for all other rule regexes with the same base path. As glob doesn't support negative lookahead doing path prefix based routing is not possible. Even when using regex patterns with negative lookahead, the issue that arises is that the end user ends up needing to manage a system with the state of all rules to then be able to create rules with the appropriate regexes including negative lookahead patterns for all other rules. Since the most common type of routing people likely would want to implement is prefixed based routing, the fact that doing so with oathkeeper is difficult and error prone is in my views its biggest drawback.

Also see #441.

Goals and non-goals

Goals:

Allow for easily matching routing rules using path prefixes.

Non-goals:

Allow for using regex in the scheme section of the URL, since that would make it much more difficult to enter rules into the Trie.
Allow for regex or glob patterns in places other in the host section of the URL, since making a Trie compatible with pattern matching is likely very difficult if not impossible.
Allow for regex or glob patterns in places other than the last section of the path, since making a Trie compatible with pattern matching is likely very difficult if not impossible.

The design

In order to allow for easily matching route prefixes we can make use of a Trie Data Structure. These are commonly used for things like search engines where you'd want to quickly find entries that contain a word without looping through all entries within the database. We will use a Trie to find the oathkeeper rule with the longest matching prefix. Since the time complexity is O(n) the matching of rules should be very fast, and likely much faster than what is currently being done since currently oathkeeper loops through each rule to compile a list of matches.

Here is an example of Trie that will be used:

Entering rules into the Trie.

To simplify and speed up matching the first node in our Trie will be the protocol used. This can be either HTTP or GRPC. The next level node in the Trie are the methods (GET, POST, PUT, PATCH, DELETE). This is followed by a node containing the URL scheme (for example: https, as parsed by url.Parse) and then a node containing the host (as parsed by `url.Parse).

Now for the paths. First, we will cleanup the path and keep anything up to the first regex or glob pattern. Since we are only interested in full paths that match, we will not be entering each character of the path into the Trie. Rather, we will split the path of a rule by the / character and enter each complete path into the Trie as a node. On the last path we will add the rule object to the Trie node so we can retrieve it. Technically a Trie node can contain multiple rules if patterns are used somewhere in the paths.

Matching incoming requests

When a request comes in and prefix matching is enabled we will first search the Trie for the rule(s) that match the protocol, method, sheme, hostname and path. Next, we will evaluate the rule(s) for regex or glob pattern matching the same way that is done currently. If after the pattern matching there are still multiple rules that are matched an error will occur the same way this happens currently.

Examples

Using the above example Trie diagram the following requests would en up at the following nodes in the Trie.

Protocol	Method	Scheme	Hostname	Path	Node Number
ProtocolHTTP	GET	https	example.com	/some/path	9
ProtocolHTTP	GET	https	example.com	/some/other/path	12
ProtocolHTTP	GET	https	example.com	/something/else	8
ProtocolHTTP	GET	https	example.com	/some/thing	5
ProtocolHTTP	GET	https	example.com	/unknown/path	1
ProtocolHTTP	GET	https	example.com	/something/else/and/very/long	8
ProtocolHTTP	POST	https	example.com	/	2
ProtocolHTTP	POST	https	example.com	/some/long/path	7
ProtocolHTTP	POST	https	example.com	/some/path	11

APIs

No response

Data storage

No response

Code and pseudo-code

An implementation can be found in #1073.

Degree of constraint

No response

Alternatives considered

No response

The text was updated successfully, but these errors were encountered:

davidspek · 2023-04-11T13:15:06Z

/cc @aeneasr since you asked for a more thorough explanation and/or design document for my PR.

davidspek · 2023-06-21T15:02:38Z

Friendly ping @aeneasr .

davidspek · 2023-06-21T15:23:40Z

Friendly ping @zepatrik maybe you can also chime in here.

aeneasr · 2023-07-17T13:03:58Z

This looks pretty nice! Would it make sense to re-use the matching logic of httprouter? I think they solved this already, basically, and it's really fast!

davidspek · 2023-08-08T21:04:13Z

I haven’t looked into httprouter so I’d need to have a look at it.

davidspek · 2023-08-08T21:24:01Z

It looks like httprouter is doing the same thing, except it seems to be a bit more advanced in terms of matching. What is the same though is that wildcard matching (or in our case regex and glob) can only be done at the end of a URL. Another similarity is that they also seem to be adding the method into the Trie.

One big difference though is that they don’t insert the host name into the Trie. Rather, you’d need to create a new router (and therefore a new Trie) for each domain.

I’d need to look into the code a bit closer to see if there’s pieces we could use as a module or if they can otherwise be adapted into this codebase. Since it doesn’t look like that project is super active and that the Trie structure we’d use is like a bit different I’m leaning towards the latter.

davidspek · 2023-08-08T21:37:23Z

After looking at the code it seems very similar to what I’ve already implemented, and I don’t think we’d be able to reuse it as is. However, it probably does have some useful bit that can be used to improve the code I’ve already written.

So @aeneasr, my question now is how can we best try to move this forward?

davidspek added the rfc A request for comments to discuss and share ideas. label Apr 11, 2023

dadrus mentioned this issue May 29, 2023

Allow rule matching using path prefixes dadrus/heimdall#652

Closed

3 tasks

cmmoran mentioned this issue Jan 31, 2024

fix: update regex engine to support possessive match and lookbehind syntaxes #1147

Open

7 tasks

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Allow for easily matching rules using path prefixes #1089

Allow for easily matching rules using path prefixes #1089

davidspek commented Apr 11, 2023 •

edited

davidspek commented Apr 11, 2023

davidspek commented Jun 21, 2023

davidspek commented Jun 21, 2023

aeneasr commented Jul 17, 2023

davidspek commented Aug 8, 2023

davidspek commented Aug 8, 2023

davidspek commented Aug 8, 2023

Allow for easily matching rules using path prefixes #1089

Allow for easily matching rules using path prefixes #1089

Comments

davidspek commented Apr 11, 2023 • edited

Preflight checklist

Context and scope

Goals and non-goals

The design

Entering rules into the Trie.

Matching incoming requests

Examples

APIs

Data storage

Code and pseudo-code

Degree of constraint

Alternatives considered

davidspek commented Apr 11, 2023

davidspek commented Jun 21, 2023

davidspek commented Jun 21, 2023

aeneasr commented Jul 17, 2023

davidspek commented Aug 8, 2023

davidspek commented Aug 8, 2023

davidspek commented Aug 8, 2023

davidspek commented Apr 11, 2023 •

edited