Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support colons in path for server-side #5676

Merged
merged 21 commits into from
Jun 4, 2024
Merged

Conversation

jrhee17
Copy link
Contributor

@jrhee17 jrhee17 commented May 13, 2024

Motivation:

Currently, gRPC verb suffix is supported by introducing a VerbSuffixPathMapping.
There were several issues with this approach:

  • VerbSuffixPathMapping was applied inconsistently.
    • For instance, RouteBuilder path(String pathPattern) applies VerbSuffixPathMapping, but RouteBuilder path(String prefix, String pathPattern) doesn't.
  • In the context of ExactPathMapping, VerbSuffixPathMapping was acting as a workaround to support colons in the path. Additionally the support for colons is incomplete since only the last colon is correctly handled.
    • Another side-effect of this is route collisions are incorrectly reported since the verb suffix isn't taken into account.
  • Continued support of VerbSuffixPathMapping makes supporting colons natively difficult.
  • Misc) A path such ending with an escape character failed to start up the server: /path\\

The original motivation for #5613 was that the following patterns weren't supported.

  • /path/{param}:verb
  • /path/literal:verb

I propose that the first case be handled via using REGEX PathMapping types, and the second case be handled by natively supporting colons in our path parameters.
Note that colons are supported per the rfc3986, and we already have an issue #4577.

  1. Supporting /path/{param}:verb

I propose that we support this simply by introducing a new PathMappingType.REGEX. We can simply check if the last PathSegment is a VerbPathSegment. If it is a VerbPathSegment, then we can just use PathMappingType.REGEX.

  1. Supporting /path/literal:verb

Internally, we represent colons as parameterized variables both in ParameterizedPathMapping and RoutingTrieBuilder. This makes it difficult to support colons, so I propose that we modify the internal representation to a different character (\0). This character isn't a valid path character per rfc3986, so I believe the impact is minimal.

One side effect of this approach is that ParameterizedPathMapping#paths will return null character delimited paths.
e.g. /v0/:/path -> /v0/\0/path
Having said this, I believe the normal user path doesn't really use this value so it shouldn't matter.

  1. Supporting colons in general

When one calls RouteBuilder#path(String), we determine whether to use ParameterizedPathMapping or ExactPathMapping depending on whether a colon is used.

if (!pathPattern.contains("{") && !pathPattern.contains(":")) {

  • If the colon does not start the segment (e.g. /a:b), it is trivial to assume that the colon should be matched exactly.
  • If the colon does start the segment (e.g. /:param), it is ambiguous whether the user intended this to be a literal or parameter.

For this case, I propose the following:

  • If a colon is used as-is, /:param, then the segment will be used to represent a parameter
  • If a colon is escaped, /\\:param, then the segment will be treated as a literal

Optimization) ExactPathMapping is more performant than ParameterizedPathMapping because a simple equality check is done. I propose as an optimization we modify the condition to check if /: is contained instead of :. As a downside of this approach, the colon escape logic needs to also be added to ExactPathMapping. This can be undone if this logic seems too complicated though.

Modifications:

  • When VerbPathSegment is used, use RegexPathMapping for gRPC transcoding.
  • Removed VerbSuffixPathMapping and related changes in RoutingTrieBuilder, RouteBuilder, and ParameterizedPathMapping
  • Modified RoutingTrieBuilder and ParameterizedPathMapping to use \0 instead of ':' to represent parameterized path segments. This allows us to use ':' in most PathMapping types.
  • Relaxed ParameterizedPathMapping to act like ExactPathMapping when a path segment isn't capturing a path parameter.
  • Replace /\\: to /: in ParameterizedPathMapping and ExactPathMapping to give users an option to use the first colon as a literal.

Result:

@jrhee17 jrhee17 added this to the 1.29.0 milestone May 13, 2024
@@ -74,13 +74,13 @@ service HttpJsonTranscodingTestService {

rpc EchoTimestampAndDuration(EchoTimestampAndDurationRequest) returns (EchoTimestampAndDurationResponse) {
option (google.api.http) = {
get: "/v1/echo/{timestamp}/{duration}"
get: "/v1/echo/timestamp/{timestamp}/{duration}"
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These paths can be a source of flakiness since it clashes with the following path:

get: "/v1/echo/response_body/value"

get: "/v1/echo/response_body/repeated"

};
}

rpc EchoTimestamp(EchoTimestampRequest) returns (EchoTimestampResponse) {
option (google.api.http) = {
post: "/v1/echo/{timestamp}:get",
post: "/v1/echo/timestamp/{timestamp}:get",
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ditto

post: "/v1/echo/{timestamp}:get",

sortedEndpoints.stream().map(httpEndpoint -> httpEndpoint.spec().route().patternString())
sortedEndpoints.stream()
.map(httpEndpoint -> httpEndpoint.spec().route())
.filter(route -> route.pathType() != RoutePathType.REGEX)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This change is necessary because GrpcDocServicePluginTest.servicesTest fails.

When the path type is regex, a question mark is included in the path. e.g. /(?<p0>[^/])/hello

However, the regex path is incorrectly cleansed in MethodInfo.

final RequestTarget reqTarget = RequestTarget.forServer(path);

Since this automatic behavior is done for gRPC only, I've added a logic to filter out this case.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can't we create the method info with the examplepath as is?
I'm a bit worried about the situation that all paths that have verb suffixes wouldn't have the example paths. 🤔

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see, I guess I didn't think example paths were too important.
Having said this, I do think it's possible to replace regex patterns with the capturing group name at the cost of extra complexity. Pushed a commit to address this.

@jrhee17 jrhee17 marked this pull request as ready for review May 16, 2024 09:43
@jrhee17 jrhee17 changed the title Refactor gRPC verb suffix mapping Support colons in path for server-side May 17, 2024
Copy link
Member

@minwoox minwoox left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks straightforward! Left small suggestions. 👍

sortedEndpoints.stream().map(httpEndpoint -> httpEndpoint.spec().route().patternString())
sortedEndpoints.stream()
.map(httpEndpoint -> httpEndpoint.spec().route())
.filter(route -> route.pathType() != RoutePathType.REGEX)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can't we create the method info with the examplepath as is?
I'm a bit worried about the situation that all paths that have verb suffixes wouldn't have the example paths. 🤔

Copy link
Member

@minwoox minwoox left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍 👍 👍 👍 👍

@@ -543,32 +541,12 @@ private static PathMapping getPathMapping(String pathPattern) {
"pathPattern: " + pathPattern +
" (not an absolute path starting with '/' or a unknown pattern type)");
}
if (!pathPattern.contains("{") && !pathPattern.contains(":")) {
if (!pathPattern.contains("/{") && !pathPattern.contains("/:")) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Due to this change, /path/infix{foo} becomes a valid pattern. Previously, an exception was raised.

{ is not an allowed character so it is percent-encoded by Armeria client or server. It means a request whose path is /path/infix{foo} fails to match the service for /path/infix{foo} and returns 404.

We may reject the pattern as before or treat this as a known problem if the percent-encoding issue is not relevant to this PR. I also don't see this mismatch as a critical problem.

Copy link
Contributor Author

@jrhee17 jrhee17 May 30, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

{ is not an allowed character so it is percent-encoded by Armeria client or server. It means a request whose path is /path/infix{foo} fails to match the service for /path/infix{foo} and returns 404.

I agree with this sentiment. However, I also thought that not only {} but also other characters are valid path patterns but have the same issue. e.g. /$@ will have a similar issue.

Because of this, I thought 1) this was easy enough for maintainers to reason about 2) users won't be using this pattern anyways 3) If we want to limit these type of characters, it shouldn't be limited to only {} but also other characters that should be precent-encoded.

I do think such strict validation will be nice to have for those who aren't familiar with percent-encoding rules though (I also have to refer to the rfc every time if I'm being honest). I think it might be nice to refer to other frameworks to see how strictly they are handling validation rules as well.

Copy link
Contributor

@ikhoon ikhoon Jun 4, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/path/infix{foo} is not a common pattern for REST API. Let's resolve the problem when we supporting the use of a path variable with literal characters like https://github.com/spring-projects/spring-framework/blob/main/spring-core/src/test/java/org/springframework/util/AntPathMatcherTests.java#L659-L661

'(' +
// If the segment doesn't start with ':' or '{', the behavior should be the same as ExactPathMapping
"/[^:{][^/]*|" +
"/:[^/{}]+|" +
Copy link
Contributor

@ikhoon ikhoon May 30, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Question)

  1. Asumming that a path pattern is /a/:b:c/:d, b:c is used as a path variable, e.g. ctx.pathParam("b:c"). Is that an expected behavior?
  2. /brace/{b}:def is an invalid pattern. /colon/:a:verb is an valid pattern and a path segment is captured with a:verb at which I was a bit suprised.

If we don't support /:param:verb style in the ParameterizedPathMapping, I prefer to raise an exception when a path variable includes :. What do you think?

Copy link
Contributor Author

@jrhee17 jrhee17 May 30, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I also found that the handling of {} and : is different which confused me at first.
Having said this, I decided to just follow the previous regex pattern since I didn't want to introduce inadvertent breaking changes.

Asumming that a path pattern is /a/:b:c/:d, b:c is used as a path variable, e.g. ctx.pathParam("b:c"). Is that an expected behavior?
/brace/{b}:def is an invalid pattern. /colon/:a:verb is an valid pattern and a path segment is captured with a:verb at which I was a bit suprised.

Double checked that this behavior is consistent with the main branch.

If we don't support /:param:verb style in the ParameterizedPathMapping, I prefer to raise an exception when a path variable includes :. What do you think?

I wanted to avoid introducing more breaking changes since this PR is already introducing certain breaking changes. I also agree that the assymmetry is confusing and it is easier to reason about if : and {} have the same treatment.

I guess this comes down to "does this issue annoy maintainers enough to make a breaking change (I'm not sure if users are actually using colon in names though)". I think we can discuss this further at the next stand up meeting.

Copy link
Contributor

@ikhoon ikhoon May 30, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Double checked that this behavior is consistent with the main branch.

I was confused. /brace/{b}:def only works on the main branch but didn't work with the previous release version, 1.28.4. As it is consistent with the previous version, it is okay.

This PR is good as is. However, ultimately, it would be good to support a path pattern that allows the path variable can be located in a path segment. Some URL patterns seem useful:

  • /{project}/{repoName}.git
  • /download/{filename}.pdf

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

However, ultimately, it would be good to support a path pattern that allows the path variable can be located in a path segment

I agree this could be useful. I think it's probably doable if we start using {} instead of \0 to internally representing the parameterized variables. Additionally, we probably need to modify the route parsing logic at Routers as well.

I don't have immediate plans to handle this since RegexPathMapping can kind of handle this case - I'd be happy to review if you would like to work on it though.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To support complex path patterns to build finer grain routes, I am considering dropping : from the path variable patterns in Armeria 2.0. We don't debate about that though.

I don't see it was a good choice that we used : as a path variable. : seems simple but it can be also used as a literal. If a path variable is mixed with a literal :, it is ambiguous and difficult to know the exact name of the variable.

I don't have immediate plans to handle this since RegexPathMapping can kind of handle this case - I'd be happy to review if you would like to work on it though.

I'll try it next time. 😃

Copy link

codecov bot commented May 30, 2024

Codecov Report

Attention: Patch coverage is 88.00000% with 6 lines in your changes are missing coverage. Please review.

Project coverage is 74.13%. Comparing base (14c5566) to head (79fe6e6).
Report is 21 commits behind head on main.

Files Patch % Lines
...ria/server/grpc/HttpJsonTranscodingPathParser.java 77.77% 1 Missing and 3 partials ⚠️
...rmeria/server/grpc/HttpJsonTranscodingService.java 66.66% 1 Missing and 1 partial ⚠️
Additional details and impacted files
@@             Coverage Diff              @@
##               main    #5676      +/-   ##
============================================
+ Coverage     74.05%   74.13%   +0.07%     
- Complexity    21253    21340      +87     
============================================
  Files          1850     1855       +5     
  Lines         78600    78827     +227     
  Branches      10032    10068      +36     
============================================
+ Hits          58209    58436     +227     
+ Misses        15686    15671      -15     
- Partials       4705     4720      +15     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

minwoox pushed a commit that referenced this pull request Jun 4, 2024
Motivation:

This bug was found while analyzing the code for #5676.

Our gRPC-transcoding path parser contains logic which uses the parameter name when constructing the mapped path.
While doing so, we use an arbitrary name prefixed with a 'p' if we are unable to determine a suitable parameter name.

https://github.com/line/armeria/blob/1737482b82255d2ff4728927bdf91631bf1b23e9/grpc/src/main/java/com/linecorp/armeria/server/grpc/HttpJsonTranscodingPathParser.java#L385

This can be problematic since a path may be constructed in a different way from what the user expected.

e.g. Assume we use the following path pattern:
- path: `/v1/conflict/{id=hello/*}/{p0}/test`

The above path will currently be mapped to:
- `/v1/conflict/hello/:p0/:p0/test`.

However, there is no guarantee that `id` and `p0` should be equal, and hence should be built to the following path:
- `/v1/conflict/hello/:p0/:p1/test`.

I propose that we prepend a '@' to the generated fields. The reasoning for this is:
- Protobuf doesn't allow field names other than letters, digits, underscores in identifiers.
  - https://protobuf.dev/reference/protobuf/proto3-spec/#identifiers
  - https://protobuf.dev/reference/protobuf/proto2-spec/#identifiers
- '@' is a valid character in a path segment, so it stays the same after `RequestTarget#forServer` is called. This is useful when displaying the paths in `DocService` (called by `MethodInfo`).
  - https://datatracker.ietf.org/doc/html/rfc3986#section-3.3

As long as the above is satisfied, I think any character is fine.
I actually believe any `sub-delims` character can be used, so let me know if any character should be preferred more.

>  sub-delims  = "!" / "$" / "&" / "'" / "(" / ")"
                  / "*" / "+" / "," / ";" / "="

Modifications:

- Modified to prepend a '@' character in front of generated fields.

Result:

- There is no longer a bug when a user uses field `p{\d}` for gRPC transcoding.

<!--
Visit this URL to learn more about how to write a pull request description:
https://armeria.dev/community/developer-guide#how-to-write-pull-request-description
-->
Copy link
Contributor

@ikhoon ikhoon left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for handling this tricky problem, @jrhee17!

@minwoox minwoox merged commit ea19400 into line:main Jun 4, 2024
17 checks passed
@minwoox
Copy link
Member

minwoox commented Jun 4, 2024

@jrhee17 👍 👍 👍 👍 👍

Dogacel pushed a commit to Dogacel/armeria that referenced this pull request Jun 8, 2024
…e#5679)

Motivation:

This bug was found while analyzing the code for line#5676.

Our gRPC-transcoding path parser contains logic which uses the parameter name when constructing the mapped path.
While doing so, we use an arbitrary name prefixed with a 'p' if we are unable to determine a suitable parameter name.

https://github.com/line/armeria/blob/1737482b82255d2ff4728927bdf91631bf1b23e9/grpc/src/main/java/com/linecorp/armeria/server/grpc/HttpJsonTranscodingPathParser.java#L385

This can be problematic since a path may be constructed in a different way from what the user expected.

e.g. Assume we use the following path pattern:
- path: `/v1/conflict/{id=hello/*}/{p0}/test`

The above path will currently be mapped to:
- `/v1/conflict/hello/:p0/:p0/test`.

However, there is no guarantee that `id` and `p0` should be equal, and hence should be built to the following path:
- `/v1/conflict/hello/:p0/:p1/test`.

I propose that we prepend a '@' to the generated fields. The reasoning for this is:
- Protobuf doesn't allow field names other than letters, digits, underscores in identifiers.
  - https://protobuf.dev/reference/protobuf/proto3-spec/#identifiers
  - https://protobuf.dev/reference/protobuf/proto2-spec/#identifiers
- '@' is a valid character in a path segment, so it stays the same after `RequestTarget#forServer` is called. This is useful when displaying the paths in `DocService` (called by `MethodInfo`).
  - https://datatracker.ietf.org/doc/html/rfc3986#section-3.3

As long as the above is satisfied, I think any character is fine.
I actually believe any `sub-delims` character can be used, so let me know if any character should be preferred more.

>  sub-delims  = "!" / "$" / "&" / "'" / "(" / ")"
                  / "*" / "+" / "," / ";" / "="

Modifications:

- Modified to prepend a '@' character in front of generated fields.

Result:

- There is no longer a bug when a user uses field `p{\d}` for gRPC transcoding.

<!--
Visit this URL to learn more about how to write a pull request description:
https://armeria.dev/community/developer-guide#how-to-write-pull-request-description
-->
Dogacel pushed a commit to Dogacel/armeria that referenced this pull request Jun 8, 2024
Motivation:

Currently, gRPC verb suffix is supported by introducing a `VerbSuffixPathMapping`.
There were several issues with this approach:
- `VerbSuffixPathMapping` was applied inconsistently.
  - For instance, `RouteBuilder path(String pathPattern)` applies `VerbSuffixPathMapping`, but `RouteBuilder path(String prefix, String pathPattern)` doesn't.
- In the context of `ExactPathMapping`, `VerbSuffixPathMapping` was acting as a workaround to support colons in the path. Additionally the support for colons is incomplete since only the last colon is correctly handled.
  - Another side-effect of this is route collisions are incorrectly reported since the verb suffix isn't taken into account.
- Continued support of `VerbSuffixPathMapping` makes supporting colons natively difficult.
- Misc) A path such ending with an escape character failed to start up the server: `/path\\`

The original motivation for line#5613 was that the following patterns weren't supported.
- /path/{param}:verb
- /path/literal:verb

I propose that the first case be handled via using `REGEX` `PathMapping` types, and the second case be handled by natively supporting colons in our path parameters.
Note that colons are supported per the [rfc3986](https://datatracker.ietf.org/doc/html/rfc3986#section-3.3), and we already have an issue line#4577.

1. Supporting `/path/{param}:verb`

I propose that we support this simply by introducing a new `PathMappingType.REGEX`. We can simply check if the last `PathSegment` is a `VerbPathSegment`. If it is a `VerbPathSegment`, then we can just use `PathMappingType.REGEX`.

2. Supporting `/path/literal:verb`

Internally, we represent colons as parameterized variables both in `ParameterizedPathMapping` and `RoutingTrieBuilder`. This makes it difficult to support colons, so I propose that we modify the internal representation to a different character (`\0`). This character isn't a valid path character per [rfc3986](https://datatracker.ietf.org/doc/html/rfc3986#section-3.3), so I believe the impact is minimal.

One side effect of this approach is that `ParameterizedPathMapping#paths` will return null character delimited paths.
e.g. `/v0/:/path` -> `/v0/\0/path`
Having said this, I believe the normal user path doesn't really use this value so it shouldn't matter.

3. Supporting colons in general

When one calls `RouteBuilder#path(String)`, we determine whether to use `ParameterizedPathMapping` or `ExactPathMapping` depending on whether a colon is used.

https://github.com/line/armeria/blob/8ab42847c146b481c72f000dbc1c4d77dc009220/core/src/main/java/com/linecorp/armeria/server/RouteBuilder.java#L546

- If the colon does not start the segment (e.g. `/a:b`), it is trivial to assume that the colon should be matched exactly.
- If the colon does start the segment (e.g. `/:param`), it is ambiguous whether the user intended this to be a literal or parameter.

For this case, I propose the following:
- If a colon is used as-is, `/:param`, then the segment will be used to represent a parameter
- If a colon is escaped, `/\\:param`, then the segment will be treated as a literal

Optimization) `ExactPathMapping` is more performant than `ParameterizedPathMapping` because a simple equality check is done. I propose as an optimization we modify the condition to check if `/:` is contained instead of `:`. As a downside of this approach, the colon escape logic needs to also be added to `ExactPathMapping`. This can be undone if this logic seems too complicated though.

Modifications:

- When `VerbPathSegment` is used, use `RegexPathMapping` for gRPC transcoding.
- Removed `VerbSuffixPathMapping` and related changes in `RoutingTrieBuilder`, `RouteBuilder`, and `ParameterizedPathMapping`
- Modified `RoutingTrieBuilder` and `ParameterizedPathMapping` to use `\0` instead of ':' to represent parameterized path segments. This allows us to use ':' in most `PathMapping` types.
- Relaxed `ParameterizedPathMapping` to act like `ExactPathMapping` when a path segment isn't capturing a path parameter.
- Replace `/\\:` to `/:` in `ParameterizedPathMapping` and `ExactPathMapping` to give users an option to use the first colon as a literal.

Result:

- line#4577 is closed
- Logic related to gRPC verb suffixes is generalized and cleaned up

<!--
Visit this URL to learn more about how to write a pull request description:
https://armeria.dev/community/developer-guide#how-to-write-pull-request-description
-->
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants