Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Make UUri specification more formal #121

Conversation

sophokles73
Copy link
Contributor

This addresses #115 and other issues.

The UUri specification has been lacking many details that are relevant
for correctly implementing the data model in client libraries.

A lot of the specification's textual content has been transformed into
formal requirements based on invariants and predicates expressed using
the Object Constraint Language.

Last but not least, the 'id' and 'ip' properties have been removed from
UAuthority's data model because they did not really serve a specific
pupose beyond the semantics of the 'name' property.

@sophokles73
Copy link
Contributor Author

sophokles73 commented Apr 22, 2024

Here's my first shot at a UUri spec formalization.

One thing that still bothers me:
The 8-bit length field of the Micro Format allows for the authority to use up to 255 bytes. The name property, on the other hand, contains (unicode) characters which may be encoded into up to three bytes (using UTF8) per character.

This shouldn't necessarily be an issue because most characters used for the logical name will consist of ASCII characters only and those are encoded into a single byte (using UTF8).

However, IMHO we should properly handle cases in which the authority would be encoded into more than 255 bytes. Given that the Long and Short Format do not put any constraints on the authority length, I wonder if we really need to constrain it in the Micro Format.

If we need/want to set a limit, then IMHO we should define this constraint on the Object Model itself. We should then probably only allow ASCII characters in the authority name so that we know upfront, how many bytes it would be encoded into ...

@sophokles73
Copy link
Contributor Author

@stevenhartley @PLeVasseur @tamarafischer @AnotherDaniel @evshary

any feedback would be welcome

@sophokles73 sophokles73 added documentation Improvements or additions to documentation enhancement New feature or request labels Apr 22, 2024
@stevenhartley
Copy link
Contributor

Here's my first shot at a UUri spec formalization.

One thing that still bothers me: The 8-bit length field of the Micro Format allows for the authority to use up to 255 bytes. The name property, on the other hand, contains (unicode) characters which may be encoded into up to three bytes (using UTF8) per character.

This shouldn't necessarily be an issue because most characters used for the logical name will consist of ASCII characters only and those are encoded into a single byte (using UTF8).

However, IMHO we should properly handle cases in which the authority would be encoded into more than 255 bytes. Given that the Long and Short Format do not put any constraints on the authority length, I wonder if we really need to constrain it in the Micro Format.

If we need/want to set a limit, then IMHO we should define this constraint on the Object Model itself. We should then probably only allow ASCII characters in the authority name so that we know upfront, how many bytes it would be encoded into ...

@sophokles73 we need to follow RFC3968 which means it cannot be any form of UTF-8 character only those characters listed in Appendix A, a collection of ABNF for URI.

@sophokles73
Copy link
Contributor Author

we need to follow RFC3968 which means it cannot be any form of UTF-8 character only those characters listed in Appendix A, a collection of ABNF for URI.

Well, for the long and short form URIs that is done by means of percent encoding any illegal characters. However, that also increases the number of ASCII characters in the URI.

For the micro form serialization I do not see why we would be bound by RFC3986. We are not producing a (textual) URI but a custom binary representation and as long as we can recreate the original UUri from it, we should be safe, shouldn't we?

Another approach, of course, would be to restrict UAuthortity.name to only contain ASCII characters in the first place, if that is what you mean.

@stevenhartley
Copy link
Contributor

stevenhartley commented Apr 22, 2024

we need to follow RFC3968 which means it cannot be any form of UTF-8 character only those characters listed in Appendix A, a collection of ABNF for URI.

Well, for the long and short form URIs that is done by means of percent encoding any illegal characters. However, that also increases the number of ASCII characters in the URI.

For the micro form serialization I do not see why we would be bound by RFC3986. We are not producing a (textual) URI but a custom binary representation and as long as we can recreate the original UUri from it, we should be safe, shouldn't we?

Another approach, of course, would be to restrict UAuthortity.name to only contain ASCII characters in the first place, if that is what you mean.

Yes, lets restrict to ASCII characters

Copy link
Contributor

@stevenhartley stevenhartley left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please see comments. Much better than before for sure!

basics/uri.adoc Outdated Show resolved Hide resolved
basics/uri.adoc Outdated

[source]
A UUri can be represented as
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
A UUri can be represented as
A UUri can be represented as:

basics/uri.adoc Outdated
[source]
A UUri can be represented as

* an instantiation of the <<_data_model,UUri Data Model>> in a supported programming language,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
* an instantiation of the <<_data_model,UUri Data Model>> in a supported programming language,
* An instantiation of the <<UUri Data Model>> in a supported programming language

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd like to refer to the section, not the UML class diagram here ...

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It should refer to the section with that title unless you named the title and the diagram the same. The reason why I proposed the change is that the link did not work for me when I rendered it in the PR.

basics/uri.adoc Outdated Show resolved Hide resolved
basics/uri.adoc Outdated Show resolved Hide resolved
|`[1, 1, -128, 0, 0, 0, 3, 0, -64, -88, 1, 100]`
[source,java]
----
public class UUri {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

UUri is defined in proto and the classes are generated to which we cannot extend. I recommend we do not show coding examples in this spec but state that serializers and deserializers and validators are required.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well, that might be true for the tooling you use for generating Java code. For Rust, it is very much possible to add such code to the generated types. In general, I wonder if this specification should be based on the features/shortcomings of proto3 and the language specific tooling, or if we want/need to define a model that serves the purpose of uProtocol best. In that context, I have to admit that I am very unhappy with the fact that the Object Model uses UInt32 types when the domain actually requires UInt16 for identifiers and UInt8 for version_{major,minor}.
FMPOV we should define in the spec what we need and consider right and then point out, that the proto3 definition's sole purpose is to supports serialization and deserialization of a (valid) instance of the object model, but nothing more.

basics/uri.adoc Outdated Show resolved Hide resolved
basics/uri.adoc Outdated Show resolved Hide resolved
basics/uri.adoc Outdated Show resolved Hide resolved
basics/uri.adoc Outdated Show resolved Hide resolved
@sophokles73
Copy link
Contributor Author

@stevenhartley I have updated according to your feedback but also have added some questions to the comments ...

@stevenhartley
Copy link
Contributor

@stevenhartley I have updated according to your feedback but also have added some questions to the comments ...

I answered your questions this morning.

@sophokles73
Copy link
Contributor Author

sophokles73 commented Apr 23, 2024

@stevenhartley I have updated according to your feedback but also have added some questions to the comments ...

I answered your questions this morning.

@stevenhartley It looks like some of them slipped through ... maybe you haven't seen them because they have been collapsed by GitHub?

@sophokles73
Copy link
Contributor Author

@stevenhartley I just had a little chat with @AnotherDaniel about his problems using UUris from within his uSubscription implementation which are mainly related to the duality of identifiers (name vs. id) :-)
We ended up thinking out loud about trying to get rid of the numerical id properties in the same way as we did in UAuthority, i.e. leave the resolving of logical names to identifiers to the UTransport that actually requires it. Currently, this would only affect the SOME/IP client if I am not mistaken and we could imagine it maintaining a mapping of names <-> (SOME/IP) ids in very much the same way as we let it do the mapping of names <-> IP addresses.

Would it make sense to think a little more into that direction or is it totally out of the question from your point of view?

@stevenhartley
Copy link
Contributor

@stevenhartley I just had a little chat with @AnotherDaniel about his problems using UUris from within his uSubscription implementation which are mainly related to the duality of identifiers (name vs. id) :-) We ended up thinking out loud about trying to get rid of the numerical id properties in the same way as we did in UAuthority, i.e. leave the resolving of logical names to identifiers to the UTransport that actually requires it. Currently, this would only affect the SOME/IP client if I am not mistaken and we could imagine it maintaining a mapping of names <-> (SOME/IP) ids in very much the same way as we let it do the mapping of names <-> IP addresses.

Would it make sense to think a little more into that direction or is it totally out of the question from your point of view?

No we cannot do this (leave it up to the transport layer). The source of truth for service ids is the proto and is known to the application layer code or uDiscovery (not the transport layer). UAuthority to ip address can be done using linux resolver or static IP address table.

@AnotherDaniel
Copy link

Hm, I hear what you're saying. So pushing ahead with uDiscovery, and making that the source of truth for this kind of lookup - based in the proto definitions - would that be an option?

I really don't like this information-duality between names and ids in the UURi, also the fact that we are redefining/extending what an uri is... feels off. Would much rather have uris just be uris, and have a well defined, separate way to translate to id representation and back (like url - DNS - ip)

@stevenhartley
Copy link
Contributor

I'm open to other suggestions and agree when we only had long form URI it was much simpler. I haven't been able to think of a different solution.

@AnotherDaniel
Copy link

AnotherDaniel commented Apr 24, 2024

I'm open to other suggestions and agree when we only had long form URI it was much simpler. I haven't been able to think of a different solution.

This might become too big a discussion for this PR - take this out into a dedicated issue maybe?

I'm asking this from my usual clueless point of view - but my thinking would be: shouldn't the up-client/boundary-ustreamers adapters be able to perform name-to-id-to-name translation for the messages that flow by, with the relevant information retrieved from uDiscovery (DNS-style) services?

These uDiscovery entities, within a certain scope (i.e. vehicle), might then address the question of how to efficiently manage their translation tables as some form of distributed configuration, that can be managed as a separate problem to solve (so these tables might be just built from proto definitions, as default/semi-static config, but obviously some form of updatability might be desirable in the future)...

@stevenhartley stevenhartley reopened this Apr 24, 2024
stevenhartley added a commit that referenced this pull request Apr 25, 2024
The following works to address #121 by making the contract explicit and to differentiate the old LongForm URI and short form URI.
@evshary
Copy link
Contributor

evshary commented Apr 25, 2024

Another approach, of course, would be to restrict UAuthortity.name to only contain ASCII characters in the first place, if that is what you mean.

Yes, lets restrict to ASCII characters

Sorry for disrupting the discussion. I also like the restriction of UAuthority.name, but I'm not sure what the range of ASCII characters here is. Does it only limit to use alphabet and number, or it can also use something like !, +, @, etc?
Also, is the UAuthority.name still case-insensitive?
These are related to whether the character are valid when transforming into protocol topic / key.

@sophokles73
Copy link
Contributor Author

Sorry for disrupting the discussion. I also like the restriction of UAuthority.name, but I'm not sure what the range of ASCII characters here is. Does it only limit to use alphabet and number, or it can also use something like !, +, @, etc?

In the PR I am currently restricting the authority to be a host following the corresponding production in https://datatracker.ietf.org/doc/html/rfc3986#appendix-A

In particular, this also includes characters from the sub-delim production which also includes !, & etc

Also, is the UAuthority.name still case-insensitive?

Yes, FMPOV that is the plan.

@evshary
Copy link
Contributor

evshary commented Apr 26, 2024

In the PR I am currently restricting the authority to be a host following the corresponding production in https://datatracker.ietf.org/doc/html/rfc3986#appendix-A

In particular, this also includes characters from the sub-delim production which also includes !, & etc

OK, then I still need to do some transformation while generating Zenoh key. At least, $ and * are forbidden in Zenoh key.

Yes, FMPOV that is the plan.

Then, no matter MQTT topic or Zenoh key should transform the authority to either lowercase or uppercase, since they are case-sensitive.

Is there any reason why we stick to RFC3986 standard here? If not, how about defining our own UAuthority for our own purpose?

@sophokles73
Copy link
Contributor Author

Is there any reason why we stick to RFC3986 standard here? If not, how about defining our own UAuthority for our own purpose?

IMHO the biggest advantage is to be able to use tooling that is available for RFC3986 URIs, e.g. for parsing, comparing etc. Most programming languages have support for URIs in some way.

That said, as long as we make sure that a UUri is-a RFC3986 URI, we should be on the safe side. This should indeed allow us to e.g. restrict the supported characters in the host production if we feel that this would be beneficial. Do you have a concrete proposal for that?

@evshary
Copy link
Contributor

evshary commented Apr 26, 2024

IMHO the biggest advantage is to be able to use tooling that is available for RFC3986 URIs, e.g. for parsing, comparing etc. Most programming languages have support for URIs in some way.

Totally agree with you.

Do you have a concrete proposal for that?

No, but I'm thinking perhaps using unserved here in UAuthority is enough.

unreserved    = ALPHA / DIGIT / "-" / "." / "_" / "~"

Also, limiting the using lowercase (or uppercase) alphabets here might be simpler.
But it's still ok to transform UAuthority into lowercase (or uppercase) in the transport layer (MQTT, Zenoh...).

stevenhartley added a commit that referenced this pull request Apr 30, 2024
The following adds to #121 to simplify the datamodel for UUri object.
@stevenhartley
Copy link
Contributor

@sophokles73 (et. al.),

I want to summarize conversations over email so we can conclude on this topic. I really don't think we will get a single UUri definition that will map perfectly to all transports so need to to compromise, taking into consideration the ease of use, $ for transmission, and other factors.
Given that we will have to statically map service/topic information for mechatronics anyways (for SOME/IP), I think the "good enough" solution shall be:

  • Go back to a single definition of a UUri that takes the best of long & short while removing micro all together
  • Make UAuthority mandatory: Removes ambiguity if present or not and allows for simplifies normalization of UUris
  • Constrain UAuthority::name and UEntity::name to "unreserved" defined above (and possibly limit size)
  • Represent UResource as an ID in lieu of name+instance+Message strings: This reduces the size of the UUri while still providing uniqueness for the UResource and flexibility for UEntity naming to number mapping. We keep the definition of id ranges to what we've already defined (i.e. 0 is repsonse, 0-0x7FFFF is request, 0x8000- is publish/notification)

If we settle on above, there is no more need for isResolved(), isLong(0, isShort(), isMicro() and all the other validator logic that is overly complicated. Serializers are simplified back to a single toString() and fromString() and the string format is now:

up://[UAUTHORITY_NAME]/[UENTITY_NAME]/[UENTITY_VERSION_MAJOR][.UENTITY_VERSION_MINOR]/[URESOURCE_ID]

I've captured these changes in #126 . The reason for keeping UAuthority, UEntity, and UResource is to minimize the amount of changes in all the existing code and not to re-invent UUri completely.

@sophokles73
Copy link
Contributor Author

sophokles73 commented May 2, 2024

... and the string format is now:

up://[UAUTHORITY_NAME]/[UENTITY_NAME]/[UENTITY_VERSION_MAJOR][.UENTITY_VERSION_MINOR]/[URESOURCE_ID]

So far, the minor version had not been part of any of UUri's serializations. Just want to make sure that we really need it in there. And if we do, I guess the inclusion of minor version should be dependent on the presence of major version

up://[UAUTHORITY_NAME]/[UENTITY_NAME]/[UENTITY_VERSION_MAJOR[.UENTITY_VERSION_MINOR]]/[URESOURCE_ID]

or be put into a separate path segment

up://[UAUTHORITY_NAME]/[UENTITY_NAME]/[UENTITY_VERSION_MAJOR/[UENTITY_VERSION_MINOR]/[URESOURCE_ID]

However, I'd prefer the former, if we really need minor version in it ...

@sophokles73 sophokles73 force-pushed the make_uuri_spec_more_formal branch 2 times, most recently from ed93082 to a482912 Compare May 13, 2024 08:34
@sophokles73
Copy link
Contributor Author

@stevenhartley I have adapted the UUri spec according to the discussed simplifications of the UUri data model. Some things I want to bring to your attention:

  • I propose to use base16 encoding for the identifiers in the URI which makes them easier to read FMPOV and will also save a few bytes compared to decimal encoding
  • The host production used for the authority name has not been adapted (yet) to use only unreserved characters as proposed by @evshary. We might want to consider this, though.
  • With the introduction of explicit wildcard values we no longer support URIs having empty path components. FMPOV this is actually good because it makes parsing much easier and should not have any real impact on message exchanges because the wildcards will only be used in the definition of forwarding rules in a streamer.
  • I have added OpenFastTrace specification item markup where appropriate. We can use these for testing the OpenFastTrace toolchain before rolling out to other spec documents.

@stevenhartley
Copy link
Contributor

@stevenhartley I have adapted the UUri spec according to the discussed simplifications of the UUri data model. Some things I want to bring to your attention:

  • I propose to use base16 encoding for the identifiers in the URI which makes them easier to read FMPOV and will also save a few bytes compared to decimal encoding
  • The host production used for the authority name has not been adapted (yet) to use only unreserved characters as proposed by @evshary. We might want to consider this, though.
  • With the introduction of explicit wildcard values we no longer support URIs having empty path components. FMPOV this is actually good because it makes parsing much easier and should not have any real impact on message exchanges because the wildcards will only be used in the definition of forwarding rules in a streamer.
  • I have added OpenFastTrace specification item markup where appropriate. We can use these for testing the OpenFastTrace toolchain before rolling out to other spec documents.

I'll take a look. I started to code these things on the 1.5.8 branch of up-java and the serializers and deserializers were drastically simplified as well.

Copy link
Contributor

@stevenhartley stevenhartley left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@sophokles73 only minor comments left but looking much better than before!

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You removed the white background so it is difficult to see the image, please add it back.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree that this makes it harder to see in vscode (with a dark theme/background). However, when viewing on GitHub this is not an issue, right? The rfc3986.png also has a transparent background so I thought we might want to do the same for the URI diagram ...

| Returns true if the Uri part contains the required ids to serialized to Micro format and the fields of the Uri can fit within the specified number of bits and bytes. Both sets of details can be obtained under <<Micro URIs>>
[source,ocl]
----
inv: authority_name.length() <= 128
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Where did this requirement come from?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We had been discussing that we wanted to limit the length of the authority name to some degree. I just picked 128 as a first shot. Do you still think that we should limit the length?

basics/uri.adoc Outdated
--
A UUri's `ue_id` property value determines the type and instance of service being referred to. +
The value's _most_ significant 16 bits represent the _service instance_. +
The value's _least_ significant 16 bits represent the _service type_.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

identifier and not type

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We usually talk about instances of a certain type, don't we? A service identifier to me suggests that we are talking about a particular instance already, instead of its type.

basics/uri.adoc Outdated
[.specitem,oft-sid="dsn~entity-id~1"]
--
A UUri's `ue_id` property value determines the type and instance of service being referred to. +
The value's _most_ significant 16 bits represent the _service instance_. +
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should put a note that a service instance of 0 means "don't care".

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have added that to the section on pattern matching ...

basics/uri.adoc Outdated
|Format | Value
[.specitem,oft-sid="dsn~pattern-matching~1",oft-needs="impl,utest"]
--
A UUri that is used as a pattern to match other UUris against
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is duplicated from the sentence above

basics/uri.adoc Outdated
|Long
|`//vcu.veh.gm.com/core.usubscription/3/SubscriptionChange#Update`
* *MAY* have its `authority_name` set to the `*` (`U+002A`, Asterisk) character in order to match _any_ (including _no_) authority,
* *MAY* have the _service type_ part of its `ue_id` set to `0xFFFF` in order to match _any_ type,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe we need to call it service number to differentiate the instance id?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

not sure what you mean, replace service type with service number or replace service instance with service number?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Re-reading some/ip spec for a little inspiration, they have service_id and service_instance_id, no where do we refer to service_id as type hence why I wanted to keep the id or number but not type.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

then let's stick to service id and service instance id. I'll update accordingly ...

basics/uri.adoc Outdated
|`//vcu.veh.gm.com/core.usubscription/3/SubscriptionChange#Update`
* *MAY* have its `authority_name` set to the `*` (`U+002A`, Asterisk) character in order to match _any_ (including _no_) authority,
* *MAY* have the _service type_ part of its `ue_id` set to `0xFFFF` in order to match _any_ type,
* *MAY* have the _service instance_ part of its `ue_id` set to `0x0000` in order to match _any_ instance,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like this!! better than what I was thinking and coding in Java...

@sophokles73
Copy link
Contributor Author

@stevenhartley I have pushed changes and amended your comments. Would you mind taking another look?

@stevenhartley
Copy link
Contributor

@sophokles73 not sure if my comment got lost but the diagram needs to have a white background (it was removed) otherwise I cannot view it when using dark mode in github.

@sophokles73
Copy link
Contributor Author

@stevenhartley I have added the white background and updated the terminology used for ue_id

Copy link
Contributor

@stevenhartley stevenhartley left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, thanks @sophokles73 !

@sophokles73 sophokles73 added this to the alpha.1 milestone May 14, 2024
@sophokles73
Copy link
Contributor Author

@stevenhartley any idea why the verify-pr check is not completing? Or what it actually does/is, for that matter?

@stevenhartley
Copy link
Contributor

stevenhartley commented May 14, 2024

@stevenhartley any idea why the verify-pr check is not completing? Or what it actually does/is, for that matter?

It builds the proto to make sure there are no issues with syntax.
@sophokles73 I can't seem to retrigger it on my side, if you push something new (single line change) it should retrigger the workflows

@sophokles73
Copy link
Contributor Author

@stevenhartley this PR seems to be stuck, I have tried to push some changes but that doesn't seem to resolve the situation. I will now close this PR and create a new one instead ...

@sophokles73
Copy link
Contributor Author

Reopening to test a theory ...

@sophokles73 sophokles73 reopened this May 15, 2024
The UUri specification has been lacking many details that are relevant
for correctly implementing the data model in client libraries.

A lot of the specification's textual content has been transformed into
formal requirements based on invariants and predicates expressed using
the Object Constraint Language.

Last but not least, the content model of UUri has been drastically
simplified, thus removing all ambiguity around serialization and
validation. This should make implementation of the object model
much easier and improve general interoperability.
@sophokles73 sophokles73 merged commit 8191328 into eclipse-uprotocol:main May 15, 2024
2 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
documentation Improvements or additions to documentation enhancement New feature or request
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants