Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support referenceIntoComponents for other components than message #97

Closed
thake opened this issue Nov 9, 2022 · 55 comments
Closed

Support referenceIntoComponents for other components than message #97

thake opened this issue Nov 9, 2022 · 55 comments
Labels
enhancement New feature or request

Comments

@thake
Copy link

thake commented Nov 9, 2022

Reason/Context

Currently, the referenceIntoComponents option only seems to work for message components. However, it can also be useful for all other component types (see https://www.asyncapi.com/docs/reference/specification/v2.5.0#componentsObject).

Description

Changes:

Although the bundled spec will change, I would not consider this feature as a breaking change. Implementing this feature would only mean the correct implementation of the current documentation of the referenceIntoComponents flag:

Pass true to resolve external references to components.

@thake thake added the enhancement New feature or request label Nov 9, 2022
@github-actions
Copy link

github-actions bot commented Nov 9, 2022

Welcome to AsyncAPI. Thanks a lot for reporting your first issue. Please check out our contributors guide and the instructions about a basic recommended setup useful for opening a pull request.
Keep in mind there are also other channels you can use to interact with AsyncAPI community. For more details check out this issue.

@aeworxet aeworxet self-assigned this Nov 10, 2022
@derberg
Copy link
Member

derberg commented Nov 22, 2022

I think if maybe we should plug in https://github.com/asyncapi/optimizer/ or not even plug in but simply describe how both should be used 🤔

cc @KhudaDad414 @Souvikns

@Souvikns
Copy link
Member

While building reference-into-components feature I thought I could use optimiser for resolving references but that was not possible, what we can do is optimize the document after all the referencing is done, will open a proposal for this soon.

@derberg
Copy link
Member

derberg commented Nov 22, 2022

@Souvikns do you remember the reason? it would be good to have one mechanism for things like this. But of course only if it makes sense and will help and not complicated life

@aeworxet
Copy link
Collaborator

While trying to find out what would be better architecturally:

schemas
servers
channels
operations       <-- will be added in specification 3.0.0
messages
securitySchemes
serverVariables
parameters
correlationIds
externalDocs     <-- will be added in specification 3.0.0
tags             <-- will be added in specification 3.0.0
operationTraits
messageTraits
serverBindings
channelBindings
operationBindings
messageBindings

and search, say, for message through all schema (iterate through all properties of the schema object), and if it finds a message (with a $ref which is an external $ref), then this $ref is dereferenced, and full dereferenced message object is moved to components: { messages: {} },

I found that there exists one big pro in the second approach, which will make the solution more generic (@magicmatatjahu): if objects move around the schema as specification changes with time, nothing would change for Bundler who in search of suitable objects was iterating through all of the schema object from the start.

And this led me to using @derberg's and @Souvikns's idea of utilizing Optimizer as much as possible.

'As much as possible' I see as adding to Optimizer the ability to dereference external references before starting the optimization of a given schema, and then Bundler will only merge schemas Optimizer will give out to it (because by logic, Optimizer is meant to OPTIMIZE ONE given schema, and Bundler is meant to BUNDLE SEVERAL schemas into one).
@KhudaDad414

@jonaslagoni do you have additions or comments?

@Souvikns
Copy link
Member

@Souvikns do you remember the reason? it would be good to have one mechanism for things like this. But of course only if it makes sense and will help and not complicated life

optimizer currently does not resolve external references, and only works with internal references, so to build the reference-into-components we could not use optimizer. But we can use optimizer after the bundling step when all the external references are resolved and files are merged, then optimizer can optmize the document since there is only internal references to work with. (currently only true for messages)

'As much as possible' I see as adding to Optimizer the ability to dereference external references before starting the optimization of a given schema, and then Bundler will only merge schemas Optimizer will give out to it (because by logic, Optimizer is meant to OPTIMIZE ONE given schema, and Bundler is meant to BUNDLE SEVERAL schemas into one).
@KhudaDad414

Are you suggesting that we move the external reference logic to optimizer?

@aeworxet
Copy link
Collaborator

Are you suggesting that we move the external reference logic to optimizer?

Not move but copy. Extend Optimizer's functionality. And Bundler will use it after extension.

I can even take such a task.

The question is whether there is a need in such functionality, and if it doesn't go against Optimizer's concept (it might be intended for local use only.)
For now AFAIK external references dereference parser-js and bundler. Those are only the ones that I know. So such functionality might become available in all AsyncAPI Initiative tools by default. (it might be even time already to create a library that holds all reusable code 😆)

@KhudaDad414
Copy link
Member

TBH, I don't think moving components to the components section is ideal unless we gain some specific benefits. why? since it adds complexity and makes the specification file harder to read for humans.
In the optimizer, we accept this complexity since it provides reusability by only moving components to the components section if there is more than one copy of the same component.

If I only have one copy of a message why would I want to move it to the components section? I want to have it where it belongs.

I think we should let each of these tools do its thing. let the bundler bundle the specification into one file and let the optimizer check the output of the bundler and move components to the components section if there are reusability benefits.

@derberg
Copy link
Member

derberg commented Nov 29, 2022

The thing is that it is a recommendation to put as much as possible in components. At least I always recommend this way.

Reading huge YAML is hard, but when things are moved into components it is easier. As long as you read YAML in editor that supports $ref that allows you to click it and go into a given reffed component.

But of course moving something to components, if there is just one ref that will use it, do not make much sense either 😄


anyway 😄

bundler -> optimizer user flow makes much more sense to me really.

At the library level, we just explain what people should do, and we make sure that these libraries are not heavy.
At CLI or Studio level, we can abstract it in a way that you can still bundle or optimize separately but we also enable that when you bundle, you can also pass a flag like --moveToComponents (just quick idea, to be discussed once work on CLI happens) that will trigger optimization after bundling (same can be done in Studio).

@magicmatatjahu
Copy link
Member

Yes, the bundler is supposed to do one thing and the optimizer something else, combining it should only be done in integrations like CLI or Studio. As for the approach, as you wrote yourself, the generic approach is better. I'm just wondering whether we should use the API from the parser to bundle external references more easily, or rather write all the logic from scratch. Of course, the parser (for now) does not provide all the data needed for proper bundling (e.g. the original reference of the de-referenced object), but if it did, it would make the work on the bundler much easier.

Currently, you can create this generic function and try to bundle only the message as the bundler currently works, and in other PRs we can add other components to bundle - to do it iteratively.

@magicmatatjahu
Copy link
Member

magicmatatjahu commented Nov 29, 2022

On the other hand, bundler could work as a combination of parser + optimizer, because at the end, the bundled spec should look identical if you use a parser and then an optimizer - but for now let's not deal with it because it has some problems in implementation.

@thake
Copy link
Author

thake commented Nov 29, 2022

I just want to highlight a potential problem if moving to components is done in a separate step after bundling:

The $ref usually also carries a semantical meaning to understand easier what it is (example "$ref : financial-system.yaml#/components/schemas/bankAccountIdentifier"). If the bundling just resolves this ref inline, the semantical meaning of the $ref pointer gets lost and cannot be recovered in later steps. The optimizer would need to invent an artificial component name for the "bankAccountIdentifier" when moving it to the components section.

Because of this, I think moving to the components section needs to be done in the bundler itself.

@aeworxet
Copy link
Collaborator

aeworxet commented Dec 2, 2022

Conversion of this high-level discussion into a technical task.

Bundler does only one job: it bundles several specifications into one file (dereferencing external $refs), optionally aggregating all AsyncAPI Specification-legal components in the components object. Anything more than this is the concern of other tools.

Bundler accepts three options:

referenceIntoComponentsAll: true | false
referenceIntoComponentsOptimize: true | false
saveOriginsOfRefs: true | false

referenceIntoComponentsAll || referenceIntoComponentsOptimize

ONE, and ONLY ONE, of options referenceIntoComponentsAll and referenceIntoComponentsOptimize can equal true.

  • If both options equal true, referenceIntoComponentsAll takes higher priority, and the referenceIntoComponentsOptimize is forced to equal false.
  • If both options equal false, Bundler only bundles and dereferences external $refs, without searching for AsyncAPI Specification-legal components and performing any optimization on the resulting specification.
  • If none of two options is specified, they are both considered false.
  • If one of two options is not specified, the non-specified option is considered false.

referenceIntoComponentsAll (optional) (referenceIntoComponents is also queried, to provide backward compatibility with existing referenceIntoComponents, issuing a warning of future deprecation)

If equals true, 'Bundler only' workflow is used.
ALL, and EVERY, AsyncAPI Specification-legal component is moved into components object, regardless of optimization considerations.
Bundler's internal generic mechanism to find every AsyncAPI Specification-legal component throughout all of the schema object is used in this case.

referenceIntoComponentsOptimize (optional)

If equals true, 'Bundler -> Optimizer' workflow is used.
Bundler bundles several specifications into one file (dereferencing external $refs) and lets the Optimizer check the output.
Optimizer moves components to the components object if there is more than one copy of the same component (if there are reusability benefits).

saveOriginsOfRefs (optional)

If equals true, Bundler creates list of origins of external $refs via Specification Extensions, in form

{
  "x-origins-of-refs": {
    "#/components/schemas/bankAccountIdentifier": "./financial-system.yaml#/components/schemas/bankAccountIdentifier"
  }
}

to keep traceability of where each resolved external $ref came from.

Please react with like/dislike/seen or add comments.

@KhudaDad414
Copy link
Member

Of course, the parser (for now) does not provide all the data needed for proper bundling (e.g. the original reference of the de-referenced object), but if it did, it would make the work on the bundler much easier.

This. I mean I am ready to sign off my firstborn child for this feature. 😆 This would make the logic of the optimizer much simpler. do we have an issue for this? @magicmatatjahu

@thake has a point here #97 (comment) but if the parser provides that piece of information, the optimizer can do it as well.

I don't have any strong opinion about @aeworxet's proposal. but if we save the origin of refs, it should be the same in parser so other tools can consume it easily.

@aeworxet
Copy link
Collaborator

aeworxet commented Dec 5, 2022

And this again raises the question about the library which would contain all reusable code, used by all tools, through all of the AsyncAPI Initiative. 😀
Could be a library, like @asyncapi/library-js/@asyncapi/library-ts/@asyncapi/library-go etc.
Could be a collection of code snippets for different languages (though being a 'collection of snippets' would lower its 'official' status).
Could be a collection of 'AsyncAPI best practices'.
Could be a collection of pseudocode snippets (to make it language-agnostic) reflecting AsyncAPI Initiative's best practices.

Could be anything - the idea here is about having a centralized store of reusable code, like 'standard library' in programming languages or frameworks.

Also, it could be a separate exported function in parser, of course.

@magicmatatjahu
Copy link
Member

@aeworxet

We can handle these three parameters in options as you described, but first we should implement the referenceIntoComponentsAll parameter. I think we don't need to create the referenceIntoComponentsAll parameter, but we can reuse the referenceIntoComponents parameter - I don't think the two parameters will interfere with each other. Do you see any use case where they will?

About referenceIntoComponentsOptimize - hmm, I don't think so that we should use optimizer inside bundler, as I described in previous comment, we should "combine" them in integration tools like cli, studio, server-api. However, we can add them, but after implementing referenceIntoComponentsAll.

saveOriginsOfRefs - I think that logic should be handled in parser-js, not in the bundler, but as previously, we can add it but after referenceIntoComponents. But personally, I don't like that the bundler will work that way. We should actually separate the functionality of the libraries.

Also, it could be a separate exported function in parser, of course.

Rather than separate library for common functions, every util/helper function which will "operate" AsyncAPI document (parsed or not) should be written in parser-js, and for another language also in corresponding parser, e.g. parser-go :)

What do you think about it? Do you have a different opinion, or am I wrong somewhere?

cc @derberg @aeworxet @KhudaDad414 @Souvikns

@asyncapi-bot
Copy link
Contributor

Hello, @magicmatatjahu! 👋🏼

I'm Genie from the magic lamp. Looks like somebody needs a hand! 🆘

At the moment the following comments are supported in issues:

  • /good-first-issue {js | ts | java | go | docs | design | ci-cd} or /gfi {js | ts | java | go | docs | design | ci-cd} - label an issue as a good first issue.
    example: /gfi js or /good-first-issue ci-cd

@aeworxet
Copy link
Collaborator

aeworxet commented Dec 6, 2022

Then I will simply extend the functionality of the Bundler's option referenceIntoComponents to

ALL, and EVERY, AsyncAPI Specification-legal component is moved into `components` object,
regardless of optimization considerations.
Bundler's internal generic mechanism to find every AsyncAPI Specification-legal component
throughout all of the schema object is used in this case.

as @thake requested and that's it.

Optimization of the resulting specification after Bundler, will be a separate process, handled by Optimizer at CI level.

Creating the list of origins of external $refs via Specification Extensions, to keep traceability of where each resolved external $ref came from, will be the functionality of parser-js, and Bundler will import this function during bundling, in case saveOriginsOfRefs: true is specified in its options (say, dynamic import).

How about this?

@magicmatatjahu
Copy link
Member

Then I will simply extend the functionality of the Bundler's option referenceIntoComponents to

👌🏼

Optimization of the resulting specification after Bundler, will be a separate process, handled by Optimizer at CI level.

Yeah but without referenceIntoComponentsOptimize parameter in thebundler options. That option should be added in the integrations :)

Creating the list of origins of external $refs via Specification Extensions, to keep traceability of where each resolved external $ref came from, will be the functionality of parser-js, and Bundler will import this function during bundling, in case saveOriginsOfRefs: true is specified in its options (say, dynamic import).

Yeah, it could be :)

Thanks @aeworxet for your awesome work! Let's wait for comments from another people :)

@derberg
Copy link
Member

derberg commented Dec 8, 2022

Lemme summarize as I'm a bit lost

  • no need for referenceIntoComponentsOptimize as this should be just integrated in CLI or other tools, use bundler and optimizer separately
  • I agree we do not need referenceIntoComponentsAll. Especially that bundler did not reach major release yet. Breaking changes are expected, and even if we do not like them, then let's just release 1.0
  • I get that parser could and actually is the only place where we could implement functionality to preserve $ref info (actually I think we need it with 3.0 because of inification to use $ref everywhere where needed)

What I don't get, why we need referenceIntoComponents here instead of having optimizer handle it 😄

  • bundler should bundle references to locations where they point from. I actually think current referenceIntoComponents is a bit beyond bundler scope
  • I know optimizer is not moving to components stuff that showed up only once, but we can have it behind an option, right? in the end it is optimization for some folks, some schemas might be super large and you want them in components. This is a perfect real-world example where the schema is huge, one in the entire file but you still want to move it to components because it is hard to read other info from the file -> https://docs.confluent.io/cloud/current/stream-governance/async-api.html#components-section

Look at this example

asyncapi: '2.5.0'
info:
  title: Account Service
  version: 1.0.0
  description: This service is in charge of processing user signups
channels:
  user/signedup:
    subscribe:
      message:
        payload:
          $ref: 'https://raw.githubusercontent.com/asyncapi/studio/354e1cfa51b2d79d855d5aeb74c431a1b5f04cfb/src/examples/simple.yml#/components/messages/UserSignedUp/payload'

bundler should turn it into

asyncapi: '2.5.0'
info:
  title: Account Service
  version: 1.0.0
  description: This service is in charge of processing user signups
channels:
  user/signedup:
    subscribe:
      message:
        payload:
          type: object
          properties:
            displayName:
              type: string
              description: Name of the user
            email:
              type: string
              format: email
              description: Email of the user

optimizer into

asyncapi: '2.5.0'
info:
  title: Account Service
  version: 1.0.0
  description: This service is in charge of processing user signups
channels:
  user/signedup:
    subscribe:
      message:
        payload:
          $ref: '#/components/schemas/UserSignedUp'
components:
  schemas:
    UserSignedUp:
      payload:
        type: object
        properties:
          displayName:
            type: string
            description: Name of the user
          email:
            type: string
            format: email
            description: Email of the user

And yes, I know it won't work as described above as optimizer cannot guess UserSignedUp schema name.
Even if we preserve $ref (https://raw.githubusercontent.com/asyncapi/studio/354e1cfa51b2d79d855d5aeb74c431a1b5f04cfb/src/examples/simple.yml#/components/messages/UserSignedUp/payload) we also are not sure we can use it for getting the schema name. The solution is to rely on uid helpers that extract it from different locations, like name in case of message or $id in case of schemas.

@KhudaDad414 what you do in optimizer? use helper functions from parser models or just automatically generate these names of components?

@KhudaDad414
Copy link
Member

KhudaDad414 commented Dec 9, 2022

@derberg optimizer generates names for schemas and it uses the name in case of messages. I totally remember looking into it a year ago and seeing there was some room for improvements in naming. there was a useful field that was coming from the parser(I don't exactly remember what 🤦). I will look into it.

why we need referenceIntoComponents here instead of having optimizer handle it.

This makes total sense to me.

@aeworxet
Copy link
Collaborator

Is there a final decision, whether Bundler's referenceIntoComponents should be extended or removed completely?

@derberg
Copy link
Member

derberg commented Dec 15, 2022

let's see what @Souvikns and @magicmatatjahu also think about my last comment

@Souvikns
Copy link
Member

I agree that referenceIntoComponents should be something handled by the optimizer, in fact the example @derberg is using, I was actually thinking of the same thing, that bundler would resolve the references and then optimizer could localize them further, I even had a talk with @KhudaDad414 about this, where I learned that this is not possible for the optimizer. If there would two copies of the payload in two different places then the optimizer could help.

@KhudaDad414 if you are up for it, I would love to move this feature, to optimizer and then update bundler to use optimizer to referenceIntoComponents

@derberg
Copy link
Member

derberg commented Dec 19, 2022

just keep in mind @Souvikns, that once we get it handled by optimizer, there is no need for referenceIntoComponents anymore. On library level we would just advice people to use both, bundler and then optimizer, and on CLI level we can do referenceIntoComponents that basically calls optimizer, so people do not have to do asyncapi bundle & asyncapi optimize

@magicmatatjahu
Copy link
Member

@derberg

  • bundler should bundle references to locations where they point from. I actually think current referenceIntoComponents is a bit beyond bundler scope
  • I know optimizer is not moving to components stuff that showed up only once, but we can have it behind an option, right? in the end it is optimization for some folks, some schemas might be super large and you want them in components. This is a perfect real-world example where the schema is huge, one in the entire file but you still want to move it to components because it is hard to read other info from the file

I think that the bundler itself should also transfer some things to the components - that's how bundlers should work - the optimizer is an add-on that can simplify the specifications, but it should not take whole "care" of the reference thing in bundling phase (or after it) - but extend it. To be clear, the appearance of several references to the same file should be bundled like the json-schema-ref-parser does.

Copy link
Member

derberg commented Dec 21, 2022

I think that the bundler itself should also transfer some things to the components - that's how bundlers should work

Can you share some example of the transfer. What do you mean? in what cases something would be moved and in what cases not? Currently referenceIntoComponents is not "selective" and handles all messages.

@derberg
Copy link
Member

derberg commented Jan 20, 2023

@magicmatatjahu yo, can you have a look at my last comment. It would be good to reach some consensus here

@magicmatatjahu
Copy link
Member

magicmatatjahu commented Feb 2, 2023

@derberg

and what you mean is that the same should be for this bundler? that whenever there is for example payload reference to outside, the same payload reference in 2 different messages, after bundling, should be modified to point to new local reference to components.schema (schema moved from external reference). And this should be default?

Isn't it actually how bundler works like at the moment?

It works like you described but it should work for every type of refs, e.g. not only for messages but for schemas, parameters, server variables etc.

So yeah, bundler should check if external ref is used at least two times and move that ref (with resolved/value) to the components.[type of object]) and that's it. Someone can say that behaviour (I mean referencing two this same references to components) is reserved for optimizer, but as I wrote, optimizer is add-on to show that something should be optimized, like removing unused components, or even optimize some resolved references etc.

Copy link
Member

derberg commented Feb 6, 2023

optimizer is add-on to show that something should be optimized

it shows and can also do the optimization for you, and one of these is moving stuff that occurs more than once to components.

Input:

asyncapi: '2.5.0'
info:
  title: Account Service
  version: 1.0.0
  description: This service is in charge of processing user signups
channels:
  user/signedup1:
    subscribe:
      message:
        payload:
          $ref: 'https://raw.githubusercontent.com/asyncapi/studio/354e1cfa51b2d79d855d5aeb74c431a1b5f04cfb/src/examples/simple.yml#/components/messages/UserSignedUp/payload'
  user/signedup2:
    publish:
      message:
        payload:
          $ref: 'https://raw.githubusercontent.com/asyncapi/studio/354e1cfa51b2d79d855d5aeb74c431a1b5f04cfb/src/examples/simple.yml#/components/messages/UserSignedUp/payload'

should be bundled in dummy way as:

asyncapi: '2.5.0'
info:
  title: Account Service
  version: 1.0.0
  description: This service is in charge of processing user signups
channels:
  user/signedup1:
    subscribe:
      message:
        payload:
           type: object
           properties:
                displayName:
                    type: string
                    description: Name of the user
                email:
                    type: string
                    format: email
                    description: Email of the user
  user/signedup2:
    publish:
      message:
        payload:
           type: object
           properties:
                displayName:
                    type: string
                    description: Name of the user
                email:
                    type: string
                    format: email
                    description: Email of the user

Bundler should not recognize that there are duplicates and move stuff to components, this logic belongs to optimizer

@magicmatatjahu
Copy link
Member

magicmatatjahu commented Feb 6, 2023

@derberg Nope, bundler should also save this same references (only external ones).

Copy link
Member

derberg commented Feb 6, 2023

bundler should also save this same references (only external ones)

sorry but I do not understand what you mean 😞

@magicmatatjahu
Copy link
Member

magicmatatjahu commented Feb 8, 2023

Ok, I'll explain again how it should work. I will show how bundling works in https://github.com/APIDevTools/json-schema-ref-parser library, based on JSON Schema examples, so that everyone can test it themselves.

Having schemas like:

# my-schema.yaml
type: object
properties:
  external:
    $ref: './definition.yaml#/User/properties/name'
  externalMulti1:
    $ref: './definition.yaml'
  externalMulti2:
    $ref: './definition.yaml'

# definition.yaml
User:
  type: object
  required: [name]
  properties:
    name:
      type: string
  examples:
    - name: "Homer"

and code

import $RefParser from '@apidevtools/json-schema-ref-parser';

async function main() {
  const schema = await $RefParser.bundle("my-schema.yaml");
  console.log(schema);
}
main();

we have output:

{
  type: 'object',
  properties: {
    external: { '$ref': '#/properties/externalMulti1/User/properties/name' },
    externalMulti1: { User: [Object] },
    externalMulti2: { '$ref': '#/properties/externalMulti1' }
  }
}

so as we can see, @apidevtools/json-schema-ref-parser first resolves refs and tries to "optimize" them as much as it can. externalMulti1 field has full external object and the next one fields has subset of object or whole object, but "saved" as internal references (references in this same file/document). In our bundling we don't need such a optimization, because that should be handled by optimizer library, however... we should "optimize" external references saving them in the components object if this same external reference is used at least two times in document (not subset, this same reference, this same value in $ref field) - to decrease size of document, so example from @derberg, should be bundled to this:

asyncapi: 2.6.0
info:
  title: Account Service
  version: 1.0.0
  description: This service is in charge of processing user signups
channels:
  user/signedup1:
    subscribe:
      message:
        payload:
          $ref: '#/components/schemas/UserSignedUpPayload'
  user/signedup2:
    publish:
      message:
        payload:
          $ref: '#/components/schemas/UserSignedUpPayload'
components:
  schemas:
    UserSignedUpPayload: # the id of external reference can be generated or can use subset of ref path
      type: object
         properties:
            displayName:
              type: string
              description: Name of the user
            email:
              type: string
              format: email
              description: Email of the user

In other example, if someone use only one time given external ref, it should look like (external ref should be written to document inline in given place, where ref is defined):

asyncapi: 2.6.0
info:
  title: Account Service
  version: 1.0.0
  description: This service is in charge of processing user signups
channels:
  user/signedup1:
    subscribe:
      message:
        payload:
           type: object
           properties:
                displayName:
                    type: string
                    description: Name of the user
                email:
                    type: string
                    format: email
                    description: Email of the user

How I would do it and so I recommend that it should work that way. We can always add options like optimizeExternalRefs which would "shorten" the document by moving such repeated external references more than 1 time to components. if someone would define this option as false, the bundler would work as @derberg described.

Now I am not sure if our bundler works identically as I described for messages objects. If yes then ok, and we should extend this logic to other object types present in AsyncAPI document. Is my idea already understood?

@derberg
Copy link
Member

derberg commented Feb 8, 2023

HUGE thanks for detailed explanation 🙏🏼

First of all I'd like to add that if some tool does something similar to what we do, it doesn't mean we have to follow. If we would like to follow, we would just add bundler inside the parser-js, but we did not, we choose the path to have separate ["parser-js", "bundler", "optimizer"] and not one compo library that does that.

But even if we look at the @apidevtools/json-schema-ref-parser, as you can see in the result of the bundler, it does some strange reuse, but it definitely does not "optimize" 👇🏼

@apidevtools/json-schema-ref-parser does not move User schema to definitions where all 3 properties can $ref to. Just like our bundler should not move stuff to components as this is not a job of the bundler

if we mimic what they do, we would do:

asyncapi: 2.6.0
info:
  title: Account Service
  version: 1.0.0
  description: This service is in charge of processing user signups
channels:
  user/signedup1:
    subscribe:
      message:
        payload:
          UserSignedUpPayload: # the id of external reference can be generated or can use subset of ref pat
            type: object
            properties:
              displayName:
                type: string
                description: Name of the user
              email:
                type: string
                format: email
                description: Email of the user
  user/signedup2:
    publish:
      message:
        payload:
          $ref: '#/channels/user~1signedup1/subscribe/message/payload/UserSignedUpPayload'

But this is confusing, as the other JSON Schema example.

the id of external reference can be generated or can use a subset of the ref path

this is the "magic" we already have in optimizer, and there is no point in duplicating it in bundler

decrease size of document

for me size decrease is an optimization


If bundler would be a project outside asyncapi organization, and optimizer would be maintained by some other organization, then sure, let's put as much into bundler as we need for the use case. But we are in a position where both projects are here in asyncapi so we can clearly separate their responsibilities.

  • on library level, if people need both, they simple use both libraries
  • on UI level, pfff we can do whatever we want, the user doesn't care what libraries work underneeth,
  • on CLI level we can have asyncapi bundle, asyncapi optimize and even asyncapi bundle --optimize (where underneath optimizer is used)

@magicmatatjahu
Copy link
Member

@derberg We can go as you described :)

@aeworxet
Copy link
Collaborator

As I understood from #97 (comment)

bundler should not move stuff to components as this is not a job of the bundler

the id of external reference can be generated or can use a subset of the ref path

this is the "magic" we already have in optimizer, and there is no point in duplicating it in bundler

none of things I proposed in #97 (comment) should be implemented.

I'm talking about

referenceIntoComponentsAll: true | false
saveOriginsOfRefs: true | false

Instead, quite opposite is true: even moving message to components with referenceIntoComponents should be removed.

Moving all and every AsyncAPI-Specification-legal component into components object will be done by optimizer.

Creation of the list of origins of external $refs via Specification Extensions, in form

{
  "x-origins-of-refs": {
    "#/components/schemas/bankAccountIdentifier": "./financial-system.yaml#/components/schemas/bankAccountIdentifier"
  }
}

to keep traceability of where each resolved external $ref came from, will be done by optimizer.

Copy link
Member

derberg commented Feb 14, 2023

We only need to remember to not remove referenceIntoComponents from bundler until we are 100% sure optimizer can do all the work we need

@aeworxet aeworxet removed their assignment Apr 11, 2023
@aeworxet
Copy link
Collaborator

Bundler and Optimizer work same on YAML which has only internal $refs, but Optimizer does not have ability to dereference external $refs.

image

image

This blocks possibility of

Creation of the list of origins of external $refs via Specification Extensions, in form

{
  "x-origins-of-refs": {
    "#/components/schemas/bankAccountIdentifier": "./financial-system.yaml#/components/schemas/bankAccountIdentifier"
  }
}

to keep traceability of where each resolved external $ref came from, will be done by optimizer.

Should functionality of dereferencing the external $refs be added to Optimizer?

@derberg
Copy link
Member

derberg commented Apr 13, 2023

I think so, yes, @KhudaDad414 ?

@KhudaDad414
Copy link
Member

KhudaDad414 commented Apr 13, 2023

@derberg now I am lost. 🤔
based on #97 (comment)

bundler should bundle references to locations where they point from.

So to answer

Should functionality of dereferencing the external $refs be added to Optimizer?

isn't it the whole idea behind bundler?

am I missing something?

@derberg
Copy link
Member

derberg commented Apr 13, 2023

oh 🤦🏼 you are completely fine. In the end the flow is bundler -> optimizer

bundler scope is to bundle all the AsyncAPI files and related refs in one document and optimizer scope is to work on a single AsyncAPI document.

@aeworxet can you try to elaborate more why optimizer needs to dereference external references? idea is that optimizer will get a bundled AsyncAPI document that has all references dereferenced already

@aeworxet
Copy link
Collaborator

Functionality

Creation of the list of origins of external $refs via Specification Extensions, to keep traceability of where each resolved external $ref came from, will be done by optimizer.

was in silently approved scope so I checked it.
Should this functionality be added to Bundler as a request from user?

@KhudaDad414
Copy link
Member

KhudaDad414 commented Apr 14, 2023

@aeworxet @thake what value does it provide? I mean how is it useful to keep the origins of external $ref? As @derberg mentioned it can't be used to extract the components name from it because there is no standard way to decode this kind of info out of it. any other use case? 🤔

@aeworxet
Copy link
Collaborator

Since @thake, the original requester of the functionality, doesn't reply, is it safe to assume that this request only addressed one specific use case and does not have the significant impact required for it to be implemented at Bundler-wide level?

@thake
Copy link
Author

thake commented Apr 25, 2023

I intended to provide functionality that bundles schema components into one schema without changing the way tools display the schema.

Some tools treat external references differently than inlined schema components. For example, the AsyncAPI Studio lists all schemas which are not inlined separately at the end of the page. Inlined schemas are only listed within their used operation.

That is how I interpreted the already existing referenceIntoComponents functionality. This issue was about extending this functionality to also bundle other types of schema components than messages.

This bundling mode is not uncommon for API spec tools. For example, the Redoc CLI bundles OpenAPIs per default in this way. Components can be inlined by specifying --dereferenced.

@derberg
Copy link
Member

derberg commented May 24, 2023

This bundling mode is not uncommon for API spec tools

and it completely makes sense in my opinion to support in AsyncAPI tools.

For example, the Redoc CLI bundles OpenAPIs per default in this way. Components can be inlined by specifying --dereferenced.

yes, and what happens internally in CLI do not matter for anyone.

I think we all agree that moving schemas and other parts to components definitely makes sense. Just from design perspective better do it in optimizer than bundler

Then in the CLI we can do whatever flow is the best for the user

  • can be asyncapi bundle for just bundling
  • can be asyncapi optimize for just optimizing without bundling
  • and for better DX, so user do not have to run both, we can have asyncapi bundle --optimize or whatever we thing sounds better

@derberg
Copy link
Member

derberg commented May 24, 2023

regarding #97 (comment) comment
and saveOriginsOfRefs mentioned by @aeworxet

@aeworxet @thake what value does it provide? I mean how is it useful to keep the origins of external $ref? As @derberg mentioned it can't be used to extract the components name from it because there is no standard way to decode this kind of info out of it. any other use case? 🤔

I wrote in #97 (comment) that counting on $ref to get schema name from it has cases when it won't work. With 3.0 and asyncapi/spec#910 we will even have more cases when it won't work.

But, but generation of schema names is also not the most perfect solution 😅 the anonymous-name-generation stuff we have in parser backfires all the time

so I think preserving $ref would be useful, and then in optimizer we can have a flag where user can decide if "try-my-best-naming" or "generate-naming".

Thoughts @aeworxet @KhudaDad414 ?

@KhudaDad414
Copy link
Member

@derberg agreed.
So to summarise the changes that we need:

Bundler

  • remove the referenceIntoComponents functionality.
  • provide origins of $refs

Optimizer

  • have another option to move all of the components to the components section.
  • try to use the origins of $refs to extract better names.

Copy link
Member

derberg commented Jun 1, 2023

awesome candidates to mark for bounty when we roll it out, as they are non trivial, no easy for first time contributors

@KhudaDad414
Copy link
Member

the fruit of this discussion has been summarised here: #141

I suggest we close this issue now.

@aeworxet
Copy link
Collaborator

The issue is closed with the completion of #141.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

No branches or pull requests

7 participants