Improve modulemds* creation API #3510

Open
pedro-psb opened this issue Apr 19, 2024 · 9 comments

@pedro-psb
Member

Problem

The API for creating modulemds, modulemds_defaults and modulemds_obsoletes is a bit redundant:

  • There is a set of fields that we want to keep track of in the db (e.g., module, stream, profile, etc.).
  • There is a snippet field (a YAML file) that "should" contain the same data as the set of fields above (that's not checked).

This allows creating inconsistent records of this type of content.

Proposal

Some ideas (not mutually exclusive) for a more ergonomic design are:

  1. Require only a snippet, plus repository and packages (for modulemds) as optional parameters. The additional data that Pulp wants to keep track of should be present in the snippet, or we won't accept it.
  2. For modulemds: If packages are provided (which are pulp_hrefs), validate that they strictly match the rpm->artifacts in the yaml.
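
A minimal sketch of idea 1, deriving the tracked fields from the snippet alone. It assumes the snippet has already been parsed into a dict (e.g. by a YAML loader) and follows the modulemd v2 layout (`data.name`, `data.stream`, `data.artifacts.rpms`, and so on). `derive_fields`, `SnippetError` and `REQUIRED_FIELDS` are hypothetical names for illustration, not part of pulp_rpm.

```python
class SnippetError(ValueError):
    """Raised when a snippet lacks data Pulp needs to track (hypothetical)."""


# Fields read from the `data` section of a modulemd v2 document.
REQUIRED_FIELDS = ("name", "stream", "version", "context", "arch")


def derive_fields(snippet: dict) -> dict:
    """Build the tracked fields from a parsed modulemd snippet only."""
    if snippet.get("document") != "modulemd":
        raise SnippetError("not a modulemd document")
    data = snippet.get("data") or {}
    missing = [key for key in REQUIRED_FIELDS if key not in data]
    if missing:
        raise SnippetError(f"snippet is missing: {', '.join(missing)}")
    fields = {key: data[key] for key in REQUIRED_FIELDS}
    # Package NEVRAs and profile names also come from the snippet,
    # never from separate request parameters.
    fields["artifacts"] = list((data.get("artifacts") or {}).get("rpms") or [])
    fields["profiles"] = sorted(data.get("profiles") or {})
    return fields


# Illustrative snippet; the values are made up.
snippet = {
    "document": "modulemd",
    "version": 2,
    "data": {
        "name": "nodejs",
        "stream": "18",
        "version": 20240101000000,
        "context": "deadbeef",
        "arch": "x86_64",
        "profiles": {"default": {"rpms": ["nodejs"]}},
        "artifacts": {"rpms": ["nodejs-1:18.0.0-1.module.x86_64"]},
    },
}
fields = derive_fields(snippet)
```

With this shape there is no second copy of the data for the request to contradict: a snippet that omits a tracked field is rejected instead of being stored next to caller-supplied values.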

The snippet may be a string or a file, I'm not sure what is best.

Additional Context

Motivated by #3427

@pedro-psb
Member Author

pedro-psb commented May 9, 2024

After further discussion w/ rpm team, we've concluded that:

  • source of truth: All the data should be derived from the snippet (including packages), as the snippet is what is going to be used in the generation of the metadata (publication step). This is to avoid inconsistent data state.
  • pulp object linking (Modulemds):
    • The current packages parameter of the API, which today can be filled with package hrefs, should instead be resolved from information in the snippet. The snippet provides the packages' NEVRAs (through the name), and that's going to be used internally to look up each NEVRA in a given repository.
    • If the packages can't be found, Pulp should raise an error
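
The lookup described above could be sketched roughly as follows. This assumes the artifacts are listed in the `name-epoch:version-release.arch` form used by modulemd, and that the repository content is available as a mapping from NEVRA to package href. `Nevra`, `parse_nevra`, `link_packages` and `MissingPackagesError` are hypothetical names, and the hrefs are opaque placeholder strings.

```python
from typing import Iterable, List, Mapping, NamedTuple


class Nevra(NamedTuple):
    name: str
    epoch: str
    version: str
    release: str
    arch: str


class MissingPackagesError(Exception):
    """Raised when a listed package is not in the repository (hypothetical)."""


def parse_nevra(text: str) -> Nevra:
    """Parse 'name-epoch:version-release.arch' into its five parts."""
    rest, _, arch = text.rpartition(".")
    rest, _, release = rest.rpartition("-")
    name, _, evr = rest.rpartition("-")
    epoch, sep, version = evr.partition(":")
    if not sep:
        # No explicit epoch in the string; treat it as 0.
        epoch, version = "0", evr
    if not all((name, version, release, arch)):
        raise ValueError(f"not a valid NEVRA: {text!r}")
    return Nevra(name, epoch, version, release, arch)


def link_packages(
    artifacts: Iterable[str], repo_packages: Mapping[Nevra, str]
) -> List[str]:
    """Return the href of every listed artifact, or raise if any is missing."""
    wanted = [parse_nevra(item) for item in artifacts]
    missing = [nevra for nevra in wanted if nevra not in repo_packages]
    if missing:
        names = ", ".join(f"{n.name}-{n.epoch}:{n.version}-{n.release}.{n.arch}"
                          for n in missing)
        raise MissingPackagesError(f"packages not found in repository: {names}")
    return [repo_packages[nevra] for nevra in wanted]


# Illustrative repository content; the hrefs are placeholders.
repo_packages = {
    parse_nevra("perl-libs-4:5.26.3-416.el8.x86_64"): "href-perl-libs",
}
hrefs = link_packages(["perl-libs-4:5.26.3-416.el8.x86_64"], repo_packages)
```

Splitting from the right keeps hyphenated package names such as `perl-libs` intact, since only the last two hyphens separate the version and release.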

Open questions

When should the attempt to link snippet packages with Pulp Packages be performed?

  • immediate (on upload processing):
    • This is the most sane choice for maintainability/predictability of Pulp.
    • Downside is that build systems should adjust their workflow to ensure that modulemds are uploaded as a last stage. How inconvenient is that for a complex build system?
  • lazy:
    • This would allow the build-system workflow of throwing packages and modulemds in any order and only trying the linking when it matters (e.g., on publish or when using the Modify API w/ copy/remove).
    • General downside is that it adds more complexity to Pulp.
    • Variants:
      • on-publish: When publishing, try to link the packages. Possibly also when using the Modify API.
      • on-first-access: On the first attempt to access Modulemd.packages
      • on-explicit-link-request: Have an endpoint to trigger the linking
      • on-repo-association: I guess we shouldn't, but we can have "repositoryless" content. In that case, we could try looking in the repository when it's associated to a Repo, which may or may not be on upload.

Where should Modulemd look for packages?

  • In the repository it's in (latest RepoVer). If it's not in a repository, then what?
  • In the global set of available packages

@dralley
Contributor

dralley commented May 16, 2024

@daviddavis Does Microsoft have any interest in or need for uploading their own modules directly into Pulp (without a sync)? If so, do you have any issues with the current modulemd creation API and do you have any feedback on what it should look like?

@daviddavis
Contributor

No, we don't use modules and haven't had a publisher ask to use them. Thanks for checking though.

@pedro-psb
Member Author

@javihernandez, it would be nice to have further feedback on this.

There are two improvements I'm trying to do with this:

  1. Improve user experience of the API
  2. Make Modulemd storage/processing consistent

I want to know how that may affect the inconvenience you reported early on with the distributed upload of modular packages, related to the immutability of Modulemds.

I've proposed that, when the Modulemd is added to a Repository, Pulp will try to find the Pulp Packages (matching the NEVRAs listed in the Modulemd) in the context of the RepositoryVersion it's being added to. The main advantage of this is that it ensures Modulemd and Repository consistency*.

For your workflow, that means you could upload the Module before the end of the build, but you would still need to add the Module (via its href) to the final Repository at the end, if the uploads are successful. I'm not sure if that's helpful or not in the context of your workflow. Wdyt?


*We've been discussing how RpmRepository constraints/consistency are really suited for Distribution workflows, but not so much for Build-system workflows. We still need to understand build-system requirements better so we can have better first-class support for them.

@pedro-psb
Member Author

Also, a more minor question about this API improvement: I'm inclined to make the snippet upload a File rather than a String (as it is currently), because we have similar endpoints which use a File. Any preference here?

@ipanova
Member

ipanova commented May 20, 2024

Open questions

When should the attempt to link snippet packages with Pulp Packages be performed?

I think it makes sense to expect the module snippet to be uploaded last, meaning that all the rpms that it mentions should already be present in pulp; otherwise the module should be considered corrupted.
There is a similar workflow in the container registry, where the manifest.json that describes an image composed of layers is uploaded last, and its upload is rejected if not all layers are already present in pulp. This works well for us and makes sure pulp creates all the necessary relations and guarantees 'composite' content integrity.

Where should Modulemd look for packages?

We should always look at the latest repo version; if the package(s) are not available, fail the upload with a meaningful message.
The package might be present in pulp, but in another repo; we do not want to create relations in this case.
Moreover, we would still fail the module upload into repoB if repoB does not have the necessary packages, even if the same module with its packages is already present in repoA.
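
The repo-scoped rule above can be illustrated with a small sketch. It assumes the content of each repository's latest version is available as a set of NEVRA strings keyed by repository name; `check_module_upload` and the `latest_content` mapping are hypothetical and only model the rule, not Pulp's actual data model.

```python
from typing import Iterable, Mapping, Set


def check_module_upload(
    module_rpms: Iterable[str],
    repo_name: str,
    latest_content: Mapping[str, Set[str]],
) -> None:
    """Fail unless every package the module lists is in the target repo.

    Packages that exist only in other repositories do not count.
    """
    available = latest_content.get(repo_name, set())
    missing = sorted(set(module_rpms) - available)
    if missing:
        raise ValueError(
            f"cannot add module to {repo_name}: "
            f"missing packages in latest version: {', '.join(missing)}"
        )


# Illustrative state: the package is in repoA only.
latest_content = {
    "repoA": {"nodejs-1:18.0.0-1.module.x86_64"},
    "repoB": set(),
}
module_rpms = ["nodejs-1:18.0.0-1.module.x86_64"]

check_module_upload(module_rpms, "repoA", latest_content)  # accepted

try:
    check_module_upload(module_rpms, "repoB", latest_content)
    rejected = False
except ValueError:
    rejected = True  # repoB lacks the package, even though repoA has it
```

Because the check only ever reads the target repository's content, no relation is created to a package that merely happens to exist elsewhere in Pulp.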

@pedro-psb
Member Author

@ipanova There is a detail about linking the packages "on upload", strictly speaking, because Repository is optional.
So either we make it required, or we trigger the link "on repover creation", when it's effectively being added to a repository.

@ipanova
Member

ipanova commented May 21, 2024

@pedro-psb I thought we made the upload always require a repo; otherwise the content without a repo is considered orphaned and the user will not have access to it, because the permissions on the content are scoped off the repo permissions.
But yes, I believe we could do the linking somewhere at the finalize_repo_version step too, which should fail if the version-in-progress does not contain the necessary packages.

@ipanova
Member

ipanova commented May 21, 2024

ok, we do not require a repo to be provided for admins https://github.com/pulp/pulpcore/blob/main/pulpcore/app/global_access_conditions.py#L488
