Add bundle and feature package type for Eclipse p2 artifacts #272

Open
wants to merge 1 commit into
base: master

ptziegler

An initial draft for bundles and features. There are more p2 types, but I can't see any use case where it would be necessary to explicitly specify them.
#271

@laeubi

laeubi commented Nov 25, 2023

I just wanted to note that if we talk about P2 units(!) they usually use the symbolicname = unit id, but this is not mandatory! For features they even use unit id = featureid+.feature.group

So I would suggest to not talk about bundles and features, P2 has:

  1. An id for each unit
  2. A version for each unit

so the most general one would be to define it as p2/<id>/<version>, which should (if a repository is given) uniquely identify the P2 unit

@ptziegler
Author

I just wanted to note that if we talk about P2 units(!) they usually use the symbolicname = unit id, but this is not mandatory! For features they even use unit id = featureid+.feature.group

That's a good point. Trying to be specific is something that'll most likely backfire sooner rather than later.

so the most general one would be to define it as p2/<id>/<version>, which should (if a repository is given) uniquely identify the P2 unit

I'd still like to bring in a classifier, somehow. Just to be able to tell at first glance, whether a unit is e.g. a binary or plug-in. But that is not a required attribute and should instead be provided as an additional, optional qualifier.

@mbarbero

We were discussing this with @waynebeaton in https://gitlab.eclipse.org/eclipsefdn/emo-team/sbom/-/issues/4. You'll see that our proposals are very close to what you have here.

However, I would like to note that PURLs are about referencing (physical) artifacts. As such, the PURL should be about units in p2 artifacts repositories i.e., artifacts, and not (installable) units from metadata repositories.

@laeubi

laeubi commented Nov 28, 2023

@mbarbero thanks for the hint, would you mind to continue discussion here (CC @waynebeaton @merks)?

As such, the PURL should be about units in p2 artifacts repositories

If we want to go for artifacts (i.e. the contents of P2 artifact repositories, not units; a unit can only reference artifacts), then an artifact has three mandatory attributes:

  • classifier (e.g. org.eclipse.update.feature others are osgi.bundle, or binary)
  • id (e.g. org.eclipse.ui.tools.feature or org.eclipse.ui.tools)
  • version (e.g. 1.0.0.200806161247)

regarding repository and defaults I like to add the following:

  1. artifacts are not necessarily deployed to a public HTTP repository
  2. Using "eclipse release" as the default seems a bad choice as this is a moving target and many artifacts are probably not there
  3. Given that there is no central authority like maven central and we have no way to know the real origin, I would describe the uri (!) (not url as p2 uses URIs) as an optional hint where to find the artifact (e.g. you might be able to find them on a mirror as well!) and maybe should even allow to specify a list of urls.

@ptziegler
Author

ptziegler commented Nov 28, 2023

Here we add the repository URL repository_url as a known key qualifier. This would be mandatory, as we have no default.

In the end, I think making the repository URL mandatory is quite dangerous and can lead to a whole lot of problems which I'd like to avoid. @laeubi already said in the CycloneDX discussion, that it might get difficult, to detect which repository an artifact originally comes from.

Yes at least optional, with P2 its quite common that an artifact "travels along", so question is how one finds it

But in addition, this would also be problematic for proprietary software...
Example: We are running a private Nexus instance, where we have created proxies for all the Eclipse repositories we use in our application, and where we also host our own p2 repositories for internal artifacts.

When creating an SBOM, all artifacts would originate from e.g. https://www.nexusint.com/repository/eclipse-2023-09 instead of https://download.eclipse.org/releases/2023-09/.
The former URL is meaningless, unless you are inside our network and it's not possible for Tycho to derive the original repository.

However, I would like to note that PURLs are about referencing (physical) artifacts. As such, the PURL should be about units in p2 artifacts repositories i.e., artifacts, and not (installable) units from metadata repositories.

In the end, I don't think that's generally possible, when working on the artifacts alone.
Example: If I have the artifact with id org.eclipse.sdk and version 4.29.0.v20230903-1000 located in the repository https://download.eclipse.org/releases/2023-09/, then one might assume the artifact can be found at https://download.eclipse.org/releases/2023-09/features/org.eclipse.sdk_4.29.0.v20230903-1000.jar.
Except that we are dealing with a composite repository, meaning the correct URL is https://download.eclipse.org/releases/2023-09/202309131000/features/org.eclipse.sdk_4.29.0.v20230903-1000.jar

So if I want to derive the physical location of an artifact, I need to process the p2 metadata (at least the content.xml), which is working with units...
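
The composite repository problem described above can be illustrated with a minimal sketch. It assumes the commonly seen default p2 mapping rules (bundles under `plugins/`, features under `features/`, binaries under `binary/`); a real repository may declare different rules in its artifacts.xml, and the function name `naive_artifact_url` is hypothetical.

```python
# Assumed default mapping from artifact classifier to a path template.
# A real artifacts.xml may override these rules.
DEFAULT_RULES = {
    "osgi.bundle": "plugins/{id}_{version}.jar",
    "org.eclipse.update.feature": "features/{id}_{version}.jar",
    "binary": "binary/{id}_{version}",
}


def naive_artifact_url(repository, classifier, artifact_id, version):
    """Guess the download URL by appending the mapped path to the repository URI."""
    path = DEFAULT_RULES[classifier].format(id=artifact_id, version=version)
    return repository.rstrip("/") + "/" + path


key = ("org.eclipse.update.feature", "org.eclipse.sdk", "4.29.0.v20230903-1000")
composite = "https://download.eclipse.org/releases/2023-09/"
child = composite + "202309131000/"

# Guess made against the composite root: this file does not exist there.
guessed = naive_artifact_url(composite, *key)
# The same rule applied to the child repository yields the correct URL.
actual = naive_artifact_url(child, *key)
```

The guess only becomes correct once the child repository is known, and that information is not part of the artifact key itself.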

@laeubi

laeubi commented Nov 28, 2023

Just to prevent confusion, the final URI has to be derived from the artifacts.xml as it could contain a mapping or as you mentioned include composites.

As an alternative one might use the derived url that would be something like https://download.eclipse.org/releases/2023-09/features/org.eclipse.sdk_4.29.0.v20230903-1000.jar instead of https://download.eclipse.org/releases/2023-09 but as mentioned before I don't think the url/uri itself is really that important / useful at all.

Instead I think a security scanner that needs to fetch the final artifact (why?) needs to be configured with a set of artifact repositories it should use to search for the artifact key.

@ptziegler
Author

ptziegler commented Nov 28, 2023

Given that there is no central authority like maven central and we have no way to know the real origin, I would describe the uri (!) (not url as p2 uses URIs) as an optional hint

You're right... a repository_uri is better suited than a repository_url

Instead I think a security scanner that needs to fetch the final artifact (why?) needs to be configured with a set of artifact repositories it should use to search for the artifact key.

Wouldn't the task of calculating the full url be part of the scanner, anyway?

Looking at the Maven specification, it only requires the GAV, but doesn't say anything about how it's stored in a m2 repository. Meaning the scanner needs to know that they are stored under /<group-id>/<artifact-id>/<version>/.

In that fashion, I would also put the burden on the tool, to figure out whether the uri points to a compound repository, a "plain" p2 repository, a Target file hosted on a Maven repository or even something completely different, rather than adding all that complexity to the specification.
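
For comparison, the Maven case mentioned above can be sketched as follows. The PURL carries only the GAV, and the consuming tool applies the standard Maven repository layout itself. The function name and the example coordinates are illustrative assumptions, not taken from this thread.

```python
def maven_artifact_path(group_id, artifact_id, version, extension="jar"):
    """Map a GAV to its path in a standard Maven 2 repository layout."""
    group_path = group_id.replace(".", "/")  # dots in the group id become directories
    return f"{group_path}/{artifact_id}/{version}/{artifact_id}-{version}.{extension}"


# Default repository named by the Maven PURL type definition.
DEFAULT_REPO = "https://repo.maven.apache.org/maven2"

# Illustrative coordinates only.
url = f"{DEFAULT_REPO}/{maven_artifact_path('org.apache.commons', 'commons-lang3', '3.12.0')}"
```

A p2 consumer would need an equivalent step, with the extra complication that the layout depends on the repository's own mapping rules rather than on a fixed convention.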

@laeubi

laeubi commented Nov 28, 2023

In that fashion, I would also put the burden on the tool, to figure out whether the uri points to a compound repository, a "plain" p2 repository, a Target file hosted on a Maven repository or even something completely different, rather than adding all that complexity to the specification.

👍

@mbarbero

mbarbero commented Nov 28, 2023

When creating an SBOM, all artifacts would originate from e.g. https://www.nexusint.com/repository/eclipse-2023-09 instead of https://download.eclipse.org/releases/2023-09/.
The former URL is meaningless, unless you are inside our network and it's not possible for Tycho to derive the original repository.

If you use a Maven caching proxy (e.g. Sonatype Nexus), you have the same issue: the artifacts come from your proxy rather than Maven Central. Whether you put the internal reference or the public one is an SBOM tooling problem, not one to consider for the p2 PURL.

Given that there is no central authority like maven central and we have no way to know the real origin, I would describe the uri (!) (not url as p2 uses URIs) as an optional hint

I have the exact reverse reasoning :) : given there is no central authority, the repository URL is a mandatory hint, otherwise there is no way to find where the artifacts come from.

@laeubi

laeubi commented Nov 28, 2023

I have the exact reverse reasoning :) : given there is no central authority, the repository URI is a mandatory hint, otherwise there is no way to find where the artifacts come from.

You cannot even know this for Maven artifacts, so why is the repository URL not mandatory there?
If I use software and have no clue where it comes from how can an SBOM help me given an URL I probably can not access at all? e.g. p2 items can come from anywhere and P2 often downloads them from a (random chosen) mirror, how should a tool like Tycho know what is the "real" source? That's simply impossible except for very simplified cases, e.g. you only consume from one update-site and you consume everything from that site, so no mirrors, no proxies, no ...

@mbarbero

You cannot even know this for Maven artifacts, so why is the repository URL not mandatory there?

For Maven artifacts, the repository URL is not mandatory because the Maven scheme's specification states that the default repository is https://repo.maven.apache.org/maven2.

If I use software and have no clue where it comes from how can an SBOM help me given an URL I probably can not access at all? e.g. p2 items can come from anywhere and P2 often downloads them from a (random chosen) mirror, how should a tool like Tycho know what is the "real" source? That's simply impossible except for very simplified cases, e.g. you only consume from one update-site and you consume everything from that site, so no mirrors, no proxies, no ...

According to the specification, a purl or package URL is an attempt to standardize existing approaches to reliably identify and locate software packages. 'Locating' does not imply universal accessibility, but rather that the location should be precisely defined. As such, tools responsible for creating these PURLs should make their best effort to achieve this.

When the mirrorsUrl mechanism is utilized for downloading an artifact, it's advisable that the repository URI in the PURL be designated as the "original" or "source", rather than the mirror URI. I guess that tools (Tycho and others) should be able to give this information.

In scenarios involving a private copy (e.g., a mirror created by p2.mirror tasks), a satisfactory solution may not currently exist. However, it's conceivable that such a "copy" could retain metadata regarding its origin, which could then be utilized by tools to generate an accurate PURL.

Thoughts?

@merks

merks commented Nov 28, 2023

Some considerations:

  • During a Tycho build, one does not necessarily know (and generally doesn't know) where the results will be promoted after the build.
  • When building a product, one might very well not place the individual artifacts anywhere other than within the product's zip/tar packaging for redistribution.

@laeubi

laeubi commented Nov 28, 2023

Well the main problem is how p2 works (and this is similar to maven where you can define multiple repositories), that you give it a set of artifact repositories, and then you can query for an artifact key (that is type, id, version) and then you get back an artifact, but you can never know where it comes from because:

  1. p2 might have contacted any of its mirrors
  2. p2 might have a cached version from somewhere in the past (it must not be from the set of currently used repositories!)
  3. p2 might have used a referenced site to contact other servers as well
  4. p2 might have used a local build artifact (in case of Tycho)
  5. ...

So maybe p2 can record where it once fetched the data from, but that doesn't mean it is the only "real" source. If you look at eclipse-sdk-prereqs.target, which is the input for the Eclipse Platform build, we have EMF, ORBIT, ECF, ... there. Now we publish the eclipse.download/releaseXY site... what is the source of artifact emf/ecf/.. at version X? Is it download.eclipsereleaseXY? Or is it not download.eclipse.org/emf/ecf/orbit...? What if the artifact key can be found in multiple locations?
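
The ambiguity described above can be shown with a small sketch. It models repositories as plain in-memory sets of artifact keys; the repository URIs, the version number, and the names `ArtifactKey` and `find_repositories` are hypothetical and do not correspond to the p2 API.

```python
from typing import NamedTuple


class ArtifactKey(NamedTuple):
    """The three attributes that identify a p2 artifact."""
    classifier: str
    id: str
    version: str


def find_repositories(key, repositories):
    """Return every repository URI whose contents include the given key."""
    return [uri for uri, keys in repositories.items() if key in keys]


# Hypothetical key and repositories, for illustration only.
emf = ArtifactKey("osgi.bundle", "org.eclipse.emf.ecore", "2.35.0")
repositories = {
    "https://example.org/releases/": {emf},  # aggregated release site
    "https://example.org/emf/": {emf},       # the project's own site
    "https://example.org/other/": set(),
}

sources = find_repositories(emf, repositories)
```

Both the aggregated site and the project site are returned, so the key alone does not single out one origin.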

And even with Maven you configure a set of Maven repos; still, the GAV doesn't guarantee which server it is downloaded from... even if in an Eclipse build all artifacts are downloaded from an Eclipse mirror, should we really claim they are from that mirror? How could an automated tool ever know?

The only possible option (for me) would be to feed the tool (Tycho) with a list of repositories it should query and use in the PURL, but this of course puts the burden on the producer side to manage the URLs, and there is no guarantee a user configures the "right" ones. This also does not answer what a PURL should look like for something that is not (yet) deployed anywhere but probably will be, e.g. in most cases I want to deploy the BOM together with my release, but without a release the URL will not exist...

@mbarbero

I understand that the implementation presents challenges. However, our current focus is on defining the method to identify and locate a p2 artifact, which is the primary function of a PURL. This involves four key components:

  1. ID
  2. Version
  3. Classifier (such as org.eclipse.update.feature, osgi.bundle, or binary)
  4. Repository URL, which is essential according to the PURL specification, as a default cannot be assumed.

A p2 artifact cannot be fully identified or located without the repository URL.

Your insights are indeed valuable and appreciated. They pertain more to the implementation aspects, which would be best addressed in the p2/tycho discussions.

PURL-TYPES.rst Outdated
@@ -397,6 +397,59 @@ nuget

pkg:nuget/EnterpriseLibrary.Common@6.0.1304

p2
----
``p2`` for Eclipse p2 units:

Suggested change
- ``p2`` for Eclipse p2 units:
+ ``p2`` for Eclipse p2 artifacts:

I'd use the word artifacts rather than units. This is about identifying and locating artifacts from artifact repositories, not resolving units from metadata repositories. Same change should be done throughout the rest of the document.

Author

That's true. My initial idea was to use units but from what it looks like, artifacts are the better approach. I also have to check whether the individual bullet points still make sense.

PURL-TYPES.rst Outdated
3.5.500.v20220812-1420
2.0.0.202304281106

- The software artifact are accessed from a p2 repository. Given that each

Suggested change
- - The software artifact are accessed from a p2 repository. Given that each
+ - The software artifact are accessed from a p2 artifact repository. Given that each

While it's common to have both artifacts and metadata repository at the same location, one could have both split, and metadata repo is not relevant here.

PURL-TYPES.rst Outdated
https://download.itemis.com/updates/releases/2.1.1

- A p2 repository can host a multitude of artifacts. The type of artifact is
provided by the ``classifier`` qualifier key and is optional.

if it's optional, a default shall be defined. I guess that osgi.bundle is a good one. WDYT?


@laeubi laeubi Nov 28, 2023

For artifacts, the type/classifier is never optional and there is no default see IArtifactKey

Author

@ptziegler ptziegler Nov 28, 2023

Keep in mind that my initial idea was to use units, not artifacts. So the only reason I introduced the classifier was not for technical reason, but rather as a hint to immediately see what type of element is described by the PURL.

With artifacts, having a classifier is now mandatory. Maven does something similar, with jar being the default, if no classifier is specified. In general, I like this idea, because it would reduce the length of a lot of PURLs.
But if we now put into the specification that osgi.bundle is assumed to be the default value, we impose on p2 that this string must always be used to identify bundles. That's not something I can decide...

Note that there was a similar discussion regarding Tycho recently and how it uses p2.eclipse-plugin as an artificial group-id for bundles that don't have proper Maven coordinates. External tools shouldn't rely on this string to always stay like this, because it's an implementation detail, rather than a formal specification. To me, this sounds like a similar situation.

A maven classifier is something completely different! jar is the type / extension of an artifact (and the default is jar).
For P2 there is no default, but the classifier can be empty (which is something different from classifier=osgi.bundle!), so one would need to differentiate between not specified and empty, which might be confusing.

Author

A maven classifier is something completely different! jar is the type / extension of an artifact (and the default is jar). For P2 there is no default, but the classifier can be empty (which is something different from classifier=osgi.bundle!), so one would need to differentiate between not specified and empty, which might be confusing.

Thanks for the clarification. Then the classifier must remain an optional hint, with no default value.

Author

I've updated the spec to include the feedback so far. In short, the PURL contains a:

  • namespace (artifact id)
  • version (artifact version)
  • qualifier (classifier (optional), location (mandatory))
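
A minimal sketch of how such a PURL could be assembled is shown below. The layout `pkg:p2/<id>@<version>?<qualifiers>` is an assumption made for illustration: it places the artifact id in the single path segment after the type, whereas the summary above calls it the namespace. The function name `build_p2_purl` is hypothetical. Percent-encoding of qualifier values and sorting qualifiers by key follow the general PURL rules.

```python
from urllib.parse import quote


def build_p2_purl(artifact_id, version, location, classifier=None):
    """Assemble a p2 PURL string from an artifact key and a location (assumed layout)."""
    qualifiers = {"location": location}
    if classifier:
        # The classifier is an optional hint; omit it when none is given.
        qualifiers["classifier"] = classifier
    # Qualifiers are sorted by key and their values are percent-encoded.
    query = "&".join(
        f"{name}={quote(value, safe='')}" for name, value in sorted(qualifiers.items())
    )
    return f"pkg:p2/{quote(artifact_id, safe='')}@{quote(version, safe='')}?{query}"


purl = build_p2_purl(
    "org.eclipse.sdk",
    "4.29.0.v20230903-1000",
    "https://download.eclipse.org/releases/2023-09/",
    classifier="org.eclipse.update.feature",
)
```

The example reuses the org.eclipse.sdk artifact discussed earlier in this conversation.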

@laeubi

laeubi commented Nov 28, 2023

@mbarbero from P2 point of view you can only locate an artifact inside an Artifact Repository, that's correct. Care must be taken to read the mapping from the repository to resolve the final artifact though. An artifact repository could be located at any URI and might require special Java code to be accessed.

@ptziegler
Author

So to briefly summarize:

The PURL should be calculated based on p2 artifacts, rather than units. This necessitates the following components:

  • namespace: the artifact id, required
  • version: the artifact version, required
  • qualifiers: the classifier, optional with no default value

Because there is no central authority for hosting p2 artifacts, it should also contain a means to find its physical location, in order to satisfy the locator property:

The term "Uniform Resource Locator" (URL) refers to the subset of URIs that, in addition to identifying a resource, provide a means of locating the resource by describing its primary access mechanism

Problems with this requirement are, among others, that:

  • the artifact may be hosted on several repositories. Therefore the locator is not unique.
  • the artifact may not even be hosted on a repository in the first place...

As a side note, PURL already defines repository_url as a common qualifier. So it might be confusing to also introduce a repository_uri qualifier. I suggest calling it e.g. location instead. This would also avoid the implication that each artifact must belong to a repository.

According to the specification, a purl or package URL is an attempt to standardize existing approaches to reliably identify and locate software packages. 'Locating' does not imply universal accessibility, but rather that the location should be precisely defined. As such, tools responsible for creating these PURLs should make their best effort to achieve this.

Given that the URL only needs to provide a "primary access mechanism", I don't see the precise definition as a requirement. To pick up the example of eclipse-sdk-prereqs.target, all p2 repositories containing a given artifact would be valid locators.
To later verify that the repository artifact matches the build artifact one could then use e.g. the checksum. I think this would also avoid a lot of issues regarding mirrors, proxies, etc...
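
The checksum comparison suggested above could look roughly like this. SHA-256 is an assumption here, since the comment only says "checksum", and the function names are hypothetical.

```python
import hashlib


def sha256_of(data: bytes) -> str:
    """Return the hex-encoded SHA-256 digest of the given bytes."""
    return hashlib.sha256(data).hexdigest()


def same_artifact(build_bytes: bytes, repository_bytes: bytes) -> bool:
    """Treat two artifacts as identical when their digests match."""
    return sha256_of(build_bytes) == sha256_of(repository_bytes)
```

With such a check, any repository that serves a byte-identical file would count as a valid locator, regardless of whether it is the original host, a mirror, or a proxy.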

In total, we need a locator in addition to the remaining three components. Whether this locator is a repository URL, a Maven GAV, a relative path on the file system or whatever else is then implementation-specific and shouldn't be discussed as part of the specification.

Did I miss anything? Are there any objections or concerns that I haven't addressed?

This specification describes how the PURL for a given Eclipse artifact
can be constructed. The locator includes both the information from the
(unique) artifact key, as well as the base URI of the artifact
repository.
@ptziegler
Author

ptziegler commented Feb 18, 2024

A proof-of-concept has recently been merged to Tycho via eclipse-tycho/tycho#3258, based on this proposal.

The only noteworthy change is that the location now corresponds to the base URI of the artifact repository, rather than the download link to the jar/binary file.

Example:
bom.xml.txt
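
On the consuming side, a tool could split such a PURL back into the artifact key and the repository base URI. The sketch below assumes a `pkg:p2/<id>@<version>?classifier=...&location=...` layout; the PURL string is constructed for illustration from the org.eclipse.sdk example in this conversation and is not copied from the attached BOM. The function name `parse_p2_purl` is hypothetical.

```python
from urllib.parse import parse_qs, unquote

PREFIX = "pkg:p2/"


def parse_p2_purl(purl):
    """Split a p2 PURL (assumed layout) into id, version, classifier and location."""
    if not purl.startswith(PREFIX):
        raise ValueError("not a p2 purl: " + purl)
    path, _, query = purl[len(PREFIX):].partition("?")
    artifact_id, _, version = path.partition("@")
    # parse_qs percent-decodes the values; keep the first value for each key.
    qualifiers = {name: values[0] for name, values in parse_qs(query).items()}
    return {
        "id": unquote(artifact_id),
        "version": unquote(version),
        "classifier": qualifiers.get("classifier"),
        "location": qualifiers.get("location"),
    }


parsed = parse_p2_purl(
    "pkg:p2/org.eclipse.sdk@4.29.0.v20230903-1000"
    "?classifier=org.eclipse.update.feature"
    "&location=https%3A%2F%2Fdownload.eclipse.org%2Freleases%2F2023-09%2F"
)
```

The decoded location is only the repository base URI, so resolving the actual file still requires reading that repository's artifact index.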
