-
Notifications
You must be signed in to change notification settings - Fork 294
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Allow project / package "duplicates" with the same VCS location #6533
base: main
Are you sure you want to change the base?
Conversation
This is analogous to `addPackages()` and will be used in an upcoming test. Signed-off-by: Sebastian Schuberth <sschuberth@gmail.com>
// Some source code repositories contain projects that are used as packages by other contained projects. So the | ||
// same code is once seen as a project, and once as a package by ORT. Allow such "duplicates" as the ids | ||
// actually refer to the same thing by requiring duplicates to refer to different VCS locations. | ||
val realDuplicates = duplicates.filter { (_, duplicates) -> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@oss-review-toolkit/core-devs, what do you thing about this? Should we rather only keep one of these (either the project or the package) instead of keeping both of them?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm specifically asking as I currently don't oversee all side-effects that might occur if we keep a project and a package with the same id in the result.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it's likely that assumption that Id's are unique are spread all over the place. To list a few, while guessing many further things can be found:
ort/model/src/main/kotlin/OrtResult.kt
Line 451 in 19c89ff
fun getAdvisorResultsForId(id: Identifier): List<AdvisorResult> = advisorResultsById[id].orEmpty() ort/model/src/main/kotlin/OrtResult.kt
Line 532 in 19c89ff
fun getScanResultsForId(id: Identifier): List<ScanResult> = scanResultsById[id].orEmpty() private val packageId by option( - https://github.com/oss-review-toolkit/ort/blob/main/model/src/main/kotlin/OrtResult.kt#L132
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it's likely that assumption that Id's are unique are spread all over the place.
Right. But some of these only assume ids to be unique within projects or packages, but not within the union of both.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Right. But some of these only assume ids to be unique within projects or packages, but not within the union of both.
Yeah, that's for sure.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ISTR that there was also a problem with the WebApp report when duplicate IDs were present: The JavaScript part uses an index-based mapping to reference objects, and if there were projects and packages with the same ID, this mapping got broken.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Have you considered as alternative solution: Making sure that the
Identifier.type
for projects differs fromIdentifier.type
for packages?
Actually, I was playing around with similar ideas 3 years (!) ago:
- In the identifier-types branch I tried to turn an
Identifier
'stype
into asealed class
withProjectIdentifier
andPackageIdentifier
sub-classes. - In the multiple-identifier-classes branch the
Identifier
class itself is tried to be turned in asealed class
.
Both approaches had the goal to distinguish project and package ids, and to iterate over all known id types (e.g. for when
statements).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree with @fviernau that there are probably lots of places in the code which assume that identifiers are unique and it could be very hard to find all of them, and with @oheger-bosch that likely the WebApp report and probably other report formats would break.
There are also situations where the same package could appear twice with different metadata (e.g. if different repositories are used), I'm not sure if we currently fail in this case or just ignore it.
I think properly addressing this issue would be a big effort, another approach than the sealed class you suggested could also be to extend the identifier with properties like isProject
and provider details.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think properly addressing this issue would be a big effort
Indeed. That's why I never completed my branches mentioned above...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There are also situations where the same package could appear twice with different metadata (e.g. if different repositories are used), I'm not sure if we currently fail in this case or just ignore it.
I verify this. We have a repo in which ORT is failing because three or four packages appear twice due to different metadata.
…tion For some background information see [1], and specifically the NPM project / package at [2]. [1]: https://gitlab.eclipse.org/eclipsefdn/emo-team/eclipsefdn-ort/-/issues/56#note_1084923 [2]: https://git.eclipse.org/r/plugins/gitiles/cdt/org.eclipse.cdt/+/7c6bd5bdcb4cf8c31050aeeb102250a27a157728/qt/org.eclipse.cdt.qt.core/acorn-qml/package.json Signed-off-by: Sebastian Schuberth <sschuberth@gmail.com>
d9327fc
to
7b1c5c9
Compare
@@ -46,9 +46,18 @@ class AnalyzerResultBuilder { | |||
|
|||
fun build(excludes: Excludes = Excludes.EMPTY): AnalyzerResult { | |||
val duplicates = (projects.map { it.toPackage() } + packages).getDuplicates { it.id } | |||
require(duplicates.isEmpty()) { | |||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've got some story maybe worth sharing here: I recently scanned ORT with ORT locally,
it exited complaining in about duplicates in this function. I debugged it only quick and had to stop at some point due to other tasks. If I didn't make a mistake the finding was that there were multiple packages (not projects) with the same id
which were completely identical except for on attribute. IIRC it was the homepage URL.
That's just FYI, didn't look into this PR deeply yet .
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I also had the case of multiple packages with the same identifier. This happened when a package was referenced from multiple package managers using the same package type (for instance GoMod and GoDep). If the package metadata is not exactly the same, multiple Package
instances are created.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In my above case the full identifiers were identical, including the type
.
The change have been tested and is fine from our side |
Thanks for the feedback. However, this PR will more likely evolve into the direction of dropping the duplicate package in favor of only keeping the project. |
Please have a look at the individual commit messages for the details.