Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Bug] Enable Dependabot on the repository #4720

Closed
2 of 3 tasks
ppkarwasz opened this issue Jan 4, 2024 · 20 comments · Fixed by #4827
Closed
2 of 3 tasks

[Bug] Enable Dependabot on the repository #4720

ppkarwasz opened this issue Jan 4, 2024 · 20 comments · Fixed by #4827
Labels
bug Something isn't working dependencies Pull requests that update a dependency file license

Comments

@ppkarwasz
Copy link
Contributor

Search before asking

  • I had searched in the issues and found no similar issues.

Environment

Other

EventMesh version

Other

What happened

Since EventMesh has a lot of dependencies and the upgrades are done manually, the TAR distribution contains many libraries with known vulnerabilities.

For examples Jackson 2.13.0 is included multiple times in the distribution.

One way to solve these problems is to enable Dependabot (cf. documentation) or another dependency manager on the repository. Dependabot is highly configurable. You can upgrade every dependency or just those that are vulnerable.

How to reproduce

Check the contents of the distribution archive and look for Jackson 2.13.0.

Debug logs

No response

Are you willing to submit PR?

  • Yes I am willing to submit a PR!

Code of Conduct

@ppkarwasz ppkarwasz added the bug Something isn't working label Jan 4, 2024
@Pil0tXia
Copy link
Member

Pil0tXia commented Jan 5, 2024

We have many PRs submitted by dependabot, but we can't merge it directly, because it will cause dependency&license check fail. Do you have any suggestions?

@ppkarwasz
Copy link
Contributor Author

We have a merge-dependabot-reusable.yaml Github Actions script that merges Dependabot PR's automatically if they pass the tests. @vy, any ideas how to adapt this to EventMesh?

The main problem, as I see it that EventMesh has a binary distribution with third-party dependencies, while Log4j doesn't. Therefore your merge Dependabot script must be more complex to adhere to the Apache "Assembling LICENSE and NOTICE' policy.

I can try to enhance your check-dependencies.sh so that it does not fail if a dependency version changes. Most of the time (i.e. if the new version does not have new dependencies) it should be enough to merge the Dependabot PR automatically.

IIRC Apache Airflow also has a complex CI deployment to upgrade several Python packages a day. @potiuk, is there some part of your infrastructure that can be adapted to a Java project?

@vy
Copy link
Member

vy commented Jan 5, 2024

If I understand it right, for a particular dependency (e.g., Log4j), these four locations must be aligned:

  1. build.gradle
  2. tools/third-party-licenses/NOTICE#L626
  3. tools/dependency-check/known-dependencies.txt
  4. tools/third-party-licenses/licenses/java/LICENSE-log4j-api.txt

[Please correct me if I am missing any above.]

You can follow a strategy as follows:

  1. Break down the tools/third-party-licenses/NOTICE file into multiple tools/third-party-licenses/notices/*/NOTICE-*.txt files. This is similar to what you already have for licenses, i.e., tools/third-party-licenses/licenses/*/LICENSE-*.txt.
  2. Enrich known-dependencies.txt to contain the link patterns formatted by version to external NOTICE and LICENSE files. For instance, you can use the following link pattern for the Log4j NOTICE file: https://github.com/apache/logging-log4j2/blob/rel/<version>/NOTICE.txt.
  3. Implement a dependabot-specific GHA workflow such that if NOTICE and LICENSE files still do match and tests pass, update the version in build.gradle and known-dependencies.txt files.

If you are interested in, I can implement this as a contractor and contribute to EventMesh.

Disclaimer: I am a long time Log4j contributor and Apache Logging Services (log4j, log4cxx, etc.) PMC Chair.

@Pil0tXia
Copy link
Member

Pil0tXia commented Jan 5, 2024

@vy

these four locations must be aligned

another location shall be aligned: tools/third-party-licenses/LICENSE.

Thank you for proposing the solution. I find it feasible.

@xwm1992 @qqeasonchen , do you think we can further streamline the five files mentioned above?

@ppkarwasz
Copy link
Contributor Author

For licensing problems there is also an Apache Creadur Whisker project, although I don't know what is it worth.

@Pil0tXia
Copy link
Member

Pil0tXia commented Jan 9, 2024

@vy

In your experience and knowledge, do you think our LICENSE and dependency management are excessively configured?

For example, we attach a txt file for every third-party dependency (e.g., tools/third-party-licenses/licenses/java/LICENSE-log4j-api.txt). Is it necessary to declare the licenses of these dependencies again in tools/third-party-licenses/LICENSE?

Regarding dependency management, I haven't seen any other projects using the tools/dependency-check/known-dependencies.txt file. The RocketMQ project doesn't perform dependency checks, while Kafka uses a Gradle plugin with minimal configuration. Does this mean that the tools/dependency-check/known-dependencies.txt file is redundant?

@vy
Copy link
Member

vy commented Jan 9, 2024

In your experience and knowledge, do you think our LICENSE and dependency management are excessively configured?

@Pil0tXia, I wouldn't say so. There are no plug-n-play tooling to make the life easier for 1) Gradle/Maven application developers 2) that need to provide a binary distribution 3) complying with the ASF distribution requirements. Airflow, Kafka, etc. they all roll out their own solutions. These efforts are mostly manual, that is, people occasionally check the validity of these files and update them as they see fit.

Note that point (2) – that is, the fact that you distribute binaries – is crucial. For instance, Log4j doesn't distribute binaries containing dependencies. Hence, there, we don't have this problem. We only maintain a NOTICE manually, which is almost fixed for decades.

For example, we attach a txt file for every third-party dependency (e.g., tools/third-party-licenses/licenses/java/LICENSE-log4j-api.txt). Is it necessary to declare the licenses of these dependencies again in tools/third-party-licenses/LICENSE?

No, I don't think so.

Regarding dependency management, I haven't seen any other projects using the tools/dependency-check/known-dependencies.txt file. The RocketMQ project doesn't perform dependency checks, while Kafka uses a Gradle plugin with minimal configuration. Does this mean that the tools/dependency-check/known-dependencies.txt file is redundant?

I don't think so. You have license check as a part of the build. This is great. You also partially implemented a framework to automatically include the LICENSE, etc. files while creating a distribution. But updates to this framework are not automated yet.

In conclusion, I don't think you are over-engineering. I think you are trying to do the right thing. I will consult to members@apache.org on this matter and see if we can simplify the process before getting down to implement an automation pipeline around it.

@ppkarwasz
Copy link
Contributor Author

ppkarwasz commented Jan 9, 2024

For libraries that don't bundle their dependencies there is some automatic tooling: maven-apache-resources. It contains Velocity templates that generate META-INF/LICENSE, META-INF/NOTICE and META-INF/DEPENDENCIES files.

For example the last one in log4j-core looks like this:

From: 'com.github.luben'

  - zstd-jni (https://github.com/luben/zstd-jni) com.github.luben:zstd-jni:jar:1.5.5-11
    License: BSD 2-Clause License  (https://opensource.org/licenses/BSD-2-Clause)


From: 'Conversant Engineering' (http://engineering.conversantmedia.com)

  - com.conversantmedia:disruptor (https://github.com/conversant/disruptor) com.conversantmedia:disruptor:jar:1.2.15
    License: The Apache License, Version 2.0  (http://www.apache.org/licenses/LICENSE-2.0.txt)


From: 'FasterXML' (http://fasterxml.com)

  - Woodstox (https://github.com/FasterXML/woodstox) com.fasterxml.woodstox:woodstox-core:bundle:6.5.1
    License: The Apache License, Version 2.0  (http://www.apache.org/licenses/LICENSE-2.0.txt)
...

and we don't have to do anything to generate it (the maven-apache-parent contains the necessary logic).

I believe that for projects that bundle their dependencies a similar mechanism could be used. Notice files are more complex (but fortunately uncommon), but licenses IMHO could be listed in a file like the example above (and bundled in a single copy). IANAL

@vy
Copy link
Member

vy commented Jan 9, 2024

I have shared this issue in members@apache.org. (Note that it is only accessible by ASF members.)

@vy
Copy link
Member

vy commented Jan 9, 2024

@Pil0tXia, I would like to share some updates from the members@apache.org thread:

  1. You are not over engineering, you are doing the right thing.
  2. To my surprise, almost all projects manually collect and check the license information.
  3. You can indeed massively ease dependency updates: hook up to dependabot PRs, auto-merge the PR iff build succeeds and license of the new version matches with the one of the old. Otherwise, block the PR and add a PMC-review-needed label.

I figured that license information is available in pom.xmls and can easily be accessed/downloaded, no need for manually storing link patterns, etc. as I suggested earlier. No need to store licenses, etc. either. They can be download while creating the distribution.

I am still interested in this gig. Let me know what you think.

@Pil0tXia
Copy link
Member

@vy

I agree with your idea. Thank you very much for the insights you've provided.

When EventMesh creates a distribution, we execute the dist and installPlugin tasks from build.gradle. We can create a new task that automatically downloads LICENSE files, and then utilize its outcome in the dist task.

@Pil0tXia Pil0tXia added dependencies Pull requests that update a dependency file license labels Jan 10, 2024
@Pil0tXia
Copy link
Member

May we make some progress here? ☺️

@vy
Copy link
Member

vy commented Feb 20, 2024

@Pil0tXia, who is expected to deliver the work?

As I have stated earlier, I can implement this feature as a contractor – i.e., paid job. Yet I haven't been approached in this direction.

@Pil0tXia
Copy link
Member

@vy
I couldn't find the necessary configuration to allow modifications to the known-dependencies.txt file in dependabot.yml. Do you have any relevant reference materials on this? Thank you very much.

@Pil0tXia
Copy link
Member

I think dependabot does not support editing known-dependencies.txt file in GitHub Actions after reading all dependabot documents. I've decided to weaken the version number determination of check-dependencies.sh.

@ppkarwasz
Copy link
Contributor Author

ppkarwasz commented Apr 11, 2024

@Pil0tXia,

No, it doesn't support your known-dependencies.txt format, but it supports the standard libs.versions.toml file.

My approach to this issue would be to:

  1. Switch to the libs.versions.toml file for dependency versioning. This also requires removing the Spring dependency-management-plugin you are currently using,
  2. Manually create a "bundle" (in libs.versions.toml) that contains all the third-party dependencies that end up in your binary distribution. E.g.:
    [libraries]
    accessors-smart = { group = 'net.minidev', name = 'accessors-smart', version = '2.4.7' }
    alibabacloud-gateway-spi = { group = 'com.aliyun', name = 'alibabacloud-gateway-spi', version = '0.0.1' }
    ...
    
    [bundles]
    dist = ["accessors-smart", "alibabacloud-gateway-spi", ... ]
  3. Create a Groovy script that will use the data in the bundle to check that:
    • your binary distribution archive contains exactly the dependencies from the "bundle",
    • your binary distribution contains a license file for each dependency from the "bundle".
  4. Implement automatic merging of Dependabot PRs, which @vy implemented in Apache Logging Services.

This should be a rather safe setup since:

  • The TOML file contains structured data, so a Groovy script can easily deduce that a file called accessors-smart-2.4.7.jar corresponds to the accessors-smart entry, which should have a license file in licenses/net.minidev/LICENSE-accessors-smart.txt,
  • Dependabot will only update the "libraries" section of your file, it will never add or remove libraries from "bundles". So if a new transitive dependency appears or disappears, the Dependabot PR will fail, so you can manually add or remove a license file.

Remark: the main disadvantage of this setup is that you manage through Dependabot all your dependencies, both direct and transitive. You have about 300 deps, which means around 10-20 Dependabot PRs daily! The automatic merge is really required.

@ppkarwasz
Copy link
Contributor Author

An alternative approach would be through SBOM generation:

  1. You generate an SBOM for your binary distribution (there is a Gradle plugin for that),
  2. The CycloneDX format has all the data you need to determine which license applies to which component,
  3. You can create a script that generates your licensing data.
    According to the INFRA guide on licensing you don't need a file for each dependency, but you need to modify the main LICENSE file in your binary distribution to mention the licenses of all the bundled dependencies. Therefore the structure of your binary distribution could look like:
    LICENSE
    NOTICE
    licenses/
        Apache-2.0.txt
        BSD-3-Clause.txt
    
    where LICENSE is generated and looks like:
    <text of Apache 2.0 license>
    
    This distribution contains the following third-party:
    
    lib/accessors-smart-2.4.7.jar licensed under 'Apache-2.0'. For details see: licenses/Apache-2.0.txt
    lib/super-widget-1.2.3.jar licensed under 'BSD-3-Clause'. For details see: licenses/BSD-3-Clause.txt
    

@Pil0tXia
Copy link
Member

Pil0tXia commented Apr 12, 2024

I found that the plugin https://github.com/jk1/Gradle-License-Report has been introduced in the project to generate license reports. However, license files are exported to the build/reports/dependency-license path only if the third-party dependency jar package contains the LICENSE file. It exports 139 licenses, far fewer than the 306 third-party dependencies the project is currently using.

While the html report it generates declares 300+ dependencies in full, it does not provide the contents of the LICENSE file for each one. Trying to generate a LICENSE file for a project using html as a data source is difficult.

The existing LICENSE file for the project has only 200+ dependencies in it, which is also incomplete. It seems that this plugin is hardly a help for generating LICENSE files.

However, the json file generated by cyclonedx-gradle-plugin is too huge. I am writing gradle scripts based on its data format.

#4827 mainly addresses the dependencies problem. I'll address license problem in another PR.

@Pil0tXia
Copy link
Member

@ppkarwasz

You have about 300 deps, which means around 10-20 Dependabot PRs daily! The automatic merge is really required.

Hi, I've made several attempts at apache/eventmesh-dashboard#122, apache/eventmesh-dashboard#128 and apache/eventmesh-dashboard#130. It looks like the most we can do is auto-approve a Dependabot PR, but can't do merge it.

I looked at merge-dependabot-reusable.yaml. It deals with git, bypassing GitHub's PR mechanism in favor of committing directly on the master branch.

I'm not sure if the GPG_SECRET_KEY key exists in our repository. This key seems to need to be manually requested from infra before it can be used.

@ppkarwasz
Copy link
Contributor Author

@Pil0tXia,

The GPG_SECRET_KEY most certainly does not exist in your repo, it must be generated and added manually by INFRA. See our INFRA-23996 for how our key was set up, you can reference it in your INFRA ticket. Our GPG key is assigned to the Apache Logging Services PMC. For security reasons we can only access it via Github Actions, but we can invalidate the key if it is abused. The key is somehow bound to the asf-rm GitHub user and the commits it creates appear to be generated by that bot account.

The automation of Dependabot merges is not so trivial: it took @vy a week or two of full-time work to figure out all the details (fortunately we had a sponsor for that). I was than able to reproduce it in half a day in copernik-eu/log4j-tomcat/.github/workflows/merge-dependabot.yaml.

The basic idea is that we follow the manual procedure to merge PRs:

  1. We merge/cherry-pick/rebase ${{ github.head_ref }} (e.g. refs/heads/dependabot/gradle/org.mongodb-mongodb-driver-3.12.14) with ${{ github.base_ref }} (e.g. refs/heads/master),
  2. We push the commit to ${{ github.head_ref }} (might require -f) to update the PR,
  3. After a couple of seconds we push the same commit to the master branch ${{ github.base_ref }},
  4. GitHub realizes that ${{ github.head_ref }} and ${{ github.base_ref }} now point to the same commit and marks the PR as merged.

There are some pitfalls you should be aware of:

  • The workflow must be triggered by the pull_request_target, because workflows triggered by pull_request don't have write access to the Git repo.
  • The pull_request_target event is special: the github.ref is the same as github.base_ref (e.g. refs/heads/master). Since most actions implicitly use github.ref, you could end up running your build and tests on the master branch instead of the PR branch. To fix this I had to add an explicit ref parameter to actions/checkout:
    - name: Checkout repository
      uses: actions/checkout@9bb56186c3b09b4f86b1c65136769dd318469633   # 4.1.2
      with:
        # When running on `pull_request_target` use the PR branch, not the target branch
        ref: ${{ github.event_name == 'pull_request_target' && github.head_ref || github.ref }}
  • The checkout action by default does a shallow clone: only the last commit is downloaded so Git can not always find the branching point where the master branch and PR branch diverged. Usual merge/cherry-pick/rebase commands might not work unless you change the depth parameter of the action. Volkan works around this by downloading the PR as patch and applies it to the master branch:
    - name: Download and apply patch
      shell: bash
      env:
        PATCH_URL: ${{ github.event.pull_request.patch_url }}
       run: |
         wget -O- "$PATCH_URL" | git apply
    In the workflow of my personal repo I increase depth to an arbitrary 32 commit (I could also set it to 0 to make a complete clone.
  • The checkout action only clones one branch. If you need two branches for a merge/cherry-pick/rebaseoperation, you need to callgit fetch` explicitly.

Debugging tips

It is usually easier to test automatic Dependabot merges in your personal fork, since you can add your GPG key as repository secret.

If you run out of Dependabot PRs, you can:

mxsm pushed a commit that referenced this issue May 17, 2024
* Sync changes in #4719

* minor change

* Only keep the artifact name

* Run `sed -i 's/-[0-9].*\.jar//g'`

* Run `sort known-dependencies.txt | uniq > known-dependencies-unique.txt`

* Allow CI to run on branches with namespace in the branch name in forked repos

* Correct typo and remove useless command

* Use `sort -u -o` instead of `uniq` to remove duplicate artifacts with different version

* Enlarge open-pull-requests-limit

* minor: polish tips

* Test apache/skywalking-eyes/dependency CI result

* Fix 'unable to find version `0.6.0`'

* See debug log to prove it works

* skywalking-eyes/dependency doesn't support gradle, test basic actions/dependency-review-action

* Add all denied licenses

* Remove redundant check

* Remove not included SPDX: ASL, RSAL

* Add a useful printAllDependencyTrees task

* Exampt safe artifact under multiple licenses

* Exempt more safe artifacts (Looks like the last of them)

* 'allow-dependencies-licenses' attribute only supports single-line text

* Add a TODO comment

* Add more file extensions for checkstyle

* Resolve some checkstyle header violations

* Add back apache/skywalking-eyes

* Fix downloaded file didn't have a `.`

* Disable Go deps update & Must pass CI before merge

* No need to force up-to-date & Auto-approve only

* Remove the slash at the end of the homepage url in Repo GitHub desc

* Skip patch updates temporarily to reduce PR noise

* Logback removed after be06ef7

* Accept patch update

* Submit dependency graph

* Follow https://github.com/gradle/actions/blob/main/docs/dependency-submission.md#usage-with-pull-requests-from-public-forked-repositories

* try to sort dependency graph workflow exec seq

* `workflow_run` event will only trigger a workflow run if the workflow file is on the default branch

* Grant required permission of CodeQL

* Attempt to fix 'No dependency graph files found to submit'

* Attempt to fix 'No dependency graph files found to submit' try 2

* Attempt to fix 'No dependency graph files found to submit' try 3

* Attempt to fix 'No dependency graph files found to submit' try 4

* Try to check dependency-review

* Only check bundled dependencies

* Fix 'No snapshots were found for the head SHA' attempt 1

* Test runtimeClasspath dependencies

* Revert "Test runtimeClasspath dependencies"

This reverts commit 3de89a5.

* Try to retry 1 hr wo wait for snapshot update

* Test gradle/actions#196 (comment)

* Add todo comments

* Keep implementation and compileOnly for now

* Keep runtimeOnly deps

* [Breaking Change] Remove dependency-review-action and wait for its bugfix

* Add checkDeniedLicense into CI

* minor code optimization
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working dependencies Pull requests that update a dependency file license
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants