Get clean code_repository url from pypi "project_urls" data #1156

Open
manasaV3 opened this issue Jul 17, 2023 · 8 comments
Labels
backend bug-fix Release Label: Used for categorizing bug fixes in automated CI release notes

Comments

@manasaV3
Collaborator

To get the code repository, we currently check whether there is a Source Code field in the project_urls section of the PyPI response. If the Source Code field does not exist, we iterate through the URLs and take the first one that is a GitHub URL. In those cases, we may pick a URL that isn't the best fit.

For example, look at the snippet below:

"project_urls": {
    "documentation": "https://github.com/imagej/napari-imagej#README.md",
    "download": "https://pypi.org/project/napari-imagej/",
    "homepage": "https://github.com/imagej/napari-imagej",
    "source": "https://github.com/imagej/napari-imagej",
    "tracker": "https://github.com/imagej/napari-imagej/issues"
}

Here, the code repository gets set to "https://github.com/imagej/napari-imagej#README.md". This leads to incorrect fetching of data from GitHub.

We should parse the URL and strip any fragment (anchor) or query parameters before using it as the code repository.

Also, we should check whether the source field exists in project_urls before iterating over the URLs in that section.
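
The two changes proposed above could look roughly like the sketch below. It is not the hub's actual backend code: the function names and the SOURCE_KEYS labels are assumptions made for illustration, and the fallback loop only mirrors the "first GitHub URL" behavior described in this issue.

```python
from typing import Mapping, Optional
from urllib.parse import urlsplit, urlunsplit

# Assumed labels for the explicit source entry; project_urls labels are
# free-form, so they are compared case-insensitively.
SOURCE_KEYS = ("source code", "source")


def strip_fragment_and_query(url: str) -> str:
    """Drop the '?query' and '#fragment' parts of a URL and keep the rest."""
    parts = urlsplit(url)
    return urlunsplit((parts.scheme, parts.netloc, parts.path, "", ""))


def pick_code_repository(project_urls: Mapping[str, str]) -> Optional[str]:
    """Prefer an explicit source entry, else fall back to the first GitHub URL."""
    by_label = {label.strip().lower(): url for label, url in project_urls.items()}
    for key in SOURCE_KEYS:
        if key in by_label:
            return strip_fragment_and_query(by_label[key])
    # Fallback described in the issue: first URL hosted on GitHub.
    for url in project_urls.values():
        if urlsplit(url).netloc.lower() in ("github.com", "www.github.com"):
            return strip_fragment_and_query(url)
    return None
```

With the napari-imagej snippet above, the "source" entry is found first, so the "documentation" URL with its #README.md fragment is never considered.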

@manasaV3 manasaV3 added backend bug-fix Release Label: Used for categorizing bug fixes in automated CI release notes labels Jul 17, 2023
@bnelson-czi

I went through and found every plugin with a malformed or missing code_repository. This includes repos whose GitHub URL is technically well-formed but whose repo doesn't exist (maybe it got moved or taken down).

ERROR WHEN REQUESTING napari-bud-cell-segmenter REPO: https://github.com/AurelienMaillot/napari-bud-cell-segmenter
ERROR WHEN PARSING napari-nibabel REPO: None
ERROR WHEN PARSING napari-hello REPO: None
ERROR WHEN PARSING napari-svetlana REPO: https://bitbucket.org/koopa31/napari_svetlana/src/main/
ERROR WHEN PARSING napari-pram REPO: None
ERROR WHEN REQUESTING napari-imagej REPO: https://github.com/imagej/napari-imagej#README.md
ERROR WHEN PARSING napari-bioimageio REPO: None
ERROR WHEN PARSING eats-worm REPO: None
ERROR WHEN REQUESTING recOrder-napari REPO: https://github.com/mehta-lab/recOrder/tree/main/recOrder
ERROR WHEN REQUESTING napari-indices REPO: https://github.com/Emmanulla0/napari-indices
ERROR WHEN REQUESTING allencell-ml-segmenter REPO: https://github.com/AllenCell/allencell-ml-segmenter
ERROR WHEN PARSING napari-mri REPO: None
ERROR WHEN PARSING faser REPO: None
ERROR WHEN PARSING napari-annotate REPO: None
ERROR WHEN PARSING napari-labelprop REPO: None
ERROR WHEN PARSING napari-nanopyx REPO: None
ERROR WHEN REQUESTING misic-napari REPO: https://github.com/pswap/misic
ERROR WHEN REQUESTING napari-stl-exporter REPO: https://github.com/jo-mueller/napari-stl-exporter.git
ERROR WHEN PARSING napari-tomodl REPO: None
ERROR WHEN PARSING napari-tissuumaps REPO: None
ERROR WHEN PARSING avidaq REPO: None
ERROR WHEN PARSING napari-apr-viewer REPO: None
ERROR WHEN PARSING grabber-ift REPO: None
ERROR WHEN PARSING napari-proofread-brainbow REPO: None
ERROR WHEN PARSING imaxt-multiscale-plugin REPO: None
ERROR WHEN PARSING napari-live-flim REPO: None
ERROR WHEN PARSING napari-microscope REPO: None
ERROR WHEN PARSING napari-hough-circle-detector REPO: None
ERROR WHEN REQUESTING napari-laptrack REPO: https://github.com/haesleinhuepf/napari_laptrack
ERROR WHEN PARSING napari-kics REPO: None
ERROR WHEN PARSING multireg REPO: https://gitlab.pasteur.fr/gletort/multireg
ERROR WHEN PARSING napari-mzarr REPO: None
ERROR WHEN PARSING napari-tomocube-data-viewer REPO: None
ERROR WHEN REQUESTING napari-data-preview REPO: https://github.com/WyssCenter/napari-data-preview
ERROR WHEN PARSING napari-mclabel REPO: https://gitlab.cs.fau.de/xo04syge/mclabel
ERROR WHEN PARSING okapi-em REPO: None

@richaagarwal
Collaborator

I didn't realize we try to guess the code_repository in this way. @manasaV3 What are your thoughts on being stricter here and only using a value if it is explicitly provided by the plugin developer under the Source code field, rather than iterating through the remaining URLs?

Regarding when a code_repository value is provided but is not valid (e.g. perhaps the repo has since gone private or moved/renamed), I think we could do a check on it to see if the response is 200 or not and otherwise treat the value as null. Thoughts on that?
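
A minimal sketch of that check, using only the Python standard library. The helper names are hypothetical and the status lookup is injectable so the logic can be exercised without network access. Note that urlopen follows redirects, so a URL that redirects to a live page still counts as 200 here.

```python
import urllib.error
import urllib.request
from typing import Callable, Optional


def http_status(url: str, timeout: float = 5.0) -> Optional[int]:
    """Return the status of a HEAD request, or None if the request fails outright."""
    try:
        request = urllib.request.Request(url, method="HEAD")
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        # 4xx/5xx responses surface as HTTPError; the code is still useful.
        return err.code
    except (ValueError, OSError):
        # Malformed URL, DNS failure, timeout, refused connection, etc.
        return None


def reachable_or_none(
    url: Optional[str],
    get_status: Callable[[str], Optional[int]] = http_status,
) -> Optional[str]:
    """Keep the URL only if it responds with 200; otherwise treat it as null."""
    if url and get_status(url) == 200:
        return url
    return None
```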

@richaagarwal
Collaborator

Hmm, thinking about this some more, it seems like we expect a very specific format for this field, https://github.com/{org_name}/{repo_name}, and would want to avoid displaying data for a plugin if its URL doesn't match that pattern. For instance, for https://www.napari-hub.org/plugins/recOrder-napari, their source code field is set to https://github.com/mehta-lab/recOrder/commits/main.

It seems like we should at least validate the URL pattern in these cases - curious to get your thoughts @manasaV3

@manasaV3
Collaborator Author

I agree. We could even parse the URL to remove anything outside of https://github.com/{org_name}/{repo_name}.
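
One way to do that cleaning is sketched below. It is an illustration under assumptions, not the hub's implementation: the function name is made up, the name pattern only approximates GitHub's naming rules, and NON_OWNER_SEGMENTS is a deliberately incomplete list of first path segments that are GitHub pages rather than repository owners.

```python
import re
from typing import Optional
from urllib.parse import urlsplit

# Illustrative and incomplete: first path segments that are not repo owners.
NON_OWNER_SEGMENTS = {"orgs"}

# Approximation of the characters allowed in owner and repo names (assumption).
_NAME = re.compile(r"[A-Za-z0-9_.-]+")


def clean_github_repo_url(url: Optional[str]) -> Optional[str]:
    """Reduce a GitHub URL to https://github.com/{org_name}/{repo_name}, or return None."""
    if not url:
        return None
    try:
        parts = urlsplit(url.strip())
    except ValueError:
        return None
    if parts.scheme not in ("http", "https"):
        return None
    if parts.netloc.lower() not in ("github.com", "www.github.com"):
        return None
    # Fragment and query are ignored; only the first two path segments matter.
    segments = [segment for segment in parts.path.split("/") if segment]
    if len(segments) < 2:
        return None
    org, repo = segments[0], segments[1]
    if repo.endswith(".git"):
        repo = repo[: -len(".git")]
    if not repo or org.lower() in NON_OWNER_SEGMENTS:
        return None
    if not (_NAME.fullmatch(org) and _NAME.fullmatch(repo)):
        return None
    return f"https://github.com/{org}/{repo}"
```

A None result can then be handled the same way as a missing code repository. Non-GitHub hosts such as the Bitbucket and GitLab URLs in the list above also come back as None under this sketch.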

@richaagarwal
Collaborator

richaagarwal commented Aug 10, 2023

@manasaV3 That sounds good to me. I'm realizing that for both validating and cleaning there would be some edge cases, e.g. if someone were to provide a source code URL that's a github URL but something unrelated like https://github.com/orgs/chanzuckerberg/projects/12/views/26. But that seems unlikely & would still be better than current behavior.

Regardless, let's plan to validate the URL to the best of our ability and then handle it the same way we do for null code repository values*.

*This is what we currently display in that case, though I wonder if we would want to tweak the language slightly 🤔
[Screenshot, 2023-08-10 12:35 PM: the text currently displayed for a null code repository]

@richaagarwal richaagarwal added the P0 Critical Priority label Aug 10, 2023
@richaagarwal
Collaborator

richaagarwal commented Aug 10, 2023

Also, I'm marking this a P0 because it results in displaying inaccurate data for plugins. At a minimum, we should not display the data at all if we don't have a correctly formatted URL, which validation should help with.

cc @junxini for awareness

@manasaV3
Collaborator Author

With this issue, we have identified cases where a plugin has a code_repository URL but no valid maintenance stats, which can happen for several reasons.

To handle those cases, we need additional validation on the front end. Before rendering the visualizations for maintenance, we should verify 2 conditions.

  1. the code_repository in the plugin response is not null.
  2. the maintenance.stats.total_commits > 0 or the maintenance.stats.latest_commit_timestamp is not null.

If either of the 2 conditions is false, we should display the text copy informing the user that the commit history stats are unavailable (the copy referenced here).

Reasoning:
A repository may legitimately have had no commits in the last 12 months. But if it has a total of 0 commits, or if we are unable to fetch the latest_commit_timestamp for it, that can only be because the repo was inaccessible.
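
The two conditions can be written as a single predicate. The sketch below is in Python purely to show the logic; the front end would express it in its own language. The function name is hypothetical, the nested response shape is inferred from the field paths named above, and the check follows the two conditions exactly as listed.

```python
from typing import Any, Mapping


def should_render_maintenance(plugin: Mapping[str, Any]) -> bool:
    """True only if both conditions hold; otherwise show the 'stats unavailable' text."""
    # Condition 1: code_repository is not null.
    if plugin.get("code_repository") is None:
        return False
    # Tolerate a missing or null maintenance/stats object.
    stats = (plugin.get("maintenance") or {}).get("stats") or {}
    total_commits = stats.get("total_commits") or 0
    latest_commit = stats.get("latest_commit_timestamp")
    # Condition 2: total_commits > 0 or latest_commit_timestamp is not null.
    return total_commits > 0 or latest_commit is not None
```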

@richaagarwal
Collaborator

Removing P0 from this and tracking the bug itself in #1220

Projects
Status: Backlog
Development

No branches or pull requests

3 participants