bandersnatch size_project_metadata plugin causes some packages to not sync - e.g. pip + falcon #1169

Open
J-Phi1123 opened this issue Aug 5, 2022 · 10 comments
Labels
bug (Something isn't working), help wanted (Extra attention is needed)

Comments

@J-Phi1123

To begin with, I'm pretty sure this is user error, but I can't figure out what I'm doing wrong. Whatever is causing it is not obvious, and bandersnatch is not very helpful in identifying the issue. Thanks in advance for any support in fixing this. I have been messing with this for over 2 weeks now and am almost done with all of it.

I am trying to create a complete offline pip repo and it seems to be working, but of course, out of the thousands of packages that are online, two are not being updated: pip and falcon.

I see "^pip\ " and "^falcon\ " names and many other files in the "todo" file after bandersnatch mirror --force-check runs.

If I try to run bandersnatch sync falcon, falcon is still not present in pip/pypi/web/simple/falcon.

I recently turned on json = true and reran bandersnatch mirror --force-check. It created the json folder, which does not contain a falcon or pip file.

I am currently running bandersnatch verify now that I have a json folder, which I guess will take a few days to finish, so unfortunately I can't run bandersnatch sync --debug falcon right now. From memory, the only thing that seemed different when running with --debug was that it mentioned filter rules: filter and file filter. Definitely nothing about being unable to download anything. It seems to think the files were already downloaded?

Specs:
bandersnatch 5.2.0
OS: ubuntu 20.04
syncing to external ext4 drive

Config:
'''
[plugins]
enabled =
    size_project_metadata
[size_project_metadata]
max_package_size = 100M
[mirror]
directory = /media/user/ExternalEXT4/pip/pypi
json = true
release-files = true
cleanup = false
master = https://pypi.org
timeout = 10
global-timeout = 1800
workers = 3
hash-index = false
stop-on-error = false
storage-backend = filesystem
verifiers = 3
compare-method = hash
diff-file = /media/user/ExternalEXT4/pip/pypi/mirrored-files
'''

@cooperlees added the bug and help wanted labels Aug 5, 2022
@cooperlees
Contributor

Will try and look into this over the weekend and see if I can reproduce ...

@J-Phi1123
Author

Thanks for the help.

@cooperlees
Contributor

So I was able to repro this using the size_project_metadata plugin ... so the bug is in there ...

Debug run with plugin enabled:

crl-m1:~ cooper$ /tmp/tb/bin/bandersnatch -c /tmp/pypi/bandersnatch.conf --debug sync falcon 2>&1 | tee /tmp/bander_sync_falcon_debug
2022-08-06 18:50:11,894 DEBUG: Checking config for storage backend... (configuration.py:121)
2022-08-06 18:50:11,894 DEBUG: Found storage backend in config! (configuration.py:123)
2022-08-06 18:50:11,895 INFO: Selected storage backend: filesystem (configuration.py:129)
2022-08-06 18:50:11,895 DEBUG: Checking config for compare method... (configuration.py:161)
2022-08-06 18:50:11,895 DEBUG: Found compare method in config! (configuration.py:163)
2022-08-06 18:50:11,895 INFO: Selected compare method: hash (configuration.py:175)
2022-08-06 18:50:11,895 DEBUG: Checking config for alternative download mirror... (configuration.py:178)
2022-08-06 18:50:11,895 DEBUG: No alternative download mirror found in config. (configuration.py:183)
2022-08-06 18:50:11,895 DEBUG: Skip checking download-mirror-no-fallback because dependent optionis not set in config. (configuration.py:203)
2022-08-06 18:50:11,950 DEBUG: Initializing Master's aiohttp ClientSession (master.py:79)
2022-08-06 18:50:11,977 INFO: Initialized metadata plugin size_project_metadata to block projects > 104857600 bytes (metadata_filter.py:232)
2022-08-06 18:50:11,983 DEBUG: Adding json directories to bootstrap (mirror.py:536)
2022-08-06 18:50:11,983 INFO: Setting up mirror directory: /tmp/pypi/web/simple (mirror.py:546)
2022-08-06 18:50:11,984 INFO: Setting up mirror directory: /tmp/pypi/web/packages (mirror.py:546)
2022-08-06 18:50:11,984 INFO: Setting up mirror directory: /tmp/pypi/web/local-stats/days (mirror.py:546)
2022-08-06 18:50:11,984 INFO: Setting up mirror directory: /tmp/pypi/web/json (mirror.py:546)
2022-08-06 18:50:11,984 INFO: Setting up mirror directory: /tmp/pypi/web/pypi (mirror.py:546)
2022-08-06 18:50:11,984 DEBUG: Retrieving FileLock instance @ /tmp/pypi/.lock (filesystem.py:36)
2022-08-06 18:50:11,984 DEBUG: Acquiring FLock with timeout: 1 (mirror.py:551)
2022-08-06 18:50:11,984 INFO: Generation file missing. Reinitialising status files. (mirror.py:586)
2022-08-06 18:50:11,985 DEBUG: Modifying destination: /tmp/pypi/generation with: /tmp/pypi/generation.m6ggg53h (filesystem.py:122)
2022-08-06 18:50:11,985 INFO: Status file /tmp/pypi/status missing. Starting over. (mirror.py:608)
2022-08-06 18:50:11,985 INFO: Syncing with https://pypi.org. (mirror.py:59)
2022-08-06 18:50:11,985 INFO: No release filters are enabled. Skipping release filtering (mirror.py:80)
2022-08-06 18:50:11,985 INFO: No release file filters are enabled. Skipping release file filtering (mirror.py:82)
2022-08-06 18:50:11,985 DEBUG: Package syncer 0 started for duty (mirror.py:127)
2022-08-06 18:50:11,985 INFO: Fetching metadata for package: falcon (serial 0) (package.py:58)
2022-08-06 18:50:11,985 DEBUG: Getting /pypi/falcon/json (serial 0) (master.py:146)
2022-08-06 18:50:12,005 DEBUG: Package syncer 1 started for duty (mirror.py:127)
2022-08-06 18:50:12,005 DEBUG: Package syncer 1 emptied queue (mirror.py:134)
2022-08-06 18:50:12,005 DEBUG: Package syncer 2 started for duty (mirror.py:127)
2022-08-06 18:50:12,005 DEBUG: Package syncer 2 emptied queue (mirror.py:134)
2022-08-06 18:50:12,307 DEBUG: Package syncer 0 emptied queue (mirror.py:134)
2022-08-06 18:50:12,307 INFO: Generating global index page. (mirror.py:486)
2022-08-06 18:50:12,308 DEBUG: Writing temporary file /tmp/pypi/web/simple/.index.html.x64odmpl to target destination: /tmp/pypi/web/simple/index.html (filesystem.py:93)
2022-08-06 18:50:12,308 DEBUG: Closing Master's aiohttp ClientSession and waiting 0.1 seconds (master.py:99)
2022-08-06 18:50:12,410 INFO: 0 packages had changes (mirror.py:1051)
2022-08-06 18:50:12,410 INFO: Writing diff file to /tmp/pypi/mirrored-files (mirror.py:1061)

So I disabled the plugin and falcon downloaded fine. Cmd: /tmp/tb/bin/bandersnatch -c /tmp/pypi/bandersnatch.conf --debug sync falcon

Full repro commands

mkdir /tmp/pypi
vim /tmp/pypi/bandersnatch.conf
- Changed dirs to be based out of /tmp/pypi
python3.10 -m venv /tmp/tb --upgrade-deps
/tmp/tb/bin/pip install bandersnatch==5.2.0

So we'd need to add more debugging info to the plugin code and the plugin-calling code to see what exactly is making it skip this package as a whole. Fixes welcome; I'm low on time to dig in and fix this plugin. As plugins are optional, I generally rely on contributions for them. I focus more on keeping core bandersnatch functional (I don't use bandersnatch, haven't for years, and would really love to get a new maintainer).
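
A rough sketch of the kind of size diagnostic that could help here. It is not bandersnatch code: the helper names `parse_size`, `summarize_sizes` and `log_size_decision` are hypothetical, and the binary suffix handling is an assumption chosen so that `100M` maps to the 104857600 bytes reported in the log above. It assumes the metadata has the shape of the PyPI JSON document (`releases` mapping each version to a list of files, each with a `size` in bytes).

```python
import logging

logger = logging.getLogger("size_debug")  # hypothetical logger name

# Assumed binary multipliers, so that "100M" == 104857600 bytes.
_SUFFIXES = {"K": 1024, "M": 1024**2, "G": 1024**3, "T": 1024**4}


def parse_size(text: str) -> int:
    """Convert a setting such as '100M' or '1G' into a byte count."""
    text = text.strip().upper()
    if text and text[-1] in _SUFFIXES:
        return int(text[:-1]) * _SUFFIXES[text[-1]]
    return int(text)


def summarize_sizes(metadata: dict) -> dict:
    """Collect per-file sizes from PyPI-style JSON metadata."""
    sizes = [
        release_file.get("size", 0)
        for files in metadata.get("releases", {}).values()
        for release_file in files
    ]
    return {
        "files": len(sizes),
        "largest": max(sizes, default=0),
        "total": sum(sizes),
    }


def log_size_decision(name: str, metadata: dict, max_package_size: str) -> dict:
    """Log the numbers a size filter would need in order to explain itself."""
    summary = summarize_sizes(metadata)
    summary["limit"] = parse_size(max_package_size)
    logger.debug(
        "%s: %d files, largest=%d bytes, total=%d bytes, limit=%d bytes",
        name,
        summary["files"],
        summary["largest"],
        summary["total"],
        summary["limit"],
    )
    return summary
```

Feeding it the JSON fetched from `/pypi/falcon/json` with `max_package_size = "100M"` would show whether a single file or the project as a whole exceeds the limit.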

@J-Phi1123
Author

J-Phi1123 commented Aug 7, 2022 via email

@cooperlees changed the title from "bandersnatch does not grab all packages. Todo list is left populated after the bandersnatch mirror" to "bandersnatch size_project_metadata plugin causes some packages to not sync - e.g. pip + falcon" Aug 8, 2022
@J-Phi1123
Author

J-Phi1123 commented Aug 9, 2022

Took your advice and used the pypistats tool to generate a list of large projects. Seems to be working great.

It would seem that the plugin looks at the size of all files in the package: if the sum of the bytes of all versions of a package is greater than what you specify, it doesn't grab any of them. I was thinking the limit applied to each .whl or .tar.gz individually, because I did see a few pip .whl files that were >100MB, so I used that number thinking it was a sane maximum. The plugin recommended 1GB, but even that has been blocking packages I did want.
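
To make the difference concrete, here is a small sketch comparing a per-file limit with a per-project (summed) limit. It is based only on the behaviour described in this comment, not on the plugin source. The function names and the synthetic `example` metadata are made up for illustration; the metadata shape follows the PyPI JSON layout (`releases` mapping each version to a list of files with a `size` field).

```python
def total_project_size(metadata: dict) -> int:
    """Sum the size of every file across every release of a project."""
    return sum(
        release_file.get("size", 0)
        for files in metadata.get("releases", {}).values()
        for release_file in files
    )


def largest_file_size(metadata: dict) -> int:
    """Size of the single largest release file (0 if there are none)."""
    return max(
        (
            release_file.get("size", 0)
            for files in metadata.get("releases", {}).values()
            for release_file in files
        ),
        default=0,
    )


LIMIT = 100 * 1024 * 1024  # 100M, i.e. 104857600 bytes
FIVE_MIB = 5 * 1024 * 1024

# Synthetic project: 30 releases, each with one 5 MiB sdist.
example = {
    "releases": {
        f"1.{i}.0": [{"filename": f"demo-1.{i}.0.tar.gz", "size": FIVE_MIB}]
        for i in range(30)
    }
}

# Summed over all releases the project is 150 MiB, so a per-project
# limit of 100M excludes it entirely.
blocked_per_project = total_project_size(example) > LIMIT

# No single file is anywhere near 100M, so a per-file limit would keep it.
blocked_per_file = largest_file_size(example) > LIMIT
```

With a summed limit, a long-lived project with many modest releases can exceed 100M even though every individual file is small, which would fit what happened with pip and falcon.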

Anyways, Thanks again for the help and keep up the good work. You guys Rock!

@cooperlees
Contributor

Thanks for digging in and explaining why things happened.

@cooperlees reopened this Aug 16, 2022
@cooperlees
Contributor

I think we should advertise that we'd love a fix for the size_project_metadata plugin + it's a known issue.

@J-Phi1123
Author

J-Phi1123 commented Aug 16, 2022 via email

@cooperlees
Contributor

Oh, so it SUM()s the whole project. I'll check if I can make the documentation clearer then :) Because I didn't get that from reading it either, or I missed it. Thanks for clearing that up too.

@J-Phi1123
Author

J-Phi1123 commented Aug 16, 2022 via email
