Unexpected error occurs and "generating global index page..." takes too much time #1108

Open
r00t1900 opened this issue Apr 7, 2022 · 2 comments

r00t1900 commented Apr 7, 2022

Case

The console output logs:

...
2022-04-07 08:08:54,922 INFO: yolkfolk no longer exists on PyPI (package.py:65)
2022-04-07 08:08:55,083 INFO: yuijfish no longer exists on PyPI (package.py:65)
2022-04-07 08:08:55,401 INFO: yuij-xiaoxiaolog no longer exists on PyPI (package.py:65)
2022-04-07 08:08:55,566 INFO: zju-hitcarder-xuhao no longer exists on PyPI (package.py:65)
2022-04-07 08:08:55,567 INFO: Generating global index page. (mirror.py:483)
2022-04-07 09:31:02,978 INFO: New mirror serial: 13353580 (mirror.py:507)
2022-04-07 09:31:03,218 INFO: 0 packages had changes (mirror.py:1043)
2022-04-07 09:31:03,218 INFO: Writing diff file to mirrored-files (mirror.py:1053)

From the logs, we can see that bandersnatch took almost 90 minutes to generate the global index page even though 0 packages had changes. This often happens when the bandersnatch run hits an error like:

2022-04-07 00:00:37,509 ERROR: Error syncing package: pl-nightly@13343947 (mirror.py:363) 
...

After this error happens, bandersnatch goes straight to generating the global index page and then finishes the run. However, you need to rerun bandersnatch for another "generating global index page" operation (I don't know why) to remove the todo file before it can resume a normal status.

Questions

So here I have some questions:

  • I have set stop-on-error = false (see the config sketch after this list), but why does bandersnatch still make a stop-like action when ERROR: Error syncing package appears?
  • I have set download-mirror, but recently bandersnatch often gives hints like "conducting to next uri" and then downloads from "https://files.pythonhosted.org", which is much slower. Why does this happen?
  • In my previous issue, one of the developers instructed me to add "generating_global_index=True" to avoid executing "generating global index page" every time. However, it did not work, since I don't know where exactly I should add this parameter.
  • I have now reached about 9.0 TB of data, and I can see that the reason my previous download was only 8.61 TB is that bandersnatch errored and stopped. Because of some network errors, bandersnatch goes into a weird loop and jumps straight to "generating global index page", which makes me think it has reached the end. However, this is just a false end.
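For reference, a minimal sketch of how those two settings sit in the [mirror] section of bandersnatch.conf; the directory path and download-mirror URL below are placeholders, not the values I actually use:

```ini
# Minimal sketch of bandersnatch.conf; directory and download-mirror are placeholder values.
[mirror]
directory = /srv/pypi
master = https://pypi.org
# keep syncing the remaining packages when a single package errors
stop-on-error = false
# preferred file host; downloads can still fall back to https://files.pythonhosted.org
download-mirror = https://mirror.example.com/pypi
```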
r00t1900 commented Apr 7, 2022

One more thing:

  • If I would like to ignore prerelease files, what should I do? I've noticed that there is a prerelease plugin; does this plugin enable or disable prerelease downloads? What I need is to block prerelease downloads. Could someone give me an explanation? Thanks.

cooperlees commented Apr 8, 2022

You're an inquisitive one ...

I have set stop-on-error = false, but why does bandersnatch still make a stop-like action when ERROR: Error syncing package appears?

When we error, we still log it. Maybe that's the confusion here.

I have set download-mirror, but recently bandersnatch often gives hints like "conducting to next uri" and then downloads from "https://files.pythonhosted.org/", which is much slower. Why does this happen?

If the mirror you set does not have the file, I'm pretty sure we fall back. I'd have to read the code to be 100% sure.

  • Code will always answer your questions here ...
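Roughly, the fallback behaviour I'm describing looks like the sketch below. This is not bandersnatch's actual code, just an illustration of "try the configured download-mirror first, then fall back to files.pythonhosted.org"; the function and parameter names are made up:

```python
# Illustration only -- not bandersnatch's implementation. Names are hypothetical.
import urllib.error
import urllib.request


def fetch_with_fallback(path: str, download_mirror: str,
                        upstream: str = "https://files.pythonhosted.org") -> bytes:
    """Try the configured download mirror first; fall back to upstream if it fails."""
    last_error = None
    for base in (download_mirror, upstream):
        url = f"{base.rstrip('/')}/{path.lstrip('/')}"
        try:
            with urllib.request.urlopen(url, timeout=30) as resp:
                return resp.read()
        except urllib.error.URLError as exc:
            last_error = exc  # missing file or unreachable mirror: try the next base URL
    raise RuntimeError(f"could not fetch {path} from any source") from last_error
```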

In my previous issue, one of the developers instructed me to add "generating_global_index=True" to avoid executing "generating global index page" every time. However, it did not work, since I don't know where exactly I should add this parameter.

I meant we'd have to change the code to support that option, as in do a PR. That option does not exist today.

I have now reached about 9.0 TB of data, and I can see that the reason my previous download was only 8.61 TB is that bandersnatch errored and stopped. Because of some network errors, bandersnatch goes into a weird loop and jumps straight to "generating global index page", which makes me think it has reached the end. However, this is just a false end.

Yes, bandersnatch is designed to be eventually consistent. We don't ever expect a perfect run every time. This is the internet, after all.

If I would like to ignore prerelease files, what should I do? I've noticed that there is a prerelease plugin; does this plugin enable or disable prerelease downloads? What I need is to block prerelease downloads. Could someone give me an explanation? Thanks.

https://bandersnatch.readthedocs.io/en/latest/filtering_configuration.html#prerelease-filtering

  • It just reads metadata and does not download the pre-release versions of a package.
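For a concrete example, enabling that filter in bandersnatch.conf looks roughly like this (based on the filtering docs linked above; check that page for the authoritative syntax):

```ini
# Enable the prerelease_release filter so pre-release versions are not mirrored.
[plugins]
enabled =
    prerelease_release
```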

Please feel free to submit any updates to the documentation if you'd like to help make it more understandable.

cooperlees added the question (Further information is requested) label on Apr 8, 2022