Option to use external download manager (wget or curl) #147

Open
wxguy opened this issue Apr 4, 2024 · 5 comments

Comments

@wxguy

wxguy commented Apr 4, 2024

This is the kind of package I was looking for, and the API is simple to use.

The issue I have is that parfive quite often fails to download files from URLs, and this tends to happen after a long time has already been spent downloading. To overcome this, I propose either of the following options:

  1. Provide a resume option in the API.
  2. Support an external downloader engine such as wget or curl (with a keyword like 'engine=wget'), which works beautifully.
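For illustration only, here is a rough sketch of the kind of fallback I mean; this is not parfive code, and the 'engine' keyword above is hypothetical. wget's -c flag already resumes partial downloads:

import subprocess

def download_with_wget(url, path="."):
    # wget -c resumes a partial download; -P sets the output directory
    subprocess.run(["wget", "-c", "-P", path, url], check=True)

download_with_wget("http://example.com/data/file1.nc")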
@Cadair
Owner

Cadair commented Apr 4, 2024

Hi 👋 glad you find parfive useful.

You can retry failed downloads with retry, but I don't think it will resume partial downloads (that would be a good feature to add, though; see #10).

I don't think I will ever support alternative download engines, especially not ones which require shelling out to a binary. That sounds like it would be a very different code path and quite hard to fit into the same API. There is #143 where I am considering ditching aiohttp.

Can you elaborate on the issues you are facing? Have you looked into whether any aiohttp settings could make your particular downloads more reliable?

@wxguy
Author

wxguy commented Apr 4, 2024

Thank you for your response.

My use case is that I have to download approximately 100+ files from various sources in my application to create the final product. If even one file fails to download or is missing, the final product cannot be created. One way to resolve this is to download only the missing files, or to resume from a partial download. While re-downloading all the files is possible, the time taken and the overall size of the downloads would be a huge downside for me.

I don't think I will ever support alternative download engines, especially not ones which require shelling out to a binary.

I understand why you won't be implementing this feature.

I have limited exposure to the aiohttp library. However, various web resources indicate that it is possible to implement a partial resume of a download. A good example for aiohttp is given at https://stackoverflow.com/questions/58448605/download-file-with-resume-capability-using-aiohttp-and-python, and https://stackoverflow.com/questions/22894211/how-to-resume-file-download-in-python covers a general downloader. An additional reference is https://stackoverflow.com/questions/12243997/how-to-pause-and-resume-download-work.
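Along those lines, here is a minimal sketch of the Range-header approach described in the linked answers. It is illustrative only and assumes the server supports HTTP Range requests:

import asyncio
import os

import aiohttp

async def resume_download(url, filepath):
    # Start from however many bytes are already on disk
    start = os.path.getsize(filepath) if os.path.exists(filepath) else 0
    headers = {"Range": f"bytes={start}-"} if start else {}
    async with aiohttp.ClientSession() as session:
        async with session.get(url, headers=headers) as resp:
            # 206 Partial Content means the server honoured the Range header
            mode = "ab" if resp.status == 206 else "wb"
            with open(filepath, mode) as f:
                async for chunk in resp.content.iter_chunked(64 * 1024):
                    f.write(chunk)

# asyncio.run(resume_download("https://example.com/big_file.nc", "big_file.nc"))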

If you can implement this feature, it would be of great help and an excellent value addition for parfive, as I have not seen any other similar library offering it.

Thank you in advance.

@Cadair
Owner

Cadair commented Apr 4, 2024

One way to resolve the issue is to download only the missing

retry should do this for you.

from parfive import Downloader

dl = Downloader()
dl.enqueue_file("http://data.sunpy.org/sample-data/predicted-sunspot-radio-flux.txt", path="./")
files = dl.download()
if files.errors:
    # Retry only the files that failed to download
    dl.retry(files)
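If a single retry pass still leaves failures, it can be run in a loop, e.g. (a sketch, assuming retry returns an updated results object and capping at three extra attempts):

attempts = 0
while files.errors and attempts < 3:
    files = dl.retry(files)
    attempts += 1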

Resuming partial downloads would be good, but I don't have time to work on that in the near term. If you or someone else can pick it up I would be happy to help.

@wxguy
Author

wxguy commented Apr 4, 2024

retry should do this for you.

I did this already, but a few partial downloads are still left, and parfive skips them.

Anyway, thank you for listening. I hope to see a new release with this feature in the near future.

Thank you.

@Cadair
Owner

Cadair commented Apr 8, 2024

But still a few partial downloads are left which are skipped by parfive.

This feels like a bug; I don't suppose you have a way to reliably reproduce it? Is the issue that the failed download files are not getting deleted correctly?
