Doesn't properly work anymore #275

Open
caiot5 opened this issue Jan 14, 2024 · 3 comments · May be fixed by #280

Comments

caiot5 commented Jan 14, 2024

I used to use wayback-machine-downloader quite a lot, but it no longer seems to work properly.
I think the cause is a connection-throttling mechanism that archive.org seems to have implemented, judging by the 'Connection refused' error in the log below:

http://www.ig.com.br:80/home/editorial/stories/editorial_body/0,1205,254060,00.html # Failed to open TCP connection to web.archive.org:443 (Connection refused - connect(2) for "web.archive.org" port 443)
websites/www.ig.com.br/home/editorial/stories/editorial_body/0,1205,254060,00.html was empty and was removed.

To me it looks like the individual TCP connections need to be established more slowly to avoid triggering the throttling.
Is there anything we can do to delay those connections?
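
To illustrate the kind of delay being asked about, here is a minimal Python sketch (not the tool's actual code, which is written in Ruby) that retries a refused connection with an increasing pause. `fetch_with_backoff`, its parameters, and the `fetch` callable are hypothetical names for illustration; the sketch assumes `fetch` raises `ConnectionError` when the connection is refused, and the delay values are guesses rather than limits published by archive.org.

```python
import time
from typing import Callable


def fetch_with_backoff(
    fetch: Callable[[str], bytes],
    url: str,
    retries: int = 4,
    base_delay: float = 3.0,
    sleep: Callable[[float], None] = time.sleep,
) -> bytes:
    """Call fetch(url), pausing and retrying when the connection fails.

    `fetch` is any callable that returns the response body and raises
    ConnectionError (e.g. ConnectionRefusedError) on a refused connection.
    The pause doubles after each failed attempt: 3 s, 6 s, 12 s, ...
    """
    for attempt in range(retries + 1):
        try:
            return fetch(url)
        except ConnectionError:
            if attempt == retries:
                raise  # out of retries: let the caller see the error
            sleep(base_delay * 2 ** attempt)
    raise AssertionError("unreachable")
```

Passing `sleep` as a parameter only makes the pause easy to replace in tests; in normal use the default `time.sleep` applies.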

rustam commented Jan 14, 2024

Please take a look at this thread:
#273 (comment)

caiot5 commented Jan 14, 2024

Please take a look at this thread: #273 (comment)

Thanks for that. I'm using this workaround right now and it works great!
I think it needs to be merged into the main release, because (for now) wayback-machine-downloader is useless without this 'mod'.

caiot5 commented Jan 14, 2024

It would be really nice if the workaround could skip the 'sleep 3' when the file already exists.
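
A rough sketch of that idea, again in Python rather than the tool's own Ruby: pause only before requests that will actually be made, and skip files already on disk without sleeping. `download_missing`, `fetch`, and the `(url, path)` pair layout are hypothetical and not taken from the workaround in #273; treating an empty file as missing is also an assumption, based on the log above where empty files are removed.

```python
import time
from pathlib import Path
from typing import Callable, Iterable, Tuple


def download_missing(
    items: Iterable[Tuple[str, str]],
    fetch: Callable[[str], bytes],
    delay: float = 3.0,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Download each (url, path) pair, sleeping only before real requests.

    Files that already exist and are non-empty are skipped with no
    request and no pause. Returns the number of files downloaded.
    """
    downloaded = 0
    for url, path in items:
        target = Path(path)
        if target.exists() and target.stat().st_size > 0:
            continue  # already downloaded: no request, no sleep
        sleep(delay)  # throttle only when a connection will be opened
        data = fetch(url)
        target.parent.mkdir(parents=True, exist_ok=True)
        target.write_bytes(data)
        downloaded += 1
    return downloaded
```

With this ordering, re-running an interrupted download costs a pause only for the files that are still missing.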
