Permission denied - connect(2) for "web.archive.org" port 443 #273
Comments
I think the archive.org site is adding a limit on the number of files that can be downloaded in a given time period. I can download maybe 6 or 7 files before I see the error. I added a sleep 60 into a script after downloading 6 files and it seems to be working... slowly... :-(
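For anyone who wants to script that workaround, here is a minimal sketch (the poster's actual script wasn't shared; the files.txt name and its one-URL-per-line format are assumptions for illustration):

```ruby
# Hypothetical driver script: download each URL with wayback_machine_downloader,
# pausing 60 seconds after every 6 files to stay under the apparent rate limit.
urls = File.readlines("files.txt", chomp: true)

urls.each_with_index do |url, i|
  system("wayback_machine_downloader", url, "-e") # -e: download only this exact URL
  sleep 60 if (i + 1) % 6 == 0                    # back off after every 6 downloads
end
```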
I get the same error every now and then. My two cents on this issue: why not retry after a failure (i.e. ECONNREFUSED) and sleep only in that case?
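A minimal sketch of that retry-on-failure idea (this is not part of the gem; the method name and parameters are made up for illustration):

```ruby
require "net/http"
require "uri"

# Retry the request only when the connection is refused, and sleep only
# before a retry, instead of pausing unconditionally between downloads.
def download_with_retry(url, attempts: 5, pause: 10)
  uri = URI(url)
  attempts.times do |i|
    return Net::HTTP.get_response(uri)
  rescue Errno::ECONNREFUSED
    raise if i == attempts - 1 # give up after the last attempt
    sleep pause                # back off only after a failure
  end
end

response = download_with_retry("https://web.archive.org/web/2023/http://example.com/")
puts response.code
```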
Got the same error; solved it by sleeping 2 seconds between every download.

How do I add a 2-second sleep between downloads? @rustam
@setiawan-chandra you can do it with a simple script. In PowerShell, I used:

    wayback_machine_downloader <website> -l -a --from 2023 > files.txt
    Get-Content .\files.txt | ForEach-Object { Start-Sleep -Seconds 2; wayback_machine_downloader $_ --from 2023 -e }

The first command lists the archived file URLs into files.txt; the second downloads each one with a 2-second pause between downloads.
@davidpfister I think it's not working; I get "ForEach-Object: command not found".
@setiawan-chandra, do you have PowerShell? And which version?
@davidpfister is it correct if I run the command like this?
@setiawan-chandra if you are on Windows, just start PowerShell from the Start menu, or start a PowerShell terminal rather than using it from WSL. If you still want to use it from your WSL shell, you might need to import some PS modules; that should get you the Get-... cmdlets.
@setiawan-chandra, just go to the lib directory of wayback_machine_downloader and add sleep(3) to the download_files method of wayback_machine_downloader.rb (line 213 in my case; I have since changed those 2 seconds to 3 seconds):

    cd /var/lib/gems/3.0.0/gems/wayback_machine_downloader-2.3.1/lib
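For orientation, the sleep goes inside the worker loop of download_files. A paraphrase from memory of the 2.3.1 source is below (not verbatim; the exact surrounding code and line numbers in your copy may differ):

```ruby
# Worker loop inside WaybackMachineDownloader#download_files (paraphrased):
@threads_count.times do
  threads << Thread.new do
    until file_queue.empty?
      file_remote_info = file_queue.pop(true) rescue nil
      download_file(file_remote_info) if file_remote_info
      sleep(3) # <--- the added pause, taken after every queued file
    end
  end
end
```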
I also needed to add a sleep to get_all_snapshots_to_consider:

    def get_all_snapshots_to_consider
      # Note: Passing a page index parameter allow us to get more snapshots,
      # but from a less fresh index
      print "Getting snapshot pages"
      snapshot_list_to_consider = []
      snapshot_list_to_consider += get_raw_list_from_api(@base_url, nil)
      print "."
      unless @exact_url
        @maximum_pages.times do |page_index|
          snapshot_list = get_raw_list_from_api(@base_url + '/*', page_index)
          break if snapshot_list.empty?
          snapshot_list_to_consider += snapshot_list
          print "."
          sleep(3) # <--- Added this
        end
      end
      puts " found #{snapshot_list_to_consider.length} snaphots to consider."
      puts
      snapshot_list_to_consider
    end
@rustam instead of putting sleep(3) in the threads/file_queue loop, which slows down every file even if it has already been downloaded, put it in the download_file function just before the else branch for files that already exist (between lines 295 and 296). That way it pauses only when doing some real downloading, so you can run it on an existing directory and it will be very fast, pausing only while downloading new files. See wayback-machine-downloader/lib/wayback_machine_downloader.rb, lines 292 to 301 at 653b94b.
It should look like:

      semaphore.synchronize do
        @processed_file_count += 1
        puts "#{file_url} -> #{file_path} (#{@processed_file_count}/#{file_list_by_timestamp.size})"
      end
      sleep(3)
    else
      semaphore.synchronize do
        @processed_file_count += 1
        puts "#{file_url} # #{file_path} already exists. (#{@processed_file_count}/#{file_list_by_timestamp.size})"
      end
    end

The only new line is the sleep(3).
My problem is that the download doesn't even start: I run the command to download a page and, after a few seconds of searching, the error appears.
@lacertrader that's a different problem, it seems specific to you. Is the error the same? Did you try a different network connection, disabling your VPN, etc.? It might be worth starting a new issue.
Best solution is to use the updated files in PR #280 |
Nothing works anymore; every time I try (for any site) I receive the same error:

    C:/Ruby31-x64/lib/ruby/3.1.0/net/http.rb:1018:in `initialize': Nenhuma conexão pôde ser feita porque a máquina de destino as recusou ativamente. - connect(2) for "web.archive.org" port 443 (Errno::ECONNREFUSED)

(The Portuguese message means "No connection could be made because the target machine actively refused it.")